Class SequenceListOnDisk.Builder<T extends SequenceDocument>

java.lang.Object
com.biomatters.geneious.publicapi.documents.sequence.SequenceListOnDisk.Builder<T>
Type Parameters:
T - the type of sequence (NucleotideSequenceDocument or AminoAcidSequenceDocument)
Enclosing class:
SequenceListOnDisk<T extends SequenceDocument>

public static class SequenceListOnDisk.Builder<T extends SequenceDocument> extends Object
Used for building a SequenceListOnDisk.

The simplest way to use this builder is to call addSequence for each sequence to be added, then call toSequenceListDocument or toAlignmentDocument when done.
However, to improve performance when using sequence compression (see the tryCompressingSequences parameter in the constructor), it is recommended to make a first pass over the sequences, providing their names to addNameOfSequence, before making the second pass using addSequence.

This class is not thread safe.
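
The two-pass approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than Geneious code: NameAwareBuilder, NamedSequence and RecordingBuilder are hypothetical stand-ins invented for the sketch, and only the method names addNameOfSequence and addSequence are taken from this class. The real addSequence takes a SequenceDocument and a ProgressListener and can throw DocumentOperationException, all of which are omitted here.

```java
import java.util.Arrays;
import java.util.List;

final class TwoPassSketch {
    /** Hypothetical stand-in for the two builder methods used in this sketch. */
    interface NameAwareBuilder {
        boolean addNameOfSequence(String sequenceName);
        void addSequence(NamedSequence sequence);
    }

    /** Hypothetical minimal sequence holder; the real API uses SequenceDocument. */
    static final class NamedSequence {
        final String name;
        final String residues;
        NamedSequence(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }
    }

    /** Hypothetical builder that only counts calls, so the flow can be observed. */
    static final class RecordingBuilder implements NameAwareBuilder {
        private final boolean patternFound;
        int namesSeen = 0;
        int sequencesAdded = 0;
        RecordingBuilder(boolean patternFound) {
            this.patternFound = patternFound;
        }
        @Override public boolean addNameOfSequence(String sequenceName) {
            namesSeen++;
            return patternFound;
        }
        @Override public void addSequence(NamedSequence sequence) {
            sequencesAdded++;
        }
    }

    /** Runs the optional name pass, then the sequence pass; returns how many names were offered. */
    static int addInTwoPasses(NameAwareBuilder builder, List<NamedSequence> sequences) {
        int namesOffered = 0;
        // First pass: names only. A false return means further names would be
        // ignored, so skip straight to the second pass.
        for (NamedSequence sequence : sequences) {
            namesOffered++;
            if (!builder.addNameOfSequence(sequence.name)) {
                break;
            }
        }
        // Second pass: the sequences themselves.
        for (NamedSequence sequence : sequences) {
            builder.addSequence(sequence);
        }
        return namesOffered;
    }

    public static void main(String[] args) {
        List<NamedSequence> sequences = Arrays.asList(
                new NamedSequence("read_1", "ACGT"),
                new NamedSequence("read_2", "GGCC"),
                new NamedSequence("read_3", "TTAA"));
        RecordingBuilder builder = new RecordingBuilder(true);
        int offered = addInTwoPasses(builder, sequences);
        System.out.println("names offered: " + offered + ", sequences added: " + builder.sequencesAdded);
    }
}
```

The early exit from the first pass reflects the documented return value of addNameOfSequence; with a real builder the second loop would call addSequence(SequenceDocument, ProgressListener) instead.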

Since:
API 4.40 (Geneious 5.4.0)
  • Constructor Details

    • Builder

      public Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Create a new builder
      Parameters:
      tryCompressingSequences - true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
      alphabet - the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code chooses to use generics on this builder, it is responsible for ensuring the builder type matches this parameter. Generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that will only call toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
      allowGaps - true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps (see SequenceCharSequence.getLeadingGapsLength) they have.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to a temporary file on disk.
    • Builder

      public Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps, ProgressListener progressListenerForSortingAlignment, int maximumNumberOfSequencesThatWillBeAdded) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Create a new builder
      Parameters:
      tryCompressingSequences - true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
      alphabet - the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code chooses to use generics on this builder, it is responsible for ensuring the builder type matches this parameter. Generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that will only call toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
      allowGaps - true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps they have.
      progressListenerForSortingAlignment - when creating alignments, sorting and saving may need to report progress before toAlignmentDocument(jebl.util.ProgressListener) is called. This progress listener will continue to have progress reported to it after this constructor returns.
      maximumNumberOfSequencesThatWillBeAdded - an upper bound on the number of sequences that will be added to the alignment builder, or -1 if unknown. Progress reporting will be inaccurate if -1 is provided.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to a temporary file on disk.
    • Builder

      public Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps, ProgressListener progressListenerForSortingAlignment, int maximumNumberOfSequencesThatWillBeAdded, boolean createMultipleThreads) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Create a new builder
      Parameters:
      tryCompressingSequences - true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
      alphabet - the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code chooses to use generics on this builder, it is responsible for ensuring the builder type matches this parameter. Generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that will only call toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
      allowGaps - true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps they have.
      progressListenerForSortingAlignment - when creating alignments, sorting and saving may need to report progress before toAlignmentDocument(jebl.util.ProgressListener) is called. This progress listener will continue to have progress reported to it after this constructor returns.
      maximumNumberOfSequencesThatWillBeAdded - an upper bound on the number of sequences that will be added to the alignment builder, or -1 if unknown. Progress reporting will be inaccurate if -1 is provided.
      createMultipleThreads - true to create multiple threads to improve performance when necessary. Normally this should be true, except when code is creating many SequenceListOnDisk.Builders at once, in which case you might run out of system threads.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to a temporary file on disk.
      Since:
      API 4.800 (Geneious 8.0.0)
  • Method Details

    • getMinimumSuggestedContigSizeForCreatingContigsOnDisk

      public static int getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
      Returns a suggested minimum number of sequences that should be in a SequenceListOnDisk.Builder when creating one. Creating the on-disk version of a contig carries some overhead, so contigs with a small number of sequences and a short reference are better created using the standard DefaultAlignmentDocument constructors. For SequenceLists, the overhead is small, so creating a SequenceListOnDisk for even a list of 2 sequences is fine.
      Returns:
      a suggested minimum number of sequences that should be in a contig created from a SequenceListOnDisk.Builder when creating one.
    • getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk

      public static int getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk()
      Returns a suggested minimum sequence length for the reference sequence in a SequenceListOnDisk.Builder when creating one. Creating the on-disk version of a contig carries some overhead, so contigs with a small number of sequences and a short reference are better created using the standard DefaultAlignmentDocument constructors.
      Returns:
      a suggested minimum sequence length for the reference sequence in a contig created from a SequenceListOnDisk.Builder when creating one.
      Since:
      API 4.50 (Geneious 5.5.0)
    • shouldCreateContigOnDisk

      public static boolean shouldCreateContigOnDisk(int sequenceCount, int referenceLength)
      Returns true if the sequence count and/or reference length of this contig are sufficient for it to be recommended to create a contig on disk.
      Parameters:
      sequenceCount - number of sequences in the contig
      referenceLength - length of the reference sequence or -1 if no reference.
      Returns:
      true if the sequence count and/or reference length of this contig are sufficient for it to be recommended to create a contig on disk.
      Since:
      API 4.50 (Geneious 5.5.0)
    • shouldCreateContigOnDisk

      public static boolean shouldCreateContigOnDisk(int sequenceCount, int referenceLength, long totalLengthOfMappedReads)
      Returns true if the sequence count and/or reference length and/or total length of all the mapped reads in this contig are sufficient for it to be recommended to create a contig on disk.
      Parameters:
      sequenceCount - number of sequences in the contig
      referenceLength - length of the reference sequence or -1 if no reference.
      totalLengthOfMappedReads - total length of all mapped reads or -1 if unknown
      Returns:
      true if the sequence count and/or reference length and/or total length of all the mapped reads in this contig are sufficient for it to be recommended to create a contig on disk.
      Since:
      API 4.810 (Geneious 8.1.0)
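
As a rough illustration of preparing the three arguments described above, the following self-contained sketch derives them from plain strings. ContigSizeArguments and its from method are hypothetical helpers, not part of the Geneious API, and plain String values stand in for sequence documents. Whether the reference itself counts toward sequenceCount is not stated above; counting reads only is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for the arguments of shouldCreateContigOnDisk(int, int, long). */
final class ContigSizeArguments {
    final int sequenceCount;
    final int referenceLength;
    final long totalLengthOfMappedReads;

    ContigSizeArguments(int sequenceCount, int referenceLength, long totalLengthOfMappedReads) {
        this.sequenceCount = sequenceCount;
        this.referenceLength = referenceLength;
        this.totalLengthOfMappedReads = totalLengthOfMappedReads;
    }

    /**
     * Derives the arguments from a reference (null if the contig has none) and its reads.
     * Assumption: sequenceCount counts the reads only, not the reference.
     */
    static ContigSizeArguments from(String reference, List<String> reads) {
        // -1 is the documented value for "no reference".
        int referenceLength = reference == null ? -1 : reference.length();
        // Accumulate in a long, matching the parameter type, so large contigs do not overflow.
        long totalLength = 0;
        for (String read : reads) {
            totalLength += read.length();
        }
        return new ContigSizeArguments(reads.size(), referenceLength, totalLength);
    }

    public static void main(String[] args) {
        ContigSizeArguments a = from("ACGTACGTAC", Arrays.asList("ACGT", "GTACG"));
        System.out.println(a.sequenceCount + " " + a.referenceLength + " " + a.totalLengthOfMappedReads);
    }
}
```

With the real API, the three fields would then be passed to shouldCreateContigOnDisk. If the read lengths are not known in advance, the documented alternative is to pass -1 for totalLengthOfMappedReads.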
    • addNameOfSequence

      public boolean addNameOfSequence(String sequenceName)
      Adds the name of a sequence that will be later added using addSequence(SequenceDocument, jebl.util.ProgressListener). It is optional to make a first pass of the sequences providing their names, but doing so will improve performance if tryCompressingSequences==true was passed to the builder constructor. This method should not be called for the name of the reference sequence that will be passed to addAlignmentReferenceSequence (if any).
      Parameters:
      sequenceName - the name of the sequence. If this is null, all earlier names are discarded and names are not compressed.
      Returns:
      false if no compressible pattern could be found in the sequence names, in which case further calls to addNameOfSequence(String) will be ignored and the calling code can skip straight to the second pass.
      Throws:
      IllegalStateException - if toSequenceList or addSequence has already been called
    • setCircularAlignmentLength

      public void setCircularAlignmentLength(int circularAlignmentLength)
      Marks the alignment being built as circular, as specified by SequenceAlignmentDocument.getCircularLength().
      Parameters:
      circularAlignmentLength - the circular alignment length or 0 for not circular (SequenceAlignmentDocument.getCircularLength())
      Throws:
      IllegalStateException - if this is called after adding any sequences or if allowGaps==true wasn't passed to the constructor.
      Since:
      API 4.600 (Geneious 6.0.0)
    • addSequenceWithMate

      public void addSequenceWithMate(SequenceDocument sequence, SequenceDocument mateSequence, int expectedDistance1, int expectedDistance2, ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Similar to addSequence, but adds two sequences that are paired.
      Parameters:
      sequence - the sequence to add
      mateSequence - the mate sequence to add
      expectedDistance1 - the expected distance from the first sequence to its mate. Must not be 0.
      expectedDistance2 - the expected distance from the second sequence to its mate. This must be equal to either expectedDistance1 or -1*expectedDistance1, depending on the relative orientation of the pairs. See PairedReads for expected distance meanings.
      progressListener - for reporting progress if this is a large sequence
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      IllegalStateException - if toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called on this builder or if the sequence is not a NucleotideSequenceDocument or AminoAcidSequenceDocument
      IllegalArgumentException - if the sequence contains gaps when not allowed, if it contains invalid nucleotide or amino acid characters, or if invalid expected distances are provided.
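
The constraints on the two expected distances can be checked before calling addSequenceWithMate, for example with a small helper like the one below. ExpectedDistanceCheck is a hypothetical class written for this sketch, not part of the Geneious API. It encodes only the two rules stated in the parameter descriptions above and says nothing about what the distances mean (see PairedReads for that).

```java
/** Hypothetical pre-check for the expectedDistance1/expectedDistance2 arguments. */
final class ExpectedDistanceCheck {
    /**
     * Returns true if expectedDistance1 is non-zero and expectedDistance2 equals
     * either expectedDistance1 or -1 * expectedDistance1.
     */
    static boolean isValid(int expectedDistance1, int expectedDistance2) {
        if (expectedDistance1 == 0) {
            return false;
        }
        // Compare as long so that negating Integer.MIN_VALUE cannot overflow.
        long d1 = expectedDistance1;
        long d2 = expectedDistance2;
        return d2 == d1 || d2 == -d1;
    }

    public static void main(String[] args) {
        System.out.println(isValid(300, 300));
        System.out.println(isValid(300, -300));
        System.out.println(isValid(0, 0));
        System.out.println(isValid(300, 200));
    }
}
```

Checking up front lets calling code report a bad pairing itself, rather than relying on the IllegalArgumentException thrown by the builder.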
    • addSequence

      public void addSequence(SequenceDocument sequence, ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Adds a new unpaired sequence.

      If this builder is for a sequence list, then paired sequences can be added either by using addSequenceWithMate(SequenceDocument, SequenceDocument, int, int, jebl.util.ProgressListener) or can be paired later by using DefaultSequenceListDocument.setPairedReadsManager on the result of toSequenceListDocument.

      If this list is being built for use with an alignment, since sequences will get sorted by leading gaps, paired sequences need to be added using addSequenceWithMate.

      This method should not be called to add the reference sequence to a contig. Instead, addAlignmentReferenceSequence(SequenceDocument, jebl.util.ProgressListener) should be called prior to adding any sequences.

      Parameters:
      sequence - the sequence to add
      progressListener - for reporting progress if this is a large sequence
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      IllegalStateException - if toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called on this builder or if the sequence is not a NucleotideSequenceDocument or AminoAcidSequenceDocument
      IllegalArgumentException - if the sequence contains gaps when not allowed, or if it contains invalid nucleotide or amino acid characters.
    • addAlignmentReferenceSequence

      public void addAlignmentReferenceSequence(SequenceDocument referenceSequence, ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Adds a reference sequence when building an alignment. Only a single reference sequence may be added, and it must be added prior to other sequences via addSequence.
      Parameters:
      referenceSequence - the reference sequence
      progressListener - for reporting progress if this is a large sequence
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
    • toSequenceListOnDiskOrInMemoryIfNecessary

      public List<T> toSequenceListOnDiskOrInMemoryIfNecessary(ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Creates a SequenceListOnDisk from this builder, but if any of the sequences have references to other documents, creates a list in memory instead since SequenceListOnDisk does not handle references to other documents. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead of this method.
      Parameters:
      progressListener - for reporting progress and cancelling.
      Returns:
      a SequenceListOnDisk from this builder.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      IllegalStateException - if this method or toSequenceListDocument(jebl.util.ProgressListener) has already been called or if no sequences have been added
      Since:
      API 4.202500 (Geneious 2025.0.0)
    • toSequenceList

      public SequenceListOnDisk<T> toSequenceList(ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Creates a SequenceListOnDisk from this builder. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead. Even when you don't want a SequenceListDocument, you should generally use toSequenceListOnDiskOrInMemoryIfNecessary(ProgressListener) instead of this method.
      Parameters:
      progressListener - for reporting progress and cancelling.
      Returns:
      a SequenceListOnDisk from this builder.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      IllegalStateException - if this method or toSequenceListDocument(jebl.util.ProgressListener) has already been called or if no sequences have been added
    • getSequenceList

      @Deprecated(since="2025.0.0") public SequenceListOnDisk<T> getSequenceList()
      Gets the result returned from a previous call to toSequenceList. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead.

      This method need not be used unless you are constructing your own alignment or sequence list implementation rather than using toAlignmentDocument or toSequenceListDocument.

      Returns:
      a SequenceListOnDisk from this builder.
      Throws:
      IllegalStateException - if none of toSequenceListDocument(jebl.util.ProgressListener), toSequenceList(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) has been called yet
    • toSequenceListDocument

      public DefaultSequenceListDocument toSequenceListDocument(ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Creates a new SequenceListDocument from this builder. The result will usually contain nucleotide and/or amino acid sequences which are an instance of a SequenceListOnDisk, but if at least one of the sequences contains references to other documents, an in-memory list of sequences will be used instead. This is because SequenceListOnDisk doesn't handle references: when a document is copied to another database, the referenced URNs need updating, which is difficult to do for a SequenceListOnDisk.
      Parameters:
      progressListener - for reporting progress and cancelling.
      Returns:
      a DefaultSequenceListDocument from this builder whose sequences are not loaded into memory, or an in-memory list in cases where at least one of the sequences contains references to other documents
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      IllegalStateException - if this method or toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called or if allowGaps is true for this builder.
    • toSequenceListOnDiskDocument

      public DefaultSequenceListDocument toSequenceListOnDiskDocument(ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Creates a new SequenceListDocument from this builder. In most cases, it is best to use toSequenceListDocument(ProgressListener) instead.
      Parameters:
      progressListener - for reporting progress and cancelling.
      Returns:
      a DefaultSequenceListDocument from this builder whose sequences are not loaded into memory.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      IllegalStateException - if this method or toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called or if allowGaps is true for this builder.
      Since:
      API 4.202502 (Geneious 2025.0.2)
    • toAlignmentDocument

      public DefaultAlignmentDocument toAlignmentDocument(ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Creates an alignment document from this builder. This method may only be called if allowGaps==true was passed to the constructor Builder(boolean, SequenceDocument.Alphabet, boolean). For constructing reference sequence alignments, use addAlignmentReferenceSequence prior to adding sequences using addSequence.
      Parameters:
      progressListener - for reporting progress and cancelling.
      Returns:
      a DefaultAlignmentDocument from this builder whose sequences are not loaded into memory.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      IllegalStateException - if allowGaps is false for this builder
    • getNumberOfSequences

      public int getNumberOfSequences()
      Returns the number of sequences added so far to this builder.
      Returns:
      the number of sequences added so far to this builder.