Class SequenceListOnDisk.Builder<T extends SequenceDocument>

    • Constructor Detail

      • Builder

        public Builder​(boolean tryCompressingSequences,
                       SequenceDocument.Alphabet alphabet,
                       boolean allowGaps,
                       jebl.util.ProgressListener progressListenerForSortingAlignment,
                       int maximumNumberOfSequencesThatWillBeAdded)
                throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Create a new builder
        Parameters:
        tryCompressingSequences - true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
        allowGaps - true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps they have.
        alphabet - the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code uses generics on this builder, it is responsible for ensuring that the builder type matches this parameter. Generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that only calls toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
        progressListenerForSortingAlignment - when creating alignments, sorting and saving may need to report progress before toAlignmentDocument(jebl.util.ProgressListener) is called. This progress listener will continue to have progress reported to it after this constructor returns.
        maximumNumberOfSequencesThatWillBeAdded - an upper bound on the number of sequences that will be added to the alignment builder, or -1 if unknown. Progress reporting will be inaccurate if -1 is provided.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to a temporary file on disk.
      • Builder

        public Builder​(boolean tryCompressingSequences,
                       SequenceDocument.Alphabet alphabet,
                       boolean allowGaps,
                       jebl.util.ProgressListener progressListenerForSortingAlignment,
                       int maximumNumberOfSequencesThatWillBeAdded,
                       boolean createMultipleThreads)
                throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Create a new builder
        Parameters:
        tryCompressingSequences - true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
        allowGaps - true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps they have.
        alphabet - the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code uses generics on this builder, it is responsible for ensuring that the builder type matches this parameter. Generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that only calls toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
        progressListenerForSortingAlignment - when creating alignments, sorting and saving may need to report progress before toAlignmentDocument(jebl.util.ProgressListener) is called. This progress listener will continue to have progress reported to it after this constructor returns.
        maximumNumberOfSequencesThatWillBeAdded - an upper bound on the number of sequences that will be added to the alignment builder, or -1 if unknown. Progress reporting will be inaccurate if -1 is provided.
        createMultipleThreads - true to create multiple threads to improve performance when necessary. Normally this should be true, except when code creates many SequenceListOnDisk.Builders at once, in which case the system might run out of threads.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to a temporary file on disk.
        Since:
        API 4.800 (Geneious 8.0.0)
    • Method Detail

      • getMinimumSuggestedContigSizeForCreatingContigsOnDisk

        public static int getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
        Returns a suggested minimum number of sequences that should be in a contig created from a SequenceListOnDisk.Builder. Creating on-disk versions incurs some overhead for each contig, so contigs with a small number of sequences and a short reference are better created using the standard DefaultAlignmentDocument constructors. For SequenceLists, the overhead is small, so creating a SequenceListOnDisk for even a list of 2 sequences is fine.
        Returns:
        a suggested minimum number of sequences that should be in a contig created from a SequenceListOnDisk.Builder when creating one.
        See Also:
        getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk(), shouldCreateContigOnDisk(int, int)
      • getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk

        public static int getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk()
        Returns a suggested minimum sequence length for the reference sequence in a contig created from a SequenceListOnDisk.Builder. Creating on-disk versions incurs some overhead for each contig, so contigs with a small number of sequences and a short reference are better created using the standard DefaultAlignmentDocument constructors.
        Returns:
        a suggested minimum sequence length for the reference sequence in a contig created from a SequenceListOnDisk.Builder when creating one.
        Since:
        API 4.50 (Geneious 5.5.0)
        See Also:
        getMinimumSuggestedContigSizeForCreatingContigsOnDisk(), shouldCreateContigOnDisk(int, int)
      • shouldCreateContigOnDisk

        public static boolean shouldCreateContigOnDisk​(int sequenceCount,
                                                       int referenceLength)
        Returns true if the sequence count and/or reference length of this contig are large enough that creating the contig on disk is recommended.
        Parameters:
        sequenceCount - number of sequences in the contig
        referenceLength - length of the reference sequence or -1 if no reference.
        Returns:
        true if the sequence count and/or reference length of this contig are large enough that creating the contig on disk is recommended.
        Since:
        API 4.50 (Geneious 5.5.0)
        See Also:
        getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk(), getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
      • shouldCreateContigOnDisk

        public static boolean shouldCreateContigOnDisk​(int sequenceCount,
                                                       int referenceLength,
                                                       long totalLengthOfMappedReads)
        Returns true if the sequence count and/or reference length and/or total length of all the mapped reads in this contig are large enough that creating the contig on disk is recommended.
        Parameters:
        sequenceCount - number of sequences in the contig
        referenceLength - length of the reference sequence or -1 if no reference.
        totalLengthOfMappedReads - total length of all mapped reads or -1 if unknown
        Returns:
        true if the sequence count and/or reference length and/or total length of all the mapped reads in this contig are large enough that creating the contig on disk is recommended.
        Since:
        API 4.810 (Geneious 8.1.0)
        See Also:
        getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk(), getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
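        The decision described for shouldCreateContigOnDisk can be pictured with the sketch below. It is not the library's implementation: the class name ContigOnDiskSketch and all three threshold values are hypothetical, and treating any single criterion as sufficient is only one reading of the "and/or" wording above. The -1 sentinels for a missing reference and an unknown read length follow the parameter descriptions.

```java
// A stand-in for illustration only; the real thresholds are not documented here.
final class ContigOnDiskSketch {
    static final int MIN_SEQUENCE_COUNT = 1_000;            // hypothetical value
    static final int MIN_REFERENCE_LENGTH = 100_000;        // hypothetical value
    static final long MIN_TOTAL_READ_LENGTH = 10_000_000L;  // hypothetical value

    /**
     * Returns true when at least one known measurement reaches its threshold.
     * A referenceLength of -1 means "no reference" and a
     * totalLengthOfMappedReads of -1 means "unknown"; both are skipped.
     */
    static boolean shouldCreateOnDisk(int sequenceCount,
                                      int referenceLength,
                                      long totalLengthOfMappedReads) {
        boolean manySequences = sequenceCount >= MIN_SEQUENCE_COUNT;
        boolean longReference = referenceLength != -1
                && referenceLength >= MIN_REFERENCE_LENGTH;
        boolean manyBases = totalLengthOfMappedReads != -1
                && totalLengthOfMappedReads >= MIN_TOTAL_READ_LENGTH;
        return manySequences || longReference || manyBases;
    }

    public static void main(String[] args) {
        // Small contig without a reference: stays in memory under these assumptions.
        System.out.println(shouldCreateOnDisk(5, -1, -1));
        // Few reads, but a long reference: recommended on disk under these assumptions.
        System.out.println(shouldCreateOnDisk(5, 200_000, -1));
    }
}
```

        In real code, call the documented static methods rather than hard-coding thresholds, since the suggested minimums are supplied by the API.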
      • addNameOfSequence

        public boolean addNameOfSequence​(java.lang.String sequenceName)
        Adds the name of a sequence that will be later added using addSequence(SequenceDocument, jebl.util.ProgressListener). It is optional to make a first pass of the sequences providing their names, but doing so will improve performance if tryCompressingSequences==true was passed to the builder constructor. This method should not be called for the name of the reference sequence that will be passed to addAlignmentReferenceSequence (if any).
        Parameters:
        sequenceName - the name of the sequence. If this is null, all earlier names are discarded and names are not compressed.
        Returns:
        false if a compressible pattern in the sequence names could not be found, and further calls to addNameOfSequence(String) will be ignored. The calling code could skip straight to the 2nd pass in this case.
        Throws:
        java.lang.IllegalStateException - if toSequenceList or addSequence has already been called
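        The two-pass calling pattern described above might look like the following sketch. To keep it self-contained it does not use the Geneious API: NameAwareBuilder and ToyBuilder are hypothetical stand-ins, the stand-in addSequence takes only a name (the real method takes a SequenceDocument and a ProgressListener), and the "read_" prefix rule is an invented placeholder for whatever pattern detection the real builder performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TwoPassNamingSketch {
    /** Hypothetical stand-in exposing only the two calls discussed above. */
    interface NameAwareBuilder {
        boolean addNameOfSequence(String sequenceName);
        void addSequence(String sequenceName);
    }

    /**
     * First pass: offer names until the builder reports that no compressible
     * pattern was found. Second pass: add every sequence.
     * Returns how many names were offered during the first pass.
     */
    static int addAll(NameAwareBuilder builder, List<String> names) {
        int namesOffered = 0;
        for (String name : names) {
            namesOffered++;
            if (!builder.addNameOfSequence(name)) {
                break; // further names would be ignored, so skip to the second pass
            }
        }
        for (String name : names) {
            builder.addSequence(name);
        }
        return namesOffered;
    }

    /** Toy builder: treats names as compressible only while they start with "read_". */
    static final class ToyBuilder implements NameAwareBuilder {
        final List<String> added = new ArrayList<>();
        private boolean sequencesStarted = false;

        @Override
        public boolean addNameOfSequence(String sequenceName) {
            if (sequencesStarted) {
                throw new IllegalStateException("addSequence has already been called");
            }
            return sequenceName.startsWith("read_");
        }

        @Override
        public void addSequence(String sequenceName) {
            sequencesStarted = true;
            added.add(sequenceName);
        }
    }

    public static void main(String[] args) {
        ToyBuilder builder = new ToyBuilder();
        int offered = addAll(builder, Arrays.asList("read_1", "read_2", "other", "read_3"));
        System.out.println(offered + " names offered, " + builder.added.size() + " sequences added");
    }
}
```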
      • setCircularAlignmentLength

        public void setCircularAlignmentLength​(int circularAlignmentLength)
        Marks the alignment being built as circular, as specified by SequenceAlignmentDocument.getCircularLength().
        Parameters:
        circularAlignmentLength - the circular alignment length or 0 for not circular (SequenceAlignmentDocument.getCircularLength())
        Throws:
        java.lang.IllegalStateException - if this is called after adding any sequences or if allowGaps==true wasn't passed to the constructor.
        Since:
        API 4.600 (Geneious 6.0.0)
      • addSequenceWithMate

        public void addSequenceWithMate​(SequenceDocument sequence,
                                        SequenceDocument mateSequence,
                                        int expectedDistance1,
                                        int expectedDistance2,
                                        jebl.util.ProgressListener progressListener)
                                 throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Similar to addSequence, but this version adds 2 sequences which are paired.
        Parameters:
        sequence - the sequence to add
        mateSequence - the mate sequence to add
        expectedDistance1 - the expected distance from the first sequence to its mate. Must not be 0.
        expectedDistance2 - the expected distance from the second sequence to its mate. This must be equal to either expectedDistance1 or -1*expectedDistance1, depending on the relative orientation of the pairs. See PairedReads for expected distance meanings.
        progressListener - for reporting progress if this is a large sequence
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
        java.lang.IllegalStateException - if toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called on this builder or if the sequence is not a NucleotideSequenceDocument or AminoAcidSequenceDocument
        java.lang.IllegalArgumentException - if the sequence contains gaps when not allowed, or if it contains invalid nucleotide or amino acid characters or if invalid expected distances are provided.
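        The constraint on the two expected distances can be checked before calling addSequenceWithMate, as in the sketch below. The class ExpectedDistanceSketch and its methods are hypothetical helpers, not part of the API; they only encode the two rules stated in the parameter descriptions (expectedDistance1 is non-zero, and expectedDistance2 equals expectedDistance1 or its negation).

```java
final class ExpectedDistanceSketch {
    /** Returns true if the pair satisfies the documented rules for expected distances. */
    static boolean isValid(int expectedDistance1, int expectedDistance2) {
        if (expectedDistance1 == 0) {
            return false; // the first expected distance must not be 0
        }
        return expectedDistance2 == expectedDistance1
                || expectedDistance2 == -expectedDistance1;
    }

    /** Fails fast with the same exception type the documentation lists for invalid distances. */
    static void requireValid(int expectedDistance1, int expectedDistance2) {
        if (!isValid(expectedDistance1, expectedDistance2)) {
            throw new IllegalArgumentException("Invalid expected distances: "
                    + expectedDistance1 + ", " + expectedDistance2);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(300, -300)); // same magnitude, opposite sign
        System.out.println(isValid(300, 250));  // magnitudes differ
    }
}
```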
      • addAlignmentReferenceSequence

        public void addAlignmentReferenceSequence​(SequenceDocument referenceSequence,
                                                  jebl.util.ProgressListener progressListener)
                                           throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Adds a reference sequence when building an alignment. Only a single reference sequence may be added, and it must be added prior to other sequences via addSequence.
        Parameters:
        referenceSequence - the reference sequence
        progressListener - for reporting progress if this is a large sequence
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
      • toSequenceListOnDiskOrInMemoryIfNecessary

        public java.util.List<T> toSequenceListOnDiskOrInMemoryIfNecessary​(jebl.util.ProgressListener progressListener)
                                                                    throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Creates a SequenceListOnDisk from this builder, but if any of the sequences have references to other documents, creates a list in memory instead since SequenceListOnDisk does not handle references to other documents. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead of this method.
        Parameters:
        progressListener - for reporting progress and cancelling.
        Returns:
        a SequenceListOnDisk from this builder.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
        java.lang.IllegalStateException - if this method or toSequenceListDocument(jebl.util.ProgressListener) has already been called or if no sequences have been added
        Since:
        API 4.202500 (Geneious 2025.0.0)
      • toSequenceList

        public SequenceListOnDisk<T> toSequenceList​(jebl.util.ProgressListener progressListener)
                                             throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Creates a SequenceListOnDisk from this builder. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead. And even in situations where you don't want a SequenceListDocument, you should generally use toSequenceListOnDiskOrInMemoryIfNecessary(ProgressListener) instead of this method.
        Parameters:
        progressListener - for reporting progress and cancelling.
        Returns:
        a SequenceListOnDisk from this builder.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
        java.lang.IllegalStateException - if this method or toSequenceListDocument(jebl.util.ProgressListener) has already been called or if no sequences have been added
      • toSequenceListDocument

        public DefaultSequenceListDocument toSequenceListDocument​(jebl.util.ProgressListener progressListener)
                                                           throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Creates a new SequenceListDocument from this builder. The result will usually contain nucleotide and/or amino acid sequences which are an instance of a SequenceListOnDisk, but in cases where at least one of the sequences contains references to other documents, an in-memory list of sequences will be used instead. This is because SequenceListOnDisk doesn't handle references: when a document is copied to another database, the referenced URNs need updating, which is difficult to do for a SequenceListOnDisk.
        Parameters:
        progressListener - for reporting progress and cancelling.
        Returns:
        a DefaultSequenceListDocument from this builder whose sequences are not loaded into memory, or an in-memory list in cases where at least one of the sequences contains references to other documents
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
        java.lang.IllegalStateException - if this method or toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called or if allowGaps is true for this builder.
      • toSequenceListOnDiskDocument

        public DefaultSequenceListDocument toSequenceListOnDiskDocument​(jebl.util.ProgressListener progressListener)
                                                                 throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Creates a new SequenceListDocument from this builder. In most cases, it is best to use toSequenceListDocument(ProgressListener) instead.
        Parameters:
        progressListener - for reporting progress and cancelling.
        Returns:
        a DefaultSequenceListDocument from this builder whose sequences are not loaded into memory.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
        java.lang.IllegalStateException - if this method or toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called or if allowGaps is true for this builder.
        Since:
        API 4.202502 (Geneious 2025.0.2)
      • toAlignmentDocument

        public DefaultAlignmentDocument toAlignmentDocument​(jebl.util.ProgressListener progressListener)
                                                     throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Creates an Alignment document from this builder. This method may only be called if allowGaps==true was passed to the constructor Builder(boolean, SequenceDocument.Alphabet, boolean). For constructing reference sequence alignments, use addAlignmentReferenceSequence prior to adding sequences using addSequence.
        Parameters:
        progressListener - for reporting progress and cancelling.
        Returns:
        a DefaultAlignmentDocument from this builder whose sequences are not loaded into memory.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if we can't write to disk or if the progressListener indicates we should cancel.
        java.lang.IllegalStateException - if allowGaps is false for this builder
      • getNumberOfSequences

        public int getNumberOfSequences()
        Returns the number of sequences added so far to this builder.
        Returns:
        the number of sequences added so far to this builder.