Class SequenceListOnDisk.Builder<T extends SequenceDocument>
- java.lang.Object
-
- com.biomatters.geneious.publicapi.documents.sequence.SequenceListOnDisk.Builder<T>
-
- Type Parameters:
T
- the type of sequence (NucleotideSequenceDocument or AminoAcidSequenceDocument)
- Enclosing class:
- SequenceListOnDisk<T extends SequenceDocument>
public static class SequenceListOnDisk.Builder<T extends SequenceDocument> extends java.lang.Object
Used for building a SequenceListOnDisk. The simplest way to use this builder is to call addSequence for each sequence to be added, then call toSequenceListDocument or toAlignmentDocument when done.
However, to improve performance when using sequence compression (see the tryCompressingSequences parameter in the constructor), it is recommended that a first pass be made over the sequences, providing their names to addNameOfSequence, before making the second pass using addSequence.
This class is not thread safe.
- Since:
- API 4.40 (Geneious 5.4.0)
-
-
Constructor Summary
Constructors
Constructor / Description
Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps)
Create a new builder
Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps, jebl.util.ProgressListener progressListenerForSortingAlignment, int maximumNumberOfSequencesThatWillBeAdded)
Create a new builder
Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps, jebl.util.ProgressListener progressListenerForSortingAlignment, int maximumNumberOfSequencesThatWillBeAdded, boolean createMultipleThreads)
Create a new builder
-
Method Summary
Modifier and Type / Method / Description
void
addAlignmentReferenceSequence(SequenceDocument referenceSequence, jebl.util.ProgressListener progressListener)
Adds a reference sequence when building an alignment.
boolean
addNameOfSequence(java.lang.String sequenceName)
Adds the name of a sequence that will be later added using addSequence(SequenceDocument, jebl.util.ProgressListener).
void
addSequence(SequenceDocument sequence, jebl.util.ProgressListener progressListener)
Adds a new unpaired sequence.
void
addSequenceWithMate(SequenceDocument sequence, SequenceDocument mateSequence, int expectedDistance1, int expectedDistance2, jebl.util.ProgressListener progressListener)
Similar to addSequence, but this version adds 2 sequences which are paired.
static int
getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
Returns a suggested minimum number of sequences that should be in a SequenceListOnDisk.Builder when creating one.
static int
getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk()
Returns a suggested minimum sequence length for the reference sequence in a SequenceListOnDisk.Builder when creating one.
int
getNumberOfSequences()
Returns the number of sequences added so far to this builder.
SequenceListOnDisk<T>
getSequenceList()
void
setCircularAlignmentLength(int circularAlignmentLength)
Sets the alignment being built as circular, as specified by SequenceAlignmentDocument.getCircularLength()
static boolean
shouldCreateContigOnDisk(int sequenceCount, int referenceLength)
Returns true if the sequence count and/or reference length of this contig are sufficient for it to be recommended to create a contig on disk.
static boolean
shouldCreateContigOnDisk(int sequenceCount, int referenceLength, long totalLengthOfMappedReads)
Returns true if the sequence count and/or reference length and/or total length of all the mapped reads in this contig are sufficient for it to be recommended to create a contig on disk.
DefaultAlignmentDocument
toAlignmentDocument(jebl.util.ProgressListener progressListener)
Creates an Alignment document from this builder.
SequenceListOnDisk<T>
toSequenceList(jebl.util.ProgressListener progressListener)
Creates a SequenceListOnDisk from this builder.
DefaultSequenceListDocument
toSequenceListDocument(jebl.util.ProgressListener progressListener)
Creates a new SequenceListDocument from this builder.
DefaultSequenceListDocument
toSequenceListOnDiskDocument(jebl.util.ProgressListener progressListener)
Creates a new SequenceListDocument from this builder.
java.util.List<T>
toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener progressListener)
Creates a SequenceListOnDisk from this builder, but if any of the sequences have references to other documents, creates a list in memory instead since SequenceListOnDisk does not handle references to other documents.
-
-
-
Constructor Detail
-
Builder
public Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Create a new builder
- Parameters:
tryCompressingSequences
- true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
allowGaps
- true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps (see SequenceCharSequence.getLeadingGapsLength) they have.
alphabet
- the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code chooses to use generics on this builder, it is up to the calling code to ensure the builder type matches this parameter. The generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that will only call toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to a temporary file on disk.
-
Builder
public Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps, jebl.util.ProgressListener progressListenerForSortingAlignment, int maximumNumberOfSequencesThatWillBeAdded) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Create a new builder
- Parameters:
tryCompressingSequences
- true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
allowGaps
- true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps they have.
alphabet
- the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code chooses to use generics on this builder, it is up to the calling code to ensure the builder type matches this parameter. The generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that will only call toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
progressListenerForSortingAlignment
- when creating alignments, sorting and saving may need to report progress before toAlignmentDocument(jebl.util.ProgressListener) is called. This progress listener will continue to have progress reported to it after returning from this method.
maximumNumberOfSequencesThatWillBeAdded
- an upper bound on the number of sequences that will be added to the alignment builder, or -1 if unknown. Progress reporting will be inaccurate if -1 is provided.
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to a temporary file on disk.
-
Builder
public Builder(boolean tryCompressingSequences, SequenceDocument.Alphabet alphabet, boolean allowGaps, jebl.util.ProgressListener progressListenerForSortingAlignment, int maximumNumberOfSequencesThatWillBeAdded, boolean createMultipleThreads) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Create a new builder
- Parameters:
tryCompressingSequences
- true if we should analyze sequence names for compressible patterns and convert to ImmutableSequence where possible. When using compression, for optimal performance, addNameOfSequence(String) should be called on a first pass for all sequences before calling addSequence(SequenceDocument, jebl.util.ProgressListener).
allowGaps
- true if gaps are allowed in the sequences. This should be false when constructing sequence list documents (see toSequenceListDocument) and true when constructing alignments (see toAlignmentDocument). When allowGaps is true, sequences will be sorted according to the number of leading gaps they have.
alphabet
- the alphabet (nucleotide or amino acid) of the sequences to be added. If the calling code chooses to use generics on this builder, it is up to the calling code to ensure the builder type matches this parameter. The generics are useful with toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener), but code that will only call toSequenceListDocument(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) on this builder can safely omit them.
progressListenerForSortingAlignment
- when creating alignments, sorting and saving may need to report progress before toAlignmentDocument(jebl.util.ProgressListener) is called. This progress listener will continue to have progress reported to it after returning from this method.
maximumNumberOfSequencesThatWillBeAdded
- an upper bound on the number of sequences that will be added to the alignment builder, or -1 if unknown. Progress reporting will be inaccurate if -1 is provided.
createMultipleThreads
- true to create multiple threads to improve performance when necessary. Normally this should be true, except where code is creating many SequenceListOnDisk.Builders at once, in which case you might run out of system threads.
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to a temporary file on disk.
- Since:
- API 4.800 (Geneious 8.0.0)
-
-
Method Detail
-
getMinimumSuggestedContigSizeForCreatingContigsOnDisk
public static int getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
Returns a suggested minimum number of sequences that should be in a contig created from a SequenceListOnDisk.Builder. Creating the on-disk version of a contig carries some overhead, so contigs with a small number of sequences and a short reference are better created using the standard DefaultAlignmentDocument constructors. For sequence lists the overhead is small, so creating a SequenceListOnDisk for even a list of 2 sequences is fine.
- Returns:
- a suggested minimum number of sequences that should be in a contig created from a SequenceListOnDisk.Builder.
- See Also:
getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk(), shouldCreateContigOnDisk(int, int)
-
getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk
public static int getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk()
Returns a suggested minimum sequence length for the reference sequence in a contig created from a SequenceListOnDisk.Builder. Creating the on-disk version of a contig carries some overhead, so contigs with a small number of sequences and a short reference are better created using the standard DefaultAlignmentDocument constructors.
- Returns:
- a suggested minimum sequence length for the reference sequence in a contig created from a SequenceListOnDisk.Builder.
- Since:
- API 4.50 (Geneious 5.5.0)
- See Also:
getMinimumSuggestedContigSizeForCreatingContigsOnDisk(), shouldCreateContigOnDisk(int, int)
-
shouldCreateContigOnDisk
public static boolean shouldCreateContigOnDisk(int sequenceCount, int referenceLength)
Returns true if the sequence count and/or reference length of this contig are large enough that creating the contig on disk is recommended.
- Parameters:
sequenceCount
- number of sequences in the contig
referenceLength
- length of the reference sequence, or -1 if there is no reference.
- Returns:
- true if the sequence count and/or reference length of this contig are large enough that creating the contig on disk is recommended.
- Since:
- API 4.50 (Geneious 5.5.0)
- See Also:
getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk(), getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
-
shouldCreateContigOnDisk
public static boolean shouldCreateContigOnDisk(int sequenceCount, int referenceLength, long totalLengthOfMappedReads)
Returns true if the sequence count and/or reference length and/or total length of all the mapped reads in this contig are large enough that creating the contig on disk is recommended.
- Parameters:
sequenceCount
- number of sequences in the contig
referenceLength
- length of the reference sequence, or -1 if there is no reference.
totalLengthOfMappedReads
- total length of all mapped reads, or -1 if unknown
- Returns:
- true if the sequence count and/or reference length and/or total length of all the mapped reads in this contig are large enough that creating the contig on disk is recommended.
- Since:
- API 4.810 (Geneious 8.1.0)
- See Also:
getMinimumSuggestedReferenceLengthForCreatingContigsOnDisk(), getMinimumSuggestedContigSizeForCreatingContigsOnDisk()
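The -1 conventions for referenceLength and totalLengthOfMappedReads can be illustrated with a small caller-side sketch. Everything below is hypothetical scaffolding: OnDiskPolicy stands in for the static shouldCreateContigOnDisk(int, int, long) method so the code compiles without the Geneious API, sequences are modelled as plain strings, and counting only the reads (not the reference) in sequenceCount is an assumption rather than something this page states.

```java
import java.util.List;

/** Sketch: preparing the arguments for shouldCreateContigOnDisk(int, int, long). */
class ContigStorageSketch {

    /** Hypothetical stand-in for the static method, so this compiles without the Geneious API. */
    interface OnDiskPolicy {
        boolean shouldCreateContigOnDisk(int sequenceCount, int referenceLength, long totalLengthOfMappedReads);
    }

    /**
     * Decides whether to build the contig on disk.
     *
     * @param referenceOrNull residues of the reference sequence, or null if the contig has none
     * @param reads residues of the mapped reads
     */
    static boolean useOnDiskBuilder(OnDiskPolicy policy, String referenceOrNull, List<String> reads) {
        // The documentation asks for -1 when there is no reference sequence.
        int referenceLength = referenceOrNull == null ? -1 : referenceOrNull.length();

        // Sum in a long, matching the parameter type, so large contigs cannot overflow an int.
        long totalLengthOfMappedReads = 0;
        for (String read : reads) {
            totalLengthOfMappedReads += read.length();
        }

        // Assumption: sequenceCount counts the reads only.
        return policy.shouldCreateContigOnDisk(reads.size(), referenceLength, totalLengthOfMappedReads);
    }
}
```

If the total read length is not cheap to compute, the documented alternative is to pass -1 for totalLengthOfMappedReads or to use the two-argument overload.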
-
addNameOfSequence
public boolean addNameOfSequence(java.lang.String sequenceName)
Adds the name of a sequence that will later be added using addSequence(SequenceDocument, jebl.util.ProgressListener). Making a first pass over the sequences to provide their names is optional, but doing so will improve performance if tryCompressingSequences==true was passed to the builder constructor. This method should not be called for the name of the reference sequence (if any) that will be passed to addAlignmentReferenceSequence.
- Parameters:
sequenceName
- the name of the sequence. If this is null, all earlier names are discarded and names are not compressed.
- Returns:
- false if a compressible pattern in the sequence names could not be found, in which case further calls to addNameOfSequence(String) will be ignored and the calling code can skip straight to the second pass.
- Throws:
java.lang.IllegalStateException
- if toSequenceList or addSequence has already been called
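The optional two-pass pattern can be sketched as follows. This is a minimal illustration, not the library's own code: NameAwareBuilder and NamedSequence are hypothetical stand-ins (plain strings instead of SequenceDocument, no ProgressListener, no checked exception) so that the example compiles without the Geneious API.

```java
import java.util.List;

/** Sketch of the optional two-pass pattern: names first, then the sequences. */
class TwoPassNamingSketch {

    /** Hypothetical stand-in for the two builder methods used here. */
    interface NameAwareBuilder {
        boolean addNameOfSequence(String sequenceName);
        void addSequence(String sequenceName, String residues);
    }

    /** Hypothetical minimal sequence: a name plus its residues. */
    static final class NamedSequence {
        final String name;
        final String residues;

        NamedSequence(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }
    }

    static void addAll(NameAwareBuilder builder, List<NamedSequence> sequences) {
        // First pass: names only. A false return means no compressible pattern
        // was found and later names would be ignored, so stop early.
        for (NamedSequence sequence : sequences) {
            if (!builder.addNameOfSequence(sequence.name)) {
                break;
            }
        }
        // Second pass: add the sequences themselves.
        for (NamedSequence sequence : sequences) {
            builder.addSequence(sequence.name, sequence.residues);
        }
    }
}
```

With the real builder, the first pass is only worthwhile when tryCompressingSequences was true, and the name of any reference sequence should be left out of it.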
-
setCircularAlignmentLength
public void setCircularAlignmentLength(int circularAlignmentLength)
Sets the alignment being built as circular, as specified by SequenceAlignmentDocument.getCircularLength()
- Parameters:
circularAlignmentLength
- the circular alignment length, or 0 for not circular (see SequenceAlignmentDocument.getCircularLength())
- Throws:
java.lang.IllegalStateException
- if this is called after adding any sequences or if allowGaps==true wasn't passed to the constructor.
- Since:
- API 4.600 (Geneious 6.0.0)
-
addSequenceWithMate
public void addSequenceWithMate(SequenceDocument sequence, SequenceDocument mateSequence, int expectedDistance1, int expectedDistance2, jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Similar to addSequence, but this version adds 2 sequences which are paired.
- Parameters:
sequence
- the sequence to add
mateSequence
- the mate sequence to add
expectedDistance1
- the expected distance from the first sequence to its mate. Must not be 0.
expectedDistance2
- the expected distance from the second sequence to its mate. This must be equal to either expectedDistance1 or -1*expectedDistance1, depending on the relative orientation of the pairs. See PairedReads for expected distance meanings.
progressListener
- for reporting progress if this is a large sequence
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
java.lang.IllegalStateException
- if toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called on this builder or if the sequence is not a NucleotideSequenceDocument or AminoAcidSequenceDocument
java.lang.IllegalArgumentException
- if the sequence contains gaps when not allowed, or if it contains invalid nucleotide or amino acid characters or if invalid expected distances are provided.
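The two constraints on the expected distances can be written out as a caller-side pre-check. This is a sketch under the assumption that those two documented rules are the whole of what "invalid expected distances" means; the class and method names are hypothetical and not part of the API.

```java
/** Sketch of the expected-distance rules documented for addSequenceWithMate. */
class ExpectedDistanceCheck {

    /**
     * Hypothetical pre-check a caller might run before addSequenceWithMate.
     * Throws IllegalArgumentException, the exception type the method documents
     * for invalid expected distances.
     */
    static void requireValidDistances(int expectedDistance1, int expectedDistance2) {
        if (expectedDistance1 == 0) {
            throw new IllegalArgumentException("expectedDistance1 must not be 0");
        }
        // Compare as longs so that negating Integer.MIN_VALUE cannot overflow.
        long first = expectedDistance1;
        long second = expectedDistance2;
        if (second != first && second != -first) {
            throw new IllegalArgumentException(
                    "expectedDistance2 must equal expectedDistance1 or -1*expectedDistance1, got "
                            + expectedDistance1 + " and " + expectedDistance2);
        }
    }
}
```

Which of the two signs applies depends on the relative orientation of the pair, which is described under PairedReads rather than here.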
-
addSequence
public void addSequence(SequenceDocument sequence, jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Adds a new unpaired sequence. If this builder is for a sequence list, then paired sequences can be added either by using addSequenceWithMate(SequenceDocument, SequenceDocument, int, int, jebl.util.ProgressListener) or can be paired later by using DefaultSequenceListDocument.setPairedReadsManager on the result of toSequenceListDocument. If this list is being built for use with an alignment, since sequences will get sorted by leading gaps, paired sequences need to be added using addSequenceWithMate. This method should not be called to add the reference sequence to a contig. Instead, addAlignmentReferenceSequence(SequenceDocument, jebl.util.ProgressListener) should be called prior to adding any sequences.
- Parameters:
sequence
- the sequence to add
progressListener
- for reporting progress if this is a large sequence
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
java.lang.IllegalStateException
- if toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called on this builder or if the sequence is not a NucleotideSequenceDocument or AminoAcidSequenceDocument
java.lang.IllegalArgumentException
- if the sequence contains gaps when not allowed, or if it contains invalid nucleotide or amino acid characters.
-
addAlignmentReferenceSequence
public void addAlignmentReferenceSequence(SequenceDocument referenceSequence, jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Adds a reference sequence when building an alignment. Only a single reference sequence may be added, and it must be added prior to other sequences via addSequence.
- Parameters:
referenceSequence
- the reference sequence
progressListener
- for reporting progress if this is a large sequence
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
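The required call order when building a contig (reference first, then reads, then the finishing call) can be sketched like this. AlignmentBuilder is a hypothetical stand-in that only loosely mirrors the documented methods (strings instead of SequenceDocument, no ProgressListener, no checked exception) so that the example compiles without the Geneious API.

```java
import java.util.List;

/** Sketch of the call order for building a reference alignment. */
class ReferenceFirstSketch {

    /** Hypothetical stand-in for the builder methods involved; D is the resulting document type. */
    interface AlignmentBuilder<D> {
        void addAlignmentReferenceSequence(String referenceSequence);
        void addSequence(String sequence);
        D toAlignmentDocument();
    }

    static <D> D buildContig(AlignmentBuilder<D> builder, String referenceOrNull, List<String> reads) {
        // The single reference sequence, if any, goes in before every other sequence.
        if (referenceOrNull != null) {
            builder.addAlignmentReferenceSequence(referenceOrNull);
        }
        // The reads follow; the reference itself is never passed to addSequence.
        for (String read : reads) {
            builder.addSequence(read);
        }
        return builder.toAlignmentDocument();
    }
}
```

With the real builder, allowGaps must have been true at construction for the finishing call to succeed.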
-
toSequenceListOnDiskOrInMemoryIfNecessary
public java.util.List<T> toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Creates a SequenceListOnDisk from this builder, but if any of the sequences have references to other documents, creates a list in memory instead, since SequenceListOnDisk does not handle references to other documents. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead of this method.
- Parameters:
progressListener
- for reporting progress and cancelling.
- Returns:
- a SequenceListOnDisk from this builder, or an in-memory list if any of the sequences have references to other documents.
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
java.lang.IllegalStateException
- if this method or toSequenceListDocument(jebl.util.ProgressListener) has already been called or if no sequences have been added
- Since:
- API 4.202500 (Geneious 2025.0.0)
-
toSequenceList
public SequenceListOnDisk<T> toSequenceList(jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Creates a SequenceListOnDisk from this builder. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead. Even in situations where you don't want a SequenceListDocument, you should generally use toSequenceListOnDiskOrInMemoryIfNecessary(ProgressListener) instead of this method.
- Parameters:
progressListener
- for reporting progress and cancelling.
- Returns:
- a SequenceListOnDisk from this builder.
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
java.lang.IllegalStateException
- if this method or toSequenceListDocument(jebl.util.ProgressListener) has already been called or if no sequences have been added
-
getSequenceList
@Deprecated(since="2025.0.0") public SequenceListOnDisk<T> getSequenceList()
Deprecated.
Gets the result returned from a previous call to toSequenceList. In most situations you probably want to use toSequenceListDocument or toAlignmentDocument instead. This method need not be used unless you are constructing your own alignment or sequence list implementation rather than using toAlignmentDocument or toSequenceListDocument.
- Returns:
- a SequenceListOnDisk from this builder.
- Throws:
java.lang.IllegalStateException
- if none of toSequenceListDocument(jebl.util.ProgressListener), toSequenceList(jebl.util.ProgressListener) or toAlignmentDocument(jebl.util.ProgressListener) has been called yet
-
toSequenceListDocument
public DefaultSequenceListDocument toSequenceListDocument(jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Creates a new SequenceListDocument from this builder. The result will usually contain nucleotide and/or amino acid sequences which are an instance of a SequenceListOnDisk, but in cases where at least one of the sequences contains a reference to other documents, an in-memory list of sequences will be used instead. This is because SequenceListOnDisk doesn't handle references: when a document is copied to another database, the referenced URNs need updating, which is difficult to do for a SequenceListOnDisk.
- Parameters:
progressListener
- for reporting progress and cancelling.
- Returns:
- a DefaultSequenceListDocument from this builder whose sequences are not loaded into memory, or an in-memory list in cases where at least one of the sequences contains a reference to other documents
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
java.lang.IllegalStateException
- if this method or toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called or if allowGaps is true for this builder.
-
toSequenceListOnDiskDocument
public DefaultSequenceListDocument toSequenceListOnDiskDocument(jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Creates a new SequenceListDocument from this builder. In most cases, it is best to use toSequenceListDocument(ProgressListener) instead.
- Parameters:
progressListener
- for reporting progress and cancelling.
- Returns:
- a DefaultSequenceListDocument from this builder whose sequences are not loaded into memory.
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
java.lang.IllegalStateException
- if this method or toSequenceListOnDiskOrInMemoryIfNecessary(jebl.util.ProgressListener) has already been called or if allowGaps is true for this builder.
- Since:
- API 4.202502 (Geneious 2025.0.2)
-
toAlignmentDocument
public DefaultAlignmentDocument toAlignmentDocument(jebl.util.ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
Creates an Alignment document from this builder. This method may only be called if allowGaps==true was passed to the constructor Builder(boolean, SequenceDocument.Alphabet, boolean). For constructing reference sequence alignments, use addAlignmentReferenceSequence prior to adding sequences using addSequence.
- Parameters:
progressListener
- for reporting progress and cancelling.
- Returns:
- a DefaultAlignmentDocument from this builder whose sequences are not loaded into memory.
- Throws:
com.biomatters.geneious.publicapi.plugin.DocumentOperationException
- if we can't write to disk or if the progressListener indicates we should cancel.
java.lang.IllegalStateException
- if allowGaps is false for this builder
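Because toAlignmentDocument requires allowGaps to be true while toSequenceListDocument requires it to be false, it can help to derive the constructor flag from the kind of document wanted. The sketch below shows one way to keep the two in step; the Target enum and checkFinishingCall are hypothetical helpers, not part of the API.

```java
/** Sketch: deriving the allowGaps constructor argument from the intended result. */
class AllowGapsSketch {

    /** Hypothetical enum naming the two kinds of document the builder can produce. */
    enum Target {
        SEQUENCE_LIST(false), // finished with toSequenceListDocument: gaps not allowed
        ALIGNMENT(true);      // finished with toAlignmentDocument: gaps allowed

        final boolean allowGaps;

        Target(boolean allowGaps) {
            this.allowGaps = allowGaps;
        }
    }

    /**
     * Mirrors the documented IllegalStateException that the finishing methods
     * throw when the builder's allowGaps flag does not suit them.
     */
    static void checkFinishingCall(boolean builderAllowsGaps, Target target) {
        if (builderAllowsGaps != target.allowGaps) {
            throw new IllegalStateException(
                    "allowGaps=" + builderAllowsGaps + " does not match target " + target);
        }
    }
}
```

A caller would pass target.allowGaps to the Builder constructor and later call the finishing method that corresponds to the same target.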
-
getNumberOfSequences
public int getNumberOfSequences()
Returns the number of sequences added so far to this builder.
- Returns:
- the number of sequences added so far to this builder.
-
-