Class EndGapsManager
- java.lang.Object
-
- com.biomatters.geneious.publicapi.implementations.EndGapsManager
-
public class EndGapsManager extends java.lang.Object
Given a set of aligned sequences, provides methods for quickly finding all sequences that intersect a given base number, ignoring sequences for which that position lies in an end gap region. On alignments/contigs with a reference sequence, the reference sequence is returned as an intersecting sequence just like reads. Some DefaultAlignmentDocuments may provide SequenceListOnDisk.AlignmentData via DefaultAlignmentDocument.getAlignmentDataForSequencesNotInMemory(), which in turn can provide a pre-built EndGapsManager via AlignmentData.getEndGapsManager().
- Since:
- API 4.11 (Geneious 5.0)
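The notion of "intersecting a column while ignoring end gaps" can be illustrated with a toy model. The sketch below does not use the Geneious API: plain Strings with '-' as the gap character stand in for aligned sequences, and the class and method names are hypothetical. It assumes that internal gaps (gaps between the first and last residue of a row) still count as covered, while leading and trailing gap runs do not.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: plain Strings with '-' gaps stand in for aligned sequences. */
class EndGapCoverageSketch {
    /** Returns the indices of rows whose non-end-gap region contains the given 0-based column. */
    static List<Integer> sequencesCovering(String[] alignedRows, int column) {
        List<Integer> covering = new ArrayList<>();
        for (int row = 0; row < alignedRows.length; row++) {
            String s = alignedRows[row];
            // Skip the leading end gap.
            int first = 0;
            while (first < s.length() && s.charAt(first) == '-') first++;
            // Skip the trailing end gap.
            int last = s.length() - 1;
            while (last >= 0 && s.charAt(last) == '-') last--;
            // Internal gaps between first and last still count as covered.
            if (first <= column && column <= last) covering.add(row);
        }
        return covering;
    }

    public static void main(String[] args) {
        String[] rows = {
            "ACGTACGT",  // row 0 (for example a reference) spans every column
            "--GT-C--",  // row 1 has end gaps at columns 0-1 and 6-7
            "-----CGT"   // row 2 starts at column 5
        };
        System.out.println(sequencesCovering(rows, 0)); // [0]
        System.out.println(sequencesCovering(rows, 4)); // [0, 1]
        System.out.println(sequencesCovering(rows, 6)); // [0, 2]
    }
}
```

This linear scan over every row is exactly the cost that an index such as EndGapsManager is meant to avoid on large alignments.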
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
EndGapsManager.Builder
Used for building an EndGapsManager which is too large to fit in memory
-
Constructor Summary
Constructors Constructor Description
EndGapsManager(SequenceAlignmentDocument alignment, int minimumHashSize)
Creates an EndGapsManager for the sequences in the given alignment.
EndGapsManager(SequenceCharSequence[] sequences, int minimumHashSize)
EndGapsManager(SequenceCharSequence[] sequences, int minimumHashSize, int numberOfSequences)
EndGapsManager(EndGapsManager endGapsManager, SequenceCharSequence referenceSequence)
Creates a new EndGapsManager identical to the provided one except for the addition of a reference sequence at index 0.
EndGapsManager(java.util.List<SequenceCharSequence> sequences, int minimumHashSize, int numberOfSequences)
EndGapsManager(org.jdom.Element element, SequenceListOnDisk<SequenceDocument> associatedSequences)
Recreates an EndGapsManager from XML.
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type Method Description
int
getHashSize()
Gets the hash size used for this manager.
int
getNumberOfColumns()
int[]
getPotentialSequencesCoveringArray(int residueIndex)
Returns all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap; however, the returned list may also include sequences that do not cover the given residueIndex.
int[]
getPotentialSequencesCoveringArray(int residueIndex, int circularAlignmentLength)
Returns all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap; however, the returned list may also include sequences that do not cover the given residueIndex.
int
getSamePotentialSequencesLowerBound(int residueIndex)
Gets a lower bound such that all calls to getPotentialSequencesCoveringArray(int) with an index between this lower bound and residueIndex inclusive would return the same array.
int
getSamePotentialSequencesUpperBoundExclusive(int residueIndex)
Gets an upper bound such that all calls to getPotentialSequencesCoveringArray(int) with an index between residueIndex and this upper bound exclusive would return the same array.
SequenceCharSequence
getSequence(int index)
int
getSequenceCount()
java.util.List<SequenceCharSequence>
getSequences()
int[]
getSequencesCoveringArray(int residueIndex)
java.lang.Iterable<java.lang.Integer>
getSequencesCoveringIterable(int residueIndex)
java.util.Iterator<java.lang.Integer>
getSequencesCoveringIterator(int residueIndex)
Geneious.MajorVersion
getVersionSupport(XMLSerializable.VersionSupportType versionType)
Like XMLSerializable.OldVersionCompatible.getVersionSupport(com.biomatters.geneious.publicapi.documents.XMLSerializable.VersionSupportType) but this class isn't really XML serializable.
org.jdom.Element
toXmlExcludingSequences(Geneious.MajorVersion version, jebl.util.ProgressListener progressListener)
Converts this EndGapsManager to XML, excluding the associated sequences, which must be stored independently and provided again during deserialization using EndGapsManager(org.jdom.Element, com.biomatters.geneious.publicapi.documents.sequence.SequenceListOnDisk)
org.jdom.Element
toXmlExcludingSequences(java.lang.String elementName)
Converts this EndGapsManager to XML, excluding the associated sequences, which must be stored independently and provided again during deserialization using EndGapsManager(org.jdom.Element, com.biomatters.geneious.publicapi.documents.sequence.SequenceListOnDisk)
-
-
-
Constructor Detail
-
EndGapsManager
public EndGapsManager(EndGapsManager endGapsManager, SequenceCharSequence referenceSequence)
Creates a new EndGapsManager identical to the provided one except for the addition of a reference sequence at index 0. If the provided EndGapsManager already has a reference sequence then the previous reference sequence is excluded from this new end gaps manager. An EndGapsManager created using this constructor is not serializable using toXmlExcludingSequences(String)
- Parameters:
endGapsManager
- an end gaps manager
referenceSequence
- the new reference sequence or null for no reference sequence
- Since:
- API 4.40 (Geneious 5.4.0)
-
EndGapsManager
public EndGapsManager(org.jdom.Element element, SequenceListOnDisk<SequenceDocument> associatedSequences)
Recreates an EndGapsManager from XML.
- Parameters:
element
- an element previously returned from toXmlExcludingSequences(String)
associatedSequences
- the sequences associated with the EndGapsManager
- Since:
- API 4.40 (Geneious 5.4.0)
-
EndGapsManager
public EndGapsManager(SequenceCharSequence[] sequences, int minimumHashSize)
- Parameters:
sequences
- the sequences to consider end gaps on. This array is not copied and this class relies on the caller not changing the contents of this array after constructing this class.
minimumHashSize
- Suggested value of 256. This means that for every 256 residues we maintain an index of all possible sequences that could cover that range. Lower values make lookups faster, but take longer to construct and use more memory. The implementation of this class may use a higher hash size than the provided one; the size it selects is returned from getHashSize()
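The trade-off described for minimumHashSize can be sketched with a small bucket index. This is a hypothetical illustration, not the Geneious implementation: it assumes each sequence is reduced to an inclusive covered range (first and last non-end-gap column) and that one candidate list is kept per block of hashSize columns. All class, field, and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical bucket index; illustrates the hash-size trade-off only. */
class BucketIndexSketch {
    final int hashSize;
    final int[] starts;    // first covered column of each sequence (inclusive)
    final int[] ends;      // last covered column of each sequence (inclusive)
    final int[][] buckets; // buckets[b] = sequences overlapping columns [b*hashSize, (b+1)*hashSize)

    BucketIndexSketch(int[] starts, int[] ends, int numberOfColumns, int hashSize) {
        this.hashSize = hashSize;
        this.starts = starts;
        this.ends = ends;
        int bucketCount = (numberOfColumns + hashSize - 1) / hashSize;
        List<List<Integer>> lists = new ArrayList<>();
        for (int b = 0; b < bucketCount; b++) lists.add(new ArrayList<>());
        // A smaller hashSize means more buckets to fill: slower construction, more memory.
        for (int seq = 0; seq < starts.length; seq++) {
            for (int b = starts[seq] / hashSize; b <= ends[seq] / hashSize; b++) {
                lists.get(b).add(seq);
            }
        }
        buckets = new int[bucketCount][];
        for (int b = 0; b < bucketCount; b++) {
            buckets[b] = lists.get(b).stream().mapToInt(Integer::intValue).toArray();
        }
    }

    /** Cheap lookup: may include sequences that do not cover the column. */
    int[] potentialCovering(int column) {
        return buckets[column / hashSize];
    }

    /** Exact lookup: filters the candidates, allocating a new array. */
    int[] covering(int column) {
        return Arrays.stream(potentialCovering(column))
                .filter(seq -> starts[seq] <= column && column <= ends[seq])
                .toArray();
    }

    public static void main(String[] args) {
        // Three sequences covering columns 0-7, 2-5 and 5-7 of an 8-column alignment.
        BucketIndexSketch index = new BucketIndexSketch(
                new int[]{0, 2, 5}, new int[]{7, 5, 7}, 8, 4);
        System.out.println(Arrays.toString(index.potentialCovering(0))); // [0, 1]
        System.out.println(Arrays.toString(index.covering(0)));          // [0]
        System.out.println(Arrays.toString(index.covering(6)));          // [0, 2]
    }
}
```

In this model a smaller hash size gives shorter candidate lists per lookup, at the cost of more buckets to build and store, which matches the behaviour the parameter description states.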
-
EndGapsManager
public EndGapsManager(SequenceCharSequence[] sequences, int minimumHashSize, int numberOfSequences)
- Parameters:
sequences
- the sequences to consider end gaps on. This array is not copied and this class relies on the caller not changing the contents of this array after constructing this class.
minimumHashSize
- Suggested value of 256. This means that for every 256 residues we maintain an index of all possible sequences that could cover that range. Lower values make lookups faster, but take longer to construct and use more memory. The implementation of this class may use a higher hash size than the provided one; the size it selects is returned from getHashSize()
numberOfSequences
- usually this is sequences.length, but sometimes the caller may want to use a subset of the sequences.
-
EndGapsManager
public EndGapsManager(java.util.List<SequenceCharSequence> sequences, int minimumHashSize, int numberOfSequences)
- Parameters:
sequences
- the sequences to consider end gaps on. This list is not copied and this class relies on the caller not changing the contents of this list after constructing this class.
minimumHashSize
- Suggested value of 256. This means that for every 256 residues we maintain an index of all possible sequences that could cover that range. Lower values make lookups faster, but take longer to construct and use more memory. The implementation of this class may use a higher hash size than the provided one; the size it selects is returned from getHashSize()
numberOfSequences
- usually this is sequences.size(), but sometimes the caller may want to use a subset of the sequences.
-
EndGapsManager
public EndGapsManager(SequenceAlignmentDocument alignment, int minimumHashSize)
Creates an EndGapsManager for the sequences in the given alignment.
- Parameters:
alignment
- the alignment containing the sequences to create the EndGapsManager on.
minimumHashSize
- Suggested value of 256. This means that for every 256 residues we maintain an index of all possible sequences that could cover that range. Lower values make lookups faster, but take longer to construct and use more memory. The implementation of this class may use a higher hash size than the provided one; the size it selects is returned from getHashSize()
- Since:
- API 4.11 (Geneious 5.0)
-
-
Method Detail
-
getVersionSupport
public Geneious.MajorVersion getVersionSupport(XMLSerializable.VersionSupportType versionType)
Like XMLSerializable.OldVersionCompatible.getVersionSupport(com.biomatters.geneious.publicapi.documents.XMLSerializable.VersionSupportType) but this class isn't really XML serializable. Instead it provides a partial XML serialization method.
- Parameters:
versionType
- as defined by XMLSerializable.OldVersionCompatible.getVersionSupport(com.biomatters.geneious.publicapi.documents.XMLSerializable.VersionSupportType)
- Returns:
- as defined by XMLSerializable.OldVersionCompatible.getVersionSupport(com.biomatters.geneious.publicapi.documents.XMLSerializable.VersionSupportType)
- Since:
- API 4.600 (Geneious 6.0.0)
-
toXmlExcludingSequences
public org.jdom.Element toXmlExcludingSequences(java.lang.String elementName) throws XMLSerializationException
Converts this EndGapsManager to XML, excluding the associated sequences, which must be stored independently and provided again during deserialization using EndGapsManager(org.jdom.Element, com.biomatters.geneious.publicapi.documents.sequence.SequenceListOnDisk)
- Parameters:
elementName
- the name of the element to return
- Returns:
- XML for this EndGapsManager. Note that this uses PluginDocument.FILE_DATA_ATTRIBUTE_NAME
- Throws:
XMLSerializationException
- if it can't be serialized (e.g. can't write to disk)
java.lang.UnsupportedOperationException
- if this EndGapsManager was created using EndGapsManager(EndGapsManager, com.biomatters.geneious.publicapi.documents.sequence.SequenceCharSequence)
- Since:
- API 4.40 (Geneious 5.4.0)
-
toXmlExcludingSequences
public org.jdom.Element toXmlExcludingSequences(Geneious.MajorVersion version, jebl.util.ProgressListener progressListener) throws XMLSerializationException
Converts this EndGapsManager to XML, excluding the associated sequences, which must be stored independently and provided again during deserialization using EndGapsManager(org.jdom.Element, com.biomatters.geneious.publicapi.documents.sequence.SequenceListOnDisk)
- Parameters:
version
-
progressListener
-
- Returns:
- XML for this EndGapsManager. Note that this uses PluginDocument.FILE_DATA_ATTRIBUTE_NAME
- Throws:
XMLSerializationException
- if it can't be serialized (e.g. can't write to disk)
- Since:
- API 4.600 (Geneious 6.0.0)
-
getSequencesCoveringArray
public int[] getSequencesCoveringArray(int residueIndex)
- Parameters:
residueIndex
- the 0-based index to get sequence indices for
- Returns:
- all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap.
-
getPotentialSequencesCoveringArray
public int[] getPotentialSequencesCoveringArray(int residueIndex)
Returns all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap; however, the returned list may also include sequences that do not cover the given residueIndex. Most usages should use getSequencesCoveringArray(int) instead unless they are very concerned about performance and don't want to allocate the extra array required to return the results.
- Parameters:
residueIndex
- the 0-based index to get sequence indices for
- Returns:
- all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap. However, the returned list may also include sequences that do not cover the given residueIndex.
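Because the returned candidates may include sequences that do not cover the index, a caller that wants to avoid the extra allocation has to check each candidate itself. The sketch below shows one way that could look. It is a toy model with hypothetical names, assuming the caller already knows each sequence's inclusive covered range; it does not call the Geneious API.

```java
/** Toy model: filtering a candidate array in place, without allocating a result array. */
class PotentialFilterSketch {
    /**
     * Counts how many candidate sequences really cover the column.
     * starts[seq] and ends[seq] are the inclusive covered range of sequence seq (an assumption of this sketch).
     */
    static int countCovering(int[] candidates, int[] starts, int[] ends, int column) {
        int count = 0;
        for (int seq : candidates) {
            // Candidates outside their covered range are false positives and are skipped.
            if (starts[seq] <= column && column <= ends[seq]) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] candidates = {0, 1, 2};  // stands in for a "potential" result
        int[] starts = {0, 2, 5};
        int[] ends = {7, 5, 7};
        System.out.println(countCovering(candidates, starts, ends, 6)); // 2
    }
}
```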
-
getPotentialSequencesCoveringArray
public int[] getPotentialSequencesCoveringArray(int residueIndex, int circularAlignmentLength)
Returns all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap; however, the returned list may also include sequences that do not cover the given residueIndex. Most usages should use getSequencesCoveringArray(int) instead unless they are very concerned about performance and don't want to allocate the extra array required to return the results.
- Parameters:
residueIndex
- the 0-based index to get sequence indices for
circularAlignmentLength
- the circular length of the alignment if it is circular, or 0 if it is not circular (as defined by SequenceAlignmentDocument.getCircularLength())
- Returns:
- all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap. However, the returned list may also include sequences that do not cover the given residueIndex.
- Since:
- API 4.600 (Geneious 6.0.0)
-
getSamePotentialSequencesLowerBound
public int getSamePotentialSequencesLowerBound(int residueIndex)
Gets a lower bound such that all calls to getPotentialSequencesCoveringArray(int) with an index between this lower bound and residueIndex inclusive would return the same array.
- Parameters:
residueIndex
- the residue index to get a lower bound for.
- Returns:
- an index <= residueIndex which is a lower bound such that all calls to getPotentialSequencesCoveringArray(int) with an index between this lower bound and residueIndex inclusive would return the same array.
- See Also:
getSamePotentialSequencesUpperBoundExclusive(int)
,getPotentialSequencesCoveringArray(int)
-
getSamePotentialSequencesUpperBoundExclusive
public int getSamePotentialSequencesUpperBoundExclusive(int residueIndex)
Gets an upper bound such that all calls to getPotentialSequencesCoveringArray(int) with an index between residueIndex and this upper bound exclusive would return the same array.
- Parameters:
residueIndex
- the residue index to get an upper bound for.
- Returns:
- an index > residueIndex which is an upper bound such that all calls to getPotentialSequencesCoveringArray(int) with an index between residueIndex and this upper bound exclusive would return the same array.
- See Also:
getSamePotentialSequencesLowerBound(int)
,getPotentialSequencesCoveringArray(int)
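One plausible reading of the lower and upper bound methods is that they report the edges of the block of columns sharing a single candidate array. The arithmetic below sketches that reading under a stated assumption: one candidate array per block of hashSize columns. The real class may compute its bounds differently, and the names here are hypothetical.

```java
/** Hypothetical arithmetic, assuming one candidate array per block of hashSize columns. */
class SameBucketBoundsSketch {
    /** Start of the block containing residueIndex (inclusive). */
    static int lowerBound(int residueIndex, int hashSize) {
        return (residueIndex / hashSize) * hashSize;
    }

    /** End of the block containing residueIndex (exclusive), clamped to the alignment length. */
    static int upperBoundExclusive(int residueIndex, int hashSize, int numberOfColumns) {
        return Math.min(lowerBound(residueIndex, hashSize) + hashSize, numberOfColumns);
    }

    public static void main(String[] args) {
        int hashSize = 256;
        int numberOfColumns = 1000;
        int lower = lowerBound(300, hashSize);                           // 256
        int upper = upperBoundExclusive(300, hashSize, numberOfColumns); // 512
        // A caller scanning columns could fetch the candidate array once
        // and reuse it for every column in [lower, upper).
        System.out.println(lower + " " + upper);
    }
}
```

Under this assumption, a scan across the alignment only needs a new candidate lookup each time it crosses an upper bound.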
-
getHashSize
public int getHashSize()
Gets the hash size used for this manager.
- Returns:
- the hash size used for this end gaps manager. This will be greater than or equal to the minimumHashSize passed to the constructor.
-
getSequencesCoveringIterable
public java.lang.Iterable<java.lang.Integer> getSequencesCoveringIterable(int residueIndex)
- Parameters:
residueIndex
- the 0-based index to get sequence indices for
- Returns:
- all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap.
-
getSequencesCoveringIterator
public java.util.Iterator<java.lang.Integer> getSequencesCoveringIterator(int residueIndex)
- Parameters:
residueIndex
- the 0-based index to get sequence indices for
- Returns:
- all sequences which contain the given index, excluding those sequences for which residueIndex is an end gap.
-
getSequenceCount
public int getSequenceCount()
- Returns:
- the number of sequences for which end gaps are being managed.
-
getNumberOfColumns
public int getNumberOfColumns()
- Returns:
- the number of columns in the alignment (i.e. the CharSequence.length() of all sequences)
- Since:
- API 4.610 (Geneious 6.1.0)
-
getSequence
public SequenceCharSequence getSequence(int index)
- Parameters:
index
- the index of the sequence to get in the range [0, getSequenceCount())
- Returns:
- the sequence at this index.
-
getSequences
public java.util.List<SequenceCharSequence> getSequences()
- Returns:
- all sequences managed by this end gaps manager (plus potentially some more if getSequenceCount() is less than the length of the returned list). The returned value is the stored internal list, so callers must not modify the contents of this list.
- Since:
- API 4.11 (Geneious 5.0)
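Since the returned list is the internal one and may be longer than getSequenceCount(), a caller may prefer to work with a read-only view of just the managed prefix. The sketch below shows that pattern using only standard java.util collections. Strings stand in for SequenceCharSequence, and the helper name is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model: a read-only view of the first count entries of a shared list. */
class ManagedPrefixSketch {
    /** Wraps the first count entries without copying; attempts to modify the view throw. */
    static <T> List<T> managedView(List<T> backing, int count) {
        return Collections.unmodifiableList(backing.subList(0, count));
    }

    public static void main(String[] args) {
        // Four stored entries, of which only the first three are managed.
        List<String> backing = Arrays.asList("ref", "read1", "read2", "unused");
        List<String> view = managedView(backing, 3);
        System.out.println(view); // [ref, read1, read2]
    }
}
```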
-
-