Class SequenceGapInformation


  • public final class SequenceGapInformation
    extends java.lang.Object
    Precalculates information about the location of gaps ('-') in a CharSequence and can efficiently translate between indices in the gapped and ungapped sequence. This translation also works for indices beyond the ends of the sequence, which is necessary for translating SequenceAnnotations, since these can extend beyond the sequence ends.

    Like all other sequence indices except for the ones in SequenceAnnotations, indices used in this class are 0-based.

    When the sequence contains internal gaps, this class uses about 0.5 bytes of memory per base in the gapped internal sequence for large sequences (over 10 million base pairs).

    Some DefaultSequenceDocument instances (usually only instances of reference sequences in a big contig) store a pre-built SequenceGapInformation, which is available via DefaultSequenceDocument.getSequenceGapInformation().

    • Constructor Detail

      • SequenceGapInformation

        public SequenceGapInformation​(org.jdom.Element element,
                                      SequenceCharSequence gappedCharSequence)
        Deserializes a SequenceGapInformation from XML previously returned from toXML(String).
        Parameters:
        element - an element previously returned from toXML(String).
        gappedCharSequence - the gapped char sequence previously associated with the previously serialized SequenceGapInformation.
        Since:
        API 4.50 (Geneious 5.5.0)
      • SequenceGapInformation

        public SequenceGapInformation​(java.lang.CharSequence gappedSequence)
        Constructs SequenceGapInformation for the specified gapped sequence.
        Parameters:
        gappedSequence - A CharSequence that contains gaps and for which we want to be able to translate between gapped and ungapped indices.
      • SequenceGapInformation

        public SequenceGapInformation​(java.lang.CharSequence gappedSequence,
                                      jebl.util.ProgressListener progressListener)
                               throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException.Canceled
        Constructs SequenceGapInformation for the specified gapped sequence.
        Parameters:
        gappedSequence - A CharSequence that contains gaps and for which we want to be able to translate between gapped and ungapped indices.
        progressListener - for reporting progress and cancelling
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException.Canceled - if the progress listener requests we cancel.
        Since:
        API 4.50 (Geneious 5.5.0)
    • Method Detail

      • toXML

        public org.jdom.Element toXML​(java.lang.String name)
                               throws java.io.IOException
        Serializes this gap information (excluding the char sequence) to XML, which may use PluginDocument.FILE_DATA_ATTRIBUTE_NAME.
        Parameters:
        name - the name of the element to return
        Returns:
        an XML element with the given name that contains the serialized gap information
        Throws:
        java.io.IOException - if it can't be serialized because we can't write to a local temporary file
        Since:
        API 4.50 (Geneious 5.5.0)
      • toXML

        public org.jdom.Element toXML​(Geneious.MajorVersion version,
                                      java.lang.String name)
                               throws java.io.IOException
        Serializes this gap information (excluding the char sequence) to XML, which may use PluginDocument.FILE_DATA_ATTRIBUTE_NAME.
        Parameters:
        version - the version of Geneious to serialize for
        name - the name of the element to return
        Returns:
        an XML element with the given name that contains the serialized gap information
        Throws:
        java.io.IOException - if it can't be serialized because we can't write to a local temporary file
        Since:
        API 4.600 (Geneious 6.0.0)
      • forSequenceDocument

        public static SequenceGapInformation forSequenceDocument​(SequenceDocument sequence)
        Get SequenceGapInformation for the given sequence. This is the preferred method of getting a SequenceGapInformation because it can return the cached copy for a DefaultSequenceDocument.
        Parameters:
        sequence - a SequenceDocument that contains gaps and for which we want to be able to translate between gapped and ungapped indices.
        Returns:
        gap information for the sequence
        Since:
        API 4.600 (Geneious 6.0.0)
      • getLeadingGapsLength

        public int getLeadingGapsLength()
        Returns:
        the number of leading gaps in the sequence passed to the constructor. Equivalent to SequenceCharSequence.getLeadingGapsLength()
        Since:
        API 4.60 (Geneious 5.6.0)
      • getTrailingGapsLength

        public int getTrailingGapsLength()
        Returns:
        the number of trailing gaps in the sequence passed to the constructor. Equivalent to SequenceCharSequence.getTrailingGapsLength()
        Since:
        API 4.60 (Geneious 5.6.0)
      • getTrailingGapsStartIndex

        public int getTrailingGapsStartIndex()
        Returns:
        the start index of the trailing gaps in the sequence passed to the constructor. Equivalent to SequenceCharSequence.getTrailingGapsStartIndex()
        Since:
        API 4.60 (Geneious 5.6.0)
      • getUngappedIndexOfThisOrPreviousResidue

        public int getUngappedIndexOfThisOrPreviousResidue​(int indexInGappedSequence)
        Calculates the index where the character gappedSequence.charAt(indexInGappedSequence) would move if all gaps were stripped from gappedSequence. If the specified index is on a gap, the adjusted index of its nearest nongap neighbour on the left (or -1 if there is none) is returned. This corresponds to stripping all gap characters out of the sequence and the intervals covering those residues, with the characters moving to the left to fill the gaps, and implicitly treating characters beyond the sequence length and in end-gap regions as nongaps.
         Example:
            012345678901   old indices
            -ABC--DE--f-
           -101222344456   new indices
          
        Parameters:
        indexInGappedSequence - The 0-based index in the gapped sequence to convert to an index in the sequence without gaps
        Returns:
        the resulting index in the gapless CharSequence.
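        The mapping in the example above can be reproduced with a brute-force scan. The following is only an illustrative sketch in plain Java, not the Geneious implementation: the class name UngappedPreviousSketch and the method thisOrPrevious are hypothetical, and the sketch assumes the sequence contains at least one residue. Positions in the end-gap regions and beyond the sequence ends are numbered as if they were residues of a larger surrounding sequence.

```java
// Hypothetical stand-in for illustration; it does not use or mirror the Geneious classes.
class UngappedPreviousSketch {

    /** Ungapped index of the residue at gappedIndex, or of the nearest residue to its left. */
    static int thisOrPrevious(CharSequence gapped, int gappedIndex) {
        // Locate the first and last residues so that end gaps can be told apart from internal gaps.
        int first = 0;
        while (first < gapped.length() && gapped.charAt(first) == '-') first++;
        if (first == gapped.length()) {
            throw new IllegalArgumentException("sketch assumes at least one residue");
        }
        int last = gapped.length() - 1;
        while (gapped.charAt(last) == '-') last--;

        // Left of the first residue: count backwards from ungapped index 0.
        if (gappedIndex < first) return gappedIndex - first;

        // Count residues from the first residue up to the queried position (or the last residue).
        int residues = 0;
        int stop = Math.min(gappedIndex, last);
        for (int i = first; i <= stop; i++) {
            if (gapped.charAt(i) != '-') residues++;
        }
        // Right of the last residue: keep counting as if every further position were a residue.
        int overshoot = Math.max(0, gappedIndex - last);
        return residues - 1 + overshoot;
    }

    public static void main(String[] args) {
        String gapped = "-ABC--DE--f-";
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < gapped.length(); i++) {
            if (i > 0) row.append(' ');
            row.append(thisOrPrevious(gapped, i));
        }
        System.out.println(row); // -1 0 1 2 2 2 3 4 4 4 5 6
    }
}
```

        The scan is linear per query, which is exactly the cost that a precalculated SequenceGapInformation is meant to avoid for repeated lookups.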
      • getUngappedIndexOfThisOrPreviousResidueTreatingEndGapsLikeInternalGaps

        public int getUngappedIndexOfThisOrPreviousResidueTreatingEndGapsLikeInternalGaps​(int indexInGappedSequence)
        Calculates the index where the character gappedSequence.charAt(indexInGappedSequence) would move if all gaps were stripped from gappedSequence. If the specified index is on a gap, the adjusted index of its nearest nongap neighbour on the left (or -1 if there is none) is returned. This corresponds to stripping all gap characters out of the sequence and the intervals covering those residues, with the characters moving to the left to fill the gaps, and implicitly treating characters beyond the sequence length as nongaps. Characters in end gap regions will be treated the same as internal gaps.
         Example:
            012345678901   old indices
            -ABC--DE--f-
           -101222344455   new indices
          
        Parameters:
        indexInGappedSequence - The 0-based index in the gapped sequence to convert to an index in the sequence without gaps
        Returns:
        the resulting index in the gapless CharSequence.
        Since:
        API 4.700 (Geneious 7.0.0)
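        To illustrate why precalculation makes repeated lookups cheap, the sketch below builds a prefix count of residues once and then answers each query in constant time. It is a simplified assumption-laden stand-in: the class PrefixCountSketch and its method name are hypothetical, and a plain int array is used for clarity even though it costs 4 bytes per position, far more than the roughly 0.5 bytes per base this class reports. End gaps are treated like internal gaps, so only positions outside the gapped sequence are numbered as extra residues.

```java
// Hypothetical stand-in for illustration; the real class uses a much more compact structure.
class PrefixCountSketch {
    // residuesUpTo[i] = number of non-gap characters in gapped[0..i], inclusive.
    private final int[] residuesUpTo;
    private final int gappedLength;

    PrefixCountSketch(CharSequence gapped) {
        gappedLength = gapped.length();
        residuesUpTo = new int[gappedLength];
        int count = 0;
        for (int i = 0; i < gappedLength; i++) {
            if (gapped.charAt(i) != '-') count++;
            residuesUpTo[i] = count;
        }
    }

    /** Ungapped index of the residue at gappedIndex or the nearest one to its left, or -1 if none. */
    int thisOrPreviousTreatingEndGapsLikeInternalGaps(int gappedIndex) {
        // Positions before the sequence are treated as residues of a larger sequence.
        if (gappedIndex < 0) return gappedIndex;
        // Positions after the sequence continue the count past the last residue.
        if (gappedIndex >= gappedLength) {
            int total = gappedLength == 0 ? 0 : residuesUpTo[gappedLength - 1];
            return total + (gappedIndex - gappedLength);
        }
        return residuesUpTo[gappedIndex] - 1;
    }

    public static void main(String[] args) {
        PrefixCountSketch info = new PrefixCountSketch("-ABC--DE--f-");
        // The trailing end gap at index 11 maps to the last residue, as in the example row.
        System.out.println(info.thisOrPreviousTreatingEndGapsLikeInternalGaps(11)); // 5
    }
}
```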
      • isGap

        public boolean isGap​(int indexInGappedSequence)
        Returns true if the character at the specified gapped sequence index is an internal gap or an end gap. Characters beyond the ends of the gapped sequence are assumed to be non-gaps (for consistency with getUngappedIndexOfThisOrPreviousResidue(int)), so isGap(x) returns false when x<0 or x>=gappedSequenceLength.
        Parameters:
        indexInGappedSequence - an index of a character in the gapped sequence.
        Returns:
        true if the character at the specified gapped sequence index is a gap.
      • isInternalGap

        public boolean isInternalGap​(int indexInGappedSequence)
        Returns true if the character at the specified gapped sequence index is an internal (non-end) gap. Characters beyond the ends of the gapped sequence are assumed to be non-gaps (for consistency with getUngappedIndexOfThisOrPreviousResidue(int)), so isInternalGap(x) returns false when x<0 or x>=gappedSequenceLength.
        Parameters:
        indexInGappedSequence - an index of a character in the gapped sequence.
        Returns:
        true if the character at the specified gapped sequence index is an internal gap.
        Since:
        API 4.60 (Geneious 5.6.0)
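        The difference between isGap(int) and isInternalGap(int) can be sketched in plain Java as follows. This is an illustration under assumptions, not the library code: the class GapKindSketch is hypothetical, and an internal gap is taken to mean a gap with at least one residue somewhere on each side.

```java
// Hypothetical stand-in for illustration; it does not use or mirror the Geneious classes.
class GapKindSketch {

    /** True for any gap character inside the sequence; out-of-range positions count as non-gaps. */
    static boolean isGap(CharSequence gapped, int index) {
        return index >= 0 && index < gapped.length() && gapped.charAt(index) == '-';
    }

    /** True only for a gap that has a residue somewhere before it and somewhere after it. */
    static boolean isInternalGap(CharSequence gapped, int index) {
        if (!isGap(gapped, index)) return false;
        boolean residueBefore = false;
        for (int i = 0; i < index && !residueBefore; i++) {
            residueBefore = gapped.charAt(i) != '-';
        }
        boolean residueAfter = false;
        for (int i = index + 1; i < gapped.length() && !residueAfter; i++) {
            residueAfter = gapped.charAt(i) != '-';
        }
        return residueBefore && residueAfter;
    }

    public static void main(String[] args) {
        String gapped = "-ABC--DE--f-";
        System.out.println(isGap(gapped, 0));         // true: leading end gap
        System.out.println(isInternalGap(gapped, 0)); // false: no residue to its left
        System.out.println(isInternalGap(gapped, 4)); // true: between C and D
        System.out.println(isGap(gapped, 12));        // false: beyond the end
    }
}
```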
      • getGappedCharAt

        public char getGappedCharAt​(int indexInGappedSequence)
        Returns the character at the given index in the gapped sequence.
        Parameters:
        indexInGappedSequence - 0-based index of the character in the gapped sequence
        Returns:
        the character at the given index in the gapped sequence
        Throws:
        java.lang.IndexOutOfBoundsException - if indexInGappedSequence is less than 0 or greater than or equal to the gapped sequence length
        Since:
        API 4.202000 (Geneious 2020.0.0)
      • getGappedCharSequence

        public SequenceCharSequence getGappedCharSequence()
        Returns:
        the gapped sequence passed to the constructor.
        Since:
        API 4.202010 (Geneious 2020.1.0)
      • getUngappedIndexOfThisOrNextResidueTreatingEndGapsLikeInternalGaps

        public int getUngappedIndexOfThisOrNextResidueTreatingEndGapsLikeInternalGaps​(int indexInGappedSequence)
        Same as getUngappedIndexOfThisOrPreviousResidueTreatingEndGapsLikeInternalGaps(int), but if the specified index is on a gap in the gapped sequence, then the ungapped index of the next rather than the previous nongap residue is returned.
        Parameters:
        indexInGappedSequence - a 0-based index in the gapped sequence
        Returns:
        the 0-based index of the first residue at or after the specified position in the ungapped sequence.
        Since:
        API 4.700 (Geneious 7.0.0)
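        For a position inside the gapped sequence, the "this or next" result equals the number of residues that lie strictly before that position. The sketch below shows this in plain Java; the class UngappedNextSketch and its method name are hypothetical. The value returned for a trailing gap with no residue after it (the ungapped length) is an assumption chosen by symmetry with the -1 returned by the "previous" variant, and is not confirmed by this documentation.

```java
// Hypothetical stand-in for illustration; it does not use or mirror the Geneious classes.
class UngappedNextSketch {

    /** Ungapped index of the residue at gappedIndex, or of the nearest residue to its right. */
    static int thisOrNextTreatingEndGapsLikeInternalGaps(CharSequence gapped, int gappedIndex) {
        // Positions before the sequence are treated as residues of a larger sequence.
        if (gappedIndex < 0) return gappedIndex;
        // Count the residues strictly before the queried position.
        int limit = Math.min(gappedIndex, gapped.length());
        int residuesBefore = 0;
        for (int i = 0; i < limit; i++) {
            if (gapped.charAt(i) != '-') residuesBefore++;
        }
        // Positions past the end each count as one further residue.
        return residuesBefore + (gappedIndex - limit);
    }

    public static void main(String[] args) {
        String gapped = "-ABC--DE--f-";
        // The internal gaps at indices 4 and 5 both map to D, whose ungapped index is 3.
        System.out.println(thisOrNextTreatingEndGapsLikeInternalGaps(gapped, 4)); // 3
        System.out.println(thisOrNextTreatingEndGapsLikeInternalGaps(gapped, 5)); // 3
    }
}
```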
      • getUngappedIndexOfThisOrNextResidue

        public int getUngappedIndexOfThisOrNextResidue​(int indexInGappedSequence)
        Same as getUngappedIndexOfThisOrPreviousResidue(int), but if the specified index is on a gap in the gapped sequence, then the ungapped index of the next rather than the previous nongap residue is returned.
        Parameters:
        indexInGappedSequence - a 0-based index in the gapped sequence
        Returns:
        the 0-based index of the first residue at or after the specified position in the ungapped sequence.
      • getUngappedIndexOfThisOrNextResidue

        public static int getUngappedIndexOfThisOrNextResidue​(SequenceCharSequence sequence,
                                                              int indexInGappedSequence)
        Gets the ungapped index corresponding to a gapped index. If the gapped index is a gap then the ungapped index of the next non-gap is returned. This is a static version of getUngappedIndexOfThisOrNextResidue(int) that avoids the time and memory cost of constructing a reusable SequenceGapInformation.
        Parameters:
        sequence - the sequence
        indexInGappedSequence - a 0-based index in the gapped sequence
        Returns:
        the 0-based index of the first residue at or after the specified position in the ungapped sequence.
        Since:
        API 4.31 (Geneious 5.3.1)
      • getUngappedIndexOfThisOrPreviousResidue

        public static int getUngappedIndexOfThisOrPreviousResidue​(SequenceCharSequence sequence,
                                                                  int indexInGappedSequence)
        Gets the ungapped index corresponding to a gapped index. If the gapped index is a gap then the ungapped index of the previous non-gap is returned. This is a static version of getUngappedIndexOfThisOrPreviousResidue(int) that avoids the time and memory cost of constructing a reusable SequenceGapInformation.
        Parameters:
        sequence - the sequence
        indexInGappedSequence - a 0-based index in the gapped sequence
        Returns:
        the 0-based index of the nearest residue at or before the specified position in the ungapped sequence.
        Since:
        API 4.31 (Geneious 5.3.1)
      • getGappedIndex

        public int getGappedIndex​(int indexInUngappedSequence)

        Translates an index in the ungapped sequence to the index of the corresponding nongap character in the gappedSequence passed to the constructor.

        It is permissible for indexInUngappedSequence to be < 0 or >= getUngappedSequenceLength(). For that case, it is assumed that gappedSequence is part of a larger sequence that contains no gaps beyond gappedSequence's ends. If the gapped sequence contains end gaps, the returned position may lie within the end gap region.

        Parameters:
        indexInUngappedSequence - An index in the ungapped sequence, i.e. SequenceUtilities.removeGaps(gappedSequence)
        Returns:
        The position of the indexInUngappedSequence'th nongap character in gappedSequence
        See Also:
        getGappedIndexTreatingEndGapsLikeInternalGaps(int)
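        The reverse translation described above can be sketched as follows. This is an illustrative plain-Java stand-in, not the Geneious implementation: the class GappedIndexSketch and the method gappedIndex are hypothetical, and the sketch assumes at least one residue. Out-of-range ungapped indices are extrapolated one position per index from the first or last residue, which is an interpretation of the "larger sequence that contains no gaps" wording and is why the result can fall inside an end-gap region.

```java
// Hypothetical stand-in for illustration; it does not use or mirror the Geneious classes.
class GappedIndexSketch {

    /** Gapped position of the residue with the given ungapped index, extrapolating beyond the ends. */
    static int gappedIndex(CharSequence gapped, int ungappedIndex) {
        // Find the first and last residue positions and the total residue count.
        int first = -1;
        int last = -1;
        int residues = 0;
        for (int i = 0; i < gapped.length(); i++) {
            if (gapped.charAt(i) != '-') {
                if (first < 0) first = i;
                last = i;
                residues++;
            }
        }
        if (residues == 0) {
            throw new IllegalArgumentException("sketch assumes at least one residue");
        }
        // Before the first residue: step left one gapped position per ungapped index.
        if (ungappedIndex < 0) return first + ungappedIndex;
        // After the last residue: step right one gapped position per ungapped index.
        if (ungappedIndex >= residues) return last + (ungappedIndex - (residues - 1));
        // In range: walk to the ungappedIndex'th residue.
        int seen = 0;
        for (int i = first; i <= last; i++) {
            if (gapped.charAt(i) != '-') {
                if (seen == ungappedIndex) return i;
                seen++;
            }
        }
        throw new AssertionError("unreachable: residue count and scan disagree");
    }

    public static void main(String[] args) {
        String gapped = "-ABC--DE--f-";
        System.out.println(gappedIndex(gapped, 3));  // 6: D sits at gapped index 6
        System.out.println(gappedIndex(gapped, -1)); // 0: inside the leading end gap
        System.out.println(gappedIndex(gapped, 6));  // 11: inside the trailing end gap
    }
}
```

        Under these assumptions the results round-trip with the example table for getUngappedIndexOfThisOrPreviousResidue(int): gapped index 0 maps to -1 there, and -1 maps back to 0 here.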
      • getUngappedSequenceLength

        public int getUngappedSequenceLength()
        Returns the ungapped length of the sequence passed to the constructor
        Returns:
        the ungapped length of the sequence passed to the constructor
      • getGappedSequenceLength

        public int getGappedSequenceLength()
        Returns:
        the length of the gapped sequence passed to the constructor.