Class SequenceCharSequence

  • All Implemented Interfaces:
    XMLSerializable, XMLSerializable.OldVersionCompatible, java.lang.CharSequence, java.lang.Comparable<SequenceCharSequence>
    Direct Known Subclasses:
    ImmutableSequence

    public abstract class SequenceCharSequence
    extends java.lang.Object
    implements java.lang.CharSequence, XMLSerializable, java.lang.Comparable<SequenceCharSequence>, XMLSerializable.OldVersionCompatible

    A CharSequence that knows the length of its terminal gaps (sequences of '-') and allows for efficient insertions and deletions of subsequences.

    This class is a wrapper around another CharSequence, which must be immutable. As long as the wrapped CharSequence honors this contract, instances of this class are also immutable and therefore thread-safe.

    Traditionally, Geneious has used Strings to represent biological sequences. However, it is preferable for methods to accept CharSequences rather than require Strings, mostly because some aspects of a CharSequence can be calculated on the fly or (e.g. when a CharSequence contains a long repetition of the same character) be stored much more memory-efficiently than in a String. Beyond repetitive characters, a CharSequence implementation can also store additional meta-information about a sequence.

    This class (SequenceCharSequence) makes use of this advantage. It is an immutable wrapper around another CharSequence, which must also be immutable: it must not change its length or its sequence of characters, and in particular it must not acquire or lose any terminal gaps. Failure of the wrapped CharSequence to comply with this contract may result in arbitrary nondeterministic behaviour or RuntimeExceptions, but it is not guaranteed that such exceptions will be thrown.
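    The idea of storing terminal gaps as counts rather than as characters can be sketched as follows. GapPaddedSketch is a hypothetical class written for illustration only; it is not part of the Geneious API and makes no claim about how SequenceCharSequence is implemented internally.

```java
import java.util.Objects;

/** Hypothetical illustration only; this is not the Geneious implementation. */
final class GapPaddedSketch implements CharSequence {
    private final int leadingGaps;      // number of '-' before the wrapped text
    private final CharSequence inner;   // wrapped text; assumed to be immutable
    private final int trailingGaps;     // number of '-' after the wrapped text

    GapPaddedSketch(int leadingGaps, CharSequence inner, int trailingGaps) {
        if (leadingGaps < 0 || trailingGaps < 0) {
            throw new IllegalArgumentException("gap lengths must not be negative");
        }
        this.leadingGaps = leadingGaps;
        this.inner = Objects.requireNonNull(inner);
        this.trailingGaps = trailingGaps;
    }

    @Override
    public int length() {
        return leadingGaps + inner.length() + trailingGaps;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int innerIndex = index - leadingGaps;
        // Positions outside the wrapped text are terminal gaps.
        if (innerIndex < 0 || innerIndex >= inner.length()) {
            return '-';
        }
        return inner.charAt(innerIndex);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Simple linear-time copy; kept naive to keep the sketch short.
        return toString().substring(start, end);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}
```

    In this sketch, new GapPaddedSketch(3, "ACGT", 2).toString() yields "---ACGT--", and the gap counts occupy two int fields regardless of how many gaps they represent.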

    You can obtain a SequenceCharSequence wrapper around any immutable CharSequence via valueOf(CharSequence). To add terminal gaps to an existing sequence in constant time, use withTerminalGaps(int, CharSequence, int).

    Of note among the methods in this class that run in constant time are:

    Also, the methods insert(int, CharSequence) and delete(int, int) don't modify the underlying SequenceCharSequence (because it is immutable) but return a modified copy. These methods run in linear time when they are first invoked, but return a SequenceCharSequence on which subsequent calls of these methods will run in logarithmic time.

    When working with SequenceCharSequences, it is advisable to use the methods in SequenceUtilities and CharSequenceUtilities wherever possible, rather than writing your own code. All of those methods treat SequenceCharSequences specially so that they run more efficiently.

    • Field Detail

      • EMPTY

        public static final SequenceCharSequence EMPTY
        An empty (0 length) immutable SequenceCharSequence
    • Method Detail

      • withTerminalGaps

        public static SequenceCharSequence withTerminalGaps​(int gapPrefixLength,
                                                            java.lang.CharSequence charSequence,
                                                            int gapSuffixLength)
        Create a new SequenceCharSequence from a CharSequence with the specified terminal gap lengths.

        In order to obey the contract of SequenceCharSequence that it is immutable, the length of charSequence and the characters returned from charSequence must not change after calling this method.

        It is guaranteed that the returned SequenceCharSequence's internal charSequence is either charSequence, or a subSequence() of charSequence, or (if charSequence is already a SequenceCharSequence) its internal char sequence.

        Parameters:
        gapPrefixLength - the length of the leading gaps in addition to any leading gaps already present in charSequence
        charSequence - CharSequence to wrap in a SequenceCharSequence. The length of charSequence and the characters returned from charSequence must not change after calling this method.
        gapSuffixLength - the length of the trailing gaps in addition to any trailing gaps already present in charSequence
        Returns:
        a new SequenceCharSequence
      • withOnlyGaps

        public static SequenceCharSequence withOnlyGaps​(int numberOfGaps)
        Gets a char sequence that consists of only gaps.
        Parameters:
        numberOfGaps - the number of gaps
        Returns:
        a char sequence that consists of only gaps.
        Since:
        API 4.11 (Geneious 5.0)
      • valueOf

        public static SequenceCharSequence valueOf​(java.lang.CharSequence charSequence)
        Returns a SequenceCharSequence representing the same sequence of characters as charSequence. It is guaranteed that if charSequence instanceof SequenceCharSequence, charSequence itself is returned.

        In order to obey the contract of SequenceCharSequence that it is immutable, the length of charSequence and the characters returned from charSequence must not change after calling this method.

        Parameters:
        charSequence - CharSequence to wrap in a SequenceCharSequence. The length of charSequence and the characters returned from charSequence must not change after calling this method.
        Returns:
        A SequenceCharSequence representing the same sequence of characters as charSequence.
      • contains

        public abstract boolean contains​(char c)
        Checks whether this CharSequence contains the specified character.
        Parameters:
        c - the character to look for
        Returns:
        true if this CharSequence contains the specified character.
      • count

        public abstract int count​(char c)
        Counts the number of occurrences of c in this CharSequence.
        Parameters:
        c - The character to search for
        Returns:
        The number of times c occurs in this CharSequence
      • toXML

        public org.jdom.Element toXML​(Geneious.MajorVersion majorVersion,
                                      jebl.util.ProgressListener progressListener)
        Description copied from interface: XMLSerializable.OldVersionCompatible
        Serializes this class to XML format, potentially to a format readable by an earlier version of Geneious. It is acceptable for the XML to include unnecessary tags that will be ignored by the earlier version. For example if the implementation has only extended the XML since the earlier version, then the XML returned may be identical to the XML returned for the current version.

        See XMLSerializable.toXML() for a more detailed description of what it means to serialize to XML.

        All classes that implement this method must also implement XMLSerializable.toXML() and should delegate back to this method using Geneious.getMajorVersion() and ProgressListener.EMPTY as parameters.

        Specified by:
        toXML in interface XMLSerializable.OldVersionCompatible
        Parameters:
        majorVersion - the major version of Geneious to serialize to XML for, for example "6.0" but not "6.0.0". This must be greater than or equal to the version returned from getVersionSupport(VersionSupportType.OldestVersionSerializableTo) and must never be greater than the current version (Geneious.getMajorVersion()).
        progressListener - for reporting progress and cancelling
        Returns:
        object encoded as a JDOM element
      • getLeadingGapsLength

        public abstract int getLeadingGapsLength()
        The number of leading gaps in this SequenceCharSequence, i.e. the first index i for which charAt(i) won't return '-'. If this sequence consists only of gaps, it is guaranteed that this method will return the same as length() (i.e. in a gap-only sequence, all gaps are considered leading rather than trailing gaps).
        Returns:
        the number of leading gaps in this SequenceCharSequence
      • getTrailingGapsLength

        public abstract int getTrailingGapsLength()
        The number of trailing gaps in this SequenceCharSequence. In other words, the last index i for which charAt(i) won't return '-' is length() - getTrailingGapsLength() - 1. If this sequence consists only of gaps, then this method will return 0 (i.e. in a gap-only sequence, all gaps are considered leading rather than trailing gaps).
        Returns:
        the number of trailing gaps in this SequenceCharSequence
      • getTrailingGapsStartIndex

        public abstract int getTrailingGapsStartIndex()
        Get the index in the sequence at which the trailing gaps (if any) start. If there are no trailing gaps, this returns CharSequence.length().
        Returns:
        the index in the sequence at which the trailing gaps (if any) start.
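        The convention that a gap-only sequence has only leading gaps can be illustrated on a plain CharSequence. The class TerminalGapsSketch and its method names are hypothetical and recompute the values by scanning; the real methods are documented as accessors on SequenceCharSequence and need not work this way.

```java
/** Hypothetical illustration of the documented leading/trailing gap convention. */
final class TerminalGapsSketch {
    /** First index whose character is not '-', or s.length() if there is none. */
    static int leadingGaps(CharSequence s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == '-') {
            i++;
        }
        return i;
    }

    /** Number of '-' at the end; 0 for a gap-only sequence (all gaps count as leading). */
    static int trailingGaps(CharSequence s) {
        if (leadingGaps(s) == s.length()) {
            return 0;
        }
        int n = 0;
        // Terminates because at least one non-gap character exists.
        while (s.charAt(s.length() - 1 - n) == '-') {
            n++;
        }
        return n;
    }

    /** Index at which the trailing gaps start; s.length() if there are none. */
    static int trailingGapsStartIndex(CharSequence s) {
        return s.length() - trailingGaps(s);
    }
}
```

        For "--AC-GT---" this sketch gives 2 leading gaps, 3 trailing gaps and a trailing gap start index of 7. For "----" it gives 4 leading gaps, 0 trailing gaps and a start index of 4.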
      • isAllGaps

        public final boolean isAllGaps()
        Returns true if this sequence consists entirely of gap ('-') characters.
        Returns:
        true if this sequence consists entirely of gap ('-') characters.
      • charAt

        public abstract char charAt​(int index)
        Returns the char value at the specified index. If index < getLeadingGapsLength() or index >= CharSequence.length() - getTrailingGapsLength(), returns '-'.
        Specified by:
        charAt in interface java.lang.CharSequence
        Parameters:
        index - Index of character to look up
        Returns:
        char value at the specified index
        Throws:
        java.lang.IndexOutOfBoundsException - if index < 0 or index >= CharSequence.length()
      • charAtIgnoringEndGaps

        public char charAtIgnoringEndGaps​(int index)
        Gets the character at the given index, ignoring end gaps. Generally this is more efficient than calling charAt(int). charAtIgnoringEndGaps(index) is equivalent to charAt(index + getLeadingGapsLength()).
        Parameters:
        index - the index of the character to get relative to the leading gaps length
        Returns:
        the character at the specified index
        Since:
        API 4.11 (Geneious 5.0)
      • subSequence

        public SequenceCharSequence subSequence​(int start,
                                                int end)
        Constructs a subsequence of this sequence. If this sequence was a result of a modification (insert(int, CharSequence) or delete(int, int)) and supports logarithmic time modifications, then it is guaranteed that the returned subsequence supports logarithmic time modifications as well.
        Specified by:
        subSequence in interface java.lang.CharSequence
        Parameters:
        start - the start index, inclusive
        end - the end index, exclusive
        Returns:
        A subsequence of this sequence covering positions start inclusive to end exclusive.
      • getInternalCharSequence

        public abstract java.lang.CharSequence getInternalCharSequence()
        Returns the CharSequence wrapped by this SequenceCharSequence, representing the sequence without the terminal gaps. This method runs in constant time.
        Returns:
        A CharSequence representing this SequenceCharSequence's internal sequence without the terminal gaps. It is not specified what the concrete class of the returned CharSequence will be.
      • getInternalSequenceLength

        public int getInternalSequenceLength()
        Returns:
        the length of the internal sequence without end gaps.
      • toString

        public final java.lang.String toString()
        Specified by:
        toString in interface java.lang.CharSequence
        Overrides:
        toString in class java.lang.Object
      • hashCode

        public final int hashCode()
        Calculates a hashCode that is based on this CharSequence's sequence of characters. So regardless of whether the wrapped CharSequence overrides Object.hashCode(), it is guaranteed that if a.toString().equals(b.toString()) for two SequenceCharSequences a, b, then a.hashCode() == b.hashCode(). However, a.hashCode() will generally be different from a.toString().hashCode().

        Note: since this method is generally unused, it is neither particularly efficient nor particularly good at hashing.

        Overrides:
        hashCode in class java.lang.Object
        Returns:
        The hashCode of this SequenceCharSequence.
      • equals

        public boolean equals​(java.lang.Object obj)
        Checks whether obj is a SequenceCharSequence representing the same sequence of characters as this one.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        obj - the object to compare with this SequenceCharSequence
        Returns:
        true if and only if obj is instanceof SequenceCharSequence and obj.toString().equals(this.toString()).
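        The notion of equality and hashing by character content, independent of the concrete CharSequence class, can be sketched as below. ContentEqualitySketch is hypothetical, and the polynomial hash is an assumption chosen for brevity; the hash function actually used by SequenceCharSequence is not specified here.

```java
/** Hypothetical illustration of content-based hashing and equality for CharSequences. */
final class ContentEqualitySketch {
    /** Hash computed from the characters only (assumed polynomial hash, not the real one). */
    static int contentHash(CharSequence s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    /** True if both sequences have the same length and the same characters, case-sensitively. */
    static boolean contentEquals(CharSequence a, CharSequence b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

        Because only the characters are inspected, a String and a StringBuilder holding "AC-GT" compare as equal and hash identically in this sketch, even though StringBuilder does not override Object.hashCode().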
      • insert

        public SequenceCharSequence insert​(int index,
                                           java.lang.CharSequence charSequence)
        Creates a copy of this SequenceCharSequence with the specified CharSequence inserted at the specified index. Doesn't modify this SequenceCharSequence. The returned SequenceCharSequence will support logarithmic time insertions and deletions, and its length will be longer than this sequence's by charSequence.length().

        In order to obey the contract of SequenceCharSequence that it is immutable, the length of charSequence and the characters returned from charSequence must not change after calling this method.

        Parameters:
        index - The position at which to insert charSequence. All existing characters at or to the right of this position will be moved to the right by charSequence.length().
        charSequence - A CharSequence to insert at the specified index. The length of charSequence and the characters returned from charSequence must not change after calling this method.
        Returns:
        A copy of this SequenceCharSequence with charSequence inserted at the specified index.
      • delete

        public SequenceCharSequence delete​(int deletionBegin,
                                           int deletionEnd)
        Returns a copy of this SequenceCharSequence where the characters at positions deletionBegin inclusive to deletionEnd exclusive have been removed. Doesn't modify this SequenceCharSequence. The returned SequenceCharSequence will support logarithmic time insertions and deletions, and its length will be shorter than this sequence's by deletionEnd - deletionBegin.
        Parameters:
        deletionBegin - Deletion start index, inclusive
        deletionEnd - Deletion end index, exclusive
        Returns:
        A copy of this SequenceCharSequence with the specified range of characters removed.
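        The index semantics of insert and delete (copy returned, original untouched, delete range begin-inclusive and end-exclusive) can be illustrated with plain Strings. InsertDeleteSketch is a hypothetical helper that copies via StringBuilder in linear time; it does not reproduce the logarithmic-time behaviour described above.

```java
/** Hypothetical illustration of the index semantics only; runs in linear time. */
final class InsertDeleteSketch {
    /** Returns a copy of s with toInsert placed at index; s itself is not modified. */
    static String insert(CharSequence s, int index, CharSequence toInsert) {
        return new StringBuilder(s).insert(index, toInsert).toString();
    }

    /** Returns a copy of s without the characters in [begin, end); s itself is not modified. */
    static String delete(CharSequence s, int begin, int end) {
        return new StringBuilder(s).delete(begin, end).toString();
    }
}
```

        In this sketch, insert("ACGT", 2, "--") gives "AC--GT" (length grows by 2), and delete("AC--GT", 2, 4) gives "ACGT" (length shrinks by 4 - 2).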
      • compareTo

        public final int compareTo​(SequenceCharSequence that)
        Lexicographically compares this SequenceCharSequence to another, taking into account case. The signum of the result value will always be the same as that of this.toString().compareTo(that.toString()). This comparison is consistent with equals.
        Specified by:
        compareTo in interface java.lang.Comparable<SequenceCharSequence>
        Parameters:
        that - the SequenceCharSequence to compare with this one
        Returns:
        result of comparison of this to that
      • equalsIgnoreCase

        public final boolean equalsIgnoreCase​(SequenceCharSequence that)
        Checks whether this SequenceCharSequence contains the same sequence of characters as the specified SequenceCharSequence, ignoring case.
        Parameters:
        that - SequenceCharSequence to be compared for equality (ignoring case) with this
        Returns:
        true if this.toString().equalsIgnoreCase(that.toString())
      • indexOf

        public int indexOf​(java.lang.CharSequence subSequence)
        Gets the index of the first character of the first occurrence of the specified subsequence in this sequence.
        Parameters:
        subSequence - the subsequence to find
        Returns:
        the index of the first character of the first occurrence of the specified subsequence in this sequence, or -1 if the specified subsequence does not occur in this sequence.
      • indexOf

        public int indexOf​(java.lang.CharSequence subSequence,
                           int fromIndex)
        Gets the index of the first character of the first occurrence of the specified subsequence in this sequence, starting the search at the specified index.
        Parameters:
        subSequence - the subsequence to find
        fromIndex - the index from which to start the search.
        Returns:
        the index of the first character of the first occurrence of the specified subsequence in this sequence, or -1 if the specified subsequence does not occur in this sequence.
        Since:
        API 4.700 (Geneious 7.0.0)
      • isEndGap

        public boolean isEndGap​(int index)
        Returns true if the given index is an end gap (i.e. index < getLeadingGapsLength() or index >= getTrailingGapsStartIndex()).
        Parameters:
        index - Index of character to look up
        Returns:
        true if it is an end gap
      • isGap

        public boolean isGap​(int index)
        Returns true if the given index is a gap.
        Parameters:
        index - Index of character to look up
        Returns:
        true if it is a gap
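        The difference between an end gap and any gap can be shown on a plain CharSequence. EndGapSketch is a hypothetical helper that recomputes the terminal gap boundaries by scanning; behaviour for out-of-range indices is not specified by the text above, and this sketch does not attempt to define it for isEndGap.

```java
/** Hypothetical illustration of end gaps versus internal gaps. */
final class EndGapSketch {
    /** True if the character at index is '-', whether terminal or internal. */
    static boolean isGap(CharSequence s, int index) {
        return s.charAt(index) == '-';
    }

    /** True if index lies in the leading or trailing run of '-' characters. */
    static boolean isEndGap(CharSequence s, int index) {
        int leading = 0;
        while (leading < s.length() && s.charAt(leading) == '-') {
            leading++;
        }
        // In a gap-only sequence every gap is leading, so the trailing run is empty.
        int trailingStart = s.length();
        while (trailingStart > leading && s.charAt(trailingStart - 1) == '-') {
            trailingStart--;
        }
        return index < leading || index >= trailingStart;
    }
}
```

        For "--AC-GT---", index 4 is a gap but not an end gap, while indices 1 and 7 are both gaps and end gaps.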
      • writeObject

        public void writeObject​(GeneiousObjectOutputStream outputStream,
                                jebl.util.ProgressListener progressListener)
                         throws java.io.IOException
        Writes this sequence to an ObjectOutputStream as specified by Serializable. The object can be reconstructed using readObject(GeneiousObjectInputStream)
        Parameters:
        outputStream - the stream to write to.
        progressListener - for reporting progress and allowing the write to be canceled.
        Throws:
        java.io.IOException - if the write cannot be completed for any reason (including if the progress listener requests the operation be canceled)
      • writeObject

        public void writeObject​(java.io.DataOutput dataOutput,
                                jebl.util.ProgressListener progressListener)
                         throws java.io.IOException
        Writes this sequence to a DataOutput. The object can be reconstructed using readObject(java.io.DataInput)
        Parameters:
        dataOutput - the DataOutput to write to.
        progressListener - for reporting progress and allowing the write to be canceled.
        Throws:
        java.io.IOException - if the write cannot be completed for any reason (including if the progress listener requests the operation be canceled)
        Since:
        API 4.30 (Geneious 5.3.0)
      • writeObject

        public void writeObject​(Geneious.MajorVersion version,
                                java.io.DataOutput dataOutput,
                                jebl.util.ProgressListener progressListener)
                         throws java.io.IOException
        Writes this sequence to a DataOutput. The object can be reconstructed using readObject(java.io.DataInput)
        Parameters:
        version - the version number of Geneious that must be able to deserialize this.
        dataOutput - the DataOutput to write to.
        progressListener - for reporting progress and allowing the write to be canceled.
        Throws:
        java.io.IOException - if the write cannot be completed for any reason (including if the progress listener requests the operation be canceled)
        Since:
        API 4.600 (Geneious 6.0.0)
      • countGaps

        public int countGaps​(int startIndex,
                             int endIndexExclusive)
        Counts the number of gaps in this sequence between the 2 given positions.
        Parameters:
        startIndex - the 0-based start index to count from (inclusive)
        endIndexExclusive - the 0-based end index to stop counting at (exclusive)
        Returns:
        the number of gaps in this sequence between the 2 given positions.
        Since:
        API 4.11 (Geneious 5.0)
      • getUngappedLength

        public int getUngappedLength()
        Returns the length of this sequence, excluding any gaps (both internal and end gaps).
        Returns:
        the length of this sequence, excluding any gaps (both internal and end gaps)
        Since:
        API 4.40 (Geneious 5.4.0)
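        The relationship between counting gaps in a range and the ungapped length can be sketched on a plain CharSequence. GapCountSketch is a hypothetical helper that scans character by character; it says nothing about how the real methods are implemented or how fast they run.

```java
/** Hypothetical illustration of gap counting and ungapped length. */
final class GapCountSketch {
    /** Number of '-' characters in [startIndex, endIndexExclusive). */
    static int countGaps(CharSequence s, int startIndex, int endIndexExclusive) {
        int gaps = 0;
        for (int i = startIndex; i < endIndexExclusive; i++) {
            if (s.charAt(i) == '-') {
                gaps++;
            }
        }
        return gaps;
    }

    /** Length of s without any gaps, internal or terminal. */
    static int ungappedLength(CharSequence s) {
        return s.length() - countGaps(s, 0, s.length());
    }
}
```

        For "--AC-GT---", countGaps over the whole sequence is 6, countGaps(2, 7) is 1 (the single internal gap), and the ungapped length is 4.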