Class SequenceUtilities


  • public final class SequenceUtilities
    extends java.lang.Object
    A noninstantiable class providing static methods for common tasks associated with nucleotide and protein sequences.
    See Also:
    SequenceDocument, SequenceExtractionUtilities
    • Method Detail

      • getForwardRegexForSequence

        public static java.lang.String getForwardRegexForSequence​(java.lang.CharSequence querySequence,
                                                                  jebl.evolution.sequences.SequenceType sequenceType,
                                                                  boolean interpretAmbiguitiesInQuery,
                                                                  boolean interpretAmbiguitiesInTarget,
                                                                  boolean allowExtraGapsInTarget)
        Given a nucleotide or amino acid sequence, returns a regular expression that matches forward occurrences of this sequence in a larger sequence, i.e. a String s such that Pattern.compile(s, Pattern.CASE_INSENSITIVE) will find all case-insensitive forward matches of querySequence in a larger sequence. The regular expression returned will also match sequences with gaps inserted at any point within the sequence.
        Parameters:
        querySequence - The nucleotide or amino acid sequence to search for.
        sequenceType - The type of the sequence
        interpretAmbiguitiesInQuery - If true, then an ambiguous character (e.g. R for nucleotides) in querySequence will match the corresponding canonical states (A and G) in the target.
        interpretAmbiguitiesInTarget - If true, then an ambiguous character (e.g. R for nucleotides) in the sequence being searched within will match the corresponding canonical states (A and G) in the querySequence.
        allowExtraGapsInTarget - If true, then additional gaps will be allowed in the sequence being searched within.
        Returns:
        a regular expression that matches forward occurrences of this search string in a larger sequence string or null if any of the characters in the sequence string are not valid residues for sequenceType
        Since:
        API 4.610 (Geneious 6.1.0)
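        The following is a minimal sketch of the idea behind such a regular expression, not the expression Geneious actually produces. It assumes DNA only, a hand-written IUPAC table, and that an ambiguity code in the query also matches itself literally; the class and method names (ForwardRegexSketch, forwardRegex) are hypothetical, and interpretAmbiguitiesInTarget is not modelled.

```java
import java.util.Map;

final class ForwardRegexSketch {
    // Assumed IUPAC table: each code mapped to the canonical bases it covers.
    private static final Map<Character, String> CANONICAL = Map.ofEntries(
        Map.entry('A', "A"), Map.entry('C', "C"), Map.entry('G', "G"), Map.entry('T', "T"),
        Map.entry('R', "AG"), Map.entry('Y', "CT"), Map.entry('S', "CG"), Map.entry('W', "AT"),
        Map.entry('K', "GT"), Map.entry('M', "AC"), Map.entry('B', "CGT"), Map.entry('D', "AGT"),
        Map.entry('H', "ACT"), Map.entry('V', "ACG"), Map.entry('N', "ACGT"));

    /** Returns a regex for the query, or null if a character is not in the table. */
    static String forwardRegex(CharSequence query,
                               boolean interpretAmbiguitiesInQuery,
                               boolean allowExtraGapsInTarget) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = Character.toUpperCase(query.charAt(i));
            String states = CANONICAL.get(c);
            if (states == null) {
                return null; // mirrors the documented "null for invalid residues" contract
            }
            if (i > 0 && allowExtraGapsInTarget) {
                regex.append("-*"); // any number of gaps between consecutive residues
            }
            if (interpretAmbiguitiesInQuery && states.length() > 1) {
                // Character class: the code itself plus its canonical states.
                regex.append('[').append(c).append(states).append(']');
            } else {
                regex.append(c);
            }
        }
        return regex.toString();
    }
}
```

        Compiling the result with Pattern.CASE_INSENSITIVE, as the description above suggests, lets a query such as ARG find A-g--G in a gapped, mixed-case target.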
      • getForwardRegexPatternForSequence

        public static java.util.regex.Pattern getForwardRegexPatternForSequence​(java.lang.CharSequence querySequence,
                                                                                jebl.evolution.sequences.SequenceType sequenceType,
                                                                                boolean interpretAmbiguitiesInQuery,
                                                                                boolean interpretAmbiguitiesInTarget)
        Given a nucleotide or amino acid sequence string, returns a regular expression pattern that matches forward occurrences of this search string in a larger sequence string.
        Parameters:
        querySequence - The nucleotide or amino acid sequence to search for.
        sequenceType - The type of the sequence
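        interpretAmbiguitiesInQuery - If true, then an ambiguous character (e.g. R for nucleotides) in querySequence will match the corresponding canonical states (A and G) in the target.
        interpretAmbiguitiesInTarget - If true, then an ambiguous character (e.g. R for nucleotides) in the sequence being searched within will match the corresponding canonical states (A and G) in the querySequence.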
        Returns:
        a regular expression that matches forward occurrences of this search string in a larger sequence string or null if any of the characters in the sequence string are not valid residues for sequenceType
      • isStateAssignableFrom

        public static boolean isStateAssignableFrom​(jebl.evolution.sequences.State stateA,
                                                    jebl.evolution.sequences.State stateB)
        Same as stateA.getCanonicalStates().containsAll(stateB.getCanonicalStates()) except that for NucleotideStates and AminoAcidStates it caches the result.
        Parameters:
        stateA - A state (e.g. a NucleotideState or AminoAcidState)
        stateB - A state of the same type as stateA
        Returns:
        true if stateA.getCanonicalStates().containsAll(stateB.getCanonicalStates())
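        The containsAll relationship can be illustrated with plain sets. This is only a sketch: the table below is a hypothetical stand-in for State.getCanonicalStates() covering a few IUPAC nucleotide codes, and it does no caching.

```java
import java.util.Map;
import java.util.Set;

final class StateAssignabilitySketch {
    // Hypothetical stand-in for State.getCanonicalStates() on a few nucleotide codes.
    private static final Map<Character, Set<Character>> CANONICAL = Map.of(
        'A', Set.of('A'), 'C', Set.of('C'), 'G', Set.of('G'), 'T', Set.of('T'),
        'R', Set.of('A', 'G'), 'Y', Set.of('C', 'T'), 'N', Set.of('A', 'C', 'G', 'T'));

    /** True if every canonical state of stateB is also a canonical state of stateA. */
    static boolean isAssignableFrom(char stateA, char stateB) {
        return CANONICAL.get(stateA).containsAll(CANONICAL.get(stateB));
    }
}
```

        Under this table R is assignable from A, but A is not assignable from R, and R and Y are not assignable to each other.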
      • createSequenceDocument

        public static DefaultSequenceDocument createSequenceDocument​(jebl.evolution.sequences.SequenceType sequenceType,
                                                                     java.lang.String name,
                                                                     java.lang.String description,
                                                                     java.lang.CharSequence sequenceString,
                                                                     java.util.Date creationDate)
        Creates a DefaultNucleotideSequence or DefaultAminoAcidSequence depending on sequenceType. See the documentation of these classes' constructors for the semantics of the parameters.
      • setOriginalResidueNumbering

        public static void setOriginalResidueNumbering​(EditableSequenceDocument document,
                                                       int startIndex,
                                                       boolean isReverse)
        Sets the original residue numbering of a document. The residue index of the document will appear shifted if the user has "show original residue numbers" selected in the sequence view.
        Parameters:
        document - document to set the residue numbering for
        startIndex - start index for the residue numbering. The first original residue is residue 1.
        isReverse - true if the residue numbering should count down from startIndex, false if it should count up.
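        As a rough illustration of the numbering arithmetic, the sketch below assumes that the residue at zero-based offset i is displayed as startIndex + i, or startIndex - i when isReverse is true. That reading is an assumption drawn from the parameter descriptions above, and originalNumber is a hypothetical helper, not part of the API.

```java
final class OriginalNumberingSketch {
    /**
     * Displayed original number of the residue at the given zero-based offset,
     * under the assumption that numbering counts up (or down) from startIndex.
     */
    static int originalNumber(int startIndex, boolean isReverse, int offset) {
        return isReverse ? startIndex - offset : startIndex + offset;
    }
}
```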
      • containsInvalidResidues

        public static java.lang.String containsInvalidResidues​(SequenceDocument sequenceDocument,
                                                               boolean allowGaps,
                                                               boolean fastIncompleteCheck)
        Checks if a sequence contains invalid sequence residues.
        Parameters:
        sequenceDocument - the sequence to check for validity.
        allowGaps - true if the sequence is allowed to contain gaps
        fastIncompleteCheck - This parameter is ignored. It was added when Java 5, which was 10 times slower than Java 6, was widely used and checking enormous sequences was slow (a 2GB sequence takes about 20 seconds in Java 5, 2 seconds in Java 6). Passing true was intended to check only the first and last 1,000,000 residues, which catches almost all invalid cases and is much faster on enormous sequences.
        Returns:
        null if the sequence residues are all valid; otherwise a message describing the first invalid residue.
      • getSequenceType

        public static jebl.evolution.sequences.SequenceType getSequenceType​(SequenceDocument.Alphabet alphabet)
        Gets a jebl library SequenceType that is equivalent to a Geneious alphabet.
        Parameters:
        alphabet - the Geneious alphabet to convert
        Returns:
        sequence type that is equivalent to this alphabet
      • getAlphabet

        public static SequenceDocument.Alphabet getAlphabet​(jebl.evolution.sequences.SequenceType sequenceType)
        Gets a Geneious alphabet type that is equivalent to a jebl library SequenceType.
        Parameters:
        sequenceType - the jebl SequenceType to convert
        Returns:
        alphabet that is equivalent to this sequence type
      • containsInvalidResidues

        public static java.lang.String containsInvalidResidues​(java.lang.CharSequence sequenceResidues,
                                                               SequenceDocument.Alphabet alphabet,
                                                               boolean allowGaps,
                                                               boolean fastIncompleteCheck)
        Checks if a sequence contains invalid sequence residues.
        Parameters:
        sequenceResidues - sequence residues to check for validity.
        alphabet - the alphabet of residues expected to be in sequenceResidues
        allowGaps - true if the sequence is allowed to contain gaps
        fastIncompleteCheck - This parameter is ignored. It was added when Java 5, which was 10 times slower than Java 6, was widely used and checking enormous sequences was slow (a 2GB sequence takes about 20 seconds in Java 5, 2 seconds in Java 6). Passing true was intended to check only the first and last 1,000,000 residues, which catches almost all invalid cases and is much faster on enormous sequences.
        Returns:
        null if the sequence residues are all valid; otherwise a message describing the first invalid residue.
      • containsInvalidResidues

        public static java.lang.String containsInvalidResidues​(java.lang.CharSequence sequenceResidues,
                                                               jebl.evolution.sequences.SequenceType sequenceType,
                                                               boolean allowGaps,
                                                               boolean fastIncompleteCheck)
        Checks if a sequence contains invalid sequence residues.
        Parameters:
        sequenceResidues - sequence residues to check for validity.
        sequenceType - the type of residues expected to be in sequenceResidues
        allowGaps - true if the sequence is allowed to contain gaps
        fastIncompleteCheck - This parameter is ignored. It was added when Java 5, which was 10 times slower than Java 6, was widely used and checking enormous sequences was slow (a 2GB sequence takes about 20 seconds in Java 5, 2 seconds in Java 6). Passing true was intended to check only the first and last 1,000,000 residues, which catches almost all invalid cases and is much faster on enormous sequences.
        Returns:
        null if the sequence residues are all valid; otherwise a message describing the first invalid residue.
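        The "null means valid, otherwise a message" contract shared by the containsInvalidResidues overloads can be sketched as follows. The nucleotide alphabet and the message wording here are assumptions made for illustration; the real valid set and message text are defined by the library.

```java
final class InvalidResidueCheckSketch {
    // Assumed nucleotide alphabet, for illustration only.
    private static final String VALID_NUCLEOTIDES = "ACGTURYSWKMBDHVN";

    /** Returns null if all residues are valid, else a message about the first invalid one. */
    static String firstInvalidResidueMessage(CharSequence residues, boolean allowGaps) {
        for (int i = 0; i < residues.length(); i++) {
            char c = residues.charAt(i);
            boolean valid = (c == '-')
                ? allowGaps
                : VALID_NUCLEOTIDES.indexOf(Character.toUpperCase(c)) >= 0;
            if (!valid) {
                return "Invalid residue '" + c + "' at position " + (i + 1);
            }
        }
        return null;
    }
}
```

        Callers would typically test the result against null rather than parse the message.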
      • removeInvalidResidues

        public static java.lang.CharSequence removeInvalidResidues​(java.lang.CharSequence sequence,
                                                                   jebl.evolution.sequences.SequenceType sequenceType,
                                                                   boolean allowGaps)
        Get a sequence string identical to sequence except that any invalid residues are removed. Gaps are only removed if allowGaps is false. All valid characters remain unchanged (they maintain their original case and there are no U->T replacements for nucleotides.)
        Parameters:
        sequence - a string of residues that may or may not be valid residues
        sequenceType - the type of residues in sequence
        allowGaps - if this is true, then gaps are not removed.
        Returns:
        a sequence string identical to sequence except that any invalid residues are removed. If there are no invalid residues, sequence is returned.
        Throws:
        java.lang.OutOfMemoryError - if a sequence contains invalid residues and a valid version of the sequence cannot fit in memory
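        A minimal sketch of the documented behaviour: invalid characters are dropped, valid ones keep their case, and the original instance is returned when nothing was removed. The nucleotide alphabet is an assumption and removeInvalid is a hypothetical name.

```java
final class RemoveInvalidResiduesSketch {
    // Assumed nucleotide alphabet, for illustration only.
    private static final String VALID_NUCLEOTIDES = "ACGTURYSWKMBDHVN";

    static CharSequence removeInvalid(CharSequence sequence, boolean allowGaps) {
        StringBuilder kept = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            boolean valid = (c == '-')
                ? allowGaps
                : VALID_NUCLEOTIDES.indexOf(Character.toUpperCase(c)) >= 0;
            if (valid) {
                kept.append(c); // original case preserved, no U->T replacement
            }
        }
        // Nothing removed: hand back the original object, as the Returns clause describes.
        return kept.length() == sequence.length() ? sequence : kept.toString();
    }
}
```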
      • getValidSequence

        public static java.lang.CharSequence getValidSequence​(SequenceDocument sequenceDocument,
                                                              boolean allowGaps)
        Replace any invalid bases/residues in the given sequence document with ambiguity symbols.
        Parameters:
        sequenceDocument - sequence document to replace the invalid bases in
        allowGaps - whether gaps are allowed (if false they will be replaced with ambiguity symbols)
        Returns:
        the version of the sequence string with the invalid bases/residues replaced with ambiguity symbols
      • getValidSequence

        public static java.lang.CharSequence getValidSequence​(SequenceDocument sequenceDocument,
                                                              boolean allowGaps,
                                                              boolean replaceWithGaps)
        Replace any invalid bases/residues in the given sequence document with ambiguity symbols or gaps.
        Parameters:
        sequenceDocument - sequence document to replace the invalid bases in
        allowGaps - whether gaps are allowed (if false they will be replaced with ambiguity symbols)
        replaceWithGaps - whether invalid bases/residues should be replaced with gaps - should only be done if sequence is in an alignment
        Returns:
        the version of the sequence string with the invalid bases/residues replaced with ambiguity symbols or gaps
        Throws:
        java.lang.IllegalArgumentException - if allowGaps is false but replaceWithGaps is true
        Since:
        API 4.20 (Geneious 5.2.0)
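        The interaction between allowGaps and replaceWithGaps can be sketched like this. It assumes a nucleotide sequence, uses N as the ambiguity symbol, and works on a plain CharSequence rather than a SequenceDocument; validSequence is a hypothetical helper.

```java
final class ValidSequenceSketch {
    // Assumed nucleotide alphabet, for illustration only.
    private static final String VALID_NUCLEOTIDES = "ACGTURYSWKMBDHVN";

    static CharSequence validSequence(CharSequence sequence,
                                      boolean allowGaps,
                                      boolean replaceWithGaps) {
        if (!allowGaps && replaceWithGaps) {
            // Replacing with gaps makes no sense when gaps themselves are disallowed.
            throw new IllegalArgumentException("replaceWithGaps requires allowGaps");
        }
        char replacement = replaceWithGaps ? '-' : 'N';
        StringBuilder result = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            boolean valid = (c == '-')
                ? allowGaps
                : VALID_NUCLEOTIDES.indexOf(Character.toUpperCase(c)) >= 0;
            result.append(valid ? c : replacement);
        }
        return result;
    }
}
```

        Unlike removeInvalidResidues, this approach keeps the sequence length unchanged, which matters when the sequence is part of an alignment.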
      • asRna

        public static java.lang.CharSequence asRna​(java.lang.CharSequence nucleotideCharSequence)
        Views an underlying (nucleotide) CharSequence as RNA by dynamically translating 'T's to 'U's and 't's to 'u's. It is guaranteed that if nucleotideCharSequence instanceof SequenceCharSequence, the returned value will also be instanceof SequenceCharSequence (but it may not support log-time modifications).
        Parameters:
        nucleotideCharSequence - A nucleotide sequence which may already have some RNA residues; it is not guaranteed that it is checked whether the sequence contains invalid residues. Must not be null.
        Returns:
        A CharSequence with the same sequence of characters as nucleotideCharSequence, except that 'T's are replaced with 'U's and 't's are replaced with 'u's
      • asDna

        public static java.lang.CharSequence asDna​(java.lang.CharSequence nucleotideCharSequence)
        Views an underlying (nucleotide) CharSequence as DNA by dynamically translating 'U's to 'T's and 'u's to 't's. It is guaranteed that if nucleotideCharSequence instanceof SequenceCharSequence, the returned value will also be instanceof SequenceCharSequence (but it may not support log-time modifications).
        Parameters:
        nucleotideCharSequence - A nucleotide sequence which may already have some DNA residues; it is not guaranteed that it is checked whether the sequence contains invalid residues. Must not be null.
        Returns:
        A CharSequence with the same sequence of characters as nucleotideCharSequence, except that 'U's are replaced with 'T's and 'u's are replaced with 't's
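        "Dynamically translating" means the characters are converted on access rather than copied up front. A minimal sketch of such a view for the RNA direction is shown below; RnaViewSketch is a hypothetical class and makes no attempt to reproduce the SequenceCharSequence guarantee. The DNA direction would swap the two character pairs.

```java
import java.util.Objects;

final class RnaViewSketch implements CharSequence {
    private final CharSequence nucleotides;

    RnaViewSketch(CharSequence nucleotides) {
        this.nucleotides = Objects.requireNonNull(nucleotides);
    }

    @Override public int length() {
        return nucleotides.length();
    }

    @Override public char charAt(int index) {
        char c = nucleotides.charAt(index);
        if (c == 'T') return 'U';
        if (c == 't') return 'u';
        return c; // everything else, including existing U/u, passes through
    }

    @Override public CharSequence subSequence(int start, int end) {
        return new RnaViewSketch(nucleotides.subSequence(start, end));
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) sb.append(charAt(i));
        return sb.toString();
    }
}
```

        Because nothing is copied until toString() is called, constructing the view costs constant time and memory regardless of sequence length.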
      • asTranslation

        public static java.lang.CharSequence asTranslation​(java.lang.CharSequence nucleotideCharSequence,
                                                           jebl.evolution.sequences.GeneticCode geneticCode,
                                                           boolean translateFirstCodonUsingFirstCodonTable)
        Views an underlying (nucleotide) CharSequence as its translation. If the length of the CharSequence is not a multiple of 3, the extra 1 or 2 characters are ignored. The translated sequence will have length nucleotideCharSequence.length()/3. If the nucleotide sequence contains unknown nucleotide characters, these are treated as unknown states and the corresponding translated site will also be the unknown state (?) unless the nucleotide base would not affect the translation (e.g. the 3rd base in some triplets). The concrete type of the return value is not guaranteed. The specified nucleotideCharSequence must not change after it was passed to this method, but it is not guaranteed that violations of this contract will be detected.
        Parameters:
        nucleotideCharSequence - A nucleotide sequence which may be dna, rna or a mixture. Must not be null and must be immutable. Must not contain gaps.
        geneticCode - the genetic code to use for the translation. Must not be null.
        translateFirstCodonUsingFirstCodonTable - each genetic code specifies a set of codons which get translated as M if they are the first codon even though they normally wouldn't translate as an M when occurring elsewhere in a coding region. If this parameter is true the first codon will be translated using this alternative translation table for the genetic code.
        Returns:
        A CharSequence which is a translation of nucleotideCharSequence
        Throws:
        java.lang.IllegalArgumentException - if nucleotideCharSequence contains gaps.
        java.lang.NullPointerException - if nucleotideCharSequence or geneticCode is null.
        Since:
        API 4.41 (Geneious 5.4.1)
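        The length and unknown-state rules can be sketched with the standard genetic code written as a 64-character table (bases ordered T, C, A, G). This is a simplified, eager illustration: it hard-codes the standard code instead of taking a GeneticCode, does not model the first-codon table, and marks every codon containing a non-ACGTU character as '?', even where the ambiguous base would not affect the translation.

```java
final class TranslationSketch {
    private static final String BASES = "TCAG";
    // Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
    private static final String STANDARD_CODE =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static String translate(CharSequence nucleotides) {
        for (int i = 0; i < nucleotides.length(); i++) {
            if (nucleotides.charAt(i) == '-') {
                throw new IllegalArgumentException("sequence must not contain gaps");
            }
        }
        int codons = nucleotides.length() / 3; // trailing 1 or 2 bases are ignored
        StringBuilder protein = new StringBuilder(codons);
        for (int codon = 0; codon < codons; codon++) {
            int tableIndex = 0;
            boolean known = true;
            for (int j = 0; j < 3; j++) {
                char c = Character.toUpperCase(nucleotides.charAt(3 * codon + j));
                int base = BASES.indexOf(c == 'U' ? 'T' : c); // RNA and DNA both accepted
                if (base < 0) {
                    known = false;
                } else {
                    tableIndex = tableIndex * 4 + base;
                }
            }
            protein.append(known ? STANDARD_CODE.charAt(tableIndex) : '?');
        }
        return protein.toString();
    }
}
```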
      • reverseComplement

        public static java.lang.CharSequence reverseComplement​(java.lang.CharSequence charSequence)

        Provides a dynamic reverse complement view onto a nucleotide CharSequence. For performance, it is not guaranteed whether the charSequence will be checked for invalid residues. If an invalid nucleotide CharSequence is passed in, arbitrary nondeterministic behaviour may occur at any later time, such as unchecked exceptions thrown from CharSequence.charAt(int).

        It is guaranteed that if charSequence instanceof SequenceCharSequence, the returned value will also be instanceof SequenceCharSequence (but it may not support log-time modifications).

        Attention: Unlike Utils.reverseComplement(String), this method preserves case and doesn't remove gaps. To remove gaps and convert the sequence to upper case, use removeGaps(CharSequence) and CharSequenceUtilities.asUpperCase(CharSequence).

        This method may be slow on sequences which do not contain a T or U near the start of the sequence as it needs to scan through the sequence to determine if it is RNA or DNA. Consider using reverseComplementAsDna(CharSequence) for better performance.
        Parameters:
        charSequence - The charSequence for which to construct a reverse complement view.
        Returns:
        A reverse complement view onto charSequence, with case and gaps preserved.
        See Also:
        reverseComplementAsDna(CharSequence)
      • reverseComplementAsDna

        public static java.lang.CharSequence reverseComplementAsDna​(java.lang.CharSequence charSequence)
        Similar to reverseComplement except that the result will be returned as DNA even if the input sequence is RNA. This implementation is more efficient than reverseComplement(CharSequence) because it does not need to check the input sequence data type.
        Parameters:
        charSequence - The charSequence for which to construct a reverse complement view.
        Returns:
        A reverse complement view onto charSequence, with case and gaps preserved, but RNA converted to DNA
        Since:
        API 4.202100 (Geneious 2021.0.0)
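        A reverse complement view that always yields DNA needs no scan of the input, which is the efficiency point made above. The sketch below illustrates that idea with an assumed IUPAC complement table; it preserves case, passes gaps and unrecognised characters through unchanged, and is not the Geneious implementation.

```java
import java.util.Objects;

final class ReverseComplementViewSketch implements CharSequence {
    // Assumed IUPAC complement table; U is complemented like T, so output is DNA.
    private static final String FROM = "ACGTURYKMBVDHSWN";
    private static final String TO   = "TGCAAYRMKVBHDSWN";

    private final CharSequence forward;

    ReverseComplementViewSketch(CharSequence forward) {
        this.forward = Objects.requireNonNull(forward);
    }

    @Override public int length() {
        return forward.length();
    }

    @Override public char charAt(int index) {
        char c = forward.charAt(forward.length() - 1 - index); // read from the far end
        int i = FROM.indexOf(Character.toUpperCase(c));
        if (i < 0) return c; // gaps and unknown characters are left as-is
        char complement = TO.charAt(i);
        return Character.isLowerCase(c) ? Character.toLowerCase(complement) : complement;
    }

    @Override public CharSequence subSequence(int start, int end) {
        int n = forward.length();
        // A slice of the reversed view corresponds to the mirrored slice of the input.
        return new ReverseComplementViewSketch(forward.subSequence(n - end, n - start));
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) sb.append(charAt(i));
        return sb.toString();
    }
}
```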
      • isPredominantlyRna

        public static boolean isPredominantlyRna​(java.lang.CharSequence charSequence,
                                                 int maximumNonGapsToLookAt)
        Checks whether a sequence is predominantly RNA (rather than DNA). Same as Utils.isPredominantlyRNA(CharSequence, int), but more efficient on SequenceCharSequences.
        Parameters:
        charSequence - A charSequence that may contain DNA or RNA characters.
        maximumNonGapsToLookAt - Maximum number of non-gap residues to look at before making a decision; Pass in Integer.MAX_VALUE to look at all residues
        Returns:
        true if the non-gap residues of charSequence are predominantly RNA
      • isRna

        public static boolean isRna​(java.lang.CharSequence charSequence)
        Checks whether a sequence is RNA (rather than DNA) based on which occurs first in the sequence: a T/t or a U/u. If it contains neither T/t nor U/u, this method returns false.
        Parameters:
        charSequence - A charSequence that may contain DNA or RNA characters.
        Returns:
        true if this charSequence is RNA (rather than DNA)
        See Also:
        isPredominantlyRna(CharSequence, int)
      • isRna

        public static boolean isRna​(java.lang.CharSequence charSequence,
                                    int maxNucleotidesToCheck)
        Checks whether a sequence is RNA (rather than DNA) based on which occurs first in the sequence: a T/t or a U/u. If it contains neither T/t nor U/u, this method returns false.
        Parameters:
        charSequence - A charSequence that may contain DNA or RNA characters.
        maxNucleotidesToCheck - the maximum number of nucleotides/gaps to check (excluding leading/trailing gaps) before giving up and calling it DNA if no T or U is found.
        Returns:
        true if this charSequence is RNA (rather than DNA)
        Since:
        API 4.202200 (Geneious 2022.0.0)
        See Also:
        isPredominantlyRna(CharSequence, int)
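        The "first T or U wins" rule described for both isRna overloads can be sketched in a few lines. This version scans the whole sequence and does not model maxNucleotidesToCheck or the treatment of leading and trailing gaps.

```java
final class IsRnaSketch {
    /** True if a U/u occurs before any T/t; false if a T/t comes first or neither occurs. */
    static boolean isRna(CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'U') return true;
            if (c == 'T') return false;
        }
        return false; // neither T nor U found: treated as DNA
    }
}
```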
      • removeGaps

        public static java.lang.CharSequence removeGaps​(java.lang.CharSequence charSequence)
        Constructs a sequence without gaps ('-') from a specified sequence that potentially has gaps. If the specified sequence does contain gaps, then a gapless copy is returned. Otherwise, the original charSequence is returned. It is guaranteed that CharSequenceUtilities.equals(removeGaps(cs), cs.toString().replace("-", "")) for any CharSequence cs that fulfills its contract.
        Parameters:
        charSequence - A nucleotide or amino acid sequence, potentially with gaps ('-')
        Returns:
        A CharSequence that contains the same sequence of characters but without the gaps ('-'). Returns charSequence if charSequence doesn't contain any gaps.
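        The two guarantees above (same instance when there are no gaps, otherwise equal to replace("-", "")) can be sketched as follows. The copy is only allocated once the first gap is seen; removeGaps here is a plain-Java illustration, not the library method.

```java
final class RemoveGapsSketch {
    static CharSequence removeGaps(CharSequence cs) {
        StringBuilder gapless = null; // stays null until the first gap is found
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (c == '-') {
                if (gapless == null) {
                    // Copy everything before the first gap, then continue appending.
                    gapless = new StringBuilder(cs.length()).append(cs, 0, i);
                }
            } else if (gapless != null) {
                gapless.append(c);
            }
        }
        return gapless == null ? cs : gapless;
    }
}
```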
      • getLeadingGapsLength

        public static int getLeadingGapsLength​(java.lang.CharSequence charSequence)
        Returns the start index of the non-gap regions in the specified charSequence, i.e. the length of the longest prefix of charSequence that contains only '-' characters.
        Parameters:
        charSequence - A CharSequence that may contain some leading gap characters '-'
        Returns:
        The length of the longest prefix of charSequence that contains only '-' characters.
      • getTrailingGapsLength

        public static int getTrailingGapsLength​(java.lang.CharSequence charSequence)
        Get the number of trailing gap ('-') characters in the sequence.
        Parameters:
        charSequence - A CharSequence that may have trailing gap characters.
        Returns:
        the number of trailing gap ('-') characters in the sequence or 0 if the sequence is entirely gaps.
      • getTrailingGapsStartIndex

        public static int getTrailingGapsStartIndex​(java.lang.CharSequence charSequence)
        Returns the end index of the non-gap regions in the specified charSequence. This is identical to charSequence.length() minus the length of the longest suffix of charSequence that consists only of '-', except when charSequence consists only of '-', in which case this method returns charSequence.length() because there are no non-gap regions. In other words, in a sequence that consists only of gaps, all gaps are considered leading rather than trailing gaps, i.e. the non-gap region is considered to start just beyond the end of the sequence.
        Parameters:
        charSequence - A CharSequence that may contain some trailing gap characters '-'
        Returns:
        1+the index of the last nongap character in charSequence, or charSequence.length() if charSequence consists only of gaps
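        The all-gaps edge case is easiest to see in code. The sketch below re-implements the three gap-boundary methods in plain Java according to the descriptions above; the method names are hypothetical.

```java
final class GapBoundsSketch {
    /** Length of the longest prefix consisting only of '-'. */
    static int leadingGapsLength(CharSequence cs) {
        int i = 0;
        while (i < cs.length() && cs.charAt(i) == '-') i++;
        return i;
    }

    /** One past the last non-gap character, or cs.length() if there is none. */
    static int trailingGapsStartIndex(CharSequence cs) {
        int end = cs.length();
        while (end > 0 && cs.charAt(end - 1) == '-') end--;
        // All gaps: they count as leading, so the trailing region is empty.
        return end == 0 ? cs.length() : end;
    }

    /** Number of trailing gaps; 0 when the sequence is entirely gaps. */
    static int trailingGapsLength(CharSequence cs) {
        return cs.length() - trailingGapsStartIndex(cs);
    }
}
```

        For "--AC-G---" the three methods give 2, 6 and 3; for "----" they give 4, 4 and 0.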
      • getAlphabet

        public static SequenceDocument.Alphabet getAlphabet​(SequenceDocument sequence)
        Get the Alphabet of a sequence.
        Parameters:
        sequence - a SequenceDocument to get the alphabet for.
        Returns:
        Alphabet of sequence
      • getSequenceType

        public static jebl.evolution.sequences.SequenceType getSequenceType​(SequenceDocument sequence)
        Get the (jebl) sequence type.
        Parameters:
        sequence - a SequenceDocument to get the sequence type of.
        Returns:
        type of sequence
        Throws:
        java.lang.IllegalArgumentException - if sequence is neither a NucleotideSequenceDocument nor an AminoAcidSequenceDocument.
      • getSequenceType

        public static java.util.List<jebl.evolution.sequences.SequenceType> getSequenceType​(AnnotatedPluginDocument document)
        Examines a document and determines what the (jebl) sequence type (or types) of the document is (or are), and returns it (or them).

        Always returns a List<SequenceType> of size 0, 1 or 2.
        • Size 0: when the given document was a type that could have either SequenceType.AMINO_ACID or SequenceType.NUCLEOTIDE or both, and that document has no sequences at all, for example an empty SequenceListDocument
        • Size 1: when the given document just contains a single sequence or is of a type where the SequenceType is always known, e.g. a NucleotideSequenceDocument or a SequenceAlignmentDocument.
        • Size 2: when the given document has sequences of both types, e.g. a SequenceListDocument with sequences of both types.
        Parameters:
        document - the document to determine the SequenceType of
        Returns:
        a list containing the sequence type or types of the given document.
        Throws:
        java.lang.IllegalArgumentException - if the given document type wasn't a valid type to determine the SequenceType of.
        Since:
        API 4.610 (Geneious 6.1.0)
      • getAlphabet

        public static SequenceDocument.Alphabet getAlphabet​(AnnotatedPluginDocument... documents)
        Parameters:
        documents - the documents to get the alphabet for
        Returns:
        The alphabet that all these documents have in common, or null if they are not all the same alphabet or if any of the documents have multiple alphabets
        Throws:
        java.lang.IllegalArgumentException - if any of the documents aren't a type of sequence (nucleotide, protein, sequence list or alignment)
        Since:
        API 4.1010 (Geneious 10.1.0)
      • toHTMLFragment

        public static java.lang.String toHTMLFragment​(SequenceDocument sequence,
                                                      java.lang.String additionalContent)
        Generates an HTML fragment that summarises a sequence, including the sequence string. If the sequence is longer than a certain threshold X, then only the first X residues are shown.
        Parameters:
        sequence - a SequenceDocument
        additionalContent - additional content to include
        Returns:
        the HTML-formatted summary
      • asJeblSequence

        public static jebl.evolution.sequences.Sequence asJeblSequence​(SequenceDocument sequence)
        Convert from a Geneious sequence to a jebl sequence.
        Parameters:
        sequence - a Geneious sequence
        Returns:
        sequence as a jebl sequence.
      • asJeblSequences

        public static java.util.List<jebl.evolution.sequences.Sequence> asJeblSequences​(java.util.List<SequenceDocument> sequences)
        Convert a set of Geneious sequences to jebl sequences.
        Parameters:
        sequences - Geneious sequences
        Returns:
        the Geneious sequences as jebl sequences
      • asJeblSequences

        public static java.util.List<jebl.evolution.sequences.Sequence> asJeblSequences​(SequenceDocument... sequences)
        Convert a set of Geneious sequences to jebl sequences.
        Parameters:
        sequences - Geneious sequences
        Returns:
        the Geneious sequences as jebl sequences
      • asJeblAlignment

        public static jebl.evolution.alignments.Alignment asJeblAlignment​(java.util.List<SequenceDocument> sequences)
        Convert a list of (aligned) Geneious sequences to a jebl alignment.
        Parameters:
        sequences - aligned Geneious sequences
        Returns:
        the Geneious sequences as a jebl alignment
      • asJeblSequence

        public static jebl.evolution.sequences.Sequence asJeblSequence​(SequenceAlignmentDocument.ReferencedSequence referencedSequence,
                                                                       SequenceDocument sequence)
                                                                throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Convert from a Geneious sequence to a jebl sequence.
        Parameters:
        referencedSequence - original referenced sequence to copy additional fields from. May be null.
        sequence - a Geneious sequence
        Returns:
        sequence as a jebl sequence
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - when the referenced sequence cannot be loaded
        Since:
        API 4.700 (Geneious 7.0.0)
      • replaceQuestionMarksWithMaximalAmbiguitySymbol

        public static java.lang.String replaceQuestionMarksWithMaximalAmbiguitySymbol​(jebl.evolution.sequences.SequenceType sequenceType,
                                                                                      java.lang.String sequence)
        Gets a version of a sequence string with any question marks replaced with N (for nucleotide sequences) or X (for protein sequences).
        Parameters:
        sequenceType - sequence type of sequence
        sequence - sequence string
        Returns:
        version of sequence with any question marks replaced with N (for nucleotide sequences) or X (for protein sequences)
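        In plain Java the replacement amounts to the following. The Type enum is a hypothetical stand-in for jebl's SequenceType so that the sketch is self-contained.

```java
final class MaximalAmbiguitySketch {
    // Hypothetical stand-in for jebl.evolution.sequences.SequenceType.
    enum Type { NUCLEOTIDE, AMINO_ACID }

    /** N for nucleotide sequences, X for protein sequences. */
    static String maximalAmbiguitySymbol(Type type) {
        return type == Type.NUCLEOTIDE ? "N" : "X";
    }

    static String replaceQuestionMarks(Type type, String sequence) {
        return sequence.replace("?", maximalAmbiguitySymbol(type));
    }
}
```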
      • getMaximalAmbiguitySymbol

        public static java.lang.String getMaximalAmbiguitySymbol​(jebl.evolution.sequences.SequenceType sequenceType)
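        Gets the code for the state in this sequence type which represents a base/residue that is completely unknown.
        Parameters:
        sequenceType - the sequence type to get the symbol for
        Returns:
        the code for the completely unknown state of sequenceType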
      • getAnnotationsOfType

        public static java.util.List<SequenceAnnotation> getAnnotationsOfType​(java.util.List<SequenceAnnotation> annotations,
                                                                              java.lang.String type)
        Gets all annotations in the given list that match the given type.
        Parameters:
        annotations - the annotations to search
        type - type of annotations to get
        Returns:
        all annotations in annotations matching the given type
      • createSequenceCopyAdjustedForGapInsertion

        public static SequenceDocument createSequenceCopyAdjustedForGapInsertion​(SequenceDocument sequenceDocument,
                                                                                 java.lang.CharSequence gappedSequenceCharacters)
        Creates a copy of the given sequence with annotations, sequence residues, and chromatogram values adjusted to account for gap insertion. Note, the returned copy does not create gapped versions of SequenceTracks. Tracks are instead automatically propagated from referenced documents in alignments.
        Parameters:
        sequenceDocument - a sequence to insert gaps into. If the sequence already contains gaps, the gaps are removed first
        gappedSequenceCharacters - the sequence characters to appear in the new gapped sequence. The positions of gaps in this character sequence determine how annotations and chromatograms are adjusted.
        Returns:
        a copy of sequenceDocument adjusted for gap insertion. This is always a DefaultSequenceDocument but this method isn't declared to return that for API backwards compatibility reasons
        See Also:
        SequenceExtractionUtilities.removeGaps(com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument, boolean)
      • createSequenceCopyAdjustedForGapInsertion

        public static SequenceDocument createSequenceCopyAdjustedForGapInsertion​(SequenceDocument sequenceDocument,
                                                                                 java.lang.CharSequence gappedSequenceCharacters,
                                                                                 boolean includeTracks)
        Creates a copy of the given sequence with annotations, sequence residues, and chromatogram values adjusted to account for gap insertion.
        Parameters:
        sequenceDocument - a sequence to insert gaps into. If the sequence already contains gaps, the gaps are removed first
        gappedSequenceCharacters - the sequence characters to appear in the new gapped sequence. The positions of gaps in this character sequence determine how annotations and chromatograms are adjusted.
        includeTracks - true if tracks should also be copied. If this is intended for use with an alignment which references the original documents, this should be false as alignment documents propagate tracks on demand from referenced documents.
        Returns:
        a copy of sequenceDocument adjusted for gap insertion. This is always a DefaultSequenceDocument, but the method is not declared to return that type for API backwards compatibility reasons.
        Since:
        API 4.202000 (Geneious 2020.0.0)
        See Also:
        SequenceExtractionUtilities.removeGaps(com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument, boolean)
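
        The effect of gap positions on annotation coordinates can be illustrated with a small standalone sketch. It is not the Geneious implementation: the class GapMappingSketch and its methods are hypothetical, it works on plain character data rather than SequenceDocuments, and it assumes that '-' is the gap character and that intervals are 1-based and inclusive.

```java
// Hypothetical illustration only; not part of the Geneious public API.
final class GapMappingSketch {
    private GapMappingSketch() {}

    // Assumption: '-' denotes a gap in the gapped character sequence.
    private static final char GAP = '-';

    /**
     * Returns an array whose element i is the 0-based position, within the
     * gapped sequence, of the residue at 0-based ungapped position i.
     */
    static int[] ungappedToGappedIndex(java.lang.CharSequence gappedSequenceCharacters) {
        int[] positions = new int[gappedSequenceCharacters.length()];
        int residueCount = 0;
        for (int i = 0; i < gappedSequenceCharacters.length(); i++) {
            if (gappedSequenceCharacters.charAt(i) != GAP) {
                positions[residueCount++] = i;
            }
        }
        // Trim to the number of residues actually found.
        return java.util.Arrays.copyOf(positions, residueCount);
    }

    /**
     * Maps a 1-based inclusive interval on the ungapped sequence to the
     * corresponding 1-based inclusive interval on the gapped sequence.
     */
    static int[] adjustInterval(int[] ungappedToGapped, int from, int to) {
        return new int[] {ungappedToGapped[from - 1] + 1, ungappedToGapped[to - 1] + 1};
    }
}
```

        For the gapped characters AC--GT, an annotation covering ungapped residues 2 to 3 would cover gapped positions 2 to 5 under these assumptions, because the two inserted gaps fall inside the interval.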
      • concatenateSequences

        public static SequenceDocument concatenateSequences​(java.util.List<? extends SequenceDocument> sequences,
                                                            boolean circular,
                                                            int indexOfDocumentToUseForOrigin,
                                                            jebl.util.ProgressListener progressListener)
                                                     throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Concatenates a list of sequence documents. All sequences must be of the same type (all nucleotide or all amino acid). For circular results, indexOfDocumentToUseForOrigin may be used to specify which input sequence determines the origin of the result. If the specified input sequence is circular and has an annotated origin, this position will be used; otherwise, the start of the specified sequence will be the origin of the result. If circular is false, indexOfDocumentToUseForOrigin must be -1.
        Parameters:
        sequences - sequence documents to concatenate
        circular - if true, the result will be circular
        indexOfDocumentToUseForOrigin - index of document to use for the origin (must be -1 for linear results)
        progressListener - for reporting progress and canceling
        Returns:
        concatenated sequence
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Since:
        API 4.1100 (Geneious 11.0.0)
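
        The relationship between circular and indexOfDocumentToUseForOrigin can be sketched on plain strings. This is a simplified illustration, not the library's behaviour: ConcatenationSketch is a hypothetical class, it ignores annotated origins entirely, and it assumes that "origin of the result" means the result starts at the first residue of the chosen input.

```java
// Hypothetical illustration only; not part of the Geneious public API.
final class ConcatenationSketch {
    private ConcatenationSketch() {}

    /**
     * Joins the sequences in order. For circular results with a chosen origin
     * document, the joined sequence is rotated so that it starts at the first
     * residue of that document (an assumption made for this sketch).
     */
    static String concatenate(java.util.List<String> sequences,
                              boolean circular,
                              int indexOfDocumentToUseForOrigin) {
        if (!circular && indexOfDocumentToUseForOrigin != -1) {
            throw new IllegalArgumentException("index must be -1 for linear results");
        }
        if (indexOfDocumentToUseForOrigin < -1 || indexOfDocumentToUseForOrigin >= sequences.size()) {
            throw new IllegalArgumentException("index out of range: " + indexOfDocumentToUseForOrigin);
        }
        StringBuilder joined = new StringBuilder();
        int originOffset = 0;
        for (int i = 0; i < sequences.size(); i++) {
            if (i == indexOfDocumentToUseForOrigin) {
                // Remember where the chosen document starts in the joined sequence.
                originOffset = joined.length();
            }
            joined.append(sequences.get(i));
        }
        // Rotate so the chosen start comes first; a no-op when originOffset is 0.
        return joined.substring(originOffset) + joined.substring(0, originOffset);
    }
}
```

        With the inputs AAA, CC and G, a circular result using index 1 would read CCGAAA in this sketch, while a linear result (index -1) would read AAACCG.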
      • getSequences

        public static java.util.List<? extends SequenceDocument> getSequences​(AnnotatedPluginDocument[] documents,
                                                                              SequenceDocument.Alphabet alphabet,
                                                                              jebl.util.ProgressListener progressListener)
                                                                       throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Gets all the sequences out of a set of AnnotatedPluginDocuments that may wrap SequenceDocuments, SequenceListDocuments or SequenceAlignmentDocuments. Large sequence lists (SequenceListOnDisk) and genome-sized sequences (those longer than SequenceDocument.GENOME_SEQUENCE_THRESHOLD) in other sequence lists are only loaded into memory on demand to ensure this method doesn't use excessive memory. If this method may be called on thousands of documents, consider getSequencesWithoutImmediateLoading instead.
        Parameters:
        documents - documents to get the sequences out of
        alphabet - the alphabet a sequence must have in order to be included
        progressListener - for notifying the caller about progress of this method and for cancelling.
        Returns:
        all the sequences. Sequences are ordered by the AnnotatedPluginDocument they are in, and then by their index in that document.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if there is a problem getting the PluginDocument out of an AnnotatedPluginDocument or if the progress listener cancels the request.
      • getSequences

        public static java.util.List<? extends SequenceDocument> getSequences​(java.util.List<AnnotatedPluginDocument> documents,
                                                                              SequenceDocument.Alphabet alphabet,
                                                                              jebl.util.ProgressListener progressListener)
                                                                       throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Gets all the sequences out of a set of AnnotatedPluginDocuments that may wrap SequenceDocuments, SequenceListDocuments or SequenceAlignmentDocuments. Large sequence lists (SequenceListOnDisk) and genome-sized sequences (those longer than SequenceDocument.GENOME_SEQUENCE_THRESHOLD) in other sequence lists are only loaded into memory on demand to ensure this method doesn't use excessive memory. If this method may be called on thousands of documents, consider getSequencesWithoutImmediateLoading instead.
        Parameters:
        documents - documents to get the sequences out of
        alphabet - the alphabet a sequence must have in order to be included
        progressListener - for notifying the caller about progress of this method and for cancelling.
        Returns:
        all the sequences
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if there is a problem getting the PluginDocument out of an AnnotatedPluginDocument or if the progress listener cancels the request.
        Since:
        API 4.700 (Geneious 7.0.0)
      • getOriginalIndex

        public static int getOriginalIndex​(SequenceDocument sequence,
                                           int index)
        Gets the original numbering of the given index if it is covered by a SequenceAnnotation.TYPE_EXTRACTED_REGION annotation.
        Parameters:
        sequence - the sequence this index belongs to.
        index - the index to get the original numbering for.
        Returns:
        'translated' index or the original index if no other numbering can be found.
        Since:
        API 4.900 (Geneious 9.0.0)
      • getNumberOfSequences

        public static long getNumberOfSequences​(java.util.List<AnnotatedPluginDocument> documents,
                                                SequenceDocument.Alphabet alphabet)
        Gets the total number of nucleotide or amino acid sequences contained in the given documents which may be individual sequences, sequence lists, or alignments/contigs.
        Parameters:
        documents - the documents to get the number of sequences in
        alphabet - the alphabet (nucleotide or amino acid) of the sequences to count.
        Returns:
        the total number of nucleotide or amino acid sequences contained in the given documents
        Since:
        API 4.40 (Geneious 5.4.0)
      • getNumberOfSequences

        public static long getNumberOfSequences​(AnnotatedPluginDocument document,
                                                SequenceDocument.Alphabet alphabet)
        Gets the total number of nucleotide or amino acid sequences contained in the given document which may be an individual sequence, sequence list, or alignment/contig.
        Parameters:
        document - the document to get the number of sequences in
        alphabet - the alphabet (nucleotide or amino acid) of the sequences to count.
        Returns:
        the total number of nucleotide or amino acid sequences contained in the given document
        Since:
        API 4.40 (Geneious 5.4.0)
      • generateConsensus

        public static SequenceDocument generateConsensus​(SequenceAlignmentDocument alignment,
                                                         jebl.util.ProgressListener progressListener)
                                                  throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Generates a consensus sequence for an alignment using default consensus settings. Note that the returned sequence may contain gaps. If it is to be used as a stand-alone sequence, then SequenceExtractionUtilities.removeGaps(com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument) should be used.

        To generate consensus sequences with non-default options, use PluginUtilities.getDocumentOperation("Generate_Consensus"). Note that this operation generates a sequence with gaps removed by default.

        Parameters:
        alignment - the alignment to generate the consensus sequence for
        progressListener - for reporting progress and cancelling.
        Returns:
        a sequence equal in length to the alignment. The sequence may contain gaps. Never null.
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if the consensus can't be generated because there is insufficient free memory.
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException.Canceled - if the progressListener requests the consensus generation be cancelled.
        Since:
        API 4.610 (Geneious 6.1.0)
      • getBlastAlignmentText

        public static java.lang.String getBlastAlignmentText​(SequenceAlignmentDocument alignment,
                                                             boolean geneiousFriendly)
        Formats the given alignment in BLAST text format
        Parameters:
        alignment - alignment to format
        geneiousFriendly - whether to format the alignment in an html-formatted "Geneious friendly" way that is useful generally for alignments and not just for BLAST output
        Returns:
        alignment represented in BLAST text format
        Since:
        API 4.700 (Geneious 7.0.0)
      • alignmentFromJeblSequences

        public static DefaultAlignmentDocument alignmentFromJeblSequences​(java.lang.String name,
                                                                          java.util.List<jebl.evolution.sequences.Sequence> jeblSequences)
        Converts the given alignment of Jebl sequences into a DefaultAlignmentDocument
        Parameters:
        name - name for alignment
        jeblSequences - aligned jebl sequences
        Returns:
        a DefaultAlignmentDocument representing the given alignment.
        Since:
        API 4.700 (Geneious 7.0.0)
      • createNewDocumentsByTransformingSequences

        public static java.util.List<AnnotatedPluginDocument> createNewDocumentsByTransformingSequences​(java.util.List<AnnotatedPluginDocument> sourceDocuments,
                                                                                                        SequenceDocument.Transformer transformer,
                                                                                                        jebl.util.ProgressListener progressListener,
                                                                                                        java.lang.String newSequenceOrDocumentNamePrefix)
                                                                                                 throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Transforms the sequence(s) in each input document and returns a new document corresponding to each input document.
        Parameters:
        sourceDocuments - the source documents containing sequences to transform. These may be SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
        transformer - the transformer for transforming each sequence
        progressListener - for reporting progress and canceling
        newSequenceOrDocumentNamePrefix - an optional prefix to assign to the name of each newly generated document. May be an empty String to leave names unchanged.
        Returns:
        the new documents
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if documents can't be loaded, or if the input documents are not SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
        Since:
        API 4.701 (Geneious 7.0.1)
      • createNewDocumentsByTransformingSequences

        public static java.util.List<AnnotatedPluginDocument> createNewDocumentsByTransformingSequences​(java.util.List<AnnotatedPluginDocument> sourceDocuments,
                                                                                                        SequenceDocument.Transformer transformer,
                                                                                                        jebl.util.ProgressListener progressListener,
                                                                                                        java.lang.String newSequenceOrDocumentNamePrefix,
                                                                                                        java.lang.String newSequenceOrDocumentNameSuffix)
                                                                                                 throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
        Transforms the sequence(s) in each input document and returns a new document corresponding to each input document.
        Parameters:
        sourceDocuments - the source documents containing sequences to transform. These may be SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
        transformer - the transformer for transforming each sequence
        progressListener - for reporting progress and canceling
        newSequenceOrDocumentNamePrefix - an optional prefix to assign to the name of each newly generated document. May be an empty String to leave names unchanged.
        newSequenceOrDocumentNameSuffix - an optional suffix to assign to the name of each newly generated document. May be an empty String to leave names unchanged.
        Returns:
        the new documents
        Throws:
        com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if documents can't be loaded, or if the input documents are not SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
        Since:
        API 4.201920 (Geneious 2019.2.0)
      • getIntervalBasedOnExtractionAnnotation

        public static SequenceAnnotationInterval getIntervalBasedOnExtractionAnnotation​(SequenceDocument sequenceDocument,
                                                                                        SequenceAnnotationInterval interval,
                                                                                        boolean mapToOriginal)
        Gets the extraction annotations from the sequence document and maps the interval to either the original sequence or the result sequence, depending on the value of mapToOriginal
        Parameters:
        sequenceDocument - the document to get the extractionAnnotations from
        interval - the interval to re-map
        mapToOriginal - true to map this interval to the corresponding region on the original sequence, false to map it to the corresponding region on the result sequence
        Returns:
        a new interval that represents the given interval on either the original or result document, or the given interval unchanged if no mapping can be found
        Since:
        API 4.1000 (Geneious 10.0.0)
      • getIndexBasedOnExtractionAnnotation

        public static java.lang.Integer getIndexBasedOnExtractionAnnotation​(SequenceDocument sequenceDocument,
                                                                            int index,
                                                                            boolean mapToOriginal)
        Gets the extraction annotations from the sequence document and maps a residue index to a residue index on either the original sequence or the result sequence, depending on the value of mapToOriginal
        Parameters:
        sequenceDocument - the document to get the extractionAnnotations from
        index - the 1-based residue position in the sequence to re-map.
        mapToOriginal - true to map this index to the corresponding position on the original sequence, false to map it to the corresponding position on the result sequence
        Returns:
        a new index that represents the given index on either the original or result document, or null if the index can't be mapped.
        Since:
        API 4.1000 (Geneious 10.0.0)
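
        The two mapping directions can be illustrated with a standalone sketch. It is a simplification under stated assumptions, not the library's logic: ExtractionMappingSketch is a hypothetical class, it models a single forward-strand extracted region described only by its 1-based inclusive start and end on the original sequence, and it does not read any SequenceAnnotation.

```java
// Hypothetical illustration only; not part of the Geneious public API.
final class ExtractionMappingSketch {
    private ExtractionMappingSketch() {}

    /**
     * Maps a 1-based residue index between an extracted (result) sequence and
     * the original it was extracted from.
     *
     * @param originalStart 1-based start of the extracted region on the original
     * @param originalEnd   1-based inclusive end of the extracted region on the original
     * @param index         index on the result if mapToOriginal is true, otherwise on the original
     * @param mapToOriginal direction of the mapping
     * @return the mapped index, or null if the index lies outside the extracted region
     */
    static java.lang.Integer mapIndex(int originalStart, int originalEnd, int index, boolean mapToOriginal) {
        int extractedLength = originalEnd - originalStart + 1;
        if (mapToOriginal) {
            if (index < 1 || index > extractedLength) {
                return null; // not a valid position on the result
            }
            return originalStart + index - 1;
        }
        if (index < originalStart || index > originalEnd) {
            return null; // position on the original was not part of the extraction
        }
        return index - originalStart + 1;
    }
}
```

        For a region extracted from positions 101 to 200 of the original, result index 1 maps to original index 101, and original index 150 maps to result index 50. An original index of 250 falls outside the region, so the sketch returns null, mirroring the null return described above.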