Class SequenceUtilities

java.lang.Object
com.biomatters.geneious.publicapi.utilities.SequenceUtilities

public final class SequenceUtilities extends Object
A noninstantiable class providing static methods for common tasks associated with nucleotide and protein sequences.
  • Method Details

    • getForwardRegexForSequence

      public static String getForwardRegexForSequence(CharSequence querySequence, SequenceType sequenceType, boolean interpretAmbiguitiesInQuery)
    • getForwardRegexForSequence

      @Deprecated public static String getForwardRegexForSequence(CharSequence querySequence, SequenceType sequenceType, boolean interpretAmbiguitiesInQuery, boolean interpretAmbiguitiesInTarget)
    • getForwardRegexForSequence

      public static String getForwardRegexForSequence(CharSequence querySequence, SequenceType sequenceType, boolean interpretAmbiguitiesInQuery, boolean interpretAmbiguitiesInTarget, boolean allowExtraGapsInTarget)
      Given a nucleotide or amino acid sequence, returns a regular expression that matches forward occurrences of this sequence in a larger sequence, i.e. a String s such that Pattern.compile(s, Pattern.CASE_INSENSITIVE) will find all case-insensitive forward matches of querySequence in a larger sequence. The returned regular expression also matches sequences with gaps inserted at any point within the sequence.
      Parameters:
      querySequence - The nucleotide or amino acid sequence to search for.
      sequenceType - The type of the sequence
      interpretAmbiguitiesInQuery - If true, then an ambiguous character (e.g. R for nucleotides) in querySequence will match the corresponding canonical states (A and G) in the target.
      interpretAmbiguitiesInTarget - If true, then an ambiguous character (e.g. R for nucleotides) in the sequence being searched within will match the corresponding canonical states (A and G) in the querySequence.
      allowExtraGapsInTarget - If true, then additional gaps will be allowed in the sequence being searched within.
      Returns:
      a regular expression that matches forward occurrences of this search string in a larger sequence string or null if any of the characters in the sequence string are not valid residues for sequenceType
      Since:
      API 4.610 (Geneious 6.1.0)
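      As a rough illustration of the idea described above (not the regular expression Geneious actually produces), the following plain-Java sketch builds a pattern for a nucleotide query in which a few IUPAC ambiguity codes expand to character classes and gaps are tolerated between residues. The class ForwardRegexSketch, its method and its partial ambiguity table are assumptions made for this example and are not part of the API.

```java
import java.util.Map;
import java.util.regex.Pattern;

final class ForwardRegexSketch {
    // Hypothetical, partial IUPAC table: ambiguity code -> canonical states.
    private static final Map<Character, String> AMBIGUITIES = Map.of(
            'R', "AG", 'Y', "CT", 'S', "CG", 'W', "AT",
            'K', "GT", 'M', "AC", 'N', "ACGT");

    /** Returns a regex for the query, or null if a character is not recognised. */
    static String forwardRegex(CharSequence query, boolean interpretAmbiguitiesInQuery) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = Character.toUpperCase(query.charAt(i));
            String states = AMBIGUITIES.get(c);
            if (i > 0) {
                regex.append("-*"); // tolerate gaps between consecutive residues
            }
            if ("ACGT".indexOf(c) >= 0) {
                regex.append(c);
            } else if (states != null) {
                // Expand the code to its canonical states (plus itself), or match it literally.
                regex.append(interpretAmbiguitiesInQuery ? "[" + states + c + "]" : String.valueOf(c));
            } else {
                return null; // mirrors the documented "null for invalid residues" contract
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = forwardRegex("ARC", true);
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        System.out.println(regex);
        System.out.println(pattern.matcher("ttA-gCtt").find());
    }
}
```

      Compiling the result with Pattern.CASE_INSENSITIVE, as the description requires, is what lets the upper-case pattern match lower-case residues in the target.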
    • getForwardRegexPatternForSequence

      public static Pattern getForwardRegexPatternForSequence(CharSequence sequence, SequenceType sequenceType, boolean interpretAmbiguitiesInQuery)
    • getForwardRegexPatternForSequence

      public static Pattern getForwardRegexPatternForSequence(CharSequence querySequence, SequenceType sequenceType, boolean interpretAmbiguitiesInQuery, boolean interpretAmbiguitiesInTarget)
      Given a nucleotide or amino acid sequence string, returns a regular expression pattern that matches forward occurrences of this search string in a larger sequence string.
      Parameters:
      querySequence - The nucleotide or amino acid sequence to search for.
      sequenceType - The type of the sequence
      interpretAmbiguitiesInQuery - If true, then an ambiguous character (e.g. R for nucleotides) in querySequence will match the corresponding canonical states (A and G) in the target.
      Returns:
      a regular expression that matches forward occurrences of this search string in a larger sequence string or null if any of the characters in the sequence string are not valid residues for sequenceType
    • isStateAssignableFrom

      public static boolean isStateAssignableFrom(State stateA, State stateB)
      Same as stateA.getCanonicalStates().containsAll(stateB.getCanonicalStates()) except that for NucleotideStates and AminoAcidStates it caches the result.
      Parameters:
      stateA - A state (e.g. a NucleotideState or AminoAcidState)
      stateB - A state of the same type as stateA
      Returns:
      true if stateA.getCanonicalStates().containsAll(stateB.getCanonicalStates())
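      The containment test above can be pictured with plain Java sets. This is only a sketch: the class StateAssignableSketch and its small table of canonical states are hypothetical stand-ins for State.getCanonicalStates(), and the caching mentioned in the description is omitted.

```java
import java.util.Map;
import java.util.Set;

final class StateAssignableSketch {
    // Hypothetical stand-in for State.getCanonicalStates(): code -> canonical nucleotides.
    private static final Map<Character, Set<Character>> CANONICAL = Map.of(
            'A', Set.of('A'),
            'C', Set.of('C'),
            'G', Set.of('G'),
            'T', Set.of('T'),
            'R', Set.of('A', 'G'),
            'N', Set.of('A', 'C', 'G', 'T'));

    /** True if every canonical state of b is also a canonical state of a. */
    static boolean isAssignableFrom(char a, char b) {
        // Codes missing from the table are not handled in this sketch.
        return CANONICAL.get(a).containsAll(CANONICAL.get(b));
    }

    public static void main(String[] args) {
        System.out.println(isAssignableFrom('R', 'A')); // R covers A
        System.out.println(isAssignableFrom('A', 'R')); // A does not cover R
        System.out.println(isAssignableFrom('N', 'R')); // N covers both states of R
    }
}
```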
    • createSequenceDocument

      public static DefaultSequenceDocument createSequenceDocument(SequenceType sequenceType, String name, String description, CharSequence sequenceString, Date creationDate)
      Creates a DefaultNucleotideSequence or DefaultAminoAcidSequence depending on sequenceType. See the documentation of these classes' constructors for the semantics of the parameters.
    • setOriginalResidueNumbering

      public static void setOriginalResidueNumbering(EditableSequenceDocument document, int startIndex, boolean isReverse)
      Sets the original residue numbering of a document. The residue index of the document will appear shifted if the user has "show original residue numbers" selected in the sequence view.
      Parameters:
      document - document to set the residue numbering for
      startIndex - start index for the residue numbering. The first original residue is residue 1.
      isReverse - true if the residue numbering should count down from startIndex, false if it should count up.
    • containsInvalidResidues

      public static String containsInvalidResidues(SequenceDocument sequenceDocument, boolean allowGaps, boolean fastIncompleteCheck)
      Checks if a sequence contains invalid sequence residues.
      Parameters:
      sequenceDocument - the sequence to check for validity.
      allowGaps - true if the sequence is allowed to contain gaps
      fastIncompleteCheck - This parameter is ignored. It was added when Java 5, which is 10 times slower than Java 6, was widely used and checking enormous sequences was slow (a 2GB sequence takes about 20 seconds in Java 5, 2 seconds in Java 6). When it was honoured, setting it to true checked only the first and last 1,000,000 residues, which catches almost all invalid cases and is much faster on enormous sequences.
      Returns:
      null if the sequence residues are all valid, or if the sequence contains invalid residues a message describing the first invalid residue is returned.
    • getSequenceType

      public static SequenceType getSequenceType(SequenceDocument.Alphabet alphabet)
      Gets a jebl library SequenceType that is equivalent to a Geneious alphabet.
      Parameters:
      alphabet - the Geneious alphabet to convert
      Returns:
      sequence type that is equivalent to this alphabet
    • getAlphabet

      public static SequenceDocument.Alphabet getAlphabet(SequenceType sequenceType)
      Gets a Geneious alphabet type that is equivalent to a jebl library SequenceType.
      Parameters:
      sequenceType - the jebl SequenceType to convert
      Returns:
      alphabet that is equivalent to this sequence type
    • containsInvalidResidues

      public static String containsInvalidResidues(CharSequence sequenceResidues, SequenceDocument.Alphabet alphabet, boolean allowGaps, boolean fastIncompleteCheck)
      Checks if a sequence contains invalid sequence residues.
      Parameters:
      sequenceResidues - sequence residues to check for validity.
      alphabet - the alphabet of residues expected to be in sequenceResidues
      allowGaps - true if the sequence is allowed to contain gaps
      fastIncompleteCheck - This parameter is ignored. It was added when Java 5, which is 10 times slower than Java 6, was widely used and checking enormous sequences was slow (a 2GB sequence takes about 20 seconds in Java 5, 2 seconds in Java 6). When it was honoured, setting it to true checked only the first and last 1,000,000 residues, which catches almost all invalid cases and is much faster on enormous sequences.
      Returns:
      null if the sequence residues are all valid, or if the sequence contains invalid residues a message describing the first invalid residue is returned.
    • containsInvalidResidues

      public static String containsInvalidResidues(CharSequence sequenceResidues, SequenceType sequenceType, boolean allowGaps, boolean fastIncompleteCheck)
      Checks if a sequence contains invalid sequence residues.
      Parameters:
      sequenceResidues - sequence residues to check for validity.
      sequenceType - the type of residues expected to be in sequenceResidues
      allowGaps - true if the sequence is allowed to contain gaps
      fastIncompleteCheck - This parameter is ignored. It was added when Java 5, which is 10 times slower than Java 6, was widely used and checking enormous sequences was slow (a 2GB sequence takes about 20 seconds in Java 5, 2 seconds in Java 6). When it was honoured, setting it to true checked only the first and last 1,000,000 residues, which catches almost all invalid cases and is much faster on enormous sequences.
      Returns:
      null if the sequence residues are all valid, or if the sequence contains invalid residues a message describing the first invalid residue is returned.
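      The "null means valid, otherwise a message" contract shared by the containsInvalidResidues overloads might be sketched as follows. The class InvalidResidueCheckSketch, its assumed set of valid nucleotide characters and the wording of the message are all illustrative and are not taken from Geneious.

```java
final class InvalidResidueCheckSketch {
    // Assumed set of valid nucleotide characters for this sketch (compared in upper case).
    private static final String VALID_NUCLEOTIDES = "ACGTURYSWKMBDHVN";

    /** Returns null if all residues are valid, otherwise a message about the first invalid one. */
    static String containsInvalidResidues(CharSequence residues, boolean allowGaps) {
        for (int i = 0; i < residues.length(); i++) {
            char c = residues.charAt(i);
            if (c == '-') {
                if (allowGaps) {
                    continue; // gaps are acceptable when allowGaps is true
                }
            } else if (VALID_NUCLEOTIDES.indexOf(Character.toUpperCase(c)) >= 0) {
                continue;
            }
            // Hypothetical message format, reporting a 1-based position.
            return "Invalid residue '" + c + "' at position " + (i + 1);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(containsInvalidResidues("ACGT", false));
        System.out.println(containsInvalidResidues("AC-T", false));
    }
}
```

      Callers would typically test the result against null rather than parse the message.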
    • removeInvalidResidues

      public static CharSequence removeInvalidResidues(CharSequence sequence, SequenceType sequenceType, boolean allowGaps)
      Get a sequence string identical to sequence except that any invalid residues are removed. Gaps are only removed if allowGaps is false. All valid characters remain unchanged (they maintain their original case and there are no U->T replacements for nucleotides.)
      Parameters:
      sequence - a string of residues that may or may not be valid residues
      sequenceType - the type of residues in sequence
      allowGaps - if this is true, then gaps are not removed.
      Returns:
      a sequence string identical to sequence except that any invalid residues are removed. If there are no invalid residues, sequence is returned.
      Throws:
      OutOfMemoryError - if a sequence contains invalid residues and a valid version of the sequence cannot fit in memory
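      A minimal sketch of the behaviour described above, assuming a fixed set of valid nucleotide characters: a copy is only allocated once the first invalid residue is found, so an already valid input is returned unchanged, with case preserved and no U->T replacement. RemoveInvalidResiduesSketch and its helper are hypothetical names.

```java
final class RemoveInvalidResiduesSketch {
    // Assumed set of valid nucleotide characters for this sketch (compared in upper case).
    private static final String VALID_NUCLEOTIDES = "ACGTURYSWKMBDHVN";

    static boolean isValid(char c, boolean allowGaps) {
        return c == '-' ? allowGaps : VALID_NUCLEOTIDES.indexOf(Character.toUpperCase(c)) >= 0;
    }

    static CharSequence removeInvalidResidues(CharSequence sequence, boolean allowGaps) {
        StringBuilder cleaned = null; // allocated lazily, only once an invalid residue is seen
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (isValid(c, allowGaps)) {
                if (cleaned != null) {
                    cleaned.append(c);
                }
            } else if (cleaned == null) {
                // Copy the valid prefix seen so far and skip the invalid residue.
                cleaned = new StringBuilder(sequence.length()).append(sequence, 0, i);
            }
        }
        return cleaned == null ? sequence : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(removeInvalidResidues("AC#G-T", true));
        System.out.println(removeInvalidResidues("AC#G-T", false));
    }
}
```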
    • getValidSequence

      public static CharSequence getValidSequence(SequenceDocument sequenceDocument, boolean allowGaps)
      Replace any invalid bases/residues in the given sequence document with ambiguity symbols.
      Parameters:
      sequenceDocument - sequence document to replace the invalid bases in
      allowGaps - whether gaps are allowed (if false they will be replaced with ambiguity symbols)
      Returns:
      the version of the sequence string with the invalid bases/residues replaced with ambiguity symbols
    • getValidSequence

      public static CharSequence getValidSequence(SequenceDocument sequenceDocument, boolean allowGaps, boolean replaceWithGaps)
      Replace any invalid bases/residues in the given sequence document with ambiguity symbols or gaps.
      Parameters:
      sequenceDocument - sequence document to replace the invalid bases in
      allowGaps - whether gaps are allowed (if false they will be replaced with ambiguity symbols)
      replaceWithGaps - whether invalid bases/residues should be replaced with gaps - should only be done if sequence is in an alignment
      Returns:
      the version of the sequence string with the invalid bases/residues replaced with ambiguity symbols or gaps
      Throws:
      IllegalArgumentException - if allowGaps is false but replaceWithGaps is true
      Since:
      API 4.20 (Geneious 5.2.0)
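      The interaction between allowGaps and replaceWithGaps can be sketched in plain Java as below. This is an illustration under assumptions, not the library code: it works on a bare CharSequence of nucleotides, uses N as the ambiguity symbol, and the class name ValidSequenceSketch and its valid-character set are made up for the example.

```java
final class ValidSequenceSketch {
    // Assumed set of valid nucleotide characters for this sketch (compared in upper case).
    private static final String VALID_NUCLEOTIDES = "ACGTURYSWKMBDHVN";

    static CharSequence getValidSequence(CharSequence sequence, boolean allowGaps, boolean replaceWithGaps) {
        if (!allowGaps && replaceWithGaps) {
            // Replacing with gaps makes no sense when gaps themselves are not allowed.
            throw new IllegalArgumentException("replaceWithGaps requires allowGaps");
        }
        char replacement = replaceWithGaps ? '-' : 'N';
        StringBuilder result = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            boolean valid = c == '-'
                    ? allowGaps
                    : VALID_NUCLEOTIDES.indexOf(Character.toUpperCase(c)) >= 0;
            result.append(valid ? c : replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getValidSequence("AC#-T", true, false));
        System.out.println(getValidSequence("AC#-T", false, false));
        System.out.println(getValidSequence("AC#-T", true, true));
    }
}
```

      Unlike removeInvalidResidues, this keeps the sequence length unchanged, which is what makes the gap replacement suitable for sequences inside an alignment.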
    • asRna

      public static CharSequence asRna(CharSequence nucleotideCharSequence)
      Views an underlying (nucleotide) CharSequence as RNA by dynamically translating 'T's to 'U's and 't's to 'u's. It is guaranteed that if nucleotideCharSequence instanceof SequenceCharSequence, the returned value will also be instanceof SequenceCharSequence (but it may not support log-time modifications).
      Parameters:
      nucleotideCharSequence - A nucleotide sequence which may already have some RNA residues; the sequence is not guaranteed to be checked for invalid residues. Must not be null.
      Returns:
      A CharSequence with the same sequence of characters as nucleotideCharSequence, except that 'T's are replaced with 'U's and 't's are replaced with 'u's
    • asDna

      public static CharSequence asDna(CharSequence nucleotideCharSequence)
      Views an underlying (nucleotide) CharSequence as DNA by dynamically translating 'U's to 'T's and 'u's to 't's. It is guaranteed that if nucleotideCharSequence instanceof SequenceCharSequence, the returned value will also be instanceof SequenceCharSequence (but it may not support log-time modifications).
      Parameters:
      nucleotideCharSequence - A nucleotide sequence which may already have some DNA residues; the sequence is not guaranteed to be checked for invalid residues. Must not be null.
      Returns:
      A CharSequence with the same sequence of characters as nucleotideCharSequence, except that 'U's are replaced with 'T's and 'u's are replaced with 't's
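      "Dynamically translating" means that no converted copy is built up front; each character is converted when it is read. The following sketch shows one way such a view could look for the asRna direction (the asDna direction would swap the mapping). RnaViewSketch is a hypothetical class, not the type returned by the API.

```java
import java.util.Objects;

final class RnaViewSketch implements CharSequence {
    private final CharSequence nucleotides;

    RnaViewSketch(CharSequence nucleotides) {
        this.nucleotides = Objects.requireNonNull(nucleotides);
    }

    @Override
    public int length() {
        return nucleotides.length();
    }

    @Override
    public char charAt(int index) {
        // Translate on access; the underlying sequence is never copied or modified.
        char c = nucleotides.charAt(index);
        return c == 'T' ? 'U' : c == 't' ? 'u' : c;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new RnaViewSketch(nucleotides.subSequence(start, end));
    }

    @Override
    public String toString() {
        StringBuilder text = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            text.append(charAt(i));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(new RnaViewSketch("ACGTt-u"));
    }
}
```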
    • asTranslation

      @Deprecated public static CharSequence asTranslation(CharSequence nucleotideCharSequence, GeneticCode geneticCode)
    • asTranslation

      public static CharSequence asTranslation(CharSequence nucleotideCharSequence, GeneticCode geneticCode, boolean translateFirstCodonUsingFirstCodonTable)
      Views an underlying (nucleotide) CharSequence as its translation. If the length of the CharSequence is not a multiple of 3, the extra 1 or 2 characters are ignored. The translated sequence will have length nucleotideCharSequence.length()/3. If the nucleotide sequence contains unknown nucleotide characters, these are treated as unknown states and the corresponding translated site will also be the unknown state (?) unless the nucleotide base would not affect the translation (e.g. the 3rd base in some triplets). The concrete type of the return value is not guaranteed. nucleotideCharSequence must not change after it has been passed to this method, but it is not guaranteed that violations of this contract will be detected.
      Parameters:
      nucleotideCharSequence - A nucleotide sequence which may be dna, rna or a mixture. Must not be null and must be immutable. Must not contain gaps.
      geneticCode - the genetic code to use for the translation. Must not be null.
      translateFirstCodonUsingFirstCodonTable - each genetic code specifies a set of codons which get translated as M if they are the first codon, even though they normally wouldn't translate as an M when occurring elsewhere in a coding region. If this parameter is true, the first codon will be translated using this alternative translation table for the genetic code.
      Returns:
      A CharSequence which is a translation of nucleotideCharSequence
      Throws:
      IllegalArgumentException - if nucleotideCharSequence contains gaps.
      NullPointerException - if nucleotideCharSequence or geneticCode is null.
      Since:
      API 4.41 (Geneious 5.4.1)
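      The length rule and the handling of unknown bases can be illustrated with a small standalone translator. This sketch makes several assumptions: it hard-codes the standard genetic code instead of taking a GeneticCode, ignores the first-codon table, treats every character other than A, C, G, T and U as fully unknown, and returns a String rather than a view. TranslationSketch is a hypothetical class.

```java
final class TranslationSketch {
    private static final String BASES = "TCAG";
    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
    private static final String STANDARD_CODE =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // The bases a character may stand for; unrecognised characters may be any base.
    private static String candidates(char nucleotide) {
        char c = Character.toUpperCase(nucleotide);
        if (c == 'U') {
            c = 'T';
        }
        return BASES.indexOf(c) >= 0 ? String.valueOf(c) : BASES;
    }

    private static char translateCodon(char first, char second, char third) {
        char aminoAcid = 0;
        for (char a : candidates(first).toCharArray()) {
            for (char b : candidates(second).toCharArray()) {
                for (char c : candidates(third).toCharArray()) {
                    int index = 16 * BASES.indexOf(a) + 4 * BASES.indexOf(b) + BASES.indexOf(c);
                    char candidate = STANDARD_CODE.charAt(index);
                    if (aminoAcid == 0) {
                        aminoAcid = candidate;
                    } else if (aminoAcid != candidate) {
                        return '?'; // the unknown base changes the translation
                    }
                }
            }
        }
        return aminoAcid;
    }

    static String translate(CharSequence nucleotides) {
        StringBuilder protein = new StringBuilder(nucleotides.length() / 3);
        // Trailing 1 or 2 nucleotides that do not form a full codon are ignored.
        for (int i = 0; i + 2 < nucleotides.length(); i += 3) {
            if (nucleotides.charAt(i) == '-' || nucleotides.charAt(i + 1) == '-'
                    || nucleotides.charAt(i + 2) == '-') {
                throw new IllegalArgumentException("gaps are not allowed");
            }
            protein.append(translateCodon(
                    nucleotides.charAt(i), nucleotides.charAt(i + 1), nucleotides.charAt(i + 2)));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGCNTAA"));
    }
}
```

      In this sketch GCN still translates to A because all four GC? codons encode alanine, which is the "3rd base in some triplets" case mentioned above.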
    • reverseComplement

      public static CharSequence reverseComplement(CharSequence charSequence)

      Provides a dynamic reverse complement view onto a nucleotide CharSequence. For performance, it is not guaranteed whether the charSequence will be checked for invalid residues. If an invalid nucleotide CharSequence is passed in, arbitrary nondeterministic behaviour may occur at any later time, such as unchecked exceptions thrown from CharSequence.charAt(int).

      It is guaranteed that if charSequence instanceof SequenceCharSequence, the returned value will also be instanceof SequenceCharSequence (but it may not support log-time modifications).

      Attention: Unlike Utils.reverseComplement(String), this method preserves case and doesn't remove gaps. To remove gaps and convert the sequence to upper case, use removeGaps(CharSequence) and CharSequenceUtilities.asUpperCase(CharSequence).

      This method may be slow on sequences which do not contain a T or U near the start of the sequence as it needs to scan through the sequence to determine if it is RNA or DNA. Consider using reverseComplementAsDna(CharSequence) for better performance.
      Parameters:
      charSequence - The charSequence for which to construct a reverse complement view.
      Returns:
      A reverse complement view onto charSequence, with case and gaps preserved.
    • reverseComplementAsDna

      public static CharSequence reverseComplementAsDna(CharSequence charSequence)
      Similar to reverseComplement except that the result will be returned as DNA even if the input sequence is RNA. This implementation is more efficient than reverseComplement(CharSequence) because it does not need to check the input sequence data type.
      Parameters:
      charSequence - The charSequence for which to construct a reverse complement view.
      Returns:
      A reverse complement view onto charSequence, with case and gaps preserved, but RNA converted to DNA
      Since:
      API 4.202100 (Geneious 2021.0.0)
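      A reverse complement view that always answers in DNA, preserving case and gaps, could be sketched as follows. The class ReverseComplementSketch and its complement table are assumptions for illustration; the real implementation and its handling of invalid characters may differ.

```java
import java.util.Objects;

final class ReverseComplementSketch implements CharSequence {
    // Complement table in upper case; U is complemented like T, so the result is DNA.
    private static final String FROM = "ACGTURYKMBVDH";
    private static final String TO = "TGCAAYRMKVBHD";

    private final CharSequence source;

    ReverseComplementSketch(CharSequence source) {
        this.source = Objects.requireNonNull(source);
    }

    private static char complement(char c) {
        int index = FROM.indexOf(Character.toUpperCase(c));
        if (index < 0) {
            return c; // gaps and self-complementary codes (S, W, N) are left unchanged
        }
        char result = TO.charAt(index);
        return Character.isLowerCase(c) ? Character.toLowerCase(result) : result;
    }

    @Override
    public int length() {
        return source.length();
    }

    @Override
    public char charAt(int index) {
        // Read the source from the far end and complement on access.
        return complement(source.charAt(source.length() - 1 - index));
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        int length = source.length();
        return new ReverseComplementSketch(source.subSequence(length - end, length - start));
    }

    @Override
    public String toString() {
        StringBuilder text = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            text.append(charAt(i));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(new ReverseComplementSketch("ACg-U"));
    }
}
```

      Because the output alphabet is fixed, nothing has to be scanned up front to decide between RNA and DNA, which is the efficiency argument made for reverseComplementAsDna.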
    • isPredominantlyRna

      public static boolean isPredominantlyRna(CharSequence charSequence, int maximumNonGapsToLookAt)
      Checks whether a sequence is predominantly RNA (rather than DNA). Same as Utils.isPredominantlyRNA(CharSequence, int), but more efficient on SequenceCharSequences.
      Parameters:
      charSequence - A charSequence that may contain DNA or RNA characters.
      maximumNonGapsToLookAt - Maximum number of non-gap residues to look at before making a decision; Pass in Integer.MAX_VALUE to look at all residues
      Returns:
      true if the non-gap residues of charSequence are predominantly RNA
    • isRna

      public static boolean isRna(CharSequence charSequence)
      Checks whether a sequence is RNA (rather than DNA) based on whether the sequence contains either a T/t or a U/u first. If it contains neither T/t nor U/u, this method returns false.
      Parameters:
      charSequence - A charSequence that may contain DNA or RNA characters.
      Returns:
      true if this charSequence is RNA (rather than DNA)
    • isRna

      public static boolean isRna(CharSequence charSequence, int maxNucleotidesToCheck)
      Checks whether a sequence is RNA (rather than DNA) based on whether the sequence contains either a T/t or a U/u first. If it contains neither T/t nor U/u, this method returns false.
      Parameters:
      charSequence - A charSequence that may contain DNA or RNA characters.
      maxNucleotidesToCheck - the maximum number of nucleotides/gaps to check (excluding leading/trailing gaps) before giving up and calling it DNA if no T or U is found.
      Returns:
      true if this charSequence is RNA (rather than DNA)
      Since:
      API 4.202200 (Geneious 2022.0.0)
    • removeGaps

      public static CharSequence removeGaps(CharSequence charSequence)
      Constructs a sequence without gaps ('-') from a specified sequence that potentially has gaps. If the specified sequence does contain gaps, then a gapless copy is returned. Otherwise, the original charSequence is returned. It is guaranteed that CharSequenceUtilities.equals(removeGaps(cs), cs.toString().replace("-", "")) for any CharSequence cs that fulfills its contract.
      Parameters:
      charSequence - A nucleotide or amino acid sequence, potentially with gaps ('-')
      Returns:
      A CharSequence that contains the same sequence of characters but without the gaps ('-'). Returns charSequence if charSequence doesn't contain any gaps.
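      The "copy only when needed" behaviour can be sketched as below. RemoveGapsSketch is a hypothetical plain-Java stand-in and is not the library implementation.

```java
final class RemoveGapsSketch {
    static CharSequence removeGaps(CharSequence charSequence) {
        StringBuilder gapless = null; // created only when the first gap is encountered
        for (int i = 0; i < charSequence.length(); i++) {
            char c = charSequence.charAt(i);
            if (c == '-') {
                if (gapless == null) {
                    // Copy the gap-free prefix seen so far.
                    gapless = new StringBuilder(charSequence.length()).append(charSequence, 0, i);
                }
            } else if (gapless != null) {
                gapless.append(c);
            }
        }
        return gapless == null ? charSequence : gapless;
    }

    public static void main(String[] args) {
        System.out.println(removeGaps("A--C-gT-"));
    }
}
```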
    • getLeadingGapsLength

      public static int getLeadingGapsLength(CharSequence charSequence)
      Returns the start index of the non-gap regions in the specified charSequence, i.e. the length of the longest prefix of charSequence that contains only '-' characters.
      Parameters:
      charSequence - A CharSequence that may contain some leading gap characters '-'
      Returns:
      The length of the longest prefix of charSequence that contains only '-' characters.
    • getTrailingGapsLength

      public static int getTrailingGapsLength(CharSequence charSequence)
      Get the number of trailing gap ('-') characters in the sequence.
      Parameters:
      charSequence - A CharSequence that may have trailing gap characters.
      Returns:
      the number of trailing gap ('-') characters in the sequence or 0 if the sequence is entirely gaps.
    • getTrailingGapsStartIndex

      public static int getTrailingGapsStartIndex(CharSequence charSequence)
      Returns the end index of the non-gap regions in the specified charSequence. This is identical to charSequence.length() minus the length of the longest suffix of charSequence that consists only of '-', except when charSequence consists only of '-', in which case this method returns charSequence.length() because there are no non-gap regions. In other words, in a sequence that consists only of gaps, all gaps are considered leading rather than trailing gaps, i.e. the non-gap region is considered to start just beyond the end of the sequence.
      Parameters:
      charSequence - A CharSequence that may contain some trailing gap characters '-'
      Returns:
      1+the index of the last nongap character in charSequence, or charSequence.length() if charSequence consists only of gaps
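      The three gap-boundary methods fit together as sketched below, including the all-gaps edge case in which every gap counts as leading. GapBoundsSketch and its method names are hypothetical; only the documented contracts are reproduced.

```java
final class GapBoundsSketch {
    /** Length of the longest prefix consisting only of '-'. */
    static int leadingGapsLength(CharSequence s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == '-') {
            i++;
        }
        return i;
    }

    /** One past the last non-gap character, or s.length() if s consists only of gaps. */
    static int trailingGapsStartIndex(CharSequence s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '-') {
            end--;
        }
        return end == 0 ? s.length() : end;
    }

    /** Number of trailing gaps, which is 0 when s consists only of gaps. */
    static int trailingGapsLength(CharSequence s) {
        return s.length() - trailingGapsStartIndex(s);
    }

    public static void main(String[] args) {
        String aligned = "--AC-G---";
        System.out.println(leadingGapsLength(aligned));
        System.out.println(trailingGapsStartIndex(aligned));
        System.out.println(trailingGapsLength(aligned));
    }
}
```

      With these definitions the non-gap region of a sequence s is s.subSequence(leadingGapsLength(s), trailingGapsStartIndex(s)), which is empty for an all-gap sequence.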
    • getAlphabet

      public static SequenceDocument.Alphabet getAlphabet(SequenceDocument sequence)
      Get the Alphabet of a sequence.
      Parameters:
      sequence - a SequenceDocument to get the alphabet for.
      Returns:
      Alphabet of sequence
    • getSequenceType

      public static SequenceType getSequenceType(SequenceDocument sequence)
      Get the (jebl) sequence type.
      Parameters:
      sequence - a SequenceDocument to get the sequence type of.
      Returns:
      type of sequence
      Throws:
      IllegalArgumentException - if sequence is neither a NucleotideSequenceDocument nor an AminoAcidSequenceDocument.
    • getSequenceType

      public static List<SequenceType> getSequenceType(AnnotatedPluginDocument document)
      Examines a document and determines what the (jebl) sequence type (or types) of the document is (or are), and returns it (or them).

      Always returns a List<SequenceType> of size 0, 1 or 2.
      Parameters:
      document - the document to determine the SequenceType of
      Returns:
      a list containing the sequence type or types of the given document.
      Throws:
      IllegalArgumentException - if the given document type wasn't a valid type to determine the SequenceType of.
      Since:
      API 4.610 (Geneious 6.1.0)
    • getAlphabet

      public static SequenceDocument.Alphabet getAlphabet(AnnotatedPluginDocument... documents)
      Parameters:
      documents - the documents to get the alphabet for
      Returns:
      The alphabet that all these documents have in common, or null if they are not all the same alphabet or if any of the documents have multiple alphabets
      Throws:
      IllegalArgumentException - if any of the documents aren't a type of sequence (nucleotide, protein, sequence list or alignment)
      Since:
      API 4.1010 (Geneious 10.1.0)
    • toHTMLFragment

      public static String toHTMLFragment(SequenceDocument sequence, String additionalContent)
      Generate a HTML fragment that summarises a sequence, including the sequence string. If the sequence is longer than a certain threshold X, then only the first X residues are shown.
      Parameters:
      sequence - a SequenceDocument
      additionalContent - additional content to include
      Returns:
      the html formatted summary
    • asJeblSequence

      public static Sequence asJeblSequence(SequenceDocument sequence)
      Convert from a Geneious sequence to a jebl sequence.
      Parameters:
      sequence - a Geneious sequence
      Returns:
      sequence as a jebl sequence.
    • asJeblSequences

      public static List<Sequence> asJeblSequences(List<SequenceDocument> sequences)
      Convert a set of Geneious sequences to jebl sequences.
      Parameters:
      sequences - Geneious sequences
      Returns:
      the Geneious sequences as jebl sequences
    • asJeblSequences

      public static List<Sequence> asJeblSequences(SequenceDocument... sequences)
      Convert a set of Geneious sequences to jebl sequences.
      Parameters:
      sequences - Geneious sequences
      Returns:
      the Geneious sequences as jebl sequences
    • asJeblAlignment

      public static Alignment asJeblAlignment(List<SequenceDocument> sequences)
      Convert a list of (aligned) Geneious sequences to a jebl alignment.
      Parameters:
      sequences - aligned Geneious sequences
      Returns:
      the Geneious sequences as a jebl alignment
    • createSequenceCopy

      public static SequenceDocument createSequenceCopy(SequenceDocument original)
      Creates a copy of the original sequence if necessary. If the sequence is an immutable sequence (ImmutableSequence) then it is not copied and is just returned from this method.
      Parameters:
      original - the original sequence
      Returns:
      a new sequence document or the original sequence if the original sequence is immutable.
      Since:
      API 4.11 (Geneious 5.0)
    • createSequenceCopyEditable

      public static DefaultSequenceDocument createSequenceCopyEditable(SequenceDocument original)
      Creates a copy of the original sequence that is editable.
      Parameters:
      original - the original sequence
      Returns:
      a new sequence document
      Since:
      API 4.11 (Geneious 5.0)
    • getSequenceAnnotationsIncludingImmutableSequencesTrims

      public static List<SequenceAnnotation> getSequenceAnnotationsIncludingImmutableSequencesTrims(SequenceDocument sequence)
      Gets all the annotations on the given sequence. Additionally if it is an ImmutableSequence with ImmutableSequence.getLeadingTrimLength() or ImmutableSequence.getTrailingTrimLength()>0 then annotations are created to represent these trims.
      Parameters:
      sequence - the sequence to get annotations from
      Returns:
      the annotations from the sequence
      Since:
      API 4.52 (Geneious 5.5.2)
    • asJeblSequence

      public static Sequence asJeblSequence(SequenceAlignmentDocument.ReferencedSequence referencedSequence, SequenceDocument sequence) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Convert from a Geneious sequence to a jebl sequence.
      Parameters:
      referencedSequence - original referenced sequence to copy additional fields from. May be null.
      sequence - a Geneious sequence
      Returns:
      sequence as a jebl sequence
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - when the referenced sequence cannot be loaded
      Since:
      API 4.700 (Geneious 7.0.0)
    • asJeblSequence

      @Deprecated public static Sequence asJeblSequence(AnnotatedPluginDocument referenceDocument, SequenceDocument sequence)
      Convert from a Geneious sequence to a jebl sequence.
      Parameters:
      referenceDocument - original AnnotatedPluginDocument to copy additional fields from. May be null.
      sequence - a Geneious sequence
      Returns:
      sequence as a jebl sequence
      Since:
      API 4.43 (Geneious 5.4.3)
    • replaceQuestionMarksWithMaximalAmbiguitySymbol

      public static String replaceQuestionMarksWithMaximalAmbiguitySymbol(SequenceType sequenceType, String sequence)
      Gets a version of a sequence string with any question marks replaced with N (for nucleotide sequences) or X (for protein sequences).
      Parameters:
      sequenceType - sequence type of sequence
      sequence - sequence string
      Returns:
      version of sequence with any question marks replaced with N (for nucleotide sequences) or X (for protein sequences)
    • getMaximalAmbiguitySymbol

      public static String getMaximalAmbiguitySymbol(SequenceType sequenceType)
      Gets the code for the state in this sequence type which represents a base/residue that is completely unknown.
      Parameters:
      sequenceType - the sequence type to get the maximal ambiguity symbol for
      Returns:
    • getAnnotationsOfType

      public static List<SequenceAnnotation> getAnnotationsOfType(List<SequenceAnnotation> annotations, String type)
      Get all annotations in list matching the given type
      Parameters:
      annotations - annotations
      type - type of annotations to get
      Returns:
      all annotations in the list matching the given type
    • getAnnotationsOfType

      @Deprecated public static List<SequenceAnnotation> getAnnotationsOfType(SequenceDocument document, String type)
      Get all annotations in document matching the given type
      Parameters:
      document - document to get annotations from
      type - type of annotations to get
      Returns:
      all annotations in document matching the given type
    • getAnnotationsOfType

      public static List<SequenceAnnotation> getAnnotationsOfType(SequenceDocument document, String type, boolean returnAnnotationsInTracks)
      Get all annotations in document matching the given type.

      WARNING: this list may not include all SequenceAnnotations represented as annotations in the sequence viewer. One such case is with trim annotations, which should be found using getSequenceAnnotationsIncludingImmutableSequencesTrims(com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument).

      This may also return annotations that are not visible in the sequence viewer, such as SequenceAnnotation.TYPE_EXTRACTED_REGION.

      Parameters:
      document - document to get annotations from
      type - type of annotations to get
      returnAnnotationsInTracks - true iff we want annotations from SequenceTracks as well as those annotated directly on a document
      Returns:
      all annotations in document matching the given type
      Since:
      API 4.50 (Geneious 5.5.0)
    • getSequenceAndTrackAnnotations

      public static Iterable<SequenceAnnotation> getSequenceAndTrackAnnotations(SequenceDocument sequence)
      A convenience method to get all annotations on the sequence and all annotations on all SequenceTracks on this sequence. Most code should instead manually load tracks on demand since they may be too large to fit into memory. To get tracks on a sequence use SequenceTrack.getTrackManager(com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument) followed by SequenceTrack.Manager.getTracks().

      Iterating over the returned value may throw a RuntimeException whose cause is an XMLSerializationException if there is insufficient memory available to load the annotations. When running DocumentOperations or SequenceAnnotationGenerators, core Geneious will automatically catch such exceptions and display a nice message to the user.
      Parameters:
      sequence - the sequence to get annotations for
      Returns:
      an Iterable over the annotations. Will not return null.
      Since:
      API 4.50 (Geneious 5.5.0)
    • createSequenceCopyAdjustedForGapInsertion

      public static SequenceDocument createSequenceCopyAdjustedForGapInsertion(SequenceDocument sequenceDocument, CharSequence gappedSequenceCharacters)
      Creates a copy of the given sequence with annotations, sequence residues, and chromatogram values adjusted to account for gap insertion. Note, the returned copy does not create gapped versions of SequenceTracks. Tracks are instead automatically propagated from referenced documents in alignments.
      Parameters:
      sequenceDocument - a sequence to insert gaps into. If the sequence already contains gaps, the gaps are removed first
      gappedSequenceCharacters - the sequence characters to appear in the new gapped sequence. The positions of gaps in this character sequence determine how annotations and chromatograms are adjusted.
      Returns:
      a copy of sequenceDocument adjusted for gap insertion. This is always a DefaultSequenceDocument but this method isn't declared to return that for API backwards compatibility reasons
    • createSequenceCopyAdjustedForGapInsertion

      public static SequenceDocument createSequenceCopyAdjustedForGapInsertion(SequenceDocument sequenceDocument, CharSequence gappedSequenceCharacters, boolean includeTracks)
      Creates a copy of the given sequence with annotations, sequence residues, and chromatogram values adjusted to account for gap insertion.
      Parameters:
      sequenceDocument - a sequence to insert gaps into. If the sequence already contains gaps, the gaps are removed first
      gappedSequenceCharacters - the sequence characters to appear in the new gapped sequence. The positions of gaps in this character sequence determine how annotations and chromatograms are adjusted.
      includeTracks - true if tracks should also be copied. If this is intended for use with an alignment which references the original documents, this should be false as alignment documents propagate tracks on demand from referenced documents.
      Returns:
      a copy of sequenceDocument adjusted for gap insertion. This is always a DefaultSequenceDocument, but the method is not declared to return that type for API backwards-compatibility reasons.
      Since:
      API 4.202000 (Geneious 2020.0.0)
      See Also:
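      The coordinate adjustment described for createSequenceCopyAdjustedForGapInsertion can be pictured with a small stand-alone sketch. It does not call the Geneious API: GapInsertionSketch and its methods are hypothetical names, '-' is assumed to be the gap character, and intervals are assumed to be 1-based and inclusive. It only shows how the gap positions in gappedSequenceCharacters determine where an annotation that sat on the ungapped sequence ends up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the Geneious API: models how gap insertion shifts coordinates. */
final class GapInsertionSketch {
    private GapInsertionSketch() {}

    /**
     * For each residue of the ungapped sequence (0-based), returns its 0-based column
     * in the gapped sequence. Assumes '-' is the gap character.
     */
    static int[] ungappedToGappedColumns(CharSequence gapped) {
        List<Integer> columns = new ArrayList<>();
        for (int column = 0; column < gapped.length(); column++) {
            if (gapped.charAt(column) != '-') {
                columns.add(column);
            }
        }
        int[] result = new int[columns.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = columns.get(i);
        }
        return result;
    }

    /** Maps a 1-based inclusive interval on the ungapped sequence onto the gapped sequence. */
    static int[] adjustInterval(CharSequence gapped, int from, int to) {
        int[] columns = ungappedToGappedColumns(gapped);
        return new int[] {columns[from - 1] + 1, columns[to - 1] + 1};
    }
}
```

      For example, with the gapped characters "AC--GT", an annotation covering ungapped residues 2 to 3 (C and G) covers columns 2 to 5 after gap insertion, because the two gaps fall inside it.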
    • concatenateSequences

      public static SequenceDocument concatenateSequences(List<? extends SequenceDocument> sequences, boolean circular, int indexOfDocumentToUseForOrigin, ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Concatenates a list of sequence documents. All sequences must be of the same type (all nucleotide or all amino acid). For circular results, indexOfDocumentToUseForOrigin may be used to specify which input sequence determines the origin of the result. If the specified input sequence is circular and has an annotated origin, that position is used; otherwise, the start of the specified sequence becomes the origin of the result. If circular is false, indexOfDocumentToUseForOrigin must be -1.
      Parameters:
      sequences - sequence documents to concatenate
      circular - if true, the result will be circular
      indexOfDocumentToUseForOrigin - index of document to use for the origin (must be -1 for linear results)
      progressListener -
      Returns:
      concatenated sequence
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Since:
      API 4.1100 (Geneious 11.0.0)
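      The argument rules for concatenateSequences can be illustrated on plain strings. This is a rough model under stated assumptions, not the Geneious implementation: ConcatenationSketch is a hypothetical class, -1 is assumed to be acceptable for circular results as well, annotated origins are ignored, and "origin" is modelled by rotating the result so that the chosen input comes first.

```java
import java.util.List;

/** Hypothetical model of the documented argument rules; not part of the Geneious API. */
final class ConcatenationSketch {
    private ConcatenationSketch() {}

    static String concatenate(List<String> sequences, boolean circular, int indexForOrigin) {
        if (!circular && indexForOrigin != -1) {
            // Documented rule: linear results require -1.
            throw new IllegalArgumentException("indexForOrigin must be -1 for linear results");
        }
        if (circular && (indexForOrigin < -1 || indexForOrigin >= sequences.size())) {
            throw new IllegalArgumentException("indexForOrigin out of range: " + indexForOrigin);
        }
        // Assumption: the origin is modelled as the first residue of the returned string.
        int start = (circular && indexForOrigin > 0) ? indexForOrigin : 0;
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < sequences.size(); i++) {
            result.append(sequences.get((start + i) % sequences.size()));
        }
        return result.toString();
    }
}
```

      In this model, concatenating "AAA", "CC" and "G" linearly gives "AAACCG", while a circular concatenation with index 1 starts at "CC".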
    • getSequences

      public static List<? extends SequenceDocument> getSequences(AnnotatedPluginDocument[] documents, SequenceDocument.Alphabet alphabet, ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Gets all the sequences out of a set of AnnotatedPluginDocuments that may wrap SequenceDocuments, SequenceListDocuments or SequenceAlignmentDocuments. Large sequence lists (SequenceListOnDisk) and genome-sized sequences (those longer than SequenceDocument.GENOME_SEQUENCE_THRESHOLD) in other sequence lists are loaded into memory only on demand, so that this method doesn't use excessive memory. If this method may be called on thousands of documents, consider getSequencesWithoutImmediateLoading instead.
      Parameters:
      documents - documents to get the sequences out of
      alphabet - the alphabet a sequence must have in order to be included
      progressListener - for notifying the caller about progress of this method and for cancelling.
      Returns:
      all the sequences. Sequences are ordered by the AnnotatedPluginDocument they are in, and then by their index in that document.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if there is a problem getting the PluginDocument out of an AnnotatedPluginDocument or if the progress listener cancels the request.
    • getSequences

      public static List<? extends SequenceDocument> getSequences(List<AnnotatedPluginDocument> documents, SequenceDocument.Alphabet alphabet, ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Gets all the sequences out of a set of AnnotatedPluginDocuments that may wrap SequenceDocuments, SequenceListDocuments or SequenceAlignmentDocuments. Large sequence lists (SequenceListOnDisk) and genome-sized sequences (those longer than SequenceDocument.GENOME_SEQUENCE_THRESHOLD) in other sequence lists are loaded into memory only on demand, so that this method doesn't use excessive memory. If this method may be called on thousands of documents, consider getSequencesWithoutImmediateLoading instead.
      Parameters:
      documents - documents to get the sequences out of
      alphabet - the alphabet a sequence must have in order to be included
      progressListener - for notifying the caller about progress of this method and for cancelling.
      Returns:
      all the sequences
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if there is a problem getting the PluginDocument out of an AnnotatedPluginDocument or if the progress listener cancels the request.
      Since:
      API 4.700 (Geneious 7.0.0)
    • getSequencesWithoutImmediateLoading

      public static Collection<? extends SequenceDocument> getSequencesWithoutImmediateLoading(AnnotatedPluginDocument[] documents, SequenceDocument.Alphabet alphabet) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Like getSequences(com.biomatters.geneious.publicapi.documents.AnnotatedPluginDocument[], com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument.Alphabet, jebl.util.ProgressListener) but doesn't require each plugin document to be in memory as long as this Collection is around. The trade-off is that the sequences can only be accessed sequentially (hence the Collection return type of this method). Also the Collection does not support removal.

      Because this Collection doesn't load the sequences immediately, a DocumentOperationException may only surface later, during iteration, wrapped in a RuntimeDocumentOperationException. Handle this by surrounding the iteration with try {... } catch (RuntimeDocumentOperationException e).

      Note that using getSequences(com.biomatters.geneious.publicapi.documents.AnnotatedPluginDocument[], com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument.Alphabet, jebl.util.ProgressListener) is preferable to using getSequencesWithoutImmediateLoading when dealing with fewer than a thousand documents.

      Parameters:
      documents - documents to get the sequences out of
      alphabet - the alphabet a sequence must have in order to be included
      Returns:
      all the sequences, in a Collection whose iterator may throw a RuntimeDocumentOperationException
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if one or more of the documents has more than Integer.MAX_VALUE sequences.
      Since:
      API 4.610 (Geneious 6.1.0)
      See Also:
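      The lazy-loading contract of getSequencesWithoutImmediateLoading (sequential access only, no removal, failures surfacing during iteration as a runtime exception) follows a general pattern that can be sketched without the Geneious API. Everything named here is a hypothetical stand-in: Loader plays the role of a document that is loaded on demand, and LazyLoadException plays the role of RuntimeDocumentOperationException.

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-alone model of a lazily loaded, iteration-only collection. */
final class LazySequencesSketch {
    private LazySequencesSketch() {}

    /** Stand-in for a document whose sequence is only loaded when asked for. */
    interface Loader {
        String load() throws Exception;
    }

    /** Stand-in for RuntimeDocumentOperationException: wraps a checked failure. */
    static final class LazyLoadException extends RuntimeException {
        LazyLoadException(Exception cause) {
            super(cause);
        }
    }

    static Collection<String> lazily(List<Loader> loaders) {
        return new AbstractCollection<String>() {
            @Override public int size() {
                return loaders.size();
            }

            @Override public Iterator<String> iterator() {
                Iterator<Loader> it = loaders.iterator();
                // remove() is not overridden, so it throws UnsupportedOperationException.
                return new Iterator<String>() {
                    @Override public boolean hasNext() {
                        return it.hasNext();
                    }

                    @Override public String next() {
                        Loader loader = it.next();
                        try {
                            return loader.load(); // loaded only now, not when the collection was built
                        } catch (Exception e) {
                            throw new LazyLoadException(e);
                        }
                    }
                };
            }
        };
    }

    /** Iterates inside try/catch and converts the runtime wrapper back to its checked cause. */
    static int totalLength(Collection<String> sequences) throws Exception {
        int total = 0;
        try {
            for (String sequence : sequences) {
                total += sequence.length();
            }
        } catch (LazyLoadException e) {
            throw (Exception) e.getCause();
        }
        return total;
    }
}
```

      The caller pays for loading one element at a time, which is why the try/catch belongs around the loop rather than around the call that creates the collection.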
    • getOriginalIndex

      public static int getOriginalIndex(SequenceDocument sequence, int index)
      Gets the original numbering of the given index if it is covered by a SequenceAnnotation.TYPE_EXTRACTED_REGION annotation.
      Parameters:
      sequence - the sequence this index belongs to.
      index - the index to get the original numbering for.
      Returns:
      'translated' index or the original index if no other numbering can be found.
      Since:
      API 4.900 (Geneious 9.0.0)
    • getNumberOfSequences

      public static long getNumberOfSequences(List<AnnotatedPluginDocument> documents, SequenceDocument.Alphabet alphabet)
      Gets the total number of nucleotide or amino acid sequences contained in the given documents which may be individual sequences, sequence lists, or alignments/contigs.
      Parameters:
      documents - the documents to get the number of sequences in
      alphabet - the alphabet (nucleotide or amino acid) of the sequences to count.
      Returns:
      the total number of nucleotide or amino acid sequences contained in the given documents
      Since:
      API 4.40 (Geneious 5.4.0)
    • getNumberOfSequences

      public static long getNumberOfSequences(AnnotatedPluginDocument document, SequenceDocument.Alphabet alphabet)
      Gets the total number of nucleotide or amino acid sequences contained in the given document which may be an individual sequence, sequence list, or alignment/contig.
      Parameters:
      document - the document to get the number of sequences in
      alphabet - the alphabet (nucleotide or amino acid) of the sequences to count.
      Returns:
      the total number of nucleotide or amino acid sequences contained in the given document
      Since:
      API 4.40 (Geneious 5.4.0)
    • generateConsensusSequence

      @Deprecated public static SequenceDocument generateConsensusSequence(SequenceAlignmentDocument alignment, ProgressListener progressListener)
      Generates a consensus sequence for an alignment using default consensus settings. Note that the returned sequence may contain gaps. If it is to be used as a stand-alone sequence, then SequenceExtractionUtilities.removeGaps(com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument) should be used.
      Parameters:
      alignment - the alignment to generate the consensus sequence for
      progressListener - for reporting progress and cancelling.
      Returns:
      a sequence equal in length to the alignment. The sequence may contain gaps. May return null if progressListener requests cancellation.
      Since:
      API 4.60 (Geneious 5.6.0)
    • generateConsensus

      public static SequenceDocument generateConsensus(SequenceAlignmentDocument alignment, ProgressListener progressListener) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Generates a consensus sequence for an alignment using default consensus settings. Note that the returned sequence may contain gaps. If it is to be used as a stand-alone sequence, then SequenceExtractionUtilities.removeGaps(com.biomatters.geneious.publicapi.documents.sequence.SequenceDocument) should be used.

      To generate consensus sequences with non-default options, use PluginUtilities.getDocumentOperation("Generate_Consensus"). Note that this operation generates a sequence with gaps removed by default.

      Parameters:
      alignment - the alignment to generate the consensus sequence for
      progressListener - for reporting progress and cancelling.
      Returns:
      a sequence equal in length to the alignment. The sequence may contain gaps. Will not return null.
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if the consensus can't be generated because there is insufficient free memory.
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException.Canceled - if the progressListener requests the consensus generation be cancelled.
      Since:
      API 4.610 (Geneious 6.1.0)
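      Why a consensus can contain gaps, and why they need stripping before stand-alone use, is easiest to see on a toy alignment. The sketch below uses a simple plurality vote per column, which is a swapped-in technique and not necessarily what the default Geneious consensus settings do. ConsensusSketch is a hypothetical class, '-' is assumed to be the gap character, all rows are assumed to have equal length, and ties go to the character that sorts first.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plurality-vote consensus; not the Geneious consensus algorithm. */
final class ConsensusSketch {
    private ConsensusSketch() {}

    /** Returns one character per alignment column, so the result is as long as the alignment. */
    static String pluralityConsensus(List<String> alignedRows) {
        if (alignedRows.isEmpty()) {
            return "";
        }
        int length = alignedRows.get(0).length();
        StringBuilder consensus = new StringBuilder(length);
        for (int column = 0; column < length; column++) {
            Map<Character, Integer> counts = new TreeMap<>();
            for (String row : alignedRows) {
                counts.merge(row.charAt(column), 1, Integer::sum);
            }
            char best = '-';
            int bestCount = 0;
            // TreeMap iterates in character order; the strict '>' keeps the first of any tie.
            for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
                if (entry.getValue() > bestCount) {
                    best = entry.getKey();
                    bestCount = entry.getValue();
                }
            }
            consensus.append(best);
        }
        return consensus.toString();
    }

    /** Strips gap characters so the consensus can stand alone as a sequence. */
    static String removeGaps(String sequence) {
        return sequence.replace("-", "");
    }
}
```

      For the rows "AC-T", "ACGT" and "A--T", the third column is mostly gaps, so the consensus keeps a gap there and is only usable as a stand-alone sequence after the gaps are removed.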
    • getBlastAlignmentText

      public static String getBlastAlignmentText(SequenceAlignmentDocument alignment, boolean geneiousFriendly)
      Formats the given alignment in BLAST text format
      Parameters:
      alignment - alignment to format
      geneiousFriendly - whether to format the alignment in an html-formatted "Geneious friendly" way that is useful generally for alignments and not just for BLAST output
      Returns:
      alignment represented in BLAST text format
      Since:
      API 4.700 (Geneious 7.0.0)
    • alignmentFromJeblSequences

      public static DefaultAlignmentDocument alignmentFromJeblSequences(String name, List<Sequence> jeblSequences)
      Converts the given alignment of Jebl sequences into a DefaultAlignmentDocument
      Parameters:
      name - name for alignment
      jeblSequences - aligned jebl sequences
      Returns:
      a DefaultAlignmentDocument representing the given alignment.
      Since:
      API 4.700 (Geneious 7.0.0)
    • createNewDocumentsByTransformingSequences

      public static List<AnnotatedPluginDocument> createNewDocumentsByTransformingSequences(List<AnnotatedPluginDocument> sourceDocuments, SequenceDocument.Transformer transformer, ProgressListener progressListener, String newSequenceOrDocumentNamePrefix) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Transforms the sequence(s) in each input document and returns a new document corresponding to each input document.
      Parameters:
      sourceDocuments - the source documents containing sequences to transform. These may be SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
      transformer - the transformer for transforming each sequence
      progressListener - for reporting progress and canceling
      newSequenceOrDocumentNamePrefix - an optional prefix to assign to the name of each newly generated document. May be an empty String to leave names unchanged.
      Returns:
      the new documents
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if documents can't be loaded, or if the input documents are not SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
      Since:
      API 4.701 (Geneious 7.0.1)
    • createNewDocumentsByTransformingSequences

      public static List<AnnotatedPluginDocument> createNewDocumentsByTransformingSequences(List<AnnotatedPluginDocument> sourceDocuments, SequenceDocument.Transformer transformer, ProgressListener progressListener, String newSequenceOrDocumentNamePrefix, String newSequenceOrDocumentNameSuffix) throws com.biomatters.geneious.publicapi.plugin.DocumentOperationException
      Transforms the sequence(s) in each input document and returns a new document corresponding to each input document.
      Parameters:
      sourceDocuments - the source documents containing sequences to transform. These may be SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
      transformer - the transformer for transforming each sequence
      progressListener - for reporting progress and canceling
      newSequenceOrDocumentNamePrefix - an optional prefix to assign to the name of each newly generated document. May be an empty String to leave names unchanged.
      newSequenceOrDocumentNameSuffix - an optional suffix to assign to the name of each newly generated document. May be an empty String to leave names unchanged.
      Returns:
      the new documents
      Throws:
      com.biomatters.geneious.publicapi.plugin.DocumentOperationException - if documents can't be loaded, or if the input documents are not SequenceDocuments or SequenceListDocuments or SequenceAlignmentDocuments
      Since:
      API 4.201920 (Geneious 2019.2.0)
    • getIntervalBasedOnExtractionAnnotation

      public static SequenceAnnotationInterval getIntervalBasedOnExtractionAnnotation(SequenceDocument sequenceDocument, SequenceAnnotationInterval interval, boolean mapToOriginal)
      Gets the extraction annotations from the sequence document and maps the interval to either the original sequence or the result sequence, depending on the value of mapToOriginal
      Parameters:
      sequenceDocument - the document to get the extractionAnnotations from
      interval - the interval to re-map
      mapToOriginal - whether to map this interval to the corresponding region of the original sequence (true) or of the result sequence (false)
      Returns:
      a new interval that represents the given interval on either the original or the result document. Returns the given interval unchanged if no mapping can be found.
      Since:
      API 4.1000 (Geneious 10.0.0)
    • getIndexBasedOnExtractionAnnotation

      public static Integer getIndexBasedOnExtractionAnnotation(SequenceDocument sequenceDocument, int index, boolean mapToOriginal)
      Gets the extraction annotations from the sequence document and maps a residue index to a residue index on either the original sequence or the result sequence, depending on the value of mapToOriginal
      Parameters:
      sequenceDocument - the document to get the extractionAnnotations from
      index - the 1-based residue position in the sequence to re-map.
      mapToOriginal - whether to map this index to the corresponding position on the original sequence (true) or on the result sequence (false)
      Returns:
      a new index that represents the given index on either the original or the result document. Returns null if the index can't be mapped.
      Since:
      API 4.1000 (Geneious 10.0.0)
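      The index mapping performed by getIndexBasedOnExtractionAnnotation (and, in one direction, by getOriginalIndex) amounts to offset arithmetic. The sketch below is a simplified model, not the Geneious implementation: ExtractionMappingSketch is a hypothetical class, and it assumes a single forward-strand extracted region that records the 1-based inclusive range of the original sequence the result was taken from.

```java
/** Hypothetical model of mapping residue indices through one extracted region. */
final class ExtractionMappingSketch {
    private ExtractionMappingSketch() {}

    /**
     * @param originalStart 1-based first position of the extracted region on the original
     * @param originalEnd   1-based last position of the extracted region on the original
     * @param index         1-based index to map
     * @param mapToOriginal true to map result to original, false for original to result
     * @return the mapped index, or null if the index falls outside the extracted region
     */
    static Integer mapIndex(int originalStart, int originalEnd, int index, boolean mapToOriginal) {
        int length = originalEnd - originalStart + 1;
        if (mapToOriginal) {
            if (index < 1 || index > length) {
                return null;
            }
            return originalStart + index - 1;
        }
        if (index < originalStart || index > originalEnd) {
            return null;
        }
        return index - originalStart + 1;
    }
}
```

      With a region extracted from positions 101 to 150 of the original, result index 1 maps to original position 101, original position 120 maps to result index 20, and anything outside the region yields null, mirroring the documented null return for unmappable indices.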
    • getSequenceCharSequenceHash

      public static String getSequenceCharSequenceHash(SequenceCharSequence charSequence)
      Parameters:
      charSequence - a sequence returned from SequenceDocument.getCharSequence()
      Returns:
      a hexadecimal encoded MD5 hash of the nucleotides or amino acids in a sequence
      Since:
      API 4.202500 (Geneious 2025.0.0)
    • getSequenceHash

      public static String getSequenceHash(SequenceDocument sequence)
      Parameters:
      sequence - sequence to get an MD5 hash of
      Returns:
      a hexadecimal encoded MD5 hash of the nucleotides or amino acids in this sequence
      Since:
      API 4.202500 (Geneious 2025.0.0)
    • getSequenceHash

      public static String getSequenceHash(SequenceDocument sequence, List<Interval> intervals)
      Parameters:
      sequence - sequence to get an MD5 hash of
      intervals - residue (nucleotide or amino acid) intervals within the sequence
      Returns:
      a hexadecimal encoded MD5 hash of the nucleotides or amino acids within the specified intervals in this sequence
      Since:
      API 4.202500 (Geneious 2025.0.0)
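      A hexadecimal-encoded MD5 hash of residue characters can be computed with the JDK alone, which is handy for comparing against values from outside Geneious. The sketch below is not the Geneious implementation and may not reproduce its output: SequenceHashSketch is a hypothetical class, and it assumes ASCII residues hashed exactly as given (no case normalisation), lowercase hex output, and intervals given as 1-based inclusive {from, to} pairs whose residues are concatenated in list order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical stand-alone MD5 helper; output may differ from the Geneious methods. */
final class SequenceHashSketch {
    private SequenceHashSketch() {}

    /** Lowercase hex MD5 of the residue characters, hashed exactly as given. */
    static String md5Hex(CharSequence residues) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(residues.toString().getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Hashes only the residues inside the given 1-based inclusive intervals, in list order. */
    static String md5Hex(CharSequence residues, int[][] intervals) {
        StringBuilder selected = new StringBuilder();
        for (int[] interval : intervals) {
            selected.append(residues, interval[0] - 1, interval[1]);
        }
        return md5Hex(selected);
    }
}
```

      Under these assumptions, hashing residues 3 to 5 of a sequence gives the same value as hashing that three-residue subsequence on its own.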