Class SequenceMixin


  • public class SequenceMixin
    extends Object
    Provides a set of static methods intended to be used as static imports wherever several Sequence implementations need the same behaviour but inheritance gets in the way. It is also a home for utility methods that apply either to a single kind of Sequence (e.g. a Sequence of NucleotideCompound) or to any Sequence (e.g. getComposition(Sequence) or getDistribution(Sequence), which work for every type of Sequence). All of these methods assume that the Iterable interface offered by each Sequence implementation yields every compound that implementation allows you to see. Since a Sequence should know nothing about its backing store (apart from delegating calls to it), this assumption should hold.
    Author:
    ayates
    • Method Detail

      • countCompounds

        public static <C extends Compound> int countCompounds(Sequence<C> sequence,
                                                              C... compounds)
        Counts the number of times any of the given compounds (passed as varargs) appears in the given sequence.
        Type Parameters:
        C - The type of compound we are looking for
        Parameters:
        sequence - The Sequence to perform the count on
        compounds - The compounds to look for
        Returns:
        The number of times the given compounds appear in this Sequence
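The counting described above can be sketched in plain Java. This is a minimal sketch, not the library's implementation: it assumes a plain Iterable<C> can stand in for Sequence<C> (the class description says these methods rely only on the Iterable interface), and the class name CountCompoundsSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only: a plain Iterable<C> stands in for Sequence<C>; the class name is hypothetical.
final class CountCompoundsSketch {
    @SafeVarargs
    static <C> int countCompounds(Iterable<C> sequence, C... compounds) {
        // Collect the target compounds into a set for constant-time lookups.
        Set<C> targets = new HashSet<>(Arrays.asList(compounds));
        int count = 0;
        // Single linear pass: every position holding any target compound adds one.
        for (C compound : sequence) {
            if (targets.contains(compound)) {
                count++;
            }
        }
        return count;
    }
}
```

For example, counting G and C in ATGCGC gives 4, i.e. the raw material for a GC count.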
      • getDistribution

        public static <C extends Compound> Map<C,Double> getDistribution(Sequence<C> sequence)
        Analogous to getComposition(Sequence) but returns the distribution of that Compound over the given sequence.
        Type Parameters:
        C - The type of compound to look for
        Parameters:
        sequence - The sequence to look over
        Returns:
        The decimal fraction of the given sequence made up by each compound. Any compound not in the Map will return a fraction of 0.
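A distribution is each compound's count divided by the sequence length. The sketch below shows that arithmetic under the same assumptions as before: Iterable<C> stands in for Sequence<C>, the class name is hypothetical, and instead of a map with a built-in default of 0, callers here use getOrDefault(compound, 0.0).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: Iterable<C> stands in for Sequence<C>; the class name is hypothetical.
final class DistributionSketch {
    static <C> Map<C, Double> getDistribution(Iterable<C> sequence) {
        Map<C, Integer> counts = new HashMap<>();
        int length = 0;
        // One pass gathers both the per-compound counts and the total length.
        for (C compound : sequence) {
            counts.merge(compound, 1, Integer::sum);
            length++;
        }
        Map<C, Double> distribution = new HashMap<>();
        for (Map.Entry<C, Integer> entry : counts.entrySet()) {
            // Fraction of all positions occupied by this compound.
            distribution.put(entry.getKey(), entry.getValue() / (double) length);
        }
        return distribution; // an empty sequence yields an empty map, never a division by zero
    }
}
```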
      • getComposition

        public static <C extends Compound> Map<C,Integer> getComposition(Sequence<C> sequence)
        Does a linear scan over the given Sequence and records the number of times each compound appears. The returned Map returns 0 when asked for a compound it has no record of.
        Type Parameters:
        C - The type of compound to look for
        Parameters:
        sequence - The sequence to look over
        Returns:
        Counts for the instances of all compounds in the sequence
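The linear scan and the "return 0 for unknown compounds" behaviour can be illustrated together. In this hypothetical sketch, Iterable<C> stands in for Sequence<C>, and an anonymous HashMap subclass overriding get() is one assumed way to provide the default of 0; it is not necessarily how the library does it.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: Iterable<C> stands in for Sequence<C>; the class name is hypothetical.
final class CompositionSketch {
    static <C> Map<C, Integer> getComposition(Iterable<C> sequence) {
        // Map whose get() falls back to 0 for compounds that were never seen.
        Map<C, Integer> counts = new HashMap<C, Integer>() {
            @Override
            public Integer get(Object key) {
                Integer value = super.get(key);
                return value == null ? 0 : value;
            }
        };
        // Linear scan: start each compound at 1, then add 1 per further occurrence.
        for (C compound : sequence) {
            counts.merge(compound, 1, Integer::sum);
        }
        return counts;
    }
}
```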
      • write

        public static <C extends Compound> void write(Appendable appendable,
                                                      Sequence<C> sequence)
                                               throws IOException
        Writes a Sequence to an Appendable (for example a Writer) compound by compound, avoiding the cost of first converting the whole Sequence into a full-length String.
        Type Parameters:
        C - Type of compound
        Parameters:
        appendable - The Appendable (e.g. a Writer) to send data to
        sequence - The sequence to write out
        Throws:
        IOException - Thrown if the Appendable reports an I/O error while data is being written
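Streaming a sequence into an Appendable can be sketched as below. Assumptions: Iterable<C> stands in for Sequence<C>, String.valueOf(compound) is taken to produce the compound's text form, and the class name is hypothetical. Because StringBuilder and java.io.Writer both implement Appendable, the same method serves in-memory buffers and files.

```java
import java.io.IOException;

// Sketch only: Iterable<C> stands in for Sequence<C>; the class name is hypothetical.
final class WriteSketch {
    static <C> void write(Appendable appendable, Iterable<C> sequence) throws IOException {
        for (C compound : sequence) {
            // Append one compound at a time; no full-length String is ever built.
            // Assumption: String.valueOf(compound) yields the compound's symbol.
            appendable.append(String.valueOf(compound));
        }
    }
}
```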
      • indexOf

        public static <C extends Compound> int indexOf(Sequence<C> sequence,
                                                       C compound)
        Performs a linear search of the given Sequence for the given compound and returns the position of its first occurrence.
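A linear search of this kind looks roughly like the sketch below. Two assumptions are labelled in the code: positions are 1-based (matching the "1 to the length" wording used for createIterator), and returning 0 when the compound is absent is a choice made for this sketch, since the text above does not state the not-found value. The class name is hypothetical.

```java
// Sketch only: Iterable<C> stands in for Sequence<C>; the class name is hypothetical.
final class IndexOfSketch {
    // Assumption: positions are 1-based; 0 (never a valid position) means "not found".
    static <C> int indexOf(Iterable<C> sequence, C compound) {
        int position = 1;
        for (C current : sequence) {
            if (current.equals(compound)) {
                return position; // the first occurrence wins
            }
            position++;
        }
        return 0;
    }
}
```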
      • createIterator

        public static <C extends Compound> Iterator<C> createIterator(Sequence<C> sequence)
        Creates a simple sequence iterator which moves through a sequence going from 1 to the length of the Sequence. Modification of the Sequence is not allowed.
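An iterator that walks positions 1 to length and refuses modification can be sketched as follows. The interface SketchSequence, with getLength() and getCompoundAt(int), is a hypothetical stand-in for 1-based random access on a sequence; remove() is overridden explicitly so the read-only contract is visible.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical minimal stand-in for a sequence with 1-based random access.
interface SketchSequence<C> {
    int getLength();
    C getCompoundAt(int position); // valid positions run from 1 to getLength()
}

final class SequenceIteratorSketch {
    static <C> Iterator<C> createIterator(SketchSequence<C> sequence) {
        return new Iterator<C>() {
            private int position = 1; // 1-based cursor

            @Override
            public boolean hasNext() {
                return position <= sequence.getLength();
            }

            @Override
            public C next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return sequence.getCompoundAt(position++);
            }

            @Override
            public void remove() {
                // Modification through the iterator is not allowed.
                throw new UnsupportedOperationException("Sequence is read-only");
            }
        };
    }
}
```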
      • nonOverlappingKmers

        public static <C extends Compound> List<SequenceView<C>> nonOverlappingKmers(Sequence<C> sequence,
                                                                                     int kmer)
        Produces non-overlapping k-mers of the specified size; e.g. ATGTGA with a k-mer size of 3 returns two views, ATG and TGA.
        Type Parameters:
        C - Compound to use
        Parameters:
        sequence - Sequence to build from
        kmer - The k-mer size
        Returns:
        The list of non-overlapping K-mers
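Non-overlapping k-mers come from stepping a window of width k by k positions. In this sketch, java.util.List.subList plays the role of SequenceView (it returns a view, not a copy); dropping a trailing fragment shorter than k is an assumption of the sketch, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: List<C> stands in for Sequence<C>, subList views for SequenceView<C>.
final class NonOverlappingKmersSketch {
    static <C> List<List<C>> nonOverlappingKmers(List<C> sequence, int kmer) {
        if (kmer < 1) {
            throw new IllegalArgumentException("k-mer size must be at least 1");
        }
        List<List<C>> kmers = new ArrayList<>();
        // Step by k; assumption: a trailing fragment shorter than k is dropped.
        for (int start = 0; start + kmer <= sequence.size(); start += kmer) {
            kmers.add(sequence.subList(start, start + kmer));
        }
        return kmers;
    }
}
```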
      • overlappingKmers

        public static <C extends Compound> List<SequenceView<C>> overlappingKmers(Sequence<C> sequence,
                                                                                  int kmer)
        Generates overlapping k-mers; e.g. ATGTA with a k-mer size of 3 gives rise to ATG, TGT and GTA.
        Type Parameters:
        C - Compound to use
        Parameters:
        sequence - Sequence to build from
        kmer - The k-mer size
        Returns:
        The list of overlapping K-mers
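Overlapping k-mers slide the window one position at a time, so a sequence of length n yields n - k + 1 windows (none when k exceeds n). As in the previous sketch, List.subList is an assumed stand-in for SequenceView and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: List<C> stands in for Sequence<C>, subList views for SequenceView<C>.
final class OverlappingKmersSketch {
    static <C> List<List<C>> overlappingKmers(List<C> sequence, int kmer) {
        if (kmer < 1) {
            throw new IllegalArgumentException("k-mer size must be at least 1");
        }
        List<List<C>> kmers = new ArrayList<>();
        // Advance the window start by one position each time.
        for (int start = 0; start + kmer <= sequence.size(); start++) {
            kmers.add(sequence.subList(start, start + kmer));
        }
        return kmers;
    }
}
```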
      • inverse

        public static <C extends Compound> SequenceView<C> inverse(Sequence<C> sequence)
        Attempts to do the right thing when it comes to a reverse or reverse complement: the sequence is reversed and, when its compound set supports complementing, also complemented.
        Type Parameters:
        C - The type of compound
        Parameters:
        sequence - The input sequence
        Returns:
        The inverted sequence which is optionally complemented
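The "reverse, then complement only if possible" decision can be sketched as below. This is an illustration under stated assumptions: a hard-coded DNA complement table (hypothetical) stands in for a complementable compound set, the membership test is a proxy for "the compound set supports complementing", and the sketch returns a copied list rather than a view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch only: the class name and the DNA table are hypothetical stand-ins.
final class InverseSketch {
    private static final Map<Character, Character> DNA_COMPLEMENT =
            Map.of('A', 'T', 'T', 'A', 'G', 'C', 'C', 'G');

    static List<Character> inverse(List<Character> sequence) {
        List<Character> reversed = new ArrayList<>(sequence);
        Collections.reverse(reversed);
        // Proxy for "compound set is complementable": every compound has a complement.
        if (!DNA_COMPLEMENT.keySet().containsAll(sequence)) {
            return reversed; // plain reverse
        }
        List<Character> result = new ArrayList<>(reversed.size());
        for (Character compound : reversed) {
            result.add(DNA_COMPLEMENT.get(compound)); // reverse complement
        }
        return result;
    }
}
```

For ATGCC this yields GGCAT (the reverse complement), while a sequence with letters outside the table is only reversed.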
      • sequenceEqualityIgnoreCase

        public static <C extends Compound> boolean sequenceEqualityIgnoreCase(Sequence<C> source,
                                                                              Sequence<C> target)
        A case-insensitive way of comparing two Sequence objects. Sequences whose lengths or compound sets differ are rejected straight away, and the comparison stops (returning false) at the first compound that does not match. The cost to run is linear in the length of the Sequence.
        Type Parameters:
        C - The type of compound
        Parameters:
        source - Source sequence to assess
        target - Target sequence to assess
        Returns:
        Boolean indicating if the sequences matched ignoring case
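The early-exit, case-insensitive comparison can be sketched as follows. Assumptions: compounds are modelled as single characters, the compound-set check is omitted for brevity, and the class name is hypothetical. Iterators keep the walk linear even for lists without fast random access.

```java
import java.util.Iterator;
import java.util.List;

// Sketch only: compounds modelled as Characters; the class name is hypothetical.
final class IgnoreCaseEqualitySketch {
    static boolean sequenceEqualityIgnoreCase(List<Character> source, List<Character> target) {
        if (source.size() != target.size()) {
            return false; // cheap length check before any per-compound work
        }
        Iterator<Character> targetIt = target.iterator();
        for (Character s : source) {
            char a = s;
            char b = targetIt.next();
            // Stop at the first position that differs, ignoring case.
            if (Character.toUpperCase(a) != Character.toUpperCase(b)) {
                return false;
            }
        }
        return true;
    }
}
```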
      • sequenceEquality

        public static <C extends Compound> boolean sequenceEquality(Sequence<C> source,
                                                                    Sequence<C> target)
        A case-sensitive way of comparing two Sequence objects. Sequences whose lengths or compound sets differ are rejected straight away, and the comparison stops (returning false) at the first compound that does not match. The cost to run is linear in the length of the Sequence.
        Type Parameters:
        C - The type of compound
        Parameters:
        source - Source sequence to assess
        target - Target sequence to assess
        Returns:
        Boolean indicating if the sequences matched
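The full check order (compound sets, then length, then compounds one by one) can be sketched with a small hypothetical holder class, AlphabetSequence, that pairs a compound set with the compounds themselves. Equality of compounds uses equals(), so 'a' and 'A' count as different, matching the case-sensitive contract.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: a sequence's compounds plus the compound set it was built from.
final class AlphabetSequence<C> {
    final Set<C> compoundSet;
    final List<C> compounds;

    AlphabetSequence(Set<C> compoundSet, List<C> compounds) {
        this.compoundSet = compoundSet;
        this.compounds = compounds;
    }
}

final class EqualitySketch {
    static <C> boolean sequenceEquality(AlphabetSequence<C> source, AlphabetSequence<C> target) {
        // Reject early when the compound sets or the lengths differ.
        if (!source.compoundSet.equals(target.compoundSet)
                || source.compounds.size() != target.compounds.size()) {
            return false;
        }
        Iterator<C> targetIt = target.compounds.iterator();
        for (C s : source.compounds) {
            // Case-sensitive: compounds must be equal as-is; first mismatch ends the scan.
            if (!s.equals(targetIt.next())) {
                return false;
            }
        }
        return true;
    }
}
```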