Class DNATools


  • public final class DNATools
    extends Object
    Useful functionality for processing DNA sequences.
    Author:
    Matthew Pocock, Keith James (docs), Mark Schreiber, David Huen, Richard Holland
    • Method Detail

      • getDNA

        public static FiniteAlphabet getDNA()
        Return the DNA alphabet.
        Returns:
        a flyweight version of the DNA alphabet
      • getDNAxDNA

        public static FiniteAlphabet getDNAxDNA()
        Gets the (DNA x DNA) Alphabet
        Returns:
        a flyweight version of the (DNA x DNA) alphabet
      • getCodonAlphabet

        public static FiniteAlphabet getCodonAlphabet()
        Gets the (DNA x DNA x DNA) Alphabet
        Returns:
        a flyweight version of the (DNA x DNA x DNA) alphabet
      • index

        public static int index​(Symbol sym)
                         throws IllegalSymbolException
        Return an integer index for a symbol - compatible with forIndex.

        The index for a symbol is stable across virtual machines and invocations.

        Parameters:
        sym - the Symbol to index
        Returns:
        the index for that symbol
        Throws:
        IllegalSymbolException - if sym is not a member of the DNA alphabet
      • forIndex

        public static Symbol forIndex​(int index)
                               throws IndexOutOfBoundsException
        Return the symbol for an index - compatible with index.

        The index for a symbol is stable across virtual machines and invocations.

        Parameters:
        index - the index to look up
        Returns:
        the symbol at that index
        Throws:
        IndexOutOfBoundsException - if index is not between 0 and 3
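The round-trip contract between index and forIndex can be illustrated with a minimal plain-Java sketch. It does not use BioJava: the class name IndexRoundTripSketch, the char-based symbols, and the a, c, g, t ordering are all assumptions made for the example, not the ordering the library guarantees.

```java
// Plain-Java illustration of the index/forIndex contract; not BioJava code.
final class IndexRoundTripSketch {
    // ASSUMPTION: hypothetical ordering chosen for this example only.
    private static final String SYMBOLS = "acgt";

    // Maps a symbol to its index, rejecting anything outside the alphabet.
    static int index(char sym) {
        int i = SYMBOLS.indexOf(sym);
        if (i < 0) {
            throw new IllegalArgumentException("not a DNA symbol: " + sym);
        }
        return i;
    }

    // Maps an index back to its symbol, rejecting indices outside 0..3.
    static char forIndex(int index) {
        if (index < 0 || index >= SYMBOLS.length()) {
            throw new IndexOutOfBoundsException("index must be 0..3, got " + index);
        }
        return SYMBOLS.charAt(index);
    }

    public static void main(String[] args) {
        // Each index maps to a symbol that maps back to the same index.
        for (int i = 0; i < SYMBOLS.length(); i++) {
            char sym = forIndex(i);
            System.out.println(i + " -> " + sym + " -> " + index(sym));
        }
    }
}
```

Because the mapping is fixed, an index stored by one run can be resolved to the same symbol by a later run, which is the stability property described above.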
      • getDNADistribution

        public static Distribution getDNADistribution​(double fractionGC)
        Returns a SimpleDistribution with the specified GC content.
        Parameters:
        fractionGC - (G+C) content as a fraction.
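As a rough illustration of what a distribution with a given GC content looks like, the plain-Java sketch below splits fractionGC evenly between g and c and the remainder evenly between a and t. The even split, the class name GcDistributionSketch, and the Map representation are assumptions for the example; the sketch does not use BioJava's Distribution type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of per-symbol weights for a given GC fraction.
final class GcDistributionSketch {
    // ASSUMPTION: g and c share fractionGC equally; a and t share the rest.
    static Map<Character, Double> dnaWeights(double fractionGC) {
        if (fractionGC < 0.0 || fractionGC > 1.0) {
            throw new IllegalArgumentException("fractionGC must be in [0, 1]");
        }
        double gc = fractionGC / 2.0;
        double at = (1.0 - fractionGC) / 2.0;
        Map<Character, Double> weights = new LinkedHashMap<>();
        weights.put('a', at);
        weights.put('c', gc);
        weights.put('g', gc);
        weights.put('t', at);
        return weights;
    }

    public static void main(String[] args) {
        // Print the four weights for a sequence model with 60% GC content.
        System.out.println(dnaWeights(0.6));
    }
}
```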
      • getDNAxDNADistribution

        public static Distribution getDNAxDNADistribution​(double fractionGC0,
                                                          double fractionGC1)
        Returns a (DNA x DNA) cross-product Distribution with the specified GC content in each component Alphabet.
        Parameters:
        fractionGC0 - (G+C) content of first sequence as a fraction.
        fractionGC1 - (G+C) content of second sequence as a fraction.
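One way to picture a (DNA x DNA) cross-product distribution is as 16 pair weights, each the product of two single-symbol weights. The plain-Java sketch below assumes the two components are independent and that each GC fraction is split evenly between g and c; both are assumptions for the example, and the names PairDistributionSketch, weight and pairWeights are hypothetical rather than BioJava API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of a cross-product of two GC-parameterised distributions.
final class PairDistributionSketch {
    private static final String SYMBOLS = "acgt";

    // ASSUMPTION: g and c share fractionGC equally; a and t share the rest.
    static double weight(char sym, double fractionGC) {
        boolean isGC = (sym == 'g' || sym == 'c');
        return (isGC ? fractionGC : 1.0 - fractionGC) / 2.0;
    }

    // ASSUMPTION: components are independent, so pair weight = product of marginals.
    static Map<String, Double> pairWeights(double fractionGC0, double fractionGC1) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (char first : SYMBOLS.toCharArray()) {
            for (char second : SYMBOLS.toCharArray()) {
                weights.put("" + first + second,
                        weight(first, fractionGC0) * weight(second, fractionGC1));
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        // 16 pair weights for GC fractions of 0.6 and 0.4.
        pairWeights(0.6, 0.4).forEach((pair, w) -> System.out.println(pair + " " + w));
    }
}
```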
      • toRNA

        public static SymbolList toRNA​(SymbolList syms)
                                throws IllegalAlphabetException
        Converts a SymbolList from the DNA Alphabet to the RNA Alphabet.
        Parameters:
        syms - the SymbolList to convert to RNA
        Returns:
        a view on syms where Symbols have been converted to RNA. Most significantly, every t becomes a u. The 5' to 3' order of the Symbols is conserved.
        Throws:
        IllegalAlphabetException - if syms is not DNA.
        Since:
        1.4
      • transcribeToRNA

        public static SymbolList transcribeToRNA​(SymbolList syms)
                                          throws IllegalAlphabetException
        Transcribes DNA to RNA. This method represents the biological process more closely than toRNA(SymbolList syms) does. The supplied DNA SymbolList is assumed to be the template strand in the 5' to 3' orientation. The resulting RNA is transcribed from this template, so it is effectively the reverse complement expressed in the RNA alphabet. The method is equivalent to calling reverseComplement() followed by toRNA().

        If you are dealing with cDNA sequences that you want converted to RNA, you would be better off calling toRNA(SymbolList syms).

        Parameters:
        syms - the SymbolList to convert to RNA
        Returns:
        a view on syms where Symbols have been converted to RNA.
        Throws:
        IllegalAlphabetException - if syms is not DNA.
        Since:
        1.4
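The difference between toRNA and transcribeToRNA can be shown with a small plain-Java sketch on lowercase strings. It is an illustration of the semantics described above, not BioJava code: the class TranscriptionSketch and its String-based methods are hypothetical, and ambiguity symbols are not handled.

```java
// Plain-Java illustration: direct conversion versus template-strand transcription.
final class TranscriptionSketch {
    // Direct conversion: order is kept and every t becomes u.
    static String toRna(String dna) {
        return dna.replace('t', 'u');
    }

    // Complement of a single unambiguous lowercase base.
    static char complement(char base) {
        switch (base) {
            case 'a': return 't';
            case 't': return 'a';
            case 'g': return 'c';
            case 'c': return 'g';
            default: throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }

    // Reads the strand backwards and complements each base.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    // Transcription from a template strand: reverse complement, then t -> u.
    static String transcribeToRna(String template) {
        return toRna(reverseComplement(template));
    }

    public static void main(String[] args) {
        System.out.println(toRna("atgc"));           // augc
        System.out.println(transcribeToRna("atgc")); // gcau
    }
}
```

For the input atgc, direct conversion keeps the order and gives augc, while transcription from the template gives gcau.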
      • toProtein

        public static SymbolList toProtein​(SymbolList syms)
                                    throws IllegalAlphabetException
        Convenience method that converts a DNA sequence to RNA and then to protein. Translation uses the +1 reading frame of the SymbolList. The whole SymbolList is translated, although up to 2 trailing DNA residues may be dropped if they do not form a full codon.
        Parameters:
        syms - the sequence to be translated.
        Returns:
        the translated protein sequence.
        Throws:
        IllegalAlphabetException - if syms is not from the DNA alphabet.
        Since:
        1.5.1
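The +1 frame translation with truncation of an incomplete final codon can be sketched in plain Java as follows. This is an illustration under assumptions, not the library's implementation: it reads codons straight from the DNA string instead of going through an RNA intermediate, uses the standard genetic code with stops written as *, and the class name TranslationSketch is hypothetical.

```java
// Plain-Java illustration of +1 frame translation with trailing-residue truncation.
final class TranslationSketch {
    // Standard genetic code laid out with bases ordered T, C, A, G at each position.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static int baseIndex(char base) {
        int i = BASES.indexOf(base);
        if (i < 0) {
            throw new IllegalArgumentException("not an unambiguous DNA base: " + base);
        }
        return i;
    }

    static String toProtein(String dna) {
        String upper = dna.toUpperCase();
        // Drop up to 2 trailing residues that cannot form a full codon.
        int usable = upper.length() - upper.length() % 3;
        StringBuilder protein = new StringBuilder(usable / 3);
        for (int i = 0; i < usable; i += 3) {
            int codon = 16 * baseIndex(upper.charAt(i))
                    + 4 * baseIndex(upper.charAt(i + 1))
                    + baseIndex(upper.charAt(i + 2));
            protein.append(AMINO_ACIDS.charAt(codon));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        // The two trailing residues "GG" do not form a codon and are ignored.
        System.out.println(toProtein("ATGGCCTAAGG")); // MA*
    }
}
```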
      • toProtein

        public static SymbolList toProtein​(SymbolList syms,
                                           int start,
                                           int end)
                                    throws IllegalAlphabetException
        Convenience method to translate a region of a DNA sequence directly into protein. The start and end can be specified, but if the length of the region is not evenly divisible by three, the translated region is truncated until a full terminal codon can be formed.
        Parameters:
        syms - the DNA sequence to be translated.
        start - the location to begin translation.
        end - the end of the translated region.
        Returns:
        the translated protein sequence.
        Throws:
        IllegalAlphabetException - if syms is not from the DNA alphabet.
        Since:
        1.5.1
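The truncation rule for a region can be worked through with a few lines of arithmetic. The sketch below assumes start and end are 1-based and inclusive, which this page does not state, and the class RegionTruncationSketch and its methods are hypothetical helpers, not part of BioJava.

```java
// Plain-Java illustration of trimming a region to a whole number of codons.
final class RegionTruncationSketch {
    // ASSUMPTION: coordinates are 1-based and inclusive.
    // Returns the last position actually translated; start - 1 means no full codon fits.
    static int truncatedEnd(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("require 1 <= start <= end");
        }
        int length = end - start + 1;
        return end - length % 3;
    }

    // Number of complete codons available in the region.
    static int codonCount(int start, int end) {
        return (truncatedEnd(start, end) - start + 1) / 3;
    }

    public static void main(String[] args) {
        // Region 4..11 has 8 residues, so 2 are dropped and 4..9 is translated.
        System.out.println(truncatedEnd(4, 11) + " " + codonCount(4, 11)); // 9 2
    }
}
```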