Class MotifTools


  • public class MotifTools
    extends Object
    MotifTools contains utility methods for sequence motifs.
    Author:
    Keith James
    • Method Detail

      • createRegex

        public static String createRegex(SymbolList motif)

        createRegex creates a regular expression that matches the given SymbolList. Each ambiguous Symbol is transformed into a character class, and, as the examples show, a run of identical Symbols is written as a counted repetition. For example, the nucleotide sequence "AAGCTT" becomes "A{2}GCT{2}" and "CTNNG" becomes "CT[ABCDGHKMNRSTVWY]{2}G". A character class is built by calling getMatches on the ambiguity symbol to obtain the Alphabet of AtomicSymbols it matches, calling getAllSymbols on that Alphabet, removing any gap symbols, and tokenizing the remainder. Tokens within a character class appear in ascending numerical order of their char values, as determined by Arrays.sort(char[]).
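
        The transformation described above can be illustrated with a small standalone sketch. It is not the BioJava implementation: it works on plain Strings rather than SymbolLists, and the class name MotifRegexSketch, the method toRegex, and the lookup map of ambiguity characters are all hypothetical stand-ins for what getMatches and getAllSymbols would provide.

```java
import java.util.Arrays;
import java.util.Map;

// Illustrative sketch only; not the MotifTools source.
final class MotifRegexSketch {

    /**
     * Builds a regular expression from a motif string.
     *
     * @param motif   the motif, one char token per symbol
     * @param classes hypothetical lookup from an ambiguity token to the
     *                tokens it should expand to (assumed to be gap-free)
     */
    static String toRegex(String motif, Map<Character, String> classes) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < motif.length()) {
            char token = motif.charAt(i);

            // Measure the run of identical tokens starting at position i.
            int run = 1;
            while (i + run < motif.length() && motif.charAt(i + run) == token) {
                run++;
            }

            String members = classes.get(token);
            if (members == null) {
                // Unambiguous token: emit it literally.
                regex.append(token);
            } else {
                // Ambiguous token: emit a character class whose members are
                // in ascending char order, as Arrays.sort(char[]) produces.
                char[] sorted = members.toCharArray();
                Arrays.sort(sorted);
                regex.append('[').append(sorted).append(']');
            }

            // Runs longer than one are written as a counted repetition.
            if (run > 1) {
                regex.append('{').append(run).append('}');
            }
            i += run;
        }
        return regex.toString();
    }
}
```

        With an empty map, toRegex("AAGCTT", ...) reproduces the first example. Supplying a map entry for 'N' that lists the fifteen tokens shown in the second example reproduces that one too, whatever order the tokens are listed in.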

        The Alphabet of the SymbolList must be finite and must have a character token type. Regular expressions may be generated for any such SymbolList, not just DNA, RNA and protein.

        Parameters:
        motif - a SymbolList.
        Returns:
        a String regular expression.
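
        Because the result is an ordinary String, it can be handed to any regular expression engine that understands character classes and counted repetition. The sketch below assumes java.util.regex is used and that the sequence being searched is available as a String of the same tokens. The class MotifScanSketch and the method findStarts are hypothetical names, and the regex is the "AAGCTT" example quoted above rather than a value obtained by calling createRegex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; scans a plain String with a motif regex.
final class MotifScanSketch {

    /** Returns the 0-based start index of each non-overlapping match. */
    static List<Integer> findStarts(String regex, String sequence) {
        Matcher matcher = Pattern.compile(regex).matcher(sequence);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "A{2}GCT{2}" is the expression documented for the motif "AAGCTT".
        System.out.println(findStarts("A{2}GCT{2}", "GGAAGCTTCCAAGCTT"));
    }
}
```

        Note that Matcher.find() resumes after the end of the previous match, so overlapping occurrences of a motif are not reported by this loop.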