Package org.biojava.utils.regex
The implementation uses the java.util.regex package to perform the heavy lifting. Previous work had already defined a SymbolListCharSequence class to wrap SymbolLists and permit java.util.regex to be applied to the resultant CharSequence. This package extends this in two ways.
First, this package implements a SymbolTokenization for Alphabets that do not have one defined. It does so by arbitrarily mapping each AtomicSymbol in the Alphabet to a Unicode character in the private-use range. The String that defines the regex can then be assembled by calling PatternFactory.charValue(), which returns the Unicode character value for a Symbol in the Alphabet.
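The mapping idea can be sketched in plain Java. This is an illustration only and does not reproduce the package's own code: the class PrivateUseTokenizer, its methods, and the use of String names in place of AtomicSymbols are all assumptions made for this example, and only java.util.regex is used.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: assigns each symbol name a character from the
// Unicode Private Use Area (U+E000 to U+F8FF). Not part of BioJava.
class PrivateUseTokenizer {
    private static final int PRIVATE_USE_START = 0xE000;
    private static final int PRIVATE_USE_END = 0xF8FF;

    private final Map<String, Character> charBySymbol = new LinkedHashMap<>();

    PrivateUseTokenizer(List<String> symbolNames) {
        int next = PRIVATE_USE_START;
        for (String name : symbolNames) {
            if (next > PRIVATE_USE_END) {
                throw new IllegalArgumentException("Too many symbols for the private-use range");
            }
            charBySymbol.put(name, (char) next++);
        }
    }

    // Returns the character assigned to one symbol, for use in a regex String.
    char charValue(String symbolName) {
        Character c = charBySymbol.get(symbolName);
        if (c == null) {
            throw new IllegalArgumentException("Unknown symbol: " + symbolName);
        }
        return c;
    }

    // Converts a list of symbols into the String that the regex engine will scan.
    String tokenize(List<String> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (String s : symbols) {
            sb.append(charValue(s));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PrivateUseTokenizer t =
            new PrivateUseTokenizer(Arrays.asList("helix", "strand", "coil"));

        // Regex meaning "two or more helix symbols followed by one coil symbol".
        // Private-use characters are not regex metacharacters, so no quoting is needed.
        String regex = t.charValue("helix") + "{2,}" + t.charValue("coil");

        String target = t.tokenize(
            Arrays.asList("strand", "helix", "helix", "helix", "coil", "strand"));

        Matcher m = Pattern.compile(regex).matcher(target);
        if (m.find()) {
            System.out.println("match at [" + m.start() + ", " + m.end() + ")");
        }
    }
}
```

Because every symbol occupies exactly one char, match offsets reported by the regex engine correspond directly to symbol positions in the original list.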
Second, the package has been restructured to resemble the classes in java.util.regex more closely, albeit adapted to SymbolLists. The two APIs are very similar.
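The wrapping approach works because java.util.regex.Pattern.matcher() accepts any CharSequence, not only a String. The sketch below shows that mechanism with a stand-in type: BaseListCharSequence and its Base enum are hypothetical names invented for this example and are not the SymbolListCharSequence class mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: exposes a list of typed symbols as a CharSequence
// so that java.util.regex can scan it without copying it into a String.
class BaseListCharSequence implements CharSequence {
    enum Base { A, C, G, T }

    private final List<Base> bases;

    BaseListCharSequence(List<Base> bases) {
        this.bases = bases;
    }

    @Override
    public int length() {
        return bases.size();
    }

    @Override
    public char charAt(int index) {
        // One lower-case character per symbol: a, c, g or t.
        return Character.toLowerCase(bases.get(index).name().charAt(0));
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new BaseListCharSequence(bases.subList(start, end));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharSequence seq = new BaseListCharSequence(Arrays.asList(
            Base.G, Base.A, Base.T, Base.T, Base.A, Base.C, Base.A));

        // Standard java.util.regex idiom: compile, create a matcher, iterate with find().
        Matcher m = Pattern.compile("at+a").matcher(seq);
        while (m.find()) {
            System.out.println(m.group() + " at [" + m.start() + ", " + m.end() + ")");
        }
    }
}
```

The compile, matcher, find sequence shown in main() is the java.util.regex idiom that the Pattern and Matcher classes of this package are described as mirroring.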
Caveats
Ambiguity symbols in the pattern String are expanded so that each of their component Symbols is accepted as a match at that position. Ambiguity and gap symbols in the target SymbolList, however, are converted to the all-ambiguity symbol by the default tokenizers, so they can be expected to fail to match any specific symbol in the Pattern. This is usually the desired behaviour: blocks of N in DNA, for example, should not match whatever pattern happens to be searched for.
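The asymmetry between pattern and target can be illustrated with ordinary java.util.regex. The sketch below is an assumption-laden stand-in, not the package's implementation: the class AmbiguityExpansion, its partial IUPAC table, and the use of lower-case letters for DNA symbols are all chosen for this example. It expands ambiguity codes in the pattern into character classes while leaving "n" in the target as a literal that no specific base matches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ambiguity expansion in a DNA pattern. Not part of BioJava.
class AmbiguityExpansion {
    // A small subset of the IUPAC nucleotide ambiguity codes.
    private static final Map<Character, String> AMBIGUITY = new HashMap<>();
    static {
        AMBIGUITY.put('r', "ag"); // purine
        AMBIGUITY.put('y', "ct"); // pyrimidine
        AMBIGUITY.put('w', "at"); // weak
        AMBIGUITY.put('s', "cg"); // strong
    }

    // Replaces each ambiguity code with a character class of its components.
    // Handles plain symbol characters only; escapes such as a backslash
    // followed by a letter are outside the scope of this sketch.
    static String expand(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            String members = AMBIGUITY.get(c);
            if (members == null) {
                sb.append(c);
            } else {
                sb.append('[').append(members).append(']');
            }
        }
        return sb.toString();
    }

    // Counts non-overlapping matches of the expanded pattern in the target.
    static int countMatches(String pattern, String target) {
        Matcher m = Pattern.compile(expand(pattern)).matcher(target);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "tgry" becomes "tg[ag][ct]".
        System.out.println(expand("tgry"));

        // "tgac" matches; "tgnn" does not, because the target's "n"
        // is not one of the specific bases listed in the character classes.
        System.out.println(countMatches("tgry", "aatgacnnnntgnn"));
    }
}
```

In this example the run of "n" characters in the target contributes no matches, which mirrors the behaviour described for blocks of N.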
Interface Summary
Interface        Description
Search.Listener  Interface for a class that will receive match information from this class.
Class Summary
Class           Description
Matcher         This class is analogous to java.util.regex.Matcher except that it works on SymbolLists instead of Strings.
Pattern         A class analogous to java.util.regex.Pattern but for SymbolLists.
PatternFactory  A class that creates Patterns for regex matching on SymbolLists of a specific Alphabet.
Search          A utility class to make searching a Sequence with many regex patterns easier.
Exception Summary
Exception       Description
RegexException  An exception thrown by classes of this package.