Interface | Description |
---|---|
Search.Listener | Interface for a class that will receive match information from the Search class. |
Class | Description |
---|---|
Matcher | This class is analogous to java.util.regex.Matcher except that it works on SymbolLists instead of Strings. |
Pattern | A class analogous to java.util.regex.Pattern but for SymbolLists. |
PatternFactory | A class that creates Patterns for regex matching on SymbolLists of a specific Alphabet. |
Search | A utility class to make searching a Sequence with many regex patterns easier. |
Exception | Description |
---|---|
RegexException | An exception thrown by classes of this package. |
The implementation uses the java.util.regex package to perform the heavy lifting. Previous work had already defined a SymbolListCharSequence class that wraps a SymbolList so that java.util.regex can be applied to the resulting CharSequence. This package extends that work in two ways.
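The wrapping idea can be sketched with the standard library alone, since java.util.regex accepts any CharSequence. The class below is a hypothetical stand-in that wraps a plain char array; it is not the SymbolListCharSequence class, and all names in it are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a symbol-list wrapper (NOT the BioJava class).
// It exposes an array of one-character tokens as a CharSequence so that
// java.util.regex can scan it without first copying it into a String.
final class SymbolArrayCharSequence implements CharSequence {
    private final char[] tokens;
    private final int start; // inclusive offset into tokens
    private final int end;   // exclusive offset into tokens

    SymbolArrayCharSequence(char[] tokens) {
        this(tokens, 0, tokens.length);
    }

    private SymbolArrayCharSequence(char[] tokens, int start, int end) {
        this.tokens = tokens;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return tokens[start + index];
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        // Share the backing array; only the window changes.
        return new SymbolArrayCharSequence(tokens, start + from, start + to);
    }

    @Override
    public String toString() {
        return new String(tokens, start, end - start);
    }

    // Returns the start index of the first match, or -1 if there is none.
    static int firstMatchStart(String regex, char[] tokens) {
        Matcher m = Pattern.compile(regex).matcher(new SymbolArrayCharSequence(tokens));
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // "ga+t" first matches "gaat", which begins at index 2.
        System.out.println(firstMatchStart("ga+t", "ccgaatc".toCharArray())); // prints 2
    }
}
```

Sharing the backing array in subSequence keeps the wrapper cheap, which matters when the wrapped sequence is long.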
First, this package implements a SymbolTokenization for Alphabets that do not have one defined. It does so by arbitrarily mapping the AtomicSymbols in the Alphabet to Unicode characters in the private-use range. The String that defines the regex can then be assembled by calling PatternFactory.charValue(), which returns the Unicode character value for a Symbol in the Alphabet.
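As a rough illustration of this mapping, the sketch below assigns each symbol name a character from the Basic Multilingual Plane private-use area (U+E000 to U+F8FF) and then assembles a regex from those characters. The PrivateUseTokens class, its encode() method, and the example symbol names are all hypothetical; only the name charValue is borrowed from PatternFactory.charValue(), and the real signature and assignment order in this package may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: maps symbol names to private-use characters.
final class PrivateUseTokens {
    private static final char FIRST = '\uE000'; // start of the BMP private-use area
    private static final char LAST = '\uF8FF';  // end of the BMP private-use area

    private final Map<String, Character> bySymbol = new LinkedHashMap<>();

    PrivateUseTokens(List<String> symbols) {
        char next = FIRST;
        for (String symbol : symbols) {
            if (bySymbol.containsKey(symbol)) {
                continue; // each distinct symbol gets exactly one character
            }
            if (next > LAST) {
                throw new IllegalArgumentException("alphabet too large for the private-use area");
            }
            bySymbol.put(symbol, next++);
        }
    }

    // Assumed analogue of PatternFactory.charValue(): the character for one symbol.
    char charValue(String symbol) {
        Character c = bySymbol.get(symbol);
        if (c == null) {
            throw new IllegalArgumentException("unknown symbol: " + symbol);
        }
        return c;
    }

    // Encodes a list of symbols as a String of their assigned characters.
    String encode(List<String> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (String symbol : symbols) {
            sb.append(charValue(symbol));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PrivateUseTokens tokens = new PrivateUseTokens(Arrays.asList("helix", "strand", "coil"));
        // Regex meaning: two or more helix symbols followed by one coil symbol.
        String regex = tokens.charValue("helix") + "{2,}" + tokens.charValue("coil");
        String target = tokens.encode(Arrays.asList("strand", "helix", "helix", "helix", "coil"));
        Matcher m = Pattern.compile(regex).matcher(target);
        if (m.find()) {
            System.out.println(m.start() + "-" + m.end()); // prints 1-5
        }
    }
}
```

Private-use characters are not regex metacharacters, so they can be concatenated with ordinary quantifiers and character classes without escaping.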
Second, the package has been restructured to resemble the classes in java.util.regex more closely, albeit adapted to SymbolLists. The APIs are very similar.
Also, ambiguity and gap symbols in the target SymbolList will be converted to the all-ambiguity symbol by the default tokenizers and so they can be expected to fail to match any specific symbol in the Pattern. This is most frequently the desired behaviour as blocks of N in DNA, for example, should not be matched against any possible pattern.
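The effect of collapsing ambiguity and gap symbols can be imitated with plain Strings. In the sketch below, the tokenize() method, the choice of 'n' as the all-ambiguity character, and the lower-case DNA letters are assumptions made for illustration; they are not the tokenizers used by this package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the collapsing behaviour, using Strings in place of SymbolLists.
final class AmbiguityCollapse {

    // Keeps a, c, g and t; maps every other character (ambiguity codes, gaps) to 'n'.
    static String tokenize(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = 0; i < dna.length(); i++) {
            char c = Character.toLowerCase(dna.charAt(i));
            sb.append("acgt".indexOf(c) >= 0 ? c : 'n');
        }
        return sb.toString();
    }

    // Counts non-overlapping matches of regex in the tokenized target.
    static int countMatches(String regex, String dna) {
        Matcher m = Pattern.compile(regex).matcher(tokenize(dna));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The ambiguity code R becomes 'n', which [acgt] does not accept.
        System.out.println(countMatches("ac[acgt]gt", "acRgt")); // prints 0
        // A specific base in the same position matches as usual.
        System.out.println(countMatches("ac[acgt]gt", "acagt")); // prints 1
    }
}
```

A pattern that should also accept ambiguous positions would have to name the all-ambiguity character explicitly.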
Copyright © 2020 BioJava. All rights reserved.