public class CharacterTokenization extends Unchangeable implements SymbolTokenization, Serializable
Implementation of SymbolTokenization which binds symbols to single Unicode characters.
Many alphabets (and all simple built-in alphabets like DNA, RNA and Protein) will have an instance of CharacterTokenization registered under the name "token", so that you can write
CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token");
and expect it to work. When you construct a new instance of this class for an alphabet, there are no initial associations of Symbols with characters. It is your responsibility to populate the new tokenization appropriately.
- Author:
- Thomas Down, Matthew Pocock, Greg Cox, Keith James
- See Also:
- Serialized Form
Method and Description
bindSymbol(Symbol s, char c) - Bind a Symbol to a character.
getAlphabet() - The alphabet to which this tokenization applies.
getAnnotation() - Should return the associated annotation object.
getTokenType() - Determine the style of tokenization represented by this object.
parseStream(SeqIOListener listener) - Return an object which can parse an arbitrary character stream into symbols.
parseToken(String token) - Returns the symbol for a single token.
tokenizeSymbol(Symbol s) - Return a token representing a single symbol.
tokenizeSymbolList(SymbolList sl) - Return a string representation of a list of symbols.
Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public Alphabet getAlphabet()
The alphabet to which this tokenization applies.
public SymbolTokenization.TokenType getTokenType()
Determine the style of tokenization represented by this object.
public Annotation getAnnotation()
Description copied from interface: Annotatable
Should return the associated annotation object.
public void bindSymbol(Symbol s, char c)
Bind a Symbol to a character.
This method ensures that when this char is observed, it resolves to this Symbol. If the char was previously associated with another Symbol, the old binding is removed.
If this is the first time the Symbol has been bound to any character, this character becomes the default tokenization of the Symbol: it is the char produced when converting the Symbol into characters. If the Symbol has previously been bound to another character, this char will not be produced when stringifying the Symbol, but the Symbol will still be produced when tokenizing this char.
s - the Symbol to bind
c - the char to bind it to
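The binding rules described for bindSymbol can be illustrated with a small self-contained sketch. It is not the BioJava implementation: the class `BindingSketch` and its `parse` and `tokenize` helpers are hypothetical, symbols are modelled as plain Strings, and the two-map layout is an assumption chosen to mirror the documented behaviour. The description does not say what happens to a symbol's default token when its character is rebound to another symbol; this sketch simply leaves it untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the BioJava class.
// Symbols are modelled as plain Strings.
class BindingSketch {
    // character -> symbol, consulted when parsing
    private final Map<Character, String> charToSymbol = new HashMap<>();
    // symbol -> default character, consulted when producing tokens
    private final Map<String, Character> symbolToChar = new HashMap<>();

    // Follows the documented rules for bindSymbol(Symbol s, char c).
    public void bindSymbol(String symbol, char c) {
        // An earlier binding of this character to another symbol is replaced.
        charToSymbol.put(c, symbol);
        // Only the first character bound to a symbol becomes its default token.
        symbolToChar.putIfAbsent(symbol, c);
    }

    // Returns the symbol bound to c, or null if c is unbound.
    public String parse(char c) {
        return charToSymbol.get(c);
    }

    // Returns the default character for the symbol, or null if it has none.
    public Character tokenize(String symbol) {
        return symbolToChar.get(symbol);
    }

    public static void main(String[] args) {
        BindingSketch t = new BindingSketch();
        t.bindSymbol("adenine", 'a'); // first binding: 'a' is the default token
        t.bindSymbol("adenine", 'A'); // second binding: parsed, never produced
        System.out.println(t.tokenize("adenine")); // prints "a"
        System.out.println(t.parse('A'));          // prints "adenine"
    }
}
```

Binding both a lower-case and an upper-case character to the same symbol in this way gives case-insensitive parsing while output stays in one case.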
public Symbol parseToken(String token) throws IllegalSymbolException
Returns the symbol for a single token.
The Symbol will be a member of the alphabet. If the token is not recognized as mapping to a symbol, an exception will be thrown.
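The contract above (return the bound symbol, or throw when the token is not recognized) might look like the following sketch. The names `ParseTokenSketch` and `UnknownTokenException` are hypothetical stand-ins rather than BioJava types, and symbols are again plain Strings. Rejecting tokens that are not exactly one character long is an assumption; the description above does not state it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exception standing in for IllegalSymbolException.
class UnknownTokenException extends Exception {
    UnknownTokenException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for illustration only; not the BioJava class.
class ParseTokenSketch {
    private final Map<Character, String> charToSymbol = new HashMap<>();

    public void bind(String symbol, char c) {
        charToSymbol.put(c, symbol);
    }

    public String parseToken(String token) throws UnknownTokenException {
        // Assumption: only one-character tokens are meaningful here.
        if (token.length() != 1) {
            throw new UnknownTokenException("Expected a single character: " + token);
        }
        String symbol = charToSymbol.get(token.charAt(0));
        if (symbol == null) {
            // Unrecognized tokens raise an exception instead of returning null.
            throw new UnknownTokenException("No symbol bound to: " + token);
        }
        return symbol;
    }

    public static void main(String[] args) {
        ParseTokenSketch t = new ParseTokenSketch();
        t.bind("adenine", 'a');
        try {
            System.out.println(t.parseToken("a")); // prints "adenine"
            t.parseToken("x");                     // not bound, so this throws
        } catch (UnknownTokenException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```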
protected Symbol parseTokenChar(char c) throws IllegalSymbolException
public String tokenizeSymbol(Symbol s) throws IllegalSymbolException
Return a token representing a single symbol.
public String tokenizeSymbolList(SymbolList sl) throws IllegalAlphabetException
Return a string representation of a list of symbols.
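For a character tokenization, the string form of a symbol list is naturally one character per symbol. The sketch below shows that idea with hypothetical names (`SymbolListSketch`, `bind`) and Strings in place of Symbols. It is an assumption about the general approach, not the BioJava implementation, and it throws an unchecked IllegalArgumentException where the real method declares IllegalAlphabetException.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the BioJava class.
class SymbolListSketch {
    // symbol -> default character
    private final Map<String, Character> symbolToChar = new HashMap<>();

    public void bind(String symbol, char c) {
        // Keep the first character bound to a symbol as its default token.
        symbolToChar.putIfAbsent(symbol, c);
    }

    // Concatenates the default character of each symbol, in list order.
    public String tokenizeSymbolList(List<String> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (String s : symbols) {
            Character c = symbolToChar.get(s);
            if (c == null) {
                throw new IllegalArgumentException("No token bound for symbol: " + s);
            }
            sb.append(c.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SymbolListSketch t = new SymbolListSketch();
        t.bind("adenine", 'a');
        t.bind("cytosine", 'c');
        t.bind("guanine", 'g');
        t.bind("thymine", 't');
        List<String> seq = Arrays.asList(
                "guanine", "adenine", "thymine", "thymine",
                "adenine", "cytosine", "adenine");
        System.out.println(t.tokenizeSymbolList(seq)); // prints "gattaca"
    }
}
```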