public class CharacterTokenization extends Unchangeable implements SymbolTokenization, Serializable
Implementation of SymbolTokenization which binds symbols to single Unicode characters.
Many alphabets (and all simple built-in alphabets like DNA, RNA
and Protein) will have an instance of CharacterTokenization
registered under the name "token", so that you could say
CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token");
and expect it to work. When
you construct a new instance of this class for an alphabet, there
will be no initial associations of Symbols with characters. It is
your responsibility to populate the new tokenization appropriately.
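Nested classes/interfaces inherited from interface SymbolTokenization: SymbolTokenization.TokenType
Nested classes/interfaces inherited from interface Annotatable: Annotatable.AnnotationForwarder
Fields inherited from interface SymbolTokenization: CHARACTER, FIXEDWIDTH, SEPARATED, UNKNOWN
Fields inherited from interface Annotatable: ANNOTATION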
Constructor and Description |
---|
CharacterTokenization(Alphabet alpha, boolean caseSensitive) |
Modifier and Type | Method and Description |
---|---|
void | bindSymbol(Symbol s, char c): Bind a Symbol to a character. |
Alphabet | getAlphabet(): The alphabet to which this tokenization applies. |
Annotation | getAnnotation(): Should return the associated annotation object. |
protected Symbol[] | getTokenTable() |
SymbolTokenization.TokenType | getTokenType(): Determine the style of tokenization represented by this object. |
StreamParser | parseStream(SeqIOListener listener): Return an object which can parse an arbitrary character stream into symbols. |
Symbol | parseToken(String token): Returns the symbol for a single token. |
protected Symbol | parseTokenChar(char c) |
String | tokenizeSymbol(Symbol s): Return a token representing a single symbol. |
String | tokenizeSymbolList(SymbolList sl): Return a string representation of a list of symbols. |
Methods inherited from class Unchangeable: addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Changeable: addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
public CharacterTokenization(Alphabet alpha, boolean caseSensitive)
public Alphabet getAlphabet()
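Description copied from interface: SymbolTokenization
The alphabet to which this tokenization applies.
Specified by:
getAlphabet in interface SymbolTokenization
public SymbolTokenization.TokenType getTokenType()
Description copied from interface: SymbolTokenization
Determine the style of tokenization represented by this object.
Specified by:
getTokenType in interface SymbolTokenization
public Annotation getAnnotation()
Description copied from interface: Annotatable
Should return the associated annotation object.
Specified by:
getAnnotation in interface Annotatable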
public void bindSymbol(Symbol s, char c)
Bind a Symbol to a character.
This method will ensure that when this char is observed, it resolves to this symbol. If it was previously associated with another symbol, the old binding is removed. If this is the first time the symbol has been bound to any character, then this character is taken to be the default tokenization of the Symbol. This means that when converting symbols into characters, this char will be used. If the symbol has previously been bound to another character, then this char will not be produced for the symbol when stringifying the symbol, but this symbol will be produced when tokenizing this character.
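The binding rules described above can be modeled in a few lines. The following is a minimal sketch and not BioJava code: the class BindingSketch and its methods are hypothetical, symbols are represented by plain strings, and IllegalArgumentException stands in for IllegalSymbolException. The text above does not say what happens to a symbol whose default character is later rebound to a different symbol, so the sketch simply leaves that symbol's default unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the binding rules described above; not BioJava code.
class BindingSketch {
    // character -> symbol name, used when parsing
    private final Map<Character, String> charToSymbol = new HashMap<>();
    // symbol name -> default character, used when tokenizing
    private final Map<String, Character> defaultChar = new HashMap<>();

    // Bind a symbol to a character.
    void bind(String symbol, char c) {
        // Any earlier symbol bound to this character is replaced.
        charToSymbol.put(c, symbol);
        // Only the first character bound to a symbol becomes its default token.
        defaultChar.putIfAbsent(symbol, c);
    }

    // Resolve a character to its symbol, or fail if nothing is bound to it.
    String parse(char c) {
        String symbol = charToSymbol.get(c);
        if (symbol == null) {
            throw new IllegalArgumentException("No symbol bound to '" + c + "'");
        }
        return symbol;
    }

    // Return the default character for a symbol, or fail if it was never bound.
    char tokenize(String symbol) {
        Character c = defaultChar.get(symbol);
        if (c == null) {
            throw new IllegalArgumentException("Symbol not bound: " + symbol);
        }
        return c;
    }
}
```

In this model, binding "adenine" to 'a' and then to 'A' makes both characters parse to "adenine", while tokenizing "adenine" still yields 'a'. Binding another symbol to 'A' afterwards redirects that character to the new symbol.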
Parameters:
s - the Symbol to bind
c - the char to bind it to
public Symbol parseToken(String token) throws IllegalSymbolException
Description copied from interface: SymbolTokenization
Returns the symbol for a single token.
The Symbol will be a member of the alphabet. If the token is not recognized as mapping to a symbol, an exception will be thrown.
Specified by:
parseToken in interface SymbolTokenization
Parameters:
token - the token to retrieve a Symbol for
Throws:
IllegalSymbolException - if there is no Symbol for the token
protected Symbol[] getTokenTable()
protected Symbol parseTokenChar(char c) throws IllegalSymbolException
Throws:
IllegalSymbolException
public String tokenizeSymbol(Symbol s) throws IllegalSymbolException
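Description copied from interface: SymbolTokenization
Return a token representing a single symbol.
Specified by:
tokenizeSymbol in interface SymbolTokenization
Parameters:
s - The symbol
Throws:
IllegalSymbolException - if the symbol isn't recognized.
public String tokenizeSymbolList(SymbolList sl) throws IllegalAlphabetException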
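Description copied from interface: SymbolTokenization
Return a string representation of a list of symbols.
Specified by:
tokenizeSymbolList in interface SymbolTokenization
Parameters:
sl - A SymbolList
Throws:
IllegalAlphabetException - if alphabets don't match
public StreamParser parseStream(SeqIOListener listener)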
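Description copied from interface: SymbolTokenization
Return an object which can parse an arbitrary character stream into symbols.
Specified by:
parseStream in interface SymbolTokenization
Parameters:
listener - The listener which gets notified of parsed symbols.
Copyright © 2020 BioJava. All rights reserved.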