Class CharacterTokenization
- java.lang.Object
  - org.biojava.utils.Unchangeable
    - org.biojava.bio.seq.io.CharacterTokenization

- All Implemented Interfaces:
- Serializable, Annotatable, SymbolTokenization, Changeable

public class CharacterTokenization extends Unchangeable implements SymbolTokenization, Serializable

Implementation of SymbolTokenization which binds symbols to single Unicode characters.

Many alphabets (and all simple built-in alphabets like DNA, RNA and Protein) will have an instance of CharacterTokenization registered under the name "token", so that you can write

CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token");

and expect it to work.

When you construct a new instance of this class for an alphabet, there are no initial associations of Symbols with characters. It is your responsibility to populate the new tokenization appropriately.

- Since:
- 1.2
- Author:
- Thomas Down, Matthew Pocock, Greg Cox, Keith James
- See Also:
- Serialized Form
 
Nested Class Summary

- Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable: Annotatable.AnnotationForwarder
- Nested classes/interfaces inherited from interface org.biojava.bio.seq.io.SymbolTokenization: SymbolTokenization.TokenType

Field Summary

- Fields inherited from interface org.biojava.bio.Annotatable: ANNOTATION
- Fields inherited from interface org.biojava.bio.seq.io.SymbolTokenization: CHARACTER, FIXEDWIDTH, SEPARATED, UNKNOWN

Constructor Summary

- CharacterTokenization(Alphabet alpha, boolean caseSensitive)

Method Summary

- void bindSymbol(Symbol s, char c): Bind a Symbol to a character.
- Alphabet getAlphabet(): The alphabet to which this tokenization applies.
- Annotation getAnnotation(): Should return the associated annotation object.
- protected Symbol[] getTokenTable()
- SymbolTokenization.TokenType getTokenType(): Determine the style of tokenization represented by this object.
- StreamParser parseStream(SeqIOListener listener): Return an object which can parse an arbitrary character stream into symbols.
- Symbol parseToken(String token): Returns the symbol for a single token.
- protected Symbol parseTokenChar(char c)
- String tokenizeSymbol(Symbol s): Return a token representing a single symbol.
- String tokenizeSymbolList(SymbolList sl): Return a string representation of a list of symbols.

- Methods inherited from class org.biojava.utils.Unchangeable: addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
- Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
- Methods inherited from interface org.biojava.utils.Changeable: addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener

Constructor Detail

CharacterTokenization

public CharacterTokenization(Alphabet alpha, boolean caseSensitive)

Method Detail

getAlphabet

public Alphabet getAlphabet()

Description copied from interface: SymbolTokenization

The alphabet to which this tokenization applies.

- Specified by:
- getAlphabet in interface SymbolTokenization

getTokenType

public SymbolTokenization.TokenType getTokenType()

Description copied from interface: SymbolTokenization

Determine the style of tokenization represented by this object.

- Specified by:
- getTokenType in interface SymbolTokenization

getAnnotation

public Annotation getAnnotation()

Description copied from interface: Annotatable

Should return the associated annotation object.

- Specified by:
- getAnnotation in interface Annotatable
- Returns:
- an Annotation object, never null

bindSymbol

public void bindSymbol(Symbol s, char c)

Bind a Symbol to a character.

This method ensures that when this char is observed, it resolves to this symbol. If the char was previously associated with another symbol, the old binding is removed.

If this is the first time the symbol has been bound to any character, this character becomes the default tokenization of the Symbol: it is the char used when converting the symbol into characters.

If the symbol has previously been bound to another character, this char will not be produced when stringifying the symbol, but this symbol will still be produced when parsing this character.

- Parameters:
- s - the Symbol to bind
- c - the char to bind it to

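The binding rules for bindSymbol can be modelled in a few lines of plain Java. This is a minimal standalone sketch, not BioJava's implementation: the class CharBindingSketch and its methods are hypothetical, symbols are stood in for by strings, and unbound lookups throw IllegalArgumentException rather than IllegalSymbolException. The documentation does not say what happens to a symbol's default character when that character is rebound to a different symbol; the sketch simply leaves the default unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the documented binding rules; symbols are plain strings here.
class CharBindingSketch {
    private final Map<Character, String> charToSymbol = new HashMap<>();
    private final Map<String, Character> defaultChar = new HashMap<>();

    // Bind c to symbol: c now parses to symbol, replacing any earlier binding of c.
    void bind(String symbol, char c) {
        charToSymbol.put(c, symbol);
        // Only the first character ever bound to a symbol becomes its default output.
        defaultChar.putIfAbsent(symbol, c);
    }

    // Character -> symbol (parsing direction).
    String parse(char c) {
        String symbol = charToSymbol.get(c);
        if (symbol == null) {
            throw new IllegalArgumentException("No symbol bound to '" + c + "'");
        }
        return symbol;
    }

    // Symbol -> character (output direction); always the first-bound character.
    char tokenize(String symbol) {
        Character c = defaultChar.get(symbol);
        if (c == null) {
            throw new IllegalArgumentException("No character bound to " + symbol);
        }
        return c;
    }
}
```

In this model, binding "adenine" first to 'a' and then to 'A' makes both characters parse to "adenine", while tokenize("adenine") keeps returning 'a'.
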
parseToken

public Symbol parseToken(String token) throws IllegalSymbolException

Description copied from interface: SymbolTokenization

Returns the symbol for a single token.

The Symbol will be a member of the alphabet. If the token is not recognized as mapping to a symbol, an exception will be thrown.

- Specified by:
- parseToken in interface SymbolTokenization
- Parameters:
- token - the token to retrieve a Symbol for
- Returns:
- the Symbol for that token
- Throws:
- IllegalSymbolException - if there is no Symbol for the token

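As an illustration of the parseToken contract, here is a minimal standalone sketch of single-token lookup. It is not BioJava code: TokenParseSketch is a hypothetical name, symbols are represented as strings, the lookup table is a plain Map, and IllegalArgumentException stands in for IllegalSymbolException. Treating any token that is not exactly one character long as unrecognized is an assumption, based on this class binding symbols to single characters.

```java
import java.util.Map;

// Hypothetical model of single-token parsing; not part of BioJava.
class TokenParseSketch {
    // Look up the symbol bound to a one-character token, failing loudly otherwise.
    static String parseToken(Map<Character, String> table, String token) {
        // Assumption: only single-character tokens can map to a symbol.
        if (token.length() != 1) {
            throw new IllegalArgumentException(
                "Expected a single character, got \"" + token + "\"");
        }
        String symbol = table.get(token.charAt(0));
        if (symbol == null) {
            throw new IllegalArgumentException("No symbol for token \"" + token + "\"");
        }
        return symbol;
    }
}
```
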
getTokenTable

protected Symbol[] getTokenTable()

parseTokenChar

protected Symbol parseTokenChar(char c) throws IllegalSymbolException

- Throws:
- IllegalSymbolException

tokenizeSymbol

public String tokenizeSymbol(Symbol s) throws IllegalSymbolException

Description copied from interface: SymbolTokenization

Return a token representing a single symbol.

- Specified by:
- tokenizeSymbol in interface SymbolTokenization
- Parameters:
- s - The symbol
- Throws:
- IllegalSymbolException - if the symbol isn't recognized.

tokenizeSymbolList

public String tokenizeSymbolList(SymbolList sl) throws IllegalAlphabetException

Description copied from interface: SymbolTokenization

Return a string representation of a list of symbols.

- Specified by:
- tokenizeSymbolList in interface SymbolTokenization
- Parameters:
- sl - A SymbolList
- Throws:
- IllegalAlphabetException - if alphabets don't match

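A minimal standalone sketch of turning a list of symbols into a string, assuming the result is simply each symbol's default character, in order. It does not use BioJava: SymbolListTokenizeSketch is a hypothetical name, symbols are strings, and IllegalArgumentException stands in for the library's checked exceptions. The alphabet check that raises IllegalAlphabetException is not modelled; the sketch only rejects symbols that have no bound character.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of list tokenization for a one-character-per-symbol scheme.
class SymbolListTokenizeSketch {
    // Concatenate the default character of every symbol, preserving order.
    static String tokenizeAll(Map<String, Character> defaults, List<String> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (String symbol : symbols) {
            Character c = defaults.get(symbol);
            if (c == null) {
                throw new IllegalArgumentException("No character bound to " + symbol);
            }
            sb.append(c.charValue());
        }
        return sb.toString();
    }
}
```

Because each symbol maps to exactly one character, the output length always equals the number of symbols in the list.
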
parseStream

public StreamParser parseStream(SeqIOListener listener)

Description copied from interface: SymbolTokenization

Return an object which can parse an arbitrary character stream into symbols.

- Specified by:
- parseStream in interface SymbolTokenization
- Parameters:
- listener - The listener which gets notified of parsed symbols.