Class CharacterTokenization

  • All Implemented Interfaces:
    Serializable, Annotatable, SymbolTokenization, Changeable

    public class CharacterTokenization
    extends Unchangeable
    implements SymbolTokenization, Serializable

    Implementation of SymbolTokenization which binds symbols to single Unicode characters.

    Many alphabets (including all of the simple built-in alphabets such as DNA, RNA and Protein) have an instance of CharacterTokenization registered under the name "token", so you can write CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token"); and expect it to work. A newly constructed instance of this class, however, has no initial associations between Symbols and characters. It is your responsibility to populate the new tokenization appropriately.
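
    The need to populate a fresh tokenization can be illustrated with a minimal sketch. CharTokenSketch below is a hypothetical, plain-Java stand-in written for this illustration, not the BioJava class: it uses symbol names (Strings) in place of Symbol objects, and its method names and case-folding behaviour are assumptions rather than the library's API. It shows only the idea that a new instance starts with no bindings, and that each symbol must be bound to a character before it can be tokenized or parsed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in (NOT the BioJava class): binds symbol names to single characters. */
class CharTokenSketch {
    // Both maps start empty: a new tokenization knows no symbols at all.
    private final Map<String, Character> symbolToChar = new HashMap<>();
    private final Map<Character, String> charToSymbol = new HashMap<>();
    private final boolean caseSensitive;

    CharTokenSketch(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    /** Binds a symbol to a character; the first character bound to a symbol is used for output. */
    void bindSymbol(String symbol, char c) {
        symbolToChar.putIfAbsent(symbol, c);
        charToSymbol.put(normalize(c), symbol);
    }

    /** Returns the character bound to the symbol, or fails if nothing was bound. */
    char tokenizeSymbol(String symbol) {
        Character c = symbolToChar.get(symbol);
        if (c == null) {
            throw new NoSuchElementException("No character bound to symbol: " + symbol);
        }
        return c;
    }

    /** Returns the symbol bound to the character, or fails if nothing was bound. */
    String parseToken(char c) {
        String symbol = charToSymbol.get(normalize(c));
        if (symbol == null) {
            throw new NoSuchElementException("No symbol bound to character: " + c);
        }
        return symbol;
    }

    // Assumption for this sketch: case-insensitive lookups fold to lower case.
    private char normalize(char c) {
        return caseSensitive ? c : Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        CharTokenSketch ct = new CharTokenSketch(false);
        // Populating the tokenization is the caller's job.
        ct.bindSymbol("adenine", 'a');
        ct.bindSymbol("cytosine", 'c');
        ct.bindSymbol("guanine", 'g');
        ct.bindSymbol("thymine", 't');

        System.out.println(ct.tokenizeSymbol("guanine")); // prints g
        System.out.println(ct.parseToken('T'));           // prints thymine
    }
}
```

    With real BioJava types, the equivalent step would be to bind each Symbol of the alphabet to its character on the new CharacterTokenization. Consult the method summary of this class for the exact signatures.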

    Author:
      Thomas Down, Matthew Pocock, Greg Cox, Keith James
