Class CharacterTokenization

  • All Implemented Interfaces:
    Serializable, Annotatable, SymbolTokenization, Changeable

    public class CharacterTokenization
    extends Unchangeable
    implements SymbolTokenization, Serializable

    Implementation of SymbolTokenization which binds symbols to single Unicode characters.

    Many alphabets (including all the simple built-in alphabets such as DNA, RNA and Protein) have an instance of CharacterTokenization registered under the name "token", so you can write CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token"); and expect it to work. A newly constructed instance of this class for an alphabet has no initial associations between Symbols and characters. It is your responsibility to populate the new tokenization appropriately.
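    The point that a new tokenization starts out empty can be illustrated with a minimal, self-contained sketch. It is not the BioJava implementation: the class CharBindingSketch and its methods bind, tokenize and parse are hypothetical names invented for illustration, and symbols are modelled as plain strings rather than Symbol objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT the BioJava class.
// Symbols are modelled as plain Strings (an assumption of this sketch).
final class CharBindingSketch {
    private final Map<String, Character> symbolToChar = new HashMap<>();
    private final Map<Character, String> charToSymbol = new HashMap<>();

    // Associates a symbol with a single character, in both directions.
    void bind(String symbol, char c) {
        symbolToChar.put(symbol, c);
        charToSymbol.put(c, symbol);
    }

    // Returns the character bound to the symbol, or fails if none was bound.
    char tokenize(String symbol) {
        Character c = symbolToChar.get(symbol);
        if (c == null) {
            throw new IllegalArgumentException("No character bound to symbol: " + symbol);
        }
        return c;
    }

    // Returns the symbol bound to the character, or fails if none was bound.
    String parse(char c) {
        String symbol = charToSymbol.get(c);
        if (symbol == null) {
            throw new IllegalArgumentException("No symbol bound to character: " + c);
        }
        return symbol;
    }

    public static void main(String[] args) {
        CharBindingSketch ct = new CharBindingSketch();
        // A fresh instance knows nothing, so the caller populates it first.
        ct.bind("adenine", 'a');
        ct.bind("guanine", 'g');
        System.out.println(ct.tokenize("guanine"));
        System.out.println(ct.parse('a'));
    }
}
```

    In this sketch, calling tokenize or parse before any bind call fails, which mirrors the responsibility described above: the caller has to supply every symbol-to-character association before the tokenization is useful.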

    Since:
    1.2
    Author:
    Thomas Down, Matthew Pocock, Greg Cox, Keith James
    See Also:
    Serialized Form