Interface Symbol

  • All Superinterfaces:
    Annotatable, Changeable
    All Known Subinterfaces:
    AtomicSymbol, BasisSymbol, DotState, EmissionState, ModelInState, State
    All Known Implementing Classes:
    AbstractSymbol, DoubleAlphabet.DoubleRange, DoubleAlphabet.DoubleSymbol, FundamentalAtomicSymbol, IntegerAlphabet.IntegerSymbol, MagicalState, ProfileEmissionState, SimpleAtomicSymbol, SimpleDotState, SimpleEmissionState, SimpleModelInState

    public interface Symbol
    extends Annotatable
    A single symbol.

    This is the atomic unit of a SymbolList, or sequence. It allows for fine-grained flyweighting: there can be a single instance of each symbol, referenced as many times as needed.
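    To make the flyweight idea concrete, here is a minimal Java sketch that does not depend on BioJava. The classes Sym and SymbolRegistry are hypothetical. The registry creates one instance per token on first request and returns that same object afterwards, so a parsed sequence holds references to shared instances rather than copies.

```java
import java.util.HashMap;
import java.util.Map;

/** Flyweight sketch: hypothetical names, not BioJava API. */
public class FlyweightSketch {

    /** A minimal immutable symbol; only the registry creates instances. */
    static final class Sym {
        final char token;
        final String name;

        private Sym(char token, String name) {
            this.token = token;
            this.name = name;
        }
    }

    /** Hands out the single shared instance for each token. */
    static final class SymbolRegistry {
        private final Map<Character, Sym> byToken = new HashMap<>();

        Sym get(char token, String name) {
            // computeIfAbsent builds the instance once, then reuses it
            return byToken.computeIfAbsent(token, t -> new Sym(t, name));
        }

        int size() {
            return byToken.size();
        }
    }

    /** Builds a "sequence" as an array of references into the registry. */
    static Sym[] parse(String seq, SymbolRegistry reg) {
        Sym[] out = new Sym[seq.length()];
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            out[i] = reg.get(c, "nucleotide-" + c);
        }
        return out;
    }

    public static void main(String[] args) {
        SymbolRegistry reg = new SymbolRegistry();
        Sym[] s = parse("aacgta", reg);
        System.out.println(s.length + " positions, " + reg.size() + " distinct instances");
        System.out.println(s[0] == s[1]); // same token, same object
    }
}
```

    Parsing "aacgta" yields six positions backed by four distinct objects, and s[0] == s[1] holds because both positions refer to the shared a instance.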

    Symbols from finite alphabets can be compared for identity using the == operator. Symbols from infinite alphabets may provide a specific API for testing equality, but they should really override the equals() method (and, with it, hashCode()).
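    A minimal sketch of the two equality regimes, assuming a Java enum stands in for a finite alphabet and a hypothetical DoubleSym class stands in for a symbol from an infinite alphabet (it is not BioJava's DoubleAlphabet.DoubleSymbol). Enum constants are singletons, so == is reliable; DoubleSym instances are created on demand, so equality has to come from equals() and a matching hashCode().

```java
import java.util.HashSet;
import java.util.Set;

public class SymbolEqualitySketch {

    /** Finite alphabet: the enum guarantees one instance per constant, so == is safe. */
    enum Nucleotide { A, C, G, T }

    /**
     * Hypothetical symbol from an infinite alphabet (all doubles). Instances cannot
     * all be pre-built, so equal values may live in different objects.
     */
    static final class DoubleSym {
        final double value;

        DoubleSym(double value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof DoubleSym
                && Double.compare(((DoubleSym) o).value, value) == 0;
        }

        @Override
        public int hashCode() {
            return Double.hashCode(value);
        }
    }

    public static void main(String[] args) {
        Nucleotide x = Nucleotide.valueOf("A");
        System.out.println(x == Nucleotide.A); // true: one instance per constant

        DoubleSym p = new DoubleSym(0.5);
        DoubleSym q = new DoubleSym(0.5);
        System.out.println(p == q);      // false: distinct objects
        System.out.println(p.equals(q)); // true: same value

        Set<DoubleSym> set = new HashSet<>();
        set.add(p);
        System.out.println(set.contains(q)); // true because hashCode agrees with equals
    }
}
```

    Overriding hashCode() alongside equals() is what lets value-equal symbols act as the same key in hash-based collections.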

    Some symbols represent a single token in the sequence. For example, there is a Symbol instance for adenine in DNA, and another one for cytosine. Symbols can also represent sets of Symbols. For example, n represents any DNA Symbol, and X any protein Symbol. Gap represents the knowledge that there is no Symbol. In addition, some symbols represent ordered lists of other Symbols. For example, the codon agt can be represented by a single Symbol from the Alphabet DNAxDNAxDNA. Symbols can represent ambiguity over these complex symbols too. For example, you could construct a Symbol instance that represents the codon pattern atn, which matches the codons {ata, att, atg, atc}. It is also possible to build a Symbol instance that represents all stop codons {taa, tag, tga}, a set that cannot be expressed as a single ambiguous n-tuple.
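    The claim that the stop codons cannot be written as one ambiguous n-tuple can be checked mechanically. This sketch uses hypothetical helpers expand and isSingleTuple, and supports only the codes a, c, g, t and n. It expands a tuple such as atn into the codons it matches, and tests whether a codon set equals the Cartesian product of the letters seen at each position.

```java
import java.util.*;

public class CodonAmbiguitySketch {

    /** Bases matched by a single-letter code (only the codes this sketch needs). */
    static Set<Character> bases(char code) {
        switch (code) {
            case 'a': case 'c': case 'g': case 't': return Set.of(code);
            case 'n': return Set.of('a', 'c', 'g', 't');
            default: throw new IllegalArgumentException("unsupported code: " + code);
        }
    }

    /** Expands an ambiguous tuple such as "atn" into every codon it matches. */
    static Set<String> expand(String tuple) {
        Set<String> result = new TreeSet<>(List.of(""));
        for (char code : tuple.toCharArray()) {
            Set<String> next = new TreeSet<>();
            for (String prefix : result)
                for (char b : bases(code))
                    next.add(prefix + b);
            result = next;
        }
        return result;
    }

    /**
     * True if a non-empty set of equal-length codons equals the Cartesian product
     * of its per-position letters, i.e. it could be written as one ambiguous tuple.
     */
    static boolean isSingleTuple(Set<String> codons) {
        int len = codons.iterator().next().length();
        List<Set<Character>> columns = new ArrayList<>();
        for (int i = 0; i < len; i++) columns.add(new TreeSet<>());
        for (String c : codons)
            for (int i = 0; i < len; i++) columns.get(i).add(c.charAt(i));
        long product = 1;
        for (Set<Character> col : columns) product *= col.size();
        // The set is always a subset of the product, so equal sizes mean equality.
        return product == codons.size();
    }

    public static void main(String[] args) {
        System.out.println(expand("atn")); // [ata, atc, atg, att]
        Set<String> stops = new TreeSet<>(List.of("taa", "tag", "tga"));
        System.out.println(isSingleTuple(expand("atn"))); // true
        System.out.println(isSingleTuple(stops));         // false
    }
}
```

    For the stop codons the per-position letters are {t}, {a, g} and {a, g}, whose product has four members (it also contains tgg), so no single ambiguous tuple matches exactly {taa, tag, tga}.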

    There are three Symbol interfaces. Symbol is the most generic. It has the methods getToken and getName so that the Symbol can be represented textually. In addition, it defines getMatches, which returns an Alphabet over all the AtomicSymbol instances that match the Symbol (N would return an Alphabet containing {A, G, C, T}, and Gap would return {}).
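    A minimal sketch of this generic contract, assuming a simplified interface: Sym, SimpleSym and the constants below are hypothetical, and getMatches returns a java.util.Set rather than an Alphabet. Atomic nucleotides match only themselves, N matches all four, and the gap matches nothing.

```java
import java.util.*;

public class SymbolContractSketch {

    /** Hypothetical stand-in for the generic Symbol contract. */
    interface Sym {
        char getToken();
        String getName();
        Set<Sym> getMatches();
    }

    /** Immutable implementation; matches == null means "matches only itself". */
    static final class SimpleSym implements Sym {
        private final char token;
        private final String name;
        private final Set<Sym> matches;

        SimpleSym(char token, String name, Set<Sym> matches) {
            this.token = token;
            this.name = name;
            this.matches = matches;
        }

        public char getToken() { return token; }
        public String getName() { return name; }
        public Set<Sym> getMatches() {
            return matches == null ? Set.of(this) : matches;
        }
    }

    static final Sym A = new SimpleSym('a', "adenine", null);
    static final Sym C = new SimpleSym('c', "cytosine", null);
    static final Sym G = new SimpleSym('g', "guanine", null);
    static final Sym T = new SimpleSym('t', "thymine", null);
    // Ambiguity symbol: matches every atomic nucleotide
    static final Sym N = new SimpleSym('n', "any nucleotide", Set.of(A, C, G, T));
    // Gap: the knowledge that there is no symbol, so it matches nothing
    static final Sym GAP = new SimpleSym('-', "gap", Set.of());

    public static void main(String[] args) {
        for (Sym s : List.of(A, N, GAP)) {
            System.out.println(s.getName() + " matches "
                + s.getMatches().size() + " atomic symbol(s)");
        }
    }
}
```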

    BasisSymbol instances can always be represented by an n-tuple of BasisSymbol instances. BasisSymbol adds the method getSymbols so that you can retrieve this list. For example, the tuple [ant] is a BasisSymbol, as it is uniquely specified by the three BasisSymbol instances a, n and t. n is itself a BasisSymbol instance, as it is uniquely represented by itself.

    AtomicSymbol instances specialize BasisSymbol by guaranteeing that getMatches returns an Alphabet containing only that instance. That is, they are indivisible. The DNA nucleotides are instances of AtomicSymbol, as are individual codons. The stop codon {tag} will have a getMatches method that returns {tag}, a getBases method that also returns {tag}, and a getSymbols method that returns the List [t, a, g]. {tna} is a BasisSymbol but not an AtomicSymbol, as it matches four AtomicSymbol instances {taa, tga, tca, tta}. It follows that every symbol returned by getSymbols for an AtomicSymbol instance is itself an AtomicSymbol instance.
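    The relationship between BasisSymbol and AtomicSymbol can be sketched at the codon level. In this hypothetical CodonSym class (not BioJava API), getSymbols returns the component codes, getMatches takes the Cartesian product of what each component matches, and a tuple counts as atomic when that product contains exactly itself. Only the codes a, c, g, t and n are supported.

```java
import java.util.*;

public class BasisAtomicSketch {

    /** What each single-letter code matches (only the codes this sketch needs). */
    static final Map<Character, String> CODES = Map.of(
        'a', "a", 'c', "c", 'g', "g", 't', "t", 'n', "acgt");

    /** An ordered tuple of nucleotide codes, e.g. "tag" or "tna". */
    static final class CodonSym {
        private final String codes;

        CodonSym(String codes) {
            this.codes = codes;
        }

        /** Analogue of getSymbols: the component codes, in order. */
        List<Character> getSymbols() {
            List<Character> out = new ArrayList<>();
            for (char c : codes.toCharArray()) out.add(c);
            return out;
        }

        /** Analogue of getMatches: every concrete codon this tuple stands for. */
        Set<String> getMatches() {
            Set<String> acc = new TreeSet<>(List.of(""));
            for (char code : getSymbols()) {
                Set<String> next = new TreeSet<>();
                for (String prefix : acc)
                    for (char b : CODES.get(code).toCharArray())
                        next.add(prefix + b);
                acc = next;
            }
            return acc;
        }

        /** Atomic: matches exactly one codon, namely itself. */
        boolean isAtomic() {
            Set<String> m = getMatches();
            return m.size() == 1 && m.contains(codes);
        }
    }

    public static void main(String[] args) {
        CodonSym tag = new CodonSym("tag");
        CodonSym tna = new CodonSym("tna");
        System.out.println(tag.getSymbols() + " " + tag.getMatches() + " atomic=" + tag.isAtomic());
        System.out.println(tna.getSymbols() + " " + tna.getMatches() + " atomic=" + tna.isAtomic());
    }
}
```

    This reproduces the examples above: tag has components [t, a, g] and matches only {tag}, whereas tna matches {taa, tca, tga, tta} and so is a basis symbol but not an atomic one.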

    Author:
    Matthew Pocock
    • Method Detail

      • getName

        String getName()
        The long name for the symbol.
        Returns:
        the long name
      • getMatches

        Alphabet getMatches()
        The alphabet containing the symbols matched by this ambiguity symbol.

        This alphabet contains all of, and only, the symbols matched by this symbol. For example, for the symbol representing the DNA ambiguity code W, it would contain the symbols for A and T from the DNA alphabet.

        Returns:
        the Alphabet of symbols matched by this symbol
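        A sketch of what getMatches computes for the single-letter DNA ambiguity codes, using the standard IUPAC nucleotide table. The class and method are hypothetical and return a sorted set of characters instead of an Alphabet.

```java
import java.util.*;

public class IupacMatchesSketch {

    /** Standard IUPAC DNA codes mapped to the nucleotides they match. */
    private static final Map<Character, String> IUPAC = new HashMap<>();
    static {
        IUPAC.put('a', "a"); IUPAC.put('c', "c"); IUPAC.put('g', "g"); IUPAC.put('t', "t");
        IUPAC.put('r', "ag"); IUPAC.put('y', "ct"); IUPAC.put('s', "cg"); IUPAC.put('w', "at");
        IUPAC.put('k', "gt"); IUPAC.put('m', "ac");
        IUPAC.put('b', "cgt"); IUPAC.put('d', "agt"); IUPAC.put('h', "act"); IUPAC.put('v', "acg");
        IUPAC.put('n', "acgt");
    }

    /** Returns, in sorted order, the atomic nucleotides matched by an IUPAC code. */
    static SortedSet<Character> getMatches(char code) {
        String bases = IUPAC.get(Character.toLowerCase(code));
        if (bases == null) {
            throw new IllegalArgumentException("not an IUPAC DNA code: " + code);
        }
        SortedSet<Character> out = new TreeSet<>();
        for (char b : bases.toCharArray()) out.add(b);
        return out;
    }

    public static void main(String[] args) {
        System.out.println("W -> " + getMatches('W')); // W -> [a, t]
        System.out.println("N -> " + getMatches('N')); // N -> [a, c, g, t]
    }
}
```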