public interface Symbol extends Annotatable
A Symbol is the atomic unit of a SymbolList, or a sequence. It allows for fine-grained fly-weighting, so that a single instance of each symbol can be referenced many times.
Symbols from finite alphabets can be compared using the == operator. Symbols from infinite alphabets may have some specific API to test for equality, but should really override the equals() method.
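The fly-weighting idea can be sketched with plain JDK classes. This is a toy model under stated assumptions, not the BioJava implementation: `FlyweightSketch`, `ToySymbol` and `forName` are hypothetical names introduced only for illustration. It shows why == is sufficient when every symbol of a finite alphabet has exactly one shared instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of fly-weighted symbols; all names here are hypothetical. */
final class FlyweightSketch {

    /** Hypothetical stand-in for a symbol from a finite alphabet. */
    static final class ToySymbol {
        private final String name;

        // Private constructor: instances are only created through the pool.
        private ToySymbol(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Holds the single shared instance for each symbol name.
    private static final Map<String, ToySymbol> POOL = new LinkedHashMap<>();

    /** Returns the one shared instance for the given name, creating it on first use. */
    static synchronized ToySymbol forName(String name) {
        return POOL.computeIfAbsent(name, ToySymbol::new);
    }

    public static void main(String[] args) {
        ToySymbol first = forName("adenine");
        ToySymbol second = forName("adenine");
        ToySymbol other = forName("cytosine");

        System.out.println(first == second); // true: both refer to the shared instance
        System.out.println(first == other);  // false: different symbols
    }
}
```

Because the constructor is private and every lookup goes through the pool, reference equality and logical equality coincide in this sketch. An infinite alphabet cannot pre-register every symbol this way, which is why overriding equals() is the safer contract there.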
Some symbols represent a single token in the sequence. For example, there is a Symbol instance for adenine in DNA, and another for cytosine. Symbols can potentially represent sets of Symbols. For example, n represents any DNA Symbol, and X any protein Symbol. Gap represents the knowledge that there is no Symbol. In addition, some symbols represent ordered lists of other Symbols. For example, the codon agt can be represented by a single Symbol from the Alphabet DNAxDNAxDNA. Symbols can also represent ambiguity over these complex symbols. For example, you could construct a Symbol instance that represents the codon pattern atn, which matches the codons {ata, att, atg, atc}. It is also possible to build a Symbol instance that represents all stop codons {taa, tag, tga}, which cannot be represented as a single ambiguous n-tuple.
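The codon examples above can be worked through with a small, self-contained sketch. It does not use the BioJava API: `CodonAmbiguitySketch`, `matchesOf` and `expand` are hypothetical names, and the lookup covers only a, c, g, t, n plus the IUPAC code r (a or g), which is an addition not mentioned in the text. The sketch expands an ambiguous tuple into the atomic tuples it matches, and shows that the tightest tuple covering the stop codons also matches tgg.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy expansion of ambiguous tuples; names and lookup table are hypothetical. */
final class CodonAmbiguitySketch {

    /** Returns the atomic nucleotides matched by a single token. */
    static String matchesOf(char token) {
        switch (token) {
            case 'a': return "a";
            case 'c': return "c";
            case 'g': return "g";
            case 't': return "t";
            case 'r': return "ag";   // IUPAC purine code, assumed for this sketch
            case 'n': return "acgt"; // any nucleotide
            default:
                throw new IllegalArgumentException("Unknown token: " + token);
        }
    }

    /** Expands a tuple such as "atn" into every atomic tuple it matches. */
    static Set<String> expand(String tuple) {
        List<String> partial = new ArrayList<>();
        partial.add("");
        for (char token : tuple.toCharArray()) {
            List<String> next = new ArrayList<>();
            // Cross product: extend every prefix with every choice at this position.
            for (String prefix : partial) {
                for (char choice : matchesOf(token).toCharArray()) {
                    next.add(prefix + choice);
                }
            }
            partial = next;
        }
        return new TreeSet<>(partial); // sorted for stable printing
    }

    public static void main(String[] args) {
        System.out.println(expand("atn")); // [ata, atc, atg, att]
        // The tightest per-position tuple over {taa, tag, tga} is t, {a,g}, {a,g}:
        System.out.println(expand("trr")); // [taa, tag, tga, tgg]
    }
}
```

A tuple always matches the full cross product of its positions, so any tuple that covers taa, tag and tga also covers tgg. That is why the stop codons need a Symbol defined by an explicit set rather than by one ambiguous n-tuple.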
There are three Symbol interfaces. Symbol is the most generic. It has the methods getToken and getName so that the Symbol can be textually represented. In addition, it defines getMatches that returns an Alphabet over all the AtomicSymbol instances that match the Symbol (N would return an Alphabet containing {A, G, C, T}, and Gap would return {}).
BasisSymbol instances can always be represented by an n-tuple of BasisSymbol instances. The BasisSymbol interface adds the method getSymbols so that you can retrieve this list. For example, the tuple [ant] is a BasisSymbol, as it is uniquely specified by the three BasisSymbol instances a, n and t. n is itself a BasisSymbol instance, as it is uniquely represented by itself.
AtomicSymbol instances specialize BasisSymbol by guaranteeing that getMatches returns a set containing only that instance. That is, they are indivisible. The DNA nucleotides are instances of AtomicSymbol, as are individual codons. The stop codon {tag} will have a getMatches method that returns {tag}, a getBases method that also returns {tag} and a getSymbols method that returns the List [t, a, g]. {tna} is a BasisSymbol but not an AtomicSymbol, as it matches four AtomicSymbol instances {taa, tga, tca, tta}. It follows that each symbol returned by getSymbols for an AtomicSymbol instance will also be an AtomicSymbol instance.
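The distinction between a basis symbol and an atomic symbol can be illustrated with a minimal sketch. It is a toy model, not the BioJava interfaces: `SymbolKindsSketch`, `ToyBasisSymbol`, `matchCount` and `isAtomic` are hypothetical, and only the tokens a, c, g, t and n are assumed. A tuple is treated as atomic exactly when it matches one atomic tuple, which holds only if every component is itself atomic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of basis versus atomic tuples; all names are hypothetical. */
final class SymbolKindsSketch {

    /** Hypothetical basis symbol: an ordered tuple of component tokens. */
    static final class ToyBasisSymbol {
        private final List<Character> components;

        ToyBasisSymbol(String tuple) {
            List<Character> list = new ArrayList<>();
            for (char c : tuple.toCharArray()) {
                if ("acgtn".indexOf(c) < 0) {
                    throw new IllegalArgumentException("Unknown token: " + c);
                }
                list.add(c);
            }
            components = Collections.unmodifiableList(list);
        }

        /** The ordered components, analogous in spirit to getSymbols. */
        List<Character> getSymbols() {
            return components;
        }

        /** Number of atomic tuples matched: the product of per-position choices. */
        int matchCount() {
            int count = 1;
            for (char c : components) {
                count *= (c == 'n') ? 4 : 1;
            }
            return count;
        }

        /** Atomic means the tuple matches only itself. */
        boolean isAtomic() {
            return matchCount() == 1;
        }
    }

    public static void main(String[] args) {
        ToyBasisSymbol stop = new ToyBasisSymbol("tag");
        ToyBasisSymbol ambiguous = new ToyBasisSymbol("tna");

        System.out.println(stop.getSymbols() + " atomic=" + stop.isAtomic());
        // [t, a, g] atomic=true
        System.out.println(ambiguous.matchCount() + " atomic=" + ambiguous.isAtomic());
        // 4 atomic=false
    }
}
```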
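Nested classes/interfaces inherited from interface Annotatable: Annotatable.AnnotationForwarder
Fields inherited from interface Annotatable: ANNOTATION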
Modifier and Type | Method and Description |
---|---|
Alphabet | getMatches(): The alphabet containing the symbols matched by this ambiguity symbol. |
String | getName(): The long name for the symbol. |
Methods inherited from interface Annotatable: getAnnotation
Methods inherited from interface Changeable: addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
Alphabet getMatches()
This alphabet contains all of, and only, the symbols matched by this symbol. For example, the alphabet returned for the DNA ambiguity code W would contain the symbols for A and T from the DNA alphabet.
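The contract of getMatches can be sketched with ordinary Java sets. This is only an illustration under stated assumptions: `MatchesSketch` and its static `getMatches(char)` are hypothetical and model the returned Alphabet as a `Set<Character>`, and the use of '-' as the gap token is an assumption made for this sketch.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of getMatches over DNA codes; names are hypothetical. */
final class MatchesSketch {

    /** Returns the atomic DNA letters matched by a code, as an unmodifiable sorted set. */
    static Set<Character> getMatches(char code) {
        char upper = Character.toUpperCase(code);
        switch (upper) {
            case 'A':
            case 'C':
            case 'G':
            case 'T':
                return setOf(String.valueOf(upper)); // atomic: matches only itself
            case 'W':
                return setOf("AT");                  // ambiguity code for A or T
            case 'N':
                return setOf("ACGT");                // any nucleotide
            case '-':
                return setOf("");                    // gap (assumed token): matches nothing
            default:
                throw new IllegalArgumentException("Unknown code: " + code);
        }
    }

    private static Set<Character> setOf(String letters) {
        Set<Character> set = new TreeSet<>();
        for (char c : letters.toCharArray()) {
            set.add(c);
        }
        return Collections.unmodifiableSet(set);
    }

    public static void main(String[] args) {
        System.out.println(getMatches('W')); // [A, T]
        System.out.println(getMatches('N')); // [A, C, G, T]
        System.out.println(getMatches('-')); // []
    }
}
```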
Copyright © 2020 BioJava. All rights reserved.