/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.symbol;

import org.biojava.bio.Annotatable;

/**
 * A single symbol.
 * <p>
 * This is the atomic unit of a SymbolList, or a sequence. It allows
 * for fine-grained fly-weighting, so that there can be a single instance
 * of each symbol that is referenced many times.
 * <p>
 * Symbols from finite alphabets can be compared for identity with the ==
 * operator. Symbols from infinite alphabets may provide a specific API for
 * testing equality, but should really override the equals() method (and,
 * consistently, hashCode()).
 * <p>
 * Some symbols represent a single token in the sequence. For example, there is
 * a Symbol instance for adenine in DNA, and another one for cytosine.
 * Symbols can also represent sets of Symbols. For example, n represents
 * any DNA Symbol, and X any protein Symbol. Gap represents the knowledge that
 * there is no Symbol. In addition, some symbols represent ordered lists of
 * other Symbols. For example, the codon agt can be represented by a single
 * Symbol from the Alphabet DNAxDNAxDNA. Symbols can represent ambiguity over
 * these complex symbols. For example, you could construct a Symbol instance
 * that represents the codons atn. This matches the codons {ata, att, atg, atc}.
 * It is also possible to build a Symbol instance that represents all stop
 * codons {taa, tag, tga}, which cannot be represented as a single
 * ambiguous n-tuple.
 * <p>
 * There are three Symbol interfaces. Symbol is the most generic. It has the
 * method getName so that the Symbol can be textually represented.
 * In addition, it defines getMatches, which returns an Alphabet over all the
 * AtomicSymbol instances that match the Symbol (N would return an Alphabet
 * containing {A, G, C, T}, and Gap would return {}).
 * <p>
 * BasisSymbol instances can always be represented by an n-tuple of BasisSymbol
 * instances. BasisSymbol adds the method getSymbols so that you can retrieve
 * this list. For example, the tuple [ant] is a BasisSymbol, as it is uniquely
 * specified by the three BasisSymbol instances a, n and t. n is a BasisSymbol
 * instance as it is uniquely represented by itself.
 * <p>
 * AtomicSymbol instances specialize BasisSymbol by guaranteeing that getMatches
 * returns a set containing only that instance. That is, they are indivisible.
 * The DNA nucleotides are instances of AtomicSymbol, as are individual codons.
 * The stop codon {tag} will have a getMatches method that returns {tag},
 * a getBases method that also returns {tag} and a getSymbols method that returns
 * the List [t, a, g]. {tna} is a BasisSymbol but not an AtomicSymbol as it
 * matches four AtomicSymbol instances {taa, tga, tca, tta}. It follows that
 * every symbol returned by getSymbols for an AtomicSymbol instance is itself
 * an AtomicSymbol instance.
 *
 * @author Matthew Pocock
 */
public interface Symbol extends Annotatable {
  /**
   * The long name for the symbol.
   *
   * @return the long name
   */
  String getName();

  /**
   * The alphabet containing the symbols matched by this ambiguity symbol.
   * <p>
   * This alphabet contains all of, and only, the symbols matched by this
   * symbol. For example, the alphabet returned for the DNA ambiguity code W
   * would contain the symbols for A and T from the DNA alphabet.
   *
   * @return the Alphabet of symbols matched by this
   *         symbol
   */
  Alphabet getMatches();
}
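The identity rules in the class comment (== for symbols from finite alphabets, equals() for symbols from infinite alphabets) can be illustrated with a small standalone sketch. It does not use BioJava: `Nucleotide`, `IntegerSymbol` and `SymbolIdentityDemo` are hypothetical names chosen for illustration. A Java enum stands in for a fly-weighted finite alphabet, since each constant exists exactly once, and `IntegerSymbol` stands in for a symbol from an unbounded alphabet whose instances are created on demand.

```java
// Hypothetical stand-in for a finite, fly-weighted alphabet: each enum
// constant is a singleton, so reference equality (==) identifies a symbol.
enum Nucleotide { A, C, G, T }

// Hypothetical stand-in for a symbol from an infinite alphabet (here, any
// long value). Instances are created on demand, so two equal symbols may be
// distinct objects; equals() and hashCode() therefore compare by value.
final class IntegerSymbol {
    private final long value;

    IntegerSymbol(long value) {
        this.value = value;
    }

    long value() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof IntegerSymbol)) {
            return false;
        }
        return value == ((IntegerSymbol) o).value;
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal values give equal hash codes.
        return Long.hashCode(value);
    }
}

class SymbolIdentityDemo {
    public static void main(String[] args) {
        Nucleotide looked = Nucleotide.valueOf("A");
        System.out.println(looked == Nucleotide.A);   // true: one instance per symbol

        IntegerSymbol p = new IntegerSymbol(42);
        IntegerSymbol q = new IntegerSymbol(42);
        System.out.println(p == q);                   // false: two separate objects
        System.out.println(p.equals(q));              // true: equal by value
    }
}
```

The design point is that == is only safe when the alphabet guarantees one instance per symbol; anything that can mint new instances needs a value-based equals() and a matching hashCode() so the symbols behave correctly in sets and maps.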
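The ambiguity rules described above (W matching A and T, atn matching four codons, and the stop codons needing a Symbol of their own) can also be sketched without BioJava. This is a toy model under a simplifying assumption: every symbol is reduced to the set of atomic nucleotides or codons it matches, roughly what getMatches returns. `AmbiguityDemo`, `MATCHES`, `tupleMatches` and `isSingleTuple` are hypothetical names, and only a handful of IUPAC codes are included.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not BioJava's API: each one-letter code maps to the
// set of atomic nucleotides it matches (analogous to getMatches).
class AmbiguityDemo {
    enum Dna { A, C, G, T }

    static final Map<Character, Set<Dna>> MATCHES = new LinkedHashMap<>();
    static {
        MATCHES.put('a', EnumSet.of(Dna.A));              // atomic
        MATCHES.put('c', EnumSet.of(Dna.C));              // atomic
        MATCHES.put('g', EnumSet.of(Dna.G));              // atomic
        MATCHES.put('t', EnumSet.of(Dna.T));              // atomic
        MATCHES.put('w', EnumSet.of(Dna.A, Dna.T));       // W = A or T
        MATCHES.put('n', EnumSet.allOf(Dna.class));       // N = any base
        MATCHES.put('-', EnumSet.noneOf(Dna.class));      // gap matches nothing
    }

    // Matches of an ambiguous tuple such as "atn": the Cartesian product of
    // the per-position match sets, returned as sorted lower-case strings.
    static Set<String> tupleMatches(String tuple) {
        Set<String> result = new TreeSet<>();
        result.add("");
        for (char c : tuple.toCharArray()) {
            Set<Dna> matches = MATCHES.get(c);
            if (matches == null) {
                throw new IllegalArgumentException("unknown symbol: " + c);
            }
            Set<String> next = new TreeSet<>();
            for (String prefix : result) {
                for (Dna base : matches) {
                    next.add(prefix + base.name().toLowerCase());
                }
            }
            result = next;
        }
        return result;
    }

    // True if a non-empty set of equal-length codons is exactly the product
    // of its per-position letter sets, i.e. one ambiguous tuple covers it.
    static boolean isSingleTuple(Set<String> codons) {
        if (codons.isEmpty()) {
            throw new IllegalArgumentException("need at least one codon");
        }
        int length = codons.iterator().next().length();
        long productSize = 1;
        for (int i = 0; i < length; i++) {
            Set<Character> column = new TreeSet<>();
            for (String codon : codons) {
                column.add(codon.charAt(i));
            }
            productSize *= column.size();
        }
        // The codons are always a subset of that product, so equal sizes
        // imply equal sets.
        return productSize == codons.size();
    }

    public static void main(String[] args) {
        System.out.println(MATCHES.get('w'));            // [A, T]
        System.out.println(tupleMatches("atn"));         // [ata, atc, atg, att]
        System.out.println(tupleMatches("tna"));         // [taa, tca, tga, tta]
        Set<String> stops = new TreeSet<>(Arrays.asList("taa", "tag", "tga"));
        System.out.println(isSingleTuple(tupleMatches("atn")));  // true
        System.out.println(isSingleTuple(stops));                // false
    }
}
```

For the stop codons the per-position sets are {t}, {a, g} and {a, g}, giving 1 × 2 × 2 = 4 candidate codons (including tgg), so no single ambiguous tuple covers exactly {taa, tag, tga}.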