Class AlphabetManager
- java.lang.Object
-
- org.biojava.bio.symbol.AlphabetManager
-
public final class AlphabetManager extends Object
Utility methods for working with Alphabets. Also acts as a registry for well-known alphabets.
The alphabet interfaces themselves don't give you a lot of help in actually getting an alphabet instance. This is where the AlphabetManager comes in handy. It helps out in serialization, generating derived alphabets and building CrossProductAlphabet instances. It also contains limited support for parsing complex alphabet names back into the alphabets.
- Author:
- Matthew Pocock, Thomas Down, Mark Schreiber, George Waldon (alternate tokenization)
-
-
Constructor Summary
Constructor
AlphabetManager()
-
Method Summary
Modifier and Type, Method, Description
static Alphabet alphabetForName(String name)
Retrieve the alphabet for a specific name.
static Iterator alphabets()
Get an iterator over all known alphabets.
static AtomicSymbol createSymbol(char token, String name, Annotation annotation)
Deprecated. Use the two-arg version of this method instead.
static Symbol createSymbol(char token, Annotation annotation, List symList, Alphabet alpha)
Deprecated. Use the new version, without the token argument.
static Symbol createSymbol(char token, Annotation annotation, Set symSet, Alphabet alpha)
Deprecated. Use the three-arg version of this method instead.
static AtomicSymbol createSymbol(String name)
Generate a new AtomicSymbol instance with a name and an empty Annotation.
static AtomicSymbol createSymbol(String name, Annotation annotation)
Generate a new AtomicSymbol instance with a name and Annotation.
static Symbol createSymbol(Annotation annotation, List symList, Alphabet alpha)
Generates a new Symbol instance that represents the tuple of Symbols in symList.
static Symbol createSymbol(Annotation annotation, Set symSet, Alphabet alpha)
Generates a new Symbol instance that represents the set of Symbols in symSet.
static List factorize(Alphabet alpha, Set symSet)
Return a list of BasisSymbol instances that uniquely sum up all AtomicSymbol instances in symSet.
static Alphabet generateCrossProductAlphaFromName(String name)
Generates a new CrossProductAlphabet from the given name.
static Symbol getAllAmbiguitySymbol(FiniteAlphabet alpha)
Return the ambiguity symbol which matches all symbols in a given alphabet.
static Set getAllSymbols(FiniteAlphabet alpha)
Return a set containing all possible symbols which can be considered members of a given alphabet, including ambiguous symbols.
static AlphabetIndex getAlphabetIndex(FiniteAlphabet alpha)
Get an indexer for a specified alphabet.
static AlphabetIndex getAlphabetIndex(Symbol[] syms)
Get an indexer for an array of symbols.
static Alphabet getCrossProductAlphabet(List aList)
Retrieve a CrossProductAlphabet instance over the alphabets in aList.
static Alphabet getCrossProductAlphabet(List aList, String name)
Attempts to create a cross product alphabet and register it under a name.
static Alphabet getCrossProductAlphabet(List aList, Alphabet parent)
Retrieve a CrossProductAlphabet instance over the alphabets in aList.
static Symbol getGapSymbol()
Get the special 'gap' Symbol.
static Symbol getGapSymbol(List alphas)
Get the gap symbol appropriate to this list of alphabets.
static AlphabetManager instance()
Deprecated. All AlphabetManager methods have become static.
static void loadAlphabets(InputSource is)
Load additional Alphabets, defined in XML format, into the AlphabetManager's registry.
static void registerAlphabet(String[] names, Alphabet alphabet)
Register an Alphabet by more than one name.
static void registerAlphabet(String name, Alphabet alphabet)
Register an alphabet by name.
static boolean registered(String name)
Has an Alphabet been registered by that name?
static Set registrations()
A set of names under which Alphabets have been registered.
static Symbol symbolForLifeScienceID(LifeScienceIdentifier lsid)
Retrieves the Symbol for the LSID.
static Symbol symbolForName(String name)
Deprecated. Use symbolForLifeScienceID() instead.
-
-
-
Constructor Detail
-
AlphabetManager
public AlphabetManager()
-
-
Method Detail
-
instance
public static AlphabetManager instance()
Deprecated. All AlphabetManager methods have become static.
Retrieve the singleton instance.
- Returns:
- the AlphabetManager instance
-
getAllAmbiguitySymbol
public static Symbol getAllAmbiguitySymbol(FiniteAlphabet alpha)
Return the ambiguity symbol which matches all symbols in a given alphabet.
- Parameters:
alpha
- The alphabet
- Returns:
- the ambiguity symbol
- Since:
- 1.2
-
getAllSymbols
public static Set getAllSymbols(FiniteAlphabet alpha)
Return a set containing all possible symbols which can be considered members of a given alphabet, including ambiguous symbols. Warning: this method can return large sets!
- Parameters:
alpha
- The alphabet
- Returns:
- The set of symbols that are members of alpha
- Since:
- 1.2
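To give a feel for why the returned set can be large, here is a stand-alone sketch that does not use BioJava at all. It assumes, purely for illustration, that each symbol or ambiguity symbol can be modeled as a non-empty subset of the alphabet's atomic symbols; the class name `AllSymbolsSketch` and the use of `String` in place of `Symbol` are hypothetical. Under that assumption an alphabet with n atomic symbols yields 2^n - 1 candidates, so growth is exponential.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone model only; this is not BioJava's implementation. */
final class AllSymbolsSketch {

    /**
     * Enumerate every non-empty subset of the given atomic symbols.
     * Each subset stands in for one (possibly ambiguous) symbol.
     * Assumes fewer than 31 atoms so the bit mask fits in an int.
     */
    static Set<Set<String>> allSymbols(List<String> atoms) {
        Set<Set<String>> result = new LinkedHashSet<>();
        int n = atoms.size();
        for (int mask = 1; mask < (1 << n); mask++) {
            Set<String> subset = new LinkedHashSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(atoms.get(i));
                }
            }
            result.add(subset);
        }
        return result;
    }
}
```

With four atomic symbols this model produces 15 subsets; with twenty it would produce more than a million, which is the practical reason for the warning above.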
-
alphabetForName
public static Alphabet alphabetForName(String name) throws NoSuchElementException
Retrieve the alphabet for a specific name.
- Parameters:
name
- the name of the alphabet
- Returns:
- the alphabet object
- Throws:
NoSuchElementException
- if there is no alphabet by that name
-
symbolForName
public static Symbol symbolForName(String name) throws NoSuchElementException
Deprecated. Use symbolForLifeScienceID() instead.
Retrieve the symbol represented by a String object.
- Parameters:
name
- the string whose symbol you want to get
- Returns:
- The Symbol
- Throws:
NoSuchElementException
- if the string name is invalid.
-
symbolForLifeScienceID
public static Symbol symbolForLifeScienceID(LifeScienceIdentifier lsid)
Retrieves the Symbol for the LSID.
- Parameters:
lsid
- the URN for the Symbol
- Returns:
- a reference to the Symbol
-
registerAlphabet
public static void registerAlphabet(String name, Alphabet alphabet)
Register an alphabet by name.
- Parameters:
name
- the name by which it can be retrieved
alphabet
- the Alphabet to store
-
registerAlphabet
public static void registerAlphabet(String[] names, Alphabet alphabet)
Register an Alphabet by more than one name. This allows aliasing of an alphabet with two or more names. It is equivalent to calling registerAlphabet(String name, Alphabet alphabet) several times.
- Parameters:
names
- the names by which it can be retrieved
alphabet
- the Alphabet to store
- Since:
- 1.4
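The aliasing behaviour described here can be pictured with a small stand-alone registry. This is only a sketch of the idea and not BioJava's code: the class `RegistrySketch`, its method names, and the use of plain `Object` in place of `Alphabet` are all hypothetical. It shows the multi-name overload delegating to the single-name one, and a lookup that fails with `NoSuchElementException` for unknown names, as `alphabetForName` is documented to do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Stand-alone model of a name-keyed registry with aliasing; not BioJava code. */
final class RegistrySketch {
    // Hypothetical backing store: name -> registered object.
    private static final Map<String, Object> BY_NAME = new LinkedHashMap<>();

    static void register(String name, Object alphabet) {
        BY_NAME.put(name, alphabet);
    }

    // Registering under several names is repeated single-name registration.
    static void register(String[] names, Object alphabet) {
        for (String name : names) {
            register(name, alphabet);
        }
    }

    static boolean registered(String name) {
        return BY_NAME.containsKey(name);
    }

    static Set<String> registrations() {
        return BY_NAME.keySet();
    }

    static Object forName(String name) {
        Object found = BY_NAME.get(name);
        if (found == null) {
            throw new NoSuchElementException("No alphabet registered as: " + name);
        }
        return found;
    }

    public static void main(String[] args) {
        Object dna = new Object(); // placeholder for an Alphabet instance
        register(new String[] {"DNA", "dna-alias"}, dna);
        // Both names resolve to the very same instance.
        System.out.println(forName("DNA") == forName("dna-alias"));
    }
}
```

Because every alias maps to the same instance, identity comparisons between lookups under different names succeed in this model.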
-
registrations
public static Set registrations()
A set of names under which Alphabets have been registered.
- Returns:
- a Set of Strings
-
registered
public static boolean registered(String name)
Has an Alphabet been registered by that name?
- Parameters:
name
- the name of the alphabet
- Returns:
- true if it has or false otherwise
-
alphabets
public static Iterator alphabets()
Get an iterator over all known alphabets.
- Returns:
- an Iterator over Alphabet objects
-
getGapSymbol
public static Symbol getGapSymbol()
Get the special 'gap' Symbol.
The gap symbol is a Symbol that has an empty alphabet of matches. As such, every alphabet contains gap: since no symbol matches gap, there is no case where an alphabet fails to contain a symbol that matches gap.
Gap can be thought of as an empty sub-space within the space of all possible symbols. If you are working in a cross-product alphabet, you should choose whether to use gap to represent 'no symbol', or a basis symbol of the appropriate size built entirely of gaps to represent 'no symbol in each of the slots'. Perhaps this could be explained better.
- Returns:
- the system-wide symbol that represents a gap
-
getGapSymbol
public static Symbol getGapSymbol(List alphas)
Get the gap symbol appropriate to this list of alphabets.
The gap symbol will have the same shape as the alphabet list. It will be as long as the list, and if any of the alphabets in the list has a dimension greater than 1, the appropriate gap will be inserted there as well.
- Parameters:
alphas
- List of alphabets
- Returns:
- the appropriate gap symbol for the alphabet list
-
createSymbol
public static AtomicSymbol createSymbol(String name, Annotation annotation)
Generate a new AtomicSymbol instance with a name and Annotation.
Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.
- Parameters:
name
- the String returned by getName()
annotation
- the Annotation returned by getAnnotation()
- Returns:
- a new AtomicSymbol instance
-
createSymbol
public static AtomicSymbol createSymbol(String name)
Generate a new AtomicSymbol instance with a name and an Empty Annotation.
Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.
- Parameters:
name
- the String returned by getName()
- Returns:
- a new AtomicSymbol instance
-
createSymbol
public static AtomicSymbol createSymbol(char token, String name, Annotation annotation)
Deprecated. Use the two-arg version of this method instead.
Generate a new AtomicSymbol instance with a token, name and Annotation.
Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.
- Parameters:
token
- the char token returned by getToken() (ignored as of BioJava 1.2)
name
- the String returned by getName()
annotation
- the Annotation returned by getAnnotation()
- Returns:
- a new AtomicSymbol instance
-
createSymbol
public static Symbol createSymbol(char token, Annotation annotation, List symList, Alphabet alpha) throws IllegalSymbolException
Deprecated. Use the new version, without the token argument.
Generates a new Symbol instance that represents the tuple of Symbols in symList.
This method is most useful for writing Alphabet implementations. It should not be invoked by casual users. Use alphabet.getSymbol(List) instead.
- Parameters:
annotation
- The annotation bundle for the symbol
token
- the Symbol's token [ignored since 1.2]
symList
- a list of Symbol objects
alpha
- the Alphabet that this Symbol will reside in
- Returns:
- a Symbol that encapsulates that List
- Throws:
IllegalSymbolException
- If the Symbol cannot be made
-
createSymbol
public static Symbol createSymbol(Annotation annotation, List symList, Alphabet alpha) throws IllegalSymbolException
Generates a new Symbol instance that represents the tuple of Symbols in symList. This will attempt to return the same symbol for the same list.
This method is most useful for writing Alphabet implementations. It should not be invoked by casual users. Use alphabet.getSymbol(List) instead.
- Parameters:
annotation
- The annotation bundle for the Symbol
symList
- a list of Symbol objects
alpha
- the Alphabet that this Symbol will reside in
- Returns:
- a Symbol that encapsulates that List
- Throws:
IllegalSymbolException
- If the Symbol cannot be made
-
createSymbol
public static Symbol createSymbol(char token, Annotation annotation, Set symSet, Alphabet alpha) throws IllegalSymbolException
Deprecated. Use the three-arg version of this method instead.
Generates a new Symbol instance that represents the set of Symbols in symSet.
This method is most useful for writing Alphabet implementations. It should not be invoked by users. Use alphabet.getSymbol(Set) instead.
- Parameters:
token
- the Symbol's token [ignored since 1.2]
annotation
- the Symbol's Annotation
symSet
- a Set of Symbol objects
alpha
- the Alphabet that this Symbol will reside in
- Returns:
- a Symbol that encapsulates that Set
- Throws:
IllegalSymbolException
- If the Symbol cannot be made
-
createSymbol
public static Symbol createSymbol(Annotation annotation, Set symSet, Alphabet alpha) throws IllegalSymbolException
Generates a new Symbol instance that represents the set of Symbols in symSet.
This method is most useful for writing Alphabet implementations. It should not be invoked by users. Use alphabet.getSymbol(Set) instead.
- Parameters:
annotation
- the Symbol's Annotation
symSet
- a Set of Symbol objects
alpha
- the Alphabet that this Symbol will reside in
- Returns:
- a Symbol that encapsulates that Set
- Throws:
IllegalSymbolException
- If the Symbol cannot be made
-
generateCrossProductAlphaFromName
public static Alphabet generateCrossProductAlphaFromName(String name)
Generates a new CrossProductAlphabet from the given name.
- Parameters:
name
- the name to parse
- Returns:
- the associated Alphabet
-
getCrossProductAlphabet
public static Alphabet getCrossProductAlphabet(List aList)
Retrieve a CrossProductAlphabet instance over the alphabets in aList.
If all of the alphabets in aList implement FiniteAlphabet then the method will return a FiniteAlphabet. Otherwise, it returns a non-finite alphabet.
If you call this method twice with a list containing the same alphabets, it will return the same alphabet. This promotes the re-use of alphabets and helps to maintain the 'flyweight' principle for finite alphabet symbols.
The resulting alphabet cpa will be retrievable via AlphabetManager.alphabetForName(cpa.getName()).
- Parameters:
aList
- a list of Alphabet objects
- Returns:
- a CrossProductAlphabet that is over the alphabets in aList
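The "same list in, same alphabet out" guarantee is a flyweight-style cache. The following stand-alone sketch illustrates that pattern only; it is not how BioJava implements it. The class `CrossProductCacheSketch`, the nested `Product` type, the use of component names instead of `Alphabet` objects, and the "(A x B)" name format are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone flyweight cache model; not BioJava code. */
final class CrossProductCacheSketch {

    /** Hypothetical stand-in for a cross-product alphabet. */
    static final class Product {
        final List<String> components;

        Product(List<String> components) {
            this.components = components;
        }

        // Assumed naming scheme, used here only for illustration.
        String getName() {
            return "(" + String.join(" x ", components) + ")";
        }
    }

    // One shared instance per distinct, ordered component list.
    private static final Map<List<String>, Product> CACHE = new HashMap<>();

    static Product getCrossProduct(List<String> componentNames) {
        // Defensive, unmodifiable copy so later changes by the caller
        // cannot corrupt the cache key.
        List<String> key =
            Collections.unmodifiableList(new ArrayList<>(componentNames));
        return CACHE.computeIfAbsent(key, Product::new);
    }
}
```

Two calls with equal lists return the identical `Product` instance, while a list with the same components in a different order is treated as a different product in this model.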
-
getCrossProductAlphabet
public static Alphabet getCrossProductAlphabet(List aList, String name) throws IllegalAlphabetException
Attempts to create a cross product alphabet and register it under a name.
- Parameters:
aList
- A list of alphabets
name
- The name under which the new alphabet will be registered.
- Returns:
- The CrossProductAlphabet
- Throws:
IllegalAlphabetException
- If the Alphabet cannot be made or a different alphabet is already registered under this name.
-
getCrossProductAlphabet
public static Alphabet getCrossProductAlphabet(List aList, Alphabet parent)
Retrieve a CrossProductAlphabet instance over the alphabets in aList.
This method is most useful for implementors of cross-product alphabets, allowing them to safely build the matches alphabets for ambiguity symbols.
If all of the alphabets in aList implement FiniteAlphabet then the method will return a FiniteAlphabet. Otherwise, it returns a non-finite alphabet.
If you call this method twice with a list containing the same alphabets, it will return the same alphabet. This promotes the re-use of alphabets and helps to maintain the 'flyweight' principle for finite alphabet symbols.
The resulting alphabet cpa will be retrievable via AlphabetManager.alphabetForName(cpa.getName()).
- Parameters:
aList
- a list of Alphabet objects
parent
- a parent alphabet
- Returns:
- a CrossProductAlphabet that is over the alphabets in aList
-
factorize
public static List factorize(Alphabet alpha, Set symSet) throws IllegalSymbolException
Return a list of BasisSymbol instances that uniquely sum up all AtomicSymbol instances in symSet. If the symbol can't be represented by a single list of BasisSymbol instances, return null.
This method is most useful for implementers of Alphabet and Symbol. It probably should not be invoked by users.
- Parameters:
symSet
- the Set of AtomicSymbol instances
alpha
- the Alphabet instance that the Symbols are from
- Returns:
- a List of BasisSymbols
- Throws:
IllegalSymbolException
- In practice it should not. If it does it probably indicates a subtle bug somewhere in AlphabetManager
-
loadAlphabets
public static void loadAlphabets(InputSource is) throws SAXException, IOException, BioException
Load additional Alphabets, defined in XML format, into the AlphabetManager's registry. These can then be retrieved by calling alphabetForName.
- Parameters:
is
- an InputSource encapsulating the document to be parsed
- Throws:
IOException
- if there is an error accessing the stream
SAXException
- if there is an error while parsing the document
BioException
- if a problem occurs when creating the new Alphabets.
- Since:
- 1.3
-
getAlphabetIndex
public static AlphabetIndex getAlphabetIndex(FiniteAlphabet alpha)
Get an indexer for a specified alphabet.
- Parameters:
alpha
- The alphabet to index
- Returns:
- an AlphabetIndex instance
- Since:
- 1.1
-
getAlphabetIndex
public static AlphabetIndex getAlphabetIndex(Symbol[] syms) throws IllegalSymbolException, BioException
Get an indexer for an array of symbols.
- Parameters:
syms
- the Symbols to index in that order
- Returns:
- an AlphabetIndex instance
- Throws:
IllegalSymbolException
BioException
- Since:
- 1.1
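An indexer over an array of symbols is, in essence, a two-way mapping between symbols and their positions in that array. The sketch below models this without BioJava: `SymbolIndexSketch` is a hypothetical class, `String` stands in for `Symbol`, and rejecting duplicates with `IllegalArgumentException` is an assumption of the example. The method names mirror `indexForSymbol` and `symbolForIndex` on the `AlphabetIndex` interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Stand-alone model of an order-preserving symbol indexer; not BioJava code. */
final class SymbolIndexSketch {
    private final String[] symbols;
    private final Map<String, Integer> positions = new HashMap<>();

    SymbolIndexSketch(String[] syms) {
        // Copy the array so the index is unaffected by later caller changes.
        this.symbols = syms.clone();
        for (int i = 0; i < symbols.length; i++) {
            // put() returns the previous mapping, so non-null means a duplicate.
            if (positions.put(symbols[i], i) != null) {
                throw new IllegalArgumentException("Duplicate symbol: " + symbols[i]);
            }
        }
    }

    /** Position of the symbol in the original array. */
    int indexForSymbol(String symbol) {
        Integer index = positions.get(symbol);
        if (index == null) {
            throw new NoSuchElementException("Symbol not indexed: " + symbol);
        }
        return index;
    }

    /** Symbol stored at the given position. */
    String symbolForIndex(int index) {
        return symbols[index];
    }

    int size() {
        return symbols.length;
    }
}
```

The caller's ordering is preserved, so the same symbols supplied in a different order give a different index assignment.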
-
-