Class SoftMaskedAlphabet
- java.lang.Object
-
- org.biojava.utils.Unchangeable
-
- org.biojava.bio.symbol.SoftMaskedAlphabet
-
- All Implemented Interfaces:
Annotatable, Alphabet, FiniteAlphabet, Changeable
public final class SoftMaskedAlphabet extends Unchangeable implements FiniteAlphabet
Soft masking is usually displayed by making the masked regions visibly different from the non-masked regions. Typically the masked regions are lower case, but other schemes could be invented. For example, a soft-masked DNA sequence may look like this:
>DNA_sequence
ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT
Here the lowercase regions are masked because of low complexity. SoftMaskedAlphabets come with SymbolTokenizers that understand how to read and write the soft masking. The interpretation of what constitutes a masked region is governed by an implementation of a MaskingDetector. The DEFAULT field of the MaskingDetector interface defines lower case tokens as masked.
Copyright (c) 2004 Novartis Institute for Tropical Diseases
- Version:
- 1.0
- Author:
- Mark Schreiber
-
-
Nested Class Summary
Modifier and Type Class Description
class SoftMaskedAlphabet.CaseSensitiveTokenization
This SymbolTokenizer works with a delegate to softmask symbol tokenization as appropriate.
static interface SoftMaskedAlphabet.MaskingDetector
Implementations will define how soft masking looks.
Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable
Annotatable.AnnotationForwarder
-
-
Field Summary
-
Fields inherited from interface org.biojava.bio.symbol.Alphabet
EMPTY_ALPHABET, PARSERS, SYMBOLS
-
Fields inherited from interface org.biojava.bio.Annotatable
ANNOTATION
-
-
Method Summary
Modifier and Type Method Description
void addSymbol(Symbol s)
SoftMaskedAlphabets cannot add new Symbols.
boolean contains(Symbol s)
Returns whether or not this Alphabet contains the symbol.
List getAlphabets()
Gets the components of the Alphabet.
Symbol getAmbiguity(Set s)
This is not supported.
Annotation getAnnotation()
The SoftMaskedAlphabet has no annotation.
protected FiniteAlphabet getDelegate()
The compound alpha that holds the symbols used by this wrapper.
Symbol getGapSymbol()
Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask)
Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked.
static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector)
Creates a compound alphabet that is a hybrid of the alphabet that is to be soft masked and a binary alphabet that indicates if any Symbol is soft masked or not.
FiniteAlphabet getMaskedAlphabet()
Gets the Alphabet upon which masking is being applied.
SoftMaskedAlphabet.MaskingDetector getMaskingDetector()
Getter for the MaskingDetector.
String getName()
The name of the Alphabet.
Symbol getSymbol(List l)
Gets the compound symbol composed of the Symbols in the List.
SymbolTokenization getTokenization(String type)
Get a SymbolTokenization by name.
boolean isMasked(BasisSymbol s)
Determines if a Symbol is masked.
Iterator iterator()
Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.
void removeSymbol(Symbol s)
SoftMaskedAlphabets cannot remove Symbols.
int size()
The number of symbols in the alphabet.
void validate(Symbol s)
Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.biojava.utils.Changeable
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
-
-
-
-
Method Detail
-
getInstance
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask) throws IllegalAlphabetException
Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked.
- Parameters:
alphaToMask - for example the DNA alphabet.
- Returns:
- a reference to a singleton SoftMaskedAlphabet.
- Throws:
IllegalAlphabetException - if it cannot be constructed
-
getInstance
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector) throws IllegalAlphabetException
Creates a compound alphabet that is a hybrid of the alphabet that is to be soft masked and a binary alphabet that indicates if any Symbol is soft masked or not.
- Parameters:
alphaToMask - for example the DNA alphabet.
maskingDetector - to define masking behaviour
- Returns:
- a reference to a singleton SoftMaskedAlphabet.
- Throws:
IllegalAlphabetException - if it cannot be constructed
-
getMaskedAlphabet
public FiniteAlphabet getMaskedAlphabet()
Gets the Alphabet upon which masking is being applied.
- Returns:
- a FiniteAlphabet
-
getDelegate
protected FiniteAlphabet getDelegate()
The compound alpha that holds the symbols used by this wrapper.
- Returns:
- a FiniteAlphabet
-
getAnnotation
public Annotation getAnnotation()
The SoftMaskedAlphabet has no annotation.
- Specified by:
getAnnotation in interface Annotatable
- Returns:
- Annotation.EMPTY_ANNOTATION
-
getAlphabets
public List getAlphabets()
Gets the components of the Alphabet.
- Specified by:
getAlphabets in interface Alphabet
- Returns:
- a List with two members: the first is the wrapped Alphabet, the second is the binary SubIntegerAlphabet.
-
getSymbol
public Symbol getSymbol(List l) throws IllegalSymbolException
Gets the compound symbol composed of the Symbols in the List. The Symbols in the List must be from alpha (defined in the constructor) and SUBINTEGER[0..1].
- Specified by:
getSymbol in interface Alphabet
- Parameters:
l - a List of Symbols
- Returns:
- A Symbol from this alphabet.
- Throws:
IllegalSymbolException - if l is not as expected (see above)
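The pairing described above, one symbol from the wrapped alphabet plus one member of a two-valued alphabet, can be modelled without BioJava as follows. This is a rough sketch under stated assumptions: `CompoundSymbolSketch` and `Masked` are hypothetical names, symbols are reduced to single characters, and the convention that 1 means masked and 0 means unmasked is an assumption made for the example, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a plain-Java model, not BioJava classes.
class CompoundSymbolSketch {

    /** Hypothetical pair of a base symbol and a mask flag (assumed: 0 = unmasked, 1 = masked). */
    static final class Masked {
        final char base; // upper-case symbol from the wrapped alphabet
        final int mask;  // member of the binary alphabet {0, 1}

        Masked(char base, int mask) {
            if (mask != 0 && mask != 1) {
                throw new IllegalArgumentException("mask must be 0 or 1");
            }
            this.base = base;
            this.mask = mask;
        }

        /** Writes the pair back as a single token, lower case when masked. */
        char token() {
            return mask == 1 ? Character.toLowerCase(base) : base;
        }
    }

    /** Splits each character into an (upper-case base, mask flag) pair. */
    static List<Masked> parse(String seq) {
        List<Masked> out = new ArrayList<>();
        for (char c : seq.toCharArray()) {
            out.add(new Masked(Character.toUpperCase(c), Character.isLowerCase(c) ? 1 : 0));
        }
        return out;
    }

    /** Turns the pairs back into soft-masked text. */
    static String write(List<Masked> symbols) {
        StringBuilder sb = new StringBuilder();
        for (Masked m : symbols) {
            sb.append(m.token());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(write(parse("ATggtGC")));
    }
}
```

Keeping the mask as a separate component means the base symbol itself is unchanged by masking, so code that only cares about the underlying sequence can ignore the flag.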
-
getAmbiguity
public Symbol getAmbiguity(Set s) throws UnsupportedOperationException
This is not supported. Ambiguity should be handled at the level of the wrapped Alphabet. Use getSymbol(List l) instead and provide it with an ambiguity and a masking symbol.
- Specified by:
getAmbiguity in interface Alphabet
- Parameters:
s - a Set of Symbols
- Returns:
- a Symbol (possibly fly-weighted) for the Set of symbols in s
- Throws:
UnsupportedOperationException
- See Also:
getSymbol(List l)
-
getGapSymbol
public Symbol getGapSymbol()
Description copied from interface: Alphabet
Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
In general, this will be a BasisSymbol that represents a list of AlphabetManager.getGapSymbol() the same length as the getAlphabets list.
- Specified by:
getGapSymbol in interface Alphabet
- Returns:
- the appropriate gap Symbol instance
-
contains
public boolean contains(Symbol s)
Description copied from interface: Alphabet
Returns whether or not this Alphabet contains the symbol.
An alphabet contains an ambiguity symbol iff the ambiguity symbol's getMatches() returns an alphabet that is a proper sub-set of this alphabet. That means that every one of the symbols that could match the ambiguity symbol is also a member of this alphabet.
-
validate
public void validate(Symbol s) throws IllegalSymbolException
Description copied from interface: Alphabet
Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
This function is used all over the code to validate symbols as they enter a method. Also, the code is littered with catches for IllegalSymbolException. There is a preferred style of handling this, which should be covered in the package documentation.
- Specified by:
validate in interface Alphabet
- Parameters:
s - the Symbol to validate
- Throws:
IllegalSymbolException - if s is not contained in this alphabet
-
getMaskingDetector
public SoftMaskedAlphabet.MaskingDetector getMaskingDetector()
Getter for the MaskingDetector.
- Returns:
- the MaskingDetector
-
getTokenization
public SymbolTokenization getTokenization(String type) throws BioException
Description copied from interface: Alphabet
Get a SymbolTokenization by name.
The parser returned is guaranteed to return Symbols and SymbolLists that conform to this alphabet.
Every alphabet should have a SymbolTokenization under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolTokenization under the name 'name' that uses symbol names to identify symbols. Any other names may also be defined, but the behavior of the returned SymbolTokenization is not defined here.
A SymbolTokenization under the name 'default' should be defined for all sequences; it determines the behavior when printing out a sequence. Standard behavior is to define the 'token' SymbolTokenization as default if it exists, else to define the 'name' SymbolTokenization as the default, but others are possible.
- Specified by:
getTokenization in interface Alphabet
- Parameters:
type - the name of the parser
- Returns:
- a parser for that name
- Throws:
BioException - if for any reason the tokenization could not be built
-
size
public int size()
Description copied from interface: FiniteAlphabet
The number of symbols in the alphabet.
- Specified by:
size in interface FiniteAlphabet
- Returns:
- the size of the alphabet
-
iterator
public Iterator iterator()
Description copied from interface: FiniteAlphabet
Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.
Each AtomicSymbol "as" for which this.contains(as) is true will be returned exactly once by this iterator, in no specified order.
- Specified by:
iterator in interface FiniteAlphabet
- Returns:
- an Iterator over the contained AtomicSymbol objects
-
addSymbol
public void addSymbol(Symbol s) throws ChangeVetoException
SoftMaskedAlphabets cannot add new Symbols. A ChangeVetoException will be thrown.
- Specified by:
addSymbol in interface FiniteAlphabet
- Parameters:
s - the Symbol to add.
- Throws:
ChangeVetoException - when called.
-
removeSymbol
public void removeSymbol(Symbol s) throws ChangeVetoException
SoftMaskedAlphabets cannot remove Symbols. A ChangeVetoException will be thrown.
- Specified by:
removeSymbol in interface FiniteAlphabet
- Parameters:
s - the Symbol to remove.
- Throws:
ChangeVetoException - when called.
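The "always veto" contract of addSymbol and removeSymbol can be pictured with a small stand-alone model. Everything in it is hypothetical: `FixedAlphabetSketch` and `VetoException` are illustrative stand-ins rather than BioJava types, and symbols are represented as plain strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: a fixed symbol set whose mutators always refuse.
class FixedAlphabetSketch {

    /** Hypothetical checked exception standing in for a change veto. */
    static class VetoException extends Exception {
        VetoException(String message) {
            super(message);
        }
    }

    private final Set<String> symbols;

    FixedAlphabetSketch(String... symbols) {
        // The set is fixed at construction time and never modified afterwards.
        this.symbols = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(symbols)));
    }

    boolean contains(String s) {
        return symbols.contains(s);
    }

    int size() {
        return symbols.size();
    }

    /** Always refuses: the symbol set cannot grow. */
    void addSymbol(String s) throws VetoException {
        throw new VetoException("cannot add " + s + ": alphabet is fixed");
    }

    /** Always refuses: the symbol set cannot shrink. */
    void removeSymbol(String s) throws VetoException {
        throw new VetoException("cannot remove " + s + ": alphabet is fixed");
    }

    public static void main(String[] args) {
        FixedAlphabetSketch dna = new FixedAlphabetSketch("A", "C", "G", "T");
        try {
            dna.addSymbol("N");
        } catch (VetoException e) {
            System.out.println("vetoed: " + e.getMessage());
        }
    }
}
```

Because the exception is checked in this sketch, callers are forced to handle the refusal explicitly, as they are with the `throws ChangeVetoException` clauses in the signatures above.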
-
isMasked
public boolean isMasked(BasisSymbol s) throws IllegalSymbolException
Determines if a Symbol is masked.
- Parameters:
s - the Symbol to test.
- Returns:
- true if s is masked.
- Throws:
IllegalSymbolException
-
-