java.lang.Object
- org.biojava.utils.Unchangeable
- - org.biojava.bio.symbol.SoftMaskedAlphabet

All Implemented Interfaces:

Annotatable, Alphabet, FiniteAlphabet, Changeable
```
public final class SoftMaskedAlphabet
extends Unchangeable
implements FiniteAlphabet
```
Soft masking is usually displayed by making the masked regions visually distinct from the non-masked regions. Typically the masked regions are lower case, but other schemes could be invented. For example, a soft-masked DNA sequence may look like this:
```
 >DNA_sequence
 ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT

 
```
Here the lowercase region is masked because of its low complexity.
SoftMaskedAlphabets come with SymbolTokenizers that understand how to read and write the softmasking. The interpretation of what constitutes a masked region is governed by an implementation of a MaskingDetector. The DEFAULT field of the MaskingDetector interface defines lower case tokens as masked.
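The idea of a pluggable detector can be illustrated without BioJava. The following is a minimal sketch in plain Java, assuming single-character tokens; the `SoftMaskSketch` class, its `Detector` interface, and `maskedRegions` are hypothetical names made up for illustration and are not part of the BioJava API. It only mirrors the documented default rule that lower case tokens count as masked.

```java
import java.util.ArrayList;
import java.util.List;

final class SoftMaskSketch {
    /** Hypothetical stand-in for a masking detector; NOT the BioJava interface. */
    interface Detector {
        boolean isMasked(char token);
    }

    /** Mirrors the documented default: lower case tokens are treated as masked. */
    static final Detector LOWER_CASE = c -> Character.isLowerCase(c);

    /** Returns half-open [start, end) index pairs, one per run of masked tokens. */
    static List<int[]> maskedRegions(String seq, Detector detector) {
        List<int[]> regions = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a masked run"
        for (int i = 0; i < seq.length(); i++) {
            boolean masked = detector.isMasked(seq.charAt(i));
            if (masked && start < 0) {
                start = i;                          // a masked run begins
            } else if (!masked && start >= 0) {
                regions.add(new int[] {start, i});  // the run ended at i
                start = -1;
            }
        }
        if (start >= 0) {
            regions.add(new int[] {start, seq.length()}); // run reaches the end
        }
        return regions;
    }

    public static void main(String[] args) {
        for (int[] r : maskedRegions("ACgtTTa", LOWER_CASE)) {
            System.out.println(r[0] + "-" + r[1]); // prints 2-4 then 6-7
        }
    }
}
```

A different masking scheme would only need another `Detector` implementation; the region scan itself stays the same.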
Copyright (c) 2004 Novartis Institute for Tropical Diseases
Version:

1.0

Author:

Mark Schreiber

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`class`	`SoftMaskedAlphabet.CaseSensitiveTokenization`	This `SymbolTokenizer` works with a delegate to softmask symbol tokenization as appropriate.
`static interface`	`SoftMaskedAlphabet.MaskingDetector`	Implementations will define how soft masking looks.

Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable
Annotatable.AnnotationForwarder

Field Summary
- Fields inherited from interface org.biojava.bio.symbol.Alphabet
  EMPTY_ALPHABET, PARSERS, SYMBOLS
- Fields inherited from interface org.biojava.bio.Annotatable
  ANNOTATION

Method Summary

Modifier and Type	Method	Description
`void`	`addSymbol(Symbol s)`	`SoftMaskedAlphabet`s cannot add new `Symbol`s.
`boolean`	`contains(Symbol s)`	Returns whether or not this Alphabet contains the symbol.
`List`	`getAlphabets()`	Gets the components of the `Alphabet`.
`Symbol`	`getAmbiguity(Set s)`	This is not supported.
`Annotation`	`getAnnotation()`	The SoftMaskedAlphabet has no annotation
`protected FiniteAlphabet`	`getDelegate()`	The compound alphabet that holds the symbols used by this wrapper
`Symbol`	`getGapSymbol()`	Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
`static SoftMaskedAlphabet`	`getInstance(FiniteAlphabet alphaToMask)`	Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked.
`static SoftMaskedAlphabet`	`getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector)`	Creates a compound alphabet that is a hybrid of the alphabet that is to be soft masked and a binary alphabet that indicates if any `Symbol` is soft masked or not.
`FiniteAlphabet`	`getMaskedAlphabet()`	Gets the `Alphabet` upon which masking is being applied
`SoftMaskedAlphabet.MaskingDetector`	`getMaskingDetector()`	Getter for the `MaskingDetector`
`String`	`getName()`	The name of the Alphabet
`Symbol`	`getSymbol(List l)`	Gets the compound symbol composed of the `Symbols` in the List.
`SymbolTokenization`	`getTokenization(String type)`	Get a SymbolTokenization by name.
`boolean`	`isMasked(BasisSymbol s)`	Determines if a `Symbol` is masked.
`Iterator`	`iterator()`	Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.
`void`	`removeSymbol(Symbol s)`	`SoftMaskedAlphabet`s cannot remove `Symbol`s.
`int`	`size()`	The number of symbols in the alphabet.
`void`	`validate(Symbol s)`	Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.

Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.biojava.utils.Changeable
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener

- Method Detail
  - getInstance
```
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask)
                                      throws IllegalAlphabetException
```
    Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked.
    
    Parameters:
    
    alphaToMask - for example the DNA alphabet.
    
    Returns:
    
    a reference to a singleton SoftMaskedAlphabet.
    
    Throws:
    
    IllegalAlphabetException - if it cannot be constructed
  - getInstance
```
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask,
                                             SoftMaskedAlphabet.MaskingDetector maskingDetector)
                                      throws IllegalAlphabetException
```
    Creates a compound alphabet that is a hybrid of the alphabet that is to be soft masked and a binary alphabet that indicates if any Symbol is soft masked or not.
    
    Parameters:
    
    alphaToMask - for example the DNA alphabet.
    
    maskingDetector - to define masking behaviour
    
    Returns:
    
    a reference to a singleton SoftMaskedAlphabet.
    
    Throws:
    
    IllegalAlphabetException - if it cannot be constructed
  - getMaskedAlphabet
```
public FiniteAlphabet getMaskedAlphabet()
```
    Gets the Alphabet upon which masking is being applied
    
    Returns:
    
    A FiniteAlphabet
  - getDelegate
```
protected FiniteAlphabet getDelegate()
```
    The compound alphabet that holds the symbols used by this wrapper
    
    Returns:
    
    a FiniteAlphabet
  - getAnnotation
```
public Annotation getAnnotation()
```
    The SoftMaskedAlphabet has no annotation
    
    Specified by:
    
    getAnnotation in interface Annotatable
    
    Returns:
    
    Annotation.EMPTY_ANNOTATION
  - getName
```
public String getName()
```
    The name of the Alphabet
    
    Specified by:
    
    getName in interface Alphabet
    
    Returns:
    
    a String in the form of "Softmasked {"+alphaToMask.getName()+"}"
  - getAlphabets
```
public List getAlphabets()
```
    Gets the components of the Alphabet.
    
    Specified by:
    
    getAlphabets in interface Alphabet
    
    Returns:
    
    a List with two members, the first is the wrapped Alphabet the second is the binary SubIntegerAlphabet.
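The two-component structure described above can be pictured as pairing every token with a mask bit. The following is a rough sketch in plain Java, not BioJava code: `MaskedPairSketch`, `Pair`, `parse`, and `render` are hypothetical names, and the choice that bit 1 means "masked" is an assumption the source does not state.

```java
import java.util.ArrayList;
import java.util.List;

final class MaskedPairSketch {
    /** Hypothetical pair: an upper-case base token plus a mask bit (0 or 1). */
    static final class Pair {
        final char base;
        final int maskBit; // assumption: 1 = masked, 0 = not masked

        Pair(char base, int maskBit) {
            this.base = base;
            this.maskBit = maskBit;
        }
    }

    /** Reads a soft-masked string into (base, maskBit) pairs. */
    static List<Pair> parse(String seq) {
        List<Pair> out = new ArrayList<>();
        for (char c : seq.toCharArray()) {
            int bit = Character.isLowerCase(c) ? 1 : 0;
            out.add(new Pair(Character.toUpperCase(c), bit));
        }
        return out;
    }

    /** Writes the pairs back out, lower-casing the masked ones. */
    static String render(List<Pair> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Pair p : pairs) {
            sb.append(p.maskBit == 1 ? Character.toLowerCase(p.base) : p.base);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(parse("ACgtTT"))); // prints ACgtTT
    }
}
```

Keeping the base token and the mask bit separate means the masking survives a read/write round trip while code that only cares about the underlying sequence can ignore the second component.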
  - getSymbol
```
public Symbol getSymbol(List l)
                 throws IllegalSymbolException
```
    Gets the compound symbol composed of the Symbols in the List. The Symbols in the List must come from the wrapped alphabet (the alphabet being masked) and SUBINTEGER[0..1].
    
    Specified by:
    
    getSymbol in interface Alphabet
    
    Parameters:
    
    l - a List of Symbols
    
    Returns:
    
    A Symbol from this alphabet.
    
    Throws:
    
    IllegalSymbolException - if l is not as expected (see above)
  - getAmbiguity
```
public Symbol getAmbiguity(Set s)
                    throws UnsupportedOperationException
```
    This is not supported. Ambiguity should be handled at the level of the wrapped Alphabet. Use getSymbol(List l) instead and provide it with an ambiguity symbol and a masking symbol.
    
    Specified by:
    
    getAmbiguity in interface Alphabet
    
    Parameters:
    
    s - a Set of Symbols
    
    Returns:
    
    a Symbol (possibly fly-weighted) for the Set of symbols in s
    
    Throws:
    
    UnsupportedOperationException
    
    See Also:
    
    getSymbol(List l)
  - getGapSymbol
```
public Symbol getGapSymbol()
```
    Description copied from interface: Alphabet
    
    Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
    
    In general, this will be a BasisSymbol that represents a list of AlphabetManager.getGapSymbol() the same length as the getAlphabets list.
    
    Specified by:
    
    getGapSymbol in interface Alphabet
    
    Returns:
    
    the appropriate gap Symbol instance
  - contains
```
public boolean contains(Symbol s)
```
    Description copied from interface: Alphabet
    
    Returns whether or not this Alphabet contains the symbol.
    
    An alphabet contains an ambiguity symbol iff the ambiguity symbol's getMatches() returns an alphabet that is a proper sub-set of this alphabet. That means that every one of the symbols that could match the ambiguity symbol is also a member of this alphabet.
    
    Specified by:
    
    contains in interface Alphabet
    
    Parameters:
    
    s - the Symbol to check
    
    Returns:
    
    boolean true if the Alphabet contains the symbol and false otherwise
  - validate
```
public void validate(Symbol s)
              throws IllegalSymbolException
```
    Description copied from interface: Alphabet
    
    Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
    
    This function is used all over the code to validate symbols as they enter a method. Also, the code is littered with catches for IllegalSymbolException. There is a preferred style of handling this, which should be covered in the package documentation.
    
    Specified by:
    
    validate in interface Alphabet
    
    Parameters:
    
    s - the Symbol to validate
    
    Throws:
    
    IllegalSymbolException - if s is not contained in this alphabet
  - getMaskingDetector
```
public SoftMaskedAlphabet.MaskingDetector getMaskingDetector()
```
    Getter for the MaskingDetector
    
    Returns:
    
    the MaskingDetector
  - getTokenization
```
public SymbolTokenization getTokenization(String type)
                                   throws BioException
```
    Description copied from interface: Alphabet
    
    Get a SymbolTokenization by name.
    
    The parser returned is guaranteed to return Symbols and SymbolLists that conform to this alphabet.
    
    Every alphabet should have a SymbolTokenization under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolTokenization under the name 'name' that uses symbol names to identify symbols. Any other names may also be defined, but the behavior of the returned SymbolTokenization is not defined here.
    
    A SymbolTokenization under the name 'default' should be defined for all sequences, that determines the behavior when printing out a sequence. Standard behavior is to define the 'token' SymbolTokenization as default if it exists, else to define the 'name' SymbolTokenization as the default, but others are possible.
    
    Specified by:
    
    getTokenization in interface Alphabet
    
    Parameters:
    
    type - the name of the parser
    
    Returns:
    
    a parser for that name
    
    Throws:
    
    BioException - if for any reason the tokenization could not be built
  - size
```
public int size()
```
    Description copied from interface: FiniteAlphabet
    
    The number of symbols in the alphabet.
    
    Specified by:
    
    size in interface FiniteAlphabet
    
    Returns:
    
    the size of the alphabet
  - iterator
```
public Iterator iterator()
```
    Description copied from interface: FiniteAlphabet
    
    Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.
    
    Each AtomicSymbol as for which this.contains(as) is true will be returned exactly once by this iterator in no specified order.
    
    Specified by:
    
    iterator in interface FiniteAlphabet
    
    Returns:
    
    an Iterator over the contained AtomicSymbol objects
  - addSymbol
```
public void addSymbol(Symbol s)
               throws ChangeVetoException
```
    SoftMaskedAlphabets cannot add new Symbols. A ChangeVetoException will be thrown.
    
    Specified by:
    
    addSymbol in interface FiniteAlphabet
    
    Parameters:
    
    s - the Symbol to add.
    
    Throws:
    
    ChangeVetoException - when called.
  - removeSymbol
```
public void removeSymbol(Symbol s)
                  throws ChangeVetoException
```
    SoftMaskedAlphabets cannot remove Symbols. A ChangeVetoException will be thrown.
    
    Specified by:
    
    removeSymbol in interface FiniteAlphabet
    
    Parameters:
    
    s - the Symbol to remove.
    
    Throws:
    
    ChangeVetoException - when called.
  - isMasked
```
public boolean isMasked(BasisSymbol s)
                 throws IllegalSymbolException
```
    Determines if a Symbol is masked.
    
    Parameters:
    
    s - the Symbol to test.
    
    Returns:
    
    true if s is masked.
    
    Throws:
    
    IllegalSymbolException
