Class SoftMaskedAlphabet

  • All Implemented Interfaces:
    Annotatable, Alphabet, FiniteAlphabet, Changeable

    public final class SoftMaskedAlphabet
    extends Unchangeable
    implements FiniteAlphabet
    Soft masking is usually displayed by rendering the masked regions differently from the non-masked regions. Typically the masked regions are lower case, but other schemes could be invented. For example, a soft-masked DNA sequence may look like this:
    
     >DNA_sequence
     ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT
    
     
    Here the lower-case regions are masked because of low complexity.

    SoftMaskedAlphabets come with SymbolTokenization implementations that understand how to read and write the soft masking. The interpretation of what constitutes a masked region is governed by an implementation of a MaskingDetector. The DEFAULT field of the MaskingDetector interface defines lower-case tokens as masked.
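
The lower-case rule can be sketched in plain Java, without any BioJava types. The class and method names below (SoftMaskSketch, isMasked, maskOf, render) are assumptions made for illustration only; they are not the MaskingDetector or SymbolTokenization API.

```java
// Illustrative stand-in only; this is not the BioJava MaskingDetector interface.
final class SoftMaskSketch {

    // Default rule described above: a lower-case token counts as masked.
    static boolean isMasked(char token) {
        return Character.isLowerCase(token);
    }

    // Reads a soft-masked string into one mask flag per position.
    static boolean[] maskOf(String sequence) {
        boolean[] mask = new boolean[sequence.length()];
        for (int i = 0; i < sequence.length(); i++) {
            mask[i] = isMasked(sequence.charAt(i));
        }
        return mask;
    }

    // Writes residues back out, lower-casing masked positions and
    // upper-casing unmasked ones.
    static String render(String residues, boolean[] mask) {
        if (residues.length() != mask.length) {
            throw new IllegalArgumentException("residues and mask differ in length");
        }
        StringBuilder out = new StringBuilder(residues.length());
        for (int i = 0; i < residues.length(); i++) {
            char c = residues.charAt(i);
            out.append(mask[i] ? Character.toLowerCase(c) : Character.toUpperCase(c));
        }
        return out.toString();
    }
}
```

Reading a string with maskOf and writing it back with render round-trips the masking, which is the behaviour the tokenizers are described as providing. A different masking scheme would only need a different isMasked rule.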

    Copyright (c) 2004 Novartis Institute for Tropical Diseases

    Version:
    1.0
    Author:
    Mark Schreiber
    • Method Detail

      • getDelegate

        protected FiniteAlphabet getDelegate()
        The compound alphabet that holds the symbols used by this wrapper.
        Returns:
        a FiniteAlphabet
      • getName

        public String getName()
        The name of the Alphabet
        Specified by:
        getName in interface Alphabet
        Returns:
        a String in the form of "Softmasked {"+alphaToMask.getName()+"}"
      • getAlphabets

        public List getAlphabets()
        Gets the components of the Alphabet.
        Specified by:
        getAlphabets in interface Alphabet
        Returns:
        a List with two members: the first is the wrapped Alphabet, the second is the binary SubIntegerAlphabet.
      • getSymbol

        public Symbol getSymbol(List l)
                         throws IllegalSymbolException
        Gets the compound symbol composed of the Symbols in the List. The Symbols in the List must be from alpha (defined in the constructor) and SUBINTEGER[0..1]
        Specified by:
        getSymbol in interface Alphabet
        Parameters:
        l - a List of Symbols
        Returns:
        A Symbol from this alphabet.
        Throws:
        IllegalSymbolException - if l is not as expected (see above)
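
The two-member contract can be sketched with plain Java collections. Everything here is hypothetical: CompoundSymbolSketch, its WRAPPED set and the string form of the result are illustration only, and IllegalArgumentException stands in for IllegalSymbolException.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of a compound (symbol, mask flag) lookup; not BioJava code.
final class CompoundSymbolSketch {

    // Assumed stand-in for the wrapped alphabet passed to the constructor.
    static final Set<String> WRAPPED = Set.of("a", "c", "g", "t");

    // Accepts exactly two members: a wrapped symbol, then a flag of 0 or 1.
    static String compound(List<?> parts) {
        if (parts.size() != 2) {
            throw new IllegalArgumentException("expected 2 members, got " + parts.size());
        }
        Object base = parts.get(0);
        Object flag = parts.get(1);
        if (!(base instanceof String) || !WRAPPED.contains(base)) {
            throw new IllegalArgumentException("not in wrapped alphabet: " + base);
        }
        if (!(flag instanceof Integer)) {
            throw new IllegalArgumentException("mask flag must be an integer: " + flag);
        }
        int value = (Integer) flag;
        if (value < 0 || value > 1) {
            throw new IllegalArgumentException("mask flag must be 0 or 1: " + value);
        }
        return "(" + base + " " + value + ")";
    }
}
```

In this sketch compound(List.of("g", 1)) yields "(g 1)", while a list of the wrong length or a member from neither component is rejected.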
      • getGapSymbol

        public Symbol getGapSymbol()
        Description copied from interface: Alphabet

        Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.

        In general, this will be a BasisSymbol that represents a list of AlphabetManager.getGapSymbol() the same length as the getAlphabets list.

        Specified by:
        getGapSymbol in interface Alphabet
        Returns:
        the appropriate gap Symbol instance
      • contains

        public boolean contains(Symbol s)
        Description copied from interface: Alphabet

        Returns whether or not this Alphabet contains the symbol.

        An alphabet contains an ambiguity symbol iff the ambiguity symbol's getMatches() returns an alphabet that is a subset of this alphabet. That means that every one of the symbols that could match the ambiguity symbol is also a member of this alphabet.

        Specified by:
        contains in interface Alphabet
        Parameters:
        s - the Symbol to check
        Returns:
        boolean true if the Alphabet contains the symbol and false otherwise
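
As a minimal sketch of the ambiguity rule, the check below models an alphabet and an ambiguity symbol's matches as plain sets of characters. ContainsSketch and containsAmbiguity are hypothetical names; the sketch reads the rule as "every symbol the ambiguity symbol can match must be a member of the alphabet".

```java
import java.util.Set;

// Hypothetical set-based model of the containment rule; not BioJava code.
final class ContainsSketch {

    // True when every symbol matched by the ambiguity symbol is in the alphabet.
    static boolean containsAmbiguity(Set<Character> alphabet, Set<Character> matches) {
        return alphabet.containsAll(matches);
    }
}
```

With a DNA-like alphabet {a, c, g, t}, a purine ambiguity matching {a, g} is contained, whereas one matching {a, u} is not.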
      • validate

        public void validate(Symbol s)
                      throws IllegalSymbolException
        Description copied from interface: Alphabet

        Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.

        This function is used throughout the code to validate symbols as they enter a method, and the code contains many catches for IllegalSymbolException. There is a preferred style of handling this, which should be covered in the package documentation.

        Specified by:
        validate in interface Alphabet
        Parameters:
        s - the Symbol to validate
        Throws:
        IllegalSymbolException - if s is not contained in this alphabet
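
The validate-on-entry pattern might look like the following in plain Java. UnknownSymbolException is a hypothetical checked exception standing in for IllegalSymbolException, and ValidateSketch with its countGc method is invented for illustration.

```java
import java.util.Set;

// Hypothetical checked exception standing in for IllegalSymbolException.
class UnknownSymbolException extends Exception {
    UnknownSymbolException(String message) {
        super(message);
    }
}

// Illustrative only; not BioJava code.
final class ValidateSketch {

    private static final Set<Character> SYMBOLS = Set.of('a', 'c', 'g', 't');

    // Throws if the symbol is not a member of the alphabet; otherwise returns.
    static void validate(char s) throws UnknownSymbolException {
        if (!SYMBOLS.contains(s)) {
            throw new UnknownSymbolException("Symbol '" + s + "' is not in the alphabet");
        }
    }

    // Validates each symbol as it enters the method, then does the real work.
    static int countGc(String sequence) throws UnknownSymbolException {
        int gc = 0;
        for (char s : sequence.toCharArray()) {
            validate(s);
            if (s == 'g' || s == 'c') {
                gc++;
            }
        }
        return gc;
    }
}
```

Because the exception is checked, callers either declare it or catch it, which is why such catches appear so often in code written this way.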
      • getTokenization

        public SymbolTokenization getTokenization(String type)
                                           throws BioException
        Description copied from interface: Alphabet

        Get a SymbolTokenization by name.

        The parser returned is guaranteed to return Symbols and SymbolLists that conform to this alphabet.

        Every alphabet should have a SymbolTokenization under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolTokenization under the name 'name' that uses symbol names to identify symbols. Any other names may also be defined, but the behavior of the returned SymbolTokenization is not defined here.

        A SymbolTokenization under the name 'default' should be defined for all sequences, that determines the behavior when printing out a sequence. Standard behavior is to define the 'token' SymbolTokenization as default if it exists, else to define the 'name' SymbolTokenization as the default, but others are possible.

        Specified by:
        getTokenization in interface Alphabet
        Parameters:
        type - the name of the parser
        Returns:
        a parser for that name
        Throws:
        BioException - if for any reason the tokenization could not be built
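
The standard 'default' fallback can be sketched as a name-resolution step. TokenizationNameSketch and resolve are hypothetical; the sketch only decides which registered name a request maps to, and IllegalArgumentException stands in for BioException.

```java
import java.util.Set;

// Hypothetical model of name resolution for tokenizations; not BioJava code.
final class TokenizationNameSketch {

    // Returns the registered name that a request for 'type' would use.
    static String resolve(Set<String> defined, String type) {
        if (defined.contains(type)) {
            return type;
        }
        if ("default".equals(type)) {
            // Standard behaviour: prefer 'token', else fall back to 'name'.
            if (defined.contains("token")) {
                return "token";
            }
            if (defined.contains("name")) {
                return "name";
            }
        }
        throw new IllegalArgumentException("No tokenization named '" + type + "'");
    }
}
```

An alphabet that registers both 'token' and 'name' resolves 'default' to 'token' here. One with only 'name' resolves it to 'name'. Any other unregistered name is rejected.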
      • size

        public int size()
        Description copied from interface: FiniteAlphabet
        The number of symbols in the alphabet.
        Specified by:
        size in interface FiniteAlphabet
        Returns:
        the size of the alphabet
      • iterator

        public Iterator iterator()
        Description copied from interface: FiniteAlphabet
        Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.

        Each AtomicSymbol for which this.contains(symbol) is true is returned exactly once by this iterator, in no specified order.

        Specified by:
        iterator in interface FiniteAlphabet
        Returns:
        an Iterator over the contained AtomicSymbol objects