Class ChunkedSymbolListFactory


  public class ChunkedSymbolListFactory
    extends Object
    Class that makes ChunkedSymbolLists whose chunks are themselves implemented as SymbolLists.

    The advantage is that those chunk SymbolLists can be packed implementations, which use less memory per symbol.
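To illustrate what a chunked list of packed chunks buys, here is a plain-Java sketch that stores DNA as fixed-size chunks, each packing bases at 2 bits per symbol into long words. PackedChunkedDna and its methods are hypothetical and do not reproduce BioJava's packed SymbolList classes; indices here are 0-based, whereas BioJava's SymbolList.symbolAt is 1-based.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: DNA held as a list of fixed-size chunks, each chunk
// packing one base into 2 bits of a 64-bit long (32 bases per long).
final class PackedChunkedDna {
    private static final String BASES = "acgt";   // 2-bit code = position in this string
    private final int chunkSize;                   // bases per chunk
    private final List<long[]> chunks = new ArrayList<>();
    private int length = 0;

    PackedChunkedDna(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    // Append one base, opening a new packed chunk when the current one is full.
    void add(char base) {
        int code = BASES.indexOf(Character.toLowerCase(base));
        if (code < 0) throw new IllegalArgumentException("not an unambiguous DNA base: " + base);
        int offset = length % chunkSize;
        if (offset == 0) chunks.add(new long[(chunkSize + 31) / 32]);
        long[] chunk = chunks.get(chunks.size() - 1);
        chunk[offset / 32] |= ((long) code) << (2 * (offset % 32));
        length++;
    }

    // 0-based random access: pick the chunk, then extract the 2-bit code inside it.
    char symbolAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        long[] chunk = chunks.get(index / chunkSize);
        int offset = index % chunkSize;
        int code = (int) ((chunk[offset / 32] >>> (2 * (offset % 32))) & 3L);
        return BASES.charAt(code);
    }

    int length() { return length; }

    int chunkCount() { return chunks.size(); }
}
```

Lookup is a division to find the chunk and a shift-and-mask inside it, so random access stays constant time while each base costs roughly 2 bits plus per-chunk overhead, instead of one object reference per base.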

    You can build a SequenceBuilderFactory that creates a packed, chunked sequence directly from an input file, without making an intermediate SymbolList:

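     import org.biojava.bio.*;
     import org.biojava.bio.seq.*;
     import org.biojava.bio.seq.impl.*;
     import org.biojava.bio.seq.io.*;
     import org.biojava.bio.symbol.*;

     public class PackedChunkedListFactory implements SequenceBuilderFactory
     {
       public SequenceBuilder makeSequenceBuilder()
       {
         return new SequenceBuilderBase() {
           // one chunker per builder, i.e. per sequence; its chunks are packed SymbolLists
           private ChunkedSymbolListFactory chunker = new ChunkedSymbolListFactory(new PackedSymbolListFactory(true));
    
           // deal with symbols: hand each parsed block straight to the chunker
           public void addSymbols(Alphabet alpha, Symbol[] syms, int pos, int len)
             throws IllegalAlphabetException
           {
             chunker.addSymbols(alpha, syms, pos, len);
           }
    
           // make the sequence (super.makeSequence() declares BioException)
           public Sequence makeSequence()
             throws BioException
           {
             try {
               // make the SymbolList from the accumulated chunks
               SymbolList symbols = chunker.makeSymbolList();
               // uri, name and annotation are fields inherited from SequenceBuilderBase
               seq = new SimpleSequence(symbols, uri, name, annotation);
    
               // call superclass method
               return super.makeSequence();
             }
             catch (IllegalAlphabetException iae) {
               throw new BioError("couldn't create symbol list");
             }
           }
         };
       }
     }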
     

    FASTA files can then be read with something like:

     // br is a BufferedReader over the FASTA input
     SequenceIterator seqI = new StreamReader(br, new FastaFormat(),
         DNATools.getDNA().getTokenization("token"),
         new PackedChunkedListFactory() );
     

    Blend to suit taste.
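As we read it, StreamReader takes a SequenceBuilderFactory rather than a single builder because it needs a fresh builder for every record in the file. The plain-Java sketch below imitates that pattern with hypothetical types (RecordBuilder, RecordBuilderFactory, FastaRecord, MiniFastaReader, StringRecordBuilderFactory) that are not BioJava classes, and a deliberately minimal FASTA parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for SequenceBuilder / SequenceBuilderFactory.
interface RecordBuilder {
    void setName(String name);
    void addSymbols(char[] syms, int pos, int len);
    FastaRecord makeRecord();
}

interface RecordBuilderFactory {
    RecordBuilder makeRecordBuilder();
}

final class FastaRecord {
    final String name;
    final String residues;

    FastaRecord(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }
}

final class MiniFastaReader {
    // Reads all records, asking the factory for a fresh builder for each one.
    static List<FastaRecord> readAll(BufferedReader br, RecordBuilderFactory factory) {
        List<FastaRecord> out = new ArrayList<>();
        RecordBuilder current = null;
        try {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (current != null) out.add(current.makeRecord());
                    current = factory.makeRecordBuilder(); // one builder per sequence
                    current.setName(line.substring(1).trim());
                } else if (current != null) {
                    // hand this line's symbols to the current builder
                    char[] syms = line.trim().toCharArray();
                    current.addSymbols(syms, 0, syms.length);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) out.add(current.makeRecord());
        return out;
    }
}

// Example factory: each builder accumulates residues in its own StringBuilder.
final class StringRecordBuilderFactory implements RecordBuilderFactory {
    public RecordBuilder makeRecordBuilder() {
        return new RecordBuilder() {
            private String name;
            private final StringBuilder residues = new StringBuilder();

            public void setName(String name) { this.name = name; }
            public void addSymbols(char[] syms, int pos, int len) { residues.append(syms, pos, len); }
            public FastaRecord makeRecord() { return new FastaRecord(name, residues.toString()); }
        };
    }
}
```

Because each record gets its own builder, stateful, non-thread-safe helpers such as a per-builder chunker never see symbols from more than one sequence.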

    Alternatively, you can pass Symbols to the factory directly with addSymbols() and then create the SymbolList with makeSymbolList().
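The feed-slices-then-build workflow can be sketched in plain Java as below. ChunkedListBuilder is a hypothetical, generic stand-in: it drops the Alphabet argument, stores plain object arrays rather than packed chunks, and returns a 0-based java.util.List instead of a SymbolList. It shows how a slice syms[pos, pos+len) is copied across chunk boundaries.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic stand-in for the addSymbols/makeSymbolList workflow.
final class ChunkedListBuilder<T> {
    private final int chunkSize;                       // elements per chunk
    private final List<Object[]> chunks = new ArrayList<>();
    private int length = 0;

    ChunkedListBuilder(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    // Copy syms[pos, pos + len) in, topping up the last chunk before opening a new one.
    void addSymbols(T[] syms, int pos, int len) {
        if (pos < 0 || len < 0 || len > syms.length - pos) {
            throw new IndexOutOfBoundsException("bad slice: pos=" + pos + ", len=" + len);
        }
        while (len > 0) {
            int offset = length % chunkSize;
            if (offset == 0) chunks.add(new Object[chunkSize]);
            Object[] chunk = chunks.get(chunks.size() - 1);
            int n = Math.min(len, chunkSize - offset);
            System.arraycopy(syms, pos, chunk, offset, n);
            pos += n;
            len -= n;
            length += n;
        }
    }

    // Freeze the current contents as a read-only, 0-based list view over the chunks.
    List<T> makeList() {
        final int size = length;
        final int perChunk = chunkSize;
        final List<Object[]> frozen = new ArrayList<>(chunks);
        return new AbstractList<T>() {
            @Override
            public T get(int index) {
                if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
                @SuppressWarnings("unchecked")
                T symbol = (T) frozen.get(index / perChunk)[index % perChunk];
                return symbol;
            }

            @Override
            public int size() {
                return size;
            }
        };
    }
}
```

For example, with a chunk size of 3, calling addSymbols twice with slices of lengths 2 and 5 fills chunks of 3, 3 and 1 elements, and the resulting list reads back the seven symbols in order.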

    NOTE: For small sequences, an internal default SymbolList factory is used instead of the supplied one. This gives faster SymbolList creation and access for small sequences, while still letting a more space-efficient implementation be used for large ones.
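A minimal sketch of size-based selection, assuming a simple length cut-over. The note does not say how "small" is defined, so threshold is a hypothetical parameter here, as are SymbolStore, ArraySymbolStore and ThresholdStoreFactory. The caller supplies the implementation used for large inputs, much as a space-efficient factory such as PackedSymbolListFactory is passed to ChunkedSymbolListFactory above.

```java
import java.util.function.Function;

// Hypothetical minimal read-only symbol container (0-based).
interface SymbolStore {
    char symbolAt(int index);
    int length();
}

// Internal default: a plain char[] copy, quick to build and to read.
final class ArraySymbolStore implements SymbolStore {
    private final char[] data;

    ArraySymbolStore(char[] data) { this.data = data.clone(); }

    public char symbolAt(int index) { return data[index]; }

    public int length() { return data.length; }
}

// Uses the internal default below the threshold, and the caller-supplied
// (typically more compact) implementation at or above it.
final class ThresholdStoreFactory {
    private final Function<char[], SymbolStore> largeFactory;
    private final int threshold; // hypothetical cut-over length in symbols

    ThresholdStoreFactory(Function<char[], SymbolStore> largeFactory, int threshold) {
        this.largeFactory = largeFactory;
        this.threshold = threshold;
    }

    SymbolStore make(char[] symbols) {
        return symbols.length < threshold
                ? new ArraySymbolStore(symbols)
                : largeFactory.apply(symbols);
    }
}
```

The trade-off is that a compact representation usually costs extra work per access (shifts and masks), which only pays off once a sequence is long enough for memory to matter.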

    NOTE: This class is inherently not thread-safe. Create one instance for each SymbolList you wish to manufacture, and discard that instance once the list has been made.
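One way to make the single-use rule hard to break is to have the builder refuse any call once it has produced its result. OneShotSymbolBuilder below is a hypothetical plain-Java illustration of that pattern, not part of BioJava; like the class described above, it does no synchronization, so each thread should create its own instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical single-use builder. It is not synchronized, and after build()
// it rejects further use, so each list needs a fresh instance.
final class OneShotSymbolBuilder {
    private List<Character> symbols = new ArrayList<>();

    void add(char symbol) {
        checkOpen();
        symbols.add(symbol);
    }

    // Hand over the accumulated symbols and retire this builder.
    List<Character> build() {
        checkOpen();
        List<Character> result = Collections.unmodifiableList(symbols);
        symbols = null; // spent: any later call fails fast
        return result;
    }

    private void checkOpen() {
        if (symbols == null) {
            throw new IllegalStateException("builder already used; create a new instance");
        }
    }
}
```

In multi-threaded code, create the builder inside each task rather than sharing one across tasks; failing fast on reuse turns a silent mix-up of two sequences into an immediate exception.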

    Author:
    David Huen