BioJava:CookBook:Distribution:Custom

From BioJava

Jump to: navigation, search

How do I make a custom Alphabet then take an OrderNDistribution over it?

This example demonstrates the creation of a custom Alphabet that will have seven Symbols. The custom made Symbols and Alphabet can then be used to make SymbolLists, Sequences, Distributions etc. When the AlphabetManager creates the CrossProductAlphabet, it will infer that the order of the conditioning alphabet is (order - 1) and the order of the conditioned alphabet is 1.

Contributed by Russell Smithies.

import java.io.*;
import java.util.*;
 
import org.biojava.bio.*;
import org.biojava.bio.dist.*;
import org.biojava.bio.symbol.*;
import org.biojava.utils.*;
 
public class DistTest {
  public static void main(String[] args) throws Exception {
 
    //create a custom dwarf Alphabet
    String[] dNames = {
        "Grumpy", "Sleepy", "Dopey", "Doc", "Happy", "Sneezy", "Bashful"
    };
    Symbol[] dwarfs = new Symbol[7];
    SimpleAlphabet dwarfAlphabet = new SimpleAlphabet();
 
    //give the new Alphabet a name
    dwarfAlphabet.setName("Dwarf");
 
    for (int i = 1; i <= 7; i++) {
      try {
        dwarfs[i - 1] = AlphabetManager.createSymbol((char) ('0' + i), "" + dNames[i - 1],Annotation.EMPTY_ANNOTATION);
         //add your new Symbols to the Alphabet
			dwarfAlphabet.addSymbol(dwarfs[i - 1]);
      }
      catch (Exception e) {
        throw new NestedError(e, "Can't create symbols to represent dwarf");
      }
 
    //it is usual (but not essential) to register newly creates Alphabets with the AlphabetManager
    AlphabetManager.registerAlphabet(dwarfAlphabet.getName(), dwarfAlphabet);
 
    }

Create an OrderNDstribution using the newly built Dwarf Alphabet

//order of the distribution
    int order = 3;
 
    //create the cross-product Alphabet
    Alphabet a = AlphabetManager.getCrossProductAlphabet(Collections.nCopies(order, dwarfAlphabet));
 
    //use the OrderNDistributionFactory to create the Distribution
    OrderNDistribution ond = (OrderNDistribution)OrderNDistributionFactory.DEFAULT.createDistribution(a);
 
    //create the DistributionTrainer
    DistributionTrainerContext dtc = new SimpleDistributionTrainerContext();
 
    //register the Distribution with the trainer
    dtc.registerDistribution(ond);

This shows the creation of of a SymbolList from the Dwarf Alphabet so we can test our new OrderNDistribution. This is done by making, a UniformDistribution which is randomly sampled and adding the Symbols to an ArrayList. The ArrayList is then used to build the SymbolList.

//create a random symbolList of dwarves
    UniformDistribution udist = new UniformDistribution((FiniteAlphabet)dwarfAlphabet);
 
    int size = 100;
    List list = new ArrayList();
 
    for (int i = 0; i < size; i++) {
      list.add(udist.sampleSymbol());
    }
 
    //create a symbolList to test the Distribution
    SymbolList symbl = new SimpleSymbolList(dwarfAlphabet, list);

The SymbolList is changed into an OrderNSymbolList to enable an OrderNDistribution to be made over it.

//make it into an orderNSymbolList
    symbl = SymbolListViews.orderNSymbolList(symbl, order);
 
    //or you could have a windowed symbolList
    //symbl = SymbolListViews.windowedSymbolList(symbl, order);
 
    //add counts to the distribution
    for (Iterator i = symbl.iterator(); i.hasNext(); ) {
      try {
        dtc.addCount(ond, (Symbol) i.next(), 1.0);
      }
      catch (IllegalSymbolException ex) {
       //you  tried to add a Symbol not in your Alphabet
        ex.printstacktrace()}
    }
 
    // don't forget to train or none of your weights will be added
    dtc.train();
 
    //write the distribution to XML
    XMLDistributionWriter writer = new XMLDistributionWriter();
 
    writer.writeDistribution(ond, new FileOutputStream("dwarf.xml"));
  }
}
Personal tools