BioJava:CookBook:Count:Frequency

How do I calculate the frequency of a Symbol in a Sequence?

One of the most useful classes in BioJava is the Distribution. A Distribution is a map from a Symbol to a frequency. Distributions are trained with observed Symbols using a DistributionTrainerContext. A DistributionTrainerContext can train several registered Distributions and will handle any Symbol from any Alphabet. Ambiguous Symbols are divided amongst the AtomicSymbols that make up the ambiguous BasisSymbol.

The following program demonstrates the training of three Distributions with Sequences from three different Alphabets.

```java import org.biojava.bio.seq.*; import org.biojava.bio.symbol.*; import org.biojava.bio.dist.*; import org.biojava.utils.*; import java.util.*;

public class Frequency {

 public static void main(String[] args) {

   try {
     //make a DNA SymbolList
     SymbolList dna = DNATools.createDNA("atcgctagcgtyagcntatsggca");

     //make a RNA SymbolList
     SymbolList rna = RNATools.createRNA("aucgcuaucccaggga");

     //make a protein SymbolList
     SymbolList protein = ProteinTools.createProtein("asrvgchvhilmkapqrt");

     SymbolList[] sla = {dna, rna, protein};

     //get a DistributionTrainerContext
     DistributionTrainerContext dtc = new SimpleDistributionTrainerContext();

     //make three Distributions
     Distribution dnaDist =
         DistributionFactory.DEFAULT.createDistribution(dna.getAlphabet());
     Distribution rnaDist =
         DistributionFactory.DEFAULT.createDistribution(rna.getAlphabet());
     Distribution proteinDist =
         DistributionFactory.DEFAULT.createDistribution(protein.getAlphabet());

     Distribution[] da = {dnaDist, rnaDist, proteinDist};

     //register the Distributions with the trainer
     dtc.registerDistribution(dnaDist);
     dtc.registerDistribution(rnaDist);
     dtc.registerDistribution(proteinDist);

     //for each Sequence
     for (int i = 0; i < sla.length; i++) {
       //count each Symbol to the appropriate Distribution
       for(int j = 1; j <= sla[i].length(); j++){
         dtc.addCount(da[i], sla[i].symbolAt(j), 1.0);
       }
     }

     //train the Distributions
     dtc.train();

     //print the weights of each Distribution
     for (int i = 0; i < da.length; i++) {
       for (Iterator iter = ((FiniteAlphabet)da[i].getAlphabet()).iterator();
            iter.hasNext(); ) {

         Symbol sym = (Symbol)iter.next();
         System.out.println(sym.getName()+" : "+da[i].getWeight(sym));
       }
       System.out.println("\n");
     }

   }
   catch (Exception ex) {
     ex.printStackTrace();
   }
 }

} ```