Package org.biojava.bio.dist
Class DistributionTools
- java.lang.Object
 - org.biojava.bio.dist.DistributionTools
public final class DistributionTools extends Object
A class to hold static methods for calculations and manipulations using Distributions.
- Since:
 - 1.2
- Author:
 - Mark Schreiber, Matthew Pocock
Method Summary
static boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b)
 Compares the emission spectra of two distribution arrays.
static boolean areEmissionSpectraEqual(Distribution a, Distribution b)
 Compares the emission spectra of two distributions.
static Distribution average(Distribution[] dists)
 Averages two or more distributions.
static double bitsOfInformation(Distribution observed)
 Calculates the total bits of information for a distribution.
static Distribution countToDistribution(Count c)
 Make a distribution from a count.
static Distribution[] distOverAlignment(Alignment a)
 Equivalent to distOverAlignment(a, false, 0.0).
static Distribution[] distOverAlignment(Alignment a, boolean countGaps)
 Creates an array of distributions, one for each column of the alignment.
static Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight)
 Creates an array of distributions, one for each column of the alignment.
protected static Sequence generateOrderNSequence(String name, OrderNDistribution d, int length)
 Deprecated. Use generateSequence() or generateSymbolList() instead.
static Sequence generateSequence(String name, Distribution d, int length)
 Produces a sequence by randomly sampling the Distribution.
static SymbolList generateSymbolList(Distribution d, int length)
 Produces a SymbolList by randomly sampling a Distribution.
static Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols)
 Creates a joint distribution.
static HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
 A method to calculate the Kullback-Leibler Distance (relative entropy).
static void randomizeDistribution(Distribution d)
 Randomizes the weights of a Distribution.
static Distribution readFromXML(InputStream is)
 Read a distribution from XML.
static HashMap shannonEntropy(Distribution observed, double logBase)
 A method to calculate the Shannon Entropy for a Distribution.
static double totalEntropy(Distribution observed)
 Calculates the total Entropy for a Distribution.
static void writeToXML(Distribution d, OutputStream os)
 Writes a Distribution to XML that can be read with the readFromXML method.
Method Detail
- 
writeToXML
public static void writeToXML(Distribution d, OutputStream os) throws IOException
Writes a Distribution to XML that can be read with the readFromXML method.
- Parameters:
 d - the Distribution to write.
 os - where to write it to.
- Throws:
 IOException - if writing fails
 
- 
readFromXML
public static Distribution readFromXML(InputStream is) throws IOException, SAXException
Reads a distribution from XML.
- Parameters:
 is - an InputStream to read from
- Returns:
 - a Distribution parameterised by the XML in is
- Throws:
 IOException - if reading from is fails
 SAXException - if is could not be processed as XML
 
- 
randomizeDistribution
public static void randomizeDistribution(Distribution d) throws ChangeVetoException
Randomizes the weights of a Distribution.
- Parameters:
 d - the Distribution to randomize
- Throws:
 ChangeVetoException - if the Distribution is locked
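To make the idea of randomized weights concrete, here is a minimal plain-Java sketch that does not use BioJava. It assumes randomization means drawing an independent random weight per symbol and rescaling so the weights sum to 1; the page does not state how the library draws its weights. The class `RandomWeightsSketch`, the method `randomWeights`, and the use of a `Map<Character, Double>` in place of a Distribution are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration; not part of BioJava.
final class RandomWeightsSketch {
    /**
     * Gives each distinct symbol a random positive weight, then rescales
     * so that all weights sum to 1.
     */
    static Map<Character, Double> randomWeights(String symbols, Random rng) {
        Map<Character, Double> raw = new LinkedHashMap<>();
        double total = 0.0;
        for (char s : symbols.toCharArray()) {
            double w = 1.0 - rng.nextDouble(); // in (0, 1], so the total is never zero
            raw.put(s, w);
            total += w;
        }
        Map<Character, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : raw.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        System.out.println(randomWeights("acgt", new Random()));
    }
}
```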
 
- 
countToDistribution
public static Distribution countToDistribution(Count c)
Makes a distribution from a count.
- Parameters:
 c - the count
- Returns:
 - a Distribution over the same FiniteAlphabet as c and trained with the counts of c
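As a rough illustration of what training a distribution on counts amounts to, the following plain-Java sketch divides each count by the total. It assumes no pseudo counts are involved, which this page does not confirm for the library method. `CountToWeightsSketch` and `toWeights` are hypothetical names, and a `Map<Character, Double>` stands in for both Count and Distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of BioJava.
final class CountToWeightsSketch {
    /** Divides each count by the total so the resulting weights sum to 1. */
    static Map<Character, Double> toWeights(Map<Character, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("counts must sum to a positive value");
        }
        Map<Character, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : counts.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<Character, Double> counts = new LinkedHashMap<>();
        counts.put('a', 6.0);
        counts.put('c', 2.0);
        counts.put('g', 2.0);
        counts.put('t', 0.0);
        System.out.println(toWeights(counts));
    }
}
```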
 
- 
areEmissionSpectraEqual
public static final boolean areEmissionSpectraEqual(Distribution a, Distribution b) throws BioException
Compares the emission spectra of two distributions.
- Parameters:
 a - A Distribution with the same Alphabet as b
 b - A Distribution with the same Alphabet as a
- Returns:
 - true if alphabets and symbol weights are equal for the two distributions.
- Throws:
 BioException - if one or both of the Distributions are over infinite alphabets.
- Since:
 - 1.2
 
 
- 
areEmissionSpectraEqual
public static final boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b) throws BioException
Compares the emission spectra of two distribution arrays.
- Parameters:
 a - A Distribution[] consisting of Distributions over a FiniteAlphabet
 b - A Distribution[] consisting of Distributions over a FiniteAlphabet
- Returns:
 - true if alphabets and symbol weights are equal for each pair of distributions. Will return false if the arrays are of unequal length.
- Throws:
 BioException - if one of the Distributions is over an infinite alphabet.
- Since:
 - 1.3
 
 
- 
KLDistance
public static final HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
A method to calculate the Kullback-Leibler Distance (relative entropy).
- Parameters:
 logBase - the log base for the entropy calculation. 2 is standard.
 observed - the observed frequency of Symbols.
 expected - the expected or background frequency.
- Returns:
 - A HashMap mapping Symbol to (Double) relative entropy.
- Since:
 - 1.2
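The per-symbol relative entropy can be sketched in plain Java as follows. This assumes the conventional term p(s) * log_b(p(s) / q(s)), with a symbol of zero observed weight contributing 0; the page does not spell out the library's exact formula. `RelativeEntropySketch` and `klTerms` are hypothetical names, a `Map<Character, Double>` stands in for a Distribution, and both maps are assumed to share the same keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of BioJava.
final class RelativeEntropySketch {
    /**
     * Returns p * log_b(p / q) for every symbol in observed.
     * A symbol with p == 0 contributes 0 by convention.
     * Assumes expected contains every key found in observed.
     */
    static Map<Character, Double> klTerms(Map<Character, Double> observed,
                                          Map<Character, Double> expected,
                                          double logBase) {
        Map<Character, Double> terms = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : observed.entrySet()) {
            double p = e.getValue();
            double q = expected.get(e.getKey());
            double term = (p == 0.0) ? 0.0 : p * Math.log(p / q) / Math.log(logBase);
            terms.put(e.getKey(), term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Character, Double> observed = new LinkedHashMap<>();
        observed.put('a', 0.5);
        observed.put('c', 0.5);
        Map<Character, Double> expected = new LinkedHashMap<>();
        expected.put('a', 0.25);
        expected.put('c', 0.75);
        System.out.println(klTerms(observed, expected, 2.0));
    }
}
```

Summing the returned values gives the total divergence between the two distributions under this definition.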
 
 
- 
shannonEntropy
public static final HashMap shannonEntropy(Distribution observed, double logBase)
A method to calculate the Shannon Entropy for a Distribution.
- Parameters:
 logBase - the log base for the entropy calculation. 2 is standard.
 observed - the observed frequency of Symbols.
- Returns:
 - A HashMap mapping Symbol to (Double) entropy.
- Since:
 - 1.2
 
 
- 
totalEntropy
public static double totalEntropy(Distribution observed)
Calculates the total Entropy for a Distribution. Entropies for individual Symbols are weighted by their probability of occurrence.
- Parameters:
 observed - the observed frequency of Symbols.
- Returns:
 - the total entropy of the Distribution.
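The relationship between per-symbol entropy and total entropy can be sketched in plain Java. This assumes the per-symbol value is -log_b(p(s)) and the total is the probability-weighted sum of those values, which is the standard Shannon definition; the library's exact per-symbol formula is not given on this page. `EntropySketch`, `perSymbol`, and `total` are hypothetical names, and the explicit `logBase` argument on `total` is an addition for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of BioJava.
final class EntropySketch {
    /** Returns -log_b(p) for every symbol; a zero-weight symbol maps to positive infinity. */
    static Map<Character, Double> perSymbol(Map<Character, Double> observed, double logBase) {
        Map<Character, Double> entropy = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : observed.entrySet()) {
            entropy.put(e.getKey(), -Math.log(e.getValue()) / Math.log(logBase));
        }
        return entropy;
    }

    /** Sums p * (-log_b(p)) over all symbols; zero-weight symbols contribute nothing. */
    static double total(Map<Character, Double> observed, double logBase) {
        double sum = 0.0;
        for (double p : observed.values()) {
            if (p > 0.0) {
                sum -= p * Math.log(p) / Math.log(logBase);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Character, Double> uniform = new LinkedHashMap<>();
        for (char s : "acgt".toCharArray()) {
            uniform.put(s, 0.25);
        }
        System.out.println(perSymbol(uniform, 2.0));
        System.out.println(total(uniform, 2.0));
    }
}
```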
 
- 
bitsOfInformation
public static final double bitsOfInformation(Distribution observed)
Calculates the total bits of information for a distribution.
- Parameters:
 observed - the observed frequency of Symbols.
- Returns:
 - the total information content of the Distribution.
- Since:
 - 1.2
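Information content is commonly defined as the maximum possible entropy, log2(N) for an alphabet of N symbols, minus the observed entropy. The plain-Java sketch below assumes that definition; this page does not state the formula the library uses. `InformationSketch` and its methods are hypothetical names, and a `Map<Character, Double>` stands in for a Distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of BioJava.
final class InformationSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Shannon entropy in bits; zero-weight symbols contribute nothing. */
    static double entropyBits(Map<Character, Double> observed) {
        double h = 0.0;
        for (double p : observed.values()) {
            if (p > 0.0) {
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Maximum entropy log2(N) minus the observed entropy. */
    static double bitsOfInformation(Map<Character, Double> observed) {
        return log2(observed.size()) - entropyBits(observed);
    }

    public static void main(String[] args) {
        Map<Character, Double> conserved = new LinkedHashMap<>();
        conserved.put('a', 1.0);
        conserved.put('c', 0.0);
        conserved.put('g', 0.0);
        conserved.put('t', 0.0);
        System.out.println(bitsOfInformation(conserved));
    }
}
```

Under this definition a uniform distribution carries about 0 bits, and a fully conserved position over four symbols carries about 2 bits.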
 
 
- 
distOverAlignment
public static Distribution[] distOverAlignment(Alignment a) throws IllegalAlphabetException
Equivalent to distOverAlignment(a, false, 0.0).
- Parameters:
 a - the Alignment
- Returns:
 - an array of Distribution instances representing columns of the alignment
- Throws:
 IllegalAlphabetException - if the alignment alphabet is not compatible
 
- 
jointDistOverAlignment
public static final Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols) throws IllegalAlphabetException
Creates a joint distribution.
- Parameters:
 a - the Alignment to build the Distribution over.
 countGaps - if true, gaps will be included in the distributions (not yet implemented; currently either option produces the same result).
 nullWeight - the number of pseudo counts to add to each distribution
 cols - a list of positions in the alignment to include in the joint distribution
- Returns:
 - a Distribution
- Throws:
 IllegalAlphabetException - if the sequences do not all use the same alphabet
- Since:
 - 1.2
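A joint distribution over selected columns assigns a weight to each combination of symbols seen together in one row. The plain-Java sketch below illustrates the idea under simplifying assumptions: rows are equal-length strings, pseudo counts are left out, and gap characters are treated like any other symbol. `JointColumnsSketch` and `joint` are hypothetical names; nothing here reflects how the library represents the joint alphabet.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of BioJava.
final class JointColumnsSketch {
    /**
     * For each row, concatenates the symbols at the chosen columns into a key
     * and gives that key an equal share of the total weight.
     */
    static Map<String, Double> joint(String[] rows, int[] cols) {
        Map<String, Double> weights = new LinkedHashMap<>();
        if (rows.length == 0) {
            return weights;
        }
        double share = 1.0 / rows.length;
        for (String row : rows) {
            StringBuilder key = new StringBuilder();
            for (int col : cols) {
                key.append(row.charAt(col));
            }
            weights.merge(key.toString(), share, Double::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        String[] rows = {"acg", "act", "gtg", "atg"};
        System.out.println(joint(rows, new int[] {0, 1}));
    }
}
```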
 
 
- 
distOverAlignment
public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight) throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment.
- Parameters:
 a - the Alignment to build the Distribution[] over.
 countGaps - if true, gaps will be included in the distributions
 nullWeight - the number of pseudo counts to add to each distribution. Pseudo counts are not added to gaps: if a column has no gaps, it gets no gap counts.
- Returns:
 - a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
- Throws:
 IllegalAlphabetException - if the sequences do not all use the same alphabet
- Since:
 - 1.2
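The column-by-column counting described above can be sketched in plain Java. The sketch makes several assumptions that this page does not confirm: the alphabet is fixed to "acgt", '-' marks a gap, and the pseudo count is added once to every non-gap symbol (the library may spread its null weight differently). `ColumnDistributionSketch` and `columns` are hypothetical names, and a `Map<Character, Double>` stands in for each Distribution.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of BioJava.
final class ColumnDistributionSketch {
    static final String SYMBOLS = "acgt"; // assumed alphabet
    static final char GAP = '-';          // assumed gap character

    /** Builds one weight map per alignment column from equal-length rows. */
    static List<Map<Character, Double>> columns(String[] rows, boolean countGaps, double pseudo) {
        List<Map<Character, Double>> result = new ArrayList<>();
        int width = rows.length == 0 ? 0 : rows[0].length();
        for (int col = 0; col < width; col++) {
            Map<Character, Double> counts = new LinkedHashMap<>();
            for (char s : SYMBOLS.toCharArray()) {
                counts.put(s, pseudo); // pseudo counts go to non-gap symbols only
            }
            for (String row : rows) {
                char s = row.charAt(col);
                if (s != GAP || countGaps) {
                    counts.merge(s, 1.0, Double::sum);
                }
            }
            double total = 0.0;
            for (double c : counts.values()) {
                total += c;
            }
            Map<Character, Double> weights = new LinkedHashMap<>();
            for (Map.Entry<Character, Double> e : counts.entrySet()) {
                // An empty column (total of zero) gets all-zero weights.
                weights.put(e.getKey(), total == 0.0 ? 0.0 : e.getValue() / total);
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rows = {"ac", "a-", "gc"};
        System.out.println(columns(rows, true, 0.0));
    }
}
```

Because the gap entry is only created when a gap is actually seen, a column without gaps ends up with no gap weight at all.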
 
 
- 
distOverAlignment
public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps) throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment. No pseudo counts are used.
- Parameters:
 countGaps - if true, gaps will be included in the distributions
 a - the Alignment to build the Distribution[] over.
- Returns:
 - a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
- Throws:
 IllegalAlphabetException - if the alignment is not composed from sequences all with the same alphabet
- Since:
 - 1.2
 
 
- 
average
public static final Distribution average(Distribution[] dists)
Averages two or more distributions. NOTE: the current implementation ignores the null model.
- Parameters:
 dists - the Distributions to average
- Returns:
 - a Distribution where the weight of each Symbol is the average of the weights of that Symbol in each Distribution.
- Since:
 - 1.2
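The averaging rule in the return description can be illustrated with a short plain-Java sketch: each symbol's weight is the arithmetic mean of its weights across the inputs. `AverageSketch` and `average` are hypothetical names, and a `Map<Character, Double>` stands in for a Distribution; a symbol missing from one input is treated as having weight 0 there, which is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of BioJava.
final class AverageSketch {
    /** Returns, for each symbol, the mean of its weights over all input maps. */
    static Map<Character, Double> average(List<Map<Character, Double>> dists) {
        if (dists.isEmpty()) {
            throw new IllegalArgumentException("need at least one distribution");
        }
        Map<Character, Double> avg = new LinkedHashMap<>();
        for (Map<Character, Double> d : dists) {
            for (Map.Entry<Character, Double> e : d.entrySet()) {
                avg.merge(e.getKey(), e.getValue() / dists.size(), Double::sum);
            }
        }
        return avg;
    }

    public static void main(String[] args) {
        Map<Character, Double> first = new LinkedHashMap<>();
        first.put('a', 1.0);
        first.put('c', 0.0);
        Map<Character, Double> second = new LinkedHashMap<>();
        second.put('a', 0.0);
        second.put('c', 1.0);
        System.out.println(average(Arrays.asList(first, second)));
    }
}
```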
 
 
- 
generateSequence
public static final Sequence generateSequence(String name, Distribution d, int length)
Produces a sequence by randomly sampling the Distribution.
- Parameters:
 name - the name for the sequence
 d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and used to produce a sequence over the conditioned alphabet.
 length - the number of symbols in the sequence.
- Returns:
 - a Sequence whose name and URN are both set to name, with an empty Annotation.
 
 
- 
generateSymbolList
public static final SymbolList generateSymbolList(Distribution d, int length)
Produces a SymbolList by randomly sampling a Distribution.
- Parameters:
 d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and used to produce a sequence over the conditioned alphabet.
 length - the number of symbols in the sequence.
- Returns:
 - a SymbolList of length length
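Random sampling from a simple (order 0) distribution can be sketched in plain Java with cumulative weights: draw a uniform number in [0, 1) and pick the first symbol whose running total exceeds it. This is one common technique and is not confirmed as the library's method; the order-N case with its burn-in is not covered. `SamplingSketch` and `sample` are hypothetical names, and a `Map<Character, Double>` stands in for a Distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration; not part of BioJava.
final class SamplingSketch {
    /** Draws length symbols independently, each with probability equal to its weight. */
    static String sample(Map<Character, Double> weights, int length, Random rng) {
        StringBuilder out = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double r = rng.nextDouble(); // uniform in [0, 1)
            double cumulative = 0.0;
            Character pick = null;
            for (Map.Entry<Character, Double> e : weights.entrySet()) {
                if (e.getValue() <= 0.0) {
                    continue; // zero-weight symbols can never be drawn
                }
                pick = e.getKey();
                cumulative += e.getValue();
                if (r < cumulative) {
                    break;
                }
            }
            // If rounding leaves the total just under 1, the last positive-weight symbol is used.
            if (pick == null) {
                throw new IllegalArgumentException("no symbol has positive weight");
            }
            out.append(pick.charValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, Double> weights = new LinkedHashMap<>();
        weights.put('a', 0.4);
        weights.put('c', 0.1);
        weights.put('g', 0.1);
        weights.put('t', 0.4);
        System.out.println(sample(weights, 30, new Random()));
    }
}
```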
 
- 
generateOrderNSequence
protected static final Sequence generateOrderNSequence(String name, OrderNDistribution d, int length)
Deprecated. Use generateSequence() or generateSymbolList() instead.
Generate a sequence by sampling a distribution.
- Parameters:
 name - the name of the sequence
 d - the distribution to sample
 length - the length of the sequence
- Returns:
 - a new sequence with the required composition
 
 