Package org.biojava.bio.dist
Class DistributionTools
java.lang.Object
    org.biojava.bio.dist.DistributionTools
public final class DistributionTools extends Object
A class to hold static methods for calculations and manipulations using Distributions.
- Since:
1.2
- Author:
Mark Schreiber, Matthew Pocock
Method Summary
static boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b)
Compares the emission spectra of two distribution arrays.

static boolean areEmissionSpectraEqual(Distribution a, Distribution b)
Compares the emission spectra of two distributions.

static Distribution average(Distribution[] dists)
Averages two or more distributions.

static double bitsOfInformation(Distribution observed)
Calculates the total bits of information for a distribution.

static Distribution countToDistribution(Count c)
Makes a distribution from a count.

static Distribution[] distOverAlignment(Alignment a)
Equivalent to distOverAlignment(a, false, 0.0).

static Distribution[] distOverAlignment(Alignment a, boolean countGaps)
Creates an array of distributions, one for each column of the alignment.

static Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight)
Creates an array of distributions, one for each column of the alignment.

protected static Sequence generateOrderNSequence(String name, OrderNDistribution d, int length)
Deprecated. Use generateSequence() or generateSymbolList() instead.

static Sequence generateSequence(String name, Distribution d, int length)
Produces a sequence by randomly sampling the Distribution.

static SymbolList generateSymbolList(Distribution d, int length)
Produces a SymbolList by randomly sampling a Distribution.

static Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols)
Creates a joint distribution.

static HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
Calculates the Kullback-Leibler distance (relative entropy).

static void randomizeDistribution(Distribution d)
Randomizes the weights of a Distribution.

static Distribution readFromXML(InputStream is)
Reads a distribution from XML.

static HashMap shannonEntropy(Distribution observed, double logBase)
Calculates the Shannon entropy for a Distribution.

static double totalEntropy(Distribution observed)
Calculates the total entropy for a Distribution.

static void writeToXML(Distribution d, OutputStream os)
Writes a Distribution to XML that can be read with the readFromXML method.
Method Detail
-
writeToXML
public static void writeToXML(Distribution d, OutputStream os) throws IOException
Writes a Distribution to XML that can be read with the readFromXML method.
- Parameters:
d - the Distribution to write.
os - where to write it to.
- Throws:
IOException - if writing fails
-
readFromXML
public static Distribution readFromXML(InputStream is) throws IOException, SAXException
Reads a distribution from XML.
- Parameters:
is - an InputStream to read from
- Returns:
a Distribution parameterised by the XML in is
- Throws:
IOException - if reading from is fails
SAXException - if the content of is could not be processed as XML
-
randomizeDistribution
public static void randomizeDistribution(Distribution d) throws ChangeVetoException
Randomizes the weights of a Distribution.
- Parameters:
d - the Distribution to randomize
- Throws:
ChangeVetoException - if the Distribution is locked
-
countToDistribution
public static Distribution countToDistribution(Count c)
Makes a distribution from a count.
- Parameters:
c - the count
- Returns:
a Distribution over the same FiniteAlphabet as c, trained with the counts of c
-
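As an illustration of what turning counts into a distribution means numerically, here is a minimal sketch that normalizes raw counts so the weights sum to 1. It deliberately avoids the BioJava types: the class `CountSketch`, the method `toDistribution`, and the use of `Map<Character, Double>` in place of `Count` and `Distribution` are assumptions made for this example, not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a plain map plays the role of Count and Distribution.
final class CountSketch {

    /** Normalizes raw counts into weights that sum to 1. */
    static Map<Character, Double> toDistribution(Map<Character, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("counts must have a positive total");
        }
        Map<Character, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : counts.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<Character, Double> counts = new LinkedHashMap<>();
        counts.put('a', 6.0);
        counts.put('c', 2.0);
        counts.put('g', 2.0);
        counts.put('t', 0.0);
        System.out.println(toDistribution(counts));
    }
}
```

Whether the library's trained distribution matches plain normalization in every case (for example when a null model is involved) is not stated here, so treat this only as the basic idea.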
areEmissionSpectraEqual
public static final boolean areEmissionSpectraEqual(Distribution a, Distribution b) throws BioException
Compares the emission spectra of two distributions.
- Parameters:
a - A Distribution with the same Alphabet as b
b - A Distribution with the same Alphabet as a
- Returns:
true if alphabets and symbol weights are equal for the two distributions.
- Throws:
BioException - if one or both of the Distributions are over infinite alphabets.
- Since:
1.2
-
areEmissionSpectraEqual
public static final boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b) throws BioException
Compares the emission spectra of two distribution arrays.
- Parameters:
a - A Distribution[] consisting of Distributions over a FiniteAlphabet
b - A Distribution[] consisting of Distributions over a FiniteAlphabet
- Returns:
true if alphabets and symbol weights are equal for each pair of distributions. Returns false if the arrays are of unequal length.
- Throws:
BioException - if one of the Distributions is over an infinite alphabet.
- Since:
1.3
-
KLDistance
public static final HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
Calculates the Kullback-Leibler distance (relative entropy).
- Parameters:
logBase - the log base for the entropy calculation. 2 is standard.
observed - the observed frequency of Symbols.
expected - the expected or background frequency.
- Returns:
A HashMap mapping Symbol to (Double) relative entropy.
- Since:
1.2
-
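For readers who want to see the arithmetic behind a per-symbol relative entropy, the following sketch uses the textbook term p * log(p / q), converted to the requested log base, with a zero term when the observed weight is zero. The class `RelativeEntropySketch`, the method `klTerms`, and the `Map<Character, Double>` representation are hypothetical; the library's own handling of edge cases may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: plain maps from symbol to weight replace Distribution.
final class RelativeEntropySketch {

    /**
     * Per-symbol terms p * log(p / q) in the given log base.
     * Assumes both maps share the same keys and that q > 0 wherever p > 0.
     */
    static Map<Character, Double> klTerms(Map<Character, Double> observed,
                                          Map<Character, Double> expected,
                                          double logBase) {
        Map<Character, Double> terms = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : observed.entrySet()) {
            double p = e.getValue();
            double q = expected.get(e.getKey());
            // By convention the term is 0 when the symbol is never observed.
            double term = (p == 0.0) ? 0.0 : p * Math.log(p / q) / Math.log(logBase);
            terms.put(e.getKey(), term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Character, Double> observed = new LinkedHashMap<>();
        Map<Character, Double> expected = new LinkedHashMap<>();
        for (char s : "acgt".toCharArray()) {
            expected.put(s, 0.25);
        }
        observed.put('a', 0.5);
        observed.put('c', 0.5);
        observed.put('g', 0.0);
        observed.put('t', 0.0);
        System.out.println(klTerms(observed, expected, 2.0));
    }
}
```

Summing the returned values gives the total relative entropy of the observed weights with respect to the background.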
shannonEntropy
public static final HashMap shannonEntropy(Distribution observed, double logBase)
Calculates the Shannon entropy for a Distribution.
- Parameters:
logBase - the log base for the entropy calculation. 2 is standard.
observed - the observed frequency of Symbols.
- Returns:
A HashMap mapping Symbol to (Double) entropy.
- Since:
1.2
-
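The per-symbol entropy can be sketched with the standard term -p * log(p) in the chosen log base. This is an illustration on plain maps, assuming that definition; `ShannonSketch` and `entropyTerms` are names invented for the example and do not exist in BioJava.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a map from symbol to weight replaces Distribution.
final class ShannonSketch {

    /** Per-symbol terms -p * log(p) in the given log base; 0 when p == 0. */
    static Map<Character, Double> entropyTerms(Map<Character, Double> observed,
                                               double logBase) {
        Map<Character, Double> terms = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : observed.entrySet()) {
            double p = e.getValue();
            double term = (p == 0.0) ? 0.0 : -p * Math.log(p) / Math.log(logBase);
            terms.put(e.getKey(), term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Character, Double> uniform = new LinkedHashMap<>();
        for (char s : "acgt".toCharArray()) {
            uniform.put(s, 0.25);
        }
        System.out.println(entropyTerms(uniform, 2.0));
    }
}
```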
totalEntropy
public static double totalEntropy(Distribution observed)
Calculates the total entropy for a Distribution. Entropies for individual Symbols are weighted by their probability of occurrence.
- Parameters:
observed - the observed frequency of Symbols.
- Returns:
the total entropy of the Distribution.
-
bitsOfInformation
public static final double bitsOfInformation(Distribution observed)
Calculates the total bits of information for a distribution.
- Parameters:
observed - the observed frequency of Symbols.
- Returns:
the total information content of the Distribution.
- Since:
1.2
-
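One common way to define the information content of a distribution over N symbols is log2(N) minus its total entropy in bits, where the total entropy is the probability-weighted sum of -log2(p). The sketch below assumes that definition; it is not taken from the BioJava source, and `InformationSketch` with its methods is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a map from symbol to weight replaces Distribution.
final class InformationSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Total entropy in bits: sum over symbols of p * (-log2 p). */
    static double totalEntropy(Map<Character, Double> observed) {
        double h = 0.0;
        for (double p : observed.values()) {
            if (p > 0.0) {
                h += p * -log2(p);
            }
        }
        return h;
    }

    /** Assumed definition: log2(alphabet size) minus the total entropy. */
    static double bitsOfInformation(Map<Character, Double> observed) {
        return log2(observed.size()) - totalEntropy(observed);
    }

    public static void main(String[] args) {
        Map<Character, Double> skewed = new LinkedHashMap<>();
        skewed.put('a', 0.7);
        skewed.put('c', 0.1);
        skewed.put('g', 0.1);
        skewed.put('t', 0.1);
        System.out.println(bitsOfInformation(skewed));
    }
}
```

Under this definition a uniform distribution carries no information, and a distribution concentrated on a single symbol of a four-letter alphabet carries two bits.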
distOverAlignment
public static Distribution[] distOverAlignment(Alignment a) throws IllegalAlphabetException
Equivalent to distOverAlignment(a, false, 0.0).
- Parameters:
a - the Alignment
- Returns:
an array of Distribution instances representing columns of the alignment
- Throws:
IllegalAlphabetException - if the alignment alphabet is not compatible
-
jointDistOverAlignment
public static final Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols) throws IllegalAlphabetException
Creates a joint distribution.
- Parameters:
a - the Alignment to build the Distribution over.
countGaps - if true, gaps will be included in the distribution (not yet implemented; currently either option produces the same result)
nullWeight - the number of pseudocounts to add to the distribution
cols - a list of positions in the alignment to include in the joint distribution
- Returns:
a Distribution
- Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet
- Since:
1.2
-
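To make the idea of a joint distribution over selected columns concrete, this sketch counts how often each combination of symbols occurs at the chosen positions across the rows of an alignment, then normalizes. It is a simplification under stated assumptions: rows are plain strings of equal length, column indices are 0-based, and pseudocounts and gap handling are left out. `JointSketch` and `jointDistribution` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: alignment rows are strings, columns are 0-based.
final class JointSketch {

    /** Relative frequency of each symbol combination found at the given columns. */
    static Map<String, Double> jointDistribution(List<String> rows, int[] cols) {
        Map<String, Double> joint = new LinkedHashMap<>();
        for (String row : rows) {
            StringBuilder key = new StringBuilder();
            for (int c : cols) {
                key.append(row.charAt(c));
            }
            joint.merge(key.toString(), 1.0, Double::sum);
        }
        double total = rows.size();
        // Turn the counts into weights; an empty alignment yields an empty map.
        joint.replaceAll((k, v) -> v / total);
        return joint;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("acg", "acg", "gtg", "atg");
        System.out.println(jointDistribution(rows, new int[] {0, 1}));
    }
}
```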
distOverAlignment
public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight) throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment.
- Parameters:
a - the Alignment to build the Distribution[] over.
countGaps - if true, gaps will be included in the distributions
nullWeight - the number of pseudocounts to add to each distribution. Pseudocounts do not affect gaps: a column with no gaps gets no gap counts.
- Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
- Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet
- Since:
1.2
-
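The per-column calculation can be sketched as follows: every alphabet symbol starts with nullWeight pseudocounts, each observed symbol adds one count, gaps are counted only when countGaps is true and never receive pseudocounts, and the column is then normalized. This is an illustration on strings, with '-' assumed as the gap character; `ColumnSketch` and `columnDistributions` are hypothetical and the library may differ in details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: alignment rows are equal-length strings, '-' is a gap.
final class ColumnSketch {

    static List<Map<Character, Double>> columnDistributions(
            List<String> rows, String alphabet, boolean countGaps, double nullWeight) {
        List<Map<Character, Double>> result = new ArrayList<>();
        int width = rows.isEmpty() ? 0 : rows.get(0).length();
        for (int col = 0; col < width; col++) {
            Map<Character, Double> counts = new LinkedHashMap<>();
            // Pseudocounts go to alphabet symbols only, never to the gap.
            for (char s : alphabet.toCharArray()) {
                counts.put(s, nullWeight);
            }
            for (String row : rows) {
                char ch = row.charAt(col);
                if (ch == '-' && !countGaps) {
                    continue;
                }
                counts.merge(ch, 1.0, Double::sum);
            }
            double sum = 0.0;
            for (double c : counts.values()) {
                sum += c;
            }
            final double total = sum;
            // A column with no counts at all is left with zero weights.
            counts.replaceAll((k, v) -> total > 0.0 ? v / total : 0.0);
            result.add(counts);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("ac", "ag", "-g");
        System.out.println(columnDistributions(rows, "acgt", false, 0.0));
        System.out.println(columnDistributions(rows, "acgt", true, 1.0));
    }
}
```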
distOverAlignment
public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps) throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment. No pseudocounts are used.
- Parameters:
countGaps - if true, gaps will be included in the distributions
a - the Alignment to build the Distribution[] over.
- Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
- Throws:
IllegalAlphabetException - if the sequences in the alignment do not all use the same alphabet
- Since:
1.2
-
average
public static final Distribution average(Distribution[] dists)
Averages two or more distributions. NOTE: the current implementation ignores the null model.
- Parameters:
dists - the Distributions to average
- Returns:
a Distribution where the weight of each Symbol is the average of the weights of that Symbol in each Distribution.
- Since:
1.2
-
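Averaging as described, taking the mean weight of each symbol across the input distributions, can be sketched on plain maps. The names `AverageSketch` and `average` below are hypothetical, and the inputs are assumed to share the same set of symbols.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each distribution is a map from symbol to weight.
final class AverageSketch {

    /** Mean weight of each symbol over all input distributions. */
    static Map<Character, Double> average(List<Map<Character, Double>> dists) {
        if (dists.isEmpty()) {
            throw new IllegalArgumentException("need at least one distribution");
        }
        Map<Character, Double> avg = new LinkedHashMap<>();
        for (Map<Character, Double> d : dists) {
            for (Map.Entry<Character, Double> e : d.entrySet()) {
                // Accumulate each weight already divided by the number of inputs.
                avg.merge(e.getKey(), e.getValue() / dists.size(), Double::sum);
            }
        }
        return avg;
    }

    public static void main(String[] args) {
        Map<Character, Double> first = new LinkedHashMap<>();
        first.put('a', 1.0);
        first.put('c', 0.0);
        Map<Character, Double> second = new LinkedHashMap<>();
        second.put('a', 0.0);
        second.put('c', 1.0);
        System.out.println(average(Arrays.asList(first, second)));
    }
}
```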
generateSequence
public static final Sequence generateSequence(String name, Distribution d, int length)
Produces a sequence by randomly sampling the Distribution.
- Parameters:
name - the name for the sequence
d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and used to produce a sequence over the conditioned alphabet.
length - the number of symbols in the sequence.
- Returns:
a Sequence whose name and URN are both set to name, with an empty Annotation.
-
generateSymbolList
public static final SymbolList generateSymbolList(Distribution d, int length)
Produces a SymbolList by randomly sampling a Distribution.
- Parameters:
d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and used to produce a sequence over the conditioned alphabet.
length - the number of symbols in the sequence.
- Returns:
a SymbolList of length length
-
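Random sampling from a simple (order-0) distribution can be sketched by drawing a uniform number and walking the cumulative weights until it is exceeded. The example below covers only that case and does not attempt the order-N burn-in described above; `SamplingSketch` and `sample` are hypothetical names, and a string of characters stands in for a SymbolList.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in: weights are a map, the result is a plain string.
final class SamplingSketch {

    /** Draws 'length' symbols independently according to the given weights. */
    static String sample(Map<Character, Double> weights, int length, Random rng) {
        if (weights.isEmpty()) {
            throw new IllegalArgumentException("no symbols to sample from");
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double r = rng.nextDouble(); // uniform in [0, 1)
            double cumulative = 0.0;
            char chosen = 0;
            for (Map.Entry<Character, Double> e : weights.entrySet()) {
                // If rounding leaves the sum just below r, the last symbol is kept.
                chosen = e.getKey();
                cumulative += e.getValue();
                if (r < cumulative) {
                    break;
                }
            }
            sb.append(chosen);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, Double> weights = new LinkedHashMap<>();
        weights.put('a', 0.4);
        weights.put('c', 0.1);
        weights.put('g', 0.1);
        weights.put('t', 0.4);
        System.out.println(sample(weights, 30, new Random()));
    }
}
```

Passing a seeded `Random` makes the output reproducible, which is convenient when testing code that consumes the generated sequences.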
generateOrderNSequence
protected static final Sequence generateOrderNSequence(String name, OrderNDistribution d, int length)
Deprecated. Use generateSequence() or generateSymbolList() instead.
Generates a sequence by sampling a distribution.
- Parameters:
name - the name of the sequence
d - the distribution to sample
length - the length of the sequence
- Returns:
a new sequence with the required composition
-
-