Class DistributionTools


  • public final class DistributionTools
    extends Object
    A class to hold static methods for calculations and manipulations using Distributions.
    Since:
    1.2
    Author:
    Mark Schreiber, Matthew Pocock
    • Method Detail

      • writeToXML

        public static void writeToXML​(Distribution d,
                                      OutputStream os)
                               throws IOException
        Writes a Distribution to XML that can be read with the readFromXML method.
        Parameters:
        d - the Distribution to write.
        os - where to write it to.
        Throws:
        IOException - if writing fails
      • countToDistribution

        public static Distribution countToDistribution​(Count c)
        Make a distribution from a count.
        Parameters:
        c - the count
        Returns:
        a Distribution over the same FiniteAlphabet as c, trained with the counts of c
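        As a rough illustration of turning counts into weights, the sketch below normalizes a map of counts so the weights sum to 1. It is not the BioJava implementation: the class CountNormalizer, the method toDistribution, and the use of a Map<String, Double> in place of Count and Distribution are assumptions made to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of both Count and Distribution.
class CountNormalizer {
    /** Divides each count by the total so that the resulting weights sum to 1. */
    static Map<String, Double> toDistribution(Map<String, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("counts must sum to a positive value");
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
        return weights;
    }
}
```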
      • areEmissionSpectraEqual

        public static final boolean areEmissionSpectraEqual​(Distribution a,
                                                            Distribution b)
                                                     throws BioException
        Compares the emission spectra of two distributions.
        Parameters:
        a - A Distribution with the same Alphabet as b
        b - A Distribution with the same Alphabet as a
        Returns:
        true if alphabets and symbol weights are equal for the two distributions.
        Throws:
        BioException - if one or both of the Distributions are over infinite alphabets.
        Since:
        1.2
      • areEmissionSpectraEqual

        public static final boolean areEmissionSpectraEqual​(Distribution[] a,
                                                            Distribution[] b)
                                                     throws BioException
        Compares the emission spectra of two distribution arrays.
        Parameters:
        a - A Distribution[] consisting of Distributions over a FiniteAlphabet
        b - A Distribution[] consisting of Distributions over a FiniteAlphabet
        Returns:
        true if alphabets and symbol weights are equal for each pair of distributions. Will return false if the arrays are of unequal length.
        Throws:
        BioException - if one of the Distributions is over an infinite alphabet.
        Since:
        1.3
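        The comparison described for both overloads can be sketched with plain maps. This assumes exact weight equality and uses hypothetical names (SpectraComparison, spectraEqual); a Map<String, Double> stands in for a Distribution over a finite alphabet and a List for the array. A tolerance-based comparison may be preferable for weights produced by floating-point arithmetic.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types: Map<String, Double> for a Distribution, List for Distribution[].
class SpectraComparison {
    /** True when both maps hold the same symbols with exactly equal weights. */
    static boolean spectraEqual(Map<String, Double> a, Map<String, Double> b) {
        if (!a.keySet().equals(b.keySet())) {
            return false; // different alphabets
        }
        for (Map.Entry<String, Double> e : a.entrySet()) {
            if (Double.compare(e.getValue(), b.get(e.getKey())) != 0) {
                return false; // weights differ for this symbol
            }
        }
        return true;
    }

    /** Pairwise comparison; lists of unequal length are never equal. */
    static boolean spectraEqual(List<Map<String, Double>> a, List<Map<String, Double>> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!spectraEqual(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```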
      • KLDistance

        public static final HashMap KLDistance​(Distribution observed,
                                               Distribution expected,
                                               double logBase)
        A method to calculate the Kullback-Leibler distance (relative entropy).
        Parameters:
        logBase - the log base for the entropy calculation. 2 is standard.
        observed - the observed frequency of Symbols.
        expected - the expected or background frequency.
        Returns:
        a HashMap mapping Symbol to (Double) relative entropy.
        Since:
        1.2
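        A per-symbol relative entropy can be sketched as follows. The sketch assumes each term is p * log(p / q) in the requested base, with zero-probability observed symbols contributing 0; the documentation above does not state the exact formula, so treat this as one plausible reading. RelativeEntropy and perSymbol are hypothetical names, and a Map<String, Double> stands in for a Distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the BioJava implementation.
class RelativeEntropy {
    /** Maps each symbol to p * log_b(p / q), where p is observed and q is expected. */
    static Map<String, Double> perSymbol(Map<String, Double> observed,
                                         Map<String, Double> expected,
                                         double logBase) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : observed.entrySet()) {
            Double q = expected.get(e.getKey());
            if (q == null) {
                throw new IllegalArgumentException("no expected weight for " + e.getKey());
            }
            double p = e.getValue();
            // Assumption: a symbol that was never observed contributes 0.
            double term = (p == 0.0) ? 0.0 : p * Math.log(p / q) / Math.log(logBase);
            result.put(e.getKey(), term);
        }
        return result;
    }
}
```

        Summing the returned values gives the total relative entropy between the two distributions.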
      • shannonEntropy

        public static final HashMap shannonEntropy​(Distribution observed,
                                                   double logBase)
        A method to calculate the Shannon Entropy for a Distribution.
        Parameters:
        logBase - the log base for the entropy calculation. 2 is standard.
        observed - the observed frequency of Symbols.
        Returns:
        a HashMap mapping Symbol to (Double) entropy.
        Since:
        1.2
      • totalEntropy

        public static double totalEntropy​(Distribution observed)
        Calculates the total entropy for a Distribution. Entropies for individual Symbols are weighted by their probability of occurrence.
        Parameters:
        observed - the observed frequency of Symbols.
        Returns:
        the total entropy of the Distribution.
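        The relationship between per-symbol entropies and the total can be sketched as below. It assumes the per-symbol value is -log_b p(s) and that the total is the probability-weighted sum of those values; the log base is passed explicitly here because the signature above does not say which base totalEntropy uses. EntropySketch and its methods are hypothetical names, with a Map<String, Double> standing in for a Distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the BioJava implementation.
class EntropySketch {
    /** Per-symbol value -log_b p(s); infinite for symbols of probability zero. */
    static Map<String, Double> perSymbol(Map<String, Double> observed, double logBase) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : observed.entrySet()) {
            result.put(e.getKey(), -Math.log(e.getValue()) / Math.log(logBase));
        }
        return result;
    }

    /** Sum of the per-symbol values, each weighted by the symbol's probability. */
    static double total(Map<String, Double> observed, double logBase) {
        Map<String, Double> perSymbol = perSymbol(observed, logBase);
        double sum = 0.0;
        for (Map.Entry<String, Double> e : observed.entrySet()) {
            double p = e.getValue();
            if (p > 0.0) { // skip zero-probability symbols: 0 * infinity would be NaN
                sum += p * perSymbol.get(e.getKey());
            }
        }
        return sum;
    }
}
```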
      • bitsOfInformation

        public static final double bitsOfInformation​(Distribution observed)
        Calculates the total bits of information for a distribution.
        Parameters:
        observed - the observed frequency of Symbols.
        Returns:
        the total information content of the Distribution.
        Since:
        1.2
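        Information content is commonly defined as the maximum possible entropy for the alphabet minus the observed entropy, both in bits. The sketch below follows that common definition; whether bitsOfInformation computes exactly this is an assumption. InformationContent and bits are hypothetical names, with a Map<String, Double> standing in for a Distribution over a finite alphabet.

```java
import java.util.Map;

// Hypothetical sketch of log2(alphabet size) minus Shannon entropy in bits.
class InformationContent {
    static double bits(Map<String, Double> observed) {
        double entropy = 0.0;
        for (double p : observed.values()) {
            if (p > 0.0) { // zero-probability symbols contribute nothing
                entropy -= p * Math.log(p) / Math.log(2.0);
            }
        }
        // Maximum entropy is reached when every symbol is equally likely.
        double maxEntropy = Math.log(observed.size()) / Math.log(2.0);
        return maxEntropy - entropy;
    }
}
```

        Under this definition a uniform distribution carries 0 bits, and a four-symbol distribution that always emits the same symbol carries 2 bits.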
      • jointDistOverAlignment

        public static final Distribution jointDistOverAlignment​(Alignment a,
                                                                boolean countGaps,
                                                                double nullWeight,
                                                                int[] cols)
                                                         throws IllegalAlphabetException
        Creates a joint distribution over the selected columns of an alignment.
        Parameters:
        a - the Alignment to build the Distribution over.
        countGaps - if true, gaps will be included in the distribution. Not yet implemented: currently either value produces the same result.
        nullWeight - the number of pseudo counts to add to each distribution
        cols - a list of positions in the alignment to include in the joint distribution
        Returns:
        a Distribution
        Throws:
        IllegalAlphabetException - if the sequences do not all use the same alphabet
        Since:
        1.2
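        The idea of a joint distribution over chosen columns can be sketched with strings: each row contributes one tuple made of the symbols found at the selected columns, and tuple frequencies become the weights. Pseudo counts and gap handling are omitted, column indices are 0-based, and JointColumnSketch, jointOverColumns, and the String[] alignment are all assumptions for illustration rather than BioJava types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: rows are equal-length strings, cols are 0-based column indices.
class JointColumnSketch {
    /** Maps each observed tuple of symbols (one per selected column) to its frequency. */
    static Map<String, Double> jointOverColumns(String[] rows, int[] cols) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("alignment has no rows");
        }
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String row : rows) {
            StringBuilder tuple = new StringBuilder();
            for (int col : cols) {
                tuple.append(row.charAt(col));
            }
            counts.merge(tuple.toString(), 1.0, Double::sum);
        }
        // Convert counts to frequencies.
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            e.setValue(e.getValue() / rows.length);
        }
        return counts;
    }
}
```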
      • distOverAlignment

        public static final Distribution[] distOverAlignment​(Alignment a,
                                                             boolean countGaps,
                                                             double nullWeight)
                                                      throws IllegalAlphabetException
        Creates an array of distributions, one for each column of the alignment.
        Parameters:
        a - the Alignment to build the Distribution[] over.
        countGaps - if true, gaps will be included in the distributions
        nullWeight - the number of pseudo counts to add to each distribution. Pseudo counts are not applied to gaps, so a column with no gaps has no gap counts.
        Returns:
        a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
        Throws:
        IllegalAlphabetException - if the sequences do not all use the same alphabet
        Since:
        1.2
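        A column-by-column construction can be sketched as below. The sketch assumes the pseudo counts are spread uniformly over the non-gap alphabet, that '-' marks a gap, and that gaps receive no pseudo counts; how the library actually distributes nullWeight is not stated above. ColumnDistributionSketch, columnDistributions, and the String[] alignment are hypothetical stand-ins for the BioJava types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows are equal-length strings, '-' marks a gap.
class ColumnDistributionSketch {
    static List<Map<Character, Double>> columnDistributions(
            String[] rows, String alphabet, boolean countGaps, double nullWeight) {
        List<Map<Character, Double>> result = new ArrayList<>();
        int width = rows.length == 0 ? 0 : rows[0].length();
        for (int col = 0; col < width; col++) {
            Map<Character, Double> counts = new LinkedHashMap<>();
            // Assumption: pseudo counts are shared equally among non-gap symbols.
            for (char s : alphabet.toCharArray()) {
                counts.put(s, nullWeight / alphabet.length());
            }
            for (String row : rows) {
                char ch = row.charAt(col);
                if (ch == '-' && !countGaps) {
                    continue; // gaps are ignored unless requested
                }
                counts.merge(ch, 1.0, Double::sum);
            }
            double total = 0.0;
            for (double c : counts.values()) {
                total += c;
            }
            // Normalize; an all-gap column with no pseudo counts stays at zero.
            for (Map.Entry<Character, Double> e : counts.entrySet()) {
                e.setValue(total == 0.0 ? 0.0 : e.getValue() / total);
            }
            result.add(counts);
        }
        return result;
    }
}
```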
      • distOverAlignment

        public static final Distribution[] distOverAlignment​(Alignment a,
                                                             boolean countGaps)
                                                      throws IllegalAlphabetException
        Creates an array of distributions, one for each column of the alignment. No pseudo counts are used.
        Parameters:
        countGaps - if true, gaps will be included in the distributions
        a - the Alignment to build the Distribution[] over.
        Returns:
        a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
        Throws:
        IllegalAlphabetException - if the sequences in the alignment do not all use the same alphabet
        Since:
        1.2
      • average

        public static final Distribution average​(Distribution[] dists)
        Averages two or more distributions. NOTE: the current implementation ignores the null model.
        Parameters:
        dists - the Distributions to average
        Returns:
        a Distribution where the weight of each Symbol is the average of the weights of that Symbol in each Distribution.
        Since:
        1.2
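        Averaging symbol weights can be sketched with plain maps. AverageSketch and average are hypothetical names, a Map<String, Double> stands in for a Distribution, and a symbol missing from one of the maps is assumed to have weight 0 there.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the null model is not considered.
class AverageSketch {
    /** Maps each symbol to the mean of its weights across all distributions. */
    static Map<String, Double> average(List<Map<String, Double>> dists) {
        if (dists.isEmpty()) {
            throw new IllegalArgumentException("need at least one distribution");
        }
        Map<String, Double> mean = new LinkedHashMap<>();
        for (Map<String, Double> d : dists) {
            for (Map.Entry<String, Double> e : d.entrySet()) {
                // Each distribution contributes weight / n to the running mean.
                mean.merge(e.getKey(), e.getValue() / dists.size(), Double::sum);
            }
        }
        return mean;
    }
}
```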
      • generateSequence

        public static final Sequence generateSequence​(String name,
                                                      Distribution d,
                                                      int length)
        Produces a sequence by randomly sampling the Distribution.
        Parameters:
        name - the name for the sequence
        d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and then used to produce a sequence over the conditioned alphabet.
        length - the number of symbols in the sequence.
        Returns:
        a Sequence with its name and URN both set to name, and an empty Annotation.
      • generateSymbolList

        public static final SymbolList generateSymbolList​(Distribution d,
                                                          int length)
        Produces a SymbolList by randomly sampling a Distribution.
        Parameters:
        d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and then used to produce a sequence over the conditioned alphabet.
        length - the number of symbols in the sequence.
        Returns:
        a SymbolList of length length
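        Random sampling from a simple (order 0) distribution can be sketched with cumulative weights: draw a uniform number in [0, 1) and pick the first symbol whose running total exceeds it. The order-N case with its burn-in is not covered. SamplingSketch and sample are hypothetical names, a Map<Character, Double> stands in for a Distribution, and the weights are assumed to sum to 1.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of sampling an order-0 distribution; not the BioJava implementation.
class SamplingSketch {
    /** Returns a string of the given length, each character drawn independently. */
    static String sample(Map<Character, Double> weights, int length, Random rng) {
        if (weights.isEmpty()) {
            throw new IllegalArgumentException("distribution has no symbols");
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double r = rng.nextDouble(); // uniform in [0, 1)
            double cumulative = 0.0;
            Character chosen = null;
            for (Map.Entry<Character, Double> e : weights.entrySet()) {
                chosen = e.getKey(); // falls back to the last symbol if rounding leaves a gap
                cumulative += e.getValue();
                if (r < cumulative) {
                    break;
                }
            }
            sb.append(chosen);
        }
        return sb.toString();
    }
}
```

        Passing a seeded Random makes the output reproducible, which is convenient in tests.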
      • generateOrderNSequence

        protected static final Sequence generateOrderNSequence​(String name,
                                                               OrderNDistribution d,
                                                               int length)
        Deprecated.
        Use generateSequence() or generateSymbolList() instead.
        Generate a sequence by sampling a distribution.
        Parameters:
        name - the name of the sequence
        d - the distribution to sample
        length - the length of the sequence
        Returns:
        a new sequence with the required composition