java.lang.Object

org.biojava.nbio.phylo.DistanceMatrixCalculator

public class DistanceMatrixCalculator extends Object

The DistanceMatrixCalculator methods generate a DistanceMatrix from a MultipleSequenceAlignment or from other indirect distance information (RMSD).

Since: 4.1.1
Author: Aleix Lafita

Method Summary

Modifier and Type

Method

Description

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

dissimilarityScore(MultipleSequenceAlignment<C,D> msa, SubstitutionMatrix<D> M)

The dissimilarity score is the additive inverse of the similarity score (sum of scores) between two aligned sequences using a substitution model (Substitution Matrix).

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

fractionalDissimilarity(MultipleSequenceAlignment<C,D> msa)

The fractional dissimilarity (D) is defined as the fraction of sites that differ between two aligned sequences.

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

fractionalDissimilarityScore(MultipleSequenceAlignment<C,D> msa, SubstitutionMatrix<D> M)

The fractional dissimilarity score (Ds) is a relative measure of the dissimilarity between two aligned sequences.

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

kimuraDistance(MultipleSequenceAlignment<C,D> msa)

The Kimura evolutionary distance (d) is a correction of the fractional dissimilarity (D), needed especially for large evolutionary distances.

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

pamMLdistance(MultipleSequenceAlignment<C,D> msa)

The PAM (Point Accepted Mutations) distance is a measure of evolutionary distance in protein sequences.

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

percentageIdentity(MultipleSequenceAlignment<C,D> msa)

BioJava implementation for percentage of identity (PID).

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

poissonDistance(MultipleSequenceAlignment<C,D> msa)

The Poisson (correction) evolutionary distance (d) is a function of the fractional dissimilarity (D), given by: d = -log(1 - D). Gapped positions in the alignment are ignored in the calculation.

static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix

structuralDistance(double[][] rmsdMat, double alpha, double rmsdMax, double rmsd0)

The structural distance (d_S) uses the structural similarity (or dissimilarity) from the structural alignment of two protein structures.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- fractionalDissimilarity
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix fractionalDissimilarity(MultipleSequenceAlignment<C,D> msa) throws IOException
  
  The fractional dissimilarity (D) is defined as the fraction of sites that differ between two aligned sequences. The percentage of identity (PID) is the fraction of identical sites between two aligned sequences.
  
  D = 1 - PID
  
  Gapped positions in the alignment are ignored in the calculation. This method is a wrapper around the forester implementation of the calculation: PairwiseDistanceCalculator.calcFractionalDissimilarities(Msa).
  
  Parameters:
  
  msa - MultipleSequenceAlignment
  
  Returns:
  
  DistanceMatrix
  
  Throws:
  
  IOException
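The relation D = 1 - PID can be illustrated on two aligned rows with plain Java. This is a minimal sketch and not the BioJava or forester implementation: the class `FractionalDissimilaritySketch`, its method names, and the choice of `-` and `.` as gap symbols are assumptions made for the example.

```java
final class FractionalDissimilaritySketch {

    /** Gap symbols recognized by this sketch (an assumption, not BioJava's rule). */
    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    /** Returns D = 1 - PID over the columns in which neither row has a gap. */
    static double fractionalDissimilarity(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int compared = 0;   // columns without a gap in either row
        int identical = 0;  // compared columns holding the same residue
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (isGap(x) || isGap(y)) {
                continue; // gapped positions are ignored
            }
            compared++;
            if (x == y) {
                identical++;
            }
        }
        if (compared == 0) {
            throw new IllegalArgumentException("no ungapped columns to compare");
        }
        double pid = (double) identical / compared;
        return 1.0 - pid;
    }

    public static void main(String[] args) {
        // Five ungapped columns, four of them identical: PID = 0.8, so D is about 0.2.
        System.out.println(fractionalDissimilarity("ACDEFG", "AC-EFH"));
    }
}
```

Dividing by the number of ungapped columns, rather than by the full alignment length, is how this sketch reads "gapped positions are ignored".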
- poissonDistance
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix poissonDistance(MultipleSequenceAlignment<C,D> msa) throws IOException
  
  The Poisson (correction) evolutionary distance (d) is a function of the fractional dissimilarity (D), given by:
  
  d = -log(1 - D)
  
  Gapped positions in the alignment are ignored in the calculation. This method is a wrapper around the forester implementation of the calculation: PairwiseDistanceCalculator.calcPoissonDistances(Msa).
  
  Parameters:
  
  msa - MultipleSequenceAlignment
  
  Returns:
  
  DistanceMatrix
  
  Throws:
  
  IOException
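The Poisson correction itself is a one-line transformation of D. The sketch below assumes that `log` denotes the natural logarithm and uses a hypothetical class `PoissonDistanceSketch`; it operates on a single D value and does not call forester.

```java
final class PoissonDistanceSketch {

    /** Returns d = -ln(1 - D) for a fractional dissimilarity D in [0, 1]. */
    static double poissonDistance(double fractionalDissimilarity) {
        if (fractionalDissimilarity < 0.0 || fractionalDissimilarity > 1.0) {
            throw new IllegalArgumentException("D must lie in [0, 1]");
        }
        // log1p(-D) equals ln(1 - D) and stays accurate for small D.
        // D = 1 yields positive infinity: the correction diverges at saturation.
        return -Math.log1p(-fractionalDissimilarity);
    }

    public static void main(String[] args) {
        for (double d : new double[] {0.1, 0.5, 0.9}) {
            System.out.printf("D = %.1f -> d = %.4f%n", d, poissonDistance(d));
        }
    }
}
```

For small D the corrected distance stays close to D, and it grows without bound as D approaches 1.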
- kimuraDistance
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix kimuraDistance(MultipleSequenceAlignment<C,D> msa) throws IOException
  
  The Kimura evolutionary distance (d) is a correction of the fractional dissimilarity (D), needed especially for large evolutionary distances. It is given by:
  
  d = -log(1 - D - 0.2 * D²)
  
  The equation is derived by fitting the relationship between the evolutionary distance (d) and the fractional dissimilarity (D) according to the PAM model of evolution (it is an empirical approximation for the method pamMLdistance(MultipleSequenceAlignment)). Gapped positions in the alignment are ignored in the calculation. This method is a wrapper around the forester implementation of the calculation: PairwiseDistanceCalculator.calcKimuraDistances(Msa).
  
  Parameters:
  
  msa - MultipleSequenceAlignment
  
  Returns:
  
  DistanceMatrix
  
  Throws:
  
  IOException
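As a worked illustration of the Kimura formula, the following sketch evaluates it for a single D value. It assumes the natural logarithm; the class `KimuraDistanceSketch` and its range check are hypothetical and are not taken from forester.

```java
final class KimuraDistanceSketch {

    /** Returns d = -ln(1 - D - 0.2 * D^2) for a fractional dissimilarity D. */
    static double kimuraDistance(double d) {
        double argument = 1.0 - d - 0.2 * d * d;
        if (d < 0.0 || argument <= 0.0) {
            // The logarithm is only defined while 1 - D - 0.2 * D^2 > 0,
            // which holds for D below roughly 0.854.
            throw new IllegalArgumentException("D outside the range where the formula is defined");
        }
        return -Math.log(argument);
    }

    public static void main(String[] args) {
        for (double d : new double[] {0.1, 0.5, 0.8}) {
            System.out.printf("D = %.1f -> d = %.4f%n", d, kimuraDistance(d));
        }
    }
}
```

Because of the extra 0.2 * D² term, the Kimura distance exceeds the Poisson distance -log(1 - D) for every D > 0 at which both are defined.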
- percentageIdentity
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix percentageIdentity(MultipleSequenceAlignment<C,D> msa)
  
  BioJava implementation for percentage of identity (PID). Although the name of the method is percentage of identity, the DistanceMatrix contains the fractional dissimilarity (D), computed as D = 1 - PID.
  It is recommended to use the method fractionalDissimilarity(MultipleSequenceAlignment) instead of this one.
  
  Parameters:
  
  msa - MultipleSequenceAlignment
  
  Returns:
  
  DistanceMatrix
- fractionalDissimilarityScore
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix fractionalDissimilarityScore(MultipleSequenceAlignment<C,D> msa, SubstitutionMatrix<D> M)
  
  The fractional dissimilarity score (Ds) is a relative measure of the dissimilarity between two aligned sequences. It is calculated as:
  
  Ds = sum_i( (max(M) - M_ai,bi) / (max(M) - min(M)) ) / L
  
  where the sum over i runs through all alignment positions, ai and bi are the AA at position i in the first and second aligned sequences, respectively, and L is the total length of the alignment (normalization).
  The fractional dissimilarity score (Ds) lies in the interval [0, 1], where 0 means that the sequences are identical and 1 means that they are completely different.
  Gaps do not contribute to the similarity score calculation (gap penalty = 0).
  
  Parameters:
  
  msa - MultipleSequenceAlignment
  
  M - SubstitutionMatrix for similarity scoring
  
  Returns:
  
  DistanceMatrix
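The normalization in Ds can be seen on a toy example. The sketch below is not BioJava code: the two-letter alphabet, the matrix values, and the class `FractionalDissimilarityScoreSketch` are invented for illustration, and scoring a gapped column as 0 is one reading of "gap penalty = 0".

```java
final class FractionalDissimilarityScoreSketch {

    // Hypothetical toy substitution model over the alphabet {A, B}.
    static final String ALPHABET = "AB";
    static final int[][] M = {{2, -1}, {-1, 2}};

    /** Substitution score of one column; gapped columns score 0 (assumption). */
    static int score(char x, char y) {
        if (x == '-' || y == '-') {
            return 0;
        }
        int i = ALPHABET.indexOf(x);
        int j = ALPHABET.indexOf(y);
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("symbol outside the toy alphabet");
        }
        return M[i][j];
    }

    /** Ds = sum_i((max(M) - M_ai,bi) / (max(M) - min(M))) / L. */
    static double fractionalDissimilarityScore(String a, String b) {
        if (a.isEmpty() || a.length() != b.length()) {
            throw new IllegalArgumentException("aligned rows must be non-empty and of equal length");
        }
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (int[] row : M) {
            for (int value : row) {
                max = Math.max(max, value);
                min = Math.min(min, value);
            }
        }
        double sum = 0.0;
        for (int i = 0; i < a.length(); i++) {
            sum += (double) (max - score(a.charAt(i), b.charAt(i))) / (max - min);
        }
        return sum / a.length(); // L is the full alignment length
    }

    public static void main(String[] args) {
        // One mismatching column out of four.
        System.out.println(fractionalDissimilarityScore("AABB", "ABBB"));
    }
}
```

In this toy matrix every diagonal entry equals max(M), so identical rows give exactly 0 and rows that mismatch in every column give exactly 1.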
- dissimilarityScore
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix dissimilarityScore(MultipleSequenceAlignment<C,D> msa, SubstitutionMatrix<D> M)
  
  The dissimilarity score is the additive inverse of the similarity score (sum of scores) between two aligned sequences using a substitution model (Substitution Matrix). The maximum dissimilarity score is taken to be the maximum similarity score between self-alignments (each sequence against itself). The score is calculated as follows:
  
  Ds = maxScore - sum_i(M_ai,bi)
  
  It is recommended to use the method fractionalDissimilarityScore(MultipleSequenceAlignment, SubstitutionMatrix) instead, since there the maximum similarity score is relative to the Substitution Matrix rather than to the data set, and the score is normalized by the alignment length (fractional).
  Gaps do not contribute to the similarity score calculation (gap penalty = 0).
  
  Parameters:
  
  msa - MultipleSequenceAlignment
  
  M - SubstitutionMatrix for similarity scoring
  
  Returns:
  
  DistanceMatrix
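The dependence of maxScore on the data set is easier to see in a small example. This sketch uses a hypothetical class `DissimilarityScoreSketch` and an invented two-letter substitution matrix; leaving the diagonal at 0 is an assumption of the sketch, not a documented property of the BioJava method.

```java
final class DissimilarityScoreSketch {

    // Hypothetical toy substitution model over the alphabet {A, B}.
    static final String ALPHABET = "AB";
    static final int[][] M = {{2, -1}, {-1, 2}};

    /** Substitution score of one column; gapped columns contribute 0. */
    static int score(char x, char y) {
        if (x == '-' || y == '-') {
            return 0;
        }
        return M[ALPHABET.indexOf(x)][ALPHABET.indexOf(y)];
    }

    /** Similarity score: sum of the column scores of two aligned rows. */
    static int similarity(String a, String b) {
        int sum = 0;
        for (int i = 0; i < a.length(); i++) {
            sum += score(a.charAt(i), b.charAt(i));
        }
        return sum;
    }

    /** Pairwise Ds = maxScore - sum_i(M_ai,bi); the diagonal is left at 0. */
    static double[][] dissimilarityScores(String[] rows) {
        int n = rows.length;
        // maxScore is the best self-alignment score found in this data set.
        int maxScore = Integer.MIN_VALUE;
        for (String row : rows) {
            maxScore = Math.max(maxScore, similarity(row, row));
        }
        double[][] ds = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                ds[i][j] = maxScore - similarity(rows[i], rows[j]);
                ds[j][i] = ds[i][j];
            }
        }
        return ds;
    }

    public static void main(String[] args) {
        double[][] ds = dissimilarityScores(new String[] {"AABB", "ABBB", "AA-B"});
        System.out.println(java.util.Arrays.deepToString(ds));
    }
}
```

Adding or removing a row can change maxScore and therefore every entry of the matrix, which is the reason the fractional variant is recommended.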
- pamMLdistance
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix pamMLdistance(MultipleSequenceAlignment<C,D> msa)
  
  The PAM (Point Accepted Mutations) distance is a measure of evolutionary distance in protein sequences. The PAM unit represents an average substitution rate of 1% per site. The fractional dissimilarity (D) of two aligned sequences is related to the PAM distance (d) by the equation:
  
  D = sum(fi * (1 - M_ii^d))
  
  where the sum runs over all 20 AA, fi denotes the natural fraction of the given AA and M is the substitution matrix (in this case the PAM1 matrix).
  To calculate the PAM distance between two aligned sequences, the maximum likelihood (ML) approach is used, which consists of finding the d that maximizes the function:
  
  L(d) = product(f_ai * (1 - M_ai,bi^d))
  
  where the product runs over every position i in the alignment, and ai and bi are the AA at position i in the first and second aligned sequences, respectively.
  
  Parameters:
  
  msa - MultipleSequenceAlignment
  
  Returns:
  
  DistanceMatrix
- structuralDistance
  
  public static <C extends Sequence<D>, D extends Compound> org.forester.evoinference.matrix.distance.DistanceMatrix structuralDistance(double[][] rmsdMat, double alpha, double rmsdMax, double rmsd0)
  
  The structural distance (d_S) uses the structural similarity (or dissimilarity) from the structural alignment of two protein structures. It is based on the diffusive model for protein fold evolution (Grishin 1995). The structural deviations are captured as RMS deviations.
  
  d_Sij = (rmsd_max² / alpha²) * ln( (rmsd_max² - rmsd₀²) / (rmsd_max² - rmsd_ij²) )
  
  Parameters:
  
  rmsdMat - RMSD matrix for all structure pairs (symmetric matrix)
  
  alpha - change in CA positions introduced by a single AA substitution (Grishin 1995: 1 Å)
  
  rmsdMax - estimated RMSD between proteins of the same fold when the percentage of identity is infinitely low (the maximum allowed RMSD of proteins with the same fold). (Grishin 1995: 5 Å)
  
  rmsd0 - arithmetical mean of squares of the RMSD for identical proteins (Grishin 1995: 0.4 Å)
  
  Returns:
  
  DistanceMatrix
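
The structural distance formula can be evaluated directly on an RMSD matrix. The following is a plain-Java sketch that returns a `double[][]` instead of a forester DistanceMatrix; the class `StructuralDistanceSketch`, the zero diagonal, and the exception for RMSD values at or above rmsdMax are assumptions of the sketch.

```java
final class StructuralDistanceSketch {

    /**
     * d_Sij = (rmsdMax^2 / alpha^2) * ln((rmsdMax^2 - rmsd0^2) / (rmsdMax^2 - rmsd_ij^2)),
     * applied to the off-diagonal entries of a symmetric RMSD matrix.
     */
    static double[][] structuralDistance(double[][] rmsdMat, double alpha,
                                         double rmsdMax, double rmsd0) {
        int n = rmsdMat.length;
        double max2 = rmsdMax * rmsdMax;
        double prefactor = max2 / (alpha * alpha);
        double numerator = max2 - rmsd0 * rmsd0;
        double[][] d = new double[n][n]; // diagonal stays 0 (assumption)
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double rmsd = rmsdMat[i][j];
                if (rmsd >= rmsdMax) {
                    // The logarithm is undefined once rmsd_ij reaches rmsdMax.
                    throw new IllegalArgumentException("RMSD must be below rmsdMax");
                }
                double value = prefactor * Math.log(numerator / (max2 - rmsd * rmsd));
                d[i][j] = value;
                d[j][i] = value;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] rmsd = {{0.0, 2.0}, {2.0, 0.0}};
        // Parameter values quoted from Grishin 1995 in the parameter list above.
        double[][] d = structuralDistance(rmsd, 1.0, 5.0, 0.4);
        System.out.println(d[0][1]);
    }
}
```

Note that the formula gives a negative value whenever rmsd_ij is smaller than rmsd0; the sketch returns that value unchanged rather than clamping it.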
