Class AlignmentTools

java.lang.Object
org.biojava.nbio.structure.align.util.AlignmentTools

public class AlignmentTools extends Object
Methods for analyzing and manipulating AFPChains and for other pairwise alignment utilities.

Current methods: replace optimal alignment, create new AFPChain, format conversion, update superposition, etc.

Author:
Spencer Bliven, Aleix Lafita
  • Field Details

    • debug

      public static boolean debug
  • Constructor Details

  • Method Details

    • isSequentialAlignment

      public static boolean isSequentialAlignment(AFPChain afpChain, boolean checkWithinBlocks)
      Checks that the alignment given by afpChain is sequential, i.e. that the residue indices of both proteins increase monotonically as a function of the alignment position (both proteins are sorted). Returns false for circularly permuted alignments and other non-topological alignments. Also returns false when the alignment itself is sequential but is not stored in the afpChain in sorted order. Because algorithms that create non-sequential alignments split the alignment into multiple blocks, some computational time can be saved by checking only block boundaries for sequentiality. Setting checkWithinBlocks to true makes this method slower, but detects AFPChains with non-sequential blocks. This method should give the same results as AFPChain.isSequentialAlignment(). However, the AFPChain version relies on the StructureAlignment algorithm setting that flag correctly, which is not always the case.
      Parameters:
      afpChain - An alignment
      checkWithinBlocks - Indicates whether individual blocks should be checked for sequentiality
      Returns:
      True if the alignment is sequential.
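      The difference between checking only block boundaries and checking every column can be illustrated with a minimal sketch. It is not the BioJava implementation: the class and method names are hypothetical, and it assumes the alignment is given as an int[block][chain][pos] array whose inner arrays contain only valid, non-negative residue indices.

```java
final class SequentialCheckSketch {

    /** optAln[block][chain][pos] holds residue indices; chain is 0 or 1. */
    static boolean isSequential(int[][][] optAln, boolean checkWithinBlocks) {
        int prev0 = -1; // last residue index seen in protein 1
        int prev1 = -1; // last residue index seen in protein 2
        for (int[][] block : optAln) {
            int len = block[0].length;
            if (len == 0) {
                continue; // empty blocks carry no ordering information
            }
            // Checking every column is slower; checking only the first column
            // compares this block against the end of the previous block.
            int columnsToCheck = checkWithinBlocks ? len : 1;
            for (int i = 0; i < columnsToCheck; i++) {
                if (block[0][i] <= prev0 || block[1][i] <= prev1) {
                    return false;
                }
                prev0 = block[0][i];
                prev1 = block[1][i];
            }
            // Continue from the last column of this block.
            prev0 = block[0][len - 1];
            prev1 = block[1][len - 1];
        }
        return true;
    }

    public static void main(String[] args) {
        // A circular permutation stored as two blocks.
        int[][][] permuted = { { {0, 1, 2}, {3, 4, 5} }, { {3, 4, 5}, {0, 1, 2} } };
        System.out.println(isSequential(permuted, false));
    }
}
```

      In this sketch a block whose columns are internally out of order passes the boundary-only check and fails only when checkWithinBlocks is true.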
    • alignmentAsMap

      public static Map<Integer,Integer> alignmentAsMap(AFPChain afpChain) throws StructureException
      Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.

      For example,

       1234
       5678
      becomes
       1->5
       2->6
       3->7
       4->8
      Parameters:
      afpChain - An alignment
      Returns:
      A mapping from aligned residues of protein 1 to their partners in protein 2.
      Throws:
      StructureException - If afpChain is not one-to-one
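      The mapping can be sketched with plain arrays. This is only an illustration under stated assumptions: the class name and the two-array input are hypothetical, IllegalArgumentException stands in for StructureException, and only a repeated residue of protein 1 is treated as a violation of one-to-one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AlignmentMapSketch {

    /** Pairs residues1[i] with residues2[i] and rejects repeated keys. */
    static Map<Integer, Integer> asMap(int[] residues1, int[] residues2) {
        if (residues1.length != residues2.length) {
            throw new IllegalArgumentException("aligned lists differ in length");
        }
        Map<Integer, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < residues1.length; i++) {
            if (map.containsKey(residues1[i])) {
                throw new IllegalArgumentException(
                        "residue " + residues1[i] + " of protein 1 is aligned twice");
            }
            map.put(residues1[i], residues2[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        // The 1234 / 5678 example from the description above.
        System.out.println(asMap(new int[] {1, 2, 3, 4}, new int[] {5, 6, 7, 8}));
    }
}
```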
    • applyAlignment

      public static <T> Map<T,T> applyAlignment(Map<T,T> alignmentMap, int k)
      Applies an alignment k times. E.g., if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)).
      Type Parameters:
      T - The type of the map's keys and values
      Parameters:
      alignmentMap - The input function, as a map (see alignmentAsMap(AFPChain))
      k - The number of times to apply the alignment
      Returns:
      A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
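      Repeated application of a map can be sketched as follows. The class and method names are hypothetical and this is not the library code; it only shows how f^k arises from k lookups and why an input can end up mapped to null.

```java
import java.util.HashMap;
import java.util.Map;

final class ApplyAlignmentSketch {

    /** Returns x -> f^k(x) for every key x of f; undefined results are null. */
    static <T> Map<T, T> apply(Map<T, T> f, int k) {
        Map<T, T> result = new HashMap<>();
        for (T x : f.keySet()) {
            T y = x;
            // Follow the map k times, stopping early once it is undefined.
            for (int i = 0; i < k && y != null; i++) {
                y = f.get(y);
            }
            result.put(x, y);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = new HashMap<>();
        f.put(1, 2);
        f.put(2, 3);
        f.put(3, 1);
        System.out.println(apply(f, 2));
    }
}
```

      For the three-cycle in main, three applications return every residue to itself. If the entry for 3 is removed, residue 2 has no image after two steps and maps to null.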
    • applyAlignment

      public static <S, T> Map<S,T> applyAlignment(Map<S,T> alignmentMap, Map<T,S> identity, int k)
      Applies an alignment k times. E.g., if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)). To allow for functions with different domains and codomains, the identity function allows converting back in a reasonable way. For instance, if alignmentMap represented an alignment between two proteins with different numbering schemes, the identity function could calculate the offset between residue numbers, e.g. I(x) = x-offset. When an identity function is provided, the returned function calculates f^k(x) = f(I( f(I( ... f(x) ... )) )).
      Type Parameters:
      S -
      T -
      Parameters:
      alignmentMap - The input function, as a map (see alignmentAsMap(AFPChain))
      identity - An identity-like function providing the isomorphism between the codomain of alignmentMap (of type T) and the domain (type S).
      k - The number of times to apply the alignment
      Returns:
      A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
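      The alternating composition f(I(f(I(...f(x)...)))) can be sketched as below. The names are hypothetical, k is assumed to be at least 1, and the example identity map is a made-up offset of 100 between the two numbering schemes.

```java
import java.util.HashMap;
import java.util.Map;

final class ApplyWithIdentitySketch {

    /** Applies f once, then (k - 1) rounds of identity followed by f. */
    static <S, T> Map<S, T> apply(Map<S, T> f, Map<T, S> identity, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        Map<S, T> result = new HashMap<>();
        for (S x : f.keySet()) {
            T y = f.get(x);
            for (int i = 1; i < k && y != null; i++) {
                S back = identity.get(y);               // codomain -> domain
                y = (back == null) ? null : f.get(back); // apply f again
            }
            result.put(x, y);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = new HashMap<>();
        Map<Integer, Integer> identity = new HashMap<>();
        // Protein 2 is numbered 101..103; I(x) = x - 100.
        f.put(1, 102);
        f.put(2, 103);
        f.put(3, 101);
        for (int r = 101; r <= 103; r++) {
            identity.put(r, r - 100);
        }
        System.out.println(apply(f, identity, 2));
    }
}
```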
    • getSymmetryOrder

      public static int getSymmetryOrder(Map<Integer,Integer> alignment, int maxSymmetry, float minimumMetricChange)
      Helper for getSymmetryOrder(Map, Map, int, float) with a true identity function (X->X).

      This method should only be used in cases where the two proteins aligned have identical numbering, as for self-alignments. See getSymmetryOrder(AFPChain, int, float) for a way to guess the sequential correspondence between two proteins.

      Parameters:
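      alignment - The alignment to test for symmetry
      maxSymmetry - Maximum symmetry to consider. High values increase the calculation time and can lead to overfitting.
      minimumMetricChange - Percent decrease in root mean squared offsets in order to declare symmetry. 0.4f seems to work well for CeSymm.
      Returns:
      The order of symmetry of alignment, or 1 if no order <= maxSymmetry is found.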
    • getSymmetryOrder

      public static int getSymmetryOrder(Map<Integer,Integer> alignment, Map<Integer,Integer> identity, int maxSymmetry, float minimumMetricChange)
      Tries to detect symmetry in an alignment.

      Conceptually, an alignment is a function f:A->B between two sets of integers. The function may have simple topology (meaning that if two elements of A are close, then their images in B will also be close), or may have more complex topology (such as a circular permutation). This function checks alignment against a reference function identity, which should have simple topology. It then tries to determine the symmetry order of alignment relative to identity, up to a maximum order of maxSymmetry.

      Details
      Considers the offset (in number of residues) which a residue moves after undergoing n alternating transforms by alignment and identity. If n corresponds to the intrinsic order of the alignment, this will be small. This algorithm tries increasing values of n and looks for abrupt decreases in the root mean squared offset. If none are found at n<=maxSymmetry, the alignment is reported as non-symmetric.

      Parameters:
      alignment - The alignment to test for symmetry
      identity - An alignment with simple topology which approximates the sequential relationship between the two proteins. Should map in the reverse direction from alignment.
      maxSymmetry - Maximum symmetry to consider. High values increase the calculation time and can lead to overfitting.
      minimumMetricChange - Percent decrease in root mean squared offsets in order to declare symmetry. 0.4f seems to work well for CeSymm.
      Returns:
      The order of symmetry of alignment, or 1 if no order <= maxSymmetry is found.
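      The metric described under Details can be illustrated for the simplest case, a self-alignment with a true identity function (X->X). The sketch below only computes the root mean squared offset for a given n; it does not reproduce the library's thresholding with minimumMetricChange, and all names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SymmetryMetricSketch {

    /** RMS of f^n(x) - x over all x for which f^n(x) is defined. */
    static double rmsOffset(Map<Integer, Integer> f, int n) {
        double sumSq = 0;
        int count = 0;
        for (Integer x : f.keySet()) {
            Integer y = x;
            for (int i = 0; i < n && y != null; i++) {
                y = f.get(y);
            }
            if (y != null) {
                double d = y - x;
                sumSq += d * d;
                count++;
            }
        }
        return count == 0 ? Double.NaN : Math.sqrt(sumSq / count);
    }

    /** Toy self-alignment: residue x maps to (x + shift) mod length. */
    static Map<Integer, Integer> rotation(int length, int shift) {
        Map<Integer, Integer> f = new HashMap<>();
        for (int x = 0; x < length; x++) {
            f.put(x, (x + shift) % length);
        }
        return f;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = rotation(6, 2);
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " " + rmsOffset(f, n));
        }
    }
}
```

      For this toy rotation the offset stays large for n = 1 and n = 2 and drops to zero at n = 3, which is the kind of abrupt decrease the method looks for.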
    • getSymmetryOrder

      public static int getSymmetryOrder(AFPChain afpChain, int maxSymmetry, float minimumMetricChange) throws StructureException
      Guesses the order of symmetry in an alignment

      Uses getSymmetryOrder(Map alignment, Map identity, int, float) to determine the symmetry order. For the identity alignment, sorts the aligned residues of each protein sequentially, then defines the ith residues of each protein to be equivalent.

      Note that the selection of the identity alignment here is very naive, and only works for proteins with very good coverage. Wherever possible, it is better to construct an identity function explicitly from a sequence alignment (or use an AlignmentTools.IdentityMap for internally symmetric proteins) and use getSymmetryOrder(Map, Map, int, float).

      Throws:
      StructureException
    • guessSequentialAlignment

      public static Map<Integer,Integer> guessSequentialAlignment(Map<Integer,Integer> alignment, boolean inverseAlignment)
      Takes a potentially non-sequential alignment and guesses a sequential version of it. Residues from each structure are sorted sequentially and then compared directly.

      The results of this method are consistent with what one might expect from an identity function, and are therefore useful with getSymmetryOrder(Map, Map identity, int, float).

      • Perfect self-alignments will have the same pre-image and image, so will map X->X
      • Gaps and alignment errors will cause errors in the resulting map, but only locally. Errors do not propagate through the whole alignment.

      Example:

      A non-sequential alignment, represented schematically as
       12456789
       78912345
      would result in a map
       12456789
       12345789
      Parameters:
      alignment - The non-sequential input alignment
      inverseAlignment - If false, map from structure1 to structure2. If true, generate the inverse of that map.
      Returns:
      A mapping from sequential residues of one protein to those of the other
      Throws:
      IllegalArgumentException - if the input alignment is not one-to-one.
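      The sort-and-pair idea can be sketched with standard collections. This is an illustration rather than the BioJava code, the names are hypothetical, and the example in main is the schematic alignment shown above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SequentialGuessSketch {

    /** Sorts both residue sets and pairs the i-th residues of each. */
    static Map<Integer, Integer> guess(Map<Integer, Integer> alignment, boolean inverse) {
        List<Integer> first = new ArrayList<>(alignment.keySet());
        List<Integer> second = new ArrayList<>(alignment.values());
        if (new HashSet<>(second).size() != second.size()) {
            throw new IllegalArgumentException("alignment is not one-to-one");
        }
        Collections.sort(first);
        Collections.sort(second);
        Map<Integer, Integer> result = new TreeMap<>();
        for (int i = 0; i < first.size(); i++) {
            if (inverse) {
                result.put(second.get(i), first.get(i));
            } else {
                result.put(first.get(i), second.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] top = {1, 2, 4, 5, 6, 7, 8, 9};
        int[] bottom = {7, 8, 9, 1, 2, 3, 4, 5};
        Map<Integer, Integer> alignment = new TreeMap<>();
        for (int i = 0; i < top.length; i++) {
            alignment.put(top[i], bottom[i]);
        }
        System.out.println(guess(alignment, false));
    }
}
```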
    • getOptAlnAsList

      public static List<List<List<Integer>>> getOptAlnAsList(AFPChain afpChain)
      Retrieves the optimum alignment from an AFPChain and returns it as a Java collection. The result is indexed in the same way as AFPChain.getOptAln(), but has the correct size().
      
       List<List<List<Integer>>> aln = AlignmentTools.getOptAlnAsList(afpChain);
       aln.get(blockNum).get(structureNum).get(pos); // structureNum is 0 or 1
       
      Parameters:
      afpChain -
      Returns:
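      Why a list can have "the correct size()" where the raw array may not can be shown with a small sketch. It assumes, as the description suggests, that the int[][][] array may be allocated longer than the alignment and that a separate length per block (optLen here) gives the number of valid columns. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OptAlnListSketch {

    /** Copies only the first optLen[b] columns of each block into lists. */
    static List<List<List<Integer>>> toList(int[][][] optAln, int[] optLen) {
        List<List<List<Integer>>> blocks = new ArrayList<>();
        for (int b = 0; b < optLen.length; b++) {
            List<List<Integer>> chains = new ArrayList<>();
            for (int chain = 0; chain < 2; chain++) {
                List<Integer> positions = new ArrayList<>();
                for (int pos = 0; pos < optLen[b]; pos++) {
                    positions.add(optAln[b][chain][pos]);
                }
                chains.add(positions);
            }
            blocks.add(chains);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Arrays of length 5, of which only 3 columns are valid.
        int[][][] optAln = { { {1, 2, 3, 0, 0}, {4, 5, 6, 0, 0} } };
        System.out.println(toList(optAln, new int[] {3}));
    }
}
```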
    • createAFPChain

      public static AFPChain createAFPChain(Atom[] ca1, Atom[] ca2, ResidueNumber[] aligned1, ResidueNumber[] aligned2) throws StructureException
      Fundamentally, an alignment is just a list of aligned residues in each protein. This method converts two lists of ResidueNumbers into an AFPChain.

      Parameters are filled with defaults (often null) or sometimes calculated.

      For a way to modify the alignment of an existing AFPChain, see replaceOptAln(AFPChain, Atom[], Atom[], Map)

      Parameters:
      ca1 - CA atoms of the first protein
      ca2 - CA atoms of the second protein
      aligned1 - A list of aligned residues from the first protein
      aligned2 - A list of aligned residues from the second protein. Must be the same length as aligned1.
      Returns:
      An AFPChain representing the alignment. Many properties may be null or another default.
      Throws:
      StructureException - if an error occurred during superposition
      IllegalArgumentException - if aligned1 and aligned2 have different lengths
    • splitBlocksByTopology

      public static AFPChain splitBlocksByTopology(AFPChain a, Atom[] ca1, Atom[] ca2) throws StructureException
      Parameters:
      a -
      ca1 -
      ca2 -
      Returns:
      Throws:
      StructureException - if an error occurred during superposition
    • replaceOptAln

      public static AFPChain replaceOptAln(int[][][] newAlgn, AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
      Replaces the optimal alignment of an AFPChain and recalculates all the alignment scores and variables.
      Throws:
      StructureException
    • replaceOptAln

      public static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Map<Integer,Integer> alignment) throws StructureException
      Takes an AFPChain and replaces the optimal alignment based on an alignment map.

      Parameters are filled with defaults (often null) or sometimes calculated.

      For a way to create a new AFPChain, see createAFPChain(Atom[], Atom[], ResidueNumber[], ResidueNumber[])

      Parameters:
      afpChain - The alignment to be modified
      alignment - The new alignment, as a Map
      Throws:
      StructureException - if an error occurred during superposition
    • replaceOptAln

      public static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int blockNum, int[] optLens, int[][][] optAln) throws StructureException
      Parameters:
      afpChain - Input AFPChain. Not modified.
      ca1 -
      ca2 -
      optLens -
      optAln -
      Returns:
      A new AFPChain based on the input, but with the optAln modified
      Throws:
      StructureException - if an error occurred during superposition
    • updateSuperposition

      public static void updateSuperposition(AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
      After the alignment changes (optAln, optLen, blockNum, at a minimum), many other properties which depend on the superposition will be invalid. This method re-runs a rigid superposition over the whole alignment and repopulates the required properties, including RMSD (TotalRMSD) and TM-Score.
      Parameters:
      afpChain -
      ca1 -
      ca2 - Second set of ca atoms. Will be modified based on the superposition
      Throws:
      StructureException
    • resizeArray

      public static Object resizeArray(Object oldArray, int newSize)
      Reallocates an array with a new size, and copies the contents of the old array to the new array.
      Parameters:
      oldArray - the old array, to be reallocated.
      newSize - the new array size.
      Returns:
      A new array with the same contents.
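      Because the method takes and returns Object, it presumably works on arrays of any component type. One common way to do that uses java.lang.reflect.Array, as in this minimal sketch (the class and method names are hypothetical, and the library may do it differently):

```java
import java.lang.reflect.Array;

final class ResizeArraySketch {

    /** Allocates an array of the same component type and copies what fits. */
    static Object resize(Object oldArray, int newSize) {
        int oldSize = Array.getLength(oldArray);
        Class<?> elementType = oldArray.getClass().getComponentType();
        Object newArray = Array.newInstance(elementType, newSize);
        int preserved = Math.min(oldSize, newSize);
        if (preserved > 0) {
            System.arraycopy(oldArray, 0, newArray, 0, preserved);
        }
        return newArray;
    }

    public static void main(String[] args) {
        int[] grown = (int[]) resize(new int[] {1, 2, 3}, 5);
        System.out.println(grown.length);
    }
}
```

      The caller has to cast the result back to the concrete array type. When the new size is smaller, trailing elements are dropped in this sketch.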
    • toConciseAlignmentString

      public static <S, T> String toConciseAlignmentString(Map<S,T> alignment, Map<T,S> identity)
      Print an alignment map in a concise representation. Edges are given as two numbers separated by '>'. They are chained together where possible, or separated by spaces where disjoint or branched.

      Note that more concise representations may be possible.

      Examples:
      • 1>2>3>1
      • 1>2>3>2 4>3
      Parameters:
      alignment - The input function, as a map (see alignmentAsMap(AFPChain))
      identity - An identity-like function providing the isomorphism between the codomain of alignment (of type T) and the domain (type S).
      Returns:
    • toConciseAlignmentString

      public static <T> String toConciseAlignmentString(Map<T,T> alignment)
      See Also:
    • fromConciseAlignmentString

      See Also:
    • calculateBlockGap

      public static int[] calculateBlockGap(int[][][] optAln)
      Calculates the number of gaps in each subunit block of an optimal AFP alignment.
      Parameters:
      optAln - an optimal alignment in the format int[][][]
      Returns:
      an int[] with one entry per block (its length is the number of blocks), where entry [block] holds the number of gaps in that block
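      A gap count per block can be sketched as follows. The definition of a gap is an assumption made for this illustration: a gap is counted whenever two consecutive columns of a block are not contiguous in at least one of the two chains. The library's exact counting rule is not stated here and may differ. Names are hypothetical.

```java
import java.util.Arrays;

final class BlockGapSketch {

    /** optAln[block][chain][pos]; returns the gap count of each block. */
    static int[] blockGaps(int[][][] optAln) {
        int[] gaps = new int[optAln.length];
        for (int b = 0; b < optAln.length; b++) {
            int[] chain1 = optAln[b][0];
            int[] chain2 = optAln[b][1];
            for (int j = 1; j < chain1.length; j++) {
                boolean jump1 = chain1[j] != chain1[j - 1] + 1;
                boolean jump2 = chain2[j] != chain2[j - 1] + 1;
                if (jump1 || jump2) {
                    gaps[b]++; // assumed rule: one gap per non-contiguous step
                }
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        int[][][] optAln = {
            { {1, 2, 3, 6, 7}, {1, 2, 3, 4, 5} },
            { {10, 11}, {20, 21} }
        };
        System.out.println(Arrays.toString(blockGaps(optAln)));
    }
}
```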
    • alignmentToSIF

      public static void alignmentToSIF(Writer out, AFPChain afpChain, Atom[] ca1, Atom[] ca2, String backboneInteraction, String alignmentInteraction) throws IOException
      Creates a simple interaction format (SIF) file for an alignment. The SIF file can be read by network software (e.g. Cytoscape) to analyze alignments as graphs. This function creates a graph with residues as nodes and two types of edges:
      1. backbone edges, which connect adjacent residues in the aligned protein
      2. alignment edges, which connect aligned residues
      Parameters:
      out - Stream to write to
      afpChain - alignment to write
      ca1 - First protein, used to generate node names
      ca2 - Second protein, used to generate node names
      backboneInteraction - Two-letter string used to identify backbone edges
      alignmentInteraction - Two-letter string used to identify alignment edges
      Throws:
      IOException
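      The resulting graph can be pictured with a small sketch that writes one SIF line per edge in the form "source interaction target". Everything specific here is an assumption for illustration: the node names, the tab separator, the index-pair input and the interaction strings "bb" and "ak" are made up, and the real method derives node names from the Atom arrays.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class SifSketch {

    /** Writes backbone edges for both proteins, then alignment edges. */
    static void writeSif(Writer out, String[] nodes1, String[] nodes2,
            int[][] alignedPairs, String backbone, String alignment) throws IOException {
        for (String[] nodes : new String[][] {nodes1, nodes2}) {
            for (int i = 0; i + 1 < nodes.length; i++) {
                out.write(nodes[i] + "\t" + backbone + "\t" + nodes[i + 1] + "\n");
            }
        }
        for (int[] pair : alignedPairs) {
            // pair[0] indexes nodes1, pair[1] indexes nodes2
            out.write(nodes1[pair[0]] + "\t" + alignment + "\t" + nodes2[pair[1]] + "\n");
        }
    }

    /** Convenience wrapper that returns the SIF text as a String. */
    static String toSifString(String[] nodes1, String[] nodes2,
            int[][] alignedPairs, String backbone, String alignment) {
        StringWriter out = new StringWriter();
        try {
            writeSif(out, nodes1, nodes2, alignedPairs, backbone, alignment);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toSifString(new String[] {"A1", "A2"},
                new String[] {"B1", "B2"}, new int[][] { {0, 1} }, "bb", "ak"));
    }
}
```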
    • getAlignedModel

      public static final List<Chain> getAlignedModel(Atom[] ca)
      Gets an artificial List of Chains containing the Atoms and Groups. Does NOT rotate anything.
      Parameters:
      ca -
      Returns:
      a list of Chains that is built up from the Atoms in the ca array
    • getAlignedStructure

      public static final Structure getAlignedStructure(Atom[] ca1, Atom[] ca2) throws StructureException
      Gets an artificial Structure containing both chains. Does NOT rotate anything.
      Parameters:
      ca1 -
      ca2 -
      Returns:
      a structure object containing two models, one for each set of Atoms.
      Throws:
      StructureException
    • prepareGroupsForDisplay

      public static Group[] prepareGroupsForDisplay(AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
      Rotates the Atoms/Groups so they are aligned for the 3D visualisation.
      Parameters:
      afpChain -
      ca1 -
      ca2 -
      Returns:
      an array of Groups that are transformed for 3D display
      Throws:
      StructureException
    • shiftCA2

      public static void shiftCA2(AFPChain afpChain, Atom[] ca2, Matrix m, Atom shift, Group[] twistedGroups)
      Shifts only the CA positions.
    • fillAlignedAtomArrays

      public static void fillAlignedAtomArrays(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Atom[] ca1aligned, Atom[] ca2aligned)
      Fill the aligned Atom arrays with the equivalent residues in the afpChain.
      Parameters:
      afpChain -
      ca1 -
      ca2 -
      ca1aligned -
      ca2aligned -
    • deleteHighestDistanceColumn

      public static AFPChain deleteHighestDistanceColumn(AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
      Find the alignment position with the highest atomic distance between the equivalent atomic positions of the arrays and remove it from the alignment.
      Parameters:
      afpChain - original alignment, will be modified
      ca1 - atom array, will not be modified
      ca2 - atom array, will not be modified
      Returns:
      the original alignment, with the alignment position at the highest distance removed
      Throws:
      StructureException
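      Locating the column to remove amounts to an arg-max over pairwise distances. The sketch below works on plain coordinate arrays and assumes the two sets are already superposed and listed column by column; whether and how the library superposes the atoms first is not stated here. The class and method names are hypothetical.

```java
final class WorstColumnSketch {

    /** Returns the index i with the largest distance between a[i] and b[i]. */
    static int highestDistanceColumn(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("coordinate lists differ in length");
        }
        int worst = -1;
        double maxSq = -1;
        for (int i = 0; i < a.length; i++) {
            double dx = a[i][0] - b[i][0];
            double dy = a[i][1] - b[i][1];
            double dz = a[i][2] - b[i][2];
            double distSq = dx * dx + dy * dy + dz * dz; // squared distance suffices
            if (distSq > maxSq) {
                maxSq = distSq;
                worst = i;
            }
        }
        return worst; // -1 for an empty alignment
    }

    public static void main(String[] args) {
        double[][] a = { {0, 0, 0}, {1, 0, 0}, {2, 0, 0} };
        double[][] b = { {0, 0, 1}, {1, 3, 0}, {2, 0, 0.5} };
        System.out.println(highestDistanceColumn(a, b));
    }
}
```

      The returned index would then identify the block and position to delete from the alignment.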
    • deleteColumn

      public static AFPChain deleteColumn(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int block, int pos) throws StructureException
      Delete an alignment position from the original alignment object.
      Parameters:
      afpChain - original alignment, will be modified
      ca1 - atom array, will not be modified
      ca2 - atom array, will not be modified
      block - block of the alignment position
      pos - position index in the block
      Returns:
      the original alignment, with the alignment position removed
      Throws:
      StructureException