Class AlignmentTools


  • public class AlignmentTools
    extends Object
    Methods for analyzing and manipulating AFPChains and for other pairwise alignment utilities.

    Current methods: replace optimal alignment, create new AFPChain, format conversion, update superposition, etc.

    Author:
    Spencer Bliven, Aleix Lafita
    • Field Detail

      • debug

        public static boolean debug
    • Method Detail

      • isSequentialAlignment

        public static boolean isSequentialAlignment​(AFPChain afpChain,
                                                    boolean checkWithinBlocks)
        Checks that the alignment given by afpChain is sequential. This means that the residue indices of both proteins increase monotonically as a function of the alignment position (i.e. both proteins are sorted). This will return false for circularly permuted alignments or other non-topological alignments. It will also return false for cases where the alignment itself is sequential but is not stored in the afpChain in a sorted manner.

        Since algorithms which create non-sequential alignments split the alignment into multiple blocks, some computational time can be saved by only checking block boundaries for sequentiality. Setting checkWithinBlocks to true makes this function slower, but detects AFPChains with non-sequential blocks.

        Note that this method should give the same results as AFPChain.isSequentialAlignment(). However, the AFPChain version relies on the StructureAlignment algorithm correctly setting this parameter, which is not always the case.
        Parameters:
        afpChain - An alignment
        checkWithinBlocks - Indicates whether individual blocks should be checked for sequentiality
        Returns:
        True if the alignment is sequential.
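        The difference between checking only block boundaries and checking within blocks can be illustrated with a minimal sketch on plain arrays. It assumes the alignment is laid out as optAln[block][structure][position] with optLen[block] valid positions per block; the class and method names are hypothetical and this is not the library implementation.

```java
final class SequentialCheckSketch {
    /** Returns true if residue indices of both structures strictly increase. */
    static boolean isSequential(int[][][] optAln, int[] optLen, boolean checkWithinBlocks) {
        int prev1 = -1; // last residue index seen in structure 1
        int prev2 = -1; // last residue index seen in structure 2
        for (int b = 0; b < optAln.length; b++) {
            if (optLen[b] < 1) continue; // skip empty blocks
            // Block boundary check: the block must start after the previous block ended
            if (optAln[b][0][0] <= prev1 || optAln[b][1][0] <= prev2) return false;
            if (checkWithinBlocks) {
                // Slower check: every position inside the block must also increase
                for (int i = 1; i < optLen[b]; i++) {
                    if (optAln[b][0][i] <= optAln[b][0][i - 1]
                            || optAln[b][1][i] <= optAln[b][1][i - 1]) return false;
                }
            }
            prev1 = optAln[b][0][optLen[b] - 1];
            prev2 = optAln[b][1][optLen[b] - 1];
        }
        return true;
    }

    public static void main(String[] args) {
        // Second block maps back to the start of structure 2 (a circular permutation)
        int[][][] permuted = {{{0, 1, 2}, {5, 6, 7}}, {{3, 4}, {0, 1}}};
        System.out.println(isSequential(permuted, new int[] {3, 2}, false));
    }
}
```

        In this sketch a block whose positions are unsorted internally is only detected when checkWithinBlocks is true, which mirrors the trade-off described above.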
      • alignmentAsMap

        public static Map<Integer,Integer> alignmentAsMap(AFPChain afpChain)
                                                         throws StructureException
        Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.

        For example,

         1234
         5678
        becomes
         1->5
         2->6
         3->7
         4->8
        Parameters:
        afpChain - An alignment
        Returns:
        A mapping from aligned residues of protein 1 to their partners in protein 2.
        Throws:
        StructureException - If afpChain is not one-to-one
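        As a rough illustration of the mapping described above, the following sketch builds the map from two parallel arrays of residue indices. The array inputs, the class name and the use of IllegalArgumentException (in place of StructureException) are assumptions made to keep the example free of BioJava types.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class AlignmentMapSketch {
    /** Maps res1[i] -> res2[i], rejecting alignments that are not one-to-one. */
    static Map<Integer, Integer> asMap(int[] res1, int[] res2) {
        if (res1.length != res2.length) {
            throw new IllegalArgumentException("Aligned arrays differ in length");
        }
        Map<Integer, Integer> map = new LinkedHashMap<>();
        Set<Integer> used2 = new HashSet<>(); // residues of protein 2 already aligned
        for (int i = 0; i < res1.length; i++) {
            if (map.containsKey(res1[i]) || !used2.add(res2[i])) {
                throw new IllegalArgumentException("Alignment is not one-to-one");
            }
            map.put(res1[i], res2[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        // The 1234 / 5678 example from the description
        System.out.println(asMap(new int[] {1, 2, 3, 4}, new int[] {5, 6, 7, 8}));
    }
}
```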
      • applyAlignment

        public static <T> Map<T,T> applyAlignment(Map<T,T> alignmentMap,
                                                  int k)
        Applies an alignment k times. E.g. if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)).
        Type Parameters:
        T -
        Parameters:
        alignmentMap - The input function, as a map (see alignmentAsMap(AFPChain))
        k - The number of times to apply the alignment
        Returns:
        A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
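        A minimal sketch of repeated application, assuming the alignment is an ordinary java.util.Map; the class name is hypothetical and the library's own implementation may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

final class ApplyAlignmentSketch {
    /** Returns x -> f^k(x) for every key x of f; undefined results are stored as null. */
    static <T> Map<T, T> apply(Map<T, T> f, int k) {
        Map<T, T> result = new HashMap<>();
        for (T x : f.keySet()) {
            T y = x;
            // Apply f up to k times, stopping early once the value leaves the domain
            for (int i = 0; i < k && y != null; i++) {
                y = f.get(y);
            }
            result.put(x, y);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = new HashMap<>();
        f.put(1, 2);
        f.put(2, 3);
        f.put(3, 1);
        System.out.println(apply(f, 2)); // f applied twice to each key
    }
}
```

        For a map that is not one-to-one onto its own keys, such as 1->2, 2->3, applying it twice leaves 2 mapped to null because 3 has no image.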
      • applyAlignment

        public static <S,T> Map<S,T> applyAlignment(Map<S,T> alignmentMap,
                                                    Map<T,S> identity,
                                                    int k)
        Applies an alignment k times. E.g. if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)). To allow for functions with different domains and codomains, the identity function allows converting back in a reasonable way. For instance, if alignmentMap represented an alignment between two proteins with different numbering schemes, the identity function could calculate the offset between residue numbers, e.g. I(x) = x-offset. When an identity function is provided, the returned function calculates f^k(x) = f(I( f(I( ... f(x) ... )) )).
        Type Parameters:
        S -
        T -
        Parameters:
        alignmentMap - The input function, as a map (see alignmentAsMap(AFPChain))
        identity - An identity-like function providing the isomorphism between the codomain of alignmentMap (of type T) and the domain (type S).
        k - The number of times to apply the alignment
        Returns:
        A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
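        The alternation f(I(f(I(...f(x)...)))) can be sketched as follows. The example assumes a hypothetical numbering offset of 100 between the two proteins; the class name and the requirement k >= 1 are choices made for this sketch, not documented behaviour of the library.

```java
import java.util.HashMap;
import java.util.Map;

final class ApplyWithIdentitySketch {
    /** Returns x -> f(I(f(...f(x)...))) with f applied k times (k >= 1). */
    static <S, T> Map<S, T> apply(Map<S, T> f, Map<T, S> identity, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        Map<S, T> result = new HashMap<>();
        for (S x : f.keySet()) {
            T y = f.get(x); // first application of f
            for (int i = 1; i < k && y != null; i++) {
                S back = identity.get(y);               // convert codomain -> domain
                y = (back == null) ? null : f.get(back); // next application of f
            }
            result.put(x, y);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = new HashMap<>();
        Map<Integer, Integer> identity = new HashMap<>();
        f.put(1, 102);
        f.put(2, 103);
        f.put(3, 101);
        for (int r = 101; r <= 103; r++) identity.put(r, r - 100); // I(x) = x - offset
        System.out.println(apply(f, identity, 2));
    }
}
```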
      • getSymmetryOrder

        public static int getSymmetryOrder​(Map<Integer,​Integer> alignment,
                                           Map<Integer,​Integer> identity,
                                           int maxSymmetry,
                                           float minimumMetricChange)
        Tries to detect symmetry in an alignment.

        Conceptually, an alignment is a function f:A->B between two sets of integers. The function may have simple topology (meaning that if two elements of A are close, then their images in B will also be close), or may have more complex topology (such as a circular permutation). This function checks alignment against a reference function identity, which should have simple topology. It then tries to determine the symmetry order of alignment relative to identity, up to a maximum order of maxSymmetry.

        Details
        Considers the offset (in number of residues) which a residue moves after undergoing n alternating transforms by alignment and identity. If n corresponds to the intrinsic order of the alignment, this will be small. This algorithm tries increasing values of n and looks for abrupt decreases in the root mean squared offset. If none are found at n<=maxSymmetry, the alignment is reported as non-symmetric.

        Parameters:
        alignment - The alignment to test for symmetry
        identity - An alignment with simple topology which approximates the sequential relationship between the two proteins. Should map in the reverse direction from alignment.
        maxSymmetry - Maximum symmetry to consider. High values increase the calculation time and can lead to overfitting.
        minimumMetricChange - Percent decrease in root mean squared offsets in order to declare symmetry. 0.4f seems to work well for CeSymm.
        Returns:
        The order of symmetry of alignment, or 1 if no order <= maxSymmetry is found.
        See Also:
        For a simple identity function
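        The idea in the Details paragraph (track the root mean squared offset after n alternating transforms and look for an abrupt drop) can be sketched as below. This is a simplified illustration, not the library code: the class name is hypothetical, and it interprets minimumMetricChange as the required relative decrease compared with the best value seen so far, which is an assumption. The exact comparison used by the library may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SymmetryOrderSketch {
    static int symmetryOrder(Map<Integer, Integer> alignment, Map<Integer, Integer> identity,
                             int maxSymmetry, double minimumMetricChange) {
        List<Integer> preimage = new ArrayList<>(alignment.keySet());
        List<Integer> image = new ArrayList<>(preimage); // current f^n image of each residue
        int bestOrder = 1;
        double bestMetric = Double.POSITIVE_INFINITY;
        boolean found = false;
        for (int n = 1; n <= maxSymmetry && !found; n++) {
            long sumSq = 0;
            int count = 0;
            for (int i = 0; i < image.size(); i++) {
                Integer pre = image.get(i);
                Integer mid = (pre == null) ? null : alignment.get(pre);
                Integer post = (mid == null) ? null : identity.get(mid);
                image.set(i, post);
                if (post != null) {
                    long delta = post - preimage.get(i); // offset after n transforms
                    sumSq += delta * delta;
                    count++;
                }
            }
            if (count == 0) break; // nothing left to compare
            double rms = Math.sqrt((double) sumSq / count);
            if (rms < bestMetric * (1.0 - minimumMetricChange)) {
                // n = 1 only establishes the baseline; later drops count as symmetry
                found = bestMetric < Double.POSITIVE_INFINITY;
                bestOrder = n;
                bestMetric = rms;
            }
        }
        return found ? bestOrder : 1;
    }

    public static void main(String[] args) {
        // Cyclic shift by 4 on 12 residues: three applications return to the start
        Map<Integer, Integer> alignment = new HashMap<>();
        Map<Integer, Integer> identity = new HashMap<>();
        for (int i = 0; i < 12; i++) {
            alignment.put(i, (i + 4) % 12);
            identity.put(i, i);
        }
        System.out.println(symmetryOrder(alignment, identity, 8, 0.4));
    }
}
```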
      • guessSequentialAlignment

        public static Map<Integer,Integer> guessSequentialAlignment(Map<Integer,Integer> alignment,
                                                                    boolean inverseAlignment)
        Takes a potentially non-sequential alignment and guesses a sequential version of it. Residues from each structure are sorted sequentially and then compared directly.

        The results of this method are consistent with what one might expect from an identity function, and are therefore useful with getSymmetryOrder(Map, Map identity, int, float).

        • Perfect self-alignments will have the same pre-image and image, so will map X->X
        • Gaps and alignment errors will cause errors in the resulting map, but only locally. Errors do not propagate through the whole alignment.

        Example:

        A non-sequential alignment, represented schematically as
         12456789
         78912345
        would result in a map
         12456789
         12345789
        Parameters:
        alignment - The non-sequential input alignment
        inverseAlignment - If false, map from structure1 to structure2. If true, generate the inverse of that map.
        Returns:
        A mapping from sequential residues of one protein to those of the other
        Throws:
        IllegalArgumentException - if the input alignment is not one-to-one.
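        The "sort both sides, then pair them up" idea can be sketched with standard collections. The class name is hypothetical, and the duplicate check is one plausible way to detect an input that is not one-to-one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GuessSequentialSketch {
    static Map<Integer, Integer> guessSequential(Map<Integer, Integer> alignment, boolean inverse) {
        List<Integer> keys = new ArrayList<>(alignment.keySet());
        List<Integer> values = new ArrayList<>(alignment.values());
        if (new HashSet<>(values).size() != values.size()) {
            throw new IllegalArgumentException("Alignment is not one-to-one");
        }
        // Sort residues of each structure independently, then compare directly
        Collections.sort(keys);
        Collections.sort(values);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            if (inverse) {
                result.put(values.get(i), keys.get(i)); // structure2 -> structure1
            } else {
                result.put(keys.get(i), values.get(i)); // structure1 -> structure2
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The 12456789 / 78912345 example from the description
        int[] from = {1, 2, 4, 5, 6, 7, 8, 9};
        int[] to = {7, 8, 9, 1, 2, 3, 4, 5};
        Map<Integer, Integer> alignment = new LinkedHashMap<>();
        for (int i = 0; i < from.length; i++) alignment.put(from[i], to[i]);
        System.out.println(guessSequential(alignment, false));
    }
}
```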
      • getOptAlnAsList

        public static List<List<List<Integer>>> getOptAlnAsList​(AFPChain afpChain)
        Retrieves the optimum alignment from an AFPChain and returns it as a Java collection. The result is indexed in the same way as AFPChain.getOptAln(), but has the correct size().
         List<List<List<Integer>>> aln = getOptAlnAsList(afpChain);
         aln.get(blockNum).get(structureNum={0,1}).get(pos)
        Parameters:
        afpChain -
        Returns:
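        A sketch of the conversion on plain arrays, assuming the backing int[][][] may be allocated larger than the number of valid positions, so that blockNum and optLen determine the real sizes. The class name and the explicit blockNum and optLen parameters are hypothetical stand-ins for values that would come from the AFPChain.

```java
import java.util.ArrayList;
import java.util.List;

final class OptAlnListSketch {
    /** Copies only the valid part of optAln[block][structure][pos] into nested lists. */
    static List<List<List<Integer>>> asList(int[][][] optAln, int blockNum, int[] optLen) {
        List<List<List<Integer>>> blocks = new ArrayList<>(blockNum);
        for (int b = 0; b < blockNum; b++) {
            List<List<Integer>> block = new ArrayList<>(2);
            for (int s = 0; s < 2; s++) { // structure 0 and structure 1
                List<Integer> positions = new ArrayList<>(optLen[b]);
                for (int pos = 0; pos < optLen[b]; pos++) {
                    positions.add(optAln[b][s][pos]);
                }
                block.add(positions);
            }
            blocks.add(block);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // One block with two valid positions; trailing zeros are unused padding
        int[][][] optAln = {{{3, 4, 0, 0}, {10, 11, 0, 0}}};
        List<List<List<Integer>>> aln = asList(optAln, 1, new int[] {2});
        System.out.println(aln.get(0).get(1).size());
    }
}
```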
      • createAFPChain

        public static AFPChain createAFPChain​(Atom[] ca1,
                                              Atom[] ca2,
                                              ResidueNumber[] aligned1,
                                              ResidueNumber[] aligned2)
                                       throws StructureException
        Fundamentally, an alignment is just a list of aligned residues in each protein. This method converts two lists of ResidueNumbers into an AFPChain.

        Parameters are filled with defaults (often null) or sometimes calculated.

        For a way to modify the alignment of an existing AFPChain, see replaceOptAln(AFPChain, Atom[], Atom[], Map)

        Parameters:
        ca1 - CA atoms of the first protein
        ca2 - CA atoms of the second protein
        aligned1 - A list of aligned residues from the first protein
        aligned2 - A list of aligned residues from the second protein. Must be the same length as aligned1.
        Returns:
        An AFPChain representing the alignment. Many properties may be null or another default.
        Throws:
        StructureException - if an error occurred during superposition
        IllegalArgumentException - if aligned1 and aligned2 have different lengths
        See Also:
        replaceOptAln(AFPChain, Atom[], Atom[], Map)
      • replaceOptAln

        public static AFPChain replaceOptAln​(AFPChain afpChain,
                                             Atom[] ca1,
                                             Atom[] ca2,
                                             int blockNum,
                                             int[] optLens,
                                             int[][][] optAln)
                                      throws StructureException
        Parameters:
        afpChain - Input afpchain. UNMODIFIED
        ca1 -
        ca2 -
        optLens -
        optAln -
        Returns:
        A NEW AfpChain based off the input but with the optAln modified
        Throws:
        StructureException - if an error occurred during superposition
      • updateSuperposition

        public static void updateSuperposition​(AFPChain afpChain,
                                               Atom[] ca1,
                                               Atom[] ca2)
                                        throws StructureException
        After the alignment changes (optAln, optLen, blockNum, at a minimum), many other properties which depend on the superposition will be invalid. This method re-runs a rigid superposition over the whole alignment and repopulates the required properties, including RMSD (TotalRMSD) and TM-Score.
        Parameters:
        afpChain -
        ca1 -
        ca2 - Second set of ca atoms. Will be modified based on the superposition
        Throws:
        StructureException
      • resizeArray

        public static Object resizeArray​(Object oldArray,
                                         int newSize)
        Reallocates an array with a new size, and copies the contents of the old array to the new array.
        Parameters:
        oldArray - the old array, to be reallocated.
        newSize - the new array size.
        Returns:
        A new array with the same contents.
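        Reallocating an array of unknown component type is commonly done with java.lang.reflect.Array. The following is a minimal sketch of that technique under a hypothetical class name; it is not necessarily how the library implements the method.

```java
import java.lang.reflect.Array;

final class ResizeArraySketch {
    static Object resizeArray(Object oldArray, int newSize) {
        int oldSize = Array.getLength(oldArray);
        Class<?> elementType = oldArray.getClass().getComponentType();
        // Allocate a new array of the same component type
        Object newArray = Array.newInstance(elementType, newSize);
        // Copy as many elements as fit into the new array
        int preserved = Math.min(oldSize, newSize);
        if (preserved > 0) {
            System.arraycopy(oldArray, 0, newArray, 0, preserved);
        }
        return newArray;
    }

    public static void main(String[] args) {
        int[] grown = (int[]) resizeArray(new int[] {1, 2, 3}, 5);
        System.out.println(grown.length);
    }
}
```

        Because the return type is Object, callers cast the result back to the concrete array type, as in the main method above.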
      • toConciseAlignmentString

        public static <S,T> String toConciseAlignmentString(Map<S,T> alignment,
                                                            Map<T,S> identity)
        Print an alignment map in a concise representation. Edges are given as two numbers separated by '>'. They are chained together where possible, or separated by spaces where disjoint or branched.

        Note that more concise representations may be possible.

        Examples:
        • 1>2>3>1
        • 1>2>3>2 4>3
        Parameters:
        alignment - The input function, as a map (see alignmentAsMap(AFPChain))
        identity - An identity-like function providing the isomorphism between the codomain of alignment (of type T) and the domain (type S).
        Returns:
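    One way to produce chained output like the examples above is to follow each edge until a node repeats. The sketch below is a simplified illustration for Integer maps whose domain and codomain coincide, so no identity map is needed; the class name and the traversal order (smallest unvisited key first) are assumptions, not the library's algorithm.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ConciseStringSketch {
    static String toConciseString(Map<Integer, Integer> alignment) {
        Set<Integer> visited = new HashSet<>();
        StringBuilder sb = new StringBuilder();
        for (Integer start : new TreeSet<>(alignment.keySet())) {
            if (visited.contains(start)) continue; // already part of an earlier chain
            if (sb.length() > 0) sb.append(' ');   // disjoint or branched chains
            Integer current = start;
            sb.append(current);
            visited.add(current);
            while (alignment.containsKey(current)) {
                Integer next = alignment.get(current);
                sb.append('>').append(next);
                if (!visited.add(next)) break; // reached a node that was printed before
                current = next;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> branched = new TreeMap<>();
        branched.put(1, 2);
        branched.put(2, 3);
        branched.put(3, 2);
        branched.put(4, 3);
        System.out.println(toConciseString(branched));
    }
}
```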
  • calculateBlockGap

    public static int[] calculateBlockGap​(int[][][] optAln)
    Calculates the number of gaps in each subunit block of an optimal AFP alignment. INPUT: an optimal alignment in the format int[][][]. OUTPUT: an int[] array with one entry per block, containing the number of gaps in each block as int[block].
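    A minimal sketch of per-block gap counting. It assumes a gap is counted at every alignment position where the residue index of either structure jumps by more than one relative to the previous position; that definition and the class name are assumptions made for illustration.

```java
import java.util.Arrays;

final class BlockGapSketch {
    /** optAln[block][structure][pos]; returns the gap count for each block. */
    static int[] calculateBlockGap(int[][][] optAln) {
        int[] blockGap = new int[optAln.length];
        for (int b = 0; b < optAln.length; b++) {
            int length = optAln[b][0].length;
            for (int pos = 1; pos < length; pos++) {
                boolean jump1 = optAln[b][0][pos] > optAln[b][0][pos - 1] + 1;
                boolean jump2 = optAln[b][1][pos] > optAln[b][1][pos - 1] + 1;
                if (jump1 || jump2) {
                    blockGap[b]++; // residues were skipped in at least one structure
                }
            }
        }
        return blockGap;
    }

    public static void main(String[] args) {
        int[][][] optAln = {
            {{0, 1, 2, 5}, {10, 11, 12, 13}},
            {{6, 7, 8}, {14, 16, 18}}
        };
        System.out.println(Arrays.toString(calculateBlockGap(optAln)));
    }
}
```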
  • alignmentToSIF

    public static void alignmentToSIF​(Writer out,
                                      AFPChain afpChain,
                                      Atom[] ca1,
                                      Atom[] ca2,
                                      String backboneInteraction,
                                      String alignmentInteraction)
                               throws IOException
    Creates a simple interaction format (SIF) file for an alignment. The SIF file can be read by network software (e.g. Cytoscape) to analyze alignments as graphs. This function creates a graph with residues as nodes and two types of edges:
      1. backbone edges, which connect adjacent residues in the aligned protein
      2. alignment edges, which connect aligned residues
    Parameters:
    out - Stream to write to
    afpChain - alignment to write
    ca1 - First protein, used to generate node names
    ca2 - Second protein, used to generate node names
    backboneInteraction - Two-letter string used to identify backbone edges
    alignmentInteraction - Two-letter string used to identify alignment edges
    Throws:
    IOException
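    The graph described above can be sketched as a writer of "node interaction node" lines, one edge per line. The node names (A0, B0, ...), the use of plain index arrays instead of Atom[] and AFPChain, and the class name are all hypothetical; the names the library generates from ca1 and ca2 will differ.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class SifSketch {
    static void writeSif(Writer out, int len1, int len2, int[] aligned1, int[] aligned2,
                         String backboneInteraction, String alignmentInteraction)
            throws IOException {
        // Backbone edges: adjacent residues within each protein
        for (int i = 0; i + 1 < len1; i++) {
            out.write("A" + i + "\t" + backboneInteraction + "\tA" + (i + 1) + "\n");
        }
        for (int i = 0; i + 1 < len2; i++) {
            out.write("B" + i + "\t" + backboneInteraction + "\tB" + (i + 1) + "\n");
        }
        // Alignment edges: aligned residue pairs across the two proteins
        for (int i = 0; i < aligned1.length; i++) {
            out.write("A" + aligned1[i] + "\t" + alignmentInteraction + "\tB" + aligned2[i] + "\n");
        }
    }

    /** Convenience wrapper that returns the SIF text as a String. */
    static String toSif(int len1, int len2, int[] aligned1, int[] aligned2,
                        String backboneInteraction, String alignmentInteraction) {
        StringWriter sw = new StringWriter();
        try {
            writeSif(sw, len1, len2, aligned1, aligned2, backboneInteraction, alignmentInteraction);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.print(toSif(3, 2, new int[] {0, 2}, new int[] {0, 1}, "bb", "al"));
    }
}
```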
  • getAlignedModel

    public static final List<Chain> getAlignedModel(Atom[] ca)
    Gets an artificial List of Chains containing the Atoms and groups. Does NOT rotate anything.
    Parameters:
    ca -
    Returns:
    a list of Chains that is built up from the Atoms in the ca array
    Throws:
    StructureException
  • fillAlignedAtomArrays

    public static void fillAlignedAtomArrays​(AFPChain afpChain,
                                             Atom[] ca1,
                                             Atom[] ca2,
                                             Atom[] ca1aligned,
                                             Atom[] ca2aligned)
    Fill the aligned Atom arrays with the equivalent residues in the afpChain.
    Parameters:
    afpChain -
    ca1 -
    ca2 -
    ca1aligned -
    ca2aligned -
  • deleteHighestDistanceColumn

    public static AFPChain deleteHighestDistanceColumn​(AFPChain afpChain,
                                                       Atom[] ca1,
                                                       Atom[] ca2)
                                                throws StructureException
    Find the alignment position with the highest atomic distance between the equivalent atomic positions of the arrays and remove it from the alignment.
    Parameters:
    afpChain - original alignment, will be modified
    ca1 - atom array, will not be modified
    ca2 - atom array, will not be modified
    Returns:
    the original alignment, with the alignment position at the highest distance removed
    Throws:
    StructureException
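    The core of this operation (find the aligned column with the largest distance, then drop it) can be sketched on plain coordinate arrays. The sketch assumes the coordinates are already superposed and returns new arrays instead of modifying an AFPChain; the class name, method names and array layout are hypothetical.

```java
import java.util.Arrays;

final class DeleteColumnSketch {
    /** aln[0][pos] and aln[1][pos] index into xyz1 and xyz2; returns the worst column. */
    static int highestDistanceColumn(int[][] aln, double[][] xyz1, double[][] xyz2) {
        int worst = -1;
        double worstSq = -1.0;
        for (int pos = 0; pos < aln[0].length; pos++) {
            double[] a = xyz1[aln[0][pos]];
            double[] b = xyz2[aln[1][pos]];
            double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
            double distSq = dx * dx + dy * dy + dz * dz; // squared distance is enough to rank
            if (distSq > worstSq) {
                worstSq = distSq;
                worst = pos;
            }
        }
        return worst;
    }

    /** Returns a copy of the alignment without column pos. */
    static int[][] deleteColumn(int[][] aln, int pos) {
        int n = aln[0].length;
        int[][] result = new int[2][n - 1];
        for (int s = 0; s < 2; s++) {
            System.arraycopy(aln[s], 0, result[s], 0, pos);
            System.arraycopy(aln[s], pos + 1, result[s], pos, n - pos - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] xyz1 = {{0, 0, 0}, {1, 0, 0}, {2, 0, 0}};
        double[][] xyz2 = {{0, 0, 0}, {1, 0, 5}, {2, 0, 1}};
        int[][] aln = {{0, 1, 2}, {0, 1, 2}};
        int worst = highestDistanceColumn(aln, xyz1, xyz2);
        System.out.println(Arrays.deepToString(deleteColumn(aln, worst)));
    }
}
```

    A full implementation would also need to refresh the superposition-dependent properties after the column is removed, which this sketch leaves out.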
  • deleteColumn

    public static AFPChain deleteColumn​(AFPChain afpChain,
                                        Atom[] ca1,
                                        Atom[] ca2,
                                        int block,
                                        int pos)
                                 throws StructureException
    Delete an alignment position from the original alignment object.
    Parameters:
    afpChain - original alignment, will be modified
    ca1 - atom array, will not be modified
    ca2 - atom array, will not be modified
    block - block of the alignment position
    pos - position index in the block
    Returns:
    the original alignment, with the alignment position removed
    Throws:
    StructureException