Class AlignmentTools
- java.lang.Object
-
- org.biojava.nbio.structure.align.util.AlignmentTools
-
public class AlignmentTools extends Object
Methods for analyzing and manipulating AFPChains and for other pairwise alignment utilities. Current methods: replace optimal alignment, create new AFPChain, format conversion, update superposition, etc.
- Author:
- Spencer Bliven, Aleix Lafita
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | AlignmentTools.IdentityMap<K> | A Map can be viewed as a function from K to V.
-
Field Summary
Fields
Modifier and Type | Field | Description
static boolean | debug |
-
Constructor Summary
Constructors
Constructor | Description
AlignmentTools() |
-
Method Summary
Modifier and Type | Method | Description
static Map<Integer,Integer> | alignmentAsMap(AFPChain afpChain) | Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.
static void | alignmentToSIF(Writer out, AFPChain afpChain, Atom[] ca1, Atom[] ca2, String backboneInteraction, String alignmentInteraction) | Creates a simple interaction format (SIF) file for an alignment.
static <S,T> Map<S,T> | applyAlignment(Map<S,T> alignmentMap, Map<T,S> identity, int k) | Applies an alignment k times.
static <T> Map<T,T> | applyAlignment(Map<T,T> alignmentMap, int k) | Applies an alignment k times.
static int[] | calculateBlockGap(int[][][] optAln) | Calculates the number of gaps in each subunit block of an optimal AFP alignment.
static AFPChain | createAFPChain(Atom[] ca1, Atom[] ca2, ResidueNumber[] aligned1, ResidueNumber[] aligned2) | Fundamentally, an alignment is just a list of aligned residues in each protein.
static AFPChain | deleteColumn(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int block, int pos) | Delete an alignment position from the original alignment object.
static AFPChain | deleteHighestDistanceColumn(AFPChain afpChain, Atom[] ca1, Atom[] ca2) | Find the alignment position with the highest atomic distance between the equivalent atomic positions of the arrays and remove it from the alignment.
static void | fillAlignedAtomArrays(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Atom[] ca1aligned, Atom[] ca2aligned) | Fill the aligned Atom arrays with the equivalent residues in the afpChain.
static Map<Integer,Integer> | fromConciseAlignmentString(String string) |
static List<Chain> | getAlignedModel(Atom[] ca) | Get an artificial List of chains containing the Atoms and groups.
static Structure | getAlignedStructure(Atom[] ca1, Atom[] ca2) | Get an artificial Structure containing both chains.
static List<List<List<Integer>>> | getOptAlnAsList(AFPChain afpChain) | Retrieves the optimum alignment from an AFPChain and returns it as a Java collection.
static int | getSymmetryOrder(Map<Integer,Integer> alignment, int maxSymmetry, float minimumMetricChange) | Helper for getSymmetryOrder(Map, Map, int, float) with a true identity function (X->X).
static int | getSymmetryOrder(Map<Integer,Integer> alignment, Map<Integer,Integer> identity, int maxSymmetry, float minimumMetricChange) | Tries to detect symmetry in an alignment.
static int | getSymmetryOrder(AFPChain afpChain, int maxSymmetry, float minimumMetricChange) | Guesses the order of symmetry in an alignment.
static Map<Integer,Integer> | guessSequentialAlignment(Map<Integer,Integer> alignment, boolean inverseAlignment) | Takes a potentially non-sequential alignment and guesses a sequential version of it.
static boolean | isSequentialAlignment(AFPChain afpChain, boolean checkWithinBlocks) | Checks that the alignment given by afpChain is sequential.
static Group[] | prepareGroupsForDisplay(AFPChain afpChain, Atom[] ca1, Atom[] ca2) | Rotate the Atoms/Groups so they are aligned for the 3D visualisation.
static AFPChain | replaceOptAln(int[][][] newAlgn, AFPChain afpChain, Atom[] ca1, Atom[] ca2) | Replaces an optimal alignment of an AFPChain and calculates all the new alignment scores and variables.
static AFPChain | replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int blockNum, int[] optLens, int[][][] optAln) |
static AFPChain | replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Map<Integer,Integer> alignment) | Takes an AFPChain and replaces the optimal alignment based on an alignment map.
static Object | resizeArray(Object oldArray, int newSize) | Reallocates an array with a new size, and copies the contents of the old array to the new array.
static void | shiftCA2(AFPChain afpChain, Atom[] ca2, Matrix m, Atom shift, Group[] twistedGroups) | Only shifts the CA positions.
static AFPChain | splitBlocksByTopology(AFPChain a, Atom[] ca1, Atom[] ca2) |
static <S,T> String | toConciseAlignmentString(Map<S,T> alignment, Map<T,S> identity) | Print an alignment map in a concise representation.
static <T> String | toConciseAlignmentString(Map<T,T> alignment) |
static void | updateSuperposition(AFPChain afpChain, Atom[] ca1, Atom[] ca2) | After the alignment changes (optAln, optLen, blockNum, at a minimum), many other properties which depend on the superposition will be invalid.
-
-
-
Field Detail
-
debug
public static boolean debug
-
-
Constructor Detail
-
AlignmentTools
public AlignmentTools()
-
-
Method Detail
-
isSequentialAlignment
public static boolean isSequentialAlignment(AFPChain afpChain, boolean checkWithinBlocks)
Checks that the alignment given by afpChain is sequential. This means that the residue indices of both proteins increase monotonically as a function of the alignment position (i.e., both proteins are sorted). This will return false for circularly permuted alignments or other non-topological alignments. It will also return false for cases where the alignment itself is sequential but it is not stored in the afpChain in a sorted manner.
Since algorithms which create non-sequential alignments split the alignment into multiple blocks, some computational time can be saved by only checking block boundaries for sequentiality. Setting checkWithinBlocks to true makes this function slower, but detects AFPChains with non-sequential blocks.
Note that this method should give the same results as AFPChain.isSequentialAlignment(). However, the AFPChain version relies on the StructureAlignment algorithm correctly setting this parameter, which is sadly not always the case.
- Parameters:
afpChain
- An alignment
checkWithinBlocks
- Indicates whether individual blocks should be checked for sequentiality
- Returns:
- True if the alignment is sequential.
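The check described above can be sketched on plain arrays. This is a minimal illustration and not the BioJava implementation: the class and method names are hypothetical, and it assumes the alignment is given as optAln[block][chain][pos] with optLen[block] valid positions per block, and that "sequential" means strictly increasing indices.

```java
class SequentialCheckSketch {

    /**
     * Hypothetical helper: optAln[block][chain][pos] holds residue indices
     * (chain 0 or 1); optLen[block] is the number of aligned positions.
     */
    public static boolean isSequential(int[][][] optAln, int[] optLen, boolean checkWithinBlocks) {
        int prev1 = Integer.MIN_VALUE;
        int prev2 = Integer.MIN_VALUE;
        for (int block = 0; block < optAln.length; block++) {
            int len = optLen[block];
            if (len < 1) {
                continue; // skip empty blocks
            }
            // A stride of len-1 visits only the first and last position of the block.
            int stride = checkWithinBlocks ? 1 : Math.max(1, len - 1);
            for (int pos = 0; pos < len; pos += stride) {
                int res1 = optAln[block][0][pos];
                int res2 = optAln[block][1][pos];
                if (res1 <= prev1 || res2 <= prev2) {
                    return false; // indices must increase in both proteins
                }
                prev1 = res1;
                prev2 = res2;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Second block jumps back to residue 1 of protein 2: a circular permutation.
        int[][][] permuted = { { {1, 2, 3}, {5, 6, 7} }, { {4, 5}, {1, 2} } };
        System.out.println(isSequential(permuted, new int[] {3, 2}, false)); // prints false
    }
}
```

With checkWithinBlocks set to false, a block whose interior is out of order but whose boundaries are sorted passes the check, which is the trade-off the description mentions.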
-
alignmentAsMap
public static Map<Integer,Integer> alignmentAsMap(AFPChain afpChain) throws StructureException
Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.
For example,
1234
5678
becomes
1->5
2->6
3->7
4->8
- Parameters:
afpChain
- An alignment
- Returns:
- A mapping from aligned residues of protein 1 to their partners in protein 2.
- Throws:
StructureException
- If afpChain is not one-to-one
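The example above can be reproduced with a small stand-alone sketch. It works on two plain index arrays rather than an AFPChain, the names are hypothetical, and it throws IllegalArgumentException where the documented method throws StructureException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AlignmentMapSketch {

    /** Pairs residues1[i] with residues2[i]; both arrays list aligned residue indices. */
    public static Map<Integer, Integer> asMap(int[] residues1, int[] residues2) {
        if (residues1.length != residues2.length) {
            throw new IllegalArgumentException("Aligned residue lists differ in length");
        }
        Map<Integer, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < residues1.length; i++) {
            // put() returns the previous value, so non-null means a repeated key.
            if (map.put(residues1[i], residues2[i]) != null) {
                throw new IllegalArgumentException(
                        "Residue " + residues1[i] + " of protein 1 is aligned more than once");
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(asMap(new int[] {1, 2, 3, 4}, new int[] {5, 6, 7, 8}));
        // prints {1=5, 2=6, 3=7, 4=8}
    }
}
```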
-
applyAlignment
public static <T> Map<T,T> applyAlignment(Map<T,T> alignmentMap, int k)
Applies an alignment k times. E.g., if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)).
- Type Parameters:
T
-
- Parameters:
alignmentMap
- The input function, as a map (see alignmentAsMap(AFPChain))
k
- The number of times to apply the alignment
- Returns:
- A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
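A minimal sketch of repeated application, f^k, on a plain Map. It is an illustration with hypothetical names rather than the library code; it keeps every input key and stores null where the chain of lookups leaves the domain, matching the behaviour described for the return value.

```java
import java.util.HashMap;
import java.util.Map;

class ApplyAlignmentSketch {

    /** Returns x -> f(f(...f(x)...)) with f applied k times, or null where undefined. */
    public static <T> Map<T, T> apply(Map<T, T> f, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        Map<T, T> result = new HashMap<>();
        for (T x : f.keySet()) {
            T y = x;
            // Stop early once the image falls outside the domain of f.
            for (int i = 0; i < k && y != null; i++) {
                y = f.get(y);
            }
            result.put(x, y);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = new HashMap<>();
        f.put(1, 2);
        f.put(2, 3);
        f.put(3, 1);
        // A 3-cycle applied three times maps every element back to itself.
        System.out.println(apply(f, 3).get(1)); // prints 1
    }
}
```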
-
applyAlignment
public static <S,T> Map<S,T> applyAlignment(Map<S,T> alignmentMap, Map<T,S> identity, int k)
Applies an alignment k times. E.g., if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)).
To allow for functions with different domains and codomains, the identity function allows converting back in a reasonable way. For instance, if alignmentMap represented an alignment between two proteins with different numbering schemes, the identity function could calculate the offset between residue numbers, e.g. I(x) = x-offset.
When an identity function is provided, the returned function calculates f^k(x) = f(I( f(I( ... f(x) ... )) )).
- Type Parameters:
S
-
T
-
- Parameters:
alignmentMap
- The input function, as a map (see alignmentAsMap(AFPChain))
identity
- An identity-like function providing the isomorphism between the codomain of alignmentMap (of type T) and the domain (of type S).
k
- The number of times to apply the alignment
- Returns:
- A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
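The alternating composition f(I(f(I(...f(x)...)))) can be sketched as follows. The names are hypothetical and the sketch assumes k >= 1; the example uses two numbering schemes that differ by an offset of 100, so the identity map plays the role of I(x) = x-100.

```java
import java.util.HashMap;
import java.util.Map;

class ApplyWithIdentitySketch {

    /** Applies f once, then (k-1) times converts back with identity and applies f again. */
    public static <S, T> Map<S, T> apply(Map<S, T> f, Map<T, S> identity, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        Map<S, T> result = new HashMap<>();
        for (S x : f.keySet()) {
            T y = f.get(x);
            for (int i = 1; i < k && y != null; i++) {
                S back = identity.get(y);               // codomain -> domain
                y = (back == null) ? null : f.get(back); // domain -> codomain
            }
            result.put(x, y);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = new HashMap<>();        // protein 1 -> protein 2
        f.put(1, 102);
        f.put(2, 103);
        f.put(3, 101);
        Map<Integer, Integer> identity = new HashMap<>(); // I(x) = x - 100
        identity.put(101, 1);
        identity.put(102, 2);
        identity.put(103, 3);
        System.out.println(apply(f, identity, 2).get(1)); // prints 103
    }
}
```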
-
getSymmetryOrder
public static int getSymmetryOrder(Map<Integer,Integer> alignment, int maxSymmetry, float minimumMetricChange)
Helper for getSymmetryOrder(Map, Map, int, float) with a true identity function (X->X).
This method should only be used in cases where the two proteins aligned have identical numbering, as for self-alignments. See getSymmetryOrder(AFPChain, int, float) for a way to guess the sequential correspondence between two proteins.
- Parameters:
alignment
-
maxSymmetry
-
minimumMetricChange
-
- Returns:
-
getSymmetryOrder
public static int getSymmetryOrder(Map<Integer,Integer> alignment, Map<Integer,Integer> identity, int maxSymmetry, float minimumMetricChange)
Tries to detect symmetry in an alignment.
Conceptually, an alignment is a function f:A->B between two sets of integers. The function may have simple topology (meaning that if two elements of A are close, then their images in B will also be close), or may have more complex topology (such as a circular permutation). This function checks alignment against a reference function identity, which should have simple topology. It then tries to determine the symmetry order of alignment relative to identity, up to a maximum order of maxSymmetry.
Details
Considers the offset (in number of residues) which a residue moves after undergoing n alternating transforms by alignment and identity. If n corresponds to the intrinsic order of the alignment, this will be small. This algorithm tries increasing values of n and looks for abrupt decreases in the root mean squared offset. If none are found at n<=maxSymmetry, the alignment is reported as non-symmetric.
- Parameters:
alignment
- The alignment to test for symmetry
identity
- An alignment with simple topology which approximates the sequential relationship between the two proteins. Should map in the reverse direction from alignment.
maxSymmetry
- Maximum symmetry to consider. High values increase the calculation time and can lead to overfitting.
minimumMetricChange
- Percent decrease in root mean squared offsets required to declare symmetry. 0.4f seems to work well for CeSymm.
- Returns:
- The order of symmetry of alignment, or 1 if no order <= maxSymmetry is found.
- See Also:
For a simple identity function
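The idea in the Details paragraph can be illustrated for the simplest case, a self-alignment with a true identity function. This is a sketch under stated assumptions and not the library's algorithm: the names are hypothetical, and the decision rule (report the first n whose root mean squared offset drops by at least the given fraction relative to n-1) is one plausible reading of "abrupt decrease".

```java
import java.util.HashMap;
import java.util.Map;

class SymmetryOrderSketch {

    /** Root mean squared offset f^n(x) - x over all x for which f^n(x) is defined. */
    public static double rmsOffset(Map<Integer, Integer> f, int n) {
        double sum = 0.0;
        int count = 0;
        for (Integer x : f.keySet()) {
            Integer y = x;
            for (int i = 0; i < n && y != null; i++) {
                y = f.get(y);
            }
            if (y != null) {
                double d = y - x;
                sum += d * d;
                count++;
            }
        }
        return count == 0 ? Double.NaN : Math.sqrt(sum / count);
    }

    /** Assumed rule: first n in [2, maxSymmetry] with a relative drop >= minChange, else 1. */
    public static int guessOrder(Map<Integer, Integer> f, int maxSymmetry, double minChange) {
        double prev = rmsOffset(f, 1);
        for (int n = 2; n <= maxSymmetry; n++) {
            double cur = rmsOffset(f, n);
            if (prev > 0 && (prev - cur) / prev >= minChange) {
                return n;
            }
            prev = cur;
        }
        return 1;
    }

    public static void main(String[] args) {
        // 12 residues rotated by 4 positions: three applications restore the start.
        Map<Integer, Integer> rotation = new HashMap<>();
        for (int x = 0; x < 12; x++) {
            rotation.put(x, (x + 4) % 12);
        }
        System.out.println(guessOrder(rotation, 8, 0.4)); // prints 3
    }
}
```

For the rotation example the offset is the same at n=1 and n=2 and collapses to zero at n=3, which is the kind of abrupt decrease the description refers to.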
-
getSymmetryOrder
public static int getSymmetryOrder(AFPChain afpChain, int maxSymmetry, float minimumMetricChange) throws StructureException
Guesses the order of symmetry in an alignment.
Uses getSymmetryOrder(Map alignment, Map identity, int, float) to determine the symmetry order. For the identity alignment, sorts the aligned residues of each protein sequentially, then defines the ith residues of each protein to be equivalent.
Note that the selection of the identity alignment here is very naive, and only works for proteins with very good coverage. Wherever possible, it is better to construct an identity function explicitly from a sequence alignment (or use an AlignmentTools.IdentityMap for internally symmetric proteins) and use getSymmetryOrder(Map, Map, int, float).
- Throws:
StructureException
-
guessSequentialAlignment
public static Map<Integer,Integer> guessSequentialAlignment(Map<Integer,Integer> alignment, boolean inverseAlignment)
Takes a potentially non-sequential alignment and guesses a sequential version of it. Residues from each structure are sorted sequentially and then compared directly.
The results of this method are consistent with what one might expect from an identity function, and are therefore useful with getSymmetryOrder(Map, Map identity, int, float).
- Perfect self-alignments will have the same pre-image and image, so will map X->X
- Gaps and alignment errors will cause errors in the resulting map, but only locally. Errors do not propagate through the whole alignment.
Example:
A non-sequential alignment, represented schematically as
12456789
78912345
would result in a map
12456789
12345789
- Parameters:
alignment
- The non-sequential input alignment
inverseAlignment
- If false, map from structure1 to structure2. If true, generate the inverse of that map.
- Returns:
- A mapping from sequential residues of one protein to those of the other
- Throws:
IllegalArgumentException
- if the input alignment is not one-to-one.
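The "sort each side, then pair by rank" idea can be sketched in a few lines. The class and method names are hypothetical; the example reproduces the schematic alignment from the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SequentialGuessSketch {

    /** Sorts keys and values independently and pairs the i-th key with the i-th value. */
    public static Map<Integer, Integer> guess(Map<Integer, Integer> alignment, boolean inverse) {
        List<Integer> residues1 = new ArrayList<>(alignment.keySet());
        List<Integer> residues2 = new ArrayList<>(alignment.values());
        if (new HashSet<>(residues2).size() != residues2.size()) {
            throw new IllegalArgumentException("Alignment is not one-to-one");
        }
        Collections.sort(residues1);
        Collections.sort(residues2);
        Map<Integer, Integer> sequential = new TreeMap<>();
        for (int i = 0; i < residues1.size(); i++) {
            if (inverse) {
                sequential.put(residues2.get(i), residues1.get(i));
            } else {
                sequential.put(residues1.get(i), residues2.get(i));
            }
        }
        return sequential;
    }

    public static void main(String[] args) {
        int[] from = {1, 2, 4, 5, 6, 7, 8, 9};
        int[] to = {7, 8, 9, 1, 2, 3, 4, 5};
        Map<Integer, Integer> alignment = new TreeMap<>();
        for (int i = 0; i < from.length; i++) {
            alignment.put(from[i], to[i]);
        }
        System.out.println(guess(alignment, false));
        // prints {1=1, 2=2, 4=3, 5=4, 6=5, 7=7, 8=8, 9=9}
    }
}
```

The gap at residue 3 of the first protein and residue 6 of the second shifts only the pairs between them (4->3, 5->4, 6->5), which shows the local nature of the errors mentioned above.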
-
getOptAlnAsList
public static List<List<List<Integer>>> getOptAlnAsList(AFPChain afpChain)
Retrieves the optimum alignment from an AFPChain and returns it as a Java collection. The result is indexed in the same way as AFPChain.getOptAln(), but has the correct size().
List<List<List<Integer>>> aln = getOptAlnAsList(AFPChain afpChain);
aln.get(blockNum).get(structureNum={0,1}).get(pos)
- Parameters:
afpChain
-
- Returns:
-
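A sketch of the conversion on raw arrays, with hypothetical names. It assumes that the backing arrays may be longer than the number of aligned positions and that a separate optLen array gives the true length of each block, which is what "has the correct size()" suggests.

```java
import java.util.ArrayList;
import java.util.List;

class OptAlnListSketch {

    /** Copies optAln[block][chain][0..optLen[block]) into nested lists. */
    public static List<List<List<Integer>>> toList(int[][][] optAln, int[] optLen) {
        List<List<List<Integer>>> blocks = new ArrayList<>();
        for (int block = 0; block < optLen.length; block++) {
            List<List<Integer>> chains = new ArrayList<>();
            for (int chain = 0; chain < optAln[block].length; chain++) {
                List<Integer> residues = new ArrayList<>();
                // Only the first optLen[block] entries are real alignment positions.
                for (int pos = 0; pos < optLen[block]; pos++) {
                    residues.add(optAln[block][chain][pos]);
                }
                chains.add(residues);
            }
            blocks.add(chains);
        }
        return blocks;
    }

    public static void main(String[] args) {
        int[][][] optAln = { { {1, 2, 0, 0}, {5, 6, 0, 0} } }; // padded to length 4
        List<List<List<Integer>>> aln = toList(optAln, new int[] {2});
        System.out.println(aln.get(0).get(1)); // prints [5, 6]
    }
}
```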
createAFPChain
public static AFPChain createAFPChain(Atom[] ca1, Atom[] ca2, ResidueNumber[] aligned1, ResidueNumber[] aligned2) throws StructureException
Fundamentally, an alignment is just a list of aligned residues in each protein. This method converts two lists of ResidueNumbers into an AFPChain.
Parameters are filled with defaults (often null) or sometimes calculated.
For a way to modify the alignment of an existing AFPChain, see replaceOptAln(AFPChain, Atom[], Atom[], Map).
- Parameters:
ca1
- CA atoms of the first protein
ca2
- CA atoms of the second protein
aligned1
- A list of aligned residues from the first protein
aligned2
- A list of aligned residues from the second protein. Must be the same length as aligned1.
- Returns:
- An AFPChain representing the alignment. Many properties may be null or another default.
- Throws:
StructureException
- if an error occurred during superposition
IllegalArgumentException
- if aligned1 and aligned2 have different lengths
- See Also:
replaceOptAln(AFPChain, Atom[], Atom[], Map)
-
splitBlocksByTopology
public static AFPChain splitBlocksByTopology(AFPChain a, Atom[] ca1, Atom[] ca2) throws StructureException
- Parameters:
a
-
ca1
-
ca2
-
- Returns:
- Throws:
StructureException
- if an error occurred during superposition
-
replaceOptAln
public static AFPChain replaceOptAln(int[][][] newAlgn, AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
Replaces an optimal alignment of an AFPChain and calculates all the new alignment scores and variables.
- Throws:
StructureException
-
replaceOptAln
public static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Map<Integer,Integer> alignment) throws StructureException
Takes an AFPChain and replaces the optimal alignment based on an alignment map.
Parameters are filled with defaults (often null) or sometimes calculated.
For a way to create a new AFPChain, see createAFPChain(Atom[], Atom[], ResidueNumber[], ResidueNumber[]).
- Parameters:
afpChain
- The alignment to be modified
alignment
- The new alignment, as a Map
- Throws:
StructureException
- if an error occurred during superposition
- See Also:
createAFPChain(Atom[], Atom[], ResidueNumber[], ResidueNumber[])
-
replaceOptAln
public static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int blockNum, int[] optLens, int[][][] optAln) throws StructureException
- Parameters:
afpChain
- Input AFPChain. UNMODIFIED
ca1
-
ca2
-
optLens
-
optAln
-
- Returns:
- A NEW AFPChain based off the input but with the optAln modified
- Throws:
StructureException
- if an error occurred during superposition
-
updateSuperposition
public static void updateSuperposition(AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
After the alignment changes (optAln, optLen, blockNum, at a minimum), many other properties which depend on the superposition will be invalid.
This method re-runs a rigid superposition over the whole alignment and repopulates the required properties, including RMSD (TotalRMSD) and TM-Score.
- Parameters:
afpChain
-
ca1
-
ca2
- Second set of ca atoms. Will be modified based on the superposition
- Throws:
StructureException
-
resizeArray
public static Object resizeArray(Object oldArray, int newSize)
Reallocates an array with a new size, and copies the contents of the old array to the new array.
- Parameters:
oldArray
- the old array, to be reallocated.
newSize
- the new array size.
- Returns:
- A new array with the same contents.
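Because the method takes and returns Object, it presumably relies on reflection to handle arrays of any component type. A minimal sketch of that technique using java.lang.reflect.Array follows; the class name is hypothetical and this is not claimed to be the library's code.

```java
import java.lang.reflect.Array;

class ResizeArraySketch {

    /** Allocates a new array of the same component type and copies what fits. */
    public static Object resize(Object oldArray, int newSize) {
        int oldSize = Array.getLength(oldArray);
        Class<?> elementType = oldArray.getClass().getComponentType();
        Object newArray = Array.newInstance(elementType, newSize);
        int preserved = Math.min(oldSize, newSize);
        if (preserved > 0) {
            System.arraycopy(oldArray, 0, newArray, 0, preserved);
        }
        return newArray;
    }

    public static void main(String[] args) {
        int[] grown = (int[]) resize(new int[] {1, 2, 3}, 5);
        System.out.println(grown.length + " " + grown[2] + " " + grown[4]); // prints 5 3 0
    }
}
```

The caller has to cast the result back to the concrete array type, as in the usage line above. For arrays whose type is known at compile time, java.util.Arrays.copyOf achieves the same without the cast.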
-
toConciseAlignmentString
public static <S,T> String toConciseAlignmentString(Map<S,T> alignment, Map<T,S> identity)
Print an alignment map in a concise representation. Edges are given as two numbers separated by '>'. They are chained together where possible, or separated by spaces where disjoint or branched.
Note that more concise representations may be possible.
Examples:
- 1>2>3>1
- 1>2>3>2 4>3
- Parameters:
alignment
- The input function, as a map (see alignmentAsMap(AFPChain))
identity
- An identity-like function providing the isomorphism between the codomain of alignment (of type T) and the domain (of type S).
- Returns:
-
toConciseAlignmentString
public static <T> String toConciseAlignmentString(Map<T,T> alignment)
- See Also:
toConciseAlignmentString(Map, Map)
-
fromConciseAlignmentString
public static Map<Integer,Integer> fromConciseAlignmentString(String string)
- See Also:
toConciseAlignmentString(Map, Map)
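Reading the concise format back into a map can be sketched as below. The parser is an illustration with hypothetical names, based only on the format described for toConciseAlignmentString(Map, Map): chains of integers joined by '>' and separated by whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConciseAlignmentParserSketch {

    /** Turns "1>2>3>1" into {1=2, 2=3, 3=1}; each a>b pair becomes one map entry. */
    public static Map<Integer, Integer> parse(String concise) {
        Map<Integer, Integer> map = new LinkedHashMap<>();
        for (String chain : concise.trim().split("\\s+")) {
            if (chain.isEmpty()) {
                continue; // empty input yields an empty map
            }
            String[] nodes = chain.split(">");
            for (int i = 0; i + 1 < nodes.length; i++) {
                map.put(Integer.parseInt(nodes[i]), Integer.parseInt(nodes[i + 1]));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("1>2>3>2 4>3")); // prints {1=2, 2=3, 3=2, 4=3}
    }
}
```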
-
calculateBlockGap
public static int[] calculateBlockGap(int[][][] optAln)
Calculates the number of gaps in each subunit block of an optimal AFP alignment.
INPUT: an optimal alignment in the format int[][][].
OUTPUT: an int[] array containing the gaps in each block as int[block].
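The description does not define what counts as a gap, so the sketch below fixes one interpretation as an explicit assumption: a gap is counted once whenever two consecutive aligned positions are non-contiguous in either chain. Names are hypothetical.

```java
import java.util.Arrays;

class BlockGapSketch {

    /** optAln[block][chain][pos]; returns one gap count per block. */
    public static int[] countGaps(int[][][] optAln) {
        int[] gaps = new int[optAln.length];
        for (int block = 0; block < optAln.length; block++) {
            int[] chain1 = optAln[block][0];
            int[] chain2 = optAln[block][1];
            for (int pos = 1; pos < chain1.length; pos++) {
                // Assumption: a jump of more than one residue in either chain is one gap.
                boolean jump1 = chain1[pos] > chain1[pos - 1] + 1;
                boolean jump2 = chain2[pos] > chain2[pos - 1] + 1;
                if (jump1 || jump2) {
                    gaps[block]++;
                }
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        int[][][] optAln = {
            { {1, 2, 4, 5}, {10, 11, 12, 15} }, // one jump in each chain
            { {7, 8}, {20, 21} }                // contiguous
        };
        System.out.println(Arrays.toString(countGaps(optAln))); // prints [2, 0]
    }
}
```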
-
alignmentToSIF
public static void alignmentToSIF(Writer out, AFPChain afpChain, Atom[] ca1, Atom[] ca2, String backboneInteraction, String alignmentInteraction) throws IOException
Creates a simple interaction format (SIF) file for an alignment. The SIF file can be read by network software (e.g. Cytoscape) to analyze alignments as graphs.
This function creates a graph with residues as nodes and two types of edges:
1. backbone edges, which connect adjacent residues in the aligned protein
2. alignment edges, which connect aligned residues
- Parameters:
out
- Stream to write to
afpChain
- alignment to write
ca1
- First protein, used to generate node names
ca2
- Second protein, used to generate node names
backboneInteraction
- Two-letter string used to identify backbone edges
alignmentInteraction
- Two-letter string used to identify alignment edges
- Throws:
IOException
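A SIF file lists one edge per line as "source interaction target". The sketch below builds such text for the two edge types described above. It is an illustration only: node names are passed in as plain strings (how the real method derives them from the Atom arrays is not shown here), the output is returned as a String instead of being written to a Writer, and all names are hypothetical.

```java
class SifSketch {

    /** aligned1[i] and aligned2[i] are indices into nodes1 and nodes2 of an aligned pair. */
    public static String toSif(String[] nodes1, String[] nodes2, int[] aligned1, int[] aligned2,
                               String backboneInteraction, String alignmentInteraction) {
        StringBuilder sif = new StringBuilder();
        appendBackbone(sif, nodes1, backboneInteraction);
        appendBackbone(sif, nodes2, backboneInteraction);
        for (int i = 0; i < aligned1.length; i++) {
            sif.append(nodes1[aligned1[i]]).append('\t')
               .append(alignmentInteraction).append('\t')
               .append(nodes2[aligned2[i]]).append('\n');
        }
        return sif.toString();
    }

    /** Connects each residue to the next one in the same protein. */
    private static void appendBackbone(StringBuilder sif, String[] nodes, String interaction) {
        for (int i = 0; i + 1 < nodes.length; i++) {
            sif.append(nodes[i]).append('\t')
               .append(interaction).append('\t')
               .append(nodes[i + 1]).append('\n');
        }
    }

    public static void main(String[] args) {
        System.out.print(toSif(new String[] {"A1", "A2"}, new String[] {"B1", "B2"},
                new int[] {0}, new int[] {1}, "bb", "al"));
    }
}
```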
-
getAlignedModel
public static final List<Chain> getAlignedModel(Atom[] ca)
Get an artificial List of chains containing the Atoms and groups. Does NOT rotate anything.
- Parameters:
ca
-
- Returns:
- a list of Chains that is built up from the Atoms in the ca array
- Throws:
StructureException
-
getAlignedStructure
public static final Structure getAlignedStructure(Atom[] ca1, Atom[] ca2) throws StructureException
Get an artificial Structure containing both chains. Does NOT rotate anything.
- Parameters:
ca1
-
ca2
-
- Returns:
- a structure object containing two models, one for each set of Atoms.
- Throws:
StructureException
-
prepareGroupsForDisplay
public static Group[] prepareGroupsForDisplay(AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
Rotate the Atoms/Groups so they are aligned for the 3D visualisation.
- Parameters:
afpChain
-
ca1
-
ca2
-
- Returns:
- an array of Groups that are transformed for 3D display
- Throws:
StructureException
-
shiftCA2
public static void shiftCA2(AFPChain afpChain, Atom[] ca2, Matrix m, Atom shift, Group[] twistedGroups)
Only shifts the CA positions.
-
fillAlignedAtomArrays
public static void fillAlignedAtomArrays(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Atom[] ca1aligned, Atom[] ca2aligned)
Fill the aligned Atom arrays with the equivalent residues in the afpChain.
- Parameters:
afpChain
-
ca1
-
ca2
-
ca1aligned
-
ca2aligned
-
-
deleteHighestDistanceColumn
public static AFPChain deleteHighestDistanceColumn(AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
Find the alignment position with the highest atomic distance between the equivalent atomic positions of the arrays and remove it from the alignment.
- Parameters:
afpChain
- original alignment, will be modified
ca1
- atom array, will not be modified
ca2
- atom array, will not be modified
- Returns:
- the original alignment, with the alignment position at the highest distance removed
- Throws:
StructureException
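Locating the column to remove amounts to an arg-max over pairwise distances. The sketch below shows only that search step, on plain coordinate arrays that are assumed to be already superposed; the names are hypothetical and it does not touch an AFPChain.

```java
import java.util.Arrays;

class FarthestColumnSketch {

    /**
     * optAln[block][chain][pos] indexes into xyz1 (chain 0) and xyz2 (chain 1).
     * Returns {block, pos} of the aligned pair with the largest distance, or null if empty.
     */
    public static int[] find(int[][][] optAln, double[][] xyz1, double[][] xyz2) {
        int[] best = null;
        double bestDistance = -1.0;
        for (int block = 0; block < optAln.length; block++) {
            for (int pos = 0; pos < optAln[block][0].length; pos++) {
                double[] a = xyz1[optAln[block][0][pos]];
                double[] b = xyz2[optAln[block][1][pos]];
                double dx = a[0] - b[0];
                double dy = a[1] - b[1];
                double dz = a[2] - b[2];
                double distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
                if (distance > bestDistance) {
                    bestDistance = distance;
                    best = new int[] {block, pos};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][][] optAln = { { {0, 1}, {0, 1} } };
        double[][] xyz1 = { {0, 0, 0}, {1, 0, 0} };
        double[][] xyz2 = { {0, 0, 1}, {1, 0, 5} }; // distances 1.0 and 5.0
        System.out.println(Arrays.toString(find(optAln, xyz1, xyz2))); // prints [0, 1]
    }
}
```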
-
deleteColumn
public static AFPChain deleteColumn(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int block, int pos) throws StructureException
Delete an alignment position from the original alignment object.
- Parameters:
afpChain
- original alignment, will be modified
ca1
- atom array, will not be modified
ca2
- atom array, will not be modified
block
- block of the alignment position
pos
- position index in the block
- Returns:
- the original alignment, with the alignment position removed
- Throws:
StructureException
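The array manipulation behind removing one column can be sketched on a raw optAln-style array. Unlike the documented method, this hypothetical helper returns a new array instead of modifying an AFPChain, and it does not recompute scores or drop blocks that become empty.

```java
import java.util.Arrays;

class DeleteColumnSketch {

    /** Returns a copy of optAln[block][chain][pos] without position pos of the given block. */
    public static int[][][] delete(int[][][] optAln, int block, int pos) {
        int[][][] result = new int[optAln.length][][];
        for (int b = 0; b < optAln.length; b++) {
            result[b] = new int[optAln[b].length][];
            for (int chain = 0; chain < optAln[b].length; chain++) {
                int[] old = optAln[b][chain];
                if (b != block) {
                    result[b][chain] = old.clone(); // other blocks are copied unchanged
                    continue;
                }
                int[] shorter = new int[old.length - 1];
                System.arraycopy(old, 0, shorter, 0, pos);                          // before pos
                System.arraycopy(old, pos + 1, shorter, pos, old.length - pos - 1); // after pos
                result[b][chain] = shorter;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][][] optAln = { { {1, 2, 3}, {4, 5, 6} } };
        System.out.println(Arrays.deepToString(delete(optAln, 0, 1))); // prints [[[1, 3], [4, 6]]]
    }
}
```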
-
-