Package org.biojava.nbio.structure.io
Class StructureSequenceMatcher
- java.lang.Object
  - org.biojava.nbio.structure.io.StructureSequenceMatcher
 
 
public class StructureSequenceMatcher extends Object
A utility class with methods for matching ProteinSequences with Structures.
- Author:
 - Spencer Bliven
 
 
Constructor Summary
- StructureSequenceMatcher()
Method Summary
- static ProteinSequence getProteinSequenceForStructure(Structure struct, Map<Integer,Group> groupIndexPosition)
  Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups.
- static Structure getSubstructureMatchingProteinSequence(ProteinSequence sequence, Structure wholeStructure)
  Get a substructure of wholeStructure containing only the Groups that are included in sequence.
- static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq, Structure struct)
  Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence.
- static ProteinSequence removeGaps(ProteinSequence gapped)
  Removes all gaps ('-') from a protein sequence.
- static <T> T[][] removeGaps(T[][] gapped)
  Creates a new array consisting of all columns of gapped where no row contains a null value.
 
Constructor Detail
StructureSequenceMatcher
public StructureSequenceMatcher()
 
Method Detail
getSubstructureMatchingProteinSequence
public static Structure getSubstructureMatchingProteinSequence(ProteinSequence sequence, Structure wholeStructure)
Get a substructure of wholeStructure containing only the Groups that are included in sequence. The resulting structure will contain only ATOM residues; the SEQ-RES will be empty. The Chains of the Structure will be new instances (cloned), but the Groups will not.
- Parameters:
 sequence - The input protein sequence
 wholeStructure - The structure from which to take a substructure
- Returns:
 - The resulting structure
- Throws:
 StructureException
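The chain-copying behaviour described above (new Chain instances, shared Group instances) can be illustrated with a minimal, self-contained sketch. SimpleGroup, SimpleChain and SubstructureSketch are hypothetical stand-ins, not BioJava classes. The sketch assumes the set of groups matched to the sequence is already known, so it leaves out the sequence matching step and the SEQ-RES handling, and dropping chains that end up empty is a choice of the sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a residue (Group); not the BioJava class.
final class SimpleGroup {
    final String chainId;
    final int residueNumber;

    SimpleGroup(String chainId, int residueNumber) {
        this.chainId = chainId;
        this.residueNumber = residueNumber;
    }
}

// Hypothetical stand-in for a Chain holding an ordered list of groups.
final class SimpleChain {
    final String id;
    final List<SimpleGroup> groups = new ArrayList<>();

    SimpleChain(String id) {
        this.id = id;
    }
}

final class SubstructureSketch {
    // Returns new chain objects holding only the selected groups.
    // SimpleGroup does not override equals, so the Set lookup is identity-based
    // and the group objects are shared with the input rather than copied.
    // Chains left empty are dropped: an assumption of this sketch.
    static List<SimpleChain> substructure(List<SimpleChain> whole, Set<SimpleGroup> selected) {
        List<SimpleChain> result = new ArrayList<>();
        for (SimpleChain chain : whole) {
            SimpleChain copy = new SimpleChain(chain.id); // new chain instance
            for (SimpleGroup g : chain.groups) {
                if (selected.contains(g)) {
                    copy.groups.add(g); // same group instance as in the input
                }
            }
            if (!copy.groups.isEmpty()) {
                result.add(copy);
            }
        }
        return result;
    }
}
```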
 
getProteinSequenceForStructure
public static ProteinSequence getProteinSequenceForStructure(Structure struct, Map<Integer,Group> groupIndexPosition)
Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups. Chains are appended to one another. 'X' is used for heteroatoms.
- Parameters:
 struct - Input structure
 groupIndexPosition - An empty map, which will be populated with (residue index in returned ProteinSequence) -> (Group within struct)
- Returns:
 - A ProteinSequence with the full sequence of struct. Chains are concatenated in the order in which they appear in the input structure.
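The concatenation and index bookkeeping described above can be sketched in plain Java instead of BioJava types. Res and SequenceFromChains are hypothetical. The three-letter lookup table is deliberately tiny, so any residue missing from it (including heteroatom groups such as water) is written as 'X', and the sketch assumes 0-based indices into the returned sequence, which the documentation does not specify.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Group: chain ID, residue number, three-letter name.
final class Res {
    final String chainId;
    final int number;
    final String name;

    Res(String chainId, int number, String name) {
        this.chainId = chainId;
        this.number = number;
        this.name = name;
    }
}

final class SequenceFromChains {
    // Deliberately incomplete lookup table, for illustration only.
    private static final Map<String, Character> THREE_TO_ONE = Map.of(
            "ALA", 'A', "GLY", 'G', "LYS", 'K', "MET", 'M', "SER", 'S');

    // Concatenates the chains in input order and fills groupIndexPosition with
    // (0-based index in the returned sequence) -> (residue at that position).
    static String sequenceFor(List<List<Res>> chains, Map<Integer, Res> groupIndexPosition) {
        StringBuilder seq = new StringBuilder();
        for (List<Res> chain : chains) {
            for (Res r : chain) {
                groupIndexPosition.put(seq.length(), r);
                // Anything not in the table (e.g. HOH) becomes 'X'.
                seq.append(THREE_TO_ONE.getOrDefault(r.name, 'X'));
            }
        }
        return seq.toString();
    }
}
```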
 
 
matchSequenceToStructure
public static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq, Structure struct)
Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence. Smith-Waterman alignment is used to match the sequences. Residues that are in the sequence but not in the structure, or that are mismatched between sequence and structure, map to null, while residues in the structure but not in the sequence are ignored with a warning.
- Parameters:
 seq - The protein sequence. Should match the sequence of struct very closely.
 struct - The corresponding protein structure
- Returns:
 - An array of ResidueNumbers of the same length as seq, containing either the corresponding residue or null.
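The matching rules above (aligned identical residues get a label, mismatches and sequence-only residues get null, structure-only residues are skipped) can be illustrated with a minimal Smith-Waterman sketch over plain strings. It assumes a toy linear scoring scheme (match +2, mismatch -1, gap -1) chosen for readability, not the scoring BioJava uses. structLabels is a hypothetical stand-in for the structure's ResidueNumbers, and the warning for structure-only residues is omitted.

```java
final class SmithWatermanMatch {
    // Toy scoring scheme; an assumption of this sketch.
    static final int MATCH = 2;
    static final int MISMATCH = -1;
    static final int GAP = -1;

    // seq: query sequence; structSeq: one-letter sequence of the structure;
    // structLabels: one label per position of structSeq (stand-in for ResidueNumber).
    // Returns one label per residue of seq, or null where there is no identical partner.
    static String[] match(String seq, String structSeq, String[] structLabels) {
        int n = seq.length();
        int m = structSeq.length();
        int[][] h = new int[n + 1][m + 1];
        int bestI = 0;
        int bestJ = 0;

        // Fill the local-alignment score matrix (scores never drop below 0).
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = seq.charAt(i - 1) == structSeq.charAt(j - 1) ? MATCH : MISMATCH;
                int best = Math.max(h[i - 1][j - 1] + s, Math.max(h[i - 1][j] + GAP, h[i][j - 1] + GAP));
                h[i][j] = Math.max(0, best);
                if (h[i][j] > h[bestI][bestJ]) {
                    bestI = i;
                    bestJ = j;
                }
            }
        }

        // Trace back from the best cell until the score falls to 0.
        String[] result = new String[n]; // positions never assigned stay null
        int i = bestI;
        int j = bestJ;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            char a = seq.charAt(i - 1);
            char b = structSeq.charAt(j - 1);
            int s = a == b ? MATCH : MISMATCH;
            if (h[i][j] == h[i - 1][j - 1] + s) {
                if (a == b) {
                    result[i - 1] = structLabels[j - 1]; // mismatched pairs stay null
                }
                i--;
                j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {
                i--; // residue only in seq: stays null
            } else {
                j--; // residue only in the structure: ignored
            }
        }
        return result;
    }
}
```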
 
 
removeGaps
public static ProteinSequence removeGaps(ProteinSequence gapped)
Removes all gaps ('-') from a protein sequence.
- Parameters:
 gapped - A protein sequence that may contain gaps ('-')
- Returns:
 - The protein sequence with all gaps removed
 
 
removeGaps
public static <T> T[][] removeGaps(T[][] gapped)
Creates a new array consisting of all columns of gapped where no row contained a null value. Here, "row" refers to the first index and "column" to the second, e.g. gapped[row][column].
- Parameters:
 gapped - A rectangular matrix containing null to mark gaps
- Returns:
 - A new array without columns containing nulls
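The column filter described above can be sketched in plain Java. GapColumns is a hypothetical class, not the BioJava implementation. It assumes a rectangular input, as the documentation requires, and builds each output row with Arrays.copyOf so that the rows keep their runtime component type (for example Integer[] stays Integer[]).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GapColumns {
    // Keeps only the columns (second index) in which no row holds null.
    // Assumes a rectangular matrix; an empty outer array is returned as a copy.
    static <T> T[][] removeGaps(T[][] gapped) {
        if (gapped.length == 0) {
            return gapped.clone();
        }
        int rows = gapped.length;
        int cols = gapped[0].length;

        // Collect the indices of gap-free columns.
        List<Integer> keep = new ArrayList<>();
        for (int c = 0; c < cols; c++) {
            boolean hasNull = false;
            for (int r = 0; r < rows && !hasNull; r++) {
                hasNull = gapped[r][c] == null;
            }
            if (!hasNull) {
                keep.add(c);
            }
        }

        // Copy the outer array (same runtime type), then replace every row.
        T[][] result = Arrays.copyOf(gapped, rows);
        for (int r = 0; r < rows; r++) {
            // copyOf keeps the row's component type; every slot is overwritten below.
            T[] row = Arrays.copyOf(gapped[r], keep.size());
            for (int k = 0; k < keep.size(); k++) {
                row[k] = gapped[r][keep.get(k)];
            }
            result[r] = row;
        }
        return result;
    }
}
```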
 
 