java.lang.Object

org.biojava.nbio.structure.io.StructureSequenceMatcher

public class StructureSequenceMatcher extends Object

A utility class with methods for matching ProteinSequences with Structures.

Author: Spencer Bliven

Constructor Summary

Constructors

Constructor

Description

StructureSequenceMatcher()
Method Summary

Modifier and Type

Method

Description

static ProteinSequence

getProteinSequenceForStructure(Structure struct, Map<Integer,Group> groupIndexPosition)

Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups.

static Structure

getSubstructureMatchingProteinSequence(ProteinSequence sequence, Structure wholeStructure)

Get a substructure of wholeStructure containing only the Groups that are included in sequence.

static ResidueNumber[]

matchSequenceToStructure(ProteinSequence seq, Structure struct)

Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence.

static ProteinSequence

removeGaps(ProteinSequence gapped)

Removes all gaps ('-') from a protein sequence.

static <T> T[][]

removeGaps(T[][] gapped)

Creates a new array consisting of all columns of gapped in which no row contains a null value.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- StructureSequenceMatcher
  
  public StructureSequenceMatcher()
Method Details
- getSubstructureMatchingProteinSequence
  
  public static Structure getSubstructureMatchingProteinSequence(ProteinSequence sequence, Structure wholeStructure)
  
  Get a substructure of wholeStructure containing only the Groups that are included in sequence. The resulting structure will contain only ATOM residues; the SEQ-RES will be empty. The Chains of the Structure will be new instances (cloned), but the Groups will not.
  Parameters:
  
  sequence - The input protein sequence
  
  wholeStructure - The structure from which to take a substructure
  
  Returns:
  
  The resulting structure
  
  See Also:
  
  matchSequenceToStructure(ProteinSequence, Structure)
- getProteinSequenceForStructure
  
  public static ProteinSequence getProteinSequenceForStructure(Structure struct, Map<Integer,Group> groupIndexPosition)
  
  Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups. Chains are appended to one another. 'X' is used for heteroatoms.
  Parameters:
  
  struct - Input structure
  
  groupIndexPosition - An empty map, which will be populated with (residue index in returned ProteinSequence) -> (Group within struct)
  
  Returns:
  
  A ProteinSequence with the full sequence of struct. Chains are concatenated in the order in which they appear in the input structure.
  
  See Also:
  
  SeqRes2AtomAligner.getFullAtomSequence(List, Map, boolean)
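
The concatenation and index bookkeeping described above can be illustrated without the BioJava types. The following is a minimal sketch, not the library's implementation: the `Residue` class, its labels, and the `concatenate` method are hypothetical stand-ins for `Group`, `Structure`, and this method, and the 0-based indexing of the map is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChainConcatenationSketch {

    /** Hypothetical stand-in for a Group: a label plus a one-letter code (null for heteroatoms). */
    static final class Residue {
        final String label;
        final Character oneLetter;

        Residue(String label, Character oneLetter) {
            this.label = label;
            this.oneLetter = oneLetter;
        }
    }

    /**
     * Appends all chains to one another and records, for each position in the
     * returned string, the residue it came from. Indices are 0-based here (an assumption).
     */
    static String concatenate(List<List<Residue>> chains, Map<Integer, Residue> indexToResidue) {
        StringBuilder sequence = new StringBuilder();
        for (List<Residue> chain : chains) {
            for (Residue residue : chain) {
                // The current length is the index the next character will occupy.
                indexToResidue.put(sequence.length(), residue);
                // Heteroatoms have no one-letter code and are written as 'X'.
                sequence.append(residue.oneLetter != null ? residue.oneLetter : 'X');
            }
        }
        return sequence.toString();
    }

    public static void main(String[] args) {
        List<Residue> chainA = Arrays.asList(new Residue("A:ALA 1", 'A'), new Residue("A:GLY 2", 'G'));
        List<Residue> chainB = Arrays.asList(new Residue("B:HEM 201", null), new Residue("B:LYS 1", 'K'));

        Map<Integer, Residue> indexToResidue = new LinkedHashMap<>();
        String sequence = concatenate(Arrays.asList(chainA, chainB), indexToResidue);

        System.out.println(sequence);                     // prints AGXK
        System.out.println(indexToResidue.get(2).label);  // prints B:HEM 201
    }
}
```

Passing in an empty map and letting the method fill it mirrors the `groupIndexPosition` parameter: the caller keeps the map and can later translate any sequence position back to its source residue.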
- matchSequenceToStructure
  
  public static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq, Structure struct)
  
  Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence.
  Smith-Waterman alignment is used to match the sequences. Residues that are in the sequence but not the structure, or that are mismatched between sequence and structure, will have a null entry, while residues in the structure but not the sequence are ignored with a warning.
  
  Parameters:
  
  seq - The protein sequence. Should match the sequence of struct very closely.
  
  struct - The corresponding protein structure
  
  Returns:
  
  An array of ResidueNumbers of the same length as seq, containing either the corresponding residue or null.
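
The null-filling rules above can be sketched on an alignment that has already been computed. This is an illustration only, not the library's code: the alignment step itself is assumed to have been done elsewhere, the two gapped strings and the `mapToResidueNumbers` method are hypothetical, and plain strings stand in for `ResidueNumber` objects.

```java
import java.util.Arrays;

final class AlignmentMappingSketch {

    /**
     * alignedSeq and alignedStruct are the two rows of a pairwise alignment
     * (equal length, '-' marks a gap). structResidueNumbers holds one label per
     * ungapped residue of alignedStruct. The result has one entry per ungapped
     * residue of alignedSeq: the matching label, or null.
     */
    static String[] mapToResidueNumbers(String alignedSeq, String alignedStruct,
                                        String[] structResidueNumbers) {
        if (alignedSeq.length() != alignedStruct.length()) {
            throw new IllegalArgumentException("Alignment rows must have equal length");
        }
        int seqLength = alignedSeq.replace("-", "").length();
        String[] result = new String[seqLength]; // entries default to null

        int seqPos = 0;    // index into the ungapped sequence
        int structPos = 0; // index into the ungapped structure residues
        for (int col = 0; col < alignedSeq.length(); col++) {
            char s = alignedSeq.charAt(col);
            char t = alignedStruct.charAt(col);
            // Only identical, ungapped pairs receive a residue number.
            if (s != '-' && t != '-' && s == t) {
                result[seqPos] = structResidueNumbers[structPos];
            }
            if (s != '-') {
                seqPos++;
            }
            if (t != '-') {
                structPos++; // structure-only residues are skipped
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] numbers = {"2", "3", "4", "5", "6"}; // labels for K, S, A, Y, L
        String[] mapped = mapToResidueNumbers("MKTAY-", "-KSAYL", numbers);
        System.out.println(Arrays.toString(mapped)); // prints [null, 2, null, 4, 5]
    }
}
```

In the example, M has no structure counterpart and T is mismatched against S, so both positions are null; the trailing L exists only in the structure and is dropped.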
- removeGaps
  
  public static ProteinSequence removeGaps(ProteinSequence gapped)
  
  Removes all gaps ('-') from a protein sequence.
  
  Parameters:
  
  gapped - The protein sequence, possibly containing gaps ('-')
  
  Returns:
  
  The sequence with all gaps removed
- removeGaps
  
  public static <T> T[][] removeGaps(T[][] gapped)
  
  Creates a new array consisting of all columns of gapped in which no row contains a null value. Here, "row" refers to the first index and "column" to the second, e.g. gapped[row][column].
  
  Parameters:
  
  gapped - A rectangular matrix containing null to mark gaps
  
  Returns:
  
  A new array without columns containing nulls
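
The column filtering described above can be sketched in plain Java. This is a minimal illustration under stated assumptions rather than the library's implementation: it uses `String[][]` instead of the generic `T[][]` to avoid reflective array creation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GapColumnSketch {

    /** Returns a copy of gapped that keeps only the columns in which no row is null. */
    static String[][] removeGapColumns(String[][] gapped) {
        if (gapped.length == 0) {
            return new String[0][];
        }
        int width = gapped[0].length; // the input is assumed to be rectangular

        // Collect the indices of columns that are gap-free in every row.
        List<Integer> kept = new ArrayList<>();
        for (int col = 0; col < width; col++) {
            boolean hasGap = false;
            for (String[] row : gapped) {
                if (row[col] == null) {
                    hasGap = true;
                    break;
                }
            }
            if (!hasGap) {
                kept.add(col);
            }
        }

        // Copy the surviving columns into a new rectangular array.
        String[][] ungapped = new String[gapped.length][kept.size()];
        for (int row = 0; row < gapped.length; row++) {
            for (int i = 0; i < kept.size(); i++) {
                ungapped[row][i] = gapped[row][kept.get(i)];
            }
        }
        return ungapped;
    }

    public static void main(String[] args) {
        String[][] gapped = {
            {"A", null, "C", "D"},
            {"A", "B", "C", null},
        };
        // Columns 1 and 3 each contain a null, so only columns 0 and 2 survive.
        System.out.println(Arrays.deepToString(removeGapColumns(gapped))); // prints [[A, C], [A, C]]
    }
}
```

A column is dropped as soon as any single row has a null there, so the output rows always stay aligned with one another.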
