Class StructureTools

java.lang.Object
org.biojava.nbio.structure.StructureTools

public class StructureTools extends Object
A class that provides some tool methods.
Since:
1.0
Author:
Andreas Prlic, Jules Jacobsen
  • Field Details

  • Constructor Details

  • Method Details

    • getNrAtoms

      public static int getNrAtoms(Structure s)
      Count how many Atoms are contained within a Structure object.
      Parameters:
      s - the structure object
      Returns:
      the number of Atoms in this Structure
    • getNrGroups

      public static int getNrGroups(Structure s)
      Count how many groups are contained within a structure object.
      Parameters:
      s - the structure object
      Returns:
      the number of groups in the structure
    • getAtomArray

      public static Atom[] getAtomArray(Structure s, String[] atomNames)
      Returns an array of the requested Atoms from the Structure object. Iterates over all groups and checks if the requested atoms are in this group, no matter if this is a AminoAcid or HetatomImpl group. If the group does not contain all requested atoms then no atoms are added for that group. For structures with more than one model, only model 0 will be used.
      Parameters:
      s - the structure to get the atoms from
      atomNames - contains the atom names to be used.
      Returns:
      an Atom[] array
    • getAtomArrayAllModels

      public static Atom[] getAtomArrayAllModels(Structure s, String[] atomNames)
      Returns an array of the requested Atoms from the Structure object. In contrast to getAtomArray(Structure, String[]) this method iterates over all chains. Iterates over all chains and groups and checks if the requested atoms are in this group, no matter if this is a AminoAcid or HetatomImpl group. If the group does not contain all requested atoms then no atoms are added for that group. For structures with more than one model, only model 0 will be used.
      Parameters:
      s - the structure to get the atoms from
      atomNames - contains the atom names to be used.
      Returns:
      an Atom[] array
    • getAllAtomArray

      public static Atom[] getAllAtomArray(Structure s)
      Convert all atoms of the structure (all models) into an Atom array
      Parameters:
      s - input structure
      Returns:
      all atom array
    • getAllAtomArray

      public static Atom[] getAllAtomArray(Structure s, int model)
      Convert all atoms of the structure (specified model) into an Atom array
      Parameters:
      s - input structure
      Returns:
      all atom array
    • getAllAtomArray

      public static Atom[] getAllAtomArray(Chain c)
      Returns and array of all atoms of the chain, including Hydrogens (if present) and all HETATOMs. Waters are not included.
      Parameters:
      c - input chain
      Returns:
      all atom array
    • getUnalignedGroups

      public static List<Group> getUnalignedGroups(Atom[] ca)
      List of groups from the structure not included in ca (e.g. ligands). Unaligned groups are searched from all chains referenced in ca, as well as any chains in the first model of the structure from ca[0], if any.
      Parameters:
      ca - an array of atoms
      Returns:
    • getLigandsByProximity

      public static List<Group> getLigandsByProximity(Collection<Group> target, Atom[] query, double cutoff)
      Finds all ligand groups from the target which fall within the cutoff distance of some atom from the query set.
      Parameters:
      target - Set of groups including the ligands
      query - Atom selection
      cutoff - Distance from query atoms to consider, in angstroms
      Returns:
      All groups from the target with at least one atom within cutoff of a query atom
      See Also:
    • addGroupToStructure

      public static Chain addGroupToStructure(Structure s, Group g, int model, Chain chainGuess, boolean clone)
      Adds a particular group to a structure. A new chain will be created if necessary.

      When adding multiple groups, pass the return value of one call as the chainGuess parameter of the next call for efficiency.

       Chain guess = null;
       for(Group g : groups) {
           guess = addGroupToStructure(s, g, guess );
       }
       
      Parameters:
      s - structure to receive the group
      g - group to add
      chainGuess - (optional) If not null, should be a chain from s. Used to improve performance when adding many groups from the same chain
      clone - Indicates whether the input group should be cloned before being added to the new chain
      Returns:
      the chain g was added to
    • addGroupsToStructure

      public static void addGroupsToStructure(Structure s, Collection<Group> groups, int model, boolean clone)
      Add a list of groups to a new structure. Chains will be automatically created in the new structure as needed.
      Parameters:
      s - structure to receive the group
      groups - groups to add
      model - model number
      clone - Indicates whether the input groups should be cloned before being added to the new chain
    • getAllGroupsFromSubset

      public static Set<Group> getAllGroupsFromSubset(Atom[] atoms)
      Expand a set of atoms into all groups from the same structure. If the structure is set, only the first atom is used (assuming all atoms come from the same original structure). If the atoms aren't linked to a structure (for instance, for cloned atoms), searches all chains of all atoms for groups.
      Parameters:
      atoms - Sample of atoms
      Returns:
      All groups from all chains accessible from the input atoms
    • getAllGroupsFromSubset

      public static Set<Group> getAllGroupsFromSubset(Atom[] atoms, GroupType types)
      Expand a set of atoms into all groups from the same structure. If the structure is set, only the first atom is used (assuming all atoms come from the same original structure). If the atoms aren't linked to a structure (for instance, for cloned atoms), searches all chains of all atoms for groups.
      Parameters:
      atoms - Sample of atoms
      types - Type of groups to return (useful for getting only ligands, for instance). Null gets all groups.
      Returns:
      All groups from all chains accessible from the input atoms
    • getAllNonHAtomArray

      public static final Atom[] getAllNonHAtomArray(Structure s, boolean hetAtoms)
      Returns and array of all non-Hydrogen atoms in the given Structure, optionally including HET atoms or not. Waters are not included.
      Parameters:
      s -
      hetAtoms - if true HET atoms are included in array, if false they are not
      Returns:
    • getAllNonHAtomArray

      public static Atom[] getAllNonHAtomArray(Structure s, boolean hetAtoms, int modelNr)
      Returns and array of all non-Hydrogen atoms in the given Structure, optionally including HET atoms or not. Waters are not included.
      Parameters:
      s -
      hetAtoms - if true HET atoms are included in array, if false they are not
      modelNr - Model number to draw atoms from
      Returns:
    • getAllNonHAtomArray

      public static Atom[] getAllNonHAtomArray(Chain c, boolean hetAtoms)
      Returns and array of all non-Hydrogen atoms in the given Chain, optionally including HET atoms or not Waters are not included.
      Parameters:
      c -
      hetAtoms - if true HET atoms are included in array, if false they are not
      Returns:
    • getAllNonHCoordsArray

      public static javax.vecmath.Point3d[] getAllNonHCoordsArray(Chain c, boolean hetAtoms)
      Returns and array of all non-Hydrogen atoms coordinates in the given Chain, optionally including HET atoms or not Waters are not included.
      Parameters:
      c -
      hetAtoms - if true HET atoms are included in array, if false they are not
      Returns:
    • getAtomArray

      public static Atom[] getAtomArray(Chain c, String[] atomNames)
      Returns an array of the requested Atoms from the Chain object. Iterates over all groups and checks if the requested atoms are in this group, no matter if this is a AminoAcid or Hetatom group. If the group does not contain all requested atoms then no atoms are added for that group.
      Parameters:
      c - the Chain to get the atoms from
      atomNames - contains the atom names to be used.
      Returns:
      an Atom[] array
    • getAtomCAArray

      public static Atom[] getAtomCAArray(Chain c)
      Returns an Atom array of the C-alpha atoms. Any atom that is a carbon and has CA name will be returned.
      Parameters:
      c - the structure object
      Returns:
      an Atom[] array
      See Also:
    • getRepresentativeAtomArray

      public static Atom[] getRepresentativeAtomArray(Chain c)
      Gets a representative atom for each group that is part of the chain backbone. Note that modified aminoacids won't be returned as part of the backbone if the ReducedChemCompProvider was used to load the structure. For amino acids, the representative is a CA carbon. For nucleotides, the representative is the "P". Other group types will be ignored.
      Parameters:
      c -
      Returns:
      representative Atoms of the chain backbone
      Since:
      Biojava 4.1.0
    • cloneAtomArray

      public static Atom[] cloneAtomArray(Atom[] ca)
      Provides an equivalent copy of Atoms in a new array. Clones everything, starting with parent groups and chains. The chain will only contain groups that are part of the input array.
      Parameters:
      ca - array of representative atoms, e.g. CA atoms
      Returns:
      Atom array
      Since:
      Biojava 4.1.0
    • cloneGroups

      public static Group[] cloneGroups(Atom[] ca)
      Clone a set of representative Atoms, but returns the parent groups
      Parameters:
      ca - Atom array
      Returns:
      Group array
    • duplicateCA2

      public static Atom[] duplicateCA2(Atom[] ca2)
      Utility method for working with circular permutations. Creates a duplicated and cloned set of Calpha atoms from the input array.
      Parameters:
      ca2 - atom array
      Returns:
      cloned and duplicated set of input array
    • getAtomCAArray

      public static Atom[] getAtomCAArray(Structure s)
      Return an Atom array of the C-alpha atoms. Any atom that is a carbon and has CA name will be returned.
      Parameters:
      s - the structure object
      Returns:
      an Atom[] array
      See Also:
    • getRepresentativeAtomArray

      Gets a representative atom for each group that is part of the chain backbone. Note that modified aminoacids won't be returned as part of the backbone if the ReducedChemCompProvider was used to load the structure. For amino acids, the representative is a CA carbon. For nucleotides, the representative is the "P". Other group types will be ignored.
      Parameters:
      s - Input structure
      Returns:
      representative Atoms of the structure backbone
      Since:
      Biojava 4.1.0
    • getBackboneAtomArray

      public static Atom[] getBackboneAtomArray(Structure s)
      Return an Atom array of the main chain atoms: CA, C, N, O Any group that contains those atoms will be included, be it a standard aminoacid or not
      Parameters:
      s - the structure object
      Returns:
      an Atom[] array
    • get1LetterCodeAmino

      public static Character get1LetterCodeAmino(String groupCode3)
      Convert three character amino acid codes into single character e.g. convert CYS to C. Valid 3-letter codes will be those of the standard 20 amino acids plus MSE, CSE, SEC, PYH, PYL (see the aminoAcids map)
      Parameters:
      groupCode3 - a three character amino acid representation String
      Returns:
      the 1 letter code, or null if the given 3 letter code does not correspond to an amino acid code
      See Also:
    • get1LetterCode

      public static Character get1LetterCode(String groupCode3)
      Convert a three letter amino acid or nucleotide code into a single character code. If the code does not correspond to an amino acid or nucleotide, returns UNKNOWN_GROUP_LABEL. Returned null for nucleotides prior to version 4.0.1.
      Parameters:
      groupCode3 - three letter representation
      Returns:
      The 1-letter abbreviation
    • isNucleotide

      public static boolean isNucleotide(String groupCode3)
      Test if the three-letter code of an ATOM entry corresponds to a nucleotide or to an aminoacid.
      Parameters:
      groupCode3 - 3-character code for a group.
    • convertAtomsToSeq

      public static String convertAtomsToSeq(Atom[] atoms)
    • getGroupByPDBResidueNumber

      Get a group represented by a ResidueNumber.
      Parameters:
      struc - a Structure
      pdbResNum - a ResidueNumber
      Returns:
      a group in the structure that is represented by the pdbResNum.
      Throws:
      StructureException - if the group cannot be found.
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain, String[] atomNames, double cutoff)
      Returns the set of intra-chain contacts for the given chain for given atom names, i.e. the contact map. Uses a spatial hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain -
      atomNames - the array with atom names to be used. Beware: CA will do both C-alphas an Calciums if null all non-H atoms of non-hetatoms will be used
      cutoff -
      Returns:
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain, double cutoff)
      Returns the set of intra-chain contacts for the given chain for all non-H atoms of non-hetatoms, i.e. the contact map. Uses a spatial hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain -
      cutoff -
      Returns:
    • getAtomsCAInContact

      public static AtomContactSet getAtomsCAInContact(Chain chain, double cutoff)
      Returns the set of intra-chain contacts for the given chain for C-alpha atoms (including non-standard aminoacids appearing as HETATM groups), i.e. the contact map. Uses a spatial hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain -
      cutoff -
      Returns:
      See Also:
    • getRepresentativeAtomsInContact

      public static AtomContactSet getRepresentativeAtomsInContact(Chain chain, double cutoff)
      Returns the set of intra-chain contacts for the given chain for C-alpha or C3' atoms (including non-standard aminoacids appearing as HETATM groups), i.e. the contact map. Uses a spatial hashing algorithm that speeds up the calculation without need of full distance matrix.
      Parameters:
      chain -
      cutoff -
      Returns:
      Since:
      Biojava 4.1.0
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, String[] atomNames, double cutoff, boolean hetAtoms)
      Returns the set of inter-chain contacts between the two given chains for the given atom names. Uses a spatial hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain1 -
      chain2 -
      atomNames - the array with atom names to be used. For Calphas use {"CA"}, if null all non-H atoms will be used. Note HET atoms are ignored unless this parameter is null.
      cutoff -
      hetAtoms - if true HET atoms are included, if false they are not
      Returns:
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, double cutoff, boolean hetAtoms)
      Returns the set of inter-chain contacts between the two given chains for all non-H atoms. Uses a spatial hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain1 -
      chain2 -
      cutoff -
      hetAtoms - if true HET atoms are included, if false they are not
      Returns:
    • getGroupDistancesWithinShell

      public static Map<Group,Double> getGroupDistancesWithinShell(Structure structure, Atom centroid, Set<ResidueNumber> excludeResidues, double radius, boolean includeWater, boolean useAverageDistance)
      Finds Groups in structure that contain at least one Atom that is within radius Angstroms of centroid.
      Parameters:
      structure - The structure from which to find Groups
      centroid - The centroid of the shell
      excludeResidues - A list of ResidueNumbers to exclude
      radius - The radius from centroid, in Angstroms
      includeWater - Whether to include Groups whose only atoms are water
      useAverageDistance - When set to true, distances are the arithmetic mean (1-norm) of the distances of atoms that belong to the group and that are within the shell; otherwise, distances are the minimum of these values
      Returns:
      A map of Groups within (or partially within) the shell, to their distances in Angstroms
    • getGroupsWithinShell

      public static Set<Group> getGroupsWithinShell(Structure structure, Atom atom, Set<ResidueNumber> excludeResidues, double distance, boolean includeWater)
    • getGroupsWithinShell

      public static Set<Group> getGroupsWithinShell(Structure structure, Group group, double distance, boolean includeWater)

      Returns a Set of Groups in a structure within the distance specified of a given group.

      Updated 18-Sep-2015 sroughley to return a Set so only a unique set of Groups returned

      Parameters:
      structure - The structure to work with
      group - The 'query' group
      distance - The cutoff distance
      includeWater - Should water residues be included in the output?
      Returns:
      LinkedHashSet of Groups within at least one atom with distance of at least one atom in group
    • removeModels

      public static Structure removeModels(Structure s)
      Remove all models from a Structure and keep only the first
      Parameters:
      s - original Structure
      Returns:
      a structure that contains only the first model
      Since:
      3.0.5
    • filterLigands

      public static List<Group> filterLigands(List<Group> allGroups)
      Removes all polymeric and solvent groups from a list of groups
    • getStructure

      Short version of getStructure(String, PDBFileParser, AtomCache) which creates new parsers when needed
      Parameters:
      name -
      Returns:
      Throws:
      IOException
      StructureException
    • getStructure

      public static Structure getStructure(String name, PDBFileParser parser, AtomCache cache) throws IOException, StructureException
      Flexibly get a structure from an input String. The intent of this method is to allow any reasonable string which could refer to a structure to be correctly parsed. The following are currently supported:
      1. Filename (if name refers to an existing file)
      2. PDB ID
      3. SCOP domains
      4. PDP domains
      5. Residue ranges
      6. Other formats supported by AtomCache
      Parameters:
      name - Some reference to the protein structure
      parser - A clean PDBFileParser to use if it is a file. If null, a PDBFileParser will be instantiated if needed.
      cache - An AtomCache to use if the structure can be fetched from the PDB. If null, a AtomCache will be instantiated if needed.
      Returns:
      A Structure object
      Throws:
      IOException - if name is an existing file, but doesn't parse correctly
      StructureException - if the format is unknown, or if AtomCache throws an exception.
    • cleanUpAltLocs

      public static void cleanUpAltLocs(Structure structure)
      Cleans up the structure's alternate location (altloc) groups. All alternate location groups should have all atoms (except in the case of microheterogenity) or when a deuterium exists. Ensure that all the alt loc groups have all the atoms in the main group.
      Parameters:
      structure - The Structure to be cleaned up
    • hasNonDeuteratedEquiv

      public static boolean hasNonDeuteratedEquiv(Atom atom, Group currentGroup)
      Check to see if an Deuterated atom has a non deuterated brother in the group.
      Parameters:
      atom - the input atom that is putatively deuterium
      currentGroup - the group the atom is in
      Returns:
      true if the atom is deuterated and it's hydrogen equive exists.
    • hasDeuteratedEquiv

      public static boolean hasDeuteratedEquiv(Atom atom, Group currentGroup)
      Check to see if a Hydrogen has a Deuterated brother in the group.
      Parameters:
      atom - the input atom that is putatively hydorgen
      currentGroup - the group the atom is in
      Returns:
      true if the atom is hydrogen and it's Deuterium equiv exists.
    • reduceToRepresentativeAtoms

      public static void reduceToRepresentativeAtoms(Structure structure)
      Remove all atoms but the representative atoms (C alphas or phosphates) from the given structure.
      Parameters:
      structure - the structure
      Since:
      5.4.0