RCSB Viewers:MBT Libs:Bonds and Nucleic Acid Identification^Classification
Notes
Bond records are ignored in the loaders. Bonds are determined either through
a dictionary lookup, or via calculation if the lookup fails.
Currently the lookup files described here are generated by an external process and are incorporated
directly within the 'Structure Models' jar as a resource. This means that they can only be updated
if the 'Structure Models' jar is updated.
A preferable approach would be to put them in their own jar, which could be updated independently
of the model jar (or any functional jars).
See the RCSB Excluded project, CL Tools directory, for more information.
Relevant Classes
- Bond - definition class
- BondFactory - Creates the bonds (static)
- ChemicalComponentBonds - does lookup for bonds
- NucleicAcidInfo - does lookup for nucleic acids
- Octree - for calculating bonds
- OctreeAtomItem - for Octree
- OctreeDataItem - for Octree
Explanation
MBT maintains a dictionary of known structures. This comes from a combined .cif file that is found at this ftp
site:
ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz
This file is loaded and broken apart by an external process - see the RCSB Excluded project,
tools package.
ChemicalComponentBondsCreator is run from the command line against the file. It's not a full parser - it just
extracts bond information. Its output ('ChemicalComponentBonds.dat') is copied into the RCSB MBT Libs
project, source directory Structure Model, in the util package, as a resource.
At runtime, this abbreviated file is picked up and put into a hash-table. Atoms are checked against this for bond
information.
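The runtime lookup described above can be sketched roughly as follows. This is a minimal illustration, not the MBT ChemicalComponentBonds class: the class name BondDictionarySketch, the BondEntry type, and above all the line format ("compound atom1 atom2 order") are assumptions, since the layout of 'ChemicalComponentBonds.dat' is not described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the runtime lookup; not the MBT ChemicalComponentBonds class.
class BondDictionarySketch {

    // One dictionary bond: two atom names within a compound, plus a bond order.
    static final class BondEntry {
        final String atom1;
        final String atom2;
        final int order;

        BondEntry(String atom1, String atom2, int order) {
            this.atom1 = atom1;
            this.atom2 = atom2;
            this.order = order;
        }
    }

    // Compound code (for example "ALA") -> bonds known for that compound.
    private final Map<String, List<BondEntry>> bondsByCompound = new HashMap<>();

    // Assumed line format: "<compound> <atom1> <atom2> <order>".
    // The real .dat layout may differ; this is for illustration only.
    static BondDictionarySketch fromLines(List<String> lines) {
        BondDictionarySketch dict = new BondDictionarySketch();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 4) {
                continue; // skip blank or malformed lines
            }
            dict.bondsByCompound
                .computeIfAbsent(f[0], k -> new ArrayList<>())
                .add(new BondEntry(f[1], f[2], Integer.parseInt(f[3])));
        }
        return dict;
    }

    // Returns an empty list when the compound is unknown, which is the
    // signal to fall back to the distance calculation.
    List<BondEntry> bondsFor(String compoundCode) {
        return bondsByCompound.getOrDefault(compoundCode, Collections.emptyList());
    }
}
```

In a real loader the lines would presumably be read from the jar resource (for instance through Class.getResourceAsStream) rather than passed in as a list; the list keeps the sketch independent of any particular resource path.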
If bonds are not found for a given residue, the atoms are run through a bond-generation algorithm that determines
bonds by distance. The atoms are first arranged in an octree for quick spatial checks.
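The distance test itself can be illustrated with a small sketch. To keep it short, a plain pairwise loop stands in for the octree; the octree only limits which pairs get tested and does not change the result. The class name DistanceBondSketch, the Atom type, and the single cutoff value are assumptions for illustration; the threshold MBT actually applies is not stated here, and such algorithms often use element-dependent limits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of bond detection by distance; not the MBT BondFactory.
class DistanceBondSketch {

    static final class Atom {
        final String name;
        final double x;
        final double y;
        final double z;

        Atom(String name, double x, double y, double z) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    // Assumed cutoff in angstroms, chosen only for the example.
    static final double ASSUMED_CUTOFF = 1.9;

    // Returns index pairs {i, j} with i < j for every atom pair closer than the cutoff.
    // Brute force, O(n^2); an octree would restrict the inner loop to nearby atoms.
    static List<int[]> findBonds(List<Atom> atoms, double cutoff) {
        double cutoffSq = cutoff * cutoff; // compare squared distances, no sqrt needed
        List<int[]> bonds = new ArrayList<>();
        for (int i = 0; i < atoms.size(); i++) {
            Atom a = atoms.get(i);
            for (int j = i + 1; j < atoms.size(); j++) {
                Atom b = atoms.get(j);
                double dx = a.x - b.x;
                double dy = a.y - b.y;
                double dz = a.z - b.z;
                if (dx * dx + dy * dy + dz * dz <= cutoffSq) {
                    bonds.add(new int[] {i, j});
                }
            }
        }
        return bonds;
    }
}
```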
Look in the 'RCSB MBT Libs' project, source directory 'Structure Model', in the model package, for
the StructureMap class. In there, find generateBonds().
Note that it checks a flag to ignore the dictionary lookup and strictly use the distance algorithm
(I suspect this is mainly for debugging). The BondFactory class is what does the dictionary lookup
or the bond calculation, depending on what's required.
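The control flow described above (dictionary first, distance calculation as the fallback, with a flag that skips the dictionary) might look something like the sketch below. It is not the body of generateBonds(); the class name BondSourceSketch, the method name, and the ignoreDictionary parameter are hypothetical, and the two bond sources are passed in as functions so the sketch does not depend on any MBT types.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of choosing between dictionary lookup and distance calculation.
class BondSourceSketch {

    // R is whatever represents a residue, B whatever represents a bond.
    static <R, B> List<B> bondsFor(R residue,
                                   boolean ignoreDictionary,
                                   Function<R, List<B>> dictionaryLookup,
                                   Function<R, List<B>> distanceCalculation) {
        if (!ignoreDictionary) {
            List<B> known = dictionaryLookup.apply(residue);
            if (known != null && !known.isEmpty()) {
                return known; // dictionary hit: no calculation needed
            }
        }
        // Flag set, or lookup failed: determine bonds by distance instead.
        return distanceCalculation.apply(residue);
    }
}
```

Passing the flag as true forces every residue through the distance path, which makes it easy to compare calculated bonds against the dictionary.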
Incidentally, the same kind of mechanism is used to determine nucleic acid classification. In the
RCSB Excluded project, source directory 'CL Tools',
FindAllNucleicAcidCompoundNames is also
run from the command line and generates an output file ('NucleicAcidCompoundNames.dat').
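At runtime, classification then reduces to a set-membership test on the compound name. The sketch below assumes the output file holds one compound code per line; that format, the class name NucleicAcidNamesSketch, and the case-insensitive matching are all assumptions, not a description of the NucleicAcidInfo class.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a nucleic acid name lookup; not the MBT NucleicAcidInfo class.
class NucleicAcidNamesSketch {

    private final Set<String> names = new HashSet<>();

    // Assumed format: one compound code per line, blank lines ignored.
    NucleicAcidNamesSketch(List<String> lines) {
        for (String line : lines) {
            String code = line.trim();
            if (!code.isEmpty()) {
                names.add(code.toUpperCase(Locale.ROOT));
            }
        }
    }

    // True if the compound code appears in the loaded name list.
    boolean isNucleicAcid(String compoundCode) {
        return compoundCode != null
            && names.contains(compoundCode.trim().toUpperCase(Locale.ROOT));
    }
}
```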