RCSB Viewers:MBT Libs:Bonds and Nucleic Acid Identification^Classification
Notes
Bond records are ignored in the loaders. Bonds are determined either through
a dictionary lookup, or via calculation if the lookup fails.
Currently the lookup files described here are generated by an external process and are incorporated
directly within the 'Structure Models' jar as a resource. This means that they can only be updated
if the 'Structure Models' jar is updated.
A preferable approach would be to put them in their own jar, which could then be updated independently
of the model jar (or of any functional jars).
See the 'RCSB Excluded' project, 'CL Tools' directory, for more information.
Relevant Classes
- Bond - definition class
- BondFactory - creates the bonds (static)
- ChemicalComponentBonds - does lookup for bonds
- NucleicAcidInfo - does lookup for nucleic acids
- Octree - for calculating bonds
- OctreeAtomItem - for Octree
- OctreeDataItem - for Octree
Explanation
MBT maintains a dictionary of known structures. This comes from a combined .cif file that is found at this ftp
site:
ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz
This file is loaded and broken apart by an external process; see the 'tools' package in the
'RCSB Excluded' project. ChemicalComponentBondsCreator is run from the command line against the file.
It is not a full parser; it just extracts bond information. Its output ('ChemicalComponentBonds.dat')
is copied into the 'RCSB MBT Libs' project, source directory 'Structure Model', package 'util',
as a resource.
At runtime, this abbreviated file is picked up and put into a hash-table. Atoms are checked against this for bond
information.
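The runtime lookup can be pictured as a map keyed by compound code. The following is a minimal sketch under that assumption; the class name BondDictionarySketch, its methods, and the entry layout are hypothetical and do not reflect the actual ChemicalComponentBonds class or the format of the .dat file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-compound bond dictionary (not the MBT class). */
class BondDictionarySketch {

    /** One known bond between two atom names inside a compound, e.g. "N" and "CA". */
    static final class BondTemplate {
        final String atomA;
        final String atomB;
        final int order;

        BondTemplate(String atomA, String atomB, int order) {
            this.atomA = atomA;
            this.atomB = atomB;
            this.order = order;
        }
    }

    // Keyed by compound code (e.g. "ALA"); the value lists that compound's bonds.
    private final Map<String, List<BondTemplate>> bondsByCompound = new HashMap<>();

    /** Registers one bond for a compound, creating the compound entry if needed. */
    void add(String compound, String atomA, String atomB, int order) {
        bondsByCompound
                .computeIfAbsent(compound, k -> new ArrayList<>())
                .add(new BondTemplate(atomA, atomB, order));
    }

    /** Returns the known bonds, or null when the compound is not in the dictionary. */
    List<BondTemplate> lookup(String compound) {
        return bondsByCompound.get(compound);
    }
}
```

A null result from lookup() is the signal that the caller has to fall back to some other way of finding bonds.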
If bonds are not found for a given residue, the atoms are run through a bond-generation algorithm that determines
bonds by distance. The atoms are first arranged in an octree for quick spatial checks.
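To illustrate the idea of distance-based bonding with a spatial index, here is a small sketch. Note that it swaps in a uniform grid for the octree to stay short, and uses a single cutoff distance for all atom pairs; the MBT Octree classes and the actual distance criteria are not shown in this page, so every name and value below is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: distance-based bonds using a uniform grid (not an octree). */
class DistanceBondSketch {

    static final class Atom {
        final double x, y, z;

        Atom(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Returns index pairs {i, j}, i < j, for atoms no farther apart than maxDist. */
    static List<int[]> findBonds(List<Atom> atoms, double maxDist) {
        // Bucket atom indices by grid cell. The cell edge equals maxDist, so a
        // bonded partner must be in the same cell or one of its 26 neighbors.
        Map<String, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < atoms.size(); i++) {
            Atom a = atoms.get(i);
            String k = key(cell(a.x, maxDist), cell(a.y, maxDist), cell(a.z, maxDist));
            cells.computeIfAbsent(k, unused -> new ArrayList<>()).add(i);
        }

        List<int[]> bonds = new ArrayList<>();
        double maxSq = maxDist * maxDist;
        for (int i = 0; i < atoms.size(); i++) {
            Atom a = atoms.get(i);
            int cx = cell(a.x, maxDist), cy = cell(a.y, maxDist), cz = cell(a.z, maxDist);
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dz = -1; dz <= 1; dz++) {
                        List<Integer> bucket = cells.get(key(cx + dx, cy + dy, cz + dz));
                        if (bucket == null) continue;
                        for (int j : bucket) {
                            if (j <= i) continue; // report each pair once
                            Atom b = atoms.get(j);
                            double ex = a.x - b.x, ey = a.y - b.y, ez = a.z - b.z;
                            if (ex * ex + ey * ey + ez * ez <= maxSq) {
                                bonds.add(new int[] {i, j});
                            }
                        }
                    }
                }
            }
        }
        return bonds;
    }

    private static int cell(double v, double size) {
        return (int) Math.floor(v / size);
    }

    private static String key(int x, int y, int z) {
        return x + "," + y + "," + z;
    }
}
```

The point of either structure, grid or octree, is the same: each atom is compared only against nearby atoms rather than against every other atom in the structure.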
Look in the 'RCSB MBT Libs' project, source directory 'Structure Model', in the package 'model', for
the StructureMap class, again. In there, find generateBonds().
Note that it checks a flag to ignore the dictionary lookup and strictly use the distance algorithm
(I suspect this is mainly for debugging). The BondFactory class does the dictionary lookup or the
bond calculation, depending on what's required.
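The decision described here (dictionary first, distance as a fallback, with a flag that forces the distance path) can be sketched as follows. This is an illustration only: the class, enum, and parameter names are hypothetical and are not taken from StructureMap or BondFactory.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of choosing between dictionary and distance bonding. */
class BondStrategySketch {

    /** Which path would produce the bonds for a residue. */
    enum Source { DICTIONARY, DISTANCE }

    /**
     * @param ignoreDictionary assumed debugging flag that forces the distance path
     * @param dictionary       compound code mapped to its known bond entries
     * @param compound         compound code of the residue being processed
     */
    static Source chooseSource(boolean ignoreDictionary,
                               Map<String, List<String>> dictionary,
                               String compound) {
        if (!ignoreDictionary && dictionary.containsKey(compound)) {
            return Source.DICTIONARY;
        }
        // Either the flag is set or the compound is unknown: compute by distance.
        return Source.DISTANCE;
    }
}
```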
Incidentally, the same kind of mechanism is used to determine nucleic acid classification. In the
'RCSB Excluded' project, source directory 'CL Tools', FindAllNucleicAcidCompoundNames is also
run from the command line and generates an output file ('NucleicAcidCompoundNames.dat').
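A name list of this kind could be consumed at runtime roughly as below. The layout of 'NucleicAcidCompoundNames.dat' is not described here, so this sketch assumes one compound name per line; the class and method names are hypothetical and are not those of NucleicAcidInfo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a nucleic acid name lookup (assumes one name per line). */
class NucleicAcidNamesSketch {

    /** Reads all non-blank lines from the source into a set of compound names. */
    static Set<String> load(Reader source) {
        Set<String> names = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String name = line.trim();
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** True when the compound code appears in the loaded name set. */
    static boolean isNucleicAcid(Set<String> names, String compound) {
        return names.contains(compound.trim());
    }
}
```

In an application the Reader would typically wrap a resource stream from the jar; a StringReader is enough to try the sketch out.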