BioJava:CookBook:PDB:read
How do I read a PDB file?
BioJava provides a PDB file parser that reads the content of a PDB file into a flexible data model for managing protein structural data. It is possible to
- parse individual PDB files, or
- work with local PDB file installations.
The core functionality for this is provided by the PDBFileReader class.
Short Example: the quickest way to read a local file
`// also works for gzip compressed files`
`String filename = "path/to/pdbfile.ent" ;`
`PDBFileReader pdbreader = new PDBFileReader();`
`try{`
` Structure struc = pdbreader.getStructure(filename);`
` `
`} catch (Exception e){`
` e.printStackTrace();`
`}`
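The comment in the example notes that gzip-compressed files are read as well. If you need similar behaviour in your own code, for instance to peek at a file before handing it to the parser, one way is to check for the gzip magic bytes (0x1f 0x8b) and wrap the stream accordingly. This is a sketch using only the Java standard library; `MaybeGzip` is a hypothetical helper, and it does not describe how PDBFileReader itself detects compression.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not part of BioJava.
public class MaybeGzip {

    /** Opens a file and unwraps gzip if the stream starts with the gzip magic bytes. */
    public static InputStream open(Path file) throws IOException {
        BufferedInputStream in = new BufferedInputStream(Files.newInputStream(file));
        in.mark(2);              // remember the start so the two bytes can be re-read
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        boolean gzip = (b1 == 0x1f && b2 == 0x8b);
        return gzip ? new GZIPInputStream(in) : in;
    }

    /** Returns the first line of the (possibly compressed) file, or null if it is empty. */
    public static String firstLine(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(file), StandardCharsets.US_ASCII))) {
            return reader.readLine();
        }
    }
}
```

Calling `MaybeGzip.firstLine(...)` on a plain `.ent` file and on its `.ent.gz` counterpart should return the same first record line.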
Example: How to work with a local installation of PDB
` try {`
` PDBFileReader reader = new PDBFileReader();`
` // the path to the local PDB installation`
` reader.setPath("/tmp");`
` `
` // are all files in one directory, or are the files split,`
` // as on the PDB ftp servers?`
` reader.setPdbDirectorySplit(true);`
` `
` // should a missing PDB id be fetched automatically from the FTP servers?`
` reader.setAutoFetch(true);`
` `
` // should the ATOM and SEQRES residues be aligned when creating the internal data model?`
` reader.setAlignSeqRes(false);`
` `
` // should secondary structure get parsed from the file`
` reader.setParseSecStruc(false);`
` `
` Structure structure = reader.getStructureById("4hhb");`
` `
` System.out.println(structure);`
` `
` } catch (Exception e){`
` e.printStackTrace();`
` }`
This gives the following output:
Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb4hhb.ent.gz
writing to /tmp/hh/pdb4hhb.ent.gz
structure 4HHB Authors: G.FERMI,M.F.PERUTZ Resolution: 1.74 Technique: X-RAY DIFFRACTION Classification: OXYGEN TRANSPORT DepDate: Wed Mar 07 00:00:00 PST 1984 IdCode: 4HHB Title: THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION ModDate: Tue Feb 24 00:00:00 PST 2009
chains:
chain 0: >A< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
length SEQRES: 0 length ATOM: 198 aminos: 141 hetatms: 57 nucleotides: 0
chain 1: >B< HEMOGLOBIN (DEOXY) (BETA CHAIN)
length SEQRES: 0 length ATOM: 205 aminos: 146 hetatms: 59 nucleotides: 0
chain 2: >C< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
length SEQRES: 0 length ATOM: 201 aminos: 141 hetatms: 60 nucleotides: 0
chain 3: >D< HEMOGLOBIN (DEOXY) (BETA CHAIN)
length SEQRES: 0 length ATOM: 197 aminos: 146 hetatms: 51 nucleotides: 0
DBRefs: 4
DBREF 4HHB A 1 141 UNP P69905 HBA_HUMAN 1 141
DBREF 4HHB B 1 146 UNP P68871 HBB_HUMAN 1 146
DBREF 4HHB C 1 141 UNP P69905 HBA_HUMAN 1 141
DBREF 4HHB D 1 146 UNP P68871 HBB_HUMAN 1 146
Molecules:
Compound: 1 HEMOGLOBIN (DEOXY) (ALPHA CHAIN) Chains: ChainId: A C Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN
Compound: 2 HEMOGLOBIN (DEOXY) (BETA CHAIN) Chains: ChainId: B D Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN
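The `writing to /tmp/hh/pdb4hhb.ent.gz` line in the output above suggests how the split layout is organised: each file sits in a subdirectory named after the two middle characters of its PDB id. The following is a minimal sketch of that naming rule using only the Java standard library. The class `SplitPdbPath` and its methods are hypothetical helpers written for illustration, not part of BioJava, and the rule is inferred from the output shown here rather than taken from the PDBFileReader source.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of BioJava.
public class SplitPdbPath {

    /** File name as seen in the output: "pdb" + lower-case id + ".ent.gz". */
    public static String fileName(String pdbId) {
        if (pdbId == null || pdbId.length() != 4) {
            throw new IllegalArgumentException("expected a 4-character PDB id, got: " + pdbId);
        }
        return "pdb" + pdbId.toLowerCase(Locale.ROOT) + ".ent.gz";
    }

    /** Split layout: root / middle two id characters / file name. */
    public static String splitPath(String root, String pdbId) {
        String name = fileName(pdbId);                       // also validates the id
        String middle = pdbId.toLowerCase(Locale.ROOT).substring(1, 3);
        return root + "/" + middle + "/" + name;
    }

    public static void main(String[] args) {
        // prints /tmp/hh/pdb4hhb.ent.gz
        System.out.println(splitPath("/tmp", "4HHB"));
    }
}
```

With `setPdbDirectorySplit(false)` one would instead expect all files directly under the configured path, although the exact behaviour may depend on the BioJava version.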
Example: How to parse a local file
This example shows how to read a PDB file from your file system, obtain a Structure object, and iterate over the Groups contained in the file. For more examples of how to access the Atoms, see <BioJava:CookBook:PDB:atoms>. For more information on how the parser deals with SEQRES and ATOM records, see <BioJava:CookBook:PDB:seqres>.
```java
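// Package names below are those of the BioJava 1.x/3.x releases;
// adjust them if your version uses a different package layout.
import java.util.Map;

import org.biojava.bio.structure.AminoAcid;
import org.biojava.bio.structure.Chain;
import org.biojava.bio.structure.Group;
import org.biojava.bio.structure.GroupIterator;
import org.biojava.bio.structure.Structure;
import org.biojava.bio.structure.io.PDBFileReader;

// also works for gzip compressed files
String filename = "path/to/pdbfile.ent";

PDBFileReader pdbreader = new PDBFileReader();

// the following parameters are optional:

// the parser can read the secondary structure assignment
// from the PDB file header and add it to the amino acids
pdbreader.setParseSecStruc(true);

// align the SEQRES and ATOM records, default = true
// this slightly slows parsing, so turn it off if speed matters
pdbreader.setAlignSeqRes(true);

// parse the C-alpha atoms only, default = false
pdbreader.setParseCAOnly(false);

// download missing PDB files automatically from the PDB FTP server, default = false
pdbreader.setAutoFetch(false);

try {
    Structure struc = pdbreader.getStructure(filename);
    System.out.println(struc);

    // iterate over all groups (residues, hetero groups, nucleotides) in the structure
    GroupIterator gi = new GroupIterator(struc);
    while (gi.hasNext()) {
        Group g = (Group) gi.next();

        // only amino acids carry a secondary structure assignment
        if (g instanceof AminoAcid) {
            AminoAcid aa = (AminoAcid) g;
            Map sec = aa.getSecStruc();
            Chain c = g.getParent();
            System.out.println(c.getName() + " " + g + " " + sec);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}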
```
To learn how to serialize a Structure object to a database, see <BioJava:CookBook:PDB:hibernate>.
Next: <BioJava:CookBook:PDB:atoms> - How to access atoms.