BioJava:CookBook:PDB:read

How do I read a PDB file?

BioJava provides a PDB file parser that reads the content of a PDB file into a flexible data model for managing protein structural data. It is possible to

  • parse individual PDB files, or
  • work with local PDB file installations.

The core functionality is provided by the PDBFileReader class.

Short Example: the quickest way to read a local file


`// also works for gzip compressed files`  
`String filename = "path/to/pdbfile.ent";`  
  
`PDBFileReader pdbreader = new PDBFileReader();`

`try{`

`    Structure struc = pdbreader.getStructure(filename);`  
`    `  
`} catch (Exception e){`  
`    e.printStackTrace();`  
`}`
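
The comment in the example notes that gzip compressed files are accepted as well. How a reader can treat both kinds of file through one code path is sketched below using only the Java standard library. This is an illustration of the general idea, not BioJava's actual implementation; the class name `MaybeGzip` and the method `open` are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class MaybeGzip {

    /**
     * Hypothetical helper: opens a plain or gzip compressed text file.
     * The first two bytes are checked against the gzip magic number (0x1f 0x8b).
     */
    static BufferedReader open(Path file) throws IOException {
        InputStream in = new BufferedInputStream(Files.newInputStream(file));
        in.mark(2);            // remember the start so the peeked bytes can be re-read
        int b1 = in.read();
        int b2 = in.read();
        in.reset();            // rewind to the start of the file
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        InputStream data = gzipped ? new GZIPInputStream(in) : in;
        return new BufferedReader(new InputStreamReader(data, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MaybeGzip <file>");
            return;
        }
        // Print the first line of the file, whether or not it is compressed.
        try (BufferedReader reader = open(Paths.get(args[0]))) {
            System.out.println(reader.readLine());
        }
    }
}
```

Checking the content rather than the file extension means a file is handled correctly even if it was renamed after compression.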

Example: How to work with a local installation of PDB


`       try {`  
`           PDBFileReader reader = new PDBFileReader();`

`           // the path to the local PDB installation`  
`           reader.setPath("/tmp");`  
`           `  
`           // are all files in one directory, or are the files split,`  
`           // as on the PDB ftp servers?`  
`           reader.setPdbDirectorySplit(true);`  
`           `  
`           // should a missing PDB id be fetched automatically from the FTP servers?`  
`           reader.setAutoFetch(true);`  
`           `  
`           // should the ATOM and SEQRES residues be aligned when creating the internal data model?`  
`           reader.setAlignSeqRes(false);`  
`           `  
`           // should secondary structure get parsed from the file`  
`           reader.setParseSecStruc(false);`  
`           `  
`           Structure structure = reader.getStructureById("4hhb");`  
`           `  
`           System.out.println(structure);`  
`           `  
`       } catch (Exception e){`  
`           e.printStackTrace();`  
`       }`

This produces the following output:

Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb4hhb.ent.gz
writing to /tmp/hh/pdb4hhb.ent.gz
structure  4HHB Authors: G.FERMI,M.F.PERUTZ Resolution: 1.74 Technique: X-RAY DIFFRACTION  Classification: OXYGEN TRANSPORT DepDate: Wed Mar 07 00:00:00 PST 1984 IdCode: 4HHB Title: THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION ModDate: Tue Feb 24 00:00:00 PST 2009 
 chains:
chain 0: >A< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
 length SEQRES: 0 length ATOM: 198 aminos: 141 hetatms: 57 nucleotides: 0
chain 1: >B< HEMOGLOBIN (DEOXY) (BETA CHAIN)
 length SEQRES: 0 length ATOM: 205 aminos: 146 hetatms: 59 nucleotides: 0
chain 2: >C< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
 length SEQRES: 0 length ATOM: 201 aminos: 141 hetatms: 60 nucleotides: 0
chain 3: >D< HEMOGLOBIN (DEOXY) (BETA CHAIN)
 length SEQRES: 0 length ATOM: 197 aminos: 146 hetatms: 51 nucleotides: 0
DBRefs: 4
DBREF  4HHB A    1   141  UNP    P69905   HBA_HUMAN        1    141
DBREF  4HHB B    1   146  UNP    P68871   HBB_HUMAN        1    146
DBREF  4HHB C    1   141  UNP    P69905   HBA_HUMAN        1    141
DBREF  4HHB D    1   146  UNP    P68871   HBB_HUMAN        1    146
Molecules: 
Compound: 1 HEMOGLOBIN (DEOXY) (ALPHA CHAIN) Chains: ChainId: A C Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN 
Compound: 2 HEMOGLOBIN (DEOXY) (BETA CHAIN) Chains: ChainId: B D Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN 
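
The first two lines of the output show where a missing file is fetched from and where it is stored in a split directory layout: the file is named `pdb<id>.ent.gz` and placed in a subdirectory named after the two middle characters of the PDB ID. The following standalone sketch reproduces that mapping. It is inferred from the output above rather than taken from BioJava's code; the class `PdbLocations` and its methods are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class PdbLocations {

    // Base URL copied from the "Fetching" line in the output above.
    static final String REMOTE_BASE =
            "ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/";

    /** Validates a 4-character PDB ID and returns it in lower case. */
    static String normalize(String pdbId) {
        if (pdbId == null || pdbId.length() != 4) {
            throw new IllegalArgumentException("expected a 4-character PDB ID, got: " + pdbId);
        }
        return pdbId.toLowerCase(Locale.ROOT);
    }

    /** File name as it appears in the output, e.g. pdb4hhb.ent.gz. */
    static String fileName(String pdbId) {
        return "pdb" + normalize(pdbId) + ".ent.gz";
    }

    /** Remote location of the compressed file. */
    static String remoteUrl(String pdbId) {
        return REMOTE_BASE + fileName(pdbId);
    }

    /** Local location in a split layout: <root>/<middle two characters>/<file name>. */
    static Path localPath(String root, String pdbId) {
        String middle = normalize(pdbId).substring(1, 3); // "4hhb" -> "hh"
        return Paths.get(root, middle, fileName(pdbId));
    }

    public static void main(String[] args) {
        System.out.println(remoteUrl("4hhb"));
        System.out.println(localPath("/tmp", "4hhb"));
    }
}
```

With `reader.setPath("/tmp")` and `reader.setPdbDirectorySplit(true)` as in the example, this corresponds to the `/tmp/hh/pdb4hhb.ent.gz` location reported in the output.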

Example: How to parse a local file

This example shows how to read a PDB file from your file system, obtain a Structure object, and iterate over the Groups contained in the file. For more examples of how to access the Atoms, see How to access atoms. How the parser deals with SEQRES and ATOM records is documented separately.

```java
// also works for gzip compressed files
String filename = "path/to/pdbfile.ent";

PDBFileReader pdbreader = new PDBFileReader();

// the following parameters are optional:

// the parser can read the secondary structure
// assignment from the PDB file header and add it to the amino acids
pdbreader.setParseSecStruc(true);

// align the SEQRES and ATOM records, default = true
// this slows parsing down slightly, so turn it off if speed matters
pdbreader.setAlignSeqRes(true);

// parse the C-alpha atoms only, default = false
pdbreader.setParseCAOnly(false);

// download missing PDB files automatically from EBI ftp server, default = false
pdbreader.setAutoFetch(false);

try {
    Structure struc = pdbreader.getStructure(filename);

    System.out.println(struc);

    // iterate over all groups (residues, hetero groups, nucleotides) in the structure
    GroupIterator gi = new GroupIterator(struc);

    while (gi.hasNext()) {
        Group g = (Group) gi.next();

        // only amino acids carry a secondary structure assignment
        if (g instanceof AminoAcid) {
            AminoAcid aa = (AminoAcid) g;
            Map sec = aa.getSecStruc();
            Chain c = g.getParent();
            System.out.println(c.getName() + " " + g + " " + sec);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
```

Serializing a Structure object to a database is covered separately.

Next: How to access atoms.