BioJava:CookBook:PDB:read

How do I read a PDB file?

BioJava provides a PDB file parser that reads the content of a PDB file into a flexible data model for managing protein structural data. You can

  • parse individual PDB files, or
  • work with local PDB file installations.

The core functionality is provided by the PDBFileReader class.

Short Example: the quickest way to read a local file


`// also works for gzip compressed files`  
`String filename =  "path/to/pdbfile.ent" ;`  
  
`PDBFileReader pdbreader = new PDBFileReader();`

`try{`

`    Structure struc = pdbreader.getStructure(filename);`  
`    `  
`} catch (Exception e){`  
`    e.printStackTrace();`  
`}`
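
The comment in the example above notes that gzip-compressed files are accepted as well. How the parser detects compression is not described on this page. As a rough sketch of one common approach, the following standard-library code checks for the two-byte gzip magic number (0x1f 0x8b) before deciding how to open a file. The class `PdbStreams` and its methods are hypothetical names used for illustration and are not part of BioJava.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only, not part of BioJava.
class PdbStreams {

    /** Returns true if the buffer starts with the gzip magic number 0x1f 0x8b. */
    static boolean isGzip(byte[] header) {
        return header != null
                && header.length >= 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
    }

    /** Opens a file and wraps it in a GZIPInputStream if it is gzip-compressed. */
    static InputStream open(String filename) throws IOException {
        BufferedInputStream in = new BufferedInputStream(new FileInputStream(filename));

        // peek at the first two bytes, then rewind so the caller sees the whole file
        in.mark(2);
        byte[] header = new byte[2];
        int n = in.read(header, 0, 2);
        in.reset();

        if (n == 2 && isGzip(header)) {
            return new GZIPInputStream(in);
        }
        return in;
    }
}
```

With a helper like this, `PdbStreams.open("path/to/pdbfile.ent")` and `PdbStreams.open("path/to/pdbfile.ent.gz")` would both return a stream of plain PDB text.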

Example: How to work with a local installation of PDB


`       try {`  
`           PDBFileReader reader = new PDBFileReader();`

`           // the path to the local PDB installation`  
`           reader.setPath("/tmp");`  
`           `  
`           // are all files in one directory, or are the files split,`  
`           // as on the PDB ftp servers?`  
`           reader.setPdbDirectorySplit(true);`  
`           `  
`           // should a missing PDB id be fetched automatically from the FTP servers?`  
`           reader.setAutoFetch(true);`  
`           `  
`           // should the ATOM and SEQRES residues be aligned when creating the internal data model?`  
`           reader.setAlignSeqRes(false);`  
`           `  
`           // should secondary structure get parsed from the file`  
`           reader.setParseSecStruc(false);`  
`           `  
`           Structure structure = reader.getStructureById("4hhb");`  
`           `  
`           System.out.println(structure);`  
`           `  
`       } catch (Exception e){`  
`           e.printStackTrace();`  
`       }`

This gives the following output:

Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb4hhb.ent.gz
writing to /tmp/hh/pdb4hhb.ent.gz
structure  4HHB Authors: G.FERMI,M.F.PERUTZ Resolution: 1.74 Technique: X-RAY DIFFRACTION  Classification: OXYGEN TRANSPORT DepDate: Wed Mar 07 00:00:00 PST 1984 IdCode: 4HHB Title: THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION ModDate: Tue Feb 24 00:00:00 PST 2009 
 chains:
chain 0: >A< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
 length SEQRES: 0 length ATOM: 198 aminos: 141 hetatms: 57 nucleotides: 0
chain 1: >B< HEMOGLOBIN (DEOXY) (BETA CHAIN)
 length SEQRES: 0 length ATOM: 205 aminos: 146 hetatms: 59 nucleotides: 0
chain 2: >C< HEMOGLOBIN (DEOXY) (ALPHA CHAIN)
 length SEQRES: 0 length ATOM: 201 aminos: 141 hetatms: 60 nucleotides: 0
chain 3: >D< HEMOGLOBIN (DEOXY) (BETA CHAIN)
 length SEQRES: 0 length ATOM: 197 aminos: 146 hetatms: 51 nucleotides: 0
DBRefs: 4
DBREF  4HHB A    1   141  UNP    P69905   HBA_HUMAN        1    141
DBREF  4HHB B    1   146  UNP    P68871   HBB_HUMAN        1    146
DBREF  4HHB C    1   141  UNP    P69905   HBA_HUMAN        1    141
DBREF  4HHB D    1   146  UNP    P68871   HBB_HUMAN        1    146
Molecules: 
Compound: 1 HEMOGLOBIN (DEOXY) (ALPHA CHAIN) Chains: ChainId: A C Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN 
Compound: 2 HEMOGLOBIN (DEOXY) (BETA CHAIN) Chains: ChainId: B D Engineered: YES OrganismScientific: HOMO SAPIENS OrganismTaxId: 9606 OrganismCommon: HUMAN 
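
The line `writing to /tmp/hh/pdb4hhb.ent.gz` in the output above shows what a split installation looks like: the file for entry 4hhb is stored in a sub-directory named after the two middle characters of the PDB id. The sketch below reproduces that layout with plain Java. It is inferred from the output shown here, and the class `SplitPdbPath` is a hypothetical helper, not part of BioJava; the exact file naming used by a given BioJava version may differ.

```java
import java.util.Locale;

// Hypothetical helper for illustration only, not part of BioJava.
class SplitPdbPath {

    /** Builds the path of a PDB entry inside a split local installation. */
    static String pathFor(String root, String pdbId) {
        if (pdbId == null || pdbId.length() != 4) {
            throw new IllegalArgumentException("PDB ids have four characters: " + pdbId);
        }
        String id = pdbId.toLowerCase(Locale.ROOT);

        // the sub-directory is named after the two middle characters of the id
        String middle = id.substring(1, 3);

        return root + "/" + middle + "/pdb" + id + ".ent.gz";
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/tmp", "4hhb")); // prints /tmp/hh/pdb4hhb.ent.gz
    }
}
```

In a non-split installation all files sit directly in the root directory, so the `middle` component would simply be omitted.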

Example: How to parse a local file

This example shows how to read a PDB file from your file system, obtain a Structure object and iterate over the Groups that are contained in the file. For more examples of how to access the Atoms please go to <BioJava:CookBook:PDB:atoms>. For more info on how the parser deals with SEQRES and ATOM records please see <BioJava:CookBook:PDB:seqres>.

```java

// also works for gzip compressed files
String filename =  "path/to/pdbfile.ent" ;

PDBFileReader pdbreader = new PDBFileReader();

// the following parameters are optional:

// read the secondary structure assignment from the PDB file header
// and add it to the amino acids
pdbreader.setParseSecStruc(true);

// align the SEQRES and ATOM records, default = true
// this slows parsing slightly, so turn it off if speed matters
pdbreader.setAlignSeqRes(true);

// parse the C-alpha atoms only, default = false
pdbreader.setParseCAOnly(false);

// download missing PDB files automatically from the PDB FTP servers, default = false
pdbreader.setAutoFetch(false);

try{
    Structure struc = pdbreader.getStructure(filename);
    
    System.out.println(struc);

    GroupIterator gi = new GroupIterator(struc);

    while (gi.hasNext()) {
        Group g = (Group) gi.next();

        if (g instanceof AminoAcid) {
            AminoAcid aa = (AminoAcid) g;
            Map sec = aa.getSecStruc();
            Chain c = g.getParent();
            System.out.println(c.getName() + " " + g + " " + sec);
        }
    }

} catch (Exception e) {
    e.printStackTrace();
}

```

To learn how to serialize a Structure object to a database, see <BioJava:CookBook:PDB:hibernate>.

Next: <BioJava:CookBook:PDB:atoms> - How to access atoms.