BioJava:CookBookLegacy
BioJava 3.* release
BioJava 3 is a major rewrite of BioJava 1. The cookbook for the new API is available here:
BioJava 1 reference
If you use BioJava 1, please cite:
BioJava In Anger - A Tutorial and Recipe Book for Those in a Hurry
BioJava can be both big and intimidating. For those of us in a hurry, there really is a whole lot to get your head around. This document is designed to help you develop BioJava programs that do 99% of common tasks without needing to read and understand 99% of the BioJava API.
The page was inspired by various programming cookbooks and follows a “How do I…?” type approach. Each “How do I…?” is linked to some example code that does what you want and sometimes more. Basically, if you find the code you want and copy and paste it into your program, you should be up and running quickly. I have endeavoured to over-document the code to make it more obvious what I am doing, so some of the code might look a bit bloated.
If you have any suggestions, questions or comments, contact the BioJava mailing list. To subscribe to this list, go here.
If you re-use code from the cookbook, please cite:
Announcing
You can now read BioJava in Anger in French (translated by Sylvain Foisy; updated 28 December 2009).
You can also read BioJava in Anger in Portuguese (translated by Dickson Guedes).
You can also read BioJava in Anger in Japanese (translated by Takeshi Sasayama and Kentaro Sugino; updated 14 Aug 2004).
How about Simplified Chinese? (Translated by Wu Xin.)
And let's not forget this new Italian translation (translated by Alessandro Cipriani; last updated 9 Sep 2010).
How Do I…?
Setup
- Where do I get a Java installation?
- How do I get and install BioJava?
- How do I integrate BioJava with NetBeans IDE?
Alphabets and Symbols
- How do I get a DNA, RNA or Protein Alphabet?
- How do I make a custom Alphabet from custom Symbols?
- How do I make a CrossProductAlphabet such as a codon Alphabet?
- How do I break Symbols from CrossProduct Alphabets into their component Symbols?
- How can I tell if two Alphabets or Symbols are equal?
- How can I make an ambiguous Symbol like Y or R?
Basic Sequence Manipulation
- How do I make a Sequence from a String or make a Sequence Object back into a String?
- How do I get a subsection of a Sequence?
- How do I transcribe a DNA Sequence to an RNA Sequence?
- How do I reverse complement a DNA or RNA Sequence?
- Sequences are immutable, so how can I change a Sequence's name?
- How can I edit a Sequence or SymbolList?
- How can I make a sequence motif into a regular expression?
- How can I extract all regions marked (or not marked) with a particular feature (e.g. ‘gene’ or ‘CDS’)?
Translation
- How do I translate a DNA or RNA Sequence or SymbolList to Protein?
- How do I translate a single codon to a single amino acid?
- How do I use a non standard translation table?
- How do I translate a nucleotide sequence in all six frames?
- How do I retrieve the 1-Letter code of a translated sequence containing ambiguities?
Proteomics
- How do I calculate the mass and pI of a peptide?
- How do I analyze the symbol properties of an amino acid sequence using the Amino Acid Index database?
Sequence I/O
- How do I write Sequences in Fasta format?
- How do I read in a Fasta file?
- How do I read a GenBank/EMBL/SwissProt file? (deprecated)
- How do I read a GenBank/EMBL/UniProt/FASTA/INSDseq file?
- How do I extract GenBank/EMBL/UniProt/FASTA/INSDseq sequences and write them as Fasta?
- How do I turn an ABI sequence trace into a BioJava Sequence?
- How do I work with nextgen sequencing reads in FASTQ format?
- How does sequence I/O work in BioJava?
Annotations
- How do I list the Annotations in a Sequence?
- How do I extract Annotations for a set of Features?
- How do I filter Sequences based on their species (or another Annotation property)?
Locations and Features
- How do I specify a PointLocation?
- How do I specify a RangeLocation?
- How do CircularLocations work?
- How can I make a Feature?
- How can I filter Features by type?
- How can I remove features?
BLAST and FASTA
- How do I set up a BLAST parser?
- How do I set up a FASTA parser?
- How do I extract information from parsed results?
- How do I parse a large file; Or, How do I make a custom SearchContentHandler?
- How do I convert an XML BLAST result into HTML page?
Counts and Distributions
- How do I count the residues in a Sequence?
- How do I calculate the frequency of a Symbol in a Sequence?
- How can I turn a Count into a Distribution?
- How can I generate a random sequence from a Distribution?
- How can I find the amount of information or entropy in a Distribution?
- What is an easy way to tell if two Distributions have equal weights?
- How can I make an OrderNDistribution over a custom Alphabet?
- How can I write a Distribution as XML?
- Using Distributions to make a Gibbs sampler
- Using Distributions to make a naive Bayes classifier
- How do I calculate the composition of a Sequence or collection of Sequences? (This example uses JDK 1.5 and BioJavaX.)
Weight Matrices and Dynamic Programming
- How do I use a WeightMatrix to find a motif?
- How do I make a HMMER like profile HMM?
- How do I set up a custom HMM?
- How do I generate a pair-wise alignment with a Hidden Markov Model?
- How do I generate a global or local alignment with the Needleman-Wunsch or Smith-Waterman algorithm?
User Interfaces
- How can I visualize Annotations and Features as a tree?
- How can I display a Sequence in a GUI?
- How can I create a RichSequence viewer?
- How do I display Sequence coordinates?
- How can I display features?
- How can I view an Alignment?
- How can I view an Alignment II?
- How can I display Protein Features / a Peptide Digest?
BioSQL and Sequence Databases
- How do I set up BioSQL with PostgreSQL? (by David Huen)
- How do I set up BioSQL with Oracle? (by Richard Holland)
- How do I add, view and remove Sequence Objects from a BioSQL DB?
- How can I get a sequence straight from NCBI?
External Applications and Services
- How can I use QBlast to do my alignments remotely?
- How do I create multiple alignments using ClustalW and BioJava?
Genetic Algorithms
Protein Structure
Since BioJava 1.8, all protein structure modules have moved to BioJava 3.
Ontologies
Cloud computing
Disclaimer
This code is generously donated by people who probably have better things to do. Where possible we test it, but errors may have crept in. As such, all code and advice herein has no warranty or guarantee of any sort. You didn’t pay for it, and if you use it, we are not responsible for anything that goes wrong. Be a good programmer and test it yourself before unleashing it on your corporate database.
Copyright
The documentation on this site is the property of the people who contributed it. If you wish to use it in a publication, please make a request through the BioJava mailing list.
The code is open-source. A good definition of open-source can be found here. If you agree with that definition then you can use it.