BOSC2008 Abstract

BOSC2008 Abstract

The abstract was submitted as appears below. Please send an email to the biojava-dev mailing list with any further changes to be made.

BOSC2008_Abstract.odt

BOSC2008_Abstract.pdf

General information

Paper Title: BioJava Project Update

Student Paper? No

Author(s) Information

Technical Areas

Bio * Open Source Project Updates

Content

Keywords:

OBF O|B|F open-bio biojava bioperl biosql sequence alphabet feature annotation alignment protein structure phylogenetic trees

Abstract:

BioJava is a mature free and open-source project that provides a framework for processing biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats, and packages for manipulating sequences and 3D structures. BioJava is available freely under the terms of version 2.1 of the GNU Lesser General Public License (LGPL) from http://biojava.org/. Here we present the latest BioJava release (version 1.6, released on 13 Apr 2008) which provides improvements in the packages for phylogenetic trees, processing PDB files, and genetic algorithms.

Paper:

BioJava Project Update

BioJava was conceived in 1999 by Thomas Down and Matthew Pocock as an API to simplify bioinformatics software development using Java (Pocock et al., 2000). It has since then evolved to become a fully-featured framework with modules for performing many common bioinformatics tasks.

As a free and open-source project, BioJava is developed by volunteers coordinated by the Open Bioinformatics Foundation (O|B|F, http://open-bio.org/) and is one of several Bio* toolkits (Mangalam, 2002). Over the past eight years, the BioJava has brought together nearly fifty different code contributors, hundreds of mailing list subscribers, and several wiki contributors. All code and related documentation is distributed under version 2.1 of the GNU Lesser General Public License (LGPL) license (Free Software Foundation, Inc., 1999). All wiki documentation is made available online under version 1.2 of the GNU Free Documentation License (Free Software Foundation, Inc., 2000).

BioJava has been used in a number of real-world applications, including Bioclipse (Spjuth et al., 2007), BioWeka (Gewehr et al., 2007), Cytoscape (Shannon et al., 2003), and Taverna (Oinn et al., 2004), and has been referenced in over fifty published studies. A list of these can be found on the BioJava website.

The latest BioJava release (version 1.6, released on 13 Apr 2008) offers more functionality and stability over the previous official releases. The phylogenomics package was improved and expanded by our 2007 Google Summer of Code (GSOC’07) student Boh-Yun Lee. It now contains fully-functional Nexus and Phylip parsers, and tools for calculating UPGMA and Neighbour Joining, Jukes-Kantor and Kimura Two Parameter, and MP. The PDB file parser was improved by Jules Jacobsen for better dealing with PDB header records. Andreas Dräger provided several patches for improving the genetic algorithm packages. The version 1.6 release also contains numerous bug fixes and documentation improvements.

The BioJava website is http://biojava.org/. The version 1.6 release can be downloaded from http://biojava.org/wiki/BioJava:Download.

References

Free Software Foundation, Inc. (1999) GNU Lesser General Public License, version 2.1, http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html, accessed 10 May 2008.

Free Software Foundation, Inc. (2000) GNU Free Documentation License, version 1.2, http://www.gnu.org/licenses/fdl-1.2.html, accessed 10 May 2008.

Gewehr JE, Szugat M, Zimmer R. (2007) BioWeka—extending the Weka framework for bioinformatics Bioinformatics 2007 23(5):651-653.

Mangalam H. (2002) The Bio* toolkits – a brief overview. Brief Bioinform., 3, 396-302.

Oinn T, Addis M, Ferris J, Marvin D, Greenwood M, Carver T, Pocock MR, Wipat A, Li P. (2004) Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20, 3045–3054.

Pocock M, Down T, Hubbard T. (2000) BioJava: Open Source Components for Bioinformatics. ACM SIGBIO Newsletter 20(2), 10-12.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 2003 Nov; 13(11):2498-504.

Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, Wagener J, Murray-Rust P, Steinbeck C, Wikberg JE. (2007) Bioclipse: an open source workbench for chemo- and bioinformatics. BMC Bioinformatics. 2007 Feb 22;8:59.

Notes

Save for talk

As a mature project, BioJava faces several challenges:

how one deals with a large established code base

what happens when committers move on, get married, have kids, etc.

how difficult it is to deprecate and remove existing code

the BioJava3 use case & refactoring/redesign criteria gathering process

evolutionary vs. revolutionary changes

http://incubator.apache.org/learn/rules-for-revolutionaries.html

the “second system” problem

http://www.joelonsoftware.com/articles/fog0000000069.html

Version 1.6 release announcement to biojava-dev and biojava-l

Date: Sun, 13 Apr 2008 19:02:41 +0100
From: Andreas Prlic
To: biojava-dev at biojava.org, biojava-l at biojava.org
Subject: [Biojava-dev] biojava 1.6 released
Biojava 1.6 has been released and is available from http:// biojava.org/wiki/BioJava:Download

Biojava 1.6 offers more functionality and stability over the previous official releases. BioJava now depends on Java 1.5+. We highly recommend you to upgrade as soon as possible.

In detail, the phylo package org.biojavax.bio.phylo was improved and expanded by our GSOC’07 student Boh-Yun Lee. It now contains fully- functional Nexus and Phylip parsers, and tools for calculating UPGMA and Neighbour Joining, Jukes-Kantor and Kimura Two Parameter, and MP. It uses JGraphT to represent parsed trees.

The PDB file parser was improved by Jules Jacobsen for better dealing with PDB header records. Andreas Draeger provided several patches for improving the Genetic Algorithm modules. Additionally this release contains numerous bug fixes and documentation improvements.

Thanks to the entire biojava community for making this possible!

Happy Biojava-ing,

Andreas

From http://www.ohloh.net/projects/6798

As of 08 May 2008

181,197 lines of code in “biojava-live/trunk”

Estimated effort using COCOMO 1 metric: 47 Person Years

48 contributors (committers with at least one commit to cvs and/or subversion repository)

also compare with:

BioJava StatSVN: http://www.spice-3d.org/statsvn/stats/

top 10 authors:

mrp     114161 (25.5%)
thomasd 82637 (18.5%)
holland 58798 (13.1%)
kdj     43546 (9.7%)
andreas 40727 (9.1%)
mark_s  36616 (8.2%)
dhuen   25610 (5.7%)
gcox    5954 (1.3%)
birney  4087 (0.9%)
draeger 3994 (0.9%)

First commit

This commit was generated by cvs2svn to compensate for changes in r2, which included commits to RCS files with non-trunk default branches.

by birney on 2000-01-26 15:53 (over 8 years ago)

Interesting to find out what happened before this administrative commit, as there were 6539 lines of code already.

Statsvn lists the files that were addded in the first commit:

    4087 lines of code changed in:

        * org/biojava/bio: BioError.java (new 92), BioException.java (new 88)
        * org/biojava/bio/alignment: AbstractCursor.java (new), AbstractState.java (new 1), AbstractTrainer.java (new), Alignment.java (new 29), AmbiguityState.java (new), BaumWelchSampler.java (new), BaumWelchTrainer.java (new), Column.java (new), ComplementaryState.java (new), DNAState.java (new), DNAWeightMatrix.java (new), DP.java (new 10), DPCursor.java (new), DoubleAlphabet.java (new), EmissionState.java (new), FlatModel.java (new), IllegalTransitionException.java (new), MarkovModel.java (new), MarkovModelWrapper.java (new), MatrixCursor.java (new), ModelInState.java (new), ModelTrainer.java (new), SimpleAlignment.java (new 77), SimpleMarkovModel.java (new 3), SimpleModelInState.java (new), SimpleModelTrainer.java (new), SimpleState.java (new), SimpleStateLabeledSequence.java (new), SimpleStateTrainer.java (new), SimpleTransitionTrainer.java (new), SimpleWeightMatrix.java (new), SmallCursor.java (new), State.java (new), StateFactory.java (new), StateLabeledSequence.java (new), StateTrainer.java (new), StateWrapper.java (new), StoppingCriteria.java (new), SuffixTree.java (new 35), TrainerTransition.java (new), TrainingAlgorithm.java (new), Transition.java (new), TransitionTrainer.java (new), WMAsMM.java (new), WeightMatrix.java (new), WeightMatrixAnnotator.java (new 23), XmlMarkovModel.java (new)
        * org/biojava/bio/gui: BarLogoPainter.java (new 85), DNAStyle.java (new 84), LogoPainter.java (new 45), PlainStyle.java (new 56), ResidueStyle.java (new), StateLogo.java (new 7), TextLogoPainter.java (new 209)
        * org/biojava/bio/program: Meme.java (new 151)
        * org/biojava/bio/seq: AbstractAlphabet.java (new), AllSymbolsAlphabet.java (new), Alphabet.java (new 4), Annotatable.java (new 1), Annotation.java (new 2), Annotator.java (new 5), CompoundLocation.java (new 2), Feature.java (new 66), FeatureFactory.java (new), FeatureFilter.java (new 81), FeatureHolder.java (new 34), FixedWidthParser.java (new), HashSequenceDB.java (new 1), IllegalResidueException.java (new), Location.java (new 7), NameParser.java (new), PointLocation.java (new 2), RangeLocation.java (new), Residue.java (new), ResidueList.java (new), ResidueParser.java (new), SeqException.java (new), Sequence.java (new 63), SequenceDB.java (new), SequenceFactory.java (new 28), SequenceIterator.java (new 33), SimpleAlphabet.java (new), SimpleAnnotation.java (new), SimpleFeature.java (new 14), SimpleFeatureFactory.java (new), SimpleFeatureHolder.java (new 65), SimpleResidue.java (new 1), SimpleResidueList.java (new 2), SimpleSequence.java (new 1), SimpleSequenceFactory.java (new), SymbolParser.java (new)
        * org/biojava/bio/seq/io: DefaultDescriptionReader.java (new), EmblFormat.java (new 60), FastaDescriptionReader.java (new), FastaFormat.java (new 109), SequenceFormat.java (new 37), StreamReader.java (new 93), StreamWriter.java (new 50)
        * org/biojava/bio/seq/tools: AlphabetManager.java (new 1), DNATools.java (new)
        * org/biojava/stats/svm: LinearKernel.java (new 37), ListSumKernel.java (new 74), PolynomialKernel.java (new 89), RadialBaseKernel.java (new 67), SMORegressionTrainer.java (new 429), SMOTrainer.java (new 283), SVMKernel.java (new 39), SVMModel.java (new), SVMRegressionModel.java (new 170), SigmoidKernel.java (new 83), SparseVector.java (new 113), TrainingContext.java (new 31), TrainingEvent.java (new 39), TrainingListener.java (new 34)
        * org/biojava/stats/svm/tools: ClassifierExample.java (new 387), Classify.java (new 80), SVM_Light.java (new 199), Train.java (new 90), TrainRegression.java (new 86)
        * org/biojava/utils/xml: XMLDispatcher.java (new), XMLPeerBuilder.java (new), XMLPeerFactory.java (new)

Latest commit

started to develop a mmcif parser

by Andreas.Prlic (Using name ‘andreas’) on 2008-04-28 07:27 (11 days ago)

BioJava group on LinkedIn

There is a BioJava group on LinkedIn:

Developers of the BioJava open-source bioinformatics project.

Not sure how to link to it though.

To join the BioJava linkedin group: You need to be a linkedin member. You then need to find the group and ask to join. I then get notified and asked to approve it, which I will if your name sounds vaguely familiar : ) You don’t need to be a contributor just a user or interested party. –Mark 14:39, 22 May 2008 (UTC)

Wiki edits for later

Clarify reference to LGPL.

Update references to “open source” with “free and open source”. Link to FOSS page on wikipedia?

DengueInfo link on BioJavaInside is broken.

BioJava in Anger is now on the wiki (so under FDL?) but has a separate vague Copyright section, see http://biojava.org/wiki/BioJava:CookBook#Copyright

This copyright section is a direct copy from the old BioJava in Anger page. This means the statement is outdated and can probably be removed. –Mark 14:39, 22 May 2008 (UTC)