Package org.biojava.bio.seq.db.biosql

General purpose Sequence storage in a relational database.

This package is deprecated

This package has been superseded by the package org.biojavax.bio.db.biosql, which uses Hibernate to persist objects to BioSQL and to maintain transaction control. We strongly recommend that you use that package instead.

Introduction

BioSQL is a general-purpose relational database schema for the storage of biological sequence data and annotation. It evolved from the bioperl-db system.

Using BioSQL

To use BioSQL, you will need:

  • A DBMS server (currently, PostgreSQL and MySQL are supported)
  • A JDBC driver for connecting to that database (if in doubt, contact your database vendor)
  • A BioSQL schema file suitable for your database. Schema files are distributed by the BioSQL project.
You will need to create a new database and all the tables specified in the schema file. For example (for PostgreSQL users):
    createdb thomasd_biosql
    psql thomasd_biosql -f biosqldb-schema-pg.sql
When accessing the database from Java programs, you will need to:
  • Add the JDBC driver .jar file to your CLASSPATH
  • Set the jdbc.drivers system property to the class name of the driver (if in doubt, contact your database vendor).
For example:
    export CLASSPATH=biojava.jar:xerces.jar:bytecode.jar:pgjdbc2.jar
    java -Djdbc.drivers=org.postgresql.Driver demos.MyProgram
You should now be able to connect to the database by simply constructing a new BioSQLSequenceDB object, passing your database connection details to the constructor.
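If the connection fails because no suitable driver is found, a common cause is that a class named in jdbc.drivers is not on the CLASSPATH. The following is a minimal sketch of a check for that case, using only the standard Java library. The class name JdbcDriverCheck and the method missingDrivers are hypothetical helpers written for this illustration and are not part of BioJava. The sketch assumes that jdbc.drivers holds a colon-separated list of class names, which is the format java.sql.DriverManager reads.

```java
import java.util.ArrayList;
import java.util.List;

public class JdbcDriverCheck {

    /**
     * Hypothetical helper (not part of BioJava): returns the class names
     * from a colon-separated jdbc.drivers value that cannot be loaded
     * from the current classpath.
     */
    static List<String> missingDrivers(String driversProperty) {
        List<String> missing = new ArrayList<>();
        if (driversProperty == null || driversProperty.trim().isEmpty()) {
            return missing;
        }
        for (String name : driversProperty.split(":")) {
            String className = name.trim();
            if (className.isEmpty()) {
                continue;
            }
            try {
                // Loading the class is enough to show it is on the classpath.
                Class.forName(className);
            } catch (ClassNotFoundException e) {
                missing.add(className);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String prop = System.getProperty("jdbc.drivers");
        if (prop == null) {
            System.err.println("jdbc.drivers is not set");
            return;
        }
        for (String name : missingDrivers(prop)) {
            System.err.println("Driver class not found on CLASSPATH: " + name);
        }
    }
}
```

Run it with the same CLASSPATH and -Djdbc.drivers setting as your own program. If it prints nothing, every listed driver class could be loaded.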

Each physical BioSQL database may contain multiple namespaces (sometimes called biodatabases). In BioJava, each SequenceDB only reflects a single namespace.

Working with BioSQL sequences

The BioJava-BioSQL objects are transparently persistent. This means that you don't need to do anything special to write data back to the database, and that any changes you make to BioSQL sequences will be immediately reflected in the database. If you don't want this to happen, consider using a ViewSequence.

It is possible to completely remove sequences (and all their annotation) from the database. However, an exception will be thrown if any references to that sequence still exist. The following code will fail:

    SequenceDB seqDB = new BioSQLSequenceDB(...);
    Sequence seq = seqDB.getSequence("AL121903");
    // do things with sequence
    // fails: the variable seq still references the sequence
    seqDB.removeSequence("AL121903");
If, however, the variable seq is set to null before calling removeSequence, the call will succeed.

Limitations

In general, the behaviour of BioSQL sequences and features is very similar to that of the standard in-memory interfaces. However, the current version has a few limitations:

  • Only Feature and StrandedFeature are currently supported. Other sub-interfaces of Feature are silently converted to one of these basic types.
  • Objects and binary data stored in Annotation bundles of sequences and features may be lost -- only Strings and Collections of Strings are safe (this may be fixed in the future).
  • Currently, only the MySQL and PostgreSQL databases are supported. Porting to other databases should, however, be quite easy.
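
Because of the Annotation limitation above, it can be useful to check values before storing them. The following is a minimal sketch of such a check, using only the standard Java library. The class AnnotationValueCheck and the method isSafeForBioSQL are hypothetical names chosen for this illustration and are not part of BioJava. The sketch simply encodes the rule stated above: a value is treated as safe if it is a String or a Collection containing only Strings.

```java
import java.util.Arrays;
import java.util.Collection;

public class AnnotationValueCheck {

    /**
     * Hypothetical helper (not part of BioJava): returns true if the value
     * is a String, or a Collection whose elements are all Strings.
     */
    static boolean isSafeForBioSQL(Object value) {
        if (value instanceof String) {
            return true;
        }
        if (value instanceof Collection) {
            for (Object element : (Collection<?>) value) {
                if (!(element instanceof String)) {
                    return false;
                }
            }
            return true;
        }
        // Anything else (including null) may be lost when persisted.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSafeForBioSQL("Homo sapiens"));
        System.out.println(isSafeForBioSQL(Arrays.asList("kinase", "membrane")));
        System.out.println(isSafeForBioSQL(Integer.valueOf(42)));
    }
}
```

A value that fails the check could be converted to a String before it is added to the Annotation, at the cost of losing its original type.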