Class EmblCDROMIndexReader
- java.lang.Object
  - org.biojava.bio.seq.db.emblcd.EmblCDROMIndexReader
- Direct Known Subclasses:
AcnumHitReader, AcnumTrgReader, DivisionLkpReader, EntryNamIdxReader
public abstract class EmblCDROMIndexReader extends Object
EmblCDROMIndexReader is an abstract class whose concrete subclasses read EMBL CD-ROM format indices from an underlying InputStream. This format is used by the EMBOSS package for database indexing (see the programs dbiblast, dbifasta, dbiflat and dbigcg). Indexing produces four binary files with a simple format:
- division.lkp : master index
- entrynam.idx : sequence ID index
- acnum.trg : accession number index
- acnum.hit : accession number auxiliary index
Internally, EMBOSS checks for big-endian architectures and switches the byte order to little-endian. This causes trouble if you try to read the files using DataInputStream, which always reads big-endian, but at least the binaries are consistent across architectures. This class carries out the necessary conversion.
The EMBL CD-ROM format stores the date in 4 bytes. The first byte is unused, which leaves one byte each for the day, the month and (!) the year.
For further information, see the EMBOSS documentation or, for a full description, the source code of the dbi programs and the Ajax library.
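The byte-order problem can be shown in a few lines. The sketch below is not BioJava code, and the class and method names are hypothetical. It decodes little-endian values with java.nio.ByteBuffer and, for comparison, reads the same four bytes with DataInputStream.readInt(), which interprets them as big-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helpers contrasting little-endian decoding with DataInputStream. */
final class LittleEndianDemo {
    private LittleEndianDemo() {}

    /** Decodes a 4-byte little-endian signed int starting at offset. */
    static int readIntLE(byte[] buf, int offset) {
        return ByteBuffer.wrap(buf, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Decodes a 2-byte little-endian value as an unsigned int (0..65535). */
    static int readUnsignedShortLE(byte[] buf, int offset) {
        return ByteBuffer.wrap(buf, offset, 2).order(ByteOrder.LITTLE_ENDIAN).getShort() & 0xFFFF;
    }

    /** What DataInputStream.readInt() yields for the same bytes: always big-endian. */
    static int readIntWithDataInputStream(byte[] buf) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf))) {
            return in.readInt();
        }
    }
}
```

For the bytes 2C 01 00 00, readIntLE returns 300, while readIntWithDataInputStream returns 738263040.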
- Since:
- 1.2
- Author:
- Keith James
Field Summary
- protected InputStream input
- protected org.biojava.bio.seq.db.emblcd.RecordParser recParser
- protected StringBuffer sb
Constructor Summary
- EmblCDROMIndexReader(InputStream input) : creates a new EmblCDROMIndexReader instance.
Method Summary
- void close() : closes the underlying InputStream.
- String readDBDate() : reads the date from the index header.
- String readDBName() : returns the database name from the index header.
- String readDBRelease() : returns the database release from the index header.
- long readFileLength() : returns the file length in bytes (stored within the file's header by the indexing program).
- byte[] readRawRecord() : returns the raw bytes of a single record from the index.
- abstract Object[] readRecord() : returns an array of objects parsed from a single record.
- long readRecordCount() : returns the number of records in the file.
- int readRecordLength() : returns the record length in bytes.
Field Detail
-
input
protected InputStream input
-
sb
protected StringBuffer sb
-
recParser
protected org.biojava.bio.seq.db.emblcd.RecordParser recParser
-
-
Constructor Detail
-
EmblCDROMIndexReader
public EmblCDROMIndexReader(InputStream input) throws IOException
Creates a new EmblCDROMIndexReader instance. A BufferedInputStream is probably the most suitable input.
- Parameters:
input - an InputStream.
- Throws:
IOException - if an error occurs.
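The recommendation to use a BufferedInputStream makes most sense if a reader pulls bytes from its stream a few at a time. That is an assumption about the implementation, not something this page confirms. This hypothetical demo counts the calls that reach a wrapped stream, once without a BufferedInputStream in between and once with one.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper that counts read calls reaching the stream it wraps. */
class CountingInputStream extends FilterInputStream {
    int calls;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

final class BufferingDemo {
    private BufferingDemo() {}

    /** Reads n single bytes and reports how many calls hit the underlying stream. */
    static int underlyingCalls(int n, boolean buffered) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(new byte[n]));
        try (InputStream in = buffered ? new BufferedInputStream(counter) : counter) {
            for (int i = 0; i < n; i++) {
                in.read();
            }
        }
        return counter.calls;
    }
}
```

Without buffering, every single-byte read reaches the source. With buffering, the source sees a few bulk reads instead.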
-
-
Method Detail
-
readFileLength
public long readFileLength()
readFileLength returns the file length in bytes (stored within the file's header by the indexing program). This method may be called more than once because the value is cached.
- Returns:
- a long.
-
readRecordCount
public long readRecordCount()
readRecordCount returns the number of records in the file. This method may be called more than once because the value is cached.
- Returns:
- a long.
-
readRecordLength
public int readRecordLength()
readRecordLength returns the record length in bytes. This method may be called more than once because the value is cached.
- Returns:
- an int.
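The three numeric header accessors return cached values. The constructor is declared to throw IOException while these accessors are not, so one plausible design is to parse the numeric fields once, up front. The sketch below assumes the stream starts with a 4-byte file length, a 4-byte record count and a 2-byte record length, all little-endian. That layout and the class name are illustrative assumptions and are not taken from BioJava or EMBOSS.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical reader that parses assumed header fields once and caches them. */
final class HeaderCacheSketch {
    private final long fileLength;    // cached at construction time
    private final long recordCount;
    private final int recordLength;

    HeaderCacheSketch(InputStream input) throws IOException {
        byte[] buf = new byte[10];    // assumed: 4 + 4 + 2 bytes of numeric fields
        // readFully only copies bytes (throwing EOFException if short); it does not
        // interpret byte order. The caller's stream is deliberately left open.
        new DataInputStream(input).readFully(buf);
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        fileLength = bb.getInt() & 0xFFFFFFFFL;   // treat as unsigned (assumption)
        recordCount = bb.getInt() & 0xFFFFFFFFL;
        recordLength = bb.getShort() & 0xFFFF;
    }

    // Repeated calls simply return the cached fields; the stream is not touched again.
    long readFileLength() { return fileLength; }

    long readRecordCount() { return recordCount; }

    int readRecordLength() { return recordLength; }
}
```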
-
readDBName
public String readDBName()
readDBName returns the database name from the index header. This method may be called more than once because the value is cached.
- Returns:
- a String.
-
readDBRelease
public String readDBRelease()
readDBRelease returns the database release from the index header. This method may be called more than once because the value is cached.
- Returns:
- a String.
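The database name and release are text stored in a binary header. A common way to decode such fields is to treat each one as a fixed-width run of ASCII bytes padded with NULs or spaces. This sketch assumes that convention; it is not confirmed for EMBOSS. The helper is hypothetical, and real field offsets and widths would have to come from the format itself.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical decoder for fixed-width, padded ASCII header fields. */
final class FixedWidthText {
    private FixedWidthText() {}

    /**
     * Decodes width bytes starting at offset, stopping at the first NUL
     * and trimming trailing space padding. Embedded spaces are kept.
     */
    static String decode(byte[] buf, int offset, int width) {
        int limit = offset + width;
        int end = offset;
        while (end < limit && buf[end] != 0) {
            end++;
        }
        while (end > offset && buf[end - 1] == ' ') {
            end--;
        }
        return new String(buf, offset, end - offset, StandardCharsets.US_ASCII);
    }
}
```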
-
readDBDate
public String readDBDate()
readDBDate reads the date from the index header. The date is stored in 4 bytes: byte 0 is unused, byte 1 holds the year, byte 2 the month and byte 3 the day. With a 1-byte year the value is of little use, and I'm not sure that the EMBOSS programs set it correctly anyway.
- Returns:
- a String.
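Decoding the date with the byte layout described for readDBDate is straightforward. The only subtlety is that Java bytes are signed, so each one is masked with 0xFF. The class name and the dd/mm/yy output format are assumptions, because this page does not say what string readDBDate returns.

```java
import java.util.Locale;

/** Hypothetical decoder for the 4-byte header date: unused, year, month, day. */
final class DbDateSketch {
    private DbDateSketch() {}

    static String decode(byte[] date) {
        if (date.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + date.length);
        }
        int year = date[1] & 0xFF;   // a single unsigned byte: 0..255, no century
        int month = date[2] & 0xFF;
        int day = date[3] & 0xFF;
        // Locale.ROOT keeps the digits ASCII regardless of the default locale
        return String.format(Locale.ROOT, "%02d/%02d/%02d", day, month, year);
    }
}
```

With this sketch, DbDateSketch.decode(new byte[] {0, 99, 12, 31}) returns "31/12/99".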
-
readRecord
public abstract Object[] readRecord() throws IOException
readRecord returns an array of objects parsed from a single record. The array's contents depend on the type of index file. Concrete subclasses must implement this method.
- Returns:
- an Object[] array.
- Throws:
IOException - if an error occurs.
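The split between an abstract readRecord() and a concrete readRawRecord() follows the template-method pattern: the base class knows how to fetch a record's bytes, and each subclass knows what those bytes mean. The sketch below is hypothetical. Its class names and its int-then-text record layout are invented for illustration and do not describe any real EMBL CD-ROM index file.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical base class mirroring the readRawRecord()/readRecord() split. */
abstract class IndexReaderSketch {
    protected final DataInputStream input;
    protected final int recordLength;

    IndexReaderSketch(InputStream in, int recordLength) {
        this.input = new DataInputStream(in);
        this.recordLength = recordLength;
    }

    /** Concrete: fetches one fixed-length record (readFully ignores byte order). */
    byte[] readRawRecord() throws IOException {
        byte[] rec = new byte[recordLength];
        input.readFully(rec);
        return rec;
    }

    /** Abstract: each index type decides what the bytes mean. */
    abstract Object[] readRecord() throws IOException;
}

/** Hypothetical layout: a 4-byte little-endian int, then padded ASCII text. */
class IntThenTextReaderSketch extends IndexReaderSketch {
    IntThenTextReaderSketch(InputStream in, int recordLength) {
        super(in, recordLength);
        if (recordLength < 4) {
            throw new IllegalArgumentException("record must hold at least the 4-byte int");
        }
    }

    @Override
    Object[] readRecord() throws IOException {
        byte[] raw = readRawRecord();
        int number = ByteBuffer.wrap(raw, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        // trim() strips leading and trailing chars <= ' ', which covers spaces and NULs
        String text = new String(raw, 4, raw.length - 4, StandardCharsets.US_ASCII).trim();
        return new Object[] {number, text};
    }
}
```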
-
readRawRecord
public byte[] readRawRecord() throws IOException
readRawRecord returns the raw bytes of a single record from the index.
- Returns:
- a byte[] array.
- Throws:
IOException - if an error occurs.
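Any implementation of readRawRecord has to handle one detail: InputStream.read(byte[], int, int) may return fewer bytes than requested, so a fixed-length record must be read in a loop. This hypothetical helper does that. It returns null at a clean end of file and throws EOFException on a truncated record. This page does not specify the end-of-file behavior, so that convention is an assumption.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical fixed-length record reader illustrating short-read handling. */
final class RawRecordSketch {
    private RawRecordSketch() {}

    static byte[] readRawRecord(InputStream in, int recordLength) throws IOException {
        byte[] rec = new byte[recordLength];
        int off = 0;
        while (off < recordLength) {
            // read() may deliver fewer bytes than asked for, so keep going until full
            int n = in.read(rec, off, recordLength - off);
            if (n < 0) {
                if (off == 0) {
                    return null;  // clean end of file: no partial record was started
                }
                throw new EOFException("record truncated after " + off + " bytes");
            }
            off += n;
        }
        return rec;
    }
}
```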
-
close
public void close() throws IOException
close closes the underlying InputStream.
- Throws:
IOException - if an error occurs.
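Because close() closes the underlying InputStream, callers can release the file by closing the reader alone. A try/finally block guarantees this happens even when reading fails, and it works whether or not a given reader class implements java.io.Closeable. The sketch uses a hypothetical stand-in reader class rather than the real one.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical minimal reader whose close() closes the underlying stream. */
class ClosingReaderSketch {
    private final InputStream input;

    ClosingReaderSketch(InputStream input) {
        this.input = input;
    }

    int readByte() throws IOException {
        return input.read();
    }

    void close() throws IOException {
        input.close();
    }

    /** Usage pattern: close in finally so the stream is released even if reading throws. */
    static int firstByte(InputStream raw) throws IOException {
        ClosingReaderSketch reader = new ClosingReaderSketch(raw);
        try {
            return reader.readByte();
        } finally {
            reader.close();
        }
    }
}
```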
-
-