Class GenbankReader<S extends AbstractSequence<C>,C extends Compound>
java.lang.Object
org.biojava.nbio.core.sequence.io.GenbankReader<S,C>
- Type Parameters:
S
- the sequence type
C
- the compound type
Use GenbankReaderHelper as an example of how to use this class. GenbankReaderHelper should be the primary class used to read Genbank files.
-
Constructor Summary
Constructor
Description
GenbankReader(File file, SequenceHeaderParserInterface<S, C> headerParser, SequenceCreatorInterface<C> sequenceCreator)
If you are going to use the FileProxyProteinSequenceCreator then you need to use this constructor because we need details about the location of the file.
GenbankReader(InputStream is, SequenceHeaderParserInterface<S, C> headerParser, SequenceCreatorInterface<C> sequenceCreator)
If you are going to use FileProxyProteinSequenceCreator then do not use this constructor because we need details about local file offsets for quick reads.
-
Method Summary
Modifier and Type
Method
Description
void
close()
boolean
isClosed()
process()
The parsing is done in this method. This method returns all the available Genbank records in the File or InputStream, closes the underlying resource, and returns the results in a LinkedHashMap. You don't need to call close() after calling this method.
process(int max)
This method tries to parse at most max records from the open File or InputStream, and leaves the underlying resource open. Subsequent calls to the same method continue parsing the rest of the file. This is particularly useful when dealing with very big data files.
-
Constructor Details
-
GenbankReader
public GenbankReader(InputStream is, SequenceHeaderParserInterface<S, C> headerParser, SequenceCreatorInterface<C> sequenceCreator)
If you are going to use FileProxyProteinSequenceCreator then do not use this constructor because we need details about local file offsets for quick reads. An InputStream does not give you the name of the stream to access quickly via file seek. A seek in an InputStream is forced to read all the data so you don't gain anything.
- Parameters:
is
-
headerParser
-
sequenceCreator
-
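The difference between seeking in a file and seeking in a stream can be sketched with the standard library alone. The example below is an illustration, not BioJava code: `OffsetReadSketch`, `writeTemp`, `readFromFile`, and `readFromStream` are hypothetical names, and the sketch assumes ASCII content. `RandomAccessFile.seek` jumps straight to a byte offset, whereas a plain `InputStream` has to consume every byte that precedes it.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only; none of these names belong to BioJava.
final class OffsetReadSketch {

    /** Writes text to a temporary file so the demo has a real file to seek in. */
    static Path writeTemp(String text) {
        try {
            Path p = Files.createTempFile("offset-sketch", ".txt");
            p.toFile().deleteOnExit();
            Files.write(p, text.getBytes(StandardCharsets.US_ASCII));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads length bytes at offset by seeking; possible only with a real file. */
    static String readFromFile(Path file, long offset, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset); // jumps to the offset without reading earlier bytes
            byte[] buf = new byte[length];
            raf.readFully(buf);
            return new String(buf, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Same read from a stream: every byte before offset is consumed first. */
    static String readFromStream(InputStream in, long offset, int length) {
        try {
            for (long i = 0; i < offset; i++) {
                if (in.read() < 0) {
                    throw new EOFException("offset is past the end of the stream");
                }
            }
            byte[] buf = new byte[length];
            int filled = 0;
            while (filled < length) {
                int n = in.read(buf, filled, length - filled);
                if (n < 0) {
                    throw new EOFException("stream ended before length bytes were read");
                }
                filled += n;
            }
            return new String(buf, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "LOCUS A\n//\nLOCUS B\n//\n";
        Path file = writeTemp(text);
        int offset = text.indexOf("LOCUS B");
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println(readFromFile(file, offset, 7));
        System.out.println(readFromStream(new ByteArrayInputStream(bytes), offset, 7));
    }
}
```

Both calls return the same text, but only the file-based read skips the preceding data, which is why a file-proxy sequence creator needs the File constructor.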
-
GenbankReader
public GenbankReader(File file, SequenceHeaderParserInterface<S, C> headerParser, SequenceCreatorInterface<C> sequenceCreator) throws FileNotFoundException
If you are going to use the FileProxyProteinSequenceCreator then you need to use this constructor because we need details about the location of the file.
- Parameters:
file
-
headerParser
-
sequenceCreator
-
- Throws:
FileNotFoundException
- if the file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading.
SecurityException
- if a security manager exists and its checkRead method denies read access to the file.
-
-
Method Details
-
isClosed
-
process
The parsing is done in this method.
This method returns all the available Genbank records in the File or InputStream, closes the underlying resource, and returns the results in a LinkedHashMap.
You don't need to call close() after calling this method.
- Returns:
HashMap containing all the parsed Genbank records present, starting from the current fileIndex onwards.
- Throws:
IOException
CompoundNotFoundException
OutOfMemoryError
- if the input resource is larger than the allocated heap.
- See Also:
-
process
This method tries to parse at most max records from the open File or InputStream, and leaves the underlying resource open.
Subsequent calls to the same method continue parsing the rest of the file.
This is particularly useful when dealing with very big data files (e.g. NCBI nr database), which can't fit into memory and would take a long time before the first result is available.
N.B.
- This method can't be called after calling its NO-ARGUMENT twin.
- Remember to close the underlying resource when you are done.
- Parameters:
max
- maximum number of records to return.
- Returns:
HashMap containing at most max parsed Genbank records present, starting from the current fileIndex onwards.
- Throws:
IOException
CompoundNotFoundException
- Since:
- 3.0.6
- See Also:
-
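The batched contract described for process(int max) can be sketched with a small stand-in class. This is not the BioJava implementation: `BatchedRecordReaderSketch` is a hypothetical name, records are kept as raw text rather than parsed sequences, the map key is assumed to be the second token of the LOCUS line, and `//` is treated as the record terminator. The sketch shows only the calling pattern: each call returns at most max records, the source stays open between calls, and the no-argument variant closes it.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;

// Hypothetical stand-in for illustration; NOT the BioJava GenbankReader.
final class BatchedRecordReaderSketch implements Closeable {
    private final BufferedReader in;
    private boolean closed;

    BatchedRecordReaderSketch(Reader source) {
        this.in = new BufferedReader(source);
    }

    /** Reads at most max records and leaves the source open. */
    LinkedHashMap<String, String> process(int max) {
        if (closed) {
            throw new IllegalStateException("reader already closed");
        }
        LinkedHashMap<String, String> batch = new LinkedHashMap<>();
        StringBuilder body = new StringBuilder();
        String name = null;
        try {
            String line;
            while (batch.size() < max && (line = in.readLine()) != null) {
                if (line.startsWith("LOCUS")) {
                    String[] parts = line.trim().split("\\s+");
                    name = parts.length > 1 ? parts[1] : null; // assumed key
                }
                if (line.trim().equals("//")) {
                    // End of record: store it and reset for the next one.
                    if (name != null) {
                        batch.put(name, body.toString());
                    }
                    body.setLength(0);
                    name = null;
                } else {
                    body.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (name != null) {
            batch.put(name, body.toString()); // last record lacked a terminator
        }
        return batch;
    }

    /** Reads every remaining record, then closes the source. */
    LinkedHashMap<String, String> process() {
        LinkedHashMap<String, String> all = process(Integer.MAX_VALUE);
        close();
        return all;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        try {
            in.close();
            closed = true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "LOCUS A\nORIGIN\n//\nLOCUS B\nORIGIN\n//\nLOCUS C\nORIGIN\n//\n";
        BatchedRecordReaderSketch reader =
                new BatchedRecordReaderSketch(new StringReader(data));
        try {
            LinkedHashMap<String, String> batch;
            // Keep asking for small batches until nothing is left.
            while (!(batch = reader.process(2)).isEmpty()) {
                System.out.println(batch.keySet()); // [A, B] then [C]
            }
        } finally {
            reader.close(); // the batched variant never closes on its own
        }
    }
}
```

Looping until an empty map comes back keeps memory use bounded by the batch size, at the cost of the caller being responsible for closing the reader.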
close
-