public class EMBLFormat extends RichSequenceFormat.HeaderlessFormat
This format reads and writes both the Pre-87 and the 87+ versions of EMBL.
By default, it writes the most recent version. To write the earlier one,
specify the format by passing one of the constants defined in this class
to writeSequence(Sequence, String, Namespace).
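To make the difference between the two versions concrete, here is a minimal standalone sketch that classifies an ID line as Pre-87 or 87+. It is not BioJava code: the class name, the enum, and both regular expressions are hypothetical, and the ID line layouts are assumed from the published EMBL flat-file conventions rather than taken from the `lp` and `lpPre87` patterns of this class.

```java
import java.util.regex.Pattern;

/** Illustrative only: classifies an EMBL ID line by its layout. Not part of BioJava. */
final class EmblIdLineSketch {

    /** Hypothetical labels for the two ID line layouts. */
    enum Version { CURRENT, PRE87, UNKNOWN }

    // Assumed 87+ layout:
    // ID   accession; SV version; topology; molecule; class; division; length BP.
    private static final Pattern CURRENT_ID = Pattern.compile(
            "ID\\s+(\\S+);\\s+SV\\s+(\\d+);\\s+(linear|circular);\\s+([^;]+);"
            + "\\s+(\\S+);\\s+(\\S+);\\s+(\\d+)\\s+(BP|AA)\\.");

    // Assumed Pre-87 layout:
    // ID   name  class; molecule; division; length BP.
    private static final Pattern PRE87_ID = Pattern.compile(
            "ID\\s+(\\S+)\\s+(\\S+);\\s+([^;]+);\\s+(\\S+);\\s+(\\d+)\\s+(BP|AA)\\.");

    /** Returns which layout the given ID line matches, or UNKNOWN if neither. */
    static Version classify(String idLine) {
        String line = idLine.trim();
        if (CURRENT_ID.matcher(line).matches()) {
            return Version.CURRENT;
        }
        if (PRE87_ID.matcher(line).matches()) {
            return Version.PRE87;
        }
        return Version.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."));
        System.out.println(classify("ID   TRBG361    standard; RNA; PLN; 1859 BP."));
    }
}
```

The 87+ layout is tested first because it is the stricter of the two; a line that matches neither is reported as UNKNOWN rather than guessed.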
Modifier and Type | Class and Description |
---|---|
static class | EMBLFormat.Terms: Implements some EMBL-specific terms. |
Nested classes/interfaces inherited from interface RichSequenceFormat: RichSequenceFormat.BasicFormat, RichSequenceFormat.HeaderlessFormat
Modifier and Type | Field and Description |
---|---|
protected static String | ACCESSION_TAG |
protected static String | AUTHORS_TAG |
protected static String | COMMENT_TAG |
protected static String | CONSORTIUM_TAG |
protected static String | CONTIG_TAG |
protected static String | DATABASE_XREF_TAG |
protected static String | DATE_TAG |
protected static Pattern | dbxp |
protected static String | DEFINITION_TAG |
protected static String | DELIMITER_TAG |
protected static Pattern | dp |
static String | EMBL_FORMAT: The name of the current format |
static String | EMBL_PRE87_FORMAT: The name of the Pre-87 format |
protected static String | END_SEQUENCE_TAG |
protected static String | FEATURE_HEADER_TAG |
protected static String | FEATURE_TAG |
protected static Pattern | headerLine |
protected static String | KEYWORDS_TAG |
protected static String | LOCATOR_TAG |
protected static String | LOCUS_TAG |
protected static Pattern | lp |
protected static Pattern | lpPre87 |
protected static String | ORGANELLE_TAG |
protected static String | ORGANISM_TAG |
protected static Pattern | readableFileNames |
protected static String | REFERENCE_POSITION_TAG |
protected static String | REFERENCE_TAG |
protected static String | REFERENCE_XREF_TAG |
protected static String | REMARK_TAG |
protected static Pattern | rpp |
protected static String | SOURCE_TAG |
protected static String | START_SEQUENCE_TAG |
protected static String | TITLE_TAG |
protected static String | TPA_TAG |
protected static String | VERSION_TAG |
protected static Pattern | vp |
Constructor and Description |
---|
EMBLFormat() |
Modifier and Type | Method and Description |
---|---|
boolean | canRead(BufferedInputStream stream): Check to see if a given stream is in our format. |
boolean | canRead(File file): Check to see if a given file is in our format. |
String | getDefaultFormat(): getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation. |
SymbolTokenization | guessSymbolTokenization(BufferedInputStream stream): On the assumption that the stream is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. |
SymbolTokenization | guessSymbolTokenization(File file): On the assumption that the file is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. |
boolean | readRichSequence(BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rlistener, Namespace ns): Reads a sequence from the given buffered reader using the given tokenizer to parse sequence symbols. |
boolean | readSequence(BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener): Read a sequence and pass data on to a SeqIOListener. |
void | writeSequence(Sequence seq, Namespace ns): Writes a sequence out to the output stream given by beginWriting() using the default format of the implementing class. |
void | writeSequence(Sequence seq, PrintStream os): writeSequence writes a sequence to the specified PrintStream, using the default format. |
void | writeSequence(Sequence seq, String format, Namespace ns): As per writeSequence(Sequence, Namespace), except that it also takes a format parameter. |
void | writeSequence(Sequence seq, String format, PrintStream os): writeSequence writes a sequence to the specified PrintStream, using the specified format. |
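Methods inherited from class RichSequenceFormat.HeaderlessFormat: beginWriting, finishWriting
Methods inherited from class RichSequenceFormat.BasicFormat: getElideComments, getElideFeatures, getElideReferences, getElideSymbols, getLineWidth, getPrintStream, setElideComments, setElideFeatures, setElideReferences, setElideSymbols, setLineWidth, setPrintStream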
public static final String EMBL_PRE87_FORMAT
public static final String EMBL_FORMAT
protected static final String LOCUS_TAG
protected static final String ACCESSION_TAG
protected static final String VERSION_TAG
protected static final String DEFINITION_TAG
protected static final String DATE_TAG
protected static final String DATABASE_XREF_TAG
protected static final String SOURCE_TAG
protected static final String ORGANISM_TAG
protected static final String ORGANELLE_TAG
protected static final String REFERENCE_TAG
protected static final String REFERENCE_POSITION_TAG
protected static final String REFERENCE_XREF_TAG
protected static final String AUTHORS_TAG
protected static final String CONSORTIUM_TAG
protected static final String TITLE_TAG
protected static final String LOCATOR_TAG
protected static final String REMARK_TAG
protected static final String KEYWORDS_TAG
protected static final String COMMENT_TAG
protected static final String FEATURE_HEADER_TAG
protected static final String FEATURE_TAG
protected static final String CONTIG_TAG
protected static final String TPA_TAG
protected static final String START_SEQUENCE_TAG
protected static final String DELIMITER_TAG
protected static final String END_SEQUENCE_TAG
protected static final Pattern readableFileNames
protected static final Pattern headerLine
public EMBLFormat()
public boolean canRead(File file) throws IOException
canRead in interface RichSequenceFormat
canRead in class RichSequenceFormat.BasicFormat
Parameters:
file - the File to check.
Throws:
IOException - in case the file is inaccessible.

public SymbolTokenization guessSymbolTokenization(File file) throws IOException
guessSymbolTokenization in interface RichSequenceFormat
guessSymbolTokenization in class RichSequenceFormat.BasicFormat
Parameters:
file - the File object to guess the format of.
Returns:
the SymbolTokenization to read the file with.
Throws:
IOException - if the file is unrecognisable or inaccessible.

public boolean canRead(BufferedInputStream stream) throws IOException
Parameters:
stream - the BufferedInputStream to check.
Throws:
IOException - in case the stream is inaccessible.

public SymbolTokenization guessSymbolTokenization(BufferedInputStream stream) throws IOException
Parameters:
stream - the BufferedInputStream object to guess the format of.
Returns:
the SymbolTokenization to read the stream with.
Throws:
IOException - if the stream is unrecognisable or inaccessible.

public boolean readSequence(BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) throws IllegalSymbolException, IOException, ParseException
Parameters:
reader - the stream of data to parse.
symParser - a SymbolTokenization defining a mapping from character data to Symbols.
listener - a listener to notify when data is extracted from the stream.
Throws:
IllegalSymbolException - if it is not possible to translate character data from the stream into valid BioJava symbols.
IOException - if an error occurs while reading from the stream.
ParseException
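The stream variants of canRead and guessSymbolTokenization described above have to inspect the input without consuming it, so that a later read still starts at the first byte. The sketch below shows one way to do that with `mark` and `reset` on a `BufferedInputStream`. It is an illustration under stated assumptions, not the BioJava implementation: the class name, the `Alphabet` enum (standing in for a SymbolTokenization), the 2000-byte peek limit, and the "starts with `ID   `" and "contains DNA/RNA" heuristics are all hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: peeks at the first line of a stream without consuming it. */
final class EmblSniffSketch {

    /** Hypothetical labels; the real method returns a SymbolTokenization. */
    enum Alphabet { DNA, RNA, PROTEIN }

    // Assumed upper bound, in bytes, on the length of the first line.
    private static final int PEEK_LIMIT = 2000;

    /** Reads the first line, then rewinds the stream to where it started. */
    static String peekFirstLine(BufferedInputStream stream) throws IOException {
        stream.mark(PEEK_LIMIT);
        try {
            StringBuilder sb = new StringBuilder();
            int read = 0;
            int c;
            // Never read more than PEEK_LIMIT bytes, so reset() stays valid.
            while (read < PEEK_LIMIT && (c = stream.read()) != -1 && c != '\n') {
                sb.append((char) c);
                read++;
            }
            return sb.toString().trim();
        } finally {
            stream.reset();
        }
    }

    /** Assumed heuristic: an EMBL entry starts with an ID line. */
    static boolean canRead(BufferedInputStream stream) throws IOException {
        return peekFirstLine(stream).startsWith("ID   ");
    }

    /** Assumed heuristic: the molecule type on the ID line names the alphabet. */
    static Alphabet guessAlphabet(BufferedInputStream stream) throws IOException {
        String idLine = peekFirstLine(stream);
        if (idLine.contains("DNA")) {
            return Alphabet.DNA;
        }
        if (idLine.contains("RNA")) {
            return Alphabet.RNA;
        }
        return Alphabet.PROTEIN;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.\nXX\n"
                .getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(canRead(in));
        System.out.println(guessAlphabet(in));
    }
}
```

Because every peek ends with `reset()`, the two checks can be called one after the other on the same stream, which matches the documented contract that guessSymbolTokenization assumes readability without checking it.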
public boolean readRichSequence(BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rlistener, Namespace ns) throws IllegalSymbolException, IOException, ParseException
Parameters:
reader - the input source
symParser - the tokenizer which understands the sequence being read
rlistener - the listener to send sequence events to
ns - the namespace to read sequences into.
Throws:
IllegalSymbolException - if the tokenizer couldn't understand one of the sequence symbols in the file.
IOException - if there was a read error.
ParseException
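The read methods above follow an event-driven pattern: lines are pulled from a reader, and the extracted data is pushed to a listener. The following standalone sketch shows that pattern for a heavily simplified EMBL-like record. It is not the BioJava parser. The `Listener` and `Collecting` types are hypothetical stand-ins for RichSeqIOListener, the handling of two-letter line codes, `XX` spacer lines, the `SQ` header, and the `//` terminator is assumed from the EMBL flat-file layout, and treating the boolean result as "more data follows" is an assumption as well.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: reads one simplified EMBL-like record and reports events. */
final class EmblRecordSketch {

    /** Hypothetical stand-in for a sequence listener. */
    interface Listener {
        void tagLine(String tag, String value);
        void sequence(String symbols);
    }

    /** Simple listener that records what it was told. */
    static final class Collecting implements Listener {
        final List<String> tags = new ArrayList<>();
        String seq = "";
        public void tagLine(String tag, String value) { tags.add(tag); }
        public void sequence(String symbols) { seq = symbols; }
    }

    /** Reads up to the "//" terminator; returns true if more data follows (assumed meaning). */
    static boolean readRecord(BufferedReader reader, Listener listener) throws IOException {
        StringBuilder seq = new StringBuilder();
        boolean inSequence = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("//")) {
                break; // end of this record
            }
            if (inSequence) {
                // Keep the symbols, drop spacing and the trailing position counter.
                for (char ch : line.toCharArray()) {
                    if (Character.isLetter(ch)) {
                        seq.append(ch);
                    }
                }
            } else if (line.startsWith("SQ")) {
                inSequence = true; // sequence lines follow the SQ header
            } else if (line.length() >= 2 && !line.startsWith("XX")) {
                String value = line.length() > 5 ? line.substring(5).trim() : "";
                listener.tagLine(line.substring(0, 2), value);
            }
        }
        listener.sequence(seq.toString());
        // Skip trailing whitespace, then peek one character to see if data remains.
        while (true) {
            reader.mark(1);
            int c = reader.read();
            if (c == -1) {
                return false;
            }
            if (!Character.isWhitespace(c)) {
                reader.reset();
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "ID   X1; SV 1; linear; mRNA; STD; PLN; 8 BP.\nXX\nAC   X1;\n"
                + "SQ   Sequence 8 BP;\n     acgtacgt         8\n//\n";
        Collecting listener = new Collecting();
        boolean more = readRecord(new BufferedReader(new StringReader(data)), listener);
        System.out.println(listener.tags + " " + listener.seq + " more=" + more);
    }
}
```

A caller would typically loop while the method returns true, creating a fresh listener for each record.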
public void writeSequence(Sequence seq, PrintStream os) throws IOException
writeSequence writes a sequence to the specified PrintStream, using the default format.
Parameters:
seq - the sequence to write out.
os - the PrintStream to write to.
Throws:
IOException

public void writeSequence(Sequence seq, String format, PrintStream os) throws IOException
writeSequence writes a sequence to the specified PrintStream, using the specified format.
Parameters:
seq - a Sequence to write out.
format - a String indicating which sub-format of those available from a particular SequenceFormat implementation to use when writing.
os - a PrintStream object.
Throws:
IOException - if an error occurs.

public void writeSequence(Sequence seq, Namespace ns) throws IOException
Parameters:
seq - the sequence to write
ns - the namespace to write it with
Throws:
IOException - in case it couldn't write something

public void writeSequence(Sequence seq, String format, Namespace ns) throws IOException
As per writeSequence(Sequence, Namespace), except that it also takes a format parameter. This can be any of the formats defined as constants in this class.
Parameters:
seq - see writeSequence(Sequence, Namespace)
format - the format to use.
ns - see writeSequence(Sequence, Namespace)
Throws:
IOException - see writeSequence(Sequence, Namespace)
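The writeSequence overloads above come in pairs: one that takes a format name and one that falls back to the default. The standalone sketch below illustrates that delegation for the ID line only. It does not use BioJava: the class and method names are hypothetical, the two string values merely stand in for EMBL_FORMAT and EMBL_PRE87_FORMAT (their actual values are not given here), the ID line layouts are assumed from the EMBL flat-file conventions, and rejecting an unknown name with IllegalArgumentException is a choice made for this sketch.

```java
import java.io.PrintStream;

/** Illustrative only: picks an ID line layout from a format name. */
final class EmblWriteSketch {

    // Assumed placeholder values; in real code use the constants of EMBLFormat.
    static final String CURRENT = "EMBL";
    static final String PRE87 = "EMBL_PRE87";

    /** Builds the ID line in the requested layout. */
    static String idLine(String accession, int version, String molecule,
                         String division, int length, String format) {
        if (CURRENT.equals(format)) {
            return "ID   " + accession + "; SV " + version + "; linear; " + molecule
                    + "; STD; " + division + "; " + length + " BP.";
        }
        if (PRE87.equals(format)) {
            return "ID   " + accession + "  standard; " + molecule + "; "
                    + division + "; " + length + " BP.";
        }
        throw new IllegalArgumentException("Unknown format: " + format);
    }

    /** Writes the ID line in the requested layout. */
    static void writeIdLine(String accession, int version, String molecule,
                            String division, int length, String format, PrintStream os) {
        os.println(idLine(accession, version, molecule, division, length, format));
    }

    /** Default overload: delegates with the most recent layout. */
    static void writeIdLine(String accession, int version, String molecule,
                            String division, int length, PrintStream os) {
        writeIdLine(accession, version, molecule, division, length, CURRENT, os);
    }

    public static void main(String[] args) {
        writeIdLine("X56734", 1, "mRNA", "PLN", 1859, System.out);
        writeIdLine("X56734", 1, "mRNA", "PLN", 1859, PRE87, System.out);
    }
}
```

Keeping the layout decision in one method means the default overload cannot drift from the explicit one.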
public String getDefaultFormat()
getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation.
Returns:
a String.

Copyright © 2020 BioJava. All rights reserved.