Package org.biojava.bio.seq.io
Class SeqIOTools
- java.lang.Object
  - org.biojava.bio.seq.io.SeqIOTools
public final class SeqIOTools extends Object
Deprecated. Use org.biojavax.bio.seq.RichSequence.IOTools instead.
A set of convenience methods for handling common file formats.
- Since:
- 1.1
- Author:
- Thomas Down, Mark Schreiber, Nimesh Singh, Matthew Pocock, Keith James
Method Summary
Modifier and Type | Method | Description
static void | biojavaToFile(int fileType, OutputStream os, Object biojava) | Deprecated. Converts a Biojava object to the given file type.
static void | biojavaToFile(String formatName, String alphabetName, OutputStream os, Object biojava) | Deprecated. Writes a Biojava SequenceIterator, SequenceDB, Sequence or Alignment to an OutputStream.
static Object | fileToBiojava(int fileType, BufferedReader br) | Deprecated. Reads a file and returns the corresponding Biojava object.
static Object | fileToBiojava(String formatName, String alphabetName, BufferedReader br) | Deprecated. Reads a file with the specified format and alphabet.
static SequenceBuilderFactory | formatToFactory(SequenceFormat format, Alphabet alpha) | Deprecated. This essentially duplicates the operation available in the method identifyBuilderFactory.
static FiniteAlphabet | getAlphabet(int identifier) | Deprecated. getAlphabet accepts a value which represents a sequence format and returns the relevant FiniteAlphabet object.
static SequenceBuilderFactory | getBuilderFactory(int identifier) | Deprecated. getBuilderFactory accepts a value which represents a sequence format and returns the relevant SequenceBuilderFactory object.
static SequenceBuilderFactory | getEmblBuilderFactory() | Deprecated. Get a default SequenceBuilderFactory for handling EMBL files.
static SequenceBuilderFactory | getFastaBuilderFactory() | Deprecated. Get a default SequenceBuilderFactory for handling FASTA files.
static SequenceBuilderFactory | getGenbankBuilderFactory() | Deprecated. Get a default SequenceBuilderFactory for handling GenBank files.
static SequenceBuilderFactory | getGenpeptBuilderFactory() | Deprecated. Get a default SequenceBuilderFactory for handling GenPept files.
static SequenceFormat | getSequenceFormat(int identifier) | Deprecated. getSequenceFormat accepts a value which represents a sequence format and returns the relevant SequenceFormat object.
static SequenceBuilderFactory | getSwissprotBuilderFactory() | Deprecated. Get a default SequenceBuilderFactory for handling SwissProt files.
static int | guessFileType(File seqFile) | Deprecated. There is no standard file naming convention, and guessing the type from a file name is inherently error-prone.
static int | identifyFormat(String formatName, String alphabetName) | Deprecated. identifyFormat performs a case-insensitive mapping of a common sequence format name (such as 'embl', 'genbank' or 'fasta') and an alphabet name (such as 'dna', 'rna', 'protein' or 'aa') to an integer.
static SequenceIterator | readEmbl(BufferedReader br) | Deprecated. Iterate over the sequences in an EMBL-format stream.
static SequenceIterator | readEmblNucleotide(BufferedReader br) | Deprecated. Iterate over the sequences in an EMBL-format stream.
static SequenceIterator | readEmblRNA(BufferedReader br) | Deprecated. Iterate over the sequences in an EMBL-format stream of RNA sequences.
static SequenceIterator | readFasta(BufferedReader br, SymbolTokenization sTok) | Deprecated. Read a FASTA file.
static SequenceIterator | readFasta(BufferedReader br, SymbolTokenization sTok, SequenceBuilderFactory seqFactory) | Deprecated. Read a FASTA file using a custom type of SymbolList.
static SequenceDB | readFasta(InputStream seqFile, Alphabet alpha) | Deprecated. Create a sequence database from a FASTA file provided as an input stream.
static SequenceIterator | readFastaDNA(BufferedReader br) | Deprecated. Iterate over the sequences in a FASTA-format stream of DNA sequences.
static SequenceIterator | readFastaProtein(BufferedReader br) | Deprecated. Iterate over the sequences in a FASTA-format stream of protein sequences.
static SequenceIterator | readFastaRNA(BufferedReader br) | Deprecated. Iterate over the sequences in a FASTA-format stream of RNA sequences.
static SequenceIterator | readGenbank(BufferedReader br) | Deprecated. Iterate over the sequences in a GenBank-format stream.
static SequenceIterator | readGenbankXml(BufferedReader br) | Deprecated. Iterate over the sequences in a GenBank XML-format stream.
static SequenceIterator | readGenpept(BufferedReader br) | Deprecated. Iterate over the sequences in a GenPept-format stream.
static SequenceIterator | readSwissprot(BufferedReader br) | Deprecated. Iterate over the sequences in a SwissProt-format stream.
static void | writeEmbl(OutputStream os, Sequence seq) | Deprecated. Writes a single Sequence to an OutputStream in EMBL format.
static void | writeEmbl(OutputStream os, SequenceIterator in) | Deprecated. Writes a stream of Sequences to an OutputStream in EMBL format.
static void | writeFasta(OutputStream os, SequenceDB db) | Deprecated. Write a SequenceDB to an output stream in FASTA format.
static void | writeFasta(OutputStream os, Sequence seq) | Deprecated. Writes a single Sequence to an OutputStream in FASTA format.
static void | writeFasta(OutputStream os, SequenceIterator in) | Deprecated. Writes sequences from a SequenceIterator to an OutputStream in FASTA format.
static void | writeGenbank(OutputStream os, Sequence seq) | Deprecated. Writes a single Sequence to an OutputStream in GenBank format.
static void | writeGenbank(OutputStream os, SequenceIterator in) | Deprecated. Writes a stream of Sequences to an OutputStream in GenBank format.
static void | writeGenpept(OutputStream os, Sequence seq) | Deprecated. Writes a single Sequence to an OutputStream in GenPept format.
static void | writeGenpept(OutputStream os, SequenceIterator in) | Deprecated. Writes a stream of Sequences to an OutputStream in GenPept format.
static void | writeSwissprot(OutputStream os, Sequence seq) | Deprecated. Writes a single Sequence to an OutputStream in SwissProt format.
static void | writeSwissprot(OutputStream os, SequenceIterator in) | Deprecated. Writes a stream of Sequences to an OutputStream in SwissProt format.
Method Detail
-
getEmblBuilderFactory
public static SequenceBuilderFactory getEmblBuilderFactory()
Deprecated.
Get a default SequenceBuilderFactory for handling EMBL files.
- Returns:
- a SmartSequenceBuilder.FACTORY
-
readEmbl
public static SequenceIterator readEmbl(BufferedReader br)
Deprecated.
Iterate over the sequences in an EMBL-format stream.
- Parameters:
- br - a reader for the EMBL source or file
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
readEmblRNA
public static SequenceIterator readEmblRNA(BufferedReader br)
Deprecated.
Iterate over the sequences in an EMBL-format stream of RNA sequences.
- Parameters:
- br - a reader for the EMBL source or file
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
readEmblNucleotide
public static SequenceIterator readEmblNucleotide(BufferedReader br)
Deprecated.
Iterate over the sequences in an EMBL-format stream.
- Parameters:
- br - a reader for the EMBL source or file
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
getGenbankBuilderFactory
public static SequenceBuilderFactory getGenbankBuilderFactory()
Deprecated.
Get a default SequenceBuilderFactory for handling GenBank files.
- Returns:
- a SmartSequenceBuilder.FACTORY
-
readGenbank
public static SequenceIterator readGenbank(BufferedReader br)
Deprecated.
Iterate over the sequences in a GenBank-format stream.
- Parameters:
- br - a reader for the GenBank source or file
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
readGenbankXml
public static SequenceIterator readGenbankXml(BufferedReader br)
Deprecated.
Iterate over the sequences in a GenBank XML-format stream.
- Parameters:
- br - a reader for the GenBank XML source or file
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
getGenpeptBuilderFactory
public static SequenceBuilderFactory getGenpeptBuilderFactory()
Deprecated.
Get a default SequenceBuilderFactory for handling GenPept files.
- Returns:
- a SmartSequenceBuilder.FACTORY
-
readGenpept
public static SequenceIterator readGenpept(BufferedReader br)
Deprecated.
Iterate over the sequences in a GenPept-format stream.
- Parameters:
- br - a reader for the GenPept source or file
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
getSwissprotBuilderFactory
public static SequenceBuilderFactory getSwissprotBuilderFactory()
Deprecated.
Get a default SequenceBuilderFactory for handling SwissProt files.
- Returns:
- a SmartSequenceBuilder.FACTORY
-
readSwissprot
public static SequenceIterator readSwissprot(BufferedReader br)
Deprecated.
Iterate over the sequences in a SwissProt-format stream.
- Parameters:
- br - a reader for the SwissProt source or file
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
getFastaBuilderFactory
public static SequenceBuilderFactory getFastaBuilderFactory()
Deprecated.
Get a default SequenceBuilderFactory for handling FASTA files.
- Returns:
- a SmartSequenceBuilder.FACTORY
-
readFasta
public static SequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok)
Deprecated.
Read a FASTA file.
- Parameters:
- br - the BufferedReader to read data from
- sTok - a SymbolTokenization that understands the sequences
- Returns:
- a SequenceIterator over each sequence in the FASTA file
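To illustrate what iterating over a FASTA stream involves, here is a minimal plain-Java sketch. It is not BioJava code: FastaRecord and FastaIteratorSketch are hypothetical classes, residues are kept as raw strings, and no SymbolTokenization-style validation is done. Because it reads one record at a time, memory use is bounded by the largest single record rather than the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a parsed FASTA record; not a BioJava class. */
final class FastaRecord {
    final String name;     // header text after '>'
    final String residues; // sequence lines concatenated, each line trimmed

    FastaRecord(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }
}

/** Streams FASTA records one at a time, in the spirit of a SequenceIterator. */
final class FastaIteratorSketch implements Iterator<FastaRecord> {
    private final BufferedReader in;
    private String pendingHeader; // next '>' line already read, or null at end of input

    FastaIteratorSketch(BufferedReader in) {
        this.in = in;
        this.pendingHeader = readUntilHeader(null); // skip anything before the first header
    }

    /** Convenience for demos: wraps a string in a BufferedReader. */
    static FastaIteratorSketch fromText(String text) {
        return new FastaIteratorSketch(new BufferedReader(new StringReader(text)));
    }

    // Reads lines up to the next header, appending non-header lines to residues
    // when it is non-null. Returns the header line found, or null at end of input.
    private String readUntilHeader(StringBuilder residues) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line;
                }
                if (residues != null) {
                    residues.append(line.trim());
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    @Override
    public FastaRecord next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String name = pendingHeader.substring(1).trim();
        StringBuilder residues = new StringBuilder();
        pendingHeader = readUntilHeader(residues); // also remembers the following header
        return new FastaRecord(name, residues.toString());
    }
}
```

The read-ahead of the next header line is what lets the iterator answer hasNext() without consuming a whole record in advance.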
-
readFasta
public static SequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, SequenceBuilderFactory seqFactory)
Deprecated.
Read a FASTA file using a custom type of SymbolList. For example, use SmartSequenceBuilder.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization), and SmartSequenceBuilder.BIT_PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from
- sTok - a SymbolTokenization that understands the sequences
- seqFactory - a factory used to build a SymbolList
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
readFastaDNA
public static SequenceIterator readFastaDNA(BufferedReader br)
Deprecated.
Iterate over the sequences in a FASTA-format stream of DNA sequences.
- Parameters:
- br - the BufferedReader to read data from
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
readFastaRNA
public static SequenceIterator readFastaRNA(BufferedReader br)
Deprecated.
Iterate over the sequences in a FASTA-format stream of RNA sequences.
- Parameters:
- br - the BufferedReader to read data from
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
readFastaProtein
public static SequenceIterator readFastaProtein(BufferedReader br)
Deprecated.
Iterate over the sequences in a FASTA-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from
- Returns:
- a SequenceIterator that iterates over each Sequence in the file
-
readFasta
public static SequenceDB readFasta(InputStream seqFile, Alphabet alpha) throws BioException
Deprecated.
Create a sequence database from a FASTA file provided as an input stream. Note that this somewhat duplicates the functionality of the readFastaDNA and readFastaProtein methods, but it reads from a stream rather than a reader and returns a SequenceDB rather than a SequenceIterator. If the returned database is likely to be large, use those iterator-based methods instead, since an iterator does not need to hold every sequence in memory at once.
- Parameters:
- seqFile - the stream containing the FASTA-formatted sequences
- alpha - the Alphabet of the sequences, i.e. DNA, RNA, etc.
- Returns:
- a SequenceDB containing all the Sequences in the file.
- Throws:
- BioException - if problems occur while reading the stream.
- Since:
- 1.2
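For contrast with the iterator-based readers, the following minimal plain-Java sketch loads every record eagerly. EagerFastaLoader is a hypothetical class, and a LinkedHashMap from header text to residues stands in for a SequenceDB; it is not the BioJava implementation, only an illustration of why the whole file ends up in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical eager loader: every record is held in memory, like a SequenceDB. */
final class EagerFastaLoader {
    /**
     * Reads all records from the stream into a map from header text to residues,
     * preserving file order. The caller remains responsible for closing the stream.
     */
    static Map<String, String> loadAll(InputStream in) {
        Map<String, String> db = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String name = null;
        StringBuilder residues = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (name != null) {
                        db.put(name, residues.toString()); // flush the previous record
                    }
                    name = line.substring(1).trim();
                    residues.setLength(0);
                } else if (name != null) {
                    residues.append(line.trim()); // text before the first header is ignored
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (name != null) {
            db.put(name, residues.toString()); // flush the final record
        }
        return db;
    }
}
```

In this sketch a later record with the same header replaces an earlier one.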
-
writeFasta
public static void writeFasta(OutputStream os, SequenceDB db) throws IOException
Deprecated.
Write a SequenceDB to an output stream in FASTA format.
- Parameters:
- os - the stream to write the FASTA-formatted data to.
- db - the database of Sequences to write
- Throws:
- IOException - if there was an error while writing.
- Since:
- 1.2
-
writeFasta
public static void writeFasta(OutputStream os, SequenceIterator in) throws IOException
Deprecated.
Writes sequences from a SequenceIterator to an OutputStream in FASTA format. This makes for a useful format filter, where the sequences from a StreamReader can be sent to a StreamWriter after formatting.
- Parameters:
- os - the stream to write FASTA-formatted data to
- in - the source of input Sequences
- Throws:
- IOException - if there was an error while writing.
- Since:
- 1.2
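The filter idea can be sketched in plain Java as follows. FastaWriterSketch is hypothetical, records are (name, residues) string pairs rather than BioJava Sequence objects, and the line width is a caller-supplied parameter; a width of 60 is a common choice, but that is an assumption here, not a statement about BioJava's FASTA writer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical FASTA writer; not the BioJava StreamWriter. */
final class FastaWriterSketch {
    /** Formats one record: a '>' header line, then residues wrapped at lineWidth. */
    static String format(String name, String residues, int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        StringBuilder out = new StringBuilder();
        out.append('>').append(name).append('\n');
        for (int start = 0; start < residues.length(); start += lineWidth) {
            int end = Math.min(start + lineWidth, residues.length());
            out.append(residues, start, end).append('\n');
        }
        return out.toString();
    }

    /** Writes records one at a time, so only the current record is formatted in memory. */
    static void writeAll(OutputStream os, Iterable<Map.Entry<String, String>> records, int lineWidth)
            throws IOException {
        for (Map.Entry<String, String> record : records) {
            os.write(format(record.getKey(), record.getValue(), lineWidth).getBytes(StandardCharsets.UTF_8));
        }
        os.flush();
    }
}
```

For example, FastaWriterSketch.writeAll(System.out, map.entrySet(), 60) would print every entry of a Map<String, String> as FASTA.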
-
writeFasta
public static void writeFasta(OutputStream os, Sequence seq) throws IOException
Deprecated.
Writes a single Sequence to an OutputStream in FASTA format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- Throws:
- IOException - if there was an error while writing.
-
writeEmbl
public static void writeEmbl(OutputStream os, SequenceIterator in) throws IOException
Deprecated.
Writes a stream of Sequences to an OutputStream in EMBL format.
- Parameters:
- os - the OutputStream.
- in - a SequenceIterator.
- Throws:
- IOException - if there was an error while writing.
-
writeEmbl
public static void writeEmbl(OutputStream os, Sequence seq) throws IOException
Deprecated.
Writes a single Sequence to an OutputStream in EMBL format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- Throws:
- IOException - if there was an error while writing.
-
writeSwissprot
public static void writeSwissprot(OutputStream os, SequenceIterator in) throws IOException, BioException
Deprecated.
Writes a stream of Sequences to an OutputStream in SwissProt format.
- Parameters:
- os - the OutputStream.
- in - a SequenceIterator.
- Throws:
- BioException - if the Sequence cannot be converted to SwissProt format
- IOException - if there was an error while writing.
-
writeSwissprot
public static void writeSwissprot(OutputStream os, Sequence seq) throws IOException, BioException
Deprecated.
Writes a single Sequence to an OutputStream in SwissProt format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- Throws:
- BioException - if the Sequence cannot be written to SwissProt format
- IOException - if there was an error while writing.
-
writeGenpept
public static void writeGenpept(OutputStream os, SequenceIterator in) throws IOException, BioException
Deprecated.
Writes a stream of Sequences to an OutputStream in GenPept format.
- Parameters:
- os - the OutputStream.
- in - a SequenceIterator.
- Throws:
- BioException - if the Sequence cannot be written to GenPept format
- IOException - if there was an error while writing.
-
writeGenpept
public static void writeGenpept(OutputStream os, Sequence seq) throws IOException, BioException
Deprecated.
Writes a single Sequence to an OutputStream in GenPept format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- Throws:
- BioException - if the Sequence cannot be written to GenPept format
- IOException - if there was an error while writing.
-
writeGenbank
public static void writeGenbank(OutputStream os, SequenceIterator in) throws IOException
Deprecated.
Writes a stream of Sequences to an OutputStream in GenBank format.
- Parameters:
- os - the OutputStream.
- in - a SequenceIterator.
- Throws:
- IOException - if there was an error while writing.
-
writeGenbank
public static void writeGenbank(OutputStream os, Sequence seq) throws IOException
Deprecated.
Writes a single Sequence to an OutputStream in GenBank format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- Throws:
- IOException - if there was an error while writing.
-
identifyFormat
public static int identifyFormat(String formatName, String alphabetName)
Deprecated.
identifyFormat performs a case-insensitive mapping of a pair consisting of a common sequence format name (such as 'embl', 'genbank' or 'fasta') and an alphabet name (such as 'dna', 'rna', 'protein' or 'aa') to an integer. The value returned will be one of the public static final fields in SeqIOConstants, or a bitwise-or combination of them. The method rejects known illegal combinations of format and alphabet (such as swissprot + dna) by throwing an IllegalArgumentException. It returns the SeqIOConstants.UNKNOWN value when either the format or the alphabet is unknown.
- Parameters:
- formatName - the sequence format name, such as 'embl' (case-insensitive).
- alphabetName - the alphabet name, such as 'dna' (case-insensitive).
- Returns:
- an int built from SeqIOConstants values, or SeqIOConstants.UNKNOWN.
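The encoding idea behind identifyFormat can be sketched as follows. The constant values below are hypothetical and make no claim to match SeqIOConstants; the sketch only shows the pattern of OR-ing a format bit with an alphabet bit, returning UNKNOWN for unrecognized names, and rejecting a known-illegal pair such as SwissProt with a non-protein alphabet.

```java
import java.util.Locale;

/** Hypothetical encoder; the bit layout is an assumption, not SeqIOConstants. */
final class FormatIdSketch {
    static final int UNKNOWN = 0;
    // Alphabet in the low four bits.
    static final int DNA = 0x1;
    static final int RNA = 0x2;
    static final int AA = 0x4;
    // Format in the next four bits.
    static final int FASTA = 0x10;
    static final int EMBL = 0x20;
    static final int GENBANK = 0x40;
    static final int SWISSPROT = 0x80;

    /** Case-insensitively maps a format name and an alphabet name to one combined int. */
    static int identify(String formatName, String alphabetName) {
        int format;
        switch (formatName.toLowerCase(Locale.ROOT)) {
            case "fasta": format = FASTA; break;
            case "embl": format = EMBL; break;
            case "genbank": format = GENBANK; break;
            case "swissprot": format = SWISSPROT; break;
            default: return UNKNOWN; // unknown format
        }
        int alphabet;
        switch (alphabetName.toLowerCase(Locale.ROOT)) {
            case "dna": alphabet = DNA; break;
            case "rna": alphabet = RNA; break;
            case "protein":
            case "aa": alphabet = AA; break;
            default: return UNKNOWN; // unknown alphabet
        }
        // Reject a combination known to be illegal: SwissProt holds only protein.
        if (format == SWISSPROT && alphabet != AA) {
            throw new IllegalArgumentException("Illegal combination: " + formatName + " + " + alphabetName);
        }
        return format | alphabet;
    }
}
```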
-
getSequenceFormat
public static SequenceFormat getSequenceFormat(int identifier) throws BioException
Deprecated.
getSequenceFormat accepts a value which represents a sequence format and returns the relevant SequenceFormat object.
- Parameters:
- identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
- Returns:
- a SequenceFormat.
- Throws:
- BioException - if an error occurs.
-
getBuilderFactory
public static SequenceBuilderFactory getBuilderFactory(int identifier) throws BioException
Deprecated.
getBuilderFactory accepts a value which represents a sequence format and returns the relevant SequenceBuilderFactory object.
- Parameters:
- identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
- Returns:
- a SequenceBuilderFactory.
- Throws:
- BioException - if an error occurs.
-
getAlphabet
public static FiniteAlphabet getAlphabet(int identifier) throws BioException
Deprecated.
getAlphabet accepts a value which represents a sequence format and returns the relevant FiniteAlphabet object.
- Parameters:
- identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
- Returns:
- a FiniteAlphabet.
- Throws:
- BioException - if an error occurs.
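getSequenceFormat, getBuilderFactory and getAlphabet all work from the bits of such an identifier. The minimal sketch below shows that decoding step with masks, assuming a hypothetical bit layout (low four bits for the alphabet, next four for the format) that is not the real SeqIOConstants layout. It returns names instead of SequenceFormat or FiniteAlphabet objects, and throws IllegalArgumentException where the BioJava methods are documented to throw BioException.

```java
/** Hypothetical decoder; the bit layout is an assumption, not SeqIOConstants. */
final class FormatIdDecoder {
    static final int DNA = 0x1, RNA = 0x2, AA = 0x4;
    static final int ALPHABET_MASK = 0xF;
    static final int FASTA = 0x10, EMBL = 0x20, GENBANK = 0x40, SWISSPROT = 0x80;
    static final int FORMAT_MASK = 0xF0;

    /** Keeps only the alphabet bits and names them, as getAlphabet picks a FiniteAlphabet. */
    static String alphabetOf(int identifier) {
        switch (identifier & ALPHABET_MASK) {
            case DNA: return "DNA";
            case RNA: return "RNA";
            case AA: return "PROTEIN";
            default: throw new IllegalArgumentException("no known alphabet in 0x" + Integer.toHexString(identifier));
        }
    }

    /** Keeps only the format bits and names them, as getSequenceFormat picks a SequenceFormat. */
    static String formatOf(int identifier) {
        switch (identifier & FORMAT_MASK) {
            case FASTA: return "FASTA";
            case EMBL: return "EMBL";
            case GENBANK: return "GENBANK";
            case SWISSPROT: return "SWISSPROT";
            default: throw new IllegalArgumentException("no known format in 0x" + Integer.toHexString(identifier));
        }
    }
}
```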
-
guessFileType
public static int guessFileType(File seqFile) throws IOException, FileNotFoundException
Deprecated. There is no standard file naming convention, and guessing the type from a file name is inherently error-prone.
Attempts to guess the file type of a file given its name, for use with the methods in this class that take an int fileType parameter. EMBL and GenBank files are assumed to contain DNA sequence.
- Parameters:
- seqFile - the File to read from.
- Returns:
- a value that describes the file type.
- Throws:
- IOException - if seqFile cannot be read
- FileNotFoundException - if seqFile cannot be found
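To see why name-based guessing was deprecated, consider a minimal extension-based guesser. FileTypeGuessSketch and its extension table are hypothetical and do not describe how guessFileType is implemented; the point is that an extension such as .seq carries no format information, and even a recognized .fa extension says nothing about whether the file holds DNA or protein.

```java
import java.util.Locale;

/** Hypothetical name-based guesser; illustrates the approach, not guessFileType itself. */
final class FileTypeGuessSketch {
    static String guess(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        int dot = lower.lastIndexOf('.');
        String ext = dot < 0 ? "" : lower.substring(dot + 1);
        switch (ext) {
            case "fa":
            case "fasta": return "fasta";   // format recognized, alphabet still unknown
            case "embl": return "embl";
            case "gb":
            case "gbk": return "genbank";
            default: return "unknown";      // e.g. ".seq", ".txt", or no extension at all
        }
    }
}
```

A GenBank file saved as data.txt defeats this approach entirely, which is why reading the file contents is the more reliable way to identify a format.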
-
formatToFactory
public static SequenceBuilderFactory formatToFactory(SequenceFormat format, Alphabet alpha) throws BioException
Deprecated. This essentially duplicates the operation available in the method identifyBuilderFactory.
Attempts to retrieve the most appropriate SequenceBuilderFactory for some combination of Alphabet and SequenceFormat.
- Parameters:
- format - currently supports FastaFormat, GenbankFormat and EmblLikeFormat
- alpha - currently supports only the DNA and protein alphabets
- Returns:
- the SequenceBuilderFactory
- Throws:
- BioException - if the combination of alpha and format is unrecognized.
-
fileToBiojava
public static Object fileToBiojava(String formatName, String alphabetName, BufferedReader br) throws BioException
Deprecated.
Reads a file with the specified format and alphabet.
- Parameters:
- formatName - the name of the format, e.g. genbank or swissprot (case-insensitive)
- alphabetName - the name of the alphabet, e.g. dna, rna or protein (case-insensitive)
- br - a BufferedReader for the input
- Returns:
- either an Alignment object or a SequenceIterator (depending on the format read)
- Throws:
- BioException - if an error occurs while reading, or if an unrecognized format and alphabet combination is used (e.g. swissprot and DNA).
- Since:
- 1.3
-
fileToBiojava
public static Object fileToBiojava(int fileType, BufferedReader br) throws BioException
Deprecated.
Reads a file and returns the corresponding Biojava object. You need to cast the result to an Alignment or a SequenceIterator as appropriate.
- Parameters:
- fileType - a value that describes the file type
- br - the reader for the input
- Returns:
- a SequenceIterator if the file type is a sequence file, or an Alignment if the file is a sequence alignment.
- Throws:
- BioException - if the file cannot be parsed
-
biojavaToFile
public static void biojavaToFile(String formatName, String alphabetName, OutputStream os, Object biojava) throws BioException, IOException, IllegalSymbolException
Deprecated.
Writes a Biojava SequenceIterator, SequenceDB, Sequence or Alignment to an OutputStream.
- Parameters:
- formatName - e.g. fasta, GenBank (case-insensitive)
- alphabetName - e.g. DNA, RNA (case-insensitive)
- os - where to write to
- biojava - the object to write
- Throws:
- BioException - if there are problems getting data from the Biojava object.
- IOException - if there are IO problems
- IllegalSymbolException - if a Symbol cannot be parsed
-
biojavaToFile
public static void biojavaToFile(int fileType, OutputStream os, Object biojava) throws BioException, IOException, IllegalSymbolException
Deprecated.
Converts a Biojava object to the given file type.
- Parameters:
- fileType - a value that describes the type of sequence file
- os - the stream to write the formatted results to
- biojava - a SequenceIterator, SequenceDB, Sequence or Alignment
- Throws:
- BioException - if biojava cannot be converted to that format.
- IOException - if the output cannot be written to os
- IllegalSymbolException - if biojava contains a Symbol that cannot be understood by the parser.