Package org.biojavax.bio.seq
Class RichSequence.IOTools
- java.lang.Object
- org.biojavax.bio.seq.RichSequence.IOTools
- Enclosing interface:
- RichSequence
public static final class RichSequence.IOTools extends Object
A set of convenience methods for handling common file formats.
- Since:
- 1.5
- Author:
- Mark Schreiber, Richard Holland
Nested Class Summary
Modifier and Type, Class: Description
static class RichSequence.IOTools.SingleRichSeqIterator: Used to iterate over a single rich sequence.
-
Method Summary
Modifier and Type, Method: Description
static SymbolTokenization getDNAParser(): Creates a DNA symbol tokenizer.
static SymbolTokenization getNucleotideParser(): Creates a nucleotide symbol tokenizer.
static SymbolTokenization getProteinParser(): Creates a protein symbol tokenizer.
static SymbolTokenization getRNAParser(): Creates an RNA symbol tokenizer.
static RichSequenceIterator readEMBL(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns): Read an EMBL file using a custom type of SymbolList.
static RichSequenceIterator readEMBLDNA(BufferedReader br, Namespace ns): Iterate over the sequences in an EMBL-format stream of DNA sequences.
static RichSequenceIterator readEMBLProtein(BufferedReader br, Namespace ns): Iterate over the sequences in an EMBL-format stream of protein sequences.
static RichSequenceIterator readEMBLRNA(BufferedReader br, Namespace ns): Iterate over the sequences in an EMBL-format stream of RNA sequences.
static RichSequenceIterator readEMBLxml(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns): Read an EMBLxml file using a custom type of SymbolList.
static RichSequenceIterator readEMBLxmlDNA(BufferedReader br, Namespace ns): Iterate over the sequences in an EMBLxml-format stream of DNA sequences.
static RichSequenceIterator readEMBLxmlProtein(BufferedReader br, Namespace ns): Iterate over the sequences in an EMBLxml-format stream of protein sequences.
static RichSequenceIterator readEMBLxmlRNA(BufferedReader br, Namespace ns): Iterate over the sequences in an EMBLxml-format stream of RNA sequences.
static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns): Read a FASTA file, building a custom type of RichSequence.
static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, Namespace ns): Read a FASTA file.
static RichSequenceIterator readFastaDNA(BufferedReader br, Namespace ns): Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator readFastaProtein(BufferedReader br, Namespace ns): Iterate over the sequences in a FASTA-format stream of protein sequences.
static RichSequenceIterator readFastaRNA(BufferedReader br, Namespace ns): Iterate over the sequences in a FASTA-format stream of RNA sequences.
static RichSequenceIterator readFile(File file, RichSequenceBuilderFactory seqFactory, Namespace ns): Guess which format a file is, then attempt to read it.
static RichSequenceIterator readFile(File file, Namespace ns): Guess which format a file is, then attempt to read it.
static RichSequenceIterator readGenbank(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns): Read a GenBank file using a custom type of SymbolList.
static RichSequenceIterator readGenbankDNA(BufferedReader br, Namespace ns): Iterate over the sequences in a GenBank-format stream of DNA sequences.
static RichSequenceIterator readGenbankProtein(BufferedReader br, Namespace ns): Iterate over the sequences in a GenBank-format stream of protein sequences.
static RichSequenceIterator readGenbankRNA(BufferedReader br, Namespace ns): Iterate over the sequences in a GenBank-format stream of RNA sequences.
static RichSequenceIterator readHashedFastaDNA(BufferedInputStream is, Namespace ns): Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator readINSDseq(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns): Read an INSDseq file using a custom type of SymbolList.
static RichSequenceIterator readINSDseqDNA(BufferedReader br, Namespace ns): Iterate over the sequences in an INSDseq-format stream of DNA sequences.
static RichSequenceIterator readINSDseqProtein(BufferedReader br, Namespace ns): Iterate over the sequences in an INSDseq-format stream of protein sequences.
static RichSequenceIterator readINSDseqRNA(BufferedReader br, Namespace ns): Iterate over the sequences in an INSDseq-format stream of RNA sequences.
static RichSequenceIterator readStream(BufferedInputStream stream, RichSequenceBuilderFactory seqFactory, Namespace ns): Guess which format a stream is, then attempt to read it.
static RichSequenceIterator readStream(BufferedInputStream stream, Namespace ns): Guess which format a stream is, then attempt to read it.
static RichSequenceIterator readUniProt(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns): Read a UniProt file using a custom type of SymbolList.
static RichSequenceIterator readUniProt(BufferedReader br, Namespace ns): Iterate over the sequences in a UniProt-format stream of protein sequences.
static RichSequenceIterator readUniProtXML(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns): Read a UniProt XML file using a custom type of SymbolList.
static RichSequenceIterator readUniProtXML(BufferedReader br, Namespace ns): Iterate over the sequences in a UniProt XML-format stream of protein sequences.
static void registerFormat(Class formatClass): Register a new format with IOTools for auto-guessing.
static void writeEMBL(OutputStream os, SequenceIterator in, Namespace ns): Writes sequences from a SequenceIterator to an OutputStream in EMBL format.
static void writeEMBL(OutputStream os, Sequence seq, Namespace ns): Writes a single Sequence to an OutputStream in EMBL format.
static void writeEMBLxml(OutputStream os, SequenceIterator in, Namespace ns): Writes sequences from a SequenceIterator to an OutputStream in EMBLxml format.
static void writeEMBLxml(OutputStream os, Sequence seq, Namespace ns): Writes a single Sequence to an OutputStream in EMBLxml format.
static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns): Writes Sequences from a SequenceIterator to an OutputStream in FASTA format.
static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns, FastaHeader header): Writes Sequences from a SequenceIterator to an OutputStream in FASTA format.
static void writeFasta(OutputStream os, Sequence seq, Namespace ns): Writes a single Sequence to an OutputStream in FASTA format.
static void writeFasta(OutputStream os, Sequence seq, Namespace ns, FastaHeader header): Writes a single Sequence to an OutputStream in FASTA format.
static void writeGenbank(OutputStream os, SequenceIterator in, Namespace ns): Writes sequences from a SequenceIterator to an OutputStream in GenBank format.
static void writeGenbank(OutputStream os, Sequence seq, Namespace ns): Writes a single Sequence to an OutputStream in GenBank format.
static void writeINSDseq(OutputStream os, SequenceIterator in, Namespace ns): Writes sequences from a SequenceIterator to an OutputStream in INSDseq format.
static void writeINSDseq(OutputStream os, Sequence seq, Namespace ns): Writes a single Sequence to an OutputStream in INSDseq format.
static void writeUniProt(OutputStream os, SequenceIterator in, Namespace ns): Writes sequences from a SequenceIterator to an OutputStream in UniProt format.
static void writeUniProt(OutputStream os, Sequence seq, Namespace ns): Writes a single Sequence to an OutputStream in UniProt format.
static void writeUniProtXML(OutputStream os, SequenceIterator in, Namespace ns): Writes sequences from a SequenceIterator to an OutputStream in UniProt XML format.
static void writeUniProtXML(OutputStream os, Sequence seq, Namespace ns): Writes a single Sequence to an OutputStream in UniProt XML format.
-
Method Detail
-
registerFormat
public static void registerFormat(Class formatClass)
Register a new format with IOTools for auto-guessing.
- Parameters:
formatClass - the RichSequenceFormat class to register.
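The idea of registering formats for auto-guessing can be sketched in plain Java. This is an illustration only: FormatRegistrySketch, register and guess are hypothetical names, and the first-line predicates are assumptions made for the example. The library's own registry works on RichSequenceFormat classes and is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical registry: maps a format name to a test on the first line of input.
class FormatRegistrySketch {
    // LinkedHashMap keeps registration order, so earlier formats are tried first.
    private final Map<String, Predicate<String>> formats = new LinkedHashMap<>();

    /** Registers a format name together with a predicate that recognises it. */
    void register(String name, Predicate<String> recognises) {
        formats.put(name, recognises);
    }

    /** Returns the first registered format that recognises the line, or null. */
    String guess(String firstLine) {
        for (Map.Entry<String, Predicate<String>> e : formats.entrySet()) {
            if (e.getValue().test(firstLine)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FormatRegistrySketch registry = new FormatRegistrySketch();
        registry.register("FASTA", line -> line.startsWith(">"));
        registry.register("GenBank", line -> line.startsWith("LOCUS"));
        System.out.println(registry.guess(">seq1 example"));
    }
}
```

Keeping the registry ordered means that a newly registered format is only consulted after the ones registered before it, which is one simple way to make guessing deterministic.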
-
readStream
public static RichSequenceIterator readStream(BufferedInputStream stream, RichSequenceBuilderFactory seqFactory, Namespace ns) throws IOException
Guess which format a stream is, then attempt to read it.
- Parameters:
stream - the BufferedInputStream to attempt to read.
seqFactory - a factory used to build a RichSequence.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the stream
- Throws:
IOException - if the stream is unrecognisable or problems occur in reading it.
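One plausible reason for requiring a BufferedInputStream is that the start of the stream can be inspected and then rewound with mark and reset before parsing begins. The sketch below shows that pattern in plain Java. It is not the library's detection code: FormatGuessSketch, PEEK_BYTES and the prefix rules are assumptions chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative only: peeks at the first bytes, then rewinds the stream.
class FormatGuessSketch {
    // Hypothetical peek size; not taken from the library.
    static final int PEEK_BYTES = 16;

    /** Guesses a format name from the first bytes and leaves the stream at byte 0. */
    static String guessFormat(BufferedInputStream stream) throws IOException {
        stream.mark(PEEK_BYTES);
        byte[] head = new byte[PEEK_BYTES];
        int n = 0;
        while (n < head.length) {
            int r = stream.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream reached before PEEK_BYTES were read
            }
            n += r;
        }
        stream.reset(); // rewind so a parser can read from the beginning
        String start = new String(head, 0, n, StandardCharsets.US_ASCII);
        if (start.startsWith(">")) return "FASTA";
        if (start.startsWith("LOCUS")) return "GenBank";
        if (start.startsWith("ID ")) return "EMBL or UniProt flat file";
        if (start.startsWith("<")) return "XML";
        throw new IOException("Unrecognised format");
    }

    /** Convenience wrapper for in-memory text; returns "unknown" on failure. */
    static String guessFormatOfText(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        try {
            return guessFormat(new BufferedInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(guessFormatOfText(">seq1\nACGT\n"));
    }
}
```

Throwing IOException for unrecognised input mirrors the documented behaviour of readStream, where an unrecognisable stream is reported the same way as a read failure.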
-
readStream
public static RichSequenceIterator readStream(BufferedInputStream stream, Namespace ns) throws IOException
Guess which format a stream is, then attempt to read it.
- Parameters:
stream - the BufferedInputStream to attempt to read.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the stream
- Throws:
IOException - if the stream cannot be read.
-
readFile
public static RichSequenceIterator readFile(File file, RichSequenceBuilderFactory seqFactory, Namespace ns) throws IOException
Guess which format a file is, then attempt to read it.
- Parameters:
file - the File to attempt to read.
seqFactory - a factory used to build a RichSequence.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the file
- Throws:
IOException - if the file is unrecognisable or problems occur in reading it.
-
readFile
public static RichSequenceIterator readFile(File file, Namespace ns) throws IOException
Guess which format a file is, then attempt to read it.
- Parameters:
file - the File to attempt to read.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the file
- Throws:
IOException - if the file cannot be read.
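The namespace rule stated for the ns parameter is a three-step fallback: the caller's namespace, then the namespace named in the file, then the default. A minimal sketch of that rule follows, using strings in place of Namespace objects. NamespaceFallbackSketch, resolve and DEFAULT_NAMESPACE are hypothetical; the constant merely stands in for whatever RichObjectFactory.getDefaultNamespace() returns.

```java
// Illustrative model of the documented namespace fallback, using plain strings.
class NamespaceFallbackSketch {
    // Hypothetical placeholder for the library's default namespace.
    static final String DEFAULT_NAMESPACE = "default-namespace";

    /**
     * @param requested      the namespace passed by the caller, or null
     * @param declaredInFile the namespace named in the file, or null if absent
     */
    static String resolve(String requested, String declaredInFile) {
        if (requested != null) {
            return requested;      // an explicit namespace wins
        }
        if (declaredInFile != null) {
            return declaredInFile; // null argument: use what the file says
        }
        return DEFAULT_NAMESPACE;  // nothing given anywhere: use the default
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "genbank"));
    }
}
```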
-
readFasta
public static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, Namespace ns)
Read a FASTA file.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file
-
readFasta
public static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a FASTA file, building a custom type of RichSequence. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
seqFactory - a factory used to build a RichSequence.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file
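To show what bit-packing means in this context, here is a small sketch that stores unambiguous DNA in two bits per symbol. It illustrates the general technique only and is not the encoding used by RichSequenceBuilderFactory.PACKED, which also has to cope with ambiguity symbols. TwoBitPackingSketch, pack and unpack are hypothetical names.

```java
// Illustrative 2-bit packing of unambiguous DNA: four symbols per byte.
class TwoBitPackingSketch {
    // The index of each letter in this string is its 2-bit code.
    static final String ALPHABET = "acgt";

    /** Packs a DNA string into bytes, least significant bit pair first. */
    static byte[] pack(String dna) {
        byte[] packed = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            char c = Character.toLowerCase(dna.charAt(i));
            int code = ALPHABET.indexOf(c);
            if (code < 0) {
                throw new IllegalArgumentException("Not an unambiguous DNA symbol: " + c);
            }
            packed[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
        return packed;
    }

    /** Restores the first 'length' symbols from a packed array. */
    static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int code = (packed[i / 4] >> ((i % 4) * 2)) & 0b11;
            sb.append(ALPHABET.charAt(code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("gattaca");
        System.out.println(packed.length + " bytes for 7 symbols");
        System.out.println(unpack(packed, 7));
    }
}
```

The sequence length has to be stored separately, because the last byte may be only partly used.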
-
readFastaDNA
public static RichSequenceIterator readFastaDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file
- See Also:
- readHashedFastaDNA(java.io.BufferedInputStream, org.biojavax.Namespace) for a faster version that can access sequences from memory.
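Reading from a BufferedReader through an iterator means records can be produced one at a time as the caller asks for them. The sketch below shows that streaming pattern for FASTA text in plain Java. It is a simplified stand-in, not the library's parser: FastaStreamSketch is a hypothetical class, and each record is returned as a two-element array of header and residues rather than as a RichSequence.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative lazy FASTA reader: each call to next() parses one record.
class FastaStreamSketch implements Iterator<String[]> {
    private final BufferedReader reader;
    private String pendingHeader; // header of the next record, without '>'

    FastaStreamSketch(BufferedReader reader) {
        this.reader = reader;
        readUntilHeader(null); // skip anything before the first '>' line
    }

    /** Appends residue lines to 'residues' (if non-null) until the next header. */
    private void readUntilHeader(StringBuilder residues) {
        pendingHeader = null;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line.substring(1).trim();
                    return;
                }
                if (residues != null) {
                    residues.append(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    /** Returns {header, residues} for the next record. */
    @Override
    public String[] next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String header = pendingHeader;
        StringBuilder residues = new StringBuilder();
        readUntilHeader(residues);
        return new String[] { header, residues.toString() };
    }

    public static void main(String[] args) {
        String text = ">a first\nAC\nGT\n>b\nTT\n";
        FastaStreamSketch it = new FastaStreamSketch(new BufferedReader(new StringReader(text)));
        while (it.hasNext()) {
            String[] rec = it.next();
            System.out.println(rec[0] + " -> " + rec[1]);
        }
    }
}
```

Only one record is held at a time, which keeps memory use low for large files but means each sequence is parsed at the moment it is requested.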
-
readHashedFastaDNA
public static RichSequenceIterator readHashedFastaDNA(BufferedInputStream is, Namespace ns) throws BioException
Iterate over the sequences in a FASTA-format stream of DNA sequences. In contrast to readFastaDNA, this provides a faster implementation in which all sequences are accessed from memory.
- Parameters:
is - the BufferedInputStream to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file
- Throws:
BioException - if something goes wrong while reading the file.
- See Also:
- readFastaDNA(java.io.BufferedReader, org.biojavax.Namespace)
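The contrast with readFastaDNA is that every sequence is held in memory rather than parsed on demand. How the library organises that in-memory store is not described here, so the following is only a sketch of the general idea: load all records once into a map keyed by identifier. InMemoryFastaSketch and load are hypothetical, and taking the first whitespace-delimited token of the header as the identifier is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative eager loader: parses all FASTA records into a map up front.
class InMemoryFastaSketch {
    /** Returns residues keyed by the first token of each header line. */
    static Map<String, String> load(String fastaText) {
        Map<String, StringBuilder> building = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String rawLine : fastaText.split("\n")) {
            String line = rawLine.trim();
            if (line.startsWith(">")) {
                // Assumption: the identifier is the first token after '>'.
                String id = line.substring(1).trim().split("\\s+")[0];
                current = new StringBuilder();
                building.put(id, current);
            } else if (current != null && !line.isEmpty()) {
                current.append(line);
            }
        }
        Map<String, String> byId = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : building.entrySet()) {
            byId.put(e.getKey(), e.getValue().toString());
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<String, String> db = load(">a first\nAC\nGT\n>b\nTT\n");
        System.out.println(db.get("a"));
    }
}
```

The trade-off is the usual one: lookups and repeated passes are cheap once loaded, at the cost of holding the whole data set in memory.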
-
readFastaRNA
public static RichSequenceIterator readFastaRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file
-
readFastaProtein
public static RichSequenceIterator readFastaProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file
-
readGenbank
public static RichSequenceIterator readGenbank(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a GenBank file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to build sequences the same way as readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
seqFactory - a factory used to build a SymbolList.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file
-
readGenbankDNA
public static RichSequenceIterator readGenbankDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file
-
readGenbankRNA
public static RichSequenceIterator readGenbankRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file
-
readGenbankProtein
public static RichSequenceIterator readGenbankProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file
-
readINSDseq
public static RichSequenceIterator readINSDseq(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an INSDseq file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to build sequences the same way as readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
seqFactory - a factory used to build a SymbolList.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file
-
readINSDseqDNA
public static RichSequenceIterator readINSDseqDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file
-
readINSDseqRNA
public static RichSequenceIterator readINSDseqRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file
-
readINSDseqProtein
public static RichSequenceIterator readINSDseqProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file
-
readEMBLxml
public static RichSequenceIterator readEMBLxml(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an EMBLxml file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to build sequences the same way as readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
seqFactory - a factory used to build a SymbolList.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBLxmlDNA
public static RichSequenceIterator readEMBLxmlDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBLxmlRNA
public static RichSequenceIterator readEMBLxmlRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBLxmlProtein
public static RichSequenceIterator readEMBLxmlProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBL
public static RichSequenceIterator readEMBL(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an EMBL file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to build sequences the same way as readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
seqFactory - a factory used to build a SymbolList.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file
-
readEMBLDNA
public static RichSequenceIterator readEMBLDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file
-
readEMBLRNA
public static RichSequenceIterator readEMBLRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file
-
readEMBLProtein
public static RichSequenceIterator readEMBLProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file
-
readUniProt
public static RichSequenceIterator readUniProt(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a UniProt file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to build sequences the same way as readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
seqFactory - a factory used to build a SymbolList.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt file
-
readUniProt
public static RichSequenceIterator readUniProt(BufferedReader br, Namespace ns)
Iterate over the sequences in a UniProt-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt file
-
readUniProtXML
public static RichSequenceIterator readUniProtXML(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a UniProt XML file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to build sequences the same way as readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from.
sTok - a SymbolTokenization that understands the sequences.
seqFactory - a factory used to build a SymbolList.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt XML file
-
readUniProtXML
public static RichSequenceIterator readUniProtXML(BufferedReader br, Namespace ns)
Iterate over the sequences in a UniProt XML-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt XML file
-
writeFasta
public static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns, FastaHeader header) throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in FASTA format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write FASTA-formatted data to.
in - the source of input RichSequences.
ns - a Namespace to write the RichSequences to. Null implies that it should use the namespace specified in the individual sequence.
header - a FastaHeader that controls the fields in the header.
- Throws:
IOException - if there is an IO problem.
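As a rough picture of what writing FASTA to an OutputStream involves, the sketch below emits a header line followed by the residues wrapped to a fixed width. It is a plain-Java illustration, not the library's writer: FastaWriteSketch is hypothetical, the 60-column width is an assumption, and the header is a bare string rather than something built from a FastaHeader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative FASTA writer: '>' header line, then wrapped residue lines.
class FastaWriteSketch {
    // Assumed line width; the library's actual setting is not stated here.
    static final int LINE_WIDTH = 60;

    /** Writes one record to the stream and flushes it. */
    static void writeFasta(OutputStream os, String header, String residues) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(header).append('\n');
        for (int i = 0; i < residues.length(); i += LINE_WIDTH) {
            int end = Math.min(i + LINE_WIDTH, residues.length());
            sb.append(residues, i, end).append('\n');
        }
        os.write(sb.toString().getBytes(StandardCharsets.US_ASCII));
        os.flush();
    }

    /** Convenience wrapper that returns the formatted record as a string. */
    static String toFastaString(String header, String residues) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeFasta(buffer, header, residues);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.print(toFastaString("id1 example", "ACGTACGT"));
    }
}
```

Because the target is just an OutputStream, the same method can write to a file, a socket, or System.out without change.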
-
writeFasta
public static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in FASTA format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write FASTA-formatted data to.
in - the source of input RichSequences.
ns - a Namespace to write the RichSequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeFasta
public static void writeFasta(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in FASTA format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeFasta
public static void writeFasta(OutputStream os, Sequence seq, Namespace ns, FastaHeader header) throws IOException
Writes a single Sequence to an OutputStream in FASTA format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
header - a FastaHeader that controls the fields in the header.
- Throws:
IOException - if there is an IO problem.
-
writeGenbank
public static void writeGenbank(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in GenBank format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write GenBank-formatted data to.
in - the source of input sequences.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeGenbank
public static void writeGenbank(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in GenBank format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeINSDseq
public static void writeINSDseq(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in INSDseq format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write INSDseq-formatted data to.
in - the source of input sequences.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeINSDseq
public static void writeINSDseq(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in INSDseq format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeEMBLxml
public static void writeEMBLxml(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in EMBLxml format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write EMBLxml-formatted data to.
in - the source of input sequences.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeEMBLxml
public static void writeEMBLxml(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in EMBLxml format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeEMBL
public static void writeEMBL(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in EMBL format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write EMBL-formatted data to.
in - the source of input sequences.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeEMBL
public static void writeEMBL(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in EMBL format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeUniProt
public static void writeUniProt(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in UniProt format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write UniProt-formatted data to.
in - the source of input sequences.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeUniProt
public static void writeUniProt(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in UniProt format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeUniProtXML
public static void writeUniProtXML(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in UniProt XML format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write UniProt XML-formatted data to.
in - the source of input sequences.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
writeUniProtXML
public static void writeUniProtXML(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in UniProt XML format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequence to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem.
-
getDNAParser
public static SymbolTokenization getDNAParser()
Creates a DNA symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing DNA.
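A symbol tokenizer turns the characters of a sequence file into alphabet symbols and rejects characters that are not part of the alphabet. The sketch below shows that idea for the four unambiguous DNA bases only. It is not the SymbolTokenization API: DnaTokenSketch, Base and tokenize are hypothetical, and a real DNA tokenizer would typically also accept ambiguity codes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative character-to-symbol mapping for unambiguous DNA.
class DnaTokenSketch {
    enum Base { A, C, G, T }

    /** Converts each character to a Base, ignoring case. */
    static List<Base> tokenize(String text) {
        List<Base> symbols = new ArrayList<>(text.length());
        for (char c : text.toCharArray()) {
            switch (Character.toLowerCase(c)) {
                case 'a': symbols.add(Base.A); break;
                case 'c': symbols.add(Base.C); break;
                case 'g': symbols.add(Base.G); break;
                case 't': symbols.add(Base.T); break;
                default:
                    throw new IllegalArgumentException("Not a DNA symbol: " + c);
            }
        }
        return symbols;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("acGT"));
    }
}
```

Passing the tokenizer to a reader as a separate argument, as the four-argument read methods do, is what lets the same parser handle DNA, RNA and protein input.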
-
getRNAParser
public static SymbolTokenization getRNAParser()
Creates an RNA symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing RNA.
-
getNucleotideParser
public static SymbolTokenization getNucleotideParser()
Creates a nucleotide symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing nucleotides.
-
getProteinParser
public static SymbolTokenization getProteinParser()
Creates a protein symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing protein.