Package org.biojavax.bio.seq
Class RichSequence.IOTools
- java.lang.Object
-
- org.biojavax.bio.seq.RichSequence.IOTools
-
- Enclosing interface:
- RichSequence
public static final class RichSequence.IOTools extends Object
A set of convenience methods for handling common file formats.
- Since:
- 1.5
- Author:
- Mark Schreiber, Richard Holland
-
-
Nested Class Summary
Modifier and Type | Class | Description
static class | RichSequence.IOTools.SingleRichSeqIterator | Used to iterate over a single rich sequence
-
Method Summary
Modifier and Type | Method | Description
static SymbolTokenization | getDNAParser() | Creates a DNA symbol tokenizer.
static SymbolTokenization | getNucleotideParser() | Creates a nucleotide symbol tokenizer.
static SymbolTokenization | getProteinParser() | Creates a protein symbol tokenizer.
static SymbolTokenization | getRNAParser() | Creates an RNA symbol tokenizer.
static RichSequenceIterator | readEMBL(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns) | Read an EMBL file using a custom type of SymbolList.
static RichSequenceIterator | readEMBLDNA(BufferedReader br, Namespace ns) | Iterate over the sequences in an EMBL-format stream of DNA sequences.
static RichSequenceIterator | readEMBLProtein(BufferedReader br, Namespace ns) | Iterate over the sequences in an EMBL-format stream of protein sequences.
static RichSequenceIterator | readEMBLRNA(BufferedReader br, Namespace ns) | Iterate over the sequences in an EMBL-format stream of RNA sequences.
static RichSequenceIterator | readEMBLxml(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns) | Read an EMBLxml file using a custom type of SymbolList.
static RichSequenceIterator | readEMBLxmlDNA(BufferedReader br, Namespace ns) | Iterate over the sequences in an EMBLxml-format stream of DNA sequences.
static RichSequenceIterator | readEMBLxmlProtein(BufferedReader br, Namespace ns) | Iterate over the sequences in an EMBLxml-format stream of protein sequences.
static RichSequenceIterator | readEMBLxmlRNA(BufferedReader br, Namespace ns) | Iterate over the sequences in an EMBLxml-format stream of RNA sequences.
static RichSequenceIterator | readFasta(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns) | Read a FASTA file, building a custom type of RichSequence.
static RichSequenceIterator | readFasta(BufferedReader br, SymbolTokenization sTok, Namespace ns) | Read a FASTA file.
static RichSequenceIterator | readFastaDNA(BufferedReader br, Namespace ns) | Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator | readFastaProtein(BufferedReader br, Namespace ns) | Iterate over the sequences in a FASTA-format stream of protein sequences.
static RichSequenceIterator | readFastaRNA(BufferedReader br, Namespace ns) | Iterate over the sequences in a FASTA-format stream of RNA sequences.
static RichSequenceIterator | readFile(File file, RichSequenceBuilderFactory seqFactory, Namespace ns) | Guess the format of a file, then attempt to read it.
static RichSequenceIterator | readFile(File file, Namespace ns) | Guess the format of a file, then attempt to read it.
static RichSequenceIterator | readGenbank(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns) | Read a GenBank file using a custom type of SymbolList.
static RichSequenceIterator | readGenbankDNA(BufferedReader br, Namespace ns) | Iterate over the sequences in a GenBank-format stream of DNA sequences.
static RichSequenceIterator | readGenbankProtein(BufferedReader br, Namespace ns) | Iterate over the sequences in a GenBank-format stream of protein sequences.
static RichSequenceIterator | readGenbankRNA(BufferedReader br, Namespace ns) | Iterate over the sequences in a GenBank-format stream of RNA sequences.
static RichSequenceIterator | readHashedFastaDNA(BufferedInputStream is, Namespace ns) | Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator | readINSDseq(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns) | Read an INSDseq file using a custom type of SymbolList.
static RichSequenceIterator | readINSDseqDNA(BufferedReader br, Namespace ns) | Iterate over the sequences in an INSDseq-format stream of DNA sequences.
static RichSequenceIterator | readINSDseqProtein(BufferedReader br, Namespace ns) | Iterate over the sequences in an INSDseq-format stream of protein sequences.
static RichSequenceIterator | readINSDseqRNA(BufferedReader br, Namespace ns) | Iterate over the sequences in an INSDseq-format stream of RNA sequences.
static RichSequenceIterator | readStream(BufferedInputStream stream, RichSequenceBuilderFactory seqFactory, Namespace ns) | Guess the format of a stream, then attempt to read it.
static RichSequenceIterator | readStream(BufferedInputStream stream, Namespace ns) | Guess the format of a stream, then attempt to read it.
static RichSequenceIterator | readUniProt(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns) | Read a UniProt file using a custom type of SymbolList.
static RichSequenceIterator | readUniProt(BufferedReader br, Namespace ns) | Iterate over the sequences in a UniProt-format stream of protein sequences.
static RichSequenceIterator | readUniProtXML(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns) | Read a UniProt XML file using a custom type of SymbolList.
static RichSequenceIterator | readUniProtXML(BufferedReader br, Namespace ns) | Iterate over the sequences in a UniProt XML-format stream of protein sequences.
static void | registerFormat(Class formatClass) | Register a new format with IOTools for auto-guessing.
static void | writeEMBL(OutputStream os, SequenceIterator in, Namespace ns) | Writes sequences from a SequenceIterator to an OutputStream in EMBL format.
static void | writeEMBL(OutputStream os, Sequence seq, Namespace ns) | Writes a single Sequence to an OutputStream in EMBL format.
static void | writeEMBLxml(OutputStream os, SequenceIterator in, Namespace ns) | Writes sequences from a SequenceIterator to an OutputStream in EMBLxml format.
static void | writeEMBLxml(OutputStream os, Sequence seq, Namespace ns) | Writes a single Sequence to an OutputStream in EMBLxml format.
static void | writeFasta(OutputStream os, SequenceIterator in, Namespace ns) | Writes Sequences from a SequenceIterator to an OutputStream in FASTA format.
static void | writeFasta(OutputStream os, SequenceIterator in, Namespace ns, FastaHeader header) | Writes Sequences from a SequenceIterator to an OutputStream in FASTA format.
static void | writeFasta(OutputStream os, Sequence seq, Namespace ns) | Writes a single Sequence to an OutputStream in FASTA format.
static void | writeFasta(OutputStream os, Sequence seq, Namespace ns, FastaHeader header) | Writes a single Sequence to an OutputStream in FASTA format.
static void | writeGenbank(OutputStream os, SequenceIterator in, Namespace ns) | Writes sequences from a SequenceIterator to an OutputStream in GenBank format.
static void | writeGenbank(OutputStream os, Sequence seq, Namespace ns) | Writes a single Sequence to an OutputStream in GenBank format.
static void | writeINSDseq(OutputStream os, SequenceIterator in, Namespace ns) | Writes sequences from a SequenceIterator to an OutputStream in INSDseq format.
static void | writeINSDseq(OutputStream os, Sequence seq, Namespace ns) | Writes a single Sequence to an OutputStream in INSDseq format.
static void | writeUniProt(OutputStream os, SequenceIterator in, Namespace ns) | Writes sequences from a SequenceIterator to an OutputStream in UniProt format.
static void | writeUniProt(OutputStream os, Sequence seq, Namespace ns) | Writes a single Sequence to an OutputStream in UniProt format.
static void | writeUniProtXML(OutputStream os, SequenceIterator in, Namespace ns) | Writes sequences from a SequenceIterator to an OutputStream in UniProt XML format.
static void | writeUniProtXML(OutputStream os, Sequence seq, Namespace ns) | Writes a single Sequence to an OutputStream in UniProt XML format.
-
-
-
Method Detail
-
registerFormat
public static void registerFormat(Class formatClass)
Register a new format with IOTools for auto-guessing.
- Parameters:
formatClass - the RichSequenceFormat class to register.
-
readStream
public static RichSequenceIterator readStream(BufferedInputStream stream, RichSequenceBuilderFactory seqFactory, Namespace ns) throws IOException
Guess the format of a stream, then attempt to read it.
- Parameters:
stream - the BufferedInputStream to attempt to read.
seqFactory - a factory used to build a RichSequence
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the stream
- Throws:
IOException - if the stream is unrecognisable or problems occur in reading it.
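The idea of guessing a format before reading can be sketched with the JDK alone. The example below is only an illustration, not the BioJava implementation: the class `FormatSniffer`, the method `guessFormat`, and the 256-byte peek limit are hypothetical. It relies on `BufferedInputStream.mark`/`reset` so that a parser could still read the stream from the start after the guess.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of BioJava. */
class FormatSniffer {
    // Assumed upper bound on how many bytes are needed to recognise a format.
    private static final int PEEK_LIMIT = 256;

    /** Peeks at the head of the stream, rewinds it, and names the likely format. */
    static String guessFormat(BufferedInputStream in) throws IOException {
        in.mark(PEEK_LIMIT);                       // remember the current position
        byte[] head = new byte[PEEK_LIMIT];
        int n = in.read(head, 0, PEEK_LIMIT);      // may read fewer bytes than requested
        in.reset();                                // rewind so a parser sees the full stream
        String text = n <= 0 ? "" : new String(head, 0, n, StandardCharsets.US_ASCII).trim();
        if (text.startsWith(">")) return "FASTA";
        if (text.startsWith("LOCUS")) return "GenBank";
        if (text.startsWith("ID ")) return "EMBL or UniProt flat file";
        throw new IOException("Unrecognised format");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = ">seq1\nACGT\n".getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(guessFormat(in));
    }
}
```

An unrecognised stream is reported with an `IOException` here to mirror the documented contract of `readStream`.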
-
readStream
public static RichSequenceIterator readStream(BufferedInputStream stream, Namespace ns) throws IOException
Guess the format of a stream, then attempt to read it.
- Parameters:
stream - the BufferedInputStream to attempt to read.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the stream
- Throws:
IOException - if the stream cannot be read.
-
readFile
public static RichSequenceIterator readFile(File file, RichSequenceBuilderFactory seqFactory, Namespace ns) throws IOException
Guess the format of a file, then attempt to read it.
- Parameters:
file - the File to attempt to read.
seqFactory - a factory used to build a RichSequence
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the file
- Throws:
IOException - if the file is unrecognisable or problems occur in reading it.
-
readFile
public static RichSequenceIterator readFile(File file, Namespace ns) throws IOException
Guess the format of a file, then attempt to read it.
- Parameters:
file - the File to attempt to read.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the file
- Throws:
IOException - if the file cannot be read.
-
readFasta
public static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, Namespace ns)
Read a FASTA file.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the FASTA file
-
readFasta
public static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a FASTA file, building a custom type of RichSequence. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a RichSequence
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the FASTA file
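To give a feel for what bit-packing of symbols means, here is a small standalone sketch. It is not how RichSequenceBuilderFactory.PACKED is implemented: the class `TwoBitDna` and its methods are hypothetical, and it assumes an unambiguous four-letter DNA alphabet so that each symbol fits in two bits.

```java
/** Hypothetical 2-bit DNA packer for illustration; not BioJava's packing scheme. */
class TwoBitDna {
    // Assumed alphabet: the index of each letter is its 2-bit code.
    private static final String ALPHABET = "ACGT";

    /** Packs four symbols into each byte, lowest bits first. */
    static byte[] pack(String dna) {
        byte[] packed = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not an unambiguous DNA symbol: " + dna.charAt(i));
            }
            packed[i / 4] |= (byte) (code << (2 * (i % 4)));
        }
        return packed;
    }

    /** Recovers the first {@code length} symbols from a packed array. */
    static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int code = (packed[i / 4] >> (2 * (i % 4))) & 0b11;
            sb.append(ALPHABET.charAt(code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("ACGTAC");
        System.out.println(packed.length + " bytes hold " + unpack(packed, 6));
    }
}
```

The trade-off in a scheme like this is memory against generality: ambiguity symbols and gaps do not fit in two bits, so a real packed representation needs more bits per symbol or a fallback.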
-
readFastaDNA
public static RichSequenceIterator readFastaDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the FASTA file
- See Also:
readHashedFastaDNA(java.io.BufferedInputStream, org.biojavax.Namespace) for a faster version that can access sequences from memory.
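As a rough picture of what iterating over a FASTA-format stream involves, the following standalone sketch splits a `BufferedReader` into records. It is not the BioJava parser: `SimpleFastaReader`, `FastaEntry`, and `readAll` are hypothetical names, and the sketch collects records eagerly into a list for brevity, whereas the documented method returns an iterator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal FASTA reader for illustration; not BioJava's parser. */
class SimpleFastaReader {
    /** One record: the header text without '>' and the concatenated residues. */
    static final class FastaEntry {
        final String header;
        final String residues;

        FastaEntry(String header, String residues) {
            this.header = header;
            this.residues = residues;
        }
    }

    /** Reads every record; a line starting with '>' opens a new record. */
    static List<FastaEntry> readAll(BufferedReader br) throws IOException {
        List<FastaEntry> entries = new ArrayList<>();
        String header = null;
        StringBuilder residues = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;              // skip blank lines
            if (line.startsWith(">")) {
                if (header != null) entries.add(new FastaEntry(header, residues.toString()));
                header = line.substring(1).trim();
                residues.setLength(0);                 // start collecting a new sequence
            } else if (header != null) {
                residues.append(line);                 // sequence lines are joined
            }
        }
        if (header != null) entries.add(new FastaEntry(header, residues.toString()));
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">seq1 first\nACGT\nACGT\n>seq2\nGGCC\n";
        for (FastaEntry e : readAll(new BufferedReader(new StringReader(fasta)))) {
            System.out.println(e.header + " has " + e.residues.length() + " symbols");
        }
    }
}
```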
-
readHashedFastaDNA
public static RichSequenceIterator readHashedFastaDNA(BufferedInputStream is, Namespace ns) throws BioException
Iterate over the sequences in a FASTA-format stream of DNA sequences. In contrast to readFastaDNA, this provides a faster implementation in which all sequences are accessed from memory.
- Parameters:
is - the BufferedInputStream to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the FASTA file
- Throws:
BioException - if something goes wrong while reading the file.
- See Also:
readFastaDNA(java.io.BufferedReader, org.biojavax.Namespace)
-
readFastaRNA
public static RichSequenceIterator readFastaRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the FASTA file
-
readFastaProtein
public static RichSequenceIterator readFastaProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the FASTA file
-
readGenbank
public static RichSequenceIterator readGenbank(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a GenBank file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the GenBank file
-
readGenbankDNA
public static RichSequenceIterator readGenbankDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the GenBank file
-
readGenbankRNA
public static RichSequenceIterator readGenbankRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the GenBank file
-
readGenbankProtein
public static RichSequenceIterator readGenbankProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the GenBank file
-
readINSDseq
public static RichSequenceIterator readINSDseq(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an INSDseq file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the INSDseq file
-
readINSDseqDNA
public static RichSequenceIterator readINSDseqDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the INSDseq file
-
readINSDseqRNA
public static RichSequenceIterator readINSDseqRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the INSDseq file
-
readINSDseqProtein
public static RichSequenceIterator readINSDseqProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the INSDseq file
-
readEMBLxml
public static RichSequenceIterator readEMBLxml(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an EMBLxml file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBLxmlDNA
public static RichSequenceIterator readEMBLxmlDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBLxmlRNA
public static RichSequenceIterator readEMBLxmlRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBLxmlProtein
public static RichSequenceIterator readEMBLxmlProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBLxml file
-
readEMBL
public static RichSequenceIterator readEMBL(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an EMBL file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBL file
-
readEMBLDNA
public static RichSequenceIterator readEMBLDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of DNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBL file
-
readEMBLRNA
public static RichSequenceIterator readEMBLRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of RNA sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBL file
-
readEMBLProtein
public static RichSequenceIterator readEMBLProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the EMBL file
-
readUniProt
public static RichSequenceIterator readUniProt(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a UniProt file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the UniProt file
-
readUniProt
public static RichSequenceIterator readUniProt(BufferedReader br, Namespace ns)
Iterate over the sequences in a UniProt-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the UniProt file
-
readUniProtXML
public static RichSequenceIterator readUniProtXML(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a UniProt XML file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the UniProt XML file
-
readUniProtXML
public static RichSequenceIterator readUniProtXML(BufferedReader br, Namespace ns)
Iterate over the sequences in a UniProt XML-format stream of protein sequences.
- Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
- Returns:
a RichSequenceIterator over each sequence in the UniProt XML file
-
writeFasta
public static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns, FastaHeader header) throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in FASTA format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write FASTA-formatted data to
in - the source of input RichSequences
ns - a Namespace to write the RichSequences to. Null implies that it should use the namespace specified in the individual sequence.
header - the FastaHeader
- Throws:
IOException - if there is an IO problem
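The shape of FASTA output written to an `OutputStream` can be sketched as follows. This is a standalone illustration rather than the BioJava writer: `SimpleFastaWriter`, its `write` method, and the 60-column line width are assumptions, and the header is passed as a plain string instead of being built from a FastaHeader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal FASTA writer for illustration; not BioJava's writer. */
class SimpleFastaWriter {
    // Assumed line width for the sequence body.
    static final int LINE_WIDTH = 60;

    /** Writes one record: a '>' header line followed by wrapped sequence lines. */
    static void write(OutputStream os, String header, String residues) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(header).append('\n');
        for (int i = 0; i < residues.length(); i += LINE_WIDTH) {
            int end = Math.min(i + LINE_WIDTH, residues.length());
            sb.append(residues, i, end).append('\n');  // one wrapped line of sequence
        }
        os.write(sb.toString().getBytes(StandardCharsets.US_ASCII));
        os.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, "seq1 example", "ACGTACGTACGT");
        System.out.print(out.toString("US-ASCII"));
    }
}
```

Writing to a generic `OutputStream` is what allows the same call to target a file, a socket, or `System.out`, which is the filter-style use the description above refers to.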
-
writeFasta
public static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in FASTA format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write FASTA-formatted data to
in - the source of input RichSequences
ns - a Namespace to write the RichSequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeFasta
public static void writeFasta(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in FASTA format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeFasta
public static void writeFasta(OutputStream os, Sequence seq, Namespace ns, FastaHeader header) throws IOException
Writes a single Sequence to an OutputStream in FASTA format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
header - a FastaHeader that controls the fields in the header.
- Throws:
IOException - if there is an IO problem
-
writeGenbank
public static void writeGenbank(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in GenBank format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write GenBank-formatted data to
in - the source of input Sequences
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeGenbank
public static void writeGenbank(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in GenBank format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeINSDseq
public static void writeINSDseq(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in INSDseq format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write INSDseq-formatted data to
in - the source of input Sequences
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeINSDseq
public static void writeINSDseq(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in INSDseq format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeEMBLxml
public static void writeEMBLxml(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in EMBLxml format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write EMBLxml-formatted data to
in - the source of input Sequences
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeEMBLxml
public static void writeEMBLxml(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in EMBLxml format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeEMBL
public static void writeEMBL(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in EMBL format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write EMBL-formatted data to
in - the source of input Sequences
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeEMBL
public static void writeEMBL(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in EMBL format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeUniProt
public static void writeUniProt(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in UniProt format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write UniProt-formatted data to
in - the source of input Sequences
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeUniProt
public static void writeUniProt(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in UniProt format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeUniProtXML
public static void writeUniProtXML(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in UniProt XML format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
os - the stream to write UniProt XML-formatted data to
in - the source of input Sequences
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
writeUniProtXML
public static void writeUniProtXML(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in UniProt XML format.
- Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
- Throws:
IOException - if there is an IO problem
-
getDNAParser
public static SymbolTokenization getDNAParser()
Creates a DNA symbol tokenizer.
- Returns:
a SymbolTokenization for parsing DNA.
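Conceptually, a symbol tokenizer converts between characters in a file and symbol objects in memory. The sketch below shows that round trip with plain JDK types. It is not the SymbolTokenization interface: `DnaTokenizer`, `Base`, `parse`, and `tokenize` are hypothetical names, and only the four unambiguous DNA bases are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical character-to-symbol converter for illustration; not BioJava's API. */
class DnaTokenizer {
    /** The assumed symbol set: four unambiguous DNA bases. */
    enum Base { A, C, G, T }

    /** Turns text into symbols, rejecting characters outside the alphabet. */
    static List<Base> parse(String text) {
        List<Base> symbols = new ArrayList<>(text.length());
        for (char c : text.toCharArray()) {
            // Base.valueOf throws IllegalArgumentException for unknown letters.
            symbols.add(Base.valueOf(String.valueOf(Character.toUpperCase(c))));
        }
        return symbols;
    }

    /** Turns symbols back into lower-case text. */
    static String tokenize(List<Base> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (Base b : symbols) {
            sb.append(Character.toLowerCase(b.name().charAt(0)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokenize(parse("ACGT")));
    }
}
```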
-
getRNAParser
public static SymbolTokenization getRNAParser()
Creates an RNA symbol tokenizer.
- Returns:
a SymbolTokenization for parsing RNA.
-
getNucleotideParser
public static SymbolTokenization getNucleotideParser()
Creates a nucleotide symbol tokenizer.
- Returns:
a SymbolTokenization for parsing nucleotides.
-
getProteinParser
public static SymbolTokenization getProteinParser()
Creates a protein symbol tokenizer.
- Returns:
a SymbolTokenization for parsing protein.
-
-