Package org.biojava.nbio.structure.io
Class FastaStructureParser
- java.lang.Object
-
- org.biojava.nbio.structure.io.FastaStructureParser
-
public class FastaStructureParser extends Object
Reads a protein sequence from a fasta file and attempts to match it to a 3D structure. Any gaps ('-') in the fasta file are preserved as null atoms in the output, allowing structural alignments to be read from fasta files.
Structures are loaded from an AtomCache. For this to work, the accession for each protein should be parsed from the fasta header line into a form understood by AtomCache.getStructure(String).
Lowercase letters are sometimes used to specify unaligned residues. This information can be preserved by using a CasePreservingSequenceCreator, which allows the case of residues to be accessed through the AbstractSequence.getUserCollection() method.
- Author:
- Spencer Bliven
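The gap and lowercase conventions described above can be illustrated with a small standalone sketch. It does not use BioJava and is not how FastaStructureParser is implemented; the class GapMappingSketch and its methods are hypothetical names introduced only for this example. It assumes a single gapped fasta row and shows how gap columns can map to null while every other column maps to the next residue position.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of BioJava. */
final class GapMappingSketch {

    /**
     * Maps each column of a gapped fasta row to an index into the ungapped
     * residue list. Gap columns ('-') map to null.
     */
    static Integer[] mapColumnsToResidues(String alignedRow) {
        Integer[] mapping = new Integer[alignedRow.length()];
        int residueIndex = 0;
        for (int col = 0; col < alignedRow.length(); col++) {
            if (alignedRow.charAt(col) == '-') {
                mapping[col] = null; // gap: no residue at this column
            } else {
                mapping[col] = residueIndex++; // next residue in the ungapped sequence
            }
        }
        return mapping;
    }

    /** Flags columns written in lowercase, a convention for unaligned residues. */
    static boolean[] unalignedColumns(String alignedRow) {
        boolean[] unaligned = new boolean[alignedRow.length()];
        for (int col = 0; col < alignedRow.length(); col++) {
            unaligned[col] = Character.isLowerCase(alignedRow.charAt(col));
        }
        return unaligned;
    }

    public static void main(String[] args) {
        String row = "MK-LvaG"; // made-up example row
        System.out.println(Arrays.toString(mapColumnsToResidues(row)));
        System.out.println(Arrays.toString(unalignedColumns(row)));
    }
}
```

In the real class the per-column entries are ResidueNumber objects rather than integer indices, and case information is only available when a case-preserving sequence creator is supplied.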
-
-
Constructor Summary
Constructors
FastaStructureParser(File file, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)
FastaStructureParser(InputStream is, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)
FastaStructureParser(FastaReader<ProteinSequence,AminoAcidCompound> reader, AtomCache cache)
-
Method Summary
Modifier and Type, Method, Description
String[] getAccessions()
Gets the protein accessions mapped from the Fasta file.
ResidueNumber[][] getResidues()
For each residue in the fasta file, return the ResidueNumber in the corresponding structure.
ProteinSequence[] getSequences()
Gets the protein sequences read from the Fasta file.
Structure[] getStructures()
Gets the protein structures mapped from the Fasta file.
void process()
Parses the fasta file and loads it into memory.
-
-
-
Constructor Detail
-
FastaStructureParser
public FastaStructureParser(InputStream is, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)
-
FastaStructureParser
public FastaStructureParser(File file, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache) throws FileNotFoundException
- Throws:
FileNotFoundException
-
FastaStructureParser
public FastaStructureParser(FastaReader<ProteinSequence,AminoAcidCompound> reader, AtomCache cache)
-
-
Method Detail
-
process
public void process() throws IOException, StructureException
Parses the fasta file and loads it into memory. Information can be subsequently accessed through getSequences(), getStructures(), getResidues(), and getAccessions().
- Throws:
IOException
StructureException
-
getSequences
public ProteinSequence[] getSequences()
Gets the protein sequences read from the Fasta file. Returns null if process() has not been called.
- Returns:
- An array of ProteinSequences from parsing the fasta file, or null if process() hasn't been called.
-
getStructures
public Structure[] getStructures()
Gets the protein structures mapped from the Fasta file. Returns null if process() has not been called.
- Returns:
- An array of Structures for each protein in the fasta file, or null if process() hasn't been called.
-
getResidues
public ResidueNumber[][] getResidues()
For each residue in the fasta file, return the ResidueNumber in the corresponding structure. If the residue cannot be found in the structure, that entry will be null. This can happen if that residue was not included in the PDB file (e.g. disordered residues), if the fasta sequence does not match the PDB sequence, or if errors occur during the matching process.
- Returns:
- A 2D array of ResidueNumbers, or null if process() hasn't been called.
- See Also:
StructureSequenceMatcher.matchSequenceToStructure(ProteinSequence, Structure)
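Because both the returned array and its individual entries can be null, callers may want a defensive pass over the result. The following is a minimal sketch of such a check, not part of BioJava: ResidueCoverageSketch and countMapped are hypothetical names, and Object[][] stands in for ResidueNumber[][] so that the example compiles without the library. The check for a null row is an extra precaution assumed here, not something the documentation above states can occur.

```java
/** Hypothetical helper for illustration; not part of BioJava. */
final class ResidueCoverageSketch {

    /**
     * Counts the non-null entries in each row of a 2D array.
     * Returns null when the input itself is null, mirroring the
     * "null if process() hasn't been called" convention.
     */
    static int[] countMapped(Object[][] residues) {
        if (residues == null) {
            return null; // nothing has been parsed yet
        }
        int[] counts = new int[residues.length];
        for (int i = 0; i < residues.length; i++) {
            if (residues[i] == null) {
                continue; // defensive: treat a missing row as zero mapped residues
            }
            for (Object residue : residues[i]) {
                if (residue != null) {
                    counts[i]++; // this position was matched to the structure
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up data: strings stand in for ResidueNumber objects.
        Object[][] residues = { { "A1", null, "A3" }, { null, null } };
        int[] counts = countMapped(residues);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("sequence " + i + ": " + counts[i] + " mapped");
        }
    }
}
```

Since Java arrays are covariant, a ResidueNumber[][] can be passed where Object[][] is expected, so a helper of this shape could in principle be applied to the value returned by getResidues().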
-
getAccessions
public String[] getAccessions()
Gets the protein accessions mapped from the Fasta file. Returns null if process() has not been called.
- Returns:
- An array of accessions, one for each protein in the fasta file, or null if process() hasn't been called.
-
-