Class UniprotProxySequenceReader<C extends Compound>
- java.lang.Object
- org.biojava.nbio.core.sequence.loader.UniprotProxySequenceReader<C>
- Type Parameters:
C
- All Implemented Interfaces:
Iterable<C>, DatabaseReferenceInterface, FeaturesKeyWordInterface, Accessioned, ProxySequenceReader<C>, Sequence<C>, SequenceReader<C>
public class UniprotProxySequenceReader<C extends Compound> extends Object implements ProxySequenceReader<C>, FeaturesKeyWordInterface, DatabaseReferenceInterface
Given a UniProt ID, this ProxySequenceReader, when passed to a ProteinSequence, retrieves the sequence data and the other data elements that UniProt associates with that protein. It is an example of how to map external databases of proteins and features to the BioJava3 ProteinSequence. Call setUniprotDirectoryCache before loading records so that downloaded XML files are cached and do not need to be reloaded each time. This class does not manage the cache.
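The caching idea described above can be sketched with the standard library alone. This is a minimal illustration, not BioJava's implementation: the class name XmlDirectoryCacheSketch, the "<accession>.xml" file naming, and the fetcher function that stands in for the remote download are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical helper, not part of BioJava: a read-through directory cache.
final class XmlDirectoryCacheSketch {

    private final Path cacheDir;                     // plays the role of the directory cache
    private final Function<String, String> fetcher;  // stands in for the remote download

    XmlDirectoryCacheSketch(Path cacheDir, Function<String, String> fetcher) {
        this.cacheDir = cacheDir;
        this.fetcher = fetcher;
    }

    // Returns the XML for an accession, fetching it only when no cached copy exists.
    String load(String accession) {
        // Assumed naming scheme: one "<accession>.xml" file per record.
        Path cached = cacheDir.resolve(accession + ".xml");
        try {
            if (Files.exists(cached)) {
                return new String(Files.readAllBytes(cached), StandardCharsets.UTF_8);
            }
            String xml = fetcher.apply(accession);
            Files.createDirectories(cacheDir);
            Files.write(cached, xml.getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Like the class documented here, the sketch never deletes or refreshes cached files, so stale entries have to be cleared by the caller.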
-
-
Field Summary
Fields
static String DEFAULT_UNIPROT_BASE_URL
static Pattern UP_AC_PATTERN
-
Constructor Summary
Constructors
UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet)
The UniProt ID is used to retrieve the UniProt XML, which is then parsed as a DOM object so we know everything about the protein.
UniprotProxySequenceReader(Document document, CompoundSet<C> compoundSet)
The XML is passed in as a DOM object so we know everything about the protein.
-
Method Summary
int countCompounds(C... compounds)
Returns the number of times we found a compound in the Sequence.
boolean equals(Object o)
AccessionID getAccession()
Returns the AccessionID this location is currently bound with.
ArrayList<AccessionID> getAccessions()
Pull UniProt accessions associated with this sequence.
ArrayList<String> getAliases()
Pull UniProt protein aliases associated with this sequence. Provided for backwards compatibility now that we support both gene and protein aliases via separate methods.
List<C> getAsList()
Returns the Sequence as a List of compounds.
C getCompoundAt(int position)
Returns the Compound at the given biological index.
CompoundSet<C> getCompoundSet()
Gets the compound set used to back this Sequence.
Map<String,List<DBReferenceInfo>> getDatabaseReferences()
The UniProt mappings to other database identifiers for this sequence.
ArrayList<String> getGeneAliases()
Pull UniProt gene aliases associated with this sequence.
String getGeneName()
Get the gene name associated with this sequence.
int getIndexOf(C compound)
Scans through the Sequence looking for the first occurrence of the given compound.
SequenceView<C> getInverse()
Does the right thing to get the inverse of the current Sequence.
ArrayList<String> getKeyWords()
Pull the UniProt keywords, a mixed bag of words associated with this sequence.
int getLastIndexOf(C compound)
Scans through the Sequence looking for the last occurrence of the given compound.
int getLength()
The sequence length.
String getOrganismName()
Get the organism name assigned to this sequence.
ArrayList<String> getProteinAliases()
Pull UniProt protein aliases associated with this sequence.
String getSequenceAsString()
Returns the String representation of the Sequence.
String getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand)
SequenceView<C> getSubSequence(Integer bioBegin, Integer bioEnd)
Returns a portion of the sequence from the different positions.
static String getUniprotbaseURL()
The current UniProt URL to deal with caching issues. www.uniprot.org is load balanced, but you can access pir.uniprot.org directly.
static String getUniprotDirectoryCache()
Local directory cache of XML that can be downloaded.
int hashCode()
Iterator<C> iterator()
static <C extends Compound> UniprotProxySequenceReader<C> parseUniprotXMLString(String xml, CompoundSet<C> compoundSet)
The passed-in XML is parsed as a DOM object so we know everything about the protein.
void setCompoundSet(CompoundSet<C> compoundSet)
void setContents(String sequence)
Once the sequence is retrieved, set the contents and make sure everything is valid. Some UniProt records contain white space in the sequence.
static void setUniprotbaseURL(String aUniprotbaseURL)
static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
String toString()
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
-
-
-
Field Detail
-
UP_AC_PATTERN
public static final Pattern UP_AC_PATTERN
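The value of UP_AC_PATTERN is not shown on this page. As an illustration of what a pattern for UniProt accessions can look like, the sketch below uses the accession regular expression published in the UniProt help pages; that this matches the constant's actual value is an assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative only: the accession regex published by UniProt,
// not necessarily the exact value of UP_AC_PATTERN.
final class AccessionPatternSketch {

    static final Pattern ACCESSION = Pattern.compile(
        "[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}");

    // True when the whole string has the shape of a UniProt accession.
    static boolean looksLikeAccession(String candidate) {
        return candidate != null && ACCESSION.matcher(candidate).matches();
    }
}
```

Checking the shape of an accession before requesting a record avoids a network round trip for input that cannot be valid.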
-
DEFAULT_UNIPROT_BASE_URL
public static final String DEFAULT_UNIPROT_BASE_URL
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
UniprotProxySequenceReader
public UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet) throws CompoundNotFoundException, IOException
The UniProt ID is used to retrieve the UniProt XML, which is then parsed as a DOM object so we know everything about the protein. If an error occurs, an exception is thrown; the cause could be a bad UniProt ID or a network error.
- Parameters:
accession
compoundSet
- Throws:
CompoundNotFoundException
IOException - if problems occur while reading the UniProt XML
-
UniprotProxySequenceReader
public UniprotProxySequenceReader(Document document, CompoundSet<C> compoundSet) throws CompoundNotFoundException
The XML is passed in as a DOM object so we know everything about the protein. If an error occurs, an exception is thrown; the cause could be a bad UniProt ID.
- Parameters:
document
compoundSet
- Throws:
CompoundNotFoundException
-
-
Method Detail
-
parseUniprotXMLString
public static <C extends Compound> UniprotProxySequenceReader<C> parseUniprotXMLString(String xml, CompoundSet<C> compoundSet)
The passed-in XML is parsed as a DOM object so we know everything about the protein. If an error occurs, an exception is thrown; the cause could be a bad UniProt ID.
- Parameters:
xml
compoundSet
- Returns:
UniprotProxySequenceReader
- Throws:
Exception
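Turning an XML string into a DOM object can be sketched with the JDK's own parser. This is a minimal illustration rather than the code this method runs: the class XmlStringToDomSketch and its helper firstText are hypothetical, and the tiny document in the usage note is a simplified stand-in, not the real UniProt schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of BioJava.
final class XmlStringToDomSketch {

    // Parses an XML string into a DOM Document; parser failures become unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first element with the given tag name, or null when absent.
    static String firstText(Document document, String tagName) {
        NodeList nodes = document.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```

For example, parsing `<uniprot><entry><accession>P12345</accession></entry></uniprot>` and calling `firstText(document, "accession")` yields `P12345`.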
-
setCompoundSet
public void setCompoundSet(CompoundSet<C> compoundSet)
- Specified by:
setCompoundSet in interface SequenceReader<C extends Compound>
-
setContents
public void setContents(String sequence) throws CompoundNotFoundException
Once the sequence is retrieved, set the contents and make sure everything is valid. Some UniProt records contain white space in the sequence; it must be stripped out so that setContents doesn't fail.
- Specified by:
setContents in interface SequenceReader<C extends Compound>
- Parameters:
sequence
- Throws:
CompoundNotFoundException
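The whitespace handling mentioned above can be sketched as a single regular-expression replacement. The class and method names are hypothetical, and whether the library strips whitespace in exactly this way is an assumption.

```java
// Hypothetical helper, not part of BioJava.
final class SequenceWhitespaceSketch {

    // Removes every whitespace character (spaces, tabs, line breaks) from a raw sequence.
    static String stripWhitespace(String rawSequence) {
        return rawSequence.replaceAll("\\s+", "");
    }
}
```

With this helper, a raw record such as `"MKV LA\nGT"` becomes `"MKVLAGT"` before each character is looked up in the compound set.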
-
getLength
public int getLength()
The sequence length
-
getCompoundAt
public C getCompoundAt(int position)
Description copied from interface: Sequence
Returns the Compound at the given biological index.
- Specified by:
getCompoundAt in interface Sequence<C extends Compound>
- Parameters:
position
-
getIndexOf
public int getIndexOf(C compound)
Description copied from interface: Sequence
Scans through the Sequence looking for the first occurrence of the given compound.
- Specified by:
getIndexOf in interface Sequence<C extends Compound>
- Parameters:
compound
-
getLastIndexOf
public int getLastIndexOf(C compound)
Description copied from interface: Sequence
Scans through the Sequence looking for the last occurrence of the given compound.
- Specified by:
getLastIndexOf in interface Sequence<C extends Compound>
- Parameters:
compound
-
getSequenceAsString
public String getSequenceAsString()
Description copied from interface: Sequence
Returns the String representation of the Sequence.
- Specified by:
getSequenceAsString in interface Sequence<C extends Compound>
-
getAsList
public List<C> getAsList()
Description copied from interface: Sequence
Returns the Sequence as a List of compounds
-
getInverse
public SequenceView<C> getInverse()
Description copied from interface: Sequence
Does the right thing to get the inverse of the current Sequence. This means reversing the Sequence and optionally complementing it.
- Specified by:
getInverse in interface Sequence<C extends Compound>
-
getSequenceAsString
public String getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand)
- Parameters:
bioBegin
bioEnd
strand
-
getSubSequence
public SequenceView<C> getSubSequence(Integer bioBegin, Integer bioEnd)
Description copied from interface: Sequence
Returns the portion of the sequence between the given positions. Positions are indexed from 1.
- Specified by:
getSubSequence in interface Sequence<C extends Compound>
- Parameters:
bioBegin
bioEnd
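Indexing from 1 differs from Java's zero-based strings, so a short sketch may help. It treats bioEnd as inclusive, which is an assumption not stated on this page, and it works on a plain String rather than a SequenceView; the class name is hypothetical.

```java
// Hypothetical helper, not part of BioJava.
final class BioIndexSketch {

    // Returns residues bioBegin..bioEnd, both 1-based; bioEnd is assumed inclusive.
    static String subSequence(String sequence, int bioBegin, int bioEnd) {
        if (bioBegin < 1 || bioEnd > sequence.length() || bioBegin > bioEnd) {
            throw new IndexOutOfBoundsException(
                "Invalid range " + bioBegin + ".." + bioEnd + " for length " + sequence.length());
        }
        // A 1-based inclusive range [b, e] maps to the 0-based half-open range [b - 1, e).
        return sequence.substring(bioBegin - 1, bioEnd);
    }
}
```

Under that convention, `subSequence("MKVLAGT", 1, 3)` covers the first three residues, `MKV`.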
-
getCompoundSet
public CompoundSet<C> getCompoundSet()
Description copied from interface: Sequence
Gets the compound set used to back this Sequence.
- Specified by:
getCompoundSet in interface Sequence<C extends Compound>
-
getAccession
public AccessionID getAccession()
Description copied from interface: Accessioned
Returns the AccessionID this location is currently bound with.
- Specified by:
getAccession in interface Accessioned
-
getAccessions
public ArrayList<AccessionID> getAccessions() throws XPathExpressionException
Pull UniProt accessions associated with this sequence.
- Throws:
XPathExpressionException
-
getAliases
public ArrayList<String> getAliases() throws XPathExpressionException
Pull UniProt protein aliases associated with this sequence. Provided for backwards compatibility now that we support both gene and protein aliases via separate methods.
- Throws:
XPathExpressionException
-
getProteinAliases
public ArrayList<String> getProteinAliases() throws XPathExpressionException
Pull UniProt protein aliases associated with this sequence.
- Throws:
XPathExpressionException
-
getGeneAliases
public ArrayList<String> getGeneAliases() throws XPathExpressionException
Pull UniProt gene aliases associated with this sequence.
- Throws:
XPathExpressionException
-
countCompounds
public int countCompounds(C... compounds)
Description copied from interface: Sequence
Returns the number of times we found a compound in the Sequence.
- Specified by:
countCompounds in interface Sequence<C extends Compound>
- Parameters:
compounds
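Counting with a varargs parameter can be illustrated on plain characters. The sketch assumes each position is counted once if it matches any of the given compounds, which this page does not confirm, and it uses char in place of a Compound type; the class name is hypothetical.

```java
// Hypothetical helper, not part of BioJava.
final class CountCompoundsSketch {

    // Counts positions whose residue equals any of the given compounds.
    static int countCompounds(String sequence, char... compounds) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            for (char compound : compounds) {
                if (residue == compound) {
                    count++;
                    break; // count each position once, even if a compound is listed twice
                }
            }
        }
        return count;
    }
}
```

For the sequence `MKVLAKK`, counting `'K'` gives 3 and counting `'K', 'M'` gives 4.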
-
getUniprotbaseURL
public static String getUniprotbaseURL()
The current UniProt base URL. It can be changed to deal with caching issues: www.uniprot.org is load balanced, but you can access pir.uniprot.org directly.
- Returns:
the uniprotbaseURL
-
setUniprotbaseURL
public static void setUniprotbaseURL(String aUniprotbaseURL)
- Parameters:
aUniprotbaseURL - the uniprotbaseURL to set
-
getUniprotDirectoryCache
public static String getUniprotDirectoryCache()
Local directory cache of XML that can be downloaded.
- Returns:
the uniprotDirectoryCache
-
setUniprotDirectoryCache
public static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
- Parameters:
aUniprotDirectoryCache - the uniprotDirectoryCache to set
-
getGeneName
public String getGeneName()
Get the gene name associated with this sequence.
-
getOrganismName
public String getOrganismName()
Get the organism name assigned to this sequence.
-
getKeyWords
public ArrayList<String> getKeyWords()
Pull the UniProt keywords, a mixed bag of words associated with this sequence.
- Specified by:
getKeyWords in interface FeaturesKeyWordInterface
-
getDatabaseReferences
public Map<String,List<DBReferenceInfo>> getDatabaseReferences()
The UniProt mappings to other database identifiers for this sequence.
- Specified by:
getDatabaseReferences in interface DatabaseReferenceInterface
-
-