java.lang.Object
- org.biojava.nbio.core.sequence.loader.UniprotProxySequenceReader<C>

Type Parameters:

C - the Compound type of the sequence elements

All Implemented Interfaces:

Iterable<C>, DatabaseReferenceInterface, FeaturesKeyWordInterface, Accessioned, ProxySequenceReader<C>, Sequence<C>, SequenceReader<C>
```
public class UniprotProxySequenceReader<C extends Compound>
extends Object
implements ProxySequenceReader<C>, FeaturesKeyWordInterface, DatabaseReferenceInterface
```
Pass a UniProt ID to this ProxySequenceReader and then pass the reader to a ProteinSequence; the reader retrieves the sequence data and the other data elements that UniProt associates with the protein. This is an example of how to map external databases of proteins and features to the BioJava3 ProteinSequence. It is important to call setUniprotDirectoryCache(String) so that downloaded XML files are cached and do not need to be reloaded each time. The class does not manage the cache.

Field Summary

Fields
Modifier and Type	Field	Description
`static String`	`DEFAULT_UNIPROT_BASE_URL`
`static Pattern`	`UP_AC_PATTERN`

Constructor Summary

Constructors
Constructor	Description
`UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet)`	The UniProt id is used to retrieve the UniProt XML which is then parsed as a DOM object so we know everything about the protein.
`UniprotProxySequenceReader(Document document, CompoundSet<C> compoundSet)`	The xml is passed in as a DOM object so we know everything about the protein.

Method Summary

Modifier and Type	Method	Description
`int`	`countCompounds(C... compounds)`	Returns the number of times we found a compound in the Sequence
`boolean`	`equals(Object o)`
`AccessionID`	`getAccession()`	Returns the AccessionID this location is currently bound with
`ArrayList<AccessionID>`	`getAccessions()`	Pull uniprot accessions associated with this sequence
`ArrayList<String>`	`getAliases()`	Pull the UniProt protein aliases associated with this sequence. Provided for backwards compatibility now that gene and protein aliases are available via separate methods.
`List<C>`	`getAsList()`	Returns the Sequence as a List of compounds
`C`	`getCompoundAt(int position)`	Returns the Compound at the given biological index
`CompoundSet<C>`	`getCompoundSet()`	Gets the compound set used to back this Sequence
`Map<String,List<DBReferenceInfo>>`	`getDatabaseReferences()`	The Uniprot mappings to other database identifiers for this sequence
`ArrayList<String>`	`getGeneAliases()`	Pull uniprot gene aliases associated with this sequence
`String`	`getGeneName()`	Get the gene name associated with this sequence.
`int`	`getIndexOf(C compound)`	Scans through the Sequence looking for the first occurrence of the given compound
`SequenceView<C>`	`getInverse()`	Does the right thing to get the inverse of the current Sequence.
`ArrayList<String>`	`getKeyWords()`	Pull the UniProt keywords, a mixed bag of terms associated with this sequence
`int`	`getLastIndexOf(C compound)`	Scans through the Sequence looking for the last occurrence of the given compound
`int`	`getLength()`	The sequence length
`String`	`getOrganismName()`	Get the organism name assigned to this sequence
`ArrayList<String>`	`getProteinAliases()`	Pull uniprot protein aliases associated with this sequence
`String`	`getSequenceAsString()`	Returns the String representation of the Sequence
`String`	`getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand)`
`SequenceView<C>`	`getSubSequence(Integer bioBegin, Integer bioEnd)`	Returns a portion of the sequence from the different positions.
`static String`	`getUniprotbaseURL()`	The current UniProt URL to deal with caching issues. www.uniprot.org is load balanced but you can access pir.uniprot.org directly.
`static String`	`getUniprotDirectoryCache()`	Local directory cache of XML that can be downloaded
`int`	`hashCode()`
`Iterator<C>`	`iterator()`
`static <C extends Compound> UniprotProxySequenceReader<C>`	`parseUniprotXMLString(String xml, CompoundSet<C> compoundSet)`	The passed in xml is parsed as a DOM object so we know everything about the protein.
`void`	`setCompoundSet(CompoundSet<C> compoundSet)`
`void`	`setContents(String sequence)`	Once the sequence is retrieved, set the contents and make sure everything is valid. Some UniProt records contain white space in the sequence.
`static void`	`setUniprotbaseURL(String aUniprotbaseURL)`
`static void`	`setUniprotDirectoryCache(String aUniprotDirectoryCache)`
`String`	`toString()`

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.lang.Iterable
forEach, spliterator

- Field Detail
  - UP_AC_PATTERN
```
public static final Pattern UP_AC_PATTERN
```
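    The value of this constant is not shown on this page. As a minimal sketch of what an accession pattern typically checks, the example below uses the accession-number regular expression published by UniProt (6 or 10 characters). That `UP_AC_PATTERN` encodes an equivalent rule is an assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AccessionPatternSketch {
    // UniProt's published accession-number regular expression.
    // Assumption: UP_AC_PATTERN expresses an equivalent rule; its actual value is not documented here.
    static final Pattern ACCESSION = Pattern.compile(
            "[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}");

    // Returns true only when the whole string has the shape of an accession.
    static boolean looksLikeAccession(String candidate) {
        return candidate != null && ACCESSION.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAccession("P12345"));
        System.out.println(looksLikeAccession("A0A022YWF9"));
        System.out.println(looksLikeAccession("not-an-id"));
    }
}
```

    A check like this can reject obviously malformed IDs before any download is attempted.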
  - DEFAULT_UNIPROT_BASE_URL
```
public static final String DEFAULT_UNIPROT_BASE_URL
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - UniprotProxySequenceReader
```
public UniprotProxySequenceReader(String accession,
                                  CompoundSet<C> compoundSet)
                           throws CompoundNotFoundException,
                                  IOException
```
    The UniProt ID is used to retrieve the UniProt XML, which is then parsed as a DOM object so that everything about the protein is available. An exception is thrown if an error occurs, for example a bad UniProt ID or a network error.
    
    Parameters:
    
    accession -
    
    compoundSet -
    
    Throws:
    
    CompoundNotFoundException
    
    IOException - if problems while reading the UniProt XML
  - UniprotProxySequenceReader
```
public UniprotProxySequenceReader(Document document,
                                  CompoundSet<C> compoundSet)
                           throws CompoundNotFoundException
```
    The XML is passed in as a DOM object so that everything about the protein is available. An exception is thrown if an error occurs, for example a bad UniProt ID.
    
    Parameters:
    
    document -
    
    compoundSet -
    
    Throws:
    
    CompoundNotFoundException
- Method Detail
  - parseUniprotXMLString
```
public static <C extends Compound> UniprotProxySequenceReader<C> parseUniprotXMLString(String xml,
                                                                                       CompoundSet<C> compoundSet)
```
    The passed-in XML is parsed as a DOM object so that everything about the protein is available. An exception is thrown if an error occurs, for example a bad UniProt ID.
    
    Parameters:
    
    xml -
    
    compoundSet -
    
    Returns:
    
    UniprotProxySequenceReader
    
    Throws:
    
    Exception
  - setCompoundSet
```
public void setCompoundSet(CompoundSet<C> compoundSet)
```
    Specified by:
    
    setCompoundSet in interface SequenceReader<C extends Compound>
  - setContents
```
public void setContents(String sequence)
                 throws CompoundNotFoundException
```
    Once the sequence is retrieved, set the contents and make sure everything is valid. Some UniProt records contain white space in the sequence, so it must be stripped out to keep setContents from failing.
    
    Specified by:
    
    setContents in interface SequenceReader<C extends Compound>
    
    Parameters:
    
    sequence -
    
    Throws:
    
    CompoundNotFoundException
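    The whitespace handling described for setContents can be sketched with the JDK alone. The example is an illustration under stated assumptions, not the library's code: a string of the 20 standard amino-acid letters stands in for a CompoundSet, an IllegalArgumentException stands in for CompoundNotFoundException, and all names are hypothetical.

```java
class SequenceCleanupSketch {
    // Stand-in for a CompoundSet: the 20 standard amino-acid one-letter codes.
    private static final String ALLOWED = "ACDEFGHIKLMNPQRSTVWY";

    // Strips all white space, then checks every remaining character against ALLOWED.
    static String clean(String raw) {
        String stripped = raw.replaceAll("\\s+", "");
        for (int i = 0; i < stripped.length(); i++) {
            char c = stripped.charAt(i);
            if (ALLOWED.indexOf(c) < 0) {
                // The real method reports unknown compounds with CompoundNotFoundException.
                throw new IllegalArgumentException(
                        "Unknown compound '" + c + "' at position " + (i + 1));
            }
        }
        return stripped;
    }

    public static void main(String[] args) {
        System.out.println(clean("MKT AYI\n AKQR"));
    }
}
```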
  - getLength
```
public int getLength()
```
    The sequence length
    
    Specified by:
    
    getLength in interface Sequence<C extends Compound>
    
    Returns:
  - getCompoundAt
```
public C getCompoundAt(int position)
```
    Description copied from interface: Sequence
    
    Returns the Compound at the given biological index
    
    Specified by:
    
    getCompoundAt in interface Sequence<C extends Compound>
    
    Parameters:
    
    position -
    
    Returns:
  - getIndexOf
```
public int getIndexOf(C compound)
```
    Description copied from interface: Sequence
    
    Scans through the Sequence looking for the first occurrence of the given compound
    
    Specified by:
    
    getIndexOf in interface Sequence<C extends Compound>
    
    Parameters:
    
    compound -
    
    Returns:
  - getLastIndexOf
```
public int getLastIndexOf(C compound)
```
    Description copied from interface: Sequence
    
    Scans through the Sequence looking for the last occurrence of the given compound
    
    Specified by:
    
    getLastIndexOf in interface Sequence<C extends Compound>
    
    Parameters:
    
    compound -
    
    Returns:
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
    
    Returns:
  - getSequenceAsString
```
public String getSequenceAsString()
```
    Description copied from interface: Sequence
    
    Returns the String representation of the Sequence
    
    Specified by:
    
    getSequenceAsString in interface Sequence<C extends Compound>
    
    Returns:
  - getAsList
```
public List<C> getAsList()
```
    Description copied from interface: Sequence
    
    Returns the Sequence as a List of compounds
    
    Specified by:
    
    getAsList in interface Sequence<C extends Compound>
    
    Returns:
  - equals
```
public boolean equals(Object o)
```
    Overrides:
    
    equals in class Object
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class Object
  - getInverse
```
public SequenceView<C> getInverse()
```
    Description copied from interface: Sequence
    
    Does the right thing to get the inverse of the current Sequence. This means reversing the Sequence and optionally complementing it.
    
    Specified by:
    
    getInverse in interface Sequence<C extends Compound>
    
    Returns:
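    To illustrate "reversing and optionally complementing", the sketch below works on plain strings rather than on Sequence objects. It assumes that complementing only makes sense for nucleotide sequences, so a protein sequence would be reversed without complementing; the class and method names are hypothetical.

```java
class InverseSketch {
    // Reverses the sequence; when complement is true, also swaps A<->T and C<->G.
    static String inverse(String sequence, boolean complement) {
        StringBuilder out = new StringBuilder(sequence).reverse();
        if (complement) {
            for (int i = 0; i < out.length(); i++) {
                out.setCharAt(i, complementOf(out.charAt(i)));
            }
        }
        return out.toString();
    }

    // DNA complement of a single base; any other character is returned unchanged.
    static char complementOf(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return base;
        }
    }

    public static void main(String[] args) {
        System.out.println(inverse("MKTA", false)); // protein: reverse only
        System.out.println(inverse("ATGC", true));  // DNA: reverse complement
    }
}
```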
  - getSequenceAsString
```
public String getSequenceAsString(Integer bioBegin,
                                  Integer bioEnd,
                                  Strand strand)
```
    Parameters:
    
    bioBegin -
    
    bioEnd -
    
    strand -
    
    Returns:
  - getSubSequence
```
public SequenceView<C> getSubSequence(Integer bioBegin,
                                      Integer bioEnd)
```
    Description copied from interface: Sequence
    
    Returns a portion of the sequence between the given positions. Positions are indexed from 1.
    
    Specified by:
    
    getSubSequence in interface Sequence<C extends Compound>
    
    Parameters:
    
    bioBegin -
    
    bioEnd -
    
    Returns:
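    Because biological positions start at 1 while Java strings start at 0, off-by-one mistakes are easy to make. The sketch below shows the conversion on a plain string. It assumes that both bioBegin and bioEnd are inclusive, which this page does not state, and the names are hypothetical.

```java
class BioIndexSketch {
    // Biological position 1 corresponds to Java index 0.
    static char compoundAt(String sequence, int bioPosition) {
        return sequence.charAt(bioPosition - 1);
    }

    // Assumption: bioBegin and bioEnd are both inclusive, 1-based positions.
    static String subSequence(String sequence, int bioBegin, int bioEnd) {
        return sequence.substring(bioBegin - 1, bioEnd);
    }

    public static void main(String[] args) {
        String sequence = "MKTAYI";
        System.out.println(compoundAt(sequence, 1));
        System.out.println(subSequence(sequence, 2, 4));
    }
}
```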
  - iterator
```
public Iterator<C> iterator()
```
    Specified by:
    
    iterator in interface Iterable<C extends Compound>
    
    Returns:
  - getCompoundSet
```
public CompoundSet<C> getCompoundSet()
```
    Description copied from interface: Sequence
    
    Gets the compound set used to back this Sequence
    
    Specified by:
    
    getCompoundSet in interface Sequence<C extends Compound>
    
    Returns:
  - getAccession
```
public AccessionID getAccession()
```
    Description copied from interface: Accessioned
    
    Returns the AccessionID this location is currently bound with
    
    Specified by:
    
    getAccession in interface Accessioned
    
    Returns:
  - getAccessions
```
public ArrayList<AccessionID> getAccessions()
                                     throws XPathExpressionException
```
    Pull uniprot accessions associated with this sequence
    
    Returns:
    
    Throws:
    
    XPathExpressionException
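    The XPathExpressionException in the signature suggests that accessions are read from the parsed DOM with an XPath query. The sketch below shows that general technique with the JDK's XML APIs only. The XML fragment is a simplified, hand-written stand-in (real UniProt XML declares a namespace), the accession values are illustrative, and the XPath expression and names are assumptions rather than the library's own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AccessionXPathSketch {
    // Simplified stand-in for a UniProt entry; the namespace is left out on purpose.
    static final String SAMPLE_XML =
            "<uniprot><entry>"
            + "<accession>P12345</accession>"
            + "<accession>Q9XYZ1</accession>"
            + "<name>EXAMPLE_HUMAN</name>"
            + "</entry></uniprot>";

    // Parses the XML into a DOM and collects the text of every accession element.
    static List<String> accessions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/uniprot/entry/accession", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read accessions from XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accessions(SAMPLE_XML));
    }
}
```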
  - getAliases
```
public ArrayList<String> getAliases()
                             throws XPathExpressionException
```
    Pull the UniProt protein aliases associated with this sequence. Provided for backwards compatibility now that gene and protein aliases are available via separate methods.
    
    Returns:
    
    Throws:
    
    XPathExpressionException
  - getProteinAliases
```
public ArrayList<String> getProteinAliases()
                                    throws XPathExpressionException
```
    Pull uniprot protein aliases associated with this sequence
    
    Returns:
    
    Throws:
    
    XPathExpressionException
  - getGeneAliases
```
public ArrayList<String> getGeneAliases()
                                 throws XPathExpressionException
```
    Pull uniprot gene aliases associated with this sequence
    
    Returns:
    
    Throws:
    
    XPathExpressionException
  - countCompounds
```
public int countCompounds(C... compounds)
```
    Description copied from interface: Sequence
    
    Returns the number of times we found a compound in the Sequence
    
    Specified by:
    
    countCompounds in interface Sequence<C extends Compound>
    
    Parameters:
    
    compounds -
    
    Returns:
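    As an illustration of counting with a varargs parameter, the sketch below counts how many positions of a plain string hold any of the given characters. It assumes that the result is the combined count over all compounds passed in; the names are hypothetical.

```java
class CountCompoundsSketch {
    // Counts the positions whose character equals any of the requested compounds.
    static int countCompounds(String sequence, char... compounds) {
        int count = 0;
        for (char c : sequence.toCharArray()) {
            for (char wanted : compounds) {
                if (c == wanted) {
                    count++;
                    break; // count each position at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countCompounds("MKTAYIAKQR", 'A', 'K'));
    }
}
```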
  - getUniprotbaseURL
```
public static String getUniprotbaseURL()
```
    The current UniProt URL to deal with caching issues. www.uniprot.org is load balanced but you can access pir.uniprot.org directly.
    
    Returns:
    
    the uniprotbaseURL
  - setUniprotbaseURL
```
public static void setUniprotbaseURL(String aUniprotbaseURL)
```
    Parameters:
    
    aUniprotbaseURL - the uniprotbaseURL to set
  - getUniprotDirectoryCache
```
public static String getUniprotDirectoryCache()
```
    Local directory cache of XML that can be downloaded
    
    Returns:
    
    the uniprotDirectoryCache
  - setUniprotDirectoryCache
```
public static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
```
    Parameters:
    
    aUniprotDirectoryCache - the uniprotDirectoryCache to set
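    The idea behind a directory cache can be sketched as "read the local file if it exists, otherwise fetch and save it". The example below is a JDK-only illustration, not the library's implementation: the `<accession>.xml` file naming, the fetcher function, and all class and method names are assumptions, and like the class itself the sketch never expires or cleans up cached files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class XmlDirectoryCacheSketch {
    // Returns the cached XML when present; otherwise calls the fetcher and stores the result.
    static String load(Path cacheDir, String accession, Function<String, String> fetcher) {
        try {
            Path cached = cacheDir.resolve(accession + ".xml"); // hypothetical naming scheme
            if (Files.exists(cached)) {
                return new String(Files.readAllBytes(cached), StandardCharsets.UTF_8);
            }
            String xml = fetcher.apply(accession);
            Files.createDirectories(cacheDir);
            Files.write(cached, xml.getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the same accession twice and reports how often the fetcher actually ran.
    static int fetchCountForTwoLoads() {
        try {
            Path dir = Files.createTempDirectory("uniprot-cache-sketch");
            AtomicInteger fetches = new AtomicInteger();
            Function<String, String> fetcher = id -> {
                fetches.incrementAndGet(); // a real fetcher would download the XML here
                return "<uniprot/>";
            };
            load(dir, "P12345", fetcher);
            load(dir, "P12345", fetcher);
            return fetches.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fetches: " + fetchCountForTwoLoads());
    }
}
```

    Passing the fetcher in as a function keeps the sketch free of network access and makes the cache hit easy to observe.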
  - getGeneName
```
public String getGeneName()
```
    Get the gene name associated with this sequence.
    
    Returns:
  - getOrganismName
```
public String getOrganismName()
```
    Get the organism name assigned to this sequence
    
    Returns:
  - getKeyWords
```
public ArrayList<String> getKeyWords()
```
    Pull the UniProt keywords, a mixed bag of terms associated with this sequence
    
    Specified by:
    
    getKeyWords in interface FeaturesKeyWordInterface
    
    Returns:
  - getDatabaseReferences
```
public Map<String,List<DBReferenceInfo>> getDatabaseReferences()
```
    The Uniprot mappings to other database identifiers for this sequence
    
    Specified by:
    
    getDatabaseReferences in interface DatabaseReferenceInterface
    
    Returns:
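    The return type maps a key to a list of references. The sketch below shows one way such a map can be built, assuming the key is the external database name and using plain ID strings in place of DBReferenceInfo objects. The database names and IDs are illustrative and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DatabaseReferenceSketch {
    // Each input row is {databaseName, id}; rows are grouped by database name in encounter order.
    static Map<String, List<String>> groupByDatabase(String[][] references) {
        Map<String, List<String>> byDatabase = new LinkedHashMap<>();
        for (String[] reference : references) {
            byDatabase.computeIfAbsent(reference[0], key -> new ArrayList<>()).add(reference[1]);
        }
        return byDatabase;
    }

    public static void main(String[] args) {
        String[][] references = {
            {"PDB", "1ABC"}, {"Pfam", "PF00001"}, {"PDB", "2XYZ"}
        };
        System.out.println(groupByDatabase(references));
    }
}
```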
