Class UkkonenSuffixTree
- java.lang.Object
-
- org.biojava.bio.symbol.UkkonenSuffixTree
-
public class UkkonenSuffixTree extends Object
A suffix tree is an efficient method for encoding the frequencies of motifs in a sequence. They are sometimes used to quickly screen for similar sequences. For instance, all motifs of length up to 2 in the sequence
AAGT
could be encoded as:root(4) | A(2)--------G(1)-----T(1) | | A(1)--G(1) T(1)
It supports addition of elements both as String and SymbolList. They should not be mixed together. The strings are also terminated internally, so it is possible to go and add more than one string to the suffix tree.
Some more work need to be done on how data should be generated from this class. If you need something that's not in there, please e-mail the list at biojava-dev@biojava.org and I'll add it in there. <\p>
- Version:
- 1.3
- Author:
- Francois Pepin
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
UkkonenSuffixTree.SuffixNode
end Tree modification methods
-
Field Summary
Fields Modifier and Type Field Description static char
DEFAULT_TERM_CHAR
static int
TO_A_LEAF
-
Constructor Summary
Constructors Constructor Description UkkonenSuffixTree()
Initializes a newUkkonenSuffixTree
instance.UkkonenSuffixTree(String seqs)
UkkonenSuffixTree(FiniteAlphabet alpha)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
addSequence(String seq, String name, boolean doNotTerminate)
Add a sequence into the tree.void
addSymbolList(SymbolList list, String name, boolean doNotTerminate)
protected ArrayList
getAllNodes(UkkonenSuffixTree.SuffixNode root, ArrayList list, boolean leavesOnly)
protected CharSequence
getEdgeLabel(UkkonenSuffixTree.SuffixNode child)
protected int
getEdgeLength(UkkonenSuffixTree.SuffixNode child)
Tree navigation methodsprotected CharSequence
getLabel(UkkonenSuffixTree.SuffixNode node)
protected int
getPathEnd(UkkonenSuffixTree.SuffixNode node)
protected int
getPathLength(UkkonenSuffixTree.SuffixNode node)
UkkonenSuffixTree.SuffixNode
getRoot()
UkkonenSuffixTree.SuffixNode
jumpTo(UkkonenSuffixTree.SuffixNode starting, CharSequence source, int from, int to)
Just like walkTo, but faster when used during tree construction, as it assumes that a mismatch can only occurs with the last character of the target string.void
printTree()
boolean
subStringExists(String str)
UkkonenSuffixTree.SuffixNode
walkTo(UkkonenSuffixTree.SuffixNode starting, String source, int from, int to)
This method is used to walk down the tree, from a given node.
-
-
-
Field Detail
-
DEFAULT_TERM_CHAR
public static final char DEFAULT_TERM_CHAR
- See Also:
- Constant Field Values
-
TO_A_LEAF
public static final int TO_A_LEAF
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
UkkonenSuffixTree
public UkkonenSuffixTree()
Initializes a newUkkonenSuffixTree
instance.
-
UkkonenSuffixTree
public UkkonenSuffixTree(String seqs)
-
UkkonenSuffixTree
public UkkonenSuffixTree(FiniteAlphabet alpha)
-
-
Method Detail
-
addSymbolList
public void addSymbolList(SymbolList list, String name, boolean doNotTerminate) throws IllegalSymbolException
- Throws:
IllegalSymbolException
-
addSequence
public void addSequence(String seq, String name, boolean doNotTerminate)
Add a sequence into the tree. If there are more sequences, they should be separated by a terminationChar ($ by default). If none exist, it is assumed that there is only 1 continuous sequence to be added.- Parameters:
seq
- the sequence/sequences to be added into the tree.doNotTerminate
- whether we should terminate the sequence if it's non-terminated.
-
walkTo
public UkkonenSuffixTree.SuffixNode walkTo(UkkonenSuffixTree.SuffixNode starting, String source, int from, int to)
This method is used to walk down the tree, from a given node. Therule
variable can be used to check where the walk stopped. Note that rule 3 means that the string used to walk down the tree does not match (which is a bit different from the construction where rule 3 implies that only the last character doesn't match.The String is encoded as a substring of a given source. This is done to avoid replicating the string. To send walk down the string
x
from the root, one would call walkTo(root,x,0,x.length()).- Parameters:
starting
- the root of the subtree we're walking down form.source
- a superstring that contains the string we're using to walking down. source.subtring(from,to) should give the string we're walking down from.from
- the start position (inclusive) of the target string in the source.to
- the end position (exclusive) of the target string in the node.- Returns:
- a
SuffixNode
that the walk stopped at. If the walk stopped inside an edge. (check the rule variable to see where it stopped).
-
jumpTo
public UkkonenSuffixTree.SuffixNode jumpTo(UkkonenSuffixTree.SuffixNode starting, CharSequence source, int from, int to)
Just like walkTo, but faster when used during tree construction, as it assumes that a mismatch can only occurs with the last character of the target string.- Parameters:
starting
- the root of the subtree we're walking down form.source
- a superstring that contains the string we're using to walking down. source.subtring(from,to) should give the string we're walking down from.from
- the start position (inclusive) of the target string in the source.to
- the end position (exclusive) of the target string in the node.- Returns:
- a
SuffixNode
that the walk stopped at. If the walk stopped inside an edge. (check the rule variable to see where it stopped).
-
getEdgeLength
protected int getEdgeLength(UkkonenSuffixTree.SuffixNode child)
Tree navigation methods
-
getEdgeLabel
protected CharSequence getEdgeLabel(UkkonenSuffixTree.SuffixNode child)
-
getPathLength
protected int getPathLength(UkkonenSuffixTree.SuffixNode node)
-
getPathEnd
protected int getPathEnd(UkkonenSuffixTree.SuffixNode node)
-
getLabel
protected CharSequence getLabel(UkkonenSuffixTree.SuffixNode node)
-
getAllNodes
protected ArrayList getAllNodes(UkkonenSuffixTree.SuffixNode root, ArrayList list, boolean leavesOnly)
-
printTree
public void printTree()
-
getRoot
public UkkonenSuffixTree.SuffixNode getRoot()
-
subStringExists
public boolean subStringExists(String str)
-
-