public class UkkonenSuffixTree extends Object
A suffix tree is an efficient method for encoding the frequencies
of motifs in a sequence. They are sometimes used to quickly screen
for similar sequences. For instance, all motifs of length up to
2 in the sequence AAGT
could be encoded as:
root(4) | A(2)--------G(1)-----T(1) | | A(1)--G(1) T(1)
It supports addition of elements both as String and SymbolList. They should not be mixed together. The strings are also terminated internally, so it is possible to go and add more than one string to the suffix tree.
Some more work need to be done on how data should be generated from this class. If you need something that's not in there, please e-mail the list at biojava-dev@biojava.org and I'll add it in there. <\p>
Modifier and Type | Class and Description |
---|---|
static class |
UkkonenSuffixTree.SuffixNode
end Tree modification methods
|
Modifier and Type | Field and Description |
---|---|
static char |
DEFAULT_TERM_CHAR |
static int |
TO_A_LEAF |
Constructor and Description |
---|
UkkonenSuffixTree()
Initializes a new
UkkonenSuffixTree instance. |
UkkonenSuffixTree(FiniteAlphabet alpha) |
UkkonenSuffixTree(String seqs) |
Modifier and Type | Method and Description |
---|---|
void |
addSequence(String seq,
String name,
boolean doNotTerminate)
Add a sequence into the tree.
|
void |
addSymbolList(SymbolList list,
String name,
boolean doNotTerminate) |
protected ArrayList |
getAllNodes(UkkonenSuffixTree.SuffixNode root,
ArrayList list,
boolean leavesOnly) |
protected CharSequence |
getEdgeLabel(UkkonenSuffixTree.SuffixNode child) |
protected int |
getEdgeLength(UkkonenSuffixTree.SuffixNode child)
Tree navigation methods
|
protected CharSequence |
getLabel(UkkonenSuffixTree.SuffixNode node) |
protected int |
getPathEnd(UkkonenSuffixTree.SuffixNode node) |
protected int |
getPathLength(UkkonenSuffixTree.SuffixNode node) |
UkkonenSuffixTree.SuffixNode |
getRoot() |
UkkonenSuffixTree.SuffixNode |
jumpTo(UkkonenSuffixTree.SuffixNode starting,
CharSequence source,
int from,
int to)
Just like walkTo, but faster when used during tree construction, as it
assumes that a mismatch can only occurs with the last character of the
target string.
|
void |
printTree() |
boolean |
subStringExists(String str) |
UkkonenSuffixTree.SuffixNode |
walkTo(UkkonenSuffixTree.SuffixNode starting,
String source,
int from,
int to)
This method is used to walk down the tree, from a given node.
|
public static final char DEFAULT_TERM_CHAR
public static final int TO_A_LEAF
public UkkonenSuffixTree()
UkkonenSuffixTree
instance.public UkkonenSuffixTree(String seqs)
public UkkonenSuffixTree(FiniteAlphabet alpha)
public void addSymbolList(SymbolList list, String name, boolean doNotTerminate) throws IllegalSymbolException
IllegalSymbolException
public void addSequence(String seq, String name, boolean doNotTerminate)
seq
- the sequence/sequences to be added into the tree.doNotTerminate
- whether we should terminate the sequence if it's non-terminated.public UkkonenSuffixTree.SuffixNode walkTo(UkkonenSuffixTree.SuffixNode starting, String source, int from, int to)
rule
variable can be used to check where the walk
stopped. Note that rule 3 means that the string used to walk down the
tree does not match (which is a bit different from the construction
where rule 3 implies that only the last character doesn't match.
The String is encoded as a substring of a given source. This is done to
avoid replicating the string. To send walk down the string
x
from the root, one would call walkTo(root,x,0,x.length()).
starting
- the root of the subtree we're walking down form.source
- a superstring that contains the string we're using to
walking down. source.subtring(from,to) should give the string we're
walking down from.from
- the start position (inclusive) of the target string in the
source.to
- the end position (exclusive) of the target string in the node.SuffixNode
that the walk stopped at. If the walk
stopped inside an edge. (check the rule variable to see where it stopped).public UkkonenSuffixTree.SuffixNode jumpTo(UkkonenSuffixTree.SuffixNode starting, CharSequence source, int from, int to)
starting
- the root of the subtree we're walking down form.source
- a superstring that contains the string we're using to
walking down. source.subtring(from,to) should give the string we're
walking down from.from
- the start position (inclusive) of the target string in the
source.to
- the end position (exclusive) of the target string in the node.SuffixNode
that the walk stopped at. If the walk
stopped inside an edge. (check the rule variable to see where it
stopped).protected int getEdgeLength(UkkonenSuffixTree.SuffixNode child)
protected CharSequence getEdgeLabel(UkkonenSuffixTree.SuffixNode child)
protected int getPathLength(UkkonenSuffixTree.SuffixNode node)
protected int getPathEnd(UkkonenSuffixTree.SuffixNode node)
protected CharSequence getLabel(UkkonenSuffixTree.SuffixNode node)
protected ArrayList getAllNodes(UkkonenSuffixTree.SuffixNode root, ArrayList list, boolean leavesOnly)
public void printTree()
public UkkonenSuffixTree.SuffixNode getRoot()
public boolean subStringExists(String str)
Copyright © 2020 BioJava. All rights reserved.