Package org.biojava.bio.search
Class KnuthMorrisPrattSearch
- java.lang.Object
-
- org.biojava.bio.search.KnuthMorrisPrattSearch
-
public final class KnuthMorrisPrattSearch extends Object
An object to find exact subsequences within a sequence.Reference: KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1):323-350. USAGE:
When the object is constructed thefindMatches()
method would be called. This will return an int[] giving the offsets (ie the location of the first symbol of each match in the text). ThegetKMPNextTable()
returns the table of border lengths used in the algorithm. This method is protected as it is unlikely it will be needed except for debugging.The algorithm finds exact matches therefore ambiguity symbols will match only themselves. The class cannot perform regular expressions. The class operates on all alphabets thus if searching for a DNA pattern you should compile both the pattern and its reverse complement.
WARNING the behaviour of a pattern containing gaps is undefined. It's not recommended that you try it.
Copyright: Copyright (c) 2002
Company: AgResearch
- Version:
- 1.0
- Author:
- Mark Schreiber
-
-
Constructor Summary
Constructors Constructor Description KnuthMorrisPrattSearch(SymbolList pattern)
Constructs a KMP matcher to find exact occurances ofpattern
intext
using the Knuth-Morris-Pratt algorithm.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description int[]
findMatches(SymbolList text)
This will return an int[] giving the offsets of the matches intext
(ie the location of the first symbol of each match in thetext
).protected int[]
getKmpNextTable()
Returns the table of border lengthsSymbolList
getPattern()
static void
main(String[] args)
Demo and Test method
-
-
-
Constructor Detail
-
KnuthMorrisPrattSearch
public KnuthMorrisPrattSearch(SymbolList pattern)
Constructs a KMP matcher to find exact occurances ofpattern
intext
using the Knuth-Morris-Pratt algorithm.A new class should be constructed for each occurance of
pattern
.- Parameters:
pattern
- the pattern to search for
-
-
Method Detail
-
findMatches
public int[] findMatches(SymbolList text)
This will return an int[] giving the offsets of the matches intext
(ie the location of the first symbol of each match in thetext
).- Parameters:
text
- the text or sequence to search for the pattern- Returns:
- an int[] giving the offsets to the matches or a zero length array if there are no matches.
-
getKmpNextTable
protected int[] getKmpNextTable()
Returns the table of border lengths- Returns:
- an int[] of border lenghts
-
getPattern
public SymbolList getPattern()
- Returns:
- the pattern being searched for
-
-