Package org.biojava.bio.dp

HMM and Dynamic Programming Algorithms.

This package deals with dynamic programming. It uses the same notions of sequences, alphabets and alignments as org.biojava.bio.seq, and extends them to incorporate HMMs, HMM states and state paths. As far as possible, the implementation detail is hidden, so that these objects can be used as black boxes. Alternatively, you can implement your own efficient representations of states and dynamic programming algorithms.

HMMs are defined by a finite set of states and a finite set of transitions. The states are encapsulated as subinterfaces of Symbol, so that we can re-use alphabets and SymbolList to store legal states and sequences of states. States that emit residues must implement EmissionState; each such state defines a probability distribution over an alphabet. Other states may contain entire HMMs, or be non-emitting states that make the model easier to wire. An HMM contains an alphabet of states and a set of transitions with scores. In effect, an HMM is a directed weighted graph in which the nodes are the states and the arcs are the transitions.
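The graph view of an HMM can be illustrated with a minimal, self-contained sketch. It deliberately does not use the classes in this package: ToyHmm and all of its method names are hypothetical and exist only for illustration. Emitting states carry a distribution over symbols, silent states carry none, and transitions are weighted arcs between state names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only; ToyHmm is a hypothetical class, not part of org.biojava.bio.dp. */
class ToyHmm {
    // Emitting states: state name -> (symbol -> emission probability).
    final Map<String, Map<Character, Double>> emissions = new LinkedHashMap<>();
    // Weighted arcs of the graph: from-state -> (to-state -> transition probability).
    final Map<String, Map<String, Double>> transitions = new LinkedHashMap<>();

    /** Adds a state that emits symbols according to the given distribution. */
    void addEmittingState(String name, Map<Character, Double> distribution) {
        emissions.put(name, distribution);
        transitions.putIfAbsent(name, new LinkedHashMap<>());
    }

    /** Adds a non-emitting state, useful purely for wiring the model. */
    void addSilentState(String name) {
        transitions.putIfAbsent(name, new LinkedHashMap<>());
    }

    /** Adds a directed, weighted arc between two states. */
    void addTransition(String from, String to, double probability) {
        transitions.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, probability);
    }

    /** True if the named state emits symbols. */
    boolean isEmitting(String name) {
        return emissions.containsKey(name);
    }

    /** Sum of the weights on all arcs leaving a state; 1.0 for a normalised model. */
    double outgoingTotal(String from) {
        double total = 0.0;
        for (double p : transitions.getOrDefault(from, Map.of()).values()) {
            total += p;
        }
        return total;
    }
}
```

In this sketch a state is just a name; the package itself represents states as symbols so that they can be stored in alphabets and symbol lists.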

A simple HMM can be aligned to a single sequence at a time. This effectively finds the most likely way that the HMM could have emitted that sequence. More complex algorithms may align more than one sequence to a model simultaneously. For example, Smith-Waterman can be expressed as a three-state model that aligns two sequences to each other and to the model. These more complex models can still be represented as producing a single sequence, but in this case the sequence is an alignment of the two input sequences against one another (including gap characters where appropriate).
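Finding the most likely way an HMM could have emitted a single sequence is usually done with the Viterbi algorithm. The following is a minimal sketch of that algorithm in log space, assuming states and symbols are identified by integer indices. ToyViterbi and its parameter names are hypothetical; this is not the implementation used by this package.

```java
/** Toy Viterbi over integer-indexed states and symbols; not the org.biojava.bio.dp API. */
class ToyViterbi {
    /**
     * Returns the most likely state path for the observed symbols.
     * start[s] is the probability of starting in state s, trans[s][t] the
     * probability of moving from s to t, and emit[s][o] the probability that
     * state s emits symbol o. obs holds the observed symbol indices.
     */
    static int[] viterbi(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int n = start.length;
        int len = obs.length;
        if (len == 0) {
            return new int[0];
        }
        double[][] score = new double[len][n]; // best log-probability ending in each state
        int[][] back = new int[len][n];        // predecessor that achieved that score

        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int i = 1; i < len; i++) {
            for (int t = 0; t < n; t++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int s = 0; s < n; s++) {
                    double candidate = score[i - 1][s] + Math.log(trans[s][t]);
                    if (candidate > bestScore) {
                        bestScore = candidate;
                        best = s;
                    }
                }
                score[i][t] = bestScore + Math.log(emit[t][obs[i]]);
                back[i][t] = best;
            }
        }

        // Pick the best final state, then follow the back-pointers to recover the path.
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[len - 1][s] > score[len - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[len];
        path[len - 1] = last;
        for (int i = len - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }
}
```

Working with log-probabilities avoids numerical underflow on long sequences, and a zero probability simply becomes negative infinity, so impossible paths are never selected over possible ones. The sketch handles a single sequence; aligning two sequences to a model needs a two-dimensional version of the same recurrence.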