MSA skype 20100721

Back to


Mark, Scooter, Andreas


Mark sent out this agenda for this meeting agenda:


Profile-Profile Alignment

Multiple Sequence Alignment

Cookbook pages

  • <BioJava:CookBook3:PSA>
  • <BioJava:CookBook3:MSA>

Alignments class access levels

  • public API
    • getAllPairsAlignments
    • getMultipleSequenceAlignment
    • getPairwiseAlignment
  • default access: useful methods in alternative alignment routines
    • getAllPairsAligners, getAllPairsScorers, getAllPairsScores
    • getPairwiseScore, getProfileProfileAlignment, getProgressiveAlignment
    • runPairwiseAligners, runPairwiseScorers, runProfileAligners
  • private: internals, give default access?
    • getListFromFutures
    • getPairwiseAligner, getPairwiseScorer, getProfileProfileAligner x 4


MSA emulation: easy way to set defaults, additional options customize the stages

  • Stage 1: Pairwise scoring
    • Identical in alignment: CLUSTALW, CLUSTALW2 (done)
    • Ktuples: CLUSTAL, MUSCLE
    • Wu-Manber ‘Fuzzy’ Ktuples: KALIGN
  • Stage 2: Guide tree clustering
    • NJ: CLUSTALW, CLUSTALW2 (done)
  • Stage 3: Progressive profile-profile alignments
    • Profile NW: KALIGN (done)
    • Consensus NW: CLUSTAL
    • Variable Gap Penalty NW: MUSCLE
    • Variable Gap Penalty MM: CLUSTALW
    • Refinement each step: CLUSTALW2
  • Stage 4: Refinement
    • None: CLUSTAL, CLUSTALW, KALIGN (done)
    • Rescore pairs in MSA: MUSCLE
    • Tree partitioning: MUSCLE
    • Single partitioning: CLUSTALW2


  • Sequence exceptions
    • wrap ProxySequenceReader exceptions in IllegalStateException
    • add defensive programming method(s) to ProxySequenceReader interface

Meeting Notes

We are back after a long conference induced skype call break…

Mark reports: finished basic implementation of profile-profile alignment added examples to wiki.

Discussion: clustalw emulation, discussions of implications of this emulations. We can’t guarantee that the results will be identical and clustalw’s license is kind of restrictive. Also the code is more an implementation of the principles behind, rather than a re-implementation. Suggested renaming: Name the parameters according to what it is. e.g: identities, ktuples, Wu-Manber ktuples.

Discussion of concurrency tools: At the present flag if use all CPUs or only one. Would be nice to have parameters to fine tune this. E.g use X CPUs, leave X CPUs available, use X percentage of CPUs.

memory consumption? We did not test memory consumtptions yet. We should take a large Pfam familiy and try to align to get a better feeling for that. Alternative: dengue virus: use 3000 residue long several thousand sequences

GSoC meeting in october: Q: who is going ANdreas will check out dates and see who from OBF is attending.

Priorities for next week:

Variable gap penalty extendable linear space version (timewise neighbour joining might be crux) spacewise the alignments are the space limitations

Refinement stage

There are 2 different ideaas how to do that: re-run progressive alignment or split MSA< then re-align it. Clustalw took of indiv. sequences, re-align to profile Muscle follows guide tree, splits sequ. off, splits profiles

Q: Benchmarking? Benchmarking for alignment quality. We will look if there are any simple to use Benchmarks, so we don;t have to write too much code for that.

Exception handling Short discussion about exception handling but no real conclusion

After meeting: Kyle about benchmarking: One approach may be to target large families in Pfam, like Snoal or Susd, re-align them, and show that our results are more accurate then what was previously published given the structures that exist for those families