BioJava:PhyloSOC07
This page collects all information and documentation about our work in the 2007 Google Summer of Code as part of the NESCent phyloinformatics group.
<APIs for BioJava: Project Plan>
Week 0 (~May 20th): Building the project plan, setting up the programs (Java, Eclipse, BioJava, JGraphT), reading the NEXUS paper, etc.
Part I: Development of basic I/O
Week 1 (May 21st ~ May 27th): Development of basic input
- Input: nucleic acid sequences (practice with the FASTA format and create an API for the NEXUS format)
- Initialization: create an object for each sequence
Day 1: Practice w/ FASTA parser -done
Day 2: Getting to know the NEXUS parser (1) (read and parse the TAXA and CHARACTERS blocks) -done
Day 3: Getting to know the NEXUS parser (2) (TREES block) -done
Day 4: Tree-building practice w/ JGraphT (http://www.jgrapht.org/javadoc/) -done
Day 5/6: Extend the NEXUS parser (parse a TREES block and build the tree with JGraphT) -done
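For orientation, the MATRIX command of a NEXUS CHARACTERS block lists one taxon name followed by its sequence per row and ends with a semicolon. A minimal sketch of extracting those rows, assuming simple unquoted taxon names and no NEXUS comments; `MatrixSketch.parseMatrix` is a hypothetical helper, not BioJava's NEXUS parser API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of BioJava. Assumes unquoted names and no [comments].
class MatrixSketch {
    // Returns taxon name -> sequence, in the order the rows appear in the MATRIX command.
    static Map<String, String> parseMatrix(String nexus) {
        Map<String, String> rows = new LinkedHashMap<>();
        int start = nexus.toUpperCase().indexOf("MATRIX");
        if (start < 0) return rows;
        int end = nexus.indexOf(';', start);           // the MATRIX command ends at ';'
        String body = nexus.substring(start + "MATRIX".length(),
                end < 0 ? nexus.length() : end);
        for (String line : body.split("\\R")) {
            String[] parts = line.trim().split("\\s+", 2); // name, then the sequence
            if (parts.length < 2) continue;                // skip blank lines
            String seq = parts[1].replaceAll("\\s+", "");
            // interleaved matrices repeat a name; its pieces are concatenated
            rows.merge(parts[0], seq, String::concat);
        }
        return rows;
    }
}
```

Each parsed sequence could then be wrapped in whatever per-sequence object the initialization step creates.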
Week 2 (May 28th ~ June 3rd): Development of basic output
- Output file creation in NEXUS format (converting a tree object into NEXUS format)
Day 1 & 2: Finish the NexusToJgraphT code
Day 3: NEXUS tree code for output (1) (create a JGraphT object & convert it to a NEXUS tree object) -done (method: AddTree)
Day 4: NEXUS tree code for output (2) (generating an output string) -done (test file: SampleAddTree.java)
Day 5: NEXUS tree code for output (3) (debugging) -done
Day 6: Documentation -done (getTree, addTree)
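In a NEXUS TREES block each tree is written as a Newick string, e.g. `TREE t1 = ((A,B),C);`. A minimal sketch of generating such an output string from a tree, assuming a plain node class in place of the JGraphT graph; `NewickNode` and `NewickWriter` are hypothetical names, not the AddTree implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node; stands in for a JGraphT vertex in this sketch.
class NewickNode {
    final String label;                              // taxon name, or null for internal nodes
    final List<NewickNode> children = new ArrayList<>();
    final List<Double> lengths = new ArrayList<>();  // branch length to each child (null = none)

    NewickNode(String label) { this.label = label; }

    // Adds a child with an optional branch length; returns this node for chaining.
    NewickNode add(NewickNode child, Double length) {
        children.add(child);
        lengths.add(length);
        return this;
    }
}

class NewickWriter {
    // Returns the tree rooted at 'root' as a Newick string terminated by ';'.
    static String toNewick(NewickNode root) {
        StringBuilder sb = new StringBuilder();
        write(root, sb);
        return sb.append(';').toString();
    }

    // Recursive writer: "(child:len,child:len)label".
    private static void write(NewickNode n, StringBuilder sb) {
        if (!n.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) sb.append(',');
                write(n.children.get(i), sb);
                Double len = n.lengths.get(i);
                if (len != null) sb.append(':').append(len);
            }
            sb.append(')');
        }
        if (n.label != null) sb.append(n.label);
    }

    // One line of a NEXUS TREES block, e.g. "TREE t1 = ((A,B),C);".
    static String nexusTreeLine(String name, NewickNode root) {
        return "TREE " + name + " = " + toNewick(root);
    }
}
```

For a root with children ((A:0.1, B:0.2):0.3) and C, `toNewick` returns `((A:0.1,B:0.2):0.3,C);`. Leaving a branch length as null produces an unweighted edge.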
Part II: Distance methods (multiple-hit correction)
Week 3: Jukes-Cantor - Developing an API for the Jukes-Cantor method
Day 1: Method for the NEXUS parser -done (getTreeAsJGraphT)
Day 2: Reviewing the Jukes-Cantor method, studying the algorithm & writing a sample input file -done
Day 3: Program development (1): code for pairwise comparison -done
Day 4: Program development (2): calculate K (the number of nucleotide substitutions per site since divergence) from the pairwise comparison result -done
K = -(3/4)*ln(1-(4/3)*p), where p is the proportion of sites at which the two sequences have different bases
Day 5: Documentation & feedback for the Part I methods (getTree, AddTree, getTreeAsJgrapht) -done
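The pairwise comparison and the Jukes-Cantor correction above can be sketched as follows, assuming two already-aligned sequences of equal length. `JukesCantor` is a hypothetical class name, and skipping gap columns is an assumption (the plan does not say how gaps are handled).

```java
// Hypothetical class: a sketch of the Jukes-Cantor distance, not the BioJava API.
class JukesCantor {
    // p: proportion of compared sites at which two aligned sequences carry different bases.
    static double pDistance(String s1, String s2) {
        if (s1.length() != s2.length())
            throw new IllegalArgumentException("sequences must be aligned (equal length)");
        int compared = 0, diffs = 0;
        for (int i = 0; i < s1.length(); i++) {
            char a = Character.toUpperCase(s1.charAt(i));
            char b = Character.toUpperCase(s2.charAt(i));
            if (a == '-' || b == '-') continue; // assumption: gap positions are skipped
            compared++;
            if (a != b) diffs++;
        }
        return compared == 0 ? 0.0 : (double) diffs / compared;
    }

    // K = -(3/4) * ln(1 - (4/3) * p); K is undefined (reported as infinity) once p >= 3/4.
    static double distance(String s1, String s2) {
        double p = pDistance(s1, s2);
        double arg = 1.0 - (4.0 / 3.0) * p;
        return arg <= 0 ? Double.POSITIVE_INFINITY : -0.75 * Math.log(arg);
    }
}
```

For one mismatch in ten sites, p = 0.1 and K is slightly larger (about 0.107), reflecting the correction for multiple hits at the same site.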
Week 4: Kimura’s 2-parameter model
Day 1: Getting to know CVS and uploading files -done
Day 2: Reviewing Kimura’s 2-parameter model & writing a sample input file for the NEXUS parser -done
Day 3: Program development: code to distinguish transitions from transversions & calculate K -done
K = (1/2)*ln(1/(1-2p-q)) + (1/4)*ln(1/(1-2q)), where
p: proportion of sites with a transitional difference (purine<->purine or pyrimidine<->pyrimidine)
q: proportion of sites with a transversional difference (purine<->pyrimidine)
Day 4: Feedback on the multiple-hit correction methods (JukesCantor, Kimura)
Day 5: Reviewing the UPGMA & N-J methods
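A minimal sketch of the Kimura 2-parameter distance, assuming aligned sequences of equal length. A difference counts as a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and as a transversion otherwise. `Kimura2P` is a hypothetical class name, and skipping non-ACGT characters is an assumption.

```java
// Hypothetical class: a sketch of Kimura's 2-parameter distance, not the BioJava API.
class Kimura2P {
    private static boolean isPurine(char c) { return c == 'A' || c == 'G'; }
    private static boolean isNucleotide(char c) { return isPurine(c) || c == 'C' || c == 'T'; }

    // Returns {p, q}: proportions of sites with a transitional / transversional difference.
    static double[] proportions(String s1, String s2) {
        if (s1.length() != s2.length())
            throw new IllegalArgumentException("sequences must be aligned (equal length)");
        int compared = 0, transitions = 0, transversions = 0;
        for (int i = 0; i < s1.length(); i++) {
            char a = Character.toUpperCase(s1.charAt(i));
            char b = Character.toUpperCase(s2.charAt(i));
            if (!isNucleotide(a) || !isNucleotide(b)) continue; // assumption: skip gaps/ambiguity codes
            compared++;
            if (a == b) continue;
            // same chemical class (purine/purine or pyrimidine/pyrimidine) = transition
            if (isPurine(a) == isPurine(b)) transitions++;
            else transversions++;
        }
        if (compared == 0) return new double[] {0.0, 0.0};
        return new double[] {(double) transitions / compared, (double) transversions / compared};
    }

    // K = (1/2) ln(1/(1-2p-q)) + (1/4) ln(1/(1-2q)); infinite when a log argument is <= 0.
    static double distance(String s1, String s2) {
        double[] pq = proportions(s1, s2);
        double p = pq[0], q = pq[1];
        double w1 = 1.0 - 2.0 * p - q, w2 = 1.0 - 2.0 * q;
        if (w1 <= 0 || w2 <= 0) return Double.POSITIVE_INFINITY;
        return 0.5 * Math.log(1.0 / w1) + 0.25 * Math.log(1.0 / w2);
    }
}
```

For example, comparing AAAAAAAAAA with GAAAAAAAAC gives one A<->G transition and one A<->C transversion, so p = q = 0.1.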
Part III: Distance-based phylogeny reconstruction
Week 5: UPGMA method & Neighbor-Joining method
Day 1: Code for the UPGMA method (1) - building the distance matrix (with Jukes-Cantor or Kimura’s 2-parameter)
Day 2: Code for the UPGMA method (2) - calculating branch lengths & building a weighted sub-tree as a JGraphT graph
Day 3: Code for the UPGMA method (3) - collapsing a pair and rebuilding the distance matrix
Day 4: Code for the N-J method (1) - building the initial star tree & choosing the pair that minimizes the total branch length
Day 5: Code for the N-J method (2) - collapsing a pair, rebuilding the distance matrix & iterating
Day 6: Revising code (if necessary)
[UPGMA]
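- Find the shortest distance in the distance matrix
- Place the new node at height distance/2; each branch length is that height minus the height of the node below it (i.e. distance/2 for a single taxon)
- Build a sub-tree for that pair
- Collapse the pair into a single cluster whose distance to each remaining cluster is the (size-weighted) average of its members' distances
- Repeat, expanding/combining trees until a single tree remains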
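The UPGMA steps above can be sketched as follows, assuming a symmetric distance matrix (e.g. from the Jukes-Cantor or Kimura code) and returning the tree as a Newick string instead of a JGraphT graph. `Upgma.build` is a hypothetical name, not the project's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: a UPGMA sketch that returns Newick text rather than a JGraphT graph.
class Upgma {
    static String build(String[] names, double[][] dist) {
        List<String> tree = new ArrayList<>();       // Newick text of each active cluster
        List<Double> height = new ArrayList<>();     // height of each cluster's root
        List<Integer> size = new ArrayList<>();      // number of taxa in each cluster
        List<List<Double>> d = new ArrayList<>();    // working distance matrix
        for (int i = 0; i < names.length; i++) {
            tree.add(names[i]);
            height.add(0.0);
            size.add(1);
            List<Double> row = new ArrayList<>();
            for (double v : dist[i]) row.add(v);
            d.add(row);
        }
        while (tree.size() > 1) {
            // 1. find the pair (bi < bj) with the shortest distance
            int bi = 0, bj = 1;
            for (int i = 0; i < d.size(); i++)
                for (int j = i + 1; j < d.size(); j++)
                    if (d.get(i).get(j) < d.get(bi).get(bj)) { bi = i; bj = j; }
            // 2. new node at height distance/2; branch length = that height - child height
            double h = d.get(bi).get(bj) / 2.0;
            String merged = "(" + tree.get(bi) + ":" + (h - height.get(bi)) + ","
                    + tree.get(bj) + ":" + (h - height.get(bj)) + ")";
            // 3. size-weighted average distance from the merged cluster to every other one
            int si = size.get(bi), sj = size.get(bj);
            List<Double> newRow = new ArrayList<>();
            for (int k = 0; k < d.size(); k++)
                if (k != bi && k != bj)
                    newRow.add((si * d.get(bi).get(k) + sj * d.get(bj).get(k)) / (si + sj));
            // 4. collapse the pair: remove bj first so that index bi stays valid
            for (int idx : new int[] {bj, bi}) {
                tree.remove(idx);
                height.remove(idx);
                size.remove(idx);
                d.remove(idx);
                for (List<Double> row : d) row.remove(idx);
            }
            for (int k = 0; k < d.size(); k++) d.get(k).add(newRow.get(k));
            newRow.add(0.0);
            d.add(newRow);
            tree.add(merged);
            height.add(h);
            size.add(si + sj);
        }
        return tree.get(0) + ";";
    }
}
```

With A-B = 2 and A-C = B-C = 6, A and B join at height 1 and the root sits at height 3, giving `(C:3.0,(A:1.0,B:1.0):2.0);`.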
[N-J]
- S = total branch length of the tree
- Separate a pair of taxa from all the others
- Choose the pair of taxa that minimizes S
- Build a sub-tree for that pair
- Collapse the pair into a single node and recalculate the distance matrix
- Choose the next pair that gives the smallest S
- Repeat until complete
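A minimal Neighbor-Joining sketch, assuming a symmetric distance matrix and at least three taxa. Instead of computing S directly, it uses the equivalent Q-criterion of Studier and Keppler, Q(i,j) = (n-2)·d(i,j) - R_i - R_j, where R_i is the row sum of d; minimizing Q selects the same pair as minimizing S. `NeighborJoining.build` is a hypothetical name, and the result is returned as an unrooted Newick string rather than a JGraphT graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class: an N-J sketch that returns unrooted Newick text.
class NeighborJoining {
    static String build(String[] names, double[][] dist) {
        if (names.length < 3) throw new IllegalArgumentException("need at least 3 taxa");
        List<String> nodes = new ArrayList<>(Arrays.asList(names)); // Newick text per node
        List<List<Double>> d = new ArrayList<>();
        for (double[] row : dist) {
            List<Double> r = new ArrayList<>();
            for (double v : row) r.add(v);
            d.add(r);
        }
        while (nodes.size() > 3) {
            int n = nodes.size();
            double[] r = new double[n];                     // row sums R_i
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) r[i] += d.get(i).get(j);
            // choose the pair minimizing Q(i,j) = (n-2) d(i,j) - R_i - R_j
            int bi = 0, bj = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++) {
                    double q = (n - 2) * d.get(i).get(j) - r[i] - r[j];
                    if (q < best) { best = q; bi = i; bj = j; }
                }
            // branch lengths from the new internal node u to bi and bj
            double dij = d.get(bi).get(bj);
            double li = dij / 2 + (r[bi] - r[bj]) / (2.0 * (n - 2));
            double lj = dij - li;
            String merged = "(" + nodes.get(bi) + ":" + li + "," + nodes.get(bj) + ":" + lj + ")";
            // distance from u to every remaining node k
            List<Double> newRow = new ArrayList<>();
            for (int k = 0; k < n; k++)
                if (k != bi && k != bj)
                    newRow.add((d.get(bi).get(k) + d.get(bj).get(k) - dij) / 2);
            // collapse the pair (remove bj first so index bi stays valid), then add u
            for (int idx : new int[] {bj, bi}) {
                nodes.remove(idx);
                d.remove(idx);
                for (List<Double> row : d) row.remove(idx);
            }
            for (int k = 0; k < d.size(); k++) d.get(k).add(newRow.get(k));
            newRow.add(0.0);
            d.add(newRow);
            nodes.add(merged);
        }
        // three nodes left: join them at one central node
        double d01 = d.get(0).get(1), d02 = d.get(0).get(2), d12 = d.get(1).get(2);
        return "(" + nodes.get(0) + ":" + (d01 + d02 - d12) / 2 + ","
                + nodes.get(1) + ":" + (d01 + d12 - d02) / 2 + ","
                + nodes.get(2) + ":" + (d02 + d12 - d01) / 2 + ");";
    }
}
```

On an additive five-taxon matrix (a-b 5, a-c 9, a-d 9, a-e 8, b-c 10, b-d 10, b-e 9, c-d 8, c-e 7, d-e 3), the sketch joins a and b first, then c with that pair, and returns `(d:2.0,e:1.0,(c:4.0,(a:2.0,b:3.0):3.0):2.0);`.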
Week 6: Documentation for Parts I, II & III (JavaDoc and the BioJava website)
Day 1: N-J method (1) -done
Day 2: N-J method (2) -done
Day 3: implementing CharactersBlock Parser for UPGMA/N-J method - done
Day 4: Documentation (by format) -done
Day 5: Updating the wiki page (describing the methods with sample code) - waiting to be uploaded as of July 1st
Part IV: Maximum Parsimony
Week 7: Maximum Parsimony method
Day 1: Implementing Taxa & CharactersBlock for the UPGMA/N-J/MaximumParsimony methods -done
Day 2: Revising the AddTree method (for weighted trees) -done (still under discussion)
Day 3: Revising the GetTreeAsJgrapht method (for weighted trees) -done (still under discussion)
Day 4: Code for Maximum Parsimony Method (1) -done
Input: read the NEXUS file & extract the MATRIX data (align sequences & determine informative sites)
Day 5: Code for Maximum Parsimony Method (2) -changing plans
Building the data structure: enumerate all possible tree structures & initialize variables for those trees
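Deciding informative sites can be sketched with the usual rule: a site is parsimony-informative when at least two different character states each occur in at least two taxa. Other variable sites cost the same number of changes on every tree. `InformativeSites.find` is a hypothetical name, and ignoring '-', '?' and 'N' is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: finds parsimony-informative columns in an aligned matrix.
class InformativeSites {
    static List<Integer> find(String[] aligned) {
        List<Integer> sites = new ArrayList<>();
        int len = aligned[0].length();
        for (int col = 0; col < len; col++) {
            // count how many taxa carry each state in this column
            Map<Character, Integer> counts = new HashMap<>();
            for (String seq : aligned) {
                char c = Character.toUpperCase(seq.charAt(col));
                if (c == '-' || c == '?' || c == 'N') continue; // assumption: ignore gaps/missing
                counts.merge(c, 1, Integer::sum);
            }
            // informative if at least two states each appear in at least two taxa
            int statesSeenTwice = 0;
            for (int n : counts.values()) if (n >= 2) statesSeenTwice++;
            if (statesSeenTwice >= 2) sites.add(col);
        }
        return sites;
    }
}
```

In the matrix AACT / AAGC / AGGT / ACGC, only the last column (T, C, T, C) is informative. Columns 2 and 3 vary, but each has only one state shared by two or more taxa.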
Week 8: Maximum Parsimony method
Day 1: Code for Maximum Parsimony Method (2) -done
Building the data structure: enumerate all possible tree structures & initialize variables for those trees
Day 2: Code for Maximum Parsimony Method (3): iterating the calculation to decide on a tree -changing plans
Day 3: Revising the AddTree & getTreeAsJGraphT methods (to allow both weighted and unweighted trees) -done
Day 4: Debugging for non-symmetric tree structures (1) -done
Day 5: Debugging for non-symmetric tree structures (2) -done
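The plan does not name the algorithm used to score each candidate tree. The sketch below assumes Fitch's small-parsimony algorithm, a standard choice. Each internal node takes the intersection of its children's state sets, or their union at the cost of one change when the intersection is empty. The tree is rooted arbitrarily, and the score does not depend on where the root is placed. Exhaustive search is practical only for small data sets, because there are (2n-5)!! unrooted binary trees for n taxa (2,027,025 for n = 10). `FitchNode` and `Fitch.score` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical binary tree node for this sketch; leaves carry an aligned sequence.
class FitchNode {
    final FitchNode left, right;  // both null for a leaf
    final String sequence;        // aligned sequence (leaves only)
    FitchNode(String sequence) { this.left = null; this.right = null; this.sequence = sequence; }
    FitchNode(FitchNode left, FitchNode right) { this.left = left; this.right = right; this.sequence = null; }
}

class Fitch {
    // Parsimony score: sum over sites of the minimum number of state changes on the tree.
    static int score(FitchNode root, int siteCount) {
        int total = 0;
        for (int site = 0; site < siteCount; site++) {
            int[] changes = {0};
            stateSet(root, site, changes);
            total += changes[0];
        }
        return total;
    }

    // Post-order pass: intersect the children's sets; if empty, take the union and count a change.
    private static Set<Character> stateSet(FitchNode n, int site, int[] changes) {
        Set<Character> result = new HashSet<>();
        if (n.left == null) {
            result.add(n.sequence.charAt(site));
            return result;
        }
        Set<Character> a = stateSet(n.left, site, changes);
        Set<Character> b = stateSet(n.right, site, changes);
        result.addAll(a);
        result.retainAll(b);
        if (result.isEmpty()) {
            result.addAll(a);
            result.addAll(b);
            changes[0]++;
        }
        return result;
    }
}
```

Scoring every candidate topology this way and keeping the lowest score gives the most parsimonious tree. For leaves AC, AC, GT, GT, the grouping ((AC,AC),(GT,GT)) scores 2, while ((AC,GT),(AC,GT)) scores 4.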