BioJava:PhyloSOC07
This page will include all info and docs about our efforts in the 2007 Google Summer of Code as part of the NESCent phyloinformatics group.
Week 0 (~ May 20th) : Building project plan, Program setup (Java, Eclipse and BioJava, JGraphT), Reading NEXUS paper, etc.
Part I : Development of basic I/O
Week 1 (May 21st ~ May 27th) Development of basic Input

Input: Nucleic acid sequences (practice w/ FASTA format and create API for NEXUS format)

Initialization: create objects for each sequence
Day 1: Practice w/ FASTA parser done
Day 2: Getting to know NEXUS parser (1) (read and parse the TAXA and CHARACTERS blocks) done
Day 3: Getting to know NEXUS parser(2) (TREE block) done
Day 4: Tree building practice w/ JGraphT (http://www.jgrapht.org/javadoc/) done
Day 5/6: Extend functions for NEXUS parser (parse a tree block and create tree by JGraphT) done
Week 2 Development of basic Output (May 28th ~ June 3rd)
 Output file creation in NEXUS format(converting tree object into NEXUS format)
Day1 & 2 : Finish the NexusToJgraphT code
Day3 : Nexus Tree code for Output(1) (create JgraphT object & Convert it to Nexus Tree Object) done (method: AddTree)
Day4 : Nexus Tree code for Output (2) (generating an output string)  done (testing file: SampleAddTree.java)
Day5 : Nexus Tree code for Output (3) (debugging) done
Day6 : Documentation done (getTree, addTree)
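The output step above (turning a tree into NEXUS text) can be illustrated with a minimal sketch. It assumes the tree is already available as a Newick string; the class and method names (NexusTreesBlockSketch, treesBlock) are hypothetical and are not the project's AddTree/getTree API.

```java
class NexusTreesBlockSketch {

    /** Wraps a Newick tree string in a minimal NEXUS TREES block. */
    static String treesBlock(String treeName, String newick) {
        // A NEXUS TREE statement ends with a semicolon; add one if it is missing.
        String body = newick.endsWith(";") ? newick : newick + ";";
        StringBuilder sb = new StringBuilder();
        sb.append("#NEXUS\n");
        sb.append("BEGIN TREES;\n");
        sb.append("  TREE ").append(treeName).append(" = ").append(body).append("\n");
        sb.append("END;\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical three-taxon tree used only for illustration.
        System.out.print(treesBlock("tree1", "((A,B),C)"));
    }
}
```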
Part II: Distance method (multiple hit correction method)
Week 3 Jukes-Cantor: Developing API for Jukes-Cantor method
Day1: Method for Nexus Parser done(getTreeAsJGraphT)
Day2: Jukes-Cantor method review & algorithm study & write sample input file done
Day3: program development (1) code for pairwise comparison done
Day4: program development (2) calculate K( # of nucleotide substitutions since the divergence) from the pairwise comparison result done
K = -(3/4)*ln(1 - (4/3)*p), p = probability that the two sequences have different bases at a given position
Day5: Documentation & feedback for methods in Part I (getTree, AddTree, getTreeAsJGraphT) done
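The pairwise comparison and the Jukes-Cantor correction from Days 3 and 4 can be sketched as follows. This is a minimal illustration, not the project's code: the class and method names are hypothetical, and it assumes the two sequences are already aligned, have equal length, and contain no gaps or ambiguity codes.

```java
class JukesCantorSketch {

    /** Proportion p of aligned positions at which the two sequences differ. */
    static double pDistance(String a, String b) {
        if (a.length() != b.length() || a.isEmpty()) {
            throw new IllegalArgumentException("sequences must be aligned and non-empty");
        }
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (Character.toUpperCase(a.charAt(i)) != Character.toUpperCase(b.charAt(i))) {
                diff++;
            }
        }
        return (double) diff / a.length();
    }

    /** K = -(3/4) * ln(1 - (4/3) * p); the logarithm is undefined once p reaches 3/4. */
    static double jukesCantor(double p) {
        if (p < 0.0 || p >= 0.75) {
            throw new IllegalArgumentException("p must be in [0, 0.75)");
        }
        return -0.75 * Math.log(1.0 - (4.0 / 3.0) * p);
    }

    public static void main(String[] args) {
        // Hypothetical sample sequences differing at one of eight positions.
        double p = pDistance("ACGTACGT", "ACGTACGA");
        System.out.println("p = " + p + ", K = " + jukesCantor(p));
    }
}
```

The corrected distance K is always at least as large as the observed proportion p, because it accounts for multiple substitutions at the same site.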
Week 4 Kimura’s 2-parameter
Day1: getting to know CVS and upload file done
Day2: Kimura’s 2-parameter model review & write sample input file w/ Nexus Parser done
Day3: program development: code to differentiate transition/transversion & calculate K done
K = (1/2)*ln(1/(1 - 2p - q)) + (1/4)*ln(1/(1 - 2q)),
p: proportion of sites that differ by a transition
q: proportion of sites that differ by a transversion
Day4: feedback for multiple hit correction methods (Jukes-Cantor, Kimura)
Day5: Reviewing UPGMA & NJ method.
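The Kimura 2-parameter calculation from Day 3 can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the names are hypothetical, the sequences are assumed to be aligned DNA of equal length, and only the characters A, C, G and T are handled (gaps and ambiguity codes are not).

```java
class Kimura2PSketch {

    /** Purines are A and G; everything else is treated as a pyrimidine (assumes only A, C, G, T). */
    static boolean isPurine(char c) {
        return c == 'A' || c == 'G';
    }

    /** Returns {p, q}: the proportions of sites differing by a transition and by a transversion. */
    static double[] transitionTransversion(String a, String b) {
        if (a.length() != b.length() || a.isEmpty()) {
            throw new IllegalArgumentException("sequences must be aligned and non-empty");
        }
        int transitions = 0;
        int transversions = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = Character.toUpperCase(a.charAt(i));
            char y = Character.toUpperCase(b.charAt(i));
            if (x == y) continue;
            // purine<->purine or pyrimidine<->pyrimidine is a transition
            if (isPurine(x) == isPurine(y)) transitions++; else transversions++;
        }
        double n = a.length();
        return new double[] { transitions / n, transversions / n };
    }

    /** K = (1/2) * ln(1 / (1 - 2p - q)) + (1/4) * ln(1 / (1 - 2q)). */
    static double kimura(double p, double q) {
        double a = 1.0 - 2.0 * p - q;
        double b = 1.0 - 2.0 * q;
        if (a <= 0.0 || b <= 0.0) {
            throw new IllegalArgumentException("distance undefined for these p and q");
        }
        return 0.5 * Math.log(1.0 / a) + 0.25 * Math.log(1.0 / b);
    }

    public static void main(String[] args) {
        // Hypothetical sample: one transition (A/G) and one transversion (T/A) in four sites.
        double[] pq = transitionTransversion("AGCT", "GGCA");
        System.out.println("p = " + pq[0] + ", q = " + pq[1] + ", K = " + kimura(pq[0], pq[1]));
    }
}
```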
Part III: Distance based phylogeny reconstruction
Week 5 UPGMA method & Neighbor-Joining method
Day1: code for UPGMA method (1) - building distance matrix (by Jukes-Cantor or Kimura’s 2-parameter)
Day2: code for UPGMA method (2) - calculate branch length & build weighted subtree as JGraphT
Day3: code for UPGMA method (3) - collapsing a pair and rebuilding distance matrix
Day4: code for NJ method (1) - build initial star tree & choose a pair minimizing total branch length
Day5: code for NJ method (2) - collapse a pair & rebuild distance matrix & iterate
Day6: Revising code (if necessary)
[UPGMA]

finding shortest distance within distance matrix

calculate branch lengths as distance/2

build a subtree for that pair

collapse a pair (changes distance into 0)

repeat process expanding/combining trees
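The UPGMA steps listed above can be sketched in plain Java as follows. This is a minimal illustration rather than the project's JGraphT-based code: it emits a Newick string instead of a graph object, the names are hypothetical, and it assumes the usual size-weighted average when recomputing distances to a merged cluster.

```java
import java.util.Arrays;
import java.util.Locale;

class UpgmaSketch {

    /** Clusters taxa by UPGMA and returns the tree as a Newick string (assumes at least one taxon). */
    static String upgma(String[] names, double[][] dist) {
        int n = names.length;
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) {
            d[i] = dist[i].clone();          // work on a copy of the input matrix
        }
        String[] label = names.clone();      // Newick text of each current cluster
        int[] size = new int[n];             // number of taxa in each cluster
        double[] height = new double[n];     // height of each cluster's root node
        boolean[] active = new boolean[n];   // false once a cluster was merged into another
        Arrays.fill(size, 1);
        Arrays.fill(active, true);

        for (int step = 0; step < n - 1; step++) {
            // 1. find the shortest distance among the active clusters
            int bi = -1, bj = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int j = i + 1; j < n; j++) {
                    if (active[j] && d[i][j] < best) {
                        best = d[i][j];
                        bi = i;
                        bj = j;
                    }
                }
            }
            // 2. the new internal node sits at height distance/2; build the subtree
            double h = best / 2.0;
            label[bi] = String.format(Locale.ROOT, "(%s:%.4f,%s:%.4f)",
                    label[bi], h - height[bi], label[bj], h - height[bj]);
            // 3. collapse the pair into slot bi and recompute distances (average linkage)
            for (int k = 0; k < n; k++) {
                if (!active[k] || k == bi || k == bj) continue;
                double avg = (size[bi] * d[bi][k] + size[bj] * d[bj][k])
                        / (size[bi] + size[bj]);
                d[bi][k] = avg;
                d[k][bi] = avg;
            }
            size[bi] += size[bj];
            height[bi] = h;
            active[bj] = false;
        }
        // bi < bj in every merge, so slot 0 is never removed and holds the final tree
        return label[0] + ";";
    }

    public static void main(String[] args) {
        // Hypothetical three-taxon distance matrix.
        String[] names = { "A", "B", "C" };
        double[][] dist = { { 0, 2, 4 }, { 2, 0, 4 }, { 4, 4, 0 } };
        System.out.println(upgma(names, dist));
    }
}
```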
[NJ]

S = total branch length of tree

separate pair of taxa from all others

choose pair of taxa that minimizes S

build a subtree for that pair

collapse the pair into a single node and recalculate the distance matrix

next pair that gives smallest S is chosen

repeat until complete
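One iteration of the NJ procedure above can be sketched as below. Instead of computing S for every candidate tree, this sketch uses the widely used Q criterion, Q(i,j) = (n-2)*d(i,j) - r(i) - r(j), where r(i) is the sum of row i; this choice is an assumption of the sketch, not necessarily what the project code does. The names are hypothetical and at least three taxa are assumed.

```java
class NeighborJoiningStepSketch {

    /** Sum of all distances from taxon i. */
    static double rowSum(double[][] d, int i) {
        double sum = 0.0;
        for (double v : d[i]) sum += v;
        return sum;
    }

    /** Returns {i, j}, the pair minimizing Q(i,j) = (n-2)*d[i][j] - r[i] - r[j]. */
    static int[] pickPair(double[][] d) {
        int n = d.length;
        int bi = -1, bj = -1;
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double q = (n - 2) * d[i][j] - rowSum(d, i) - rowSum(d, j);
                if (q < best) {
                    best = q;
                    bi = i;
                    bj = j;
                }
            }
        }
        return new int[] { bi, bj };
    }

    /** Branch lengths from i and from j to the new internal node joining them. */
    static double[] branchLengths(double[][] d, int i, int j) {
        int n = d.length;
        double li = 0.5 * d[i][j] + (rowSum(d, i) - rowSum(d, j)) / (2.0 * (n - 2));
        return new double[] { li, d[i][j] - li };
    }

    /** Distance from the new internal node (joining i and j) to another taxon k. */
    static double distanceToNewNode(double[][] d, int i, int j, int k) {
        return 0.5 * (d[i][k] + d[j][k] - d[i][j]);
    }

    public static void main(String[] args) {
        // Hypothetical five-taxon distance matrix used only for illustration.
        double[][] d = {
            { 0, 5, 9, 9, 8 },
            { 5, 0, 10, 10, 9 },
            { 9, 10, 0, 8, 7 },
            { 9, 10, 8, 0, 3 },
            { 8, 9, 7, 3, 0 }
        };
        int[] pair = pickPair(d);
        double[] len = branchLengths(d, pair[0], pair[1]);
        System.out.println("join " + pair[0] + " and " + pair[1]
                + " with branch lengths " + len[0] + " and " + len[1]);
    }
}
```

A full implementation would replace the joined pair with the new node in the matrix and repeat until only two nodes remain.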
Week 6 Documentation for Part I & II & III : (JavaDoc and BJ website)
Day 1: NJ method (1) done
Day 2: NJ method (2) done
Day 3: implementing CharactersBlock Parser for UPGMA/NJ method  done
Day 4: Documentation (by format) done
Day 5: updating wiki page (specifying methods w/ sample codes)  Waiting for uploading as July 1st.
Part IV : Maximum Parsimony
Week 7 Maximum Parsimony Method
Day 1: Implementing Taxa & CharactersBlock for UPGMA/NJ/MaximumParsimony methods done
Day 2: Revising AddTree method (for weighted tree) done (currently being discussed as well)
Day 3: Revising getTreeAsJGraphT method (for weighted tree) done (currently being discussed as well)
Day 4: Code for Maximum Parsimony Method (1) done
Input: Read Nexus File & Extract MATRIX data (Align sequences & decide informative sites)
Day 5: Code for Maximum Parsimony Method (2) changing plans
Building Data Structure : decide all possible tree structures & initialize variables for those trees.
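The "decide informative sites" step mentioned for Day 4 can be sketched as follows. It uses the standard definition of a parsimony-informative site: a column in which at least two different character states each occur in at least two taxa. The names are hypothetical, and the alignment is assumed to be a set of equal-length strings; gap characters are counted like any other state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InformativeSitesSketch {

    /** True if at least two distinct states each appear in at least two sequences at this column. */
    static boolean isInformative(String[] alignment, int col) {
        Map<Character, Integer> counts = new HashMap<>();
        for (String seq : alignment) {
            counts.merge(Character.toUpperCase(seq.charAt(col)), 1, Integer::sum);
        }
        int statesSeenTwice = 0;
        for (int c : counts.values()) {
            if (c >= 2) statesSeenTwice++;
        }
        return statesSeenTwice >= 2;
    }

    /** Zero-based indices of all parsimony-informative columns. */
    static List<Integer> informativeSites(String[] alignment) {
        List<Integer> sites = new ArrayList<>();
        if (alignment.length == 0) return sites;
        for (int col = 0; col < alignment[0].length(); col++) {
            if (isInformative(alignment, col)) sites.add(col);
        }
        return sites;
    }

    public static void main(String[] args) {
        // Hypothetical four-taxon alignment used only for illustration.
        String[] alignment = { "AAGA", "AGGC", "AGTC", "AATG" };
        System.out.println(informativeSites(alignment));
    }
}
```

Only the informative columns need to be scored when comparing candidate trees, since the remaining columns require the same number of changes on every tree.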
Week 8 Maximum Parsimony Method
Day 1: Code for Maximum Parsimony Method (2)  done
Building Data Structure : decide all possible tree structures & initialize variables for those trees.
Day 2: Code for Maximum Parsimony Method (3) changing plans
Iterate the calculation to decide a tree.
Day 3: Revising AddTree & getTreeAsJGraphT method (to allow both weighted/unweighted tree)  done
Day 4: Debugging for non-symmetric tree structure (1) done
Day 5: Debugging for non-symmetric tree structure (2) done