Short coding exercise
Goal 1
This task is inspired by the pre-processing of the next generation sequencing data.
Implement the interface
public interface StringOverlapFinder {
String cleanX(String x, String y);
}
The cleanX function should locate the overlap of the end of x with the
beginning of y.
The function should return the non-overlapping start of x If x does not
match the first length(x) characters of y, then return x. The minimum
overlap should be 5 or more characters.
Goal 2
The function should be wrapped up to accept a file, from the command line, that has two columns.
The resulting program should accept the file, feeding the function with each element from the column and print the result to either an out file or a standard output.
Example Command:
java FindEnds inputFile.txt outputFile.txt
Goal 3
Code under the assumption that
- The input file does not fit into memory
- The length of the individual strings to compare can be up to 100000 characters (fits into memory).
Example of Function Input and Result
x = “abcdefghijklm”
y = “hijklmnopqrst”
return: “abcdefg”
Example Input File
Column_1 (x) Column_2 (y)
abcdefghijklm hijklmnopqrstuvw
aioludhfgakjn akjnopqrstuvwxuh
...... .......
...... .......
There will be no header in the file and the columns are separated with a tab character (\t).
Example Output
If you coding for Goal 2 or 3 to help with assessment please make sure your program can write the “clean” output. For example for the input
abcdefghijklm hijklmnopqrstuvw
aioludhfgakjn akjnopqrstuvwxuh
your program should produce the following output
abcdefg
aioludhfgakjn
Target
The target is optimal code that solves the problem correctly, therefore there are several goals (in no particular order):
- Code quality (maintability, reusability, OO design, etc)
- CPU & RAM efficiency
- Execution speed
Using multiple threads to speed up the comparison is a plus.
Submission
Please prepare a JAR file containing the following:
- The executable JAR containing the program. This must be called runme.jar.
- A directory called src, containing all the source code and related parts of your project.
- A directory called docs, containing a pure ASCII text file called choices.txt describing the significant design choices you made, uncertainties you had regarding the project, and the decisions you made when resolving them.
- A directory called docs/javadoc containing javadoc for your classes.
Submit the Jar file to gsocexercise at gmail dot com by the 10 of April inclusive.
Either code for one, two or all of the goals.