BioJava:Cookbook:Sequence:SubSequence
How do I get a subsection of a Sequence?
Given a Sequence object we might only be interested in examining the first 10 bases or we might want to get a region between two points. You might also want to print a subsequence to an OutputStream like STDOUT how could you do this?
BioJava uses a biological coordinate system for identifying bases. The first base is numbered 1 and the last base index is equal to the length of the sequence. Note that this is different from String indexing which starts at 0 and proceedes to length -1. If you attempt to access a region outside of 1…length an IndexOutOfBoundsException will occur.
Getting a Sub - Sequence
` SymbolList symL = null;`
` //code here to generate a SymbolList`
` //get the first Symbol`
` Symbol sym = symL.symbolAt(1);`
` //get the first three bases`
` SymbolList symL2 = symL.subList(1,3);`
` //get the last three bases`
` SymbolList symL3 = symL.subList(symL.length() - 3, symL.length());`
Printing a Sub - Sequence
` //print the last three bases of a SymbolList or Sequence`
` String s = symL.subStr(symL.length() - 3, symL.length());`
` System.out.println(s);`
Complete Listing
```java import org.biojava.bio.seq.*; import org.biojava.bio.symbol.*;
public class SubSequencing {
public static void main(String[] args) {
SymbolList symL = null;
//generate an RNA SymbolList
try {
symL = RNATools.createRNA("auggcaccguccagauu");
}
catch (IllegalSymbolException ex) {
ex.printStackTrace();
}
//get the first Symbol
Symbol sym = symL.symbolAt(1);
//get the first three bases
SymbolList symL2 = symL.subList(1,3);
//get the last three bases
SymbolList symL3 = symL.subList(symL.length() - 3, symL.length());
//print the last three bases
String s = symL.subStr(symL.length() - 3, symL.length());
System.out.println(s);
}
} ```