BioJava:Cookbook:Sequence:SubSequence
How do I get a subsection of a Sequence?
Given a Sequence object, we might only be interested in examining the first 10 bases, or we might want to get a region between two points. You might also want to print a subsequence to an OutputStream such as STDOUT. How can you do this?
BioJava uses a biological coordinate system for identifying bases. The first base is numbered 1, and the index of the last base equals the length of the sequence. Note that this is different from String indexing, which starts at 0 and proceeds to length - 1. A region is given by its first and last positions, and both ends are inclusive, so subList(1, 3) returns three bases. If you attempt to access a region outside of 1...length, an IndexOutOfBoundsException will occur.
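The mapping between the two coordinate systems can be sketched in plain Java. The class `BioCoords` and its methods `subStr` and `baseAt` are hypothetical names chosen for illustration, not BioJava API. They work on a `String` rather than a `SymbolList` and assume the 1-based, inclusive convention described above. Rejecting `start > end` is a choice made for this sketch only.

```java
// Hypothetical helper (NOT part of BioJava) that mimics 1-based,
// inclusive biological coordinates on a plain java.lang.String.
class BioCoords {

    /**
     * Returns bases start..end (1-based, both ends inclusive).
     * Throws IndexOutOfBoundsException for regions outside 1..length,
     * and (a choice made for this sketch) when start > end.
     */
    static String subStr(String seq, int start, int end) {
        if (start < 1 || end > seq.length() || start > end) {
            throw new IndexOutOfBoundsException(
                "Region " + start + ".." + end + " is outside 1.." + seq.length());
        }
        // Base i lives at String index i - 1, and String.substring excludes
        // its end index, so the 1-based region [start, end] maps to [start - 1, end).
        return seq.substring(start - 1, end);
    }

    /** Returns base i (1-based), analogous to symbolAt(i). */
    static char baseAt(String seq, int i) {
        return subStr(seq, i, i).charAt(0);
    }

    public static void main(String[] args) {
        String rna = "auggcaccguccagauu"; // 17 bases
        System.out.println(baseAt(rna, 1));                               // a
        System.out.println(subStr(rna, 1, 3));                            // aug
        System.out.println(subStr(rna, rna.length() - 2, rna.length()));  // auu
        try {
            subStr(rna, 0, 3); // position 0 does not exist in biological coordinates
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```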
Getting a Sub-Sequence
SymbolList symL = null; // code here to generate a SymbolList

// get the first Symbol
Symbol sym = symL.symbolAt(1);

// get the first three bases (positions 1 to 3, inclusive)
SymbolList symL2 = symL.subList(1, 3);

// get the last three bases (positions length - 2 to length, inclusive)
SymbolList symL3 = symL.subList(symL.length() - 2, symL.length());
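Because both ends of a region are inclusive, a region from `start` to `end` holds `end - start + 1` bases, so the last `n` bases begin at `length - n + 1`. The arithmetic can be sketched with a hypothetical `RegionBounds` helper. It is not part of BioJava, and it only computes the bounds that would be passed to a call such as `subList(start, end)`.

```java
// Hypothetical helper (NOT BioJava API): computes 1-based, inclusive
// (start, end) bounds for "first n" and "last n" regions.
class RegionBounds {

    /** Bounds of the first n bases: 1..n. */
    static int[] firstN(int length, int n) {
        checkN(length, n);
        return new int[] {1, n};
    }

    /**
     * Bounds of the last n bases. An inclusive region holds
     * end - start + 1 bases, so start = length - n + 1 (not length - n).
     */
    static int[] lastN(int length, int n) {
        checkN(length, n);
        return new int[] {length - n + 1, length};
    }

    private static void checkN(int length, int n) {
        if (n < 1 || n > length) {
            throw new IllegalArgumentException("n=" + n + " for sequence length " + length);
        }
    }

    public static void main(String[] args) {
        int length = 17; // e.g. the 17-base RNA used in the complete listing
        int[] last3 = lastN(length, 3);
        System.out.println(last3[0] + ".." + last3[1]); // 15..17
        System.out.println(last3[1] - last3[0] + 1);    // 3 (bases in the region)
    }
}
```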
Printing a Sub-Sequence
// print the last three bases of a SymbolList or Sequence
String s = symL.subStr(symL.length() - 2, symL.length());
System.out.println(s);
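To send a subsequence to an OutputStream other than STDOUT, one option is to encode the region's characters and write them to the stream directly. The sketch below uses a hypothetical `printRegion` helper (not BioJava API) on a plain `String` and assumes UTF-8 output. With a real SymbolList, the string could come from `subStr` as above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (NOT BioJava API): writes a 1-based, inclusive
// region of a sequence string, plus a line separator, to any OutputStream.
class RegionPrinter {

    static void printRegion(String seq, int start, int end, OutputStream os) throws IOException {
        if (start < 1 || end > seq.length() || start > end) {
            throw new IndexOutOfBoundsException(
                "Region " + start + ".." + end + " is outside 1.." + seq.length());
        }
        String region = seq.substring(start - 1, end); // 1-based inclusive -> 0-based half-open
        os.write((region + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
        os.flush(); // the stream is flushed but not closed, so System.out stays usable
    }

    public static void main(String[] args) throws IOException {
        String rna = "auggcaccguccagauu";
        printRegion(rna, rna.length() - 2, rna.length(), System.out); // prints auu
    }
}
```

Because the helper takes any OutputStream, the same call works for a file stream or, as in a test, a ByteArrayOutputStream.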
Complete Listing
import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;

public class SubSequencing {
    public static void main(String[] args) {
        SymbolList symL;

        // generate an RNA SymbolList
        try {
            symL = RNATools.createRNA("auggcaccguccagauu");
        } catch (IllegalSymbolException ex) {
            ex.printStackTrace();
            return; // without a valid SymbolList there is nothing to work with
        }

        // get the first Symbol
        Symbol sym = symL.symbolAt(1);

        // get the first three bases (positions 1 to 3, inclusive)
        SymbolList symL2 = symL.subList(1, 3);

        // get the last three bases (positions length - 2 to length, inclusive)
        SymbolList symL3 = symL.subList(symL.length() - 2, symL.length());

        // print the last three bases
        String s = symL.subStr(symL.length() - 2, symL.length());
        System.out.println(s);
    }
}

