How do I filter sequences based on their species?

The species field of a GenBank, SwissProt, or EMBL file ends up as an Annotation entry. Essentially, all you need to do is get the species property from a sequence's Annotation and check whether it is the one you want.

The species property name depends on the source: for EMBL or SwissProt it is "OS"; for GenBank it is "Organism".

The following program reads Sequences from a file and filters them according to their species. The same general recipe, with a little modification, could be used for any Annotation property.

```java
import java.io.*;

import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.*;
import org.biojava.bio.seq.io.*;

public class FilterEMBLBySpecies {

 public static void main(String[] args) {

   try {
     //read an EMBL file specified in args[0]
     BufferedReader br = new BufferedReader(new FileReader(args[0]));
     SequenceIterator iter = SeqIOTools.readEmbl(br);

     //the species name to search for (specified by args[1]);
     String species = args[1];

     //A sequenceDB to store the filtered Seqs
     SequenceDB db = new HashSequenceDB();

     //As each sequence is read
     while (iter.hasNext()) {
       Sequence seq = iter.nextSequence();
       Annotation anno = seq.getAnnotation();

       //check the annotation for Embl organism field "OS"
       if (anno.containsProperty("OS")) {

         String property = (String)anno.getProperty("OS");

         //check the value of the property, could also do this with a regular expression
         if (property.startsWith(species)) {
           //keep sequences whose species matches
           db.addSequence(seq);
         }
       }
     }

     //write the sequences as FASTA
     SeqIOTools.writeFasta(System.out, db);
   }
   catch (Exception ex) {
     ex.printStackTrace();
   }
 }
}
```
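
The program notes that the property check could also be done with a regular expression, and that the same recipe applies to other Annotation properties. The following is a minimal sketch of that idea, not part of the BioJava API. So that it runs without BioJava on the classpath, a plain `java.util.Map` stands in for each sequence's Annotation. The names `SpeciesFilterSketch`, `speciesKey`, and `filter` are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public class SpeciesFilterSketch {

  // Hypothetical helper: picks the species property key for a format name,
  // following the rule above ("OS" for EMBL/SwissProt, "Organism" for GenBank).
  static String speciesKey(String format) {
    switch (format.toLowerCase(Locale.ROOT)) {
      case "embl":
      case "swissprot":
        return "OS";
      case "genbank":
        return "Organism";
      default:
        throw new IllegalArgumentException("Unknown format: " + format);
    }
  }

  // Returns the IDs of records whose value for 'key' matches the pattern.
  // Each inner Map is an assumed stand-in for one sequence's Annotation.
  static List<String> filter(Map<String, Map<String, Object>> records,
                             String key, Pattern pattern) {
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, Map<String, Object>> e : records.entrySet()) {
      Object value = e.getValue().get(key);
      // skip records that lack the property or hold a non-String value
      if (value instanceof String && pattern.matcher((String) value).find()) {
        kept.add(e.getKey());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // made-up example records, keyed by sequence ID
    Map<String, Map<String, Object>> records = new LinkedHashMap<>();
    records.put("seq1", Map.of("OS", "Homo sapiens (human)"));
    records.put("seq2", Map.of("OS", "Mus musculus (mouse)"));
    records.put("seq3", Map.of("DE", "record without an OS line"));

    // anchored regex instead of String.startsWith
    Pattern human = Pattern.compile("^Homo sapiens\\b");
    System.out.println(filter(records, speciesKey("embl"), human)); // prints [seq1]
  }
}
```

In the BioJava program, the same pattern test would replace the `startsWith` check on the value returned by `anno.getProperty(...)`. Filtering on a different property only requires passing a different key.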