Class SequenceMixin
- java.lang.Object
-
- org.biojava.nbio.core.sequence.template.SequenceMixin
-
public class SequenceMixin extends Object
Provides a set of static methods to be used as static imports when functionality is needed across multiple Sequence implementations but inheritance gets in the way. It also provides a place to put utility methods that apply either to a single class of Sequence, e.g. a NucleotideCompound Sequence, or to any Sequence, e.g. getComposition(Sequence) or getDistribution(Sequence). All of these methods assume that you can use the Iterable interface offered by the implementations of Sequence to provide all the compounds that the implementation allows you to see. Since a Sequence should know nothing about its backing store (apart from calling out to it), this should be true.
Author: ayates
-
-
Nested Class Summary
static class SequenceMixin.SequenceIterator<C extends Compound>
A basic sequence iterator which iterates over the given Sequence by biological index.
-
Constructor Summary
SequenceMixin()
-
Method Summary
static <C extends Compound> String checksum(Sequence<C> sequence)
Performs a simple CRC64 checksum on any given sequence.
static int countAT(Sequence<NucleotideCompound> sequence)
Returns the count of AT in the given sequence.
static <C extends Compound> int countCompounds(Sequence<C> sequence, C... compounds)
For the given varargs of compounds, this method counts the number of times those compounds appear in the given sequence.
static int countGC(Sequence<NucleotideCompound> sequence)
Returns the count of GC in the given sequence.
static <C extends Compound> Iterator<C> createIterator(Sequence<C> sequence)
Creates a simple sequence iterator which moves through a sequence going from 1 to the length of the Sequence.
static <C extends Compound> SequenceView<C> createSubSequence(Sequence<C> sequence, int start, int end)
Creates a simple sub sequence view delimited by the given start and end.
static <C extends Compound> Map<C,Integer> getComposition(Sequence<C> sequence)
Does a linear scan over the given Sequence and records the number of times each base appears.
static <C extends Compound> Map<C,Double> getDistribution(Sequence<C> sequence)
Analogous to getComposition(Sequence) but returns the distribution of that Compound over the given sequence.
static <C extends Compound> int indexOf(Sequence<C> sequence, C compound)
Performs a linear search of the given Sequence for the given compound.
static <C extends Compound> SequenceView<C> inverse(Sequence<C> sequence)
A method which attempts to do the right thing when it comes to a reverse/reverse complement.
static <C extends Compound> int lastIndexOf(Sequence<C> sequence, C compound)
Performs a reversed linear search of the given Sequence by wrapping it in a ReversedSequenceView and passing it into indexOf(Sequence, Compound).
static <C extends Compound> List<SequenceView<C>> nonOverlappingKmers(Sequence<C> sequence, int kmer)
Produces kmers of the specified size, e.g. ATGTGA returns two views which have ATG TGA.
static <C extends Compound> List<SequenceView<C>> overlappingKmers(Sequence<C> sequence, int kmer)
Used to generate overlapping k-mers, e.g. ATGTA will give rise to ATG, TGT & GTA.
static <C extends Compound> boolean sequenceEquality(Sequence<C> source, Sequence<C> target)
A case-sensitive manner of comparing two sequence objects together.
static <C extends Compound> boolean sequenceEqualityIgnoreCase(Sequence<C> source, Sequence<C> target)
A case-insensitive manner of comparing two sequence objects together.
static <C extends Compound> Sequence<C> shuffle(Sequence<C> sequence)
Implements sequence shuffling by first materializing the given Sequence into a List, applying Collections.shuffle(List) and then returning the shuffled elements in a new instance of SequenceBackingStore which behaves as a Sequence.
static <C extends Compound> List<C> toList(Sequence<C> sequence)
static <C extends Compound> String toString(Sequence<C> sequence)
Shortcut to toStringBuilder(org.biojava.nbio.core.sequence.template.Sequence) which calls toString() on the resulting object.
static <C extends Compound> StringBuilder toStringBuilder(Sequence<C> sequence)
For the given Sequence this will return a StringBuilder object filled with the results of Compound#toString().
static <C extends Compound> void write(Appendable appendable, Sequence<C> sequence)
Used as a way of sending a Sequence to a writer without the cost of converting to a full length String and then writing the data out.
-
-
-
Constructor Detail
-
SequenceMixin
public SequenceMixin()
-
-
Method Detail
-
countCompounds
public static <C extends Compound> int countCompounds(Sequence<C> sequence, C... compounds)
For the given varargs of compounds, this method counts the number of times those compounds appear in the given sequence.
Type Parameters:
C - The type of compound we are looking for
Parameters:
sequence - The Sequence to perform the count on
compounds - The compounds to look for
Returns:
The number of times the given compounds appear in this Sequence
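The counting described above can be sketched without BioJava on the classpath. This is a minimal illustration only: a plain Iterable stands in for Sequence, and the class name CountCompoundsSketch is hypothetical, not part of the library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: any Iterable plays the role of a Sequence here.
final class CountCompoundsSketch {

    // Counts the elements of 'sequence' that equal any of 'compounds'.
    @SafeVarargs
    static <C> int countCompounds(Iterable<C> sequence, C... compounds) {
        Set<C> wanted = new HashSet<>(Arrays.asList(compounds));
        int count = 0;
        for (C c : sequence) {
            if (wanted.contains(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Character> dna = Arrays.asList('A', 'T', 'G', 'C', 'G', 'C');
        System.out.println(countCompounds(dna, 'G', 'C')); // prints 4
    }
}
```

A GC or AT count is then the same call with the relevant pair of compounds passed as the varargs.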
-
countGC
public static int countGC(Sequence<NucleotideCompound> sequence)
Returns the count of GC in the given sequence.
Parameters:
sequence - The NucleotideCompound Sequence to perform the GC analysis on
Returns:
The number of GC compounds in the sequence
-
countAT
public static int countAT(Sequence<NucleotideCompound> sequence)
Returns the count of AT in the given sequence.
Parameters:
sequence - The NucleotideCompound Sequence to perform the AT analysis on
Returns:
The number of AT compounds in the sequence
-
getDistribution
public static <C extends Compound> Map<C,Double> getDistribution(Sequence<C> sequence)
Analogous to getComposition(Sequence) but returns the distribution of that Compound over the given sequence.
Type Parameters:
C - The type of compound to look for
Parameters:
sequence - The type of sequence to look over
Returns:
Returns the decimal fraction of the compounds in the given sequence. Any compound not in the Map will return a fraction of 0.
-
getComposition
public static <C extends Compound> Map<C,Integer> getComposition(Sequence<C> sequence)
Does a linear scan over the given Sequence and records the number of times each base appears. The returned map will return 0 if a compound is asked for and the Map has no record of it.
Type Parameters:
C - The type of compound to look for
Parameters:
sequence - The type of sequence to look over
Returns:
Counts for the instances of all compounds in the sequence
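The relationship between getComposition and getDistribution can be illustrated with a short sketch: one linear scan builds the counts, and each count divided by the sequence length gives the fraction. The code below assumes a plain String in place of a Sequence, and CompositionSketch is a hypothetical name. Unlike the map described above, a standard Map returns null for a missing key, so the usage shows getOrDefault to get the same "0 when absent" behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a String stands in for a Sequence of compounds.
final class CompositionSketch {

    // One linear pass, recording how often each character appears.
    static Map<Character, Integer> composition(String seq) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : seq.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    // Each count divided by the total length gives the decimal fraction.
    static Map<Character, Double> distribution(String seq) {
        Map<Character, Double> fractions = new LinkedHashMap<>();
        double length = seq.length();
        composition(seq).forEach((k, v) -> fractions.put(k, v / length));
        return fractions;
    }

    public static void main(String[] args) {
        System.out.println(composition("ATGA"));  // prints {A=2, T=1, G=1}
        System.out.println(distribution("ATGA")); // prints {A=0.5, T=0.25, G=0.25}
        // Absent compounds: ask with a default of 0.
        System.out.println(composition("ATGA").getOrDefault('C', 0)); // prints 0
    }
}
```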
-
write
public static <C extends Compound> void write(Appendable appendable, Sequence<C> sequence) throws IOException
Used as a way of sending a Sequence to a writer without the cost of converting it to a full-length String and then writing the data out.
Type Parameters:
C - Type of compound
Parameters:
appendable - The Appendable to send data to
sequence - The sequence to write out
Throws:
IOException - Thrown if we encounter a problem
-
toStringBuilder
public static <C extends Compound> StringBuilder toStringBuilder(Sequence<C> sequence)
For the given Sequence this will return a StringBuilder object filled with the results of Compound#toString(). Does not use write(java.lang.Appendable, org.biojava.nbio.core.sequence.template.Sequence) because of its IOException signature.
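The distinction between write and toStringBuilder comes from the Appendable interface: Appendable.append is declared to throw IOException, whereas StringBuilder.append is not. The sketch below shows both styles under the assumption that any Iterable stands in for a Sequence; WriteSketch is a hypothetical name.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch: any Iterable plays the role of a Sequence.
final class WriteSketch {

    // Streams each element to the target one at a time, so no
    // full-length String is built before writing.
    static <C> void write(Appendable out, Iterable<C> sequence) throws IOException {
        for (C c : sequence) {
            out.append(c.toString());
        }
    }

    // Appends directly to a StringBuilder, whose append method declares
    // no checked exception, so callers need no try/catch.
    static <C> StringBuilder toStringBuilder(Iterable<C> sequence) {
        StringBuilder sb = new StringBuilder();
        for (C c : sequence) {
            sb.append(c.toString());
        }
        return sb;
    }

    public static void main(String[] args) throws IOException {
        write(System.out, Arrays.asList("A", "T", "G")); // prints ATG
        System.out.println();
        System.out.println(toStringBuilder(Arrays.asList("A", "T", "G"))); // prints ATG
    }
}
```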
-
toString
public static <C extends Compound> String toString(Sequence<C> sequence)
Shortcut to toStringBuilder(org.biojava.nbio.core.sequence.template.Sequence)
which calls toString() on the resulting object.
-
indexOf
public static <C extends Compound> int indexOf(Sequence<C> sequence, C compound)
Performs a linear search of the given Sequence for the given compound. Once we find the compound we return the position.
-
lastIndexOf
public static <C extends Compound> int lastIndexOf(Sequence<C> sequence, C compound)
Performs a reversed linear search of the given Sequence by wrapping it in a ReversedSequenceView and passing it into indexOf(Sequence, Compound). We then invert the index coming out of it.
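The index inversion can be worked through on a plain String. The sketch below makes two assumptions that the text above does not state: positions are 1-based, and 0 signals "not found". With those conventions, a hit at position i in the reversed sequence corresponds to position length - i + 1 in the original. LastIndexSketch is a hypothetical name.

```java
// Hypothetical sketch: a String stands in for a Sequence.
final class LastIndexSketch {

    // 1-based position of the first match, or 0 if absent (assumed convention).
    static int indexOf(String seq, char compound) {
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == compound) {
                return i + 1;
            }
        }
        return 0;
    }

    // Search the reversed sequence, then map the hit back to the original.
    static int lastIndexOf(String seq, char compound) {
        String reversed = new StringBuilder(seq).reverse().toString();
        int hit = indexOf(reversed, compound);
        return hit == 0 ? 0 : seq.length() - hit + 1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("ATGTC", 'T'));     // prints 2
        System.out.println(lastIndexOf("ATGTC", 'T')); // prints 4
    }
}
```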
-
createIterator
public static <C extends Compound> Iterator<C> createIterator(Sequence<C> sequence)
Creates a simple sequence iterator which moves through a sequence going from 1 to the length of the Sequence. Modification of the Sequence is not allowed.
-
createSubSequence
public static <C extends Compound> SequenceView<C> createSubSequence(Sequence<C> sequence, int start, int end)
Creates a simple sub sequence view delimited by the given start and end.
-
shuffle
public static <C extends Compound> Sequence<C> shuffle(Sequence<C> sequence)
Implements sequence shuffling by first materializing the given Sequence into a List, applying Collections.shuffle(List) and then returning the shuffled elements in a new instance of SequenceBackingStore which behaves as a Sequence.
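The materialize-then-shuffle approach can be sketched with the standard collections alone. In this illustration an Iterable stands in for the input Sequence and an unmodifiable List stands in for the returned backing store. The Random parameter is an addition for reproducibility and is not part of the method signature documented here; ShuffleSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: an Iterable stands in for a Sequence.
final class ShuffleSketch {

    // Copies the elements into a new list, shuffles the copy and
    // returns it read-only. The input is never modified.
    static <C> List<C> shuffle(Iterable<C> sequence, Random random) {
        List<C> copy = new ArrayList<>();
        for (C c : sequence) {
            copy.add(c);
        }
        Collections.shuffle(copy, random);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<Character> dna = Arrays.asList('A', 'T', 'G', 'C', 'A');
        // Same compounds in a random order; the order depends on the seed.
        System.out.println(shuffle(dna, new Random(42)));
    }
}
```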
-
checksum
public static <C extends Compound> String checksum(Sequence<C> sequence)
Performs a simple CRC64 checksum on any given sequence.
-
nonOverlappingKmers
public static <C extends Compound> List<SequenceView<C>> nonOverlappingKmers(Sequence<C> sequence, int kmer)
Produces kmers of the specified size, e.g. ATGTGA returns two views which have ATG TGA.
Type Parameters:
C - Compound to use
Parameters:
sequence - Sequence to build from
kmer - Kmer size
Returns:
The list of non-overlapping K-mers
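The windowing in the ATGTGA example can be reproduced on a plain String by stepping forward a full k-mer at a time. This sketch returns substring copies rather than views, and it assumes that a trailing remainder shorter than the k-mer size is dropped, which the text above does not specify. NonOverlappingKmerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a String stands in for a Sequence, and
// substrings stand in for SequenceView instances.
final class NonOverlappingKmerSketch {

    // Advances by 'kmer' each step so windows never share a position.
    // Assumption: an incomplete window at the end is discarded.
    static List<String> nonOverlappingKmers(String seq, int kmer) {
        if (kmer <= 0) {
            throw new IllegalArgumentException("kmer must be positive");
        }
        List<String> kmers = new ArrayList<>();
        for (int start = 0; start + kmer <= seq.length(); start += kmer) {
            kmers.add(seq.substring(start, start + kmer));
        }
        return kmers;
    }

    public static void main(String[] args) {
        System.out.println(nonOverlappingKmers("ATGTGA", 3)); // prints [ATG, TGA]
    }
}
```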
-
overlappingKmers
public static <C extends Compound> List<SequenceView<C>> overlappingKmers(Sequence<C> sequence, int kmer)
Used to generate overlapping k-mers, e.g. ATGTA will give rise to ATG, TGT & GTA.
Type Parameters:
C - Compound to use
Parameters:
sequence - Sequence to build from
kmer - Kmer size
Returns:
The list of overlapping K-mers
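The overlapping case differs only in the step size: the window slides by one position instead of by a full k-mer, so a sequence of length n yields n - k + 1 windows. The sketch below shows this on a plain String, returning substring copies rather than views; OverlappingKmerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a String stands in for a Sequence, and
// substrings stand in for SequenceView instances.
final class OverlappingKmerSketch {

    // Slides a window of size 'kmer' one position at a time.
    static List<String> overlappingKmers(String seq, int kmer) {
        if (kmer <= 0) {
            throw new IllegalArgumentException("kmer must be positive");
        }
        List<String> kmers = new ArrayList<>();
        for (int start = 0; start + kmer <= seq.length(); start++) {
            kmers.add(seq.substring(start, start + kmer));
        }
        return kmers;
    }

    public static void main(String[] args) {
        System.out.println(overlappingKmers("ATGTA", 3)); // prints [ATG, TGT, GTA]
    }
}
```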
-
inverse
public static <C extends Compound> SequenceView<C> inverse(Sequence<C> sequence)
A method which attempts to do the right thing when it comes to a reverse/reverse complement.
Type Parameters:
C - The type of compound
Parameters:
sequence - The input sequence
Returns:
The inverted sequence which is optionally complemented
-
sequenceEqualityIgnoreCase
public static <C extends Compound> boolean sequenceEqualityIgnoreCase(Sequence<C> source, Sequence<C> target)
A case-insensitive manner of comparing two sequence objects together. Sequences which fail to match on their length or on the compound sets used are rejected. The code will also bail out the moment we find something is wrong with a Sequence. Cost to run is linear in the length of the Sequence.
Type Parameters:
C - The type of compound
Parameters:
source - Source sequence to assess
target - Target sequence to assess
Returns:
Boolean indicating if the sequences matched ignoring case
-
sequenceEquality
public static <C extends Compound> boolean sequenceEquality(Sequence<C> source, Sequence<C> target)
A case-sensitive manner of comparing two sequence objects together. Sequences which fail to match on their length or on the compound sets used are rejected. The code will also bail out the moment we find something is wrong with a Sequence. Cost to run is linear in the length of the Sequence.
Type Parameters:
C - The type of compound
Parameters:
source - Source sequence to assess
target - Target sequence to assess
Returns:
Boolean indicating if the sequences matched
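The early-exit comparison shared by both equality methods can be sketched on plain Strings: reject on a length mismatch first, then walk both inputs in step and stop at the first differing position. The compound-set check has no String equivalent and is omitted here. EqualitySketch and its ignoreCase flag are hypothetical and only illustrate the idea.

```java
// Hypothetical sketch: Strings stand in for Sequence objects.
final class EqualitySketch {

    // Linear-time comparison that stops at the first mismatch.
    static boolean sequenceEquality(String source, String target, boolean ignoreCase) {
        if (source.length() != target.length()) {
            return false; // cheap rejection before any per-element work
        }
        for (int i = 0; i < source.length(); i++) {
            char s = source.charAt(i);
            char t = target.charAt(i);
            boolean same = ignoreCase
                    ? Character.toLowerCase(s) == Character.toLowerCase(t)
                    : s == t;
            if (!same) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sequenceEquality("ATGC", "atgc", true));  // prints true
        System.out.println(sequenceEquality("ATGC", "atgc", false)); // prints false
    }
}
```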
-
-