Class BitSequenceReader.BitArrayWorker<C extends Compound>
- java.lang.Object
  - org.biojava.nbio.core.sequence.storage.BitSequenceReader.BitArrayWorker<C>
- Type Parameters:
  C - The Compound to use
- Direct Known Subclasses:
  FourBitSequenceReader.FourBitArrayWorker, TwoBitSequenceReader.TwoBitArrayWorker
- Enclosing class:
  BitSequenceReader<C extends Compound>
public abstract static class BitSequenceReader.BitArrayWorker<C extends Compound> extends Object
The logic for working with bit-level storage has been separated out into this class so that developers can create the bit data structures without first putting the code into an intermediate format, and can use the format without needing to copy this code. This class behaves just like a Sequence but does not implement the interface.
- Author:
  ayates
-
Field Summary
Fields (Modifier and Type, Field):
- static int BYTES_PER_INT
-
Constructor Summary
Constructors:
- BitArrayWorker(String sequence, CompoundSet<C> compoundSet)
- BitArrayWorker(CompoundSet<C> compoundSet, int length)
- BitArrayWorker(CompoundSet<C> compoundSet, int[] sequence)
- BitArrayWorker(Sequence<C> sequence)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- protected abstract byte bitMask()
  This method should return the bit mask to be used to extract the bytes you are interested in working with.
- protected int bitsPerCompound()
  Returns how many bits are used to represent a compound, e.g. 2 if using 2bit encoding.
- protected abstract int compoundsPerDatatype()
  Should return the maximum number of compounds we can encode per int.
- boolean equals(Object o)
- protected abstract Map<C,Integer> generateCompoundsToIndex()
  Returns what the value of a compound is in the backing bit storage.
- protected abstract List<C> generateIndexToCompounds()
  Should return the inverse of the information that generateCompoundsToIndex() returns, i.e. if the Compound C returns 1 from compoundsToIndex then we should find that compound here in position 1.
- C getCompoundAt(int position)
  Returns the compound at the specified biological index.
- CompoundSet<C> getCompoundSet()
  Returns the compound set backing this store.
- protected Map<C,Integer> getCompoundsToIndexLookup()
  Returns a map which converts from compound to an integer representation.
- protected List<C> getIndexToCompoundsLookup()
  Returns a list of compounds, the index position of which is used to translate from the byte representation into a compound.
- int getLength()
- int hashCode()
- void populate(String sequence)
  Loops through the chars in a String and passes them on to setCompoundAt(char, int).
- void populate(Sequence<C> sequence)
  Loops through the Compounds in a Sequence and passes them on to setCompoundAt(Compound, int).
- protected byte processUnknownCompound(C compound, int position)
  Since bit encoding only supports a finite number of bases, it is more than likely that when processing a sequence you will encounter a compound which is not covered by the encoding.
- int seqArraySize(int length)
- void setCompoundAt(char base, int position)
  Converts from char to Compound and sets it at the given biological index.
- void setCompoundAt(C compound, int position)
  Sets the compound at the specified biological index.
-
-
-
Field Detail
-
BYTES_PER_INT
public static final int BYTES_PER_INT
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
BitArrayWorker
public BitArrayWorker(Sequence<C> sequence)
-
BitArrayWorker
public BitArrayWorker(String sequence, CompoundSet<C> compoundSet)
-
BitArrayWorker
public BitArrayWorker(CompoundSet<C> compoundSet, int length)
-
BitArrayWorker
public BitArrayWorker(CompoundSet<C> compoundSet, int[] sequence)
-
-
Method Detail
-
bitMask
protected abstract byte bitMask()
This method should return the bit mask to be used to extract the bytes you are interested in working with. See the concrete implementations for how to create these.
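As a rough illustration of what such a mask does, the sketch below pulls 2-bit values out of a packed int. The class `BitMaskSketch`, the constant `TWO_BIT_MASK` and the helper `extract` are hypothetical names invented for this example; they are not part of BioJava, and the real subclasses may construct their masks differently.

```java
final class BitMaskSketch {
    // Hypothetical mask for a 2-bit encoding: binary 11 keeps the lowest two bits.
    static final byte TWO_BIT_MASK = (byte) ((1 << 2) - 1);

    // Extracts the value stored in the given zero-based slot of a packed int.
    static int extract(int packed, int slot, int bitsPerValue, byte mask) {
        int shift = slot * bitsPerValue;   // how far the slot sits from the low end
        return (packed >>> shift) & mask;  // move it down, then keep only the masked bits
    }

    public static void main(String[] args) {
        int packed = 0b11_10_01_00; // slots 0..3 hold the values 0, 1, 2, 3
        for (int slot = 0; slot < 4; slot++) {
            System.out.println("slot " + slot + " -> " + extract(packed, slot, 2, TWO_BIT_MASK));
        }
    }
}
```

The same helper would work for a 4-bit encoding with a mask of binary 1111 and `bitsPerValue` set to 4.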
-
compoundsPerDatatype
protected abstract int compoundsPerDatatype()
Should return the maximum number of compounds we can encode per int.
-
generateIndexToCompounds
protected abstract List<C> generateIndexToCompounds()
Should return the inverse of the information that generateCompoundsToIndex() returns, i.e. if the Compound C returns 1 from compoundsToIndex then we should find that compound here in position 1.
-
generateCompoundsToIndex
protected abstract Map<C,Integer> generateCompoundsToIndex()
Returns the value of each compound in the backing bit storage, i.e. in 2bit storage the value 0 is encoded as 00 (in binary).
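The relationship between the two lookups can be sketched as follows. This is a minimal illustration that uses `Character` in place of a `Compound`; the class `CompoundIndexSketch`, its method names and the T/C/A/G ordering are assumptions made for the example, not values taken from the BioJava subclasses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CompoundIndexSketch {
    // Hypothetical 2-bit table: each base maps to the value stored in the bit array.
    static Map<Character, Integer> compoundsToIndex() {
        Map<Character, Integer> map = new LinkedHashMap<>();
        map.put('T', 0); // 00
        map.put('C', 1); // 01
        map.put('A', 2); // 10
        map.put('G', 3); // 11
        return map;
    }

    // Inverse lookup: position i holds the compound whose stored value is i.
    static List<Character> indexToCompounds(Map<Character, Integer> toIndex) {
        Character[] slots = new Character[toIndex.size()];
        for (Map.Entry<Character, Integer> entry : toIndex.entrySet()) {
            slots[entry.getValue()] = entry.getKey();
        }
        return new ArrayList<>(Arrays.asList(slots));
    }

    public static void main(String[] args) {
        Map<Character, Integer> toIndex = compoundsToIndex();
        List<Character> toCompound = indexToCompounds(toIndex);
        for (Map.Entry<Character, Integer> entry : toIndex.entrySet()) {
            System.out.println(entry.getKey() + " <-> " + entry.getValue()
                    + " <-> " + toCompound.get(entry.getValue()));
        }
    }
}
```

Deriving the list from the map, as done here, is one way to keep the two lookups consistent with each other.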
-
bitsPerCompound
protected int bitsPerCompound()
Returns how many bits are used to represent a compound, e.g. 2 if using 2bit encoding.
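The arithmetic linking this value to compoundsPerDatatype() can be sketched as below. It assumes the bits of a 32-bit int are divided evenly between compounds; the class `BitsPerCompoundSketch` and the helper `intsNeeded` are hypothetical. `intsNeeded` may resemble the kind of calculation behind seqArraySize(int), but that is a guess, since the page does not describe that method.

```java
final class BitsPerCompoundSketch {
    // Assumption: the width of an int is shared evenly between the compounds packed into it.
    static int bitsPerCompound(int compoundsPerDatatype) {
        return Integer.SIZE / compoundsPerDatatype;
    }

    // Hypothetical helper: number of ints needed to hold `length` compounds (rounds up).
    static int intsNeeded(int length, int compoundsPerDatatype) {
        return (length + compoundsPerDatatype - 1) / compoundsPerDatatype;
    }

    public static void main(String[] args) {
        System.out.println(bitsPerCompound(16)); // 16 compounds per int: 2 bits each
        System.out.println(bitsPerCompound(8));  // 8 compounds per int: 4 bits each
        System.out.println(intsNeeded(17, 16));  // 17 compounds spill into a second int
    }
}
```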
-
seqArraySize
public int seqArraySize(int length)
-
populate
public void populate(Sequence<C> sequence)
Loops through the Compounds in a Sequence and passes them on to setCompoundAt(Compound, int).
-
populate
public void populate(String sequence)
Loops through the chars in a String and passes them on to setCompoundAt(char, int).
-
setCompoundAt
public void setCompoundAt(char base, int position)
Converts from char to Compound and sets it at the given biological index
-
setCompoundAt
public void setCompoundAt(C compound, int position)
Sets the compound at the specified biological index
-
getCompoundAt
public C getCompoundAt(int position)
Returns the compound at the specified biological index
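To make the populate, set and get operations concrete, here is a small self-contained sketch of a 2-bit packed store. It is not the BioJava implementation: the class `PackedStoreSketch`, its `TCAG` table and its bit layout are assumptions, and it takes "biological index" to mean a 1-based position.

```java
final class PackedStoreSketch {
    private static final int BITS = 2;                      // bits per compound (assumed 2-bit encoding)
    private static final int PER_INT = Integer.SIZE / BITS; // 16 compounds fit in one int
    private static final int MASK = (1 << BITS) - 1;        // binary 11
    private static final String ALPHABET = "TCAG";          // hypothetical index-to-compound table

    private final int[] data;
    private final int length;

    PackedStoreSketch(int length) {
        this.length = length;
        this.data = new int[(length + PER_INT - 1) / PER_INT]; // round up to whole ints
    }

    // Walks the String and stores each char at its 1-based position.
    void populate(String sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            setCompoundAt(sequence.charAt(i), i + 1);
        }
    }

    // Clears the slot for the 1-based position, then writes the encoded value into it.
    void setCompoundAt(char base, int position) {
        int value = ALPHABET.indexOf(base);
        if (value < 0) {
            throw new IllegalStateException("Cannot encode " + base + " at position " + position);
        }
        int zeroBased = position - 1;
        int word = zeroBased / PER_INT;
        int shift = (zeroBased % PER_INT) * BITS;
        data[word] = (data[word] & ~(MASK << shift)) | (value << shift);
    }

    // Reads the slot for the 1-based position and translates it back to a char.
    char getCompoundAt(int position) {
        int zeroBased = position - 1;
        int shift = (zeroBased % PER_INT) * BITS;
        return ALPHABET.charAt((data[zeroBased / PER_INT] >>> shift) & MASK);
    }

    int getLength() {
        return length;
    }

    public static void main(String[] args) {
        PackedStoreSketch store = new PackedStoreSketch(8);
        store.populate("ACGTTGCA");
        StringBuilder out = new StringBuilder();
        for (int position = 1; position <= store.getLength(); position++) {
            out.append(store.getCompoundAt(position));
        }
        System.out.println(out);
    }
}
```

Clearing the slot before writing means a position can be overwritten without disturbing its neighbours in the same int.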
-
processUnknownCompound
protected byte processUnknownCompound(C compound, int position) throws IllegalStateException
Since bit encoding only supports a finite number of bases, it is more than likely that when processing a sequence you will encounter a compound which is not covered by the encoding, e.g. N in a 2bit sequence. You can override this to convert the unknown base into one you can process, or to store the locations of unknown bases for a level of post-processing in your subclass.
- Parameters:
  compound - Compound to process
- Returns:
  Byte representation of the compound
- Throws:
  IllegalStateException - Done whenever this method is invoked
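The override pattern described above might look roughly like the following. Both classes are hypothetical stand-ins written for this example (they work on `char` rather than a `Compound` and do not extend the BioJava worker); the placeholder value returned for unknown bases is likewise an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a worker whose default is to reject unknown compounds.
class StrictEncoderSketch {
    byte encode(char base, int position) {
        int value = "TCAG".indexOf(base); // assumed 2-bit table
        return value >= 0 ? (byte) value : processUnknownCompound(base, position);
    }

    // Default behaviour: always throw for compounds outside the encoding.
    protected byte processUnknownCompound(char base, int position) {
        throw new IllegalStateException("Unsupported compound " + base + " at position " + position);
    }
}

// Subclass that records where unknown compounds occurred and stores a placeholder instead.
class LenientEncoderSketch extends StrictEncoderSketch {
    final List<Integer> unknownPositions = new ArrayList<>();

    @Override
    protected byte processUnknownCompound(char base, int position) {
        unknownPositions.add(position); // keep the location for later post-processing
        return 0;                       // placeholder: value 0 is 'T' in this assumed table
    }
}
```

A caller could later walk `unknownPositions` to restore or report the bases that were replaced by the placeholder.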
-
getIndexToCompoundsLookup
protected List<C> getIndexToCompoundsLookup()
Returns a list of compounds, the index position of which is used to translate from the byte representation into a compound.
-
getCompoundsToIndexLookup
protected Map<C,Integer> getCompoundsToIndexLookup()
Returns a map which converts from compound to an integer representation
-
getCompoundSet
public CompoundSet<C> getCompoundSet()
Returns the compound set backing this store
-
getLength
public int getLength()
-
-