Bit vector  0.9
Fast and space efficient dynamic bit vector library.
Public Member Functions | Protected Member Functions | Protected Attributes | Static Protected Attributes | List of all members
bv::leaf< buffer_size, avx > Class Template Reference

Simple flat dynamic bit vector for use as a leaf for a dynamic b-tree bit vector. More...

#include <leaf.hpp>

Public Member Functions

 leaf (uint64_t capacity, uint64_t *data)
 Leaf constructor. More...
 
bool at (const uint32_t i) const
 Get the value of the ith element in the leaf. More...
 
uint32_t p_sum () const
 Getter for p_sum_.
 
uint32_t size () const
 Getter for size_.
 
void insert (const uint64_t i, const bool x)
 Insert x into position i. More...
 
bool remove (const uint64_t i)
 Remove the ith bit from the leaf. More...
 
int set (const uint64_t i, const bool x)
 Set the ith bit to x. More...
 
uint32_t rank (const uint64_t n) const
 Number of 1-bits up to position n. More...
 
uint32_t rank (const uint64_t n, const uint64_t offset) const
 Number of 1-bits up to position n from position offset. More...
 
uint32_t select (const uint32_t x) const
 Index of the xth 1-bit in the data structure. More...
 
uint32_t select (const uint32_t x, uint32_t pos, uint32_t pop) const
 Index of the xth 1-bit in the data structure starting at pos with pop More...
 
uint64_t bits_size () const
 Size of the leaf and associated data in bits.
 
bool need_realloc () const
 Determine if reallocation / splitting is required prior to additional insertion of new elements. More...
 
uint32_t capacity () const
 Size of the leaf-associated data storage in 64-bit words. More...
 
void capacity (uint32_t cap)
 Set the size of the leaf-associated data storage. More...
 
void set_data_ptr (uint64_t *ptr)
 Sets the pointer to the leaf-associated data storage. More...
 
uint64_t * data ()
 Provides access to raw leaf-associated data. More...
 
void clear_first (uint64_t elems)
 Remove the fist "elems" elements from the leaf. More...
 
void transfer_append (leaf *other, uint64_t elems)
 Move "elems" elements from the start of "other" to the end of "this". More...
 
void clear_last (uint64_t elems)
 Remove the last "elems" elements from the leaf. More...
 
void transfer_prepend (leaf *other, uint64_t elems)
 Move "elems" elements from the end of "other" to the start of "this". More...
 
void append_all (leaf *other)
 Copy all elements from (the start of) "other" to the end of "this". More...
 
void commit ()
 Commit and clear the Insert/Remove buffer for the leaf. More...
 
uint64_t validate () const
 Ensure that the leaf is in a valid state. More...
 
void print (bool internal_only) const
 Output data stored in the leaf as json. More...
 

Protected Member Functions

bool buffer_value (uint32_t e) const
 Extract the value of a buffer element. More...
 
bool buffer_is_insertion (uint32_t e) const
 Extract type information of a buffer element. More...
 
uint32_t buffer_index (uint32_t e) const
 Extract index information from a butter element. More...
 
void set_buffer_index (uint32_t v, uint8_t i)
 Updates index information for a specified buffer element. More...
 
uint32_t create_buffer (uint32_t idx, bool t, bool v)
 Creates a new 32-bit buffer element with the given parameters. More...
 
void insert_buffer (uint8_t idx, uint32_t buf)
 Insert a new element into the buffer. More...
 
void delete_buffer_element (uint8_t idx)
 Remove an element from the buffer. More...
 
void push_back (const bool x)
 Add an element to the end of the leaf data. More...
 

Protected Attributes

uint8_t buffer_count_
 Number of elements in insert/remove buffer.
 
uint16_t capacity_
 Number of 64-bit integers available in data.
 
uint32_t size_
 Logical number of bits stored.
 
uint32_t p_sum_
 Logical number of 1-bits stored.
 
uint32_t buffer_ [buffer_size]
 Insert/remove buffer.
 
uint64_t * data_
 Pointer to data storage.
 

Static Protected Attributes

static const constexpr uint64_t MASK = 1
 0x1 to be used in bit operations.
 
static const constexpr uint32_t VALUE_MASK = 1
 Mask for accessing buffer value.
 
static const constexpr uint32_t TYPE_MASK = 8
 Mask for accessing buffer type.
 
static const constexpr uint32_t INDEX_MASK = ((uint32_t(1) << 8) - 1)
 Mask for accessing buffer index value.
 

Detailed Description

template<uint8_t buffer_size, bool avx = true>
class bv::leaf< buffer_size, avx >

Simple flat dynamic bit vector for use as a leaf for a dynamic b-tree bit vector.

The leaf is not usable as fully featured bit vector by itself since the leaf cannot reallocate itself when full. The leaf requires a parent structure (like bv::bit_vector or bv::node) to manage reallocations and rebalancing.

Practical limitations

The maximum leaf size for a buffered leaf without risking undefined behaviour is \(2^{24} - 1\) due to buffer elements storing reference indexes in 24-bits of 32-bit integers. The practical upper limit of leaf size due to performance concerns is no more than \(2^{20}\) and optimal leaf size is likely to be closer to the \(2^{12}\) to \(2^{15}\) range.

Maximum leaf size can't in effect be non-divisible by 64, since 64-bit integers are used for storage and a leaf "will use" all available words fully before indicating that a reallocation is necessary, triggering limit checks.

The maximum buffer size is 63 due to storing the information on buffer usage with 6 bits of an 8-bit word, with 2 bits reserved for other purposes. This could easily be "fixed", but testing indicates that bigger buffer sizes are unlikely to be of practical use.

Template Parameters
Sizeof insertion/removal buffer.
Useavx population counts for rank operations.

Constructor & Destructor Documentation

◆ leaf()

template<uint8_t buffer_size, bool avx = true>
bv::leaf< buffer_size, avx >::leaf ( uint64_t  capacity,
uint64_t *  data 
)
inline

Leaf constructor.

The intention of this constructor is to enable either allocation of a larger consecutive memory block where the leaf struct can be placed at the start followed by the "data" section, or the data section can be allocated separately.

Parameters
capacityNumber of 64-bit integers available for use in data.
dataPointer to a contiguous memory area available for storage.

Member Function Documentation

◆ append_all()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::append_all ( leaf< buffer_size, avx > *  other)
inline

Copy all elements from (the start of) "other" to the end of "this".

Intended for merging leaves.

Will not ensure sufficient capacity for merge.

Will ensure that the buffers are empty for both leaves.

Will now clear elements from "other" as it is assumed that "other" will be deallocated .

Parameters
otherPointer to next leaf.

◆ at()

template<uint8_t buffer_size, bool avx = true>
bool bv::leaf< buffer_size, avx >::at ( const uint32_t  i) const
inline

Get the value of the ith element in the leaf.

Parameters
iIndex to check
Returns
Value of bit at index i.

◆ buffer_index()

template<uint8_t buffer_size, bool avx = true>
uint32_t bv::leaf< buffer_size, avx >::buffer_index ( uint32_t  e) const
inlineprotected

Extract index information from a butter element.

The 24 most significant bits of the 32-bit buffer element contain index information on the insert/removal operation.

Parameters
eBuffer element to extract index from.
Returns
Index information related to the buffer element.

◆ buffer_is_insertion()

template<uint8_t buffer_size, bool avx = true>
bool bv::leaf< buffer_size, avx >::buffer_is_insertion ( uint32_t  e) const
inlineprotected

Extract type information of a buffer element.

The fourth least significant bit (0b1000), contains a 1 if the buffered operation is an insertion.

Parameters
eBuffer element to extract type from.
Returns
Boolean value True if the buffer is related to an insert operations, and false if the buffer is related to a removal.

◆ buffer_value()

template<uint8_t buffer_size, bool avx = true>
bool bv::leaf< buffer_size, avx >::buffer_value ( uint32_t  e) const
inlineprotected

Extract the value of a buffer element.

The First bit (lsb) of a 23-bit buffer element contains the value of the element.

I.e. if bv[i] is 1, then the buffer created for remove(i) will have a lsb with value 1.

Parameters
eBuffer element to extract value from.
Returns
Boolean value indicating the value of the element referred to by the buffer.

◆ capacity() [1/2]

template<uint8_t buffer_size, bool avx = true>
uint32_t bv::leaf< buffer_size, avx >::capacity ( ) const
inline

Size of the leaf-associated data storage in 64-bit words.

Primarily intended for debugging and validation.

Returns
Size of internal data storage in 64-bit words.

◆ capacity() [2/2]

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::capacity ( uint32_t  cap)
inline

Set the size of the leaf-associated data storage.

Intended only to be set by an allocator after allocating additional storage for the leaf. Setting the capacity carelessly easily leads to undefined behaviour.

Parameters
capNew size for data_

◆ clear_first()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::clear_first ( uint64_t  elems)
inline

Remove the fist "elems" elements from the leaf.

Intended for removing elements that have been copied to a neighbouring leaf. Assumes that buffer has been committed before calling.

Will update size and p_sum.

Parameters
elemsnumber of elements to remove.

◆ clear_last()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::clear_last ( uint64_t  elems)
inline

Remove the last "elems" elements from the leaf.

Intended for removing elements that have been copied to a neighbouring leaf. Assumes that buffer has been committed before calling.

Will update size and p_sum.

Parameters
elemsnumber of elements to remove.

◆ commit()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::commit ( )
inline

Commit and clear the Insert/Remove buffer for the leaf.

Intended for clearing a full buffer before insertion or removal, and for ensuring an empty buffer before transfer operations.

Slightly complicated but linear time function for committing all buffered operations to the underlying data.

◆ create_buffer()

template<uint8_t buffer_size, bool avx = true>
uint32_t bv::leaf< buffer_size, avx >::create_buffer ( uint32_t  idx,
bool  t,
bool  v 
)
inlineprotected

Creates a new 32-bit buffer element with the given parameters.

A new buffer element is typically created for insertion into the buffer.

Parameters
idxIndex value for the new buffer element.
tType (Insert/Remove) of new buffer element.
vValue associated with the new buffer element.

◆ data()

template<uint8_t buffer_size, bool avx = true>
uint64_t* bv::leaf< buffer_size, avx >::data ( )
inline

Provides access to raw leaf-associated data.

Intended for use when rebalancing and splitting leaves.

It is generally a very good idea to make sure the leaf buffer is committed before accessing the raw leaf data.

Returns
Pointer to raw leaf data.

◆ delete_buffer_element()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::delete_buffer_element ( uint8_t  idx)
inlineprotected

Remove an element from the buffer.

Existing elements with index greater than idx get shuffled backwards and the old element at idx will be overwritten.

Parameters
idxLocation of the element in the buffer

◆ insert()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::insert ( const uint64_t  i,
const bool  x 
)
inline

Insert x into position i.

The leaf cannot reallocate so inserting into a full leaf is undefined behaviour.

If the the buffer is not full or the insertion is appending a new element, the operation can be considered constant time. If the buffer is full and the insertion is not appending, the operations will be linear in the leaf size.

Parameters
iInsertion index
xValue to insert

◆ insert_buffer()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::insert_buffer ( uint8_t  idx,
uint32_t  buf 
)
inlineprotected

Insert a new element into the buffer.

Existing elements with index idx or greater gets shuffled forward and the new element will overwrite the buffer at index idx.

Parameters
idxPosition of the new element in the buffer.
bufElement to insert.

◆ need_realloc()

template<uint8_t buffer_size, bool avx = true>
bool bv::leaf< buffer_size, avx >::need_realloc ( ) const
inline

Determine if reallocation / splitting is required prior to additional insertion of new elements.

Returns
True if there is no guarantee that an insertion can be completed without undefined behaviour.

◆ print()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::print ( bool  internal_only) const
inline

Output data stored in the leaf as json.

Will output valid json, but will not output a trailing newline since the call is expected to be part of a recursive tree traversal.

Parameters
internal_onlyWill not output raw data it true to save space.

◆ push_back()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::push_back ( const bool  x)
inlineprotected

Add an element to the end of the leaf data.

If naively writing to the next available position would cause an overflow, the buffer will be committed and this will guarantee that the next available position will become valid assuming proper handling of the leaf by the parent element.

Parameters
xValue to be appended to the data.

◆ rank() [1/2]

template<uint8_t buffer_size, bool avx = true>
uint32_t bv::leaf< buffer_size, avx >::rank ( const uint64_t  n) const
inline

Number of 1-bits up to position n.

Counts the number of bits set in the first n bits.

This is a simple linear operations of population counting.

Parameters
nNumber of elements to include in the "summation".
Returns
\(\sum_{i = 0}^{\mathrm{index - 1}} \mathrm{bv}[i]\).

◆ rank() [2/2]

template<uint8_t buffer_size, bool avx = true>
uint32_t bv::leaf< buffer_size, avx >::rank ( const uint64_t  n,
const uint64_t  offset 
) const
inline

Number of 1-bits up to position n from position offset.

Counts the number of bits set in the [offset n) range.

This is a simple linear operations of population counting.

Parameters
nEnd position of summation.
offsetStart position of summation.
Returns
\(\sum_{i = \mathrm{offset}}^{\mathrm{n- 1}} \mathrm{bv}[i]\).

◆ remove()

template<uint8_t buffer_size, bool avx = true>
bool bv::leaf< buffer_size, avx >::remove ( const uint64_t  i)
inline

Remove the ith bit from the leaf.

No implicit balancing is possible in leaves without a managing parent (bv::bit_vector or bv::node).

If the the buffer is not full the operation can be considered constant time. If the buffer is full, the operations will be linear in the leaf size.

Parameters
iIndex to remove
Returns
Value of removed element.

◆ select() [1/2]

template<uint8_t buffer_size, bool avx = true>
uint32_t bv::leaf< buffer_size, avx >::select ( const uint32_t  x) const
inline

Index of the xth 1-bit in the data structure.

Select is a simple (if somewhat slow) linear time operation.

Parameters
xSelection target.
Returns
\(\underset{i \in [0..n)}{\mathrm{arg min}}\left(\sum_{j = 0}^i \mathrm{bv}[j]\right) = x\).

◆ select() [2/2]

template<uint8_t buffer_size, bool avx = true>
uint32_t bv::leaf< buffer_size, avx >::select ( const uint32_t  x,
uint32_t  pos,
uint32_t  pop 
) const
inline

Index of the xth 1-bit in the data structure starting at pos with pop

Parameters
xSelection target.
posStart position of Select calculation.
popStart population as pos
Returns
\(\underset{i \in [0..n)}{\mathrm{arg min}}\left(\sum_{j = 0}^i \mathrm{bv}[j]\right) = x\).

◆ set()

template<uint8_t buffer_size, bool avx = true>
int bv::leaf< buffer_size, avx >::set ( const uint64_t  i,
const bool  x 
)
inline

Set the ith bit to x.

The value is changed in constant time (if buffer size is considered a constant). The change in data structure sum is returned for updateing cumulative sums in ancestor objects.

Parameters
iIndex of bit to modify
xNew value for the ith element.
Returns
\(x - \mathrm{bv}[i]\)

◆ set_buffer_index()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::set_buffer_index ( uint32_t  v,
uint8_t  i 
)
inlineprotected

Updates index information for a specified buffer element.

Clears index information for the ith buffer element and replases it with v.

Parameters
vValue to set the buffer index to.
iIndex of buffer element in buffer.

◆ set_data_ptr()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::set_data_ptr ( uint64_t *  ptr)
inline

Sets the pointer to the leaf-associated data storage.

Intended only to be set by an allocator after allocating additional storage for the leaf. Setting the the pointer carelessly easily leads to undefined behaviour.

Parameters
ptrPointer to data storage.

◆ transfer_append()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::transfer_append ( leaf< buffer_size, avx > *  other,
uint64_t  elems 
)
inline

Move "elems" elements from the start of "other" to the end of "this".

Will not ensure sufficient capacity for the copy target.

The operation will ensure that the buffers of both leaves are cleared before transferring elements. Elements are first copied, after which, elements are cleared from the sibling with other->clear_first(elems).

Parameters
otherPointer to the next sibling.
elemsNumber of elements to transfer.

◆ transfer_prepend()

template<uint8_t buffer_size, bool avx = true>
void bv::leaf< buffer_size, avx >::transfer_prepend ( leaf< buffer_size, avx > *  other,
uint64_t  elems 
)
inline

Move "elems" elements from the end of "other" to the start of "this".

Will not ensure sufficient capacity for the copy target.

The operation will ensure that the buffers of both leaves are cleared before transferring elements. Elements are first copied, after which, elements are cleared from the sibling with other->clear_last(elems).

Parameters
otherPointer to the previous sibling.
elemsNumber of elements to transfer.

◆ validate()

template<uint8_t buffer_size, bool avx = true>
uint64_t bv::leaf< buffer_size, avx >::validate ( ) const
inline

Ensure that the leaf is in a valid state.

Will use assertions to check that the actual stored data agrees with stored metadata. Will not in any way that the leaf is in the correct state given the preceding sequence of operations, just that the leaf is in a state that is valid for a leaf.

If the compiler flag -DNDEBUG is given, this function will do nothing and may be completely optimized out by an optimizing compiler.

Returns
1 for calculating the number of nodes in the tree.

The documentation for this class was generated from the following file: