This is a basic constexpr drop-in replacement for std::bitset, with generally better performance (this may change a little if vendors borrow the algorithms), a few small changes to existing functions, plus additional functions. The following stats summarise the significant performance results from the benchmarks versus std::bitset under GCC/libstdc++ and MSVC/MSSTL respectively (I did not test libc++).
Under release (O2, AVX2) builds it has:
Under debug builds it has:
See benchmarks for more details. Other performance characteristics are more or less the same between plf and std.
Like std::bitset it takes its size as a template parameter and allocates on the stack. If you want it to allocate on the heap, heap-allocate the whole bitset, i.e. plf::bitset<134> *temp = new plf::bitset<134>. In addition, it takes its storage type as an optional second template parameter (which must be a trivial unsigned integer type and defaults to std::size_t). This means that if you want the bitset to use the least space possible for a size which isn't a multiple of sizeof(size_t) * 8, you can write plf::bitset<134, unsigned char> or similar to reduce waste. All other functions match the std::bitset documentation, with the exceptions noted below.
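For example, a minimal sketch (assuming the header is named plf_bitset.h, in line with the other plf libraries; the functions used are the standard std::bitset ones):

```cpp
#include <iostream>
#include "plf_bitset.h" // assumed header name

int main()
{
	plf::bitset<134> flags;                 // stack-allocated, storage_type defaults to std::size_t
	plf::bitset<134, unsigned char> tight;  // same bit count, but padded only to the nearest byte

	flags.set(5);
	tight.set(5);
	std::cout << flags.test(5) << ' ' << flags.count() << '\n';

	// Heap allocation, as described above: heap-allocate the whole bitset.
	plf::bitset<134> *temp = new plf::bitset<134>;
	temp->set(0);
	delete temp;
	return 0;
}
```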
In addition plf::bitset brings new functions to the table:
- constexpr bitset(bitset &source) noexcept
- constexpr void operator = () noexcept
- constexpr bool any_range(size_t begin_index, size_t end_index), constexpr bool all_range(size_t begin_index, size_t end_index), constexpr bool none_range(size_t begin_index, size_t end_index): range variants of any(), all() and none(), operating only on the bits between the supplied indexes.
- constexpr size_type count_range(size_t begin_index, size_t end_index): range variant of count().
- constexpr void set_range(size_t begin_index, size_t end_index), constexpr void reset_range(size_t begin_index, size_t end_index), constexpr void set_range(size_t begin_index, size_t end_index, bool value): range variants of set() and reset().
- constexpr std::size_t first_one() noexcept, constexpr std::size_t last_one() noexcept, constexpr std::size_t first_zero() noexcept, constexpr std::size_t last_zero() noexcept: return the index of the first/last set or unset bit; if no such bit exists, they return std::numeric_limits<std::size_t>::max().
- constexpr std::size_t next_one(std::size_t index) noexcept, constexpr std::size_t prev_one(std::size_t index) noexcept, constexpr std::size_t next_zero(std::size_t index) noexcept, constexpr std::size_t prev_zero(std::size_t index) noexcept: return the index of the next/previous set or unset bit relative to the supplied index; if no such bit exists, they return std::numeric_limits<std::size_t>::max().
- constexpr void shift_left_range(std::size_t shift_amount, std::size_t first) noexcept, constexpr void shift_left_range_one(const std::size_t first) noexcept: like operator >>= but starting from a particular index - all bits before that index are unaffected. The second function is an optimization of the first specifically for shifting by one. I could've written shift_right_range and/or first/last variants as well, but I didn't personally have any need for them. The 'left' refers to the direction the bits move index-wise (which, to be honest, is how operator <<= should be defined).
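A brief usage sketch of the range and scan functions (again assuming the plf_bitset.h header name; consult the documentation above for whether end_index is inclusive):

```cpp
#include <iostream>
#include <limits>
#include "plf_bitset.h" // assumed header name

int main()
{
	plf::bitset<256> b;

	b.set_range(10, 20);    // set a contiguous run of bits
	b.reset_range(12, 14);  // clear part of that run again
	std::cout << "count_range(10, 20): " << b.count_range(10, 20) << '\n';

	// Scan functions return an index, or std::numeric_limits<std::size_t>::max() if no such bit exists:
	std::cout << "first_one(): " << b.first_one() << '\n';
	std::cout << "next_one(10): " << b.next_one(10) << '\n';

	if (b.first_zero() == std::numeric_limits<std::size_t>::max())
	{
		std::cout << "no zero bits\n";
	}

	// Range shift by one, starting from index 10; bits before index 10 are unaffected:
	b.shift_left_range_one(10);
	return 0;
}
```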
- constexpr std::basic_string<char_type, traits, allocator_type> to_rstring(): reversed-bit-order counterpart of to_string().
- constexpr unsigned long to_rulong() const and constexpr unsigned long long to_rullong() const: reversed-bit-order counterparts of to_ulong() and to_ullong().
- constexpr void swap(bitset &source) noexcept, void std::swap(bitset &a, bitset &b) noexcept: swap the contents of two bitsets.
- An additional template parameter, hardened (bool, false by default), which determines whether or not the set, reset, set_range, reset_range and operator [ ] functions do bounds-checking. All hardened-mode checks are constexpr, so the bounds-checking code won't be generated if hardened is false under C++17 and above.
- The default character type for to_string()/to_rstring() is signed char, not char (which is not signed on all platforms), as signedness is necessary for the algorithm to work. Likewise, all potential character types supplied to to_string()/to_rstring() must be signed.

plf::bitsetb has 4 template parameters: bool user_supplied_buffer = false, typename storage_type = std::size_t, class allocator_type = std::allocator<storage_type>, bool hardened = false. Size is assigned in the constructor.
With the "user_supplied_buffer" parameter left at its default of false, it allocates its own buffer on the heap using the allocator and deallocates it upon destruction. When this parameter is set to true it instead takes a buffer supplied by the user in the constructor (which is not deallocated upon destruction) - in this case no functions are noexcept, as the user could easily supply a NULL buffer, or a buffer smaller than the specified size. In both cases the size is supplied in the constructor, not as a template parameter.
Due to the dynamic allocation it is not as fast as plf::bitset (which allocates on the stack). But because size is a constructor parameter, you can have a bitsetb as a class member without every instance of the class having to contain the same-sized bitset, and without having to make the class a template (this is helpful when bulk-processing multiple instances of a class). Neither is possible when size is a template parameter (without using type erasure). Lastly, it means you can construct bitsets without having to know the bitset size at compile time.
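A sketch of the runtime-sized, per-instance use case described above (assumptions: the header name plf_bitsetb.h, and that the default self-allocating form takes its size as the constructor argument as described - check the header for the exact constructor signatures):

```cpp
#include <cstddef>
#include "plf_bitsetb.h" // assumed header name

// Non-template class whose instances can hold differently-sized bitsets:
struct entity_flags
{
	plf::bitsetb<> flags; // user_supplied_buffer = false: heap-allocates its own buffer

	explicit entity_flags(const std::size_t number_of_flags):
		flags(number_of_flags) // size supplied at runtime, not as a template parameter
	{}
};

int main()
{
	entity_flags small_entity(50), large_entity(100000);
	small_entity.flags.set(3);
	large_entity.flags.set(99999);
	return 0;
}
```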
Please note: using bitsetb for a very small bitset makes no sense - the internal pointer to the buffer plus the internal 'size' member total 128 bits on a 64-bit platform. So if you're making a 128-bit bitset you use twice as much memory with plf::bitsetb as with plf::bitset; if you merely want a very small bitset allocated on the heap and don't require different bitset sizes within instances of the same non-template class, heap-allocating a plf::bitset instance is a better way to go.
Other differences: it has a move constructor and a move assignment operator, whereas plf::bitset doesn't. A constexpr change_size(std::size_t new_size) function is provided for changing the size of the bitset (the bit sequence is truncated if the new size is smaller). operator = copies across a number of bytes equivalent to the smaller of the two bitsets' sizes. Swap does not swap buffers.
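And a sketch of the resizing and assignment behaviour just described (same assumptions as the previous example; set() is assumed to match plf::bitset):

```cpp
#include <utility>
#include "plf_bitsetb.h" // assumed header name

int main()
{
	plf::bitsetb<> a(200), b(100);

	a.set(150);
	b = a;              // copies only as many bytes as the smaller of the two sizes
	a.change_size(100); // shrink: the bit sequence beyond the new size is truncated

	plf::bitsetb<> c(std::move(a)); // move construction, not available in plf::bitset
	return 0;
}
```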
Note: for historical reasons, there's also a typedef of bitsetb<false> named bitsetc.
Iterators for bitsets don't work well. The reason is that, on ++ iteration, an iterator has to increment the bit location within the selected storage unit and, if that exceeds the number of bits in storage_type, move on to the next storage unit. This necessitates a branch instruction, which (and I did implement an iterator to prove this) makes it slower than simply using an index and the [] operator (which doesn't require a branch instruction). An iterator is also larger than a stored index, because it has to store both the index of the storage unit and the sub-storage-unit bit index. And even if you implement it with just an index number and use operator [] internally, you're only adding unnecessary boilerplate complexity over a plain std::size_t index, and slowing things down in debug mode by passing numbers to functions etcetera. Essentially there are no advantages to using an iterator over using a std::size_t and operator [].
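For reference, the recommended index-based pattern looks like this (a minimal sketch; next_one() could equally be used to jump straight between set bits):

```cpp
#include <cstddef>
#include <iostream>
#include "plf_bitset.h" // assumed header name

int main()
{
	plf::bitset<1000> b;
	b.set_range(100, 200);

	// Plain std::size_t index + operator [] - no iterator required:
	std::size_t total = 0;
	for (std::size_t index = 0; index != 1000; ++index)
	{
		if (b[index]) ++total;
	}

	std::cout << total << '\n';
	return 0;
}
```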
Because I couldn't see myself using them.
plf::bitset, plf::bitsetb and plf::bitsetc are under the Computing for Good ethical licence.
Download here or view the repository. The bitset libraries are simple .h header files, to be included with a #include command.
Results below were calculated on an Intel i7-9750H processor under Windows 10 in safe mode, using nuwen mingw GCC 13.2 and MSVC 2022 (results for earlier CPUs were either similar or better). Codegen was O2, AVX2 and C++23 for the 'Release' results, and just C++23 for the 'Debug' results. Code runs are mostly 'hot', with a number of 'warm-up' function passes before timing is calculated. Results are averaged across a large number of runs. All benchmark code is available here. I did not bother with benchmarks for bitsetb and bitsetc, since their buffers are allocated on the heap and as such they are not performance-comparable with std::bitset. The hardened template parameter was left at its default (off).
Results are only included below for functions where there were significant differences in performance between the bitset implementations. Please assume that for all other functions the results are roughly equivalent.

One can see here that while the libstdc++ approach optimizes almost equivalently to plf::bitset under GCC, the exact same approach (seriously, check the libstdc++ vs MSSTL bitset code) optimizes poorly in MSVC, being roughly 3 times slower. The code in question is a branch between two expressions for both libstdc++ and MSSTL, so my guess (I haven't checked the assembly) is that in this scenario GCC calculates both expressions and then performs a conditional move between the two results, whereas MSVC doesn't.
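To illustrate the general shape of the code being discussed (this is not the libstdc++/MSSTL or plf source, just a simplified example of a branching two-expression assignment versus a branchless one):

```cpp
#include <cstddef>

constexpr std::size_t bits_per_word = sizeof(std::size_t) * 8;

// Branching form: the compiler may or may not reduce the branch to a conditional move.
inline void set_bit_branching(std::size_t *storage, const std::size_t index, const bool value)
{
	std::size_t &word = storage[index / bits_per_word];
	const std::size_t mask = std::size_t(1) << (index % bits_per_word);
	if (value) word |= mask; else word &= ~mask;
}

// Branchless form: clear the bit, then OR in the (0 or 1) value at the right position.
inline void set_bit_branchless(std::size_t *storage, const std::size_t index, const bool value)
{
	std::size_t &word = storage[index / bits_per_word];
	const std::size_t bit = index % bits_per_word;
	word = (word & ~(std::size_t(1) << bit)) | (static_cast<std::size_t>(value) << bit);
}

int main()
{
	std::size_t storage[2] = {0, 0};
	set_bit_branching(storage, 5, true);
	set_bit_branchless(storage, 70, false);
	return 0;
}
```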

The reason MSVC has such a superior result here is that it has an intrinsic (I'm guessing assembly-level or AVX-optimized) instruction for converting the bits to a string. I'm not even going to *try* to compete with that. However, for GCC/libstdc++ we see a +144% performance improvement for plf::bitset over std::bitset.


Both shifts are roughly +100% faster for plf under GCC/libstdc++, and +35%/+20% faster under MSVC/MSSTL.

This result shows the difference between the optimized plf::bitset set_range/reset_range functions and doing the same thing in std::bitset using a loop and set() or reset(). The bitset size was 128000 bits. The operation was repeated 10000 times, with the start and end locations of the range of bits to set/reset randomly calculated each time - so anything between 1 and 127999 bits being set/reset. Under GCC set_range was +34652% faster than a std::bitset::set loop, while under MSVC it was +67612% faster.
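In outline, the operation being measured looks like the following (a simplified sketch, not the actual benchmark code - timing, warm-up passes and result averaging are omitted):

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <bitset>
#include "plf_bitset.h" // assumed header name

int main()
{
	static std::bitset<128000> std_bits;   // static to keep the 16 KB bitsets off the stack
	static plf::bitset<128000> plf_bits;

	for (int run = 0; run != 10000; ++run)
	{
		std::size_t begin = static_cast<std::size_t>(std::rand()) % 128000;
		std::size_t end = static_cast<std::size_t>(std::rand()) % 128000;
		if (begin > end) std::swap(begin, end);

		// std::bitset: one set() call per bit in the range.
		for (std::size_t index = begin; index != end; ++index) std_bits.set(index);

		// plf::bitset: one call covering the whole range.
		plf_bits.set_range(begin, end);
	}

	return 0;
}
```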

This shows the result of testing all functionality in plf::bitset vs std::bitset (minus the functions which are not shared between the two). Under GCC/libstdc++ there is a +24% overall performance increase; under MSVC/MSSTL it is +20%.

Here we see the results for plf are 3% faster under GCC/libstdc++, but equivalent under MSVC. This is surprising, since libstdc++ and MSSTL use almost exactly the same operator [ ] code. The main difference is the lack of bounds-checking in the plf code.
As a favour to game development, and all other fields which work extensively with debug builds as well as optimized builds and/or non-AVX builds, the following results are provided. Again, results are only shown where there were significant performance differences between the bitset implementations.

Despite optimizing well at O2/AVX2, the libstdc++ approach to set(pos, value) is 3x slower than the plf approach in debug mode, while the same approach is only 31% slower than plf under MSVC - despite being 2x slower in release mode.

The superior result for MSVC's to_string intrinsic function is even more prominent here. For GCC/libstdc++ we see +27% improvement for plf over std.


For both results plf's approach is roughly +110% faster under GCC, while being around +85%/+66% faster under MSVC.

In debug mode we see plf's set_range is +428127% faster than a GCC/libstdc++ std::bitset::set loop, while in MSVC/MSSTL it is +750726% faster.

Under GCC/libstdc++ plf is +175% faster than std, under MSVC/MSSTL it is +40% faster.

In debug mode we see the plf approach is +360% faster under GCC and +132% faster under MSVC.
Contact: 
plf:: library and this page Copyright (c) 2026, Matthew Bentley