
PLF C++ Library - plf::bitset/bitsetb/bitsetc

Introduction

plf::bitset

This is a basic constexpr drop-in replacement for std::bitset, with generally better performance (this may change a little if vendors borrow the algorithms), some small changes to existing functions, plus additional functions. The following stats summarise the significant performance results from the benchmarks versus std::bitset, under GCC/libstdc++ and MSVC/MSSTL respectively (I did not test libc++).

Under release (O2, AVX2) builds it has:

- roughly 3x the set(position, value) performance of std::bitset under MSVC (roughly equivalent under GCC)
- +144% faster to_string() under GCC (MSVC's to_string intrinsic remains faster than both)
- roughly +100% faster << and >> shifting under GCC, and +35%/+20% faster under MSVC
- +34652% (GCC) and +67612% (MSVC) faster ranged setting/resetting via set_range/reset_range, versus a std::bitset set()/reset() loop
- +24% (GCC) and +20% (MSVC) better overall performance in the test-suite benchmark
- 3% faster operator [] under GCC (equivalent under MSVC)

Under debug builds it has:

- roughly 3x the set(position, value) performance of std::bitset under GCC (std is 31% slower under MSVC)
- +27% faster to_string() under GCC
- roughly +110% faster shifting under GCC, and +85%/+66% faster under MSVC
- +428127% (GCC) and +750726% (MSVC) faster ranged setting/resetting versus a set()/reset() loop
- +175% (GCC) and +40% (MSVC) better overall performance in the test-suite benchmark
- +360% (GCC) and +132% (MSVC) faster operator []

See benchmarks for more details. Other performance characteristics are more or less the same between plf and std.

Like std::bitset it takes its size as a template parameter and allocates on the stack. If you want it to allocate on the heap, heap-allocate the whole bitset, i.e. plf::bitset<134> *temp = new plf::bitset<134>. In addition, it takes its storage type as an optional second template parameter (this must be a trivial unsigned integer type; std::size_t is used by default). This means that, if you want the bitset to use the least space possible for a size which isn't a multiple of sizeof(std::size_t) * 8, you can write plf::bitset<134, unsigned char> or similar to reduce waste. All other functions match the std::bitset documentation, with the exceptions noted below.
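As a rough sketch of the above (the header filename is assumed from the repository; the member functions shown are ones std::bitset already provides):

#include "plf_bitset.h" // header filename assumed

int main()
{
    plf::bitset<134> flags;                  // stack-allocated, std::size_t storage by default
    plf::bitset<134, unsigned char> compact; // same number of bits, unsigned char storage to reduce waste

    flags.set(5);
    compact.set(7);

    // To put the bitset on the heap instead, heap-allocate the whole object:
    plf::bitset<134> *temp = new plf::bitset<134>;
    temp->set(0);
    delete temp;

    return (flags[5] && compact[7]) ? 0 : 1; // operator [] reads a bit, as with std::bitset
}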

New functionality

In addition, plf::bitset brings new functions to the table:

Other differences

plf::bitsetb

This has 4 template parameters: bool user_supplied_buffer = false, typename storage_type = std::size_t, class allocator_type = std::allocator<storage_type>, bool hardened = false. With user_supplied_buffer left at its default of false, the bitset allocates its own buffer on the heap using the allocator and deallocates it upon destruction. When this parameter is set to true, it takes a buffer supplied by the user in the constructor (which is not deallocated upon destruction); in this case no functions are noexcept, as the user could easily supply a NULL buffer, or a buffer smaller than the specified size. In both cases the size is supplied in the constructor, not as a template parameter.

Due to the dynamic allocation it is not as fast as plf::bitset (which allocates on the stack). But because size is a constructor parameter, you can have a bitsetb as a class member without every instance of that class having to use the same bitset size, and without having to make the class a template (this is helpful when bulk-processing multiple instances of a class). Neither is possible when size is a template parameter (without using type erasure). Lastly, it means you can construct bitsets without knowing the bitset size at compile time.
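A minimal sketch of that use case, assuming the header filename and that the constructor simply takes the number of bits (as described above), with std::bitset-style accessors carried over:

#include <cstddef>
#include "plf_bitsetb.h" // header filename assumed

// A non-template class whose instances can each hold a differently-sized bitset:
class entity_flags
{
    plf::bitsetb<> flags; // default template parameters: own heap buffer, std::size_t storage

public:
    explicit entity_flags(const std::size_t number_of_flags):
        flags(number_of_flags) // size supplied at runtime, in the constructor
    {}

    void mark(const std::size_t index) { flags.set(index); }
    bool marked(const std::size_t index) const { return flags[index]; }
};

int main()
{
    entity_flags small_set(64), large_set(100000); // same class type, different bitset sizes
    small_set.mark(3);
    large_set.mark(99999);
    return (small_set.marked(3) && large_set.marked(99999)) ? 0 : 1;
}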

Please note: using bitsetb for a very small bitset makes no sense - the internal pointer to the buffer plus the internal 'size' member total 128 bits on a 64-bit platform. Therefore a 128-bit plf::bitsetb uses twice as much memory as a 128-bit plf::bitset; so if you merely want a very small bitset allocated on the heap and don't require different bitset sizes between instances of the same non-template class, heap-allocating a plf::bitset instance is the better way to go.

Other differences: it has a move constructor and a move assignment operator, whereas plf::bitset doesn't. A constexpr change_size(std::size_t new_size) function is provided for changing the size of your bitset (the contents are truncated if the new size is smaller). operator = copies across a number of bytes equivalent to the smaller of the two bitsets' sizes. Swap does not swap buffers.
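A short sketch of those semantics, under the same assumptions about the constructor and header filename as above, and assuming bits below the new size are retained by change_size:

#include "plf_bitsetb.h" // header filename assumed

int main()
{
    plf::bitsetb<> b(256); // 256 bits, sized at runtime
    b.set(5);
    b.set(200);

    b.change_size(128);    // contents from bit 128 onwards are truncated; bit 5 is retained

    plf::bitsetb<> c(64);
    c = b;                 // copies only the smaller of the two sizes' worth of bytes (c stays 64 bits)

    return c[5] ? 0 : 1;
}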

Note: for historical reasons, there's also a typedef of bitsetb<false> named bitsetc.

Frequently-asked questions

Why no iterator?

Iterators for bitsets don't work well. Upon ++ iteration, an iterator has to increment the bit location within the current storage unit and, if that passes the sizeof(storage_type) threshold, advance to the next storage unit. This necessitates a branch instruction, which (and I did implement an iterator to prove this) makes it slower than simply using an index and the [] operator (which doesn't require a branch instruction). An iterator is also larger than an index, because it has to store both the index of the storage unit and the sub-storage-unit bit index. And even if you implement it as just an index number wrapping operator [], you're only adding unnecessary boilerplate complexity over a plain std::size_t index, and slowing things down in debug mode by passing numbers to functions, etcetera. Essentially there are no advantages to using an iterator over using a std::size_t and operator [].
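For reference, the index-based pattern recommended here looks like the following (only operator [] and size() are used, as per the std::bitset interface; the header filename is assumed):

#include <cstddef>
#include <cstdio>
#include "plf_bitset.h" // header filename assumed

int main()
{
    plf::bitset<1000> b;
    b.set(42);
    b.set(999);

    // Plain index-based traversal - no iterator, no extra per-step branch beyond the loop condition:
    for (std::size_t index = 0; index != b.size(); ++index)
    {
        if (b[index])
        {
            std::printf("bit %zu is set\n", index);
        }
    }

    return 0;
}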

Why not support std::string/long int constructors?

Because I couldn't see myself using them.

License

plf::bitset, plf::bitsetb and plf::bitsetc are under the Computing for Good ethical licence.

Download

Download here or view the repository

The bitset libraries are simple .h header files, to be included via an #include directive.

Benchmarks for plf::bitset

Results below were calculated on an Intel i7-9750H processor under Windows 10 in safe mode, using nuwen mingw GCC 13.2 and MSVC 2022 (results for earlier CPUs were either similar or better). Codegen was O2, AVX2 and C++23 for the 'Release' results, and just C++23 for the 'Debug' results. Code runs are mostly 'hot', with a number of 'warm-up' function passes before timing is calculated, and results are averaged across a large number of runs. All benchmark code is available here. I did not bother with benchmarks for bitsetb and bitsetc, since their buffers are allocated on the heap and as such are not performance-comparable with std::bitset. The hardened template parameter is at its default (off).

Release build results

The only function results included below are where there were significant differences in performance between the different bitset implementations. Please assume that for all other functions the results are roughly equivalent.

set(position, value)

test result graph

One can see here that while the libstdc++ approach optimizes almost equivalently to plf::bitset under GCC, the exact same approach (seriously, check the libstdc++ vs MSSTL bitset code) optimizes poorly in MSVC, being roughly 3 times slower. The code in question is a branch between two expressions for both libstdc++ and MSSTL, so my guess (I haven't checked the assembly) is that in this scenario GCC calculates both expressions and then performs a conditional move between the two results, whereas MSVC doesn't.
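For illustration only - this is not the actual libstdc++/MSSTL or plf source - the patterns under discussion reduce to something like:

#include <cstddef>

// Branch between two expressions - the shape of the std implementations as described above.
// The guess is that GCC turns this into a conditional move at O2, while MSVC keeps the branch:
inline void set_bit_branchy(std::size_t &word, const std::size_t bit, const bool value)
{
    const std::size_t mask = std::size_t(1) << bit;
    word = value ? (word | mask) : (word & ~mask);
}

// A branchless alternative: clear the bit, then OR in the (possibly zero) new value:
inline void set_bit_branchless(std::size_t &word, const std::size_t bit, const bool value)
{
    const std::size_t mask = std::size_t(1) << bit;
    word = (word & ~mask) | (static_cast<std::size_t>(value) << bit);
}

int main()
{
    std::size_t word = 0;
    set_bit_branchy(word, 3, true);
    set_bit_branchless(word, 3, false);
    return static_cast<int>(word); // 0
}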

to_string()

test result graph

MSVC has such a superior result here because it has an intrinsic (I'm guessing assembly-level or AVX-optimized) function for converting the bits to a string. I'm not even going to *try* to compete with that. However for GCC/libstdc++ we see a +144% performance improvement for plf::bitset over std::bitset.

>> and << shifting

test result graph

test result graph

Both shifts are roughly +100% faster for plf under GCC/libstdc++, and +35%/+20% faster respectively under MSVC/MSSTL.

plf::bitset::set_range/reset_range vs std::bitset::set/reset loop


test result graph

This result shows the difference between the optimized plf::bitset set_range/reset_range functions and doing the same thing in std::bitset using a loop and set() or reset(). The bitset sizes were 128000 bits. The operation was repeated 10000 times with the start and end locations of the range of bits to set/reset being randomly calculated each time, so, anything between 1 and 127999 bits being set/reset. In GCC set_range was +34652% faster than std::bitset::set, while in MSVC it was +67612% faster.
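Roughly, the two operations being compared look like this (the set_range signature, taking begin and end bit indices, is an assumption based on the description above - see the linked benchmark code for the real thing):

#include <bitset>
#include <cstddef>
#include "plf_bitset.h" // header filename assumed

int main()
{
    constexpr std::size_t number_of_bits = 128000;
    std::bitset<number_of_bits> std_bits;
    plf::bitset<number_of_bits> plf_bits;

    const std::size_t begin = 1000, end = 100000; // chosen randomly per repetition in the benchmark

    // std::bitset: one function call per bit:
    for (std::size_t index = begin; index != end; ++index)
    {
        std_bits.set(index);
    }

    // plf::bitset: one call for the whole range (assumed [begin, end) semantics):
    plf_bits.set_range(begin, end);

    return 0;
}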

Test suite benchmark

test result graph

This shows the result of testing all functionality in plf::bitset vs std::bitset (minus the functions which are not shared between std::bitset and plf::bitset). Under GCC/libstdc++ there is a +24% overall performance increase, under MSVC/MSSTL it is +20%.

Operator [ ], i.e. bit access

test result graph

Here we see the results for plf are 3% faster in GCC/libstdc++, but equivalent under MSVC. This is surprising since both libstdc++ and MSSTL use almost exactly the same operator [ ] code. The main difference is the lack of boundary checking in the plf code.

Debug build results

As a favour to game development, and all other fields which work extensively with debug builds as well as optimized and/or non-AVX builds, the following results are provided. Again, results are only shown where there were significant performance differences between bitset implementations.

set(position, value)

test result graph

Despite optimizing well at O2/AVX2, the libstdc++ approach to set(position, value) is 3x slower than the plf approach in debug mode, while the same approach under MSVC is only 31% slower than plf - despite being 2x slower in release mode.

to_string()

test result graph

The superior result for MSVC's to_string intrinsic function is even more prominent here. For GCC/libstdc++ we see +27% improvement for plf over std.

>> and << shifting

test result graph

test result graph

For both results plf's approach is roughly +110% faster under GCC, while being around +85%/+66% faster under MSVC.

plf::bitset::set_range/reset_range vs std::bitset::set/reset loop


test result graph

In debug mode we see plf's set_range is +428127% faster than a GCC/libstdc++ std::bitset::set loop, while in MSVC/MSSTL it is +750726% faster.

Test suite benchmark

test result graph

Under GCC/libstdc++ plf is +175% faster than std, under MSVC/MSSTL it is +40% faster.

Operator [ ]

test result graph

In debug mode we see the plf approach is +360% faster under GCC and +132% faster under MSVC.

Version history

Contact:
plf:: library and this page Copyright (c) 2026, Matthew Bentley