This is a basic constexpr drop-in replacement for std::bitset, with generally better performance (this may change a little if vendors borrow the algorithms), a few small changes to existing functions, plus additional functions. The following stats summarise the significant performance results from the benchmarks versus std::bitset under GCC/libstdc++ and MSVC/MSSTL respectively (I did not test libc++).
Under release (O2, AVX2) builds it has:
Under debug builds it has:
See benchmarks for more details. Other performance characteristics are more or less the same between plf and std.
Like std::bitset it takes its size as a template parameter and allocates on the stack. If you want it to allocate on the heap, heap-allocate the whole bitset, i.e. plf::bitset<134> *temp = new plf::bitset<134>. In addition, it takes its storage type as an optional second template parameter (which must be a trivial unsigned integer type and defaults to std::size_t). This means that if you want the bitset to use the least space possible for a size which isn't a multiple of sizeof(size_t) * 8, you can write plf::bitset<134, unsigned char> or similar to reduce waste. All other functions match the std::bitset documentation, with the exceptions noted below.
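For example, a minimal sketch (assuming the header is named plf_bitset.h, in line with the other plf libraries; the functions used are the standard std::bitset ones):

```cpp
#include <iostream>
#include "plf_bitset.h" // assumed header name

int main()
{
	plf::bitset<134> flags;                 // stack-allocated, storage_type defaults to std::size_t
	plf::bitset<134, unsigned char> tight;  // same bit count, but padded only to the nearest byte

	flags.set(5);
	tight.set(5);
	std::cout << flags.test(5) << ' ' << flags.count() << '\n';

	// Heap allocation, as described above: heap-allocate the whole bitset.
	plf::bitset<134> *temp = new plf::bitset<134>;
	temp->set(0);
	delete temp;
	return 0;
}
```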
In addition plf::bitset brings new functions to the table:
- constexpr bitset(bitset &source) noexcept
- constexpr void operator = () noexcept
- constexpr bool any_range(size_t begin_index, size_t end_index), constexpr bool all_range(size_t begin_index, size_t end_index), constexpr bool none_range(size_t begin_index, size_t end_index): range variants of any(), all() and none(), operating only on the bits between the supplied indexes.
- constexpr size_type count_range(size_t begin_index, size_t end_index): range variant of count().
- constexpr void set_range(size_t begin_index, size_t end_index), constexpr void reset_range(size_t begin_index, size_t end_index), constexpr void set_range(size_t begin_index, size_t end_index, bool value): range variants of set() and reset().
- constexpr std::size_t first_one() noexcept, constexpr std::size_t last_one() noexcept, constexpr std::size_t first_zero() noexcept, constexpr std::size_t last_zero() noexcept: return the index of the first/last set or unset bit; if no such bit exists, they return std::numeric_limits<std::size_t>::max().
- constexpr std::size_t next_one(std::size_t index) noexcept, constexpr std::size_t prev_one(std::size_t index) noexcept, constexpr std::size_t next_zero(std::size_t index) noexcept, constexpr std::size_t prev_zero(std::size_t index) noexcept: return the index of the next/previous set or unset bit relative to the supplied index; if no such bit exists, they return std::numeric_limits<std::size_t>::max().
- constexpr void shift_left_range(std::size_t shift_amount, std::size_t first) noexcept, constexpr void shift_left_range_one(const std::size_t first) noexcept: like operator >>= but starting from a particular index - all bits before that index are unaffected. The second function is an optimization of the first specifically for shifting by one. I could've written shift_right_range and/or first/last variants as well, but I didn't personally have any need for them. The 'left' refers to the direction the bits move index-wise (which, to be honest, is how operator <<= should be defined).
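A brief usage sketch of the range and scan functions (again assuming the plf_bitset.h header name; consult the documentation above for whether end_index is inclusive):

```cpp
#include <iostream>
#include <limits>
#include "plf_bitset.h" // assumed header name

int main()
{
	plf::bitset<256> b;

	b.set_range(10, 20);    // set a contiguous run of bits
	b.reset_range(12, 14);  // clear part of that run again
	std::cout << "count_range(10, 20): " << b.count_range(10, 20) << '\n';

	// Scan functions return an index, or std::numeric_limits<std::size_t>::max() if no such bit exists:
	std::cout << "first_one(): " << b.first_one() << '\n';
	std::cout << "next_one(10): " << b.next_one(10) << '\n';

	if (b.first_zero() == std::numeric_limits<std::size_t>::max())
	{
		std::cout << "no zero bits\n";
	}

	// Range shift by one, starting from index 10; bits before index 10 are unaffected:
	b.shift_left_range_one(10);
	return 0;
}
```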
- constexpr std::basic_string<char_type, traits, allocator_type> to_rstring(): reversed-bit-order counterpart of to_string().
- constexpr unsigned long to_rulong() const and constexpr unsigned long long to_rullong() const: reversed-bit-order counterparts of to_ulong() and to_ullong().
- constexpr void swap(bitset &source) noexcept, void std::swap(bitset &a, bitset &b) noexcept: swap the contents of two bitsets.
- An additional template parameter, hardened (bool, false by default), which determines whether or not the set, reset, set_range, reset_range and operator [ ] functions do bounds-checking. All hardened-mode checks are constexpr, so the bounds-checking code won't be generated if hardened is false under C++17 and above.
- The default character type for to_string()/to_rstring() is signed char, not char (which is not signed on all platforms), as signedness is necessary for the algorithm to work. Likewise, all potential character types supplied to to_string()/to_rstring() must be signed.

plf::bitsetb has 4 template parameters: bool user_supplied_buffer = false, typename storage_type = std::size_t, class allocator_type = std::allocator<storage_type>, bool hardened = false. Size is assigned in the constructor.
With the "user_supplied_buffer" parameter left at its default of false, it allocates its own buffer on the heap using the allocator and deallocates it upon destruction. When this parameter is set to true it instead takes a buffer supplied by the user in the constructor (which is not deallocated upon destruction) - in this case no functions are noexcept, as the user could easily supply a NULL buffer, or a buffer smaller than the specified size. In both cases the size is supplied in the constructor, not as a template parameter.
Due to the dynamic allocation it is not as fast as plf::bitset (which allocates on the stack). But because size is a constructor parameter, you can have a bitsetb as a class member without every instance of the class having to contain the same-sized bitset, and without having to make the class a template (this is helpful when bulk-processing multiple instances of a class). Neither is possible when size is a template parameter (without using type erasure). Lastly, it means you can construct bitsets without having to know the bitset size at compile time.
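A sketch of the runtime-sized, per-instance use case described above (assumptions: the header name plf_bitsetb.h, and that the default self-allocating form takes its size as the constructor argument as described - check the header for the exact constructor signatures):

```cpp
#include <cstddef>
#include "plf_bitsetb.h" // assumed header name

// Non-template class whose instances can hold differently-sized bitsets:
struct entity_flags
{
	plf::bitsetb<> flags; // user_supplied_buffer = false: heap-allocates its own buffer

	explicit entity_flags(const std::size_t number_of_flags):
		flags(number_of_flags) // size supplied at runtime, not as a template parameter
	{}
};

int main()
{
	entity_flags small_entity(50), large_entity(100000);
	small_entity.flags.set(3);
	large_entity.flags.set(99999);
	return 0;
}
```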
Please note: using bitsetb for a very small bitset makes no sense - the internal pointer to the buffer plus the internal 'size' member total 128 bits on a 64-bit platform. So if you're making a 128-bit bitset you use twice as much memory with plf::bitsetb as with plf::bitset; if you merely want a very small bitset allocated on the heap and don't require different bitset sizes within instances of the same non-template class, heap-allocating a plf::bitset instance is a better way to go.
Other differences: it has a move constructor and a move assignment operator, whereas plf::bitset doesn't. A constexpr change_size(std::size_t new_size) function is provided for changing the size of the bitset (the bit sequence is truncated if the new size is smaller). operator = copies across a number of bytes equivalent to the smaller of the two bitsets' sizes. Swap does not swap buffers.
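And a sketch of the resizing and assignment behaviour just described (same assumptions as the previous example; set() is assumed to match plf::bitset):

```cpp
#include <utility>
#include "plf_bitsetb.h" // assumed header name

int main()
{
	plf::bitsetb<> a(200), b(100);

	a.set(150);
	b = a;              // copies only as many bytes as the smaller of the two sizes
	a.change_size(100); // shrink: the bit sequence beyond the new size is truncated

	plf::bitsetb<> c(std::move(a)); // move construction, not available in plf::bitset
	return 0;
}
```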
Note: for historical reasons, there's also a typedef of bitsetb<false> named bitsetc.
Iterators for bitsets don't work well. The reason is that, on ++ iteration, an iterator has to increment the bit location within the selected storage unit and, if that exceeds the number of bits in storage_type, move on to the next storage unit. This necessitates a branch instruction, which (and I did implement an iterator to prove this) makes it slower than simply using an index and the [] operator (which doesn't require a branch instruction). An iterator is also larger than a stored index, because it has to store both the index of the storage unit and the sub-storage-unit bit index. And even if you implement it with just an index number and use operator [] internally, you're only adding unnecessary boilerplate complexity over a plain std::size_t index, and slowing things down in debug mode by passing numbers to functions etcetera. Essentially there are no advantages to using an iterator over using a std::size_t and operator [].
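For reference, the recommended index-based pattern looks like this (a minimal sketch; next_one() could equally be used to jump straight between set bits):

```cpp
#include <cstddef>
#include <iostream>
#include "plf_bitset.h" // assumed header name

int main()
{
	plf::bitset<1000> b;
	b.set_range(100, 200);

	// Plain std::size_t index + operator [] - no iterator required:
	std::size_t total = 0;
	for (std::size_t index = 0; index != 1000; ++index)
	{
		if (b[index]) ++total;
	}

	std::cout << total << '\n';
	return 0;
}
```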
Because I couldn't see myself using them.
plf::bitset, plf::bitsetb and plf::bitsetc are under the Computing for Good ethical licence.
Download here or view the repository. The bitset libraries are simple .h header files, to be included with a #include command.
Results below were calculated on an Intel i7-9750H processor under Windows 10 in safe mode, using nuwen mingw GCC 13.2 and MSVC 2022 (results for earlier CPUs were either similar or better). Codegen was O2, AVX2 and C++23 for the 'Release' results, and just C++23 for the 'Debug' results. Code runs are mostly 'hot', with a number of 'warm-up' function passes before timing is calculated. Results are averaged across a large number of runs. All benchmark code is available here. I did not bother with benchmarks for bitsetb and bitsetc, since their buffers are allocated on the heap and as such they are not performance-comparable with std::bitset. The hardened template parameter was left at its default (off).
Results are only included below for functions where there were significant differences in performance between the bitset implementations. Please assume that for all other functions the results are roughly equivalent.

One can see here that while the libstdc++ approach optimizes almost equivalently to plf::bitset under GCC, the exact same approach (seriously, check the libstdc++ vs MSSTL bitset code) optimizes poorly in MSVC, being roughly 3 times slower. The code in question is a branch between two expressions for both libstdc++ and MSSTL, so my guess (I haven't checked the assembly) is that in this scenario GCC calculates both expressions and then performs a conditional move between the two results, whereas MSVC doesn't.
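To illustrate the general shape of the code being discussed (this is not the libstdc++/MSSTL or plf source, just a simplified example of a branching two-expression assignment versus a branchless one):

```cpp
#include <cstddef>

constexpr std::size_t bits_per_word = sizeof(std::size_t) * 8;

// Branching form: the compiler may or may not reduce the branch to a conditional move.
inline void set_bit_branching(std::size_t *storage, const std::size_t index, const bool value)
{
	std::size_t &word = storage[index / bits_per_word];
	const std::size_t mask = std::size_t(1) << (index % bits_per_word);
	if (value) word |= mask; else word &= ~mask;
}

// Branchless form: clear the bit, then OR in the (0 or 1) value at the right position.
inline void set_bit_branchless(std::size_t *storage, const std::size_t index, const bool value)
{
	std::size_t &word = storage[index / bits_per_word];
	const std::size_t bit = index % bits_per_word;
	word = (word & ~(std::size_t(1) << bit)) | (static_cast<std::size_t>(value) << bit);
}

int main()
{
	std::size_t storage[2] = {0, 0};
	set_bit_branching(storage, 5, true);
	set_bit_branchless(storage, 70, false);
	return 0;
}
```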

The reason MSVC has such a superior result here is that it has an intrinsic (I'm guessing assembly-level or AVX-optimized) instruction for converting the bits to a string. I'm not even going to *try* to compete with that. However, for GCC/libstdc++ we see a +144% performance improvement for plf::bitset over std::bitset.


Both shifts are roughly +100% faster for plf under GCC/libstdc++, and +35%/+20% faster under MSVC/MSSTL.

This result shows the difference between the optimized plf::bitset set_range/reset_range functions and doing the same thing in std::bitset using a loop and set() or reset(). The bitset size was 128000 bits. The operation was repeated 10000 times, with the start and end locations of the range of bits to set/reset randomly calculated each time - so anything between 1 and 127999 bits being set/reset. Under GCC set_range was +34652% faster than a std::bitset::set loop, while under MSVC it was +67612% faster.
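In outline, the operation being measured looks like the following (a simplified sketch, not the actual benchmark code - timing, warm-up passes and result averaging are omitted):

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <bitset>
#include "plf_bitset.h" // assumed header name

int main()
{
	static std::bitset<128000> std_bits;   // static to keep the 16 KB bitsets off the stack
	static plf::bitset<128000> plf_bits;

	for (int run = 0; run != 10000; ++run)
	{
		std::size_t begin = static_cast<std::size_t>(std::rand()) % 128000;
		std::size_t end = static_cast<std::size_t>(std::rand()) % 128000;
		if (begin > end) std::swap(begin, end);

		// std::bitset: one set() call per bit in the range.
		for (std::size_t index = begin; index != end; ++index) std_bits.set(index);

		// plf::bitset: one call covering the whole range.
		plf_bits.set_range(begin, end);
	}

	return 0;
}
```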

This shows the result of testing all functionality in plf::bitset vs std::bitset (minus the functions which are not shared between the two). Under GCC/libstdc++ there is a +24% overall performance increase; under MSVC/MSSTL it is +20%.

Here we see the results for plf are 3% faster under GCC/libstdc++, but equivalent under MSVC. This is surprising, since libstdc++ and MSSTL use almost exactly the same operator [ ] code. The main difference is the lack of bounds-checking in the plf code.
As a favour to game development, and all other fields which work extensively with debug builds as well as optimized builds and/or non-AVX builds, the following results are provided. Again, results are only shown where there were significant performance differences between the bitset implementations.

Despite optimizing well at O2/AVX2, the libstdc++ approach to set(pos, value) is 3x slower than the plf approach in debug mode, while the same approach is only 31% slower than plf under MSVC - despite being 2x slower in release mode.

The superior result for MSVC's to_string intrinsic function is even more prominent here. For GCC/libstdc++ we see +27% improvement for plf over std.


For both results plf's approach is roughly +110% faster under GCC, while being around +85%/+66% faster under MSVC.

In debug mode we see plf's set_range is +428127% faster than a GCC/libstdc++ std::bitset::set loop, while in MSVC/MSSTL it is +750726% faster.

Under GCC/libstdc++ plf is +175% faster than std, under MSVC/MSSTL it is +40% faster.

In debug mode we see the plf approach is +360% faster under GCC and +132% faster under MSVC.
Contact: 
plf:: library and this page Copyright (c) 2026, Matthew Bentley