Andrew Noyes - Weaselab

conflict-set: advance version even if no writes

Andrew Noyes — Sat, 24 Aug 2024 22:05:51 GMT

In versions v0.0.12 and earlier, the documentation in ConflictSet.h is ambiguous about the behavior of a call to check or setOldestVersion with a version greater than the version of the last call to addWrites. One might reasonably expect the behavior to be the same as if there's an implicit call to addWrites with zero writes and the later version prior to the check or setOldestVersion call. However, the 32-bit version implementation was written with the implicit assumption that this does not happen, and so effectively there's a missing precondition in the documentation for check and setOldestVersion. Subsequent versions will explicitly list this precondition.

Versions v0.0.7 - v0.0.12 can report conflicts incorrectly if this precondition is violated. The precondition is now explicit, and so later versions have undefined behavior (the usual consequence of violating preconditions) if the precondition is violated.

Timeline

2024-07-08

v0.0.7 is released containing the bug/missing precondition.

2024/08/23

A downstream project reports an unexpected conflict in v0.0.12

2024/08/24

Root cause is discovered, the missing precondition is published

Details

Recall that 32 bit versions are safe as long as you never compare versions whose full-precision values are different by more than 2^31 - 1.

If you check a version large enough to be outside the window, it can give a wrong result.

cs.addWrites(0, write(b""))  
cs.check(read(2**31 + 1, b""))  
# incorrectly gives conflict, should be commit

The implementation of setOldestVersion doesn't garbage collect until the oldest extant version is back in the valid window (addWrites does), so calling setOldestVersion can result in versions within the tree being outside the window as well.

Extensive fuzzing has not found a scenario where this issue causes an incorrect commit.

You may see use-of-uninitialized-memory if running under Valgrind, which technically means undefined behavior in the c++ abstract machine, but the only undesired behavior observed in the generated code is incorrect conflicts.

Root Cause Analysis

Why not caught sooner?

Since the code was written implicitly assuming that versions for check and setOldestVersion are always <= the version of the last addWrites, there is no code that tests this scenario. The fuzz tests were written in a way where this never happens (and they still are, otherwise the test case would violate a precondition).

How was it discovered?

A downstream project reports an unexpected conflict in v0.0.12

How did we go from symptom to root cause?

After some exploratory experiments trying to rule out various possibilities, we learned that this doesn't reproduce in 64-bit version mode. Then we added some instrumentation to check for comparisons of versions whose full precision is different by more than 2^31 - 1, and found those. Asking "why do we expect this to not happen" led to noticing the implicit precondition.

Prevention

We plan to add a new fuzz test or change the existing one such that the structure of the calls is less constrained. Currently it calls check, then addWrites, then setOldestVersion in a cycle. See https://git.weaselab.dev/weaselab/conflict-set/issues/33

What went well

Deterministic simulation to reproduce consistently
Structuring the code to allow switching between 32-bit versions and 64-bit versions was convenient for root-causing
Having USE_SIMD_FALLBACK handy was also convenient as a way to disable all the simd code that assumes a particular version representation

A bug finds its way into conflict-set v0.0.7

Andrew Noyes — Tue, 23 Jul 2024 22:41:59 GMT

Here at weaselab we take correctness seriously, so when a release contains a correctness issue it's worth taking the time to reflect on how it happened. This serves to improve our understanding of our test effectiveness, incentivize not releasing bugs, and communicate to users transparently.

Summary

In affected versions, interleaving calls to multiple conflict sets in the same thread can cause wrong results. Upgrade to v0.0.9.

Affected versions:

v0.0.7
v0.0.8

Timeline

2024-06-25

We realize (by inspection/thought experiment) that performance for range reads is unsatisfactory in certain scenarios (https://git.weaselab.dev/weaselab/conflict-set/issues/27). This results in a spree of optimizations intended to improve read range performance, the most ambitious of which is storing versions in the tree with 32-bit precision in order to use wider SIMD instructions and save memory. Doing this correctly is quite tricky, but basically it suffices to compare versions (using two's complement as guaranteed in c++20) with subtract followed by signed compare with 0, and ensure that you never compare versions whose full-precision values are different by more than 2^31 - 1. As part of this effort, test coverage is added for advancing the version by large amounts and testing with large versions.

2024-07-02

The bug is introduced in https://git.weaselab.dev/weaselab/conflict-set/commit/3e2c8310bb91681b30ad0b1bf0924d1df308ee93, as part of implementing 32-bit precision versions

2024-07-08

v0.0.7 is released containing the bug

2024-07-18

A downstream project discovers the bug. The symptom is a use-of-uninitialized-memory in an internal ConflictSet call stack, reported by Valgrind.

2024-07-22

Root cause discovered, fixed, and released as v0.0.9

Details

A suitable value for "zero" is tracked in a thread local variable. The semantics of "zero" are simply "<= all valid versions in the current window". This value was incorrectly not updated at the beginning of a call to addWrites. Here's how to reproduce (in a python-like pseudocode):

# write the empty key at version 2 to conflict set one
cs1.addWrites(2, write(b""))  
cs1.setOldestVersion(2)  
# The internal thread local variable "zero" is now 2

# Make a Node48. Correctly checking range reads on a Node48 relies on "zero"
# being set correctly when the Node48 is created.
for i in range(256 - 17, 256):  
    cs2.addWrites(int(1), write(bytes([i])))

# Scan until first point write. This incorrectly conflicts. It should commit.
cs2.check(read(0, b"\x00", bytes([256 - 17])))

It's also possible to commit incorrectly:

# Add an empty list of writes at version 2**32
cs1.addWrites(2**32)  
cs1.setOldestVersion(2**32)  
cs2.addWrites(2**31 + 100)  
cs2.setOldestVersion(2**31 + 100)  
# "zero" is now 2**31 + 100
cs1.addWrites(2**32 + 101, write(b"", b"\x02"), write(b"\x01"))  
# rangeVersion of \x01 is now 2**31 + 100 ("max" of (2**31 + 100, 2**32 + 101))
cs1.check(read(2**32 + 1, b"\x00"))  
# but 2**32 + 1 ">" 2**31 + 100, and it incorrectly commits

If there's only one conflict set at a time per thread, this bug cannot manifest since "zero" remains correct throughout.

Root Cause Analysis

The question to ask is almost always "why wasn't this caught sooner?" The answer in this case is pretty simple. We have sophisticated testing for many scenarios, including concurrent calls on different threads, but we do not test interleaving calls to different conflict sets on the same thread. In hindsight this would have been prudent given the error-prone thread local variable usage.

But let's keep asking "why?". Why were we using a thread local variable in an error-prone way? This was to avoid plumbing "zero" throughout the call tree internally. Why have "zero" at all? Something like this is a necessary part of the book-keeping required to enable storing 32-bit versions internally. Why 32-bit versions? Performance. Why performance? Outperforming the skip list is the raison d'être of this project. The skip list still beats us in "pathological" workloads, but now it's mostly just slow in the usually way a radix tree can be made slow: a, aa, aaa, aaaa, ...

So in the end this is a case of introducing complexity in the name of performance without managing that complexity well enough.

Another interesting question is "how was it discovered?" In this case Valgrind caught it, which is a real blessing since debugging memory errors is often much easier than logical errors. The usual knee-jerk reaction to use-of-uninitialized-memory is to ... initialize the memory. However, while this would have prevented Valgrind from issuing a diagnostic, this would not have been correct and the net effect is that debugging this would be much harder, and delayed since in the short-term we assume we fixed it. In this case, a correct "zero" value prevents the uninitialized memory from getting used, and the memory was uninitialized in the first place because there's nothing meaningful to initialize it with. If you're not comfortable with actual uninitialized memory, you can zero it but still have valgrind track it as uninitialized using client requests.

Last question: "How did we go from symptom to root cause?" Once we could reproduce, we tried recording a trace of the calls to each conflict set and replaying them, to isolate the conflict set. This of course didn't work since interleaving calls from multiple conflict sets is necessary to trigger the bug. After initial frustration with the "record and replay" idea, we realized that "record and replay isn't working" was actually a big clue! Then we started looking at differences between replay and in situ, and found the bug by inspection.

Prevention

Introduce regression tests for this specific bug, and include coverage for interleaving calls in the same thread in our fuzz testing.

What went well

Using Valgrind, and tracking "meaningfulness" of memory as "initializedness"
Deterministic simulation to reproduce consistently
Using libfuzzer to find minimal counter examples to assist with understanding the bug

What is Weaselab?

Andrew Noyes — Wed, 24 Apr 2024 22:13:05 GMT

Weaselab is mostly just a domain name, but it's also where former FoundationDB contributor Andrew Noyes publishes ideas and projects.

The first such project is conflict-set, which is a "A data structure for optimistic concurrency control on ranges of bitwise-lexicographically-ordered keys." It's intended to replace the version-augmented Skip List in FoundationDB, and synthetic benchmarks show it could improver resolver throughput by double or more.