* [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures
@ 2026-06-05 22:20 Emil Tsalapatis
2026-06-05 22:20 ` [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure Emil Tsalapatis
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Emil Tsalapatis @ 2026-06-05 22:20 UTC (permalink / raw)
To: bpf
Cc: ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski,
Emil Tsalapatis
Add two new data structures to libarena. These data structures initially
resided in the sched-ext repo (https://github.com/sched-ext/scx) and
have been adapted to the internal libarena build system. The data
structures are:
- Red black tree: Fundamental tree data structure that can also serve
as a base for more domain-specific data structures.
- Lev-Chase deque: Queue data structure that allows efficient work
stealing, useful in scheduling scenarios.
The data structures are accompanied by selftests that are automatically
discovered by the existing libarena test_progs selftest and incorporated
in the CI.
CHANGELOG
=========
v3 -> v4 (https://lore.kernel.org/bpf/20260604235016.20856-1-emil@etsalapatis.com/)
- Turn off load_acquire/store_relesase - dependent selftests for s390 (CI)
- Various style/non-functional nits (AI)
v2 -> v3 (https://lore.kernel.org/bpf/20260603182727.3922-1-emil@etsalapatis.com/)
- Add workaround to handle LLVM 21 and GCC 15 assignment-to-memset promotions
that are causing verification failures for arena programs (CI)
- Incorporate Sashiko feedback for cleanup edge cases (Sashiko)
- Simplify some of the ordering semantics in spmc
v1 -> v2 (https://lore.kernel.org/bpf/20260511214100.9487-1-emil@etsalapatis.com/):
- Rename tests from st_ to test_ (Alexei)
- Removed the freelist caches from the rbtrees, previously used to defer freeing (Alexei)
- Moved the type and function definitions to use the __arena identifier
- Removed the typecasts during function return and directly return __arena
pointers (Alexei)
- Renamed queues to spmc queues to abstract away the algorithm (Alexei)
- Adjusted the memory barriers in the spmc queue
- Added multithreaded testing harness for libarena programs (Alexei)
- Added parallel selftest for queues (Alexei)
- Split least upper bound and exact find operations back into separate
functions to prevent RB_DUPLICATE-related bug (AI)
Emil Tsalapatis (3):
selftests/bpf: libarena: Add rbtree data structure
selftests/bpf: libarena: Add spmc queue data structure
selftests/bpf: libarena: parallel test harness and spmc parallel
selftest
.../bpf/libarena/include/libarena/rbtree.h | 83 ++
.../bpf/libarena/include/libarena/spmc.h | 27 +
.../bpf/libarena/include/libarena/userspace.h | 6 +
.../selftests/test_parallel_spmc.bpf.c | 673 +++++++++++
.../bpf/libarena/selftests/test_rbtree.bpf.c | 968 +++++++++++++++
.../bpf/libarena/selftests/test_spmc.bpf.c | 194 +++
.../selftests/bpf/libarena/src/rbtree.bpf.c | 1047 +++++++++++++++++
.../selftests/bpf/libarena/src/spmc.bpf.c | 234 ++++
.../selftests/bpf/prog_tests/libarena.c | 187 +++
9 files changed, 3419 insertions(+)
create mode 100644 tools/testing/selftests/bpf/libarena/include/libarena/rbtree.h
create mode 100644 tools/testing/selftests/bpf/libarena/include/libarena/spmc.h
create mode 100644 tools/testing/selftests/bpf/libarena/selftests/test_parallel_spmc.bpf.c
create mode 100644 tools/testing/selftests/bpf/libarena/selftests/test_rbtree.bpf.c
create mode 100644 tools/testing/selftests/bpf/libarena/selftests/test_spmc.bpf.c
create mode 100644 tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c
create mode 100644 tools/testing/selftests/bpf/libarena/src/spmc.bpf.c
--
2.54.0
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure 2026-06-05 22:20 [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures Emil Tsalapatis @ 2026-06-05 22:20 ` Emil Tsalapatis 2026-06-05 22:30 ` sashiko-bot 2026-06-05 22:51 ` bot+bpf-ci 2026-06-05 22:20 ` [PATCH bpf-next v4 2/3] selftests/bpf: libarena: Add spmc queue " Emil Tsalapatis ` (2 subsequent siblings) 3 siblings, 2 replies; 11+ messages in thread From: Emil Tsalapatis @ 2026-06-05 22:20 UTC (permalink / raw) To: bpf Cc: ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski, Emil Tsalapatis Add a native red-black tree data structure to libarena. The data structure supports multiple APIs (key-value based, node based) with which users can query and modify it. The tree uses the libarena memory allocator to manage its data. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> --- .../bpf/libarena/include/libarena/rbtree.h | 83 ++ .../bpf/libarena/selftests/test_rbtree.bpf.c | 968 +++++++++++++++ .../selftests/bpf/libarena/src/rbtree.bpf.c | 1047 +++++++++++++++++ 3 files changed, 2098 insertions(+) create mode 100644 tools/testing/selftests/bpf/libarena/include/libarena/rbtree.h create mode 100644 tools/testing/selftests/bpf/libarena/selftests/test_rbtree.bpf.c create mode 100644 tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c diff --git a/tools/testing/selftests/bpf/libarena/include/libarena/rbtree.h b/tools/testing/selftests/bpf/libarena/include/libarena/rbtree.h new file mode 100644 index 000000000000..486428911d96 --- /dev/null +++ b/tools/testing/selftests/bpf/libarena/include/libarena/rbtree.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause */ + +#pragma once + +#define RB_MAXLVL_PRINT (16) + +struct rbnode; + +struct rbnode { + struct rbnode __arena *parent; + union { + struct { + struct rbnode __arena *left; + struct rbnode __arena *right; + }; + + struct rbnode __arena *child[2]; + }; + uint64_t key; + /* Used as a linked list or to store KV pairs. */ + union { + struct rbnode __arena *next; + uint64_t value; + }; + bool is_red; +}; + +/* + * Does the rbtree allocate its own nodes, or do they get + * allocated by the caller? + */ +enum rbtree_alloc { + RB_ALLOC, + RB_NOALLOC, +}; + +/* + * Specify the behavior of rbtree insertions when the key is + * already present in the tree. + * + * RB_DEFAULT: Default behavior, reject the new insert. + * + * RB_UPDATE: Update the existing value in the rbtree. + * This updates the node itself, not just the value in + * the existing node. + * + * RB_DUPLICATE: Allow nodes with identical keys in the rbtree. + * Finding/popping/removing a key acts on any of the nodes + * with the appropriate key - there is no ordering by time + * of insertion. + */ +enum rbtree_insert_mode { + RB_DEFAULT, + RB_UPDATE, + RB_DUPLICATE, +}; + +struct rbtree { + struct rbnode __arena *root; + enum rbtree_alloc alloc; + enum rbtree_insert_mode insert; +}; + +#ifdef __BPF__ +struct rbtree __arena *rb_create(enum rbtree_alloc alloc, enum rbtree_insert_mode insert); + +int rb_destroy(struct rbtree __arena *rbtree); +int rb_insert(struct rbtree __arena *rbtree, u64 key, u64 value); +int rb_remove(struct rbtree __arena *rbtree, u64 key); +int rb_find(struct rbtree __arena *rbtree, u64 key, u64 *value); +int rb_print(struct rbtree __arena *rbtree); +int rb_least(struct rbtree __arena *rbtree, u64 *key, u64 *value); +int rb_pop(struct rbtree __arena *rbtree, u64 *key, u64 *value); + +int rb_insert_node(struct rbtree __arena *rbtree, struct rbnode __arena *node); +int rb_remove_node(struct rbtree __arena *rbtree, struct rbnode __arena *node); + +struct rbnode __arena *rb_node_alloc(u64 key, u64 value); +void rb_node_free(struct rbnode __arena *rbnode); + +int rb_integrity_check(struct rbtree __arena *rbtree); + +#endif /* __BPF__ */ diff --git a/tools/testing/selftests/bpf/libarena/selftests/test_rbtree.bpf.c b/tools/testing/selftests/bpf/libarena/selftests/test_rbtree.bpf.c new file mode 100644 index 000000000000..856c484a009a --- /dev/null +++ b/tools/testing/selftests/bpf/libarena/selftests/test_rbtree.bpf.c @@ -0,0 +1,968 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause + +#include <libarena/common.h> + +#include <libarena/asan.h> +#include <libarena/rbtree.h> + +typedef struct node_ctx __arena *node_ctx; + +struct node_ctx { + struct rbnode rbnode; + node_ctx next; +}; + +static const u64 keys[] = { 51, 43, 37, 3, 301, 46, 383, 990, 776, 729, 871, 96, 189, 213, + 376, 167, 131, 939, 626, 119, 374, 700, 772, 154, 883, 620, 641, 5, + 428, 516, 105, 622, 988, 811, 931, 973, 246, 690, 934, 744, 210, 311, + 32, 255, 960, 830, 523, 429, 541, 738, 705, 774, 715, 446, 98, 578, + 777, 191, 279, 91, 767 }; + +static const u64 morekeys[] = { 173, 636, 1201, 8642, 5957, 3617, 4586, 8053, 6551, 7592, 1748, 1589, 8644, 9918, 6977, + 4448, 5852, 4640, 9717, 2303, 7424, 7695, 2334, 8876, 8618, 5745, 7134, 2178, 5280, 2140, 1138, + 5083, 8922, 1516, 2437, 2488, 4307, 4329, 5088, 8456, 5938, 1441, 1684, 5750, 721, 1107, 2089, + 9737, 4687, 5016, 4849, 8193, 9603, 9147, 5992, 166, 6721, 812, 4144, 6237, 6509, 3466, 9255, + 7767, 3960, 6759, 2968, 6046, 9784, 8395, 2619, 1711, 528, 6424, 9084, 3179, 1342, 5676, 9445, + 5691, 6678, 8487, 1627, 998, 6178, 2229, 1987, 3319, 572, 169, 2161, 3018, 5439, 7287, 7265, 5995, + 5003, 5857, 2836, 5634, 4735, 9261, 8287, 5359, 533, 1406, 9573, 4026, 714, 3956, 1722, 6395, + 9648, 3887, 7185, 470, 4482, 4997, 841, 8913, 9946, 3999, 9357, 9847, 277, 8184, 8704, 6766, 3323, + 5468, 8638, 7905, 8858, 6142, 3685, 3452, 4689, 8878, 8836, 158, 831, 7914, 3031, 8374, 4921, + 4207, 3460, 5547, 3358, 1083, 4619, 7818, 2962, 4879, 4583, 2172, 8819, 9830, 1194, 2666, 9812, + 5704, 8432, 5916, 6007, 6609, 4791, 1985, 3226, 2478, 9605, 5236, 8079, 3042, 1965, 3539, 9704, + 4267, 6416, 760, 9968, 2983, 1190, 1964, 3211, 2870, 3106, 2794, 1542, 6916, 5986, 9096, 441, + 5894, 8353, 7765, 3757, 5732, 88, 3091, 5637, 6042, 8447, 4073, 6923, 5491, 7010, 3663, 5029, + 6162, 822, 4874, 7491, 5100, 3461, 6983, 2170, 1458, 1856, 648, 6272, 4887, 976, 2369, 5909, 4274, + 3324, 6968, 2312, 2271, 8891, 6268, 6581, 1610, 8880, 6194, 6144, 9764, 6915, 829, 3774, 2265, + 1752, 1314, 6377, 8760, 8004, 501, 4912, 9278, 1425, 9578, 7337, 307, 1885, 3151, 9617, 1647, + 2458, 3702, 6091, 8902, 5663, 9378, 7640, 3336, 557, 1644, 6848, 1559, 8821, 266, 4330, 9790, + 5920, 4222, 1143, 6248, 5792, 4847, 9726, 6303, 821, 6839, 6062, 7133, 3649, 9888, 2528, 1966, + 5456, 4914, 3615, 1543, 3206, 3353, 6097, 2800, 1424, 9094, 7920, 7243, 1394, 5464, 1707, 576, + 6524, 4261, 4187, 7889, 5336, 3377, 2921, 7244, 2766, 6584, 5514, 1387, 2957, 2258, 1077, 9979, + 1128, 876, 4056, 4668, 4532, 1982, 7093, 4184, 5460, 7588, 4704, 6717, 61, 3959, 1826, 2294, 18, + 8170, 9394, 8796, 7288, 7285, 7143, 148, 6676, 6603, 1051, 8225, 4169, 3230, 7697, 6971, 3454, + 7501, 9514, 394, 2339, 4993, 5606, 6060, 1297, 8273, 3012, 157, 8181, 6765, 7207, 1005, 8833, 1914, + 7456, 1846, 8375, 2741, 2074, 1712, 5286 }; + +SEC("syscall") +__weak int test_rbtree_find_nonexistent(void) +{ + u64 key = 0xdeadbeef; + u64 value = 0; + int ret; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_DEFAULT); + if (!rbtree) + return 1; + + /* Should return -EINVAL */ + ret = rb_find(rbtree, key, &value); + if (!ret) + return 2; + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_insert_existing(void) +{ + u64 key = 525252; + u64 value = 24; + int ret; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_DEFAULT); + if (!rbtree) + return 1; + + ret = rb_insert(rbtree, key, value); + if (ret) + return 2; + + /* Should return -EALREADY. */ + ret = rb_insert(rbtree, key, value); + if (ret != -EALREADY) { + return 3; + } + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_update_existing(void) +{ + u64 key = 33333; + u64 value; + int ret; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_UPDATE); + if (!rbtree) + return 1; + + value = 52; + ret = rb_insert(rbtree, key, value); + if (ret) + return 2; + + ret = rb_find(rbtree, key, &value); + if (ret) + return 3; + + if (value != 52) + return 4; + + value = 65; + + /* Should succeed. */ + ret = rb_insert(rbtree, key, value); + if (ret) + return 5; + + /* Should be updated. */ + ret = rb_find(rbtree, key, &value); + if (ret) + return 6; + + if (value != 65) + return 7; + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_insert_one(void) +{ + u64 key = 202020; + u64 value = 0xbadcafe; + int ret; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_UPDATE); + if (!rbtree) + return 1; + + ret = rb_insert(rbtree, key, value); + if (ret) + return 2; + + ret = rb_find(rbtree, key, &value); + if (ret) + return 3; + + if (value != 0xbadcafe) + return 4; + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_insert_ten(void) +{ + u64 key, value; + int ret, i; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_UPDATE); + if (!rbtree) + return 1; + + for (i = 0; i < 10 && can_loop; i++) { + key = keys[i]; + ret = rb_insert(rbtree, key, 2 * key); + if (ret) + return 2 + 3 * i; + + /* Read it back. */ + ret = rb_find(rbtree, key, &value); + if (ret) + return 2 + 3 * i + 1; + + if (value != 2 * key) + return 2 + 3 * i + 2; + } + + /* Go find all inserted pairs. */ + for (i = 0; i < 10 && can_loop; i++) { + key = keys[i]; + + ret = rb_find(rbtree, key, &value); + if (ret) + return 35 + 2 * i; + + if (value != 2 * key) + return 35 + 2 * i + 1; + } + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_duplicate(void) +{ + u64 key = 0x121212; + u64 value; + int ret, i; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_DUPLICATE); + if (!rbtree) + return 1; + + for (i = 0; i < 10 && can_loop; i++) { + ret = rb_insert(rbtree, key, 2 * key); + if (ret) + return 2 + 3 * i; + + /* Read it back. */ + ret = rb_find(rbtree, key, &value); + if (ret) + return 2 + 3 * i + 1; + + if (value != 2 * key) + return 2 + 3 * i + 2; + } + + /* Go find all inserted copies and remove them. */ + for (i = 0; i < 10 && can_loop; i++) { + ret = rb_find(rbtree, key, &value); + if (ret) { + rb_print(rbtree); + return 35 + 3 * i; + } + + if (value != 2 * key) + return 35 + 3 * i + 1; + + ret = rb_remove(rbtree, key); + if (ret) + return 35 + 3 * i + 2; + } + + return rb_destroy(rbtree); +} + +static inline int +clean_up_noalloc_tree(struct rbtree __arena *rbtree) +{ + node_ctx nodec; + int ret; + + if (rbtree->alloc != RB_NOALLOC) + return -EINVAL; + + /* Can't destroy an RB_NOALLOC tree that still has nodes. */ + if (rb_destroy(rbtree) != -EBUSY) + return -EINVAL; + + while (rbtree->root && can_loop) { + nodec = (node_ctx)arena_container_of(rbtree->root, struct node_ctx, rbnode); + ret = rb_remove_node(rbtree, &nodec->rbnode); + if (ret) + return ret; + + arena_free(nodec); + } + + return 0; +} + +int insert_many(enum rbtree_alloc alloc, enum rbtree_insert_mode insert) +{ + const size_t numkeys = sizeof(keys) / sizeof(keys[0]); + node_ctx nodec; + u64 key, value; + int ret; + int i; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(alloc, insert); + if (!rbtree) + return 1; + + for (i = 0; i < numkeys && can_loop; i++) { + key = keys[i]; + if (rbtree->alloc != RB_ALLOC) { + nodec = arena_malloc(sizeof(*nodec)); + if (!nodec) { + arena_stderr("out of memory\n"); + return -ENOMEM; + } + nodec->rbnode.key = key; + nodec->rbnode.value = 2 * key; + ret = rb_insert_node(rbtree, &nodec->rbnode); + } else { + ret = rb_insert(rbtree, key, 2 * key); + } + if (ret) + return 2 + 3 * i; + + /* Read it back. */ + ret = rb_find(rbtree, key, &value); + if (ret) + return 2 + 3 * i + 1; + + if (value != 2 * key) + return 2 + 3 * i + 2; + } + + /* Go find all inserted pairs. */ + for (i = 0; i < numkeys && can_loop; i++) { + key = keys[i]; + + ret = rb_find(rbtree, key, &value); + if (ret) + return 302 + 2 * i; + + if (value != 2 * key) + return 302 + 2 * i + 1; + } + + /* RB_ALLOC trees are destroyed while still having elements. */ + if (rbtree->alloc == RB_ALLOC) + return rb_destroy(rbtree); + + /* Otherwise manually clean up the tree. */ + if (clean_up_noalloc_tree(rbtree)) + return 5; + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_remove_one(void) +{ + u64 key = 20, value = 5, newvalue; + int ret; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_DEFAULT); + if (!rbtree) + return 1; + + ret = rb_find(rbtree, key, &newvalue); + if (!ret) + return 2; + + ret = rb_insert(rbtree, key, value); + if (ret) + return 3; + + ret = rb_find(rbtree, key, &newvalue); + if (ret || value != newvalue) + return 4; + + ret = rb_remove(rbtree, key); + if (ret) + return 5; + + ret = rb_find(rbtree, key, &newvalue); + if (!ret) + return 6; + + return rb_destroy(rbtree); +} + +static __always_inline int remove_many_verify_all_present(struct rbtree __arena *rbtree) +{ + const size_t numkeys = sizeof(morekeys) / sizeof(morekeys[0]); + u64 value; + int ret; + int i; + + for (i = 0; i < numkeys && can_loop; i++) { + u64 key = morekeys[i]; + + ret = rb_find(rbtree, key, &value); + if (ret) + return -1; + + if (value != 2 * key) + return -1; + } + + return 0; +} + +static __always_inline int remove_many_verify_remaining(struct rbtree __arena *rbtree) +{ + const size_t numkeys = sizeof(morekeys) / sizeof(morekeys[0]); + u64 value; + int ret; + int i; + + for (i = 0; i < numkeys && can_loop; i += 2) { + u64 key = morekeys[i]; + + ret = rb_find(rbtree, key, &value); + if (!ret) + return -1; + + if (i + 1 >= numkeys) + break; + + key = morekeys[i + 1]; + ret = rb_find(rbtree, key, &value); + if (ret) + return -1; + + if (value != 2 * key) + return -1; + } + + for (i = 1; i < numkeys && can_loop; i += 2) { + u64 key = morekeys[i]; + + ret = rb_find(rbtree, key, &value); + if (ret) + return -1; + + if (value != 2 * key) + return -1; + } + + return 0; +} + +static __noinline int remove_many_alloc(struct rbtree __arena *rbtree) +{ + const size_t numkeys = sizeof(morekeys) / sizeof(morekeys[0]); + u64 value; + int ret; + int i; + + for (i = 0; i < numkeys && can_loop; i++) { + u64 key = morekeys[i]; + + ret = rb_insert(rbtree, key, 2 * key); + if (ret) + return -1; + + if (rb_integrity_check(rbtree)) { + arena_stderr("iteration %d\n", i); + return -EINVAL; + } + + ret = rb_find(rbtree, key, &value); + if (ret) + return -1; + + if (value != 2 * key) + return -1; + } + + ret = remove_many_verify_all_present(rbtree); + if (ret) + return ret; + + for (i = 0; i < numkeys && can_loop; i += 2) { + u64 key = morekeys[i]; + + ret = rb_remove(rbtree, key); + if (ret) { + arena_stderr("Failed to remove %ld\n", key); + return -1; + } + + ret = rb_find(rbtree, key, &value); + if (!ret) + return -1; + } + + return remove_many_verify_remaining(rbtree); +} + +static __noinline int remove_many_noalloc(struct rbtree __arena *rbtree) +{ + const size_t numkeys = sizeof(morekeys) / sizeof(morekeys[0]); + node_ctx first = NULL, last = NULL; + u64 value; + int ret; + int i; + + for (i = 0; i < numkeys && can_loop; i++) { + u64 key = morekeys[i]; + node_ctx nodec = arena_malloc(sizeof(*nodec)); + + if (!nodec) { + arena_stderr("out of memory\n"); + return -ENOMEM; + } + nodec->rbnode.key = key; + nodec->rbnode.value = 2 * key; + nodec->next = NULL; + + if (!first) + first = nodec; + + if (last) + last->next = nodec; + last = nodec; + + ret = rb_insert_node(rbtree, &nodec->rbnode); + if (ret) + return -1; + + if (rb_integrity_check(rbtree)) { + arena_stderr("iteration %d\n", i); + return -EINVAL; + } + + ret = rb_find(rbtree, key, &value); + if (ret) + return -1; + + if (value != 2 * key) + return -1; + } + + ret = remove_many_verify_all_present(rbtree); + if (ret) + return ret; + + for (i = 0; i < numkeys && can_loop; i += 2) { + u64 key = morekeys[i]; + node_ctx nodec = first; + + if (!nodec || key != nodec->rbnode.key) + return -1; + + first = nodec->next ? nodec->next->next : NULL; + ret = rb_remove_node(rbtree, &nodec->rbnode); + if (ret) { + arena_stderr("Failed to remove %ld\n", key); + return -1; + } + + ret = rb_find(rbtree, key, &value); + if (!ret) + return -1; + } + + return remove_many_verify_remaining(rbtree); +} + +static inline int remove_many(enum rbtree_alloc alloc, + enum rbtree_insert_mode insert) +{ + int ret; + struct rbtree __arena *rbtree; + + rbtree = rb_create(alloc, insert); + if (!rbtree) + return -ENOMEM; + + ret = (alloc == RB_ALLOC) ? remove_many_alloc(rbtree) + : remove_many_noalloc(rbtree); + if (ret) + return ret; + + if (alloc == RB_ALLOC) + return rb_destroy(rbtree); + + ret = clean_up_noalloc_tree(rbtree); + if (ret) + return ret; + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_insert_many_update(void) +{ + return insert_many(RB_ALLOC, RB_UPDATE); +} + +SEC("syscall") +__weak int test_rbtree_insert_many_noalloc(void) +{ + return insert_many(RB_NOALLOC, RB_DUPLICATE); +} + +SEC("syscall") +__weak int test_rbtree_remove_many_update(void) +{ + return remove_many(RB_ALLOC, RB_UPDATE); +} + +SEC("syscall") +__weak int test_rbtree_remove_many_noalloc(void) +{ + return remove_many(RB_NOALLOC, RB_DUPLICATE); +} + +SEC("syscall") +__weak int test_rbtree_add_remove_circular(void) +{ + const size_t iters = 60; + const size_t prefill = 10; + const size_t numkeys = 50; + const size_t prefix = 400000; + u64 value, rmval; + int errval = 1; + u64 key; + int ret; + int i; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_UPDATE); + if (!rbtree) + return 1; + + for (i = 0; i < prefill && can_loop; i++) { + ret = rb_insert(rbtree, prefix + (i % numkeys), i); + if (ret) + return errval; + + errval += 1; + } + + errval = 2 * 1000 * 1000; + + for (i = 0; i < prefill && can_loop; i++) { + /* Read it back. */ + ret = rb_find(rbtree, prefix + (i % numkeys), &value); + if (ret) + return errval; + + if (value != i) + return errval; + } + + errval = 3 * 1000 * 1000; + + for (i = prefill; i < iters && can_loop; i++) { + key = prefix + (i % numkeys); + + ret = rb_find(rbtree, key, &value); + if (!ret) { + arena_stderr("Key %d already present\n", key); + return errval; + } + + errval += 1; + + ret = rb_insert(rbtree, key, i); + if (ret) { + arena_stderr("ITERATION %d\n", i); + rb_print(rbtree); + return errval; + } + + rmval = i - prefill; + + errval += 1; + + ret = rb_find(rbtree, prefix + (rmval % numkeys), &value); + if (ret) + return errval; + + errval += 1; + + if (value != rmval) + return errval; + + errval += 1; + + ret = rb_remove(rbtree, prefix + (rmval % numkeys)); + if (ret) { + arena_stderr("ITERATION %d\n", i); + return errval; + } + + errval += 1; + } + + for (i = 0; i < numkeys && can_loop; i++) { + rb_remove(rbtree, prefix + i); + } + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_add_remove_circular_reverse(void) +{ + const size_t iters = 110; + const size_t prefill = 10; + const size_t numkeys = 50; + const size_t prefix = 500000; + u64 value, rmval; + int errval = 1; + u64 key; + int ret; + int i; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_UPDATE); + if (!rbtree) + return 1; + + for (i = 0; i < prefill && can_loop; i++) { + ret = rb_insert(rbtree, prefix - (i % numkeys), i); + if (ret) + return errval; + + errval += 1; + } + + errval = 2 * 1000 * 1000; + + for (i = 0; i < prefill && can_loop; i++) { + /* Read it back. */ + ret = rb_find(rbtree, prefix - (i % numkeys), &value); + if (ret) + return errval; + + if (value != i) + return errval; + } + + errval = 3 * 1000 * 1000; + + for (i = prefill; i < iters && can_loop; i++) { + key = prefix - (i % numkeys); + + ret = rb_find(rbtree, key, &value); + if (!ret) { + arena_stderr("Key %d already present\n", key); + return errval; + } + + errval += 1; + + ret = rb_insert(rbtree, key, i); + if (ret) { + arena_stderr("error %d on insert\n", ret); + rb_print(rbtree); + return errval; + } + + rmval = i - prefill; + + errval += 1; + + ret = rb_find(rbtree, prefix - (rmval % numkeys), &value); + if (ret) + return errval; + + errval += 1; + + if (value != rmval) + return errval; + + errval += 1; + + ret = rb_remove(rbtree, prefix - (rmval % numkeys)); + if (ret) + return errval; + + errval += 1; + } + + + errval = 4 * 1000 * 1000; + for (i = 0; i < prefill && can_loop; i++) { + ret = rb_remove(rbtree, prefix - i); + if (ret) { + arena_stderr("Did not remove %d, error %d\n", prefix - i, ret); + return errval + i; + } + } + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_least_pop(void) +{ + const size_t keys = 10; + u64 key, value; + int errval = 1; + int ret, i; + + struct rbtree __arena *rbtree; + + rbtree = rb_create(RB_ALLOC, RB_DEFAULT); + if (!rbtree) + return errval; + + errval += 1; + + for (i = 0; i < keys / 2 && can_loop; i++) { + ret = rb_insert(rbtree, i, i); + if (ret) + return errval; + + errval += 1; + + ret = rb_insert(rbtree, keys - 1 - i, keys - 1 - i); + if (ret) + return errval; + + errval += 1; + + ret = rb_least(rbtree, &key, &value); + if (ret) + return errval; + + errval += 1; + + if (key != 0 || value != 0) + return errval; + + errval += 1; + } + + errval = 1000; + + for (i = 0; i < keys && can_loop; i++) { + ret = rb_least(rbtree, &key, &value); + if (ret) { + arena_stderr("rb_least failed with %d\n", ret); + return errval; + } + + errval += 1; + + if (key != i || value != i) { + arena_stderr("Got KV %ld/%ld expected %d\n", key, value, i); + return errval; + } + + errval += 1; + + ret = rb_pop(rbtree, &key, &value); + if (ret) { + arena_stderr("Error %d during pop on iter %d\n", ret, i); + return errval; + } + + errval += 1; + + if (key != i || value != i) + return errval; + } + + return rb_destroy(rbtree); +} + +/* Reject rb_pop() for RB_NOALLOC trees. */ +SEC("syscall") +__weak int test_rbtree_noalloc_pop(void) +{ + const u64 expect_value = 1; + const u64 expect_key = 0; + struct rbtree __arena *rbtree; + struct rbnode __arena *node; + u64 value = 0; + int ret; + + rbtree = rb_create(RB_NOALLOC, RB_DEFAULT); + if (!rbtree) + return 1; + + node = rb_node_alloc(expect_key, expect_value); + if (!node) { + rb_destroy(rbtree); + return 2; + } + + ret = rb_insert_node(rbtree, node); + if (ret) { + rb_node_free(node); + rb_destroy(rbtree); + return 3; + } + + ret = rb_pop(rbtree, NULL, &value); + if (ret != -EINVAL) + return 4; + + ret = rb_find(rbtree, expect_key, &value); + if (ret) + return 5; + + if (value != expect_value) + return 6; + + ret = rb_remove_node(rbtree, node); + if (ret) + return 7; + + rb_node_free(node); + + return rb_destroy(rbtree); +} + +SEC("syscall") +__weak int test_rbtree_alloc_check(void) +{ + struct rbtree __arena *alloc, *noalloc; + struct rbnode __arena *node; + int ret; + + alloc = rb_create(RB_ALLOC, RB_DEFAULT); + if (!alloc) + return 1; + + noalloc = rb_create(RB_NOALLOC, RB_DEFAULT); + if (!noalloc) + return 2; + + + node = rb_node_alloc(0, 0); + if (!node) + return 3; + + /* + * RB_ALLOC trees can use rb_insert, RB_NOALLOC trees can + * use rb_insert_node. RB_ALLOC and RB_NOALLOC trees cannot + * use each other's APIs. + * + * NOTE: This begs the question, why not different types? We + * want to partially share the API and that would require us + * to duplicate it. + */ + if (rb_insert(alloc, 0, 0)) + return 4; + + if (!rb_insert_node(alloc, node)) + return 5; + + if (!rb_remove_node(alloc, node)) + return 6; + + if (rb_remove(alloc, 0)) + return 7; + + if (rb_insert_node(noalloc, node)) + return 8; + + if (!rb_insert(noalloc, 0, 0)) + return 9; + + if (!rb_remove(noalloc, 0)) + return 10; + + if (rb_remove_node(noalloc, node)) + return 11; + + rb_node_free(node); + + ret = rb_destroy(alloc); + if (ret) + return ret; + + return rb_destroy(noalloc); +} diff --git a/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c new file mode 100644 index 000000000000..7f0f6dc3e17d --- /dev/null +++ b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c @@ -0,0 +1,1047 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause +/* + * Copyright (c) 2025-2026 Meta Platforms, Inc. and affiliates. + * Copyright (c) 2025-2026 Emil Tsalapatis <emil@etsalapatis.com> + */ + +#include <libarena/common.h> + +#include <libarena/asan.h> +#include <libarena/rbtree.h> + +int rb_integrity_check(struct rbtree __arena *rbtree); +void rbnode_print(size_t depth, struct rbnode __arena *rbn); +static int rbnode_replace(struct rbtree __arena *rbtree, + struct rbnode __arena *existing, + struct rbnode __arena *replacement); + +struct rbtree __arena *rb_create(enum rbtree_alloc alloc, + enum rbtree_insert_mode insert) +{ + struct rbtree __arena *rbtree; + + rbtree = arena_malloc(sizeof(*rbtree)); + if (unlikely(!rbtree)) + return NULL; + + /* + * RB_UPDATE overwrites existing values in the nodes, but RB_NOALLOC + * trees manage the tree nodes directly (including holding pointers + * to them). Disallow mixing the two modes to avoid dealing with + * unintuitive semantics. + */ + if (alloc == RB_NOALLOC && insert == RB_UPDATE) { + arena_stderr("WARNING: Cannot combine RB_NOALLOC and RB_UPDATE"); + arena_free(rbtree); + return NULL; + } + + rbtree->alloc = alloc; + rbtree->insert = insert; + rbtree->root = NULL; + + return rbtree; +} + +__weak +int rb_destroy(struct rbtree __arena *rbtree) +{ + int ret = 0; + + arena_subprog_init(); + + if (unlikely(!rbtree)) + return -EINVAL; + + if (rbtree->alloc == RB_NOALLOC) { + /* + * We cannot do anything about RB_NOALLOC nodes. The whole + * point of RB_NOALLOC is that the nodes are directly owned + * by the caller that allocates and inserts them. We could + * unilaterally grab all nodes and free them anyway, but that + * would almost certainly cause UAF as the callers keep accessing + * the now freed nodes. Throw an error instead. + */ + if (rbtree->root) { + arena_stderr("WARNING: Destroying RB_NOALLOC tree with > 0 nodes"); + return -EBUSY; + } + + goto out; + } + + while (rbtree->root && can_loop) { + ret = rb_remove(rbtree, rbtree->root->key); + if (ret) + break; + } + +out: + arena_free(rbtree); + return ret; +} + +static inline int rbnode_dir(struct rbnode __arena *node) +{ + /* Arbitrarily choose a direction for the root. */ + if (unlikely(!node->parent)) + return 0; + + return (node->parent->left == node) ? 0 : 1; +} + +/* + * The __noinline is to prevent inlining from bloating the add + * remove calls, in turn causing register splits and increasing + * stack usage above what is permitted. + */ +__noinline +int rbnode_rotate(struct rbtree __arena *rbtree, + struct rbnode __arena *node, int dir) +{ + struct rbnode __arena *tmp, *parent; + int parentdir; + + parent = node->parent; + if (parent) + parentdir = rbnode_dir(node); + + /* If we're doing a root change, are we the root? */ + if (unlikely(!parent && rbtree->root != node)) + return -EINVAL; + + /* + * Does the node we're turning into the root into exist? + * Note that the new root is on the opposite side of the + * rotation's direction. + */ + tmp = node->child[1 - dir]; + if (unlikely(!tmp)) + return -EINVAL; + + /* Steal the closest child of the new root. */ + node->child[1 - dir] = tmp->child[dir]; + if (node->child[1 - dir]) + node->child[1 - dir]->parent = node; + + /* Put the node below the new root.*/ + tmp->child[dir] = node; + node->parent = tmp; + + tmp->parent = parent; + if (parent) + parent->child[parentdir] = tmp; + else + rbtree->root = tmp; + + return 0; +} + +static +struct rbnode __arena *rbnode_find(struct rbnode __arena *subtree, u64 key) +{ + struct rbnode __arena *node = subtree; + int dir; + + if (!subtree) + return NULL; + + while (can_loop) { + if (node->key == key) + break; + + dir = (key < node->key) ? 0 : 1; + + if (!node->child[dir]) + break; + + node = node->child[dir]; + } + + return node; +} + +static +struct rbnode __arena *rbnode_least_upper_bound(struct rbnode __arena *subtree, uint64_t key) +{ + struct rbnode __arena *node = subtree; + int dir; + + if (!subtree) + return NULL; + + while (can_loop) { + dir = (key <= node->key) ? 0 : 1; + + if (!node->child[dir]) + break; + + node = node->child[dir]; + } + + return node; +} + +__weak +int rb_find(struct rbtree __arena *rbtree, u64 key, u64 *value) +{ + struct rbnode __arena *node; + + if (unlikely(!rbtree)) + return -EINVAL; + + if (unlikely(!value)) + return -EINVAL; + + node = rbnode_find(rbtree->root, key); + if (!node || node->key != key) + return -ENOENT; + + *value = node->value; + + return 0; +} + +__weak +struct rbnode __arena *rb_node_alloc(u64 key, u64 value) +{ + struct rbnode __arena *rbnode = NULL; + + rbnode = (struct rbnode __arena *)arena_malloc(sizeof(*rbnode)); + if (!rbnode) + return NULL; + + /* + * WARNING: The order of assignments is weird on purpose. + * See comment in rb_insert_node() for more context. + * TL;DR: Prevent consecutive 0 assignments from being + * promoted into an unverifiable memset by the compiler. + */ + + rbnode->key = key; + rbnode->parent = NULL; + rbnode->value = value; + rbnode->left = NULL; + rbnode->is_red = true; + rbnode->right = NULL; + + return rbnode; +} + +__weak +void rb_node_free(struct rbnode __arena *rbnode) +{ + arena_free(rbnode); +} + +static +int rb_node_insert(struct rbtree __arena *rbtree, + struct rbnode __arena *node) +{ + struct rbnode __arena *grandparent, *parent = rbtree->root; + u64 key = node->key; + struct rbnode __arena *uncle; + int dir; + int ret; + + if (unlikely(!rbtree)) + return -EINVAL; + + if (!parent) { + rbtree->root = node; + return 0; + } + + if (rbtree->insert != RB_DUPLICATE) + parent = rbnode_find(parent, key); + else + parent = rbnode_least_upper_bound(parent, key); + + if (key == parent->key && rbtree->insert != RB_DUPLICATE) { + if (rbtree->insert == RB_UPDATE) { + /* + * Replace the old node with the new one. + * Free up the old node. + */ + ret = rbnode_replace(rbtree, parent, node); + if (ret) + return ret; + + if (rbtree->alloc == RB_ALLOC) + rb_node_free(parent); + + return 0; + } + + /* Otherwise it's RB_DEFAULT. */ + return -EALREADY; + } + + node->parent = parent; + /* Also works if key == parent->key. */ + if (key <= parent->key) + parent->left = node; + else + parent->right = node; + + while (can_loop) { + parent = node->parent; + if (!parent) + return 0; + + if (!parent->is_red) + return 0; + + grandparent = parent->parent; + if (!grandparent) { + parent->is_red = false; + return 0; + } + + dir = rbnode_dir(parent); + uncle = grandparent->child[1 - dir]; + + if (!uncle || !uncle->is_red) { + if (node == parent->child[1 - dir]) { + rbnode_rotate(rbtree, parent, dir); + node = parent; + parent = grandparent->child[dir]; + } + + rbnode_rotate(rbtree, grandparent, 1 - dir); + parent->is_red = false; + grandparent->is_red = true; + + return 0; + } + + /* Uncle is red. */ + + parent->is_red = false; + uncle->is_red = false; + grandparent->is_red = true; + + node = grandparent; + } + + return 0; +} + +int rb_insert_node(struct rbtree __arena *rbtree, + struct rbnode __arena *node) +{ + if (unlikely(!rbtree)) + return -EINVAL; + + if (unlikely(rbtree->alloc == RB_ALLOC)) + return -EINVAL; + + node->left = NULL; + + /* + * Workaround to break an optimization that causes + * verification failures on some compilers. Assignments + * of the kind + * + * *(r0 + 0) = 0; + * *(r0 + 8) = 0; + * *(r0 + 16) = 0; + * + * get promoted into a memset, and that in turn is not + * handled properly for arena memory by LLVM 21 and GCC 15. + * Add a barrier for now to prevent the assignments from being fused. + */ + barrier(); + + node->parent = NULL; + node->right = NULL; + + node->is_red = true; + + return rb_node_insert(rbtree, node); +} + +__weak +int rb_insert(struct rbtree __arena *rbtree, u64 key, u64 value) +{ + struct rbnode __arena *node; + int ret; + + if (unlikely(!rbtree)) + return -EINVAL; + + if (unlikely(rbtree->alloc != RB_ALLOC)) + return -EINVAL; + + node = rb_node_alloc(key, value); + if (!node) + return -ENOMEM; + + ret = rb_node_insert(rbtree, node); + if (ret) { + rb_node_free(node); + return ret; + } + + return 0; +} + +static inline struct rbnode __arena *rbnode_least(struct rbnode __arena *subtree) +{ + while (subtree->left && can_loop) + subtree = subtree->left; + + return subtree; +} + +__weak int rb_least(struct rbtree __arena *rbtree, u64 *key, u64 *value) +{ + struct rbnode __arena *least; + + if (unlikely(!rbtree)) + return -EINVAL; + + if (!rbtree->root) + return -ENOENT; + + least = rbnode_least(rbtree->root); + if (key) + *key = least->key; + if (value) + *value = least->value; + + return 0; +} + + +/* + * If we are referencing ourselves, a and b have a parent-child relation, + * and we should be pointing at the other node instead. + */ +static inline void rbnode_fixup_pointers(struct rbnode __arena *a, + struct rbnode __arena *b) +{ +#define fixup(n1, n2, member) do { if (n1->member == n1) n1->member = n2; } while (0) + fixup(a, b, left); + fixup(a, b, right); + fixup(a, b, parent); +#undef fixup +} + +static inline void rbnode_swap_values(struct rbnode __arena *a, + struct rbnode __arena *b) +{ +#define swap(n1, n2, tmp) do { (tmp) = (n1); (n1) = (n2); (n2) = (tmp); } while (0) + struct rbnode __arena *tmpnode; + u64 tmp; + + /* Swap the pointers. */ + swap(a->is_red, b->is_red, tmp); + + swap(a->left, b->left, tmpnode); + swap(a->right, b->right, tmpnode); + swap(a->parent, b->parent, tmpnode); +#undef swap + + /* Account for the nodes being parent and child. */ + rbnode_fixup_pointers(b, a); + rbnode_fixup_pointers(a, b); +} + +static inline void rbnode_adjust_neighbors(struct rbtree __arena *rbtree, + struct rbnode __arena *node, int dir) +{ + if (node->left) + node->left->parent = node; + if (node->right) + node->right->parent = node; + + if (node->parent) { + node->parent->child[dir] = node; + return; + } + + rbtree->root = node; +} + +/* + * Directly replace an existing node with a replacement. The replacement node + * should not already be in the tree. + */ +static int rbnode_replace(struct rbtree __arena *rbtree, + struct rbnode __arena *existing, + struct rbnode __arena *replacement) +{ + int dir = 0; + + if (unlikely(replacement->parent || replacement->left || replacement->right)) + return -EINVAL; + + if (existing->parent) + dir = rbnode_dir(existing); + + replacement->is_red = existing->is_red; + replacement->left = existing->left; + replacement->right = existing->right; + replacement->parent = existing->parent; + + /* Fix up the new node's neighbors. */ + rbnode_adjust_neighbors(rbtree, replacement, dir); + + return 0; +} + +/* + * Switch two nodes in the tree in place. This is useful during node deletion. + * This is more involved than switching the values of the two nodes because we + * must update all tree pointers. + */ +static void rbnode_switch(struct rbtree __arena *rbtree, + struct rbnode __arena *a, + struct rbnode __arena *b) +{ + int adir = 0, bdir = 0; + + /* + * Store the direction in the parent because we will not + * be able to recompute it once we start swapping values. + */ + if (a->parent) + adir = rbnode_dir(a); + + if (b->parent) + bdir = rbnode_dir(b); + + rbnode_swap_values(a, b); + + /* + * Fix up the pointers from the children/parent to the + * new nodes. + */ + rbnode_adjust_neighbors(rbtree, a, bdir); + rbnode_adjust_neighbors(rbtree, b, adir); +} + +static inline int rbnode_remove_node_single_child(struct rbtree __arena *rbtree, + struct rbnode __arena *node, + bool free) +{ + struct rbnode __arena *child; + int dir; + + if (unlikely(node->is_red)) { + arena_stderr("Node unexpectedly red\n"); + return -EINVAL; + } + + child = node->left ? node->left : node->right; + if (unlikely(!child->is_red)) { + arena_stderr("Only child is black\n"); + return -EINVAL; + } + + /* + * Since it's the immediate child, we can just + * remove the parent. + */ + child->parent = node->parent; + + if (node->parent) { + dir = rbnode_dir(node); + node->parent->child[dir] = child; + } else { + rbtree->root = child; + } + + /* Color the child black. */ + child->is_red = false; + + /* Only free if called from rb_remove. */ + if (free) + rb_node_free(node); + + return 0; +} + +static inline bool rbnode_has_red_children(struct rbnode __arena *node) +{ + if (node->left && node->left->is_red) + return true; + + return node->right && node->right->is_red; +} + +static +int rb_node_remove(struct rbtree __arena *rbtree, + struct rbnode __arena *node) +{ + struct rbnode __arena *parent, *sibling, *close_nephew, *distant_nephew; + bool free = (rbtree->alloc == RB_ALLOC); + struct rbnode __arena *replace, *initial; + bool is_red; + int dir; + + /* Both children present, replace with next largest key. */ + if (node->left && node->right) { + /* + * Swap the node itself instead of just the + * key/value pair to account for nodes embedded + * in other structs. + */ + + replace = rbnode_least(node->right); + rbnode_switch(rbtree, replace, node); + + /* + * FALLTHROUGH: We moved the node we are removing to + * the leftmost position of the subtree. We can now + * remove it as if it was always where we moved it to. + */ + } + + initial = node; + + /* Only one child present, replace with child and paint it black. */ + if (!node->left != !node->right) + return rbnode_remove_node_single_child(rbtree, node, free); + + /* (!node->left && !node->right) */ + + parent = node->parent; + if (!parent) { + /* Check that we're _actually_ the root. */ + if (rbtree->root == node) + rbtree->root = NULL; + else + arena_stderr("WARNING: Attempting to remove detached node from rbtree\n"); + + if (free) + rb_node_free(node); + return 0; + } + + dir = rbnode_dir(node); + parent->child[dir] = NULL; + is_red = node->is_red; + + if (free) + rb_node_free(node); + + /* If we removed a red node, we did not unbalance the tree.*/ + if (is_red) + return 0; + + sibling = parent->child[1 - dir]; + if (unlikely(!sibling)) { + arena_stderr("rbtree: removed black node has no sibling\n"); + return -EINVAL; + } + + /* + * We removed a black node, causing a change in path + * weight. Start rebalancing. The invariant is that + * all paths going through the node are shortened + * by one, and the current node is black. + */ + while (can_loop) { + + /* Balancing reached the root, there can be no imbalance. */ + if (!parent) + return 0; + + /* + * We already determined the dir, either above or + * at the end of the loop. + */ + + /* + * If we have no sibling, the tree was + * already unbalanced. + */ + sibling = parent->child[1 - dir]; + if (unlikely(!sibling)) { + arena_stderr("rbtree: removed black node has no sibling\n"); + return -EINVAL; + } + + /* Sibling is red, turn it into the grandparent. */ + if (sibling->is_red) { + /* + * Sibling is red. Transform the tree to turn + * the sibling into the parent's position, and + * repaint them. This does not balance the tree + * but makes it so we know the sibling is black + * and so can use the transformations to balance. + */ + rbnode_rotate(rbtree, parent, dir); + parent->is_red = true; + sibling->is_red = false; + + /* Our new sibling is now the close nephew. */ + sibling = parent->child[1 - dir]; + /* If sibling has any red siblings, break out. */ + if (rbnode_has_red_children(sibling)) + break; + + /* We can repaint the sibling and parent, we're done. */ + sibling->is_red = true; + parent->is_red = false; + + return 0; + } + + /* Sibling guaranteed to be black. If it has red children, break out. */ + if (rbnode_has_red_children(sibling)) + break; + + /* + * Both sibling and children are black. If parent is red, swap + * colors with the sibling. Otherwise + */ + if (parent->is_red) { + parent->is_red = false; + sibling->is_red = true; + return 0; + } + + /* + * Parent, sibling, and all its children are black. Repaint the sibling. + * This shortens the paths through it, so pop up a level in the + * tree and repeat the balancing. + */ + sibling->is_red = true; + node = parent; + parent = node->parent; + dir = rbnode_dir(node); + } + + if (node != initial) { + dir = rbnode_dir(node); + parent = node->parent; + sibling = parent->child[1-dir]; + } + /* + * Almost there. We know between the parent, sibling, + * and nephews only one or two of the nephews are red. If + * it is the close one, rotate it to the sibling position, + * paint it black, and paint the previous sibling red. + */ + + close_nephew = sibling->child[dir]; + distant_nephew = sibling->child[1 - dir]; + + /* + * If the distant red nephew is not red, rotate + * and repaint. We need the distant nephew + * to be red. We know the close nephew is red + * because at least one of them are, so the + * distant one is black if it exists. + */ + if (!distant_nephew || !distant_nephew->is_red) { + rbnode_rotate(rbtree, sibling, 1 - dir); + sibling->is_red = true; + close_nephew->is_red = false; + distant_nephew = sibling; + sibling = close_nephew; + } + + /* + * We now know it's the distant nephew that's red. + * Rotate the sibling into our parent's position + * and paint both black. + */ + + rbnode_rotate(rbtree, parent, dir); + sibling->is_red = parent->is_red; + parent->is_red = false; + distant_nephew->is_red = false; + + return 0; +} + +__weak +int rb_remove_node(struct rbtree __arena *rbtree, + struct rbnode __arena *node) +{ + if (unlikely(!rbtree)) + return -EINVAL; + + if (unlikely(rbtree->alloc == RB_ALLOC)) + return -EINVAL; + + return rb_node_remove(rbtree, node); +} + +__weak +int rb_remove(struct rbtree __arena *rbtree, u64 key) +{ + struct rbnode __arena *node; + + if (unlikely(!rbtree)) + return -EINVAL; + + if (unlikely(rbtree->alloc != RB_ALLOC)) + return -EINVAL; + + if (!rbtree->root) + return -ENOENT; + + node = rbnode_find(rbtree->root, key); + if (!node || node->key != key) + return -ENOENT; + + return rb_node_remove(rbtree, node); +} + +__weak +int rb_pop(struct rbtree __arena *rbtree, u64 *key, u64 *value) +{ + struct rbnode __arena *node; + + if (unlikely(!rbtree)) + return -EINVAL; + + if (!rbtree->root) + return -ENOENT; + + if (rbtree->alloc != RB_ALLOC) + return -EINVAL; + + node = rbnode_least(rbtree->root); + if (unlikely(!node)) + return -ENOENT; + + if (key) + *key = node->key; + if (value) + *value = node->value; + + return rb_node_remove(rbtree, node); +} + +inline void rbnode_print(size_t depth, struct rbnode __arena *rbn) +{ + arena_stderr("[DEPTH %d] %p (%s)\n PARENT %p", depth, rbn, rbn->is_red ? "red" : "black", rbn->parent); + arena_stderr("\tKV (%ld, %ld)\n LEFT %p RIGHT %p]\n", rbn->key, rbn->value, rbn->left, rbn->right); +} + +enum rb_print_state { + RB_NONE_VISITED, + RB_LEFT_VISITED, + RB_RIGHT_VISITED, +}; + +__weak +enum rb_print_state rb_print_next_state(struct rbnode __arena *rbnode, + enum rb_print_state state, u64 *next) +{ + if (unlikely(!next)) + return RB_NONE_VISITED; + + switch (state) { + case RB_NONE_VISITED: + if (rbnode->left) { + *next = (u64)rbnode->left; + state = RB_LEFT_VISITED; + break; + } + + /* FALLTHROUGH */ + + case RB_LEFT_VISITED: + if (rbnode->right) { + *next = (u64)rbnode->right; + state = RB_RIGHT_VISITED; + break; + } + + /* FALLTHROUGH */ + + default: + *next = 0; + state = RB_RIGHT_VISITED; + } + + return state; +} + +__weak +int rb_print_pop_up(struct rbnode __arena **rbnodep, u8 *depthp, enum rb_print_state (*stack)[RB_MAXLVL_PRINT], enum rb_print_state *state) +{ + struct rbnode __arena *rbnode; + volatile u8 depth; + int j; + + if (unlikely(!rbnodep || !depthp || !stack || !state)) + return -EINVAL; + + rbnode = *rbnodep; + depth = *depthp; + + for (j = 0; j < RB_MAXLVL_PRINT && can_loop; j++) { + if (*state != RB_RIGHT_VISITED) + break; + + depth -= 1; + if (depth < 0 || depth >= RB_MAXLVL_PRINT) + break; + + *state = (*stack)[depth % RB_MAXLVL_PRINT]; + rbnode = rbnode->parent; + } + + *rbnodep = rbnode; + *depthp = depth; + + return 0; +} + +__weak +int rb_print(struct rbtree __arena *rbtree) +{ + enum rb_print_state stack[RB_MAXLVL_PRINT]; + struct rbnode __arena *rbnode = rbtree->root; + enum rb_print_state state; + struct rbnode __arena *next; + u64 next_addr; + u8 depth; + int ret; + + if (unlikely(!rbtree)) + return -EINVAL; + + depth = 0; + state = RB_NONE_VISITED; + + arena_stderr("=== RB TREE START ===\n"); + + if (!rbtree->root) + goto out; + + /* Even with can_loop, the verifier doesn't like infinite loops. */ + while (can_loop) { + if (state == RB_NONE_VISITED) + rbnode_print(depth, rbnode); + + /* Find which child to traverse next. */ + state = rb_print_next_state(rbnode, state, &next_addr); + next = (struct rbnode __arena *)next_addr; + + /* Child found. Store the node state and go on. */ + if (next) { + if (depth < 0 || depth >= RB_MAXLVL_PRINT) + return 0; + + stack[depth++] = state; + + rbnode = next; + state = RB_NONE_VISITED; + + continue; + } + + /* Otherwise, go as far up as possible. */ + ret = rb_print_pop_up(&rbnode, &depth, &stack, &state); + if (ret) + return -EINVAL; + + if (depth < 0 || depth >= RB_MAXLVL_PRINT) { + arena_stderr("=== RB TREE END (depth %d\n)===", depth); + return 0; + } + + } + +out: + arena_stderr("=== RB TREE END ===\n"); + + return 0; +} + +__weak +int rb_integrity_check(struct rbtree __arena *rbtree) +{ + enum rb_print_state stack[RB_MAXLVL_PRINT]; + struct rbnode __arena *rbnode = rbtree->root; + enum rb_print_state state; + struct rbnode __arena *next; + u64 next_addr; + u8 depth; + int ret; + + if (unlikely(!rbtree)) + return -EINVAL; + + if (!rbtree->root) + return 0; + + depth = 0; + state = RB_NONE_VISITED; + + /* Even with can_loop, the verifier doesn't like infinite loops. */ + while (can_loop) { + if (rbnode->parent && rbnode->parent->left != rbnode + && rbnode->parent->right != rbnode) { + arena_stderr("WARNING: Inconsistent tree. Parent %p has no child %p\n", rbnode->parent, rbnode); + return -EINVAL; + } + + if (rbnode->parent == rbnode) { + arena_stderr("WARNING: Inconsistent tree, node %p is its own parent\n", rbnode); + return -EINVAL; + } + + if (rbnode->left == rbnode) { + arena_stderr("WARNING: Inconsistent tree, node %p is its own left child\n", rbnode); + return -EINVAL; + } + + if (rbnode->right == rbnode) { + arena_stderr("WARNING: Inconsistent tree, node %p is its own right child\n", rbnode); + return -EINVAL; + } + + if (rbnode->is_red) { + if (rbnode->left && rbnode->left->is_red) { + arena_stderr("WARNING: Inconsistent tree. Parent has %p has red child %p\n", rbnode, rbnode->left); + return -EINVAL; + } + if (rbnode->right && rbnode->right->is_red) { + arena_stderr("WARNING: Inconsistent tree. Parent has %p has red child %p\n", rbnode, rbnode->right); + return -EINVAL; + } + } else if (rbnode->parent && rbnode->parent->child[1 - rbnode_dir(rbnode)] == NULL) { + arena_stderr("WARNING: Inconsistent tree. Black node %p has no sibling\n", rbnode); + return -EINVAL; + } + + /* Find which child to traverse next. */ + state = rb_print_next_state(rbnode, state, &next_addr); + next = (struct rbnode __arena *)next_addr; + + /* Child found. Store the node state and go on. */ + if (next) { + if (depth < 0 || depth >= RB_MAXLVL_PRINT) + return 0; + + stack[depth++] = state; + + rbnode = next; + state = RB_NONE_VISITED; + + continue; + } + + /* Otherwise, go as far up as possible. */ + ret = rb_print_pop_up(&rbnode, &depth, &stack, &state); + if (ret) + return -EINVAL; + + if (depth < 0 || depth >= RB_MAXLVL_PRINT) { + return 0; + } + + } + + return 0; +} -- 2.54.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure 2026-06-05 22:20 ` [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure Emil Tsalapatis @ 2026-06-05 22:30 ` sashiko-bot 2026-06-05 23:01 ` Emil Tsalapatis 2026-06-05 22:51 ` bot+bpf-ci 1 sibling, 1 reply; 11+ messages in thread From: sashiko-bot @ 2026-06-05 22:30 UTC (permalink / raw) To: Emil Tsalapatis; +Cc: bpf Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [High] Failure to check `can_loop` termination leads to severe tree corruption, memory leaks, and NULL pointer dereferences. -- commit 090a58da31563cb07c7562cdaaf67470a54e90c0 Author: Emil Tsalapatis <emil@etsalapatis.com> selftests/bpf: libarena: Add rbtree data structure This commit adds a native red-black tree data structure to libarena. It supports multiple APIs for querying and modifying the tree, and uses the libarena memory allocator to manage its data. > diff --git a/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c > new file mode 100644 > index 0000000000000..0e4aaec96b6fa > --- /dev/null > +++ b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c [ ... ] > +__weak > +int rb_destroy(struct rbtree __arena *rbtree) > +{ > + int ret = 0; > + > + arena_subprog_init(); > + > + if (unlikely(!rbtree)) > + return -EINVAL; [ ... ] > + while (rbtree->root && can_loop) { > + ret = rb_remove(rbtree, rbtree->root->key); > + if (ret) > + break; > + } > + > +out: > + arena_free(rbtree); [Severity: High] Does this code leak memory if the loop exits early due to the can_loop limit being reached? If the loop aborts early, the rbtree container is freed even though the remaining nodes have not been removed. > + return ret; > +} [ ... ] > +static > +struct rbnode __arena *rbnode_least_upper_bound(struct rbnode __arena *subtree, uint64_t key) > +{ > + struct rbnode __arena *node = subtree; > + int dir; > + > + if (!subtree) > + return NULL; > + > + while (can_loop) { > + dir = (key <= node->key) ? 0 : 1; > + > + if (!node->child[dir]) > + break; > + > + node = node->child[dir]; > + } > + > + return node; [Severity: High] If this loop terminates early because can_loop evaluates to false, will it return an internal tree node rather than a leaf node? > +} [ ... ] > +static > +int rb_node_insert(struct rbtree __arena *rbtree, > + struct rbnode __arena *node) > +{ > + struct rbnode __arena *grandparent, *parent = rbtree->root; > + u64 key = node->key; > + struct rbnode __arena *uncle; > + int dir; > + int ret; [ ... ] > + if (rbtree->insert != RB_DUPLICATE) > + parent = rbnode_find(parent, key); > + else > + parent = rbnode_least_upper_bound(parent, key); [ ... ] > + node->parent = parent; > + /* Also works if key == parent->key. */ > + if (key <= parent->key) > + parent->left = node; [Severity: High] If rbnode_find() or rbnode_least_upper_bound() returns an internal node due to a can_loop timeout, does unconditionally overwriting parent->left or parent->right here permanently detach and leak the existing subtree? > + else > + parent->right = node; [ ... ] > +static > +int rb_node_remove(struct rbtree __arena *rbtree, > + struct rbnode __arena *node) > +{ > + struct rbnode __arena *parent, *sibling, *close_nephew, *distant_nephew; > + bool free = (rbtree->alloc == RB_ALLOC); > + struct rbnode __arena *replace, *initial; [ ... ] > + while (can_loop) { > + > + /* Balancing reached the root, there can be no imbalance. */ > + if (!parent) > + return 0; [ ... ] > + sibling->is_red = true; > + node = parent; > + parent = node->parent; > + dir = rbnode_dir(node); > + } > + > + if (node != initial) { > + dir = rbnode_dir(node); > + parent = node->parent; > + sibling = parent->child[1-dir]; > + } > + /* > + * Almost there. We know between the parent, sibling, > + * and nephews only one or two of the nephews are red. If > + * it is the close one, rotate it to the sibling position, > + * paint it black, and paint the previous sibling red. > + */ > + > + close_nephew = sibling->child[dir]; > + distant_nephew = sibling->child[1 - dir]; > + > + /* > + * If the distant red nephew is not red, rotate > + * and repaint. We need the distant nephew > + * to be red. We know the close nephew is red > + * because at least one of them are, so the > + * distant one is black if it exists. > + */ > + if (!distant_nephew || !distant_nephew->is_red) { > + rbnode_rotate(rbtree, sibling, 1 - dir); > + sibling->is_red = true; > + close_nephew->is_red = false; [Severity: High] Can this cause a NULL pointer dereference if the rebalancing loop above exits early due to can_loop? An early exit bypasses the guarantee that the sibling has red children, which means both nephews, including close_nephew, might be NULL here. > + distant_nephew = sibling; > + sibling = close_nephew; > + } -- Sashiko AI review · https://sashiko.dev/#/patchset/20260605222020.5231-1-emil@etsalapatis.com?part=1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure 2026-06-05 22:30 ` sashiko-bot @ 2026-06-05 23:01 ` Emil Tsalapatis 0 siblings, 0 replies; 11+ messages in thread From: Emil Tsalapatis @ 2026-06-05 23:01 UTC (permalink / raw) To: sashiko-reviews, Emil Tsalapatis; +Cc: bpf On Fri Jun 5, 2026 at 6:30 PM EDT, sashiko-bot wrote: > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: > - [High] Failure to check `can_loop` termination leads to severe tree corruption, memory leaks, and NULL pointer dereferences. > -- > > commit 090a58da31563cb07c7562cdaaf67470a54e90c0 > Author: Emil Tsalapatis <emil@etsalapatis.com> > > selftests/bpf: libarena: Add rbtree data structure > > This commit adds a native red-black tree data structure to libarena. It > supports multiple APIs for querying and modifying the tree, and uses the > libarena memory allocator to manage its data. > >> diff --git a/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c >> new file mode 100644 >> index 0000000000000..0e4aaec96b6fa >> --- /dev/null >> +++ b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c > > [ ... ] > >> +__weak >> +int rb_destroy(struct rbtree __arena *rbtree) >> +{ >> + int ret = 0; >> + >> + arena_subprog_init(); >> + >> + if (unlikely(!rbtree)) >> + return -EINVAL; > > [ ... ] > >> + while (rbtree->root && can_loop) { >> + ret = rb_remove(rbtree, rbtree->root->key); >> + if (ret) >> + break; >> + } >> + >> +out: >> + arena_free(rbtree); > > [Severity: High] > Does this code leak memory if the loop exits early due to the can_loop > limit being reached? If the loop aborts early, the rbtree container is freed > even though the remaining nodes have not been removed. > >> + return ret; >> +} > > [ ... ] > >> +static >> +struct rbnode __arena *rbnode_least_upper_bound(struct rbnode __arena *subtree, uint64_t key) >> +{ >> + struct rbnode __arena *node = subtree; >> + int dir; >> + >> + if (!subtree) >> + return NULL; >> + >> + while (can_loop) { >> + dir = (key <= node->key) ? 0 : 1; >> + >> + if (!node->child[dir]) >> + break; >> + >> + node = node->child[dir]; >> + } >> + >> + return node; > > [Severity: High] > If this loop terminates early because can_loop evaluates to false, will it > return an internal tree node rather than a leaf node? > >> +} > > [ ... ] > >> +static >> +int rb_node_insert(struct rbtree __arena *rbtree, >> + struct rbnode __arena *node) >> +{ >> + struct rbnode __arena *grandparent, *parent = rbtree->root; >> + u64 key = node->key; >> + struct rbnode __arena *uncle; >> + int dir; >> + int ret; > > [ ... ] > >> + if (rbtree->insert != RB_DUPLICATE) >> + parent = rbnode_find(parent, key); >> + else >> + parent = rbnode_least_upper_bound(parent, key); > > [ ... ] > >> + node->parent = parent; >> + /* Also works if key == parent->key. */ >> + if (key <= parent->key) >> + parent->left = node; > > [Severity: High] > If rbnode_find() or rbnode_least_upper_bound() returns an internal node > due to a can_loop timeout, does unconditionally overwriting parent->left or > parent->right here permanently detach and leak the existing subtree? > >> + else >> + parent->right = node; > > [ ... ] > >> +static >> +int rb_node_remove(struct rbtree __arena *rbtree, >> + struct rbnode __arena *node) >> +{ >> + struct rbnode __arena *parent, *sibling, *close_nephew, *distant_nephew; >> + bool free = (rbtree->alloc == RB_ALLOC); >> + struct rbnode __arena *replace, *initial; > > [ ... ] > >> + while (can_loop) { >> + >> + /* Balancing reached the root, there can be no imbalance. */ >> + if (!parent) >> + return 0; > > [ ... ] > >> + sibling->is_red = true; >> + node = parent; >> + parent = node->parent; >> + dir = rbnode_dir(node); >> + } >> + >> + if (node != initial) { >> + dir = rbnode_dir(node); >> + parent = node->parent; >> + sibling = parent->child[1-dir]; >> + } >> + /* >> + * Almost there. We know between the parent, sibling, >> + * and nephews only one or two of the nephews are red. If >> + * it is the close one, rotate it to the sibling position, >> + * paint it black, and paint the previous sibling red. >> + */ >> + >> + close_nephew = sibling->child[dir]; >> + distant_nephew = sibling->child[1 - dir]; >> + >> + /* >> + * If the distant red nephew is not red, rotate >> + * and repaint. We need the distant nephew >> + * to be red. We know the close nephew is red >> + * because at least one of them are, so the >> + * distant one is black if it exists. >> + */ >> + if (!distant_nephew || !distant_nephew->is_red) { >> + rbnode_rotate(rbtree, sibling, 1 - dir); >> + sibling->is_red = true; >> + close_nephew->is_red = false; > > [Severity: High] > Can this cause a NULL pointer dereference if the rebalancing loop above > exits early due to can_loop? > > An early exit bypasses the guarantee that the sibling has red children, > which means both nephews, including close_nephew, might be NULL here. No. We don't care what happens if can_loop triggers. Same for the rest of the bot feedback. The only outstanding actionable nit is imo the mismatch between the function names in the patch description and the actual code for spmc, for which I can send another version if necessary. > >> + distant_nephew = sibling; >> + sibling = close_nephew; >> + } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure 2026-06-05 22:20 ` [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure Emil Tsalapatis 2026-06-05 22:30 ` sashiko-bot @ 2026-06-05 22:51 ` bot+bpf-ci 1 sibling, 0 replies; 11+ messages in thread From: bot+bpf-ci @ 2026-06-05 22:51 UTC (permalink / raw) To: emil, bpf Cc: ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski, emil, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 7260 bytes --] > diff --git a/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c > new file mode 100644 > index 000000000000..7f0f6dc3e17d > --- /dev/null > +++ b/tools/testing/selftests/bpf/libarena/src/rbtree.bpf.c [ ... ] > +__weak > +int rb_destroy(struct rbtree __arena *rbtree) > +{ > + int ret = 0; > + > + arena_subprog_init(); > + > + if (unlikely(!rbtree)) > + return -EINVAL; > + > + if (rbtree->alloc == RB_NOALLOC) { > + /* > + * We cannot do anything about RB_NOALLOC nodes. The whole > + * point of RB_NOALLOC is that the nodes are directly owned > + * by the caller that allocates and inserts them. We could > + * unilaterally grab all nodes and free them anyway, but that > + * would almost certainly cause UAF as the callers keep accessing > + * the now freed nodes. Throw an error instead. > + */ > + if (rbtree->root) { > + arena_stderr("WARNING: Destroying RB_NOALLOC tree with > 0 nodes"); > + return -EBUSY; > + } > + > + goto out; > + } > + > + while (rbtree->root && can_loop) { > + ret = rb_remove(rbtree, rbtree->root->key); > + if (ret) > + break; > + } > + > +out: > + arena_free(rbtree); > + return ret; > +} Does this code leak memory if the loop exits early due to the can_loop limit being reached? If the loop aborts early, the rbtree container is freed even though the remaining nodes have not been removed. This was flagged in the v4 review by sashiko-bot: https://lore.kernel.org/bpf/20260605223052.D18201F00893@smtp.kernel.org/ [ ... ] > +static > +struct rbnode __arena *rbnode_least_upper_bound(struct rbnode __arena *subtree, uint64_t key) > +{ > + struct rbnode __arena *node = subtree; > + int dir; > + > + if (!subtree) > + return NULL; > + > + while (can_loop) { > + dir = (key <= node->key) ? 0 : 1; > + > + if (!node->child[dir]) > + break; > + > + node = node->child[dir]; > + } > + > + return node; > +} If this loop terminates early because can_loop evaluates to false, will it return an internal tree node rather than a leaf node? This was flagged in the v4 review by sashiko-bot: https://lore.kernel.org/bpf/20260605223052.D18201F00893@smtp.kernel.org/ [ ... ] > +static > +int rb_node_insert(struct rbtree __arena *rbtree, > + struct rbnode __arena *node) > +{ > + struct rbnode __arena *grandparent, *parent = rbtree->root; > + u64 key = node->key; > + struct rbnode __arena *uncle; > + int dir; > + int ret; > + > + if (unlikely(!rbtree)) > + return -EINVAL; > + > + if (!parent) { > + rbtree->root = node; > + return 0; > + } > + > + if (rbtree->insert != RB_DUPLICATE) > + parent = rbnode_find(parent, key); > + else > + parent = rbnode_least_upper_bound(parent, key); > + > + if (key == parent->key && rbtree->insert != RB_DUPLICATE) { > + if (rbtree->insert == RB_UPDATE) { > + /* > + * Replace the old node with the new one. > + * Free up the old node. > + */ > + ret = rbnode_replace(rbtree, parent, node); > + if (ret) > + return ret; > + > + if (rbtree->alloc == RB_ALLOC) > + rb_node_free(parent); > + > + return 0; > + } > + > + /* Otherwise it's RB_DEFAULT. */ > + return -EALREADY; > + } > + > + node->parent = parent; > + /* Also works if key == parent->key. */ > + if (key <= parent->key) > + parent->left = node; > + else > + parent->right = node; If rbnode_find() or rbnode_least_upper_bound() returns an internal node due to a can_loop timeout, does unconditionally overwriting parent->left or parent->right here permanently detach and leak the existing subtree? This was flagged in the v4 review by sashiko-bot: https://lore.kernel.org/bpf/20260605223052.D18201F00893@smtp.kernel.org/ [ ... ] > +static > +int rb_node_remove(struct rbtree __arena *rbtree, > + struct rbnode __arena *node) > +{ > + struct rbnode __arena *parent, *sibling, *close_nephew, *distant_nephew; > + bool free = (rbtree->alloc == RB_ALLOC); > + struct rbnode __arena *replace, *initial; > + bool is_red; > + int dir; [ ... ] > + while (can_loop) { > + > + /* Balancing reached the root, there can be no imbalance. */ > + if (!parent) > + return 0; [ ... ] > + /* Sibling is red, turn it into the grandparent. */ > + if (sibling->is_red) { > + /* > + * Sibling is red. Transform the tree to turn > + * the sibling into the parent's position, and > + * repaint them. This does not balance the tree > + * but makes it so we know the sibling is black > + * and so can use the transformations to balance. > + */ > + rbnode_rotate(rbtree, parent, dir); > + parent->is_red = true; > + sibling->is_red = false; > + > + /* Our new sibling is now the close nephew. */ > + sibling = parent->child[1 - dir]; > + /* If sibling has any red siblings, break out. */ > + if (rbnode_has_red_children(sibling)) > + break; > + > + /* We can repaint the sibling and parent, we're done. */ > + sibling->is_red = true; > + parent->is_red = false; > + > + return 0; > + } > + > + /* Sibling guaranteed to be black. If it has red children, break out. */ > + if (rbnode_has_red_children(sibling)) > + break; > + > + /* > + * Both sibling and children are black. If parent is red, swap > + * colors with the sibling. Otherwise > + */ > + if (parent->is_red) { > + parent->is_red = false; > + sibling->is_red = true; > + return 0; > + } > + > + /* > + * Parent, sibling, and all its children are black. Repaint the sibling. > + * This shortens the paths through it, so pop up a level in the > + * tree and repeat the balancing. > + */ > + sibling->is_red = true; > + node = parent; > + parent = node->parent; > + dir = rbnode_dir(node); > + } > + > + if (node != initial) { > + dir = rbnode_dir(node); > + parent = node->parent; > + sibling = parent->child[1-dir]; > + } > + /* > + * Almost there. We know between the parent, sibling, > + * and nephews only one or two of the nephews are red. If > + * it is the close one, rotate it to the sibling position, > + * paint it black, and paint the previous sibling red. > + */ > + > + close_nephew = sibling->child[dir]; > + distant_nephew = sibling->child[1 - dir]; > + > + /* > + * If the distant red nephew is not red, rotate > + * and repaint. We need the distant nephew > + * to be red. We know the close nephew is red > + * because at least one of them are, so the > + * distant one is black if it exists. > + */ > + if (!distant_nephew || !distant_nephew->is_red) { > + rbnode_rotate(rbtree, sibling, 1 - dir); > + sibling->is_red = true; > + close_nephew->is_red = false; > + distant_nephew = sibling; > + sibling = close_nephew; > + } Can this cause a NULL pointer dereference if the rebalancing loop exits early due to can_loop? An early exit bypasses the guarantee that the sibling has red children, which means both nephews, including close_nephew, might be NULL here. This was flagged in the v4 review by sashiko-bot: https://lore.kernel.org/bpf/20260605223052.D18201F00893@smtp.kernel.org/ --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27043406243 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH bpf-next v4 2/3] selftests/bpf: libarena: Add spmc queue data structure 2026-06-05 22:20 [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures Emil Tsalapatis 2026-06-05 22:20 ` [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure Emil Tsalapatis @ 2026-06-05 22:20 ` Emil Tsalapatis 2026-06-05 22:51 ` bot+bpf-ci 2026-06-05 22:20 ` [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest Emil Tsalapatis 2026-06-06 3:40 ` [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures patchwork-bot+netdevbpf 3 siblings, 1 reply; 11+ messages in thread From: Emil Tsalapatis @ 2026-06-05 22:20 UTC (permalink / raw) To: bpf Cc: ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski, Emil Tsalapatis Expand libarena with a single producer multiple consumer deque data structure. This is a single producer, multiple consumer lockless structure that permits efficient work stealing. The structure is a Lev-Chase queue, so it is lock-free and wait-free. The data structure exposes three main calls. two of them are available to the thread owning the queue and one available to all threads in the program: spmc_owner_push(): Push an item to the top of the queue. spmc_owner_pop(): Pop an item from the top of the queue. spmc_steal(): Steal a thread from the bottom of the queue from any thread. Note that the queue is not really FIFO for all consumers, since non-owners of the queue can only work steal from the bottom. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> --- .../bpf/libarena/include/libarena/spmc.h | 27 ++ .../bpf/libarena/selftests/test_spmc.bpf.c | 194 +++++++++++++++ .../selftests/bpf/libarena/src/spmc.bpf.c | 234 ++++++++++++++++++ 3 files changed, 455 insertions(+) create mode 100644 tools/testing/selftests/bpf/libarena/include/libarena/spmc.h create mode 100644 tools/testing/selftests/bpf/libarena/selftests/test_spmc.bpf.c create mode 100644 tools/testing/selftests/bpf/libarena/src/spmc.bpf.c diff --git a/tools/testing/selftests/bpf/libarena/include/libarena/spmc.h b/tools/testing/selftests/bpf/libarena/include/libarena/spmc.h new file mode 100644 index 000000000000..75611276ce13 --- /dev/null +++ b/tools/testing/selftests/bpf/libarena/include/libarena/spmc.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause */ + +#pragma once + +struct spmc_arr; + +#define SPMC_ARR_BASESZ 128 +#define SPMC_ARR_ORDERS 10 + +struct spmc_arr { + u64 __arena *data; + u64 order; +}; + +struct spmc { + volatile struct spmc_arr __arena *cur; + volatile u64 top; + volatile u64 bottom; + struct spmc_arr arr[SPMC_ARR_ORDERS]; +}; + +int spmc_owned_add(struct spmc __arena *spmc, u64 val); +int spmc_owned_remove(struct spmc __arena *spmc, u64 *val); +int spmc_steal(struct spmc __arena *spmc, u64 *val); + +struct spmc __arena *spmc_create(void); +int spmc_destroy(struct spmc __arena *spmc); diff --git a/tools/testing/selftests/bpf/libarena/selftests/test_spmc.bpf.c b/tools/testing/selftests/bpf/libarena/selftests/test_spmc.bpf.c new file mode 100644 index 000000000000..4d7a520115d1 --- /dev/null +++ b/tools/testing/selftests/bpf/libarena/selftests/test_spmc.bpf.c @@ -0,0 +1,194 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause + +#include <libarena/common.h> + +#include <libarena/asan.h> +#include <libarena/spmc.h> + +/* + * NOTE: These selftests only test for the single-threaded use case, which for + * Lev-Chase queues is obviously the simplest one. Still, it is important to + * exercise the API to ensure it passes verification and basic checks. + */ + +SEC("syscall") +int test_spmc_remove_empty(void) +{ + u64 val; + int ret; + + struct spmc __arena *spmc = spmc_create(); + + if (!spmc) + return 1; + + ret = spmc_owned_remove(spmc, &val); + if (ret != -ENOENT) + return 1; + + spmc_destroy(spmc); + + return 0; +} + +SEC("syscall") +int test_spmc_steal_empty(void) +{ + u64 val; + int ret; + + struct spmc __arena *spmc = spmc_create(); + + if (!spmc) + return 1; + + ret = spmc_steal(spmc, &val); + if (ret != -ENOENT) + return 1; + + spmc_destroy(spmc); + + return 0; +} + +SEC("syscall") +int test_spmc_steal_one(void) +{ + u64 val, newval; + int ret, i; + + struct spmc __arena *spmc = spmc_create(); + + if (!spmc) + return 1; + + for (i = 0; i < 10 && can_loop; i++) { + val = i; + + ret = spmc_owned_add(spmc, val); + if (ret) + return 1; + + ret = spmc_steal(spmc, &newval); + if (ret) + return 2; + + if (val != newval) + return 3; + } + + spmc_destroy(spmc); + + return 0; +} + +SEC("syscall") +int test_spmc_remove_one(void) +{ + u64 val, newval; + int ret, i; + + struct spmc __arena *spmc = spmc_create(); + + if (!spmc) + return 1; + + for (i = 0; i < 10 && can_loop; i++) { + val = i; + + ret = spmc_owned_add(spmc, val); + if (ret) + return 1; + + ret = spmc_owned_remove(spmc, &newval); + if (ret) + return 2; + + if (val != newval) + return 3; + } + + spmc_destroy(spmc); + + return 0; +} + +SEC("syscall") +int test_spmc_remove_many(void) +{ + u64 val, newval; + int ret, i; + u64 expected; + + struct spmc __arena *spmc = spmc_create(); + + if (!spmc) + return 1; + + for (i = 0; i < 500 && can_loop; i++) { + val = i; + + ret = spmc_owned_add(spmc, val); + if (ret) { + arena_stderr("%s:%d error %d\n", __func__, __LINE__, ret); + return 1; + } + } + + for (i = 0; i < 500 && can_loop; i++) { + ret = spmc_owned_remove(spmc, &newval); + if (ret) { + arena_stderr("%s:%d error %d\n", __func__, __LINE__, ret); + return 1; + } + + expected = 500 - 1 - i; + if (newval != expected) { + arena_stderr("%s:%d expected %llu found %llu\n", __func__, __LINE__, expected, newval); + return 1; + } + } + + spmc_destroy(spmc); + + return 0; +} + +SEC("syscall") +int test_spmc_steal_many(void) +{ + u64 val, newval; + int ret, i; + + struct spmc __arena *spmc = spmc_create(); + + if (!spmc) + return 1; + + for (i = 0; i < 500 && can_loop; i++) { + val = i; + + ret = spmc_owned_add(spmc, val); + if (ret) { + arena_stderr("%s:%d error %d\n", __func__, __LINE__, ret); + return 1; + } + } + + for (i = 0; i < 500 && can_loop; i++) { + ret = spmc_steal(spmc, &newval); + if (ret) { + arena_stderr("%s:%d error %d\n", __func__, __LINE__, ret); + return 1; + } + + if (newval != i) { + arena_stderr("%s:%d expected %d found %llu\n", __func__, __LINE__, i, newval); + return 1; + } + } + + spmc_destroy(spmc); + + return 0; +} diff --git a/tools/testing/selftests/bpf/libarena/src/spmc.bpf.c b/tools/testing/selftests/bpf/libarena/src/spmc.bpf.c new file mode 100644 index 000000000000..42732b7d29a6 --- /dev/null +++ b/tools/testing/selftests/bpf/libarena/src/spmc.bpf.c @@ -0,0 +1,234 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause +/* + * Copyright (c) 2025-2026 Meta Platforms, Inc. and affiliates. + * Copyright (c) 2025-2026 Emil Tsalapatis <etsal@meta.com> + */ + +#include <bpf_atomic.h> + +#include <libarena/common.h> + +#include <libarena/asan.h> +#include <libarena/spmc.h> + +static inline +u64 spmc_arr_size(volatile struct spmc_arr __arena *spmc_arr) +{ + return SPMC_ARR_BASESZ << spmc_arr->order; +} + +static inline +u64 spmc_arr_get(volatile struct spmc_arr __arena *spmc_arr, u64 ind) +{ + u64 ret = READ_ONCE(spmc_arr->data[ind % spmc_arr_size(spmc_arr)]); + + return ret; +} + +static inline +void spmc_arr_put(volatile struct spmc_arr __arena *spmc_arr, u64 ind, u64 value) +{ + WRITE_ONCE(spmc_arr->data[ind % spmc_arr_size(spmc_arr)], value); +} + +static inline +void spmc_arr_copy(volatile struct spmc_arr __arena *dst, + volatile struct spmc_arr __arena *src, u64 b, u64 t) +{ + u64 i; + + for (i = t; i < b && can_loop; i++) + spmc_arr_put(dst, i, spmc_arr_get(src, i)); +} + +static inline +int spmc_order_init(struct spmc __arena *spmc, int order) +{ + volatile struct spmc_arr __arena *arr = &spmc->arr[order]; + + if (unlikely(!spmc)) + return -EINVAL; + + if (order >= SPMC_ARR_ORDERS) + return -E2BIG; + + /* Already allocated? */ + if (arr->data) + return 0; + + arr->data = arena_malloc((SPMC_ARR_BASESZ << order) * sizeof(*arr->data)); + if (!arr->data) + return -ENOMEM; + + return 0; +} + +__weak +int spmc_owned_add(struct spmc __arena *spmc, u64 val) +{ + volatile struct spmc_arr __arena *newarr; + volatile struct spmc_arr __arena *arr; + ssize_t sz; + u64 b, t; + int ret; + + if (unlikely(!spmc)) + return -EINVAL; + + /* + * Bottom must always be read first, also + * see spmc_steal(). + */ + b = smp_load_acquire(&spmc->bottom); + t = READ_ONCE(spmc->top); + arr = READ_ONCE(spmc->cur); + + sz = b - t; + if (sz >= spmc_arr_size(arr) - 1) { + ret = spmc_order_init(spmc, arr->order + 1); + if (ret) + return ret; + + newarr = &spmc->arr[arr->order + 1]; + + spmc_arr_copy(newarr, arr, b, t); + smp_store_release(&spmc->cur, newarr); + arr = newarr; + } + + spmc_arr_put(arr, b, val); + smp_store_release(&spmc->bottom, b + 1); + + return 0; +} + + +__weak +int spmc_owned_remove(struct spmc __arena *spmc, u64 *val) +{ + volatile struct spmc_arr __arena *arr; + int ret = 0; + ssize_t sz; + u64 value; + u64 b, t; + + if (unlikely(!spmc || !val)) + return -EINVAL; + + b = READ_ONCE(spmc->bottom) - 1; + WRITE_ONCE(spmc->bottom, b); + smp_mb(); + + t = READ_ONCE(spmc->top); + arr = READ_ONCE(spmc->cur); + + sz = b - t; + if (sz < 0) { + WRITE_ONCE(spmc->bottom, t); + return -ENOENT; + } + + value = spmc_arr_get(arr, b); + if (sz > 0) { + *val = value; + return 0; + } + + if (cmpxchg(&spmc->top, t, t + 1) != t) + ret = -EAGAIN; + + WRITE_ONCE(spmc->bottom, t + 1); + + if (ret) + return ret; + + *val = value; + + return 0; +} + +__weak +int spmc_steal(struct spmc __arena *spmc, u64 *val) +{ + volatile struct spmc_arr __arena *arr; + ssize_t sz; + u64 value; + u64 b, t; + + if (unlikely(!spmc || !val)) + return -EINVAL; + + /* + * It is important that t is read before b for + * stealers to avoid racing with the owner. + * Races between stealers are dealt with using + * CAS to increment the top value below. + */ + t = smp_load_acquire(&spmc->top); + b = smp_load_acquire(&spmc->bottom); + + sz = b - t; + if (sz <= 0) + return -ENOENT; + + arr = smp_load_acquire(&spmc->cur); + value = spmc_arr_get(arr, t); + + if (cmpxchg(&spmc->top, t, t + 1) != t) + return -EAGAIN; + + *val = value; + + return 0; +} + + +__weak +struct spmc __arena *spmc_create(void) +{ + /* + * Marked as volatile because otherwise the array + * reference in the internal loop gets demoted to + * scalar and the program fails verification. + */ + struct spmc __arena *volatile spmc; + int ret, i; + + spmc = arena_malloc(sizeof(*spmc)); + if (!spmc) + return NULL; + + spmc->bottom = 0; + spmc->top = 0; + + for (i = 0; i < SPMC_ARR_ORDERS && can_loop; i++) { + spmc->arr[i].data = NULL; + spmc->arr[i].order = i; + } + + ret = spmc_order_init((struct spmc __arena *)spmc, 0); + if (ret) { + arena_free(spmc); + return NULL; + } + + spmc->cur = &spmc->arr[0]; + + return (struct spmc __arena *)spmc; +} + +__weak +int spmc_destroy(struct spmc __arena *spmc) +{ + int i; + + if (unlikely(!spmc)) + return -EINVAL; + + for (i = 0; i < SPMC_ARR_ORDERS && can_loop; i++) + arena_free(spmc->arr[i].data); + + arena_free(spmc); + + return 0; +} -- 2.54.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v4 2/3] selftests/bpf: libarena: Add spmc queue data structure 2026-06-05 22:20 ` [PATCH bpf-next v4 2/3] selftests/bpf: libarena: Add spmc queue " Emil Tsalapatis @ 2026-06-05 22:51 ` bot+bpf-ci 0 siblings, 0 replies; 11+ messages in thread From: bot+bpf-ci @ 2026-06-05 22:51 UTC (permalink / raw) To: emil, bpf Cc: ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski, emil, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 2365 bytes --] > selftests/bpf: libarena: Add spmc queue data structure > > Expand libarena with a single producer multiple consumer deque data > structure. This is a single producer, multiple consumer lockless structure > that permits efficient work stealing. The structure is a Lev-Chase queue, > so it is lock-free and wait-free. > > The data structure exposes three main calls. two of them are available to > the thread owning the queue and one available to all threads in the program: > > spmc_owner_push(): Push an item to the top of the queue. > spmc_owner_pop(): Pop an item from the top of the queue. > spmc_steal(): Steal a thread from the bottom of the queue from > any thread. The commit message describes an API that doesn't match the implementation. The message refers to spmc_owner_push() and spmc_owner_pop(), but the actual functions declared in libarena/spmc.h and defined in spmc.bpf.c are spmc_owned_add() and spmc_owned_remove(): tools/testing/selftests/bpf/libarena/include/libarena/spmc.h: int spmc_owned_add(struct spmc __arena *spmc, u64 val); int spmc_owned_remove(struct spmc __arena *spmc, u64 *val); int spmc_steal(struct spmc __arena *spmc, u64 *val); Additionally, the descriptions are inverted. The message says the owner operates on the 'top' and stealers steal from the 'bottom', whereas the code has the owner operate on 'bottom' and stealers operate on 'top': tools/testing/selftests/bpf/libarena/src/spmc.bpf.c:spmc_owned_add() { b = smp_load_acquire(&spmc->bottom); ... spmc_arr_put(arr, b, val); smp_store_release(&spmc->bottom, b + 1); } tools/testing/selftests/bpf/libarena/src/spmc.bpf.c:spmc_owned_remove() { b = READ_ONCE(spmc->bottom) - 1; ... value = spmc_arr_get(arr, b); } tools/testing/selftests/bpf/libarena/src/spmc.bpf.c:spmc_steal() { t = smp_load_acquire(&spmc->top); ... value = spmc_arr_get(arr, t); ... if (cmpxchg(&spmc->top, t, t + 1) != t) } Finally, the message says spmc_steal() steals "a thread" rather than an item or value. Should the commit message be updated to match the actual API and implementation? --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27043406243 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest 2026-06-05 22:20 [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures Emil Tsalapatis 2026-06-05 22:20 ` [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure Emil Tsalapatis 2026-06-05 22:20 ` [PATCH bpf-next v4 2/3] selftests/bpf: libarena: Add spmc queue " Emil Tsalapatis @ 2026-06-05 22:20 ` Emil Tsalapatis 2026-06-05 22:28 ` sashiko-bot 2026-06-05 22:51 ` bot+bpf-ci 2026-06-06 3:40 ` [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures patchwork-bot+netdevbpf 3 siblings, 2 replies; 11+ messages in thread From: Emil Tsalapatis @ 2026-06-05 22:20 UTC (permalink / raw) To: bpf Cc: ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski, Emil Tsalapatis Add a parallel test for the SPMC Lev-Chase workstealing queue. The queue is built to be wait-free even when there are multiple consumers, and the parallel selftest provides a signal on whether the queue behaves correctly when stress tested. To support the test, this patch includes a test harness for parallel selftests. The spmc selftest acts as an example of the naming and other conventions expected by the harness. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> --- .../bpf/libarena/include/libarena/userspace.h | 6 + .../selftests/test_parallel_spmc.bpf.c | 673 ++++++++++++++++++ .../selftests/bpf/prog_tests/libarena.c | 187 +++++ 3 files changed, 866 insertions(+) create mode 100644 tools/testing/selftests/bpf/libarena/selftests/test_parallel_spmc.bpf.c diff --git a/tools/testing/selftests/bpf/libarena/include/libarena/userspace.h b/tools/testing/selftests/bpf/libarena/include/libarena/userspace.h index 88b68ac73cca..fc27a4bcf5d7 100644 --- a/tools/testing/selftests/bpf/libarena/include/libarena/userspace.h +++ b/tools/testing/selftests/bpf/libarena/include/libarena/userspace.h @@ -32,6 +32,12 @@ static inline bool libarena_is_asan_test_prog(const char *name) return strstr(name, "asan_test") == name; } +static inline bool libarena_is_parallel_test_prog(const char *name) +{ + return strstr(name, "parallel_test") == name; +} + + static inline int libarena_run_prog_args(int prog_fd, void *args, size_t argsize) { LIBBPF_OPTS(bpf_test_run_opts, opts); diff --git a/tools/testing/selftests/bpf/libarena/selftests/test_parallel_spmc.bpf.c b/tools/testing/selftests/bpf/libarena/selftests/test_parallel_spmc.bpf.c new file mode 100644 index 000000000000..981c845e2d15 --- /dev/null +++ b/tools/testing/selftests/bpf/libarena/selftests/test_parallel_spmc.bpf.c @@ -0,0 +1,673 @@ +// SPDX-License-Identifier: LGPL-2.1 OR BSD-2-Clause + +#include <bpf_atomic.h> + +#include <libarena/common.h> + +#include <libarena/asan.h> +#include <libarena/spmc.h> + +#define TEST_SPMC_THREADS 4 +#define TEST_SPMC_STEALERS (TEST_SPMC_THREADS - 1) + +/* + * The test requires the stealers/owners to sometimes quiesce + * before continuing the benchmark. Normally we'd use something + * like a condition variable, but since the benchmark is short-lived + * and operations are wait-free we just spin around the quiescence + * point instead. If we time out, we just fail the benchmark. + */ +#define TEST_SPMC_SYNC_SPINS (1U << 18) + +/* + * We track all the values we retrieve from the queue + * to get some guarantee we're, not corrupting data, + * e.g., accidentally reusing a past value from a slot. + */ +#define TEST_SPMC_MAX_VALUES (1024) +static u64 __arena seen[TEST_SPMC_MAX_VALUES]; + +/* The single spmc queue for the benchmark. */ +static struct spmc __arena *spmc; + +/* Owner and stealer epochs. We define the , */ +static volatile u64 owner_epoch; +static volatile u64 stealer_epoch; + +/* Map owner epochs to stealer epochs (simply scale by # of stealers). */ +#define STEALER_EPOCH(owner_epoch) ((owner_epoch) * TEST_SPMC_STEALERS) + +/* Global abort switch. If any thread fails, all others exit ASAP. */ +static volatile bool test_abort; + +/* + * Counters useful for ensuring conservation of pushes/pops of unique values + * (we're not stealing/popping more/fewer items than were pushed). + */ +static volatile u64 expected_total; +static volatile u64 total_seen; + +/* Measure how many pops and steals we've made (irrespective of retrieved value). */ +static volatile u64 pops; +static volatile u64 steals; + +/* Used for the resize selftest, see below. */ +static volatile u64 stealers_started; + +/* Used for the mixed selftest, see below. */ +static volatile u64 round_steals; + +/* + * We have multiple stealers and a single owner. We sometimes want the owner + * to successfully outproduce the stealers, we add a busy loop in them. + */ +#define TEST_SPMC_WASTE_ROUNDS (1024) + +/* + * The spmc data structure depends on the runtime fully + * supporting acquire/release semantics, which is not + * the case for all architectures. + */ +#if defined(ENABLE_ATOMICS_TESTS) && \ + (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \ + (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64)) +static bool spmc_tests_enabled(void) +{ + return true; +} +#else +static bool spmc_tests_enabled(void) +{ + return false; +} +#endif + +/* + * Scaffolding for each parallel test. Each test has setup/teardown, + * a single owner thread that owns the queue, and TEST_SPMC_STEALER + * threads that try to steal. + */ +#define DEFINE_PARALLEL_SPMC_TEST(prefix, expected_total) \ + SEC("syscall") int parallel_test_spmc_##prefix##__enabled(void) \ + { \ + return spmc_tests_enabled() ? 0 : -EOPNOTSUPP; \ + } \ + SEC("syscall") int parallel_test_spmc_##prefix##__init(void) \ + { \ + return spmc_common_init(expected_total); \ + } \ + SEC("syscall") int parallel_test_spmc_##prefix##__fini(void) \ + { \ + return spmc_common_fini(); \ + } \ + SEC("syscall") int parallel_test_spmc_##prefix##__0(void) \ + { \ + return spmc_##prefix##_owner(); \ + } \ + SEC("syscall") int parallel_test_spmc_##prefix##__1(void) \ + { \ + return spmc_##prefix##_stealer(); \ + } \ + SEC("syscall") int parallel_test_spmc_##prefix##__2(void) \ + { \ + return spmc_##prefix##_stealer(); \ + } \ + SEC("syscall") int parallel_test_spmc_##prefix##__3(void) \ + { \ + return spmc_##prefix##_stealer(); \ + } + +static int spmc_common_init(u64 total) +{ + u64 i; + + if (total > TEST_SPMC_MAX_VALUES) + return -E2BIG; + + owner_epoch = 0; + stealer_epoch = 0; + test_abort = false; + expected_total = total; + total_seen = 0; + pops = 0; + steals = 0; + stealers_started = 0; + round_steals = 0; + + for (i = zero; i < TEST_SPMC_MAX_VALUES && can_loop; i++) + seen[i] = 0; + + spmc = spmc_create(); + if (!spmc) + return -ENOMEM; + + return 0; +} + +static int spmc_common_fini(void) +{ + int ret; + + ret = spmc_destroy(spmc); + spmc = NULL; + + return ret; +} + +__weak +int spmc_quiesce_on_owner(u64 epoch) +{ + u64 i; + + bpf_for(i, 0, TEST_SPMC_SYNC_SPINS) { + if (test_abort) + return -EINTR; + if (smp_load_acquire(&owner_epoch) >= epoch) + return 0; + } + + test_abort = true; + + return -ETIMEDOUT; +} + +__weak +int spmc_quiesce_on_stealer(u64 epoch) +{ + u64 target, cur; + unsigned int i; + int err = -ETIMEDOUT; + + target = STEALER_EPOCH(epoch); + bpf_for(i, 0, TEST_SPMC_SYNC_SPINS) { + + if (test_abort) { + err = -EINTR; + break; + } + + cur = smp_load_acquire(&stealer_epoch); + if (cur > target) { + err = -EINVAL; + test_abort = true; + break; + } + + if (cur == target) + return 0; + } + + test_abort = true; + + return err; +} + +static int spmc_update_stats(u64 val, bool owner) +{ + u64 total; + + total = expected_total; + if (val >= total || val >= TEST_SPMC_MAX_VALUES) { + test_abort = true; + return -EINVAL; + } + + if (__sync_fetch_and_add(&seen[val], 1) != 0) { + test_abort = true; + return -EINVAL; + } + + __sync_fetch_and_add(&total_seen, 1); + if (owner) + __sync_fetch_and_add(&pops, 1); + else + __sync_fetch_and_add(&steals, 1); + + return 0; +} + +static int spmc_validate_owner_empty(void) +{ + u64 val; + int ret; + + ret = spmc_owned_remove(spmc, &val); + if (ret != -ENOENT) { + test_abort = true; + /* Change a 0 return value into -EINVAL. */ + return ret ?: -EINVAL; + } + + return 0; +} + +__weak +int spmc_validate_all_seen(void) +{ + u64 i, total; + + total = expected_total; + if (total_seen != total) + goto err; + + if (pops + steals != total) + goto err; + + for (i = zero; i < total && can_loop; i++) { + if (seen[i % TEST_SPMC_MAX_VALUES] != 1) + goto err; + } + + return 0; + +err: + test_abort = true; + + return -EINVAL; +} + +/* + * Single value benchmark. The owner adds an item then races with + * the stealers for it. This way directly race between owner and + * stealers on the same slot. + */ + + +#define TEST_SPMC_SINGLEVAL_ITERS (64) + +__weak +int spmc_singleval_tryconsume(u64 expected, bool steal) +{ + u64 val; + int ret; + + while (can_loop) { + if (steal) + ret = spmc_steal(spmc, &val); + else + ret = spmc_owned_remove(spmc, &val); + + /* Success. Update and validate. */ + if (!ret) { + if (val != expected) + return -EINVAL; + + ret = spmc_update_stats(val, !steal); + if (ret) + return ret; + + return 0; + } + + /* + * If we got -ENOENT, the queue is empty + * and we're good to go. + */ + if (ret != -EAGAIN) + return (ret == -ENOENT) ? 0 : ret; + } + + /* Impossible. */ + return -EINVAL; +} + +static int spmc_singleval_owner(void) +{ + int ret; + u64 i; + + for (i = zero; i < TEST_SPMC_SINGLEVAL_ITERS && can_loop; i++) { + ret = spmc_quiesce_on_stealer(i); + if (ret) + goto err; + + ret = spmc_owned_add(spmc, i); + if (ret) + goto err; + + __sync_fetch_and_add(&owner_epoch, 1); + + ret = spmc_singleval_tryconsume(i, false); + if (ret) + goto err; + + ret = spmc_quiesce_on_stealer(i + 1); + if (ret) + goto err; + } + + ret = spmc_validate_owner_empty(); + if (ret) + return ret; + + return spmc_validate_all_seen(); + +err: + test_abort = true; + return -EINVAL; +} + +static int spmc_singleval_stealer(void) +{ + int ret; + u64 i; + + for (i = zero; i < TEST_SPMC_SINGLEVAL_ITERS && can_loop; i++) { + ret = spmc_quiesce_on_owner(i + 1); + if (ret) + goto err; + + ret = spmc_singleval_tryconsume(i, true); + if (ret) + goto err; + + __sync_fetch_and_add(&stealer_epoch, 1); + } + + return 0; + +err: + test_abort = true; + return -EINVAL; +} + +DEFINE_PARALLEL_SPMC_TEST(singleval, TEST_SPMC_SINGLEVAL_ITERS) + +/* + * The resize test. Force a resize from the owner even while the stealers + * are trying to consume. Then make sure the queue is still consistent + * after the resize. + * + * The owner _doesn't_ consume from the queue. The test makes sure that + * switching the array from underneath the stealers works. + */ + +/* Force 2 resizes (since the rate of resize is logarithmic). */ +#define TEST_SPMC_RESIZE_ORDER (2) +#define TEST_SPMC_RESIZE_PREFILL ((SPMC_ARR_BASESZ << TEST_SPMC_RESIZE_ORDER) - 1) + +/* */ +#define TEST_SPMC_RESIZE_TAIL (SPMC_ARR_BASESZ << TEST_SPMC_RESIZE_ORDER) +#define TEST_SPMC_RESIZE_TOTAL (TEST_SPMC_RESIZE_PREFILL + TEST_SPMC_RESIZE_TAIL) + +__weak +int spmc_wait_for_stealers_to_start(u64 target) +{ + u64 i; + + bpf_for(i, 0, TEST_SPMC_SYNC_SPINS) { + if (test_abort) + return -EINTR; + if (READ_ONCE(stealers_started) >= target) + return 0; + } + + test_abort = true; + + return -ETIMEDOUT; +} + +__weak +void spmc_waste_time(void) +{ + int i; + int j; + + for (i = zero; i < TEST_SPMC_WASTE_ROUNDS && can_loop; i++) { + /* Random computation. */ + WRITE_ONCE(j, i * 17 + 23); + } +} + +static int spmc_resize_owner(void) +{ + bool resized = false; + u64 i; + int ret; + + /* Get a head start vs the consumers. */ + for (i = zero; i < TEST_SPMC_RESIZE_PREFILL && can_loop; i++) { + ret = spmc_owned_add(spmc, i); + if (ret) { + test_abort = true; + return ret; + } + } + + __sync_fetch_and_add(&owner_epoch, 1); + + /* Wait for stealers to start then start racing. */ + ret = spmc_wait_for_stealers_to_start(TEST_SPMC_STEALERS); + if (ret) + return ret; + + for (i = TEST_SPMC_RESIZE_PREFILL; i < TEST_SPMC_RESIZE_TOTAL && can_loop; i++) { + ret = spmc_owned_add(spmc, i); + if (ret) { + test_abort = true; + return ret; + } + + if (spmc->cur->order > TEST_SPMC_RESIZE_ORDER) + resized = true; + } + + /* Did we get to resize while racing/ */ + if (!resized) { + test_abort = true; + return -153; + } + + /* + * Wait for the stealers to drain and make sure + * we didn't lose any items along the way. + */ + __sync_fetch_and_add(&owner_epoch, 1); + + ret = spmc_quiesce_on_stealer(1); + if (ret) + return ret; + + ret = spmc_validate_owner_empty(); + if (ret) + return ret; + + return spmc_validate_all_seen(); +} + +static int spmc_resize_stealer(void) +{ + bool owner_done = false; + u64 val; + int ret; + + arena_subprog_init(); + + ret = spmc_quiesce_on_owner(1); + if (ret) + return ret; + + __sync_fetch_and_add(&stealers_started, 1); + + while (can_loop) { + spmc_waste_time(); + if (test_abort) + return -EINTR; + + ret = spmc_steal(spmc, &val); + if (!ret) { + ret = spmc_update_stats(val, false); + if (ret) + return ret; + continue; + } + + if (ret == -EAGAIN) + continue; + + if (ret == -ENOENT) { + if (owner_done) + break; + owner_done = owner_epoch >= 2; + continue; + } + + test_abort = true; + return ret; + } + + __sync_fetch_and_add(&stealer_epoch, 1); + + return 0; +} + +DEFINE_PARALLEL_SPMC_TEST(resize, TEST_SPMC_RESIZE_TOTAL) + +/* + * The burst benchmark. The owner generates data all at once, + * then waits for the stealers to steal half then starts removing + * items until the queue empties. The owner also makes sure the + * item order is not jumbled. + */ + +#define TEST_SPMC_BURST_ROUNDS (4) +#define TEST_SPMC_BURST_BURST (64) +#define TEST_SPMC_BURST_TOTAL (TEST_SPMC_BURST_ROUNDS * TEST_SPMC_BURST_BURST) +#define TEST_SPMC_BURST_STEAL_TARGET (TEST_SPMC_BURST_BURST / 2) + +static int spmc_wait_for_round_steals(u64 target) +{ + u64 i; + + arena_subprog_init(); + + bpf_for(i, 0, TEST_SPMC_SYNC_SPINS) { + if (test_abort) + return -EINTR; + if (round_steals >= target) + return 0; + } + + test_abort = true; + + return -ETIMEDOUT; +} + +__weak int +spmc_burst_owner_round(u64 round) +{ + u64 i, base, stolen, expected, val; + int ret; + + base = round * TEST_SPMC_BURST_BURST; + round_steals = 0; + + for (i = zero; i < TEST_SPMC_BURST_BURST && can_loop; i++) { + ret = spmc_owned_add(spmc, base + i); + if (ret) + return ret; + } + + __sync_fetch_and_add(&owner_epoch, 1); + + ret = spmc_wait_for_round_steals(TEST_SPMC_BURST_STEAL_TARGET); + if (ret == -EINTR || ret == -ETIMEDOUT) + return ret; + + __sync_fetch_and_add(&owner_epoch, 1); + + ret = spmc_quiesce_on_stealer(round + 1); + if (ret) + return ret; + + stolen = round_steals; + if (stolen > TEST_SPMC_BURST_BURST) + return -EINVAL; + + for (i = zero; i < TEST_SPMC_BURST_BURST - stolen && can_loop; i++) { + ret = spmc_owned_remove(spmc, &val); + if (ret) + return ret; + + expected = base + TEST_SPMC_BURST_BURST - 1 - i; + if (val != expected) + return -EINVAL; + + ret = spmc_update_stats(val, true); + if (ret) { + test_abort = true; + return -EINVAL; + } + } + + ret = spmc_validate_owner_empty(); + if (ret) + return ret; + + return 0; +} + +static int spmc_burst_owner(void) +{ + u64 round; + int ret; + + arena_subprog_init(); + + for (round = zero; round < TEST_SPMC_BURST_ROUNDS && can_loop; round++) { + ret = spmc_burst_owner_round(round); + if (ret) + goto err; + } + + return spmc_validate_all_seen(); + +err: + test_abort = true; + return -EINVAL; +} + +static int spmc_burst_stealer(void) +{ + u64 round, val, active_epoch; + int ret; + + arena_subprog_init(); + + for (round = zero; round < TEST_SPMC_BURST_ROUNDS && can_loop; round++) { + active_epoch = round * 2 + 1; + + /* + * Wait till the owner prefills the queue then + * start stealing. + */ + ret = spmc_quiesce_on_owner(active_epoch); + if (ret) + return ret; + + while (owner_epoch == active_epoch && can_loop) { + if (test_abort) + return -EINTR; + + ret = spmc_steal(spmc, &val); + if (!ret) { + ret = spmc_update_stats(val, false); + if (ret) + return ret; + __sync_fetch_and_add(&round_steals, 1); + continue; + } + if (ret == -EAGAIN || ret == -ENOENT) + continue; + + test_abort = true; + return ret; + } + + __sync_fetch_and_add(&stealer_epoch, 1); + } + + return 0; +} + +DEFINE_PARALLEL_SPMC_TEST(burst, TEST_SPMC_BURST_TOTAL) diff --git a/tools/testing/selftests/bpf/prog_tests/libarena.c b/tools/testing/selftests/bpf/prog_tests/libarena.c index 81bdb084c271..61ea68dce410 100644 --- a/tools/testing/selftests/bpf/prog_tests/libarena.c +++ b/tools/testing/selftests/bpf/prog_tests/libarena.c @@ -27,6 +27,177 @@ static void run_libarena_test(struct libarena *skel, struct bpf_program *prog, } +static void *run_libarena_parallel_prog(void *arg) +{ + struct bpf_program *prog = arg; + + return (void *)(long)libarena_run_prog(bpf_program__fd(prog)); +} + +/* Max suffix is ceil((lg 2^32) / (lg 10)) + sizeof("__") = 10 + 2 = 12. */ +#define MAX_PARTEST_SUFFIX (12) +#define MAX_PARTEST_NAME (1024) +#define MAX_PARTEST_PREFIX (MAX_PARTEST_NAME - MAX_PARTEST_SUFFIX) + +static int run_libarena_parallel_fini(struct libarena *skel, const char *name, + size_t prefixlen) +{ + char tdname[MAX_PARTEST_NAME]; + struct bpf_program *fini_prog; + int ret; + + ret = snprintf(tdname, sizeof(tdname), "%.*s__fini", (int)prefixlen, name); + if (!ASSERT_LT(ret, sizeof(tdname), "partest fini name")) + return -ENAMETOOLONG; + + fini_prog = bpf_object__find_program_by_name(skel->obj, tdname); + if (!ASSERT_TRUE(fini_prog, "partest fini prog")) + return -ENOENT; + + ret = libarena_run_prog(bpf_program__fd(fini_prog)); + ASSERT_OK(ret, tdname); + + return ret; +} + +static int run_libarena_parallel_test_workers(struct libarena *skel, + const char *name, size_t prefixlen) +{ + pthread_t *threads = NULL, *tmp_threads; + char tdname[MAX_PARTEST_NAME]; + struct bpf_program *tdprog; + uint32_t nthreads; + void *thread_ret; + int ret, err = 0; + int i; + + for (nthreads = 0; nthreads < UINT_MAX; nthreads++) { + ret = snprintf(tdname, sizeof(tdname), "%.*s__%u", (int)prefixlen, + name, nthreads); + if (!ASSERT_LT(ret, sizeof(tdname), "test worker name")) { + err = -ENAMETOOLONG; + break; + } + + /* + * We enumerate the worker threads for a given test with __0, __1, + * and so on. The suffixes always start from 0 and are contiguous, + * so if we don't find a program with the requested name we have + * discovered all available worker programs. + */ + tdprog = bpf_object__find_program_by_name(skel->obj, tdname); + if (!tdprog) + break; + + /* Bump the alloc array to accommodate the new thread. */ + tmp_threads = realloc(threads, (nthreads + 1) * sizeof(*threads)); + if (!ASSERT_TRUE(tmp_threads, "realloc")) { + err = -ENOMEM; + break; + } + threads = tmp_threads; + + ret = pthread_create(&threads[nthreads], NULL, + run_libarena_parallel_prog, + tdprog); + if (!ASSERT_OK(ret, "pthread_create")) { + err = ret; + break; + } + } + + + for (i = 0; i < nthreads; i++) { + ret = pthread_join(threads[i], &thread_ret); + if (!ASSERT_OK(ret, "pthread_join")) { + err = err ?: ret; + continue; + } + + err = err ?: (long)thread_ret; + } + + free(threads); + + return err; +} + +static bool libarena_parallel_test_enabled(struct libarena *skel, + const char *prefix, + size_t prefixlen) +{ + struct bpf_program *prog; + char progname[MAX_PARTEST_NAME]; + int ret; + + ret = snprintf(progname, sizeof(progname), "%.*s__enabled", (int)prefixlen, + prefix); + if (!ASSERT_LT(ret, sizeof(progname), "partest enabled name")) + return false; + + prog = bpf_object__find_program_by_name(skel->obj, progname); + if (!prog) + return true; + + ret = libarena_run_prog(bpf_program__fd(prog)); + if (ret == -EOPNOTSUPP) + return false; + if (!ASSERT_OK(ret, progname)) + return false; + return true; +} + +static void run_libarena_parallel_test(struct libarena *skel, struct bpf_program *prog, + const char *name) +{ + char testname[MAX_PARTEST_NAME]; + size_t prefixlen; + const char *pos; + int ret; + + /* + * We annotate the initialization prog with __init. If the current prog does + * not match, it is one of the parallel threads instead and is ignored. + * + * We assume the test writer knows what they are doing and do not add __init + * randomly in the middle of a test name. + */ + pos = strstr(name, "__init"); + if (!pos) + return; + + prefixlen = pos - name; + if (!ASSERT_LT(prefixlen, MAX_PARTEST_PREFIX, "partest prefix too long")) + return; + + /* The name of the test without the __init suffix. Looks nicer in the test log. */ + ret = snprintf(testname, sizeof(testname), "%.*s", (int)prefixlen, name); + if (!ASSERT_LT(ret, sizeof(testname), "partest test name")) + return; + + if (!test__start_subtest(testname)) + return; + + if (!libarena_parallel_test_enabled(skel, testname, prefixlen)) { + test__skip(); + return; + } + + ret = libarena_run_prog(bpf_program__fd(skel->progs.arena_buddy_reset)); + if (!ASSERT_OK(ret, "arena_buddy_reset")) + return; + + ret = libarena_run_prog(bpf_program__fd(prog)); + if (!ASSERT_OK(ret, testname)) + return; + + ret = run_libarena_parallel_test_workers(skel, name, prefixlen); + + ASSERT_OK(ret, testname); + + run_libarena_parallel_fini(skel, name, prefixlen); +} + void test_libarena(void) { struct arena_alloc_reserve_args args; @@ -52,6 +223,22 @@ void test_libarena(void) bpf_object__for_each_program(prog, skel->obj) { const char *name = bpf_program__name(prog); + /* + * Handle parallel test progs separately. For those + * progs it's not a matter of test/skip, because each + * parallel test prog includes an initialization prog + * and a set of progs to be run in parallel. For the + * latter we do not record them as skipped or run, + * because we run them all at once when we come across + * the initialization prog. For more details on how we + * discover the progs see the comment on + * run_libarena_parallel_test. + */ + if (libarena_is_parallel_test_prog(name)) { + run_libarena_parallel_test(skel, prog, name); + continue; + } + if (!libarena_is_test_prog(name)) continue; -- 2.54.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest 2026-06-05 22:20 ` [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest Emil Tsalapatis @ 2026-06-05 22:28 ` sashiko-bot 2026-06-05 22:51 ` bot+bpf-ci 1 sibling, 0 replies; 11+ messages in thread From: sashiko-bot @ 2026-06-05 22:28 UTC (permalink / raw) To: Emil Tsalapatis; +Cc: bpf Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [Medium] Missing file descriptor (FD) validation after manual program lookups. -- commit f9cb27c914dcd0d4f6eca457c86b2aac266e34a4 Author: Emil Tsalapatis <emil@etsalapatis.com> selftests/bpf: libarena: parallel test harness and spmc parallel selftest Add a parallel test for the SPMC Lev-Chase workstealing queue. The queue is built to be wait-free even when there are multiple consumers, and the parallel selftest provides a signal on whether the queue behaves correctly when stress tested. > diff --git a/tools/testing/selftests/bpf/prog_tests/libarena.c b/tools/testing/selftests/bpf/prog_tests/libarena.c > index 81bdb084c2715..aa2070f8be825 100644 > --- a/tools/testing/selftests/bpf/prog_tests/libarena.c > +++ b/tools/testing/selftests/bpf/prog_tests/libarena.c > @@ -27,6 +27,177 @@ static void run_libarena_test(struct libarena *skel, struct bpf_program *prog, > > } > > +static void *run_libarena_parallel_prog(void *arg) > +{ > + struct bpf_program *prog = arg; > + > + return (void *)(long)libarena_run_prog(bpf_program__fd(prog)); [Severity: Medium] Should we validate the file descriptor returned by bpf_program__fd here? The program is manually looked up via bpf_object__find_program_by_name in run_libarena_parallel_test_workers before being passed to this thread worker. When using manual lookups instead of auto-generated skeleton fields, the program pointer could be valid while the underlying program is invalid, resulting in a negative FD. An ASSERT_GE check for the file descriptor could prevent unexpected behavior. > +} > + > +/* Max suffix is ceil((lg 2^32) / (lg 10)) + sizeof("__") = 10 + 2 = 12. */ > +#define MAX_PARTEST_SUFFIX (12) > +#define MAX_PARTEST_NAME (1024) > +#define MAX_PARTEST_PREFIX (MAX_PARTEST_NAME - MAX_PARTEST_SUFFIX) > + > +static int run_libarena_parallel_fini(struct libarena *skel, const char *name, > + size_t prefixlen) > +{ > + char tdname[MAX_PARTEST_NAME]; > + struct bpf_program *fini_prog; > + int ret; > + > + ret = snprintf(tdname, sizeof(tdname), "%.*s__fini", (int)prefixlen, name); > + if (!ASSERT_LT(ret, sizeof(tdname), "partest fini name")) > + return -ENAMETOOLONG; > + > + fini_prog = bpf_object__find_program_by_name(skel->obj, tdname); > + if (!ASSERT_TRUE(fini_prog, "partest fini prog")) > + return -ENOENT; > + > + ret = libarena_run_prog(bpf_program__fd(fini_prog)); [Severity: Medium] Does this missing file descriptor check risk passing a negative FD to libarena_run_prog? Similar to the above, we manually look up fini_prog. Checking the return value of bpf_program__fd with ASSERT_GE before using it would align with the BPF selftest guidelines for manual lookups. > + ASSERT_OK(ret, tdname); > + > + return ret; > +} [ ... ] > +static bool libarena_parallel_test_enabled(struct libarena *skel, > + const char *prefix, > + size_t prefixlen) > +{ > + struct bpf_program *prog; > + char progname[MAX_PARTEST_NAME]; > + int ret; > + > + ret = snprintf(progname, sizeof(progname), "%.*s__enabled", (int)prefixlen, > + prefix); > + if (!ASSERT_LT(ret, sizeof(progname), "partest enabled name")) > + return false; > + > + prog = bpf_object__find_program_by_name(skel->obj, progname); > + if (!prog) > + return true; > + > + ret = libarena_run_prog(bpf_program__fd(prog)); [Severity: Medium] Should we verify that the FD returned by bpf_program__fd is non-negative before executing it? Because progname is dynamically constructed and manually looked up, we do not have the same FD validity guarantees as we do for auto-generated skeleton fields. -- Sashiko AI review · https://sashiko.dev/#/patchset/20260605222020.5231-1-emil@etsalapatis.com?part=3 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest 2026-06-05 22:20 ` [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest Emil Tsalapatis 2026-06-05 22:28 ` sashiko-bot @ 2026-06-05 22:51 ` bot+bpf-ci 1 sibling, 0 replies; 11+ messages in thread From: bot+bpf-ci @ 2026-06-05 22:51 UTC (permalink / raw) To: emil, bpf Cc: ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski, emil, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 3647 bytes --] > diff --git a/tools/testing/selftests/bpf/prog_tests/libarena.c b/tools/testing/selftests/bpf/prog_tests/libarena.c > index 81bdb084c271..61ea68dce410 100644 > --- a/tools/testing/selftests/bpf/prog_tests/libarena.c > +++ b/tools/testing/selftests/bpf/prog_tests/libarena.c [ ... ] > @@ -27,6 +27,177 @@ static void run_libarena_test(struct libarena *skel, struct bpf_program *prog, > > } > > +static void *run_libarena_parallel_prog(void *arg) > +{ > + struct bpf_program *prog = arg; > + > + return (void *)(long)libarena_run_prog(bpf_program__fd(prog)); > +} Should the file descriptor returned by bpf_program__fd be validated here? Reviewers in v2, v3, and v4 noted that because prog is manually looked up via bpf_object__find_program_by_name in run_libarena_parallel_test_workers before being passed to this thread worker, the program pointer could be valid while the underlying program is invalid, resulting in a negative FD. An ASSERT_GE check for the file descriptor could prevent unexpected behavior. > + > +/* Max suffix is ceil((lg 2^32) / (lg 10)) + sizeof("__") = 10 + 2 = 12. */ > +#define MAX_PARTEST_SUFFIX (12) > +#define MAX_PARTEST_NAME (1024) > +#define MAX_PARTEST_PREFIX (MAX_PARTEST_NAME - MAX_PARTEST_SUFFIX) > + > +static int run_libarena_parallel_fini(struct libarena *skel, const char *name, > + size_t prefixlen) > +{ > + char tdname[MAX_PARTEST_NAME]; > + struct bpf_program *fini_prog; > + int ret; > + > + ret = snprintf(tdname, sizeof(tdname), "%.*s__fini", (int)prefixlen, name); > + if (!ASSERT_LT(ret, sizeof(tdname), "partest fini name")) > + return -ENAMETOOLONG; > + > + fini_prog = bpf_object__find_program_by_name(skel->obj, tdname); > + if (!ASSERT_TRUE(fini_prog, "partest fini prog")) > + return -ENOENT; > + > + ret = libarena_run_prog(bpf_program__fd(fini_prog)); > + ASSERT_OK(ret, tdname); Does this risk passing a negative FD to libarena_run_prog? Reviewers in v2, v3, and v4 raised concerns about passing an unverified file descriptor. Because fini_prog is manually looked up via bpf_object__find_program_by_name, the manual lookup API can return a valid bpf_program pointer even if the program failed to load, which results in a negative FD. An explicit file descriptor validation such as ASSERT_GE should be added before using the FD. > + > + return ret; > +} [ ... ] > +static bool libarena_parallel_test_enabled(struct libarena *skel, > + const char *prefix, > + size_t prefixlen) > +{ > + struct bpf_program *prog; > + char progname[MAX_PARTEST_NAME]; > + int ret; > + > + ret = snprintf(progname, sizeof(progname), "%.*s__enabled", (int)prefixlen, > + prefix); > + if (!ASSERT_LT(ret, sizeof(progname), "partest enabled name")) > + return false; > + > + prog = bpf_object__find_program_by_name(skel->obj, progname); > + if (!prog) > + return true; > + > + ret = libarena_run_prog(bpf_program__fd(prog)); Should the FD returned by bpf_program__fd be verified as non-negative before executing it? The reviewer noted in v4 that because progname is dynamically constructed and manually looked up via bpf_object__find_program_by_name, it does not have the same FD validity guarantees as auto-generated skeleton fields. FD validation should be added to prevent passing a negative FD. > + if (ret == -EOPNOTSUPP) > + return false; > + if (!ASSERT_OK(ret, progname)) > + return false; > + return true; > +} --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27043406243 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures 2026-06-05 22:20 [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures Emil Tsalapatis ` (2 preceding siblings ...) 2026-06-05 22:20 ` [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest Emil Tsalapatis @ 2026-06-06 3:40 ` patchwork-bot+netdevbpf 3 siblings, 0 replies; 11+ messages in thread From: patchwork-bot+netdevbpf @ 2026-06-06 3:40 UTC (permalink / raw) To: Emil Tsalapatis Cc: bpf, ast, andrii, memxor, daniel, eddyz87, song, mattbobrowski Hello: This series was applied to bpf/bpf-next.git (master) by Alexei Starovoitov <ast@kernel.org>: On Fri, 5 Jun 2026 18:20:17 -0400 you wrote: > Add two new data structures to libarena. These data structures initially > resided in the sched-ext repo (https://github.com/sched-ext/scx) and > have been adapted to the internal libarena build system. The data > structures are: > > - Red black tree: Fundamental tree data structure that can also serve > as a base for more domain-specific data structures. > > [...] Here is the summary with links: - [bpf-next,v4,1/3] selftests/bpf: libarena: Add rbtree data structure https://git.kernel.org/bpf/bpf-next/c/6c3e8a4d4765 - [bpf-next,v4,2/3] selftests/bpf: libarena: Add spmc queue data structure https://git.kernel.org/bpf/bpf-next/c/57c6ace8395d - [bpf-next,v4,3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest https://git.kernel.org/bpf/bpf-next/c/42998f819256 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-06-06 3:40 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-05 22:20 [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures Emil Tsalapatis 2026-06-05 22:20 ` [PATCH bpf-next v4 1/3] selftests/bpf: libarena: Add rbtree data structure Emil Tsalapatis 2026-06-05 22:30 ` sashiko-bot 2026-06-05 23:01 ` Emil Tsalapatis 2026-06-05 22:51 ` bot+bpf-ci 2026-06-05 22:20 ` [PATCH bpf-next v4 2/3] selftests/bpf: libarena: Add spmc queue " Emil Tsalapatis 2026-06-05 22:51 ` bot+bpf-ci 2026-06-05 22:20 ` [PATCH bpf-next v4 3/3] selftests/bpf: libarena: parallel test harness and spmc parallel selftest Emil Tsalapatis 2026-06-05 22:28 ` sashiko-bot 2026-06-05 22:51 ` bot+bpf-ci 2026-06-06 3:40 ` [PATCH bpf-next v4 0/3] selftests/bpf: libarena: Add initial data structures patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox