* [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map
@ 2026-06-05 11:41 Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 01/12] rhashtable: Add rhashtable_next_key() API Mykyta Yatsenko
` (12 more replies)
0 siblings, 13 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko, Emil Tsalapatis
This patch series introduces BPF_MAP_TYPE_RHASH, a new hash map type that
leverages the kernel's rhashtable to provide resizable hash map for BPF.
The existing BPF_MAP_TYPE_HASH uses a fixed number of buckets determined at
map creation time. While this works well for many use cases, it presents
challenges when:
1. The number of elements is unknown at creation time
2. The element count varies significantly during runtime
3. Memory efficiency is important (over-provisioning wastes memory,
under-provisioning hurts performance)
BPF_MAP_TYPE_RHASH addresses these issues by using rhashtable, which
automatically grows and shrinks based on load factor.
The implementation wraps the kernel's rhashtable with BPF map operations:
- Uses bpf_mem_alloc for RCU-safe memory management
- Supports all standard map operations (lookup, update, delete, get_next_key)
- Supports batch operations (lookup_batch, lookup_and_delete_batch)
- Supports BPF iterators for traversal
- Supports BPF_F_LOCK for spin locks in values
- Requires BPF_F_NO_PREALLOC flag (elements allocated on demand)
- In-place updates for improved performance.
- max_entries serves as a hard limit, not bucket count
- Uses bit_spin_lock() + local_irq_save() for bucket locking,
similar to existing BPF hashmap's raw_spin_lock_irqsave(), insertions and
deletes may fail.
- Iterations are best-effort, if resize, insertions, deletions take place
concurrently, iterations may visit same elements multiple times or skip
elements.
- Lock out insertions, when running special fields destructor to guarantee
its completion.
The series includes comprehensive tests:
- Basic operations in test_maps (lookup, update, delete, get_next_key)
- BPF program tests for lookup/update/delete semantics
- Seq file tests
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Update implementation
---------------------
Current implementation of the BPF_MAP_TYPE_RHASH does not provide
the same strong guarantees on the values consistency under concurrent
reads/writes as BPF_MAP_TYPE_HASH.
BPF_MAP_TYPE_HASH allocates a new element and atomically swaps the
pointer. BPF_MAP_TYPE_RHASH does memcpy in place with no lock held.
rhash trades consistency for speed, concurrent readers can observe
partially updated data. Two concurrent writers to the same key can
also interleave, producing mixed values. This is similar to arraymap
update implementation, including handling of the special fields.
As a solution, user may use BPF_F_LOCK to guarantee consistent reads
and write serialization.
Summary of the read consistency guarantees:
map type | write mechanism | read consistency
-------------+------------------+--------------------------
htab | alloc, swap ptr | always consistent (RCU)
htab F_LOCK | in-place + lock | consistent if reader locks
-------------+------------------+--------------------------
rhtab | in-place memcpy | torn reads
rhtab F_LOCK | in-place + lock | consistent if reader locks
Benchmarks
----------
1. LOOKUP (single producer, M events/sec)
key | max | nr | htab | rhtab | ratio | delta
----+-----+-------+---------+---------+-------+-------
8 | 1K | 750 | 99.85 | 81.92 | 0.82x | -18 %
8 | 1K | 1K | 100.71 | 80.19 | 0.80x | -20 %
8 | 1M | 750K | 23.37 | 72.09 | 3.08x | +208 %
8 | 1M | 1M | 13.39 | 53.72 | 4.01x | +301 %
32 | 1K | 750 | 51.57 | 42.78 | 0.83x | -17 %
32 | 1K | 1K | 50.81 | 45.83 | 0.90x | -10 %
32 | 1M | 750K | 11.27 | 15.29 | 1.36x | +36 %
32 | 1M | 1M | 7.32 | 8.75 | 1.19x | +19 %
256 | 1K | 750 | 7.58 | 7.88 | 1.04x | +4 %
256 | 1K | 1K | 7.43 | 7.81 | 1.05x | +5 %
256 | 1M | 750K | 3.69 | 4.27 | 1.16x | +16 %
256 | 1M | 1M | 2.60 | 3.12 | 1.20x | +20 %
Pattern:
* Small map (1K): htab wins for 8 / 32 byte keys by 10-20 %
because the preallocated bucket array fits in L1. Equalises
at 256 byte keys.
* Large map (1M): rhtab wins everywhere, up to 4x at high load
factor with 8 byte keys.
* Higher load factor amplifies rhtab's lead: rhtab grows the
bucket array; htab stays at user-declared max.
2. FULL UPDATE (M events/sec per producer, -p 7)
htab per-producer:
20.33 22.02 19.27 23.61 24.18 23.17 21.07
mean 21.94 range 19.27 - 24.18
rhtab per-producer:
133.51 129.47 74.52 129.29 102.26 129.98 107.64
mean 115.24 range 74.52 - 133.51
speedup (mean): 5.25x (+425 %)
In-place memcpy avoids the per-update alloc + RCU pointer swap
that htab pays.
3. MEMORY (overwrite, -p 8, no --preallocated)
value_size | htab ops/s | rhtab ops/s | htab mem | rhtab mem
-----------+-------------+-------------+----------+----------
32 B | 122.87 k/s | 133.04 k/s | 2.47 MiB | 2.49 MiB
4096 B | 64.43 k/s | 65.38 k/s | 6.74 MiB | 6.44 MiB
rhtab/htab : +8 % ops, +0.8 % mem (32 B)
+1 % ops, -4 % mem (4096 B)
SUMMARY
* Small / well-fitting map: htab is faster (cache-friendly
fixed bucket array), but only by ~10-20 %.
* Large / high-load-factor map: rhtab is dramatically faster
(1.2x to 4x) because rhashtable resizes to keep the load
factor sane while htab stays stuck at user-declared max.
* Update-heavy workloads: rhtab is ~5x faster per producer
via in-place memcpy.
* Memory benchmark: effectively on par
---
Changes in v7:
- rhashtable_next_key: move into lib/rhashtable.c, drop params argument
(Herbert).
- rhashtable_next_key: kdoc clarifies that behavior on tables with
duplicate keys is undefined (sashiko).
- rhashtable: include Herbert's "Use irq work for shrinking" patch so
__rhashtable_remove_fast_one() can fire the shrink path from NMI
context (Herbert).
- hashtab: fix u32 multiply overflow in __rhtab_map_lookup_and_delete_batch
copy_to_user; cast total to size_t before multiplying by key_size /
value_size (sashiko, bot+bpf-ci).
- hashtab: allow kptr/refcount fields in rhtab values (same model as
array map).
- Link to v6: https://patch.msgid.link/20260602-rhash-v6-0-1bfd35a4184f@meta.com
Changes in v6:
- rhashtable_next_key: advance past duplicate keys in the main bucket
chain to avoid an infinite loop when there are duplicate keys
(sashiko).
- rhashtable_next_key: return ERR_PTR(-EOPNOTSUPP) on rhltable (sashiko).
- rhashtable: selftest pre-sizes the table to avoid concurrent rehash
triggering spurious failures (sashiko).
- hashtab: real rhtab_map_mem_usage in the basic commit; move
bpf_map_free_internal_structs from rhtab_free_elem into the
special-fields commit where it does meaningful work (bot+bpf-ci).
- bpf_iter (seq_file): switch to rhashtable_walk_* for stronger
coverage under concurrent rehash; get_next_key and batch keep
rhashtable_next_key (sashiko).
- iter ops: rhtab_map_get_next_key adds IS_ERR check
before dereferencing the element pointer (sashiko).
- iter ops: bpf_each_rhash_elem removes cond_resched() (sashiko).
- iter ops: batch returns -EAGAIN (not -ENOENT) on cursor delete,
so userspace can distinguish lost cursor from end-of-iteration
and restart from NULL (sashiko).
- Link to v5: https://patch.msgid.link/20260528-rhash-v5-0-7205191b6c57@meta.com
Changes in v5:
- rhashtable_next_key: add kdoc WARNING to highlight lack of rehash
detection and unbounded iteration (Herbert).
- rhashtable: selftest now checks IS_ERR() before PTR_ERR comparison
on the missing-key path (bot+bpf-ci).
- hashtab: drop dead stub bodies and unused map_ops registrations
from the basic commit; iteration commit adds bodies, structs, and
registrations together. .map_get_next_key keeps a stub registration
in the basic commit because the syscall dispatcher does not
NULL-check it; iteration commit replaces the stub body with the
real implementation (bot+bpf-ci).
- hashtab: fix batch cursor advancement. v4 stashed the lookahead
element key but then resumed via next_key(cursor), skipping that
element across batch boundaries and orphaning it on
lookup_and_delete_batch. v5 stashes the lookahead key and looks
it up directly on the next batch entry (bot+bpf-ci, sashiko v3).
- hashtab: document torn-read race in rhtab_map_update_existing,
matching arraymap semantics (bot+bpf-ci).
- Link to v4: https://patch.msgid.link/20260513-rhash-v4-0-dd3d541ccb0b@meta.com
Changes in v4:
- rhashtable: introduce rhashtable_next_key(), drop walker-based
iteration for BPF (also drops earlier rhashtable_walk_enter_from()
proposal).
- map_extra: presize hint via lower 32 bits (nelem_hint), capped at
U16_MAX.
- Automatic shrinking enabled (was missing despite being advertised).
- Reject key_size > U16_MAX (rhashtable_params.key_len is u16).
- Replace irqs_disabled() guard with bpf_disable_instrumentation around
bucket-lock paths: closes same-CPU NMI tracing recursion without
rejecting legitimate IRQ-context callers.
- lookup_and_delete reordered: unlink before copy to avoid populating
user buffer on concurrent-unlink -ENOENT.
- update_existing reordered: copy then free_fields, matching arraymap.
- Word-sized key fast path (sizeof(long) bytes), inlined hashfn/cmpfn
via static-const rhashtable_params; works on both 32-bit and 64-bit.
- check_and_init_map_value() on insert (zero special-field bytes from
recycled bpf_mem_alloc memory; previously bpf_spin_lock could read
garbage and qspinlock would deadlock).
- BPF_SPIN_LOCK / BPF_RES_SPIN_LOCK allowlist moved to the special-
fields commit so each commit is bisect-safe.
- Link to v3: https://patch.msgid.link/20260424-rhash-v3-0-d0fa0ce4379b@meta.com
Changes in v3:
- Squash all commits implementing basic functions into one (Alexei)
- Remove selftests that were not necessary (Alexei)
- Resize detection for kernel full iterations, error out on resize (Alexei)
- Remove second lookup in get_next_key() (Emil)
- __acquires(RCU)/__releases(RCU) on seq_start/seq_stop (Emil)
- Use bpf_map_check_op_flags() where it makes sense (Leon)
- Benchmarks refresh, experiment with alternative hash functions
- Rely on iterator invalidation during rehash to handle table resizes:
fail on resize where we fully iterate on table inside kernel, dont fail on
resize where iteration goes through userspace. Exception -
rhtab_map_free_internal_structs() should be just safe to iterate fully
in kernel, no risk of infinite loop, because no user holding reference.
- Handle special fields during in-place updates (Emil, sashiko)
- Link to v2: https://lore.kernel.org/all/20260408-rhash-v2-0-3b3675da1f6e@meta.com/
Changes in v2:
- Added benchmarks
- Reworked all functions that walk the rhashtable, use walk API, instead
of directly accessing tbl and future_tbl
- Added rhashtable_walk_enter_from() into rhashtable to support O(1)
iteration continuations
- Link to v1: https://lore.kernel.org/r/20260205-rhash-v1-0-30dd6d63c462@meta.com
---
Herbert Xu (1):
rhashtable: Use irq work for shrinking
Mykyta Yatsenko (11):
rhashtable: Add rhashtable_next_key() API
rhashtable: Add selftest for rhashtable_next_key()
bpf: Implement resizable hashmap basic functions
bpf: Implement iteration ops for resizable hashtab
bpf: Allow special fields in resizable hashtab
bpf: Optimize word-sized keys for resizable hashtable
libbpf: Support resizable hashtable
selftests/bpf: Add basic tests for resizable hash map
selftests/bpf: Add BPF iterator tests for resizable hash map
bpftool: Add rhash map documentation
selftests/bpf: Add resizable hashmap to benchmarks
include/linux/bpf_types.h | 1 +
include/linux/rhashtable.h | 42 +-
include/uapi/linux/bpf.h | 6 +
kernel/bpf/hashtab.c | 808 ++++++++++++++++++++-
kernel/bpf/map_iter.c | 3 +-
kernel/bpf/syscall.c | 6 +
kernel/bpf/verifier.c | 1 +
lib/rhashtable.c | 69 ++
lib/test_rhashtable.c | 75 ++
tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
tools/include/uapi/linux/bpf.h | 6 +
tools/lib/bpf/libbpf.c | 1 +
tools/lib/bpf/libbpf_probes.c | 3 +
tools/testing/selftests/bpf/bench.c | 6 +
.../bpf/benchs/bench_bpf_hashmap_full_update.c | 34 +-
.../bpf/benchs/bench_bpf_hashmap_lookup.c | 31 +-
.../testing/selftests/bpf/benchs/bench_htab_mem.c | 35 +-
tools/testing/selftests/bpf/prog_tests/rhash.c | 183 +++++
.../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c | 34 +
tools/testing/selftests/bpf/progs/rhash.c | 248 +++++++
21 files changed, 1577 insertions(+), 19 deletions(-)
---
base-commit: 73f574b0cd0ef87a25afdffe1d907b55f7fcc940
change-id: 20251103-rhash-7b70069923d8
Best regards,
--
Mykyta Yatsenko <yatsenko@meta.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 01/12] rhashtable: Add rhashtable_next_key() API
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:43 ` Herbert Xu
2026-06-05 11:41 ` [PATCH bpf-next v7 02/12] rhashtable: Add selftest for rhashtable_next_key() Mykyta Yatsenko
` (11 subsequent siblings)
12 siblings, 1 reply; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Introduce a simpler iteration mechanism for rhashtable that lets
the caller continue from an arbitrary position by supplying the
previous key, without the per-iterator state of the
rhashtable_walk_* API.
void *rhashtable_next_key(struct rhashtable *ht,
const void *prev_key);
Caller holds RCU; passes NULL prev_key for the first element or
the previously returned key to advance. Walks tbl->future_tbl
chain so in-flight rehashes are observed.
Best-effort: in case of concurrent resize, provides no guarantees:
- may produce duplicate elements
- may skip any amount of elements
- termination of the loop is not guaranteed in case of
sustained rehash. Callers are advised to bound loop externally
or avoid inserting new elements during such loop.
Returns ERR_PTR(-ENOENT) if prev_key is not found.
Behavior on tables with duplicate keys is undefined.
rhltable is not supported — returns ERR_PTR(-EOPNOTSUPP).
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
include/linux/rhashtable.h | 40 +++++++++++++++++++++++++++
lib/rhashtable.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 109 insertions(+)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index ef5230cece36..6f3aea498515 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -650,6 +650,46 @@ static __always_inline struct rhash_head *__rhashtable_lookup(
return NULL;
}
+/**
+ * rhashtable_next_key - return next element after a given key
+ * @ht: hash table
+ * @prev_key: pointer to previous key, or NULL for the first element
+ *
+ * WARNING: this walk is highly unstable. Unlike rhashtable_walk_*(),
+ * it cannot detect a concurrent resize or rehash, so a full iteration
+ * is NOT guaranteed to terminate under adversarial or sustained
+ * rehashing. Callers MUST tolerate skipped and duplicated elements and
+ * SHOULD bound their loop externally.
+ *
+ * Returns the next element in best-effort iteration order, walking the
+ * @tbl chain (including any future_tbl in flight). Caller must hold RCU.
+ *
+ * Pass @prev_key == NULL to obtain the first element. To iterate, set
+ * @prev_key to the key of the previously returned element on each call,
+ * and stop when NULL is returned.
+ *
+ * Best-effort semantics:
+ * - Across the tbl->future_tbl chain, an element being migrated may
+ * transiently appear in both tables and be observed twice.
+ * - Concurrent inserts may or may not be observed.
+ * - Termination of a full iteration loop is NOT guaranteed under
+ * adversarial continuous rehash; callers MUST tolerate skips and
+ * repeats and SHOULD bound their loop externally.
+ * - Behavior on tables that contain duplicate keys is undefined:
+ * duplicates may be skipped, repeated, or trap the walk in a
+ * cycle. Callers requiring duplicate-key iteration must use
+ * rhashtable_walk_*() instead.
+ * - rhltable instances are not supported and return
+ * ERR_PTR(-EOPNOTSUPP).
+ * - If prev_key was concurrently deleted and is not present in any
+ * in-flight table, returns ERR_PTR(-ENOENT).
+ *
+ * Returns entry of the next element, or NULL when iteration is exhausted,
+ * or ERR_PTR(-ENOENT) if prev_key is not found, or
+ * ERR_PTR(-EOPNOTSUPP) if @ht is an rhltable.
+ */
+void *rhashtable_next_key(struct rhashtable *ht, const void *prev_key);
+
/**
* rhashtable_lookup - search hash table
* @ht: hash table
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 04b3a808fca9..dd6eaa09c55d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -687,6 +687,75 @@ void *rhashtable_insert_slow(struct rhashtable *ht, const void *key,
}
EXPORT_SYMBOL_GPL(rhashtable_insert_slow);
+/* Scan one element forward from prev_key's position in @tbl.
+ * Returns first rhash_head whose bucket > prev_key's bucket, or the
+ * element immediately after prev_key inside prev_key's bucket.
+ * Returns the first element if prev_key is NULL, NULL when @tbl is
+ * exhausted, or ERR_PTR(-ENOENT) if prev_key is not found in @tbl.
+ */
+static struct rhash_head *__rhashtable_next_in_table(
+ struct rhashtable *ht, struct bucket_table *tbl,
+ const void *prev_key)
+{
+ struct rhashtable_compare_arg arg = { .ht = ht, .key = prev_key };
+ const struct rhashtable_params params = ht->p;
+ struct rhash_head *he;
+ unsigned int b = 0;
+ bool found = false;
+
+ if (prev_key) {
+ b = rht_key_hashfn(ht, tbl, prev_key, params);
+ rht_for_each_rcu(he, tbl, b) {
+ bool match = params.obj_cmpfn
+ ? !params.obj_cmpfn(&arg, rht_obj(ht, he))
+ : !rhashtable_compare(&arg, rht_obj(ht, he));
+ if (found) {
+ if (match)
+ continue;
+ return he;
+ }
+ if (match)
+ found = true;
+ }
+ if (!found)
+ return ERR_PTR(-ENOENT);
+ b++;
+ }
+
+ for (; b < tbl->size; b++)
+ rht_for_each_rcu(he, tbl, b)
+ return he;
+ return NULL;
+}
+
+/**
+ * rhashtable_next_key - return next element after a given key
+ *
+ * See include/linux/rhashtable.h for the full contract.
+ */
+void *rhashtable_next_key(struct rhashtable *ht, const void *prev_key)
+{
+ struct bucket_table *tbl;
+ struct rhash_head *he;
+
+ if (unlikely(ht->rhlist))
+ return ERR_PTR(-EOPNOTSUPP);
+
+ tbl = rht_dereference_rcu(ht->tbl, ht);
+ do {
+ he = __rhashtable_next_in_table(ht, tbl, prev_key);
+ if (!IS_ERR_OR_NULL(he))
+ return rht_obj(ht, he);
+ if (!he)
+ prev_key = NULL;
+ /* See any new future_tbl attached during a rehash. */
+ smp_rmb();
+ tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+ } while (tbl);
+ return he; /* NULL or -ENOENT */
+}
+EXPORT_SYMBOL_GPL(rhashtable_next_key);
+
/**
* rhashtable_walk_enter - Initialise an iterator
* @ht: Table to walk over
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 02/12] rhashtable: Add selftest for rhashtable_next_key()
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 01/12] rhashtable: Add rhashtable_next_key() API Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:43 ` Herbert Xu
2026-06-05 11:41 ` [PATCH bpf-next v7 03/12] rhashtable: Use irq work for shrinking Mykyta Yatsenko
` (10 subsequent siblings)
12 siblings, 1 reply; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Insert n elements, then verify:
- NULL prev_key walks from the beginning, visiting all n
- non-existing prev_key returns ERR_PTR(-ENOENT)
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
lib/test_rhashtable.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 0b33559a910b..b767a38a74f9 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -679,6 +679,78 @@ static int threadfunc(void *data)
return err;
}
+static int __init test_rhashtable_next_key(void)
+{
+ struct rhashtable_params params = test_rht_params;
+ struct test_obj_val key_missing = { .id = 99999, .tid = 0 };
+ struct test_obj_val *prev_key = NULL;
+ struct rhashtable ht;
+ struct test_obj *objs, *cur;
+ int i, count = 0, err;
+ int visited_keys[8] = { 0 };
+ const int n = ARRAY_SIZE(visited_keys);
+
+ params.nelem_hint = n;
+
+ err = rhashtable_init(&ht, ¶ms);
+ if (err)
+ return err;
+
+ objs = kcalloc(n, sizeof(*objs), GFP_KERNEL);
+ if (!objs) {
+ rhashtable_destroy(&ht);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < n; i++) {
+ objs[i].value.id = i;
+ err = rhashtable_insert_fast(&ht, &objs[i].node, params);
+ if (err)
+ goto out;
+ }
+
+ rcu_read_lock();
+
+ /* NULL prev_key: walk from the beginning, expect all n elements. */
+ while ((cur = rhashtable_next_key(&ht, prev_key))) {
+ if (IS_ERR(cur)) {
+ err = -EINVAL;
+ goto unlock;
+ }
+ count++;
+ prev_key = &cur->value;
+ visited_keys[cur->value.id] = 1;
+ if (count > n)
+ break;
+ }
+
+ if (count != n) {
+ err = -EINVAL;
+ goto unlock;
+ }
+
+ for (i = 0; i < n; i++) {
+ if (!visited_keys[i]) {
+ err = -EINVAL;
+ goto unlock;
+ }
+ }
+
+ /* Non-existing prev_key: must return ERR_PTR(-ENOENT). */
+ cur = rhashtable_next_key(&ht, &key_missing);
+ if (!IS_ERR(cur) || PTR_ERR(cur) != -ENOENT)
+ err = -EINVAL;
+
+unlock:
+ rcu_read_unlock();
+out:
+ for (i = 0; i < n; i++)
+ rhashtable_remove_fast(&ht, &objs[i].node, params);
+ kfree(objs);
+ rhashtable_destroy(&ht);
+ return err;
+}
+
static int __init test_rht_init(void)
{
unsigned int entries;
@@ -738,6 +810,9 @@ static int __init test_rht_init(void)
test_insert_duplicates_run();
+ pr_info("Testing rhashtable_next_key: %s\n",
+ test_rhashtable_next_key() == 0 ? "pass" : "FAIL");
+
if (!tcount)
return 0;
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 03/12] rhashtable: Use irq work for shrinking
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 01/12] rhashtable: Add rhashtable_next_key() API Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 02/12] rhashtable: Add selftest for rhashtable_next_key() Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 12:40 ` bot+bpf-ci
2026-06-05 11:41 ` [PATCH bpf-next v7 04/12] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
` (9 subsequent siblings)
12 siblings, 1 reply; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Herbert Xu <herbert@gondor.apana.org.au>
Use irq work for automatic shrinking so that this may be called in NMI
context.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
include/linux/rhashtable.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 6f3aea498515..3de3412d53c8 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -1157,7 +1157,7 @@ static __always_inline int __rhashtable_remove_fast_one(
atomic_dec(&ht->nelems);
if (unlikely(ht->p.automatic_shrinking &&
rht_shrink_below_30(ht, tbl)))
- schedule_work(&ht->run_work);
+ irq_work_queue(&ht->run_irq_work);
err = 0;
}
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 04/12] bpf: Implement resizable hashmap basic functions
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (2 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 03/12] rhashtable: Use irq work for shrinking Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab Mykyta Yatsenko
` (8 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast()
for deletes, and rhashtable_lookup_get_insert_fast() for inserts.
Updates modify values in place under RCU rather than allocating a
new element and swapping the pointer (as regular htab does). This
trades read consistency for performance: concurrent readers may
see partial updates. BPF_F_LOCK support and special-field
handling (timers, kptrs, etc.) follow in a later commit.
Initialize rhashtable with bpf_mem_alloc element cache. Require
BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via
rhashtable_free_and_destroy().
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
include/linux/bpf_types.h | 1 +
include/uapi/linux/bpf.h | 6 +
kernel/bpf/hashtab.c | 311 +++++++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 3 +
kernel/bpf/verifier.c | 1 +
tools/include/uapi/linux/bpf.h | 6 +
6 files changed, 328 insertions(+)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b13de31e163f..56e4c3f983d3 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -134,6 +134,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_INSN_ARRAY, insn_array_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_RHASH, rhtab_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index aec171ccb6ef..bed9b1b4d5ef 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1047,6 +1047,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
BPF_MAP_TYPE_INSN_ARRAY,
+ BPF_MAP_TYPE_RHASH,
__MAX_BPF_MAP_TYPE
};
@@ -1545,6 +1546,11 @@ union bpf_attr {
*
* BPF_MAP_TYPE_ARENA - contains the address where user space
* is going to mmap() the arena. It has to be page aligned.
+ *
+ * BPF_MAP_TYPE_RHASH - initial table size hint
+ * (nelem_hint). 0 = use rhashtable default. Must be
+ * <= min(max_entries, U16_MAX). Upper 32 bits reserved,
+ * must be zero.
*/
__u64 map_extra;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3dd9b4924ae4..10f3a058747b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -9,6 +9,7 @@
#include <linux/rculist_nulls.h>
#include <linux/rcupdate_wait.h>
#include <linux/random.h>
+#include <linux/rhashtable.h>
#include <uapi/linux/btf.h>
#include <linux/rcupdate_trace.h>
#include <linux/btf_ids.h>
@@ -2739,3 +2740,313 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
};
+
+struct rhtab_elem {
+ struct rhash_head node;
+ /* key bytes, then value bytes follow */
+ u8 data[] __aligned(8);
+};
+
+struct bpf_rhtab {
+ struct bpf_map map;
+ struct rhashtable ht;
+ struct bpf_mem_alloc ma;
+ u32 elem_size;
+};
+
+static const struct rhashtable_params rhtab_params = {
+ .head_offset = offsetof(struct rhtab_elem, node),
+ .key_offset = offsetof(struct rhtab_elem, data),
+};
+
+static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
+{
+ return l->data + round_up(key_size, 8);
+}
+
+static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
+{
+ struct rhashtable_params params;
+ struct bpf_rhtab *rhtab;
+ int err = 0;
+
+ rhtab = bpf_map_area_alloc(sizeof(*rhtab), NUMA_NO_NODE);
+ if (!rhtab)
+ return ERR_PTR(-ENOMEM);
+
+ bpf_map_init_from_attr(&rhtab->map, attr);
+
+ if (rhtab->map.max_entries > 1UL << 31) {
+ err = -E2BIG;
+ goto free_rhtab;
+ }
+
+ rhtab->elem_size = sizeof(struct rhtab_elem) + round_up(rhtab->map.key_size, 8) +
+ round_up(rhtab->map.value_size, 8);
+
+ params = rhtab_params;
+ params.key_len = rhtab->map.key_size;
+ params.nelem_hint = (u32)attr->map_extra;
+ params.automatic_shrinking = true;
+
+ err = rhashtable_init(&rhtab->ht, ¶ms);
+ if (err)
+ goto free_rhtab;
+
+ /* Set max_elems after rhashtable_init() since init zeroes the struct */
+ rhtab->ht.max_elems = rhtab->map.max_entries;
+
+ err = bpf_mem_alloc_init(&rhtab->ma, rhtab->elem_size, false);
+ if (err)
+ goto destroy_rhtab;
+
+ return &rhtab->map;
+
+destroy_rhtab:
+ rhashtable_destroy(&rhtab->ht);
+free_rhtab:
+ bpf_map_area_free(rhtab);
+ return ERR_PTR(err);
+}
+
+static int rhtab_map_alloc_check(union bpf_attr *attr)
+{
+ if (!(attr->map_flags & BPF_F_NO_PREALLOC))
+ return -EINVAL;
+
+ if (attr->map_flags & BPF_F_ZERO_SEED)
+ return -EINVAL;
+
+ if (attr->key_size > U16_MAX)
+ return -E2BIG;
+
+ if (attr->map_extra >> 32)
+ return -EINVAL;
+
+ if ((u32)attr->map_extra > U16_MAX)
+ return -E2BIG;
+
+ if ((u32)attr->map_extra > attr->max_entries)
+ return -EINVAL;
+
+ return htab_map_alloc_check(attr);
+}
+
+static void rhtab_free_elem(void *ptr, void *arg)
+{
+ struct bpf_rhtab *rhtab = arg;
+ struct rhtab_elem *elem = ptr;
+
+ bpf_mem_cache_free_rcu(&rhtab->ma, elem);
+}
+
+static void rhtab_map_free(struct bpf_map *map)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+
+ rhashtable_free_and_destroy(&rhtab->ht, rhtab_free_elem, rhtab);
+ bpf_mem_alloc_destroy(&rhtab->ma);
+ bpf_map_area_free(rhtab);
+}
+
+static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+
+ /* Hold RCU lock in case sleepable program calls via gen_lookup */
+ guard(rcu)();
+
+ return rhashtable_lookup_likely(&rhtab->ht, key, rhtab_params);
+}
+
+static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
+{
+ struct rhtab_elem *l;
+
+ l = rhtab_lookup_elem(map, key);
+ return l ? rhtab_elem_value(l, map->key_size) : NULL;
+}
+
+static void rhtab_read_elem_value(struct bpf_map *map, void *dst, struct rhtab_elem *elem,
+ u64 flags)
+{
+ void *src = rhtab_elem_value(elem, map->key_size);
+
+ if (flags & BPF_F_LOCK)
+ copy_map_value_locked(map, dst, src, true);
+ else
+ copy_map_value(map, dst, src);
+}
+
+static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem, void *copy,
+ u64 flags)
+{
+ int err;
+
+ /*
+ * disable_instrumentation() mitigates the deadlock for programs running in NMI context.
+ * rhashtable locks bucket with local_irq_save(). Only NMI programs may reenter
+ * rhashtable code, bpf_disable_instrumentation() disables programs running in NMI, except
+ * raw tracepoints, which we don't have in rhashtable.
+ */
+ bpf_disable_instrumentation();
+ err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab_params);
+ bpf_enable_instrumentation();
+
+ if (err)
+ return err;
+
+ if (copy) {
+ rhtab_read_elem_value(&rhtab->map, copy, elem, flags);
+ check_and_init_map_value(&rhtab->map, copy);
+ }
+
+ bpf_mem_cache_free_rcu(&rhtab->ma, elem);
+ return 0;
+}
+
+
+static long rhtab_map_delete_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhtab_elem *elem;
+
+ guard(rcu)();
+
+ elem = rhtab_lookup_elem(map, key);
+ if (!elem)
+ return -ENOENT;
+
+ return rhtab_delete_elem(rhtab, elem, NULL, 0);
+}
+
+static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhtab_elem *elem;
+ int err;
+
+ err = bpf_map_check_op_flags(map, flags, BPF_F_LOCK);
+ if (err)
+ return err;
+
+ guard(rcu)();
+
+ elem = rhtab_lookup_elem(map, key);
+ if (!elem)
+ return -ENOENT;
+
+ return rhtab_delete_elem(rhtab, elem, value, flags);
+}
+
+static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
+ u64 map_flags)
+{
+ void *old_val = rhtab_elem_value(elem, map->key_size);
+
+ if (map_flags & BPF_NOEXIST)
+ return -EEXIST;
+
+ if (map_flags & BPF_F_LOCK)
+ copy_map_value_locked(map, old_val, value, false);
+ else
+ copy_map_value(map, old_val, value);
+ return 0;
+}
+
+static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhtab_elem *elem, *tmp;
+
+ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+ return -EINVAL;
+
+ if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
+ return -EINVAL;
+
+ guard(rcu)();
+ elem = rhtab_lookup_elem(map, key);
+ if (elem)
+ return rhtab_map_update_existing(map, elem, value, map_flags);
+
+ if (map_flags & BPF_EXIST)
+ return -ENOENT;
+
+ /* Check max_entries limit before inserting new element */
+ if (atomic_read(&rhtab->ht.nelems) >= map->max_entries)
+ return -E2BIG;
+
+ elem = bpf_mem_cache_alloc(&rhtab->ma);
+ if (!elem)
+ return -ENOMEM;
+
+ memcpy(elem->data, key, map->key_size);
+ copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
+
+ /* Prevent deadlock for NMI programs attempting to take bucket lock */
+ bpf_disable_instrumentation();
+ tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab_params);
+ bpf_enable_instrumentation();
+
+ if (tmp) {
+ bpf_mem_cache_free(&rhtab->ma, elem);
+ if (IS_ERR(tmp))
+ return PTR_ERR(tmp);
+
+ return rhtab_map_update_existing(map, tmp, value, map_flags);
+ }
+
+ return 0;
+}
+
+static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
+{
+ struct bpf_insn *insn = insn_buf;
+ const int ret = BPF_REG_0;
+
+ BUILD_BUG_ON(!__same_type(&rhtab_lookup_elem,
+ (void *(*)(struct bpf_map *map, void *key)) NULL));
+ *insn++ = BPF_EMIT_CALL(rhtab_lookup_elem);
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1);
+ *insn++ = BPF_ALU64_IMM(BPF_ADD, ret,
+ offsetof(struct rhtab_elem, data) + round_up(map->key_size, 8));
+
+ return insn - insn_buf;
+}
+
+static void rhtab_map_free_internal_structs(struct bpf_map *map)
+{
+}
+
+static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ return -EOPNOTSUPP;
+}
+
+static u64 rhtab_map_mem_usage(const struct bpf_map *map)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ u64 num_entries;
+
+ /* Excludes rhashtable bucket overhead (~ nelems * sizeof(void *) at 75% load). */
+ num_entries = atomic_read(&rhtab->ht.nelems);
+ return sizeof(struct bpf_rhtab) + rhtab->elem_size * num_entries;
+}
+
+BTF_ID_LIST_SINGLE(rhtab_map_btf_ids, struct, bpf_rhtab)
+const struct bpf_map_ops rhtab_map_ops = {
+ .map_meta_equal = bpf_map_meta_equal,
+ .map_alloc_check = rhtab_map_alloc_check,
+ .map_alloc = rhtab_map_alloc,
+ .map_free = rhtab_map_free,
+ .map_get_next_key = rhtab_map_get_next_key,
+ .map_release_uref = rhtab_map_free_internal_structs,
+ .map_lookup_elem = rhtab_map_lookup_elem,
+ .map_lookup_and_delete_elem = rhtab_map_lookup_and_delete_elem,
+ .map_update_elem = rhtab_map_update_elem,
+ .map_delete_elem = rhtab_map_delete_elem,
+ .map_gen_lookup = rhtab_map_gen_lookup,
+ .map_mem_usage = rhtab_map_mem_usage,
+ .map_btf_id = &rhtab_map_btf_ids[0],
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 625a4366fe6d..1faae184de48 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1398,6 +1398,7 @@ static int map_create_alloc(union bpf_attr *attr, bpfptr_t uattr, struct bpf_ver
if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
attr->map_type != BPF_MAP_TYPE_ARENA &&
+ attr->map_type != BPF_MAP_TYPE_RHASH &&
attr->map_extra != 0) {
bpf_log(log, "Invalid map_extra.\n");
return -EINVAL;
@@ -1469,6 +1470,7 @@ static int map_create_alloc(union bpf_attr *attr, bpfptr_t uattr, struct bpf_ver
case BPF_MAP_TYPE_CGROUP_ARRAY:
case BPF_MAP_TYPE_ARRAY_OF_MAPS:
case BPF_MAP_TYPE_HASH:
+ case BPF_MAP_TYPE_RHASH:
case BPF_MAP_TYPE_PERCPU_HASH:
case BPF_MAP_TYPE_HASH_OF_MAPS:
case BPF_MAP_TYPE_RINGBUF:
@@ -2259,6 +2261,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+ map->map_type == BPF_MAP_TYPE_RHASH ||
map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
if (!bpf_map_is_offloaded(map)) {
bpf_disable_instrumentation();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8ed484cb1a8a..7d27ba396d32 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17657,6 +17657,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
if (prog->sleepable)
switch (map->map_type) {
case BPF_MAP_TYPE_HASH:
+ case BPF_MAP_TYPE_RHASH:
case BPF_MAP_TYPE_LRU_HASH:
case BPF_MAP_TYPE_ARRAY:
case BPF_MAP_TYPE_PERCPU_HASH:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 37142e6d911a..7d0b282ba674 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1047,6 +1047,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
BPF_MAP_TYPE_INSN_ARRAY,
+ BPF_MAP_TYPE_RHASH,
__MAX_BPF_MAP_TYPE
};
@@ -1545,6 +1546,11 @@ union bpf_attr {
*
* BPF_MAP_TYPE_ARENA - contains the address where user space
* is going to mmap() the arena. It has to be page aligned.
+ *
+ * BPF_MAP_TYPE_RHASH - initial table size hint
+ * (nelem_hint). 0 = use rhashtable default. Must be
+ * <= min(max_entries, U16_MAX). Upper 32 bits reserved,
+ * must be zero.
*/
__u64 map_extra;
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (3 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 04/12] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:56 ` sashiko-bot
2026-06-05 12:40 ` bot+bpf-ci
2026-06-05 11:41 ` [PATCH bpf-next v7 06/12] bpf: Allow special fields in " Mykyta Yatsenko
` (7 subsequent siblings)
12 siblings, 2 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Implement get_next_key, batch lookup/lookup-and-delete, for_each_map_elem
callback, and the seq_file BPF iterator for BPF_MAP_TYPE_RHASH.
get_next_key() and batch use rhashtable_next_key() — stateless,
matches the syscall UAPI shape (no kernel-side iterator state).
get_next_key falls back to the first key when prev_key was
concurrently deleted (matches htab semantics). Batch reports
cursor loss as -EAGAIN so userspace can distinguish it from
end-of-iteration (-ENOENT) and restart from NULL.
The seq_file BPF iterator uses rhashtable_walk_* instead. It runs
only from read() syscall context, so the walker's spin_lock is
safe, and seq_file's per-fd state lets the walker handle rehash
correctly (retry on -EAGAIN) for stronger coverage than the
stateless API can provide.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 347 +++++++++++++++++++++++++++++++++++++++++++++++++-
kernel/bpf/map_iter.c | 3 +-
2 files changed, 348 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 10f3a058747b..a149713d0953 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3020,8 +3020,79 @@ static void rhtab_map_free_internal_structs(struct bpf_map *map)
}
static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+ __must_hold_shared(RCU)
{
- return -EOPNOTSUPP;
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhtab_elem *elem;
+
+ elem = rhashtable_next_key(&rhtab->ht, key);
+
+ /* if not found, return the first key */
+ if (PTR_ERR(elem) == -ENOENT)
+ elem = rhashtable_next_key(&rhtab->ht, NULL);
+
+ if (IS_ERR(elem))
+ return PTR_ERR(elem);
+ if (!elem)
+ return -ENOENT;
+
+ memcpy(next_key, elem->data, map->key_size);
+ return 0;
+}
+
+static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
+{
+ void *value;
+
+ /* Guarantee that hashtab value is not freed */
+ guard(rcu)();
+
+ value = rhtab_map_lookup_elem(map, key);
+ if (!value)
+ return;
+
+ btf_type_seq_show(map->btf, map->btf_key_type_id, key, m);
+ seq_puts(m, ": ");
+ btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
+ seq_putc(m, '\n');
+}
+
+static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
+ void *callback_ctx, u64 flags)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ void *prev_key = NULL;
+ struct rhtab_elem *elem;
+ int num_elems = 0;
+ u64 ret = 0;
+
+ cant_migrate();
+
+ if (flags != 0)
+ return -EINVAL;
+
+ rcu_read_lock();
+ /*
+ * Best-effort iteration: if rhashtable is concurrently resized or
+ * elements are deleted/inserted, there may be missed or duplicate
+ * elements visited.
+ */
+ while ((elem = rhashtable_next_key(&rhtab->ht, prev_key))) {
+ if (IS_ERR(elem))
+ break;
+ num_elems++;
+ ret = callback_fn((u64)(long)map,
+ (u64)(long)elem->data,
+ (u64)(long)rhtab_elem_value(elem, map->key_size),
+ (u64)(long)callback_ctx, 0);
+ if (ret)
+ break;
+
+ prev_key = elem->data; /* valid while RCU held */
+ }
+ rcu_read_unlock();
+
+ return num_elems;
}
static u64 rhtab_map_mem_usage(const struct bpf_map *map)
@@ -3034,6 +3105,275 @@ static u64 rhtab_map_mem_usage(const struct bpf_map *map)
return sizeof(struct bpf_rhtab) + rhtab->elem_size * num_entries;
}
+static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
+ const union bpf_attr *attr,
+ union bpf_attr __user *uattr,
+ bool do_delete)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ void __user *uvalues = u64_to_user_ptr(attr->batch.values);
+ void __user *ukeys = u64_to_user_ptr(attr->batch.keys);
+ void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
+ void *cursor = NULL, *keys = NULL, *values = NULL, *dst_key, *dst_val;
+ struct rhtab_elem **del_elems = NULL;
+ u32 max_count, total, key_size, value_size, i;
+ bool has_next_cursor = false;
+ struct rhtab_elem *elem;
+ u64 elem_map_flags, map_flags;
+ int ret = 0;
+
+ elem_map_flags = attr->batch.elem_flags;
+ ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK);
+ if (ret)
+ return ret;
+
+ map_flags = attr->batch.flags;
+ if (map_flags)
+ return -EINVAL;
+
+ max_count = attr->batch.count;
+ if (!max_count)
+ return 0;
+
+ if (put_user(0, &uattr->batch.count))
+ return -EFAULT;
+
+ key_size = map->key_size;
+ value_size = map->value_size;
+
+ keys = kvmalloc_array(max_count, key_size, GFP_USER | __GFP_NOWARN);
+ values = kvmalloc_array(max_count, value_size, GFP_USER | __GFP_NOWARN);
+ if (do_delete)
+ del_elems = kvmalloc_array(max_count, sizeof(void *),
+ GFP_USER | __GFP_NOWARN);
+ cursor = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
+
+ if (!keys || !values || !cursor || (do_delete && !del_elems)) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ if (ubatch && copy_from_user(cursor, ubatch, key_size)) {
+ ret = -EFAULT;
+ goto free;
+ }
+
+ dst_key = keys;
+ dst_val = values;
+ total = 0;
+
+ rcu_read_lock();
+
+ /*
+ * Cursor stores the key of the next-to-process element (stashed by
+ * the previous batch). Look it up directly so the element is included
+ * here rather than skipped by next_key(). If the cursor was deleted
+ * concurrently (or by the previous do_delete batch), return -EAGAIN
+ * so userspace can distinguish a lost cursor from end-of-iteration
+ * (-ENOENT) and restart from a NULL cursor.
+ */
+ if (ubatch) {
+ elem = rhtab_lookup_elem(map, cursor);
+ if (!elem) {
+ rcu_read_unlock();
+ ret = -EAGAIN;
+ goto free;
+ }
+ } else {
+ elem = rhashtable_next_key(&rhtab->ht, NULL);
+ }
+
+ while (elem && !IS_ERR(elem) && total < max_count) {
+ memcpy(dst_key, elem->data, key_size);
+ rhtab_read_elem_value(map, dst_val, elem, elem_map_flags);
+ check_and_init_map_value(map, dst_val);
+
+ if (do_delete)
+ del_elems[total] = elem;
+
+ elem = rhashtable_next_key(&rhtab->ht, dst_key);
+ dst_key += key_size;
+ dst_val += value_size;
+ total++;
+
+ /* Bail to userspace to avoid stalls. */
+ if (need_resched())
+ break;
+ }
+
+ if (elem && !IS_ERR(elem)) {
+ /* Stash next-to-process key as cursor for the next batch. */
+ memcpy(cursor, elem->data, key_size);
+ has_next_cursor = true;
+ }
+
+ if (do_delete) {
+ for (i = 0; i < total; i++)
+ rhtab_delete_elem(rhtab, del_elems[i], NULL, 0);
+ }
+
+ rcu_read_unlock();
+
+ if (total == 0) {
+ ret = -ENOENT;
+ goto free;
+ }
+
+ /* No more elements after this batch. */
+ if (!has_next_cursor)
+ ret = -ENOENT;
+
+ if (copy_to_user(ukeys, keys, (size_t)total * key_size) ||
+ copy_to_user(uvalues, values, (size_t)total * value_size) ||
+ put_user(total, &uattr->batch.count) ||
+ (has_next_cursor &&
+ copy_to_user(u64_to_user_ptr(attr->batch.out_batch),
+ cursor, key_size))) {
+ ret = -EFAULT;
+ goto free;
+ }
+
+free:
+ kfree(cursor);
+ kvfree(keys);
+ kvfree(values);
+ kvfree(del_elems);
+ return ret;
+}
+
+static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, false);
+}
+
+static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, true);
+}
+
+struct bpf_iter_seq_rhash_map_info {
+ struct bpf_map *map;
+ struct bpf_rhtab *rhtab;
+ struct rhashtable_iter iter;
+};
+
+static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
+ __acquires(RCU)
+{
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+ struct rhtab_elem *elem;
+
+ rhashtable_walk_start(&info->iter);
+ /*
+ * Re-deliver the element returned by walk_next() at the end of the
+ * previous read() — bpf_seq_read may have stopped before show()
+ * consumed it. Rehash rewinds the walker; retry on -EAGAIN.
+ */
+ do {
+ elem = rhashtable_walk_peek(&info->iter);
+ } while (PTR_ERR(elem) == -EAGAIN);
+
+ if (IS_ERR(elem))
+ return NULL;
+
+ if (elem && *pos == 0)
+ ++*pos;
+ return elem;
+}
+
+static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+ struct rhtab_elem *elem;
+
+ ++*pos;
+
+ /* Rehash rewinds the walker; retry until it stops returning -EAGAIN. */
+ do {
+ elem = rhashtable_walk_next(&info->iter);
+ } while (PTR_ERR(elem) == -EAGAIN);
+
+ if (IS_ERR(elem))
+ return NULL;
+ return elem;
+}
+
+static int __bpf_rhash_map_seq_show(struct seq_file *seq,
+ struct rhtab_elem *elem)
+{
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+ struct bpf_iter__bpf_map_elem ctx = {};
+ struct bpf_iter_meta meta;
+ struct bpf_prog *prog;
+ int ret = 0;
+
+ meta.seq = seq;
+ prog = bpf_iter_get_info(&meta, elem == NULL);
+ if (prog) {
+ ctx.meta = &meta;
+ ctx.map = info->map;
+ if (elem) {
+ ctx.key = elem->data;
+ ctx.value = rhtab_elem_value(elem, info->map->key_size);
+ }
+ ret = bpf_iter_run_prog(prog, &ctx);
+ }
+
+ return ret;
+}
+
+static int bpf_rhash_map_seq_show(struct seq_file *seq, void *v)
+{
+ return __bpf_rhash_map_seq_show(seq, v);
+}
+
+static void bpf_rhash_map_seq_stop(struct seq_file *seq, void *v)
+ __releases(RCU)
+{
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+
+ if (!v)
+ (void)__bpf_rhash_map_seq_show(seq, NULL);
+
+ rhashtable_walk_stop(&info->iter);
+}
+
+static int bpf_iter_init_rhash_map(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+ struct bpf_iter_seq_rhash_map_info *info = priv_data;
+ struct bpf_map *map = aux->map;
+
+ bpf_map_inc_with_uref(map);
+ info->map = map;
+ info->rhtab = container_of(map, struct bpf_rhtab, map);
+ rhashtable_walk_enter(&info->rhtab->ht, &info->iter);
+ return 0;
+}
+
+static void bpf_iter_fini_rhash_map(void *priv_data)
+{
+ struct bpf_iter_seq_rhash_map_info *info = priv_data;
+
+ rhashtable_walk_exit(&info->iter);
+ bpf_map_put_with_uref(info->map);
+}
+
+static const struct seq_operations bpf_rhash_map_seq_ops = {
+ .start = bpf_rhash_map_seq_start,
+ .next = bpf_rhash_map_seq_next,
+ .stop = bpf_rhash_map_seq_stop,
+ .show = bpf_rhash_map_seq_show,
+};
+
+static const struct bpf_iter_seq_info rhash_iter_seq_info = {
+ .seq_ops = &bpf_rhash_map_seq_ops,
+ .init_seq_private = bpf_iter_init_rhash_map,
+ .fini_seq_private = bpf_iter_fini_rhash_map,
+ .seq_priv_size = sizeof(struct bpf_iter_seq_rhash_map_info),
+};
+
BTF_ID_LIST_SINGLE(rhtab_map_btf_ids, struct, bpf_rhtab)
const struct bpf_map_ops rhtab_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -3047,6 +3387,11 @@ const struct bpf_map_ops rhtab_map_ops = {
.map_update_elem = rhtab_map_update_elem,
.map_delete_elem = rhtab_map_delete_elem,
.map_gen_lookup = rhtab_map_gen_lookup,
+ .map_seq_show_elem = rhtab_map_seq_show_elem,
+ .map_set_for_each_callback_args = map_set_for_each_callback_args,
+ .map_for_each_callback = bpf_each_rhash_elem,
.map_mem_usage = rhtab_map_mem_usage,
+ BATCH_OPS(rhtab),
.map_btf_id = &rhtab_map_btf_ids[0],
+ .iter_seq_info = &rhash_iter_seq_info,
};
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index ae0741a09c6d..c19b360bad9e 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -123,7 +123,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
is_percpu = true;
else if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
- map->map_type != BPF_MAP_TYPE_ARRAY)
+ map->map_type != BPF_MAP_TYPE_ARRAY &&
+ map->map_type != BPF_MAP_TYPE_RHASH)
goto put_map;
key_acc_size = prog->aux->max_rdonly_access;
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 06/12] bpf: Allow special fields in resizable hashtab
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (4 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable Mykyta Yatsenko
` (6 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Add support for timers, workqueues, task work, spin locks and kptrs.
Without this, users needing deferred callbacks, BPF_F_LOCK, or
refcounted kernel pointers in a dynamically-sized map have no option -
fixed-size htab is the only map supporting these field types.
Resizable hashtab should offer the same capability.
kptr semantics under in-place updates are identical to array map.
Properly clean up BTF record fields on element delete and map
teardown by wiring up bpf_obj_free_fields through a memory allocator
destructor, matching the pattern used by htab for non-prealloc maps.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++-----
kernel/bpf/syscall.c | 3 ++
2 files changed, 104 insertions(+), 10 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index a149713d0953..7b9408b8320c 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -497,28 +497,26 @@ static void htab_dtor_ctx_free(void *ctx)
kfree(ctx);
}
-static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *))
+static int bpf_ma_set_dtor(struct bpf_map *map, struct bpf_mem_alloc *ma,
+ void (*dtor)(void *, void *))
{
- u32 key_size = htab->map.key_size;
- struct bpf_mem_alloc *ma;
struct htab_btf_record *hrec;
int err;
/* No need for dtors. */
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(map->record))
return 0;
hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
if (!hrec)
return -ENOMEM;
- hrec->key_size = key_size;
- hrec->record = btf_record_dup(htab->map.record);
+ hrec->key_size = map->key_size;
+ hrec->record = btf_record_dup(map->record);
if (IS_ERR(hrec->record)) {
err = PTR_ERR(hrec->record);
kfree(hrec);
return err;
}
- ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
bpf_mem_alloc_set_dtor(ma, dtor, htab_dtor_ctx_free, hrec);
return 0;
}
@@ -535,9 +533,9 @@ static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf,
* populated in htab_map_alloc(), so it will always appear as NULL.
*/
if (htab_is_percpu(htab))
- return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+ return bpf_ma_set_dtor(map, &htab->pcpu_ma, htab_pcpu_mem_dtor);
else
- return htab_set_dtor(htab, htab_mem_dtor);
+ return bpf_ma_set_dtor(map, &htab->ma, htab_mem_dtor);
}
static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
@@ -2752,6 +2750,7 @@ struct bpf_rhtab {
struct rhashtable ht;
struct bpf_mem_alloc ma;
u32 elem_size;
+ bool freeing_internal;
};
static const struct rhashtable_params rhtab_params = {
@@ -2832,11 +2831,34 @@ static int rhtab_map_alloc_check(union bpf_attr *attr)
return htab_map_alloc_check(attr);
}
+static void rhtab_check_and_free_fields(struct bpf_rhtab *rhtab,
+ struct rhtab_elem *elem)
+{
+ if (IS_ERR_OR_NULL(rhtab->map.record))
+ return;
+
+ bpf_obj_free_fields(rhtab->map.record,
+ rhtab_elem_value(elem, rhtab->map.key_size));
+}
+
+static void rhtab_mem_dtor(void *obj, void *ctx)
+{
+ struct htab_btf_record *hrec = ctx;
+ struct rhtab_elem *elem = obj;
+
+ if (IS_ERR_OR_NULL(hrec->record))
+ return;
+
+ bpf_obj_free_fields(hrec->record,
+ rhtab_elem_value(elem, hrec->key_size));
+}
+
static void rhtab_free_elem(void *ptr, void *arg)
{
struct bpf_rhtab *rhtab = arg;
struct rhtab_elem *elem = ptr;
+ bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
bpf_mem_cache_free_rcu(&rhtab->ma, elem);
}
@@ -2900,7 +2922,8 @@ static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem, v
rhtab_read_elem_value(&rhtab->map, copy, elem, flags);
check_and_init_map_value(&rhtab->map, copy);
}
-
+ /* Release internal structs: kptr, bpf_timer, task_work, wq */
+ rhtab_check_and_free_fields(rhtab, elem);
bpf_mem_cache_free_rcu(&rhtab->ma, elem);
return 0;
}
@@ -2942,6 +2965,7 @@ static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void
static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
u64 map_flags)
{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
void *old_val = rhtab_elem_value(elem, map->key_size);
if (map_flags & BPF_NOEXIST)
@@ -2951,6 +2975,17 @@ static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *el
copy_map_value_locked(map, old_val, value, false);
else
copy_map_value(map, old_val, value);
+
+ /*
+ * Torn reads: a concurrent reader without BPF_F_LOCK may observe
+ * the value mid-copy. Callers requiring consistent reads must use
+ * BPF_F_LOCK, matching arraymap semantics.
+ *
+ * copy_map_value() skips special-field offsets, so old timers/
+ * kptrs/etc. still sit in the slot. Cancel them after the copy
+ * to match arraymap's update semantics.
+ */
+ rhtab_check_and_free_fields(rhtab, elem);
return 0;
}
@@ -2973,6 +3008,14 @@ static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u
if (map_flags & BPF_EXIST)
return -ENOENT;
+ /*
+ * Reject new insertions while map_release_uref cleanup walks the
+ * table. Without this, new elements could keep triggering rehash
+ * and prevent the walk from terminating.
+ */
+ if (READ_ONCE(rhtab->freeing_internal))
+ return -EBUSY;
+
/* Check max_entries limit before inserting new element */
if (atomic_read(&rhtab->ht.nelems) >= map->max_entries)
return -E2BIG;
@@ -2983,6 +3026,7 @@ static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u
memcpy(elem->data, key, map->key_size);
copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
+ check_and_init_map_value(map, rhtab_elem_value(elem, map->key_size));
/* Prevent deadlock for NMI programs attempting to take bucket lock */
bpf_disable_instrumentation();
@@ -3015,8 +3059,54 @@ static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
return insn - insn_buf;
}
+static int rhtab_map_check_btf(struct bpf_map *map, const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+
+ return bpf_ma_set_dtor(map, &rhtab->ma, rhtab_mem_dtor);
+}
+
static void rhtab_map_free_internal_structs(struct bpf_map *map)
{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhashtable_iter iter;
+ struct rhtab_elem *elem;
+
+ if (!bpf_map_has_internal_structs(map))
+ return;
+
+ /*
+ * Block new insertions. Once observed, no new growth is triggered,
+ * so any in-flight rehash will drain and the walker is guaranteed
+ * to stop returning -EAGAIN. Treat -EAGAIN as "rehash in progress,
+ * retry"; do not wait for the worker.
+ */
+ WRITE_ONCE(rhtab->freeing_internal, true);
+
+ rhashtable_walk_enter(&rhtab->ht, &iter);
+ rhashtable_walk_start(&iter);
+
+ while ((elem = rhashtable_walk_next(&iter))) {
+ if (IS_ERR(elem)) {
+ if (PTR_ERR(elem) == -EAGAIN)
+ continue;
+ break;
+ }
+
+ bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
+
+ if (need_resched()) { /* Avoid stalls on large maps */
+ rhashtable_walk_stop(&iter);
+ cond_resched();
+ rhashtable_walk_start(&iter);
+ }
+ }
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+ WRITE_ONCE(rhtab->freeing_internal, false);
}
static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
@@ -3382,6 +3472,7 @@ const struct bpf_map_ops rhtab_map_ops = {
.map_free = rhtab_map_free,
.map_get_next_key = rhtab_map_get_next_key,
.map_release_uref = rhtab_map_free_internal_structs,
+ .map_check_btf = rhtab_map_check_btf,
.map_lookup_elem = rhtab_map_lookup_elem,
.map_lookup_and_delete_elem = rhtab_map_lookup_and_delete_elem,
.map_update_elem = rhtab_map_update_elem,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1faae184de48..31a3b70a0b5d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1280,6 +1280,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
case BPF_SPIN_LOCK:
case BPF_RES_SPIN_LOCK:
if (map->map_type != BPF_MAP_TYPE_HASH &&
+ map->map_type != BPF_MAP_TYPE_RHASH &&
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
@@ -1294,6 +1295,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
case BPF_WORKQUEUE:
case BPF_TASK_WORK:
if (map->map_type != BPF_MAP_TYPE_HASH &&
+ map->map_type != BPF_MAP_TYPE_RHASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
ret = -EOPNOTSUPP;
@@ -1305,6 +1307,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
case BPF_KPTR_PERCPU:
case BPF_REFCOUNT:
if (map->map_type != BPF_MAP_TYPE_HASH &&
+ map->map_type != BPF_MAP_TYPE_RHASH &&
map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_PERCPU_HASH &&
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (5 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 06/12] bpf: Allow special fields in " Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:54 ` sashiko-bot
2026-06-05 12:22 ` bot+bpf-ci
2026-06-05 11:41 ` [PATCH bpf-next v7 08/12] libbpf: Support " Mykyta Yatsenko
` (5 subsequent siblings)
12 siblings, 2 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Specialize the lookup/update/delete paths for keys whose size matches
sizeof(long) (4 bytes on 32-bit, 8 bytes on 64-bit). A static-const
rhashtable_params lets the compiler inline a custom XOR-fold hashfn and
a single-word equality cmpfn, eliminating the indirect jhash dispatch.
The same hashfn and cmpfn are installed into rhashtable's stored params
at rhashtable_init time, so the rehash worker, slow-path inserts, and
rhashtable_next_key() all agree with the inlined fast paths.
The seq_file BPF iterator uses rhashtable_walk_* and is unaffected.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 7b9408b8320c..b4366cad3cfa 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2763,6 +2763,31 @@ static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
return l->data + round_up(key_size, 8);
}
+/* Specialize hash function and objcmp for long sized key */
+static __always_inline int rhtab_key_cmp_long(struct rhashtable_compare_arg *arg,
+ const void *ptr)
+{
+ const unsigned long key1 = *(const unsigned long *)arg->key;
+ const struct rhtab_elem *key2 = ptr;
+
+ return key1 != *(const unsigned long *)key2->data;
+}
+
+static __always_inline u32 rhtab_hashfn_long(const void *data, u32 len, u32 seed)
+{
+ u64 k = *(const unsigned long *)data;
+
+ return (u32)(k ^ (k >> 32)) ^ seed;
+}
+
+static const struct rhashtable_params rhtab_params_long = {
+ .head_offset = offsetof(struct rhtab_elem, node),
+ .key_offset = offsetof(struct rhtab_elem, data),
+ .key_len = sizeof(long),
+ .hashfn = rhtab_hashfn_long,
+ .obj_cmpfn = rhtab_key_cmp_long,
+};
+
static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
{
struct rhashtable_params params;
@@ -2788,6 +2813,11 @@ static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
params.nelem_hint = (u32)attr->map_extra;
params.automatic_shrinking = true;
+ if (rhtab->map.key_size == sizeof(long)) {
+ params.hashfn = rhtab_hashfn_long;
+ params.obj_cmpfn = rhtab_key_cmp_long;
+ }
+
err = rhashtable_init(&rhtab->ht, ¶ms);
if (err)
goto free_rhtab;
@@ -2878,6 +2908,9 @@ static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
/* Hold RCU lock in case sleepable program calls via gen_lookup */
guard(rcu)();
+ if (map->key_size == sizeof(long))
+ return rhashtable_lookup_likely(&rhtab->ht, key, rhtab_params_long);
+
return rhashtable_lookup_likely(&rhtab->ht, key, rhtab_params);
}
@@ -2912,7 +2945,12 @@ static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem, v
* raw tracepoints, which we don't have in rhashtable.
*/
bpf_disable_instrumentation();
- err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab_params);
+
+ if (rhtab->map.key_size == sizeof(long))
+ err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab_params_long);
+ else
+ err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab_params);
+
bpf_enable_instrumentation();
if (err)
@@ -3030,7 +3068,12 @@ static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u
/* Prevent deadlock for NMI programs attempting to take bucket lock */
bpf_disable_instrumentation();
- tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab_params);
+
+ if (map->key_size == sizeof(long))
+ tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab_params_long);
+ else
+ tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab_params);
+
bpf_enable_instrumentation();
if (tmp) {
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 08/12] libbpf: Support resizable hashtable
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (6 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 09/12] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
` (4 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko, Emil Tsalapatis
From: Mykyta Yatsenko <yatsenko@meta.com>
Add BPF_MAP_TYPE_RHASH to libbpf's map type name table and feature
probing so that libbpf-based tools can create and identify resizable
hash maps.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
tools/lib/bpf/libbpf.c | 1 +
tools/lib/bpf/libbpf_probes.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ab2071fdd3e8..1354bcbc8b30 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -192,6 +192,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
[BPF_MAP_TYPE_ARENA] = "arena",
[BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
+ [BPF_MAP_TYPE_RHASH] = "rhash",
};
static const char * const prog_type_name[] = {
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index b70d9637ecf5..e40819465ddc 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -309,6 +309,9 @@ static int probe_map_create(enum bpf_map_type map_type)
value_size = sizeof(__u64);
opts.map_flags = BPF_F_NO_PREALLOC;
break;
+ case BPF_MAP_TYPE_RHASH:
+ opts.map_flags = BPF_F_NO_PREALLOC;
+ break;
case BPF_MAP_TYPE_CGROUP_STORAGE:
case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
key_size = sizeof(struct bpf_cgroup_storage_key);
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 09/12] selftests/bpf: Add basic tests for resizable hash map
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (7 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 08/12] libbpf: Support " Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 10/12] selftests/bpf: Add BPF iterator " Mykyta Yatsenko
` (3 subsequent siblings)
12 siblings, 0 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Test basic map operations (lookup, update, delete) for
BPF_MAP_TYPE_RHASH including boundary conditions like duplicate
key insertion and deletion of nonexistent keys.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/prog_tests/rhash.c | 120 ++++++++++++
tools/testing/selftests/bpf/progs/rhash.c | 248 +++++++++++++++++++++++++
2 files changed, 368 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
new file mode 100644
index 000000000000..69686bf69ba5
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <string.h>
+#include <stdio.h>
+#include "rhash.skel.h"
+#include <linux/bpf.h>
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+
+static void rhash_run(const char *prog_name)
+{
+ struct rhash *skel;
+ struct bpf_program *prog;
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ int err;
+
+ skel = rhash__open();
+ if (!ASSERT_OK_PTR(skel, "rhash__open"))
+ return;
+
+ prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+ if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+ goto cleanup;
+ bpf_program__set_autoload(prog, true);
+
+ err = rhash__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
+ if (!ASSERT_OK(err, "prog run"))
+ goto cleanup;
+
+ if (!ASSERT_OK(opts.retval, "prog retval"))
+ goto cleanup;
+
+ if (!ASSERT_OK(skel->bss->err, "bss->err"))
+ goto cleanup;
+
+cleanup:
+ rhash__destroy(skel);
+}
+
+static int rhash_map_create(__u32 max_entries, __u64 map_extra)
+{
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .map_flags = BPF_F_NO_PREALLOC,
+ .map_extra = map_extra);
+
+ return bpf_map_create(BPF_MAP_TYPE_RHASH, "rhash_extra",
+ sizeof(__u32), sizeof(__u64), max_entries, &opts);
+}
+
+static void rhash_map_extra_presize(void)
+{
+ const __u32 max_entries = 1024;
+ const __u32 nelem_hint = 256;
+ struct bpf_map_info info = {};
+ __u32 info_len = sizeof(info);
+ __u64 val = 0;
+ __u32 key;
+ int fd, i;
+
+ fd = rhash_map_create(max_entries, nelem_hint);
+ if (!ASSERT_GE(fd, 0, "rhash_map_create presize"))
+ return;
+
+ if (!ASSERT_OK(bpf_map_get_info_by_fd(fd, &info, &info_len), "info"))
+ goto close;
+ ASSERT_EQ(info.map_extra, nelem_hint, "info.map_extra");
+
+ for (i = 0; i < (int)nelem_hint; i++) {
+ key = i;
+ if (!ASSERT_OK(bpf_map_update_elem(fd, &key, &val, BPF_NOEXIST),
+ "update"))
+ goto close;
+ }
+close:
+ close(fd);
+}
+
+static void rhash_map_extra_too_big(void)
+{
+ int fd;
+
+ fd = rhash_map_create(1U << 20, 0x10000);
+ if (!ASSERT_LT(fd, 0, "rhash_map_create hint > U16_MAX"))
+ close(fd);
+}
+
+void test_rhash(void)
+{
+ if (test__start_subtest("test_rhash_lookup_update"))
+ rhash_run("test_rhash_lookup_update");
+
+ if (test__start_subtest("test_rhash_update_delete"))
+ rhash_run("test_rhash_update_delete");
+
+ if (test__start_subtest("test_rhash_update_elements"))
+ rhash_run("test_rhash_update_elements");
+
+ if (test__start_subtest("test_rhash_update_exist"))
+ rhash_run("test_rhash_update_exist");
+
+ if (test__start_subtest("test_rhash_update_any"))
+ rhash_run("test_rhash_update_any");
+
+ if (test__start_subtest("test_rhash_noexist_duplicate"))
+ rhash_run("test_rhash_noexist_duplicate");
+
+ if (test__start_subtest("test_rhash_delete_nonexistent"))
+ rhash_run("test_rhash_delete_nonexistent");
+
+ if (test__start_subtest("test_rhash_map_extra_presize"))
+ rhash_map_extra_presize();
+
+ if (test__start_subtest("test_rhash_map_extra_too_big"))
+ rhash_map_extra_too_big();
+}
diff --git a/tools/testing/selftests/bpf/progs/rhash.c b/tools/testing/selftests/bpf/progs/rhash.c
new file mode 100644
index 000000000000..fc2dac3a719e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rhash.c
@@ -0,0 +1,248 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+
+#define ENOENT 2
+#define EEXIST 17
+
+char _license[] SEC("license") = "GPL";
+
+int err;
+
+struct elem {
+ char arr[128];
+ int val;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RHASH);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __uint(max_entries, 128);
+ __type(key, int);
+ __type(value, struct elem);
+} rhmap SEC(".maps");
+
+SEC("syscall")
+int test_rhash_lookup_update(void *ctx)
+{
+ int key = 5;
+ struct elem empty = {.val = 3, .arr = {0}};
+ struct elem *e;
+
+ err = 1;
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 1;
+
+ err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+ if (err)
+ return 1;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != empty.val) {
+ err = 2;
+ return 2;
+ }
+
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_delete(void *ctx)
+{
+ int key = 6;
+ struct elem empty = {.val = 4, .arr = {0}};
+ struct elem *e;
+
+ err = 1;
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 1;
+
+ err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+ if (err)
+ return 2;
+
+ err = bpf_map_delete_elem(&rhmap, &key);
+ if (err)
+ return 3;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e) {
+ err = 4;
+ return 4;
+ }
+
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_elements(void *ctx)
+{
+ int key = 0;
+ struct elem empty = {.val = 4, .arr = {0}};
+ struct elem *e;
+ int i;
+
+ err = 1;
+
+ for (i = 0; i < 128; ++i) {
+ key = i;
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 1;
+
+ empty.val = key;
+ err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+ if (err)
+ return 2;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != key) {
+ err = 4;
+ return 4;
+ }
+ }
+
+ for (i = 0; i < 128; ++i) {
+ key = i;
+ err = bpf_map_delete_elem(&rhmap, &key);
+ if (err)
+ return 3;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e) {
+ err = 5;
+ return 5;
+ }
+ }
+
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_exist(void *ctx)
+{
+ int key = 10;
+ struct elem val1 = {.val = 100, .arr = {0}};
+ struct elem val2 = {.val = 200, .arr = {0}};
+ struct elem *e;
+ int ret;
+
+ err = 1;
+
+ /* BPF_EXIST on non-existent key should fail with -ENOENT */
+ ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_EXIST);
+ if (ret != -ENOENT)
+ return 1;
+
+ /* Insert element first */
+ ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_NOEXIST);
+ if (ret)
+ return 2;
+
+ /* Verify initial value */
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 100)
+ return 3;
+
+ /* BPF_EXIST on existing key should succeed and update value */
+ ret = bpf_map_update_elem(&rhmap, &key, &val2, BPF_EXIST);
+ if (ret)
+ return 4;
+
+ /* Verify value was updated */
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 200)
+ return 5;
+
+ /* Cleanup */
+ bpf_map_delete_elem(&rhmap, &key);
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_any(void *ctx)
+{
+ int key = 11;
+ struct elem val1 = {.val = 111, .arr = {0}};
+ struct elem val2 = {.val = 222, .arr = {0}};
+ struct elem *e;
+ int ret;
+
+ err = 1;
+
+ /* BPF_ANY on non-existent key should insert */
+ ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_ANY);
+ if (ret)
+ return 1;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 111)
+ return 2;
+
+ /* BPF_ANY on existing key should update */
+ ret = bpf_map_update_elem(&rhmap, &key, &val2, BPF_ANY);
+ if (ret)
+ return 3;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 222)
+ return 4;
+
+ /* Cleanup */
+ bpf_map_delete_elem(&rhmap, &key);
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_noexist_duplicate(void *ctx)
+{
+ int key = 12;
+ struct elem val = {.val = 600, .arr = {0}};
+ int ret;
+
+ err = 1;
+
+ /* Insert element */
+ ret = bpf_map_update_elem(&rhmap, &key, &val, BPF_NOEXIST);
+ if (ret)
+ return 1;
+
+ /* Try to insert again with BPF_NOEXIST - should fail with -EEXIST */
+ ret = bpf_map_update_elem(&rhmap, &key, &val, BPF_NOEXIST);
+ if (ret != -EEXIST)
+ return 2;
+
+ /* Cleanup */
+ bpf_map_delete_elem(&rhmap, &key);
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_delete_nonexistent(void *ctx)
+{
+ int key = 99999;
+ int ret;
+
+ err = 1;
+
+ /* Delete non-existent key should return -ENOENT */
+ ret = bpf_map_delete_elem(&rhmap, &key);
+ if (ret != -ENOENT)
+ return 1;
+
+ err = 0;
+ return 0;
+}
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 10/12] selftests/bpf: Add BPF iterator tests for resizable hash map
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (8 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 09/12] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:56 ` sashiko-bot
2026-06-05 11:41 ` [PATCH bpf-next v7 11/12] bpftool: Add rhash map documentation Mykyta Yatsenko
` (2 subsequent siblings)
12 siblings, 1 reply; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Test basic BPF iterator functionality for BPF_MAP_TYPE_RHASH,
verifying all elements are visited.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/prog_tests/rhash.c | 63 ++++++++++++++++++++++
.../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c | 34 ++++++++++++
2 files changed, 97 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
index 69686bf69ba5..98bb66907b7f 100644
--- a/tools/testing/selftests/bpf/prog_tests/rhash.c
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -4,6 +4,7 @@
#include <string.h>
#include <stdio.h>
#include "rhash.skel.h"
+#include "bpf_iter_bpf_rhash_map.skel.h"
#include <linux/bpf.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
@@ -89,6 +90,65 @@ static void rhash_map_extra_too_big(void)
close(fd);
}
+static void rhash_iter_test(void)
+{
+ DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+ struct bpf_iter_bpf_rhash_map *skel;
+ int err, i, len, map_fd, iter_fd;
+ union bpf_iter_link_info linfo;
+ u32 expected_key_sum = 0, key;
+ struct bpf_link *link;
+ u64 val = 0;
+ char buf[64];
+
+ skel = bpf_iter_bpf_rhash_map__open();
+ if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_rhash_map__open"))
+ return;
+
+ err = bpf_iter_bpf_rhash_map__load(skel);
+ if (!ASSERT_OK(err, "bpf_iter_bpf_rhash_map__load"))
+ goto out;
+
+ map_fd = bpf_map__fd(skel->maps.rhashmap);
+
+ /* Populate map with test data */
+ for (i = 0; i < 64; i++) {
+ key = i + 1;
+ expected_key_sum += key;
+
+ err = bpf_map_update_elem(map_fd, &key, &val, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "map_update"))
+ goto out;
+ }
+
+ memset(&linfo, 0, sizeof(linfo));
+ linfo.map.map_fd = map_fd;
+ opts.link_info = &linfo;
+ opts.link_info_len = sizeof(linfo);
+
+ link = bpf_program__attach_iter(skel->progs.dump_bpf_rhash_map, &opts);
+ if (!ASSERT_OK_PTR(link, "attach_iter"))
+ goto out;
+
+ iter_fd = bpf_iter_create(bpf_link__fd(link));
+ if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+ goto free_link;
+
+ do {
+ len = read(iter_fd, buf, sizeof(buf));
+ } while (len > 0);
+
+ ASSERT_EQ(skel->bss->key_sum, expected_key_sum, "key_sum");
+ ASSERT_EQ(skel->bss->elem_count, 64, "elem_count");
+
+ close(iter_fd);
+
+free_link:
+ bpf_link__destroy(link);
+out:
+ bpf_iter_bpf_rhash_map__destroy(skel);
+}
+
void test_rhash(void)
{
if (test__start_subtest("test_rhash_lookup_update"))
@@ -117,4 +177,7 @@ void test_rhash(void)
if (test__start_subtest("test_rhash_map_extra_too_big"))
rhash_map_extra_too_big();
+
+ if (test__start_subtest("test_rhash_iter"))
+ rhash_iter_test();
}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c
new file mode 100644
index 000000000000..86f6c0d5eadb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RHASH);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __uint(max_entries, 64);
+ __type(key, __u32);
+ __type(value, __u64);
+} rhashmap SEC(".maps");
+
+__u32 key_sum = 0;
+__u64 val_sum = 0;
+__u32 elem_count = 0;
+__u32 err = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_rhash_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+ __u32 *key = ctx->key;
+ __u64 *val = ctx->value;
+
+ if (!key || !val)
+ return 0;
+
+ key_sum += *key;
+ val_sum += *val;
+ elem_count++;
+ return 0;
+}
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 11/12] bpftool: Add rhash map documentation
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (9 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 10/12] selftests/bpf: Add BPF iterator " Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 12/12] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
2026-06-05 15:10 ` [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map patchwork-bot+netdevbpf
12 siblings, 0 replies; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko, Emil Tsalapatis
From: Mykyta Yatsenko <yatsenko@meta.com>
Make bpftool documentation aware of the resizable hash map.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 1af3305ea2b2..5daf3de5c744 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -56,7 +56,7 @@ MAP COMMANDS
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena**
-| | **insn_array** }
+| | **insn_array** | **rhash** }
DESCRIPTION
===========
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 7ebf7dbcfba4..71a45d96617e 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1478,7 +1478,7 @@ static int do_help(int argc, char **argv)
" cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
" queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
" task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena |\n"
- " insn_array }\n"
+ " insn_array | rhash }\n"
" " HELP_SPEC_OPTIONS " |\n"
" {-f|--bpffs} | {-n|--nomount} }\n"
"",
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH bpf-next v7 12/12] selftests/bpf: Add resizable hashmap to benchmarks
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (10 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 11/12] bpftool: Add rhash map documentation Mykyta Yatsenko
@ 2026-06-05 11:41 ` Mykyta Yatsenko
2026-06-05 12:03 ` sashiko-bot
2026-06-05 15:10 ` [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map patchwork-bot+netdevbpf
12 siblings, 1 reply; 23+ messages in thread
From: Mykyta Yatsenko @ 2026-06-05 11:41 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Support resizable hashmap in BPF map benchmarks.
1. LOOKUP (single producer, M events/sec)
key | max | nr | htab | rhtab | ratio | delta
----+-----+-------+---------+---------+-------+-------
8 | 1K | 750 | 99.85 | 81.92 | 0.82x | -18 %
8 | 1K | 1K | 100.71 | 80.19 | 0.80x | -20 %
8 | 1M | 750K | 23.37 | 72.09 | 3.08x | +208 %
8 | 1M | 1M | 13.39 | 53.72 | 4.01x | +301 %
32 | 1K | 750 | 51.57 | 42.78 | 0.83x | -17 %
32 | 1K | 1K | 50.81 | 45.83 | 0.90x | -10 %
32 | 1M | 750K | 11.27 | 15.29 | 1.36x | +36 %
32 | 1M | 1M | 7.32 | 8.75 | 1.19x | +19 %
256 | 1K | 750 | 7.58 | 7.88 | 1.04x | +4 %
256 | 1K | 1K | 7.43 | 7.81 | 1.05x | +5 %
256 | 1M | 750K | 3.69 | 4.27 | 1.16x | +16 %
256 | 1M | 1M | 2.60 | 3.12 | 1.20x | +20 %
Pattern:
* Small map (1K): htab wins for 8 / 32 byte keys by 10-20%
* Large map (1M): rhtab wins everywhere, up to 4x at high load
factor with 8 byte keys.
* Higher load factor amplifies rhtab's lead: rhtab grows the
bucket array; htab stays at user-declared max.
2. FULL UPDATE (M events/sec per producer)
htab per-producer:
20.33 22.02 19.27 23.61 24.18 23.17 21.07
mean 21.94 range 19.27 - 24.18
rhtab per-producer:
133.51 129.47 74.52 129.29 102.26 129.98 107.64
mean 115.24 range 74.52 - 133.51
speedup (mean): 5.25x (+425 %)
In-place memcpy avoids the per-update alloc + RCU pointer swap
that htab pays.
3. MEMORY
value_size | htab ops/s | rhtab ops/s | htab mem | rhtab mem
-----------+-------------+-------------+----------+----------
32 B | 122.87 k/s | 133.04 k/s | 2.47 MiB | 2.49 MiB
4096 B | 64.43 k/s | 65.38 k/s | 6.74 MiB | 6.44 MiB
rhtab/htab : +8 % ops, +0.8 % mem (32 B)
+1 % ops, -4 % mem (4096 B)
Throughput effectively tied
SUMMARY
* Small / well-fitting map: htab is faster (cache-friendly
fixed bucket array), but only by ~10-20 %.
* Large / high-load-factor map: rhtab is dramatically faster
(1.2x to 4x) because rhashtable resizes to keep the load
factor sane while htab stays stuck at user-declared max.
* Update-heavy workloads: rhtab is ~5x faster per producer
via in-place memcpy.
* Memory benchmark: effectively on par.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/bench.c | 6 ++++
.../bpf/benchs/bench_bpf_hashmap_full_update.c | 34 +++++++++++++++++++--
.../bpf/benchs/bench_bpf_hashmap_lookup.c | 31 +++++++++++++++++--
.../testing/selftests/bpf/benchs/bench_htab_mem.c | 35 ++++++++++++++++++++--
4 files changed, 100 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 6155ce455c27..3d9d2cd7764b 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -560,13 +560,16 @@ extern const struct bench bench_bpf_loop;
extern const struct bench bench_strncmp_no_helper;
extern const struct bench bench_strncmp_helper;
extern const struct bench bench_bpf_hashmap_full_update;
+extern const struct bench bench_bpf_rhashmap_full_update;
extern const struct bench bench_local_storage_cache_seq_get;
extern const struct bench bench_local_storage_cache_interleaved_get;
extern const struct bench bench_local_storage_cache_hashmap_control;
extern const struct bench bench_local_storage_tasks_trace;
extern const struct bench bench_bpf_hashmap_lookup;
+extern const struct bench bench_bpf_rhashmap_lookup;
extern const struct bench bench_local_storage_create;
extern const struct bench bench_htab_mem;
+extern const struct bench bench_rhtab_mem;
extern const struct bench bench_crypto_encrypt;
extern const struct bench bench_crypto_decrypt;
extern const struct bench bench_sockmap;
@@ -640,13 +643,16 @@ static const struct bench *benchs[] = {
&bench_strncmp_no_helper,
&bench_strncmp_helper,
&bench_bpf_hashmap_full_update,
+ &bench_bpf_rhashmap_full_update,
&bench_local_storage_cache_seq_get,
&bench_local_storage_cache_interleaved_get,
&bench_local_storage_cache_hashmap_control,
&bench_local_storage_tasks_trace,
&bench_bpf_hashmap_lookup,
+ &bench_bpf_rhashmap_lookup,
&bench_local_storage_create,
&bench_htab_mem,
+ &bench_rhtab_mem,
&bench_crypto_encrypt,
&bench_crypto_decrypt,
&bench_sockmap,
diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
index ee1dc12c5e5e..7278fa860397 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
@@ -34,19 +34,29 @@ static void measure(struct bench_res *res)
{
}
-static void setup(void)
+static void hashmap_full_update_setup(enum bpf_map_type map_type)
{
struct bpf_link *link;
int map_fd, i, max_entries;
setup_libbpf();
- ctx.skel = bpf_hashmap_full_update_bench__open_and_load();
+ ctx.skel = bpf_hashmap_full_update_bench__open();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
+ bpf_map__set_type(ctx.skel->maps.hash_map_bench, map_type);
+ if (map_type == BPF_MAP_TYPE_RHASH)
+ bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench,
+ BPF_F_NO_PREALLOC);
+
+ if (bpf_hashmap_full_update_bench__load(ctx.skel)) {
+ fprintf(stderr, "failed to load skeleton\n");
+ exit(1);
+ }
+
ctx.skel->bss->nr_loops = MAX_LOOP_NUM;
link = bpf_program__attach(ctx.skel->progs.benchmark);
@@ -62,6 +72,16 @@ static void setup(void)
bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
}
+static void setup(void)
+{
+ hashmap_full_update_setup(BPF_MAP_TYPE_HASH);
+}
+
+static void rhash_setup(void)
+{
+ hashmap_full_update_setup(BPF_MAP_TYPE_RHASH);
+}
+
static void hashmap_report_final(struct bench_res res[], int res_cnt)
{
unsigned int nr_cpus = bpf_num_possible_cpus();
@@ -87,3 +107,13 @@ const struct bench bench_bpf_hashmap_full_update = {
.report_progress = NULL,
.report_final = hashmap_report_final,
};
+
+const struct bench bench_bpf_rhashmap_full_update = {
+ .name = "bpf-rhashmap-full-update",
+ .validate = validate,
+ .setup = rhash_setup,
+ .producer_thread = producer,
+ .measure = measure,
+ .report_progress = NULL,
+ .report_final = hashmap_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
index 279ff1b8b5b2..5264b7b20e39 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
@@ -148,9 +148,10 @@ static inline void patch_key(u32 i, u32 *key)
/* the rest of key is random */
}
-static void setup(void)
+static void hashmap_lookup_setup(enum bpf_map_type map_type)
{
struct bpf_link *link;
+ __u32 map_flags;
int map_fd;
int ret;
int i;
@@ -163,10 +164,15 @@ static void setup(void)
exit(1);
}
+ map_flags = args.map_flags;
+ if (map_type == BPF_MAP_TYPE_RHASH)
+ map_flags |= BPF_F_NO_PREALLOC;
+
+ bpf_map__set_type(ctx.skel->maps.hash_map_bench, map_type);
bpf_map__set_max_entries(ctx.skel->maps.hash_map_bench, args.max_entries);
bpf_map__set_key_size(ctx.skel->maps.hash_map_bench, args.key_size);
bpf_map__set_value_size(ctx.skel->maps.hash_map_bench, 8);
- bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench, args.map_flags);
+ bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench, map_flags);
ctx.skel->bss->nr_entries = args.nr_entries;
ctx.skel->bss->nr_loops = args.nr_loops / args.nr_entries;
@@ -197,6 +203,16 @@ static void setup(void)
}
}
+static void setup(void)
+{
+ hashmap_lookup_setup(BPF_MAP_TYPE_HASH);
+}
+
+static void rhash_setup(void)
+{
+ hashmap_lookup_setup(BPF_MAP_TYPE_RHASH);
+}
+
static inline double events_from_time(u64 time)
{
if (time)
@@ -275,3 +291,14 @@ const struct bench bench_bpf_hashmap_lookup = {
.report_progress = NULL,
.report_final = hashmap_report_final,
};
+
+const struct bench bench_bpf_rhashmap_lookup = {
+ .name = "bpf-rhashmap-lookup",
+ .argp = &bench_hashmap_lookup_argp,
+ .validate = validate,
+ .setup = rhash_setup,
+ .producer_thread = producer,
+ .measure = measure,
+ .report_progress = NULL,
+ .report_final = hashmap_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
index 297e32390cd1..1ee217d97434 100644
--- a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
+++ b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
@@ -152,7 +152,7 @@ static const struct htab_mem_use_case *htab_mem_find_use_case_or_exit(const char
exit(1);
}
-static void htab_mem_setup(void)
+static void htab_mem_setup_impl(enum bpf_map_type map_type)
{
struct bpf_map *map;
const char **names;
@@ -178,10 +178,11 @@ static void htab_mem_setup(void)
}
map = ctx.skel->maps.htab;
+ bpf_map__set_type(map, map_type);
bpf_map__set_value_size(map, args.value_size);
/* Ensure that different CPUs can operate on different subset */
bpf_map__set_max_entries(map, MAX(8192, 64 * env.nr_cpus));
- if (args.preallocated)
+ if (map_type != BPF_MAP_TYPE_RHASH && args.preallocated)
bpf_map__set_map_flags(map, bpf_map__map_flags(map) & ~BPF_F_NO_PREALLOC);
names = ctx.uc->progs;
@@ -220,6 +221,16 @@ static void htab_mem_setup(void)
exit(1);
}
+static void htab_mem_setup(void)
+{
+ htab_mem_setup_impl(BPF_MAP_TYPE_HASH);
+}
+
+static void rhtab_mem_setup(void)
+{
+ htab_mem_setup_impl(BPF_MAP_TYPE_RHASH);
+}
+
static void htab_mem_add_fn(pthread_barrier_t *notify)
{
while (true) {
@@ -338,6 +349,15 @@ static void htab_mem_report_final(struct bench_res res[], int res_cnt)
cleanup_cgroup_environment();
}
+static void rhtab_mem_validate(void)
+{
+ if (args.preallocated) {
+ fprintf(stderr, "rhash map does not support preallocation\n");
+ exit(1);
+ }
+ htab_mem_validate();
+}
+
const struct bench bench_htab_mem = {
.name = "htab-mem",
.argp = &bench_htab_mem_argp,
@@ -348,3 +368,14 @@ const struct bench bench_htab_mem = {
.report_progress = htab_mem_report_progress,
.report_final = htab_mem_report_final,
};
+
+const struct bench bench_rhtab_mem = {
+ .name = "rhtab-mem",
+ .argp = &bench_htab_mem_argp,
+ .validate = rhtab_mem_validate,
+ .setup = rhtab_mem_setup,
+ .producer_thread = htab_mem_producer,
+ .measure = htab_mem_measure,
+ .report_progress = htab_mem_report_progress,
+ .report_final = htab_mem_report_final,
+};
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 01/12] rhashtable: Add rhashtable_next_key() API
2026-06-05 11:41 ` [PATCH bpf-next v7 01/12] rhashtable: Add rhashtable_next_key() API Mykyta Yatsenko
@ 2026-06-05 11:43 ` Herbert Xu
0 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2026-06-05 11:43 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
Mykyta Yatsenko
On Fri, Jun 05, 2026 at 04:41:18AM -0700, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Introduce a simpler iteration mechanism for rhashtable that lets
> the caller continue from an arbitrary position by supplying the
> previous key, without the per-iterator state of the
> rhashtable_walk_* API.
>
> void *rhashtable_next_key(struct rhashtable *ht,
> const void *prev_key);
>
> Caller holds RCU; passes NULL prev_key for the first element or
> the previously returned key to advance. Walks tbl->future_tbl
> chain so in-flight rehashes are observed.
>
> Best-effort: in case of concurrent resize, provides no guarantees:
> - may produce duplicate elements
> - may skip any amount of elements
> - termination of the loop is not guaranteed in case of
> sustained rehash. Callers are advised to bound loop externally
> or avoid inserting new elements during such loop.
>
> Returns ERR_PTR(-ENOENT) if prev_key is not found.
> Behavior on tables with duplicate keys is undefined.
> rhltable is not supported — returns ERR_PTR(-EOPNOTSUPP).
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> include/linux/rhashtable.h | 40 +++++++++++++++++++++++++++
> lib/rhashtable.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 109 insertions(+)
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 02/12] rhashtable: Add selftest for rhashtable_next_key()
2026-06-05 11:41 ` [PATCH bpf-next v7 02/12] rhashtable: Add selftest for rhashtable_next_key() Mykyta Yatsenko
@ 2026-06-05 11:43 ` Herbert Xu
0 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2026-06-05 11:43 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
Mykyta Yatsenko
On Fri, Jun 05, 2026 at 04:41:19AM -0700, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Insert n elements, then verify:
> - NULL prev_key walks from the beginning, visiting all n
> - non-existing prev_key returns ERR_PTR(-ENOENT)
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> lib/test_rhashtable.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 75 insertions(+)
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable
2026-06-05 11:41 ` [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable Mykyta Yatsenko
@ 2026-06-05 11:54 ` sashiko-bot
2026-06-05 12:22 ` bot+bpf-ci
1 sibling, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-06-05 11:54 UTC (permalink / raw)
To: Mykyta Yatsenko; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The custom hash function `rhtab_hashfn_long` is vulnerable to Hash Denial of Service (HashDoS) because it fails to mix the random seed non-linearly.
- [High] Directly dereferencing potentially unaligned BPF map key pointers causes unaligned memory access faults.
--
commit 376379c40761f9d44cd7e1638b3ca3a7cb1e989b
Author: Mykyta Yatsenko <yatsenko@meta.com>
bpf: Optimize word-sized keys for resizable hashtable
This commit specializes the lookup, update, and delete paths for BPF resizable
hashtables when the key size matches sizeof(long). It introduces a custom
inline hash and compare function to avoid indirect jhash dispatch.
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 7b9408b8320c9..b4366cad3cfa5 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2763,6 +2763,31 @@ static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
> return l->data + round_up(key_size, 8);
> }
>
> +/* Specialize hash function and objcmp for long sized key */
> +static __always_inline int rhtab_key_cmp_long(struct rhashtable_compare_arg *arg,
> + const void *ptr)
> +{
> + const unsigned long key1 = *(const unsigned long *)arg->key;
[Severity: High]
Can this direct pointer dereference cause unaligned access faults on
architectures without hardware support for unaligned loads?
The BPF verifier ensures memory bounds checking for map keys, but does not
enforce that the key pointer is naturally aligned (e.g., keys from packed
structs or packet buffers).
Should this use get_unaligned() to safely read arg->key?
> + const struct rhtab_elem *key2 = ptr;
> +
> + return key1 != *(const unsigned long *)key2->data;
> +}
> +
> +static __always_inline u32 rhtab_hashfn_long(const void *data, u32 len, u32 seed)
> +{
> + u64 k = *(const unsigned long *)data;
[Severity: High]
Can this dereference of the data pointer also cause unaligned access faults,
similar to the key comparison function above?
> +
> + return (u32)(k ^ (k >> 32)) ^ seed;
[Severity: High]
Does this custom hash function bypass the rhashtable HashDoS mitigation?
Because the random seed is applied via a final linear XOR, the collision
condition for two keys reduces to `(H(k1) & (S - 1)) == (H(k2) & (S - 1))`.
This mathematically makes collisions completely independent of the seed.
If rhashtable's bucket chain length limit is exceeded and it attempts to
rehash with a new random seed, will all colliding keys still map to the exact
same new bucket, causing an infinite rehash loop?
> +}
> +
> +static const struct rhashtable_params rhtab_params_long = {
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-rhash-v7-0-5b8e05f8630d@meta.com?part=7
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab
2026-06-05 11:41 ` [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab Mykyta Yatsenko
@ 2026-06-05 11:56 ` sashiko-bot
2026-06-05 12:40 ` bot+bpf-ci
1 sibling, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-06-05 11:56 UTC (permalink / raw)
To: Mykyta Yatsenko; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Map iteration operations (bpf_for_each_map_elem and batch operations) prematurely and silently abort if an element is concurrently deleted or intentionally deleted by the BPF program.
- [High] Lack of atomicity in `lookup_and_delete_batch` on RHASH maps allows returning duplicate elements to concurrent consumers.
--
commit 54494d14a4d025348520c4921d8c583a6d4257a2
Author: Mykyta Yatsenko <yatsenko@meta.com>
bpf: Implement iteration ops for resizable hashtab
Implement get_next_key, batch lookup/lookup-and-delete, for_each_map_elem
callback, and the seq_file BPF iterator for BPF_MAP_TYPE_RHASH.
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
[ ... ]
> @@ -3020,8 +3020,79 @@ static void rhtab_map_free_internal_structs(struct bpf_map *map)
> static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
> void *callback_ctx, u64 flags)
> {
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + void *prev_key = NULL;
> + struct rhtab_elem *elem;
> + int num_elems = 0;
> + u64 ret = 0;
[ ... ]
> + rcu_read_lock();
> + /*
> + * Best-effort iteration: if rhashtable is concurrently resized or
> + * elements are deleted/inserted, there may be missed or duplicate
> + * elements visited.
> + */
> + while ((elem = rhashtable_next_key(&rhtab->ht, prev_key))) {
> + if (IS_ERR(elem))
> + break;
[Severity: High]
If the BPF callback intentionally deletes the current element (which is a
common pattern for map cleanup), will rhashtable_next_key() return -ENOENT
because prev_key is no longer in the table?
If so, does this loop prematurely break and return num_elems without an
error code, silently truncating the iteration?
> + num_elems++;
> + ret = callback_fn((u64)(long)map,
> + (u64)(long)elem->data,
> + (u64)(long)rhtab_elem_value(elem, map->key_size),
> + (u64)(long)callback_ctx, 0);
> + if (ret)
> + break;
> +
> + prev_key = elem->data; /* valid while RCU held */
> + }
> + rcu_read_unlock();
> +
> + return num_elems;
> }
[ ... ]
> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
> + const union bpf_attr *attr,
> + union bpf_attr __user *uattr,
> + bool do_delete)
> +{
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
[ ... ]
> + while (elem && !IS_ERR(elem) && total < max_count) {
> + memcpy(dst_key, elem->data, key_size);
> + rhtab_read_elem_value(map, dst_val, elem, elem_map_flags);
> + check_and_init_map_value(map, dst_val);
> +
> + if (do_delete)
> + del_elems[total] = elem;
> +
> + elem = rhashtable_next_key(&rhtab->ht, dst_key);
[Severity: High]
If a concurrent thread deletes the dst_key element, will
rhashtable_next_key() return -ENOENT and break the loop?
If this happens, has_next_cursor remains false and the function returns
-ENOENT to userspace. Will userspace misinterpret this -ENOENT as the
legitimate end of the map and stop iterating, dropping all remaining
unvisited elements?
> + dst_key += key_size;
> + dst_val += value_size;
> + total++;
> +
> + /* Bail to userspace to avoid stalls. */
> + if (need_resched())
> + break;
> + }
> +
> + if (elem && !IS_ERR(elem)) {
> + /* Stash next-to-process key as cursor for the next batch. */
> + memcpy(cursor, elem->data, key_size);
> + has_next_cursor = true;
> + }
> +
> + if (do_delete) {
> + for (i = 0; i < total; i++)
> + rhtab_delete_elem(rhtab, del_elems[i], NULL, 0);
> + }
[Severity: High]
Because this function iterates locklessly under RCU and defers deletion
until after the loop, can concurrent batch calls overlap their reads and
observe the exact same elements?
If they both copy the identical elements into their userspace buffers, only
one will successfully delete the element, but the return value of
rhtab_delete_elem() is ignored. Could both threads successfully return the
same elements to userspace, violating the atomic pop semantics required by
applications draining the queue?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-rhash-v7-0-5b8e05f8630d@meta.com?part=5
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 10/12] selftests/bpf: Add BPF iterator tests for resizable hash map
2026-06-05 11:41 ` [PATCH bpf-next v7 10/12] selftests/bpf: Add BPF iterator " Mykyta Yatsenko
@ 2026-06-05 11:56 ` sashiko-bot
0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-06-05 11:56 UTC (permalink / raw)
To: Mykyta Yatsenko; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The BPF iterator test `rhash_iter_test` does not verify that `read()` completed without an error.
--
commit 80a3099ee25442fbc4f994c884097e00ff4bfc97
Author: Mykyta Yatsenko <yatsenko@meta.com>
selftests/bpf: Add BPF iterator tests for resizable hash map
This commit tests basic BPF iterator functionality for BPF_MAP_TYPE_RHASH,
verifying all elements are visited.
> diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
> index 69686bf69ba57..98bb66907b7f7 100644
> --- a/tools/testing/selftests/bpf/prog_tests/rhash.c
> +++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
[ ... ]
> @@ -89,6 +90,65 @@ static void rhash_map_extra_too_big(void)
> close(fd);
> }
>
> +static void rhash_iter_test(void)
> +{
[ ... ]
> + iter_fd = bpf_iter_create(bpf_link__fd(link));
> + if (!ASSERT_GE(iter_fd, 0, "create_iter"))
> + goto free_link;
> +
> + do {
> + len = read(iter_fd, buf, sizeof(buf));
> + } while (len > 0);
[Severity: Medium]
Could this loop silently swallow errors if read() returns a negative value?
If read() returns an error, the loop will exit silently. If an error occurs
after all map elements have already been processed by the BPF program, the
subsequent assertions on key_sum and elem_count might still pass, which would
completely mask the failure.
Would it make sense to add an explicit ASSERT_GE(len, 0, "read") after the
loop to ensure read() completes without error?
> +
> + ASSERT_EQ(skel->bss->key_sum, expected_key_sum, "key_sum");
> + ASSERT_EQ(skel->bss->elem_count, 64, "elem_count");
> +
> + close(iter_fd);
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-rhash-v7-0-5b8e05f8630d@meta.com?part=10
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 12/12] selftests/bpf: Add resizable hashmap to benchmarks
2026-06-05 11:41 ` [PATCH bpf-next v7 12/12] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
@ 2026-06-05 12:03 ` sashiko-bot
0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-06-05 12:03 UTC (permalink / raw)
To: Mykyta Yatsenko; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The bpf-rhashmap-full-update benchmark measures the -E2BIG error path instead of successful map updates, invalidating the commit message's performance claims.
--
commit c016012bca8439800d9763b0e9dbdba68b4e1642
Author: Mykyta Yatsenko <yatsenko@meta.com>
selftests/bpf: Add resizable hashmap to benchmarks
This patch adds support for resizable hashmaps in BPF map benchmarks.
It claims a 5.25x speedup for rhtab over htab during full updates.
> 2. FULL UPDATE (M events/sec per producer)
>
> htab per-producer:
> 20.33 22.02 19.27 23.61 24.18 23.17 21.07
> mean 21.94 range 19.27 - 24.18
>
> rhtab per-producer:
> 133.51 129.47 74.52 129.29 102.26 129.98 107.64
> mean 115.24 range 74.52 - 133.51
>
> speedup (mean): 5.25x (+425 %)
>
> In-place memcpy avoids the per-update alloc + RCU pointer swap
> that htab pays.
[Severity: Medium]
Are these benchmark results measuring successful updates, or just the speed of
the -E2BIG error path?
In tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c:hashmap_full_update_setup(),
the userspace setup pre-fills the map to its maximum capacity:
/* fill hash_map */
map_fd = bpf_map__fd(ctx.skel->maps.hash_map_bench);
max_entries = bpf_map__max_entries(ctx.skel->maps.hash_map_bench);
for (i = 0; i < max_entries; i++)
bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
The BPF program then attempts to insert a new key that doesn't exist in the map
in tools/testing/selftests/bpf/progs/bpf_hashmap_full_update_bench.c:benchmark():
u32 cpu = bpf_get_smp_processor_id();
u32 key = cpu + MAX_ENTRIES;
...
bpf_loop(nr_loops, loop_update_callback, &key, 0);
Because the map is already full, does every update attempt immediately fail with
-E2BIG without performing any updates or memcpy?
For htab (which preallocates), returning -E2BIG requires a slow fallback
traversal of all other CPUs' empty freelists. For rhtab (no prealloc), it is
a fast atomic check against the element count.
Does this regression mean the benchmark is comparing the speed of error paths
rather than successful map updates, which might invalidate the performance
claims?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-rhash-v7-0-5b8e05f8630d@meta.com?part=12
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable
2026-06-05 11:41 ` [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable Mykyta Yatsenko
2026-06-05 11:54 ` sashiko-bot
@ 2026-06-05 12:22 ` bot+bpf-ci
1 sibling, 0 replies; 23+ messages in thread
From: bot+bpf-ci @ 2026-06-05 12:22 UTC (permalink / raw)
To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 3077 bytes --]
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 7b9408b8320c..b4366cad3cfa 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2763,6 +2763,31 @@ static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
> return l->data + round_up(key_size, 8);
> }
>
> +/* Specialize hash function and objcmp for long sized key */
> +static __always_inline int rhtab_key_cmp_long(struct rhashtable_compare_arg *arg,
> + const void *ptr)
> +{
> + const unsigned long key1 = *(const unsigned long *)arg->key;
> + const struct rhtab_elem *key2 = ptr;
> +
> + return key1 != *(const unsigned long *)key2->data;
Could dereferencing these pointers directly cause alignment faults on
architectures that require strict alignment?
Since BPF programs can pass unaligned pointers (such as from packet data
in skb->data) directly to map helpers, using get_unaligned() might be
necessary here to prevent kernel panics.
Previous discussion: The author responded in v5 that "the same problem
exists for the htab jhash2" but did not address the concern. This issue
was raised by sashiko-bot in v5 and v6, and repeated by BPF CI bot in v6.
Reference: https://lore.kernel.org/bpf/20260528182522.23D121F00A3C@smtp.kernel.org/
> +}
> +
> +static __always_inline u32 rhtab_hashfn_long(const void *data, u32 len, u32 seed)
> +{
> + u64 k = *(const unsigned long *)data;
Same alignment issue here: direct dereferencing could fault on
strict-alignment architectures like ARM32 or SPARC.
> +
> + return (u32)(k ^ (k >> 32)) ^ seed;
Does applying the seed after folding the key make hash collisions entirely
independent of the seed?
If a BPF program or an attacker inserts keys constructed as:
k = (Y << 32) | (X ^ Y)
These keys will all fold to exactly X before the XOR with the seed, meaning
the hash output for all such keys satisfies:
f(key1) ^ seed == f(key2) ^ seed
for any random seed.
If rhashtable detects a long bucket chain and triggers a background rehash
with a new seed, won't all these colliding keys move together to the same
new bucket?
This might trap the rhashtable worker in an infinite loop of rehashing,
leading to a 100% CPU livelock.
Previous discussion: The author acknowledged in v5 that "the attack is
possible in theory" but decided to "leave for performance" without
addressing it. This issue was raised by sashiko-bot in v5 and v6, and
repeated by BPF CI bot in v6.
Reference: https://lore.kernel.org/bpf/20260528182522.23D121F00A3C@smtp.kernel.org/
> +}
> +
> +static const struct rhashtable_params rhtab_params_long = {
> + .head_offset = offsetof(struct rhtab_elem, node),
> + .key_offset = offsetof(struct rhtab_elem, data),
> + .key_len = sizeof(long),
> + .hashfn = rhtab_hashfn_long,
> + .obj_cmpfn = rhtab_key_cmp_long,
> +};
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27013077824
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab
2026-06-05 11:41 ` [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab Mykyta Yatsenko
2026-06-05 11:56 ` sashiko-bot
@ 2026-06-05 12:40 ` bot+bpf-ci
1 sibling, 0 replies; 23+ messages in thread
From: bot+bpf-ci @ 2026-06-05 12:40 UTC (permalink / raw)
To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 3768 bytes --]
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 10f3a058747b..a149713d0953 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -3020,8 +3020,79 @@ static void rhtab_map_free_internal_structs(struct bpf_map *map)
[ ... ]
> +static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
> + void *callback_ctx, u64 flags)
> +{
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + void *prev_key = NULL;
> + struct rhtab_elem *elem;
> + int num_elems = 0;
> + u64 ret = 0;
> +
> + cant_migrate();
> +
> + if (flags != 0)
> + return -EINVAL;
> +
> + rcu_read_lock();
> + /*
> + * Best-effort iteration: if rhashtable is concurrently resized or
> + * elements are deleted/inserted, there may be missed or duplicate
> + * elements visited.
> + */
> + while ((elem = rhashtable_next_key(&rhtab->ht, prev_key))) {
The rhashtable_next_key() documentation notes that a full iteration is
not guaranteed to terminate under adversarial or sustained rehashing.
This loop has no external bound and runs under rcu_read_lock() with
migration disabled. Can continuous concurrent resizes keep this loop
from terminating, leading to a soft lockup or RCU stall?
> + if (IS_ERR(elem))
> + break;
If prev_key is deleted concurrently, or by the callback itself,
rhashtable_next_key() returns ERR_PTR(-ENOENT) and this break exits the
loop. Does that abort the iteration immediately and skip the remaining
elements?
rhtab_map_get_next_key() handles the same -ENOENT case by falling back
to the first key:
/* if not found, return the first key */
if (PTR_ERR(elem) == -ENOENT)
elem = rhashtable_next_key(&rhtab->ht, NULL);
Should bpf_each_rhash_elem() do something similar rather than breaking,
given the comment describes this as best-effort iteration?
> + num_elems++;
> + ret = callback_fn((u64)(long)map,
> + (u64)(long)elem->data,
> + (u64)(long)rhtab_elem_value(elem, map->key_size),
> + (u64)(long)callback_ctx, 0);
> + if (ret)
> + break;
> +
> + prev_key = elem->data; /* valid while RCU held */
> + }
> + rcu_read_unlock();
> +
> + return num_elems;
> }
>
> static u64 rhtab_map_mem_usage(const struct bpf_map *map)
> @@ -3034,6 +3105,275 @@ static u64 rhtab_map_mem_usage(const struct bpf_map *map)
[ ... ]
> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
> + const union bpf_attr *attr,
> + union bpf_attr __user *uattr,
> + bool do_delete)
> +{
[ ... ]
> + while (elem && !IS_ERR(elem) && total < max_count) {
> + memcpy(dst_key, elem->data, key_size);
> + rhtab_read_elem_value(map, dst_val, elem, elem_map_flags);
> + check_and_init_map_value(map, dst_val);
> +
> + if (do_delete)
> + del_elems[total] = elem;
> +
> + elem = rhashtable_next_key(&rhtab->ht, dst_key);
> + dst_key += key_size;
> + dst_val += value_size;
> + total++;
> +
> + /* Bail to userspace to avoid stalls. */
> + if (need_resched())
> + break;
> + }
[ ... ]
> + if (do_delete) {
> + for (i = 0; i < total; i++)
> + rhtab_delete_elem(rhtab, del_elems[i], NULL, 0);
> + }
Deferring the deletion until after the iteration leaves the elements
visible in the hash table while they are copied into the user buffers.
Does this open a window where a concurrent batch operation can
traverse, copy, and return the same elements to userspace?
If so, does that break the exactly-once consumption semantics expected
when a batch is used to drain a map like a work queue?
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27013077824
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 03/12] rhashtable: Use irq work for shrinking
2026-06-05 11:41 ` [PATCH bpf-next v7 03/12] rhashtable: Use irq work for shrinking Mykyta Yatsenko
@ 2026-06-05 12:40 ` bot+bpf-ci
0 siblings, 0 replies; 23+ messages in thread
From: bot+bpf-ci @ 2026-06-05 12:40 UTC (permalink / raw)
To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]
> rhashtable: Use irq work for shrinking
>
> Use irq work for automatic shrinking so that this may be called in NMI
> context.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
This looks like it completes a fix that was only partially applied
earlier. The shrink path here is being switched to irq_work:
atomic_dec(&ht->nelems);
if (unlikely(ht->p.automatic_shrinking &&
rht_shrink_below_30(ht, tbl)))
- schedule_work(&ht->run_work);
+ irq_work_queue(&ht->run_irq_work);
The earlier "rhashtable: Use irq work" change introduced run_irq_work,
rht_deferred_irq_work and init_irq_work and converted the insert/grow
path to irq_work, but it left this shrink path still calling
schedule_work() directly. Calling schedule_work() here can close a
lock-ordering cycle (caller_lock -> pool->lock -> pi_lock -> rq->__lock)
and cannot run from NMI context.
Should this carry a Fixes: tag pointing at that earlier
"rhashtable: Use irq work" commit?
I was not able to resolve the exact introducing commit SHA in this tree,
since that history is squashed into the CI base snapshot and the
introducing change is not present as a standalone commit.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27013077824
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
` (11 preceding siblings ...)
2026-06-05 11:41 ` [PATCH bpf-next v7 12/12] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
@ 2026-06-05 15:10 ` patchwork-bot+netdevbpf
12 siblings, 0 replies; 23+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-06-05 15:10 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert, yatsenko, emil
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Fri, 05 Jun 2026 04:41:17 -0700 you wrote:
> This patch series introduces BPF_MAP_TYPE_RHASH, a new hash map type that
> leverages the kernel's rhashtable to provide resizable hash map for BPF.
>
> The existing BPF_MAP_TYPE_HASH uses a fixed number of buckets determined at
> map creation time. While this works well for many use cases, it presents
> challenges when:
>
> [...]
Here is the summary with links:
- [bpf-next,v7,01/12] rhashtable: Add rhashtable_next_key() API
https://git.kernel.org/bpf/bpf-next/c/8f4fa9f89b72
- [bpf-next,v7,02/12] rhashtable: Add selftest for rhashtable_next_key()
https://git.kernel.org/bpf/bpf-next/c/e673eee0f49e
- [bpf-next,v7,03/12] rhashtable: Use irq work for shrinking
https://git.kernel.org/bpf/bpf-next/c/46730ee6e884
- [bpf-next,v7,04/12] bpf: Implement resizable hashmap basic functions
https://git.kernel.org/bpf/bpf-next/c/16b4d3e2fb24
- [bpf-next,v7,05/12] bpf: Implement iteration ops for resizable hashtab
https://git.kernel.org/bpf/bpf-next/c/818e00848227
- [bpf-next,v7,06/12] bpf: Allow special fields in resizable hashtab
https://git.kernel.org/bpf/bpf-next/c/6905f8601298
- [bpf-next,v7,07/12] bpf: Optimize word-sized keys for resizable hashtable
https://git.kernel.org/bpf/bpf-next/c/9dcbb5045fe5
- [bpf-next,v7,08/12] libbpf: Support resizable hashtable
https://git.kernel.org/bpf/bpf-next/c/6e46ff0abefd
- [bpf-next,v7,09/12] selftests/bpf: Add basic tests for resizable hash map
https://git.kernel.org/bpf/bpf-next/c/249996365b66
- [bpf-next,v7,10/12] selftests/bpf: Add BPF iterator tests for resizable hash map
https://git.kernel.org/bpf/bpf-next/c/a996794fda84
- [bpf-next,v7,11/12] bpftool: Add rhash map documentation
https://git.kernel.org/bpf/bpf-next/c/2bea44ea3c4e
- [bpf-next,v7,12/12] selftests/bpf: Add resizable hashmap to benchmarks
https://git.kernel.org/bpf/bpf-next/c/84f7a49e76ec
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2026-06-05 15:10 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-05 11:41 [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 01/12] rhashtable: Add rhashtable_next_key() API Mykyta Yatsenko
2026-06-05 11:43 ` Herbert Xu
2026-06-05 11:41 ` [PATCH bpf-next v7 02/12] rhashtable: Add selftest for rhashtable_next_key() Mykyta Yatsenko
2026-06-05 11:43 ` Herbert Xu
2026-06-05 11:41 ` [PATCH bpf-next v7 03/12] rhashtable: Use irq work for shrinking Mykyta Yatsenko
2026-06-05 12:40 ` bot+bpf-ci
2026-06-05 11:41 ` [PATCH bpf-next v7 04/12] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 05/12] bpf: Implement iteration ops for resizable hashtab Mykyta Yatsenko
2026-06-05 11:56 ` sashiko-bot
2026-06-05 12:40 ` bot+bpf-ci
2026-06-05 11:41 ` [PATCH bpf-next v7 06/12] bpf: Allow special fields in " Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 07/12] bpf: Optimize word-sized keys for resizable hashtable Mykyta Yatsenko
2026-06-05 11:54 ` sashiko-bot
2026-06-05 12:22 ` bot+bpf-ci
2026-06-05 11:41 ` [PATCH bpf-next v7 08/12] libbpf: Support " Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 09/12] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 10/12] selftests/bpf: Add BPF iterator " Mykyta Yatsenko
2026-06-05 11:56 ` sashiko-bot
2026-06-05 11:41 ` [PATCH bpf-next v7 11/12] bpftool: Add rhash map documentation Mykyta Yatsenko
2026-06-05 11:41 ` [PATCH bpf-next v7 12/12] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
2026-06-05 12:03 ` sashiko-bot
2026-06-05 15:10 ` [PATCH bpf-next v7 00/12] bpf: Introduce resizable hash map patchwork-bot+netdevbpf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.