public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map
@ 2026-04-24 19:50 Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko, Emil Tsalapatis

This patch series introduces BPF_MAP_TYPE_RHASH, a new hash map type that
leverages the kernel's rhashtable to provide a resizable hash map for BPF.

The existing BPF_MAP_TYPE_HASH uses a fixed number of buckets determined at
map creation time. While this works well for many use cases, it presents
challenges when:

1. The number of elements is unknown at creation time
2. The element count varies significantly during runtime
3. Memory efficiency is important (over-provisioning wastes memory,
   under-provisioning hurts performance)

BPF_MAP_TYPE_RHASH addresses these issues by using rhashtable, which
automatically grows and shrinks based on load factor.

The implementation wraps the kernel's rhashtable with BPF map operations:

- Uses bpf_mem_alloc for RCU-safe memory management
- Supports all standard map operations (lookup, update, delete, get_next_key)
- Supports batch operations (lookup_batch, lookup_and_delete_batch)
- Supports BPF iterators for traversal
- Supports BPF_F_LOCK for spin locks in values
- Requires BPF_F_NO_PREALLOC flag (elements allocated on demand)
- max_entries serves as a hard limit, not bucket count
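
For illustration, declaring such a map from a BPF program could look like
the sketch below (standard BTF map definition syntax; the sizes and the
map name are arbitrary example values):

  struct {
          __uint(type, BPF_MAP_TYPE_RHASH);
          __uint(map_flags, BPF_F_NO_PREALLOC);
          __uint(max_entries, 100000);
          __type(key, __u32);
          __type(value, __u64);
  } resizable_map SEC(".maps");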

The series includes comprehensive tests:
- Basic operations in test_maps (lookup, update, delete, get_next_key)
- BPF program tests for lookup/update/delete semantics
- Seq file tests

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Update implementation
---------------------
The current implementation of BPF_MAP_TYPE_RHASH does not provide the
same strong guarantees on value consistency under concurrent
reads/writes as BPF_MAP_TYPE_HASH.
BPF_MAP_TYPE_HASH allocates a new element and atomically swaps the
pointer. BPF_MAP_TYPE_RHASH does a memcpy in place with no lock held.
rhash trades consistency for speed: concurrent readers can observe
partially updated data. Two concurrent writers to the same key can
also interleave, producing mixed values. This is similar to the arraymap
update implementation, including the handling of special fields.
As a solution, users may use BPF_F_LOCK to guarantee consistent reads
and write serialization.
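
A sketch of what opting in looks like (assuming a value layout with an
embedded bpf_spin_lock and libbpf's bpf_map_update_elem()/
bpf_map_lookup_elem_flags(); names are illustrative):

  struct val {
          struct bpf_spin_lock lock;
          __u64 counter;
  };

  /* userspace: both the write and the read serialize on the lock */
  bpf_map_update_elem(map_fd, &key, &v, BPF_F_LOCK);
  bpf_map_lookup_elem_flags(map_fd, &key, &v, BPF_F_LOCK);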

Summary of the read consistency guarantees:

  map type     |  write mechanism |  read consistency
  -------------+------------------+--------------------------
  htab         |  alloc, swap ptr |  always consistent (RCU)
  htab  F_LOCK |  in-place + lock |  consistent if reader locks
  -------------+------------------+--------------------------
  rhtab        |  in-place memcpy |  torn reads
  rhtab F_LOCK |  in-place + lock |  consistent if reader locks

Benchmarks
----------

Lookup performance (bpf-rhashmap-lookup vs bpf-hashmap-lookup, events/sec):

  key_size | max_entries | nr_entries |    htab   |   rhtab  |  tbl load
  ---------+-------------+------------+-----------+----------+------------
     4     |    1,000    |    500     |  17.22M   |  16.92M  |   50%
     8     |  1,000,000  |  200,000   |  15.21M   |  13.39M  |   20%
     8     |  1,000,000  |  700,000   |  13.16M   |  13.62M  |   70%
     8     |  1,000,000  | 1,000,000  |   7.93M   |   9.52M  |  100%
    16     |  1,000,000  |  700,000   |  10.77M   |  11.85M  |   70%
    32     |  1,000,000  |  700,000   |  10.24M   |  10.69M  |   70%

rhtab matches or exceeds htab at higher load factors and larger keys.
At full occupancy (1M/1M) rhtab is 20% faster since htab's fixed
bucket count leads to longer chains while rhtab resizes to maintain
a healthy load factor.

In-place update (bpf-*-full-update, events/sec):
  htab :  7.62M
  rhtab: 14.94M  (+96%)

rhtab updates values in place via memcpy; htab allocates a new element
and swaps the pointer for each update.

Memory (htab-mem / rhtab-mem, 8 producers, overwrite workload):

    value_size |  ops/s  |  memory  |  peak memory
    -----------+---------+----------+-------------
    htab    32 | 18.27k  | 0.98 MiB |  1.24 MiB
    rhtab   32 | 27.05k  | 0.99 MiB |  0.99 MiB
    htab  4096 | 14.00k  | 2.23 MiB |  2.23 MiB
    rhtab 4096 | 17.00k  | 1.99 MiB |  1.99 MiB

rhtab is 21-48% faster on concurrent overwrites due to in-place
updates avoiding per-element allocation.
htab's peak memory exceeds steady-state (1.24 vs 0.98 MiB)
because each update allocates a new element before freeing the
old one. rhtab updates in place so peak equals steady-state.

Custom hash functions
---------------------

  Hash function      | key=4     | key=8     | key=32
  -------------------+-----------+-----------+-----------
  jhash (inlined)    | 15-16M    | 16M*      | 15M
  jhash (indirect)   | 11.6M     | 11.9M     | 11.0M
  xxh64 (indirect)   | 11.4M     | 11.9M     | 10.8M
  hsiphash (indirect)| 11.8M     | 11.6M     | 10.7M

Conclusions:
1. The indirect call dispatch is the bottleneck, not the hash
function. All three hash functions perform identically
(~11-12M) because going through rhtab->params (non-constant
params means an ht->p.hashfn indirect call) dominates the cost.
2. The zeroed-params trick is critical - it lets the compiler
see params.hashfn == NULL at compile time, inlining the
jhash dispatch and avoiding the indirect call entirely. That's
the ~15M vs ~11M difference (~35% improvement).
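
Roughly, the difference boils down to the following (a simplified sketch
based on the lookup path added in patch 1; throughput numbers refer to
the table above):

  /* indirect dispatch: non-constant params end up calling
   * ht->p.hashfn through a function pointer (~11-12M lookups/s)
   */
  elem = rhashtable_lookup_fast(&rhtab->ht, key, rhtab->params);

versus

  /* zeroed-params trick: a compile-time constant all-zero params lets
   * the compiler prove hashfn == NULL and inline jhash (~15M lookups/s)
   */
  static const struct rhashtable_params params = { 0 };
  elem = rhashtable_lookup_likely(&rhtab->ht, key, params);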

If we substitute the inlined jhash with a custom hash function in rhashtable:

Hash function | key=4   | key=8   | key=32  | key=256
--------------+---------+---------+---------+---------
jhash         | 15.6M   | 15.0M   | 15.0M   |  5.9M
xxh64         | 15.5M   | 14.8M   | 14.8M   | 10.3M

At key=256, xxh64 is 75% faster than jhash (10.3M vs 5.9M).

Asm code for resizable hashmap lookup
-------------------------------------
       : 0                0xffffffff81491860 <rhtab_lookup_elem>:
  0.01 :   ffffffff81491860:        endbr64
  0.08 :   ffffffff81491864:        pushq   %rbp
  0.09 :   ffffffff81491865:        pushq   %r15
  0.02 :   ffffffff81491867:        pushq   %r14
  0.02 :   ffffffff81491869:        pushq   %r13
  0.34 :   ffffffff8149186b:        pushq   %r12
  0.02 :   ffffffff8149186d:        pushq   %rbx
  0.06 :   ffffffff8149186e:        subq    $0x20, %rsp
  0.00 :   ffffffff81491872:        movq    %rsi, %rbx
  0.01 :   ffffffff81491875:        movq    %rdi, %r13
  0.01 :   ffffffff81491878:        movq    0x120(%rdi), %r14
  0.23 :   ffffffff8149187f:        movq    %rdi, (%rsp)
  0.75 :   ffffffff81491883:        movl    0x8(%r14), %edx
  0.66 :   ffffffff81491887:        movzwl  0x132(%r13), %esi
  0.04 :   ffffffff8149188f:        movq    %rbx, %rdi
  0.00 :   ffffffff81491892:        callq   0xffffffff8148e250 <jhash>
  0.05 :   ffffffff81491897:        movl    (%r14), %esi
 21.55 :   ffffffff8149189a:        decl    %esi
  0.47 :   ffffffff8149189c:        andl    %eax, %esi
  0.00 :   ffffffff8149189e:        cmpl    $0x0, 0x4(%r14)
  0.02 :   ffffffff814918a3:        movq    %r14, 0x18(%rsp)
  0.00 :   ffffffff814918a8:        jne     0xffffffff8149193c <rhtab_lookup_elem+0xdc>
  0.03 :   ffffffff814918ae:        movl    %esi, %eax
  0.00 :   ffffffff814918b0:        leaq    (%r14,%rax,8), %rax
  0.25 :   ffffffff814918b4:        addq    $0x40, %rax
  0.67 :   ffffffff814918b8:        movq    %rax, %rcx
  0.00 :   ffffffff814918bb:        orq     $0x1, %rcx
  0.00 :   ffffffff814918bf:        movq    %rcx, 0x8(%rsp)
  0.38 :   ffffffff814918c4:        movq    %rax, 0x10(%rsp)
  0.00 :   ffffffff814918c9:        movq    (%rax), %r12
  8.00 :   ffffffff814918cc:        andq    $-0x2, %r12
  0.00 :   ffffffff814918d0:        je      0xffffffff81491929 <rhtab_lookup_elem+0xc9>
  0.00 :   ffffffff814918d2:        movzwl  0x136(%r13), %r14d
  0.14 :   ffffffff814918da:        negq    %r14
  0.00 :   ffffffff814918dd:        movzwl  0x134(%r13), %r15d
  0.00 :   ffffffff814918e5:        movzwl  0x132(%r13), %r13d
  0.00 :   ffffffff814918ed:        leaq    (%r12,%r14), %rbp
  0.68 :   ffffffff814918f1:        leaq    (%r15,%rbp), %rdi
  2.02 :   ffffffff814918f5:        movq    %rbx, %rsi
  2.63 :   ffffffff814918f8:        movq    %r13, %rdx
  0.00 :   ffffffff814918fb:        callq   0xffffffff81ef69a0 <bcmp>
  6.30 :   ffffffff81491900:        testl   %eax, %eax
 34.73 :   ffffffff81491902:        je      0xffffffff81491920 <rhtab_lookup_elem+0xc0>
  0.00 :   ffffffff81491904:        movq    (%r12), %r12
  1.03 :   ffffffff81491908:        testb   $0x1, %r12b
  0.00 :   ffffffff8149190c:        je      0xffffffff814918ed <rhtab_lookup_elem+0x8d>
  0.00 :   ffffffff8149190e:        cmpq    0x8(%rsp), %r12
  0.00 :   ffffffff81491913:        movq    (%rsp), %r13
  0.00 :   ffffffff81491917:        movq    0x10(%rsp), %rax
  0.00 :   ffffffff8149191c:        jne     0xffffffff814918c9 <rhtab_lookup_elem+0x69>
  0.00 :   ffffffff8149191e:        jmp     0xffffffff81491929 <rhtab_lookup_elem+0xc9>
  0.00 :   ffffffff81491920:        testq   %r12, %r12
  7.31 :   ffffffff81491923:        movq    (%rsp), %r13
  0.43 :   ffffffff81491927:        jne     0xffffffff8149194b <rhtab_lookup_elem+0xeb>
  0.00 :   ffffffff81491929:        movq    0x18(%rsp), %r14
  0.00 :   ffffffff8149192e:        movq    0x30(%r14), %r14
  0.00 :   ffffffff81491932:        testq   %r14, %r14
  0.00 :   ffffffff81491935:        je      0xffffffff81491949 <rhtab_lookup_elem+0xe9>
  0.00 :   ffffffff81491937:        jmp     0xffffffff81491883 <rhtab_lookup_elem+0x23>
  0.00 :   ffffffff8149193c:        movq    %r14, %rdi
  0.00 :   ffffffff8149193f:        callq   0xffffffff818711a0 <rht_bucket_nested>
  0.00 :   ffffffff81491944:        jmp     0xffffffff814918b8 <rhtab_lookup_elem+0x58>
  0.00 :   ffffffff81491949:        xorl    %ebp, %ebp
  0.00 :   ffffffff8149194b:        movq    %rbp, %rax
  0.00 :   ffffffff8149194e:        addq    $0x20, %rsp
  0.23 :   ffffffff81491952:        popq    %rbx
  8.98 :   ffffffff81491953:        popq    %r12
  0.07 :   ffffffff81491955:        popq    %r13
  0.04 :   ffffffff81491957:        popq    %r14
  1.47 :   ffffffff81491959:        popq    %r15
  0.18 :   ffffffff8149195b:        popq    %rbp
  0.00 :   ffffffff8149195c:        jmp     0xffffffff81f182b0 <__pi___x86_return_thunk>

---
Changes in v3:
- Squash all commits implementing basic functions into one (Alexei)
- Remove selftests that were not necessary (Alexei)
- Resize detection for kernel full iterations, error out on resize (Alexei)
- Remove second lookup in get_next_key() (Emil)
- __acquires(RCU)/__releases(RCU) on seq_start/seq_stop (Emil)
- Use bpf_map_check_op_flags() where it makes sense (Leon)
- Benchmarks refresh, experiment with alternative hash functions
- Rely on iterator invalidation during rehash to handle table resizes:
fail on resize when we fully iterate over the table inside the kernel, don't
fail on resize when iteration goes through userspace. Exception:
rhtab_map_free_internal_structs() is safe to iterate fully in the kernel,
with no risk of an infinite loop, because no user holds a reference.
- Handle special fields during in-place updates (Emil, sashiko)
- Link to v2: https://lore.kernel.org/all/20260408-rhash-v2-0-3b3675da1f6e@meta.com/

Changes in v2:
- Added benchmarks
- Reworked all functions that walk the rhashtable to use the walk API
instead of directly accessing tbl and future_tbl
- Added rhashtable_walk_enter_from() into rhashtable to support O(1)
iteration continuations
- Link to v1: https://lore.kernel.org/r/20260205-rhash-v1-0-30dd6d63c462@meta.com

---
Mykyta Yatsenko (10):
      bpf: Implement resizable hashmap basic functions
      rhashtable: Add rhashtable_walk_enter_from()
      bpf: Implement get_next_key() resizable hashtab
      bpf: Implement batch ops and iterators for resizable hashtab
      bpf: Allow timers, workqueues and task_work in resizable hashtab
      libbpf: Support resizable hashtable
      selftests/bpf: Add basic tests for resizable hash map
      selftests/bpf: Add BPF iterator tests for resizable hash map
      bpftool: Add rhash map documentation
      selftests/bpf: Add resizable hashmap to benchmarks

 include/linux/bpf_types.h                          |   1 +
 include/linux/rhashtable.h                         |  31 +-
 include/uapi/linux/bpf.h                           |   1 +
 kernel/bpf/hashtab.c                               | 730 ++++++++++++++++++++-
 kernel/bpf/map_iter.c                              |   3 +-
 kernel/bpf/syscall.c                               |   4 +
 kernel/bpf/verifier.c                              |   1 +
 lib/rhashtable.c                                   |  41 ++
 lib/test_rhashtable.c                              | 131 ++++
 tools/bpf/bpftool/Documentation/bpftool-map.rst    |   2 +-
 tools/bpf/bpftool/map.c                            |   2 +-
 tools/include/uapi/linux/bpf.h                     |   1 +
 tools/lib/bpf/libbpf.c                             |   1 +
 tools/lib/bpf/libbpf_probes.c                      |   3 +
 tools/testing/selftests/bpf/bench.c                |   6 +
 .../bpf/benchs/bench_bpf_hashmap_full_update.c     |  34 +-
 .../bpf/benchs/bench_bpf_hashmap_lookup.c          |  31 +-
 .../testing/selftests/bpf/benchs/bench_htab_mem.c  |  35 +-
 tools/testing/selftests/bpf/prog_tests/rhash.c     | 127 ++++
 .../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c   |  34 +
 tools/testing/selftests/bpf/progs/rhash.c          | 239 +++++++
 21 files changed, 1435 insertions(+), 23 deletions(-)
---
base-commit: 05103382104b8ffc701f7b2c79379a0361ecad30
change-id: 20251103-rhash-7b70069923d8

Best regards,
--  
Mykyta Yatsenko <yatsenko@meta.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 20:40   ` sashiko-bot
  2026-04-24 20:45   ` bot+bpf-ci
  2026-04-24 19:50 ` [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko, Emil Tsalapatis

From: Mykyta Yatsenko <yatsenko@meta.com>

Implement the basic operations of BPF_MAP_TYPE_RHASH, a hash map backed
by the kernel's rhashtable. Use rhashtable_lookup_likely() for lookups,
rhashtable_remove_fast() for deletes, and
rhashtable_lookup_get_insert_fast() for inserts.

Updates modify values in place under RCU rather than allocating a
new element and swapping the pointer (as regular htab does). This
trades read consistency for performance: concurrent readers may
see partial updates. Users requiring consistent reads should use
BPF_F_LOCK.

Initialize rhashtable with bpf_mem_alloc element cache. Require
BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via
rhashtable_free_and_destroy() callback to handle internal structs.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 include/linux/bpf_types.h      |   1 +
 include/uapi/linux/bpf.h       |   1 +
 kernel/bpf/hashtab.c           | 381 +++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/map_iter.c          |   3 +-
 kernel/bpf/syscall.c           |   3 +
 kernel/bpf/verifier.c          |   1 +
 tools/include/uapi/linux/bpf.h |   1 +
 7 files changed, 390 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b13de31e163f..56e4c3f983d3 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -134,6 +134,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_INSN_ARRAY, insn_array_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_RHASH, rhtab_map_ops)
 
 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 552bc5d9afbd..822582c04f22 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1046,6 +1046,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_CGRP_STORAGE,
 	BPF_MAP_TYPE_ARENA,
 	BPF_MAP_TYPE_INSN_ARRAY,
+	BPF_MAP_TYPE_RHASH,
 	__MAX_BPF_MAP_TYPE
 };
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3dd9b4924ae4..a37bd2a7b30f 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -9,6 +9,7 @@
 #include <linux/rculist_nulls.h>
 #include <linux/rcupdate_wait.h>
 #include <linux/random.h>
+#include <linux/rhashtable.h>
 #include <uapi/linux/btf.h>
 #include <linux/rcupdate_trace.h>
 #include <linux/btf_ids.h>
@@ -2739,3 +2740,383 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
 	BATCH_OPS(htab),
 	.map_btf_id = &htab_map_btf_ids[0],
 };
+
+struct rhtab_elem {
+	struct rhash_head node;
+	/* key bytes, then value bytes follow */
+	u8 data[] __aligned(8);
+};
+
+struct bpf_rhtab {
+	struct bpf_map map;
+	struct rhashtable ht;
+	struct rhashtable_params params;
+	struct bpf_mem_alloc ma;
+	u32 elem_size;
+};
+
+static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
+{
+	return l->data + round_up(key_size, 8);
+}
+
+static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_rhtab *rhtab;
+	int err = 0;
+
+	rhtab = bpf_map_area_alloc(sizeof(*rhtab), NUMA_NO_NODE);
+	if (!rhtab)
+		return ERR_PTR(-ENOMEM);
+
+	bpf_map_init_from_attr(&rhtab->map, attr);
+
+	if (rhtab->map.max_entries > 1UL << 31) {
+		err = -E2BIG;
+		goto free_rhtab;
+	}
+
+	rhtab->elem_size = sizeof(struct rhtab_elem) + round_up(rhtab->map.key_size, 8) +
+			   round_up(rhtab->map.value_size, 8);
+
+	rhtab->params.head_offset = offsetof(struct rhtab_elem, node);
+	rhtab->params.key_offset = offsetof(struct rhtab_elem, data);
+	rhtab->params.key_len = rhtab->map.key_size;
+
+	err = rhashtable_init(&rhtab->ht, &rhtab->params);
+	if (err)
+		goto free_rhtab;
+
+	/* Set max_elems after rhashtable_init() since init zeroes the struct */
+	rhtab->ht.max_elems = rhtab->map.max_entries;
+
+	err = bpf_mem_alloc_init(&rhtab->ma, rhtab->elem_size, false);
+	if (err)
+		goto destroy_rhtab;
+
+	return &rhtab->map;
+
+destroy_rhtab:
+	rhashtable_destroy(&rhtab->ht);
+free_rhtab:
+	bpf_map_area_free(rhtab);
+	return ERR_PTR(err);
+}
+
+static int rhtab_map_alloc_check(union bpf_attr *attr)
+{
+	if (!(attr->map_flags & BPF_F_NO_PREALLOC))
+		return -EINVAL;
+
+	if (attr->map_flags & BPF_F_ZERO_SEED)
+		return -EINVAL;
+
+	return htab_map_alloc_check(attr);
+}
+
+static void rhtab_check_and_free_fields(struct bpf_rhtab *rhtab,
+					struct rhtab_elem *elem)
+{
+	if (IS_ERR_OR_NULL(rhtab->map.record))
+		return;
+
+	bpf_obj_free_fields(rhtab->map.record,
+			    rhtab_elem_value(elem, rhtab->map.key_size));
+}
+
+static void rhtab_mem_dtor(void *obj, void *ctx)
+{
+	struct htab_btf_record *hrec = ctx;
+	struct rhtab_elem *elem = obj;
+
+	if (IS_ERR_OR_NULL(hrec->record))
+		return;
+
+	bpf_obj_free_fields(hrec->record,
+			    rhtab_elem_value(elem, hrec->key_size));
+}
+
+static int rhtab_map_check_btf(struct bpf_map *map, const struct btf *btf,
+			       const struct btf_type *key_type,
+			       const struct btf_type *value_type)
+{
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+
+	return bpf_ma_set_dtor(map, &rhtab->ma, rhtab_mem_dtor);
+}
+
+static void rhtab_free_elem(void *ptr, void *arg)
+{
+	struct bpf_rhtab *rhtab = arg;
+	struct rhtab_elem *elem = ptr;
+
+	bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
+	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
+}
+
+static void rhtab_map_free(struct bpf_map *map)
+{
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+
+	rhashtable_free_and_destroy(&rhtab->ht, rhtab_free_elem, rhtab);
+	bpf_mem_alloc_destroy(&rhtab->ma);
+	bpf_map_area_free(rhtab);
+}
+
+static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	/* Use constant zeroed params so rhashtable uses the inlined hash function */
+	static const struct rhashtable_params params = { 0 };
+
+	/* Hold RCU lock in case sleepable program calls via gen_lookup */
+	guard(rcu)();
+
+	return rhashtable_lookup_likely(&rhtab->ht, key, params);
+}
+
+static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
+{
+	struct rhtab_elem *l;
+
+	l = rhtab_lookup_elem(map, key);
+	return l ? rhtab_elem_value(l, map->key_size) : NULL;
+}
+
+static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
+{
+	int err;
+
+	err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
+	if (err)
+		return err;
+
+	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
+	return 0;
+}
+
+static long rhtab_map_delete_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	struct rhtab_elem *l;
+
+	guard(rcu)();
+	l = rhtab_lookup_elem(map, key);
+	return l ? rhtab_delete_elem(rhtab, l) : -ENOENT;
+}
+
+static void rhtab_read_elem_value(struct bpf_map *map, void *dst, struct rhtab_elem *elem,
+				  u64 flags)
+{
+	void *src = rhtab_elem_value(elem, map->key_size);
+
+	if (flags & BPF_F_LOCK)
+		copy_map_value_locked(map, dst, src, true);
+	else
+		copy_map_value(map, dst, src);
+}
+
+static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
+{
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	struct rhtab_elem *l;
+	int err;
+
+	err = bpf_map_check_op_flags(map, flags, BPF_F_LOCK);
+	if (err)
+		return err;
+
+	/* Make sure element is not deleted between lookup and copy */
+	guard(rcu)();
+
+	l = rhtab_lookup_elem(map, key);
+	if (!l)
+		return -ENOENT;
+
+	rhtab_read_elem_value(map, value, l, flags);
+	err = rhtab_delete_elem(rhtab, l);
+	if (err)
+		return err;
+
+	check_and_init_map_value(map, value);
+	return 0;
+}
+
+static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
+				      u64 map_flags)
+{
+	void *old_val = rhtab_elem_value(elem, map->key_size);
+
+	if (map_flags & BPF_NOEXIST)
+		return -EEXIST;
+
+	if (map_flags & BPF_F_LOCK)
+		copy_map_value_locked(map, old_val, value, false);
+	else
+		copy_map_value(map, old_val, value);
+	return 0;
+}
+
+static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
+{
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	struct rhtab_elem *elem, *tmp;
+
+	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+		return -EINVAL;
+
+	if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
+		return -EINVAL;
+
+	guard(rcu)();
+	elem = rhtab_lookup_elem(map, key);
+	if (elem)
+		return rhtab_map_update_existing(map, elem, value, map_flags);
+
+	if (map_flags & BPF_EXIST)
+		return -ENOENT;
+
+	/* Check max_entries limit before inserting new element */
+	if (atomic_read(&rhtab->ht.nelems) >= map->max_entries)
+		return -E2BIG;
+
+	elem = bpf_mem_cache_alloc(&rhtab->ma);
+	if (!elem)
+		return -ENOMEM;
+
+	memcpy(elem->data, key, map->key_size);
+	copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
+
+	tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);
+	if (tmp) {
+		bpf_mem_cache_free(&rhtab->ma, elem);
+		if (IS_ERR(tmp))
+			return PTR_ERR(tmp);
+
+		return rhtab_map_update_existing(map, tmp, value, map_flags);
+	}
+
+	return 0;
+}
+
+static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
+{
+	struct bpf_insn *insn = insn_buf;
+	const int ret = BPF_REG_0;
+
+	BUILD_BUG_ON(!__same_type(&rhtab_lookup_elem,
+				  (void *(*)(struct bpf_map *map, void *key)) NULL));
+	*insn++ = BPF_EMIT_CALL(rhtab_lookup_elem);
+	*insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1);
+	*insn++ = BPF_ALU64_IMM(BPF_ADD, ret,
+				offsetof(struct rhtab_elem, data) + round_up(map->key_size, 8));
+
+	return insn - insn_buf;
+}
+
+static void rhtab_map_free_internal_structs(struct bpf_map *map)
+{
+}
+
+static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+	return -EOPNOTSUPP;
+}
+
+static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
+{
+}
+
+static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
+				void *callback_ctx, u64 flags)
+{
+	return -EOPNOTSUPP;
+}
+
+static u64 rhtab_map_mem_usage(const struct bpf_map *map)
+{
+	return 0;
+}
+
+static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+				  union bpf_attr __user *uattr)
+{
+	return -EOPNOTSUPP;
+}
+
+static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
+					     union bpf_attr __user *uattr)
+{
+	return -EOPNOTSUPP;
+}
+
+struct bpf_iter_seq_rhash_map_info {
+	struct bpf_map *map;
+	struct bpf_rhtab *rhtab;
+	struct rhashtable_iter iter;
+	bool iter_active;
+};
+
+static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	return NULL;
+}
+
+static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	return NULL;
+}
+
+static int bpf_rhash_map_seq_show(struct seq_file *seq, void *v)
+{
+	return 0;
+}
+
+static void bpf_rhash_map_seq_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int bpf_iter_init_rhash_map(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	return 0;
+}
+
+static void bpf_iter_fini_rhash_map(void *priv_data)
+{
+}
+
+static const struct seq_operations bpf_rhash_map_seq_ops = {
+	.start = bpf_rhash_map_seq_start,
+	.next = bpf_rhash_map_seq_next,
+	.stop = bpf_rhash_map_seq_stop,
+	.show = bpf_rhash_map_seq_show,
+};
+
+static const struct bpf_iter_seq_info rhash_iter_seq_info = {
+	.seq_ops = &bpf_rhash_map_seq_ops,
+	.init_seq_private = bpf_iter_init_rhash_map,
+	.fini_seq_private = bpf_iter_fini_rhash_map,
+	.seq_priv_size = sizeof(struct bpf_iter_seq_rhash_map_info),
+};
+
+BTF_ID_LIST_SINGLE(rhtab_map_btf_ids, struct, bpf_rhtab)
+const struct bpf_map_ops rhtab_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc_check = rhtab_map_alloc_check,
+	.map_alloc = rhtab_map_alloc,
+	.map_free = rhtab_map_free,
+	.map_get_next_key = rhtab_map_get_next_key,
+	.map_release_uref = rhtab_map_free_internal_structs,
+	.map_lookup_elem = rhtab_map_lookup_elem,
+	.map_lookup_and_delete_elem = rhtab_map_lookup_and_delete_elem,
+	.map_update_elem = rhtab_map_update_elem,
+	.map_delete_elem = rhtab_map_delete_elem,
+	.map_gen_lookup = rhtab_map_gen_lookup,
+	.map_seq_show_elem = rhtab_map_seq_show_elem,
+	.map_set_for_each_callback_args = map_set_for_each_callback_args,
+	.map_for_each_callback = bpf_each_rhash_elem,
+	.map_mem_usage = rhtab_map_mem_usage,
+	BATCH_OPS(rhtab),
+	.map_btf_id = &rhtab_map_btf_ids[0],
+	.iter_seq_info = &rhash_iter_seq_info,
+};
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index 261a03ea73d3..4a2aafbe28b4 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -119,7 +119,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
 		is_percpu = true;
 	else if (map->map_type != BPF_MAP_TYPE_HASH &&
 		 map->map_type != BPF_MAP_TYPE_LRU_HASH &&
-		 map->map_type != BPF_MAP_TYPE_ARRAY)
+		 map->map_type != BPF_MAP_TYPE_ARRAY &&
+		 map->map_type != BPF_MAP_TYPE_RHASH)
 		goto put_map;
 
 	key_acc_size = prog->aux->max_rdonly_access;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a3c0214ca934..6c4369691d87 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1280,6 +1280,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
 			case BPF_SPIN_LOCK:
 			case BPF_RES_SPIN_LOCK:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_RHASH &&
 				    map->map_type != BPF_MAP_TYPE_ARRAY &&
 				    map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
 				    map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
@@ -1457,6 +1458,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_MAP_TYPE_CGROUP_ARRAY:
 	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
 	case BPF_MAP_TYPE_HASH:
+	case BPF_MAP_TYPE_RHASH:
 	case BPF_MAP_TYPE_PERCPU_HASH:
 	case BPF_MAP_TYPE_HASH_OF_MAPS:
 	case BPF_MAP_TYPE_RINGBUF:
@@ -2192,6 +2194,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 		   map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
 		   map->map_type == BPF_MAP_TYPE_LRU_HASH ||
 		   map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+		   map->map_type == BPF_MAP_TYPE_RHASH ||
 		   map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
 		if (!bpf_map_is_offloaded(map)) {
 			bpf_disable_instrumentation();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 185210b73385..633d29fe79b6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18070,6 +18070,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	if (prog->sleepable)
 		switch (map->map_type) {
 		case BPF_MAP_TYPE_HASH:
+		case BPF_MAP_TYPE_RHASH:
 		case BPF_MAP_TYPE_LRU_HASH:
 		case BPF_MAP_TYPE_ARRAY:
 		case BPF_MAP_TYPE_PERCPU_HASH:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 677be9a47347..9d7df174770a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1046,6 +1046,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_CGRP_STORAGE,
 	BPF_MAP_TYPE_ARENA,
 	BPF_MAP_TYPE_INSN_ARRAY,
+	BPF_MAP_TYPE_RHASH,
 	__MAX_BPF_MAP_TYPE
 };
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from()
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 20:15   ` sashiko-bot
                     ` (2 more replies)
  2026-04-24 19:50 ` [PATCH bpf-next v3 03/10] bpf: Implement get_next_key() resizable hashtab Mykyta Yatsenko
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

BPF resizable hashmap needs efficient iteration resume for
get_next_key and seq_file iterators. rhashtable_walk_enter()
always starts from bucket 0, forcing linear skip of already-seen
elements.

Add rhashtable_walk_enter_from() that looks up the key's bucket
and positions the walker there, so walk_next returns the successor
directly. If a resize moved the key to the future table, the
walker resets to the beginning.

Refactor __rhashtable_lookup into __rhashtable_lookup_one to reuse
the single-table lookup in both the two-table search and the new
enter_from positioning.
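
A sketch of the intended call pattern (mirroring how the later patches
in this series use it; error handling trimmed):

  rcu_read_lock();
  /* if key is absent, the walker is positioned at the first bucket */
  rhashtable_walk_enter_from(&ht, &iter, key, params);
  rhashtable_walk_start(&iter);

  elem = rhashtable_walk_next(&iter); /* successor of key, or ERR_PTR(-EAGAIN) on resize */

  rhashtable_walk_stop(&iter);
  rhashtable_walk_exit(&iter);
  rcu_read_unlock();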

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/linux/rhashtable.h |  31 +++++++++--
 lib/rhashtable.c           |  41 ++++++++++++++
 lib/test_rhashtable.c      | 131 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 198 insertions(+), 5 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 0480509a6339..6ac547d6c619 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -246,6 +246,11 @@ static inline void rhashtable_walk_start(struct rhashtable_iter *iter)
 	(void)rhashtable_walk_start_check(iter);
 }
 
+void rhashtable_walk_enter_from(struct rhashtable *ht,
+				struct rhashtable_iter *iter,
+				const void *key,
+				const struct rhashtable_params params);
+
 void *rhashtable_walk_next(struct rhashtable_iter *iter);
 void *rhashtable_walk_peek(struct rhashtable_iter *iter);
 void rhashtable_walk_stop(struct rhashtable_iter *iter) __releases_shared(RCU);
@@ -606,8 +611,8 @@ static inline int rhashtable_compare(struct rhashtable_compare_arg *arg,
 }
 
 /* Internal function, do not use. */
-static __always_inline struct rhash_head *__rhashtable_lookup(
-	struct rhashtable *ht, const void *key,
+static __always_inline struct rhash_head *__rhashtable_lookup_one(
+	struct rhashtable *ht, struct bucket_table *tbl, const void *key,
 	const struct rhashtable_params params,
 	const enum rht_lookup_freq freq)
 	__must_hold_shared(RCU)
@@ -617,13 +622,10 @@ static __always_inline struct rhash_head *__rhashtable_lookup(
 		.key = key,
 	};
 	struct rhash_lock_head __rcu *const *bkt;
-	struct bucket_table *tbl;
 	struct rhash_head *he;
 	unsigned int hash;
 
 	BUILD_BUG_ON(!__builtin_constant_p(freq));
-	tbl = rht_dereference_rcu(ht->tbl, ht);
-restart:
 	hash = rht_key_hashfn(ht, tbl, key, params);
 	bkt = rht_bucket(tbl, hash);
 	do {
@@ -639,6 +641,25 @@ static __always_inline struct rhash_head *__rhashtable_lookup(
 		 */
 	} while (he != RHT_NULLS_MARKER(bkt));
 
+	return NULL;
+}
+
+/* Internal function, do not use. */
+static __always_inline struct rhash_head *__rhashtable_lookup(
+	struct rhashtable *ht, const void *key,
+	const struct rhashtable_params params,
+	const enum rht_lookup_freq freq)
+	__must_hold_shared(RCU)
+{
+	struct bucket_table *tbl;
+	struct rhash_head *he;
+
+	tbl = rht_dereference_rcu(ht->tbl, ht);
+restart:
+	he = __rhashtable_lookup_one(ht, tbl, key, params, freq);
+	if (he)
+		return he;
+
 	/* Ensure we see any new tables. */
 	smp_rmb();
 
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6074ed5f66f3..2a6098edf737 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -692,6 +692,47 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
 
+/**
+ * rhashtable_walk_enter_from - Initialise a walk starting at a key's bucket
+ * @ht:		Table to walk over
+ * @iter:	Hash table iterator
+ * @key:	Key whose bucket to start from
+ * @params:	Hash table parameters
+ *
+ * Like rhashtable_walk_enter(), but positions the iterator at the bucket
+ * containing @key.  If @key is not present in the main table, the iterator is
+ * positioned at the beginning.
+ *
+ * Same constraints as rhashtable_walk_enter() apply.
+ */
+void rhashtable_walk_enter_from(struct rhashtable *ht,
+				struct rhashtable_iter *iter,
+				const void *key,
+				const struct rhashtable_params params)
+				__must_hold(RCU)
+{
+	struct bucket_table *tbl;
+	struct rhash_head *he;
+
+	rhashtable_walk_enter(ht, iter);
+
+	if (!key)
+		return;
+
+	tbl = iter->walker.tbl;
+	if (!tbl)
+		return;
+
+	he = __rhashtable_lookup_one(ht, tbl, key, params,
+				     RHT_LOOKUP_NORMAL);
+	if (!he)
+		return;
+
+	iter->slot = rht_key_hashfn(ht, tbl, key, params);
+	iter->p = he;
+}
+EXPORT_SYMBOL_GPL(rhashtable_walk_enter_from);
+
 /**
  * rhashtable_walk_exit - Free an iterator
  * @iter:	Hash table Iterator
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 0b33559a910b..0c4f489bf655 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -23,6 +23,7 @@
 #include <linux/random.h>
 #include <linux/vmalloc.h>
 #include <linux/wait.h>
+#include <linux/cleanup.h>
 
 #define MAX_ENTRIES	1000000
 #define TEST_INSERT_FAIL INT_MAX
@@ -679,6 +680,133 @@ static int threadfunc(void *data)
 	return err;
 }
 
+static int __init test_walk_enter_from(void)
+{
+	struct rhashtable ht;
+	struct test_obj objs[4];
+	struct rhashtable_iter iter;
+	struct test_obj *obj;
+	struct test_obj_val key = { };
+	int err, i, count;
+	int len_freq[4] = { 0, 0, 0, 0 };
+
+	err = rhashtable_init(&ht, &test_rht_params);
+	if (err)
+		return err;
+
+	/* Insert 4 elements with keys 0, 2, 4, 6 */
+	for (i = 0; i < 4; i++) {
+		objs[i].value.id = i * 2;
+		objs[i].value.tid = 0;
+		err = rhashtable_insert_fast(&ht, &objs[i].node,
+					     test_rht_params);
+		if (err) {
+			pr_warn("walk_enter_from: insert %d failed: %d\n",
+				i, err);
+			goto out;
+		}
+	}
+
+	/*
+	 * Test 1: walk_enter_from positions at key, walk_next returns
+	 * the successor (not the key itself).
+	 */
+	for (i = 0; i < 4; i++) {
+		key.id = i * 2;
+		scoped_guard(rcu) {
+			rhashtable_walk_enter_from(&ht, &iter, &key,
+						   test_rht_params);
+			rhashtable_walk_start(&iter);
+		}
+
+		do {
+			obj = rhashtable_walk_next(&iter);
+		} while (PTR_ERR(obj) == -EAGAIN);
+
+		/* Successor must not be the key itself */
+		if (IS_ERR(obj) || (obj && obj->value.id == i * 2)) {
+			pr_warn("walk_enter_from: returned err %ld or key %d instead of successor\n",
+				PTR_ERR(obj), i * 2);
+			err = -EINVAL;
+		}
+
+		rhashtable_walk_stop(&iter);
+		rhashtable_walk_exit(&iter);
+
+		if (err)
+			goto out;
+	}
+
+	/* Test 2: walk_enter_from with non-existent key starts from the beginning */
+	key.id = 99;
+	scoped_guard(rcu) {
+		rhashtable_walk_enter_from(&ht, &iter, &key, test_rht_params);
+		rhashtable_walk_start(&iter);
+	}
+
+	do {
+		obj = rhashtable_walk_next(&iter);
+	} while (PTR_ERR(obj) == -EAGAIN);
+
+	if (IS_ERR(obj)) {
+		pr_warn("walk_enter_from: returned err %ld\n", PTR_ERR(obj));
+		err = -EINVAL;
+	}
+
+	/* Should still return some element (iteration from bucket start) */
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	if (err)
+		goto out;
+
+	/* Test 3: verify walk_enter_from + walk_next can iterate remaining elements */
+	for (i = 0; i < 4; i++) {
+		key.id = i * 2;
+		count = 0;
+
+		scoped_guard(rcu) {
+			rhashtable_walk_enter_from(&ht, &iter, &key, test_rht_params);
+			rhashtable_walk_start(&iter);
+		}
+
+		/* Count the number of elements after this one */
+		do {
+			obj = rhashtable_walk_next(&iter);
+			count++;
+		} while (obj && (!IS_ERR(obj) || (PTR_ERR(obj) == -EAGAIN)));
+
+		rhashtable_walk_stop(&iter);
+		rhashtable_walk_exit(&iter);
+		if (count > 4) {
+			pr_warn("walk_enter_from: walked %d elements instead of max 4\n", count);
+			err = -EINVAL;
+			goto out;
+		}
+		/*
+		 * Collect the length of every walked path;
+		 * each length should occur exactly once.
+		 */
+		len_freq[count - 1]++;
+	}
+
+	for (i = 0; i < 4; i++) {
+		/*
+		 * Each of the four keys must yield a distinct tail length,
+		 * so every len_freq slot is hit exactly once; a duplicate
+		 * means enter_from did not start at the key's bucket.
+		 */
+		if (len_freq[i] != 1) {
+			pr_warn("walk_enter_from: duplicate tail length\n");
+			err = -EINVAL;
+			goto out;
+		}
+	}
+
+	pr_info("walk_enter_from: all tests passed\n");
+out:
+	rhashtable_destroy(&ht);
+	return err;
+}
+
 static int __init test_rht_init(void)
 {
 	unsigned int entries;
@@ -738,6 +866,9 @@ static int __init test_rht_init(void)
 
 	test_insert_duplicates_run();
 
+	pr_info("Testing walk_enter_from: %s\n",
+		test_walk_enter_from() == 0 ? "pass" : "FAIL");
+
 	if (!tcount)
 		return 0;
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 03/10] bpf: Implement get_next_key() resizable hashtab
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-28 10:33   ` Herbert Xu
  2026-04-24 19:50 ` [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for " Mykyta Yatsenko
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko, Emil Tsalapatis

From: Mykyta Yatsenko <yatsenko@meta.com>

Implement rhtab_map_get_next_key() for the BPF resizable hashtable.
It is called from syscall context, so it is safe to use the rhashtable
walk API, which takes spinlocks internally.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 kernel/bpf/hashtab.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index a37bd2a7b30f..d37f3d548d36 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3014,13 +3014,54 @@ static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 	return insn - insn_buf;
 }
 
+/* Helper to get next element, handling -EAGAIN during resize */
+static struct rhtab_elem *rhtab_iter_next(struct rhashtable_iter *iter)
+{
+	struct rhtab_elem *elem;
+
+	while ((elem = rhashtable_walk_next(iter))) {
+		if (IS_ERR(elem)) {
+			if (PTR_ERR(elem) == -EAGAIN)
+				continue;
+			return NULL;
+		}
+		return elem;
+	}
+
+	return NULL;
+}
+
 static void rhtab_map_free_internal_structs(struct bpf_map *map)
 {
 }
 
 static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 {
-	return -EOPNOTSUPP;
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	struct rhashtable_iter iter;
+	struct rhtab_elem *elem;
+	int ret = -ENOENT;
+
+	/*
+	 * Hold RCU across enter_from + walk_start to prevent the
+	 * element cached by enter_from from being freed before
+	 * walk_start re-acquires RCU.
+	 */
+	guard(rcu)();
+	/* If key is not found, places iterator at the beginning */
+	rhashtable_walk_enter_from(&rhtab->ht, &iter, key, rhtab->params);
+	rhashtable_walk_start(&iter);
+
+	elem = rhtab_iter_next(&iter);
+	if (elem) {
+		memcpy(next_key, elem->data, map->key_size);
+		ret = 0;
+	}
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	return ret;
 }
 
 static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for resizable hashtab
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
                   ` (2 preceding siblings ...)
  2026-04-24 19:50 ` [PATCH bpf-next v3 03/10] bpf: Implement get_next_key() resizable hashtab Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 20:28   ` sashiko-bot
  2026-04-24 19:50 ` [PATCH bpf-next v3 05/10] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko, Emil Tsalapatis

From: Mykyta Yatsenko <yatsenko@meta.com>

Add batch operations for BPF_MAP_TYPE_RHASH.
Batch operations:
 * rhtab_map_lookup_batch: Bulk lookup of elements by bucket
 * rhtab_map_lookup_and_delete_batch: Atomic bulk lookup and delete

The batch implementation uses rhashtable_walk_enter_from() to resume
iteration from the last collected key. When the buffer fills, the last
key becomes the cursor for the next batch call.
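
From userspace this follows the usual batch-op pattern; a sketch
assuming libbpf's bpf_map_lookup_batch() and an 8-byte key (buffer
sizes are illustrative):

  __u64 out_batch;
  void *in = NULL;                 /* NULL: start from the beginning */
  __u32 count;
  int err;

  do {
          count = 128;             /* keys[] and values[] hold 128 entries */
          err = bpf_map_lookup_batch(map_fd, in, &out_batch, keys, values,
                                     &count, NULL);
          /* process 'count' returned key/value pairs */
          in = &out_batch;         /* cursor: last key of this batch */
  } while (!err);                  /* -ENOENT signals end of table */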

Also implements rhtab_map_mem_usage() to report memory consumption.

Wire up seq_file BPF iterator for BPF_MAP_TYPE_RHASH so that
bpf_iter and bpftool map dump work with resizable hash maps.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 kernel/bpf/hashtab.c | 280 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 273 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d37f3d548d36..000caa2c7f4c 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3066,64 +3066,330 @@ static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key
 
 static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
 {
+	void *value;
+
+	/* Guarantee that hashtab value is not freed */
+	guard(rcu)();
+
+	value = rhtab_map_lookup_elem(map, key);
+	if (!value)
+		return;
+
+	btf_type_seq_show(map->btf, map->btf_key_type_id, key, m);
+	seq_puts(m, ": ");
+	btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
+	seq_putc(m, '\n');
 }
 
 static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
 				void *callback_ctx, u64 flags)
 {
-	return -EOPNOTSUPP;
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	struct rhashtable_iter iter;
+	struct rhtab_elem *elem;
+	int num_elems = 0;
+	u64 ret = 0;
+
+	if (flags != 0)
+		return -EINVAL;
+
+	/*
+	 * The rhashtable walk API uses spin_lock in rhashtable_walk_start/stop,
+	 * which is not safe in NMI or soft/hard IRQ context.
+	 */
+	if (in_nmi() || in_hardirq() || in_softirq())
+		return -EOPNOTSUPP;
+
+	rhashtable_walk_enter(&rhtab->ht, &iter);
+	rhashtable_walk_start(&iter);
+
+	while ((elem = rhashtable_walk_next(&iter))) {
+		/* rhashtable_walk_next returns -EAGAIN on resize, abort */
+		if (IS_ERR(elem)) {
+			num_elems = -EBUSY;
+			break;
+		}
+		num_elems++;
+		ret = callback_fn((u64)(long)map,
+				  (u64)(long)elem->data,
+				  (u64)(long)rhtab_elem_value(elem, map->key_size),
+				  (u64)(long)callback_ctx, 0);
+		if (ret)
+			break;
+	}
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	return num_elems;
 }
 
 static u64 rhtab_map_mem_usage(const struct bpf_map *map)
 {
-	return 0;
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	u64 num_entries;
+
+	num_entries = atomic_read(&rhtab->ht.nelems);
+	return sizeof(struct bpf_rhtab) + rhtab->elem_size * num_entries;
+}
+
+static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
+					       const union bpf_attr *attr,
+					       union bpf_attr __user *uattr,
+					       bool do_delete)
+{
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	void __user *uvalues = u64_to_user_ptr(attr->batch.values);
+	void __user *ukeys = u64_to_user_ptr(attr->batch.keys);
+	void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
+	void *buf = NULL, *keys = NULL, *values = NULL, *dst_key, *dst_val;
+	struct rhtab_elem **del_elems = NULL;
+	u32 max_count, total, key_size, value_size, i;
+	struct rhashtable_iter iter;
+	struct rhtab_elem *elem;
+	u64 elem_map_flags, map_flags;
+	int ret = 0;
+
+	elem_map_flags = attr->batch.elem_flags;
+	ret = bpf_map_check_op_flags(map, elem_map_flags, BPF_F_LOCK);
+	if (ret)
+		return ret;
+
+	map_flags = attr->batch.flags;
+	if (map_flags)
+		return -EINVAL;
+
+	max_count = attr->batch.count;
+	if (!max_count)
+		return 0;
+
+	if (put_user(0, &uattr->batch.count))
+		return -EFAULT;
+
+	key_size = map->key_size;
+	value_size = map->value_size;
+
+	keys = kvmalloc_array(max_count, key_size, GFP_USER | __GFP_NOWARN);
+	values = kvmalloc_array(max_count, value_size, GFP_USER | __GFP_NOWARN);
+	if (do_delete)
+		del_elems = kvmalloc_array(max_count, sizeof(void *),
+					   GFP_USER | __GFP_NOWARN);
+
+	if (!keys || !values || (do_delete && !del_elems)) {
+		ret = -ENOMEM;
+		goto free;
+	}
+
+	/*
+	 * Use the last key from the previous batch as cursor.
+	 * enter_from positions at that key's bucket, walk_next
+	 * returns the successor in O(1).
+	 * First call (ubatch == NULL): starts from bucket 0.
+	 */
+	if (ubatch) {
+		buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
+		if (!buf) {
+			ret = -ENOMEM;
+			goto free;
+		}
+		if (copy_from_user(buf, ubatch, key_size)) {
+			ret = -EFAULT;
+			goto free;
+		}
+	}
+
+	scoped_guard(rcu) {
+		rhashtable_walk_enter_from(&rhtab->ht, &iter, buf, rhtab->params);
+		rhashtable_walk_start(&iter);
+	}
+
+	dst_key = keys;
+	dst_val = values;
+	total = 0;
+
+	while (total < max_count) {
+		elem = rhtab_iter_next(&iter);
+		if (!elem)
+			break;
+
+		memcpy(dst_key, elem->data, key_size);
+		rhtab_read_elem_value(map, dst_val, elem, elem_map_flags);
+		check_and_init_map_value(map, dst_val);
+
+		if (do_delete)
+			del_elems[total] = elem;
+
+		dst_key += key_size;
+		dst_val += value_size;
+		total++;
+	}
+
+	if (do_delete) {
+		for (i = 0; i < total; i++)
+			rhtab_delete_elem(rhtab, del_elems[i]);
+	}
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	if (total == 0) {
+		ret = -ENOENT;
+		goto free;
+	}
+
+	/* Signal end of table when we collected fewer than requested */
+	if (total < max_count)
+		ret = -ENOENT;
+
+	/* Write last key as cursor for the next batch call */
+	if (copy_to_user(ukeys, keys, total * key_size) ||
+	    copy_to_user(uvalues, values, total * value_size) ||
+	    put_user(total, &uattr->batch.count) ||
+	    copy_to_user(u64_to_user_ptr(attr->batch.out_batch),
+			 dst_key - key_size, key_size)) {
+		ret = -EFAULT;
+		goto free;
+	}
+
+free:
+	kfree(buf);
+	kvfree(keys);
+	kvfree(values);
+	kvfree(del_elems);
+	return ret;
 }
 
 static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	return -EOPNOTSUPP;
+	return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, false);
 }
 
 static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
 					     union bpf_attr __user *uattr)
 {
-	return -EOPNOTSUPP;
+	return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, true);
 }
 
 struct bpf_iter_seq_rhash_map_info {
 	struct bpf_map *map;
 	struct bpf_rhtab *rhtab;
 	struct rhashtable_iter iter;
+	void *last_key;
+	bool last_key_valid;
 	bool iter_active;
 };
 
 static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
+	__acquires(RCU)
 {
-	return NULL;
+	struct bpf_iter_seq_rhash_map_info *info = seq->private;
+	struct rhtab_elem *elem;
+	void *key = *pos > 0 && info->last_key_valid ? info->last_key : NULL;
+
+	scoped_guard(rcu) {
+		rhashtable_walk_enter_from(&info->rhtab->ht, &info->iter,
+					   key, info->rhtab->params);
+		rhashtable_walk_start(&info->iter);
+	}
+	info->iter_active = true;
+
+	elem = rhashtable_walk_next(&info->iter);
+	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
+	if (IS_ERR(elem))
+		return ERR_PTR(-EBUSY);
+	if (!elem)
+		return NULL;
+	if (*pos == 0)
+		++*pos;
+	return elem;
 }
 
 static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	return NULL;
+	struct bpf_iter_seq_rhash_map_info *info = seq->private;
+	struct rhtab_elem *elem = v;
+
+	memcpy(info->last_key, elem->data, info->map->key_size);
+	info->last_key_valid = true;
+
+	++*pos;
+
+	elem = rhashtable_walk_next(&info->iter);
+	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
+	if (IS_ERR(elem))
+		return ERR_PTR(-EBUSY);
+
+	return elem;
+}
+
+static int __bpf_rhash_map_seq_show(struct seq_file *seq,
+				    struct rhtab_elem *elem)
+{
+	struct bpf_iter_seq_rhash_map_info *info = seq->private;
+	struct bpf_iter__bpf_map_elem ctx = {};
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, elem == NULL);
+	if (prog) {
+		ctx.meta = &meta;
+		ctx.map = info->map;
+		if (elem) {
+			ctx.key = elem->data;
+			ctx.value = rhtab_elem_value(elem, info->map->key_size);
+		}
+		ret = bpf_iter_run_prog(prog, &ctx);
+	}
+
+	return ret;
 }
 
 static int bpf_rhash_map_seq_show(struct seq_file *seq, void *v)
 {
-	return 0;
+	return __bpf_rhash_map_seq_show(seq, v);
 }
 
 static void bpf_rhash_map_seq_stop(struct seq_file *seq, void *v)
+	__releases(RCU)
 {
+	struct bpf_iter_seq_rhash_map_info *info = seq->private;
+
+	if (!v)
+		(void)__bpf_rhash_map_seq_show(seq, NULL);
+
+	if (info->iter_active) {
+		rhashtable_walk_stop(&info->iter);
+		rhashtable_walk_exit(&info->iter);
+		info->iter_active = false;
+	}
 }
 
 static int bpf_iter_init_rhash_map(void *priv_data, struct bpf_iter_aux_info *aux)
 {
+	struct bpf_iter_seq_rhash_map_info *info = priv_data;
+	struct bpf_map *map = aux->map;
+
+	info->last_key_valid = false;
+	info->last_key = kmalloc(map->key_size, GFP_USER);
+	if (!info->last_key)
+		return -ENOMEM;
+
+	bpf_map_inc_with_uref(map);
+	info->map = map;
+	info->rhtab = container_of(map, struct bpf_rhtab, map);
+	info->iter_active = false;
 	return 0;
 }
 
 static void bpf_iter_fini_rhash_map(void *priv_data)
 {
+	struct bpf_iter_seq_rhash_map_info *info = priv_data;
+
+	kfree(info->last_key);
+	bpf_map_put_with_uref(info->map);
 }
 
 static const struct seq_operations bpf_rhash_map_seq_ops = {

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 05/10] bpf: Allow timers, workqueues and task_work in resizable hashtab
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
                   ` (3 preceding siblings ...)
  2026-04-24 19:50 ` [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for " Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 21:05   ` sashiko-bot
  2026-04-24 19:50 ` [PATCH bpf-next v3 06/10] libbpf: Support resizable hashtable Mykyta Yatsenko
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Without this, users needing deferred callbacks in a dynamically-sized
map have no option - fixed-size htab is the only map supporting these
field types. Resizable hashtab should offer the same capability.

Properly clean up BTF record fields on element delete and map
teardown by wiring up bpf_obj_free_fields through a memory allocator
destructor, matching the pattern used by htab for non-prealloc maps.
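
For illustration, a value carrying a bpf_timer in a resizable map could
look like this on the BPF side (a sketch; names and sizes are arbitrary):

  struct elem {
          struct bpf_timer t;
          __u64 payload;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_RHASH);
          __uint(map_flags, BPF_F_NO_PREALLOC);
          __uint(max_entries, 4096);
          __type(key, __u32);
          __type(value, struct elem);
  } timers SEC(".maps");

  /* in the program, after looking up 'val' in the map: */
  bpf_timer_init(&val->t, &timers, CLOCK_MONOTONIC);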

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 kernel/bpf/hashtab.c | 42 +++++++++++++++++++++++++++++++++---------
 kernel/bpf/syscall.c |  1 +
 2 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 000caa2c7f4c..ba03797b8960 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -497,28 +497,26 @@ static void htab_dtor_ctx_free(void *ctx)
 	kfree(ctx);
 }
 
-static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *))
+static int bpf_ma_set_dtor(struct bpf_map *map, struct bpf_mem_alloc *ma,
+			   void (*dtor)(void *, void *))
 {
-	u32 key_size = htab->map.key_size;
-	struct bpf_mem_alloc *ma;
 	struct htab_btf_record *hrec;
 	int err;
 
 	/* No need for dtors. */
-	if (IS_ERR_OR_NULL(htab->map.record))
+	if (IS_ERR_OR_NULL(map->record))
 		return 0;
 
 	hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
 	if (!hrec)
 		return -ENOMEM;
-	hrec->key_size = key_size;
-	hrec->record = btf_record_dup(htab->map.record);
+	hrec->key_size = map->key_size;
+	hrec->record = btf_record_dup(map->record);
 	if (IS_ERR(hrec->record)) {
 		err = PTR_ERR(hrec->record);
 		kfree(hrec);
 		return err;
 	}
-	ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
 	bpf_mem_alloc_set_dtor(ma, dtor, htab_dtor_ctx_free, hrec);
 	return 0;
 }
@@ -535,9 +533,9 @@ static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf,
 	 * populated in htab_map_alloc(), so it will always appear as NULL.
 	 */
 	if (htab_is_percpu(htab))
-		return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+		return bpf_ma_set_dtor(map, &htab->pcpu_ma, htab_pcpu_mem_dtor);
 	else
-		return htab_set_dtor(htab, htab_mem_dtor);
+		return bpf_ma_set_dtor(map, &htab->ma, htab_mem_dtor);
 }
 
 static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
@@ -2891,6 +2889,7 @@ static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
 	if (err)
 		return err;
 
+	rhtab_check_and_free_fields(rhtab, elem);
 	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
 	return 0;
 }
@@ -2945,11 +2944,15 @@ static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void
 static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
 				      u64 map_flags)
 {
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
 	void *old_val = rhtab_elem_value(elem, map->key_size);
 
 	if (map_flags & BPF_NOEXIST)
 		return -EEXIST;
 
+	/* Free special fields, and reuse memory similar to arraymap */
+	rhtab_check_and_free_fields(rhtab, elem);
+
 	if (map_flags & BPF_F_LOCK)
 		copy_map_value_locked(map, old_val, value, false);
 	else
@@ -2986,6 +2989,7 @@ static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u
 
 	memcpy(elem->data, key, map->key_size);
 	copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
+	check_and_init_map_value(map, rhtab_elem_value(elem, map->key_size));
 
 	tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);
 	if (tmp) {
@@ -3033,6 +3037,25 @@ static struct rhtab_elem *rhtab_iter_next(struct rhashtable_iter *iter)
 
 static void rhtab_map_free_internal_structs(struct bpf_map *map)
 {
+	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+	struct rhashtable_iter iter;
+	struct rhtab_elem *elem;
+
+	if (!bpf_map_has_internal_structs(map))
+		return;
+
+	/*
 +	 * An element can be processed twice if the rhashtable resizes concurrently.
 +	 * Freeing of special structs tolerates duplicate cancel_and_free calls.
+	 */
+	rhashtable_walk_enter(&rhtab->ht, &iter);
+	rhashtable_walk_start(&iter);
+
+	for (elem = rhtab_iter_next(&iter); elem; elem = rhtab_iter_next(&iter))
+		bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
 }
 
 static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
@@ -3414,6 +3437,7 @@ const struct bpf_map_ops rhtab_map_ops = {
 	.map_free = rhtab_map_free,
 	.map_get_next_key = rhtab_map_get_next_key,
 	.map_release_uref = rhtab_map_free_internal_structs,
+	.map_check_btf = rhtab_map_check_btf,
 	.map_lookup_elem = rhtab_map_lookup_elem,
 	.map_lookup_and_delete_elem = rhtab_map_lookup_and_delete_elem,
 	.map_update_elem = rhtab_map_update_elem,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6c4369691d87..fffca2f484eb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1295,6 +1295,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
 			case BPF_WORKQUEUE:
 			case BPF_TASK_WORK:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_RHASH &&
 				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
 				    map->map_type != BPF_MAP_TYPE_ARRAY) {
 					ret = -EOPNOTSUPP;

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 06/10] libbpf: Support resizable hashtable
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
                   ` (4 preceding siblings ...)
  2026-04-24 19:50 ` [PATCH bpf-next v3 05/10] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko, Emil Tsalapatis

From: Mykyta Yatsenko <yatsenko@meta.com>

Add BPF_MAP_TYPE_RHASH to libbpf's map type name table and feature
probing so that libbpf-based tools can create and identify resizable
hash maps.
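
For example, an application can gate its use of the new map type on
libbpf's feature probe (a sketch; use_rhash_map is an illustrative
variable):

    if (libbpf_probe_bpf_map_type(BPF_MAP_TYPE_RHASH, NULL) == 1)
        use_rhash_map = true; /* kernel supports BPF_MAP_TYPE_RHASH */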

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 tools/lib/bpf/libbpf.c        | 1 +
 tools/lib/bpf/libbpf_probes.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 83aae7a39d36..b8dd0c09841a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -192,6 +192,7 @@ static const char * const map_type_name[] = {
 	[BPF_MAP_TYPE_CGRP_STORAGE]		= "cgrp_storage",
 	[BPF_MAP_TYPE_ARENA]			= "arena",
 	[BPF_MAP_TYPE_INSN_ARRAY]		= "insn_array",
+	[BPF_MAP_TYPE_RHASH]			= "rhash",
 };
 
 static const char * const prog_type_name[] = {
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index b70d9637ecf5..e40819465ddc 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -309,6 +309,9 @@ static int probe_map_create(enum bpf_map_type map_type)
 		value_size	= sizeof(__u64);
 		opts.map_flags	= BPF_F_NO_PREALLOC;
 		break;
+	case BPF_MAP_TYPE_RHASH:
+		opts.map_flags	= BPF_F_NO_PREALLOC;
+		break;
 	case BPF_MAP_TYPE_CGROUP_STORAGE:
 	case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
 		key_size	= sizeof(struct bpf_cgroup_storage_key);

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
                   ` (5 preceding siblings ...)
  2026-04-24 19:50 ` [PATCH bpf-next v3 06/10] libbpf: Support resizable hashtable Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 20:02   ` sashiko-bot
  2026-04-24 20:32   ` bot+bpf-ci
  2026-04-24 19:50 ` [PATCH bpf-next v3 08/10] selftests/bpf: Add BPF iterator " Mykyta Yatsenko
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Test basic map operations (lookup, update, delete) for
BPF_MAP_TYPE_RHASH including boundary conditions like duplicate
key insertion and deletion of nonexistent keys.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 tools/testing/selftests/bpf/prog_tests/rhash.c |  64 +++++++
 tools/testing/selftests/bpf/progs/rhash.c      | 239 +++++++++++++++++++++++++
 2 files changed, 303 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
new file mode 100644
index 000000000000..0f6eec4fe1fe
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <string.h>
+#include <stdio.h>
+#include "rhash.skel.h"
+#include <linux/bpf.h>
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+
+static void rhash_run(const char *prog_name)
+{
+	struct rhash *skel;
+	struct bpf_program *prog;
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	int err;
+
+	skel = rhash__open();
+	if (!ASSERT_OK_PTR(skel, "rhash__open"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+	bpf_program__set_autoload(prog, true);
+
+	err = rhash__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
+	if (!ASSERT_OK(err, "prog run"))
+		goto cleanup;
+
+	if (!ASSERT_OK(skel->bss->err, "bss->err"))
+		goto cleanup;
+
+cleanup:
+	rhash__destroy(skel);
+}
+
+void test_rhash(void)
+{
+	if (test__start_subtest("test_rhash_lookup_update"))
+		rhash_run("test_rhash_lookup_update");
+
+	if (test__start_subtest("test_rhash_update_delete"))
+		rhash_run("test_rhash_update_delete");
+
+	if (test__start_subtest("test_rhash_update_elements"))
+		rhash_run("test_rhash_update_elements");
+
+	if (test__start_subtest("test_rhash_update_exist"))
+		rhash_run("test_rhash_update_exist");
+
+	if (test__start_subtest("test_rhash_update_any"))
+		rhash_run("test_rhash_update_any");
+
+	if (test__start_subtest("test_rhash_noexist_duplicate"))
+		rhash_run("test_rhash_noexist_duplicate");
+
+	if (test__start_subtest("test_rhash_delete_nonexistent"))
+		rhash_run("test_rhash_delete_nonexistent");
+}
diff --git a/tools/testing/selftests/bpf/progs/rhash.c b/tools/testing/selftests/bpf/progs/rhash.c
new file mode 100644
index 000000000000..9d066a22ce18
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rhash.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+
+#define ENOENT 2
+#define EEXIST 17
+
+char _license[] SEC("license") = "GPL";
+
+int err;
+
+struct elem {
+	char arr[128];
+	int val;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RHASH);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__uint(max_entries, 128);
+	__type(key, int);
+	__type(value, struct elem);
+} rhmap SEC(".maps");
+
+SEC("syscall")
+int test_rhash_lookup_update(void *ctx)
+{
+	int key = 5;
+	struct elem empty = {.val = 3, .arr = {0}};
+	struct elem *e;
+
+	err = 1;
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (e)
+		return 1;
+
+	err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+	if (err)
+		return 1;
+
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (e && e->val == empty.val)
+		err = 0;
+
+	return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_delete(void *ctx)
+{
+	int key = 6;
+	struct elem empty = {.val = 4, .arr = {0}};
+	struct elem *e;
+
+	err = 1;
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (e)
+		return 1;
+
+	err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+	if (err)
+		return 2;
+
+	err = bpf_map_delete_elem(&rhmap, &key);
+	if (err)
+		return 3;
+
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (e)
+		return 4;
+
+	err = 0;
+	return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_elements(void *ctx)
+{
+	int key = 0;
+	struct elem empty = {.val = 4, .arr = {0}};
+	struct elem *e;
+	int i;
+
+	err = 1;
+
+	for (i = 0; i < 128; ++i) {
+		key = i;
+		e = bpf_map_lookup_elem(&rhmap, &key);
+		if (e)
+			return 1;
+
+		empty.val = key;
+		err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+		if (err)
+			return 2;
+
+		e = bpf_map_lookup_elem(&rhmap, &key);
+		if (!e || e->val != key)
+			return 4;
+	}
+
+	for (i = 0; i < 128; ++i) {
+		key = i;
+		err = bpf_map_delete_elem(&rhmap, &key);
+		if (err)
+			return 3;
+
+		e = bpf_map_lookup_elem(&rhmap, &key);
+		if (e)
+			return 4;
+	}
+
+	err = 0;
+	return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_exist(void *ctx)
+{
+	int key = 10;
+	struct elem val1 = {.val = 100, .arr = {0}};
+	struct elem val2 = {.val = 200, .arr = {0}};
+	struct elem *e;
+	int ret;
+
+	err = 1;
+
+	/* BPF_EXIST on non-existent key should fail with -ENOENT */
+	ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_EXIST);
+	if (ret != -ENOENT)
+		return 1;
+
+	/* Insert element first */
+	ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_NOEXIST);
+	if (ret)
+		return 2;
+
+	/* Verify initial value */
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (!e || e->val != 100)
+		return 3;
+
+	/* BPF_EXIST on existing key should succeed and update value */
+	ret = bpf_map_update_elem(&rhmap, &key, &val2, BPF_EXIST);
+	if (ret)
+		return 4;
+
+	/* Verify value was updated */
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (!e || e->val != 200)
+		return 5;
+
+	/* Cleanup */
+	bpf_map_delete_elem(&rhmap, &key);
+	err = 0;
+	return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_any(void *ctx)
+{
+	int key = 11;
+	struct elem val1 = {.val = 111, .arr = {0}};
+	struct elem val2 = {.val = 222, .arr = {0}};
+	struct elem *e;
+	int ret;
+
+	err = 1;
+
+	/* BPF_ANY on non-existent key should insert */
+	ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_ANY);
+	if (ret)
+		return 1;
+
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (!e || e->val != 111)
+		return 2;
+
+	/* BPF_ANY on existing key should update */
+	ret = bpf_map_update_elem(&rhmap, &key, &val2, BPF_ANY);
+	if (ret)
+		return 3;
+
+	e = bpf_map_lookup_elem(&rhmap, &key);
+	if (!e || e->val != 222)
+		return 4;
+
+	/* Cleanup */
+	bpf_map_delete_elem(&rhmap, &key);
+	err = 0;
+	return 0;
+}
+
+SEC("syscall")
+int test_rhash_noexist_duplicate(void *ctx)
+{
+	int key = 12;
+	struct elem val = {.val = 600, .arr = {0}};
+	int ret;
+
+	err = 1;
+
+	/* Insert element */
+	ret = bpf_map_update_elem(&rhmap, &key, &val, BPF_NOEXIST);
+	if (ret)
+		return 1;
+
+	/* Try to insert again with BPF_NOEXIST - should fail with -EEXIST */
+	ret = bpf_map_update_elem(&rhmap, &key, &val, BPF_NOEXIST);
+	if (ret != -EEXIST)
+		return 2;
+
+	/* Cleanup */
+	bpf_map_delete_elem(&rhmap, &key);
+	err = 0;
+	return 0;
+}
+
+SEC("syscall")
+int test_rhash_delete_nonexistent(void *ctx)
+{
+	int key = 99999;
+	int ret;
+
+	err = 1;
+
+	/* Delete non-existent key should return -ENOENT */
+	ret = bpf_map_delete_elem(&rhmap, &key);
+	if (ret != -ENOENT)
+		return 1;
+
+	err = 0;
+	return 0;
+}

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 08/10] selftests/bpf: Add BPF iterator tests for resizable hash map
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
                   ` (6 preceding siblings ...)
  2026-04-24 19:50 ` [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 09/10] bpftool: Add rhash map documentation Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 10/10] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
  9 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Test basic BPF iterator functionality for BPF_MAP_TYPE_RHASH,
verifying all elements are visited.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 tools/testing/selftests/bpf/prog_tests/rhash.c     | 63 ++++++++++++++++++++++
 .../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c   | 34 ++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
index 0f6eec4fe1fe..7e5a5fbd70c2 100644
--- a/tools/testing/selftests/bpf/prog_tests/rhash.c
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -4,6 +4,7 @@
 #include <string.h>
 #include <stdio.h>
 #include "rhash.skel.h"
+#include "bpf_iter_bpf_rhash_map.skel.h"
 #include <linux/bpf.h>
 #include <linux/perf_event.h>
 #include <sys/syscall.h>
@@ -39,6 +40,65 @@ static void rhash_run(const char *prog_name)
 	rhash__destroy(skel);
 }
 
+static void rhash_iter_test(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct bpf_iter_bpf_rhash_map *skel;
+	int err, i, len, map_fd, iter_fd;
+	union bpf_iter_link_info linfo;
+	u32 expected_key_sum = 0, key;
+	struct bpf_link *link;
+	u64 val = 0;
+	char buf[64];
+
+	skel = bpf_iter_bpf_rhash_map__open();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_rhash_map__open"))
+		return;
+
+	err = bpf_iter_bpf_rhash_map__load(skel);
+	if (!ASSERT_OK(err, "bpf_iter_bpf_rhash_map__load"))
+		goto out;
+
+	map_fd = bpf_map__fd(skel->maps.rhashmap);
+
+	/* Populate map with test data */
+	for (i = 0; i < 64; i++) {
+		key = i + 1;
+		expected_key_sum += key;
+
+		err = bpf_map_update_elem(map_fd, &key, &val, BPF_NOEXIST);
+		if (!ASSERT_OK(err, "map_update"))
+			goto out;
+	}
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.map.map_fd = map_fd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_rhash_map, &opts);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		goto out;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+		goto free_link;
+
+	do {
+		len = read(iter_fd, buf, sizeof(buf));
+	} while (len > 0);
+
+	ASSERT_EQ(skel->bss->key_sum, expected_key_sum, "key_sum");
+	ASSERT_EQ(skel->bss->elem_count, 64, "elem_count");
+
+	close(iter_fd);
+
+free_link:
+	bpf_link__destroy(link);
+out:
+	bpf_iter_bpf_rhash_map__destroy(skel);
+}
+
 void test_rhash(void)
 {
 	if (test__start_subtest("test_rhash_lookup_update"))
@@ -61,4 +121,7 @@ void test_rhash(void)
 
 	if (test__start_subtest("test_rhash_delete_nonexistent"))
 		rhash_run("test_rhash_delete_nonexistent");
+
+	if (test__start_subtest("test_rhash_iter"))
+		rhash_iter_test();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c
new file mode 100644
index 000000000000..86f6c0d5eadb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RHASH);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__uint(max_entries, 64);
+	__type(key, __u32);
+	__type(value, __u64);
+} rhashmap SEC(".maps");
+
+__u32 key_sum = 0;
+__u64 val_sum = 0;
+__u32 elem_count = 0;
+__u32 err = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_rhash_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+	__u32 *key = ctx->key;
+	__u64 *val = ctx->value;
+
+	if (!key || !val)
+		return 0;
+
+	key_sum += *key;
+	val_sum += *val;
+	elem_count++;
+	return 0;
+}

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 09/10] bpftool: Add rhash map documentation
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
                   ` (7 preceding siblings ...)
  2026-04-24 19:50 ` [PATCH bpf-next v3 08/10] selftests/bpf: Add BPF iterator " Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  2026-04-24 19:50 ` [PATCH bpf-next v3 10/10] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
  9 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko, Emil Tsalapatis

From: Mykyta Yatsenko <yatsenko@meta.com>

Make bpftool documentation and the map type help output aware of the
resizable hash map.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
 tools/bpf/bpftool/map.c                         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 1af3305ea2b2..5daf3de5c744 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -56,7 +56,7 @@ MAP COMMANDS
 |     | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
 |     | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
 |     | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena**
-|     | **insn_array** }
+|     | **insn_array** | **rhash** }
 
 DESCRIPTION
 ===========
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 7ebf7dbcfba4..71a45d96617e 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1478,7 +1478,7 @@ static int do_help(int argc, char **argv)
 		"                 cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
 		"                 queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
 		"                 task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena |\n"
-		"                 insn_array }\n"
+		"                 insn_array | rhash }\n"
 		"       " HELP_SPEC_OPTIONS " |\n"
 		"                    {-f|--bpffs} | {-n|--nomount} }\n"
 		"",

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v3 10/10] selftests/bpf: Add resizable hashmap to benchmarks
  2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
                   ` (8 preceding siblings ...)
  2026-04-24 19:50 ` [PATCH bpf-next v3 09/10] bpftool: Add rhash map documentation Mykyta Yatsenko
@ 2026-04-24 19:50 ` Mykyta Yatsenko
  9 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-24 19:50 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Support resizable hashmap in BPF map benchmarks.

Lookup performance (bpf-rhashmap-lookup vs bpf-hashmap-lookup, events/sec):

  key_size | max_entries | nr_entries |    htab   |   rhtab
  ---------+-------------+------------+-----------+-----------
     4     |    1,000    |    500     |  17.22M   |  16.92M
     8     |  1,000,000  |  200,000   |  15.21M   |  13.39M
     8     |  1,000,000  |  700,000   |  13.16M   |  13.62M
     8     |  1,000,000  | 1,000,000  |   7.93M   |   9.52M
    16     |  1,000,000  |  700,000   |  10.77M   |  11.85M
    32     |  1,000,000  |  700,000   |  10.24M   |  10.69M

rhtab matches or exceeds htab at higher load factors and larger keys.
At full occupancy (1M/1M) rhtab is 20% faster since htab's fixed
bucket count leads to longer chains while rhtab resizes to maintain
a healthy load factor.

In-place update (bpf-*-full-update, events/sec):

  htab :  7.62M
  rhtab: 14.94M  (+96%)

rhtab updates values in place via memcpy; htab allocates a new element
and swaps the pointer for each update.

Memory (htab-mem / rhtab-mem, 8 producers, overwrite workload):

  value_size |  htab ops/s  |  rhtab ops/s  |  htab mem  |  rhtab mem
  -----------+--------------+---------------+------------+-----------
      32     |   18.27k/s   |   27.05k/s    |  0.98 MiB  |  0.99 MiB
    4096     |   14.00k/s   |   17.00k/s    |  2.23 MiB  |  1.99 MiB

rhtab is 21-48% faster on concurrent overwrites due to in-place
updates avoiding per-element allocation.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 tools/testing/selftests/bpf/bench.c                |  6 ++++
 .../bpf/benchs/bench_bpf_hashmap_full_update.c     | 34 +++++++++++++++++++--
 .../bpf/benchs/bench_bpf_hashmap_lookup.c          | 31 +++++++++++++++++--
 .../testing/selftests/bpf/benchs/bench_htab_mem.c  | 35 ++++++++++++++++++++--
 4 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 029b3e21f438..722877c00e8b 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -558,13 +558,16 @@ extern const struct bench bench_bpf_loop;
 extern const struct bench bench_strncmp_no_helper;
 extern const struct bench bench_strncmp_helper;
 extern const struct bench bench_bpf_hashmap_full_update;
+extern const struct bench bench_bpf_rhashmap_full_update;
 extern const struct bench bench_local_storage_cache_seq_get;
 extern const struct bench bench_local_storage_cache_interleaved_get;
 extern const struct bench bench_local_storage_cache_hashmap_control;
 extern const struct bench bench_local_storage_tasks_trace;
 extern const struct bench bench_bpf_hashmap_lookup;
+extern const struct bench bench_bpf_rhashmap_lookup;
 extern const struct bench bench_local_storage_create;
 extern const struct bench bench_htab_mem;
+extern const struct bench bench_rhtab_mem;
 extern const struct bench bench_crypto_encrypt;
 extern const struct bench bench_crypto_decrypt;
 extern const struct bench bench_sockmap;
@@ -636,13 +639,16 @@ static const struct bench *benchs[] = {
 	&bench_strncmp_no_helper,
 	&bench_strncmp_helper,
 	&bench_bpf_hashmap_full_update,
+	&bench_bpf_rhashmap_full_update,
 	&bench_local_storage_cache_seq_get,
 	&bench_local_storage_cache_interleaved_get,
 	&bench_local_storage_cache_hashmap_control,
 	&bench_local_storage_tasks_trace,
 	&bench_bpf_hashmap_lookup,
+	&bench_bpf_rhashmap_lookup,
 	&bench_local_storage_create,
 	&bench_htab_mem,
+	&bench_rhtab_mem,
 	&bench_crypto_encrypt,
 	&bench_crypto_decrypt,
 	&bench_sockmap,
diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
index ee1dc12c5e5e..7278fa860397 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
@@ -34,19 +34,29 @@ static void measure(struct bench_res *res)
 {
 }
 
-static void setup(void)
+static void hashmap_full_update_setup(enum bpf_map_type map_type)
 {
 	struct bpf_link *link;
 	int map_fd, i, max_entries;
 
 	setup_libbpf();
 
-	ctx.skel = bpf_hashmap_full_update_bench__open_and_load();
+	ctx.skel = bpf_hashmap_full_update_bench__open();
 	if (!ctx.skel) {
 		fprintf(stderr, "failed to open skeleton\n");
 		exit(1);
 	}
 
+	bpf_map__set_type(ctx.skel->maps.hash_map_bench, map_type);
+	if (map_type == BPF_MAP_TYPE_RHASH)
+		bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench,
+				       BPF_F_NO_PREALLOC);
+
+	if (bpf_hashmap_full_update_bench__load(ctx.skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
 	ctx.skel->bss->nr_loops = MAX_LOOP_NUM;
 
 	link = bpf_program__attach(ctx.skel->progs.benchmark);
@@ -62,6 +72,16 @@ static void setup(void)
 		bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
 }
 
+static void setup(void)
+{
+	hashmap_full_update_setup(BPF_MAP_TYPE_HASH);
+}
+
+static void rhash_setup(void)
+{
+	hashmap_full_update_setup(BPF_MAP_TYPE_RHASH);
+}
+
 static void hashmap_report_final(struct bench_res res[], int res_cnt)
 {
 	unsigned int nr_cpus = bpf_num_possible_cpus();
@@ -87,3 +107,13 @@ const struct bench bench_bpf_hashmap_full_update = {
 	.report_progress = NULL,
 	.report_final = hashmap_report_final,
 };
+
+const struct bench bench_bpf_rhashmap_full_update = {
+	.name = "bpf-rhashmap-full-update",
+	.validate = validate,
+	.setup = rhash_setup,
+	.producer_thread = producer,
+	.measure = measure,
+	.report_progress = NULL,
+	.report_final = hashmap_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
index 279ff1b8b5b2..5264b7b20e39 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
@@ -148,9 +148,10 @@ static inline void patch_key(u32 i, u32 *key)
 	/* the rest of key is random */
 }
 
-static void setup(void)
+static void hashmap_lookup_setup(enum bpf_map_type map_type)
 {
 	struct bpf_link *link;
+	__u32 map_flags;
 	int map_fd;
 	int ret;
 	int i;
@@ -163,10 +164,15 @@ static void setup(void)
 		exit(1);
 	}
 
+	map_flags = args.map_flags;
+	if (map_type == BPF_MAP_TYPE_RHASH)
+		map_flags |= BPF_F_NO_PREALLOC;
+
+	bpf_map__set_type(ctx.skel->maps.hash_map_bench, map_type);
 	bpf_map__set_max_entries(ctx.skel->maps.hash_map_bench, args.max_entries);
 	bpf_map__set_key_size(ctx.skel->maps.hash_map_bench, args.key_size);
 	bpf_map__set_value_size(ctx.skel->maps.hash_map_bench, 8);
-	bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench, args.map_flags);
+	bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench, map_flags);
 
 	ctx.skel->bss->nr_entries = args.nr_entries;
 	ctx.skel->bss->nr_loops = args.nr_loops / args.nr_entries;
@@ -197,6 +203,16 @@ static void setup(void)
 	}
 }
 
+static void setup(void)
+{
+	hashmap_lookup_setup(BPF_MAP_TYPE_HASH);
+}
+
+static void rhash_setup(void)
+{
+	hashmap_lookup_setup(BPF_MAP_TYPE_RHASH);
+}
+
 static inline double events_from_time(u64 time)
 {
 	if (time)
@@ -275,3 +291,14 @@ const struct bench bench_bpf_hashmap_lookup = {
 	.report_progress = NULL,
 	.report_final = hashmap_report_final,
 };
+
+const struct bench bench_bpf_rhashmap_lookup = {
+	.name = "bpf-rhashmap-lookup",
+	.argp = &bench_hashmap_lookup_argp,
+	.validate = validate,
+	.setup = rhash_setup,
+	.producer_thread = producer,
+	.measure = measure,
+	.report_progress = NULL,
+	.report_final = hashmap_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
index 297e32390cd1..1ee217d97434 100644
--- a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
+++ b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
@@ -152,7 +152,7 @@ static const struct htab_mem_use_case *htab_mem_find_use_case_or_exit(const char
 	exit(1);
 }
 
-static void htab_mem_setup(void)
+static void htab_mem_setup_impl(enum bpf_map_type map_type)
 {
 	struct bpf_map *map;
 	const char **names;
@@ -178,10 +178,11 @@ static void htab_mem_setup(void)
 	}
 
 	map = ctx.skel->maps.htab;
+	bpf_map__set_type(map, map_type);
 	bpf_map__set_value_size(map, args.value_size);
 	/* Ensure that different CPUs can operate on different subset */
 	bpf_map__set_max_entries(map, MAX(8192, 64 * env.nr_cpus));
-	if (args.preallocated)
+	if (map_type != BPF_MAP_TYPE_RHASH && args.preallocated)
 		bpf_map__set_map_flags(map, bpf_map__map_flags(map) & ~BPF_F_NO_PREALLOC);
 
 	names = ctx.uc->progs;
@@ -220,6 +221,16 @@ static void htab_mem_setup(void)
 	exit(1);
 }
 
+static void htab_mem_setup(void)
+{
+	htab_mem_setup_impl(BPF_MAP_TYPE_HASH);
+}
+
+static void rhtab_mem_setup(void)
+{
+	htab_mem_setup_impl(BPF_MAP_TYPE_RHASH);
+}
+
 static void htab_mem_add_fn(pthread_barrier_t *notify)
 {
 	while (true) {
@@ -338,6 +349,15 @@ static void htab_mem_report_final(struct bench_res res[], int res_cnt)
 	cleanup_cgroup_environment();
 }
 
+static void rhtab_mem_validate(void)
+{
+	if (args.preallocated) {
+		fprintf(stderr, "rhash map does not support preallocation\n");
+		exit(1);
+	}
+	htab_mem_validate();
+}
+
 const struct bench bench_htab_mem = {
 	.name = "htab-mem",
 	.argp = &bench_htab_mem_argp,
@@ -348,3 +368,14 @@ const struct bench bench_htab_mem = {
 	.report_progress = htab_mem_report_progress,
 	.report_final = htab_mem_report_final,
 };
+
+const struct bench bench_rhtab_mem = {
+	.name = "rhtab-mem",
+	.argp = &bench_htab_mem_argp,
+	.validate = rhtab_mem_validate,
+	.setup = rhtab_mem_setup,
+	.producer_thread = htab_mem_producer,
+	.measure = htab_mem_measure,
+	.report_progress = htab_mem_report_progress,
+	.report_final = htab_mem_report_final,
+};

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map
  2026-04-24 19:50 ` [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
@ 2026-04-24 20:02   ` sashiko-bot
  2026-04-24 20:32   ` bot+bpf-ci
  1 sibling, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-04-24 20:02 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 1 potential issue(s):
- [Medium] The BPF test framework silently ignores test failures because it does not check the BPF program's return value (`opts.retval`) and incorrectly reuses the global `err` variable.
--

commit dc85c0ab999d049899213215cdd840f6c0a212f0
Author: Mykyta Yatsenko <yatsenko@meta.com>

selftests/bpf: Add basic tests for resizable hash map
    
Test basic map operations (lookup, update, delete) for BPF_MAP_TYPE_RHASH
including boundary conditions like duplicate key insertion and deletion of
nonexistent keys.

> diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
> new file mode 100644
> index 00000000000000..0f6eec4fe1fe1d
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/rhash.c

[ ... ]

> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */

[ ... ]

> +	err = rhash__load(skel);
> +	if (!ASSERT_OK(err, "skel_load"))
> +		goto cleanup;
> +
> +	err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
> +	if (!ASSERT_OK(err, "prog run"))
> +		goto cleanup;
> +
> +	if (!ASSERT_OK(skel->bss->err, "bss->err"))
> +		goto cleanup;

Does the test framework ignore the BPF program's return value here?

It seems opts.retval is not being checked.

> diff --git a/tools/testing/selftests/bpf/progs/rhash.c b/tools/testing/selftests/bpf/progs/rhash.c
> new file mode 100644
> index 00000000000000..9d066a22ce1865
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/rhash.c

[ ... ]

> @@ -0,0 +1,239 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */

[ ... ]

> +SEC("syscall")
> +int test_rhash_update_delete(void *ctx)
> +{
> +	int key = 6;
> +	struct elem empty = {.val = 4, .arr = {0}};
> +	struct elem *e;
> +
> +	err = 1;
> +	e = bpf_map_lookup_elem(&rhmap, &key);
> +	if (e)
> +		return 1;
> +
> +	err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
> +	if (err)
> +		return 2;
> +
> +	err = bpf_map_delete_elem(&rhmap, &key);
> +	if (err)
> +		return 3;
> +
> +	e = bpf_map_lookup_elem(&rhmap, &key);
> +	if (e)
> +		return 4;

Can this silently ignore test failures?

If this final lookup succeeds (meaning e is not NULL when it should be), the
BPF program exits early with return 4. However, the global err variable
was just set to 0 by the successful bpf_map_delete_elem() call above.

Because the userspace runner only checks ASSERT_OK(skel->bss->err) and ignores
opts.retval, won't this incorrectly mark a failed test as passed?
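
A minimal sketch of the suggested check in rhash_run(), reusing the
existing opts and assertion helpers (the assertion names are
illustrative):

    err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
    if (!ASSERT_OK(err, "prog run"))
        goto cleanup;
    if (!ASSERT_EQ(opts.retval, 0, "prog retval"))
        goto cleanup;
    if (!ASSERT_OK(skel->bss->err, "bss->err"))
        goto cleanup;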

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260424-rhash-v3-0-d0fa0ce4379b@meta.com?part=7

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from()
  2026-04-24 19:50 ` [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
@ 2026-04-24 20:15   ` sashiko-bot
  2026-04-24 20:45   ` bot+bpf-ci
  2026-04-28 10:35   ` Herbert Xu
  2 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-04-24 20:15 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 3 potential issue(s):
- [High] `rhashtable_walk_enter_from()` ignores `future_tbl` during resizes, causing stateless iterators to infinite loop.
- [High] `rhashtable_walk_enter_from()` fails to initialize `iter->list` for `rhltable`s, causing an uninitialized memory read and breaking iteration.
- [Medium] In `test_walk_enter_from()`, an error path jumps to `out` without setting the `err` variable, causing a silent test failure.
--

commit 63530ac8c42e3a9716786a83ad797edaae556017
Author: Mykyta Yatsenko <yatsenko@meta.com>

rhashtable: Add rhashtable_walk_enter_from()

This commit introduces rhashtable_walk_enter_from() to allow efficient
iteration resume by positioning the walker at the bucket containing the
given key. It also refactors the internal lookup functions and adds
test cases.

> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c

[ ... ]

> @@ -692,6 +692,47 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
>  }
>  EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
>  
> +/**
> + * rhashtable_walk_enter_from - Initialise a walk starting at a key's bucket
[ ... ]
> +void rhashtable_walk_enter_from(struct rhashtable *ht,
> +				struct rhashtable_iter *iter,
> +				const void *key,
> +				const struct rhashtable_params params)
> +				__must_hold(RCU)
> +{
> +	struct bucket_table *tbl;
> +	struct rhash_head *he;
> +
> +	rhashtable_walk_enter(ht, iter);
> +
> +	if (!key)
> +		return;
> +
> +	tbl = iter->walker.tbl;
> +	if (!tbl)
> +		return;
> +
> +	he = __rhashtable_lookup_one(ht, tbl, key, params,
> +				     RHT_LOOKUP_NORMAL);
> +	if (!he)
> +		return;

What happens if the table is currently resizing and the key has already been
moved to the future table?

If __rhashtable_lookup_one() fails to find the key in the current table, it
leaves the iterator positioned at bucket 0.

For stateless iterators like BPF's get_next_key, could this cause an
infinite loop where it constantly resets to the beginning of the old table
until the resize completes?

> +
> +	iter->slot = rht_key_hashfn(ht, tbl, key, params);
> +	iter->p = he;

Does this leave iter->list uninitialized for rhlist tables?

rhashtable_walk_enter() does not initialize iter->list. When
rhashtable_walk_start_check() is called for an rhlist table, it validates
the element using:

lib/rhashtable.c:rhashtable_walk_start_check() {
    ...
    } else if (iter->p && rhlist) {
        ...
                if (list == iter->list) {
                    iter->p = p;
                    iter->skip = skip;
                    goto found;
                }
    ...
}

Will this uninitialized read of iter->list cause the validation to fail
and silently fall back to the beginning of the bucket?
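
If the primed cursor is also meant to work for rhltables, one hedged
possibility (assuming the ht->rhlist flag and the struct rhlist_head
layout used elsewhere in lib/rhashtable.c) is to initialise the list
cursor alongside iter->p:

    iter->slot = rht_key_hashfn(ht, tbl, key, params);
    iter->p = he;
    /* hypothetical: give walk_start_check a valid rhlist cursor */
    iter->list = ht->rhlist ?
                 container_of(he, struct rhlist_head, rhead) : NULL;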

> +}
> +EXPORT_SYMBOL_GPL(rhashtable_walk_enter_from);

[ ... ]

> diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
> --- a/lib/test_rhashtable.c
> +++ b/lib/test_rhashtable.c

[ ... ]

> @@ -679,6 +680,133 @@ static int threadfunc(void *data)
>  	return err;
>  }
>  
> +static int __init test_walk_enter_from(void)
> +{

[ ... ]

> +		/* Count the number of elements after this one */
> +		do {
> +			obj = rhashtable_walk_next(&iter);
> +			count++;
> +		} while (obj && (!IS_ERR(obj) || (PTR_ERR(obj) == -EAGAIN)));
> +
> +		rhashtable_walk_stop(&iter);
> +		rhashtable_walk_exit(&iter);
> +		if (count > 4) {
> +			pr_warn("walk_enter_from: walked %d elements instead of max 4\n", count);
> +			goto out;
> +		}

Does this error path need to set the err variable?

Right now it jumps to out and returns 0, which might cause the test suite
to silently miss the failure.
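
Something like the following would make the failure visible to the
caller (the -EINVAL choice is only illustrative):

    if (count > 4) {
        pr_warn("walk_enter_from: walked %d elements instead of max 4\n", count);
        err = -EINVAL;
        goto out;
    }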

> +		/*
> +		 * Collect the lengths of every walked path,
> +		 * we should have each length walked one time
> +		 */
> +		len_freq[count - 1]++;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260424-rhash-v3-0-d0fa0ce4379b@meta.com?part=2

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for resizable hashtab
  2026-04-24 19:50 ` [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for " Mykyta Yatsenko
@ 2026-04-24 20:28   ` sashiko-bot
  2026-04-25 21:24     ` Mykyta Yatsenko
  0 siblings, 1 reply; 27+ messages in thread
From: sashiko-bot @ 2026-04-24 20:28 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 4 potential issue(s):
- [High] Cursor invalidation in batch and seq_file iterators causes O(N^2) complexity and infinite loops.
- [High] Unbounded RCU read lock duration in batch operations allows user-triggered RCU stalls.
- [High] Map iterators and seq_file dumps spuriously abort during normal background resizes.
- [High] Deadlock risk in `bpf_each_rhash_elem` due to acquiring `spin_lock` from atomic contexts.
--

commit 9f7ce51619eb27e8a4a65df12f862e12e34bc285
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Implement batch ops and iterators for resizable hashtab

This commit adds batch operations for resizable hashtab maps, using
rhashtable_walk_enter_from() to resume iteration. It also wires up
seq_file BPF iterators and implements rhtab_map_mem_usage().

> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index d37f3d548d3658..000caa2c7f4cd9 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c

[ ... ]

> @@ -3066,64 +3066,330 @@ static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key
> +static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
> +				void *callback_ctx, u64 flags)
> +{

[ ... ]

> +	/*
> +	 * The rhashtable walk API uses spin_lock in rhashtable_walk_start/stop,
> +	 * which is not safe in NMI or soft/hard IRQ context.
> +	 */
> +	if (in_nmi() || in_hardirq() || in_softirq())
> +		return -EOPNOTSUPP;
> +
> +	rhashtable_walk_enter(&rhtab->ht, &iter);

Does this adequately protect against deadlocks when called from kprobes or
tracepoints in process context with preemption disabled?

If a thread currently holding ht->lock (e.g., during map insertion) is
interrupted by a kprobe that runs this BPF program, attempting to acquire
the same spin_lock inside rhashtable_walk_enter() might cause a deadlock.

> +	rhashtable_walk_start(&iter);
> +
> +	while ((elem = rhashtable_walk_next(&iter))) {
> +		/* rhashtable_walk_next returns -EAGAIN on resize, abort */
> +		if (IS_ERR(elem)) {
> +			num_elems = -EBUSY;
> +			break;
> +		}

Is it expected to abort the entire iteration if a background map resize
occurs? 

rhashtable_walk_next() returns -EAGAIN to indicate the table was resized and
the iterator was rewound. Since resizable hashtabs automatically resize
during normal workloads, returning -EBUSY here could cause
bpf_for_each_map_elem() and bpftool map dumps to randomly fail.

Should this handle -EAGAIN by retrying or continuing the walk instead?
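
For instance, a hedged sketch of continuing past the rewind, accepting
that elements seen before the resize may be visited again:

    while ((elem = rhashtable_walk_next(&iter))) {
        if (IS_ERR(elem)) {
            if (PTR_ERR(elem) == -EAGAIN)
                continue; /* iterator rewound by a resize, keep walking */
            num_elems = -EBUSY;
            break;
        }
        /* ... run callback_fn for elem as in the original loop ... */
    }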

[ ... ]

> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
> +					       const union bpf_attr *attr,
> +					       union bpf_attr __user *uattr,
> +					       bool do_delete)
> +{

[ ... ]

> +	/*
> +	 * Use the last key from the previous batch as cursor.
> +	 * enter_from positions at that key's bucket, walk_next
> +	 * returns the successor in O(1).
> +	 * First call (ubatch == NULL): starts from bucket 0.
> +	 */
> +	if (ubatch) {
> +		buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
> +		if (!buf) {
> +			ret = -ENOMEM;
> +			goto free;
> +		}
> +		if (copy_from_user(buf, ubatch, key_size)) {
> +			ret = -EFAULT;
> +			goto free;
> +		}
> +	}
> +
> +	scoped_guard(rcu) {
> +		rhashtable_walk_enter_from(&rhtab->ht, &iter, buf, rhtab->params);
> +		rhashtable_walk_start(&iter);
> +	}

If the key stored in buf was already deleted, will this iteration restart
from the beginning of the map?

Looking at rhashtable_walk_enter_from(), if it fails to find the key, it
leaves iter->slot at 0. Then rhashtable_walk_start() and walk_next() will
start from bucket 0 instead of where the previous batch left off.

Since lookup_and_delete_batch deletes the elements returned in the previous
batch, the cursor key will always be deleted. Could this cause the iterator
to repeatedly scan over all emptied buckets, resulting in an expensive
traversal?

Similarly, for lookup_batch and seq_file dumps, if the cursor is
concurrently deleted, could this trap userspace in an infinite loop of
duplicate elements?

> +
> +	dst_key = keys;
> +	dst_val = values;
> +	total = 0;
> +
> +	while (total < max_count) {
> +		elem = rhtab_iter_next(&iter);
> +		if (!elem)
> +			break;

Can this loop cause RCU stalls or soft lockups if max_count is very large? 

The max_count parameter is controlled by userspace. Since
rhashtable_walk_start() acquires the RCU read-side critical section, this
loop might iterate, copy memory, and process millions of elements without
ever dropping the RCU lock or calling cond_resched().
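
A hedged sketch of bounding the critical section (BATCH_RELAX is a
made-up constant; the batching policy is up to the author):

    if (total && (total % BATCH_RELAX) == 0) {
        rhashtable_walk_stop(&iter);   /* leaves the RCU read section */
        cond_resched();
        rhashtable_walk_start(&iter);  /* re-enter; a resize in between
                                        * shows up as -EAGAIN later
                                        */
    }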

[ ... ]

> +static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
> +	__acquires(RCU)
> +{

[ ... ]

> +	elem = rhashtable_walk_next(&info->iter);
> +	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
> +	if (IS_ERR(elem))
> +		return ERR_PTR(-EBUSY);

Similar to bpf_each_rhash_elem, will returning -EBUSY here on normal hash
table resizes cause userspace map dumps to fail spuriously?

[ ... ]

> +static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{

[ ... ]

> +	elem = rhashtable_walk_next(&info->iter);
> +	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
> +	if (IS_ERR(elem))
> +		return ERR_PTR(-EBUSY);
> +
> +	return elem;
> +}

Again, aborting on -EAGAIN here might interrupt the map dump when a resize
happens in the background. Is there a way to handle -EAGAIN without failing
the entire sequence?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260424-rhash-v3-0-d0fa0ce4379b@meta.com?part=4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map
  2026-04-24 19:50 ` [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
  2026-04-24 20:02   ` sashiko-bot
@ 2026-04-24 20:32   ` bot+bpf-ci
  1 sibling, 0 replies; 27+ messages in thread
From: bot+bpf-ci @ 2026-04-24 20:32 UTC (permalink / raw)
  To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
	eddyz87, memxor, herbert
  Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
	clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2625 bytes --]

> diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
> new file mode 100644
> index 000000000000..0f6eec4fe1fe
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
> +#include <test_progs.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include "rhash.skel.h"
> +#include <linux/bpf.h>
> +#include <linux/perf_event.h>
> +#include <sys/syscall.h>
> +
> +static void rhash_run(const char *prog_name)
> +{
> +	struct rhash *skel;
> +	struct bpf_program *prog;
> +	LIBBPF_OPTS(bpf_test_run_opts, opts);
> +	int err;
> +
> +	skel = rhash__open();
> +	if (!ASSERT_OK_PTR(skel, "rhash__open"))
> +		return;
> +
> +	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
> +	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +		goto cleanup;
> +	bpf_program__set_autoload(prog, true);
> +
> +	err = rhash__load(skel);
> +	if (!ASSERT_OK(err, "skel_load"))
> +		goto cleanup;
> +
> +	err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
> +	if (!ASSERT_OK(err, "prog run"))
> +		goto cleanup;
> +
> +	if (!ASSERT_OK(skel->bss->err, "bss->err"))
> +		goto cleanup;

sashiko-bot@kernel.org raised a concern about the test framework potentially
missing BPF program failures. Can the test pass incorrectly if a BPF program
returns non-zero while skel->bss->err is zero?

Looking at test_rhash_update_delete() in progs/rhash.c:

    err = bpf_map_delete_elem(&rhmap, &key);
    if (err)
        return 3;

    e = bpf_map_lookup_elem(&rhmap, &key);
    if (e)
        return 4;

    err = 0;
    return 0;

If the final lookup finds an element when it shouldn't (e is not NULL), the
BPF program exits early with return 4. However, the global err variable was
just set to 0 by the successful bpf_map_delete_elem() call.

Since rhash_run() only checks ASSERT_OK(skel->bss->err) and doesn't check
opts.retval from bpf_prog_test_run_opts(), won't this allow the test to pass
when it should fail?

This issue appears in other subtests as well, where early return values like
1, 2, 3, 4, and 5 can be returned while err remains 0.

Reference: https://lore.kernel.org/bpf/20260424200255.D5C89C19425@smtp.kernel.org/

> +
> +cleanup:
> +	rhash__destroy(skel);
> +}

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24909591959

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions
  2026-04-24 19:50 ` [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
@ 2026-04-24 20:40   ` sashiko-bot
  2026-04-25 20:41     ` Mykyta Yatsenko
  2026-04-24 20:45   ` bot+bpf-ci
  1 sibling, 1 reply; 27+ messages in thread
From: sashiko-bot @ 2026-04-24 20:40 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 1 potential issue(s):
- [Critical] Using rhashtable for BPF map updates/deletes is unsafe and will lead to hard deadlocks from IRQ/NMI/softirq contexts.
--

commit 47c235933705172d3a36c4339c754282500161f1
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Implement resizable hashmap basic functions

This commit adds a new BPF map type BPF_MAP_TYPE_RHASH backed by the kernel
rhashtable implementation. It provides basic map operations like lookup,
update, and delete using rhashtable APIs.

> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 3dd9b4924ae4fb..a37bd2a7b30f4b 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c

[ ... ]

> +static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
> +{
> +	int err;
> +
> +	err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);

Can this lead to a deadlock when used by non-sleepable BPF programs in
interrupt or softirq contexts?

The verifier permits BPF_MAP_TYPE_RHASH to be used by non-sleepable BPF
programs, such as those attached to XDP or TC, which execute in softirq
context and do not check bpf_prog_active.

Map operations like rhashtable_remove_fast() and
rhashtable_lookup_get_insert_fast() internally use bit_spin_lock() for
bucket-level synchronization. This locking mechanism disables preemption
but leaves hardware and software interrupts enabled, unlike the
raw_spin_lock_irqsave() approach used by the standard htab.

If a syscall updates the map from process context and acquires the bucket
lock, and is then preempted by a hardware interrupt that schedules a softirq
on return, the softirq will execute the XDP or TC BPF program.

If this BPF program attempts to access the same hash bucket, wouldn't it spin
indefinitely waiting for the bit_spin_lock held by the interrupted process,
causing a system deadlock?

[ ... ]

> +static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
> +{
> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> +	struct rhtab_elem *elem, *tmp;

[ ... ]

> +	tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);

Does the same deadlock mechanism apply here as well, given that
rhashtable_lookup_get_insert_fast() also relies on bucket-level locks?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260424-rhash-v3-0-d0fa0ce4379b@meta.com?part=1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from()
  2026-04-24 19:50 ` [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
  2026-04-24 20:15   ` sashiko-bot
@ 2026-04-24 20:45   ` bot+bpf-ci
  2026-04-28 10:35   ` Herbert Xu
  2 siblings, 0 replies; 27+ messages in thread
From: bot+bpf-ci @ 2026-04-24 20:45 UTC (permalink / raw)
  To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
	eddyz87, memxor, herbert
  Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
	clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 4942 bytes --]

> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -693,6 +693,47 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
>  EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
>
>  /**
> + * rhashtable_walk_enter_from - Initialise a walk starting at a key's bucket
> + * @ht:		Table to walk over
> + * @iter:	Hash table iterator
> + * @key:	Key whose bucket to start from
> + * @params:	Hash table parameters
> + *
> + * Like rhashtable_walk_enter(), but positions the iterator at the bucket
> + * containing @key.  If @key is not present in the main table, the iterator is
> + * positioned at the beginning.
> + *
> + * Same constraints as rhashtable_walk_enter() apply.
> + */
> +void rhashtable_walk_enter_from(struct rhashtable *ht,
> +				struct rhashtable_iter *iter,
> +				const void *key,
> +				const struct rhashtable_params params)
> +				__must_hold(RCU)
> +{
> +	struct bucket_table *tbl;
> +	struct rhash_head *he;
> +
> +	rhashtable_walk_enter(ht, iter);
> +
> +	if (!key)
> +		return;
> +
> +	tbl = iter->walker.tbl;
> +	if (!tbl)
> +		return;
> +
> +	he = __rhashtable_lookup_one(ht, tbl, key, params,
> +				     RHT_LOOKUP_NORMAL);
> +	if (!he)
> +		return;

What happens if the table is currently resizing and the key has already
been moved to the future table?

When __rhashtable_lookup_one() fails to find the key in the current
table, the function returns early, leaving the iterator positioned at
bucket 0.

For stateless iterators like BPF's get_next_key, could this cause an
infinite loop where it constantly resets to the beginning of the old
table until the resize completes?

The v2 implementation had code to check future_tbl when the key was not
found in the main table, but this was removed in v3.

Reference: https://lore.kernel.org/all/20260424201535.EA1BCC19425@smtp.kernel.org/

> +
> +	iter->slot = rht_key_hashfn(ht, tbl, key, params);
> +	iter->p = he;

Does this leave iter->list uninitialized for rhlist tables?

rhashtable_walk_enter() does not initialize iter->list. When
rhashtable_walk_start_check() is later called for an rhlist table, it
validates the element using:

lib/rhashtable.c:rhashtable_walk_start_check() {
    ...
    } else if (iter->p && rhlist) {
        ...
        if (list == iter->list) {
            iter->p = p;
            iter->skip = skip;
            goto found;
        }
        ...
    }
}

Will this uninitialized read of iter->list cause the validation to fail
and silently fall back to the beginning of the bucket?

Reference: https://lore.kernel.org/all/20260424201535.EA1BCC19425@smtp.kernel.org/

> +}
> +EXPORT_SYMBOL_GPL(rhashtable_walk_enter_from);

[ ... ]

> diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
> --- a/lib/test_rhashtable.c
> +++ b/lib/test_rhashtable.c

[ ... ]

> @@ -679,6 +680,133 @@
>  	return err;
>  }
>
> +static int __init test_walk_enter_from(void)
> +{
> +	struct rhashtable ht;
> +	struct test_obj objs[4];
> +	struct rhashtable_iter iter;
> +	struct test_obj *obj;
> +	struct test_obj_val key = { };
> +	int err, i, count;
> +	int len_freq[4] = { 0, 0, 0, 0 };
> +
> +	err = rhashtable_init(&ht, &test_rht_params);
> +	if (err)
> +		return err;
> +
> +	/* Insert 4 elements with keys 0, 2, 4, 6 */
> +	for (i = 0; i < 4; i++) {
> +		objs[i].value.id = i * 2;
> +		objs[i].value.tid = 0;
> +		err = rhashtable_insert_fast(&ht, &objs[i].node,
> +					     test_rht_params);
> +		if (err) {
> +			pr_warn("walk_enter_from: insert %d failed: %d\n",
> +				i, err);
> +			goto out;
> +		}
> +	}

[ ... ]

> +	/* Test 3: verify walk_enter_from + walk_next can iterate remaining elements */
> +	for (i = 0; i < 4; i++) {
> +		key.id = i * 2;
> +		count = 0;
> +
> +		scoped_guard(rcu) {
> +			rhashtable_walk_enter_from(&ht, &iter, &key, test_rht_params);
> +			rhashtable_walk_start(&iter);
> +		}
> +
> +		/* Count the number of elements after this one */
> +		do {
> +			obj = rhashtable_walk_next(&iter);
> +			count++;
> +		} while (obj && (!IS_ERR(obj) || (PTR_ERR(obj) == -EAGAIN)));
> +
> +		rhashtable_walk_stop(&iter);
> +		rhashtable_walk_exit(&iter);
> +		if (count > 4) {
> +			pr_warn("walk_enter_from: walked %d elements instead of max 4\n", count);
> +			goto out;

Does this error path need to set the err variable?

Right now it jumps to out and returns 0, which might cause the test
suite to silently miss the failure. The err variable should be set to
-EINVAL before the goto.

Reference: https://lore.kernel.org/all/20260424201535.EA1BCC19425@smtp.kernel.org/

> +		}
> +		/*
> +		 * Collect the lengths of every walked path,
> +		 * we should have each length walked one time
> +		 */
> +		len_freq[count - 1]++;
> +	}

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24909591959

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions
  2026-04-24 19:50 ` [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
  2026-04-24 20:40   ` sashiko-bot
@ 2026-04-24 20:45   ` bot+bpf-ci
  2026-04-25 20:50     ` Mykyta Yatsenko
  1 sibling, 1 reply; 27+ messages in thread
From: bot+bpf-ci @ 2026-04-24 20:45 UTC (permalink / raw)
  To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
	eddyz87, memxor, herbert
  Cc: yatsenko, emil, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 3248 bytes --]

> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 3dd9b4924ae4..a37bd2a7b30f 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c

[ ... ]

> +static int rhtab_map_check_btf(struct bpf_map *map, const struct btf *btf,
> +			       const struct btf_type *key_type,
> +			       const struct btf_type *value_type)
> +{
> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> +
> +	return bpf_ma_set_dtor(map, &rhtab->ma, rhtab_mem_dtor);
> +}

Does bpf_ma_set_dtor() exist at this commit?

A tree-wide search shows no definition or declaration of bpf_ma_set_dtor().
The kernel provides bpf_mem_alloc_set_dtor() with a 4-argument signature
(ma, dtor, dtor_ctx_free, ctx) in include/linux/bpf_mem_alloc.h:

    int bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, dtor_fn dtor,
                               dtor_ctx_fn dtor_ctx_free, void *ctx);

The existing htab_set_dtor() in kernel/bpf/hashtab.c builds a struct
htab_btf_record context and calls bpf_mem_alloc_set_dtor() with all four
arguments.

The new rhtab_map_check_btf() attempts to call a not-yet-existing helper
with a 3-argument signature. This breaks compilation with 'implicit
declaration of function "bpf_ma_set_dtor"' (fatal with kernel's
-Werror=implicit-function-declaration) or link failure with 'undefined
reference to bpf_ma_set_dtor'.

Looking at the later commit b3eef0fdafe1 ("bpf: Allow timers, workqueues
and task_work in resizable hashtab"), it renames htab_set_dtor() to
bpf_ma_set_dtor() and changes its signature to (map, ma, dtor), providing
the missing symbol. That commit also wires up .map_check_btf =
rhtab_map_check_btf in rhtab_map_ops.

Can this commit be reordered or squashed to avoid the forward reference
and maintain bisectability?

> +static void rhtab_free_elem(void *ptr, void *arg)
> +{
> +	struct bpf_rhtab *rhtab = arg;
> +	struct rhtab_elem *elem = ptr;
> +
> +	bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
> +	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
> +}

[ ... ]

> +static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
> +{
> +	int err;
> +
> +	err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
> +	if (err)
> +		return err;
> +
> +	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
> +	return 0;
> +}

Does rhtab_delete_elem() handle special fields properly?

Looking at the v2 review thread
(https://lore.kernel.org/bpf/DHSBDLRKCAQO.1IPJTSEMPESV7@etsalapatis.com/),
Emil Tsalapatis and sashiko raised concerns that special fields are not
handled properly during in-place updates. The v3 changelog claims this was
addressed, but rhtab_delete_elem() does not call
bpf_map_free_internal_structs() before freeing the element.

This is inconsistent with rhtab_free_elem() which properly calls
bpf_map_free_internal_structs(). Could this lead to resource leaks for
maps with timers, workqueues, or other internal structures when elements
are deleted via rhtab_map_delete_elem()?
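
If special fields do need handling here, one possible shape for the delete
path, reusing the helpers this patch already uses in rhtab_free_elem()
(whether this belongs in this commit or only once special fields are
enabled is for the author to decide):

    err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
    if (err)
        return err;

    bpf_map_free_internal_structs(&rhtab->map,
                                  rhtab_elem_value(elem, rhtab->map.key_size));
    bpf_mem_cache_free_rcu(&rhtab->ma, elem);
    return 0;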


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24909591959

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 05/10] bpf: Allow timers, workqueues and task_work in resizable hashtab
  2026-04-24 19:50 ` [PATCH bpf-next v3 05/10] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
@ 2026-04-24 21:05   ` sashiko-bot
  2026-04-25 21:29     ` Mykyta Yatsenko
  0 siblings, 1 reply; 27+ messages in thread
From: sashiko-bot @ 2026-04-24 21:05 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 2 potential issue(s):
- [High] Unbounded iteration over the entire hash table inside a single RCU read-side critical section in `rhtab_map_free_internal_structs`.
- [High] Race condition in `rhtab_map_update_existing` allows concurrently initialized special fields (timers, kptrs) to survive map updates.
--

commit 1516bbfa3123276ce6abdd224ad40eb2ca633302
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Allow timers, workqueues and task_work in resizable hashtab

This commit allows resizable hashtab maps to support deferred callbacks by
enabling timers, workqueues, and task_work field types. It also wires up
bpf_obj_free_fields during element deletion and map teardown to properly
clean up BTF record fields.

> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 000caa2c7f4cd9..ba03797b896020 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
[ ... ]
> @@ -2945,11 +2944,15 @@ static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void
>  static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
>  				      u64 map_flags)
>  {
[ ... ]
> +	/* Free special fields, and reuse memory similar to arraymap */
> +	rhtab_check_and_free_fields(rhtab, elem);
> +
>  	if (map_flags & BPF_F_LOCK)
>  		copy_map_value_locked(map, old_val, value, false);
>  	else
>  		copy_map_value(map, old_val, value);

Does calling rhtab_check_and_free_fields() before copy_map_value()
create a race condition with concurrent BPF programs?

If a concurrent program acquires the spinlock and initializes a new
timer or kptr right after rhtab_check_and_free_fields() finishes,
would those newly initialized special fields survive the update?
The subsequent copy_map_value() call intentionally skips special fields.

Should rhtab_check_and_free_fields() be called after the copy completes
to ensure no concurrent modifications survive the element update?
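
For concreteness, the reordering in question would look roughly like this
(whether it is sufficient under concurrent access is for the author to
judge):

    if (map_flags & BPF_F_LOCK)
        copy_map_value_locked(map, old_val, value, false);
    else
        copy_map_value(map, old_val, value);
    /* free special fields only after the copy completes */
    rhtab_check_and_free_fields(rhtab, elem);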

[ ... ]
> @@ -3033,6 +3037,25 @@ static struct rhtab_elem *rhtab_iter_next(struct rhashtable_iter *iter)
>  
>  static void rhtab_map_free_internal_structs(struct bpf_map *map)
>  {
> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> +	struct rhashtable_iter iter;
> +	struct rhtab_elem *elem;
> +
> +	if (!bpf_map_has_internal_structs(map))
> +		return;
> +
> +	/*
> +	 * An element can be processed twice if rhashtable resized concurrently.
> +	 * Special structs freeing handles duplicate cancel_and_free.
> +	 */
> +	rhashtable_walk_enter(&rhtab->ht, &iter);
> +	rhashtable_walk_start(&iter);
> +
> +	for (elem = rhtab_iter_next(&iter); elem; elem = rhtab_iter_next(&iter))
> +		bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
> +
> +	rhashtable_walk_stop(&iter);
> +	rhashtable_walk_exit(&iter);
>  }

Can this loop trigger RCU CPU stall warnings for large hash maps?

The function rhashtable_walk_start() explicitly acquires the RCU read lock,
meaning this entire iteration over potentially millions of elements occurs
inside a single RCU read-side critical section.

If the map contains many elements, looping without yielding could block
RCU grace periods for an extended time.

Would it be better to periodically drop the lock and yield, similar to
how other map structures handle large iterations?
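
One possible shape, periodically leaving the RCU read-side critical section
through the walk API (the batch size of 1024 is an arbitrary assumption,
and rhtab_iter_next() would still need to cope with the -EAGAIN that a
restarted walk can produce):

    int budget = 1024;

    rhashtable_walk_enter(&rhtab->ht, &iter);
    rhashtable_walk_start(&iter);

    for (elem = rhtab_iter_next(&iter); elem; elem = rhtab_iter_next(&iter)) {
        bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
        if (--budget == 0) {
            rhashtable_walk_stop(&iter);    /* drops the RCU read lock */
            cond_resched();
            rhashtable_walk_start(&iter);   /* re-enters RCU */
            budget = 1024;
        }
    }

    rhashtable_walk_stop(&iter);
    rhashtable_walk_exit(&iter);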

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260424-rhash-v3-0-d0fa0ce4379b@meta.com?part=5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions
  2026-04-24 20:40   ` sashiko-bot
@ 2026-04-25 20:41     ` Mykyta Yatsenko
  0 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-25 20:41 UTC (permalink / raw)
  To: sashiko; +Cc: bpf



On 4/24/26 9:40 PM, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 1 potential issue(s):
> - [Critical] Using rhashtable for BPF map updates/deletes is unsafe and will lead to hard deadlocks from IRQ/NMI/softirq contexts.
> --
> 
> commit 47c235933705172d3a36c4339c754282500161f1
> Author: Mykyta Yatsenko <yatsenko@meta.com>
> 
> bpf: Implement resizable hashmap basic functions
> 
> This commit adds a new BPF map type BPF_MAP_TYPE_RHASH backed by the kernel
> rhashtable implementation. It provides basic map operations like lookup,
> update, and delete using rhashtable APIs.
> 
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 3dd9b4924ae4fb..a37bd2a7b30f4b 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
> 
> [ ... ]
> 
>> +static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
>> +{
>> +	int err;
>> +
>> +	err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
> 
> Can this lead to a deadlock when used by non-sleepable BPF programs in
> interrupt or softirq contexts?
> 
> The verifier permits BPF_MAP_TYPE_RHASH to be used by non-sleepable BPF
> programs, such as those attached to XDP or TC, which execute in softirq
> context and do not check bpf_prog_active.
> 
> Map operations like rhashtable_remove_fast() and
> rhashtable_lookup_get_insert_fast() internally use bit_spin_lock() for
> bucket-level synchronization. This locking mechanism disables preemption
> but leaves hardware and software interrupts enabled, unlike the
> raw_spin_lock_irqsave() approach used by the standard htab.
> 
> If a syscall updates the map from process context and acquires the bucket
> lock, and is then preempted by a hardware interrupt that schedules a softirq
> on return, the softirq will execute the XDP or TC BPF program.
> 
> If this BPF program attempts to access the same hash bucket, wouldn't it spin
> indefinitely waiting for the bit_spin_lock held by the interrupted process,
> causing a system deadlock?
> 

This is right, I forgot to add the "if (irqs_disabled()) return -EBUSY"
check that was pointed out in v1:
https://lore.kernel.org/all/CAADnVQKvp2tnQSWW4Kh3ofGLvaEYJ47kyZeNUscAH_DShw8XHw@mail.gmail.com/

I'll add it in v4.
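
A minimal sketch of the guard I plan to add (exact placement at the top of
the rhtab update/delete paths is still to be decided):

```
	if (irqs_disabled())
		return -EBUSY;
```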

> [ ... ]
> 
>> +static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
>> +{
>> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> +	struct rhtab_elem *elem, *tmp;
> 
> [ ... ]
> 
>> +	tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);
> 
> Does the same deadlock mechanism apply here as well, given that
> rhashtable_lookup_get_insert_fast() also relies on bucket-level locks?
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions
  2026-04-24 20:45   ` bot+bpf-ci
@ 2026-04-25 20:50     ` Mykyta Yatsenko
  0 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-25 20:50 UTC (permalink / raw)
  To: bot+bpf-ci, bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87,
	memxor, herbert
  Cc: yatsenko, emil, martin.lau, yonghong.song, clm, ihor.solodrai



On 4/24/26 9:45 PM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 3dd9b4924ae4..a37bd2a7b30f 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
> 
> [ ... ]
> 
>> +static int rhtab_map_check_btf(struct bpf_map *map, const struct btf *btf,
>> +			       const struct btf_type *key_type,
>> +			       const struct btf_type *value_type)
>> +{
>> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> +
>> +	return bpf_ma_set_dtor(map, &rhtab->ma, rhtab_mem_dtor);
>> +}
> 
> Does bpf_ma_set_dtor() exist at this commit?
> 
> A git-wide search shows no definition or declaration of bpf_ma_set_dtor().
> The kernel provides bpf_mem_alloc_set_dtor() with a 4-argument signature
> (ma, dtor, dtor_ctx_free, ctx) in include/linux/bpf_mem_alloc.h:
> 
>     int bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, dtor_fn dtor,
>                                dtor_ctx_fn dtor_ctx_free, void *ctx);
> 
> The existing htab_set_dtor() in kernel/bpf/hashtab.c builds a struct
> htab_btf_record context and calls bpf_mem_alloc_set_dtor() with all four
> arguments.
> 
> The new rhtab_map_check_btf() attempts to call a not-yet-existing helper
> with a 3-argument signature. This breaks compilation with 'implicit
> declaration of function "bpf_ma_set_dtor"' (fatal with kernel's
> -Werror=implicit-function-declaration) or link failure with 'undefined
> reference to bpf_ma_set_dtor'.
> 
> Looking at the later commit b3eef0fdafe1 ("bpf: Allow timers, workqueues
> and task_work in resizable hashtab"), it renames htab_set_dtor() to
> bpf_ma_set_dtor() and changes its signature to (map, ma, dtor), providing
> the missing symbol. That commit also wires up .map_check_btf =
> rhtab_map_check_btf in rhtab_map_ops.
> 
> Can this commit be reordered or squashed to avoid the forward reference
> and maintain bisectability?

This is right, this function should be added later; at the point of this
commit special fields are not supported, so it is not needed here anyway.

> 
>> +static void rhtab_free_elem(void *ptr, void *arg)
>> +{
>> +	struct bpf_rhtab *rhtab = arg;
>> +	struct rhtab_elem *elem = ptr;
>> +
>> +	bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
>> +	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
>> +}
> 
> [ ... ]
> 
>> +static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
>> +{
>> +	int err;
>> +
>> +	err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
>> +	if (err)
>> +		return err;
>> +
>> +	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
>> +	return 0;
>> +}
> 
> Does rhtab_delete_elem() handle special fields properly?
> 
> Looking at the v2 review thread
> (https://lore.kernel.org/bpf/DHSBDLRKCAQO.1IPJTSEMPESV7@etsalapatis.com/),
> Emil Tsalapatis and sashiko raised concerns that special fields are not
> handled properly during in-place updates. The v3 changelog claims this was
> addressed, but rhtab_delete_elem() does not call
> bpf_map_free_internal_structs() before freeing the element.
> 
> This is inconsistent with rhtab_free_elem() which properly calls
> bpf_map_free_internal_structs(). Could this lead to resource leaks for
> maps with timers, workqueues, or other internal structures when elements
> are deleted via rhtab_map_delete_elem()?
> 
At this point special fields are not supported; the corresponding changes
are introduced in commit 5.

> 
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> 
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24909591959


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for resizable hashtab
  2026-04-24 20:28   ` sashiko-bot
@ 2026-04-25 21:24     ` Mykyta Yatsenko
  2026-04-27 13:36       ` Mykyta Yatsenko
  0 siblings, 1 reply; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-25 21:24 UTC (permalink / raw)
  To: sashiko; +Cc: bpf



On 4/24/26 9:28 PM, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 4 potential issue(s):
> - [High] Cursor invalidation in batch and seq_file iterators causes O(N^2) complexity and infinite loops.
> - [High] Unbounded RCU read lock duration in batch operations allows user-triggered RCU stalls.
> - [High] Map iterators and seq_file dumps spuriously abort during normal background resizes.
> - [High] Deadlock risk in `bpf_each_rhash_elem` due to acquiring `spin_lock` from atomic contexts.
> --
> 
> commit 9f7ce51619eb27e8a4a65df12f862e12e34bc285
> Author: Mykyta Yatsenko <yatsenko@meta.com>
> 
> bpf: Implement batch ops and iterators for resizable hashtab
> 
> This commit adds batch operations for resizable hashtab maps, using
> rhashtable_walk_enter_from() to resume iteration. It also wires up
> seq_file BPF iterators and implements rhtab_map_mem_usage().
> 
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index d37f3d548d3658..000caa2c7f4cd9 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
> 
> [ ... ]
> 
>> @@ -3066,64 +3066,330 @@ static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key
>> +static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
>> +				void *callback_ctx, u64 flags)
>> +{
> 
> [ ... ]
> 
>> +	/*
>> +	 * The rhashtable walk API uses spin_lock in rhashtable_walk_start/stop,
>> +	 * which is not safe in NMI or soft/hard IRQ context.
>> +	 */
>> +	if (in_nmi() || in_hardirq() || in_softirq())
>> +		return -EOPNOTSUPP;
>> +
>> +	rhashtable_walk_enter(&rhtab->ht, &iter);
> 
> Does this adequately protect against deadlocks when called from kprobes or
> tracepoints in process context with preemption disabled?
> 
> If a thread currently holding ht->lock (e.g., during map insertion) is
> interrupted by a kprobe that runs this BPF program, attempting to acquire
> the same spin_lock inside rhashtable_walk_enter() might cause a deadlock.
> 
>> +	rhashtable_walk_start(&iter);
>> +
>> +	while ((elem = rhashtable_walk_next(&iter))) {
>> +		/* rhashtable_walk_next returns -EAGAIN on resize, abort */
>> +		if (IS_ERR(elem)) {
>> +			num_elems = -EBUSY;
>> +			break;
>> +		}
> 
> Is it expected to abort the entire iteration if a background map resize
> occurs? 
> 
> rhashtable_walk_next() returns -EAGAIN to indicate the table was resized and
> the iterator was rewound. Since resizable hashtabs automatically resize
> during normal workloads, returning -EBUSY here could cause
> bpf_for_each_map_elem() and bpftool map dumps to randomly fail.
> 
> Should this handle -EAGAIN by retrying or continuing the walk instead?

No, resize is not handled automatically.

> 
> [ ... ]
> 
>> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
>> +					       const union bpf_attr *attr,
>> +					       union bpf_attr __user *uattr,
>> +					       bool do_delete)
>> +{
> 
> [ ... ]
> 
>> +	/*
>> +	 * Use the last key from the previous batch as cursor.
>> +	 * enter_from positions at that key's bucket, walk_next
>> +	 * returns the successor in O(1).
>> +	 * First call (ubatch == NULL): starts from bucket 0.
>> +	 */
>> +	if (ubatch) {
>> +		buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
>> +		if (!buf) {
>> +			ret = -ENOMEM;
>> +			goto free;
>> +		}
>> +		if (copy_from_user(buf, ubatch, key_size)) {
>> +			ret = -EFAULT;
>> +			goto free;
>> +		}
>> +	}
>> +
>> +	scoped_guard(rcu) {
>> +		rhashtable_walk_enter_from(&rhtab->ht, &iter, buf, rhtab->params);
>> +		rhashtable_walk_start(&iter);
>> +	}
> 
> If the key stored in buf was already deleted, will this iteration restart
> from the beginning of the map?
>

 
Yes, if the cursor is deleted, the iteration is restarted.

> Looking at rhashtable_walk_enter_from(), if it fails to find the key, it
> leaves iter->slot at 0. Then rhashtable_walk_start() and walk_next() will
> start from bucket 0 instead of where the previous batch left off.
> 
> Since lookup_and_delete_batch deletes the elements returned in the previous
> batch, the cursor key will always be deleted. Could this cause the iterator
> to repeatedly scan over all emptied buckets, resulting in an expensive
> traversal?
> 

If too many elements are deleted, the table will be resized, so the
number of empty buckets traversed should not be very large. But this is a
good point; we may not need to store the cursor at all.

> Similarly, for lookup_batch and seq_file dumps, if the cursor is
> concurrently deleted, could this trap userspace in an infinite loop of
> duplicate elements?
> 

This should not cause an infinite loop: if the cursor is deleted every time,
we'll run out of elements. A table resize exposes any iterator to the risk
of duplicate elements; it's up to the user to make sure the table is in a
steady state when iterating.

>> +
>> +	dst_key = keys;
>> +	dst_val = values;
>> +	total = 0;
>> +
>> +	while (total < max_count) {
>> +		elem = rhtab_iter_next(&iter);
>> +		if (!elem)
>> +			break;
> 
> Can this loop cause RCU stalls or soft lockups if max_count is very large? 
> 
> The max_count parameter is controlled by userspace. Since
> rhashtable_walk_start() acquires the RCU read-side critical section, this
> loop might iterate, copy memory, and process millions of elements without
> ever dropping the RCU lock or calling cond_resched().
>
 
The existing htab does handle this situation; maybe splitting a big batch
into smaller chunks, with
```
rcu_read_unlock();
bpf_enable_instrumentation();
```
in between will help.

> [ ... ]
> 
>> +static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
>> +	__acquires(RCU)
>> +{
> 
> [ ... ]
> 
>> +	elem = rhashtable_walk_next(&info->iter);
>> +	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
>> +	if (IS_ERR(elem))
>> +		return ERR_PTR(-EBUSY);
> 
> Similar to bpf_each_rhash_elem, will returning -EBUSY here on normal hash
> table resizes cause userspace map dumps to fail spuriously?
>
 
This is by design.

> [ ... ]
> 
>> +static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>> +{
> 
> [ ... ]
> 
>> +	elem = rhashtable_walk_next(&info->iter);
>> +	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
>> +	if (IS_ERR(elem))
>> +		return ERR_PTR(-EBUSY);
>> +
>> +	return elem;
>> +}
> 
> Again, aborting on -EAGAIN here might interrupt the map dump when a resize
> happens in the background. Is there a way to handle -EAGAIN without failing
> the entire sequence?
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 05/10] bpf: Allow timers, workqueues and task_work in resizable hashtab
  2026-04-24 21:05   ` sashiko-bot
@ 2026-04-25 21:29     ` Mykyta Yatsenko
  0 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-25 21:29 UTC (permalink / raw)
  To: sashiko; +Cc: bpf



On 4/24/26 10:05 PM, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 2 potential issue(s):
> - [High] Unbounded iteration over the entire hash table inside a single RCU read-side critical section in `rhtab_map_free_internal_structs`.
> - [High] Race condition in `rhtab_map_update_existing` allows concurrently initialized special fields (timers, kptrs) to survive map updates.
> --
> 
> commit 1516bbfa3123276ce6abdd224ad40eb2ca633302
> Author: Mykyta Yatsenko <yatsenko@meta.com>
> 
> bpf: Allow timers, workqueues and task_work in resizable hashtab
> 
> This commit allows resizable hashtab maps to support deferred callbacks by
> enabling timers, workqueues, and task_work field types. It also wires up
> bpf_obj_free_fields during element deletion and map teardown to properly
> clean up BTF record fields.
> 
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 000caa2c7f4cd9..ba03797b896020 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
> [ ... ]
>> @@ -2945,11 +2944,15 @@ static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void
>>  static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
>>  				      u64 map_flags)
>>  {
> [ ... ]
>> +	/* Free special fields, and reuse memory similar to arraymap */
>> +	rhtab_check_and_free_fields(rhtab, elem);
>> +
>>  	if (map_flags & BPF_F_LOCK)
>>  		copy_map_value_locked(map, old_val, value, false);
>>  	else
>>  		copy_map_value(map, old_val, value);
> 
> Does calling rhtab_check_and_free_fields() before copy_map_value()
> create a race condition with concurrent BPF programs?
> 
> If a concurrent program acquires the spinlock and initializes a new
> timer or kptr right after rhtab_check_and_free_fields() finishes,
> would those newly initialized special fields survive the update?
> The subsequent copy_map_value() call intentionally skips special fields.
> 
> Should rhtab_check_and_free_fields() be called after the copy completes
> to ensure no concurrent modifications survive the element update?

Even calling rhtab_check_and_free_fields() after copy_map_value()
does not guarantee that a concurrent BPF program does not hold a pointer
to the value and use a special field like a timer.

> 
> [ ... ]
>> @@ -3033,6 +3037,25 @@ static struct rhtab_elem *rhtab_iter_next(struct rhashtable_iter *iter)
>>  
>>  static void rhtab_map_free_internal_structs(struct bpf_map *map)
>>  {
>> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> +	struct rhashtable_iter iter;
>> +	struct rhtab_elem *elem;
>> +
>> +	if (!bpf_map_has_internal_structs(map))
>> +		return;
>> +
>> +	/*
>> +	 * An element can be processed twice if rhashtable resized concurrently.
>> +	 * Special structs freeing handles duplicate cancel_and_free.
>> +	 */
>> +	rhashtable_walk_enter(&rhtab->ht, &iter);
>> +	rhashtable_walk_start(&iter);
>> +
>> +	for (elem = rhtab_iter_next(&iter); elem; elem = rhtab_iter_next(&iter))
>> +		bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
>> +
>> +	rhashtable_walk_stop(&iter);
>> +	rhashtable_walk_exit(&iter);
>>  }
> 
> Can this loop trigger RCU CPU stall warnings for large hash maps?
> 
> The function rhashtable_walk_start() explicitly acquires the RCU read lock,
> meaning this entire iteration over potentially millions of elements occurs
> inside a single RCU read-side critical section.
> 
> If the map contains many elements, looping without yielding could block
> RCU grace periods for an extended time.
> 
> Would it be better to periodically drop the lock and yield, similar to
> how other map structures handle large iterations?
> 
Yes, I'll address these RCU stall issues in v4.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for resizable hashtab
  2026-04-25 21:24     ` Mykyta Yatsenko
@ 2026-04-27 13:36       ` Mykyta Yatsenko
  0 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-27 13:36 UTC (permalink / raw)
  To: sashiko; +Cc: bpf



On 4/25/26 10:24 PM, Mykyta Yatsenko wrote:
> 
> 
> On 4/24/26 9:28 PM, sashiko-bot@kernel.org wrote:
>> Sashiko AI review found 4 potential issue(s):
>> - [High] Cursor invalidation in batch and seq_file iterators causes O(N^2) complexity and infinite loops.
>> - [High] Unbounded RCU read lock duration in batch operations allows user-triggered RCU stalls.
>> - [High] Map iterators and seq_file dumps spuriously abort during normal background resizes.
>> - [High] Deadlock risk in `bpf_each_rhash_elem` due to acquiring `spin_lock` from atomic contexts.
>> --
>>
>> commit 9f7ce51619eb27e8a4a65df12f862e12e34bc285
>> Author: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> bpf: Implement batch ops and iterators for resizable hashtab
>>
>> This commit adds batch operations for resizable hashtab maps, using
>> rhashtable_walk_enter_from() to resume iteration. It also wires up
>> seq_file BPF iterators and implements rhtab_map_mem_usage().
>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index d37f3d548d3658..000caa2c7f4cd9 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>
>> [ ... ]
>>
>>> @@ -3066,64 +3066,330 @@ static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key
>>> +static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
>>> +				void *callback_ctx, u64 flags)
>>> +{
>>
>> [ ... ]
>>
>>> +	/*
>>> +	 * The rhashtable walk API uses spin_lock in rhashtable_walk_start/stop,
>>> +	 * which is not safe in NMI or soft/hard IRQ context.
>>> +	 */
>>> +	if (in_nmi() || in_hardirq() || in_softirq())
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	rhashtable_walk_enter(&rhtab->ht, &iter);
>>
>> Does this adequately protect against deadlocks when called from kprobes or
>> tracepoints in process context with preemption disabled?
>>
>> If a thread currently holding ht->lock (e.g., during map insertion) is
>> interrupted by a kprobe that runs this BPF program, attempting to acquire
>> the same spin_lock inside rhashtable_walk_enter() might cause a deadlock.
>>
>>> +	rhashtable_walk_start(&iter);
>>> +
>>> +	while ((elem = rhashtable_walk_next(&iter))) {
>>> +		/* rhashtable_walk_next returns -EAGAIN on resize, abort */
>>> +		if (IS_ERR(elem)) {
>>> +			num_elems = -EBUSY;
>>> +			break;
>>> +		}
>>
>> Is it expected to abort the entire iteration if a background map resize
>> occurs? 
>>
>> rhashtable_walk_next() returns -EAGAIN to indicate the table was resized and
>> the iterator was rewound. Since resizable hashtabs automatically resize
>> during normal workloads, returning -EBUSY here could cause
>> bpf_for_each_map_elem() and bpftool map dumps to randomly fail.
>>
>> Should this handle -EAGAIN by retrying or continuing the walk instead?
> 
> No, resize is not handled automatically.
> 
>>
>> [ ... ]
>>
>>> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
>>> +					       const union bpf_attr *attr,
>>> +					       union bpf_attr __user *uattr,
>>> +					       bool do_delete)
>>> +{
>>
>> [ ... ]
>>
>>> +	/*
>>> +	 * Use the last key from the previous batch as cursor.
>>> +	 * enter_from positions at that key's bucket, walk_next
>>> +	 * returns the successor in O(1).
>>> +	 * First call (ubatch == NULL): starts from bucket 0.
>>> +	 */
>>> +	if (ubatch) {
>>> +		buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
>>> +		if (!buf) {
>>> +			ret = -ENOMEM;
>>> +			goto free;
>>> +		}
>>> +		if (copy_from_user(buf, ubatch, key_size)) {
>>> +			ret = -EFAULT;
>>> +			goto free;
>>> +		}
>>> +	}
>>> +
>>> +	scoped_guard(rcu) {
>>> +		rhashtable_walk_enter_from(&rhtab->ht, &iter, buf, rhtab->params);
>>> +		rhashtable_walk_start(&iter);
>>> +	}
>>
>> If the key stored in buf was already deleted, will this iteration restart
>> from the beginning of the map?
>>
> 
>  
> yes, if cursor is deleted, iteration is restarted.
> 
>> Looking at rhashtable_walk_enter_from(), if it fails to find the key, it
>> leaves iter->slot at 0. Then rhashtable_walk_start() and walk_next() will
>> start from bucket 0 instead of where the previous batch left off.
>>
>> Since lookup_and_delete_batch deletes the elements returned in the previous
>> batch, the cursor key will always be deleted. Could this cause the iterator
>> to repeatedly scan over all emptied buckets, resulting in an expensive
>> traversal?
>>
> 
> If too many elements are deleted, the table will be resized, so the
> number of empty buckets traversed should not be very big. But this is a
> good point, we may not need to store the cursor at all.

Probably a safer fix would be just to use the next element as a cursor.

> 
>> Similarly, for lookup_batch and seq_file dumps, if the cursor is
>> concurrently deleted, could this trap userspace in an infinite loop of
>> duplicate elements?
>>
> 
> This should not cause infinite loop, if cursor is deleted every time,
> we'll run out of elements. Table resize exposes any iterators to the risk
> of duplicate elements, it's up to user to make sure table is in steady
> state when iterating.
> 
>>> +
>>> +	dst_key = keys;
>>> +	dst_val = values;
>>> +	total = 0;
>>> +
>>> +	while (total < max_count) {
>>> +		elem = rhtab_iter_next(&iter);
>>> +		if (!elem)
>>> +			break;
>>
>> Can this loop cause RCU stalls or soft lockups if max_count is very large? 
>>
>> The max_count parameter is controlled by userspace. Since
>> rhashtable_walk_start() acquires the RCU read-side critical section, this
>> loop might iterate, copy memory, and process millions of elements without
>> ever dropping the RCU lock or calling cond_resched().
>>
>  
> Existing htab does handle this situation; maybe splitting big batch into
> smaller chunks with 
> ```
> rcu_read_unlock();
> bpf_enable_instrumentation();
> ```
> in between will help.
> 
>> [ ... ]
>>
>>> +static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
>>> +	__acquires(RCU)
>>> +{
>>
>> [ ... ]
>>
>>> +	elem = rhashtable_walk_next(&info->iter);
>>> +	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
>>> +	if (IS_ERR(elem))
>>> +		return ERR_PTR(-EBUSY);
>>
>> Similar to bpf_each_rhash_elem, will returning -EBUSY here on normal hash
>> table resizes cause userspace map dumps to fail spuriously?
>>
>  
> This is by design.
> 
>> [ ... ]
>>
>>> +static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>>> +{
>>
>> [ ... ]
>>
>>> +	elem = rhashtable_walk_next(&info->iter);
>>> +	/* rhashtable_walk_next returns -EAGAIN on resize, abort */
>>> +	if (IS_ERR(elem))
>>> +		return ERR_PTR(-EBUSY);
>>> +
>>> +	return elem;
>>> +}
>>
>> Again, aborting on -EAGAIN here might interrupt the map dump when a resize
>> happens in the background. Is there a way to handle -EAGAIN without failing
>> the entire sequence?
>>
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 03/10] bpf: Implement get_next_key() resizable hashtab
  2026-04-24 19:50 ` [PATCH bpf-next v3 03/10] bpf: Implement get_next_key() resizable hashtab Mykyta Yatsenko
@ 2026-04-28 10:33   ` Herbert Xu
  2026-04-28 13:20     ` Mykyta Yatsenko
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2026-04-28 10:33 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	Mykyta Yatsenko, Emil Tsalapatis

On Fri, Apr 24, 2026 at 12:50:45PM -0700, Mykyta Yatsenko wrote:
>
>  static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>  {
> -	return -EOPNOTSUPP;
> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> +	struct rhashtable_iter iter;
> +	struct rhtab_elem *elem;
> +	int ret = -ENOENT;
> +
> +	/*
> +	 * Hold RCU across enter_from + walk_start to prevent the
> +	 * element cached by enter_from from being freed before
> +	 * walk_start re-acquires RCU.
> +	 */
> +	guard(rcu)();
> +	/* If key is not found, places iterator at the beginning */
> +	rhashtable_walk_enter_from(&rhtab->ht, &iter, key, rhtab->params);
> +	rhashtable_walk_start(&iter);
> +
> +	elem = rhtab_iter_next(&iter);
> +	if (elem) {
> +		memcpy(next_key, elem->data, map->key_size);
> +		ret = 0;
> +	}
> +
> +	rhashtable_walk_stop(&iter);
> +	rhashtable_walk_exit(&iter);
> +
> +	return ret;
>  }

The same issue from v2 still exists here.  If a rehash occurs
in between calls to rhtab_map_get_next_key you could be iterating
over the table forever.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from()
  2026-04-24 19:50 ` [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
  2026-04-24 20:15   ` sashiko-bot
  2026-04-24 20:45   ` bot+bpf-ci
@ 2026-04-28 10:35   ` Herbert Xu
  2 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2026-04-28 10:35 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	Mykyta Yatsenko

On Fri, Apr 24, 2026 at 12:50:44PM -0700, Mykyta Yatsenko wrote:
.
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 6074ed5f66f3..2a6098edf737 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -692,6 +692,47 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
>  }
>  EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
>  
> +/**
> + * rhashtable_walk_enter_from - Initialise a walk starting at a key's bucket
> + * @ht:		Table to walk over
> + * @iter:	Hash table iterator
> + * @key:	Key whose bucket to start from
> + * @params:	Hash table parameters
> + *
> + * Like rhashtable_walk_enter(), but positions the iterator at the bucket
> + * containing @key.  If @key is not present in the main table, the iterator is
> + * positioned at the beginning.
> + *
> + * Same constraints as rhashtable_walk_enter() apply.
> + */
> +void rhashtable_walk_enter_from(struct rhashtable *ht,
> +				struct rhashtable_iter *iter,
> +				const void *key,
> +				const struct rhashtable_params params)
> +				__must_hold(RCU)
> +{
> +	struct bucket_table *tbl;
> +	struct rhash_head *he;
> +
> +	rhashtable_walk_enter(ht, iter);
> +
> +	if (!key)
> +		return;
> +
> +	tbl = iter->walker.tbl;
> +	if (!tbl)
> +		return;
> +
> +	he = __rhashtable_lookup_one(ht, tbl, key, params,
> +				     RHT_LOOKUP_NORMAL);
> +	if (!he)
> +		return;
> +
> +	iter->slot = rht_key_hashfn(ht, tbl, key, params);
> +	iter->p = he;
> +}
> +EXPORT_SYMBOL_GPL(rhashtable_walk_enter_from);

I still think this is misguided.  I'm not going to stop you from
doing it if you really wanted to, but this thing will not work well
if your rhashtable is constantly resizing.

You also need to consider the possibility that an adversary is
triggering the constant rehashing.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v3 03/10] bpf: Implement get_next_key() resizable hashtab
  2026-04-28 10:33   ` Herbert Xu
@ 2026-04-28 13:20     ` Mykyta Yatsenko
  0 siblings, 0 replies; 27+ messages in thread
From: Mykyta Yatsenko @ 2026-04-28 13:20 UTC (permalink / raw)
  To: Herbert Xu
  Cc: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	Mykyta Yatsenko, Emil Tsalapatis



On 4/28/26 11:33 AM, Herbert Xu wrote:
> On Fri, Apr 24, 2026 at 12:50:45PM -0700, Mykyta Yatsenko wrote:
>>
>>  static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>>  {
>> -	return -EOPNOTSUPP;
>> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> +	struct rhashtable_iter iter;
>> +	struct rhtab_elem *elem;
>> +	int ret = -ENOENT;
>> +
>> +	/*
>> +	 * Hold RCU across enter_from + walk_start to prevent the
>> +	 * element cached by enter_from from being freed before
>> +	 * walk_start re-acquires RCU.
>> +	 */
>> +	guard(rcu)();
>> +	/* If key is not found, places iterator at the beginning */
>> +	rhashtable_walk_enter_from(&rhtab->ht, &iter, key, rhtab->params);
>> +	rhashtable_walk_start(&iter);
>> +
>> +	elem = rhtab_iter_next(&iter);
>> +	if (elem) {
>> +		memcpy(next_key, elem->data, map->key_size);
>> +		ret = 0;
>> +	}
>> +
>> +	rhashtable_walk_stop(&iter);
>> +	rhashtable_walk_exit(&iter);
>> +
>> +	return ret;
>>  }
> 
> The same issue from v2 still exists here.  If a rehash occurs
> in between calls to rhtab_map_get_next_key you could be iterating
> over the table forever.
> 
> Cheers,

Yes, I agree that it is possible. For this function the kernel does not
iterate the entire hashtable; each get_next_key() call comes from
userspace via syscall, so maybe the risk is acceptable in this situation?

In the other functions, where the kernel does iterate over the rhashtable,
it fails and returns -EBUSY on resize.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2026-04-28 13:20 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-24 19:50 [PATCH bpf-next v3 00/10] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 01/10] bpf: Implement resizable hashmap basic functions Mykyta Yatsenko
2026-04-24 20:40   ` sashiko-bot
2026-04-25 20:41     ` Mykyta Yatsenko
2026-04-24 20:45   ` bot+bpf-ci
2026-04-25 20:50     ` Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 02/10] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
2026-04-24 20:15   ` sashiko-bot
2026-04-24 20:45   ` bot+bpf-ci
2026-04-28 10:35   ` Herbert Xu
2026-04-24 19:50 ` [PATCH bpf-next v3 03/10] bpf: Implement get_next_key() resizable hashtab Mykyta Yatsenko
2026-04-28 10:33   ` Herbert Xu
2026-04-28 13:20     ` Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 04/10] bpf: Implement batch ops and iterators for " Mykyta Yatsenko
2026-04-24 20:28   ` sashiko-bot
2026-04-25 21:24     ` Mykyta Yatsenko
2026-04-27 13:36       ` Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 05/10] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
2026-04-24 21:05   ` sashiko-bot
2026-04-25 21:29     ` Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 06/10] libbpf: Support resizable hashtable Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 07/10] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
2026-04-24 20:02   ` sashiko-bot
2026-04-24 20:32   ` bot+bpf-ci
2026-04-24 19:50 ` [PATCH bpf-next v3 08/10] selftests/bpf: Add BPF iterator " Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 09/10] bpftool: Add rhash map documentation Mykyta Yatsenko
2026-04-24 19:50 ` [PATCH bpf-next v3 10/10] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox