* [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map
@ 2026-04-08 15:10 Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map Mykyta Yatsenko
` (18 more replies)
0 siblings, 19 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
This patch series introduces BPF_MAP_TYPE_RHASH, a new hash map type that
leverages the kernel's rhashtable to provide a resizable hash map for BPF.
The existing BPF_MAP_TYPE_HASH uses a fixed number of buckets determined at
map creation time. While this works well for many use cases, it presents
challenges when:
1. The number of elements is unknown at creation time
2. The element count varies significantly during runtime
3. Memory efficiency is important (over-provisioning wastes memory,
under-provisioning hurts performance)
BPF_MAP_TYPE_RHASH addresses these issues by using rhashtable, which
automatically grows and shrinks based on load factor.
The implementation wraps the kernel's rhashtable with BPF map operations:
- Uses bpf_mem_alloc for RCU-safe memory management
- Supports all standard map operations (lookup, update, delete, get_next_key)
- Supports batch operations (lookup_batch, lookup_and_delete_batch)
- Supports BPF iterators for traversal
- Supports BPF_F_LOCK for spin locks in values
- Requires BPF_F_NO_PREALLOC flag (elements allocated on demand)
- max_entries serves as a hard limit, not bucket count
The series includes comprehensive tests:
- Basic operations in test_maps (lookup, update, delete, get_next_key)
- BPF program tests for lookup/update/delete semantics
- BPF_F_LOCK tests with concurrent access
- Stress tests for get_next_key during concurrent resize operations
- Seq file tests
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
The current implementation of BPF_MAP_TYPE_RHASH does not provide
the same strong guarantees on value consistency under concurrent
reads/writes as BPF_MAP_TYPE_HASH.
BPF_MAP_TYPE_HASH allocates a new element and atomically swaps the
pointer, so RCU readers always see a complete value. BPF_MAP_TYPE_RHASH
instead does a memcpy in place with no lock held.
rhash trades consistency for speed (a 5x improvement in the update
benchmark): concurrent readers can observe partially updated data, and
two concurrent writers to the same key can interleave, producing mixed
values.
As a workaround, users may set BPF_F_LOCK to guarantee consistent reads
and serialized writes.
Summary of the read consistency guarantees:
map type | write mechanism | read consistency
-------------+------------------+--------------------------
htab | alloc, swap ptr | always consistent (RCU)
htab F_LOCK | in-place + lock | consistent if reader locks
-------------+------------------+--------------------------
rhtab | in-place memcpy | torn reads
rhtab F_LOCK | in-place + lock | consistent if reader locks
Benchmarks and s390 tests depend on this series:
https://lore.kernel.org/linux-crypto/20260224192954.819444-1-mykyta.yatsenko5@gmail.com/
Changes in v2:
- Added benchmarks
- Reworked all functions that walk the rhashtable to use the walk API
instead of directly accessing tbl and future_tbl
- Added rhashtable_walk_enter_from() into rhashtable to support O(1)
iteration continuations
- Link to v1: https://lore.kernel.org/r/20260205-rhash-v1-0-30dd6d63c462@meta.com
---
Mykyta Yatsenko (18):
bpf: Register rhash map
bpf: Add resizable hashtab skeleton
bpf: Implement lookup, delete, update for resizable hashtab
rhashtable: Add rhashtable_walk_enter_from()
bpf: Implement get_next_key and free_internal_structs for resizable hashtab
bpf: Implement bpf_each_rhash_elem() using walk API
bpf: Implement batch ops for resizable hashtab
bpf: Implement iterator APIs for resizable hashtab
bpf: Implement alloc and free for resizable hashtab
bpf: Allow timers, workqueues and task_work in resizable hashtab
libbpf: Support resizable hashtable
selftests/bpf: Add basic tests for resizable hash map
selftests/bpf: Support resizable hashtab in test_maps
selftests/bpf: Resizable hashtab BPF_F_LOCK tests
selftests/bpf: Add stress tests for resizable hash get_next_key
selftests/bpf: Add BPF iterator tests for resizable hash map
bpftool: Add rhash map documentation
selftests/bpf: Add resizable hashmap to benchmarks
include/linux/bpf_types.h | 1 +
include/linux/rhashtable.h | 31 +-
include/uapi/linux/bpf.h | 1 +
kernel/bpf/hashtab.c | 730 ++++++++++++++++++++-
kernel/bpf/map_iter.c | 3 +-
kernel/bpf/syscall.c | 4 +
kernel/bpf/verifier.c | 1 +
lib/rhashtable.c | 53 ++
lib/test_rhashtable.c | 120 ++++
tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
tools/include/uapi/linux/bpf.h | 1 +
tools/lib/bpf/libbpf.c | 1 +
tools/lib/bpf/libbpf_probes.c | 3 +
tools/testing/selftests/bpf/bench.c | 6 +
.../bpf/benchs/bench_bpf_hashmap_full_update.c | 34 +-
.../bpf/benchs/bench_bpf_hashmap_lookup.c | 31 +-
.../testing/selftests/bpf/benchs/bench_htab_mem.c | 35 +-
.../selftests/bpf/map_tests/htab_map_batch_ops.c | 22 +-
tools/testing/selftests/bpf/prog_tests/rhash.c | 502 ++++++++++++++
.../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c | 75 +++
tools/testing/selftests/bpf/progs/rhash.c | 285 ++++++++
tools/testing/selftests/bpf/test_maps.c | 127 +++-
23 files changed, 2012 insertions(+), 58 deletions(-)
---
base-commit: b199f582a924e04f3f3b1050c484798034c61a12
change-id: 20251103-rhash-7b70069923d8
Best regards,
--
Mykyta Yatsenko <yatsenko@meta.com>
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-10 22:31 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 02/18] bpf: Add resizable hashtab skeleton Mykyta Yatsenko
` (17 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Add the resizable hash map type to the enums and map type checks where
it is needed.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
include/uapi/linux/bpf.h | 1 +
kernel/bpf/map_iter.c | 3 ++-
kernel/bpf/syscall.c | 3 +++
kernel/bpf/verifier.c | 1 +
tools/include/uapi/linux/bpf.h | 1 +
5 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 552bc5d9afbd..822582c04f22 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1046,6 +1046,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
BPF_MAP_TYPE_INSN_ARRAY,
+ BPF_MAP_TYPE_RHASH,
__MAX_BPF_MAP_TYPE
};
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index 261a03ea73d3..4a2aafbe28b4 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -119,7 +119,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
is_percpu = true;
else if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
- map->map_type != BPF_MAP_TYPE_ARRAY)
+ map->map_type != BPF_MAP_TYPE_ARRAY &&
+ map->map_type != BPF_MAP_TYPE_RHASH)
goto put_map;
key_acc_size = prog->aux->max_rdonly_access;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 51ade3cde8bb..0a5ec417638d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1287,6 +1287,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
case BPF_SPIN_LOCK:
case BPF_RES_SPIN_LOCK:
if (map->map_type != BPF_MAP_TYPE_HASH &&
+ map->map_type != BPF_MAP_TYPE_RHASH &&
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
@@ -1464,6 +1465,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_MAP_TYPE_CGROUP_ARRAY:
case BPF_MAP_TYPE_ARRAY_OF_MAPS:
case BPF_MAP_TYPE_HASH:
+ case BPF_MAP_TYPE_RHASH:
case BPF_MAP_TYPE_PERCPU_HASH:
case BPF_MAP_TYPE_HASH_OF_MAPS:
case BPF_MAP_TYPE_RINGBUF:
@@ -2199,6 +2201,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+ map->map_type == BPF_MAP_TYPE_RHASH ||
map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
if (!bpf_map_is_offloaded(map)) {
bpf_disable_instrumentation();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8c1cf2eb6cbb..53523ab953c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21816,6 +21816,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
if (prog->sleepable)
switch (map->map_type) {
case BPF_MAP_TYPE_HASH:
+ case BPF_MAP_TYPE_RHASH:
case BPF_MAP_TYPE_LRU_HASH:
case BPF_MAP_TYPE_ARRAY:
case BPF_MAP_TYPE_PERCPU_HASH:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 677be9a47347..9d7df174770a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1046,6 +1046,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
BPF_MAP_TYPE_INSN_ARRAY,
+ BPF_MAP_TYPE_RHASH,
__MAX_BPF_MAP_TYPE
};
--
2.52.0
* [PATCH RFC bpf-next v2 02/18] bpf: Add resizable hashtab skeleton
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab Mykyta Yatsenko
` (16 subsequent siblings)
18 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Add a skeleton for the resizable hashtab. The actual implementation
follows in subsequent patches of the series.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
include/linux/bpf_types.h | 1 +
kernel/bpf/hashtab.c | 166 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 167 insertions(+)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b13de31e163f..56e4c3f983d3 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -134,6 +134,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_INSN_ARRAY, insn_array_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_RHASH, rhtab_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index bc6bc8bb871d..9e7806814fec 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -9,6 +9,7 @@
#include <linux/rculist_nulls.h>
#include <linux/rcupdate_wait.h>
#include <linux/random.h>
+#include <linux/rhashtable.h>
#include <uapi/linux/btf.h>
#include <linux/rcupdate_trace.h>
#include <linux/btf_ids.h>
@@ -418,6 +419,7 @@ static int htab_map_alloc_check(union bpf_attr *attr)
bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
+ bool resizable = attr->map_type == BPF_MAP_TYPE_RHASH;
int numa_node = bpf_map_attr_numa_node(attr);
BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
@@ -459,6 +461,9 @@ static int htab_map_alloc_check(union bpf_attr *attr)
if (percpu && round_up(attr->value_size, 8) > PCPU_MIN_UNIT_SIZE)
return -E2BIG;
+ if (resizable && percpu_lru)
+ return -EINVAL;
+
return 0;
}
@@ -2735,3 +2740,164 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
};
+
+struct rhtab_elem {
+ struct rhash_head node;
+ /* key bytes, then value bytes follow */
+ u8 data[] __aligned(8);
+};
+
+struct bpf_rhtab {
+ struct bpf_map map;
+ struct rhashtable ht;
+ struct rhashtable_params params;
+ struct bpf_mem_alloc ma;
+ u32 elem_size;
+};
+
+static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+static int rhtab_map_alloc_check(union bpf_attr *attr)
+{
+ return -EOPNOTSUPP;
+}
+
+static void rhtab_map_free(struct bpf_map *map)
+{
+}
+
+static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
+{
+ return NULL;
+}
+
+static long rhtab_map_delete_elem(struct bpf_map *map, void *key)
+{
+ return -EOPNOTSUPP;
+}
+
+static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
+{
+ return -EOPNOTSUPP;
+}
+
+static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
+{
+ return -EOPNOTSUPP;
+}
+
+static void rhtab_map_free_internal_structs(struct bpf_map *map)
+{
+}
+
+static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
+{
+ return -EOPNOTSUPP;
+}
+
+static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ return -EOPNOTSUPP;
+}
+
+static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
+{
+}
+
+static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
+ void *callback_ctx, u64 flags)
+{
+ return -EOPNOTSUPP;
+}
+
+static u64 rhtab_map_mem_usage(const struct bpf_map *map)
+{
+ return 0;
+}
+
+static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ return 0;
+}
+
+static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ return 0;
+}
+
+struct bpf_iter_seq_rhash_map_info {
+ struct bpf_map *map;
+ struct bpf_rhtab *rhtab;
+ struct rhashtable_iter iter;
+ u32 skip_elems;
+ bool iter_active;
+};
+
+static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
+{
+ return NULL;
+}
+
+static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ return NULL;
+}
+
+static int bpf_rhash_map_seq_show(struct seq_file *seq, void *v)
+{
+ return 0;
+}
+
+static void bpf_rhash_map_seq_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int bpf_iter_init_rhash_map(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+ return 0;
+}
+
+static void bpf_iter_fini_rhash_map(void *priv_data)
+{
+}
+
+static const struct seq_operations bpf_rhash_map_seq_ops = {
+ .start = bpf_rhash_map_seq_start,
+ .next = bpf_rhash_map_seq_next,
+ .stop = bpf_rhash_map_seq_stop,
+ .show = bpf_rhash_map_seq_show,
+};
+
+static const struct bpf_iter_seq_info rhash_iter_seq_info = {
+ .seq_ops = &bpf_rhash_map_seq_ops,
+ .init_seq_private = bpf_iter_init_rhash_map,
+ .fini_seq_private = bpf_iter_fini_rhash_map,
+ .seq_priv_size = sizeof(struct bpf_iter_seq_rhash_map_info),
+};
+
+BTF_ID_LIST_SINGLE(rhtab_map_btf_ids, struct, bpf_rhtab)
+const struct bpf_map_ops rhtab_map_ops = {
+ .map_meta_equal = bpf_map_meta_equal,
+ .map_alloc_check = rhtab_map_alloc_check,
+ .map_alloc = rhtab_map_alloc,
+ .map_free = rhtab_map_free,
+ .map_get_next_key = rhtab_map_get_next_key,
+ .map_release_uref = rhtab_map_free_internal_structs,
+ .map_lookup_elem = rhtab_map_lookup_elem,
+ .map_lookup_and_delete_elem = rhtab_map_lookup_and_delete_elem,
+ .map_update_elem = rhtab_map_update_elem,
+ .map_delete_elem = rhtab_map_delete_elem,
+ .map_gen_lookup = rhtab_map_gen_lookup,
+ .map_seq_show_elem = rhtab_map_seq_show_elem,
+ .map_set_for_each_callback_args = map_set_for_each_callback_args,
+ .map_for_each_callback = bpf_each_rhash_elem,
+ .map_mem_usage = rhtab_map_mem_usage,
+ BATCH_OPS(rhtab),
+ .map_btf_id = &rhtab_map_btf_ids[0],
+ .iter_seq_info = &rhash_iter_seq_info,
+};
--
2.52.0
* [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 02/18] bpf: Add resizable hashtab skeleton Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-12 23:10 ` Alexei Starovoitov
` (2 more replies)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
` (15 subsequent siblings)
18 siblings, 3 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast()
for deletes, and rhashtable_lookup_get_insert_fast() for inserts.
Updates modify values in place under RCU rather than allocating a
new element and swapping the pointer (as regular htab does). This
trades read consistency for performance: concurrent readers may
see partial updates. Users requiring consistent reads should use
BPF_F_LOCK.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 134 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 9e7806814fec..ea7314cc3703 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2755,6 +2755,11 @@ struct bpf_rhtab {
u32 elem_size;
};
+static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
+{
+ return l->data + round_up(key_size, 8);
+}
+
static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
{
return ERR_PTR(-EOPNOTSUPP);
@@ -2769,33 +2774,155 @@ static void rhtab_map_free(struct bpf_map *map)
{
}
+static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ /* Using constant zeroed params to force rhashtable use inlined hashfunc */
+ static const struct rhashtable_params params = { 0 };
+
+ return rhashtable_lookup_likely(&rhtab->ht, key, params);
+}
+
static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
{
- return NULL;
+ struct rhtab_elem *l;
+
+ l = rhtab_lookup_elem(map, key);
+ return l ? rhtab_elem_value(l, map->key_size) : NULL;
+}
+
+static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
+{
+ int err;
+
+ err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
+ if (err)
+ return err;
+
+ bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
+ bpf_mem_cache_free_rcu(&rhtab->ma, elem);
+ return 0;
}
static long rhtab_map_delete_elem(struct bpf_map *map, void *key)
{
- return -EOPNOTSUPP;
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhtab_elem *l;
+
+ guard(rcu)();
+ l = rhtab_lookup_elem(map, key);
+ return l ? rhtab_delete_elem(rhtab, l) : -ENOENT;
+}
+
+static void rhtab_read_elem_value(struct bpf_map *map, void *dst, struct rhtab_elem *elem,
+ u64 flags)
+{
+ void *src = rhtab_elem_value(elem, map->key_size);
+
+ if (flags & BPF_F_LOCK)
+ copy_map_value_locked(map, dst, src, true);
+ else
+ copy_map_value(map, dst, src);
}
static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
{
- return -EOPNOTSUPP;
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhtab_elem *l;
+ int err;
+
+ if ((flags & ~BPF_F_LOCK) ||
+ ((flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
+ return -EINVAL;
+
+ /* Make sure element is not deleted between lookup and copy */
+ guard(rcu)();
+
+ l = rhtab_lookup_elem(map, key);
+ if (!l)
+ return -ENOENT;
+
+ rhtab_read_elem_value(map, value, l, flags);
+ err = rhtab_delete_elem(rhtab, l);
+ if (err)
+ return err;
+
+ check_and_init_map_value(map, value);
+ return 0;
}
-static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
+static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
+ u64 map_flags)
{
- return -EOPNOTSUPP;
+ if (map_flags & BPF_NOEXIST)
+ return -EEXIST;
+
+ if (map_flags & BPF_F_LOCK)
+ copy_map_value_locked(map, rhtab_elem_value(elem, map->key_size), value, false);
+ else
+ copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
+ return 0;
}
-static void rhtab_map_free_internal_structs(struct bpf_map *map)
+static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhtab_elem *elem, *tmp;
+
+ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+ return -EINVAL;
+
+ if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
+ return -EINVAL;
+
+ guard(rcu)();
+ elem = rhtab_lookup_elem(map, key);
+ if (elem)
+ return rhtab_map_update_existing(map, elem, value, map_flags);
+
+ if (map_flags & BPF_EXIST)
+ return -ENOENT;
+
+ /* Check max_entries limit before inserting new element */
+ if (atomic_read(&rhtab->ht.nelems) >= map->max_entries)
+ return -E2BIG;
+
+ elem = bpf_mem_cache_alloc(&rhtab->ma);
+ if (!elem)
+ return -ENOMEM;
+
+ memcpy(elem->data, key, map->key_size);
+ copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
+
+ tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);
+ if (tmp) {
+ bpf_mem_cache_free(&rhtab->ma, elem);
+ if (IS_ERR(tmp))
+ return PTR_ERR(tmp);
+
+ return rhtab_map_update_existing(map, tmp, value, map_flags);
+ }
+
+ return 0;
}
static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
- return -EOPNOTSUPP;
+ struct bpf_insn *insn = insn_buf;
+ const int ret = BPF_REG_0;
+
+ BUILD_BUG_ON(!__same_type(&rhtab_lookup_elem,
+ (void *(*)(struct bpf_map *map, void *key)) NULL));
+ *insn++ = BPF_EMIT_CALL(rhtab_lookup_elem);
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1);
+ *insn++ = BPF_ALU64_IMM(BPF_ADD, ret,
+ offsetof(struct rhtab_elem, data) + round_up(map->key_size, 8));
+
+ return insn - insn_buf;
+}
+
+static void rhtab_map_free_internal_structs(struct bpf_map *map)
+{
}
static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
--
2.52.0
* [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from()
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (2 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-12 23:13 ` Alexei Starovoitov
2026-04-13 22:22 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab Mykyta Yatsenko
` (14 subsequent siblings)
18 siblings, 2 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
BPF resizable hashmap needs efficient iteration resume for
get_next_key and seq_file iterators. rhashtable_walk_enter()
always starts from bucket 0, forcing linear skip of already-seen
elements.
Add rhashtable_walk_enter_from() that looks up the key's bucket
and positions the walker there, so walk_next returns the successor
directly. If a resize moved the key to the future table, the
walker is migrated to that table.
Refactor __rhashtable_lookup into __rhashtable_lookup_one to reuse
the single-table lookup in both the two-table search and the new
enter_from positioning.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
include/linux/rhashtable.h | 31 ++++++++++--
lib/rhashtable.c | 53 ++++++++++++++++++++
lib/test_rhashtable.c | 120 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 199 insertions(+), 5 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 133ccb39137a..2c7a343ac592 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -253,6 +253,11 @@ static inline void rhashtable_walk_start(struct rhashtable_iter *iter)
(void)rhashtable_walk_start_check(iter);
}
+void rhashtable_walk_enter_from(struct rhashtable *ht,
+ struct rhashtable_iter *iter,
+ const void *key,
+ const struct rhashtable_params params);
+
void *rhashtable_walk_next(struct rhashtable_iter *iter);
void *rhashtable_walk_peek(struct rhashtable_iter *iter);
void rhashtable_walk_stop(struct rhashtable_iter *iter) __releases_shared(RCU);
@@ -613,8 +618,8 @@ static inline int rhashtable_compare(struct rhashtable_compare_arg *arg,
}
/* Internal function, do not use. */
-static __always_inline struct rhash_head *__rhashtable_lookup(
- struct rhashtable *ht, const void *key,
+static __always_inline struct rhash_head *__rhashtable_lookup_one(
+ struct rhashtable *ht, struct bucket_table *tbl, const void *key,
const struct rhashtable_params params,
const enum rht_lookup_freq freq)
__must_hold_shared(RCU)
@@ -624,13 +629,10 @@ static __always_inline struct rhash_head *__rhashtable_lookup(
.key = key,
};
struct rhash_lock_head __rcu *const *bkt;
- struct bucket_table *tbl;
struct rhash_head *he;
unsigned int hash;
BUILD_BUG_ON(!__builtin_constant_p(freq));
- tbl = rht_dereference_rcu(ht->tbl, ht);
-restart:
hash = rht_key_hashfn(ht, tbl, key, params);
bkt = rht_bucket(tbl, hash);
do {
@@ -646,6 +648,25 @@ static __always_inline struct rhash_head *__rhashtable_lookup(
*/
} while (he != RHT_NULLS_MARKER(bkt));
+ return NULL;
+}
+
+/* Internal function, do not use. */
+static __always_inline struct rhash_head *__rhashtable_lookup(
+ struct rhashtable *ht, const void *key,
+ const struct rhashtable_params params,
+ const enum rht_lookup_freq freq)
+ __must_hold_shared(RCU)
+{
+ struct bucket_table *tbl;
+ struct rhash_head *he;
+
+ tbl = rht_dereference_rcu(ht->tbl, ht);
+restart:
+ he = __rhashtable_lookup_one(ht, tbl, key, params, freq);
+ if (he)
+ return he;
+
/* Ensure we see any new tables. */
smp_rmb();
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6074ed5f66f3..2fc277207dcc 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -692,6 +692,59 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
}
EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
+/**
+ * rhashtable_walk_enter_from - Initialise a walk starting at a key's bucket
+ * @ht: Table to walk over
+ * @iter: Hash table iterator
+ * @key: Key whose bucket to start from
+ * @params: Hash table parameters
+ *
+ * Like rhashtable_walk_enter(), but positions the iterator at the bucket
+ * containing @key. If a resize is in progress and @key has been migrated
+ * to the future table, the walker is moved to that table.
+ *
+ * Same constraints as rhashtable_walk_enter() apply.
+ */
+void rhashtable_walk_enter_from(struct rhashtable *ht,
+ struct rhashtable_iter *iter,
+ const void *key,
+ const struct rhashtable_params params)
+ __must_hold(RCU)
+{
+ struct bucket_table *tbl;
+ struct rhash_head *he;
+
+ rhashtable_walk_enter(ht, iter);
+
+ if (!key)
+ return;
+
+ tbl = rht_dereference_rcu(ht->tbl, ht);
+ he = __rhashtable_lookup_one(ht, tbl, key, params,
+ RHT_LOOKUP_NORMAL);
+ if (!he) {
+ smp_rmb();
+ tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+ if (!tbl)
+ return;
+
+ he = __rhashtable_lookup_one(ht, tbl, key, params,
+ RHT_LOOKUP_NORMAL);
+ if (!he)
+ return;
+
+ spin_lock(&ht->lock);
+ list_del(&iter->walker.list);
+ iter->walker.tbl = tbl;
+ list_add(&iter->walker.list, &tbl->walkers);
+ spin_unlock(&ht->lock);
+ }
+
+ iter->slot = rht_key_hashfn(ht, tbl, key, params);
+ iter->p = he;
+}
+EXPORT_SYMBOL_GPL(rhashtable_walk_enter_from);
+
/**
* rhashtable_walk_exit - Free an iterator
* @iter: Hash table Iterator
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 0b33559a910b..0084157a96b4 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -23,6 +23,7 @@
#include <linux/random.h>
#include <linux/vmalloc.h>
#include <linux/wait.h>
+#include <linux/cleanup.h>
#define MAX_ENTRIES 1000000
#define TEST_INSERT_FAIL INT_MAX
@@ -679,6 +680,122 @@ static int threadfunc(void *data)
return err;
}
+static int __init test_walk_enter_from(void)
+{
+ struct rhashtable ht;
+ struct test_obj objs[4];
+ struct rhashtable_iter iter;
+ struct test_obj *obj;
+ int err, i;
+
+ err = rhashtable_init(&ht, &test_rht_params);
+ if (err)
+ return err;
+
+ /* Insert 4 elements with keys 0, 2, 4, 6 */
+ for (i = 0; i < 4; i++) {
+ objs[i].value.id = i * 2;
+ objs[i].value.tid = 0;
+ err = rhashtable_insert_fast(&ht, &objs[i].node,
+ test_rht_params);
+ if (err) {
+ pr_warn("walk_enter_from: insert %d failed: %d\n",
+ i, err);
+ goto out;
+ }
+ }
+
+ /*
+ * Test 1: walk_enter_from positions at key, walk_next returns
+ * the successor (not the key itself).
+ */
+ for (i = 0; i < 4; i++) {
+ struct test_obj_val key = { .id = i * 2 };
+
+ scoped_guard(rcu) {
+ rhashtable_walk_enter_from(&ht, &iter, &key,
+ test_rht_params);
+ rhashtable_walk_start(&iter);
+ }
+
+ obj = rhashtable_walk_next(&iter);
+ while (IS_ERR(obj) && PTR_ERR(obj) == -EAGAIN)
+ obj = rhashtable_walk_next(&iter);
+
+ /* Successor must not be the key itself */
+ if (obj && obj->value.id == i * 2) {
+ pr_warn("walk_enter_from: returned key %d instead of successor\n",
+ i * 2);
+ err = -EINVAL;
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+ goto out;
+ }
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+ }
+
+ /* Test 2: walk_enter_from with non-existent key starts from bucket */
+ {
+ struct test_obj_val key = { .id = 99 };
+
+ scoped_guard(rcu) {
+ rhashtable_walk_enter_from(&ht, &iter, &key,
+ test_rht_params);
+ rhashtable_walk_start(&iter);
+ }
+
+ obj = rhashtable_walk_next(&iter);
+ while (IS_ERR(obj) && PTR_ERR(obj) == -EAGAIN)
+ obj = rhashtable_walk_next(&iter);
+
+ /* Should still return some element (iteration from bucket start) */
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+ }
+
+ /* Test 3: verify walk_enter_from + walk_next can iterate remaining elements */
+ {
+ struct test_obj_val key = { .id = 0 };
+ int count = 0;
+
+ scoped_guard(rcu) {
+ rhashtable_walk_enter_from(&ht, &iter, &key,
+ test_rht_params);
+ rhashtable_walk_start(&iter);
+ }
+
+ while ((obj = rhashtable_walk_next(&iter))) {
+ if (IS_ERR(obj)) {
+ if (PTR_ERR(obj) == -EAGAIN)
+ continue;
+ break;
+ }
+ count++;
+ }
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+
+ /*
+ * Should see at least some elements after key 0.
+ * Exact count depends on hash distribution.
+ */
+ if (count == 0) {
+ pr_warn("walk_enter_from: no elements found after key 0\n");
+ err = -EINVAL;
+ goto out;
+ }
+ }
+
+ pr_info("walk_enter_from: all tests passed\n");
+ err = 0;
+out:
+ rhashtable_destroy(&ht);
+ return err;
+}
+
static int __init test_rht_init(void)
{
unsigned int entries;
@@ -738,6 +855,9 @@ static int __init test_rht_init(void)
test_insert_duplicates_run();
+ pr_info("Testing walk_enter_from: %s\n",
+ test_walk_enter_from() == 0 ? "pass" : "FAIL");
+
if (!tcount)
return 0;
--
2.52.0
* [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (3 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-13 22:44 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 06/18] bpf: Implement bpf_each_rhash_elem() using walk API Mykyta Yatsenko
` (13 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Implement rhtab_map_get_next_key() and rhtab_map_free_internal_structs()
for the BPF resizable hashtable. Both are only called from syscall
context, so it is safe to use the rhashtable walk API, which takes
spinlocks internally.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 77 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index ea7314cc3703..7eee450a321e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2921,13 +2921,89 @@ static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
return insn - insn_buf;
}
+/* Helper to get next element, handling -EAGAIN during resize */
+static struct rhtab_elem *rhtab_iter_next(struct rhashtable_iter *iter)
+{
+ struct rhtab_elem *elem;
+
+ while ((elem = rhashtable_walk_next(iter))) {
+ if (IS_ERR(elem)) {
+ if (PTR_ERR(elem) == -EAGAIN)
+ continue;
+ return NULL;
+ }
+ return elem;
+ }
+
+ return NULL;
+}
+
static void rhtab_map_free_internal_structs(struct bpf_map *map)
{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhashtable_iter iter;
+ struct rhtab_elem *elem;
+
+ if (!bpf_map_has_internal_structs(map))
+ return;
+
+ /*
+ * An element can be processed twice if the rhashtable is resized
+ * concurrently. The internal structs freeing code tolerates
+ * duplicate cancel_and_free calls.
+ */
+ rhashtable_walk_enter(&rhtab->ht, &iter);
+ rhashtable_walk_start(&iter);
+
+ for (elem = rhtab_iter_next(&iter); elem; elem = rhtab_iter_next(&iter))
+ bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
}
static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
{
- return -EOPNOTSUPP;
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhashtable_iter iter;
+ struct rhtab_elem *elem;
+ bool key_found;
+ int ret = -ENOENT;
+
+ /*
+ * Hold RCU across enter_from + walk_start to prevent the
+ * element cached by enter_from from being freed before
+ * walk_start re-acquires RCU.
+ */
+ guard(rcu)();
+ rhashtable_walk_enter_from(&rhtab->ht, &iter, key, rhtab->params);
+ key_found = key && iter.p;
+ rhashtable_walk_start(&iter);
+
+ elem = rhtab_iter_next(&iter);
+ if (elem) {
+ memcpy(next_key, elem->data, map->key_size);
+ ret = 0;
+ }
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+
+ if (ret == 0 || key_found)
+ return ret;
+
+ /* Key was not found, restart from the beginning */
+ rhashtable_walk_enter(&rhtab->ht, &iter);
+ rhashtable_walk_start(&iter);
+
+ elem = rhtab_iter_next(&iter);
+ if (elem) {
+ memcpy(next_key, elem->data, map->key_size);
+ ret = 0;
+ }
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+ return ret;
}
static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
--
2.52.0
* [PATCH RFC bpf-next v2 06/18] bpf: Implement bpf_each_rhash_elem() using walk API
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (4 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-13 23:02 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab Mykyta Yatsenko
` (12 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
The rhashtable walk API takes spin_lock(&ht->lock) in
rhashtable_walk_start/stop, making it unsafe in NMI and soft/hard IRQ
contexts. Guard with an !in_task() check rather than open-coding raw
RCU iteration, which would need to handle resize races manually.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 7eee450a321e..e79c194e2779 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3013,7 +3013,40 @@ static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_f
static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
void *callback_ctx, u64 flags)
{
- return -EOPNOTSUPP;
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ struct rhashtable_iter iter;
+ struct rhtab_elem *elem;
+ int num_elems = 0;
+ u64 ret = 0;
+
+ if (flags != 0)
+ return -EINVAL;
+
+ /*
+ * The rhashtable walk API uses spin_lock(&ht->lock) in rhashtable_walk_start/stop,
+ * which is not safe in NMI or soft/hard IRQ context.
+ */
+ if (!in_task())
+ return -EOPNOTSUPP;
+
+ rhashtable_walk_enter(&rhtab->ht, &iter);
+ rhashtable_walk_start(&iter);
+
+ for (elem = rhtab_iter_next(&iter); elem;
+ elem = rhtab_iter_next(&iter)) {
+ num_elems++;
+ ret = callback_fn((u64)(long)map,
+ (u64)(long)elem->data,
+ (u64)(long)rhtab_elem_value(elem, map->key_size),
+ (u64)(long)callback_ctx, 0);
+ if (ret)
+ break;
+ }
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+
+ return num_elems;
}
static u64 rhtab_map_mem_usage(const struct bpf_map *map)
--
2.52.0
* [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (5 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 06/18] bpf: Implement bpf_each_rhash_elem() using walk API Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-13 23:25 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 08/18] bpf: Implement iterator APIs " Mykyta Yatsenko
` (11 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Add batch operations for BPF_MAP_TYPE_RHASH.
Batch operations:
* rhtab_map_lookup_batch: Bulk lookup of elements by bucket
* rhtab_map_lookup_and_delete_batch: Atomic bulk lookup and delete
The batch implementation uses rhashtable_walk_enter_from() to resume
iteration from the last collected key. When the buffer fills, the last
key becomes the cursor for the next batch call.
Also implements rhtab_map_mem_usage() to report memory consumption.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 134 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index e79c194e2779..a79d434dc626 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3051,19 +3051,150 @@ static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
static u64 rhtab_map_mem_usage(const struct bpf_map *map)
{
- return 0;
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ u64 num_entries;
+
+ num_entries = atomic_read(&rhtab->ht.nelems);
+ return sizeof(struct bpf_rhtab) + rhtab->elem_size * num_entries;
+}
+
+static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
+ const union bpf_attr *attr,
+ union bpf_attr __user *uattr,
+ bool do_delete)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+ void __user *uvalues = u64_to_user_ptr(attr->batch.values);
+ void __user *ukeys = u64_to_user_ptr(attr->batch.keys);
+ void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
+ void *buf = NULL, *keys = NULL, *values = NULL, *dst_key, *dst_val;
+ struct rhtab_elem **del_elems = NULL;
+ u32 max_count, total, key_size, value_size, i;
+ struct rhashtable_iter iter;
+ struct rhtab_elem *elem;
+ u64 elem_map_flags, map_flags;
+ int ret = 0;
+
+ elem_map_flags = attr->batch.elem_flags;
+ if ((elem_map_flags & ~BPF_F_LOCK) ||
+ ((elem_map_flags & BPF_F_LOCK) &&
+ !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
+ return -EINVAL;
+
+ map_flags = attr->batch.flags;
+ if (map_flags)
+ return -EINVAL;
+
+ max_count = attr->batch.count;
+ if (!max_count)
+ return 0;
+
+ if (put_user(0, &uattr->batch.count))
+ return -EFAULT;
+
+ key_size = map->key_size;
+ value_size = map->value_size;
+
+ keys = kvmalloc_array(max_count, key_size, GFP_USER | __GFP_NOWARN);
+ values = kvmalloc_array(max_count, value_size, GFP_USER | __GFP_NOWARN);
+ if (do_delete)
+ del_elems = kvmalloc_array(max_count, sizeof(void *),
+ GFP_USER | __GFP_NOWARN);
+
+ if (!keys || !values || (do_delete && !del_elems)) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ /*
+ * Use the last key from the previous batch as the cursor.
+ * enter_from positions the iterator at that key's bucket and
+ * walk_next returns the successor in O(1). On the first call
+ * (ubatch == NULL) iteration starts from bucket 0.
+ */
+ if (ubatch) {
+ buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ if (copy_from_user(buf, ubatch, key_size)) {
+ ret = -EFAULT;
+ goto free;
+ }
+ }
+
+ scoped_guard(rcu) {
+ rhashtable_walk_enter_from(&rhtab->ht, &iter, buf, rhtab->params);
+ rhashtable_walk_start(&iter);
+ }
+
+ dst_key = keys;
+ dst_val = values;
+ total = 0;
+
+ while (total < max_count) {
+ elem = rhtab_iter_next(&iter);
+ if (!elem)
+ break;
+
+ memcpy(dst_key, elem->data, key_size);
+ rhtab_read_elem_value(map, dst_val, elem, elem_map_flags);
+ check_and_init_map_value(map, dst_val);
+
+ if (do_delete)
+ del_elems[total] = elem;
+
+ dst_key += key_size;
+ dst_val += value_size;
+ total++;
+ }
+
+ if (do_delete) {
+ for (i = 0; i < total; i++)
+ rhtab_delete_elem(rhtab, del_elems[i]);
+ }
+
+ rhashtable_walk_stop(&iter);
+ rhashtable_walk_exit(&iter);
+
+ if (total == 0) {
+ ret = -ENOENT;
+ goto free;
+ }
+
+ /* Signal end of table when we collected fewer than requested */
+ if (total < max_count)
+ ret = -ENOENT;
+
+ /* Write last key as cursor for the next batch call */
+ if (copy_to_user(ukeys, keys, total * key_size) ||
+ copy_to_user(uvalues, values, total * value_size) ||
+ put_user(total, &uattr->batch.count) ||
+ copy_to_user(u64_to_user_ptr(attr->batch.out_batch),
+ dst_key - key_size, key_size)) {
+ ret = -EFAULT;
+ goto free;
+ }
+
+free:
+ kfree(buf);
+ kvfree(keys);
+ kvfree(values);
+ kvfree(del_elems);
+ return ret;
}
static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
- return 0;
+ return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, false);
}
static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
- return 0;
+ return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, true);
}
struct bpf_iter_seq_rhash_map_info {
--
2.52.0
* [PATCH RFC bpf-next v2 08/18] bpf: Implement iterator APIs for resizable hashtab
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (6 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-14 17:49 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 09/18] bpf: Implement alloc and free " Mykyta Yatsenko
` (10 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Wire up seq_file BPF iterator for BPF_MAP_TYPE_RHASH so that
bpf_iter and bpftool map dump work with resizable hash maps.
Use rhashtable_walk_enter_from() with a saved last_key to resume
iteration across read() calls without a linear skip from the
beginning on each seq_start.
Also implement rhtab_map_seq_show_elem() for bpftool map dump
in non-iterator mode.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 94 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index a79d434dc626..492c6a9154b6 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -3008,6 +3008,19 @@ static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key
static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
{
+ void *value;
+
+ /* Guarantee that the hashtab value is not freed during show */
+ guard(rcu)();
+
+ value = rhtab_map_lookup_elem(map, key);
+ if (!value)
+ return;
+
+ btf_type_seq_show(map->btf, map->btf_key_type_id, key, m);
+ seq_puts(m, ": ");
+ btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
+ seq_putc(m, '\n');
}
static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
@@ -3201,36 +3214,113 @@ struct bpf_iter_seq_rhash_map_info {
struct bpf_map *map;
struct bpf_rhtab *rhtab;
struct rhashtable_iter iter;
- u32 skip_elems;
+ void *last_key;
bool iter_active;
};
static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
{
- return NULL;
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+ struct rhtab_elem *elem;
+ void *key = *pos > 0 ? info->last_key : NULL;
+
+ scoped_guard(rcu) {
+ rhashtable_walk_enter_from(&info->rhtab->ht, &info->iter,
+ key, info->rhtab->params);
+ rhashtable_walk_start(&info->iter);
+ }
+ info->iter_active = true;
+
+ elem = rhtab_iter_next(&info->iter);
+ if (!elem)
+ return NULL;
+ /*
+ * If *pos is not 0, the previous iteration stopped on this elem,
+ * so we are restarting it; that is why there is no need to
+ * increment *pos.
+ */
+ if (*pos == 0)
+ ++*pos;
+ return elem;
}
static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
- return NULL;
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+ struct rhtab_elem *elem = v;
+
+ /* Save current key for O(1) resume in next seq_start */
+ memcpy(info->last_key, elem->data, info->map->key_size);
+
+ ++*pos;
+
+ return rhtab_iter_next(&info->iter);
+}
+
+static int __bpf_rhash_map_seq_show(struct seq_file *seq,
+ struct rhtab_elem *elem)
+{
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+ struct bpf_iter__bpf_map_elem ctx = {};
+ struct bpf_iter_meta meta;
+ struct bpf_prog *prog;
+ int ret = 0;
+
+ meta.seq = seq;
+ prog = bpf_iter_get_info(&meta, elem == NULL);
+ if (prog) {
+ ctx.meta = &meta;
+ ctx.map = info->map;
+ if (elem) {
+ ctx.key = elem->data;
+ ctx.value = rhtab_elem_value(elem, info->map->key_size);
+ }
+ ret = bpf_iter_run_prog(prog, &ctx);
+ }
+
+ return ret;
}
static int bpf_rhash_map_seq_show(struct seq_file *seq, void *v)
{
- return 0;
+ return __bpf_rhash_map_seq_show(seq, v);
}
static void bpf_rhash_map_seq_stop(struct seq_file *seq, void *v)
{
+ struct bpf_iter_seq_rhash_map_info *info = seq->private;
+
+ if (!v)
+ (void)__bpf_rhash_map_seq_show(seq, NULL);
+
+ if (info->iter_active) {
+ rhashtable_walk_stop(&info->iter);
+ rhashtable_walk_exit(&info->iter);
+ info->iter_active = false;
+ }
}
static int bpf_iter_init_rhash_map(void *priv_data, struct bpf_iter_aux_info *aux)
{
+ struct bpf_iter_seq_rhash_map_info *info = priv_data;
+ struct bpf_map *map = aux->map;
+
+ info->last_key = kmalloc(map->key_size, GFP_USER);
+ if (!info->last_key)
+ return -ENOMEM;
+
+ bpf_map_inc_with_uref(map);
+ info->map = map;
+ info->rhtab = container_of(map, struct bpf_rhtab, map);
+ info->iter_active = false;
return 0;
}
static void bpf_iter_fini_rhash_map(void *priv_data)
{
+ struct bpf_iter_seq_rhash_map_info *info = priv_data;
+
+ kfree(info->last_key);
+ bpf_map_put_with_uref(info->map);
}
static const struct seq_operations bpf_rhash_map_seq_ops = {
--
2.52.0
* [PATCH RFC bpf-next v2 09/18] bpf: Implement alloc and free for resizable hashtab
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (7 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 08/18] bpf: Implement iterator APIs " Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-12 23:15 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 10/18] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
` (9 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Initialize rhashtable with bpf_mem_alloc element cache. Require
BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via
rhashtable_free_and_destroy() callback to handle internal structs.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 60 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 492c6a9154b6..a62093d8d1ae 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2762,16 +2762,74 @@ static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
{
- return ERR_PTR(-EOPNOTSUPP);
+ struct bpf_rhtab *rhtab;
+ int err = 0;
+
+ rhtab = bpf_map_area_alloc(sizeof(*rhtab), NUMA_NO_NODE);
+ if (!rhtab)
+ return ERR_PTR(-ENOMEM);
+
+ bpf_map_init_from_attr(&rhtab->map, attr);
+
+ if (rhtab->map.max_entries > 1UL << 31) {
+ err = -E2BIG;
+ goto free_rhtab;
+ }
+
+ rhtab->elem_size = sizeof(struct rhtab_elem) + round_up(rhtab->map.key_size, 8) +
+ round_up(rhtab->map.value_size, 8);
+
+ rhtab->params.head_offset = offsetof(struct rhtab_elem, node);
+ rhtab->params.key_offset = offsetof(struct rhtab_elem, data);
+ rhtab->params.key_len = rhtab->map.key_size;
+
+ err = rhashtable_init(&rhtab->ht, &rhtab->params);
+ if (err)
+ goto free_rhtab;
+
+ /* Set max_elems after rhashtable_init() since init zeroes the struct */
+ rhtab->ht.max_elems = rhtab->map.max_entries;
+
+ err = bpf_mem_alloc_init(&rhtab->ma, rhtab->elem_size, false);
+ if (err)
+ goto destroy_rhtab;
+
+ return &rhtab->map;
+
+destroy_rhtab:
+ rhashtable_destroy(&rhtab->ht);
+free_rhtab:
+ bpf_map_area_free(rhtab);
+ return ERR_PTR(err);
}
static int rhtab_map_alloc_check(union bpf_attr *attr)
{
- return -EOPNOTSUPP;
+ if (!(attr->map_flags & BPF_F_NO_PREALLOC))
+ return -EINVAL;
+
+ if (attr->map_flags & BPF_F_ZERO_SEED)
+ return -EINVAL;
+
+ return htab_map_alloc_check(attr);
+}
+
+static void rhtab_free_elem(void *ptr, void *arg)
+{
+ struct bpf_rhtab *rhtab = arg;
+ struct rhtab_elem *elem = ptr;
+
+ bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
+ bpf_mem_cache_free_rcu(&rhtab->ma, elem);
}
static void rhtab_map_free(struct bpf_map *map)
{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+
+ rhashtable_free_and_destroy(&rhtab->ht, rhtab_free_elem, rhtab);
+ bpf_mem_alloc_destroy(&rhtab->ma);
+ bpf_map_area_free(rhtab);
}
static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
--
2.52.0
* [PATCH RFC bpf-next v2 10/18] bpf: Allow timers, workqueues and task_work in resizable hashtab
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (8 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 09/18] bpf: Implement alloc and free " Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 11/18] libbpf: Support resizable hashtable Mykyta Yatsenko
` (8 subsequent siblings)
18 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Without this, users needing deferred callbacks in a dynamically sized
map have no option: only the fixed-size maps (htab, LRU htab, array)
support these field types. The resizable hashtab should offer the same
capability.
Properly clean up BTF record fields on element delete and map
teardown by wiring up bpf_obj_free_fields through a memory allocator
destructor, matching the pattern used by htab for non-prealloc maps.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/hashtab.c | 53 +++++++++++++++++++++++++++++++++++++++++-----------
kernel/bpf/syscall.c | 1 +
2 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index a62093d8d1ae..4a631651bc70 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -501,28 +501,26 @@ static void htab_dtor_ctx_free(void *ctx)
kfree(ctx);
}
-static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *))
+static int bpf_ma_set_dtor(struct bpf_map *map, struct bpf_mem_alloc *ma,
+ void (*dtor)(void *, void *))
{
- u32 key_size = htab->map.key_size;
- struct bpf_mem_alloc *ma;
struct htab_btf_record *hrec;
int err;
/* No need for dtors. */
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(map->record))
return 0;
hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
if (!hrec)
return -ENOMEM;
- hrec->key_size = key_size;
- hrec->record = btf_record_dup(htab->map.record);
+ hrec->key_size = map->key_size;
+ hrec->record = btf_record_dup(map->record);
if (IS_ERR(hrec->record)) {
err = PTR_ERR(hrec->record);
kfree(hrec);
return err;
}
- ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
bpf_mem_alloc_set_dtor(ma, dtor, htab_dtor_ctx_free, hrec);
return 0;
}
@@ -539,9 +537,9 @@ static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf,
* populated in htab_map_alloc(), so it will always appear as NULL.
*/
if (htab_is_percpu(htab))
- return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+ return bpf_ma_set_dtor(map, &htab->pcpu_ma, htab_pcpu_mem_dtor);
else
- return htab_set_dtor(htab, htab_mem_dtor);
+ return bpf_ma_set_dtor(map, &htab->ma, htab_mem_dtor);
}
static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
@@ -2814,12 +2812,43 @@ static int rhtab_map_alloc_check(union bpf_attr *attr)
return htab_map_alloc_check(attr);
}
+static void rhtab_check_and_free_fields(struct bpf_rhtab *rhtab,
+ struct rhtab_elem *elem)
+{
+ if (IS_ERR_OR_NULL(rhtab->map.record))
+ return;
+
+ bpf_obj_free_fields(rhtab->map.record,
+ rhtab_elem_value(elem, rhtab->map.key_size));
+}
+
+static void rhtab_mem_dtor(void *obj, void *ctx)
+{
+ struct htab_btf_record *hrec = ctx;
+ struct rhtab_elem *elem = obj;
+
+ if (IS_ERR_OR_NULL(hrec->record))
+ return;
+
+ bpf_obj_free_fields(hrec->record,
+ rhtab_elem_value(elem, hrec->key_size));
+}
+
+static int rhtab_map_check_btf(struct bpf_map *map, const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
+{
+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
+
+ return bpf_ma_set_dtor(map, &rhtab->ma, rhtab_mem_dtor);
+}
+
static void rhtab_free_elem(void *ptr, void *arg)
{
struct bpf_rhtab *rhtab = arg;
struct rhtab_elem *elem = ptr;
- bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
+ rhtab_check_and_free_fields(rhtab, elem);
bpf_mem_cache_free_rcu(&rhtab->ma, elem);
}
@@ -2857,7 +2886,7 @@ static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
if (err)
return err;
- bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
+ rhtab_check_and_free_fields(rhtab, elem);
bpf_mem_cache_free_rcu(&rhtab->ma, elem);
return 0;
}
@@ -2951,6 +2980,7 @@ static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u
memcpy(elem->data, key, map->key_size);
copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
+ check_and_init_map_value(map, rhtab_elem_value(elem, map->key_size));
tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);
if (tmp) {
@@ -3403,6 +3433,7 @@ const struct bpf_map_ops rhtab_map_ops = {
.map_free = rhtab_map_free,
.map_get_next_key = rhtab_map_get_next_key,
.map_release_uref = rhtab_map_free_internal_structs,
+ .map_check_btf = rhtab_map_check_btf,
.map_lookup_elem = rhtab_map_lookup_elem,
.map_lookup_and_delete_elem = rhtab_map_lookup_and_delete_elem,
.map_update_elem = rhtab_map_update_elem,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0a5ec417638d..792dbe288093 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1302,6 +1302,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
case BPF_WORKQUEUE:
case BPF_TASK_WORK:
if (map->map_type != BPF_MAP_TYPE_HASH &&
+ map->map_type != BPF_MAP_TYPE_RHASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
ret = -EOPNOTSUPP;
--
2.52.0
* [PATCH RFC bpf-next v2 11/18] libbpf: Support resizable hashtable
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (9 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 10/18] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-14 17:46 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 12/18] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
` (7 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Add BPF_MAP_TYPE_RHASH to libbpf's map type name table and feature
probing so that libbpf-based tools can create and identify resizable
hash maps.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/lib/bpf/libbpf.c | 1 +
tools/lib/bpf/libbpf_probes.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9ea41f40dc82..a0324e5b6085 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -192,6 +192,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
[BPF_MAP_TYPE_ARENA] = "arena",
[BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
+ [BPF_MAP_TYPE_RHASH] = "rhash",
};
static const char * const prog_type_name[] = {
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index b70d9637ecf5..e40819465ddc 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -309,6 +309,9 @@ static int probe_map_create(enum bpf_map_type map_type)
value_size = sizeof(__u64);
opts.map_flags = BPF_F_NO_PREALLOC;
break;
+ case BPF_MAP_TYPE_RHASH:
+ opts.map_flags = BPF_F_NO_PREALLOC;
+ break;
case BPF_MAP_TYPE_CGROUP_STORAGE:
case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
key_size = sizeof(struct bpf_cgroup_storage_key);
--
2.52.0
* [PATCH RFC bpf-next v2 12/18] selftests/bpf: Add basic tests for resizable hash map
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (10 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 11/18] libbpf: Support resizable hashtable Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-12 23:16 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 13/18] selftests/bpf: Support resizable hashtab in test_maps Mykyta Yatsenko
` (6 subsequent siblings)
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Test basic map operations (lookup, update, delete) for
BPF_MAP_TYPE_RHASH including boundary conditions like duplicate
key insertion and deletion of nonexistent keys.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/prog_tests/rhash.c | 64 +++++++
tools/testing/selftests/bpf/progs/rhash.c | 242 +++++++++++++++++++++++++
2 files changed, 306 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
new file mode 100644
index 000000000000..40f30ce69190
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <string.h>
+#include <stdio.h>
+#include "rhash.skel.h"
+#include <linux/bpf.h>
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+
+static void rhash_run(const char *prog_name)
+{
+ struct rhash *skel;
+ struct bpf_program *prog;
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ int err;
+
+ skel = rhash__open();
+ if (!ASSERT_OK_PTR(skel, "rhash__open"))
+ return;
+
+ prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+ if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+ goto cleanup;
+ bpf_program__set_autoload(prog, true);
+
+ err = rhash__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
+ if (!ASSERT_OK(err, "prog run"))
+ goto cleanup;
+
+ if (!ASSERT_OK(skel->bss->err, "bss->err"))
+ goto cleanup;
+
+cleanup:
+ rhash__destroy(skel);
+}
+
+void test_rhash(void)
+{
+ if (test__start_subtest("test_rhash_lookup_update"))
+ rhash_run("test_rhash_lookup_update");
+
+ if (test__start_subtest("test_rhash_update_delete"))
+ rhash_run("test_rhash_update_delete");
+
+ if (test__start_subtest("test_rhash_update_elements"))
+ rhash_run("test_rhash_update_elements");
+
+ if (test__start_subtest("test_rhash_update_exist"))
+ rhash_run("test_rhash_update_exist");
+
+ if (test__start_subtest("test_rhash_update_any"))
+ rhash_run("test_rhash_update_any");
+
+ if (test__start_subtest("test_rhash_noexist_duplicate"))
+ rhash_run("test_rhash_noexist_duplicate");
+
+ if (test__start_subtest("test_rhash_delete_nonexistent"))
+ rhash_run("test_rhash_delete_nonexistent");
+}
diff --git a/tools/testing/selftests/bpf/progs/rhash.c b/tools/testing/selftests/bpf/progs/rhash.c
new file mode 100644
index 000000000000..2cd41324bcb9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rhash.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <string.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+
+#define ENOENT 2
+#define EEXIST 17
+
+char _license[] SEC("license") = "GPL";
+
+int err;
+
+struct elem {
+ char arr[128];
+ int val;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RHASH);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __uint(max_entries, 128);
+ __type(key, int);
+ __type(value, struct elem);
+} rhmap SEC(".maps");
+
+SEC("syscall")
+int test_rhash_lookup_update(void *ctx)
+{
+ int key = 5;
+ struct elem empty = {.val = 3, .arr = {0}};
+ struct elem *e;
+
+ err = 1;
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 1;
+
+ err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+ if (err)
+ return 1;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e && e->val == empty.val) {
+ bpf_printk("Lookup succeeded!");
+ err = 0;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_delete(void *ctx)
+{
+ int key = 6;
+ struct elem empty = {.val = 4, .arr = {0}};
+ struct elem *e;
+
+ err = 1;
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 1;
+
+ err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+ if (err)
+ return 2;
+
+ err = bpf_map_delete_elem(&rhmap, &key);
+ if (err)
+ return 3;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 4;
+
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_elements(void *ctx)
+{
+ int key = 0;
+ struct elem empty = {.val = 4, .arr = {0}};
+ struct elem *e;
+ int i;
+
+ err = 1;
+
+ for (i = 0; i < 128; ++i) {
+ key = i;
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 1;
+
+ empty.val = key;
+ err = bpf_map_update_elem(&rhmap, &key, &empty, BPF_NOEXIST);
+ if (err)
+ return 2;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != key)
+ return 4;
+ }
+
+ for (i = 0; i < 128; ++i) {
+ key = i;
+ err = bpf_map_delete_elem(&rhmap, &key);
+ if (err)
+ return 3;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (e)
+ return 4;
+ }
+
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_exist(void *ctx)
+{
+ int key = 10;
+ struct elem val1 = {.val = 100, .arr = {0}};
+ struct elem val2 = {.val = 200, .arr = {0}};
+ struct elem *e;
+ int ret;
+
+ err = 1;
+
+ /* BPF_EXIST on non-existent key should fail with -ENOENT */
+ ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_EXIST);
+ if (ret != -ENOENT)
+ return 1;
+
+ /* Insert element first */
+ ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_NOEXIST);
+ if (ret)
+ return 2;
+
+ /* Verify initial value */
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 100)
+ return 3;
+
+ /* BPF_EXIST on existing key should succeed and update value */
+ ret = bpf_map_update_elem(&rhmap, &key, &val2, BPF_EXIST);
+ if (ret)
+ return 4;
+
+ /* Verify value was updated */
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 200)
+ return 5;
+
+ /* Cleanup */
+ bpf_map_delete_elem(&rhmap, &key);
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_update_any(void *ctx)
+{
+ int key = 11;
+ struct elem val1 = {.val = 111, .arr = {0}};
+ struct elem val2 = {.val = 222, .arr = {0}};
+ struct elem *e;
+ int ret;
+
+ err = 1;
+
+ /* BPF_ANY on non-existent key should insert */
+ ret = bpf_map_update_elem(&rhmap, &key, &val1, BPF_ANY);
+ if (ret)
+ return 1;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 111)
+ return 2;
+
+ /* BPF_ANY on existing key should update */
+ ret = bpf_map_update_elem(&rhmap, &key, &val2, BPF_ANY);
+ if (ret)
+ return 3;
+
+ e = bpf_map_lookup_elem(&rhmap, &key);
+ if (!e || e->val != 222)
+ return 4;
+
+ /* Cleanup */
+ bpf_map_delete_elem(&rhmap, &key);
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_noexist_duplicate(void *ctx)
+{
+ int key = 12;
+ struct elem val = {.val = 600, .arr = {0}};
+ int ret;
+
+ err = 1;
+
+ /* Insert element */
+ ret = bpf_map_update_elem(&rhmap, &key, &val, BPF_NOEXIST);
+ if (ret)
+ return 1;
+
+ /* Try to insert again with BPF_NOEXIST - should fail with -EEXIST */
+ ret = bpf_map_update_elem(&rhmap, &key, &val, BPF_NOEXIST);
+ if (ret != -EEXIST)
+ return 2;
+
+ /* Cleanup */
+ bpf_map_delete_elem(&rhmap, &key);
+ err = 0;
+ return 0;
+}
+
+SEC("syscall")
+int test_rhash_delete_nonexistent(void *ctx)
+{
+ int key = 99999;
+ int ret;
+
+ err = 1;
+
+ /* Delete non-existent key should return -ENOENT */
+ ret = bpf_map_delete_elem(&rhmap, &key);
+ if (ret != -ENOENT)
+ return 1;
+
+ err = 0;
+ return 0;
+}
--
2.52.0
* [PATCH RFC bpf-next v2 13/18] selftests/bpf: Support resizable hashtab in test_maps
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Parameterize existing htab tests (test_hashmap, test_hashmap_sizes,
test_hashmap_walk, htab_map_batch_ops) by map type so they run
against BPF_MAP_TYPE_RHASH as well. Relax ordering and exact-count
assertions where rhashtable resize makes them non-deterministic.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
.../selftests/bpf/map_tests/htab_map_batch_ops.c | 22 +++-
tools/testing/selftests/bpf/test_maps.c | 127 ++++++++++++++++-----
2 files changed, 114 insertions(+), 35 deletions(-)
diff --git a/tools/testing/selftests/bpf/map_tests/htab_map_batch_ops.c b/tools/testing/selftests/bpf/map_tests/htab_map_batch_ops.c
index 5da493b94ae2..e8e83bf95be4 100644
--- a/tools/testing/selftests/bpf/map_tests/htab_map_batch_ops.c
+++ b/tools/testing/selftests/bpf/map_tests/htab_map_batch_ops.c
@@ -74,10 +74,11 @@ static void map_batch_verify(int *visited, __u32 max_entries,
}
}
-void __test_map_lookup_and_delete_batch(bool is_pcpu)
+void __test_map_lookup_and_delete_batch(enum bpf_map_type map_type)
{
__u32 batch, count, total, total_success;
typedef BPF_DECLARE_PERCPU(int, value);
+ bool is_pcpu = (map_type == BPF_MAP_TYPE_PERCPU_HASH);
int map_fd, *keys, *visited, key;
const __u32 max_entries = 10;
value pcpu_values[max_entries];
@@ -88,9 +89,13 @@ void __test_map_lookup_and_delete_batch(bool is_pcpu)
.elem_flags = 0,
.flags = 0,
);
+ struct bpf_map_create_opts map_opts = {
+ .sz = sizeof(map_opts),
+ .map_flags = (map_type == BPF_MAP_TYPE_RHASH) ? BPF_F_NO_PREALLOC : 0,
+ };
- map_fd = bpf_map_create(is_pcpu ? BPF_MAP_TYPE_PERCPU_HASH : BPF_MAP_TYPE_HASH,
- "hash_map", sizeof(int), sizeof(int), max_entries, NULL);
+ map_fd = bpf_map_create(map_type, "hash_map", sizeof(int), sizeof(int),
+ max_entries, &map_opts);
CHECK(map_fd == -1,
"bpf_map_create()", "error:%s\n", strerror(errno));
@@ -261,13 +266,19 @@ void __test_map_lookup_and_delete_batch(bool is_pcpu)
void htab_map_batch_ops(void)
{
- __test_map_lookup_and_delete_batch(false);
+ __test_map_lookup_and_delete_batch(BPF_MAP_TYPE_HASH);
printf("test_%s:PASS\n", __func__);
}
void htab_percpu_map_batch_ops(void)
{
- __test_map_lookup_and_delete_batch(true);
+ __test_map_lookup_and_delete_batch(BPF_MAP_TYPE_PERCPU_HASH);
+ printf("test_%s:PASS\n", __func__);
+}
+
+void rhtab_map_batch_ops(void)
+{
+ __test_map_lookup_and_delete_batch(BPF_MAP_TYPE_RHASH);
printf("test_%s:PASS\n", __func__);
}
@@ -275,4 +286,5 @@ void test_htab_map_batch_ops(void)
{
htab_map_batch_ops();
htab_percpu_map_batch_ops();
+ rhtab_map_batch_ops();
}
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index ccc5acd55ff9..2e06b7e0c3ba 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -30,12 +30,23 @@ int skips;
static struct bpf_map_create_opts map_opts = { .sz = sizeof(map_opts) };
+static bool skip_test(enum bpf_map_type map_type)
+{
+ return map_type == BPF_MAP_TYPE_RHASH &&
+ !(map_opts.map_flags & BPF_F_NO_PREALLOC);
+}
+
static void test_hashmap(unsigned int task, void *data)
{
+ enum bpf_map_type map_type = data ? *(enum bpf_map_type *)data : BPF_MAP_TYPE_HASH;
long long key, next_key, first_key, value;
int fd;
- fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value), 2, &map_opts);
+ /* RHASH doesn't support prealloc mode */
+ if (skip_test(map_type))
+ return;
+
+ fd = bpf_map_create(map_type, NULL, sizeof(key), sizeof(value), 2, &map_opts);
if (fd < 0) {
printf("Failed to create hashmap '%s'!\n", strerror(errno));
exit(1);
@@ -128,11 +139,16 @@ static void test_hashmap(unsigned int task, void *data)
static void test_hashmap_sizes(unsigned int task, void *data)
{
+ enum bpf_map_type map_type = data ? *(enum bpf_map_type *)data : BPF_MAP_TYPE_HASH;
int fd, i, j;
+ /* RHASH doesn't support prealloc mode */
+ if (skip_test(map_type))
+ return;
+
for (i = 1; i <= 512; i <<= 1)
for (j = 1; j <= 1 << 18; j <<= 1) {
- fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, i, j, 2, &map_opts);
+ fd = bpf_map_create(map_type, NULL, i, j, 2, &map_opts);
if (fd < 0) {
if (errno == ENOMEM)
return;
@@ -261,12 +277,12 @@ static void test_hashmap_percpu(unsigned int task, void *data)
}
#define VALUE_SIZE 3
-static int helper_fill_hashmap(int max_entries)
+static int helper_fill_hashmap(int max_entries, enum bpf_map_type map_type)
{
int i, fd, ret;
long long key, value[VALUE_SIZE] = {};
- fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+ fd = bpf_map_create(map_type, NULL, sizeof(key), sizeof(value),
max_entries, &map_opts);
CHECK(fd < 0,
"failed to create hashmap",
@@ -285,19 +301,32 @@ static int helper_fill_hashmap(int max_entries)
static void test_hashmap_walk(unsigned int task, void *data)
{
+ enum bpf_map_type map_type = data ? *(enum bpf_map_type *)data : BPF_MAP_TYPE_HASH;
int fd, i, max_entries = 10000;
long long key, value[VALUE_SIZE], next_key;
bool next_key_valid = true;
- fd = helper_fill_hashmap(max_entries);
+ /* RHASH doesn't support prealloc mode */
+ if (skip_test(map_type))
+ return;
+
+ fd = helper_fill_hashmap(max_entries, map_type);
for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
&next_key) == 0; i++) {
key = next_key;
- assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+ if (map_type == BPF_MAP_TYPE_RHASH)
+ /*
+ * During chained rhashtable resize, a key visible to
+ * get_next_key can be transiently invisible to lookup.
+ */
+ bpf_map_lookup_elem(fd, &key, value);
+ else
+ assert(bpf_map_lookup_elem(fd, &key, value) == 0);
}
- assert(i == max_entries);
+ /* rhash does not guarantee that every element is visited, or that none is visited twice */
+ assert(map_type == BPF_MAP_TYPE_RHASH || i == max_entries);
assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
for (i = 0; next_key_valid; i++) {
@@ -308,16 +337,23 @@ static void test_hashmap_walk(unsigned int task, void *data)
key = next_key;
}
- assert(i == max_entries);
+ assert(map_type == BPF_MAP_TYPE_RHASH || i == max_entries);
for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
&next_key) == 0; i++) {
key = next_key;
assert(bpf_map_lookup_elem(fd, &key, value) == 0);
- assert(value[0] - 1 == key);
+ /*
+ * Async rhashtable resize (triggered by the fill above) can
+ * still be in progress. get_next_key follows future_tbl and
+ * may revisit elements, so some values get incremented more
+ * than once.
+ */
+ assert(map_type == BPF_MAP_TYPE_RHASH ||
+ value[0] - 1 == key);
}
- assert(i == max_entries);
+ assert(map_type == BPF_MAP_TYPE_RHASH || i == max_entries);
close(fd);
}
@@ -329,8 +365,8 @@ static void test_hashmap_zero_seed(void)
old_flags = map_opts.map_flags;
map_opts.map_flags |= BPF_F_ZERO_SEED;
- first = helper_fill_hashmap(3);
- second = helper_fill_hashmap(3);
+ first = helper_fill_hashmap(3, BPF_MAP_TYPE_HASH);
+ second = helper_fill_hashmap(3, BPF_MAP_TYPE_HASH);
for (i = 0; ; i++) {
void *key_ptr = !i ? NULL : &key;
@@ -1301,7 +1337,7 @@ static void test_map_in_map(void)
#define MAP_SIZE (32 * 1024)
-static void test_map_large(void)
+static void test_map_large(enum bpf_map_type map_type)
{
struct bigkey {
@@ -1311,7 +1347,11 @@ static void test_map_large(void)
} key;
int fd, i, value;
- fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+ /* RHASH doesn't support prealloc mode */
+ if (skip_test(map_type))
+ return;
+
+ fd = bpf_map_create(map_type, NULL, sizeof(key), sizeof(value),
MAP_SIZE, &map_opts);
if (fd < 0) {
printf("Failed to create large map '%s'!\n", strerror(errno));
@@ -1378,10 +1418,16 @@ static void __run_parallel(unsigned int tasks,
static void test_map_stress(void)
{
- run_parallel(100, test_hashmap_walk, NULL);
- run_parallel(100, test_hashmap, NULL);
+ enum bpf_map_type hash_type = BPF_MAP_TYPE_HASH;
+ enum bpf_map_type rhash_type = BPF_MAP_TYPE_RHASH;
+
+ run_parallel(100, test_hashmap_walk, &hash_type);
+ run_parallel(100, test_hashmap_walk, &rhash_type);
+ run_parallel(100, test_hashmap, &hash_type);
+ run_parallel(100, test_hashmap, &rhash_type);
run_parallel(100, test_hashmap_percpu, NULL);
- run_parallel(100, test_hashmap_sizes, NULL);
+ run_parallel(100, test_hashmap_sizes, &hash_type);
+ run_parallel(100, test_hashmap_sizes, &rhash_type);
run_parallel(100, test_arraymap, NULL);
run_parallel(100, test_arraymap_percpu, NULL);
@@ -1399,7 +1445,7 @@ static void test_map_stress(void)
static bool can_retry(int err)
{
return (err == EAGAIN || err == EBUSY ||
- ((err == ENOMEM || err == E2BIG) &&
+ ((err == ENOMEM || err == E2BIG || err == ENOENT) &&
map_opts.map_flags == BPF_F_NO_PREALLOC));
}
@@ -1471,12 +1517,16 @@ static void test_update_delete(unsigned int fn, void *data)
}
}
-static void test_map_parallel(void)
+static void test_map_parallel(enum bpf_map_type map_type)
{
int i, fd, key = 0, value = 0, j = 0;
int data[2];
- fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+ /* RHASH doesn't support prealloc mode */
+ if (skip_test(map_type))
+ return;
+
+ fd = bpf_map_create(map_type, NULL, sizeof(key), sizeof(value),
MAP_SIZE, &map_opts);
if (fd < 0) {
printf("Failed to create map for parallel test '%s'!\n",
@@ -1529,14 +1579,18 @@ static void test_map_parallel(void)
close(fd);
}
-static void test_map_rdonly(void)
+static void test_map_rdonly(enum bpf_map_type map_type)
{
int fd, key = 0, value = 0;
__u32 old_flags;
+ /* RHASH doesn't support prealloc mode */
+ if (skip_test(map_type))
+ return;
+
old_flags = map_opts.map_flags;
map_opts.map_flags |= BPF_F_RDONLY;
- fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+ fd = bpf_map_create(map_type, NULL, sizeof(key), sizeof(value),
MAP_SIZE, &map_opts);
map_opts.map_flags = old_flags;
if (fd < 0) {
@@ -1558,14 +1612,18 @@ static void test_map_rdonly(void)
close(fd);
}
-static void test_map_wronly_hash(void)
+static void test_map_wronly_hash(enum bpf_map_type map_type)
{
int fd, key = 0, value = 0;
__u32 old_flags;
+ /* RHASH doesn't support prealloc mode */
+ if (skip_test(map_type))
+ return;
+
old_flags = map_opts.map_flags;
map_opts.map_flags |= BPF_F_WRONLY;
- fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+ fd = bpf_map_create(map_type, NULL, sizeof(key), sizeof(value),
MAP_SIZE, &map_opts);
map_opts.map_flags = old_flags;
if (fd < 0) {
@@ -1623,7 +1681,8 @@ static void test_map_wronly_stack_or_queue(enum bpf_map_type map_type)
static void test_map_wronly(void)
{
- test_map_wronly_hash();
+ test_map_wronly_hash(BPF_MAP_TYPE_HASH);
+ test_map_wronly_hash(BPF_MAP_TYPE_RHASH);
test_map_wronly_stack_or_queue(BPF_MAP_TYPE_STACK);
test_map_wronly_stack_or_queue(BPF_MAP_TYPE_QUEUE);
}
@@ -1883,9 +1942,14 @@ static void test_reuseport_array(void)
static void run_all_tests(void)
{
- test_hashmap(0, NULL);
+ enum bpf_map_type hash_type = BPF_MAP_TYPE_HASH;
+ enum bpf_map_type rhash_type = BPF_MAP_TYPE_RHASH;
+
+ test_hashmap(0, &hash_type);
+ test_hashmap(0, &rhash_type);
test_hashmap_percpu(0, NULL);
- test_hashmap_walk(0, NULL);
+ test_hashmap_walk(0, &hash_type);
+ test_hashmap_walk(0, &rhash_type);
test_hashmap_zero_seed();
test_arraymap(0, NULL);
@@ -1897,11 +1961,14 @@ static void run_all_tests(void)
test_devmap_hash(0, NULL);
test_sockmap(0, NULL);
- test_map_large();
- test_map_parallel();
+ test_map_large(BPF_MAP_TYPE_HASH);
+ test_map_large(BPF_MAP_TYPE_RHASH);
+ test_map_parallel(BPF_MAP_TYPE_HASH);
+ test_map_parallel(BPF_MAP_TYPE_RHASH);
test_map_stress();
- test_map_rdonly();
+ test_map_rdonly(BPF_MAP_TYPE_HASH);
+ test_map_rdonly(BPF_MAP_TYPE_RHASH);
test_map_wronly();
test_reuseport_array();
--
2.52.0
* [PATCH RFC bpf-next v2 14/18] selftests/bpf: Resizable hashtab BPF_F_LOCK tests
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Add tests validating that the resizable hash map handles the BPF_F_LOCK
flag as expected.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/prog_tests/rhash.c | 99 ++++++++++++++++++++++++++
tools/testing/selftests/bpf/progs/rhash.c | 35 +++++++++
2 files changed, 134 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
index 40f30ce69190..4fa8ef582b01 100644
--- a/tools/testing/selftests/bpf/prog_tests/rhash.c
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -7,6 +7,7 @@
#include <linux/bpf.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
+#include <network_helpers.h>
static void rhash_run(const char *prog_name)
{
@@ -39,6 +40,100 @@ static void rhash_run(const char *prog_name)
rhash__destroy(skel);
}
+struct lock_thread_args {
+ int prog_fd;
+ int map_fd;
+};
+
+struct lock_elem {
+ struct bpf_spin_lock lock;
+ int var[16];
+};
+
+static void *spin_lock_thread(void *arg)
+{
+ struct lock_thread_args *args = arg;
+ LIBBPF_OPTS(bpf_test_run_opts, topts,
+ .data_in = &pkt_v4,
+ .data_size_in = sizeof(pkt_v4),
+ .repeat = 10000,
+ );
+ int err;
+
+ err = bpf_prog_test_run_opts(args->prog_fd, &topts);
+ if (err || topts.retval)
+ return (void *)1;
+
+ return (void *)0;
+}
+
+static void *parallel_map_access(void *arg)
+{
+ struct lock_thread_args *args = arg;
+ int i, j, key = 0;
+ int err;
+ struct lock_elem val;
+
+ for (i = 0; i < 10000; i++) {
+ err = bpf_map_lookup_elem_flags(args->map_fd, &key, &val, BPF_F_LOCK);
+ if (err)
+ return (void *)1;
+ if (val.lock.val)
+ return (void *)1;
+ for (j = 1; j < 16; j++) {
+ if (val.var[j] != val.var[0])
+ return (void *)1;
+ }
+ }
+
+ return (void *)0;
+}
+
+static void rhash_spin_lock_test(void)
+{
+ struct lock_thread_args args;
+ struct rhash *skel;
+ struct lock_elem val = {};
+ pthread_t thread_id[4];
+ int err, key = 0, i;
+ void *ret;
+
+ skel = rhash__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "rhash__open_and_load"))
+ return;
+
+ args.prog_fd = bpf_program__fd(skel->progs.test_rhash_spin_lock);
+ args.map_fd = bpf_map__fd(skel->maps.rhmap_lock);
+
+ /* Insert initial element with BPF_F_LOCK */
+ err = bpf_map_update_elem(args.map_fd, &key, &val, BPF_F_LOCK);
+ if (!ASSERT_OK(err, "initial update"))
+ goto cleanup;
+
+ /* Spawn 2 threads running BPF program (uses bpf_spin_lock) */
+ for (i = 0; i < 2; i++)
+ if (!ASSERT_OK(pthread_create(&thread_id[i], NULL,
+ &spin_lock_thread, &args),
+ "pthread_create spin_lock"))
+ goto cleanup;
+
+ /* Spawn 2 threads doing parallel map access with BPF_F_LOCK */
+ for (i = 2; i < 4; i++)
+ if (!ASSERT_OK(pthread_create(&thread_id[i], NULL,
+ ¶llel_map_access, &args),
+ "pthread_create parallel_map_access"))
+ goto cleanup;
+
+ /* Wait for all threads */
+ for (i = 0; i < 4; i++)
+ if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join") ||
+ !ASSERT_OK((long)ret, "thread ret"))
+ goto cleanup;
+
+cleanup:
+ rhash__destroy(skel);
+}
+
void test_rhash(void)
{
if (test__start_subtest("test_rhash_lookup_update"))
@@ -61,4 +156,8 @@ void test_rhash(void)
if (test__start_subtest("test_rhash_delete_nonexistent"))
rhash_run("test_rhash_delete_nonexistent");
+
+ if (test__start_subtest("test_rhash_spin_lock"))
+ rhash_spin_lock_test();
}
+
diff --git a/tools/testing/selftests/bpf/progs/rhash.c b/tools/testing/selftests/bpf/progs/rhash.c
index 2cd41324bcb9..0b30d0ec779d 100644
--- a/tools/testing/selftests/bpf/progs/rhash.c
+++ b/tools/testing/selftests/bpf/progs/rhash.c
@@ -240,3 +240,38 @@ int test_rhash_delete_nonexistent(void *ctx)
err = 0;
return 0;
}
+
+#define VAR_NUM 16
+
+struct lock_elem {
+ struct bpf_spin_lock lock;
+ int var[VAR_NUM];
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RHASH);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __uint(max_entries, 1);
+ __type(key, __u32);
+ __type(value, struct lock_elem);
+} rhmap_lock SEC(".maps");
+
+SEC("cgroup/skb")
+int test_rhash_spin_lock(struct __sk_buff *skb)
+{
+ struct lock_elem *val;
+ int rnd = bpf_get_prandom_u32();
+ int key = 0, i;
+
+ val = bpf_map_lookup_elem(&rhmap_lock, &key);
+ if (!val)
+ return 1;
+
+ /* spin_lock in resizable hash map */
+ bpf_spin_lock(&val->lock);
+ for (i = 0; i < VAR_NUM; i++)
+ val->var[i] = rnd;
+ bpf_spin_unlock(&val->lock);
+
+ return 0;
+}
--
2.52.0
* [PATCH RFC bpf-next v2 15/18] selftests/bpf: Add stress tests for resizable hash get_next_key
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Test get_next_key behavior under concurrent modification:
* Resize test: verify all elements visited after resize trigger
* Stress test: concurrent iterators and modifiers to detect races
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/prog_tests/rhash.c | 181 +++++++++++++++++++++++++
tools/testing/selftests/bpf/progs/rhash.c | 8 ++
2 files changed, 189 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
index 4fa8ef582b01..53ccc9366b5a 100644
--- a/tools/testing/selftests/bpf/prog_tests/rhash.c
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -134,6 +134,181 @@ static void rhash_spin_lock_test(void)
rhash__destroy(skel);
}
+struct iter_thread_args {
+ int map_fd;
+ int stop;
+ int error;
+};
+
+static void *get_next_key_thread(void *arg)
+{
+ struct iter_thread_args *args = arg;
+ int key, next_key;
+ int i = 0;
+
+ for (i = 0; i < 1000; i++) {
+ if (READ_ONCE(args->stop))
+ break;
+
+ if (bpf_map_get_next_key(args->map_fd, NULL, &next_key) != 0) {
+ WRITE_ONCE(args->error, 1);
+ continue;
+ }
+
+ key = next_key;
+ while (bpf_map_get_next_key(args->map_fd, &key, &next_key) == 0)
+ key = next_key;
+ }
+
+ return (void *)0;
+}
+
+static void *modifier_thread(void *arg)
+{
+ struct iter_thread_args *args = arg;
+ int key, value;
+ int i;
+
+ for (i = 0; i < 10000; i++) {
+ if (READ_ONCE(args->stop))
+ break;
+
+ key = i;
+ value = i;
+ if (bpf_map_update_elem(args->map_fd, &key, &value, BPF_ANY))
+ WRITE_ONCE(args->error, 1);
+ }
+
+ return (void *)0;
+}
+
+static void rhash_get_next_key_stress_test(void)
+{
+ struct iter_thread_args args = {};
+ struct rhash *skel;
+ pthread_t iter_threads[2];
+ pthread_t mod_threads[2];
+ int key, value;
+ int err, i;
+ void *ret;
+
+ skel = rhash__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "rhash__open_and_load"))
+ return;
+
+ args.map_fd = bpf_map__fd(skel->maps.rhmap_iter);
+ args.stop = 0;
+ args.error = 0;
+
+ /* Pre-populate map */
+ for (i = 0; i < 50; i++) {
+ key = i;
+ value = i;
+ err = bpf_map_update_elem(args.map_fd, &key, &value, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "initial insert"))
+ goto cleanup;
+ }
+
+ /* Iterator threads */
+ for (i = 0; i < 2; i++)
+ if (!ASSERT_OK(pthread_create(&iter_threads[i], NULL,
+ &get_next_key_thread, &args),
+ "pthread_create iter"))
+ goto cleanup;
+
+ /* Modifier threads */
+ for (i = 0; i < 2; i++)
+ if (!ASSERT_OK(pthread_create(&mod_threads[i], NULL,
+ &modifier_thread, &args),
+ "pthread_create mod"))
+ goto cleanup;
+
+ /* Wait for modifier threads to finish */
+ for (i = 0; i < 2; i++)
+ pthread_join(mod_threads[i], &ret);
+
+ /* Signal iterator threads to stop */
+ WRITE_ONCE(args.stop, 1);
+
+ /* Wait for iterator threads */
+ for (i = 0; i < 2; i++)
+ if (!ASSERT_OK(pthread_join(iter_threads[i], &ret), "pthread_join iter") ||
+ !ASSERT_OK((long)ret, "iter thread ret"))
+ goto cleanup;
+
+ ASSERT_EQ(args.error, 0, "no infinite loop");
+
+cleanup:
+ rhash__destroy(skel);
+}
+
+static void iterate_all(int map_fd, int num_elems)
+{
+ int *visited, key, next_key, i, err;
+
+ visited = calloc(num_elems, sizeof(int));
+ if (!ASSERT_TRUE(visited, "calloc"))
+ return;
+ memset(visited, 0, num_elems * sizeof(int));
+
+ for (err = bpf_map_get_next_key(map_fd, NULL, &next_key); err == 0;
+ err = bpf_map_get_next_key(map_fd, &key, &next_key)) {
+ key = next_key;
+ if (ASSERT_TRUE(key >= 0 && key < num_elems, "key valid"))
+ visited[key] += 1;
+ }
+
+ for (i = 0; i < num_elems; i++)
+ ASSERT_EQ(visited[i], 1, "element visited");
+
+ free(visited);
+}
+
+static void rhash_get_next_key_resize_test(void)
+{
+ struct rhash *skel;
+ int key, next_key, value;
+ int map_fd;
+ int err, i;
+
+ skel = rhash__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "rhash__open_and_load"))
+ return;
+
+ map_fd = bpf_map__fd(skel->maps.rhmap_iter);
+
+ /* Phase 1: small table, no resize - verify completeness */
+ for (i = 0; i < 4; i++) {
+ key = i;
+ value = i;
+ err = bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "insert small"))
+ goto cleanup;
+ }
+ iterate_all(map_fd, 4);
+
+ /* Phase 2: trigger resize by inserting more elements */
+ for (i = 4; i < 100; i++) {
+ key = i;
+ value = i;
+ err = bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "insert resize"))
+ goto cleanup;
+
+ /* Full iteration during resize - verify all code paths are safe */
+ for (err = bpf_map_get_next_key(map_fd, NULL, &next_key); err == 0;
+ err = bpf_map_get_next_key(map_fd, &key, &next_key)) {
+ key = next_key;
+ }
+ }
+
+ /* Phase 3: after resize settled - verify completeness */
+ iterate_all(map_fd, 100);
+
+cleanup:
+ rhash__destroy(skel);
+}
+
void test_rhash(void)
{
if (test__start_subtest("test_rhash_lookup_update"))
@@ -159,5 +334,11 @@ void test_rhash(void)
if (test__start_subtest("test_rhash_spin_lock"))
rhash_spin_lock_test();
+
+ if (test__start_subtest("test_rhash_get_next_key_resize"))
+ rhash_get_next_key_resize_test();
+
+ if (test__start_subtest("test_rhash_get_next_key_stress"))
+ rhash_get_next_key_stress_test();
}
diff --git a/tools/testing/selftests/bpf/progs/rhash.c b/tools/testing/selftests/bpf/progs/rhash.c
index 0b30d0ec779d..ebbc1ba63d8b 100644
--- a/tools/testing/selftests/bpf/progs/rhash.c
+++ b/tools/testing/selftests/bpf/progs/rhash.c
@@ -256,6 +256,14 @@ struct {
__type(value, struct lock_elem);
} rhmap_lock SEC(".maps");
+struct {
+ __uint(type, BPF_MAP_TYPE_RHASH);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __uint(max_entries, 65536);
+ __type(key, int);
+ __type(value, int);
+} rhmap_iter SEC(".maps");
+
SEC("cgroup/skb")
int test_rhash_spin_lock(struct __sk_buff *skb)
{
--
2.52.0
* [PATCH RFC bpf-next v2 16/18] selftests/bpf: Add BPF iterator tests for resizable hash map
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Test BPF iterator functionality for BPF_MAP_TYPE_RHASH:
* Basic iteration verifying all elements are visited
* Overflow test triggering seq_file restart, validating correct
resume behavior via skip_elems tracking
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/prog_tests/rhash.c | 160 ++++++++++++++++++++-
.../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c | 75 ++++++++++
2 files changed, 234 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
index 53ccc9366b5a..f28296b16593 100644
--- a/tools/testing/selftests/bpf/prog_tests/rhash.c
+++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
@@ -4,6 +4,7 @@
#include <string.h>
#include <stdio.h>
#include "rhash.skel.h"
+#include "bpf_iter_bpf_rhash_map.skel.h"
#include <linux/bpf.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
@@ -309,6 +310,158 @@ static void rhash_get_next_key_resize_test(void)
rhash__destroy(skel);
}
+static void rhash_iter_test(void)
+{
+ DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+ struct bpf_iter_bpf_rhash_map *skel;
+ int err, i, len, map_fd, iter_fd;
+ union bpf_iter_link_info linfo;
+ u32 expected_key_sum = 0, key;
+ struct bpf_link *link;
+ u64 val = 0;
+ char buf[64];
+
+ skel = bpf_iter_bpf_rhash_map__open();
+ if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_rhash_map__open"))
+ return;
+
+ err = bpf_iter_bpf_rhash_map__load(skel);
+ if (!ASSERT_OK(err, "bpf_iter_bpf_rhash_map__load"))
+ goto out;
+
+ map_fd = bpf_map__fd(skel->maps.rhashmap);
+
+ /* Populate map with test data */
+ for (i = 0; i < 64; i++) {
+ key = i + 1;
+ expected_key_sum += key;
+
+ err = bpf_map_update_elem(map_fd, &key, &val, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "map_update"))
+ goto out;
+ }
+
+ memset(&linfo, 0, sizeof(linfo));
+ linfo.map.map_fd = map_fd;
+ opts.link_info = &linfo;
+ opts.link_info_len = sizeof(linfo);
+
+ link = bpf_program__attach_iter(skel->progs.dump_bpf_rhash_map, &opts);
+ if (!ASSERT_OK_PTR(link, "attach_iter"))
+ goto out;
+
+ iter_fd = bpf_iter_create(bpf_link__fd(link));
+ if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+ goto free_link;
+
+ do {
+ len = read(iter_fd, buf, sizeof(buf));
+ } while (len > 0);
+
+ ASSERT_EQ(skel->bss->key_sum, expected_key_sum, "key_sum");
+ ASSERT_EQ(skel->bss->elem_count, 64, "elem_count");
+
+ close(iter_fd);
+
+free_link:
+ bpf_link__destroy(link);
+out:
+ bpf_iter_bpf_rhash_map__destroy(skel);
+}
+
+/*
+ * Test seq_file overflow handling for BPF iterator over resizable hashmap.
+ *
+ * The BPF program writes print_count * 8 bytes per element, configured so
+ * that a single element's output nearly fills the seq_file buffer (8 pages).
+ * With multiple elements, the buffer overflows mid-element, triggering
+ * seq_file's restart mechanism: it discards the partial output, enlarges or
+ * flushes the buffer, and re-invokes the BPF program starting from the
+ * element that caused the overflow.
+ *
+ * Insert a few elements to avoid triggering a rhashtable resize, then verify:
+ * - All elements are seen (unique_elem_count == num_elems)
+ * - Overflow occurred (total_visits > unique_elem_count)
+ * - Output is consistent (each chunk of print_count u64s has the same value)
+ */
+static void rhash_iter_overflow_test(void)
+{
+ DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+ struct bpf_iter_bpf_rhash_map *skel;
+ u32 total_read_len, expected_read_len, write_len, num_elems = 4;
+ int err, i, j, len, map_fd, iter_fd;
+ union bpf_iter_link_info linfo;
+ struct bpf_link *link;
+ char *buf;
+
+ skel = bpf_iter_bpf_rhash_map__open();
+ if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_rhash_map__open"))
+ return;
+
+ write_len = sysconf(_SC_PAGE_SIZE) * 8;
+ skel->bss->print_count = (write_len - 8) / 8;
+ expected_read_len = num_elems * (write_len - 8);
+
+ err = bpf_iter_bpf_rhash_map__load(skel);
+ if (!ASSERT_OK(err, "bpf_iter_bpf_rhash_map__load"))
+ goto out;
+
+ map_fd = bpf_map__fd(skel->maps.rhashmap);
+
+ for (i = 0; i < num_elems; i++) {
+ __u64 val = i;
+
+ err = bpf_map_update_elem(map_fd, &i, &val, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "map_update"))
+ goto out;
+ }
+
+ memset(&linfo, 0, sizeof(linfo));
+ linfo.map.map_fd = map_fd;
+ opts.link_info = &linfo;
+ opts.link_info_len = sizeof(linfo);
+
+ link = bpf_program__attach_iter(skel->progs.dump_bpf_rhash_map_overflow, &opts);
+ if (!ASSERT_OK_PTR(link, "attach_iter"))
+ goto out;
+
+ iter_fd = bpf_iter_create(bpf_link__fd(link));
+ if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+ goto free_link;
+
+ buf = malloc(expected_read_len);
+ if (!ASSERT_OK_PTR(buf, "malloc"))
+ goto close_iter;
+
+ total_read_len = 0;
+ while ((len = read(iter_fd, buf + total_read_len,
+ expected_read_len - total_read_len)) > 0)
+ total_read_len += len;
+
+ ASSERT_OK(len, "len");
+ ASSERT_EQ(total_read_len, expected_read_len, "total_read_len");
+ ASSERT_EQ(skel->bss->unique_elem_count, num_elems, "unique_elem_count");
+ ASSERT_GT(skel->bss->total_visits, skel->bss->unique_elem_count,
+ "overflow_occurred");
+
+ /* Verify each output chunk is internally consistent */
+ for (i = 0; i < num_elems; i++) {
+ __u64 *val = ((__u64 *)buf) + i * skel->bss->print_count;
+
+ ASSERT_LT(val[0], num_elems, "value_in_range");
+ for (j = 1; j < skel->bss->print_count; j++)
+ ASSERT_EQ(val[j], val[0], "consistent_value");
+ }
+
+ free(buf);
+close_iter:
+ close(iter_fd);
+free_link:
+ bpf_link__destroy(link);
+out:
+ bpf_iter_bpf_rhash_map__destroy(skel);
+}
+
void test_rhash(void)
{
if (test__start_subtest("test_rhash_lookup_update"))
@@ -340,5 +493,10 @@ void test_rhash(void)
if (test__start_subtest("test_rhash_get_next_key_stress"))
rhash_get_next_key_stress_test();
-}
+ if (test__start_subtest("test_rhash_iter"))
+ rhash_iter_test();
+
+ if (test__start_subtest("test_rhash_iter_overflow"))
+ rhash_iter_overflow_test();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c
new file mode 100644
index 000000000000..30c270f12b61
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_rhash_map.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RHASH);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __uint(max_entries, 64);
+ __type(key, __u32);
+ __type(value, __u64);
+} rhashmap SEC(".maps");
+
+__u32 key_sum = 0;
+__u64 val_sum = 0;
+__u32 elem_count = 0;
+__u32 err = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_rhash_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+ __u32 *key = ctx->key;
+ __u64 *val = ctx->value;
+
+ if (!key || !val)
+ return 0;
+
+ key_sum += *key;
+ val_sum += *val;
+ elem_count++;
+ return 0;
+}
+
+/* For overflow test: configurable print count */
+__u32 print_count = 0;
+
+__u64 seen_keys = 0;
+__u32 unique_elem_count = 0;
+__u32 total_visits = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_rhash_map_overflow(struct bpf_iter__bpf_map_elem *ctx)
+{
+ struct seq_file *seq = ctx->meta->seq;
+ __u32 *key = ctx->key;
+ __u64 *val = ctx->value;
+ __u64 bit;
+ __u32 i;
+
+ if (!key || !val)
+ return 0; /* The end of iteration */
+
+ total_visits++;
+
+ /* Validate key and value are as expected */
+ if (*key != *val || *key >= 64) {
+ err = 1;
+ return 0;
+ }
+
+ bit = 1ULL << *key;
+ if (!(seen_keys & bit))
+ unique_elem_count++;
+ seen_keys |= bit;
+
+ /* Write print_count * 8 bytes to potentially overflow the buffer */
+ bpf_for(i, 0, print_count) {
+ if (bpf_seq_write(seq, val, sizeof(__u64)))
+ return 0;
+ }
+
+ return 0;
+}
--
2.52.0
^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH RFC bpf-next v2 17/18] bpftool: Add rhash map documentation
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (15 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 16/18] selftests/bpf: Add BPF iterator tests for resizable hash map Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-14 17:51 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 18/18] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
2026-04-12 23:11 ` [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Alexei Starovoitov
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Make bpftool documentation aware of the resizable hash map.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/map.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 1af3305ea2b2..5daf3de5c744 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -56,7 +56,7 @@ MAP COMMANDS
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena**
-| | **insn_array** }
+| | **insn_array** | **rhash** }
DESCRIPTION
===========
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 7ebf7dbcfba4..71a45d96617e 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1478,7 +1478,7 @@ static int do_help(int argc, char **argv)
" cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
" queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
" task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena |\n"
- " insn_array }\n"
+ " insn_array | rhash }\n"
" " HELP_SPEC_OPTIONS " |\n"
" {-f|--bpffs} | {-n|--nomount} }\n"
"",
--
2.52.0
^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH RFC bpf-next v2 18/18] selftests/bpf: Add resizable hashmap to benchmarks
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (16 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 17/18] bpftool: Add rhash map documentation Mykyta Yatsenko
@ 2026-04-08 15:10 ` Mykyta Yatsenko
2026-04-12 23:25 ` Alexei Starovoitov
2026-04-12 23:11 ` [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Alexei Starovoitov
18 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
herbert
Cc: Mykyta Yatsenko
From: Mykyta Yatsenko <yatsenko@meta.com>
Support resizable hashmap in BPF map benchmarks.
Results:
$ sudo ./bench -w3 -d10 -a bpf-rhashmap-full-update
0:hash_map_full_perf 21641414 events per sec
$ sudo ./bench -w3 -d10 -a bpf-hashmap-full-update
0:hash_map_full_perf 4392758 events per sec
$ sudo ./bench -w3 -d10 -a -p8 htab-mem --use-case overwrite --value-size 8
Iter 0 (302.834us): per-prod-op 62.85k/s, memory usage 2.70MiB
Iter 1 (-44.810us): per-prod-op 62.81k/s, memory usage 2.70MiB
Iter 2 (-45.821us): per-prod-op 62.81k/s, memory usage 2.70MiB
Iter 3 (-63.658us): per-prod-op 62.92k/s, memory usage 2.70MiB
Iter 4 ( 32.887us): per-prod-op 62.85k/s, memory usage 2.70MiB
Iter 5 (-76.948us): per-prod-op 62.75k/s, memory usage 2.70MiB
Iter 6 (157.235us): per-prod-op 63.01k/s, memory usage 2.70MiB
Iter 7 (-118.761us): per-prod-op 62.85k/s, memory usage 2.70MiB
Iter 8 (127.139us): per-prod-op 62.92k/s, memory usage 2.70MiB
Iter 9 (-169.908us): per-prod-op 62.99k/s, memory usage 2.70MiB
Iter 10 (101.962us): per-prod-op 62.97k/s, memory usage 2.70MiB
Iter 11 (-64.330us): per-prod-op 63.05k/s, memory usage 2.70MiB
Iter 12 (-20.543us): per-prod-op 62.86k/s, memory usage 2.70MiB
Iter 13 ( 55.382us): per-prod-op 62.95k/s, memory usage 2.70MiB
Summary: per-prod-op 62.92 ± 0.09k/s, memory usage 2.70 ± 0.00MiB, peak memory usage 2.96MiB
$ sudo ./bench -w3 -d10 -a -p8 rhtab-mem --use-case overwrite --value-size 8
Iter 0 (316.805us): per-prod-op 96.40k/s, memory usage 2.71MiB
Iter 1 (-35.225us): per-prod-op 96.54k/s, memory usage 2.71MiB
Iter 2 (-12.431us): per-prod-op 96.54k/s, memory usage 2.71MiB
Iter 3 (-56.537us): per-prod-op 96.58k/s, memory usage 2.71MiB
Iter 4 ( 27.108us): per-prod-op 96.62k/s, memory usage 2.71MiB
Iter 5 (-52.491us): per-prod-op 96.57k/s, memory usage 2.71MiB
Iter 6 ( -2.777us): per-prod-op 96.52k/s, memory usage 2.71MiB
Iter 7 (108.963us): per-prod-op 96.45k/s, memory usage 2.71MiB
Iter 8 (-61.575us): per-prod-op 96.48k/s, memory usage 2.71MiB
Iter 9 (-21.595us): per-prod-op 96.14k/s, memory usage 2.71MiB
Iter 10 ( 3.243us): per-prod-op 96.36k/s, memory usage 2.71MiB
Iter 11 ( 3.102us): per-prod-op 94.70k/s, memory usage 2.71MiB
Iter 12 (109.102us): per-prod-op 95.77k/s, memory usage 2.71MiB
Iter 13 ( 16.153us): per-prod-op 95.91k/s, memory usage 2.71MiB
Summary: per-prod-op 96.19 ± 0.57k/s, memory usage 2.71 ± 0.00MiB, peak memory usage 2.71MiB
$ sudo ./bench -w3 -d10 -a bpf-hashmap-lookup --key_size 4\
--max_entries 1000 --nr_entries 500 --nr_loops 1000000
cpu00: lookup 28.603M ± 0.536M events/sec (approximated from 32 samples of ~34ms)
$ sudo ./bench -w3 -d10 -a bpf-rhashmap-lookup --key_size 4\
--max_entries 1000 --nr_entries 500 --nr_loops 1000000
cpu00: lookup 27.340M ± 0.864M events/sec (approximated from 32 samples of ~36ms)
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
tools/testing/selftests/bpf/bench.c | 6 ++++
.../bpf/benchs/bench_bpf_hashmap_full_update.c | 34 +++++++++++++++++++--
.../bpf/benchs/bench_bpf_hashmap_lookup.c | 31 +++++++++++++++++--
.../testing/selftests/bpf/benchs/bench_htab_mem.c | 35 ++++++++++++++++++++--
4 files changed, 100 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 029b3e21f438..722877c00e8b 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -558,13 +558,16 @@ extern const struct bench bench_bpf_loop;
extern const struct bench bench_strncmp_no_helper;
extern const struct bench bench_strncmp_helper;
extern const struct bench bench_bpf_hashmap_full_update;
+extern const struct bench bench_bpf_rhashmap_full_update;
extern const struct bench bench_local_storage_cache_seq_get;
extern const struct bench bench_local_storage_cache_interleaved_get;
extern const struct bench bench_local_storage_cache_hashmap_control;
extern const struct bench bench_local_storage_tasks_trace;
extern const struct bench bench_bpf_hashmap_lookup;
+extern const struct bench bench_bpf_rhashmap_lookup;
extern const struct bench bench_local_storage_create;
extern const struct bench bench_htab_mem;
+extern const struct bench bench_rhtab_mem;
extern const struct bench bench_crypto_encrypt;
extern const struct bench bench_crypto_decrypt;
extern const struct bench bench_sockmap;
@@ -636,13 +639,16 @@ static const struct bench *benchs[] = {
&bench_strncmp_no_helper,
&bench_strncmp_helper,
&bench_bpf_hashmap_full_update,
+ &bench_bpf_rhashmap_full_update,
&bench_local_storage_cache_seq_get,
&bench_local_storage_cache_interleaved_get,
&bench_local_storage_cache_hashmap_control,
&bench_local_storage_tasks_trace,
&bench_bpf_hashmap_lookup,
+ &bench_bpf_rhashmap_lookup,
&bench_local_storage_create,
&bench_htab_mem,
+ &bench_rhtab_mem,
&bench_crypto_encrypt,
&bench_crypto_decrypt,
&bench_sockmap,
diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
index ee1dc12c5e5e..7278fa860397 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_full_update.c
@@ -34,19 +34,29 @@ static void measure(struct bench_res *res)
{
}
-static void setup(void)
+static void hashmap_full_update_setup(enum bpf_map_type map_type)
{
struct bpf_link *link;
int map_fd, i, max_entries;
setup_libbpf();
- ctx.skel = bpf_hashmap_full_update_bench__open_and_load();
+ ctx.skel = bpf_hashmap_full_update_bench__open();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
+ bpf_map__set_type(ctx.skel->maps.hash_map_bench, map_type);
+ if (map_type == BPF_MAP_TYPE_RHASH)
+ bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench,
+ BPF_F_NO_PREALLOC);
+
+ if (bpf_hashmap_full_update_bench__load(ctx.skel)) {
+ fprintf(stderr, "failed to load skeleton\n");
+ exit(1);
+ }
+
ctx.skel->bss->nr_loops = MAX_LOOP_NUM;
link = bpf_program__attach(ctx.skel->progs.benchmark);
@@ -62,6 +72,16 @@ static void setup(void)
bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
}
+static void setup(void)
+{
+ hashmap_full_update_setup(BPF_MAP_TYPE_HASH);
+}
+
+static void rhash_setup(void)
+{
+ hashmap_full_update_setup(BPF_MAP_TYPE_RHASH);
+}
+
static void hashmap_report_final(struct bench_res res[], int res_cnt)
{
unsigned int nr_cpus = bpf_num_possible_cpus();
@@ -87,3 +107,13 @@ const struct bench bench_bpf_hashmap_full_update = {
.report_progress = NULL,
.report_final = hashmap_report_final,
};
+
+const struct bench bench_bpf_rhashmap_full_update = {
+ .name = "bpf-rhashmap-full-update",
+ .validate = validate,
+ .setup = rhash_setup,
+ .producer_thread = producer,
+ .measure = measure,
+ .report_progress = NULL,
+ .report_final = hashmap_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
index 279ff1b8b5b2..5264b7b20e39 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bpf_hashmap_lookup.c
@@ -148,9 +148,10 @@ static inline void patch_key(u32 i, u32 *key)
/* the rest of key is random */
}
-static void setup(void)
+static void hashmap_lookup_setup(enum bpf_map_type map_type)
{
struct bpf_link *link;
+ __u32 map_flags;
int map_fd;
int ret;
int i;
@@ -163,10 +164,15 @@ static void setup(void)
exit(1);
}
+ map_flags = args.map_flags;
+ if (map_type == BPF_MAP_TYPE_RHASH)
+ map_flags |= BPF_F_NO_PREALLOC;
+
+ bpf_map__set_type(ctx.skel->maps.hash_map_bench, map_type);
bpf_map__set_max_entries(ctx.skel->maps.hash_map_bench, args.max_entries);
bpf_map__set_key_size(ctx.skel->maps.hash_map_bench, args.key_size);
bpf_map__set_value_size(ctx.skel->maps.hash_map_bench, 8);
- bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench, args.map_flags);
+ bpf_map__set_map_flags(ctx.skel->maps.hash_map_bench, map_flags);
ctx.skel->bss->nr_entries = args.nr_entries;
ctx.skel->bss->nr_loops = args.nr_loops / args.nr_entries;
@@ -197,6 +203,16 @@ static void setup(void)
}
}
+static void setup(void)
+{
+ hashmap_lookup_setup(BPF_MAP_TYPE_HASH);
+}
+
+static void rhash_setup(void)
+{
+ hashmap_lookup_setup(BPF_MAP_TYPE_RHASH);
+}
+
static inline double events_from_time(u64 time)
{
if (time)
@@ -275,3 +291,14 @@ const struct bench bench_bpf_hashmap_lookup = {
.report_progress = NULL,
.report_final = hashmap_report_final,
};
+
+const struct bench bench_bpf_rhashmap_lookup = {
+ .name = "bpf-rhashmap-lookup",
+ .argp = &bench_hashmap_lookup_argp,
+ .validate = validate,
+ .setup = rhash_setup,
+ .producer_thread = producer,
+ .measure = measure,
+ .report_progress = NULL,
+ .report_final = hashmap_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
index 297e32390cd1..1ee217d97434 100644
--- a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
+++ b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
@@ -152,7 +152,7 @@ static const struct htab_mem_use_case *htab_mem_find_use_case_or_exit(const char
exit(1);
}
-static void htab_mem_setup(void)
+static void htab_mem_setup_impl(enum bpf_map_type map_type)
{
struct bpf_map *map;
const char **names;
@@ -178,10 +178,11 @@ static void htab_mem_setup(void)
}
map = ctx.skel->maps.htab;
+ bpf_map__set_type(map, map_type);
bpf_map__set_value_size(map, args.value_size);
/* Ensure that different CPUs can operate on different subset */
bpf_map__set_max_entries(map, MAX(8192, 64 * env.nr_cpus));
- if (args.preallocated)
+ if (map_type != BPF_MAP_TYPE_RHASH && args.preallocated)
bpf_map__set_map_flags(map, bpf_map__map_flags(map) & ~BPF_F_NO_PREALLOC);
names = ctx.uc->progs;
@@ -220,6 +221,16 @@ static void htab_mem_setup(void)
exit(1);
}
+static void htab_mem_setup(void)
+{
+ htab_mem_setup_impl(BPF_MAP_TYPE_HASH);
+}
+
+static void rhtab_mem_setup(void)
+{
+ htab_mem_setup_impl(BPF_MAP_TYPE_RHASH);
+}
+
static void htab_mem_add_fn(pthread_barrier_t *notify)
{
while (true) {
@@ -338,6 +349,15 @@ static void htab_mem_report_final(struct bench_res res[], int res_cnt)
cleanup_cgroup_environment();
}
+static void rhtab_mem_validate(void)
+{
+ if (args.preallocated) {
+ fprintf(stderr, "rhash map does not support preallocation\n");
+ exit(1);
+ }
+ htab_mem_validate();
+}
+
const struct bench bench_htab_mem = {
.name = "htab-mem",
.argp = &bench_htab_mem_argp,
@@ -348,3 +368,14 @@ const struct bench bench_htab_mem = {
.report_progress = htab_mem_report_progress,
.report_final = htab_mem_report_final,
};
+
+const struct bench bench_rhtab_mem = {
+ .name = "rhtab-mem",
+ .argp = &bench_htab_mem_argp,
+ .validate = rhtab_mem_validate,
+ .setup = rhtab_mem_setup,
+ .producer_thread = htab_mem_producer,
+ .measure = htab_mem_measure,
+ .report_progress = htab_mem_report_progress,
+ .report_final = htab_mem_report_final,
+};
--
2.52.0
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map Mykyta Yatsenko
@ 2026-04-10 22:31 ` Emil Tsalapatis
2026-04-13 8:10 ` Mykyta Yatsenko
0 siblings, 1 reply; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-10 22:31 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Add resizable hash map into enums where it is needed.
>
These changes in isolation are difficult to reason about,
can we roll this into subsequent patches? Right now this
adds the BPF_MAP_TYPE_RHASH without there being a way
to create one.
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> include/uapi/linux/bpf.h | 1 +
> kernel/bpf/map_iter.c | 3 ++-
> kernel/bpf/syscall.c | 3 +++
> kernel/bpf/verifier.c | 1 +
> tools/include/uapi/linux/bpf.h | 1 +
> 5 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 552bc5d9afbd..822582c04f22 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1046,6 +1046,7 @@ enum bpf_map_type {
> BPF_MAP_TYPE_CGRP_STORAGE,
> BPF_MAP_TYPE_ARENA,
> BPF_MAP_TYPE_INSN_ARRAY,
> + BPF_MAP_TYPE_RHASH,
> __MAX_BPF_MAP_TYPE
> };
>
> diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
> index 261a03ea73d3..4a2aafbe28b4 100644
> --- a/kernel/bpf/map_iter.c
> +++ b/kernel/bpf/map_iter.c
> @@ -119,7 +119,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
> is_percpu = true;
> else if (map->map_type != BPF_MAP_TYPE_HASH &&
> map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> - map->map_type != BPF_MAP_TYPE_ARRAY)
> + map->map_type != BPF_MAP_TYPE_ARRAY &&
> + map->map_type != BPF_MAP_TYPE_RHASH)
> goto put_map;
>
> key_acc_size = prog->aux->max_rdonly_access;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 51ade3cde8bb..0a5ec417638d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1287,6 +1287,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> case BPF_SPIN_LOCK:
> case BPF_RES_SPIN_LOCK:
> if (map->map_type != BPF_MAP_TYPE_HASH &&
> + map->map_type != BPF_MAP_TYPE_RHASH &&
> map->map_type != BPF_MAP_TYPE_ARRAY &&
> map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
> map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
> @@ -1464,6 +1465,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
> case BPF_MAP_TYPE_CGROUP_ARRAY:
> case BPF_MAP_TYPE_ARRAY_OF_MAPS:
> case BPF_MAP_TYPE_HASH:
> + case BPF_MAP_TYPE_RHASH:
> case BPF_MAP_TYPE_PERCPU_HASH:
> case BPF_MAP_TYPE_HASH_OF_MAPS:
> case BPF_MAP_TYPE_RINGBUF:
> @@ -2199,6 +2201,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
> map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
> map->map_type == BPF_MAP_TYPE_LRU_HASH ||
> map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
> + map->map_type == BPF_MAP_TYPE_RHASH ||
> map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
> if (!bpf_map_is_offloaded(map)) {
> bpf_disable_instrumentation();
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8c1cf2eb6cbb..53523ab953c2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -21816,6 +21816,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> if (prog->sleepable)
> switch (map->map_type) {
> case BPF_MAP_TYPE_HASH:
> + case BPF_MAP_TYPE_RHASH:
> case BPF_MAP_TYPE_LRU_HASH:
> case BPF_MAP_TYPE_ARRAY:
> case BPF_MAP_TYPE_PERCPU_HASH:
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 677be9a47347..9d7df174770a 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -1046,6 +1046,7 @@ enum bpf_map_type {
> BPF_MAP_TYPE_CGRP_STORAGE,
> BPF_MAP_TYPE_ARENA,
> BPF_MAP_TYPE_INSN_ARRAY,
> + BPF_MAP_TYPE_RHASH,
> __MAX_BPF_MAP_TYPE
> };
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab Mykyta Yatsenko
@ 2026-04-12 23:10 ` Alexei Starovoitov
2026-04-13 10:52 ` Mykyta Yatsenko
2026-04-13 20:37 ` Emil Tsalapatis
2026-04-14 10:25 ` Leon Hwang
2 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:10 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
>
> +static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
> +{
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + /* Using constant zeroed params to force rhashtable use inlined hashfunc */
> + static const struct rhashtable_params params = { 0 };
> +
> + return rhashtable_lookup_likely(&rhtab->ht, key, params);
What does the asm look like?
Please share 'perf report' or 'perf annotate' for lookup.
Maybe worth it to include in commit log.
> +}
> +
> static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
> {
> - return NULL;
imo there is no point in such a step by step introduction.
Patches 1,2,3 can be squashed.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
` (17 preceding siblings ...)
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 18/18] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
@ 2026-04-12 23:11 ` Alexei Starovoitov
2026-04-13 8:28 ` Mykyta Yatsenko
18 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:11 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> Benchmarks and s390 tests depend on the
> https://lore.kernel.org/linux-crypto/20260224192954.819444-1-mykyta.yatsenko5@gmail.com/
Is it going into this merge window?
Please include all perf numbers right in the cover letter.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from()
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
@ 2026-04-12 23:13 ` Alexei Starovoitov
2026-04-13 12:22 ` Mykyta Yatsenko
2026-04-13 22:22 ` Emil Tsalapatis
1 sibling, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:13 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> +
> + /* Test 2: walk_enter_from with non-existent key starts from bucket */
> + {
> + struct test_obj_val key = { .id = 99 };
> +
> + scoped_guard(rcu) {
> + rhashtable_walk_enter_from(&ht, &iter, &key,
> + test_rht_params);
> + rhashtable_walk_start(&iter);
> + }
> +
> + obj = rhashtable_walk_next(&iter);
> + while (IS_ERR(obj) && PTR_ERR(obj) == -EAGAIN)
> + obj = rhashtable_walk_next(&iter);
> +
> + /* Should still return some element (iteration from bucket start) */
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> + }
> +
> + /* Test 3: verify walk_enter_from + walk_next can iterate remaining elements */
> + {
> + struct test_obj_val key = { .id = 0 };
> + int count = 0;
Please de-claude this.
I couldn't force claude to avoid using these pointless indents.
Either figure out how to do it or remove it manually.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 09/18] bpf: Implement alloc and free for resizable hashtab
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 09/18] bpf: Implement alloc and free " Mykyta Yatsenko
@ 2026-04-12 23:15 ` Alexei Starovoitov
0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:15 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Initialize rhashtable with bpf_mem_alloc element cache. Require
> BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via
> rhashtable_free_and_destroy() callback to handle internal structs.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> kernel/bpf/hashtab.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 492c6a9154b6..a62093d8d1ae 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2762,16 +2762,74 @@ static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
>
> static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
> {
> - return ERR_PTR(-EOPNOTSUPP);
> + struct bpf_rhtab *rhtab;
patches 6,7,8 are ok as step-by-step,
but this one should be squashed with 1-3
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 12/18] selftests/bpf: Add basic tests for resizable hash map
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 12/18] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
@ 2026-04-12 23:16 ` Alexei Starovoitov
0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:16 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Test basic map operations (lookup, update, delete) for
> BPF_MAP_TYPE_RHASH including boundary conditions like duplicate
> key insertion and deletion of nonexistent keys.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> tools/testing/selftests/bpf/prog_tests/rhash.c | 64 +++++++
> tools/testing/selftests/bpf/progs/rhash.c | 242 +++++++++++++++++++++++++
> 2 files changed, 306 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/rhash.c b/tools/testing/selftests/bpf/prog_tests/rhash.c
> new file mode 100644
> index 000000000000..40f30ce69190
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/rhash.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
year what?
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 13/18] selftests/bpf: Support resizable hashtab in test_maps
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 13/18] selftests/bpf: Support resizable hashtab in test_maps Mykyta Yatsenko
@ 2026-04-12 23:17 ` Alexei Starovoitov
0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:17 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Parameterize existing htab tests (test_hashmap, test_hashmap_sizes,
> test_hashmap_walk, htab_map_batch_ops) by map type so they run
> against BPF_MAP_TYPE_RHASH as well. Relax ordering and exact-count
> assertions where rhashtable resize makes them non-deterministic.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> .../selftests/bpf/map_tests/htab_map_batch_ops.c | 22 +++-
> tools/testing/selftests/bpf/test_maps.c | 127 ++++++++++++++++-----
> 2 files changed, 114 insertions(+), 35 deletions(-)
drop test_maps changes. It's quite obsolete. We probably should delete it.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 14/18] selftests/bpf: Resizable hashtab BPF_F_LOCK tests
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 14/18] selftests/bpf: Resizable hashtab BPF_F_LOCK tests Mykyta Yatsenko
@ 2026-04-12 23:18 ` Alexei Starovoitov
0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:18 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:11 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Add tests validating resizable hash map handles BPF_F_LOCK flag as
> expected.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> tools/testing/selftests/bpf/prog_tests/rhash.c | 99 ++++++++++++++++++++++++++
> tools/testing/selftests/bpf/progs/rhash.c | 35 +++++++++
> 2 files changed, 134 insertions(+)
drop this patch completely. Yes, we know spinlock works.
* Re: [PATCH RFC bpf-next v2 15/18] selftests/bpf: Add stress tests for resizable hash get_next_key
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 15/18] selftests/bpf: Add stress tests for resizable hash get_next_key Mykyta Yatsenko
@ 2026-04-12 23:19 ` Alexei Starovoitov
0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:19 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:11 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Test get_next_key behavior under concurrent modification:
> * Resize test: verify all elements visited after resize trigger
> * Stress test: concurrent iterators and modifiers to detect races
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> tools/testing/selftests/bpf/prog_tests/rhash.c | 181 +++++++++++++++++++++++++
> tools/testing/selftests/bpf/progs/rhash.c | 8 ++
> 2 files changed, 189 insertions(+)
I don't like this either. It's a test of rhashtable internals.
If you really feel like testing it, it shouldn't be in selftests/bpf
* Re: [PATCH RFC bpf-next v2 16/18] selftests/bpf: Add BPF iterator tests for resizable hash map
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 16/18] selftests/bpf: Add BPF iterator tests for resizable hash map Mykyta Yatsenko
@ 2026-04-12 23:20 ` Alexei Starovoitov
0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:20 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:11 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Test BPF iterator functionality for BPF_MAP_TYPE_RHASH:
> * Basic iteration verifying all elements are visited
> * Overflow test triggering seq_file restart, validating correct
> resume behavior via skip_elems tracking
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> tools/testing/selftests/bpf/prog_tests/rhash.c | 160 ++++++++++++++++++++-
> .../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c | 75 ++++++++++
> 2 files changed, 234 insertions(+), 1 deletion(-)
too much code for little test coverage.
* Re: [PATCH RFC bpf-next v2 18/18] selftests/bpf: Add resizable hashmap to benchmarks
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 18/18] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
@ 2026-04-12 23:25 ` Alexei Starovoitov
0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-12 23:25 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Wed, Apr 8, 2026 at 8:11 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Support resizable hashmap in BPF map benchmarks.
>
> Results:
> $ sudo ./bench -w3 -d10 -a bpf-rhashmap-full-update
> 0:hash_map_full_perf 21641414 events per sec
>
> $ sudo ./bench -w3 -d10 -a bpf-hashmap-full-update
> 0:hash_map_full_perf 4392758 events per sec
>
> $ sudo ./bench -w3 -d10 -a -p8 htab-mem --use-case overwrite --value-size 8
> Iter 0 (302.834us): per-prod-op 62.85k/s, memory usage 2.70MiB
> Iter 1 (-44.810us): per-prod-op 62.81k/s, memory usage 2.70MiB
> Iter 2 (-45.821us): per-prod-op 62.81k/s, memory usage 2.70MiB
> Iter 3 (-63.658us): per-prod-op 62.92k/s, memory usage 2.70MiB
> Iter 4 ( 32.887us): per-prod-op 62.85k/s, memory usage 2.70MiB
> Iter 5 (-76.948us): per-prod-op 62.75k/s, memory usage 2.70MiB
> Iter 6 (157.235us): per-prod-op 63.01k/s, memory usage 2.70MiB
> Iter 7 (-118.761us): per-prod-op 62.85k/s, memory usage 2.70MiB
> Iter 8 (127.139us): per-prod-op 62.92k/s, memory usage 2.70MiB
> Iter 9 (-169.908us): per-prod-op 62.99k/s, memory usage 2.70MiB
> Iter 10 (101.962us): per-prod-op 62.97k/s, memory usage 2.70MiB
> Iter 11 (-64.330us): per-prod-op 63.05k/s, memory usage 2.70MiB
> Iter 12 (-20.543us): per-prod-op 62.86k/s, memory usage 2.70MiB
> Iter 13 ( 55.382us): per-prod-op 62.95k/s, memory usage 2.70MiB
No need to list them all. 2 or 3 is enough.
> Summary: per-prod-op 62.92 ± 0.09k/s, memory usage 2.70 ± 0.00MiB, peak memory usage 2.96MiB
>
> $ sudo ./bench -w3 -d10 -a -p8 rhtab-mem --use-case overwrite --value-size 8
> Iter 0 (316.805us): per-prod-op 96.40k/s, memory usage 2.71MiB
> Iter 1 (-35.225us): per-prod-op 96.54k/s, memory usage 2.71MiB
> Iter 2 (-12.431us): per-prod-op 96.54k/s, memory usage 2.71MiB
> Iter 3 (-56.537us): per-prod-op 96.58k/s, memory usage 2.71MiB
> Iter 4 ( 27.108us): per-prod-op 96.62k/s, memory usage 2.71MiB
> Iter 5 (-52.491us): per-prod-op 96.57k/s, memory usage 2.71MiB
> Iter 6 ( -2.777us): per-prod-op 96.52k/s, memory usage 2.71MiB
> Iter 7 (108.963us): per-prod-op 96.45k/s, memory usage 2.71MiB
> Iter 8 (-61.575us): per-prod-op 96.48k/s, memory usage 2.71MiB
> Iter 9 (-21.595us): per-prod-op 96.14k/s, memory usage 2.71MiB
> Iter 10 ( 3.243us): per-prod-op 96.36k/s, memory usage 2.71MiB
> Iter 11 ( 3.102us): per-prod-op 94.70k/s, memory usage 2.71MiB
> Iter 12 (109.102us): per-prod-op 95.77k/s, memory usage 2.71MiB
> Iter 13 ( 16.153us): per-prod-op 95.91k/s, memory usage 2.71MiB
> Summary: per-prod-op 96.19 ± 0.57k/s, memory usage 2.71 ± 0.00MiB, peak memory usage 2.71MiB
>
> sudo ./bench -w3 -d10 -a bpf-hashmap-lookup --key_size 4\
> --max_entries 1000 --nr_entries 500 --nr_loops 1000000
> cpu00: lookup 28.603M ± 0.536M events/sec (approximated from 32 samples of ~34ms)
>
> sudo ./bench -w3 -d10 -a bpf-rhashmap-lookup --key_size 4\
> --max_entries 1000 --nr_entries 500 --nr_loops 1000000
> cpu00: lookup 27.340M ± 0.864M events/sec (approximated from 32 samples of ~36ms)
What about other key sizes? That would be more useful info.
size 8 especially.
and different entries.
Think of holistic benchmark. Everything that one might need
from a hashtable.
key_size=4, 500 entries is... ok. fine. But not something anyone
can use to decide whether bpf_rhash fits their own use case pattern.
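A sketch of the kind of sweep being asked for, as a dry run that only prints the `bench` invocations rather than executing them; the flags are copied from the runs quoted above and may differ on your tree:

```shell
# Build and print (without executing) a matrix of benchmark invocations
# covering several key sizes and entry counts for both map types.
cmds=""
for key_size in 4 8 16 64; do
	for nr_entries in 500 10000 1000000; do
		for bench in bpf-hashmap-lookup bpf-rhashmap-lookup; do
			cmds="$cmds sudo ./bench -w3 -d10 -a $bench --key_size $key_size --max_entries $((nr_entries * 2)) --nr_entries $nr_entries --nr_loops 1000000\n"
		done
	done
done
printf "%b" "$cmds"
```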
* Re: [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map
2026-04-10 22:31 ` Emil Tsalapatis
@ 2026-04-13 8:10 ` Mykyta Yatsenko
2026-04-14 17:50 ` Emil Tsalapatis
0 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-13 8:10 UTC (permalink / raw)
To: Emil Tsalapatis, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On 4/10/26 11:31 PM, Emil Tsalapatis wrote:
> On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> Add resizable hash map into enums where it is needed.
>>
>
> These changes in isolation are difficult to reason about,
> can we roll this into subsequent patches? Right now this
> adds the BPF_MAP_TYPE_RHASH without there being a way
> to create one.
>
Thanks for taking a look. Makes sense, I can squash it into the later
commits.
>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>> ---
>> include/uapi/linux/bpf.h | 1 +
>> kernel/bpf/map_iter.c | 3 ++-
>> kernel/bpf/syscall.c | 3 +++
>> kernel/bpf/verifier.c | 1 +
>> tools/include/uapi/linux/bpf.h | 1 +
>> 5 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 552bc5d9afbd..822582c04f22 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1046,6 +1046,7 @@ enum bpf_map_type {
>> BPF_MAP_TYPE_CGRP_STORAGE,
>> BPF_MAP_TYPE_ARENA,
>> BPF_MAP_TYPE_INSN_ARRAY,
>> + BPF_MAP_TYPE_RHASH,
>> __MAX_BPF_MAP_TYPE
>> };
>>
>> diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
>> index 261a03ea73d3..4a2aafbe28b4 100644
>> --- a/kernel/bpf/map_iter.c
>> +++ b/kernel/bpf/map_iter.c
>> @@ -119,7 +119,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
>> is_percpu = true;
>> else if (map->map_type != BPF_MAP_TYPE_HASH &&
>> map->map_type != BPF_MAP_TYPE_LRU_HASH &&
>> - map->map_type != BPF_MAP_TYPE_ARRAY)
>> + map->map_type != BPF_MAP_TYPE_ARRAY &&
>> + map->map_type != BPF_MAP_TYPE_RHASH)
>> goto put_map;
>>
>> key_acc_size = prog->aux->max_rdonly_access;
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 51ade3cde8bb..0a5ec417638d 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -1287,6 +1287,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
>> case BPF_SPIN_LOCK:
>> case BPF_RES_SPIN_LOCK:
>> if (map->map_type != BPF_MAP_TYPE_HASH &&
>> + map->map_type != BPF_MAP_TYPE_RHASH &&
>> map->map_type != BPF_MAP_TYPE_ARRAY &&
>> map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
>> map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
>> @@ -1464,6 +1465,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
>> case BPF_MAP_TYPE_CGROUP_ARRAY:
>> case BPF_MAP_TYPE_ARRAY_OF_MAPS:
>> case BPF_MAP_TYPE_HASH:
>> + case BPF_MAP_TYPE_RHASH:
>> case BPF_MAP_TYPE_PERCPU_HASH:
>> case BPF_MAP_TYPE_HASH_OF_MAPS:
>> case BPF_MAP_TYPE_RINGBUF:
>> @@ -2199,6 +2201,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>> map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
>> map->map_type == BPF_MAP_TYPE_LRU_HASH ||
>> map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
>> + map->map_type == BPF_MAP_TYPE_RHASH ||
>> map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
>> if (!bpf_map_is_offloaded(map)) {
>> bpf_disable_instrumentation();
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 8c1cf2eb6cbb..53523ab953c2 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -21816,6 +21816,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>> if (prog->sleepable)
>> switch (map->map_type) {
>> case BPF_MAP_TYPE_HASH:
>> + case BPF_MAP_TYPE_RHASH:
>> case BPF_MAP_TYPE_LRU_HASH:
>> case BPF_MAP_TYPE_ARRAY:
>> case BPF_MAP_TYPE_PERCPU_HASH:
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 677be9a47347..9d7df174770a 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -1046,6 +1046,7 @@ enum bpf_map_type {
>> BPF_MAP_TYPE_CGRP_STORAGE,
>> BPF_MAP_TYPE_ARENA,
>> BPF_MAP_TYPE_INSN_ARRAY,
>> + BPF_MAP_TYPE_RHASH,
>> __MAX_BPF_MAP_TYPE
>> };
>>
>
* Re: [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map
2026-04-12 23:11 ` [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Alexei Starovoitov
@ 2026-04-13 8:28 ` Mykyta Yatsenko
0 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-13 8:28 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On 4/13/26 12:11 AM, Alexei Starovoitov wrote:
> On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>>
>> Benchmarks and s390 tests depend on the
>> https://lore.kernel.org/linux-crypto/20260224192954.819444-1-mykyta.yatsenko5@gmail.com/
>
> Is it going into this merge window?
>
It should; the patch was applied to linux-crypto on 7-Mar. I realized
only afterwards that it also fixes a bug, so there is no Fixes tag on
the patch.
> Please include all perf numbers right in the cover letter.
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-12 23:10 ` Alexei Starovoitov
@ 2026-04-13 10:52 ` Mykyta Yatsenko
2026-04-13 16:24 ` Alexei Starovoitov
0 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-13 10:52 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On 4/13/26 12:10 AM, Alexei Starovoitov wrote:
> On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>>
>>
>> +static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
>> +{
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + /* Using constant zeroed params to force rhashtable use inlined hashfunc */
>> + static const struct rhashtable_params params = { 0 };
>> +
>> + return rhashtable_lookup_likely(&rhtab->ht, key, params);
>
> How does the asm look like?
You can see what the asm looks like below. Interestingly, gcc
inlines jhash2 but clang does not, which costs a lot of performance.
gcc:
lookup 20.675M ± 0.090M events/sec (approximated from 32 samples of ~48ms)
clang:
cpu00: lookup 15.882M ± 0.530M events/sec (approximated from 32 samples of ~62ms)
I think inlining is also consistent for htab and rhtab (either both
inline or both do not inline).
perf annotate on gcc:
Percent | Source code & Disassembly of vmlinux for cpu/cycles/P (18251 samples, percent: local period)
------------------------------------------------------------------------------------------------------------
: 0 0xffffffff8146b630 <rhtab_lookup_elem>:
0.00 : ffffffff8146b630: endbr64
0.10 : ffffffff8146b634: callq 0xffffffff8126c6c0
<__fentry__>
0.00 : ffffffff8146b639: pushq %r15
0.70 : ffffffff8146b63b: pushq %r14
0.03 : ffffffff8146b63d: pushq %r13
0.56 : ffffffff8146b63f: pushq %r12
0.00 : ffffffff8146b641: pushq %rbp
0.00 : ffffffff8146b642: movq %rsi, %rbp
0.00 : ffffffff8146b645: pushq %rbx
0.06 : ffffffff8146b646: movq %rdi, %rbx
0.00 : ffffffff8146b649: subq $0x18, %rsp
0.02 : ffffffff8146b64d: movq 0x120(%rdi), %r12
2.58 : ffffffff8146b654: movzwl 0x132(%rbx), %r8d
0.00 : ffffffff8146b65c: movl 0x8(%r12), %edx
0.12 : ffffffff8146b661: movl %r8d, %eax
0.00 : ffffffff8146b664: movl %r8d, %esi
0.00 : ffffffff8146b667: testb $0x3, %al
0.00 : ffffffff8146b669: je 0xffffffff8146b726
<rhtab_lookup_elem+0xf6>
0.00 : ffffffff8146b66f: movq %rbp, %rdi
0.00 : ffffffff8146b672: callq 0xffffffff81467bd0 <jhash>
0.00 : ffffffff8146b677: movl (%r12), %esi
0.13 : ffffffff8146b67b: subl $0x1, %esi
1.91 : ffffffff8146b67e: andl %eax, %esi
0.00 : ffffffff8146b680: movl %esi, %eax
0.01 : ffffffff8146b682: leaq 0x40(%r12,%rax,8), %r14
2.80 : ffffffff8146b687: movl 0x4(%r12), %eax
0.01 : ffffffff8146b68c: testl %eax, %eax
0.01 : ffffffff8146b68e: jne 0xffffffff8146b821
<rhtab_lookup_elem+0x1f1>
0.00 : ffffffff8146b694: movq %r14, %rax
2.41 : ffffffff8146b697: orq $0x1, %rax
0.00 : ffffffff8146b69b: movq %rax, 0x8(%rsp)
2.63 : ffffffff8146b6a0: movq (%r14), %rcx
13.96 : ffffffff8146b6a3: andq $-0x2, %rcx
0.01 : ffffffff8146b6a7: cmoveq 0x8(%rsp), %rcx
0.02 : ffffffff8146b6ad: movq %rcx, %r15
0.00 : ffffffff8146b6b0: testb $0x1, %cl
0.06 : ffffffff8146b6b3: jne 0xffffffff8146b809
<rhtab_lookup_elem+0x1d9>
0.00 : ffffffff8146b6b9: movzwl 0x136(%rbx), %eax
0.00 : ffffffff8146b6c0: movzwl 0x134(%rbx), %r13d
0.00 : ffffffff8146b6c8: movzwl 0x132(%rbx), %edx
0.00 : ffffffff8146b6cf: movq %rax, 0x10(%rsp)
2.15 : ffffffff8146b6d4: subq %rax, %r13
0.00 : ffffffff8146b6d7: leaq (%r15,%r13), %rdi
0.04 : ffffffff8146b6db: movq %rbp, %rsi
0.00 : ffffffff8146b6de: movq %rdx, (%rsp)
1.13 : ffffffff8146b6e2: callq 0xffffffff81eaa020 <memcmp>
3.18 : ffffffff8146b6e7: movq (%rsp), %rdx
42.96 : ffffffff8146b6eb: testl %eax, %eax
0.00 : ffffffff8146b6ed: jne 0xffffffff8146b7fc
<rhtab_lookup_elem+0x1cc>
0.00 : ffffffff8146b6f3: testq %r15, %r15
0.00 : ffffffff8146b6f6: jne 0xffffffff8146b71c
<rhtab_lookup_elem+0xec>
0.00 : ffffffff8146b6f8: movq 0x30(%r12), %r12
0.00 : ffffffff8146b6fd: testq %r12, %r12
0.00 : ffffffff8146b700: jne 0xffffffff8146b654
<rhtab_lookup_elem+0x24>
0.00 : ffffffff8146b706: addq $0x18, %rsp
0.01 : ffffffff8146b70a: movq %r12, %rax
0.07 : ffffffff8146b70d: popq %rbx
3.71 : ffffffff8146b70e: popq %rbp
0.00 : ffffffff8146b70f: popq %r12
0.02 : ffffffff8146b711: popq %r13
0.09 : ffffffff8146b713: popq %r14
0.03 : ffffffff8146b715: popq %r15
0.09 : ffffffff8146b717: jmp 0xffffffff81ecbb70
<__pi___x86_return_thunk>
0.00 : ffffffff8146b71c: subq 0x10(%rsp), %r15
2.77 : ffffffff8146b721: movq %r15, %r12
0.00 : ffffffff8146b724: jmp 0xffffffff8146b706
<rhtab_lookup_elem+0xd6>
0.00 : ffffffff8146b726: andl $-0x4, %eax
0.00 : ffffffff8146b729: shrl $0x2, %esi
0.00 : ffffffff8146b72c: movq %rbp, %rcx
0.45 : ffffffff8146b72f: leal -0x21524111(%rdx,%rax), %edx
0.01 : ffffffff8146b736: movl %edx, %eax
0.00 : ffffffff8146b738: movl %edx, %edi
0.00 : ffffffff8146b73a: cmpl $0xf, %r8d
0.00 : ffffffff8146b73e: jbe 0xffffffff8146b7a7
<rhtab_lookup_elem+0x177>
0.00 : ffffffff8146b740: addl 0x8(%rcx), %eax
0.00 : ffffffff8146b743: addl (%rcx), %edx
0.00 : ffffffff8146b745: subl $0x3, %esi
0.00 : ffffffff8146b748: addq $0xc, %rcx
0.00 : ffffffff8146b74c: movl %eax, %r8d
0.00 : ffffffff8146b74f: subl %eax, %edx
0.00 : ffffffff8146b751: addl -0x8(%rcx), %edi
0.00 : ffffffff8146b754: roll $0x4, %r8d
0.00 : ffffffff8146b758: addl %edi, %eax
0.00 : ffffffff8146b75a: xorl %edx, %r8d
0.00 : ffffffff8146b75d: movl %r8d, %edx
0.00 : ffffffff8146b760: subl %r8d, %edi
0.00 : ffffffff8146b763: addl %eax, %r8d
0.00 : ffffffff8146b766: roll $0x6, %edx
0.00 : ffffffff8146b769: xorl %edx, %edi
0.00 : ffffffff8146b76b: movl %edi, %edx
0.00 : ffffffff8146b76d: subl %edi, %eax
0.00 : ffffffff8146b76f: addl %r8d, %edi
0.00 : ffffffff8146b772: roll $0x8, %edx
0.00 : ffffffff8146b775: xorl %eax, %edx
0.00 : ffffffff8146b777: movl %edx, %eax
0.00 : ffffffff8146b779: subl %edx, %r8d
0.00 : ffffffff8146b77c: roll $0x10, %eax
0.00 : ffffffff8146b77f: xorl %r8d, %eax
0.00 : ffffffff8146b782: leal (%rdx,%rdi), %r8d
0.00 : ffffffff8146b786: subl %eax, %edi
0.00 : ffffffff8146b788: movl %edi, %edx
0.00 : ffffffff8146b78a: movl %eax, %edi
0.00 : ffffffff8146b78c: rorl $0xd, %edi
0.00 : ffffffff8146b78f: xorl %edx, %edi
0.00 : ffffffff8146b791: leal (%rax,%r8), %edx
0.00 : ffffffff8146b795: movl %edi, %eax
0.00 : ffffffff8146b797: subl %edi, %r8d
0.00 : ffffffff8146b79a: addl %edx, %edi
0.00 : ffffffff8146b79c: roll $0x4, %eax
0.00 : ffffffff8146b79f: xorl %r8d, %eax
0.00 : ffffffff8146b7a2: cmpl $0x3, %esi
0.00 : ffffffff8146b7a5: ja 0xffffffff8146b740
<rhtab_lookup_elem+0x110>
0.00 : ffffffff8146b7a7: cmpl $0x2, %esi
0.01 : ffffffff8146b7aa: je 0xffffffff8146b81c
<rhtab_lookup_elem+0x1ec>
0.01 : ffffffff8146b7ac: cmpl $0x3, %esi
0.01 : ffffffff8146b7af: je 0xffffffff8146b819
<rhtab_lookup_elem+0x1e9>
0.00 : ffffffff8146b7b1: cmpl $0x1, %esi
1.14 : ffffffff8146b7b4: jne 0xffffffff8146b677
<rhtab_lookup_elem+0x47>
0.00 : ffffffff8146b7ba: addl (%rcx), %edx
0.03 : ffffffff8146b7bc: movl %edi, %ecx
0.00 : ffffffff8146b7be: xorl %edi, %eax
0.00 : ffffffff8146b7c0: roll $0xe, %ecx
0.04 : ffffffff8146b7c3: subl %ecx, %eax
0.00 : ffffffff8146b7c5: movl %eax, %ecx
0.00 : ffffffff8146b7c7: xorl %eax, %edx
0.00 : ffffffff8146b7c9: roll $0xb, %ecx
0.85 : ffffffff8146b7cc: subl %ecx, %edx
0.00 : ffffffff8146b7ce: movl %edx, %ecx
0.00 : ffffffff8146b7d0: xorl %edx, %edi
0.03 : ffffffff8146b7d2: rorl $0x7, %ecx
0.06 : ffffffff8146b7d5: subl %ecx, %edi
0.00 : ffffffff8146b7d7: movl %edi, %ecx
0.00 : ffffffff8146b7d9: xorl %edi, %eax
0.13 : ffffffff8146b7db: roll $0x10, %ecx
0.78 : ffffffff8146b7de: subl %ecx, %eax
0.00 : ffffffff8146b7e0: movl %eax, %ecx
0.00 : ffffffff8146b7e2: xorl %eax, %edx
1.22 : ffffffff8146b7e4: roll $0x4, %ecx
0.28 : ffffffff8146b7e7: subl %ecx, %edx
0.00 : ffffffff8146b7e9: xorl %edx, %edi
0.06 : ffffffff8146b7eb: roll $0xe, %edx
1.13 : ffffffff8146b7ee: subl %edx, %edi
0.01 : ffffffff8146b7f0: xorl %edi, %eax
1.37 : ffffffff8146b7f2: rorl $0x8, %edi
4.96 : ffffffff8146b7f5: subl %edi, %eax
0.00 : ffffffff8146b7f7: jmp 0xffffffff8146b677
<rhtab_lookup_elem+0x47>
0.00 : ffffffff8146b7fc: movq (%r15), %r15
3.07 : ffffffff8146b7ff: testb $0x1, %r15b
0.00 : ffffffff8146b803: je 0xffffffff8146b6d7
<rhtab_lookup_elem+0xa7>
0.00 : ffffffff8146b809: cmpq 0x8(%rsp), %r15
0.00 : ffffffff8146b80e: jne 0xffffffff8146b6a0
<rhtab_lookup_elem+0x70>
0.00 : ffffffff8146b814: jmp 0xffffffff8146b6f8
<rhtab_lookup_elem+0xc8>
0.00 : ffffffff8146b819: addl 0x8(%rcx), %eax
0.00 : ffffffff8146b81c: addl 0x4(%rcx), %edi
0.00 : ffffffff8146b81f: jmp 0xffffffff8146b7ba
<rhtab_lookup_elem+0x18a>
0.00 : ffffffff8146b821: movq %r12, %rdi
0.00 : ffffffff8146b824: callq 0xffffffff81840570
<rht_bucket_nested>
0.00 : ffffffff8146b829: movq %rax, %r14
0.00 : ffffffff8146b82c: jmp 0xffffffff8146b694
<rhtab_lookup_elem+0x64>
> Please share 'perf report' or 'perf annotate' for lookup.
> Maybe worth it to include in commit log.
>
>> +}
>> +
>> static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
>> {
>> - return NULL;
>
> imo there is no point in such a step by step introduction.
> Patches 1,2,3 can be squashed.
* Re: [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from()
2026-04-12 23:13 ` Alexei Starovoitov
@ 2026-04-13 12:22 ` Mykyta Yatsenko
0 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-13 12:22 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On 4/13/26 12:13 AM, Alexei Starovoitov wrote:
> On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>>
>> +
>> + /* Test 2: walk_enter_from with non-existent key starts from bucket */
>> + {
>> + struct test_obj_val key = { .id = 99 };
>> +
>> + scoped_guard(rcu) {
>> + rhashtable_walk_enter_from(&ht, &iter, &key,
>> + test_rht_params);
>> + rhashtable_walk_start(&iter);
>> + }
>> +
>> + obj = rhashtable_walk_next(&iter);
>> + while (IS_ERR(obj) && PTR_ERR(obj) == -EAGAIN)
>> + obj = rhashtable_walk_next(&iter);
>> +
>> + /* Should still return some element (iteration from bucket start) */
>> + rhashtable_walk_stop(&iter);
>> + rhashtable_walk_exit(&iter);
>> + }
>> +
>> + /* Test 3: verify walk_enter_from + walk_next can iterate remaining elements */
>> + {
>> + struct test_obj_val key = { .id = 0 };
>> + int count = 0;
>
> Please de-claude this.
> I couldn't force claude to avoid using these pointless indents.
> Either figure out how to do it or remove it manually.
Thanks for the reviews. I'll apply your suggestions here and in the
other patches for v3.
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-13 10:52 ` Mykyta Yatsenko
@ 2026-04-13 16:24 ` Alexei Starovoitov
2026-04-13 16:27 ` Daniel Borkmann
0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2026-04-13 16:24 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
Herbert Xu, Mykyta Yatsenko
On Mon, Apr 13, 2026 at 3:52 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
>
>
> On 4/13/26 12:10 AM, Alexei Starovoitov wrote:
> > On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
> > <mykyta.yatsenko5@gmail.com> wrote:
> >>
> >>
> >> +static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
> >> +{
> >> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> >> + /* Using constant zeroed params to force rhashtable use inlined hashfunc */
> >> + static const struct rhashtable_params params = { 0 };
> >> +
> >> + return rhashtable_lookup_likely(&rhtab->ht, key, params);
> >
> > How does the asm look like?
>
> You can see what the asm looks like below. Interestingly, gcc
> inlines jhash2 but clang does not, which costs a lot of performance.
>
> gcc:
> lookup 20.675M ± 0.090M events/sec (approximated from 32 samples of ~48ms)
>
> clang:
> cpu00: lookup 15.882M ± 0.530M events/sec (approximated from 32 samples
> of ~62ms)
>
> I think inlining is also consistent for htab and rhtab (either both
> inline or both do not inline).
Try other hashes. I recall somebody from isovalent experimented
replacing jhash with xxhash and/or siphash in hash map and had nice
improvements for certain key sizes.
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-13 16:24 ` Alexei Starovoitov
@ 2026-04-13 16:27 ` Daniel Borkmann
2026-04-13 19:43 ` Mykyta Yatsenko
0 siblings, 1 reply; 52+ messages in thread
From: Daniel Borkmann @ 2026-04-13 16:27 UTC (permalink / raw)
To: Alexei Starovoitov, Mykyta Yatsenko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin Lau, Kernel Team,
Eduard, Kumar Kartikeya Dwivedi, Herbert Xu, Mykyta Yatsenko,
Anton Protopopov
On 4/13/26 6:24 PM, Alexei Starovoitov wrote:
> On Mon, Apr 13, 2026 at 3:52 AM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>> On 4/13/26 12:10 AM, Alexei Starovoitov wrote:
>>> On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
>>> <mykyta.yatsenko5@gmail.com> wrote:
>>>>
>>>>
>>>> +static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
>>>> +{
>>>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>>>> + /* Using constant zeroed params to force rhashtable use inlined hashfunc */
>>>> + static const struct rhashtable_params params = { 0 };
>>>> +
>>>> + return rhashtable_lookup_likely(&rhtab->ht, key, params);
>>>
>>> How does the asm look like?
>>
>> You can see what the asm looks like below. Interestingly, gcc
>> inlines jhash2 but clang does not, which costs a lot of performance.
>>
>> gcc:
>> lookup 20.675M ± 0.090M events/sec (approximated from 32 samples of ~48ms)
>>
>> clang:
>> cpu00: lookup 15.882M ± 0.530M events/sec (approximated from 32 samples
>> of ~62ms)
>>
>> I think inlining is also consistent for htab and rhtab (either both
>> inline or both do not inline).
>
> Try other hashes. I recall somebody from isovalent experimented
> replacing jhash with xxhash and/or siphash in hash map and had nice
> improvements for certain key sizes.
[ +Anton ]
page 26+: https://bpfconf.ebpf.io/bpfconf2023/bpfconf2023_material/anton-protopopov-lsf-mm-bpf-2023.pdf
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-13 16:27 ` Daniel Borkmann
@ 2026-04-13 19:43 ` Mykyta Yatsenko
0 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-13 19:43 UTC (permalink / raw)
To: Daniel Borkmann, Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin Lau, Kernel Team,
Eduard, Kumar Kartikeya Dwivedi, Herbert Xu, Mykyta Yatsenko,
Anton Protopopov
On 4/13/26 5:27 PM, Daniel Borkmann wrote:
> On 4/13/26 6:24 PM, Alexei Starovoitov wrote:
>> On Mon, Apr 13, 2026 at 3:52 AM Mykyta Yatsenko
>> <mykyta.yatsenko5@gmail.com> wrote:
>>> On 4/13/26 12:10 AM, Alexei Starovoitov wrote:
>>>> On Wed, Apr 8, 2026 at 8:10 AM Mykyta Yatsenko
>>>> <mykyta.yatsenko5@gmail.com> wrote:
>>>>>
>>>>>
>>>>> +static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
>>>>> +{
>>>>> + struct bpf_rhtab *rhtab = container_of(map, struct
>>>>> bpf_rhtab, map);
>>>>> + /* Using constant zeroed params to force rhashtable use
>>>>> inlined hashfunc */
>>>>> + static const struct rhashtable_params params = { 0 };
>>>>> +
>>>>> + return rhashtable_lookup_likely(&rhtab->ht, key, params);
>>>>
>>>> How does the asm look like?
>>>
>>> You can see what the asm looks like below. Interestingly, gcc
>>> inlines jhash2 but clang does not, which costs a lot of performance.
>>>
>>> gcc:
>>> lookup 20.675M ± 0.090M events/sec (approximated from 32 samples of
>>> ~48ms)
>>>
>>> clang:
>>> cpu00: lookup 15.882M ± 0.530M events/sec (approximated from 32 samples
>>> of ~62ms)
>>>
>>> I think inlining is also consistent for htab and rhtab (either both
>>> inline or both do not inline).
>>
>> Try other hashes. I recall somebody from isovalent experimented
>> replacing jhash with xxhash and/or siphash in hash map and had nice
>> improvements for certain key sizes.
>
> [ +Anton ]
>
> page 26+: https://bpfconf.ebpf.io/bpfconf2023/bpfconf2023_material/anton-protopopov-lsf-mm-bpf-2023.pdf
Thanks for sharing that presentation. Looking at the graphs on page 27,
jhash2 is actually a better choice for key sizes below 10 bytes
(correct me if I'm wrong). For rhashtable it's going to make an even
bigger difference: jhash2 is inlined because it's the rhashtable
default, while any other hash fn would have to be provided via a
function pointer -> no inlining, which is a huge difference (see above).
That said, looking at the profiling data, the hash fn (when inlined)
does not look like the bottleneck; memcmp is.
Anyways, I'll run benchmarks with other hash functions.
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab Mykyta Yatsenko
2026-04-12 23:10 ` Alexei Starovoitov
@ 2026-04-13 20:37 ` Emil Tsalapatis
2026-04-14 8:34 ` Mykyta Yatsenko
2026-04-14 10:25 ` Leon Hwang
2 siblings, 1 reply; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-13 20:37 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast()
> for deletes, and rhashtable_lookup_get_insert_fast() for inserts.
>
> Updates modify values in place under RCU rather than allocating a
> new element and swapping the pointer (as regular htab does). This
> trades read consistency for performance: concurrent readers may
> see partial updates. Users requiring consistent reads should use
> BPF_F_LOCK.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> kernel/bpf/hashtab.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 134 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 9e7806814fec..ea7314cc3703 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2755,6 +2755,11 @@ struct bpf_rhtab {
> u32 elem_size;
> };
>
> +static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
> +{
> + return l->data + round_up(key_size, 8);
> +}
> +
> static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
> {
> return ERR_PTR(-EOPNOTSUPP);
> @@ -2769,33 +2774,155 @@ static void rhtab_map_free(struct bpf_map *map)
> {
> }
>
> +static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
> +{
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + /* Using constant zeroed params to force rhashtable use inlined hashfunc */
> + static const struct rhashtable_params params = { 0 };
> +
> + return rhashtable_lookup_likely(&rhtab->ht, key, params);
> +}
> +
> static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
> {
> - return NULL;
> + struct rhtab_elem *l;
> +
> + l = rhtab_lookup_elem(map, key);
> + return l ? rhtab_elem_value(l, map->key_size) : NULL;
> +}
> +
> +static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
> +{
> + int err;
> +
> + err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
> + if (err)
> + return err;
> +
> + bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
> + bpf_mem_cache_free_rcu(&rhtab->ma, elem);
> + return 0;
> }
>
> static long rhtab_map_delete_elem(struct bpf_map *map, void *key)
> {
> - return -EOPNOTSUPP;
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + struct rhtab_elem *l;
> +
> + guard(rcu)();
> + l = rhtab_lookup_elem(map, key);
> + return l ? rhtab_delete_elem(rhtab, l) : -ENOENT;
> +}
> +
> +static void rhtab_read_elem_value(struct bpf_map *map, void *dst, struct rhtab_elem *elem,
> + u64 flags)
> +{
> + void *src = rhtab_elem_value(elem, map->key_size);
> +
> + if (flags & BPF_F_LOCK)
> + copy_map_value_locked(map, dst, src, true);
> + else
> + copy_map_value(map, dst, src);
> }
>
> static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
> {
> - return -EOPNOTSUPP;
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + struct rhtab_elem *l;
> + int err;
> +
> + if ((flags & ~BPF_F_LOCK) ||
> + ((flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> + return -EINVAL;
> +
> + /* Make sure element is not deleted between lookup and copy */
> + guard(rcu)();
> +
> + l = rhtab_lookup_elem(map, key);
> + if (!l)
> + return -ENOENT;
> +
> + rhtab_read_elem_value(map, value, l, flags);
> + err = rhtab_delete_elem(rhtab, l);
> + if (err)
> + return err;
> +
> + check_and_init_map_value(map, value);
> + return 0;
> }
>
> -static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
> +static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
> + u64 map_flags)
> {
> - return -EOPNOTSUPP;
> + if (map_flags & BPF_NOEXIST)
> + return -EEXIST;
> +
> + if (map_flags & BPF_F_LOCK)
> + copy_map_value_locked(map, rhtab_elem_value(elem, map->key_size), value, false);
> + else
> + copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
It looks like Sashiko is right that the special fields are not handled here.
> + return 0;
> }
>
> -static void rhtab_map_free_internal_structs(struct bpf_map *map)
> +static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
> {
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + struct rhtab_elem *elem, *tmp;
> +
> + if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
> + return -EINVAL;
> +
> + if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
> + return -EINVAL;
> +
> + guard(rcu)();
> + elem = rhtab_lookup_elem(map, key);
> + if (elem)
> + return rhtab_map_update_existing(map, elem, value, map_flags);
> +
> + if (map_flags & BPF_EXIST)
> + return -ENOENT;
> +
> + /* Check max_entries limit before inserting new element */
> + if (atomic_read(&rhtab->ht.nelems) >= map->max_entries)
> + return -E2BIG;
> +
> + elem = bpf_mem_cache_alloc(&rhtab->ma);
> + if (!elem)
> + return -ENOMEM;
> +
> + memcpy(elem->data, key, map->key_size);
> + copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
> +
> + tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);
> + if (tmp) {
> + bpf_mem_cache_free(&rhtab->ma, elem);
> + if (IS_ERR(tmp))
> + return PTR_ERR(tmp);
> +
> + return rhtab_map_update_existing(map, tmp, value, map_flags);
> + }
> +
> + return 0;
> }
Note: I am actually skeptical about Sashiko's warning here. Sure, the
update may get overwritten even as we are returning 0, but we are
providing no guarantees about how long the write will survive in the
map, and there is no inherent atomicity between an update and any other
operations.
>
> static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
> {
> - return -EOPNOTSUPP;
> + struct bpf_insn *insn = insn_buf;
> + const int ret = BPF_REG_0;
> +
> + BUILD_BUG_ON(!__same_type(&rhtab_lookup_elem,
> + (void *(*)(struct bpf_map *map, void *key)) NULL));
> + *insn++ = BPF_EMIT_CALL(rhtab_lookup_elem);
> + *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1);
> + *insn++ = BPF_ALU64_IMM(BPF_ADD, ret,
> + offsetof(struct rhtab_elem, data) + round_up(map->key_size, 8));
> +
> + return insn - insn_buf;
> +}
> +
> +static void rhtab_map_free_internal_structs(struct bpf_map *map)
> +{
> }
This gets filled in in later patches, but the fact that it's here as a no-op
arguably pushes the series into being non-bisectable. Can we move it to the
walk patch, since that's where it gets populated?
>
> static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from()
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
2026-04-12 23:13 ` Alexei Starovoitov
@ 2026-04-13 22:22 ` Emil Tsalapatis
1 sibling, 0 replies; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-13 22:22 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> BPF resizable hashmap needs efficient iteration resume for
> get_next_key and seq_file iterators. rhashtable_walk_enter()
> always starts from bucket 0, forcing linear skip of already-seen
> elements.
>
> Add rhashtable_walk_enter_from() that looks up the key's bucket
> and positions the walker there, so walk_next returns the successor
> directly. If a resize moved the key to the future table, the
> walker is migrated to that table.
>
> Refactor __rhashtable_lookup into __rhashtable_lookup_one to reuse
> the single-table lookup in both the two-table search and the new
> enter_from positioning.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Provided the tests are updated:
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
> include/linux/rhashtable.h | 31 ++++++++++--
> lib/rhashtable.c | 53 ++++++++++++++++++++
> lib/test_rhashtable.c | 120 +++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 199 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
> index 133ccb39137a..2c7a343ac592 100644
> --- a/include/linux/rhashtable.h
> +++ b/include/linux/rhashtable.h
> @@ -253,6 +253,11 @@ static inline void rhashtable_walk_start(struct rhashtable_iter *iter)
> (void)rhashtable_walk_start_check(iter);
> }
>
> +void rhashtable_walk_enter_from(struct rhashtable *ht,
> + struct rhashtable_iter *iter,
> + const void *key,
> + const struct rhashtable_params params);
> +
> void *rhashtable_walk_next(struct rhashtable_iter *iter);
> void *rhashtable_walk_peek(struct rhashtable_iter *iter);
> void rhashtable_walk_stop(struct rhashtable_iter *iter) __releases_shared(RCU);
> @@ -613,8 +618,8 @@ static inline int rhashtable_compare(struct rhashtable_compare_arg *arg,
> }
>
> /* Internal function, do not use. */
> -static __always_inline struct rhash_head *__rhashtable_lookup(
> - struct rhashtable *ht, const void *key,
> +static __always_inline struct rhash_head *__rhashtable_lookup_one(
> + struct rhashtable *ht, struct bucket_table *tbl, const void *key,
> const struct rhashtable_params params,
> const enum rht_lookup_freq freq)
> __must_hold_shared(RCU)
> @@ -624,13 +629,10 @@ static __always_inline struct rhash_head *__rhashtable_lookup(
> .key = key,
> };
> struct rhash_lock_head __rcu *const *bkt;
> - struct bucket_table *tbl;
> struct rhash_head *he;
> unsigned int hash;
>
> BUILD_BUG_ON(!__builtin_constant_p(freq));
> - tbl = rht_dereference_rcu(ht->tbl, ht);
> -restart:
> hash = rht_key_hashfn(ht, tbl, key, params);
> bkt = rht_bucket(tbl, hash);
> do {
> @@ -646,6 +648,25 @@ static __always_inline struct rhash_head *__rhashtable_lookup(
> */
> } while (he != RHT_NULLS_MARKER(bkt));
>
> + return NULL;
> +}
> +
> +/* Internal function, do not use. */
> +static __always_inline struct rhash_head *__rhashtable_lookup(
> + struct rhashtable *ht, const void *key,
> + const struct rhashtable_params params,
> + const enum rht_lookup_freq freq)
> + __must_hold_shared(RCU)
> +{
> + struct bucket_table *tbl;
> + struct rhash_head *he;
> +
> + tbl = rht_dereference_rcu(ht->tbl, ht);
> +restart:
> + he = __rhashtable_lookup_one(ht, tbl, key, params, freq);
> + if (he)
> + return he;
> +
> /* Ensure we see any new tables. */
> smp_rmb();
>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 6074ed5f66f3..2fc277207dcc 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -692,6 +692,59 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
> }
> EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
>
> +/**
> + * rhashtable_walk_enter_from - Initialise a walk starting at a key's bucket
> + * @ht: Table to walk over
> + * @iter: Hash table iterator
> + * @key: Key whose bucket to start from
> + * @params: Hash table parameters
> + *
> + * Like rhashtable_walk_enter(), but positions the iterator at the bucket
> + * containing @key. If a resize is in progress and @key has been migrated
> + * to the future table, the walker is moved to that table.
> + *
> + * Same constraints as rhashtable_walk_enter() apply.
> + */
> +void rhashtable_walk_enter_from(struct rhashtable *ht,
> + struct rhashtable_iter *iter,
> + const void *key,
> + const struct rhashtable_params params)
> + __must_hold(RCU)
> +{
> + struct bucket_table *tbl;
> + struct rhash_head *he;
> +
> + rhashtable_walk_enter(ht, iter);
> +
> + if (!key)
> + return;
> +
> + tbl = rht_dereference_rcu(ht->tbl, ht);
> + he = __rhashtable_lookup_one(ht, tbl, key, params,
> + RHT_LOOKUP_NORMAL);
> + if (!he) {
> + smp_rmb();
> + tbl = rht_dereference_rcu(tbl->future_tbl, ht);
> + if (!tbl)
> + return;
> +
> + he = __rhashtable_lookup_one(ht, tbl, key, params,
> + RHT_LOOKUP_NORMAL);
> + if (!he)
> + return;
> +
> + spin_lock(&ht->lock);
> + list_del(&iter->walker.list);
> + iter->walker.tbl = tbl;
> + list_add(&iter->walker.list, &tbl->walkers);
> + spin_unlock(&ht->lock);
> + }
> +
> + iter->slot = rht_key_hashfn(ht, tbl, key, params);
> + iter->p = he;
> +}
> +EXPORT_SYMBOL_GPL(rhashtable_walk_enter_from);
> +
> /**
> * rhashtable_walk_exit - Free an iterator
> * @iter: Hash table Iterator
> diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
> index 0b33559a910b..0084157a96b4 100644
> --- a/lib/test_rhashtable.c
> +++ b/lib/test_rhashtable.c
> @@ -23,6 +23,7 @@
> #include <linux/random.h>
> #include <linux/vmalloc.h>
> #include <linux/wait.h>
> +#include <linux/cleanup.h>
>
> #define MAX_ENTRIES 1000000
> #define TEST_INSERT_FAIL INT_MAX
> @@ -679,6 +680,122 @@ static int threadfunc(void *data)
> return err;
> }
>
> +static int __init test_walk_enter_from(void)
> +{
> + struct rhashtable ht;
> + struct test_obj objs[4];
> + struct rhashtable_iter iter;
> + struct test_obj *obj;
> + int err, i;
> +
> + err = rhashtable_init(&ht, &test_rht_params);
> + if (err)
> + return err;
> +
> + /* Insert 4 elements with keys 0, 2, 4, 6 */
> + for (i = 0; i < 4; i++) {
> + objs[i].value.id = i * 2;
> + objs[i].value.tid = 0;
> + err = rhashtable_insert_fast(&ht, &objs[i].node,
> + test_rht_params);
> + if (err) {
> + pr_warn("walk_enter_from: insert %d failed: %d\n",
> + i, err);
> + goto out;
> + }
> + }
> +
> + /*
> + * Test 1: walk_enter_from positions at key, walk_next returns
> + * the successor (not the key itself).
> + */
> + for (i = 0; i < 4; i++) {
> + struct test_obj_val key = { .id = i * 2 };
> +
> + scoped_guard(rcu) {
> + rhashtable_walk_enter_from(&ht, &iter, &key,
> + test_rht_params);
> + rhashtable_walk_start(&iter);
> + }
> +
> + obj = rhashtable_walk_next(&iter);
> + while (IS_ERR(obj) && PTR_ERR(obj) == -EAGAIN)
> + obj = rhashtable_walk_next(&iter);
> +
> + /* Successor must not be the key itself */
> + if (obj && obj->value.id == i * 2) {
> + pr_warn("walk_enter_from: returned key %d instead of successor\n",
> + i * 2);
> + err = -EINVAL;
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> + goto out;
> + }
> +
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> + }
> +
> + /* Test 2: walk_enter_from with non-existent key starts from bucket */
> + {
> + struct test_obj_val key = { .id = 99 };
> +
> + scoped_guard(rcu) {
> + rhashtable_walk_enter_from(&ht, &iter, &key,
> + test_rht_params);
> + rhashtable_walk_start(&iter);
> + }
> +
> + obj = rhashtable_walk_next(&iter);
> + while (IS_ERR(obj) && PTR_ERR(obj) == -EAGAIN)
> + obj = rhashtable_walk_next(&iter);
> +
> + /* Should still return some element (iteration from bucket start) */
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> + }
> +
> + /* Test 3: verify walk_enter_from + walk_next can iterate remaining elements */
> + {
> + struct test_obj_val key = { .id = 0 };
> + int count = 0;
> +
> + scoped_guard(rcu) {
> + rhashtable_walk_enter_from(&ht, &iter, &key,
> + test_rht_params);
> + rhashtable_walk_start(&iter);
> + }
> +
> + while ((obj = rhashtable_walk_next(&iter))) {
> + if (IS_ERR(obj)) {
> + if (PTR_ERR(obj) == -EAGAIN)
> + continue;
> + break;
> + }
> + count++;
> + }
> +
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> +
> + /*
> + * Should see at least some elements after key 0.
> + * Exact count depends on hash distribution.
> + */
> + if (count == 0) {
> + pr_warn("walk_enter_from: no elements found after key 0\n");
> + err = -EINVAL;
> + goto out;
> + }
> + }
> +
> + pr_info("walk_enter_from: all tests passed\n");
> + err = 0;
> +out:
> + rhashtable_destroy(&ht);
> + return err;
> +}
> +
> static int __init test_rht_init(void)
> {
> unsigned int entries;
> @@ -738,6 +855,9 @@ static int __init test_rht_init(void)
>
> test_insert_duplicates_run();
>
> + pr_info("Testing walk_enter_from: %s\n",
> + test_walk_enter_from() == 0 ? "pass" : "FAIL");
> +
> if (!tcount)
> return 0;
>
* Re: [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab Mykyta Yatsenko
@ 2026-04-13 22:44 ` Emil Tsalapatis
2026-04-14 8:11 ` Mykyta Yatsenko
0 siblings, 1 reply; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-13 22:44 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Implement rhtab_map_get_next_key() and rhtab_map_free_internal_structs()
> of the BPF resizable hashtable. Both are only called from syscall, so
> it's safe to use rhashtable walk API that uses spinlocks internally.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> kernel/bpf/hashtab.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 77 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index ea7314cc3703..7eee450a321e 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2921,13 +2921,89 @@ static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
> return insn - insn_buf;
> }
>
> +/* Helper to get next element, handling -EAGAIN during resize */
> +static struct rhtab_elem *rhtab_iter_next(struct rhashtable_iter *iter)
> +{
> + struct rhtab_elem *elem;
> +
> + while ((elem = rhashtable_walk_next(iter))) {
> + if (IS_ERR(elem)) {
> + if (PTR_ERR(elem) == -EAGAIN)
> + continue;
> + return NULL;
> + }
> + return elem;
> + }
> +
> + return NULL;
> +}
> +
> static void rhtab_map_free_internal_structs(struct bpf_map *map)
> {
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + struct rhashtable_iter iter;
> + struct rhtab_elem *elem;
> +
> + if (!bpf_map_has_internal_structs(map))
> + return;
> +
> + /*
> + * An element can be processed twice if rhashtable resized concurrently.
> + * Special structs freeing handles duplicate cancel_and_free.
> + */
> + rhashtable_walk_enter(&rhtab->ht, &iter);
> + rhashtable_walk_start(&iter);
> +
> + for (elem = rhtab_iter_next(&iter); elem; elem = rhtab_iter_next(&iter))
> + bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
> +
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> }
>
> static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
> {
> - return -EOPNOTSUPP;
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + struct rhashtable_iter iter;
> + struct rhtab_elem *elem;
> + bool key_found;
> + int ret = -ENOENT;
> +
> + /*
> + * Hold RCU across enter_from + walk_start to prevent the
> + * element cached by enter_from from being freed before
> + * walk_start re-acquires RCU.
> + */
> + guard(rcu)();
> + rhashtable_walk_enter_from(&rhtab->ht, &iter, key, rhtab->params);
> + key_found = key && iter.p;
> + rhashtable_walk_start(&iter);
> +
> + elem = rhtab_iter_next(&iter);
> + if (elem) {
> + memcpy(next_key, elem->data, map->key_size);
> + ret = 0;
> + }
> +
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> +
> + if (ret == 0 || key_found)
> + return ret;
> +
> +	/* Key was not found, restart from the beginning */
> + rhashtable_walk_enter(&rhtab->ht, &iter);
> + rhashtable_walk_start(&iter);
> +
> + elem = rhtab_iter_next(&iter);
> + if (elem) {
> + memcpy(next_key, elem->data, map->key_size);
> + ret = 0;
> + }
> +
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
Nit: the two attempts are largely identical; can you roll them
together and use a goto to retry? Or do you think that would be
too messy?
Otherwise:
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> + return ret;
> }
>
> static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
* Re: [PATCH RFC bpf-next v2 06/18] bpf: Implement bpf_each_rhash_elem() using walk API
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 06/18] bpf: Implement bpf_each_rhash_elem() using walk API Mykyta Yatsenko
@ 2026-04-13 23:02 ` Emil Tsalapatis
0 siblings, 0 replies; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-13 23:02 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> rhashtable walk API takes spin_lock(&ht->lock) in start/stop,
> making it unsafe in NMI and hard IRQ contexts. Guard with
> !in_task() rather than open-coding raw RCU
> iteration that would need to handle resize races manually.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> kernel/bpf/hashtab.c | 35 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 7eee450a321e..e79c194e2779 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -3013,7 +3013,40 @@ static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_f
> static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
> void *callback_ctx, u64 flags)
> {
> - return -EOPNOTSUPP;
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + struct rhashtable_iter iter;
> + struct rhtab_elem *elem;
> + int num_elems = 0;
> + u64 ret = 0;
> +
> + if (flags != 0)
> + return -EINVAL;
> +
> + /*
> + * The rhashtable walk API uses spin_lock(&ht->lock) in rhashtable_walk_start/stop,
> + * which is not safe in NMI or soft/hard IRQ context.
> + */
> + if (!in_task())
> + return -EOPNOTSUPP;
Use in_nmi()/in_hardirq()/in_softirq() instead?
> +
> + rhashtable_walk_enter(&rhtab->ht, &iter);
> + rhashtable_walk_start(&iter);
> +
> + for (elem = rhtab_iter_next(&iter); elem;
> + elem = rhtab_iter_next(&iter)) {
> + num_elems++;
> + ret = callback_fn((u64)(long)map,
> + (u64)(long)elem->data,
> + (u64)(long)rhtab_elem_value(elem, map->key_size),
> + (u64)(long)callback_ctx, 0);
AFAICT this callback cannot be sleepable, which is why we're able
to call it under RCU; can we note that in a comment here?
> + if (ret)
> + break;
> + }
> +
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> +
> + return num_elems;
> }
>
> static u64 rhtab_map_mem_usage(const struct bpf_map *map)
* Re: [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab Mykyta Yatsenko
@ 2026-04-13 23:25 ` Emil Tsalapatis
2026-04-14 8:08 ` Mykyta Yatsenko
0 siblings, 1 reply; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-13 23:25 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Add batch operations for BPF_MAP_TYPE_RHASH.
>
> Batch operations:
> * rhtab_map_lookup_batch: Bulk lookup of elements by bucket
> * rhtab_map_lookup_and_delete_batch: Atomic bulk lookup and delete
>
> The batch implementation uses rhashtable_walk_enter_from() to resume
> iteration from the last collected key. When the buffer fills, the last
> key becomes the cursor for the next batch call.
>
> Also implements rhtab_map_mem_usage() to report memory consumption.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
> kernel/bpf/hashtab.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 134 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index e79c194e2779..a79d434dc626 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -3051,19 +3051,150 @@ static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
>
> static u64 rhtab_map_mem_usage(const struct bpf_map *map)
> {
> - return 0;
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + u64 num_entries;
> +
> + num_entries = atomic_read(&rhtab->ht.nelems);
> + return sizeof(struct bpf_rhtab) + rhtab->elem_size * num_entries;
> +}
> +
> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
> + const union bpf_attr *attr,
> + union bpf_attr __user *uattr,
> + bool do_delete)
> +{
> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
> + void __user *uvalues = u64_to_user_ptr(attr->batch.values);
> + void __user *ukeys = u64_to_user_ptr(attr->batch.keys);
> + void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
> + void *buf = NULL, *keys = NULL, *values = NULL, *dst_key, *dst_val;
> + struct rhtab_elem **del_elems = NULL;
> + u32 max_count, total, key_size, value_size, i;
> + struct rhashtable_iter iter;
> + struct rhtab_elem *elem;
> + u64 elem_map_flags, map_flags;
> + int ret = 0;
> +
> + elem_map_flags = attr->batch.elem_flags;
> + if ((elem_map_flags & ~BPF_F_LOCK) ||
> + ((elem_map_flags & BPF_F_LOCK) &&
> + !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
> + return -EINVAL;
> +
> + map_flags = attr->batch.flags;
> + if (map_flags)
> + return -EINVAL;
> +
> + max_count = attr->batch.count;
> + if (!max_count)
> + return 0;
> +
> + if (put_user(0, &uattr->batch.count))
> + return -EFAULT;
> +
> + key_size = map->key_size;
> + value_size = map->value_size;
> +
> + keys = kvmalloc_array(max_count, key_size, GFP_USER | __GFP_NOWARN);
> + values = kvmalloc_array(max_count, value_size, GFP_USER | __GFP_NOWARN);
> + if (do_delete)
> + del_elems = kvmalloc_array(max_count, sizeof(void *),
> + GFP_USER | __GFP_NOWARN);
> +
> + if (!keys || !values || (do_delete && !del_elems)) {
> + ret = -ENOMEM;
> + goto free;
> + }
> +
> + /*
> + * Use the last key from the previous batch as cursor.
> + * enter_from positions at that key's bucket, walk_next
> + * returns the successor in O(1).
> + * First call (ubatch == NULL): starts from bucket 0.
> + */
> + if (ubatch) {
> + buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
> + if (!buf) {
> + ret = -ENOMEM;
> + goto free;
> + }
> + if (copy_from_user(buf, ubatch, key_size)) {
> + ret = -EFAULT;
> + goto free;
> + }
> + }
> +
> + scoped_guard(rcu) {
AFAICT this guard makes sure the RCU critical section extends from the
beginning of rhashtable_walk_enter_from all the way to walk_stop(), is
that correct?
> + rhashtable_walk_enter_from(&rhtab->ht, &iter, buf, rhtab->params);
> + rhashtable_walk_start(&iter);
> + }
> +
> + dst_key = keys;
> + dst_val = values;
> + total = 0;
> +
> + while (total < max_count) {
> + elem = rhtab_iter_next(&iter);
> + if (!elem)
> + break;
> +
> + memcpy(dst_key, elem->data, key_size);
> + rhtab_read_elem_value(map, dst_val, elem, elem_map_flags);
> + check_and_init_map_value(map, dst_val);
> +
> + if (do_delete)
> + del_elems[total] = elem;
> +
> + dst_key += key_size;
> + dst_val += value_size;
> + total++;
> + }
> +
> + if (do_delete) {
> + for (i = 0; i < total; i++)
> + rhtab_delete_elem(rhtab, del_elems[i]);
> + }
> +
> + rhashtable_walk_stop(&iter);
> + rhashtable_walk_exit(&iter);
> +
> + if (total == 0) {
> + ret = -ENOENT;
> + goto free;
> + }
> +
> + /* Signal end of table when we collected fewer than requested */
> + if (total < max_count)
> + ret = -ENOENT;
> +
> + /* Write last key as cursor for the next batch call */
> + if (copy_to_user(ukeys, keys, total * key_size) ||
> + copy_to_user(uvalues, values, total * value_size) ||
> + put_user(total, &uattr->batch.count) ||
> + copy_to_user(u64_to_user_ptr(attr->batch.out_batch),
> + dst_key - key_size, key_size)) {
> + ret = -EFAULT;
> + goto free;
> + }
> +
> +free:
> + kfree(buf);
> + kvfree(keys);
> + kvfree(values);
> + kvfree(del_elems);
> + return ret;
> }
>
> static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
> union bpf_attr __user *uattr)
> {
> - return 0;
> + return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, false);
> }
>
> static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
> union bpf_attr __user *uattr)
> {
> - return 0;
> Shouldn't these have been -EINVAL or -ENOTSUPP in previous patches?
> + return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, true);
> }
>
> struct bpf_iter_seq_rhash_map_info {
* Re: [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab
2026-04-13 23:25 ` Emil Tsalapatis
@ 2026-04-14 8:08 ` Mykyta Yatsenko
2026-04-14 17:47 ` Emil Tsalapatis
0 siblings, 1 reply; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-14 8:08 UTC (permalink / raw)
To: Emil Tsalapatis, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On 4/14/26 12:25 AM, Emil Tsalapatis wrote:
> On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> Add batch operations for BPF_MAP_TYPE_RHASH.
>>
>> Batch operations:
>> * rhtab_map_lookup_batch: Bulk lookup of elements by bucket
>> * rhtab_map_lookup_and_delete_batch: Atomic bulk lookup and delete
>>
>> The batch implementation uses rhashtable_walk_enter_from() to resume
>> iteration from the last collected key. When the buffer fills, the last
>> key becomes the cursor for the next batch call.
>>
>> Also implements rhtab_map_mem_usage() to report memory consumption.
>>
>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>> ---
>> kernel/bpf/hashtab.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 134 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index e79c194e2779..a79d434dc626 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -3051,19 +3051,150 @@ static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
>>
>> static u64 rhtab_map_mem_usage(const struct bpf_map *map)
>> {
>> - return 0;
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + u64 num_entries;
>> +
>> + num_entries = atomic_read(&rhtab->ht.nelems);
>> + return sizeof(struct bpf_rhtab) + rhtab->elem_size * num_entries;
>> +}
>> +
>> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
>> + const union bpf_attr *attr,
>> + union bpf_attr __user *uattr,
>> + bool do_delete)
>> +{
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + void __user *uvalues = u64_to_user_ptr(attr->batch.values);
>> + void __user *ukeys = u64_to_user_ptr(attr->batch.keys);
>> + void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
>> + void *buf = NULL, *keys = NULL, *values = NULL, *dst_key, *dst_val;
>> + struct rhtab_elem **del_elems = NULL;
>> + u32 max_count, total, key_size, value_size, i;
>> + struct rhashtable_iter iter;
>> + struct rhtab_elem *elem;
>> + u64 elem_map_flags, map_flags;
>> + int ret = 0;
>> +
>> + elem_map_flags = attr->batch.elem_flags;
>> + if ((elem_map_flags & ~BPF_F_LOCK) ||
>> + ((elem_map_flags & BPF_F_LOCK) &&
>> + !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>> + return -EINVAL;
>> +
>> + map_flags = attr->batch.flags;
>> + if (map_flags)
>> + return -EINVAL;
>> +
>> + max_count = attr->batch.count;
>> + if (!max_count)
>> + return 0;
>> +
>> + if (put_user(0, &uattr->batch.count))
>> + return -EFAULT;
>> +
>> + key_size = map->key_size;
>> + value_size = map->value_size;
>> +
>> + keys = kvmalloc_array(max_count, key_size, GFP_USER | __GFP_NOWARN);
>> + values = kvmalloc_array(max_count, value_size, GFP_USER | __GFP_NOWARN);
>> + if (do_delete)
>> + del_elems = kvmalloc_array(max_count, sizeof(void *),
>> + GFP_USER | __GFP_NOWARN);
>> +
>> + if (!keys || !values || (do_delete && !del_elems)) {
>> + ret = -ENOMEM;
>> + goto free;
>> + }
>> +
>> + /*
>> + * Use the last key from the previous batch as cursor.
>> + * enter_from positions at that key's bucket, walk_next
>> + * returns the successor in O(1).
>> + * First call (ubatch == NULL): starts from bucket 0.
>> + */
>> + if (ubatch) {
>> + buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
>> + if (!buf) {
>> + ret = -ENOMEM;
>> + goto free;
>> + }
>> + if (copy_from_user(buf, ubatch, key_size)) {
>> + ret = -EFAULT;
>> + goto free;
>> + }
>> + }
>> +
>> + scoped_guard(rcu) {
>
> AFAICT this guard makes sure the RCU critical section extends from the
> beginning of rhashtable_walk_enter_from all the way to walk_stop(), is
> that correct?
>
This guard makes sure that when rhashtable_walk_enter_from()
initializes the iterator, the element it was initialized with is not
freed. rhashtable_walk_start() takes the RCU read lock as well, which
is why I can end the guard right after rhashtable_walk_start().
* Re: [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab
2026-04-13 22:44 ` Emil Tsalapatis
@ 2026-04-14 8:11 ` Mykyta Yatsenko
0 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-14 8:11 UTC (permalink / raw)
To: Emil Tsalapatis, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On 4/13/26 11:44 PM, Emil Tsalapatis wrote:
> On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> Implement rhtab_map_get_next_key() and rhtab_map_free_internal_structs()
>> of the BPF resizable hashtable. Both are only called from syscall, so
>> it's safe to use rhashtable walk API that uses spinlocks internally.
>>
>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>> ---
>> kernel/bpf/hashtab.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 77 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index ea7314cc3703..7eee450a321e 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2921,13 +2921,89 @@ static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
>> return insn - insn_buf;
>> }
>>
>> +/* Helper to get next element, handling -EAGAIN during resize */
>> +static struct rhtab_elem *rhtab_iter_next(struct rhashtable_iter *iter)
>> +{
>> + struct rhtab_elem *elem;
>> +
>> + while ((elem = rhashtable_walk_next(iter))) {
>> + if (IS_ERR(elem)) {
>> + if (PTR_ERR(elem) == -EAGAIN)
>> + continue;
>> + return NULL;
>> + }
>> + return elem;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> static void rhtab_map_free_internal_structs(struct bpf_map *map)
>> {
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + struct rhashtable_iter iter;
>> + struct rhtab_elem *elem;
>> +
>> + if (!bpf_map_has_internal_structs(map))
>> + return;
>> +
>> + /*
>> + * An element can be processed twice if the rhashtable is resized concurrently.
>> + * The special structs freeing path handles a duplicate cancel_and_free.
>> + */
>> + rhashtable_walk_enter(&rhtab->ht, &iter);
>> + rhashtable_walk_start(&iter);
>> +
>> + for (elem = rhtab_iter_next(&iter); elem; elem = rhtab_iter_next(&iter))
>> + bpf_map_free_internal_structs(map, rhtab_elem_value(elem, map->key_size));
>> +
>> + rhashtable_walk_stop(&iter);
>> + rhashtable_walk_exit(&iter);
>> }
>>
>> static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>> {
>> - return -EOPNOTSUPP;
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + struct rhashtable_iter iter;
>> + struct rhtab_elem *elem;
>> + bool key_found;
>> + int ret = -ENOENT;
>> +
>> + /*
>> + * Hold RCU across enter_from + walk_start to prevent the
>> + * element cached by enter_from from being freed before
>> + * walk_start re-acquires RCU.
>> + */
>> + guard(rcu)();
>> + rhashtable_walk_enter_from(&rhtab->ht, &iter, key, rhtab->params);
>> + key_found = key && iter.p;
>> + rhashtable_walk_start(&iter);
>> +
>> + elem = rhtab_iter_next(&iter);
>> + if (elem) {
>> + memcpy(next_key, elem->data, map->key_size);
>> + ret = 0;
>> + }
>> +
>> + rhashtable_walk_stop(&iter);
>> + rhashtable_walk_exit(&iter);
>> +
>> + if (ret == 0 || key_found)
>> + return ret;
>> +
>> + /* Key was not found, restart from the beginning */
>> + rhashtable_walk_enter(&rhtab->ht, &iter);
>> + rhashtable_walk_start(&iter);
>> +
>> + elem = rhtab_iter_next(&iter);
>> + if (elem) {
>> + memcpy(next_key, elem->data, map->key_size);
>> + ret = 0;
>> + }
>> +
>> + rhashtable_walk_stop(&iter);
>> + rhashtable_walk_exit(&iter);
> Nit: The two attempts are partly identical, can you try to
> roll them together and use a goto to retry? Or do you think
> this will be too messy?
>
It felt nicer just to write this code sequentially without any gotos or
loops. No strong opinion here.
> Otherwise:
>
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
>
>> + return ret;
>> }
>>
>> static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
>
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-13 20:37 ` Emil Tsalapatis
@ 2026-04-14 8:34 ` Mykyta Yatsenko
0 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-14 8:34 UTC (permalink / raw)
To: Emil Tsalapatis, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On 4/13/26 9:37 PM, Emil Tsalapatis wrote:
> On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast()
>> for deletes, and rhashtable_lookup_get_insert_fast() for inserts.
>>
>> Updates modify values in place under RCU rather than allocating a
>> new element and swapping the pointer (as regular htab does). This
>> trades read consistency for performance: concurrent readers may
>> see partial updates. Users requiring consistent reads should use
>> BPF_F_LOCK.
>>
>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>> ---
>> kernel/bpf/hashtab.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 134 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 9e7806814fec..ea7314cc3703 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2755,6 +2755,11 @@ struct bpf_rhtab {
>> u32 elem_size;
>> };
>>
>> +static inline void *rhtab_elem_value(struct rhtab_elem *l, u32 key_size)
>> +{
>> + return l->data + round_up(key_size, 8);
>> +}
>> +
>> static struct bpf_map *rhtab_map_alloc(union bpf_attr *attr)
>> {
>> return ERR_PTR(-EOPNOTSUPP);
>> @@ -2769,33 +2774,155 @@ static void rhtab_map_free(struct bpf_map *map)
>> {
>> }
>>
>> +static void *rhtab_lookup_elem(struct bpf_map *map, void *key)
>> +{
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + /* Using constant zeroed params to force rhashtable to use the inlined hashfunc */
>> + static const struct rhashtable_params params = { 0 };
>> +
>> + return rhashtable_lookup_likely(&rhtab->ht, key, params);
>> +}
>> +
>> static void *rhtab_map_lookup_elem(struct bpf_map *map, void *key) __must_hold(RCU)
>> {
>> - return NULL;
>> + struct rhtab_elem *l;
>> +
>> + l = rhtab_lookup_elem(map, key);
>> + return l ? rhtab_elem_value(l, map->key_size) : NULL;
>> +}
>> +
>> +static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem)
>> +{
>> + int err;
>> +
>> + err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab->params);
>> + if (err)
>> + return err;
>> +
>> + bpf_map_free_internal_structs(&rhtab->map, rhtab_elem_value(elem, rhtab->map.key_size));
>> + bpf_mem_cache_free_rcu(&rhtab->ma, elem);
>> + return 0;
>> }
>>
>> static long rhtab_map_delete_elem(struct bpf_map *map, void *key)
>> {
>> - return -EOPNOTSUPP;
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + struct rhtab_elem *l;
>> +
>> + guard(rcu)();
>> + l = rhtab_lookup_elem(map, key);
>> + return l ? rhtab_delete_elem(rhtab, l) : -ENOENT;
>> +}
>> +
>> +static void rhtab_read_elem_value(struct bpf_map *map, void *dst, struct rhtab_elem *elem,
>> + u64 flags)
>> +{
>> + void *src = rhtab_elem_value(elem, map->key_size);
>> +
>> + if (flags & BPF_F_LOCK)
>> + copy_map_value_locked(map, dst, src, true);
>> + else
>> + copy_map_value(map, dst, src);
>> }
>>
>> static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
>> {
>> - return -EOPNOTSUPP;
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + struct rhtab_elem *l;
>> + int err;
>> +
>> + if ((flags & ~BPF_F_LOCK) ||
>> + ((flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>> + return -EINVAL;
>> +
>> + /* Make sure element is not deleted between lookup and copy */
>> + guard(rcu)();
>> +
>> + l = rhtab_lookup_elem(map, key);
>> + if (!l)
>> + return -ENOENT;
>> +
>> + rhtab_read_elem_value(map, value, l, flags);
>> + err = rhtab_delete_elem(rhtab, l);
>> + if (err)
>> + return err;
>> +
>> + check_and_init_map_value(map, value);
>> + return 0;
>> }
>>
>> -static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
>> +static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
>> + u64 map_flags)
>> {
>> - return -EOPNOTSUPP;
>> + if (map_flags & BPF_NOEXIST)
>> + return -EEXIST;
>> +
>> + if (map_flags & BPF_F_LOCK)
>> + copy_map_value_locked(map, rhtab_elem_value(elem, map->key_size), value, false);
>> + else
>> + copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
>
> It looks like Sashiko is accurate about special fields not getting handled here.
>
Special fields are not copied; this matches the behavior of other maps.
>> + return 0;
>> }
>>
>> -static void rhtab_map_free_internal_structs(struct bpf_map *map)
>> +static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
>> {
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + struct rhtab_elem *elem, *tmp;
>> +
>> + if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
>> + return -EINVAL;
>> +
>> + if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
>> + return -EINVAL;
>> +
>> + guard(rcu)();
>> + elem = rhtab_lookup_elem(map, key);
>> + if (elem)
>> + return rhtab_map_update_existing(map, elem, value, map_flags);
>> +
>> + if (map_flags & BPF_EXIST)
>> + return -ENOENT;
>> +
>> + /* Check max_entries limit before inserting new element */
>> + if (atomic_read(&rhtab->ht.nelems) >= map->max_entries)
>> + return -E2BIG;
>> +
>> + elem = bpf_mem_cache_alloc(&rhtab->ma);
>> + if (!elem)
>> + return -ENOMEM;
>> +
>> + memcpy(elem->data, key, map->key_size);
>> + copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
>> +
>> + tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab->params);
>> + if (tmp) {
>> + bpf_mem_cache_free(&rhtab->ma, elem);
>> + if (IS_ERR(tmp))
>> + return PTR_ERR(tmp);
>> +
>> + return rhtab_map_update_existing(map, tmp, value, map_flags);
>> + }
>> +
>> + return 0;
>> }
>
> Note: I am actually skeptical about Sashiko's warning here. Sure, the
> update may get overwritten even as we are returning 0, but we are
> providing no guarantees about how long the write will survive in the
> map, and there is no inherent atomicity between an update and any other
> operations.
>
Right, either way we can't guarantee that the user is going to see their
write anyway, if concurrent writers exist.
>>
>> static int rhtab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
>> {
>> - return -EOPNOTSUPP;
>> + struct bpf_insn *insn = insn_buf;
>> + const int ret = BPF_REG_0;
>> +
>> + BUILD_BUG_ON(!__same_type(&rhtab_lookup_elem,
>> + (void *(*)(struct bpf_map *map, void *key)) NULL));
>> + *insn++ = BPF_EMIT_CALL(rhtab_lookup_elem);
>> + *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1);
>> + *insn++ = BPF_ALU64_IMM(BPF_ADD, ret,
>> + offsetof(struct rhtab_elem, data) + round_up(map->key_size, 8));
>> +
>> + return insn - insn_buf;
>> +}
>> +
>> +static void rhtab_map_free_internal_structs(struct bpf_map *map)
>> +{
>> }
>
> This gets filled in in later patches, but the fact that it's here as a no-op
> debatably crosses the line into being non-bisectable. Can we move it to the walk
> patch, since that's where it gets populated?
>
I think this is bisectable, because with this patch you can't create a
resizable hashmap anyway (allocation support comes as the last patch of
the series). Alexei suggested to merge these patches, so I'll do that.
>>
>> static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab Mykyta Yatsenko
2026-04-12 23:10 ` Alexei Starovoitov
2026-04-13 20:37 ` Emil Tsalapatis
@ 2026-04-14 10:25 ` Leon Hwang
2026-04-14 10:28 ` Mykyta Yatsenko
2 siblings, 1 reply; 52+ messages in thread
From: Leon Hwang @ 2026-04-14 10:25 UTC (permalink / raw)
To: mykyta.yatsenko5
Cc: andrii, ast, bpf, daniel, eddyz87, herbert, kafai, kernel-team,
memxor, yatsenko, Leon Hwang
On Wed, Apr 08, 2026 at 08:10:08AM -0700, Mykyta Yatsenko wrote:
[...]
>
> static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
> {
>- return -EOPNOTSUPP;
>+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>+ struct rhtab_elem *l;
>+ int err;
>+
>+ if ((flags & ~BPF_F_LOCK) ||
>+ ((flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>+ return -EINVAL;
At the moment, we can check the flags using the bpf_map_check_op_flags() helper.
This helper can be applied to the flags checks in this series.
>+
>+ /* Make sure element is not deleted between lookup and copy */
>+ guard(rcu)();
>+
>+ l = rhtab_lookup_elem(map, key);
>+ if (!l)
>+ return -ENOENT;
>+
>+ rhtab_read_elem_value(map, value, l, flags);
>+ err = rhtab_delete_elem(rhtab, l);
>+ if (err)
>+ return err;
>+
>+ check_and_init_map_value(map, value);
>+ return 0;
> }
[...]
>
>-static void rhtab_map_free_internal_structs(struct bpf_map *map)
>+static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
> {
>+ struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>+ struct rhtab_elem *elem, *tmp;
>+
>+ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
>+ return -EINVAL;
>+
>+ if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
>+ return -EINVAL;
ditto
Thanks,
Leon
>+ [...]
* Re: [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab
2026-04-14 10:25 ` Leon Hwang
@ 2026-04-14 10:28 ` Mykyta Yatsenko
0 siblings, 0 replies; 52+ messages in thread
From: Mykyta Yatsenko @ 2026-04-14 10:28 UTC (permalink / raw)
To: Leon Hwang
Cc: andrii, ast, bpf, daniel, eddyz87, herbert, kafai, kernel-team,
memxor, yatsenko
On 4/14/26 11:25 AM, Leon Hwang wrote:
> On Wed, Apr 08, 2026 at 08:10:08AM -0700, Mykyta Yatsenko wrote:
> [...]
>>
>> static int rhtab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags)
>> {
>> - return -EOPNOTSUPP;
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + struct rhtab_elem *l;
>> + int err;
>> +
>> + if ((flags & ~BPF_F_LOCK) ||
>> + ((flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>> + return -EINVAL;
>
> At the moment, we can check the flags using the bpf_map_check_op_flags() helper.
>
> This helper can be applied to the flags checks in this series.
>
Thanks, I'll use it.
>> +
>> + /* Make sure element is not deleted between lookup and copy */
>> + guard(rcu)();
>> +
>> + l = rhtab_lookup_elem(map, key);
>> + if (!l)
>> + return -ENOENT;
>> +
>> + rhtab_read_elem_value(map, value, l, flags);
>> + err = rhtab_delete_elem(rhtab, l);
>> + if (err)
>> + return err;
>> +
>> + check_and_init_map_value(map, value);
>> + return 0;
>> }
> [...]
>>
>> -static void rhtab_map_free_internal_structs(struct bpf_map *map)
>> +static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
>> {
>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> + struct rhtab_elem *elem, *tmp;
>> +
>> + if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
>> + return -EINVAL;
>> +
>> + if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
>> + return -EINVAL;
>
> ditto
>
> Thanks,
> Leon
>
>> + [...]
* Re: [PATCH RFC bpf-next v2 11/18] libbpf: Support resizable hashtable
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 11/18] libbpf: Support resizable hashtable Mykyta Yatsenko
@ 2026-04-14 17:46 ` Emil Tsalapatis
0 siblings, 0 replies; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-14 17:46 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Add BPF_MAP_TYPE_RHASH to libbpf's map type name table and feature
> probing so that libbpf-based tools can create and identify resizable
> hash maps.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
> tools/lib/bpf/libbpf.c | 1 +
> tools/lib/bpf/libbpf_probes.c | 3 +++
> 2 files changed, 4 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 9ea41f40dc82..a0324e5b6085 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -192,6 +192,7 @@ static const char * const map_type_name[] = {
> [BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage",
> [BPF_MAP_TYPE_ARENA] = "arena",
> [BPF_MAP_TYPE_INSN_ARRAY] = "insn_array",
> + [BPF_MAP_TYPE_RHASH] = "rhash",
> };
>
> static const char * const prog_type_name[] = {
> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
> index b70d9637ecf5..e40819465ddc 100644
> --- a/tools/lib/bpf/libbpf_probes.c
> +++ b/tools/lib/bpf/libbpf_probes.c
> @@ -309,6 +309,9 @@ static int probe_map_create(enum bpf_map_type map_type)
> value_size = sizeof(__u64);
> opts.map_flags = BPF_F_NO_PREALLOC;
> break;
> + case BPF_MAP_TYPE_RHASH:
> + opts.map_flags = BPF_F_NO_PREALLOC;
> + break;
> case BPF_MAP_TYPE_CGROUP_STORAGE:
> case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE:
> key_size = sizeof(struct bpf_cgroup_storage_key);
* Re: [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab
2026-04-14 8:08 ` Mykyta Yatsenko
@ 2026-04-14 17:47 ` Emil Tsalapatis
0 siblings, 0 replies; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-14 17:47 UTC (permalink / raw)
To: Mykyta Yatsenko, Emil Tsalapatis, bpf, ast, andrii, daniel, kafai,
kernel-team, eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Tue Apr 14, 2026 at 4:08 AM EDT, Mykyta Yatsenko wrote:
>
>
> On 4/14/26 12:25 AM, Emil Tsalapatis wrote:
>> On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
>>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>>
>>> Add batch operations for BPF_MAP_TYPE_RHASH.
>>>
>>> Batch operations:
>>> * rhtab_map_lookup_batch: Bulk lookup of elements by bucket
>>> * rhtab_map_lookup_and_delete_batch: Atomic bulk lookup and delete
>>>
>>> The batch implementation uses rhashtable_walk_enter_from() to resume
>>> iteration from the last collected key. When the buffer fills, the last
>>> key becomes the cursor for the next batch call.
>>>
>>> Also implements rhtab_map_mem_usage() to report memory consumption.
>>>
>>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>>> ---
>>> kernel/bpf/hashtab.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 134 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index e79c194e2779..a79d434dc626 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>> @@ -3051,19 +3051,150 @@ static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
>>>
>>> static u64 rhtab_map_mem_usage(const struct bpf_map *map)
>>> {
>>> - return 0;
>>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>>> + u64 num_entries;
>>> +
>>> + num_entries = atomic_read(&rhtab->ht.nelems);
>>> + return sizeof(struct bpf_rhtab) + rhtab->elem_size * num_entries;
>>> +}
>>> +
>>> +static int __rhtab_map_lookup_and_delete_batch(struct bpf_map *map,
>>> + const union bpf_attr *attr,
>>> + union bpf_attr __user *uattr,
>>> + bool do_delete)
>>> +{
>>> + struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>>> + void __user *uvalues = u64_to_user_ptr(attr->batch.values);
>>> + void __user *ukeys = u64_to_user_ptr(attr->batch.keys);
>>> + void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
>>> + void *buf = NULL, *keys = NULL, *values = NULL, *dst_key, *dst_val;
>>> + struct rhtab_elem **del_elems = NULL;
>>> + u32 max_count, total, key_size, value_size, i;
>>> + struct rhashtable_iter iter;
>>> + struct rhtab_elem *elem;
>>> + u64 elem_map_flags, map_flags;
>>> + int ret = 0;
>>> +
>>> + elem_map_flags = attr->batch.elem_flags;
>>> + if ((elem_map_flags & ~BPF_F_LOCK) ||
>>> + ((elem_map_flags & BPF_F_LOCK) &&
>>> + !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
>>> + return -EINVAL;
>>> +
>>> + map_flags = attr->batch.flags;
>>> + if (map_flags)
>>> + return -EINVAL;
>>> +
>>> + max_count = attr->batch.count;
>>> + if (!max_count)
>>> + return 0;
>>> +
>>> + if (put_user(0, &uattr->batch.count))
>>> + return -EFAULT;
>>> +
>>> + key_size = map->key_size;
>>> + value_size = map->value_size;
>>> +
>>> + keys = kvmalloc_array(max_count, key_size, GFP_USER | __GFP_NOWARN);
>>> + values = kvmalloc_array(max_count, value_size, GFP_USER | __GFP_NOWARN);
>>> + if (do_delete)
>>> + del_elems = kvmalloc_array(max_count, sizeof(void *),
>>> + GFP_USER | __GFP_NOWARN);
>>> +
>>> + if (!keys || !values || (do_delete && !del_elems)) {
>>> + ret = -ENOMEM;
>>> + goto free;
>>> + }
>>> +
>>> + /*
>>> + * Use the last key from the previous batch as cursor.
>>> + * enter_from positions at that key's bucket, walk_next
>>> + * returns the successor in O(1).
>>> + * First call (ubatch == NULL): starts from bucket 0.
>>> + */
>>> + if (ubatch) {
>>> + buf = kmalloc(key_size, GFP_USER | __GFP_NOWARN);
>>> + if (!buf) {
>>> + ret = -ENOMEM;
>>> + goto free;
>>> + }
>>> + if (copy_from_user(buf, ubatch, key_size)) {
>>> + ret = -EFAULT;
>>> + goto free;
>>> + }
>>> + }
>>> +
>>> + scoped_guard(rcu) {
>>
>> AFAICT this guard makes sure the RCU critical section extends from the
>> beginning of rhashtable_walk_enter_from all the way to walk_stop(), is
>> that correct?
>>
>
> This guard makes sure that when rhashtable_walk_enter_from()
> initializes the iterator, the element it was initialized with is not
> freed. rhashtable_walk_start() takes the RCU lock as well, which is why
> I can end the guard right after rhashtable_walk_start().
>
Sounds good, thanks.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
>>> + rhashtable_walk_enter_from(&rhtab->ht, &iter, buf, rhtab->params);
>>> + rhashtable_walk_start(&iter);
>>> + }
>>> +
>>> + dst_key = keys;
>>> + dst_val = values;
>>> + total = 0;
>>> +
>>> + while (total < max_count) {
>>> + elem = rhtab_iter_next(&iter);
>>> + if (!elem)
>>> + break;
>>> +
>>> + memcpy(dst_key, elem->data, key_size);
>>> + rhtab_read_elem_value(map, dst_val, elem, elem_map_flags);
>>> + check_and_init_map_value(map, dst_val);
>>> +
>>> + if (do_delete)
>>> + del_elems[total] = elem;
>>> +
>>> + dst_key += key_size;
>>> + dst_val += value_size;
>>> + total++;
>>> + }
>>> +
>>> + if (do_delete) {
>>> + for (i = 0; i < total; i++)
>>> + rhtab_delete_elem(rhtab, del_elems[i]);
>>> + }
>>> +
>>> + rhashtable_walk_stop(&iter);
>>> + rhashtable_walk_exit(&iter);
>>> +
>>> + if (total == 0) {
>>> + ret = -ENOENT;
>>> + goto free;
>>> + }
>>> +
>>> + /* Signal end of table when we collected fewer than requested */
>>> + if (total < max_count)
>>> + ret = -ENOENT;
>>> +
>>> + /* Write last key as cursor for the next batch call */
>>> + if (copy_to_user(ukeys, keys, total * key_size) ||
>>> + copy_to_user(uvalues, values, total * value_size) ||
>>> + put_user(total, &uattr->batch.count) ||
>>> + copy_to_user(u64_to_user_ptr(attr->batch.out_batch),
>>> + dst_key - key_size, key_size)) {
>>> + ret = -EFAULT;
>>> + goto free;
>>> + }
>>> +
>>> +free:
>>> + kfree(buf);
>>> + kvfree(keys);
>>> + kvfree(values);
>>> + kvfree(del_elems);
>>> + return ret;
>>> }
>>>
>>> static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
>>> union bpf_attr __user *uattr)
>>> {
>>> - return 0;
>>> + return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, false);
>>> }
>>>
>>> static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
>>> union bpf_attr __user *uattr)
>>> {
>>> - return 0;
>> Shouldn't these have been -EINVAL or -ENOSUPP in previous patches?
>>
>>> + return __rhtab_map_lookup_and_delete_batch(map, attr, uattr, true);
>>> }
>>>
>>> struct bpf_iter_seq_rhash_map_info {
>>
* Re: [PATCH RFC bpf-next v2 08/18] bpf: Implement iterator APIs for resizable hashtab
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 08/18] bpf: Implement iterator APIs " Mykyta Yatsenko
@ 2026-04-14 17:49 ` Emil Tsalapatis
0 siblings, 0 replies; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-14 17:49 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Wire up seq_file BPF iterator for BPF_MAP_TYPE_RHASH so that
> bpf_iter and bpftool map dump work with resizable hash maps.
>
> Use rhashtable_walk_enter_from() with a saved last_key to resume
> iteration across read() calls without linear skip from the
> beginning on each seq_start.
>
> Also implement rhtab_map_seq_show_elem() for bpftool map dump
> in non-iterator mode.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
> kernel/bpf/hashtab.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 94 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index a79d434dc626..492c6a9154b6 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -3008,6 +3008,19 @@ static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key
>
> static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
> {
> + void *value;
> +
> + /* Guarantee that hashtab value is not freed */
> + guard(rcu)();
> +
> + value = rhtab_map_lookup_elem(map, key);
> + if (!value)
> + return;
> +
> + btf_type_seq_show(map->btf, map->btf_key_type_id, key, m);
> + seq_puts(m, ": ");
> + btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
> + seq_putc(m, '\n');
> }
>
> static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
> @@ -3201,36 +3214,113 @@ struct bpf_iter_seq_rhash_map_info {
> struct bpf_map *map;
> struct bpf_rhtab *rhtab;
> struct rhashtable_iter iter;
> - u32 skip_elems;
> + void *last_key;
Nit: Could we avoid adding skip_elems to this in the first place since
it's never used?
> bool iter_active;
> };
>
Question: Would it make sense/be worth annotating the seq functions with
__acquires/__releases(RCU)?
> static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
> {
> - return NULL;
> + struct bpf_iter_seq_rhash_map_info *info = seq->private;
> + struct rhtab_elem *elem;
> + void *key = *pos > 0 ? info->last_key : NULL;
> +
> + scoped_guard(rcu) {
> + rhashtable_walk_enter_from(&info->rhtab->ht, &info->iter,
> + key, info->rhtab->params);
> + rhashtable_walk_start(&info->iter);
> + }
> + info->iter_active = true;
> +
> + elem = rhtab_iter_next(&info->iter);
> + if (!elem)
> + return NULL;
> + /*
> + * If *pos is not 0, the previous iteration failed on this elem,
> + * so we are restarting it. That's why there is no need to increment *pos.
> + */
> + if (*pos == 0)
> + ++*pos;
> + return elem;
> }
>
> static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> {
> - return NULL;
> + struct bpf_iter_seq_rhash_map_info *info = seq->private;
> + struct rhtab_elem *elem = v;
> +
> + /* Save current key for O(1) resume in next seq_start */
> + memcpy(info->last_key, elem->data, info->map->key_size);
> +
> + ++*pos;
> +
> + return rhtab_iter_next(&info->iter);
> +}
> +
> +static int __bpf_rhash_map_seq_show(struct seq_file *seq,
> + struct rhtab_elem *elem)
> +{
> + struct bpf_iter_seq_rhash_map_info *info = seq->private;
> + struct bpf_iter__bpf_map_elem ctx = {};
> + struct bpf_iter_meta meta;
> + struct bpf_prog *prog;
> + int ret = 0;
> +
> + meta.seq = seq;
> + prog = bpf_iter_get_info(&meta, elem == NULL);
> + if (prog) {
> + ctx.meta = &meta;
> + ctx.map = info->map;
> + if (elem) {
> + ctx.key = elem->data;
> + ctx.value = rhtab_elem_value(elem, info->map->key_size);
> + }
> + ret = bpf_iter_run_prog(prog, &ctx);
> + }
> +
> + return ret;
> }
>
> static int bpf_rhash_map_seq_show(struct seq_file *seq, void *v)
> {
> - return 0;
> + return __bpf_rhash_map_seq_show(seq, v);
> }
>
> static void bpf_rhash_map_seq_stop(struct seq_file *seq, void *v)
> {
> + struct bpf_iter_seq_rhash_map_info *info = seq->private;
> +
> + if (!v)
> + (void)__bpf_rhash_map_seq_show(seq, NULL);
> +
> + if (info->iter_active) {
> + rhashtable_walk_stop(&info->iter);
> + rhashtable_walk_exit(&info->iter);
> + info->iter_active = false;
> + }
> }
>
> static int bpf_iter_init_rhash_map(void *priv_data, struct bpf_iter_aux_info *aux)
> {
> + struct bpf_iter_seq_rhash_map_info *info = priv_data;
> + struct bpf_map *map = aux->map;
> +
> + info->last_key = kmalloc(map->key_size, GFP_USER);
> + if (!info->last_key)
> + return -ENOMEM;
> +
> + bpf_map_inc_with_uref(map);
> + info->map = map;
> + info->rhtab = container_of(map, struct bpf_rhtab, map);
> + info->iter_active = false;
> return 0;
> }
>
> static void bpf_iter_fini_rhash_map(void *priv_data)
> {
> + struct bpf_iter_seq_rhash_map_info *info = priv_data;
> +
> + kfree(info->last_key);
> + bpf_map_put_with_uref(info->map);
> }
>
> static const struct seq_operations bpf_rhash_map_seq_ops = {
* Re: [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map
2026-04-13 8:10 ` Mykyta Yatsenko
@ 2026-04-14 17:50 ` Emil Tsalapatis
0 siblings, 0 replies; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-14 17:50 UTC (permalink / raw)
To: Mykyta Yatsenko, Emil Tsalapatis, bpf, ast, andrii, daniel, kafai,
kernel-team, eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Mon Apr 13, 2026 at 4:10 AM EDT, Mykyta Yatsenko wrote:
> On 4/10/26 11:31 PM, Emil Tsalapatis wrote:
>> On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
>>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>>
>>> Add resizable hash map into enums where it is needed.
>>>
>>
>> These changes in isolation are difficult to reason about,
>> can we roll this into subsequent patches? Right now this
>> adds the BPF_MAP_TYPE_RHASH without there being a way
>> to create one.
>>
>
Thanks for taking a look, makes sense. I can squash it into the later
commits.
>
For the squashed commit:
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
>>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>>> ---
>>> include/uapi/linux/bpf.h | 1 +
>>> kernel/bpf/map_iter.c | 3 ++-
>>> kernel/bpf/syscall.c | 3 +++
>>> kernel/bpf/verifier.c | 1 +
>>> tools/include/uapi/linux/bpf.h | 1 +
>>> 5 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 552bc5d9afbd..822582c04f22 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -1046,6 +1046,7 @@ enum bpf_map_type {
>>> BPF_MAP_TYPE_CGRP_STORAGE,
>>> BPF_MAP_TYPE_ARENA,
>>> BPF_MAP_TYPE_INSN_ARRAY,
>>> + BPF_MAP_TYPE_RHASH,
>>> __MAX_BPF_MAP_TYPE
>>> };
>>>
>>> diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
>>> index 261a03ea73d3..4a2aafbe28b4 100644
>>> --- a/kernel/bpf/map_iter.c
>>> +++ b/kernel/bpf/map_iter.c
>>> @@ -119,7 +119,8 @@ static int bpf_iter_attach_map(struct bpf_prog *prog,
>>> is_percpu = true;
>>> else if (map->map_type != BPF_MAP_TYPE_HASH &&
>>> map->map_type != BPF_MAP_TYPE_LRU_HASH &&
>>> - map->map_type != BPF_MAP_TYPE_ARRAY)
>>> + map->map_type != BPF_MAP_TYPE_ARRAY &&
>>> + map->map_type != BPF_MAP_TYPE_RHASH)
>>> goto put_map;
>>>
>>> key_acc_size = prog->aux->max_rdonly_access;
>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 51ade3cde8bb..0a5ec417638d 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -1287,6 +1287,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
>>> case BPF_SPIN_LOCK:
>>> case BPF_RES_SPIN_LOCK:
>>> if (map->map_type != BPF_MAP_TYPE_HASH &&
>>> + map->map_type != BPF_MAP_TYPE_RHASH &&
>>> map->map_type != BPF_MAP_TYPE_ARRAY &&
>>> map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
>>> map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
>>> @@ -1464,6 +1465,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
>>> case BPF_MAP_TYPE_CGROUP_ARRAY:
>>> case BPF_MAP_TYPE_ARRAY_OF_MAPS:
>>> case BPF_MAP_TYPE_HASH:
>>> + case BPF_MAP_TYPE_RHASH:
>>> case BPF_MAP_TYPE_PERCPU_HASH:
>>> case BPF_MAP_TYPE_HASH_OF_MAPS:
>>> case BPF_MAP_TYPE_RINGBUF:
>>> @@ -2199,6 +2201,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>>> map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
>>> map->map_type == BPF_MAP_TYPE_LRU_HASH ||
>>> map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
>>> + map->map_type == BPF_MAP_TYPE_RHASH ||
>>> map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
>>> if (!bpf_map_is_offloaded(map)) {
>>> bpf_disable_instrumentation();
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 8c1cf2eb6cbb..53523ab953c2 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -21816,6 +21816,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>>> if (prog->sleepable)
>>> switch (map->map_type) {
>>> case BPF_MAP_TYPE_HASH:
>>> + case BPF_MAP_TYPE_RHASH:
>>> case BPF_MAP_TYPE_LRU_HASH:
>>> case BPF_MAP_TYPE_ARRAY:
>>> case BPF_MAP_TYPE_PERCPU_HASH:
>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>>> index 677be9a47347..9d7df174770a 100644
>>> --- a/tools/include/uapi/linux/bpf.h
>>> +++ b/tools/include/uapi/linux/bpf.h
>>> @@ -1046,6 +1046,7 @@ enum bpf_map_type {
>>> BPF_MAP_TYPE_CGRP_STORAGE,
>>> BPF_MAP_TYPE_ARENA,
>>> BPF_MAP_TYPE_INSN_ARRAY,
>>> + BPF_MAP_TYPE_RHASH,
>>> __MAX_BPF_MAP_TYPE
>>> };
>>>
>>
* Re: [PATCH RFC bpf-next v2 17/18] bpftool: Add rhash map documentation
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 17/18] bpftool: Add rhash map documentation Mykyta Yatsenko
@ 2026-04-14 17:51 ` Emil Tsalapatis
0 siblings, 0 replies; 52+ messages in thread
From: Emil Tsalapatis @ 2026-04-14 17:51 UTC (permalink / raw)
To: Mykyta Yatsenko, bpf, ast, andrii, daniel, kafai, kernel-team,
eddyz87, memxor, herbert
Cc: Mykyta Yatsenko
On Wed Apr 8, 2026 at 11:10 AM EDT, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Make bpftool documentation aware of the resizable hash map.
>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
> tools/bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
> tools/bpf/bpftool/map.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index 1af3305ea2b2..5daf3de5c744 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -56,7 +56,7 @@ MAP COMMANDS
> | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
> | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
> | | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena**
> -| | **insn_array** }
> +| | **insn_array** | **rhash** }
>
> DESCRIPTION
> ===========
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index 7ebf7dbcfba4..71a45d96617e 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1478,7 +1478,7 @@ static int do_help(int argc, char **argv)
> " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
> " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
> " task_storage | bloom_filter | user_ringbuf | cgrp_storage | arena |\n"
> - " insn_array }\n"
> + " insn_array | rhash }\n"
> " " HELP_SPEC_OPTIONS " |\n"
> " {-f|--bpffs} | {-n|--nomount} }\n"
> "",
end of thread, other threads:[~2026-04-14 17:51 UTC | newest]
Thread overview: 52+ messages
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map Mykyta Yatsenko
2026-04-10 22:31 ` Emil Tsalapatis
2026-04-13 8:10 ` Mykyta Yatsenko
2026-04-14 17:50 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 02/18] bpf: Add resizable hashtab skeleton Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab Mykyta Yatsenko
2026-04-12 23:10 ` Alexei Starovoitov
2026-04-13 10:52 ` Mykyta Yatsenko
2026-04-13 16:24 ` Alexei Starovoitov
2026-04-13 16:27 ` Daniel Borkmann
2026-04-13 19:43 ` Mykyta Yatsenko
2026-04-13 20:37 ` Emil Tsalapatis
2026-04-14 8:34 ` Mykyta Yatsenko
2026-04-14 10:25 ` Leon Hwang
2026-04-14 10:28 ` Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
2026-04-12 23:13 ` Alexei Starovoitov
2026-04-13 12:22 ` Mykyta Yatsenko
2026-04-13 22:22 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab Mykyta Yatsenko
2026-04-13 22:44 ` Emil Tsalapatis
2026-04-14 8:11 ` Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 06/18] bpf: Implement bpf_each_rhash_elem() using walk API Mykyta Yatsenko
2026-04-13 23:02 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab Mykyta Yatsenko
2026-04-13 23:25 ` Emil Tsalapatis
2026-04-14 8:08 ` Mykyta Yatsenko
2026-04-14 17:47 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 08/18] bpf: Implement iterator APIs " Mykyta Yatsenko
2026-04-14 17:49 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 09/18] bpf: Implement alloc and free " Mykyta Yatsenko
2026-04-12 23:15 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 10/18] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 11/18] libbpf: Support resizable hashtable Mykyta Yatsenko
2026-04-14 17:46 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 12/18] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
2026-04-12 23:16 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 13/18] selftests/bpf: Support resizable hashtab in test_maps Mykyta Yatsenko
2026-04-12 23:17 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 14/18] selftests/bpf: Resizable hashtab BPF_F_LOCK tests Mykyta Yatsenko
2026-04-12 23:18 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 15/18] selftests/bpf: Add stress tests for resizable hash get_next_key Mykyta Yatsenko
2026-04-12 23:19 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 16/18] selftests/bpf: Add BPF iterator tests for resizable hash map Mykyta Yatsenko
2026-04-12 23:20 ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 17/18] bpftool: Add rhash map documentation Mykyta Yatsenko
2026-04-14 17:51 ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 18/18] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
2026-04-12 23:25 ` Alexei Starovoitov
2026-04-12 23:11 ` [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Alexei Starovoitov
2026-04-13 8:28 ` Mykyta Yatsenko