public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map
@ 2026-04-08 15:10 Mykyta Yatsenko
  2026-04-08 15:10 ` [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map Mykyta Yatsenko
                   ` (18 more replies)
  0 siblings, 19 replies; 42+ messages in thread
From: Mykyta Yatsenko @ 2026-04-08 15:10 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	herbert
  Cc: Mykyta Yatsenko

This patch series introduces BPF_MAP_TYPE_RHASH, a new hash map type that
leverages the kernel's rhashtable to provide resizable hash map for BPF.

The existing BPF_MAP_TYPE_HASH uses a fixed number of buckets determined at
map creation time. While this works well for many use cases, it presents
challenges when:

1. The number of elements is unknown at creation time
2. The element count varies significantly during runtime
3. Memory efficiency is important (over-provisioning wastes memory,
 under-provisioning hurts performance)

BPF_MAP_TYPE_RHASH addresses these issues by using rhashtable, which
automatically grows and shrinks based on load factor.

The implementation wraps the kernel's rhashtable with BPF map operations:

- Uses bpf_mem_alloc for RCU-safe memory management
- Supports all standard map operations (lookup, update, delete, get_next_key)
- Supports batch operations (lookup_batch, lookup_and_delete_batch)
- Supports BPF iterators for traversal
- Supports BPF_F_LOCK for spin locks in values
- Requires BPF_F_NO_PREALLOC flag (elements allocated on demand)
- max_entries serves as a hard limit, not bucket count

The series includes comprehensive tests:
- Basic operations in test_maps (lookup, update, delete, get_next_key)
- BPF program tests for lookup/update/delete semantics
- BPF_F_LOCK tests with concurrent access
- Stress tests for get_next_key during concurrent resize operations
- Seq file tests

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Current implementation of the BPF_MAP_TYPE_RHASH does not provide
the same strong guarantees on the values consistency under concurrent
reads/writes as BPF_MAP_TYPE_HASH.
BPF_MAP_TYPE_HASH allocates a new element and atomically swaps the
pointer, so RCU readers always see a complete value. BPF_MAP_TYPE_RHASH
does memcpy in place with no lock held.
rhash trades consistency for speed (5x improvement in update benchmark):
concurrent readers can observe partially updated data. Two concurrent
writers to the same key can also interleave, producing mixed values.
As a solution, user may use BPF_F_LOCK to guarantee consistent reads
and write serialization.
Summary of the read consistency guarantees:
  map type     |  write mechanism |  read consistency
  -------------+------------------+--------------------------
  htab         |  alloc, swap ptr |  always consistent (RCU)
  htab  F_LOCK |  in-place + lock |  consistent if reader locks
  -------------+------------------+--------------------------
  rhtab        |  in-place memcpy |  torn reads
  rhtab F_LOCK |  in-place + lock |  consistent if reader locks

Benchmarks and s390 tests depend on the
https://lore.kernel.org/linux-crypto/20260224192954.819444-1-mykyta.yatsenko5@gmail.com/

Changes in v2:
- Added benchmarks
- Reworked all functions that walk the rhashtable, use walk API, instead
of directly accessing tbl and future_tbl
- Added rhashtable_walk_enter_from() into rhashtable to support O(1)
iteration continuations
- Link to v1: https://lore.kernel.org/r/20260205-rhash-v1-0-30dd6d63c462@meta.com

---
Mykyta Yatsenko (18):
      bpf: Register rhash map
      bpf: Add resizable hashtab skeleton
      bpf: Implement lookup, delete, update for resizable hashtab
      rhashtable: Add rhashtable_walk_enter_from()
      bpf: Implement get_next_key and free_internal_structs for resizable hashtab
      bpf: Implement bpf_each_rhash_elem() using walk API
      bpf: Implement batch ops for resizable hashtab
      bpf: Implement iterator APIs for resizable hashtab
      bpf: Implement alloc and free for resizable hashtab
      bpf: Allow timers, workqueues and task_work in resizable hashtab
      libbpf: Support resizable hashtable
      selftests/bpf: Add basic tests for resizable hash map
      selftests/bpf: Support resizable hashtab in test_maps
      selftests/bpf: Resizable hashtab BPF_F_LOCK tests
      selftests/bpf: Add stress tests for resizable hash get_next_key
      selftests/bpf: Add BPF iterator tests for resizable hash map
      bpftool: Add rhash map documentation
      selftests/bpf: Add resizable hashmap to benchmarks

 include/linux/bpf_types.h                          |   1 +
 include/linux/rhashtable.h                         |  31 +-
 include/uapi/linux/bpf.h                           |   1 +
 kernel/bpf/hashtab.c                               | 730 ++++++++++++++++++++-
 kernel/bpf/map_iter.c                              |   3 +-
 kernel/bpf/syscall.c                               |   4 +
 kernel/bpf/verifier.c                              |   1 +
 lib/rhashtable.c                                   |  53 ++
 lib/test_rhashtable.c                              | 120 ++++
 tools/bpf/bpftool/Documentation/bpftool-map.rst    |   2 +-
 tools/bpf/bpftool/map.c                            |   2 +-
 tools/include/uapi/linux/bpf.h                     |   1 +
 tools/lib/bpf/libbpf.c                             |   1 +
 tools/lib/bpf/libbpf_probes.c                      |   3 +
 tools/testing/selftests/bpf/bench.c                |   6 +
 .../bpf/benchs/bench_bpf_hashmap_full_update.c     |  34 +-
 .../bpf/benchs/bench_bpf_hashmap_lookup.c          |  31 +-
 .../testing/selftests/bpf/benchs/bench_htab_mem.c  |  35 +-
 .../selftests/bpf/map_tests/htab_map_batch_ops.c   |  22 +-
 tools/testing/selftests/bpf/prog_tests/rhash.c     | 502 ++++++++++++++
 .../selftests/bpf/progs/bpf_iter_bpf_rhash_map.c   |  75 +++
 tools/testing/selftests/bpf/progs/rhash.c          | 285 ++++++++
 tools/testing/selftests/bpf/test_maps.c            | 127 +++-
 23 files changed, 2012 insertions(+), 58 deletions(-)
---
base-commit: b199f582a924e04f3f3b1050c484798034c61a12
change-id: 20251103-rhash-7b70069923d8

Best regards,
--  
Mykyta Yatsenko <yatsenko@meta.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2026-04-13 23:25 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-08 15:10 [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 01/18] bpf: Register rhash map Mykyta Yatsenko
2026-04-10 22:31   ` Emil Tsalapatis
2026-04-13  8:10     ` Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 02/18] bpf: Add resizable hashtab skeleton Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 03/18] bpf: Implement lookup, delete, update for resizable hashtab Mykyta Yatsenko
2026-04-12 23:10   ` Alexei Starovoitov
2026-04-13 10:52     ` Mykyta Yatsenko
2026-04-13 16:24       ` Alexei Starovoitov
2026-04-13 16:27         ` Daniel Borkmann
2026-04-13 19:43           ` Mykyta Yatsenko
2026-04-13 20:37   ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 04/18] rhashtable: Add rhashtable_walk_enter_from() Mykyta Yatsenko
2026-04-12 23:13   ` Alexei Starovoitov
2026-04-13 12:22     ` Mykyta Yatsenko
2026-04-13 22:22   ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 05/18] bpf: Implement get_next_key and free_internal_structs for resizable hashtab Mykyta Yatsenko
2026-04-13 22:44   ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 06/18] bpf: Implement bpf_each_rhash_elem() using walk API Mykyta Yatsenko
2026-04-13 23:02   ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 07/18] bpf: Implement batch ops for resizable hashtab Mykyta Yatsenko
2026-04-13 23:25   ` Emil Tsalapatis
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 08/18] bpf: Implement iterator APIs " Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 09/18] bpf: Implement alloc and free " Mykyta Yatsenko
2026-04-12 23:15   ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 10/18] bpf: Allow timers, workqueues and task_work in " Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 11/18] libbpf: Support resizable hashtable Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 12/18] selftests/bpf: Add basic tests for resizable hash map Mykyta Yatsenko
2026-04-12 23:16   ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 13/18] selftests/bpf: Support resizable hashtab in test_maps Mykyta Yatsenko
2026-04-12 23:17   ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 14/18] selftests/bpf: Resizable hashtab BPF_F_LOCK tests Mykyta Yatsenko
2026-04-12 23:18   ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 15/18] selftests/bpf: Add stress tests for resizable hash get_next_key Mykyta Yatsenko
2026-04-12 23:19   ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 16/18] selftests/bpf: Add BPF iterator tests for resizable hash map Mykyta Yatsenko
2026-04-12 23:20   ` Alexei Starovoitov
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 17/18] bpftool: Add rhash map documentation Mykyta Yatsenko
2026-04-08 15:10 ` [PATCH RFC bpf-next v2 18/18] selftests/bpf: Add resizable hashmap to benchmarks Mykyta Yatsenko
2026-04-12 23:25   ` Alexei Starovoitov
2026-04-12 23:11 ` [PATCH RFC bpf-next v2 00/18] bpf: Introduce resizable hash map Alexei Starovoitov
2026-04-13  8:28   ` Mykyta Yatsenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox