BPF List
* [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps
@ 2026-06-22  7:16 Gyutae Bae
From: Gyutae Bae @ 2026-06-22  7:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf
  Cc: John Fastabend, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, linux-kselftest, linux-kernel,
	Minsu Jeon, Siwan Kim, Jonghyeon Kim, Gyutae Bae

From: Gyutae Bae <gyutae.bae@navercorp.com>

This series adds an atomic compare-and-delete primitive to BPF hash
maps, motivated by a TOCTOU race in Cilium's conntrack GC [1]: the
batched GC snapshots CT entries, decides which have expired, then
deletes them by key in a later syscall. Between snapshot and delete the
datapath can refresh the same entry, so a live entry gets deleted. A
userspace re-check before the delete cannot close the race, because
lookup and delete are separate, individually bucket-locked calls.

BPF_F_COMPARE lets userspace delete a key only if a chosen value region
is unchanged, with the compare and the delete done atomically under the
hash bucket lock:

    attr.flags |= BPF_F_COMPARE;
    attr.compare = <expected>;
    attr.compare_offset = <off>;
    attr.compare_size = <len>;

mismatch -> -EBUSY, absent -> -ENOENT, unsupported map -> -EOPNOTSUPP.
The compare* fields without the flag are rejected (-EINVAL) so a dropped
flag can't silently become an unconditional delete; maps whose value
carries BTF-managed fields (spin_lock/timer/kptr/...) are rejected
(-EOPNOTSUPP) since those bytes are sanitised on lookup.
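
The errno contract above can be sketched as a small userspace model.
This is not the kernel implementation: MODEL_F_COMPARE, struct
model_map, struct model_delete_attr and model_delete() are hypothetical
names invented for illustration, the bucket lock is elided (the model is
single-threaded), and both the order of the checks and the -EINVAL for
an out-of-bounds compare region are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_F_COMPARE  (1u << 0) /* hypothetical stand-in for BPF_F_COMPARE */
#define MODEL_VALUE_SIZE 16
#define MODEL_SLOTS      8

struct model_map {
	bool supports_compare; /* false models an unsupported map type */
	bool used[MODEL_SLOTS];
	uint32_t keys[MODEL_SLOTS];
	unsigned char values[MODEL_SLOTS][MODEL_VALUE_SIZE];
};

/* Hypothetical mirror of the proposed delete attributes. */
struct model_delete_attr {
	uint32_t key;
	uint32_t flags;
	const void *compare;
	uint32_t compare_offset;
	uint32_t compare_size;
};

static int model_find(const struct model_map *m, uint32_t key)
{
	for (int i = 0; i < MODEL_SLOTS; i++)
		if (m->used[i] && m->keys[i] == key)
			return i;
	return -1;
}

static int model_insert(struct model_map *m, uint32_t key, const void *value)
{
	int i = model_find(m, key);

	assert(value != NULL);
	if (i < 0) {
		/* Key not present: take the first free slot. */
		for (i = 0; i < MODEL_SLOTS && m->used[i]; i++)
			;
		if (i == MODEL_SLOTS)
			return -E2BIG;
	}
	m->used[i] = true;
	m->keys[i] = key;
	memcpy(m->values[i], value, MODEL_VALUE_SIZE);
	return 0;
}

static int model_delete(struct model_map *m, const struct model_delete_attr *a)
{
	bool has_cmp = a->compare || a->compare_offset || a->compare_size;
	bool cmp = a->flags & MODEL_F_COMPARE;
	int i;

	/* Compare fields without the flag must not degrade to a plain delete. */
	if (!cmp && has_cmp)
		return -EINVAL;
	if (cmp && !m->supports_compare)
		return -EOPNOTSUPP;
	/* Assumption: an empty or out-of-bounds region is -EINVAL. */
	if (cmp && (!a->compare || a->compare_size == 0 ||
		    a->compare_offset > MODEL_VALUE_SIZE ||
		    a->compare_size > MODEL_VALUE_SIZE - a->compare_offset))
		return -EINVAL;

	/* In the kernel, everything from here on would sit under the bucket lock. */
	i = model_find(m, a->key);
	if (i < 0)
		return -ENOENT;
	if (cmp && memcmp(m->values[i] + a->compare_offset, a->compare,
			  a->compare_size) != 0)
		return -EBUSY;
	m->used[i] = false;
	return 0;
}
```

A caller would treat -EBUSY as "the entry changed since the snapshot,
leave it alone" and -ENOENT as "already gone".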

Atomicity boundary (please scrutinise): the compare is atomic vs every
bucket-lock holder, but NOT vs a BPF program writing the value in place
via the pointer from bpf_map_lookup_elem() (no bucket lock). The
primitive therefore narrows the race window from the whole GC batch to
one bucket-locked critical section rather than closing it; full closure
wants the compared region treated as a synchronization variable (e.g. a
monotonic revision). The selftest models this.
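
The "monotonic revision" idea can be illustrated with a sequential
model. This is a sketch, not the selftest: struct ct_entry, struct
gc_snapshot and the three helpers are hypothetical names. It only shows
the ordering a revision is meant to detect (snapshot, refresh, delete
attempt). It does not model a writer racing inside the critical
section.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical conntrack-like value; the revision is the compared region. */
struct ct_entry {
	bool present;
	uint64_t revision;   /* bumped by every writer */
	uint64_t expires_at;
};

struct gc_snapshot {
	uint64_t revision;   /* revision observed when the GC scanned the entry */
	bool expired;
};

/* Datapath side: any refresh advances the revision before touching the rest. */
static void datapath_refresh(struct ct_entry *e, uint64_t now, uint64_t lifetime)
{
	assert(lifetime > 0);
	e->revision++;
	e->expires_at = now + lifetime;
}

/* GC side, step 1: record what was seen and decide whether it looks expired. */
static struct gc_snapshot gc_scan(const struct ct_entry *e, uint64_t now)
{
	return (struct gc_snapshot){
		.revision = e->revision,
		.expired = e->expires_at <= now,
	};
}

/* GC side, step 2: delete only if the revision still matches the snapshot. */
static int gc_delete_if_unchanged(struct ct_entry *e, const struct gc_snapshot *s)
{
	if (!e->present)
		return -ENOENT;
	if (e->revision != s->revision)
		return -EBUSY; /* refreshed since the snapshot: keep it */
	e->present = false;
	return 0;
}
```

In a real BPF program the revision bump would presumably need to be an
atomic increment, since several CPUs can refresh the same entry; the
model leaves that out.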

Scope of this RFC: per-element compare-and-delete on BPF_MAP_TYPE_HASH
only. Deferred (will follow once the approach is agreed): batch delete +
its attr fields, a libbpf wrapper, LRU-hash and other map types, a
compare-and-swap *update*.

Open questions:
  - flag name: BPF_F_COMPARE vs something else?
  - mismatch errno: -EBUSY vs -EAGAIN?
  - new ->map_delete_elem_cmp() op vs extending ->map_delete_elem?

[1] https://github.com/cilium/cilium/issues/46298

Gyutae Bae (3):
  bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI
  bpf: implement compare-and-delete (BPF_F_COMPARE) for
    BPF_MAP_TYPE_HASH
  selftests/bpf: test BPF_F_COMPARE compare-and-delete

 include/linux/bpf.h                           |   2 +
 include/uapi/linux/bpf.h                      |   6 +-
 kernel/bpf/hashtab.c                          |  39 +++++++
 kernel/bpf/syscall.c                          |  54 ++++++++-
 tools/include/uapi/linux/bpf.h                |   6 +-
 .../selftests/bpf/prog_tests/map_cmp_delete.c | 106 ++++++++++++++++++
 6 files changed, 208 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c


base-commit: a975094bf98ca97be9146f9d3b5681a6f9cf5ce3
-- 
2.39.5 (Apple Git-154)



Thread overview: 6+ messages
2026-06-22  7:16 [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps Gyutae Bae
2026-06-22  7:16 ` [RFC bpf-next 1/3] bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI Gyutae Bae
2026-06-22  7:16 ` [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH Gyutae Bae
2026-06-22  7:58   ` sashiko-bot
2026-06-22  7:16 ` [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete Gyutae Bae
2026-06-22  7:53   ` sashiko-bot
