BPF List
 help / color / mirror / Atom feed
* [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps
@ 2026-06-22  7:16 Gyutae Bae
  2026-06-22  7:16 ` [RFC bpf-next 1/3] bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI Gyutae Bae
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Gyutae Bae @ 2026-06-22  7:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf
  Cc: John Fastabend, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, linux-kselftest, linux-kernel,
	Minsu Jeon, Siwan Kim, Jonghyeon Kim, Gyutae Bae

From: Gyutae Bae <gyutae.bae@navercorp.com>

This series adds an atomic compare-and-delete primitive to BPF hash
maps, motivated by a TOCTOU race in Cilium's conntrack GC [1]: the
batched GC snapshots CT entries, decides which expired, then deletes
them by key in a later syscall; between snapshot and delete the
datapath can refresh the same entry, so a live entry is deleted. A
userspace re-check before delete can't close it (lookup and delete are
separate, individually bucket-locked calls).

BPF_F_COMPARE lets userspace delete a key only if a chosen value region
is unchanged, with the compare and the delete done atomically under the
hash bucket lock:

    attr.flags |= BPF_F_COMPARE;
    attr.compare = <expected>;
    attr.compare_offset = <off>;
    attr.compare_size = <len>;

mismatch -> -EBUSY, absent -> -ENOENT, unsupported map -> -EOPNOTSUPP.
The compare* fields without the flag are rejected (-EINVAL) so a dropped
flag can't silently become an unconditional delete; maps whose value
carries BTF-managed fields (spin_lock/timer/kptr/...) are rejected
(-EOPNOTSUPP) since those bytes are sanitised on lookup.

Atomicity boundary (please scrutinise): the compare is atomic vs every
bucket-lock holder, but NOT vs a BPF program writing the value in place
via the pointer from bpf_map_lookup_elem() (no bucket lock). It
collapses the race window from the whole GC batch to one bucket-locked
critical section; full closure wants the compared region treated as a
synchronization variable (e.g. a monotonic revision). The selftest
models this.

Scope of this RFC: per-element compare-and-delete on BPF_MAP_TYPE_HASH
only. Deferred (will follow once the approach is agreed): batch delete +
its attr fields, a libbpf wrapper, LRU-hash and other map types, a
compare-and-swap *update*.

Open questions:
  - flag name: BPF_F_COMPARE vs something else?
  - mismatch errno: -EBUSY vs -EAGAIN?
  - new ->map_delete_elem_cmp() op vs extending ->map_delete_elem?

[1] https://github.com/cilium/cilium/issues/46298

Gyutae Bae (3):
  bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI
  bpf: implement compare-and-delete (BPF_F_COMPARE) for
    BPF_MAP_TYPE_HASH
  selftests/bpf: test BPF_F_COMPARE compare-and-delete

 include/linux/bpf.h                           |   2 +
 include/uapi/linux/bpf.h                      |   6 +-
 kernel/bpf/hashtab.c                          |  39 +++++++
 kernel/bpf/syscall.c                          |  54 ++++++++-
 tools/include/uapi/linux/bpf.h                |   6 +-
 .../selftests/bpf/prog_tests/map_cmp_delete.c | 106 ++++++++++++++++++
 6 files changed, 208 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c


base-commit: a975094bf98ca97be9146f9d3b5681a6f9cf5ce3
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC bpf-next 1/3] bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI
  2026-06-22  7:16 [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps Gyutae Bae
@ 2026-06-22  7:16 ` Gyutae Bae
  2026-06-22  7:16 ` [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH Gyutae Bae
  2026-06-22  7:16 ` [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete Gyutae Bae
  2 siblings, 0 replies; 6+ messages in thread
From: Gyutae Bae @ 2026-06-22  7:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf
  Cc: John Fastabend, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, linux-kselftest, linux-kernel,
	Minsu Jeon, Siwan Kim, Jonghyeon Kim, Gyutae Bae

From: Gyutae Bae <gyutae.bae@navercorp.com>

Introduce the UAPI surface for a compare-and-delete primitive: a new map
flag BPF_F_COMPARE plus compare/compare_offset/compare_size fields let
a delete request name an expected value region.  Behaviour is wired up in
the following patch; this adds the flag and fields (kernel + tools/ copy)
only.

Co-developed-by: Minsu Jeon <minsu.jeon@navercorp.com>
Signed-off-by: Minsu Jeon <minsu.jeon@navercorp.com>
Co-developed-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Siwan Kim <siwan.kim@navercorp.com>
Co-developed-by: Jonghyeon Kim <jong-hyeon.kim@navercorp.com>
Signed-off-by: Jonghyeon Kim <jong-hyeon.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
---
 include/uapi/linux/bpf.h       | 6 +++++-
 tools/include/uapi/linux/bpf.h | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 89b36de5fdbb..4705b02fb6a4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1397,7 +1397,7 @@ enum bpf_addr_space_cast {
 	BPF_ADDR_SPACE_CAST = 1,
 };
 
-/* flags for BPF_MAP_UPDATE_ELEM command */
+/* flags for BPF_MAP_UPDATE_ELEM / BPF_MAP_DELETE_ELEM commands */
 enum {
 	BPF_ANY		= 0, /* create new element or update existing */
 	BPF_NOEXIST	= 1, /* create new element if it didn't exist */
@@ -1405,6 +1405,7 @@ enum {
 	BPF_F_LOCK	= 4, /* spin_lock-ed map_lookup/map_update */
 	BPF_F_CPU	= 8, /* cpu flag for percpu maps, upper 32-bit of flags is a cpu number */
 	BPF_F_ALL_CPUS	= 16, /* update value across all CPUs for percpu maps */
+	BPF_F_COMPARE	= 32, /* compare-and-delete: delete elem only if value region == compare */
 };
 
 /* flags for BPF_MAP_CREATE command */
@@ -1586,6 +1587,9 @@ union bpf_attr {
 			__aligned_u64 next_key;
 		};
 		__u64		flags;
+		__aligned_u64	compare;	/* user ptr to expected bytes (BPF_F_COMPARE) */
+		__u32		compare_offset;	/* start offset within value */
+		__u32		compare_size;	/* bytes to compare; 0 == whole value */
 	};
 
 	struct { /* struct used by BPF_MAP_*_BATCH commands */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 89b36de5fdbb..4705b02fb6a4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1397,7 +1397,7 @@ enum bpf_addr_space_cast {
 	BPF_ADDR_SPACE_CAST = 1,
 };
 
-/* flags for BPF_MAP_UPDATE_ELEM command */
+/* flags for BPF_MAP_UPDATE_ELEM / BPF_MAP_DELETE_ELEM commands */
 enum {
 	BPF_ANY		= 0, /* create new element or update existing */
 	BPF_NOEXIST	= 1, /* create new element if it didn't exist */
@@ -1405,6 +1405,7 @@ enum {
 	BPF_F_LOCK	= 4, /* spin_lock-ed map_lookup/map_update */
 	BPF_F_CPU	= 8, /* cpu flag for percpu maps, upper 32-bit of flags is a cpu number */
 	BPF_F_ALL_CPUS	= 16, /* update value across all CPUs for percpu maps */
+	BPF_F_COMPARE	= 32, /* compare-and-delete: delete elem only if value region == compare */
 };
 
 /* flags for BPF_MAP_CREATE command */
@@ -1586,6 +1587,9 @@ union bpf_attr {
 			__aligned_u64 next_key;
 		};
 		__u64		flags;
+		__aligned_u64	compare;	/* user ptr to expected bytes (BPF_F_COMPARE) */
+		__u32		compare_offset;	/* start offset within value */
+		__u32		compare_size;	/* bytes to compare; 0 == whole value */
 	};
 
 	struct { /* struct used by BPF_MAP_*_BATCH commands */
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH
  2026-06-22  7:16 [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps Gyutae Bae
  2026-06-22  7:16 ` [RFC bpf-next 1/3] bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI Gyutae Bae
@ 2026-06-22  7:16 ` Gyutae Bae
  2026-06-22  7:58   ` sashiko-bot
  2026-06-22  7:16 ` [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete Gyutae Bae
  2 siblings, 1 reply; 6+ messages in thread
From: Gyutae Bae @ 2026-06-22  7:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf
  Cc: John Fastabend, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, linux-kselftest, linux-kernel,
	Minsu Jeon, Siwan Kim, Jonghyeon Kim, Gyutae Bae

From: Gyutae Bae <gyutae.bae@navercorp.com>

map_delete_elem() now honours BPF_F_COMPARE: it copies the expected
bytes, validates the compare window against value_size, and dispatches to a
new ->map_delete_elem_cmp().  The hash-map implementation, under the bucket
lock, memcmps the [off, off+size) window of the live value and unlinks only
on a match (mismatch -EBUSY, absent -ENOENT), making the compare and the
delete atomic w.r.t. every bucket-lock holder.  Only BPF_MAP_TYPE_HASH
wires up the op; every other map type lacks it and returns -EOPNOTSUPP.

Atomicity boundary: the compare is read under the bucket lock, so it is
atomic against every other bucket-lock holder (concurrent deletes,
element-replacing updates).  It is NOT mutual exclusion against a BPF
program that writes the value in place via the pointer from
bpf_map_lookup_elem(), which takes no bucket lock; against such a writer
the compare may observe a torn value within the locked section.  For a
meaningful result the compared region should be a synchronization variable
-- written with a single aligned store such as a monotonic revision, or
replaced via an update -- not a multi-field blob mutated in place.

It compares the raw stored value, so maps whose value carries BTF-managed
fields (bpf_spin_lock, bpf_timer, kptr, ...) are also rejected with
-EOPNOTSUPP: those bytes are sanitised on lookup, so a raw compare would
never match the caller's snapshot and could expose kernel-internal bytes.

BPF_MAP_DELETE_ELEM_LAST_FIELD now reaches compare_size, so the compare*
fields pass CHECK_ATTR even without BPF_F_COMPARE.  Reject that combination
with -EINVAL: a dropped flag must not silently turn a compare-and-delete
into an unconditional delete.

Co-developed-by: Minsu Jeon <minsu.jeon@navercorp.com>
Signed-off-by: Minsu Jeon <minsu.jeon@navercorp.com>
Co-developed-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Siwan Kim <siwan.kim@navercorp.com>
Co-developed-by: Jonghyeon Kim <jong-hyeon.kim@navercorp.com>
Signed-off-by: Jonghyeon Kim <jong-hyeon.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/hashtab.c | 39 ++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c | 54 +++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7719f6528445..2e858272f7a7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -110,6 +110,8 @@ struct bpf_map_ops {
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
 	long (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
 	long (*map_delete_elem)(struct bpf_map *map, void *key);
+	long (*map_delete_elem_cmp)(struct bpf_map *map, void *key,
+				    const void *compare, u32 off, u32 size);
 	long (*map_push_elem)(struct bpf_map *map, void *value, u64 flags);
 	long (*map_pop_elem)(struct bpf_map *map, void *value);
 	long (*map_peek_elem)(struct bpf_map *map, void *value);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 9f394e1aa2e8..8fc3ab938b9e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1534,6 +1534,44 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 	return ret;
 }
 
+static long htab_map_delete_elem_cmp(struct bpf_map *map, void *key,
+				     const void *compare, u32 off, u32 size)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	struct hlist_nulls_head *head;
+	struct bucket *b;
+	struct htab_elem *l;
+	unsigned long flags;
+	u32 hash, key_size;
+	int ret;
+
+	WARN_ON_ONCE(!bpf_rcu_lock_held());
+
+	key_size = map->key_size;
+
+	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	b = __select_bucket(htab, hash);
+	head = &b->head;
+
+	ret = htab_lock_bucket(b, &flags);
+	if (ret)
+		return ret;
+
+	l = lookup_elem_raw(head, hash, key, key_size);
+	if (!l)
+		ret = -ENOENT;
+	else if (memcmp(htab_elem_value(l, key_size) + off, compare, size) != 0)
+		ret = -EBUSY;
+	else
+		hlist_nulls_del_rcu(&l->hash_node);
+
+	htab_unlock_bucket(b, flags);
+
+	if (!ret)
+		free_htab_elem(htab, l);	/* ret==0 implies we unlinked l */
+	return ret;
+}
+
 static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
@@ -2366,6 +2404,7 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem,
 	.map_update_elem = htab_map_update_elem,
 	.map_delete_elem = htab_map_delete_elem,
+	.map_delete_elem_cmp = htab_map_delete_elem_cmp,
 	.map_gen_lookup = htab_map_gen_lookup,
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b44106c8ea75..42fc432f82e1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1892,18 +1892,30 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 	return err;
 }
 
-#define BPF_MAP_DELETE_ELEM_LAST_FIELD key
+#define BPF_MAP_DELETE_ELEM_LAST_FIELD compare_size
 
 static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 {
 	bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel);
 	struct bpf_map *map;
-	void *key;
+	void *key, *compare = NULL;
+	u32 off = 0, size = 0;
 	int err;
 
 	if (CHECK_ATTR(BPF_MAP_DELETE_ELEM))
 		return -EINVAL;
 
+	if (attr->flags & ~BPF_F_COMPARE)
+		return -EINVAL;
+
+	/* The compare* fields are meaningful only with BPF_F_COMPARE.  Reject them
+	 * when the flag is absent so a dropped BPF_F_COMPARE cannot silently turn a
+	 * compare-and-delete into an unconditional delete.
+	 */
+	if (!(attr->flags & BPF_F_COMPARE) &&
+	    (attr->compare || attr->compare_offset || attr->compare_size))
+		return -EINVAL;
+
 	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
@@ -1920,6 +1932,38 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 		goto err_put;
 	}
 
+	if (attr->flags & BPF_F_COMPARE) {
+		bpfptr_t ucmp = make_bpfptr(attr->compare, uattr.is_kernel);
+
+		off = attr->compare_offset;
+		size = attr->compare_size ?: map->value_size;
+		/* off + size must fit in value_size, overflow-safe */
+		if (size > map->value_size || off > map->value_size - size) {
+			err = -EINVAL;
+			goto out;
+		}
+		if (!map->ops->map_delete_elem_cmp) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+		/* Compare-and-delete reads the raw stored value.  Maps whose value carries
+		 * BTF-managed fields (bpf_spin_lock, bpf_timer, kptr, ...) have
+		 * those bytes sanitised on lookup, so a raw compare would never
+		 * match the caller's snapshot and could expose kernel-internal
+		 * bytes.  Reject such maps for now.
+		 */
+		if (!IS_ERR_OR_NULL(map->record)) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+		compare = kvmemdup_bpfptr(ucmp, size);
+		if (IS_ERR(compare)) {
+			err = PTR_ERR(compare);
+			compare = NULL;
+			goto out;
+		}
+	}
+
 	if (bpf_map_is_offloaded(map)) {
 		err = bpf_map_offload_delete_elem(map, key);
 		goto out;
@@ -1932,12 +1976,16 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 
 	bpf_disable_instrumentation();
 	rcu_read_lock();
-	err = map->ops->map_delete_elem(map, key);
+	if (compare)
+		err = map->ops->map_delete_elem_cmp(map, key, compare, off, size);
+	else
+		err = map->ops->map_delete_elem(map, key);
 	rcu_read_unlock();
 	bpf_enable_instrumentation();
 	if (!err)
 		maybe_wait_bpf_programs(map);
 out:
+	kvfree(compare);
 	kvfree(key);
 err_put:
 	bpf_map_write_active_dec(map);
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete
  2026-06-22  7:16 [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps Gyutae Bae
  2026-06-22  7:16 ` [RFC bpf-next 1/3] bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI Gyutae Bae
  2026-06-22  7:16 ` [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH Gyutae Bae
@ 2026-06-22  7:16 ` Gyutae Bae
  2026-06-22  7:53   ` sashiko-bot
  2 siblings, 1 reply; 6+ messages in thread
From: Gyutae Bae @ 2026-06-22  7:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf
  Cc: John Fastabend, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, linux-kselftest, linux-kernel,
	Minsu Jeon, Siwan Kim, Jonghyeon Kim, Gyutae Bae

From: Gyutae Bae <gyutae.bae@navercorp.com>

Add a test for compare-and-delete on a hash map. There is no libbpf
wrapper yet, so it issues the request through the raw bpf() syscall and
compares against a revision field stored in the value.

A matching expected value deletes the element, a stale one leaves it in
place and returns -EBUSY, and deleting an already-removed key returns
-ENOENT. A compare window that reaches past value_size is rejected with
-EINVAL, as are compare fields passed without BPF_F_COMPARE, so a dropped
flag cannot silently turn into an unconditional delete. Running the same
call on an array map, which has no compare-and-delete handler, returns
-EOPNOTSUPP.

Co-developed-by: Minsu Jeon <minsu.jeon@navercorp.com>
Signed-off-by: Minsu Jeon <minsu.jeon@navercorp.com>
Co-developed-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Siwan Kim <siwan.kim@navercorp.com>
Co-developed-by: Jonghyeon Kim <jong-hyeon.kim@navercorp.com>
Signed-off-by: Jonghyeon Kim <jong-hyeon.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
---
 .../selftests/bpf/prog_tests/map_cmp_delete.c | 106 ++++++++++++++++++
 1 file changed, 106 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c

diff --git a/tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c b/tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c
new file mode 100644
index 000000000000..6d6df13adcc4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026 NAVER Corp.
+ *
+ * Author: Gyutae Bae <gyutae.bae@navercorp.com>
+ */
+
+#include <test_progs.h>
+#include <sys/syscall.h>
+
+struct val {
+	__u64 revision;	/* synchronization field we compare on */
+	__u64 payload;
+};
+
+/* libbpf has no wrapper that passes the compare-and-delete fields, so
+ * issue bpf() directly. 'flags' is a parameter so tests can also
+ * exercise the no-BPF_F_COMPARE case.
+ */
+static int cmp_delete_flags(int fd, const void *key, const void *compare,
+			    __u32 off, __u32 size, __u64 flags)
+{
+	union bpf_attr attr;
+	int ret;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.map_fd = fd;
+	attr.key = (__u64)(unsigned long)key;
+	attr.flags = flags;
+	attr.compare = (__u64)(unsigned long)compare;
+	attr.compare_offset = off;
+	attr.compare_size = size;
+	ret = syscall(__NR_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
+	return ret < 0 ? -errno : ret;
+}
+
+static int cmp_delete(int fd, const void *key, const void *compare,
+		      __u32 off, __u32 size)
+{
+	return cmp_delete_flags(fd, key, compare, off, size, BPF_F_COMPARE);
+}
+
+static int hash_map(void)
+{
+	return bpf_map_create(BPF_MAP_TYPE_HASH, "cmp_hash",
+			      sizeof(__u32), sizeof(struct val), 64, NULL);
+}
+
+static void test_contract(void)
+{
+	const __u32 off = offsetof(struct val, revision);
+	const __u32 sz = sizeof(((struct val *)0)->revision);
+	struct val v = { .revision = 10, .payload = 0xabc };
+	struct val snap = v;
+	__u32 k = 1;
+	int fd = hash_map();
+
+	if (!ASSERT_GE(fd, 0, "map_create"))
+		return;
+	ASSERT_OK(bpf_map_update_elem(fd, &k, &v, BPF_ANY), "insert");
+
+	/* mismatch -> -EBUSY, entry preserved */
+	snap.revision = 9;
+	ASSERT_EQ(cmp_delete(fd, &k, &snap, off, sz), -EBUSY, "mismatch_ebusy");
+	ASSERT_OK(bpf_map_lookup_elem(fd, &k, &v), "still_present");
+
+	/* compare fields set but BPF_F_COMPARE omitted -> -EINVAL, entry preserved
+	 * (a dropped flag must not silently become an unconditional delete)
+	 */
+	ASSERT_EQ(cmp_delete_flags(fd, &k, &snap, off, sz, 0), -EINVAL,
+		  "cmp_fields_no_flag_einval");
+	ASSERT_OK(bpf_map_lookup_elem(fd, &k, &v), "preserved_no_flag");
+
+	/* match -> 0, entry gone */
+	snap.revision = 10;
+	ASSERT_OK(cmp_delete(fd, &k, &snap, off, sz), "match_deletes");
+	ASSERT_EQ(bpf_map_lookup_elem(fd, &k, &v), -ENOENT, "gone");
+
+	/* absent -> -ENOENT */
+	ASSERT_EQ(cmp_delete(fd, &k, &snap, off, sz), -ENOENT, "absent_enoent");
+
+	/* bad window -> -EINVAL */
+	ASSERT_EQ(cmp_delete(fd, &k, &snap, 0, sizeof(struct val) + 8), -EINVAL,
+		  "oversize_einval");
+
+	/* unsupported map type (no map_delete_elem_cmp op) -> -EOPNOTSUPP */
+	{
+		int afd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "cmp_arr",
+					 sizeof(__u32), sizeof(struct val), 4, NULL);
+		__u32 ak = 0;
+
+		if (ASSERT_GE(afd, 0, "array_create")) {
+			ASSERT_EQ(cmp_delete(afd, &ak, &snap, off, sz),
+				  -EOPNOTSUPP, "array_eopnotsupp");
+			close(afd);
+		}
+	}
+	close(fd);
+}
+
+void test_map_cmp_delete(void)
+{
+	if (test__start_subtest("contract"))
+		test_contract();
+}
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete
  2026-06-22  7:16 ` [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete Gyutae Bae
@ 2026-06-22  7:53   ` sashiko-bot
  0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-22  7:53 UTC (permalink / raw)
  To: Gyutae Bae; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Multi-line comments do not follow the mandatory BPF subsystem style.
--

commit d9d64a94ef921683b52276aed902be3aa7f68eed
Author: Gyutae Bae <gyutae.bae@navercorp.com>

selftests/bpf: test BPF_F_COMPARE compare-and-delete

This patch introduces a selftest for the compare-and-delete operation on a
hash map. Since there is no libbpf wrapper, it uses the raw bpf syscall to
verify the deletion behavior based on a revision field.

> diff --git a/tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c b/tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c
> new file mode 100644
> index 0000000000000..6d6df13adcc42
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/map_cmp_delete.c

[ ... ]

> +struct val {
> +	__u64 revision;	/* synchronization field we compare on */
> +	__u64 payload;
> +};
> +
> +/* libbpf has no wrapper that passes the compare-and-delete fields, so
> + * issue bpf() directly. 'flags' is a parameter so tests can also
> + * exercise the no-BPF_F_COMPARE case.
> + */

[Severity: Low]
This isn't a bug, but does this multi-line comment follow the preferred BPF
subsystem style? The guidelines specify that the opening /* should be on its
own line, with the text starting on the following line.

> +static int cmp_delete_flags(int fd, const void *key, const void *compare,
> +			    __u32 off, __u32 size, __u64 flags)

[ ... ]

> +static void test_contract(void)
> +{

[ ... ]

> +	/* mismatch -> -EBUSY, entry preserved */
> +	snap.revision = 9;
> +	ASSERT_EQ(cmp_delete(fd, &k, &snap, off, sz), -EBUSY, "mismatch_ebusy");
> +	ASSERT_OK(bpf_map_lookup_elem(fd, &k, &v), "still_present");
> +
> +	/* compare fields set but BPF_F_COMPARE omitted -> -EINVAL, entry preserved
> +	 * (a dropped flag must not silently become an unconditional delete)
> +	 */

[Severity: Low]
This isn't a bug, but is this multi-line comment using the preferred
formatting? Similar to the comment above, the opening /* should typically
be placed on its own empty line.

> +	ASSERT_EQ(cmp_delete_flags(fd, &k, &snap, off, sz, 0), -EINVAL,
> +		  "cmp_fields_no_flag_einval");
> +	ASSERT_OK(bpf_map_lookup_elem(fd, &k, &v), "preserved_no_flag");

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622071649.31541-1-gyutae.opensource@navercorp.com?part=3

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH
  2026-06-22  7:16 ` [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH Gyutae Bae
@ 2026-06-22  7:58   ` sashiko-bot
  0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-22  7:58 UTC (permalink / raw)
  To: Gyutae Bae; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] BPF_MAP_DELETE_ELEM no longer validates that the `value` field is zero, silently allowing uninitialized garbage and breaking BPF's strict UAPI zero-initialization guarantees.
--

commit 513ae67888e0e87d8de22021be59f6117444ee43
Author: Gyutae Bae <gyutae.bae@navercorp.com>

bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH

This commit introduces the BPF_F_COMPARE flag for map_delete_elem(), allowing
atomic compare-and-delete operations on hash maps. It adds a new map operation
and updates the syscall handler to process the new compare fields.

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
[ ... ]
> @@ -1892,18 +1892,30 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
>  	return err;
>  }
>  
> -#define BPF_MAP_DELETE_ELEM_LAST_FIELD key
> +#define BPF_MAP_DELETE_ELEM_LAST_FIELD compare_size
>  
>  static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
>  {
>  	bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel);
>  	struct bpf_map *map;
> -	void *key;
> +	void *key, *compare = NULL;
> +	u32 off = 0, size = 0;
>  	int err;
>  
>  	if (CHECK_ATTR(BPF_MAP_DELETE_ELEM))
>  		return -EINVAL;
>  
> +	if (attr->flags & ~BPF_F_COMPARE)
> +		return -EINVAL;
> +
> +	/* The compare* fields are meaningful only with BPF_F_COMPARE.  Reject them
> +	 * when the flag is absent so a dropped BPF_F_COMPARE cannot silently turn a
> +	 * compare-and-delete into an unconditional delete.
> +	 */
> +	if (!(attr->flags & BPF_F_COMPARE) &&
> +	    (attr->compare || attr->compare_offset || attr->compare_size))
> +		return -EINVAL;

[Severity: High]
Does changing BPF_MAP_DELETE_ELEM_LAST_FIELD to compare_size bypass
zero-validation for attr->value?

CHECK_ATTR now only validates that fields after compare_size are zero.
The patch manually adds validation for flags and the compare* fields to ensure
they are zero when not in use, but it does not appear to add a similar check
for attr->value.

Could this allow userspace to pass uninitialized stack garbage in the value
field without being rejected, permanently preventing the safe repurposing of
the value field for future BPF_MAP_DELETE_ELEM extensions?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622071649.31541-1-gyutae.opensource@navercorp.com?part=2

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-22  7:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22  7:16 [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps Gyutae Bae
2026-06-22  7:16 ` [RFC bpf-next 1/3] bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI Gyutae Bae
2026-06-22  7:16 ` [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH Gyutae Bae
2026-06-22  7:58   ` sashiko-bot
2026-06-22  7:16 ` [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete Gyutae Bae
2026-06-22  7:53   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox