From: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>
To: "Gyutae Bae" <gyutae.opensource@navercorp.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>, <bpf@vger.kernel.org>
Cc: "John Fastabend" <john.fastabend@gmail.com>,
"Eduard Zingerman" <eddyz87@gmail.com>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"Jiri Olsa" <jolsa@kernel.org>,
"Emil Tsalapatis" <emil@etsalapatis.com>,
"Shuah Khan" <shuah@kernel.org>,
<linux-kselftest@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
"Minsu Jeon" <minsu.jeon@navercorp.com>,
"Siwan Kim" <siwan.kim@navercorp.com>,
"Jonghyeon Kim" <jong-hyeon.kim@navercorp.com>,
"Gyutae Bae" <gyutae.bae@navercorp.com>
Subject: Re: [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps
Date: Mon, 22 Jun 2026 15:32:52 -0700
Message-ID: <DJFXOGFS8DXY.4CDI7LHXGHWN@gmail.com>
In-Reply-To: <20260622071649.31541-1-gyutae.opensource@navercorp.com>
On Mon Jun 22, 2026 at 12:16 AM PDT, Gyutae Bae wrote:
> From: Gyutae Bae <gyutae.bae@navercorp.com>
>
> This series adds an atomic compare-and-delete primitive to BPF hash
> maps, motivated by a TOCTOU race in Cilium's conntrack GC [1]: the
> batched GC snapshots CT entries, decides which expired, then deletes
> them by key in a later syscall; between snapshot and delete the
> datapath can refresh the same entry, so a live entry is deleted. A
> userspace re-check before delete can't close it (lookup and delete are
> separate, individually bucket-locked calls).
>
> BPF_F_COMPARE lets userspace delete a key only if a chosen value region
> is unchanged, with the compare and the delete done atomically under the
> hash bucket lock:
>
>     attr.flags |= BPF_F_COMPARE;
>     attr.compare = <expected>;
>     attr.compare_offset = <off>;
>     attr.compare_size = <len>;
>
> mismatch -> -EBUSY, absent -> -ENOENT, unsupported map -> -EOPNOTSUPP.
> The compare* fields without the flag are rejected (-EINVAL) so a dropped
> flag can't silently become an unconditional delete; maps whose value
> carries BTF-managed fields (spin_lock/timer/kptr/...) are rejected
> (-EOPNOTSUPP) since those bytes are sanitised on lookup.
>
> Atomicity boundary (please scrutinise): the compare is atomic vs every
> bucket-lock holder, but NOT vs a BPF program writing the value in place
> via the pointer from bpf_map_lookup_elem() (no bucket lock). It
> collapses the race window from the whole GC batch to one bucket-locked
> critical section; full closure wants the compared region treated as a
> synchronization variable (e.g. a monotonic revision). The selftest
> models this.
>
> Scope of this RFC: per-element compare-and-delete on BPF_MAP_TYPE_HASH
> only. Deferred (will follow once the approach is agreed): batch delete +
> its attr fields, a libbpf wrapper, LRU-hash and other map types, a
> compare-and-swap *update*.
>
> Open questions:
> - flag name: BPF_F_COMPARE vs something else?
> - mismatch errno: -EBUSY vs -EAGAIN?
> - new ->map_delete_elem_cmp() op vs extending ->map_delete_elem?

Sorry, this is a no-go.

There is bpf_spin_lock that you can use to synchronize access
between bpf progs and user space.
lookup_and_delete with BPF_F_LOCK uses the same lock.

Or add another syscall program that is triggered from user space
and operates on the same map.

Or convert everything to arena and use whatever algorithm you prefer.
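
For readers following the thread, the semantics under discussion can be
modelled in plain user-space C. This is only a sketch under stated
assumptions, not kernel code and not the RFC's implementation: the
struct and function names (ct_entry, bucket, bucket_refresh,
bucket_snapshot, bucket_delete_if_unchanged, gc_race_demo) are
hypothetical, a pthread mutex stands in for the lock, the bucket holds
a single entry, and every writer is assumed to take the lock (which is
the bpf_spin_lock-style discipline suggested above, and is exactly what
an in-place write through a looked-up pointer would not do). The
compared region is a monotonic revision, as the quoted cover letter
suggests.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one conntrack entry. */
struct ct_entry {
	bool     present;
	uint32_t key;
	uint64_t revision;	/* the compared region; changes on every refresh */
};

/* Hypothetical single-entry bucket; the mutex models the lock. */
struct bucket {
	pthread_mutex_t lock;
	uint64_t next_rev;	/* never reused, so a delete + re-insert cannot alias */
	struct ct_entry entry;
};

#define BUCKET_INIT { .lock = PTHREAD_MUTEX_INITIALIZER }

/* Datapath side: insert or refresh the entry under the lock. */
static void bucket_refresh(struct bucket *b, uint32_t key)
{
	pthread_mutex_lock(&b->lock);
	b->entry.present = true;
	b->entry.key = key;
	b->entry.revision = ++b->next_rev;
	pthread_mutex_unlock(&b->lock);
}

/* GC side, step 1: record the revision seen at snapshot time. */
static int bucket_snapshot(struct bucket *b, uint32_t key, uint64_t *rev)
{
	int ret = -ENOENT;

	pthread_mutex_lock(&b->lock);
	if (b->entry.present && b->entry.key == key) {
		*rev = b->entry.revision;
		ret = 0;
	}
	pthread_mutex_unlock(&b->lock);
	return ret;
}

/* GC side, step 2: compare and delete in one critical section. */
static int bucket_delete_if_unchanged(struct bucket *b, uint32_t key,
				      uint64_t expected_rev)
{
	int ret;

	pthread_mutex_lock(&b->lock);
	if (!b->entry.present || b->entry.key != key) {
		ret = -ENOENT;		/* absent */
	} else if (b->entry.revision != expected_rev) {
		ret = -EBUSY;		/* refreshed since the snapshot */
	} else {
		b->entry.present = false;
		ret = 0;
	}
	pthread_mutex_unlock(&b->lock);
	return ret;
}

/* Snapshot, then a refresh sneaks in, then the stale delete is attempted. */
static int gc_race_demo(void)
{
	struct bucket b = BUCKET_INIT;
	uint64_t seen = 0;
	int rc;

	bucket_refresh(&b, 42);
	rc = bucket_snapshot(&b, 42, &seen);
	assert(rc == 0);
	(void)rc;
	bucket_refresh(&b, 42);		/* datapath refreshes the live entry */
	return bucket_delete_if_unchanged(&b, 42, seen);
}
```

In this model gc_race_demo() returns -EBUSY, so the refreshed entry
survives. The guarantee holds only because the refresh takes the same
lock; a writer that bypasses it reopens the window.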
Thread overview: 6+ messages
2026-06-22 7:16 [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps Gyutae Bae
2026-06-22 7:16 ` [RFC bpf-next 1/3] bpf: add BPF_F_COMPARE flag and compare fields to map elem UAPI Gyutae Bae
2026-06-22 7:16 ` [RFC bpf-next 2/3] bpf: implement compare-and-delete (BPF_F_COMPARE) for BPF_MAP_TYPE_HASH Gyutae Bae
2026-06-22 7:16 ` [RFC bpf-next 3/3] selftests/bpf: test BPF_F_COMPARE compare-and-delete Gyutae Bae
2026-06-22 22:32 ` Alexei Starovoitov [this message]
2026-06-23 2:58 ` [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps Gyutae Bae