Date: Mon, 22 Jun 2026 15:32:52 -0700
From: "Alexei Starovoitov"
To: "Gyutae Bae", "Alexei Starovoitov", "Daniel Borkmann", "Andrii Nakryiko"
Cc: "John Fastabend", "Eduard Zingerman", "Kumar Kartikeya Dwivedi", "Martin KaFai Lau", "Song Liu", "Yonghong Song", "Jiri Olsa", "Emil Tsalapatis", "Shuah Khan", "Minsu Jeon", "Siwan Kim", "Jonghyeon Kim", "Gyutae Bae"
Subject: Re: [RFC bpf-next 0/3] bpf: compare-and-delete (BPF_F_COMPARE) for hash maps
References: <20260622071649.31541-1-gyutae.opensource@navercorp.com>
In-Reply-To: <20260622071649.31541-1-gyutae.opensource@navercorp.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon Jun 22, 2026 at 12:16 AM PDT, Gyutae Bae wrote:
> From: Gyutae Bae
>
> This series adds an atomic compare-and-delete primitive to BPF hash
> maps, motivated by a TOCTOU race in Cilium's conntrack GC [1]: the
> batched GC snapshots CT entries, decides which expired, then deletes
> them by key in a later syscall; between snapshot and delete the
> datapath can refresh the same entry, so a live entry is deleted. A
> userspace re-check before delete can't close it (lookup and delete are
> separate, individually bucket-locked calls).
>
> BPF_F_COMPARE lets userspace delete a key only if a chosen value region
> is unchanged, with the compare and the delete done atomically under the
> hash bucket lock:
>
>     attr.flags |= BPF_F_COMPARE;
>     attr.compare = ;
>     attr.compare_offset = ;
>     attr.compare_size = ;
>
> mismatch -> -EBUSY, absent -> -ENOENT, unsupported map -> -EOPNOTSUPP.
> The compare* fields without the flag are rejected (-EINVAL) so a dropped
> flag can't silently become an unconditional delete; maps whose value
> carries BTF-managed fields (spin_lock/timer/kptr/...) are rejected
> (-EOPNOTSUPP) since those bytes are sanitised on lookup.
>
> Atomicity boundary (please scrutinise): the compare is atomic vs every
> bucket-lock holder, but NOT vs a BPF program writing the value in place
> via the pointer from bpf_map_lookup_elem() (no bucket lock). It
> collapses the race window from the whole GC batch to one bucket-locked
> critical section; full closure wants the compared region treated as a
> synchronization variable (e.g. a monotonic revision). The selftest
> models this.
>
> Scope of this RFC: per-element compare-and-delete on BPF_MAP_TYPE_HASH
> only. Deferred (will follow once the approach is agreed): batch delete +
> its attr fields, a libbpf wrapper, LRU-hash and other map types, a
> compare-and-swap *update*.
>
> Open questions:
> - flag name: BPF_F_COMPARE vs something else?
> - mismatch errno: -EBUSY vs -EAGAIN?
> - new ->map_delete_elem_cmp() op vs extending ->map_delete_elem?

Sorry, this is no go.
There is bpf_spin_lock that you can use to synchronize access
between bpf progs and user space.
lookup_and_delete with BPF_F_LOCK uses the same lock.

Or add another syscall program that is triggered from user space
that operates on the same map.

Or convert everything to arena and use whatever algorithm you prefer.
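
For readers following the thread, the compare-and-delete semantics described in the quoted cover letter can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the patch and not kernel code: `model_bucket`, `model_update()` and `model_delete_cmp()` are hypothetical names, a C11 `atomic_flag` spinlock stands in for the hash bucket lock, and the bounds check returning -EINVAL is an assumption of the sketch. The -EBUSY and -ENOENT results follow the errno mapping stated in the cover letter. The model does not cover the caveat raised there, namely a BPF program writing the value in place without taking the lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SLOTS      8
#define MODEL_VALUE_SIZE 16

/* Hypothetical single-bucket model; this is not the kernel's htab. */
struct model_elem {
    int           in_use;
    uint32_t      key;
    unsigned char value[MODEL_VALUE_SIZE];
};

struct model_bucket {
    atomic_flag       lock;   /* stands in for the hash bucket lock */
    struct model_elem slots[MODEL_SLOTS];
};

static void model_lock(struct model_bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void model_unlock(struct model_bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

void model_init(struct model_bucket *b)
{
    assert(b != NULL);
    memset(b->slots, 0, sizeof(b->slots));
    atomic_flag_clear(&b->lock);
}

/* Caller must hold the bucket lock. */
static struct model_elem *model_find(struct model_bucket *b, uint32_t key)
{
    for (size_t i = 0; i < MODEL_SLOTS; i++)
        if (b->slots[i].in_use && b->slots[i].key == key)
            return &b->slots[i];
    return NULL;
}

/* Insert or overwrite a value under the lock (models a datapath refresh). */
int model_update(struct model_bucket *b, uint32_t key, const void *value)
{
    int ret = -ENOSPC;

    model_lock(b);
    struct model_elem *e = model_find(b, key);
    for (size_t i = 0; !e && i < MODEL_SLOTS; i++)
        if (!b->slots[i].in_use)
            e = &b->slots[i];
    if (e) {
        e->in_use = 1;
        e->key = key;
        memcpy(e->value, value, MODEL_VALUE_SIZE);
        ret = 0;
    }
    model_unlock(b);
    return ret;
}

/*
 * Delete `key` only if value[off, off + len) still equals `expected`.
 * The compare and the delete happen in one locked critical section.
 * Returns 0 on delete, -ENOENT if absent, -EBUSY on mismatch,
 * -EINVAL for an out-of-range region (assumption of this sketch).
 */
int model_delete_cmp(struct model_bucket *b, uint32_t key,
                     const void *expected, size_t off, size_t len)
{
    int ret;

    if (len == 0 || off > MODEL_VALUE_SIZE || len > MODEL_VALUE_SIZE - off)
        return -EINVAL;

    model_lock(b);
    struct model_elem *e = model_find(b, key);
    if (!e) {
        ret = -ENOENT;
    } else if (memcmp(e->value + off, expected, len) != 0) {
        ret = -EBUSY;
    } else {
        e->in_use = 0;
        ret = 0;
    }
    model_unlock(b);
    return ret;
}
```

In this model a GC that snapshotted an entry, then lost a race with `model_update()`, gets -EBUSY from `model_delete_cmp()` instead of deleting a live entry. The same effect is what the reply suggests obtaining with existing primitives, by having every writer and the deleter take one shared lock.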