Date: Tue, 09 Jun 2026 19:34:30 -0700
From: "Alexei Starovoitov"
To: "Hou Tao", "Vlad Poenaru", "Alexei Starovoitov", "Daniel Borkmann", "Andrii Nakryiko", "John Fastabend", "Martin KaFai Lau", "Eduard Zingerman", "Kumar Kartikeya Dwivedi", "Song Liu", "Yonghong Song", "Jiri Olsa", "Toke Høiland-Jørgensen"
Cc: "Emil Tsalapatis"
Subject: Re: [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260529174233.2954240-1-vlad.wing@gmail.com> <20260609135558.193287-1-vlad.wing@gmail.com> <20260609135558.193287-3-vlad.wing@gmail.com>

On Tue Jun 9, 2026 at 6:53 PM PDT, Hou Tao wrote:
> Hi,
>
> On 6/9/2026 9:55 PM, Vlad Poenaru wrote:
>> The previous change relaxed the rcu_dereference annotations in
>> lpm_trie.c so the trie walks no longer trip lockdep when reached from a
>> sleepable BPF program holding only rcu_read_lock_trace().
>> By itself that only helps tries reached as the inner map of a
>> map-of-maps, or from the classic-RCU syscall path: a sleepable program
>> that references an LPM trie directly is still rejected at load time by
>> check_map_prog_compatibility(), whose sleepable whitelist omits
>> BPF_MAP_TYPE_LPM_TRIE:
>>
>>   Sleepable programs can only use array, hash, ringbuf and local storage maps
>>
>> LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
>> with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
>> into a Tasks Trace grace period before the node -- and the value
>> embedded in it that trie_lookup_elem() returns to the program -- is
>> released. That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
>> on for sleepable access, so a value handed to a sleepable reader cannot
>> be freed while the program is still running under rcu_read_lock_trace().
>> The writer paths take trie->lock across the walk and never relied on the
>> RCU read-side lock to keep nodes alive.
>
> For trie_lookup_elem(), I think it is not safe to enable its use in
> sleepable programs as the patch does, because it may return an
> unexpected value. The main reason is that rcu_read_lock_trace() cannot
> guarantee that the node currently being looked up will not be reused
> concurrently by another update procedure. rcu_read_lock() does give
> that guarantee, because bpf_mem_cache_free_rcu() makes the node
> reusable only after one RCU grace period. I think the hash-table case
> has a similar problem, though it already uses some tricks
> (hlist_nulls_node variants) to mitigate it.

You're correct. I remember that discussion.
Yet people already use lpm via the map-in-map bug/workaround.
So I applied this set to make lpm-in-sleepable usage official
and force us to do a proper fix.

Also, neither AI bot spotted the issue, so the bug won't be discovered
immediately and we won't see a flurry of "security" reports
with slop "fixes". AI isn't that smart yet.