Date: Wed, 31 Dec 2025 07:41:58 +0000
From: Matt Bobrowski
To: Roman Gushchin
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 JP Kobryn, Alexei Starovoitov, Daniel Borkmann, Shakeel Butt,
 Michal Hocko, Johannes Weiner
Subject: Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc
References: <20251223044156.208250-1-roman.gushchin@linux.dev>
 <20251223044156.208250-4-roman.gushchin@linux.dev>
 <7ia4ms2zwuqb.fsf@castle.c.googlers.com>
In-Reply-To: <7ia4ms2zwuqb.fsf@castle.c.googlers.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 30, 2025 at 09:00:28PM +0000, Roman Gushchin wrote:
> Matt Bobrowski writes:
>
> > On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
> >> Introduce a BPF kfunc to get a trusted pointer to the root memory
> >> cgroup. It's very handy to traverse the full memcg tree, e.g.
> >> for handling a system-wide OOM.
> >>
> >> It's possible to obtain this pointer by traversing the memcg tree
> >> up from any known memcg, but it's sub-optimal and makes BPF programs
> >> more complex and less efficient.
> >>
> >> bpf_get_root_mem_cgroup() has a KF_ACQUIRE | KF_RET_NULL semantics,
> >> however in reality it's not necessary to bump the corresponding
> >> reference counter - root memory cgroup is immortal, reference counting
> >> is skipped, see css_get(). Once set, root_mem_cgroup is always a valid
> >> memcg pointer. It's safe to call bpf_put_mem_cgroup() for the pointer
> >> obtained with bpf_get_root_mem_cgroup(), it's effectively a no-op.
> >>
> >> Signed-off-by: Roman Gushchin
> >> ---
> >>  mm/bpf_memcontrol.c | 20 ++++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >> diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
> >> index 82eb95de77b7..187919eb2fe2 100644
> >> --- a/mm/bpf_memcontrol.c
> >> +++ b/mm/bpf_memcontrol.c
> >> @@ -10,6 +10,25 @@
> >>
> >>  __bpf_kfunc_start_defs();
> >>
> >> +/**
> >> + * bpf_get_root_mem_cgroup - Returns a pointer to the root memory cgroup
> >> + *
> >> + * The function has KF_ACQUIRE semantics, even though the root memory
> >> + * cgroup is never destroyed after being created and doesn't require
> >> + * reference counting. And it's perfectly safe to pass it to
> >> + * bpf_put_mem_cgroup()
> >> + *
> >> + * Return: A pointer to the root memory cgroup.
> >> + */
> >> +__bpf_kfunc struct mem_cgroup *bpf_get_root_mem_cgroup(void)
> >> +{
> >> +	if (mem_cgroup_disabled())
> >> +		return NULL;
> >> +
> >> +	/* css_get() is not needed */
> >> +	return root_mem_cgroup;
> >> +}
> >> +
> >>  /**
> >>   * bpf_get_mem_cgroup - Get a reference to a memory cgroup
> >>   * @css: pointer to the css structure
> >> @@ -64,6 +83,7 @@ __bpf_kfunc void bpf_put_mem_cgroup(struct mem_cgroup *memcg)
> >>  __bpf_kfunc_end_defs();
> >>
> >>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> >> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
> >
> > I feel as though relying on KF_ACQUIRE semantics here is somewhat
> > odd.
> > Users of this BPF kfunc will now be forced to call
> > bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
> > completely unnecessary.
>
> I agree that it's annoying, but I doubt this extra call makes any
> difference in the real world.

Sure, that certainly holds true.

> Also, the corresponding kernel code is designed to hide the special
> handling of the root cgroup. css_get()/css_put() are simple no-ops for
> the root cgroup, but are totally valid.

Yes, I do see that.

> So in most places the root cgroup is handled as any other, which
> simplifies the code. I guess the same will be true for many bpf
> programs.

I see, however the same might not necessarily hold for all other
global pointers which end up being handed out by a BPF kfunc (not
necessarily bpf_get_root_mem_cgroup()). This is why I was wondering
whether there's some sense to introducing another KF flag (or
something similar) which allows returned values from BPF kfuncs to be
implicitly treated as trusted.
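
For illustration only, the get/put pairing discussed in this thread can
be sketched as a small userspace model. This is not kernel code: every
model_* name below is hypothetical, the no_ref field merely stands in
for the way css_get()/css_put() skip reference counting for the root
cgroup, and the verifier-enforced release that KF_ACQUIRE implies is
modelled only as a coding convention in model_use_root().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_memcg {
	bool no_ref;	/* true for the immortal root: counting is skipped */
	long refcnt;	/* only meaningful when no_ref is false */
};

struct model_memcg model_root = { .no_ref = true, .refcnt = 0 };
bool model_memcg_disabled;

/* Models css_get(): a no-op for the root, an increment otherwise. */
void model_get(struct model_memcg *memcg)
{
	if (!memcg->no_ref)
		memcg->refcnt++;
}

/* Models css_put() / bpf_put_mem_cgroup(): a no-op for the root. */
void model_put(struct model_memcg *memcg)
{
	if (memcg->no_ref)
		return;
	assert(memcg->refcnt > 0);
	memcg->refcnt--;
}

/*
 * Models bpf_get_root_mem_cgroup(): may return NULL (the KF_RET_NULL
 * part) and takes no reference, since none is needed for the root.
 */
struct model_memcg *model_get_root(void)
{
	if (model_memcg_disabled)
		return NULL;
	return &model_root;
}

/*
 * Caller pattern forced by acquire semantics: NULL check, use, then an
 * unconditional put. Returns the net refcount change seen on the root,
 * or -1 if no root was available.
 */
long model_use_root(void)
{
	struct model_memcg *root = model_get_root();
	long before;

	if (!root)
		return -1;

	before = root->refcnt;
	model_put(root);	/* required by the pairing, yet a no-op */
	return root->refcnt - before;
}
```

In this model the put on the root changes nothing, which matches the
"effectively a no-op" description in the commit message, while a
non-root object is still counted as usual.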