Date: Mon, 5 Jan 2026 07:49:10 +0000
From: Matt Bobrowski
To: Alexei Starovoitov
Cc: Roman Gushchin, bpf, linux-mm, LKML, JP Kobryn, Alexei Starovoitov, Daniel Borkmann, Shakeel Butt, Michal Hocko, Johannes Weiner
Subject: Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc
References: <20251223044156.208250-1-roman.gushchin@linux.dev> <20251223044156.208250-4-roman.gushchin@linux.dev> <7ia4ms2zwuqb.fsf@castle.c.googlers.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 31, 2025 at 09:32:17AM -0800, Alexei Starovoitov wrote:
> On Tue, Dec 30, 2025 at 11:42 PM Matt Bobrowski wrote:
> >
> > On Tue, Dec 30, 2025 at 09:00:28PM +0000, Roman Gushchin wrote:
> > > Matt Bobrowski writes:
> > >
> > > > On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
> > > >> Introduce a BPF kfunc to get a trusted pointer to the root memory
> > > >> cgroup. It's very handy to traverse the full memcg tree, e.g.
> > > >> for handling a system-wide OOM.
> > > >>
> > > >> It's possible to obtain this pointer by traversing the memcg tree
> > > >> up from any known memcg, but it's sub-optimal and makes BPF programs
> > > >> more complex and less efficient.
> > > >>
> > > >> bpf_get_root_mem_cgroup() has a KF_ACQUIRE | KF_RET_NULL semantics,
> > > >> however in reality it's not necessary to bump the corresponding
> > > >> reference counter - root memory cgroup is immortal, reference counting
> > > >> is skipped, see css_get(). Once set, root_mem_cgroup is always a valid
> > > >> memcg pointer. It's safe to call bpf_put_mem_cgroup() for the pointer
> > > >> obtained with bpf_get_root_mem_cgroup(), it's effectively a no-op.
> > > >>
> > > >> Signed-off-by: Roman Gushchin
> > > >> ---
> > > >>  mm/bpf_memcontrol.c | 20 ++++++++++++++++++++
> > > >>  1 file changed, 20 insertions(+)
> > > >>
> > > >> diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
> > > >> index 82eb95de77b7..187919eb2fe2 100644
> > > >> --- a/mm/bpf_memcontrol.c
> > > >> +++ b/mm/bpf_memcontrol.c
> > > >> @@ -10,6 +10,25 @@
> > > >>
> > > >>  __bpf_kfunc_start_defs();
> > > >>
> > > >> +/**
> > > >> + * bpf_get_root_mem_cgroup - Returns a pointer to the root memory cgroup
> > > >> + *
> > > >> + * The function has KF_ACQUIRE semantics, even though the root memory
> > > >> + * cgroup is never destroyed after being created and doesn't require
> > > >> + * reference counting. And it's perfectly safe to pass it to
> > > >> + * bpf_put_mem_cgroup()
> > > >> + *
> > > >> + * Return: A pointer to the root memory cgroup.
> > > >> + */
> > > >> +__bpf_kfunc struct mem_cgroup *bpf_get_root_mem_cgroup(void)
> > > >> +{
> > > >> +	if (mem_cgroup_disabled())
> > > >> +		return NULL;
> > > >> +
> > > >> +	/* css_get() is not needed */
> > > >> +	return root_mem_cgroup;
> > > >> +}
> > > >> +
> > > >>  /**
> > > >>   * bpf_get_mem_cgroup - Get a reference to a memory cgroup
> > > >>   * @css: pointer to the css structure
> > > >> @@ -64,6 +83,7 @@ __bpf_kfunc void bpf_put_mem_cgroup(struct mem_cgroup *memcg)
> > > >>  __bpf_kfunc_end_defs();
> > > >>
> > > >>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> > > >> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
> > > >
> > > > I feel as though relying on KF_ACQUIRE semantics here is somewhat
> > > > odd. Users of this BPF kfunc will now be forced to call
> > > > bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
> > > > completely unnecessary.
> > >
> > > I agree that it's annoying, but I doubt this extra call makes any
> > > difference in the real world.
> >
> > Sure, that certainly holds true.
> >
> > > Also, the corresponding kernel code is designed to hide the special
> > > handling of the root cgroup. css_get()/css_put() are simple no-ops for
> > > the root cgroup, but are totally valid.
> >
> > Yes, I do see that.
> >
> > > So in most places the root cgroup is handled as any other, which
> > > simplifies the code. I guess the same will be true for many bpf
> > > programs.
> >
> > I see, however the same might not necessarily hold for all other
> > global pointers which end up being handed out by a BPF kfunc (not
> > necessarily bpf_get_root_mem_cgroup()). This is why I was wondering
> > whether there's some sense to introducing another KF flag (or
> > something similar) which allows returned values from BPF kfuncs to be
> > implicitly treated as trusted.
>
> No need for a new KF flag. Any struct returned by kfunc should be
> trusted or trusted_or_null if KF_RET_NULL was specified.
> I don't remember off the top of my head, but this behavior
> is already implemented or we discussed making it this way.

Hm, I do not see any evidence of this kind of semantic being currently
implemented, so perhaps it was only discussed at some point. Would you
like me to put forward a patch that introduces this kind of implicit
trust semantic for BPF kfuncs returning pointers to struct types?
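
For reference, the root-cgroup behaviour discussed in this thread can be modelled in plain userspace C. This is only a rough sketch under stated assumptions: every toy_* name is hypothetical, nothing here is the kernel's css/memcg code or the verifier's reference tracking, and it merely illustrates why the release call that KF_ACQUIRE forces on callers ends up doing nothing for the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model only: the toy_* names are hypothetical and are
 * not the kernel's css/memcg implementation. */
struct toy_memcg {
	bool is_root;	/* the root cgroup is never destroyed */
	int refcnt;	/* tracked for non-root cgroups only */
};

static struct toy_memcg toy_root = { .is_root = true, .refcnt = 0 };
static bool toy_memcg_disabled;	/* stands in for mem_cgroup_disabled() */

/* Models css_get(): skipped for the root, a refcount bump otherwise. */
void toy_get(struct toy_memcg *cg)
{
	if (!cg->is_root)
		cg->refcnt++;
}

/* Models css_put(): skipped for the root, a refcount drop otherwise. */
void toy_put(struct toy_memcg *cg)
{
	if (cg->is_root)
		return;
	assert(cg->refcnt > 0);	/* catch unbalanced puts in the model */
	cg->refcnt--;
}

/* Models bpf_get_root_mem_cgroup(): NULL when memcg is disabled,
 * otherwise the root pointer, without touching any counter. */
struct toy_memcg *toy_get_root(void)
{
	if (toy_memcg_disabled)
		return NULL;
	return &toy_root;
}

/* Caller pattern that KF_ACQUIRE | KF_RET_NULL imposes: NULL check,
 * use, then a paired release, even though the release does nothing
 * for the root. Returns 0 on success, -1 if no root is available. */
int toy_use_root(void)
{
	struct toy_memcg *root = toy_get_root();

	if (!root)
		return -1;
	/* ... a real program would walk the memcg tree from root here ... */
	toy_put(root);	/* required pairing, effectively a no-op */
	return 0;
}
```

In this model, an implicitly trusted return value would correspond to dropping the toy_put() call in toy_use_root() while keeping the NULL check.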