From: Roman Gushchin
To: Shakeel Butt
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    JP Kobryn, Alexei Starovoitov, Daniel Borkmann, Michal Hocko,
    Johannes Weiner
Subject: Re: [PATCH bpf-next v1 2/6] mm: introduce BPF kfuncs to deal with memcg pointers
In-Reply-To: (Shakeel Butt's message of "Fri, 19 Dec 2025 13:51:16 -0800")
References: <20251219015750.23732-1-roman.gushchin@linux.dev>
    <20251219015750.23732-3-roman.gushchin@linux.dev>
Date: Fri, 19 Dec 2025 14:42:35 -0800
Message-ID: <87qzsqhz50.fsf@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

Shakeel Butt writes:

> On Thu, Dec 18, 2025 at 05:57:46PM -0800, Roman Gushchin wrote:
>> To effectively operate with memory cgroups in BPF there is a need
>> to convert css pointers to memcg pointers. A simple container_of
>> cast which is used in the kernel code can't be used in BPF because
>> from the verifier's point of view that's a out-of-bounds memory access.
>>
>> Introduce helper get/put kfuncs which can be used to get
>> a refcounted memcg pointer from the css pointer:
>>   - bpf_get_mem_cgroup,
>>   - bpf_put_mem_cgroup.
>>
>> bpf_get_mem_cgroup() can take both memcg's css and the corresponding
>> cgroup's "self" css. It allows it to be used with the existing cgroup
>> iterator which iterates over cgroup tree, not memcg tree.
>>
>> Signed-off-by: Roman Gushchin
>> ---
>>  mm/Makefile         |  3 ++
>>  mm/bpf_memcontrol.c | 88 +++++++++++++++++++++++++++++++++++++++++++++
>
> Let's add this file to MAINTAINERS file.

Will do. I planned to create a new entry for mm-related bpf files
as part of the bpf oom patchset.

>
>>  2 files changed, 91 insertions(+)
>>  create mode 100644 mm/bpf_memcontrol.c
>>
>> diff --git a/mm/Makefile b/mm/Makefile
>> index 9175f8cc6565..79c39a98ff83 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -106,6 +106,9 @@ obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
>>  ifdef CONFIG_SWAP
>>  obj-$(CONFIG_MEMCG) += swap_cgroup.o
>>  endif
>> +ifdef CONFIG_BPF_SYSCALL
>> +obj-$(CONFIG_MEMCG) += bpf_memcontrol.o
>> +endif
>>  obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o
>>  obj-$(CONFIG_GUP_TEST) += gup_test.o
>>  obj-$(CONFIG_DMAPOOL_TEST) += dmapool_test.o
>> diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
>> new file mode 100644
>> index 000000000000..8aa842b56817
>> --- /dev/null
>> +++ b/mm/bpf_memcontrol.c
>> @@ -0,0 +1,88 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Memory Controller-related BPF kfuncs and auxiliary code
>> + *
>> + * Author: Roman Gushchin
>> + */
>> +
>> +#include
>> +#include
>> +
>> +__bpf_kfunc_start_defs();
>> +
>> +/**
>> + * bpf_get_mem_cgroup - Get a reference to a memory cgroup
>> + * @css: pointer to the css structure
>> + *
>> + * Returns a pointer to a mem_cgroup structure after bumping
>> + * the corresponding css's reference counter.
>> + *
>> + * It's fine to pass a css which belongs to any cgroup controller,
>> + * e.g. unified hierarchy's main css.
>> + *
>> + * Implements KF_ACQUIRE semantics.
>> + */
>> +__bpf_kfunc struct mem_cgroup *
>> +bpf_get_mem_cgroup(struct cgroup_subsys_state *css)
>> +{
>> +	struct mem_cgroup *memcg = NULL;
>> +	bool rcu_unlock = false;
>> +
>> +	if (!root_mem_cgroup)
>> +		return NULL;
>
> Should we also handle mem_cgroup_disabled() here?

Good point, will add in v2.
Same with bpf_get_root_mem_cgroup() patch.

>
>> +
>> +	if (root_mem_cgroup->css.ss != css->ss) {
>> +		struct cgroup *cgroup = css->cgroup;
>> +		int ssid = root_mem_cgroup->css.ss->id;
>> +
>> +		rcu_read_lock();
>> +		rcu_unlock = true;
>> +		css = rcu_dereference_raw(cgroup->subsys[ssid]);
>> +	}
>> +
>> +	if (css && css_tryget(css))
>> +		memcg = container_of(css, struct mem_cgroup, css);
>> +
>> +	if (rcu_unlock)
>> +		rcu_read_unlock();
>
> Any reason to handle rcu lock like this? Why not just take the rcu read
> lock irrespective? It is cheap.

Idk, it's cheap but not entirely free and I think the code is still
perfectly readable.

>
>> +
>> +	return memcg;
>> +}
>> +
>> +/**
>> + * bpf_put_mem_cgroup - Put a reference to a memory cgroup
>> + * @memcg: memory cgroup to release
>> + *
>> + * Releases a previously acquired memcg reference.
>> + * Implements KF_RELEASE semantics.
>> + */
>> +__bpf_kfunc void bpf_put_mem_cgroup(struct mem_cgroup *memcg)
>> +{
>> +	css_put(&memcg->css);
>
> Should we NULL check memcg here? bpf_get_mem_cgroup() can return NULL.

No, the verifier ensures it's a valid memcg pointer.
No need for an additional check.

>
>> +}
>> +
>> +__bpf_kfunc_end_defs();
>> +
>> +BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
>> +BTF_ID_FLAGS(func, bpf_get_mem_cgroup, KF_TRUSTED_ARGS | KF_ACQUIRE | KF_RET_NULL | KF_RCU)
>> +BTF_ID_FLAGS(func, bpf_put_mem_cgroup, KF_TRUSTED_ARGS | KF_RELEASE)
>
> Will the verifier enforce that bpf_put_mem_cgroup() can not be called
> with NULL?

Yep.

Thanks for the review!
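
To illustrate the NULL-check exchange above, here is a minimal userspace
sketch in C. It is not kernel or BPF code: every model_* name is a
hypothetical stand-in invented for this example, and the plain integer
counter only approximates what css_tryget()/css_put() do. It shows the
calling pattern a BPF program is expected to follow: the acquire side may
return NULL, the caller checks it, and the release side therefore only ever
sees a held reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for cgroup_subsys_state and mem_cgroup. */
struct model_css {
	int refcnt;		/* 0 models a css that is already going away */
};

struct model_memcg {
	struct model_css css;
};

/* Roughly like css_tryget(): succeeds only while the count is non-zero. */
static bool model_css_tryget(struct model_css *css)
{
	if (css->refcnt == 0)
		return false;
	css->refcnt++;
	return true;
}

/* Models the acquire side (KF_ACQUIRE | KF_RET_NULL): may return NULL. */
static struct model_memcg *model_get_memcg(struct model_css *css)
{
	if (css && model_css_tryget(css))
		/* open-coded container_of(css, struct model_memcg, css) */
		return (struct model_memcg *)((char *)css -
					      offsetof(struct model_memcg, css));
	return NULL;
}

/*
 * Models the release side (KF_RELEASE): no NULL check, because the caller
 * is required to pass a reference it actually holds.
 */
static void model_put_memcg(struct model_memcg *memcg)
{
	assert(memcg->css.refcnt > 0);	/* a held reference implies count > 0 */
	memcg->css.refcnt--;
}

/*
 * Caller pattern: check the result of the acquire call before using or
 * releasing it. Returns 1 if a reference was taken and dropped, else 0.
 */
static int model_prog(struct model_css *css)
{
	struct model_memcg *memcg = model_get_memcg(css);

	if (!memcg)
		return 0;

	/* ... use memcg here ... */

	model_put_memcg(memcg);
	return 1;
}
```

In a real BPF program this pattern is not optional: as I understand the
kfunc flags quoted above, a program that passes the possibly-NULL result of
bpf_get_mem_cgroup() straight to bpf_put_mem_cgroup() should be rejected at
load time, which is why the kfunc itself carries no NULL check. The model
above can only show the shape of the calls, not that load-time enforcement.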