From: JP Kobryn <inwardvessel@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Shakeel Butt" <shakeel.butt@linux.dev>,
"Michal Koutný" <mkoutny@suse.com>,
"Yosry Ahmed" <yosryahmed@google.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Tejun Heo" <tj@kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>, bpf <bpf@vger.kernel.org>,
"Kernel Team" <kernel-team@meta.com>
Subject: Re: [PATCH] memcg: introduce kfuncs for fetching memcg stats
Date: Mon, 6 Oct 2025 13:33:07 -0700 [thread overview]
Message-ID: <82c392dc-e750-43d9-9394-1df00a366ae0@gmail.com> (raw)
In-Reply-To: <CAADnVQKQpEsgoR5xw0_32deMqT4Pc7ZOo8jwJWkarcOrZijPzw@mail.gmail.com>
On 10/1/25 3:25 PM, Alexei Starovoitov wrote:
> On Tue, Sep 30, 2025 at 9:57 PM JP Kobryn <inwardvessel@gmail.com> wrote:
[..]
>>
>> There is a significant perf benefit when using this approach. In terms of
>> elapsed time, the kfuncs allow a bpf cgroup iterator program to outperform
>> the traditional file reading method, saving almost 80% of the time spent in
>> kernel.
>>
>> control: elapsed time
>> real 0m14.421s
>> user 0m0.183s
>> sys 0m14.184s
>>
>> experiment: elapsed time
>> real 0m3.250s
>> user 0m0.225s
>> sys 0m2.916s
>
> Nice, but github repo somewhere doesn't guarantee that
> the work is equivalent.
> Please add it as a selftest/bpf instead.
> Like was done in commit
> https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com/
> to demonstrate equivalence of 'cat /proc' vs iterator approach.
Sure, I'll relocate the test code there.
[..]
>> ---
>> mm/memcontrol.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 67 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 8dd7fbed5a94..aa8cbf883d71 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -870,6 +870,73 @@ unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
>> }
>> #endif
>>
>> +static inline struct mem_cgroup *memcg_from_cgroup(struct cgroup *cgrp)
>> +{
>> + return cgrp ? mem_cgroup_from_css(cgrp->subsys[memory_cgrp_id]) : NULL;
>> +}
>> +
>> +__bpf_kfunc static void memcg_flush_stats(struct cgroup *cgrp)
>> +{
>> + struct mem_cgroup *memcg = memcg_from_cgroup(cgrp);
>> +
>> + if (!memcg)
>> + return;
>> +
>> + mem_cgroup_flush_stats(memcg);
>> +}
>
> css_rstat_flush() is sleepable, so this kfunc must be sleepable too.
> Not sure about the rest.
Good catch. I'll add the sleepable flag where it's needed.
>
>> +
>> +__bpf_kfunc static unsigned long memcg_node_stat_fetch(struct cgroup *cgrp,
>> + enum node_stat_item item)
>> +{
>> + struct mem_cgroup *memcg = memcg_from_cgroup(cgrp);
>> +
>> + if (!memcg)
>> + return 0;
>> +
>> + return memcg_page_state_output(memcg, item);
>> +}
>> +
>> +__bpf_kfunc static unsigned long memcg_stat_fetch(struct cgroup *cgrp,
>> + enum memcg_stat_item item)
>> +{
>> + struct mem_cgroup *memcg = memcg_from_cgroup(cgrp);
>> +
>> + if (!memcg)
>> + return 0;
>> +
>> + return memcg_page_state_output(memcg, item);
>> +}
>> +
>> +__bpf_kfunc static unsigned long memcg_vm_event_fetch(struct cgroup *cgrp,
>> + enum vm_event_item item)
>> +{
>> + struct mem_cgroup *memcg = memcg_from_cgroup(cgrp);
>> +
>> + if (!memcg)
>> + return 0;
>> +
>> + return memcg_events(memcg, item);
>> +}
>> +
>> +BTF_KFUNCS_START(bpf_memcontrol_kfunc_ids)
>> +BTF_ID_FLAGS(func, memcg_flush_stats)
>> +BTF_ID_FLAGS(func, memcg_node_stat_fetch)
>> +BTF_ID_FLAGS(func, memcg_stat_fetch)
>> +BTF_ID_FLAGS(func, memcg_vm_event_fetch)
>> +BTF_KFUNCS_END(bpf_memcontrol_kfunc_ids)
>
> At least one of them must be sleepable and the rest probably too?
> All of them must be KF_TRUSTED_ARGS too.
Thanks, I'll include the trusted args flag. As to which are sleepable,
only memcg_flush_stats can block.
>
>> +
>> +static const struct btf_kfunc_id_set bpf_memcontrol_kfunc_set = {
>> + .owner = THIS_MODULE,
>> + .set = &bpf_memcontrol_kfunc_ids,
>> +};
>> +
>> +static int __init bpf_memcontrol_kfunc_init(void)
>> +{
>> + return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
>> + &bpf_memcontrol_kfunc_set);
>> +}
>
> Why tracing only?
Hmmm, initially I didn't think about use cases outside of the cgroup
iterator programs. After discussing with teammates though, some other
potential use cases could be within sched_ext or (future) bpf-oom. I'm
thinking I'll go with the "UNSPEC" type in v2.
prev parent reply other threads:[~2025-10-06 20:33 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-01 4:54 [PATCH] memcg: introduce kfuncs for fetching memcg stats JP Kobryn
2025-10-01 22:25 ` Alexei Starovoitov
2025-10-06 20:32 ` Shakeel Butt
2025-10-06 20:33 ` JP Kobryn [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=82c392dc-e750-43d9-9394-1df00a366ae0@gmail.com \
--to=inwardvessel@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mkoutny@suse.com \
--cc=shakeel.butt@linux.dev \
--cc=tj@kernel.org \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.