All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	JP Kobryn <inwardvessel@gmail.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, bpf@vger.kernel.org,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Song Liu <song@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH v2 10/23] mm: introduce BPF kfuncs to access memcg statistics and events
Date: Fri, 31 Oct 2025 10:08:25 +0100	[thread overview]
Message-ID: <aQR8if0cpQQ5Am36@tiehlicka> (raw)
In-Reply-To: <20251027231727.472628-11-roman.gushchin@linux.dev>

On Mon 27-10-25 16:17:13, Roman Gushchin wrote:
> Introduce BPF kfuncs to conveniently access memcg data:
>   - bpf_mem_cgroup_vm_events(),
>   - bpf_mem_cgroup_usage(),
>   - bpf_mem_cgroup_page_state(),
>   - bpf_mem_cgroup_flush_stats().
> 
> These functions are useful for implementing BPF OOM policies, but
> also can be used to accelerate access to the memcg data. Reading
> it through cgroupfs is much more expensive, roughly 5x, mostly
> because of the need to convert the data into the text and back.
> 
> JP Kobryn:
> An experiment was setup to compare the performance of a program that
> uses the traditional method of reading memory.stat vs a program using
> the new kfuncs. The control program opens up the root memory.stat file
> and for 1M iterations reads, converts the string values to numeric data,
> then seeks back to the beginning. The experimental program sets up the
> requisite libbpf objects and for 1M iterations invokes a bpf program
> which uses the kfuncs to fetch all available stats for node_stat_item,
> memcg_stat_item, and vm_event_item types.
> 
> The results showed a significant perf benefit on the experimental side,
> outperforming the control side by a margin of 93%. In kernel mode,
> elapsed time was reduced by 80%, while in user mode, over 99% of time
> was saved.
> 
> control: elapsed time
> real    0m38.318s
> user    0m25.131s
> sys     0m13.070s
> 
> experiment: elapsed time
> real    0m2.789s
> user    0m0.187s
> sys     0m2.512s
> 
> control: perf data
> 33.43% a.out libc.so.6         [.] __vfscanf_internal
>  6.88% a.out [kernel.kallsyms] [k] vsnprintf
>  6.33% a.out libc.so.6         [.] _IO_fgets
>  5.51% a.out [kernel.kallsyms] [k] format_decode
>  4.31% a.out libc.so.6         [.] __GI_____strtoull_l_internal
>  3.78% a.out [kernel.kallsyms] [k] string
>  3.53% a.out [kernel.kallsyms] [k] number
>  2.71% a.out libc.so.6         [.] _IO_sputbackc
>  2.41% a.out [kernel.kallsyms] [k] strlen
>  1.98% a.out a.out             [.] main
>  1.70% a.out libc.so.6         [.] _IO_getline_info
>  1.51% a.out libc.so.6         [.] __isoc99_sscanf
>  1.47% a.out [kernel.kallsyms] [k] memory_stat_format
>  1.47% a.out [kernel.kallsyms] [k] memcpy_orig
>  1.41% a.out [kernel.kallsyms] [k] seq_buf_printf
> 
> experiment: perf data
> 10.55% memcgstat bpf_prog_..._query [k] bpf_prog_16aab2f19fa982a7_query
>  6.90% memcgstat [kernel.kallsyms]  [k] memcg_page_state_output
>  3.55% memcgstat [kernel.kallsyms]  [k] _raw_spin_lock
>  3.12% memcgstat [kernel.kallsyms]  [k] memcg_events
>  2.87% memcgstat [kernel.kallsyms]  [k] __memcg_slab_post_alloc_hook
>  2.73% memcgstat [kernel.kallsyms]  [k] kmem_cache_free
>  2.70% memcgstat [kernel.kallsyms]  [k] entry_SYSRETQ_unsafe_stack
>  2.25% memcgstat [kernel.kallsyms]  [k] __memcg_slab_free_hook
>  2.06% memcgstat [kernel.kallsyms]  [k] get_page_from_freelist
> 
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Co-developed-by: JP Kobryn <inwardvessel@gmail.com>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/memcontrol.h |  2 ++
>  mm/bpf_memcontrol.c        | 57 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 39a6c7c8735b..b9e08dddd7ad 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -953,6 +953,8 @@ static inline void mod_memcg_page_state(struct page *page,
>  	rcu_read_unlock();
>  }
>  
> +unsigned long memcg_events(struct mem_cgroup *memcg, int event);
> +unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
>  unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx);
>  unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item);
>  unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx);
> diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
> index 76c342318256..387255b8ab88 100644
> --- a/mm/bpf_memcontrol.c
> +++ b/mm/bpf_memcontrol.c
> @@ -75,6 +75,56 @@ __bpf_kfunc void bpf_put_mem_cgroup(struct mem_cgroup *memcg)
>  	css_put(&memcg->css);
>  }
>  
> +/**
> + * bpf_mem_cgroup_vm_events - Read memory cgroup's vm event counter
> + * @memcg: memory cgroup
> + * @event: event id
> + *
> + * Allows to read memory cgroup event counters.
> + */
> +__bpf_kfunc unsigned long bpf_mem_cgroup_vm_events(struct mem_cgroup *memcg,
> +						enum vm_event_item event)
> +{
> +	return memcg_events(memcg, event);
> +}
> +
> +/**
> + * bpf_mem_cgroup_usage - Read memory cgroup's usage
> + * @memcg: memory cgroup
> + *
> + * Returns current memory cgroup size in bytes.
> + */
> +__bpf_kfunc unsigned long bpf_mem_cgroup_usage(struct mem_cgroup *memcg)
> +{
> +	return page_counter_read(&memcg->memory);
> +}
> +
> +/**
> + * bpf_mem_cgroup_page_state - Read memory cgroup's page state counter
> + * @memcg: memory cgroup
> + * @idx: counter idx
> + *
> + * Allows to read memory cgroup statistics. The output is in bytes.
> + */
> +__bpf_kfunc unsigned long bpf_mem_cgroup_page_state(struct mem_cgroup *memcg, int idx)
> +{
> +	if (idx < 0 || idx >= MEMCG_NR_STAT)
> +		return (unsigned long)-1;
> +
> +	return memcg_page_state_output(memcg, idx);
> +}
> +
> +/**
> + * bpf_mem_cgroup_flush_stats - Flush memory cgroup's statistics
> + * @memcg: memory cgroup
> + *
> + * Propagate memory cgroup's statistics up the cgroup tree.
> + */
> +__bpf_kfunc void bpf_mem_cgroup_flush_stats(struct mem_cgroup *memcg)
> +{
> +	mem_cgroup_flush_stats(memcg);
> +}
> +
>  __bpf_kfunc_end_defs();
>  
>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> @@ -82,6 +132,11 @@ BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_get_mem_cgroup, KF_ACQUIRE | KF_RET_NULL | KF_RCU)
>  BTF_ID_FLAGS(func, bpf_put_mem_cgroup, KF_RELEASE)
>  
> +BTF_ID_FLAGS(func, bpf_mem_cgroup_vm_events, KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_mem_cgroup_usage, KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_mem_cgroup_page_state, KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_mem_cgroup_flush_stats, KF_TRUSTED_ARGS | KF_SLEEPABLE)
> +
>  BTF_KFUNCS_END(bpf_memcontrol_kfuncs)
>  
>  static const struct btf_kfunc_id_set bpf_memcontrol_kfunc_set = {
> @@ -93,7 +148,7 @@ static int __init bpf_memcontrol_init(void)
>  {
>  	int err;
>  
> -	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
> +	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC,
>  					&bpf_memcontrol_kfunc_set);
>  	if (err)
>  		pr_warn("error while registering bpf memcontrol kfuncs: %d", err);
> -- 
> 2.51.0

-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2025-10-31  9:08 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-27 23:17 [PATCH v2 00/23] mm: BPF OOM Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 01/23] bpf: move bpf_struct_ops_link into bpf.h Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 02/23] bpf: initial support for attaching struct ops to cgroups Roman Gushchin
2025-10-27 23:48   ` bot+bpf-ci
2025-10-28 15:57     ` Roman Gushchin
2025-10-29 18:01   ` Song Liu
2025-10-29 20:26     ` Roman Gushchin
2025-10-30 17:22     ` Roman Gushchin
2025-10-30 18:03       ` Song Liu
2025-10-30 18:19         ` Amery Hung
2025-10-30 19:06           ` Roman Gushchin
2025-10-30 21:34             ` Song Liu
2025-10-30 22:42               ` Martin KaFai Lau
2025-10-30 23:14                 ` Roman Gushchin
2025-10-31  0:05                 ` Song Liu
2025-10-30 22:19             ` bpf_st_ops and cgroups. Was: " Alexei Starovoitov
2025-10-30 23:24               ` Roman Gushchin
2025-10-31  3:03                 ` Yafang Shao
2025-10-31  6:14                 ` Song Liu
2025-10-31 11:35                   ` Yafang Shao
2025-10-31 17:37                 ` Alexei Starovoitov
2025-10-29 18:14   ` Tejun Heo
2025-10-29 20:25     ` Roman Gushchin
2025-10-29 20:36       ` Tejun Heo
2025-10-29 21:18         ` Song Liu
2025-10-29 21:27           ` Tejun Heo
2025-10-29 21:37             ` Song Liu
2025-10-29 21:45               ` Tejun Heo
2025-10-30  4:32                 ` Song Liu
2025-10-30 16:13                   ` Tejun Heo
2025-10-30 17:56                     ` Song Liu
2025-10-29 21:53           ` Roman Gushchin
2025-10-29 22:43             ` Alexei Starovoitov
2025-10-29 22:53               ` Tejun Heo
2025-10-29 23:53                 ` Alexei Starovoitov
2025-10-30  0:03                   ` Tejun Heo
2025-10-30  0:16                     ` Alexei Starovoitov
2025-10-30  6:33                       ` Yafang Shao
2025-10-29 21:04   ` Song Liu
2025-10-30  0:43   ` Martin KaFai Lau
2025-10-27 23:17 ` [PATCH v2 03/23] bpf: mark struct oom_control's memcg field as TRUSTED_OR_NULL Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 04/23] mm: define mem_cgroup_get_from_ino() outside of CONFIG_SHRINKER_DEBUG Roman Gushchin
2025-10-31  8:32   ` Michal Hocko
2025-10-27 23:17 ` [PATCH v2 05/23] mm: declare memcg_page_state_output() in memcontrol.h Roman Gushchin
2025-10-31  8:34   ` Michal Hocko
2025-10-27 23:17 ` [PATCH v2 06/23] mm: introduce BPF struct ops for OOM handling Roman Gushchin
2025-10-27 23:57   ` bot+bpf-ci
2025-10-28 17:45   ` Alexei Starovoitov
2025-10-28 18:42     ` Roman Gushchin
2025-10-28 22:07       ` Alexei Starovoitov
2025-10-28 22:56         ` Roman Gushchin
2025-10-28 21:33   ` Song Liu
2025-10-28 23:24     ` Roman Gushchin
2025-10-30  0:20   ` Martin KaFai Lau
2025-10-30  5:57   ` Yafang Shao
2025-10-30 14:26     ` Roman Gushchin
2025-10-31  9:02   ` Michal Hocko
2025-11-02 21:36     ` Roman Gushchin
2025-11-03 19:00       ` Michal Hocko
2025-11-04  1:45         ` Roman Gushchin
2025-11-04  8:18           ` Michal Hocko
2025-11-04 18:14             ` Roman Gushchin
2025-11-04 19:22               ` Michal Hocko
2026-01-12 14:54   ` Matt Bobrowski
2026-01-12 17:20     ` Roman Gushchin
2026-01-12 18:20       ` Matt Bobrowski
2025-10-27 23:17 ` [PATCH v2 07/23] mm: introduce bpf_oom_kill_process() bpf kfunc Roman Gushchin
2025-10-31  9:05   ` Michal Hocko
2025-11-02 21:09     ` Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 08/23] mm: introduce BPF kfuncs to deal with memcg pointers Roman Gushchin
2025-10-27 23:48   ` bot+bpf-ci
2025-10-28 16:10     ` Roman Gushchin
2025-10-28 17:12       ` Alexei Starovoitov
2025-10-28 18:03         ` Chris Mason
2025-10-28 18:32           ` Roman Gushchin
2025-10-28 17:42   ` Tejun Heo
2025-10-28 18:12     ` Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 09/23] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 10/23] mm: introduce BPF kfuncs to access memcg statistics and events Roman Gushchin
2025-10-27 23:48   ` bot+bpf-ci
2025-10-28 16:16     ` Roman Gushchin
2025-10-31  9:08   ` Michal Hocko [this message]
2025-10-31  9:31 ` [PATCH v2 00/23] mm: BPF OOM Michal Hocko
2025-10-31 16:48   ` Lance Yang
2025-11-02 20:53   ` Roman Gushchin
2025-11-03 18:18     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQR8if0cpQQ5Am36@tiehlicka \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=inwardvessel@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=song@kernel.org \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.