From: hui.zhu@linux.dev
To: "Michal Koutný" <mkoutny@suse.com>, chenridong@huaweicloud.com
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Eduard Zingerman" <eddyz87@gmail.com>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@fomichev.me>,
"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
"Shuah Khan" <shuah@kernel.org>,
"Peter Zijlstra" <peterz@infradead.org>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Nathan Chancellor" <nathan@kernel.org>,
"Kees Cook" <kees@kernel.org>, "Tejun Heo" <tj@kernel.org>,
"Jeff Xu" <jeffxu@chromium.org>,
"Jan Hendrik Farr" <kernel@jfarr.cc>,
"Christian Brauner" <brauner@kernel.org>,
"Randy Dunlap" <rdunlap@infradead.org>,
"Brian Gerst" <brgerst@gmail.com>,
"Masahiro Yamada" <masahiroy@kernel.org>,
davem@davemloft.net, "Jakub Kicinski" <kuba@kernel.org>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cgroups@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org, "Hui Zhu" <zhuhui@kylinos.cn>
Subject: Re: [RFC PATCH v2 0/3] Memory Controller eBPF support
Date: Sun, 04 Jan 2026 09:30:46 +0000
Message-ID: <a935563217affe85b2a6d0689914d7aba2ce127f@linux.dev>
In-Reply-To: <enlefo5mmoha2htsrvv76tdmj6yum4jan6hgym76adtpxuhvrp@aug6qh3ocde5>
On Dec 30, 2025, at 17:49, "Michal Koutný" <mkoutny@suse.com> wrote:
Hi Michal and Ridong,
>
> Hi Hui.
>
> On Tue, Dec 30, 2025 at 11:01:58AM +0800, Hui Zhu <hui.zhu@linux.dev> wrote:
>
> >
> > This allows administrators to suppress low-priority cgroups' memory
> > usage based on custom policies implemented in BPF programs.
> >
> BTW memory.low was conceived as a work-conserving mechanism for
> prioritization of different workloads. Have you tried that? No need to
> go directly to (high) limits. (<- Main question, below are some
> secondary implementation questions/remarks.)
>
> ...
>
memory.low is a helpful feature, but it only protects a cgroup's memory
from reclaim; it does not directly throttle low-priority processes that
keep accessing their memory. For instance, consider the following run,
which sets memory.low to 4 GiB on the high-priority cgroup:
root@ubuntu:~# echo $((4 * 1024 * 1024 * 1024)) > /sys/fs/cgroup/high/memory.low
root@ubuntu:~# cgexec -g memory:low stress-ng --vm 4 --vm-keep --vm-bytes 80% \
--vm-method all --seed 2025 --metrics -t 60 \
& cgexec -g memory:high stress-ng --vm 4 --vm-keep --vm-bytes 80% \
--vm-method all --seed 2025 --metrics -t 60
[1] 2011
stress-ng: info: [2011] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [2012] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [2011] dispatching hogs: 4 vm
stress-ng: info: [2012] dispatching hogs: 4 vm
stress-ng: metrc: [2012] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per   RSS Max
stress-ng: metrc: [2012]                           (secs)    (secs)    (secs)  (real time) (usr+sys time) instance (%)      (KB)
stress-ng: metrc: [2012] vm                23584     60.21      2.75     15.94       391.73        1262.07         7.76    649988
stress-ng: info: [2012] skipped: 0
stress-ng: info: [2012] passed: 4: vm (4)
stress-ng: info: [2012] failed: 0
stress-ng: info: [2012] metrics untrustworthy: 0
stress-ng: info: [2012] successful run completed in 1 min, 0.22 secs
stress-ng: metrc: [2011] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per   RSS Max
stress-ng: metrc: [2011]                           (secs)    (secs)    (secs)  (real time) (usr+sys time) instance (%)      (KB)
stress-ng: metrc: [2011] vm                23584     60.22      3.06     16.19       391.63        1224.97         7.99    688836
stress-ng: info: [2011] skipped: 0
stress-ng: info: [2011] passed: 4: vm (4)
stress-ng: info: [2011] failed: 0
stress-ng: info: [2011] metrics untrustworthy: 0
stress-ng: info: [2011] successful run completed in 1 min, 0.23 secs
As the results show, setting memory.low on the high-priority cgroup did
not improve its memory performance: both stress-ng instances completed
23,584 bogo ops at nearly the same rate (391.73 vs. 391.63 bogo ops/s).
That said, memory.low is beneficial in many other scenarios, and
extending it with eBPF support might help address a wider range of
issues.
> >
> > This series introduces a BPF hook that allows reporting
> > additional "pages over high" for specific cgroups, effectively
> > increasing memory pressure and throttling for lower-priority
> > workloads when higher-priority cgroups need resources.
> >
> Have you considered hooking into calculate_high_delay() instead? (That
> function has undergone some evolution so it'd seem like the candidate
> for BPFication.)
>
It seems that try_charge_memcg() will not reach
__mem_cgroup_handle_over_high() unless usage exceeds memory.high, so
hooking only calculate_high_delay() would have no effect when
memory.high is not set.
What do you think about also hooking try_charge_memcg(), so that
__mem_cgroup_handle_over_high() is guaranteed to be called?
> ...
>
> >
> > 3. Cgroup hierarchy management (inheritance during online/offline)
> >
> I see you're copying the program upon memcg creation.
> Configuration copies aren't such a good way to properly handle
> hierarchical behavior.
> I wonder if this could follow the more generic pattern of how BPF progs
> are evaluated in hierarchies, see BPF_F_ALLOW_OVERRIDE and
> BPF_F_ALLOW_MULTI.
I will support them in the next version.
>
> >
> > Example Results
> >
> ...
>
> >
> > Results show the low-priority cgroup (/sys/fs/cgroup/low) was
> > significantly throttled:
> > - High-priority cgroup: 21,033,377 bogo ops at 347,825 ops/s
> > - Low-priority cgroup: 11,568 bogo ops at 177 ops/s
> >
> > The stress-ng process in the low-priority cgroup experienced a
> > ~99.9% slowdown in memory operations compared to the
> > high-priority cgroup, demonstrating effective priority
> > enforcement through BPF-controlled memory pressure.
> >
> As a demonstrator, it'd be good to compare this with a baseline without
> any extra progs, e.g. show that high-prio performed better and low-prio
> wasn't throttled for nothing.
Thanks for the reminder.
Here is a log from the test environment without any extra BPF programs
loaded:
root@ubuntu:~# cgexec -g memory:low stress-ng --vm 4 --vm-keep --vm-bytes 80% \
--vm-method all --seed 2025 --metrics -t 60 \
& cgexec -g memory:high stress-ng --vm 4 --vm-keep --vm-bytes 80% \
--vm-method all --seed 2025 --metrics -t 60
[1] 982
stress-ng: info: [982] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [983] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [982] dispatching hogs: 4 vm
stress-ng: info: [983] dispatching hogs: 4 vm
stress-ng: metrc: [982] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per   RSS Max
stress-ng: metrc: [982]                           (secs)    (secs)    (secs)  (real time) (usr+sys time) instance (%)      (KB)
stress-ng: metrc: [982] vm                23544     60.08      2.90     15.74       391.85        1263.43         7.75    524708
stress-ng: info: [982] skipped: 0
stress-ng: info: [982] passed: 4: vm (4)
stress-ng: info: [982] failed: 0
stress-ng: info: [982] metrics untrustworthy: 0
stress-ng: info: [982] successful run completed in 1 min, 0.09 secs
stress-ng: metrc: [983] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per   RSS Max
stress-ng: metrc: [983]                           (secs)    (secs)    (secs)  (real time) (usr+sys time) instance (%)      (KB)
stress-ng: metrc: [983] vm                23544     60.09      3.12     15.91       391.81        1237.10         7.92    705076
stress-ng: info: [983] skipped: 0
stress-ng: info: [983] passed: 4: vm (4)
stress-ng: info: [983] failed: 0
stress-ng: info: [983] metrics untrustworthy: 0
stress-ng: info: [983] successful run completed in 1 min, 0.09 secs
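Without any extra programs, both cgroups completed 23,544 bogo ops at
about 391.8 bogo ops/s, so neither workload was prioritized.
Best,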
Hui
>
> Thanks,
> Michal
>
Thread overview: 8+ messages
2025-12-30 3:01 [RFC PATCH v2 0/3] Memory Controller eBPF support Hui Zhu
2025-12-30 3:01 ` [RFC PATCH v2 1/3] mm: memcontrol: Add BPF struct_ops for memory pressure control Hui Zhu
2025-12-30 3:02 ` [RFC PATCH v2 2/3] selftests/bpf: Add tests for memcg_bpf_ops Hui Zhu
2025-12-30 3:02 ` [RFC PATCH v2 3/3] samples/bpf: Add memcg priority control example Hui Zhu
2025-12-30 9:49 ` [RFC PATCH v2 0/3] Memory Controller eBPF support Michal Koutný
2025-12-30 13:24 ` Chen Ridong
2026-01-04 9:30 ` hui.zhu [this message]
2026-01-09 11:00 ` Michal Koutný