From: hui.zhu@linux.dev
To: "Roman Gushchin" <roman.gushchin@linux.dev>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Nathan Chancellor" <nathan@kernel.org>,
	"Kees Cook" <kees@kernel.org>, "Tejun Heo" <tj@kernel.org>,
	"Jeff Xu" <jeffxu@chromium.org>,
	mkoutny@suse.com, "Jan Hendrik Farr" <kernel@jfarr.cc>,
	"Christian Brauner" <brauner@kernel.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Brian Gerst" <brgerst@gmail.com>,
	"Masahiro Yamada" <masahiroy@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, "Hui Zhu" <zhuhui@kylinos.cn>
Subject: Re: [RFC PATCH 0/3] Memory Controller eBPF support
Date: Thu, 20 Nov 2025 09:29:52 +0000
Message-ID: <895f996653b3385e72763d5b35ccd993b07c6125@linux.dev>
In-Reply-To: <87ldk1mmk3.fsf@linux.dev>

On 2025-11-20 at 11:04, "Roman Gushchin" <roman.gushchin@linux.dev> wrote:


> 
> Hui Zhu <hui.zhu@linux.dev> writes:
> 
> > 
> > From: Hui Zhu <zhuhui@kylinos.cn>
> > 
> >  This series proposes adding eBPF support to the Linux memory
> >  controller, enabling dynamic and extensible memory management
> >  policies at runtime.
> > 
> >  Background
> > 
> >  The memory controller (memcg) currently provides fixed memory
> >  accounting and reclamation policies through static kernel code.
> >  This limits flexibility for specialized workloads and use cases
> >  that require custom memory management strategies.
> > 
> >  By enabling eBPF programs to hook into key memory control
> >  operations, administrators can implement custom policies without
> >  recompiling the kernel, while maintaining the safety guarantees
> >  provided by the BPF verifier.
> > 
> >  Use Cases
> > 
> >  1. Custom memory reclamation strategies for specialized workloads
> >  2. Dynamic memory pressure monitoring and telemetry
> >  3. Memory accounting adjustments based on runtime conditions
> >  4. Integration with container orchestration systems for
> >  intelligent resource management
> >  5. Research and experimentation with novel memory management
> >  algorithms
> > 
> >  Design Overview
> > 
> >  This series introduces:
> > 
> >  1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
> >  programs to implement custom behavior for memory charging
> >  operations.
> > 
> >  2. A hook point in the `try_charge_memcg()` fast path that
> >  invokes registered eBPF programs to determine if custom
> >  memory management should be applied.
> > 
> >  3. The eBPF handler can inspect memory cgroup context and
> >  optionally modify certain parameters (e.g., `nr_pages` for
> >  reclamation size).
> > 
> >  4. A reference counting mechanism using `percpu_ref` to safely
> >  manage the lifecycle of registered eBPF struct ops instances.
> > 
> Can you please describe how these hooks will be used in practice?
> What's the problem you can solve with it and can't without?
> 
> I generally agree with an idea to use BPF for various memcg-related
> policies, but I'm not sure how specific callbacks can be used in
> practice.

Hi Roman,

Here are some ideas for how eBPF support in memcg could be used:

Priority-based reclaim and limits in multi-tenant environments:
On a single machine with multiple tenants / namespaces / containers,
it is hard to decide "who should be squeezed first" under memory
pressure with static policies baked into the kernel.
With a BPF profile assigned to each tenant's memcg, BPF can decide
under high global pressure:
  - which memcgs' memory.high should be raised (delaying reclaim),
  - which memcgs should be scanned and reclaimed more aggressively.
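
As a rough illustration of the kind of decision meant here, the sketch
below is plain user-space C, not BPF and not the memcg_ops interface
from this series. struct tenant_profile, its fields, and
pick_reclaim_target() are hypothetical names invented for the example.
It picks the lowest-priority tenant that is above its entitlement and
breaks ties by the largest excess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-tenant profile; none of these fields come from the series. */
struct tenant_profile {
	int priority;              /* higher value = more important tenant */
	unsigned long usage_pages; /* pages currently charged */
	unsigned long soft_pages;  /* usage the tenant is entitled to keep */
};

/*
 * Return the index of the tenant to reclaim from first: the lowest-priority
 * tenant whose usage exceeds soft_pages; ties go to the largest excess.
 * Returns -1 if no tenant exceeds its entitlement.
 */
static int pick_reclaim_target(const struct tenant_profile *t, size_t n)
{
	int best = -1;
	unsigned long best_excess = 0;

	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned long excess;

		if (t[i].usage_pages <= t[i].soft_pages)
			continue;
		excess = t[i].usage_pages - t[i].soft_pages;
		if (best < 0 || t[i].priority < t[best].priority ||
		    (t[i].priority == t[best].priority && excess > best_excess)) {
			best = (int)i;
			best_excess = excess;
		}
	}
	return best;
}
```

A real policy would probably weigh more inputs (reclaim cost, recent
refaults); the point is only that the ordering becomes a per-deployment
choice rather than fixed kernel code.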

Online profiling / diagnosing memory hotspots:
When a cgroup's memory keeps growing, it is difficult to obtain
fine-grained information without patching the kernel.
With BPF attached to the memcg charge/uncharge path, we can:
  - record large allocations (greater than N KB) with call stacks and
    owning file/module, and send them to user space via a BPF ring
    buffer;
  - generate reports from the sampled data, such as "top N memory
    allocation stacks in this container over the last 10 minutes" or
    which objects / call paths are growing fastest.
This makes it possible to pinpoint the root cause of host memory
anomalies without changing application code, which is very useful
in operations scenarios.
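
A minimal sketch of the user-space aggregation side, assuming events
arrive as (stack id, size) pairs. The event format, struct
alloc_profile, profile_record() and profile_top() are all hypothetical
and only illustrate the "filter by size, aggregate per stack, report
the top entry" idea; the ring buffer itself is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACKS 64

/* Hypothetical aggregated record: one entry per distinct call-stack id. */
struct stack_stat {
	unsigned int stack_id;
	unsigned long long bytes;
};

struct alloc_profile {
	struct stack_stat stats[MAX_STACKS];
	size_t used;
	unsigned long long min_bytes; /* ignore events smaller than this */
};

/* Account one charge event; returns 1 if recorded, 0 if filtered or table full. */
static int profile_record(struct alloc_profile *p, unsigned int stack_id,
			  unsigned long long bytes)
{
	assert(p->used <= MAX_STACKS);
	if (bytes < p->min_bytes)
		return 0;
	for (size_t i = 0; i < p->used; i++) {
		if (p->stats[i].stack_id == stack_id) {
			p->stats[i].bytes += bytes;
			return 1;
		}
	}
	if (p->used == MAX_STACKS)
		return 0;
	p->stats[p->used].stack_id = stack_id;
	p->stats[p->used].bytes = bytes;
	p->used++;
	return 1;
}

/* Return the entry with the most bytes, or NULL if nothing was recorded. */
static const struct stack_stat *profile_top(const struct alloc_profile *p)
{
	const struct stack_stat *top = NULL;

	for (size_t i = 0; i < p->used; i++)
		if (!top || p->stats[i].bytes > top->bytes)
			top = &p->stats[i];
	return top;
}
```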

SLO-driven auto throttling / scale-in/out signals:
Use eBPF to observe the memory usage slope, frequent reclaim,
or near-OOM behavior within a memcg.
When the program decides "OOM is imminent," it can emit a signal to a
control-plane component instead of just killing tasks or raising
limits. For example, it can send an event to a user-space agent to
trigger automatic scaling, QPS adjustment, or throttling.
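
One possible reading of "usage slope" is a linear extrapolation between
two usage samples. The sketch below is plain C under that assumption;
struct usage_sample, ms_until_limit() and should_signal() are
hypothetical names, and the arithmetic ignores overflow for very large
values.

```c
#include <assert.h>

/* Hypothetical sample of a memcg's usage; names are illustrative only. */
struct usage_sample {
	unsigned long long time_ms;
	unsigned long pages;
};

/*
 * Estimate the milliseconds until usage reaches limit_pages, assuming the
 * growth between the two samples continues linearly. Returns 0 when the
 * limit is already reached and -1 when usage is flat or shrinking.
 */
static long long ms_until_limit(struct usage_sample prev,
				struct usage_sample cur,
				unsigned long limit_pages)
{
	unsigned long long dt, grown, remaining;

	assert(cur.time_ms > prev.time_ms);
	if (cur.pages >= limit_pages)
		return 0;
	if (cur.pages <= prev.pages)
		return -1;
	dt = cur.time_ms - prev.time_ms;
	grown = cur.pages - prev.pages;
	remaining = limit_pages - cur.pages;
	return (long long)(remaining * dt / grown);
}

/* Nonzero when the estimate falls inside the horizon the agent cares about. */
static int should_signal(long long eta_ms, long long horizon_ms)
{
	return eta_ms >= 0 && eta_ms <= horizon_ms;
}
```

The agent would then map a positive should_signal() result to whatever
action fits the service, such as scaling out or lowering QPS.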

Preventing a cgroup from launching a large-scale fork+malloc attack:
During memcg charge, BPF checks per-uid or per-cgroup allocation
behavior over the last few seconds.
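
A check over "the last few seconds" could be as simple as a fixed-window
page budget per cgroup. The following is a sketch in plain C, not the
hook from this series; struct charge_window and charge_allowed() are
hypothetical, and a fixed window is chosen here only because it is the
simplest variant to show.

```c
#include <assert.h>

/* Hypothetical per-cgroup charge budget over a fixed time window. */
struct charge_window {
	unsigned long long window_start_ms;
	unsigned long long window_len_ms;
	unsigned long charged_pages; /* pages charged in the current window */
	unsigned long max_pages;     /* budget per window */
};

/*
 * Return 1 and account the charge if it fits in the current window's
 * budget, otherwise return 0. A new window starts once window_len_ms
 * has elapsed since window_start_ms.
 */
static int charge_allowed(struct charge_window *w, unsigned long long now_ms,
			  unsigned long nr_pages)
{
	assert(w->window_len_ms > 0);
	if (now_ms - w->window_start_ms >= w->window_len_ms) {
		w->window_start_ms = now_ms;
		w->charged_pages = 0;
	}
	/* charged_pages never exceeds max_pages, so this cannot underflow. */
	if (nr_pages > w->max_pages - w->charged_pages)
		return 0;
	w->charged_pages += nr_pages;
	return 1;
}
```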

I also maintain a software project, https://github.com/teawater/mem-agent,
for specialized memory management and related functions.
However, I found that implementing certain memory QoS categories
for memcg purely from user space is rather inefficient,
because it requires frequent reads of values within memcg.
This is why I want memcg to support eBPF: it would let me place
custom memory management logic directly in the kernel,
greatly improving efficiency.

Best,
Hui

> 
> Thanks!
>
