public inbox for linux-mm@kvack.org
From: "teawater" <hui.zhu@linux.dev>
To: "Muchun Song" <muchun.song@linux.dev>,
	"Shakeel Butt" <shakeel.butt@linux.dev>
Cc: lsf-pc@lists.linux-foundation.org,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Tejun Heo" <tj@kernel.org>, "Michal Hocko" <mhocko@suse.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"JP Kobryn" <inwardvessel@gmail.com>,
	"Geliang Tang" <geliang@kernel.org>,
	"Sweet Tea Dorminy" <sweettea-kernel@dorminy.me>,
	"Emil Tsalapatis" <emil@etsalapatis.com>,
	"David Rientjes" <rientjes@google.com>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Meta kernel team" <kernel-team@meta.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Reimagining Memory Cgroup (memcg_ext)
Date: Fri, 13 Mar 2026 06:17:18 +0000	[thread overview]
Message-ID: <90829ef692dabd1635daf6475bd09b192788376d@linux.dev> (raw)
In-Reply-To: <8F3593EB-9D81-4459-8675-E922426DCB1E@linux.dev>

> On Mar 12, 2026, at 04:39, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > On Wed, Mar 11, 2026 at 03:19:31PM +0800, Muchun Song wrote:
> > > On Mar 8, 2026, at 02:24, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > >
> > > > [...]
> > > >
> > > > Per-Memcg Background Reclaim
> > > >
> > > > In the new memcg world, with the goal of (mostly) eliminating direct
> > > > synchronous reclaim for limit enforcement, provide per-memcg background
> > > > reclaimers which can scale across CPUs with the allocation rate.
> > >
> > > Hi Shakeel,
> > >
> > > I'm quite interested in this. Internally, we maintain a private set of
> > > code that implements asynchronous reclamation, but we are trying to drop
> > > as much of that private code as possible. We therefore want to implement
> > > a similar asynchronous reclamation mechanism in user space through the
> > > memory.reclaim interface. However, there is currently no suitable policy
> > > notification mechanism to trigger user threads to reclaim proactively in
> > > advance.
> >
> > Cool, can you please share what "suitable policy notification mechanisms"
> > you need for your use-case? This will give me more data on the comparison
> > between memory.reclaim and the proposed approach.
> If we expect proactive reclamation to be triggered when the current
> memcg's memory usage reaches a certain point, we have to continuously
> read memory.current to determine whether it has reached our configured
> watermark and then trigger asynchronous reclamation. Perhaps we need an
> event that notifies user-space threads when current memory usage crosses
> a specific watermark; the events currently exposed by memory.events do
> not support custom watermarks.

I agree. Even with BPF controlling proactive reclamation, I believe there
needs to be an event reflecting capacity changes to signal when to stop.
Otherwise, the reclamation volume per batch would have to be set very low,
leading to frequent BPF triggers and poor efficiency.

Best,
Hui


>
> > > > Lock-Aware Throttling
> > > >
> > > > The ability to avoid throttling an allocating task that is holding
> > > > locks, to prevent priority inversion. In Meta's fleet, we have observed
> > > > lock holders stuck in memcg reclaim, blocking all waiters regardless of
> > > > their priority or criticality.
> > >
> > > This is a real problem we encountered, especially with the jbd handle
> > > resources of the ext4 file system. Our current attempt is to defer
> > > memory reclamation until returning to user space, in order to solve the
> > > various priority inversion issues caused by the jbd handle. I would
> > > therefore be interested in discussing this topic.
> >
> > Awesome, do you use memory.max and memory.high both and defer the reclaim
> > for both? Are you deferring all the reclaims or just the ones where the
> > charging process has the lock? (I need to look up what a jbd handle is.)
> > 
> We do not use memory.high: although it supports deferring memory
> reclamation to the return to user space, it also attempts to throttle the
> memory allocation rate, which introduces significant latency. In our
> application's case, we would rather accept an OOM under such circumstances.
> We previously attempted to address the priority inversion caused by the
> jbd handle separately (we encounter it frequently since we use the ext4
> file system); see [1]. Of course, that solution lacks generality, as it
> requires calling new interfaces for each lock resource. We therefore have
> a more aggressive idea internally: defer all reclamation triggered by
> kernel-space memory allocation until just before returning to user space.
> This should resolve the vast majority of priority inversion problems. The
> only issue it introduces is that kernel-space memory usage may briefly
> exceed memory.max.
> 
> [1] https://lore.kernel.org/linux-mm/cover.1750234270.git.hezhongkun.hzk@bytedance.com/#r
> 
> Muchun,
> Thanks.
>
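
For what it's worth, the deferral pattern described above can be modeled in
a few lines of plain userspace C to show the moving parts. This is an
illustrative toy, not kernel code: the flag and counters stand in for real
thread-info flags, memcg page counters, and reclaim work:

```c
/*
 * Toy model of "defer reclaim until return to user space": the charge
 * path, which may run with locks such as a jbd2 handle held, never
 * reclaims directly.  It only records that reclaim is owed; the debt is
 * paid at the exit-to-userspace boundary, where no kernel locks are held.
 * Usage may briefly exceed the limit in between -- exactly the trade-off
 * described above.
 */
#include <stdbool.h>

struct task_model {
	bool reclaim_pending;	/* stand-in for a thread-info flag */
	long usage, max;	/* stand-ins for memcg page counters */
};

/* Charge path: note the debt and keep going, never block here. */
static void charge(struct task_model *t, long bytes)
{
	t->usage += bytes;
	if (t->usage > t->max)
		t->reclaim_pending = true;
}

/* Return-to-userspace hook: the safe point where reclaim actually runs. */
static void exit_to_user(struct task_model *t)
{
	if (!t->reclaim_pending)
		return;
	if (t->usage > t->max)
		t->usage = t->max;	/* stand-in for reclaiming the excess */
	t->reclaim_pending = false;
}
```

The design choice this makes visible: priority inversion is avoided because
the lock-holding allocation path never sleeps in reclaim, at the cost of a
bounded, transient overshoot of memory.max.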


Thread overview: 16+ messages
2026-03-07 18:24 [LSF/MM/BPF TOPIC] Reimagining Memory Cgroup (memcg_ext) Shakeel Butt
2026-03-09 21:33 ` Roman Gushchin
2026-03-09 23:09   ` Shakeel Butt
2026-03-11  4:57 ` Jiayuan Chen
2026-03-11 17:00   ` Shakeel Butt
2026-03-11  7:19 ` Muchun Song
2026-03-11 20:39   ` Shakeel Butt
2026-03-12  2:46     ` Muchun Song
2026-03-13  6:17       ` teawater [this message]
2026-03-11  7:29 ` Greg Thelen
2026-03-11 21:35   ` Shakeel Butt
2026-03-11 13:20 ` Johannes Weiner
2026-03-11 22:47   ` Shakeel Butt
2026-03-12  3:06 ` hui.zhu
2026-03-12  3:36 ` hui.zhu
2026-03-25 18:47 ` Donet Tom
