Linux Documentation
From: Yosry Ahmed <yosry@kernel.org>
To: Hao Jia <jiahao.kernel@gmail.com>
Cc: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
	shakeel.butt@linux.dev, mhocko@kernel.org, mkoutny@suse.com,
	nphamcs@gmail.com, chengming.zhou@linux.dev,
	muchun.song@linux.dev, roman.gushchin@linux.dev,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	Hao Jia <jiahao1@lixiang.com>
Subject: Re: [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg
Date: Wed, 3 Jun 2026 17:53:21 +0000
Message-ID: <aiBpibRNi0BcM1Zu@google.com>
In-Reply-To: <c7870fe2-3588-79db-cbfb-bd6a2b78f594@gmail.com>

On Wed, Jun 03, 2026 at 11:02:54AM +0800, Hao Jia wrote:
> 
> 
> On 2026/6/3 07:19, Yosry Ahmed wrote:
> > > > > > > Proactive writeback also wants a similar per-memcg cursor that is
> > > > > > > scoped to the specified memcg, so that repeated invocations against
> > > > > > > the same memcg make forward progress across its descendant memcgs
> > > > > > > instead of restarting from the first child memcg each time.
> > > > > > 
> > > > > > Is this a problem in practice?
> > > > > > 
> > > > > > Is the concern the overhead of scanning memcgs repeatedly, or lack of
> > > > > > fairness? I wonder if we should just do writeback in batches from all
> > > > > > memcgs, similar to how reclaim does it, then evaluate at the end if we
> > > > > > need to start over?
> > > > > > 
> > > > > 
> > > > > Not using a per-cgroup cursor will cause issues for "repeated small-budget
> > > > > calls" cases. For example, repeatedly triggering a 2MB writeback might
> > > > > result in only writing back pages from the first few child memcgs every
> > > > > time. In the worst-case scenario (where the writeback amount is less than
> > > > > WB_BATCH), it might only ever write back from the first child memcg.
> > > > 
> > > > Right, so a fairness concern?
> > > > 
> > > > I wonder if we should just reclaim a batch from each memcg, then check
> > > > if we reached the goal, otherwise start over. If the batch size is small
> > > > enough that should work?
> > > 
> > > Even with a small batch size, for small writeback requests triggered by
> > > user-space (e.g., 2MB, which is batch size * N), it might still repeatedly
> > > write back from only the first N child memcgs.
> > 
> > Yes, I understand, I am asking if this is a problem in practice. For
> > this to be a problem we'd need to trigger small writeback requests and
> > have many memcgs.
> > 
> > > This could cause the user-space agent to prematurely give up on zswap
> > > writeback.
> > 
> > Why? The kernel should not return before trying to write back from all
> > memcgs. If we scan the first N child memcgs and did not write back
> > enough, we should keep going, right?
> > 
> 
> Yes, this issue is not caused by the kernel, but rather by our user-space
> agent itself.
> 
> For instance, suppose a parent memcg has two children, memcg1 and memcg2,
> each with 200MB of zswap (100MB inactive). Triggering proactive writeback on
> the parent memcg will exhaust memcg1's inactive zswap pages. After that,
> even though memcg2 still has plenty of inactive zswap pages, it will
> continue to write back memcg1's active zswap pages. Writing back active
> zswap pages causes the user-space agent to prematurely abort the writeback
> because it detects that certain memcg metrics have exceeded predefined
> thresholds.

This will only happen if the reclaim size is smaller than the batch
size, right? Otherwise the kernel should reclaim more or less equally
from both memcgs?
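
To make the question concrete, here is a toy model of batched writeback
across sibling memcgs. It is only a sketch under stated assumptions, not
the zswap code: `writeback_in_batches`, its `batch` parameter, and the
dict of per-memcg counts are hypothetical names, and inactive zswap is
modeled as a plain counter in arbitrary units.

```python
def writeback_in_batches(inactive, goal, batch):
    """Round-robin over memcgs, taking at most `batch` units per visit.

    `inactive` maps a memcg name to its inactive amount (toy model).
    Returns how much was written back from each memcg.
    """
    written = {name: 0 for name in inactive}
    remaining = dict(inactive)
    left = goal
    while left > 0:
        progress = 0
        for name in remaining:
            # Never take more than one batch, what the memcg has,
            # or what is still needed to reach the goal.
            take = min(batch, remaining[name], left)
            remaining[name] -= take
            written[name] += take
            left -= take
            progress += take
            if left == 0:
                break
        if progress == 0:
            break  # every memcg is drained; stop short of the goal
    return written
```

In this model, with memcg1 and memcg2 at 100 units each and a batch of
32, a request for 120 takes 64 from memcg1 and 56 from memcg2, while a
request for 16 (smaller than the batch) takes everything from memcg1
and nothing from memcg2.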

> Of course, real-world scenarios are much more complex, and this kind of case
> is extremely rare in our environment.
> 
> That being said, your suggestion of using the global lock for the per-memcg
> cursors makes the writeback fairer and would resolve these corner cases.

Right, but I'd rather not do per-memcg cursors at all if we can avoid
it. Will using batches help make reclaim fair over all memcgs without a
cursor?

We can always add the cursor later if needed.
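
For reference, the repeated small-request case raised earlier in the
thread can be modeled the same way. This is a hedged illustration, not
the patch: `run_requests` and `use_cursor` are hypothetical names, each
request makes at most one pass over the children, and amounts are
arbitrary units rather than real page counts.

```python
def run_requests(pages, goal, batch, calls, use_cursor):
    """Issue `calls` small writeback requests against sibling memcgs.

    `pages` lists the reclaimable amount per child memcg (toy model).
    Without a cursor every request restarts at child 0; with a cursor
    it resumes where the previous request stopped.
    Returns the total written back from each child.
    """
    remaining = list(pages)
    written = [0] * len(pages)
    cursor = 0
    for _ in range(calls):
        i = cursor if use_cursor else 0
        left = goal
        visited = 0
        # At most one pass over the children per request.
        while left > 0 and visited < len(remaining):
            take = min(batch, remaining[i], left)
            remaining[i] -= take
            written[i] += take
            left -= take
            i = (i + 1) % len(remaining)
            visited += 1
        cursor = i  # only consulted when use_cursor is True
    return written
```

With four children of 1000 units each, a batch of 32, and three
requests of 64 (batch size * 2), the cursor-less run writes back
[96, 96, 0, 0], so only the first two children are ever touched. The
run with a cursor gives [64, 64, 32, 32]. Whether that skew matters in
practice is the open question above.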

Thread overview: 34+ messages
2026-05-26 11:45 [PATCH v3 0/4] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
2026-05-26 11:45 ` [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg Hao Jia
2026-05-29 19:51   ` Nhat Pham
2026-05-30  1:24   ` Yosry Ahmed
2026-06-01 11:07     ` Hao Jia
2026-06-01 16:44       ` Nhat Pham
2026-06-01 16:47         ` Nhat Pham
2026-06-01 17:08       ` Nhat Pham
2026-06-02 11:32         ` Hao Jia
2026-06-02  0:31       ` Yosry Ahmed
2026-06-02 11:33         ` Hao Jia
2026-06-02 23:19           ` Yosry Ahmed
2026-06-03  3:02             ` Hao Jia
2026-06-03 17:53               ` Yosry Ahmed [this message]
2026-05-26 11:45 ` [PATCH v3 2/4] mm/zswap: Implement proactive writeback Hao Jia
2026-05-29 19:58   ` Nhat Pham
2026-05-30  1:40     ` Yosry Ahmed
2026-06-03 11:22       ` Hao Jia
2026-06-03 17:58         ` Yosry Ahmed
2026-06-03 18:14           ` Nhat Pham
2026-05-30  1:37   ` Yosry Ahmed
2026-06-03 11:27     ` Hao Jia
2026-06-03 17:55       ` Yosry Ahmed
2026-06-03 18:23       ` Nhat Pham
2026-06-03 18:26         ` Yosry Ahmed
2026-06-03 18:34           ` Nhat Pham
2026-06-03 18:43             ` Yosry Ahmed
2026-06-03 18:51               ` Nhat Pham
2026-06-03 18:54                 ` Yosry Ahmed
2026-05-26 11:46 ` [PATCH v3 3/4] mm/zswap: Add per-memcg stat for " Hao Jia
2026-05-29 20:01   ` Nhat Pham
2026-06-03 11:29     ` Hao Jia
2026-05-26 11:46 ` [PATCH v3 4/4] selftests/cgroup: Add tests for zswap " Hao Jia
2026-05-29 20:02   ` Nhat Pham
