From: Yosry Ahmed <yosry@kernel.org>
To: Hao Jia <jiahao.kernel@gmail.com>
Cc: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
shakeel.butt@linux.dev, mhocko@kernel.org, mkoutny@suse.com,
nphamcs@gmail.com, chengming.zhou@linux.dev,
muchun.song@linux.dev, roman.gushchin@linux.dev,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
Hao Jia <jiahao1@lixiang.com>
Subject: Re: [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg
Date: Wed, 3 Jun 2026 17:53:21 +0000
Message-ID: <aiBpibRNi0BcM1Zu@google.com>
In-Reply-To: <c7870fe2-3588-79db-cbfb-bd6a2b78f594@gmail.com>
On Wed, Jun 03, 2026 at 11:02:54AM +0800, Hao Jia wrote:
>
>
> On 2026/6/3 07:19, Yosry Ahmed wrote:
> > > > > > > Proactive writeback also wants a similar per-memcg cursor that is
> > > > > > > scoped to the specified memcg, so that repeated invocations against
> > > > > > > the same memcg make forward progress across its descendant memcgs
> > > > > > > instead of restarting from the first child memcg each time.
> > > > > >
> > > > > > Is this a problem in practice?
> > > > > >
> > > > > > Is the concern the overhead of scanning memcgs repeatedly, or lack of
> > > > > > fairness? I wonder if we should just do writeback in batches from all
> > > > > > memcgs, similar to how reclaim does it, then evaluate at the end if we
> > > > > > need to start over?
> > > > > >
> > > > >
> > > > > Not using a per-cgroup cursor will cause issues for "repeated small-budget
> > > > > calls" cases. For example, repeatedly triggering a 2MB writeback might
> > > > > result in only writing back pages from the first few child memcgs every
> > > > > time. In the worst-case scenario (where the writeback amount is less than
> > > > > WB_BATCH), it might only ever write back from the first child memcg.
> > > >
> > > > Right, so a fairness concern?
> > > >
> > > > I wonder if we should just reclaim a batch from each memcg, then check
> > > > if we reached the goal, otherwise start over. If the batch size is small
> > > > enough that should work?
> > >
> > > Even with a small batch size, for small writeback requests triggered by
> > > user-space (e.g., 2MB, which is batch size * N), it might still repeatedly
> > > write back from only the first N child memcgs.
> >
> > Yes, I understand, I am asking if this is a problem in practice. For
> > this to be a problem we'd need to trigger small writeback requests and
> > have many memcgs.
> >
> > > This could cause the user-space agent to prematurely give up on zswap
> > > writeback.
> >
> > Why? The kernel should not return before trying to writeback from all
> > memcgs. If we scan the first N child memcgs and did not writeback
> > enough, we should keep going, right?
> >
>
> Yes, this issue is not caused by the kernel, but rather by our user-space
> agent itself.
>
> For instance, suppose a parent memcg has two children, memcg1 and memcg2,
> each with 200MB of zswap (100MB inactive). Triggering proactive writeback on
> the parent memcg will exhaust memcg1's inactive zswap pages. After that,
> even though memcg2 still has plenty of inactive zswap pages, it will
> continue to write back memcg1's active zswap pages. Writing back active
> zswap pages causes the user-space agent to prematurely abort the writeback
> because it detects that certain memcg metrics have exceeded predefined
> thresholds.

This will only happen if the reclaim size is smaller than the batch
size, right? Otherwise the kernel should reclaim more or less equally
from both memcgs?

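A rough userspace model of the cursor-less approach makes the batch-size argument concrete. It is a sketch under simplifying assumptions, not the zswap code: `struct sim_memcg`, `writeback_round_robin()` and the batch size are hypothetical, each child is reduced to a page counter, and there is no LRU ordering or active/inactive distinction. Every call starts at the first child, takes at most one batch from each child in turn, and keeps cycling until the budget is met or nothing is left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a child memcg: only what the sketch needs. */
struct sim_memcg {
	long nr_zswap;	/* pages still stored in zswap */
	long written;	/* pages written back so far */
};

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/*
 * Stateless round-robin writeback (no cursor): every call starts at the
 * first child, takes at most @batch pages from each child in turn, and
 * keeps cycling over all children until @budget is met or every child
 * is empty. Returns the number of pages written back.
 */
static long writeback_round_robin(struct sim_memcg *memcgs, size_t n,
				  long budget, long batch)
{
	long done = 0;

	assert(batch > 0);
	while (done < budget) {
		long progress = 0;

		for (size_t i = 0; i < n && done < budget; i++) {
			long take = min_long(batch, budget - done);

			take = min_long(take, memcgs[i].nr_zswap);
			memcgs[i].nr_zswap -= take;
			memcgs[i].written += take;
			done += take;
			progress += take;
		}
		if (!progress)
			break;	/* nothing left in any child */
	}
	return done;
}
```

With two children holding 100 pages each and a 32-page batch, a 16-page budget writes back only from the first child, a 64-page budget splits 32/32, and a 100-page budget splits 64/36. As long as both children still have pages, the per-call imbalance stays within one batch. Repeating the 16-page call four times, however, still lands all 64 pages on the first child, which is the repeated small-budget case described above.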
> Of course, real-world scenarios are much more complex, and this kind of case
> is extremely rare in our environment.
>
> That being said, your suggestion of using the global lock for the per-memcg
> cursors makes the writeback fairer and would resolve these corner cases.

Right, but I'd rather not do per-memcg cursors at all if we can avoid
it. Will using batches help make reclaim fair over all memcgs without a
cursor?

We can always add the cursor later if needed.
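For comparison, here is a sketch of what a cursor buys: the same batched walk, but resuming from the child after the one the previous call stopped at. This is again a userspace model with hypothetical names (`struct sim_memcg`, `struct sim_cursor`, `writeback_with_cursor()`), not the patch's implementation, and it ignores the locking that a cursor shared between callers would need in the kernel, which is part of the cost being weighed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a child memcg: only what the sketch needs. */
struct sim_memcg {
	long nr_zswap;	/* pages still stored in zswap */
	long written;	/* pages written back so far */
};

/* Hypothetical cursor remembering which child to visit next. */
struct sim_cursor {
	size_t next;
};

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/*
 * Batched round-robin writeback that starts at @cur->next instead of the
 * first child. The cursor advances after every visit (even a partial one)
 * and persists across calls, so repeated small budgets rotate over the
 * children. Stops once @budget is met or a full pass finds nothing.
 */
static long writeback_with_cursor(struct sim_memcg *memcgs, size_t n,
				  struct sim_cursor *cur, long budget,
				  long batch)
{
	long done = 0;
	size_t idle = 0;	/* consecutive visits that wrote nothing */

	assert(batch > 0 && n > 0 && cur->next < n);
	while (done < budget && idle < n) {
		struct sim_memcg *m = &memcgs[cur->next];
		long take = min_long(min_long(batch, budget - done),
				     m->nr_zswap);

		m->nr_zswap -= take;
		m->written += take;
		done += take;
		idle = take ? 0 : idle + 1;
		cur->next = (cur->next + 1) % n;
	}
	return done;
}
```

With two children holding 100 pages each, a 32-page batch and four consecutive 16-page calls, this splits the 64 pages 32/32, whereas a walk that always restarts at the first child would take all 64 from it. Whether that difference matters in practice, given that larger budgets are already spread across children batch by batch, is the open question above.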