From: Shakeel Butt <shakeel.butt@linux.dev>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	David Hildenbrand <david@kernel.org>,
	 Michal Hocko <mhocko@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
	 David Stevens <stevensd@google.com>,
	Chen Ridong <chenridong@huaweicloud.com>,
	 Leno Hou <lenohou@gmail.com>, Yafang Shao <laoar.shao@gmail.com>,
	Yu Zhao <yuzhao@google.com>,
	 Zicheng Wang <wangzicheng@honor.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	 Kalesh Singh <kaleshsingh@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	 Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
	linux-kernel@vger.kernel.org,  Qi Zheng <qi.zheng@linux.dev>
Subject: Re: [PATCH v7 00/15] mm/mglru: improve reclaim loop and dirty folio handling
Date: Mon, 11 May 2026 22:56:21 -0700	[thread overview]
Message-ID: <agK-rkIIZlwBiMsv@linux.dev> (raw)
In-Reply-To: <CAMgjq7BzQAPp8u_3-9e3ueXmRCoW=2sydok0hFM=MYL7VC1YYg@mail.gmail.com>

On Tue, May 12, 2026 at 01:08:49PM +0800, Kairui Song wrote:
> On Tue, May 12, 2026 at 2:51 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> >
> > Hi Kairui,
> 
> Hello,
> 
> >
> > On Tue, Apr 28, 2026 at 02:06:51AM +0800, Kairui Song via B4 Relay wrote:
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > Test results: All tests were done on a 48c96t NUMA machine with 2 nodes
> > > and 128G of memory, using NVMe as storage.
> >
> > Please include traditional LRU results for all of the following experiments as
> > well (where it makes sense).
> 
> Sure, I've spawned a few test instances; I was busy travelling last week.
> That specific test machine is occupied, so it might take a while.
> 
> A systematic test run takes roughly one or two days to complete for
> one kernel version or config; e.g., the JS test alone takes at least
> 2 hours to finish. Comparing versions/setups takes more time.
> 

No worries, we have a couple of weeks before the next merge window, so there is
no urgency. I will go through the series in depth; hopefully there will not be a
need for a next version, and in that case, please just resend the cover letter
with the information you provided below and don't worry about the length of the
cover letter.

> >
> > >
> > > MongoDB
> > > =======
> > > Running YCSB workloadb [2] (recordcount:20000000 operationcount:6000000,
> > > threads:32), which does 95% read and 5% update to generate mixed read
> > > and dirty writeback. MongoDB is set up in a 10G cgroup using Docker, and
> > > the WiredTiger cache size is set to 4.5G, using NVME as storage.
> >
> > Can you add a sentence here on why this workload is chosen and is important for
> > evaluation?
> 
> Because that's exactly the workload where we observed the regression, since
> it involves mixed writeback, and it's a practical case.
> 

Sure, please add this sentence to the cover letter.

> >
> > >
> > > Not using SWAP.
> >
> > Any specific reason to not have swap in this test?
> 
> Because we are testing writeback here, which is not related to SWAP;
> leaving swap off just avoids noise and irrelevant parts.
> 
> A longer history involving SWAP is explained here:
> https://lore.kernel.org/linux-mm/20230920190244.16839-1-ryncsn@gmail.com/
> 
> And a longer discussion on that:
> https://lore.kernel.org/linux-mm/CAMgjq7BRaRgYLf2+8=+=nWtzkrHFKmudZPRm41PR6W+A+L=AKA@mail.gmail.com/
> 
> Neither is easy to reproduce, though. YCSB with MongoDB seems close
> enough, and I believe we are on the right track.
> 
> In an internal workload, we observed that patched MGLRU is about 20%
> faster than classical LRU with MongoDB. Upstream MGLRU is still
> slightly behind classical LRU at this point; the RFC I posted should
> hopefully close that gap:
> https://lore.kernel.org/linux-mm/20260502-mglru-fg-v1-0-913619b014d9@tencent.com/
> 

Same here, but no need to go into such detail.

> >
> > >
> > > Before:
> > > Throughput(ops/sec): 62485.02962831822
> > > AverageLatency(us): 500.9746963330107
> > > pgpgin 159347462
> > > pgpgout 5413332
> > > workingset_refault_anon 0
> > > workingset_refault_file 34522071
> > >
> > > After:
> > > Throughput(ops/sec): 79760.71784646061 (+27.6%, higher is better)
> > > AverageLatency(us): 391.25169970043726 (-21.9%, lower is better)
> > > pgpgin 111093923                       (-30.3%, lower is better)
> > > pgpgout 5437456
> > > workingset_refault_anon 0
> > > workingset_refault_file 19566366       (-43.3%, lower is better)
> > >
> > > We can see a significant performance improvement after this series.
> > > The test is done on NVME and the performance gap would be even larger
> > > for slow devices, such as HDD or network storage. We observed over
> > > 100% gain for some workloads with slow IO.
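
As a quick sanity check, the percentage deltas quoted above do match the raw
numbers; a minimal sketch (the helper below is mine, not part of the test
harness):

```python
# Recompute the relative changes quoted above from the raw before/after
# numbers in the cover letter (positive = increase).
def pct_change(before, after):
    """Relative change in percent between two measurements."""
    return (after - before) / before * 100

deltas = {
    "throughput":   pct_change(62485.02962831822, 79760.71784646061),
    "avg_latency":  pct_change(500.9746963330107, 391.25169970043726),
    "pgpgin":       pct_change(159347462, 111093923),
    "refault_file": pct_change(34522071, 19566366),
}

for name, d in deltas.items():
    print(f"{name}: {d:+.1f}%")
# Matches the cover letter figures:
# throughput +27.6%, avg_latency -21.9%, pgpgin -30.3%, refault_file -43.3%
```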
> > >
> > > Chrome & Node.js [3]
> > > ====================
> > > Using Yu Zhao's test script [3], testing on an x86_64 NUMA machine with 2
> > > nodes and 128G memory, using 256G ZRAM as swap, spawning 32 memcgs and 64
> > > workers:
> > >
> > > Before:
> > > Total requests:            79915
> > > Per-worker 95% CI (mean):  [1233.9, 1263.5]
> > > Per-worker stdev:          59.2
> > > Jain's fairness:           0.997795 (1.0 = perfectly fair)
> > > Latency:
> > > Bucket     Count      Pct    Cumul
> > > [0,1)s     26859   33.61%   33.61%
> > > [1,2)s      7818    9.78%   43.39%
> > > [2,4)s      5532    6.92%   50.31%
> > > [4,8)s     39706   49.69%  100.00%
> > >
> > > After:
> > > Total requests:            81382
> > > Per-worker 95% CI (mean):  [1241.9, 1301.3]
> > > Per-worker stdev:          118.8
> > > Jain's fairness:           0.991480 (1.0 = perfectly fair)
> > > Latency:
> > > Bucket     Count      Pct    Cumul
> > > [0,1)s     26696   32.80%   32.80%
> > > [1,2)s      8745   10.75%   43.55%
> > > [2,4)s      6865    8.44%   51.98%
> > > [4,8)s     39076   48.02%  100.00%
> > >
> > > Reclaim is still fair and effective; the total request count is
> > > slightly better.
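
Side note: the per-worker CIs quoted above are consistent with a Student's t
interval over the 64 workers; a rough check (that the script uses a t interval,
and the critical value 1.998 ~= t(0.975, df=63), are my assumptions, not
something stated in the series):

```python
# Rough consistency check of the "Per-worker 95% CI" lines above, assuming
# a Student's t interval over n = 64 workers: mean +/- t * stdev / sqrt(n).
import math

T_CRIT = 1.998      # ~t(0.975, df=63); assumed, not taken from the script
N_WORKERS = 64

def half_width(stdev):
    """Half-width of the assumed 95% CI for a given per-worker stdev."""
    return T_CRIT * stdev / math.sqrt(N_WORKERS)

# Before: stdev 59.2,  reported CI [1233.9, 1263.5] -> half-width 14.8
# After:  stdev 118.8, reported CI [1241.9, 1301.3] -> half-width 29.7
print(round(half_width(59.2), 1))    # ~14.8
print(round(half_width(118.8), 1))   # ~29.7
```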
> >
> > Please add a reference to Jain's fairness and a sentence on why we should care
> > about it.
> 
> So first, here is the previous test setup for that:
> https://lore.kernel.org/all/20221220214923.1229538-1-yuzhao@google.com/
> 
> The basic idea is simple: if all memcgs are under similar pressure,
> they should be reclaimed equally, which is fair.

I think this is too much information. Just summarize this in a couple of
sentences in the cover letter. You can refer to your email in the cover letter
for more details.
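
For the cover letter, a one-liner may be enough: Jain's index is
(sum x)^2 / (n * sum x^2) over per-worker totals, 1.0 when all workers get
equal shares and approaching 1/n when one worker dominates. A minimal sketch
(this helper is mine, not part of the test script):

```python
# Jain's fairness index over per-worker request counts:
# J(x) = (sum x)^2 / (n * sum(x^2)); 1.0 means perfectly equal shares,
# and it degrades toward 1/n as a single worker dominates.
def jain_fairness(xs):
    n = len(xs)
    s = sum(xs)
    return s * s / (n * sum(x * x for x in xs))

print(jain_fairness([100, 100, 100, 100]))  # 1.0, perfectly fair
print(jain_fairness([400, 0, 0, 0]))        # 0.25 = 1/n, maximally unfair
```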

[...]

> > >
> > > MySQL:
> > > ======
> > >
> > > Testing with innodb_buffer_pool_size=26106127360, in a 2G memcg, using
> > > ZRAM as swap, with the test command:
> > >
> > > sysbench /usr/share/sysbench/oltp_read_only.lua --mysql-db=sb \
> > >   --tables=48 --table-size=2000000 --threads=48 --time=600 run
> > >
> > > Before:            17303.41 tps
> > > After this series: 17291.50 tps
> > >
> > > Seems like only noise-level changes; no regression.
> > >
> >
> > Please add a sentence on why these specific parameters were chosen.
> >
> > > FIO:
> > > ====
> > > Testing with the following command, where /mnt/ramdisk is a
> > > 64G EXT4 ramdisk, each test file is 3G, in a 10G memcg,
> > > 6 test runs each:
> > >
> > > fio --directory=/mnt/ramdisk --filename_format='test.$jobnum.img' \
> > >   --name=cached --numjobs=16 --size=3072M --buffered=1 --ioengine=mmap \
> > >   --rw=randread --norandommap --time_based \
> > >   --ramp_time=1m --runtime=5m --group_reporting
> > >
> > > Before:            8968.76 MB/s
> > > After this series: 8995.63 MB/s
> > >
> > > Also only noise-level changes: no regression, or slightly better.
> >
> > Same here.
> 
> I tested the page cache performance with buffered reads. There is
> another test involving classical LRU, where MGLRU seems to
> significantly outperform classical LRU. The case was provided by the
> CachyOS community; I didn't include it here because the cover letter
> is already getting tediously long.
> 
> https://lore.kernel.org/all/acgNCzRDVmSbXrOE@KASONG-MC4/
> 
> MGLRU seems to have significantly lower jitter and better performance with that.
> 
> BTW I also disabled OOMD and any related daemons to avoid noise during
> that test. I repeated the test several times and recorded one test
> run as well, since it's meant as a desktop test and I was discussing
> it with distro communities at the time. MGLRU TTL can completely avoid
> the jitter; however, it was not enabled during the test to prevent
> confusion.
> 
> Classical LRU:
> https://www.youtube.com/watch?v=pujboGNcBNI
> 
> MGLRU:
> https://www.youtube.com/watch?v=ffnFUeaBQ_0

The point is not which one is better, but documenting the performance
difference between them for the given workload.

At a high level, I am just asking that for each benchmark/workload, we add a
sentence on why we think that specific workload is important for measuring and
evaluating the reclaim mechanism.



Thread overview: 38+ messages
2026-04-27 18:06 [PATCH v7 00/15] mm/mglru: improve reclaim loop and dirty folio handling Kairui Song
2026-04-27 18:06 ` Kairui Song via B4 Relay
2026-04-27 18:06 ` [PATCH v7 01/15] mm/mglru: consolidate common code for retrieving evictable size Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-27 18:06 ` [PATCH v7 02/15] mm/mglru: rename variables related to aging and rotation Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-27 18:06 ` [PATCH v7 03/15] mm/mglru: relocate the LRU scan batch limit to callers Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-27 18:06 ` [PATCH v7 04/15] mm/mglru: restructure the reclaim loop Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-27 18:06 ` [PATCH v7 05/15] mm/mglru: scan and count the exact number of folios Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-27 18:06 ` [PATCH v7 06/15] mm/mglru: avoid reclaim type fall back when isolation makes no progress Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-28  4:18   ` Kairui Song
2026-04-27 18:06 ` [PATCH v7 07/15] mm/mglru: use a smaller batch for reclaim Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-27 18:06 ` [PATCH v7 08/15] mm/mglru: don't abort scan immediately right after aging Kairui Song
2026-04-27 18:06   ` Kairui Song via B4 Relay
2026-04-27 18:07 ` [PATCH v7 09/15] mm/mglru: remove redundant swap constrained check upon isolation Kairui Song
2026-04-27 18:07   ` Kairui Song via B4 Relay
2026-04-27 18:07 ` [PATCH v7 10/15] mm/mglru: use the common routine for dirty/writeback reactivation Kairui Song
2026-04-27 18:07   ` Kairui Song via B4 Relay
2026-04-27 18:07 ` [PATCH v7 11/15] mm/mglru: simplify and improve dirty writeback handling Kairui Song
2026-04-27 18:07   ` Kairui Song via B4 Relay
2026-04-27 18:07 ` [PATCH v7 12/15] mm/mglru: remove no longer used reclaim argument for folio protection Kairui Song
2026-04-27 18:07   ` Kairui Song via B4 Relay
2026-04-27 18:07 ` [PATCH v7 13/15] mm/vmscan: remove sc->file_taken Kairui Song
2026-04-27 18:07   ` Kairui Song via B4 Relay
2026-04-27 18:07 ` [PATCH v7 14/15] mm/vmscan: remove sc->unqueued_dirty Kairui Song
2026-04-27 18:07   ` Kairui Song via B4 Relay
2026-04-27 18:07 ` [PATCH v7 15/15] mm/vmscan: unify writeback reclaim statistic and throttling Kairui Song
2026-04-27 18:07   ` Kairui Song via B4 Relay
2026-04-27 18:22 ` [PATCH v7 00/15] mm/mglru: improve reclaim loop and dirty folio handling Andrew Morton
2026-05-11 18:51 ` Shakeel Butt
2026-05-12  5:08   ` Kairui Song
2026-05-12  5:56     ` Shakeel Butt [this message]
2026-05-14 18:50 ` Kairui Song
