From: Matt Fleming <matt@readmodwrite.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Barry Song <baohua@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Minchan Kim <minchan@kernel.org>,
kernel-team@cloudflare.com,
Matt Fleming <mfleming@cloudflare.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Vlastimil Babka <vbabka@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
David Hildenbrand <david@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Lorenzo Stoakes <ljs@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim
Date: Fri, 24 Apr 2026 16:00:35 +0100
Message-ID: <aetr2ju-8st9j4ir@matt-Precision-5490>
In-Reply-To: <aeFa7uIML6NmS6T0@linux.dev>
On Thu, Apr 16, 2026 at 02:58:30PM -0700, Shakeel Butt wrote:
> On Thu, Apr 16, 2026 at 09:44:55AM +0800, Barry Song wrote:
> >
> > I am still struggling to understand when zram-backed
> > reclamation cannot make progress. Is it because zram is
> > full, or because folio_alloc_swap() fails?
> >
> > Or does zs_malloc() fail, causing pageout() to fail?
> > Even incompressible pages are still written as
> > ZRAM_HUGE pages and reclaimed successfully.
>
> We should have counters for these, right?
Let me try to provide some more data on this. It's hard to reproduce on
our production systems, so I've resorted to a minimal QEMU repro with
1GiB of RAM and a 1GiB zram swap device. The workload is a simple
anonymous memory mapper that allocates 900MiB and touches all of its
pages repeatedly for 60s.
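In case it's useful, here's roughly what the workload does. This is just
a sketch (in Python for brevity, with the sizes scaled way down here);
the real repro uses 900MiB for 60s:

```python
import mmap
import time

def touch_loop(size_bytes, duration_s):
    """Map anonymous memory and keep dirtying every page until the
    deadline passes. Dirty (not just read) so reclaim must swap the
    pages out rather than simply drop them. Returns passes completed."""
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, size_bytes)  # MAP_PRIVATE | MAP_ANONYMOUS
    deadline = time.monotonic() + duration_s
    passes = 0
    while True:
        for off in range(0, size_bytes, page):
            buf[off] = (buf[off] + 1) & 0xFF  # write one byte per page
        passes += 1
        if time.monotonic() >= deadline:
            break
    buf.close()
    return passes

if __name__ == "__main__":
    # Real repro: touch_loop(900 * 1024 * 1024, 60)
    touch_loop(16 * mmap.PAGESIZE, 1)
```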
zs_malloc
---------
None of the zs_malloc() calls failed, and we made ~4.85M of them during
the test (~1.27M of those from direct reclaim). Here's a breakdown of
allocation sizes:
@hist_zs_malloc_size:
[32, 64) 4831015 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128) 409 | |
[128, 256) 1090 | |
[256, 512) 2334 | |
[512, 1K) 5069 | |
[1K, 2K) 11174 | |
[2K, 4K) 2395 | |
[4K, 8K) 237 | |
During direct reclaim only:
@hist_zs_malloc_size_in_dr:
[32, 64) 1268042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128) 52 | |
[128, 256) 149 | |
[256, 512) 292 | |
[512, 1K) 1234 | |
[1K, 2K) 3539 | |
[2K, 4K) 1156 | |
[4K, 8K) 135 | |
/sys/block/zram0/mm_stat
------------------------
(fields: orig_data_size compr_data_size mem_used_total mem_limit
 mem_used_max same_pages pages_compacted huge_pages huge_pages_since)

before: 4096 74 12288 0 12288 0 0 0 0
after: 42622976 9412667 10985472 0 34131968 0 1962 0 237
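For convenience, a little parser for that line (field order per
Documentation/admin-guide/blockdev/zram.rst; the ratio is my arithmetic,
not a kernel-reported number):

```python
# Parse a /sys/block/zram0/mm_stat line into named fields and report the
# effective compression ratio (orig_data_size / compr_data_size).
MM_STAT_FIELDS = [
    "orig_data_size", "compr_data_size", "mem_used_total", "mem_limit",
    "mem_used_max", "same_pages", "pages_compacted", "huge_pages",
    "huge_pages_since",
]

def parse_mm_stat(line):
    values = [int(v) for v in line.split()]
    return dict(zip(MM_STAT_FIELDS, values))

after = parse_mm_stat("42622976 9412667 10985472 0 34131968 0 1962 0 237")
ratio = after["orig_data_size"] / after["compr_data_size"]
print(f"compression ratio: {ratio:.2f}x")  # ~4.53x
```

So the data compresses well, which matches the dominance of the [32, 64)
bucket in the zs_malloc histograms above.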
trace_mm_vmscan_lru_shrink_inactive
-----------------------------------
Anon LRU shrink events: 397,949
sum(args->nr_scanned): 11,837,216
sum(args->nr_reclaimed): 4,871,775
sum(args->nr_dirty): 0
sum(args->nr_writeback): 0
sum(args->nr_congested): 0
sum(args->nr_immediate): 0
sum(args->nr_ref_keep): 5,200,896
sum(args->nr_unmap_fail): 0
File LRU shrink events: 2,632
sum(args->nr_scanned): 26,048
sum(args->nr_reclaimed): 12,681
sum(args->nr_dirty): 0
sum(args->nr_writeback): 0
sum(args->nr_congested): 0
sum(args->nr_immediate): 0
sum(args->nr_ref_keep): 476
sum(args->nr_unmap_fail): 0
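For what it's worth, reclaim efficiency (nr_reclaimed / nr_scanned) from
those sums, computed by hand:

```python
# Reclaim efficiency from the mm_vmscan_lru_shrink_inactive sums above.
anon_scanned, anon_reclaimed = 11_837_216, 4_871_775
file_scanned, file_reclaimed = 26_048, 12_681

anon_eff = anon_reclaimed / anon_scanned
file_eff = file_reclaimed / file_scanned
print(f"anon: {anon_eff:.1%}, file: {file_eff:.1%}")  # anon ~41.2%, file ~48.7%
```

If I'm reading it right, the big nr_ref_keep number (5.2M of 11.8M anon
pages scanned) means roughly 44% of scanned anon pages were kept because
they were referenced, i.e. recycled back onto the LRU rather than
failing writeback.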
> > I would rather detect what causes the lack of progress
> > and implement a better fallback.
>
> This is a good question. I think we have appropriate counters in /proc/vmstat
> for cases where pages keep getting recycled in the LRUs instead of reclaim.
Here's a diff of the /proc/vmstat counters before and after the test run.
nr_free_pages 210,825 -> 206,742 (delta=-4,083)
nr_free_pages_blocks 209,920 -> 65,536 (delta=-144,384)
nr_zone_inactive_anon 1,685 -> 136 (delta=-1,549)
nr_zone_active_anon 15 -> 3,774 (delta=3,759)
nr_zone_inactive_file 329 -> 591 (delta=262)
nr_zone_active_file 673 -> 504 (delta=-169)
nr_zspages 3 -> 2,716 (delta=2,713)
nr_inactive_anon 1,685 -> 136 (delta=-1,549)
nr_active_anon 15 -> 3,774 (delta=3,759)
nr_inactive_file 329 -> 591 (delta=262)
nr_active_file 673 -> 504 (delta=-169)
nr_slab_reclaimable 1,352 -> 2,037 (delta=685)
nr_slab_unreclaimable 9,581 -> 11,689 (delta=2,108)
nr_anon_pages 1,526 -> 262 (delta=-1,264)
nr_mapped 912 -> 442 (delta=-470)
nr_file_pages 1,132 -> 4,760 (delta=3,628)
nr_shmem 162 -> 3,608 (delta=3,446)
nr_swapcached 0 -> 19 (delta=19)
nr_vmscan_write 0 -> 4,872,846 (delta=4,872,846)
nr_written 1 -> 4,853,727 (delta=4,853,726)
pgpgin 1,200 -> 19,035,312 (delta=19,034,112)
pgpgout 4 -> 19,414,908 (delta=19,414,904)
pswpin 0 -> 4,758,528 (delta=4,758,528)
pswpout 0 -> 4,853,726 (delta=4,853,726)
pgalloc_dma 32 -> 84,262 (delta=84,230)
pgalloc_dma32 45,989 -> 5,095,307 (delta=5,049,318)
pgfree 269,896 -> 5,415,629 (delta=5,145,733)
pgactivate 2,820 -> 14,490 (delta=11,670)
pgdeactivate 10 -> 10,924 (delta=10,914)
pgfault 29,321 -> 5,088,427 (delta=5,059,106)
pgmajfault 3,750 -> 4,794,781 (delta=4,791,031)
pgrefill 0 -> 13,733 (delta=13,733)
pgreuse 3,333 -> 5,852 (delta=2,519)
pgsteal_kswapd 0 -> 3,605,552 (delta=3,605,552)
pgsteal_direct 0 -> 1,280,091 (delta=1,280,091)
pgscan_kswapd 0 -> 6,579,240 (delta=6,579,240)
pgscan_direct 0 -> 5,290,778 (delta=5,290,778)
pgscan_anon 0 -> 11,843,970 (delta=11,843,970)
pgscan_file 0 -> 26,048 (delta=26,048)
pgsteal_anon 0 -> 4,872,962 (delta=4,872,962)
pgsteal_file 0 -> 12,681 (delta=12,681)
allocstall_normal 0 -> 110 (delta=110)
allocstall_movable 0 -> 32,088 (delta=32,088)
oom_kill 0 -> 0 (delta=0)
workingset_nodes 0 -> 302 (delta=302)
workingset_refault_anon 0 -> 4,777,591 (delta=4,777,591)
workingset_refault_file 0 -> 870 (delta=870)
workingset_activate_anon 0 -> 487 (delta=487)
kswapd_low_wmark_hit_quickly 0 -> 35 (delta=35)
kswapd_high_wmark_hit_quickly 0 -> 99 (delta=99)
pageoutrun 0 -> 135 (delta=135)
pgmigrate_success 0 -> 21,317 (delta=21,317)
compact_migrate_scanned 0 -> 98,848 (delta=98,848)
compact_free_scanned 0 -> 136,667 (delta=136,667)
swpin_zero 0 -> 19,069 (delta=19,069)
swpout_zero 0 -> 19,120 (delta=19,120)
swap_ra 0 -> 63 (delta=63)
swap_ra_hit 0 -> 26 (delta=26)
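Pulling the swap-churn numbers together from the deltas above (again my
own arithmetic, not a counter the kernel exposes directly):

```python
# Nearly every page swapped out came straight back in: pswpin is ~98% of
# pswpout, and workingset_refault_anon tracks pswpin closely. That looks
# like thrashing, not reclaim failing to make nominal "progress".
pswpin, pswpout = 4_758_528, 4_853_726
refault_anon = 4_777_591

churn = pswpin / pswpout
print(f"swap-in/swap-out ratio: {churn:.1%}")  # ~98.0%
print(f"refaults vs swapins: {refault_anon - pswpin:+,}")
```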
Happy to run any other tests or pull any other data that would help.
Thanks,
Matt
Thread overview: 15+ messages
2026-04-10 10:15 [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim Matt Fleming
2026-04-13 15:38 ` Vlastimil Babka (SUSE)
2026-04-15 9:11 ` Matt Fleming
2026-04-20 9:13 ` Vlastimil Babka (SUSE)
2026-04-15 14:57 ` Pedro Falcato
2026-04-16 14:51 ` Matt Fleming
2026-04-16 21:49 ` Shakeel Butt
2026-04-17 10:35 ` Pedro Falcato
2026-04-16 1:01 ` Shakeel Butt
2026-04-16 14:54 ` Matt Fleming
2026-04-16 1:44 ` Barry Song
2026-04-16 21:58 ` Shakeel Butt
2026-04-24 15:00 ` Matt Fleming [this message]
-- strict thread matches above, loose matches on Subject: below --
2026-03-03 11:53 [RFC PATCH 0/1] mm: Reduce direct reclaim stalls with RAM-backed swap Matt Fleming
2026-04-10 9:41 ` [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim Matt Fleming
2026-04-10 10:13 ` Matt Fleming