From: Matt Fleming <matt@readmodwrite.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Barry Song <baohua@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Minchan Kim <minchan@kernel.org>,
kernel-team@cloudflare.com,
Matt Fleming <mfleming@cloudflare.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Vlastimil Babka <vbabka@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
David Hildenbrand <david@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Lorenzo Stoakes <ljs@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim
Date: Fri, 24 Apr 2026 16:00:35 +0100
Message-ID: <aetr2ju-8st9j4ir@matt-Precision-5490>
In-Reply-To: <aeFa7uIML6NmS6T0@linux.dev>
On Thu, Apr 16, 2026 at 02:58:30PM -0700, Shakeel Butt wrote:
> On Thu, Apr 16, 2026 at 09:44:55AM +0800, Barry Song wrote:
> >
> > I am still struggling to understand when zram-backed
> > reclamation cannot make progress. Is it because zram is
> > full, or because folio_alloc_swap() fails?
> >
> > Or does zs_malloc() fail, causing pageout() to fail?
> > Even incompressible pages are still written as
> > ZRAM_HUGE pages and reclaimed successfully.
>
> We should have counters for these, right?
Let me try to provide some more data for this. It's hard to replicate
on our production systems, so I've resorted to creating a minimal QEMU
repro with 1GiB of RAM and a 1GiB zram disk. The workload is a simple
anon memory mapper that allocates 900MiB of memory and touches all
pages for 60s.
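
For anyone who wants to try something similar, here is a minimal sketch of that kind of workload. It is not the actual reproducer: the function name `touch_anon` and the choice of a private anonymous mapping via Python's `mmap` module are assumptions made for illustration.

```python
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE


def touch_anon(size_bytes, duration_s):
    """Map size_bytes of private anonymous memory and dirty every page,
    repeating full passes until duration_s has elapsed.

    Returns the number of full passes completed (always at least one).
    The deadline is only checked between passes, so a slow pass under
    heavy swapping can overshoot duration_s.
    """
    # fileno=-1 requests an anonymous mapping; MAP_PRIVATE keeps it anon
    # memory rather than shmem (Linux/Unix only).
    buf = mmap.mmap(-1, size_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    passes = 0
    deadline = time.monotonic() + duration_s
    try:
        while True:
            for off in range(0, size_bytes, PAGE_SIZE):
                buf[off] = (passes + 1) & 0xFF  # one write per page
            passes += 1
            if time.monotonic() >= deadline:
                break
    finally:
        buf.close()
    return passes


# Sizes from the description above (hypothetical invocation):
# touch_anon(900 * 1024 * 1024, 60)
```

Run inside a guest with less RAM than the mapping, each pass forces pages out to swap and faults them back in on the next pass.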
zs_malloc
---------
None of the zs_malloc() calls failed, and we made ~4.85M of them during
the test (~1.27M of those in direct reclaim). Here's a breakdown of
allocation sizes:
@hist_zs_malloc_size:
[32, 64)   4831015 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)      409 |                                                    |
[128, 256)    1090 |                                                    |
[256, 512)    2334 |                                                    |
[512, 1K)     5069 |                                                    |
[1K, 2K)     11174 |                                                    |
[2K, 4K)      2395 |                                                    |
[4K, 8K)       237 |                                                    |
During direct reclaim only:
@hist_zs_malloc_size_in_dr:
[32, 64)   1268042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)       52 |                                                    |
[128, 256)     149 |                                                    |
[256, 512)     292 |                                                    |
[512, 1K)     1234 |                                                    |
[1K, 2K)      3539 |                                                    |
[2K, 4K)      1156 |                                                    |
[4K, 8K)       135 |                                                    |
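
As a cross-check on the two histograms, the bucket counts can be summed to get the total number of zs_malloc() calls and the share made from direct reclaim. This is just arithmetic on the numbers above; the variable names are made up for the example.

```python
# Bucket counts copied from @hist_zs_malloc_size (whole test).
all_calls = {
    "[32, 64)": 4831015, "[64, 128)": 409, "[128, 256)": 1090,
    "[256, 512)": 2334, "[512, 1K)": 5069, "[1K, 2K)": 11174,
    "[2K, 4K)": 2395, "[4K, 8K)": 237,
}
# Bucket counts copied from @hist_zs_malloc_size_in_dr (direct reclaim only).
direct_reclaim = {
    "[32, 64)": 1268042, "[64, 128)": 52, "[128, 256)": 149,
    "[256, 512)": 292, "[512, 1K)": 1234, "[1K, 2K)": 3539,
    "[2K, 4K)": 1156, "[4K, 8K)": 135,
}

total = sum(all_calls.values())
total_dr = sum(direct_reclaim.values())
dr_share = total_dr / total                      # fraction made in direct reclaim
smallest_share = all_calls["[32, 64)"] / total   # fraction in the smallest bucket

print(f"total={total:,} direct_reclaim={total_dr:,} "
      f"dr_share={dr_share:.1%} smallest_bucket={smallest_share:.1%}")
```

The totals come to 4,853,723 calls overall and 1,274,599 in direct reclaim, with over 99% of all allocations landing in the [32, 64) bucket.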
/sys/block/zram0/mm_stat
------------------------
before:     4096      74    12288 0    12288 0    0 0   0
after:  42622976 9412667 10985472 0 34131968 0 1962 0 237
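
The mm_stat columns are unlabeled, so here is a small sketch that names them and derives a couple of ratios from the "after" line. The column order follows my reading of the zram admin-guide documentation; `MmStat` and `parse_mm_stat` are hypothetical helper names.

```python
from collections import namedtuple

# Column order as described in the zram documentation for mm_stat
# (an assumption worth double-checking against your kernel version).
MmStat = namedtuple("MmStat", [
    "orig_data_size", "compr_data_size", "mem_used_total", "mem_limit",
    "mem_used_max", "same_pages", "pages_compacted", "huge_pages",
    "huge_pages_since",
])


def parse_mm_stat(line):
    """Turn one whitespace-separated mm_stat line into a named tuple."""
    return MmStat(*(int(v) for v in line.split()))


after = parse_mm_stat("42622976 9412667 10985472 0 34131968 0 1962 0 237")

# Uncompressed bytes stored per compressed byte at the end of the test.
ratio = after.orig_data_size / after.compr_data_size
# Peak memory zram consumed during the test, in MiB.
peak_mib = after.mem_used_max / (1024 * 1024)

print(f"compression ratio {ratio:.2f}, peak usage {peak_mib:.2f} MiB")
```

With these numbers the ratio works out to roughly 4.5, and the last column (huge_pages_since = 237) matches the 237 allocations in the [4K, 8K) histogram bucket.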
trace_mm_vmscan_lru_shrink_inactive
-----------------------------------
Anon LRU shrink events: 397,949
  sum(args->nr_scanned):    11,837,216
  sum(args->nr_reclaimed):   4,871,775
  sum(args->nr_dirty):               0
  sum(args->nr_writeback):           0
  sum(args->nr_congested):           0
  sum(args->nr_immediate):           0
  sum(args->nr_ref_keep):    5,200,896
  sum(args->nr_unmap_fail):          0
File LRU shrink events: 2,632
  sum(args->nr_scanned):        26,048
  sum(args->nr_reclaimed):      12,681
  sum(args->nr_dirty):               0
  sum(args->nr_writeback):           0
  sum(args->nr_congested):           0
  sum(args->nr_immediate):           0
  sum(args->nr_ref_keep):          476
  sum(args->nr_unmap_fail):          0
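
To make the tracepoint sums easier to compare, the sketch below turns them into per-scan fractions. It only divides the numbers already listed; the helper name `per_scan` is hypothetical and nothing here comes from additional tracing.

```python
# Sums copied from the trace_mm_vmscan_lru_shrink_inactive output above.
anon = {"nr_scanned": 11_837_216, "nr_reclaimed": 4_871_775, "nr_ref_keep": 5_200_896}
file_lru = {"nr_scanned": 26_048, "nr_reclaimed": 12_681, "nr_ref_keep": 476}


def per_scan(counters):
    """Return reclaimed and ref_keep as fractions of pages scanned."""
    scanned = counters["nr_scanned"]
    return {
        "reclaimed": counters["nr_reclaimed"] / scanned,
        "ref_keep": counters["nr_ref_keep"] / scanned,
    }


anon_frac = per_scan(anon)
file_frac = per_scan(file_lru)

for name, frac in (("anon", anon_frac), ("file", file_frac)):
    print(f"{name}: reclaimed {frac['reclaimed']:.1%}, ref_keep {frac['ref_keep']:.1%}")
```

For the anon LRU this gives about 41% of scanned pages reclaimed and about 44% kept because they were referenced; for the file LRU about 49% reclaimed and under 2% kept.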
> > I would rather detect what causes the lack of progress
> > and implement a better fallback.
>
> This is a good question. I think we have appropriate counters in /proc/vmstat
> for cases where pages keep getting recycled in the LRUs instead of reclaim.
Here's the output of /proc/vmstat before and after the test runs.
nr_free_pages 210,825 -> 206,742 (delta=-4,083)
nr_free_pages_blocks 209,920 -> 65,536 (delta=-144,384)
nr_zone_inactive_anon 1,685 -> 136 (delta=-1,549)
nr_zone_active_anon 15 -> 3,774 (delta=3,759)
nr_zone_inactive_file 329 -> 591 (delta=262)
nr_zone_active_file 673 -> 504 (delta=-169)
nr_zspages 3 -> 2,716 (delta=2,713)
nr_inactive_anon 1,685 -> 136 (delta=-1,549)
nr_active_anon 15 -> 3,774 (delta=3,759)
nr_inactive_file 329 -> 591 (delta=262)
nr_active_file 673 -> 504 (delta=-169)
nr_slab_reclaimable 1,352 -> 2,037 (delta=685)
nr_slab_unreclaimable 9,581 -> 11,689 (delta=2,108)
nr_anon_pages 1,526 -> 262 (delta=-1,264)
nr_mapped 912 -> 442 (delta=-470)
nr_file_pages 1,132 -> 4,760 (delta=3,628)
nr_shmem 162 -> 3,608 (delta=3,446)
nr_swapcached 0 -> 19 (delta=19)
nr_vmscan_write 0 -> 4,872,846 (delta=4,872,846)
nr_written 1 -> 4,853,727 (delta=4,853,726)
pgpgin 1,200 -> 19,035,312 (delta=19,034,112)
pgpgout 4 -> 19,414,908 (delta=19,414,904)
pswpin 0 -> 4,758,528 (delta=4,758,528)
pswpout 0 -> 4,853,726 (delta=4,853,726)
pgalloc_dma 32 -> 84,262 (delta=84,230)
pgalloc_dma32 45,989 -> 5,095,307 (delta=5,049,318)
pgfree 269,896 -> 5,415,629 (delta=5,145,733)
pgactivate 2,820 -> 14,490 (delta=11,670)
pgdeactivate 10 -> 10,924 (delta=10,914)
pgfault 29,321 -> 5,088,427 (delta=5,059,106)
pgmajfault 3,750 -> 4,794,781 (delta=4,791,031)
pgrefill 0 -> 13,733 (delta=13,733)
pgreuse 3,333 -> 5,852 (delta=2,519)
pgsteal_kswapd 0 -> 3,605,552 (delta=3,605,552)
pgsteal_direct 0 -> 1,280,091 (delta=1,280,091)
pgscan_kswapd 0 -> 6,579,240 (delta=6,579,240)
pgscan_direct 0 -> 5,290,778 (delta=5,290,778)
pgscan_anon 0 -> 11,843,970 (delta=11,843,970)
pgscan_file 0 -> 26,048 (delta=26,048)
pgsteal_anon 0 -> 4,872,962 (delta=4,872,962)
pgsteal_file 0 -> 12,681 (delta=12,681)
allocstall_normal 0 -> 110 (delta=110)
allocstall_movable 0 -> 32,088 (delta=32,088)
oom_kill 0 -> 0 (delta=0)
workingset_nodes 0 -> 302 (delta=302)
workingset_refault_anon 0 -> 4,777,591 (delta=4,777,591)
workingset_refault_file 0 -> 870 (delta=870)
workingset_activate_anon 0 -> 487 (delta=487)
kswapd_low_wmark_hit_quickly 0 -> 35 (delta=35)
kswapd_high_wmark_hit_quickly 0 -> 99 (delta=99)
pageoutrun 0 -> 135 (delta=135)
pgmigrate_success 0 -> 21,317 (delta=21,317)
compact_migrate_scanned 0 -> 98,848 (delta=98,848)
compact_free_scanned 0 -> 136,667 (delta=136,667)
swpin_zero 0 -> 19,069 (delta=19,069)
swpout_zero 0 -> 19,120 (delta=19,120)
swap_ra 0 -> 63 (delta=63)
swap_ra_hit 0 -> 26 (delta=26)
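
For reference, a before/after listing in this format can be produced with a few lines of scripting. The sketch below is an illustration rather than the script used for the numbers above; `parse_vmstat` and `format_deltas` are hypothetical names, and unlike the listing above it skips counters that did not change.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ('name value' per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def format_deltas(before, after):
    """Return one formatted line per counter that changed between snapshots."""
    lines = []
    for name, old in before.items():
        new = after.get(name, old)
        if new != old:
            lines.append(f"{name} {old:,} -> {new:,} (delta={new - old:,})")
    return lines


# Hypothetical usage around a test run:
#   with open("/proc/vmstat") as f:
#       before = parse_vmstat(f.read())
#   ... run the workload ...
#   with open("/proc/vmstat") as f:
#       after = parse_vmstat(f.read())
#   print("\n".join(format_deltas(before, after)))
```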
Happy to run any other tests or pull any other data that would help.
Thanks,
Matt