From: Matt Fleming <matt@readmodwrite.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Barry Song <baohua@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	 Jens Axboe <axboe@kernel.dk>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Minchan Kim <minchan@kernel.org>,
	kernel-team@cloudflare.com,
	 Matt Fleming <mfleming@cloudflare.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
	 Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 David Hildenbrand <david@kernel.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	 Lorenzo Stoakes <ljs@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim
Date: Fri, 24 Apr 2026 16:00:35 +0100
Message-ID: <aetr2ju-8st9j4ir@matt-Precision-5490>
In-Reply-To: <aeFa7uIML6NmS6T0@linux.dev>

On Thu, Apr 16, 2026 at 02:58:30PM -0700, Shakeel Butt wrote:
> On Thu, Apr 16, 2026 at 09:44:55AM +0800, Barry Song wrote:
> > 
> > I am still struggling to understand when zram-backed
> > reclamation cannot make progress. Is it because zram is
> > full, or because folio_alloc_swap() fails?
> > 
> > Or does zs_malloc() fail, causing pageout() to fail?
> > Even incompressible pages are still written as
> > ZRAM_HUGE pages and reclaimed successfully.
> 
> We should have counters for these, right?
 
Let me try to provide some more data for this. It's hard to replicate
on our production systems, so I've resorted to a minimal QEMU repro
with 1GiB of RAM and a 1GiB zram disk. The workload is a simple anon
memory mapper that allocates 900MiB of memory and touches all of its
pages for 60s.

zs_malloc
---------
None of the zs_malloc() calls failed, and we made ~4.9M of them during
the test (~1.3M of those from direct reclaim). Here's a breakdown of
allocation sizes:

@hist_zs_malloc_size: 
[32, 64)         4831015 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)            409 |                                                    |
[128, 256)          1090 |                                                    |
[256, 512)          2334 |                                                    |
[512, 1K)           5069 |                                                    |
[1K, 2K)           11174 |                                                    |
[2K, 4K)            2395 |                                                    |
[4K, 8K)             237 |                                                    |

During direct reclaim only:
@hist_zs_malloc_size_in_dr: 
[32, 64)         1268042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)             52 |                                                    |
[128, 256)           149 |                                                    |
[256, 512)           292 |                                                    |
[512, 1K)           1234 |                                                    |
[1K, 2K)            3539 |                                                    |
[2K, 4K)            1156 |                                                    |
[4K, 8K)             135 |                                                    |
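
For reference, histograms like the above can be collected with a
bpftrace script along the following lines. The kprobe pair on
try_to_free_pages() is just my assumption for how to mark direct
reclaim context for the _in_dr variant; adjust it for whatever the
direct reclaim entry point is on your kernel:

kprobe:try_to_free_pages { @in_dr[tid] = 1; }
kretprobe:try_to_free_pages { delete(@in_dr[tid]); }

kprobe:zs_malloc
{
	/* arg1 is the size passed to zs_malloc(pool, size, gfp) */
	@hist_zs_malloc_size = hist(arg1);

	if (@in_dr[tid]) {
		@hist_zs_malloc_size_in_dr = hist(arg1);
	}
}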


/sys/block/zram0/mm_stat
------------------------
(columns: orig_data_size compr_data_size mem_used_total mem_limit
mem_used_max same_pages pages_compacted huge_pages huge_pages_since)

before: 4096       74    12288        0    12288        0        0        0        0
after:  42622976  9412667 10985472        0 34131968        0     1962        0      237


trace_mm_vmscan_lru_shrink_inactive
-----------------------------------
Anon LRU shrink events:                 397,949
  sum(args->nr_scanned):                11,837,216
  sum(args->nr_reclaimed):              4,871,775
  sum(args->nr_dirty):                  0
  sum(args->nr_writeback):              0
  sum(args->nr_congested):              0
  sum(args->nr_immediate):              0
  sum(args->nr_ref_keep):               5,200,896
  sum(args->nr_unmap_fail):             0

File LRU shrink events:                 2,632
  sum(args->nr_scanned):                26,048
  sum(args->nr_reclaimed):              12,681
  sum(args->nr_dirty):                  0
  sum(args->nr_writeback):              0
  sum(args->nr_congested):              0
  sum(args->nr_immediate):              0
  sum(args->nr_ref_keep):               476
  sum(args->nr_unmap_fail):             0
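
These are straight sums over the mm_vmscan_lru_shrink_inactive
tracepoint args, roughly as in the bpftrace sketch below. Note the
reclaim_flags field name and the RECLAIM_WB_ANON bit (0x1) used to
split anon from file are from my reading of the tracepoint definition,
so double-check them against the format file on your kernel:

tracepoint:vmscan:mm_vmscan_lru_shrink_inactive
{
	/* RECLAIM_WB_ANON (0x1) set => anon LRU, otherwise file LRU */
	$lru = (args->reclaim_flags & 0x1) ? "anon" : "file";

	@events[$lru] = count();
	@nr_scanned[$lru] = sum(args->nr_scanned);
	@nr_reclaimed[$lru] = sum(args->nr_reclaimed);
	@nr_dirty[$lru] = sum(args->nr_dirty);
	@nr_writeback[$lru] = sum(args->nr_writeback);
	@nr_congested[$lru] = sum(args->nr_congested);
	@nr_immediate[$lru] = sum(args->nr_immediate);
	@nr_ref_keep[$lru] = sum(args->nr_ref_keep);
	@nr_unmap_fail[$lru] = sum(args->nr_unmap_fail);
}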


> > I would rather detect what causes the lack of progress
> > and implement a better fallback.
> 
> This is a good question. I think we have appropriate counters in /proc/vmstat
> for cases where pages keep getting recycled in the LRUs instead of reclaim.

Here's the output of /proc/vmstat before and after the test run.

nr_free_pages                             210,825 ->       206,742  (delta=-4,083)
nr_free_pages_blocks                      209,920 ->        65,536  (delta=-144,384)
nr_zone_inactive_anon                       1,685 ->           136  (delta=-1,549)
nr_zone_active_anon                            15 ->         3,774  (delta=3,759)
nr_zone_inactive_file                         329 ->           591  (delta=262)
nr_zone_active_file                           673 ->           504  (delta=-169)
nr_zspages                                      3 ->         2,716  (delta=2,713)
nr_inactive_anon                            1,685 ->           136  (delta=-1,549)
nr_active_anon                                 15 ->         3,774  (delta=3,759)
nr_inactive_file                              329 ->           591  (delta=262)
nr_active_file                                673 ->           504  (delta=-169)
nr_slab_reclaimable                         1,352 ->         2,037  (delta=685)
nr_slab_unreclaimable                       9,581 ->        11,689  (delta=2,108)
nr_anon_pages                               1,526 ->           262  (delta=-1,264)
nr_mapped                                     912 ->           442  (delta=-470)
nr_file_pages                               1,132 ->         4,760  (delta=3,628)
nr_shmem                                      162 ->         3,608  (delta=3,446)
nr_swapcached                                   0 ->            19  (delta=19)
nr_vmscan_write                                 0 ->     4,872,846  (delta=4,872,846)
nr_written                                      1 ->     4,853,727  (delta=4,853,726)

pgpgin                                      1,200 ->    19,035,312  (delta=19,034,112)
pgpgout                                         4 ->    19,414,908  (delta=19,414,904)
pswpin                                          0 ->     4,758,528  (delta=4,758,528)
pswpout                                         0 ->     4,853,726  (delta=4,853,726)

pgalloc_dma                                    32 ->        84,262  (delta=84,230)
pgalloc_dma32                              45,989 ->     5,095,307  (delta=5,049,318)
pgfree                                    269,896 ->     5,415,629  (delta=5,145,733)
pgactivate                                  2,820 ->        14,490  (delta=11,670)
pgdeactivate                                   10 ->        10,924  (delta=10,914)
pgfault                                    29,321 ->     5,088,427  (delta=5,059,106)
pgmajfault                                  3,750 ->     4,794,781  (delta=4,791,031)
pgrefill                                        0 ->        13,733  (delta=13,733)
pgreuse                                     3,333 ->         5,852  (delta=2,519)

pgsteal_kswapd                                  0 ->     3,605,552  (delta=3,605,552)
pgsteal_direct                                  0 ->     1,280,091  (delta=1,280,091)
pgscan_kswapd                                   0 ->     6,579,240  (delta=6,579,240)
pgscan_direct                                   0 ->     5,290,778  (delta=5,290,778)
pgscan_anon                                     0 ->    11,843,970  (delta=11,843,970)
pgscan_file                                     0 ->        26,048  (delta=26,048)
pgsteal_anon                                    0 ->     4,872,962  (delta=4,872,962)
pgsteal_file                                    0 ->        12,681  (delta=12,681)

allocstall_normal                               0 ->           110  (delta=110)
allocstall_movable                              0 ->        32,088  (delta=32,088)
oom_kill                                        0 ->             0  (delta=0)

workingset_nodes                                0 ->           302  (delta=302)
workingset_refault_anon                         0 ->     4,777,591  (delta=4,777,591)
workingset_refault_file                         0 ->           870  (delta=870)
workingset_activate_anon                        0 ->           487  (delta=487)

kswapd_low_wmark_hit_quickly                    0 ->            35  (delta=35)
kswapd_high_wmark_hit_quickly                   0 ->            99  (delta=99)
pageoutrun                                      0 ->           135  (delta=135)

pgmigrate_success                               0 ->        21,317  (delta=21,317)
compact_migrate_scanned                         0 ->        98,848  (delta=98,848)
compact_free_scanned                            0 ->       136,667  (delta=136,667)

swpin_zero                                      0 ->        19,069  (delta=19,069)
swpout_zero                                     0 ->        19,120  (delta=19,120)
swap_ra                                         0 ->            63  (delta=63)
swap_ra_hit                                     0 ->            26  (delta=26)

Happy to run any other tests or pull any other data if that would help.

Thanks,
Matt

