From: Li Wang <liwang@redhat.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: "Yosry Ahmed" <yosryahmed@google.com>,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, "Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Tejun Heo" <tj@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>
Subject: Re: [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap
Date: Fri, 13 Mar 2026 10:59:31 +0800
Message-ID: <abN9k5A8rJaA8mkR@redhat.com>
In-Reply-To: <CAKEwX=O69LepTsB1tJO1otDziyE4Oi3PAW=jShiTxQ=Hg7dB-Q@mail.gmail.com>

On Thu, Mar 12, 2026 at 10:09:10AM -0700, Nhat Pham wrote:
> On Wed, Mar 11, 2026 at 9:01 PM Li Wang <liwang@redhat.com> wrote:
> >
> > On Wed, Mar 11, 2026 at 11:50:05AM -0700, Yosry Ahmed wrote:
> > > On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
> > > >
> > > > test_swapin_nozswap can hit OOM before reaching its assertions on some
> > > > setups. The test currently sets memory.max=8M and then allocates/reads
> > > > 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> > > > the workload process.
> > > >
> > > > Raise memory.max to 24M so the workload can make forward progress, and
> > > > lower the swap_peak expectation from 24M to 8M to keep the check robust
> > > > across environments.
> > > >
> > > > The test intent is unchanged: verify that swapping happens while zswap
> > > > remains unused when memory.zswap.max=0.
> > > >
> > > > === Error Logs ===
> > > >
> > > > # ./test_zswap
> > > > TAP version 13
> > > > 1..7
> > > > ok 1 test_zswap_usage
> > > > not ok 2 test_swapin_nozswap
> > > > ...
> > > >
> > > > # dmesg
> > > > [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> > > > [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
> > > > [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
> > > > [271641.879173] Call Trace:
> > > > [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
> > > > [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
> > > > [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
> > > > [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
> > > > [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
> > > > [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
> > > > [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
> > > > [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
> > > > [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
> > > > [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
> > > > [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
> > > > [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
> > > > [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
> > > > [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
> > > > [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
> > > > [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
> > > > [271641.879548] Tasks state (memory values in pages):
> > > > ...
> > > > [271641.879550] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > > > [271641.879555] [ 177372] 0 177372 571 0 0 0 0 51200 96 0 test_zswap
> > > > [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
> > > > [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0
> > >
> > > Why are we getting an OOM kill when there's a swap device? Is the
> > > device slow / not keeping up with reclaim pace?
> >
> > This is a good question. The OOM is very likely triggered because memcg
> > reclaim can't make forward progress fast enough within the retry budget
> > of try_charge_memcg.
> >
> > Looking at the OOM info, the system uses 64K pages, so memory.max=8M gives
> > only 128 pages. At OOM time, RSS is 0 and swapents is only 96. Swap space
> > itself isn't full; the charge path simply gave up trying to reclaim.
> >
> > The core issue, I guess, is that with memory.zswap.max=0, every page
> > reclaimed must go through the real block device. The charge path works
> > like this: a page fault fires, charge_memcg tries to charge 64K to the
> > cgroup, the cgroup is at its limit, so try_charge_memcg attempts direct
> > reclaim to free space. If the swap device can't drain pages fast enough,
> > the reclaim attempts within the retry loop fail to bring usage below
> > memory.max, and the kernel invokes OOM, even though swap space is
> > technically available.
> >
> > Raising memory.max to 24M gives reclaim a much larger pool to work with,
> > so it can absorb I/O latency without exhausting its retry budget.
>
> Hmmm, perhaps we should change all these constants to multiples of
> the system's base page size?

Yeah, this may be better. Let me try it in the next version.
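
For illustration only, a rough sketch of what page-size-relative limits
could look like. The helper names (base_page_size, pages_to_bytes,
bytes_to_pages, format_limit) are hypothetical and not part of this
series; the actual change in the next version may look different:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helpers: express cgroup limits as a number of base pages
 * instead of fixed byte sizes such as "8M" or "24M".
 */

/* Runtime base page size, e.g. 4096 on x86_64 or 65536 on a 64K ppc64le kernel. */
static long base_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	return sz > 0 ? sz : 4096;	/* fall back to 4K if sysconf() fails */
}

/* Bytes covered by nr_pages pages of page_size bytes each. */
static size_t pages_to_bytes(size_t nr_pages, long page_size)
{
	assert(page_size > 0);
	return nr_pages * (size_t)page_size;
}

/* Number of whole pages that fit in a byte limit, e.g. 8M / 64K = 128. */
static size_t bytes_to_pages(size_t bytes, long page_size)
{
	assert(page_size > 0);
	return bytes / (size_t)page_size;
}

/*
 * Format a page-count limit as a decimal byte string suitable for writing
 * to memory.max. Returns 0 on success, -1 if buf is too small.
 */
static int format_limit(char *buf, size_t len, size_t nr_pages, long page_size)
{
	int n = snprintf(buf, len, "%zu", pages_to_bytes(nr_pages, page_size));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With 64K pages, a fixed 8M limit is only 128 pages, while the same 8M is
2048 pages with 4K pages; that 16x difference is why the 8M limit is so
much tighter on this machine. Sizing the limit as N * base_page_size()
keeps the page budget the same across architectures.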
--
Regards,
Li Wang