From: Li Wang <liwang@redhat.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: "Yosry Ahmed" <yosryahmed@google.com>,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, "Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Tejun Heo" <tj@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>
Subject: Re: [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap
Date: Fri, 13 Mar 2026 10:59:31 +0800
Message-ID: <abN9k5A8rJaA8mkR@redhat.com>
In-Reply-To: <CAKEwX=O69LepTsB1tJO1otDziyE4Oi3PAW=jShiTxQ=Hg7dB-Q@mail.gmail.com>
On Thu, Mar 12, 2026 at 10:09:10AM -0700, Nhat Pham wrote:
> On Wed, Mar 11, 2026 at 9:01 PM Li Wang <liwang@redhat.com> wrote:
> >
> > On Wed, Mar 11, 2026 at 11:50:05AM -0700, Yosry Ahmed wrote:
> > > On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
> > > >
> > > > test_swapin_nozswap can hit OOM before reaching its assertions on some
> > > > setups. The test currently sets memory.max=8M and then allocates/reads
> > > > 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> > > > the workload process.
> > > >
> > > > Raise memory.max to 24M so the workload can make forward progress, and
> > > > lower the swap_peak expectation from 24M to 8M to keep the check robust
> > > > across environments.
> > > >
> > > > The test intent is unchanged: verify that swapping happens while zswap
> > > > remains unused when memory.zswap.max=0.
> > > >
> > > > === Error Logs ===
> > > >
> > > > # ./test_zswap
> > > > TAP version 13
> > > > 1..7
> > > > ok 1 test_zswap_usage
> > > > not ok 2 test_swapin_nozswap
> > > > ...
> > > >
> > > > # dmesg
> > > > [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> > > > [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
> > > > [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
> > > > [271641.879173] Call Trace:
> > > > [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
> > > > [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
> > > > [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
> > > > [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
> > > > [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
> > > > [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
> > > > [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
> > > > [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
> > > > [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
> > > > [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
> > > > [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
> > > > [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
> > > > [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
> > > > [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
> > > > [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
> > > > [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
> > > > [271641.879548] Tasks state (memory values in pages):
> > > > ...
> > > > [271641.879550] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > > > [271641.879555] [ 177372] 0 177372 571 0 0 0 0 51200 96 0 test_zswap
> > > > [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
> > > > [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0
> > >
> > > Why are we getting an OOM kill when there's a swap device? Is the
> > > device slow / not keeping up with reclaim pace?
> >
> > This is a good question. The OOM is triggered very likely because memcg
> > reclaim can't make forward progress fast enough within the retry budget
> > of try_charge_memcg.
> >
> > Looking at the OOM info, the system has 64K pages, so memory.max=8M gives
> > only 128 pages. At OOM time, RSS is 0 and swapents is only 96. Swap space
> > itself isn't full, the charge path simply gave up trying to reclaim.
> >
> > The core issue, I guess, is that with memory.zswap.max=0, every page
> > reclaimed must go through the real block device. The charge path works
> > like this: a page fault fires, charge_memcg tries to charge 64K to the
> > cgroup, the cgroup is at its limit, so try_charge_memcg attempts direct
> > reclaim to free space. If the swap device can't drain pages fast enough,
> > the reclaim attempts within the retry loop fail to bring usage below
> > memory.max, and the kernel invokes OOM, even though swap space is
> > technically available.
> >
> > Raising memory.max to 24M gives reclaim a much larger pool to work with,
> > so it can absorb I/O latency without exhausting its retry budget.
>
> Hmmm, perhaps we should change all these constants to multiples of
> base page size of a system?
Yeah, that may be better. Let me try it in the next version.
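As a rough illustration of the arithmetic quoted above (memory.max=8M on a 64K-page system leaves only 128 pages) and of what page-size-relative constants could look like, here is a minimal sketch. It is not the actual patch: the helper names pages_in_limit(), limit_from_pages() and runtime_page_size() are hypothetical and do not exist in the selftest tree, and the page counts a real test would use are left open. Only sysconf(_SC_PAGESIZE) is a standard call.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: how many whole pages fit under a byte limit.
 * With limit_bytes = 8M and page_size = 64K this yields 128.
 */
static size_t pages_in_limit(size_t limit_bytes, size_t page_size)
{
	assert(page_size > 0);
	return limit_bytes / page_size;
}

/*
 * Hypothetical helper: express a limit as a number of base pages, so
 * the cgroup keeps the same page count on 4K and 64K systems.
 */
static size_t limit_from_pages(size_t nr_pages, size_t page_size)
{
	return nr_pages * page_size;
}

/* Runtime base page size as reported by the C library. */
static size_t runtime_page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	assert(ps > 0);
	return (size_t)ps;
}
```

For example, limit_from_pages(2048, runtime_page_size()) gives 8M on a 4K-page system and 128M on a 64K-page system, so reclaim has the same number of pages to work with in both cases. Whether 2048 is a sensible count for this test is an assumption, not something settled in this thread.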
--
Regards,
Li Wang