From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Yosry Ahmed <yosry.ahmed@linux.dev>, Zi Yan <ziy@nvidia.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Usama Arif <usama.arif@linux.dev>,
Kiryl Shutsemau <kas@kernel.org>,
Dave Chinner <david@fromorbit.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org,
Alexander Egorenkov <egorenar@linux.ibm.com>
Subject: Re: [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru - [s390] panic in __memcg_list_lru_alloc
Date: Mon, 30 Mar 2026 18:37:01 +0200
Message-ID: <4d3f8d79-3593-47df-9de8-f94f7f09a403@linux.ibm.com>
In-Reply-To: <20260318200352.1039011-8-hannes@cmpxchg.org>
On 18-Mar-26 20:53, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion.
> The queue is per-NUMA node or per-cgroup, not the intersection. That
> means on a cgrouped system, a node-restricted allocation entering
> reclaim can end up splitting large pages on other nodes:
>
>     alloc/unmap
>       deferred_split_folio()
>         list_add_tail(memcg->split_queue)
>         set_shrinker_bit(memcg, node, deferred_shrinker_id)
>
>     for_each_zone_zonelist_nodemask(restricted_nodes)
>       mem_cgroup_iter()
>         shrink_slab(node, memcg)
>           shrink_slab_memcg(node, memcg)
>             if test_shrinker_bit(memcg, node, deferred_shrinker_id)
>               deferred_split_scan()
>                 walks memcg->split_queue
>
> The shrinker bit adds an imperfect guard rail. As soon as the
> cgroup has a single large page on the node of interest, all large
> pages owned by that memcg, including those on other nodes, will be
> split.
>
> list_lru properly sets up per-node, per-cgroup lists. As a bonus,
> it streamlines a lot of the list operations and reclaim walks. It's
> used widely by other major shrinkers already. Convert the deferred
> split queue as well.
>
> The list_lru per-memcg heads are instantiated on demand when the
> first object of interest is allocated for a cgroup, by calling
> folio_memcg_list_lru_alloc(). Add calls to where splittable pages
> are created: anon faults, swapin faults, khugepaged collapse.
>
> These calls create all possible node heads for the cgroup at once,
> so the migration code (between nodes) doesn't need any special care.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/huge_mm.h    |   6 +-
>  include/linux/memcontrol.h |   4 -
>  include/linux/mmzone.h     |  12 --
>  mm/huge_memory.c           | 342 ++++++++++++-------------------------
>  mm/internal.h              |   2 +-
>  mm/khugepaged.c            |   7 +
>  mm/memcontrol.c            |  12 +-
>  mm/memory.c                |  52 +++---
>  mm/mm_init.c               |  15 --
>  9 files changed, 151 insertions(+), 301 deletions(-)
>
Hi,
with this series in linux-next (since next-20260324), I see a reproducible panic on s390 in the
dump kernel when running an NVMe standalone dump (ngdump).
This happens only in the 'capture kernel'; a normal boot of the same kernel works fine.
[ 14.350676] Unable to handle kernel pointer dereference in virtual kernel address space
[ 14.350682] Failing address: 4000000000000000 TEID: 4000000000000803 ESOP-2 FSI
[ 14.350686] Fault in home space mode while using kernel ASCE.
[ 14.350689] AS:0000000002798007 R3:000000002d2c4007 S:000000002d2c3001 P:000000000000013d
[ 14.350730] Oops: 0038 ilc:3 [#1]SMP
[ 14.350735] Modules linked in: dm_service_time zfcp scsi_transport_fc uvdevice diag288_wdt nvme prng aes_s390 nvme_core des_s390 libdes zcrypt_cex4 dm_mirror dm_region_hash dm_log scsi_dh_rdac scsi_dh_emc scsi_dh_alua paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt rng_core pkey_pckmo pkey dm_multipath autofs4
[ 14.350760] CPU: 0 UID: 0 PID: 32 Comm: khugepaged Not tainted 7.0.0-rc5-next-20260324
[ 14.350762] Hardware name: IBM 3931 A01 704 (LPAR)
[ 14.350764] Krnl PSW : 0704d00180000000 000003ffe0443a82 (__memcg_list_lru_alloc+0x52/0x1d0)
[ 14.350774] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[ 14.350776] Krnl GPRS: 0000000000000402 00000000000bece0 0000000000000000 000003ffe1c17928
[ 14.350778] 00000000001c24ca 0000000000000000 0000000000000000 000003ffe1c17948
[ 14.350780] 0000000000000000 00000000000824c0 0000037200098000 4000000000000000
[ 14.350782] 0000000000782400 0000000000000001 0000037fe00f39b8 0000037fe00f3918
[ 14.350788] Krnl Code: 000003ffe0443a72: a7690000 lghi %r6,0
[ 14.350788] 000003ffe0443a76: e380f0a00004 lg %r8,160(%r15)
[ 14.350788] *000003ffe0443a7c: e3b080b80004 lg %r11,184(%r8)
[ 14.350788] >000003ffe0443a82: e330b9400012 lt %r3,2368(%r11)
[ 14.350788] 000003ffe0443a88: a7a40065 brc 10,000003ffe0443b52
[ 14.350788] 000003ffe0443a8c: e3b0f0a00004 lg %r11,160(%r15)
[ 14.350788] 000003ffe0443a92: ec68006f007c cgij %r6,0,8,000003ffe0443b70
[ 14.350788] 000003ffe0443a98: e300b9400014 lgf %r0,2368(%r11)
[ 14.350825] Call Trace:
[ 14.350826] [<000003ffe0443a82>] __memcg_list_lru_alloc+0x52/0x1d0
[ 14.350831] [<000003ffe044529a>] folio_memcg_list_lru_alloc+0xba/0x150
[ 14.350834] [<000003ffe04f279a>] alloc_charge_folio+0x18a/0x250
[ 14.350839] [<000003ffe04f34dc>] collapse_huge_page+0x8c/0x890
[ 14.350841] [<000003ffe04f4222>] collapse_scan_pmd+0x542/0x690
[ 14.350844] [<000003ffe04f65b4>] collapse_single_pmd+0x144/0x240
[ 14.350847] [<000003ffe04f69ce>] collapse_scan_mm_slot.constprop.0+0x31e/0x480
[ 14.350849] [<000003ffe04f6d3c>] khugepaged+0x20c/0x210
[ 14.350852] [<000003ffe019b0a8>] kthread+0x148/0x170
[ 14.350856] [<000003ffe0119fec>] __ret_from_fork+0x3c/0x240
[ 14.350860] [<000003ffe0ffa4b2>] ret_from_fork+0xa/0x30
[ 14.350865] Last Breaking-Event-Address:
[ 14.350865] [<000003ffe0445294>] folio_memcg_list_lru_alloc+0xb4/0x150
[ 14.350870] Kernel panic - not syncing: Fatal exception: panic_on_oops
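As a reading aid for the oops above, here is a minimal userspace sketch (not kernel code) of the address arithmetic behind the faulting instruction. It assumes that the GPR dump lists r0 through r15 in row order, so that r11 is 4000000000000000, that pages are 4 KiB, and that the reported failing address is the page base of the operand. The helper names operand_address(), page_base() and oops_address_is_consistent() are made up for illustration; the constants are copied from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_4K UINT64_C(4096)

/* Address of a base+displacement operand such as 2368(%r11). */
uint64_t operand_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Round an address down to the start of its page. */
uint64_t page_base(uint64_t addr)
{
	return addr & ~(PAGE_SIZE_4K - 1);
}

/*
 * Check the values from the log: "lt %r3,2368(%r11)" with
 * r11 = 4000000000000000 against "Failing address: 4000000000000000".
 * Returns 1 if the operand's page base equals the reported address.
 */
int oops_address_is_consistent(void)
{
	const uint64_t r11 = UINT64_C(0x4000000000000000);
	const uint64_t disp = 2368; /* 0x940 */
	const uint64_t reported = UINT64_C(0x4000000000000000);

	return page_base(operand_address(r11, disp)) == reported;
}
```

Under these assumptions the operand address is 4000000000000940, whose page base matches the reported failing address. In other words, the value loaded into r11 by the preceding "lg %r11,184(%r8)" does not look like a valid kernel pointer. This only restates what the dump shows; it does not identify which structure the offsets 184 and 2368 belong to.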
Environment:
Arch: s390x (IBM LPAR)
Kernel: next-20260324
Config: (can provide if needed)
Reproducible: always
Steps to Reproduce:
Install ngdump to an NVMe device partition via 'zipl -d' and initiate a dump (same issue with DASD ldipl-dump).
I bisected the panic to this specific commit:
good: 230bbdc110b3 ("mm: list_lru: introduce folio_memcg_list_lru_alloc()")
bad: b0f512f6e36c ("mm: switch deferred split shrinker to list_lru")
Reverting it on top of linux-next (next-20260327) restores normal standalone dump operation.
Let me know if I can provide any other data.
Thanks,
Mikhail