Re: page type is 0, migratetype passed is 2 (nr=256)


From: "Marc Hartmayer" <mhartmay@linux.ibm.com>
To: Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	linux-mm@kvack.org, linux-s390@vger.kernel.org,
	Heiko Carstens <hca@linux.ibm.com>
Subject: Re: page type is 0, migratetype passed is 2 (nr=256)
Date: Tue, 20 May 2025 12:23:42 +0200
Message-ID: <87zff7r369.fsf@linux.ibm.com>
In-Reply-To: <20250512171429.GB615800@cmpxchg.org>

On Mon, May 12, 2025 at 01:14 PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Mon, May 12, 2025 at 12:35:39PM -0400, Zi Yan wrote:
>> On 12 May 2025, at 12:16, Lorenzo Stoakes wrote:
>> 
>> > +cc Zi
>> >
>> > Hi Marc,
>> >
>> > I noticed this same bug as reported in [0], but only for a _very_ recent
>> > patch series by Zi, which is only present in mm-new, which is the most
>> > unstable mm branch right now :)
>> >
>> > So I wonder if related or a coincidence caused by something else?
>> 
>> Unless Marc's branch has my "make MIGRATE_ISOLATE a standalone bit" patchset,
>> it should be caused by something else.
>> 
>> A bisect would be very helpful.
>> 
>> >
>> > This is triggered by the mm self-test (in tools/testing/selftests/mm, you
>> > can just make -jXX there) transhuge-stress, invoked as:
>> >
>> > $ sudo ./transhuge-stress -d 20
>> >
>> > The stack traces do look very different, though, so perhaps unrelated?
>> 
>> In both cases, the warning is triggered when a pageblock with
>> MIGRATE_UNMOVABLE(0) is moved to MIGRATE_RECLAIMABLE(2), although the
>> pageblock is supposed to have MIGRATE_RECLAIMABLE(2) before the move.
>
> The weird thing is that the warning is from expand(), when the broken
> up chunks are put *back*. Marc, can you confirm that this is the only
> warning in dmesg, and there aren't any before this one?

Yep, I’ve just checked: it was the first warning, and `panic_on_warn` is
set to 1.
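To make the mismatch concrete, here is a simplified userspace model of the consistency check that the warning text describes: while expand() splits a larger free block and puts the unused halves back, each half is expected to sit in a pageblock whose migratetype matches the one passed in. This is a sketch under stated assumptions, not the kernel code: the names model_expand() and model_pageblock_type() are hypothetical, guard pages and free-list handling are left out, and MODEL_PAGEBLOCK_ORDER = 8 is assumed (with 4 KiB pages, nr=256 is 1 << 8 pages, i.e. a 1 MiB chunk). The migratetype numbering (UNMOVABLE=0, MOVABLE=1, RECLAIMABLE=2) matches the values quoted above.

```c
#include <stdio.h>

/* Migratetype numbering as used in the warning messages above. */
enum migratetype { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE };

/* Assumption for this model: 256-page (order-8) pageblocks. */
#define MODEL_PAGEBLOCK_ORDER 8

/* Hypothetical per-pageblock migratetype table, indexed by pfn >> order. */
static int model_pageblock_type(const int *blocks, unsigned long pfn)
{
    return blocks[pfn >> MODEL_PAGEBLOCK_ORDER];
}

/*
 * Hypothetical model of expand(): split the free block starting at @pfn
 * from order @high down to order @low. At each step the upper half is
 * returned to the free list of @mt; before that, check that the half's
 * pageblock really has type @mt. Returns the number of mismatches.
 */
static int model_expand(const int *blocks, unsigned long pfn,
                        unsigned int low, unsigned int high, int mt)
{
    unsigned long size = 1UL << high;
    int mismatches = 0;

    while (high > low) {
        high--;
        size >>= 1;
        int type = model_pageblock_type(blocks, pfn + size);
        if (type != mt) {
            /* Same wording as the warning in the report. */
            printf("page type is %d, passed migratetype is %d (nr=%lu)\n",
                   type, mt, size);
            mismatches++;
        }
    }
    return mismatches;
}
```

For example, with a pageblock table of {MOVABLE, UNMOVABLE, MOVABLE, MOVABLE}, calling model_expand(blocks, 0, 8, 10, MIGRATE_MOVABLE) returns the order-9 half cleanly, then flags the order-8 half, printing `page type is 0, passed migratetype is 1 (nr=256)` and returning 1. This only shows one way such a mismatch can arise in the model; it is not a root-cause analysis.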

I managed to reproduce a similar crash using 6.15.0-rc7 (this time THP
seems to be involved):

  …
  root@qemus390x:~# [   40.442403] ------------[ cut here ]------------
  [   40.442471] page type is 0, passed migratetype is 1 (nr=256)
  [   40.442525] WARNING: CPU: 0 PID: 350 at mm/page_alloc.c:669 expand (mm/page_alloc.c:669 (discriminator 2) mm/page_alloc.c:1572 (discriminator 2))
  [   40.442558] Modules linked in: pkey_pckmo(E) pkey(E) diag288_wdt(E) watchdog(E) s390_trng(E) virtio_console(E) rng_core(E) vmw_vsock_virtio_transport(E) vmw_vsock_virtio_transport_common(E) vsock(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) nfnetlink(E) autofs4(E)
  [   40.442651] Unloaded tainted modules: hmac_s390(E):1
  [   40.442677] CPU: 0 UID: 0 PID: 350 Comm: mempig_verify Tainted: G            E       6.15.0-rc7-11557-ga01c92c55b53 #1 PREEMPT
  [   40.442683] Tainted: [E]=UNSIGNED_MODULE
  [   40.442687] Hardware name: IBM 3931 A01 701 (KVM/Linux)
  [   40.442692] Krnl PSW : 0404d00180000000 000002ff929af40c expand (mm/page_alloc.c:669 (discriminator 10) mm/page_alloc.c:1572 (discriminator 10))
  [   40.442696]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
  [   40.442699] Krnl GPRS: 000002ff80000004 0000000000000005 0000000000000030 0000000000000000
  [   40.442701]            0000000000000005 0000027f80000005 0000000000000100 0000000000000008
  [   40.442703]            000002ff93f99290 000001f63a415900 0000027500000008 00000275829f4000
  [   40.442704]            0000000000000000 0000000000000008 000002ff929af408 0000027f928c36f8
  [   40.442722] Krnl Code: 000002ff929af3fc: c02000883f4b        larl    %r2,000002ff93ab7292

  Code starting with the faulting instruction
  ===========================================
  [   40.442722]            000002ff929af402: c0e5ffe7bd17        brasl   %r14,000002ff926a6e30
  [   40.442722]           #000002ff929af408: af000000            mc      0,0
  [   40.442722]           >000002ff929af40c: a7f4ff49            brc     15,000002ff929af29e
  [   40.442722]            000002ff929af410: b904002b            lgr     %r2,%r11
  [   40.442722]            000002ff929af414: c03000881980        larl    %r3,000002ff93ab2714
  [   40.442722]            000002ff929af41a: c0e5fffdd883        brasl   %r14,000002ff9296a520
  [   40.442722]            000002ff929af420: af000000            mc      0,0
  [   40.442736] Call Trace:
  [   40.442738] expand (mm/page_alloc.c:669 (discriminator 10) mm/page_alloc.c:1572 (discriminator 10))
  [   40.442741] expand (mm/page_alloc.c:669 (discriminator 2) mm/page_alloc.c:1572 (discriminator 2))
  [   40.442743] rmqueue_bulk (mm/page_alloc.c:1587 mm/page_alloc.c:1758 mm/page_alloc.c:2311 mm/page_alloc.c:2364)
  [   40.442745] __rmqueue_pcplist (mm/page_alloc.c:3086)
  [   40.442748] rmqueue.isra.0 (mm/page_alloc.c:3124 mm/page_alloc.c:3155)
  [   40.442751] get_page_from_freelist (mm/page_alloc.c:3683)
  [   40.442754] __alloc_frozen_pages_noprof (mm/page_alloc.c:4967 (discriminator 1))
  [   40.442756] alloc_pages_mpol (mm/mempolicy.c:2290)
  [   40.442764] folio_alloc_mpol_noprof (mm/mempolicy.c:2322)
  [   40.442766] vma_alloc_folio_noprof (mm/mempolicy.c:2355 (discriminator 1))
  [   40.442769] vma_alloc_anon_folio_pmd (mm/huge_memory.c:1167 (discriminator 1))
  [   40.442773] __do_huge_pmd_anonymous_page (mm/huge_memory.c:1227 (discriminator 1))
  [   40.442775] __handle_mm_fault (mm/memory.c:5862 mm/memory.c:6111)
  [   40.442781] handle_mm_fault (mm/memory.c:6321)
  [   40.442783] do_exception (arch/s390/mm/fault.c:298)
  [   40.442792] __do_pgm_check (arch/s390/kernel/traps.c:345)
  [   40.442802] pgm_check_handler (arch/s390/kernel/entry.S:334)
  [   40.442805] Last Breaking-Event-Address:
  [   40.442806] __warn_printk (kernel/panic.c:801)
  [   40.442818] Kernel panic - not syncing: kernel: panic_on_warn set ...
  [   40.442822] CPU: 0 UID: 0 PID: 350 Comm: mempig_verify Tainted: G            E       6.15.0-rc7-11557-ga01c92c55b53 #1 PREEMPT
  [   40.442825] Tainted: [E]=UNSIGNED_MODULE
  [   40.442826] Hardware name: IBM 3931 A01 701 (KVM/Linux)
  [   40.442827] Call Trace:
  [   40.442828] dump_stack_lvl (lib/dump_stack.c:122)
  [   40.442831] panic (kernel/panic.c:372)
  [   40.442833] check_panic_on_warn (kernel/panic.c:247)
  [   40.442836] __warn (kernel/panic.c:751)
  [   40.443057] report_bug (lib/bug.c:176 lib/bug.c:215)
  [   40.443064] monitor_event_exception (arch/s390/kernel/traps.c:227 (discriminator 1))
  [   40.443067] __do_pgm_check (arch/s390/kernel/traps.c:345)
  [   40.443071] pgm_check_handler (arch/s390/kernel/entry.S:334)
  [   40.443074] expand (mm/page_alloc.c:669 (discriminator 10) mm/page_alloc.c:1572 (discriminator 10))
  [   40.443077] expand (mm/page_alloc.c:669 (discriminator 2) mm/page_alloc.c:1572 (discriminator 2))
  [   40.443080] rmqueue_bulk (mm/page_alloc.c:1587 mm/page_alloc.c:1758 mm/page_alloc.c:2311 mm/page_alloc.c:2364)
  [   40.443087] __rmqueue_pcplist (mm/page_alloc.c:3086)
  [   40.443090] rmqueue.isra.0 (mm/page_alloc.c:3124 mm/page_alloc.c:3155)
  [   40.443093] get_page_from_freelist (mm/page_alloc.c:3683)
  [   40.443097] __alloc_frozen_pages_noprof (mm/page_alloc.c:4967 (discriminator 1))
  [   40.443100] alloc_pages_mpol (mm/mempolicy.c:2290)
  [   40.443104] folio_alloc_mpol_noprof (mm/mempolicy.c:2322)
  [   40.443110] vma_alloc_folio_noprof (mm/mempolicy.c:2355 (discriminator 1))
  [   40.443114] vma_alloc_anon_folio_pmd (mm/huge_memory.c:1167 (discriminator 1))
  [   40.443117] __do_huge_pmd_anonymous_page (mm/huge_memory.c:1227 (discriminator 1))
  [   40.443120] __handle_mm_fault (mm/memory.c:5862 mm/memory.c:6111)
  [   40.443123] handle_mm_fault (mm/memory.c:6321)
  [   40.443126] do_exception (arch/s390/mm/fault.c:298)
  [   40.443129] __do_pgm_check (arch/s390/kernel/traps.c:345)
  [   40.443132] pgm_check_handler (arch/s390/kernel/entry.S:334)

This time, the setup is even simpler:

1. Start a 2 GB QEMU/KVM guest.
2. Run a memory stress test in the guest.

I run this test in a loop (starting and shutting down the VM each time),
and the bug occurs after many iterations.
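The report does not say which stress test was used, so the following is only a hypothetical stand-in, not the reporter's test. Since the trace goes through __do_huge_pmd_anonymous_page(), a minimal stressor could repeatedly map anonymous memory, request THP backing with madvise(MADV_HUGEPAGE), fault it in, and unmap it. The function name thp_stress() and the 2 MiB alignment THP_ALIGN are assumptions: 2 MiB covers x86-64 PMDs and is a multiple of the 1 MiB s390 segment size. Whether huge pages are actually used depends on the guest's THP settings.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed alignment so that whole PMD-sized ranges fit in the buffer. */
#define THP_ALIGN ((size_t)2 << 20)

/*
 * Hypothetical stressor: for each iteration, map @len bytes of anonymous
 * memory, ask for THP backing, write every byte, check the first and last
 * byte, then unmap. Returns 0 on success, -1 on error.
 */
int thp_stress(unsigned int iterations, size_t len)
{
    if (len == 0)
        return -1;

    for (unsigned int i = 0; i < iterations; i++) {
        /* Over-allocate so an aligned start address can be chosen. */
        size_t map_len = len + THP_ALIGN;
        unsigned char *raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (raw == MAP_FAILED)
            return -1;

        /* Round the start up to THP_ALIGN; buf + len stays inside the map. */
        unsigned char *buf = (unsigned char *)(((uintptr_t)raw + THP_ALIGN - 1)
                                               & ~(uintptr_t)(THP_ALIGN - 1));

#ifdef MADV_HUGEPAGE
        /* Best effort: fails with EINVAL if THP is not available. */
        (void)madvise(buf, len, MADV_HUGEPAGE);
#endif
        /* Touch every byte so the whole range is faulted in. */
        unsigned char pattern = (unsigned char)(i & 0xff);
        memset(buf, pattern, len);

        int ok = buf[0] == pattern && buf[len - 1] == pattern;
        if (munmap(raw, map_len) != 0 || !ok)
            return -1;
    }
    return 0;
}
```

A loop such as thp_stress(1000, (size_t)256 << 20) inside the guest would keep the page allocator splitting and freeing large blocks; the iteration count and size here are arbitrary choices, not values from the report.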

[…snip…]


Thread overview: 10+ messages
2025-05-12 14:18 page type is 0, migratetype passed is 2 (nr=256) Marc Hartmayer
2025-05-12 16:16 ` Lorenzo Stoakes
2025-05-12 16:35   ` Zi Yan
2025-05-12 17:14     ` Johannes Weiner
2025-05-20 10:23       ` Marc Hartmayer [this message]
2025-06-12  9:05         ` Alexander Gordeev
2025-06-14  8:24           ` Johannes Weiner
2025-08-05 12:02             ` Alexander Gordeev
2025-11-10 14:39       ` Alexander Gordeev
2025-05-13  8:30   ` Marc Hartmayer
