From: Mike Rapoport <rppt@kernel.org>
To: Changyuan Lyu <changyuanl@google.com>
Cc: akpm@linux-foundation.org, bhe@redhat.com, chrisl@kernel.org,
graf@amazon.com, jasonmiu@google.com, kexec@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
pasha.tatashin@soleen.com
Subject: Re: [PATCH 1/2] memblock: show a warning if allocation in KHO scratch fails
Date: Wed, 21 May 2025 10:43:15 +0300
Message-ID: <aC2EE1pg9ktQdstI@kernel.org>
In-Reply-To: <20250521070310.2478491-1-changyuanl@google.com>
Hi Changyuan,
On Wed, May 21, 2025 at 12:03:10AM -0700, Changyuan Lyu wrote:
> Hi Mike,
>
> On Sun, May 18, 2025 at 19:07:02 +0300, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > This can be reproduced without KHO, just squeeze the RAM size, boot with a huge
> > kernel and initrd and you'll get the same panic.
> >
> > The issue is that sparse_init_nid() does not treat allocation failures as
> > fatal and just continues with some sections being unpopulated and then
> > subsection_map_init() presumes all the sections are valid.
> >
> > This should be fixed in mm/sparse.c regardless of KHO, maybe as simple as
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 3c012cf83cc2..64d071f9f037 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -197,6 +197,10 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> >  		pfns = min(nr_pages, PAGES_PER_SECTION
> >  				- (pfn & ~PAGE_SECTION_MASK));
> >  		ms = __nr_to_section(nr);
> > +
> > +		if (!ms->section_mem_map)
> > +			continue;
> > +
> >  		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
> >  
> >  		pr_debug("%s: sec: %lu pfns: %lu set(%d, %d)\n", __func__, nr,
>
> I tried your patch and the kernel log now looks like
>
> [ 0.027562] Faking a node at [mem 0x0000000000000000-0x000000057fffffff]
> [ 0.028338] NODE_DATA(0) allocated [mem 0x562bd5a00-0x562bfffff]
> [ 0.029201] Could not allocate 0x0000000014000000 bytes in KHO scratch
> [ 0.030229] sparse_init_nid: node[0] memory map backing failed. Some memory will not be available.
> [ 0.030232] Zone ranges:
> [ 0.031539] DMA [mem 0x0000000000001000-0x0000000000ffffff]
> [ 0.032242] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
> [ 0.032952] Normal [mem 0x0000000100000000-0x000000057fffffff]
> [ 0.033658] Device empty
> [ 0.033987] Movable zone start for each node
> [ 0.034473] Early memory node ranges
> [ 0.034878] node 0: [mem 0x0000000000001000-0x000000000007ffff]
> [ 0.035591] node 0: [mem 0x0000000000100000-0x00000000773fffff]
> [ 0.036308] node 0: [mem 0x0000000077400000-0x00000000775fffff]
> [ 0.037030] node 0: [mem 0x0000000077600000-0x000000007fffffff]
> [ 0.037750] node 0: [mem 0x0000000100000000-0x000000054abfffff]
> [ 0.038463] node 0: [mem 0x000000054ac00000-0x0000000562bfffff]
> [ 0.039180] node 0: [mem 0x0000000562c00000-0x000000057fffffff]
> [ 0.039901] Initmem setup node 0 [mem 0x0000000000001000-0x000000057fffffff]
> [ 0.040707] On node 0, zone DMA: 1 pages in unavailable ranges
> [ 0.041401] On node 0, zone DMA: 128 pages in unavailable ranges
> [ 0.221829] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [ 0.222675] #PF: supervisor read access in kernel mode
> [ 0.223271] #PF: error_code(0x0000) - not-present page
> [ 0.223859] PGD 0 P4D 0
> [ 0.224152] Oops: Oops: 0000 [#1] SMP
> [ 0.224575] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc4+ #279 NONE
> [ 0.225439] RIP: 0010:set_pageblock_migratetype+0x97/0xd0
> [ 0.226069] Code: b6 c9 c1 e1 04 48 01 c8 eb 02 31 c0 48 8b 70 08 89 f9 c1 e9 07 c1 ef 0d 83 e7 03 80 e1 3c 41 b8 07 00 00 00 49 d3 e0 48 d3 e2 <48> 8b 44 fe 18 49 f7 d0 48 89 c1 4c 21 c1 48 09 d1 f0 48 0f b1 4c
> [ 0.228231] RSP: 0000:ffffffffa4203d58 EFLAGS: 00010046
> [ 0.228834] RAX: ffff8e4722bd13b0 RBX: 0000000000000000 RCX: 0000000000009b00
> [ 0.229664] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
> [ 0.230487] RBP: ffffffffa4203d58 R08: 0000000000000007 R09: 0000000000000000
> [ 0.231303] R10: ffffffffa4edc610 R11: 000000000000000c R12: 000000000054ac00
> [ 0.232119] R13: 0017fff000000000 R14: 0000000000000002 R15: 00000000004d8000
> [ 0.232937] FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
> [ 0.233868] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.234529] CR2: 0000000000000018 CR3: 000000055923e000 CR4: 00000000000200b0
> [ 0.235351] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.236171] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 0.236990] Call Trace:
> [ 0.237272] <TASK>
> [ 0.237514] memmap_init_range+0x1d8/0x210
> [ 0.237983] memmap_init_zone_range+0x7f/0xb0
> [ 0.238488] memmap_init+0x9a/0x120
> [ 0.238892] free_area_init+0x369/0x3d0
> [ 0.239331] zone_sizes_init+0x5e/0x80
> [ 0.239765] paging_init+0x27/0x30
> [ 0.240153] setup_arch+0x307/0x3e0
> [ 0.240556] start_kernel+0x59/0x390
> [ 0.240968] x86_64_start_reservations+0x28/0x30
> [ 0.241493] x86_64_start_kernel+0x70/0x80
> [ 0.241962] common_startup_64+0x13b/0x140
> [ 0.242433] </TASK>
> [ 0.242682] CR2: 0000000000000018
> [ 0.243064] ---[ end trace 0000000000000000 ]---
>
> It seems we are just deferring the panic from subsection_map_init() to
> memmap_init(). To me it is still not obvious that the failure was
> caused by a small KHO scratch.
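One way to read the oops above: the faulting address 0x18 is what a load of ms->usage->pageblock_flags[0] would touch if ms->usage were NULL. The sketch below is a user-space model, not kernel code. The model_* struct layouts are assumptions patterned on struct mem_section and struct mem_section_usage for a 64-bit build with CONFIG_SPARSEMEM_VMEMMAP, and pageblock_word_addr() is a hypothetical helper introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct rcu_head (two pointers). */
struct model_rcu_head {
	void *next;
	void (*func)(void *);
};

/*
 * Assumed layout, modelled on struct mem_section_usage with
 * CONFIG_SPARSEMEM_VMEMMAP; not copied from the tree under discussion.
 */
struct model_section_usage {
	struct model_rcu_head rcu;
	unsigned long subsection_map[1];	/* one 64-bit bitmap word */
	unsigned long pageblock_flags[];	/* flexible array of flag words */
};

/* Assumed layout: the usage pointer sits at offset 8 of the section. */
struct model_section {
	unsigned long section_mem_map;
	struct model_section_usage *usage;
};

/*
 * Compute the address that reading usage->pageblock_flags[word] would
 * touch. Nothing is dereferenced, so a NULL usage pointer is safe here.
 */
static uintptr_t pageblock_word_addr(const struct model_section *ms, size_t word)
{
	return (uintptr_t)ms->usage
		+ offsetof(struct model_section_usage, pageblock_flags)
		+ word * sizeof(unsigned long);
}
```

Under these assumptions a section whose usage pointer was never allocated gives 0x18 for word 0, which is consistent with CR2 in the trace, with RSI and RDI both being zero, and with the faulting instruction using a 0x18 displacement.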
A small KHO scratch only exposes an existing inconsistency: on one hand,
sparse_init_nid() does not treat an OOM condition as fatal and tries to
continue with a hardly noticeable error message; on the other hand, later
code presumes that all section data was properly allocated and accesses it.
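To put a number on how large the memory map request is, here is a rough user-space calculation for the node ranges in the quoted log. It is only a sketch under stated assumptions: 128 MiB sections (MODEL_SECTION_SHIFT 27), 4 KiB pages and a 64-byte struct page, which are common x86-64 values but are not stated in this thread. The type, macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, not taken from the thread. */
#define MODEL_SECTION_SHIFT	27	/* 128 MiB per section */
#define MODEL_PAGE_SHIFT	12	/* 4 KiB pages */
#define MODEL_STRUCT_PAGE_SIZE	64	/* bytes per struct page */

struct mem_range {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* last byte of the range, inclusive */
};

/* "Early memory node ranges" from the log, adjacent ranges merged. */
static const struct mem_range log_ranges[] = {
	{ 0x0000000000001000ULL, 0x000000000007ffffULL },
	{ 0x0000000000100000ULL, 0x000000007fffffffULL },
	{ 0x0000000100000000ULL, 0x000000057fffffffULL },
};

/* Count sections touched by sorted ranges, counting shared ones once. */
static uint64_t count_sections(const struct mem_range *r, size_t n)
{
	uint64_t count = 0, next = 0;	/* next: lowest section not yet counted */

	for (size_t i = 0; i < n; i++) {
		uint64_t first = r[i].start >> MODEL_SECTION_SHIFT;
		uint64_t last = r[i].end >> MODEL_SECTION_SHIFT;

		if (first < next)
			first = next;
		if (first > last)
			continue;
		count += last - first + 1;
		next = last + 1;
	}
	return count;
}

/* Bytes of struct page needed to describe every page of those sections. */
static uint64_t memmap_bytes(const struct mem_range *r, size_t n)
{
	uint64_t pages_per_section =
		1ULL << (MODEL_SECTION_SHIFT - MODEL_PAGE_SHIFT);

	return count_sections(r, n) * pages_per_section * MODEL_STRUCT_PAGE_SIZE;
}
```

With these assumptions the ranges cover 160 sections and the result is 0x14000000 bytes (320 MiB), the same size as in the "Could not allocate" line. That suggests, but does not prove, that the failed request was the memory map for the whole node.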
> I think the error log in my original patch still makes sense since it
> indicates potential panics early.
That would add another barely noticeable message right where
sparse_init_nid() already reports an error.
I don't see how that is better.
I think we should just make sparse_init_nid() panic or at least change
"sparse_init_nid: node[0] memory map backing failed. Some memory will not be available."
to something more visible and clear.
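The fail-fast behaviour can be sketched in user space as follows. This is not a patch for mm/sparse.c: every model_* name is hypothetical, the allocator is passed in as a function pointer so that an exhausted scratch area can be simulated, and the -1 return value stands in for the point where kernel code would call panic() or print a much louder message.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef void *(*model_alloc_fn)(size_t size);

struct model_section {
	void *memmap;	/* NULL means the backing allocation failed */
};

/* Hypothetical allocator that always fails, like an exhausted scratch area. */
static void *model_alloc_exhausted(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Populate every section or stop at the first failure. Returning -1
 * immediately means no caller can go on to touch a half-initialised
 * section array by accident.
 */
static int model_init_sections_strict(struct model_section *secs, size_t count,
				      size_t map_size, model_alloc_fn alloc)
{
	for (size_t i = 0; i < count; i++) {
		secs[i].memmap = alloc(map_size);
		if (!secs[i].memmap) {
			fprintf(stderr,
				"FATAL: section %zu: cannot allocate %zu bytes for memory map\n",
				i, map_size);
			return -1;
		}
	}
	return 0;
}
```

A caller would check the return value right away, for example `if (model_init_sections_strict(secs, n, size, malloc) < 0) exit(EXIT_FAILURE);`, so the failure is reported where it happens instead of surfacing later as a NULL pointer dereference.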
> Best,
> Changyuan
--
Sincerely yours,
Mike.
Thread overview:
2025-05-18 14:23 [PATCH 0/2] KHO Fixes Changyuan Lyu
2025-05-18 14:23 ` [PATCH 1/2] memblock: show a warning if allocation in KHO scratch fails Changyuan Lyu
2025-05-18 16:07 ` Mike Rapoport
2025-05-21 7:03 ` Changyuan Lyu
2025-05-21 7:43 ` Mike Rapoport [this message]
2025-05-21 8:48 ` Oscar Salvador
2025-05-21 15:27 ` Mike Rapoport
2025-05-18 14:23 ` [PATCH 2/2] KHO: init new_physxa->phys_bits to fix lockdep Changyuan Lyu
2025-05-18 15:51 ` Mike Rapoport
2025-05-19 12:10 ` Pasha Tatashin