From: Mike Rapoport <rppt@kernel.org>
To: Usama Arif <usamaarif642@gmail.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
Changyuan Lyu <changyuanl@google.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
corbet@lwn.net, dave.hansen@linux.intel.com,
devicetree@vger.kernel.org, dwmw2@infradead.org,
ebiederm@xmission.com, graf@amazon.com, hpa@zytor.com,
jgowans@amazon.com, kexec@lists.infradead.org, krzk@kernel.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, luto@kernel.org, mark.rutland@arm.com,
mingo@redhat.com, pasha.tatashin@soleen.com, pbonzini@redhat.com,
peterz@infradead.org, robh@kernel.org, rostedt@goodmis.org,
saravanak@google.com, skinsburskii@linux.microsoft.com,
tglx@linutronix.de, thomas.lendacky@amd.com, will@kernel.org,
x86@kernel.org, Breno Leitao <leitao@debian.org>,
thevlad@meta.com
Subject: Re: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M
Date: Wed, 26 Nov 2025 08:14:38 +0200
Message-ID: <aSaazgjKX8PfFDXf@kernel.org>
In-Reply-To: <ba690e06-c2a1-4d2e-9428-9ca2ea9f2b86@gmail.com>
On Tue, Nov 25, 2025 at 06:47:15PM +0000, Usama Arif wrote:
>
>
> On 25/11/2025 13:50, Mike Rapoport wrote:
> > Hi,
> >
> > On Tue, Nov 25, 2025 at 02:15:34PM +0100, Pratyush Yadav wrote:
> >> On Mon, Nov 24 2025, Usama Arif wrote:
> >
> >>>> --- a/arch/x86/realmode/init.c
> >>>> +++ b/arch/x86/realmode/init.c
> >>>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void)
> >>>> * setup_arch().
> >>>> */
> >>>> memblock_reserve(0, SZ_1M);
> >>>> +
> >>>> + memblock_clear_kho_scratch(0, SZ_1M);
> >>>> }
> >>>>
> >>>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
> >>>
> >>> Hello!
> >>>
> >>> I am working with Breno who reported that we are seeing the below warning at boot
> >>> when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host
> >>> manually but we are seeing this several times a day inside the fleet.
> >>>
> >>> 20:16:33 ------------[ cut here ]------------
> >>> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330
> >>> 20:16:33 Modules linked in:
> >>> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE
> >>> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC
> >>> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330
> >>> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc
> >>> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000
> >>> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002
> >>> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8
> >>> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101
> >>> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00
> >>> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000
> >>> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
> >>> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0
> >>> 20:16:33 Call Trace:
> >>> 20:16:33 <TASK>
> >>> 20:16:33 ? __memblock_reserve+0x75/0x80
> >
> > Do you have faddr2line for this?
> >
> >>> 20:16:33 ? setup_arch+0x30f/0xb10
> >
> > And this?
> >
>
>
> Thanks for this! I think it helped narrow down the problem.
>
> The stack is:
>
> 20:16:33 ? __memblock_reserve (mm/memblock.c:936)
> 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956)
> 20:16:33 ? start_kernel (init/main.c:922)
> 20:16:33 ? x86_64_start_reservations (arch/x86/kernel/ebda.c:57)
> 20:16:33 ? x86_64_start_kernel (arch/x86/kernel/head64.c:231)
> 20:16:33 ? common_startup_64 (arch/x86/kernel/head_64.S:419)
>
> This is 6.16 kernel.
>
> 20:16:33 ? __memblock_reserve (mm/memblock.c:936)
> That's the memblock_add_range() call in memblock_reserve()
>
> 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956)
> That is parse_setup_data -> add_early_ima_buffer -> memblock_reserve_kern
>
>
> I put a simple print like below:
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 680d1b6dfea41..cc97ffc0083c7 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -409,6 +409,7 @@ static void __init add_early_ima_buffer(u64 phys_addr)
> }
>
> if (data->size) {
> + pr_err("PPP %s %s %d data->addr %llx, data->size %llx \n", __FILE__, __func__, __LINE__, data->addr, data->size);
> memblock_reserve_kern(data->addr, data->size);
> ima_kexec_buffer_phys = data->addr;
> ima_kexec_buffer_size = data->size;
>
>
> and I see (without replicating the warning):
>
> [ 0.000000] PPP arch/x86/kernel/setup.c add_early_ima_buffer 412 data->addr 9e000, data->size 1000
> ....
So it looks like, in the cases where the warning reproduces, something
reserves memory overlapping with the IMA buffer before
add_early_ima_buffer() runs.
>
> [ 0.000348] MEMBLOCK configuration:
> [ 0.000348] memory size = 0x0000003fea329ff0 reserved size = 0x00000000050c969b
> [ 0.000350] memory.cnt = 0x5
> [ 0.000351] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x40
> [ 0.000353] memory[0x1] [0x0000000000100000-0x0000000067c65fff], 0x0000000067b66000 bytes flags: 0x0
> [ 0.000355] memory[0x2] [0x000000006d8db000-0x000000006fffffff], 0x0000000002725000 bytes flags: 0x0
> [ 0.000356] memory[0x3] [0x0000000100000000-0x000000407fff8fff], 0x0000003f7fff9000 bytes flags: 0x0
> [ 0.000358] memory[0x4] [0x000000407fffa000-0x000000407fffffff], 0x0000000000006000 bytes flags: 0x0
> [ 0.000359] reserved.cnt = 0x7
>
>
> So MEMBLOCK_RSRV_KERN and MEMBLOCK_KHO_SCRATCH seem to overlap..
It does not matter, they are set on different arrays. RSRV_KERN is set on
regions in memblock.reserved and KHO_SCRATCH is set on regions in
memblock.memory.
So dumping memblock.memory is completely irrelevant, you need to check
memblock.reserved for potential conflicts.
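
To illustrate the point about the two arrays, here is a minimal user-space
sketch, not kernel code. It models memblock.memory and memblock.reserved as
two separate region lists and applies the same condition as the debug check
quoted in this thread (flags != MEMBLOCK_NONE && flags != rgn->flags) to the
reserved list only. All names are hypothetical, the RSRV_KERN value of 0x20 is
an assumption, and the contents of the reserved lists are invented for the
example since the real ones are not in the thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 0x40 matches the "flags: 0x40" dump above; 0x20 for RSRV_KERN is assumed. */
enum {
	FLAG_NONE        = 0x00,
	FLAG_RSRV_KERN   = 0x20,
	FLAG_KHO_SCRATCH = 0x40,
};

struct region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

struct region_array {
	const char *name;
	const struct region *regions;
	size_t cnt;
};

/*
 * Count regions in @arr that overlap [base, base + size) and carry flags
 * different from the flags of the new range.
 */
size_t count_flag_conflicts(const struct region_array *arr, uint64_t base,
			    uint64_t size, unsigned int flags)
{
	uint64_t end = base + size;
	size_t conflicts = 0;

	assert(arr != NULL && size != 0);

	for (size_t i = 0; i < arr->cnt; i++) {
		const struct region *rgn = &arr->regions[i];
		uint64_t rend = rgn->base + rgn->size;

		if (rgn->base >= end || rend <= base)
			continue;	/* no overlap with the new range */

		if (flags != FLAG_NONE && flags != rgn->flags) {
			printf("%s: [%#" PRIx64 "-%#" PRIx64 "] flags %#x, new flags %#x\n",
			       arr->name, rgn->base, rend - 1, rgn->flags, flags);
			conflicts++;
		}
	}
	return conflicts;
}

/* memblock.memory as in the dump above: low memory carries KHO_SCRATCH. */
const struct region memory_regions[] = {
	{ 0x1000,   0x9f000,    FLAG_KHO_SCRATCH },
	{ 0x100000, 0x67b66000, FLAG_NONE },
};

/* Hypothetical memblock.reserved without anything near the IMA buffer. */
const struct region reserved_ok_regions[] = {
	{ 0x100000, 0x1000, FLAG_RSRV_KERN },
};

/* Hypothetical memblock.reserved with an earlier plain reservation. */
const struct region reserved_bad_regions[] = {
	{ 0x9d000, 0x2000, FLAG_NONE },
};

const struct region_array memory       = { "memory",   memory_regions,       2 };
const struct region_array reserved_ok  = { "reserved", reserved_ok_regions,  1 };
const struct region_array reserved_bad = { "reserved", reserved_bad_regions, 1 };
```

Calling count_flag_conflicts(&reserved_bad, 0x9e000, 0x1000, FLAG_RSRV_KERN)
reports one conflict, while the same call on reserved_ok reports none. The
memory list is never passed in for a reservation, which is why its
KHO_SCRATCH flag has no bearing on the check.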
> >>> 20:16:33 ? start_kernel+0x58/0x960
> >>> 20:16:33 ? x86_64_start_reservations+0x20/0x20
> >>> 20:16:33 ? x86_64_start_kernel+0x13d/0x140
> >>> 20:16:33 ? common_startup_64+0x13e/0x140
> >>> 20:16:33 </TASK>
> >>> 20:16:33 ---[ end trace 0000000000000000 ]---
> >>>
> >>>
> >>> Rolling out with memblock=debug is not really an option in a large scale fleet due to the
> >>> time added to boot. But I did try on one of the hosts (without reproducing the issue) and I see:
> >
> > Is it a problem to roll out a kernel that has additional debug printouts as
> > Breno suggested earlier? I.e.
> >
> > if (flags != MEMBLOCK_NONE && flags != rgn->flags) {
> > pr_warn("memblock: Flag mismatch at region [%pa-%pa]\n",
> > &rgn->base, &rend);
> > pr_warn(" Existing region flags: %#x\n", rgn->flags);
> > pr_warn(" New range flags: %#x\n", flags);
> > pr_warn(" New range: [%pa-%pa]\n", &base, &end);
> > WARN_ON_ONCE(1);
> > }
> >
>
> I can add this, but it might be several weeks between me putting this in the
> kernel and that kernel being deployed to enough machines that it starts to show up. I think the IMA coinciding
> with memblock_mark_kho_scratch in e820__memblock_setup could be the reason for the warning. It might be better to
> fix that case and deploy it to see if the warnings still show up?
> I can add these prints as well in case it doesn't fix the problem.
I really don't think that effectively disabling memblock_mark_kho_scratch()
when KHO is disabled will solve the problem because, as I said, the flags it
sets are on a different structure than the flags set by
memblock_reserve_kern().
> > If you have the logs from failing boots up to the point where SLUB reports
> > its initialization, e.g.
> >
> > [ 0.134377] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
> >
> > something there may hint about what's the issue.
>
> So the boot doesn't fail, it's just giving warnings in the fleet.
> I have added the dmesg to the end of the mail.
Thanks, unfortunately nothing jumped out at me there.
> Does something like this look good? I can try deploying this (although it will take some time to find out).
> We can get it upstream as well as that makes backports easier.
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 154f1d73b61f2..257c6f0eee03d 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1119,8 +1119,13 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
> */
> __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
> {
> - return memblock_setclr_flag(&memblock.memory, base, size, 1,
> - MEMBLOCK_KHO_SCRATCH);
> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
> + if (is_kho_boot())
Please use

	if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH))

instead of ifdef.

If you send a formal patch with it, I'll take it.
I'd suggest still deploying the additional debug printouts internally.
> + return memblock_setclr_flag(&memblock.memory, base, size, 1,
> + MEMBLOCK_KHO_SCRATCH);
> +#else
> + return 0;
> +#endif
> }
>
> /**
> @@ -1133,8 +1138,13 @@ __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
> */
> __init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
> {
> - return memblock_setclr_flag(&memblock.memory, base, size, 0,
> - MEMBLOCK_KHO_SCRATCH);
> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
> + if (is_kho_boot())
> + return memblock_setclr_flag(&memblock.memory, base, size, 0,
> + MEMBLOCK_KHO_SCRATCH);
> +#else
If nothing sets the flag, _clear is a nop anyway, but let's update it as well
for symmetry.
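
For illustration only, a user-space sketch of the shape this could take with
IS_ENABLED() instead of the ifdef. It is not the actual patch: the
IS_ENABLED() below is a simplified imitation of the kernel macro (built-in
case only, with an extra dummy argument to stay within standard C), and
kho_boot, low_mem_flags, mark_kho_scratch() and clear_kho_scratch() are
hypothetical stand-ins for is_kho_boot(), the flags of the region below 1M and
the two memblock helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified imitation of the kernel's IS_ENABLED(): 1 if defined to 1, else 0. */
#define ARG_PLACEHOLDER_1 0,
#define take_second_arg(ignored, val, ...) val
#define is_defined_3(arg1_or_junk) take_second_arg(arg1_or_junk 1, 0, 0)
#define is_defined_2(val) is_defined_3(ARG_PLACEHOLDER_##val)
#define is_defined_1(x) is_defined_2(x)
#define IS_ENABLED(option) is_defined_1(option)

/* Assumption: the option is enabled in this demo build. */
#define CONFIG_MEMBLOCK_KHO_SCRATCH 1

#define FLAG_KHO_SCRATCH 0x40u

bool kho_boot;			/* hypothetical state behind is_kho_boot() */
unsigned int low_mem_flags;	/* stand-in for flags of memory below 1M */

bool is_kho_boot(void)
{
	return kho_boot;
}

/* Set the scratch flag only when the option is built in and this is a KHO boot. */
int mark_kho_scratch(void)
{
	if (!IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH) || !is_kho_boot())
		return 0;

	low_mem_flags |= FLAG_KHO_SCRATCH;
	return 0;
}

/* Same guard on the clear side, kept for symmetry. */
int clear_kho_scratch(void)
{
	if (!IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH) || !is_kho_boot())
		return 0;

	low_mem_flags &= ~FLAG_KHO_SCRATCH;
	return 0;
}

/* Sanity check of the macro imitation for a set and an unset option. */
void check_is_enabled(void)
{
	assert(IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH) == 1);
	assert(IS_ENABLED(CONFIG_SOME_UNSET_OPTION) == 0);
}
```

With this form both branches are always seen by the compiler and every path
returns a value, and the call is a nop unless the option is enabled and the
boot is a KHO boot.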
--
Sincerely yours,
Mike.