The Linux Kernel Mailing List
* [PATCH] sched/isolation: Don't free memblock allocated cpumasks
@ 2026-05-05  5:18 Waiman Long
  2026-05-06 13:25 ` Valentin Schneider
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Waiman Long @ 2026-05-05  5:18 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	Mike Rapoport
  Cc: linux-kernel, Waiman Long

When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
freeing reserved memory before memory map is initialized"), the following
warning was hit when there was a "nohz_full" kernel boot parameter.

[    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
[    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
  :
[    0.080945] Call Trace:
[    0.080947]  <TASK>
[    0.080949]  memblock_phys_free+0xcb/0x100
[    0.080953]  housekeeping_init+0x14c/0x170
[    0.080957]  start_kernel+0x207/0x450
[    0.080961]  x86_64_start_reservations+0x24/0x30
[    0.080965]  x86_64_start_kernel+0xda/0xe0
[    0.080967]  common_startup_64+0x13e/0x141
[    0.080972]  </TASK>

The commit states that freeing of reserved memory before the memory
map is fully initialized in deferred_init_memmap() would cause access
to uninitialized struct pages and may crash when accessing spurious
list pointers. However, if the memblock_free() call is deferred to
the start of initcall processing in the bootup process, for instance,
the following KASAN warning can appear.

[    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
[    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
  :
[    8.514775] Call Trace:
[    8.514775]  <TASK>
[    8.514775]  kasan_report+0xb2/0x1b0
[    8.514775]  memblock_isolate_range+0x4ac/0x650
[    8.514775]  memblock_phys_free+0xc4/0x190
[    8.514775]  housekeeping_late_init+0x257/0x280
[    8.514775]  do_one_initcall+0xaa/0x470
[    8.514775]  do_initcalls+0x1b4/0x1f0
[    8.514775]  kernel_init_freeable+0x4b5/0x550
[    8.514775]  kernel_init+0x1c/0x150
[    8.514775]  ret_from_fork+0x5dc/0x8e0
[    8.514775]  ret_from_fork_asm+0x1a/0x30
[    8.514775]  </TASK>

It is likely that memblock_discard() has already discarded memblock data needed
for memblock_free(). One workaround for now to avoid these warning/bug
messages is to keep the memblock allocated cpumasks even if they are
no longer needed until the memblock subsystem is properly updated to
handle memblock_free().

On most systems, the memory occupied by a cpumask is pretty small, so not
much memory will be wasted if the memblock cpumasks are not freed.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/isolation.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index ef152d401fe2..ad9b1a1104e3 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -189,7 +189,13 @@ void __init housekeeping_init(void)
 		WARN_ON_ONCE(cpumask_empty(omask));
 		cpumask_copy(nmask, omask);
 		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
-		memblock_free(omask, cpumask_size());
+
+		/*
+		 * TODO: Don't free memblock allocated cpumasks until the
+		 * memblock subsystem is able to handle memblock_free()
+		 * properly.
+		 */
+		// memblock_free(omask, cpumask_size());
 	}
 }
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-05  5:18 [PATCH] sched/isolation: Don't free memblock allocated cpumasks Waiman Long
@ 2026-05-06 13:25 ` Valentin Schneider
  2026-05-06 14:02   ` Waiman Long
  2026-05-08 14:19 ` Breno Leitao
  2026-05-10 15:02 ` Mike Rapoport
  2 siblings, 1 reply; 10+ messages in thread
From: Valentin Schneider @ 2026-05-06 13:25 UTC (permalink / raw)
  To: Waiman Long, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, K Prateek Nayak, Frederic Weisbecker, Mike Rapoport
  Cc: linux-kernel, Waiman Long

On 05/05/26 01:18, Waiman Long wrote:
> When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
> freeing reserved memory before memory map is initialized"), the following
> warning was hit when there was a "nohz_full" kernel boot parameter.
>
> [    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
> [    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>   :
> [    0.080945] Call Trace:
> [    0.080947]  <TASK>
> [    0.080949]  memblock_phys_free+0xcb/0x100
> [    0.080953]  housekeeping_init+0x14c/0x170
> [    0.080957]  start_kernel+0x207/0x450
> [    0.080961]  x86_64_start_reservations+0x24/0x30
> [    0.080965]  x86_64_start_kernel+0xda/0xe0
> [    0.080967]  common_startup_64+0x13e/0x141
> [    0.080972]  </TASK>
>
> The commit states that freeing of reserved memory before the memory
> map is fully initialized in deferred_init_memmap() would cause access
> to uninitialized struct pages and may crash when accessing spurious
> list pointers. However, if the memblock_free() call is deferred to
> the start of initcall processing in the bootup process, for instance,
> the following KASAN warning can appear.
>
> [    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
> [    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
>   :
> [    8.514775] Call Trace:
> [    8.514775]  <TASK>
> [    8.514775]  kasan_report+0xb2/0x1b0
> [    8.514775]  memblock_isolate_range+0x4ac/0x650
> [    8.514775]  memblock_phys_free+0xc4/0x190
> [    8.514775]  housekeeping_late_init+0x257/0x280
> [    8.514775]  do_one_initcall+0xaa/0x470
> [    8.514775]  do_initcalls+0x1b4/0x1f0
> [    8.514775]  kernel_init_freeable+0x4b5/0x550
> [    8.514775]  kernel_init+0x1c/0x150
> [    8.514775]  ret_from_fork+0x5dc/0x8e0
> [    8.514775]  ret_from_fork_asm+0x1a/0x30
> [    8.514775]  </TASK>
>

Darn, I just saw the previous version doing this.

> It is likely that memblock_discard() has already discarded memblock data needed
> for memblock_free(). One workaround for now to avoid these warning/bug
> messages is to keep the memblock allocated cpumasks even if they are
> no longer needed until the memblock subsystem is properly updated to
> handle memblock_free().

Pardon my ignorance, but how come this isn't the case for the other
memblock users? It sounds like there is no right place for freeing this
mask.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-06 13:25 ` Valentin Schneider
@ 2026-05-06 14:02   ` Waiman Long
  0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2026-05-06 14:02 UTC (permalink / raw)
  To: Valentin Schneider, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, K Prateek Nayak, Frederic Weisbecker, Mike Rapoport
  Cc: linux-kernel

On 5/6/26 9:25 AM, Valentin Schneider wrote:
> On 05/05/26 01:18, Waiman Long wrote:
>> When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
>> freeing reserved memory before memory map is initialized"), the following
>> warning was hit when there was a "nohz_full" kernel boot parameter.
>>
>> [    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
>> [    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>>    :
>> [    0.080945] Call Trace:
>> [    0.080947]  <TASK>
>> [    0.080949]  memblock_phys_free+0xcb/0x100
>> [    0.080953]  housekeeping_init+0x14c/0x170
>> [    0.080957]  start_kernel+0x207/0x450
>> [    0.080961]  x86_64_start_reservations+0x24/0x30
>> [    0.080965]  x86_64_start_kernel+0xda/0xe0
>> [    0.080967]  common_startup_64+0x13e/0x141
>> [    0.080972]  </TASK>
>>
>> The commit states that freeing of reserved memory before the memory
>> map is fully initialized in deferred_init_memmap() would cause access
>> to uninitialized struct pages and may crash when accessing spurious
>> list pointers. However, if the memblock_free() call is deferred to
>> the start of initcall processing in the bootup process, for instance,
>> the following KASAN warning can appear.
>>
>> [    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
>> [    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
>>    :
>> [    8.514775] Call Trace:
>> [    8.514775]  <TASK>
>> [    8.514775]  kasan_report+0xb2/0x1b0
>> [    8.514775]  memblock_isolate_range+0x4ac/0x650
>> [    8.514775]  memblock_phys_free+0xc4/0x190
>> [    8.514775]  housekeeping_late_init+0x257/0x280
>> [    8.514775]  do_one_initcall+0xaa/0x470
>> [    8.514775]  do_initcalls+0x1b4/0x1f0
>> [    8.514775]  kernel_init_freeable+0x4b5/0x550
>> [    8.514775]  kernel_init+0x1c/0x150
>> [    8.514775]  ret_from_fork+0x5dc/0x8e0
>> [    8.514775]  ret_from_fork_asm+0x1a/0x30
>> [    8.514775]  </TASK>
>>
> Darn, I just saw the previous version doing this.
>
>> It is likely that memblock_discard() has already discarded memblock data needed
>> for memblock_free(). One workaround for now to avoid these warning/bug
>> messages is to keep the memblock allocated cpumasks even if they are
>> no longer needed until the memblock subsystem is properly updated to
>> handle memblock_free().
> Pardon my ignorance, but how come this isn't the case for the other
> memblock users? It sounds like there is no right place for freeing this
> mask.

My current thought is to chain all the memblock memory blocks to be
freed in a singly linked list and then have the memblock code free them
at the right moment. That will require some more investigation into the
memblock code. This patch is just a temporary workaround which I hope
can be reverted in the future.
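
In very rough terms, something like the following userspace sketch (all
identifiers here are invented for illustration; the real kernel code
will of course look different):

```c
#include <stddef.h>

/*
 * Sketch of the deferred-free idea: callers that want to release
 * memblock memory "too late" chain the regions onto a list instead,
 * and the memblock code drains the list once, at a point where
 * freeing is known to be safe.  All names here are made up.
 */
struct deferred_free_node {
	void *ptr;
	size_t size;
	struct deferred_free_node *next;
};

static struct deferred_free_node *deferred_free_head;

/* Record a region to free later; node storage is caller-provided. */
static void memblock_defer_free(struct deferred_free_node *node,
				void *ptr, size_t size)
{
	node->ptr = ptr;
	node->size = size;
	node->next = deferred_free_head;
	deferred_free_head = node;
}

/* Drain the list at the right moment; do_free stands in for the real free. */
static size_t memblock_drain_deferred(void (*do_free)(void *, size_t))
{
	size_t freed = 0;

	while (deferred_free_head) {
		struct deferred_free_node *node = deferred_free_head;

		deferred_free_head = node->next;
		do_free(node->ptr, node->size);
		freed += node->size;
	}
	return freed;
}
```

The point being that the actual freeing then happens in one place,
where memblock can know whether its own arrays are still valid.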

Cheers,
Longman


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-05  5:18 [PATCH] sched/isolation: Don't free memblock allocated cpumasks Waiman Long
  2026-05-06 13:25 ` Valentin Schneider
@ 2026-05-08 14:19 ` Breno Leitao
  2026-05-10 14:45   ` Mike Rapoport
  2026-05-10 15:02 ` Mike Rapoport
  2 siblings, 1 reply; 10+ messages in thread
From: Breno Leitao @ 2026-05-08 14:19 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	Mike Rapoport, linux-kernel

On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
> One workaround for now to avoid these warning/bug
> messages is to keep the memblock allocated cpumasks even if they are
> no longer needed until the memblock subsystem is properly updated to
> handle memblock_free().

We just hit the same KASAN UAF from a different caller on a v7.1-rc3 boot,
which I think reinforces that the fix really needs to be in memblock rather
than in each subsystem.

In our case the offender is the IMA kexec buffer release path:

    [  113.498542] BUG: KASAN: use-after-free in memblock_isolate_range+0x208/0x8f0
    [  113.514206] Read of size 8 at addr ff11001824ba4000 by task swapper/0/1
    ...
    [  113.532258]  memblock_isolate_range+0x208/0x8f0
    [  113.532267]  memblock_phys_free+0x5f/0x300
    [  113.532274]  ima_free_kexec_buffer+0x1d/0x40
    [  113.532280]  ima_load_kexec_buffer+0xbf/0xf0
    [  113.532285]  ima_init+0x42/0xa0
    [  113.532287]  init_ima+0x5e/0x190
    [  113.532290]  security_initcall_late+0xad/0x210
    [  113.532301]  do_one_initcall+0x138/0x540

Same shape as your second trace: memblock_phys_free() reads
memblock.reserved.regions, which memblock_discard() has already returned
to the buddy allocator (the KASAN shadow shows the page as fully poisoned,
and pfn 0x1824ba4 has been reallocated). It then page-faults a moment later
on the same address.
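
As a toy model of that ordering (userspace C, invented names, with the
lifetime tracked by an explicit flag instead of a real use-after-free):

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Toy model of the problem: discard() hands the region array back to
 * the allocator, and any later phys_free() that still dereferences it
 * is a use-after-free.  Safety is modeled with an explicit flag here;
 * the real memblock code has no such guard.
 */
struct toy_memblock {
	long *regions;		/* stand-in for memblock.reserved.regions */
	bool discarded;		/* set once the array has been given back */
};

static void toy_discard(struct toy_memblock *mb)
{
	free(mb->regions);
	mb->regions = NULL;
	mb->discarded = true;
}

/* Returns false instead of touching a freed array. */
static bool toy_phys_free(struct toy_memblock *mb, long addr)
{
	(void)addr;
	if (mb->discarded)
		return false;	/* this is where the real code faults */
	return true;
}
```

That is, once the region array has been handed back to the buddy
allocator, any later memblock_phys_free() walk over it reads freed memory.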

ima_init runs as a security_initcall_late, so by the time
ima_free_kexec_buffer() calls memblock_phys_free() on the previous
kernel's measurement buffer, memblock has long been torn down on
configurations without CONFIG_ARCH_KEEP_MEMBLOCK.

This regression seems to come from commit 87ce9e83ab8b ("memblock, treewide: make
memblock_free() handle late freeing"), which dropped memblock_free_late()
and made memblock_phys_free() unconditionally call
memblock_remove_range(&memblock.reserved, ...) followed by an optional
__free_reserved_area().

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-08 14:19 ` Breno Leitao
@ 2026-05-10 14:45   ` Mike Rapoport
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Rapoport @ 2026-05-10 14:45 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Waiman Long, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, K Prateek Nayak,
	Frederic Weisbecker, linux-kernel

Hi Breno,

On Fri, May 08, 2026 at 07:19:06AM -0700, Breno Leitao wrote:
> On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
> > One workaround for now to avoid these warning/bug
> > messages is to keep the memblock allocated cpumasks even if they are
> > no longer needed until the memblock subsystem is properly updated to
> > handle memblock_free().
> 
> We just hit the same KASAN UAF from a different caller on a v7.1-rc3 boot,
> which I think reinforces that the fix really needs to be in memblock rather
> than in each subsystem.
> 
> In our case the offender is the IMA kexec buffer release path:
> 
>     [  113.498542] BUG: KASAN: use-after-free in memblock_isolate_range+0x208/0x8f0
>     [  113.514206] Read of size 8 at addr ff11001824ba4000 by task swapper/0/1
>     ...
>     [  113.532258]  memblock_isolate_range+0x208/0x8f0
>     [  113.532267]  memblock_phys_free+0x5f/0x300
>     [  113.532274]  ima_free_kexec_buffer+0x1d/0x40
>     [  113.532280]  ima_load_kexec_buffer+0xbf/0xf0
>     [  113.532285]  ima_init+0x42/0xa0
>     [  113.532287]  init_ima+0x5e/0x190
>     [  113.532290]  security_initcall_late+0xad/0x210
>     [  113.532301]  do_one_initcall+0x138/0x540
> 
> Same shape as your second trace: memblock_phys_free() reads
> memblock.reserved.regions, which memblock_discard() has already returned
> to the buddy allocator (the KASAN shadow shows the page as fully poisoned,
> and pfn 0x1824ba4 has been reallocated). It then page-faults a moment later
> on the same address.
> 
> ima_init runs as a security_initcall_late, so by the time
> ima_free_kexec_buffer() calls memblock_phys_free() on the previous
> kernel's measurement buffer, memblock has long been torn down on
> configurations without CONFIG_ARCH_KEEP_MEMBLOCK.
> 
> This regression seems to come from commit 87ce9e83ab8b ("memblock, treewide: make
> memblock_free() handle late freeing"), which dropped memblock_free_late()
> and made memblock_phys_free() unconditionally call
> memblock_remove_range(&memblock.reserved, ...) followed by an optional
> __free_reserved_area().

Oops, somehow I overlooked that late freeing can't access memblock arrays :(

Can you please test this fix:

diff --git a/mm/memblock.c b/mm/memblock.c
index a6a1c91e276d..ccd43f3abb82 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -989,13 +989,15 @@ void __init_memblock memblock_free(void *ptr, size_t size)
 int __init_memblock memblock_phys_free(phys_addr_t base, phys_addr_t size)
 {
 	phys_addr_t end = base + size - 1;
-	int ret;
+	int ret = 0;
 
 	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
 	kmemleak_free_part_phys(base, size);
-	ret = memblock_remove_range(&memblock.reserved, base, size);
+
+	if (!slab_is_available() || IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
+		ret = memblock_remove_range(&memblock.reserved, base, size);
 
 	if (slab_is_available())
 		__free_reserved_area(base, base + size, -1);


-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-05  5:18 [PATCH] sched/isolation: Don't free memblock allocated cpumasks Waiman Long
  2026-05-06 13:25 ` Valentin Schneider
  2026-05-08 14:19 ` Breno Leitao
@ 2026-05-10 15:02 ` Mike Rapoport
  2026-05-11  4:55   ` Waiman Long
  2 siblings, 1 reply; 10+ messages in thread
From: Mike Rapoport @ 2026-05-10 15:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	linux-kernel

Hi Waiman,

On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
> When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
> freeing reserved memory before memory map is initialized"), the following
> warning was hit when there was a "nohz_full" kernel boot parameter.
> 
> [    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
> [    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>   :
> [    0.080945] Call Trace:
> [    0.080947]  <TASK>
> [    0.080949]  memblock_phys_free+0xcb/0x100
> [    0.080953]  housekeeping_init+0x14c/0x170
> [    0.080957]  start_kernel+0x207/0x450
> [    0.080961]  x86_64_start_reservations+0x24/0x30
> [    0.080965]  x86_64_start_kernel+0xda/0xe0
> [    0.080967]  common_startup_64+0x13e/0x141
> [    0.080972]  </TASK>
> 
> The commit states that freeing of reserved memory before the memory
> map is fully initialized in deferred_init_memmap() would cause access
> to uninitialized struct pages and may crash when accessing spurious
> list pointers. However, if the memblock_free() call is deferred to
> the start of initcall processing in the bootup process, for instance,
> the following KASAN warning can appear.
> 
> [    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
> [    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
>   :
> [    8.514775] Call Trace:
> [    8.514775]  <TASK>
> [    8.514775]  kasan_report+0xb2/0x1b0
> [    8.514775]  memblock_isolate_range+0x4ac/0x650
> [    8.514775]  memblock_phys_free+0xc4/0x190
> [    8.514775]  housekeeping_late_init+0x257/0x280
> [    8.514775]  do_one_initcall+0xaa/0x470
> [    8.514775]  do_initcalls+0x1b4/0x1f0
> [    8.514775]  kernel_init_freeable+0x4b5/0x550
> [    8.514775]  kernel_init+0x1c/0x150
> [    8.514775]  ret_from_fork+0x5dc/0x8e0
> [    8.514775]  ret_from_fork_asm+0x1a/0x30
> [    8.514775]  </TASK>
> 
> It is likely that memblock_discard() has already discarded memblock data needed
> for memblock_free(). One workaround for now to avoid these warning/bug
> messages is to keep the memblock allocated cpumasks even if they are
> no longer needed until the memblock subsystem is properly updated to
> handle memblock_free().
> 
> On most systems, the memory occupied by a cpumask is pretty small, so not
> much memory will be wasted if the memblock cpumasks are not freed.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/sched/isolation.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index ef152d401fe2..ad9b1a1104e3 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -189,7 +189,13 @@ void __init housekeeping_init(void)
>  		WARN_ON_ONCE(cpumask_empty(omask));
>  		cpumask_copy(nmask, omask);
>  		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> -		memblock_free(omask, cpumask_size());
> +
> +		/*
> +		 * TODO: Don't free memblock allocated cpumasks until the
> +		 * memblock subsystem is able to handle memblock_free()
> +		 * properly.
> +		 */
> +		// memblock_free(omask, cpumask_size());

Before 59bd1d914bb5 it was a silent leak. housekeeping_init() is called
after memblock moves all the memory to buddy, so this would only update
memblock.reserved.

The comment a few lines above says that we reallocate to be able to kfree()
later. Is it possible to delay reallocation until an initcall?

>  	}
>  }
>  
> -- 
> 2.53.0
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-10 15:02 ` Mike Rapoport
@ 2026-05-11  4:55   ` Waiman Long
  2026-05-11  8:34     ` Mike Rapoport
  0 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2026-05-11  4:55 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	linux-kernel

On 5/10/26 11:02 AM, Mike Rapoport wrote:
> Hi Waiman,
>
> On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
>> When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
>> freeing reserved memory before memory map is initialized"), the following
>> warning was hit when there was a "nohz_full" kernel boot parameter.
>>
>> [    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
>> [    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>>    :
>> [    0.080945] Call Trace:
>> [    0.080947]  <TASK>
>> [    0.080949]  memblock_phys_free+0xcb/0x100
>> [    0.080953]  housekeeping_init+0x14c/0x170
>> [    0.080957]  start_kernel+0x207/0x450
>> [    0.080961]  x86_64_start_reservations+0x24/0x30
>> [    0.080965]  x86_64_start_kernel+0xda/0xe0
>> [    0.080967]  common_startup_64+0x13e/0x141
>> [    0.080972]  </TASK>
>>
>> The commit states that freeing of reserved memory before the memory
>> map is fully initialized in deferred_init_memmap() would cause access
>> to uninitialized struct pages and may crash when accessing spurious
>> list pointers. However, if the memblock_free() call is deferred to
>> the start of initcall processing in the bootup process, for instance,
>> the following KASAN warning can appear.
>>
>> [    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
>> [    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
>>    :
>> [    8.514775] Call Trace:
>> [    8.514775]  <TASK>
>> [    8.514775]  kasan_report+0xb2/0x1b0
>> [    8.514775]  memblock_isolate_range+0x4ac/0x650
>> [    8.514775]  memblock_phys_free+0xc4/0x190
>> [    8.514775]  housekeeping_late_init+0x257/0x280
>> [    8.514775]  do_one_initcall+0xaa/0x470
>> [    8.514775]  do_initcalls+0x1b4/0x1f0
>> [    8.514775]  kernel_init_freeable+0x4b5/0x550
>> [    8.514775]  kernel_init+0x1c/0x150
>> [    8.514775]  ret_from_fork+0x5dc/0x8e0
>> [    8.514775]  ret_from_fork_asm+0x1a/0x30
>> [    8.514775]  </TASK>
>>
>> It is likely that memblock_discard() has already discarded memblock data needed
>> for memblock_free(). One workaround for now to avoid these warning/bug
>> messages is to keep the memblock allocated cpumasks even if they are
>> no longer needed until the memblock subsystem is properly updated to
>> handle memblock_free().
>>
>> On most systems, the memory occupied by a cpumask is pretty small, so not
>> much memory will be wasted if the memblock cpumasks are not freed.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/sched/isolation.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>> index ef152d401fe2..ad9b1a1104e3 100644
>> --- a/kernel/sched/isolation.c
>> +++ b/kernel/sched/isolation.c
>> @@ -189,7 +189,13 @@ void __init housekeeping_init(void)
>>   		WARN_ON_ONCE(cpumask_empty(omask));
>>   		cpumask_copy(nmask, omask);
>>   		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
>> -		memblock_free(omask, cpumask_size());
>> +
>> +		/*
>> +		 * TODO: Don't free memblock allocated cpumasks until the
>> +		 * memblock subsystem is able to handle memblock_free()
>> +		 * properly.
>> +		 */
>> +		// memblock_free(omask, cpumask_size());
> Before 59bd1d914bb5 it was a silent leak. housekeeping_init() is called
> after memblock moves all the memory to buddy, so this would only update
> memblock.reserved.
>
> The comment a few lines above says that we reallocate to be able to kfree()
> later. Is it possible to delay reallocation until an initcall?

My original thought was to defer the freeing to an initcall. That change
led to the KASAN bug splat listed in the commit log; I think the right
window to free memblock memory is currently just too narrow. Do you mean
that with the fix patch you sent to Breno, memblock freeing in an
initcall will work without a bug report? If so, I can send another patch
to defer the memblock freeing once the fix patch is merged, as the KASAN
bug is more serious than the memblock warning. I will do some testing
tomorrow with your fix patch.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-11  4:55   ` Waiman Long
@ 2026-05-11  8:34     ` Mike Rapoport
  2026-05-11 21:36       ` Waiman Long
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Rapoport @ 2026-05-11  8:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	linux-kernel

On Mon, May 11, 2026 at 12:55:39AM -0400, Waiman Long wrote:
> On 5/10/26 11:02 AM, Mike Rapoport wrote:
> > Hi Waiman,
> > 
> > On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
> > > When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
> > > freeing reserved memory before memory map is initialized"), the following
> > > warning was hit when there was a "nohz_full" kernel boot parameter.
> > > 
> > > [    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
> > > [    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
> > >    :
> > > [    0.080945] Call Trace:
> > > [    0.080947]  <TASK>
> > > [    0.080949]  memblock_phys_free+0xcb/0x100
> > > [    0.080953]  housekeeping_init+0x14c/0x170
> > > [    0.080957]  start_kernel+0x207/0x450
> > > [    0.080961]  x86_64_start_reservations+0x24/0x30
> > > [    0.080965]  x86_64_start_kernel+0xda/0xe0
> > > [    0.080967]  common_startup_64+0x13e/0x141
> > > [    0.080972]  </TASK>
> > > 
> > > The commit states that freeing of reserved memory before the memory
> > > map is fully initialized in deferred_init_memmap() would cause access
> > > to uninitialized struct pages and may crash when accessing spurious
> > > list pointers. However, if the memblock_free() call is deferred to
> > > the start of initcall processing in the bootup process, for instance,
> > > the following KASAN warning can appear.
> > > 
> > > [    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
> > > [    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
> > >    :
> > > [    8.514775] Call Trace:
> > > [    8.514775]  <TASK>
> > > [    8.514775]  kasan_report+0xb2/0x1b0
> > > [    8.514775]  memblock_isolate_range+0x4ac/0x650
> > > [    8.514775]  memblock_phys_free+0xc4/0x190
> > > [    8.514775]  housekeeping_late_init+0x257/0x280
> > > [    8.514775]  do_one_initcall+0xaa/0x470
> > > [    8.514775]  do_initcalls+0x1b4/0x1f0
> > > [    8.514775]  kernel_init_freeable+0x4b5/0x550
> > > [    8.514775]  kernel_init+0x1c/0x150
> > > [    8.514775]  ret_from_fork+0x5dc/0x8e0
> > > [    8.514775]  ret_from_fork_asm+0x1a/0x30
> > > [    8.514775]  </TASK>
> > > 
> > > It is likely that memblock_discard() has already discarded memblock data needed
> > > for memblock_free(). One workaround for now to avoid these warning/bug
> > > messages is to keep the memblock allocated cpumasks even if they are
> > > no longer needed until the memblock subsystem is properly updated to
> > > handle memblock_free().
> > > 
> > > On most systems, the memory occupied by a cpumask is pretty small, so not
> > > much memory will be wasted if the memblock cpumasks are not freed.
> > > 
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > >   kernel/sched/isolation.c | 8 +++++++-
> > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > index ef152d401fe2..ad9b1a1104e3 100644
> > > --- a/kernel/sched/isolation.c
> > > +++ b/kernel/sched/isolation.c
> > > @@ -189,7 +189,13 @@ void __init housekeeping_init(void)
> > >   		WARN_ON_ONCE(cpumask_empty(omask));
> > >   		cpumask_copy(nmask, omask);
> > >   		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> > > -		memblock_free(omask, cpumask_size());
> > > +
> > > +		/*
> > > +		 * TODO: Don't free memblock allocated cpumasks until the
> > > +		 * memblock subsystem is able to handle memblock_free()
> > > +		 * properly.
> > > +		 */
> > > +		// memblock_free(omask, cpumask_size());
> > Before 59bd1d914bb5 it was a silent leak. housekeeping_init() is called
> > after memblock moves all the memory to buddy, so this would only update
> > memblock.reserved.
> > 
> > The comment a few lines above says that we reallocate to be able to kfree()
> > later. Is it possible to delay reallocation until an initcall?
> 
> My original thought was to defer the freeing to an initcall. That change led
> to the KASAN bug splat listed in the commit log; I think the right window to
> free memblock memory is currently just too narrow. Do you mean that with the
> fix patch you sent to Breno, memblock freeing in an initcall will work without
> a bug report?

Yes, with the fix I sent to Breno memblock_free() should work in an
initcall and "do the right thing".

> If so, I can send another patch to defer memblock freeing after the fix
> patch is merged as the KASAN bug is more serious than the memblock
> warning. I will do some testing tomorrow with your fix patch.
 
> Cheers,
> Longman
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-11  8:34     ` Mike Rapoport
@ 2026-05-11 21:36       ` Waiman Long
  2026-05-12 13:40         ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2026-05-11 21:36 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	linux-kernel

On 5/11/26 4:34 AM, Mike Rapoport wrote:
> On Mon, May 11, 2026 at 12:55:39AM -0400, Waiman Long wrote:
>> On 5/10/26 11:02 AM, Mike Rapoport wrote:
>>> Hi Waiman,
>>>
>>> On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
>>>> When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
>>>> freeing reserved memory before memory map is initialized"), the following
>>>> warning was hit when there was a "nohz_full" kernel boot parameter.
>>>>
>>>> [    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
>>>> [    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>>>>     :
>>>> [    0.080945] Call Trace:
>>>> [    0.080947]  <TASK>
>>>> [    0.080949]  memblock_phys_free+0xcb/0x100
>>>> [    0.080953]  housekeeping_init+0x14c/0x170
>>>> [    0.080957]  start_kernel+0x207/0x450
>>>> [    0.080961]  x86_64_start_reservations+0x24/0x30
>>>> [    0.080965]  x86_64_start_kernel+0xda/0xe0
>>>> [    0.080967]  common_startup_64+0x13e/0x141
>>>> [    0.080972]  </TASK>
>>>>
>>>> The commit states that freeing of reserved memory before the memory
>>>> map is fully initialized in deferred_init_memmap() would cause access
>>>> to uninitialized struct pages and may crash when accessing spurious
>>>> list pointers. However, if the memblock_free() call is deferred to
>>>> the start of initcall processing in the bootup process, for instance,
>>>> the following KASAN warning can appear.
>>>>
>>>> [    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
>>>> [    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
>>>>     :
>>>> [    8.514775] Call Trace:
>>>> [    8.514775]  <TASK>
>>>> [    8.514775]  kasan_report+0xb2/0x1b0
>>>> [    8.514775]  memblock_isolate_range+0x4ac/0x650
>>>> [    8.514775]  memblock_phys_free+0xc4/0x190
>>>> [    8.514775]  housekeeping_late_init+0x257/0x280
>>>> [    8.514775]  do_one_initcall+0xaa/0x470
>>>> [    8.514775]  do_initcalls+0x1b4/0x1f0
>>>> [    8.514775]  kernel_init_freeable+0x4b5/0x550
>>>> [    8.514775]  kernel_init+0x1c/0x150
>>>> [    8.514775]  ret_from_fork+0x5dc/0x8e0
>>>> [    8.514775]  ret_from_fork_asm+0x1a/0x30
>>>> [    8.514775]  </TASK>
>>>>
>>>> It is likely that memblock_discard() may discard memblock data needed
>>>> for memblock_free(). One workaround for now to avoid these warning/bug
>>>> messages is to keep the memblock allocated cpumasks even if they are
>>>> no longer needed until the memblock subsystem is properly updated to
>>>> handle memblock_free().
>>>>
>>>> On most systems, memory occupied by a cpumask is pretty small. So not
>>>> much memory will be wasted if the memblock cpumasks are not freed.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>    kernel/sched/isolation.c | 8 +++++++-
>>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>>>> index ef152d401fe2..ad9b1a1104e3 100644
>>>> --- a/kernel/sched/isolation.c
>>>> +++ b/kernel/sched/isolation.c
>>>> @@ -189,7 +189,13 @@ void __init housekeeping_init(void)
>>>>    		WARN_ON_ONCE(cpumask_empty(omask));
>>>>    		cpumask_copy(nmask, omask);
>>>>    		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
>>>> -		memblock_free(omask, cpumask_size());
>>>> +
>>>> +		/*
>>>> +		 * TODO: Don't free memblock allocated cpumasks until the
>>>> +		 * memblock subsystem is able to handle the memblock_free()
>>>> +		 * properly.
>>>> +		 */
>>>> +		// memblock_free(omask, cpumask_size());
>>> Before 59bd1d914bb5 it was a silent leak. housekeeping_init() is called
>>> after memblock moves all the memory to buddy, so this would only update
>>> memblock.reserved.
>>>
>>> The comment a few lines above says that we reallocate to be able to kfree()
>>> later. Is it possible to delay reallocation until an initcall?
>> My original thought was to defer the freeing to an initcall. That change led
>> to the KASAN bug splat listed in the commit log; I think the right window to
>> free memblock memory is currently just too narrow. Do you mean that with the
>> fix patch you sent to Breno, memblock freeing in initcall will work without
>> bug report?
> Yes, with the fix I sent to Breno memblock_free() should work in an
> initcall and "do the right thing".

Thanks for the confirmation. I have tested your patch with my patch to 
defer the memblock_free() to an initcall. There is no longer any KASAN 
splat when booting up a debug test kernel. You can add the following tag 
when you send out your patch.

Tested-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks
  2026-05-11 21:36       ` Waiman Long
@ 2026-05-12 13:40         ` Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2026-05-12 13:40 UTC (permalink / raw)
  To: Waiman Long
  Cc: Mike Rapoport, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, K Prateek Nayak, linux-kernel

On Mon, May 11, 2026 at 05:36:08PM -0400, Waiman Long wrote:
> On 5/11/26 4:34 AM, Mike Rapoport wrote:
> > On Mon, May 11, 2026 at 12:55:39AM -0400, Waiman Long wrote:
> > > On 5/10/26 11:02 AM, Mike Rapoport wrote:
> > > > Hi Waiman,
> > > > 
> > > > On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
> > > > > When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
> > > > > freeing reserved memory before memory map is initialized"), the following
> > > > > warning was hit when there was a "nohz_full" kernel boot parameter.
> > > > > 
> > > > > [    0.080911] Cannot free reserved memory because of deferred initialization of the memory map
> > > > > [    0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
> > > > >     :
> > > > > [    0.080945] Call Trace:
> > > > > [    0.080947]  <TASK>
> > > > > [    0.080949]  memblock_phys_free+0xcb/0x100
> > > > > [    0.080953]  housekeeping_init+0x14c/0x170
> > > > > [    0.080957]  start_kernel+0x207/0x450
> > > > > [    0.080961]  x86_64_start_reservations+0x24/0x30
> > > > > [    0.080965]  x86_64_start_kernel+0xda/0xe0
> > > > > [    0.080967]  common_startup_64+0x13e/0x141
> > > > > [    0.080972]  </TASK>
> > > > > 
> > > > > The commit states that freeing of reserved memory before the memory
> > > > > map is fully initialized in deferred_init_memmap() would cause access
> > > > > to uninitialized struct pages and may crash when accessing spurious
> > > > > list pointers. However, if the memblock_free() call is deferred to
> > > > > the start of initcall processing in the bootup process, for instance,
> > > > > the following KASAN warning can appear.
> > > > > 
> > > > > [    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
> > > > > [    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
> > > > >     :
> > > > > [    8.514775] Call Trace:
> > > > > [    8.514775]  <TASK>
> > > > > [    8.514775]  kasan_report+0xb2/0x1b0
> > > > > [    8.514775]  memblock_isolate_range+0x4ac/0x650
> > > > > [    8.514775]  memblock_phys_free+0xc4/0x190
> > > > > [    8.514775]  housekeeping_late_init+0x257/0x280
> > > > > [    8.514775]  do_one_initcall+0xaa/0x470
> > > > > [    8.514775]  do_initcalls+0x1b4/0x1f0
> > > > > [    8.514775]  kernel_init_freeable+0x4b5/0x550
> > > > > [    8.514775]  kernel_init+0x1c/0x150
> > > > > [    8.514775]  ret_from_fork+0x5dc/0x8e0
> > > > > [    8.514775]  ret_from_fork_asm+0x1a/0x30
> > > > > [    8.514775]  </TASK>
> > > > > 
> > > > > It is likely that memblock_discard() may discard memblock data needed
> > > > > for memblock_free(). One workaround for now to avoid these warning/bug
> > > > > messages is to keep the memblock allocated cpumasks even if they are
> > > > > no longer needed until the memblock subsystem is properly updated to
> > > > > handle memblock_free().
> > > > > 
> > > > > On most systems, memory occupied by a cpumask is pretty small. So not
> > > > > much memory will be wasted if the memblock cpumasks are not freed.
> > > > > 
> > > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > > > ---
> > > > >    kernel/sched/isolation.c | 8 +++++++-
> > > > >    1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > > > index ef152d401fe2..ad9b1a1104e3 100644
> > > > > --- a/kernel/sched/isolation.c
> > > > > +++ b/kernel/sched/isolation.c
> > > > > @@ -189,7 +189,13 @@ void __init housekeeping_init(void)
> > > > >    		WARN_ON_ONCE(cpumask_empty(omask));
> > > > >    		cpumask_copy(nmask, omask);
> > > > >    		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> > > > > -		memblock_free(omask, cpumask_size());
> > > > > +
> > > > > +		/*
> > > > > +		 * TODO: Don't free memblock allocated cpumasks until the
> > > > > +		 * memblock subsystem is able to handle the memblock_free()
> > > > > +		 * properly.
> > > > > +		 */
> > > > > +		// memblock_free(omask, cpumask_size());
> > > > Before 59bd1d914bb5 it was a silent leak. housekeeping_init() is called
> > > > after memblock moves all the memory to buddy, so this would only update
> > > > memblock.reserved.
> > > > 
> > > > The comment a few lines above says that we reallocate to be able to kfree()
> > > > later. Is it possible to delay reallocation until an initcall?
> > > My original thought was to defer the freeing to an initcall. That change led
> > > to the KASAN bug splat listed in the commit log; I think the right window to
> > > free memblock memory is currently just too narrow. Do you mean that with the
> > > fix patch you sent to Breno, memblock freeing in initcall will work without
> > > bug report?
> > Yes, with the fix I sent to Breno memblock_free() should work in an
> > initcall and "do the right thing".
> 
> Thanks for the confirmation. I have tested your patch with my patch to defer
> the memblock_free() to an initcall. There is no longer any KASAN splat when
> booting up a debug test kernel. You can add the following tag when you send
> out your patch.
> 
> Tested-by: Waiman Long <longman@redhat.com>

Thanks a lot guys!

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-05-12 13:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-05  5:18 [PATCH] sched/isolation: Don't free memblock allocated cpumasks Waiman Long
2026-05-06 13:25 ` Valentin Schneider
2026-05-06 14:02   ` Waiman Long
2026-05-08 14:19 ` Breno Leitao
2026-05-10 14:45   ` Mike Rapoport
2026-05-10 15:02 ` Mike Rapoport
2026-05-11  4:55   ` Waiman Long
2026-05-11  8:34     ` Mike Rapoport
2026-05-11 21:36       ` Waiman Long
2026-05-12 13:40         ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox