The Linux Kernel Mailing List
* [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
@ 2026-06-04 18:24 Waiman Long
  2026-06-30 21:36 ` Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Waiman Long @ 2026-06-04 18:24 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker
  Cc: linux-kernel, Waiman Long

When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
warn when freeing reserved memory before memory map is initialized"),
the following warning was hit when there was a "nohz_full" kernel boot
parameter.

  Cannot free reserved memory because of deferred initialization of the memory map
  WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
    :
  Call Trace:
   <TASK>
   memblock_phys_free+0xcb/0x100
   housekeeping_init+0x14c/0x170
   start_kernel+0x207/0x450
   x86_64_start_reservations+0x24/0x30
   x86_64_start_kernel+0xda/0xe0
   common_startup_64+0x13e/0x141
   </TASK>

IOW, we shouldn't free memblock-allocated memory this early in the
boot process, before deferred_init_memmap() has fully initialized the
memory map.

Fix it by adding the housekeeping cpumask memblock memory that needs to
be freed to a free list in housekeeping_init(), and add a new
housekeeping_late_init() helper that defers the actual freeing of that
memory until initcalls are being processed. The non-atomic versions
of the llist APIs are used as there is no contention.

This commit also depends on the presence of commit 7c2eee9c1367
("memblock: don't touch memblock arrays when memblock_free() is called
late") to prevent a KASAN UAF bug report [1].

 [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/

Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/isolation.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

 [v3.1] Add __initdata to memblock_freelist

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index ef152d401fe2..156025ef81b7 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -8,6 +8,7 @@
  *
  */
 #include <linux/sched/isolation.h>
+#include <linux/llist.h>
 #include <linux/pci.h>
 #include "sched.h"
 
@@ -27,6 +28,7 @@ struct housekeeping {
 };
 
 static struct housekeeping housekeeping;
+static __initdata LLIST_HEAD(memblock_freelist);
 
 bool housekeeping_enabled(enum hk_type type)
 {
@@ -189,10 +191,22 @@ void __init housekeeping_init(void)
 		WARN_ON_ONCE(cpumask_empty(omask));
 		cpumask_copy(nmask, omask);
 		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
-		memblock_free(omask, cpumask_size());
+		__llist_add((struct llist_node *)omask, &memblock_freelist);
 	}
 }
 
+static int __init housekeeping_late_init(void)
+{
+	struct llist_node *llnode, *pos, *t;
+
+	/* Free allocated memblock memory, if any */
+	llnode = __llist_del_all(&memblock_freelist);
+	llist_for_each_safe(pos, t, llnode)
+		memblock_free(pos, cpumask_size());
+	return 0;
+}
+pure_initcall(housekeeping_late_init);
+
 static void __init housekeeping_setup_type(enum hk_type type,
 					   cpumask_var_t housekeeping_staging)
 {
-- 
2.54.0



* Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
  2026-06-04 18:24 [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall Waiman Long
@ 2026-06-30 21:36 ` Waiman Long
  2026-07-01 13:28 ` Frederic Weisbecker
  2026-07-01 14:13 ` Phil Auld
  2 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2026-06-30 21:36 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker
  Cc: linux-kernel

On 6/4/26 2:24 PM, Waiman Long wrote:
> When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
> warn when freeing reserved memory before memory map is initialized"),
> the following warning was hit when there was a "nohz_full" kernel boot
> parameter.
>
>    Cannot free reserved memory because of deferred initialization of the memory map
>    WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>      :
>    Call Trace:
>     <TASK>
>     memblock_phys_free+0xcb/0x100
>     housekeeping_init+0x14c/0x170
>     start_kernel+0x207/0x450
>     x86_64_start_reservations+0x24/0x30
>     x86_64_start_kernel+0xda/0xe0
>     common_startup_64+0x13e/0x141
>     </TASK>
>
> IOW, we shouldn't free memblock allocated memory so early
> in the boot process when memory map isn't fully initialized in
> deferred_init_memmap().
>
> Fix it by saving the housekeeping cpumask memblock memory to
> be freed into a free list in housekeeping_init() and add a new
> housekeeping_late_init() helper to defer the actual freeing of memblock
> memory to when initcall's are being processed. The non-atomic version
> of the llist APIs are used as there is no contention.
>
> This commit also depends on the presence of commit 7c2eee9c1367
> ("memblock: don't touch memblock arrays when memblock_free() is called
> late") to prevent a KASAN UAF bug report [1].
>
>   [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
>
> Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/sched/isolation.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
>
>   [v3.1] Add __initdata to memblock_freelist
>
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index ef152d401fe2..156025ef81b7 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -8,6 +8,7 @@
>    *
>    */
>   #include <linux/sched/isolation.h>
> +#include <linux/llist.h>
>   #include <linux/pci.h>
>   #include "sched.h"
>   
> @@ -27,6 +28,7 @@ struct housekeeping {
>   };
>   
>   static struct housekeeping housekeeping;
> +static __initdata LLIST_HEAD(memblock_freelist);
>   
>   bool housekeeping_enabled(enum hk_type type)
>   {
> @@ -189,10 +191,22 @@ void __init housekeeping_init(void)
>   		WARN_ON_ONCE(cpumask_empty(omask));
>   		cpumask_copy(nmask, omask);
>   		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> -		memblock_free(omask, cpumask_size());
> +		__llist_add((struct llist_node *)omask, &memblock_freelist);
>   	}
>   }
>   
> +static int __init housekeeping_late_init(void)
> +{
> +	struct llist_node *llnode, *pos, *t;
> +
> +	/* Free allocated memblock memory, if any */
> +	llnode = __llist_del_all(&memblock_freelist);
> +	llist_for_each_safe(pos, t, llnode)
> +		memblock_free(pos, cpumask_size());
> +	return 0;
> +}
> +pure_initcall(housekeeping_late_init);
> +
>   static void __init housekeeping_setup_type(enum hk_type type,
>   					   cpumask_var_t housekeeping_staging)
>   {

Ping! Does anyone have any comments on this patch?

Thanks,
Longman



* Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
  2026-06-04 18:24 [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall Waiman Long
  2026-06-30 21:36 ` Waiman Long
@ 2026-07-01 13:28 ` Frederic Weisbecker
  2026-07-01 14:13 ` Phil Auld
  2 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2026-07-01 13:28 UTC
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, linux-kernel

On Thu, Jun 04, 2026 at 02:24:40PM -0400, Waiman Long wrote:
> When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
> warn when freeing reserved memory before memory map is initialized"),
> the following warning was hit when there was a "nohz_full" kernel boot
> parameter.
> 
>   Cannot free reserved memory because of deferred initialization of the memory map
>   WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>     :
>   Call Trace:
>    <TASK>
>    memblock_phys_free+0xcb/0x100
>    housekeeping_init+0x14c/0x170
>    start_kernel+0x207/0x450
>    x86_64_start_reservations+0x24/0x30
>    x86_64_start_kernel+0xda/0xe0
>    common_startup_64+0x13e/0x141
>    </TASK>
> 
> IOW, we shouldn't free memblock allocated memory so early
> in the boot process when memory map isn't fully initialized in
> deferred_init_memmap().
> 
> Fix it by saving the housekeeping cpumask memblock memory to
> be freed into a free list in housekeeping_init() and add a new
> housekeeping_late_init() helper to defer the actual freeing of memblock
> memory to when initcall's are being processed. The non-atomic version
> of the llist APIs are used as there is no contention.
> 
> This commit also depends on the presence of commit 7c2eee9c1367
> ("memblock: don't touch memblock arrays when memblock_free() is called
> late") to prevent a KASAN UAF bug report [1].
> 
>  [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
> 
> Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
> Signed-off-by: Waiman Long <longman@redhat.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
  2026-06-04 18:24 [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall Waiman Long
  2026-06-30 21:36 ` Waiman Long
  2026-07-01 13:28 ` Frederic Weisbecker
@ 2026-07-01 14:13 ` Phil Auld
  2026-07-01 14:25   ` Phil Auld
  2 siblings, 1 reply; 7+ messages in thread
From: Phil Auld @ 2026-07-01 14:13 UTC
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	linux-kernel

Hi Waiman,

On Thu, Jun 04, 2026 at 02:24:40PM -0400 Waiman Long wrote:
> When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
> warn when freeing reserved memory before memory map is initialized"),
> the following warning was hit when there was a "nohz_full" kernel boot
> parameter.
> 
>   Cannot free reserved memory because of deferred initialization of the memory map
>   WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>     :
>   Call Trace:
>    <TASK>
>    memblock_phys_free+0xcb/0x100
>    housekeeping_init+0x14c/0x170
>    start_kernel+0x207/0x450
>    x86_64_start_reservations+0x24/0x30
>    x86_64_start_kernel+0xda/0xe0
>    common_startup_64+0x13e/0x141
>    </TASK>
> 
> IOW, we shouldn't free memblock allocated memory so early
> in the boot process when memory map isn't fully initialized in
> deferred_init_memmap().
> 
> Fix it by saving the housekeeping cpumask memblock memory to
> be freed into a free list in housekeeping_init() and add a new
> housekeeping_late_init() helper to defer the actual freeing of memblock
> memory to when initcall's are being processed. The non-atomic version
> of the llist APIs are used as there is no contention.
> 
> This commit also depends on the presence of commit 7c2eee9c1367
> ("memblock: don't touch memblock arrays when memblock_free() is called
> late") to prevent a KASAN UAF bug report [1].
> 
>  [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
> 
> Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/sched/isolation.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
>  [v3.1] Add __initdata to memblock_freelist
> 
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index ef152d401fe2..156025ef81b7 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -8,6 +8,7 @@
>   *
>   */
>  #include <linux/sched/isolation.h>
> +#include <linux/llist.h>
>  #include <linux/pci.h>
>  #include "sched.h"
>  
> @@ -27,6 +28,7 @@ struct housekeeping {
>  };
>  
>  static struct housekeeping housekeeping;
> +static __initdata LLIST_HEAD(memblock_freelist);
>  
>  bool housekeeping_enabled(enum hk_type type)
>  {
> @@ -189,10 +191,22 @@ void __init housekeeping_init(void)
>  		WARN_ON_ONCE(cpumask_empty(omask));
>  		cpumask_copy(nmask, omask);
>  		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> -		memblock_free(omask, cpumask_size());
> +		__llist_add((struct llist_node *)omask, &memblock_freelist);

This cast is somewhat concerning. I think I see why it's needed. Wrapping
it in a proper struct would require more allocating and freeing and
make the problem worse. It should work though.


Reviewed-by: Phil Auld <pauld@redhat.com>




Cheers,
Phil


>  	}
>  }
>  
> +static int __init housekeeping_late_init(void)
> +{
> +	struct llist_node *llnode, *pos, *t;
> +
> +	/* Free allocated memblock memory, if any */
> +	llnode = __llist_del_all(&memblock_freelist);
> +	llist_for_each_safe(pos, t, llnode)
> +		memblock_free(pos, cpumask_size());
> +	return 0;
> +}
> +pure_initcall(housekeeping_late_init);
> +
>  static void __init housekeeping_setup_type(enum hk_type type,
>  					   cpumask_var_t housekeeping_staging)
>  {
> -- 
> 2.54.0
> 
> 

-- 



* Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
  2026-07-01 14:13 ` Phil Auld
@ 2026-07-01 14:25   ` Phil Auld
  2026-07-01 14:56     ` Frederic Weisbecker
  2026-07-01 19:03     ` Waiman Long
  0 siblings, 2 replies; 7+ messages in thread
From: Phil Auld @ 2026-07-01 14:25 UTC
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	linux-kernel

On Wed, Jul 01, 2026 at 10:13:57AM -0400 Phil Auld wrote:
> Hi Waiman,
> 
> On Thu, Jun 04, 2026 at 02:24:40PM -0400 Waiman Long wrote:
> > When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
> > warn when freeing reserved memory before memory map is initialized"),
> > the following warning was hit when there was a "nohz_full" kernel boot
> > parameter.
> > 
> >   Cannot free reserved memory because of deferred initialization of the memory map
> >   WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
> >     :
> >   Call Trace:
> >    <TASK>
> >    memblock_phys_free+0xcb/0x100
> >    housekeeping_init+0x14c/0x170
> >    start_kernel+0x207/0x450
> >    x86_64_start_reservations+0x24/0x30
> >    x86_64_start_kernel+0xda/0xe0
> >    common_startup_64+0x13e/0x141
> >    </TASK>
> > 
> > IOW, we shouldn't free memblock allocated memory so early
> > in the boot process when memory map isn't fully initialized in
> > deferred_init_memmap().
> > 
> > Fix it by saving the housekeeping cpumask memblock memory to
> > be freed into a free list in housekeeping_init() and add a new
> > housekeeping_late_init() helper to defer the actual freeing of memblock
> > memory to when initcall's are being processed. The non-atomic version
> > of the llist APIs are used as there is no contention.
> > 
> > This commit also depends on the presence of commit 7c2eee9c1367
> > ("memblock: don't touch memblock arrays when memblock_free() is called
> > late") to prevent a KASAN UAF bug report [1].
> > 
> >  [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
> > 
> > Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
> > Signed-off-by: Waiman Long <longman@redhat.com>
> > ---
> >  kernel/sched/isolation.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> >  [v3.1] Add __initdata to memblock_freelist
> > 
> > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > index ef152d401fe2..156025ef81b7 100644
> > --- a/kernel/sched/isolation.c
> > +++ b/kernel/sched/isolation.c
> > @@ -8,6 +8,7 @@
> >   *
> >   */
> >  #include <linux/sched/isolation.h>
> > +#include <linux/llist.h>
> >  #include <linux/pci.h>
> >  #include "sched.h"
> >  
> > @@ -27,6 +28,7 @@ struct housekeeping {
> >  };
> >  
> >  static struct housekeeping housekeeping;
> > +static __initdata LLIST_HEAD(memblock_freelist);
> >  
> >  bool housekeeping_enabled(enum hk_type type)
> >  {
> > @@ -189,10 +191,22 @@ void __init housekeeping_init(void)
> >  		WARN_ON_ONCE(cpumask_empty(omask));
> >  		cpumask_copy(nmask, omask);
> >  		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> > -		memblock_free(omask, cpumask_size());
> > +		__llist_add((struct llist_node *)omask, &memblock_freelist);
> 
> This cast is somewhat concerning. I think I see why it's needed. Wrapping
> it in a proper struct would require more allocating and freeing and
> make the problem worse. It should work though.
> 
>

Fwiw, opencode/sonnet suggested a comment like this:

/*
 * We can't allocate wrapper structs from memblock as they'd need
 * deferred freeing too. Instead, reuse the cpumask memory itself
 * as llist nodes. This is safe because:
 * - cpumask_size() >= sizeof(struct llist_node)
 * - Memory is properly aligned (SMP_CACHE_BYTES)
 * - The cpumask is never accessed after being added to the list
 */          

... which may be overkill :)
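
To make the idea in that comment concrete, here is a minimal userspace sketch of a deferred-free list that reuses each buffer's own memory as the list link. It is only an illustration under stated assumptions, not the kernel code: struct node, defer_free(), free_deferred(), demo() and BUF_SIZE are hypothetical names, and malloc()/free() stand in for the memblock allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct llist_node. */
struct node {
	struct node *next;
};

/* Head of the deferred-free list; NULL means empty. */
static struct node *deferred_head;

/* Buffer size used in this sketch; it must be able to hold a struct node. */
#define BUF_SIZE 64
_Static_assert(BUF_SIZE >= sizeof(struct node), "buffer too small for a node");

/* Queue a no-longer-used buffer; its first bytes become the list link. */
static void defer_free(void *buf)
{
	struct node *n = buf;

	n->next = deferred_head;
	deferred_head = n;
}

/* Release every queued buffer and return how many were freed. */
static int free_deferred(void)
{
	struct node *pos = deferred_head;
	int count = 0;

	deferred_head = NULL;
	while (pos) {
		struct node *next = pos->next;	/* read the link before freeing */

		free(pos);
		pos = next;
		count++;
	}
	return count;
}

/* Allocate three buffers, queue them, then free them in one later pass. */
static int demo(void)
{
	for (int i = 0; i < 3; i++) {
		void *buf = malloc(BUF_SIZE);

		assert(buf != NULL);
		defer_free(buf);
	}
	return free_deferred();
}
```

Calling demo() queues three buffers and frees them in a single later pass. No extra allocation is needed for the list itself, which is the property the cast in the patch relies on; the only requirement is that nothing reads a buffer's old contents once it has been queued.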



Cheers,
Phil


> Reviewed-by: Phil Auld <pauld@redhat.com>
> 
> 
> 
> 
> Cheers,
> Phil
> 
> 
> >  	}
> >  }
> >  
> > +static int __init housekeeping_late_init(void)
> > +{
> > +	struct llist_node *llnode, *pos, *t;
> > +
> > +	/* Free allocated memblock memory, if any */
> > +	llnode = __llist_del_all(&memblock_freelist);
> > +	llist_for_each_safe(pos, t, llnode)
> > +		memblock_free(pos, cpumask_size());
> > +	return 0;
> > +}
> > +pure_initcall(housekeeping_late_init);
> > +
> >  static void __init housekeeping_setup_type(enum hk_type type,
> >  					   cpumask_var_t housekeeping_staging)
> >  {
> > -- 
> > 2.54.0
> > 
> > 
> 
> -- 
> 
> 

-- 



* Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
  2026-07-01 14:25   ` Phil Auld
@ 2026-07-01 14:56     ` Frederic Weisbecker
  2026-07-01 19:03     ` Waiman Long
  1 sibling, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2026-07-01 14:56 UTC
  To: Phil Auld
  Cc: Waiman Long, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, K Prateek Nayak, linux-kernel

On Wed, Jul 01, 2026 at 10:25:59AM -0400, Phil Auld wrote:
> On Wed, Jul 01, 2026 at 10:13:57AM -0400 Phil Auld wrote:
> > Hi Waiman,
> > 
> > On Thu, Jun 04, 2026 at 02:24:40PM -0400 Waiman Long wrote:
> > > When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
> > > warn when freeing reserved memory before memory map is initialized"),
> > > the following warning was hit when there was a "nohz_full" kernel boot
> > > parameter.
> > > 
> > >   Cannot free reserved memory because of deferred initialization of the memory map
> > >   WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
> > >     :
> > >   Call Trace:
> > >    <TASK>
> > >    memblock_phys_free+0xcb/0x100
> > >    housekeeping_init+0x14c/0x170
> > >    start_kernel+0x207/0x450
> > >    x86_64_start_reservations+0x24/0x30
> > >    x86_64_start_kernel+0xda/0xe0
> > >    common_startup_64+0x13e/0x141
> > >    </TASK>
> > > 
> > > IOW, we shouldn't free memblock allocated memory so early
> > > in the boot process when memory map isn't fully initialized in
> > > deferred_init_memmap().
> > > 
> > > Fix it by saving the housekeeping cpumask memblock memory to
> > > be freed into a free list in housekeeping_init() and add a new
> > > housekeeping_late_init() helper to defer the actual freeing of memblock
> > > memory to when initcall's are being processed. The non-atomic version
> > > of the llist APIs are used as there is no contention.
> > > 
> > > This commit also depends on the presence of commit 7c2eee9c1367
> > > ("memblock: don't touch memblock arrays when memblock_free() is called
> > > late") to prevent a KASAN UAF bug report [1].
> > > 
> > >  [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
> > > 
> > > Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > >  kernel/sched/isolation.c | 16 +++++++++++++++-
> > >  1 file changed, 15 insertions(+), 1 deletion(-)
> > > 
> > >  [v3.1] Add __initdata to memblock_freelist
> > > 
> > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > index ef152d401fe2..156025ef81b7 100644
> > > --- a/kernel/sched/isolation.c
> > > +++ b/kernel/sched/isolation.c
> > > @@ -8,6 +8,7 @@
> > >   *
> > >   */
> > >  #include <linux/sched/isolation.h>
> > > +#include <linux/llist.h>
> > >  #include <linux/pci.h>
> > >  #include "sched.h"
> > >  
> > > @@ -27,6 +28,7 @@ struct housekeeping {
> > >  };
> > >  
> > >  static struct housekeeping housekeeping;
> > > +static __initdata LLIST_HEAD(memblock_freelist);
> > >  
> > >  bool housekeeping_enabled(enum hk_type type)
> > >  {
> > > @@ -189,10 +191,22 @@ void __init housekeeping_init(void)
> > >  		WARN_ON_ONCE(cpumask_empty(omask));
> > >  		cpumask_copy(nmask, omask);
> > >  		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> > > -		memblock_free(omask, cpumask_size());
> > > +		__llist_add((struct llist_node *)omask, &memblock_freelist);
> > 
> > This cast is somewhat concerning. I think I see why it's needed. Wrapping
> > it in a proper struct would require more allocating and freeing and
> > make the problem worse. It should work though.
> > 
> >
> 
> Fwiw, opencode/sonnet suggested a comment like this:
> 
> /*
>  * We can't allocate wrapper structs from memblock as they'd need
>  * deferred freeing too. Instead, reuse the cpumask memory itself
>  * as llist nodes. This is safe because:
>  * - cpumask_size() >= sizeof(struct llist_node)
>  * - Memory is properly aligned (SMP_CACHE_BYTES)
>  * - The cpumask is never accessed after being added to the list
>  */          
> 
> ... which may be overkill :)

It tells the truth, just a bit too much :-)

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
  2026-07-01 14:25   ` Phil Auld
  2026-07-01 14:56     ` Frederic Weisbecker
@ 2026-07-01 19:03     ` Waiman Long
  1 sibling, 0 replies; 7+ messages in thread
From: Waiman Long @ 2026-07-01 19:03 UTC
  To: Phil Auld
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Frederic Weisbecker,
	linux-kernel


On 7/1/26 10:25 AM, Phil Auld wrote:
> On Wed, Jul 01, 2026 at 10:13:57AM -0400 Phil Auld wrote:
>> Hi Waiman,
>>
>> On Thu, Jun 04, 2026 at 02:24:40PM -0400 Waiman Long wrote:
>>> When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
>>> warn when freeing reserved memory before memory map is initialized"),
>>> the following warning was hit when there was a "nohz_full" kernel boot
>>> parameter.
>>>
>>>    Cannot free reserved memory because of deferred initialization of the memory map
>>>    WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
>>>      :
>>>    Call Trace:
>>>     <TASK>
>>>     memblock_phys_free+0xcb/0x100
>>>     housekeeping_init+0x14c/0x170
>>>     start_kernel+0x207/0x450
>>>     x86_64_start_reservations+0x24/0x30
>>>     x86_64_start_kernel+0xda/0xe0
>>>     common_startup_64+0x13e/0x141
>>>     </TASK>
>>>
>>> IOW, we shouldn't free memblock allocated memory so early
>>> in the boot process when memory map isn't fully initialized in
>>> deferred_init_memmap().
>>>
>>> Fix it by saving the housekeeping cpumask memblock memory to
>>> be freed into a free list in housekeeping_init() and add a new
>>> housekeeping_late_init() helper to defer the actual freeing of memblock
>>> memory to when initcall's are being processed. The non-atomic version
>>> of the llist APIs are used as there is no contention.
>>>
>>> This commit also depends on the presence of commit 7c2eee9c1367
>>> ("memblock: don't touch memblock arrays when memblock_free() is called
>>> late") to prevent a KASAN UAF bug report [1].
>>>
>>>   [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
>>>
>>> Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/sched/isolation.c | 16 +++++++++++++++-
>>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>>   [v3.1] Add __initdata to memblock_freelist
>>>
>>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>>> index ef152d401fe2..156025ef81b7 100644
>>> --- a/kernel/sched/isolation.c
>>> +++ b/kernel/sched/isolation.c
>>> @@ -8,6 +8,7 @@
>>>    *
>>>    */
>>>   #include <linux/sched/isolation.h>
>>> +#include <linux/llist.h>
>>>   #include <linux/pci.h>
>>>   #include "sched.h"
>>>   
>>> @@ -27,6 +28,7 @@ struct housekeeping {
>>>   };
>>>   
>>>   static struct housekeeping housekeeping;
>>> +static __initdata LLIST_HEAD(memblock_freelist);
>>>   
>>>   bool housekeeping_enabled(enum hk_type type)
>>>   {
>>> @@ -189,10 +191,22 @@ void __init housekeeping_init(void)
>>>   		WARN_ON_ONCE(cpumask_empty(omask));
>>>   		cpumask_copy(nmask, omask);
>>>   		RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
>>> -		memblock_free(omask, cpumask_size());
>>> +		__llist_add((struct llist_node *)omask, &memblock_freelist);
>> This cast is somewhat concerning. I think I see why it's needed. Wrapping
>> it in a proper struct would require more allocating and freeing and
>> make the problem worse. It should work though.
>>
>>
> Fwiw, opencode/sonnet suggested a comment like this:
>
> /*
>   * We can't allocate wrapper structs from memblock as they'd need
>   * deferred freeing too. Instead, reuse the cpumask memory itself
>   * as llist nodes. This is safe because:
>   * - cpumask_size() >= sizeof(struct llist_node)

I know that holds, as the smallest allocation size is sizeof(long), which
is the size of an llist_node. I should have mentioned that either in the
commit log or in a comment.
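
As a rough illustration of that size argument, the arithmetic can be sketched in userspace as below. This assumes the mask is stored as an array of long and that long and pointers have the same size, as on the ABIs Linux supports (it would not hold on, say, 64-bit Windows). struct node, BITS_PER_WORD, mask_bytes() and mask_fits_node() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for struct llist_node: a single next pointer. */
struct node {
	struct node *next;
};

/* Number of bits in one long, the unit the mask is stored in. */
#define BITS_PER_WORD (sizeof(long) * CHAR_BIT)

/* Bytes needed for an nbits-wide bitmap stored as an array of long. */
static size_t mask_bytes(size_t nbits)
{
	return (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD * sizeof(long);
}

/* True if a bitmap of nbits bits is large enough to hold one list node. */
static int mask_fits_node(size_t nbits)
{
	return mask_bytes(nbits) >= sizeof(struct node);
}
```

Even a one-bit mask rounds up to a whole long, so under the assumption above the smallest possible mask is exactly large enough to hold the single pointer that makes up a list node.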

Cheers,
Longman

>   * - Memory is properly aligned (SMP_CACHE_BYTES)
>   * - The cpumask is never accessed after being added to the list
>   */
>
> ... which may be overkill :)
>
>
>
> Cheers,
> Phil
>
>
>> Reviewed-by: Phil Auld <pauld@redhat.com>
>>
>>
>>
>>
>> Cheers,
>> Phil
>>
>>
>>>   	}
>>>   }
>>>   
>>> +static int __init housekeeping_late_init(void)
>>> +{
>>> +	struct llist_node *llnode, *pos, *t;
>>> +
>>> +	/* Free allocated memblock memory, if any */
>>> +	llnode = __llist_del_all(&memblock_freelist);
>>> +	llist_for_each_safe(pos, t, llnode)
>>> +		memblock_free(pos, cpumask_size());
>>> +	return 0;
>>> +}
>>> +pure_initcall(housekeeping_late_init);
>>> +
>>>   static void __init housekeeping_setup_type(enum hk_type type,
>>>   					   cpumask_var_t housekeeping_staging)
>>>   {
>>> -- 
>>> 2.54.0
>>>
>>>
>> -- 
>>
>>


