From: Frederic Weisbecker <frederic@kernel.org>
To: Phil Auld <pauld@redhat.com>
Cc: Waiman Long <longman@redhat.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
Date: Wed, 1 Jul 2026 16:56:30 +0200
Message-ID: <akUqniHncH9d8EBH@localhost.localdomain>
In-Reply-To: <20260701142559.GB156809@pauld.westford.csb>
On Wed, Jul 01, 2026 at 10:25:59AM -0400, Phil Auld wrote:
> On Wed, Jul 01, 2026 at 10:13:57AM -0400 Phil Auld wrote:
> > Hi Waiman,
> >
> > On Thu, Jun 04, 2026 at 02:24:40PM -0400 Waiman Long wrote:
> > > When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
> > > warn when freeing reserved memory before memory map is initialized"),
> > > the following warning was hit when there was a "nohz_full" kernel boot
> > > parameter.
> > >
> > > Cannot free reserved memory because of deferred initialization of the memory map
> > > WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
> > > :
> > > Call Trace:
> > > <TASK>
> > > memblock_phys_free+0xcb/0x100
> > > housekeeping_init+0x14c/0x170
> > > start_kernel+0x207/0x450
> > > x86_64_start_reservations+0x24/0x30
> > > x86_64_start_kernel+0xda/0xe0
> > > common_startup_64+0x13e/0x141
> > > </TASK>
> > >
> > > IOW, we shouldn't free memblock allocated memory so early
> > > in the boot process when memory map isn't fully initialized in
> > > deferred_init_memmap().
> > >
> > > Fix it by saving the housekeeping cpumask memblock memory to
> > > be freed into a free list in housekeeping_init() and add a new
> > > housekeeping_late_init() helper to defer the actual freeing of memblock
> > > memory to when initcall's are being processed. The non-atomic version
> > > of the llist APIs are used as there is no contention.
> > >
> > > This commit also depends on the presence of commit 7c2eee9c1367
> > > ("memblock: don't touch memblock arrays when memblock_free() is called
> > > late") to prevent a KASAN UAF bug report [1].
> > >
> > > [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
> > >
> > > Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > > kernel/sched/isolation.c | 16 +++++++++++++++-
> > > 1 file changed, 15 insertions(+), 1 deletion(-)
> > >
> > > [v3.1] Add __initdata to memblock_freelist
> > >
> > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > index ef152d401fe2..156025ef81b7 100644
> > > --- a/kernel/sched/isolation.c
> > > +++ b/kernel/sched/isolation.c
> > > @@ -8,6 +8,7 @@
> > > *
> > > */
> > > #include <linux/sched/isolation.h>
> > > +#include <linux/llist.h>
> > > #include <linux/pci.h>
> > > #include "sched.h"
> > >
> > > @@ -27,6 +28,7 @@ struct housekeeping {
> > > };
> > >
> > > static struct housekeeping housekeeping;
> > > +static __initdata LLIST_HEAD(memblock_freelist);
> > >
> > > bool housekeeping_enabled(enum hk_type type)
> > > {
> > > @@ -189,10 +191,22 @@ void __init housekeeping_init(void)
> > > WARN_ON_ONCE(cpumask_empty(omask));
> > > cpumask_copy(nmask, omask);
> > > RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> > > - memblock_free(omask, cpumask_size());
> > > + __llist_add((struct llist_node *)omask, &memblock_freelist);
> >
> > This cast is somewhat concerning. I think I see why it's needed. Wrapping
> > it in a proper struct would require more allocating and freeing and
> > make the problem worse. It should work though.
> >
> >
>
> Fwiw, opencode/sonnet suggested a comment like this:
>
> /*
> * We can't allocate wrapper structs from memblock as they'd need
> * deferred freeing too. Instead, reuse the cpumask memory itself
> * as llist nodes. This is safe because:
> * - cpumask_size() >= sizeof(struct llist_node)
> * - Memory is properly aligned (SMP_CACHE_BYTES)
> * - The cpumask is never accessed after being added to the list
> */
>
> ... which may be overkill :)
It tells the truth, just a bit too much :-)
--
Frederic Weisbecker
SUSE Labs