Date: Wed, 1 Jul 2026 16:56:30 +0200
From: Frederic Weisbecker
To: Phil Auld
Cc: Waiman Long, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, K Prateek Nayak, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3.1] sched/isolation: Defer freeing of cpumask memblock memory to initcall
References: <20260604182440.430811-1-longman@redhat.com> <20260701141357.GA156809@pauld.westford.csb> <20260701142559.GB156809@pauld.westford.csb>
In-Reply-To: <20260701142559.GB156809@pauld.westford.csb>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 01, 2026 at 10:25:59AM -0400, Phil Auld wrote:
> On Wed, Jul 01, 2026 at 10:13:57AM -0400 Phil Auld wrote:
> > Hi Waiman,
> >
> > On Thu, Jun 04, 2026 at 02:24:40PM -0400 Waiman Long wrote:
> > > When testing a linux-next kernel with commit 59bd1d914bb5 ("memblock:
> > > warn when freeing reserved memory before memory map is initialized"),
> > > the following warning was hit when there was a "nohz_full" kernel boot
> > > parameter.
> > >
> > > Cannot free reserved memory because of deferred initialization of the memory map
> > > WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
> > > :
> > > Call Trace:
> > >
> > > memblock_phys_free+0xcb/0x100
> > > housekeeping_init+0x14c/0x170
> > > start_kernel+0x207/0x450
> > > x86_64_start_reservations+0x24/0x30
> > > x86_64_start_kernel+0xda/0xe0
> > > common_startup_64+0x13e/0x141
> > >
> > >
> > > IOW, we shouldn't free memblock allocated memory so early
> > > in the boot process when memory map isn't fully initialized in
> > > deferred_init_memmap().
> > >
> > > Fix it by saving the housekeeping cpumask memblock memory to
> > > be freed into a free list in housekeeping_init() and add a new
> > > housekeeping_late_init() helper to defer the actual freeing of memblock
> > > memory to when initcall's are being processed. The non-atomic version
> > > of the llist APIs are used as there is no contention.
> > >
> > > This commit also depends on the presence of commit 7c2eee9c1367
> > > ("memblock: don't touch memblock arrays when memblock_free() is called
> > > late") to prevent a KASAN UAF bug report [1].
> > >
> > > [1] https://lore.kernel.org/lkml/20260505051821.1107133-1-longman@redhat.com/
> > >
> > > Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
> > > Signed-off-by: Waiman Long
> > > ---
> > > kernel/sched/isolation.c | 16 +++++++++++++++-
> > > 1 file changed, 15 insertions(+), 1 deletion(-)
> > >
> > > [v3.1] Add __initdata to memblock_freelist
> > >
> > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > index ef152d401fe2..156025ef81b7 100644
> > > --- a/kernel/sched/isolation.c
> > > +++ b/kernel/sched/isolation.c
> > > @@ -8,6 +8,7 @@
> > > *
> > > */
> > > #include
> > > +#include
> > > #include
> > > #include "sched.h"
> > >
> > > @@ -27,6 +28,7 @@ struct housekeeping {
> > > };
> > >
> > > static struct housekeeping housekeeping;
> > > +static __initdata LLIST_HEAD(memblock_freelist);
> > >
> > > bool housekeeping_enabled(enum hk_type type)
> > > {
> > > @@ -189,10 +191,22 @@ void __init housekeeping_init(void)
> > > WARN_ON_ONCE(cpumask_empty(omask));
> > > cpumask_copy(nmask, omask);
> > > RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> > > - memblock_free(omask, cpumask_size());
> > > + __llist_add((struct llist_node *)omask, &memblock_freelist);
> >
> > This cast is somewhat concerning. I think I see why it's needed. Wrapping
> > it in a proper struct would require more allocating and freeing and
> > make the problem worse. It should work though.
> >
> >
>
> Fwiw, opencode/sonnet suggested a comment like this:
>
> /*
>  * We can't allocate wrapper structs from memblock as they'd need
>  * deferred freeing too. Instead, reuse the cpumask memory itself
>  * as llist nodes. This is safe because:
>  * - cpumask_size() >= sizeof(struct llist_node)
>  * - Memory is properly aligned (SMP_CACHE_BYTES)
>  * - The cpumask is never accessed after being added to the list
>  */
>
> ... which may be overkill :)

It tells the truth, just a bit too much :-)

-- 
Frederic Weisbecker
SUSE Labs