From: Thomas Gleixner
To: Dave Hansen, Zach O'Keefe
Cc: "H. Peter Anvin", David Stevens, Pasha Tatashin, Linus Walleij,
 Will Deacon, Quentin Perret, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Andy Lutomirski, Xin Li, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
 Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki,
 Kees Cook, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 00/13] Dynamic Kernel Stacks
References: <20260424191456.2679717-1-stevensd@google.com>
 <6369e5ce-74e3-4c68-8053-d7d7d21b6955@zytor.com>
Date: Fri, 19 Jun 2026 14:45:31 +0200
Message-ID: <87pl1md7h0.ffs@fw13>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 18 2026 at 11:53, Dave Hansen wrote:
> On 6/18/26 07:50, Zach O'Keefe wrote:
>> Overall, are there any particular painpoints you'd like to see flushed
>> out, first?
>
> Handling exceptions in the kernel is hard. Period. That's the pain point.
> Just look at NMIs, #VC, #MC and the rest of that mess. Just look at how
> we've moved away from ever taking random page faults in the kernel. Or,
> heck, randomly taking faults at *all*. We've concentrated them in very
> specific places, not in general code.
>
> Now you're arguing that the kernel can pretty much take a fault *AND*
> allocate memory reliably at any point*.
>
> I just don't see the collateral in this series to justify that claim.

There is none because it's simply impossible to guarantee. When reading
through the series, even a CPU hotplug operation happily continues with
success when the stack page cache of the upcoming CPU can't be
filled....

> The NMI entry code is a disaster because NMIs can happen anywhere. The
> #VC code is a disaster because #VCs can happen anywhere. Once #PF can
> happen anywhere*, why won't #PF become a disaster?

It's already a disaster. See kvm_handle_async_pf() and the cute issues
vs. taking a #PF in NMI or some other IST handler.

> It would be a completely different story if there was a track record of
> finding and fixing bugs in the x86 entry code from the authors of this
> series. But I don't think I've ever seen a single email from your folks
> before this, much less a review tag or a patch. I'd be much happier if
> you got Andy L's blessing on this, for example.
>
>> How would you like to proceed? Would explicitly marking this as an
>> experimental config, in the interim, be more attractive?
> No.
>
> The enemy here is complexity. *Maintenance* complexity. Being able to
> compile out some of the complexity helps with debugging. But it doesn't
> help maintaining the code.

Correct.

Aside from that, the part which worries me most is the IDT hackery.
That's fragile as hell and full of unvalidated assumptions. Reading
"should not happen" several times in a changelog doesn't make me more
confident.

  "It is possible for #MCE to occur on the #PF IST stack, but the #MCE
   handler shouldn't generate new #PFs. The reentrancy check on the #PF
   stack will trigger if any recoverable #MCEs do generate #PFs - if
   there are actually reports of it happening, we can address it then."

Seriously? We don't wait until the report comes in because the report
won't even happen in the worst case:

  #PF on IST
  ...
  cmp 0, reentrance
  jne abort
        #MC
        ...
        #PF rewinds #PF IST
        cmp 0, reentrance
        jne abort   <- Not taken because #MC happened before it could
                       be set.

IST is fundamentally not suitable for this and I'm sure there are more
holes in this.

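To make that window concrete, the sequence above can be played through
in a toy model. This is only a sketch under stated assumptions: the
names PfIstStack, pf_handler and mc_handler are made up for illustration
and correspond to nothing in the series or in the kernel. All it models
is that every IST entry starts at the same top-of-stack slot and that
the reentrance flag is checked before it is set.

```python
from dataclasses import dataclass, field


@dataclass
class PfIstStack:
    """Hypothetical model of a #PF IST stack: one frame slot plus the flag."""
    frame: str = ""        # whatever was last pushed at the top of the stack
    reentrance: int = 0    # checked first, set later: that gap is the window
    events: list = field(default_factory=list)


def pf_handler(stack, name, mc_before_set=False, mc_after_set=False):
    stack.frame = name                 # IST entry always starts at the top
    if stack.reentrance != 0:          # cmp 0, reentrance ; jne abort
        stack.events.append(f"{name}: abort")
        return "abort"
    if mc_before_set:
        mc_handler(stack)              # #MC lands before the flag is set
    stack.reentrance = 1
    if mc_after_set:
        mc_handler(stack)              # #MC lands after the flag is set
    # The outer frame survives only if nobody re-entered the IST stack.
    result = "ok" if stack.frame == name else "frame clobbered"
    stack.events.append(f"{name}: {result}")
    stack.reentrance = 0
    return result


def mc_handler(stack):
    """#MC runs on its own stack, but a #PF inside it reuses the #PF IST."""
    pf_handler(stack, "nested #PF")


if __name__ == "__main__":
    for window in ("mc_before_set", "mc_after_set"):
        stack = PfIstStack()
        pf_handler(stack, "outer #PF", **{window: True})
        print(window, stack.events)
```

With the #MC after the flag is set, the nested #PF hits the abort path.
With the #MC inside the window, nothing aborts, the nested #PF reports
success and the outer frame is silently overwritten.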
I haven't looked at the FRED side of affairs yet in detail, but the
handwavy explanation about external interrupts having to be moved to
stack level 1 and unconditionally bounced back does not really make it
appealing. I agree that chapter 8.3.4 in the SDM volume 3 is not really
helpful, but papering over the problem without understanding the root
cause is not cutting it. If it's a genuine FRED hardware issue, then
this needs to be understood and documented.

The x86 folks have spent a lot of time to make the horrific x86
interrupt and exception handling solid and therefore have zero interest
in dealing with the fallout of something based on "shouldn't happen"
assumptions. Either it can prove correctness under all circumstances or
not.

I understand the "save tons of memory across a fleet" argument, but a
large fleet is also a guarantee to trigger all the "should not happen"
and "improbable" issues which are gracefully handwaved away. That's a
truly bad tradeoff as it ends up in non-decodable bug reports. What's
worse, they have to be handled by the maintainers and not necessarily by
those who implemented it.

Thanks,

        tglx