From: Thomas Gleixner
To: Zach O'Keefe
Cc: Dave Hansen, "H. Peter Anvin", David Stevens, Pasha Tatashin,
    Linus Walleij, Will Deacon, Quentin Perret, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86@kernel.org, Andy Lutomirski,
    Xin Li, Peter Zijlstra, Andrew Morton, David Hildenbrand,
    Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
    Suren Baghdasaryan, Michal Hocko, Uladzislau Rezki, Kees Cook,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 00/13] Dynamic Kernel Stacks
References: <20260424191456.2679717-1-stevensd@google.com>
    <6369e5ce-74e3-4c68-8053-d7d7d21b6955@zytor.com>
    <87pl1md7h0.ffs@fw13>
Date: Fri, 19 Jun 2026 23:59:19 +0200
Message-ID: <87qzm2b39k.ffs@fw13>
X-Mailing-List: linux-kernel@vger.kernel.org

Zach!

On Fri, Jun 19 2026 at 12:20, Zach O'Keefe wrote:
> While it seems common opinion that the IST-based solution is fragile,
> what of FRED? It seems like this is exactly the kind of support needed
> to avoid some of the aforementioned sw "mess" in various x86 exception
> handling paths. I agree that it's less-than-ideal that we are forced
> to downgrade exception levels in the common #PF case, but is that an
> unsurmountable problem? Pardon my ignorance.

The #PF path is considered performance critical. But how much the
downgrade matters needs actual numbers to analyze under various workload
scenarios. I've not seen numbers to that effect anywhere.

The only numbers provided are marketing material about the memory
savings on a freshly booted idle machine. There are _zero_ numbers about
the actual real world savings, but claims about the PETABYTE savings
possible. Seriously?

> Lastly, I just want to clarify what folks have meant by "extraordinary
> claims" or "evidence". Aside from the above discussion on FRED
> exception handling, the "only" other part of this is the allocation.

Clearly anything which is explained with "shouldn't happen" and
"unlikely". At cloud scale nothing is unlikely anymore. That's simply
the reality of statistical math.

As I pointed out before, the same applies to the unexplained
upgrade/downgrade game with external interrupts. Such issues cannot be
papered over without understanding the root cause because, from
decades-long experience, they inevitably come back some time down the
road. Cloud scale even guarantees that.

> Are people concerned about memory unavailability, deadlocking-type
> issues, or something else? We have considerable design freedom here to
> avoid certain classes of unreliability, but, barring any clever
> tricks, I don't know if the allocation can be guaranteed to succeed in
> all conceivable circumstances. I want to ensure that reality does not
> present a hard blocker.

First of all, the failure scenario has to be clearly defined. Right now,
if I'm reading the patches correctly, this can simply end up killing the
wrong tasks/processes just because an OOM situation results in a
depletion of the per CPU cache, and the very wrong task which runs into
the deep call stack situation ends up in the creek without a paddle.

That you even fail to abort a CPU bringup when the allocation of the per
CPU stack page cache fails makes it pretty clear that exactly zero
thought has been spent on this problem.

Why the heck does this cache refill call have to be unconditionally in
__schedule(), where preemption is disabled and therefore GFP_ATOMIC is
mandatory?

I know "Works for me" (most of the time).

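To make the "wrong task" failure scenario above concrete, here is a toy
user-space model. It is a sketch under stated assumptions and not code
from the series: every name in it (toy_cpu, toy_alloc_atomic(),
toy_refill(), toy_stack_fault(), toy_run()), the cache size of two pages
and the fault pattern are hypothetical. It only models a best-effort
refill at context switch with an allocator which cannot sleep or reclaim.

```c
#include <assert.h>
#include <stdbool.h>

/* All names and numbers are hypothetical, not taken from the patches. */
#define TOY_CACHE_PAGES 2	/* assumed per-CPU stack page cache size */
#define TOY_NTASKS      3

struct toy_cpu {
	int cached;		/* stack pages sitting in the per-CPU cache */
	int free_pages;		/* pages the toy allocator can still hand out */
};

/* Models an atomic allocation: no sleeping, no reclaim, it just fails. */
static bool toy_alloc_atomic(struct toy_cpu *cpu)
{
	if (cpu->free_pages == 0)
		return false;
	cpu->free_pages--;
	return true;
}

/* Models the refill at context switch: best effort, failure is ignored. */
static void toy_refill(struct toy_cpu *cpu)
{
	while (cpu->cached < TOY_CACHE_PAGES && toy_alloc_atomic(cpu))
		cpu->cached++;
}

/* Models a stack fault: false means the faulting task cannot be served. */
static bool toy_stack_fault(struct toy_cpu *cpu)
{
	assert(cpu->cached >= 0);
	if (cpu->cached == 0)
		return false;
	cpu->cached--;
	return true;
}

/*
 * Run the tasks back to back on one CPU. Task 0 needs two extra stack
 * pages, tasks 1 and 2 need one each. Returns the index of the first
 * task whose stack fault cannot be served, or -1 if all of them survive.
 */
static int toy_run(int free_pages)
{
	static const int faults[TOY_NTASKS] = { 2, 1, 1 };
	struct toy_cpu cpu = { .cached = 0, .free_pages = free_pages };

	for (int t = 0; t < TOY_NTASKS; t++) {
		toy_refill(&cpu);	/* "schedule in" task t */
		for (int f = 0; f < faults[t]; f++) {
			if (!toy_stack_fault(&cpu))
				return t;
		}
	}
	return -1;
}
```

With three free pages toy_run(3) returns 2: the task which needed a
single page is the one left without a stack, although task 0 consumed
most of the pages. With four free pages toy_run(4) returns -1. The
victim is determined by scheduling order and allocator state, not by who
caused the shortage, which is why the failure scenario needs a clear
definition first.
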
And just because I was looking at the patch in question I found this
other insanity:

> +	/*
> +	 * Most likely we faulted in the page right next to the last mapped
> +	 * page in the stack, however, it is possible (but very unlikely) that
> +	 * the faulted page is actually skips some pages in the stack. Make sure
> +	 * we do not create more than one holes in the stack, and map every
> +	 * page between the current fault address and the last page that is
> +	 * mapped in the stack.
> +	 */

Can anyone with a sane mind and the most minimal understanding of the
kernel's inner workings explain to me how the kernel can skip "some
pages" on the stack? If the kernel skips a whole page or more then there
is a serious bug somewhere.

I might be missing something, but again the "very unlikely" wording
which handwaves about it is just disgustingly useless.

I disagree with Dave on the RFC status of this series. It's not even
close to RFC, it's at PoC status.

Thanks,

        tglx