From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5325C233958
	for <linux-kernel@vger.kernel.org>; Fri, 19 Jun 2026 21:59:23 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1781906364; cv=none; b=tN3URpZsSXNrowp5xI4gJ8oj4DfSoK9rTOVtf4cUZ4L3ojyeJ+Ub/d1PFV24m2Tw0b8s2oVK6hKtmFLsyw7SP756sa1AaEP+t7Va+G0tW+wuTs3yw3ZghqIlGquqNJfr/yIf/0mt90llh/Fz+zwx2WO96vyybYqEMA2uyBWC8GI=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1781906364; c=relaxed/simple;
	bh=gZ9r4f8lqICiPs/txGtOdVjHyaFEuR+sk3GmAVIs7CI=;
	h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID:
	 MIME-Version:Content-Type; b=X8WLFAdcRl3t9/tnnkiFJTKra7KCKXACdfKcBmAeOguH0J2ifCnGQdJ2XV9PbjC/XQi75TmxmSqnFeawVcYpovn8zoAyP4qayBtTMQ+LyblUE5h+4i3HNfGBl7OwTIQxc4GEX58NXuG82W8nqA63wGxgThcXWwNY1r3KJLq5pfo=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mh6D9kHz; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mh6D9kHz"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 47EE91F000E9;
	Fri, 19 Jun 2026 21:59:21 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1781906362;
	bh=YE9FsAEawOx/IbPBCnSCpqjzjQQ6cvPqBHsq0OtQzPU=;
	h=From:To:Cc:Subject:In-Reply-To:References:Date;
	b=mh6D9kHzxJY0pa3LOENHD/3hzGrhGHyUo/Btagm7V30EUC1odfn+g7EPyOqel0Pa2
	 6KVvTR2tqr+Ir1pg9Gxz9+Q0q+6WuSawLH7UcOFW521X9PX2K8aUm2IqPqWVQoM61h
	 JNwqXTIQdfx8e/QRZluAQJ71M5uKSF7VetigNM9VyViDfvSbnkcJ5mMFHiWscMUtle
	 TWZF0GsS5DSpoC5ufaRDMCM6e4/ZdSmvJpCkg28UQmzafIo1kKBmHTNabrqnMnDpvh
	 PtqirfSKgFhVIykW8zogS1GjXfT38UR/tNztYwt6X77YCdsMp0oe1NCkTl6t1F9F+U
	 aP3F3GpJodxZw==
From: Thomas Gleixner <tglx@kernel.org>
To: Zach O'Keefe <zokeefe@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>, "H. Peter Anvin" <hpa@zytor.com>,
 David Stevens <stevensd@google.com>, Pasha Tatashin
 <pasha.tatashin@soleen.com>, Linus Walleij <linus.walleij@linaro.org>,
 Will Deacon <willdeacon@google.com>, Quentin Perret <qperret@google.com>,
 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave
 Hansen <dave.hansen@linux.intel.com>, x86@kernel.org, Andy Lutomirski
 <luto@kernel.org>, Xin Li <xin@zytor.com>, Peter Zijlstra
 <peterz@infradead.org>, Andrew Morton <akpm@linux-foundation.org>, David
 Hildenbrand <david@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>, "Liam R.
 Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@kernel.org>,
 Mike Rapoport <rppt@kernel.org>, Suren Baghdasaryan <surenb@google.com>,
 Michal Hocko <mhocko@suse.com>, Uladzislau Rezki <urezki@gmail.com>, Kees
 Cook <kees@kernel.org>, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 00/13] Dynamic Kernel Stacks
In-Reply-To: <CAAa6QmSHBDeY0G=_N1P4dAAH917J7jerfZrWDfDd8w=8jH8nVw@mail.gmail.com>
References: <20260424191456.2679717-1-stevensd@google.com>
 <da9321ad-4198-494e-b9fa-30d69bd29be3@intel.com>
 <6369e5ce-74e3-4c68-8053-d7d7d21b6955@zytor.com>
 <dbeeea58-16cb-4383-b8e8-91a8ca84e88a@intel.com>
 <CAAa6QmRw6QLnVJ8+uvMV8ASreLXzSab5Jii3Ju11qCZYio6Few@mail.gmail.com>
 <c070c4d6-a570-4eea-aca0-72eed319a198@intel.com> <87pl1md7h0.ffs@fw13>
 <CAAa6QmSHBDeY0G=_N1P4dAAH917J7jerfZrWDfDd8w=8jH8nVw@mail.gmail.com>
Date: Fri, 19 Jun 2026 23:59:19 +0200
Message-ID: <87qzm2b39k.ffs@fw13>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Zach!

On Fri, Jun 19 2026 at 12:20, Zach O'Keefe wrote:
> While it seems common opinion that the IST-based solution is fragile,
> what of FRED? It seems like this is exactly the kind of support needed
> to avoid some of the aforementioned sw "mess" in various x86 exception
> handling paths. I agree that it's less-than-ideal that we are forced
> to downgrade exception levels in the common #PF case, but is that an
> unsurmountable problem? Pardon my ignorance.

The #PF path is considered performance critical. But how much the
downgrade matters needs actual numbers to analyze under various workload
scenarios.

I've not seen numbers to that effect anywhere. The only numbers provided
are marketing material about the memory savings on a freshly booted idle
machine. There are _zero_ numbers about the actual real world savings,
but claims about the PETABYTE savings possible.

Seriously?

> Lastly, I just want to clarify what folks have meant by "extraordinary
> claims" or "evidence".  Aside from the above discussion on FRED
> exception handling, the "only" other part of this is the allocation.

Clearly anything which is explained with "shouldn't happen" and
"unlikely". At cloud scale nothing is unlikely anymore. That's simply the
reality of statistical math.
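
To put a rough shape on that argument, here is a minimal sketch. It
assumes independent events, and the per-machine probability and fleet
sizes are purely hypothetical values picked for illustration, not
measurements from anywhere:

```python
import math


def prob_at_least_once(p_per_machine_day: float, machines: int, days: int = 1) -> float:
    """Chance that an event with per-machine, per-day probability p
    (0 <= p < 1) happens at least once across the whole fleet.

    Assumes all machine-days are independent trials, which is a
    simplification.
    """
    trials = machines * days
    # 1 - (1 - p)^trials, written with log1p/expm1 so that very small
    # probabilities do not get lost to floating point rounding.
    return -math.expm1(trials * math.log1p(-p_per_machine_day))


if __name__ == "__main__":
    p = 1e-6  # hypothetical "very unlikely" event, per machine per day
    for machines in (1, 1_000, 1_000_000):
        print(f"{machines:>9} machines: {prob_at_least_once(p, machines):.6f}")
```

With a one-in-a-million daily chance per machine, a fleet of a million
machines sees the event on any given day with a probability of roughly
0.63 under these assumptions. Run that for a month and it is close to
certain.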

As I pointed out before, the same applies to the unexplained
upgrade/downgrade game with external interrupts. Such issues cannot be
papered over without understanding the root cause. Decades of experience
show that they inevitably come back some time down the road. Cloud
scale even guarantees that.

> Are people concerned about memory unavailability, deadlocking-type
> issues, or something else? We have considerable design freedom here to
> avoid certain classes of unreliability, but -- barring any clever
> tricks -- I don't know if the allocation can be guaranteed to succeed in
> all conceivable circumstances. I want to ensure that reality does not
> present a hard blocker.

First of all the failure scenario has to be clearly defined.

Right now, if I'm reading the patches correctly, this can simply end up
killing the wrong tasks/processes just because an OOM situation results
in a depletion of the per CPU cache, and the very wrong task which runs
into the deep call stack situation ends up the creek without a paddle.

The fact that you even fail to abort a CPU bringup when the allocation
of the per CPU stack page cache fails makes it pretty clear that exactly
zero thought has been spent on this problem.

Why the heck does this cache refill call have to be unconditionally in
__schedule() where preemption is disabled and therefore GFP_ATOMIC
is mandatory? I know "Works for me" (most of the time).

And just because I was looking at the patch in question I found this
other insanity:

> +	/*
> +	 * Most likely we faulted in the page right next to the last mapped
> +	 * page in the stack, however, it is possible (but very unlikely) that
> +	 * the faulted page is actually skips some pages in the stack. Make sure
> +	 * we do not create  more than one holes in the stack, and map every
> +	 * page between the current fault  address and the last page that is
> +	 * mapped in the stack.
> +	 */

Can anyone with a sane mind and the most minimal understanding of the
kernel's inner workings explain to me how the kernel can skip "some
pages" on the stack?

If the kernel skips a whole page or more then there is a serious bug
somewhere. I might be missing something, but again the "very unlikely"
wording which handwaves about it is just disgustingly useless.

I disagree with Dave on the RFC status of this series. It's not even
close to RFC, it's at PoC status.

Thanks,

        tglx