From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	 David Stevens <stevensd@google.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	 Linus Walleij <linus.walleij@linaro.org>,
	Will Deacon <willdeacon@google.com>,
	 Quentin Perret <qperret@google.com>,
	Thomas Gleixner <tglx@kernel.org>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, Andy Lutomirski <luto@kernel.org>,
	 Xin Li <xin@zytor.com>, Peter Zijlstra <peterz@infradead.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	 Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	 Uladzislau Rezki <urezki@gmail.com>, Kees Cook <kees@kernel.org>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH v2 00/13] Dynamic Kernel Stacks
Date: Mon, 27 Apr 2026 16:31:02 +0000
Message-ID: <ae-DE1lC6-zEA8Zi@plex>
In-Reply-To: <6369e5ce-74e3-4c68-8053-d7d7d21b6955@zytor.com>

On 04-25 02:19, H. Peter Anvin wrote:
> On 2026-04-24 12:41, Dave Hansen wrote:
> > On 4/24/26 12:14, David Stevens wrote:
> >> The question is then: is this approach something that is fundamentally
> >> untenable in the kernel
> > 
> > Yes. Fundamentally untenable.
> > 
> > Not allowing stack faults has been a wonderful simplification. It's one
> > of those things that just plain makes the kernel easier to maintain.
> > Saving low single digits of system memory is not exactly making me eager
> > to go back to the harder-to-maintain days.
> > 
> > I seriously doubt that this 1% is the lowest hanging fruit for memory
> > bloat on these systems. ;)
> 
> It is worth noting that this was one of the VERY early design decisions that
> has shaped Linux from the beginning:
> 
> - No swapping of kernel memory
> - Kernel stacks are statically allocated
> - Physical RAM is mapped into the kernel at all times
> - A "monolithic" kernel using function calls, not message passing
> - A kernel interface that closely maps to the low-level application API
>   (e.g. each user space thread is a kernel thread.)
> - Kernel ABIs and APIs are subject to evolution; stability is only guaranteed
>   in user space.
> 
> Those design decisions are, by and large, what has made Linux Linux: a
> relatively simple, highly performant, and reliable system.

I think there is a bit of survivorship bias in that list. Originally,
there were many other foundational assumptions that have since evolved
as hardware and requirements scaled.

For example, there were assumptions about no dynamic hardware
reconfiguration (no memory/CPU hot-plug), uniform memory access (no
NUMA), and fixed page sizes (no THP or HugeTLB). All of those have
changed, and you, better than most, know of many other such examples.

A more recent example is PREEMPT_RT: the Linux kernel was originally
designed to be non-preemptible.

Even the assumptions in your list, such as "physical RAM is mapped into
the kernel at all times," are evolving: emulated pmem is not mapped, and
guest_memfd plans to allow unmapping memory from the direct map for
security reasons.

Apart from not breaking user space and letting the internal kernel API
evolve, the items on that list are architectural decisions that can and
should adapt to new requirements.

We now have machines with thousands of hardware threads. Running
millions of software threads on such machines is a practical reality,
and at fleet scale, statically allocating kernel stacks for all of them
wastes a massive amount of memory.

The proposed solution won't affect Linux as a whole. It can be
optionally enabled for targeted configurations. Additionally, the max
stack size is still statically set; it simply isn't populated until
actually used.

Pasha


