From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: David Laight <david.laight.linux@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
Dave Hansen <dave.hansen@intel.com>,
David Stevens <stevensd@google.com>,
Linus Walleij <linus.walleij@linaro.org>,
Will Deacon <willdeacon@google.com>,
Quentin Perret <qperret@google.com>,
Thomas Gleixner <tglx@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>, Xin Li <xin@zytor.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Uladzislau Rezki <urezki@gmail.com>, Kees Cook <kees@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
willy@infradead.org
Subject: Re: [PATCH v2 00/13] Dynamic Kernel Stacks
Date: Fri, 24 Apr 2026 23:06:18 +0000 [thread overview]
Message-ID: <aevzYQ8HxmDKj-1_@plex> (raw)
In-Reply-To: <20260424232637.054f15dd@pumpkin>
On 04-24 23:26, David Laight wrote:
> On Fri, 24 Apr 2026 21:35:20 +0000
> Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
>
> > On 04-24 12:41, Dave Hansen wrote:
> > > On 4/24/26 12:14, David Stevens wrote:
> > > > The question is then: is this approach something that is fundamentally
> > > > untenable in the kernel
> > >
> > > Yes. Fundamentally untenable.
> > >
> > > Not allowing stack faults has been a wonderful simplification. It's one
> > > of those things that just plain makes the kernel easier to maintain.
> > > Saving low single digits of system memory is not exactly making me eager
> > > to go back to the harder-to-maintain days.
> > >
> > > I seriously doubt that this 1% is the lowest hanging fruit for memory
> > > bloat on these systems. ;)
> >
> > This is true until, in a fleet of millions of machines, you encounter a
> > one-in-a-billion chance of a stack overflow. You are then forced to
> > double the statically allocated kernel stacks on every machine, paying a
> > memory tax even though 99.999...% of threads never exceed 4K. This
> > overhead accumulates to petabytes of wasted capacity.
>
> And then you hit a stack fault in some path where you can't sleep and
> there isn't any available kernel memory.

Well, at least if we hit this rare case, we can grow the stack from a
per-CPU buffer of pre-reserved stack memory. This still saves significant
memory compared to wasting it on every single thread.
> An alternative idea is to arrange for some system calls to sleep in
> userspace, so when the thread is woken it re-executes the system call.
> It then makes sense to assign the kernel stack to the process when
> it enters the kernel.
> That might mean that you don't need a kernel stack for all the threads
> sleeping in futex() - it might even be possible to do the retry in
> userspace saving the second kernel entry most of the time.
> It is all 'hard and difficult' though.

I was thinking about a similar approach as well: sort of multiplexing the
kernel stacks. But honestly, when trying to cover all the edge cases, I
didn't find it to be any better or easier than just using dynamic kernel
stacks.

An alternative approach, which was proposed at LSFMM by Willy, is to add
explicit deep-stack calls: when we enter a path that we know is
exceptionally deep, only then do we extend the stack, keeping the
default (say, 8K) everywhere else.
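
The shape of such an explicit deep-stack call can be sketched in userspace
with POSIX contexts, which really do run the callback on a separate, larger
stack. The helper name and sizes are hypothetical; a kernel version would
vmap extra stack pages instead of calling malloc():

```c
#include <stdlib.h>
#include <ucontext.h>

#define DEEP_STACK_SIZE (64 * 1024)

static ucontext_t caller_ctx, deep_ctx;
static int (*deep_fn)(void *);
static void *deep_arg;
static int deep_ret;

/* Runs on the large stack; returning resumes uc_link (the caller). */
static void trampoline(void)
{
	deep_ret = deep_fn(deep_arg);
}

/* Hypothetical helper: run fn on a temporarily enlarged stack. */
static int call_with_deep_stack(int (*fn)(void *), void *arg)
{
	void *stack = malloc(DEEP_STACK_SIZE);

	if (!stack)
		return -1;
	getcontext(&deep_ctx);
	deep_ctx.uc_stack.ss_sp = stack;
	deep_ctx.uc_stack.ss_size = DEEP_STACK_SIZE;
	deep_ctx.uc_link = &caller_ctx;
	deep_fn = fn;
	deep_arg = arg;
	makecontext(&deep_ctx, trampoline, 0);
	swapcontext(&caller_ctx, &deep_ctx);	/* run fn, then come back */
	free(stack);
	return deep_ret;
}

/* Example deep path (stand-in for, e.g., a deep reclaim chain). */
static int deep_work(void *arg)
{
	return *(int *)arg * 2;
}
```

The attraction is that only the handful of annotated call sites pay for the
big stack, and the annotation doubles as documentation of which paths are
known to be deep; the downside is that someone has to find them all.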
> The easier solution is to rewrite the system code so it doesn't have
> 1000s of threads :-)

That ship sailed in the early 90s of the previous millennium. Nowadays we
have high-end workstations with almost 200 hardware threads. Rewriting
system code to reduce thread counts simply isn't an option for our
storage machines, which have millions of threads per unit.
+CC Matthew Wilcox
Thread overview: 21+ messages
2026-04-24 19:14 [PATCH v2 00/13] Dynamic Kernel Stacks David Stevens
2026-04-24 19:14 ` [PATCH v2 01/13] fork: Remove assumption that vm_area->nr_pages equals to THREAD_SIZE David Stevens
2026-04-24 19:14 ` [PATCH v2 02/13] fork: Don't assume fully populated stack during reuse David Stevens
2026-04-24 19:14 ` [PATCH v2 03/13] fork: Move vm_stack to the beginning of the stack David Stevens
2026-04-24 19:14 ` [PATCH v2 04/13] fork: separate vmap stack allocation and free calls David Stevens
2026-04-24 19:14 ` [PATCH v2 05/13] mm/vmalloc: Add a get_vm_area_node() and vmap_pages_range() public functions David Stevens
2026-04-24 19:14 ` [PATCH v2 06/13] fork: Move vmap stack freeing to work queue David Stevens
2026-04-24 19:14 ` [PATCH v2 07/13] fork: Dynamic Kernel Stacks David Stevens
2026-04-24 19:14 ` [PATCH v2 08/13] task_stack.h: Add stack_not_used() support for dynamic stack David Stevens
2026-04-24 19:14 ` [PATCH v2 09/13] fork: Dynamic Kernel Stack accounting David Stevens
2026-04-24 19:14 ` [PATCH v2 10/13] fork: Store task pointer in unpopulated stack ptes David Stevens
2026-04-24 19:14 ` [PATCH v2 11/13] x86/entry/fred: encode frame pointer on entry David Stevens
2026-04-24 19:14 ` [PATCH v2 12/13] x86: Add support for dynamic kernel stacks via FRED David Stevens
2026-04-24 19:14 ` [PATCH v2 13/13] x86: Add support for dynamic kernel stacks via IST David Stevens
2026-04-24 19:41 ` [PATCH v2 00/13] Dynamic Kernel Stacks Dave Hansen
2026-04-24 21:35 ` Pasha Tatashin
2026-04-24 22:21 ` Dave Hansen
2026-04-24 22:49 ` David Stevens
2026-04-24 22:26 ` David Laight
2026-04-24 23:06 ` Pasha Tatashin [this message]
2026-04-25 9:19 ` H. Peter Anvin