From: Thomas Gleixner <tglx@kernel.org>
To: David Stevens <stevensd@google.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Linus Walleij <linus.walleij@linaro.org>,
	Will Deacon <willdeacon@google.com>,
	Quentin Perret <qperret@google.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@kernel.org>, Xin Li <xin@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Uladzislau Rezki <urezki@gmail.com>, Kees Cook <kees@kernel.org>
Cc: David Stevens <stevensd@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 10/13] fork: Store task pointer in unpopulated stack ptes
Date: Sun, 28 Jun 2026 01:11:20 +0200
Message-ID: <87qzlrh947.ffs@fw13>
In-Reply-To: <87wlvkhqd0.ffs@fw13>

On Sat, Jun 27 2026 at 00:46, Thomas Gleixner wrote:
> On Fri, Apr 24 2026 at 12:14, David Stevens wrote:
>> Store the task pointer in the ptes of the unpopulated pages of dynamic
>> stacks, to allow the vm_struct pointer to be retrieved without relying
>> on any locks or current.
>
> You fail to explain why you can't use current. Changelogs have to
> describe the WHY and not the WHAT.

I obviously know why you can't use it. But the absence of a proper
explanation and my disgust for the implementation bothered me enough to
look deeper into it.

Let's look at the only problematic case:

      schedule()
          ....
          switch_to(prev, next)
              switch_to_asm(prev, next)
1)                switch(RSP)
                  __switch_to(prev, next)
2)                    this_cpu_write(current_task, next);

There is obviously a hole between #1 and #2 where 'current_task' is not
giving the right answer. You work around that with this PTE storage
magic which is admittedly smart, but completely overengineered and not
necessary at all.

Why?

If you look thoroughly at the condensed context switch logic above,
you'll notice that there are three sources of information:

    - prev: the task being scheduled out
    - next: the task being scheduled in
    - RSP:  the stack pointer

Between #1 and #2 it cannot be determined whether RSP belongs to 'prev'
or 'next' because 'next' is not exposed to the fault handler. But if it
were exposed, the handler could answer the question of which task RSP
belongs to, no?

So the obvious _and_ simple solution _is_ to expose 'next':

      schedule()
          ....
          switch_to(prev, next)
1)            raw_cpu_write(next_task, next);
              switch_to_asm(prev, next)
2)                switch(RSP)
                  __switch_to(prev, next)
3)                    raw_cpu_write(current_task, next);

With that the stack fault handler logic becomes:

   curr = raw_cpu_read(current_task);
   addr = fred_event_data(regs);

   if (within_task_stack(addr, curr))
   	return handle_stack_fault(regs, curr);

   next = raw_cpu_read(next_task);
   if (curr != next && within_task_stack(addr, next))
   	return handle_stack_fault(regs, next);

   return 0;

That is correct at any point in time, and the pattern works on _all_
supported architectures because it's all CPU local. All you need is one
extra store. When done right, that store ends up in the same cache line
which the context switch dirties anyway (i.e. the one holding
current_task), so you won't even be able to measure the overhead.
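
For illustration only, here is a minimal user-space model of the lookup
described above. It is a sketch under stated assumptions, not kernel code:
'struct task', 'struct cpu_hot', 'this_cpu', stack_fault_owner() and the
MODEL_* constants are hypothetical stand-ins, per-CPU accessors are replaced
by plain loads from a single static struct, and within_task_stack() is
reduced to a range check against an assumed stack size. The static assertion
models the "same cache line" placement for an assumed 64 byte line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_THREAD_SIZE 0x4000UL	/* assumed stack size, model only */
#define MODEL_CACHE_LINE  64		/* assumed cache line size */

/* Hypothetical, stripped-down task: only the stack location matters here. */
struct task {
	uintptr_t stack_base;	/* lowest address of this task's stack */
};

/* Stand-in for the per-CPU data of one CPU (not a kernel structure). */
struct cpu_hot {
	_Alignas(MODEL_CACHE_LINE) struct task *current_task;
	struct task *next_task;	/* the one extra store in switch_to() */
};

/* Both pointers must sit in the cache line the context switch dirties. */
_Static_assert(offsetof(struct cpu_hot, current_task) / MODEL_CACHE_LINE ==
	       offsetof(struct cpu_hot, next_task) / MODEL_CACHE_LINE,
	       "next_task must share the cache line of current_task");

static struct cpu_hot this_cpu;

/* Simplified range check: is addr inside the stack area of task t? */
static int within_task_stack(uintptr_t addr, const struct task *t)
{
	return t && addr >= t->stack_base &&
	       addr - t->stack_base < MODEL_THREAD_SIZE;
}

/*
 * Model of the fault handler decision: return the task which owns the
 * faulting stack address, or NULL if it belongs to neither candidate.
 */
static struct task *stack_fault_owner(uintptr_t addr)
{
	struct task *curr = this_cpu.current_task;
	struct task *next;

	if (within_task_stack(addr, curr))
		return curr;

	/* Only relevant in the window between the RSP switch and #3 */
	next = this_cpu.next_task;
	if (curr != next && within_task_stack(addr, next))
		return next;

	return NULL;
}
```

In this model the window between the RSP switch and the current_task update
corresponds to the state where this_cpu.next_task already points to the
incoming task while this_cpu.current_task still points to the outgoing one.
A fault on either stack then resolves to the right task, and an address on
neither stack resolves to NULL.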

Thanks,

        tglx

---
  Everything should be made as simple as possible, but not simpler. - Einstein

