From: Mark Rutland <mark.rutland@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Kees Cook <kees@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Ard Biesheuvel <ardb@kernel.org>,
	Jeremy Linton <jeremy.linton@arm.com>,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	"linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [DISCUSSION] kstack offset randomization: bugs and performance
Date: Tue, 18 Nov 2025 11:25:05 +0000
Message-ID: <aRxXkSx3WbgAPp_Q@J2N7QTR9R3>
In-Reply-To: <251bcfb4-6069-40f7-be03-0a745bb8f761@arm.com>

On Tue, Nov 18, 2025 at 10:28:29AM +0000, Ryan Roberts wrote:
> On 17/11/2025 20:27, Kees Cook wrote:
> > On Mon, Nov 17, 2025 at 11:31:22AM +0000, Ryan Roberts wrote:
> >> On 17/11/2025 11:30, Ryan Roberts wrote:
> The original rationale for a separate choose_random_kstack_offset() at the end
> of the syscall is described as:
>
>  * This position in the syscall flow is done to
>  * frustrate attacks from userspace attempting to learn the next offset:
>  * - Maximize the timing uncertainty visible from userspace: if the
>  *   offset is chosen at syscall entry, userspace has much more control
>  *   over the timing between choosing offsets. "How long will we be in
>  *   kernel mode?" tends to be more difficult to predict than "how long
>  *   will we be in user mode?"
>  * - Reduce the lifetime of the new offset sitting in memory during
>  *   kernel mode execution. Exposure of "thread-local" memory content
>  *   (e.g. current, percpu, etc) tends to be easier than arbitrary
>  *   location memory exposure.
>
> I'm not totally convinced by the first argument; for arches that use the tsc,
> sampling the tsc at syscall entry would mean that userspace can figure out the
> random value that will be used for syscall N by sampling the tsc and adding a
> bit just before calling syscall N. Sampling the tsc at syscall exit would mean
> that userspace can figure out the random value that will be used for syscall N
> by sampling the tsc and subtracting a bit just after syscall N-1 returns. I
> don't really see any difference in protection?
>
> If you're trying to force the kernel-sampled tsc to be a specific value, then for
> the sample-on-exit case, userspace can just make a syscall with an invalid id as
> its syscall N-1 and in that case the duration between entry and exit is tiny
> and fixed so it's still pretty simple to force the value.

FWIW, I agree. I don't think we're gaining much based on the placement
of choose_random_kstack_offset() at the start/end of the entry/exit
sequences.

As an aside, it looks like x86 calls choose_random_kstack_offset() for
*any* return to userspace, including non-syscall returns (e.g. from
IRQ), in arch_exit_to_user_mode_prepare(). That adds some extra
randomness/perturbation, but logically it's not necessary to do that
for *all* returns to userspace.

> So what do you think of this approach?
>
> #define add_random_kstack_offset(rand) do { \
> 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> 				&randomize_kstack_offset)) { \
> 		u32 offset = raw_cpu_read(kstack_offset); \
> 		u8 *ptr; \
> 		\
> 		offset = ror32(offset, 5) ^ (rand); \
> 		raw_cpu_write(kstack_offset, offset); \
> 		ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \
> 		/* Keep allocation even after "ptr" loses scope. */ \
> 		asm volatile("" :: "r"(ptr) : "memory"); \
> 	} \
> } while (0)
>
> This ignores "Maximize the timing uncertainty" (but that's ok because the
> current version doesn't really do that either), but strengthens "Reduce the
> lifetime of the new offset sitting in memory".
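
To make the mixing step in the quoted macro concrete, here is a minimal
userspace sketch of "offset = ror32(offset, 5) ^ (rand)" followed by
masking. It is only an illustration: ror32_sketch(), mix_offset() and
SKETCH_OFFSET_MASK are hypothetical names, and the 10-bit mask is an
assumption standing in for whatever KSTACK_OFFSET_MAX() really applies.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's ror32(): rotate right by 'shift' bits. */
static uint32_t ror32_sketch(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/*
 * Hypothetical mask, assumed for illustration only. The real
 * KSTACK_OFFSET_MAX() may keep a different set of bits.
 */
#define SKETCH_OFFSET_MASK 0x3FFu

/*
 * Fold a fresh 'rand' sample into the running state and return the
 * masked value that would size the stack allocation. The state is
 * updated in place, so every earlier sample still influences it.
 */
static uint32_t mix_offset(uint32_t *state, uint32_t rand)
{
	*state = ror32_sketch(*state, 5) ^ rand;
	return *state & SKETCH_OFFSET_MASK;
}
```

Starting from a zero state, mix_offset(&state, 0x12345678) leaves
0x12345678 in the state. A second call with rand == 0 rotates it to
0xC091A2B3, so the bits selected by the mask change even when the new
sample contributes nothing.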

Is this assuming that 'rand' can be generated in a non-preemptible
context? If so (and this is non-preemptible), that's fine.

I'm not sure whether that was the intent, or this was ignoring the
rescheduling problem.

If we do this per-task, then that concern disappears, and this can all
be preemptible.
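
As a rough userspace model of the per-task variant (a sketch only:
struct task_sketch, task_ror32() and task_mix_offset() are hypothetical
names, not existing kernel code), the read-modify-write touches nothing
but the task's own state:

```c
#include <stdint.h>

/* Hypothetical stand-in for per-task state; not the kernel's task_struct. */
struct task_sketch {
	uint32_t kstack_offset;
};

/* Userspace stand-in for the kernel's ror32(). */
static uint32_t task_ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/*
 * Read, mix and write back one task's offset. Because the state lives
 * in the task rather than in a per-CPU variable, another task running
 * in between (on any CPU) cannot change what this task reads or
 * overwrite what it stores.
 */
static uint32_t task_mix_offset(struct task_sketch *task, uint32_t rand)
{
	uint32_t offset = task->kstack_offset;

	offset = task_ror32(offset, 5) ^ rand;
	task->kstack_offset = offset;
	return offset;
}
```

In this model, interleaving updates for two different tasks gives each
task the same sequence it would have seen running alone, which is how I
read "that concern disappears".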

Mark.