* [RFC PATCH v1 0/2] Fix bugs and performance of kstack offset randomisation
@ 2025-11-27 10:59 Ryan Roberts
2025-11-27 10:59 ` [RFC PATCH v1 1/2] randomize_kstack: Maintain kstack_offset per task Ryan Roberts
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Ryan Roberts @ 2025-11-27 10:59 UTC (permalink / raw)
To: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Mark Rutland, Jason A . Donenfeld
Cc: Ryan Roberts, linux-hardening, linux-arm-kernel, linux-kernel
Hi All,
I guess this is an alternative to Ard's series since he beat me to it. I'll
benchmark his series so we have comparable numbers...
As I reported at [1], kstack offset randomisation suffers from a couple of bugs
and, on arm64 at least, the performance is poor. This series attempts to fix
both; patch 1 provides back-portable fixes for the functional bugs. Patch 2
proposes a performance improvement approach.
I've looked at a few different options but ultimately decided that Jeremy's
original prng approach is the fastest, and I want to make the argument that it
is "secure enough" for this use case. This could be combined with an occasional
reseed from the crng; If amortized over ~10K syscalls it makes no difference to
mean performance. But I'm not convinced of the value. I think we should aim to
keep it simple and architecture agnostic.
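For reference, an occasional reseed combined with patch 2's
get_update_kstack_offset() would look roughly like the sketch below. This is
purely illustrative and not part of the series; kstack_rnd_uses would be a new
per-task counter that doesn't exist today, and the interval is arbitrary:

#define KSTACK_RND_RESEED_INTERVAL	10000

static __always_inline u32 get_update_kstack_offset(void)
{
	/* Amortize the cost of pulling fresh entropy from the crng. */
	if (unlikely(++current->kstack_rnd_uses >= KSTACK_RND_RESEED_INTERVAL)) {
		prandom_seed_state(&current->kstack_rnd_state, get_random_u64());
		current->kstack_rnd_uses = 0;
	}
	return prandom_u32_state(&current->kstack_rnd_state);
}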
Ultimately we are only randomising the offset with 6 bits. So an attack that
depends on knowing the stack offset would get lucky once in every 64 attempts.
Surely that's trivially brute-forceable? Given that, why go to so much effort
to ensure the 6 bits are strongly random?
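(For scale: with 64 equally likely offsets, an attacker who simply retries has a
1 - (63/64)^n chance of seeing the target offset within n attempts; that's
roughly 80% after 100 attempts and over 99% after 300.)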
More details in the commit logs.
Performance
===========
Mean and tail performance of 3 "small" syscalls was measured. Each syscall was
made 10 million times, with every invocation individually timed and binned.
These results have low noise so I'm confident that they are trustworthy.
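For context, the measurement loop was conceptually along these lines (an
illustrative userspace sketch, not the exact harness used for these numbers):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define SAMPLES	10000000UL

int main(void)
{
	uint64_t *ns = malloc(SAMPLES * sizeof(*ns));
	struct timespec a, b;

	if (!ns)
		return 1;
	for (unsigned long i = 0; i < SAMPLES; i++) {
		/* Time each invocation individually so it can be binned. */
		clock_gettime(CLOCK_MONOTONIC, &a);
		syscall(SYS_getpid);
		clock_gettime(CLOCK_MONOTONIC, &b);
		ns[i] = (b.tv_sec - a.tv_sec) * 1000000000LL +
			(b.tv_nsec - a.tv_nsec);
	}
	/* ns[] is then sorted (not shown) to derive the mean, p99 and p99.9. */
	printf("collected %lu samples\n", SAMPLES);
	free(ns);
	return 0;
}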
The baseline is v6.18-rc5 with stack randomization turned *off*. So I'm showing
the performance cost of turning it on without any changes to the implementation,
then the reduced performance cost of turning it on with my changes applied.
arm64 (AWS Graviton3):
+-----------------+--------------+-------------+---------------+
| Benchmark | Result Class | v6.18-rc5 | per-task-prng |
| | | rndstack-on | |
| | | | |
+=================+==============+=============+===============+
| syscall/getpid | mean (ns) | (R) 15.62% | (R) 3.43% |
| | p99 (ns) | (R) 155.01% | (R) 3.20% |
| | p99.9 (ns) | (R) 156.71% | (R) 2.93% |
+-----------------+--------------+-------------+---------------+
| syscall/getppid | mean (ns) | (R) 14.09% | (R) 2.12% |
| | p99 (ns) | (R) 152.81% | 1.55% |
| | p99.9 (ns) | (R) 153.67% | 1.77% |
+-----------------+--------------+-------------+---------------+
| syscall/invalid | mean (ns) | (R) 13.89% | (R) 3.32% |
| | p99 (ns) | (R) 165.82% | (R) 3.51% |
| | p99.9 (ns) | (R) 168.83% | (R) 3.77% |
+-----------------+--------------+-------------+---------------+
Because arm64 was previously using get_random_u16(), it was expensive whenever
the per-cpu batch of buffered random bits ran out and it had to call into the
crng. That's what caused the enormous tail latency.
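To illustrate the pattern, here is a rough userspace model of that batching
behaviour (illustrative only; the kernel's real batching lives in
drivers/char/random.c and differs in detail). Most calls take the fast path;
the occasional call that hits the refill is what shows up in the tail:

#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>

static uint16_t batch[32];
static size_t batch_pos = 32;	/* start "empty" so the first call refills */

static uint16_t batched_random_u16(void)
{
	if (batch_pos == 32) {
		/* Slow path: refill the whole batch from the CRNG. */
		if (getrandom(batch, sizeof(batch), 0) < 0)
			return 0;	/* error handling elided */
		batch_pos = 0;
	}
	/* Fast path: hand out a value buffered earlier. */
	return batch[batch_pos++];
}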
x86 (AWS Sapphire Rapids):
+-----------------+--------------+-------------+---------------+
| Benchmark | Result Class | v6.18-rc5 | per-task-prng |
| | | rndstack-on | |
| | | | |
+=================+==============+=============+===============+
| syscall/getpid | mean (ns) | (R) 13.32% | (R) 4.60% |
| | p99 (ns) | (R) 13.38% | (R) 18.08% |
| | p99.9 (ns) | 16.26% | (R) 19.38% |
+-----------------+--------------+-------------+---------------+
| syscall/getppid | mean (ns) | (R) 11.96% | (R) 5.26% |
| | p99 (ns) | (R) 11.83% | (R) 8.35% |
| | p99.9 (ns) | (R) 11.42% | (R) 22.37% |
+-----------------+--------------+-------------+---------------+
| syscall/invalid | mean (ns) | (R) 10.58% | (R) 2.91% |
| | p99 (ns) | (R) 10.51% | (R) 4.36% |
| | p99.9 (ns) | (R) 10.35% | (R) 21.97% |
+-----------------+--------------+-------------+---------------+
I was surprised to see that the baseline cost on x86 is 10-12% since it is just
using rdtsc. But as I say, I believe the results are accurate.
Thanks,
Ryan
Ryan Roberts (2):
randomize_kstack: Maintain kstack_offset per task
randomize_kstack: Unify random source across arches
arch/Kconfig | 5 ++-
arch/arm64/kernel/syscall.c | 11 ------
arch/loongarch/kernel/syscall.c | 11 ------
arch/powerpc/kernel/syscall.c | 12 -------
arch/riscv/kernel/traps.c | 12 -------
arch/s390/include/asm/entry-common.h | 8 -----
arch/x86/include/asm/entry-common.h | 12 -------
include/linux/prandom.h | 7 +++-
include/linux/randomize_kstack.h | 53 ++++++++++------------------
include/linux/sched.h | 5 +++
init/main.c | 1 -
kernel/fork.c | 2 ++
12 files changed, 34 insertions(+), 105 deletions(-)
--
2.43.0
* [RFC PATCH v1 1/2] randomize_kstack: Maintain kstack_offset per task
2025-11-27 10:59 [RFC PATCH v1 0/2] Fix bugs and performance of kstack offset randomisation Ryan Roberts
@ 2025-11-27 10:59 ` Ryan Roberts
2025-11-27 10:59 ` [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches Ryan Roberts
2025-12-02 16:59 ` [RFC PATCH v1 0/2] Fix bugs and performance of kstack offset randomisation Kees Cook
2 siblings, 0 replies; 14+ messages in thread
From: Ryan Roberts @ 2025-11-27 10:59 UTC (permalink / raw)
To: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Mark Rutland, Jason A . Donenfeld
Cc: Ryan Roberts, linux-hardening, linux-arm-kernel, linux-kernel,
stable
kstack_offset was previously maintained per-cpu, but this caused a
couple of issues. So let's instead make it per-task.
Issue 1: add_random_kstack_offset() and choose_random_kstack_offset() were
expected and required to be called with interrupts and preemption
disabled so that they could manipulate per-cpu state. But arm64, loongarch
and risc-v call them with interrupts and preemption enabled. I don't
_think_ this causes any functional issues, but it's certainly unexpected
and could lead to manipulating the wrong cpu's state, which could cause a
minor performance degradation due to bouncing the cache lines. By
maintaining the state per-task, those functions can safely be called in
preemptible context.
Issue 2: add_random_kstack_offset() is called before executing the
syscall and expands the stack using a previously chosen random offset.
choose_random_kstack_offset() is called after executing the syscall and
chooses and stores a new random offset for the next syscall. With
per-cpu storage for this offset, an attacker could force cpu migration
during the execution of the syscall and prevent the offset from being
updated for the original cpu such that it is predictable for the next
syscall on that cpu. By maintaining the state per-task, this problem
goes away because the per-task random offset is updated after the
syscall regardless of which cpu it is executing on.
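To illustrate the race (cpu numbers arbitrary):

  cpu0: add_random_kstack_offset()         // consumes cpu0's stored offset
  cpu0: ...syscall body; attacker forces migration to cpu1...
  cpu1: choose_random_kstack_offset(rand)  // updates cpu1's offset only

cpu0's stored offset is never refreshed, so the next syscall entering on
cpu0 uses an offset that has become predictable.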
Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall")
Closes: https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@arm.com/
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
include/linux/randomize_kstack.h | 26 +++++++++++++++-----------
include/linux/sched.h | 4 ++++
init/main.c | 1 -
kernel/fork.c | 2 ++
4 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1d982dbdd0d0..089b1432f7e6 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -9,7 +9,6 @@
DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
randomize_kstack_offset);
-DECLARE_PER_CPU(u32, kstack_offset);
/*
* Do not use this anywhere else in the kernel. This is used here because
@@ -50,15 +49,14 @@ DECLARE_PER_CPU(u32, kstack_offset);
* add_random_kstack_offset - Increase stack utilization by previously
* chosen random offset
*
- * This should be used in the syscall entry path when interrupts and
- * preempt are disabled, and after user registers have been stored to
- * the stack. For testing the resulting entropy, please see:
- * tools/testing/selftests/lkdtm/stack-entropy.sh
+ * This should be used in the syscall entry path after user registers have been
+ * stored to the stack. Preemption may be enabled. For testing the resulting
+ * entropy, please see: tools/testing/selftests/lkdtm/stack-entropy.sh
*/
#define add_random_kstack_offset() do { \
if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
&randomize_kstack_offset)) { \
- u32 offset = raw_cpu_read(kstack_offset); \
+ u32 offset = current->kstack_offset; \
u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \
/* Keep allocation even after "ptr" loses scope. */ \
asm volatile("" :: "r"(ptr) : "memory"); \
@@ -69,9 +67,9 @@ DECLARE_PER_CPU(u32, kstack_offset);
* choose_random_kstack_offset - Choose the random offset for the next
* add_random_kstack_offset()
*
- * This should only be used during syscall exit when interrupts and
- * preempt are disabled. This position in the syscall flow is done to
- * frustrate attacks from userspace attempting to learn the next offset:
+ * This should only be used during syscall exit. Preemption may be enabled. This
+ * position in the syscall flow is done to frustrate attacks from userspace
+ * attempting to learn the next offset:
* - Maximize the timing uncertainty visible from userspace: if the
* offset is chosen at syscall entry, userspace has much more control
* over the timing between choosing offsets. "How long will we be in
@@ -85,14 +83,20 @@ DECLARE_PER_CPU(u32, kstack_offset);
#define choose_random_kstack_offset(rand) do { \
if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
&randomize_kstack_offset)) { \
- u32 offset = raw_cpu_read(kstack_offset); \
+ u32 offset = current->kstack_offset; \
offset = ror32(offset, 5) ^ (rand); \
- raw_cpu_write(kstack_offset, offset); \
+ current->kstack_offset = offset; \
} \
} while (0)
+
+static inline void random_kstack_task_init(struct task_struct *tsk)
+{
+ tsk->kstack_offset = 0;
+}
#else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
#define add_random_kstack_offset() do { } while (0)
#define choose_random_kstack_offset(rand) do { } while (0)
+#define random_kstack_task_init(tsk) do { } while (0)
#endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b469878de25c..dae227d217ef 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1613,6 +1613,10 @@ struct task_struct {
unsigned long prev_lowest_stack;
#endif
+#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
+ u32 kstack_offset;
+#endif
+
#ifdef CONFIG_X86_MCE
void __user *mce_vaddr;
__u64 mce_kflags;
diff --git a/init/main.c b/init/main.c
index 07a3116811c5..048a62538242 100644
--- a/init/main.c
+++ b/init/main.c
@@ -830,7 +830,6 @@ static inline void initcall_debug_enable(void)
#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
randomize_kstack_offset);
-DEFINE_PER_CPU(u32, kstack_offset);
static int __init early_randomize_kstack_offset(char *buf)
{
diff --git a/kernel/fork.c b/kernel/fork.c
index 3da0f08615a9..c0dced542b8a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -95,6 +95,7 @@
#include <linux/thread_info.h>
#include <linux/kstack_erase.h>
#include <linux/kasan.h>
+#include <linux/randomize_kstack.h>
#include <linux/scs.h>
#include <linux/io_uring.h>
#include <linux/bpf.h>
@@ -2191,6 +2192,7 @@ __latent_entropy struct task_struct *copy_process(
if (retval)
goto bad_fork_cleanup_io;
+ random_kstack_task_init(p);
stackleak_task_init(p);
if (pid != &init_struct_pid) {
--
2.43.0
* [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-11-27 10:59 [RFC PATCH v1 0/2] Fix bugs and performance of kstack offset randomisation Ryan Roberts
2025-11-27 10:59 ` [RFC PATCH v1 1/2] randomize_kstack: Maintain kstack_offset per task Ryan Roberts
@ 2025-11-27 10:59 ` Ryan Roberts
2025-11-28 11:01 ` Ard Biesheuvel
2025-12-02 16:59 ` [RFC PATCH v1 0/2] Fix bugs and performance of kstack offset randomisation Kees Cook
2 siblings, 1 reply; 14+ messages in thread
From: Ryan Roberts @ 2025-11-27 10:59 UTC (permalink / raw)
To: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Mark Rutland, Jason A . Donenfeld
Cc: Ryan Roberts, linux-hardening, linux-arm-kernel, linux-kernel
Previously different architectures were using random sources of
differing strength and cost to decide the random kstack offset. A number
of architectures (loongarch, powerpc, s390, x86) were using their
timestamp counter, at whatever the frequency happened to be. Other
arches (arm64, riscv) were using entropy from the crng via
get_random_u16().
There have been concerns that in some cases the timestamp counters may
be too weak, because they can be easily guessed or influenced by user
space. And get_random_u16() has been shown to be too costly for the
level of protection kstack offset randomization provides.
So let's use a common, architecture-agnostic source of entropy; a
per-task prng, seeded at fork-time from the crng. This has a few
benefits:
- We can remove choose_random_kstack_offset(); that was only there to
try to make the timestamp counter value a bit harder to influence
from user space.
- The architecture code is simplified. All it has to do now is call
add_random_kstack_offset() in the syscall path.
- The strength of the randomness can be reasoned about independently
of the architecture.
- Arches previously using get_random_u16() now have much faster
syscall paths, see below results.
There have been some claims that a prng may be less strong than the
timestamp counter if not regularly reseeded. But the prng has a period
of about 2^113. So as long as the prng state remains secret, it should
not be possible to guess. If the prng state can be accessed, we have
bigger problems.
Additionally, we are only consuming 6 bits to randomize the stack, so
there are only 64 possible random offsets. I assert that it would be
trivial for an attacker to brute force by repeating their attack and
waiting for the random stack offset to be the desired one. The prng
approach seems entirely proportional to this level of protection.
Performance data are provided below. The baseline is v6.18-rc5 with
rndstack on for each respective arch. (I)/(R) indicate statistically
significant improvement/regression. arm64 platform is AWS Graviton3.
x86_64 platform is AWS Sapphire Rapids:
+-----------------+--------------+---------------+---------------+
| Benchmark | Result Class | per-task-prng | per-task-prng |
| | | arm64 | x86_64 |
+=================+==============+===============+===============+
| syscall/getpid | mean (ns) | (I) -10.54% | (I) -7.69% |
| | p99 (ns) | (I) -59.53% | 4.14% |
| | p99.9 (ns) | (I) -59.90% | 2.68% |
+-----------------+--------------+---------------+---------------+
| syscall/getppid | mean (ns) | (I) -10.49% | (I) -5.98% |
| | p99 (ns) | (I) -59.83% | -3.11% |
| | p99.9 (ns) | (I) -59.88% | (R) 9.84% |
+-----------------+--------------+---------------+---------------+
| syscall/invalid | mean (ns) | (I) -9.28% | (I) -6.94% |
| | p99 (ns) | (I) -61.06% | (I) -5.57% |
| | p99.9 (ns) | (I) -61.40% | (R) 10.53% |
+-----------------+--------------+---------------+---------------+
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/Kconfig | 5 ++--
arch/arm64/kernel/syscall.c | 11 -------
arch/loongarch/kernel/syscall.c | 11 -------
arch/powerpc/kernel/syscall.c | 12 --------
arch/riscv/kernel/traps.c | 12 --------
arch/s390/include/asm/entry-common.h | 8 ------
arch/x86/include/asm/entry-common.h | 12 --------
include/linux/prandom.h | 7 ++++-
include/linux/randomize_kstack.h | 43 ++++++++--------------------
include/linux/sched.h | 3 +-
10 files changed, 22 insertions(+), 102 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 61130b88964b..b2c70c9dc098 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1519,9 +1519,8 @@ config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
def_bool n
help
An arch should select this symbol if it can support kernel stack
- offset randomization with calls to add_random_kstack_offset()
- during syscall entry and choose_random_kstack_offset() during
- syscall exit. Careful removal of -fstack-protector-strong and
+ offset randomization with a call to add_random_kstack_offset()
+ during syscall entry. Careful removal of -fstack-protector-strong and
-fstack-protector should also be applied to the entry code and
closely examined, as the artificial stack bump looks like an array
to the compiler, so it will attempt to add canary checks regardless
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index aba7ca6bca2d..957c4ccd5d14 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -52,17 +52,6 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
}
syscall_set_return_value(current, regs, 0, ret);
-
- /*
- * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
- * bits. The actual entropy will be further reduced by the compiler
- * when applying stack alignment constraints: the AAPCS mandates a
- * 16-byte aligned SP at function boundaries, which will remove the
- * 4 low bits from any entropy chosen here.
- *
- * The resulting 6 bits of entropy is seen in SP[9:4].
- */
- choose_random_kstack_offset(get_random_u16());
}
static inline bool has_syscall_work(unsigned long flags)
diff --git a/arch/loongarch/kernel/syscall.c b/arch/loongarch/kernel/syscall.c
index 168bd97540f8..80544a5ec6e1 100644
--- a/arch/loongarch/kernel/syscall.c
+++ b/arch/loongarch/kernel/syscall.c
@@ -66,16 +66,5 @@ void noinstr __no_stack_protector do_syscall(struct pt_regs *regs)
regs->regs[7], regs->regs[8], regs->regs[9]);
}
- /*
- * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
- * bits. The actual entropy will be further reduced by the compiler
- * when applying stack alignment constraints: 16-bytes (i.e. 4-bits)
- * aligned, which will remove the 4 low bits from any entropy chosen
- * here.
- *
- * The resulting 6 bits of entropy is seen in SP[9:4].
- */
- choose_random_kstack_offset(drdtime());
-
syscall_exit_to_user_mode(regs);
}
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index be159ad4b77b..b3d8b0f9823b 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -173,17 +173,5 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
}
#endif
- /*
- * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
- * so the maximum stack offset is 1k bytes (10 bits).
- *
- * The actual entropy will be further reduced by the compiler when
- * applying stack alignment constraints: the powerpc architecture
- * may have two kinds of stack alignment (16-bytes and 8-bytes).
- *
- * So the resulting 6 or 7 bits of entropy is seen in SP[9:4] or SP[9:3].
- */
- choose_random_kstack_offset(mftb());
-
return ret;
}
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 80230de167de..79b285bdfd1a 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -342,18 +342,6 @@ void do_trap_ecall_u(struct pt_regs *regs)
if (syscall >= 0 && syscall < NR_syscalls)
syscall_handler(regs, syscall);
- /*
- * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
- * so the maximum stack offset is 1k bytes (10 bits).
- *
- * The actual entropy will be further reduced by the compiler when
- * applying stack alignment constraints: 16-byte (i.e. 4-bit) aligned
- * for RV32I or RV64I.
- *
- * The resulting 6 bits of entropy is seen in SP[9:4].
- */
- choose_random_kstack_offset(get_random_u16());
-
syscall_exit_to_user_mode(regs);
} else {
irqentry_state_t state = irqentry_nmi_enter(regs);
diff --git a/arch/s390/include/asm/entry-common.h b/arch/s390/include/asm/entry-common.h
index 979af986a8fe..35450a485323 100644
--- a/arch/s390/include/asm/entry-common.h
+++ b/arch/s390/include/asm/entry-common.h
@@ -51,14 +51,6 @@ static __always_inline void arch_exit_to_user_mode(void)
#define arch_exit_to_user_mode arch_exit_to_user_mode
-static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
- unsigned long ti_work)
-{
- choose_random_kstack_offset(get_tod_clock_fast());
-}
-
-#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
-
static __always_inline bool arch_in_rcu_eqs(void)
{
if (IS_ENABLED(CONFIG_KVM))
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index ce3eb6d5fdf9..7535131c711b 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -82,18 +82,6 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED);
#endif
- /*
- * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
- * bits. The actual entropy will be further reduced by the compiler
- * when applying stack alignment constraints (see cc_stack_align4/8 in
- * arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32)
- * low bits from any entropy chosen here.
- *
- * Therefore, final stack offset entropy will be 7 (x86_64) or
- * 8 (ia32) bits.
- */
- choose_random_kstack_offset(rdtsc());
-
/* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
this_cpu_read(x86_ibpb_exit_to_user)) {
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index f2ed5b72b3d6..827edde11cb9 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -10,13 +10,18 @@
#include <linux/types.h>
#include <linux/once.h>
-#include <linux/percpu.h>
#include <linux/random.h>
struct rnd_state {
__u32 s1, s2, s3, s4;
};
+/*
+ * percpu.h includes sched.h, which requires struct rnd_state. So defer until
+ * after struct rnd_state is defined.
+ */
+#include <linux/percpu.h>
+
u32 prandom_u32_state(struct rnd_state *state);
void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 089b1432f7e6..83c7e6710f6d 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/jump_label.h>
#include <linux/percpu-defs.h>
+#include <linux/prandom.h>
DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
randomize_kstack_offset);
@@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
#define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
#endif
+static __always_inline u32 get_update_kstack_offset(void)
+{
+ return prandom_u32_state(&current->kstack_rnd_state);
+}
+
/**
- * add_random_kstack_offset - Increase stack utilization by previously
- * chosen random offset
+ * add_random_kstack_offset - Increase stack utilization by a random offset.
*
* This should be used in the syscall entry path after user registers have been
* stored to the stack. Preemption may be enabled. For testing the resulting
@@ -56,46 +61,22 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
#define add_random_kstack_offset() do { \
if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
&randomize_kstack_offset)) { \
- u32 offset = current->kstack_offset; \
+ u32 offset = get_update_kstack_offset(); \
u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \
/* Keep allocation even after "ptr" loses scope. */ \
asm volatile("" :: "r"(ptr) : "memory"); \
} \
} while (0)
-/**
- * choose_random_kstack_offset - Choose the random offset for the next
- * add_random_kstack_offset()
- *
- * This should only be used during syscall exit. Preemption may be enabled. This
- * position in the syscall flow is done to frustrate attacks from userspace
- * attempting to learn the next offset:
- * - Maximize the timing uncertainty visible from userspace: if the
- * offset is chosen at syscall entry, userspace has much more control
- * over the timing between choosing offsets. "How long will we be in
- * kernel mode?" tends to be more difficult to predict than "how long
- * will we be in user mode?"
- * - Reduce the lifetime of the new offset sitting in memory during
- * kernel mode execution. Exposure of "thread-local" memory content
- * (e.g. current, percpu, etc) tends to be easier than arbitrary
- * location memory exposure.
- */
-#define choose_random_kstack_offset(rand) do { \
- if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
- &randomize_kstack_offset)) { \
- u32 offset = current->kstack_offset; \
- offset = ror32(offset, 5) ^ (rand); \
- current->kstack_offset = offset; \
- } \
-} while (0)
-
static inline void random_kstack_task_init(struct task_struct *tsk)
{
- tsk->kstack_offset = 0;
+ if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+ &randomize_kstack_offset)) {
+ prandom_seed_state(&tsk->kstack_rnd_state, get_random_u64());
+ }
}
#else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
#define add_random_kstack_offset() do { } while (0)
-#define choose_random_kstack_offset(rand) do { } while (0)
#define random_kstack_task_init(tsk) do { } while (0)
#endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dae227d217ef..ac0402c2b988 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
#include <linux/sched/prio.h>
#include <linux/sched/types.h>
#include <linux/signal_types.h>
+#include <linux/prandom.h>
#include <linux/spinlock.h>
#include <linux/syscall_user_dispatch_types.h>
#include <linux/mm_types_task.h>
@@ -1614,7 +1615,7 @@ struct task_struct {
#endif
#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
- u32 kstack_offset;
+ struct rnd_state kstack_rnd_state;
#endif
#ifdef CONFIG_X86_MCE
--
2.43.0
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-11-27 10:59 ` [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches Ryan Roberts
@ 2025-11-28 11:01 ` Ard Biesheuvel
2025-12-01 18:20 ` Ryan Roberts
0 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2025-11-28 11:01 UTC (permalink / raw)
To: Ryan Roberts
Cc: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Mark Rutland, Jason A . Donenfeld,
linux-hardening, linux-arm-kernel, linux-kernel
On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Previously different architectures were using random sources of
> differing strength and cost to decide the random kstack offset. A number
> of architectures (loongarch, powerpc, s390, x86) were using their
> timestamp counter, at whatever the frequency happened to be. Other
> arches (arm64, riscv) were using entropy from the crng via
> get_random_u16().
>
> There have been concerns that in some cases the timestamp counters may
> be too weak, because they can be easily guessed or influenced by user
> space. And get_random_u16() has been shown to be too costly for the
> level of protection kstack offset randomization provides.
>
> So let's use a common, architecture-agnostic source of entropy; a
> per-task prng, seeded at fork-time from the crng. This has a few
> benefits:
>
> - We can remove choose_random_kstack_offset(); that was only there to
> try to make the timestamp counter value a bit harder to influence
> from user space.
>
> - The architecture code is simplified. All it has to do now is call
> add_random_kstack_offset() in the syscall path.
>
> - The strength of the randomness can be reasoned about independently
> of the architecture.
>
> - Arches previously using get_random_u16() now have much faster
> syscall paths, see below results.
>
> There have been some claims that a prng may be less strong than the
> timestamp counter if not regularly reseeded. But the prng has a period
> of about 2^113. So as long as the prng state remains secret, it should
> not be possible to guess. If the prng state can be accessed, we have
> bigger problems.
>
> Additionally, we are only consuming 6 bits to randomize the stack, so
> there are only 64 possible random offsets. I assert that it would be
> trivial for an attacker to brute force by repeating their attack and
> waiting for the random stack offset to be the desired one. The prng
> approach seems entirely proportional to this level of protection.
>
> Performance data are provided below. The baseline is v6.18-rc5 with
> rndstack on for each respective arch. (I)/(R) indicate statistically
> significant improvement/regression. arm64 platform is AWS Graviton3.
> x86_64 platform is AWS Sapphire Rapids:
>
> +-----------------+--------------+---------------+---------------+
> | Benchmark | Result Class | per-task-prng | per-task-prng |
> | | | arm64 | x86_64 |
> +=================+==============+===============+===============+
> | syscall/getpid | mean (ns) | (I) -10.54% | (I) -7.69% |
> | | p99 (ns) | (I) -59.53% | 4.14% |
> | | p99.9 (ns) | (I) -59.90% | 2.68% |
> +-----------------+--------------+---------------+---------------+
> | syscall/getppid | mean (ns) | (I) -10.49% | (I) -5.98% |
> | | p99 (ns) | (I) -59.83% | -3.11% |
> | | p99.9 (ns) | (I) -59.88% | (R) 9.84% |
> +-----------------+--------------+---------------+---------------+
> | syscall/invalid | mean (ns) | (I) -9.28% | (I) -6.94% |
> | | p99 (ns) | (I) -61.06% | (I) -5.57% |
> | | p99.9 (ns) | (I) -61.40% | (R) 10.53% |
> +-----------------+--------------+---------------+---------------+
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> arch/Kconfig | 5 ++--
> arch/arm64/kernel/syscall.c | 11 -------
> arch/loongarch/kernel/syscall.c | 11 -------
> arch/powerpc/kernel/syscall.c | 12 --------
> arch/riscv/kernel/traps.c | 12 --------
> arch/s390/include/asm/entry-common.h | 8 ------
> arch/x86/include/asm/entry-common.h | 12 --------
> include/linux/prandom.h | 7 ++++-
> include/linux/randomize_kstack.h | 43 ++++++++--------------------
> include/linux/sched.h | 3 +-
> 10 files changed, 22 insertions(+), 102 deletions(-)
>
I think this is a substantial improvement over the status quo,
especially with all the dodgy uses of monotonic counters with
unspecified granularity.
So with two comments below:
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 61130b88964b..b2c70c9dc098 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1519,9 +1519,8 @@ config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
> def_bool n
> help
> An arch should select this symbol if it can support kernel stack
> - offset randomization with calls to add_random_kstack_offset()
> - during syscall entry and choose_random_kstack_offset() during
> - syscall exit. Careful removal of -fstack-protector-strong and
> + offset randomization with a call to add_random_kstack_offset()
> + during syscall entry. Careful removal of -fstack-protector-strong and
> -fstack-protector should also be applied to the entry code and
> closely examined, as the artificial stack bump looks like an array
> to the compiler, so it will attempt to add canary checks regardless
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index aba7ca6bca2d..957c4ccd5d14 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -52,17 +52,6 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
> }
>
> syscall_set_return_value(current, regs, 0, ret);
> -
> - /*
> - * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
> - * bits. The actual entropy will be further reduced by the compiler
> - * when applying stack alignment constraints: the AAPCS mandates a
> - * 16-byte aligned SP at function boundaries, which will remove the
> - * 4 low bits from any entropy chosen here.
> - *
> - * The resulting 6 bits of entropy is seen in SP[9:4].
> - */
> - choose_random_kstack_offset(get_random_u16());
> }
>
> static inline bool has_syscall_work(unsigned long flags)
> diff --git a/arch/loongarch/kernel/syscall.c b/arch/loongarch/kernel/syscall.c
> index 168bd97540f8..80544a5ec6e1 100644
> --- a/arch/loongarch/kernel/syscall.c
> +++ b/arch/loongarch/kernel/syscall.c
> @@ -66,16 +66,5 @@ void noinstr __no_stack_protector do_syscall(struct pt_regs *regs)
> regs->regs[7], regs->regs[8], regs->regs[9]);
> }
>
> - /*
> - * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
> - * bits. The actual entropy will be further reduced by the compiler
> - * when applying stack alignment constraints: 16-bytes (i.e. 4-bits)
> - * aligned, which will remove the 4 low bits from any entropy chosen
> - * here.
> - *
> - * The resulting 6 bits of entropy is seen in SP[9:4].
> - */
> - choose_random_kstack_offset(drdtime());
> -
> syscall_exit_to_user_mode(regs);
> }
> diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
> index be159ad4b77b..b3d8b0f9823b 100644
> --- a/arch/powerpc/kernel/syscall.c
> +++ b/arch/powerpc/kernel/syscall.c
> @@ -173,17 +173,5 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
> }
> #endif
>
> - /*
> - * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
> - * so the maximum stack offset is 1k bytes (10 bits).
> - *
> - * The actual entropy will be further reduced by the compiler when
> - * applying stack alignment constraints: the powerpc architecture
> - * may have two kinds of stack alignment (16-bytes and 8-bytes).
> - *
> - * So the resulting 6 or 7 bits of entropy is seen in SP[9:4] or SP[9:3].
> - */
> - choose_random_kstack_offset(mftb());
> -
> return ret;
> }
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 80230de167de..79b285bdfd1a 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -342,18 +342,6 @@ void do_trap_ecall_u(struct pt_regs *regs)
> if (syscall >= 0 && syscall < NR_syscalls)
> syscall_handler(regs, syscall);
>
> - /*
> - * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
> - * so the maximum stack offset is 1k bytes (10 bits).
> - *
> - * The actual entropy will be further reduced by the compiler when
> - * applying stack alignment constraints: 16-byte (i.e. 4-bit) aligned
> - * for RV32I or RV64I.
> - *
> - * The resulting 6 bits of entropy is seen in SP[9:4].
> - */
> - choose_random_kstack_offset(get_random_u16());
> -
> syscall_exit_to_user_mode(regs);
> } else {
> irqentry_state_t state = irqentry_nmi_enter(regs);
> diff --git a/arch/s390/include/asm/entry-common.h b/arch/s390/include/asm/entry-common.h
> index 979af986a8fe..35450a485323 100644
> --- a/arch/s390/include/asm/entry-common.h
> +++ b/arch/s390/include/asm/entry-common.h
> @@ -51,14 +51,6 @@ static __always_inline void arch_exit_to_user_mode(void)
>
> #define arch_exit_to_user_mode arch_exit_to_user_mode
>
> -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> - unsigned long ti_work)
> -{
> - choose_random_kstack_offset(get_tod_clock_fast());
> -}
> -
> -#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
> -
> static __always_inline bool arch_in_rcu_eqs(void)
> {
> if (IS_ENABLED(CONFIG_KVM))
> diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
> index ce3eb6d5fdf9..7535131c711b 100644
> --- a/arch/x86/include/asm/entry-common.h
> +++ b/arch/x86/include/asm/entry-common.h
> @@ -82,18 +82,6 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED);
> #endif
>
> - /*
> - * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
> - * bits. The actual entropy will be further reduced by the compiler
> - * when applying stack alignment constraints (see cc_stack_align4/8 in
> - * arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32)
> - * low bits from any entropy chosen here.
> - *
> - * Therefore, final stack offset entropy will be 7 (x86_64) or
> - * 8 (ia32) bits.
> - */
> - choose_random_kstack_offset(rdtsc());
> -
> /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
> if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
> this_cpu_read(x86_ibpb_exit_to_user)) {
> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
> index f2ed5b72b3d6..827edde11cb9 100644
> --- a/include/linux/prandom.h
> +++ b/include/linux/prandom.h
> @@ -10,13 +10,18 @@
>
> #include <linux/types.h>
> #include <linux/once.h>
> -#include <linux/percpu.h>
> #include <linux/random.h>
>
> struct rnd_state {
> __u32 s1, s2, s3, s4;
> };
>
> +/*
> + * percpu.h includes sched.h, which requires struct rnd_state. So defer until
> + * after struct rnd_state is defined.
> + */
> +#include <linux/percpu.h>
> +
Yuck. Is this the best we can do to disentangle this?
> u32 prandom_u32_state(struct rnd_state *state);
> void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
> void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> index 089b1432f7e6..83c7e6710f6d 100644
> --- a/include/linux/randomize_kstack.h
> +++ b/include/linux/randomize_kstack.h
> @@ -6,6 +6,7 @@
> #include <linux/kernel.h>
> #include <linux/jump_label.h>
> #include <linux/percpu-defs.h>
> +#include <linux/prandom.h>
>
> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> randomize_kstack_offset);
> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
> #endif
>
> +static __always_inline u32 get_update_kstack_offset(void)
> +{
> + return prandom_u32_state(&current->kstack_rnd_state);
> +}
> +
> /**
> - * add_random_kstack_offset - Increase stack utilization by previously
> - * chosen random offset
> + * add_random_kstack_offset - Increase stack utilization by a random offset.
> *
> * This should be used in the syscall entry path after user registers have been
> * stored to the stack. Preemption may be enabled. For testing the resulting
> @@ -56,46 +61,22 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> #define add_random_kstack_offset() do { \
> if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> &randomize_kstack_offset)) { \
> - u32 offset = current->kstack_offset; \
> + u32 offset = get_update_kstack_offset(); \
> u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \
> /* Keep allocation even after "ptr" loses scope. */ \
> asm volatile("" :: "r"(ptr) : "memory"); \
> } \
> } while (0)
>
> -/**
> - * choose_random_kstack_offset - Choose the random offset for the next
> - * add_random_kstack_offset()
> - *
> - * This should only be used during syscall exit. Preemption may be enabled. This
> - * position in the syscall flow is done to frustrate attacks from userspace
> - * attempting to learn the next offset:
> - * - Maximize the timing uncertainty visible from userspace: if the
> - * offset is chosen at syscall entry, userspace has much more control
> - * over the timing between choosing offsets. "How long will we be in
> - * kernel mode?" tends to be more difficult to predict than "how long
> - * will we be in user mode?"
> - * - Reduce the lifetime of the new offset sitting in memory during
> - * kernel mode execution. Exposure of "thread-local" memory content
> - * (e.g. current, percpu, etc) tends to be easier than arbitrary
> - * location memory exposure.
> - */
> -#define choose_random_kstack_offset(rand) do { \
> - if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> - &randomize_kstack_offset)) { \
> - u32 offset = current->kstack_offset; \
> - offset = ror32(offset, 5) ^ (rand); \
> - current->kstack_offset = offset; \
> - } \
> -} while (0)
> -
> static inline void random_kstack_task_init(struct task_struct *tsk)
> {
> - tsk->kstack_offset = 0;
> + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> + &randomize_kstack_offset)) {
> + prandom_seed_state(&tsk->kstack_rnd_state, get_random_u64());
We should either fix prandom_seed_state() not to truncate the u64 to
u32, or even better, refactor prandom_seed_full_state() so we can
reuse it here, and use a 128-bit seed directly.
> + }
> }
> #else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
> #define add_random_kstack_offset() do { } while (0)
> -#define choose_random_kstack_offset(rand) do { } while (0)
> #define random_kstack_task_init(tsk) do { } while (0)
> #endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index dae227d217ef..ac0402c2b988 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -34,6 +34,7 @@
> #include <linux/sched/prio.h>
> #include <linux/sched/types.h>
> #include <linux/signal_types.h>
> +#include <linux/prandom.h>
> #include <linux/spinlock.h>
> #include <linux/syscall_user_dispatch_types.h>
> #include <linux/mm_types_task.h>
> @@ -1614,7 +1615,7 @@ struct task_struct {
> #endif
>
> #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> - u32 kstack_offset;
> + struct rnd_state kstack_rnd_state;
> #endif
>
> #ifdef CONFIG_X86_MCE
> --
> 2.43.0
>
>
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-11-28 11:01 ` Ard Biesheuvel
@ 2025-12-01 18:20 ` Ryan Roberts
2025-12-02 9:15 ` Ard Biesheuvel
0 siblings, 1 reply; 14+ messages in thread
From: Ryan Roberts @ 2025-12-01 18:20 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Mark Rutland, Jason A . Donenfeld,
linux-hardening, linux-arm-kernel, linux-kernel
On 28/11/2025 11:01, Ard Biesheuvel wrote:
> On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Previously different architectures were using random sources of
>> differing strength and cost to decide the random kstack offset. A number
>> of architectures (loongarch, powerpc, s390, x86) were using their
>> timestamp counter, at whatever the frequency happened to be. Other
>> arches (arm64, riscv) were using entropy from the crng via
>> get_random_u16().
>>
>> There have been concerns that in some cases the timestamp counters may
>> be too weak, because they can be easily guessed or influenced by user
>> space. And get_random_u16() has been shown to be too costly for the
>> level of protection kstack offset randomization provides.
>>
>> So let's use a common, architecture-agnostic source of entropy; a
>> per-task prng, seeded at fork-time from the crng. This has a few
>> benefits:
>>
>> - We can remove choose_random_kstack_offset(); that was only there to
>> try to make the timestamp counter value a bit harder to influence
>> from user space.
>>
>> - The architecture code is simplified. All it has to do now is call
>> add_random_kstack_offset() in the syscall path.
>>
>> - The strength of the randomness can be reasoned about independently
>> of the architecture.
>>
>> - Arches previously using get_random_u16() now have much faster
>> syscall paths, see below results.
>>
>> There have been some claims that a prng may be less strong than the
>> timestamp counter if not regularly reseeded. But the prng has a period
>> of about 2^113. So as long as the prng state remains secret, it should
>> not be possible to guess. If the prng state can be accessed, we have
>> bigger problems.
>>
>> Additionally, we are only consuming 6 bits to randomize the stack, so
>> there are only 64 possible random offsets. I assert that it would be
>> trivial for an attacker to brute force by repeating their attack and
>> waiting for the random stack offset to be the desired one. The prng
>> approach seems entirely proportional to this level of protection.
>>
>> Performance data are provided below. The baseline is v6.18-rc5 with
>> rndstack on for each respective arch. (I)/(R) indicate statistically
>> significant improvement/regression. arm64 platform is AWS Graviton3.
>> x86_64 platform is AWS Sapphire Rapids:
>>
>> +-----------------+--------------+---------------+---------------+
>> | Benchmark | Result Class | per-task-prng | per-task-prng |
>> | | | arm64 | x86_64 |
>> +=================+==============+===============+===============+
>> | syscall/getpid | mean (ns) | (I) -10.54% | (I) -7.69% |
>> | | p99 (ns) | (I) -59.53% | 4.14% |
>> | | p99.9 (ns) | (I) -59.90% | 2.68% |
>> +-----------------+--------------+---------------+---------------+
>> | syscall/getppid | mean (ns) | (I) -10.49% | (I) -5.98% |
>> | | p99 (ns) | (I) -59.83% | -3.11% |
>> | | p99.9 (ns) | (I) -59.88% | (R) 9.84% |
>> +-----------------+--------------+---------------+---------------+
>> | syscall/invalid | mean (ns) | (I) -9.28% | (I) -6.94% |
>> | | p99 (ns) | (I) -61.06% | (I) -5.57% |
>> | | p99.9 (ns) | (I) -61.40% | (R) 10.53% |
>> +-----------------+--------------+---------------+---------------+
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>> arch/Kconfig | 5 ++--
>> arch/arm64/kernel/syscall.c | 11 -------
>> arch/loongarch/kernel/syscall.c | 11 -------
>> arch/powerpc/kernel/syscall.c | 12 --------
>> arch/riscv/kernel/traps.c | 12 --------
>> arch/s390/include/asm/entry-common.h | 8 ------
>> arch/x86/include/asm/entry-common.h | 12 --------
>> include/linux/prandom.h | 7 ++++-
>> include/linux/randomize_kstack.h | 43 ++++++++--------------------
>> include/linux/sched.h | 3 +-
>> 10 files changed, 22 insertions(+), 102 deletions(-)
>>
>
> I think this is a substantial improvement over the status quo,
> especially with all the dodgy uses of monotonic counters with
> unspecified granularity.
>
> So with two comments below:
>
> Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Thanks!
I shall plan to re-post without the RFC tag against -rc1 then, and we will see
if anyone pushes back.
>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 61130b88964b..b2c70c9dc098 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -1519,9 +1519,8 @@ config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>> def_bool n
>> help
>> An arch should select this symbol if it can support kernel stack
>> - offset randomization with calls to add_random_kstack_offset()
>> - during syscall entry and choose_random_kstack_offset() during
>> - syscall exit. Careful removal of -fstack-protector-strong and
>> + offset randomization with a call to add_random_kstack_offset()
>> + during syscall entry. Careful removal of -fstack-protector-strong and
>> -fstack-protector should also be applied to the entry code and
>> closely examined, as the artificial stack bump looks like an array
>> to the compiler, so it will attempt to add canary checks regardless
>> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
>> index aba7ca6bca2d..957c4ccd5d14 100644
>> --- a/arch/arm64/kernel/syscall.c
>> +++ b/arch/arm64/kernel/syscall.c
>> @@ -52,17 +52,6 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>> }
>>
>> syscall_set_return_value(current, regs, 0, ret);
>> -
>> - /*
>> - * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
>> - * bits. The actual entropy will be further reduced by the compiler
>> - * when applying stack alignment constraints: the AAPCS mandates a
>> - * 16-byte aligned SP at function boundaries, which will remove the
>> - * 4 low bits from any entropy chosen here.
>> - *
>> - * The resulting 6 bits of entropy is seen in SP[9:4].
>> - */
>> - choose_random_kstack_offset(get_random_u16());
>> }
>>
>> static inline bool has_syscall_work(unsigned long flags)
>> diff --git a/arch/loongarch/kernel/syscall.c b/arch/loongarch/kernel/syscall.c
>> index 168bd97540f8..80544a5ec6e1 100644
>> --- a/arch/loongarch/kernel/syscall.c
>> +++ b/arch/loongarch/kernel/syscall.c
>> @@ -66,16 +66,5 @@ void noinstr __no_stack_protector do_syscall(struct pt_regs *regs)
>> regs->regs[7], regs->regs[8], regs->regs[9]);
>> }
>>
>> - /*
>> - * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
>> - * bits. The actual entropy will be further reduced by the compiler
>> - * when applying stack alignment constraints: 16-bytes (i.e. 4-bits)
>> - * aligned, which will remove the 4 low bits from any entropy chosen
>> - * here.
>> - *
>> - * The resulting 6 bits of entropy is seen in SP[9:4].
>> - */
>> - choose_random_kstack_offset(drdtime());
>> -
>> syscall_exit_to_user_mode(regs);
>> }
>> diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
>> index be159ad4b77b..b3d8b0f9823b 100644
>> --- a/arch/powerpc/kernel/syscall.c
>> +++ b/arch/powerpc/kernel/syscall.c
>> @@ -173,17 +173,5 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
>> }
>> #endif
>>
>> - /*
>> - * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
>> - * so the maximum stack offset is 1k bytes (10 bits).
>> - *
>> - * The actual entropy will be further reduced by the compiler when
>> - * applying stack alignment constraints: the powerpc architecture
>> - * may have two kinds of stack alignment (16-bytes and 8-bytes).
>> - *
>> - * So the resulting 6 or 7 bits of entropy is seen in SP[9:4] or SP[9:3].
>> - */
>> - choose_random_kstack_offset(mftb());
>> -
>> return ret;
>> }
>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>> index 80230de167de..79b285bdfd1a 100644
>> --- a/arch/riscv/kernel/traps.c
>> +++ b/arch/riscv/kernel/traps.c
>> @@ -342,18 +342,6 @@ void do_trap_ecall_u(struct pt_regs *regs)
>> if (syscall >= 0 && syscall < NR_syscalls)
>> syscall_handler(regs, syscall);
>>
>> - /*
>> - * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
>> - * so the maximum stack offset is 1k bytes (10 bits).
>> - *
>> - * The actual entropy will be further reduced by the compiler when
>> - * applying stack alignment constraints: 16-byte (i.e. 4-bit) aligned
>> - * for RV32I or RV64I.
>> - *
>> - * The resulting 6 bits of entropy is seen in SP[9:4].
>> - */
>> - choose_random_kstack_offset(get_random_u16());
>> -
>> syscall_exit_to_user_mode(regs);
>> } else {
>> irqentry_state_t state = irqentry_nmi_enter(regs);
>> diff --git a/arch/s390/include/asm/entry-common.h b/arch/s390/include/asm/entry-common.h
>> index 979af986a8fe..35450a485323 100644
>> --- a/arch/s390/include/asm/entry-common.h
>> +++ b/arch/s390/include/asm/entry-common.h
>> @@ -51,14 +51,6 @@ static __always_inline void arch_exit_to_user_mode(void)
>>
>> #define arch_exit_to_user_mode arch_exit_to_user_mode
>>
>> -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
>> - unsigned long ti_work)
>> -{
>> - choose_random_kstack_offset(get_tod_clock_fast());
>> -}
>> -
>> -#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
>> -
>> static __always_inline bool arch_in_rcu_eqs(void)
>> {
>> if (IS_ENABLED(CONFIG_KVM))
>> diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
>> index ce3eb6d5fdf9..7535131c711b 100644
>> --- a/arch/x86/include/asm/entry-common.h
>> +++ b/arch/x86/include/asm/entry-common.h
>> @@ -82,18 +82,6 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
>> current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED);
>> #endif
>>
>> - /*
>> - * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
>> - * bits. The actual entropy will be further reduced by the compiler
>> - * when applying stack alignment constraints (see cc_stack_align4/8 in
>> - * arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32)
>> - * low bits from any entropy chosen here.
>> - *
>> - * Therefore, final stack offset entropy will be 7 (x86_64) or
>> - * 8 (ia32) bits.
>> - */
>> - choose_random_kstack_offset(rdtsc());
>> -
>> /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
>> if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
>> this_cpu_read(x86_ibpb_exit_to_user)) {
>> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
>> index f2ed5b72b3d6..827edde11cb9 100644
>> --- a/include/linux/prandom.h
>> +++ b/include/linux/prandom.h
>> @@ -10,13 +10,18 @@
>>
>> #include <linux/types.h>
>> #include <linux/once.h>
>> -#include <linux/percpu.h>
>> #include <linux/random.h>
>>
>> struct rnd_state {
>> __u32 s1, s2, s3, s4;
>> };
>>
>> +/*
>> + * percpu.h includes sched.h, which requires struct rnd_state. So defer until
>> + * after struct rnd_state is defined.
>> + */
>> +#include <linux/percpu.h>
>> +
>
> Yuck. Is this the best we can do to disentangle this?
I think I might be able to remove some stuff out of sched.h into alloc_tag.h
(looks to me like it should have been there all along). And that will allow
removing the sched.h include from percpu.h. I think that will solve it.
>
>> u32 prandom_u32_state(struct rnd_state *state);
>> void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
>> void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
>> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
>> index 089b1432f7e6..83c7e6710f6d 100644
>> --- a/include/linux/randomize_kstack.h
>> +++ b/include/linux/randomize_kstack.h
>> @@ -6,6 +6,7 @@
>> #include <linux/kernel.h>
>> #include <linux/jump_label.h>
>> #include <linux/percpu-defs.h>
>> +#include <linux/prandom.h>
>>
>> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>> randomize_kstack_offset);
>> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
>> #endif
>>
>> +static __always_inline u32 get_update_kstack_offset(void)
>> +{
>> + return prandom_u32_state(&current->kstack_rnd_state);
I've got bot warnings because this is being called from noinstr code. I guess
the best option is to just move add_random_kstack_offset() to after
instrumentation is enabled for the affected arches.
>> +}
>> +
>> /**
>> - * add_random_kstack_offset - Increase stack utilization by previously
>> - * chosen random offset
>> + * add_random_kstack_offset - Increase stack utilization by a random offset.
>> *
>> * This should be used in the syscall entry path after user registers have been
>> * stored to the stack. Preemption may be enabled. For testing the resulting
>> @@ -56,46 +61,22 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>> #define add_random_kstack_offset() do { \
>> if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
>> &randomize_kstack_offset)) { \
>> - u32 offset = current->kstack_offset; \
>> + u32 offset = get_update_kstack_offset(); \
>> u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \
>> /* Keep allocation even after "ptr" loses scope. */ \
>> asm volatile("" :: "r"(ptr) : "memory"); \
>> } \
>> } while (0)
>>
>> -/**
>> - * choose_random_kstack_offset - Choose the random offset for the next
>> - * add_random_kstack_offset()
>> - *
>> - * This should only be used during syscall exit. Preemption may be enabled. This
>> - * position in the syscall flow is done to frustrate attacks from userspace
>> - * attempting to learn the next offset:
>> - * - Maximize the timing uncertainty visible from userspace: if the
>> - * offset is chosen at syscall entry, userspace has much more control
>> - * over the timing between choosing offsets. "How long will we be in
>> - * kernel mode?" tends to be more difficult to predict than "how long
>> - * will we be in user mode?"
>> - * - Reduce the lifetime of the new offset sitting in memory during
>> - * kernel mode execution. Exposure of "thread-local" memory content
>> - * (e.g. current, percpu, etc) tends to be easier than arbitrary
>> - * location memory exposure.
>> - */
>> -#define choose_random_kstack_offset(rand) do { \
>> - if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
>> - &randomize_kstack_offset)) { \
>> - u32 offset = current->kstack_offset; \
>> - offset = ror32(offset, 5) ^ (rand); \
>> - current->kstack_offset = offset; \
>> - } \
>> -} while (0)
>> -
>> static inline void random_kstack_task_init(struct task_struct *tsk)
>> {
>> - tsk->kstack_offset = 0;
>> + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>> + &randomize_kstack_offset)) {
>> + prandom_seed_state(&tsk->kstack_rnd_state, get_random_u64());
>
> We should either fix prandom_seed_state() not to truncate the u64 to
> u32, or even better, refactor prandom_seed_full_state() so we can
> reuse it here, and use a 128-bit seed directly.
How about something like this:
---8<---
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index f2ed5b72b3d6..9b651c9b3448 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -19,10 +19,11 @@ struct rnd_state {
u32 prandom_u32_state(struct rnd_state *state);
void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
-void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
+void prandom_seed_full_state_one(struct rnd_state *state);
+void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state);
#define prandom_init_once(pcpu_state) \
- DO_ONCE(prandom_seed_full_state, (pcpu_state))
+ DO_ONCE(prandom_seed_full_state_all, (pcpu_state))
/*
* Handle minimum values for seeds
diff --git a/lib/random32.c b/lib/random32.c
index 24e7acd9343f..50d8f5f9fca7 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -107,24 +107,28 @@ static void prandom_warmup(struct rnd_state *state)
prandom_u32_state(state);
}
-void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
+void prandom_seed_full_state_one(struct rnd_state *state)
{
- int i;
+ u32 seeds[4];
- for_each_possible_cpu(i) {
- struct rnd_state *state = per_cpu_ptr(pcpu_state, i);
- u32 seeds[4];
+ get_random_bytes(&seeds, sizeof(seeds));
+ state->s1 = __seed(seeds[0], 2U);
+ state->s2 = __seed(seeds[1], 8U);
+ state->s3 = __seed(seeds[2], 16U);
+ state->s4 = __seed(seeds[3], 128U);
- get_random_bytes(&seeds, sizeof(seeds));
- state->s1 = __seed(seeds[0], 2U);
- state->s2 = __seed(seeds[1], 8U);
- state->s3 = __seed(seeds[2], 16U);
- state->s4 = __seed(seeds[3], 128U);
+ prandom_warmup(state);
+}
+EXPORT_SYMBOL(prandom_seed_full_state_one);
- prandom_warmup(state);
- }
+void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state)
+{
+ int i;
+
+ for_each_possible_cpu(i)
+ prandom_seed_full_state_one(per_cpu_ptr(pcpu_state, i));
}
-EXPORT_SYMBOL(prandom_seed_full_state);
+EXPORT_SYMBOL(prandom_seed_full_state_all);
#ifdef CONFIG_RANDOM32_SELFTEST
static struct prandom_test1 {
---8<---
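With that in place, random_kstack_task_init() could seed straight from the new
helper; a minimal sketch assuming the refactor above is applied:

static inline void random_kstack_task_init(struct task_struct *tsk)
{
	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
				&randomize_kstack_offset))
		prandom_seed_full_state_one(&tsk->kstack_rnd_state);
}

That keeps the fork-time cost to one get_random_bytes() call plus the
prandom_warmup() iterations, rather than inventing a separate seeding path.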
>
>> + }
>> }
>> #else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
>> #define add_random_kstack_offset() do { } while (0)
>> -#define choose_random_kstack_offset(rand) do { } while (0)
>> #define random_kstack_task_init(tsk) do { } while (0)
>> #endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index dae227d217ef..ac0402c2b988 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -34,6 +34,7 @@
>> #include <linux/sched/prio.h>
>> #include <linux/sched/types.h>
>> #include <linux/signal_types.h>
>> +#include <linux/prandom.h>
>> #include <linux/spinlock.h>
>> #include <linux/syscall_user_dispatch_types.h>
>> #include <linux/mm_types_task.h>
>> @@ -1614,7 +1615,7 @@ struct task_struct {
>> #endif
>>
>> #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
>> - u32 kstack_offset;
>> + struct rnd_state kstack_rnd_state;
>> #endif
>>
>> #ifdef CONFIG_X86_MCE
>> --
>> 2.43.0
>>
>>
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-01 18:20 ` Ryan Roberts
@ 2025-12-02 9:15 ` Ard Biesheuvel
2025-12-02 9:34 ` Ryan Roberts
2025-12-02 9:35 ` Mark Rutland
0 siblings, 2 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2025-12-02 9:15 UTC (permalink / raw)
To: Ryan Roberts
Cc: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Mark Rutland, Jason A . Donenfeld,
linux-hardening, linux-arm-kernel, linux-kernel
On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 28/11/2025 11:01, Ard Biesheuvel wrote:
> > On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> Previously different architectures were using random sources of
> >> differing strength and cost to decide the random kstack offset. A number
> >> of architectures (loongarch, powerpc, s390, x86) were using their
> >> timestamp counter, at whatever the frequency happened to be. Other
> >> arches (arm64, riscv) were using entropy from the crng via
> >> get_random_u16().
> >>
> >> There have been concerns that in some cases the timestamp counters may
> >> be too weak, because they can be easily guessed or influenced by user
> >> space. And get_random_u16() has been shown to be too costly for the
> >> level of protection kstack offset randomization provides.
> >>
> >> So let's use a common, architecture-agnostic source of entropy; a
> >> per-task prng, seeded at fork-time from the crng. This has a few
> >> benefits:
> >>
> >> - We can remove choose_random_kstack_offset(); That was only there to
> >> try to make the timestamp counter value a bit harder to influence
> >> from user space.
> >>
> >> - The architecture code is simplified. All it has to do now is call
> >> add_random_kstack_offset() in the syscall path.
> >>
> >> - The strength of the randomness can be reasoned about independently
> >> of the architecture.
> >>
> >> - Arches previously using get_random_u16() now have much faster
> >> syscall paths, see below results.
> >>
> >> There have been some claims that a prng may be less strong than the
> >> timestamp counter if not regularly reseeded. But the prng has a period
> >> of about 2^113. So as long as the prng state remains secret, it should
> >> not be possible to guess. If the prng state can be accessed, we have
> >> bigger problems.
> >>
> >> Additionally, we are only consuming 6 bits to randomize the stack, so
> >> there are only 64 possible random offsets. I assert that it would be
> >> trivial for an attacker to brute force by repeating their attack and
> >> waiting for the random stack offset to be the desired one. The prng
> >> approach seems entirely proportional to this level of protection.
> >>
> >> Performance data are provided below. The baseline is v6.18-rc5 with
> >> rndstack on for each respective arch. (I)/(R) indicate statistically
> >> significant improvement/regression. arm64 platform is AWS Graviton3.
> >> x86_64 platform is AWS Sapphire Rapids:
> >>
> >> +-----------------+--------------+---------------+---------------+
> >> | Benchmark | Result Class | per-task-prng | per-task-prng |
> >> | | | arm64 | x86_64 |
> >> +=================+==============+===============+===============+
> >> | syscall/getpid | mean (ns) | (I) -10.54% | (I) -7.69% |
> >> | | p99 (ns) | (I) -59.53% | 4.14% |
> >> | | p99.9 (ns) | (I) -59.90% | 2.68% |
> >> +-----------------+--------------+---------------+---------------+
> >> | syscall/getppid | mean (ns) | (I) -10.49% | (I) -5.98% |
> >> | | p99 (ns) | (I) -59.83% | -3.11% |
> >> | | p99.9 (ns) | (I) -59.88% | (R) 9.84% |
> >> +-----------------+--------------+---------------+---------------+
> >> | syscall/invalid | mean (ns) | (I) -9.28% | (I) -6.94% |
> >> | | p99 (ns) | (I) -61.06% | (I) -5.57% |
> >> | | p99.9 (ns) | (I) -61.40% | (R) 10.53% |
> >> +-----------------+--------------+---------------+---------------+
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >> arch/Kconfig | 5 ++--
> >> arch/arm64/kernel/syscall.c | 11 -------
> >> arch/loongarch/kernel/syscall.c | 11 -------
> >> arch/powerpc/kernel/syscall.c | 12 --------
> >> arch/riscv/kernel/traps.c | 12 --------
> >> arch/s390/include/asm/entry-common.h | 8 ------
> >> arch/x86/include/asm/entry-common.h | 12 --------
> >> include/linux/prandom.h | 7 ++++-
> >> include/linux/randomize_kstack.h | 43 ++++++++--------------------
> >> include/linux/sched.h | 3 +-
> >> 10 files changed, 22 insertions(+), 102 deletions(-)
> >>
> >
> > I think this is a substantial improvement over the status quo,
> > especially with all the dodgy uses of monotonic counters with
> > unspecified granularity.
> >
> > So with two comments below:
> >
> > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
>
> Thanks!
>
> I shall plan to re-post without the RFC tag against -rc1 then, and we will see
> if anyone pushes back.
>
Sounds good.
...
> >> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
> >> index f2ed5b72b3d6..827edde11cb9 100644
> >> --- a/include/linux/prandom.h
> >> +++ b/include/linux/prandom.h
> >> @@ -10,13 +10,18 @@
> >>
> >> #include <linux/types.h>
> >> #include <linux/once.h>
> >> -#include <linux/percpu.h>
> >> #include <linux/random.h>
> >>
> >> struct rnd_state {
> >> __u32 s1, s2, s3, s4;
> >> };
> >>
> >> +/*
> >> + * percpu.h includes sched.h, which requires struct rnd_state. So defer until
> >> + * after struct rnd_state is defined.
> >> + */
> >> +#include <linux/percpu.h>
> >> +
> >
> > Yuck. Is this the best we can do to disentangle this?
>
> I think I might be able to remove some stuff out of sched.h into alloc_tag.h
> (looks to me like it should have been there all along). And that will allow
> removing the sched.h include from percpu.h. I think that will solve it.
>
OK.
> >> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> >> index 089b1432f7e6..83c7e6710f6d 100644
> >> --- a/include/linux/randomize_kstack.h
> >> +++ b/include/linux/randomize_kstack.h
> >> @@ -6,6 +6,7 @@
> >> #include <linux/kernel.h>
> >> #include <linux/jump_label.h>
> >> #include <linux/percpu-defs.h>
> >> +#include <linux/prandom.h>
> >>
> >> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> randomize_kstack_offset);
> >> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
> >> #endif
> >>
> >> +static __always_inline u32 get_update_kstack_offset(void)
> >> +{
> >> + return prandom_u32_state(&current->kstack_rnd_state);
>
> I've got bot warnings because this is being called from noinstr code. I guess
> the best option is to just move add_random_kstack_offset() to after
> instrumentation is enabled for the affected arches.
>
Just put instrumentation_begin()/instrumentation_end() around the call
to prandom_u32_state() - that seems to be the common approach for
punching holes into the 'noinstr' validation.
> >> +}
> >> +
> >> /**
> >> - * add_random_kstack_offset - Increase stack utilization by previously
> >> - * chosen random offset
> >> + * add_random_kstack_offset - Increase stack utilization by a random offset.
> >> *
> >> * This should be used in the syscall entry path after user registers have been
> >> * stored to the stack. Preemption may be enabled. For testing the resulting
> >> @@ -56,46 +61,22 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> #define add_random_kstack_offset() do { \
> >> if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> >> &randomize_kstack_offset)) { \
> >> - u32 offset = current->kstack_offset; \
> >> + u32 offset = get_update_kstack_offset(); \
> >> u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \
> >> /* Keep allocation even after "ptr" loses scope. */ \
> >> asm volatile("" :: "r"(ptr) : "memory"); \
> >> } \
> >> } while (0)
> >>
> >> -/**
> >> - * choose_random_kstack_offset - Choose the random offset for the next
> >> - * add_random_kstack_offset()
> >> - *
> >> - * This should only be used during syscall exit. Preemption may be enabled. This
> >> - * position in the syscall flow is done to frustrate attacks from userspace
> >> - * attempting to learn the next offset:
> >> - * - Maximize the timing uncertainty visible from userspace: if the
> >> - * offset is chosen at syscall entry, userspace has much more control
> >> - * over the timing between choosing offsets. "How long will we be in
> >> - * kernel mode?" tends to be more difficult to predict than "how long
> >> - * will we be in user mode?"
> >> - * - Reduce the lifetime of the new offset sitting in memory during
> >> - * kernel mode execution. Exposure of "thread-local" memory content
> >> - * (e.g. current, percpu, etc) tends to be easier than arbitrary
> >> - * location memory exposure.
> >> - */
> >> -#define choose_random_kstack_offset(rand) do { \
> >> - if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> >> - &randomize_kstack_offset)) { \
> >> - u32 offset = current->kstack_offset; \
> >> - offset = ror32(offset, 5) ^ (rand); \
> >> - current->kstack_offset = offset; \
> >> - } \
> >> -} while (0)
> >> -
> >> static inline void random_kstack_task_init(struct task_struct *tsk)
> >> {
> >> - current->kstack_offset = 0;
> >> + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> + &randomize_kstack_offset)) {
> >> + prandom_seed_state(&tsk->kstack_rnd_state, get_random_u64());
> >
> > We should either fix prandom_seed_state() not to truncate the u64 to
> > u32, or even better, refactor prandom_seed_full_state() so we can
> > reuse it here, and use a 128-bit seed directly.
>
> How about something like this:
>
Looks good to me, but it does make me wonder why a full prandom state
(i.e., one per cpu) isn't sufficient here, and why we are putting one
in every task. Is it just to avoid the overhead of dis/enabling
preemption on every syscall entry?
>
> ---8<---
> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
> index f2ed5b72b3d6..9b651c9b3448 100644
> --- a/include/linux/prandom.h
> +++ b/include/linux/prandom.h
> @@ -19,10 +19,11 @@ struct rnd_state {
>
> u32 prandom_u32_state(struct rnd_state *state);
> void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
> +void prandom_seed_full_state_one(struct rnd_state *state);
> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state);
>
> #define prandom_init_once(pcpu_state) \
> - DO_ONCE(prandom_seed_full_state, (pcpu_state))
> + DO_ONCE(prandom_seed_full_state_all, (pcpu_state))
>
> /*
> * Handle minimum values for seeds
> diff --git a/lib/random32.c b/lib/random32.c
> index 24e7acd9343f..50d8f5f9fca7 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -107,24 +107,28 @@ static void prandom_warmup(struct rnd_state *state)
> prandom_u32_state(state);
> }
>
> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
> +void prandom_seed_full_state_one(struct rnd_state *state)
> {
> - int i;
> + u32 seeds[4];
>
> - for_each_possible_cpu(i) {
> - struct rnd_state *state = per_cpu_ptr(pcpu_state, i);
> - u32 seeds[4];
> + get_random_bytes(&seeds, sizeof(seeds));
> + state->s1 = __seed(seeds[0], 2U);
> + state->s2 = __seed(seeds[1], 8U);
> + state->s3 = __seed(seeds[2], 16U);
> + state->s4 = __seed(seeds[3], 128U);
>
> - get_random_bytes(&seeds, sizeof(seeds));
> - state->s1 = __seed(seeds[0], 2U);
> - state->s2 = __seed(seeds[1], 8U);
> - state->s3 = __seed(seeds[2], 16U);
> - state->s4 = __seed(seeds[3], 128U);
> + prandom_warmup(state);
> +}
> +EXPORT_SYMBOL(prandom_seed_full_state_one);
>
> - prandom_warmup(state);
> - }
> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state)
> +{
> + int i;
> +
> + for_each_possible_cpu(i)
> + prandom_seed_full_state_one(per_cpu_ptr(pcpu_state, i));
> }
> -EXPORT_SYMBOL(prandom_seed_full_state);
> +EXPORT_SYMBOL(prandom_seed_full_state_all);
>
> #ifdef CONFIG_RANDOM32_SELFTEST
> static struct prandom_test1 {
> ---8<---
>
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-02 9:15 ` Ard Biesheuvel
@ 2025-12-02 9:34 ` Ryan Roberts
2025-12-02 9:35 ` Mark Rutland
1 sibling, 0 replies; 14+ messages in thread
From: Ryan Roberts @ 2025-12-02 9:34 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Mark Rutland, Jason A . Donenfeld,
linux-hardening, linux-arm-kernel, linux-kernel
On 02/12/2025 09:15, Ard Biesheuvel wrote:
> On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 28/11/2025 11:01, Ard Biesheuvel wrote:
>>> On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
[...]
>>>> static inline void random_kstack_task_init(struct task_struct *tsk)
>>>> {
>>>> - current->kstack_offset = 0;
>>>> + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>>>> + &randomize_kstack_offset)) {
>>>> + prandom_seed_state(&tsk->kstack_rnd_state, get_random_u64());
>>>
>>> We should either fix prandom_seed_state() not to truncate the u64 to
>>> u32, or even better, refactor prandom_seed_full_state() so we can
>>> reuse it here, and use a 128-bit seed directly.
>>
>> How about something like this:
>>
>
> Looks good to me, but it does make me wonder why a full prandom state
> (i.e., one per cpu) isn't sufficient here, and why we are putting one
> in every task. Is it just to avoid the overhead of dis/enabling
> preemption on every syscall entry?
Good point; I'm not sure there is a good reason. It's just the shape I ended up
with after the first patch that fixes the bugs. But given we are removing the
second API call in this patch, the migration bug goes away. As you say, we could
disable preemption around the prandom_u32_state() call and it should all be
safe. I'll benchmark it.
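For concreteness, the per-cpu variant being weighed up here would look roughly
like the below; the names are illustrative and none of this is from the posted
series:

/* State defined once in a .c file and seeded per CPU at boot. */
DECLARE_PER_CPU(struct rnd_state, kstack_rnd_states);

static __always_inline u32 get_update_kstack_offset(void)
{
	u32 offset;

	/* Guard against migration between reading and writing the state. */
	preempt_disable();
	offset = prandom_u32_state(this_cpu_ptr(&kstack_rnd_states));
	preempt_enable();

	return offset;
}

The trade-off is then a preempt_disable()/preempt_enable() pair on every
syscall entry versus carrying a 16-byte rnd_state in every task_struct.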
>
>
>
>
>
>>
>> ---8<---
>> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
>> index f2ed5b72b3d6..9b651c9b3448 100644
>> --- a/include/linux/prandom.h
>> +++ b/include/linux/prandom.h
>> @@ -19,10 +19,11 @@ struct rnd_state {
>>
>> u32 prandom_u32_state(struct rnd_state *state);
>> void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
>> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
>> +void prandom_seed_full_state_one(struct rnd_state *state);
>> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state);
>>
>> #define prandom_init_once(pcpu_state) \
>> - DO_ONCE(prandom_seed_full_state, (pcpu_state))
>> + DO_ONCE(prandom_seed_full_state_all, (pcpu_state))
>>
>> /*
>> * Handle minimum values for seeds
>> diff --git a/lib/random32.c b/lib/random32.c
>> index 24e7acd9343f..50d8f5f9fca7 100644
>> --- a/lib/random32.c
>> +++ b/lib/random32.c
>> @@ -107,24 +107,28 @@ static void prandom_warmup(struct rnd_state *state)
>> prandom_u32_state(state);
>> }
>>
>> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
>> +void prandom_seed_full_state_one(struct rnd_state *state)
>> {
>> - int i;
>> + u32 seeds[4];
>>
>> - for_each_possible_cpu(i) {
>> - struct rnd_state *state = per_cpu_ptr(pcpu_state, i);
>> - u32 seeds[4];
>> + get_random_bytes(&seeds, sizeof(seeds));
>> + state->s1 = __seed(seeds[0], 2U);
>> + state->s2 = __seed(seeds[1], 8U);
>> + state->s3 = __seed(seeds[2], 16U);
>> + state->s4 = __seed(seeds[3], 128U);
>>
>> - get_random_bytes(&seeds, sizeof(seeds));
>> - state->s1 = __seed(seeds[0], 2U);
>> - state->s2 = __seed(seeds[1], 8U);
>> - state->s3 = __seed(seeds[2], 16U);
>> - state->s4 = __seed(seeds[3], 128U);
>> + prandom_warmup(state);
>> +}
>> +EXPORT_SYMBOL(prandom_seed_full_state_one);
>>
>> - prandom_warmup(state);
>> - }
>> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state)
>> +{
>> + int i;
>> +
>> + for_each_possible_cpu(i)
>> + prandom_seed_full_state_one(per_cpu_ptr(pcpu_state, i));
>> }
>> -EXPORT_SYMBOL(prandom_seed_full_state);
>> +EXPORT_SYMBOL(prandom_seed_full_state_all);
>>
>> #ifdef CONFIG_RANDOM32_SELFTEST
>> static struct prandom_test1 {
>> ---8<---
>>
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-02 9:15 ` Ard Biesheuvel
2025-12-02 9:34 ` Ryan Roberts
@ 2025-12-02 9:35 ` Mark Rutland
2025-12-02 9:39 ` Ryan Roberts
2025-12-02 9:47 ` Ard Biesheuvel
1 sibling, 2 replies; 14+ messages in thread
From: Mark Rutland @ 2025-12-02 9:35 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ryan Roberts, Kees Cook, Ard Biesheuvel, Will Deacon,
Arnd Bergmann, Jeremy Linton, Catalin Marinas,
Jason A . Donenfeld, linux-hardening, linux-arm-kernel,
linux-kernel
On Tue, Dec 02, 2025 at 10:15:22AM +0100, Ard Biesheuvel wrote:
> On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
> > On 28/11/2025 11:01, Ard Biesheuvel wrote:
> > > On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
> > >> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> > >> index 089b1432f7e6..83c7e6710f6d 100644
> > >> --- a/include/linux/randomize_kstack.h
> > >> +++ b/include/linux/randomize_kstack.h
> > >> @@ -6,6 +6,7 @@
> > >> #include <linux/kernel.h>
> > >> #include <linux/jump_label.h>
> > >> #include <linux/percpu-defs.h>
> > >> +#include <linux/prandom.h>
> > >>
> > >> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> > >> randomize_kstack_offset);
> > >> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> > >> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
> > >> #endif
> > >>
> > >> +static __always_inline u32 get_update_kstack_offset(void)
> > >> +{
> > >> + return prandom_u32_state(&current->kstack_rnd_state);
> >
> > I've got bot warnings because this is being called from noinstr code. I guess
> > the best option is to just move add_random_kstack_offset() to after
> > instrumentation is enabled for the affected arches.
>
> Just put instrumentation_begin()/instrumentation_end() around the call
> to prandom_u32_state() - that seems to be the common approach for
> punching holes into the 'noinstr' validation.
That silences the warning, but isn't necessarily safe, so please DO NOT
do that blindly. The instrumentation_{begin,end}() annotations are only
supposed to be used when we know by construction that instrumentation is
safe.
Generally, if you can move this to after instrumentation is already
enabled, that should be safe, and so that'd be the better approach.
Ryan, can you share those warnings (e.g. link to those reports)?
IIUC only x86 has noinstr validation, and from a quick scan, I expect
you see warnings from:
* do_syscall_64()
* do_int80_syscall_32()
* __do_fast_syscall_32()
For all of these, it is not safe to call instrumentable code before the
calls to {syscall_,}enter_from_user_mode{,_prepare}(). You'll need to
move the stack randomization after the existing instrumentation_begin()
calls.
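As an illustration, for x86's do_syscall_64() the change would be roughly of
this shape (sketch only, not a tested patch; the hunk context is approximate):

 __visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
 {
-	add_random_kstack_offset();
 	nr = syscall_enter_from_user_mode(regs, nr);
 
 	instrumentation_begin();
+	add_random_kstack_offset();
 
 	if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {

The same pattern applies to the other entry points listed above.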
We'll need to go check the other architectures similarly.
Mark.
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-02 9:35 ` Mark Rutland
@ 2025-12-02 9:39 ` Ryan Roberts
2025-12-02 9:53 ` Mark Rutland
2025-12-02 9:47 ` Ard Biesheuvel
1 sibling, 1 reply; 14+ messages in thread
From: Ryan Roberts @ 2025-12-02 9:39 UTC (permalink / raw)
To: Mark Rutland, Ard Biesheuvel
Cc: Kees Cook, Ard Biesheuvel, Will Deacon, Arnd Bergmann,
Jeremy Linton, Catalin Marinas, Jason A . Donenfeld,
linux-hardening, linux-arm-kernel, linux-kernel
On 02/12/2025 09:35, Mark Rutland wrote:
> On Tue, Dec 02, 2025 at 10:15:22AM +0100, Ard Biesheuvel wrote:
>> On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>> On 28/11/2025 11:01, Ard Biesheuvel wrote:
>>>> On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
>>>>> index 089b1432f7e6..83c7e6710f6d 100644
>>>>> --- a/include/linux/randomize_kstack.h
>>>>> +++ b/include/linux/randomize_kstack.h
>>>>> @@ -6,6 +6,7 @@
>>>>> #include <linux/kernel.h>
>>>>> #include <linux/jump_label.h>
>>>>> #include <linux/percpu-defs.h>
>>>>> +#include <linux/prandom.h>
>>>>>
>>>>> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>>>>> randomize_kstack_offset);
>>>>> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>>>>> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
>>>>> #endif
>>>>>
>>>>> +static __always_inline u32 get_update_kstack_offset(void)
>>>>> +{
>>>>> + return prandom_u32_state(&current->kstack_rnd_state);
>>>
>>> I've got bot warnings because this is being called from noinstr code. I guess
>>> the best option is to just move add_random_kstack_offset() to after
>>> instrumentation is enabled for the affected arches.
>>
>> Just put instrumentation_begin()/instrumentation_end() around the call
>> to prandom_u32_state() - that seems to be the common approach for
>> punching holes into the 'noinstr' validation.
>
> That silences the warning, but isn't necessarily safe, so please DO NOT
> do that blindly. The instrumentation_{begin,end}() annotations are only
> supposed to be used when we know by construction that instrumentation is
> safe.
>
> Generally, if you can move this to after instrumentation is already
> enabled, that should be safe, and so that'd be the better approach.
>
> Ryan, can you share those warnings (e.g. link to those reports)?
https://lore.kernel.org/oe-kbuild-all/202511302137.u4iIA2kf-lkp@intel.com/
https://lore.kernel.org/oe-kbuild-all/202511302317.BGFIlAYX-lkp@intel.com/
>
> IIUC only x86 has noinstr validation, and from a quick scan, I expect
> you see warnings from:
>
> * do_syscall_64()
> * do_int80_syscall_32()
> * __do_fast_syscall_32()
>
> For all of these, it is not safe to call instrumentable code before the
> calls to {syscall_,}enter_from_user_mode{,_prepare}(). You'll need to
> move the stack randomization after the existing instrumentation_begin()
> calls.
>
> We'll need to go check the other architectures similarly.
OK understood. I'll do it this way then.
>
> Mark.
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-02 9:35 ` Mark Rutland
2025-12-02 9:39 ` Ryan Roberts
@ 2025-12-02 9:47 ` Ard Biesheuvel
2025-12-02 10:02 ` Mark Rutland
1 sibling, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2025-12-02 9:47 UTC (permalink / raw)
To: Mark Rutland
Cc: Ryan Roberts, Kees Cook, Ard Biesheuvel, Will Deacon,
Arnd Bergmann, Jeremy Linton, Catalin Marinas,
Jason A . Donenfeld, linux-hardening, linux-arm-kernel,
linux-kernel
On Tue, 2 Dec 2025 at 10:35, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Dec 02, 2025 at 10:15:22AM +0100, Ard Biesheuvel wrote:
> > On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
> > > On 28/11/2025 11:01, Ard Biesheuvel wrote:
> > > > On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
> > > >> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> > > >> index 089b1432f7e6..83c7e6710f6d 100644
> > > >> --- a/include/linux/randomize_kstack.h
> > > >> +++ b/include/linux/randomize_kstack.h
> > > >> @@ -6,6 +6,7 @@
> > > >> #include <linux/kernel.h>
> > > >> #include <linux/jump_label.h>
> > > >> #include <linux/percpu-defs.h>
> > > >> +#include <linux/prandom.h>
> > > >>
> > > >> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> > > >> randomize_kstack_offset);
> > > >> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> > > >> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
> > > >> #endif
> > > >>
> > > >> +static __always_inline u32 get_update_kstack_offset(void)
> > > >> +{
> > > >> + return prandom_u32_state(&current->kstack_rnd_state);
> > >
> > > I've got bot warnings because this is being called from noinstr code. I guess
> > > the best option is to just move add_random_kstack_offset() to after
> > > instrumentation is enabled for the affected arches.
> >
> > Just put instrumentation_begin()/instrumentation_end() around the call
> > to prandom_u32_state() - that seems to be the common approach for
> > punching holes into the 'noinstr' validation.
>
> That silences the warning, but isn't necessarily safe, so please DO NOT
> do that blindly.
Oops - sorry about that.
> The instrumentation_{begin,end}() annotations are only
> supposed to be used when we know by construction that instrumentation is
> safe.
>
> Generally, if you can move this to after instrumentation is already
> enabled, that should be safe, and so that'd be the better approach.
>
> Ryan, can you share those warnings (e.g. link to those reports)?
>
> IIUC only x86 has noinstr validation, and from a quick scan, I expect
> you see warnings from:
>
> * do_syscall_64()
> * do_int80_syscall_32()
> * __do_fast_syscall_32()
>
> For all of these, it is not safe to call instrumentable code before the
> calls to {syscall_,}enter_from_user_mode{,_prepare}(). You'll need to
> move the stack rnadomization after the existing instrumentation_begin()
> calls.
>
> We'll need to go check the other architectures similarly.
>
Given that prandom_u32_state() does a fairly straight-forward mangle
of 4 32-bit words, might it be better to make that __always_inline
itself?
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-02 9:39 ` Ryan Roberts
@ 2025-12-02 9:53 ` Mark Rutland
2025-12-02 11:04 ` Ryan Roberts
0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2025-12-02 9:53 UTC (permalink / raw)
To: Ryan Roberts
Cc: Ard Biesheuvel, Kees Cook, Ard Biesheuvel, Will Deacon,
Arnd Bergmann, Jeremy Linton, Catalin Marinas,
Jason A . Donenfeld, linux-hardening, linux-arm-kernel,
linux-kernel
On Tue, Dec 02, 2025 at 09:39:40AM +0000, Ryan Roberts wrote:
> On 02/12/2025 09:35, Mark Rutland wrote:
> > On Tue, Dec 02, 2025 at 10:15:22AM +0100, Ard Biesheuvel wrote:
> >> On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>> On 28/11/2025 11:01, Ard Biesheuvel wrote:
> >>>> On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> >>>>> index 089b1432f7e6..83c7e6710f6d 100644
> >>>>> --- a/include/linux/randomize_kstack.h
> >>>>> +++ b/include/linux/randomize_kstack.h
> >>>>> @@ -6,6 +6,7 @@
> >>>>> #include <linux/kernel.h>
> >>>>> #include <linux/jump_label.h>
> >>>>> #include <linux/percpu-defs.h>
> >>>>> +#include <linux/prandom.h>
> >>>>>
> >>>>> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >>>>> randomize_kstack_offset);
> >>>>> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >>>>> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
> >>>>> #endif
> >>>>>
> >>>>> +static __always_inline u32 get_update_kstack_offset(void)
> >>>>> +{
> >>>>> + return prandom_u32_state(&current->kstack_rnd_state);
> >>>
> >>> I've got bot warnings because this is being called from noinstr code. I guess
> >>> the best option is to just move add_random_kstack_offset() to after
> >>> instrumentation is enabled for the affected arches.
> >>
> >> Just put instrumentation_begin()/instrumentation_end() around the call
> >> to prandom_u32_state() - that seems to be the common approach for
> >> punching holes into the 'noinstr' validation.
> >
> > That silences the warning, but isn't necessarily safe, so please DO NOT
> > do that blindly. The instrumentation_{begin,end}() annotations are only
> > supposed to be used when we know by construction that instrumentation is
> > safe.
> >
> > Generally, if you can move this to after instrumentation is already
> > enabled, that should be safe, and so that'd be the better approach.
> >
> > Ryan, can you share those warnings (e.g. link to those reports)?
>
> https://lore.kernel.org/oe-kbuild-all/202511302137.u4iIA2kf-lkp@intel.com/
> https://lore.kernel.org/oe-kbuild-all/202511302317.BGFIlAYX-lkp@intel.com/
Cool, so those are __do_fast_syscall_32() and do_syscall_64(), which I'd
expect (and those are legitimate warnings).
From a quick scan of v6.18, it looks like arm64, loongarch, powerpc, and
riscv perform all the necessary entry work before calling
add_random_kstack_offset(), and we'll need to change:
* s390's __do_syscall()
* x86's do_syscall_64()
* x86's do_int80_syscall_32()
* x86's __do_fast_syscall_32()
Mark.
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-02 9:47 ` Ard Biesheuvel
@ 2025-12-02 10:02 ` Mark Rutland
0 siblings, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2025-12-02 10:02 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ryan Roberts, Kees Cook, Ard Biesheuvel, Will Deacon,
Arnd Bergmann, Jeremy Linton, Catalin Marinas,
Jason A . Donenfeld, linux-hardening, linux-arm-kernel,
linux-kernel
On Tue, Dec 02, 2025 at 10:47:04AM +0100, Ard Biesheuvel wrote:
> On Tue, 2 Dec 2025 at 10:35, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Dec 02, 2025 at 10:15:22AM +0100, Ard Biesheuvel wrote:
> > > On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
> > > > I've got bot warnings because this is being called from noinstr code. I guess
> > > > the best option is to just move add_random_kstack_offset() to after
> > > > instrumentation is enabled for the affected arches.
> > >
> > > Just put instrumentation_begin()/instrumentation_end() around the call
> > > to prandom_u32_state() - that seems to be the common approach for
> > > punching holes into the 'noinstr' validation.
> >
> > That silences the warning, but isn't necessarily safe, so please DO NOT
> > do that blindly.
>
> Oops - sorry about that.
No problem! I just wanted to make sure we didn't start to gain broken
usage that'd need an audit and cleanup.
[...]
> Given that prandom_u32_state() does a fairly straight-forward mangle
> of 4 32-bit words, might it be better to make that __always_inline
> itself?
Possibly! I don't know whether it's better to have prandom_u32_state()
inline or out-of-line.
So long as prandom_u32_state() doesn't call out to instrumented code,
making it an __always_inline function will be safe.
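For reference, the generator body is just the four-word Tausworthe (LFSR113)
step, so an inlined variant in prandom.h could look roughly like this
(constants copied from lib/random32.c; the double-underscore name is
illustrative):

static __always_inline u32 __prandom_u32_state(struct rnd_state *state)
{
#define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b)
	/* Step the four LFSR components and combine them. */
	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 4294967294U, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 4294967288U,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 4294967168U, 13U);
#undef TAUSWORTHE
	return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4);
}

The existing out-of-line prandom_u32_state() in lib/random32.c could then
simply wrap this, so no other callers need to change.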
Mark.
* Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
2025-12-02 9:53 ` Mark Rutland
@ 2025-12-02 11:04 ` Ryan Roberts
0 siblings, 0 replies; 14+ messages in thread
From: Ryan Roberts @ 2025-12-02 11:04 UTC (permalink / raw)
To: Mark Rutland
Cc: Ard Biesheuvel, Kees Cook, Ard Biesheuvel, Will Deacon,
Arnd Bergmann, Jeremy Linton, Catalin Marinas,
Jason A . Donenfeld, linux-hardening, linux-arm-kernel,
linux-kernel
On 02/12/2025 09:53, Mark Rutland wrote:
> On Tue, Dec 02, 2025 at 09:39:40AM +0000, Ryan Roberts wrote:
>> On 02/12/2025 09:35, Mark Rutland wrote:
>>> On Tue, Dec 02, 2025 at 10:15:22AM +0100, Ard Biesheuvel wrote:
>>>> On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>> On 28/11/2025 11:01, Ard Biesheuvel wrote:
>>>>>> On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
>>>>>>> index 089b1432f7e6..83c7e6710f6d 100644
>>>>>>> --- a/include/linux/randomize_kstack.h
>>>>>>> +++ b/include/linux/randomize_kstack.h
>>>>>>> @@ -6,6 +6,7 @@
>>>>>>> #include <linux/kernel.h>
>>>>>>> #include <linux/jump_label.h>
>>>>>>> #include <linux/percpu-defs.h>
>>>>>>> +#include <linux/prandom.h>
>>>>>>>
>>>>>>> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>>>>>>> randomize_kstack_offset);
>>>>>>> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
>>>>>>> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
>>>>>>> #endif
>>>>>>>
>>>>>>> +static __always_inline u32 get_update_kstack_offset(void)
>>>>>>> +{
>>>>>>> + return prandom_u32_state(&current->kstack_rnd_state);
>>>>>
>>>>> I've got bot warnings because this is being called from noinstr code. I guess
>>>>> the best option is to just move add_random_kstack_offset() to after
>>>>> instrumentation is enabled for the affected arches.
>>>>
>>>> Just put instrumentation_begin()/instrumentation_end() around the call
>>>> to prandom_u32_state() - that seems to be the common approach for
>>>> punching holes into the 'noinstr' validation.
>>>
>>> That silences the warning, but isn't necessarily safe, so please DO NOT
>>> do that blindly. The instrumentation_{begin,end}() annotations are only
>>> supposed to be used when we know by construction that instrumentation is
>>> safe.
>>>
>>> Generally, if you can move this to after instrumentation is already
>>> enabled, that should be safe, and so that'd be the better approach.
>>>
>>> Ryan, can you share those warnings (e.g. link to those reports)?
>>
>> https://lore.kernel.org/oe-kbuild-all/202511302137.u4iIA2kf-lkp@intel.com/
>> https://lore.kernel.org/oe-kbuild-all/202511302317.BGFIlAYX-lkp@intel.com/
>
> Cool, so those are __do_fast_syscall_32() and do_syscall_64(), which I'd
> expect (and those are legitimate warnings).
>
> From a quick scan of v6.18, it looks like arm64, loongarch, powerpc, and
> riscv perform all the necessary entry work before calling
> add_random_kstack_offset(), and we'll need to change:
>
> * s390's __do_syscall()
> * x86's do_syscall_64()
> * x86's do_int80_syscall_32()
> * x86's __do_fast_syscall_32()
Yup. I'll have a play with inline vs out-of-line. If I go with out-of-line,
I'll fix these sites up.
>
> Mark.
* Re: [RFC PATCH v1 0/2] Fix bugs and performance of kstack offset randomisation
2025-11-27 10:59 [RFC PATCH v1 0/2] Fix bugs and performance of kstack offset randomisation Ryan Roberts
2025-11-27 10:59 ` [RFC PATCH v1 1/2] randomize_kstack: Maintain kstack_offset per task Ryan Roberts
2025-11-27 10:59 ` [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches Ryan Roberts
@ 2025-12-02 16:59 ` Kees Cook
2 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2025-12-02 16:59 UTC (permalink / raw)
To: Ryan Roberts
Cc: Ard Biesheuvel, Will Deacon, Arnd Bergmann, Jeremy Linton,
Catalin Marinas, Mark Rutland, Jason A . Donenfeld,
linux-hardening, linux-arm-kernel, linux-kernel
On Thu, Nov 27, 2025 at 10:59:54AM +0000, Ryan Roberts wrote:
> As I reported at [1], kstack offset randomisation suffers from a couple of bugs
> and, on arm64 at least, the performance is poor. This series attempts to fix
> both; patch 1 provides back-portable fixes for the functional bugs. Patch 2
> proposes a performance improvement approach.
Just to chime in -- this looks really good to me! Universally improved
performance, reduced complexity, and better randomness. :) I'm looking
forward to v2!
-Kees
--
Kees Cook