* [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
@ 2025-08-14 10:13 Oleg Nesterov
2025-08-14 10:14 ` [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct Oleg Nesterov
` (6 more replies)
0 siblings, 7 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-14 10:13 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86
Sorry, I have no idea how to test these changes, please review. Especially
4/6 and 5/6, I don't really understand shstk.c.
If you are fine with these changes, I'll try to update the fpregs_soft_get()
and user_regset.set() paths as well.
Oleg.
---
arch/x86/include/asm/shstk.h | 8 ++++----
arch/x86/kernel/fpu/regset.c | 46 ++++++++++++++++++++++++++------------------
arch/x86/kernel/fpu/xstate.c | 12 ++++++------
arch/x86/kernel/fpu/xstate.h | 4 ++--
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/shstk.c | 19 +++++++++++++-----
7 files changed, 55 insertions(+), 38 deletions(-)
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
@ 2025-08-14 10:14 ` Oleg Nesterov
2025-08-14 16:46 ` Edgecombe, Rick P
2025-08-14 10:14 ` [PATCH 2/6] x86/fpu: regset: introduce get_fpstate() helper Oleg Nesterov
` (5 subsequent siblings)
6 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-14 10:14 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86
Preparation for the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 10 ++++++----
arch/x86/kernel/fpu/xstate.c | 12 ++++++------
arch/x86/kernel/fpu/xstate.h | 4 ++--
3 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 0986c2200adc..d280d415b171 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -83,7 +83,7 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
sizeof(fpu->fpstate->regs.fxsave));
}
- copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_FX);
+ copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FX);
return 0;
}
@@ -130,12 +130,14 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
+ struct fpu *fpu = x86_task_fpu(target);
+
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;
- sync_fpstate(x86_task_fpu(target));
+ sync_fpstate(fpu);
- copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
+ copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_XSAVE);
return 0;
}
@@ -419,7 +421,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf mb = { .p = &fxsave, .left = sizeof(fxsave) };
/* Handle init state optimized xstate correctly */
- copy_xstate_to_uabi_buf(mb, target, XSTATE_COPY_FP);
+ copy_xstate_to_uabi_buf(mb, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FP);
fx = &fxsave;
} else {
fx = &fpu->fpstate->regs.fxsave;
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 12ed75c1b567..2bd5974d5f0e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1256,7 +1256,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
/**
* copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
* @to: membuf descriptor
- * @tsk: The task from which to copy the saved xstate
+ * @fpstate: The fpstate buffer from which to copy
+ * @pkru_val: The PKRU value to store in the PKRU component
* @copy_mode: The requested copy mode
*
* Converts from kernel XSAVE or XSAVES compacted format to UABI conforming
@@ -1265,12 +1266,11 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*
* It supports partial copy but @to.pos always starts from zero.
*/
-void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
- enum xstate_copy_mode copy_mode)
+void copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
+ u32 pkru_val, enum xstate_copy_mode copy_mode)
{
- __copy_xstate_to_uabi_buf(to, x86_task_fpu(tsk)->fpstate,
- x86_task_fpu(tsk)->fpstate->user_xfeatures,
- tsk->thread.pkru, copy_mode);
+ __copy_xstate_to_uabi_buf(to, fpstate, fpstate->user_xfeatures,
+ pkru_val, copy_mode);
}
static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 52ce19289989..9d76ded84cdd 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -46,8 +46,8 @@ struct membuf;
extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u64 xfeatures, u32 pkru_val,
enum xstate_copy_mode copy_mode);
-extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
- enum xstate_copy_mode mode);
+extern void copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
+ u32 pkru_val, enum xstate_copy_mode copy_mode);
extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 2/6] x86/fpu: regset: introduce get_fpstate() helper
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
2025-08-14 10:14 ` [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct Oleg Nesterov
@ 2025-08-14 10:14 ` Oleg Nesterov
2025-08-14 10:14 ` [PATCH 3/6] x86/fpu: fold sync_fpstate() into get_fpstate() Oleg Nesterov
` (4 subsequent siblings)
6 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-14 10:14 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86
After the previous change the regset get() functions do not really need
"struct fpu *", they can use "struct fpstate *" returned by the new helper
which also does sync_fpstate().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 42 ++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index d280d415b171..f5a803774e1c 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -49,6 +49,13 @@ static void sync_fpstate(struct fpu *fpu)
fpu_sync_fpstate(fpu);
}
+static struct fpstate *get_fpstate(struct task_struct *task)
+{
+ struct fpu *fpu = x86_task_fpu(task);
+ sync_fpstate(fpu);
+ return fpu->fpstate;
+}
+
/*
* Invalidate cached FPU registers before modifying the stopped target
* task's fpstate.
@@ -71,19 +78,19 @@ static void fpu_force_restore(struct fpu *fpu)
int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV;
- sync_fpstate(fpu);
+ fpstate = get_fpstate(target);
if (!use_xsave()) {
- return membuf_write(&to, &fpu->fpstate->regs.fxsave,
- sizeof(fpu->fpstate->regs.fxsave));
+ return membuf_write(&to, &fpstate->regs.fxsave,
+ sizeof(fpstate->regs.fxsave));
}
- copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FX);
+ copy_xstate_to_uabi_buf(to, fpstate, target->thread.pkru, XSTATE_COPY_FX);
return 0;
}
@@ -130,14 +137,13 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;
- sync_fpstate(fpu);
-
- copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_XSAVE);
+ fpstate = get_fpstate(target);
+ copy_xstate_to_uabi_buf(to, fpstate, target->thread.pkru, XSTATE_COPY_XSAVE);
return 0;
}
@@ -189,15 +195,15 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
struct cet_user_state *cetregs;
if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
!ssp_active(target, regset))
return -ENODEV;
- sync_fpstate(fpu);
- cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER);
+ fpstate = get_fpstate(target);
+ cetregs = get_xsave_addr(&fpstate->regs.xsave, XFEATURE_CET_USER);
if (WARN_ON(!cetregs)) {
/*
* This shouldn't ever be NULL because shadow stack was
@@ -403,17 +409,17 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
struct user_i387_ia32_struct env;
struct fxregs_state fxsave, *fx;
- sync_fpstate(fpu);
-
if (!cpu_feature_enabled(X86_FEATURE_FPU))
return fpregs_soft_get(target, regset, to);
+ fpstate = get_fpstate(target);
+
if (!cpu_feature_enabled(X86_FEATURE_FXSR)) {
- return membuf_write(&to, &fpu->fpstate->regs.fsave,
+ return membuf_write(&to, &fpstate->regs.fsave,
sizeof(struct fregs_state));
}
@@ -421,10 +427,10 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf mb = { .p = &fxsave, .left = sizeof(fxsave) };
/* Handle init state optimized xstate correctly */
- copy_xstate_to_uabi_buf(mb, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FP);
+ copy_xstate_to_uabi_buf(mb, fpstate, target->thread.pkru, XSTATE_COPY_FP);
fx = &fxsave;
} else {
- fx = &fpu->fpstate->regs.fxsave;
+ fx = &fpstate->regs.fxsave;
}
__convert_from_fxsr(&env, target, fx);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 3/6] x86/fpu: fold sync_fpstate() into get_fpstate()
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
2025-08-14 10:14 ` [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct Oleg Nesterov
2025-08-14 10:14 ` [PATCH 2/6] x86/fpu: regset: introduce get_fpstate() helper Oleg Nesterov
@ 2025-08-14 10:14 ` Oleg Nesterov
2025-08-14 10:14 ` [PATCH 4/6] x86/shstk: add "task_struct *tsk" argument to reset_thread_features() Oleg Nesterov
` (3 subsequent siblings)
6 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-14 10:14 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86
After the previous change sync_fpstate() has no other callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index f5a803774e1c..ecbabdc15ec1 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -43,16 +43,12 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
* - ptrace to dump fpstate of a stopped task, in which case the registers
* have already been saved to fpstate on context switch.
*/
-static void sync_fpstate(struct fpu *fpu)
-{
- if (fpu == x86_task_fpu(current))
- fpu_sync_fpstate(fpu);
-}
-
static struct fpstate *get_fpstate(struct task_struct *task)
{
struct fpu *fpu = x86_task_fpu(task);
- sync_fpstate(fpu);
+
+ if (task == current)
+ fpu_sync_fpstate(fpu);
return fpu->fpstate;
}
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 4/6] x86/shstk: add "task_struct *tsk" argument to reset_thread_features()
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
` (2 preceding siblings ...)
2025-08-14 10:14 ` [PATCH 3/6] x86/fpu: fold sync_fpstate() into get_fpstate() Oleg Nesterov
@ 2025-08-14 10:14 ` Oleg Nesterov
2025-08-14 10:14 ` [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs Oleg Nesterov
` (2 subsequent siblings)
6 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-14 10:14 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86
Preparation for the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/include/asm/shstk.h | 4 ++--
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/shstk.c | 8 ++++----
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index ba6f2fe43848..92d449cc352a 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -15,7 +15,7 @@ struct thread_shstk {
};
long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
-void reset_thread_features(void);
+void reset_thread_features(struct task_struct *task);
unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
unsigned long stack_size);
void shstk_free(struct task_struct *p);
@@ -26,7 +26,7 @@ bool shstk_is_enabled(void);
#else
static inline long shstk_prctl(struct task_struct *task, int option,
unsigned long arg2) { return -EINVAL; }
-static inline void reset_thread_features(void) {}
+static inline void reset_thread_features(struct task_struct *task) {}
static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
unsigned long clone_flags,
unsigned long stack_size) { return 0; }
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 52a5c03c353c..543425ea8d44 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -540,7 +540,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
load_gs_index(__USER_DS);
}
- reset_thread_features();
+ reset_thread_features(current);
loadsegment(fs, 0);
loadsegment(es, _ds);
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 2ddf23387c7e..e6d3b1371b11 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -184,11 +184,11 @@ static int shstk_setup(void)
return 0;
}
-void reset_thread_features(void)
+void reset_thread_features(struct task_struct *tsk)
{
- memset(¤t->thread.shstk, 0, sizeof(struct thread_shstk));
- current->thread.features = 0;
- current->thread.features_locked = 0;
+ memset(&tsk->thread.shstk, 0, sizeof(struct thread_shstk));
+ tsk->thread.features = 0;
+ tsk->thread.features_locked = 0;
}
unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
` (3 preceding siblings ...)
2025-08-14 10:14 ` [PATCH 4/6] x86/shstk: add "task_struct *tsk" argument to reset_thread_features() Oleg Nesterov
@ 2025-08-14 10:14 ` Oleg Nesterov
2025-08-14 17:03 ` Edgecombe, Rick P
2025-08-14 10:14 ` [PATCH 6/6] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER Oleg Nesterov
2025-08-15 15:52 ` [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
6 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-14 10:14 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86
If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
stack for no reason, the new (kernel) thread will never return to usermode.
Plus the current code doesn't even look correct, in this case fpu_clone()
won't call update_fpu_shstk().
Add the new "bool minimal = !!args->fn" argument (which matches that of
fpu_clone()) to shstk_alloc_thread_stack() and change it to do
reset_thread_features(tsk) if "minimal" is true.
With this patch ssp_get() -> ssp_active(target) should never return true
if target->flags & PF_USER_WORKER.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/include/asm/shstk.h | 4 ++--
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 11 ++++++++++-
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 92d449cc352a..dfb2aeebc25f 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -17,7 +17,7 @@ struct thread_shstk {
long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
void reset_thread_features(struct task_struct *task);
unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
- unsigned long stack_size);
+ bool minimal, unsigned long stack_size);
void shstk_free(struct task_struct *p);
int setup_signal_shadow_stack(struct ksignal *ksig);
int restore_signal_shadow_stack(void);
@@ -28,7 +28,7 @@ static inline long shstk_prctl(struct task_struct *task, int option,
unsigned long arg2) { return -EINVAL; }
static inline void reset_thread_features(struct task_struct *task) {}
static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
- unsigned long clone_flags,
+ unsigned long clone_flags, bool minimal,
unsigned long stack_size) { return 0; }
static inline void shstk_free(struct task_struct *p) {}
static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1b7960cf6eb0..e932e0e53972 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -209,7 +209,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
* is disabled, new_ssp will remain 0, and fpu_clone() will know not to
* update it.
*/
- new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
+ new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->fn, args->stack_size);
if (IS_ERR_VALUE(new_ssp))
return PTR_ERR((void *)new_ssp);
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index e6d3b1371b11..3da22c6f5874 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -192,11 +192,20 @@ void reset_thread_features(struct task_struct *tsk)
}
unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
- unsigned long stack_size)
+ bool minimal, unsigned long stack_size)
{
struct thread_shstk *shstk = &tsk->thread.shstk;
unsigned long addr, size;
+ /*
+ * Kernel threads cloned from userspace thread never return to
+ * usermode.
+ */
+ if (minimal) {
+ reset_thread_features(tsk);
+ return 0;
+ }
+
/*
* If shadow stack is not enabled on the new thread, skip any
* switch to a new shadow stack.
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 6/6] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
` (4 preceding siblings ...)
2025-08-14 10:14 ` [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs Oleg Nesterov
@ 2025-08-14 10:14 ` Oleg Nesterov
2025-08-15 15:52 ` [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
6 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-14 10:14 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86
PF_USER_WORKERs must never use FPU, this is what kernel_fpu_begin/etc
assume. The .regset_get() functions can safely use init_fpstate if
target->flags & PF_USER_WORKER.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index ecbabdc15ec1..dfd12d109f00 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,8 +45,12 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
*/
static struct fpstate *get_fpstate(struct task_struct *task)
{
- struct fpu *fpu = x86_task_fpu(task);
+ struct fpu *fpu;
+ if (unlikely(task->flags & PF_USER_WORKER))
+ return &init_fpstate;
+
+ fpu = x86_task_fpu(task);
if (task == current)
fpu_sync_fpstate(fpu);
return fpu->fpstate;
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct
2025-08-14 10:14 ` [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct Oleg Nesterov
@ 2025-08-14 16:46 ` Edgecombe, Rick P
2025-08-15 12:22 ` Oleg Nesterov
0 siblings, 1 reply; 33+ messages in thread
From: Edgecombe, Rick P @ 2025-08-14 16:46 UTC (permalink / raw)
To: mingo@kernel.org, dave.hansen@linux.intel.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, axboe@kernel.dk,
tglx@linutronix.de, Mehta, Sohil, oleg@redhat.com
Cc: linux-kernel@vger.kernel.org, x86@kernel.org
On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> Preparation for the next change.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
These patches and the coverletter have very little information.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-14 10:14 ` [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs Oleg Nesterov
@ 2025-08-14 17:03 ` Edgecombe, Rick P
2025-08-14 18:33 ` Mark Brown
2025-08-15 12:17 ` Oleg Nesterov
0 siblings, 2 replies; 33+ messages in thread
From: Edgecombe, Rick P @ 2025-08-14 17:03 UTC (permalink / raw)
To: mingo@kernel.org, dave.hansen@linux.intel.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, axboe@kernel.dk,
tglx@linutronix.de, Mehta, Sohil, oleg@redhat.com
Cc: broonie@kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org,
debug@rivosinc.com
+Mark and Deepak
On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> stack for no reason, the new (kernel) thread will never return to usermode.
>
> Plus the current code doesn't even look correct, in this case fpu_clone()
> won't call update_fpu_shstk().
The actual SSP register gets set when returning to userspace, it will get this
from MSR_IA32_PL3_SSP, which is restored from the xsave buffer. What I am seeing
is that fpu_clone() clears out the buffer in the 'minimal' case. So the xstate
copy of the SSP should already be zero and the problem is just that a shadow
stack gets allocated when one doesn't need to be.
I agree we don't need to allocate a shadow stack in this case, but I'm not sure
it is right to fully disable shadow stack in thread.features. First of all,
disabling it from shstk_alloc_thread_stack() seems weird. It just handles
allocating shadow stacks.
But also it seems the requirements are very similar to the vfork case where we
don't allocate a shadow stack. If we implement it similarly we can have less
special cases.
Lastly, it doesn't seem there is any way to clone from IO uring today, but there
were proposals. The general idea from the security POV is that copied threads
will keep shadow stack enabled. So it seems to fit better with the concepts
involved to not clear the thread.features.
How about just adding the 'minimal' condition to:
if (clone_flags & CLONE_VFORK) {
shstk->base = 0;
shstk->size = 0;
return 0;
}
...then update all the comments where vfork is called out as the only case that
does this?
>
> Add the new "bool minimal = !!args->fn" argument (which matches that of
> fpu_clone()) to shstk_alloc_thread_stack() and change it to do
> reset_thread_features(tsk) if "minimal" is true.
>
> With this patch ssp_get() -> ssp_active(target) should never return true
> if target->flags & PF_USER_WORKER.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> arch/x86/include/asm/shstk.h | 4 ++--
> arch/x86/kernel/process.c | 2 +-
> arch/x86/kernel/shstk.c | 11 ++++++++++-
> 3 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
> index 92d449cc352a..dfb2aeebc25f 100644
> --- a/arch/x86/include/asm/shstk.h
> +++ b/arch/x86/include/asm/shstk.h
> @@ -17,7 +17,7 @@ struct thread_shstk {
> long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
> void reset_thread_features(struct task_struct *task);
> unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
> - unsigned long stack_size);
> + bool minimal, unsigned long stack_size);
> void shstk_free(struct task_struct *p);
> int setup_signal_shadow_stack(struct ksignal *ksig);
> int restore_signal_shadow_stack(void);
> @@ -28,7 +28,7 @@ static inline long shstk_prctl(struct task_struct *task, int option,
> unsigned long arg2) { return -EINVAL; }
> static inline void reset_thread_features(struct task_struct *task) {}
> static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
> - unsigned long clone_flags,
> + unsigned long clone_flags, bool minimal,
> unsigned long stack_size) { return 0; }
> static inline void shstk_free(struct task_struct *p) {}
> static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 1b7960cf6eb0..e932e0e53972 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -209,7 +209,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> * is disabled, new_ssp will remain 0, and fpu_clone() will know not to
> * update it.
> */
> - new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
> + new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->fn, args->stack_size);
> if (IS_ERR_VALUE(new_ssp))
> return PTR_ERR((void *)new_ssp);
>
> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index e6d3b1371b11..3da22c6f5874 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -192,11 +192,20 @@ void reset_thread_features(struct task_struct *tsk)
> }
>
> unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
> - unsigned long stack_size)
> + bool minimal, unsigned long stack_size)
> {
> struct thread_shstk *shstk = &tsk->thread.shstk;
> unsigned long addr, size;
>
> + /*
> + * Kernel threads cloned from userspace thread never return to
> + * usermode.
> + */
> + if (minimal) {
> + reset_thread_features(tsk);
> + return 0;
> + }
> +
> /*
> * If shadow stack is not enabled on the new thread, skip any
> * switch to a new shadow stack.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-14 17:03 ` Edgecombe, Rick P
@ 2025-08-14 18:33 ` Mark Brown
2025-08-14 22:43 ` Edgecombe, Rick P
2025-08-15 13:01 ` Oleg Nesterov
2025-08-15 12:17 ` Oleg Nesterov
1 sibling, 2 replies; 33+ messages in thread
From: Mark Brown @ 2025-08-14 18:33 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: mingo@kernel.org, dave.hansen@linux.intel.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, axboe@kernel.dk,
tglx@linutronix.de, Mehta, Sohil, oleg@redhat.com,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
[-- Attachment #1: Type: text/plain, Size: 1862 bytes --]
On Thu, Aug 14, 2025 at 05:03:36PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > stack for no reason, the new (kernel) thread will never return to usermode.
> I agree we don't need to allocate a shadow stack in this case, but I'm not sure
> it is right to fully disable shadow stack in thread.features. First of all,
> disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> allocating shadow stacks.
I agree that it's better to leave userspace shadow stacks enabled, given
that the reason we're not allocating the shadow stack is that we don't
expect to ever return to userspace then it should be fine to leave the
feature turned on for userspace. If we mess up and do somehow return to
userspace it seems better to have the feature enabled and hopefully give
us some protection against our mistake, and if something causes the
worker thread to start a normal thread then things should work as
expected.
> How about just adding the 'minimal' condition to:
> if (clone_flags & CLONE_VFORK) {
> shstk->base = 0;
> shstk->size = 0;
> return 0;
> }
> ...then update all the comments where vfork is called out as the only case that
> does this?
Perhaps we should factor the logic for deciding if we need to allocate a
userspace shadow stack out into the arch independent code and do
something like set a flag in kernel_clone_args that the arches can
check? I think the logic is the same for all arches at the minute and
don't see a reason why it'd diverge. That'd collide a bit with my
clone3() series, there's some overlap there with that creating another
reason why the decision would change. Reducing the duplication would be
nice.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-14 18:33 ` Mark Brown
@ 2025-08-14 22:43 ` Edgecombe, Rick P
2025-08-15 11:44 ` Mark Brown
2025-08-15 13:01 ` Oleg Nesterov
1 sibling, 1 reply; 33+ messages in thread
From: Edgecombe, Rick P @ 2025-08-14 22:43 UTC (permalink / raw)
To: broonie@kernel.org
Cc: debug@rivosinc.com, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de, axboe@kernel.dk,
Mehta, Sohil, oleg@redhat.com, x86@kernel.org
On Thu, 2025-08-14 at 19:33 +0100, Mark Brown wrote:
> > How about just adding the 'minimal' condition to:
> > if (clone_flags & CLONE_VFORK) {
> > shstk->base = 0;
> > shstk->size = 0;
> > return 0;
> > }
> > ...then update all the comments where vfork is called out as the only case
> > that
> > does this?
>
> Perhaps we should factor the logic for deciding if we need to allocate a
> userspace shadow stack out into the arch independent code and do
> something like set a flag in kernel_clone_args that the arches can
> check? I think the logic is the same for all arches at the minute and
> don't see a reason why it'd diverge.
Sounds good. Like a little start to this:
https://lore.kernel.org/lkml/20241010-shstk_converge-v1-0-631beca676e7@rivosinc.com/
> That'd collide a bit with my
> clone3() series, there's some overlap there with that creating another
> reason why the decision would change. Reducing the duplication would be
> nice.
Argh, I need to test the latest of that still.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-14 22:43 ` Edgecombe, Rick P
@ 2025-08-15 11:44 ` Mark Brown
2025-08-15 19:11 ` Deepak Gupta
0 siblings, 1 reply; 33+ messages in thread
From: Mark Brown @ 2025-08-15 11:44 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de, axboe@kernel.dk,
Mehta, Sohil, oleg@redhat.com, x86@kernel.org
[-- Attachment #1: Type: text/plain, Size: 1605 bytes --]
On Thu, Aug 14, 2025 at 10:43:45PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-08-14 at 19:33 +0100, Mark Brown wrote:
> > Perhaps we should factor the logic for deciding if we need to allocate a
> > userspace shadow stack out into the arch independent code and do
> > something like set a flag in kernel_clone_args that the arches can
> > check? I think the logic is the same for all arches at the minute and
> > don't see a reason why it'd diverge.
> Sounds good. Like a little start to this:
> https://lore.kernel.org/lkml/20241010-shstk_converge-v1-0-631beca676e7@rivosinc.com/
Yes, exactly.
> > That'd collide a bit with my
> > clone3() series, there's some overlap there with that creating another
> > reason why the decision would change. Reducing the duplication would be
> > nice.
> Argh, I need to test the latest of that still.
Yury pointed me at some newer x86 systems I was able to get access to so
I was finally able to test that one myself before sending it out,
confirmation would be good but hopefully it's fine. I've been holding
back on sending a rebased version out since Deepak was going to help me
get set up to test it on RISC-V. Though I see now that the RISC-V code
has vanished from -next (I guess due to fallout from the issues with the
merge to Linus, it looks like there's almost nothing in the branch
currently), not sure what the plan is there?
Perhaps I should just send it out, but given the difficulty getting
anyone to pay attention I was trying to avoid issues with missing
updates for newly added RISC-V shadow stacks.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-14 17:03 ` Edgecombe, Rick P
2025-08-14 18:33 ` Mark Brown
@ 2025-08-15 12:17 ` Oleg Nesterov
2025-08-15 16:19 ` Edgecombe, Rick P
1 sibling, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 12:17 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: mingo@kernel.org, dave.hansen@linux.intel.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, axboe@kernel.dk,
tglx@linutronix.de, Mehta, Sohil, broonie@kernel.org,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
On 08/14, Edgecombe, Rick P wrote:
>
> +Mark and Deepak
>
> On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > stack for no reason, the new (kernel) thread will never return to usermode.
> >
> > Plus the current code doesn't even look correct, in this case fpu_clone()
> > won't call update_fpu_shstk().
...
> I agree we don't need to allocate a shadow stack in this case,
Great,
> but I'm not sure
> it is right to fully disable shadow stack in thread.features.
Why?
> First of all,
> disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> allocating shadow stacks.
I agree in advance with any other change.
> Lastly, it doesn't seem there is any way to clone from IO uring today,
Not sure I understand... create_io_thread() ?
> How about just adding the 'minimal' condition to:
> if (clone_flags & CLONE_VFORK) {
> shstk->base = 0;
> shstk->size = 0;
> return 0;
> }
> ...then update all the comments where vfork is called out as the only case that
> does this?
but create_io_thread() and vhost_task_create() do not use CLONE_VFORK?
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct
2025-08-14 16:46 ` Edgecombe, Rick P
@ 2025-08-15 12:22 ` Oleg Nesterov
0 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 12:22 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: mingo@kernel.org, dave.hansen@linux.intel.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, axboe@kernel.dk,
tglx@linutronix.de, Mehta, Sohil, linux-kernel@vger.kernel.org,
x86@kernel.org
On 08/14, Edgecombe, Rick P wrote:
>
> On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> > Preparation for the next change.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> These patches and the coverletter have very little information.
Agreed. I forgot to add the links to the previous discussions,
please see
Warning from x86_task_fpu()
https://lore.kernel.org/all/aJVuZZgYjEMxiUYq@ly-workstation/
PF_USER_WORKERs and shadow stack
https://lore.kernel.org/all/20250813162824.GA15234@redhat.com/
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-14 18:33 ` Mark Brown
2025-08-14 22:43 ` Edgecombe, Rick P
@ 2025-08-15 13:01 ` Oleg Nesterov
2025-08-15 13:08 ` Oleg Nesterov
1 sibling, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 13:01 UTC (permalink / raw)
To: Mark Brown
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
axboe@kernel.dk, tglx@linutronix.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
On 08/14, Mark Brown wrote:
>
> On Thu, Aug 14, 2025 at 05:03:36PM +0000, Edgecombe, Rick P wrote:
> > On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
>
> > > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > > stack for no reason, the new (kernel) thread will never return to usermode.
>
> > I agree we don't need to allocate a shadow stack in this case, but I'm not sure
> > it is right to fully disable shadow stack in thread.features. First of all,
> > disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> > allocating shadow stacks.
>
> I agree that it's better to leave userspace shadow stacks enabled, given
> that the reason we're not allocating the shadow stack is that we don't
> expect to ever return to userspace then it should be fine to leave the
> feature turned on for userspace. If we mess up and do somehow return to
> userspace
But a PF_USER_WORKER task can never return to userspace. It doesn't differ
from PF_KTHREAD in this respect.
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 13:01 ` Oleg Nesterov
@ 2025-08-15 13:08 ` Oleg Nesterov
2025-08-15 15:28 ` Mark Brown
0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 13:08 UTC (permalink / raw)
To: Mark Brown
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
axboe@kernel.dk, tglx@linutronix.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
On 08/15, Oleg Nesterov wrote:
>
> On 08/14, Mark Brown wrote:
> >
> > On Thu, Aug 14, 2025 at 05:03:36PM +0000, Edgecombe, Rick P wrote:
> > > On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> >
> > > > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > > > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > > > stack for no reason, the new (kernel) thread will never return to usermode.
> >
> > > I agree we don't need to allocate a shadow stack in this case, but I'm not sure
> > > it is right to fully disable shadow stack in thread.features. First of all,
> > > disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> > > allocating shadow stacks.
> >
> > I agree that it's better to leave userspace shadow stacks enabled, given
> > that the reason we're not allocating the shadow stack is that we don't
> > expect to ever return to userspace then it should be fine to leave the
> > feature turned on for userspace. If we mess up and do somehow return to
> > userspace
>
> But a PF_USER_WORKER task can never return to userspace. It doesn't differ
> from PF_KTHREAD in this respect.
... of course unless it does exec.
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 13:08 ` Oleg Nesterov
@ 2025-08-15 15:28 ` Mark Brown
2025-08-15 15:43 ` Oleg Nesterov
0 siblings, 1 reply; 33+ messages in thread
From: Mark Brown @ 2025-08-15 15:28 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
axboe@kernel.dk, tglx@linutronix.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
[-- Attachment #1: Type: text/plain, Size: 833 bytes --]
On Fri, Aug 15, 2025 at 03:08:52PM +0200, Oleg Nesterov wrote:
> On 08/15, Oleg Nesterov wrote:
> > On 08/14, Mark Brown wrote:
> > > I agree that it's better to leave userspace shadow stacks enabled, given
> > > that the reason we're not allocating the shadow stack is that we don't
> > > expect to ever return to userspace then it should be fine to leave the
> > > feature turned on for userspace. If we mess up and do somehow return to
> > > userspace
> > But a PF_USER_WORKER task can never return to userspace. It doesn't differ
> > from PF_KTHREAD in this respect.
> ... of course unless it does exec.
Sure, but OTOH at least for arm64 there's no cost to leaving the feature
enabled unless you actually execute userspace code so if we never return
to userspace writing the code to disable isn't really buying us anything.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 15:28 ` Mark Brown
@ 2025-08-15 15:43 ` Oleg Nesterov
2025-08-15 15:48 ` Mark Brown
0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 15:43 UTC (permalink / raw)
To: Mark Brown
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
axboe@kernel.dk, tglx@linutronix.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
On 08/15, Mark Brown wrote:
>
> On Fri, Aug 15, 2025 at 03:08:52PM +0200, Oleg Nesterov wrote:
> > On 08/15, Oleg Nesterov wrote:
> > > On 08/14, Mark Brown wrote:
>
> > > > I agree that it's better to leave userspace shadow stacks enabled, given
> > > > that the reason we're not allocating the shadow stack is that we don't
> > > > expect to ever return to userspace then it should be fine to leave the
> > > > feature turned on for userspace. If we mess up and do somehow return to
> > > > userspace
>
> > > But a PF_USER_WORKER task can never return to userspace. It doesn't differ
> > > from PF_KTHREAD in this respect.
>
> > ... of course unless it does exec.
>
> Sure, but OTOH at least for arm64 there's no cost to leaving the feature
> enabled unless you actually execute userspace code so if we never return
> to userspace writing the code to disable isn't really buying us anything.
The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.
I'd like to make this logic consistent with PF_KTHREAD, and in the longer
term change the x86 FPU code so that the kernel threads can run without
without "struct fpu" attached to task_struct.
Again, please see
https://lore.kernel.org/all/20250813191441.GA26754@redhat.com/
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 15:43 ` Oleg Nesterov
@ 2025-08-15 15:48 ` Mark Brown
2025-08-15 16:00 ` Oleg Nesterov
0 siblings, 1 reply; 33+ messages in thread
From: Mark Brown @ 2025-08-15 15:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
axboe@kernel.dk, tglx@linutronix.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
[-- Attachment #1: Type: text/plain, Size: 758 bytes --]
On Fri, Aug 15, 2025 at 05:43:11PM +0200, Oleg Nesterov wrote:
> On 08/15, Mark Brown wrote:
> > Sure, but OTOH at least for arm64 there's no cost to leaving the feature
> > enabled unless you actually execute userspace code so if we never return
> > to userspace writing the code to disable isn't really buying us anything.
> The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
> the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.
> I'd like to make this logic consistent with PF_KTHREAD, and in the longer
> term change the x86 FPU code so that the kernel threads can run without
> without "struct fpu" attached to task_struct.
OK, that's entirely x86 specific - there's no reason we'd want to do
that for arm64.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
` (5 preceding siblings ...)
2025-08-14 10:14 ` [PATCH 6/6] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER Oleg Nesterov
@ 2025-08-15 15:52 ` Oleg Nesterov
2025-08-15 15:59 ` Dave Hansen
6 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 15:52 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner
Cc: linux-kernel, x86, Mark Brown
Dave, Sohil, what do you think?
OK, it seems that 5/6 (and thus 6/6) needs more discussion, but what
about 1-3 for the start?
These changes simply shift x86_task_fpu() and sync_fpstate() from
.regset_get() paths into the single helper, get_fpstate(). To me this
makes sense...
Oleg.
On 08/14, Oleg Nesterov wrote:
>
> Sorry, I have no idea how to test these changes, please review. Especially
> 4/6 and 5/6, I don't really understand shstk.c.
>
> If you are fine with these changes, I'll try to update the fpregs_soft_get()
> and user_regset.set() paths as well.
>
> Oleg.
> ---
>
> arch/x86/include/asm/shstk.h | 8 ++++----
> arch/x86/kernel/fpu/regset.c | 46 ++++++++++++++++++++++++++------------------
> arch/x86/kernel/fpu/xstate.c | 12 ++++++------
> arch/x86/kernel/fpu/xstate.h | 4 ++--
> arch/x86/kernel/process.c | 2 +-
> arch/x86/kernel/process_64.c | 2 +-
> arch/x86/kernel/shstk.c | 19 +++++++++++++-----
> 7 files changed, 55 insertions(+), 38 deletions(-)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-15 15:52 ` [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
@ 2025-08-15 15:59 ` Dave Hansen
2025-08-15 16:02 ` Oleg Nesterov
0 siblings, 1 reply; 33+ messages in thread
From: Dave Hansen @ 2025-08-15 15:59 UTC (permalink / raw)
To: Oleg Nesterov, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ingo Molnar, Jens Axboe, Peter Zijlstra, Rick Edgecombe,
Sohil Mehta, Thomas Gleixner
Cc: linux-kernel, x86, Mark Brown
On 8/15/25 08:52, Oleg Nesterov wrote:
> Dave, Sohil, what do you think?
>
> OK, it seems that 5/6 (and thus 6/6) needs more discussion, but what
> about 1-3 for the start?
They kinda need fleshed out changelogs first, don't you think?
Especially 1/6. It also needs an actual cover letter explaining what
it's trying to do and the larger context. Ideally, it's self-contained
and as opposed to links to previous discussions.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 15:48 ` Mark Brown
@ 2025-08-15 16:00 ` Oleg Nesterov
2025-08-15 17:08 ` Mark Brown
0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 16:00 UTC (permalink / raw)
To: Mark Brown
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
axboe@kernel.dk, tglx@linutronix.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
On 08/15, Mark Brown wrote:
>
> On Fri, Aug 15, 2025 at 05:43:11PM +0200, Oleg Nesterov wrote:
> > On 08/15, Mark Brown wrote:
>
> > > Sure, but OTOH at least for arm64 there's no cost to leaving the feature
> > > enabled unless you actually execute userspace code so if we never return
> > > to userspace writing the code to disable isn't really buying us anything.
>
> > The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
> > the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.
>
> > I'd like to make this logic consistent with PF_KTHREAD, and in the longer
> > term change the x86 FPU code so that the kernel threads can run without
> > without "struct fpu" attached to task_struct.
>
> OK, that's entirely x86 specific - there's no reason we'd want to do
> that for arm64.
Since I know nothing about arm64. Any reason we do want to have the unnecessary
ARCH_SHSTK_SHSTK/shstk on arm64?
And... do you agree that shstk_alloc_thread_stack() without update_fpu_shstk()
in copy_thread() path doesn't look right? Even if nothing really bad can happen.
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-15 15:59 ` Dave Hansen
@ 2025-08-15 16:02 ` Oleg Nesterov
2025-08-15 16:32 ` Sohil Mehta
0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 16:02 UTC (permalink / raw)
To: Dave Hansen
Cc: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Sohil Mehta,
Thomas Gleixner, linux-kernel, x86, Mark Brown
On 08/15, Dave Hansen wrote:
>
> On 8/15/25 08:52, Oleg Nesterov wrote:
> > Dave, Sohil, what do you think?
> >
> > OK, it seems that 5/6 (and thus 6/6) needs more discussion, but what
> > about 1-3 for the start?
> They kinda need fleshed out changelogs first, don't you think?
> Especially 1/6. It also needs an actual cover letter explaining what
> it's trying to do and the larger context. Ideally, it's self-contained
> and as opposed to links to previous discussions.
I agree about the cover letter, but what else do you think the changelog
in 1/6 could say? ;)
OK, I'll try to improve and resend.
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 12:17 ` Oleg Nesterov
@ 2025-08-15 16:19 ` Edgecombe, Rick P
2025-08-15 16:54 ` Oleg Nesterov
0 siblings, 1 reply; 33+ messages in thread
From: Edgecombe, Rick P @ 2025-08-15 16:19 UTC (permalink / raw)
To: oleg@redhat.com
Cc: debug@rivosinc.com, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
broonie@kernel.org, tglx@linutronix.de, axboe@kernel.dk,
Mehta, Sohil, linux-kernel@vger.kernel.org, x86@kernel.org
On Fri, 2025-08-15 at 14:17 +0200, Oleg Nesterov wrote:
> > but I'm not sure
> > it is right to fully disable shadow stack in thread.features.
>
> Why?
The bit in thread.features is like a sticky bit that is inherrited whenver a
thread is cloned. How it works normally is that the first thread in the app
(really glibc loader) enables shadow stacks, then new threads automatically
inherit that shadow stack is enabled. So in practice it is like a process wide
thing, but stored on each thread. This process-wide behavior is to add to the
security. You don't want to allow a protected app to spawn a new thread that
escapes the enforcement. There is a way to manually disable shadow stack per-
thread, but it is protected by ARCH_SHSTK_LOCK, which gets set by glibc loader
before jumping into the actual app.
When shadow stack is enabled, depending on the circumstances a new shadow stack
will automatically be allocated for a new thread. shstk->base and shstk->size
are about that automatically enabled shadow stack.
So what are we trying to do for PF_USER_WORKER? Prevent them from wasting a VMA
with an unused shadow stack? Or set PF_USER_WORKER's aside from the logic that
is about more than protecting the individual thread in the process?
>
> > First of all,
> > disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> > allocating shadow stacks.
>
> I agree in advance with any other change.
>
> > Lastly, it doesn't seem there is any way to clone from IO uring today,
>
> Not sure I understand... create_io_thread() ?
There was some discussion in the past about adding a clone, but the comment was
more about whether it fit the concepts in involved.
https://lwn.net/Articles/908268/
>
> > How about just adding the 'minimal' condition to:
> > if (clone_flags & CLONE_VFORK) {
> > shstk->base = 0;
> > shstk->size = 0;
> > return 0;
> > }
> > ...then update all the comments where vfork is called out as the only case
> > that
> > does this?
>
> but create_io_thread() and vhost_task_create() do not use CLONE_VFORK?
No, to make it have the same logic as the vfork case (which doesn't allocate a
new shadow stack).
Like:
if ((clone_flags & CLONE_VFORK) || minimal) {
shstk->base = 0;
shstk->size = 0;
return 0;
}
Or as Mark was suggesting, replace it with something like:
if (needs_new_shstk(clone_args)) {
shstk->base = 0;
shstk->size = 0;
return 0;
}
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-15 16:02 ` Oleg Nesterov
@ 2025-08-15 16:32 ` Sohil Mehta
2025-08-15 19:33 ` Oleg Nesterov
0 siblings, 1 reply; 33+ messages in thread
From: Sohil Mehta @ 2025-08-15 16:32 UTC (permalink / raw)
To: Oleg Nesterov, Dave Hansen
Cc: Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
Jens Axboe, Peter Zijlstra, Rick Edgecombe, Thomas Gleixner,
linux-kernel, x86, Mark Brown
On 8/15/2025 9:02 AM, Oleg Nesterov wrote:
>>> Dave, Sohil, what do you think?
Thank you for doing this series.
I think it would be useful to categorize the impact of the "abuse" in
the cover letter. Is it going to cause kernel crashes, userspace crashes
or just incorrect reporting?
Are there any "must do" fixes that need to be backported in comparison
with the "good to have" optimizations? I am wondering if it might be
possible to structure the series that way to make the separation clear.
> I agree about the cover letter, but what else do you think the changelog
> in 1/6 could say? ;)
>
For folks like me who are barely familiar with the FPU code, some
additional context or reasoning would be surely be useful.
For example, I don't know why PKRU needs to be passed separately. I know
there is some history there but a line or two in the changelog might help.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 16:19 ` Edgecombe, Rick P
@ 2025-08-15 16:54 ` Oleg Nesterov
2025-08-15 17:46 ` Edgecombe, Rick P
0 siblings, 1 reply; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 16:54 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
broonie@kernel.org, tglx@linutronix.de, axboe@kernel.dk,
Mehta, Sohil, linux-kernel@vger.kernel.org, x86@kernel.org
On 08/15, Edgecombe, Rick P wrote:
>
> The bit in thread.features is like a sticky bit that is inherrited whenver a
> thread is cloned.
...
> You don't want to allow a protected app to spawn a new thread that
> escapes the enforcement.
Ah, this is clear. But again, PF_USER_WORKER is the kernel thread cloned
by the kernel. Yes, it shares the same thread-group, but this is only to
make SIGKILL/exit_group/etc work. It is not that userspace app can create
it via something like pthread_create().
> So what are we trying to do for PF_USER_WORKER? Prevent them from wasting a VMA
> with an unused shadow stack? Or set PF_USER_WORKER's aside from the logic that
> is about more than protecting the individual thread in the process?
Let me quote my answer to Mark:
The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.
I'd like to make this logic consistent with PF_KTHREAD, and in the longer
term change the x86 FPU code so that the kernel threads can run without
without "struct fpu" attached to task_struct.
And again, please see
Warning from x86_task_fpu()
https://lore.kernel.org/all/aJVuZZgYjEMxiUYq@ly-workstation/
PF_USER_WORKERs and shadow stack
https://lore.kernel.org/all/20250813162824.GA15234@redhat.com/
and 6/6 in this series.
> No, to make it have the same logic as the vfork case (which doesn't allocate a
> new shadow stack).
>
> Like:
> if ((clone_flags & CLONE_VFORK) || minimal) {
> shstk->base = 0;
> shstk->size = 0;
> return 0;
> }
Aha, got it. Agreed, but I think we also need to clear ARCH_SHSTK_SHSTK
copied by arch_dup_task_struct() ?
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 16:00 ` Oleg Nesterov
@ 2025-08-15 17:08 ` Mark Brown
0 siblings, 0 replies; 33+ messages in thread
From: Mark Brown @ 2025-08-15 17:08 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
axboe@kernel.dk, tglx@linutronix.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org, debug@rivosinc.com
[-- Attachment #1: Type: text/plain, Size: 723 bytes --]
On Fri, Aug 15, 2025 at 06:00:23PM +0200, Oleg Nesterov wrote:
> On 08/15, Mark Brown wrote:
> > OK, that's entirely x86 specific - there's no reason we'd want to do
> > that for arm64.
> Since I know nothing about arm64. Any reason we do want to have the unnecessary
> ARCH_SHSTK_SHSTK/shstk on arm64?
If you mean allocating the userspace shadow stack for threads that never
go to userspace then no, it's not needed.
> And... do you agree that shstk_alloc_thread_stack() without update_fpu_shstk()
> in copy_thread() path doesn't look right? Even if nothing really bad can happen.
Honestly the update_fpu_ stuff is sufficently x86 specific that it
doesn't really register as something I'd expect to see in that path.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 16:54 ` Oleg Nesterov
@ 2025-08-15 17:46 ` Edgecombe, Rick P
2025-08-15 19:13 ` Oleg Nesterov
0 siblings, 1 reply; 33+ messages in thread
From: Edgecombe, Rick P @ 2025-08-15 17:46 UTC (permalink / raw)
To: oleg@redhat.com
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, axboe@kernel.dk,
x86@kernel.org
On Fri, 2025-08-15 at 18:54 +0200, Oleg Nesterov wrote:
> > So what are we trying to do for PF_USER_WORKER? Prevent them from wasting a
> > VMA
> > with an unused shadow stack? Or set PF_USER_WORKER's aside from the logic
> > that
> > is about more than protecting the individual thread in the process?
>
> Let me quote my answer to Mark:
>
> The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK
> is
> the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.
Maybe you can explain the exact failure mode here? ARCH_SHSTK_SHSTK isn't part
of the FPU infrastructure, so maybe you can explain how there is some cascade.
>
> I'd like to make this logic consistent with PF_KTHREAD, and in the
> longer
> term change the x86 FPU code so that the kernel threads can run
> without
> without "struct fpu" attached to task_struct.
For PF_USER_WORKER it still has access to the user MM, right? Shouldn't it
inherit PKRU from the parent?
Stop me if I'm telling you something you already know... For better or worse,
the x86 FPU state has grown from the classic "extra math registers", to include
other things like PKRU and supervisor state. Supervisor state controls other
thread specific state that *only* the kernel is supposed to have complete access
to. Some of the xfeatures are even about saving and restoring state that *only*
affects the kernel. So I think each xfeature needs to be evaluated, and they are
all annoyingly different. I think LBR works in the kernel too?
>
> And again, please see
>
> Warning from x86_task_fpu()
> https://lore.kernel.org/all/aJVuZZgYjEMxiUYq@ly-workstation/
>
> PF_USER_WORKERs and shadow stack
> https://lore.kernel.org/all/20250813162824.GA15234@redhat.com/
>
> and 6/6 in this series.
I kind of think it would be more appropriate for you to explain more about what
you are trying to do. I've read three things:
- Prevent wasting a shstk VMA (which Dave suggested maybe wasn't worth it)
- Prevent issue around update_fpu_shstk(), which I'm not sure is an issue
- Prevent ptrace from setting FPU state on user workers because it caused
problems (details unclear), by changing the FPU design in a way that apparently
has impacts across FPU-using features (unclear why this is the best way to
prevent it)
TBH, based on my current understanding, all three sound dubious to me. Let's get
on the same page as far as the goals before we discuss shstk changes further.
You might be aware that the x86 FPU is seen as too complex already and so adding
special cases tends to have a high bar. So consider making a strong, clear
justification for the overall problem/solution you have in mind.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 11:44 ` Mark Brown
@ 2025-08-15 19:11 ` Deepak Gupta
2025-08-18 17:27 ` Mark Brown
0 siblings, 1 reply; 33+ messages in thread
From: Deepak Gupta @ 2025-08-15 19:11 UTC (permalink / raw)
To: Mark Brown
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de, axboe@kernel.dk,
Mehta, Sohil, oleg@redhat.com, x86@kernel.org
On Fri, Aug 15, 2025 at 12:44:14PM +0100, Mark Brown wrote:
>On Thu, Aug 14, 2025 at 10:43:45PM +0000, Edgecombe, Rick P wrote:
>> On Thu, 2025-08-14 at 19:33 +0100, Mark Brown wrote:
>
>> > Perhaps we should factor the logic for deciding if we need to allocate a
>> > userspace shadow stack out into the arch independent code and do
>> > something like set a flag in kernel_clone_args that the arches can
>> > check? I think the logic is the same for all arches at the minute and
>> > don't see a reason why it'd diverge.
>
>> Sounds good. Like a little start to this:
>> https://lore.kernel.org/lkml/20241010-shstk_converge-v1-0-631beca676e7@rivosinc.com/
>
>Yes, exactly.
>
>> > That'd collide a bit with my
>> > clone3() series, there's some overlap there with that creating another
>> > reason why the decision would change. Reducing the duplication would be
>> > nice.
>
>> Argh, I need to test the latest of that still.
>
>Yury pointed me at some newer x86 systems I was able to get access to so
>I was finally able to test that one myself before sending it out,
>confirmation would be good but hopefully it's fine. I've been holding
>back on sending a rebased version out since Deepak was going to help me
>get set up to test it on RISC-V. Though I see now that the RISC-V code
>has vanished from -next (I guess due to fallout from the issues with the
>merge to Linus, it looks like there's almost nothing in the branch
>currently), not sure what the plan is there?
>
>Perhaps I should just send it out, but given the difficulty getting
>anyone to pay attention I was trying to avoid issues with missing
>updates for newly added RISC-V shadow stacks.
Yes I was trying to get that sorted as well. Because now I'll have to
rebase my changes to 6.17. So I wanted to make sure that it applies
cleanly. I suggest that you send it out because risc-v was left out
anyways. I'll apply your patch series on my risc-v shadow stack changes
(on top of 6.17) and will report back. It might be easier that way.
How does that sound?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 17:46 ` Edgecombe, Rick P
@ 2025-08-15 19:13 ` Oleg Nesterov
0 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 19:13 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, axboe@kernel.dk,
x86@kernel.org
Rick,
I'll try to write another email and send V2 next week.
Although perhaps I'll send some other unrelated and "obvious" cleanups
before that...
Until then:
On 08/15, Edgecombe, Rick P wrote:
>
> Maybe you can explain
Damn yes, agreed, my fault.
> Stop me if I'm telling you something you already know...
Or, quite possibly, you should stop me to send the patches which change
the code I can hardly understand ;) And that is why I CC'ed the experts
like you.
See you next week,
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-15 16:32 ` Sohil Mehta
@ 2025-08-15 19:33 ` Oleg Nesterov
0 siblings, 0 replies; 33+ messages in thread
From: Oleg Nesterov @ 2025-08-15 19:33 UTC (permalink / raw)
To: Sohil Mehta
Cc: Dave Hansen, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ingo Molnar, Jens Axboe, Peter Zijlstra, Rick Edgecombe,
Thomas Gleixner, linux-kernel, x86, Mark Brown
On 08/15, Sohil Mehta wrote:
>
> I think it would be useful to categorize the impact of the "abuse" in
> the cover letter.
Yes, let me repeat that this is my fault.
>Is it going to cause kernel crashes, userspace crashes
> or just incorrect reporting?
Heh ;)
In the short term, I'd like to make your patch correct ;)
[PATCH v3 2/2] x86/fpu: Update the debug flow for x86_task_fpu()
https://lore.kernel.org/all/20250724013422.307954-2-sohil.mehta@intel.com/
(just in case, no, this series alone is not enough).
In the longer term, rightly or not I'd like to do other changes we have already
discussed. See my other emails in this thread.
Oleg.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-15 19:11 ` Deepak Gupta
@ 2025-08-18 17:27 ` Mark Brown
2025-08-19 17:41 ` Deepak Gupta
0 siblings, 1 reply; 33+ messages in thread
From: Mark Brown @ 2025-08-18 17:27 UTC (permalink / raw)
To: Deepak Gupta
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de, axboe@kernel.dk,
Mehta, Sohil, oleg@redhat.com, x86@kernel.org
[-- Attachment #1: Type: text/plain, Size: 1561 bytes --]
On Fri, Aug 15, 2025 at 12:11:47PM -0700, Deepak Gupta wrote:
> On Fri, Aug 15, 2025 at 12:44:14PM +0100, Mark Brown wrote:
> > confirmation would be good but hopefully it's fine. I've been holding
> > back on sending a rebased version out since Deepak was going to help me
> > get set up to test it on RISC-V. Though I see now that the RISC-V code
> > has vanished from -next (I guess due to fallout from the issues with the
> > merge to Linus, it looks like there's almost nothing in the branch
> > currently), not sure what the plan is there?
> > Perhaps I should just send it out, but given the difficulty getting
> > anyone to pay attention I was trying to avoid issues with missing
> > updates for newly added RISC-V shadow stacks.
> Yes I was trying to get that sorted as well. Because now I'll have to
> rebase my changes to 6.17. So I wanted to make sure that it applies
> cleanly. I suggest that you send it out because risc-v was left out
> anyways. I'll apply your patch series on my risc-v shadow stack changes
> (on top of 6.17) and will report back. It might be easier that way.
> How does that sound?
Sounds good.
My main concern is that I don't want to end up needlessly holding off
either series due to dependencies/cross tree issues - I remain
(endlessly!) hopeful that the everyone's happy with the clone3() work at
this point and it could get merged, but if RISC-V support is going in
then it should support the new interface too. Hopefully we can do
something like apply this on a branch and then merge that into the
RISC-V tree?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-18 17:27 ` Mark Brown
@ 2025-08-19 17:41 ` Deepak Gupta
0 siblings, 0 replies; 33+ messages in thread
From: Deepak Gupta @ 2025-08-19 17:41 UTC (permalink / raw)
To: Mark Brown
Cc: Edgecombe, Rick P, mingo@kernel.org, dave.hansen@linux.intel.com,
bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de, axboe@kernel.dk,
Mehta, Sohil, oleg@redhat.com, x86@kernel.org
On Mon, Aug 18, 2025 at 06:27:02PM +0100, Mark Brown wrote:
>On Fri, Aug 15, 2025 at 12:11:47PM -0700, Deepak Gupta wrote:
>> On Fri, Aug 15, 2025 at 12:44:14PM +0100, Mark Brown wrote:
>
>> > confirmation would be good but hopefully it's fine. I've been holding
>> > back on sending a rebased version out since Deepak was going to help me
>> > get set up to test it on RISC-V. Though I see now that the RISC-V code
>> > has vanished from -next (I guess due to fallout from the issues with the
>> > merge to Linus, it looks like there's almost nothing in the branch
>> > currently), not sure what the plan is there?
>
>> > Perhaps I should just send it out, but given the difficulty getting
>> > anyone to pay attention I was trying to avoid issues with missing
>> > updates for newly added RISC-V shadow stacks.
>
>> Yes I was trying to get that sorted as well. Because now I'll have to
>> rebase my changes to 6.17. So I wanted to make sure that it applies
>> cleanly. I suggest that you send it out because risc-v was left out
>> anyways. I'll apply your patch series on my risc-v shadow stack changes
>> (on top of 6.17) and will report back. It might be easier that way.
>
>> How does that sound?
>
>Sounds good.
>
>My main concern is that I don't want to end up needlessly holding off
>either series due to dependencies/cross tree issues - I remain
>(endlessly!) hopeful that the everyone's happy with the clone3() work at
>this point and it could get merged, but if RISC-V support is going in
>then it should support the new interface too. Hopefully we can do
>something like apply this on a branch and then merge that into the
>RISC-V tree?
Yes they should make to upstream independent of each other. There are some
changes which I can adopt in my current set of changes (like change
`shstk_alloc_thread_stack` prototype). In your current patchsets
"fork: Add shadow stack support to clone3()" will need riscv support. Other
than that I see that support for `my_syscall2` for riscv arch will be needed.
If clone3 changes make in before riscv shadow stack changes, then its easy
I'll just add riscv specific changes pertaining to clone3 in my current
patchsets.
If clone3 changes are later than riscv shadow stack changes, then I'll point
you to a branch from where you can pick riscv specific clone3 + shadow stack
changes.
If they both are going in together (may happen for 6.18 window), then we will
have to coordinate on which one applies first. So let's keep an eye on that.
If I push my branch somewhere on github (riscv shadow stack changes and then
clone3 changes on top of them), Is that fine?
-Deepak
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2025-08-19 17:41 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-14 10:13 [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
2025-08-14 10:14 ` [PATCH 1/6] x86/fpu: change copy_xstate_to_uabi_buf() to accept fpstate + pkru instead of task_struct Oleg Nesterov
2025-08-14 16:46 ` Edgecombe, Rick P
2025-08-15 12:22 ` Oleg Nesterov
2025-08-14 10:14 ` [PATCH 2/6] x86/fpu: regset: introduce get_fpstate() helper Oleg Nesterov
2025-08-14 10:14 ` [PATCH 3/6] x86/fpu: fold sync_fpstate() into get_fpstate() Oleg Nesterov
2025-08-14 10:14 ` [PATCH 4/6] x86/shstk: add "task_struct *tsk" argument to reset_thread_features() Oleg Nesterov
2025-08-14 10:14 ` [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs Oleg Nesterov
2025-08-14 17:03 ` Edgecombe, Rick P
2025-08-14 18:33 ` Mark Brown
2025-08-14 22:43 ` Edgecombe, Rick P
2025-08-15 11:44 ` Mark Brown
2025-08-15 19:11 ` Deepak Gupta
2025-08-18 17:27 ` Mark Brown
2025-08-19 17:41 ` Deepak Gupta
2025-08-15 13:01 ` Oleg Nesterov
2025-08-15 13:08 ` Oleg Nesterov
2025-08-15 15:28 ` Mark Brown
2025-08-15 15:43 ` Oleg Nesterov
2025-08-15 15:48 ` Mark Brown
2025-08-15 16:00 ` Oleg Nesterov
2025-08-15 17:08 ` Mark Brown
2025-08-15 12:17 ` Oleg Nesterov
2025-08-15 16:19 ` Edgecombe, Rick P
2025-08-15 16:54 ` Oleg Nesterov
2025-08-15 17:46 ` Edgecombe, Rick P
2025-08-15 19:13 ` Oleg Nesterov
2025-08-14 10:14 ` [PATCH 6/6] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER Oleg Nesterov
2025-08-15 15:52 ` [PATCH 0/6] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
2025-08-15 15:59 ` Dave Hansen
2025-08-15 16:02 ` Oleg Nesterov
2025-08-15 16:32 ` Sohil Mehta
2025-08-15 19:33 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).