* [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
@ 2025-08-22 15:36 Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 1/5] x86/fpu: don't use x86_task_fpu() in copy_xstate_to_uabi_buf() Oleg Nesterov
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-22 15:36 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, Deepak Gupta, H. Peter Anvin,
Ingo Molnar, Mark Brown, Peter Zijlstra, Rick Edgecombe,
Sohil Mehta, Thomas Gleixner
Cc: linux-kernel, x86
PF_USER_WORKER threads don't really differ from PF_KTHREAD threads
at least in that they never return to usermode and never use their
FPU state.
However, ptrace or coredump paths can access their FPU state and this
is the only reason why x86_task_fpu(PF_USER_WORKER) needs to work and
and discriminate PF_USER_WORKER from PF_KTHREAD. Unlike all other x86
FPU code paths which do not distinguish them.
OTOH, arch/x86/kernel/fpu/regset.c doesn't really need "struct fpu *",
the .regset_get() functions actually need a "struct fpstate *". If the
target task is PF_USER_WORKER, they can safely use &init_fpstate. So
this series adds the new simple helper
static struct fpstate *get_fpstate(struct task_struct *task)
{
struct fpu *fpu;
if (unlikely(task->flags & PF_USER_WORKER))
return &init_fpstate;
fpu = x86_task_fpu(task);
if (task == current)
fpu_sync_fpstate(fpu);
return fpu->fpstate;
}
which can be used instead of x86_task_fpu(task)->fpstate pattern in
arch/x86/kernel/fpu/regset.c.
However, there is an annoying complication: shstk_alloc_thread_stack()
can alloc the pointless shadow stack for PF_USER_WORKER thread and set
the ARCH_SHSTK_SHSTK flag. This means that ssp_get()->ssp_active() can
return true, and in this case it wouldn't be right to use the "unrelated"
init_fpstate.
That is why this series includes 4/5, and to me it looks like a cleanup
which makes sense regardless.
Link to V1: https://lore.kernel.org/all/20250814101340.GA17288@redhat.com/
Changes:
- improve the subject/changelog in 1/5
- drop "x86/shstk: add "task_struct *tsk" argument to reset_thread_features()"
- rework 4/5 to not use reset_thread_features()
TODO:
update the fpregs_soft_get() and user_regset.set() paths as well
Oleg.
---
arch/x86/include/asm/shstk.h | 4 ++--
arch/x86/kernel/fpu/regset.c | 46 ++++++++++++++++++++++++++------------------
arch/x86/kernel/fpu/xstate.c | 12 ++++++------
arch/x86/kernel/fpu/xstate.h | 4 ++--
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 9 +++++++--
6 files changed, 45 insertions(+), 32 deletions(-)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/5] x86/fpu: don't use x86_task_fpu() in copy_xstate_to_uabi_buf()
2025-08-22 15:36 [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
@ 2025-08-22 15:36 ` Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 2/5] x86/fpu: regset: introduce get_fpstate() helper Oleg Nesterov
` (4 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-22 15:36 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, Deepak Gupta, H. Peter Anvin,
Ingo Molnar, Mark Brown, Peter Zijlstra, Rick Edgecombe,
Sohil Mehta, Thomas Gleixner
Cc: linux-kernel, x86
No functional changes, preparation for the next patches.
Change copy_xstate_to_uabi_buf() to take a "struct fpstate *" and
"u32 pkru" instead of "struct task_struct *" to avoid x86_task_fpu(tsk).
The callers already have "struct fpu *" and can pass fpu->fpstate directly.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 10 ++++++----
arch/x86/kernel/fpu/xstate.c | 12 ++++++------
arch/x86/kernel/fpu/xstate.h | 4 ++--
3 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 0986c2200adc..d280d415b171 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -83,7 +83,7 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
sizeof(fpu->fpstate->regs.fxsave));
}
- copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_FX);
+ copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FX);
return 0;
}
@@ -130,12 +130,14 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
+ struct fpu *fpu = x86_task_fpu(target);
+
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;
- sync_fpstate(x86_task_fpu(target));
+ sync_fpstate(fpu);
- copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
+ copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_XSAVE);
return 0;
}
@@ -419,7 +421,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf mb = { .p = &fxsave, .left = sizeof(fxsave) };
/* Handle init state optimized xstate correctly */
- copy_xstate_to_uabi_buf(mb, target, XSTATE_COPY_FP);
+ copy_xstate_to_uabi_buf(mb, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FP);
fx = &fxsave;
} else {
fx = &fpu->fpstate->regs.fxsave;
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 12ed75c1b567..2bd5974d5f0e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1256,7 +1256,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
/**
* copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
* @to: membuf descriptor
- * @tsk: The task from which to copy the saved xstate
+ * @fpstate: The fpstate buffer from which to copy
+ * @pkru_val: The PKRU value to store in the PKRU component
* @copy_mode: The requested copy mode
*
* Converts from kernel XSAVE or XSAVES compacted format to UABI conforming
@@ -1265,12 +1266,11 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*
* It supports partial copy but @to.pos always starts from zero.
*/
-void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
- enum xstate_copy_mode copy_mode)
+void copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
+ u32 pkru_val, enum xstate_copy_mode copy_mode)
{
- __copy_xstate_to_uabi_buf(to, x86_task_fpu(tsk)->fpstate,
- x86_task_fpu(tsk)->fpstate->user_xfeatures,
- tsk->thread.pkru, copy_mode);
+ __copy_xstate_to_uabi_buf(to, fpstate, fpstate->user_xfeatures,
+ pkru_val, copy_mode);
}
static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 52ce19289989..9d76ded84cdd 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -46,8 +46,8 @@ struct membuf;
extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u64 xfeatures, u32 pkru_val,
enum xstate_copy_mode copy_mode);
-extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
- enum xstate_copy_mode mode);
+extern void copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
+ u32 pkru_val, enum xstate_copy_mode copy_mode);
extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 2/5] x86/fpu: regset: introduce get_fpstate() helper
2025-08-22 15:36 [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 1/5] x86/fpu: don't use x86_task_fpu() in copy_xstate_to_uabi_buf() Oleg Nesterov
@ 2025-08-22 15:36 ` Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 3/5] x86/fpu: fold sync_fpstate() into get_fpstate() Oleg Nesterov
` (3 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-22 15:36 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, Deepak Gupta, H. Peter Anvin,
Ingo Molnar, Mark Brown, Peter Zijlstra, Rick Edgecombe,
Sohil Mehta, Thomas Gleixner
Cc: linux-kernel, x86
After the previous change the regset get() functions do not really need
"struct fpu *", they can use "struct fpstate *" returned by the new helper
which also does sync_fpstate().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 42 ++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index d280d415b171..f5a803774e1c 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -49,6 +49,13 @@ static void sync_fpstate(struct fpu *fpu)
fpu_sync_fpstate(fpu);
}
+static struct fpstate *get_fpstate(struct task_struct *task)
+{
+ struct fpu *fpu = x86_task_fpu(task);
+ sync_fpstate(fpu);
+ return fpu->fpstate;
+}
+
/*
* Invalidate cached FPU registers before modifying the stopped target
* task's fpstate.
@@ -71,19 +78,19 @@ static void fpu_force_restore(struct fpu *fpu)
int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV;
- sync_fpstate(fpu);
+ fpstate = get_fpstate(target);
if (!use_xsave()) {
- return membuf_write(&to, &fpu->fpstate->regs.fxsave,
- sizeof(fpu->fpstate->regs.fxsave));
+ return membuf_write(&to, &fpstate->regs.fxsave,
+ sizeof(fpstate->regs.fxsave));
}
- copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FX);
+ copy_xstate_to_uabi_buf(to, fpstate, target->thread.pkru, XSTATE_COPY_FX);
return 0;
}
@@ -130,14 +137,13 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;
- sync_fpstate(fpu);
-
- copy_xstate_to_uabi_buf(to, fpu->fpstate, target->thread.pkru, XSTATE_COPY_XSAVE);
+ fpstate = get_fpstate(target);
+ copy_xstate_to_uabi_buf(to, fpstate, target->thread.pkru, XSTATE_COPY_XSAVE);
return 0;
}
@@ -189,15 +195,15 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
struct cet_user_state *cetregs;
if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
!ssp_active(target, regset))
return -ENODEV;
- sync_fpstate(fpu);
- cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER);
+ fpstate = get_fpstate(target);
+ cetregs = get_xsave_addr(&fpstate->regs.xsave, XFEATURE_CET_USER);
if (WARN_ON(!cetregs)) {
/*
* This shouldn't ever be NULL because shadow stack was
@@ -403,17 +409,17 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = x86_task_fpu(target);
+ struct fpstate *fpstate;
struct user_i387_ia32_struct env;
struct fxregs_state fxsave, *fx;
- sync_fpstate(fpu);
-
if (!cpu_feature_enabled(X86_FEATURE_FPU))
return fpregs_soft_get(target, regset, to);
+ fpstate = get_fpstate(target);
+
if (!cpu_feature_enabled(X86_FEATURE_FXSR)) {
- return membuf_write(&to, &fpu->fpstate->regs.fsave,
+ return membuf_write(&to, &fpstate->regs.fsave,
sizeof(struct fregs_state));
}
@@ -421,10 +427,10 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf mb = { .p = &fxsave, .left = sizeof(fxsave) };
/* Handle init state optimized xstate correctly */
- copy_xstate_to_uabi_buf(mb, fpu->fpstate, target->thread.pkru, XSTATE_COPY_FP);
+ copy_xstate_to_uabi_buf(mb, fpstate, target->thread.pkru, XSTATE_COPY_FP);
fx = &fxsave;
} else {
- fx = &fpu->fpstate->regs.fxsave;
+ fx = &fpstate->regs.fxsave;
}
__convert_from_fxsr(&env, target, fx);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 3/5] x86/fpu: fold sync_fpstate() into get_fpstate()
2025-08-22 15:36 [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 1/5] x86/fpu: don't use x86_task_fpu() in copy_xstate_to_uabi_buf() Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 2/5] x86/fpu: regset: introduce get_fpstate() helper Oleg Nesterov
@ 2025-08-22 15:36 ` Oleg Nesterov
2025-08-22 15:37 ` [PATCH v2 4/5] x86/shstk: don't create the shadow stack for PF_USER_WORKERs Oleg Nesterov
` (2 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-22 15:36 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, Deepak Gupta, H. Peter Anvin,
Ingo Molnar, Mark Brown, Peter Zijlstra, Rick Edgecombe,
Sohil Mehta, Thomas Gleixner
Cc: linux-kernel, x86
After the previous change sync_fpstate() has no other callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index f5a803774e1c..ecbabdc15ec1 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -43,16 +43,12 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
* - ptrace to dump fpstate of a stopped task, in which case the registers
* have already been saved to fpstate on context switch.
*/
-static void sync_fpstate(struct fpu *fpu)
-{
- if (fpu == x86_task_fpu(current))
- fpu_sync_fpstate(fpu);
-}
-
static struct fpstate *get_fpstate(struct task_struct *task)
{
struct fpu *fpu = x86_task_fpu(task);
- sync_fpstate(fpu);
+
+ if (task == current)
+ fpu_sync_fpstate(fpu);
return fpu->fpstate;
}
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 4/5] x86/shstk: don't create the shadow stack for PF_USER_WORKERs
2025-08-22 15:36 [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
` (2 preceding siblings ...)
2025-08-22 15:36 ` [PATCH v2 3/5] x86/fpu: fold sync_fpstate() into get_fpstate() Oleg Nesterov
@ 2025-08-22 15:37 ` Oleg Nesterov
2025-08-22 15:37 ` [PATCH v2 5/5] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER Oleg Nesterov
2025-08-22 16:32 ` [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Edgecombe, Rick P
5 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-22 15:37 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, Deepak Gupta, H. Peter Anvin,
Ingo Molnar, Mark Brown, Peter Zijlstra, Rick Edgecombe,
Sohil Mehta, Thomas Gleixner
Cc: linux-kernel, x86
If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
stack for no reason, the new (kernel) thread will never return to usermode.
Plus the current code doesn't even look correct, in this case fpu_clone()
won't call update_fpu_shstk().
Add the new "bool minimal = !!args->fn" argument (which matches that of
fpu_clone()) to shstk_alloc_thread_stack() and change it to check this
argument along with CLONE_VFORK.
With this patch ssp_get() -> ssp_active(target) should never return true
if target->flags & PF_USER_WORKER.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/include/asm/shstk.h | 4 ++--
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 9 +++++++--
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index ba6f2fe43848..a4ee2baab51c 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -17,7 +17,7 @@ struct thread_shstk {
long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
void reset_thread_features(void);
unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
- unsigned long stack_size);
+ bool minimal, unsigned long stack_size);
void shstk_free(struct task_struct *p);
int setup_signal_shadow_stack(struct ksignal *ksig);
int restore_signal_shadow_stack(void);
@@ -28,7 +28,7 @@ static inline long shstk_prctl(struct task_struct *task, int option,
unsigned long arg2) { return -EINVAL; }
static inline void reset_thread_features(void) {}
static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
- unsigned long clone_flags,
+ unsigned long clone_flags, bool minimal,
unsigned long stack_size) { return 0; }
static inline void shstk_free(struct task_struct *p) {}
static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1b7960cf6eb0..e932e0e53972 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -209,7 +209,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
* is disabled, new_ssp will remain 0, and fpu_clone() will know not to
* update it.
*/
- new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
+ new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->fn, args->stack_size);
if (IS_ERR_VALUE(new_ssp))
return PTR_ERR((void *)new_ssp);
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 2ddf23387c7e..6c8c4593e202 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -192,7 +192,7 @@ void reset_thread_features(void)
}
unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
- unsigned long stack_size)
+ bool minimal, unsigned long stack_size)
{
struct thread_shstk *shstk = &tsk->thread.shstk;
unsigned long addr, size;
@@ -208,8 +208,13 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl
* For CLONE_VFORK the child will share the parents shadow stack.
* Make sure to clear the internal tracking of the thread shadow
* stack so the freeing logic run for child knows to leave it alone.
+ *
+ * If minimal == true, the new kernel thread cloned from userspace
+ * thread will never return to usermode.
*/
- if (clone_flags & CLONE_VFORK) {
+ if ((clone_flags & CLONE_VFORK) || minimal) {
+ if (minimal)
+ tsk->thread.features &= ~ARCH_SHSTK_SHSTK;
shstk->base = 0;
shstk->size = 0;
return 0;
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 5/5] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER
2025-08-22 15:36 [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
` (3 preceding siblings ...)
2025-08-22 15:37 ` [PATCH v2 4/5] x86/shstk: don't create the shadow stack for PF_USER_WORKERs Oleg Nesterov
@ 2025-08-22 15:37 ` Oleg Nesterov
2025-08-22 16:32 ` [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Edgecombe, Rick P
5 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-22 15:37 UTC (permalink / raw)
To: Borislav Petkov, Dave Hansen, Deepak Gupta, H. Peter Anvin,
Ingo Molnar, Mark Brown, Peter Zijlstra, Rick Edgecombe,
Sohil Mehta, Thomas Gleixner
Cc: linux-kernel, x86
PF_USER_WORKERs must never use FPU, this is what kernel_fpu_begin/etc
assume. The .regset_get() functions can safely use init_fpstate if
target->flags & PF_USER_WORKER.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/kernel/fpu/regset.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index ecbabdc15ec1..dfd12d109f00 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,8 +45,12 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
*/
static struct fpstate *get_fpstate(struct task_struct *task)
{
- struct fpu *fpu = x86_task_fpu(task);
+ struct fpu *fpu;
+ if (unlikely(task->flags & PF_USER_WORKER))
+ return &init_fpstate;
+
+ fpu = x86_task_fpu(task);
if (task == current)
fpu_sync_fpstate(fpu);
return fpu->fpstate;
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-22 15:36 [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
` (4 preceding siblings ...)
2025-08-22 15:37 ` [PATCH v2 5/5] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER Oleg Nesterov
@ 2025-08-22 16:32 ` Edgecombe, Rick P
2025-08-22 19:21 ` Oleg Nesterov
5 siblings, 1 reply; 17+ messages in thread
From: Edgecombe, Rick P @ 2025-08-22 16:32 UTC (permalink / raw)
To: debug@rivosinc.com, mingo@kernel.org, dave.hansen@linux.intel.com,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
tglx@linutronix.de, bp@alien8.de, Mehta, Sohil, oleg@redhat.com
Cc: linux-kernel@vger.kernel.org, x86@kernel.org
On Fri, 2025-08-22 at 17:36 +0200, Oleg Nesterov wrote:
> PF_USER_WORKER threads don't really differ from PF_KTHREAD threads
> at least in that they never return to usermode and never use their
> FPU state.
>
> However, ptrace or coredump paths can access their FPU state and this
> is the only reason why x86_task_fpu(PF_USER_WORKER) needs to work and
> and discriminate PF_USER_WORKER from PF_KTHREAD. Unlike all other x86
> FPU code paths which do not distinguish them.
>
> OTOH, arch/x86/kernel/fpu/regset.c doesn't really need "struct fpu *",
> the .regset_get() functions actually need a "struct fpstate *". If the
> target task is PF_USER_WORKER, they can safely use &init_fpstate. So
> this series adds the new simple helper
PKRU affects kernel accesses to userspace. io threads and vhost access
userspace. So why don't we want PKRU state to be inherited for user workers? I
guess it is not today, but to me, conceptually we maybe don't want a special
case for them? So rather than add more special handling, could we actually just
remove special handling to make it consistent?
But again, what exactly is the problem here? Is there a crash or something for
user workers?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-22 16:32 ` [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Edgecombe, Rick P
@ 2025-08-22 19:21 ` Oleg Nesterov
2025-08-22 20:01 ` Edgecombe, Rick P
0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-22 19:21 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, dave.hansen@linux.intel.com,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
tglx@linutronix.de, bp@alien8.de, Mehta, Sohil,
linux-kernel@vger.kernel.org, x86@kernel.org
On 08/22, Edgecombe, Rick P wrote:
>
> On Fri, 2025-08-22 at 17:36 +0200, Oleg Nesterov wrote:
> > PF_USER_WORKER threads don't really differ from PF_KTHREAD threads
> > at least in that they never return to usermode and never use their
> > FPU state.
> >
> > However, ptrace or coredump paths can access their FPU state and this
> > is the only reason why x86_task_fpu(PF_USER_WORKER) needs to work and
> > and discriminate PF_USER_WORKER from PF_KTHREAD. Unlike all other x86
> > FPU code paths which do not distinguish them.
> >
> > OTOH, arch/x86/kernel/fpu/regset.c doesn't really need "struct fpu *",
> > the .regset_get() functions actually need a "struct fpstate *". If the
> > target task is PF_USER_WORKER, they can safely use &init_fpstate. So
> > this series adds the new simple helper
>
> PKRU affects kernel accesses to userspace. io threads and vhost access
> userspace. So why don't we want PKRU state to be inherited for user workers?
Sorry I don't follow... Again, this is not my area, I am sure I've missed something.
But could you please explain how can this series affect the PKRU logic?
> I guess it is not today, but to me, conceptually we maybe don't want a special
> case for them? So rather than add more special handling, could we actually just
> remove special handling to make it consistent?
Could you spell please?
> But again, what exactly is the problem here? Is there a crash or something for
> user workers?
Well. I already tried to to explain this in the previous discussions. Apperently
I wasn't clear, my fault. So I guess this needs yet another email which I'll write
tomorrow, becauase I am already sleeping today.
Oleg.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-22 19:21 ` Oleg Nesterov
@ 2025-08-22 20:01 ` Edgecombe, Rick P
2025-08-25 13:47 ` Oleg Nesterov
0 siblings, 1 reply; 17+ messages in thread
From: Edgecombe, Rick P @ 2025-08-22 20:01 UTC (permalink / raw)
To: oleg@redhat.com
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On Fri, 2025-08-22 at 21:21 +0200, Oleg Nesterov wrote:
> > PKRU affects kernel accesses to userspace. io threads and vhost access
> > userspace. So why don't we want PKRU state to be inherited for user workers?
>
> Sorry I don't follow... Again, this is not my area, I am sure I've missed
> something.
> But could you please explain how can this series affect the PKRU logic?
>
> > I guess it is not today
I'm sorry, this is incorrect. PKRU is not kept in the FPU structs anymore. So it
should be inherited over clone I guess. But despite not being in the actual FPU
buffer, for compatibility it's left in the uabi xstate copy stuff that the
regsets use.
> > , but to me, conceptually we maybe don't want a special case for them? So
> > rather than add more special handling, could we actually just remove special
> > handling to make it consistent?
>
> Could you spell please?
PKRU is for "protection keys". These are memory permissions that can be applied
per-thread. So you can make a virtual address toggleable for read or write
without affecting the other threads. The kernel is supposed to have these
permissions enforced just like the normal ones (read-only, etc). So it's an
example of user FPU state that applied to the kernel. Except that, as above, it
had to be moved out of the actual FPU state because it was special in that way.
But I think you could argue that it should be part of ptrace regsets. A debugger
may want to inspect what PKRU enforcement was happening for the io thread.
>
> > But again, what exactly is the problem here? Is there a crash or something
> > for
> > user workers?
>
> Well. I already tried to to explain this in the previous discussions.
> Apperently I wasn't clear, my fault. So I guess this needs yet another email
> which I'll write tomorrow, becauase I am already sleeping today.
I believe you said something like "sorry my fault and I'll explain in another
mail"[0]. Did I miss it?
[0]
https://lore.kernel.org/lkml/20250815191306.GK11549@redhat.com/
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-22 20:01 ` Edgecombe, Rick P
@ 2025-08-25 13:47 ` Oleg Nesterov
2025-08-27 14:12 ` Edgecombe, Rick P
0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-25 13:47 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On 08/22, Edgecombe, Rick P wrote:
>
> On Fri, 2025-08-22 at 21:21 +0200, Oleg Nesterov wrote:
> > > PKRU affects kernel accesses to userspace. io threads and vhost access
> > > userspace. So why don't we want PKRU state to be inherited for user workers?
> >
> > Sorry I don't follow... Again, this is not my area, I am sure I've missed
> > something.
> > But could you please explain how can this series affect the PKRU logic?
> >
> > > I guess it is not today
>
> I'm sorry, this is incorrect. PKRU is not kept in the FPU structs anymore. So it
> should be inherited over clone I guess.
Yes,
> But despite not being in the actual FPU
> buffer, for compatibility it's left in the uabi xstate copy stuff that the
> regsets use.
Yes, and copy_xstate_to_uabi_buf() still reports target->thread.pkru for
io threads.
So this series doesn't make any difference in this respect...
> > > But again, what exactly is the problem here? Is there a crash or something
> > > for
> > > user workers?
> >
> > Well. I already tried to to explain this in the previous discussions.
> > Apperently I wasn't clear, my fault. So I guess this needs yet another email
> > which I'll write tomorrow, becauase I am already sleeping today.
>
> I believe you said something like "sorry my fault and I'll explain in another
> mail"[0]. Did I miss it?
>
> [0]
> https://lore.kernel.org/lkml/20250815191306.GK11549@redhat.com/
I tried to add more details in this "[PATCH v2 0/5]" cover letter, in particular
to explain why does this series include "[PATCH v2 4/5] x86/shstk: don't create the
shadow stack for PF_USER_WORKERs". I thought that your were asking to explain this
part...
So. Sorry if it wasn't clear, this series is not a bug fix or something like this.
This starts the cleanups I was thinking about year ago, see
https://lore.kernel.org/all/20240606120038.GB22450@redhat.com/
Then later we can probably make more changes so that the kernel threads
(PF_KTHREADs and PF_USER_WORKERs) will run without "struct fpu" attached
to task_struct, so that x86_task_fpu() should return NULL regardless of
CONFIG_X86_DEBUG_FPU.
But even the WARN_ON_ONCE(task->flags & (PF_KTHREAD|PF_USER_WORKER)) in
x86_task_fpu() makes sense to me.
Say, switch_fpu() has no reason to check "PF_KTHREAD | PF_USER_WORKER",
this check should die. But if something goes wrong, it would be nice to
have a warning for io threads as well.
But as I said, I understand that cleanups are always subjective. It seems
that nobody is interested, and the only reviewer (you ;) doesn't like these
changes. I am going to give up.
That said... Could you explain why do you dislike 4/5 ?
Oleg.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-25 13:47 ` Oleg Nesterov
@ 2025-08-27 14:12 ` Edgecombe, Rick P
2025-08-27 14:51 ` Oleg Nesterov
0 siblings, 1 reply; 17+ messages in thread
From: Edgecombe, Rick P @ 2025-08-27 14:12 UTC (permalink / raw)
To: oleg@redhat.com
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On Mon, 2025-08-25 at 15:47 +0200, Oleg Nesterov wrote:
> I tried to add more details in this "[PATCH v2 0/5]" cover letter, in particular
> to explain why does this series include "[PATCH v2 4/5] x86/shstk: don't create the
> shadow stack for PF_USER_WORKERs". I thought that your were asking to explain this
> part...
>
> So. Sorry if it wasn't clear, this series is not a bug fix or something like this.
> This starts the cleanups I was thinking about year ago, see
>
> https://lore.kernel.org/all/20240606120038.GB22450@redhat.com/
>
> Then later we can probably make more changes so that the kernel threads
> (PF_KTHREADs and PF_USER_WORKERs) will run without "struct fpu" attached
> to task_struct, so that x86_task_fpu() should return NULL regardless of
> CONFIG_X86_DEBUG_FPU.
To save space?
>
> But even the WARN_ON_ONCE(task->flags & (PF_KTHREAD|PF_USER_WORKER)) in
> x86_task_fpu() makes sense to me.
>
> Say, switch_fpu() has no reason to check "PF_KTHREAD | PF_USER_WORKER",
> this check should die.
>
Digging through git, the reason for the PF_USER_WORKER check in switch_fpu() was
originally: "more of a cosmetic thing that was found while debugging and issue
and pondering why the FPU flag is set on these threads."
So a goal could be to make the code make more sense? If that is a reason, then I
have some concerns with it. The simpler code would need to somehow work with
that (I think...) user workers should still have a PKRU value. So then does
ptrace need branching logic for xstateregs_get/set() with a struct fpu and
without? If so, is that ultimately simpler?
> But if something goes wrong, it would be nice to
> have a warning for io threads as well.
I guess I question whether it really makes sense to add a special case for
PF_USER_WORKER, including the existing logic. But I'm still trying to piece
together a clearly stated benefit.
>
> But as I said, I understand that cleanups are always subjective. It seems
> that nobody is interested, and the only reviewer (you ;) doesn't like these
> changes. I am going to give up.
>
> That said... Could you explain why do you dislike 4/5 ?
As I said, shstk_alloc_thread_stack() shouldn't clear ARCH_SHSTK_SHSTK because
the function is about shadow stack allocation. It also doesn't make sense to
clear ARCH_SHSTK_SHSTK for user workers. It seemed like Mark (arm shadow stack
person) agreed on those...
I think Dave also questioned whether a rare extra shadow stack is really a
problem.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-27 14:12 ` Edgecombe, Rick P
@ 2025-08-27 14:51 ` Oleg Nesterov
2025-08-28 21:48 ` Edgecombe, Rick P
0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-27 14:51 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On 08/27, Edgecombe, Rick P wrote:
>
> On Mon, 2025-08-25 at 15:47 +0200, Oleg Nesterov wrote:
> >
> > So. Sorry if it wasn't clear, this series is not a bug fix or something like this.
> > This starts the cleanups I was thinking about year ago, see
> >
> > https://lore.kernel.org/all/20240606120038.GB22450@redhat.com/
> >
> > Then later we can probably make more changes so that the kernel threads
> > (PF_KTHREADs and PF_USER_WORKERs) will run without "struct fpu" attached
> > to task_struct, so that x86_task_fpu() should return NULL regardless of
> > CONFIG_X86_DEBUG_FPU.
>
> To save space?
Yes. And to make the fact that kernel threads never use (do not really have)
FPU state more clear.
> > But even the WARN_ON_ONCE(task->flags & (PF_KTHREAD|PF_USER_WORKER)) in
> > x86_task_fpu() makes sense to me.
> >
> > Say, switch_fpu() has no reason to check "PF_KTHREAD | PF_USER_WORKER",
> > this check should die.
> >
>
> Digging through git, the reason for the PF_USER_WORKER check in switch_fpu() was
> originally: "more of a cosmetic thing that was found while debugging and issue
> and pondering why the FPU flag is set on these threads."
Whatever reasons we had, they're gone. We can rely on TIF_NEED_FPU_LOAD.
I'll send a coupld of patches tomorrow.
> So a goal could be to make the code make more sense? If that is a reason, then I
> have some concerns with it. The simpler code would need to somehow work with
> that (I think...) user workers should still have a PKRU value. So then does
> ptrace need branching logic for xstateregs_get/set() with a struct fpu and
> without? If so, is that ultimately simpler?
Sorry, I don't understand... In particular, I don't understand again how
this connects to PKRU. __switch_to()->x86_pkru_load() doesn't depend on
switch_fpu() ?
> > But if something goes wrong, it would be nice to
> > have a warning for io threads as well.
>
> I guess I question whether it really makes sense to add a special case for
> PF_USER_WORKER, including the existing logic. But I'm still trying to piece
> together a clearly stated benefit.
Again, I don't understand... To me, currently arch/x86/kernel/fpu/regset.c
adds a special case for PF_USER_WORKER, this series tries to remove it (but
we need a bit more of simple changes).
> > That said... Could you explain why do you dislike 4/5 ?
>
> As I said, shstk_alloc_thread_stack() shouldn't clear ARCH_SHSTK_SHSTK because
> the function is about shadow stack allocation.
OK, then how/where we can clear this flag if we avoid the pointless shadow stack
allocation for PF_USER_WORKER?
> It also doesn't make sense to
> clear ARCH_SHSTK_SHSTK for user workers.
Why?
> I think Dave also questioned whether a rare extra shadow stack is really a
> problem.
Sure, it is not really a problem. In that it is not a bug. But why we can't
avoid the pointless shadow stack / ARCH_SHSTK_SHSTK for user workers ? 4/5
doesn't complicate this code.
Plus, again, the current code is not consistent. fpu_clone() won't do
update_fpu_shstk() in this case. Not a bug too, but imo deserves a cleanup.
Oleg.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-27 14:51 ` Oleg Nesterov
@ 2025-08-28 21:48 ` Edgecombe, Rick P
2025-08-29 15:06 ` Oleg Nesterov
0 siblings, 1 reply; 17+ messages in thread
From: Edgecombe, Rick P @ 2025-08-28 21:48 UTC (permalink / raw)
To: oleg@redhat.com
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On Wed, 2025-08-27 at 16:51 +0200, Oleg Nesterov wrote:
> >
> > I guess I question whether it really makes sense to add a special case for
> > PF_USER_WORKER, including the existing logic. But I'm still trying to piece
> > together a clearly stated benefit.
>
> Again, I don't understand... To me, currently arch/x86/kernel/fpu/regset.c
> adds a special case for PF_USER_WORKER, this series tries to remove it (but
> we need a bit more of simple changes).
That commit I dug up? It didn't have a super strong justification either. Can
you say what your intended benefit is?
>
> > > That said... Could you explain why do you dislike 4/5 ?
> >
> > As I said, shstk_alloc_thread_stack() shouldn't clear ARCH_SHSTK_SHSTK
> > because
> > the function is about shadow stack allocation.
>
> OK, then how/where we can clear this flag if we avoid the pointless shadow
> stack allocation for PF_USER_WORKER?
*If* we want to worry about an extra shadow stack allocation (which Dave seems
to doubt), we don't need to clear ARCH_SHSTK_SHSTK to avoid allocations. Other
thread types already avoid it (vfork, etc). So just add to the existing logic
that skips shadow stack allocation. Make it do that for user workers too, and
leave ARCH_SHSTK_SHSTK alone.
>
> > It also doesn't make sense to clear ARCH_SHSTK_SHSTK for user workers.
>
> Why?
Because ARCH_SHSTK_SHSTK is supposed to be inherited by children. It adds a
special case for no reason.
>
> > I think Dave also questioned whether a rare extra shadow stack is really a
> > problem.
>
> Sure, it is not really a problem. In that it is not a bug. But why we can't
> avoid the pointless shadow stack / ARCH_SHSTK_SHSTK for user workers ? 4/5
> doesn't complicate this code.
>
> Plus, again, the current code is not consistent. fpu_clone() won't do
> update_fpu_shstk() in this case. Not a bug too, but imo deserves a cleanup.
I thought we discussed that the user worker logic already wipes the whole FPU
state though, so we don't need to call update_fpu_shstk(). Did I get that wrong?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-28 21:48 ` Edgecombe, Rick P
@ 2025-08-29 15:06 ` Oleg Nesterov
2025-09-02 20:37 ` Edgecombe, Rick P
0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-29 15:06 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On 08/28, Edgecombe, Rick P wrote:
>
> On Wed, 2025-08-27 at 16:51 +0200, Oleg Nesterov wrote:
> > >
> > > I guess I question whether it really makes sense to add a special case for
> > > PF_USER_WORKER, including the existing logic. But I'm still trying to piece
> > > together a clearly stated benefit.
> >
> > Again, I don't understand... To me, currently arch/x86/kernel/fpu/regset.c
> > adds a special case for PF_USER_WORKER, this series tries to remove it (but
> > we need a bit more of simple changes).
>
> That commit I dug up? It didn't have a super strong justification either. Can
> you say what your intended benefit is?
I meant that arch/x86/kernel/fpu/regset.c adds a special case for PF_USER_WORKER
in that this is the only case when x86_task_fpu(PF_USER_WORKER) is used.
> > OK, then how/where we can clear this flag if we avoid the pointless shadow
> > stack allocation for PF_USER_WORKER?
>
> *If* we want to worry about an extra shadow stack allocation (which Dave seems
> to doubt), we don't need to clear ARCH_SHSTK_SHSTK to avoid allocations. Other
> thread types already avoid it (vfork, etc). So just add to the existing logic
> that skips shadow stack allocation. Make it do that for user workers too, and
> leave ARCH_SHSTK_SHSTK alone.
From 0/5:
However, there is an annoying complication: shstk_alloc_thread_stack()
can alloc the pointless shadow stack for PF_USER_WORKER thread and set
the ARCH_SHSTK_SHSTK flag. This means that ssp_get()->ssp_active() can
return true, and in this case it wouldn't be right to use the "unrelated"
init_fpstate.
> > Why?
>
> Because ARCH_SHSTK_SHSTK is supposed to be inherited by children. It adds a
> special case for no reason.
See above. And it has no meaning for io-threads, right?
> > Plus, again, the current code is not consistent. fpu_clone() won't do
> > update_fpu_shstk() in this case. Not a bug too, but imo deserves a cleanup.
>
> I thought we discussed that the user worker logic already wipes the whole FPU
> state though, so we don't need to call update_fpu_shstk(). Did I get that wrong?
Sure, but see the note from 0/5.
We don't need to call update_fpu_shstk() and initialize ->user_ssp.
Yet ssp_get() will report the bogus cetregs->user_ssp.
This all doesn't look right to me even if nothing really bad can happen.
Oleg.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-08-29 15:06 ` Oleg Nesterov
@ 2025-09-02 20:37 ` Edgecombe, Rick P
2025-09-03 9:54 ` Oleg Nesterov
0 siblings, 1 reply; 17+ messages in thread
From: Edgecombe, Rick P @ 2025-09-02 20:37 UTC (permalink / raw)
To: oleg@redhat.com
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On Fri, 2025-08-29 at 17:06 +0200, Oleg Nesterov wrote:
> > *If* we want to worry about an extra shadow stack allocation (which Dave
> > seems to doubt), we don't need to clear ARCH_SHSTK_SHSTK to avoid
> > allocations.
> > Other thread types already avoid it (vfork, etc). So just add to the
> > existing logic that skips shadow stack allocation. Make it do that for user
> > workers too, and leave ARCH_SHSTK_SHSTK alone.
>
> From 0/5:
>
> However, there is an annoying complication:
> shstk_alloc_thread_stack()
> can alloc the pointless shadow stack for PF_USER_WORKER thread and
> set
> the ARCH_SHSTK_SHSTK flag. This means that ssp_get()->ssp_active()
> can
> return true, and in this case it wouldn't be right to use the
> "unrelated"
> init_fpstate.
Yea the ptrace code currently assumes there will be a non-init SHSTK FPU state.
But if the init state is currently associated with the FPU, whether it's via a
cleared copy, or some pointer redirection as you proposed, what is the
difference?
Hmm, I actually do see a potential concrete issue...
fpu_clone() will wipe out the FPU state for PF_USER_WORKER, which means if
xsaves decides to use the init optimization for CET, "get_xsave_addr(xsave,
XFEATURE_CET_USER)" could return NULL and trigger a warning. I would think we
could address this by just removing the warning, since the comment is incorrect.
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 0986c2200adc..094a891bfea8 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -196,15 +196,8 @@ int ssp_get(struct task_struct *target, const struct
user_regset *regset,
sync_fpstate(fpu);
cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER);
- if (WARN_ON(!cetregs)) {
- /*
- * This shouldn't ever be NULL because shadow stack was
- * verified to be enabled above. This means
- * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so
- * XFEATURE_CET_USER should not be in the init state.
- */
+ if (cetregs)
return -ENODEV;
- }
return membuf_write(&to, (unsigned long *)&cetregs->user_ssp,
sizeof(cetregs->user_ssp));
@@ -241,15 +234,8 @@ int ssp_set(struct task_struct *target, const struct
user_regset *regset,
fpu_force_restore(fpu);
cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER);
- if (WARN_ON(!cetregs)) {
- /*
- * This shouldn't ever be NULL because shadow stack was
- * verified to be enabled above. This means
- * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so
- * XFEATURE_CET_USER should not be in the init state.
- */
+ if (cetregs)
return -ENODEV;
- }
cetregs->user_ssp = user_ssp;
return 0;
If PF_USER_WORKER's ever do grow the ability to spawn threads, further changes
would be needed to restore CET_SHSTK_EN for the new thread. I actually think
this is a further point towards not having special logic for PF_USER_WORKER FPUs
(beyond the PKRU reasoning). As in, instead of making these proposed changes,
instead rollback the existing differences. But I'm not sure it's worth it at
this time.
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-09-02 20:37 ` Edgecombe, Rick P
@ 2025-09-03 9:54 ` Oleg Nesterov
2025-09-03 15:46 ` Edgecombe, Rick P
0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-09-03 9:54 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On 09/02, Edgecombe, Rick P wrote:
>
> On Fri, 2025-08-29 at 17:06 +0200, Oleg Nesterov wrote:
> > > *If* we want to worry about an extra shadow stack allocation (which Dave
> > > seems to doubt), we don't need to clear ARCH_SHSTK_SHSTK to avoid
> > > allocations.
> > > Other thread types already avoid it (vfork, etc). So just add to the
> > > existing logic that skips shadow stack allocation. Make it do that for user
> > > workers too, and leave ARCH_SHSTK_SHSTK alone.
> >
> > From 0/5:
> >
> > However, there is an annoying complication:
> > shstk_alloc_thread_stack()
> > can alloc the pointless shadow stack for PF_USER_WORKER thread and
> > set
> > the ARCH_SHSTK_SHSTK flag. This means that ssp_get()->ssp_active()
> > can
> > return true, and in this case it wouldn't be right to use the
> > "unrelated"
> > init_fpstate.
>
> Yea the ptrace code currently assumes there will be a non-init SHSTK FPU state.
> But if the init state is currently associated with the FPU, whether it's via a
> cleared copy, or some pointer redirection as you proposed, what is the
> difference?
Well. Lets forget about other changes for the moment. Lets only discuss 4/5.
> Hmm, I actually do see a potential concrete issue...
>
> fpu_clone() will wipe out the FPU state for PF_USER_WORKER, which means if
> xsaves decides to use the init optimization for CET, "get_xsave_addr(xsave,
> XFEATURE_CET_USER)" could return NULL and trigger a warning.
Even if get_xsave_addr() returns a valid pointer, what is the point to try to
report cetregs->user_ssp which doesn't match the reality?
Again, update_fpu_shstk() was not called, ->user_ssp can't be correct.
Why not simply clear ARCH_SHSTK_SHSTK as 4/5 does? With this change ssp_get()
will return -ENODEV right after the ssp_active() check.
Unless I am totally confused, ARCH_SHSTK_SHSTK has no meaning for PF_USER_WORKER
kernel threads, so I don't understand your objections.
Oleg.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths
2025-09-03 9:54 ` Oleg Nesterov
@ 2025-09-03 15:46 ` Edgecombe, Rick P
0 siblings, 0 replies; 17+ messages in thread
From: Edgecombe, Rick P @ 2025-09-03 15:46 UTC (permalink / raw)
To: oleg@redhat.com
Cc: debug@rivosinc.com, mingo@kernel.org, bp@alien8.de,
broonie@kernel.org, peterz@infradead.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
dave.hansen@linux.intel.com, Mehta, Sohil, x86@kernel.org
On Wed, 2025-09-03 at 11:54 +0200, Oleg Nesterov wrote:
> > Hmm, I actually do see a potential concrete issue...
> >
> > fpu_clone() will wipe out the FPU state for PF_USER_WORKER, which means if
> > xsaves decides to use the init optimization for CET, "get_xsave_addr(xsave,
> > XFEATURE_CET_USER)" could return NULL and trigger a warning.
>
> Even if get_xsave_addr() returns a valid pointer, what is the point to try to
> report cetregs->user_ssp which doesn't match the reality?
> Again, update_fpu_shstk() was not called, ->user_ssp can't be correct.
I think it would be better to have less special cases in the FPU. I'm not sure
what you mean by "correct". As above, it gets zeroed in fpu_clone(). I guess you
want it to be something else.
If we are talking pure correctness, and not functional issues, I'm questioning
whether it actually should be zeroed for user workers. The behavior would be
leave shadow stack enabled, but don't allocate a shadow stack. But all this adds
complexity cost, which seems there isn't appetite for.
I do see the potential to hit a warning from userspace. That seems like
something that could be fixed and does have a clear purpose.
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-09-03 15:47 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22 15:36 [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 1/5] x86/fpu: don't use x86_task_fpu() in copy_xstate_to_uabi_buf() Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 2/5] x86/fpu: regset: introduce get_fpstate() helper Oleg Nesterov
2025-08-22 15:36 ` [PATCH v2 3/5] x86/fpu: fold sync_fpstate() into get_fpstate() Oleg Nesterov
2025-08-22 15:37 ` [PATCH v2 4/5] x86/shstk: don't create the shadow stack for PF_USER_WORKERs Oleg Nesterov
2025-08-22 15:37 ` [PATCH v2 5/5] x86/fpu: change get_fpstate() to return &init_fpstate if PF_USER_WORKER Oleg Nesterov
2025-08-22 16:32 ` [PATCH v2 0/5] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths Edgecombe, Rick P
2025-08-22 19:21 ` Oleg Nesterov
2025-08-22 20:01 ` Edgecombe, Rick P
2025-08-25 13:47 ` Oleg Nesterov
2025-08-27 14:12 ` Edgecombe, Rick P
2025-08-27 14:51 ` Oleg Nesterov
2025-08-28 21:48 ` Edgecombe, Rick P
2025-08-29 15:06 ` Oleg Nesterov
2025-09-02 20:37 ` Edgecombe, Rick P
2025-09-03 9:54 ` Oleg Nesterov
2025-09-03 15:46 ` Edgecombe, Rick P
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).