* [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec
@ 2023-08-18 17:03 Rick Edgecombe
2023-08-18 19:35 ` Lijun Pan
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Rick Edgecombe @ 2023-08-18 17:03 UTC (permalink / raw)
To: tglx, mingo, bp, x86, hpa, dave.hansen, luto, peterz
Cc: linux-kernel, Rick Edgecombe
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task reschedules to CPU1
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0
4. Continuing the exec, the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
5. The task reschedules back to CPU0 before completing the exec and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
6. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
Hi,
This was spotted on a specific pre-production setup running an
out-of-tree glibc and the x86/shstk tip branch. The symptom observed
was a shadow stack segfault in ld-linux. The test case was a kernel
build with a high number of threads and it was able to generate the
segfault relatively reliably.
I was surprised to find that the root cause was not related to supervisor
xsave and instead seems to be a general FPU bug where the FPU state will
not be reset during exec if rescheduling happens twice in certain points
during the operation. It seems to be so old that I had a hard time
figuring which commit to blame.
A guess at how this was able to lurk so long is the combination of two
factors. One is that this specific test environment and workload seemed
to like to generate this specific pattern of scheduling for some reason.
So the fact it was reliably reproduced there could be not be indicative of
the typical case. The other factor is that CET features will rather loudly
alert to any corrupted FPU state due to the enforcement nature of that
state. So maybe this FPU reset miss during exec happened less commonly in
the wild, but most existing apps can survive it silently.
But since it's still a bit surprising, I would appreciate some extra
scrutiny on the reasoning. I verified the FPU state was not getting reset
during exec’s that experienced rescheduling to another CPU and back at
times as described in the commit log. Then following the logic in the
code, failing to restore the FPU would be expected. And fixing that logic
fixed the observed issue. But still surprised this wasn't seen before now.
Thanks,
Rick
---
arch/x86/kernel/fpu/context.h | 3 +--
arch/x86/kernel/fpu/core.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index af5cbdd9bd29..f6d856bd50bc 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -19,8 +19,7 @@
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
+ * Invalidate a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index e03b6b107b20..a86d37052a64 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -713,7 +713,7 @@ static void fpu_reset_fpregs(void)
struct fpu *fpu = ¤t->thread.fpu;
fpregs_lock();
- fpu__drop(fpu);
+ __fpu_invalidate_fpregs_state(fpu);
/*
* This does not change the actual hardware registers. It just
* resets the memory image and sets TIF_NEED_FPU_LOAD so a
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec
2023-08-18 17:03 [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec Rick Edgecombe
@ 2023-08-18 19:35 ` Lijun Pan
2023-08-18 21:56 ` Sohil Mehta
2023-08-24 0:04 ` Lijun Pan
2023-08-18 23:35 ` Sohil Mehta
` (2 subsequent siblings)
3 siblings, 2 replies; 8+ messages in thread
From: Lijun Pan @ 2023-08-18 19:35 UTC (permalink / raw)
To: Rick Edgecombe, tglx, mingo, bp, x86, hpa, dave.hansen, luto,
peterz
Cc: linux-kernel
Hi Rick,
On 8/18/2023 12:03 PM, Rick Edgecombe wrote:
> The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
> valid and should be reloaded when returning to userspace. However, the
> kernel will skip doing this if the FPU registers are already valid as
> determined by fpregs_state_valid(). The logic embedded there considers
> the state valid if two cases are both true:
> 1: fpu_fpregs_owner_ctx points to the current tasks FPU state
> 2: the last CPU the registers were live in was the current CPU.
>
> This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
> the current FPU during the fpregs_restore_userregs() operation, so it
> indicates that the registers have been restored on this CPU. But this
> alone doesn’t preclude that the task hasn’t been rescheduled to a
> different CPU, where the registers were modified, and then back to the
> current CPU. To verify that this was not the case the logic relies on the
> second condition. So the assumption is that if the registers have been
> restored, AND they haven’t had the chance to be modified (by being
> loaded on another CPU), then they MUST be valid on the current CPU.
>
> Besides the lazy FPU optimizations, the other cases where the FPU
> registers might not be valid are when the kernel modifies the FPU register
> state or the FPU saved buffer. In this case the operation modifying the
> FPU state needs to let the kernel know the correspondence has been
> broken. The comment in “arch/x86/kernel/fpu/context.h” has:
> /*
> ...
> * If the FPU register state is valid, the kernel can skip restoring the
> * FPU state from memory.
> *
> * Any code that clobbers the FPU registers or updates the in-memory
> * FPU state for a task MUST let the rest of the kernel know that the
> * FPU registers are no longer valid for this task.
> *
> * Either one of these invalidation functions is enough. Invalidate
> * a resource you control: CPU if using the CPU for something else
> * (with preemption disabled), FPU for the current task, or a task that
> * is prevented from running by the current task.
> */
>
> However, this is not completely true. When the kernel modifies the
> registers or saved FPU state, it can only rely on
> __fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
> tracking. The exec path instead relies on fpregs_deactivate(), which sets
> the CPU’s FPU context to NULL. This was observed to fail to restore the
> reset FPU state to the registers when returning to userspace in the
> following scenario:
>
> 1. A task is executing in userspace on CPU0
> - CPU0’s FPU context points to tasks
> - fpu->last_cpu=CPU0
> 2. The task exec()’s
> 3. While in the kernel the task reschedules to CPU1
> - CPU0 gets a thread executing in the kernel (such that no other
> FPU context is activated)
> - Scheduler sets task’s fpu->last_cpu=CPU0
> 4. Continuing the exec, the task gets to
> fpu_flush_thread()->fpu_reset_fpregs()
> - Sets CPU1’s fpu context to NULL
> - Copies the init state to the task’s FPU buffer
> - Sets TIF_NEED_FPU_LOAD on the task
> 5. The task reschedules back to CPU0 before completing the exec and
> returning to userspace
> - During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
> - Skips saving the registers and updating task’s fpu→last_cpu,
> because TIF_NEED_FPU_LOAD is the canonical source.
> 6. Now CPU0’s FPU context is still pointing to the task’s, and
> fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
> though the reset FPU state has not been restored.
>
> So the root cause is that exec() is doing the wrong kind of invalidate. It
> should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
> fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
> going away, they are just getting reset as part of an exec. So switch to
> __fpu_invalidate_fpregs_state().
>
> Also, delete the misleading comment that says that either kind of
> invalidate will be enough, because it’s not always the case.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>
> ---
> Hi,
>
> This was spotted on a specific pre-production setup running an
> out-of-tree glibc and the x86/shstk tip branch. The symptom observed
> was a shadow stack segfault in ld-linux. The test case was a kernel
> build with a high number of threads and it was able to generate the
> segfault relatively reliably.
>
> I was surprised to find that the root cause was not related to supervisor
> xsave and instead seems to be a general FPU bug where the FPU state will
> not be reset during exec if rescheduling happens twice in certain points
> during the operation. It seems to be so old that I had a hard time
> figuring which commit to blame.
>
> A guess at how this was able to lurk so long is the combination of two
> factors. One is that this specific test environment and workload seemed
> to like to generate this specific pattern of scheduling for some reason.
> So the fact it was reliably reproduced there could be not be indicative of
> the typical case. The other factor is that CET features will rather loudly
> alert to any corrupted FPU state due to the enforcement nature of that
> state. So maybe this FPU reset miss during exec happened less commonly in
> the wild, but most existing apps can survive it silently.
>
> But since it's still a bit surprising, I would appreciate some extra
> scrutiny on the reasoning. I verified the FPU state was not getting reset
> during exec’s that experienced rescheduling to another CPU and back at
> times as described in the commit log. Then following the logic in the
> code, failing to restore the FPU would be expected. And fixing that logic
> fixed the observed issue. But still surprised this wasn't seen before now.
>
> Thanks,
> Rick
> ---
> arch/x86/kernel/fpu/context.h | 3 +--
> arch/x86/kernel/fpu/core.c | 2 +-
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
> index af5cbdd9bd29..f6d856bd50bc 100644
> --- a/arch/x86/kernel/fpu/context.h
> +++ b/arch/x86/kernel/fpu/context.h
> @@ -19,8 +19,7 @@
> * FPU state for a task MUST let the rest of the kernel know that the
> * FPU registers are no longer valid for this task.
> *
> - * Either one of these invalidation functions is enough. Invalidate
> - * a resource you control: CPU if using the CPU for something else
> + * Invalidate a resource you control: CPU if using the CPU for something else
> * (with preemption disabled), FPU for the current task, or a task that
> * is prevented from running by the current task.
> */
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e03b6b107b20..a86d37052a64 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -713,7 +713,7 @@ static void fpu_reset_fpregs(void)
> struct fpu *fpu = ¤t->thread.fpu;
>
> fpregs_lock();
> - fpu__drop(fpu);
> + __fpu_invalidate_fpregs_state(fpu);
> /*
> * This does not change the actual hardware registers. It just
> * resets the memory image and sets TIF_NEED_FPU_LOAD so a
Thanks for the patch. Let me get back to my server (offline now) next
Monday and will add a "Test-by: Lijun Pan <lijun.pan@intel.com>" if it
passes.
In our bug case, probably just switching to
__fpu_invalidate_fpregs_state(fpu) from fpu__drop(fpu) is enough.
Maybe there are some other cases that need
__this_cpu_write(fpu_fpregs_owner_ctx, NULL) through fpu__drop() call,
which I don't know.
Here is the excerpt of the call sequence:
fpu__drop() -> fpregs_deactivate() ->
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
arch/x86/kernel/fpu/context.h has
static inline void __cpu_invalidate_fpregs_state(void)
{
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
fpu->last_cpu = -1;
}
static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu ==
fpu->last_cpu;
}
So, I am thinking if it is more rigorous to have both
(__cpu_invalidate_fpregs_state, __fpu_invalidate_fpregs_state)
invalidated, similarly as fpregs_state_valid checks both conditions.
code changes like below:
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 958accf2ccf0..fd3304bed0a2 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -19,8 +19,8 @@
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
+ * To be more rigorous and to prevent from future corner case, Invalidate
+ * both resources you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 97a899bf98b9..08b9cef0e076 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -725,6 +725,7 @@ static void fpu_reset_fpregs(void)
fpregs_lock();
fpu__drop(fpu);
+ __fpu_invalidate_fpregs_state(fpu);
/*
* This does not change the actual hardware registers. It just
* resets the memory image and sets TIF_NEED_FPU_LOAD so a
Thanks,
Lijun
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec
2023-08-18 19:35 ` Lijun Pan
@ 2023-08-18 21:56 ` Sohil Mehta
2023-08-24 0:04 ` Lijun Pan
1 sibling, 0 replies; 8+ messages in thread
From: Sohil Mehta @ 2023-08-18 21:56 UTC (permalink / raw)
To: Lijun Pan, Rick Edgecombe, tglx, mingo, bp, x86, hpa, dave.hansen,
luto, peterz
Cc: linux-kernel
On 8/18/2023 12:35 PM, Lijun Pan wrote:
> So, I am thinking if it is more rigorous to have both
> (__cpu_invalidate_fpregs_state, __fpu_invalidate_fpregs_state)
> invalidated, similarly as fpregs_state_valid checks both conditions.
>
> code changes like below:
> diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
> index 958accf2ccf0..fd3304bed0a2 100644
> --- a/arch/x86/kernel/fpu/context.h
> +++ b/arch/x86/kernel/fpu/context.h
> @@ -19,8 +19,8 @@
> * FPU state for a task MUST let the rest of the kernel know that the
> * FPU registers are no longer valid for this task.
> *
> - * Either one of these invalidation functions is enough. Invalidate
> - * a resource you control: CPU if using the CPU for something else
> + * To be more rigorous and to prevent from future corner case, Invalidate
> + * both resources you control: CPU if using the CPU for something else
> * (with preemption disabled), FPU for the current task, or a task that
> * is prevented from running by the current task.
> */
This is a general comment in the header to guide multiple scenarios. I
am not sure if invalidating both would be feasible in all scenarios. For
example, in cases such as the fpu_idle_fpregs() there is no task fpu
available.
I think the current issue stems from the fact that the code in exec()
doesn't really control the CPU. It is mostly running with preemption
enabled which can cause the CPUs to change as described in Rick's
scenario. Thus clearing CPU1's FPU pointer in step 4 seems unnecessary.
An extra invalidation probably won't impact performance for such a rare
scenario. However, replacing it with __fpu_invalidate_fpregs_state() to
invalidate the task's fpu state looks correct to me.
Maybe, we can expand the comment above to specify exactly when a
particular invalidation is expected vs when both are expected.
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 97a899bf98b9..08b9cef0e076 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -725,6 +725,7 @@ static void fpu_reset_fpregs(void)
>
> fpregs_lock();
> fpu__drop(fpu);
> + __fpu_invalidate_fpregs_state(fpu);
> /*
> * This does not change the actual hardware registers. It just
> * resets the memory image and sets TIF_NEED_FPU_LOAD so a
>
> Thanks,
> Lijun
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec
2023-08-18 17:03 [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec Rick Edgecombe
2023-08-18 19:35 ` Lijun Pan
@ 2023-08-18 23:35 ` Sohil Mehta
2023-08-22 20:10 ` Sohil Mehta
2023-08-24 9:11 ` [tip: x86/urgent] x86/fpu: Invalidate FPU state correctly on exec() tip-bot2 for Rick Edgecombe
3 siblings, 0 replies; 8+ messages in thread
From: Sohil Mehta @ 2023-08-18 23:35 UTC (permalink / raw)
To: Rick Edgecombe, tglx, mingo, bp, x86, hpa, dave.hansen, luto,
peterz
Cc: linux-kernel
Hi Rick,
On 8/18/2023 10:03 AM, Rick Edgecombe wrote:
> So the root cause is that exec() is doing the wrong kind of invalidate. It
> should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
> fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
> going away, they are just getting reset as part of an exec. So switch to
> __fpu_invalidate_fpregs_state().
>
I went through the scenario you described and it seems plausible. Since
the task is being reset (and the CPU is unaffected) it makes sense to
use __fpu_invalidate_fpregs_state() during exec().
> Also, delete the misleading comment that says that either kind of
> invalidate will be enough, because it’s not always the case.
>
I think it would be helpful to expand the comment specifying exactly
when __fpu_invalidate_fpregs_state() should be used vs
__cpu_invalidate_fpregs_state(). I'll try to audit the usages and come
up with something reasonable if possible.
Sohil
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec
2023-08-18 17:03 [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec Rick Edgecombe
2023-08-18 19:35 ` Lijun Pan
2023-08-18 23:35 ` Sohil Mehta
@ 2023-08-22 20:10 ` Sohil Mehta
2023-08-24 9:11 ` [tip: x86/urgent] x86/fpu: Invalidate FPU state correctly on exec() tip-bot2 for Rick Edgecombe
3 siblings, 0 replies; 8+ messages in thread
From: Sohil Mehta @ 2023-08-22 20:10 UTC (permalink / raw)
To: Rick Edgecombe, tglx, mingo, bp, x86, hpa, dave.hansen, luto,
peterz
Cc: linux-kernel
Hi Rick,
The comment below doesn't change anything meaningfully, but mainly helps
clarifies the original comment.
Maybe something like this:
> * If the FPU register state is valid, the kernel can skip restoring the
> * FPU state from memory.
> *
> - * Any code that clobbers the FPU registers or updates the in-memory
> - * FPU state for a task MUST let the rest of the kernel know that the
> - * FPU registers are no longer valid for this task.
> + * Any code that clobbers the FPU registers or updates the in-memory FPU state
> + * for a task MUST let the rest of the kernel know that whether the FPU
> + * registers are no longer valid anywhere for this task or if the FPU state on
> + * a particular CPU is invalid.
> *
> - * Either one of these invalidation functions is enough. Invalidate
> - * a resource you control: CPU if using the CPU for something else
> - * (with preemption disabled), FPU for the current task, or a task that
> - * is prevented from running by the current task.
> + * Invalidate a resource that you truly control:
> + * - Use __cpu_invalidate_fpregs_state() or equivalent when using the CPU for
> + * something else (only with preemption disabled)
> + * - Use __fpu_invalidate_fpregs_state() or equivalent when modifying the FPU
> + * for the current task, or a task that is prevented from running by the
> + * current task.
> */
Please disregard my suggestion if you think the existing one is
sufficient for now.
On 8/18/2023 10:03 AM, Rick Edgecombe wrote:
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e03b6b107b20..a86d37052a64 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -713,7 +713,7 @@ static void fpu_reset_fpregs(void)
> struct fpu *fpu = ¤t->thread.fpu;
>
> fpregs_lock();
> - fpu__drop(fpu);
> + __fpu_invalidate_fpregs_state(fpu);
> /*
> * This does not change the actual hardware registers. It just
> * resets the memory image and sets TIF_NEED_FPU_LOAD so a
The code fix looks fine to me.
Even if you don't update the comment, please feel free to add:
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec
2023-08-18 19:35 ` Lijun Pan
2023-08-18 21:56 ` Sohil Mehta
@ 2023-08-24 0:04 ` Lijun Pan
1 sibling, 0 replies; 8+ messages in thread
From: Lijun Pan @ 2023-08-24 0:04 UTC (permalink / raw)
To: Rick Edgecombe, tglx, mingo, bp, x86, hpa, dave.hansen, luto,
peterz
Cc: linux-kernel
On 8/18/2023 2:35 PM, Lijun Pan wrote:
> Hi Rick,
>
> On 8/18/2023 12:03 PM, Rick Edgecombe wrote:
>> The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
>> valid and should be reloaded when returning to userspace. However, the
>> kernel will skip doing this if the FPU registers are already valid as
>> determined by fpregs_state_valid(). The logic embedded there considers
>> the state valid if two cases are both true:
>> 1: fpu_fpregs_owner_ctx points to the current tasks FPU state
>> 2: the last CPU the registers were live in was the current CPU.
>>
>> This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
>> the current FPU during the fpregs_restore_userregs() operation, so it
>> indicates that the registers have been restored on this CPU. But this
>> alone doesn’t preclude that the task hasn’t been rescheduled to a
>> different CPU, where the registers were modified, and then back to the
>> current CPU. To verify that this was not the case the logic relies on the
>> second condition. So the assumption is that if the registers have been
>> restored, AND they haven’t had the chance to be modified (by being
>> loaded on another CPU), then they MUST be valid on the current CPU.
>>
>> Besides the lazy FPU optimizations, the other cases where the FPU
>> registers might not be valid are when the kernel modifies the FPU
>> register
>> state or the FPU saved buffer. In this case the operation modifying the
>> FPU state needs to let the kernel know the correspondence has been
>> broken. The comment in “arch/x86/kernel/fpu/context.h” has:
>> /*
>> ...
>> * If the FPU register state is valid, the kernel can skip restoring the
>> * FPU state from memory.
>> *
>> * Any code that clobbers the FPU registers or updates the in-memory
>> * FPU state for a task MUST let the rest of the kernel know that the
>> * FPU registers are no longer valid for this task.
>> *
>> * Either one of these invalidation functions is enough. Invalidate
>> * a resource you control: CPU if using the CPU for something else
>> * (with preemption disabled), FPU for the current task, or a task that
>> * is prevented from running by the current task.
>> */
>>
>> However, this is not completely true. When the kernel modifies the
>> registers or saved FPU state, it can only rely on
>> __fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
>> tracking. The exec path instead relies on fpregs_deactivate(), which sets
>> the CPU’s FPU context to NULL. This was observed to fail to restore the
>> reset FPU state to the registers when returning to userspace in the
>> following scenario:
>>
>> 1. A task is executing in userspace on CPU0
>> - CPU0’s FPU context points to tasks
>> - fpu->last_cpu=CPU0
>> 2. The task exec()’s
>> 3. While in the kernel the task reschedules to CPU1
>> - CPU0 gets a thread executing in the kernel (such that no other
>> FPU context is activated)
>> - Scheduler sets task’s fpu->last_cpu=CPU0
>> 4. Continuing the exec, the task gets to
>> fpu_flush_thread()->fpu_reset_fpregs()
>> - Sets CPU1’s fpu context to NULL
>> - Copies the init state to the task’s FPU buffer
>> - Sets TIF_NEED_FPU_LOAD on the task
>> 5. The task reschedules back to CPU0 before completing the exec and
>> returning to userspace
>> - During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
>> - Skips saving the registers and updating task’s fpu→last_cpu,
>> because TIF_NEED_FPU_LOAD is the canonical source.
>> 6. Now CPU0’s FPU context is still pointing to the task’s, and
>> fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true
>> even
>> though the reset FPU state has not been restored.
>>
>> So the root cause is that exec() is doing the wrong kind of
>> invalidate. It
>> should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
>> fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
>> going away, they are just getting reset as part of an exec. So switch to
>> __fpu_invalidate_fpregs_state().
>>
>> Also, delete the misleading comment that says that either kind of
>> invalidate will be enough, because it’s not always the case.
>>
>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>
>> ---
>> Hi,
>>
>> This was spotted on a specific pre-production setup running an
>> out-of-tree glibc and the x86/shstk tip branch. The symptom observed
>> was a shadow stack segfault in ld-linux. The test case was a kernel
>> build with a high number of threads and it was able to generate the
>> segfault relatively reliably.
>>
>> I was surprised to find that the root cause was not related to supervisor
>> xsave and instead seems to be a general FPU bug where the FPU state will
>> not be reset during exec if rescheduling happens twice in certain points
>> during the operation. It seems to be so old that I had a hard time
>> figuring which commit to blame.
>>
>> A guess at how this was able to lurk so long is the combination of two
>> factors. One is that this specific test environment and workload seemed
>> to like to generate this specific pattern of scheduling for some reason.
>> So the fact it was reliably reproduced there could be not be
>> indicative of
>> the typical case. The other factor is that CET features will rather
>> loudly
>> alert to any corrupted FPU state due to the enforcement nature of that
>> state. So maybe this FPU reset miss during exec happened less commonly in
>> the wild, but most existing apps can survive it silently.
>>
>> But since it's still a bit surprising, I would appreciate some extra
>> scrutiny on the reasoning. I verified the FPU state was not getting reset
>> during exec’s that experienced rescheduling to another CPU and back at
>> times as described in the commit log. Then following the logic in the
>> code, failing to restore the FPU would be expected. And fixing that logic
>> fixed the observed issue. But still surprised this wasn't seen before
>> now.
>>
>> Thanks,
>> Rick
>> ---
>> arch/x86/kernel/fpu/context.h | 3 +--
>> arch/x86/kernel/fpu/core.c | 2 +-
>> 2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/fpu/context.h
>> b/arch/x86/kernel/fpu/context.h
>> index af5cbdd9bd29..f6d856bd50bc 100644
>> --- a/arch/x86/kernel/fpu/context.h
>> +++ b/arch/x86/kernel/fpu/context.h
>> @@ -19,8 +19,7 @@
>> * FPU state for a task MUST let the rest of the kernel know that the
>> * FPU registers are no longer valid for this task.
>> *
>> - * Either one of these invalidation functions is enough. Invalidate
>> - * a resource you control: CPU if using the CPU for something else
>> + * Invalidate a resource you control: CPU if using the CPU for
>> something else
>> * (with preemption disabled), FPU for the current task, or a task that
>> * is prevented from running by the current task.
>> */
>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> index e03b6b107b20..a86d37052a64 100644
>> --- a/arch/x86/kernel/fpu/core.c
>> +++ b/arch/x86/kernel/fpu/core.c
>> @@ -713,7 +713,7 @@ static void fpu_reset_fpregs(void)
>> struct fpu *fpu = ¤t->thread.fpu;
>> fpregs_lock();
>> - fpu__drop(fpu);
>> + __fpu_invalidate_fpregs_state(fpu);
>> /*
>> * This does not change the actual hardware registers. It just
>> * resets the memory image and sets TIF_NEED_FPU_LOAD so a
>
>
> Thanks for the patch. Let me get back to my server (offline now) next
> Monday and will add a "Test-by: Lijun Pan <lijun.pan@intel.com>" if it
> passes.
I have run some relevant tests (compiling kernel with high thread amount
repeatedly, make -j 256),
glibc tests (https://gitlab.com/x86-glibc/glibc/-/blob/master/INSTALL),
fpu & cet (ibt, shadow stack) test (https://github.com/intel/lkvs),
and not yet found regressions.
Tested-by: Lijun Pan <lijun.pan@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Since this problem was first reported by Lei Wang, I suggest adding:
Reported-by: Lei Wang <lei4.wang@intel.com>
Thanks,
Lijun
>
> In our bug case, probably just switching to
> __fpu_invalidate_fpregs_state(fpu) from fpu__drop(fpu) is enough.
>
> Maybe there are some other cases that need
> __this_cpu_write(fpu_fpregs_owner_ctx, NULL) through fpu__drop() call,
> which I don't know.
>
> Here is the excerpt of the call sequence:
> fpu__drop() -> fpregs_deactivate() ->
> __this_cpu_write(fpu_fpregs_owner_ctx, NULL);
> arch/x86/kernel/fpu/context.h has
> static inline void __cpu_invalidate_fpregs_state(void)
> {
> __this_cpu_write(fpu_fpregs_owner_ctx, NULL);
> }
> static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
> {
> fpu->last_cpu = -1;
> }
> static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
> {
> return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu ==
> fpu->last_cpu;
> }
>
> So, I am thinking if it is more rigorous to have both
> (__cpu_invalidate_fpregs_state, __fpu_invalidate_fpregs_state)
> invalidated, similarly as fpregs_state_valid checks both conditions.
>
> code changes like below:
> diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
> index 958accf2ccf0..fd3304bed0a2 100644
> --- a/arch/x86/kernel/fpu/context.h
> +++ b/arch/x86/kernel/fpu/context.h
> @@ -19,8 +19,8 @@
> * FPU state for a task MUST let the rest of the kernel know that the
> * FPU registers are no longer valid for this task.
> *
> - * Either one of these invalidation functions is enough. Invalidate
> - * a resource you control: CPU if using the CPU for something else
> + * To be more rigorous and to prevent from future corner case, Invalidate
> + * both resources you control: CPU if using the CPU for something else
> * (with preemption disabled), FPU for the current task, or a task that
> * is prevented from running by the current task.
> */
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 97a899bf98b9..08b9cef0e076 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -725,6 +725,7 @@ static void fpu_reset_fpregs(void)
>
> fpregs_lock();
> fpu__drop(fpu);
> + __fpu_invalidate_fpregs_state(fpu);
> /*
> * This does not change the actual hardware registers. It just
> * resets the memory image and sets TIF_NEED_FPU_LOAD so a
>
> Thanks,
> Lijun
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [tip: x86/urgent] x86/fpu: Invalidate FPU state correctly on exec()
2023-08-18 17:03 [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec Rick Edgecombe
` (2 preceding siblings ...)
2023-08-22 20:10 ` Sohil Mehta
@ 2023-08-24 9:11 ` tip-bot2 for Rick Edgecombe
2023-08-24 15:22 ` Edgecombe, Rick P
3 siblings, 1 reply; 8+ messages in thread
From: tip-bot2 for Rick Edgecombe @ 2023-08-24 9:11 UTC (permalink / raw)
To: linux-tip-commits
Cc: Lei Wang, Rick Edgecombe, Thomas Gleixner, Lijun Pan, Sohil Mehta,
stable, x86, linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 1f69383b203e28cf8a4ca9570e572da1699f76cd
Gitweb: https://git.kernel.org/tip/1f69383b203e28cf8a4ca9570e572da1699f76cd
Author: Rick Edgecombe <rick.p.edgecombe@intel.com>
AuthorDate: Fri, 18 Aug 2023 10:03:05 -07:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 24 Aug 2023 11:01:45 +02:00
x86/fpu: Invalidate FPU state correctly on exec()
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
Fixes: 33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
---
arch/x86/kernel/fpu/context.h | 3 +--
arch/x86/kernel/fpu/core.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index af5cbdd..f6d856b 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -19,8 +19,7 @@
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
+ * Invalidate a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1015af1..98e507c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -679,7 +679,7 @@ static void fpu_reset_fpregs(void)
struct fpu *fpu = ¤t->thread.fpu;
fpregs_lock();
- fpu__drop(fpu);
+ __fpu_invalidate_fpregs_state(fpu);
/*
* This does not change the actual hardware registers. It just
* resets the memory image and sets TIF_NEED_FPU_LOAD so a
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [tip: x86/urgent] x86/fpu: Invalidate FPU state correctly on exec()
2023-08-24 9:11 ` [tip: x86/urgent] x86/fpu: Invalidate FPU state correctly on exec() tip-bot2 for Rick Edgecombe
@ 2023-08-24 15:22 ` Edgecombe, Rick P
0 siblings, 0 replies; 8+ messages in thread
From: Edgecombe, Rick P @ 2023-08-24 15:22 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, linux-tip-commits@vger.kernel.org
Cc: tglx@linutronix.de, Mehta, Sohil, stable@vger.kernel.org,
Wang, Lei4, Pan, Lijun, x86@kernel.org
On Thu, 2023-08-24 at 09:11 +0000, tip-bot2 for Rick Edgecombe wrote:
> Fixes: 33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants")
Oops, yes this looks correct. Sorry for the wrong characterization of
it being older.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-08-24 15:23 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-18 17:03 [PATCH] x86: Use __fpu_invalidate_fpregs_state() in exec Rick Edgecombe
2023-08-18 19:35 ` Lijun Pan
2023-08-18 21:56 ` Sohil Mehta
2023-08-24 0:04 ` Lijun Pan
2023-08-18 23:35 ` Sohil Mehta
2023-08-22 20:10 ` Sohil Mehta
2023-08-24 9:11 ` [tip: x86/urgent] x86/fpu: Invalidate FPU state correctly on exec() tip-bot2 for Rick Edgecombe
2023-08-24 15:22 ` Edgecombe, Rick P
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox