* [PATCH] ARM: spectre-v2: fix unstable cpu get
@ 2025-04-24 10:04 xieyuanbin1
2025-04-24 10:21 ` Russell King (Oracle)
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: xieyuanbin1 @ 2025-04-24 10:04 UTC (permalink / raw)
To: linux, sfr, rmk+kernel, xieyuanbin1
Cc: linux-arm-kernel, linux-kernel, will, nixiaoming, liaohua4,
wangbing6, lincheng8, wangfangpeng1
From: Xie Yuanbin <xieyuanbin1@huawei.com>
When a user program accesses the kernel address,
a segment fault will be triggered.
For the latest kernel, with arm's multi_v7_defconfig, and set
CONFIG_PREEMPT=y, CONFIG_DEBUG_PREEMPT=y, CONFIG_ARM_LPAE=y,
if a user program try to accesses a valid kernel address, for example:
```c
static void han(int x)
{
while (1);
}
int main(void)
{
signal(SIGSEGV, han);
/* 0xc0331fd4 is just a kernel address in kernel .text section */
__asm__ volatile (""::"r"(*(int *)(uintptr_t)0xc0331fd4):"memory");
while (1);
return 0;
}
```
, the following WARN will be triggered:
[ 1.089103] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
[ 1.093367] caller is __do_user_fault+0x20/0x6c
[ 1.094355] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.14.3 #7
[ 1.094585] Hardware name: Generic DT based system
[ 1.094706] Call trace:
[ 1.095211] unwind_backtrace from show_stack+0x10/0x14
[ 1.095329] show_stack from dump_stack_lvl+0x50/0x5c
[ 1.095352] dump_stack_lvl from check_preemption_disabled+0x104/0x108
[ 1.095448] check_preemption_disabled from __do_user_fault+0x20/0x6c
[ 1.095459] __do_user_fault from do_page_fault+0x334/0x3dc
[ 1.095505] do_page_fault from do_DataAbort+0x30/0xa8
[ 1.095528] do_DataAbort from __dabt_usr+0x54/0x60
[ 1.095570] Exception stack(0xf0825fb0 to 0xf0825ff8)
[ 1.095706] 5fa0: 00000000 befe9c64 00000000 c0331000
[ 1.095737] 5fc0: 00000002 0006dd68 00000001 befe9f04 befe9f0c 00070010 00000002 00073770
[ 1.095746] 5fe0: befe9d94 befe9da8 00010953 00010250 600d0030 ffffffff
In this case, the hook func in do_DataAbort `inf->fn` is `do_page_fault`,
`do_page_fault` enables the irq, so it is preemptible here.
Inside `harden_branch_predictor`, if we just obtains the hook function `fn`
of this CPU, then the task is scheduled to another CPU.
The previous function is still executed, which is dangerous.
Use get_cpu() and put_cpu() to fix it.
Fixes: f5fe12b1eaee ("ARM: spectre-v2: harden user aborts in kernel space")
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
---
arch/arm/include/asm/system_misc.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
index 98b373403..93cad3d50 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -18,15 +18,16 @@ extern void (*arm_pm_idle)(void);
#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
typedef void (*harden_branch_predictor_fn_t)(void);
DECLARE_PER_CPU(harden_branch_predictor_fn_t, harden_branch_predictor_fn);
static inline void harden_branch_predictor(void)
{
harden_branch_predictor_fn_t fn = per_cpu(harden_branch_predictor_fn,
- smp_processor_id());
+ get_cpu());
if (fn)
fn();
+ put_cpu();
}
#else
#define harden_branch_predictor() do { } while (0)
#endif
#define UDBG_UNDEFINED (1 << 0)
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM: spectre-v2: fix unstable cpu get
2025-04-24 10:04 [PATCH] ARM: spectre-v2: fix unstable cpu get xieyuanbin1
@ 2025-04-24 10:21 ` Russell King (Oracle)
2025-04-24 13:31 ` xieyuanbin1
2025-05-07 12:28 ` [RFC PATCH] ARM: spectre-v2: fix the spectre operation that may be bypassed Xie Yuanbin
2025-06-06 1:43 ` [RFC RESEND " Xie Yuanbin
2 siblings, 1 reply; 7+ messages in thread
From: Russell King (Oracle) @ 2025-04-24 10:21 UTC (permalink / raw)
To: xieyuanbin1
Cc: sfr, linux-arm-kernel, linux-kernel, will, nixiaoming, liaohua4,
wangbing6, lincheng8, wangfangpeng1
On Thu, Apr 24, 2025 at 06:04:37PM +0800, xieyuanbin1 wrote:
> From: Xie Yuanbin <xieyuanbin1@huawei.com>
>
> When a user program accesses the kernel address,
Please see
https://lore.kernel.org/all/795c9463-452e-bf64-1cc0-c318ccecb1da@I-love.SAKURA.ne.jp/T/
As I see it, this can't be fixed - certainly not in this way due to the
reasons I set out in the above thread.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] ARM: spectre-v2: fix unstable cpu get
2025-04-24 10:21 ` Russell King (Oracle)
@ 2025-04-24 13:31 ` xieyuanbin1
2025-04-24 13:50 ` Russell King (Oracle)
0 siblings, 1 reply; 7+ messages in thread
From: xieyuanbin1 @ 2025-04-24 13:31 UTC (permalink / raw)
To: linux
Cc: liaohua4, lincheng8, linux-arm-kernel, linux-kernel, nixiaoming,
sfr, wangbing6, wangfangpeng1, will, xieyuanbin1
From: Xie Yuanbin <xieyuanbin1@huawei.com>
>> From: Xie Yuanbin <xieyuanbin1@huawei.com>
>>
>> When a user program accesses the kernel address,
>
>Please see
>https://lore.kernel.org/all/795c9463-452e-bf64-1cc0-c318ccecb1da@I-love.SAKURA.ne.jp/T/
>
>As I see it, this can't be fixed - certainly not in this way due to the
>reasons I set out in the above thread.
Oh, I'm sorry that I haven't noticed that someone has submitted similar patch before.
I've actually thought about a similar problem. In areas other than put_cpu/get_cpu, tasks may be scheduled to other CPUs, this cpu actually does not execute the spectre code.
However, in this case, a context_switch must be happended. Inside switch_mm, the spectre code is still executed, so I didn't think of it as a problem.
But there is a situation, if we switch to another thread of **this** process, we will not execute switch_mm, which is so sad.
As mentioned in the previous patch discussion, this is only an alarm. However, I do not think so. On heterogeneous CPUs, the spectre code of different cores may have different implementations.
If another cpu's fn is executed, there may be illegal instruction and panic.
User processes can attack this point, which is very dangerous.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM: spectre-v2: fix unstable cpu get
2025-04-24 13:31 ` xieyuanbin1
@ 2025-04-24 13:50 ` Russell King (Oracle)
2025-04-24 14:03 ` xieyuanbin1
0 siblings, 1 reply; 7+ messages in thread
From: Russell King (Oracle) @ 2025-04-24 13:50 UTC (permalink / raw)
To: xieyuanbin1
Cc: liaohua4, lincheng8, linux-arm-kernel, linux-kernel, nixiaoming,
sfr, wangbing6, wangfangpeng1, will
On Thu, Apr 24, 2025 at 09:31:33PM +0800, xieyuanbin1 wrote:
> I've actually thought about a similar problem. In areas other than put_cpu/get_cpu, tasks may be scheduled to other CPUs, this cpu actually does not execute the spectre code.
My point is that if harden_branch_predictor() has been called from a
context where we are preemptible, then we _could_ end up running on a
different CPU to the one that we need to take action.
Consider your test program running on CPU 1 which requires fixup. It
takes a fault, and before we enter harden_branch_predictor(), we end
up being migrated to CPU 0, but doesn't require a switch of the MM.
Let's say we then disable preemption and then call
harden_branch_predictor(), and then restore the preemption state.
The thread then gets migrated back to CPU 1. Again, no switch of
the MM.
At this point, the mitigation has been completely bypassed.
IMHO, better to be noisy about this event (and it is only a kernel
warning) than to be silent about it and let userspace get away with
bypassing the mitigation.
I don't care if this disrupts test tooling. The trade off between
test tooling having a problem and a silent data leak through this
channel... the answer is pretty obvious that the test tooling
failing is less important than having a silent data leak.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] ARM: spectre-v2: fix unstable cpu get
2025-04-24 13:50 ` Russell King (Oracle)
@ 2025-04-24 14:03 ` xieyuanbin1
0 siblings, 0 replies; 7+ messages in thread
From: xieyuanbin1 @ 2025-04-24 14:03 UTC (permalink / raw)
To: linux
Cc: liaohua4, lincheng8, linux-arm-kernel, linux-kernel, nixiaoming,
sfr, wangbing6, wangfangpeng1, will, xieyuanbin1
From: Xie Yuanbin <xieyuanbin1@huawei.com>
>Consider your test program running on CPU 1 which requires fixup. It
>takes a fault, and before we enter harden_branch_predictor(), we end
>up being migrated to CPU 0, but doesn't require a switch of the MM.
>Let's say we then disable preemption and then call
>harden_branch_predictor(), and then restore the preemption state.
>The thread then gets migrated back to CPU 1. Again, no switch of
>the MM.
Your assumption is correct, and I agree with it.
>I don't care if this disrupts test tooling. The trade off between
>test tooling having a problem and a silent data leak through this
>channel... the answer is pretty obvious that the test tooling
>failing is less important than having a silent data leak.
I have never mentioned a test tool, and I agree with you.
The problem I mentioned before is possible illegal instruction and panic.
If we want to fix this problem without affecting performance,
can we add a new hook functions (in fsr-3level.c)?
I don't know much about the details of the ARM manual,
Are all user-mode access to kernel-mode addresses are `permission fault`
and common page faults (hot code) are `access flag fault`?
If so, we can add a new hook function to deal with `permission fault`
and set it in fsr-3level.c.
This means we fix the problem without adding a branch,
that there is no performance loss.
Otherwise, we can only add condition judgment before enabling irq,
which means that performance loss occurs. This is so sad. :(
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC PATCH] ARM: spectre-v2: fix the spectre operation that may be bypassed
2025-04-24 10:04 [PATCH] ARM: spectre-v2: fix unstable cpu get xieyuanbin1
2025-04-24 10:21 ` Russell King (Oracle)
@ 2025-05-07 12:28 ` Xie Yuanbin
2025-06-06 1:43 ` [RFC RESEND " Xie Yuanbin
2 siblings, 0 replies; 7+ messages in thread
From: Xie Yuanbin @ 2025-05-07 12:28 UTC (permalink / raw)
To: linux, rmk+kernel, kees, yangyj.ee, ardb, tony, xieyuanbin1
Cc: linux-arm-kernel, linux-kernel, will, nixiaoming, liaohua4,
wangbing6, lincheng8, wangfangpeng1
As discussed before, to completely fix this problem, we must do
the spectre operation after the user mode is trapped in the kernel
and before the interrupt is enabled.
I have tried to add a hook function and it in fsr_info to avoid
performance deterioration (The preceding example will trigger
"level 2 permission fault", which is cold code in normal cases).
However, this does not work. I find that the user program can
also trigger "translation fault" or "access flag fault"
when accessing a kernel address.
Therefore, extra performance overhead is inevitable.
(An if branch needs to be added before the interrupt is enabled.)
I have tried to reduce the impact on performance.
If the page fault comes from the user mode,
the interrupt must be enabled. In this case,
a judgment can be reduced.
Fixes: f5fe12b1eaee ("ARM: spectre-v2: harden user aborts in kernel space")
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
---
arch/arm/mm/fault.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index ab01b51de559..3425a12a8f52 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -272,25 +272,35 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
if (kprobe_page_fault(regs, fsr))
return 0;
/* Enable interrupts if they were enabled in the parent context. */
- if (interrupts_enabled(regs))
- local_irq_enable();
+ if (likely(user_mode(regs))) {
+ if (IS_ENABLED(CONFIG_PREEMPT) &&
+ IS_ENABLED(CONFIG_HARDEN_BRANCH_PREDICTOR) &&
+ unlikely(addr >= TASK_SIZE)) {
+
+ __do_user_fault(addr, fsr, SIGSEGV, SEGV_MAPERR, regs);
+ return 0;
+ }
+
+ flags |= FAULT_FLAG_USER;
+ } else if (!interrupts_enabled(regs))
+ goto irq_disabled;
+
+ local_irq_enable();
+irq_disabled:
/*
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
if (faulthandler_disabled() || !mm)
goto no_context;
- if (user_mode(regs))
- flags |= FAULT_FLAG_USER;
-
if (is_write_fault(fsr)) {
flags |= FAULT_FLAG_WRITE;
vm_flags = VM_WRITE;
}
if (fsr & FSR_LNX_PF) {
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC RESEND PATCH] ARM: spectre-v2: fix the spectre operation that may be bypassed
2025-04-24 10:04 [PATCH] ARM: spectre-v2: fix unstable cpu get xieyuanbin1
2025-04-24 10:21 ` Russell King (Oracle)
2025-05-07 12:28 ` [RFC PATCH] ARM: spectre-v2: fix the spectre operation that may be bypassed Xie Yuanbin
@ 2025-06-06 1:43 ` Xie Yuanbin
2 siblings, 0 replies; 7+ messages in thread
From: Xie Yuanbin @ 2025-06-06 1:43 UTC (permalink / raw)
To: linux, rmk+kernel, kees, yangyj.ee, ardb, tony, xieyuanbin1
Cc: linux-arm-kernel, linux-kernel, will, nixiaoming, liaohua4,
wangbing6, lilinjie8, wangfangpeng1
As discussed before, to completely fix this problem, we must do
the spectre operation after the user mode is trapped in the kernel
and before the interrupt is enabled.
I have tried to add a hook function and it in fsr_info to avoid
performance deterioration (The preceding example will trigger
"level 2 permission fault", which is cold code in normal cases).
However, this does not work. I find that the user program can
also trigger "translation fault" or "access flag fault"
when accessing a kernel address.
Therefore, extra performance overhead is inevitable.
(An if branch needs to be added before the interrupt is enabled.)
I have tried to reduce the impact on performance.
If the page fault comes from the user mode,
the interrupt must be enabled. In this case,
a judgment can be reduced.
Fixes: f5fe12b1eaee ("ARM: spectre-v2: harden user aborts in kernel space")
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
---
arch/arm/mm/fault.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index ab01b51de559..3425a12a8f52 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -272,25 +272,35 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
if (kprobe_page_fault(regs, fsr))
return 0;
/* Enable interrupts if they were enabled in the parent context. */
- if (interrupts_enabled(regs))
- local_irq_enable();
+ if (likely(user_mode(regs))) {
+ if (IS_ENABLED(CONFIG_PREEMPT) &&
+ IS_ENABLED(CONFIG_HARDEN_BRANCH_PREDICTOR) &&
+ unlikely(addr >= TASK_SIZE)) {
+
+ __do_user_fault(addr, fsr, SIGSEGV, SEGV_MAPERR, regs);
+ return 0;
+ }
+
+ flags |= FAULT_FLAG_USER;
+ } else if (!interrupts_enabled(regs))
+ goto irq_disabled;
+
+ local_irq_enable();
+irq_disabled:
/*
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
if (faulthandler_disabled() || !mm)
goto no_context;
- if (user_mode(regs))
- flags |= FAULT_FLAG_USER;
-
if (is_write_fault(fsr)) {
flags |= FAULT_FLAG_WRITE;
vm_flags = VM_WRITE;
}
if (fsr & FSR_LNX_PF) {
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-06-06 1:46 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-24 10:04 [PATCH] ARM: spectre-v2: fix unstable cpu get xieyuanbin1
2025-04-24 10:21 ` Russell King (Oracle)
2025-04-24 13:31 ` xieyuanbin1
2025-04-24 13:50 ` Russell King (Oracle)
2025-04-24 14:03 ` xieyuanbin1
2025-05-07 12:28 ` [RFC PATCH] ARM: spectre-v2: fix the spectre operation that may be bypassed Xie Yuanbin
2025-06-06 1:43 ` [RFC RESEND " Xie Yuanbin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).