* [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
@ 2025-07-18 14:28 Ada Couprie Diaz
2025-07-18 15:13 ` Dev Jain
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Ada Couprie Diaz @ 2025-07-18 14:28 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Cristian Prundeanu,
Will Deacon, Ard Biesheuvel
`cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
to different stacks along with the Shadow Call Stack if it is enabled.
Those two stack changes cannot be done atomically, and both functions
can be interrupted by SErrors or Debug Exceptions. Though unlikely,
this is very much broken: if interrupted, we can end up with a mismatched
stack and Shadow Call Stack, leading to clobbered stacks.
In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
but x18 still points to the old task's SCS. When the interrupt handler
tries to save the task's SCS pointer, it will save the old task's
SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
clobbering it.
In `call_on_irq_stack()`, it can happen when switching from the task stack
to the IRQ stack and when switching back. In both cases, we can be
interrupted when the SCS pointer points to the IRQ SCS, but SP points to
the task stack. The nested interrupt handler pushes its return addresses
on the IRQ SCS. It then detects that SP points to the task stack,
calls `call_on_irq_stack()` and clobbers the task SCS pointer with
the IRQ SCS pointer, which it will also use!
This leads to tasks returning to addresses on the wrong SCS,
or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
or FPAC if enabled.
This is possible on a default config, but unlikely.
However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
instead the GIC is responsible for filtering what interrupts the CPU
should receive based on priority.
Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
frequently depending on the system configuration and workload, leading
to unpredictable kernel panics.
Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
Do the same in `call_on_irq_stack()`, but restore and mask around
the branch.
Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
of behaviour between all configurations.
Introduce and use an assembly macro for saving and masking DAIF,
as the existing one saves but only masks IF.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Reported-by: Cristian Prundeanu <cpru@amazon.com>
Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
---
Hi,
I spent some time evaluating the performance impact of this change
to make sure that it would be OK to mask in those functions.
They contain very few instructions, so they have little chance of being
interrupted to begin with, and the impact should be minimal.
Disclaimer: I am no benchmarking or performance analysis expert.
I'm happy to take additional input/validation of the findings below!
I ran the following benchmarks on 4-core VMs running on recent
commercial hardware, trying to maximize task switches:
- `stress-ng --switch` with 100 processes [0],
- `hackbench -T` and `hackbench -P`, both with 400 tasks [1].
Comparing the effect on base defconfig:
- `stress-ng` is nearly identical, the median switch time with the fix
was reduced by 0.1%, average raised by 0.04%.
- `hackbench` results are slightly different, medians were reduced by
0.3-0.4%, with some high task time outliers raising the average by
1.7-1.8%.
- Both benchmarks have almost identical distribution.
- The effects seem mostly minimal, possibly in the noise.
Comparing the effects with pNMI+SCS, pNMI enabled:
- `stress-ng` is slightly slower: median is +1.9%, average +1.4%.
- `hackbench` is similar: median is +0.8-0.9%, average +0.3-+0.6%.
- Times distribution is larger in both cases.
- There seems to be a small performance impact, however without the fix
there is a high likelihood of triggering the race condition and panic
at some point.
I also tried to benchmark the performance impact on `memcached`
as a workload reported to crash with pNMI enabled and SCS.
I used `mc-crusher` [2] as recommended by the `memcached` documentation,
specifically the `binconf`, `slab_rebal_torture` and `slab_rebal_torture5`
configurations, measuring the average amount of get/set operations
per second each minute.
Those were also run on a 4-core VM, but running on an older machine.
Comparing the effects on base defconfig:
- `binconf` is slightly worse, -0.8% ops/s on average
- `slab_rebal_torture` is slightly better, +0.7% op/s on average
- `slab_rebal_torture5` is almost identical, +0.1% op/s on average
- There is much less variation in operations/s.
Comparing the effects with pNMI+SCS, pNMI enabled:
- `binconf` is slightly better, +0.5% op/s on average
- `slab_rebal_torture` as well, +0.5% op/s on average
- `slab_rebal_torture5` is slightly worse, -0.6% op/s on average
- The spread of values is similar
The `mc-crusher` performance results seem to validate that the change
has little impact in practice and might very much be lost in the noise.
Given those results I feel it is OK to mask DAIF in `call_on_irq_stack()`
and `cpu_switch_to()` in all configurations for consistency of behaviour,
as it is not notably detrimental even in the configurations where it does
fix the race condition.
My apologies if this is all *very long*. I felt it was important to
explain the mechanisms triggering the issues, as well as to justify
the performance impact given the functions affected (though the latter
has less place in the commit message itself).
I might be wrong on both counts, so I'm happy to trim the commit
message if needed, or to be corrected on the performance impact!
Thanks,
Ada
[0]: https://github.com/ColinIanKing/stress-ng
[1]: https://man.archlinux.org/man/hackbench.8
[2]: https://github.com/memcached/mc-crusher
---
arch/arm64/include/asm/assembler.h | 5 +++++
arch/arm64/kernel/entry.S | 6 ++++++
2 files changed, 11 insertions(+)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index ad63457a05c5..c56c21bb1eec 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -41,6 +41,11 @@
/*
* Save/restore interrupts.
*/
+ .macro save_and_disable_daif, flags
+ mrs \flags, daif
+ msr daifset, #0xf
+ .endm
+
.macro save_and_disable_irq, flags
mrs \flags, daif
msr daifset, #3
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 5ae2a34b50bd..30dcb719685b 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -825,6 +825,7 @@ SYM_CODE_END(__bp_harden_el1_vectors)
*
*/
SYM_FUNC_START(cpu_switch_to)
+ save_and_disable_daif x11
mov x10, #THREAD_CPU_CONTEXT
add x8, x0, x10
mov x9, sp
@@ -848,6 +849,7 @@ SYM_FUNC_START(cpu_switch_to)
ptrauth_keys_install_kernel x1, x8, x9, x10
scs_save x0
scs_load_current
+ restore_irq x11
ret
SYM_FUNC_END(cpu_switch_to)
NOKPROBE(cpu_switch_to)
@@ -874,6 +876,7 @@ NOKPROBE(ret_from_fork)
* Calls func(regs) using this CPU's irq stack and shadow irq stack.
*/
SYM_FUNC_START(call_on_irq_stack)
+ save_and_disable_daif x9
#ifdef CONFIG_SHADOW_CALL_STACK
get_current_task x16
scs_save x16
@@ -888,8 +891,10 @@ SYM_FUNC_START(call_on_irq_stack)
/* Move to the new stack and call the function there */
add sp, x16, #IRQ_STACK_SIZE
+ restore_irq x9
blr x1
+ save_and_disable_daif x9
/*
* Restore the SP from the FP, and restore the FP and LR from the frame
* record.
@@ -897,6 +902,7 @@ SYM_FUNC_START(call_on_irq_stack)
mov sp, x29
ldp x29, x30, [sp], #16
scs_load_current
+ restore_irq x9
ret
SYM_FUNC_END(call_on_irq_stack)
NOKPROBE(call_on_irq_stack)
base-commit: 347e9f5043c89695b01e66b3ed111755afcf1911
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-18 14:28 [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack() Ada Couprie Diaz
@ 2025-07-18 15:13 ` Dev Jain
2025-07-18 17:07 ` Ada Couprie Diaz
2025-07-21 14:18 ` Will Deacon
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Dev Jain @ 2025-07-18 15:13 UTC (permalink / raw)
To: Ada Couprie Diaz, linux-arm-kernel
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Cristian Prundeanu,
Will Deacon, Ard Biesheuvel
On 18/07/25 7:58 pm, Ada Couprie Diaz wrote:
> `cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
> to different stacks along with the Shadow Call Stack if it is enabled.
> Those two stack changes cannot be done atomically and both functions
> can be interrupted by SErrors or Debug Exceptions which, though unlikely,
> is very much broken : if interrupted, we can end up with mismatched stacks
> and Shadow Call Stack leading to clobbered stacks.
>
> In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
> but x18 stills points to the old task's SCS. When the interrupt handler
> tries to save the task's SCS pointer, it will save the old task
> SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
> clobbering it.
>
> In `call_on_irq_stack()`, it can happen when switching from the task stack
> to the IRQ stack and when switching back. In both cases, we can be
> interrupted when the SCS pointer points to the IRQ SCS, but SP points to
> the task stack. The nested interrupt handler pushes its return addresses
> on the IRQ SCS. It then detects that SP points to the task stack,
> calls `call_on_irq_stack()` and clobbers the task SCS pointer with
> the IRQ SCS pointer, which it will also use !
>
> This leads to tasks returning to addresses on the wrong SCS,
> or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
> or FPAC if enabled.
>
> This is possible on a default config, but unlikely.
> However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
> instead the GIC is responsible for filtering what interrupts the CPU
> should receive based on priority.
> Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
> even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
> frequently depending on the system configuration and workload, leading
> to unpredictable kernel panics.
>
> Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> Do the same in `call_on_irq_stack()`, but restore and mask around
> the branch.
> Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> of behaviour between all configurations.
>
> Introduce and use an assembly macro for saving and masking DAIF,
> as the existing one saves but only masks IF.
>
> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> Reported-by: Cristian Prundeanu <cpru@amazon.com>
> Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
> ---
Nit: We only write a prefix of length 12 for the fixes commit.
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-18 15:13 ` Dev Jain
@ 2025-07-18 17:07 ` Ada Couprie Diaz
0 siblings, 0 replies; 11+ messages in thread
From: Ada Couprie Diaz @ 2025-07-18 17:07 UTC (permalink / raw)
To: Dev Jain, linux-arm-kernel
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Cristian Prundeanu,
Will Deacon, Ard Biesheuvel
Hi Dev,
On 18/07/2025 16:13, Dev Jain wrote:
>
> On 18/07/25 7:58 pm, Ada Couprie Diaz wrote:
>> [...]
>> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
>> Reported-by: Cristian Prundeanu <cpru@amazon.com>
>> Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow
>> stack pointer in the task struct on interrupt")
>> ---
>
> Nit: We only write a prefix of length 12 for the fixes commit.
>
Indeed, I forgot to trim it.
Thanks !
Ada
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-18 14:28 [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack() Ada Couprie Diaz
2025-07-18 15:13 ` Dev Jain
@ 2025-07-21 14:18 ` Will Deacon
2025-07-22 5:31 ` Ard Biesheuvel
2025-07-21 21:42 ` Prundeanu, Cristian
2025-07-22 15:59 ` Will Deacon
3 siblings, 1 reply; 11+ messages in thread
From: Will Deacon @ 2025-07-21 14:18 UTC (permalink / raw)
To: Ada Couprie Diaz
Cc: linux-arm-kernel, Catalin Marinas, Mark Rutland, Ard Biesheuvel,
Mark Brown, Cristian Prundeanu
On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
> `cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
> to different stacks along with the Shadow Call Stack if it is enabled.
> Those two stack changes cannot be done atomically and both functions
> can be interrupted by SErrors or Debug Exceptions which, though unlikely,
> is very much broken : if interrupted, we can end up with mismatched stacks
> and Shadow Call Stack leading to clobbered stacks.
>
> In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
> but x18 stills points to the old task's SCS. When the interrupt handler
> tries to save the task's SCS pointer, it will save the old task
> SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
> clobbering it.
>
> In `call_on_irq_stack()`, it can happen when switching from the task stack
> to the IRQ stack and when switching back. In both cases, we can be
> interrupted when the SCS pointer points to the IRQ SCS, but SP points to
> the task stack. The nested interrupt handler pushes its return addresses
> on the IRQ SCS. It then detects that SP points to the task stack,
> calls `call_on_irq_stack()` and clobbers the task SCS pointer with
> the IRQ SCS pointer, which it will also use !
>
> This leads to tasks returning to addresses on the wrong SCS,
> or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
> or FPAC if enabled.
>
> This is possible on a default config, but unlikely.
> However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
> instead the GIC is responsible for filtering what interrupts the CPU
> should receive based on priority.
> Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
> even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
> frequently depending on the system configuration and workload, leading
> to unpredictable kernel panics.
>
> Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> Do the same in `call_on_irq_stack()`, but restore and mask around
> the branch.
> Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> of behaviour between all configurations.
>
> Introduce and use an assembly macro for saving and masking DAIF,
> as the existing one saves but only masks IF.
>
> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> Reported-by: Cristian Prundeanu <cpru@amazon.com>
> Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
> ---
> Hi,
> I spent some time evaluating the performance impact of this change
> to make sure that it would be OK to mask in those functions.
> They have very few instructions so have few chances to be interrupted
> to begin with so the impact should be minimal.
It's definitely worthwhile doing the benchmarking, so thanks for that,
but realistically I don't think we have an awful lot of choice but to
fix this, do we?
The code looks good to me:
Acked-by: Will Deacon <will@kernel.org>
Will
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-18 14:28 [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack() Ada Couprie Diaz
2025-07-18 15:13 ` Dev Jain
2025-07-21 14:18 ` Will Deacon
@ 2025-07-21 21:42 ` Prundeanu, Cristian
2025-07-22 15:20 ` Will Deacon
2025-07-22 15:59 ` Will Deacon
3 siblings, 1 reply; 11+ messages in thread
From: Prundeanu, Cristian @ 2025-07-21 21:42 UTC (permalink / raw)
To: Ada Couprie Diaz, linux-arm-kernel@lists.infradead.org
Cc: Catalin Marinas, Mark Rutland, Will Deacon, Ard Biesheuvel,
Mark Brown
On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
> Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> Do the same in `call_on_irq_stack()`, but restore and mask around
> the branch.
> Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> of behaviour between all configurations.
>
> Introduce and use an assembly macro for saving and masking DAIF,
> as the existing one saves but only masks IF.
>
> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> Reported-by: Cristian Prundeanu <cpru@amazon.com>
> Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
Confirming this fixes the spontaneous reboot previously observed when
enabling both pseudo-NMI (irqchip.gicv3_pseudo_nmi=1) and shadow call
stack (CONFIG_SHADOW_CALL_STACK=y). Tested both on kernel 6.16-rc7 and
legacy kernel 6.8 where the issue was initially observed.
Tested-by: Cristian Prundeanu <cpru@amazon.com>
-Cristian
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-21 14:18 ` Will Deacon
@ 2025-07-22 5:31 ` Ard Biesheuvel
2025-07-24 17:50 ` Ada Couprie Diaz
0 siblings, 1 reply; 11+ messages in thread
From: Ard Biesheuvel @ 2025-07-22 5:31 UTC (permalink / raw)
To: Will Deacon, Sami Tolvanen
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Cristian Prundeanu,
linux-arm-kernel
(cc Sami)
On Tue, 22 Jul 2025 at 00:18, Will Deacon <will@kernel.org> wrote:
>
> On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
> > `cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
> > to different stacks along with the Shadow Call Stack if it is enabled.
> > Those two stack changes cannot be done atomically and both functions
> > can be interrupted by SErrors or Debug Exceptions which, though unlikely,
> > is very much broken : if interrupted, we can end up with mismatched stacks
> > and Shadow Call Stack leading to clobbered stacks.
> >
> > In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
> > but x18 stills points to the old task's SCS. When the interrupt handler
> > tries to save the task's SCS pointer, it will save the old task
> > SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
> > clobbering it.
> >
In this case, the 'interrupt handler' is not the entry code, but
call_on_irq_stack(), right?
> > In `call_on_irq_stack()`, it can happen when switching from the task stack
> > to the IRQ stack and when switching back. In both cases, we can be
> > interrupted when the SCS pointer points to the IRQ SCS, but SP points to
> > the task stack. The nested interrupt handler pushes its return addresses
> > on the IRQ SCS. It then detects that SP points to the task stack,
> > calls `call_on_irq_stack()` and clobbers the task SCS pointer with
> > the IRQ SCS pointer, which it will also use !
> >
> > This leads to tasks returning to addresses on the wrong SCS,
> > or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
> > or FPAC if enabled.
> >
> > This is possible on a default config, but unlikely.
> > However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
> > instead the GIC is responsible for filtering what interrupts the CPU
> > should receive based on priority.
> > Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
> > even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
> > frequently depending on the system configuration and workload, leading
> > to unpredictable kernel panics.
> >
> > Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> > Do the same in `call_on_irq_stack()`, but restore and mask around
> > the branch.
> > Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> > of behaviour between all configurations.
> >
> > Introduce and use an assembly macro for saving and masking DAIF,
> > as the existing one saves but only masks IF.
> >
> > Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> > Reported-by: Cristian Prundeanu <cpru@amazon.com>
> > Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
> > ---
> > Hi,
> > I spent some time evaluating the performance impact of this change
> > to make sure that it would be OK to mask in those functions.
> > They have very few instructions so have few chances to be interrupted
> > to begin with so the impact should be minimal.
>
> It's definitely worthwhile doing the benchmarking, so thanks for that,
> but realistically I don't think we have an awful lot of choice but to
> fix this, do we?
>
So AIUI, the problem is that both cpu_switch_to() and
call_on_irq_stack() itself can be interrupted by the latter, resulting
in the shadow call stack pointer recorded in the task struct being
incorrect: either the task struct pointer points to the wrong task, or
the shadow stack pointer points to the wrong stack.
Commit 59b37fe52f49955 changed this code to ensure that the shadow
stack pointer is not preserved to/restored from the ordinary stack
(because that turns call_on_irq_stack() into a rather useful gadget),
and to use the task struct instead.
What we could do instead is record the caller's SCS pointer on the IRQ
shadow stack, avoiding the task struct altogether. This should be the
only occurrence of scs_save that may interrupt cpu_switch_to(), so it
should address both issues AFAICT.
The only minor issue is that loading the IRQ shadow stack pointer
before assigning SP is problematic, because when call_on_irq_stack()
is re-entered over the top of an exception that is taken at that
point, the clobbering of scs_sp will interfere with the return from
that exception, and so scs_sp should be left untouched until after SP
is assigned (which is what prevents call_on_irq_stack() from
re-entering). This creates a tiny window where we may end up servicing
a (nested) IRQ running on the IRQ stack but on the task shadow stack,
but I don't think that is a thing to obsess about tbh. (Sami?) If it
is, we'll need your DAIF changes to turn that into a proper critical
section, but I doubt whether that is necessary.
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -874,12 +874,6 @@ NOKPROBE(ret_from_fork)
* Calls func(regs) using this CPU's irq stack and shadow irq stack.
*/
SYM_FUNC_START(call_on_irq_stack)
-#ifdef CONFIG_SHADOW_CALL_STACK
- get_current_task x16
- scs_save x16
- ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x17
-#endif
-
/* Create a frame record to save our LR and SP (implicit in FP) */
stp x29, x30, [sp, #-16]!
mov x29, sp
@@ -888,7 +882,23 @@ SYM_FUNC_START(call_on_irq_stack)
/* Move to the new stack and call the function there */
add sp, x16, #IRQ_STACK_SIZE
+#ifdef CONFIG_SHADOW_CALL_STACK
+ mov x16, scs_sp
+ ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x17
+#ifdef CONFIG_DYNAMIC_SCS
+ cbz scs_sp, 0f
+#endif
+ str x16, [scs_sp], #8
+0:
+#endif
blr x1
+#ifdef CONFIG_SHADOW_CALL_STACK
+#ifdef CONFIG_DYNAMIC_SCS
+ cbz scs_sp, 1f
+#endif
+ ldr scs_sp, [scs_sp, #-8]
+1:
+#endif
/*
* Restore the SP from the FP, and restore the FP and LR from the
@@ -896,7 +906,6 @@ SYM_FUNC_START(call_on_irq_stack)
*/
mov sp, x29
ldp x29, x30, [sp], #16
- scs_load_current
ret
SYM_FUNC_END(call_on_irq_stack)
NOKPROBE(call_on_irq_stack)
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-21 21:42 ` Prundeanu, Cristian
@ 2025-07-22 15:20 ` Will Deacon
2025-07-23 1:20 ` Ard Biesheuvel
0 siblings, 1 reply; 11+ messages in thread
From: Will Deacon @ 2025-07-22 15:20 UTC (permalink / raw)
To: Prundeanu, Cristian
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Ard Biesheuvel,
linux-arm-kernel@lists.infradead.org
On Mon, Jul 21, 2025 at 09:42:23PM +0000, Prundeanu, Cristian wrote:
> On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
>
> > Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> > Do the same in `call_on_irq_stack()`, but restore and mask around
> > the branch.
> > Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> > of behaviour between all configurations.
> >
> > Introduce and use an assembly macro for saving and masking DAIF,
> > as the existing one saves but only masks IF.
> >
> > Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> > Reported-by: Cristian Prundeanu <cpru@amazon.com>
> > Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
>
> Confirming this fixes the spontaneous reboot previously observed when
> enabling both pseudo-NMI (irqchip.gicv3_pseudo_nmi=1) and shadow call
> stack (CONFIG_SHADOW_CALL_STACK=y). Tested both on kernel 6.16-rc7 and
> legacy kernel 6.8 where the issue was initially observed.
>
> Tested-by: Cristian Prundeanu <cpru@amazon.com>
Ah, I hadn't appreciated from the cover letter that this was fixing a
real issue seen in the field. It all sounded a bit theoretical (but more
likely with pNMI).
I'll pick it up as a fix so we can land it in v6.16.
Ard -- maybe we can rework this in future along the lines that you
suggest but, from what Mark was saying offline, there may be problems
beyond the SCS that need addressing too if we decide to leave IRQs
enabled.
Will
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-18 14:28 [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack() Ada Couprie Diaz
` (2 preceding siblings ...)
2025-07-21 21:42 ` Prundeanu, Cristian
@ 2025-07-22 15:59 ` Will Deacon
3 siblings, 0 replies; 11+ messages in thread
From: Will Deacon @ 2025-07-22 15:59 UTC (permalink / raw)
To: linux-arm-kernel, Ada Couprie Diaz
Cc: catalin.marinas, kernel-team, Will Deacon, Mark Rutland,
Mark Brown, Cristian Prundeanu, Ard Biesheuvel
On Fri, 18 Jul 2025 15:28:14 +0100, Ada Couprie Diaz wrote:
> `cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
> to different stacks along with the Shadow Call Stack if it is enabled.
> Those two stack changes cannot be done atomically and both functions
> can be interrupted by SErrors or Debug Exceptions which, though unlikely,
> is very much broken : if interrupted, we can end up with mismatched stacks
> and Shadow Call Stack leading to clobbered stacks.
>
> [...]
Applied to arm64 (for-next/fixes), thanks!
[1/1] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
https://git.kernel.org/arm64/c/d42e6c20de61
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-22 15:20 ` Will Deacon
@ 2025-07-23 1:20 ` Ard Biesheuvel
2025-07-24 17:35 ` Ada Couprie Diaz
0 siblings, 1 reply; 11+ messages in thread
From: Ard Biesheuvel @ 2025-07-23 1:20 UTC (permalink / raw)
To: Will Deacon, Sami Tolvanen
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Prundeanu, Cristian,
linux-arm-kernel@lists.infradead.org
(cc Sami again)
On Wed, 23 Jul 2025 at 01:20, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jul 21, 2025 at 09:42:23PM +0000, Prundeanu, Cristian wrote:
> > On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
> >
> > > Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> > > Do the same in `call_on_irq_stack()`, but restore and mask around
> > > the branch.
> > > Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> > > of behaviour between all configurations.
> > >
> > > Introduce and use an assembly macro for saving and masking DAIF,
> > > as the existing one saves but only masks IF.
> > >
> > > Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> > > Reported-by: Cristian Prundeanu <cpru@amazon.com>
> > > Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
> >
> > Confirming this fixes the spontaneous reboot previously observed when
> > enabling both pseudo-NMI (irqchip.gicv3_pseudo_nmi=1) and shadow call
> > stack (CONFIG_SHADOW_CALL_STACK=y). Tested both on kernel 6.16-rc7 and
> > legacy kernel 6.8 where the issue was initially observed.
> >
> > Tested-by: Cristian Prundeanu <cpru@amazon.com>
>
> Ah, I hadn't appreciated from the cover letter that this was fixing a
> real issue seen in the field. It all sounded a bit theoretical (but more
> likely with pNMI).
>
> I'll pick it up as a fix so we can land it in v6.16.
>
> Ard -- maybe we can rework this in future along the lines that you
> suggest but, from what Mark was saying offline, there may be problems
> beyond the SCS that need addressing too if we decide to leave IRQs
> enabled.
>
That is fine with me - the change looks alright, it's just that the
wall of text implicates the shadow call stack (and the commit in
question) in particular, even though (per Mark), the actual issues
being addressed here are more intricate and not limited to the shadow
call stack at all. E.g., the fact that call_on_irq_stack() may
re-enter up until the point where SP is assigned could have other
implications, which were present before that particular commit made
things even worse.
But if masking DAIF is needed in any case, it makes sense of course to
rely on it to fix those issues too - I just think it might be better
to get rid of scs_save in call_on_irq_stack() anyway.
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-23 1:20 ` Ard Biesheuvel
@ 2025-07-24 17:35 ` Ada Couprie Diaz
0 siblings, 0 replies; 11+ messages in thread
From: Ada Couprie Diaz @ 2025-07-24 17:35 UTC (permalink / raw)
To: Ard Biesheuvel, Will Deacon, Sami Tolvanen
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Prundeanu, Cristian,
linux-arm-kernel@lists.infradead.org
On 23/07/2025 02:20, Ard Biesheuvel wrote:
> (cc Sami again)
>
> On Wed, 23 Jul 2025 at 01:20, Will Deacon <will@kernel.org> wrote:
>> On Mon, Jul 21, 2025 at 09:42:23PM +0000, Prundeanu, Cristian wrote:
>>> On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
>>>
>>>> Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
>>>> Do the same in `call_on_irq_stack()`, but restore and mask around
>>>> the branch.
>>>> Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
>>>> of behaviour between all configurations.
>>>>
>>>> Introduce and use an assembly macro for saving and masking DAIF,
>>>> as the existing one saves but only masks IF.
>>>>
>>>> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
>>>> Reported-by: Cristian Prundeanu <cpru@amazon.com>
>>>> Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
>>> Confirming this fixes the spontaneous reboot previously observed when
>>> enabling both pseudo-NMI (irqchip.gicv3_pseudo_nmi=1) and shadow call
>>> stack (CONFIG_SHADOW_CALL_STACK=y). Tested both on kernel 6.16-rc7 and
>>> legacy kernel 6.8 where the issue was initially observed.
>>>
>>> Tested-by: Cristian Prundeanu <cpru@amazon.com>
>> Ah, I hadn't appreciated from the cover letter that this was fixing a
>> real issue seen in the field. It all sounded a bit theoretical (but more
>> likely with pNMI).
I can see how: I detailed the mechanism but didn't really mention either
a reproducer or the originally observed issues, on top of mentioning pNMI
only after all the "possible but unlikely" explanation. That's my bad!
In the future I'll lead with the observed issues rather than the
explanation of their cause.
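(For the record, since the cover letter didn't show it: the new macro is
essentially the existing `save_and_disable_irq` with the mask widened to
all four DAIF bits. A sketch, based on my reading of
arch/arm64/include/asm/assembler.h — names may differ from the final patch:)

```asm
	/* Existing helper: saves DAIF, but masks only IRQ and FIQ. */
	.macro	save_and_disable_irq, flags
	mrs	\flags, daif
	msr	daifset, #3		// set PSTATE.I and PSTATE.F
	.endm

	/* New helper: saves DAIF and masks Debug, SError, IRQ and FIQ. */
	.macro	save_and_disable_daif, flags
	mrs	\flags, daif
	msr	daifset, #0xf		// set PSTATE.{D,A,I,F}
	.endm

	/* Existing restore, reused on the return path. */
	.macro	restore_irq, flags
	msr	daif, \flags
	.endm
```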
>> I'll pick it up as a fix so we can land it in v6.16.
>>
>> Ard -- maybe we can rework this in future along the lines that you
>> suggest but, from what Mark was saying offline, there may be problems
>> beyond the SCS that need addressing too if we decide to leave IRQs
>> enabled.
>>
> That is fine with me - the change looks alright, it's just that the
> wall of text implicates the shadow call stack (and the commit in
> question) in particular, even though (per Mark), the actual issues
> being addressed here are more intricate and not limited to the shadow
> call stack at all. E.g., the fact that call_on_irq_stack() may
> re-enter up until the point where SP is assigned could have other
> implications, which were present before that particular commit made
> things even worse.
To be honest, that was indeed my understanding of the issue
and how I approached it. Given the offline discussions, there are other
things that having interrupts masked seems to help or fix. Knowing that
now, I agree that the commit message is not great.
> But if masking DAIF is needed in any case, it makes sense of course to
> rely on it to fix those issues too - I just think it might be better
> to get rid of scs_save in call_on_irq_stack() anyway.
I think that makes sense, and with interrupts masked it would probably
prevent the issue you mentioned in the other thread, if I understand
correctly.
Thanks all, sorry for the confusion of the commit message.
Ada
* Re: [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
2025-07-22 5:31 ` Ard Biesheuvel
@ 2025-07-24 17:50 ` Ada Couprie Diaz
0 siblings, 0 replies; 11+ messages in thread
From: Ada Couprie Diaz @ 2025-07-24 17:50 UTC (permalink / raw)
To: Ard Biesheuvel, Will Deacon, Sami Tolvanen
Cc: Mark Rutland, Catalin Marinas, Mark Brown, Cristian Prundeanu,
linux-arm-kernel
I think this has been clarified offline (?), but answering for the list.
On 22/07/2025 06:31, Ard Biesheuvel wrote:
> (cc Sami)
>
> On Tue, 22 Jul 2025 at 00:18, Will Deacon <will@kernel.org> wrote:
>> On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
>>> `cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
>>> to different stacks along with the Shadow Call Stack if it is enabled.
>>> Those two stack changes cannot be done atomically and both functions
>>> can be interrupted by SErrors or Debug Exceptions which, though unlikely,
>>> is very much broken: if interrupted, we can end up with mismatched stacks
>>> and Shadow Call Stack leading to clobbered stacks.
>>>
>>> In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
>>> but x18 still points to the old task's SCS. When the interrupt handler
>>> tries to save the task's SCS pointer, it will save the old task
>>> SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
>>> clobbering it.
>>>
> In this case, the 'interrupt handler' is not the entry code, but
> call_on_irq_stack(), right?
That is correct yes.
[...]
> So AIUI, the problem is that both cpu_switch_to() and
> call_on_irq_stack() itself can be interrupted by the latter, resulting
> in the shadow call stack pointer recorded in the task struct being
> incorrect, either because the task struct pointer is pointing to the
> wrong task, or the shadow stack pointer is pointing to the wrong
> stack.
Exactly: that is what I was trying to fix, which is why the commit message
is focused on this.
However, as per the other thread, there are potentially other issues here
that I was not well aware of at the time, for which masking DAIF is still
beneficial.
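To spell out the window for the list — a simplified sketch of the tail of
cpu_switch_to() (paraphrased from arch/arm64/kernel/entry.S; the exact
instruction sequence may differ):

```asm
	/*
	 * cpu_switch_to(prev, next): x0 = prev task, x1 = next task.
	 * (Callee-saved register save/restore omitted.)
	 */
	mov	sp, x9			// switch to next task's kernel stack
	msr	sp_el0, x1		// 'current' now points to the next task
	/*
	 * Window: an exception taken here runs call_on_irq_stack(), whose
	 * get_current_task + scs_save stores the OLD task's SCS pointer
	 * (still live in x18) into the NEW task's task_struct.
	 */
	scs_save x0			// save prev task's SCS pointer
	scs_load_current		// load next task's SCS pointer into x18
	ret
```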
>
> Commit 59b37fe52f49955 changed this code to ensure that the shadow
> stack pointer is not preserved to/restored from the ordinary stack
> (because that turns call_on_irq_stack() into a rather useful gadget),
> and uses the task struct instead.
>
> What we could do instead is record the caller's SCS pointer on the IRQ
> shadow stack, avoiding the task struct altogether. This should be the
> only occurrence of scs_save that may interrupt cpu_switch_to(), so it
> should address both issues AFAICT.
Will has pulled the DAIF fix as per the other discussion, but I did test
with your patch below: it does indeed fix the panics that were reported
with pNMI+SCS, as far as I can tell.
Thanks for the details!
Ada
> The only minor issue is that loading the IRQ shadow stack pointer
> before assigning SP is problematic, because when call_on_irq_stack()
> is re-entered over the top of an exception that is taken at that
> point, the clobbering of scs_sp will interfere with the return from
> that exception, and so scs_sp should be left untouched until after SP
> is assigned (which is what prevents call_on_irq_stack() from
> re-entering). This creates a tiny window where we may end up servicing
> a (nested) IRQ running on the IRQ stack but on the task shadow stack,
> but I don't think that is a thing to obsess about tbh. (Sami?) If it
> is, we'll need your DAIF changes to turn that into a proper critical
> section, but I doubt whether that is necessary.
>
>
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -874,12 +874,6 @@ NOKPROBE(ret_from_fork)
> * Calls func(regs) using this CPU's irq stack and shadow irq stack.
> */
> SYM_FUNC_START(call_on_irq_stack)
> -#ifdef CONFIG_SHADOW_CALL_STACK
> - get_current_task x16
> - scs_save x16
> - ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x17
> -#endif
> -
> /* Create a frame record to save our LR and SP (implicit in FP) */
> stp x29, x30, [sp, #-16]!
> mov x29, sp
> @@ -888,7 +882,23 @@ SYM_FUNC_START(call_on_irq_stack)
>
> /* Move to the new stack and call the function there */
> add sp, x16, #IRQ_STACK_SIZE
> +#ifdef CONFIG_SHADOW_CALL_STACK
> + mov x16, scs_sp
> + ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x17
> +#ifdef CONFIG_DYNAMIC_SCS
> + cbz scs_sp, 0f
> +#endif
> + str x16, [scs_sp], #8
> +0:
> +#endif
> blr x1
> +#ifdef CONFIG_SHADOW_CALL_STACK
> +#ifdef CONFIG_DYNAMIC_SCS
> + cbz scs_sp, 1f
> +#endif
> + ldr scs_sp, [scs_sp, #-8]
> +1:
> +#endif
>
> /*
> * Restore the SP from the FP, and restore the FP and LR from the
> @@ -896,7 +906,6 @@ SYM_FUNC_START(call_on_irq_stack)
> */
> mov sp, x29
> ldp x29, x30, [sp], #16
> - scs_load_current
> ret
> SYM_FUNC_END(call_on_irq_stack)
> NOKPROBE(call_on_irq_stack)
end of thread, other threads:[~2025-07-24 18:36 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-18 14:28 [PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack() Ada Couprie Diaz
2025-07-18 15:13 ` Dev Jain
2025-07-18 17:07 ` Ada Couprie Diaz
2025-07-21 14:18 ` Will Deacon
2025-07-22 5:31 ` Ard Biesheuvel
2025-07-24 17:50 ` Ada Couprie Diaz
2025-07-21 21:42 ` Prundeanu, Cristian
2025-07-22 15:20 ` Will Deacon
2025-07-23 1:20 ` Ard Biesheuvel
2025-07-24 17:35 ` Ada Couprie Diaz
2025-07-22 15:59 ` Will Deacon