From: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: mark.rutland@arm.com, vladimir.murzin@arm.com,
peterz@infradead.org, catalin.marinas@arm.com,
ruanjinjie@huawei.com, linux-kernel@vger.kernel.org,
tglx@kernel.org, luto@kernel.org, will@kernel.org
Subject: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
Date: Fri, 20 Mar 2026 11:30:25 +0000
Message-ID: <20260320113026.3219620-2-mark.rutland@arm.com>
In-Reply-To: <20260320113026.3219620-1-mark.rutland@arm.com>
On arm64, involuntary kernel preemption has been subtly broken since the
move to the generic irq entry code. When preemption occurs, the new task
may run with SError and Debug exceptions masked unexpectedly, leading to
a loss of RAS events, breakpoints, watchpoints, and single-step
exceptions.
We can fix this relatively simply by moving the preemption logic out of
irqentry_exit(), which is desirable for a number of other reasons on
arm64. Context and rationale below:
1) Architecturally, several groups of exceptions can be masked
independently, including 'Debug', 'SError', 'IRQ', and 'FIQ', whose
mask bits can be read/written via the 'DAIF' register.
Other mask bits exist, including 'PM' and 'AllInt', which we will
need to use in future (e.g. for architectural NMI support).
The entry code needs to manipulate all of these, but the generic
entry code only knows about interrupts (which means both IRQ and FIQ
on arm64), and the other exception masks aren't generic.
2) Architecturally, all maskable exceptions MUST be masked during
exception entry and exception return.
Upon exception entry, hardware places exception context into
exception registers (e.g. the PC is saved into ELR_ELx). Upon
exception return, hardware restores exception context from those
exception registers (e.g. the PC is restored from ELR_ELx).
To ensure the exception registers aren't clobbered by recursive
exceptions, all maskable exceptions must be masked early during entry
and late during exit. Hardware masks all maskable exceptions
automatically at exception entry. Software must unmask these as
required, and must mask them prior to exception return.
3) Architecturally, hardware masks all maskable exceptions upon any
exception entry. A synchronous exception (e.g. a fault on a memory
access) can be taken from any context (e.g. where IRQ+FIQ might be
masked), and the entry code must explicitly 'inherit' the unmasking
from the original context by reading the exception registers (e.g.
SPSR_ELx) and writing to DAIF, etc.
4) When 'pseudo-NMI' is used, Linux masks interrupts via a combination
of DAIF and the 'PMR' priority mask register. At entry and exit,
interrupts must be masked via DAIF, but most kernel code will
mask/unmask regular interrupts using PMR (e.g. in local_irq_save()
and local_irq_restore()).
This requires more complicated transitions at entry and exit. Early
during entry or late during return, interrupts are masked via DAIF,
and kernel code which manipulates PMR to mask/unmask interrupts will
not function correctly in this state.
This also requires fairly complicated management of DAIF and PMR when
handling interrupts, and arm64 has special logic to avoid preempting
from pseudo-NMIs which currently lives in
arch_irqentry_exit_need_resched().
5) Most kernel code runs with all exceptions unmasked. When scheduling,
only interrupts should be masked (by PMR when pseudo-NMI is used, and
by DAIF otherwise).
For most exceptions, arm64's entry code has a sequence similar to that
of el1_abort(), which is used for faults:
| static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
| {
| 	unsigned long far = read_sysreg(far_el1);
| 	irqentry_state_t state;
|
| 	state = enter_from_kernel_mode(regs);
| 	local_daif_inherit(regs);
| 	do_mem_abort(far, esr, regs);
| 	local_daif_mask();
| 	exit_to_kernel_mode(regs, state);
| }
... where enter_from_kernel_mode() and exit_to_kernel_mode() are
wrappers around irqentry_enter() and irqentry_exit() which perform
additional arm64-specific entry/exit logic.
Currently, the generic irq entry code will attempt to preempt from any
exception under irqentry_exit() where interrupts were unmasked in the
original context. As arm64's entry code will have already masked
exceptions via DAIF, this results in the problems described above.
Fix this by opting out of preemption in irqentry_exit(), and restoring
arm64's old behaviour of explicitly preempting when returning from IRQ
or FIQ, before calling exit_to_kernel_mode() / irqentry_exit(). This
ensures that preemption occurs when only interrupts are masked, and
where that masking is compatible with most kernel code (e.g. using PMR
when pseudo-NMI is in use).
Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/Kconfig                     | 3 +++
 arch/arm64/Kconfig               | 1 +
 arch/arm64/kernel/entry-common.c | 2 ++
 kernel/entry/common.c            | 4 +++-
 4 files changed, 9 insertions(+), 1 deletion(-)
Thomas, Peter, I have a couple of things I'd like to check:
(1) The generic irq entry code will preempt from any exception (e.g. a
synchronous fault) where interrupts were unmasked in the original
context. Is that intentional/necessary, or was that just the way the
x86 code happened to be implemented?
I assume that it'd be fine if arm64 only preempted from true
interrupts, but if that was intentional/necessary I can go rework
this.
(2) The generic irq entry code only preempts when RCU was watching in
the original context. IIUC that's just to avoid preempting from the
idle thread. Is it functionally necessary to avoid that, or is that
just an optimization?
I'm asking because historically arm64 didn't check that, and I
haven't bothered checking here. I don't know whether we have a
latent functional bug.
Mark.
diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298ef..c8c99cd955281 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -102,6 +102,9 @@ config HOTPLUG_PARALLEL
 	bool
 	select HOTPLUG_SPLIT_STARTUP
 
+config ARCH_HAS_OWN_IRQ_PREEMPTION
+	bool
+
 config GENERIC_IRQ_ENTRY
 	bool
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 38dba5f7e4d2d..bf0ec8237de45 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -42,6 +42,7 @@ config ARM64
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
+	select ARCH_HAS_OWN_IRQ_PREEMPTION
 	select ARCH_HAS_PREEMPT_LAZY
 	select ARCH_HAS_PTDUMP
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3625797e9ee8f..1aedadf09eb4d 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -497,6 +497,8 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
+	irqentry_exit_cond_resched();
+
 	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_interrupt(struct pt_regs *regs,
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9ef63e4147913..af9cae1f225e3 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -235,8 +235,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 	}
 
 	instrumentation_begin();
-	if (IS_ENABLED(CONFIG_PREEMPTION))
+	if (IS_ENABLED(CONFIG_PREEMPTION) &&
+	    !IS_ENABLED(CONFIG_ARCH_HAS_OWN_IRQ_PREEMPTION)) {
 		irqentry_exit_cond_resched();
+	}
 
 	/* Covers both tracing and lockdep */
 	trace_hardirqs_on();
--
2.30.2