From: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: mark.rutland@arm.com, vladimir.murzin@arm.com,
	peterz@infradead.org, catalin.marinas@arm.com,
	ruanjinjie@huawei.com, linux-kernel@vger.kernel.org,
	tglx@kernel.org, luto@kernel.org, will@kernel.org
Subject: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
Date: Fri, 20 Mar 2026 11:30:25 +0000	[thread overview]
Message-ID: <20260320113026.3219620-2-mark.rutland@arm.com> (raw)
In-Reply-To: <20260320113026.3219620-1-mark.rutland@arm.com>

On arm64, involuntary kernel preemption has been subtly broken since the
move to the generic irq entry code. When preemption occurs, the new task
may run with SError and Debug exceptions masked unexpectedly, leading to
a loss of RAS events, breakpoints, watchpoints, and single-step
exceptions.

We can fix this relatively simply by moving the preemption logic out of
irqentry_exit(), which is desirable for a number of other reasons on
arm64. Context and rationale below:

1) Architecturally, several groups of exceptions can be masked
   independently, including 'Debug', 'SError', 'IRQ', and 'FIQ', whose
   mask bits can be read/written via the 'DAIF' register.

   Other mask bits exist, including 'PM' and 'AllInt', which we will
   need to use in future (e.g. for architectural NMI support).

   The entry code needs to manipulate all of these, but the generic
   entry code only knows about interrupts (which means both IRQ and FIQ
   on arm64), and the other exception masks aren't generic.

2) Architecturally, all maskable exceptions MUST be masked during
   exception entry and exception return.

   Upon exception entry, hardware places exception context into
   exception registers (e.g. the PC is saved into ELR_ELx). Upon
   exception return, hardware restores exception context from those
   exception registers (e.g. the PC is restored from ELR_ELx).

   To ensure the exception registers aren't clobbered by recursive
   exceptions, all maskable exceptions must be masked early during entry
   and late during exit. Hardware masks all maskable exceptions
   automatically at exception entry. Software must unmask these as
   required, and must mask them prior to exception return.
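   To make the hazard concrete, here is a toy C model (hypothetical
   types and names, not kernel code) of the single ELR register: if a
   second maskable exception is taken while the first one's return
   context is still live, the saved return address is silently
   clobbered.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one ELR-like register shared by all exceptions. */
struct cpu {
	uint64_t pc;
	uint64_t elr;	/* saved return address (models ELR_ELx) */
	int masked;	/* all maskable exceptions masked? */
};

/* Hardware behaviour at exception entry: save context, mask everything. */
static void take_exception(struct cpu *c, uint64_t vector)
{
	c->elr = c->pc;
	c->masked = 1;
	c->pc = vector;
}

/* A maskable exception is only taken when unmasked; taking it while the
 * previous ELR value is still needed destroys that value. */
static int maybe_take_maskable(struct cpu *c, uint64_t vector)
{
	if (c->masked)
		return 0;
	take_exception(c, vector);
	return 1;
}
```

   In this model, unmasking before the saved ELR value has been consumed
   (or re-saved by software) is exactly the bug the architecture rule
   prevents.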

3) Architecturally, hardware masks all maskable exceptions upon any
   exception entry. A synchronous exception (e.g. a fault on a memory
   access) can be taken from any context (e.g. where IRQ+FIQ might be
   masked), and the entry code must explicitly 'inherit' the unmasking
   from the original context by reading the exception registers (e.g.
   SPSR_ELx) and writing to DAIF, etc.
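   The 'inherit' step amounts to copying the saved mask bits back out of
   the saved SPSR value. A simplified sketch of that computation follows;
   the bit values match the kernel's PSR_{D,A,I,F}_BIT definitions, but
   daif_to_inherit() itself is an illustrative helper, not the actual
   local_daif_inherit().

```c
#include <assert.h>
#include <stdint.h>

/* PSTATE/SPSR mask bits, as per arch/arm64/include/uapi/asm/ptrace.h */
#define PSR_F_BIT	0x00000040u	/* FIQ masked */
#define PSR_I_BIT	0x00000080u	/* IRQ masked */
#define PSR_A_BIT	0x00000100u	/* SError masked */
#define PSR_D_BIT	0x00000200u	/* Debug masked */
#define PSR_DAIF_BITS	(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

/* The DAIF value to restore is exactly the mask state the interrupted
 * context was running with, taken from its saved SPSR. */
static uint32_t daif_to_inherit(uint32_t spsr)
{
	return spsr & PSR_DAIF_BITS;
}
```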

4) When 'pseudo-NMI' is used, Linux masks interrupts via a combination
   of DAIF and the 'PMR' priority mask register. At entry and exit,
   interrupts must be masked via DAIF, but most kernel code will
   mask/unmask regular interrupts using PMR (e.g. in local_irq_save()
   and local_irq_restore()).

   This requires more complicated transitions at entry and exit. Early
   during entry or late during return, interrupts are masked via DAIF,
   and kernel code which manipulates PMR to mask/unmask interrupts will
   not function correctly in this state.

   This also requires fairly complicated management of DAIF and PMR when
   handling interrupts, and arm64 has special logic to avoid preempting
   from pseudo-NMIs which currently lives in
   arch_irqentry_exit_need_resched().
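   The PMR scheme can be modelled generically. In the GIC, a lower
   priority value means a higher priority, and an interrupt is only
   signalled when its priority value is below PMR; the priority and PMR
   values below are illustrative, not the kernel's actual GIC_PRIO_*
   constants.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative priorities: pseudo-NMIs get a higher priority (lower
 * value) than normal IRQs. */
#define PRIO_NMI	0x20
#define PRIO_IRQ	0x60

/* Illustrative PMR settings: IRQON admits everything, IRQOFF masks
 * normal IRQs while still admitting pseudo-NMIs. */
#define PMR_IRQON	0x80
#define PMR_IRQOFF	0x40

/* GIC rule: signal the interrupt only if it has higher priority
 * (a lower value) than the current priority mask. */
static int gic_signals(uint8_t prio, uint8_t pmr)
{
	return prio < pmr;
}
```

   With PMR at the 'IRQOFF' level, normal IRQs are held back while a
   pseudo-NMI still gets through, which is the whole point of the scheme.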

5) Most kernel code runs with all exceptions unmasked. When scheduling,
   only interrupts should be masked (by PMR when pseudo-NMI is in use,
   and by DAIF otherwise).

For most exceptions, arm64's entry code has a sequence similar to that
of el1_abort(), which is used for faults:

| static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
| {
|         unsigned long far = read_sysreg(far_el1);
|         irqentry_state_t state;
|
|         state = enter_from_kernel_mode(regs);
|         local_daif_inherit(regs);
|         do_mem_abort(far, esr, regs);
|         local_daif_mask();
|         exit_to_kernel_mode(regs, state);
| }

... where enter_from_kernel_mode() and exit_to_kernel_mode() are
wrappers around irqentry_enter() and irqentry_exit() which perform
additional arm64-specific entry/exit logic.

Currently, the generic irq entry code will attempt to preempt from any
exception under irqentry_exit() where interrupts were unmasked in the
original context. As arm64's entry code will have already masked
exceptions via DAIF, this results in the problems described above.

Fix this by opting out of preemption in irqentry_exit(), and restoring
arm64's old behaviour of explicitly preempting when returning from IRQ
or FIQ, before calling exit_to_kernel_mode() / irqentry_exit(). This
ensures that preemption occurs when only interrupts are masked, and
where that masking is compatible with most kernel code (e.g. using PMR
when pseudo-NMI is in use).
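
The net effect can be captured as a predicate over the DAIF state at the
point where preemption is attempted (a hypothetical helper for
illustration, not kernel code): the scheduler is only entered when
nothing beyond IRQ+FIQ is masked, so SError and Debug stay live across
the context switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSR_F_BIT	0x00000040u
#define PSR_I_BIT	0x00000080u
#define PSR_A_BIT	0x00000100u
#define PSR_D_BIT	0x00000200u

/* Hypothetical check: a masking state is compatible with calling into
 * the scheduler iff SError (A) and Debug (D) are unmasked. */
static bool preempt_masking_ok(uint32_t daif)
{
	return (daif & (PSR_D_BIT | PSR_A_BIT)) == 0;
}
```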

Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/Kconfig                     | 3 +++
 arch/arm64/Kconfig               | 1 +
 arch/arm64/kernel/entry-common.c | 2 ++
 kernel/entry/common.c            | 4 +++-
 4 files changed, 9 insertions(+), 1 deletion(-)

Thomas, Peter, I have a couple of things I'd like to check:

(1) The generic irq entry code will preempt from any exception (e.g. a
    synchronous fault) where interrupts were unmasked in the original
    context. Is that intentional/necessary, or was that just the way the
    x86 code happened to be implemented?

    I assume that it'd be fine if arm64 only preempted from true
    interrupts, but if that was intentional/necessary I can go rework
    this.

(2) The generic irq entry code only preempts when RCU was watching in
    the original context. IIUC that's just to avoid preempting from the
    idle thread. Is it functionally necessary to avoid that, or is that
    just an optimization?

    I'm asking because historically arm64 didn't check that, and I
    haven't bothered checking here. I don't know whether we have a
    latent functional bug.

Mark.

diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298ef..c8c99cd955281 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -102,6 +102,9 @@ config HOTPLUG_PARALLEL
 	bool
 	select HOTPLUG_SPLIT_STARTUP
 
+config ARCH_HAS_OWN_IRQ_PREEMPTION
+	bool
+
 config GENERIC_IRQ_ENTRY
 	bool
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 38dba5f7e4d2d..bf0ec8237de45 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -42,6 +42,7 @@ config ARM64
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
+	select ARCH_HAS_OWN_IRQ_PREEMPTION
 	select ARCH_HAS_PREEMPT_LAZY
 	select ARCH_HAS_PTDUMP
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3625797e9ee8f..1aedadf09eb4d 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -497,6 +497,8 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
+	irqentry_exit_cond_resched();
+
 	exit_to_kernel_mode(regs, state);
 }
 static void noinstr el1_interrupt(struct pt_regs *regs,
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9ef63e4147913..af9cae1f225e3 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -235,8 +235,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 		}
 
 		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
+		if (IS_ENABLED(CONFIG_PREEMPTION) &&
+		    !IS_ENABLED(CONFIG_ARCH_HAS_OWN_IRQ_PREEMPTION)) {
 			irqentry_exit_cond_resched();
+		}
 
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
-- 
2.30.2


