From: Jinjie Ruan <ruanjinjie@huawei.com>
To: Mark Rutland <mark.rutland@arm.com>,
<linux-arm-kernel@lists.infradead.org>
Cc: <ada.coupriediaz@arm.com>, <catalin.marinas@arm.com>,
<linux-kernel@vger.kernel.org>, <luto@kernel.org>,
<peterz@infradead.org>, <tglx@kernel.org>,
<vladimir.murzin@arm.com>, <will@kernel.org>
Subject: Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
Date: Tue, 24 Mar 2026 11:14:28 +0800 [thread overview]
Message-ID: <c4e823cd-bd35-0e4b-d3fd-52688bfdf4ab@huawei.com> (raw)
In-Reply-To: <20260320113026.3219620-2-mark.rutland@arm.com>
On 2026/3/20 19:30, Mark Rutland wrote:
> On arm64, involuntary kernel preemption has been subtly broken since the
> move to the generic irq entry code. When preemption occurs, the new task
> may run with SError and Debug exceptions masked unexpectedly, leading to
> a loss of RAS events, breakpoints, watchpoints, and single-step
> exceptions.
We can also add a check in arch_irqentry_exit_need_resched() to prevent
scheduling out while the D or A bits are still set.
>
> We can fix this relatively simply by moving the preemption logic out of
> irqentry_exit(), which is desirable for a number of other reasons on
> arm64. Context and rationale below:
>
> 1) Architecturally, several groups of exceptions can be masked
> independently, including 'Debug', 'SError', 'IRQ', and 'FIQ', whose
> mask bits can be read/written via the 'DAIF' register.
>
> Other mask bits exist, including 'PM' and 'AllInt', which we will
> need to use in future (e.g. for architectural NMI support).
>
> The entry code needs to manipulate all of these, but the generic
> entry code only knows about interrupts (which means both IRQ and FIQ
> on arm64), and the other exception masks aren't generic.
>
> 2) Architecturally, all maskable exceptions MUST be masked during
> exception entry and exception return.
>
> Upon exception entry, hardware places exception context into
> exception registers (e.g. the PC is saved into ELR_ELx). Upon
> exception return, hardware restores exception context from those
> exception registers (e.g. the PC is restored from ELR_ELx).
>
> To ensure the exception registers aren't clobbered by recursive
> exceptions, all maskable exceptions must be masked early during entry
> and late during exit. Hardware masks all maskable exceptions
> automatically at exception entry. Software must unmask these as
> required, and must mask them prior to exception return.
>
> 3) Architecturally, hardware masks all maskable exceptions upon any
> exception entry. A synchronous exception (e.g. a fault on a memory
> access) can be taken from any context (e.g. where IRQ+FIQ might be
> masked), and the entry code must explicitly 'inherit' the unmasking
> from the original context by reading the exception registers (e.g.
> SPSR_ELx) and writing to DAIF, etc.
>
> 4) When 'pseudo-NMI' is used, Linux masks interrupts via a combination
> of DAIF and the 'PMR' priority mask register. At entry and exit,
> interrupts must be masked via DAIF, but most kernel code will
> mask/unmask regular interrupts using PMR (e.g. in local_irq_save()
> and local_irq_restore()).
>
> This requires more complicated transitions at entry and exit. Early
> during entry or late during return, interrupts are masked via DAIF,
> and kernel code which manipulates PMR to mask/unmask interrupts will
> not function correctly in this state.
>
> This also requires fairly complicated management of DAIF and PMR when
> handling interrupts, and arm64 has special logic to avoid preempting
> from pseudo-NMIs which currently lives in
> arch_irqentry_exit_need_resched().
>
> 5) Most kernel code runs with all exceptions unmasked. When scheduling,
> only interrupts should be masked (by PMR when pseudo-NMI is used, and by
> DAIF otherwise).
>
> For most exceptions, arm64's entry code has a sequence similar to that
> of el1_abort(), which is used for faults:
>
> | static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
> | {
> | unsigned long far = read_sysreg(far_el1);
> | irqentry_state_t state;
> |
> | state = enter_from_kernel_mode(regs);
> | local_daif_inherit(regs);
> | do_mem_abort(far, esr, regs);
> | local_daif_mask();
> | exit_to_kernel_mode(regs, state);
> | }
>
> ... where enter_from_kernel_mode() and exit_to_kernel_mode() are
> wrappers around irqentry_enter() and irqentry_exit() which perform
> additional arm64-specific entry/exit logic.
>
> Currently, the generic irq entry code will attempt to preempt from any
> exception under irqentry_exit() where interrupts were unmasked in the
> original context. As arm64's entry code will have already masked
> exceptions via DAIF, this results in the problems described above.
>
> Fix this by opting out of preemption in irqentry_exit(), and restoring
> arm64's old behaviour of explicitly preempting when returning from IRQ
> or FIQ, before calling exit_to_kernel_mode() / irqentry_exit(). This
> ensures that preemption occurs when only interrupts are masked, and
> where that masking is compatible with most kernel code (e.g. using PMR
> when pseudo-NMI is in use).
>
> Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
> Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/Kconfig | 3 +++
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/entry-common.c | 2 ++
> kernel/entry/common.c | 4 +++-
> 4 files changed, 9 insertions(+), 1 deletion(-)
>
> Thomas, Peter, I have a couple of things I'd like to check:
>
> (1) The generic irq entry code will preempt from any exception (e.g. a
> synchronous fault) where interrupts were unmasked in the original
> context. Is that intentional/necessary, or was that just the way the
> x86 code happened to be implemented?
>
> I assume that it'd be fine if arm64 only preempted from true
> interrupts, but if that was intentional/necessary I can go rework
> this.
>
> (2) The generic irq entry code only preempts when RCU was watching in
> the original context. IIUC that's just to avoid preempting from the
> idle thread. Is it functionally necessary to avoid that, or is that
> just an optimization?
>
> I'm asking because historically arm64 didn't check that, and I
> haven't bothered checking here. I don't know whether we have a
> latent functional bug.
>
> Mark.
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 102ddbd4298ef..c8c99cd955281 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -102,6 +102,9 @@ config HOTPLUG_PARALLEL
> bool
> select HOTPLUG_SPLIT_STARTUP
>
> +config ARCH_HAS_OWN_IRQ_PREEMPTION
> + bool
> +
> config GENERIC_IRQ_ENTRY
> bool
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 38dba5f7e4d2d..bf0ec8237de45 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -42,6 +42,7 @@ config ARM64
> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
> + select ARCH_HAS_OWN_IRQ_PREEMPTION
> select ARCH_HAS_PREEMPT_LAZY
> select ARCH_HAS_PTDUMP
> select ARCH_HAS_PTE_SPECIAL
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 3625797e9ee8f..1aedadf09eb4d 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -497,6 +497,8 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
> do_interrupt_handler(regs, handler);
> irq_exit_rcu();
>
> + irqentry_exit_cond_resched();
> +
> exit_to_kernel_mode(regs, state);
> }
> static void noinstr el1_interrupt(struct pt_regs *regs,
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 9ef63e4147913..af9cae1f225e3 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -235,8 +235,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
> }
>
> instrumentation_begin();
> - if (IS_ENABLED(CONFIG_PREEMPTION))
> + if (IS_ENABLED(CONFIG_PREEMPTION) &&
> + !IS_ENABLED(CONFIG_ARCH_HAS_OWN_IRQ_PREEMPTION)) {
> irqentry_exit_cond_resched();
> + }
>
> /* Covers both tracing and lockdep */
> trace_hardirqs_on();