From: Thomas Gleixner <tglx@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, ada.coupriediaz@arm.com,
catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
luto@kernel.org, peterz@infradead.org, ruanjinjie@huawei.com,
vladimir.murzin@arm.com, will@kernel.org
Subject: Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
Date: Sun, 22 Mar 2026 00:25:06 +0100
Message-ID: <87fr5six4d.ffs@tglx>
In-Reply-To: <ab2EZAXvL6bYcuKt@J2N7QTR9R3.cambridge.arm.com>
On Fri, Mar 20 2026 at 17:31, Mark Rutland wrote:
> On Fri, Mar 20, 2026 at 05:26:47PM +0100, Thomas Gleixner wrote:
>> > I realise this is crazy. I would love to delete pseudo-NMI.
>> > Unfortunately people are using it.
>>
>> What is it used for?
>
> It's used where people would want an NMI; specifically today that's:
>
> * The PMU interrupt (so people can profile code that has IRQs off).
> * IPI_CPU_BACKTRACE (so we can get a backtrace when code has IRQs off).
> * IPI_CPU_STOP_NMI (so panic can stop secondaries more reliably).
> * IPI_KGDB_ROUNDUP (so KGDB can stop secondaries more reliably).
>
> I mostly care about the first two. People *really* want the PMU interrupt as an
> NMI.
Which actually makes sense.
> My other concern is that I'd like to backport a fix for the issue I
> mentioned in the commit message, and I'd like to have something that's
> simpler than "rewrite the entire entry code" for that -- for backporting
> it'd be easier to revert the move to generic irq entry.
I understand.
> Not really, but we might be talking past one another.
>
> I *think* you're saying that because the arch code would manage DAIF
> early during entry and late during exit, that would all be in one place.
That was my thought, see below.
> However, that doubles the number of times we have to write to DAIF: at
> entry we'd have to poke it once to unmask everything except IRQs, then
> again to unmask IRQs, and exit would need the inverse. We'd also have to
> split the inheritance logic into inherit-everything-but-interrupt and
> inherit-only-interrupt, which is doable but not ideal. With pseudo-NMI
> it's even worse, but that's largely because pseudo-NMI is
> over-complicated today.
Interrupts are never unmasked on interrupt/exception entry, and I
don't understand your concerns at all, but as always I might be
missing something.
The current sequence on entry is:

  // interrupts are disabled by interrupt/exception entry
  enter_from_kernel_mode()
    irqentry_enter(regs);
    mte_check_tfsr_entry();
    mte_disable_tco_entry();
    daif_inherit(regs);
  // interrupts are still disabled
which then becomes:

  // interrupts are disabled by interrupt/exception entry
  irqentry_enter(regs)
    establish_state();
    // RCU is watching
    arch_irqentry_enter_rcu()
      mte_check_tfsr_entry();
      mte_disable_tco_entry();
      daif_inherit(regs);
  // interrupts are still disabled
Which is equivalent versus the MTE/DAIF requirements, no?
The current broken sequence vs. preemption on exit is:

  // interrupts are disabled
  exit_to_kernel_mode()
    daif_mask();
    mte_check_tfsr_exit();
    irqentry_exit(regs, state);
which then becomes:

  // interrupts are disabled
  irqentry_exit(regs, state)
    // includes preemption
    prepare_for_exit();
    // RCU is still watching
    arch_irqentry_exit_rcu()
      daif_mask();
      mte_check_tfsr_exit();
    if (state.exit_rcu)
      ct_irq_exit();
Which is equivalent versus the MTE/DAIF requirements and fixes the
preempt on exit issue too, no?
That change would be trivial enough for backporting, right?
It also spares you from staring at the bug reports which are going to
end up in your mailbox after I merge the patch which moves the
misplaced rcu_irq_exit_check_preempt() check _before_ the
preempt_count() check where it belongs.
I fully agree that ARM64 is special vs. CPU state handling, but it's
not special enough to justify its own semantically broken preemption
logic.
Looking at those details also made me look at this magic
arch_irqentry_exit_need_resched() inline function:
  /*
   * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
   * priority masking is used the GIC irqchip driver will clear DAIF.IF
   * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
   * DAIF we must have handled an NMI, so skip preemption.
   */
  if (system_uses_irq_prio_masking() && read_sysreg(daif))
      return false;
Why is this using irqentry_enter/exit() in the first place?
NMI delivery has to go through irqentry_nmi_enter/exit() as I explained
to you before. This hack is fundamentally wrong:
1) It fails to indicate NMI state in preempt_count and other
   facilities

   There is code which cares about this state. It's truly amazing
   that it did not explode in your face yet.
2) It uses a code path which is not NMI safe by definition

   I did not go through all the gory details there, but the lack of
   explosions might be sheer luck because most of the related code is
   written in a way that it can be used in both contexts.

   But the hrtimer changes pending for v7.1 are actually going to
   expose this ARM64 NMI bogosity. Assume the following scenario:
  timer interrupt
    hrtimer_interrupt()
      raw_spin_lock(&cpu_base->lock);
      ...
      cpu_base->deferred_rearm = true;

      ---> NMI
           irqentry_enter();
           handle();
           irqentry_exit()
             ...
             hrtimer_rearm_deferred()
               if (!cpu_base->deferred_rearm)
                   return;
               raw_spin_lock(&cpu_base->lock)

      ---> LIVELOCK
That's code which is not upstream yet, but it was written under the
perfectly valid assumption that architectures actually adhere to the
mandatory interrupt/NMI distinction, which ARM64 clearly does not.
You might argue that this assumption is wrong vs. other
irqentry_enter() bound exceptions which can be raised and handled even
when interrupts are disabled, e.g. page faults, divide by zero etc.

It's not wrong at all because any of these exceptions in the context
of an interrupt handler will cause a kernel panic.
Actually, thinking about that, the rearm code needs to be hardened
against this by having:

  if (WARN_ON_ONCE(in_hardirq()))
      return;
Thanks,
tglx