Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Thomas Gleixner <tglx@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, ada.coupriediaz@arm.com,
	catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
	luto@kernel.org, peterz@infradead.org, ruanjinjie@huawei.com,
	vladimir.murzin@arm.com, will@kernel.org
Subject: Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
Date: Fri, 20 Mar 2026 17:26:47 +0100	[thread overview]
Message-ID: <87341ujwl4.ffs@tglx> (raw)
In-Reply-To: <ab1prenkP-tFgUzK@J2N7QTR9R3.cambridge.arm.com>

On Fri, Mar 20 2026 at 15:37, Mark Rutland wrote:
> On Fri, Mar 20, 2026 at 03:59:40PM +0100, Thomas Gleixner wrote:
>> Why are you routing NMI like exceptions through irqentry_enter() and
>> irqentry_exit() in the first place? That's just wrong.
>
> Sorry, the above was not clear, and some of this logic is gunk that has
> been carried over unnnecessarily from our old exception handling flow.
>
> The issue with pseudo-NMI is that it uses the same exception as regular
> interrupts, but we don't know whether we have a pseudo-NMI until we
> acknowledge the event at the irqchip level. When a pseudo-NMI is taken,
> there are two possibilities:
>
> (1) The pseudo-NMI is taken from a context where interrupts were
>     *disabled*. The entry code immediately knows it must be a
>     pseudo-NMI, and we call irqentry_nmi_{enter,exit}(), NOT
>     irqentry_{enter,exit}(), treating it as an NMI.
>
>
> (2) The pseudo-NMI was taken from a context where interrupts were
>     *enabled*. The entry code doesn't know whether it's a pseudo-NMI or
>     a regular interrupt, so it calls irqentry_{enter,exit}(), and then
>     within that we'll call nmi_{enter,exit}() to transiently enter NMI
>     context.
>
> I realise this is crazy. I would love to delete pseudo-NMI.
> Unfortunately people are using it.

What is it used for?

> Putting aside the nesting here, I think it's fine to preempt upon return
> from case (2), and we can delete the logic to avoid preempting.

Correct.

>> 
>> That means at the point where irqentry_entry() is invoked, the
>> architecture side should have made sure that everything is set up for
>> the kernel to operate until irqentry_exit() returns.
>
> Ok. I think you're saying I should try:
>
> * At entry, *before* irqentry_enter():
>   - unmask everything EXCEPT regular interrupts.
>   - fix up all the necessary state.
>
> * At exception exit, *after* irqentry_exit():
>   - mask everything.
>   - fix up all the necessary state.
>
> ... right?

Yes.

>> Looking at your example:
>> 
>> > | static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
>> > | {
>> > |         unsigned long far = read_sysreg(far_el1);
>> > |         irqentry_state_t state;
>> > |
>> > |         state = enter_from_kernel_mode(regs);
>> > |         local_daif_inherit(regs);
>> > |         do_mem_abort(far, esr, regs);
>> > |         local_daif_mask();
>> > |         exit_to_kernel_mode(regs, state);
>> 
>> and the paragraph right below that:
>> 
>> > Currently, the generic irq entry code will attempt to preempt from any
>> > exception under irqentry_exit() where interrupts were unmasked in the
>> > original context. As arm64's entry code will have already masked
>> > exceptions via DAIF, this results in the problems described above.
>> 
>> To me this looks like your ordering is wrong. Why are you doing the DAIF
>> inherit _after_ irqentry_enter() and the mask _before_ irqentry_exit()?
>
> As above, I can go look at reworking this. 
>
> For context, we do it this way today for several reasons, including:
>
> (1) Because some of the arch-specific bits (such as checking the TFSR
>     for MTE) in enter_from_kernel_mode() and exit_to_kernel_mode() need
>     to be done while RCU is watching, etc, but needs other exceptions
>     masked. I can look at reworking that.

Something like the below should do that for you. If you need more than
regs, then you can either stick something on your stack frame or we go
and extend irqentry_enter()/exit() with an additional argument which
allows you to hand some exception/interrupt specific cookie in. The core
code would just hand it through to arch_irqenter_enter/exit_rcu() along
with @regs. That cookie might be data or even a function pointer. The
core does not have to know about it.

> (2) To minimize the number of times we have to write to things like
>     DAIF, as that can be expensive.
>
> (3) To simplify the management of things like DAIF, so that we don't
>     have several points in time at which we need to inherit different
>     pieces of state.

The above should cover your #2/3 too, no?

> (4) Historical, as that's the flow we had in assembly, and prior to the
>     move to generic irq entry.

No comment :)

>> I might be missing something, but this smells more than fishy.
>> 
>> As no other architecture has that problem I'm pretty sure that the
>> problem is not in the way how the generic code was designed. Why?
>
> Hey, I'm not saying the generic entry code is wrong, just that there's a
> mismatch between it and what would be optimal for arm64.
>
>> Because your architecture is _not_ sooo special! :)
>
> I think it's pretty special, but not necessarily in the same sense. ;)

Hehehe.

Thanks,

        tglx
---        
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -149,6 +149,7 @@ noinstr irqentry_state_t irqentry_enter(
 		instrumentation_begin();
 		kmsan_unpoison_entry_regs(regs);
 		trace_hardirqs_off_finish();
+		arch_irqentry_enter_rcu(regs);
 		instrumentation_end();
 
 		ret.exit_rcu = true;
@@ -166,6 +167,7 @@ noinstr irqentry_state_t irqentry_enter(
 	kmsan_unpoison_entry_regs(regs);
 	rcu_irq_enter_check_tick();
 	trace_hardirqs_off_finish();
+	arch_irqentry_enter_rcu(regs);
 	instrumentation_end();
 
 	return ret;
@@ -225,6 +227,7 @@ noinstr void irqentry_exit(struct pt_reg
 		 */
 		if (state.exit_rcu) {
 			instrumentation_begin();
+			arch_irqentry_exit_rcu(regs);
 			hrtimer_rearm_deferred();
 			/* Tell the tracer that IRET will enable interrupts */
 			trace_hardirqs_on_prepare();
@@ -239,11 +242,13 @@ noinstr void irqentry_exit(struct pt_reg
 		if (IS_ENABLED(CONFIG_PREEMPTION))
 			irqentry_exit_cond_resched();
 
+		arch_irqentry_exit_rcu(regs);
 		hrtimer_rearm_deferred();
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
 	} else {
+		arch_irqentry_exit_rcu(regs);
 		/*
 		 * IRQ flags state is correct already. Just tell RCU if it
 		 * was not watching on entry.

next prev parent reply	other threads:[~2026-03-20 16:26 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-20 11:30 [PATCH 0/2] arm64/entry: Fix involuntary preemption exception masking Mark Rutland
2026-03-20 11:30 ` [PATCH 1/2] " Mark Rutland
2026-03-20 13:04   ` Peter Zijlstra
2026-03-20 14:11     ` Thomas Gleixner
2026-03-20 14:57       ` Mark Rutland
2026-03-20 15:34         ` Peter Zijlstra
2026-03-20 16:16           ` Mark Rutland
2026-03-20 15:50         ` Thomas Gleixner
2026-03-23 17:21           ` Mark Rutland
2026-03-20 14:59   ` Thomas Gleixner
2026-03-20 15:37     ` Mark Rutland
2026-03-20 16:26       ` Thomas Gleixner [this message]
2026-03-20 17:31         ` Mark Rutland
2026-03-21 23:25           ` Thomas Gleixner
2026-03-24 12:19             ` Thomas Gleixner
2026-03-25 11:03             ` Mark Rutland
2026-03-25 15:46               ` Thomas Gleixner
2026-03-26  8:56                 ` Jinjie Ruan
2026-03-26 18:11                 ` Mark Rutland
2026-03-26 18:32                   ` Thomas Gleixner
2026-03-27  1:27                   ` Jinjie Ruan
2026-03-26  8:52               ` Jinjie Ruan
2026-03-24  3:14   ` Jinjie Ruan
2026-03-24 10:51     ` Mark Rutland
2026-03-20 11:30 ` [PATCH 2/2] arm64/entry: Remove arch_irqentry_exit_need_resched() Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87341ujwl4.ffs@tglx \
    --to=tglx@kernel.org \
    --cc=ada.coupriediaz@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=peterz@infradead.org \
    --cc=ruanjinjie@huawei.com \
    --cc=vladimir.murzin@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox