From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Gleixner
To: Mark Rutland
Cc: linux-arm-kernel@lists.infradead.org, ada.coupriediaz@arm.com,
	catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
	luto@kernel.org, peterz@infradead.org, ruanjinjie@huawei.com,
	vladimir.murzin@arm.com, will@kernel.org
Subject: Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
In-Reply-To: 
References: <20260320113026.3219620-1-mark.rutland@arm.com>
	<20260320113026.3219620-2-mark.rutland@arm.com>
	<87eclek0mb.ffs@tglx>
Date: Fri, 20 Mar 2026 17:26:47 +0100
Message-ID: <87341ujwl4.ffs@tglx>
MIME-Version: 1.0
Content-Type: text/plain

On Fri, Mar 20 2026 at 15:37, Mark Rutland wrote:
> On Fri, Mar 20, 2026 at 03:59:40PM +0100, Thomas Gleixner wrote:
>> Why are you routing NMI like exceptions through irqentry_enter() and
>> irqentry_exit() in the first place? That's just wrong.
>
> Sorry, the above was not clear, and some of this logic is gunk that has
> been carried over unnecessarily from our old exception handling flow.
>
> The issue with pseudo-NMI is that it uses the same exception as regular
> interrupts, but we don't know whether we have a pseudo-NMI until we
> acknowledge the event at the irqchip level. When a pseudo-NMI is taken,
> there are two possibilities:
>
> (1) The pseudo-NMI is taken from a context where interrupts were
>     *disabled*. The entry code immediately knows it must be a
>     pseudo-NMI, and we call irqentry_nmi_{enter,exit}(), NOT
>     irqentry_{enter,exit}(), treating it as an NMI.
>
> (2) The pseudo-NMI was taken from a context where interrupts were
>     *enabled*.
>     The entry code doesn't know whether it's a pseudo-NMI or
>     a regular interrupt, so it calls irqentry_{enter,exit}(), and then
>     within that we'll call nmi_{enter,exit}() to transiently enter NMI
>     context.
>
> I realise this is crazy. I would love to delete pseudo-NMI.
> Unfortunately people are using it.

What is it used for?

> Putting aside the nesting here, I think it's fine to preempt upon return
> from case (2), and we can delete the logic to avoid preempting.

Correct.

>> That means at the point where irqentry_enter() is invoked, the
>> architecture side should have made sure that everything is set up for
>> the kernel to operate until irqentry_exit() returns.
>
> Ok. I think you're saying I should try:
>
> * At entry, *before* irqentry_enter():
>   - unmask everything EXCEPT regular interrupts.
>   - fix up all the necessary state.
>
> * At exception exit, *after* irqentry_exit():
>   - mask everything.
>   - fix up all the necessary state.
>
> ... right?

Yes.

>> Looking at your example:
>>
>> > | static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
>> > | {
>> > |	unsigned long far = read_sysreg(far_el1);
>> > |	irqentry_state_t state;
>> > |
>> > |	state = enter_from_kernel_mode(regs);
>> > |	local_daif_inherit(regs);
>> > |	do_mem_abort(far, esr, regs);
>> > |	local_daif_mask();
>> > |	exit_to_kernel_mode(regs, state);
>>
>> and the paragraph right below that:
>>
>> > Currently, the generic irq entry code will attempt to preempt from any
>> > exception under irqentry_exit() where interrupts were unmasked in the
>> > original context. As arm64's entry code will have already masked
>> > exceptions via DAIF, this results in the problems described above.
>>
>> To me this looks like your ordering is wrong. Why are you doing the DAIF
>> inherit _after_ irqentry_enter() and the mask _before_ irqentry_exit()?
>
> As above, I can go look at reworking this.
>
> For context, we do it this way today for several reasons, including:
>
> (1) Because some of the arch-specific bits (such as checking the TFSR
>     for MTE) in enter_from_kernel_mode() and exit_to_kernel_mode() need
>     to be done while RCU is watching, etc, but need other exceptions
>     masked.

I can look at reworking that. Something like the below should do that for
you. If you need more than regs, then you can either stick something on
your stack frame or we go and extend irqentry_enter()/exit() with an
additional argument which allows you to hand some exception/interrupt
specific cookie in. The core code would just hand it through to
arch_irqentry_enter/exit_rcu() along with @regs. That cookie might be
data or even a function pointer. The core does not have to know about it.

> (2) To minimize the number of times we have to write to things like
>     DAIF, as that can be expensive.
>
> (3) To simplify the management of things like DAIF, so that we don't
>     have several points in time at which we need to inherit different
>     pieces of state.

The above should cover your #2/3 too, no?

> (4) Historical, as that's the flow we had in assembly, and prior to the
>     move to generic irq entry.

No comment :)

>> I might be missing something, but this smells more than fishy.
>>
>> As no other architecture has that problem I'm pretty sure that the
>> problem is not in the way how the generic code was designed.

Why?

> Hey, I'm not saying the generic entry code is wrong, just that there's a
> mismatch between it and what would be optimal for arm64.
>
>> Because your architecture is _not_ sooo special! :)
>
> I think it's pretty special, but not necessarily in the same sense. ;)

Hehehe.
Thanks,

        tglx

---
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -149,6 +149,7 @@ noinstr irqentry_state_t irqentry_enter(
 		instrumentation_begin();
 		kmsan_unpoison_entry_regs(regs);
 		trace_hardirqs_off_finish();
+		arch_irqentry_enter_rcu(regs);
 		instrumentation_end();
 
 		ret.exit_rcu = true;
@@ -166,6 +167,7 @@ noinstr irqentry_state_t irqentry_enter(
 	kmsan_unpoison_entry_regs(regs);
 	rcu_irq_enter_check_tick();
 	trace_hardirqs_off_finish();
+	arch_irqentry_enter_rcu(regs);
 	instrumentation_end();
 
 	return ret;
@@ -225,6 +227,7 @@ noinstr void irqentry_exit(struct pt_reg
 		 */
 		if (state.exit_rcu) {
 			instrumentation_begin();
+			arch_irqentry_exit_rcu(regs);
 			hrtimer_rearm_deferred();
 			/* Tell the tracer that IRET will enable interrupts */
 			trace_hardirqs_on_prepare();
@@ -239,11 +242,13 @@ noinstr void irqentry_exit(struct pt_reg
 		if (IS_ENABLED(CONFIG_PREEMPTION))
 			irqentry_exit_cond_resched();
 
+		arch_irqentry_exit_rcu(regs);
 		hrtimer_rearm_deferred();
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
 	} else {
+		arch_irqentry_exit_rcu(regs);
 		/*
		 * IRQ flags state is correct already. Just tell RCU if it
		 * was not watching on entry.