From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Gleixner
To: Mark Rutland
Cc: linux-arm-kernel@lists.infradead.org, ada.coupriediaz@arm.com,
 catalin.marinas@arm.com, linux-kernel@vger.kernel.org, luto@kernel.org,
 peterz@infradead.org, ruanjinjie@huawei.com, vladimir.murzin@arm.com,
 will@kernel.org
Subject: Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
References: <20260320113026.3219620-1-mark.rutland@arm.com>
 <20260320113026.3219620-2-mark.rutland@arm.com>
 <87eclek0mb.ffs@tglx> <87341ujwl4.ffs@tglx>
Date: Sun, 22 Mar 2026 00:25:06 +0100
Message-ID: <87fr5six4d.ffs@tglx>
Content-Type: text/plain

On Fri, Mar 20 2026 at 17:31, Mark Rutland wrote:
> On Fri, Mar 20, 2026 at 05:26:47PM +0100, Thomas Gleixner wrote:
>> > I realise this is crazy. I would love to delete pseudo-NMI.
>> > Unfortunately people are using it.
>>
>> What is it used for?
>
> It's used where people would want an NMI; specifically today that's:
>
>  * The PMU interrupt (so people can profile code that has IRQs off).
>  * IPI_CPU_BACKTRACE (so we can get a backtrace when code has IRQs off).
>  * IPI_CPU_STOP_NMI (so panic can stop secondaries more reliably).
>  * IPI_KGDB_ROUNDUP (so KGDB can stop secondaries more reliably).
>
> I mostly care about the first two. People *really* want the PMU
> interrupt as an NMI.

Which actually makes sense.

> My other concern is that I'd like to backport a fix for the issue I
> mentioned in the commit message, and I'd like to have something that's
> simpler than "rewrite the entire entry code" for that -- for backporting
> it'd be easier to revert the move to generic irq entry.

I understand.

> Not really, but we might be talking past one another.
>
> I *think* you're saying that because the arch code would manage DAIF
> early during entry and late during exit, that would all be in one place.

That was my thought, see below.

> However, that doubles the number of times we have to write to DAIF: at
> entry we'd have to poke it once to unmask everything except IRQs, then
> again to unmask IRQs, and exit would need the inverse. We'd also have to
> split the inheritance logic into inherit-everything-but-interrupt and
> inherit-only-interrupt, which is doable but not ideal. With pseudo-NMI
> it's even worse, but that's largely because pseudo-NMI is
> over-complicated today.

Interrupts are never unmasked on interrupt/exception entry, and I don't
understand your concerns at all, but as always I might be missing
something.

The current sequence on entry is:

    // interrupts are disabled by interrupt/exception entry
    enter_from_kernel_mode()
      irqentry_enter(regs);
      mte_check_tfsr_entry();
      mte_disable_tco_entry();
      daif_inherit(regs);
    // interrupts are still disabled

which then becomes:

    // interrupts are disabled by interrupt/exception entry
    irqentry_enter(regs)
      establish_state();
      // RCU is watching
      arch_irqentry_enter_rcu()
        mte_check_tfsr_entry();
        mte_disable_tco_entry();
        daif_inherit(regs);
    // interrupts are still disabled

Which is equivalent versus the MTE/DAIF requirements, no?

The currently broken sequence vs. preemption on exit is:

    // interrupts are disabled
    exit_to_kernel_mode()
      daif_mask();
      mte_check_tfsr_exit();
      irqentry_exit(regs, state);

which then becomes:

    // interrupts are disabled
    irqentry_exit(regs, state)
      // includes preemption
      prepare_for_exit();
      // RCU is still watching
      arch_irqentry_exit_rcu()
        daif_mask();
        mte_check_tfsr_exit();
      if (state.exit_rcu)
        ct_irq_exit();

Which is equivalent versus the MTE/DAIF requirements and fixes the
preempt on exit issue too, no? That change would be trivial enough for
backporting, right?
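To make that concrete, here is a completely untested sketch of the arm64
side, assuming the hook names used in the sequences above (pick whatever
names fit; the generic entry code would provide empty default stubs for
architectures which don't need them):

    /* Invoked from irqentry_enter() once RCU is watching,
     * with interrupts still disabled */
    static inline void arch_irqentry_enter_rcu(struct pt_regs *regs)
    {
    	mte_check_tfsr_entry();
    	mte_disable_tco_entry();
    	daif_inherit(regs);
    }

    /* Invoked from irqentry_exit() after the preemption decision and
     * before ct_irq_exit(), with interrupts still disabled */
    static inline void arch_irqentry_exit_rcu(struct pt_regs *regs)
    {
    	daif_mask();
    	mte_check_tfsr_exit();
    }

That keeps the DAIF/MTE ordering in arch code while the generic code
owns the state establishment and the preemption decision.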
It also prevents you from staring at the bug reports which are going to
end up in your mailbox after I merge the patch which moves the misplaced
rcu_irq_exit_check_preempt() check _before_ the preempt_count() check
where it belongs.

I fully agree that ARM64 is special vs. CPU state handling, but it's not
special enough to justify its own semantically broken preemption logic.

Looking at those details also made me look at this magic
arch_irqentry_exit_need_resched() inline function:

    /*
     * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
     * priority masking is used the GIC irqchip driver will clear DAIF.IF
     * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
     * DAIF we must have handled an NMI, so skip preemption.
     */
    if (system_uses_irq_prio_masking() && read_sysreg(daif))
        return false;

Why is this using irqentry_enter/exit() in the first place? NMI delivery
has to go through irqentry_nmi_enter/exit() as I explained to you
before. This hack is fundamentally wrong:

 1) It fails to indicate NMI state in preempt_count and other facilities.

    There is code which cares about this state. It's truly amazing that
    it did not explode in your face yet.

 2) It uses a code path which is not NMI safe by definition.

    I did not go through all the gory details there, but the lack of
    explosions might be sheer luck because most of the related code is
    written in a way that it can be used in both contexts.

But the hrtimer changes pending for v7.1 are going to actually expose
this ARM64 NMI bogosity. Assume the following scenario:

    timer interrupt
      hrtimer_interrupt()
        raw_spin_lock(&cpu_base->lock);
        ...
        cpu_base->deferred_rearm = true;
        ---> NMI
               irqentry_enter();
               handle();
               irqentry_exit()
                 ...
                 hrtimer_rearm_deferred()
                   if (!cpu_base->deferred_rearm)
                       return;
                   raw_spin_lock(&cpu_base->lock);  ---> LIVELOCK

That's code which is not upstream yet, but it was written under the
perfectly valid assumption that architectures actually adhere to the
mandatory interrupt/NMI distinction, which ARM64 clearly does not.

You might argue that this assumption is wrong vs. other
irqentry_enter() bound exceptions which can be raised and handled even
if interrupts are disabled, i.e. page faults, divide by zero etc. It's
not wrong at all because any of these exceptions in the context of an
interrupt handler will cause a kernel panic.

Actually, thinking about that, the rearm code needs to be hardened
against this by having:

    if (WARN_ON_ONCE(in_hardirq()))
        return;

Thanks,

        tglx