public inbox for linux-kernel@vger.kernel.org
* [PATCH 00/10] arm64/entry:
@ 2026-04-07 13:16 Mark Rutland
  2026-04-07 13:16 ` [PATCH 01/10] entry: Fix stale comment for irqentry_enter() Mark Rutland
                   ` (10 more replies)
  0 siblings, 11 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Catalin Marinas,
	Peter Zijlstra, Thomas Gleixner, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, mark.rutland, ruanjinjie,
	vladimir.murzin

Since the move to generic IRQ entry, arm64's involuntary kernel
preemption logic has been subtly broken, and preemption can lead to
tasks running with some exceptions masked unexpectedly.

The gory details were discussed in the thread for my earlier attempt to
fix this:

  https://lore.kernel.org/linux-arm-kernel/20260320113026.3219620-1-mark.rutland@arm.com/
  https://lore.kernel.org/linux-arm-kernel/ab1prenkP-tFgUzK@J2N7QTR9R3.cambridge.arm.com/
  https://lore.kernel.org/linux-arm-kernel/ab2EZAXvL6bYcuKt@J2N7QTR9R3.cambridge.arm.com/
  https://lore.kernel.org/linux-arm-kernel/acPAzdtjK5w-rNqC@J2N7QTR9R3/

In summary, due to the way arm64's exceptions work architecturally, and
due to some constraints on sequencing during entry/exit, fixing this
properly requires that arm64 handles more of the sequencing and
(architectural) state management itself.

This series attempts to make that possible by refactoring the generic
irqentry kernel mode entry/exit paths to look more like the user mode
entry/exit paths, with a separate 'prepare' step prior to return. The
refactoring also allows more of the generic irqentry code to be inlined
into architectural entry code, which can result in slightly better code
generation.

I've split the series into a prefix of changes for generic irqentry,
followed by changes to the arm64 code. I'm hoping that we can queue the
generic irqentry patches onto a stable branch, or take those via arm64.
The patches are as follows:

* Patches 1 and 2 are cleanups to the generic irqentry code. They have no
  functional impact, and I think they can be taken regardless of the
  rest of the series.

* Patches 3 to 5 refactor the generic irqentry code as described above,
  providing separate irqentry_{enter,exit}() functions and providing a
  split form of irqentry_exit_to_kernel_mode() similar to what exists
  for irqentry_exit_to_user_mode(). These patches alone should have no
  functional impact.

* Patch 6 is a minimal fix for the arm64 exception masking issues. This
  DOES NOT depend on the generic irqentry patches, and can be backported
  to stable.

* Patches 7 to 9 refactor the arm64 entry code and provide a more
  optimal fix (which permits preemption in more cases). These are split
  into separate patches to aid bisection.

* Patch 10 is a test which can detect exceptions being masked
  unexpectedly. I don't know whether we want to take this as-is, but
  I've included it here to aid testing and so that it gets archived for
  future reference.

The series is based on v7.0-rc3.

Thanks,
Mark.

Mark Rutland (10):
  entry: Fix stale comment for irqentry_enter()
  entry: Remove local_irq_{enable,disable}_exit_to_user()
  entry: Move irqentry_enter() prototype later
  entry: Split kernel mode logic from irqentry_{enter,exit}()
  entry: Split preemption from irqentry_exit_to_kernel_mode()
  arm64: entry: Don't preempt with SError or Debug masked
  arm64: entry: Consistently prefix arm64-specific wrappers
  arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  arm64: entry: Use split preemption logic
  arm64: Check DAIF (and PMR) at task-switch time

 arch/arm64/kernel/entry-common.c |  52 ++++----
 arch/arm64/kernel/process.c      |  25 ++++
 include/linux/entry-common.h     |   2 +-
 include/linux/irq-entry-common.h | 196 ++++++++++++++++++++++---------
 kernel/entry/common.c            | 107 ++---------------
 5 files changed, 202 insertions(+), 180 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/10] entry: Fix stale comment for irqentry_enter()
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:14   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  2026-04-07 13:16 ` [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user() Mark Rutland
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, mark.rutland,
	ruanjinjie, vladimir.murzin, will

The kerneldoc comment for irqentry_enter() refers to idtentry_exit(),
which is an accidental holdover from the x86 entry code that the generic
irqentry code was based on.

Correct this to refer to irqentry_exit().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index d26d1b1bcbfb9..3cf4d21168ba1 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -394,7 +394,7 @@ typedef struct irqentry_state {
  * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
  * would not be possible.
  *
- * Returns: An opaque object that must be passed to idtentry_exit()
+ * Returns: An opaque object that must be passed to irqentry_exit()
  */
 irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
 
-- 
2.30.2



* [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user()
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
  2026-04-07 13:16 ` [PATCH 01/10] entry: Fix stale comment for irqentry_enter() Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:18   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  2026-04-07 13:16 ` [PATCH 03/10] entry: Move irqentry_enter() prototype later Mark Rutland
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, mark.rutland,
	ruanjinjie, vladimir.murzin, will

The local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user()
functions are never overridden by architecture code, and are always
equivalent to local_irq_enable() and local_irq_disable().

These functions were added on the assumption that arm64 would override
them to manage 'DAIF' exception masking, as described by Thomas Gleixner
in these threads:

  https://lore.kernel.org/all/20190919150809.340471236@linutronix.de/
  https://lore.kernel.org/all/alpine.DEB.2.21.1910240119090.1852@nanos.tec.linutronix.de/

In practice arm64 did not need to override either. Prior to moving to
the generic irqentry code, arm64's management of DAIF was reworked in
commit:

  97d935faacde ("arm64: Unmask Debug + SError in do_notify_resume()")

Since that commit, arm64 only masks interrupts during the 'prepare' step
when returning to user mode, and masks other DAIF exceptions later.
Within arm64_exit_to_user_mode(), the arm64 entry code is as follows:

	local_irq_disable();
	exit_to_user_mode_prepare_legacy(regs);
	local_daif_mask();
	mte_check_tfsr_exit();
	exit_to_user_mode();

Remove the unnecessary local_irq_enable_exit_to_user() and
local_irq_disable_exit_to_user() functions.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/entry-common.h     |  2 +-
 include/linux/irq-entry-common.h | 31 -------------------------------
 kernel/entry/common.c            |  4 ++--
 3 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f83ca0abf2cdb..dbaa153100f44 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -321,7 +321,7 @@ static __always_inline void syscall_exit_to_user_mode(struct pt_regs *regs)
 {
 	instrumentation_begin();
 	syscall_exit_to_user_mode_work(regs);
-	local_irq_disable_exit_to_user();
+	local_irq_disable();
 	syscall_exit_to_user_mode_prepare(regs);
 	instrumentation_end();
 	exit_to_user_mode();
diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 3cf4d21168ba1..93b4b551f7ae4 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -100,37 +100,6 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
 	instrumentation_end();
 }
 
-/**
- * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
- * @ti_work:	Cached TIF flags gathered with interrupts disabled
- *
- * Defaults to local_irq_enable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
-
-#ifndef local_irq_enable_exit_to_user
-static __always_inline void local_irq_enable_exit_to_user(unsigned long ti_work)
-{
-	local_irq_enable();
-}
-#endif
-
-/**
- * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
- *
- * Defaults to local_irq_disable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_disable_exit_to_user(void);
-
-#ifndef local_irq_disable_exit_to_user
-static __always_inline void local_irq_disable_exit_to_user(void)
-{
-	local_irq_disable();
-}
-#endif
-
 /**
  * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
  *				 to user mode.
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9ef63e4147913..b5e05d87ba391 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -47,7 +47,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
 	 */
 	while (ti_work & EXIT_TO_USER_MODE_WORK_LOOP) {
 
-		local_irq_enable_exit_to_user(ti_work);
+		local_irq_enable();
 
 		if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
 			if (!rseq_grant_slice_extension(ti_work & TIF_SLICE_EXT_DENY))
@@ -74,7 +74,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
 		 * might have changed while interrupts and preemption was
 		 * enabled above.
 		 */
-		local_irq_disable_exit_to_user();
+		local_irq_disable();
 
 		/* Check if any of the above work has queued a deferred wakeup */
 		tick_nohz_user_enter_prepare();
-- 
2.30.2



* [PATCH 03/10] entry: Move irqentry_enter() prototype later
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
  2026-04-07 13:16 ` [PATCH 01/10] entry: Fix stale comment for irqentry_enter() Mark Rutland
  2026-04-07 13:16 ` [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user() Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:21   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  2026-04-07 13:16 ` [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}() Mark Rutland
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, mark.rutland,
	ruanjinjie, vladimir.murzin, will

Subsequent patches will rework the irqentry_*() functions. The end
result (and the intermediate diffs) will be much clearer if the
prototype for the irqentry_enter() function is moved later, immediately
before the prototype of the irqentry_exit() function.

Move the prototype later.

This is purely a move; there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 44 ++++++++++++++++----------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 93b4b551f7ae4..d1e8591a59195 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -334,6 +334,28 @@ typedef struct irqentry_state {
 } irqentry_state_t;
 #endif
 
+/**
+ * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
+ *
+ * Conditional reschedule with additional sanity checks.
+ */
+void raw_irqentry_exit_cond_resched(void);
+
+#ifdef CONFIG_PREEMPT_DYNAMIC
+#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
+#define irqentry_exit_cond_resched_dynamic_disabled	NULL
+DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
+#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
+#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
+DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+void dynamic_irqentry_exit_cond_resched(void);
+#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
+#endif
+#else /* CONFIG_PREEMPT_DYNAMIC */
+#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
+#endif /* CONFIG_PREEMPT_DYNAMIC */
+
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
@@ -367,28 +389,6 @@ typedef struct irqentry_state {
  */
 irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
 
-/**
- * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
- *
- * Conditional reschedule with additional sanity checks.
- */
-void raw_irqentry_exit_cond_resched(void);
-
-#ifdef CONFIG_PREEMPT_DYNAMIC
-#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
-#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
-#define irqentry_exit_cond_resched_dynamic_disabled	NULL
-DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
-#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
-#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
-DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-void dynamic_irqentry_exit_cond_resched(void);
-#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
-#endif
-#else /* CONFIG_PREEMPT_DYNAMIC */
-#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
-#endif /* CONFIG_PREEMPT_DYNAMIC */
-
 /**
  * irqentry_exit - Handle return from exception that used irqentry_enter()
  * @regs:	Pointer to pt_regs (exception entry regs)
-- 
2.30.2



* [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}()
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (2 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 03/10] entry: Move irqentry_enter() prototype later Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:32   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  2026-04-07 13:16 ` [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode() Mark Rutland
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, mark.rutland,
	ruanjinjie, vladimir.murzin, will

The generic irqentry code has entry/exit functions specifically for
exceptions taken from user mode, but doesn't have entry/exit functions
specifically for exceptions taken from kernel mode.

It would be helpful to have separate entry/exit functions specifically
for exceptions taken from kernel mode. This would make the structure of
the entry code more consistent, and would make it easier for
architectures to manage logic specific to exceptions taken from kernel
mode.

Move the logic specific to kernel mode out of irqentry_enter() and
irqentry_exit() into new irqentry_enter_from_kernel_mode() and
irqentry_exit_to_kernel_mode() functions. These are marked
__always_inline and placed in irq-entry-common.h, as with
irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so
that they can be inlined into architecture-specific wrappers. The
existing out-of-line irqentry_enter() and irqentry_exit() functions are
retained as callers of the new functions.

The lockdep assertion from irqentry_exit() is moved into
irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This
was previously missing from irqentry_exit_to_user_mode() when called
directly, and any new lockdep assertion failure resulting from this
change indicates a latent bug.

Aside from the lockdep change noted above, there should be no functional
change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 103 +++++++++++++++++++++++++++++++
 kernel/entry/common.c            | 103 +++----------------------------
 2 files changed, 111 insertions(+), 95 deletions(-)

Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
comments for these new functions because the existing comments don't
seem all that consistent (e.g. for user mode vs kernel mode), and I
suspect we want to rewrite them all in one go for wider consistency.

I'm happy to respin this, or to follow-up with that as per your
preference.

Mark.

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index d1e8591a59195..2206150e526d8 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -304,6 +304,8 @@ static __always_inline void irqentry_enter_from_user_mode(struct pt_regs *regs)
  */
 static __always_inline void irqentry_exit_to_user_mode(struct pt_regs *regs)
 {
+	lockdep_assert_irqs_disabled();
+
 	instrumentation_begin();
 	irqentry_exit_to_user_mode_prepare(regs);
 	instrumentation_end();
@@ -356,6 +358,107 @@ void dynamic_irqentry_exit_cond_resched(void);
 #define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 
+static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct pt_regs *regs)
+{
+	irqentry_state_t ret = {
+		.exit_rcu = false,
+	};
+
+	/*
+	 * If this entry hit the idle task invoke ct_irq_enter() whether
+	 * RCU is watching or not.
+	 *
+	 * Interrupts can nest when the first interrupt invokes softirq
+	 * processing on return which enables interrupts.
+	 *
+	 * Scheduler ticks in the idle task can mark quiescent state and
+	 * terminate a grace period, if and only if the timer interrupt is
+	 * not nested into another interrupt.
+	 *
+	 * Checking for rcu_is_watching() here would prevent the nesting
+	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
+	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
+	 * assume that it is the first interrupt and eventually claim
+	 * quiescent state and end grace periods prematurely.
+	 *
+	 * Unconditionally invoke ct_irq_enter() so RCU state stays
+	 * consistent.
+	 *
+	 * TINY_RCU does not support EQS, so let the compiler eliminate
+	 * this part when enabled.
+	 */
+	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
+	    (is_idle_task(current) || arch_in_rcu_eqs())) {
+		/*
+		 * If RCU is not watching then the same careful
+		 * sequence vs. lockdep and tracing is required
+		 * as in irqentry_enter_from_user_mode().
+		 */
+		lockdep_hardirqs_off(CALLER_ADDR0);
+		ct_irq_enter();
+		instrumentation_begin();
+		kmsan_unpoison_entry_regs(regs);
+		trace_hardirqs_off_finish();
+		instrumentation_end();
+
+		ret.exit_rcu = true;
+		return ret;
+	}
+
+	/*
+	 * If RCU is watching then RCU only wants to check whether it needs
+	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
+	 * already contains a warning when RCU is not watching, so no point
+	 * in having another one here.
+	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
+	rcu_irq_enter_check_tick();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+
+	return ret;
+}
+
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (!regs_irqs_disabled(regs)) {
+		/*
+		 * If RCU was not watching on entry this needs to be done
+		 * carefully and needs the same ordering of lockdep/tracing
+		 * and RCU as the return to user mode path.
+		 */
+		if (state.exit_rcu) {
+			instrumentation_begin();
+			/* Tell the tracer that IRET will enable interrupts */
+			trace_hardirqs_on_prepare();
+			lockdep_hardirqs_on_prepare();
+			instrumentation_end();
+			ct_irq_exit();
+			lockdep_hardirqs_on(CALLER_ADDR0);
+			return;
+		}
+
+		instrumentation_begin();
+		if (IS_ENABLED(CONFIG_PREEMPTION))
+			irqentry_exit_cond_resched();
+
+		/* Covers both tracing and lockdep */
+		trace_hardirqs_on();
+		instrumentation_end();
+	} else {
+		/*
+		 * IRQ flags state is correct already. Just tell RCU if it
+		 * was not watching on entry.
+		 */
+		if (state.exit_rcu)
+			ct_irq_exit();
+	}
+}
+
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index b5e05d87ba391..1034be02eae84 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -105,70 +105,16 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 
 noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 {
-	irqentry_state_t ret = {
-		.exit_rcu = false,
-	};
-
 	if (user_mode(regs)) {
-		irqentry_enter_from_user_mode(regs);
-		return ret;
-	}
+		irqentry_state_t ret = {
+			.exit_rcu = false,
+		};
 
-	/*
-	 * If this entry hit the idle task invoke ct_irq_enter() whether
-	 * RCU is watching or not.
-	 *
-	 * Interrupts can nest when the first interrupt invokes softirq
-	 * processing on return which enables interrupts.
-	 *
-	 * Scheduler ticks in the idle task can mark quiescent state and
-	 * terminate a grace period, if and only if the timer interrupt is
-	 * not nested into another interrupt.
-	 *
-	 * Checking for rcu_is_watching() here would prevent the nesting
-	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
-	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
-	 * assume that it is the first interrupt and eventually claim
-	 * quiescent state and end grace periods prematurely.
-	 *
-	 * Unconditionally invoke ct_irq_enter() so RCU state stays
-	 * consistent.
-	 *
-	 * TINY_RCU does not support EQS, so let the compiler eliminate
-	 * this part when enabled.
-	 */
-	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
-	    (is_idle_task(current) || arch_in_rcu_eqs())) {
-		/*
-		 * If RCU is not watching then the same careful
-		 * sequence vs. lockdep and tracing is required
-		 * as in irqentry_enter_from_user_mode().
-		 */
-		lockdep_hardirqs_off(CALLER_ADDR0);
-		ct_irq_enter();
-		instrumentation_begin();
-		kmsan_unpoison_entry_regs(regs);
-		trace_hardirqs_off_finish();
-		instrumentation_end();
-
-		ret.exit_rcu = true;
+		irqentry_enter_from_user_mode(regs);
 		return ret;
 	}
 
-	/*
-	 * If RCU is watching then RCU only wants to check whether it needs
-	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
-	 * already contains a warning when RCU is not watching, so no point
-	 * in having another one here.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	instrumentation_begin();
-	kmsan_unpoison_entry_regs(regs);
-	rcu_irq_enter_check_tick();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
-
-	return ret;
+	return irqentry_enter_from_kernel_mode(regs);
 }
 
 /**
@@ -212,43 +158,10 @@ void dynamic_irqentry_exit_cond_resched(void)
 
 noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
-
-	/* Check whether this returns to user mode */
-	if (user_mode(regs)) {
+	if (user_mode(regs))
 		irqentry_exit_to_user_mode(regs);
-	} else if (!regs_irqs_disabled(regs)) {
-		/*
-		 * If RCU was not watching on entry this needs to be done
-		 * carefully and needs the same ordering of lockdep/tracing
-		 * and RCU as the return to user mode path.
-		 */
-		if (state.exit_rcu) {
-			instrumentation_begin();
-			/* Tell the tracer that IRET will enable interrupts */
-			trace_hardirqs_on_prepare();
-			lockdep_hardirqs_on_prepare();
-			instrumentation_end();
-			ct_irq_exit();
-			lockdep_hardirqs_on(CALLER_ADDR0);
-			return;
-		}
-
-		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
-		/* Covers both tracing and lockdep */
-		trace_hardirqs_on();
-		instrumentation_end();
-	} else {
-		/*
-		 * IRQ flags state is correct already. Just tell RCU if it
-		 * was not watching on entry.
-		 */
-		if (state.exit_rcu)
-			ct_irq_exit();
-	}
+	else
+		irqentry_exit_to_kernel_mode(regs, state);
 }
 
 irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
-- 
2.30.2



* [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode()
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (3 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}() Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:40   ` Jinjie Ruan
                     ` (2 more replies)
  2026-04-07 13:16 ` [PATCH 06/10] arm64: entry: Don't preempt with SError or Debug masked Mark Rutland
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, mark.rutland,
	ruanjinjie, vladimir.murzin, will

Some architecture-specific work needs to be performed between the state
management for exception entry/exit and the "real" work to handle the
exception. For example, arm64 needs to manipulate a number of exception
masking bits, with different exceptions requiring different masking.

Generally this can all be hidden in the architecture code, but for arm64
the current structure of irqentry_exit_to_kernel_mode() makes this
particularly difficult to handle in a way that is correct, maintainable,
and efficient.

The gory details are described in the thread surrounding:

  https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/

The summary is:

* Currently, irqentry_exit_to_kernel_mode() handles both involuntary
  preemption AND state management necessary for exception return.

* When scheduling (including involuntary preemption), arm64 needs to
  have all arm64-specific exceptions unmasked, though regular interrupts
  must be masked.

* Prior to the state management for exception return, arm64 needs to
  mask a number of arm64-specific exceptions, and perform some work with
  these exceptions masked (with RCU watching, etc).

While in theory it is possible to handle this with a new arch_*() hook
called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
and complicated, and doesn't match the flow used for exception return to
user mode, which has a separate 'prepare' step (where preemption can
occur) prior to the state management.

To solve this, refactor irqentry_exit_to_kernel_mode() to match the
style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
state management into a new irqentry_exit_to_kernel_mode_after_preempt()
function. The existing irqentry_exit_to_kernel_mode() is left as a
caller of both of these, avoiding the need to modify existing callers.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
comments for these new functions because the existing comments don't
seem all that consistent (e.g. for user mode vs kernel mode), and I
suspect we want to rewrite them all in one go for wider consistency.

I'm happy to respin this, or to follow-up with that as per your
preference.

Mark.

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 2206150e526d8..24830baa539c6 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -421,10 +421,18 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
 	return ret;
 }
 
-static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
+static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs, irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
+	if (regs_irqs_disabled(regs) || state.exit_rcu)
+		return;
+
+	if (IS_ENABLED(CONFIG_PREEMPTION))
+		irqentry_exit_cond_resched();
+}
 
+static __always_inline void
+irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
+{
 	if (!regs_irqs_disabled(regs)) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
@@ -443,9 +451,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
 		}
 
 		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
@@ -459,6 +464,17 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
 	}
 }
 
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	instrumentation_begin();
+	irqentry_exit_to_kernel_mode_preempt(regs, state);
+	instrumentation_end();
+
+	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
+}
+
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 06/10] arm64: entry: Don't preempt with SError or Debug masked
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (4 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode() Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:47   ` Jinjie Ruan
  2026-04-07 13:16 ` [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers Mark Rutland
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, mark.rutland, peterz,
	ruanjinjie, tglx, vladimir.murzin

On arm64, involuntary kernel preemption has been subtly broken since the
move to the generic irqentry code. When preemption occurs, the new task
may run with SError and Debug exceptions masked unexpectedly, leading to
a loss of RAS events, breakpoints, watchpoints, and single-step
exceptions.

Prior to moving to the generic irqentry code, involuntary preemption of
kernel mode would only occur when returning from regular interrupts, in
a state where interrupts were masked and all other arm64-specific
exceptions (SError, Debug, and pseudo-NMI) were unmasked. This is the
only state in which it is valid to switch tasks.

As part of moving to the generic irqentry code, the involuntary
preemption logic was moved such that involuntary preemption could occur
when returning from any (non-NMI) exception. As most exception handlers
mask all arm64-specific exceptions before this point, preemption could
occur in a state where arm64-specific exceptions were masked. This is
not a valid state to switch tasks, and resulted in the loss of
exceptions described above.

As a temporary bodge, avoid the loss of exceptions by avoiding
involuntary preemption when SError and/or Debug exceptions are masked.
Practically speaking this means that involuntary preemption will only
occur when returning from regular interrupts, as was the case before
moving to the generic irqentry code.

Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/entry-common.h | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index cab8cd78f6938..20f0a7c7bde15 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -29,14 +29,19 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
 
 static inline bool arch_irqentry_exit_need_resched(void)
 {
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return false;
+	if (system_uses_irq_prio_masking()) {
+		/*
+		 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+		 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+		 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+		 * DAIF we must have handled an NMI, so skip preemption.
+		 */
+		if (read_sysreg(daif))
+			return false;
+	} else {
+		if (read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT))
+			return false;
+	}
 
 	/*
 	 * Preempting a task from an IRQ means we leave copies of PSTATE
-- 
2.30.2



* [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (5 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 06/10] arm64: entry: Don't preempt with SError or Debug masked Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:49   ` Jinjie Ruan
  2026-04-07 13:16 ` [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() Mark Rutland
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, mark.rutland, peterz,
	ruanjinjie, tglx, vladimir.murzin

For historical reasons, arm64's entry code has arm64-specific functions
named enter_from_kernel_mode() and exit_to_kernel_mode(), which are
wrappers for similarly-named functions from the generic irqentry code.
Other arm64-specific wrappers have an 'arm64_' prefix to clearly
distinguish them from their generic counterparts, e.g.
arm64_enter_from_user_mode() and arm64_exit_to_user_mode().

For consistency and clarity, add an 'arm64_' prefix to these functions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/entry-common.c | 38 ++++++++++++++++----------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3625797e9ee8f..3d01cdacdc7a2 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -35,7 +35,7 @@
  * Before this function is called it is not safe to call regular kernel code,
  * instrumentable code, or any code which may trigger an exception.
  */
-static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
+static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *regs)
 {
 	irqentry_state_t state;
 
@@ -51,8 +51,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
  * After this function returns it is not safe to call regular kernel code,
  * instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
-					irqentry_state_t state)
+static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
+					      irqentry_state_t state)
 {
 	mte_check_tfsr_exit();
 	irqentry_exit(regs, state);
@@ -298,11 +298,11 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
 	unsigned long far = read_sysreg(far_el1);
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_mem_abort(far, esr, regs);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
@@ -310,55 +310,55 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
 	unsigned long far = read_sysreg(far_el1);
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_sp_pc_abort(far, esr, regs);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_undef(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_bti(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_gcs(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_mops(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_breakpt(struct pt_regs *regs, unsigned long esr)
@@ -420,11 +420,11 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_fpac(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
@@ -491,13 +491,13 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 
 	irq_enter_rcu();
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 static void noinstr el1_interrupt(struct pt_regs *regs,
 				  void (*handler)(struct pt_regs *))
-- 
2.30.2



* [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (6 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:50   ` Jinjie Ruan
  2026-04-07 13:16 ` [PATCH 09/10] arm64: entry: Use split preemption logic Mark Rutland
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, mark.rutland, peterz,
	ruanjinjie, tglx, vladimir.murzin

The generic irqentry code now provides irqentry_enter_from_kernel_mode()
and irqentry_exit_to_kernel_mode(), which can be used when an exception
is known to be taken from kernel mode. These can be inlined into
architecture-specific entry code, and avoid redundant work to test
whether the exception was taken from user mode.

Use these in arm64_enter_from_kernel_mode() and
arm64_exit_to_kernel_mode(), which are only used for exceptions known to
be taken from kernel mode. This will remove a small amount of redundant
work, and will permit further changes to arm64_exit_to_kernel_mode() in
subsequent patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/entry-common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3d01cdacdc7a2..16a65987a6a9b 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -39,7 +39,7 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
 {
 	irqentry_state_t state;
 
-	state = irqentry_enter(regs);
+	state = irqentry_enter_from_kernel_mode(regs);
 	mte_check_tfsr_entry();
 	mte_disable_tco_entry(current);
 
@@ -55,7 +55,7 @@ static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
 					      irqentry_state_t state)
 {
 	mte_check_tfsr_exit();
-	irqentry_exit(regs, state);
+	irqentry_exit_to_kernel_mode(regs, state);
 }
 
 /*
-- 
2.30.2



* [PATCH 09/10] arm64: entry: Use split preemption logic
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (7 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  1:52   ` Jinjie Ruan
  2026-04-07 13:16 ` [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time Mark Rutland
  2026-04-07 21:08 ` [PATCH 00/10] arm64/entry: Thomas Gleixner
  10 siblings, 1 reply; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, mark.rutland, peterz,
	ruanjinjie, tglx, vladimir.murzin

The generic irqentry code now provides
irqentry_exit_to_kernel_mode_preempt() and
irqentry_exit_to_kernel_mode_after_preempt(), which can be used
where architectures have different state requirements for involuntary
preemption and exception return, as is the case on arm64.

Use the new functions on arm64, aligning our exit to kernel mode logic
with the style of our exit to user mode logic. This removes the need for
the recently-added bodge in arch_irqentry_exit_need_resched(), and
allows preemption to occur when returning from any exception taken from
kernel mode, which is nicer for RT.

In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
fold the conditionality directly into the architecture-specific entry
code. That way all the logic necessary to avoid preempting from a
pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
avoiding redundant work for other exceptions, and making the flow a bit
clearer. At present it looks like that would require a larger
refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
as-is for now.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/entry-common.h | 21 ++++++++-------------
 arch/arm64/kernel/entry-common.c      | 12 ++++--------
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index 20f0a7c7bde15..cab8cd78f6938 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -29,19 +29,14 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
 
 static inline bool arch_irqentry_exit_need_resched(void)
 {
-	if (system_uses_irq_prio_masking()) {
-		/*
-		 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-		 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-		 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-		 * DAIF we must have handled an NMI, so skip preemption.
-		 */
-		if (read_sysreg(daif))
-			return false;
-	} else {
-		if (read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT))
-			return false;
-	}
+	/*
+	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+	 * DAIF we must have handled an NMI, so skip preemption.
+	 */
+	if (system_uses_irq_prio_masking() && read_sysreg(daif))
+		return false;
 
 	/*
 	 * Preempting a task from an IRQ means we leave copies of PSTATE
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 16a65987a6a9b..f42ce7b5c67f3 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -54,8 +54,11 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
 static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
 					      irqentry_state_t state)
 {
+	local_irq_disable();
+	irqentry_exit_to_kernel_mode_preempt(regs, state);
+	local_daif_mask();
 	mte_check_tfsr_exit();
-	irqentry_exit_to_kernel_mode(regs, state);
+	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
 }
 
 /*
@@ -301,7 +304,6 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_mem_abort(far, esr, regs);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -313,7 +315,6 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_sp_pc_abort(far, esr, regs);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -324,7 +325,6 @@ static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_undef(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -335,7 +335,6 @@ static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_bti(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -346,7 +345,6 @@ static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_gcs(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -357,7 +355,6 @@ static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_mops(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -423,7 +420,6 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_fpac(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
-- 
2.30.2



* [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (8 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 09/10] arm64: entry: Use split preemption logic Mark Rutland
@ 2026-04-07 13:16 ` Mark Rutland
  2026-04-08  2:17   ` Jinjie Ruan
  2026-04-07 21:08 ` [PATCH 00/10] arm64/entry: Thomas Gleixner
  10 siblings, 1 reply; 34+ messages in thread
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, mark.rutland, peterz,
	ruanjinjie, tglx, vladimir.murzin

When __switch_to() switches from a 'prev' task to a 'next' task, various
pieces of CPU state are expected to have specific values, such that
these do not need to be saved/restored. If any of these hold an
unexpected value when switching away from the prev task, they could lead
to surprising behaviour in the context of the next task, and it would be
difficult to determine where they were configured to their unexpected
value.

Add some checks for DAIF and PMR at task-switch time so that we can
detect such issues.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 489554931231e..ba9038434d2fb 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -699,6 +699,29 @@ void update_sctlr_el1(u64 sctlr)
 	isb();
 }
 
+static inline void debug_switch_state(void)
+{
+	if (system_uses_irq_prio_masking()) {
+		unsigned long daif_expected = 0;
+		unsigned long daif_actual = read_sysreg(daif);
+		unsigned long pmr_expected = GIC_PRIO_IRQOFF;
+		unsigned long pmr_actual = read_sysreg_s(SYS_ICC_PMR_EL1);
+
+		WARN_ONCE(daif_actual != daif_expected ||
+			  pmr_actual != pmr_expected,
+			  "Unexpected DAIF + PMR: 0x%lx + 0x%lx (expected 0x%lx + 0x%lx)\n",
+			  daif_actual, pmr_actual,
+			  daif_expected, pmr_expected);
+	} else {
+		unsigned long daif_expected = DAIF_PROCCTX_NOIRQ;
+		unsigned long daif_actual = read_sysreg(daif);
+
+		WARN_ONCE(daif_actual != daif_expected,
+			  "Unexpected DAIF value: 0x%lx (expected 0x%lx)\n",
+			  daif_actual, daif_expected);
+	}
+}
+
 /*
  * Thread switching.
  */
@@ -708,6 +731,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
 {
 	struct task_struct *last;
 
+	debug_switch_state();
+
 	fpsimd_thread_switch(next);
 	tls_thread_switch(next);
 	hw_breakpoint_thread_switch(next);
-- 
2.30.2



* Re: [PATCH 00/10] arm64/entry:
  2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
                   ` (9 preceding siblings ...)
  2026-04-07 13:16 ` [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time Mark Rutland
@ 2026-04-07 21:08 ` Thomas Gleixner
  2026-04-08  9:02   ` Mark Rutland
  2026-04-08  9:19   ` Peter Zijlstra
  10 siblings, 2 replies; 34+ messages in thread
From: Thomas Gleixner @ 2026-04-07 21:08 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Catalin Marinas,
	Peter Zijlstra, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, mark.rutland, ruanjinjie,
	vladimir.murzin

On Tue, Apr 07 2026 at 14:16, Mark Rutland wrote:
> I've split the series into a prefix of changes for generic irqentry,
> followed by changes to the arm64 code. I'm hoping that we can queue the
> generic irqentry patches onto a stable branch, or take those via arm64.
> The patches are as follows:
>
> * Patches 1 and 2 are cleanup to the generic irqentry code. These have no
>   functional impact, and I think these can be taken regardless of the
>   rest of the series.
>
> * Patches 3 to 5 refactor the generic irqentry code as described above,
>   providing separate irqentry_{enter,exit}() functions and providing a
>   split form of irqentry_exit_to_kernel_mode() similar to what exists
>   for irqentry_exit_to_user_mode(). These patches alone should have no
>   functional impact.

I looked through them and I can't find any problem with them. I queued
them locally and added the missing kernel-doc as I promised you on IRC.

As I have quite a conflict pending in the tip tree with other changes
related to the generic entry code, I suggest that I queue 1-5, tag them
for arm64 consumption and merge them into the conflicting branch to
avoid trouble with pull request ordering and headaches for the -next
people.

Does that work for you?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/10] entry: Fix stale comment for irqentry_enter()
  2026-04-07 13:16 ` [PATCH 01/10] entry: Fix stale comment for irqentry_enter() Mark Rutland
@ 2026-04-08  1:14   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:14 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, vladimir.murzin,
	will



On 2026/4/7 21:16, Mark Rutland wrote:
> The kerneldoc comment for irqentry_enter() refers to idtentry_exit(),
> which is an accidental holdover from the x86 entry code that the generic
> irqentry code was based on.
> 
> Correct this to refer to irqentry_exit().
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  include/linux/irq-entry-common.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> index d26d1b1bcbfb9..3cf4d21168ba1 100644
> --- a/include/linux/irq-entry-common.h
> +++ b/include/linux/irq-entry-common.h
> @@ -394,7 +394,7 @@ typedef struct irqentry_state {
>   * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
>   * would not be possible.
>   *
> - * Returns: An opaque object that must be passed to idtentry_exit()
> + * Returns: An opaque object that must be passed to irqentry_exit()

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

>   */
>  irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
>  


* Re: [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user()
  2026-04-07 13:16 ` [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user() Mark Rutland
@ 2026-04-08  1:18   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:18 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, vladimir.murzin,
	will



On 2026/4/7 21:16, Mark Rutland wrote:
> The local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user()
> functions are never overridden by architecture code, and are always
> equivalent to local_irq_enable() and local_irq_disable().
> 
> These functions were added on the assumption that arm64 would override
> them to manage 'DAIF' exception masking, as described by Thomas Gleixner
> in these threads:
> 
>   https://lore.kernel.org/all/20190919150809.340471236@linutronix.de/
>   https://lore.kernel.org/all/alpine.DEB.2.21.1910240119090.1852@nanos.tec.linutronix.de/
> 
> In practice arm64 did not need to override either. Prior to moving to
> the generic irqentry code, arm64's management of DAIF was reworked in
> commit:
> 
>   97d935faacde ("arm64: Unmask Debug + SError in do_notify_resume()")
> 
> Since that commit, arm64 only masks interrupts during the 'prepare' step
> when returning to user mode, and masks other DAIF exceptions later.
> Within arm64_exit_to_user_mode(), the arm64 entry code is as follows:
> 
> 	local_irq_disable();
> 	exit_to_user_mode_prepare_legacy(regs);
> 	local_daif_mask();
> 	mte_check_tfsr_exit();
> 	exit_to_user_mode();
> 
> Remove the unnecessary local_irq_enable_exit_to_user() and
> local_irq_disable_exit_to_user() functions.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  include/linux/entry-common.h     |  2 +-
>  include/linux/irq-entry-common.h | 31 -------------------------------
>  kernel/entry/common.c            |  4 ++--
>  3 files changed, 3 insertions(+), 34 deletions(-)

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

> 
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index f83ca0abf2cdb..dbaa153100f44 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -321,7 +321,7 @@ static __always_inline void syscall_exit_to_user_mode(struct pt_regs *regs)
>  {
>  	instrumentation_begin();
>  	syscall_exit_to_user_mode_work(regs);
> -	local_irq_disable_exit_to_user();
> +	local_irq_disable();
>  	syscall_exit_to_user_mode_prepare(regs);
>  	instrumentation_end();
>  	exit_to_user_mode();
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> index 3cf4d21168ba1..93b4b551f7ae4 100644
> --- a/include/linux/irq-entry-common.h
> +++ b/include/linux/irq-entry-common.h
> @@ -100,37 +100,6 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
>  	instrumentation_end();
>  }
>  
> -/**
> - * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
> - * @ti_work:	Cached TIF flags gathered with interrupts disabled
> - *
> - * Defaults to local_irq_enable(). Can be supplied by architecture specific
> - * code.
> - */
> -static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
> -
> -#ifndef local_irq_enable_exit_to_user
> -static __always_inline void local_irq_enable_exit_to_user(unsigned long ti_work)
> -{
> -	local_irq_enable();
> -}
> -#endif
> -
> -/**
> - * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
> - *
> - * Defaults to local_irq_disable(). Can be supplied by architecture specific
> - * code.
> - */
> -static inline void local_irq_disable_exit_to_user(void);
> -
> -#ifndef local_irq_disable_exit_to_user
> -static __always_inline void local_irq_disable_exit_to_user(void)
> -{
> -	local_irq_disable();
> -}
> -#endif
> -
>  /**
>   * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
>   *				 to user mode.
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 9ef63e4147913..b5e05d87ba391 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -47,7 +47,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
>  	 */
>  	while (ti_work & EXIT_TO_USER_MODE_WORK_LOOP) {
>  
> -		local_irq_enable_exit_to_user(ti_work);
> +		local_irq_enable();
>  
>  		if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
>  			if (!rseq_grant_slice_extension(ti_work & TIF_SLICE_EXT_DENY))
> @@ -74,7 +74,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
>  		 * might have changed while interrupts and preemption was
>  		 * enabled above.
>  		 */
> -		local_irq_disable_exit_to_user();
> +		local_irq_disable();
>  
>  		/* Check if any of the above work has queued a deferred wakeup */
>  		tick_nohz_user_enter_prepare();


* Re: [PATCH 03/10] entry: Move irqentry_enter() prototype later
  2026-04-07 13:16 ` [PATCH 03/10] entry: Move irqentry_enter() prototype later Mark Rutland
@ 2026-04-08  1:21   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:21 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, vladimir.murzin,
	will



On 2026/4/7 21:16, Mark Rutland wrote:
> Subsequent patches will rework the irqentry_*() functions. The end
> result (and the intermediate diffs) will be much clearer if the
> prototype for the irqentry_enter() function is moved later, immediately
> before the prototype of the irqentry_exit() function.
> 
> Move the prototype later.
> 
> This is purely a move; there should be no functional change as a result
> of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  include/linux/irq-entry-common.h | 44 ++++++++++++++++----------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> index 93b4b551f7ae4..d1e8591a59195 100644
> --- a/include/linux/irq-entry-common.h
> +++ b/include/linux/irq-entry-common.h
> @@ -334,6 +334,28 @@ typedef struct irqentry_state {
>  } irqentry_state_t;
>  #endif
>  
> +/**
> + * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
> + *
> + * Conditional reschedule with additional sanity checks.
> + */
> +void raw_irqentry_exit_cond_resched(void);
> +
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> +#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> +#define irqentry_exit_cond_resched_dynamic_disabled	NULL
> +DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> +#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
> +#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> +DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +void dynamic_irqentry_exit_cond_resched(void);
> +#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
> +#endif
> +#else /* CONFIG_PREEMPT_DYNAMIC */
> +#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
> +#endif /* CONFIG_PREEMPT_DYNAMIC */
> +

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

>  /**
>   * irqentry_enter - Handle state tracking on ordinary interrupt entries
>   * @regs:	Pointer to pt_regs of interrupted context
> @@ -367,28 +389,6 @@ typedef struct irqentry_state {
>   */
>  irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
>  
> -/**
> - * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
> - *
> - * Conditional reschedule with additional sanity checks.
> - */
> -void raw_irqentry_exit_cond_resched(void);
> -
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> -#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> -#define irqentry_exit_cond_resched_dynamic_disabled	NULL
> -DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> -#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
> -#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> -DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -void dynamic_irqentry_exit_cond_resched(void);
> -#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
> -#endif
> -#else /* CONFIG_PREEMPT_DYNAMIC */
> -#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
> -#endif /* CONFIG_PREEMPT_DYNAMIC */
> -
>  /**
>   * irqentry_exit - Handle return from exception that used irqentry_enter()
>   * @regs:	Pointer to pt_regs (exception entry regs)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}()
  2026-04-07 13:16 ` [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}() Mark Rutland
@ 2026-04-08  1:32   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:32 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, vladimir.murzin,
	will



On 2026/4/7 21:16, Mark Rutland wrote:
> The generic irqentry code has entry/exit functions specifically for
> exceptions taken from user mode, but doesn't have entry/exit functions
> specifically for exceptions taken from kernel mode.
> 
> It would be helpful to have separate entry/exit functions specifically
> for exceptions taken from kernel mode. This would make the structure of
> the entry code more consistent, and would make it easier for
> architectures to manage logic specific to exceptions taken from kernel
> mode.
> 
> Move the logic specific to kernel mode out of irqentry_enter() and
> irqentry_exit() into new irqentry_enter_from_kernel_mode() and
> irqentry_exit_to_kernel_mode() functions. These are marked
> __always_inline and placed in irq-entry-common.h, as with
> irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so
> that they can be inlined into architecture-specific wrappers. The
> existing out-of-line irqentry_enter() and irqentry_exit() functions
> retained as callers of the new functions.
> 
> The lockdep assertion from irqentry_exit() is moved into
> irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This
> was previously missing from irqentry_exit_to_user_mode() when called
>  directly, and any new lockdep assertion failure resulting from this
>  change indicates a latent bug.
> 
> Aside from the lockdep change noted above, there should be no functional
> change as a result of this patch.

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  include/linux/irq-entry-common.h | 103 +++++++++++++++++++++++++++++++
>  kernel/entry/common.c            | 103 +++----------------------------
>  2 files changed, 111 insertions(+), 95 deletions(-)
> 
> Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
> comments for these new functions because the existing comments don't
> seem all that consistent (e.g. for user mode vs kernel mode), and I
> suspect we want to rewrite them all in one go for wider consistency.
> 
> I'm happy to respin this, or to follow-up with that as per your
> preference.
> 
> Mark.
> 
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> index d1e8591a59195..2206150e526d8 100644
> --- a/include/linux/irq-entry-common.h
> +++ b/include/linux/irq-entry-common.h
> @@ -304,6 +304,8 @@ static __always_inline void irqentry_enter_from_user_mode(struct pt_regs *regs)
>   */
>  static __always_inline void irqentry_exit_to_user_mode(struct pt_regs *regs)
>  {
> +	lockdep_assert_irqs_disabled();
> +
>  	instrumentation_begin();
>  	irqentry_exit_to_user_mode_prepare(regs);
>  	instrumentation_end();
> @@ -356,6 +358,107 @@ void dynamic_irqentry_exit_cond_resched(void);
>  #define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
>  #endif /* CONFIG_PREEMPT_DYNAMIC */
>  
> +static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct pt_regs *regs)
> +{
> +	irqentry_state_t ret = {
> +		.exit_rcu = false,
> +	};
> +
> +	/*
> +	 * If this entry hit the idle task invoke ct_irq_enter() whether
> +	 * RCU is watching or not.
> +	 *
> +	 * Interrupts can nest when the first interrupt invokes softirq
> +	 * processing on return which enables interrupts.
> +	 *
> +	 * Scheduler ticks in the idle task can mark quiescent state and
> +	 * terminate a grace period, if and only if the timer interrupt is
> +	 * not nested into another interrupt.
> +	 *
> +	 * Checking for rcu_is_watching() here would prevent the nesting
> +	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
> +	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
> +	 * assume that it is the first interrupt and eventually claim
> +	 * quiescent state and end grace periods prematurely.
> +	 *
> +	 * Unconditionally invoke ct_irq_enter() so RCU state stays
> +	 * consistent.
> +	 *
> +	 * TINY_RCU does not support EQS, so let the compiler eliminate
> +	 * this part when enabled.
> +	 */
> +	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
> +	    (is_idle_task(current) || arch_in_rcu_eqs())) {
> +		/*
> +		 * If RCU is not watching then the same careful
> +		 * sequence vs. lockdep and tracing is required
> +		 * as in irqentry_enter_from_user_mode().
> +		 */
> +		lockdep_hardirqs_off(CALLER_ADDR0);
> +		ct_irq_enter();
> +		instrumentation_begin();
> +		kmsan_unpoison_entry_regs(regs);
> +		trace_hardirqs_off_finish();
> +		instrumentation_end();
> +
> +		ret.exit_rcu = true;
> +		return ret;
> +	}
> +
> +	/*
> +	 * If RCU is watching then RCU only wants to check whether it needs
> +	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
> +	 * already contains a warning when RCU is not watching, so no point
> +	 * in having another one here.
> +	 */
> +	lockdep_hardirqs_off(CALLER_ADDR0);
> +	instrumentation_begin();
> +	kmsan_unpoison_entry_regs(regs);
> +	rcu_irq_enter_check_tick();
> +	trace_hardirqs_off_finish();
> +	instrumentation_end();
> +
> +	return ret;
> +}
> +
> +static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	if (!regs_irqs_disabled(regs)) {
> +		/*
> +		 * If RCU was not watching on entry this needs to be done
> +		 * carefully and needs the same ordering of lockdep/tracing
> +		 * and RCU as the return to user mode path.
> +		 */
> +		if (state.exit_rcu) {
> +			instrumentation_begin();
> +			/* Tell the tracer that IRET will enable interrupts */
> +			trace_hardirqs_on_prepare();
> +			lockdep_hardirqs_on_prepare();
> +			instrumentation_end();
> +			ct_irq_exit();
> +			lockdep_hardirqs_on(CALLER_ADDR0);
> +			return;
> +		}
> +
> +		instrumentation_begin();
> +		if (IS_ENABLED(CONFIG_PREEMPTION))
> +			irqentry_exit_cond_resched();
> +
> +		/* Covers both tracing and lockdep */
> +		trace_hardirqs_on();
> +		instrumentation_end();
> +	} else {
> +		/*
> +		 * IRQ flags state is correct already. Just tell RCU if it
> +		 * was not watching on entry.
> +		 */
> +		if (state.exit_rcu)
> +			ct_irq_exit();
> +	}
> +}
> +
>  /**
>   * irqentry_enter - Handle state tracking on ordinary interrupt entries
>   * @regs:	Pointer to pt_regs of interrupted context
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index b5e05d87ba391..1034be02eae84 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -105,70 +105,16 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  
>  noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>  {
> -	irqentry_state_t ret = {
> -		.exit_rcu = false,
> -	};
> -
>  	if (user_mode(regs)) {
> -		irqentry_enter_from_user_mode(regs);
> -		return ret;
> -	}
> +		irqentry_state_t ret = {
> +			.exit_rcu = false,
> +		};
>  
> -	/*
> -	 * If this entry hit the idle task invoke ct_irq_enter() whether
> -	 * RCU is watching or not.
> -	 *
> -	 * Interrupts can nest when the first interrupt invokes softirq
> -	 * processing on return which enables interrupts.
> -	 *
> -	 * Scheduler ticks in the idle task can mark quiescent state and
> -	 * terminate a grace period, if and only if the timer interrupt is
> -	 * not nested into another interrupt.
> -	 *
> -	 * Checking for rcu_is_watching() here would prevent the nesting
> -	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
> -	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
> -	 * assume that it is the first interrupt and eventually claim
> -	 * quiescent state and end grace periods prematurely.
> -	 *
> -	 * Unconditionally invoke ct_irq_enter() so RCU state stays
> -	 * consistent.
> -	 *
> -	 * TINY_RCU does not support EQS, so let the compiler eliminate
> -	 * this part when enabled.
> -	 */
> -	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
> -	    (is_idle_task(current) || arch_in_rcu_eqs())) {
> -		/*
> -		 * If RCU is not watching then the same careful
> -		 * sequence vs. lockdep and tracing is required
> -		 * as in irqentry_enter_from_user_mode().
> -		 */
> -		lockdep_hardirqs_off(CALLER_ADDR0);
> -		ct_irq_enter();
> -		instrumentation_begin();
> -		kmsan_unpoison_entry_regs(regs);
> -		trace_hardirqs_off_finish();
> -		instrumentation_end();
> -
> -		ret.exit_rcu = true;
> +		irqentry_enter_from_user_mode(regs);
>  		return ret;
>  	}
>  
> -	/*
> -	 * If RCU is watching then RCU only wants to check whether it needs
> -	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
> -	 * already contains a warning when RCU is not watching, so no point
> -	 * in having another one here.
> -	 */
> -	lockdep_hardirqs_off(CALLER_ADDR0);
> -	instrumentation_begin();
> -	kmsan_unpoison_entry_regs(regs);
> -	rcu_irq_enter_check_tick();
> -	trace_hardirqs_off_finish();
> -	instrumentation_end();
> -
> -	return ret;
> +	return irqentry_enter_from_kernel_mode(regs);
>  }
>  
>  /**
> @@ -212,43 +158,10 @@ void dynamic_irqentry_exit_cond_resched(void)
>  
>  noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
>  {
> -	lockdep_assert_irqs_disabled();
> -
> -	/* Check whether this returns to user mode */
> -	if (user_mode(regs)) {
> +	if (user_mode(regs))
>  		irqentry_exit_to_user_mode(regs);
> -	} else if (!regs_irqs_disabled(regs)) {
> -		/*
> -		 * If RCU was not watching on entry this needs to be done
> -		 * carefully and needs the same ordering of lockdep/tracing
> -		 * and RCU as the return to user mode path.
> -		 */
> -		if (state.exit_rcu) {
> -			instrumentation_begin();
> -			/* Tell the tracer that IRET will enable interrupts */
> -			trace_hardirqs_on_prepare();
> -			lockdep_hardirqs_on_prepare();
> -			instrumentation_end();
> -			ct_irq_exit();
> -			lockdep_hardirqs_on(CALLER_ADDR0);
> -			return;
> -		}
> -
> -		instrumentation_begin();
> -		if (IS_ENABLED(CONFIG_PREEMPTION))
> -			irqentry_exit_cond_resched();
> -
> -		/* Covers both tracing and lockdep */
> -		trace_hardirqs_on();
> -		instrumentation_end();
> -	} else {
> -		/*
> -		 * IRQ flags state is correct already. Just tell RCU if it
> -		 * was not watching on entry.
> -		 */
> -		if (state.exit_rcu)
> -			ct_irq_exit();
> -	}
> +	else
> +		irqentry_exit_to_kernel_mode(regs, state);
>  }
>  
>  irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode()
  2026-04-07 13:16 ` [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode() Mark Rutland
@ 2026-04-08  1:40   ` Jinjie Ruan
  2026-04-08  9:17   ` Jinjie Ruan
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  2 siblings, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:40 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, vladimir.murzin,
	will



On 2026/4/7 21:16, Mark Rutland wrote:
> Some architecture-specific work needs to be performed between the state
> management for exception entry/exit and the "real" work to handle the
> exception. For example, arm64 needs to manipulate a number of exception
> masking bits, with different exceptions requiring different masking.
> 
> Generally this can all be hidden in the architecture code, but for arm64
> the current structure of irqentry_exit_to_kernel_mode() makes this
> particularly difficult to handle in a way that is correct, maintainable,
> and efficient.
> 
> The gory details are described in the thread surrounding:
> 
>   https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/
> 
> The summary is:
> 
> * Currently, irqentry_exit_to_kernel_mode() handles both involuntary
>   preemption AND state management necessary for exception return.
> 
> * When scheduling (including involuntary preemption), arm64 needs to
>   have all arm64-specific exceptions unmasked, though regular interrupts
>   must be masked.
> 
> * Prior to the state management for exception return, arm64 needs to
>   mask a number of arm64-specific exceptions, and perform some work with
>   these exceptions masked (with RCU watching, etc).
> 
> While in theory it is possible to handle this with a new arch_*() hook
> called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
> and complicated, and doesn't match the flow used for exception return to
> user mode, which has a separate 'prepare' step (where preemption can
> occur) prior to the state management.
> 
> To solve this, refactor irqentry_exit_to_kernel_mode() to match the
> style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
> into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
>   state management into a new irqentry_exit_to_kernel_mode_after_preempt()
> function. The existing irqentry_exit_to_kernel_mode() is left as a
> caller of both of these, avoiding the need to modify existing callers.
> 
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  include/linux/irq-entry-common.h | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
> comments for these new functions because the existing comments don't
> seem all that consistent (e.g. for user mode vs kernel mode), and I
> suspect we want to rewrite them all in one go for wider consistency.
> 
> I'm happy to respin this, or to follow-up with that as per your
> preference.
> 
> Mark.
> 
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> index 2206150e526d8..24830baa539c6 100644
> --- a/include/linux/irq-entry-common.h
> +++ b/include/linux/irq-entry-common.h
> @@ -421,10 +421,18 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
>  	return ret;
>  }
>  
> -static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
> +static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs, irqentry_state_t state)
>  {
> -	lockdep_assert_irqs_disabled();
> +	if (regs_irqs_disabled(regs) || state.exit_rcu)
> +		return;
> +
> +	if (IS_ENABLED(CONFIG_PREEMPTION))
> +		irqentry_exit_cond_resched();
> +}
>  
> +static __always_inline void
> +irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
> +{
>  	if (!regs_irqs_disabled(regs)) {
>  		/*
>  		 * If RCU was not watching on entry this needs to be done
> @@ -443,9 +451,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
>  		}
>  
>  		instrumentation_begin();
> -		if (IS_ENABLED(CONFIG_PREEMPTION))
> -			irqentry_exit_cond_resched();
> -
>  		/* Covers both tracing and lockdep */
>  		trace_hardirqs_on();
>  		instrumentation_end();
> @@ -459,6 +464,17 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
>  	}
>  }
>  
> +static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	instrumentation_begin();
> +	irqentry_exit_to_kernel_mode_preempt(regs, state);
> +	instrumentation_end();
> +
> +	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
> +}

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

> +
>  /**
>   * irqentry_enter - Handle state tracking on ordinary interrupt entries
>   * @regs:	Pointer to pt_regs of interrupted context

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/10] arm64: entry: Don't preempt with SError or Debug masked
  2026-04-07 13:16 ` [PATCH 06/10] arm64: entry: Don't preempt with SError or Debug masked Mark Rutland
@ 2026-04-08  1:47   ` Jinjie Ruan
  0 siblings, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:47 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, peterz, tglx,
	vladimir.murzin



On 2026/4/7 21:16, Mark Rutland wrote:
> On arm64, involuntary kernel preemption has been subtly broken since the
> move to the generic irqentry code. When preemption occurs, the new task
> may run with SError and Debug exceptions masked unexpectedly, leading to
> a loss of RAS events, breakpoints, watchpoints, and single-step
> exceptions.
> 
> Prior to moving to the generic irqentry code, involuntary preemption of
> kernel mode would only occur when returning from regular interrupts, in
> a state where interrupts were masked and all other arm64-specific
> exceptions (SError, Debug, and pseudo-NMI) were unmasked. This is the
> only state in which it is valid to switch tasks.
> 
> As part of moving to the generic irqentry code, the involuntary
> preemption logic was moved such that involuntary preemption could occur
> when returning from any (non-NMI) exception. As most exception handlers
> mask all arm64-specific exceptions before this point, preemption could
> occur in a state where arm64-specific exceptions were masked. This is
> not a valid state to switch tasks, and resulted in the loss of
> exceptions described above.
> 
> As a temporary bodge, avoid the loss of exceptions by avoiding
> involuntary preemption when SError and/or Debug exceptions are masked.
> Practically speaking this means that involuntary preemption will only
> occur when returning from regular interrupts, as was the case before
> moving to the generic irqentry code.
> 
> Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
> Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/entry-common.h | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
> index cab8cd78f6938..20f0a7c7bde15 100644
> --- a/arch/arm64/include/asm/entry-common.h
> +++ b/arch/arm64/include/asm/entry-common.h
> @@ -29,14 +29,19 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
>  
>  static inline bool arch_irqentry_exit_need_resched(void)
>  {
> -	/*
> -	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> -	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> -	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> -	 * DAIF we must have handled an NMI, so skip preemption.
> -	 */
> -	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> -		return false;
> +	if (system_uses_irq_prio_masking()) {
> +		/*
> +		 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> +		 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> +		 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> +		 * DAIF we must have handled an NMI, so skip preemption.
> +		 */
> +		if (read_sysreg(daif))
> +			return false;
> +	} else {
> +		if (read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT))
> +			return false;

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

> +	}
>  
>  	/*
>  	 * Preempting a task from an IRQ means we leave copies of PSTATE

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers
  2026-04-07 13:16 ` [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers Mark Rutland
@ 2026-04-08  1:49   ` Jinjie Ruan
  0 siblings, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:49 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, peterz, tglx,
	vladimir.murzin



On 2026/4/7 21:16, Mark Rutland wrote:
> For historical reasons, arm64's entry code has arm64-specific functions
> named enter_from_kernel_mode() and exit_to_kernel_mode(), which are
> wrappers for similarly-named functions from the generic irqentry code.
> Other arm64-specific wrappers have an 'arm64_' prefix to clearly
> distinguish them from their generic counterparts, e.g.
> arm64_enter_from_user_mode() and arm64_exit_to_user_mode().
> 
> For consistency and clarity, add an 'arm64_' prefix to these functions.
> 
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/entry-common.c | 38 ++++++++++++++++----------------
>  1 file changed, 19 insertions(+), 19 deletions(-)

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 3625797e9ee8f..3d01cdacdc7a2 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -35,7 +35,7 @@
>   * Before this function is called it is not safe to call regular kernel code,
>   * instrumentable code, or any code which may trigger an exception.
>   */
> -static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> +static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *regs)
>  {
>  	irqentry_state_t state;
>  
> @@ -51,8 +51,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>   * After this function returns it is not safe to call regular kernel code,
>   * instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
> -					irqentry_state_t state)
> +static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
> +					      irqentry_state_t state)
>  {
>  	mte_check_tfsr_exit();
>  	irqentry_exit(regs, state);
> @@ -298,11 +298,11 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
>  	unsigned long far = read_sysreg(far_el1);
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_mem_abort(far, esr, regs);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
> @@ -310,55 +310,55 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
>  	unsigned long far = read_sysreg(far_el1);
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_sp_pc_abort(far, esr, regs);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
>  {
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_undef(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
>  {
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_bti(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
>  {
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_gcs(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
>  {
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_mops(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_breakpt(struct pt_regs *regs, unsigned long esr)
> @@ -420,11 +420,11 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
>  {
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_fpac(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
>  asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
> @@ -491,13 +491,13 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
>  {
>  	irqentry_state_t state;
>  
> -	state = enter_from_kernel_mode(regs);
> +	state = arm64_enter_from_kernel_mode(regs);
>  
>  	irq_enter_rcu();
>  	do_interrupt_handler(regs, handler);
>  	irq_exit_rcu();
>  
> -	exit_to_kernel_mode(regs, state);
> +	arm64_exit_to_kernel_mode(regs, state);
>  }
>  static void noinstr el1_interrupt(struct pt_regs *regs,
>  				  void (*handler)(struct pt_regs *))

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  2026-04-07 13:16 ` [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() Mark Rutland
@ 2026-04-08  1:50   ` Jinjie Ruan
  0 siblings, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:50 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, peterz, tglx,
	vladimir.murzin



On 2026/4/7 21:16, Mark Rutland wrote:
> The generic irqentry code now provides irqentry_enter_from_kernel_mode()
> and irqentry_exit_to_kernel_mode(), which can be used when an exception
> is known to be taken from kernel mode. These can be inlined into
> architecture-specific entry code, and avoid redundant work to test
> whether the exception was taken from user mode.
> 
> Use these in arm64_enter_from_kernel_mode() and
> arm64_exit_to_kernel_mode(), which are only used for exceptions known to
> be taken from kernel mode. This will remove a small amount of redundant
> work, and will permit further changes to arm64_exit_to_kernel_mode() in
> subsequent patches.
> 
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/entry-common.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 3d01cdacdc7a2..16a65987a6a9b 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -39,7 +39,7 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
>  {
>  	irqentry_state_t state;
>  
> -	state = irqentry_enter(regs);
> +	state = irqentry_enter_from_kernel_mode(regs);
>  	mte_check_tfsr_entry();
>  	mte_disable_tco_entry(current);
>  
> @@ -55,7 +55,7 @@ static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
>  					      irqentry_state_t state)
>  {
>  	mte_check_tfsr_exit();
> -	irqentry_exit(regs, state);
> +	irqentry_exit_to_kernel_mode(regs, state);

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/10] arm64: entry: Use split preemption logic
  2026-04-07 13:16 ` [PATCH 09/10] arm64: entry: Use split preemption logic Mark Rutland
@ 2026-04-08  1:52   ` Jinjie Ruan
  0 siblings, 0 replies; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  1:52 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, peterz, tglx,
	vladimir.murzin



On 2026/4/7 21:16, Mark Rutland wrote:
> The generic irqentry code now provides
> irqentry_exit_to_kernel_mode_preempt() and
> irqentry_exit_to_kernel_mode_after_preempt(), which can be used
> where architectures have different state requirements for involuntary
> preemption and exception return, as is the case on arm64.
> 
> Use the new functions on arm64, aligning our exit to kernel mode logic
> with the style of our exit to user mode logic. This removes the need for
> the recently-added bodge in arch_irqentry_exit_need_resched(), and
> allows preemption to occur when returning from any exception taken from
> kernel mode, which is nicer for RT.
> 
> In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
> fold the conditionality directly into the architecture-specific entry
> code. That way all the logic necessary to avoid preempting from a
> pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
> avoiding redundant work for other exceptions, and making the flow a bit
> clearer. At present it looks like that would require a larger
> refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
> as-is for now.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/entry-common.h | 21 ++++++++-------------
>  arch/arm64/kernel/entry-common.c      | 12 ++++--------
>  2 files changed, 12 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
> index 20f0a7c7bde15..cab8cd78f6938 100644
> --- a/arch/arm64/include/asm/entry-common.h
> +++ b/arch/arm64/include/asm/entry-common.h
> @@ -29,19 +29,14 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
>  
>  static inline bool arch_irqentry_exit_need_resched(void)
>  {
> -	if (system_uses_irq_prio_masking()) {
> -		/*
> -		 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> -		 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> -		 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> -		 * DAIF we must have handled an NMI, so skip preemption.
> -		 */
> -		if (read_sysreg(daif))
> -			return false;
> -	} else {
> -		if (read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT))
> -			return false;
> -	}
> +	/*
> +	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> +	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> +	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> +	 * DAIF we must have handled an NMI, so skip preemption.
> +	 */
> +	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> +		return false;
>  
>  	/*
>  	 * Preempting a task from an IRQ means we leave copies of PSTATE
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 16a65987a6a9b..f42ce7b5c67f3 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -54,8 +54,11 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
>  static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
>  					      irqentry_state_t state)
>  {
> +	local_irq_disable();
> +	irqentry_exit_to_kernel_mode_preempt(regs, state);
> +	local_daif_mask();
>  	mte_check_tfsr_exit();
> -	irqentry_exit_to_kernel_mode(regs, state);
> +	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
>  }

Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>

>  
>  /*
> @@ -301,7 +304,6 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
>  	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_mem_abort(far, esr, regs);
> -	local_daif_mask();
>  	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
> @@ -313,7 +315,6 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
>  	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_sp_pc_abort(far, esr, regs);
> -	local_daif_mask();
>  	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
> @@ -324,7 +325,6 @@ static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
>  	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_undef(regs, esr);
> -	local_daif_mask();
>  	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
> @@ -335,7 +335,6 @@ static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
>  	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_bti(regs, esr);
> -	local_daif_mask();
>  	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
> @@ -346,7 +345,6 @@ static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
>  	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_gcs(regs, esr);
> -	local_daif_mask();
>  	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
> @@ -357,7 +355,6 @@ static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
>  	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_mops(regs, esr);
> -	local_daif_mask();
>  	arm64_exit_to_kernel_mode(regs, state);
>  }
>  
> @@ -423,7 +420,6 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
>  	state = arm64_enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_el1_fpac(regs, esr);
> -	local_daif_mask();
>  	arm64_exit_to_kernel_mode(regs, state);
>  }
>  

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time
  2026-04-07 13:16 ` [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time Mark Rutland
@ 2026-04-08  2:17   ` Jinjie Ruan
  2026-04-08  9:08     ` Mark Rutland
  0 siblings, 1 reply; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  2:17 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: ada.coupriediaz, linux-kernel, luto, peterz, tglx,
	vladimir.murzin



On 2026/4/7 21:16, Mark Rutland wrote:
> When __switch_to() switches from a 'prev' task to a 'next' task, various
> pieces of CPU state are expected to have specific values, such that
> these do not need to be saved/restored. If any of these hold an
> unexpected value when switching away from the prev task, they could lead
> to surprising behaviour in the context of the next task, and it would be
> difficult to determine where they were configured to their unexpected
> value.
> 
> Add some checks for DAIF and PMR at task-switch time so that we can
> detect such issues.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/process.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 489554931231e..ba9038434d2fb 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -699,6 +699,29 @@ void update_sctlr_el1(u64 sctlr)
>  	isb();
>  }
>  
> +static inline void debug_switch_state(void)
> +{
> +	if (system_uses_irq_prio_masking()) {
> +		unsigned long daif_expected = 0;
> +		unsigned long daif_actual = read_sysreg(daif);
> +		unsigned long pmr_expected = GIC_PRIO_IRQOFF;
> +		unsigned long pmr_actual = read_sysreg_s(SYS_ICC_PMR_EL1);
> +
> +		WARN_ONCE(daif_actual != daif_expected ||
> +			  pmr_actual != pmr_expected,
> +			  "Unexpected DAIF + PMR: 0x%lx + 0x%lx (expected 0x%lx + 0x%lx)\n",
> +			  daif_actual, pmr_actual,
> +			  daif_expected, pmr_expected);
> +	} else {
> +		unsigned long daif_expected = DAIF_PROCCTX_NOIRQ;
> +		unsigned long daif_actual = read_sysreg(daif);
> +
> +		WARN_ONCE(daif_actual != daif_expected,
> +			  "Unexpected DAIF value: 0x%lx (expected 0x%lx)\n",
> +			  daif_actual, daif_expected);
> +	}

This logic seems consistent with arm64's local_irq_disable()
implementation. Do we need to wrap these debug checks in a config option
(e.g., CONFIG_ARM64_DEBUG_PRIORITY_MASKING) to avoid unnecessary overhead?


__schedule()
  -> local_irq_disable()
    -> arch_local_irq_disable()

52 static __always_inline void __daif_local_irq_disable(void)
 53 {
 54         barrier();
 55         asm volatile("msr daifset, #3");
 56         barrier();
 57 }
 58
 59 static __always_inline void __pmr_local_irq_disable(void)
 60 {
 61         if (IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING)) {
 62                 u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
 63                 WARN_ON_ONCE(pmr != GIC_PRIO_IRQON && pmr !=
GIC_PRIO_IRQOFF);
 64         }
 65
 66         barrier();
 67         write_sysreg_s(GIC_PRIO_IRQOFF, SYS_ICC_PMR_EL1);
 68         barrier();
 69 }
 70
 71 static inline void arch_local_irq_disable(void)
 72 {
 73         if (system_uses_irq_prio_masking()) {
 74                 __pmr_local_irq_disable();
 75         } else {
 76                 __daif_local_irq_disable();
 77         }
 78 }


> +}
> +
>  /*
>   * Thread switching.
>   */
> @@ -708,6 +731,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
>  {
>  	struct task_struct *last;
>  
> +	debug_switch_state();
> +
>  	fpsimd_thread_switch(next);
>  	tls_thread_switch(next);
>  	hw_breakpoint_thread_switch(next);

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/10] arm64/entry:
  2026-04-07 21:08 ` [PATCH 00/10] arm64/entry: Thomas Gleixner
@ 2026-04-08  9:02   ` Mark Rutland
  2026-04-08  9:06     ` Catalin Marinas
  2026-04-08  9:19   ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Mark Rutland @ 2026-04-08  9:02 UTC (permalink / raw)
  To: Thomas Gleixner, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	ada.coupriediaz, linux-kernel, ruanjinjie, vladimir.murzin

On Tue, Apr 07, 2026 at 11:08:36PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 07 2026 at 14:16, Mark Rutland wrote:
> > I've split the series into a prefix of changes for generic irqentry,
> > followed by changes to the arm64 code. I'm hoping that we can queue the
> > generic irqentry patches onto a stable branch, or take those via arm64.
> > The patches are as follows:
> >
> > * Patches 1 and 2 are cleanup to the generic irqentry code. These have no
> >   functional impact, and I think these can be taken regardless of the
> >   rest of the series.
> >
> > * Patches 3 to 5 refactor the generic irqentry code as described above,
> >   providing separate irqentry_{enter,exit}() functions and providing a
> >   split form of irqentry_exit_to_kernel_mode() similar to what exists
> >   for irqentry_exit_to_user_mode(). These patches alone should have no
> >   functional impact.
> 
> I looked through them and I can't find any problem with them. I queued
> them locally and added the missing kernel doc as I promised you on IRC.

Thanks! Much appreciated!

> As I have quite a conflict pending in the tip tree with other changes
> related to the generic entry code, I suggest that I queue 1-5, tag them
> for arm64 consumption and merge them into the conflicting branch to
> avoid trouble with pull request ordering and headaches for the -next
> people.
> 
> Does that work for you?

That sounds good to me.

Catalin, Will, does that work for you?

Mark.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/10] arm64/entry:
  2026-04-08  9:02   ` Mark Rutland
@ 2026-04-08  9:06     ` Catalin Marinas
  2026-04-08 10:14       ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Catalin Marinas @ 2026-04-08  9:06 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Will Deacon, linux-arm-kernel, Andy Lutomirski,
	Peter Zijlstra, ada.coupriediaz, linux-kernel, ruanjinjie,
	vladimir.murzin

On Wed, Apr 08, 2026 at 10:02:28AM +0100, Mark Rutland wrote:
> On Tue, Apr 07, 2026 at 11:08:36PM +0200, Thomas Gleixner wrote:
> > On Tue, Apr 07 2026 at 14:16, Mark Rutland wrote:
> > > I've split the series into a prefix of changes for generic irqentry,
> > > followed by changes to the arm64 code. I'm hoping that we can queue the
> > > generic irqentry patches onto a stable branch, or take those via arm64.
> > > The patches are as follows:
> > >
> > > * Patches 1 and 2 are cleanup to the generic irqentry code. These have no
> > >   functional impact, and I think these can be taken regardless of the
> > >   rest of the series.
> > >
> > > * Patches 3 to 5 refactor the generic irqentry code as described above,
> > >   providing separate irqentry_{enter,exit}() functions and providing a
> > >   split form of irqentry_exit_to_kernel_mode() similar to what exists
> > >   for irqentry_exit_to_user_mode(). These patches alone should have no
> > >   functional impact.
> > 
> > I looked through them and I can't find any problem with them. I queued
> > them locally and added the missing kernel doc as I promised you on IRC.
> 
> Thanks! Much appreciated!
> 
> > As I have quite a conflict pending in the tip tree with other changes
> > related to the generic entry code, I suggest that I queue 1-5, tag them
> > for arm64 consumption and merge them into the conflicting branch to
> > avoid trouble with pull request ordering and headaches for the -next
> > people.
> > 
> > Does that work for you?
> 
> That sounds good to me.
> 
> Catalin, Will, does that work for you?

Yes, it does. Thanks!

-- 
Catalin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time
  2026-04-08  2:17   ` Jinjie Ruan
@ 2026-04-08  9:08     ` Mark Rutland
  0 siblings, 0 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-08  9:08 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, ada.coupriediaz,
	linux-kernel, luto, peterz, tglx, vladimir.murzin

On Wed, Apr 08, 2026 at 10:17:56AM +0800, Jinjie Ruan wrote:
> On 2026/4/7 21:16, Mark Rutland wrote:
> > +static inline void debug_switch_state(void)
> > +{
> > +	if (system_uses_irq_prio_masking()) {
> > +		unsigned long daif_expected = 0;
> > +		unsigned long daif_actual = read_sysreg(daif);
> > +		unsigned long pmr_expected = GIC_PRIO_IRQOFF;
> > +		unsigned long pmr_actual = read_sysreg_s(SYS_ICC_PMR_EL1);
> > +
> > +		WARN_ONCE(daif_actual != daif_expected ||
> > +			  pmr_actual != pmr_expected,
> > +			  "Unexpected DAIF + PMR: 0x%lx + 0x%lx (expected 0x%lx + 0x%lx)\n",
> > +			  daif_actual, pmr_actual,
> > +			  daif_expected, pmr_expected);
> > +	} else {
> > +		unsigned long daif_expected = DAIF_PROCCTX_NOIRQ;
> > +		unsigned long daif_actual = read_sysreg(daif);
> > +
> > +		WARN_ONCE(daif_actual != daif_expected,
> > +			  "Unexpected DAIF value: 0x%lx (expected 0x%lx)\n",
> > +			  daif_actual, daif_expected);
> > +	}
> 
> This logic seems consistent with arm64's local_irq_disable()
> implementation. Do we need to wrap these debug checks in a config option
> (e.g., CONFIG_ARM64_DEBUG_PRIORITY_MASKING) to avoid unnecessary overhead?

Possibly. I'd expected this was infrequent enough that there wouldn't be
a noticeable overhead, but admittedly I don't have numbers.

Given Thomas seems happy to queue the preparatory bits, (hopefully) we
can queue the rest of this as-is, and I reckon it's probably best to
drop this patch for now and follow up with a better version later.

There are some other bits of state I'd like to check here (e.g. PAN),
and I think this requires a bit more work.

Thanks for looking at this!

Mark.

> 
> 
> __schedule()
>   -> local_irq_disable()
>     -> arch_local_irq_disable()
> 
> 52 static __always_inline void __daif_local_irq_disable(void)
>  53 {
>  54         barrier();
>  55         asm volatile("msr daifset, #3");
>  56         barrier();
>  57 }
>  58
>  59 static __always_inline void __pmr_local_irq_disable(void)
>  60 {
>  61         if (IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING)) {
>  62                 u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
>  63                 WARN_ON_ONCE(pmr != GIC_PRIO_IRQON && pmr !=
> GIC_PRIO_IRQOFF);
>  64         }
>  65
>  66         barrier();
>  67         write_sysreg_s(GIC_PRIO_IRQOFF, SYS_ICC_PMR_EL1);
>  68         barrier();
>  69 }
>  70
>  71 static inline void arch_local_irq_disable(void)
>  72 {
>  73         if (system_uses_irq_prio_masking()) {
>  74                 __pmr_local_irq_disable();
>  75         } else {
>  76                 __daif_local_irq_disable();
>  77         }
>  78 }
> 
> 
> > +}
> > +
> >  /*
> >   * Thread switching.
> >   */
> > @@ -708,6 +731,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
> >  {
> >  	struct task_struct *last;
> >  
> > +	debug_switch_state();
> > +
> >  	fpsimd_thread_switch(next);
> >  	tls_thread_switch(next);
> >  	hw_breakpoint_thread_switch(next);

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode()
  2026-04-07 13:16 ` [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode() Mark Rutland
  2026-04-08  1:40   ` Jinjie Ruan
@ 2026-04-08  9:17   ` Jinjie Ruan
  2026-04-08 10:19     ` Mark Rutland
  2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
  2 siblings, 1 reply; 34+ messages in thread
From: Jinjie Ruan @ 2026-04-08  9:17 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: ada.coupriediaz, catalin.marinas, linux-kernel, vladimir.murzin,
	will



On 2026/4/7 21:16, Mark Rutland wrote:
> Some architecture-specific work needs to be performed between the state
> management for exception entry/exit and the "real" work to handle the
> exception. For example, arm64 needs to manipulate a number of exception
> masking bits, with different exceptions requiring different masking.
> 
> Generally this can all be hidden in the architecture code, but for arm64
> the current structure of irqentry_exit_to_kernel_mode() makes this
> particularly difficult to handle in a way that is correct, maintainable,
> and efficient.
> 
> The gory details are described in the thread surrounding:
> 
>   https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/
> 
> The summary is:
> 
> * Currently, irqentry_exit_to_kernel_mode() handles both involuntary
>   preemption AND state management necessary for exception return.
> 
> * When scheduling (including involuntary preemption), arm64 needs to
>   have all arm64-specific exceptions unmasked, though regular interrupts
>   must be masked.
> 
> * Prior to the state management for exception return, arm64 needs to
>   mask a number of arm64-specific exceptions, and perform some work with
>   these exceptions masked (with RCU watching, etc).
> 
> While in theory it is possible to handle this with a new arch_*() hook
> called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
> and complicated, and doesn't match the flow used for exception return to
> user mode, which has a separate 'prepare' step (where preemption can
> occur) prior to the state management.
> 
> To solve this, refactor irqentry_exit_to_kernel_mode() to match the
> style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
> into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
> state management into a new irqentry_exit_to_kernel_mode_after_preempt()
> function. The existing irqentry_exit_to_kernel_mode() is left as a
> caller of both of these, avoiding the need to modify existing callers.
> 
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  include/linux/irq-entry-common.h | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
> comments for these new functions because the existing comments don't
> seem all that consistent (e.g. for user mode vs kernel mode), and I
> suspect we want to rewrite them all in one go for wider consistency.
> 
> I'm happy to respin this, or to follow-up with that as per your
> preference.
> 
> Mark.
> 
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> index 2206150e526d8..24830baa539c6 100644
> --- a/include/linux/irq-entry-common.h
> +++ b/include/linux/irq-entry-common.h
> @@ -421,10 +421,18 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
>  	return ret;
>  }
>  
> -static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
> +static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs, irqentry_state_t state)
>  {
> -	lockdep_assert_irqs_disabled();
> +	if (regs_irqs_disabled(regs) || state.exit_rcu)
> +		return;
> +
> +	if (IS_ENABLED(CONFIG_PREEMPTION))
> +		irqentry_exit_cond_resched();
> +}
>  
> +static __always_inline void
> +irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
> +{
>  	if (!regs_irqs_disabled(regs)) {
>  		/*
>  		 * If RCU was not watching on entry this needs to be done
> @@ -443,9 +451,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
>  		}
>  
>  		instrumentation_begin();
> -		if (IS_ENABLED(CONFIG_PREEMPTION))
> -			irqentry_exit_cond_resched();
> -
>  		/* Covers both tracing and lockdep */
>  		trace_hardirqs_on();
>  		instrumentation_end();
> @@ -459,6 +464,17 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
>  	}
>  }
>  
> +static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	instrumentation_begin();
> +	irqentry_exit_to_kernel_mode_preempt(regs, state);
> +	instrumentation_end();

I think the AI feedback below makes sense. Directly calling
irqentry_exit_to_kernel_mode_preempt() on arm64/other archs could lead
to missing instrumentation_begin()/end() markers.

https://sashiko.dev/#/patchset/20260407131650.3813777-1-mark.rutland%40arm.com

> +
> +	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
> +}
> +
>  /**
>   * irqentry_enter - Handle state tracking on ordinary interrupt entries
>   * @regs:	Pointer to pt_regs of interrupted context

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/10] arm64/entry:
  2026-04-07 21:08 ` [PATCH 00/10] arm64/entry: Thomas Gleixner
  2026-04-08  9:02   ` Mark Rutland
@ 2026-04-08  9:19   ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2026-04-08  9:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mark Rutland, linux-arm-kernel, Andy Lutomirski, Catalin Marinas,
	Will Deacon, ada.coupriediaz, linux-kernel, ruanjinjie,
	vladimir.murzin

On Tue, Apr 07, 2026 at 11:08:36PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 07 2026 at 14:16, Mark Rutland wrote:
> > I've split the series into a prefix of changes for generic irqentry,
> > followed by changes to the arm64 code. I'm hoping that we can queue the
> > generic irqentry patches onto a stable branch, or take those via arm64.
> > The patches are as follows:
> >
> > * Patches 1 and 2 are cleanup to the generic irqentry code. These have no
> >   functional impact, and I think these can be taken regardless of the
> >   rest of the series.
> >
> > * Patches 3 to 5 refactor the generic irqentry code as described above,
> >   providing separate irqentry_{enter,exit}() functions and providing a
> >   split form of irqentry_exit_to_kernel_mode() similar to what exists
> >   for irqentry_exit_to_user_mode(). These patches alone should have no
> >   functional impact.
> 
> I looked through them and I can't find any problem with them. I queued
> them locally and added the missing kernel doc as I promised you on IRC.
> 
> As I have quite a conflict pending in the tip tree with other changes
> related to the generic entry code, I suggest that I queue 1-5, tag them
> for arm64 consumption and merge them into the conflicting branch to
> avoid trouble with pull request ordering and headaches for the -next
> people.

FWIW, for those 1-5

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [tip: sched/hrtick] entry: Split preemption from irqentry_exit_to_kernel_mode()
  2026-04-07 13:16 ` [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode() Mark Rutland
  2026-04-08  1:40   ` Jinjie Ruan
  2026-04-08  9:17   ` Jinjie Ruan
@ 2026-04-08 10:10   ` tip-bot2 for Mark Rutland
  2 siblings, 0 replies; 34+ messages in thread
From: tip-bot2 for Mark Rutland @ 2026-04-08 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Thomas Gleixner, Jinjie Ruan,
	Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/hrtick branch of tip:

Commit-ID:     041aa7a85390c99b1de86dc28eddcff0890d8186
Gitweb:        https://git.kernel.org/tip/041aa7a85390c99b1de86dc28eddcff0890d8186
Author:        Mark Rutland <mark.rutland@arm.com>
AuthorDate:    Tue, 07 Apr 2026 14:16:45 +01:00
Committer:     Thomas Gleixner <tglx@kernel.org>
CommitterDate: Wed, 08 Apr 2026 11:43:32 +02:00

entry: Split preemption from irqentry_exit_to_kernel_mode()

Some architecture-specific work needs to be performed between the state
management for exception entry/exit and the "real" work to handle the
exception. For example, arm64 needs to manipulate a number of exception
masking bits, with different exceptions requiring different masking.

Generally this can all be hidden in the architecture code, but for arm64
the current structure of irqentry_exit_to_kernel_mode() makes this
particularly difficult to handle in a way that is correct, maintainable,
and efficient.

The gory details are described in the thread surrounding:

  https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/

The summary is:

* Currently, irqentry_exit_to_kernel_mode() handles both involuntary
  preemption AND state management necessary for exception return.

* When scheduling (including involuntary preemption), arm64 needs to
  have all arm64-specific exceptions unmasked, though regular interrupts
  must be masked.

* Prior to the state management for exception return, arm64 needs to
  mask a number of arm64-specific exceptions, and perform some work with
  these exceptions masked (with RCU watching, etc).

While in theory it is possible to handle this with a new arch_*() hook
called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
and complicated, and doesn't match the flow used for exception return to
user mode, which has a separate 'prepare' step (where preemption can
occur) prior to the state management.

To solve this, refactor irqentry_exit_to_kernel_mode() to match the
style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
state management into a new irqentry_exit_to_kernel_mode_after_preempt()
function. The existing irqentry_exit_to_kernel_mode() is left as a
caller of both of these, avoiding the need to modify existing callers.

There should be no functional change as a result of this change.

[ tglx: Updated kernel doc ]

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-6-mark.rutland@arm.com
---
 include/linux/irq-entry-common.h | 73 +++++++++++++++++++++++++------
 1 file changed, 59 insertions(+), 14 deletions(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 66bc168..3845202 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -438,24 +438,46 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
 }
 
 /**
- * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
- *				  invoking the interrupt handler
+ * irqentry_exit_to_kernel_mode_preempt - Run preempt checks on return to kernel mode
  * @regs:	Pointer to current's pt_regs
  * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
  *
- * This is the counterpart of irqentry_enter_from_kernel_mode() and runs the
- * necessary preemption check if possible and required. It returns to the caller
- * with interrupts disabled and the correct state vs. tracing, lockdep and RCU
- * required to return to the interrupted context.
+ * This is to be invoked before irqentry_exit_to_kernel_mode_after_preempt() to
+ * allow kernel preemption on return from interrupt.
+ *
+ * Must be invoked with interrupts disabled and CPU state which allows kernel
+ * preemption.
  *
- * It is the last action before returning to the low level ASM code which just
- * needs to return.
+ * After returning from this function, the caller can modify CPU state before
+ * invoking irqentry_exit_to_kernel_mode_after_preempt(), which is required to
+ * re-establish the tracing, lockdep and RCU state for returning to the
+ * interrupted context.
  */
-static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
-							 irqentry_state_t state)
+static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs,
+							irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
+	if (regs_irqs_disabled(regs) || state.exit_rcu)
+		return;
+
+	if (IS_ENABLED(CONFIG_PREEMPTION))
+		irqentry_exit_cond_resched();
+}
 
+/**
+ * irqentry_exit_to_kernel_mode_after_preempt - Establish trace, lockdep and RCU state
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is to be invoked after irqentry_exit_to_kernel_mode_preempt() and before
+ * actually returning to the interrupted context.
+ *
+ * There are no requirements for the CPU state other than being able to complete
+ * the tracing, lockdep and RCU state transitions. After this function returns
+ * the caller must return directly to the interrupted context.
+ */
+static __always_inline void
+irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
+{
 	if (!regs_irqs_disabled(regs)) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
@@ -474,9 +496,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
 		}
 
 		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
@@ -491,6 +510,32 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
 }
 
 /**
+ * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
+ *				  invoking the interrupt handler
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is the counterpart of irqentry_enter_from_kernel_mode() and combines
+ * the calls to irqentry_exit_to_kernel_mode_preempt() and
+ * irqentry_exit_to_kernel_mode_after_preempt().
+ *
+ * The requirement for the CPU state is that it can schedule. After the function
+ * returns the tracing, lockdep and RCU state transitions are completed and the
+ * caller must return directly to the interrupted context.
+ */
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
+							 irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	instrumentation_begin();
+	irqentry_exit_to_kernel_mode_preempt(regs, state);
+	instrumentation_end();
+
+	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
+}
+
+/**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
  *

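To make the intended call sequence of the split concrete, here is a self-contained userspace mock of the two-step exit path introduced by this patch. Every name and all the state below are stand-ins for illustration only, not the real kernel API; the point is the ordering: the preemption check runs first, arch-specific work may happen in between, and the state re-establishment must be the last step before returning.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of irqentry_state_t: only the exit_rcu flag matters here. */
typedef struct { bool exit_rcu; } irqentry_state_t;

static bool irqs_enabled_in_regs;  /* stand-in for !regs_irqs_disabled(regs) */
static int cond_resched_calls;
static int state_restore_calls;

static void irqentry_exit_cond_resched(void) { cond_resched_calls++; }

/* Step 1: preemption check only -- may schedule, touches no entry state. */
static void irqentry_exit_to_kernel_mode_preempt(irqentry_state_t state)
{
	if (!irqs_enabled_in_regs || state.exit_rcu)
		return;
	irqentry_exit_cond_resched();
}

/* Step 2: re-establish tracing/lockdep/RCU state; last action before
 * returning to the interrupted context. */
static void irqentry_exit_to_kernel_mode_after_preempt(irqentry_state_t state)
{
	(void)state;
	state_restore_calls++;
}

/* An architecture can now insert its own work between the two steps,
 * e.g. arm64 masking additional exceptions before the state transition. */
static void arch_exit_to_kernel_mode(irqentry_state_t state)
{
	irqentry_exit_to_kernel_mode_preempt(state);
	/* arch-specific masking would go here */
	irqentry_exit_to_kernel_mode_after_preempt(state);
}
```

Note that, as in the real patch, no preemption happens when interrupts were disabled in the interrupted context or when RCU was not watching on entry, while the state-restore step always runs.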
^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [tip: sched/hrtick] entry: Split kernel mode logic from irqentry_{enter,exit}()
  2026-04-07 13:16 ` [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}() Mark Rutland
  2026-04-08  1:32   ` Jinjie Ruan
@ 2026-04-08 10:10   ` tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: tip-bot2 for Mark Rutland @ 2026-04-08 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Thomas Gleixner, Jinjie Ruan,
	Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/hrtick branch of tip:

Commit-ID:     c5538d0141b383808f440186fcd0bc2799af2853
Gitweb:        https://git.kernel.org/tip/c5538d0141b383808f440186fcd0bc2799af2853
Author:        Mark Rutland <mark.rutland@arm.com>
AuthorDate:    Tue, 07 Apr 2026 14:16:44 +01:00
Committer:     Thomas Gleixner <tglx@kernel.org>
CommitterDate: Wed, 08 Apr 2026 11:43:32 +02:00

entry: Split kernel mode logic from irqentry_{enter,exit}()

The generic irqentry code has entry/exit functions specifically for
exceptions taken from user mode, but doesn't have entry/exit functions
specifically for exceptions taken from kernel mode.

It would be helpful to have separate entry/exit functions specifically
for exceptions taken from kernel mode. This would make the structure of
the entry code more consistent, and would make it easier for
architectures to manage logic specific to exceptions taken from kernel
mode.

Move the logic specific to kernel mode out of irqentry_enter() and
irqentry_exit() into new irqentry_enter_from_kernel_mode() and
irqentry_exit_to_kernel_mode() functions. These are marked
__always_inline and placed in irq-entry-common.h, as with
irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so
that they can be inlined into architecture-specific wrappers. The
existing out-of-line irqentry_enter() and irqentry_exit() functions are
retained as callers of the new functions.

The lockdep assertion from irqentry_exit() is moved into
irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This
was previously missing from irqentry_exit_to_user_mode() when called
directly, and any new lockdep assertion failure relating from this
change is a latent bug.

Aside from the lockdep change noted above, there should be no functional
change as a result of this change.

[ tglx: Updated kernel doc ]

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-5-mark.rutland@arm.com
---
 include/linux/irq-entry-common.h | 134 ++++++++++++++++++++++++++++++-
 kernel/entry/common.c            | 103 +----------------------
 2 files changed, 142 insertions(+), 95 deletions(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index d1e8591..66bc168 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -304,6 +304,8 @@ static __always_inline void irqentry_enter_from_user_mode(struct pt_regs *regs)
  */
 static __always_inline void irqentry_exit_to_user_mode(struct pt_regs *regs)
 {
+	lockdep_assert_irqs_disabled();
+
 	instrumentation_begin();
 	irqentry_exit_to_user_mode_prepare(regs);
 	instrumentation_end();
@@ -357,6 +359,138 @@ void dynamic_irqentry_exit_cond_resched(void);
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 
 /**
+ * irqentry_enter_from_kernel_mode - Establish state before invoking the irq handler
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from architecture specific entry code with interrupts disabled.
+ * Can only be called when the interrupt entry came from kernel mode. The
+ * calling code must be non-instrumentable.  When the function returns all
+ * state is correct and the subsequent functions can be instrumented.
+ *
+ * The function establishes state (lockdep, RCU (context tracking), tracing) and
+ * is provided for architectures which require a strict split between entry from
+ * kernel and user mode and therefore cannot use irqentry_enter() which handles
+ * both entry modes.
+ *
+ * Returns: An opaque object that must be passed to irqentry_exit_to_kernel_mode().
+ */
+static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct pt_regs *regs)
+{
+	irqentry_state_t ret = {
+		.exit_rcu = false,
+	};
+
+	/*
+	 * If this entry hit the idle task invoke ct_irq_enter() whether
+	 * RCU is watching or not.
+	 *
+	 * Interrupts can nest when the first interrupt invokes softirq
+	 * processing on return which enables interrupts.
+	 *
+	 * Scheduler ticks in the idle task can mark quiescent state and
+	 * terminate a grace period, if and only if the timer interrupt is
+	 * not nested into another interrupt.
+	 *
+	 * Checking for rcu_is_watching() here would prevent the nesting
+	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
+	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
+	 * assume that it is the first interrupt and eventually claim
+	 * quiescent state and end grace periods prematurely.
+	 *
+	 * Unconditionally invoke ct_irq_enter() so RCU state stays
+	 * consistent.
+	 *
+	 * TINY_RCU does not support EQS, so let the compiler eliminate
+	 * this part when enabled.
+	 */
+	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
+	    (is_idle_task(current) || arch_in_rcu_eqs())) {
+		/*
+		 * If RCU is not watching then the same careful
+		 * sequence vs. lockdep and tracing is required
+		 * as in irqentry_enter_from_user_mode().
+		 */
+		lockdep_hardirqs_off(CALLER_ADDR0);
+		ct_irq_enter();
+		instrumentation_begin();
+		kmsan_unpoison_entry_regs(regs);
+		trace_hardirqs_off_finish();
+		instrumentation_end();
+
+		ret.exit_rcu = true;
+		return ret;
+	}
+
+	/*
+	 * If RCU is watching then RCU only wants to check whether it needs
+	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
+	 * already contains a warning when RCU is not watching, so no point
+	 * in having another one here.
+	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
+	rcu_irq_enter_check_tick();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+
+	return ret;
+}
+
+/**
+ * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
+ *				  invoking the interrupt handler
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is the counterpart of irqentry_enter_from_kernel_mode() and runs the
+ * necessary preemption check if possible and required. It returns to the caller
+ * with interrupts disabled and the correct state vs. tracing, lockdep and RCU
+ * required to return to the interrupted context.
+ *
+ * It is the last action before returning to the low level ASM code which just
+ * needs to return.
+ */
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
+							 irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (!regs_irqs_disabled(regs)) {
+		/*
+		 * If RCU was not watching on entry this needs to be done
+		 * carefully and needs the same ordering of lockdep/tracing
+		 * and RCU as the return to user mode path.
+		 */
+		if (state.exit_rcu) {
+			instrumentation_begin();
+			/* Tell the tracer that IRET will enable interrupts */
+			trace_hardirqs_on_prepare();
+			lockdep_hardirqs_on_prepare();
+			instrumentation_end();
+			ct_irq_exit();
+			lockdep_hardirqs_on(CALLER_ADDR0);
+			return;
+		}
+
+		instrumentation_begin();
+		if (IS_ENABLED(CONFIG_PREEMPTION))
+			irqentry_exit_cond_resched();
+
+		/* Covers both tracing and lockdep */
+		trace_hardirqs_on();
+		instrumentation_end();
+	} else {
+		/*
+		 * IRQ flags state is correct already. Just tell RCU if it
+		 * was not watching on entry.
+		 */
+		if (state.exit_rcu)
+			ct_irq_exit();
+	}
+}
+
+/**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
  *
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index b5e05d8..1034be0 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -105,70 +105,16 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 
 noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 {
-	irqentry_state_t ret = {
-		.exit_rcu = false,
-	};
-
 	if (user_mode(regs)) {
-		irqentry_enter_from_user_mode(regs);
-		return ret;
-	}
+		irqentry_state_t ret = {
+			.exit_rcu = false,
+		};
 
-	/*
-	 * If this entry hit the idle task invoke ct_irq_enter() whether
-	 * RCU is watching or not.
-	 *
-	 * Interrupts can nest when the first interrupt invokes softirq
-	 * processing on return which enables interrupts.
-	 *
-	 * Scheduler ticks in the idle task can mark quiescent state and
-	 * terminate a grace period, if and only if the timer interrupt is
-	 * not nested into another interrupt.
-	 *
-	 * Checking for rcu_is_watching() here would prevent the nesting
-	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
-	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
-	 * assume that it is the first interrupt and eventually claim
-	 * quiescent state and end grace periods prematurely.
-	 *
-	 * Unconditionally invoke ct_irq_enter() so RCU state stays
-	 * consistent.
-	 *
-	 * TINY_RCU does not support EQS, so let the compiler eliminate
-	 * this part when enabled.
-	 */
-	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
-	    (is_idle_task(current) || arch_in_rcu_eqs())) {
-		/*
-		 * If RCU is not watching then the same careful
-		 * sequence vs. lockdep and tracing is required
-		 * as in irqentry_enter_from_user_mode().
-		 */
-		lockdep_hardirqs_off(CALLER_ADDR0);
-		ct_irq_enter();
-		instrumentation_begin();
-		kmsan_unpoison_entry_regs(regs);
-		trace_hardirqs_off_finish();
-		instrumentation_end();
-
-		ret.exit_rcu = true;
+		irqentry_enter_from_user_mode(regs);
 		return ret;
 	}
 
-	/*
-	 * If RCU is watching then RCU only wants to check whether it needs
-	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
-	 * already contains a warning when RCU is not watching, so no point
-	 * in having another one here.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	instrumentation_begin();
-	kmsan_unpoison_entry_regs(regs);
-	rcu_irq_enter_check_tick();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
-
-	return ret;
+	return irqentry_enter_from_kernel_mode(regs);
 }
 
 /**
@@ -212,43 +158,10 @@ void dynamic_irqentry_exit_cond_resched(void)
 
 noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
-
-	/* Check whether this returns to user mode */
-	if (user_mode(regs)) {
+	if (user_mode(regs))
 		irqentry_exit_to_user_mode(regs);
-	} else if (!regs_irqs_disabled(regs)) {
-		/*
-		 * If RCU was not watching on entry this needs to be done
-		 * carefully and needs the same ordering of lockdep/tracing
-		 * and RCU as the return to user mode path.
-		 */
-		if (state.exit_rcu) {
-			instrumentation_begin();
-			/* Tell the tracer that IRET will enable interrupts */
-			trace_hardirqs_on_prepare();
-			lockdep_hardirqs_on_prepare();
-			instrumentation_end();
-			ct_irq_exit();
-			lockdep_hardirqs_on(CALLER_ADDR0);
-			return;
-		}
-
-		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
-		/* Covers both tracing and lockdep */
-		trace_hardirqs_on();
-		instrumentation_end();
-	} else {
-		/*
-		 * IRQ flags state is correct already. Just tell RCU if it
-		 * was not watching on entry.
-		 */
-		if (state.exit_rcu)
-			ct_irq_exit();
-	}
+	else
+		irqentry_exit_to_kernel_mode(regs, state);
 }
 
 irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)

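As a rough illustration of what this split enables, the sketch below mocks an architecture-specific handler built directly on the new kernel-mode helpers, skipping the user/kernel dispatch of irqentry_enter()/irqentry_exit(). All names here are userspace stand-ins (the hypothetical arch_irq_fixup() in particular is invented for illustration); the real helpers live in irq-entry-common.h so they can be inlined into such wrappers.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of irqentry_state_t from the generic entry code. */
typedef struct { bool exit_rcu; } irqentry_state_t;

static int enter_calls, arch_fixup_calls, exit_calls;

/* Stand-in for the inlinable kernel-mode entry helper. */
static irqentry_state_t irqentry_enter_from_kernel_mode(void)
{
	irqentry_state_t s = { .exit_rcu = false };
	enter_calls++;
	return s;
}

/* Stand-in for the matching kernel-mode exit helper. */
static void irqentry_exit_to_kernel_mode(irqentry_state_t s)
{
	(void)s;
	exit_calls++;
}

/* Hypothetical arch hook, e.g. arm64 adjusting exception masking. */
static void arch_irq_fixup(void)
{
	arch_fixup_calls++;
}

/* With mode-specific helpers, a handler that is only ever reached from
 * kernel mode no longer needs the user_mode(regs) dispatch. */
static void arch_el1_irq_handler(void)
{
	irqentry_state_t s = irqentry_enter_from_kernel_mode();
	arch_irq_fixup();	/* arch-specific work with state established */
	irqentry_exit_to_kernel_mode(s);
}
```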

* [tip: sched/hrtick] entry: Move irqentry_enter() prototype later
  2026-04-07 13:16 ` [PATCH 03/10] entry: Move irqentry_enter() prototype later Mark Rutland
  2026-04-08  1:21   ` Jinjie Ruan
@ 2026-04-08 10:10   ` tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: tip-bot2 for Mark Rutland @ 2026-04-08 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Thomas Gleixner, Jinjie Ruan,
	Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/hrtick branch of tip:

Commit-ID:     eb1b51afde506a8e38976190e518990d69ef5382
Gitweb:        https://git.kernel.org/tip/eb1b51afde506a8e38976190e518990d69ef5382
Author:        Mark Rutland <mark.rutland@arm.com>
AuthorDate:    Tue, 07 Apr 2026 14:16:43 +01:00
Committer:     Thomas Gleixner <tglx@kernel.org>
CommitterDate: Wed, 08 Apr 2026 11:43:31 +02:00

entry: Move irqentry_enter() prototype later

Subsequent patches will rework the irqentry_*() functions. The end
result (and the intermediate diffs) will be much clearer if the
prototype for the irqentry_enter() function is moved later, immediately
before the prototype of the irqentry_exit() function.

Move the prototype later.

This is purely a move; there should be no functional change as a result
of this change.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-4-mark.rutland@arm.com
---
 include/linux/irq-entry-common.h | 44 +++++++++++++++----------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 93b4b55..d1e8591 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -335,6 +335,28 @@ typedef struct irqentry_state {
 #endif
 
 /**
+ * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
+ *
+ * Conditional reschedule with additional sanity checks.
+ */
+void raw_irqentry_exit_cond_resched(void);
+
+#ifdef CONFIG_PREEMPT_DYNAMIC
+#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
+#define irqentry_exit_cond_resched_dynamic_disabled	NULL
+DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
+#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
+#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
+DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+void dynamic_irqentry_exit_cond_resched(void);
+#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
+#endif
+#else /* CONFIG_PREEMPT_DYNAMIC */
+#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
+#endif /* CONFIG_PREEMPT_DYNAMIC */
+
+/**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
  *
@@ -368,28 +390,6 @@ typedef struct irqentry_state {
 irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
 
 /**
- * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
- *
- * Conditional reschedule with additional sanity checks.
- */
-void raw_irqentry_exit_cond_resched(void);
-
-#ifdef CONFIG_PREEMPT_DYNAMIC
-#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
-#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
-#define irqentry_exit_cond_resched_dynamic_disabled	NULL
-DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
-#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
-#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
-DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-void dynamic_irqentry_exit_cond_resched(void);
-#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
-#endif
-#else /* CONFIG_PREEMPT_DYNAMIC */
-#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
-#endif /* CONFIG_PREEMPT_DYNAMIC */
-
-/**
  * irqentry_exit - Handle return from exception that used irqentry_enter()
  * @regs:	Pointer to pt_regs (exception entry regs)
  * @state:	Return value from matching call to irqentry_enter()

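The irqentry_exit_cond_resched() indirection moved above implements boot-time-selectable preemption. A userspace sketch of the same dispatch pattern is shown below; note this deliberately uses a plain function pointer as a stand-in, whereas the kernel uses static calls or static keys precisely to avoid the indirect branch on the hot path.

```c
#include <assert.h>
#include <stddef.h>

static int resched_calls;

/* The default implementation, analogous to raw_irqentry_exit_cond_resched(). */
static void raw_irqentry_exit_cond_resched(void)
{
	resched_calls++;
}

/* Stand-in for the static-call target: boot code can retarget it, or set
 * it to NULL to disable preemption on IRQ exit (the "dynamic_disabled"
 * case in the macros above). */
static void (*irqentry_exit_cond_resched_call)(void) =
	raw_irqentry_exit_cond_resched;

/* Call sites use this single entry point regardless of the selected mode. */
static void irqentry_exit_cond_resched(void)
{
	if (irqentry_exit_cond_resched_call)
		irqentry_exit_cond_resched_call();
}
```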

* [tip: sched/hrtick] entry: Remove local_irq_{enable,disable}_exit_to_user()
  2026-04-07 13:16 ` [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user() Mark Rutland
  2026-04-08  1:18   ` Jinjie Ruan
@ 2026-04-08 10:10   ` tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: tip-bot2 for Mark Rutland @ 2026-04-08 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Thomas Gleixner, Jinjie Ruan,
	Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/hrtick branch of tip:

Commit-ID:     22f66e7ef4ce9414b4bd18abe50ead4a1284b01a
Gitweb:        https://git.kernel.org/tip/22f66e7ef4ce9414b4bd18abe50ead4a1284b01a
Author:        Mark Rutland <mark.rutland@arm.com>
AuthorDate:    Tue, 07 Apr 2026 14:16:42 +01:00
Committer:     Thomas Gleixner <tglx@kernel.org>
CommitterDate: Wed, 08 Apr 2026 11:43:31 +02:00

entry: Remove local_irq_{enable,disable}_exit_to_user()

local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user() are
never overridden by architecture code, and are always equivalent to
local_irq_enable() and local_irq_disable().

These functions were added on the assumption that arm64 would override
them to manage 'DAIF' exception masking, as described by Thomas Gleixner
in these threads:

  https://lore.kernel.org/all/20190919150809.340471236@linutronix.de/
  https://lore.kernel.org/all/alpine.DEB.2.21.1910240119090.1852@nanos.tec.linutronix.de/

In practice arm64 did not need to override either. Prior to moving to
the generic irqentry code, arm64's management of DAIF was reworked in
commit:

  97d935faacde ("arm64: Unmask Debug + SError in do_notify_resume()")

Since that commit, arm64 only masks interrupts during the 'prepare' step
when returning to user mode, and masks other DAIF exceptions later.
Within arm64_exit_to_user_mode(), the arm64 entry code is as follows:

	local_irq_disable();
	exit_to_user_mode_prepare_legacy(regs);
	local_daif_mask();
	mte_check_tfsr_exit();
	exit_to_user_mode();

Remove the unnecessary local_irq_enable_exit_to_user() and
local_irq_disable_exit_to_user() functions.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-3-mark.rutland@arm.com
---
 include/linux/entry-common.h     |  2 +-
 include/linux/irq-entry-common.h | 31 +-------------------------------
 kernel/entry/common.c            |  4 ++--
 3 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f83ca0a..dbaa153 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -321,7 +321,7 @@ static __always_inline void syscall_exit_to_user_mode(struct pt_regs *regs)
 {
 	instrumentation_begin();
 	syscall_exit_to_user_mode_work(regs);
-	local_irq_disable_exit_to_user();
+	local_irq_disable();
 	syscall_exit_to_user_mode_prepare(regs);
 	instrumentation_end();
 	exit_to_user_mode();
diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 3cf4d21..93b4b55 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -101,37 +101,6 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
 }
 
 /**
- * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
- * @ti_work:	Cached TIF flags gathered with interrupts disabled
- *
- * Defaults to local_irq_enable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
-
-#ifndef local_irq_enable_exit_to_user
-static __always_inline void local_irq_enable_exit_to_user(unsigned long ti_work)
-{
-	local_irq_enable();
-}
-#endif
-
-/**
- * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
- *
- * Defaults to local_irq_disable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_disable_exit_to_user(void);
-
-#ifndef local_irq_disable_exit_to_user
-static __always_inline void local_irq_disable_exit_to_user(void)
-{
-	local_irq_disable();
-}
-#endif
-
-/**
  * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
  *				 to user mode.
  * @regs:	Pointer to currents pt_regs
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9ef63e4..b5e05d8 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -47,7 +47,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
 	 */
 	while (ti_work & EXIT_TO_USER_MODE_WORK_LOOP) {
 
-		local_irq_enable_exit_to_user(ti_work);
+		local_irq_enable();
 
 		if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
 			if (!rseq_grant_slice_extension(ti_work & TIF_SLICE_EXT_DENY))
@@ -74,7 +74,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
 		 * might have changed while interrupts and preemption was
 		 * enabled above.
 		 */
-		local_irq_disable_exit_to_user();
+		local_irq_disable();
 
 		/* Check if any of the above work has queued a deferred wakeup */
 		tick_nohz_user_enter_prepare();


* [tip: sched/hrtick] entry: Fix stale comment for irqentry_enter()
  2026-04-07 13:16 ` [PATCH 01/10] entry: Fix stale comment for irqentry_enter() Mark Rutland
  2026-04-08  1:14   ` Jinjie Ruan
@ 2026-04-08 10:10   ` tip-bot2 for Mark Rutland
  1 sibling, 0 replies; 34+ messages in thread
From: tip-bot2 for Mark Rutland @ 2026-04-08 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Thomas Gleixner, Jinjie Ruan,
	Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/hrtick branch of tip:

Commit-ID:     1f0d117cd6ca8e74e70e415e89b059fce37674c6
Gitweb:        https://git.kernel.org/tip/1f0d117cd6ca8e74e70e415e89b059fce37674c6
Author:        Mark Rutland <mark.rutland@arm.com>
AuthorDate:    Tue, 07 Apr 2026 14:16:41 +01:00
Committer:     Thomas Gleixner <tglx@kernel.org>
CommitterDate: Wed, 08 Apr 2026 11:43:31 +02:00

entry: Fix stale comment for irqentry_enter()

The kerneldoc comment for irqentry_enter() refers to idtentry_exit(),
which is an accidental holdover from the x86 entry code that the generic
irqentry code was based on.

Correct this to refer to irqentry_exit().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-2-mark.rutland@arm.com
---
 include/linux/irq-entry-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index d26d1b1..3cf4d21 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -394,7 +394,7 @@ typedef struct irqentry_state {
  * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
  * would not be possible.
  *
- * Returns: An opaque object that must be passed to idtentry_exit()
+ * Returns: An opaque object that must be passed to irqentry_exit()
  */
 irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
 


* Re: [PATCH 00/10] arm64/entry:
  2026-04-08  9:06     ` Catalin Marinas
@ 2026-04-08 10:14       ` Thomas Gleixner
  0 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2026-04-08 10:14 UTC (permalink / raw)
  To: Catalin Marinas, Mark Rutland
  Cc: Will Deacon, linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	ada.coupriediaz, linux-kernel, ruanjinjie, vladimir.murzin

On Wed, Apr 08 2026 at 10:06, Catalin Marinas wrote:
> On Wed, Apr 08, 2026 at 10:02:28AM +0100, Mark Rutland wrote:
>> > Does that work for you?
>> 
>> That sounds good to me.
>> 
>> Catalin, Will, does that work for you?
>
> Yes, it does. Thanks!

Here you go:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git entry-for-arm64-26-04-08

Thanks,

        tglx


* Re: [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode()
  2026-04-08  9:17   ` Jinjie Ruan
@ 2026-04-08 10:19     ` Mark Rutland
  0 siblings, 0 replies; 34+ messages in thread
From: Mark Rutland @ 2026-04-08 10:19 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, ada.coupriediaz, catalin.marinas, linux-kernel,
	vladimir.murzin, will

On Wed, Apr 08, 2026 at 05:17:29PM +0800, Jinjie Ruan wrote:
> 
> 
> On 2026/4/7 21:16, Mark Rutland wrote:
> > Some architecture-specific work needs to be performed between the state
> > management for exception entry/exit and the "real" work to handle the
> > exception. For example, arm64 needs to manipulate a number of exception
> > masking bits, with different exceptions requiring different masking.
> > 
> > Generally this can all be hidden in the architecture code, but for arm64
> > the current structure of irqentry_exit_to_kernel_mode() makes this
> > particularly difficult to handle in a way that is correct, maintainable,
> > and efficient.
> > 
> > The gory details are described in the thread surrounding:
> > 
> >   https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/
> > 
> > The summary is:
> > 
> > * Currently, irqentry_exit_to_kernel_mode() handles both involuntary
> >   preemption AND state management necessary for exception return.
> > 
> > * When scheduling (including involuntary preemption), arm64 needs to
> >   have all arm64-specific exceptions unmasked, though regular interrupts
> >   must be masked.
> > 
> > * Prior to the state management for exception return, arm64 needs to
> >   mask a number of arm64-specific exceptions, and perform some work with
> >   these exceptions masked (with RCU watching, etc).
> > 
> > While in theory it is possible to handle this with a new arch_*() hook
> > called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
> > and complicated, and doesn't match the flow used for exception return to
> > user mode, which has a separate 'prepare' step (where preemption can
> > occur) prior to the state management.
> > 
> > To solve this, refactor irqentry_exit_to_kernel_mode() to match the
> > style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
> > into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
> > state management in a new irqentry_exit_to_kernel_mode_after_preempt()
> > function. The existing irqentry_exit_to_kernel_mode() is left as a
> > caller of both of these, avoiding the need to modify existing callers.
> > 
> > There should be no functional change as a result of this patch.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@kernel.org>
> > Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> >  include/linux/irq-entry-common.h | 26 +++++++++++++++++++++-----
> >  1 file changed, 21 insertions(+), 5 deletions(-)
> > 
> > Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
> > comments for these new functions because the existing comments don't
> > seem all that consistent (e.g. for user mode vs kernel mode), and I
> > suspect we want to rewrite them all in one go for wider consistency.
> > 
> > I'm happy to respin this, or to follow-up with that as per your
> > preference.
> > 
> > Mark.
> > 
> > diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> > index 2206150e526d8..24830baa539c6 100644
> > --- a/include/linux/irq-entry-common.h
> > +++ b/include/linux/irq-entry-common.h
> > @@ -421,10 +421,18 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
> >  	return ret;
> >  }
> >  
> > -static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
> > +static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs, irqentry_state_t state)
> >  {
> > -	lockdep_assert_irqs_disabled();
> > +	if (regs_irqs_disabled(regs) || state.exit_rcu)
> > +		return;
> > +
> > +	if (IS_ENABLED(CONFIG_PREEMPTION))
> > +		irqentry_exit_cond_resched();
> > +}
> >  
> > +static __always_inline void
> > +irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
> > +{
> >  	if (!regs_irqs_disabled(regs)) {
> >  		/*
> >  		 * If RCU was not watching on entry this needs to be done
> > @@ -443,9 +451,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
> >  		}
> >  
> >  		instrumentation_begin();
> > -		if (IS_ENABLED(CONFIG_PREEMPTION))
> > -			irqentry_exit_cond_resched();
> > -
> >  		/* Covers both tracing and lockdep */
> >  		trace_hardirqs_on();
> >  		instrumentation_end();
> > @@ -459,6 +464,17 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
> >  	}
> >  }
> >  
> > +static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
> > +{
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	instrumentation_begin();
> > +	irqentry_exit_to_kernel_mode_preempt(regs, state);
> > +	instrumentation_end();
> 
I think the AI feedback below makes sense. Directly calling
irqentry_exit_to_kernel_mode_preempt() on arm64/other archs could lead
to missing instrumentation_begin()/end() markers.
> 
> https://sashiko.dev/#/patchset/20260407131650.3813777-1-mark.rutland%40arm.com

I deliberately made irqentry_exit_to_kernel_mode_preempt() 'inline'
rather than '__always_inline' since everything it does is
instrumentable, and it's up to architecture code to handle that
appropriately.

On arm64 instrumentation_begin() and instrumentation_end() are currently
irrelevant. I didn't add those in the arm64-specific entry code as
they'd simply add pointless NOPs.

This is fine as-is.

Mark.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2026-04-08 10:19 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-07 13:16 [PATCH 00/10] arm64/entry: Mark Rutland
2026-04-07 13:16 ` [PATCH 01/10] entry: Fix stale comment for irqentry_enter() Mark Rutland
2026-04-08  1:14   ` Jinjie Ruan
2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
2026-04-07 13:16 ` [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user() Mark Rutland
2026-04-08  1:18   ` Jinjie Ruan
2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
2026-04-07 13:16 ` [PATCH 03/10] entry: Move irqentry_enter() prototype later Mark Rutland
2026-04-08  1:21   ` Jinjie Ruan
2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
2026-04-07 13:16 ` [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}() Mark Rutland
2026-04-08  1:32   ` Jinjie Ruan
2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
2026-04-07 13:16 ` [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode() Mark Rutland
2026-04-08  1:40   ` Jinjie Ruan
2026-04-08  9:17   ` Jinjie Ruan
2026-04-08 10:19     ` Mark Rutland
2026-04-08 10:10   ` [tip: sched/hrtick] " tip-bot2 for Mark Rutland
2026-04-07 13:16 ` [PATCH 06/10] arm64: entry: Don't preempt with SError or Debug masked Mark Rutland
2026-04-08  1:47   ` Jinjie Ruan
2026-04-07 13:16 ` [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers Mark Rutland
2026-04-08  1:49   ` Jinjie Ruan
2026-04-07 13:16 ` [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() Mark Rutland
2026-04-08  1:50   ` Jinjie Ruan
2026-04-07 13:16 ` [PATCH 09/10] arm64: entry: Use split preemption logic Mark Rutland
2026-04-08  1:52   ` Jinjie Ruan
2026-04-07 13:16 ` [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time Mark Rutland
2026-04-08  2:17   ` Jinjie Ruan
2026-04-08  9:08     ` Mark Rutland
2026-04-07 21:08 ` [PATCH 00/10] arm64/entry: Thomas Gleixner
2026-04-08  9:02   ` Mark Rutland
2026-04-08  9:06     ` Catalin Marinas
2026-04-08 10:14       ` Thomas Gleixner
2026-04-08  9:19   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox