public inbox for linux-kernel@vger.kernel.org
* [PATCH -next v5 00/22] arm64: entry: Convert to generic entry
@ 2024-12-06 10:17 Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled() Jinjie Ruan
                   ` (22 more replies)
  0 siblings, 23 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

Currently, x86, RISC-V and LoongArch use the generic entry code. Convert
arm64 to use the generic entry infrastructure from kernel/entry/*. The
generic entry code makes maintainers' work easier and the code more
elegant, and it also removes a lot of duplicate code.

The main steps are as follows:
- Make it easier for arm64 to use irqentry_enter/exit().
- Make arm64 closer to the PREEMPT_DYNAMIC code of generic entry.
- Split generic entry into generic irq entry and generic syscall entry,
  so that each patch switches only one thing.
- Switch to generic irq entry.
- Make arm64 closer to the generic syscall code.
- Switch to generic entry completely.

Changes in v5:
- Do not change arm32, and keep the interrupts_enabled() macro for the
  GICv3 driver.
- Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
- Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
- Update "irqentry_state_t ret/irq_state" to "state"
  to keep it consistent.
- Use the generic irq entry header for PREEMPT_DYNAMIC after splitting
  the generic entry.
- Also refactor the ARM64 syscall code.
- Introduce arch_ptrace_report_syscall_entry/exit(), instead of
  arch_pre/post_report_syscall_entry/exit() to simplify code.
- Separate the syscall patches clearly.
- Update the commit message.

Changes in v4:
- Rework/cleanup split into a few patches as Mark suggested.
- Replace interrupts_enabled() macro with regs_irqs_disabled(), instead
  of leaving it in place.
- Remove rcu and lockdep state in pt_regs by using temporary
  irqentry_state_t as Mark suggested.
- Remove some unnecessary intermediate functions to make the code clearer.
- Rework preempt irq and PREEMPT_DYNAMIC code
  to make the switch clearer.
- arch_prepare_*_entry/exit() -> arch_pre_*_entry/exit().
- Expand the arch function comments.
- Make arch functions closer to their callers.
- Declare saved_reg in for block.
- Remove arch_exit_to_kernel_mode_prepare(), arch_enter_from_kernel_mode().
- Adjust "Add few arch functions to use generic entry" patch to be
  the penultimate.
- Update the commit message.
- Add suggested-by.

Changes in v3:
- Test the MTE test cases.
- Handle forget_syscall() in arch_post_report_syscall_entry()
- Make the arch funcs not use __weak as Thomas suggested, so move
  the arch funcs to entry-common.h, and make arch_forget_syscall() folded
  in arch_post_report_syscall_entry() as suggested.
- Move report_single_step() to thread_info.h for arm64
- Change __always_inline() to inline, add inline for the other arch funcs.
- Remove unused signal.h for entry-common.h.
- Add Suggested-by.
- Update the commit message.

Changes in v2:
- Add tested-by.
- Fix a bug where arch_post_report_syscall_entry() was not called in
  syscall_trace_enter() if ptrace_report_syscall_entry() returned non-zero.
- Refactor report_syscall().
- Add comment for arch_prepare_report_syscall_exit().
- Adjust entry-common.h header file inclusion to alphabetical order.
- Update the commit message.

Jinjie Ruan (22):
  arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
  arm64: entry: Refactor the entry and exit for exceptions from EL1
  arm64: entry: Move arm64_preempt_schedule_irq() into
    __exit_to_kernel_mode()
  arm64: entry: Rework arm64_preempt_schedule_irq()
  arm64: entry: Use preempt_count() and need_resched() helper
  arm64: entry: Expand the need_irq_preemption() macro ahead
  arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled
  arm64: entry: Use different helpers to check resched for
    PREEMPT_DYNAMIC
  entry: Split generic entry into irq and syscall
  entry: Add arch_irqentry_exit_need_resched() for arm64
  arm64: entry: Switch to generic IRQ entry
  arm64/ptrace: Split report_syscall() function
  arm64/ptrace: Refactor syscall_trace_enter()
  arm64/ptrace: Refactor syscall_trace_exit()
  arm64/ptrace: Refactor el0_svc_common()
  entry: Make syscall_exit_to_user_mode_prepare() not static
  arm64/ptrace: Return early for ptrace_report_syscall_entry() error
  arm64/ptrace: Expand secure_computing() in place
  arm64/ptrace: Use syscall_get_arguments() helper
  entry: Add arch_ptrace_report_syscall_entry/exit()
  entry: Add has_syscall_work() helper
  arm64: entry: Convert to generic entry

 MAINTAINERS                           |   1 +
 arch/Kconfig                          |   8 +
 arch/arm64/Kconfig                    |   1 +
 arch/arm64/include/asm/daifflags.h    |   2 +-
 arch/arm64/include/asm/entry-common.h | 134 +++++++++
 arch/arm64/include/asm/preempt.h      |   2 -
 arch/arm64/include/asm/ptrace.h       |  11 +-
 arch/arm64/include/asm/syscall.h      |   6 +-
 arch/arm64/include/asm/thread_info.h  |  23 +-
 arch/arm64/include/asm/xen/events.h   |   2 +-
 arch/arm64/kernel/acpi.c              |   2 +-
 arch/arm64/kernel/debug-monitors.c    |   9 +-
 arch/arm64/kernel/entry-common.c      | 377 ++++++++-----------------
 arch/arm64/kernel/ptrace.c            |  90 ------
 arch/arm64/kernel/sdei.c              |   2 +-
 arch/arm64/kernel/signal.c            |   3 +-
 arch/arm64/kernel/syscall.c           |  31 +-
 include/linux/entry-common.h          | 384 +------------------------
 include/linux/irq-entry-common.h      | 389 ++++++++++++++++++++++++++
 kernel/entry/Makefile                 |   3 +-
 kernel/entry/common.c                 | 176 ++----------
 kernel/entry/syscall-common.c         | 198 +++++++++++++
 kernel/sched/core.c                   |   8 +-
 23 files changed, 909 insertions(+), 953 deletions(-)
 create mode 100644 arch/arm64/include/asm/entry-common.h
 create mode 100644 include/linux/irq-entry-common.h
 create mode 100644 kernel/entry/syscall-common.c

-- 
2.34.1



* [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:04   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1 Jinjie Ruan
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code expects architecture code to provide a
regs_irqs_disabled(regs) function, but arm64 does not have this and
instead provides interrupts_enabled(regs), which has the opposite polarity.

In preparation for moving arm64 over to the generic entry code,
replace arm64's interrupts_enabled() with regs_irqs_disabled() and
update its callers under arch/arm64.

For the moment, a definition of interrupts_enabled() is provided for
the GICv3 driver. Once arch/arm implements regs_irqs_disabled(), this
can be removed.

No functional changes.
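
The polarity relationship described above can be checked in isolation.
What follows is a minimal userspace sketch, not the kernel code:
struct fake_regs, the FAKE_* constants and fake_priority_unmasked() are
hypothetical stand-ins for arm64's pt_regs, PSR_I_BIT, GIC_PRIO_IRQON and
irqs_priority_unmasked(), and the constant values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are placeholders for illustration. */
#define FAKE_PSR_I_BIT   (1u << 7)
#define FAKE_PRIO_IRQON  0xe0u
#define FAKE_PRIO_IRQOFF 0x60u

struct fake_regs {
	uint64_t pstate;
	uint64_t pmr;
	bool uses_prio_masking; /* models system_uses_irq_prio_masking() */
};

static bool fake_priority_unmasked(const struct fake_regs *regs)
{
	assert(regs);
	return regs->uses_prio_masking ? regs->pmr == FAKE_PRIO_IRQON : true;
}

/* Old helper: true when IRQs were enabled in the interrupted context. */
static bool fake_interrupts_enabled(const struct fake_regs *regs)
{
	assert(regs);
	return !(regs->pstate & FAKE_PSR_I_BIT) && fake_priority_unmasked(regs);
}

/* New helper: the De Morgan negation of the old one. */
static bool fake_regs_irqs_disabled(const struct fake_regs *regs)
{
	assert(regs);
	return (regs->pstate & FAKE_PSR_I_BIT) || !fake_priority_unmasked(regs);
}

/* Count the input combinations where the helpers are not exact opposites. */
static int count_polarity_mismatches(void)
{
	const uint64_t pstates[] = { 0, FAKE_PSR_I_BIT };
	const uint64_t pmrs[] = { FAKE_PRIO_IRQON, FAKE_PRIO_IRQOFF };
	const bool masking[] = { false, true };
	int mismatches = 0;

	for (int i = 0; i < 2; i++)
		for (int j = 0; j < 2; j++)
			for (int k = 0; k < 2; k++) {
				struct fake_regs regs = {
					.pstate = pstates[i],
					.pmr = pmrs[j],
					.uses_prio_masking = masking[k],
				};

				if (fake_regs_irqs_disabled(&regs) !=
				    !fake_interrupts_enabled(&regs))
					mismatches++;
			}

	return mismatches;
}
```

Calling count_polarity_mismatches() walks all eight combinations of the
modelled inputs. Under the assumptions above, each call site in this patch
can therefore swap interrupts_enabled(regs) for !regs_irqs_disabled(regs)
without changing behaviour.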

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/daifflags.h  | 2 +-
 arch/arm64/include/asm/ptrace.h     | 7 +++++++
 arch/arm64/include/asm/xen/events.h | 2 +-
 arch/arm64/kernel/acpi.c            | 2 +-
 arch/arm64/kernel/debug-monitors.c  | 2 +-
 arch/arm64/kernel/entry-common.c    | 4 ++--
 arch/arm64/kernel/sdei.c            | 2 +-
 7 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index fbb5c99eb2f9..5fca48009043 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -128,7 +128,7 @@ static inline void local_daif_inherit(struct pt_regs *regs)
 {
 	unsigned long flags = regs->pstate & DAIF_MASK;
 
-	if (interrupts_enabled(regs))
+	if (!regs_irqs_disabled(regs))
 		trace_hardirqs_on();
 
 	if (system_uses_irq_prio_masking())
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 47ff8654c5ec..bcfa96880377 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -214,9 +214,16 @@ static inline void forget_syscall(struct pt_regs *regs)
 		(regs)->pmr == GIC_PRIO_IRQON :				\
 		true)
 
+/*
+ * Used by the GICv3 driver, can be removed once arch/arm implements
+ * regs_irqs_disabled() directly.
+ */
 #define interrupts_enabled(regs)			\
 	(!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
 
+#define regs_irqs_disabled(regs)			\
+	(((regs)->pstate & PSR_I_BIT) || (!irqs_priority_unmasked(regs)))
+
 #define fast_interrupts_enabled(regs) \
 	(!((regs)->pstate & PSR_F_BIT))
 
diff --git a/arch/arm64/include/asm/xen/events.h b/arch/arm64/include/asm/xen/events.h
index 2788e95d0ff0..2977b5fe068d 100644
--- a/arch/arm64/include/asm/xen/events.h
+++ b/arch/arm64/include/asm/xen/events.h
@@ -14,7 +14,7 @@ enum ipi_vector {
 
 static inline int xen_irqs_disabled(struct pt_regs *regs)
 {
-	return !interrupts_enabled(regs);
+	return regs_irqs_disabled(regs);
 }
 
 #define xchg_xen_ulong(ptr, val) xchg((ptr), (val))
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index e6f66491fbe9..732f89daae23 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -403,7 +403,7 @@ int apei_claim_sea(struct pt_regs *regs)
 	return_to_irqs_enabled = !irqs_disabled_flags(arch_local_save_flags());
 
 	if (regs)
-		return_to_irqs_enabled = interrupts_enabled(regs);
+		return_to_irqs_enabled = !regs_irqs_disabled(regs);
 
 	/*
 	 * SEA can interrupt SError, mask it and describe this as an NMI so
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 58f047de3e1c..460c09d03a73 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -231,7 +231,7 @@ static void send_user_sigtrap(int si_code)
 	if (WARN_ON(!user_mode(regs)))
 		return;
 
-	if (interrupts_enabled(regs))
+	if (!regs_irqs_disabled(regs))
 		local_irq_enable();
 
 	arm64_force_sig_fault(SIGTRAP, si_code, instruction_pointer(regs),
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index b260ddc4d3e9..c547e70428d3 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -73,7 +73,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
 {
 	lockdep_assert_irqs_disabled();
 
-	if (interrupts_enabled(regs)) {
+	if (!regs_irqs_disabled(regs)) {
 		if (regs->exit_rcu) {
 			trace_hardirqs_on_prepare();
 			lockdep_hardirqs_on_prepare();
@@ -569,7 +569,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
 {
 	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 
-	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && regs_irqs_disabled(regs))
 		__el1_pnmi(regs, handler);
 	else
 		__el1_irq(regs, handler);
diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
index 255d12f881c2..27a17da635d8 100644
--- a/arch/arm64/kernel/sdei.c
+++ b/arch/arm64/kernel/sdei.c
@@ -247,7 +247,7 @@ unsigned long __kprobes do_sdei_event(struct pt_regs *regs,
 	 * If we interrupted the kernel with interrupts masked, we always go
 	 * back to wherever we came from.
 	 */
-	if (mode == kernel_mode && !interrupts_enabled(regs))
+	if (mode == kernel_mode && regs_irqs_disabled(regs))
 		return SDEI_EV_HANDLED;
 
 	/*
-- 
2.34.1



* [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled() Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:08   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode() Jinjie Ruan
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code uses irqentry_state_t to track lockdep and RCU
state across exception entry and return. For historical reasons, arm64
embeds similar fields within its pt_regs structure.

In preparation for moving arm64 over to the generic entry code, pull
these fields out of arm64's pt_regs, and use a separate structure,
matching the style of the generic entry code.

No functional changes.
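
The shape of this change can be sketched outside the kernel. The example
below is a hedged userspace illustration only: the demo_* names and the
counters are hypothetical, and the counters merely stand in for the
ct_irq_enter()/ct_irq_exit() calls. It shows the entry helper returning
the state by value and the exit helper consuming it, so nothing needs to
be stored in pt_regs.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the shape of the structure added by this patch. */
typedef struct demo_irqentry_state {
	union {
		bool exit_rcu;
		bool lockdep;
	};
} demo_irqentry_state_t;

/* Hypothetical counters standing in for ct_irq_enter()/ct_irq_exit(). */
static int demo_ct_irq_enter_calls;
static int demo_ct_irq_exit_calls;

/*
 * rcu_idle models the condition
 * "!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)".
 */
static demo_irqentry_state_t demo_enter_from_kernel_mode(bool rcu_idle)
{
	demo_irqentry_state_t state = { .exit_rcu = false };

	if (rcu_idle) {
		demo_ct_irq_enter_calls++;
		state.exit_rcu = true;
	}

	return state;
}

static void demo_exit_to_kernel_mode(demo_irqentry_state_t state)
{
	if (state.exit_rcu)
		demo_ct_irq_exit_calls++;
}

/* The state lives in a local variable for the duration of the handler. */
static void demo_handler(bool rcu_idle)
{
	demo_irqentry_state_t state = demo_enter_from_kernel_mode(rcu_idle);

	/* ... exception-specific work would go here ... */

	demo_exit_to_kernel_mode(state);

	/* Entry and exit bookkeeping must stay balanced. */
	assert(demo_ct_irq_enter_calls == demo_ct_irq_exit_calls);
}
```

Because the state is a local variable rather than a pt_regs field, a
nested exception gets its own copy on its own stack frame, which is the
same style the generic entry code uses with irqentry_state_t.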

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/ptrace.h  |   4 -
 arch/arm64/kernel/entry-common.c | 136 +++++++++++++++++++------------
 2 files changed, 85 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index bcfa96880377..e90dfc9982aa 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -169,10 +169,6 @@ struct pt_regs {
 
 	u64 sdei_ttbr1;
 	struct frame_record_meta stackframe;
-
-	/* Only valid for some EL1 exceptions. */
-	u64 lockdep_hardirqs;
-	u64 exit_rcu;
 };
 
 /* For correct stack alignment, pt_regs has to be a multiple of 16 bytes. */
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index c547e70428d3..1687627b2ecf 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -28,6 +28,13 @@
 #include <asm/sysreg.h>
 #include <asm/system_misc.h>
 
+typedef struct irqentry_state {
+	union {
+		bool	exit_rcu;
+		bool	lockdep;
+	};
+} irqentry_state_t;
+
 /*
  * Handle IRQ/context state management when entering from kernel mode.
  * Before this function is called it is not safe to call regular kernel code,
@@ -36,29 +43,36 @@
  * This is intended to match the logic in irqentry_enter(), handling the kernel
  * mode transitions only.
  */
-static __always_inline void __enter_from_kernel_mode(struct pt_regs *regs)
+static __always_inline irqentry_state_t __enter_from_kernel_mode(struct pt_regs *regs)
 {
-	regs->exit_rcu = false;
+	irqentry_state_t state = {
+		.exit_rcu = false,
+	};
 
 	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
 		lockdep_hardirqs_off(CALLER_ADDR0);
 		ct_irq_enter();
 		trace_hardirqs_off_finish();
 
-		regs->exit_rcu = true;
-		return;
+		state.exit_rcu = true;
+		return state;
 	}
 
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	rcu_irq_enter_check_tick();
 	trace_hardirqs_off_finish();
+
+	return state;
 }
 
-static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
+static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 {
-	__enter_from_kernel_mode(regs);
+	irqentry_state_t state = __enter_from_kernel_mode(regs);
+
 	mte_check_tfsr_entry();
 	mte_disable_tco_entry(current);
+
+	return state;
 }
 
 /*
@@ -69,12 +83,13 @@ static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
  * This is intended to match the logic in irqentry_exit(), handling the kernel
  * mode transitions only, and with preemption handled elsewhere.
  */
-static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
+static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
+						  irqentry_state_t state)
 {
 	lockdep_assert_irqs_disabled();
 
 	if (!regs_irqs_disabled(regs)) {
-		if (regs->exit_rcu) {
+		if (state.exit_rcu) {
 			trace_hardirqs_on_prepare();
 			lockdep_hardirqs_on_prepare();
 			ct_irq_exit();
@@ -84,15 +99,16 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
 
 		trace_hardirqs_on();
 	} else {
-		if (regs->exit_rcu)
+		if (state.exit_rcu)
 			ct_irq_exit();
 	}
 }
 
-static void noinstr exit_to_kernel_mode(struct pt_regs *regs)
+static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
+					irqentry_state_t state)
 {
 	mte_check_tfsr_exit();
-	__exit_to_kernel_mode(regs);
+	__exit_to_kernel_mode(regs, state);
 }
 
 /*
@@ -190,9 +206,11 @@ asmlinkage void noinstr asm_exit_to_user_mode(struct pt_regs *regs)
  * mode. Before this function is called it is not safe to call regular kernel
  * code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_enter_nmi(struct pt_regs *regs)
+static noinstr irqentry_state_t arm64_enter_nmi(struct pt_regs *regs)
 {
-	regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
+	irqentry_state_t state;
+
+	state.lockdep = lockdep_hardirqs_enabled();
 
 	__nmi_enter();
 	lockdep_hardirqs_off(CALLER_ADDR0);
@@ -201,6 +219,8 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
 
 	trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
+
+	return state;
 }
 
 /*
@@ -208,19 +228,18 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
  * mode. After this function returns it is not safe to call regular kernel
  * code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_exit_nmi(struct pt_regs *regs)
+static void noinstr arm64_exit_nmi(struct pt_regs *regs,
+				   irqentry_state_t state)
 {
-	bool restore = regs->lockdep_hardirqs;
-
 	ftrace_nmi_exit();
-	if (restore) {
+	if (state.lockdep) {
 		trace_hardirqs_on_prepare();
 		lockdep_hardirqs_on_prepare();
 	}
 
 	ct_nmi_exit();
 	lockdep_hardirq_exit();
-	if (restore)
+	if (state.lockdep)
 		lockdep_hardirqs_on(CALLER_ADDR0);
 	__nmi_exit();
 }
@@ -230,14 +249,18 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs)
  * kernel mode. Before this function is called it is not safe to call regular
  * kernel code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
+static noinstr irqentry_state_t arm64_enter_el1_dbg(struct pt_regs *regs)
 {
-	regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
+	irqentry_state_t state;
+
+	state.lockdep = lockdep_hardirqs_enabled();
 
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	ct_nmi_enter();
 
 	trace_hardirqs_off_finish();
+
+	return state;
 }
 
 /*
@@ -245,17 +268,16 @@ static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
  * kernel mode. After this function returns it is not safe to call regular
  * kernel code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
+static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
+				       irqentry_state_t state)
 {
-	bool restore = regs->lockdep_hardirqs;
-
-	if (restore) {
+	if (state.lockdep) {
 		trace_hardirqs_on_prepare();
 		lockdep_hardirqs_on_prepare();
 	}
 
 	ct_nmi_exit();
-	if (restore)
+	if (state.lockdep)
 		lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
@@ -426,78 +448,86 @@ UNHANDLED(el1t, 64, error)
 static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
 {
 	unsigned long far = read_sysreg(far_el1);
+	irqentry_state_t state;
 
-	enter_from_kernel_mode(regs);
+	state = enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_mem_abort(far, esr, regs);
 	local_daif_mask();
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
 {
 	unsigned long far = read_sysreg(far_el1);
+	irqentry_state_t state;
 
-	enter_from_kernel_mode(regs);
+	state = enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_sp_pc_abort(far, esr, regs);
 	local_daif_mask();
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_kernel_mode(regs);
+	irqentry_state_t state = enter_from_kernel_mode(regs);
+
 	local_daif_inherit(regs);
 	do_el1_undef(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_kernel_mode(regs);
+	irqentry_state_t state = enter_from_kernel_mode(regs);
+
 	local_daif_inherit(regs);
 	do_el1_bti(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_kernel_mode(regs);
+	irqentry_state_t state = enter_from_kernel_mode(regs);
+
 	local_daif_inherit(regs);
 	do_el1_gcs(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_kernel_mode(regs);
+	irqentry_state_t state = enter_from_kernel_mode(regs);
+
 	local_daif_inherit(regs);
 	do_el1_mops(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_dbg(struct pt_regs *regs, unsigned long esr)
 {
 	unsigned long far = read_sysreg(far_el1);
+	irqentry_state_t state;
 
-	arm64_enter_el1_dbg(regs);
+	state = arm64_enter_el1_dbg(regs);
 	if (!cortex_a76_erratum_1463225_debug_handler(regs))
 		do_debug_exception(far, esr, regs);
-	arm64_exit_el1_dbg(regs);
+	arm64_exit_el1_dbg(regs, state);
 }
 
 static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_kernel_mode(regs);
+	irqentry_state_t state = enter_from_kernel_mode(regs);
+
 	local_daif_inherit(regs);
 	do_el1_fpac(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 
 asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
@@ -546,15 +576,16 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
 static __always_inline void __el1_pnmi(struct pt_regs *regs,
 				       void (*handler)(struct pt_regs *))
 {
-	arm64_enter_nmi(regs);
+	irqentry_state_t state = arm64_enter_nmi(regs);
+
 	do_interrupt_handler(regs, handler);
-	arm64_exit_nmi(regs);
+	arm64_exit_nmi(regs, state);
 }
 
 static __always_inline void __el1_irq(struct pt_regs *regs,
 				      void (*handler)(struct pt_regs *))
 {
-	enter_from_kernel_mode(regs);
+	irqentry_state_t state = enter_from_kernel_mode(regs);
 
 	irq_enter_rcu();
 	do_interrupt_handler(regs, handler);
@@ -562,7 +593,7 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 
 	arm64_preempt_schedule_irq();
 
-	exit_to_kernel_mode(regs);
+	exit_to_kernel_mode(regs, state);
 }
 static void noinstr el1_interrupt(struct pt_regs *regs,
 				  void (*handler)(struct pt_regs *))
@@ -588,11 +619,12 @@ asmlinkage void noinstr el1h_64_fiq_handler(struct pt_regs *regs)
 asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
 {
 	unsigned long esr = read_sysreg(esr_el1);
+	irqentry_state_t state;
 
 	local_daif_restore(DAIF_ERRCTX);
-	arm64_enter_nmi(regs);
+	state = arm64_enter_nmi(regs);
 	do_serror(regs, esr);
-	arm64_exit_nmi(regs);
+	arm64_exit_nmi(regs, state);
 }
 
 static void noinstr el0_da(struct pt_regs *regs, unsigned long esr)
@@ -855,12 +887,13 @@ asmlinkage void noinstr el0t_64_fiq_handler(struct pt_regs *regs)
 static void noinstr __el0_error_handler_common(struct pt_regs *regs)
 {
 	unsigned long esr = read_sysreg(esr_el1);
+	irqentry_state_t state;
 
 	enter_from_user_mode(regs);
 	local_daif_restore(DAIF_ERRCTX);
-	arm64_enter_nmi(regs);
+	state = arm64_enter_nmi(regs);
 	do_serror(regs, esr);
-	arm64_exit_nmi(regs);
+	arm64_exit_nmi(regs, state);
 	local_daif_restore(DAIF_PROCCTX);
 	exit_to_user_mode(regs);
 }
@@ -968,6 +1001,7 @@ asmlinkage void noinstr __noreturn handle_bad_stack(struct pt_regs *regs)
 asmlinkage noinstr unsigned long
 __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
 {
+	irqentry_state_t state;
 	unsigned long ret;
 
 	/*
@@ -992,9 +1026,9 @@ __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
 	else if (cpu_has_pan())
 		set_pstate_pan(0);
 
-	arm64_enter_nmi(regs);
+	state = arm64_enter_nmi(regs);
 	ret = do_sdei_event(regs, arg);
-	arm64_exit_nmi(regs);
+	arm64_exit_nmi(regs, state);
 
 	return ret;
 }
-- 
2.34.1



* [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled() Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1 Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:26   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq() Jinjie Ruan
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code tries to reschedule on every return from a
non-NMI exception taken from kernel mode. At the moment, arm64 only tries
to reschedule on return from an EL1 IRQ exception.

In preparation for moving arm64 over to the generic entry code, move
arm64_preempt_schedule_irq() into __exit_to_kernel_mode(), so that every
EL1 non-NMI exception return, not only EL1 IRQ return, has a chance to
reschedule. Rescheduling after the exception has been handled is only
possible if IRQs were enabled when the exception was taken, so call
arm64_preempt_schedule_irq() in the branch taken when regs_irqs_disabled()
is false. Within that branch, the call is only reached when TINY_RCU is
enabled or current is not an idle task.

As Mark pointed out, this change will have the following two key impacts:

- " We'll preempt even without taking a "real" interrupt. That
    shouldn't result in preemption that wasn't possible before,
    but it does change the probability of preempting at certain points,
    and might have a performance impact, so probably warrants a
    benchmark."

- " We will not preempt when taking interrupts from a region of kernel
    code where IRQs are enabled but RCU is not watching, matching the
    behaviour of the generic entry code.

    This has the potential to introduce livelock if we can ever have a
    screaming interrupt in such a region, so we'll need to go figure out
    whether that's actually a problem.

    Having this as a separate patch will make it easier to test/bisect
    for that specifically."
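
The resulting control flow on return to kernel mode can be summarised as
a small decision function. This is a simplified userspace sketch under
stated assumptions, not the kernel implementation: the sketch_* names are
hypothetical, and the two boolean inputs model regs_irqs_disabled(regs)
and state.exit_rcu. Reaching arm64_preempt_schedule_irq() does not
guarantee a reschedule, since that function applies its own checks.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_exit_action {
	SKETCH_NO_PREEMPT_IRQS_MASKED, /* interrupted context had IRQs masked */
	SKETCH_NO_PREEMPT_RCU_IDLE,    /* RCU was not watching on entry */
	SKETCH_MAY_PREEMPT,            /* arm64_preempt_schedule_irq() is reached */
};

/*
 * Models the branch structure of __exit_to_kernel_mode() after this patch.
 * exit_rcu is true when entry found "!TINY_RCU && is_idle_task(current)".
 */
static enum sketch_exit_action sketch_exit_action(bool regs_irqs_disabled,
						  bool exit_rcu)
{
	if (regs_irqs_disabled)
		return SKETCH_NO_PREEMPT_IRQS_MASKED;

	if (exit_rcu)
		return SKETCH_NO_PREEMPT_RCU_IDLE;

	return SKETCH_MAY_PREEMPT;
}

/* Exhaustive check of the four input combinations. */
static void sketch_check_all_cases(void)
{
	assert(sketch_exit_action(true, true) == SKETCH_NO_PREEMPT_IRQS_MASKED);
	assert(sketch_exit_action(true, false) == SKETCH_NO_PREEMPT_IRQS_MASKED);
	assert(sketch_exit_action(false, true) == SKETCH_NO_PREEMPT_RCU_IDLE);
	assert(sketch_exit_action(false, false) == SKETCH_MAY_PREEMPT);
}
```

The SKETCH_NO_PREEMPT_RCU_IDLE case corresponds to Mark's second point:
an exception taken from a region with IRQs enabled but RCU not watching
no longer attempts preemption on return.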

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/entry-common.c | 88 ++++++++++++++++----------------
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 1687627b2ecf..7a588515ee07 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -75,6 +75,48 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 	return state;
 }
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+#define need_irq_preemption() \
+	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
+#else
+#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
+#endif
+
+static void __sched arm64_preempt_schedule_irq(void)
+{
+	if (!need_irq_preemption())
+		return;
+
+	/*
+	 * Note: thread_info::preempt_count includes both thread_info::count
+	 * and thread_info::need_resched, and is not equivalent to
+	 * preempt_count().
+	 */
+	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
+		return;
+
+	/*
+	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+	 * DAIF we must have handled an NMI, so skip preemption.
+	 */
+	if (system_uses_irq_prio_masking() && read_sysreg(daif))
+		return;
+
+	/*
+	 * Preempting a task from an IRQ means we leave copies of PSTATE
+	 * on the stack. cpufeature's enable calls may modify PSTATE, but
+	 * resuming one of these preempted tasks would undo those changes.
+	 *
+	 * Only allow a task to be preempted once cpufeatures have been
+	 * enabled.
+	 */
+	if (system_capabilities_finalized())
+		preempt_schedule_irq();
+}
+
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -97,6 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
 			return;
 		}
 
+		arm64_preempt_schedule_irq();
+
 		trace_hardirqs_on();
 	} else {
 		if (state.exit_rcu)
@@ -281,48 +325,6 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
 		lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
-#ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#define need_irq_preemption() \
-	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-#else
-#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
-#endif
-
-static void __sched arm64_preempt_schedule_irq(void)
-{
-	if (!need_irq_preemption())
-		return;
-
-	/*
-	 * Note: thread_info::preempt_count includes both thread_info::count
-	 * and thread_info::need_resched, and is not equivalent to
-	 * preempt_count().
-	 */
-	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
-		return;
-
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return;
-
-	/*
-	 * Preempting a task from an IRQ means we leave copies of PSTATE
-	 * on the stack. cpufeature's enable calls may modify PSTATE, but
-	 * resuming one of these preempted tasks would undo those changes.
-	 *
-	 * Only allow a task to be preempted once cpufeatures have been
-	 * enabled.
-	 */
-	if (system_capabilities_finalized())
-		preempt_schedule_irq();
-}
-
 static void do_interrupt_handler(struct pt_regs *regs,
 				 void (*handler)(struct pt_regs *))
 {
@@ -591,8 +593,6 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
-	arm64_preempt_schedule_irq();
-
 	exit_to_kernel_mode(regs, state);
 }
 static void noinstr el1_interrupt(struct pt_regs *regs,
-- 
2.34.1



* [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq()
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (2 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode() Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:33   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper Jinjie Ruan
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code calls preempt_schedule_irq() when need_resched()
is satisfied, but arm64 performs some additional checks of its own, such
as GIC priority masking.

In preparation for moving arm64 over to the generic entry code, rework
arm64_preempt_schedule_irq() so that the decision whether to reschedule
is made in a separate check function called arm64_need_resched().

No functional changes.
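
As an illustration only (not part of the patch), the rework can be
modelled in user space. The struct and the model_*() names below are
hypothetical stand-ins for the kernel helpers named in the comments;
the sketch only shows that splitting the checks into a predicate keeps
the scheduling decision unchanged.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state consulted on IRQ exit. */
struct model_state {
	bool irq_preemption;        /* need_irq_preemption() */
	unsigned long preempt_word; /* thread_info::preempt_count */
	bool prio_masking;          /* system_uses_irq_prio_masking() */
	unsigned long daif;         /* read_sysreg(daif) */
	bool caps_finalized;        /* system_capabilities_finalized() */
	int schedule_calls;         /* counts modelled preempt_schedule_irq() calls */
};

/* Shape before the patch: the checks and the call live together. */
static void model_preempt_schedule_irq_old(struct model_state *s)
{
	if (!s->irq_preemption)
		return;
	if (s->preempt_word != 0)
		return;
	if (s->prio_masking && s->daif)
		return;
	if (s->caps_finalized)
		s->schedule_calls++;
}

/* Shape after the patch: a side-effect-free predicate ... */
static bool model_need_resched(const struct model_state *s)
{
	if (!s->irq_preemption)
		return false;
	if (s->preempt_word != 0)
		return false;
	if (s->prio_masking && s->daif)
		return false;
	if (!s->caps_finalized)
		return false;
	return true;
}

/* ... and a caller that decides whether to schedule. */
static void model_exit_new(struct model_state *s)
{
	if (model_need_resched(s))
		s->schedule_calls++;
}

/* Compare both shapes over all 32 combinations of the five inputs. */
static bool model_shapes_agree(void)
{
	for (unsigned int bits = 0; bits < 32; bits++) {
		struct model_state a = {
			.irq_preemption = bits & 1,
			.preempt_word   = (bits >> 1) & 1,
			.prio_masking   = bits & 4,
			.daif           = (bits >> 3) & 1,
			.caps_finalized = bits & 16,
		};
		struct model_state b = a;

		model_preempt_schedule_irq_old(&a);
		model_exit_new(&b);
		if (a.schedule_calls != b.schedule_calls)
			return false;
	}
	return true;
}
```

Calling model_shapes_agree() from a small test program is one way to
convince oneself that the two shapes make the same decision for every
input combination in this model.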

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/entry-common.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 7a588515ee07..da68c089b74b 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -83,10 +83,10 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
 #define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
 #endif
 
-static void __sched arm64_preempt_schedule_irq(void)
+static inline bool arm64_need_resched(void)
 {
 	if (!need_irq_preemption())
-		return;
+		return false;
 
 	/*
 	 * Note: thread_info::preempt_count includes both thread_info::count
@@ -94,7 +94,7 @@ static void __sched arm64_preempt_schedule_irq(void)
 	 * preempt_count().
 	 */
 	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
-		return;
+		return false;
 
 	/*
 	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
@@ -103,7 +103,7 @@ static void __sched arm64_preempt_schedule_irq(void)
 	 * DAIF we must have handled an NMI, so skip preemption.
 	 */
 	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return;
+		return false;
 
 	/*
 	 * Preempting a task from an IRQ means we leave copies of PSTATE
@@ -113,8 +113,10 @@ static void __sched arm64_preempt_schedule_irq(void)
 	 * Only allow a task to be preempted once cpufeatures have been
 	 * enabled.
 	 */
-	if (system_capabilities_finalized())
-		preempt_schedule_irq();
+	if (!system_capabilities_finalized())
+		return false;
+
+	return true;
 }
 
 /*
@@ -139,7 +141,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
 			return;
 		}
 
-		arm64_preempt_schedule_irq();
+		if (arm64_need_resched())
+			preempt_schedule_irq();
 
 		trace_hardirqs_on();
 	} else {
-- 
2.34.1



* [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (3 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq() Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:40   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead Jinjie Ruan
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code uses the preempt_count() and need_resched() helpers
to check whether it is time to reschedule. Currently, arm64 uses its own
check, "READ_ONCE(current_thread_info()->preempt_count) == 0", which is
equivalent to "preempt_count() == 0 && need_resched()".

In preparation for moving arm64 over to the generic entry code, replace
arm64's open-coded check with these helpers and move the check ahead of
arm64_need_resched().

No functional changes.
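
As an illustration only (not part of the patch), the equivalence can be
sketched with a simplified user-space model. It assumes the combined
64-bit word keeps the preempt count in the low 32 bits and an inverted
"need resched" flag at bit 32, and that this folded flag mirrors what
need_resched() reports. All model_*() names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 32 of the combined word; set means "no reschedule needed". */
#define MODEL_PREEMPT_NEED_RESCHED (UINT64_C(1) << 32)

/* Hypothetical model of the 64-bit thread_info::preempt_count word. */
static uint64_t model_word(uint32_t count, bool resched_needed)
{
	uint64_t word = count;

	if (!resched_needed)
		word |= MODEL_PREEMPT_NEED_RESCHED;
	return word;
}

/* Old check: a single 64-bit comparison against zero. */
static bool model_old_check(uint64_t word)
{
	return word == 0;
}

/* New check: the two conditions spelled out separately. */
static bool model_new_check(uint32_t count, bool resched_needed)
{
	return count == 0 && resched_needed;
}
```

Under these assumptions the word is zero only when the count is zero
and a reschedule is pending, which is the condition the two helpers
express when combined.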

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/entry-common.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index da68c089b74b..efd1a990d138 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -88,14 +88,6 @@ static inline bool arm64_need_resched(void)
 	if (!need_irq_preemption())
 		return false;
 
-	/*
-	 * Note: thread_info::preempt_count includes both thread_info::count
-	 * and thread_info::need_resched, and is not equivalent to
-	 * preempt_count().
-	 */
-	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
-		return false;
-
 	/*
 	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
 	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
@@ -141,8 +133,10 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
 			return;
 		}
 
-		if (arm64_need_resched())
-			preempt_schedule_irq();
+		if (!preempt_count() && need_resched()) {
+			if (arm64_need_resched())
+				preempt_schedule_irq();
+		}
 
 		trace_hardirqs_on();
 	} else {
-- 
2.34.1



* [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (4 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:48   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled Jinjie Ruan
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code has the same logic as the need_irq_preemption()
macro and uses a helper function to check the other resched conditions.

In preparation for moving arm64 over to the generic entry code, expand
need_irq_preemption() and move its check ahead, and extract the arm64
resched check code into a helper function.

No functional changes.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/preempt.h |  1 +
 arch/arm64/kernel/entry-common.c | 28 +++++++++++++++++-----------
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index 0159b625cc7f..d0f93385bd85 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -85,6 +85,7 @@ static inline bool should_resched(int preempt_offset)
 void preempt_schedule(void);
 void preempt_schedule_notrace(void);
 
+void raw_irqentry_exit_cond_resched(void);
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
 DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index efd1a990d138..80b47ca02db2 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -77,17 +77,10 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#define need_irq_preemption() \
-	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-#else
-#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
 #endif
 
 static inline bool arm64_need_resched(void)
 {
-	if (!need_irq_preemption())
-		return false;
-
 	/*
 	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
 	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
@@ -111,6 +104,22 @@ static inline bool arm64_need_resched(void)
 	return true;
 }
 
+void raw_irqentry_exit_cond_resched(void)
+{
+#ifdef CONFIG_PREEMPT_DYNAMIC
+	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
+		return;
+#else
+	if (!IS_ENABLED(CONFIG_PREEMPTION))
+		return;
+#endif
+
+	if (!preempt_count()) {
+		if (need_resched() && arm64_need_resched())
+			preempt_schedule_irq();
+	}
+}
+
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -133,10 +142,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
 			return;
 		}
 
-		if (!preempt_count() && need_resched()) {
-			if (arm64_need_resched())
-				preempt_schedule_irq();
-		}
+		raw_irqentry_exit_cond_resched();
 
 		trace_hardirqs_on();
 	} else {
-- 
2.34.1



* [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (5 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:52   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC Jinjie Ruan
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code checks PREEMPTION regardless of whether
PREEMPT_DYNAMIC is enabled.

Whether PREEMPT_DYNAMIC is enabled or not, PREEMPTION must be enabled
to allow rescheduling before an EL1 exception return, so move the
PREEMPTION check ahead in preparation for moving arm64 over to the
generic entry code.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/entry-common.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 80b47ca02db2..029f8bd72f8a 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -109,9 +109,6 @@ void raw_irqentry_exit_cond_resched(void)
 #ifdef CONFIG_PREEMPT_DYNAMIC
 	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
 		return;
-#else
-	if (!IS_ENABLED(CONFIG_PREEMPTION))
-		return;
 #endif
 
 	if (!preempt_count()) {
@@ -142,7 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
 			return;
 		}
 
-		raw_irqentry_exit_cond_resched();
+		if (IS_ENABLED(CONFIG_PREEMPTION))
+			raw_irqentry_exit_cond_resched();
 
 		trace_hardirqs_on();
 	} else {
-- 
2.34.1



* [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (6 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 11:54   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall Jinjie Ruan
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

In the generic entry code, two different helpers are used to check
whether a resched is required, depending on whether PREEMPT_DYNAMIC is
enabled, and the two share some common code.

In preparation for moving arm64 over to the generic entry code, use a
new helper to check resched when PREEMPT_DYNAMIC is enabled and reuse
the common code for the disabled case.

No functional changes.
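
As an illustration only (not part of the patch), the helper selection
can be sketched in user space. A plain bool stands in for the static
key, a counter stands in for the common checks, and MODEL_PREEMPT_DYNAMIC
is a hypothetical stand-in for CONFIG_PREEMPT_DYNAMIC; all model_*()
names are assumptions made for this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for sk_dynamic_irqentry_exit_cond_resched. */
static bool model_static_key = true;

/* Counts how often the modelled common path was reached. */
static int model_raw_calls;

/* Models raw_irqentry_exit_cond_resched(): the shared common path. */
static void model_raw_cond_resched(void)
{
	model_raw_calls++;
}

/* Models dynamic_irqentry_exit_cond_resched(): gate, then common path. */
static void model_dynamic_cond_resched(void)
{
	if (!model_static_key)
		return;
	model_raw_cond_resched();
}

/* Define MODEL_PREEMPT_DYNAMIC to select the gated helper at build time. */
#ifdef MODEL_PREEMPT_DYNAMIC
#define model_irqentry_exit_cond_resched()	model_dynamic_cond_resched()
#else
#define model_irqentry_exit_cond_resched()	model_raw_cond_resched()
#endif
```

The caller only ever names model_irqentry_exit_cond_resched(), so the
gated and ungated builds differ in one macro definition rather than in
the exit path itself.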

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/preempt.h |  3 +++
 arch/arm64/kernel/entry-common.c | 21 +++++++++++----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index d0f93385bd85..0f0ba250efe8 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -93,11 +93,14 @@ void dynamic_preempt_schedule(void);
 #define __preempt_schedule()		dynamic_preempt_schedule()
 void dynamic_preempt_schedule_notrace(void);
 #define __preempt_schedule_notrace()	dynamic_preempt_schedule_notrace()
+void dynamic_irqentry_exit_cond_resched(void);
+#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
 
 #else /* CONFIG_PREEMPT_DYNAMIC */
 
 #define __preempt_schedule()		preempt_schedule()
 #define __preempt_schedule_notrace()	preempt_schedule_notrace()
+#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
 
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 #endif /* CONFIG_PREEMPTION */
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 029f8bd72f8a..015a65d19b52 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -75,10 +75,6 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 	return state;
 }
 
-#ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#endif
-
 static inline bool arm64_need_resched(void)
 {
 	/*
@@ -106,17 +102,22 @@ static inline bool arm64_need_resched(void)
 
 void raw_irqentry_exit_cond_resched(void)
 {
-#ifdef CONFIG_PREEMPT_DYNAMIC
-	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-		return;
-#endif
-
 	if (!preempt_count()) {
 		if (need_resched() && arm64_need_resched())
 			preempt_schedule_irq();
 	}
 }
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+void dynamic_irqentry_exit_cond_resched(void)
+{
+	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
+		return;
+	raw_irqentry_exit_cond_resched();
+}
+#endif
+
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -140,7 +141,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
 		}
 
 		if (IS_ENABLED(CONFIG_PREEMPTION))
-			raw_irqentry_exit_cond_resched();
+			irqentry_exit_cond_resched();
 
 		trace_hardirqs_on();
 	} else {
-- 
2.34.1



* [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (7 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 12:04   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64 Jinjie Ruan
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

As Mark pointed out, do not try to switch to *all* the
generic entry code in one go. The regular entry state management
(e.g. enter_from_user_mode() and exit_to_user_mode()) is largely
separate from the syscall state management. Move arm64 over to
enter_from_user_mode() and exit_to_user_mode() without needing to use
any of the generic syscall logic. Doing that first, *then* moving over
to the generic syscall handling would be much easier to
review/test/bisect, and if there are any ABI issues with the syscall
handling in particular, it will be easier to handle those in isolation.

So split the generic entry code into irq entry and syscall parts, which
makes review easier and the switch to generic entry clearer.

Introduce two configs called GENERIC_IRQ_ENTRY and GENERIC_SYSCALL,
which control the irq entry and syscall parts of the generic code
respectively. Also split the header file irq-entry-common.h out of
entry-common.h for GENERIC_IRQ_ENTRY.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 MAINTAINERS                      |   1 +
 arch/Kconfig                     |   8 +
 include/linux/entry-common.h     | 382 +-----------------------------
 include/linux/irq-entry-common.h | 389 +++++++++++++++++++++++++++++++
 kernel/entry/Makefile            |   3 +-
 kernel/entry/common.c            | 160 +------------
 kernel/entry/syscall-common.c    | 159 +++++++++++++
 kernel/sched/core.c              |   8 +-
 8 files changed, 565 insertions(+), 545 deletions(-)
 create mode 100644 include/linux/irq-entry-common.h
 create mode 100644 kernel/entry/syscall-common.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 21f855fe468b..7a6e87587101 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9585,6 +9585,7 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/entry
 F:	include/linux/entry-common.h
 F:	include/linux/entry-kvm.h
+F:	include/linux/irq-entry-common.h
 F:	kernel/entry/
 
 GENERIC GPIO I2C DRIVER
diff --git a/arch/Kconfig b/arch/Kconfig
index 6682b2a53e34..5a454eff780b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -64,8 +64,16 @@ config HOTPLUG_PARALLEL
 	bool
 	select HOTPLUG_SPLIT_STARTUP
 
+config GENERIC_IRQ_ENTRY
+	bool
+
+config GENERIC_SYSCALL
+	bool
+
 config GENERIC_ENTRY
 	bool
+	select GENERIC_IRQ_ENTRY
+	select GENERIC_SYSCALL
 
 config KPROBES
 	bool "Kprobes"
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index fc61d0205c97..b3233e8328c5 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -2,27 +2,15 @@
 #ifndef __LINUX_ENTRYCOMMON_H
 #define __LINUX_ENTRYCOMMON_H
 
-#include <linux/static_call_types.h>
+#include <linux/irq-entry-common.h>
 #include <linux/ptrace.h>
-#include <linux/syscalls.h>
 #include <linux/seccomp.h>
 #include <linux/sched.h>
-#include <linux/context_tracking.h>
 #include <linux/livepatch.h>
 #include <linux/resume_user_mode.h>
-#include <linux/tick.h>
-#include <linux/kmsan.h>
 
 #include <asm/entry-common.h>
 
-/*
- * Define dummy _TIF work flags if not defined by the architecture or for
- * disabled functionality.
- */
-#ifndef _TIF_PATCH_PENDING
-# define _TIF_PATCH_PENDING		(0)
-#endif
-
 #ifndef _TIF_UPROBE
 # define _TIF_UPROBE			(0)
 #endif
@@ -55,69 +43,6 @@
 				 SYSCALL_WORK_SYSCALL_EXIT_TRAP	|	\
 				 ARCH_SYSCALL_WORK_EXIT)
 
-/*
- * TIF flags handled in exit_to_user_mode_loop()
- */
-#ifndef ARCH_EXIT_TO_USER_MODE_WORK
-# define ARCH_EXIT_TO_USER_MODE_WORK		(0)
-#endif
-
-#define EXIT_TO_USER_MODE_WORK						\
-	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
-	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY |			\
-	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |			\
-	 ARCH_EXIT_TO_USER_MODE_WORK)
-
-/**
- * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
- * @regs:	Pointer to currents pt_regs
- *
- * Defaults to an empty implementation. Can be replaced by architecture
- * specific code.
- *
- * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
- * section. Use __always_inline so the compiler cannot push it out of line
- * and make it instrumentable.
- */
-static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs);
-
-#ifndef arch_enter_from_user_mode
-static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {}
-#endif
-
-/**
- * enter_from_user_mode - Establish state when coming from user mode
- *
- * Syscall/interrupt entry disables interrupts, but user mode is traced as
- * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
- *
- * 1) Tell lockdep that interrupts are disabled
- * 2) Invoke context tracking if enabled to reactivate RCU
- * 3) Trace interrupts off state
- *
- * Invoked from architecture specific syscall entry code with interrupts
- * disabled. The calling code has to be non-instrumentable. When the
- * function returns all state is correct and interrupts are still
- * disabled. The subsequent functions can be instrumented.
- *
- * This is invoked when there is architecture specific functionality to be
- * done between establishing state and enabling interrupts. The caller must
- * enable interrupts before invoking syscall_enter_from_user_mode_work().
- */
-static __always_inline void enter_from_user_mode(struct pt_regs *regs)
-{
-	arch_enter_from_user_mode(regs);
-	lockdep_hardirqs_off(CALLER_ADDR0);
-
-	CT_WARN_ON(__ct_state() != CT_STATE_USER);
-	user_exit_irqoff();
-
-	instrumentation_begin();
-	kmsan_unpoison_entry_regs(regs);
-	trace_hardirqs_off_finish();
-	instrumentation_end();
-}
-
 /**
  * syscall_enter_from_user_mode_prepare - Establish state and enable interrupts
  * @regs:	Pointer to currents pt_regs
@@ -202,170 +127,6 @@ static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, l
 	return ret;
 }
 
-/**
- * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
- * @ti_work:	Cached TIF flags gathered with interrupts disabled
- *
- * Defaults to local_irq_enable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
-
-#ifndef local_irq_enable_exit_to_user
-static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
-{
-	local_irq_enable();
-}
-#endif
-
-/**
- * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
- *
- * Defaults to local_irq_disable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_disable_exit_to_user(void);
-
-#ifndef local_irq_disable_exit_to_user
-static inline void local_irq_disable_exit_to_user(void)
-{
-	local_irq_disable();
-}
-#endif
-
-/**
- * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
- *				 to user mode.
- * @regs:	Pointer to currents pt_regs
- * @ti_work:	Cached TIF flags gathered with interrupts disabled
- *
- * Invoked from exit_to_user_mode_loop() with interrupt enabled
- *
- * Defaults to NOOP. Can be supplied by architecture specific code.
- */
-static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
-					       unsigned long ti_work);
-
-#ifndef arch_exit_to_user_mode_work
-static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
-					       unsigned long ti_work)
-{
-}
-#endif
-
-/**
- * arch_exit_to_user_mode_prepare - Architecture specific preparation for
- *				    exit to user mode.
- * @regs:	Pointer to currents pt_regs
- * @ti_work:	Cached TIF flags gathered with interrupts disabled
- *
- * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the last
- * function before return. Defaults to NOOP.
- */
-static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
-						  unsigned long ti_work);
-
-#ifndef arch_exit_to_user_mode_prepare
-static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
-						  unsigned long ti_work)
-{
-}
-#endif
-
-/**
- * arch_exit_to_user_mode - Architecture specific final work before
- *			    exit to user mode.
- *
- * Invoked from exit_to_user_mode() with interrupt disabled as the last
- * function before return. Defaults to NOOP.
- *
- * This needs to be __always_inline because it is non-instrumentable code
- * invoked after context tracking switched to user mode.
- *
- * An architecture implementation must not do anything complex, no locking
- * etc. The main purpose is for speculation mitigations.
- */
-static __always_inline void arch_exit_to_user_mode(void);
-
-#ifndef arch_exit_to_user_mode
-static __always_inline void arch_exit_to_user_mode(void) { }
-#endif
-
-/**
- * arch_do_signal_or_restart -  Architecture specific signal delivery function
- * @regs:	Pointer to currents pt_regs
- *
- * Invoked from exit_to_user_mode_loop().
- */
-void arch_do_signal_or_restart(struct pt_regs *regs);
-
-/**
- * exit_to_user_mode_loop - do any pending work before leaving to user space
- */
-unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
-				     unsigned long ti_work);
-
-/**
- * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
- * @regs:	Pointer to pt_regs on entry stack
- *
- * 1) check that interrupts are disabled
- * 2) call tick_nohz_user_enter_prepare()
- * 3) call exit_to_user_mode_loop() if any flags from
- *    EXIT_TO_USER_MODE_WORK are set
- * 4) check that interrupts are still disabled
- */
-static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
-{
-	unsigned long ti_work;
-
-	lockdep_assert_irqs_disabled();
-
-	/* Flush pending rcuog wakeup before the last need_resched() check */
-	tick_nohz_user_enter_prepare();
-
-	ti_work = read_thread_flags();
-	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
-		ti_work = exit_to_user_mode_loop(regs, ti_work);
-
-	arch_exit_to_user_mode_prepare(regs, ti_work);
-
-	/* Ensure that kernel state is sane for a return to userspace */
-	kmap_assert_nomap();
-	lockdep_assert_irqs_disabled();
-	lockdep_sys_exit();
-}
-
-/**
- * exit_to_user_mode - Fixup state when exiting to user mode
- *
- * Syscall/interrupt exit enables interrupts, but the kernel state is
- * interrupts disabled when this is invoked. Also tell RCU about it.
- *
- * 1) Trace interrupts on state
- * 2) Invoke context tracking if enabled to adjust RCU state
- * 3) Invoke architecture specific last minute exit code, e.g. speculation
- *    mitigations, etc.: arch_exit_to_user_mode()
- * 4) Tell lockdep that interrupts are enabled
- *
- * Invoked from architecture specific code when syscall_exit_to_user_mode()
- * is not suitable as the last step before returning to userspace. Must be
- * invoked with interrupts disabled and the caller must be
- * non-instrumentable.
- * The caller has to invoke syscall_exit_to_user_mode_work() before this.
- */
-static __always_inline void exit_to_user_mode(void)
-{
-	instrumentation_begin();
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare();
-	instrumentation_end();
-
-	user_enter_irqoff();
-	arch_exit_to_user_mode();
-	lockdep_hardirqs_on(CALLER_ADDR0);
-}
-
 /**
  * syscall_exit_to_user_mode_work - Handle work before returning to user mode
  * @regs:	Pointer to currents pt_regs
@@ -412,145 +173,4 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs);
  */
 void syscall_exit_to_user_mode(struct pt_regs *regs);
 
-/**
- * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
- * @regs:	Pointer to currents pt_regs
- *
- * Invoked from architecture specific entry code with interrupts disabled.
- * Can only be called when the interrupt entry came from user mode. The
- * calling code must be non-instrumentable.  When the function returns all
- * state is correct and the subsequent functions can be instrumented.
- *
- * The function establishes state (lockdep, RCU (context tracking), tracing)
- */
-void irqentry_enter_from_user_mode(struct pt_regs *regs);
-
-/**
- * irqentry_exit_to_user_mode - Interrupt exit work
- * @regs:	Pointer to current's pt_regs
- *
- * Invoked with interrupts disabled and fully valid regs. Returns with all
- * work handled, interrupts disabled such that the caller can immediately
- * switch to user mode. Called from architecture specific interrupt
- * handling code.
- *
- * The call order is #2 and #3 as described in syscall_exit_to_user_mode().
- * Interrupt exit is not invoking #1 which is the syscall specific one time
- * work.
- */
-void irqentry_exit_to_user_mode(struct pt_regs *regs);
-
-#ifndef irqentry_state
-/**
- * struct irqentry_state - Opaque object for exception state storage
- * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
- *            exit path has to invoke ct_irq_exit().
- * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
- *           lockdep state is restored correctly on exit from nmi.
- *
- * This opaque object is filled in by the irqentry_*_enter() functions and
- * must be passed back into the corresponding irqentry_*_exit() functions
- * when the exception is complete.
- *
- * Callers of irqentry_*_[enter|exit]() must consider this structure opaque
- * and all members private.  Descriptions of the members are provided to aid in
- * the maintenance of the irqentry_*() functions.
- */
-typedef struct irqentry_state {
-	union {
-		bool	exit_rcu;
-		bool	lockdep;
-	};
-} irqentry_state_t;
-#endif
-
-/**
- * irqentry_enter - Handle state tracking on ordinary interrupt entries
- * @regs:	Pointer to pt_regs of interrupted context
- *
- * Invokes:
- *  - lockdep irqflag state tracking as low level ASM entry disabled
- *    interrupts.
- *
- *  - Context tracking if the exception hit user mode.
- *
- *  - The hardirq tracer to keep the state consistent as low level ASM
- *    entry disabled interrupts.
- *
- * As a precondition, this requires that the entry came from user mode,
- * idle, or a kernel context in which RCU is watching.
- *
- * For kernel mode entries RCU handling is done conditional. If RCU is
- * watching then the only RCU requirement is to check whether the tick has
- * to be restarted. If RCU is not watching then ct_irq_enter() has to be
- * invoked on entry and ct_irq_exit() on exit.
- *
- * Avoiding the ct_irq_enter/exit() calls is an optimization but also
- * solves the problem of kernel mode pagefaults which can schedule, which
- * is not possible after invoking ct_irq_enter() without undoing it.
- *
- * For user mode entries irqentry_enter_from_user_mode() is invoked to
- * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
- * would not be possible.
- *
- * Returns: An opaque object that must be passed to idtentry_exit()
- */
-irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
-
-/**
- * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
- *
- * Conditional reschedule with additional sanity checks.
- */
-void raw_irqentry_exit_cond_resched(void);
-#ifdef CONFIG_PREEMPT_DYNAMIC
-#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
-#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
-#define irqentry_exit_cond_resched_dynamic_disabled	NULL
-DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
-#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
-#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
-DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-void dynamic_irqentry_exit_cond_resched(void);
-#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
-#endif
-#else /* CONFIG_PREEMPT_DYNAMIC */
-#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
-#endif /* CONFIG_PREEMPT_DYNAMIC */
-
-/**
- * irqentry_exit - Handle return from exception that used irqentry_enter()
- * @regs:	Pointer to pt_regs (exception entry regs)
- * @state:	Return value from matching call to irqentry_enter()
- *
- * Depending on the return target (kernel/user) this runs the necessary
- * preemption and work checks if possible and required and returns to
- * the caller with interrupts disabled and no further work pending.
- *
- * This is the last action before returning to the low level ASM code which
- * just needs to return to the appropriate context.
- *
- * Counterpart to irqentry_enter().
- */
-void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
-
-/**
- * irqentry_nmi_enter - Handle NMI entry
- * @regs:	Pointer to currents pt_regs
- *
- * Similar to irqentry_enter() but taking care of the NMI constraints.
- */
-irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs);
-
-/**
- * irqentry_nmi_exit - Handle return from NMI handling
- * @regs:	Pointer to pt_regs (NMI entry regs)
- * @irq_state:	Return value from matching call to irqentry_nmi_enter()
- *
- * Last action before returning to the low level assembly code.
- *
- * Counterpart to irqentry_nmi_enter().
- */
-void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state);
-
 #endif
diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
new file mode 100644
index 000000000000..8af374331900
--- /dev/null
+++ b/include/linux/irq-entry-common.h
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_IRQENTRYCOMMON_H
+#define __LINUX_IRQENTRYCOMMON_H
+
+#include <linux/static_call_types.h>
+#include <linux/syscalls.h>
+#include <linux/context_tracking.h>
+#include <linux/tick.h>
+#include <linux/kmsan.h>
+
+#include <asm/entry-common.h>
+
+/*
+ * Define dummy _TIF work flags if not defined by the architecture or for
+ * disabled functionality.
+ */
+#ifndef _TIF_PATCH_PENDING
+# define _TIF_PATCH_PENDING		(0)
+#endif
+
+/*
+ * TIF flags handled in exit_to_user_mode_loop()
+ */
+#ifndef ARCH_EXIT_TO_USER_MODE_WORK
+# define ARCH_EXIT_TO_USER_MODE_WORK		(0)
+#endif
+
+#define EXIT_TO_USER_MODE_WORK						\
+	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
+	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY |			\
+	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |			\
+	 ARCH_EXIT_TO_USER_MODE_WORK)
+
+/**
+ * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Defaults to an empty implementation. Can be replaced by architecture
+ * specific code.
+ *
+ * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
+ * section. Use __always_inline so the compiler cannot push it out of line
+ * and make it instrumentable.
+ */
+static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs);
+
+#ifndef arch_enter_from_user_mode
+static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {}
+#endif
+
+/**
+ * enter_from_user_mode - Establish state when coming from user mode
+ *
+ * Syscall/interrupt entry disables interrupts, but user mode is traced as
+ * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
+ *
+ * 1) Tell lockdep that interrupts are disabled
+ * 2) Invoke context tracking if enabled to reactivate RCU
+ * 3) Trace interrupts off state
+ *
+ * Invoked from architecture specific syscall entry code with interrupts
+ * disabled. The calling code has to be non-instrumentable. When the
+ * function returns all state is correct and interrupts are still
+ * disabled. The subsequent functions can be instrumented.
+ *
+ * This is invoked when there is architecture specific functionality to be
+ * done between establishing state and enabling interrupts. The caller must
+ * enable interrupts before invoking syscall_enter_from_user_mode_work().
+ */
+static __always_inline void enter_from_user_mode(struct pt_regs *regs)
+{
+	arch_enter_from_user_mode(regs);
+	lockdep_hardirqs_off(CALLER_ADDR0);
+
+	CT_WARN_ON(__ct_state() != CT_STATE_USER);
+	user_exit_irqoff();
+
+	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+}
+
+/**
+ * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Defaults to local_irq_enable(). Can be supplied by architecture specific
+ * code.
+ */
+static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
+
+#ifndef local_irq_enable_exit_to_user
+static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
+{
+	local_irq_enable();
+}
+#endif
+
+/**
+ * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
+ *
+ * Defaults to local_irq_disable(). Can be supplied by architecture specific
+ * code.
+ */
+static inline void local_irq_disable_exit_to_user(void);
+
+#ifndef local_irq_disable_exit_to_user
+static inline void local_irq_disable_exit_to_user(void)
+{
+	local_irq_disable();
+}
+#endif
+
+/**
+ * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
+ *				 to user mode.
+ * @regs:	Pointer to currents pt_regs
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Invoked from exit_to_user_mode_loop() with interrupt enabled
+ *
+ * Defaults to NOOP. Can be supplied by architecture specific code.
+ */
+static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
+					       unsigned long ti_work);
+
+#ifndef arch_exit_to_user_mode_work
+static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
+					       unsigned long ti_work)
+{
+}
+#endif
+
+/**
+ * arch_exit_to_user_mode_prepare - Architecture specific preparation for
+ *				    exit to user mode.
+ * @regs:	Pointer to currents pt_regs
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the last
+ * function before return. Defaults to NOOP.
+ */
+static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
+						  unsigned long ti_work);
+
+#ifndef arch_exit_to_user_mode_prepare
+static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
+						  unsigned long ti_work)
+{
+}
+#endif
+
+/**
+ * arch_exit_to_user_mode - Architecture specific final work before
+ *			    exit to user mode.
+ *
+ * Invoked from exit_to_user_mode() with interrupt disabled as the last
+ * function before return. Defaults to NOOP.
+ *
+ * This needs to be __always_inline because it is non-instrumentable code
+ * invoked after context tracking switched to user mode.
+ *
+ * An architecture implementation must not do anything complex, no locking
+ * etc. The main purpose is for speculation mitigations.
+ */
+static __always_inline void arch_exit_to_user_mode(void);
+
+#ifndef arch_exit_to_user_mode
+static __always_inline void arch_exit_to_user_mode(void) { }
+#endif
+
+/**
+ * arch_do_signal_or_restart -  Architecture specific signal delivery function
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from exit_to_user_mode_loop().
+ */
+void arch_do_signal_or_restart(struct pt_regs *regs);
+
+/**
+ * exit_to_user_mode_loop - do any pending work before leaving to user space
+ */
+unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
+				     unsigned long ti_work);
+
+/**
+ * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
+ * @regs:	Pointer to pt_regs on entry stack
+ *
+ * 1) check that interrupts are disabled
+ * 2) call tick_nohz_user_enter_prepare()
+ * 3) call exit_to_user_mode_loop() if any flags from
+ *    EXIT_TO_USER_MODE_WORK are set
+ * 4) check that interrupts are still disabled
+ */
+static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
+{
+	unsigned long ti_work;
+
+	lockdep_assert_irqs_disabled();
+
+	/* Flush pending rcuog wakeup before the last need_resched() check */
+	tick_nohz_user_enter_prepare();
+
+	ti_work = read_thread_flags();
+	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
+		ti_work = exit_to_user_mode_loop(regs, ti_work);
+
+	arch_exit_to_user_mode_prepare(regs, ti_work);
+
+	/* Ensure that kernel state is sane for a return to userspace */
+	kmap_assert_nomap();
+	lockdep_assert_irqs_disabled();
+	lockdep_sys_exit();
+}
+
+/**
+ * exit_to_user_mode - Fixup state when exiting to user mode
+ *
+ * Syscall/interrupt exit enables interrupts, but the kernel state is
+ * interrupts disabled when this is invoked. Also tell RCU about it.
+ *
+ * 1) Trace interrupts on state
+ * 2) Invoke context tracking if enabled to adjust RCU state
+ * 3) Invoke architecture specific last minute exit code, e.g. speculation
+ *    mitigations, etc.: arch_exit_to_user_mode()
+ * 4) Tell lockdep that interrupts are enabled
+ *
+ * Invoked from architecture specific code when syscall_exit_to_user_mode()
+ * is not suitable as the last step before returning to userspace. Must be
+ * invoked with interrupts disabled and the caller must be
+ * non-instrumentable.
+ * The caller has to invoke syscall_exit_to_user_mode_work() before this.
+ */
+static __always_inline void exit_to_user_mode(void)
+{
+	instrumentation_begin();
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare();
+	instrumentation_end();
+
+	user_enter_irqoff();
+	arch_exit_to_user_mode();
+	lockdep_hardirqs_on(CALLER_ADDR0);
+}
+
+/**
+ * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from architecture specific entry code with interrupts disabled.
+ * Can only be called when the interrupt entry came from user mode. The
+ * calling code must be non-instrumentable.  When the function returns all
+ * state is correct and the subsequent functions can be instrumented.
+ *
+ * The function establishes state (lockdep, RCU (context tracking), tracing)
+ */
+void irqentry_enter_from_user_mode(struct pt_regs *regs);
+
+/**
+ * irqentry_exit_to_user_mode - Interrupt exit work
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Invoked with interrupts disabled and fully valid regs. Returns with all
+ * work handled, interrupts disabled such that the caller can immediately
+ * switch to user mode. Called from architecture specific interrupt
+ * handling code.
+ *
+ * The call order is #2 and #3 as described in syscall_exit_to_user_mode().
+ * Interrupt exit is not invoking #1 which is the syscall specific one time
+ * work.
+ */
+void irqentry_exit_to_user_mode(struct pt_regs *regs);
+
+#ifndef irqentry_state
+/**
+ * struct irqentry_state - Opaque object for exception state storage
+ * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
+ *            exit path has to invoke ct_irq_exit().
+ * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
+ *           lockdep state is restored correctly on exit from nmi.
+ *
+ * This opaque object is filled in by the irqentry_*_enter() functions and
+ * must be passed back into the corresponding irqentry_*_exit() functions
+ * when the exception is complete.
+ *
+ * Callers of irqentry_*_[enter|exit]() must consider this structure opaque
+ * and all members private.  Descriptions of the members are provided to aid in
+ * the maintenance of the irqentry_*() functions.
+ */
+typedef struct irqentry_state {
+	union {
+		bool	exit_rcu;
+		bool	lockdep;
+	};
+} irqentry_state_t;
+#endif
+
+/**
+ * irqentry_enter - Handle state tracking on ordinary interrupt entries
+ * @regs:	Pointer to pt_regs of interrupted context
+ *
+ * Invokes:
+ *  - lockdep irqflag state tracking as low level ASM entry disabled
+ *    interrupts.
+ *
+ *  - Context tracking if the exception hit user mode.
+ *
+ *  - The hardirq tracer to keep the state consistent as low level ASM
+ *    entry disabled interrupts.
+ *
+ * As a precondition, this requires that the entry came from user mode,
+ * idle, or a kernel context in which RCU is watching.
+ *
+ * For kernel mode entries RCU handling is done conditional. If RCU is
+ * watching then the only RCU requirement is to check whether the tick has
+ * to be restarted. If RCU is not watching then ct_irq_enter() has to be
+ * invoked on entry and ct_irq_exit() on exit.
+ *
+ * Avoiding the ct_irq_enter/exit() calls is an optimization but also
+ * solves the problem of kernel mode pagefaults which can schedule, which
+ * is not possible after invoking ct_irq_enter() without undoing it.
+ *
+ * For user mode entries irqentry_enter_from_user_mode() is invoked to
+ * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
+ * would not be possible.
+ *
+ * Returns: An opaque object that must be passed to idtentry_exit()
+ */
+irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
+
+/**
+ * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
+ *
+ * Conditional reschedule with additional sanity checks.
+ */
+void raw_irqentry_exit_cond_resched(void);
+#ifdef CONFIG_PREEMPT_DYNAMIC
+#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
+#define irqentry_exit_cond_resched_dynamic_disabled	NULL
+DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
+#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
+#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
+DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+void dynamic_irqentry_exit_cond_resched(void);
+#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
+#endif
+#else /* CONFIG_PREEMPT_DYNAMIC */
+#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
+#endif /* CONFIG_PREEMPT_DYNAMIC */
+
+/**
+ * irqentry_exit - Handle return from exception that used irqentry_enter()
+ * @regs:	Pointer to pt_regs (exception entry regs)
+ * @state:	Return value from matching call to irqentry_enter()
+ *
+ * Depending on the return target (kernel/user) this runs the necessary
+ * preemption and work checks if possible and required and returns to
+ * the caller with interrupts disabled and no further work pending.
+ *
+ * This is the last action before returning to the low level ASM code which
+ * just needs to return to the appropriate context.
+ *
+ * Counterpart to irqentry_enter().
+ */
+void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
+
+/**
+ * irqentry_nmi_enter - Handle NMI entry
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Similar to irqentry_enter() but taking care of the NMI constraints.
+ */
+irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs);
+
+/**
+ * irqentry_nmi_exit - Handle return from NMI handling
+ * @regs:	Pointer to pt_regs (NMI entry regs)
+ * @irq_state:	Return value from matching call to irqentry_nmi_enter()
+ *
+ * Last action before returning to the low level assembly code.
+ *
+ * Counterpart to irqentry_nmi_enter().
+ */
+void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state);
+
+#endif
diff --git a/kernel/entry/Makefile b/kernel/entry/Makefile
index 095c775e001e..d38f3a7e7396 100644
--- a/kernel/entry/Makefile
+++ b/kernel/entry/Makefile
@@ -9,5 +9,6 @@ KCOV_INSTRUMENT := n
 CFLAGS_REMOVE_common.o	 = -fstack-protector -fstack-protector-strong
 CFLAGS_common.o		+= -fno-stack-protector
 
-obj-$(CONFIG_GENERIC_ENTRY) 		+= common.o syscall_user_dispatch.o
+obj-$(CONFIG_GENERIC_IRQ_ENTRY) 	+= common.o
+obj-$(CONFIG_GENERIC_SYSCALL) 		+= syscall-common.o syscall_user_dispatch.o
 obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK)	+= kvm.o
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index e33691d5adf7..b82032777310 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -1,84 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include <linux/context_tracking.h>
-#include <linux/entry-common.h>
+#include <linux/irq-entry-common.h>
 #include <linux/resume_user_mode.h>
 #include <linux/highmem.h>
 #include <linux/jump_label.h>
 #include <linux/kmsan.h>
 #include <linux/livepatch.h>
-#include <linux/audit.h>
 #include <linux/tick.h>
 
-#include "common.h"
-
-#define CREATE_TRACE_POINTS
-#include <trace/events/syscalls.h>
-
-static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
-{
-	if (unlikely(audit_context())) {
-		unsigned long args[6];
-
-		syscall_get_arguments(current, regs, args);
-		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
-	}
-}
-
-long syscall_trace_enter(struct pt_regs *regs, long syscall,
-				unsigned long work)
-{
-	long ret = 0;
-
-	/*
-	 * Handle Syscall User Dispatch.  This must comes first, since
-	 * the ABI here can be something that doesn't make sense for
-	 * other syscall_work features.
-	 */
-	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
-		if (syscall_user_dispatch(regs))
-			return -1L;
-	}
-
-	/* Handle ptrace */
-	if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
-		ret = ptrace_report_syscall_entry(regs);
-		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
-			return -1L;
-	}
-
-	/* Do seccomp after ptrace, to catch any tracer changes. */
-	if (work & SYSCALL_WORK_SECCOMP) {
-		ret = __secure_computing(NULL);
-		if (ret == -1L)
-			return ret;
-	}
-
-	/* Either of the above might have changed the syscall number */
-	syscall = syscall_get_nr(current, regs);
-
-	if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
-		trace_sys_enter(regs, syscall);
-		/*
-		 * Probes or BPF hooks in the tracepoint may have changed the
-		 * system call number as well.
-		 */
-		syscall = syscall_get_nr(current, regs);
-	}
-
-	syscall_enter_audit(regs, syscall);
-
-	return ret ? : syscall;
-}
-
-noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
-{
-	enter_from_user_mode(regs);
-	instrumentation_begin();
-	local_irq_enable();
-	instrumentation_end();
-}
-
 /* Workaround to allow gradual conversion of architecture code */
 void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
 
@@ -133,93 +62,6 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 	return ti_work;
 }
 
-/*
- * If SYSCALL_EMU is set, then the only reason to report is when
- * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP).  This syscall
- * instruction has been already reported in syscall_enter_from_user_mode().
- */
-static inline bool report_single_step(unsigned long work)
-{
-	if (work & SYSCALL_WORK_SYSCALL_EMU)
-		return false;
-
-	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
-}
-
-static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
-{
-	bool step;
-
-	/*
-	 * If the syscall was rolled back due to syscall user dispatching,
-	 * then the tracers below are not invoked for the same reason as
-	 * the entry side was not invoked in syscall_trace_enter(): The ABI
-	 * of these syscalls is unknown.
-	 */
-	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
-		if (unlikely(current->syscall_dispatch.on_dispatch)) {
-			current->syscall_dispatch.on_dispatch = false;
-			return;
-		}
-	}
-
-	audit_syscall_exit(regs);
-
-	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
-		trace_sys_exit(regs, syscall_get_return_value(current, regs));
-
-	step = report_single_step(work);
-	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
-		ptrace_report_syscall_exit(regs, step);
-}
-
-/*
- * Syscall specific exit to user mode preparation. Runs with interrupts
- * enabled.
- */
-static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
-{
-	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
-	unsigned long nr = syscall_get_nr(current, regs);
-
-	CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
-
-	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
-		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
-			local_irq_enable();
-	}
-
-	rseq_syscall(regs);
-
-	/*
-	 * Do one-time syscall specific work. If these work items are
-	 * enabled, we want to run them exactly once per syscall exit with
-	 * interrupts enabled.
-	 */
-	if (unlikely(work & SYSCALL_WORK_EXIT))
-		syscall_exit_work(regs, work);
-}
-
-static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
-{
-	syscall_exit_to_user_mode_prepare(regs);
-	local_irq_disable_exit_to_user();
-	exit_to_user_mode_prepare(regs);
-}
-
-void syscall_exit_to_user_mode_work(struct pt_regs *regs)
-{
-	__syscall_exit_to_user_mode_work(regs);
-}
-
-__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
-{
-	instrumentation_begin();
-	__syscall_exit_to_user_mode_work(regs);
-	instrumentation_end();
-	exit_to_user_mode();
-}
-
 noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
 {
 	enter_from_user_mode(regs);
diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c
new file mode 100644
index 000000000000..0eb036986ad4
--- /dev/null
+++ b/kernel/entry/syscall-common.c
@@ -0,0 +1,159 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/audit.h>
+#include <linux/entry-common.h>
+#include "common.h"
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/syscalls.h>
+
+static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
+{
+	if (unlikely(audit_context())) {
+		unsigned long args[6];
+
+		syscall_get_arguments(current, regs, args);
+		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
+	}
+}
+
+long syscall_trace_enter(struct pt_regs *regs, long syscall,
+				unsigned long work)
+{
+	long ret = 0;
+
+	/*
+	 * Handle Syscall User Dispatch.  This must comes first, since
+	 * the ABI here can be something that doesn't make sense for
+	 * other syscall_work features.
+	 */
+	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
+		if (syscall_user_dispatch(regs))
+			return -1L;
+	}
+
+	/* Handle ptrace */
+	if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
+		ret = ptrace_report_syscall_entry(regs);
+		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
+			return -1L;
+	}
+
+	/* Do seccomp after ptrace, to catch any tracer changes. */
+	if (work & SYSCALL_WORK_SECCOMP) {
+		ret = __secure_computing(NULL);
+		if (ret == -1L)
+			return ret;
+	}
+
+	/* Either of the above might have changed the syscall number */
+	syscall = syscall_get_nr(current, regs);
+
+	if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
+		trace_sys_enter(regs, syscall);
+		/*
+		 * Probes or BPF hooks in the tracepoint may have changed the
+		 * system call number as well.
+		 */
+		syscall = syscall_get_nr(current, regs);
+	}
+
+	syscall_enter_audit(regs, syscall);
+
+	return ret ? : syscall;
+}
+
+noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
+{
+	enter_from_user_mode(regs);
+	instrumentation_begin();
+	local_irq_enable();
+	instrumentation_end();
+}
+
+/*
+ * If SYSCALL_EMU is set, then the only reason to report is when
+ * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP).  This syscall
+ * instruction has been already reported in syscall_enter_from_user_mode().
+ */
+static inline bool report_single_step(unsigned long work)
+{
+	if (work & SYSCALL_WORK_SYSCALL_EMU)
+		return false;
+
+	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
+}
+
+static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
+{
+	bool step;
+
+	/*
+	 * If the syscall was rolled back due to syscall user dispatching,
+	 * then the tracers below are not invoked for the same reason as
+	 * the entry side was not invoked in syscall_trace_enter(): The ABI
+	 * of these syscalls is unknown.
+	 */
+	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
+		if (unlikely(current->syscall_dispatch.on_dispatch)) {
+			current->syscall_dispatch.on_dispatch = false;
+			return;
+		}
+	}
+
+	audit_syscall_exit(regs);
+
+	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
+		trace_sys_exit(regs, syscall_get_return_value(current, regs));
+
+	step = report_single_step(work);
+	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
+		ptrace_report_syscall_exit(regs, step);
+}
+
+/*
+ * Syscall specific exit to user mode preparation. Runs with interrupts
+ * enabled.
+ */
+static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
+{
+	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
+	unsigned long nr = syscall_get_nr(current, regs);
+
+	CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
+
+	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
+			local_irq_enable();
+	}
+
+	rseq_syscall(regs);
+
+	/*
+	 * Do one-time syscall specific work. If these work items are
+	 * enabled, we want to run them exactly once per syscall exit with
+	 * interrupts enabled.
+	 */
+	if (unlikely(work & SYSCALL_WORK_EXIT))
+		syscall_exit_work(regs, work);
+}
+
+static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
+{
+	syscall_exit_to_user_mode_prepare(regs);
+	local_irq_disable_exit_to_user();
+	exit_to_user_mode_prepare(regs);
+}
+
+void syscall_exit_to_user_mode_work(struct pt_regs *regs)
+{
+	__syscall_exit_to_user_mode_work(regs);
+}
+
+__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
+{
+	instrumentation_begin();
+	__syscall_exit_to_user_mode_work(regs);
+	instrumentation_end();
+	exit_to_user_mode();
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 27a8fbd58091..2d560bb3efaa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -68,8 +68,8 @@
 #include <linux/workqueue_api.h>
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
-# ifdef CONFIG_GENERIC_ENTRY
-#  include <linux/entry-common.h>
+# ifdef CONFIG_GENERIC_IRQ_ENTRY
+#  include <linux/irq-entry-common.h>
 # endif
 #endif
 
@@ -7398,8 +7398,8 @@ EXPORT_SYMBOL(__cond_resched_rwlock_write);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
-#ifdef CONFIG_GENERIC_ENTRY
-#include <linux/entry-common.h>
+#ifdef CONFIG_GENERIC_IRQ_ENTRY
+#include <linux/irq-entry-common.h>
 #endif
 
 /*
-- 
2.34.1


* [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (8 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 12:05   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry Jinjie Ruan
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

ARM64 requires an additional check to decide whether to reschedule on
return from interrupt.

Add arch_irqentry_exit_need_resched() with a default implementation
that returns true, and hook it into the need_resched() condition in
raw_irqentry_exit_cond_resched().

This allows ARM64 to provide an architecture-specific version when
switching over to the generic entry code.
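
For illustration only, the override pattern can be modelled in plain
userspace C. This is a rough sketch, not kernel code: every model_*
name is a hypothetical stand-in for kernel state (need_resched(),
preempt_count(), preempt_schedule_irq()), and the condition the
architecture checks is reduced to a single flag.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
static bool model_need_resched;    /* models need_resched() */
static bool model_arch_allows;     /* models the arch-specific condition */
static int  model_preempt_count;   /* models preempt_count() */
static int  model_schedule_calls;  /* counts modelled preempt_schedule_irq() calls */

/*
 * Architecture side: define the function and a macro of the same name,
 * so the generic fallback below is compiled out.
 */
static inline bool arch_irqentry_exit_need_resched(void)
{
	return model_arch_allows;
}
#define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched

/* Generic side: fallback used only when no architecture version exists. */
#ifndef arch_irqentry_exit_need_resched
static inline bool arch_irqentry_exit_need_resched(void) { return true; }
#endif

/* Modelled shape of the check in raw_irqentry_exit_cond_resched(). */
static void model_irqentry_exit_cond_resched(void)
{
	if (!model_preempt_count) {
		if (model_need_resched && arch_irqentry_exit_need_resched())
			model_schedule_calls++;
	}
}
```

With the architecture hook returning false, the model never
reschedules even when model_need_resched is set; without the #define,
the fallback returns true and behaviour matches the code before this
patch.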

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 kernel/entry/common.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index b82032777310..4aa9656fa1b4 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -142,6 +142,20 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	return ret;
 }
 
+/**
+ * arch_irqentry_exit_need_resched - Architecture specific need resched function
+ *
+ * Invoked from raw_irqentry_exit_cond_resched() to check whether a
+ * reschedule is needed. Defaults to returning true.
+ *
+ * Lets an architecture skip preempting a task on return from an IRQ.
+ */
+static inline bool arch_irqentry_exit_need_resched(void);
+
+#ifndef arch_irqentry_exit_need_resched
+static inline bool arch_irqentry_exit_need_resched(void) { return true; }
+#endif
+
 void raw_irqentry_exit_cond_resched(void)
 {
 	if (!preempt_count()) {
@@ -149,7 +163,7 @@ void raw_irqentry_exit_cond_resched(void)
 		rcu_irq_exit_check_preempt();
 		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
 			WARN_ON_ONCE(!on_thread_stack());
-		if (need_resched())
+		if (need_resched() && arch_irqentry_exit_need_resched())
 			preempt_schedule_irq();
 	}
 }
-- 
2.34.1


* [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (9 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64 Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-10 12:24   ` Mark Rutland
  2024-12-06 10:17 ` [PATCH -next v5 12/22] arm64/ptrace: Split report_syscall() function Jinjie Ruan
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

Currently, x86, RISC-V and LoongArch use the generic entry code. Convert
arm64 to use the generic entry infrastructure from kernel/entry/*. The
generic entry code eases maintenance and makes the code more elegant.

Switch arm64 to the generic IRQ entry first, which removes 100+ lines of
duplicated code; the switch to the generic entry will be completed later.
Splitting the conversion into two steps, as Mark suggested, makes it
easier to review.

The changes are as follows:
 - Replace the bodies of *enter_from/exit_to_kernel_mode() with calls to
   the generic irqentry_enter/exit(). Likewise, make
   *enter_from/exit_to_user_mode() wrap the generic
   enter_from/exit_to_user_mode(), because they are exactly the same so
   far.

 - Remove arm64_enter/exit_nmi() and use the generic
   irqentry_nmi_enter/exit() because they are exactly the same, so the
   temporary arm64 definition of irqentry_state can also be removed.

 - Remove the arm64 PREEMPT_DYNAMIC code, as the generic entry does the
   same thing once arm64 implements arch_irqentry_exit_need_resched().
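
For readers who want the resulting preemption decision in one place,
here is a small userspace model of it (an illustration only, not kernel
code). struct demo_cpu_state and both demo_*() helpers are hypothetical;
their fields stand in for system_uses_irq_prio_masking(),
read_sysreg(daif) and system_capabilities_finalized().

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the arm64 hook looks at. */
struct demo_cpu_state {
	bool uses_irq_prio_masking;	/* models system_uses_irq_prio_masking() */
	unsigned long daif;		/* models read_sysreg(daif) */
	bool caps_finalized;		/* models system_capabilities_finalized() */
};

/* Models the arm64 arch_irqentry_exit_need_resched() checks. */
static bool demo_arch_need_resched(const struct demo_cpu_state *s)
{
	/* With priority masking, any DAIF bit set means an NMI was handled. */
	if (s->uses_irq_prio_masking && s->daif)
		return false;

	/* No preemption from IRQ until cpufeatures have been enabled. */
	if (!s->caps_finalized)
		return false;

	return true;
}

/* Models the condition in the generic raw_irqentry_exit_cond_resched(). */
static bool demo_exit_should_preempt(int preempt_count, bool need_resched,
				     const struct demo_cpu_state *s)
{
	if (preempt_count)
		return false;

	return need_resched && demo_arch_need_resched(s);
}
```

The model leaves out the RCU and stack sanity checks that the generic
function also performs; it only shows how the arch hook narrows the
need_resched() condition.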

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/Kconfig                    |   1 +
 arch/arm64/include/asm/entry-common.h |  64 ++++++
 arch/arm64/include/asm/preempt.h      |   6 -
 arch/arm64/kernel/entry-common.c      | 307 ++++++--------------------
 arch/arm64/kernel/signal.c            |   3 +-
 5 files changed, 129 insertions(+), 252 deletions(-)
 create mode 100644 arch/arm64/include/asm/entry-common.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0cd423d9aa5b..3751ab9f2a21 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -150,6 +150,7 @@ config ARM64
 	select GENERIC_EARLY_IOREMAP
 	select GENERIC_IDLE_POLL_SETUP
 	select GENERIC_IOREMAP
+	select GENERIC_IRQ_ENTRY
 	select GENERIC_IRQ_IPI
 	select GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD
 	select GENERIC_IRQ_PROBE
diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
new file mode 100644
index 000000000000..1cc9d966a6c3
--- /dev/null
+++ b/arch/arm64/include/asm/entry-common.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_ARM64_ENTRY_COMMON_H
+#define _ASM_ARM64_ENTRY_COMMON_H
+
+#include <linux/thread_info.h>
+
+#include <asm/daifflags.h>
+#include <asm/fpsimd.h>
+#include <asm/mte.h>
+#include <asm/stacktrace.h>
+
+#define ARCH_EXIT_TO_USER_MODE_WORK (_TIF_MTE_ASYNC_FAULT | _TIF_FOREIGN_FPSTATE)
+
+static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
+							unsigned long ti_work)
+{
+	if (ti_work & _TIF_MTE_ASYNC_FAULT) {
+		clear_thread_flag(TIF_MTE_ASYNC_FAULT);
+		send_sig_fault(SIGSEGV, SEGV_MTEAERR, (void __user *)NULL, current);
+	}
+
+	if (ti_work & _TIF_FOREIGN_FPSTATE)
+		fpsimd_restore_current_state();
+}
+
+#define arch_exit_to_user_mode_work arch_exit_to_user_mode_work
+
+static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
+						  unsigned long ti_work)
+{
+	local_daif_mask();
+}
+
+#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
+
+static inline bool arch_irqentry_exit_need_resched(void)
+{
+	/*
+	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+	 * DAIF we must have handled an NMI, so skip preemption.
+	 */
+	if (system_uses_irq_prio_masking() && read_sysreg(daif))
+		return false;
+
+	/*
+	 * Preempting a task from an IRQ means we leave copies of PSTATE
+	 * on the stack. cpufeature's enable calls may modify PSTATE, but
+	 * resuming one of these preempted tasks would undo those changes.
+	 *
+	 * Only allow a task to be preempted once cpufeatures have been
+	 * enabled.
+	 */
+	if (!system_capabilities_finalized())
+		return false;
+
+	return true;
+}
+
+#define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched
+
+#endif /* _ASM_ARM64_ENTRY_COMMON_H */
diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index 0f0ba250efe8..932ea4b62042 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -2,7 +2,6 @@
 #ifndef __ASM_PREEMPT_H
 #define __ASM_PREEMPT_H
 
-#include <linux/jump_label.h>
 #include <linux/thread_info.h>
 
 #define PREEMPT_NEED_RESCHED	BIT(32)
@@ -85,22 +84,17 @@ static inline bool should_resched(int preempt_offset)
 void preempt_schedule(void);
 void preempt_schedule_notrace(void);
 
-void raw_irqentry_exit_cond_resched(void);
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
-DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
 void dynamic_preempt_schedule(void);
 #define __preempt_schedule()		dynamic_preempt_schedule()
 void dynamic_preempt_schedule_notrace(void);
 #define __preempt_schedule_notrace()	dynamic_preempt_schedule_notrace()
-void dynamic_irqentry_exit_cond_resched(void);
-#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
 
 #else /* CONFIG_PREEMPT_DYNAMIC */
 
 #define __preempt_schedule()		preempt_schedule()
 #define __preempt_schedule_notrace()	preempt_schedule_notrace()
-#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
 
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 #endif /* CONFIG_PREEMPTION */
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 015a65d19b52..95885da2d776 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -6,6 +6,7 @@
  */
 
 #include <linux/context_tracking.h>
+#include <linux/irq-entry-common.h>
 #include <linux/kasan.h>
 #include <linux/linkage.h>
 #include <linux/lockdep.h>
@@ -28,13 +29,6 @@
 #include <asm/sysreg.h>
 #include <asm/system_misc.h>
 
-typedef struct irqentry_state {
-	union {
-		bool	exit_rcu;
-		bool	lockdep;
-	};
-} irqentry_state_t;
-
 /*
  * Handle IRQ/context state management when entering from kernel mode.
  * Before this function is called it is not safe to call regular kernel code,
@@ -45,24 +39,7 @@ typedef struct irqentry_state {
  */
 static __always_inline irqentry_state_t __enter_from_kernel_mode(struct pt_regs *regs)
 {
-	irqentry_state_t state = {
-		.exit_rcu = false,
-	};
-
-	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
-		lockdep_hardirqs_off(CALLER_ADDR0);
-		ct_irq_enter();
-		trace_hardirqs_off_finish();
-
-		state.exit_rcu = true;
-		return state;
-	}
-
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	rcu_irq_enter_check_tick();
-	trace_hardirqs_off_finish();
-
-	return state;
+	return irqentry_enter(regs);
 }
 
 static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
@@ -75,49 +52,6 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 	return state;
 }
 
-static inline bool arm64_need_resched(void)
-{
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return false;
-
-	/*
-	 * Preempting a task from an IRQ means we leave copies of PSTATE
-	 * on the stack. cpufeature's enable calls may modify PSTATE, but
-	 * resuming one of these preempted tasks would undo those changes.
-	 *
-	 * Only allow a task to be preempted once cpufeatures have been
-	 * enabled.
-	 */
-	if (!system_capabilities_finalized())
-		return false;
-
-	return true;
-}
-
-void raw_irqentry_exit_cond_resched(void)
-{
-	if (!preempt_count()) {
-		if (need_resched() && arm64_need_resched())
-			preempt_schedule_irq();
-	}
-}
-
-#ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-void dynamic_irqentry_exit_cond_resched(void)
-{
-	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-		return;
-	raw_irqentry_exit_cond_resched();
-}
-#endif
-
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -129,25 +63,7 @@ void dynamic_irqentry_exit_cond_resched(void)
 static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
 						  irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
-
-	if (!regs_irqs_disabled(regs)) {
-		if (state.exit_rcu) {
-			trace_hardirqs_on_prepare();
-			lockdep_hardirqs_on_prepare();
-			ct_irq_exit();
-			lockdep_hardirqs_on(CALLER_ADDR0);
-			return;
-		}
-
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
-		trace_hardirqs_on();
-	} else {
-		if (state.exit_rcu)
-			ct_irq_exit();
-	}
+	irqentry_exit(regs, state);
 }
 
 static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
@@ -162,18 +78,15 @@ static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
  * Before this function is called it is not safe to call regular kernel code,
  * instrumentable code, or any code which may trigger an exception.
  */
-static __always_inline void __enter_from_user_mode(void)
+static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
 {
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	CT_WARN_ON(ct_state() != CT_STATE_USER);
-	user_exit_irqoff();
-	trace_hardirqs_off_finish();
+	enter_from_user_mode(regs);
 	mte_disable_tco_entry(current);
 }
 
-static __always_inline void enter_from_user_mode(struct pt_regs *regs)
+static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)
 {
-	__enter_from_user_mode();
+	__enter_from_user_mode(regs);
 }
 
 /*
@@ -181,113 +94,17 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
  * After this function returns it is not safe to call regular kernel code,
  * instrumentable code, or any code which may trigger an exception.
  */
-static __always_inline void __exit_to_user_mode(void)
+static __always_inline void arm64_exit_to_user_mode(struct pt_regs *regs)
 {
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare();
-	user_enter_irqoff();
-	lockdep_hardirqs_on(CALLER_ADDR0);
-}
-
-static void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
-{
-	do {
-		local_irq_enable();
-
-		if (thread_flags & _TIF_NEED_RESCHED)
-			schedule();
-
-		if (thread_flags & _TIF_UPROBE)
-			uprobe_notify_resume(regs);
-
-		if (thread_flags & _TIF_MTE_ASYNC_FAULT) {
-			clear_thread_flag(TIF_MTE_ASYNC_FAULT);
-			send_sig_fault(SIGSEGV, SEGV_MTEAERR,
-				       (void __user *)NULL, current);
-		}
-
-		if (thread_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
-			do_signal(regs);
-
-		if (thread_flags & _TIF_NOTIFY_RESUME)
-			resume_user_mode_work(regs);
-
-		if (thread_flags & _TIF_FOREIGN_FPSTATE)
-			fpsimd_restore_current_state();
-
-		local_irq_disable();
-		thread_flags = read_thread_flags();
-	} while (thread_flags & _TIF_WORK_MASK);
-}
-
-static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
-{
-	unsigned long flags;
-
 	local_irq_disable();
-
-	flags = read_thread_flags();
-	if (unlikely(flags & _TIF_WORK_MASK))
-		do_notify_resume(regs, flags);
-
-	local_daif_mask();
-
-	lockdep_sys_exit();
-}
-
-static __always_inline void exit_to_user_mode(struct pt_regs *regs)
-{
 	exit_to_user_mode_prepare(regs);
 	mte_check_tfsr_exit();
-	__exit_to_user_mode();
+	exit_to_user_mode();
 }
 
 asmlinkage void noinstr asm_exit_to_user_mode(struct pt_regs *regs)
 {
-	exit_to_user_mode(regs);
-}
-
-/*
- * Handle IRQ/context state management when entering an NMI from user/kernel
- * mode. Before this function is called it is not safe to call regular kernel
- * code, instrumentable code, or any code which may trigger an exception.
- */
-static noinstr irqentry_state_t arm64_enter_nmi(struct pt_regs *regs)
-{
-	irqentry_state_t state;
-
-	state.lockdep = lockdep_hardirqs_enabled();
-
-	__nmi_enter();
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	lockdep_hardirq_enter();
-	ct_nmi_enter();
-
-	trace_hardirqs_off_finish();
-	ftrace_nmi_enter();
-
-	return state;
-}
-
-/*
- * Handle IRQ/context state management when exiting an NMI from user/kernel
- * mode. After this function returns it is not safe to call regular kernel
- * code, instrumentable code, or any code which may trigger an exception.
- */
-static void noinstr arm64_exit_nmi(struct pt_regs *regs,
-				   irqentry_state_t state)
-{
-	ftrace_nmi_exit();
-	if (state.lockdep) {
-		trace_hardirqs_on_prepare();
-		lockdep_hardirqs_on_prepare();
-	}
-
-	ct_nmi_exit();
-	lockdep_hardirq_exit();
-	if (state.lockdep)
-		lockdep_hardirqs_on(CALLER_ADDR0);
-	__nmi_exit();
+	arm64_exit_to_user_mode(regs);
 }
 
 /*
@@ -346,7 +163,7 @@ extern void (*handle_arch_fiq)(struct pt_regs *);
 static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
 				      unsigned long esr)
 {
-	arm64_enter_nmi(regs);
+	irqentry_nmi_enter(regs);
 
 	console_verbose();
 
@@ -580,10 +397,10 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
 static __always_inline void __el1_pnmi(struct pt_regs *regs,
 				       void (*handler)(struct pt_regs *))
 {
-	irqentry_state_t state = arm64_enter_nmi(regs);
+	irqentry_state_t state = irqentry_nmi_enter(regs);
 
 	do_interrupt_handler(regs, handler);
-	arm64_exit_nmi(regs, state);
+	irqentry_nmi_exit(regs, state);
 }
 
 static __always_inline void __el1_irq(struct pt_regs *regs,
@@ -624,19 +441,19 @@ asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
 	irqentry_state_t state;
 
 	local_daif_restore(DAIF_ERRCTX);
-	state = arm64_enter_nmi(regs);
+	state = irqentry_nmi_enter(regs);
 	do_serror(regs, esr);
-	arm64_exit_nmi(regs, state);
+	irqentry_nmi_exit(regs, state);
 }
 
 static void noinstr el0_da(struct pt_regs *regs, unsigned long esr)
 {
 	unsigned long far = read_sysreg(far_el1);
 
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_mem_abort(far, esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_ia(struct pt_regs *regs, unsigned long esr)
@@ -651,50 +468,50 @@ static void noinstr el0_ia(struct pt_regs *regs, unsigned long esr)
 	if (!is_ttbr0_addr(far))
 		arm64_apply_bp_hardening();
 
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_mem_abort(far, esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_fpsimd_acc(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_fpsimd_acc(esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_sve_acc(esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_sme_acc(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_sme_acc(esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_fpsimd_exc(esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_sys(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_sys(esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_pc(struct pt_regs *regs, unsigned long esr)
@@ -704,58 +521,58 @@ static void noinstr el0_pc(struct pt_regs *regs, unsigned long esr)
 	if (!is_ttbr0_addr(instruction_pointer(regs)))
 		arm64_apply_bp_hardening();
 
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_sp_pc_abort(far, esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_sp(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_sp_pc_abort(regs->sp, esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_undef(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_undef(regs, esr);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_bti(struct pt_regs *regs)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_bti(regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_mops(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_mops(regs, esr);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_gcs(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_gcs(regs, esr);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_inv(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	bad_el0_sync(regs, 0, esr);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_dbg(struct pt_regs *regs, unsigned long esr)
@@ -763,28 +580,28 @@ static void noinstr el0_dbg(struct pt_regs *regs, unsigned long esr)
 	/* Only watchpoints write FAR_EL1, otherwise its UNKNOWN */
 	unsigned long far = read_sysreg(far_el1);
 
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	do_debug_exception(far, esr, regs);
 	local_daif_restore(DAIF_PROCCTX);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_svc(struct pt_regs *regs)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	cortex_a76_erratum_1463225_svc_handler();
 	fp_user_discard();
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_svc(regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_fpac(regs, esr);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
@@ -852,7 +669,7 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 static void noinstr el0_interrupt(struct pt_regs *regs,
 				  void (*handler)(struct pt_regs *))
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 
 	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 
@@ -863,7 +680,7 @@ static void noinstr el0_interrupt(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr __el0_irq_handler_common(struct pt_regs *regs)
@@ -891,13 +708,13 @@ static void noinstr __el0_error_handler_common(struct pt_regs *regs)
 	unsigned long esr = read_sysreg(esr_el1);
 	irqentry_state_t state;
 
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_ERRCTX);
-	state = arm64_enter_nmi(regs);
+	state = irqentry_nmi_enter(regs);
 	do_serror(regs, esr);
-	arm64_exit_nmi(regs, state);
+	irqentry_nmi_exit(regs, state);
 	local_daif_restore(DAIF_PROCCTX);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 asmlinkage void noinstr el0t_64_error_handler(struct pt_regs *regs)
@@ -908,19 +725,19 @@ asmlinkage void noinstr el0t_64_error_handler(struct pt_regs *regs)
 #ifdef CONFIG_COMPAT
 static void noinstr el0_cp15(struct pt_regs *regs, unsigned long esr)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_cp15(esr, regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 static void noinstr el0_svc_compat(struct pt_regs *regs)
 {
-	enter_from_user_mode(regs);
+	arm64_enter_from_user_mode(regs);
 	cortex_a76_erratum_1463225_svc_handler();
 	local_daif_restore(DAIF_PROCCTX);
 	do_el0_svc_compat(regs);
-	exit_to_user_mode(regs);
+	arm64_exit_to_user_mode(regs);
 }
 
 asmlinkage void noinstr el0t_32_sync_handler(struct pt_regs *regs)
@@ -994,7 +811,7 @@ asmlinkage void noinstr __noreturn handle_bad_stack(struct pt_regs *regs)
 	unsigned long esr = read_sysreg(esr_el1);
 	unsigned long far = read_sysreg(far_el1);
 
-	arm64_enter_nmi(regs);
+	irqentry_nmi_enter(regs);
 	panic_bad_stack(regs, esr, far);
 }
 #endif /* CONFIG_VMAP_STACK */
@@ -1028,9 +845,9 @@ __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
 	else if (cpu_has_pan())
 		set_pstate_pan(0);
 
-	state = arm64_enter_nmi(regs);
+	state = irqentry_nmi_enter(regs);
 	ret = do_sdei_event(regs, arg);
-	arm64_exit_nmi(regs, state);
+	irqentry_nmi_exit(regs, state);
 
 	return ret;
 }
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 14ac6fdb872b..84b6628647c7 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -9,6 +9,7 @@
 #include <linux/cache.h>
 #include <linux/compat.h>
 #include <linux/errno.h>
+#include <linux/irq-entry-common.h>
 #include <linux/kernel.h>
 #include <linux/signal.h>
 #include <linux/freezer.h>
@@ -1603,7 +1604,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
  * the kernel can handle, and then we build all the user-level signal handling
  * stack-frames in one go after that.
  */
-void do_signal(struct pt_regs *regs)
+void arch_do_signal_or_restart(struct pt_regs *regs)
 {
 	unsigned long continue_addr = 0, restart_addr = 0;
 	int retval = 0;
-- 
2.34.1



* [PATCH -next v5 12/22] arm64/ptrace: Split report_syscall() function
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (10 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 13/22] arm64/ptrace: Refactor syscall_trace_enter() Jinjie Ruan
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

Split report_syscall() into two separate functions, one for syscall
entry and one for syscall exit, so that the code is clearer when arm64
switches to the generic entry.
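
The shape of the split can be sketched in userspace as follows. This is
a simplified model, not the kernel code: struct demo_ptregs,
demo_report() and the DEMO_* constants are hypothetical, and the
single-step handling on the exit path is left out.

```c
/* Hypothetical direction markers, standing in for PTRACE_SYSCALL_*. */
enum demo_syscall_dir {
	DEMO_SYSCALL_ENTER = 0,
	DEMO_SYSCALL_EXIT,
};

/* Hypothetical, cut-down register file. */
struct demo_ptregs {
	unsigned long regs[31];
};

/* Records what a tracer would observe in the marker register. */
static unsigned long demo_seen_marker;

static void demo_report(const struct demo_ptregs *regs, int regno)
{
	demo_seen_marker = regs->regs[regno];
}

/* Entry half: expose the ENTER marker, report, restore the register. */
static void demo_report_enter(struct demo_ptregs *regs, int regno)
{
	unsigned long saved_reg = regs->regs[regno];

	regs->regs[regno] = DEMO_SYSCALL_ENTER;
	demo_report(regs, regno);
	regs->regs[regno] = saved_reg;
}

/* Exit half: same save/restore dance with the EXIT marker. */
static void demo_report_exit(struct demo_ptregs *regs, int regno)
{
	unsigned long saved_reg = regs->regs[regno];

	regs->regs[regno] = DEMO_SYSCALL_EXIT;
	demo_report(regs, regno);
	regs->regs[regno] = saved_reg;
}
```

Each half no longer needs a direction parameter, so callers name the
path they are on instead of passing a flag.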

No functional changes.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/ptrace.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e4437f62a2cd..d0d801a4094a 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2298,7 +2298,7 @@ enum ptrace_syscall_dir {
 	PTRACE_SYSCALL_EXIT,
 };
 
-static void report_syscall(struct pt_regs *regs, enum ptrace_syscall_dir dir)
+static void report_syscall_enter(struct pt_regs *regs)
 {
 	int regno;
 	unsigned long saved_reg;
@@ -2321,13 +2321,24 @@ static void report_syscall(struct pt_regs *regs, enum ptrace_syscall_dir dir)
 	 */
 	regno = (is_compat_task() ? 12 : 7);
 	saved_reg = regs->regs[regno];
-	regs->regs[regno] = dir;
+	regs->regs[regno] = PTRACE_SYSCALL_ENTER;
 
-	if (dir == PTRACE_SYSCALL_ENTER) {
-		if (ptrace_report_syscall_entry(regs))
-			forget_syscall(regs);
-		regs->regs[regno] = saved_reg;
-	} else if (!test_thread_flag(TIF_SINGLESTEP)) {
+	if (ptrace_report_syscall_entry(regs))
+		forget_syscall(regs);
+	regs->regs[regno] = saved_reg;
+}
+
+static void report_syscall_exit(struct pt_regs *regs)
+{
+	int regno;
+	unsigned long saved_reg;
+
+	/* See comment for report_syscall_enter() */
+	regno = (is_compat_task() ? 12 : 7);
+	saved_reg = regs->regs[regno];
+	regs->regs[regno] = PTRACE_SYSCALL_EXIT;
+
+	if (!test_thread_flag(TIF_SINGLESTEP)) {
 		ptrace_report_syscall_exit(regs, 0);
 		regs->regs[regno] = saved_reg;
 	} else {
@@ -2347,7 +2358,7 @@ int syscall_trace_enter(struct pt_regs *regs)
 	unsigned long flags = read_thread_flags();
 
 	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-		report_syscall(regs, PTRACE_SYSCALL_ENTER);
+		report_syscall_enter(regs);
 		if (flags & _TIF_SYSCALL_EMU)
 			return NO_SYSCALL;
 	}
@@ -2375,7 +2386,7 @@ void syscall_trace_exit(struct pt_regs *regs)
 		trace_sys_exit(regs, syscall_get_return_value(current, regs));
 
 	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
-		report_syscall(regs, PTRACE_SYSCALL_EXIT);
+		report_syscall_exit(regs);
 
 	rseq_syscall(regs);
 }
-- 
2.34.1



* [PATCH -next v5 13/22] arm64/ptrace: Refactor syscall_trace_enter()
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (11 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 12/22] arm64/ptrace: Split report_syscall() function Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 14/22] arm64/ptrace: Refactor syscall_trace_exit() Jinjie Ruan
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry syscall_trace_enter() takes the syscall work flags and
the syscall number as inputs.

In preparation for moving arm64 over to the generic entry code, refactor
syscall_trace_enter() to also take the syscall number and thread flags,
and use the syscall_get_nr() helper to re-read the syscall number.
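
The reason for re-reading the number can be shown with a small userspace
sketch (an illustration only). struct demo_sc_regs, demo_get_nr(),
demo_trace_enter() and the hook type are hypothetical stand-ins for
pt_regs, syscall_get_nr(), syscall_trace_enter() and a tracer that
rewrites the syscall number.

```c
#include <stddef.h>

/* Hypothetical, cut-down register state. */
struct demo_sc_regs {
	long syscallno;
};

/* Hypothetical tracer hook; it may rewrite the syscall number. */
typedef void (*demo_hook_t)(struct demo_sc_regs *regs);

/* Stand-in for syscall_get_nr(): always reads the current value. */
static long demo_get_nr(const struct demo_sc_regs *regs)
{
	return regs->syscallno;
}

static long demo_trace_enter(struct demo_sc_regs *regs, long syscall,
			     demo_hook_t hook)
{
	if (hook)
		hook(regs);

	/* The hook might have changed the number, so the argument is stale. */
	syscall = demo_get_nr(regs);

	return syscall;
}

/* Example hook that redirects the call to number 99. */
static void demo_rewrite_hook(struct demo_sc_regs *regs)
{
	regs->syscallno = 99;
}
```

Passing the number in keeps the signature close to the generic one,
while the re-read keeps later consumers in step with what a tracer may
have written.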

No functional changes.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/syscall.h |  2 +-
 arch/arm64/kernel/ptrace.c       | 20 ++++++++++++++------
 arch/arm64/kernel/syscall.c      |  2 +-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ab8e14b96f68..6b71d335c224 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -85,7 +85,7 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
-int syscall_trace_enter(struct pt_regs *regs);
+int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
 void syscall_trace_exit(struct pt_regs *regs);
 
 #endif	/* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index d0d801a4094a..48bb813e0ef6 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2353,10 +2353,8 @@ static void report_syscall_exit(struct pt_regs *regs)
 	}
 }
 
-int syscall_trace_enter(struct pt_regs *regs)
+int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 {
-	unsigned long flags = read_thread_flags();
-
 	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
 		report_syscall_enter(regs);
 		if (flags & _TIF_SYSCALL_EMU)
@@ -2367,10 +2365,20 @@ int syscall_trace_enter(struct pt_regs *regs)
 	if (secure_computing() == -1)
 		return NO_SYSCALL;
 
-	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
-		trace_sys_enter(regs, regs->syscallno);
+	/* Either of the above might have changed the syscall number */
+	syscall = syscall_get_nr(current, regs);
+
+	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT)) {
+		trace_sys_enter(regs, syscall);
+
+		/*
+		 * Probes or BPF hooks in the tracepoint may have changed the
+		 * system call number as well.
+		 */
+		 syscall = syscall_get_nr(current, regs);
+	}
 
-	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
+	audit_syscall_entry(syscall, regs->orig_x0, regs->regs[1],
 			    regs->regs[2], regs->regs[3]);
 
 	return regs->syscallno;
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index c442fcec6b9e..eb328ee1423c 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -124,7 +124,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		 */
 		if (scno == NO_SYSCALL)
 			syscall_set_return_value(current, regs, -ENOSYS, 0);
-		scno = syscall_trace_enter(regs);
+		scno = syscall_trace_enter(regs, regs->syscallno, flags);
 		if (scno == NO_SYSCALL)
 			goto trace_exit;
 	}
-- 
2.34.1



* [PATCH -next v5 14/22] arm64/ptrace: Refactor syscall_trace_exit()
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (12 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 13/22] arm64/ptrace: Refactor syscall_trace_enter() Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 15/22] arm64/ptrace: Refator el0_svc_common() Jinjie Ruan
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry syscall_exit_work() uses the syscall work flags
passed in by its caller.

In preparation for moving arm64 over to the generic entry
code, refactor syscall_trace_exit() to also take the thread flags
as an argument.

No functional changes.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/syscall.h | 2 +-
 arch/arm64/kernel/ptrace.c       | 4 +---
 arch/arm64/kernel/syscall.c      | 3 ++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 6b71d335c224..925a257145f9 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -86,6 +86,6 @@ static inline int syscall_get_arch(struct task_struct *task)
 }
 
 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
-void syscall_trace_exit(struct pt_regs *regs);
+void syscall_trace_exit(struct pt_regs *regs, unsigned long flags);
 
 #endif	/* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 48bb813e0ef6..bb994d668d74 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2384,10 +2384,8 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 	return regs->syscallno;
 }
 
-void syscall_trace_exit(struct pt_regs *regs)
+void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 {
-	unsigned long flags = read_thread_flags();
-
 	audit_syscall_exit(regs);
 
 	if (flags & _TIF_SYSCALL_TRACEPOINT)
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index eb328ee1423c..064dc114fb9b 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -143,7 +143,8 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	}
 
 trace_exit:
-	syscall_trace_exit(regs);
+	flags = read_thread_flags();
+	syscall_trace_exit(regs, flags);
 }
 
 void do_el0_svc(struct pt_regs *regs)
-- 
2.34.1



* [PATCH -next v5 15/22] arm64/ptrace: Refactor el0_svc_common()
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (13 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 14/22] arm64/ptrace: Refactor syscall_trace_exit() Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 16/22] entry: Make syscall_exit_to_user_mode_prepare() not static Jinjie Ruan
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code terminates the process before
report_syscall_exit() if the syscall is issued within a restartable
sequence.

In preparation for moving arm64 over to the generic entry code,
refactor el0_svc_common() as follows:

- Extract a syscall_exit_to_user_mode_prepare() helper to replace
  the combination of read_thread_flags() and syscall_trace_exit(), and
  move the syscall exit check logic into it.

- Move rseq_syscall() ahead of the exit check, so the
  CONFIG_DEBUG_RSEQ check is no longer needed.

- Move the has_syscall_work() helper into asm/syscall.h so that it
  can be reused in ptrace.c.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
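Illustration only, not part of the patch. A minimal userspace sketch of
the exit path after this refactor, assuming arbitrary flag values and a
model_task structure in place of the real thread flags. All model_*
and MODEL_* names are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the arm64 thread flags; values are arbitrary. */
#define MODEL_TIF_SYSCALL_TRACE	(1UL << 0)
#define MODEL_TIF_SYSCALL_AUDIT	(1UL << 1)
#define MODEL_TIF_SINGLESTEP	(1UL << 2)
#define MODEL_TIF_SYSCALL_WORK	(MODEL_TIF_SYSCALL_TRACE | MODEL_TIF_SYSCALL_AUDIT)

struct model_task {
	unsigned long flags;	/* what read_thread_flags() would return */
	int rseq_checks;	/* how often the rseq check ran */
	int exit_reports;	/* how often exit tracing ran */
};

static bool model_has_syscall_work(unsigned long flags)
{
	return (flags & MODEL_TIF_SYSCALL_WORK) != 0;
}

/* Stand-in for syscall_trace_exit(): only counts the call. */
static void model_syscall_trace_exit(struct model_task *t, unsigned long flags)
{
	(void)flags;
	t->exit_reports++;
}

/* Same shape as syscall_exit_to_user_mode_prepare() in this patch. */
static void model_exit_prepare(struct model_task *t)
{
	unsigned long flags = t->flags;	/* sampled once, at exit time */

	t->rseq_checks++;		/* unconditional, like rseq_syscall() */

	if (model_has_syscall_work(flags) || (flags & MODEL_TIF_SINGLESTEP))
		model_syscall_trace_exit(t, flags);
}
```

In the sketch, as in the diff, the rseq check runs on every exit and
only the flags sampled at exit time decide whether exit tracing runs.
---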
 arch/arm64/include/asm/syscall.h |  7 ++++++-
 arch/arm64/kernel/ptrace.c       | 10 +++++++++-
 arch/arm64/kernel/syscall.c      | 26 +++++---------------------
 3 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 925a257145f9..6eeb1e7b033f 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -85,7 +85,12 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
+static inline bool has_syscall_work(unsigned long flags)
+{
+	return unlikely(flags & _TIF_SYSCALL_WORK);
+}
+
 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
-void syscall_trace_exit(struct pt_regs *regs, unsigned long flags);
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs);
 
 #endif	/* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index bb994d668d74..23df2e558fe9 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2384,7 +2384,7 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 	return regs->syscallno;
 }
 
-void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
+static void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 {
 	audit_syscall_exit(regs);
 
@@ -2393,8 +2393,16 @@ void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 
 	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
 		report_syscall_exit(regs);
+}
+
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
+{
+	unsigned long flags = read_thread_flags();
 
 	rseq_syscall(regs);
+
+	if (has_syscall_work(flags) || flags & _TIF_SINGLESTEP)
+		syscall_trace_exit(regs, flags);
 }
 
 /*
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 064dc114fb9b..a50db885fc34 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -65,11 +65,6 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	choose_random_kstack_offset(get_random_u16());
 }
 
-static inline bool has_syscall_work(unsigned long flags)
-{
-	return unlikely(flags & _TIF_SYSCALL_WORK);
-}
-
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 			   const syscall_fn_t syscall_table[])
 {
@@ -125,26 +120,15 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		if (scno == NO_SYSCALL)
 			syscall_set_return_value(current, regs, -ENOSYS, 0);
 		scno = syscall_trace_enter(regs, regs->syscallno, flags);
-		if (scno == NO_SYSCALL)
-			goto trace_exit;
+		if (scno == NO_SYSCALL) {
+			syscall_exit_to_user_mode_prepare(regs);
+			return;
+		}
 	}
 
 	invoke_syscall(regs, scno, sc_nr, syscall_table);
 
-	/*
-	 * The tracing status may have changed under our feet, so we have to
-	 * check again. However, if we were tracing entry, then we always trace
-	 * exit regardless, as the old entry assembly did.
-	 */
-	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-		flags = read_thread_flags();
-		if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
-			return;
-	}
-
-trace_exit:
-	flags = read_thread_flags();
-	syscall_trace_exit(regs, flags);
+	syscall_exit_to_user_mode_prepare(regs);
 }
 
 void do_el0_svc(struct pt_regs *regs)
-- 
2.34.1



* [PATCH -next v5 16/22] entry: Make syscall_exit_to_user_mode_prepare() not static
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (14 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 15/22] arm64/ptrace: Refactor el0_svc_common() Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 17/22] arm64/ptrace: Return early for ptrace_report_syscall_entry() error Jinjie Ruan
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

In order to switch to the generic entry for arm64, make
syscall_exit_to_user_mode_prepare() non-static so that it can be used
by arm64.

No functional changes.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 include/linux/entry-common.h  | 1 +
 kernel/entry/syscall-common.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index b3233e8328c5..d11bdb4679b3 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -172,5 +172,6 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs);
  * compelling architectural reason to use the separate functions.
  */
 void syscall_exit_to_user_mode(struct pt_regs *regs);
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs);
 
 #endif
diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c
index 0eb036986ad4..f78285097111 100644
--- a/kernel/entry/syscall-common.c
+++ b/kernel/entry/syscall-common.c
@@ -115,7 +115,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
  * Syscall specific exit to user mode preparation. Runs with interrupts
  * enabled.
  */
-static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
 {
 	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 	unsigned long nr = syscall_get_nr(current, regs);
-- 
2.34.1



* [PATCH -next v5 17/22] arm64/ptrace: Return early for ptrace_report_syscall_entry() error
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (15 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 16/22] entry: Make syscall_exit_to_user_mode_prepare() not static Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 18/22] arm64/ptrace: Expand secure_computing() in place Jinjie Ruan
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

As the comment for ptrace_report_syscall_entry() says, if it returns
nonzero, the calling arch code should abort the system call and must
prevent normal entry so that no system call is made.

The generic entry code checks the return value of
ptrace_report_syscall_entry(). In preparation for moving arm64 over to
the generic entry code, also return early if it returns an error.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
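Illustration only, not part of the patch. A minimal userspace sketch of
the early-return rule, assuming arbitrary flag values and a report_ret
parameter that models what ptrace_report_syscall_entry() would return.
All model_* and MODEL_* names are hypothetical.

```c
#include <assert.h>

#define MODEL_NO_SYSCALL	(-1)
#define MODEL_TIF_TRACE		(1UL << 0)	/* arbitrary value */
#define MODEL_TIF_EMU		(1UL << 1)	/* arbitrary value */

/* Stand-in for report_syscall_enter(): passes the tracer's verdict through. */
static int model_report_syscall_enter(int report_ret)
{
	return report_ret;
}

/*
 * Same decision as the patched syscall_trace_enter(): a nonzero report
 * result or syscall emulation both skip the system call.
 */
static int model_syscall_trace_enter(int scno, unsigned long flags,
				     int report_ret)
{
	if (flags & (MODEL_TIF_EMU | MODEL_TIF_TRACE)) {
		int ret = model_report_syscall_enter(report_ret);

		if (ret || (flags & MODEL_TIF_EMU))
			return MODEL_NO_SYSCALL;
	}

	/* Seccomp, tracepoints and audit would run here in the real code. */
	return scno;
}
```

The report result is only consulted when one of the two ptrace flags is
set; an untraced task always proceeds to the later stages.
---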
 arch/arm64/kernel/ptrace.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 23df2e558fe9..b53d3759baf8 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2298,10 +2298,10 @@ enum ptrace_syscall_dir {
 	PTRACE_SYSCALL_EXIT,
 };
 
-static void report_syscall_enter(struct pt_regs *regs)
+static int report_syscall_enter(struct pt_regs *regs)
 {
-	int regno;
 	unsigned long saved_reg;
+	int regno, ret;
 
 	/*
 	 * We have some ABI weirdness here in the way that we handle syscall
@@ -2323,9 +2323,13 @@ static void report_syscall_enter(struct pt_regs *regs)
 	saved_reg = regs->regs[regno];
 	regs->regs[regno] = PTRACE_SYSCALL_ENTER;
 
-	if (ptrace_report_syscall_entry(regs))
+	ret = ptrace_report_syscall_entry(regs);
+	if (ret)
 		forget_syscall(regs);
+
 	regs->regs[regno] = saved_reg;
+
+	return ret;
 }
 
 static void report_syscall_exit(struct pt_regs *regs)
@@ -2355,9 +2359,11 @@ static void report_syscall_exit(struct pt_regs *regs)
 
 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 {
+	int ret;
+
 	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-		report_syscall_enter(regs);
-		if (flags & _TIF_SYSCALL_EMU)
+		ret = report_syscall_enter(regs);
+		if (ret || (flags & _TIF_SYSCALL_EMU))
 			return NO_SYSCALL;
 	}
 
-- 
2.34.1



* [PATCH -next v5 18/22] arm64/ptrace: Expand secure_computing() in place
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (16 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 17/22] arm64/ptrace: Return early for ptrace_report_syscall_entry() error Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 19/22] arm64/ptrace: Use syscall_get_arguments() helper Jinjie Ruan
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code expands secure_computing() in place and calls
__secure_computing() directly.

In order to switch to the generic entry for arm64, expand
secure_computing() in syscall_trace_enter() in the same way.

No functional changes.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
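Illustration only, not part of the patch. A minimal userspace sketch
comparing the wrapper form with the expanded form, assuming the wrapper
is just a flag test followed by a call to the inner helper. The flag
value, the global model_thread_flags and all model_* names are
hypothetical.

```c
#include <assert.h>

#define MODEL_TIF_SECCOMP	(1UL << 0)	/* arbitrary value */

static unsigned long model_thread_flags;	/* what the wrapper consults */

/* Stand-in for __secure_computing(): returns the given verdict. */
static int model_inner_secure_computing(int verdict)
{
	return verdict;
}

/* Wrapper form: tests the task's own flag, then calls the inner helper. */
static int model_secure_computing(int verdict)
{
	if (model_thread_flags & MODEL_TIF_SECCOMP)
		return model_inner_secure_computing(verdict);
	return 0;
}

/* Expanded form: the caller tests the flags value it already holds. */
static int model_expanded_check(unsigned long flags, int verdict)
{
	if (flags & MODEL_TIF_SECCOMP)
		return model_inner_secure_computing(verdict);
	return 0;
}
```

The two forms agree whenever the flags value held by the caller matches
the task's current flags, which is the situation the sketch models.
---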
 arch/arm64/kernel/ptrace.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index b53d3759baf8..c0c00e173f61 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2368,8 +2368,11 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 	}
 
 	/* Do the secure computing after ptrace; failures should be fast. */
-	if (secure_computing() == -1)
-		return NO_SYSCALL;
+	if (flags & _TIF_SECCOMP) {
+		ret = __secure_computing(NULL);
+		if (ret == -1L)
+			return NO_SYSCALL;
+	}
 
 	/* Either of the above might have changed the syscall number */
 	syscall = syscall_get_nr(current, regs);
-- 
2.34.1



* [PATCH -next v5 19/22] arm64/ptrace: Use syscall_get_arguments() helper
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (17 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 18/22] arm64/ptrace: Expand secure_computing() in place Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 20/22] entry: Add arch_ptrace_report_syscall_entry/exit() Jinjie Ruan
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

The generic entry code checks the audit context first and uses the
syscall_get_arguments() helper.

In order to switch to the generic entry for arm64:

- Also use the helper.

- Extract the syscall_enter_audit() helper to make it clear.

- Check the audit context in syscall_enter_audit(). This only adds
  one additional check and makes no other difference, because
  audit_syscall_entry() checks the audit context first and does
  nothing if there is none.

No functional changes.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
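Illustration only, not part of the patch. A minimal userspace sketch of
the audit step, assuming a simplified register structure, a have_ctx
parameter in place of audit_context(), and a record structure in place
of the audit subsystem. All model_* names are hypothetical. The sketch
takes argument 0 from orig_x0, matching the values the old open-coded
call passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct model_regs {
	unsigned long orig_x0;	/* first argument as saved at entry */
	unsigned long regs[6];	/* regs[0] may already hold a return value */
};

struct model_audit_record {
	bool logged;
	long syscall;
	unsigned long a[4];	/* audit records the first four arguments */
};

/* Stand-in for syscall_get_arguments(): argument 0 comes from orig_x0. */
static void model_get_arguments(const struct model_regs *regs,
				unsigned long args[6])
{
	args[0] = regs->orig_x0;
	memcpy(&args[1], &regs->regs[1], 5 * sizeof(args[0]));
}

/* Same shape as syscall_enter_audit(): do nothing without a context. */
static void model_syscall_enter_audit(bool have_ctx,
				      const struct model_regs *regs,
				      long syscall,
				      struct model_audit_record *rec)
{
	if (have_ctx) {
		unsigned long args[6];

		model_get_arguments(regs, args);
		rec->logged = true;
		rec->syscall = syscall;
		memcpy(rec->a, args, sizeof(rec->a));
	}
}
```

With no audit context the record is left untouched; with one, the
first argument is read from orig_x0 rather than regs[0].
---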
 arch/arm64/kernel/ptrace.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index c0c00e173f61..3a7a1eaca0a9 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2357,6 +2357,17 @@ static void report_syscall_exit(struct pt_regs *regs)
 	}
 }
 
+static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
+{
+	if (unlikely(audit_context())) {
+		unsigned long args[6];
+
+		syscall_get_arguments(current, regs, args);
+		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
+	}
+
+}
+
 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 {
 	int ret;
@@ -2387,8 +2398,7 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 		 syscall = syscall_get_nr(current, regs);
 	}
 
-	audit_syscall_entry(syscall, regs->orig_x0, regs->regs[1],
-			    regs->regs[2], regs->regs[3]);
+	syscall_enter_audit(regs, syscall);
 
 	return regs->syscallno;
 }
-- 
2.34.1



* [PATCH -next v5 20/22] entry: Add arch_ptrace_report_syscall_entry/exit()
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (18 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 19/22] arm64/ptrace: Use syscall_get_arguments() helper Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 21/22] entry: Add has_syscall_work() helper Jinjie Ruan
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

For historical reasons, arm64 needs to save and restore a scratch
register (ip/r12 on AArch32, x7 on AArch64) around the ptrace syscall
entry/exit reports, because it uses that register to denote syscall
entry/exit. This differs from the implementation of the generic entry.

Add arch_ptrace_report_syscall_entry/exit(), which default to
ptrace_report_syscall_entry/exit(). This allows arm64 to implement
architecture-specific versions when switching over to the generic
entry code.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
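Illustration only, not part of the patch. A minimal userspace sketch of
the override idiom used here: the architecture defines a function and a
macro of the same name, and the generic side supplies a default only if
that macro is absent. The register structure, the tracer stand-in and
all model_* names (plus arch_report_entry) are hypothetical.

```c
#include <assert.h>

struct model_regs {
	unsigned long regs[31];
};

enum { MODEL_SYSCALL_ENTER = 0, MODEL_SYSCALL_EXIT = 1 };

static unsigned long model_seen_x7;	/* what the tracer saw during the stop */

/* Stand-in for the generic ptrace_report_syscall_entry(). */
static int model_ptrace_report_syscall_entry(struct model_regs *regs)
{
	model_seen_x7 = regs->regs[7];
	regs->regs[7] = 0xdead;	/* a tracer write to x7, to be discarded */
	return 0;
}

/* "Architecture" side: provide an override and announce it with a macro. */
static inline int arch_report_entry(struct model_regs *regs)
{
	unsigned long saved = regs->regs[7];
	int ret;

	regs->regs[7] = MODEL_SYSCALL_ENTER;	/* mark the stop as an entry stop */
	ret = model_ptrace_report_syscall_entry(regs);
	regs->regs[7] = saved;			/* tracer writes are dropped */

	return ret;
}
#define arch_report_entry arch_report_entry

/* "Generic" side: supply the default only if no override was announced. */
#ifndef arch_report_entry
static inline int arch_report_entry(struct model_regs *regs)
{
	return model_ptrace_report_syscall_entry(regs);
}
#endif

/* Generic caller: does not know which implementation it gets. */
static int model_syscall_trace_enter(struct model_regs *regs)
{
	return arch_report_entry(regs);
}
```

Because the macro is defined, the default body is skipped and the
caller picks up the save/restore version without any change on the
generic side.
---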
 kernel/entry/syscall-common.c | 43 +++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c
index f78285097111..9ffa6349e769 100644
--- a/kernel/entry/syscall-common.c
+++ b/kernel/entry/syscall-common.c
@@ -17,6 +17,25 @@ static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
 	}
 }
 
+/**
+ * arch_ptrace_report_syscall_entry - Architecture specific
+ *				      ptrace_report_syscall_entry().
+ *
+ * Invoked from syscall_trace_enter() to wrap ptrace_report_syscall_entry().
+ * Defaults to ptrace_report_syscall_entry.
+ *
+ * The main purpose is to support arch-specific ptrace_report_syscall_entry()
+ * implementation.
+ */
+static inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs);
+
+#ifndef arch_ptrace_report_syscall_entry
+static inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs)
+{
+	return ptrace_report_syscall_entry(regs);
+}
+#endif
+
 long syscall_trace_enter(struct pt_regs *regs, long syscall,
 				unsigned long work)
 {
@@ -34,7 +53,7 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,
 
 	/* Handle ptrace */
 	if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
-		ret = ptrace_report_syscall_entry(regs);
+		ret = arch_ptrace_report_syscall_entry(regs);
 		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
 			return -1L;
 	}
@@ -84,6 +103,26 @@ static inline bool report_single_step(unsigned long work)
 	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
 }
 
+/**
+ * arch_ptrace_report_syscall_exit - Architecture specific
+ *				     ptrace_report_syscall_exit.
+ *
+ * Invoked from syscall_exit_work() to wrap ptrace_report_syscall_exit().
+ *
+ * The main purpose is to support arch-specific ptrace_report_syscall_exit
+ * implementation.
+ */
+static inline void arch_ptrace_report_syscall_exit(struct pt_regs *regs,
+						   int step);
+
+#ifndef arch_ptrace_report_syscall_exit
+static inline void arch_ptrace_report_syscall_exit(struct pt_regs *regs,
+						   int step)
+{
+	ptrace_report_syscall_exit(regs, step);
+}
+#endif
+
 static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 {
 	bool step;
@@ -108,7 +147,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 
 	step = report_single_step(work);
 	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
-		ptrace_report_syscall_exit(regs, step);
+		arch_ptrace_report_syscall_exit(regs, step);
 }
 
 /*
-- 
2.34.1



* [PATCH -next v5 21/22] entry: Add has_syscall_work() helper
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (19 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 20/22] entry: Add arch_ptrace_report_syscall_entry/exit() Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2024-12-06 10:17 ` [PATCH -next v5 22/22] arm64: entry: Convert to generic entry Jinjie Ruan
  2025-02-08  1:15 ` [PATCH -next v5 00/22] " Jinjie Ruan
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

Add a has_syscall_work() helper and use it in entry-common.h, so that
it can also be used by architecture code that uses the generic entry.

No functional changes.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 include/linux/entry-common.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index d11bdb4679b3..3bb5d7d839f4 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -43,6 +43,11 @@
 				 SYSCALL_WORK_SYSCALL_EXIT_TRAP	|	\
 				 ARCH_SYSCALL_WORK_EXIT)
 
+static inline bool has_syscall_work(unsigned long work)
+{
+	return unlikely(work & SYSCALL_WORK_ENTER);
+}
+
 /**
  * syscall_enter_from_user_mode_prepare - Establish state and enable interrupts
  * @regs:	Pointer to currents pt_regs
@@ -90,7 +95,7 @@ static __always_inline long syscall_enter_from_user_mode_work(struct pt_regs *re
 {
 	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 
-	if (work & SYSCALL_WORK_ENTER)
+	if (has_syscall_work(work))
 		syscall = syscall_trace_enter(regs, syscall, work);
 
 	return syscall;
-- 
2.34.1



* [PATCH -next v5 22/22] arm64: entry: Convert to generic entry
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (20 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 21/22] entry: Add has_syscall_work() helper Jinjie Ruan
@ 2024-12-06 10:17 ` Jinjie Ruan
  2025-02-08  1:15 ` [PATCH -next v5 00/22] " Jinjie Ruan
  22 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2024-12-06 10:17 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, ruanjinjie, pcc, ardb, sudeep.holla,
	guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

Currently, x86, RISC-V and LoongArch use the generic entry. Convert arm64
to use the generic entry infrastructure from kernel/entry/*.
The generic entry makes maintainers' work easier and the code more elegant.

The changes are as follows:
 - Remove the TIF_SYSCALL_* flags, _TIF_WORK_MASK and _TIF_SYSCALL_WORK.
 - Remove syscall_trace_enter/exit() and use the identical generic functions.

Tested OK with the following test cases on the QEMU virt platform:
 - Perf tests.
 - Different `dynamic preempt` mode switch.
 - Pseudo NMI tests.
 - Stress-ng CPU stress test.
 - MTE test case in Documentation/arch/arm64/memory-tagging-extension.rst
   and all test cases in tools/testing/selftests/arm64/mte/*.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
v5:
- Rebased on the previous patch update.
- Define ARCH_SYSCALL_WORK_EXIT.
---
 arch/arm64/Kconfig                    |   2 +-
 arch/arm64/include/asm/entry-common.h |  70 ++++++++++++++
 arch/arm64/include/asm/syscall.h      |   7 +-
 arch/arm64/include/asm/thread_info.h  |  23 +----
 arch/arm64/kernel/debug-monitors.c    |   7 ++
 arch/arm64/kernel/ptrace.c            | 134 --------------------------
 arch/arm64/kernel/signal.c            |   2 +-
 arch/arm64/kernel/syscall.c           |   6 +-
 8 files changed, 87 insertions(+), 164 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3751ab9f2a21..a1d96712428e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,9 +148,9 @@ config ARM64
 	select GENERIC_CPU_DEVICES
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
+	select GENERIC_ENTRY
 	select GENERIC_IDLE_POLL_SETUP
 	select GENERIC_IOREMAP
-	select GENERIC_IRQ_ENTRY
 	select GENERIC_IRQ_IPI
 	select GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD
 	select GENERIC_IRQ_PROBE
diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index 1cc9d966a6c3..6082393c61f2 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -10,6 +10,12 @@
 #include <asm/mte.h>
 #include <asm/stacktrace.h>
 
+enum ptrace_syscall_dir {
+	PTRACE_SYSCALL_ENTER = 0,
+	PTRACE_SYSCALL_EXIT,
+};
+
+#define ARCH_SYSCALL_WORK_EXIT (SYSCALL_WORK_SECCOMP | SYSCALL_WORK_SYSCALL_EMU)
 #define ARCH_EXIT_TO_USER_MODE_WORK (_TIF_MTE_ASYNC_FAULT | _TIF_FOREIGN_FPSTATE)
 
 static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
@@ -61,4 +67,68 @@ static inline bool arch_irqentry_exit_need_resched(void)
 
 #define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched
 
+static inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs)
+{
+	unsigned long saved_reg;
+	int regno, ret;
+
+	/*
+	 * We have some ABI weirdness here in the way that we handle syscall
+	 * exit stops because we indicate whether or not the stop has been
+	 * signalled from syscall entry or syscall exit by clobbering a general
+	 * purpose register (ip/r12 for AArch32, x7 for AArch64) in the tracee
+	 * and restoring its old value after the stop. This means that:
+	 *
+	 * - Any writes by the tracer to this register during the stop are
+	 *   ignored/discarded.
+	 *
+	 * - The actual value of the register is not available during the stop,
+	 *   so the tracer cannot save it and restore it later.
+	 *
+	 * - Syscall stops behave differently to seccomp and pseudo-step traps
+	 *   (the latter do not nobble any registers).
+	 */
+	regno = (is_compat_task() ? 12 : 7);
+	saved_reg = regs->regs[regno];
+	regs->regs[regno] = PTRACE_SYSCALL_ENTER;
+
+	ret = ptrace_report_syscall_entry(regs);
+	if (ret)
+		forget_syscall(regs);
+
+	regs->regs[regno] = saved_reg;
+
+	return ret;
+}
+
+#define arch_ptrace_report_syscall_entry arch_ptrace_report_syscall_entry
+
+static inline void arch_ptrace_report_syscall_exit(struct pt_regs *regs,
+						   int step)
+{
+	unsigned long saved_reg;
+	int regno;
+
+	/* See comment for arch_ptrace_report_syscall_entry() */
+	regno = (is_compat_task() ? 12 : 7);
+	saved_reg = regs->regs[regno];
+	regs->regs[regno] = PTRACE_SYSCALL_EXIT;
+
+	if (!test_thread_flag(TIF_SINGLESTEP)) {
+		ptrace_report_syscall_exit(regs, 0);
+		regs->regs[regno] = saved_reg;
+	} else {
+		regs->regs[regno] = saved_reg;
+
+		/*
+		 * Signal a pseudo-step exception since we are stepping but
+		 * tracer modifications to the registers may have rewound the
+		 * state machine.
+		 */
+		ptrace_report_syscall_exit(regs, 1);
+	}
+}
+
+#define arch_ptrace_report_syscall_exit arch_ptrace_report_syscall_exit
+
 #endif /* _ASM_ARM64_ENTRY_COMMON_H */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 6eeb1e7b033f..9891b15da4c3 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -85,12 +85,9 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
-static inline bool has_syscall_work(unsigned long flags)
+static inline bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs)
 {
-	return unlikely(flags & _TIF_SYSCALL_WORK);
+	return false;
 }
 
-int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
-void syscall_exit_to_user_mode_prepare(struct pt_regs *regs);
-
 #endif	/* __ASM_SYSCALL_H */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 1114c1c3300a..543fdb00d713 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -43,6 +43,7 @@ struct thread_info {
 	void			*scs_sp;
 #endif
 	u32			cpu;
+	unsigned long		syscall_work;   /* SYSCALL_WORK_ flags */
 };
 
 #define thread_saved_pc(tsk)	\
@@ -64,11 +65,6 @@ void arch_setup_new_exec(void);
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_MTE_ASYNC_FAULT	5	/* MTE Asynchronous Tag Check Fault */
 #define TIF_NOTIFY_SIGNAL	6	/* signal notifications exist */
-#define TIF_SYSCALL_TRACE	8	/* syscall trace active */
-#define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
-#define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
-#define TIF_SECCOMP		11	/* syscall secure computing */
-#define TIF_SYSCALL_EMU		12	/* syscall emulation active */
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_FREEZE		19
 #define TIF_RESTORE_SIGMASK	20
@@ -87,28 +83,13 @@ void arch_setup_new_exec(void);
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
-#define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
-#define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
-#define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
-#define _TIF_SECCOMP		(1 << TIF_SECCOMP)
-#define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
-#define _TIF_UPROBE		(1 << TIF_UPROBE)
-#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
+#define _TIF_UPROBE            (1 << TIF_UPROBE)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
 #define _TIF_TSC_SIGSEGV	(1 << TIF_TSC_SIGSEGV)
 
-#define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
-				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_MTE_ASYNC_FAULT | \
-				 _TIF_NOTIFY_SIGNAL)
-
-#define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
-				 _TIF_SYSCALL_EMU)
-
 #ifdef CONFIG_SHADOW_CALL_STACK
 #define INIT_SCS							\
 	.scs_base	= init_shadow_call_stack,			\
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 460c09d03a73..95b70555a1a8 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -452,11 +452,18 @@ void user_enable_single_step(struct task_struct *task)
 
 	if (!test_and_set_ti_thread_flag(ti, TIF_SINGLESTEP))
 		set_regs_spsr_ss(task_pt_regs(task));
+
+	/*
+	 * Ensure that a trap is triggered once stepping out of a system
+	 * call prior to executing any user instruction.
+	 */
+	set_task_syscall_work(task, SYSCALL_EXIT_TRAP);
 }
 NOKPROBE_SYMBOL(user_enable_single_step);
 
 void user_disable_single_step(struct task_struct *task)
 {
 	clear_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP);
+	clear_task_syscall_work(task, SYSCALL_EXIT_TRAP);
 }
 NOKPROBE_SYMBOL(user_disable_single_step);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 3a7a1eaca0a9..a09058b9b7fb 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -42,9 +42,6 @@
 #include <asm/traps.h>
 #include <asm/system_misc.h>
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/syscalls.h>
-
 struct pt_regs_offset {
 	const char *name;
 	int offset;
@@ -2293,137 +2290,6 @@ long arch_ptrace(struct task_struct *child, long request,
 	return ptrace_request(child, request, addr, data);
 }
 
-enum ptrace_syscall_dir {
-	PTRACE_SYSCALL_ENTER = 0,
-	PTRACE_SYSCALL_EXIT,
-};
-
-static int report_syscall_enter(struct pt_regs *regs)
-{
-	unsigned long saved_reg;
-	int regno, ret;
-
-	/*
-	 * We have some ABI weirdness here in the way that we handle syscall
-	 * exit stops because we indicate whether or not the stop has been
-	 * signalled from syscall entry or syscall exit by clobbering a general
-	 * purpose register (ip/r12 for AArch32, x7 for AArch64) in the tracee
-	 * and restoring its old value after the stop. This means that:
-	 *
-	 * - Any writes by the tracer to this register during the stop are
-	 *   ignored/discarded.
-	 *
-	 * - The actual value of the register is not available during the stop,
-	 *   so the tracer cannot save it and restore it later.
-	 *
-	 * - Syscall stops behave differently to seccomp and pseudo-step traps
-	 *   (the latter do not nobble any registers).
-	 */
-	regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[regno];
-	regs->regs[regno] = PTRACE_SYSCALL_ENTER;
-
-	ret = ptrace_report_syscall_entry(regs);
-	if (ret)
-		forget_syscall(regs);
-
-	regs->regs[regno] = saved_reg;
-
-	return ret;
-}
-
-static void report_syscall_exit(struct pt_regs *regs)
-{
-	int regno;
-	unsigned long saved_reg;
-
-	/* See comment for report_syscall_enter() */
-	regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[regno];
-	regs->regs[regno] = PTRACE_SYSCALL_EXIT;
-
-	if (!test_thread_flag(TIF_SINGLESTEP)) {
-		ptrace_report_syscall_exit(regs, 0);
-		regs->regs[regno] = saved_reg;
-	} else {
-		regs->regs[regno] = saved_reg;
-
-		/*
-		 * Signal a pseudo-step exception since we are stepping but
-		 * tracer modifications to the registers may have rewound the
-		 * state machine.
-		 */
-		ptrace_report_syscall_exit(regs, 1);
-	}
-}
-
-static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
-{
-	if (unlikely(audit_context())) {
-		unsigned long args[6];
-
-		syscall_get_arguments(current, regs, args);
-		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
-	}
-
-}
-
-int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
-{
-	int ret;
-
-	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-		ret = report_syscall_enter(regs);
-		if (ret || (flags & _TIF_SYSCALL_EMU))
-			return NO_SYSCALL;
-	}
-
-	/* Do the secure computing after ptrace; failures should be fast. */
-	if (flags & _TIF_SECCOMP) {
-		ret = __secure_computing(NULL);
-		if (ret == -1L)
-			return NO_SYSCALL;
-	}
-
-	/* Either of the above might have changed the syscall number */
-	syscall = syscall_get_nr(current, regs);
-
-	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT)) {
-		trace_sys_enter(regs, syscall);
-
-		/*
-		 * Probes or BPF hooks in the tracepoint may have changed the
-		 * system call number as well.
-		 */
-		 syscall = syscall_get_nr(current, regs);
-	}
-
-	syscall_enter_audit(regs, syscall);
-
-	return regs->syscallno;
-}
-
-static void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
-{
-	audit_syscall_exit(regs);
-
-	if (flags & _TIF_SYSCALL_TRACEPOINT)
-		trace_sys_exit(regs, syscall_get_return_value(current, regs));
-
-	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
-		report_syscall_exit(regs);
-}
-
-void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
-{
-	unsigned long flags = read_thread_flags();
-
-	rseq_syscall(regs);
-
-	if (has_syscall_work(flags) || flags & _TIF_SINGLESTEP)
-		syscall_trace_exit(regs, flags);
-}
-
 /*
  * SPSR_ELx bits which are always architecturally RES0 per ARM DDI 0487D.a.
  * We permit userspace to set SSBS (AArch64 bit 12, AArch32 bit 23) which is
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 84b6628647c7..6cc8fe19e6a0 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -8,8 +8,8 @@
 
 #include <linux/cache.h>
 #include <linux/compat.h>
+#include <linux/entry-common.h>
 #include <linux/errno.h>
-#include <linux/irq-entry-common.h>
 #include <linux/kernel.h>
 #include <linux/signal.h>
 #include <linux/freezer.h>
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index a50db885fc34..5aa585111c4b 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -2,6 +2,7 @@
 
 #include <linux/compiler.h>
 #include <linux/context_tracking.h>
+#include <linux/entry-common.h>
 #include <linux/errno.h>
 #include <linux/nospec.h>
 #include <linux/ptrace.h>
@@ -68,6 +69,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 			   const syscall_fn_t syscall_table[])
 {
+	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 	unsigned long flags = read_thread_flags();
 
 	regs->orig_x0 = regs->regs[0];
@@ -101,7 +103,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		return;
 	}
 
-	if (has_syscall_work(flags)) {
+	if (has_syscall_work(work)) {
 		/*
 		 * The de-facto standard way to skip a system call using ptrace
 		 * is to set the system call to -1 (NO_SYSCALL) and set x0 to a
@@ -119,7 +121,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		 */
 		if (scno == NO_SYSCALL)
 			syscall_set_return_value(current, regs, -ENOSYS, 0);
-		scno = syscall_trace_enter(regs, regs->syscallno, flags);
+		scno = syscall_trace_enter(regs, regs->syscallno, work);
 		if (scno == NO_SYSCALL) {
 			syscall_exit_to_user_mode_prepare(regs);
 			return;
-- 
2.34.1



* Re: [PATCH -next v5 00/22] arm64: entry: Convert to generic entry
  2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
                   ` (21 preceding siblings ...)
  2024-12-06 10:17 ` [PATCH -next v5 22/22] arm64: entry: Convert to generic entry Jinjie Ruan
@ 2025-02-08  1:15 ` Jinjie Ruan
  2025-02-10 12:30   ` Mark Rutland
  22 siblings, 1 reply; 38+ messages in thread
From: Jinjie Ruan @ 2025-02-08  1:15 UTC (permalink / raw)
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, pcc, ardb, sudeep.holla, guohanjun,
	rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel



On 2024/12/6 18:17, Jinjie Ruan wrote:
> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> to use the generic entry infrastructure from kernel/entry/*. The generic
> entry makes maintainers' work easier and the code more elegant, and also
> removes a lot of duplicate code.
> 
> The main steps are as follows:
> - Make arm64 easier to use irqentry_enter/exit().
> - Make arm64 closer to the PREEMPT_DYNAMIC code of generic entry.
> - Split generic entry into generic irq entry and generic syscall to
>   make the single patch more concentrated in switching to one thing.
> - Switch to generic irq entry.
> - Make arm64 closer to the generic syscall code.
> - Switch to generic entry completely.
> 
> Changes in v5:
> - Do not change arm32; keep the interrupts_enabled() macro for the gicv3 driver.
> - Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
> - Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
> - Update "irqentry_state_t ret/irq_state" to "state"
>   to keep it consistent.
> - Use generic irq entry header for PREEMPT_DYNAMIC after splitting
>   the generic entry.
> - Also refactor the ARM64 syscall code.
> - Introduce arch_ptrace_report_syscall_entry/exit(), instead of
>   arch_pre/post_report_syscall_entry/exit() to simplify code.
> - Separate the syscall patches more clearly.
> - Update the commit message.

Gentle Ping.

> 
> Changes in v4:
> - Rework/cleanup split into a few patches as Mark suggested.
> - Replace interrupts_enabled() macro with regs_irqs_disabled(), instead
>   of leaving it in place.
> - Remove rcu and lockdep state in pt_regs by using temporary
>   irqentry_state_t as Mark suggested.
> - Remove some unnecessary intermediate functions to make it clear.
> - Rework preempt irq and PREEMPT_DYNAMIC code
>   to make the switch more clear.
> - arch_prepare_*_entry/exit() -> arch_pre_*_entry/exit().
> - Expand the arch functions comment.
> - Make arch functions closer to its caller.
> - Declare saved_reg in for block.
> - Remove arch_exit_to_kernel_mode_prepare(), arch_enter_from_kernel_mode().
> - Adjust "Add few arch functions to use generic entry" patch to be
>   the penultimate.
> - Update the commit message.
> - Add suggested-by.
> 
> Changes in v3:
> - Test the MTE test cases.
> - Handle forget_syscall() in arch_post_report_syscall_entry()
> - Make the arch funcs not use __weak as Thomas suggested, so move
>   the arch funcs to entry-common.h, and make arch_forget_syscall() folded
>   in arch_post_report_syscall_entry() as suggested.
> - Move report_single_step() to thread_info.h for arm64
> - Change __always_inline() to inline, add inline for the other arch funcs.
> - Remove unused signal.h for entry-common.h.
> - Add Suggested-by.
> - Update the commit message.
> 
> Changes in v2:
> - Add tested-by.
> - Fix a bug where arch_post_report_syscall_entry() was not called in
>   syscall_trace_enter() if ptrace_report_syscall_entry() returned nonzero.
> - Refactor report_syscall().
> - Add comment for arch_prepare_report_syscall_exit().
> - Adjust entry-common.h header file inclusion to alphabetical order.
> - Update the commit message.
> 
> Jinjie Ruan (22):
>   arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
>   arm64: entry: Refactor the entry and exit for exceptions from EL1
>   arm64: entry: Move arm64_preempt_schedule_irq() into
>     __exit_to_kernel_mode()
>   arm64: entry: Rework arm64_preempt_schedule_irq()
>   arm64: entry: Use preempt_count() and need_resched() helper
>   arm64: entry: Expand the need_irq_preemption() macro ahead
>   arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled
>   arm64: entry: Use different helpers to check resched for
>     PREEMPT_DYNAMIC
>   entry: Split generic entry into irq and syscall
>   entry: Add arch_irqentry_exit_need_resched() for arm64
>   arm64: entry: Switch to generic IRQ entry
>   arm64/ptrace: Split report_syscall() function
>   arm64/ptrace: Refactor syscall_trace_enter()
>   arm64/ptrace: Refactor syscall_trace_exit()
>   arm64/ptrace: Refator el0_svc_common()
>   entry: Make syscall_exit_to_user_mode_prepare() not static
>   arm64/ptrace: Return early for ptrace_report_syscall_entry() error
>   arm64/ptrace: Expand secure_computing() in place
>   arm64/ptrace: Use syscall_get_arguments() heleper
>   entry: Add arch_ptrace_report_syscall_entry/exit()
>   entry: Add has_syscall_work() helepr
>   arm64: entry: Convert to generic entry
> 
>  MAINTAINERS                           |   1 +
>  arch/Kconfig                          |   8 +
>  arch/arm64/Kconfig                    |   1 +
>  arch/arm64/include/asm/daifflags.h    |   2 +-
>  arch/arm64/include/asm/entry-common.h | 134 +++++++++
>  arch/arm64/include/asm/preempt.h      |   2 -
>  arch/arm64/include/asm/ptrace.h       |  11 +-
>  arch/arm64/include/asm/syscall.h      |   6 +-
>  arch/arm64/include/asm/thread_info.h  |  23 +-
>  arch/arm64/include/asm/xen/events.h   |   2 +-
>  arch/arm64/kernel/acpi.c              |   2 +-
>  arch/arm64/kernel/debug-monitors.c    |   9 +-
>  arch/arm64/kernel/entry-common.c      | 377 ++++++++-----------------
>  arch/arm64/kernel/ptrace.c            |  90 ------
>  arch/arm64/kernel/sdei.c              |   2 +-
>  arch/arm64/kernel/signal.c            |   3 +-
>  arch/arm64/kernel/syscall.c           |  31 +-
>  include/linux/entry-common.h          | 384 +------------------------
>  include/linux/irq-entry-common.h      | 389 ++++++++++++++++++++++++++
>  kernel/entry/Makefile                 |   3 +-
>  kernel/entry/common.c                 | 176 ++----------
>  kernel/entry/syscall-common.c         | 198 +++++++++++++
>  kernel/sched/core.c                   |   8 +-
>  23 files changed, 909 insertions(+), 953 deletions(-)
>  create mode 100644 arch/arm64/include/asm/entry-common.h
>  create mode 100644 include/linux/irq-entry-common.h
>  create mode 100644 kernel/entry/syscall-common.c
> 


* Re: [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
  2024-12-06 10:17 ` [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled() Jinjie Ruan
@ 2025-02-10 11:04   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:04 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:23PM +0800, Jinjie Ruan wrote:
> The generic entry code expects architecture code to provide a
> regs_irqs_disabled(regs) function, but arm64 does not have this and
> provides interrupts_enabled(regs), which has the opposite polarity.
> 
> In preparation for moving arm64 over to the generic entry code,
> replace arm64's interrupts_enabled() with regs_irqs_disabled() and
> update its callers under arch/arm64.
> 
> For the moment, a definition of interrupts_enabled() is provided for
> the GICv3 driver. Once arch/arm implements regs_irqs_disabled(), this
> can be removed.
> 
> No functional changes.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/daifflags.h  | 2 +-
>  arch/arm64/include/asm/ptrace.h     | 7 +++++++
>  arch/arm64/include/asm/xen/events.h | 2 +-
>  arch/arm64/kernel/acpi.c            | 2 +-
>  arch/arm64/kernel/debug-monitors.c  | 2 +-
>  arch/arm64/kernel/entry-common.c    | 4 ++--
>  arch/arm64/kernel/sdei.c            | 2 +-
>  7 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
> index fbb5c99eb2f9..5fca48009043 100644
> --- a/arch/arm64/include/asm/daifflags.h
> +++ b/arch/arm64/include/asm/daifflags.h
> @@ -128,7 +128,7 @@ static inline void local_daif_inherit(struct pt_regs *regs)
>  {
>  	unsigned long flags = regs->pstate & DAIF_MASK;
>  
> -	if (interrupts_enabled(regs))
> +	if (!regs_irqs_disabled(regs))
>  		trace_hardirqs_on();
>  
>  	if (system_uses_irq_prio_masking())
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index 47ff8654c5ec..bcfa96880377 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -214,9 +214,16 @@ static inline void forget_syscall(struct pt_regs *regs)
>  		(regs)->pmr == GIC_PRIO_IRQON :				\
>  		true)
>  
> +/*
> + * Used by the GICv3 driver, can be removed once arch/arm implements
> + * regs_irqs_disabled() directly.
> + */
>  #define interrupts_enabled(regs)			\
>  	(!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
>  
> +#define regs_irqs_disabled(regs)			\
> +	(((regs)->pstate & PSR_I_BIT) || (!irqs_priority_unmasked(regs)))

Please make this:

| static __always_inline bool regs_irqs_disabled(const struct pt_regs *regs)
| {
| 	return (regs->pstate & PSR_I_BIT) || !irqs_priority_unmasked(regs);
| }
| 
| #define interrupts_enabled(regs)	(!regs_irqs_disabled(regs))

That way this matches the style of x86 and s390, and with
interrupts_enabled() defined in terms of regs_irqs_disabled(), the two
cannot accidentally diverge.
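
For illustration only, here is a userspace mock-up of that arrangement,
not kernel code. The reduced struct pt_regs, the constant values and
mock_uses_irq_prio_masking (standing in for
system_uses_irq_prio_masking()) are all assumptions made for the
sketch. It checks that the two helpers always give opposite answers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for this sketch; the real ones come from arm64 headers. */
#define PSR_I_BIT	0x00000080UL
#define GIC_PRIO_IRQON	0xe0U

/* Reduced mock of struct pt_regs with only the fields used below. */
struct pt_regs {
	uint64_t pstate;
	uint64_t pmr;
};

/* Hypothetical switch standing in for system_uses_irq_prio_masking(). */
static bool mock_uses_irq_prio_masking = true;

static inline bool irqs_priority_unmasked(const struct pt_regs *regs)
{
	return mock_uses_irq_prio_masking ? regs->pmr == GIC_PRIO_IRQON : true;
}

/* Single source of truth for the "IRQs masked" decision. */
static inline bool regs_irqs_disabled(const struct pt_regs *regs)
{
	return (regs->pstate & PSR_I_BIT) || !irqs_priority_unmasked(regs);
}

/* Derived from the function above, so the polarity cannot drift. */
#define interrupts_enabled(regs)	(!regs_irqs_disabled(regs))

/* Walk through masked-by-PSTATE.I, masked-by-PMR and unmasked states. */
static void polarity_self_check(void)
{
	const struct pt_regs masked_by_i = { .pstate = PSR_I_BIT, .pmr = GIC_PRIO_IRQON };
	const struct pt_regs masked_by_pmr = { .pstate = 0, .pmr = 0 };
	const struct pt_regs unmasked = { .pstate = 0, .pmr = GIC_PRIO_IRQON };

	assert(regs_irqs_disabled(&masked_by_i) && !interrupts_enabled(&masked_by_i));
	assert(regs_irqs_disabled(&masked_by_pmr) && !interrupts_enabled(&masked_by_pmr));
	assert(!regs_irqs_disabled(&unmasked) && interrupts_enabled(&unmasked));
}
```

Because interrupts_enabled() is only a negation of the function, any
later change to the masking rules needs to be made in one place.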

>  #define fast_interrupts_enabled(regs) \
>  	(!((regs)->pstate & PSR_F_BIT))

We should probably delete this at the same time; it's unused and we
don't want any new users to show up.

With those changes:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

>  
> diff --git a/arch/arm64/include/asm/xen/events.h b/arch/arm64/include/asm/xen/events.h
> index 2788e95d0ff0..2977b5fe068d 100644
> --- a/arch/arm64/include/asm/xen/events.h
> +++ b/arch/arm64/include/asm/xen/events.h
> @@ -14,7 +14,7 @@ enum ipi_vector {
>  
>  static inline int xen_irqs_disabled(struct pt_regs *regs)
>  {
> -	return !interrupts_enabled(regs);
> +	return regs_irqs_disabled(regs);
>  }
>  
>  #define xchg_xen_ulong(ptr, val) xchg((ptr), (val))
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index e6f66491fbe9..732f89daae23 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -403,7 +403,7 @@ int apei_claim_sea(struct pt_regs *regs)
>  	return_to_irqs_enabled = !irqs_disabled_flags(arch_local_save_flags());
>  
>  	if (regs)
> -		return_to_irqs_enabled = interrupts_enabled(regs);
> +		return_to_irqs_enabled = !regs_irqs_disabled(regs);
>  
>  	/*
>  	 * SEA can interrupt SError, mask it and describe this as an NMI so
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 58f047de3e1c..460c09d03a73 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -231,7 +231,7 @@ static void send_user_sigtrap(int si_code)
>  	if (WARN_ON(!user_mode(regs)))
>  		return;
>  
> -	if (interrupts_enabled(regs))
> +	if (!regs_irqs_disabled(regs))
>  		local_irq_enable();
>  
>  	arm64_force_sig_fault(SIGTRAP, si_code, instruction_pointer(regs),
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index b260ddc4d3e9..c547e70428d3 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -73,7 +73,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
>  {
>  	lockdep_assert_irqs_disabled();
>  
> -	if (interrupts_enabled(regs)) {
> +	if (!regs_irqs_disabled(regs)) {
>  		if (regs->exit_rcu) {
>  			trace_hardirqs_on_prepare();
>  			lockdep_hardirqs_on_prepare();
> @@ -569,7 +569,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
>  {
>  	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
>  
> -	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> +	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && regs_irqs_disabled(regs))
>  		__el1_pnmi(regs, handler);
>  	else
>  		__el1_irq(regs, handler);
> diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
> index 255d12f881c2..27a17da635d8 100644
> --- a/arch/arm64/kernel/sdei.c
> +++ b/arch/arm64/kernel/sdei.c
> @@ -247,7 +247,7 @@ unsigned long __kprobes do_sdei_event(struct pt_regs *regs,
>  	 * If we interrupted the kernel with interrupts masked, we always go
>  	 * back to wherever we came from.
>  	 */
> -	if (mode == kernel_mode && !interrupts_enabled(regs))
> +	if (mode == kernel_mode && regs_irqs_disabled(regs))
>  		return SDEI_EV_HANDLED;
>  
>  	/*
> -- 
> 2.34.1
> 


* Re: [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1
  2024-12-06 10:17 ` [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1 Jinjie Ruan
@ 2025-02-10 11:08   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:08 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:24PM +0800, Jinjie Ruan wrote:
> The generic entry code uses irqentry_state_t to track lockdep and RCU
> state across exception entry and return. For historical reasons, arm64
> embeds similar fields within its pt_regs structure.
> 
> In preparation for moving arm64 over to the generic entry code, pull
> these fields out of arm64's pt_regs, and use a separate structure,
> matching the style of the generic entry code.
> 
> No functional changes.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/ptrace.h  |   4 -
>  arch/arm64/kernel/entry-common.c | 136 +++++++++++++++++++------------
>  2 files changed, 85 insertions(+), 55 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index bcfa96880377..e90dfc9982aa 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -169,10 +169,6 @@ struct pt_regs {
>  
>  	u64 sdei_ttbr1;
>  	struct frame_record_meta stackframe;
> -
> -	/* Only valid for some EL1 exceptions. */
> -	u64 lockdep_hardirqs;
> -	u64 exit_rcu;
>  };
>  
>  /* For correct stack alignment, pt_regs has to be a multiple of 16 bytes. */
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index c547e70428d3..1687627b2ecf 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -28,6 +28,13 @@
>  #include <asm/sysreg.h>
>  #include <asm/system_misc.h>
>  
> +typedef struct irqentry_state {
> +	union {
> +		bool	exit_rcu;
> +		bool	lockdep;
> +	};
> +} irqentry_state_t;

I think we should add an arm64_ prefix here, to avoid the possibility of
build errors if we somehow get this and the common definition included
at the same time.

That'll require some simple changes when we switch over, but it should
be relatively obvious and simple.
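
A rough userspace sketch of what the prefixed type could look like,
together with the hand-off pattern this patch introduces (state
returned by value on entry, passed back on exit). The names
arm64_irqentry_state_t, mock_in_idle_task, mock_rcu_exits and the
mock_*() functions are hypothetical and exist only for this
illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical arm64_-prefixed type; cannot clash with the generic one. */
typedef struct arm64_irqentry_state {
	union {
		bool	exit_rcu;
		bool	lockdep;
	};
} arm64_irqentry_state_t;

/* Mock stand-ins: "is_idle_task(current)" and a count of ct_irq_exit() calls. */
static bool mock_in_idle_task;
static int mock_rcu_exits;

/* Entry records whether RCU must be exited again on the way out. */
static arm64_irqentry_state_t mock_enter_from_kernel_mode(void)
{
	arm64_irqentry_state_t state = {
		.exit_rcu = false,
	};

	if (mock_in_idle_task)
		state.exit_rcu = true;

	return state;
}

/* Exit consumes the state handed back by the caller, not a pt_regs field. */
static void mock_exit_to_kernel_mode(arm64_irqentry_state_t state)
{
	if (state.exit_rcu)
		mock_rcu_exits++;
}

/* Same shape as the el1_*() handlers: enter, do work, exit with state. */
static void state_handoff_self_check(void)
{
	arm64_irqentry_state_t state;

	mock_rcu_exits = 0;
	mock_in_idle_task = true;
	state = mock_enter_from_kernel_mode();
	mock_exit_to_kernel_mode(state);
	assert(mock_rcu_exits == 1);
}
```

With a distinct type name, the later switch to the generic
irqentry_state_t becomes a mechanical rename at each call site.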

Otherwise, the structural changes look good to me.

Mark.

> +
>  /*
>   * Handle IRQ/context state management when entering from kernel mode.
>   * Before this function is called it is not safe to call regular kernel code,
> @@ -36,29 +43,36 @@
>   * This is intended to match the logic in irqentry_enter(), handling the kernel
>   * mode transitions only.
>   */
> -static __always_inline void __enter_from_kernel_mode(struct pt_regs *regs)
> +static __always_inline irqentry_state_t __enter_from_kernel_mode(struct pt_regs *regs)
>  {
> -	regs->exit_rcu = false;
> +	irqentry_state_t state = {
> +		.exit_rcu = false,
> +	};
>  
>  	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
>  		lockdep_hardirqs_off(CALLER_ADDR0);
>  		ct_irq_enter();
>  		trace_hardirqs_off_finish();
>  
> -		regs->exit_rcu = true;
> -		return;
> +		state.exit_rcu = true;
> +		return state;
>  	}
>  
>  	lockdep_hardirqs_off(CALLER_ADDR0);
>  	rcu_irq_enter_check_tick();
>  	trace_hardirqs_off_finish();
> +
> +	return state;
>  }
>  
> -static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> +static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  {
> -	__enter_from_kernel_mode(regs);
> +	irqentry_state_t state = __enter_from_kernel_mode(regs);
> +
>  	mte_check_tfsr_entry();
>  	mte_disable_tco_entry(current);
> +
> +	return state;
>  }
>  
>  /*
> @@ -69,12 +83,13 @@ static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
>   * This is intended to match the logic in irqentry_exit(), handling the kernel
>   * mode transitions only, and with preemption handled elsewhere.
>   */
> -static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
> +static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
> +						  irqentry_state_t state)
>  {
>  	lockdep_assert_irqs_disabled();
>  
>  	if (!regs_irqs_disabled(regs)) {
> -		if (regs->exit_rcu) {
> +		if (state.exit_rcu) {
>  			trace_hardirqs_on_prepare();
>  			lockdep_hardirqs_on_prepare();
>  			ct_irq_exit();
> @@ -84,15 +99,16 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
>  
>  		trace_hardirqs_on();
>  	} else {
> -		if (regs->exit_rcu)
> +		if (state.exit_rcu)
>  			ct_irq_exit();
>  	}
>  }
>  
> -static void noinstr exit_to_kernel_mode(struct pt_regs *regs)
> +static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
> +					irqentry_state_t state)
>  {
>  	mte_check_tfsr_exit();
> -	__exit_to_kernel_mode(regs);
> +	__exit_to_kernel_mode(regs, state);
>  }
>  
>  /*
> @@ -190,9 +206,11 @@ asmlinkage void noinstr asm_exit_to_user_mode(struct pt_regs *regs)
>   * mode. Before this function is called it is not safe to call regular kernel
>   * code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_enter_nmi(struct pt_regs *regs)
> +static noinstr irqentry_state_t arm64_enter_nmi(struct pt_regs *regs)
>  {
> -	regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
> +	irqentry_state_t state;
> +
> +	state.lockdep = lockdep_hardirqs_enabled();
>  
>  	__nmi_enter();
>  	lockdep_hardirqs_off(CALLER_ADDR0);
> @@ -201,6 +219,8 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
>  
>  	trace_hardirqs_off_finish();
>  	ftrace_nmi_enter();
> +
> +	return state;
>  }
>  
>  /*
> @@ -208,19 +228,18 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
>   * mode. After this function returns it is not safe to call regular kernel
>   * code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_exit_nmi(struct pt_regs *regs)
> +static void noinstr arm64_exit_nmi(struct pt_regs *regs,
> +				   irqentry_state_t state)
>  {
> -	bool restore = regs->lockdep_hardirqs;
> -
>  	ftrace_nmi_exit();
> -	if (restore) {
> +	if (state.lockdep) {
>  		trace_hardirqs_on_prepare();
>  		lockdep_hardirqs_on_prepare();
>  	}
>  
>  	ct_nmi_exit();
>  	lockdep_hardirq_exit();
> -	if (restore)
> +	if (state.lockdep)
>  		lockdep_hardirqs_on(CALLER_ADDR0);
>  	__nmi_exit();
>  }
> @@ -230,14 +249,18 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs)
>   * kernel mode. Before this function is called it is not safe to call regular
>   * kernel code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
> +static noinstr irqentry_state_t arm64_enter_el1_dbg(struct pt_regs *regs)
>  {
> -	regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
> +	irqentry_state_t state;
> +
> +	state.lockdep = lockdep_hardirqs_enabled();
>  
>  	lockdep_hardirqs_off(CALLER_ADDR0);
>  	ct_nmi_enter();
>  
>  	trace_hardirqs_off_finish();
> +
> +	return state;
>  }
>  
>  /*
> @@ -245,17 +268,16 @@ static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
>   * kernel mode. After this function returns it is not safe to call regular
>   * kernel code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
> +static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
> +				       irqentry_state_t state)
>  {
> -	bool restore = regs->lockdep_hardirqs;
> -
> -	if (restore) {
> +	if (state.lockdep) {
>  		trace_hardirqs_on_prepare();
>  		lockdep_hardirqs_on_prepare();
>  	}
>  
>  	ct_nmi_exit();
> -	if (restore)
> +	if (state.lockdep)
>  		lockdep_hardirqs_on(CALLER_ADDR0);
>  }
>  
> @@ -426,78 +448,86 @@ UNHANDLED(el1t, 64, error)
>  static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
>  {
>  	unsigned long far = read_sysreg(far_el1);
> +	irqentry_state_t state;
>  
> -	enter_from_kernel_mode(regs);
> +	state = enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_mem_abort(far, esr, regs);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
>  {
>  	unsigned long far = read_sysreg(far_el1);
> +	irqentry_state_t state;
>  
> -	enter_from_kernel_mode(regs);
> +	state = enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_sp_pc_abort(far, esr, regs);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_undef(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_bti(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_gcs(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_mops(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_dbg(struct pt_regs *regs, unsigned long esr)
>  {
>  	unsigned long far = read_sysreg(far_el1);
> +	irqentry_state_t state;
>  
> -	arm64_enter_el1_dbg(regs);
> +	state = arm64_enter_el1_dbg(regs);
>  	if (!cortex_a76_erratum_1463225_debug_handler(regs))
>  		do_debug_exception(far, esr, regs);
> -	arm64_exit_el1_dbg(regs);
> +	arm64_exit_el1_dbg(regs, state);
>  }
>  
>  static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_fpac(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
> @@ -546,15 +576,16 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
>  static __always_inline void __el1_pnmi(struct pt_regs *regs,
>  				       void (*handler)(struct pt_regs *))
>  {
> -	arm64_enter_nmi(regs);
> +	irqentry_state_t state = arm64_enter_nmi(regs);
> +
>  	do_interrupt_handler(regs, handler);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  }
>  
>  static __always_inline void __el1_irq(struct pt_regs *regs,
>  				      void (*handler)(struct pt_regs *))
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
>  
>  	irq_enter_rcu();
>  	do_interrupt_handler(regs, handler);
> @@ -562,7 +593,7 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
>  
>  	arm64_preempt_schedule_irq();
>  
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  static void noinstr el1_interrupt(struct pt_regs *regs,
>  				  void (*handler)(struct pt_regs *))
> @@ -588,11 +619,12 @@ asmlinkage void noinstr el1h_64_fiq_handler(struct pt_regs *regs)
>  asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
>  {
>  	unsigned long esr = read_sysreg(esr_el1);
> +	irqentry_state_t state;
>  
>  	local_daif_restore(DAIF_ERRCTX);
> -	arm64_enter_nmi(regs);
> +	state = arm64_enter_nmi(regs);
>  	do_serror(regs, esr);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  }
>  
>  static void noinstr el0_da(struct pt_regs *regs, unsigned long esr)
> @@ -855,12 +887,13 @@ asmlinkage void noinstr el0t_64_fiq_handler(struct pt_regs *regs)
>  static void noinstr __el0_error_handler_common(struct pt_regs *regs)
>  {
>  	unsigned long esr = read_sysreg(esr_el1);
> +	irqentry_state_t state;
>  
>  	enter_from_user_mode(regs);
>  	local_daif_restore(DAIF_ERRCTX);
> -	arm64_enter_nmi(regs);
> +	state = arm64_enter_nmi(regs);
>  	do_serror(regs, esr);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  	local_daif_restore(DAIF_PROCCTX);
>  	exit_to_user_mode(regs);
>  }
> @@ -968,6 +1001,7 @@ asmlinkage void noinstr __noreturn handle_bad_stack(struct pt_regs *regs)
>  asmlinkage noinstr unsigned long
>  __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
>  {
> +	irqentry_state_t state;
>  	unsigned long ret;
>  
>  	/*
> @@ -992,9 +1026,9 @@ __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
>  	else if (cpu_has_pan())
>  		set_pstate_pan(0);
>  
> -	arm64_enter_nmi(regs);
> +	state = arm64_enter_nmi(regs);
>  	ret = do_sdei_event(regs, arg);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  
>  	return ret;
>  }
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()
  2024-12-06 10:17 ` [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode() Jinjie Ruan
@ 2025-02-10 11:26   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:26 UTC (permalink / raw)
  To: Jinjie Ruan, tglx
  Cc: catalin.marinas, will, oleg, sstabellini, peterz, luto, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, kees, wad, akpm, samitolvanen, masahiroy, hca,
	aliceryhl, rppt, xur, paulmck, arnd, mbenes, puranjay, pcc, ardb,
	sudeep.holla, guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron,
	liaochang1, kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:25PM +0800, Jinjie Ruan wrote:
> The generic entry code try to reschedule every time when the kernel
> mode non-NMI exception return. At the moment, arm64 only reschedule every
> time when EL1 irq exception return;

I think this is a bit unclear, and should say something like:
  
| The arm64 entry code only preempts a kernel context upon a return from
| a regular IRQ exception. The generic entry code may preempt a kernel
| context for any exception return where irqentry_exit() is used, and so
| may preempt other exceptions such as faults.

Thomas, can you confirm that's the *intent* of the generic entry code?

> In preparation for moving arm64 over to the generic entry code, move
> arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so not
> only EL1 irq but also all EL1 non-NMI exception return, there is a chance
> to reschedule. And only if irqs are enabled when the exception trapped,
> there may be a chance to reschedule after the exceptions have been handled,
> so move arm64_preempt_schedule_irq() into regs_irqs_disabled()
> check false block, but it will try to reschedule only when TINY_RCU is
> enabled or current is not an idle task.

I think the detail is confusing here, and it would be better to say:

| In preparation for moving arm64 over to the generic entry code, align
| arm64 with the generic behaviour by calling
| arm64_preempt_schedule_irq() from exit_to_kernel_mode(). To make this
| possible, arm64_preempt_schedule_irq() and need_irq_preemption() are
| moved earlier in the file, with no changes.
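
As a rough illustration of which exception returns would reach the
preemption check after this patch, here is a user-space model of the
branch structure in __exit_to_kernel_mode() as it appears in the diff
below. The model_* types and the function name are made up for the
sketch and are not kernel APIs; the real function also performs
lockdep, tracing and context-tracking work that is omitted here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct pt_regs and irqentry_state_t. */
struct model_regs  { bool irqs_disabled; };
struct model_state { bool exit_rcu; };

/*
 * Returns true if the exit path would call arm64_preempt_schedule_irq().
 * Whether preemption then actually happens still depends on the checks
 * inside that function.
 */
static bool model_exit_reaches_preempt_check(struct model_regs regs,
					     struct model_state state)
{
	if (regs.irqs_disabled)
		return false;	/* interrupted context had IRQs masked */

	if (state.exit_rcu)
		return false;	/* RCU was not watching: early return */

	return true;		/* falls through to the preemption check */
}
```

In this model the kind of exception (IRQ, abort, undef, ...) does not
appear at all, which is the behavioural change being described: only
the interrupted context's IRQ mask and the RCU state matter.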

Mark.

> As Mark pointed out, this change will have the following 2 key impact:
> 
> - " We'll preempt even without taking a "real" interrupt. That
>     shouldn't result in preemption that wasn't possible before,
>     but it does change the probability of preempting at certain points,
>     and might have a performance impact, so probably warrants a
>     benchmark."
> 
> - " We will not preempt when taking interrupts from a region of kernel
>     code where IRQs are enabled but RCU is not watching, matching the
>     behaviour of the generic entry code.
> 
>     This has the potential to introduce livelock if we can ever have a
>     screaming interrupt in such a region, so we'll need to go figure out
>     whether that's actually a problem.
> 
>     Having this as a separate patch will make it easier to test/bisect
>     for that specifically."
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 88 ++++++++++++++++----------------
>  1 file changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 1687627b2ecf..7a588515ee07 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -75,6 +75,48 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  	return state;
>  }
>  
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +#define need_irq_preemption() \
> +	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +#else
> +#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
> +#endif
> +
> +static void __sched arm64_preempt_schedule_irq(void)
> +{
> +	if (!need_irq_preemption())
> +		return;
> +
> +	/*
> +	 * Note: thread_info::preempt_count includes both thread_info::count
> +	 * and thread_info::need_resched, and is not equivalent to
> +	 * preempt_count().
> +	 */
> +	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> +		return;
> +
> +	/*
> +	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> +	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> +	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> +	 * DAIF we must have handled an NMI, so skip preemption.
> +	 */
> +	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> +		return;
> +
> +	/*
> +	 * Preempting a task from an IRQ means we leave copies of PSTATE
> +	 * on the stack. cpufeature's enable calls may modify PSTATE, but
> +	 * resuming one of these preempted tasks would undo those changes.
> +	 *
> +	 * Only allow a task to be preempted once cpufeatures have been
> +	 * enabled.
> +	 */
> +	if (system_capabilities_finalized())
> +		preempt_schedule_irq();
> +}
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -97,6 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> +		arm64_preempt_schedule_irq();
> +
>  		trace_hardirqs_on();
>  	} else {
>  		if (state.exit_rcu)
> @@ -281,48 +325,6 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
>  		lockdep_hardirqs_on(CALLER_ADDR0);
>  }
>  
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -#define need_irq_preemption() \
> -	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> -#else
> -#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
> -#endif
> -
> -static void __sched arm64_preempt_schedule_irq(void)
> -{
> -	if (!need_irq_preemption())
> -		return;
> -
> -	/*
> -	 * Note: thread_info::preempt_count includes both thread_info::count
> -	 * and thread_info::need_resched, and is not equivalent to
> -	 * preempt_count().
> -	 */
> -	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> -		return;
> -
> -	/*
> -	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> -	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> -	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> -	 * DAIF we must have handled an NMI, so skip preemption.
> -	 */
> -	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> -		return;
> -
> -	/*
> -	 * Preempting a task from an IRQ means we leave copies of PSTATE
> -	 * on the stack. cpufeature's enable calls may modify PSTATE, but
> -	 * resuming one of these preempted tasks would undo those changes.
> -	 *
> -	 * Only allow a task to be preempted once cpufeatures have been
> -	 * enabled.
> -	 */
> -	if (system_capabilities_finalized())
> -		preempt_schedule_irq();
> -}
> -
>  static void do_interrupt_handler(struct pt_regs *regs,
>  				 void (*handler)(struct pt_regs *))
>  {
> @@ -591,8 +593,6 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
>  	do_interrupt_handler(regs, handler);
>  	irq_exit_rcu();
>  
> -	arm64_preempt_schedule_irq();
> -
>  	exit_to_kernel_mode(regs, state);
>  }
>  static void noinstr el1_interrupt(struct pt_regs *regs,
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq()
  2024-12-06 10:17 ` [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq() Jinjie Ruan
@ 2025-02-10 11:33   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:33 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:26PM +0800, Jinjie Ruan wrote:
> The generic entry do preempt_schedule_irq() by checking if need_resched()
> satisfied, but arm64 has some of its own additional checks such as
> GIC priority masking.
> 
> In preparation for moving arm64 over to the generic entry code, rework
> arm64_preempt_schedule_irq() to check whether it need resched in a check
> function called arm64_need_resched().

I think what this is saying is that the generic entry code has the form:

| raw_irqentry_exit_cond_resched()
| {
| 	if (!preempt_count()) {
| 		...
| 		if (need_resched())
| 			preempt_schedule_irq();
| 	}
| }

... but it's not obvious why it's better to have an
arm64_need_resched() rather than an arm64_preempt_schedule_irq().

Having some idea of the change you intend to make to the generic code
would be helpful, and/or that generic change should be made earlier as a
preparatory patch.
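
For the sake of discussion, one possible shape is sketched below as a
user-space model. It assumes the generic code would consult an
architecture predicate next to need_resched(); that hook is an
assumption made for this sketch, and all model_* names are
illustrative rather than existing kernel interfaces.

```c
#include <stdbool.h>

/* User-space model; every name here is illustrative, not a kernel API. */
struct model_cpu {
	int preempt_count;	/* stands in for preempt_count() */
	bool need_resched;	/* stands in for need_resched() */
	bool arch_ok;		/* outcome of the arch-specific checks */
	int schedule_calls;	/* times preempt_schedule_irq() would run */
};

/* Predicate-style arch hook, analogous in shape to arm64_need_resched(). */
static bool model_arch_need_resched(const struct model_cpu *cpu)
{
	return cpu->arch_ok;
}

/* Generic-shaped caller: the scheduling call stays in one place. */
static void model_exit_cond_resched(struct model_cpu *cpu)
{
	if (!cpu->preempt_count) {
		if (cpu->need_resched && model_arch_need_resched(cpu))
			cpu->schedule_calls++;
	}
}
```

With a predicate, the architecture only answers "may we preempt here?"
and the generic code keeps ownership of the preempt_schedule_irq()
call. If that is the intent, stating it in the commit message would
answer the question above.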

Mark.

> No functional changes.
> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 7a588515ee07..da68c089b74b 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -83,10 +83,10 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
>  #define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
>  #endif
>  
> -static void __sched arm64_preempt_schedule_irq(void)
> +static inline bool arm64_need_resched(void)
>  {
>  	if (!need_irq_preemption())
> -		return;
> +		return false;
>  
>  	/*
>  	 * Note: thread_info::preempt_count includes both thread_info::count
> @@ -94,7 +94,7 @@ static void __sched arm64_preempt_schedule_irq(void)
>  	 * preempt_count().
>  	 */
>  	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> -		return;
> +		return false;
>  
>  	/*
>  	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> @@ -103,7 +103,7 @@ static void __sched arm64_preempt_schedule_irq(void)
>  	 * DAIF we must have handled an NMI, so skip preemption.
>  	 */
>  	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> -		return;
> +		return false;
>  
>  	/*
>  	 * Preempting a task from an IRQ means we leave copies of PSTATE
> @@ -113,8 +113,10 @@ static void __sched arm64_preempt_schedule_irq(void)
>  	 * Only allow a task to be preempted once cpufeatures have been
>  	 * enabled.
>  	 */
> -	if (system_capabilities_finalized())
> -		preempt_schedule_irq();
> +	if (!system_capabilities_finalized())
> +		return false;
> +
> +	return true;
>  }
>  
>  /*
> @@ -139,7 +141,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		arm64_preempt_schedule_irq();
> +		if (arm64_need_resched())
> +			preempt_schedule_irq();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper
  2024-12-06 10:17 ` [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper Jinjie Ruan
@ 2025-02-10 11:40   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:40 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:27PM +0800, Jinjie Ruan wrote:
> The generic entry code uses preempt_count() and need_resched() helpers to
> check if it is time to resched. Currently, arm64 use its own check logic,
> that is "READ_ONCE(current_thread_info()->preempt_count) == 0", which is
> equivalent to "preempt_count() == 0 && need_resched()".

Hmm. The existing code relies upon preempt_fold_need_resched() to work
correctly. If we want to move from:

	READ_ONCE(current_thread_info()->preempt_count) == 0

... to:

	!preempt_count() && need_resched()

... then that change should be made *before* we change the preemption
logic to preempt non-IRQ exceptions in patch 3. Otherwise, that logic is
consuming stale data most of the time.
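
To make the dependency concrete, below is a small user-space model of
the layout described by the comment this patch removes
("thread_info::preempt_count includes both thread_info::count and
thread_info::need_resched"). The inverted encoding of the need_resched
half (0 means "reschedule needed") reflects my reading of
arch/arm64/include/asm/preempt.h and should be checked against the
tree; the model_* names and the tif_need_resched parameter are
invented for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the combined 64-bit field and its two 32-bit halves. */
union model_preempt {
	uint64_t preempt_count;		/* what the old check loads */
	struct {
		uint32_t count;		/* what preempt_count() reports */
		uint32_t need_resched;	/* inverted: 0 == resched needed */
	} preempt;
};

/* Model of preempt_fold_need_resched(): copy the TIF flag into the field. */
static void model_fold_need_resched(union model_preempt *ti,
				    bool tif_need_resched)
{
	if (tif_need_resched)
		ti->preempt.need_resched = 0;
}

/* Old arm64 check: a single load, correct only after a fold. */
static bool model_old_check(const union model_preempt *ti)
{
	return ti->preempt_count == 0;
}

/* Generic-style check: reads the TIF flag directly, no fold required. */
static bool model_new_check(const union model_preempt *ti,
			    bool tif_need_resched)
{
	return ti->preempt.count == 0 && tif_need_resched;
}
```

In this model, with count == 0 and the TIF flag set but not yet
folded, model_old_check() is false while model_new_check() is true;
the two agree only once model_fold_need_resched() has run.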

Mark.

> In preparation for moving arm64 over to the generic entry code, use
> these helpers to replace arm64's own code and move it ahead.
> 
> No functional changes.
> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index da68c089b74b..efd1a990d138 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -88,14 +88,6 @@ static inline bool arm64_need_resched(void)
>  	if (!need_irq_preemption())
>  		return false;
>  
> -	/*
> -	 * Note: thread_info::preempt_count includes both thread_info::count
> -	 * and thread_info::need_resched, and is not equivalent to
> -	 * preempt_count().
> -	 */
> -	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> -		return false;
> -
>  	/*
>  	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
>  	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> @@ -141,8 +133,10 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		if (arm64_need_resched())
> -			preempt_schedule_irq();
> +		if (!preempt_count() && need_resched()) {
> +			if (arm64_need_resched())
> +				preempt_schedule_irq();
> +		}
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead
  2024-12-06 10:17 ` [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead Jinjie Ruan
@ 2025-02-10 11:48   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:48 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:28PM +0800, Jinjie Ruan wrote:
> The generic entry has the same logic as need_irq_preemption()
> macro and use a helper function to check other resched condition.
> 
> In preparation for moving arm64 over to the generic entry code,
> check and expand need_irq_preemption() ahead and extract arm64 resched
> check code to a helper function.

I think this is just saying that the goal is to align the structure of
the code with raw_irqentry_exit_cond_resched() from the generic entry
code.

It'd be a bit clearer to say that, and to do this *before* moving the
call into __exit_to_kernel_mode().

Mark.

> 
> No functional changes.
> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/preempt.h |  1 +
>  arch/arm64/kernel/entry-common.c | 28 +++++++++++++++++-----------
>  2 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
> index 0159b625cc7f..d0f93385bd85 100644
> --- a/arch/arm64/include/asm/preempt.h
> +++ b/arch/arm64/include/asm/preempt.h
> @@ -85,6 +85,7 @@ static inline bool should_resched(int preempt_offset)
>  void preempt_schedule(void);
>  void preempt_schedule_notrace(void);
>  
> +void raw_irqentry_exit_cond_resched(void);
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  
>  DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index efd1a990d138..80b47ca02db2 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -77,17 +77,10 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -#define need_irq_preemption() \
> -	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> -#else
> -#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
>  #endif
>  
>  static inline bool arm64_need_resched(void)
>  {
> -	if (!need_irq_preemption())
> -		return false;
> -
>  	/*
>  	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
>  	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> @@ -111,6 +104,22 @@ static inline bool arm64_need_resched(void)
>  	return true;
>  }
>  
> +void raw_irqentry_exit_cond_resched(void)
> +{
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +		return;
> +#else
> +	if (!IS_ENABLED(CONFIG_PREEMPTION))
> +		return;
> +#endif
> +
> +	if (!preempt_count()) {
> +		if (need_resched() && arm64_need_resched())
> +			preempt_schedule_irq();
> +	}
> +}
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -133,10 +142,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		if (!preempt_count() && need_resched()) {
> -			if (arm64_need_resched())
> -				preempt_schedule_irq();
> -		}
> +		raw_irqentry_exit_cond_resched();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled
  2024-12-06 10:17 ` [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled Jinjie Ruan
@ 2025-02-10 11:52   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:52 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:29PM +0800, Jinjie Ruan wrote:
> The generic entry check PREEMPTION for both PREEMPT_DYNAMIC
> enabled and PREEMPT_DYNAMIC disabled.
> 
> Whether PREEMPT_DYNAMIC enabled or not, PREEMPTION should
> be enabled to allow reschedule before EL1 exception return, so
> move PREEMPTION check ahead in preparation for moving arm64 over
> to the generic entry code.

This is just moving the IS_ENABLED() check. It'd be clearer to say
something like "hoist the IS_ENABLED() check earlier", but equally we
could do that earlier in the series by folding this into the prior
patch.

Mark.

> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 80b47ca02db2..029f8bd72f8a 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -109,9 +109,6 @@ void raw_irqentry_exit_cond_resched(void)
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
>  		return;
> -#else
> -	if (!IS_ENABLED(CONFIG_PREEMPTION))
> -		return;
>  #endif
>  
>  	if (!preempt_count()) {
> @@ -142,7 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		raw_irqentry_exit_cond_resched();
> +		if (IS_ENABLED(CONFIG_PREEMPTION))
> +			raw_irqentry_exit_cond_resched();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC
  2024-12-06 10:17 ` [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC Jinjie Ruan
@ 2025-02-10 11:54   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 11:54 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:30PM +0800, Jinjie Ruan wrote:
> In generic entry, when PREEMPT_DYNAMIC is enabled or disabled, two
> different helpers are used to check whether resched is required
> and some common code is reused.
> 
> In preparation for moving arm64 over to the generic entry code,
> use new helper to check resched when PREEMPT_DYNAMIC enabled and
> reuse common code for the disabled case.
> 
> No functional changes.

Please fold this together with the last two patches; it's undoing
changes you made in patch 6, and it'd be far clearer to see that all at
once.

Mark.

> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/preempt.h |  3 +++
>  arch/arm64/kernel/entry-common.c | 21 +++++++++++----------
>  2 files changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
> index d0f93385bd85..0f0ba250efe8 100644
> --- a/arch/arm64/include/asm/preempt.h
> +++ b/arch/arm64/include/asm/preempt.h
> @@ -93,11 +93,14 @@ void dynamic_preempt_schedule(void);
>  #define __preempt_schedule()		dynamic_preempt_schedule()
>  void dynamic_preempt_schedule_notrace(void);
>  #define __preempt_schedule_notrace()	dynamic_preempt_schedule_notrace()
> +void dynamic_irqentry_exit_cond_resched(void);
> +#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
>  
>  #else /* CONFIG_PREEMPT_DYNAMIC */
>  
>  #define __preempt_schedule()		preempt_schedule()
>  #define __preempt_schedule_notrace()	preempt_schedule_notrace()
> +#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
>  
>  #endif /* CONFIG_PREEMPT_DYNAMIC */
>  #endif /* CONFIG_PREEMPTION */
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 029f8bd72f8a..015a65d19b52 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -75,10 +75,6 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  	return state;
>  }
>  
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -#endif
> -
>  static inline bool arm64_need_resched(void)
>  {
>  	/*
> @@ -106,17 +102,22 @@ static inline bool arm64_need_resched(void)
>  
>  void raw_irqentry_exit_cond_resched(void)
>  {
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> -		return;
> -#endif
> -
>  	if (!preempt_count()) {
>  		if (need_resched() && arm64_need_resched())
>  			preempt_schedule_irq();
>  	}
>  }
>  
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +void dynamic_irqentry_exit_cond_resched(void)
> +{
> +	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +		return;
> +	raw_irqentry_exit_cond_resched();
> +}
> +#endif
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -140,7 +141,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  		}
>  
>  		if (IS_ENABLED(CONFIG_PREEMPTION))
> -			raw_irqentry_exit_cond_resched();
> +			irqentry_exit_cond_resched();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall
  2024-12-06 10:17 ` [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall Jinjie Ruan
@ 2025-02-10 12:04   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 12:04 UTC (permalink / raw)
  To: Jinjie Ruan, tglx
  Cc: catalin.marinas, will, oleg, sstabellini, peterz, luto, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, kees, wad, akpm, samitolvanen, masahiroy, hca,
	aliceryhl, rppt, xur, paulmck, arnd, mbenes, puranjay, pcc, ardb,
	sudeep.holla, guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron,
	liaochang1, kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:31PM +0800, Jinjie Ruan wrote:
> As Mark pointed out, do not try to switch to *all* the
> generic entry code in one go. The regular entry state management
> (e.g. enter_from_user_mode() and exit_to_user_mode()) is largely
> separate from the syscall state management. Move arm64 over to
> enter_from_user_mode() and exit_to_user_mode() without needing to use
> any of the generic syscall logic. Doing that first, *then* moving over
> to the generic syscall handling would be much easier to
> review/test/bisect, and if there are any ABI issues with the syscall
> handling in particular, it will be easier to handle those in isolation.
>
> So split generic entry into irq entry and syscall code, which will
> make review work easier and switch to generic entry clear.

> Introduce two configs called GENERIC_SYSCALL and GENERIC_IRQ_ENTRY,
> which control the irq entry and syscall parts of the generic code
> respectively. And split the header file irq-entry-common.h from
> entry-common.h for GENERIC_IRQ_ENTRY.

I think this would be simpler and clearer as:

| Currently CONFIG_GENERIC_ENTRY enables both the generic exception
| entry logic and the generic syscall entry logic, which are otherwise
| loosely coupled.
|
| Introduce separate config options for these so that architectures can
| select the two independently. This will make it easier for
| architectures to migrate to generic entry code.

It would be good to have this *before* the arm64 changes, either at the
start of the series or upstreamed earlier.
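
As a rough illustration of what selecting the two options independently
could look like (a sketch only: the arm64 fragment is hypothetical and
is not taken from this patch; only GENERIC_IRQ_ENTRY, GENERIC_SYSCALL
and GENERIC_ENTRY come from the arch/Kconfig hunk quoted below), an
architecture migrating in two steps might first select just the irq
entry part:

```kconfig
# arch/arm64/Kconfig (hypothetical fragment, for illustration only)
config ARM64
	def_bool y
	# Step 1: take only the generic exception/irq entry code.
	select GENERIC_IRQ_ENTRY
	# Step 2 (later follow-up): add "select GENERIC_SYSCALL", or
	# switch to "select GENERIC_ENTRY", which selects both.
```

Architectures already using the full generic entry code would keep
selecting GENERIC_ENTRY and see no change in what gets built.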

Thomas, can you confirm whether you're happy with splitting this up?

As above, the thinking is that we can easily/quickly move arm64 over to
the generic exception/irq entry code, but the syscall changes have a
much bigger potential impact (e.g. we've had lots of fun historically
with the ptrace state machine), and I'd like to handle the syscall
changes as a follow-up.

Mark.

> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  MAINTAINERS                      |   1 +
>  arch/Kconfig                     |   8 +
>  include/linux/entry-common.h     | 382 +-----------------------------
>  include/linux/irq-entry-common.h | 389 +++++++++++++++++++++++++++++++
>  kernel/entry/Makefile            |   3 +-
>  kernel/entry/common.c            | 160 +------------
>  kernel/entry/syscall-common.c    | 159 +++++++++++++
>  kernel/sched/core.c              |   8 +-
>  8 files changed, 565 insertions(+), 545 deletions(-)
>  create mode 100644 include/linux/irq-entry-common.h
>  create mode 100644 kernel/entry/syscall-common.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 21f855fe468b..7a6e87587101 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9585,6 +9585,7 @@ S:	Maintained
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/entry
>  F:	include/linux/entry-common.h
>  F:	include/linux/entry-kvm.h
> +F:	include/linux/irq-entry-common.h
>  F:	kernel/entry/
>  
>  GENERIC GPIO I2C DRIVER
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 6682b2a53e34..5a454eff780b 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -64,8 +64,16 @@ config HOTPLUG_PARALLEL
>  	bool
>  	select HOTPLUG_SPLIT_STARTUP
>  
> +config GENERIC_IRQ_ENTRY
> +	bool
> +
> +config GENERIC_SYSCALL
> +	bool
> +
>  config GENERIC_ENTRY
>  	bool
> +	select GENERIC_IRQ_ENTRY
> +	select GENERIC_SYSCALL
>  
>  config KPROBES
>  	bool "Kprobes"
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index fc61d0205c97..b3233e8328c5 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -2,27 +2,15 @@
>  #ifndef __LINUX_ENTRYCOMMON_H
>  #define __LINUX_ENTRYCOMMON_H
>  
> -#include <linux/static_call_types.h>
> +#include <linux/irq-entry-common.h>
>  #include <linux/ptrace.h>
> -#include <linux/syscalls.h>
>  #include <linux/seccomp.h>
>  #include <linux/sched.h>
> -#include <linux/context_tracking.h>
>  #include <linux/livepatch.h>
>  #include <linux/resume_user_mode.h>
> -#include <linux/tick.h>
> -#include <linux/kmsan.h>
>  
>  #include <asm/entry-common.h>
>  
> -/*
> - * Define dummy _TIF work flags if not defined by the architecture or for
> - * disabled functionality.
> - */
> -#ifndef _TIF_PATCH_PENDING
> -# define _TIF_PATCH_PENDING		(0)
> -#endif
> -
>  #ifndef _TIF_UPROBE
>  # define _TIF_UPROBE			(0)
>  #endif
> @@ -55,69 +43,6 @@
>  				 SYSCALL_WORK_SYSCALL_EXIT_TRAP	|	\
>  				 ARCH_SYSCALL_WORK_EXIT)
>  
> -/*
> - * TIF flags handled in exit_to_user_mode_loop()
> - */
> -#ifndef ARCH_EXIT_TO_USER_MODE_WORK
> -# define ARCH_EXIT_TO_USER_MODE_WORK		(0)
> -#endif
> -
> -#define EXIT_TO_USER_MODE_WORK						\
> -	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
> -	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY |			\
> -	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |			\
> -	 ARCH_EXIT_TO_USER_MODE_WORK)
> -
> -/**
> - * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Defaults to an empty implementation. Can be replaced by architecture
> - * specific code.
> - *
> - * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
> - * section. Use __always_inline so the compiler cannot push it out of line
> - * and make it instrumentable.
> - */
> -static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs);
> -
> -#ifndef arch_enter_from_user_mode
> -static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {}
> -#endif
> -
> -/**
> - * enter_from_user_mode - Establish state when coming from user mode
> - *
> - * Syscall/interrupt entry disables interrupts, but user mode is traced as
> - * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
> - *
> - * 1) Tell lockdep that interrupts are disabled
> - * 2) Invoke context tracking if enabled to reactivate RCU
> - * 3) Trace interrupts off state
> - *
> - * Invoked from architecture specific syscall entry code with interrupts
> - * disabled. The calling code has to be non-instrumentable. When the
> - * function returns all state is correct and interrupts are still
> - * disabled. The subsequent functions can be instrumented.
> - *
> - * This is invoked when there is architecture specific functionality to be
> - * done between establishing state and enabling interrupts. The caller must
> - * enable interrupts before invoking syscall_enter_from_user_mode_work().
> - */
> -static __always_inline void enter_from_user_mode(struct pt_regs *regs)
> -{
> -	arch_enter_from_user_mode(regs);
> -	lockdep_hardirqs_off(CALLER_ADDR0);
> -
> -	CT_WARN_ON(__ct_state() != CT_STATE_USER);
> -	user_exit_irqoff();
> -
> -	instrumentation_begin();
> -	kmsan_unpoison_entry_regs(regs);
> -	trace_hardirqs_off_finish();
> -	instrumentation_end();
> -}
> -
>  /**
>   * syscall_enter_from_user_mode_prepare - Establish state and enable interrupts
>   * @regs:	Pointer to currents pt_regs
> @@ -202,170 +127,6 @@ static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, l
>  	return ret;
>  }
>  
> -/**
> - * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
> - * @ti_work:	Cached TIF flags gathered with interrupts disabled
> - *
> - * Defaults to local_irq_enable(). Can be supplied by architecture specific
> - * code.
> - */
> -static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
> -
> -#ifndef local_irq_enable_exit_to_user
> -static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
> -{
> -	local_irq_enable();
> -}
> -#endif
> -
> -/**
> - * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
> - *
> - * Defaults to local_irq_disable(). Can be supplied by architecture specific
> - * code.
> - */
> -static inline void local_irq_disable_exit_to_user(void);
> -
> -#ifndef local_irq_disable_exit_to_user
> -static inline void local_irq_disable_exit_to_user(void)
> -{
> -	local_irq_disable();
> -}
> -#endif
> -
> -/**
> - * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
> - *				 to user mode.
> - * @regs:	Pointer to currents pt_regs
> - * @ti_work:	Cached TIF flags gathered with interrupts disabled
> - *
> - * Invoked from exit_to_user_mode_loop() with interrupt enabled
> - *
> - * Defaults to NOOP. Can be supplied by architecture specific code.
> - */
> -static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> -					       unsigned long ti_work);
> -
> -#ifndef arch_exit_to_user_mode_work
> -static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> -					       unsigned long ti_work)
> -{
> -}
> -#endif
> -
> -/**
> - * arch_exit_to_user_mode_prepare - Architecture specific preparation for
> - *				    exit to user mode.
> - * @regs:	Pointer to currents pt_regs
> - * @ti_work:	Cached TIF flags gathered with interrupts disabled
> - *
> - * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the last
> - * function before return. Defaults to NOOP.
> - */
> -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> -						  unsigned long ti_work);
> -
> -#ifndef arch_exit_to_user_mode_prepare
> -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> -						  unsigned long ti_work)
> -{
> -}
> -#endif
> -
> -/**
> - * arch_exit_to_user_mode - Architecture specific final work before
> - *			    exit to user mode.
> - *
> - * Invoked from exit_to_user_mode() with interrupt disabled as the last
> - * function before return. Defaults to NOOP.
> - *
> - * This needs to be __always_inline because it is non-instrumentable code
> - * invoked after context tracking switched to user mode.
> - *
> - * An architecture implementation must not do anything complex, no locking
> - * etc. The main purpose is for speculation mitigations.
> - */
> -static __always_inline void arch_exit_to_user_mode(void);
> -
> -#ifndef arch_exit_to_user_mode
> -static __always_inline void arch_exit_to_user_mode(void) { }
> -#endif
> -
> -/**
> - * arch_do_signal_or_restart -  Architecture specific signal delivery function
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Invoked from exit_to_user_mode_loop().
> - */
> -void arch_do_signal_or_restart(struct pt_regs *regs);
> -
> -/**
> - * exit_to_user_mode_loop - do any pending work before leaving to user space
> - */
> -unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> -				     unsigned long ti_work);
> -
> -/**
> - * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
> - * @regs:	Pointer to pt_regs on entry stack
> - *
> - * 1) check that interrupts are disabled
> - * 2) call tick_nohz_user_enter_prepare()
> - * 3) call exit_to_user_mode_loop() if any flags from
> - *    EXIT_TO_USER_MODE_WORK are set
> - * 4) check that interrupts are still disabled
> - */
> -static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
> -{
> -	unsigned long ti_work;
> -
> -	lockdep_assert_irqs_disabled();
> -
> -	/* Flush pending rcuog wakeup before the last need_resched() check */
> -	tick_nohz_user_enter_prepare();
> -
> -	ti_work = read_thread_flags();
> -	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> -		ti_work = exit_to_user_mode_loop(regs, ti_work);
> -
> -	arch_exit_to_user_mode_prepare(regs, ti_work);
> -
> -	/* Ensure that kernel state is sane for a return to userspace */
> -	kmap_assert_nomap();
> -	lockdep_assert_irqs_disabled();
> -	lockdep_sys_exit();
> -}
> -
> -/**
> - * exit_to_user_mode - Fixup state when exiting to user mode
> - *
> - * Syscall/interrupt exit enables interrupts, but the kernel state is
> - * interrupts disabled when this is invoked. Also tell RCU about it.
> - *
> - * 1) Trace interrupts on state
> - * 2) Invoke context tracking if enabled to adjust RCU state
> - * 3) Invoke architecture specific last minute exit code, e.g. speculation
> - *    mitigations, etc.: arch_exit_to_user_mode()
> - * 4) Tell lockdep that interrupts are enabled
> - *
> - * Invoked from architecture specific code when syscall_exit_to_user_mode()
> - * is not suitable as the last step before returning to userspace. Must be
> - * invoked with interrupts disabled and the caller must be
> - * non-instrumentable.
> - * The caller has to invoke syscall_exit_to_user_mode_work() before this.
> - */
> -static __always_inline void exit_to_user_mode(void)
> -{
> -	instrumentation_begin();
> -	trace_hardirqs_on_prepare();
> -	lockdep_hardirqs_on_prepare();
> -	instrumentation_end();
> -
> -	user_enter_irqoff();
> -	arch_exit_to_user_mode();
> -	lockdep_hardirqs_on(CALLER_ADDR0);
> -}
> -
>  /**
>   * syscall_exit_to_user_mode_work - Handle work before returning to user mode
>   * @regs:	Pointer to currents pt_regs
> @@ -412,145 +173,4 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs);
>   */
>  void syscall_exit_to_user_mode(struct pt_regs *regs);
>  
> -/**
> - * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Invoked from architecture specific entry code with interrupts disabled.
> - * Can only be called when the interrupt entry came from user mode. The
> - * calling code must be non-instrumentable.  When the function returns all
> - * state is correct and the subsequent functions can be instrumented.
> - *
> - * The function establishes state (lockdep, RCU (context tracking), tracing)
> - */
> -void irqentry_enter_from_user_mode(struct pt_regs *regs);
> -
> -/**
> - * irqentry_exit_to_user_mode - Interrupt exit work
> - * @regs:	Pointer to current's pt_regs
> - *
> - * Invoked with interrupts disabled and fully valid regs. Returns with all
> - * work handled, interrupts disabled such that the caller can immediately
> - * switch to user mode. Called from architecture specific interrupt
> - * handling code.
> - *
> - * The call order is #2 and #3 as described in syscall_exit_to_user_mode().
> - * Interrupt exit is not invoking #1 which is the syscall specific one time
> - * work.
> - */
> -void irqentry_exit_to_user_mode(struct pt_regs *regs);
> -
> -#ifndef irqentry_state
> -/**
> - * struct irqentry_state - Opaque object for exception state storage
> - * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
> - *            exit path has to invoke ct_irq_exit().
> - * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
> - *           lockdep state is restored correctly on exit from nmi.
> - *
> - * This opaque object is filled in by the irqentry_*_enter() functions and
> - * must be passed back into the corresponding irqentry_*_exit() functions
> - * when the exception is complete.
> - *
> - * Callers of irqentry_*_[enter|exit]() must consider this structure opaque
> - * and all members private.  Descriptions of the members are provided to aid in
> - * the maintenance of the irqentry_*() functions.
> - */
> -typedef struct irqentry_state {
> -	union {
> -		bool	exit_rcu;
> -		bool	lockdep;
> -	};
> -} irqentry_state_t;
> -#endif
> -
> -/**
> - * irqentry_enter - Handle state tracking on ordinary interrupt entries
> - * @regs:	Pointer to pt_regs of interrupted context
> - *
> - * Invokes:
> - *  - lockdep irqflag state tracking as low level ASM entry disabled
> - *    interrupts.
> - *
> - *  - Context tracking if the exception hit user mode.
> - *
> - *  - The hardirq tracer to keep the state consistent as low level ASM
> - *    entry disabled interrupts.
> - *
> - * As a precondition, this requires that the entry came from user mode,
> - * idle, or a kernel context in which RCU is watching.
> - *
> - * For kernel mode entries RCU handling is done conditional. If RCU is
> - * watching then the only RCU requirement is to check whether the tick has
> - * to be restarted. If RCU is not watching then ct_irq_enter() has to be
> - * invoked on entry and ct_irq_exit() on exit.
> - *
> - * Avoiding the ct_irq_enter/exit() calls is an optimization but also
> - * solves the problem of kernel mode pagefaults which can schedule, which
> - * is not possible after invoking ct_irq_enter() without undoing it.
> - *
> - * For user mode entries irqentry_enter_from_user_mode() is invoked to
> - * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
> - * would not be possible.
> - *
> - * Returns: An opaque object that must be passed to idtentry_exit()
> - */
> -irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
> -
> -/**
> - * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
> - *
> - * Conditional reschedule with additional sanity checks.
> - */
> -void raw_irqentry_exit_cond_resched(void);
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> -#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> -#define irqentry_exit_cond_resched_dynamic_disabled	NULL
> -DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> -#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
> -#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> -DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -void dynamic_irqentry_exit_cond_resched(void);
> -#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
> -#endif
> -#else /* CONFIG_PREEMPT_DYNAMIC */
> -#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
> -#endif /* CONFIG_PREEMPT_DYNAMIC */
> -
> -/**
> - * irqentry_exit - Handle return from exception that used irqentry_enter()
> - * @regs:	Pointer to pt_regs (exception entry regs)
> - * @state:	Return value from matching call to irqentry_enter()
> - *
> - * Depending on the return target (kernel/user) this runs the necessary
> - * preemption and work checks if possible and required and returns to
> - * the caller with interrupts disabled and no further work pending.
> - *
> - * This is the last action before returning to the low level ASM code which
> - * just needs to return to the appropriate context.
> - *
> - * Counterpart to irqentry_enter().
> - */
> -void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
> -
> -/**
> - * irqentry_nmi_enter - Handle NMI entry
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Similar to irqentry_enter() but taking care of the NMI constraints.
> - */
> -irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs);
> -
> -/**
> - * irqentry_nmi_exit - Handle return from NMI handling
> - * @regs:	Pointer to pt_regs (NMI entry regs)
> - * @irq_state:	Return value from matching call to irqentry_nmi_enter()
> - *
> - * Last action before returning to the low level assembly code.
> - *
> - * Counterpart to irqentry_nmi_enter().
> - */
> -void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state);
> -
>  #endif
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> new file mode 100644
> index 000000000000..8af374331900
> --- /dev/null
> +++ b/include/linux/irq-entry-common.h
> @@ -0,0 +1,389 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_IRQENTRYCOMMON_H
> +#define __LINUX_IRQENTRYCOMMON_H
> +
> +#include <linux/static_call_types.h>
> +#include <linux/syscalls.h>
> +#include <linux/context_tracking.h>
> +#include <linux/tick.h>
> +#include <linux/kmsan.h>
> +
> +#include <asm/entry-common.h>
> +
> +/*
> + * Define dummy _TIF work flags if not defined by the architecture or for
> + * disabled functionality.
> + */
> +#ifndef _TIF_PATCH_PENDING
> +# define _TIF_PATCH_PENDING		(0)
> +#endif
> +
> +/*
> + * TIF flags handled in exit_to_user_mode_loop()
> + */
> +#ifndef ARCH_EXIT_TO_USER_MODE_WORK
> +# define ARCH_EXIT_TO_USER_MODE_WORK		(0)
> +#endif
> +
> +#define EXIT_TO_USER_MODE_WORK						\
> +	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
> +	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY |			\
> +	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |			\
> +	 ARCH_EXIT_TO_USER_MODE_WORK)
> +
> +/**
> + * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Defaults to an empty implementation. Can be replaced by architecture
> + * specific code.
> + *
> + * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
> + * section. Use __always_inline so the compiler cannot push it out of line
> + * and make it instrumentable.
> + */
> +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs);
> +
> +#ifndef arch_enter_from_user_mode
> +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {}
> +#endif
> +
> +/**
> + * enter_from_user_mode - Establish state when coming from user mode
> + *
> + * Syscall/interrupt entry disables interrupts, but user mode is traced as
> + * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
> + *
> + * 1) Tell lockdep that interrupts are disabled
> + * 2) Invoke context tracking if enabled to reactivate RCU
> + * 3) Trace interrupts off state
> + *
> + * Invoked from architecture specific syscall entry code with interrupts
> + * disabled. The calling code has to be non-instrumentable. When the
> + * function returns all state is correct and interrupts are still
> + * disabled. The subsequent functions can be instrumented.
> + *
> + * This is invoked when there is architecture specific functionality to be
> + * done between establishing state and enabling interrupts. The caller must
> + * enable interrupts before invoking syscall_enter_from_user_mode_work().
> + */
> +static __always_inline void enter_from_user_mode(struct pt_regs *regs)
> +{
> +	arch_enter_from_user_mode(regs);
> +	lockdep_hardirqs_off(CALLER_ADDR0);
> +
> +	CT_WARN_ON(__ct_state() != CT_STATE_USER);
> +	user_exit_irqoff();
> +
> +	instrumentation_begin();
> +	kmsan_unpoison_entry_regs(regs);
> +	trace_hardirqs_off_finish();
> +	instrumentation_end();
> +}
> +
> +/**
> + * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
> + * @ti_work:	Cached TIF flags gathered with interrupts disabled
> + *
> + * Defaults to local_irq_enable(). Can be supplied by architecture specific
> + * code.
> + */
> +static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
> +
> +#ifndef local_irq_enable_exit_to_user
> +static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
> +{
> +	local_irq_enable();
> +}
> +#endif
> +
> +/**
> + * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
> + *
> + * Defaults to local_irq_disable(). Can be supplied by architecture specific
> + * code.
> + */
> +static inline void local_irq_disable_exit_to_user(void);
> +
> +#ifndef local_irq_disable_exit_to_user
> +static inline void local_irq_disable_exit_to_user(void)
> +{
> +	local_irq_disable();
> +}
> +#endif
> +
> +/**
> + * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
> + *				 to user mode.
> + * @regs:	Pointer to currents pt_regs
> + * @ti_work:	Cached TIF flags gathered with interrupts disabled
> + *
> + * Invoked from exit_to_user_mode_loop() with interrupt enabled
> + *
> + * Defaults to NOOP. Can be supplied by architecture specific code.
> + */
> +static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> +					       unsigned long ti_work);
> +
> +#ifndef arch_exit_to_user_mode_work
> +static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> +					       unsigned long ti_work)
> +{
> +}
> +#endif
> +
> +/**
> + * arch_exit_to_user_mode_prepare - Architecture specific preparation for
> + *				    exit to user mode.
> + * @regs:	Pointer to currents pt_regs
> + * @ti_work:	Cached TIF flags gathered with interrupts disabled
> + *
> + * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the last
> + * function before return. Defaults to NOOP.
> + */
> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +						  unsigned long ti_work);
> +
> +#ifndef arch_exit_to_user_mode_prepare
> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +						  unsigned long ti_work)
> +{
> +}
> +#endif
> +
> +/**
> + * arch_exit_to_user_mode - Architecture specific final work before
> + *			    exit to user mode.
> + *
> + * Invoked from exit_to_user_mode() with interrupt disabled as the last
> + * function before return. Defaults to NOOP.
> + *
> + * This needs to be __always_inline because it is non-instrumentable code
> + * invoked after context tracking switched to user mode.
> + *
> + * An architecture implementation must not do anything complex, no locking
> + * etc. The main purpose is for speculation mitigations.
> + */
> +static __always_inline void arch_exit_to_user_mode(void);
> +
> +#ifndef arch_exit_to_user_mode
> +static __always_inline void arch_exit_to_user_mode(void) { }
> +#endif
> +
> +/**
> + * arch_do_signal_or_restart -  Architecture specific signal delivery function
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Invoked from exit_to_user_mode_loop().
> + */
> +void arch_do_signal_or_restart(struct pt_regs *regs);
> +
> +/**
> + * exit_to_user_mode_loop - do any pending work before leaving to user space
> + */
> +unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> +				     unsigned long ti_work);
> +
> +/**
> + * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
> + * @regs:	Pointer to pt_regs on entry stack
> + *
> + * 1) check that interrupts are disabled
> + * 2) call tick_nohz_user_enter_prepare()
> + * 3) call exit_to_user_mode_loop() if any flags from
> + *    EXIT_TO_USER_MODE_WORK are set
> + * 4) check that interrupts are still disabled
> + */
> +static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
> +{
> +	unsigned long ti_work;
> +
> +	lockdep_assert_irqs_disabled();
> +
> +	/* Flush pending rcuog wakeup before the last need_resched() check */
> +	tick_nohz_user_enter_prepare();
> +
> +	ti_work = read_thread_flags();
> +	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> +		ti_work = exit_to_user_mode_loop(regs, ti_work);
> +
> +	arch_exit_to_user_mode_prepare(regs, ti_work);
> +
> +	/* Ensure that kernel state is sane for a return to userspace */
> +	kmap_assert_nomap();
> +	lockdep_assert_irqs_disabled();
> +	lockdep_sys_exit();
> +}
> +
> +/**
> + * exit_to_user_mode - Fixup state when exiting to user mode
> + *
> + * Syscall/interrupt exit enables interrupts, but the kernel state is
> + * interrupts disabled when this is invoked. Also tell RCU about it.
> + *
> + * 1) Trace interrupts on state
> + * 2) Invoke context tracking if enabled to adjust RCU state
> + * 3) Invoke architecture specific last minute exit code, e.g. speculation
> + *    mitigations, etc.: arch_exit_to_user_mode()
> + * 4) Tell lockdep that interrupts are enabled
> + *
> + * Invoked from architecture specific code when syscall_exit_to_user_mode()
> + * is not suitable as the last step before returning to userspace. Must be
> + * invoked with interrupts disabled and the caller must be
> + * non-instrumentable.
> + * The caller has to invoke syscall_exit_to_user_mode_work() before this.
> + */
> +static __always_inline void exit_to_user_mode(void)
> +{
> +	instrumentation_begin();
> +	trace_hardirqs_on_prepare();
> +	lockdep_hardirqs_on_prepare();
> +	instrumentation_end();
> +
> +	user_enter_irqoff();
> +	arch_exit_to_user_mode();
> +	lockdep_hardirqs_on(CALLER_ADDR0);
> +}
> +
> +/**
> + * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Invoked from architecture specific entry code with interrupts disabled.
> + * Can only be called when the interrupt entry came from user mode. The
> + * calling code must be non-instrumentable.  When the function returns all
> + * state is correct and the subsequent functions can be instrumented.
> + *
> + * The function establishes state (lockdep, RCU (context tracking), tracing)
> + */
> +void irqentry_enter_from_user_mode(struct pt_regs *regs);
> +
> +/**
> + * irqentry_exit_to_user_mode - Interrupt exit work
> + * @regs:	Pointer to current's pt_regs
> + *
> + * Invoked with interrupts disabled and fully valid regs. Returns with all
> + * work handled, interrupts disabled such that the caller can immediately
> + * switch to user mode. Called from architecture specific interrupt
> + * handling code.
> + *
> + * The call order is #2 and #3 as described in syscall_exit_to_user_mode().
> + * Interrupt exit is not invoking #1 which is the syscall specific one time
> + * work.
> + */
> +void irqentry_exit_to_user_mode(struct pt_regs *regs);
> +
> +#ifndef irqentry_state
> +/**
> + * struct irqentry_state - Opaque object for exception state storage
> + * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
> + *            exit path has to invoke ct_irq_exit().
> + * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
> + *           lockdep state is restored correctly on exit from nmi.
> + *
> + * This opaque object is filled in by the irqentry_*_enter() functions and
> + * must be passed back into the corresponding irqentry_*_exit() functions
> + * when the exception is complete.
> + *
> + * Callers of irqentry_*_[enter|exit]() must consider this structure opaque
> + * and all members private.  Descriptions of the members are provided to aid in
> + * the maintenance of the irqentry_*() functions.
> + */
> +typedef struct irqentry_state {
> +	union {
> +		bool	exit_rcu;
> +		bool	lockdep;
> +	};
> +} irqentry_state_t;
> +#endif
> +
> +/**
> + * irqentry_enter - Handle state tracking on ordinary interrupt entries
> + * @regs:	Pointer to pt_regs of interrupted context
> + *
> + * Invokes:
> + *  - lockdep irqflag state tracking as low level ASM entry disabled
> + *    interrupts.
> + *
> + *  - Context tracking if the exception hit user mode.
> + *
> + *  - The hardirq tracer to keep the state consistent as low level ASM
> + *    entry disabled interrupts.
> + *
> + * As a precondition, this requires that the entry came from user mode,
> + * idle, or a kernel context in which RCU is watching.
> + *
> + * For kernel mode entries RCU handling is done conditional. If RCU is
> + * watching then the only RCU requirement is to check whether the tick has
> + * to be restarted. If RCU is not watching then ct_irq_enter() has to be
> + * invoked on entry and ct_irq_exit() on exit.
> + *
> + * Avoiding the ct_irq_enter/exit() calls is an optimization but also
> + * solves the problem of kernel mode pagefaults which can schedule, which
> + * is not possible after invoking ct_irq_enter() without undoing it.
> + *
> + * For user mode entries irqentry_enter_from_user_mode() is invoked to
> + * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
> + * would not be possible.
> + *
> + * Returns: An opaque object that must be passed to idtentry_exit()
> + */
> +irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
> +
> +/**
> + * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
> + *
> + * Conditional reschedule with additional sanity checks.
> + */
> +void raw_irqentry_exit_cond_resched(void);
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> +#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> +#define irqentry_exit_cond_resched_dynamic_disabled	NULL
> +DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> +#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
> +#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> +DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +void dynamic_irqentry_exit_cond_resched(void);
> +#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
> +#endif
> +#else /* CONFIG_PREEMPT_DYNAMIC */
> +#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
> +#endif /* CONFIG_PREEMPT_DYNAMIC */
> +
> +/**
> + * irqentry_exit - Handle return from exception that used irqentry_enter()
> + * @regs:	Pointer to pt_regs (exception entry regs)
> + * @state:	Return value from matching call to irqentry_enter()
> + *
> + * Depending on the return target (kernel/user) this runs the necessary
> + * preemption and work checks if possible and required and returns to
> + * the caller with interrupts disabled and no further work pending.
> + *
> + * This is the last action before returning to the low level ASM code which
> + * just needs to return to the appropriate context.
> + *
> + * Counterpart to irqentry_enter().
> + */
> +void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
> +
> +/**
> + * irqentry_nmi_enter - Handle NMI entry
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Similar to irqentry_enter() but taking care of the NMI constraints.
> + */
> +irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs);
> +
> +/**
> + * irqentry_nmi_exit - Handle return from NMI handling
> + * @regs:	Pointer to pt_regs (NMI entry regs)
> + * @irq_state:	Return value from matching call to irqentry_nmi_enter()
> + *
> + * Last action before returning to the low level assembly code.
> + *
> + * Counterpart to irqentry_nmi_enter().
> + */
> +void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state);
> +
> +#endif
> diff --git a/kernel/entry/Makefile b/kernel/entry/Makefile
> index 095c775e001e..d38f3a7e7396 100644
> --- a/kernel/entry/Makefile
> +++ b/kernel/entry/Makefile
> @@ -9,5 +9,6 @@ KCOV_INSTRUMENT := n
>  CFLAGS_REMOVE_common.o	 = -fstack-protector -fstack-protector-strong
>  CFLAGS_common.o		+= -fno-stack-protector
>  
> -obj-$(CONFIG_GENERIC_ENTRY) 		+= common.o syscall_user_dispatch.o
> +obj-$(CONFIG_GENERIC_IRQ_ENTRY) 	+= common.o
> +obj-$(CONFIG_GENERIC_SYSCALL) 		+= syscall-common.o syscall_user_dispatch.o
>  obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK)	+= kvm.o
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index e33691d5adf7..b82032777310 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -1,84 +1,13 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -#include <linux/context_tracking.h>
> -#include <linux/entry-common.h>
> +#include <linux/irq-entry-common.h>
>  #include <linux/resume_user_mode.h>
>  #include <linux/highmem.h>
>  #include <linux/jump_label.h>
>  #include <linux/kmsan.h>
>  #include <linux/livepatch.h>
> -#include <linux/audit.h>
>  #include <linux/tick.h>
>  
> -#include "common.h"
> -
> -#define CREATE_TRACE_POINTS
> -#include <trace/events/syscalls.h>
> -
> -static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
> -{
> -	if (unlikely(audit_context())) {
> -		unsigned long args[6];
> -
> -		syscall_get_arguments(current, regs, args);
> -		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
> -	}
> -}
> -
> -long syscall_trace_enter(struct pt_regs *regs, long syscall,
> -				unsigned long work)
> -{
> -	long ret = 0;
> -
> -	/*
> -	 * Handle Syscall User Dispatch.  This must comes first, since
> -	 * the ABI here can be something that doesn't make sense for
> -	 * other syscall_work features.
> -	 */
> -	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> -		if (syscall_user_dispatch(regs))
> -			return -1L;
> -	}
> -
> -	/* Handle ptrace */
> -	if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
> -		ret = ptrace_report_syscall_entry(regs);
> -		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
> -			return -1L;
> -	}
> -
> -	/* Do seccomp after ptrace, to catch any tracer changes. */
> -	if (work & SYSCALL_WORK_SECCOMP) {
> -		ret = __secure_computing(NULL);
> -		if (ret == -1L)
> -			return ret;
> -	}
> -
> -	/* Either of the above might have changed the syscall number */
> -	syscall = syscall_get_nr(current, regs);
> -
> -	if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
> -		trace_sys_enter(regs, syscall);
> -		/*
> -		 * Probes or BPF hooks in the tracepoint may have changed the
> -		 * system call number as well.
> -		 */
> -		syscall = syscall_get_nr(current, regs);
> -	}
> -
> -	syscall_enter_audit(regs, syscall);
> -
> -	return ret ? : syscall;
> -}
> -
> -noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
> -{
> -	enter_from_user_mode(regs);
> -	instrumentation_begin();
> -	local_irq_enable();
> -	instrumentation_end();
> -}
> -
>  /* Workaround to allow gradual conversion of architecture code */
>  void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
>  
> @@ -133,93 +62,6 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  	return ti_work;
>  }
>  
> -/*
> - * If SYSCALL_EMU is set, then the only reason to report is when
> - * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP).  This syscall
> - * instruction has been already reported in syscall_enter_from_user_mode().
> - */
> -static inline bool report_single_step(unsigned long work)
> -{
> -	if (work & SYSCALL_WORK_SYSCALL_EMU)
> -		return false;
> -
> -	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
> -}
> -
> -static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
> -{
> -	bool step;
> -
> -	/*
> -	 * If the syscall was rolled back due to syscall user dispatching,
> -	 * then the tracers below are not invoked for the same reason as
> -	 * the entry side was not invoked in syscall_trace_enter(): The ABI
> -	 * of these syscalls is unknown.
> -	 */
> -	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> -		if (unlikely(current->syscall_dispatch.on_dispatch)) {
> -			current->syscall_dispatch.on_dispatch = false;
> -			return;
> -		}
> -	}
> -
> -	audit_syscall_exit(regs);
> -
> -	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> -		trace_sys_exit(regs, syscall_get_return_value(current, regs));
> -
> -	step = report_single_step(work);
> -	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
> -		ptrace_report_syscall_exit(regs, step);
> -}
> -
> -/*
> - * Syscall specific exit to user mode preparation. Runs with interrupts
> - * enabled.
> - */
> -static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
> -{
> -	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
> -	unsigned long nr = syscall_get_nr(current, regs);
> -
> -	CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
> -
> -	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> -		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
> -			local_irq_enable();
> -	}
> -
> -	rseq_syscall(regs);
> -
> -	/*
> -	 * Do one-time syscall specific work. If these work items are
> -	 * enabled, we want to run them exactly once per syscall exit with
> -	 * interrupts enabled.
> -	 */
> -	if (unlikely(work & SYSCALL_WORK_EXIT))
> -		syscall_exit_work(regs, work);
> -}
> -
> -static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
> -{
> -	syscall_exit_to_user_mode_prepare(regs);
> -	local_irq_disable_exit_to_user();
> -	exit_to_user_mode_prepare(regs);
> -}
> -
> -void syscall_exit_to_user_mode_work(struct pt_regs *regs)
> -{
> -	__syscall_exit_to_user_mode_work(regs);
> -}
> -
> -__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
> -{
> -	instrumentation_begin();
> -	__syscall_exit_to_user_mode_work(regs);
> -	instrumentation_end();
> -	exit_to_user_mode();
> -}
> -
>  noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
>  {
>  	enter_from_user_mode(regs);
> diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c
> new file mode 100644
> index 000000000000..0eb036986ad4
> --- /dev/null
> +++ b/kernel/entry/syscall-common.c
> @@ -0,0 +1,159 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/audit.h>
> +#include <linux/entry-common.h>
> +#include "common.h"
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/syscalls.h>
> +
> +static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
> +{
> +	if (unlikely(audit_context())) {
> +		unsigned long args[6];
> +
> +		syscall_get_arguments(current, regs, args);
> +		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
> +	}
> +}
> +
> +long syscall_trace_enter(struct pt_regs *regs, long syscall,
> +				unsigned long work)
> +{
> +	long ret = 0;
> +
> +	/*
> +	 * Handle Syscall User Dispatch.  This must comes first, since
> +	 * the ABI here can be something that doesn't make sense for
> +	 * other syscall_work features.
> +	 */
> +	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> +		if (syscall_user_dispatch(regs))
> +			return -1L;
> +	}
> +
> +	/* Handle ptrace */
> +	if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
> +		ret = ptrace_report_syscall_entry(regs);
> +		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
> +			return -1L;
> +	}
> +
> +	/* Do seccomp after ptrace, to catch any tracer changes. */
> +	if (work & SYSCALL_WORK_SECCOMP) {
> +		ret = __secure_computing(NULL);
> +		if (ret == -1L)
> +			return ret;
> +	}
> +
> +	/* Either of the above might have changed the syscall number */
> +	syscall = syscall_get_nr(current, regs);
> +
> +	if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
> +		trace_sys_enter(regs, syscall);
> +		/*
> +		 * Probes or BPF hooks in the tracepoint may have changed the
> +		 * system call number as well.
> +		 */
> +		syscall = syscall_get_nr(current, regs);
> +	}
> +
> +	syscall_enter_audit(regs, syscall);
> +
> +	return ret ? : syscall;
> +}
> +
> +noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
> +{
> +	enter_from_user_mode(regs);
> +	instrumentation_begin();
> +	local_irq_enable();
> +	instrumentation_end();
> +}
> +
> +/*
> + * If SYSCALL_EMU is set, then the only reason to report is when
> + * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP).  This syscall
> + * instruction has been already reported in syscall_enter_from_user_mode().
> + */
> +static inline bool report_single_step(unsigned long work)
> +{
> +	if (work & SYSCALL_WORK_SYSCALL_EMU)
> +		return false;
> +
> +	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
> +}
> +
> +static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
> +{
> +	bool step;
> +
> +	/*
> +	 * If the syscall was rolled back due to syscall user dispatching,
> +	 * then the tracers below are not invoked for the same reason as
> +	 * the entry side was not invoked in syscall_trace_enter(): The ABI
> +	 * of these syscalls is unknown.
> +	 */
> +	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> +		if (unlikely(current->syscall_dispatch.on_dispatch)) {
> +			current->syscall_dispatch.on_dispatch = false;
> +			return;
> +		}
> +	}
> +
> +	audit_syscall_exit(regs);
> +
> +	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> +		trace_sys_exit(regs, syscall_get_return_value(current, regs));
> +
> +	step = report_single_step(work);
> +	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
> +		ptrace_report_syscall_exit(regs, step);
> +}
> +
> +/*
> + * Syscall specific exit to user mode preparation. Runs with interrupts
> + * enabled.
> + */
> +static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
> +{
> +	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
> +	unsigned long nr = syscall_get_nr(current, regs);
> +
> +	CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
> +
> +	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> +		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
> +			local_irq_enable();
> +	}
> +
> +	rseq_syscall(regs);
> +
> +	/*
> +	 * Do one-time syscall specific work. If these work items are
> +	 * enabled, we want to run them exactly once per syscall exit with
> +	 * interrupts enabled.
> +	 */
> +	if (unlikely(work & SYSCALL_WORK_EXIT))
> +		syscall_exit_work(regs, work);
> +}
> +
> +static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
> +{
> +	syscall_exit_to_user_mode_prepare(regs);
> +	local_irq_disable_exit_to_user();
> +	exit_to_user_mode_prepare(regs);
> +}
> +
> +void syscall_exit_to_user_mode_work(struct pt_regs *regs)
> +{
> +	__syscall_exit_to_user_mode_work(regs);
> +}
> +
> +__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
> +{
> +	instrumentation_begin();
> +	__syscall_exit_to_user_mode_work(regs);
> +	instrumentation_end();
> +	exit_to_user_mode();
> +}
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 27a8fbd58091..2d560bb3efaa 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -68,8 +68,8 @@
>  #include <linux/workqueue_api.h>
>  
>  #ifdef CONFIG_PREEMPT_DYNAMIC
> -# ifdef CONFIG_GENERIC_ENTRY
> -#  include <linux/entry-common.h>
> +# ifdef CONFIG_GENERIC_IRQ_ENTRY
> +#  include <linux/irq-entry-common.h>
>  # endif
>  #endif
>  
> @@ -7398,8 +7398,8 @@ EXPORT_SYMBOL(__cond_resched_rwlock_write);
>  
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  
> -#ifdef CONFIG_GENERIC_ENTRY
> -#include <linux/entry-common.h>
> +#ifdef CONFIG_GENERIC_IRQ_ENTRY
> +#include <linux/irq-entry-common.h>
>  #endif
>  
>  /*
> -- 
> 2.34.1
> 


* Re: [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64
  2024-12-06 10:17 ` [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64 Jinjie Ruan
@ 2025-02-10 12:05   ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 12:05 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:32PM +0800, Jinjie Ruan wrote:
> ARM64 requires an additional check whether to reschedule on return
> from interrupt.
> 
> Add arch_irqentry_exit_need_resched() as the default NOP
> implementation and hook it up into the need_resched() condition in
> raw_irqentry_exit_cond_resched().
> 
> This allows ARM64 to implement the architecture specific version for
> switching over to the generic entry code.

Please fold this into the earlier changes in this area made over patches
6 to 8.

> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  kernel/entry/common.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index b82032777310..4aa9656fa1b4 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -142,6 +142,20 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>  	return ret;
>  }
>  
> +/**
> + * arch_irqentry_exit_need_resched - Architecture specific need resched function
> + *
> + * Invoked from raw_irqentry_exit_cond_resched() to check if need resched.
> + * Defaults return true.
> + *
> + * The main purpose is to permit arch to skip preempt a task from an IRQ.
> + */
> +static inline bool arch_irqentry_exit_need_resched(void);
> +
> +#ifndef arch_irqentry_exit_need_resched
> +static inline bool arch_irqentry_exit_need_resched(void) { return true; }
> +#endif
> +
>  void raw_irqentry_exit_cond_resched(void)
>  {
>  	if (!preempt_count()) {
> @@ -149,7 +163,7 @@ void raw_irqentry_exit_cond_resched(void)
>  		rcu_irq_exit_check_preempt();
>  		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
>  			WARN_ON_ONCE(!on_thread_stack());
> -		if (need_resched())
> +		if (need_resched() && arch_irqentry_exit_need_resched())
>  			preempt_schedule_irq();
>  	}
>  }
> -- 
> 2.34.1
> 
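
For readers following the hunk above, here is a minimal userspace sketch of
the override pattern it relies on: an architecture provides its own function
and defines a macro of the same name, so the generic default guarded by
#ifndef is skipped. This is only an illustration under stated assumptions,
not kernel code; every demo_* name and the global state variables are
hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state (not real kernel symbols). */
static int demo_preempt_count;
static bool demo_need_resched;
static bool demo_arch_allows_preempt = true;
static int demo_schedule_calls;

/*
 * What an architecture header would do: provide its own version and
 * define a macro with the same name so the generic default is skipped.
 */
static inline bool demo_arch_irqentry_exit_need_resched(void)
{
	return demo_arch_allows_preempt;
}
#define demo_arch_irqentry_exit_need_resched demo_arch_irqentry_exit_need_resched

/* Generic default: only compiled when no architecture version exists. */
#ifndef demo_arch_irqentry_exit_need_resched
static inline bool demo_arch_irqentry_exit_need_resched(void) { return true; }
#endif

/* Mirrors the shape of the modified condition in the hunk above. */
static void demo_irqentry_exit_cond_resched(void)
{
	if (!demo_preempt_count) {
		if (demo_need_resched && demo_arch_irqentry_exit_need_resched())
			demo_schedule_calls++;	/* stands in for preempt_schedule_irq() */
	}
}
```

With the macro removed, the #ifndef branch would supply the default that
always returns true, and the condition would reduce to the plain
need_resched check.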


* Re: [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry
  2024-12-06 10:17 ` [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry Jinjie Ruan
@ 2025-02-10 12:24   ` Mark Rutland
  2025-02-11 11:32     ` Jinjie Ruan
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 12:24 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:33PM +0800, Jinjie Ruan wrote:
> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> to use the generic entry infrastructure from kernel/entry/*.
> The generic entry makes maintainers' work easier and codes
> more elegant.
> 
> Switch arm64 to generic IRQ entry first, which removed duplicate 100+
> LOC, and it will switch to generic entry completely later. Switch to
> generic entry in two steps according to Mark's suggestion will make
> it easier to review.
> 
> The changes are below:
>  - Remove *enter_from/exit_to_kernel_mode(), and wrap with generic
>    irqentry_enter/exit(). Also remove *enter_from/exit_to_user_mode(),
>    and wrap with generic enter_from/exit_to_user_mode() because they
>    are exactly the same so far.
> 
>  - Remove arm64_enter/exit_nmi() and use generic irqentry_nmi_enter/exit()
>    because they're exactly the same, so the temporary arm64 version
>    irqentry_state can also be removed.
> 
>  - Remove PREEMPT_DYNAMIC code, as generic entry do the same thing
>    if arm64 implement arch_irqentry_exit_need_resched().
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/Kconfig                    |   1 +
>  arch/arm64/include/asm/entry-common.h |  64 ++++++
>  arch/arm64/include/asm/preempt.h      |   6 -
>  arch/arm64/kernel/entry-common.c      | 307 ++++++--------------------
>  arch/arm64/kernel/signal.c            |   3 +-
>  5 files changed, 129 insertions(+), 252 deletions(-)
>  create mode 100644 arch/arm64/include/asm/entry-common.h

Superficially this looks nice, but to be clear I have *not* looked at
this in great detail; minor comments below.

[...]

> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +						  unsigned long ti_work)
> +{
> +	local_daif_mask();
> +}
> +
> +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare

I'm a little worried that this may be fragile having been hidden in the
common code, as it's not clear exactly when this will occur during the
return sequence, and the ordering requirements could easily be broken by
refactoring there.

I suspect we'll want to pull this later in the arm64 exit sequence so
that we can have it explicit in entry-common.c.

[...]

> index 14ac6fdb872b..84b6628647c7 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -9,6 +9,7 @@
>  #include <linux/cache.h>
>  #include <linux/compat.h>
>  #include <linux/errno.h>
> +#include <linux/irq-entry-common.h>
>  #include <linux/kernel.h>
>  #include <linux/signal.h>
>  #include <linux/freezer.h>
> @@ -1603,7 +1604,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>   * the kernel can handle, and then we build all the user-level signal handling
>   * stack-frames in one go after that.
>   */
> -void do_signal(struct pt_regs *regs)
> +void arch_do_signal_or_restart(struct pt_regs *regs)
>  {
>  	unsigned long continue_addr = 0, restart_addr = 0;
>  	int retval = 0;

Are the expected semantics the same here, or is there more than just a
name change?

Mark.


* Re: [PATCH -next v5 00/22] arm64: entry: Convert to generic entry
  2025-02-08  1:15 ` [PATCH -next v5 00/22] " Jinjie Ruan
@ 2025-02-10 12:30   ` Mark Rutland
  2025-02-11 11:43     ` Jinjie Ruan
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Rutland @ 2025-02-10 12:30 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Sat, Feb 08, 2025 at 09:15:08AM +0800, Jinjie Ruan wrote:
> On 2024/12/6 18:17, Jinjie Ruan wrote:
> > Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> > to use the generic entry infrastructure from kernel/entry/*. The generic
> > > entry makes maintainers' work easier and codes more elegant, which also
> > removed a lot of duplicate code.
> > 
> > The main steps are as follows:
> > - Make arm64 easier to use irqentry_enter/exit().
> > - Make arm64 closer to the PREEMPT_DYNAMIC code of generic entry.
> > - Split generic entry into generic irq entry and generic syscall to
> >   make the single patch more concentrated in switching to one thing.
> > - Switch to generic irq entry.
> > - Make arm64 closer to the generic syscall code.
> > - Switch to generic entry completely.
> > 
> > Changes in v5:
> > > - Not change arm32 and keep interrupts_enabled() macro for gicv3 driver.
> > - Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
> > - Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
> > - Update "irqentry_state_t ret/irq_state" to "state"
> >   to keep it consistently.
> > - Use generic irq entry header for PREEMPT_DYNAMIC after split
> >   the generic entry.
> > - Also refactor the ARM64 syscall code.
> > - Introduce arch_ptrace_report_syscall_entry/exit(), instead of
> >   arch_pre/post_report_syscall_entry/exit() to simplify code.
> > - Make the syscall patches clear separation.
> > - Update the commit message.
> 
> Gentle Ping.

I've left some comments.

As I mentioned previously, I'd very much prefer that we do the syscall
entry logic changes *later* (i.e. as a follow-up patch series), after
we've got the irq/exception entry logic sorted.

I reckon we've got just enough time to get the irq/exception entry
changes ready this cycle, with another round or two of review. So can we
please put the syscall bits aside for now? ... that and run all the
tests you mention in patch 22 on the irq/exception entry changes alone.

Mark.


* Re: [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry
  2025-02-10 12:24   ` Mark Rutland
@ 2025-02-11 11:32     ` Jinjie Ruan
  0 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2025-02-11 11:32 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel



On 2025/2/10 20:24, Mark Rutland wrote:
> On Fri, Dec 06, 2024 at 06:17:33PM +0800, Jinjie Ruan wrote:
>> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
>> to use the generic entry infrastructure from kernel/entry/*.
>> The generic entry makes maintainers' work easier and codes
>> more elegant.
>>
>> Switch arm64 to generic IRQ entry first, which removed duplicate 100+
>> LOC, and it will switch to generic entry completely later. Switch to
>> generic entry in two steps according to Mark's suggestion will make
>> it easier to review.
>>
>> The changes are below:
>>  - Remove *enter_from/exit_to_kernel_mode(), and wrap with generic
>>    irqentry_enter/exit(). Also remove *enter_from/exit_to_user_mode(),
>>    and wrap with generic enter_from/exit_to_user_mode() because they
>>    are exactly the same so far.
>>
>>  - Remove arm64_enter/exit_nmi() and use generic irqentry_nmi_enter/exit()
>>    because they're exactly the same, so the temporary arm64 version
>>    irqentry_state can also be removed.
>>
>>  - Remove PREEMPT_DYNAMIC code, as generic entry do the same thing
>>    if arm64 implement arch_irqentry_exit_need_resched().
>>
>> Suggested-by: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
>> ---
>>  arch/arm64/Kconfig                    |   1 +
>>  arch/arm64/include/asm/entry-common.h |  64 ++++++
>>  arch/arm64/include/asm/preempt.h      |   6 -
>>  arch/arm64/kernel/entry-common.c      | 307 ++++++--------------------
>>  arch/arm64/kernel/signal.c            |   3 +-
>>  5 files changed, 129 insertions(+), 252 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/entry-common.h
> 
> Superficially this looks nice, but to be clear I have *not* looked at
> this in great detail; minor comments below.
> 
> [...]
> 
>> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
>> +						  unsigned long ti_work)
>> +{
>> +	local_daif_mask();
>> +}
>> +
>> +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
> 
> I'm a little worried that this may be fragile having been hidden in the
> common code, as it's not clear exactly when this will occur during the
> return sequence, and the ordering requirements could easily be broken by
> refactoring there.
> 
> I suspect we'll want to pull this later in the arm64 exit sequence so
> that we can have it explicit in entry-common.c.

Yes, this key function is hidden in the generic entry code, so it is not
easy to see when it is executed. But placing it directly in arm64's
entry-common.c may change the order in which lockdep_sys_exit() and
local_daif_mask() are called, and it is not clear what the potential
impact would be.

Before:
   exit_to_user_mode_prepare()
      ...
      -> local_daif_mask()
      -> lockdep_sys_exit()

After:
   arm64_exit_to_user_mode()
      ...
      -> exit_to_user_mode_prepare()
         -> lockdep_sys_exit()
      -> local_daif_mask()
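
As a rough illustration of the difference between the two orderings, here
is a small userspace sketch that records the order in which the two steps
run. It uses stub functions only and says nothing about the real impact in
the kernel; all demo_* names are made up for this example.

```c
#include <string.h>

/* Records the order in which the stub steps run. */
static char demo_trace[128];

static void demo_reset(void)
{
	demo_trace[0] = '\0';
}

static void demo_record(const char *step)
{
	size_t room = sizeof(demo_trace) - strlen(demo_trace) - 1;

	if (demo_trace[0] != '\0' && room > 0) {
		strncat(demo_trace, " ", room);
		room--;
	}
	strncat(demo_trace, step, room);
}

/* Stubs standing in for the real helpers. */
static void demo_local_daif_mask(void)  { demo_record("daif_mask"); }
static void demo_lockdep_sys_exit(void) { demo_record("lockdep"); }

/* Ordering with the mask done by the arch hook inside the generic code. */
static void demo_exit_hook_in_generic(void)
{
	demo_local_daif_mask();		/* arch hook runs first */
	demo_lockdep_sys_exit();
}

/* Ordering if the mask were made explicit in the arm64 exit path. */
static void demo_exit_mask_explicit(void)
{
	demo_lockdep_sys_exit();	/* generic prepare step, no arch hook */
	demo_local_daif_mask();
}
```

The only point of the sketch is that the two steps swap places; whether
lockdep_sys_exit() cares about the DAIF state is the open question.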

> 
> [...]
> 
>> index 14ac6fdb872b..84b6628647c7 100644
>> --- a/arch/arm64/kernel/signal.c
>> +++ b/arch/arm64/kernel/signal.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/cache.h>
>>  #include <linux/compat.h>
>>  #include <linux/errno.h>
>> +#include <linux/irq-entry-common.h>
>>  #include <linux/kernel.h>
>>  #include <linux/signal.h>
>>  #include <linux/freezer.h>
>> @@ -1603,7 +1604,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>>   * the kernel can handle, and then we build all the user-level signal handling
>>   * stack-frames in one go after that.
>>   */
>> -void do_signal(struct pt_regs *regs)
>> +void arch_do_signal_or_restart(struct pt_regs *regs)
>>  {
>>  	unsigned long continue_addr = 0, restart_addr = 0;
>>  	int retval = 0;
> 
> Are the expected semantics the same here, or is there more than just a
> name change?

Yes, the expected semantics are the same here: both handle the
_TIF_SIGPENDING and _TIF_NOTIFY_SIGNAL thread flags before
exiting to user mode.

In arm64, the call sequence is:

  exit_to_user_mode()
     -> exit_to_user_mode_prepare()
        -> do_notify_resume(regs, flags)
           -> if (thread_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
                 do_signal(regs)

In generic entry code, the logic is the same:

  exit_to_user_mode_prepare()
      -> exit_to_user_mode_loop()
          -> if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
                 arch_do_signal_or_restart(regs)
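
The equivalence of the two checks can be sketched in userspace as below.
This is an illustration only: the flag bit values are hypothetical (the
real ones are architecture-defined) and the demo_* functions are stand-ins,
not the kernel implementations.

```c
/* Hypothetical flag bits for illustration; real values differ per arch. */
#define DEMO_TIF_SIGPENDING	(1UL << 0)
#define DEMO_TIF_NOTIFY_SIGNAL	(1UL << 1)
#define DEMO_TIF_NEED_RESCHED	(1UL << 2)

/* Counts how often the signal handling step was reached. */
static int demo_signal_calls;

/* Stands in for do_signal() / arch_do_signal_or_restart(). */
static void demo_handle_signal(void)
{
	demo_signal_calls++;
}

/* Shape of the old arm64 path: the check sits in do_notify_resume(). */
static void demo_arm64_notify_resume(unsigned long thread_flags)
{
	if (thread_flags & (DEMO_TIF_SIGPENDING | DEMO_TIF_NOTIFY_SIGNAL))
		demo_handle_signal();
}

/* Shape of the generic path: the check sits in exit_to_user_mode_loop(). */
static void demo_generic_exit_loop(unsigned long ti_work)
{
	if (ti_work & (DEMO_TIF_SIGPENDING | DEMO_TIF_NOTIFY_SIGNAL))
		demo_handle_signal();
}
```

For any combination of the flags, both functions reach the signal handling
step under the same condition, which is why only the name changes.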

> 
> Mark.
> 


* Re: [PATCH -next v5 00/22] arm64: entry: Convert to generic entry
  2025-02-10 12:30   ` Mark Rutland
@ 2025-02-11 11:43     ` Jinjie Ruan
  0 siblings, 0 replies; 38+ messages in thread
From: Jinjie Ruan @ 2025-02-11 11:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel



On 2025/2/10 20:30, Mark Rutland wrote:
> On Sat, Feb 08, 2025 at 09:15:08AM +0800, Jinjie Ruan wrote:
>> On 2024/12/6 18:17, Jinjie Ruan wrote:
>>> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
>>> to use the generic entry infrastructure from kernel/entry/*. The generic
>>> entry makes maintainers' work easier and codes more elegant, which also
>>> removed a lot of duplicate code.
>>>
>>> The main steps are as follows:
>>> - Make arm64 easier to use irqentry_enter/exit().
>>> - Make arm64 closer to the PREEMPT_DYNAMIC code of generic entry.
>>> - Split generic entry into generic irq entry and generic syscall to
>>>   make the single patch more concentrated in switching to one thing.
>>> - Switch to generic irq entry.
>>> - Make arm64 closer to the generic syscall code.
>>> - Switch to generic entry completely.
>>>
>>> Changes in v5:
>>> - Not change arm32 and keep interrupts_enabled() macro for gicv3 driver.
>>> - Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
>>> - Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
>>> - Update "irqentry_state_t ret/irq_state" to "state"
>>>   to keep it consistently.
>>> - Use generic irq entry header for PREEMPT_DYNAMIC after split
>>>   the generic entry.
>>> - Also refactor the ARM64 syscall code.
>>> - Introduce arch_ptrace_report_syscall_entry/exit(), instead of
>>>   arch_pre/post_report_syscall_entry/exit() to simplify code.
>>> - Make the syscall patches clear separation.
>>> - Update the commit message.
>>
>> Gentle Ping.
> 
> I've left some comments.
> 
> As I mentioned previously, I'd very much prefer that we do the syscall
> entry logic changes *later* (i.e. as a follow-up patch series), after
> we've got the irq/exception entry logic sorted.
> 
> I reckon we've got just enough time to get the irq/exception entry
> changes ready this cycle, with another round or two of review. So can we
> please put the syscall bits aside for now? ... that and run all the
> tests you mention in patch 22 on the irq/exception entry changes alone.

Sure, it is OK to put the syscall bits aside and split them out.

> 
> Mark.
> 
> 


Thread overview: 38+ messages
2024-12-06 10:17 [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled() Jinjie Ruan
2025-02-10 11:04   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1 Jinjie Ruan
2025-02-10 11:08   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode() Jinjie Ruan
2025-02-10 11:26   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq() Jinjie Ruan
2025-02-10 11:33   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper Jinjie Ruan
2025-02-10 11:40   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead Jinjie Ruan
2025-02-10 11:48   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled Jinjie Ruan
2025-02-10 11:52   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC Jinjie Ruan
2025-02-10 11:54   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall Jinjie Ruan
2025-02-10 12:04   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64 Jinjie Ruan
2025-02-10 12:05   ` Mark Rutland
2024-12-06 10:17 ` [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry Jinjie Ruan
2025-02-10 12:24   ` Mark Rutland
2025-02-11 11:32     ` Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 12/22] arm64/ptrace: Split report_syscall() function Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 13/22] arm64/ptrace: Refactor syscall_trace_enter() Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 14/22] arm64/ptrace: Refactor syscall_trace_exit() Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 15/22] arm64/ptrace: Refator el0_svc_common() Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 16/22] entry: Make syscall_exit_to_user_mode_prepare() not static Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 17/22] arm64/ptrace: Return early for ptrace_report_syscall_entry() error Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 18/22] arm64/ptrace: Expand secure_computing() in place Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 19/22] arm64/ptrace: Use syscall_get_arguments() heleper Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 20/22] entry: Add arch_ptrace_report_syscall_entry/exit() Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 21/22] entry: Add has_syscall_work() helepr Jinjie Ruan
2024-12-06 10:17 ` [PATCH -next v5 22/22] arm64: entry: Convert to generic entry Jinjie Ruan
2025-02-08  1:15 ` [PATCH -next v5 00/22] " Jinjie Ruan
2025-02-10 12:30   ` Mark Rutland
2025-02-11 11:43     ` Jinjie Ruan
