public inbox for linux-arm-kernel@lists.infradead.org
* Re: [PATCH -next v5 00/22] arm64: entry: Convert to generic entry
       [not found] <20241206101744.4161990-1-ruanjinjie@huawei.com>
@ 2025-02-08  1:15 ` Jinjie Ruan
  2025-02-10 12:30   ` Mark Rutland
       [not found] ` <20241206101744.4161990-2-ruanjinjie@huawei.com>
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 15+ messages in thread
From: Jinjie Ruan @ 2025-02-08  1:15 UTC
  To: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, mark.rutland, pcc, ardb, sudeep.holla, guohanjun,
	rafael, liuwei09, dwmw, Jonathan.Cameron, liaochang1,
	kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel



On 2024/12/6 18:17, Jinjie Ruan wrote:
> Currently, x86, RISC-V, and LoongArch use the generic entry code. Convert
> arm64 to use the generic entry infrastructure from kernel/entry/*. The
> generic entry code eases maintenance, makes the code cleaner, and also
> removes a lot of duplicated code.
> 
> The main steps are as follows:
> - Make it easier for arm64 to use irqentry_enter/exit().
> - Make arm64 closer to the PREEMPT_DYNAMIC code of generic entry.
> - Split generic entry into generic irq entry and generic syscall so
>   that each patch focuses on switching one thing.
> - Switch to generic irq entry.
> - Make arm64 closer to the generic syscall code.
> - Switch to generic entry completely.
> 
> Changes in v5:
> - Do not change arm32; keep the interrupts_enabled() macro for the GICv3
>   driver.
> - Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
> - Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
> - Rename "irqentry_state_t ret/irq_state" to "state" for consistency.
> - Use the generic irq entry header for PREEMPT_DYNAMIC after splitting
>   the generic entry.
> - Also refactor the ARM64 syscall code.
> - Introduce arch_ptrace_report_syscall_entry/exit(), instead of
>   arch_pre/post_report_syscall_entry/exit() to simplify code.
> - Separate the syscall patches clearly.
> - Update the commit message.

Gentle Ping.

> 
> Changes in v4:
> - Rework/cleanup split into a few patches as Mark suggested.
> - Replace interrupts_enabled() macro with regs_irqs_disabled(), instead
>   of leaving it in place.
> - Remove rcu and lockdep state in pt_regs by using temporary
>   irqentry_state_t as Mark suggested.
> - Remove some unnecessary intermediate functions to make the code clearer.
> - Rework preempt irq and PREEMPT_DYNAMIC code
>   to make the switch clearer.
> - arch_prepare_*_entry/exit() -> arch_pre_*_entry/exit().
> - Expand the arch functions comment.
> - Move arch functions closer to their callers.
> - Declare saved_reg inside the for block.
> - Remove arch_exit_to_kernel_mode_prepare(), arch_enter_from_kernel_mode().
> - Move the "Add few arch functions to use generic entry" patch to be
>   the penultimate one.
> - Update the commit message.
> - Add suggested-by.
> 
> Changes in v3:
> - Test the MTE test cases.
> - Handle forget_syscall() in arch_post_report_syscall_entry()
> - Do not use __weak for the arch funcs, as Thomas suggested: move them
>   to entry-common.h, and fold arch_forget_syscall() into
>   arch_post_report_syscall_entry() as suggested.
> - Move report_single_step() to thread_info.h for arm64
> - Change __always_inline() to inline, add inline for the other arch funcs.
> - Remove unused signal.h for entry-common.h.
> - Add Suggested-by.
> - Update the commit message.
> 
> Changes in v2:
> - Add tested-by.
> - Fix a bug where arch_post_report_syscall_entry() was not called in
>   syscall_trace_enter() if ptrace_report_syscall_entry() returned non-zero.
> - Refactor report_syscall().
> - Add comment for arch_prepare_report_syscall_exit().
> - Adjust entry-common.h header file inclusion to alphabetical order.
> - Update the commit message.
> 
> Jinjie Ruan (22):
>   arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
>   arm64: entry: Refactor the entry and exit for exceptions from EL1
>   arm64: entry: Move arm64_preempt_schedule_irq() into
>     __exit_to_kernel_mode()
>   arm64: entry: Rework arm64_preempt_schedule_irq()
>   arm64: entry: Use preempt_count() and need_resched() helper
>   arm64: entry: Expand the need_irq_preemption() macro ahead
>   arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled
>   arm64: entry: Use different helpers to check resched for
>     PREEMPT_DYNAMIC
>   entry: Split generic entry into irq and syscall
>   entry: Add arch_irqentry_exit_need_resched() for arm64
>   arm64: entry: Switch to generic IRQ entry
>   arm64/ptrace: Split report_syscall() function
>   arm64/ptrace: Refactor syscall_trace_enter()
>   arm64/ptrace: Refactor syscall_trace_exit()
>   arm64/ptrace: Refactor el0_svc_common()
>   entry: Make syscall_exit_to_user_mode_prepare() not static
>   arm64/ptrace: Return early for ptrace_report_syscall_entry() error
>   arm64/ptrace: Expand secure_computing() in place
>   arm64/ptrace: Use syscall_get_arguments() helper
>   entry: Add arch_ptrace_report_syscall_entry/exit()
>   entry: Add has_syscall_work() helper
>   arm64: entry: Convert to generic entry
> 
>  MAINTAINERS                           |   1 +
>  arch/Kconfig                          |   8 +
>  arch/arm64/Kconfig                    |   1 +
>  arch/arm64/include/asm/daifflags.h    |   2 +-
>  arch/arm64/include/asm/entry-common.h | 134 +++++++++
>  arch/arm64/include/asm/preempt.h      |   2 -
>  arch/arm64/include/asm/ptrace.h       |  11 +-
>  arch/arm64/include/asm/syscall.h      |   6 +-
>  arch/arm64/include/asm/thread_info.h  |  23 +-
>  arch/arm64/include/asm/xen/events.h   |   2 +-
>  arch/arm64/kernel/acpi.c              |   2 +-
>  arch/arm64/kernel/debug-monitors.c    |   9 +-
>  arch/arm64/kernel/entry-common.c      | 377 ++++++++-----------------
>  arch/arm64/kernel/ptrace.c            |  90 ------
>  arch/arm64/kernel/sdei.c              |   2 +-
>  arch/arm64/kernel/signal.c            |   3 +-
>  arch/arm64/kernel/syscall.c           |  31 +-
>  include/linux/entry-common.h          | 384 +------------------------
>  include/linux/irq-entry-common.h      | 389 ++++++++++++++++++++++++++
>  kernel/entry/Makefile                 |   3 +-
>  kernel/entry/common.c                 | 176 ++----------
>  kernel/entry/syscall-common.c         | 198 +++++++++++++
>  kernel/sched/core.c                   |   8 +-
>  23 files changed, 909 insertions(+), 953 deletions(-)
>  create mode 100644 arch/arm64/include/asm/entry-common.h
>  create mode 100644 include/linux/irq-entry-common.h
>  create mode 100644 kernel/entry/syscall-common.c
> 



* Re: [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
       [not found] ` <20241206101744.4161990-2-ruanjinjie@huawei.com>
@ 2025-02-10 11:04   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:04 UTC
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:23PM +0800, Jinjie Ruan wrote:
> The generic entry code expects architecture code to provide a
> regs_irqs_disabled(regs) function, but arm64 does not have this and
> provides interrupts_enabled(regs), which has the opposite polarity.
> 
> In preparation for moving arm64 over to the generic entry code,
> replace arm64's interrupts_enabled() with regs_irqs_disabled() and
> update its callers under arch/arm64.
> 
> For the moment, a definition of interrupts_enabled() is provided for
> the GICv3 driver. Once arch/arm implements regs_irqs_disabled(), this
> can be removed.
> 
> No functional changes.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/daifflags.h  | 2 +-
>  arch/arm64/include/asm/ptrace.h     | 7 +++++++
>  arch/arm64/include/asm/xen/events.h | 2 +-
>  arch/arm64/kernel/acpi.c            | 2 +-
>  arch/arm64/kernel/debug-monitors.c  | 2 +-
>  arch/arm64/kernel/entry-common.c    | 4 ++--
>  arch/arm64/kernel/sdei.c            | 2 +-
>  7 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
> index fbb5c99eb2f9..5fca48009043 100644
> --- a/arch/arm64/include/asm/daifflags.h
> +++ b/arch/arm64/include/asm/daifflags.h
> @@ -128,7 +128,7 @@ static inline void local_daif_inherit(struct pt_regs *regs)
>  {
>  	unsigned long flags = regs->pstate & DAIF_MASK;
>  
> -	if (interrupts_enabled(regs))
> +	if (!regs_irqs_disabled(regs))
>  		trace_hardirqs_on();
>  
>  	if (system_uses_irq_prio_masking())
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index 47ff8654c5ec..bcfa96880377 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -214,9 +214,16 @@ static inline void forget_syscall(struct pt_regs *regs)
>  		(regs)->pmr == GIC_PRIO_IRQON :				\
>  		true)
>  
> +/*
> + * Used by the GICv3 driver, can be removed once arch/arm implements
> + * regs_irqs_disabled() directly.
> + */
>  #define interrupts_enabled(regs)			\
>  	(!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
>  
> +#define regs_irqs_disabled(regs)			\
> +	(((regs)->pstate & PSR_I_BIT) || (!irqs_priority_unmasked(regs)))

Please make this:

| static __always_inline bool regs_irqs_disabled(const struct pt_regs *regs)
| {
| 	return (regs->pstate & PSR_I_BIT) || !irqs_priority_unmasked(regs);
| }
| 
| #define interrupts_enabled(regs)	(!regs_irqs_disabled(regs))

That way this matches the style of x86 and s390, and because
interrupts_enabled() is defined in terms of regs_irqs_disabled(), the two
cannot accidentally diverge.
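
As a rough illustration of why deriving one helper from the other keeps the
polarity consistent, here is a minimal userspace sketch. It is not kernel
code: struct pt_regs is cut down to two fields, mock_prio_masking is a
hypothetical stand-in for system_uses_irq_prio_masking(), and the
GIC_PRIO_IRQON value is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real definitions live under arch/arm64. */
#define PSR_I_BIT	0x00000080u
#define GIC_PRIO_IRQON	0xe0u		/* illustrative value only */

struct pt_regs {
	uint64_t pstate;
	uint64_t pmr;
};

/* Hypothetical switch standing in for system_uses_irq_prio_masking(). */
static bool mock_prio_masking;

static inline bool irqs_priority_unmasked(const struct pt_regs *regs)
{
	return mock_prio_masking ? regs->pmr == GIC_PRIO_IRQON : true;
}

/* The single source of truth for the "IRQs masked" predicate. */
static inline bool regs_irqs_disabled(const struct pt_regs *regs)
{
	return (regs->pstate & PSR_I_BIT) || !irqs_priority_unmasked(regs);
}

/* Derived from the helper above, so the two cannot disagree. */
#define interrupts_enabled(regs)	(!regs_irqs_disabled(regs))

/* Walk through the three interesting states and check both helpers. */
void demo_polarity(void)
{
	struct pt_regs regs = { .pstate = 0, .pmr = GIC_PRIO_IRQON };

	mock_prio_masking = false;
	assert(interrupts_enabled(&regs));	/* PSTATE.I clear */

	regs.pstate |= PSR_I_BIT;
	assert(regs_irqs_disabled(&regs));	/* PSTATE.I set */

	regs.pstate = 0;
	regs.pmr = 0;
	mock_prio_masking = true;
	assert(regs_irqs_disabled(&regs));	/* masked via PMR only */
}
```

With the macro defined this way, any future change to the masking rules only
has to be made in regs_irqs_disabled().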

>  #define fast_interrupts_enabled(regs) \
>  	(!((regs)->pstate & PSR_F_BIT))

We should probably delete this at the same time; it's unused and we
don't want any new users to show up.

With those changes:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

>  
> diff --git a/arch/arm64/include/asm/xen/events.h b/arch/arm64/include/asm/xen/events.h
> index 2788e95d0ff0..2977b5fe068d 100644
> --- a/arch/arm64/include/asm/xen/events.h
> +++ b/arch/arm64/include/asm/xen/events.h
> @@ -14,7 +14,7 @@ enum ipi_vector {
>  
>  static inline int xen_irqs_disabled(struct pt_regs *regs)
>  {
> -	return !interrupts_enabled(regs);
> +	return regs_irqs_disabled(regs);
>  }
>  
>  #define xchg_xen_ulong(ptr, val) xchg((ptr), (val))
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index e6f66491fbe9..732f89daae23 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -403,7 +403,7 @@ int apei_claim_sea(struct pt_regs *regs)
>  	return_to_irqs_enabled = !irqs_disabled_flags(arch_local_save_flags());
>  
>  	if (regs)
> -		return_to_irqs_enabled = interrupts_enabled(regs);
> +		return_to_irqs_enabled = !regs_irqs_disabled(regs);
>  
>  	/*
>  	 * SEA can interrupt SError, mask it and describe this as an NMI so
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 58f047de3e1c..460c09d03a73 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -231,7 +231,7 @@ static void send_user_sigtrap(int si_code)
>  	if (WARN_ON(!user_mode(regs)))
>  		return;
>  
> -	if (interrupts_enabled(regs))
> +	if (!regs_irqs_disabled(regs))
>  		local_irq_enable();
>  
>  	arm64_force_sig_fault(SIGTRAP, si_code, instruction_pointer(regs),
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index b260ddc4d3e9..c547e70428d3 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -73,7 +73,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
>  {
>  	lockdep_assert_irqs_disabled();
>  
> -	if (interrupts_enabled(regs)) {
> +	if (!regs_irqs_disabled(regs)) {
>  		if (regs->exit_rcu) {
>  			trace_hardirqs_on_prepare();
>  			lockdep_hardirqs_on_prepare();
> @@ -569,7 +569,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
>  {
>  	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
>  
> -	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> +	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && regs_irqs_disabled(regs))
>  		__el1_pnmi(regs, handler);
>  	else
>  		__el1_irq(regs, handler);
> diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
> index 255d12f881c2..27a17da635d8 100644
> --- a/arch/arm64/kernel/sdei.c
> +++ b/arch/arm64/kernel/sdei.c
> @@ -247,7 +247,7 @@ unsigned long __kprobes do_sdei_event(struct pt_regs *regs,
>  	 * If we interrupted the kernel with interrupts masked, we always go
>  	 * back to wherever we came from.
>  	 */
> -	if (mode == kernel_mode && !interrupts_enabled(regs))
> +	if (mode == kernel_mode && regs_irqs_disabled(regs))
>  		return SDEI_EV_HANDLED;
>  
>  	/*
> -- 
> 2.34.1
> 



* Re: [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1
       [not found] ` <20241206101744.4161990-3-ruanjinjie@huawei.com>
@ 2025-02-10 11:08   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:08 UTC
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:24PM +0800, Jinjie Ruan wrote:
> The generic entry code uses irqentry_state_t to track lockdep and RCU
> state across exception entry and return. For historical reasons, arm64
> embeds similar fields within its pt_regs structure.
> 
> In preparation for moving arm64 over to the generic entry code, pull
> these fields out of arm64's pt_regs, and use a separate structure,
> matching the style of the generic entry code.
> 
> No functional changes.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/ptrace.h  |   4 -
>  arch/arm64/kernel/entry-common.c | 136 +++++++++++++++++++------------
>  2 files changed, 85 insertions(+), 55 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index bcfa96880377..e90dfc9982aa 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -169,10 +169,6 @@ struct pt_regs {
>  
>  	u64 sdei_ttbr1;
>  	struct frame_record_meta stackframe;
> -
> -	/* Only valid for some EL1 exceptions. */
> -	u64 lockdep_hardirqs;
> -	u64 exit_rcu;
>  };
>  
>  /* For correct stack alignment, pt_regs has to be a multiple of 16 bytes. */
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index c547e70428d3..1687627b2ecf 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -28,6 +28,13 @@
>  #include <asm/sysreg.h>
>  #include <asm/system_misc.h>
>  
> +typedef struct irqentry_state {
> +	union {
> +		bool	exit_rcu;
> +		bool	lockdep;
> +	};
> +} irqentry_state_t;

I think we should add an arm64_ prefix here, to avoid the possibility of
build errors if we somehow get this and the common definition included
at the same time.

That'll require some simple changes when we switch over, but it should
be relatively obvious and simple.
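
For illustration only, the prefixed type and the "enter returns state, exit
consumes it" pattern might look like the userspace sketch below. The name
arm64_irqentry_state_t is an assumption about how the prefix would be
spelled, and the mock_* functions are hypothetical stand-ins for
enter_from_kernel_mode() and exit_to_kernel_mode().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed spelling of the arm64-prefixed type; not taken from the patch. */
typedef struct arm64_irqentry_state {
	union {
		bool	exit_rcu;
		bool	lockdep;
	};
} arm64_irqentry_state_t;

/* Counts how often the mock exit path performed the RCU exit. */
static int mock_rcu_exits;

/* idle_task stands in for the is_idle_task(current) check. */
static arm64_irqentry_state_t mock_enter_from_kernel_mode(bool idle_task)
{
	arm64_irqentry_state_t state = {
		.exit_rcu = false,
	};

	if (idle_task)
		state.exit_rcu = true;

	return state;
}

/* The state travels by value instead of living in pt_regs. */
static void mock_exit_to_kernel_mode(arm64_irqentry_state_t state)
{
	if (state.exit_rcu)
		mock_rcu_exits++;
}

void demo_state_roundtrip(void)
{
	arm64_irqentry_state_t state;

	state = mock_enter_from_kernel_mode(true);
	mock_exit_to_kernel_mode(state);
	assert(mock_rcu_exits == 1);

	state = mock_enter_from_kernel_mode(false);
	mock_exit_to_kernel_mode(state);
	assert(mock_rcu_exits == 1);	/* unchanged for a non-idle task */
}
```

Because the type name is distinct, including the generic irqentry_state_t
definition alongside it would not produce a conflicting typedef.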

Otherwise, the structural changes look good to me.

Mark.

> +
>  /*
>   * Handle IRQ/context state management when entering from kernel mode.
>   * Before this function is called it is not safe to call regular kernel code,
> @@ -36,29 +43,36 @@
>   * This is intended to match the logic in irqentry_enter(), handling the kernel
>   * mode transitions only.
>   */
> -static __always_inline void __enter_from_kernel_mode(struct pt_regs *regs)
> +static __always_inline irqentry_state_t __enter_from_kernel_mode(struct pt_regs *regs)
>  {
> -	regs->exit_rcu = false;
> +	irqentry_state_t state = {
> +		.exit_rcu = false,
> +	};
>  
>  	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
>  		lockdep_hardirqs_off(CALLER_ADDR0);
>  		ct_irq_enter();
>  		trace_hardirqs_off_finish();
>  
> -		regs->exit_rcu = true;
> -		return;
> +		state.exit_rcu = true;
> +		return state;
>  	}
>  
>  	lockdep_hardirqs_off(CALLER_ADDR0);
>  	rcu_irq_enter_check_tick();
>  	trace_hardirqs_off_finish();
> +
> +	return state;
>  }
>  
> -static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> +static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  {
> -	__enter_from_kernel_mode(regs);
> +	irqentry_state_t state = __enter_from_kernel_mode(regs);
> +
>  	mte_check_tfsr_entry();
>  	mte_disable_tco_entry(current);
> +
> +	return state;
>  }
>  
>  /*
> @@ -69,12 +83,13 @@ static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
>   * This is intended to match the logic in irqentry_exit(), handling the kernel
>   * mode transitions only, and with preemption handled elsewhere.
>   */
> -static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
> +static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
> +						  irqentry_state_t state)
>  {
>  	lockdep_assert_irqs_disabled();
>  
>  	if (!regs_irqs_disabled(regs)) {
> -		if (regs->exit_rcu) {
> +		if (state.exit_rcu) {
>  			trace_hardirqs_on_prepare();
>  			lockdep_hardirqs_on_prepare();
>  			ct_irq_exit();
> @@ -84,15 +99,16 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
>  
>  		trace_hardirqs_on();
>  	} else {
> -		if (regs->exit_rcu)
> +		if (state.exit_rcu)
>  			ct_irq_exit();
>  	}
>  }
>  
> -static void noinstr exit_to_kernel_mode(struct pt_regs *regs)
> +static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
> +					irqentry_state_t state)
>  {
>  	mte_check_tfsr_exit();
> -	__exit_to_kernel_mode(regs);
> +	__exit_to_kernel_mode(regs, state);
>  }
>  
>  /*
> @@ -190,9 +206,11 @@ asmlinkage void noinstr asm_exit_to_user_mode(struct pt_regs *regs)
>   * mode. Before this function is called it is not safe to call regular kernel
>   * code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_enter_nmi(struct pt_regs *regs)
> +static noinstr irqentry_state_t arm64_enter_nmi(struct pt_regs *regs)
>  {
> -	regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
> +	irqentry_state_t state;
> +
> +	state.lockdep = lockdep_hardirqs_enabled();
>  
>  	__nmi_enter();
>  	lockdep_hardirqs_off(CALLER_ADDR0);
> @@ -201,6 +219,8 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
>  
>  	trace_hardirqs_off_finish();
>  	ftrace_nmi_enter();
> +
> +	return state;
>  }
>  
>  /*
> @@ -208,19 +228,18 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
>   * mode. After this function returns it is not safe to call regular kernel
>   * code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_exit_nmi(struct pt_regs *regs)
> +static void noinstr arm64_exit_nmi(struct pt_regs *regs,
> +				   irqentry_state_t state)
>  {
> -	bool restore = regs->lockdep_hardirqs;
> -
>  	ftrace_nmi_exit();
> -	if (restore) {
> +	if (state.lockdep) {
>  		trace_hardirqs_on_prepare();
>  		lockdep_hardirqs_on_prepare();
>  	}
>  
>  	ct_nmi_exit();
>  	lockdep_hardirq_exit();
> -	if (restore)
> +	if (state.lockdep)
>  		lockdep_hardirqs_on(CALLER_ADDR0);
>  	__nmi_exit();
>  }
> @@ -230,14 +249,18 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs)
>   * kernel mode. Before this function is called it is not safe to call regular
>   * kernel code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
> +static noinstr irqentry_state_t arm64_enter_el1_dbg(struct pt_regs *regs)
>  {
> -	regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
> +	irqentry_state_t state;
> +
> +	state.lockdep = lockdep_hardirqs_enabled();
>  
>  	lockdep_hardirqs_off(CALLER_ADDR0);
>  	ct_nmi_enter();
>  
>  	trace_hardirqs_off_finish();
> +
> +	return state;
>  }
>  
>  /*
> @@ -245,17 +268,16 @@ static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
>   * kernel mode. After this function returns it is not safe to call regular
>   * kernel code, instrumentable code, or any code which may trigger an exception.
>   */
> -static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
> +static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
> +				       irqentry_state_t state)
>  {
> -	bool restore = regs->lockdep_hardirqs;
> -
> -	if (restore) {
> +	if (state.lockdep) {
>  		trace_hardirqs_on_prepare();
>  		lockdep_hardirqs_on_prepare();
>  	}
>  
>  	ct_nmi_exit();
> -	if (restore)
> +	if (state.lockdep)
>  		lockdep_hardirqs_on(CALLER_ADDR0);
>  }
>  
> @@ -426,78 +448,86 @@ UNHANDLED(el1t, 64, error)
>  static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
>  {
>  	unsigned long far = read_sysreg(far_el1);
> +	irqentry_state_t state;
>  
> -	enter_from_kernel_mode(regs);
> +	state = enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_mem_abort(far, esr, regs);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
>  {
>  	unsigned long far = read_sysreg(far_el1);
> +	irqentry_state_t state;
>  
> -	enter_from_kernel_mode(regs);
> +	state = enter_from_kernel_mode(regs);
>  	local_daif_inherit(regs);
>  	do_sp_pc_abort(far, esr, regs);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_undef(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_bti(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_gcs(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_mops(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  static void noinstr el1_dbg(struct pt_regs *regs, unsigned long esr)
>  {
>  	unsigned long far = read_sysreg(far_el1);
> +	irqentry_state_t state;
>  
> -	arm64_enter_el1_dbg(regs);
> +	state = arm64_enter_el1_dbg(regs);
>  	if (!cortex_a76_erratum_1463225_debug_handler(regs))
>  		do_debug_exception(far, esr, regs);
> -	arm64_exit_el1_dbg(regs);
> +	arm64_exit_el1_dbg(regs, state);
>  }
>  
>  static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
> +
>  	local_daif_inherit(regs);
>  	do_el1_fpac(regs, esr);
>  	local_daif_mask();
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  
>  asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
> @@ -546,15 +576,16 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
>  static __always_inline void __el1_pnmi(struct pt_regs *regs,
>  				       void (*handler)(struct pt_regs *))
>  {
> -	arm64_enter_nmi(regs);
> +	irqentry_state_t state = arm64_enter_nmi(regs);
> +
>  	do_interrupt_handler(regs, handler);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  }
>  
>  static __always_inline void __el1_irq(struct pt_regs *regs,
>  				      void (*handler)(struct pt_regs *))
>  {
> -	enter_from_kernel_mode(regs);
> +	irqentry_state_t state = enter_from_kernel_mode(regs);
>  
>  	irq_enter_rcu();
>  	do_interrupt_handler(regs, handler);
> @@ -562,7 +593,7 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
>  
>  	arm64_preempt_schedule_irq();
>  
> -	exit_to_kernel_mode(regs);
> +	exit_to_kernel_mode(regs, state);
>  }
>  static void noinstr el1_interrupt(struct pt_regs *regs,
>  				  void (*handler)(struct pt_regs *))
> @@ -588,11 +619,12 @@ asmlinkage void noinstr el1h_64_fiq_handler(struct pt_regs *regs)
>  asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
>  {
>  	unsigned long esr = read_sysreg(esr_el1);
> +	irqentry_state_t state;
>  
>  	local_daif_restore(DAIF_ERRCTX);
> -	arm64_enter_nmi(regs);
> +	state = arm64_enter_nmi(regs);
>  	do_serror(regs, esr);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  }
>  
>  static void noinstr el0_da(struct pt_regs *regs, unsigned long esr)
> @@ -855,12 +887,13 @@ asmlinkage void noinstr el0t_64_fiq_handler(struct pt_regs *regs)
>  static void noinstr __el0_error_handler_common(struct pt_regs *regs)
>  {
>  	unsigned long esr = read_sysreg(esr_el1);
> +	irqentry_state_t state;
>  
>  	enter_from_user_mode(regs);
>  	local_daif_restore(DAIF_ERRCTX);
> -	arm64_enter_nmi(regs);
> +	state = arm64_enter_nmi(regs);
>  	do_serror(regs, esr);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  	local_daif_restore(DAIF_PROCCTX);
>  	exit_to_user_mode(regs);
>  }
> @@ -968,6 +1001,7 @@ asmlinkage void noinstr __noreturn handle_bad_stack(struct pt_regs *regs)
>  asmlinkage noinstr unsigned long
>  __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
>  {
> +	irqentry_state_t state;
>  	unsigned long ret;
>  
>  	/*
> @@ -992,9 +1026,9 @@ __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
>  	else if (cpu_has_pan())
>  		set_pstate_pan(0);
>  
> -	arm64_enter_nmi(regs);
> +	state = arm64_enter_nmi(regs);
>  	ret = do_sdei_event(regs, arg);
> -	arm64_exit_nmi(regs);
> +	arm64_exit_nmi(regs, state);
>  
>  	return ret;
>  }
> -- 
> 2.34.1
> 



* Re: [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()
       [not found] ` <20241206101744.4161990-4-ruanjinjie@huawei.com>
@ 2025-02-10 11:26   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:26 UTC
  To: Jinjie Ruan, tglx
  Cc: catalin.marinas, will, oleg, sstabellini, peterz, luto, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, kees, wad, akpm, samitolvanen, masahiroy, hca,
	aliceryhl, rppt, xur, paulmck, arnd, mbenes, puranjay, pcc, ardb,
	sudeep.holla, guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron,
	liaochang1, kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:25PM +0800, Jinjie Ruan wrote:
> The generic entry code try to reschedule every time when the kernel
> mode non-NMI exception return. At the moment, arm64 only reschedule every
> time when EL1 irq exception return;

I think this is a bit unclear, and should say something like:
  
| The arm64 entry code only preempts a kernel context upon a return from
| a regular IRQ exception. The generic entry code may preempt a kernel
| context for any exception return where irqentry_exit() is used, and so
| may preempt other exceptions such as faults.

Thomas, can you confirm that's the *intent* of the generic entry code?

> In preparation for moving arm64 over to the generic entry code, move
> arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so not
> only EL1 irq but also all EL1 non-NMI exception return, there is a chance
> to reschedule. And only if irqs are enabled when the exception trapped,
> there may be a chance to reschedule after the exceptions have been handled,
> so move arm64_preempt_schedule_irq() into regs_irqs_disabled()
> check false block, but it will try to reschedule only when TINY_RCU is
> enabled or current is not an idle task.

I think the detail is confusing here, and it would be better to say:

| In preparation for moving arm64 over to the generic entry code, align
| arm64 with the generic behaviour by calling
| arm64_preempt_schedule_irq() from exit_to_kernel_mode(). To make this
| possible, arm64_preempt_schedule_irq() and need_irq_preemption() are
| moved earlier in the file, with no changes.
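
To make the resulting behaviour easier to reason about, the conditions under
which the exit path would end up calling preempt_schedule_irq() after this
patch can be modelled as a pure function. This is only a userspace sketch
derived from the quoted diff: struct mock_exit_ctx and its fields are
hypothetical names that flatten the kernel checks into booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state consulted on exit to kernel mode. */
struct mock_exit_ctx {
	bool irqs_disabled;		/* regs_irqs_disabled(regs) */
	bool exit_rcu;			/* state.exit_rcu */
	bool need_irq_preemption;	/* need_irq_preemption() */
	unsigned long preempt_count;	/* thread_info::preempt_count */
	bool daif_set_with_prio_mask;	/* prio masking in use and DAIF != 0 */
	bool caps_finalized;		/* system_capabilities_finalized() */
};

/* Returns true if the modelled exit path would preempt the kernel context. */
bool mock_would_preempt(const struct mock_exit_ctx *c)
{
	/* Interrupted context had IRQs masked: never preempt. */
	if (c->irqs_disabled)
		return false;

	/* RCU was not watching on entry: early return before preemption. */
	if (c->exit_rcu)
		return false;

	/* The remaining checks mirror arm64_preempt_schedule_irq(). */
	if (!c->need_irq_preemption)
		return false;
	if (c->preempt_count != 0)
		return false;
	if (c->daif_set_with_prio_mask)
		return false;

	return c->caps_finalized;
}

void demo_would_preempt(void)
{
	struct mock_exit_ctx c = {
		.need_irq_preemption = true,
		.caps_finalized = true,
	};

	assert(mock_would_preempt(&c));

	c.exit_rcu = true;	/* e.g. exception taken from the idle task */
	assert(!mock_would_preempt(&c));
}
```

The exit_rcu case corresponds to the second impact quoted below: a region
where IRQs are enabled but RCU is not watching is no longer a preemption
point.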

Mark.

> As Mark pointed out, this change will have the following two key impacts:
> 
> - " We'll preempt even without taking a "real" interrupt. That
>     shouldn't result in preemption that wasn't possible before,
>     but it does change the probability of preempting at certain points,
>     and might have a performance impact, so probably warrants a
>     benchmark."
> 
> - " We will not preempt when taking interrupts from a region of kernel
>     code where IRQs are enabled but RCU is not watching, matching the
>     behaviour of the generic entry code.
> 
>     This has the potential to introduce livelock if we can ever have a
>     screaming interrupt in such a region, so we'll need to go figure out
>     whether that's actually a problem.
> 
>     Having this as a separate patch will make it easier to test/bisect
>     for that specifically."
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 88 ++++++++++++++++----------------
>  1 file changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 1687627b2ecf..7a588515ee07 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -75,6 +75,48 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  	return state;
>  }
>  
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +#define need_irq_preemption() \
> +	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +#else
> +#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
> +#endif
> +
> +static void __sched arm64_preempt_schedule_irq(void)
> +{
> +	if (!need_irq_preemption())
> +		return;
> +
> +	/*
> +	 * Note: thread_info::preempt_count includes both thread_info::count
> +	 * and thread_info::need_resched, and is not equivalent to
> +	 * preempt_count().
> +	 */
> +	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> +		return;
> +
> +	/*
> +	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> +	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> +	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> +	 * DAIF we must have handled an NMI, so skip preemption.
> +	 */
> +	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> +		return;
> +
> +	/*
> +	 * Preempting a task from an IRQ means we leave copies of PSTATE
> +	 * on the stack. cpufeature's enable calls may modify PSTATE, but
> +	 * resuming one of these preempted tasks would undo those changes.
> +	 *
> +	 * Only allow a task to be preempted once cpufeatures have been
> +	 * enabled.
> +	 */
> +	if (system_capabilities_finalized())
> +		preempt_schedule_irq();
> +}
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -97,6 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> +		arm64_preempt_schedule_irq();
> +
>  		trace_hardirqs_on();
>  	} else {
>  		if (state.exit_rcu)
> @@ -281,48 +325,6 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
>  		lockdep_hardirqs_on(CALLER_ADDR0);
>  }
>  
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -#define need_irq_preemption() \
> -	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> -#else
> -#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
> -#endif
> -
> -static void __sched arm64_preempt_schedule_irq(void)
> -{
> -	if (!need_irq_preemption())
> -		return;
> -
> -	/*
> -	 * Note: thread_info::preempt_count includes both thread_info::count
> -	 * and thread_info::need_resched, and is not equivalent to
> -	 * preempt_count().
> -	 */
> -	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> -		return;
> -
> -	/*
> -	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> -	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> -	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> -	 * DAIF we must have handled an NMI, so skip preemption.
> -	 */
> -	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> -		return;
> -
> -	/*
> -	 * Preempting a task from an IRQ means we leave copies of PSTATE
> -	 * on the stack. cpufeature's enable calls may modify PSTATE, but
> -	 * resuming one of these preempted tasks would undo those changes.
> -	 *
> -	 * Only allow a task to be preempted once cpufeatures have been
> -	 * enabled.
> -	 */
> -	if (system_capabilities_finalized())
> -		preempt_schedule_irq();
> -}
> -
>  static void do_interrupt_handler(struct pt_regs *regs,
>  				 void (*handler)(struct pt_regs *))
>  {
> @@ -591,8 +593,6 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
>  	do_interrupt_handler(regs, handler);
>  	irq_exit_rcu();
>  
> -	arm64_preempt_schedule_irq();
> -
>  	exit_to_kernel_mode(regs, state);
>  }
>  static void noinstr el1_interrupt(struct pt_regs *regs,
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq()
       [not found] ` <20241206101744.4161990-5-ruanjinjie@huawei.com>
@ 2025-02-10 11:33   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:33 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:26PM +0800, Jinjie Ruan wrote:
> The generic entry code calls preempt_schedule_irq() when need_resched()
> is satisfied, but arm64 has some additional checks of its own, such as
> GIC priority masking.
> 
> In preparation for moving arm64 over to the generic entry code, rework
> arm64_preempt_schedule_irq() so that the decision whether to reschedule
> is made in a check function called arm64_need_resched().

I think what this is saying is that the generic entry code has the form:

| raw_irqentry_exit_cond_resched()
| {
| 	if (!preempt_count()) {
| 		...
| 		if (need_resched())
| 			preempt_schedule_irq();
| 	}
| }

... but it's not obvious why it's better to have an
arm64_need_resched() rather than an arm64_preempt_schedule_irq().

Having some idea of the change you intend to make to the generic code
would be helpful, and/or that generic change should be made earlier as a
preparatory patch.
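
For illustration only, one shape such a generic change could take is a
predicate hook that the generic code consults before rescheduling. The
sketch below is a userspace model, not kernel code: every model_* name is
hypothetical, and the arch_allows field merely stands in for the
arm64-specific checks (DAIF state, finalized cpufeatures) discussed in
this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for per-CPU/task state; all names here are hypothetical. */
struct model_cpu {
	int preempt_count;	/* non-zero means preemption is disabled */
	bool need_resched;	/* a reschedule has been requested */
	bool arch_allows;	/* outcome of the arch-specific checks */
	int schedule_calls;	/* how often the model "scheduled" */
};

/* Hypothetical arch hook: a pure predicate with no side effects. */
static inline bool model_arch_need_resched(const struct model_cpu *c)
{
	return c->arch_allows;
}

/* Models preempt_schedule_irq(): counts the call, clears the request. */
static inline void model_preempt_schedule_irq(struct model_cpu *c)
{
	c->schedule_calls++;
	c->need_resched = false;
}

/* Same structure as the generic form quoted above, plus the arch hook. */
static inline void model_irqentry_exit_cond_resched(struct model_cpu *c)
{
	if (!c->preempt_count) {
		if (c->need_resched && model_arch_need_resched(c))
			model_preempt_schedule_irq(c);
	}
}
```

With a predicate, the generic function keeps ownership of the
preempt_schedule_irq() call and the architecture only answers "may we
preempt here?", which is presumably the motivation for the rename.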

Mark.

> No functional changes.
> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 7a588515ee07..da68c089b74b 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -83,10 +83,10 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
>  #define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
>  #endif
>  
> -static void __sched arm64_preempt_schedule_irq(void)
> +static inline bool arm64_need_resched(void)
>  {
>  	if (!need_irq_preemption())
> -		return;
> +		return false;
>  
>  	/*
>  	 * Note: thread_info::preempt_count includes both thread_info::count
> @@ -94,7 +94,7 @@ static void __sched arm64_preempt_schedule_irq(void)
>  	 * preempt_count().
>  	 */
>  	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> -		return;
> +		return false;
>  
>  	/*
>  	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> @@ -103,7 +103,7 @@ static void __sched arm64_preempt_schedule_irq(void)
>  	 * DAIF we must have handled an NMI, so skip preemption.
>  	 */
>  	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> -		return;
> +		return false;
>  
>  	/*
>  	 * Preempting a task from an IRQ means we leave copies of PSTATE
> @@ -113,8 +113,10 @@ static void __sched arm64_preempt_schedule_irq(void)
>  	 * Only allow a task to be preempted once cpufeatures have been
>  	 * enabled.
>  	 */
> -	if (system_capabilities_finalized())
> -		preempt_schedule_irq();
> +	if (!system_capabilities_finalized())
> +		return false;
> +
> +	return true;
>  }
>  
>  /*
> @@ -139,7 +141,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		arm64_preempt_schedule_irq();
> +		if (arm64_need_resched())
> +			preempt_schedule_irq();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper
       [not found] ` <20241206101744.4161990-6-ruanjinjie@huawei.com>
@ 2025-02-10 11:40   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:40 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:27PM +0800, Jinjie Ruan wrote:
> The generic entry code uses preempt_count() and need_resched() helpers to
> check if it is time to resched. Currently, arm64 uses its own check logic,
> that is "READ_ONCE(current_thread_info()->preempt_count) == 0", which is
> equivalent to "preempt_count() == 0 && need_resched()".

Hmm. The existing code relies upon preempt_fold_need_resched() to work
correctly. If we want to move from:

	READ_ONCE(current_thread_info()->preempt_count) == 0

... to:

	!preempt_count() && need_resched()

... then that change should be made *before* we change the preemption
logic to preempt non-IRQ exceptions in patch 3. Otherwise, that logic is
consuming stale data most of the time.
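
As a rough illustration of why the two forms only agree once the flag has
been folded, here is a small userspace model (not kernel code). It assumes
the arm64 layout, where the low 32 bits of thread_info::preempt_count hold
the count and bit 32 holds an inverted need_resched flag; every model_*
name is made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 32 set means "no reschedule needed" (the flag is stored inverted). */
#define MODEL_RESCHED_CLEAR	(UINT64_C(1) << 32)

struct model_task {
	uint64_t preempt_word;	/* low 32 bits: count; bit 32: inverted flag */
	bool tif_need_resched;	/* stand-in for TIF_NEED_RESCHED */
};

static inline uint32_t model_preempt_count(const struct model_task *t)
{
	return (uint32_t)t->preempt_word;
}

static inline bool model_need_resched(const struct model_task *t)
{
	return t->tif_need_resched;
}

/* Models preempt_fold_need_resched(): copy a pending flag into the word. */
static inline void model_fold_need_resched(struct model_task *t)
{
	if (t->tif_need_resched)
		t->preempt_word &= ~MODEL_RESCHED_CLEAR;
}

/* Existing arm64 style: a single load of the combined word. */
static inline bool model_combined_check(const struct model_task *t)
{
	return t->preempt_word == 0;
}

/* Generic entry style: count and flag are read separately. */
static inline bool model_split_check(const struct model_task *t)
{
	return !model_preempt_count(t) && model_need_resched(t);
}

static inline void model_demo(void)
{
	struct model_task t = { .preempt_word = MODEL_RESCHED_CLEAR };

	t.tif_need_resched = true;		/* flag set, not yet folded */
	assert(!model_combined_check(&t));	/* combined word is stale */
	assert(model_split_check(&t));

	model_fold_need_resched(&t);
	assert(model_combined_check(&t));	/* both agree after the fold */
	assert(model_split_check(&t));
}
```

In the model, the combined check only sees a pending reschedule after the
fold has run, whereas the split check sees it immediately.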

Mark.

> In preparation for moving arm64 over to the generic entry code, use
> these helpers to replace arm64's own code and move it ahead.
> 
> No functional changes.
> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index da68c089b74b..efd1a990d138 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -88,14 +88,6 @@ static inline bool arm64_need_resched(void)
>  	if (!need_irq_preemption())
>  		return false;
>  
> -	/*
> -	 * Note: thread_info::preempt_count includes both thread_info::count
> -	 * and thread_info::need_resched, and is not equivalent to
> -	 * preempt_count().
> -	 */
> -	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> -		return false;
> -
>  	/*
>  	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
>  	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> @@ -141,8 +133,10 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		if (arm64_need_resched())
> -			preempt_schedule_irq();
> +		if (!preempt_count() && need_resched()) {
> +			if (arm64_need_resched())
> +				preempt_schedule_irq();
> +		}
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead
       [not found] ` <20241206101744.4161990-7-ruanjinjie@huawei.com>
@ 2025-02-10 11:48   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:48 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:28PM +0800, Jinjie Ruan wrote:
> The generic entry has the same logic as the need_irq_preemption()
> macro and uses a helper function to check the other resched conditions.
> 
> In preparation for moving arm64 over to the generic entry code,
> check and expand need_irq_preemption() ahead and extract arm64 resched
> check code to a helper function.

I think this is just saying that the goal is to align the structure of
the code with raw_irqentry_exit_cond_resched() from the generic entry
code.

It'd be a bit clearer to say that, and to do this *before* moving the
call into __exit_to_kernel_mode().

Mark.

> 
> No functional changes.
> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/preempt.h |  1 +
>  arch/arm64/kernel/entry-common.c | 28 +++++++++++++++++-----------
>  2 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
> index 0159b625cc7f..d0f93385bd85 100644
> --- a/arch/arm64/include/asm/preempt.h
> +++ b/arch/arm64/include/asm/preempt.h
> @@ -85,6 +85,7 @@ static inline bool should_resched(int preempt_offset)
>  void preempt_schedule(void);
>  void preempt_schedule_notrace(void);
>  
> +void raw_irqentry_exit_cond_resched(void);
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  
>  DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index efd1a990d138..80b47ca02db2 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -77,17 +77,10 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -#define need_irq_preemption() \
> -	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> -#else
> -#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
>  #endif
>  
>  static inline bool arm64_need_resched(void)
>  {
> -	if (!need_irq_preemption())
> -		return false;
> -
>  	/*
>  	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
>  	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> @@ -111,6 +104,22 @@ static inline bool arm64_need_resched(void)
>  	return true;
>  }
>  
> +void raw_irqentry_exit_cond_resched(void)
> +{
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +		return;
> +#else
> +	if (!IS_ENABLED(CONFIG_PREEMPTION))
> +		return;
> +#endif
> +
> +	if (!preempt_count()) {
> +		if (need_resched() && arm64_need_resched())
> +			preempt_schedule_irq();
> +	}
> +}
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -133,10 +142,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		if (!preempt_count() && need_resched()) {
> -			if (arm64_need_resched())
> -				preempt_schedule_irq();
> -		}
> +		raw_irqentry_exit_cond_resched();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled
       [not found] ` <20241206101744.4161990-8-ruanjinjie@huawei.com>
@ 2025-02-10 11:52   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:52 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:29PM +0800, Jinjie Ruan wrote:
> The generic entry code checks PREEMPTION both when PREEMPT_DYNAMIC is
> enabled and when it is disabled.
> 
> Whether PREEMPT_DYNAMIC is enabled or not, PREEMPTION should
> be enabled to allow rescheduling before EL1 exception return, so
> move the PREEMPTION check ahead in preparation for moving arm64 over
> to the generic entry code.

This is just moving the IS_ENABLED() check. It'd be clearer to say
something like "hoist the IS_ENABLED() check earlier", but equally we
could do that earlier in the series by folding this into the prior
patch.

Mark.

> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/kernel/entry-common.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 80b47ca02db2..029f8bd72f8a 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -109,9 +109,6 @@ void raw_irqentry_exit_cond_resched(void)
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
>  		return;
> -#else
> -	if (!IS_ENABLED(CONFIG_PREEMPTION))
> -		return;
>  #endif
>  
>  	if (!preempt_count()) {
> @@ -142,7 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  			return;
>  		}
>  
> -		raw_irqentry_exit_cond_resched();
> +		if (IS_ENABLED(CONFIG_PREEMPTION))
> +			raw_irqentry_exit_cond_resched();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC
       [not found] ` <20241206101744.4161990-9-ruanjinjie@huawei.com>
@ 2025-02-10 11:54   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 11:54 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:30PM +0800, Jinjie Ruan wrote:
> In generic entry, when PREEMPT_DYNAMIC is enabled or disabled, two
> different helpers are used to check whether resched is required
> and some common code is reused.
> 
> In preparation for moving arm64 over to the generic entry code,
> use new helper to check resched when PREEMPT_DYNAMIC enabled and
> reuse common code for the disabled case.
> 
> No functional changes.

Please fold this together with the last two patches; it's undoing
changes you made in patch 6, and it'd be far clearer to see that all at
once.

Mark.

> 
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/preempt.h |  3 +++
>  arch/arm64/kernel/entry-common.c | 21 +++++++++++----------
>  2 files changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
> index d0f93385bd85..0f0ba250efe8 100644
> --- a/arch/arm64/include/asm/preempt.h
> +++ b/arch/arm64/include/asm/preempt.h
> @@ -93,11 +93,14 @@ void dynamic_preempt_schedule(void);
>  #define __preempt_schedule()		dynamic_preempt_schedule()
>  void dynamic_preempt_schedule_notrace(void);
>  #define __preempt_schedule_notrace()	dynamic_preempt_schedule_notrace()
> +void dynamic_irqentry_exit_cond_resched(void);
> +#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
>  
>  #else /* CONFIG_PREEMPT_DYNAMIC */
>  
>  #define __preempt_schedule()		preempt_schedule()
>  #define __preempt_schedule_notrace()	preempt_schedule_notrace()
> +#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
>  
>  #endif /* CONFIG_PREEMPT_DYNAMIC */
>  #endif /* CONFIG_PREEMPTION */
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 029f8bd72f8a..015a65d19b52 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -75,10 +75,6 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  	return state;
>  }
>  
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -#endif
> -
>  static inline bool arm64_need_resched(void)
>  {
>  	/*
> @@ -106,17 +102,22 @@ static inline bool arm64_need_resched(void)
>  
>  void raw_irqentry_exit_cond_resched(void)
>  {
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> -		return;
> -#endif
> -
>  	if (!preempt_count()) {
>  		if (need_resched() && arm64_need_resched())
>  			preempt_schedule_irq();
>  	}
>  }
>  
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +void dynamic_irqentry_exit_cond_resched(void)
> +{
> +	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +		return;
> +	raw_irqentry_exit_cond_resched();
> +}
> +#endif
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -140,7 +141,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
>  		}
>  
>  		if (IS_ENABLED(CONFIG_PREEMPTION))
> -			raw_irqentry_exit_cond_resched();
> +			irqentry_exit_cond_resched();
>  
>  		trace_hardirqs_on();
>  	} else {
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall
       [not found] ` <20241206101744.4161990-10-ruanjinjie@huawei.com>
@ 2025-02-10 12:04   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 12:04 UTC (permalink / raw)
  To: Jinjie Ruan, tglx
  Cc: catalin.marinas, will, oleg, sstabellini, peterz, luto, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, kees, wad, akpm, samitolvanen, masahiroy, hca,
	aliceryhl, rppt, xur, paulmck, arnd, mbenes, puranjay, pcc, ardb,
	sudeep.holla, guohanjun, rafael, liuwei09, dwmw, Jonathan.Cameron,
	liaochang1, kristina.martsenko, ptosi, broonie, thiago.bauermann,
	kevin.brodsky, joey.gouly, liuyuntao12, leobras, linux-kernel,
	linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:31PM +0800, Jinjie Ruan wrote:
> As Mark pointed out, do not try to switch to *all* the
> generic entry code in one go. The regular entry state management
> (e.g. enter_from_user_mode() and exit_to_user_mode()) is largely
> separate from the syscall state management. Move arm64 over to
> enter_from_user_mode() and exit_to_user_mode() without needing to use
> any of the generic syscall logic. Doing that first, *then* moving over
> to the generic syscall handling would be much easier to
> review/test/bisect, and if there are any ABI issues with the syscall
> handling in particular, it will be easier to handle those in isolation.
>
> So split generic entry into irq entry and syscall code, which will
> make review easier and the switch to generic entry clearer.

> Introduce two configs called GENERIC_SYSCALL and GENERIC_IRQ_ENTRY,
> which control the irq entry and syscall parts of the generic code
> respectively. And split the header file irq-entry-common.h from
> entry-common.h for GENERIC_IRQ_ENTRY.

I think this would be simpler and clearer as:

| Currently CONFIG_GENERIC_ENTRY enables both the generic exception
| entry logic and the generic syscall entry logic, which are otherwise
| loosely coupled.
|
| Introduce separate config options for these so that architectures can
| select the two independently. This will make it easier for
| architectures to migrate to generic entry code.

It would be good to have this *before* the arm64 changes, either at the
start of the series or upstreamed earlier.

Thomas, can you confirm whether you're happy with splitting this up?

As above, the thinking is that we can easily/quickly move arm64 over to
the generic exception/irq entry code, but the syscall changes have a
much bigger potential impact (e.g. we've had lots of fun historically
with the ptrace state machine), and I'd like to handle the syscall
changes as a follow-up.

Mark.

> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  MAINTAINERS                      |   1 +
>  arch/Kconfig                     |   8 +
>  include/linux/entry-common.h     | 382 +-----------------------------
>  include/linux/irq-entry-common.h | 389 +++++++++++++++++++++++++++++++
>  kernel/entry/Makefile            |   3 +-
>  kernel/entry/common.c            | 160 +------------
>  kernel/entry/syscall-common.c    | 159 +++++++++++++
>  kernel/sched/core.c              |   8 +-
>  8 files changed, 565 insertions(+), 545 deletions(-)
>  create mode 100644 include/linux/irq-entry-common.h
>  create mode 100644 kernel/entry/syscall-common.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 21f855fe468b..7a6e87587101 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9585,6 +9585,7 @@ S:	Maintained
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/entry
>  F:	include/linux/entry-common.h
>  F:	include/linux/entry-kvm.h
> +F:	include/linux/irq-entry-common.h
>  F:	kernel/entry/
>  
>  GENERIC GPIO I2C DRIVER
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 6682b2a53e34..5a454eff780b 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -64,8 +64,16 @@ config HOTPLUG_PARALLEL
>  	bool
>  	select HOTPLUG_SPLIT_STARTUP
>  
> +config GENERIC_IRQ_ENTRY
> +	bool
> +
> +config GENERIC_SYSCALL
> +	bool
> +
>  config GENERIC_ENTRY
>  	bool
> +	select GENERIC_IRQ_ENTRY
> +	select GENERIC_SYSCALL
>  
>  config KPROBES
>  	bool "Kprobes"
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index fc61d0205c97..b3233e8328c5 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -2,27 +2,15 @@
>  #ifndef __LINUX_ENTRYCOMMON_H
>  #define __LINUX_ENTRYCOMMON_H
>  
> -#include <linux/static_call_types.h>
> +#include <linux/irq-entry-common.h>
>  #include <linux/ptrace.h>
> -#include <linux/syscalls.h>
>  #include <linux/seccomp.h>
>  #include <linux/sched.h>
> -#include <linux/context_tracking.h>
>  #include <linux/livepatch.h>
>  #include <linux/resume_user_mode.h>
> -#include <linux/tick.h>
> -#include <linux/kmsan.h>
>  
>  #include <asm/entry-common.h>
>  
> -/*
> - * Define dummy _TIF work flags if not defined by the architecture or for
> - * disabled functionality.
> - */
> -#ifndef _TIF_PATCH_PENDING
> -# define _TIF_PATCH_PENDING		(0)
> -#endif
> -
>  #ifndef _TIF_UPROBE
>  # define _TIF_UPROBE			(0)
>  #endif
> @@ -55,69 +43,6 @@
>  				 SYSCALL_WORK_SYSCALL_EXIT_TRAP	|	\
>  				 ARCH_SYSCALL_WORK_EXIT)
>  
> -/*
> - * TIF flags handled in exit_to_user_mode_loop()
> - */
> -#ifndef ARCH_EXIT_TO_USER_MODE_WORK
> -# define ARCH_EXIT_TO_USER_MODE_WORK		(0)
> -#endif
> -
> -#define EXIT_TO_USER_MODE_WORK						\
> -	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
> -	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY |			\
> -	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |			\
> -	 ARCH_EXIT_TO_USER_MODE_WORK)
> -
> -/**
> - * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Defaults to an empty implementation. Can be replaced by architecture
> - * specific code.
> - *
> - * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
> - * section. Use __always_inline so the compiler cannot push it out of line
> - * and make it instrumentable.
> - */
> -static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs);
> -
> -#ifndef arch_enter_from_user_mode
> -static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {}
> -#endif
> -
> -/**
> - * enter_from_user_mode - Establish state when coming from user mode
> - *
> - * Syscall/interrupt entry disables interrupts, but user mode is traced as
> - * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
> - *
> - * 1) Tell lockdep that interrupts are disabled
> - * 2) Invoke context tracking if enabled to reactivate RCU
> - * 3) Trace interrupts off state
> - *
> - * Invoked from architecture specific syscall entry code with interrupts
> - * disabled. The calling code has to be non-instrumentable. When the
> - * function returns all state is correct and interrupts are still
> - * disabled. The subsequent functions can be instrumented.
> - *
> - * This is invoked when there is architecture specific functionality to be
> - * done between establishing state and enabling interrupts. The caller must
> - * enable interrupts before invoking syscall_enter_from_user_mode_work().
> - */
> -static __always_inline void enter_from_user_mode(struct pt_regs *regs)
> -{
> -	arch_enter_from_user_mode(regs);
> -	lockdep_hardirqs_off(CALLER_ADDR0);
> -
> -	CT_WARN_ON(__ct_state() != CT_STATE_USER);
> -	user_exit_irqoff();
> -
> -	instrumentation_begin();
> -	kmsan_unpoison_entry_regs(regs);
> -	trace_hardirqs_off_finish();
> -	instrumentation_end();
> -}
> -
>  /**
>   * syscall_enter_from_user_mode_prepare - Establish state and enable interrupts
>   * @regs:	Pointer to currents pt_regs
> @@ -202,170 +127,6 @@ static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, l
>  	return ret;
>  }
>  
> -/**
> - * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
> - * @ti_work:	Cached TIF flags gathered with interrupts disabled
> - *
> - * Defaults to local_irq_enable(). Can be supplied by architecture specific
> - * code.
> - */
> -static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
> -
> -#ifndef local_irq_enable_exit_to_user
> -static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
> -{
> -	local_irq_enable();
> -}
> -#endif
> -
> -/**
> - * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
> - *
> - * Defaults to local_irq_disable(). Can be supplied by architecture specific
> - * code.
> - */
> -static inline void local_irq_disable_exit_to_user(void);
> -
> -#ifndef local_irq_disable_exit_to_user
> -static inline void local_irq_disable_exit_to_user(void)
> -{
> -	local_irq_disable();
> -}
> -#endif
> -
> -/**
> - * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
> - *				 to user mode.
> - * @regs:	Pointer to currents pt_regs
> - * @ti_work:	Cached TIF flags gathered with interrupts disabled
> - *
> - * Invoked from exit_to_user_mode_loop() with interrupt enabled
> - *
> - * Defaults to NOOP. Can be supplied by architecture specific code.
> - */
> -static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> -					       unsigned long ti_work);
> -
> -#ifndef arch_exit_to_user_mode_work
> -static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> -					       unsigned long ti_work)
> -{
> -}
> -#endif
> -
> -/**
> - * arch_exit_to_user_mode_prepare - Architecture specific preparation for
> - *				    exit to user mode.
> - * @regs:	Pointer to currents pt_regs
> - * @ti_work:	Cached TIF flags gathered with interrupts disabled
> - *
> - * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the last
> - * function before return. Defaults to NOOP.
> - */
> -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> -						  unsigned long ti_work);
> -
> -#ifndef arch_exit_to_user_mode_prepare
> -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> -						  unsigned long ti_work)
> -{
> -}
> -#endif
> -
> -/**
> - * arch_exit_to_user_mode - Architecture specific final work before
> - *			    exit to user mode.
> - *
> - * Invoked from exit_to_user_mode() with interrupt disabled as the last
> - * function before return. Defaults to NOOP.
> - *
> - * This needs to be __always_inline because it is non-instrumentable code
> - * invoked after context tracking switched to user mode.
> - *
> - * An architecture implementation must not do anything complex, no locking
> - * etc. The main purpose is for speculation mitigations.
> - */
> -static __always_inline void arch_exit_to_user_mode(void);
> -
> -#ifndef arch_exit_to_user_mode
> -static __always_inline void arch_exit_to_user_mode(void) { }
> -#endif
> -
> -/**
> - * arch_do_signal_or_restart -  Architecture specific signal delivery function
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Invoked from exit_to_user_mode_loop().
> - */
> -void arch_do_signal_or_restart(struct pt_regs *regs);
> -
> -/**
> - * exit_to_user_mode_loop - do any pending work before leaving to user space
> - */
> -unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> -				     unsigned long ti_work);
> -
> -/**
> - * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
> - * @regs:	Pointer to pt_regs on entry stack
> - *
> - * 1) check that interrupts are disabled
> - * 2) call tick_nohz_user_enter_prepare()
> - * 3) call exit_to_user_mode_loop() if any flags from
> - *    EXIT_TO_USER_MODE_WORK are set
> - * 4) check that interrupts are still disabled
> - */
> -static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
> -{
> -	unsigned long ti_work;
> -
> -	lockdep_assert_irqs_disabled();
> -
> -	/* Flush pending rcuog wakeup before the last need_resched() check */
> -	tick_nohz_user_enter_prepare();
> -
> -	ti_work = read_thread_flags();
> -	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> -		ti_work = exit_to_user_mode_loop(regs, ti_work);
> -
> -	arch_exit_to_user_mode_prepare(regs, ti_work);
> -
> -	/* Ensure that kernel state is sane for a return to userspace */
> -	kmap_assert_nomap();
> -	lockdep_assert_irqs_disabled();
> -	lockdep_sys_exit();
> -}
> -
> -/**
> - * exit_to_user_mode - Fixup state when exiting to user mode
> - *
> - * Syscall/interrupt exit enables interrupts, but the kernel state is
> - * interrupts disabled when this is invoked. Also tell RCU about it.
> - *
> - * 1) Trace interrupts on state
> - * 2) Invoke context tracking if enabled to adjust RCU state
> - * 3) Invoke architecture specific last minute exit code, e.g. speculation
> - *    mitigations, etc.: arch_exit_to_user_mode()
> - * 4) Tell lockdep that interrupts are enabled
> - *
> - * Invoked from architecture specific code when syscall_exit_to_user_mode()
> - * is not suitable as the last step before returning to userspace. Must be
> - * invoked with interrupts disabled and the caller must be
> - * non-instrumentable.
> - * The caller has to invoke syscall_exit_to_user_mode_work() before this.
> - */
> -static __always_inline void exit_to_user_mode(void)
> -{
> -	instrumentation_begin();
> -	trace_hardirqs_on_prepare();
> -	lockdep_hardirqs_on_prepare();
> -	instrumentation_end();
> -
> -	user_enter_irqoff();
> -	arch_exit_to_user_mode();
> -	lockdep_hardirqs_on(CALLER_ADDR0);
> -}
> -
>  /**
>   * syscall_exit_to_user_mode_work - Handle work before returning to user mode
>   * @regs:	Pointer to currents pt_regs
> @@ -412,145 +173,4 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs);
>   */
>  void syscall_exit_to_user_mode(struct pt_regs *regs);
>  
> -/**
> - * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Invoked from architecture specific entry code with interrupts disabled.
> - * Can only be called when the interrupt entry came from user mode. The
> - * calling code must be non-instrumentable.  When the function returns all
> - * state is correct and the subsequent functions can be instrumented.
> - *
> - * The function establishes state (lockdep, RCU (context tracking), tracing)
> - */
> -void irqentry_enter_from_user_mode(struct pt_regs *regs);
> -
> -/**
> - * irqentry_exit_to_user_mode - Interrupt exit work
> - * @regs:	Pointer to current's pt_regs
> - *
> - * Invoked with interrupts disabled and fully valid regs. Returns with all
> - * work handled, interrupts disabled such that the caller can immediately
> - * switch to user mode. Called from architecture specific interrupt
> - * handling code.
> - *
> - * The call order is #2 and #3 as described in syscall_exit_to_user_mode().
> - * Interrupt exit is not invoking #1 which is the syscall specific one time
> - * work.
> - */
> -void irqentry_exit_to_user_mode(struct pt_regs *regs);
> -
> -#ifndef irqentry_state
> -/**
> - * struct irqentry_state - Opaque object for exception state storage
> - * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
> - *            exit path has to invoke ct_irq_exit().
> - * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
> - *           lockdep state is restored correctly on exit from nmi.
> - *
> - * This opaque object is filled in by the irqentry_*_enter() functions and
> - * must be passed back into the corresponding irqentry_*_exit() functions
> - * when the exception is complete.
> - *
> - * Callers of irqentry_*_[enter|exit]() must consider this structure opaque
> - * and all members private.  Descriptions of the members are provided to aid in
> - * the maintenance of the irqentry_*() functions.
> - */
> -typedef struct irqentry_state {
> -	union {
> -		bool	exit_rcu;
> -		bool	lockdep;
> -	};
> -} irqentry_state_t;
> -#endif
> -
> -/**
> - * irqentry_enter - Handle state tracking on ordinary interrupt entries
> - * @regs:	Pointer to pt_regs of interrupted context
> - *
> - * Invokes:
> - *  - lockdep irqflag state tracking as low level ASM entry disabled
> - *    interrupts.
> - *
> - *  - Context tracking if the exception hit user mode.
> - *
> - *  - The hardirq tracer to keep the state consistent as low level ASM
> - *    entry disabled interrupts.
> - *
> - * As a precondition, this requires that the entry came from user mode,
> - * idle, or a kernel context in which RCU is watching.
> - *
> - * For kernel mode entries RCU handling is done conditional. If RCU is
> - * watching then the only RCU requirement is to check whether the tick has
> - * to be restarted. If RCU is not watching then ct_irq_enter() has to be
> - * invoked on entry and ct_irq_exit() on exit.
> - *
> - * Avoiding the ct_irq_enter/exit() calls is an optimization but also
> - * solves the problem of kernel mode pagefaults which can schedule, which
> - * is not possible after invoking ct_irq_enter() without undoing it.
> - *
> - * For user mode entries irqentry_enter_from_user_mode() is invoked to
> - * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
> - * would not be possible.
> - *
> - * Returns: An opaque object that must be passed to idtentry_exit()
> - */
> -irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
> -
> -/**
> - * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
> - *
> - * Conditional reschedule with additional sanity checks.
> - */
> -void raw_irqentry_exit_cond_resched(void);
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> -#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> -#define irqentry_exit_cond_resched_dynamic_disabled	NULL
> -DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> -#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
> -#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> -DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> -void dynamic_irqentry_exit_cond_resched(void);
> -#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
> -#endif
> -#else /* CONFIG_PREEMPT_DYNAMIC */
> -#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
> -#endif /* CONFIG_PREEMPT_DYNAMIC */
> -
> -/**
> - * irqentry_exit - Handle return from exception that used irqentry_enter()
> - * @regs:	Pointer to pt_regs (exception entry regs)
> - * @state:	Return value from matching call to irqentry_enter()
> - *
> - * Depending on the return target (kernel/user) this runs the necessary
> - * preemption and work checks if possible and required and returns to
> - * the caller with interrupts disabled and no further work pending.
> - *
> - * This is the last action before returning to the low level ASM code which
> - * just needs to return to the appropriate context.
> - *
> - * Counterpart to irqentry_enter().
> - */
> -void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
> -
> -/**
> - * irqentry_nmi_enter - Handle NMI entry
> - * @regs:	Pointer to currents pt_regs
> - *
> - * Similar to irqentry_enter() but taking care of the NMI constraints.
> - */
> -irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs);
> -
> -/**
> - * irqentry_nmi_exit - Handle return from NMI handling
> - * @regs:	Pointer to pt_regs (NMI entry regs)
> - * @irq_state:	Return value from matching call to irqentry_nmi_enter()
> - *
> - * Last action before returning to the low level assembly code.
> - *
> - * Counterpart to irqentry_nmi_enter().
> - */
> -void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state);
> -
>  #endif
> diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
> new file mode 100644
> index 000000000000..8af374331900
> --- /dev/null
> +++ b/include/linux/irq-entry-common.h
> @@ -0,0 +1,389 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_IRQENTRYCOMMON_H
> +#define __LINUX_IRQENTRYCOMMON_H
> +
> +#include <linux/static_call_types.h>
> +#include <linux/syscalls.h>
> +#include <linux/context_tracking.h>
> +#include <linux/tick.h>
> +#include <linux/kmsan.h>
> +
> +#include <asm/entry-common.h>
> +
> +/*
> + * Define dummy _TIF work flags if not defined by the architecture or for
> + * disabled functionality.
> + */
> +#ifndef _TIF_PATCH_PENDING
> +# define _TIF_PATCH_PENDING		(0)
> +#endif
> +
> +/*
> + * TIF flags handled in exit_to_user_mode_loop()
> + */
> +#ifndef ARCH_EXIT_TO_USER_MODE_WORK
> +# define ARCH_EXIT_TO_USER_MODE_WORK		(0)
> +#endif
> +
> +#define EXIT_TO_USER_MODE_WORK						\
> +	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
> +	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY |			\
> +	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |			\
> +	 ARCH_EXIT_TO_USER_MODE_WORK)
> +
> +/**
> + * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Defaults to an empty implementation. Can be replaced by architecture
> + * specific code.
> + *
> + * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
> + * section. Use __always_inline so the compiler cannot push it out of line
> + * and make it instrumentable.
> + */
> +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs);
> +
> +#ifndef arch_enter_from_user_mode
> +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {}
> +#endif
> +
> +/**
> + * enter_from_user_mode - Establish state when coming from user mode
> + *
> + * Syscall/interrupt entry disables interrupts, but user mode is traced as
> + * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
> + *
> + * 1) Tell lockdep that interrupts are disabled
> + * 2) Invoke context tracking if enabled to reactivate RCU
> + * 3) Trace interrupts off state
> + *
> + * Invoked from architecture specific syscall entry code with interrupts
> + * disabled. The calling code has to be non-instrumentable. When the
> + * function returns all state is correct and interrupts are still
> + * disabled. The subsequent functions can be instrumented.
> + *
> + * This is invoked when there is architecture specific functionality to be
> + * done between establishing state and enabling interrupts. The caller must
> + * enable interrupts before invoking syscall_enter_from_user_mode_work().
> + */
> +static __always_inline void enter_from_user_mode(struct pt_regs *regs)
> +{
> +	arch_enter_from_user_mode(regs);
> +	lockdep_hardirqs_off(CALLER_ADDR0);
> +
> +	CT_WARN_ON(__ct_state() != CT_STATE_USER);
> +	user_exit_irqoff();
> +
> +	instrumentation_begin();
> +	kmsan_unpoison_entry_regs(regs);
> +	trace_hardirqs_off_finish();
> +	instrumentation_end();
> +}
> +
> +/**
> + * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
> + * @ti_work:	Cached TIF flags gathered with interrupts disabled
> + *
> + * Defaults to local_irq_enable(). Can be supplied by architecture specific
> + * code.
> + */
> +static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
> +
> +#ifndef local_irq_enable_exit_to_user
> +static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
> +{
> +	local_irq_enable();
> +}
> +#endif
> +
> +/**
> + * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
> + *
> + * Defaults to local_irq_disable(). Can be supplied by architecture specific
> + * code.
> + */
> +static inline void local_irq_disable_exit_to_user(void);
> +
> +#ifndef local_irq_disable_exit_to_user
> +static inline void local_irq_disable_exit_to_user(void)
> +{
> +	local_irq_disable();
> +}
> +#endif
> +
> +/**
> + * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
> + *				 to user mode.
> + * @regs:	Pointer to currents pt_regs
> + * @ti_work:	Cached TIF flags gathered with interrupts disabled
> + *
> + * Invoked from exit_to_user_mode_loop() with interrupt enabled
> + *
> + * Defaults to NOOP. Can be supplied by architecture specific code.
> + */
> +static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> +					       unsigned long ti_work);
> +
> +#ifndef arch_exit_to_user_mode_work
> +static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
> +					       unsigned long ti_work)
> +{
> +}
> +#endif
> +
> +/**
> + * arch_exit_to_user_mode_prepare - Architecture specific preparation for
> + *				    exit to user mode.
> + * @regs:	Pointer to currents pt_regs
> + * @ti_work:	Cached TIF flags gathered with interrupts disabled
> + *
> + * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the last
> + * function before return. Defaults to NOOP.
> + */
> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +						  unsigned long ti_work);
> +
> +#ifndef arch_exit_to_user_mode_prepare
> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +						  unsigned long ti_work)
> +{
> +}
> +#endif
> +
> +/**
> + * arch_exit_to_user_mode - Architecture specific final work before
> + *			    exit to user mode.
> + *
> + * Invoked from exit_to_user_mode() with interrupt disabled as the last
> + * function before return. Defaults to NOOP.
> + *
> + * This needs to be __always_inline because it is non-instrumentable code
> + * invoked after context tracking switched to user mode.
> + *
> + * An architecture implementation must not do anything complex, no locking
> + * etc. The main purpose is for speculation mitigations.
> + */
> +static __always_inline void arch_exit_to_user_mode(void);
> +
> +#ifndef arch_exit_to_user_mode
> +static __always_inline void arch_exit_to_user_mode(void) { }
> +#endif
> +
> +/**
> + * arch_do_signal_or_restart -  Architecture specific signal delivery function
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Invoked from exit_to_user_mode_loop().
> + */
> +void arch_do_signal_or_restart(struct pt_regs *regs);
> +
> +/**
> + * exit_to_user_mode_loop - do any pending work before leaving to user space
> + */
> +unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> +				     unsigned long ti_work);
> +
> +/**
> + * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
> + * @regs:	Pointer to pt_regs on entry stack
> + *
> + * 1) check that interrupts are disabled
> + * 2) call tick_nohz_user_enter_prepare()
> + * 3) call exit_to_user_mode_loop() if any flags from
> + *    EXIT_TO_USER_MODE_WORK are set
> + * 4) check that interrupts are still disabled
> + */
> +static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
> +{
> +	unsigned long ti_work;
> +
> +	lockdep_assert_irqs_disabled();
> +
> +	/* Flush pending rcuog wakeup before the last need_resched() check */
> +	tick_nohz_user_enter_prepare();
> +
> +	ti_work = read_thread_flags();
> +	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> +		ti_work = exit_to_user_mode_loop(regs, ti_work);
> +
> +	arch_exit_to_user_mode_prepare(regs, ti_work);
> +
> +	/* Ensure that kernel state is sane for a return to userspace */
> +	kmap_assert_nomap();
> +	lockdep_assert_irqs_disabled();
> +	lockdep_sys_exit();
> +}
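
For readers following the move, the control flow of exit_to_user_mode_prepare() can be modelled in plain userspace C. This is only a sketch under my own assumptions: every demo_* name and flag value is hypothetical, and the loop body (which is not part of the hunk quoted here) is reduced to "handle the work, then re-read the flags".

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; not the kernel's _TIF_* values. */
#define DEMO_TIF_SIGPENDING	(1UL << 0)
#define DEMO_TIF_NEED_RESCHED	(1UL << 1)
#define DEMO_EXIT_WORK		(DEMO_TIF_SIGPENDING | DEMO_TIF_NEED_RESCHED)

/* Stand-in for the per-thread flag word read by read_thread_flags(). */
static unsigned long demo_thread_flags;
static int demo_signals_handled;
static int demo_reschedules;

/* Models the shape of exit_to_user_mode_loop(): handle work, re-read flags. */
static unsigned long demo_exit_loop(unsigned long ti_work)
{
	while (ti_work & DEMO_EXIT_WORK) {
		if (ti_work & DEMO_TIF_NEED_RESCHED) {
			demo_reschedules++;
			demo_thread_flags &= ~DEMO_TIF_NEED_RESCHED;
		}
		if (ti_work & DEMO_TIF_SIGPENDING) {
			demo_signals_handled++;
			demo_thread_flags &= ~DEMO_TIF_SIGPENDING;
		}
		/* New work may have been queued in the meantime. */
		ti_work = demo_thread_flags;
	}
	return ti_work;
}

/* Models exit_to_user_mode_prepare(): enter the loop only if work is pending. */
static unsigned long demo_exit_prepare(void)
{
	unsigned long ti_work = demo_thread_flags;

	if (ti_work & DEMO_EXIT_WORK)
		ti_work = demo_exit_loop(ti_work);
	return ti_work;
}
```

The point of the model is that the fast path (no work bits set) never enters the loop, while the slow path only returns once no work bits remain.
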
> +
> +/**
> + * exit_to_user_mode - Fixup state when exiting to user mode
> + *
> + * Syscall/interrupt exit enables interrupts, but the kernel state is
> + * interrupts disabled when this is invoked. Also tell RCU about it.
> + *
> + * 1) Trace interrupts on state
> + * 2) Invoke context tracking if enabled to adjust RCU state
> + * 3) Invoke architecture specific last minute exit code, e.g. speculation
> + *    mitigations, etc.: arch_exit_to_user_mode()
> + * 4) Tell lockdep that interrupts are enabled
> + *
> + * Invoked from architecture specific code when syscall_exit_to_user_mode()
> + * is not suitable as the last step before returning to userspace. Must be
> + * invoked with interrupts disabled and the caller must be
> + * non-instrumentable.
> + * The caller has to invoke syscall_exit_to_user_mode_work() before this.
> + */
> +static __always_inline void exit_to_user_mode(void)
> +{
> +	instrumentation_begin();
> +	trace_hardirqs_on_prepare();
> +	lockdep_hardirqs_on_prepare();
> +	instrumentation_end();
> +
> +	user_enter_irqoff();
> +	arch_exit_to_user_mode();
> +	lockdep_hardirqs_on(CALLER_ADDR0);
> +}
> +
> +/**
> + * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Invoked from architecture specific entry code with interrupts disabled.
> + * Can only be called when the interrupt entry came from user mode. The
> + * calling code must be non-instrumentable.  When the function returns all
> + * state is correct and the subsequent functions can be instrumented.
> + *
> + * The function establishes state (lockdep, RCU (context tracking), tracing)
> + */
> +void irqentry_enter_from_user_mode(struct pt_regs *regs);
> +
> +/**
> + * irqentry_exit_to_user_mode - Interrupt exit work
> + * @regs:	Pointer to current's pt_regs
> + *
> + * Invoked with interrupts disabled and fully valid regs. Returns with all
> + * work handled, interrupts disabled such that the caller can immediately
> + * switch to user mode. Called from architecture specific interrupt
> + * handling code.
> + *
> + * The call order is #2 and #3 as described in syscall_exit_to_user_mode().
> + * Interrupt exit is not invoking #1 which is the syscall specific one time
> + * work.
> + */
> +void irqentry_exit_to_user_mode(struct pt_regs *regs);
> +
> +#ifndef irqentry_state
> +/**
> + * struct irqentry_state - Opaque object for exception state storage
> + * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
> + *            exit path has to invoke ct_irq_exit().
> + * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
> + *           lockdep state is restored correctly on exit from nmi.
> + *
> + * This opaque object is filled in by the irqentry_*_enter() functions and
> + * must be passed back into the corresponding irqentry_*_exit() functions
> + * when the exception is complete.
> + *
> + * Callers of irqentry_*_[enter|exit]() must consider this structure opaque
> + * and all members private.  Descriptions of the members are provided to aid in
> + * the maintenance of the irqentry_*() functions.
> + */
> +typedef struct irqentry_state {
> +	union {
> +		bool	exit_rcu;
> +		bool	lockdep;
> +	};
> +} irqentry_state_t;
> +#endif
> +
> +/**
> + * irqentry_enter - Handle state tracking on ordinary interrupt entries
> + * @regs:	Pointer to pt_regs of interrupted context
> + *
> + * Invokes:
> + *  - lockdep irqflag state tracking as low level ASM entry disabled
> + *    interrupts.
> + *
> + *  - Context tracking if the exception hit user mode.
> + *
> + *  - The hardirq tracer to keep the state consistent as low level ASM
> + *    entry disabled interrupts.
> + *
> + * As a precondition, this requires that the entry came from user mode,
> + * idle, or a kernel context in which RCU is watching.
> + *
> + * For kernel mode entries RCU handling is done conditional. If RCU is
> + * watching then the only RCU requirement is to check whether the tick has
> + * to be restarted. If RCU is not watching then ct_irq_enter() has to be
> + * invoked on entry and ct_irq_exit() on exit.
> + *
> + * Avoiding the ct_irq_enter/exit() calls is an optimization but also
> + * solves the problem of kernel mode pagefaults which can schedule, which
> + * is not possible after invoking ct_irq_enter() without undoing it.
> + *
> + * For user mode entries irqentry_enter_from_user_mode() is invoked to
> + * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
> + * would not be possible.
> + *
> + * Returns: An opaque object that must be passed to irqentry_exit()
> + */
> +irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
> +
> +/**
> + * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
> + *
> + * Conditional reschedule with additional sanity checks.
> + */
> +void raw_irqentry_exit_cond_resched(void);
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> +#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> +#define irqentry_exit_cond_resched_dynamic_disabled	NULL
> +DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> +#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
> +#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> +DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +void dynamic_irqentry_exit_cond_resched(void);
> +#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
> +#endif
> +#else /* CONFIG_PREEMPT_DYNAMIC */
> +#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
> +#endif /* CONFIG_PREEMPT_DYNAMIC */
> +
> +/**
> + * irqentry_exit - Handle return from exception that used irqentry_enter()
> + * @regs:	Pointer to pt_regs (exception entry regs)
> + * @state:	Return value from matching call to irqentry_enter()
> + *
> + * Depending on the return target (kernel/user) this runs the necessary
> + * preemption and work checks if possible and required and returns to
> + * the caller with interrupts disabled and no further work pending.
> + *
> + * This is the last action before returning to the low level ASM code which
> + * just needs to return to the appropriate context.
> + *
> + * Counterpart to irqentry_enter().
> + */
> +void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
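
The enter/exit pairing described in the two comments above can be illustrated with a small userspace model. It is a sketch only: the demo_* names are hypothetical, only the kernel-mode entry case is modelled, and the context-tracking calls are replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for irqentry_state_t; only exit_rcu is modelled. */
typedef struct demo_irqentry_state {
	bool exit_rcu;
} demo_irqentry_state_t;

static bool demo_rcu_watching;	/* is "RCU" watching the interrupted context? */
static int demo_ct_irq_depth;	/* counts unbalanced ct_irq_enter()-like calls */
static int demo_irqs_handled;

/* Entry: only do the RCU bookkeeping if RCU was not already watching. */
static demo_irqentry_state_t demo_irqentry_enter(void)
{
	demo_irqentry_state_t state = { .exit_rcu = false };

	if (!demo_rcu_watching) {
		demo_ct_irq_depth++;		/* stands in for ct_irq_enter() */
		demo_rcu_watching = true;
		state.exit_rcu = true;		/* remember to undo it on exit */
	}
	return state;
}

/* Exit: undo exactly what the matching entry did, based on the state. */
static void demo_irqentry_exit(demo_irqentry_state_t state)
{
	if (state.exit_rcu) {
		demo_ct_irq_depth--;		/* stands in for ct_irq_exit() */
		demo_rcu_watching = false;
	}
}

/* An arch-style handler: the state is opaque and simply passed through. */
static void demo_handle_irq(void)
{
	demo_irqentry_state_t state = demo_irqentry_enter();

	demo_irqs_handled++;
	demo_irqentry_exit(state);
}
```

The handler never inspects the state itself, which matches the "opaque object" wording in the kernel-doc for struct irqentry_state.
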
> +
> +/**
> + * irqentry_nmi_enter - Handle NMI entry
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Similar to irqentry_enter() but taking care of the NMI constraints.
> + */
> +irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs);
> +
> +/**
> + * irqentry_nmi_exit - Handle return from NMI handling
> + * @regs:	Pointer to pt_regs (NMI entry regs)
> + * @irq_state:	Return value from matching call to irqentry_nmi_enter()
> + *
> + * Last action before returning to the low level assembly code.
> + *
> + * Counterpart to irqentry_nmi_enter().
> + */
> +void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state);
> +
> +#endif
> diff --git a/kernel/entry/Makefile b/kernel/entry/Makefile
> index 095c775e001e..d38f3a7e7396 100644
> --- a/kernel/entry/Makefile
> +++ b/kernel/entry/Makefile
> @@ -9,5 +9,6 @@ KCOV_INSTRUMENT := n
>  CFLAGS_REMOVE_common.o	 = -fstack-protector -fstack-protector-strong
>  CFLAGS_common.o		+= -fno-stack-protector
>  
> -obj-$(CONFIG_GENERIC_ENTRY) 		+= common.o syscall_user_dispatch.o
> +obj-$(CONFIG_GENERIC_IRQ_ENTRY) 	+= common.o
> +obj-$(CONFIG_GENERIC_SYSCALL) 		+= syscall-common.o syscall_user_dispatch.o
>  obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK)	+= kvm.o
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index e33691d5adf7..b82032777310 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -1,84 +1,13 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -#include <linux/context_tracking.h>
> -#include <linux/entry-common.h>
> +#include <linux/irq-entry-common.h>
>  #include <linux/resume_user_mode.h>
>  #include <linux/highmem.h>
>  #include <linux/jump_label.h>
>  #include <linux/kmsan.h>
>  #include <linux/livepatch.h>
> -#include <linux/audit.h>
>  #include <linux/tick.h>
>  
> -#include "common.h"
> -
> -#define CREATE_TRACE_POINTS
> -#include <trace/events/syscalls.h>
> -
> -static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
> -{
> -	if (unlikely(audit_context())) {
> -		unsigned long args[6];
> -
> -		syscall_get_arguments(current, regs, args);
> -		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
> -	}
> -}
> -
> -long syscall_trace_enter(struct pt_regs *regs, long syscall,
> -				unsigned long work)
> -{
> -	long ret = 0;
> -
> -	/*
> -	 * Handle Syscall User Dispatch.  This must comes first, since
> -	 * the ABI here can be something that doesn't make sense for
> -	 * other syscall_work features.
> -	 */
> -	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> -		if (syscall_user_dispatch(regs))
> -			return -1L;
> -	}
> -
> -	/* Handle ptrace */
> -	if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
> -		ret = ptrace_report_syscall_entry(regs);
> -		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
> -			return -1L;
> -	}
> -
> -	/* Do seccomp after ptrace, to catch any tracer changes. */
> -	if (work & SYSCALL_WORK_SECCOMP) {
> -		ret = __secure_computing(NULL);
> -		if (ret == -1L)
> -			return ret;
> -	}
> -
> -	/* Either of the above might have changed the syscall number */
> -	syscall = syscall_get_nr(current, regs);
> -
> -	if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
> -		trace_sys_enter(regs, syscall);
> -		/*
> -		 * Probes or BPF hooks in the tracepoint may have changed the
> -		 * system call number as well.
> -		 */
> -		syscall = syscall_get_nr(current, regs);
> -	}
> -
> -	syscall_enter_audit(regs, syscall);
> -
> -	return ret ? : syscall;
> -}
> -
> -noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
> -{
> -	enter_from_user_mode(regs);
> -	instrumentation_begin();
> -	local_irq_enable();
> -	instrumentation_end();
> -}
> -
>  /* Workaround to allow gradual conversion of architecture code */
>  void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
>  
> @@ -133,93 +62,6 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  	return ti_work;
>  }
>  
> -/*
> - * If SYSCALL_EMU is set, then the only reason to report is when
> - * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP).  This syscall
> - * instruction has been already reported in syscall_enter_from_user_mode().
> - */
> -static inline bool report_single_step(unsigned long work)
> -{
> -	if (work & SYSCALL_WORK_SYSCALL_EMU)
> -		return false;
> -
> -	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
> -}
> -
> -static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
> -{
> -	bool step;
> -
> -	/*
> -	 * If the syscall was rolled back due to syscall user dispatching,
> -	 * then the tracers below are not invoked for the same reason as
> -	 * the entry side was not invoked in syscall_trace_enter(): The ABI
> -	 * of these syscalls is unknown.
> -	 */
> -	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> -		if (unlikely(current->syscall_dispatch.on_dispatch)) {
> -			current->syscall_dispatch.on_dispatch = false;
> -			return;
> -		}
> -	}
> -
> -	audit_syscall_exit(regs);
> -
> -	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> -		trace_sys_exit(regs, syscall_get_return_value(current, regs));
> -
> -	step = report_single_step(work);
> -	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
> -		ptrace_report_syscall_exit(regs, step);
> -}
> -
> -/*
> - * Syscall specific exit to user mode preparation. Runs with interrupts
> - * enabled.
> - */
> -static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
> -{
> -	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
> -	unsigned long nr = syscall_get_nr(current, regs);
> -
> -	CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
> -
> -	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> -		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
> -			local_irq_enable();
> -	}
> -
> -	rseq_syscall(regs);
> -
> -	/*
> -	 * Do one-time syscall specific work. If these work items are
> -	 * enabled, we want to run them exactly once per syscall exit with
> -	 * interrupts enabled.
> -	 */
> -	if (unlikely(work & SYSCALL_WORK_EXIT))
> -		syscall_exit_work(regs, work);
> -}
> -
> -static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
> -{
> -	syscall_exit_to_user_mode_prepare(regs);
> -	local_irq_disable_exit_to_user();
> -	exit_to_user_mode_prepare(regs);
> -}
> -
> -void syscall_exit_to_user_mode_work(struct pt_regs *regs)
> -{
> -	__syscall_exit_to_user_mode_work(regs);
> -}
> -
> -__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
> -{
> -	instrumentation_begin();
> -	__syscall_exit_to_user_mode_work(regs);
> -	instrumentation_end();
> -	exit_to_user_mode();
> -}
> -
>  noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
>  {
>  	enter_from_user_mode(regs);
> diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c
> new file mode 100644
> index 000000000000..0eb036986ad4
> --- /dev/null
> +++ b/kernel/entry/syscall-common.c
> @@ -0,0 +1,159 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/audit.h>
> +#include <linux/entry-common.h>
> +#include "common.h"
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/syscalls.h>
> +
> +static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
> +{
> +	if (unlikely(audit_context())) {
> +		unsigned long args[6];
> +
> +		syscall_get_arguments(current, regs, args);
> +		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
> +	}
> +}
> +
> +long syscall_trace_enter(struct pt_regs *regs, long syscall,
> +				unsigned long work)
> +{
> +	long ret = 0;
> +
> +	/*
> +	 * Handle Syscall User Dispatch.  This must come first, since
> +	 * the ABI here can be something that doesn't make sense for
> +	 * other syscall_work features.
> +	 */
> +	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> +		if (syscall_user_dispatch(regs))
> +			return -1L;
> +	}
> +
> +	/* Handle ptrace */
> +	if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
> +		ret = ptrace_report_syscall_entry(regs);
> +		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
> +			return -1L;
> +	}
> +
> +	/* Do seccomp after ptrace, to catch any tracer changes. */
> +	if (work & SYSCALL_WORK_SECCOMP) {
> +		ret = __secure_computing(NULL);
> +		if (ret == -1L)
> +			return ret;
> +	}
> +
> +	/* Either of the above might have changed the syscall number */
> +	syscall = syscall_get_nr(current, regs);
> +
> +	if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
> +		trace_sys_enter(regs, syscall);
> +		/*
> +		 * Probes or BPF hooks in the tracepoint may have changed the
> +		 * system call number as well.
> +		 */
> +		syscall = syscall_get_nr(current, regs);
> +	}
> +
> +	syscall_enter_audit(regs, syscall);
> +
> +	return ret ? : syscall;
> +}
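
The ordering and return convention of syscall_trace_enter() can be modelled in userspace as below. This is a sketch under stated assumptions: the demo_* names and flag values are hypothetical, the ptrace and seccomp results are supplied by the caller instead of being computed, syscall user dispatch, tracepoints and audit are left out, and the GNU "ret ? : syscall" shorthand is spelled out in standard C.

```c
#include <assert.h>

/* Hypothetical work bits; not the kernel's SYSCALL_WORK_* values. */
#define DEMO_WORK_TRACE		(1UL << 0)
#define DEMO_WORK_EMU		(1UL << 1)
#define DEMO_WORK_SECCOMP	(1UL << 2)

/* Canned results for the steps the real function delegates to. */
struct demo_env {
	long ptrace_ret;	/* result of the ptrace entry report */
	long seccomp_ret;	/* result of the seccomp check (0 or -1) */
	long nr_after;		/* syscall number after tracers may have changed it */
};

static long demo_trace_enter(const struct demo_env *env, long syscall,
			     unsigned long work)
{
	long ret = 0;

	/* ptrace first: a failing report or SYSEMU skips the syscall. */
	if (work & (DEMO_WORK_TRACE | DEMO_WORK_EMU)) {
		ret = env->ptrace_ret;
		if (ret || (work & DEMO_WORK_EMU))
			return -1L;
	}

	/* seccomp after ptrace, so it sees any tracer changes. */
	if (work & DEMO_WORK_SECCOMP) {
		ret = env->seccomp_ret;
		if (ret == -1L)
			return ret;
	}

	/* Either step above might have changed the syscall number. */
	syscall = env->nr_after;

	return ret ? ret : syscall;
}
```

In this model a return of -1 means "skip the syscall", and any other value is the (possibly rewritten) syscall number to dispatch.
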
> +
> +noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
> +{
> +	enter_from_user_mode(regs);
> +	instrumentation_begin();
> +	local_irq_enable();
> +	instrumentation_end();
> +}
> +
> +/*
> + * If SYSCALL_EMU is set, then the only reason to report is when
> + * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP).  This syscall
> + * instruction has been already reported in syscall_enter_from_user_mode().
> + */
> +static inline bool report_single_step(unsigned long work)
> +{
> +	if (work & SYSCALL_WORK_SYSCALL_EMU)
> +		return false;
> +
> +	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
> +}
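
The decision made by report_single_step() is small enough to check exhaustively. The following userspace sketch uses hypothetical flag values and a demo_ name; only the logic is taken from the function quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work bits; not the kernel's SYSCALL_WORK_* values. */
#define DEMO_WORK_EMU		(1UL << 0)
#define DEMO_WORK_EXIT_TRAP	(1UL << 1)

/* Same logic as report_single_step(): SYSEMU suppresses the exit report. */
static bool demo_report_single_step(unsigned long work)
{
	if (work & DEMO_WORK_EMU)
		return false;

	return (work & DEMO_WORK_EXIT_TRAP) != 0;
}
```

Of the four flag combinations, only "EXIT_TRAP set, EMU clear" reports a single step.
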
> +
> +static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
> +{
> +	bool step;
> +
> +	/*
> +	 * If the syscall was rolled back due to syscall user dispatching,
> +	 * then the tracers below are not invoked for the same reason as
> +	 * the entry side was not invoked in syscall_trace_enter(): The ABI
> +	 * of these syscalls is unknown.
> +	 */
> +	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> +		if (unlikely(current->syscall_dispatch.on_dispatch)) {
> +			current->syscall_dispatch.on_dispatch = false;
> +			return;
> +		}
> +	}
> +
> +	audit_syscall_exit(regs);
> +
> +	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> +		trace_sys_exit(regs, syscall_get_return_value(current, regs));
> +
> +	step = report_single_step(work);
> +	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
> +		ptrace_report_syscall_exit(regs, step);
> +}
> +
> +/*
> + * Syscall specific exit to user mode preparation. Runs with interrupts
> + * enabled.
> + */
> +static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
> +{
> +	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
> +	unsigned long nr = syscall_get_nr(current, regs);
> +
> +	CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
> +
> +	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> +		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
> +			local_irq_enable();
> +	}
> +
> +	rseq_syscall(regs);
> +
> +	/*
> +	 * Do one-time syscall specific work. If these work items are
> +	 * enabled, we want to run them exactly once per syscall exit with
> +	 * interrupts enabled.
> +	 */
> +	if (unlikely(work & SYSCALL_WORK_EXIT))
> +		syscall_exit_work(regs, work);
> +}
> +
> +static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
> +{
> +	syscall_exit_to_user_mode_prepare(regs);
> +	local_irq_disable_exit_to_user();
> +	exit_to_user_mode_prepare(regs);
> +}
> +
> +void syscall_exit_to_user_mode_work(struct pt_regs *regs)
> +{
> +	__syscall_exit_to_user_mode_work(regs);
> +}
> +
> +__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
> +{
> +	instrumentation_begin();
> +	__syscall_exit_to_user_mode_work(regs);
> +	instrumentation_end();
> +	exit_to_user_mode();
> +}
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 27a8fbd58091..2d560bb3efaa 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -68,8 +68,8 @@
>  #include <linux/workqueue_api.h>
>  
>  #ifdef CONFIG_PREEMPT_DYNAMIC
> -# ifdef CONFIG_GENERIC_ENTRY
> -#  include <linux/entry-common.h>
> +# ifdef CONFIG_GENERIC_IRQ_ENTRY
> +#  include <linux/irq-entry-common.h>
>  # endif
>  #endif
>  
> @@ -7398,8 +7398,8 @@ EXPORT_SYMBOL(__cond_resched_rwlock_write);
>  
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  
> -#ifdef CONFIG_GENERIC_ENTRY
> -#include <linux/entry-common.h>
> +#ifdef CONFIG_GENERIC_IRQ_ENTRY
> +#include <linux/irq-entry-common.h>
>  #endif
>  
>  /*
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64
       [not found] ` <20241206101744.4161990-11-ruanjinjie@huawei.com>
@ 2025-02-10 12:05   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 12:05 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:32PM +0800, Jinjie Ruan wrote:
> ARM64 requires an additional check whether to reschedule on return
> from interrupt.
> 
> Add arch_irqentry_exit_need_resched() as the default NOP
> implementation and hook it up into the need_resched() condition in
> raw_irqentry_exit_cond_resched().
> 
> This allows ARM64 to implement the architecture specific version for
> switching over to the generic entry code.

Please fold this into the earlier changes in this area made over patches
6 to 8.

> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  kernel/entry/common.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index b82032777310..4aa9656fa1b4 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -142,6 +142,20 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>  	return ret;
>  }
>  
> +/**
> + * arch_irqentry_exit_need_resched - Architecture specific need resched function
> + *
> + * Invoked from raw_irqentry_exit_cond_resched() to check if need resched.
> + * Defaults return true.
> + *
> + * The main purpose is to permit arch to skip preempt a task from an IRQ.
> + */
> +static inline bool arch_irqentry_exit_need_resched(void);
> +
> +#ifndef arch_irqentry_exit_need_resched
> +static inline bool arch_irqentry_exit_need_resched(void) { return true; }
> +#endif
> +
>  void raw_irqentry_exit_cond_resched(void)
>  {
>  	if (!preempt_count()) {
> @@ -149,7 +163,7 @@ void raw_irqentry_exit_cond_resched(void)
>  		rcu_irq_exit_check_preempt();
>  		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
>  			WARN_ON_ONCE(!on_thread_stack());
> -		if (need_resched())
> +		if (need_resched() && arch_irqentry_exit_need_resched())
>  			preempt_schedule_irq();
>  	}
>  }
> -- 
> 2.34.1
> 
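
For reference, the override idiom the commit message describes can be
sketched in isolation. This is a simplified, user-space illustration and
not the patch itself: SKETCH_ARCH_OVERRIDE and should_preempt() are
hypothetical names, and the preempt count and need_resched state are
passed in as plain arguments rather than read from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * An architecture opts in by providing the function and a macro of the
 * same name. SKETCH_ARCH_OVERRIDE is a hypothetical switch that stands
 * in for "the arch header was included first".
 */
#ifdef SKETCH_ARCH_OVERRIDE
static inline bool arch_irqentry_exit_need_resched(void) { return false; }
#define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched
#endif

/* Generic fallback: used only when no architecture supplied its own. */
#ifndef arch_irqentry_exit_need_resched
static inline bool arch_irqentry_exit_need_resched(void) { return true; }
#endif

/*
 * Hypothetical stand-in for the condition in
 * raw_irqentry_exit_cond_resched(): preempt only when the preempt count
 * is zero, a reschedule is pending, and the arch hook agrees.
 */
static bool should_preempt(bool need_resched_flag, int preempt_count)
{
	assert(preempt_count >= 0);
	return !preempt_count && need_resched_flag &&
	       arch_irqentry_exit_need_resched();
}
```

With the default hook the extra check never changes the outcome, so
architectures that do not define the macro should see the same behaviour
as before.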


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry
       [not found] ` <20241206101744.4161990-12-ruanjinjie@huawei.com>
@ 2025-02-10 12:24   ` Mark Rutland
  2025-02-11 11:32     ` Jinjie Ruan
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 12:24 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Fri, Dec 06, 2024 at 06:17:33PM +0800, Jinjie Ruan wrote:
> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> to use the generic entry infrastructure from kernel/entry/*.
> The generic entry makes maintainers' work easier and codes
> more elegant.
> 
> Switch arm64 to generic IRQ entry first, which removed duplicate 100+
> LOC, and it will switch to generic entry completely later. Switch to
> generic entry in two steps according to Mark's suggestion will make
> it easier to review.
> 
> The changes are below:
>  - Remove *enter_from/exit_to_kernel_mode(), and wrap with generic
>    irqentry_enter/exit(). Also remove *enter_from/exit_to_user_mode(),
>    and wrap with generic enter_from/exit_to_user_mode() because they
>    are exactly the same so far.
> 
>  - Remove arm64_enter/exit_nmi() and use generic irqentry_nmi_enter/exit()
>    because they're exactly the same, so the temporary arm64 version
>    irqentry_state can also be removed.
> 
>  - Remove PREEMPT_DYNAMIC code, as generic entry do the same thing
>    if arm64 implement arch_irqentry_exit_need_resched().
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/Kconfig                    |   1 +
>  arch/arm64/include/asm/entry-common.h |  64 ++++++
>  arch/arm64/include/asm/preempt.h      |   6 -
>  arch/arm64/kernel/entry-common.c      | 307 ++++++--------------------
>  arch/arm64/kernel/signal.c            |   3 +-
>  5 files changed, 129 insertions(+), 252 deletions(-)
>  create mode 100644 arch/arm64/include/asm/entry-common.h

Superficially this looks nice, but to be clear I have *not* looked at
this in great detail; minor comments below.

[...]

> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +						  unsigned long ti_work)
> +{
> +	local_daif_mask();
> +}
> +
> +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare

I'm a little worried that this may be fragile having been hidden in the
common code, as it's not clear exactly when this will occur during the
return sequence, and the ordering requirements could easily be broken by
refactoring there.

I suspect we'll want to pull this later in the arm64 exit sequence so
that we can have it explicit in entry-common.c.

[...]

> index 14ac6fdb872b..84b6628647c7 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -9,6 +9,7 @@
>  #include <linux/cache.h>
>  #include <linux/compat.h>
>  #include <linux/errno.h>
> +#include <linux/irq-entry-common.h>
>  #include <linux/kernel.h>
>  #include <linux/signal.h>
>  #include <linux/freezer.h>
> @@ -1603,7 +1604,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>   * the kernel can handle, and then we build all the user-level signal handling
>   * stack-frames in one go after that.
>   */
> -void do_signal(struct pt_regs *regs)
> +void arch_do_signal_or_restart(struct pt_regs *regs)
>  {
>  	unsigned long continue_addr = 0, restart_addr = 0;
>  	int retval = 0;

Is the expected semantic the same here, or is there more than just a
name change?

Mark.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 00/22] arm64: entry: Convert to generic entry
  2025-02-08  1:15 ` [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
@ 2025-02-10 12:30   ` Mark Rutland
  2025-02-11 11:43     ` Jinjie Ruan
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2025-02-10 12:30 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel

On Sat, Feb 08, 2025 at 09:15:08AM +0800, Jinjie Ruan wrote:
> On 2024/12/6 18:17, Jinjie Ruan wrote:
> > Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> > to use the generic entry infrastructure from kernel/entry/*. The generic
> > entry makes maintainers' work easier and codes more elegant, which also
> > removed a lot of duplicate code.
> > 
> > The main steps are as follows:
> > - Make arm64 easier to use irqentry_enter/exit().
> > - Make arm64 closer to the PREEMPT_DYNAMIC code of generic entry.
> > - Split generic entry into generic irq entry and generic syscall to
> >   make the single patch more concentrated in switching to one thing.
> > - Switch to generic irq entry.
> > - Make arm64 closer to the generic syscall code.
> > - Switch to generic entry completely.
> > 
> > Changes in v5:
> > - Not change arm32 and keep interrupts_enabled() macro for gicv3 driver.
> > - Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
> > - Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
> > - Update "irqentry_state_t ret/irq_state" to "state"
> >   to keep it consistently.
> > - Use generic irq entry header for PREEMPT_DYNAMIC after split
> >   the generic entry.
> > - Also refactor the ARM64 syscall code.
> > - Introduce arch_ptrace_report_syscall_entry/exit(), instead of
> >   arch_pre/post_report_syscall_entry/exit() to simplify code.
> > - Make the syscall patches clear separation.
> > - Update the commit message.
> 
> Gentle Ping.

I've left some comments.

As I mentioned previously, I'd very much prefer that we do the syscall
entry logic changes *later* (i.e. as a follow-up patch series), after
we've got the irq/exception entry logic sorted.

I reckon we've got just enough time to get the irq/exception entry
changes ready this cycle, with another round or two of review. So can we
please put the syscall bits aside for now? ... that and run all the
tests you mention in patch 22 on the irq/exception entry changes alone.

Mark.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry
  2025-02-10 12:24   ` [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry Mark Rutland
@ 2025-02-11 11:32     ` Jinjie Ruan
  0 siblings, 0 replies; 15+ messages in thread
From: Jinjie Ruan @ 2025-02-11 11:32 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel



On 2025/2/10 20:24, Mark Rutland wrote:
> On Fri, Dec 06, 2024 at 06:17:33PM +0800, Jinjie Ruan wrote:
>> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
>> to use the generic entry infrastructure from kernel/entry/*.
>> The generic entry makes maintainers' work easier and codes
>> more elegant.
>>
>> Switch arm64 to generic IRQ entry first, which removed duplicate 100+
>> LOC, and it will switch to generic entry completely later. Switch to
>> generic entry in two steps according to Mark's suggestion will make
>> it easier to review.
>>
>> The changes are below:
>>  - Remove *enter_from/exit_to_kernel_mode(), and wrap with generic
>>    irqentry_enter/exit(). Also remove *enter_from/exit_to_user_mode(),
>>    and wrap with generic enter_from/exit_to_user_mode() because they
>>    are exactly the same so far.
>>
>>  - Remove arm64_enter/exit_nmi() and use generic irqentry_nmi_enter/exit()
>>    because they're exactly the same, so the temporary arm64 version
>>    irqentry_state can also be removed.
>>
>>  - Remove PREEMPT_DYNAMIC code, as generic entry do the same thing
>>    if arm64 implement arch_irqentry_exit_need_resched().
>>
>> Suggested-by: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
>> ---
>>  arch/arm64/Kconfig                    |   1 +
>>  arch/arm64/include/asm/entry-common.h |  64 ++++++
>>  arch/arm64/include/asm/preempt.h      |   6 -
>>  arch/arm64/kernel/entry-common.c      | 307 ++++++--------------------
>>  arch/arm64/kernel/signal.c            |   3 +-
>>  5 files changed, 129 insertions(+), 252 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/entry-common.h
> 
> Superficially this looks nice, but to be clear I have *not* looked at
> this in great detail; minor comments below.
> 
> [...]
> 
>> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
>> +						  unsigned long ti_work)
>> +{
>> +	local_daif_mask();
>> +}
>> +
>> +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
> 
> I'm a little worried that this may be fragile having been hidden in the
> common code, as it's not clear exactly when this will occur during the
> return sequence, and the ordering requirements could easily be broken by
> refactoring there.
> 
> I suspect we'll want to pull this later in the arm64 exit sequence so
> that we can have it explicit in entry-common.c.

Yes, this key function is hidden in the generic entry code, and it is not
easy to see when it is executed. But placing it directly in the arm64
entry-common.c may change the order in which lockdep_sys_exit() and
local_daif_mask() are called, and it's not clear what the potential
impact is.

Before:
   exit_to_user_mode_prepare()
      ...
      -> local_daif_mask()
      -> lockdep_sys_exit()

After:
   arm64_exit_to_user_mode()
      ...
      -> exit_to_user_mode_prepare()
         -> lockdep_sys_exit()
      -> local_daif_mask()
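
To make the ordering difference concrete, here is a minimal user-space
sketch. It is only an illustration: the stub_*() functions, record() and
the two order_*() helpers are hypothetical names that log call order and
do not touch DAIF or lockdep state.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer: each stub appends its name when called. */
static char trace[128];

static void record(const char *name)
{
	/* Keep room for a separating space and the terminating NUL. */
	assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
	if (trace[0] != '\0')
		strcat(trace, " ");
	strcat(trace, name);
}

/* Stand-ins for the real kernel functions; they only log the call. */
static void stub_local_daif_mask(void)  { record("local_daif_mask"); }
static void stub_lockdep_sys_exit(void) { record("lockdep_sys_exit"); }

/* v5 ordering: the arch hook runs inside the generic prepare step. */
static const char *order_with_arch_hook(void)
{
	trace[0] = '\0';
	stub_local_daif_mask();		/* via arch_exit_to_user_mode_prepare() */
	stub_lockdep_sys_exit();
	return trace;
}

/* Suggested ordering: mask explicitly after the generic prepare step. */
static const char *order_with_explicit_mask(void)
{
	trace[0] = '\0';
	stub_lockdep_sys_exit();	/* generic prepare, no arch masking */
	stub_local_daif_mask();		/* explicit in arm64 entry-common.c */
	return trace;
}
```

The only thing the sketch shows is that the two calls swap places;
whether that swap matters for lockdep is the open question.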

> 
> [...]
> 
>> index 14ac6fdb872b..84b6628647c7 100644
>> --- a/arch/arm64/kernel/signal.c
>> +++ b/arch/arm64/kernel/signal.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/cache.h>
>>  #include <linux/compat.h>
>>  #include <linux/errno.h>
>> +#include <linux/irq-entry-common.h>
>>  #include <linux/kernel.h>
>>  #include <linux/signal.h>
>>  #include <linux/freezer.h>
>> @@ -1603,7 +1604,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>>   * the kernel can handle, and then we build all the user-level signal handling
>>   * stack-frames in one go after that.
>>   */
>> -void do_signal(struct pt_regs *regs)
>> +void arch_do_signal_or_restart(struct pt_regs *regs)
>>  {
>>  	unsigned long continue_addr = 0, restart_addr = 0;
>>  	int retval = 0;
> 
> Is the expected semantic the same here, or is there more than just a
> name change?

Yes, the expected semantics are the same here: both handle the
_TIF_SIGPENDING and _TIF_NOTIFY_SIGNAL thread flags before exiting to
user mode.

In arm64, the call sequence is:

  exit_to_user_mode()
     -> exit_to_user_mode_prepare()
        -> do_notify_resume(regs, flags)
           -> if (thread_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
                 do_signal(regs)

In generic entry code, the logic is the same:

  exit_to_user_mode_prepare()
      -> exit_to_user_mode_loop()
          -> if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
                 arch_do_signal_or_restart(regs)
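
The same comparison as a small user-space sketch. The SK_TIF_* bit values
are made up for illustration and are not the real arm64 or generic flag
definitions; stub_do_signal() and both *_style_exit_work() helpers are
hypothetical and only count how often the signal path would be taken.

```c
#include <assert.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define SK_TIF_SIGPENDING	(1UL << 0)
#define SK_TIF_NOTIFY_SIGNAL	(1UL << 1)
#define SK_TIF_NEED_RESCHED	(1UL << 2)

static int signal_calls;	/* number of times the signal stub ran */

static void stub_do_signal(void) { signal_calls++; }

/* Shape of the arm64 path: do_notify_resume() -> do_signal(). */
static int arm64_style_exit_work(unsigned long thread_flags)
{
	signal_calls = 0;
	if (thread_flags & (SK_TIF_SIGPENDING | SK_TIF_NOTIFY_SIGNAL))
		stub_do_signal();
	return signal_calls;
}

/* Shape of the generic path: loop body -> arch_do_signal_or_restart(). */
static int generic_style_exit_work(unsigned long ti_work)
{
	signal_calls = 0;
	if (ti_work & (SK_TIF_SIGPENDING | SK_TIF_NOTIFY_SIGNAL))
		stub_do_signal();
	return signal_calls;
}

/* Walk every combination of the three sketch flags and compare. */
static void check_equivalence(void)
{
	unsigned long flags;

	for (flags = 0; flags < (1UL << 3); flags++)
		assert(arm64_style_exit_work(flags) ==
		       generic_style_exit_work(flags));
}
```

Both helpers test the same mask, so the rename should not change when the
signal handling runs.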

> 
> Mark.
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -next v5 00/22] arm64: entry: Convert to generic entry
  2025-02-10 12:30   ` Mark Rutland
@ 2025-02-11 11:43     ` Jinjie Ruan
  0 siblings, 0 replies; 15+ messages in thread
From: Jinjie Ruan @ 2025-02-11 11:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, will, oleg, sstabellini, tglx, peterz, luto,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, kees, wad, akpm, samitolvanen,
	masahiroy, hca, aliceryhl, rppt, xur, paulmck, arnd, mbenes,
	puranjay, pcc, ardb, sudeep.holla, guohanjun, rafael, liuwei09,
	dwmw, Jonathan.Cameron, liaochang1, kristina.martsenko, ptosi,
	broonie, thiago.bauermann, kevin.brodsky, joey.gouly, liuyuntao12,
	leobras, linux-kernel, linux-arm-kernel, xen-devel



On 2025/2/10 20:30, Mark Rutland wrote:
> On Sat, Feb 08, 2025 at 09:15:08AM +0800, Jinjie Ruan wrote:
>> On 2024/12/6 18:17, Jinjie Ruan wrote:
>>> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
>>> to use the generic entry infrastructure from kernel/entry/*. The generic
>>> entry makes maintainers' work easier and codes more elegant, which also
>>> removed a lot of duplicate code.
>>>
>>> The main steps are as follows:
>>> - Make arm64 easier to use irqentry_enter/exit().
>>> - Make arm64 closer to the PREEMPT_DYNAMIC code of generic entry.
>>> - Split generic entry into generic irq entry and generic syscall to
>>>   make the single patch more concentrated in switching to one thing.
>>> - Switch to generic irq entry.
>>> - Make arm64 closer to the generic syscall code.
>>> - Switch to generic entry completely.
>>>
>>> Changes in v5:
>>> - Not change arm32 and keep interrupts_enabled() macro for gicv3 driver.
>>> - Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
>>> - Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
>>> - Update "irqentry_state_t ret/irq_state" to "state"
>>>   to keep it consistently.
>>> - Use generic irq entry header for PREEMPT_DYNAMIC after split
>>>   the generic entry.
>>> - Also refactor the ARM64 syscall code.
>>> - Introduce arch_ptrace_report_syscall_entry/exit(), instead of
>>>   arch_pre/post_report_syscall_entry/exit() to simplify code.
>>> - Make the syscall patches clear separation.
>>> - Update the commit message.
>>
>> Gentle Ping.
> 
> I've left some comments.
> 
> As I mentioned previously, I'd very much prefer that we do the syscall
> entry logic changes *later* (i.e. as a follow-up patch series), after
> we've got the irq/exception entry logic sorted.
> 
> I reckon we've got just enough time to get the irq/exception entry
> changes ready this cycle, with another round or two of review. So can we
> please put the syscall bits aside for now? ... that and run all the
> tests you mention in patch 22 on the irq/exception entry changes alone.

Sure, it is OK to put the syscall bits aside and split them out.

> 
> Mark.
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-11 12:42 UTC | newest]

Thread overview: 15+ messages
     [not found] <20241206101744.4161990-1-ruanjinjie@huawei.com>
2025-02-08  1:15 ` [PATCH -next v5 00/22] arm64: entry: Convert to generic entry Jinjie Ruan
2025-02-10 12:30   ` Mark Rutland
2025-02-11 11:43     ` Jinjie Ruan
     [not found] ` <20241206101744.4161990-2-ruanjinjie@huawei.com>
2025-02-10 11:04   ` [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled() Mark Rutland
     [not found] ` <20241206101744.4161990-3-ruanjinjie@huawei.com>
2025-02-10 11:08   ` [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1 Mark Rutland
     [not found] ` <20241206101744.4161990-4-ruanjinjie@huawei.com>
2025-02-10 11:26   ` [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode() Mark Rutland
     [not found] ` <20241206101744.4161990-5-ruanjinjie@huawei.com>
2025-02-10 11:33   ` [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq() Mark Rutland
     [not found] ` <20241206101744.4161990-6-ruanjinjie@huawei.com>
2025-02-10 11:40   ` [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper Mark Rutland
     [not found] ` <20241206101744.4161990-7-ruanjinjie@huawei.com>
2025-02-10 11:48   ` [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead Mark Rutland
     [not found] ` <20241206101744.4161990-8-ruanjinjie@huawei.com>
2025-02-10 11:52   ` [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled Mark Rutland
     [not found] ` <20241206101744.4161990-9-ruanjinjie@huawei.com>
2025-02-10 11:54   ` [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC Mark Rutland
     [not found] ` <20241206101744.4161990-10-ruanjinjie@huawei.com>
2025-02-10 12:04   ` [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall Mark Rutland
     [not found] ` <20241206101744.4161990-11-ruanjinjie@huawei.com>
2025-02-10 12:05   ` [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64 Mark Rutland
     [not found] ` <20241206101744.4161990-12-ruanjinjie@huawei.com>
2025-02-10 12:24   ` [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry Mark Rutland
2025-02-11 11:32     ` Jinjie Ruan
