public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>,
	Jens Axboe <axboe@kernel.dk>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	x86@kernel.org, Arnd Bergmann <arnd@arndb.de>,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>
Subject: Re: [patch V4 28/36] rseq: Switch to fast path processing on exit to user
Date: Thu, 11 Sep 2025 10:44:56 -0400	[thread overview]
Message-ID: <84d9beb2-85e7-4fc0-b403-0514bd87ae8b@efficios.com> (raw)
In-Reply-To: <20250908212927.058801648@linutronix.de>

On 2025-09-08 17:32, Thomas Gleixner wrote:
> Now that all bits and pieces are in place, hook the RSEQ handling fast path
> function into exit_to_user_mode_prepare() after the TIF work bits have been
> handled. If case of fast path failure, TIF_NOTIFY_RESUME has been raised
> and the caller needs to take another turn through the TIF handling slow
> path.
> 
> This only works for architectures, which use the generic entry code.

Remove comma after "architectures"

> Architectures, who still have their own incomplete hacks are not supported

Remove comma after "Architectures"

> and won't be.
> 
> This results in the following improvements:
> 
>    Kernel build	       Before		  After		      Reduction
> 		
>    exit to user         80692981		  80514451
>    signal checks:          32581		       121	       99%
>    slowpath runs:        1201408   1.49%	       198 0.00%      100%
>    fastpath runs:           	  	    675941 0.84%       N/A
>    id updates:           1233989   1.53%	     50541 0.06%       96%
>    cs checks:            1125366   1.39%	         0 0.00%      100%
>      cs cleared:         1125366      100%	 0            100%
>      cs fixup:                 0        0%	 0
> 
>    RSEQ selftests      Before		  After		      Reduction
> 
>    exit to user:       386281778		  387373750
>    signal checks:       35661203		          0           100%
>    slowpath runs:      140542396 36.38%	        100  0.00%    100%
>    fastpath runs:           	  	    9509789  2.51%     N/A
>    id updates:         176203599 45.62%	    9087994  2.35%     95%
>    cs checks:          175587856 45.46%	    4728394  1.22%     98%
>      cs cleared:       172359544   98.16%    1319307   27.90%   99%
>      cs fixup:           3228312    1.84%    3409087   72.10%
> 
> The 'cs cleared' and 'cs fixup' percentanges are not relative to the exit

percentages

> to user invocations, they are relative to the actual 'cs check'
> invocations.
> 
> While some of this could have been avoided in the original code, like the
> obvious clearing of CS when it's already clear, the main problem of going
> through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ
> notify handler is invoked more than once before going out to user
> space. Doing this once when everything has stabilized is the only solution
> to avoid this.
> 
> The initial attempt to completely decouple it from the TIF work turned out
> to be suboptimal for workloads, which do a lot of quick and short system
> calls. Even if the fast path decision is only 4 instructions (including a
> conditional branch), this adds up quickly and becomes measurable when the
> rate for actually having to handle rseq is in the low single digit
> percentage range of user/kernel transitions.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V4: Move the rseq handling into a separate loop to avoid gotos later on
> ---
>   include/linux/irq-entry-common.h |    7 ++-----
>   include/linux/resume_user_mode.h |    2 +-
>   include/linux/rseq.h             |   23 +++++++++++++++++------
>   init/Kconfig                     |    2 +-
>   kernel/entry/common.c            |   26 +++++++++++++++++++-------
>   kernel/rseq.c                    |    8 ++++++--
>   6 files changed, 46 insertions(+), 22 deletions(-)
> 
> --- a/include/linux/irq-entry-common.h
> +++ b/include/linux/irq-entry-common.h
> @@ -197,11 +197,8 @@ static __always_inline void arch_exit_to
>    */
>   void arch_do_signal_or_restart(struct pt_regs *regs);
>   
> -/**
> - * exit_to_user_mode_loop - do any pending work before leaving to user space
> - */
> -unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> -				     unsigned long ti_work);
> +/* Handle pending TIF work */
> +unsigned long exit_to_user_mode_loop(struct pt_regs *regs, unsigned long ti_work);
>   
>   /**
>    * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
> --- a/include/linux/resume_user_mode.h
> +++ b/include/linux/resume_user_mode.h
> @@ -59,7 +59,7 @@ static inline void resume_user_mode_work
>   	mem_cgroup_handle_over_high(GFP_KERNEL);
>   	blkcg_maybe_throttle_current();
>   
> -	rseq_handle_notify_resume(regs);
> +	rseq_handle_slowpath(regs);
>   }
>   
>   #endif /* LINUX_RESUME_USER_MODE_H */
> --- a/include/linux/rseq.h
> +++ b/include/linux/rseq.h
> @@ -5,13 +5,19 @@
>   #ifdef CONFIG_RSEQ
>   #include <linux/sched.h>
>   
> -void __rseq_handle_notify_resume(struct pt_regs *regs);
> +void __rseq_handle_slowpath(struct pt_regs *regs);
>   
> -static inline void rseq_handle_notify_resume(struct pt_regs *regs)
> +/* Invoked from resume_user_mode_work() */
> +static inline void rseq_handle_slowpath(struct pt_regs *regs)
>   {
> -	/* '&' is intentional to spare one conditional branch */
> -	if (current->rseq.event.sched_switch & current->rseq.event.has_rseq)
> -		__rseq_handle_notify_resume(regs);
> +	if (IS_ENABLED(CONFIG_GENERIC_ENTRY)) {
> +		if (current->rseq.event.slowpath)
> +			__rseq_handle_slowpath(regs);
> +	} else {
> +		/* '&' is intentional to spare one conditional branch */
> +		if (current->rseq.event.sched_switch & current->rseq.event.has_rseq)

Ref. to earlier comment about has_rseq check perhaps redundant.

> +			__rseq_handle_slowpath(regs);
> +	}
>   }
>   
>   void __rseq_signal_deliver(int sig, struct pt_regs *regs);
> @@ -142,11 +148,16 @@ static inline void rseq_fork(struct task
>   	} else {
>   		t->rseq = current->rseq;
>   		t->rseq.ids.cpu_cid = ~0ULL;

As discussed earlier, do we really want to clear cpu_cid here, or
copy from parent ? If we keep the parent's cached values, I suspect
we can skip the page fault on return from fork in many cases.

> +		/*
> +		 * If it has rseq, force it into the slow path right away
> +		 * because it is guaranteed to fault.
> +		 */
> +		t->rseq.event.slowpath = t->rseq.event.has_rseq;

I think we can do better here. It's only guaranteed to fault if:

- has_rseq is set, AND
   - cpu or cid has changed compared to the cached value OR
   - rseq_cs user pointer is non-NULL.

Otherwise we should be able to handle the return from fork from the fast
path just with loads from the rseq area, or am I missing something ?

Thanks,

Mathieu

>   	}
>   }
>   
>   #else /* CONFIG_RSEQ */
> -static inline void rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) { }
> +static inline void rseq_handle_slowpath(struct pt_regs *regs) { }
>   static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { }
>   static inline void rseq_sched_switch_event(struct task_struct *t) { }
>   static inline void rseq_sched_set_task_cpu(struct task_struct *t, unsigned int cpu) { }
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1911,7 +1911,7 @@ config RSEQ_DEBUG_DEFAULT_ENABLE
>   config DEBUG_RSEQ
>   	default n
>   	bool "Enable debugging of rseq() system call" if EXPERT
> -	depends on RSEQ && DEBUG_KERNEL
> +	depends on RSEQ && DEBUG_KERNEL && !GENERIC_ENTRY

I'm confused about this hunk. Perhaps this belongs to a different
commit ?

Thanks,

Mathieu

>   	select RSEQ_DEBUG_DEFAULT_ENABLE
>   	help
>   	  Enable extra debugging checks for the rseq system call.
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -11,13 +11,8 @@
>   /* Workaround to allow gradual conversion of architecture code */
>   void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
>   
> -/**
> - * exit_to_user_mode_loop - do any pending work before leaving to user space
> - * @regs:	Pointer to pt_regs on entry stack
> - * @ti_work:	TIF work flags as read by the caller
> - */
> -__always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> -						     unsigned long ti_work)
> +static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *regs,
> +							      unsigned long ti_work)
>   {
>   	/*
>   	 * Before returning to user space ensure that all pending work
> @@ -62,6 +57,23 @@ void __weak arch_do_signal_or_restart(st
>   	return ti_work;
>   }
>   
> +/**
> + * exit_to_user_mode_loop - do any pending work before leaving to user space
> + * @regs:	Pointer to pt_regs on entry stack
> + * @ti_work:	TIF work flags as read by the caller
> + */
> +__always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> +						     unsigned long ti_work)
> +{
> +	for (;;) {
> +		ti_work = __exit_to_user_mode_loop(regs, ti_work);
> +
> +		if (likely(!rseq_exit_to_user_mode_restart(regs)))
> +			return ti_work;
> +		ti_work = read_thread_flags();
> +	}
> +}
> +
>   noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>   {
>   	irqentry_state_t ret = {
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -234,7 +234,11 @@ static bool rseq_handle_cs(struct task_s
>   
>   static void rseq_slowpath_update_usr(struct pt_regs *regs)
>   {
> -	/* Preserve rseq state and user_irq state for exit to user */
> +	/*
> +	 * Preserve rseq state and user_irq state. The generic entry code
> +	 * clears user_irq on the way out, the non-generic entry
> +	 * architectures are not having user_irq.
> +	 */
>   	const struct rseq_event evt_mask = { .has_rseq = true, .user_irq = true, };
>   	struct task_struct *t = current;
>   	struct rseq_ids ids;
> @@ -286,7 +290,7 @@ static void rseq_slowpath_update_usr(str
>   	}
>   }
>   
> -void __rseq_handle_notify_resume(struct pt_regs *regs)
> +void __rseq_handle_slowpath(struct pt_regs *regs)
>   {
>   	/*
>   	 * If invoked from hypervisors before entering the guest via
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

  reply	other threads:[~2025-09-11 14:45 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-08 21:31 [patch V4 00/36] rseq: Optimize exit to user space Thomas Gleixner
2025-09-08 21:31 ` [patch V4 01/36] rseq: Avoid pointless evaluation in __rseq_notify_resume() Thomas Gleixner
2025-09-08 21:31 ` [patch V4 02/36] rseq: Condense the inline stubs Thomas Gleixner
2025-09-08 21:31 ` [patch V4 03/36] rseq: Move algorithm comment to top Thomas Gleixner
2025-09-08 21:31 ` [patch V4 04/36] rseq: Remove the ksig argument from rseq_handle_notify_resume() Thomas Gleixner
2025-09-08 21:31 ` [patch V4 05/36] rseq: Simplify registration Thomas Gleixner
2025-09-08 21:31 ` [patch V4 06/36] rseq: Simplify the event notification Thomas Gleixner
2025-09-09 13:18   ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 07/36] rseq, virt: Retrigger RSEQ after vcpu_run() Thomas Gleixner
2025-09-09  0:00   ` Sean Christopherson
2025-09-09 12:10     ` Thomas Gleixner
2025-09-09 13:21   ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 08/36] rseq: Avoid CPU/MM CID updates when no event pending Thomas Gleixner
2025-09-09 13:25   ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 09/36] rseq: Introduce struct rseq_data Thomas Gleixner
2025-09-09 13:30   ` Mathieu Desnoyers
2025-09-12 20:44     ` Thomas Gleixner
2025-09-12 21:33       ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 10/36] entry: Cleanup header Thomas Gleixner
2025-09-08 21:31 ` [patch V4 11/36] entry: Remove syscall_enter_from_user_mode_prepare() Thomas Gleixner
2025-09-09 13:33   ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 12/36] entry: Inline irqentry_enter/exit_from/to_user_mode() Thomas Gleixner
2025-09-09 13:38   ` Mathieu Desnoyers
2025-09-09 14:10     ` Thomas Gleixner
2025-09-09 14:59       ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 13/36] sched: Move MM CID related functions to sched.h Thomas Gleixner
2025-09-08 21:31 ` [patch V4 14/36] rseq: Cache CPU ID and MM CID values Thomas Gleixner
2025-09-09 13:43   ` Mathieu Desnoyers
2025-09-09 14:13     ` Thomas Gleixner
2025-09-09 15:01       ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 15/36] rseq: Record interrupt from user space Thomas Gleixner
2025-09-09 13:53   ` Mathieu Desnoyers
2025-09-09 14:17     ` Thomas Gleixner
2025-09-09 15:05       ` Mathieu Desnoyers
2025-09-08 21:31 ` [patch V4 16/36] rseq: Provide tracepoint wrappers for inline code Thomas Gleixner
2025-09-08 21:31 ` [patch V4 17/36] rseq: Expose lightweight statistics in debugfs Thomas Gleixner
2025-09-08 21:32 ` [patch V4 18/36] rseq: Provide static branch for runtime debugging Thomas Gleixner
2025-09-08 21:32 ` [patch V4 19/36] rseq: Provide and use rseq_update_user_cs() Thomas Gleixner
2025-09-09 15:11   ` Mathieu Desnoyers
2025-09-08 21:32 ` [patch V4 20/36] rseq: Replace the original debug implementation Thomas Gleixner
2025-09-08 21:32 ` [patch V4 21/36] rseq: Make exit debugging static branch based Thomas Gleixner
2025-09-08 21:32 ` [patch V4 22/36] rseq: Use static branch for syscall exit debug when GENERIC_IRQ_ENTRY=y Thomas Gleixner
2025-09-08 21:32 ` [patch V4 23/36] rseq: Provide and use rseq_set_ids() Thomas Gleixner
2025-09-11 13:40   ` Mathieu Desnoyers
2025-09-11 16:02     ` Thomas Gleixner
2025-09-11 17:13       ` Mathieu Desnoyers
2025-09-08 21:32 ` [patch V4 24/36] rseq: Separate the signal delivery path Thomas Gleixner
2025-09-08 21:32 ` [patch V4 25/36] rseq: Rework the TIF_NOTIFY handler Thomas Gleixner
2025-09-08 21:32 ` [patch V4 26/36] rseq: Optimize event setting Thomas Gleixner
2025-09-11 14:03   ` Mathieu Desnoyers
2025-09-11 16:06     ` Thomas Gleixner
2025-09-11 17:15       ` Mathieu Desnoyers
2025-09-12  6:58         ` Thomas Gleixner
2025-09-08 21:32 ` [patch V4 27/36] rseq: Implement fast path for exit to user Thomas Gleixner
2025-09-11 14:27   ` Mathieu Desnoyers
2025-09-11 16:08     ` Thomas Gleixner
2025-09-08 21:32 ` [patch V4 28/36] rseq: Switch to fast path processing on " Thomas Gleixner
2025-09-11 14:44   ` Mathieu Desnoyers [this message]
2025-09-11 14:45     ` Mathieu Desnoyers
2025-09-11 16:50       ` Thomas Gleixner
2025-09-11 16:47     ` Thomas Gleixner
2025-09-11 20:00       ` Mathieu Desnoyers
2025-09-12 14:22         ` Thomas Gleixner
2025-09-12 15:44           ` Mathieu Desnoyers
2025-09-08 21:32 ` [patch V4 29/36] entry: Split up exit_to_user_mode_prepare() Thomas Gleixner
2025-09-08 21:32 ` [patch V4 30/36] rseq: Split up rseq_exit_to_user_mode() Thomas Gleixner
2025-09-08 21:32 ` [patch V4 31/36] asm-generic: Provide generic TIF infrastructure Thomas Gleixner
2025-09-17  6:16   ` [tip: core/core] " tip-bot2 for Thomas Gleixner
2025-09-08 21:32 ` [patch V4 32/36] x86: Use generic TIF bits Thomas Gleixner
2025-09-17  6:16   ` [tip: core/core] " tip-bot2 for Thomas Gleixner
2025-09-08 21:32 ` [patch V4 33/36] s390: " Thomas Gleixner
2025-09-11  9:11   ` Sven Schnelle
2025-09-11 11:03   ` Heiko Carstens
2025-09-17  6:16   ` [tip: core/core] " tip-bot2 for Thomas Gleixner
2025-09-08 21:32 ` [patch V4 34/36] loongarch: " Thomas Gleixner
2025-09-17  6:16   ` [tip: core/core] " tip-bot2 for Thomas Gleixner
2025-09-08 21:32 ` [patch V4 35/36] riscv: " Thomas Gleixner
2025-09-17  6:16   ` [tip: core/core] " tip-bot2 for Thomas Gleixner
2025-09-08 21:32 ` [patch V4 36/36] rseq: Switch to TIF_RSEQ if supported Thomas Gleixner
2025-09-10 13:55 ` [patch V4 00/36] rseq: Optimize exit to user space Jens Axboe
2025-09-10 14:45   ` Michael Jeanson
2025-09-10 15:34     ` Jens Axboe
2025-09-10 14:54   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84d9beb2-85e7-4fc0-b403-0514bd87ae8b@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=arnd@arndb.de \
    --cc=axboe@kernel.dk \
    --cc=boqun.feng@gmail.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=chenhuacai@kernel.org \
    --cc=decui@microsoft.com \
    --cc=hca@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjeanson@efficios.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=seanjc@google.com \
    --cc=svens@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=wei.liu@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox