From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
Peter Zijlstra <peterz@infradead.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
x86@kernel.org, Arnd Bergmann <arnd@arndb.de>,
Heiko Carstens <hca@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Huacai Chen <chenhuacai@kernel.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>
Subject: Re: [patch V2 37/37] entry/rseq: Optimize for TIF_RSEQ on exit
Date: Mon, 25 Aug 2025 15:43:32 -0400
Message-ID: <0090fb14-e78f-4b67-8933-bf9ef89ba0d9@efficios.com>
In-Reply-To: <20250823161655.651830871@linutronix.de>
On 2025-08-23 12:40, Thomas Gleixner wrote:
> Further analysis of the exit path with the separate TIF_RSEQ showed that,
> depending on the workload, a significant number of invocations of
> resume_user_mode_work() end up with no other bit set than TIF_RSEQ.
>
> On architectures with a separate TIF_RSEQ this can be distinguished and
> checked right at the beginning of the function before entering the loop.
>
> The quick check is lightweight, so it does not impose a massive penalty on
> non-RSEQ use cases. It just checks whether the work is empty except for
> TIF_RSEQ and, if so, jumps right into the handling fast path.
>
> This is truly the only TIF bit there which can be optimized that way
> because the handling runs only when all the other work has been done. The
> optimization spares a full round trip through the other conditionals and an
> interrupt enable/disable pair. The generated code looks reasonable enough
> to justify this and the resulting numbers do so as well.
>
> The main beneficiaries are blocking-syscall-heavy workloads, where the
> tasks often end up being scheduled on a different CPU or get a different MM
> CID, but have no other work to handle on return.
>
> A futex benchmark showed up to 90% shortcut utilization and a measurable
> improvement in perf of ~1%. Non-scheduling workloads neither improve nor
> degrade. A full kernel build shows about 15% shortcuts, but no measurable
> side effects in either direction.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> include/linux/rseq_entry.h | 14 ++++++++++++++
> kernel/entry/common.c | 13 +++++++++++--
> kernel/rseq.c | 2 ++
> 3 files changed, 27 insertions(+), 2 deletions(-)
>
> --- a/include/linux/rseq_entry.h
> +++ b/include/linux/rseq_entry.h
> @@ -11,6 +11,7 @@ struct rseq_stats {
>  	unsigned long signal;
>  	unsigned long slowpath;
>  	unsigned long fastpath;
> +	unsigned long quicktif;
>  	unsigned long ids;
>  	unsigned long cs;
>  	unsigned long clear;
> @@ -532,6 +533,14 @@ rseq_exit_to_user_mode_work(struct pt_re
>  	return ti_work | _TIF_NOTIFY_RESUME;
>  }
>
> +static __always_inline bool
> +rseq_exit_to_user_mode_early(unsigned long ti_work, const unsigned long mask)
> +{
> +	if (IS_ENABLED(CONFIG_HAVE_GENERIC_TIF_BITS))
> +		return (ti_work & mask) == CHECK_TIF_RSEQ;
> +	return false;
> +}
> +
>  #endif /* !CONFIG_GENERIC_ENTRY */
>
>  static __always_inline void rseq_syscall_exit_to_user_mode(void)
> @@ -577,6 +586,11 @@ static inline unsigned long rseq_exit_to
>  {
>  	return ti_work;
>  }
> +
> +static inline bool rseq_exit_to_user_mode_early(unsigned long ti_work, const unsigned long mask)
> +{
> +	return false;
> +}
>  static inline void rseq_note_user_irq_entry(void) { }
>  static inline void rseq_syscall_exit_to_user_mode(void) { }
>  static inline void rseq_irqentry_exit_to_user_mode(void) { }
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -22,7 +22,14 @@ void __weak arch_do_signal_or_restart(st
>  	/*
>  	 * Before returning to user space ensure that all pending work
>  	 * items have been completed.
> +	 *
> +	 * Optimize for TIF_RSEQ being the only bit set.
>  	 */
> +	if (rseq_exit_to_user_mode_early(ti_work, EXIT_TO_USER_MODE_WORK)) {
> +		rseq_stat_inc(rseq_stats.quicktif);
> +		goto do_rseq;
> +	}
> +
>  	do {
>  		local_irq_enable_exit_to_user(ti_work);
>
> @@ -56,10 +63,12 @@ void __weak arch_do_signal_or_restart(st
>
>  		ti_work = read_thread_flags();
>
> + do_rseq:
>  		/*
>  		 * This returns the unmodified ti_work, when ti_work is not
> -		 * empty. In that case it waits for the next round to avoid
> -		 * multiple updates in case of rescheduling.
> +		 * empty (except for TIF_RSEQ). In that case it waits for
> +		 * the next round to avoid multiple updates in case of
> +		 * rescheduling.
>  		 *
>  		 * When it handles rseq it returns either with empty work
>  		 * on success or with TIF_NOTIFY_RESUME set on failure to
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -134,6 +134,7 @@ static int rseq_stats_show(struct seq_fi
>  		stats.signal += data_race(per_cpu(rseq_stats.signal, cpu));
>  		stats.slowpath += data_race(per_cpu(rseq_stats.slowpath, cpu));
>  		stats.fastpath += data_race(per_cpu(rseq_stats.fastpath, cpu));
> +		stats.quicktif += data_race(per_cpu(rseq_stats.quicktif, cpu));
>  		stats.ids += data_race(per_cpu(rseq_stats.ids, cpu));
>  		stats.cs += data_race(per_cpu(rseq_stats.cs, cpu));
>  		stats.clear += data_race(per_cpu(rseq_stats.clear, cpu));
> @@ -144,6 +145,7 @@ static int rseq_stats_show(struct seq_fi
>  	seq_printf(m, "signal: %16lu\n", stats.signal);
>  	seq_printf(m, "slowp: %16lu\n", stats.slowpath);
>  	seq_printf(m, "fastp: %16lu\n", stats.fastpath);
> +	seq_printf(m, "quickt: %16lu\n", stats.quicktif);
>  	seq_printf(m, "ids: %16lu\n", stats.ids);
>  	seq_printf(m, "cs: %16lu\n", stats.cs);
>  	seq_printf(m, "clear: %16lu\n", stats.clear);
>
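
For anyone following along, the control flow of the shortcut in the quoted
patch can be modelled in plain user-space C. This is only a rough sketch
and not the kernel code: every DEMO_* flag value, struct demo_task and all
demo_*() helpers are made up for illustration, and the "other work" handling
is reduced to clearing two bits. It assumes, as the changelog says, that the
rseq handling only acts once no other exit work is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real TIF values are defined per architecture. */
#define DEMO_TIF_NEED_RESCHED	(1UL << 0)
#define DEMO_TIF_SIGPENDING	(1UL << 1)
#define DEMO_TIF_RSEQ		(1UL << 2)
#define DEMO_TIF_UNRELATED	(1UL << 8)	/* not part of the exit work mask */

#define DEMO_EXIT_WORK \
	(DEMO_TIF_NEED_RESCHED | DEMO_TIF_SIGPENDING | DEMO_TIF_RSEQ)

struct demo_task {
	unsigned long flags;		/* stands in for the thread flags */
	unsigned int irq_toggles;	/* modelled irq enable/disable pairs */
	unsigned int quicktif;		/* shortcut hits, like the new stat */
};

/* Same shape as the check in the patch: within mask, only the rseq bit is set. */
static bool demo_rseq_only(unsigned long ti_work, unsigned long mask)
{
	return (ti_work & mask) == DEMO_TIF_RSEQ;
}

/* Returns ti_work unmodified unless rseq is the only pending exit work. */
static unsigned long demo_handle_rseq(struct demo_task *t, unsigned long ti_work)
{
	if ((ti_work & DEMO_EXIT_WORK) != DEMO_TIF_RSEQ)
		return ti_work;
	t->flags &= ~DEMO_TIF_RSEQ;
	return ti_work & ~DEMO_TIF_RSEQ;
}

/* Callers are expected to invoke this only when some exit work is pending. */
static void demo_exit_loop(struct demo_task *t)
{
	unsigned long ti_work = t->flags;

	/* Quick check: skip the first round trip through the loop body. */
	if (demo_rseq_only(ti_work, DEMO_EXIT_WORK)) {
		t->quicktif++;
		goto do_rseq;
	}

	do {
		/* One modelled irq enable/disable pair per round. */
		t->irq_toggles++;
		/* Pretend all non-rseq work was handled in this round. */
		t->flags &= ~(DEMO_TIF_NEED_RESCHED | DEMO_TIF_SIGPENDING);
		ti_work = t->flags;
do_rseq:
		ti_work = demo_handle_rseq(t, ti_work);
	} while (ti_work & DEMO_EXIT_WORK);
}

/* Usage example: rseq-only work takes the shortcut, mixed work does not. */
static void demo_self_check(void)
{
	struct demo_task only_rseq = { .flags = DEMO_TIF_RSEQ };
	struct demo_task mixed = { .flags = DEMO_TIF_RSEQ | DEMO_TIF_SIGPENDING };

	demo_exit_loop(&only_rseq);
	assert(only_rseq.quicktif == 1 && only_rseq.irq_toggles == 0);

	demo_exit_loop(&mixed);
	assert(mixed.quicktif == 0 && mixed.irq_toggles == 1);
	assert(only_rseq.flags == 0 && mixed.flags == 0);
}
```

In this model the rseq-only case finishes without a single modelled
interrupt enable/disable pair, while the mixed case pays for one round
before the rseq handling runs. Bits outside the mask do not defeat the
shortcut, because the comparison is done after masking.
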
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com