linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 00/11] rseq: Optimize exit to user space
@ 2025-08-13 16:29 Thomas Gleixner
  2025-08-13 16:29 ` [patch 01/11] rseq: Avoid pointless evaluation in __rseq_notify_resume() Thomas Gleixner
                   ` (12 more replies)
  0 siblings, 13 replies; 43+ messages in thread
From: Thomas Gleixner @ 2025-08-13 16:29 UTC (permalink / raw)
  To: LKML
  Cc: Michael Jeanson, Mathieu Desnoyers, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Wei Liu, Jens Axboe

With the more wide spread usage of rseq in glibc, rseq is not longer a
niche use case for special applications.

While working on a sane implementation of a rseq based time slice extension
mechanism, I noticed several shortcomings of the current rseq code:

  1) task::rseq_event_mask is a pointless bitfield despite the fact that
     the ABI flags it was meant to support have been deprecated and
     functionally disabled three years ago.

  2) task::rseq_event_mask is accumulating bits unless there is a critical
     section discovered in the user space rseq memory. This results in
     pointless invocations of the rseq user space exit handler even if
     there had nothing changed. As a matter of correctness these bits have
     to be clear when exiting to user space and therefore pristine when
     coming back into the kernel. Aside of correctness, this also avoids
     pointless evaluation of the user space memory, which is a performance
     benefit.

  3) The evaluation of critical sections does not differentiate between
     syscall and interrupt/exception exits. The current implementation
     silently fixes up critical sections which invoked a syscall unless
     CONFIG_DEBUG_RSEQ is enabled.

     That's just wrong. If user space does that on a production kernel it
     can keep the pieces. The kernel is not there to proliferate mindless
     user space programming and letting everyone pay the performance
     penalty.

This series addresses these issues and on top converts parts of the user
space access over to the new masked access model, which lowers the overhead
of Spectre-V1 mitigations significantly on architectures which support it
(x86 as of today). This is especially noticable in the access to the
rseq_cs field in struct rseq, which is the first quick check to figure out
whether a critical section is installed or not.

It survives the kernels rseq selftests, but I did not any performance tests
vs. rseq because I have no idea how to use the gazillion of undocumented
command line parameters of the benchmark. I leave that to people who are so
familiar with them, that they assume everyone else is too :)

The performance gain on regular workloads is clearly measurable and the
consistent event flag state allows now to build the time slice extension
mechanism on top. The first POC I implemented:

   https://lore.kernel.org/lkml/87o6smb3a0.ffs@tglx/

suffered badly from the stale eventmask bits and the cleaned up version
brought a whopping 25+% performance gain.

The series depends on the seperately posted rseq bugfix:

   https://lore.kernel.org/lkml/87o6sj6z95.ffs@tglx/

and the uaccess generic helper series:

   https://lore.kernel.org/lkml/20250813150610.521355442@linutronix.de/

and a related futex fix in

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/urgent

For your conveniance it is also available as a conglomorate from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/perf

Thanks,

	tglx
---
 drivers/hv/mshv_common.c         |    2 
 include/linux/entry-common.h     |   12 +--
 include/linux/irq-entry-common.h |   31 ++++++--
 include/linux/resume_user_mode.h |   40 +++++++---
 include/linux/rseq.h             |  115 +++++++++----------------------
 include/linux/sched.h            |   10 +-
 include/uapi/linux/rseq.h        |   21 +----
 io_uring/io_uring.h              |    2 
 kernel/entry/common.c            |    9 +-
 kernel/entry/kvm.c               |    2 
 kernel/rseq.c                    |  142 +++++++++++++++++++++++++--------------
 kernel/sched/core.c              |    8 +-
 kernel/sched/membarrier.c        |    8 +-
 13 files changed, 213 insertions(+), 189 deletions(-)


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2025-09-01 19:31 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13 16:29 [patch 00/11] rseq: Optimize exit to user space Thomas Gleixner
2025-08-13 16:29 ` [patch 01/11] rseq: Avoid pointless evaluation in __rseq_notify_resume() Thomas Gleixner
2025-08-20 14:23   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 02/11] rseq: Condense the inline stubs Thomas Gleixner
2025-08-20 14:24   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 03/11] rseq: Rename rseq_syscall() to rseq_debug_syscall_exit() Thomas Gleixner
2025-08-20 14:25   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 04/11] rseq: Replace the pointless event mask bit fiddling Thomas Gleixner
2025-08-13 16:29 ` [patch 05/11] rseq: Optimize the signal delivery path Thomas Gleixner
2025-08-13 16:29 ` [patch 06/11] rseq: Optimize exit to user space further Thomas Gleixner
2025-08-13 16:29 ` [patch 07/11] entry: Cleanup header Thomas Gleixner
2025-08-13 17:09   ` Giorgi Tchankvetadze
2025-08-13 21:30     ` Thomas Gleixner
2025-08-13 16:29 ` [patch 08/11] entry: Distinguish between syscall and interrupt exit Thomas Gleixner
2025-08-13 16:29 ` [patch 09/11] entry: Provide exit_to_user_notify_resume() Thomas Gleixner
2025-08-13 16:29 ` [patch 10/11] rseq: Skip fixup when returning from a syscall Thomas Gleixner
2025-08-14  8:54   ` Peter Zijlstra
2025-08-14 13:24     ` Thomas Gleixner
2025-08-13 16:29 ` [patch 11/11] rseq: Convert to masked user access where applicable Thomas Gleixner
2025-08-13 17:45 ` [patch 00/11] rseq: Optimize exit to user space Jens Axboe
2025-08-13 21:32   ` Thomas Gleixner
2025-08-13 21:36     ` Jens Axboe
2025-08-13 22:08       ` Thomas Gleixner
2025-08-17 21:23         ` Thomas Gleixner
2025-08-18 14:00           ` BUG: rseq selftests and librseq vs. glibc fail Thomas Gleixner
2025-08-18 14:15             ` Florian Weimer
2025-08-18 17:13               ` Thomas Gleixner
2025-08-18 19:33                 ` Florian Weimer
2025-08-18 19:46                   ` Sean Christopherson
2025-08-18 19:55                     ` Florian Weimer
2025-08-18 20:27                       ` Sean Christopherson
2025-08-18 23:54                         ` Thomas Gleixner
2025-08-19  0:28                           ` Sean Christopherson
2025-08-19  6:18                             ` Florian Weimer
2025-08-29 18:44                 ` Prakash Sangappa
2025-08-29 18:50                   ` Mathieu Desnoyers
2025-09-01 19:30                     ` Prakash Sangappa
2025-08-18 17:38           ` [patch 00/11] rseq: Optimize exit to user space Michael Jeanson
2025-08-18 20:21             ` Thomas Gleixner
2025-08-18 21:29               ` Michael Jeanson
2025-08-18 23:43                 ` Thomas Gleixner
2025-08-20 14:27           ` Mathieu Desnoyers
2025-08-20 14:10 ` Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).