All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Jens Axboe <axboe@kernel.dk>, LKML <linux-kernel@vger.kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>, Wei Liu <wei.liu@kernel.org>
Subject: Re: [patch 00/11] rseq: Optimize exit to user space
Date: Sun, 17 Aug 2025 23:23:07 +0200	[thread overview]
Message-ID: <87y0rh63t0.ffs@tglx> (raw)
In-Reply-To: <877bz67u3j.ffs@tglx>

On Thu, Aug 14 2025 at 00:08, Thomas Gleixner wrote:
> On Wed, Aug 13 2025 at 15:36, Jens Axboe wrote:
>> On 8/13/25 3:32 PM, Thomas Gleixner wrote:
>>> Could you give it a test ride to see whether this makes a difference in
>>> your environment?
>>
>> Yep, I'll give a spin.
>
> Appreciated.

Please do not use the git branch I had in the cover letter. I did some
more analysis of this and it's even worse than I thought. Use

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/wip

instead.

I've rewritten the whole pile by now and made it a real fast path
without the TIF_NOTIFY horror show, unless the fast path, which runs
_after_ the TIF work loop faults. So far that happens once for each
fork() as that has to fault in the copy of the user space rseq region.

There are lightweight per CPU stats for the various events, which can be
accessed via

      /sys/kernel/debug/rseq/stat

which exposes a racy sum of them. Here is the output after a reboot and
a full kernel recompile

exit:           85703905    // Total invocations
signal:            34635    // Invocations from signal delivery
slowp:               134    // Slow path via TIF_NOTIFY_RESUME
ids:               70052    // Updates of CPU and MM CID
cs:                    0    // Critical section analysis
clear:                 0    // Clearing of critical section
fixup:                 0    // Fixup of critical section (abort)

Before the rewrite this took more than a million of ID updates and
critical section evaluations even when completely pointless.

So on any syscall or interrupt heavy workload this should be clearly
visible as a difference in the profile.

I'm still not happy about the exit to user fast path decision as it's
two conditionals instead of one, but all attempts to do that lightweight
somewhere else turned out to make stuff worse as I just burdened other
fast path operations, i.e. the scheduler with pointless conditionals.

I'll think about that more, but nevertheless this is way better than the
current horror show.

I also have no real good plan yet how to gradually convert this over,
but I'm way too tired to think about that now.

It survives the self test suite after I wasted a day to figure out why
the selftests reliably segfault on a machine which has debian trixie
installed. The fix is in the branch.

Michael, can you please run your librseq tests against that too? They
have the same segfault problem as the kernel and they lack a run script,
so I couldn't be bothered to test against them. See commit 2bff3a0e5998
in that branch. I'll send out a patch with a proper change log later.

Thanks,

        tglx





  reply	other threads:[~2025-08-17 21:23 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-13 16:29 [patch 00/11] rseq: Optimize exit to user space Thomas Gleixner
2025-08-13 16:29 ` [patch 01/11] rseq: Avoid pointless evaluation in __rseq_notify_resume() Thomas Gleixner
2025-08-20 14:23   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 02/11] rseq: Condense the inline stubs Thomas Gleixner
2025-08-20 14:24   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 03/11] rseq: Rename rseq_syscall() to rseq_debug_syscall_exit() Thomas Gleixner
2025-08-20 14:25   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 04/11] rseq: Replace the pointless event mask bit fiddling Thomas Gleixner
2025-08-13 16:29 ` [patch 05/11] rseq: Optimize the signal delivery path Thomas Gleixner
2025-08-13 16:29 ` [patch 06/11] rseq: Optimize exit to user space further Thomas Gleixner
2025-08-13 16:29 ` [patch 07/11] entry: Cleanup header Thomas Gleixner
2025-08-13 17:09   ` Giorgi Tchankvetadze
2025-08-13 21:30     ` Thomas Gleixner
2025-08-13 16:29 ` [patch 08/11] entry: Distinguish between syscall and interrupt exit Thomas Gleixner
2025-08-13 16:29 ` [patch 09/11] entry: Provide exit_to_user_notify_resume() Thomas Gleixner
2025-08-13 16:29 ` [patch 10/11] rseq: Skip fixup when returning from a syscall Thomas Gleixner
2025-08-14  8:54   ` Peter Zijlstra
2025-08-14 13:24     ` Thomas Gleixner
2025-08-13 16:29 ` [patch 11/11] rseq: Convert to masked user access where applicable Thomas Gleixner
2025-08-13 17:45 ` [patch 00/11] rseq: Optimize exit to user space Jens Axboe
2025-08-13 21:32   ` Thomas Gleixner
2025-08-13 21:36     ` Jens Axboe
2025-08-13 22:08       ` Thomas Gleixner
2025-08-17 21:23         ` Thomas Gleixner [this message]
2025-08-18 14:00           ` BUG: rseq selftests and librseq vs. glibc fail Thomas Gleixner
2025-08-18 14:15             ` Florian Weimer
2025-08-18 17:13               ` Thomas Gleixner
2025-08-18 19:33                 ` Florian Weimer
2025-08-18 19:46                   ` Sean Christopherson
2025-08-18 19:55                     ` Florian Weimer
2025-08-18 20:27                       ` Sean Christopherson
2025-08-18 23:54                         ` Thomas Gleixner
2025-08-19  0:28                           ` Sean Christopherson
2025-08-19  6:18                             ` Florian Weimer
2025-08-29 18:44                 ` Prakash Sangappa
2025-08-29 18:50                   ` Mathieu Desnoyers
2025-09-01 19:30                     ` Prakash Sangappa
2025-08-18 17:38           ` [patch 00/11] rseq: Optimize exit to user space Michael Jeanson
2025-08-18 20:21             ` Thomas Gleixner
2025-08-18 21:29               ` Michael Jeanson
2025-08-18 23:43                 ` Thomas Gleixner
2025-08-20 14:27           ` Mathieu Desnoyers
2025-08-20 14:10 ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y0rh63t0.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=axboe@kernel.dk \
    --cc=boqun.feng@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mjeanson@efficios.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.