linux-kernel.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>, Wei Liu <wei.liu@kernel.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: [patch 10/11] rseq: Skip fixup when returning from a syscall
Date: Wed, 13 Aug 2025 18:29:37 +0200 (CEST)
Message-ID: <20250813162824.420583910@linutronix.de>
In-Reply-To: 20250813155941.014821755@linutronix.de

Like all other TIF_NOTIFY_RESUME work, the restartable sequences handler is
invoked unconditionally whenever TIF_NOTIFY_RESUME is set, for whatever
reason.

The invocation is already conditional on the rseq_event_pending bit being
set, but there is further room for improvement.

The heavy lifting of the critical section fixup can be avoided completely
when the exit to user mode loop is entered from a syscall, unless the
kernel is a debug kernel. So far, the RSEQ code had no way to distinguish
that case.

On architectures which enable CONFIG_GENERIC_ENTRY, this information is
now available through the 'from_irq' argument of
exit_to_user_notify_resume(), which tells whether the invocation comes from
a syscall return or an interrupt return.
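A minimal user-space model of that plumbing follows as an illustration
only. The work bits and the names exit_work_loop(), syscall_exit_to_user()
and irq_exit_to_user() are hypothetical and do not correspond to the
kernel's entry code. The model shows the idea: both exit paths share one
work loop, and the origin is threaded through as a plain boolean, so the
notify-resume handler can tell them apart without extra state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work bits; the real TIF_* flags are per architecture. */
#define WORK_NOTIFY_RESUME (1u << 0)
#define WORK_SIGPENDING    (1u << 1)

/* What the notify-resume handler was told: -1 = not invoked. */
static int last_origin = -1;

/* Stand-in for exit_to_user_notify_resume(regs, from_irq). */
static void notify_resume(bool from_irq)
{
	last_origin = from_irq ? 1 : 0;
}

/*
 * Model of the shared exit work loop: the origin is simply passed down
 * as an argument to the handler.
 */
static void exit_work_loop(unsigned int work, bool from_irq)
{
	/* Unknown bits would never be cleared and the loop would spin. */
	assert((work & ~(WORK_NOTIFY_RESUME | WORK_SIGPENDING)) == 0);

	while (work) {
		if (work & WORK_NOTIFY_RESUME) {
			work &= ~WORK_NOTIFY_RESUME;
			notify_resume(from_irq);
		}
		if (work & WORK_SIGPENDING)
			work &= ~WORK_SIGPENDING;	/* signal delivery elided */
	}
}

/* Runs the loop and reports the origin seen by the handler. */
static int run_exit(unsigned int work, bool from_irq)
{
	last_origin = -1;
	exit_work_loop(work, from_irq);
	return last_origin;
}

/* The two callers differ only in the constant they pass down. */
static int syscall_exit_to_user(unsigned int work)
{
	return run_exit(work, false);
}

static int irq_exit_to_user(unsigned int work)
{
	return run_exit(work, true);
}
```

In this model, syscall_exit_to_user(WORK_NOTIFY_RESUME) makes the handler
see from_irq == false, irq_exit_to_user(WORK_NOTIFY_RESUME) makes it see
true, and with the notify bit clear the handler is not invoked at all.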

Let the RSEQ code utilize this 'from_irq' argument when

    - CONFIG_GENERIC_ENTRY is enabled
    - CONFIG_DEBUG_RSEQ is disabled

and skip the critical section fixup when the invocation comes from a
syscall return. The CPU and node ID update has to happen in both cases, so
the out of line call must always be made when an event is pending, whether
it's a syscall return or not.
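The resulting decision can be sketched as a small user-space model. It
assumes a task that has registered rseq and models only the
TIF_NOTIFY_RESUME path (ksig == NULL). struct rseq_cfg stands in for the
IS_ENABLED() checks and model_notify_resume() is a hypothetical helper;
ignore_event() mirrors the logic of rseq_ignore_event() as added by this
patch. It is a sketch of the control flow, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IS_ENABLED(CONFIG_DEBUG_RSEQ) and
 * IS_ENABLED(CONFIG_GENERIC_ENTRY). */
struct rseq_cfg {
	bool debug_rseq;
	bool generic_entry;
};

/* What one pass through the notify-resume path ends up doing. */
struct rseq_action {
	bool out_of_line;	/* __rseq_handle_notify_resume() is called */
	bool update_ids;	/* CPU and node ID are updated */
	bool fixup;		/* rseq_ip_fixup() is entered */
};

/* Mirrors the logic of rseq_ignore_event() from the patch. */
static bool ignore_event(struct rseq_cfg c, bool from_irq, bool ksig)
{
	/* Debug kernels always enter the fixup; without generic entry,
	 * from_irq is not usable. */
	if (c.debug_rseq || !c.generic_entry)
		return false;
	/* Syscall return: neither interrupt return nor signal delivery. */
	return !from_irq && !ksig;
}

/* Hypothetical model of the TIF_NOTIFY_RESUME path (ksig == NULL). */
static struct rseq_action model_notify_resume(struct rseq_cfg c, bool pending,
					      bool from_irq)
{
	struct rseq_action a = { false, false, false };
	bool event = pending;

	/* Inline gate in rseq_handle_notify_resume(). */
	if (!c.debug_rseq && !pending)
		return a;

	a.out_of_line = true;
	/* The ID update is needed for syscall and interrupt returns alike. */
	a.update_ids = true;

	if (ignore_event(c, from_irq, false))
		event = false;

	a.fixup = c.debug_rseq || event;
	/* Invariant: the fixup only ever runs inside the out of line call. */
	assert(!a.fixup || a.out_of_line);
	return a;
}
```

On a non-debug GENERIC_ENTRY kernel, a syscall return with an event pending
still makes the out of line call and updates the IDs but skips the fixup.
An interrupt return, a kernel without GENERIC_ENTRY, or a CONFIG_DEBUG_RSEQ
kernel still enters it.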

This changes the current behaviour, which blindly fixes up the critical
section in the syscall case as well. But if user space invokes a syscall
from within a critical section and expects that to work, that's a user
space problem. Such code was clearly never tested on a debug kernel, where
CONFIG_DEBUG_RSEQ terminates a task that issues a syscall from within a
critical section with SIGSEGV, and user space can keep the pieces.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/resume_user_mode.h |    2 +-
 include/linux/rseq.h             |   14 +++++++-------
 kernel/rseq.c                    |   22 +++++++++++++++++++++-
 3 files changed, 29 insertions(+), 9 deletions(-)

--- a/include/linux/resume_user_mode.h
+++ b/include/linux/resume_user_mode.h
@@ -60,7 +60,7 @@ static inline void exit_to_user_notify_r
 	mem_cgroup_handle_over_high(GFP_KERNEL);
 	blkcg_maybe_throttle_current();
 
-	rseq_handle_notify_resume(regs);
+	rseq_handle_notify_resume(regs, from_irq);
 }
 
 #ifndef CONFIG_GENERIC_ENTRY
--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -13,19 +13,19 @@ static inline void rseq_set_notify_resum
 		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
 }
 
-void __rseq_handle_notify_resume(struct ksignal *sig, struct pt_regs *regs);
+void __rseq_handle_notify_resume(struct ksignal *sig, struct pt_regs *regs,
+				 bool from_irq);
 
-static inline void rseq_handle_notify_resume(struct pt_regs *regs)
+static inline void rseq_handle_notify_resume(struct pt_regs *regs, bool from_irq)
 {
-	if (IS_ENABLED(CONFIG_DEBUG_RESQ) || READ_ONCE(current->rseq_event_pending))
-		__rseq_handle_notify_resume(NULL, regs);
+	if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || READ_ONCE(current->rseq_event_pending))
+		__rseq_handle_notify_resume(NULL, regs, from_irq);
 }
 
-static inline void rseq_signal_deliver(struct ksignal *ksig,
-				       struct pt_regs *regs)
+static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs)
 {
 	if (current->rseq)
-		__rseq_handle_notify_resume(ksig, regs);
+		__rseq_handle_notify_resume(ksig, regs, false);
 }
 
 static inline void rseq_notify_event(struct task_struct *t)
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -408,6 +408,22 @@ static int rseq_ip_fixup(struct pt_regs
 	return 0;
 }
 
+static inline bool rseq_ignore_event(bool from_irq, bool ksig)
+{
+	/*
+	 * On architectures which do not select GENERIC_ENTRY,
+	 * @from_irq is not usable.
+	 */
+	if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || !IS_ENABLED(CONFIG_GENERIC_ENTRY))
+		return false;
+
+	/*
+	 * Avoid the heavy lifting when this is a return from syscall,
+	 * i.e. not from interrupt and not from signal delivery.
+	 */
+	return !from_irq && !ksig;
+}
+
 /*
  * This resume handler must always be executed between any of:
  * - preemption,
@@ -419,7 +435,8 @@ static int rseq_ip_fixup(struct pt_regs
  * respect to other threads scheduled on the same CPU, and with respect
  * to signal handlers.
  */
-void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
+void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs,
+				 bool from_irq)
 {
 	struct task_struct *t = current;
 	int ret, sig;
@@ -467,6 +484,9 @@ void __rseq_handle_notify_resume(struct
 			t->rseq_event_pending = false;
 		}
 
+		if (rseq_ignore_event(from_irq, !!ksig))
+			event = false;
+
 		if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event) {
 			ret = rseq_ip_fixup(regs, event);
 			if (unlikely(ret < 0))



Thread overview: 43+ messages
2025-08-13 16:29 [patch 00/11] rseq: Optimize exit to user space Thomas Gleixner
2025-08-13 16:29 ` [patch 01/11] rseq: Avoid pointless evaluation in __rseq_notify_resume() Thomas Gleixner
2025-08-20 14:23   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 02/11] rseq: Condense the inline stubs Thomas Gleixner
2025-08-20 14:24   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 03/11] rseq: Rename rseq_syscall() to rseq_debug_syscall_exit() Thomas Gleixner
2025-08-20 14:25   ` Mathieu Desnoyers
2025-08-13 16:29 ` [patch 04/11] rseq: Replace the pointless event mask bit fiddling Thomas Gleixner
2025-08-13 16:29 ` [patch 05/11] rseq: Optimize the signal delivery path Thomas Gleixner
2025-08-13 16:29 ` [patch 06/11] rseq: Optimize exit to user space further Thomas Gleixner
2025-08-13 16:29 ` [patch 07/11] entry: Cleanup header Thomas Gleixner
2025-08-13 17:09   ` Giorgi Tchankvetadze
2025-08-13 21:30     ` Thomas Gleixner
2025-08-13 16:29 ` [patch 08/11] entry: Distinguish between syscall and interrupt exit Thomas Gleixner
2025-08-13 16:29 ` [patch 09/11] entry: Provide exit_to_user_notify_resume() Thomas Gleixner
2025-08-13 16:29 ` Thomas Gleixner [this message]
2025-08-14  8:54   ` [patch 10/11] rseq: Skip fixup when returning from a syscall Peter Zijlstra
2025-08-14 13:24     ` Thomas Gleixner
2025-08-13 16:29 ` [patch 11/11] rseq: Convert to masked user access where applicable Thomas Gleixner
2025-08-13 17:45 ` [patch 00/11] rseq: Optimize exit to user space Jens Axboe
2025-08-13 21:32   ` Thomas Gleixner
2025-08-13 21:36     ` Jens Axboe
2025-08-13 22:08       ` Thomas Gleixner
2025-08-17 21:23         ` Thomas Gleixner
2025-08-18 14:00           ` BUG: rseq selftests and librseq vs. glibc fail Thomas Gleixner
2025-08-18 14:15             ` Florian Weimer
2025-08-18 17:13               ` Thomas Gleixner
2025-08-18 19:33                 ` Florian Weimer
2025-08-18 19:46                   ` Sean Christopherson
2025-08-18 19:55                     ` Florian Weimer
2025-08-18 20:27                       ` Sean Christopherson
2025-08-18 23:54                         ` Thomas Gleixner
2025-08-19  0:28                           ` Sean Christopherson
2025-08-19  6:18                             ` Florian Weimer
2025-08-29 18:44                 ` Prakash Sangappa
2025-08-29 18:50                   ` Mathieu Desnoyers
2025-09-01 19:30                     ` Prakash Sangappa
2025-08-18 17:38           ` [patch 00/11] rseq: Optimize exit to user space Michael Jeanson
2025-08-18 20:21             ` Thomas Gleixner
2025-08-18 21:29               ` Michael Jeanson
2025-08-18 23:43                 ` Thomas Gleixner
2025-08-20 14:27           ` Mathieu Desnoyers
2025-08-20 14:10 ` Mathieu Desnoyers
