linux-kernel.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>,
	Wei Liu <wei.liu@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Peter Zijlstra <peterz@infradead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: [patch 09/11] entry: Provide exit_to_user_notify_resume()
Date: Wed, 13 Aug 2025 18:29:34 +0200 (CEST)
Message-ID: <20250813162824.356621744@linutronix.de>
In-Reply-To: 20250813155941.014821755@linutronix.de

The TIF_NOTIFY_RESUME handler of restartable sequences is invoked, like all
other functionality multiplexed under this bit, unconditionally whenever
TIF_NOTIFY_RESUME is set for whatever reason.

The invocation is already conditional on the rseq_event_pending bit being
set, but there is further room for improvement.

The invocation itself cannot be avoided when the event bit is set, but the
heavy lifting of accessing user space can be avoided when the exit to user
mode loop is entered from a syscall, unless it's a debug kernel. So far
there is no way for the RSEQ code to distinguish that case.

Providing that information is trivial for all architectures which use the
generic entry code, but for all others it's non-trivial work, which is
beyond the scope of this series. Architectures which want to benefit
should finally convert their code over to the generic entry code.

To prepare for that optimization rename resume_user_mode_work() to
exit_to_user_notify_resume() and add a @from_irq argument to it, which can
be supplied by the caller.

Let the generic entry code and all non-entry code users like hypervisors
and io_uring use this new function and supply the correct information.

Any NOTIFY_RESUME work which evaluates this new argument has to make the
evaluation dependent on CONFIG_GENERIC_ENTRY, because otherwise there is no
guarantee that the value is correct at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
---
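The CONFIG_GENERIC_ENTRY dependency described in the changelog can be
modelled in plain user-space C. This is only a sketch of the decision, not
kernel code: struct model_config, its fields and model_needs_user_access()
are hypothetical names invented for illustration, and the generic_entry
field merely stands in for a check of CONFIG_GENERIC_ENTRY.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model, NOT kernel code. All names are hypothetical.
 * generic_entry stands in for CONFIG_GENERIC_ENTRY being enabled,
 * debug stands in for a debug kernel.
 */
struct model_config {
	bool generic_entry;
	bool debug;
};

/* Returns true when the expensive user space access has to be done. */
static bool model_needs_user_access(const struct model_config *cfg,
				    bool event_pending, bool from_irq)
{
	/* No pending event: nothing to do at all. */
	if (!event_pending)
		return false;

	/*
	 * Without generic entry the value of from_irq is not guaranteed
	 * to be correct, so keep the historical behaviour.
	 */
	if (!cfg->generic_entry)
		return true;

	/* Debug kernels keep the full path even on syscall return. */
	if (cfg->debug)
		return true;

	/* Only the return from interrupt needs the heavy lifting. */
	return from_irq;
}
```

In this model a syscall return on a non-debug generic entry configuration
skips the user space access, while every other combination with a pending
event keeps it.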
 drivers/hv/mshv_common.c         |    2 +-
 include/linux/resume_user_mode.h |   38 +++++++++++++++++++++++++++-----------
 io_uring/io_uring.h              |    2 +-
 kernel/entry/common.c            |    2 +-
 kernel/entry/kvm.c               |    2 +-
 5 files changed, 31 insertions(+), 15 deletions(-)

--- a/drivers/hv/mshv_common.c
+++ b/drivers/hv/mshv_common.c
@@ -155,7 +155,7 @@ int mshv_do_pre_guest_mode_work(ulong th
 		schedule();
 
 	if (th_flags & _TIF_NOTIFY_RESUME)
-		resume_user_mode_work(NULL);
+		exit_to_user_notify_resume(NULL, false);
 
 	return 0;
 }
--- a/include/linux/resume_user_mode.h
+++ b/include/linux/resume_user_mode.h
@@ -24,21 +24,22 @@ static inline void set_notify_resume(str
 		kick_process(task);
 }
 
-
 /**
- * resume_user_mode_work - Perform work before returning to user mode
- * @regs:		user-mode registers of @current task
+ * exit_to_user_notify_resume - Perform work before returning to user mode
+ * @regs:	user-mode registers of @current task
+ * @from_irq:	If true this is a return from interrupt, if false it's
+ *		a syscall return.
  *
- * This is called when %TIF_NOTIFY_RESUME has been set.  Now we are
- * about to return to user mode, and the user state in @regs can be
- * inspected or adjusted.  The caller in arch code has cleared
- * %TIF_NOTIFY_RESUME before the call.  If the flag gets set again
- * asynchronously, this will be called again before we return to
- * user mode.
+ * This is called when %TIF_NOTIFY_RESUME has been set to handle the exit
+ * to user work, which is multiplexed under this TIF bit. The bit is
+ * cleared and work is probed as pending. If the flag gets set again before
+ * exiting to user space, the caller will invoke this again.
  *
- * Called without locks.
+ * Any work invoked here, which wants to make decisions on @from_irq, must
+ * make these decisions dependent on CONFIG_GENERIC_ENTRY to retain the
+ * historical behaviour of resume_user_mode_work().
  */
-static inline void resume_user_mode_work(struct pt_regs *regs)
+static inline void exit_to_user_notify_resume(struct pt_regs *regs, bool from_irq)
 {
 	clear_thread_flag(TIF_NOTIFY_RESUME);
 	/*
@@ -62,4 +63,19 @@ static inline void resume_user_mode_work
 	rseq_handle_notify_resume(regs);
 }
 
+#ifndef CONFIG_GENERIC_ENTRY
+/**
+ * resume_user_mode_work - Perform work before returning to user mode
+ * @regs:		user-mode registers of @current task
+ *
+ * This is a wrapper around exit_to_user_notify_resume() for the existing
+ * call sites in architecture code, which do not use the generic entry
+ * code.
+ */
+static inline void resume_user_mode_work(struct pt_regs *regs)
+{
+	exit_to_user_notify_resume(regs, false);
+}
+#endif
+
 #endif /* LINUX_RESUME_USER_MODE_H */
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -365,7 +365,7 @@ static inline int io_run_task_work(void)
 	if (current->flags & PF_IO_WORKER) {
 		if (test_thread_flag(TIF_NOTIFY_RESUME)) {
 			__set_current_state(TASK_RUNNING);
-			resume_user_mode_work(NULL);
+			exit_to_user_notify_resume(NULL, false);
 		}
 		if (current->io_uring) {
 			unsigned int count = 0;
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -41,7 +41,7 @@ void __weak arch_do_signal_or_restart(st
 			arch_do_signal_or_restart(regs);
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
-			resume_user_mode_work(regs);
+			exit_to_user_notify_resume(regs, from_irq);
 
 		/* Architecture specific TIF work */
 		arch_exit_to_user_mode_work(regs, ti_work);
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -17,7 +17,7 @@ static int xfer_to_guest_mode_work(struc
 			schedule();
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
-			resume_user_mode_work(NULL);
+			exit_to_user_notify_resume(NULL, false);
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
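
The calling convention introduced by this patch can be illustrated with a
small user-space model. It is a sketch under stated assumptions, not the
kernel implementation: struct model_task, MODEL_TIF_NOTIFY_RESUME and all
model_*() functions are hypothetical, and the rearm counter only simulates
the flag being set again asynchronously.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TIF_NOTIFY_RESUME bit. */
#define MODEL_TIF_NOTIFY_RESUME	0x1u

/* User-space model of a task, NOT a kernel structure. */
struct model_task {
	unsigned int flags;
	unsigned int rearm;		/* times the flag gets set again */
	unsigned int invocations;	/* handler invocation count */
	bool last_from_irq;		/* value seen by the last invocation */
};

/* Models exit_to_user_notify_resume(): clear the bit, then do the work. */
static void model_notify_resume(struct model_task *t, bool from_irq)
{
	t->flags &= ~MODEL_TIF_NOTIFY_RESUME;
	t->invocations++;
	t->last_from_irq = from_irq;

	/* Simulate work which sets the flag again before exit. */
	if (t->rearm) {
		t->rearm--;
		t->flags |= MODEL_TIF_NOTIFY_RESUME;
	}
}

/* Models the caller: invoke the handler again while the bit is set. */
static void model_exit_to_user_loop(struct model_task *t, bool from_irq)
{
	while (t->flags & MODEL_TIF_NOTIFY_RESUME)
		model_notify_resume(t, from_irq);
}

/* Models the compatibility wrapper: it always passes false. */
static void model_resume_user_mode_work(struct model_task *t)
{
	model_notify_resume(t, false);
}
```

The model mirrors two properties of the patch: the caller supplies
from_irq, and the wrapper kept for architectures without generic entry
code always passes false.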

