From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>,
	Jens Axboe <axboe@kernel.dk>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	x86@kernel.org, Sean Christopherson <seanjc@google.com>,
	Wei Liu <wei.liu@kernel.org>
Subject: [patch V6 07/31] rseq, virt: Retrigger RSEQ after vcpu_run()
Date: Mon, 27 Oct 2025 09:44:28 +0100 (CET)
Message-ID: <20251027084306.399495855@linutronix.de>
In-Reply-To: 20251027084220.785525188@linutronix.de

Hypervisors invoke resume_user_mode_work() before entering the guest, which
clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user
space context available to them, so the rseq notify handler skips
inspecting the critical section, but updates the CPU/MM CID values
unconditionally so that a potentially pending rseq event is not lost on the
way to user space.

This is a pointless exercise as the task might be rescheduled before
actually returning to user space and it creates unnecessary work in the
vcpu_run() loops.

It's way more efficient to ignore that invocation based on @regs == NULL
and let the hypervisors re-raise TIF_NOTIFY_RESUME after returning from the
vcpu_run() loop before returning from the ioctl().

This ensures that a pending RSEQ update is not lost and the IDs are updated
before returning to user space.

Once the RSEQ handling is decoupled from TIF_NOTIFY_RESUME, this turns into
a NOOP.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sean Christopherson <seanjc@google.com>
---
V5: Add a comment that this is temporary - Sean
V3: Add the missing rseq.h include for HV - 0-day
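
For reviewers who want to see the flag flow in isolation, here is a minimal
user space model of the behaviour described in the changelog. It is only a
sketch: every model_* name is hypothetical and none of this is kernel code.
The struct fields merely stand in for TIF_NOTIFY_RESUME and
current->rseq_event_pending, and the model assumes that every vcpu_run()
loop iteration is preceded by a context switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types */
struct model_regs { int unused; };

struct model_task {
	bool notify_resume;      /* models TIF_NOTIFY_RESUME */
	bool rseq_event_pending; /* models current->rseq_event_pending */
	int  rseq_updates;       /* counts simulated user space rseq updates */
};

/* Models the rseq notify handler: a NULL @regs invocation is ignored */
static void model_rseq_handle_notify_resume(struct model_task *t,
					    struct model_regs *regs)
{
	if (!regs)
		return;

	t->rseq_event_pending = false;
	t->rseq_updates++;
}

/* Models resume_user_mode_work(): clears the flag, then runs the handler */
static void model_resume_user_mode_work(struct model_task *t,
					struct model_regs *regs)
{
	t->notify_resume = false;
	model_rseq_handle_notify_resume(t, regs);
}

/* Models rseq_virt_userspace_exit(): re-raise the flag if still pending */
static void model_rseq_virt_userspace_exit(struct model_task *t)
{
	if (t->rseq_event_pending)
		t->notify_resume = true;
}

/* Models a vcpu_run() ioctl with @iterations guest entries */
static void model_vcpu_run_ioctl(struct model_task *t, int iterations)
{
	assert(iterations >= 0);

	for (int i = 0; i < iterations; i++) {
		/* Assumed: a context switch happened before each entry */
		t->rseq_event_pending = true;
		t->notify_resume = true;

		/* Hypervisors have no user context, so @regs is NULL */
		if (t->notify_resume)
			model_resume_user_mode_work(t, NULL);
		/* The guest would run here */
	}
	model_rseq_virt_userspace_exit(t);
}

/* Models the final exit to user space with valid registers */
static void model_exit_to_user(struct model_task *t)
{
	struct model_regs regs = { 0 };

	if (t->notify_resume)
		model_resume_user_mode_work(t, &regs);
}
```

Running model_vcpu_run_ioctl() for several iterations followed by
model_exit_to_user() performs exactly one simulated update in this model,
instead of one per loop iteration, and the pending event is not lost.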
---
 drivers/hv/mshv_root_main.c |    3 +
 include/linux/rseq.h        |   17 +++++++++
 kernel/rseq.c               |   76 +++++++++++++++++++++++---------------------
 virt/kvm/kvm_main.c         |    7 ++++
 4 files changed, 67 insertions(+), 36 deletions(-)

--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -29,6 +29,7 @@
 #include <linux/crash_dump.h>
 #include <linux/panic_notifier.h>
 #include <linux/vmalloc.h>
+#include <linux/rseq.h>
 
 #include "mshv_eventfd.h"
 #include "mshv.h"
@@ -560,6 +561,8 @@ static long mshv_run_vp_with_root_schedu
 		}
 	} while (!vp->run.flags.intercept_suspend);
 
+	rseq_virt_userspace_exit();
+
 	return ret;
 }
 
--- a/include/linux/rseq.h
+++ b/include/linux/rseq.h
@@ -38,6 +38,22 @@ static __always_inline void rseq_exit_to
 }
 
 /*
+ * KVM/HYPERV invoke resume_user_mode_work() before entering guest mode,
+ * which clears TIF_NOTIFY_RESUME. To avoid updating user space RSEQ in
+ * that case just to do it eventually again before returning to user space,
+ * the entry resume_user_mode_work() invocation is ignored as the register
+ * argument is NULL.
+ *
+ * After returning from guest mode, they have to invoke this function to
+ * re-raise TIF_NOTIFY_RESUME if necessary.
+ */
+static inline void rseq_virt_userspace_exit(void)
+{
+	if (current->rseq_event_pending)
+		set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+}
+
+/*
  * If parent process has a registered restartable sequences area, the
  * child inherits. Unregister rseq for a clone with CLONE_VM set.
  */
@@ -68,6 +84,7 @@ static inline void rseq_execve(struct ta
 static inline void rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) { }
 static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { }
 static inline void rseq_sched_switch_event(struct task_struct *t) { }
+static inline void rseq_virt_userspace_exit(void) { }
 static inline void rseq_fork(struct task_struct *t, u64 clone_flags) { }
 static inline void rseq_execve(struct task_struct *t) { }
 static inline void rseq_exit_to_user_mode(void) { }
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -422,50 +422,54 @@ void __rseq_handle_notify_resume(struct
 {
 	struct task_struct *t = current;
 	int ret, sig;
+	bool event;
+
+	/*
+	 * If invoked from hypervisors before entering the guest via
+	 * resume_user_mode_work(), then @regs is a NULL pointer.
+	 *
+	 * resume_user_mode_work() clears TIF_NOTIFY_RESUME and re-raises
+	 * it before returning from the ioctl() to user space when
+	 * rseq_event.sched_switch is set.
+	 *
+	 * So it's safe to ignore here instead of pointlessly updating it
+	 * in the vcpu_run() loop.
+	 */
+	if (!regs)
+		return;
 
 	if (unlikely(t->flags & PF_EXITING))
 		return;
 
 	/*
-	 * If invoked from hypervisors or IO-URING, then @regs is a NULL
-	 * pointer, so fixup cannot be done. If the syscall which led to
-	 * this invocation was invoked inside a critical section, then it
-	 * will either end up in this code again or a possible violation of
-	 * a syscall inside a critical region can only be detected by the
-	 * debug code in rseq_syscall() in a debug enabled kernel.
+	 * Read and clear the event pending bit first. If the task
+	 * was not preempted or migrated or a signal is on the way,
+	 * there is no point in doing any of the heavy lifting here
+	 * on production kernels. In that case TIF_NOTIFY_RESUME
+	 * was raised by some other functionality.
+	 *
+	 * This is correct because the read/clear operation is
+	 * guarded against scheduler preemption, which makes it CPU
+	 * local atomic. If the task is preempted right after
+	 * re-enabling preemption then TIF_NOTIFY_RESUME is set
+	 * again and this function is invoked another time _before_
+	 * the task is able to return to user mode.
+	 *
+	 * On a debug kernel, invoke the fixup code unconditionally
+	 * with the result handed in to allow the detection of
+	 * inconsistencies.
 	 */
-	if (regs) {
-		/*
-		 * Read and clear the event pending bit first. If the task
-		 * was not preempted or migrated or a signal is on the way,
-		 * there is no point in doing any of the heavy lifting here
-		 * on production kernels. In that case TIF_NOTIFY_RESUME
-		 * was raised by some other functionality.
-		 *
-		 * This is correct because the read/clear operation is
-		 * guarded against scheduler preemption, which makes it CPU
-		 * local atomic. If the task is preempted right after
-		 * re-enabling preemption then TIF_NOTIFY_RESUME is set
-		 * again and this function is invoked another time _before_
-		 * the task is able to return to user mode.
-		 *
-		 * On a debug kernel, invoke the fixup code unconditionally
-		 * with the result handed in to allow the detection of
-		 * inconsistencies.
-		 */
-		bool event;
-
-		scoped_guard(RSEQ_EVENT_GUARD) {
-			event = t->rseq_event_pending;
-			t->rseq_event_pending = false;
-		}
+	scoped_guard(RSEQ_EVENT_GUARD) {
+		event = t->rseq_event_pending;
+		t->rseq_event_pending = false;
+	}
 
-		if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event) {
-			ret = rseq_ip_fixup(regs, event);
-			if (unlikely(ret < 0))
-				goto error;
-		}
+	if (IS_ENABLED(CONFIG_DEBUG_RSEQ) || event) {
+		ret = rseq_ip_fixup(regs, event);
+		if (unlikely(ret < 0))
+			goto error;
 	}
+
 	if (unlikely(rseq_update_cpu_node_id(t)))
 		goto error;
 	return;
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -49,6 +49,7 @@
 #include <linux/lockdep.h>
 #include <linux/kthread.h>
 #include <linux/suspend.h>
+#include <linux/rseq.h>
 
 #include <asm/processor.h>
 #include <asm/ioctl.h>
@@ -4476,6 +4477,12 @@ static long kvm_vcpu_ioctl(struct file *
 		r = kvm_arch_vcpu_ioctl_run(vcpu);
 		vcpu->wants_to_run = false;
 
+		/*
+		 * FIXME: Remove this hack once all KVM architectures
+		 * support the generic TIF bits, i.e. a dedicated TIF_RSEQ.
+		 */
+		rseq_virt_userspace_exit();
+
 		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
 		break;
 	}