linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Peter Zilstra <peterz@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Prakash Sangappa <prakash.sangappa@oracle.com>,
	Madadi Vineeth Reddy <vineethr@linux.ibm.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org
Subject: [patch V2 10/12] rseq: Implement rseq_grant_slice_extension()
Date: Wed, 22 Oct 2025 14:57:41 +0200 (CEST)	[thread overview]
Message-ID: <20251022121427.534334174@linutronix.de> (raw)
In-Reply-To: 20251022110646.839870156@linutronix.de

Provide the actual decision function, which decides whether a time slice
extension is granted in the exit to user mode path when NEED_RESCHED is
evaluated.

The decision is made in two stages. First an inline quick check to avoid
going into the actual decision function. This checks whether:

 #1 the functionality is enabled

 #2 the exit is a return from interrupt to user mode

 #3 any TIF bit, which causes extra work is set. That includes TIF_RSEQ,
    which means the task was already scheduled out.
 
The slow path, which implements the actual user space ABI, is invoked
when:

  A) #1 is true, #2 is true and #3 is false

     It checks whether user space requested a slice extension by setting
     the request bit in the rseq slice_ctrl field. If so, it grants the
     extension and stores the slice expiry time, so that the actual exit
     code can double check whether the slice is already exhausted before
     going back.

  B) #1 - #3 are true _and_ a slice extension was granted in a previous
     loop iteration

     In this case the grant is revoked.

In case that the user space access faults or invalid state is detected, the
task is terminated with SIGSEGV.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
V2: Provide an extra stub for the !RSEQ case - Prateek
---
 include/linux/rseq_entry.h |  108 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

--- a/include/linux/rseq_entry.h
+++ b/include/linux/rseq_entry.h
@@ -41,6 +41,7 @@ DECLARE_PER_CPU(struct rseq_stats, rseq_
 #ifdef CONFIG_RSEQ
 #include <linux/jump_label.h>
 #include <linux/rseq.h>
+#include <linux/sched/signal.h>
 #include <linux/uaccess.h>
 
 #include <linux/tracepoint-defs.h>
@@ -108,10 +109,116 @@ static __always_inline void rseq_slice_c
 	t->rseq.slice.state.granted = false;
 }
 
+static __always_inline bool rseq_grant_slice_extension(bool work_pending)
+{
+	struct task_struct *curr = current;
+	struct rseq_slice_ctrl usr_ctrl;
+	union rseq_slice_state state;
+	struct rseq __user *rseq;
+
+	if (!rseq_slice_extension_enabled())
+		return false;
+
+	/* If not enabled or not a return from interrupt, nothing to do. */
+	state = curr->rseq.slice.state;
+	state.enabled &= curr->rseq.event.user_irq;
+	if (likely(!state.state))
+		return false;
+
+	rseq = curr->rseq.usrptr;
+	scoped_user_rw_access(rseq, efault) {
+
+		/*
+		 * Quick check conditions where a grant is not possible or
+		 * needs to be revoked.
+		 *
+		 *  1) Any TIF bit which needs to do extra work aside of
+		 *     rescheduling prevents a grant.
+		 *
+		 *  2) A previous rescheduling request resulted in a slice
+		 *     extension grant.
+		 */
+		if (unlikely(work_pending || state.granted)) {
+			/* Clear user control unconditionally. No point for checking */
+			unsafe_put_user(0U, &rseq->slice_ctrl.all, efault);
+			rseq_slice_clear_grant(curr);
+			return false;
+		}
+
+		unsafe_get_user(usr_ctrl.all, &rseq->slice_ctrl.all, efault);
+		if (likely(!(usr_ctrl.request)))
+			return false;
+
+		/* Grant the slice extention */
+		usr_ctrl.request = 0;
+		usr_ctrl.granted = 1;
+		unsafe_put_user(usr_ctrl.all, &rseq->slice_ctrl.all, efault);
+	}
+
+	rseq_stat_inc(rseq_stats.s_granted);
+
+	curr->rseq.slice.state.granted = true;
+	/* Store expiry time for arming the timer on the way out */
+	curr->rseq.slice.expires = data_race(rseq_slice_ext_nsecs) + ktime_get_mono_fast_ns();
+	/*
+	 * This is racy against a remote CPU setting TIF_NEED_RESCHED in
+	 * several ways:
+	 *
+	 * 1)
+	 *	CPU0			CPU1
+	 *	clear_tsk()
+	 *				set_tsk()
+	 *	clear_preempt()
+	 *				Raise scheduler IPI on CPU0
+	 *	--> IPI
+	 *	    fold_need_resched() -> Folds correctly
+	 * 2)
+	 *	CPU0			CPU1
+	 *				set_tsk()
+	 *	clear_tsk()
+	 *	clear_preempt()
+	 *				Raise scheduler IPI on CPU0
+	 *	--> IPI
+	 *	    fold_need_resched() <- NOOP as TIF_NEED_RESCHED is false
+	 *
+	 * #1 is not any different from a regular remote reschedule as it
+	 *    sets the previously not set bit and then raises the IPI which
+	 *    folds it into the preempt counter
+	 *
+	 * #2 is obviously incorrect from a scheduler POV, but it's not
+	 *    differently incorrect than the code below clearing the
+	 *    reschedule request with the safety net of the timer.
+	 *
+	 * The important part is that the clearing is protected against the
+	 * scheduler IPI and also against any other interrupt which might
+	 * end up waking up a task and setting the bits in the middle of
+	 * the operation:
+	 *
+	 *	clear_tsk()
+	 *	---> Interrupt
+	 *		wakeup_on_this_cpu()
+	 *		set_tsk()
+	 *		set_preempt()
+	 *	clear_preempt()
+	 *
+	 * which would be inconsistent state.
+	 */
+	scoped_guard(irq) {
+		clear_tsk_need_resched(curr);
+		clear_preempt_need_resched();
+	}
+	return true;
+
+efault:
+	force_sig(SIGSEGV);
+	return false;
+}
+
 #else /* CONFIG_RSEQ_SLICE_EXTENSION */
 static inline bool rseq_slice_extension_enabled(void) { return false; }
 static inline bool rseq_arm_slice_extension_timer(void) { return false; }
 static inline void rseq_slice_clear_grant(struct task_struct *t) { }
+static inline bool rseq_grant_slice_extension(bool work_pending) { return false; }
 #endif /* !CONFIG_RSEQ_SLICE_EXTENSION */
 
 bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs, unsigned long csaddr);
@@ -645,6 +752,7 @@ static inline bool rseq_exit_to_user_mod
 static inline void rseq_syscall_exit_to_user_mode(void) { }
 static inline void rseq_irqentry_exit_to_user_mode(void) { }
 static inline void rseq_debug_syscall_return(struct pt_regs *regs) { }
+static inline bool rseq_grant_slice_extension(bool work_pending) { return false; }
 #endif /* !CONFIG_RSEQ */
 
 #endif /* _LINUX_RSEQ_ENTRY_H */


  parent reply	other threads:[~2025-10-22 12:57 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-22 12:57 [patch V2 00/12] rseq: Implement time slice extension mechanism Thomas Gleixner
2025-10-22 12:57 ` [patch V2 01/12] sched: Provide and use set_need_resched_current() Thomas Gleixner
2025-10-27  8:59   ` Sebastian Andrzej Siewior
2025-10-27 11:13     ` Thomas Gleixner
2025-10-22 12:57 ` [patch V2 02/12] rseq: Add fields and constants for time slice extension Thomas Gleixner
2025-10-22 17:28   ` Randy Dunlap
2025-10-22 12:57 ` [patch V2 03/12] rseq: Provide static branch for time slice extensions Thomas Gleixner
2025-10-27  9:29   ` Sebastian Andrzej Siewior
2025-10-22 12:57 ` [patch V2 04/12] rseq: Add statistics " Thomas Gleixner
2025-10-22 12:57 ` [patch V2 05/12] rseq: Add prctl() to enable " Thomas Gleixner
2025-10-27  9:40   ` Sebastian Andrzej Siewior
2025-10-22 12:57 ` [patch V2 06/12] rseq: Implement sys_rseq_slice_yield() Thomas Gleixner
2025-10-22 12:57 ` [patch V2 07/12] rseq: Implement syscall entry work for time slice extensions Thomas Gleixner
2025-10-22 12:57 ` [patch V2 08/12] rseq: Implement time slice extension enforcement timer Thomas Gleixner
2025-10-27 11:38   ` Sebastian Andrzej Siewior
2025-10-27 16:26     ` Thomas Gleixner
2025-10-28  8:33       ` Sebastian Andrzej Siewior
2025-10-28  8:51         ` K Prateek Nayak
2025-10-28  9:00           ` Sebastian Andrzej Siewior
2025-10-28  9:22             ` K Prateek Nayak
2025-10-28 10:22               ` Sebastian Andrzej Siewior
2025-10-28 13:04         ` Thomas Gleixner
2025-10-22 12:57 ` [patch V2 09/12] rseq: Reset slice extension when scheduled Thomas Gleixner
2025-10-22 12:57 ` Thomas Gleixner [this message]
2025-10-22 12:57 ` [patch V2 11/12] entry: Hook up rseq time slice extension Thomas Gleixner
2025-10-22 12:57 ` [patch V2 12/12] selftests/rseq: Implement time slice extension test Thomas Gleixner
2025-10-27 17:30 ` [patch V2 00/12] rseq: Implement time slice extension mechanism Sebastian Andrzej Siewior
2025-10-27 18:48   ` Thomas Gleixner
2025-10-28  8:53     ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251022121427.534334174@linutronix.de \
    --to=tglx@linutronix.de \
    --cc=arnd@arndb.de \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=corbet@lwn.net \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=prakash.sangappa@oracle.com \
    --cc=rostedt@goodmis.org \
    --cc=vineethr@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).