From: Thomas Gleixner <tglx@linutronix.de>
To: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Jonathan Corbet <corbet@lwn.net>,
Madadi Vineeth Reddy <vineethr@linux.ibm.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Steven Rostedt <rostedt@goodmis.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Arnd Bergmann <arnd@arndb.de>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Subject: Re: [patch V3 07/12] rseq: Implement syscall entry work for time slice extensions
Date: Wed, 19 Nov 2025 16:25:03 +0100
Message-ID: <874iqqm4dr.ffs@tglx>
In-Reply-To: <261A8604-DA8D-468A-83BB-F530D5639A43@oracle.com>

On Wed, Nov 19 2025 at 00:20, Prakash Sangappa wrote:
>> On Oct 29, 2025, at 6:22 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> + if (put_user(0U, &curr->rseq.usrptr->slice_ctrl.all) || syscall != __NR_rseq_slice_yield)
>> + force_sig(SIGSEGV);
>> +}
>
> I have been trying to get our Database team to implement changes to
> use the slice extension API. They encounter the issue with a system
> call being made within the slice extension window and the process dies
> with SEGV.

Good. Works as designed.

> Apparently it will be hard to enforce not calling a system call in the
> slice extension window due to layering.

Why do I have a smell of rotten onions in my nose right now?

> For the DB use case, It is fine to terminate the slice extension if a
> system call is made, but the process getting killed will not work.

That's not a question of being fine or not.

The point is that on PREEMPT_NONE/VOLUNTARY that arbitrary syscall can
consume tons of CPU cycles until it either schedules out voluntarily or
reaches __exit_to_user_mode_loop(), which is defeating the whole
mechanism. The timer does not help in that case because once the task is
in the kernel it won't be preempted on return from interrupt.

sys_rseq_slice_yield() is time bound, which is why it was implemented
that way.

I was absolutely right when I asked to tie this mechanism to
PREEMPT_LAZY|FULL in the first place. That would nicely avoid the whole
problem.
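
The argument above can be restated as a toy model. This is an illustrative sketch only, not kernel code: the enum, the function name and the nanosecond parameter are invented for the example. It encodes just two statements from this mail: on PREEMPT_LAZY/FULL the pending reschedule takes effect as soon as preemption is enabled at syscall entry, while on PREEMPT_NONE/VOLUNTARY the task keeps the CPU until the syscall schedules voluntarily or reaches the exit-to-user work loop.

```c
/* Hypothetical names for illustration; none of these are kernel identifiers. */
enum preempt_model { MODEL_NONE, MODEL_VOLUNTARY, MODEL_LAZY, MODEL_FULL };

/*
 * Toy estimate of how long a task keeps the CPU after it enters an
 * arbitrary syscall while a slice extension is granted and a reschedule
 * is pending.
 *
 * next_resched_point_ns: assumed time until the syscall code either
 * schedules voluntarily or returns to the exit-to-user work loop.
 */
static unsigned long cpu_held_after_entry_ns(enum preempt_model model,
					     unsigned long next_resched_point_ns)
{
	switch (model) {
	case MODEL_LAZY:
	case MODEL_FULL:
		/* Preemptible kernel: the pending reschedule happens at entry. */
		return 0;
	case MODEL_NONE:
	case MODEL_VOLUNTARY:
	default:
		/* No kernel preemption: the syscall decides when the CPU is released. */
		return next_resched_point_ns;
	}
}
```

In this model the extra CPU time on PREEMPT_NONE/VOLUNTARY is bounded only by the syscall itself, which is the unbounded case the dedicated yield syscall is meant to avoid.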

Something like the uncompiled and untested below should work. Though I
hate it with a passion.

Thanks,

        tglx
---
Subject: rseq/slice: Handle rotten onions gracefully
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 19 Nov 2025 16:07:15 +0100

Add rant here.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/rseq.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -643,13 +643,21 @@ void rseq_syscall_enter_work(long syscal
 	}
 	curr->rseq.slice.state.granted = false;
+	/* Clear the grant in user space. */
+	if (put_user(0U, &curr->rseq.usrptr->slice_ctrl.all))
+		force_sig(SIGSEGV);
+
 	/*
-	 * Clear the grant in user space and check whether this was the
-	 * correct syscall to yield. If the user access fails or the task
-	 * used an arbitrary syscall, terminate it.
+	 * Grudgingly support onion layer applications which cannot
+	 * guarantee that rseq_slice_yield() is used to yield the CPU for
+	 * terminating a grant. This is a NOP on PREEMPT_FULL/LAZY because
+	 * enabling preemption above already scheduled, but required for
+	 * PREEMPT_NONE/VOLUNTARY to prevent that the slice is further
+	 * expanded up to the point where the syscall code schedules
+	 * voluntarily or reaches exit_to_user_mode_loop().
 	 */
-	if (put_user(0U, &curr->rseq.usrptr->slice_ctrl.all) || syscall != __NR_rseq_slice_yield)
-		force_sig(SIGSEGV);
+	if (syscall != __NR_rseq_slice_yield)
+		cond_resched();
 }
 int rseq_slice_extension_prctl(unsigned long arg2, unsigned long arg3)
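
The behavioural change in the hunk above can be summarised as a small decision table. The following is a user-space model written for illustration, not code from the series: MODEL_NR_RSEQ_SLICE_YIELD is a made-up placeholder for __NR_rseq_slice_yield, and the enum and function names are hypothetical. It assumes the grant was active when the syscall was entered.

```c
#include <stdbool.h>

/* Placeholder value; the real __NR_rseq_slice_yield is defined by the series. */
#define MODEL_NR_RSEQ_SLICE_YIELD 1000L

/* What the syscall entry work does to the task (hypothetical names). */
enum entry_action {
	ACTION_CONTINUE,	/* nothing special, proceed into the syscall */
	ACTION_SIGSEGV,		/* models force_sig(SIGSEGV) */
	ACTION_COND_RESCHED,	/* models cond_resched() before proceeding */
};

/* Before the change: a failed user access or any other syscall kills the task. */
static enum entry_action entry_action_before(bool put_user_failed, long syscall)
{
	if (put_user_failed || syscall != MODEL_NR_RSEQ_SLICE_YIELD)
		return ACTION_SIGSEGV;
	return ACTION_CONTINUE;
}

/*
 * After the change: only a failed user access kills the task. Any other
 * syscall merely gives the scheduler a chance to run. The model reports
 * the signal when both conditions hold, although the patched code would
 * also call cond_resched() in that case.
 */
static enum entry_action entry_action_after(bool put_user_failed, long syscall)
{
	if (put_user_failed)
		return ACTION_SIGSEGV;
	if (syscall != MODEL_NR_RSEQ_SLICE_YIELD)
		return ACTION_COND_RESCHED;
	return ACTION_CONTINUE;
}
```

The only case whose outcome differs between the two functions is a successful user access combined with a syscall other than the yield syscall, which is the database scenario reported in this thread.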