From: Thomas Gleixner <tglx@linutronix.de>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zilstra <peterz@infradead.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Jonathan Corbet <corbet@lwn.net>,
Prakash Sangappa <prakash.sangappa@oracle.com>,
Madadi Vineeth Reddy <vineethr@linux.ibm.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Steven Rostedt <rostedt@goodmis.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Arnd Bergmann <arnd@arndb.de>,
linux-arch@vger.kernel.org
Subject: Re: [patch 00/12] rseq: Implement time slice extension mechanism
Date: Fri, 12 Sep 2025 18:31:28 +0200 [thread overview]
Message-ID: <87a52zr5sv.ffs@tglx> (raw)
In-Reply-To: <3d16490f-e4d3-4e91-af17-62018e789da9@efficios.com>
On Fri, Sep 12 2025 at 08:33, Mathieu Desnoyers wrote:
> On 2025-09-11 16:18, Thomas Gleixner wrote:
>> It receives SIGSEGV because that means that it did not follow the rules
>> and stuck an arbitrary syscall into the critical section.
>
> Not following the rules could also be done by just looping for a long
> time in userspace within or after the critical section, in which case
> the timer should catch it.
It's pretty much impossible to tell for the kernel without more
overhead, whether that's actually a violation of the rules or not.
The operation after the grant can be interrupted (without resulting in
scheduling), which is out of control of the task which got the extension
granted.
The timer is there to ensure that there is an upper bound to the grant
independent of the actual reason.
Going through a different syscall is an obvious deviation from the rule.
As far I understood the earlier discussions, scheduler folks want to
enforce that because of PREEMPT_NONE semantics, where a randomly chosen
syscall might not result in an immediate reschedule because the work,
which needs to be done takes arbitrary time to complete.
Though that's arguably not much different from
syscall()
-> tick -> NEED_RESCHED
do_tons_of_work();
exit_to_user()
schedule();
except that in the slice extension case, the latency increases by the
slice extension time.
If we allow arbitrary syscalls to terminate the grant, then we need to
stick an immediate schedule() into the syscall entry work function. We'd
still need the separate yield() syscall to provide a side effect free
way of termination.
I have no strong opinions either way. Peter?
>>> rseq->slice_request = true; /* WRITE_ONCE() */
>>> barrier();
>>> critical_section();
>>> barrier();
>>> rseq->slice_request = false; /* WRITE_ONCE() */
>>> if (rseq->slice_grant) /* READ_ONCE() */
>>> rseq_slice_yield();
>>
>> That should work as it's strictly CPU local. Good point, now that you
>> said it it's obvious :)
>>
>> Let me rework it accordingly.
>
> I have two questions wrt ABI here:
>
> 1) Do we expect the slice requests to be done from C and higher level
> languages or only from assembly ?
It doesn't matter as long as the ordering is guaranteed.
> 2) Slice requests are a good fit for locking. Locking typically
> has nesting ability.
>
> We should consider making the slice request ABI a 8-bit
> or 16-bit nesting counter to allow nesting of its users.
Making request a counter requires to keep request set when the
extension is granted. So the states would be:
request granted
0 0 Neutral
>0 0 Requested
>=0 1 Granted
That should work.
Though I'm not really convinced that unconditionally embeddeding it into
random locking primitives is the right thing to do.
The extension makes only sense, when the actual critical section is
small and likely to complete within the extension time, which is usually
only true for highly optimized code and not for general usage, where the
lock held section is arbitrary long and might even result in syscalls
even if the critical section itself does not have an obvious explicit
syscall embedded:
lock(a)
lock(b) <- Contention results in syscall
Same applies for library functions within a critical section.
That then immediately conflicts with the yield mechanism rules, because
the extension could have been granted _before_ the syscall happens, so
we'd have remove that restriction too.
That said, we can make the ABI a counter and split the slice control
word into two u16. So the decision function would be:
get_usr(ctrl);
if (!ctrl.request)
return;
....
ctrl.granted = 1;
put_usr(ctrl);
Along with documentation why this should only be used nested when you
know what you are doing.
> 3) Slice requests are also a good fit for rseq critical sections.
> Of course someone could explicitly increment/decrement the
> slice request counter before/after the rseq critical sections, but
> I think we could do better there and integrate this directly within
> the struct rseq_cs as a new critical section flag. Basically, a
> critical section with this new RSEQ_CS_SLICE_REQUEST flag (or
> better name) set within its descriptor flags would behave as if
> the slice request counter is non-zero when preempted without
> requiring any extra instruction on the fast path. The only
> added overhead would be a check of the rseq->slice_grant flag
> when exiting the critical section to conditionally issue
> rseq_slice_yield().
Plus checking first whether rseq->slice.request is actually zero,
i.e. whether the rseq critical section was the outermost one. If not,
you cannot invoke the yield even if granted is true, right?
But mixing state spaces is not really a good idea at all. Let's not go
there.
Also you'd make checking of rseq_cs unconditional, which means extra
work in the grant decision function as it would then have to do:
if (!usr->slice.ctrl.request) {
if (!usr->rseq_cs)
return;
if (!valid_ptr(usr->rseq_cs))
goto die;
if (!within(regs->ip, usr->rseq_cs.start_ip, usr->rseq_cs.offset))
return;
if (!(use->rseq_cs.flags & REQUEST))
return;
}
IOW, we'd copy half of the rseq cs handling into that code.
Can we please keep it independent and simple?
Thanks,
tglx
next prev parent reply other threads:[~2025-09-12 16:31 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-08 22:59 [patch 00/12] rseq: Implement time slice extension mechanism Thomas Gleixner
2025-09-08 22:59 ` [patch 01/12] sched: Provide and use set_need_resched_current() Thomas Gleixner
2025-09-08 22:59 ` [patch 02/12] rseq: Add fields and constants for time slice extension Thomas Gleixner
2025-09-09 0:04 ` Randy Dunlap
2025-09-11 15:41 ` Mathieu Desnoyers
2025-09-11 15:49 ` Mathieu Desnoyers
2025-09-22 5:28 ` Prakash Sangappa
2025-09-22 5:57 ` K Prateek Nayak
2025-09-22 13:57 ` Mathieu Desnoyers
2025-09-22 13:55 ` Mathieu Desnoyers
2025-09-23 0:57 ` Prakash Sangappa
2025-09-08 22:59 ` [patch 03/12] rseq: Provide static branch for time slice extensions Thomas Gleixner
2025-09-09 3:10 ` K Prateek Nayak
2025-09-09 4:11 ` Randy Dunlap
2025-09-09 12:12 ` Thomas Gleixner
2025-09-09 16:01 ` Randy Dunlap
2025-09-11 15:42 ` Mathieu Desnoyers
2025-09-08 22:59 ` [patch 04/12] rseq: Add statistics " Thomas Gleixner
2025-09-11 15:43 ` Mathieu Desnoyers
2025-09-08 22:59 ` [patch 05/12] rseq: Add prctl() to enable " Thomas Gleixner
2025-09-11 15:50 ` Mathieu Desnoyers
2025-09-11 16:52 ` K Prateek Nayak
2025-09-11 17:18 ` Mathieu Desnoyers
2025-09-08 23:00 ` [patch 06/12] rseq: Implement sys_rseq_slice_yield() Thomas Gleixner
2025-09-09 9:52 ` K Prateek Nayak
2025-09-09 12:23 ` Thomas Gleixner
2025-09-10 11:15 ` K Prateek Nayak
2025-09-08 23:00 ` [patch 07/12] rseq: Implement syscall entry work for time slice extensions Thomas Gleixner
2025-09-10 5:22 ` K Prateek Nayak
2025-09-10 7:49 ` Thomas Gleixner
2025-09-08 23:00 ` [patch 08/12] rseq: Implement time slice extension enforcement timer Thomas Gleixner
2025-09-10 11:20 ` K Prateek Nayak
2025-09-08 23:00 ` [patch 09/12] rseq: Reset slice extension when scheduled Thomas Gleixner
2025-09-08 23:00 ` [patch 10/12] rseq: Implement rseq_grant_slice_extension() Thomas Gleixner
2025-09-09 8:14 ` K Prateek Nayak
2025-09-09 12:16 ` Thomas Gleixner
2025-09-08 23:00 ` [patch 11/12] entry: Hook up rseq time slice extension Thomas Gleixner
2025-09-08 23:00 ` [patch 12/12] selftests/rseq: Implement time slice extension test Thomas Gleixner
2025-09-10 11:23 ` K Prateek Nayak
2025-09-09 12:37 ` [patch 00/12] rseq: Implement time slice extension mechanism Thomas Gleixner
2025-09-10 4:42 ` K Prateek Nayak
2025-09-10 11:28 ` K Prateek Nayak
2025-09-10 14:50 ` Thomas Gleixner
2025-09-11 3:03 ` K Prateek Nayak
2025-09-11 7:36 ` Prakash Sangappa
2025-09-11 15:27 ` Mathieu Desnoyers
2025-09-11 20:18 ` Thomas Gleixner
2025-09-12 12:33 ` Mathieu Desnoyers
2025-09-12 16:31 ` Thomas Gleixner [this message]
2025-09-12 19:26 ` Mathieu Desnoyers
2025-09-13 13:02 ` Thomas Gleixner
2025-09-19 17:30 ` Prakash Sangappa
2025-09-22 14:09 ` Mathieu Desnoyers
2025-09-23 1:01 ` Prakash Sangappa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a52zr5sv.ffs@tglx \
--to=tglx@linutronix.de \
--cc=arnd@arndb.de \
--cc=bigeasy@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=corbet@lwn.net \
--cc=kprateek.nayak@amd.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=prakash.sangappa@oracle.com \
--cc=rostedt@goodmis.org \
--cc=vineethr@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox