From: Peter Zijlstra <peterz@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
    Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
    "Paul E. McKenney" <paulmck@kernel.org>,
    Boqun Feng <boqun.feng@gmail.com>,
    Jonathan Corbet <corbet@lwn.net>,
    Prakash Sangappa <prakash.sangappa@oracle.com>,
    Madadi Vineeth Reddy <vineethr@linux.ibm.com>,
    K Prateek Nayak <kprateek.nayak@amd.com>,
    Steven Rostedt <rostedt@goodmis.org>,
    Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
    Arnd Bergmann <arnd@arndb.de>,
    linux-arch@vger.kernel.org, Randy Dunlap <rdunlap@infradead.org>,
    Ron Geva <rongevarg@gmail.com>, Waiman Long <longman@redhat.com>
Subject: Re: [patch V6 07/11] rseq: Implement time slice extension enforcement timer
Date: Fri, 19 Dec 2025 11:05:17 +0100
Message-ID: <20251219100517.GA1132199@noisy.programming.kicks-ass.net>
In-Reply-To: <87ecorbccp.ffs@tglx>
On Fri, Dec 19, 2025 at 12:26:46AM +0100, Thomas Gleixner wrote:
> On Thu, Dec 18 2025 at 16:05, Peter Zijlstra wrote:
> > On Mon, Dec 15, 2025 at 05:52:22PM +0100, Thomas Gleixner wrote:
> >
> >> V5: Document the slice extension range - PeterZ
> >
> >> --- a/Documentation/admin-guide/sysctl/kernel.rst
> >> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> >> @@ -1228,6 +1228,14 @@ reboot-cmd (SPARC only)
> >> ROM/Flash boot loader. Maybe to tell it what to do after
> >> rebooting. ???
> >>
> >> +rseq_slice_extension_nsec
> >> +=========================
> >> +
> >> +A task can request to delay its scheduling if it is in a critical section
> >> +via the prctl(PR_RSEQ_SLICE_EXTENSION_SET) mechanism. This sets the maximum
> >> +allowed extension in nanoseconds before scheduling of the task is enforced.
> >> +Default value is 30000ns (30us). The possible range is 10000ns (10us) to
> >> +50000ns (50us).
> >
> > The important bit: we're not going to increase these numbers. If
> > anything, I would like the default to be 10us and taint the kernel if
> > you up it.
>
> Fine with me.

Thanks; the thinking is that it will be very hard to shrink this number
due to unknown workloads in the wild and all that, so starting on the
small end is the conservative option.

> > I also think we want some tracing/tool to find the actual length of the
> > extension used (min/avg/max etc.). That is the time between the kernel
> > finding the extension bit set and arming the timer and the slice_yield()
> > syscall.
>
> I could probably integrate that easily into the RSEQ stats mechanism.

I was thinking that perhaps the hrtimer tracepoints, filtered on this
specific timer, might just do. Arming the timer is the point where the
extension is granted, cancelling the timer is on the slice_yield() (or
any other random syscall :/), and the timer actually firing is on fail.

Normally I would suggest using a Poisson distribution to find the
'average', but this case is more complicated because the start of the
extension is lost.

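
As a rough illustration of the tracepoint idea, here is a minimal Python
sketch that pairs "arm" events with the matching "cancel" or "expire"
event and reports min/avg/max. Everything in it is an assumption made for
illustration: the event tuple layout, the three event kinds, and the
function name are hypothetical, and mapping the real hrtimer tracepoints
onto these kinds (and filtering on the slice extension timer) is left to
the caller. Real trace streams will likely need more care than this.

```python
from dataclasses import dataclass


@dataclass
class SliceStats:
    count: int      # extensions that ended with a cancel
    min_ns: int
    avg_ns: float
    max_ns: int
    expired: int    # extensions where the enforcement timer fired


def slice_extension_stats(events):
    """Summarise arm->cancel intervals.

    `events` is a hypothetical, time-sorted iterable of
    (timestamp_ns, kind, timer_id) with kind in {"arm", "cancel", "expire"}.
    """
    armed = {}       # timer_id -> timestamp of the pending arm
    durations = []   # arm -> cancel intervals in nanoseconds
    expired = 0
    for ts, kind, timer in events:
        if kind == "arm":
            armed[timer] = ts
        elif kind == "cancel" and timer in armed:
            durations.append(ts - armed.pop(timer))
        elif kind == "expire" and timer in armed:
            armed.pop(timer)
            expired += 1
    if not durations:
        return SliceStats(0, 0, 0.0, 0, expired)
    return SliceStats(len(durations), min(durations),
                      sum(durations) / len(durations),
                      max(durations), expired)


# Made-up sample: two extensions handed back in time, one enforced.
sample = [
    (1000, "arm", "cpu0"), (4000, "cancel", "cpu0"),
    (10000, "arm", "cpu0"), (15000, "cancel", "cpu0"),
    (20000, "arm", "cpu0"), (50000, "expire", "cpu0"),
]
stats = slice_extension_stats(sample)
print(stats)
```

Note that the intervals measured this way start when the kernel grants
the extension, not when userspace entered its critical section, which is
exactly the "start is lost" problem described above.
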
Let me ask one of these fancy AI things. Ah, it says this is "a classic
example of Length-Biased Sampling combined with Left-Truncation". It
then further suggests:

  If you cannot assume a distribution, you should use a Weighting
  Method. Since the probability of catching an event of length L is
  proportional to L, you must weight each observation by 1/L.

   1. For each event, record the observed duration d_i
   2. Calculate the weighted mean:

                    \Sum (d_i * 1/d_i)       n
      avg(x)_true = ------------------ = ----------
                        \Sum 1/d_i       \Sum 1/d_i

  This is the Harmonic Mean of your observed durations. The harmonic
  mean effectively "penalizes" the long events you were more likely to
  catch.

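
To make the quoted formula concrete, here is a small Python sketch of the
harmonic mean, n / \Sum 1/d_i. The function name and the toy numbers are
made up for illustration. The toy sample assumes that full event lengths
are observed and that an event of length L is caught with probability
proportional to L; whether that assumption holds here, where the start of
the extension is lost, is a separate question.

```python
def harmonic_mean(durations):
    """Return n / sum(1/d_i) for strictly positive durations."""
    durations = list(durations)
    if not durations:
        raise ValueError("need at least one duration")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be strictly positive")
    return len(durations) / sum(1.0 / d for d in durations)


# Toy population: half the events last 10us, half last 30us, so the true
# mean is 20us. With length-biased sampling the 30us events are caught
# three times as often as the 10us ones.
biased_sample = [10, 30, 30, 30]

arithmetic = sum(biased_sample) / len(biased_sample)  # 25.0, biased high
harmonic = harmonic_mean(biased_sample)               # about 20.0

print(arithmetic, harmonic)
```

The Python standard library also ships statistics.harmonic_mean(), which
could be used instead of the hand-rolled helper.
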
It also babbled something about an Inspection Paradox:

  If your sampling rate is constant (a Poisson process) and the system is
  in a "steady state," the most robust and mathematically elegant way to
  find the true average duration (μ) is surprisingly simple.

  In a steady-state system where you catch an event in progress:

    The time from the start of the event to your arrival is U
    (unobserved).

    The time from your arrival to the end of the event is V (observed).

  Under these specific conditions, the expected value of the observed
  remaining duration (V) is exactly equal to the mean of the length-biased
  distribution. However, because long events are over-sampled, the mean of
  the durations you catch is actually higher than the true mean of all
  events. For many common distributions (like the Exponential
  distribution), the relationship is: μ=E[V]

  Wait, if you ignore the part you missed (U) and only average the parts
  you saw (V), you often arrive back at the true mean. This is known as
  the Inspection Paradox.

Now I suppose I should do the real research to see how much of that is a
hallucination :-)
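
As a starting point for that check, the textbook renewal-theory result
for the mean remaining duration can be written out as below. This is a
sketch under assumptions that are not from the thread: event lengths L
are i.i.d. with mean \mu and variance \sigma^2, and the arrival time is
uniform over a long steady-state run. Under those assumptions E[V] is
half the length-biased mean rather than equal to it, and \mu = E[V]
holds for the exponential case specifically, not in general.

```latex
% Assumptions (not from the thread): i.i.d. event lengths L with density f,
% mean \mu and variance \sigma^2; arrival uniform over a steady-state run.
\begin{align*}
  f_{\text{seen}}(\ell) &= \frac{\ell\, f(\ell)}{\mu}
    && \text{(length-biased density of the event that is caught)} \\
  \mathbb{E}[L_{\text{seen}}] &= \frac{\mathbb{E}[L^2]}{\mu}
    = \mu + \frac{\sigma^2}{\mu} \\
  \mathbb{E}[V] &= \tfrac{1}{2}\,\mathbb{E}[L_{\text{seen}}]
    = \frac{\mathbb{E}[L^2]}{2\mu}
    && \text{(arrival is uniform within the caught event)} \\
  \text{Exponential: } \mathbb{E}[L^2] = 2\mu^2
    &\;\Longrightarrow\; \mathbb{E}[V] = \mu
\end{align*}
```

For a fixed-length event (\sigma = 0) the same formula gives
E[V] = \mu / 2, so averaging only the observed remainders would
underestimate the true mean by a factor of two in that case.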