* [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: Kuyo Chang @ 2025-06-14 2:04 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Matthias Brugger, AngeloGioacchino Del Regno
Cc: jstultz, kuyo chang, linux-kernel, linux-arm-kernel,
linux-mediatek
From: kuyo chang <kuyo.chang@mediatek.com>
[Symptom]
The calculation formula for fair_server runtime is based on
Frequency/CPU scale-invariance.
This causes excessive RT latency, because the runtime is expected to be
absolute time.
[Analysis]
Consider the following case under a Big.LITTLE architecture:
Assume the runtime is 50,000,000 ns, with FIE/CIE values as below:
FIE: 100
CIE: 50
First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
So it will be scaled to 238,418 ns.
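To make the arithmetic concrete, here is a minimal user-space sketch in plain C (not kernel code). It assumes the charge is computed by two successive `(v * s) >> 10` steps, as in the example above. The helper names scaled_delta() and wall_time_to_deplete() are made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1ULL << SCHED_CAPACITY_SHIFT)

/* Same arithmetic as the example above: (v * s) >> 10, with s in [0, 1024]. */
static uint64_t cap_scale(uint64_t v, uint64_t s)
{
	assert(s <= SCHED_CAPACITY_SCALE);
	return (v * s) >> SCHED_CAPACITY_SHIFT;
}

/* Hypothetical helper: budget (ns) charged for delta_exec ns of execution. */
static uint64_t scaled_delta(uint64_t delta_exec, uint64_t freq, uint64_t cpu)
{
	return cap_scale(cap_scale(delta_exec, freq), cpu);
}

/*
 * Hypothetical helper: approximate execution time (ns) needed to deplete
 * a budget of 'runtime' ns when every charge is scaled by freq and cpu.
 * Ignores the per-step rounding done by cap_scale().
 */
static uint64_t wall_time_to_deplete(uint64_t runtime, uint64_t freq,
				     uint64_t cpu)
{
	assert(freq > 0 && cpu > 0);
	return runtime * SCHED_CAPACITY_SCALE * SCHED_CAPACITY_SCALE /
	       (freq * cpu);
}
```

With FIE = 100 and CIE = 50, scaled_delta(50000000, 100, 50) gives 238,418, matching the figures above. Read the other way around, 50 ms of execution charges only about 0.24 ms of budget, so under these assumptions a 50 ms budget would take roughly 10.5 s of execution to deplete.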
[Solution]
The runtime for fair_server should be absolute time,
as is the case for RT bandwidth control.
Fix the runtime calculation formula for the fair_server.
Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
---
kernel/sched/deadline.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245..8bfa846cf0dc 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 	if (dl_entity_is_special(dl_se))
 		return;

-	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
+	if (dl_se == &rq->fair_server)
+		scaled_delta_exec = delta_exec;
+	else
+		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);

 	dl_se->runtime -= scaled_delta_exec;

@@ -1611,7 +1614,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
  */
 void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
 {
-	s64 delta_exec, scaled_delta_exec;
+	s64 delta_exec;

 	if (!rq->fair_server.dl_defer)
 		return;
@@ -1624,9 +1627,7 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
 	if (delta_exec < 0)
 		return;

-	scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
-
-	rq->fair_server.runtime -= scaled_delta_exec;
+	rq->fair_server.runtime -= delta_exec;

 	if (rq->fair_server.runtime < 0) {
 		rq->fair_server.dl_defer_running = 0;
--
2.45.2
* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: John Stultz @ 2025-06-14 2:35 UTC (permalink / raw)
To: Kuyo Chang
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Matthias Brugger, AngeloGioacchino Del Regno,
linux-kernel, linux-arm-kernel, linux-mediatek
On Fri, Jun 13, 2025 at 7:05 PM Kuyo Chang <kuyo.chang@mediatek.com> wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
>
> [Symptom]
> The calculation formula for fair_server runtime is based on
> Frequency/CPU scale-invariance.
> This will cause excessive RT latency (expect absolute time).
>
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
>
> Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> FIE: 100
> CIE:50
> First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
>
> So it will scaled to 238,418 ns.
>
> [Solution]
> The runtime for fair_server should be absolute time
> asis RT bandwidth control.
> Fix the runtime calculation formula for the fair_server.
>
> Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
While I've not quite gotten my head around the details in the
dl_server code, I've been able to reproduce the problem described here
with a 6.12-based kernel.
Running cyclictest (with arguments "-t -a -p99 -m") and a randomized
input test on an Android device, it's pretty easy to trip 100ms to
*multi-second* delays of the RT prio 99 threads.
Perfetto image:
https://github.com/johnstultz-work/misc/blob/main/images/2025-06-13_cyclictest-dl-server-latency.png
Link to the actual trace:
https://ui.perfetto.dev/#!/?s=9bbb9e539ac2bbbfe3cfa954409134662a9f624a
Using this patch, so far in my testing with the same workload, the max
cyclictest latencies stick around the single-digit ms range.
The part that is a little confusing to me is that prior to the long
stall, it doesn't appear that RT tasks are actually starving
SCHED_NORMAL tasks, so I'm conceptually surprised to see the dl_server
boosting the normal tasks, especially for so long, but I admittedly
haven't looked in detail at the code and have been going off my
understanding of how it was supposed to replace rt-throttling, so I
may be missing a subtlety.
thanks
-john
* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: Juri Lelli @ 2025-06-16 14:03 UTC (permalink / raw)
To: Kuyo Chang
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
linux-kernel, linux-arm-kernel, linux-mediatek
Hello,
On 14/06/25 10:04, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
>
> [Symptom]
> The calculation formula for fair_server runtime is based on
> Frequency/CPU scale-invariance.
> This will cause excessive RT latency (expect absolute time).
>
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
>
> Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> FIE: 100
> CIE:50
> First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
>
> So it will scaled to 238,418 ns.
>
> [Solution]
> The runtime for fair_server should be absolute time
> asis RT bandwidth control.
> Fix the runtime calculation formula for the fair_server.
>
> Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
> ---
Right, I would agree we don't actually want to scale fair_server runtime
by frequency/capacity. Your change looks good to me.
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Thanks!
Juri
* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: Peter Zijlstra @ 2025-06-17 8:55 UTC (permalink / raw)
To: Kuyo Chang
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
linux-kernel, linux-arm-kernel, linux-mediatek
On Sat, Jun 14, 2025 at 10:04:55AM +0800, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
>
> [Symptom]
> The calculation formula for fair_server runtime is based on
> Frequency/CPU scale-invariance.
> This will cause excessive RT latency (expect absolute time).
>
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
>
> Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> FIE: 100
> CIE:50
> First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
What's this FIE/CIE stuff? Is that some ARM lingo?
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index ad45a8fea245..8bfa846cf0dc 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> if (dl_entity_is_special(dl_se))
> return;
>
> - scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> + if (dl_se == &rq->fair_server)
> + scaled_delta_exec = delta_exec;
> + else
> + scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
Juri, the point is a bit moot atm, but is this something specific to the
fair_server in particular, or all servers?
Because if this is something all servers require then the above is
of course wrong.
* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: Juri Lelli @ 2025-06-17 12:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kuyo Chang, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
linux-kernel, linux-arm-kernel, linux-mediatek
On 17/06/25 10:55, Peter Zijlstra wrote:
> On Sat, Jun 14, 2025 at 10:04:55AM +0800, Kuyo Chang wrote:
> > From: kuyo chang <kuyo.chang@mediatek.com>
> >
> > [Symptom]
> > The calculation formula for fair_server runtime is based on
> > Frequency/CPU scale-invariance.
> > This will cause excessive RT latency (expect absolute time).
> >
> > [Analysis]
> > Consider the following case under a Big.LITTLE architecture:
> >
> > Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> > FIE: 100
> > CIE:50
> > First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> > Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
>
> What's this FIE/CIE stuff? Is that some ARM lingo?
>
>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index ad45a8fea245..8bfa846cf0dc 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> > if (dl_entity_is_special(dl_se))
> > return;
> >
> > - scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> > + if (dl_se == &rq->fair_server)
> > + scaled_delta_exec = delta_exec;
> > + else
> > + scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
>
> Juri, the point it a bit moot atm, but is this something specific to the
> fair_server in particular, or all servers?
I believe for other servers (i.e., rt-server work from Yuri and Luca) it
might be useful to have it configurable somehow. I actually had a recent
discussion about this concerning single task entities (traditional
deadline servers), for which there might also be cases where one would
not want to scale by frequency and capacity.
> Because if this is something all servers require then the above is
> ofcourse wrong.
To me it looks like we want this (no scaling) for fair_server (and
possibly scx_server?) as for them we are only looking into a 'fixed
time' type of isolation. Full-fledged servers (hierarchical scheduling)
could maybe have it configurable, or enabled by default as a start (as
we have it today).
Best,
Juri
* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: Peter Zijlstra @ 2025-06-17 14:14 UTC (permalink / raw)
To: Juri Lelli
Cc: Kuyo Chang, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
linux-kernel, linux-arm-kernel, linux-mediatek
On Tue, Jun 17, 2025 at 02:33:15PM +0200, Juri Lelli wrote:
> On 17/06/25 10:55, Peter Zijlstra wrote:
> > On Sat, Jun 14, 2025 at 10:04:55AM +0800, Kuyo Chang wrote:
> > > From: kuyo chang <kuyo.chang@mediatek.com>
> > >
> > > [Symptom]
> > > The calculation formula for fair_server runtime is based on
> > > Frequency/CPU scale-invariance.
> > > This will cause excessive RT latency (expect absolute time).
> > >
> > > [Analysis]
> > > Consider the following case under a Big.LITTLE architecture:
> > >
> > > Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> > > FIE: 100
> > > CIE:50
> > > First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> > > Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> >
> > What's this FIE/CIE stuff? Is that some ARM lingo?
> >
> >
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index ad45a8fea245..8bfa846cf0dc 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> > > if (dl_entity_is_special(dl_se))
> > > return;
> > >
> > > - scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> > > + if (dl_se == &rq->fair_server)
> > > + scaled_delta_exec = delta_exec;
> > > + else
> > > + scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> >
> > Juri, the point it a bit moot atm, but is this something specific to the
> > fair_server in particular, or all servers?
>
> I believe for other servers (i.e., rt-server work from Yuri and Luca) it
> might be useful to have it configurable somehow. I actually had a recent
> discussion about this concerning single task entities (traditional
> deadline servers) for which as well there might be cases where one might
> want not to scale considering frequency and capacity.
>
> > Because if this is something all servers require then the above is
> > ofcourse wrong.
>
> To me it looks like we want this (no scaling) for fair_server (and
> possibly scx_server?) as for them we are only looking into a 'fixed
> time' type of isolation. Full fledged servers (hierarchical scheduling)
> maybe have it configurable, or enabled by default as a start (as we have
> it today).
Right. Then we should write the above like:
	scaled_delta_exec = delta_exec;
	if (!dl_se->dl_server)
		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
and let any later server users add bits on if they want more options.
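A minimal user-space sketch of that control flow, using stand-in types rather than the real kernel structures. struct dl_entity_sketch, scale_sketch() and charge_runtime_sketch() are hypothetical names, not kernel symbols, and the scaling step is assumed to be the two `(v * s) >> 10` shifts from the patch description. Only non-server entities get the frequency/capacity scaling; servers are charged the raw delta.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the two fields of struct sched_dl_entity used here. */
struct dl_entity_sketch {
	bool dl_server;		/* true for a server entity such as fair_server */
	int64_t runtime;	/* remaining budget in ns */
};

/* Assumed scaling step: frequency factor, then CPU capacity factor. */
static int64_t scale_sketch(int64_t delta_exec, int64_t freq, int64_t cpu)
{
	return (((delta_exec * freq) >> 10) * cpu) >> 10;
}

/* Charge delta_exec against the budget; servers are not scaled. */
static void charge_runtime_sketch(struct dl_entity_sketch *se,
				  int64_t delta_exec,
				  int64_t freq, int64_t cpu)
{
	int64_t scaled_delta_exec = delta_exec;

	if (!se->dl_server)
		scaled_delta_exec = scale_sketch(delta_exec, freq, cpu);

	se->runtime -= scaled_delta_exec;
}
```

With a 50,000,000 ns budget and factors 100 and 50, a server entity is fully depleted after 50,000,000 ns of execution, while a non-server entity is only charged 238,418 ns.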
* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: Juri Lelli @ 2025-06-17 14:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kuyo Chang, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
linux-kernel, linux-arm-kernel, linux-mediatek
On 17/06/25 16:14, Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 02:33:15PM +0200, Juri Lelli wrote:
...
> > To me it looks like we want this (no scaling) for fair_server (and
> > possibly scx_server?) as for them we are only looking into a 'fixed
> > time' type of isolation. Full fledged servers (hierarchical scheduling)
> > maybe have it configurable, or enabled by default as a start (as we have
> > it today).
>
> Right. Then we should write the above like:
>
> scaled_delta_exec = delta_exec;
> if (!dl_se->dl_server)
> scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
>
> and let any later server users add bits on if they want more options.
Works for me. Looks cleaner also.
Kuyo, can you please update your patch then?
Thanks,
Juri
* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
From: Kuyo Chang (張建文) @ 2025-06-17 14:41 UTC (permalink / raw)
To: juri.lelli@redhat.com, peterz@infradead.org
Cc: bsegall@google.com, vschneid@redhat.com, dietmar.eggemann@arm.com,
rostedt@goodmis.org, AngeloGioacchino Del Regno, mingo@redhat.com,
vincent.guittot@linaro.org, mgorman@suse.de, jstultz@google.com,
matthias.bgg@gmail.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org
On Tue, 2025-06-17 at 16:23 +0200, Juri Lelli wrote:
> On 17/06/25 16:14, Peter Zijlstra wrote:
> > On Tue, Jun 17, 2025 at 02:33:15PM +0200, Juri Lelli wrote:
>
> ...
>
> > > To me it looks like we want this (no scaling) for fair_server (and
> > > possibly scx_server?) as for them we are only looking into a 'fixed
> > > time' type of isolation. Full fledged servers (hierarchical scheduling)
> > > maybe have it configurable, or enabled by default as a start (as we have
> > > it today).
> >
> > Right. Then we should write the above like:
> >
> >	scaled_delta_exec = delta_exec;
> >	if (!dl_se->dl_server)
> >		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> >
> > and let any later server users add bits on if they want more options.
>
> Works for me. Looks cleaner also.
>
> Kuyo, can you please update your patch then?
>
ok, let me update my patch ASAP.
> Thanks,
> Juri
>