[PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
@ 2025-06-14  2:04 Kuyo Chang
  2025-06-14  2:35 ` John Stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Kuyo Chang @ 2025-06-14  2:04 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Matthias Brugger, AngeloGioacchino Del Regno
  Cc: jstultz, kuyo chang, linux-kernel, linux-arm-kernel,
	linux-mediatek

From: kuyo chang <kuyo.chang@mediatek.com>

[Symptom]
The calculation formula for fair_server runtime is based on
Frequency/CPU scale-invariance.
This will cause excessive RT latency (expect absolute time).

[Analysis]
Consider the following case under a Big.LITTLE architecture:

Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
FIE: 100
CIE:50
First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.

So it will scaled to 238,418 ns.

[Solution]
The runtime for fair_server should be absolute time
asis RT bandwidth control.
Fix the runtime calculation formula for the fair_server.

Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
---
 kernel/sched/deadline.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245..8bfa846cf0dc 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 	if (dl_entity_is_special(dl_se))
 		return;
 
-	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
+	if (dl_se == &rq->fair_server)
+		scaled_delta_exec = delta_exec;
+	else
+		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
 
 	dl_se->runtime -= scaled_delta_exec;
 
@@ -1611,7 +1614,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
  */
 void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
 {
-	s64 delta_exec, scaled_delta_exec;
+	s64 delta_exec;
 
 	if (!rq->fair_server.dl_defer)
 		return;
@@ -1624,9 +1627,7 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
 	if (delta_exec < 0)
 		return;
 
-	scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
-
-	rq->fair_server.runtime -= scaled_delta_exec;
+	rq->fair_server.runtime -= delta_exec;
 
 	if (rq->fair_server.runtime < 0) {
 		rq->fair_server.dl_defer_running = 0;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
  2025-06-14  2:04 [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula Kuyo Chang
@ 2025-06-14  2:35 ` John Stultz
  2025-06-16 14:03 ` Juri Lelli
  2025-06-17  8:55 ` Peter Zijlstra
  2 siblings, 0 replies; 8+ messages in thread
From: John Stultz @ 2025-06-14  2:35 UTC (permalink / raw)
  To: Kuyo Chang
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Matthias Brugger, AngeloGioacchino Del Regno,
	linux-kernel, linux-arm-kernel, linux-mediatek

On Fri, Jun 13, 2025 at 7:05 PM Kuyo Chang <kuyo.chang@mediatek.com> wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
>
> [Symptom]
> The calculation formula for fair_server runtime is based on
> Frequency/CPU scale-invariance.
> This will cause excessive RT latency (expect absolute time).
>
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
>
> Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> FIE: 100
> CIE:50
> First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
>
> So it will scaled to 238,418 ns.
>
> [Solution]
> The runtime for fair_server should be absolute time
> asis RT bandwidth control.
> Fix the runtime calculation formula for the fair_server.
>
> Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>

While I've not quite gotten my head around the details in the
dl_server code, I've been able to reproduce the problem described here
with a 6.12 based kernel.

Running cyclictest (with arguments "-t -a -p99 -m") , and a randomized
input test on an Android device, its pretty easy to trip 100ms to
*multi-second* delays of the RT prio 99 threads.

Perfetto image:
https://github.com/johnstultz-work/misc/blob/main/images/2025-06-13_cyclictest-dl-server-latency.png

Link to the actual trace:
https://ui.perfetto.dev/#!/?s=9bbb9e539ac2bbbfe3cfa954409134662a9f624a

Using this patch, so far in my testing with the same workload, the max
cyclictest latencies stick around the single digit ms range.

The part that is a little confusing to me, is that prior to the long
stall, it doesn't appear that RT tasks are actually starving
SCHED_NORMAL tasks, so I'm conceptually surprised to see the dl_server
boosting the normal tasks, especially for so long, but I admittedly
haven't looked in detail at the code and have been going off my
understanding of how it was supposed to replace rt-throttling, so I
may be missing a subtlety.

thanks
-john

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
  2025-06-14  2:04 [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula Kuyo Chang
  2025-06-14  2:35 ` John Stultz
@ 2025-06-16 14:03 ` Juri Lelli
  2025-06-17  8:55 ` Peter Zijlstra
  2 siblings, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2025-06-16 14:03 UTC (permalink / raw)
  To: Kuyo Chang
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
	linux-kernel, linux-arm-kernel, linux-mediatek

Hello,

On 14/06/25 10:04, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
> 
> [Symptom]
> The calculation formula for fair_server runtime is based on
> Frequency/CPU scale-invariance.
> This will cause excessive RT latency (expect absolute time).
> 
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
> 
> Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> FIE: 100
> CIE:50
> First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> 
> So it will scaled to 238,418 ns.
> 
> [Solution]
> The runtime for fair_server should be absolute time
> asis RT bandwidth control.
> Fix the runtime calculation formula for the fair_server.
> 
> Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
> ---

Right, I would agree we don't actually want to scale fair_server runtime
by frequency/capacity. Your change looks good to me.

Acked-by: Juri Lelli <juri.lelli@redhat.com>

Thanks!
Juri


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
  2025-06-14  2:04 [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula Kuyo Chang
  2025-06-14  2:35 ` John Stultz
  2025-06-16 14:03 ` Juri Lelli
@ 2025-06-17  8:55 ` Peter Zijlstra
  2025-06-17 12:33   ` Juri Lelli
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2025-06-17  8:55 UTC (permalink / raw)
  To: Kuyo Chang
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
	linux-kernel, linux-arm-kernel, linux-mediatek

On Sat, Jun 14, 2025 at 10:04:55AM +0800, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
> 
> [Symptom]
> The calculation formula for fair_server runtime is based on
> Frequency/CPU scale-invariance.
> This will cause excessive RT latency (expect absolute time).
> 
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
> 
> Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> FIE: 100
> CIE:50
> First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.

What's this FIE/CIE stuff? Is that some ARM lingo?


> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index ad45a8fea245..8bfa846cf0dc 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
>  	if (dl_entity_is_special(dl_se))
>  		return;
>  
> -	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> +	if (dl_se == &rq->fair_server)
> +		scaled_delta_exec = delta_exec;
> +	else
> +		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);

Juri, the point it a bit moot atm, but is this something specific to the
fair_server in particular, or all servers?

Because if this is something all servers require then the above is
ofcourse wrong.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
  2025-06-17  8:55 ` Peter Zijlstra
@ 2025-06-17 12:33   ` Juri Lelli
  2025-06-17 14:14     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Juri Lelli @ 2025-06-17 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kuyo Chang, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
	linux-kernel, linux-arm-kernel, linux-mediatek

On 17/06/25 10:55, Peter Zijlstra wrote:
> On Sat, Jun 14, 2025 at 10:04:55AM +0800, Kuyo Chang wrote:
> > From: kuyo chang <kuyo.chang@mediatek.com>
> > 
> > [Symptom]
> > The calculation formula for fair_server runtime is based on
> > Frequency/CPU scale-invariance.
> > This will cause excessive RT latency (expect absolute time).
> > 
> > [Analysis]
> > Consider the following case under a Big.LITTLE architecture:
> > 
> > Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> > FIE: 100
> > CIE:50
> > First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> > Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> 
> What's this FIE/CIE stuff? Is that some ARM lingo?
> 
> 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index ad45a8fea245..8bfa846cf0dc 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> >  	if (dl_entity_is_special(dl_se))
> >  		return;
> >  
> > -	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> > +	if (dl_se == &rq->fair_server)
> > +		scaled_delta_exec = delta_exec;
> > +	else
> > +		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> 
> Juri, the point it a bit moot atm, but is this something specific to the
> fair_server in particular, or all servers?

I believe for other servers (i.e., rt-server work from Yuri and Luca) it
might be useful to have it configurable somehow. I actually had a recent
discussion about this concerning single task entities (traditional
deadline servers) for which as well there might be cases where one might
want not to scale considering frequency and capacity.

> Because if this is something all servers require then the above is
> ofcourse wrong.

To me it looks like we want this (no scaling) for fair_server (and
possibly scx_server?) as for them we are only looking into a 'fixed
time' type of isolation. Full fledged servers (hierarchical scheduling)
maybe have it configurable, or enabled by default as a start (as we have
it today).

Best,
Juri


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
  2025-06-17 12:33   ` Juri Lelli
@ 2025-06-17 14:14     ` Peter Zijlstra
  2025-06-17 14:23       ` Juri Lelli
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2025-06-17 14:14 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Kuyo Chang, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
	linux-kernel, linux-arm-kernel, linux-mediatek

On Tue, Jun 17, 2025 at 02:33:15PM +0200, Juri Lelli wrote:
> On 17/06/25 10:55, Peter Zijlstra wrote:
> > On Sat, Jun 14, 2025 at 10:04:55AM +0800, Kuyo Chang wrote:
> > > From: kuyo chang <kuyo.chang@mediatek.com>
> > > 
> > > [Symptom]
> > > The calculation formula for fair_server runtime is based on
> > > Frequency/CPU scale-invariance.
> > > This will cause excessive RT latency (expect absolute time).
> > > 
> > > [Analysis]
> > > Consider the following case under a Big.LITTLE architecture:
> > > 
> > > Assume the runtime is : 50,000,000 ns, and FIE/CIE as below
> > > FIE: 100
> > > CIE:50
> > > First by FIE, the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> > > Then by CIE, it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> > 
> > What's this FIE/CIE stuff? Is that some ARM lingo?
> > 
> > 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index ad45a8fea245..8bfa846cf0dc 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -1504,7 +1504,10 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> > >  	if (dl_entity_is_special(dl_se))
> > >  		return;
> > >  
> > > -	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> > > +	if (dl_se == &rq->fair_server)
> > > +		scaled_delta_exec = delta_exec;
> > > +	else
> > > +		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> > 
> > Juri, the point it a bit moot atm, but is this something specific to the
> > fair_server in particular, or all servers?
> 
> I believe for other servers (i.e., rt-server work from Yuri and Luca) it
> might be useful to have it configurable somehow. I actually had a recent
> discussion about this concerning single task entities (traditional
> deadline servers) for which as well there might be cases where one might
> want not to scale considering frequency and capacity.
> 
> > Because if this is something all servers require then the above is
> > ofcourse wrong.
> 
> To me it looks like we want this (no scaling) for fair_server (and
> possibly scx_server?) as for them we are only looking into a 'fixed
> time' type of isolation. Full fledged servers (hierarchical scheduling)
> maybe have it configurable, or enabled by default as a start (as we have
> it today).

Right. Then we should write the above like:

	scaled_delta_exec = delta_exec;
	if (!dl_se->dl_server)
		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);

and let any later server users add bits on if they want more options.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
  2025-06-17 14:14     ` Peter Zijlstra
@ 2025-06-17 14:23       ` Juri Lelli
  2025-06-17 14:41         ` Kuyo Chang (張建文)
  0 siblings, 1 reply; 8+ messages in thread
From: Juri Lelli @ 2025-06-17 14:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kuyo Chang, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Matthias Brugger, AngeloGioacchino Del Regno, jstultz,
	linux-kernel, linux-arm-kernel, linux-mediatek

On 17/06/25 16:14, Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 02:33:15PM +0200, Juri Lelli wrote:

...

> > To me it looks like we want this (no scaling) for fair_server (and
> > possibly scx_server?) as for them we are only looking into a 'fixed
> > time' type of isolation. Full fledged servers (hierarchical scheduling)
> > maybe have it configurable, or enabled by default as a start (as we have
> > it today).
> 
> Right. Then we should write the above like:
> 
> 	scaled_delta_exec = delta_exec;
> 	if (!dl_se->dl_server)
> 		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
> 
> and let any later server users add bits on if they want more options.

Works for me. Looks cleaner also.

Kuyo, can you please update your patch then?

Thanks,
Juri


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula
  2025-06-17 14:23       ` Juri Lelli
@ 2025-06-17 14:41         ` Kuyo Chang (張建文)
  0 siblings, 0 replies; 8+ messages in thread
From: Kuyo Chang (張建文) @ 2025-06-17 14:41 UTC (permalink / raw)
  To: juri.lelli@redhat.com, peterz@infradead.org
  Cc: bsegall@google.com, vschneid@redhat.com, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, AngeloGioacchino Del Regno, mingo@redhat.com,
	vincent.guittot@linaro.org, mgorman@suse.de, jstultz@google.com,
	matthias.bgg@gmail.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org

On Tue, 2025-06-17 at 16:23 +0200, Juri Lelli wrote:
> 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> On 17/06/25 16:14, Peter Zijlstra wrote:
> > On Tue, Jun 17, 2025 at 02:33:15PM +0200, Juri Lelli wrote:
> 
> ...
> 
> > > To me it looks like we want this (no scaling) for fair_server
> > > (and
> > > possibly scx_server?) as for them we are only looking into a
> > > 'fixed
> > > time' type of isolation. Full fledged servers (hierarchical
> > > scheduling)
> > > maybe have it configurable, or enabled by default as a start (as
> > > we have
> > > it today).
> > 
> > Right. Then we should write the above like:
> > 
> >       scaled_delta_exec = delta_exec;
> >       if (!dl_se->dl_server)
> >               scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se,
> > delta_exec);
> > 
> > and let any later server users add bits on if they want more
> > options.
> 
> Works for me. Looks cleaner also.
> 
> Kuyo, can you please update your patch then?
> 
ok, let me update my patch ASAP.

> Thanks,
> Juri
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-06-17 14:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-14  2:04 [PATCH 1/1] sched/deadline: Fix fair_server runtime calculation formula Kuyo Chang
2025-06-14  2:35 ` John Stultz
2025-06-16 14:03 ` Juri Lelli
2025-06-17  8:55 ` Peter Zijlstra
2025-06-17 12:33   ` Juri Lelli
2025-06-17 14:14     ` Peter Zijlstra
2025-06-17 14:23       ` Juri Lelli
2025-06-17 14:41         ` Kuyo Chang (張建文)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).