[PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula
@ 2025-06-27  2:28 Kuyo Chang
  2025-06-27  2:48 ` John Stultz
  0 siblings, 1 reply; 4+ messages in thread
From: Kuyo Chang @ 2025-06-27  2:28 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Matthias Brugger, AngeloGioacchino Del Regno
  Cc: John Stultz, kuyo chang, linux-kernel, linux-arm-kernel,
	linux-mediatek

From: kuyo chang <kuyo.chang@mediatek.com>

In our testing with 6.12 based kernel on a big.LITTLE system, we were
seeing instances of RT tasks being blocked from running on the LITTLE
cpus for multiple seconds of time, apparently by the dl_server. This
far exceeds the default configured 50ms per second runtime.

This is due to the fair dl_server runtime calculation being scaled
for frequency & capacity of the cpu.

Consider the following case under a Big.LITTLE architecture:
Assume the runtime is: 50,000,000 ns, and Frequency/capacity
scale-invariance defined as below:
Frequency scale-invariance: 100
Capacity scale-invariance: 50
First by Frequency scale-invariance,
the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
Then by capacity scale-invariance,
it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
So it will scaled to 238,418 ns.

This smaller "accounted runtime" value is what ends up being
subtracted against the fair-server's runtime for the current period.
Thus after 50ms of real time, we've only accounted ~238us against the
fair servers runtime. This 209:1 ratio in this example means that on
the smaller cpu the fair server is allowed to continue running,
blocking RT tasks, for over 10 seconds before it exhausts its supposed
50ms of runtime.  And on other hardware configurations it can be even
worse.

For the fair deadline_server, to prevent realtime tasks from being
unexpectedly delayed, we really do want to use fixed time, and not
scaled time for smaller capacity/frequency cpus. So remove the scaling
from the fair server's accounting to fix this.

Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: John Stultz <jstultz@google.com>
Tested-by: John Stultz <jstultz@google.com>

---
v1: https://lore.kernel.org/all/20250614020524.631521-1-kuyo.chang@mediatek.com/
v2: https://lore.kernel.org/all/20250617155355.1479777-1-kuyo.chang@mediatek.com/
v3: https://lore.kernel.org/all/20250626030746.2245365-1-kuyo.chang@mediatek.com/

v1->v2
Use the dl_server flag to identify scaled or non-scaled suggested by Peter.
v2->v3
Use the dl_server(dl_se) helper function for the code refactor suggested by John.
v3->v4
Commit log cleaned up/simplified suggested by John.

---
 kernel/sched/deadline.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245..96a21f38fcc3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1504,7 +1504,9 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 	if (dl_entity_is_special(dl_se))
 		return;

-	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
+	scaled_delta_exec = delta_exec;
+	if (!dl_server(dl_se))
+		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);

 	dl_se->runtime -= scaled_delta_exec;

@@ -1624,7 +1626,9 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
 	if (delta_exec < 0)
 		return;

-	scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
+	scaled_delta_exec = delta_exec;
+	if (!rq->fair_server.dl_server)
+		scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);

 	rq->fair_server.runtime -= scaled_delta_exec;

-- 
2.45.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula
  2025-06-27  2:28 [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula Kuyo Chang
@ 2025-06-27  2:48 ` John Stultz
  2025-07-01 10:49   ` Juri Lelli
  0 siblings, 1 reply; 4+ messages in thread
From: John Stultz @ 2025-06-27  2:48 UTC (permalink / raw)
  To: Kuyo Chang
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Matthias Brugger, AngeloGioacchino Del Regno,
	linux-kernel, linux-arm-kernel, linux-mediatek

On Thu, Jun 26, 2025 at 7:28 PM Kuyo Chang <kuyo.chang@mediatek.com> wrote:
> In our testing with 6.12 based kernel on a big.LITTLE system, we were
> seeing instances of RT tasks being blocked from running on the LITTLE
> cpus for multiple seconds of time, apparently by the dl_server. This
> far exceeds the default configured 50ms per second runtime.
>
> This is due to the fair dl_server runtime calculation being scaled
> for frequency & capacity of the cpu.
>
> Consider the following case under a Big.LITTLE architecture:
> Assume the runtime is: 50,000,000 ns, and Frequency/capacity
> scale-invariance defined as below:
> Frequency scale-invariance: 100
> Capacity scale-invariance: 50
> First by Frequency scale-invariance,
> the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> Then by capacity scale-invariance,
> it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> So it will scaled to 238,418 ns.
>
> This smaller "accounted runtime" value is what ends up being
> subtracted against the fair-server's runtime for the current period.
> Thus after 50ms of real time, we've only accounted ~238us against the
> fair servers runtime. This 209:1 ratio in this example means that on
> the smaller cpu the fair server is allowed to continue running,
> blocking RT tasks, for over 10 seconds before it exhausts its supposed
> 50ms of runtime.  And on other hardware configurations it can be even
> worse.
>
> For the fair deadline_server, to prevent realtime tasks from being
> unexpectedly delayed, we really do want to use fixed time, and not
> scaled time for smaller capacity/frequency cpus. So remove the scaling
> from the fair server's accounting to fix this.
>

Thanks again for revising the commit message, this version is easier
(for me at least) to follow.

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index ad45a8fea245..96a21f38fcc3 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1624,7 +1626,9 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
>         if (delta_exec < 0)
>                 return;
>
> -       scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
> +       scaled_delta_exec = delta_exec;
> +       if (!rq->fair_server.dl_server)
> +               scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
>
>         rq->fair_server.runtime -= scaled_delta_exec;

As I mentioned earlier, I still don't see this conditional as making a
lot of sense, as I don't believe there is time when this function
would be called and (!rq->fair_server.dl_server) would be true.
And even if there were, I'm not sure it makes sense to scale the time
interval based on the fair_server.dl_server flag.

From a separate discussion, you highlighted that it might be useful
once we have multiple dl_server types, which may want scaled
accounting, but I think in that case we should use an explicit flag
instead of the dl_server bit to denote if the accounting should be
scaled or not.

So, since your patch is a fix for a pretty bad bug, I think it should
be focused on fixing the issue in the simplest and clearest way for
the existing code, and not be too worried about integrating with
future changes that haven't landed.

Then, as those future changes land, we can see how best to generalize
the decision to scale or not scale the accounting on a dl_server.

That said, the conditional is a bit of a moot point, since I don't
think we'll actually hit it, and I'm motivated to get the bug you are
fixing resolved, so I wouldn't object if this went in as-is, but it
seems like it would be much cleaner to just drop that conditional as
you did in the original version of this patch.

Thanks so much again for your iterating on this change here! I really
appreciate your efforts to both find this issue as well as to get it
fixed!
-john

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula
  2025-06-27  2:48 ` John Stultz
@ 2025-07-01 10:49   ` Juri Lelli
  2025-07-02  2:36     ` Kuyo Chang
  0 siblings, 1 reply; 4+ messages in thread
From: Juri Lelli @ 2025-07-01 10:49 UTC (permalink / raw)
  To: John Stultz
  Cc: Kuyo Chang, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Matthias Brugger, AngeloGioacchino Del Regno,
	linux-kernel, linux-arm-kernel, linux-mediatek

On 26/06/25 19:48, John Stultz wrote:
> On Thu, Jun 26, 2025 at 7:28 PM Kuyo Chang <kuyo.chang@mediatek.com> wrote:
> > In our testing with 6.12 based kernel on a big.LITTLE system, we were
> > seeing instances of RT tasks being blocked from running on the LITTLE
> > cpus for multiple seconds of time, apparently by the dl_server. This
> > far exceeds the default configured 50ms per second runtime.
> >
> > This is due to the fair dl_server runtime calculation being scaled
> > for frequency & capacity of the cpu.
> >
> > Consider the following case under a Big.LITTLE architecture:
> > Assume the runtime is: 50,000,000 ns, and Frequency/capacity
> > scale-invariance defined as below:
> > Frequency scale-invariance: 100
> > Capacity scale-invariance: 50
> > First by Frequency scale-invariance,
> > the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
> > Then by capacity scale-invariance,
> > it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
> > So it will scaled to 238,418 ns.
> >
> > This smaller "accounted runtime" value is what ends up being
> > subtracted against the fair-server's runtime for the current period.
> > Thus after 50ms of real time, we've only accounted ~238us against the
> > fair servers runtime. This 209:1 ratio in this example means that on
> > the smaller cpu the fair server is allowed to continue running,
> > blocking RT tasks, for over 10 seconds before it exhausts its supposed
> > 50ms of runtime.  And on other hardware configurations it can be even
> > worse.
> >
> > For the fair deadline_server, to prevent realtime tasks from being
> > unexpectedly delayed, we really do want to use fixed time, and not
> > scaled time for smaller capacity/frequency cpus. So remove the scaling
> > from the fair server's accounting to fix this.
> >
> 
> Thanks again for revising the commit message, this version is easier
> (for me at least) to follow.
> 
> 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index ad45a8fea245..96a21f38fcc3 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1624,7 +1626,9 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
> >         if (delta_exec < 0)
> >                 return;
> >
> > -       scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
> > +       scaled_delta_exec = delta_exec;
> > +       if (!rq->fair_server.dl_server)
> > +               scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
> >
> >         rq->fair_server.runtime -= scaled_delta_exec;
> 
> As I mentioned earlier, I still don't see this conditional as making a
> lot of sense, as I don't believe there is time when this function
> would be called and (!rq->fair_server.dl_server) would be true.
> And even if there were, I'm not sure it makes sense to scale the time
> interval based on the fair_server.dl_server flag.
> 
> From a separate discussion, you highlighted that it might be useful
> once we have multiple dl_server types, which may want scaled
> accounting, but I think in that case we should use an explicit flag
> instead of the dl_server bit to denote if the accounting should be
> scaled or not.
> 
> So, since your patch is a fix for a pretty bad bug, I think it should
> be focused on fixing the issue in the simplest and clearest way for
> the existing code, and not be too worried about integrating with
> future changes that haven't landed.
> 
> Then, as those future changes land, we can see how best to generalize
> the decision to scale or not scale the accounting on a dl_server.
> 
> That said, the conditional is a bit of a moot point, since I don't
> think we'll actually hit it, and I'm motivated to get the bug you are
> fixing resolved, so I wouldn't object if this went in as-is, but it
> seems like it would be much cleaner to just drop that conditional as
> you did in the original version of this patch.

I agree. It would be better to drop the conditional.

Thanks!
Juri



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula
  2025-07-01 10:49   ` Juri Lelli
@ 2025-07-02  2:36     ` Kuyo Chang
  0 siblings, 0 replies; 4+ messages in thread
From: Kuyo Chang @ 2025-07-02  2:36 UTC (permalink / raw)
  To: Juri Lelli, John Stultz
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-kernel,
	linux-arm-kernel, linux-mediatek

On Tue, 2025-07-01 at 11:49 +0100, Juri Lelli wrote:
Hi John/Juri,
> 
> > 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index ad45a8fea245..96a21f38fcc3 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -1624,7 +1626,9 @@ void dl_server_update_idle_time(struct rq
> > > *rq, struct task_struct *p)
> > >         if (delta_exec < 0)
> > >                 return;
> > > 
> > > -       scaled_delta_exec = dl_scaled_delta_exec(rq, &rq-
> > > >fair_server, delta_exec);
> > > +       scaled_delta_exec = delta_exec;
> > > +       if (!rq->fair_server.dl_server)
> > > +               scaled_delta_exec = dl_scaled_delta_exec(rq, &rq-
> > > >fair_server, delta_exec);
> > > 
> > >         rq->fair_server.runtime -= scaled_delta_exec;
> > 
> > As I mentioned earlier, I still don't see this conditional as
> > making a
> > lot of sense, as I don't believe there is time when this function
> > would be called and (!rq->fair_server.dl_server) would be true.
> > And even if there were, I'm not sure it makes sense to scale the
> > time
> > interval based on the fair_server.dl_server flag.
> > 
> > From a separate discussion, you highlighted that it might be useful
> > once we have multiple dl_server types, which may want scaled
> > accounting, but I think in that case we should use an explicit flag
> > instead of the dl_server bit to denote if the accounting should be
> > scaled or not.
> > 
> > So, since your patch is a fix for a pretty bad bug, I think it
> > should
> > be focused on fixing the issue in the simplest and clearest way for
> > the existing code, and not be too worried about integrating with
> > future changes that haven't landed.
> > 
> > Then, as those future changes land, we can see how best to
> > generalize
> > the decision to scale or not scale the accounting on a dl_server.
> > 
> > That said, the conditional is a bit of a moot point, since I don't
> > think we'll actually hit it, and I'm motivated to get the bug you
> > are
> > fixing resolved, so I wouldn't object if this went in as-is, but it
> > seems like it would be much cleaner to just drop that conditional
> > as
> > you did in the original version of this patch.
> 
> I agree. It would be better to drop the conditional.
> 

Thanks for your feedback and suggestion.
So the original version of this patch for fair_server is much better &
cleaner.
Updated to v5 as below link
https://lore.kernel.org/all/20250702021440.2594736-1-kuyo.chang@mediatek.com/

> Thanks!
> Juri
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-07-02  2:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-27  2:28 [PATCH V4 1/1] sched/deadline: Fix dl_server runtime calculation formula Kuyo Chang
2025-06-27  2:48 ` John Stultz
2025-07-01 10:49   ` Juri Lelli
2025-07-02  2:36     ` Kuyo Chang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).