* [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
@ 2023-05-30 13:55 ` Vineeth Pillai
2023-05-30 14:12 ` luca abeni
` (2 more replies)
2023-05-30 14:12 ` [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB luca abeni
` (3 subsequent siblings)
4 siblings, 3 replies; 10+ messages in thread
From: Vineeth Pillai @ 2023-05-30 13:55 UTC (permalink / raw)
To: luca.abeni, Juri Lelli, Daniel Bristot de Oliveira,
Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
Joel Fernandes, youssefesmat, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider
Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel, linux-doc
Update the details of GRUB to reflect the updated logic.
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
---
Documentation/scheduler/sched-deadline.rst | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index 9d9be52f221a..9fe4846079bb 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -203,12 +203,15 @@ Deadline Task Scheduling
- Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
runqueue, including the tasks in Inactive state.
+ - Maximum usable bandwidth (max_bw): This is the maximum bandwidth usable by
+ deadline tasks and is currently set to the RT capacity.
+
The algorithm reclaims the bandwidth of the tasks in Inactive state.
It does so by decrementing the runtime of the executing task Ti at a pace equal
to
- dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+ dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt
where:
--
2.40.1
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
@ 2023-05-30 14:12 ` luca abeni
2023-06-01 10:35 ` Daniel Bristot de Oliveira
2023-06-01 12:55 ` Juri Lelli
2 siblings, 0 replies; 10+ messages in thread
From: luca abeni @ 2023-05-30 14:12 UTC (permalink / raw)
To: Vineeth Pillai
Cc: Juri Lelli, Daniel Bristot de Oliveira, Peter Zijlstra,
Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc
I think this patch is also OK
Thanks,
Luca
On Tue, 30 May 2023 09:55:26 -0400
Vineeth Pillai <vineeth@bitbyteword.org> wrote:
> Update the details of GRUB to reflect the updated logic.
>
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---
> Documentation/scheduler/sched-deadline.rst | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/scheduler/sched-deadline.rst
> b/Documentation/scheduler/sched-deadline.rst index
> 9d9be52f221a..9fe4846079bb 100644 ---
> a/Documentation/scheduler/sched-deadline.rst +++
> b/Documentation/scheduler/sched-deadline.rst @@ -203,12 +203,15 @@
> Deadline Task Scheduling
> - Total bandwidth (this_bw): this is the sum of all tasks
> "belonging" to the runqueue, including the tasks in Inactive state.
>
> + - Maximum usable bandwidth (max_bw): This is the maximum bandwidth
> usable by
> + deadline tasks and is currently set to the RT capacity.
> +
>
> The algorithm reclaims the bandwidth of the tasks in Inactive state.
> It does so by decrementing the runtime of the executing task Ti at
> a pace equal to
>
> - dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
> + dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt
>
> where:
>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
2023-05-30 14:12 ` luca abeni
@ 2023-06-01 10:35 ` Daniel Bristot de Oliveira
2023-06-01 12:55 ` Juri Lelli
2 siblings, 0 replies; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-06-01 10:35 UTC (permalink / raw)
To: Vineeth Pillai, luca.abeni, Juri Lelli, Peter Zijlstra,
Ingo Molnar, Jonathan Corbet
Cc: linux-kernel, linux-doc, Vincent Guittot, Joel Fernandes,
Steven Rostedt, youssefesmat, Dietmar Eggemann, Ben Segall,
Valentin Schneider, Mel Gorman
On 5/30/23 15:55, Vineeth Pillai wrote:
> Update the details of GRUB to reflect the updated logic.
>
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Thanks!
-- Daniel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
2023-05-30 14:12 ` luca abeni
2023-06-01 10:35 ` Daniel Bristot de Oliveira
@ 2023-06-01 12:55 ` Juri Lelli
2 siblings, 0 replies; 10+ messages in thread
From: Juri Lelli @ 2023-06-01 12:55 UTC (permalink / raw)
To: Vineeth Pillai
Cc: luca.abeni, Daniel Bristot de Oliveira, Peter Zijlstra,
Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc
On 30/05/23 09:55, Vineeth Pillai wrote:
> Update the details of GRUB to reflect the updated logic.
>
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Best,
Juri
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
@ 2023-05-30 14:12 ` luca abeni
2023-06-01 10:34 ` Daniel Bristot de Oliveira
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: luca abeni @ 2023-05-30 14:12 UTC (permalink / raw)
To: Vineeth Pillai
Cc: Juri Lelli, Daniel Bristot de Oliveira, Peter Zijlstra,
Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc
I think this patch is OK
Thanks,
Luca
On Tue, 30 May 2023 09:55:25 -0400
Vineeth Pillai <vineeth@bitbyteword.org> wrote:
> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
>
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
>
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
>
> Tests on 6.3.0
> ==============
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
>
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
>
> Tests with this patch applied
> =============================
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
>
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
> %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
> %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa
>
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
> (2015). Parallel and sequential reclaiming in multicore
> real-time global scheduling.
>
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---
> kernel/sched/deadline.c | 50
> +++++++++++++++++++---------------------- kernel/sched/sched.h |
> 6 +++++ 2 files changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 71b24371a6f7..dfb59a363560 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct
> sched_dl_entity *dl_se) }
>
> /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to
> the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq =
> -dt",
> + * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
> * where u is the utilization of the task, Umax is the maximum
> reclaimable
> * utilization, Uinact is the (per-runqueue) inactive utilization,
> computed
> * as the difference between the "total runqueue utilization" and the
> - * runqueue active utilization, and Uextra is the (per runqueue)
> extra
> + * "runqueue active utilization", and Uextra is the (per runqueue)
> extra
> * reclaimable utilization.
> - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> - * multiplied by 2^BW_SHIFT, the result has to be shifted right by
> - * BW_SHIFT.
> - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT,
> - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by
> RATIO_SHIFT.
> - * Since delta is a 64 bit variable, to have an overflow its value
> - * should be larger than 2^(64 - 20 - 8), which is more than 64
> seconds.
> - * So, overflow is not an issue here.
> + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> multiplied
> + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
> + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT, dl_bw
> + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
> + * Since delta is a 64 bit variable, to have an overflow its value
> should be
> + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So,
> overflow is
> + * not an issue here.
> */
> static u64 grub_reclaim(u64 delta, struct rq *rq, struct
> sched_dl_entity *dl_se) {
> - u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */ u64 u_act;
> - u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >>
> RATIO_SHIFT;
> + u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */
> /*
> - * Instead of computing max{u * bw_ratio, (1 - u_inact -
> u_extra)},
> - * we compare u_inact + rq->dl.extra_bw with
> - * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
> - * u_inact + rq->dl.extra_bw can be larger than
> - * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
> - * leading to wrong results)
> + * Instead of computing max{u, (u_max - u_inact - u_extra)},
> we
> + * compare u_inact + u_extra with u_max - u, because u_inact
> + u_extra
> + * can be larger than u_max. So, u_max - u_inact - u_extra
> would be
> + * negative leading to wrong results.
> */
> - if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
> - u_act = u_act_min;
> + if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> + u_act = dl_se->dl_bw;
> else
> - u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> + u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
>
> + u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
> return (delta * u_act) >> BW_SHIFT;
> }
>
> @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq
> *dl_rq) {
> if (global_rt_runtime() == RUNTIME_INF) {
> dl_rq->bw_ratio = 1 << RATIO_SHIFT;
> - dl_rq->extra_bw = 1 << BW_SHIFT;
> + dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
> } else {
> dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
> global_rt_period()) >> (BW_SHIFT -
> RATIO_SHIFT);
> - dl_rq->extra_bw = to_ratio(global_rt_period(),
> -
> global_rt_runtime());
> + dl_rq->max_bw = dl_rq->extra_bw =
> + to_ratio(global_rt_period(),
> global_rt_runtime()); }
> }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..73027c2806dc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -753,6 +753,12 @@ struct dl_rq {
> u64 this_bw;
> u64 extra_bw;
>
> + /*
> + * Maximum available bandwidth for reclaiming by
> SCHED_FLAG_RECLAIM
> + * tasks of this rq. Used in calculation of reclaimable
> bandwidth(GRUB).
> + */
> + u64 max_bw;
> +
> /*
> * Inverse of the fraction of CPU utilization that can be
> reclaimed
> * by the GRUB algorithm.
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
2023-05-30 14:12 ` [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB luca abeni
@ 2023-06-01 10:34 ` Daniel Bristot de Oliveira
2023-06-01 11:56 ` Dietmar Eggemann
2023-06-01 12:54 ` Juri Lelli
4 siblings, 0 replies; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-06-01 10:34 UTC (permalink / raw)
To: Vineeth Pillai, luca.abeni, Juri Lelli, Peter Zijlstra,
Ingo Molnar
Cc: Jonathan Corbet, linux-kernel, linux-doc, Vincent Guittot,
Steven Rostedt, Joel Fernandes, youssefesmat, Dietmar Eggemann,
Mel Gorman, Valentin Schneider, Ben Segall
On 5/30/23 15:55, Vineeth Pillai wrote:
> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
>
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
>
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
>
> Tests on 6.3.0
> ==============
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
>
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
>
> Tests with this patch applied
> =============================
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
>
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
> %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
> %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa
>
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
> (2015). Parallel and sequential reclaiming in multicore
> real-time global scheduling.
So, I did some testing, mainly to validate the "one task cannot run on
two CPUs," that is, a case in which a higher utilization task would
always have its % of CPU available, even in the presence of low utilization
trying to reclaim all the CPU time. E.g.,
Task 1: runtime=1ms, deadline=period=10ms with reclaim
Task 2: runtime=1ms, deadline=period=10ms with reclaim
Task 3: runtime 8ms, deadline=period=10ms without reclaim
On two CPUs task 3 always have 80% available... the other tasks
do not get 95% because of migrations.
With 3+ cpus, the tasks can run up to 95% because there are CPUs to
everyone.
I played with migrate disable to force timelines... and yet, I failed
to break things so... we are good :-).
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Thanks!
-- Daniel
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
` (2 preceding siblings ...)
2023-06-01 10:34 ` Daniel Bristot de Oliveira
@ 2023-06-01 11:56 ` Dietmar Eggemann
2023-06-05 1:57 ` Vineeth Remanan Pillai
2023-06-01 12:54 ` Juri Lelli
4 siblings, 1 reply; 10+ messages in thread
From: Dietmar Eggemann @ 2023-06-01 11:56 UTC (permalink / raw)
To: Vineeth Pillai, luca.abeni, Juri Lelli,
Daniel Bristot de Oliveira, Peter Zijlstra, Ingo Molnar,
Vincent Guittot, Steven Rostedt, Joel Fernandes, youssefesmat,
Ben Segall, Mel Gorman, Valentin Schneider
Cc: Jonathan Corbet, linux-kernel, linux-doc
On 30/05/2023 15:55, Vineeth Pillai wrote:
[...]
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 71b24371a6f7..dfb59a363560 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
> }
>
> /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
> + * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
> * where u is the utilization of the task, Umax is the maximum reclaimable
> * utilization, Uinact is the (per-runqueue) inactive utilization, computed
> * as the difference between the "total runqueue utilization" and the
> - * runqueue active utilization, and Uextra is the (per runqueue) extra
> + * "runqueue active utilization", and Uextra is the (per runqueue) extra
> * reclaimable utilization.
> - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> - * multiplied by 2^BW_SHIFT, the result has to be shifted right by
> - * BW_SHIFT.
> - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
> - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
> - * Since delta is a 64 bit variable, to have an overflow its value
> - * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
> - * So, overflow is not an issue here.
> + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
> + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
> + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
> + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
nit-pick:
s/multiped/multiplied
> + * Since delta is a 64 bit variable, to have an overflow its value should be
> + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
> + * not an issue here.
> */
> static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
> {
> - u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
> u64 u_act;
> - u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
> + u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
>
> /*
> - * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
> - * we compare u_inact + rq->dl.extra_bw with
> - * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
> - * u_inact + rq->dl.extra_bw can be larger than
> - * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
> - * leading to wrong results)
> + * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
> + * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
> + * can be larger than u_max. So, u_max - u_inact - u_extra would be
> + * negative leading to wrong results.
> */
> - if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
> - u_act = u_act_min;
> + if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> + u_act = dl_se->dl_bw;
> else
> - u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> + u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
>
> + u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
> return (delta * u_act) >> BW_SHIFT;
> }
>
> @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
> {
> if (global_rt_runtime() == RUNTIME_INF) {
> dl_rq->bw_ratio = 1 << RATIO_SHIFT;
> - dl_rq->extra_bw = 1 << BW_SHIFT;
> + dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
> } else {
> dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
> global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
> - dl_rq->extra_bw = to_ratio(global_rt_period(),
> - global_rt_runtime());
> + dl_rq->max_bw = dl_rq->extra_bw =
> + to_ratio(global_rt_period(), global_rt_runtime());
> }
> }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..73027c2806dc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -753,6 +753,12 @@ struct dl_rq {
> u64 this_bw;
> u64 extra_bw;
>
> + /*
> + * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
> + * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
> + */
> + u64 max_bw;
> +
> /*
> * Inverse of the fraction of CPU utilization that can be reclaimed
> * by the GRUB algorithm.
Not related to this patch directly but I still can't see how `GRUB-PA`
with schedutil CPUfreq governor should work together with GRUB reclaiming.
CPU frequency is influenced by Uact (rq->dl.running_bw):
sugov_get_util() -> effective_cpu_util() -> cpu_bw_dl() ->
return rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT
But the extra GRUB reclaim runtime is calculated based on rq->dl.max_bw
and AFAICS, Uact is not adjusted by scaled_delta_exec?
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
2023-06-01 11:56 ` Dietmar Eggemann
@ 2023-06-05 1:57 ` Vineeth Remanan Pillai
0 siblings, 0 replies; 10+ messages in thread
From: Vineeth Remanan Pillai @ 2023-06-05 1:57 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: luca.abeni, Juri Lelli, Daniel Bristot de Oliveira,
Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
Joel Fernandes, youssefesmat, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc
Hi Dietmar,
Sorry for my late response..
> > + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
> > + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
>
> nit-pick:
>
> s/multiped/multiplied
>
Sorry I missed this. I am working on couple more GRUB fixes and will fix
this along with those changes.
> Not related to this patch directly but I still can't see how `GRUB-PA`
> with schedutil CPUfreq governor should work together with GRUB reclaiming.
>
> CPU frequency is influenced by Uact (rq->dl.running_bw):
>
> sugov_get_util() -> effective_cpu_util() -> cpu_bw_dl() ->
>
> return rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT
>
> But the extra GRUB reclaim runtime is calculated based on rq->dl.max_bw
> and AFAICS, Uact is not adjusted by scaled_delta_exec?
>
I haven't had a chance to look at GRUB-PA till now and after reading
your mail, I had a quick read of GRUB-PA paper[1] :-). From what I
understood, the dea of GRUB-PA is not to reclaim to the maximum
allowable bandwidth, but to scale down the frequency to the required
utilization(running_bw). This way the tasks run longer(similar to
reclaiming) but using less power.
GRUB reclaim is calculated indirectly based on running_bw as well as
"Uinact = this_bw - running_bw". We factor in reserved and running bw
into the equation and then scale it down to max_bw. So if my
understanding is correct, GRUB-PA doesn't technically reclaim extra
processing capacity, but just allows the task to run longer at lower
frequency so as to save power. I might be wrong and not an expert
here. Luca, please correct me if I am wrong.
Thanks,
Vineeth
[1] C. Scordino, G. Lipari, A Resource Reservation Algorithm for Power-Aware
Scheduling of Periodic and Aperiodic Real-Time Tasks, IEEE Transactions
on Computers, December 2006.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
` (3 preceding siblings ...)
2023-06-01 11:56 ` Dietmar Eggemann
@ 2023-06-01 12:54 ` Juri Lelli
4 siblings, 0 replies; 10+ messages in thread
From: Juri Lelli @ 2023-06-01 12:54 UTC (permalink / raw)
To: Vineeth Pillai
Cc: luca.abeni, Daniel Bristot de Oliveira, Peter Zijlstra,
Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc
Hi!
On 30/05/23 09:55, Vineeth Pillai wrote:
> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
>
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
>
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
>
> Tests on 6.3.0
> ==============
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
>
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
>
> Tests with this patch applied
> =============================
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
>
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
> %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
> %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa
>
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
> (2015). Parallel and sequential reclaiming in multicore
> real-time global scheduling.
>
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---
This looks good to me too. Thanks a lot for working on this and of
course to Luca and Daniel who reviewed and played with it as well.
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Best,
Juri
^ permalink raw reply [flat|nested] 10+ messages in thread