Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: luca abeni <luca.abeni@santannapisa.it>
To: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Juri Lelli <juri.lelli@redhat.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	youssefesmat@google.com,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
Date: Tue, 30 May 2023 16:12:06 +0200	[thread overview]
Message-ID: <20230530161206.0ba6e956@nowhere> (raw)
In-Reply-To: <20230530135526.2385378-1-vineeth@bitbyteword.org>

I think this patch is OK


			Thanks,
				Luca

On Tue, 30 May 2023 09:55:25 -0400
Vineeth Pillai <vineeth@bitbyteword.org> wrote:

> According to the GRUB[1] rule, the runtime is depreciated as:
>   "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
> 
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
>   "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
> 
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
> 
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
> 
> Tests on 6.3.0
> ==============
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
> 
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
> 
> Tests with this patch applied
> =============================
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
> 
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
> %Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
> %Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa
> 
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
>      (2015). Parallel and sequential reclaiming in multicore
>      real-time global scheduling.
> 
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---
>  kernel/sched/deadline.c | 50
> +++++++++++++++++++---------------------- kernel/sched/sched.h    |
> 6 +++++ 2 files changed, 29 insertions(+), 27 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 71b24371a6f7..dfb59a363560 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct
> sched_dl_entity *dl_se) }
>  
>  /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to
> the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq =
> -dt",
> + * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
>   * where u is the utilization of the task, Umax is the maximum
> reclaimable
>   * utilization, Uinact is the (per-runqueue) inactive utilization,
> computed
>   * as the difference between the "total runqueue utilization" and the
> - * runqueue active utilization, and Uextra is the (per runqueue)
> extra
> + * "runqueue active utilization", and Uextra is the (per runqueue)
> extra
>   * reclaimable utilization.
> - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> - * multiplied by 2^BW_SHIFT, the result has to be shifted right by
> - * BW_SHIFT.
> - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT,
> - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by
> RATIO_SHIFT.
> - * Since delta is a 64 bit variable, to have an overflow its value
> - * should be larger than 2^(64 - 20 - 8), which is more than 64
> seconds.
> - * So, overflow is not an issue here.
> + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> multiplied
> + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
> + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT, dl_bw
> + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
> + * Since delta is a 64 bit variable, to have an overflow its value
> should be
> + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So,
> overflow is
> + * not an issue here.
>   */
>  static u64 grub_reclaim(u64 delta, struct rq *rq, struct
> sched_dl_entity *dl_se) {
> -	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */ u64 u_act;
> -	u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >>
> RATIO_SHIFT;
> +	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */ 
>  	/*
> -	 * Instead of computing max{u * bw_ratio, (1 - u_inact -
> u_extra)},
> -	 * we compare u_inact + rq->dl.extra_bw with
> -	 * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
> -	 * u_inact + rq->dl.extra_bw can be larger than
> -	 * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
> -	 * leading to wrong results)
> +	 * Instead of computing max{u, (u_max - u_inact - u_extra)},
> we
> +	 * compare u_inact + u_extra with u_max - u, because u_inact
> + u_extra
> +	 * can be larger than u_max. So, u_max - u_inact - u_extra
> would be
> +	 * negative leading to wrong results.
>  	 */
> -	if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
> -		u_act = u_act_min;
> +	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> +		u_act = dl_se->dl_bw;
>  	else
> -		u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> +		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
>  
> +	u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
>  	return (delta * u_act) >> BW_SHIFT;
>  }
>  
> @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq
> *dl_rq) {
>  	if (global_rt_runtime() == RUNTIME_INF) {
>  		dl_rq->bw_ratio = 1 << RATIO_SHIFT;
> -		dl_rq->extra_bw = 1 << BW_SHIFT;
> +		dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
>  	} else {
>  		dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
>  			  global_rt_period()) >> (BW_SHIFT -
> RATIO_SHIFT);
> -		dl_rq->extra_bw = to_ratio(global_rt_period(),
> -
> global_rt_runtime());
> +		dl_rq->max_bw = dl_rq->extra_bw =
> +			to_ratio(global_rt_period(),
> global_rt_runtime()); }
>  }
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..73027c2806dc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -753,6 +753,12 @@ struct dl_rq {
>  	u64			this_bw;
>  	u64			extra_bw;
>  
> +	/*
> +	 * Maximum available bandwidth for reclaiming by
> SCHED_FLAG_RECLAIM
> +	 * tasks of this rq. Used in calculation of reclaimable
> bandwidth(GRUB).
> +	 */
> +	u64			max_bw;
> +
>  	/*
>  	 * Inverse of the fraction of CPU utilization that can be
> reclaimed
>  	 * by the GRUB algorithm.

next prev parent reply	other threads:[~2023-05-30 14:12 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
2023-05-30 14:12   ` luca abeni
2023-06-01 10:35   ` Daniel Bristot de Oliveira
2023-06-01 12:55   ` Juri Lelli
2023-05-30 14:12 ` luca abeni [this message]
2023-06-01 10:34 ` [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Daniel Bristot de Oliveira
2023-06-01 11:56 ` Dietmar Eggemann
2023-06-05  1:57   ` Vineeth Remanan Pillai
2023-06-01 12:54 ` Juri Lelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230530161206.0ba6e956@nowhere \
    --to=luca.abeni@santannapisa.it \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=joel@joelfernandes.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vineeth@bitbyteword.org \
    --cc=vschneid@redhat.com \
    --cc=youssefesmat@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).