public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: mingo@redhat.com, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, linux-kernel@vger.kernel.org,
	zhangqiao22@huawei.com
Subject: Re: [PATCH 4/4] sched/fair: limit sched slice duration
Date: Fri, 9 Sep 2022 13:14:07 +0200	[thread overview]
Message-ID: <Yxsf/5ErmVoKFucb@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20220825122726.20819-5-vincent.guittot@linaro.org>


Picked up the first three.

On Thu, Aug 25, 2022 at 02:27:26PM +0200, Vincent Guittot wrote:
> In presence of a lot of small weight tasks like sched_idle tasks, normal
> or high weight tasks can see their ideal runtime (sched_slice) to increase
> to hundreds ms whereas it normally stays below sysctl_sched_latency.
> 
> 2 normal tasks running on a CPU will have a max sched_slice of 12ms
> (half of the sched_period). This means that they will make progress
> every sysctl_sched_latency period.
> 
> If we now add 1000 idle tasks on the CPU, the sched_period becomes

Surely people aren't actually having that many runnable tasks and this
is a device for the argument?

> 3006 ms and the ideal runtime of the normal tasks becomes 609 ms.
> It will even become 1500ms if the idle tasks belongs to an idle cgroup.
> This means that the scheduler will look for picking another waiting task
> after 609ms running time (1500ms respectively). The idle tasks change
> significantly the way the 2 normal tasks interleave their running time
> slot whereas they should have a small impact.
> 
> Such long sched_slice can delay significantly the release of resources
> as the tasks can wait hundreds of ms before the next running slot just
> because of idle tasks queued on the rq.
> 
> Cap the ideal_runtime to sysctl_sched_latency when comparing to the next
> waiting task to make sure that tasks will regularly make progress and will
> not be significantly impacted by idle/background tasks queued on the rq.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> 
> While studying the problem, I have also considered to substract
> cfs.idle_h_nr_running before computing the sched_slice but we can have
> quite similar problem with low weight bormal task/cgroup so I have decided
> to keep this solution.

That ^... my proposal below has the same problem.

This:

> Also, this solution doesn't completly remove the impact of idle tasks
> in the scheduling pattern but cap the running slice of a task to a max
> value of 2*sysctl_sched_latency.

I'm failing to see how.

>  kernel/sched/fair.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 260a55ac462f..96fedd0ab5fa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4599,6 +4599,8 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
>  	if (delta < 0)
>  		return;

(I'm thinking that early return is a bit pointless, a negative value
won't be larger than ideal_time anyway)

> +	ideal_runtime =  min_t(u64, ideal_runtime, sysctl_sched_latency);
> +

(superfluous whitespace -- twice, once after the = once this whole extra
line)

>  	if (delta > ideal_runtime)
>  		resched_curr(rq_of(cfs_rq));
>  }

Urgghhhh..

so delta is in vtime here, while sched_latency is not, so the heavier
the queue, the larger this value becomes.

Given those 1000 idle tasks, rq-weight would be around 2048; however due
to nr_running being insane, sched_slice() ends up being something like:

  1000 * min_gran * 2 / 2048

which is around ~min_gran and so won't come near to latency.


since we already have idle_min_gran; how about something like this?


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index efceb670e755..8dd18fc0affa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -706,12 +706,14 @@ static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
  *
  * p = (nr <= nl) ? l : l*nr/nl
  */
-static u64 __sched_period(unsigned long nr_running)
+static u64 __sched_period(unsigned long nr_running, unsigned long nr_idle)
 {
-	if (unlikely(nr_running > sched_nr_latency))
-		return nr_running * sysctl_sched_min_granularity;
-	else
-		return sysctl_sched_latency;
+	u64 period = 0;
+
+	period += nr_running * sysctl_sched_min_granularity;
+	period += nr_idle    * sysctl_sched_idle_min_granularity;
+
+	return max_t(u64, period, sysctl_sched_latency);
 }
 
 static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
@@ -724,15 +726,25 @@ static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	unsigned int nr_running = cfs_rq->nr_running;
+	unsigned int nr_idle = cfs_rq->idle_nr_running;
+	unsigned int nr_running = cfs_rq->nr_running - nr_idle;
 	struct sched_entity *init_se = se;
 	unsigned int min_gran;
 	u64 slice;
 
-	if (sched_feat(ALT_PERIOD))
-		nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
+	if (sched_feat(ALT_PERIOD)) {
+		nr_idle = rq_of(cfs_rq)->cfs.idle_h_nr_running;
+		nr_running = rq_of(cfs_rq)->cfs.h_nr_running - nr_idle;
+	}
+
+	if (!se->on_rq) {
+		if (se_is_idle(se))
+			nr_idle++;
+		else
+			nr_running++;
+	}
 
-	slice = __sched_period(nr_running + !se->on_rq);
+	slice = __sched_period(nr_running, nr_idle);
 
 	for_each_sched_entity(se) {
 		struct load_weight *load;


This changes how the compute the period depending on the composition. It
suffers the exact same problem you had earlier though in that it doesn't
work for the other low-weight cases. But perhaps we can come up with a
better means of computing the period that *does* consider them?

As said before;... urgh! bit of a sticky problem this.

  reply	other threads:[~2022-09-09 11:14 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-25 12:27 [PATCH 0/4] sched/fair: fixes in presence of lot of sched_idle tasks Vincent Guittot
2022-08-25 12:27 ` [PATCH 1/4] sched/fair: make sure to try to detach at least one movable task Vincent Guittot
2022-09-12  8:44   ` Dietmar Eggemann
2022-09-13  8:54     ` Vincent Guittot
2022-09-15 14:24   ` [tip: sched/core] sched/fair: Make " tip-bot2 for Vincent Guittot
2024-02-12 20:28   ` [PATCH 1/4] sched/fair: make " Josh Don
2024-03-20 16:57     ` Vincent Guittot
2024-03-21 20:25       ` Josh Don
2024-03-22 17:16         ` Vincent Guittot
2024-03-22 19:49           ` Josh Don
2022-08-25 12:27 ` [PATCH 2/4] sched/fair: cleanup loop_max and loop_break Vincent Guittot
2022-09-12  8:45   ` Dietmar Eggemann
2022-09-13  8:37     ` Vincent Guittot
2022-09-15 14:24   ` [tip: sched/core] sched/fair: Cleanup " tip-bot2 for Vincent Guittot
2022-08-25 12:27 ` [PATCH 3/4] sched/fair: move call to list_last_entry() in detach_tasks Vincent Guittot
2022-09-12  8:46   ` Dietmar Eggemann
2022-09-15 14:24   ` [tip: sched/core] sched/fair: Move " tip-bot2 for Vincent Guittot
2022-08-25 12:27 ` [PATCH 4/4] sched/fair: limit sched slice duration Vincent Guittot
2022-09-09 11:14   ` Peter Zijlstra [this message]
2022-09-09 14:03     ` Vincent Guittot
2022-09-09 15:05       ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yxsf/5ErmVoKFucb@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=zhangqiao22@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox