All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Quentin Perret <qperret@google.com>,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	Rik van Riel <riel@surriel.com>,
	Lingutla Chandrasekhar <clingutla@codeaurora.org>
Subject: Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
Date: Fri, 16 Apr 2021 15:51:13 +0200	[thread overview]
Message-ID: <20210416135113.GA16445@vingu-book> (raw)
In-Reply-To: <20210415175846.494385-3-valentin.schneider@arm.com>

Le jeudi 15 avril 2021 à 18:58:46 (+0100), Valentin Schneider a écrit :
> Consider the following topology:
> 
>   DIE [          ]
>   MC  [    ][    ]
>        0  1  2  3
> 
>   capacity_orig_of(x \in {0-1}) < capacity_orig_of(x \in {2-3})
> 
> w/ CPUs 2-3 idle and CPUs 0-1 running CPU hogs (util_avg=1024).
> 
> When CPU2 goes through load_balance() (via periodic / NOHZ balance), it
> should pull one CPU hog from either CPU0 or CPU1 (this is misfit task
> upmigration). However, should a e.g. pcpu kworker awake on CPU0 just before
> this load_balance() happens and preempt the CPU hog running there, we would
> have, for the [0-1] group at CPU2's DIE level:
> 
> o sgs->sum_nr_running > sgs->group_weight
> o sgs->group_capacity * 100 < sgs->group_util * imbalance_pct
> 
> IOW, this group is group_overloaded.
> 
> Considering CPU0 is picked by find_busiest_queue(), we would then visit the
> preempted CPU hog in detach_tasks(). However, given it has just been
> preempted by this pcpu kworker, task_hot() will prevent it from being
> detached. We then leave load_balance() without having done anything.
> 
> Long story short, preempted misfit tasks are affected by task_hot(), while
> currently running misfit tasks are intentionally preempted by the stopper
> task to migrate them over to a higher-capacity CPU.
> 
> Align detach_tasks() with the active-balance logic and let it pick a
> cache-hot misfit task when the destination CPU can provide a capacity
> uplift.
> 
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d2d1a69d7aa7..43fc98d34276 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7493,6 +7493,7 @@ struct lb_env {
>  	enum fbq_type		fbq_type;
>  	enum migration_type	migration_type;
>  	enum group_type         src_grp_type;
> +	enum group_type         dst_grp_type;
>  	struct list_head	tasks;
>  };
>  
> @@ -7533,6 +7534,31 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
>  	return delta < (s64)sysctl_sched_migration_cost;
>  }
>  
> +
> +/*
> + * What does migrating this task do to our capacity-aware scheduling criterion?
> + *
> + * Returns 1, if the task needs more capacity than the dst CPU can provide.
> + * Returns 0, if the task needs the extra capacity provided by the dst CPU
> + * Returns -1, if the task isn't impacted by the migration wrt capacity.
> + */
> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
> +{
> +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> +		return -1;
> +
> +	if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
> +		if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> +			return 0;
> +		else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
> +			return 1;
> +		else
> +			return -1;
> +	}

Being there means that task fits src_cpu capacity so why testing p against dst_cpu ?

> +
> +	return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1;
> +}

I prefer the below which easier to read because the same var is use everywhere and you can remove cpu_capacity_greater.

static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
{
    unsigned long src_capacity, dst_capacity;

    if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
        return -1;

    src_capacity = capacity_of(env->src_cpu);
    dst_capacity = capacity_of(env->dst_cpu);

    if (!task_fits_capacity(p, src_capacity)) {
        if (capacity_greater(dst_capacity, src_capacity))
            return 0;
        else if (capacity_greater(src_capacity, dst_capacity))
            return 1;
        else
            return -1;
    }

    return task_fits_capacity(p, dst_capacity) ? -1 : 1;
}


> +
>  #ifdef CONFIG_NUMA_BALANCING
>  /*
>   * Returns 1, if task migration degrades locality
> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	if (tsk_cache_hot == -1)
>  		tsk_cache_hot = task_hot(p, env);
>  
> +	/*
> +	 * On a (sane) asymmetric CPU capacity system, the increase in compute
> +	 * capacity should offset any potential performance hit caused by a
> +	 * migration.
> +	 */
> +	if ((env->dst_grp_type == group_has_spare) &&

Shouldn't it be env->src_grp_type == group_misfit_task to only care of misfit task case as
stated in $subject


> +	    !migrate_degrades_capacity(p, env))
> +		tsk_cache_hot = 0;
> +
>  	if (tsk_cache_hot <= 0 ||
>  	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
>  		if (tsk_cache_hot == 1) {
> @@ -9310,6 +9345,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  	if (!sds.busiest)
>  		goto out_balanced;
>  
> +	env->dst_grp_type = local->group_type;
>  	env->src_grp_type = busiest->group_type;
>  
>  	/* Misfit tasks should be dealt with regardless of the avg load */
> -- 
> 2.25.1
> 

  parent reply	other threads:[~2021-04-16 13:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-15 17:58 [PATCH 0/2] sched/fair: (The return of) misfit task load-balance tweaks Valentin Schneider
2021-04-15 17:58 ` [PATCH 1/2] sched/fair: Filter out locally-unsolvable misfit imbalances Valentin Schneider
2021-04-15 18:47   ` Rik van Riel
2021-04-16 13:29   ` Vincent Guittot
2021-04-19 17:13     ` Valentin Schneider
2021-04-22  9:48   ` Dietmar Eggemann
2021-04-22 19:19     ` Valentin Schneider
2021-04-15 17:58 ` [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks Valentin Schneider
2021-04-15 20:39   ` Rik van Riel
2021-04-16  9:43     ` Valentin Schneider
2021-04-19 12:59       ` Phil Auld
2021-04-19 17:17         ` Valentin Schneider
2021-04-19 20:23           ` Phil Auld
2021-04-16 13:51   ` Vincent Guittot [this message]
2021-04-19 17:13     ` Valentin Schneider
2021-04-20 14:33       ` Vincent Guittot
2021-04-21 10:52         ` Valentin Schneider
2021-04-22 17:29           ` Dietmar Eggemann
2021-04-22 19:19             ` Valentin Schneider
2021-04-30  6:58           ` Vincent Guittot
2021-05-07 13:46             ` Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210416135113.GA16445@vingu-book \
    --to=vincent.guittot@linaro.org \
    --cc=clingutla@codeaurora.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=riel@surriel.com \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.