Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Qais Yousef <qyousef@layalina.io>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org,
	Pierre Gondois <Pierre.Gondois@arm.com>
Subject: Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit
Date: Mon, 22 Jan 2024 18:02:12 +0000	[thread overview]
Message-ID: <20240122180212.67csjrnsbs7vq57i@airbuntu> (raw)
In-Reply-To: <213f94df-cc36-4281-805d-9f56cbfef796@arm.com>

On 01/22/24 09:59, Dietmar Eggemann wrote:
> On 05/01/2024 23:20, Qais Yousef wrote:
> > From: Qais Yousef <qais.yousef@arm.com>
> > 
> > If a misfit task is affined to a subset of the possible cpus, we need to
> > verify that one of these cpus can fit it. Otherwise the load balancer
> > code will continuously trigger needlessly leading the balance_interval
> > to increase in return and eventually end up with a situation where real
> > imbalances take a long time to address because of this impossible
> > imbalance situation.
> > 
> > This can happen in Android world where it's common for background tasks
> > to be restricted to little cores.
> > 
> > Similarly if we can't fit the biggest core, triggering misfit is
> > pointless as it is the best we can ever get on this system.
> > 
> > To be able to detect that; we use asym_cap_list to iterate through
> > capacities in the system to see if the task is able to run at a higher
> > capacity level based on its p->cpus_ptr. To do so safely, we convert the
> > list to be RCU protected.
> > 
> > To be able to iterate through capacity levels, export asym_cap_list to
> > allow for fast traversal of all available capacity levels in the system.
> > 
> > Test:
> > =====
> > 
> > Add
> > 
> > 	trace_printk("balance_interval = %lu\n", interval)
> > 
> > in get_sd_balance_interval().
> > 
> > run
> > 	if [ "$MASK" != "0" ]; then
> > 		adb shell "taskset -a $MASK cat /dev/zero > /dev/null"
> > 	fi
> > 	sleep 10
> > 	// parse ftrace buffer counting the occurrence of each valaue
> > 
> > Where MASK is either:
> > 
> > 	* 0: no busy task running
> 
> ... no busy task stands for no misfit scenario?

Yes

> 
> > 	* 1: busy task is pinned to 1 cpu; handled today to not cause
> > 	  misfit
> > 	* f: busy task pinned to little cores, simulates busy background
> > 	  task, demonstrates the problem to be fixed
> > 
> 
> [...]
> 
> > +	/*
> > +	 * If the task affinity is not set to default, make sure it is not
> > +	 * restricted to a subset where no CPU can ever fit it. Triggering
> > +	 * misfit in this case is pointless as it has no where better to move
> > +	 * to. And it can lead to balance_interval to grow too high as we'll
> > +	 * continuously fail to move it anywhere.
> > +	 */
> > +	if (!cpumask_equal(p->cpus_ptr, cpu_possible_mask)) {
> 
> Shouldn't this be cpu_active_mask ?

Hmm. So the intention was to check if the affinity was changed from default.

If we hotplug all but little we could end up with the same problem, yes you're
right.

But if the affinity is set to only to littles and cpu_active_mask is only for
littles too, then we'll also end up with the same problem as they both are
equal.

Better to drop this check then? With the sorted list the common case should be
quick to return as they'll have 1024 as a possible CPU.

> 
> include/linux/cpumask.h
> 
>  * cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
>  * cpu_present_mask - has bit 'cpu' set iff cpu is populated
>  * cpu_online_mask  - has bit 'cpu' set iff cpu available to scheduler
>  * cpu_active_mask  - has bit 'cpu' set iff cpu available to migration
> 
> 
> > +		unsigned long clamped_util = clamp(util, uclamp_min, uclamp_max);
> > +		bool has_fitting_cpu = false;
> > +		struct asym_cap_data *entry;
> > +
> > +		rcu_read_lock();
> > +		list_for_each_entry_rcu(entry, &asym_cap_list, link) {
> > +			if (entry->capacity > cpu_cap) {
> > +				cpumask_t *cpumask;
> > +
> > +				if (clamped_util > entry->capacity)
> > +					continue;
> > +
> > +				cpumask = cpu_capacity_span(entry);
> > +				if (!cpumask_intersects(p->cpus_ptr, cpumask))
> > +					continue;
> > +
> > +				has_fitting_cpu = true;
> > +				break;
> > +			}
> > +		}
> 
> What happen when we hotplug out all CPUs of one CPU capacity value?
> IMHO, we don't call asym_cpu_capacity_scan() with !new_topology
> (partition_sched_domains_locked()).

Right. I missed that. We can add another intersection check against
cpu_active_mask.

I am assuming the skipping was done by design, not a bug that needs fixing?
I see for suspend (cpuhp_tasks_frozen) the domains are rebuilt, but not for
hotplug.

> 
> > +		rcu_read_unlock();
> > +
> > +		if (!has_fitting_cpu)
> > +			goto out;
> >  	}
> >  
> >  	/*
> > @@ -5083,6 +5127,9 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
> >  	 * task_h_load() returns 0.
> >  	 */
> >  	rq->misfit_task_load = max_t(unsigned long, task_h_load(p), 1);
> > +	return;
> > +out:
> > +	rq->misfit_task_load = 0;
> >  }
> >  
> >  #else /* CONFIG_SMP */
> > @@ -9583,9 +9630,7 @@ check_cpu_capacity(struct rq *rq, struct sched_domain *sd)
> >   */
> >  static inline int check_misfit_status(struct rq *rq, struct sched_domain *sd)
> >  {
> > -	return rq->misfit_task_load &&
> > -		(arch_scale_cpu_capacity(rq->cpu) < rq->rd->max_cpu_capacity ||
> > -		 check_cpu_capacity(rq, sd));
> > +	return rq->misfit_task_load && check_cpu_capacity(rq, sd);
> 
> You removed 'arch_scale_cpu_capacity(rq->cpu) <
> rq->rd->max_cpu_capacity' here. Why? I can see that with the standard

Based on Pierre review since we no longer trigger misfit for big cores.
I thought Pierre's remark was correct so did the change in v3

	https://lore.kernel.org/lkml/bae88015-4205-4449-991f-8104436ab3ba@arm.com/

> setup (max CPU capacity equal 1024) which is what we probably use 100%
> of the time now. It might get useful again when Vincent will introduce
> his 'user space system pressure' implementation?

I don't mind putting it back if you think it'd be required again in the near
future. I still didn't get a chance to look at Vincent patches properly, but if
there's a clash let's reduce the work.

> 
> >  }
> 
> [...]
> 
> > @@ -1423,8 +1418,8 @@ static void asym_cpu_capacity_scan(void)
> >  
> >  	list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
> >  		if (cpumask_empty(cpu_capacity_span(entry))) {
> > -			list_del(&entry->link);
> > -			kfree(entry);
> > +			list_del_rcu(&entry->link);
> > +			call_rcu(&entry->rcu, free_asym_cap_entry);
> 
> Looks like there could be brief moments in which one CPU capacity group
> of CPUs could be twice in asym_cap_list. I'm thinking about initial
> startup + max CPU frequency related adjustment of CPU capacity
> (init_cpu_capacity_callback()) for instance. Not sure if this is really
> an issue?

I don't think so. As long as the reader sees a consistent value and no crashes
ensued, a momentarily wrong decision in transient or extra work is fine IMO.
I don't foresee a big impact.


Thanks!

--
Qais Yousef

> 
> [...]
>

next prev parent reply	other threads:[~2024-01-22 18:02 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-05 22:20 [PATCH v4 0/2] sched: Don't trigger misfit if affinity is restricted Qais Yousef
2024-01-05 22:20 ` [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit Qais Yousef
2024-01-22  9:59   ` Dietmar Eggemann
2024-01-22 18:02     ` Qais Yousef [this message]
2024-01-23 18:07       ` Dietmar Eggemann
2024-01-24 22:43         ` Qais Yousef
2024-01-25 10:35           ` Dietmar Eggemann
2024-01-26  0:47             ` Qais Yousef
2024-01-23  8:32     ` Vincent Guittot
2024-01-24 22:46       ` Qais Yousef
2024-01-25 17:44         ` Vincent Guittot
2024-01-26  0:37           ` Qais Yousef
2024-01-23  8:26   ` Vincent Guittot
2024-01-24 22:29     ` Qais Yousef
2024-01-25 17:40       ` Vincent Guittot
2024-01-26  1:46         ` Qais Yousef
2024-01-26 14:08           ` Vincent Guittot
2024-01-28 23:50             ` Qais Yousef
2024-01-29 22:53               ` Qais Yousef
2024-01-30  9:41               ` Vincent Guittot
2024-01-30 23:57                 ` Qais Yousef
2024-01-31 13:55                   ` Vincent Guittot
2024-02-01 22:21                     ` Qais Yousef
2024-02-05 19:49           ` Dietmar Eggemann
2024-02-06 15:06             ` Qais Yousef
2024-02-06 17:17               ` Dietmar Eggemann
2024-02-20 16:07                 ` Qais Yousef
2024-01-23 17:22   ` Vincent Guittot
2024-01-24 22:38     ` Qais Yousef
2024-01-25 17:50       ` Vincent Guittot
2024-01-26  2:07         ` Qais Yousef
2024-01-26 14:15           ` Vincent Guittot
2024-01-28 23:32             ` Qais Yousef
2024-01-05 22:20 ` [PATCH v4 2/2] sched/topology: Sort asym_cap_list in descending order Qais Yousef
2024-01-21  0:10 ` [PATCH v4 0/2] sched: Don't trigger misfit if affinity is restricted Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240122180212.67csjrnsbs7vq57i@airbuntu \
    --to=qyousef@layalina.io \
    --cc=Pierre.Gondois@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox