Re: [PATCH] sched: balance_cpu to consider other cpus in its group as target of (pinned) task migration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Cc: mingo@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	roland@kernel.org, Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	efault@gmx.de, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH] sched: balance_cpu to consider other cpus in its group as target of (pinned) task migration
Date: Mon, 04 Jun 2012 11:00:54 +0200	[thread overview]
Message-ID: <1338800454.2448.71.camel@twins> (raw)
In-Reply-To: <4FCC4E3B.4090209@linux.vnet.ibm.com>

On Mon, 2012-06-04 at 11:27 +0530, Prashanth Nageshappa wrote:
> Based on the description in
> http://marc.info/?l=linux-kernel&m=133108682113018&w=2 , I was able to recreate
> a problem where in a SCHED_OTHER thread never gets runtime, even though there is
> one allowed CPU where it can run and make progress.

Do not use external references in changelogs if you don't absolutely
need to.

> On a dual socket box (4 cores per socket, 2 threads per core) with following
> config:
> 0 8	1 9	4 12	5 13
> 2 10	3 11	6 14	7 15
> |__________|    |__________|
>  socket 1        socket 2
> 
> If we have following 4 tasks (2 SCHED_FIFO and 2 SCHED_OTHER) started in the
> following order:
> 1> SCHED_FIFO cpu hogging task bound to cpu 1
> 2> SCHED_OTHER cpu hogging task bound to cpus 3 & 9 - running on cpu 3
>    sleeps and wakes up after all other tasks are started
> 3> SCHED_FIFO cpu hogging task bound to cpu 3
> 4> SCHED_OTHER cpu hogging task bound to cpu 9
> 
> Once all the 4 tasks are started, we observe that 2nd task is starved of CPU
> after waking up. When it wakes up, it wakes up on its prev_cpu (3) where
> a FIFO task is now hogging the cpu. To prevent starvation, 2nd task
> needs to be pulled to cpu 9. However, between cpus 1, 9, cpu1 is the chosen
> cpu that attempts pulling tasks towards its core. When it tries pulling
> 2nd tasks towards its core, it is unable to do so as cpu1 is not in 2nd
> task's cpus_allowed mask. Ideally cpu1 should note that the task can be
> moved to its sibling and trigger that movement.
> 
> In this patch, we try to identify if load balancing goal was not achieved
> completely because of destination cpu not being in cpus_allowed mask of target
> task(s) and retry load balancing to try and move tasks to other cpus in the
> same sched_group as that of destination cpu.

Either expand a little more on the implementation or preferably add some
comments.

> Tested on tip commit cca44889.

This is not a sane addition to the changelog.

> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

Did vatsa write this patch? If so, you forgot a From header, if not,
wtf!?

> Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
> 
> ----
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index de49ed5..da275d8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3098,6 +3098,7 @@ static unsigned long __read_mostly max_load_balance_interval = HZ/10;
>  
>  #define LBF_ALL_PINNED	0x01
>  #define LBF_NEED_BREAK	0x02
> +#define LBF_NEW_DST_CPU	0x04
>
>  struct lb_env {
>  	struct sched_domain	*sd;
> @@ -3108,6 +3109,8 @@ struct lb_env {
>  	int			dst_cpu;
>  	struct rq		*dst_rq;
>  
> +	struct cpumask		*dst_grpmask;
> +	int			new_dst_cpu;
>  	enum cpu_idle_type	idle;
>  	long			imbalance;
>  	unsigned int		flags;
> @@ -3198,7 +3201,25 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	 * 3) are cache-hot on their current CPU.
>  	 */
>  	if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
> -		schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
> +		int new_dst_cpu;
> +
> +		if (!env->dst_grpmask) {
> +			schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
> +			return 0;
> +		}
> +		/*
> +		 * check if cpus_allowed has any cpus in the same sched_group
> +		 * as that of dst_cpu and set LBF_NEW_DST_CPU and new_dst_cpu
> +		 * accordingly
> +		 */
> +		new_dst_cpu = cpumask_first_and(env->dst_grpmask,
> +						tsk_cpus_allowed(p));
> +		if (new_dst_cpu >= nr_cpu_ids) {
> +			schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
> +		} else {
> +			env->flags |= LBF_NEW_DST_CPU;
> +			env->new_dst_cpu = new_dst_cpu;
> +		}

Only a flimsy comment on what the code does -- I can read C thank you.
Not a single comment on why it does this.

>  		return 0;
>  	}
>  	env->flags &= ~LBF_ALL_PINNED;
> @@ -4418,7 +4439,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  			struct sched_domain *sd, enum cpu_idle_type idle,
>  			int *balance)
>  {
> -	int ld_moved, active_balance = 0;
> +	int ld_moved, old_ld_moved, active_balance = 0;
>  	struct sched_group *group;
>  	struct rq *busiest;
>  	unsigned long flags;
> @@ -4428,6 +4449,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  		.sd		    = sd,
>  		.dst_cpu	    = this_cpu,
>  		.dst_rq		    = this_rq,
> +		.dst_grpmask	    = sched_group_cpus(sd->groups),
>  		.idle		    = idle,
>  		.loop_break	    = sched_nr_migrate_break,
>  		.find_busiest_queue = find_busiest_queue,
> @@ -4461,6 +4483,7 @@ redo:
>  	schedstat_add(sd, lb_imbalance[idle], env.imbalance);
>  
>  	ld_moved = 0;
> +	old_ld_moved = 0;
>  	if (busiest->nr_running > 1) {
>  		/*
>  		 * Attempt to move tasks. If find_busiest_group has found
> @@ -4488,12 +4511,27 @@ more_balance:
>  			env.flags &= ~LBF_NEED_BREAK;
>  			goto more_balance;
>  		}
> -
>  		/*
>  		 * some other cpu did the load balance for us.
>  		 */
> -		if (ld_moved && this_cpu != smp_processor_id())
> -			resched_cpu(this_cpu);
> +		if ((ld_moved != old_ld_moved) &&
> +			(env.dst_cpu != smp_processor_id()))
> +			resched_cpu(env.dst_cpu);

The whole old_ld_moved stuff sucks, and preferably needs a rename, or at
least a comment.

> +		if ((env.flags & LBF_NEW_DST_CPU) && (env.imbalance > 0)) {
> +			/*
> +			 * we could not balance completely as some tasks
> +			 * were not allowed to move to the dst_cpu, so try
> +			 * again with new_dst_cpu.
> +			 */
> +			this_rq = cpu_rq(env.new_dst_cpu);
> +			env.dst_rq = this_rq;
> +			env.dst_cpu = env.new_dst_cpu;
> +			env.flags &= ~LBF_NEW_DST_CPU;
> +			env.loop = 0;
> +			old_ld_moved = ld_moved;
> +			goto more_balance;
> +		}
>  

OK, so previously we only pulled to ourselves, now you make cpu x move
from cpu y to cpu z. This changes the dynamic of the load-balancer, not
a single word on that and its impact/ramifications.

next prev parent reply	other threads:[~2012-06-04  9:01 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-04  5:57 [PATCH] sched: balance_cpu to consider other cpus in its group as target of (pinned) task migration Prashanth Nageshappa
2012-06-04  9:00 ` Peter Zijlstra [this message]
2012-06-04 11:41   ` Srivatsa Vaddagiri
2012-06-04 11:49     ` Peter Zijlstra
2012-06-04 12:27       ` Srivatsa Vaddagiri
2012-06-04 11:51     ` Peter Zijlstra
2012-06-04  9:25 ` Mike Galbraith
2012-06-04 11:53   ` Peter Zijlstra
2012-06-04 12:47     ` Mike Galbraith
2012-06-04 13:07       ` Srivatsa Vaddagiri
2012-06-04 14:30         ` Mike Galbraith
2012-06-04 14:38           ` Srivatsa Vaddagiri
2012-06-04 14:41             ` Mike Galbraith
2012-06-04 15:00               ` Srivatsa Vaddagiri
2012-06-04 15:21                 ` Peter Zijlstra
2012-06-04 15:25                   ` Srivatsa Vaddagiri
2012-06-04 15:33                     ` Peter Zijlstra
2012-06-04 15:46                       ` Srivatsa Vaddagiri
2012-06-04 16:56                       ` Mike Galbraith
2012-06-04 17:37                         ` Srivatsa Vaddagiri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1338800454.2448.71.camel@twins \
    --to=peterz@infradead.org \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@kernel.org \
    --cc=prashanth@linux.vnet.ibm.com \
    --cc=roland@kernel.org \
    --cc=vatsa@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.