All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rabin Vincent <rabin.vincent@axis.com>
To: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Date: Wed, 1 Jul 2015 16:55:51 +0200	[thread overview]
Message-ID: <20150701145551.GA15690@axis.com> (raw)
In-Reply-To: <1435728995.9397.7.camel@gmail.com>

On Wed, Jul 01, 2015 at 07:36:35AM +0200, Mike Galbraith wrote:
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5897,7 +5897,7 @@ static int detach_tasks(struct lb_env *e
>  {
>  	struct list_head *tasks = &env->src_rq->cfs_tasks;
>  	struct task_struct *p;
> -	unsigned long load;
> +	unsigned long load, d_load = 0, s_load = env->src_rq->load.weight;
>  	int detached = 0;
>  
>  	lockdep_assert_held(&env->src_rq->lock);
> @@ -5936,6 +5936,11 @@ static int detach_tasks(struct lb_env *e
>  
>  		detached++;
>  		env->imbalance -= load;
> +		if (!load) {
> +			load = min_t(unsigned long, env->imbalance, p->se.load.weight);
> +			trace_printk("%s:%d is non-contributor - count as %ld\n", p->comm, p->pid, load);
> +		}
> +		d_load += load;
>  
>  #ifdef CONFIG_PREEMPT
>  		/*
> @@ -5954,6 +5959,18 @@ static int detach_tasks(struct lb_env *e
>  		if (env->imbalance <= 0)
>  			break;
>  
> +		/*
> +		 * We don't want to bleed busiest_rq dry either.  Weighted load
> +		 * and/or imbalance may be dinky, load contribution can even be
> +		 * zero, perhaps causing us to over balancem we had not assigned
> +		 * it above.
> +		 */
> +		if (env->src_rq->load.weight <= env->dst_rq->load.weight + d_load) {
> +			trace_printk("OINK - imbal: %ld  load: %ld  run: %d  det: %d  sload_was: %ld sload_is: %ld  dload: %ld\n",
> +				env->imbalance, load, env->src_rq->nr_running, detached, s_load, env->src_rq->load.weight, env->dst_rq->load.weight+d_load);
> +			break;
> +		}
> +
>  		continue;
>  next:
>  		list_move_tail(&p->se.group_node, tasks);
> 

I've tried to analyse how your patch would affect the situation in one
of the crash dumps which I have of the problem.

In this dump, cpu0 is in the middle of dequeuing all tasks from cpu1.
rcu_sched has already been detached and there are two tasks left, one of them
which is being processed by dequeue_entity_load_avg() called from
dequeue_task() at the time the watchdog hits.  lb_env is the following.
imbalance is, as you can see, 60.

 crash> struct lb_env 8054fd50
 struct lb_env {
   sd = 0x8fc13e00, 
   src_rq = 0x81297200, 
   src_cpu = 1, 
   dst_cpu = 0, 
   dst_rq = 0x8128e200, 
   dst_grpmask = 0x0, 
   new_dst_cpu = 0, 
   idle = CPU_NEWLY_IDLE, 
   imbalance = 60, 
   cpus = 0x8128d238, 
   flags = 0, 
   loop = 2, 
   loop_break = 32, 
   loop_max = 3, 
   fbq_type = all, 
   tasks = {
     next = 0x8fc4c6ec, 
     prev = 0x8fc4c6ec
   }
 }

Weights of the runqueues:

 crash> struct rq.load.weight runqueues:0,1
 [0]: 8128e200
   load.weight = 0,
 [1]: 81297200
   load.weight = 1935,

The only running tasks on the system are these three:

 crash> foreach RU ps
    PID    PPID  CPU   TASK    ST  %MEM     VSZ    RSS  COMM
 >     0      0   0  8056d8b0  RU   0.0       0      0  [swapper/0]
 >     0      0   1  8fc5cd18  RU   0.0       0      0  [swapper/1]
 >     0      0   2  8fc5c6b0  RU   0.0       0      0  [swapper/2]
 >     0      0   3  8fc5c048  RU   0.0       0      0  [swapper/3]
       7      2   0  8fc4c690  RU   0.0       0      0  [rcu_sched]
      30      2   1  8fd26108  RU   0.0       0      0  [kswapd0]
     413      1   1  8edda408  RU   0.6    1900    416  rngd

And the load.weight and load_avg_contribs for them and their parent SEs:

 crash> foreach 7 30 413 load
 PID: 7      TASK: 8fc4c690  CPU: 0   COMMAND: "rcu_sched"
  task_h_load():   325 [ = (load_avg_contrib {    5} * cfs_rq->h_load {   65}) / (cfs_rq->runnable_load_avg {    0} + 1) ]
  SE: 8fc4c6d8 load_avg_contrib:     5 load.weight:  1024 PARENT: 00000000 GROUPNAME: (null)
 
 PID: 30     TASK: 8fd26108  CPU: 1   COMMAND: "kswapd0"
  task_h_load():    10 [ = (load_avg_contrib {   10} * cfs_rq->h_load {  133}) / (cfs_rq->runnable_load_avg {  128} + 1) ]
  SE: 8fd26150 load_avg_contrib:    10 load.weight:  1024 PARENT: 00000000 GROUPNAME: (null)
 
 PID: 413    TASK: 8edda408  CPU: 1   COMMAND: "rngd"
  task_h_load():     0 [ = (load_avg_contrib {    0} * cfs_rq->h_load {    0}) / (cfs_rq->runnable_load_avg {    0} + 1) ]
  SE: 8edda450 load_avg_contrib:     0 load.weight:  1024 PARENT: 8fffbd00 GROUPNAME: (null)
  SE: 8fffbd00 load_avg_contrib:     0 load.weight:     2 PARENT: 8f531f80 GROUPNAME: rngd@hwrng.service
  SE: 8f531f80 load_avg_contrib:     0 load.weight:  1024 PARENT: 8f456e00 GROUPNAME: system-rngd.slice
  SE: 8f456e00 load_avg_contrib:   118 load.weight:   911 PARENT: 00000000 GROUPNAME: system.slice

Given the above, we can see that with your patch:

 - dst_rq->load.weight is 0 and will not change in this loop.

 - src_rq->load.weight was 1935 + 1024 before the loop.  It will go down
   to 1935 (already has), 1024, and then 0.
 
 - d_load will be 325*, 335, and then 395.

(* - probably not exactly since rcu_sched has already had set_task_rq() called
[cfs_rq switched] on it, but I guess it's actually going to be much lower based
on the other dumps I see where rcu_sched hasn't be switched yet).

So, we will not hit the "if (env->src_rq->load.weight <=
env->dst_rq->load.weight + d_load)" condition to break out of the loop until we
actualy move all tasks.  So the patch will not have any effect on this case.
Or am I missing something?

We'll set up a test anyway with the patch; the problem usually takes a
couple of days to reproduce.

/Rabin

  reply	other threads:[~2015-07-01 14:56 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-30 14:30 [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Rabin Vincent
2015-07-01  5:36 ` Mike Galbraith
2015-07-01 14:55   ` Rabin Vincent [this message]
2015-07-01 15:47     ` Mike Galbraith
2015-07-01 20:44     ` Peter Zijlstra
2015-07-01 23:25       ` Yuyang Du
2015-07-02  8:05         ` Mike Galbraith
2015-07-02  1:05           ` Yuyang Du
2015-07-02 10:25             ` Mike Galbraith
2015-07-02 11:40             ` Morten Rasmussen
2015-07-02 19:37               ` Yuyang Du
2015-07-03  9:34                 ` Morten Rasmussen
2015-07-03 16:38                   ` Peter Zijlstra
2015-07-05 22:31                     ` Yuyang Du
2015-07-09 14:32                       ` Morten Rasmussen
2015-07-09 23:24                         ` Yuyang Du
2015-07-05 20:12                   ` Yuyang Du
2015-07-06 17:36                     ` Dietmar Eggemann
2015-07-07 11:17                       ` Rabin Vincent
2015-07-13 17:43                         ` Dietmar Eggemann
2015-07-09 13:53                     ` Morten Rasmussen
2015-07-09 22:34                       ` Yuyang Du
2015-07-02 10:53         ` Peter Zijlstra
2015-07-02 11:44           ` Morten Rasmussen
2015-07-02 18:42             ` Yuyang Du
2015-07-03  4:42               ` Mike Galbraith
2015-07-03 16:39         ` Peter Zijlstra
2015-07-05 22:11           ` Yuyang Du
2015-07-09  6:15             ` Stefan Ekenberg
2015-07-26 18:57             ` Yuyang Du
2015-08-03 17:05             ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150701145551.GA15690@axis.com \
    --to=rabin.vincent@axis.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.