From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Rabin Vincent <rabin.vincent@axis.com>
Cc: mingo@redhat.com, peterz@infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Date: Wed, 01 Jul 2015 07:36:35 +0200
Message-ID: <1435728995.9397.7.camel@gmail.com>
In-Reply-To: <20150630143057.GA31689@axis.com>

On Tue, 2015-06-30 at 16:30 +0200, Rabin Vincent wrote:
> Hi,
> 
> We're seeing a livelock where two CPUs both loop with interrupts
> disabled in pick_next_task_fair() / idle_balance() and continuously
> fetch all tasks from each other...

Hm.  Does the patch below help?  It looks to me like we are over-balancing.

homer:/sys/kernel/debug/tracing # echo > trace;massive_intr 12 10 > /dev/null;cat trace|grep OINK | wc -l;tail trace
480
    massive_intr-8967  [003] d...   404.285682: load_balance: OINK - imbal: 1  load: 83  run: 1  det: 1  sload_was: 180 sload_is: 83  dload: 83
    massive_intr-8960  [001] d.s.   404.293331: load_balance: OINK - imbal: 1  load: 84  run: 1  det: 1  sload_was: 180 sload_is: 99  dload: 174
    massive_intr-8962  [002] d.s.   404.317572: load_balance: OINK - imbal: 1  load: 83  run: 1  det: 1  sload_was: 180 sload_is: 83  dload: 161
    massive_intr-8967  [003] d...   404.318296: load_balance: OINK - imbal: 1  load: 83  run: 1  det: 1  sload_was: 180 sload_is: 83  dload: 83
    massive_intr-8960  [005] d...   404.341049: load_balance: OINK - imbal: 1  load: 84  run: 1  det: 1  sload_was: 180 sload_is: 83  dload: 84
    massive_intr-8962  [002] d.s.   404.381549: load_balance: OINK - imbal: 1  load: 83  run: 1  det: 1  sload_was: 180 sload_is: 83  dload: 161
    massive_intr-8971  [005] d...   404.417148: load_balance: OINK - imbal: 1  load: 84  run: 1  det: 1  sload_was: 180 sload_is: 83  dload: 84
    massive_intr-8964  [006] d.s.   404.418536: load_balance: OINK - imbal: 1  load: 72  run: 1  det: 1  sload_was: 144 sload_is: 83  dload: 149
    massive_intr-8968  [003] d...   404.437861: load_balance: OINK - imbal: 1  load: 83  run: 1  det: 1  sload_was: 180 sload_is: 83  dload: 83
    massive_intr-8970  [001] d.s.   404.485263: load_balance: OINK - imbal: 1  load: 71  run: 1  det: 1  sload_was: 154 sload_is: 83  dload: 148
homer:/sys/kernel/debug/tracing # echo > trace;hackbench > /dev/null;cat trace|grep OINK | wc -l;tail trace
51
       hackbench-9079  [006] d...   425.982722: load_balance: OINK - imbal: 54  load: 42  run: 4  det: 4  sload_was: 130 sload_is: 62  dload: 109
       hackbench-9231  [002] d...   425.982974: load_balance: OINK - imbal: 14  load: 23  run: 1  det: 1  sload_was: 30 sload_is: 15  dload: 23
       hackbench-9328  [006] d...   425.983037: load_balance: OINK - imbal: 16  load: 72  run: 2  det: 1  sload_was: 44 sload_is: 32  dload: 72
       hackbench-9197  [002] d...   425.984416: load_balance: OINK - imbal: 62  load: 21  run: 3  det: 8  sload_was: 232 sload_is: 78  dload: 119
       hackbench-9196  [004] d...   425.984507: load_balance: OINK - imbal: 45  load: 43  run: 1  det: 1  sload_was: 44 sload_is: 22  dload: 43
       hackbench-9201  [004] d...   425.984648: load_balance: OINK - imbal: 15  load: 44  run: 1  det: 2  sload_was: 71 sload_is: 25  dload: 73
       hackbench-9235  [002] d...   425.984789: load_balance: OINK - imbal: 5  load: 32  run: 2  det: 1  sload_was: 65 sload_is: 42  dload: 54
       hackbench-9327  [000] d...   425.985424: load_balance: OINK - imbal: 1  load: 95  run: 1  det: 1  sload_was: 49 sload_is: 25  dload: 95
       hackbench-9193  [003] d...   425.988701: load_balance: OINK - imbal: 22  load: 94  run: 1  det: 1  sload_was: 128 sload_is: 66  dload: 94
       hackbench-9197  [003] d...   425.988712: load_balance: OINK - imbal: 56  load: 92  run: 1  det: 1  sload_was: 118 sload_is: 66  dload: 92
homer:/sys/kernel/debug/tracing # tail trace
            kwin-4654  [002] d...   460.627311: load_balance: kglobalaccel:4597 is non-contributor - count as 37
         konsole-4707  [005] d...   460.627639: load_balance: kded4:4594 is non-contributor - count as 40
         konsole-4707  [005] d...   460.627640: load_balance: kactivitymanage:4611 is non-contributor - count as 40
            kwin-4654  [005] d...   460.627712: load_balance: kactivitymanage:4611 is non-contributor - count as 41
 kactivitymanage-4611  [005] d...   460.627726: load_balance: kded4:4594 is non-contributor - count as 40
            kwin-4654  [005] d...   460.828628: load_balance: OINK - imbal: 3  load: 618  run: 1  det: 1  sload_was: 1024 sload_is: 0  dload: 618
  plasma-desktop-4665  [000] d...   461.886746: load_balance: baloo_file:4666 is non-contributor - count as 141
 systemd-journal-397   [001] d...   466.209790: load_balance: pulseaudio:4718 is non-contributor - count as 5
         systemd-1     [007] d...   466.209868: load_balance: kmix:4704 is non-contributor - count as 13
 gnome-keyring-d-9455  [002] d...   466.209902: load_balance: krunner:4702 is non-contributor - count as 8

---
 kernel/sched/fair.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5897,7 +5897,7 @@ static int detach_tasks(struct lb_env *e
 {
 	struct list_head *tasks = &env->src_rq->cfs_tasks;
 	struct task_struct *p;
-	unsigned long load;
+	unsigned long load, d_load = 0, s_load = env->src_rq->load.weight;
 	int detached = 0;
 
 	lockdep_assert_held(&env->src_rq->lock);
@@ -5936,6 +5936,11 @@ static int detach_tasks(struct lb_env *e
 
 		detached++;
 		env->imbalance -= load;
+		if (!load) {
+			load = min_t(unsigned long, env->imbalance, p->se.load.weight);
+			trace_printk("%s:%d is non-contributor - count as %lu\n", p->comm, p->pid, load);
+		}
+		d_load += load;
 
 #ifdef CONFIG_PREEMPT
 		/*
@@ -5954,6 +5959,18 @@ static int detach_tasks(struct lb_env *e
 		if (env->imbalance <= 0)
 			break;
 
+		/*
+		 * We don't want to bleed busiest_rq dry either.  Weighted load
+		 * and/or imbalance may be dinky, and load contribution can even
+		 * be zero, which could cause us to over-balance had we not
+		 * assigned it a value above.
+		 */
+		if (env->src_rq->load.weight <= env->dst_rq->load.weight + d_load) {
+			trace_printk("OINK - imbal: %ld  load: %lu  run: %u  det: %d  sload_was: %lu sload_is: %lu  dload: %lu\n",
+				env->imbalance, load, env->src_rq->nr_running, detached, s_load, env->src_rq->load.weight, env->dst_rq->load.weight + d_load);
+			break;
+		}
+
 		continue;
 next:
 		list_move_tail(&p->se.group_node, tasks);

Thread overview: 31+ messages
2015-06-30 14:30 [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Rabin Vincent
2015-07-01  5:36 ` Mike Galbraith [this message]
2015-07-01 14:55   ` Rabin Vincent
2015-07-01 15:47     ` Mike Galbraith
2015-07-01 20:44     ` Peter Zijlstra
2015-07-01 23:25       ` Yuyang Du
2015-07-02  8:05         ` Mike Galbraith
2015-07-02  1:05           ` Yuyang Du
2015-07-02 10:25             ` Mike Galbraith
2015-07-02 11:40             ` Morten Rasmussen
2015-07-02 19:37               ` Yuyang Du
2015-07-03  9:34                 ` Morten Rasmussen
2015-07-03 16:38                   ` Peter Zijlstra
2015-07-05 22:31                     ` Yuyang Du
2015-07-09 14:32                       ` Morten Rasmussen
2015-07-09 23:24                         ` Yuyang Du
2015-07-05 20:12                   ` Yuyang Du
2015-07-06 17:36                     ` Dietmar Eggemann
2015-07-07 11:17                       ` Rabin Vincent
2015-07-13 17:43                         ` Dietmar Eggemann
2015-07-09 13:53                     ` Morten Rasmussen
2015-07-09 22:34                       ` Yuyang Du
2015-07-02 10:53         ` Peter Zijlstra
2015-07-02 11:44           ` Morten Rasmussen
2015-07-02 18:42             ` Yuyang Du
2015-07-03  4:42               ` Mike Galbraith
2015-07-03 16:39         ` Peter Zijlstra
2015-07-05 22:11           ` Yuyang Du
2015-07-09  6:15             ` Stefan Ekenberg
2015-07-26 18:57             ` Yuyang Du
2015-08-03 17:05             ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du