From: Rabin Vincent <rabin.vincent@axis.com>
To: mingo@redhat.com, peterz@infradead.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Date: Tue, 30 Jun 2015 16:30:58 +0200
Message-ID: <20150630143057.GA31689@axis.com>

Hi,

We're seeing a livelock where two CPUs both loop with interrupts
disabled in pick_next_task_fair() / idle_balance() and continuously
fetch all tasks from each other. This has been observed on a 3.18
kernel but the relevant code paths appear to be the same in current
mainline.

The problem situation appears to be this:

 cpu0                              cpu1

 pick_next_task
  take rq0->lock
  pick_next_task_fair
   running=0
   idle_balance()
    drop rq0->lock
                                   wakeup A,B on cpu0

                                   pick_next_task
                                    take rq1->lock
                                    pick_next_task_fair
                                     running=0
                                     idle_balance()
                                      drop rq1->lock
                                      load_balance()
                                       busiest=rq0
                                       take rq0->lock
                                       detach A,B
                                       drop rq0->lock
                                       take rq1->lock
                                       attach A,B
                                       pulled_task = 2
                                       drop rq1->lock
    load_balance()
     busiest=rq1
     take rq1->lock
     detach A,B
     drop rq1->lock
     take rq0->lock
     attach A,B
     pulled_task = 2
     drop rq0->lock
                                     running=0
                                     idle_balance()
                                      busiest=rq0, pull A,B, etc.

   running=0
   load_balance()
    busiest=rq1, pull A,B, etc.

And this goes on, with interrupts disabled on both CPUs, for at least
100 ms (which is when a hardware watchdog hits).

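To make the ping-pong in the trace above easier to follow, here is a deliberately simplified user-space model. It is only a sketch and not kernel code: the names toy_rq, toy_pull() and toy_makes_progress() are made up for illustration, a runqueue is reduced to a task count, and locking and load calculations are ignored entirely. The max_pull parameter is an assumption used to contrast "pull everything" with "pull at most one task".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a runqueue reduced to a count of runnable tasks. */
struct toy_rq {
	int nr_running;
};

/* Move up to max_pull tasks from src to dst; returns how many moved. */
static int toy_pull(struct toy_rq *dst, struct toy_rq *src, int max_pull)
{
	int n;

	assert(max_pull > 0);
	n = src->nr_running < max_pull ? src->nr_running : max_pull;
	src->nr_running -= n;
	dst->nr_running += n;
	return n;
}

/*
 * Replay the interleaving from the trace for at most max_rounds rounds.
 * Returns true as soon as one CPU finds a task left on its own runqueue
 * (so it could stop balancing and run it), false if neither ever does.
 */
static bool toy_makes_progress(int max_pull, int max_rounds)
{
	struct toy_rq rq0 = { .nr_running = 2 };	/* A,B just woken on cpu0 */
	struct toy_rq rq1 = { .nr_running = 0 };

	for (int round = 0; round < max_rounds; round++) {
		/* cpu1: load_balance() pulls from rq0 */
		toy_pull(&rq1, &rq0, max_pull);
		/* cpu0: is anything left to run locally? */
		if (rq0.nr_running > 0)
			return true;
		/* cpu0: load_balance() pulls from rq1 before cpu1 looks */
		toy_pull(&rq0, &rq1, max_pull);
		/* cpu1: is anything left to run locally? */
		if (rq1.nr_running > 0)
			return true;
	}
	return false;
}
```

In this model, toy_makes_progress(2, 1000) never finds a task on either runqueue, while toy_makes_progress(1, 1000) does so in the first round, because one of the two tasks stays behind on rq0.
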
The conditions needed, apart from the right timing, are:

 - cgroups. One of the tasks, say A, needs to be in a CPU cgroup. When
   the problem occurs, A's ->se has zero load_avg_contrib and
   task_h_load(A) is zero. However, the se->parent->parent of A has a
   (relatively) high load_avg_contrib. cpu0's cfs_rq therefore has a
   relatively high runnable_load_avg. find_busiest_group() therefore
   detects imbalance, and detach_tasks() detaches all tasks.

 - PREEMPT=n. Otherwise, the code under #ifdef in detach_tasks() would
   ensure that we'd only ever pull a maximum of one task during idle
   balancing.

 - cpu0 and cpu1 are threads on the same core (cpus_share_cache() ==
   true). Otherwise, cpu1 will not be able to wake up tasks on cpu0
   while cpu0 has interrupts disabled (since an IPI would be required).
   Turning off the default TTWU_QUEUE feature would also provide the
   same effect.

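The first condition can be illustrated with a small user-space sketch, loosely modeled on the shape of the detach loop. This is not the kernel's detach_tasks(): toy_detach_tasks() and its parameters are hypothetical, and the load values in the usage note are assumptions chosen for illustration. The point is only that a task whose load is zero (or very small) never reduces the remaining imbalance, so the loop's stop condition is never reached and every task is detached.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy version of a detach loop: walk the tasks, skip any whose load is
 * more than twice the remaining imbalance, and stop once the imbalance
 * has been used up. Returns the number of tasks detached.
 */
static size_t toy_detach_tasks(const unsigned long *h_load, size_t nr_tasks,
			       long imbalance)
{
	size_t detached = 0;

	assert(h_load != NULL || nr_tasks == 0);

	for (size_t i = 0; i < nr_tasks; i++) {
		unsigned long load = h_load[i];

		if ((long)(load / 2) > imbalance)
			continue;	/* too heavy for what is left to move */

		detached++;
		imbalance -= (long)load;

		if (imbalance <= 0)
			break;		/* moved enough load */
	}
	return detached;
}
```

With an imbalance of 100 and task loads of { 0, 10 } (A at zero as described above, B's value assumed), both tasks are detached. With loads of { 100, 100 } the loop stops after the first task.
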
I see two simple ways to prevent the livelock. One is to just remove
the #ifdef:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..74c94dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5937,7 +5937,6 @@ static int detach_tasks(struct lb_env *env)
 		detached++;
 		env->imbalance -= load;
 
-#ifdef CONFIG_PREEMPT
 		/*
 		 * NEWIDLE balancing is a source of latency, so preemptible
 		 * kernels will stop after the first task is detached to minimize
@@ -5945,7 +5944,6 @@ static int detach_tasks(struct lb_env *env)
 		 */
 		if (env->idle == CPU_NEWLY_IDLE)
 			break;
-#endif
 
 		/*
 		 * We only want to steal up to the prescribed amount of

Or this should work too:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..13358cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,6 +7487,8 @@ static int idle_balance(struct rq *this_rq)
 	 */
 	if (this_rq->cfs.h_nr_running && !pulled_task)
 		pulled_task = 1;
+	else if (!this_rq->cfs.h_nr_running && pulled_task)
+		pulled_task = 0;
 
 out:
 	/* Move the next balance forward */

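For reference, the effect of the second variant can be sketched in isolation as a pure function. This is a stand-alone illustration, not the kernel function: toy_fixup_pulled_task() is a hypothetical name, and it only mirrors the two branches shown in the hunk above. The added branch makes the result zero whenever the local runqueue ended up empty, even if tasks were pulled and then taken away again.

```c
#include <assert.h>

/*
 * Mirror of the two branches in the hunk: reconcile the number of
 * tasks that were pulled with what is actually on the runqueue now.
 */
static int toy_fixup_pulled_task(int pulled_task, unsigned int h_nr_running)
{
	assert(pulled_task >= 0);

	if (h_nr_running && !pulled_task)
		pulled_task = 1;	/* tasks appeared by other means */
	else if (!h_nr_running && pulled_task)
		pulled_task = 0;	/* pulled tasks are already gone */

	return pulled_task;
}
```

In the livelock scenario, pulled_task is 2 but the runqueue is empty again, so toy_fixup_pulled_task(2, 0) yields 0 instead of 2.
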
But it seems that the real problem is that detach_tasks() can remove all
tasks, when cgroups are involved as described above?
Thanks.
/Rabin

Thread overview: 31+ messages
2015-06-30 14:30 Rabin Vincent [this message]
2015-07-01 5:36 ` [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Mike Galbraith
2015-07-01 14:55 ` Rabin Vincent
2015-07-01 15:47 ` Mike Galbraith
2015-07-01 20:44 ` Peter Zijlstra
2015-07-01 23:25 ` Yuyang Du
2015-07-02 8:05 ` Mike Galbraith
2015-07-02 1:05 ` Yuyang Du
2015-07-02 10:25 ` Mike Galbraith
2015-07-02 11:40 ` Morten Rasmussen
2015-07-02 19:37 ` Yuyang Du
2015-07-03 9:34 ` Morten Rasmussen
2015-07-03 16:38 ` Peter Zijlstra
2015-07-05 22:31 ` Yuyang Du
2015-07-09 14:32 ` Morten Rasmussen
2015-07-09 23:24 ` Yuyang Du
2015-07-05 20:12 ` Yuyang Du
2015-07-06 17:36 ` Dietmar Eggemann
2015-07-07 11:17 ` Rabin Vincent
2015-07-13 17:43 ` Dietmar Eggemann
2015-07-09 13:53 ` Morten Rasmussen
2015-07-09 22:34 ` Yuyang Du
2015-07-02 10:53 ` Peter Zijlstra
2015-07-02 11:44 ` Morten Rasmussen
2015-07-02 18:42 ` Yuyang Du
2015-07-03 4:42 ` Mike Galbraith
2015-07-03 16:39 ` Peter Zijlstra
2015-07-05 22:11 ` Yuyang Du
2015-07-09 6:15 ` Stefan Ekenberg
2015-07-26 18:57 ` Yuyang Du
2015-08-03 17:05 ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du