From: Miao Xie <miaox@cn.fujitsu.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Ingo Molnar <mingo@elte.hu>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Avi Kivity <avi@qumranet.com>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [BUG] CFS vs cpu hotplug
Date: Mon, 07 Jul 2008 18:26:17 +0800
Message-ID: <4871EF49.6000501@cn.fujitsu.com>
In-Reply-To: <486B490C.3090902@cn.fujitsu.com>
on 3:59 Lai Jiangshan wrote:
> Dmitry Adamushko wrote:
>> 2008/7/2 Lai Jiangshan <laijs@cn.fujitsu.com>:
>>> Ingo Molnar wrote:
>>>> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>>>
>>>>> The following oops still occurred whether this patch is applied or not.
>>>>> [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
>>>>> [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
>>>>> [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
>>>>> [<ffffffff805736d6>] _cpu_down+0x191/0x256
>>>>> [<ffffffff805737c1>] cpu_down+0x26/0x36
>>>>> [<ffffffff805749c1>] store_online+0x32/0x75
>>>>> [<ffffffff803d1982>] sysdev_store+0x24/0x26
>>>>> [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
>>>>> [<ffffffff80290e6b>] vfs_write+0xae/0x137
>>>>> [<ffffffff802913d3>] sys_write+0x47/0x70
>>>>> [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80
>>>> hm, there were multiple problems in this area and a lot of dormant bugs.
>>>> Do you have this recent upstream commit in your tree:
>>> Hi, Ingo
>>> I tested it again with the most recent upstreams(including the
>>> following patch) committed, the oops still occurred.
>> [ taken from the oops ]
>>> kernel BUG at kernel/sched.c:6133!
>>>
[snip]
>> We should see then all tasks that have been migrated (or failed to be
>> migrated) during migration_call(CPU_DEAD, ...).
>>
> Thank you. I'll test it again with your debugging patch applied
> and get more info.
I tested it with Dmitry's patch and found that all the tasks on the offline
cpu were migrated to an online cpu by migrate_live_tasks() in migration_call().
But some tasks (such as klogd) were moved back to the offline cpu
immediately, before the BUG_ON(rq->nr_running != 0) check and even before
rq's lock was acquired.
static int __cpuinit
migration_call(struct notifier_block *nfb, unsigned long action, void *
{
	...
	switch (action) {
	...
	case CPU_DEAD:
	case CPU_DEAD_FROZEN:
		cpuset_lock();
		migrate_live_tasks(cpu);
		rq = cpu_rq(cpu);
		...
		spin_lock_irq(&rq->lock);
		...
		migrate_dead_tasks(cpu);
		spin_unlock_irq(&rq->lock);
		cpuset_unlock();
		migrate_nr_uninterruptible(rq);
		BUG_ON(rq->nr_running != 0);
		...
		break;
	}
	...
}
By debugging, I found that this bug was caused by select_task_rq_fair().
After the tasks on the offline cpu are migrated to an online cpu, the kernel
wakes these migrated tasks up quickly via try_to_wake_up(). try_to_wake_up()
invokes select_task_rq_fair() to find a lower-load cpu in the sched domains for
them. But the sched domains have not been updated yet, so the offline cpu is
still in the sched domains. select_task_rq_fair() may therefore return the
offline cpu's id, and then the bug occurs.
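
To illustrate the mechanism, here is a small user-space toy model (not kernel
code). Every name in it (cpu_online_model, domain_span_model, load_model,
select_cpu_model() and so on) is hypothetical and only stands in for the
corresponding kernel state; the "pick the least-loaded cpu in the span" policy
is a simplification of what select_task_rq_fair() does. It only shows how a
selector that looks at a stale domain span, and never at the online map, can
hand back a cpu that has just gone offline.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy stand-ins for kernel state; all names are illustrative only. */
static bool cpu_online_model[NR_CPUS];    /* analogue of the online map */
static bool domain_span_model[NR_CPUS];   /* analogue of a sched domain span */
static unsigned int load_model[NR_CPUS];  /* per-cpu load */

/* Bring every cpu online, put it in the span, and give it some load. */
static void setup_model(void)
{
	static const unsigned int initial_load[NR_CPUS] = { 3, 2, 4, 5 };

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_online_model[cpu] = true;
		domain_span_model[cpu] = true;
		load_model[cpu] = initial_load[cpu];
	}
}

/*
 * Take a cpu offline and move its load to cpu 0, but leave the domain
 * span untouched. This models the window in which the sched domains
 * have not been rebuilt yet.
 */
static void cpu_down_model(int cpu)
{
	assert(cpu > 0 && cpu < NR_CPUS);
	cpu_online_model[cpu] = false;
	load_model[0] += load_model[cpu];
	load_model[cpu] = 0;
}

/*
 * Pick the least-loaded cpu in the domain span. It consults only the
 * span, never the online map, so a stale span can yield an offline cpu.
 */
static int select_cpu_model(void)
{
	int best = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!domain_span_model[cpu])
			continue;
		if (best < 0 || load_model[cpu] < load_model[best])
			best = cpu;
	}
	return best;
}
```

In this model, calling setup_model(), then cpu_down_model(1), then
select_cpu_model() returns cpu 1: the emptied, offline cpu looks like the
least-loaded candidate because it is still in the span.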
I fixed the bug simply by checking select_task_rq_fair()'s return value in
try_to_wake_up().
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
kernel/sched.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..15b5ddf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2103,6 +2103,9 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 		goto out_activate;
 	cpu = p->sched_class->select_task_rq(p, sync);
+	if (unlikely(cpu_is_offline(cpu)))
+		cpu = orig_cpu;
+
 	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
 		task_rq_unlock(rq, &flags);
--
1.5.4.rc3
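
For readers who want to see the guard in isolation, the check added by the
patch can be sketched in user space as follows. This is a toy model, not the
kernel implementation: cpu_online_model, cpu_is_offline_model() and
wake_cpu_model() are hypothetical names, and the fixed online map is an
assumption made for the example. Note that falling back to orig_cpu only helps
if orig_cpu itself is online, which the sketch assumes (as is the case for
tasks that have just been migrated off the dead cpu).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Assumed state for the example: cpu 1 has just gone offline. */
static bool cpu_online_model[NR_CPUS] = { true, false, true, true };

/* Toy analogue of cpu_is_offline(); bounds-checked for safety. */
static bool cpu_is_offline_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return !cpu_online_model[cpu];
}

/*
 * Mirror of the patched logic: if the selected cpu is offline, keep the
 * task on the cpu it was already on (orig_cpu, assumed to be online).
 */
static int wake_cpu_model(int selected_cpu, int orig_cpu)
{
	if (cpu_is_offline_model(selected_cpu))
		return orig_cpu;
	return selected_cpu;
}
```

With this map, wake_cpu_model(1, 2) falls back to cpu 2, while
wake_cpu_model(3, 2) accepts the selector's choice of cpu 3.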
Thread overview: 28+ messages
2008-06-19 16:19 [BUG] CFS vs cpu hotplug Heiko Carstens
2008-06-19 18:05 ` Peter Zijlstra
2008-06-19 18:14 ` Peter Zijlstra
2008-06-19 21:14 ` Heiko Carstens
2008-06-19 21:26 ` Peter Zijlstra
2008-06-19 21:17 ` Heiko Carstens
2008-06-19 21:32 ` Peter Zijlstra
2008-06-19 21:49 ` Heiko Carstens
2008-06-20 8:51 ` Peter Zijlstra
2008-06-20 22:19 ` Heiko Carstens
2008-06-20 11:44 ` Dmitry Adamushko
2008-06-20 22:23 ` Heiko Carstens
2008-06-25 22:12 ` Dmitry Adamushko
2008-06-28 22:16 ` Dmitry Adamushko
2008-06-29 6:55 ` Ingo Molnar
2008-06-30 9:07 ` Heiko Carstens
2008-06-30 9:17 ` Ingo Molnar
2008-07-01 9:22 ` Lai Jiangshan
2008-07-01 9:31 ` Ingo Molnar
2008-07-01 10:09 ` Lai Jiangshan
2008-07-02 7:13 ` Lai Jiangshan
2008-07-02 8:50 ` Dmitry Adamushko
2008-07-02 9:23 ` Lai Jiangshan
2008-07-07 10:26 ` Miao Xie [this message]
2008-07-07 11:31 ` Dmitry Adamushko
-- strict thread matches above, loose matches on Subject: below --
2008-07-09 22:32 Dmitry Adamushko
2008-07-10 7:30 ` Heiko Carstens
2008-07-10 7:39 ` Ingo Molnar