From: Miao Xie <miaox@cn.fujitsu.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Ingo Molnar <mingo@elte.hu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Avi Kivity <avi@qumranet.com>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [BUG] CFS vs cpu hotplug
Date: Mon, 07 Jul 2008 18:26:17 +0800
Message-ID: <4871EF49.6000501@cn.fujitsu.com>
In-Reply-To: <486B490C.3090902@cn.fujitsu.com>

on 3:59 Lai Jiangshan wrote:
> Dmitry Adamushko wrote:
>> 2008/7/2 Lai Jiangshan <laijs@cn.fujitsu.com>:
>>> Ingo Molnar wrote:
>>>> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>>>
>>>>> The following oops still occurred whether this patch is applied or not.
>>>>>  [<ffffffff8059372c>] notifier_call_chain+0x33/0x5b
>>>>>  [<ffffffff802476a9>] __raw_notifier_call_chain+0x9/0xb
>>>>>  [<ffffffff802476ba>] raw_notifier_call_chain+0xf/0x11
>>>>>  [<ffffffff805736d6>] _cpu_down+0x191/0x256
>>>>>  [<ffffffff805737c1>] cpu_down+0x26/0x36
>>>>>  [<ffffffff805749c1>] store_online+0x32/0x75
>>>>>  [<ffffffff803d1982>] sysdev_store+0x24/0x26
>>>>>  [<ffffffff802d2551>] sysfs_write_file+0xe0/0x11c
>>>>>  [<ffffffff80290e6b>] vfs_write+0xae/0x137
>>>>>  [<ffffffff802913d3>] sys_write+0x47/0x70
>>>>>  [<ffffffff8020b1eb>] system_call_after_swapgs+0x7b/0x80
>>>> hm, there were multiple problems in this area and a lot of dormant bugs.
>>>> Do you have this recent upstream commit in your tree:
>>> Hi, Ingo
>>>        I tested it again with the most recent upstreams(including the
>>> following patch) committed, the oops still occurred.
>> [ taken from the oops ]
>>> kernel BUG at kernel/sched.c:6133!
>>>
[snip]
>> We should see then all tasks that have been migrated (or failed to be
>> migrated) during migration_call(CPU_DEAD, ...).
>>
> Thank you. I'll test it again with your debugging patch applied
> and get more info.

I tested it with Dmitry's patch and found that all the tasks on the offline
cpu were migrated to an online cpu by migrate_live_tasks() in migration_call().
But some tasks (such as klogd) were moved back to the offline cpu immediately,
before the BUG_ON(rq->nr_running != 0) check and even before rq's lock was
acquired.

	static int __cpuinit
	migration_call(struct notifier_block *nfb, unsigned long action, void *
	{
		...
		switch (action) {
		...
		case CPU_DEAD:
		case CPU_DEAD_FROZEN:
			cpuset_lock();
			migrate_live_tasks(cpu);
			rq = cpu_rq(cpu);
			...
			spin_lock_irq(&rq->lock);
			...
			migrate_dead_tasks(cpu);
			spin_unlock_irq(&rq->lock);
			cpuset_unlock();
			migrate_nr_uninterruptible(rq);
			BUG_ON(rq->nr_running != 0);
			...
			break;
		}
		...
	}

By debugging, I found that this bug was caused by select_task_rq_fair().
After the tasks on the offline cpu were migrated to an online cpu, the kernel
could wake these migrated tasks up again quickly via try_to_wake_up().
try_to_wake_up() invokes select_task_rq_fair() to find a less loaded cpu in the
sched domains for them. But the sched domains had not been updated yet and the
offline cpu was still in them, so select_task_rq_fair() could return the
offline cpu's id, and then the bug occurred.

I fixed the bug simply by checking select_task_rq_fair()'s return value in
try_to_wake_up() and falling back to the original cpu if the returned cpu is
offline.
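
To make the failure mode concrete, here is a minimal userspace sketch of the
idea, not kernel code. Every name in it (cpu_online, in_sched_domain, cpu_load,
take_cpu_offline, select_lowest_load_cpu, pick_wakeup_cpu) is a hypothetical
stand-in. It assumes a selector that only looks at load within a stale sched
domain, and a wakeup path that applies the same kind of check as the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy stand-ins for kernel state; all names here are hypothetical. */
static bool cpu_online[NR_CPUS]      = { true, true, true, true };
static bool in_sched_domain[NR_CPUS] = { true, true, true, true };
static int  cpu_load[NR_CPUS]        = { 5, 7, 3, 6 };

/*
 * Model of a cpu going down: its tasks are migrated away (load drops to 0),
 * but the sched domain is deliberately left stale, as described above.
 */
void take_cpu_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_online[cpu] = false;
	cpu_load[cpu] = 0;
	/* in_sched_domain[cpu] is NOT cleared: this is the stale state. */
}

/* Model of a load-based selector that only consults the sched domain. */
int select_lowest_load_cpu(void)
{
	int best = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!in_sched_domain[cpu])
			continue;
		if (best < 0 || cpu_load[cpu] < cpu_load[best])
			best = cpu;
	}
	return best;
}

/* Model of the wakeup path with the guard: reject an offline choice. */
int pick_wakeup_cpu(int orig_cpu)
{
	int cpu = select_lowest_load_cpu();

	if (cpu < 0 || !cpu_online[cpu])
		cpu = orig_cpu;
	return cpu;
}
```

In this model, after take_cpu_offline(2) the unguarded selector returns cpu 2,
because the emptied cpu now looks like the least loaded one, while
pick_wakeup_cpu(0) stays on cpu 0. The sketch assumes the original cpu is
itself online.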

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

---
 kernel/sched.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..15b5ddf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2103,6 +2103,9 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 		goto out_activate;
 
 	cpu = p->sched_class->select_task_rq(p, sync);
+	if (unlikely(cpu_is_offline(cpu)))
+		cpu = orig_cpu;
+
 	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
 		task_rq_unlock(rq, &flags);
-- 
1.5.4.rc3


