From: Sachin Sant <sachinp@in.ibm.com>
To: Linux/PPC Development <linuxppc-dev@ozlabs.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
linux-next@vger.kernel.org
Subject: [Next] CPU Hotplug test failures on powerpc
Date: Fri, 11 Dec 2009 16:23:59 +0530
Message-ID: <4B2224C7.1020908@in.ibm.com>
While running the cpu_hotplug tests (from autotest) against the
latest linux-next on a POWER6 box, the machine locks up. A soft
reset shows the following trace:
cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
pc: c0000000003433d8: .find_next_bit+0x54/0xc4
lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
sp: c00000000c933650
msr: 8000000000089032
current = 0xc00000000c173840
paca = 0xc000000000bc2600
pid = 2602, comm = hotplug06.top.s
enter ? for help
[link register ] c000000000342f10 .cpumask_next_and+0x4c/0x94
[c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74 (unreliable)
[c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
[c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
[c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
[c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
[c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
[c00000000c933b20] c000000000525940 .store_online+0x54/0x894
[c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
[c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
[c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
[c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
[c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000fff9fa8a8f8
SP (fffe7aef200) is in userspace
0:mon> e
cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
pc: c0000000003433d8: .find_next_bit+0x54/0xc4
lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
sp: c00000000c933650
msr: 8000000000089032
current = 0xc00000000c173840
paca = 0xc000000000bc2600
pid = 2602, comm = hotplug06.top.s
The last few messages from the dmesg log show:
0:mon>
<4>IRQ 17 affinity broken off cpu 0
<4>IRQ 18 affinity broken off cpu 0
<4>IRQ 19 affinity broken off cpu 0
<4>IRQ 264 affinity broken off cpu 0
<4>cpu 0 (hwid 0) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
<4>Processor 0 found.
<4>IRQ 17 affinity broken off cpu 1
<4>IRQ 18 affinity broken off cpu 1
<4>IRQ 19 affinity broken off cpu 1
<4>IRQ 264 affinity broken off cpu 1
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<6>process 2423 (bash) no longer affine to cpu1
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
<3>INFO: RCU detected CPU 0 stall (t=4000 jiffies)
0:mon>
After some debugging, a possible suspect seems to be commit
6ad4c18.. : sched: Fix balance vs hotplug race
If I revert this patch, I am able to run the tests on this
POWER6 box without any issues.
But at the same time, the above patch is required to fix the
CPU hotplug race on x86_64 that I reported here (as a side note,
the same x86_64 issue can be reproduced against the latest Linus
git as well):
http://marc.info/?l=linux-kernel&m=125802682922299&w=2
I will try a few more iterations with and without the above
patch, just to make sure I have the correct results.
If someone has a suggestion, let me know.
Thanks
-Sachin
--
---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
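[Editor's sketch, not part of the original message.] The trace above
enters the kernel through a sysfs write (.sysfs_write_file ->
.store_online -> .cpu_down), and the dmesg log shows cpu 1 going
offline and online repeatedly. A minimal way to exercise the same
path might look like the following. This is NOT the autotest
hotplug06 test: the sysfs location /sys/devices/system/cpu/cpuN/online
and the helper names (cpu_online_path, set_cpu_online, is_cpu_online,
toggle_cpu) are assumptions made for illustration only.

```python
import os
import time

# Assumed standard sysfs location of the per-CPU hotplug control files.
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def cpu_online_path(cpu, root=SYSFS_CPU_ROOT):
    """Return the path of the 'online' control file for the given CPU."""
    return os.path.join(root, "cpu%d" % cpu, "online")


def set_cpu_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Write '1' (online) or '0' (offline) to the CPU's control file."""
    with open(cpu_online_path(cpu, root), "w") as f:
        f.write("1" if online else "0")


def is_cpu_online(cpu, root=SYSFS_CPU_ROOT):
    """Read the control file back and report whether the CPU is online."""
    with open(cpu_online_path(cpu, root)) as f:
        return f.read().strip() == "1"


def toggle_cpu(cpu, iterations, delay=0.0, root=SYSFS_CPU_ROOT):
    """Take the CPU offline and bring it back online, 'iterations' times."""
    for _ in range(iterations):
        set_cpu_online(cpu, False, root)
        time.sleep(delay)
        set_cpu_online(cpu, True, root)
        time.sleep(delay)
```

Run as root on a test machine only, for example toggle_cpu(1, 100).
The root parameter exists so the helpers can be pointed at a scratch
directory instead of the real sysfs tree.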