From: Sachin Sant <sachinp@in.ibm.com>
To: Linux/PPC Development <linuxppc-dev@ozlabs.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-next@vger.kernel.org
Subject: [Next] CPU Hotplug test failures on powerpc
Date: Fri, 11 Dec 2009 16:23:59 +0530
Message-ID: <4B2224C7.1020908@in.ibm.com>

While executing the cpu_hotplug tests (from autotest) against the latest
linux-next on a POWER6 box, the machine locks up. A soft reset shows
the following trace:

cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
    pc: c0000000003433d8: .find_next_bit+0x54/0xc4
    lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
    sp: c00000000c933650
   msr: 8000000000089032
  current = 0xc00000000c173840
  paca    = 0xc000000000bc2600
    pid   = 2602, comm = hotplug06.top.s
enter ? for help
[link register   ] c000000000342f10 .cpumask_next_and+0x4c/0x94
[c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74 (unreliable)
[c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
[c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
[c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
[c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
[c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
[c00000000c933b20] c000000000525940 .store_online+0x54/0x894
[c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
[c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
[c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
[c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
[c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000fff9fa8a8f8
SP (fffe7aef200) is in userspace
0:mon> e
cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
    pc: c0000000003433d8: .find_next_bit+0x54/0xc4
    lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
    sp: c00000000c933650
   msr: 8000000000089032
  current = 0xc00000000c173840
  paca    = 0xc000000000bc2600
    pid   = 2602, comm = hotplug06.top.s
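
For anyone wanting to exercise the same path outside autotest: the trace
above enters through store_online, i.e. a write to a CPU's sysfs "online"
file. The following is only a minimal sketch of such a toggle loop, not the
autotest hotplug06 script. The function names and the sysfs_root parameter
are hypothetical, and it assumes the usual /sys/devices/system/cpu/cpuN/online
layout.

```python
import os
import time

# Assumed default location of the per-CPU control files.
DEFAULT_SYSFS_ROOT = "/sys/devices/system/cpu"


def online_path(cpu, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return the sysfs 'online' control file for the given CPU number."""
    return os.path.join(sysfs_root, f"cpu{cpu}", "online")


def set_online(cpu, online, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Write '1' (online) or '0' (offline) to the control file; needs root."""
    with open(online_path(cpu, sysfs_root), "w") as f:
        f.write("1" if online else "0")


def toggle_cpu(cpu, iterations, delay=0.0, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Offline and re-online one CPU repeatedly; return the cycle count."""
    for _ in range(iterations):
        set_online(cpu, False, sysfs_root)
        time.sleep(delay)
        set_online(cpu, True, sysfs_root)
        time.sleep(delay)
    return iterations
```

Run as root on a test machine, a call such as toggle_cpu(1, 10, delay=0.1)
would take cpu 1 down and up ten times. On an affected kernel this may hang
the box, so it should not be tried on a system that matters.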

The last few messages from the dmesg log show:

0:mon> 
<4>IRQ 17 affinity broken off cpu 0
<4>IRQ 18 affinity broken off cpu 0
<4>IRQ 19 affinity broken off cpu 0
<4>IRQ 264 affinity broken off cpu 0
<4>cpu 0 (hwid 0) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
<4>Processor 0 found.
<4>IRQ 17 affinity broken off cpu 1
<4>IRQ 18 affinity broken off cpu 1
<4>IRQ 19 affinity broken off cpu 1
<4>IRQ 264 affinity broken off cpu 1
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<6>process 2423 (bash) no longer affine to cpu1
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
<3>INFO: RCU detected CPU 0 stall (t=4000 jiffies)
0:mon>
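
One way to read the log above is to pair each "Ready to die..." message with
a later "Processor N found." for the same CPU. The sketch below does that
pairing. It is only an illustration, with a hypothetical function name and
regular expressions written against the message text shown here. For this
log it points at cpu 1 as the CPU whose last offline had no matching
"found" before the RCU stall, which is a hint about timing and not a
diagnosis.

```python
import re

# Patterns written against the dmesg lines quoted in this report.
READY = re.compile(r"cpu (\d+) \(hwid \d+\) Ready to die")
FOUND = re.compile(r"Processor (\d+) found")


def cpus_left_offline(lines):
    """Return CPUs whose last 'Ready to die' has no later 'Processor N found'."""
    pending = set()
    for line in lines:
        m = READY.search(line)
        if m:
            pending.add(int(m.group(1)))
            continue
        m = FOUND.search(line)
        if m:
            pending.discard(int(m.group(1)))
    return pending


# Tail of the log from this report.
log_tail = [
    "<4>cpu 0 (hwid 0) Ready to die...",
    "<4>Processor 0 found.",
    "<4>cpu 1 (hwid 1) Ready to die...",
    "<4>Processor 1 found.",
    "<4>cpu 1 (hwid 1) Ready to die...",
    "<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)",
]
print(cpus_left_offline(log_tail))
```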

After some debugging, a possible suspect seems to be commit
6ad4c18.. : sched: Fix balance vs hotplug race

If I revert this patch, I am able to run the tests on this
POWER6 box without any issues.

At the same time, the above patch is required to fix the
CPU hotplug race on x86_64 that I reported here (as a side note,
the same x86_64 issue can also be reproduced against the latest
Linus git tree):

http://marc.info/?l=linux-kernel&m=125802682922299&w=2

I will try a few more iterations with and without the above
patch, just to make sure I have the correct results.

If someone has a suggestion, let me know.

Thanks
-Sachin


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
