* [PATCH 1/1] Power: fix suspend vt regression

From: Jiri Slaby @ 2009-08-11  8:41 UTC
To: gregkh
Cc: linux-kernel, Jiri Slaby, Alan Cox, Rafael J. Wysocki

vt_waitactive no longer accepts the console parameter as console-1
since commit "vt: add an event interface". It expects the console
number directly (as viewed by userspace -- counting from 1).

Fix a deadlock suspend regression by redefining SUSPEND_CONSOLE to be
MAX_NR_CONSOLES, not MAX_NR_CONSOLES-1.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
---
 kernel/power/console.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/power/console.c b/kernel/power/console.c
index 5187136..6592a57 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -11,7 +11,7 @@
 #include "power.h"

 #if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
-#define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
+#define SUSPEND_CONSOLE	MAX_NR_CONSOLES
 static int orig_fgconsole, orig_kmsg;
--
1.6.3.3
* Re: [PATCH 1/1] Power: fix suspend vt regression

From: Greg KH @ 2009-08-11 17:00 UTC
To: Jiri Slaby
Cc: linux-kernel, Alan Cox, Rafael J. Wysocki

On Tue, Aug 11, 2009 at 10:41:33AM +0200, Jiri Slaby wrote:
> vt_waitactive no longer accepts console parameter as console-1
> since commit "vt: add an event interface". It expects console
> number directly (as viewed by userspace -- counting from 1).

As the event interface code is only in -next and not in mainline, this
doesn't pertain to Linus's current tree, right?

thanks,

greg k-h
* Re: [PATCH 1/1] Power: fix suspend vt regression

From: Jiri Slaby @ 2009-08-11 21:19 UTC
To: Greg KH
Cc: linux-kernel, Alan Cox, Rafael J. Wysocki

On 08/11/2009 07:00 PM, Greg KH wrote:
> On Tue, Aug 11, 2009 at 10:41:33AM +0200, Jiri Slaby wrote:
>> vt_waitactive no longer accepts console parameter as console-1
>> since commit "vt: add an event interface". It expects console
>> number directly (as viewed by userspace -- counting from 1).
>
> As the event interface code is only in -next and not in mainline, this
> doesn't pertain to Linus's current tree, right?

Correct. But please ignore this one. The comment above is correct, but
the change should have been done one level deeper. A new patch for this
will follow.

However, there is still a race or something similar: sometimes the
suspend goes through, sometimes it doesn't. I will investigate this
further.
* [PATCH 1/1] Power: fix suspend vt regression

From: Jiri Slaby @ 2009-08-11 21:20 UTC
To: gregkh
Cc: linux-kernel, Jiri Slaby, Alan Cox, Rafael J. Wysocki

vt_waitactive no longer accepts the console parameter as console-1
since commit "vt: add an event interface". It expects the console
number directly (as viewed by userspace -- counting from 1).

Fix a deadlock suspend regression by adding one to vt in the
vt_waitactive() call in vt_move_to_console().

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
---
 drivers/char/vt_ioctl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/char/vt_ioctl.c b/drivers/char/vt_ioctl.c
index 0fceb8f..e3d4d13 100644
--- a/drivers/char/vt_ioctl.c
+++ b/drivers/char/vt_ioctl.c
@@ -1554,7 +1554,7 @@ int vt_move_to_console(unsigned int vt, int alloc)
 		return -EIO;
 	}
 	release_console_sem();
-	if (vt_waitactive(vt)) {
+	if (vt_waitactive(vt + 1)) {
 		pr_debug("Suspend: Can't switch VCs.");
 		return -EINTR;
 	}
--
1.6.3.3
* suspend race -next regression [Was: Power: fix suspend vt regression]

From: Jiri Slaby @ 2009-08-31  9:47 UTC
To: Greg KH
Cc: linux-kernel, Alan Cox, Rafael J. Wysocki, Ingo Molnar

On 08/11/2009 11:19 PM, Jiri Slaby wrote:
> However there is still a race or something. Sometimes the suspend goes
> through, sometimes it doesn't. I will investigate this further.

Hmm, this took a long time to track down, at least partially. Code
instrumentation by outb(XX, 0x80) usually made the issue disappear.
However, I found out that it is triggered by the might_sleep() calls in
flush_workqueue() and flush_cpu_workqueue(). I.e. it looks like there
is a task which deadlocks/spins forever; if we don't reschedule to it,
the suspend proceeds. I replaced the latter might_sleep() with
show_state() and removed the refrigerated tasks from the output
afterwards.

The thing is that I don't know whether the culprit task is in there. I
need the scheduler to store the "next" task's pid (or whatever) to see
what it picked as "next", and therefore what would run due to the
might_sleep(). I can then show it on the port 0x80 display and read it
off when the hang occurs.

Depending on which might_sleep() triggers it, either flush_workqueue()
never (well, at least not in the next 5 minutes) proceeds to
for_each_cpu(), or wait_for_completion() in flush_cpu_workqueue() never
returns.

It's a regression against some -rc1 based -next tree. Bisection is
impossible: suspend needs to be run up to 7 times before it occurs.
Maybe a s/might_sleep/yield/ could make it happen earlier (going to
try)?

Attaching the filtered show_state() output:

PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Suspending console(s) (use no_console_suspend to debug)
sd 1:0:0:0: [sdb] Synchronizing SCSI cache
sd 1:0:0:0: [sdb] Stopping disk
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Stopping disk
parport_pc 00:08: disabled
serial 00:06: disabled
ath5k 0000:04:00.0: PCI INT A disabled
ACPI handle has no context!
ehci_hcd 0000:00:1d.7: PCI INT A disabled
uhci_hcd 0000:00:1d.2: PCI INT D disabled
uhci_hcd 0000:00:1d.1: PCI INT B disabled
uhci_hcd 0000:00:1d.0: PCI INT A disabled
HDA Intel 0000:00:1b.0: PCI INT A disabled
ehci_hcd 0000:00:1a.7: PCI INT D disabled
uhci_hcd 0000:00:1a.2: PCI INT C disabled
uhci_hcd 0000:00:1a.1: PCI INT B disabled
uhci_hcd 0000:00:1a.0: PCI INT A disabled
e1000e 0000:00:19.0: PCI INT A disabled
e1000e 0000:00:19.0: PME# enabled
e1000e 0000:00:19.0: wake-up capability enabled by ACPI
ACPI handle has no context!
ehci_hcd 0000:00:1d.7: PME# disabled
ehci_hcd 0000:00:1a.7: PME# disabled
ACPI: Preparing to enter system sleep state S3
Disabling non-boot CPUs ...
  task                        PC stack   pid father
init          D ffff8801cb8ffb60     0     1      0 0x00000000
 ffff8801cb86fd88 0000000000000086 00000001cb86fd88 ffff8801cb854000
 ffff8801cb854000 0000000000000ba2 ffff8801cb86fd28 00000000ffff36c7
 ffff8801cb8542a8 000000000000df68 00000000000118c0 ffff8801cb8542a8
Call Trace:
 [<ffffffff8105d5fd>] refrigerator+0xad/0x100
 [<ffffffff8104eb64>] get_signal_to_deliver+0x84/0x380
 [<ffffffff8100b36c>] do_notify_resume+0xbc/0x7c0
 [<ffffffff8105ebe9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff810d2c18>] ? poll_select_copy_remaining+0xf8/0x150
 [<ffffffff810d44d4>] ? sys_select+0x54/0x110
 [<ffffffff8100bfc5>] sysret_signal+0x6d/0xb7
kthreadd      S 0000000000000001     0     2      0 0x00000000
 ffff8801cb871f00 0000000000000046 ffff8801cb871f40 ffffffff8100ce32
 0000000000000001 ffff8801cb8546b0 ffff8801cb8546b0 ffff8801c5dddcc0
 ffff8801cb854958 000000000000df68 00000000000118c0 ffff8801cb854958
Call Trace:
 [<ffffffff8100ce32>] ? kernel_thread+0x82/0xe0
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff810562a5>] kthreadd+0x115/0x120
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff81056190>] ? kthreadd+0x0/0x120
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
migration/0   S 0000000000000000     0     3      2 0x00000000
 ffff8801cb875e70 0000000000000046 ffff8800282918c0 0000000000000001
 ffff8801cb875df0 ffffffff81036bdd ffff8801c5fef730 00000000ffff367a
 ffff8801cb8592e8 000000000000df68 00000000000118c0 ffff8801cb8592e8
Call Trace:
 [<ffffffff81036bdd>] ? enqueue_task_fair+0x3d/0x80
 [<ffffffff8103bafe>] migration_thread+0x1ae/0x2c0
 [<ffffffff8103b950>] ? migration_thread+0x0/0x2c0
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
ksoftirqd/0   S ffff8801cb8596f0     0     4      2 0x00000000
 ffff8801cb877ea0 0000000000000046 0000000000000000 00000000000118c0
 0000000000000000 ffffffff81595920 0000000000000000 00000000ffff35ed
 ffff8801cb859998 000000000000df68 00000000000118c0 ffff8801cb859998
Call Trace:
 [<ffffffff8103760d>] ? default_wake_function+0xd/0x10
 [<ffffffff810450b5>] ksoftirqd+0xc5/0x100
 [<ffffffff81044ff0>] ? ksoftirqd+0x0/0x100
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
watchdog/0    S 0000000000000001     0     5      2 0x00000000
 ffff8801cb879eb0 0000000000000046 ffff8801cb879e20 ffffffff810687c3
 ffff8801cb85a080 ffff8800282118c0 ffff8801cb879ea0 ffffffff81037f5f
 ffff8801cb85a328 000000000000df68 00000000000118c0 ffff8801cb85a328
Call Trace:
 [<ffffffff810687c3>] ? rt_mutex_adjust_pi+0x73/0x80
 [<ffffffff81037f5f>] ? __sched_setscheduler+0x19f/0x420
 [<ffffffff81080a20>] ? watchdog+0x0/0x90
 [<ffffffff81080a20>] ? watchdog+0x0/0x90
 [<ffffffff81080a72>] watchdog+0x52/0x90
 [<ffffffff81080a20>] ? watchdog+0x0/0x90
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
migration/1   S 0000000000000001     0     6      2 0x00000000
 ffff8801cb87de70 0000000000000046 ffff8800282118c0 0000000000000000
 ffff8801cb87ddf0 ffffffff81036bdd ffff8801c9080200 00000000ffff367a
 ffff8801cb85a9d8 000000000000df68 00000000000118c0 ffff8801cb85a9d8
Call Trace:
 [<ffffffff81036bdd>] ? enqueue_task_fair+0x3d/0x80
 [<ffffffff8103bafe>] migration_thread+0x1ae/0x2c0
 [<ffffffff8103b950>] ? migration_thread+0x0/0x2c0
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
ksoftirqd/1   S ffff8801cb8600c0     0     7      2 0x00000000
 ffff8801cb87fea0 0000000000000046 ffff8801cb860368 000000000000df68
 00000000000118c0 ffff8801cb860370 ffff8801cb9197f0 00000000ffff35bf
 ffff8801cb860368 000000000000df68 00000000000118c0 ffff8801cb860368
Call Trace:
 [<ffffffff810450b5>] ksoftirqd+0xc5/0x100
 [<ffffffff81044ff0>] ? ksoftirqd+0x0/0x100
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
watchdog/1    S 0000000000000001     0     8      2 0x00000000
 ffff8801cb885eb0 0000000000000046 ffff8801cb885e20 ffffffff810687c3
 ffff8801cb860770 ffff8800282918c0 ffff8801cb885ea0 00000000fffedc26
 ffff8801cb860a18 000000000000df68 00000000000118c0 ffff8801cb860a18
Call Trace:
 [<ffffffff810687c3>] ? rt_mutex_adjust_pi+0x73/0x80
 [<ffffffff81080a20>] ? watchdog+0x0/0x90
 [<ffffffff81080a20>] ? watchdog+0x0/0x90
 [<ffffffff81080a72>] watchdog+0x52/0x90
 [<ffffffff81080a20>] ? watchdog+0x0/0x90
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
events/0      S ffff8801cb8617b0     0     9      2 0x00000000
 ffff8801cb88be60 0000000000000046 ffff8801cb88be10 ffffffff8128fcf5
 ffffffff81595ef8 ffff8800282142c8 ffff8801cb88bde0 00000000ffff36e5
 ffff8801cb861a58 000000000000df68 00000000000118c0 ffff8801cb861a58
Call Trace:
 [<ffffffff8128fcf5>] ? e1000_watchdog_task+0x185/0x6e0
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
events/1      S ffff8801cb864140     0    10      2 0x00000000
 ffff8801cb88de60 0000000000000046 ffff8801cb88ddb0 ffff8801cb88ddb0
 ffff8801cb88de00 ffff8800282942c0 ffffffff8136fd80 00000000ffff37ee
 ffff8801cb8643e8 000000000000df68 00000000000118c0 ffff8801cb8643e8
Call Trace:
 [<ffffffff8136fd80>] ? linkwatch_event+0x0/0x30
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
cpuset        S ffff8801cb8647f0     0    11      2 0x00000000
 ffff8801cb891e60 0000000000000046 ffff8801cb891de0 ffffffff81036826
 0000000000000000 ffff8801cb8647f0 ffff8800282918c0 00000000fffedb2d
 ffff8801cb864a98 000000000000df68 00000000000118c0 ffff8801cb864a98
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
khelper       S ffff8801cb865180     0    12      2 0x00000000
 ffff8801cb893e60 0000000000000046 0000000000000611 ffff8801c9ae4000
 ffffffff810518c0 0000000000000000 ffffffff8100ce90 00000000ffff35bf
 ffff8801cb865428 000000000000df68 00000000000118c0 ffff8801cb865428
Call Trace:
 [<ffffffff810518c0>] ? wait_for_helper+0x0/0x80
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
netns         S ffff8801cb865830     0    15      2 0x00000000
 ffff8801cb8b9e60 0000000000000046 ffff8801cb8b9de0 ffffffff81036826
 0000000000000000 ffff8801cb865830 ffff8800282918c0 00000000fffedb2d
 ffff8801cb865ad8 000000000000df68 00000000000118c0 ffff8801cb865ad8
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
async/mgr     S ffff8801cb8e3e80     0    19      2 0x00000000
 ffff8801cb8e3e70 0000000000000046 ffff8801cb8e3de0 ffffffff81032332
 0000000000000001 ffff8800282918c0 ffff8801cb8e3e00 ffffffff810323a8
 ffff8801cb8c2468 000000000000df68 00000000000118c0 ffff8801cb8c2468
Call Trace:
 [<ffffffff81032332>] ? enqueue_task+0x32/0x80
 [<ffffffff810323a8>] ? activate_task+0x28/0x40
 [<ffffffff8105c6e5>] async_manager_thread+0x65/0x100
 [<ffffffff81037600>] ? default_wake_function+0x0/0x10
 [<ffffffff8105c680>] ? async_manager_thread+0x0/0x100
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
pm            D ffff8801cb8c2870     0    20      2 0x00000000
 ffff8801cb8e5e30 0000000000000046 0000000000000000 ffff8801c9044080
 0000000000000000 0000000000000000 ffff8801cb8e5e60 ffffffff8142dc98
 ffff8801cb8c2b18 000000000000df68 00000000000118c0 ffff8801cb8c2b18
Call Trace:
 [<ffffffff8142dc98>] ? thread_return+0x3e/0x726
 [<ffffffff8105d5fd>] refrigerator+0xad/0x100
 [<ffffffff81051da5>] worker_thread+0xf5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kblockd/0     S ffff8801cb8df200     0   159      2 0x00000000
 ffff8801cba77e60 0000000000000046 ffff8801ca5114c8 ffff880028214688
 ffff8801cba77fd8 ffff8801cb8df200 ffff8801cba77df0 00000000ffff362d
 ffff8801cb8df4a8 000000000000df68 00000000000118c0 ffff8801cb8df4a8
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kblockd/1     S ffff8801cb8df8b0     0   160      2 0x00000000
 ffff8801cba7be60 0000000000000046 ffff8801ca4544c8 ffff880028294688
 ffff8801cba7bfd8 ffff8801cb8df8b0 ffff8801cba7bdf0 00000000ffff362a
 ffff8801cb8dfb58 000000000000df68 00000000000118c0 ffff8801cb8dfb58
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kacpid        S ffff8801cba7e240     0   162      2 0x00000000
 ffff8801cba9de60 0000000000000046 ffff8801cba9ddb0 ffff8801cba9ddb0
 0000000000000000 0000000000000286 ffff8801cba9ddf0 0000000000000202
 ffff8801cba7e4e8 000000000000df68 00000000000118c0 ffff8801cba7e4e8
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kacpi_notify  S ffff8801cba7e8f0     0   163      2 0x00000000
 ffff8801cba9fe60 0000000000000046 ffff8801cba9fde0 ffffffff81036826
 ffff8801cb854048 0000000000000286 ffff8801cba9fdf0 0000000000000202
 ffff8801cba7eb98 000000000000df68 00000000000118c0 ffff8801cba7eb98
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kacpi_hotplug S ffff8801cba7f280     0   164      2 0x00000000
 ffff8801cbaa3e60 0000000000000046 ffff8801cbaa3db0 ffff8801cbaa3db0
 0000000000000000 0000000000000286 ffff8801cbaa3df0 0000000000000202
 ffff8801cba7f528 000000000000df68 00000000000118c0 ffff8801cba7f528
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
ata/0         S ffff8801cba7f930     0   246      2 0x00000000
 ffff8801cb8e7e60 0000000000000046 ffff8801cb8e7de0 ffffffff81036826
 0000000000000000 ffff8801cba7f930 ffff8800282118c0 ffffffff81595920
 ffff8801cba7fbd8 000000000000df68 00000000000118c0 ffff8801cba7fbd8
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
ata/1         S ffff8801cb970000     0   247      2 0x00000000
 ffff8801cb8c1e60 0000000000000046 ffff8801cb8c1de0 ffffffff81036826
 ffff8801cb8c1df0 ffff8801cb970000 ffff8800282918c0 00000000fffedb2d
 ffff8801cb9702a8 000000000000df68 00000000000118c0 ffff8801cb9702a8
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
ata_aux       S ffff8801cb9706b0     0   248      2 0x00000000
 ffff8801cb907e60 0000000000000046 ffff8801cb907de0 ffffffff81036826
 0000000000000000 ffff8801cb9706b0 ffff8800282918c0 00000000fffedb2d
 ffff8801cb970958 000000000000df68 00000000000118c0 ffff8801cb970958
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
<removed refrigerated tasks>
aio/0         S ffff8801cba521c0     0   333      2 0x00000000
 ffff8801cb9e1e60 0000000000000046 ffff8801cb9e1de0 ffffffff81036826
 0000000000000000 ffff8801cba521c0 ffff8800282118c0 ffffffff81595920
 ffff8801cba52468 000000000000df68 00000000000118c0 ffff8801cba52468
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
aio/1         S ffff8801cb9f4180     0   334      2 0x00000000
 ffff8801cb911e60 0000000000000046 ffff8801cb911de0 ffffffff81036826
 ffff8801cb911df0 ffff8801cb9f4180 ffff8800282918c0 00000000fffedb2d
 ffff8801cb9f4428 000000000000df68 00000000000118c0 ffff8801cb9f4428
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
crypto/0      S ffff8801cb9f4830     0   336      2 0x00000000
 ffff8801cb937e60 0000000000000046 ffff8801cb937de0 ffffffff81036826
 0000000000000000 ffff8801cb9f4830 ffff8800282118c0 ffffffff81595920
 ffff8801cb9f4ad8 000000000000df68 00000000000118c0 ffff8801cb9f4ad8
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
crypto/1      S ffff8801cb9f51c0     0   337      2 0x00000000
 ffff8801cbb47e60 0000000000000046 ffff8801cbb47de0 ffffffff81036826
 ffff8801cbb47df0 ffff8801cb9f51c0 ffff8800282918c0 00000000fffedb2d
 ffff8801cb9f5468 000000000000df68 00000000000118c0 ffff8801cb9f5468
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
i915/0        S ffff8801cb9f5870     0   437      2 0x00000000
 ffff8801cab79e60 0000000000000046 ffff8801cab79de0 ffff8801cb9bc820
 ffff8801cbb77e68 ffff8801cb9bc800 ffff8801cbb77000 00000000ffff310a
 ffff8801cb9f5b18 000000000000df68 00000000000118c0 ffff8801cb9f5b18
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
i915/1        S ffff8801cb974730     0   438      2 0x00000000
 ffff8801cab7be60 0000000000000046 ffff8801cab7bde0 ffff8801cb9bc820
 ffff8801cbb77e68 ffff8801cb9bc800 ffff8801cbb77000 00000000ffff2f47
 ffff8801cb9749d8 000000000000df68 00000000000118c0 ffff8801cb9749d8
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
scsi_eh_0     S ffff8801cb301ea0     0   482      2 0x00000000
 ffff8801cb301e50 0000000000000046 ffff8801cb301dc0 ffffffff8116b067
 ffff8801cab6c800 0000000000000000 ffff8801cb301dd0 ffffffff8123af02
 ffff8801cb9204a8 000000000000df68 00000000000118c0 ffff8801cb9204a8
Call Trace:
 [<ffffffff8116b067>] ? kobject_put+0x27/0x60
 [<ffffffff8123af02>] ? put_device+0x12/0x20
 [<ffffffff8125584e>] scsi_error_handler+0x7e/0x430
 [<ffffffff812557d0>] ? scsi_error_handler+0x0/0x430
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
scsi_eh_1     S ffff8801cbbc7ea0     0   485      2 0x00000000
 ffff8801cbbc7e50 0000000000000046 ffff8801cbbc7dc0 ffffffff8116b067
 ffff8801cbbd1000 0000000000000000 ffff8801cbbc7dd0 ffffffff8123af02
 ffff8801cb91e468 000000000000df68 00000000000118c0 ffff8801cb91e468
Call Trace:
 [<ffffffff8116b067>] ? kobject_put+0x27/0x60
 [<ffffffff8123af02>] ? put_device+0x12/0x20
 [<ffffffff8125584e>] scsi_error_handler+0x7e/0x430
 [<ffffffff812557d0>] ? scsi_error_handler+0x0/0x430
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
scsi_eh_2     S ffff8801cb313ea0     0   488      2 0x00000000
 ffff8801cb313e50 0000000000000046 ffff8801cb313de0 ffffffff81032cfe
 ffff8801cb313de0 0000000000000282 ffff8801cb31c000 00000000ffff36e5
 ffff8801cb91c428 000000000000df68 00000000000118c0 ffff8801cb91c428
Call Trace:
 [<ffffffff81032cfe>] ? __wake_up+0x4e/0x70
 [<ffffffff8125584e>] scsi_error_handler+0x7e/0x430
 [<ffffffff812557d0>] ? scsi_error_handler+0x0/0x430
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
scsi_eh_3     S ffff8801cbbc3ea0     0   491      2 0x00000000
 ffff8801cbbc3e50 0000000000000046 ffff8801cbbc3dc0 ffffffff8116b067
 ffff8801cbbd4000 0000000000000000 ffff8801cbbc3dd0 00000000ffff36e5
 ffff8801cb9193e8 000000000000df68 00000000000118c0 ffff8801cb9193e8
Call Trace:
 [<ffffffff8116b067>] ? kobject_put+0x27/0x60
 [<ffffffff8125584e>] scsi_error_handler+0x7e/0x430
 [<ffffffff812557d0>] ? scsi_error_handler+0x0/0x430
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
scsi_eh_4     S ffff8801cb309ea0     0   494      2 0x00000000
 ffff8801cb309e50 0000000000000046 ffff8801cb309de0 ffffffff81032cfe
 ffff8801cb309de0 0000000000000282 ffff8801cbb84000 0000000000000000
 ffff8801cb8e13a8 000000000000df68 00000000000118c0 ffff8801cb8e13a8
Call Trace:
 [<ffffffff81032cfe>] ? __wake_up+0x4e/0x70
 [<ffffffff8125584e>] scsi_error_handler+0x7e/0x430
 [<ffffffff812557d0>] ? scsi_error_handler+0x0/0x430
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
scsi_eh_5     S ffff8801cb30fea0     0   497      2 0x00000000
 ffff8801cb30fe50 0000000000000046 ffff8801cb30fde0 ffffffff81032cfe
 ffff8801cb30fde0 0000000000000282 ffff8801cbb88000 0000000000000000
 ffff8801cb8b1368 000000000000df68 00000000000118c0 ffff8801cb8b1368
Call Trace:
 [<ffffffff81032cfe>] ? __wake_up+0x4e/0x70
 [<ffffffff8125584e>] scsi_error_handler+0x7e/0x430
 [<ffffffff812557d0>] ? scsi_error_handler+0x0/0x430
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kstriped      S ffff8801cb92d140     0   562      2 0x00000000
 ffff8801ca4a5e60 0000000000000046 ffff8801ca4a5de0 ffffffff81036826
 ffff8801cb854048 ffff880028291800 ffff8800282918c0 00000000fffedb2d
 ffff8801cb92d3e8 000000000000df68 00000000000118c0 ffff8801cb92d3e8
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
usbhid_resume D ffff8801ca45a140     0   591      2 0x00000000
 ffff8801ca4fbe30 0000000000000046 0000000000000000 ffff8801ca74e400
 ffff8801ca59a280 0000000000000000 ffff8801ca4fbe60 ffffffff8142dc98
 ffff8801ca45a3e8 000000000000df68 00000000000118c0 ffff8801ca45a3e8
Call Trace:
 [<ffffffff8142dc98>] ? thread_return+0x3e/0x726
 [<ffffffff8105d5fd>] refrigerator+0xad/0x100
 [<ffffffff81051da5>] worker_thread+0xf5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
hd-audio0     S ffff8801ca4c5240     0   601      2 0x00000000
 ffff8801ca509e60 0000000000000046 ffff8801ca509de0 ffffffff81036826
 ffff8801cb854048 ffff880028291800 ffff8800282918c0 00000000fffedb2d
 ffff8801ca4c54e8 000000000000df68 00000000000118c0 ffff8801ca4c54e8
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
md3_raid1     S ffff8801ca59fe80     0   656      2 0x00000000
 ffff8801ca59fdb0 0000000000000046 ffff8801ca59fd20 ffffffff812e2743
 ffff8801ca4c7aa0 0000000000000246 ffff8801ca59fe50 00000000ffff0f39
 ffff8801cb989b18 000000000000df68 00000000000118c0 ffff8801cb989b18
Call Trace:
 [<ffffffff812e2743>] ? flush_pending_writes+0x13/0x90
 [<ffffffff8142e8d5>] schedule_timeout+0x1c5/0x230
 [<ffffffff812e286a>] ? raid1d+0xa/0x10d0
 [<ffffffff812eaaea>] md_thread+0xea/0x130
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff812eaa00>] ? md_thread+0x0/0x130
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kjournald     S ffff8801ca582898     0   668      2 0x00000000
 ffff8801ca57de60 0000000000000046 ffff8801ca588000 ffff8801ca582968
 ffff880100000000 00000001ca582888 ffff880100000fdc 00000000ffff367a
 ffff8801cb981328 000000000000df68 00000000000118c0 ffff8801cb981328
Call Trace:
 [<ffffffff8113b24d>] kjournald+0x20d/0x230
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8113b040>] ? kjournald+0x0/0x230
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
<removed refrigerated tasks>
phy0          S ffff8801c93860c0     0  1010      2 0x00000000
 ffff8801ca643e60 0000000000000046 ffff8801ca643de0 ffffffff81036826
 0000000000000001 ffff8801c93860c0 ffff8800282118c0 00000000fffee0bd
 ffff8801c9386368 000000000000df68 00000000000118c0 ffff8801c9386368
Call Trace:
 [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kdmflush      S ffff8801c88660c0     0  1251      2 0x00000000
 ffff8801c8825e60 0000000000000046 0000000000000000 ffff8801ca6a4cc8
 ffff8801c8825fd8 ffff8801c88660c0 ffff8801c8825de0 00000000fffee367
 ffff8801c8866368 000000000000df68 00000000000118c0 ffff8801c8866368
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kdmflush      S ffff8801ca5d36b0     0  1256      2 0x00000000
 ffff8801c8b49e60 0000000000000046 0000000000000000 ffff8801ca6eb0c8
 ffff8801c8b49fd8 ffff8801ca5d36b0 ffff8801c8b49de0 00000000fffee367
 ffff8801ca5d3958 000000000000df68 00000000000118c0 ffff8801ca5d3958
Call Trace:
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kdmflush      S ffff8801c8ba0140     0  1262      2 0x00000000
 ffff8801c92d3e60 0000000000000046 0000000000000000 ffff8801cb329cc8
 ffff8801c92d3fd8 ffff8801c8ba0140 ffff8801c92d3de0 ffffffff8105a599
 ffff8801c8ba03e8 000000000000df68 00000000000118c0 ffff8801c8ba03e8
Call Trace:
 [<ffffffff8105a599>] ? up_write+0x9/0x10
 [<ffffffff81051c3f>] ? run_workqueue+0xcf/0x140
 [<ffffffff81051d95>] worker_thread+0xe5/0x120
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kjournald     S ffff8801cb329498     0  1310      2 0x00000000
 ffff8801c9347e60 0000000000000046 ffff8801c9347dd0 ffffffff810566d1
 ffff8801c893f738 ffff8801cb329488 ffff8801c9347e20 ffff8801cb329400
 ffff8801c9386a18 000000000000df68 00000000000118c0 ffff8801c9386a18
Call Trace:
 [<ffffffff810566d1>] ? autoremove_wake_function+0x11/0x40
 [<ffffffff8113b24d>] kjournald+0x20d/0x230
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8113b040>] ? kjournald+0x0/0x230
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kjournald     S ffff8801ca5c7c98     0  1311      2 0x00000000
 ffff8801c88bfe60 0000000000000046 ffff8801c88bfdd0 ffffffff810566d1
 0000000000000000 ffff8801ca5c7c88 ffff8801c88bfe20 ffff8801ca5c7c00
 ffff8801ca72f3a8 000000000000df68 00000000000118c0 ffff8801ca72f3a8
Call Trace:
 [<ffffffff810566d1>] ? autoremove_wake_function+0x11/0x40
 [<ffffffff8113b24d>] kjournald+0x20d/0x230
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8113b040>] ? kjournald+0x0/0x230
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ? child_rip+0x0/0x20
kjournald     S ffff8801ca6f7498     0  1312      2 0x00000000
 ffff8801c887fe60 0000000000000046 ffff8801c887fdd0 ffffffff810566d1
 ffff8801c893f738 ffff8801ca6f7488 ffff8801c887fe20 ffff8801ca6f7400
 ffff8801ca6d54e8 000000000000df68 00000000000118c0 ffff8801ca6d54e8
Call Trace:
 [<ffffffff810566d1>] ? autoremove_wake_function+0x11/0x40
 [<ffffffff8113b24d>] kjournald+0x20d/0x230
 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8113b040>] ? kjournald+0x0/0x230
 [<ffffffff8105633e>] kthread+0x8e/0xa0
 [<ffffffff8100ce9a>] child_rip+0xa/0x20
 [<ffffffff810562b0>] ? kthread+0x0/0xa0
 [<ffffffff8100ce90>] ?
child_rip+0x0/0x20 kjournald S ffff8801ca78c898 0 1313 2 0x00000000 ffff8801c88c3e60 0000000000000046 ffff8801c88c3dd0 ffffffff810566d1 0000000000000000 ffff8801ca78c888 ffff8801c88c3e20 00000000fffee41f ffff8801c893f2e8 000000000000df68 00000000000118c0 ffff8801c893f2e8 Call Trace: [<ffffffff810566d1>] ? autoremove_wake_function+0x11/0x40 [<ffffffff8113b24d>] kjournald+0x20d/0x230 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8113b040>] ? kjournald+0x0/0x230 [<ffffffff8105633e>] kthread+0x8e/0xa0 [<ffffffff8100ce9a>] child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? child_rip+0x0/0x20 kjournald S ffff8801ca78d898 0 1314 2 0x00000000 ffff8801c8843e60 0000000000000046 ffff8801c8843dd0 ffffffff810566d1 0000000000000000 ffff8801ca78d888 ffff8801c8843e20 00000000fffee424 ffff8801cb989468 000000000000df68 00000000000118c0 ffff8801cb989468 Call Trace: [<ffffffff810566d1>] ? autoremove_wake_function+0x11/0x40 [<ffffffff8113b24d>] kjournald+0x20d/0x230 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8113b040>] ? kjournald+0x0/0x230 [<ffffffff8105633e>] kthread+0x8e/0xa0 [<ffffffff8100ce9a>] child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? child_rip+0x0/0x20 <removed refrigerated tasks> rpciod/0 S ffff8801ca60f7f0 0 2535 2 0x00000000 ffff8801ca719e60 0000000000000046 ffff8801ca719de0 ffffffff81036826 ffff8801ca719df0 ffff8801ca60f7f0 ffff8800282118c0 00000000fffeeca9 ffff8801ca60fa98 000000000000df68 00000000000118c0 ffff8801ca60fa98 Call Trace: [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90 [<ffffffff81051d95>] worker_thread+0xe5/0x120 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120 [<ffffffff8105633e>] kthread+0x8e/0xa0 [<ffffffff8100ce9a>] child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? 
child_rip+0x0/0x20 rpciod/1 S ffff8801c9362080 0 2536 2 0x00000000 ffff8801ca577e60 0000000000000046 ffff8801ca577de0 ffffffff81036826 ffff8801c9bb5188 ffff880028291800 ffff8800282918c0 00000000fffeecba ffff8801c9362328 000000000000df68 00000000000118c0 ffff8801c9362328 Call Trace: [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90 [<ffffffff81051d95>] worker_thread+0xe5/0x120 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120 [<ffffffff8105633e>] kthread+0x8e/0xa0 [<ffffffff8100ce9a>] child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? child_rip+0x0/0x20 nfsiod S ffff8801c9b998f0 0 2571 2 0x00000000 ffff8801c9163e60 0000000000000046 ffff8801c9163de0 ffffffff81036826 ffff8801c9163df0 ffff8801c9b998f0 ffff8800282118c0 00000000fffeece9 ffff8801c9b99b98 000000000000df68 00000000000118c0 ffff8801c9b99b98 Call Trace: [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90 [<ffffffff81051d95>] worker_thread+0xe5/0x120 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120 [<ffffffff8105633e>] kthread+0x8e/0xa0 [<ffffffff8100ce9a>] child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? child_rip+0x0/0x20 <removed refrigerated tasks> pm-suspend D ffff8801cb880000 0 3423 3387 0x00000000 ffff8801c5dddc08 0000000000000086 0000000000000000 ffff8801c5dddc30 ffff8801c5dddb78 ffffffff8103321c ffff8801c5dddba8 00000000ffff37ee ffff8801ca59a528 000000000000df68 00000000000118c0 ffff8801ca59a528 Call Trace: [<ffffffff8103321c>] ? __enqueue_entity+0x7c/0x80 [<ffffffff8142e864>] schedule_timeout+0x154/0x230 [<ffffffff8104a5c0>] ? process_timeout+0x0/0x10 [<ffffffff8142e587>] wait_for_common+0xc7/0x170 [<ffffffff81037600>] ? 
default_wake_function+0x0/0x10 [<ffffffff8142e6ae>] wait_for_completion_timeout+0xe/0x10 [<ffffffff8141cbf6>] _cpu_down+0xe6/0x110 [<ffffffff8104115b>] disable_nonboot_cpus+0xab/0x130 [<ffffffff8106e84d>] suspend_devices_and_enter+0xad/0x1a0 [<ffffffff8106ea1b>] enter_state+0xdb/0xf0 [<ffffffff8106e151>] state_store+0x91/0x100 [<ffffffff8116af27>] kobj_attr_store+0x17/0x20 [<ffffffff8111b590>] sysfs_write_file+0xe0/0x160 [<ffffffff810c3698>] vfs_write+0xb8/0x1b0 [<ffffffff81432ab5>] ? do_page_fault+0x185/0x350 [<ffffffff810c3cfc>] sys_write+0x4c/0x80 [<ffffffff8100beeb>] system_call_fastpath+0x16/0x1b kstop/0 R running task 0 3836 2 0x00000000 ffff8801c980fe60 0000000000000046 ffff8801c980fde0 ffffffff81036826 0000000000000001 ffff8801c6e241c0 ffff8800282118c0 00000000ffff380f ffff8801c6e24468 000000000000df68 00000000000118c0 ffff8801c6e24468 Call Trace: [<ffffffff81036826>] ? dequeue_task_fair+0x86/0x90 [<ffffffff81051d95>] worker_thread+0xe5/0x120 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120 [<ffffffff8105633e>] kthread+0x8e/0xa0 [<ffffffff8100ce9a>] child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? child_rip+0x0/0x20 kstop/1 R running task 0 3837 2 0x00000000 ffff8801c5f08000 0000000000000001 ffffe8ffffc89888 ffffe8ffffc89888 ffff8801c5f09e80 ffffe8ffffc87450 ffffffffffffff10 ffffffff81080556 0000000000000010 0000000000000293 ffff8801c5f09e00 0000000000000018 Call Trace: [<ffffffff81080556>] ? stop_cpu+0x76/0xe0 [<ffffffff810804e0>] ? stop_cpu+0x0/0xe0 [<ffffffff81051c28>] ? run_workqueue+0xb8/0x140 [<ffffffff81051d5b>] ? worker_thread+0xab/0x120 [<ffffffff810566c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81051cb0>] ? worker_thread+0x0/0x120 [<ffffffff8105633e>] ? kthread+0x8e/0xa0 [<ffffffff8100ce9a>] ? child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? 
child_rip+0x0/0x20 kcpu_down R running task 0 3838 2 0x00000008 ffff8801c9867e50 ffffffff8105279a ffffffff815df680 0000000000000004 ffffffff815df680 ffff8801c5dddd58 ffff8801c9867e80 ffffffff810804ac 0000000000000001 ffff8801c9867ef8 ffff8801c5dddd58 0000000000000010 Call Trace: [<ffffffff8105279a>] ? flush_workqueue+0x5a/0x90 [<ffffffff810804ac>] ? __stop_machine+0x10c/0x140 [<ffffffff8141c882>] ? _cpu_down_thread+0xa2/0x2f0 [<ffffffff8141c7e0>] ? _cpu_down_thread+0x0/0x2f0 [<ffffffff8105633e>] ? kthread+0x8e/0xa0 [<ffffffff8100ce9a>] ? child_rip+0xa/0x20 [<ffffffff810562b0>] ? kthread+0x0/0xa0 [<ffffffff8100ce90>] ? child_rip+0x0/0x20 CPU 1 is now offline SMP alternatives: switching to UP code CPU1 is down Extended CMOS year: 2000 x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106 Back to C! <removed devices prints> Restarting tasks ... done. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: suspend race -next regression [Was: Power: fix suspend vt regression] 2009-08-31 9:47 ` suspend race -next regression [Was: Power: fix suspend vt regression] Jiri Slaby @ 2009-08-31 19:32 ` Rafael J. Wysocki 2009-09-04 11:49 ` suspend race -mm " Jiri Slaby 0 siblings, 1 reply; 22+ messages in thread From: Rafael J. Wysocki @ 2009-08-31 19:32 UTC (permalink / raw) To: Jiri Slaby; +Cc: Greg KH, linux-kernel, Alan Cox, Ingo Molnar On Monday 31 August 2009, Jiri Slaby wrote: > On 08/11/2009 11:19 PM, Jiri Slaby wrote: > > However there is still a race or something. Sometimes the suspend goes > > through, sometimes it doesn't. I will investigate this further. > > Hmm, this took a loong time to track down a bit. Code instrumentation by > outb(XX, 0x80) usually caused the issue to disappear. > > However I found out that it's caused by might_sleep() calls in > flush_workqueue() and flush_cpu_workqueue(). I.e. it looks like there is > a task which deadlocks/spins forever. If we won't reschedule to it, > suspend proceeds. > > I replaced the latter might_sleep() by show_state() and removed > refrigerated tasks afterwards. The thing is that I don't know if the > prank task is there. I need a scheduler to store "next" task pid or > whatever to see what it picked as "next" and so what will run due to > might_sched(). I can then show it on port 80 display and read it when > the hangup occurs. > > Depending on which might_sleep(), either flush_workqueue() never (well, > at least in next 5 minutes) proceeds to for_each_cpu() or > wait_for_completion() in flush_cpu_workqueue() never returns. > > It's a regression against some -rc1 based -next tree. Bisection > impossible, suspend needs to be run even 7 times before it occurs. Maybe > a s/might_sleep/yield/ could make it happen earlier (going to try)? If /sys/class/rtc/rtc0/wakealarm works on this box, you can use it to trigger resume in a loop. 
Basically, you can do

# echo 0 > /sys/class/rtc/rtc0/wakealarm
# date +%s -d "+60 seconds" > /sys/class/rtc/rtc0/wakealarm

then go to suspend and it will resume the box in ~1 minute. Thanks, Rafael ^ permalink raw reply [flat|nested] 22+ messages in thread
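[Editor's note: Rafael's two commands can be wrapped in a small loop for repeated suspend/resume cycles, which helps when the hang only shows up every few attempts. The script below is a sketch, not from the thread; it assumes root, an RTC at /sys/class/rtc/rtc0, GNU date, and "mem" sleep support, and it only prints the commands unless RUN=1 is set.]

```shell
#!/bin/sh
# rtc-suspend-loop.sh: arm an RTC wakeup, suspend, repeat (hypothetical helper)
RTC=/sys/class/rtc/rtc0/wakealarm
CYCLES=${CYCLES:-5}
i=1
while [ "$i" -le "$CYCLES" ]; do
    alarm=$(date +%s -d "+60 seconds")      # absolute wake time, ~1 min out
    if [ "$RUN" = 1 ]; then
        echo 0 > "$RTC"                     # clear any stale alarm
        echo "$alarm" > "$RTC"              # arm the RTC for the wake time
        echo mem > /sys/power/state         # suspend; RTC resumes the box
    else
        echo "cycle $i: would arm $RTC for $alarm and suspend"
    fi
    i=$((i + 1))
done
```

With RUN unset the script is a dry run, so the commands can be inspected before letting it actually cycle the machine.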
* Re: suspend race -mm regression [Was: Power: fix suspend vt regression] 2009-08-31 19:32 ` Rafael J. Wysocki @ 2009-09-04 11:49 ` Jiri Slaby 2009-09-04 22:30 ` Jiri Slaby 2009-09-09 11:41 ` [PATCH 1/1] sched: fix cpu_down deadlock Jiri Slaby 0 siblings, 2 replies; 22+ messages in thread From: Jiri Slaby @ 2009-09-04 11:49 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Greg KH, linux-kernel, Alan Cox, Ingo Molnar, Lai Jiangshan, Andrew Morton, Rusty Russell On 08/31/2009 09:32 PM, Rafael J. Wysocki wrote: > On Monday 31 August 2009, Jiri Slaby wrote: >> On 08/11/2009 11:19 PM, Jiri Slaby wrote: >>> However there is still a race or something. Sometimes the suspend goes >>> through, sometimes it doesn't. I will investigate this further. >> >> Hmm, this took a loong time to track down a bit. Code instrumentation by >> outb(XX, 0x80) usually caused the issue to disappear. >> >> However I found out that it's caused by might_sleep() calls in >> flush_workqueue() and flush_cpu_workqueue(). I.e. it looks like there is >> a task which deadlocks/spins forever. If we won't reschedule to it, >> suspend proceeds. >> >> I replaced the latter might_sleep() by show_state() and removed >> refrigerated tasks afterwards. The thing is that I don't know if the >> prank task is there. I need a scheduler to store "next" task pid or >> whatever to see what it picked as "next" and so what will run due to >> might_sched(). I can then show it on port 80 display and read it when >> the hangup occurs. >> >> Depending on which might_sleep(), either flush_workqueue() never (well, >> at least in next 5 minutes) proceeds to for_each_cpu() or >> wait_for_completion() in flush_cpu_workqueue() never returns. >> >> It's a regression against some -rc1 based -next tree. Bisection >> impossible, suspend needs to be run even 7 times before it occurs. Maybe >> a s/might_sleep/yield/ could make it happen earlier (going to try)? 
> > If /sys/class/rtc/rtc0/wakealarm works on this box, you can use it to trigger > resume in a loop. > > Basically, you can do > > # echo 0 > /sys/class/rtc/rtc0/wakealarm > # date +%s -d "+60 seconds" > /sys/class/rtc/rtc0/wakealarm > > then go to suspend and it will resume the box in ~1 minute. Thanks, in the end I found it manually. Goddammit! It's an -mm thing: cpu_hotplug-dont-affect-current-tasks-affinity.patch Well, I don't know why, but when the kthread over there runs under suspend conditions and gets rescheduled (e.g. by the might_sleep() inside) it never returns. pick_next_task always returns the idle task from the idle queue. State of the thread is TASK_RUNNING. Why is it not enqueued into some queue? I also tried sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did it wrong, it seems like a global scheduler problem? Ingo, any ideas? Thanks. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: suspend race -mm regression [Was: Power: fix suspend vt regression] 2009-09-04 11:49 ` suspend race -mm " Jiri Slaby @ 2009-09-04 22:30 ` Jiri Slaby 2009-09-04 22:36 ` Jiri Slaby 2009-09-09 11:41 ` [PATCH 1/1] sched: fix cpu_down deadlock Jiri Slaby 1 sibling, 1 reply; 22+ messages in thread From: Jiri Slaby @ 2009-09-04 22:30 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, Ingo Molnar, Lai Jiangshan, Andrew Morton, Rusty Russell CCs reduced. On 09/04/2009 01:49 PM, Jiri Slaby wrote: > On 08/31/2009 09:32 PM, Rafael J. Wysocki wrote: >> On Monday 31 August 2009, Jiri Slaby wrote: >>> On 08/11/2009 11:19 PM, Jiri Slaby wrote: >>>> However there is still a race or something. Sometimes the suspend goes >>>> through, sometimes it doesn't. I will investigate this further. >>> >>> Hmm, this took a loong time to track down a bit. Code instrumentation by >>> outb(XX, 0x80) usually caused the issue to disappear. >>> >>> However I found out that it's caused by might_sleep() calls in >>> flush_workqueue() and flush_cpu_workqueue(). I.e. it looks like there is >>> a task which deadlocks/spins forever. If we won't reschedule to it, >>> suspend proceeds. >>> >>> I replaced the latter might_sleep() by show_state() and removed >>> refrigerated tasks afterwards. The thing is that I don't know if the >>> prank task is there. I need a scheduler to store "next" task pid or >>> whatever to see what it picked as "next" and so what will run due to >>> might_sched(). I can then show it on port 80 display and read it when >>> the hangup occurs. >>> >>> Depending on which might_sleep(), either flush_workqueue() never (well, >>> at least in next 5 minutes) proceeds to for_each_cpu() or >>> wait_for_completion() in flush_cpu_workqueue() never returns. >>> >>> It's a regression against some -rc1 based -next tree. Bisection >>> impossible, suspend needs to be run even 7 times before it occurs. Maybe >>> a s/might_sleep/yield/ could make it happen earlier (going to try)? 
>> >> If /sys/class/rtc/rtc0/wakealarm works on this box, you can use it to trigger >> resume in a loop. >> >> Basically, you can do >> >> # echo 0 > /sys/class/rtc/rtc0/wakealarm >> # date +%s -d "+60 seconds" > /sys/class/rtc/rtc0/wakealarm >> >> then go to suspend and it will resume the box in ~1 minute. > > Thanks, in the end I found it manually. Goddammit! It's an -mm thing: > cpu_hotplug-dont-affect-current-tasks-affinity.patch BTW. when I reverted it, during suspend I got a warning: SMP alternatives: switching to UP code ------------[ cut here ]------------ WARNING: at kernel/smp.c:124 __generic_smp_call_function_interrupt+0xfd/0x110() Hardware name: To Be Filled By O.E.M. Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762 Call Trace: [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0 [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20 [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110 [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0 [<ffffffff81434e47>] notifier_call_chain+0x47/0x90 [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20 [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0 [<ffffffff8104169b>] disable_nonboot_cpus+0xab/0x130 [<ffffffff8106ee3d>] suspend_devices_and_enter+0xad/0x1a0 [<ffffffff8106f00b>] enter_state+0xdb/0xf0 [<ffffffff8106e741>] state_store+0x91/0x100 [<ffffffff8116c157>] kobj_attr_store+0x17/0x20 [<ffffffff8111c6a0>] sysfs_write_file+0xe0/0x160 [<ffffffff810c3ce8>] vfs_write+0xb8/0x1b0 [<ffffffff81434c35>] ? do_page_fault+0x185/0x350 [<ffffffff810c434c>] sys_write+0x4c/0x80 [<ffffffff8100be2b>] system_call_fastpath+0x16/0x1b ---[ end trace 73264e95657dec65 ]--- CPU1 is down > Well, I don't know why, but when the kthread overthere runs under > suspend conditions and gets rescheduled (e.g. by the might_sleep() > inside) it never returns. pick_next_task always returns the idle task > from the idle queue. State of the thread is TASK_RUNNING. 
> > Why is it not enqueued into some queue? I tried also > sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did > it wrong, it seems like a global scheduler problem? > > Ingo, any ideas? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: suspend race -mm regression [Was: Power: fix suspend vt regression] 2009-09-04 22:30 ` Jiri Slaby @ 2009-09-04 22:36 ` Jiri Slaby 2009-09-05 12:39 ` [-mm] warning during suspend [was: suspend race -mm regression] Jiri Slaby 0 siblings, 1 reply; 22+ messages in thread From: Jiri Slaby @ 2009-09-04 22:36 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, Ingo Molnar, Lai Jiangshan, Andrew Morton, Rusty Russell On 09/05/2009 12:30 AM, Jiri Slaby wrote: > WARNING: at kernel/smp.c:124 > __generic_smp_call_function_interrupt+0xfd/0x110() > Hardware name: To Be Filled By O.E.M. > Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath > Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762 > Call Trace: > [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0 > [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20 > [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110 > [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0 > [<ffffffff81434e47>] notifier_call_chain+0x47/0x90 > [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20 > [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0 It's the CPU_DEAD notifier: ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov $0xffffffff815dff08,%rdi ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300 <raw_notifier_call_chain> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax ^ permalink raw reply [flat|nested] 22+ messages in thread
* [-mm] warning during suspend [was: suspend race -mm regression] 2009-09-04 22:36 ` Jiri Slaby @ 2009-09-05 12:39 ` Jiri Slaby 2009-09-05 14:41 ` Xiao Guangrong 0 siblings, 1 reply; 22+ messages in thread From: Jiri Slaby @ 2009-09-05 12:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, Andrew Morton, Suresh Siddha, Nick Piggin, H. Peter Anvin, Xiao Guangrong, Peter Zijlstra, Rusty Russell, Ingo Molnar, Jens Axboe On 09/05/2009 12:36 AM, Jiri Slaby wrote: > On 09/05/2009 12:30 AM, Jiri Slaby wrote: >> WARNING: at kernel/smp.c:124 >> __generic_smp_call_function_interrupt+0xfd/0x110() >> Hardware name: To Be Filled By O.E.M. >> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath >> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762 >> Call Trace: >> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0 >> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20 >> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110 >> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0 >> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90 >> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20 >> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0 > > It's the CPU_DEAD notifier: > ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi > ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov > $0xffffffff815dff08,%rdi > ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300 > <raw_notifier_call_chain> > ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax And it's due to: generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch Should the WARN_ONs now warn only when run_callbacks is true? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [-mm] warning during suspend [was: suspend race -mm regression] 2009-09-05 12:39 ` [-mm] warning during suspend [was: suspend race -mm regression] Jiri Slaby @ 2009-09-05 14:41 ` Xiao Guangrong 2009-09-10 20:57 ` Andrew Morton 0 siblings, 1 reply; 22+ messages in thread From: Xiao Guangrong @ 2009-09-05 14:41 UTC (permalink / raw) To: Jiri Slaby Cc: Rafael J. Wysocki, linux-kernel, Andrew Morton, Suresh Siddha, Nick Piggin, H. Peter Anvin, Xiao Guangrong, Peter Zijlstra, Rusty Russell, Ingo Molnar, Jens Axboe, Suresh Siddha Jiri Slaby 写道: > On 09/05/2009 12:36 AM, Jiri Slaby wrote: >> On 09/05/2009 12:30 AM, Jiri Slaby wrote: >>> WARNING: at kernel/smp.c:124 >>> __generic_smp_call_function_interrupt+0xfd/0x110() >>> Hardware name: To Be Filled By O.E.M. >>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath >>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762 >>> Call Trace: >>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0 >>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20 >>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110 >>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0 >>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90 >>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20 >>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0 >> It's the CPU_DEAD notifier: >> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi >> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov >> $0xffffffff815dff08,%rdi >> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300 >> <raw_notifier_call_chain> >> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax > > And it's due to: > generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch > I think it has collision between my patch and below patch: Commit-ID: 269c861baa2fe7c114c3bc7831292758d29eb336 Gitweb: http://git.kernel.org/tip/269c861baa2fe7c114c3bc7831292758d29eb336 Author: Suresh Siddha <suresh.b.siddha@intel.com> AuthorDate: Wed, 19 Aug 2009 18:05:35 -0700 Committer: H. 
Peter Anvin <hpa@zytor.com> CommitDate: Fri, 21 Aug 2009 16:25:43 -0700 generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled My patch is merged in the -mm tree, but this patch is based on the later -tip tree, so it has this problem. Suresh, what is your opinion? Thanks, Xiao > Should the WARN_ONs now warn only when run_callbacks is true? > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [-mm] warning during suspend [was: suspend race -mm regression] 2009-09-05 14:41 ` Xiao Guangrong @ 2009-09-10 20:57 ` Andrew Morton 2009-09-11 0:00 ` Suresh Siddha 0 siblings, 1 reply; 22+ messages in thread From: Andrew Morton @ 2009-09-10 20:57 UTC (permalink / raw) To: Xiao Guangrong Cc: jirislaby, rjw, linux-kernel, suresh.b.siddha, npiggin, hpa, xiaoguangrong, peterz, rusty, mingo, jens.axboe On Sat, 05 Sep 2009 22:41:37 +0800 Xiao Guangrong <ericxiao.gr@gmail.com> wrote: > Jiri Slaby ______: > > On 09/05/2009 12:36 AM, Jiri Slaby wrote: > >> On 09/05/2009 12:30 AM, Jiri Slaby wrote: > >>> WARNING: at kernel/smp.c:124 > >>> __generic_smp_call_function_interrupt+0xfd/0x110() > >>> Hardware name: To Be Filled By O.E.M. > >>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath > >>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762 > >>> Call Trace: > >>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0 > >>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20 > >>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110 > >>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0 > >>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90 > >>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20 > >>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0 > >> It's the CPU_DEAD notifier: > >> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi > >> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov > >> $0xffffffff815dff08,%rdi > >> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300 > >> <raw_notifier_call_chain> > >> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax > > > > And it's due to: > > generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch > > > > I think it has collision between my patch and below patch: > > Commit-ID: 269c861baa2fe7c114c3bc7831292758d29eb336 > Gitweb: http://git.kernel.org/tip/269c861baa2fe7c114c3bc7831292758d29eb336 > Author: Suresh Siddha <suresh.b.siddha@intel.com> > AuthorDate: Wed, 19 Aug 2009 
18:05:35 -0700 > Committer: H. Peter Anvin <hpa@zytor.com> > CommitDate: Fri, 21 Aug 2009 16:25:43 -0700 > > generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled > > My patch is merged at -mm tree, but this patch is base on -tip tree later, so it has this > problem > > Suresh, what your opinion? > Suresh appears to be hiding. Could you please propose a fix for this issue? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [-mm] warning during suspend [was: suspend race -mm regression] 2009-09-10 20:57 ` Andrew Morton @ 2009-09-11 0:00 ` Suresh Siddha 2009-09-11 7:55 ` Xiao Guangrong 0 siblings, 1 reply; 22+ messages in thread From: Suresh Siddha @ 2009-09-11 0:00 UTC (permalink / raw) To: Andrew Morton Cc: Xiao Guangrong, jirislaby@gmail.com, rjw@sisk.pl, linux-kernel@vger.kernel.org, npiggin@suse.de, hpa@zytor.com, xiaoguangrong@cn.fujitsu.com, peterz@infradead.org, rusty@rustcorp.com.au, mingo@elte.hu, jens.axboe@oracle.com On Thu, 2009-09-10 at 13:57 -0700, Andrew Morton wrote: > On Sat, 05 Sep 2009 22:41:37 +0800 > Xiao Guangrong <ericxiao.gr@gmail.com> wrote: > > > Jiri Slaby ______: > > > On 09/05/2009 12:36 AM, Jiri Slaby wrote: > > >> On 09/05/2009 12:30 AM, Jiri Slaby wrote: > > >>> WARNING: at kernel/smp.c:124 > > >>> __generic_smp_call_function_interrupt+0xfd/0x110() > > >>> Hardware name: To Be Filled By O.E.M. > > >>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath > > >>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762 > > >>> Call Trace: > > >>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0 > > >>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20 > > >>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110 > > >>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0 > > >>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90 > > >>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20 > > >>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0 > > >> It's the CPU_DEAD notifier: > > >> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi > > >> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov > > >> $0xffffffff815dff08,%rdi > > >> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300 > > >> <raw_notifier_call_chain> > > >> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax > > > > > > And it's due to: > > > generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch > > > > > > > I think it has collision between my patch and 
below patch: Xiao, I am not sure if the race that you are trying to fix here indeed exists. Doesn't the stop machine that we do as part of cpu down address and avoid the race that you mention? Have you seen any real crashes and hangs or is it theory? And even if the race exists (which I don't think it does), calling the interrupt handler from the cpu down path looks like a hack. Can you please elaborate why we need this patch? Then we can think of a cleaner solution if needed. > > > > Commit-ID: 269c861baa2fe7c114c3bc7831292758d29eb336 > > Gitweb: http://git.kernel.org/tip/269c861baa2fe7c114c3bc7831292758d29eb336 > > Author: Suresh Siddha <suresh.b.siddha@intel.com> > > AuthorDate: Wed, 19 Aug 2009 18:05:35 -0700 > > Committer: H. Peter Anvin <hpa@zytor.com> > > CommitDate: Fri, 21 Aug 2009 16:25:43 -0700 > > > > generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled > > > > My patch is merged at -mm tree, but this patch is base on -tip tree later, so it has this > > problem > > > > Suresh, what your opinion? > > > > Suresh appears to be hiding. Not any more. I am back from vacation :( thanks, suresh ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [-mm] warning during suspend [was: suspend race -mm regression]
  2009-09-11  0:00 ` Suresh Siddha
@ 2009-09-11  7:55 ` Xiao Guangrong
  0 siblings, 0 replies; 22+ messages in thread
From: Xiao Guangrong @ 2009-09-11 7:55 UTC (permalink / raw)
To: Suresh Siddha
Cc: Andrew Morton, Xiao Guangrong, jirislaby@gmail.com, rjw@sisk.pl,
    linux-kernel@vger.kernel.org, npiggin@suse.de, hpa@zytor.com,
    peterz@infradead.org, rusty@rustcorp.com.au, mingo@elte.hu,
    jens.axboe@oracle.com

Suresh Siddha wrote:
> On Thu, 2009-09-10 at 13:57 -0700, Andrew Morton wrote:
>> On Sat, 05 Sep 2009 22:41:37 +0800
>> Xiao Guangrong <ericxiao.gr@gmail.com> wrote:
>>
>>> Jiri Slaby wrote:
>>>> On 09/05/2009 12:36 AM, Jiri Slaby wrote:
>>>>> On 09/05/2009 12:30 AM, Jiri Slaby wrote:
>>>>>> WARNING: at kernel/smp.c:124
>>>>>> __generic_smp_call_function_interrupt+0xfd/0x110()
>>>>>> Hardware name: To Be Filled By O.E.M.
>>>>>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
>>>>>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
>>>>>> Call Trace:
>>>>>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
>>>>>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
>>>>>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
>>>>>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
>>>>>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90
>>>>>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
>>>>>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0
>>>>> It's the CPU_DEAD notifier:
>>>>> ffffffff8141ecd0:  48 83 ce 07             or     $0x7,%rsi
>>>>> ffffffff8141ecd4:  48 c7 c7 08 ff 5d 81    mov    $0xffffffff815dff08,%rdi
>>>>> ffffffff8141ecdb:  e8 20 c6 c3 ff          callq  ffffffff8105b300 <raw_notifier_call_chain>
>>>>> ffffffff8141ece0:  3d 02 80 00 00          cmp    $0x8002,%eax
>>>> And it's due to:
>>>> generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch
>>>>
>>> I think it has collision between my patch and below patch:
>
> Xiao, I am not sure if the race that you are trying to fix here indeed
> exists. Doesn't the stop_machine we do as part of cpu down address
> and avoid the race you mention? Have you seen any real crashes or
> hangs, or is it just theory?
>

Suresh, please see my explanation in another mail at the URL below:
http://marc.info/?l=linux-kernel&m=125265516529139&w=2

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [PATCH 1/1] sched: fix cpu_down deadlock
  2009-09-04 11:49 ` suspend race -mm " Jiri Slaby
  2009-09-04 22:30 ` Jiri Slaby
@ 2009-09-09 11:41 ` Jiri Slaby
  2009-09-09 11:53 ` Peter Zijlstra
  2009-09-11  6:09 ` Lai Jiangshan
  1 sibling, 2 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-09 11:41 UTC (permalink / raw)
To: peterz; +Cc: rjw, laijs, akpm, rusty, linux-kernel, Jiri Slaby, Ingo Molnar

Jiri Slaby wrote:
> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
> cpu_hotplug-dont-affect-current-tasks-affinity.patch
>
> Well, I don't know why, but when the kthread overthere runs under
> suspend conditions and gets rescheduled (e.g. by the might_sleep()
> inside) it never returns. pick_next_task always returns the idle task
> from the idle queue. State of the thread is TASK_RUNNING.
>
> Why is it not enqueued into some queue? I tried also
> sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did
> it wrong, it seems like a global scheduler problem?

Actually not, it definitely seems like a cpu_down problem.

> Ingo, any ideas?

Apparently not, but nevermind :). What about the patch below?

--

After a cpu is taken down in __stop_machine, the kcpu_thread still may be
rescheduled to that cpu, but in fact the cpu is not running at that
moment.

This causes kcpu_thread to never run again, because it's enqueued on
another runqueue, hence pick_next_task never selects it on the set of
newly running cpus.

We do set_cpus_allowed_ptr in _cpu_down_thread, but cpu_active_mask is
updated to not contain the cpu which goes down even after the thread
finishes (and _cpu_down returns).

For me this triggers mostly while suspending an SMP machine with
FAIR_GROUP_SCHED enabled and the
cpu_hotplug-dont-affect-current-tasks-affinity patch applied. That patch
adds a kthread to the cpu_down pipeline.

Fix this issue by eliminating the to-be-killed cpu from cpu_active_mask
locally.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/cpu.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index be9c5ad..17a3635 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -196,6 +196,14 @@ static int __ref _cpu_down_thread(void *_param)
 	unsigned long mod = param->mod;
 	unsigned int cpu = param->cpu;
 	void *hcpu = (void *)(long)cpu;
+	cpumask_var_t active_mask;
+
+	if (!alloc_cpumask_var(&active_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	/* make sure we are not running on the cpu which goes down,
+	   cpu_active_mask is altered even after we return! */
+	cpumask_andnot(active_mask, cpu_active_mask, cpumask_of(cpu));
 
 	cpu_hotplug_begin();
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
@@ -211,7 +219,7 @@ static int __ref _cpu_down_thread(void *_param)
 	}
 
 	/* Ensure that we are not runnable on dying cpu */
-	set_cpus_allowed_ptr(current, cpu_active_mask);
+	set_cpus_allowed_ptr(current, active_mask);
 
 	err = __stop_machine(take_cpu_down, param, cpumask_of(cpu));
 	if (err) {
@@ -237,9 +245,9 @@ static int __ref _cpu_down_thread(void *_param)
 		BUG();
 
 	check_for_tasks(cpu);
-
 out_release:
 	cpu_hotplug_done();
+	free_cpumask_var(active_mask);
 	if (!err) {
 		if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
 				hcpu) == NOTIFY_BAD)
--
1.6.3.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] sched: fix cpu_down deadlock 2009-09-09 11:41 ` [PATCH 1/1] sched: fix cpu_down deadlock Jiri Slaby @ 2009-09-09 11:53 ` Peter Zijlstra 2009-09-09 12:23 ` Jiri Slaby 2009-09-11 6:09 ` Lai Jiangshan 1 sibling, 1 reply; 22+ messages in thread From: Peter Zijlstra @ 2009-09-09 11:53 UTC (permalink / raw) To: Jiri Slaby; +Cc: rjw, laijs, akpm, rusty, linux-kernel, Ingo Molnar On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote: > Jiri Slaby wrote: > > Thanks, in the end I found it manually. Goddammit! It's an -mm thing: > > cpu_hotplug-dont-affect-current-tasks-affinity.patch Is there a git tree with -mm in some place? I can't seem to find that patch in my inbox. All I can find is some comments from Oleg that the patch looks funny. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] sched: fix cpu_down deadlock
  2009-09-09 11:53 ` Peter Zijlstra
@ 2009-09-09 12:23 ` Jiri Slaby
  2009-09-09 12:37 ` Peter Zijlstra
  2009-09-09 13:46 ` Oleg Nesterov
  0 siblings, 2 replies; 22+ messages in thread
From: Jiri Slaby @ 2009-09-09 12:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: rjw, laijs, akpm, rusty, linux-kernel, Ingo Molnar, Oleg Nesterov

On 09/09/2009 01:53 PM, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote:
>> Jiri Slaby wrote:
>>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
>>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
>
> Is there a git tree with -mm in some place? I can't seem to find that
> patch in my inbox.
>
> All I can find is some comments from Oleg that the patch looks funny.

Yes, here:
git://git.zen-sources.org/zen/mmotm.git

Actually, I found that Oleg came up with a better solution: adding
move_task_off_dead_cpu to take_cpu_down.

A discussion regarding this is at:
http://lkml.indiana.edu/hypermail/linux/kernel/0907.3/02278.html

So what's the status of the patches, please?

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] sched: fix cpu_down deadlock 2009-09-09 12:23 ` Jiri Slaby @ 2009-09-09 12:37 ` Peter Zijlstra 2009-09-09 13:46 ` Oleg Nesterov 1 sibling, 0 replies; 22+ messages in thread From: Peter Zijlstra @ 2009-09-09 12:37 UTC (permalink / raw) To: Jiri Slaby Cc: rjw, laijs, akpm, rusty, linux-kernel, Ingo Molnar, Oleg Nesterov On Wed, 2009-09-09 at 14:23 +0200, Jiri Slaby wrote: > On 09/09/2009 01:53 PM, Peter Zijlstra wrote: > > On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote: > >> Jiri Slaby wrote: > >>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing: > >>> cpu_hotplug-dont-affect-current-tasks-affinity.patch > > > > Is there a git tree with -mm in some place? I can't seem to find that > > patch in my inbox. > > > > All I can find is some comments from Oleg that the patch looks funny. > > Yes, here: > git://git.zen-sources.org/zen/mmotm.git Ah thanks, no wonder I didn't find it. > Actually I found Oleg came up with better solution to add > move_task_off_dead_cpu to take_cpu_down. > > A discussion regarding this is at: > http://lkml.indiana.edu/hypermail/linux/kernel/0907.3/02278.html > > So what's the status of the patches, please? Oleg's patch looks good to me. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] sched: fix cpu_down deadlock
  2009-09-09 12:23 ` Jiri Slaby
  2009-09-09 12:37 ` Peter Zijlstra
@ 2009-09-09 13:46 ` Oleg Nesterov
  1 sibling, 0 replies; 22+ messages in thread
From: Oleg Nesterov @ 2009-09-09 13:46 UTC (permalink / raw)
To: Jiri Slaby
Cc: Peter Zijlstra, rjw, laijs, akpm, rusty, linux-kernel, Ingo Molnar

On 09/09, Jiri Slaby wrote:
>
> On 09/09/2009 01:53 PM, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote:
> >> Jiri Slaby wrote:
> >>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
> >>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
> >
> > Is there a git tree with -mm in some place? I can't seem to find that
> > patch in my inbox.
> >
> > All I can find is some comments from Oleg that the patch looks funny.
>
> Yes, here:
> git://git.zen-sources.org/zen/mmotm.git
>
> Actually I found Oleg came up with better solution to add
> move_task_off_dead_cpu to take_cpu_down.
>
> A discussion regarding this is at:
> http://lkml.indiana.edu/hypermail/linux/kernel/0907.3/02278.html
>
> So what's the status of the patches, please?

This patch depends on another one; please see
"[PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()"
http://marc.info/?t=124910242400002

(As the changelog says, the patch is not complete: we need ->cpumask_lock
every time we update cs->allowed, but this should be trivial.)

In short: cpuset_lock() is buggy. But more importantly it is afaics
unneeded, and imho should die. I seem to have answered all of Lai's
questions, but the patch was ignored by the maintainers.

I noticed another race in update_cpumask() which I was going to fix, but
since the maintainers ignore me I lost the motivation ;) Besides, I
currently don't have the time anyway.

So I think the original patch which creates the kthread is the best option.

Oleg.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] sched: fix cpu_down deadlock 2009-09-09 11:41 ` [PATCH 1/1] sched: fix cpu_down deadlock Jiri Slaby 2009-09-09 11:53 ` Peter Zijlstra @ 2009-09-11 6:09 ` Lai Jiangshan 2009-09-11 6:28 ` Jiri Slaby 1 sibling, 1 reply; 22+ messages in thread From: Lai Jiangshan @ 2009-09-11 6:09 UTC (permalink / raw) To: Jiri Slaby; +Cc: peterz, rjw, akpm, rusty, linux-kernel, Ingo Molnar Jiri Slaby wrote: > Jiri Slaby wrote: >> Thanks, in the end I found it manually. Goddammit! It's an -mm thing: >> cpu_hotplug-dont-affect-current-tasks-affinity.patch >> >> Well, I don't know why, but when the kthread overthere runs under >> suspend conditions and gets rescheduled (e.g. by the might_sleep() >> inside) it never returns. pick_next_task always returns the idle task >> from the idle queue. State of the thread is TASK_RUNNING. >> >> Why is it not enqueued into some queue? I tried also >> sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did >> it wrong, it seems like a global scheduler problem? > > Actually not, it definitely seems like a cpu_down problem. > >> Ingo, any ideas? > > Apparently not, but nevermind :). What about the patch below? > > -- > > After a cpu is taken down in __stop_machine, the kcpu_thread still may be > rescheduled to that cpu, but in fact the cpu is not running at that > moment. > > This causes kcpu_thread to never run again, because its enqueued on another > runqueue, hence pick_next_task never selects it on the set of newly > running cpus. > > We do set_cpus_allowed_ptr in _cpu_down_thread, but cpu_active_mask is > updated to not contain the cpu which goes down even after the thread finishes > (and _cpu_down returns). > > For me this triggers mostly while suspending a SMP machine with > FAIR_GROUP_SCHED enabled and > cpu_hotplug-dont-affect-current-tasks-affinity patch applied. The patch > adds kthread to the cpu_down pipeline. > > Fix this issue by eliminating the to-be-killed-cpu from active_cpu > locally. 
> > Signed-off-by: Jiri Slaby <jirislaby@gmail.com> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Peter Zijlstra <peterz@infradead.org> > --- > kernel/cpu.c | 12 ++++++++++-- > 1 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index be9c5ad..17a3635 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -196,6 +196,14 @@ static int __ref _cpu_down_thread(void *_param) > unsigned long mod = param->mod; > unsigned int cpu = param->cpu; > void *hcpu = (void *)(long)cpu; > + cpumask_var_t active_mask; > + > + if (!alloc_cpumask_var(&active_mask, GFP_KERNEL)) > + return -ENOMEM; > + > + /* make sure we are not running on the cpu which goes down, > + cpu_active_mask is altered even after we return! */ > + cpumask_andnot(active_mask, cpu_active_mask, cpumask_of(cpu)); > > cpu_hotplug_begin(); > err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod, > @@ -211,7 +219,7 @@ static int __ref _cpu_down_thread(void *_param) > } > > /* Ensure that we are not runnable on dying cpu */ > - set_cpus_allowed_ptr(current, cpu_active_mask); > + set_cpus_allowed_ptr(current, active_mask); > > err = __stop_machine(take_cpu_down, param, cpumask_of(cpu)); > if (err) { > @@ -237,9 +245,9 @@ static int __ref _cpu_down_thread(void *_param) > BUG(); > > check_for_tasks(cpu); > - > out_release: > cpu_hotplug_done(); > + free_cpumask_var(active_mask); > if (!err) { > if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod, > hcpu) == NOTIFY_BAD) Hi, Jiri Slaby Does this bug occur when a cpu is being offlined or when the system is being suspended? Or Both? Lai ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] sched: fix cpu_down deadlock 2009-09-11 6:09 ` Lai Jiangshan @ 2009-09-11 6:28 ` Jiri Slaby 2009-09-11 7:38 ` Lai Jiangshan 0 siblings, 1 reply; 22+ messages in thread From: Jiri Slaby @ 2009-09-11 6:28 UTC (permalink / raw) To: Lai Jiangshan; +Cc: peterz, rjw, akpm, rusty, linux-kernel, Ingo Molnar On 09/11/2009 08:09 AM, Lai Jiangshan wrote: > Does this bug occur when a cpu is being offlined or > when the system is being suspended? > Or Both? Hi, I tried echo 0/1 > /sys/devices/system/cpu/cpu1/online in a loop, but it didn't trigger the bug. It happened only on suspend/resume cycle (in the end I found even swsusp in qemu suffers from this). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] sched: fix cpu_down deadlock
  2009-09-11  6:28 ` Jiri Slaby
@ 2009-09-11  7:38 ` Lai Jiangshan
  0 siblings, 0 replies; 22+ messages in thread
From: Lai Jiangshan @ 2009-09-11 7:38 UTC (permalink / raw)
To: Jiri Slaby; +Cc: peterz, rjw, akpm, rusty, linux-kernel, Ingo Molnar

Jiri Slaby wrote:
> On 09/11/2009 08:09 AM, Lai Jiangshan wrote:
>> Does this bug occur when a cpu is being offlined or
>> when the system is being suspended?
>> Or Both?
>
> Hi, I tried echo 0/1 > /sys/devices/system/cpu/cpu1/online in a loop,
> but it didn't trigger the bug. It happened only on suspend/resume cycle
> (in the end I found even swsusp in qemu suffers from this).

OK, I see where this bug is now.

I thought the corresponding bit in cpu_active_mask was cleared before
_cpu_down(), but I missed the system-suspend path: disable_nonboot_cpus().

There is a bug in disable_nonboot_cpus() even with my patch removed:
cpu_active_map is wrong during suspend (scheduler code that uses
cpu_active_map is still running while suspending).

You need:

int disable_nonboot_cpus(void)
{
	....
	/*
	 * You need to add 'set_cpu_active(cpu, false);' here
	 * to fix this bug and to make my patch work correctly.
	 */
	error = _cpu_down(cpu, 1);
	....
}

Lai.

^ permalink raw reply	[flat|nested] 22+ messages in thread
end of thread, other threads:[~2009-09-11 7:55 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-08-11 8:41 [PATCH 1/1] Power: fix suspend vt regression Jiri Slaby 2009-08-11 17:00 ` Greg KH 2009-08-11 21:19 ` Jiri Slaby 2009-08-11 21:20 ` Jiri Slaby 2009-08-31 9:47 ` suspend race -next regression [Was: Power: fix suspend vt regression] Jiri Slaby 2009-08-31 19:32 ` Rafael J. Wysocki 2009-09-04 11:49 ` suspend race -mm " Jiri Slaby 2009-09-04 22:30 ` Jiri Slaby 2009-09-04 22:36 ` Jiri Slaby 2009-09-05 12:39 ` [-mm] warning during suspend [was: suspend race -mm regression] Jiri Slaby 2009-09-05 14:41 ` Xiao Guangrong 2009-09-10 20:57 ` Andrew Morton 2009-09-11 0:00 ` Suresh Siddha 2009-09-11 7:55 ` Xiao Guangrong 2009-09-09 11:41 ` [PATCH 1/1] sched: fix cpu_down deadlock Jiri Slaby 2009-09-09 11:53 ` Peter Zijlstra 2009-09-09 12:23 ` Jiri Slaby 2009-09-09 12:37 ` Peter Zijlstra 2009-09-09 13:46 ` Oleg Nesterov 2009-09-11 6:09 ` Lai Jiangshan 2009-09-11 6:28 ` Jiri Slaby 2009-09-11 7:38 ` Lai Jiangshan