* 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
@ 2009-03-17 6:33 Valdis.Kletnieks
2009-03-17 14:58 ` Alan Cox
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Valdis.Kletnieks @ 2009-03-17 6:33 UTC (permalink / raw)
To: jkosina, gregkh; +Cc: Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5617 bytes --]
29-rc3-mmotm0129 is OK, I hit it a few times under rc5-mmotm0214, but I'm
seeing it a lot under -rc8-mmotm0313 (have triggered it 6 times in the past 4
hours). Very consistent traceback out of the HID and USB stack - the events/0
kernel thread loses its shit:
[ 3816.196809] ------------[ cut here ]------------
[ 3816.196815] WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0x32/0x82()
[ 3816.196820] Hardware name: Latitude D820
[ 3816.196823] Modules linked in: irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt coretemp sunrpc nf_conntrack_ftp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_recent ipt_LOG xt_u32 xt_multiport iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables sha256_generic aes_x86_64 aes_generic rtc acpi_cpufreq tpm_tis tpm tpm_bios arc4 ecb nvidia(P) iwl3945 iwlcore mac80211 ohci1394 pcmcia ieee1394 dell_laptop yenta_socket led_class snd_hda_codec_idt video processor uhci_hcd iTCO_wdt rsrc_nonstatic cfg80211 snd_hda_intel intel_agp pcmcia_core iTCO_vendor_support rfkill output snd_hda_codec button battery thermal ac dcdbas [last unloaded: microcode]
[ 3816.196950] Pid: 9, comm: events/0 Tainted: P 2.6.29-rc8-mmotm0313 #3
[ 3816.196955] Call Trace:
[ 3816.196965] [<ffffffff80233e53>] warn_slowpath+0xaf/0xd6
[ 3816.196974] [<ffffffff803b78a5>] ? extract_buf+0x8e/0xc3
[ 3816.196983] [<ffffffff8027e07a>] ? list_add+0xc/0xe
[ 3816.196990] [<ffffffff8027eb0b>] ? __free_one_page+0x17f/0x1e6
[ 3816.196997] [<ffffffff80243ea6>] flush_cpu_workqueue+0x32/0x82
[ 3816.197032] [<ffffffff804214dd>] ? usb_hcd_unlink_urb+0x48/0x84
[ 3816.197040] [<ffffffff80422762>] ? usb_kill_urb+0x21/0xce
[ 3816.197046] [<ffffffff802444d7>] flush_workqueue+0x4d/0x67
[ 3816.197053] [<ffffffff80244501>] flush_scheduled_work+0x10/0x12
[ 3816.197061] [<ffffffff8045da33>] hid_cease_io+0x3b/0x40
[ 3816.197067] [<ffffffff8045da7b>] hid_pre_reset+0x43/0x4a
[ 3816.197073] [<ffffffff8041e627>] usb_reset_device+0x6c/0x11c
[ 3816.197080] [<ffffffff8045e535>] hid_reset+0x9e/0x12e
[ 3816.197086] [<ffffffff8045e497>] ? hid_reset+0x0/0x12e
[ 3816.197092] [<ffffffff80243bca>] worker_thread+0x1d3/0x27b
[ 3816.197100] [<ffffffff802477c1>] ? autoremove_wake_function+0x0/0x34
[ 3816.197106] [<ffffffff802439f7>] ? worker_thread+0x0/0x27b
[ 3816.197113] [<ffffffff802473bb>] kthread+0x55/0x80
[ 3816.197120] [<ffffffff8020c15a>] child_rip+0xa/0x20
[ 3816.197128] [<ffffffff8020bb2d>] ? restore_args+0x0/0x30
[ 3816.197135] [<ffffffff80247366>] ? kthread+0x0/0x80
[ 3816.197140] [<ffffffff8020c150>] ? child_rip+0x0/0x20
[ 3816.197145] ---[ end trace 1e05d800555b77d7 ]---
Yes, there's an NVidia driver loaded - but this looks like an HID/USB
bug, where it's shooting itself in the foot by flushing the workqueue while
not realizing it's in a worker thread already, thus deadlocking.
After that traceback happens, the keyboard and USB mouse are dead. The onboard
touchpad that presents as a PS/2 mouse still works, and alt-sysrq semi-works (
-S and -U work to get the disks unmounted, but -B fails to reboot).
After that, things slowly grind to a halt as processes get blocked because
they're waiting for events/0, or they're waiting on something else that
blocked - I can hit the power button and it will *start* a shutdown, but
that too will wedge pretty quickly.
About 2 minutes later, if I wait that long, I'll get this:
[ 3240.474270] INFO: task events/0:9 blocked for more than 120 seconds.
[ 3240.474273] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3240.474275] events/0 D ffff8800010287c0 5352 9 2
[ 3240.474281] ffff88007f24fc30 0000000000000046 ffff88007f24fbb0 0031f9ec188eafbf
[ 3240.474287] ffff880001018000 ffff88007f24fa80 ffff88007f24fa20 0000000000000000
[ 3240.474291] ffff88007f2333a0 00000000000107c0 ffff88007f2333a0 00000000000107c0
[ 3240.474296] Call Trace:
[ 3240.474305] [<ffffffff805429d1>] schedule+0x13/0x3d
[ 3240.474308] [<ffffffff80542bca>] schedule_timeout+0x22/0xb9
[ 3240.474312] [<ffffffff8054483e>] ? _spin_unlock_irqrestore+0x45/0x52
[ 3240.474317] [<ffffffff8022cbeb>] ? get_parent_ip+0x11/0x41
[ 3240.474321] [<ffffffff80546ca5>] ? sub_preempt_count+0x35/0x48
[ 3240.474324] [<ffffffff8054209b>] wait_for_common+0xb7/0x106
[ 3240.474327] [<ffffffff8022de49>] ? default_wake_function+0x0/0xf
[ 3240.474331] [<ffffffff80542174>] wait_for_completion+0x18/0x1a
[ 3240.474335] [<ffffffff80243eea>] flush_cpu_workqueue+0x76/0x82
[ 3240.474338] [<ffffffff80243f9b>] ? wq_barrier_func+0x0/0xf
[ 3240.474341] [<ffffffff802444d7>] flush_workqueue+0x4d/0x67
[ 3240.474345] [<ffffffff80244501>] flush_scheduled_work+0x10/0x12
[ 3240.474349] [<ffffffff8045da33>] hid_cease_io+0x3b/0x40
[ 3240.474352] [<ffffffff8045da7b>] hid_pre_reset+0x43/0x4a
[ 3240.474357] [<ffffffff8041e627>] usb_reset_device+0x6c/0x11c
[ 3240.474360] [<ffffffff8045e535>] hid_reset+0x9e/0x12e
[ 3240.474363] [<ffffffff8045e497>] ? hid_reset+0x0/0x12e
[ 3240.474366] [<ffffffff80243bca>] worker_thread+0x1d3/0x27b
[ 3240.474370] [<ffffffff802477c1>] ? autoremove_wake_function+0x0/0x34
[ 3240.474373] [<ffffffff802439f7>] ? worker_thread+0x0/0x27b
[ 3240.474377] [<ffffffff802473bb>] kthread+0x55/0x80
[ 3240.474381] [<ffffffff8020c15a>] child_rip+0xa/0x20
[ 3240.474385] [<ffffffff8020bb2d>] ? restore_args+0x0/0x30
[ 3240.474388] [<ffffffff80247366>] ? kthread+0x0/0x80
[ 3240.474391] [<ffffffff8020c150>] ? child_rip+0x0/0x20
Yeah, it's wedged good and solid.
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 6:33 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371 Valdis.Kletnieks
@ 2009-03-17 14:58 ` Alan Cox
2009-03-17 15:07 ` Oliver Neukum
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Alan Cox @ 2009-03-17 14:58 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: jkosina, gregkh, Andrew Morton, linux-kernel
> [ 3816.196815] WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0x32/0x82()
> [ 3816.196820] Hardware name: Latitude D820
> [ 3816.196823] Modules linked in:
> ... nvidia(P)
You really need to reproduce this without the proprietary stuff loaded
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 6:33 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371 Valdis.Kletnieks
2009-03-17 14:58 ` Alan Cox
@ 2009-03-17 15:07 ` Oliver Neukum
2009-03-17 16:55 ` Valdis.Kletnieks
2009-03-17 20:53 ` Greg KH
2009-03-17 20:54 ` Andrew Morton
3 siblings, 1 reply; 13+ messages in thread
From: Oliver Neukum @ 2009-03-17 15:07 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: jkosina, gregkh, Andrew Morton, linux-kernel
Am Dienstag 17 März 2009 07:33:09 schrieb Valdis.Kletnieks@vt.edu:
> Yes, there's an NVidia driver loaded - but this looks like an HID/USB
> bug, where it's shooting itself in the foot by flushing the workqueue while
> not realizing it's in a worker thread already, thus deadlocking.
I am looking into it. Do you know why you get a reset?
Regards
Oliver
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 15:07 ` Oliver Neukum
@ 2009-03-17 16:55 ` Valdis.Kletnieks
0 siblings, 0 replies; 13+ messages in thread
From: Valdis.Kletnieks @ 2009-03-17 16:55 UTC (permalink / raw)
To: Oliver Neukum; +Cc: jkosina, gregkh, Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1392 bytes --]
On Tue, 17 Mar 2009 16:07:20 BST, Oliver Neukum said:
> Am Dienstag 17 M=C3=A4rz 2009 07:33:09 schrieb Valdis.Kletnieks@vt.edu:
> > Yes, there's an NVidia driver loaded - but this looks like an HID/USB
> > bug, where it's shooting itself in the foot by flushing the workqueue while
> > not realizing it's in a worker thread already, thus deadlocking.
> I am looking into it. Do you know why you get a reset?
This is a good question indeed - and I suspect the answer is "flaky hardware".
During the evening at home, I usually have 2 things plugged into USB ports. One
is a Microsoft mouse, and the other is a Targa USB-powered cooling pad
(basically just 2 USB-powered fans). The pad in question:
http://www.amazon.com/Targus-PA248U-Notebook-Chill-Pad/dp/B0000AKA8Y
There's zero actual smarts inside the cooling pad as far as I can tell - it
just draws its milliamps to drive the fans.
I noticed that most of the wedges happened right after I moved the laptop,
and there's apparently an intermittent short in the USB power cable for the
Targa - at one point the fans stopped until I moved the cable a little.
So I'm suspecting that the Targa glitched and did something the USB hub
noticed, the USB system concluded there was a confused device and tried to
reset it back to sanity - but did it from a thread it shouldn't have done it
from (guessing here, not having looked at the code).
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 6:33 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371 Valdis.Kletnieks
2009-03-17 14:58 ` Alan Cox
2009-03-17 15:07 ` Oliver Neukum
@ 2009-03-17 20:53 ` Greg KH
2009-03-17 23:48 ` Valdis.Kletnieks
2009-03-17 20:54 ` Andrew Morton
3 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2009-03-17 20:53 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: jkosina, Andrew Morton, linux-kernel
On Tue, Mar 17, 2009 at 02:33:09AM -0400, Valdis.Kletnieks@vt.edu wrote:
> Yes, there's an NVidia driver loaded
Can you reproduce it without the nvidia driver loaded?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 6:33 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371 Valdis.Kletnieks
` (2 preceding siblings ...)
2009-03-17 20:53 ` Greg KH
@ 2009-03-17 20:54 ` Andrew Morton
2009-03-17 22:01 ` Valdis.Kletnieks
3 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2009-03-17 20:54 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: jkosina, gregkh, linux-kernel, Lai Jiangshan, Oleg Nesterov,
Oliver Neukum
On Tue, 17 Mar 2009 02:33:09 -0400 Valdis.Kletnieks@vt.edu wrote:
> 29-rc3-mmotm0129 is OK, I hit it a few times under rc5-mmotm0214, but I'm
> seeing it a lot under -rc8-mmotm0313 (have triggered it 6 times in the past 4
> hours). Very consistent traceback out of the HID and USB stack - the events/0
> kernel thread loses its shit:
>
> [ 3816.196809] ------------[ cut here ]------------
> [ 3816.196815] WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0x32/0x82()
> [ 3816.196820] Hardware name: Latitude D820
> [ 3816.196823] Modules linked in: irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt coretemp sunrpc nf_conntrack_ftp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_recent ipt_LOG xt_u32 xt_multiport iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables sha256_generic aes_x86_64 aes_generic rtc acpi_cpufreq tpm_tis tpm tpm_bios arc4 ecb nvidia(P) iwl3945 iwlcore mac80211 ohci1394 pcmcia ieee1394 dell_laptop yenta_socket led_class snd_hda_codec_idt video processor uhci_hcd iTCO_wdt rsrc_nonstatic cfg80211 snd_hda_intel intel_agp pcmcia_core iTCO_vendor_support rfkill output snd_hda_codec button battery thermal ac dcdbas [last unloaded: microcode]
> [ 3816.196950] Pid: 9, comm: events/0 Tainted: P 2.6.29-rc8-mmotm0313 #3
> [ 3816.196955] Call Trace:
> [ 3816.196965] [<ffffffff80233e53>] warn_slowpath+0xaf/0xd6
> [ 3816.196974] [<ffffffff803b78a5>] ? extract_buf+0x8e/0xc3
> [ 3816.196983] [<ffffffff8027e07a>] ? list_add+0xc/0xe
> [ 3816.196990] [<ffffffff8027eb0b>] ? __free_one_page+0x17f/0x1e6
> [ 3816.196997] [<ffffffff80243ea6>] flush_cpu_workqueue+0x32/0x82
> [ 3816.197032] [<ffffffff804214dd>] ? usb_hcd_unlink_urb+0x48/0x84
> [ 3816.197040] [<ffffffff80422762>] ? usb_kill_urb+0x21/0xce
> [ 3816.197046] [<ffffffff802444d7>] flush_workqueue+0x4d/0x67
> [ 3816.197053] [<ffffffff80244501>] flush_scheduled_work+0x10/0x12
> [ 3816.197061] [<ffffffff8045da33>] hid_cease_io+0x3b/0x40
> [ 3816.197067] [<ffffffff8045da7b>] hid_pre_reset+0x43/0x4a
> [ 3816.197073] [<ffffffff8041e627>] usb_reset_device+0x6c/0x11c
> [ 3816.197080] [<ffffffff8045e535>] hid_reset+0x9e/0x12e
> [ 3816.197086] [<ffffffff8045e497>] ? hid_reset+0x0/0x12e
> [ 3816.197092] [<ffffffff80243bca>] worker_thread+0x1d3/0x27b
> [ 3816.197100] [<ffffffff802477c1>] ? autoremove_wake_function+0x0/0x34
> [ 3816.197106] [<ffffffff802439f7>] ? worker_thread+0x0/0x27b
> [ 3816.197113] [<ffffffff802473bb>] kthread+0x55/0x80
> [ 3816.197120] [<ffffffff8020c15a>] child_rip+0xa/0x20
> [ 3816.197128] [<ffffffff8020bb2d>] ? restore_args+0x0/0x30
> [ 3816.197135] [<ffffffff80247366>] ? kthread+0x0/0x80
> [ 3816.197140] [<ffffffff8020c150>] ? child_rip+0x0/0x20
> [ 3816.197145] ---[ end trace 1e05d800555b77d7 ]---
It's an error in workqueue-avoid-recursion-in-run_workqueue.patch, methinks.
We used to permit keventd to run flush_workqueue():
static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
int active;
if (cwq->thread == current) {
/*
* Probably keventd trying to flush its own queue. So simply run
* it by hand rather than deadlocking.
*/
run_workqueue(cwq);
active = 1;
} else {
struct wq_barrier barr;
active = 0;
spin_lock_irq(&cwq->lock);
if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
insert_wq_barrier(cwq, &barr, &cwq->worklist);
active = 1;
}
spin_unlock_irq(&cwq->lock);
if (active)
wait_for_completion(&barr.done);
}
return active;
}
but after workqueue-avoid-recursion-in-run_workqueue.patch, we warn
instead:
static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
int active = 0;
struct wq_barrier barr;
WARN_ON(cwq->thread == current);
spin_lock_irq(&cwq->lock);
if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
insert_wq_barrier(cwq, &barr, &cwq->worklist);
active = 1;
}
spin_unlock_irq(&cwq->lock);
if (active)
wait_for_completion(&barr.done);
return active;
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 20:54 ` Andrew Morton
@ 2009-03-17 22:01 ` Valdis.Kletnieks
2009-03-18 2:29 ` Lai Jiangshan
0 siblings, 1 reply; 13+ messages in thread
From: Valdis.Kletnieks @ 2009-03-17 22:01 UTC (permalink / raw)
To: Andrew Morton
Cc: jkosina, gregkh, linux-kernel, Lai Jiangshan, Oleg Nesterov,
Oliver Neukum
[-- Attachment #1: Type: text/plain, Size: 676 bytes --]
On Tue, 17 Mar 2009 13:54:24 PDT, Andrew Morton said:
> It's an error in workqueue-avoid-recursion-in-run_workqueue.patch, methinks.
Thanks for the diagnosis - I got as far as realizing that any backtrace that
included flush_cpu_workqueue() and worker_thread() had a problem and got
stuck there.
I'm going to throw some printk's into the usb_reset_device() area and related
code, and see if I can figure out why a cooling pad was able to trigger it (was
seeing a hit every 90 mins or so at home, have been up over 7 hours straight
in the office).
I'll let somebody who understands the code better figure out what should be
happening once we get into that reset situation.
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 20:53 ` Greg KH
@ 2009-03-17 23:48 ` Valdis.Kletnieks
0 siblings, 0 replies; 13+ messages in thread
From: Valdis.Kletnieks @ 2009-03-17 23:48 UTC (permalink / raw)
To: Greg KH; +Cc: jkosina, Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 410 bytes --]
On Tue, 17 Mar 2009 13:53:00 PDT, Greg KH said:
> On Tue, Mar 17, 2009 at 02:33:09AM -0400, Valdis.Kletnieks@vt.edu wrote:
> > Yes, there's an NVidia driver loaded
>
> Can you reproduce it without the nvidia driver loaded?
Now that I suspect the Targa cooling pad is the reproducer, I can probably
boot the box into single-user and wiggle the wire till it craps out. Won't
happen till later tonight though.
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-17 22:01 ` Valdis.Kletnieks
@ 2009-03-18 2:29 ` Lai Jiangshan
2009-03-18 2:50 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: Lai Jiangshan @ 2009-03-18 2:29 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Andrew Morton, jkosina, gregkh, linux-kernel, Oleg Nesterov,
Oliver Neukum
Valdis.Kletnieks@vt.edu wrote:
> On Tue, 17 Mar 2009 13:54:24 PDT, Andrew Morton said:
>
>> It's an error in workqueue-avoid-recursion-in-run_workqueue.patch, methinks.
>
> Thanks for the diagnosis - I got as far as realizing that any backtrace that
> included flush_cpu_workqueue() and worker_thread() had a problem and got
> stuck there.
>
It's dangerous when we allow recursion in run_workqueue().
If it's hard for you avoid flush_scheduled_work() in
keventd's work fuction by other fix, you can create
another workqueue to handle your works, IMO.
Thanks
Lai jiangshan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-18 2:29 ` Lai Jiangshan
@ 2009-03-18 2:50 ` Andrew Morton
2009-03-18 3:26 ` Lai Jiangshan
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2009-03-18 2:50 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Valdis.Kletnieks, jkosina, gregkh, linux-kernel, Oleg Nesterov,
Oliver Neukum
On Wed, 18 Mar 2009 10:29:40 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> Valdis.Kletnieks@vt.edu wrote:
> > On Tue, 17 Mar 2009 13:54:24 PDT, Andrew Morton said:
> >
> >> It's an error in workqueue-avoid-recursion-in-run_workqueue.patch, methinks.
> >
> > Thanks for the diagnosis - I got as far as realizing that any backtrace that
> > included flush_cpu_workqueue() and worker_thread() had a problem and got
> > stuck there.
> >
>
> It's dangerous when we allow recursion in run_workqueue().
static void run_workqueue(struct cpu_workqueue_struct *cwq)
{
spin_lock_irq(&cwq->lock);
cwq->run_depth++;
if (cwq->run_depth > 3) {
/* morton gets to eat his hat */
printk("%s: recursion depth exceeded: %d\n",
__func__, cwq->run_depth);
dump_stack();
}
That was added five or six years ago, and I never ever got to eat my hat.
> If it's hard for you avoid flush_scheduled_work() in
> keventd's work fuction by other fix, you can create
> another workqueue to handle your works, IMO.
Why do we need to change anything here? No known bugs were fixed, and some
new ones were added.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-18 2:50 ` Andrew Morton
@ 2009-03-18 3:26 ` Lai Jiangshan
2009-03-18 7:48 ` Oliver Neukum
0 siblings, 1 reply; 13+ messages in thread
From: Lai Jiangshan @ 2009-03-18 3:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Valdis.Kletnieks, jkosina, gregkh, linux-kernel, Oleg Nesterov,
Oliver Neukum
Andrew Morton wrote:
> On Wed, 18 Mar 2009 10:29:40 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
>
> static void run_workqueue(struct cpu_workqueue_struct *cwq)
> {
> spin_lock_irq(&cwq->lock);
> cwq->run_depth++;
> if (cwq->run_depth > 3) {
> /* morton gets to eat his hat */
> printk("%s: recursion depth exceeded: %d\n",
> __func__, cwq->run_depth);
> dump_stack();
> }
>
> That was added five or six years ago, and I never ever got to eat my hat.
If Valdis.Kletnieks let lockdep work, he will see the complain
from lockdep at first.
>
>> If it's hard for you avoid flush_scheduled_work() in
>> keventd's work fuction by other fix, you can create
>> another workqueue to handle your works, IMO.
>
> Why do we need to change anything here? No known bugs were fixed, and some
> new ones were added.
>
I prefer the code which has no known bugs and no known potential bugs.
I defend myself:
I did not add a new bug as you said, it's the new code hid_cease_io()
adds a new unsafe usage of workqueue.
Thanks, Lai.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-18 3:26 ` Lai Jiangshan
@ 2009-03-18 7:48 ` Oliver Neukum
2009-03-20 13:42 ` Valdis.Kletnieks
0 siblings, 1 reply; 13+ messages in thread
From: Oliver Neukum @ 2009-03-18 7:48 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Andrew Morton, Valdis.Kletnieks, jkosina, gregkh, linux-kernel,
Oleg Nesterov
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2014 bytes --]
Am Mittwoch 18 März 2009 04:26:42 schrieb Lai Jiangshan:> Andrew Morton wrote:> > On Wed, 18 Mar 2009 10:29:40 +0800 Lai Jiangshan <laijs@cn.fujitsu.com>> > wrote:> >> >> > static void run_workqueue(struct cpu_workqueue_struct *cwq)> > {> > spin_lock_irq(&cwq->lock);> > cwq->run_depth++;> > if (cwq->run_depth > 3) {> > /* morton gets to eat his hat */> > printk("%s: recursion depth exceeded: %d\n",> > __func__, cwq->run_depth);> > dump_stack();> > }> >> > That was added five or six years ago, and I never ever got to eat my hat.>> If Valdis.Kletnieks let lockdep work, he will see the complain> from lockdep at first.>> >> If it's hard for you avoid flush_scheduled_work() in> >> keventd's work fuction by other fix, you can create> >> another workqueue to handle your works, IMO.> >> > Why do we need to change anything here? No known bugs were fixed, and> > some new ones were added.>> I prefer the code which has no known bugs and no known potential bugs.>> I defend myself:> I did not add a new bug as you said, it's the new code hid_cease_io()> adds a new unsafe usage of workqueue.
Very well, here's a patch.Please test.
Regards Oliverdiff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.cindex bc017cd..4306cb1 100644--- a/drivers/hid/usbhid/hid-core.c+++ b/drivers/hid/usbhid/hid-core.c@@ -1210,7 +1210,6 @@ static void hid_cease_io(struct usbhid_device *usbhid) usb_kill_urb(usbhid->urbin); usb_kill_urb(usbhid->urbctrl); usb_kill_urb(usbhid->urbout);- flush_scheduled_work(); } /* Treat USB reset pretty much the same as suspend/resume */@@ -1222,6 +1221,7 @@ static int hid_pre_reset(struct usb_interface *intf) spin_lock_irq(&usbhid->lock); set_bit(HID_RESET_PENDING, &usbhid->iofl); spin_unlock_irq(&usbhid->lock);+ cancel_work_sync(&usbhid->restart_work); hid_cease_io(usbhid); return 0;\0ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
2009-03-18 7:48 ` Oliver Neukum
@ 2009-03-20 13:42 ` Valdis.Kletnieks
0 siblings, 0 replies; 13+ messages in thread
From: Valdis.Kletnieks @ 2009-03-20 13:42 UTC (permalink / raw)
To: Oliver Neukum
Cc: Lai Jiangshan, Andrew Morton, jkosina, gregkh, linux-kernel,
Oleg Nesterov
[-- Attachment #1: Type: text/plain, Size: 509 bytes --]
On Wed, 18 Mar 2009 08:48:50 BST, Oliver Neukum said:
> Very well, here's a patch.
> Please test.
> Regards
> Oliver
> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
> index bc017cd..4306cb1 100644
Applied and tested, over the past 2 evenings it's gotten about 18 hours of
runtime with the evil cooling pad without a problem - before it was rarely
living more than 90 minutes at a time. So this seems to have fixed the
workqueue issue I was seeing.
Thanks for the patch.. :)
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-03-20 14:33 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-17 6:33 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371 Valdis.Kletnieks
2009-03-17 14:58 ` Alan Cox
2009-03-17 15:07 ` Oliver Neukum
2009-03-17 16:55 ` Valdis.Kletnieks
2009-03-17 20:53 ` Greg KH
2009-03-17 23:48 ` Valdis.Kletnieks
2009-03-17 20:54 ` Andrew Morton
2009-03-17 22:01 ` Valdis.Kletnieks
2009-03-18 2:29 ` Lai Jiangshan
2009-03-18 2:50 ` Andrew Morton
2009-03-18 3:26 ` Lai Jiangshan
2009-03-18 7:48 ` Oliver Neukum
2009-03-20 13:42 ` Valdis.Kletnieks
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox