From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756931AbZCQVNS (ORCPT ); Tue, 17 Mar 2009 17:13:18 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1754439AbZCQVNE (ORCPT ); Tue, 17 Mar 2009 17:13:04 -0400
Received: from smtp1.linux-foundation.org ([140.211.169.13]:35557 "EHLO
	smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752998AbZCQVND (ORCPT ); Tue, 17 Mar 2009 17:13:03 -0400
Date: Tue, 17 Mar 2009 13:54:24 -0700
From: Andrew Morton
To: Valdis.Kletnieks@vt.edu
Cc: jkosina@suse.cz, gregkh@suse.de, linux-kernel@vger.kernel.org,
	Lai Jiangshan, Oleg Nesterov, Oliver Neukum
Subject: Re: 29-rc-mmotm - HID/USB wedge w/ WARNING: at kernel/workqueue.c:371
Message-Id: <20090317135424.9151c4f8.akpm@linux-foundation.org>
In-Reply-To: <6648.1237271589@turing-police.cc.vt.edu>
References: <6648.1237271589@turing-police.cc.vt.edu>
X-Mailer: Sylpheed 2.4.7 (GTK+ 2.12.1; x86_64-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 17 Mar 2009 02:33:09 -0400 Valdis.Kletnieks@vt.edu wrote:

> 29-rc3-mmotm0129 is OK, I hit it a few times under rc5-mmotm0214, but I'm
> seeing it a lot under -rc8-mmotm0313 (have triggered it 6 times in the past 4
> hours).
> Very consistent traceback out of the HID and USB stack - the events/0
> kernel thread loses its shit:
>
> [ 3816.196809] ------------[ cut here ]------------
> [ 3816.196815] WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0x32/0x82()
> [ 3816.196820] Hardware name: Latitude D820
> [ 3816.196823] Modules linked in: irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt coretemp sunrpc nf_conntrack_ftp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_recent ipt_LOG xt_u32 xt_multiport iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables sha256_generic aes_x86_64 aes_generic rtc acpi_cpufreq tpm_tis tpm tpm_bios arc4 ecb nvidia(P) iwl3945 iwlcore mac80211 ohci1394 pcmcia ieee1394 dell_laptop yenta_socket led_class snd_hda_codec_idt video processor uhci_hcd iTCO_wdt rsrc_nonstatic cfg80211 snd_hda_intel intel_agp pcmcia_core iTCO_vendor_support rfkill output snd_hda_codec button battery thermal ac dcdbas [last unloaded: microcode]
> [ 3816.196950] Pid: 9, comm: events/0 Tainted: P 2.6.29-rc8-mmotm0313 #3
> [ 3816.196955] Call Trace:
> [ 3816.196965] [] warn_slowpath+0xaf/0xd6
> [ 3816.196974] [] ? extract_buf+0x8e/0xc3
> [ 3816.196983] [] ? list_add+0xc/0xe
> [ 3816.196990] [] ? __free_one_page+0x17f/0x1e6
> [ 3816.196997] [] flush_cpu_workqueue+0x32/0x82
> [ 3816.197032] [] ? usb_hcd_unlink_urb+0x48/0x84
> [ 3816.197040] [] ? usb_kill_urb+0x21/0xce
> [ 3816.197046] [] flush_workqueue+0x4d/0x67
> [ 3816.197053] [] flush_scheduled_work+0x10/0x12
> [ 3816.197061] [] hid_cease_io+0x3b/0x40
> [ 3816.197067] [] hid_pre_reset+0x43/0x4a
> [ 3816.197073] [] usb_reset_device+0x6c/0x11c
> [ 3816.197080] [] hid_reset+0x9e/0x12e
> [ 3816.197086] [] ? hid_reset+0x0/0x12e
> [ 3816.197092] [] worker_thread+0x1d3/0x27b
> [ 3816.197100] [] ? autoremove_wake_function+0x0/0x34
> [ 3816.197106] [] ? worker_thread+0x0/0x27b
> [ 3816.197113] [] kthread+0x55/0x80
> [ 3816.197120] [] child_rip+0xa/0x20
> [ 3816.197128] [] ? restore_args+0x0/0x30
> [ 3816.197135] [] ? kthread+0x0/0x80
> [ 3816.197140] [] ? child_rip+0x0/0x20
> [ 3816.197145] ---[ end trace 1e05d800555b77d7 ]---

It's an error in workqueue-avoid-recursion-in-run_workqueue.patch, methinks.

We used to permit keventd to run flush_workqueue():

static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
	int active;

	if (cwq->thread == current) {
		/*
		 * Probably keventd trying to flush its own queue. So simply run
		 * it by hand rather than deadlocking.
		 */
		run_workqueue(cwq);
		active = 1;
	} else {
		struct wq_barrier barr;

		active = 0;
		spin_lock_irq(&cwq->lock);
		if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
			insert_wq_barrier(cwq, &barr, &cwq->worklist);
			active = 1;
		}
		spin_unlock_irq(&cwq->lock);

		if (active)
			wait_for_completion(&barr.done);
	}

	return active;
}

but after workqueue-avoid-recursion-in-run_workqueue.patch, we warn instead:

static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
	int active = 0;
	struct wq_barrier barr;

	WARN_ON(cwq->thread == current);

	spin_lock_irq(&cwq->lock);
	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
		insert_wq_barrier(cwq, &barr, &cwq->worklist);
		active = 1;
	}
	spin_unlock_irq(&cwq->lock);

	if (active)
		wait_for_completion(&barr.done);

	return active;
}
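
To make the difference between the two versions concrete, here is a
minimal userspace sketch. It is not kernel code, and every toy_* name is
hypothetical and invented for illustration. It assumes, from my reading
of the patched function above, that a worker which flushes its own queue
ends up waiting on a barrier queued behind the very work item that is
doing the waiting, so the barrier can never run. The toy does not block;
it just marks the queue as wedged and stops processing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX_WORK 8

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

/* Toy stand-in for cpu_workqueue_struct; all fields are hypothetical. */
struct toy_wq {
	toy_work_fn pending[TOY_MAX_WORK];
	size_t head, tail;	/* FIFO of queued work */
	int worker_id;		/* models cwq->thread */
	int current_id;		/* models `current` */
	bool flush_recurses;	/* true: old behaviour, false: patched */
	bool wedged;		/* worker is stuck waiting on itself */
	int ran;		/* work items that ran to completion */
	int warnings;		/* models WARN_ON() firing */
};

static bool toy_queue_work(struct toy_wq *wq, toy_work_fn fn)
{
	if (wq->tail == TOY_MAX_WORK)
		return false;
	wq->pending[wq->tail++] = fn;
	return true;
}

/* Models run_workqueue(): drain the FIFO unless the worker is wedged. */
static void toy_run_workqueue(struct toy_wq *wq)
{
	while (!wq->wedged && wq->head < wq->tail) {
		toy_work_fn fn = wq->pending[wq->head++];

		fn(wq);
		if (!wq->wedged)
			wq->ran++;
	}
}

/* Models flush_cpu_workqueue() in both variants. */
static int toy_flush(struct toy_wq *wq)
{
	if (wq->worker_id == wq->current_id) {
		if (wq->flush_recurses) {
			/* Old code: run the queue by hand, no deadlock. */
			toy_run_workqueue(wq);
			return 1;
		}
		/* Patched code: warn, then wait for a barrier that can
		 * only run after the caller returns. The toy records
		 * that instead of blocking. */
		wq->warnings++;
		fprintf(stderr, "WARNING: worker flushing its own queue\n");
		wq->wedged = true;
		return -1;
	}
	/* Another thread would insert a barrier and wait; the toy only
	 * reports whether anything was pending. */
	return wq->head < wq->tail;
}

/* Stand-in for a work item that ends up flushing its own queue. */
static void toy_self_flush(struct toy_wq *wq) { toy_flush(wq); }
static void toy_noop(struct toy_wq *wq) { (void)wq; }

/* Queue a self-flushing item plus one ordinary item, then run them. */
struct toy_wq toy_simulate(bool flush_recurses)
{
	struct toy_wq wq = {
		.worker_id = 1,
		.current_id = 1,
		.flush_recurses = flush_recurses,
	};

	toy_queue_work(&wq, toy_self_flush);
	toy_queue_work(&wq, toy_noop);
	toy_run_workqueue(&wq);
	return wq;
}
```

With toy_simulate(true) both items complete and nothing warns. With
toy_simulate(false) the warning fires once, the queue is marked wedged,
and the second item never runs. If the assumption above holds, that
matches the shape of the report: a warning from events/0 followed by a
wedge.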