From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755379AbZCYVLt (ORCPT ); Wed, 25 Mar 2009 17:11:49 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752770AbZCYVLk (ORCPT ); Wed, 25 Mar 2009 17:11:40 -0400 Received: from g5t0009.atlanta.hp.com ([15.192.0.46]:48743 "EHLO g5t0009.atlanta.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752734AbZCYVLj (ORCPT ); Wed, 25 Mar 2009 17:11:39 -0400 Date: Wed, 25 Mar 2009 15:11:36 -0600 From: Alex Chiang To: gregkh@suse.de Cc: kaneshige.kenji@jp.fujitsu.com, linux-kernel Subject: [PATCH] sysfs: don't use global workqueue in sysfs_schedule_callback() Message-ID: <20090325211136.GC30098@ldl.fc.hp.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.17+20080114 (2008-01-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org A sysfs attribute using sysfs_schedule_callback() to commit suicide may end up calling device_unregister(), which will eventually call a driver's ->remove function. Drivers may call flush_scheduled_work() in their shutdown routines, in which case lockdep will complain with something like the following: ============================================= [ INFO: possible recursive locking detected ] 2.6.29-rc8-kk #1 --------------------------------------------- events/4/56 is trying to acquire lock: (events){--..}, at: [] flush_workqueue+0x0/0xa0 but task is already holding lock: (events){--..}, at: [] run_workqueue+0x108/0x230 other info that might help us debug this: 3 locks held by events/4/56: #0: (events){--..}, at: [] run_workqueue+0x108/0x230 #1: (&ss->work){--..}, at: [] run_workqueue+0x108/0x230 #2: (pci_remove_rescan_mutex){--..}, at: [] remove_callback+0x21/0x40 stack backtrace: Pid: 56, comm: events/4 Not tainted 2.6.29-rc8-kk #1 Call Trace: [] validate_chain+0xb7d/0x1260 [] __lock_acquire+0x42e/0xa40 [] lock_acquire+0x58/0x80 [] ? flush_workqueue+0x0/0xa0 [] flush_workqueue+0x4d/0xa0 [] ? flush_workqueue+0x0/0xa0 [] flush_scheduled_work+0x10/0x20 [] e1000_remove+0x55/0xfe [e1000e] [] ? sysfs_schedule_callback_work+0x0/0x50 [] pci_device_remove+0x32/0x70 [] __device_release_driver+0x59/0x90 [] device_release_driver+0x2b/0x40 [] bus_remove_device+0xa6/0x120 [] device_del+0x12b/0x190 [] device_unregister+0x26/0x70 [] pci_stop_dev+0x49/0x60 [] pci_remove_bus_device+0x40/0xc0 [] remove_callback+0x29/0x40 [] sysfs_schedule_callback_work+0x1f/0x50 [] run_workqueue+0x15a/0x230 [] ? run_workqueue+0x108/0x230 [] worker_thread+0x9f/0x100 [] ? autoremove_wake_function+0x0/0x40 [] ? worker_thread+0x0/0x100 [] kthread+0x4d/0x80 [] child_rip+0xa/0x20 [] ? restore_args+0x0/0x30 [] ? kthread+0x0/0x80 [] ? child_rip+0x0/0x20 Although we know that the device_unregister path will never acquire a lock that a driver might try to acquire in its ->remove, in general we should never attempt to flush a workqueue from within the same workqueue, and lockdep rightly complains. So as long as sysfs attributes cannot commit suicide directly and we are stuck with this callback mechanism, put the sysfs callbacks on their own workqueue instead of the global one. This has the side benefit that if a suicidal sysfs attribute kicks off a long chain of ->remove callbacks, we no longer induce a long delay on the global queue. Reported-by: Kenji Kaneshige Tested-by: Kenji Kaneshige Signed-off-by: Alex Chiang --- This also fixes a missing module_put in the error path introduced by sysfs-only-allow-one-scheduled-removal-callback-per-kobj.patch. We never destroy the workqueue, but I'm not sure that's a problem. --- file.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) --- diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index 289c43a..979e937 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -667,6 +667,7 @@ struct sysfs_schedule_callback_struct { struct work_struct work; }; +static struct workqueue_struct *sysfs_workqueue; static DEFINE_MUTEX(sysfs_workq_mutex); static LIST_HEAD(sysfs_workq); static void sysfs_schedule_callback_work(struct work_struct *work) @@ -715,11 +716,20 @@ int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *), mutex_lock(&sysfs_workq_mutex); list_for_each_entry_safe(ss, tmp, &sysfs_workq, workq_list) if (ss->kobj == kobj) { + module_put(owner); mutex_unlock(&sysfs_workq_mutex); return -EAGAIN; } mutex_unlock(&sysfs_workq_mutex); + if (sysfs_workqueue == NULL) { + sysfs_workqueue = create_workqueue("sysfsd"); + if (sysfs_workqueue == NULL) { + module_put(owner); + return -ENOMEM; + } + } + ss = kmalloc(sizeof(*ss), GFP_KERNEL); if (!ss) { module_put(owner); @@ -735,7 +745,7 @@ int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *), mutex_lock(&sysfs_workq_mutex); list_add_tail(&ss->workq_list, &sysfs_workq); mutex_unlock(&sysfs_workq_mutex); - schedule_work(&ss->work); + queue_work(sysfs_workqueue, &ss->work); return 0; } EXPORT_SYMBOL_GPL(sysfs_schedule_callback);