From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758356Ab0DAQjf (ORCPT ); Thu, 1 Apr 2010 12:39:35 -0400
Received: from mx1.redhat.com ([209.132.183.28]:49597 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1758336Ab0DAQj2 (ORCPT ); Thu, 1 Apr 2010 12:39:28 -0400
Date: Thu, 1 Apr 2010 18:36:42 +0200
From: Oleg Nesterov
To: Cong Wang
Cc: linux-kernel@vger.kernel.org, Tejun Heo, Rusty Russell, akpm@linux-foundation.org, Ingo Molnar
Subject: Re: [Patch] workqueue: move lockdep annotations up to destroy_workqueue()
Message-ID: <20100401163642.GA19551@redhat.com>
References: <20100331105534.5601.50813.sendpatchset@localhost.localdomain> <20100331112559.GA17747@redhat.com> <4BB408AF.4080908@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4BB408AF.4080908@redhat.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/01, Cong Wang wrote:
>
>> I must have missed something, but it seems to me this patch tries to
>> supress the valid warning.
>>
>> Could you please clarify?
>
> Sure, below is the whole warning. Please teach me how this is valid.

Oh, I can never understand the output from lockdep, it is much more clever
than me ;) But at first glance,

> Mar 31 16:15:02 dhcp-66-70-5 kernel: -> #2 (rtnl_mutex){+.+.+.}:
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] validate_chain+0x1019/0x1540
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] __lock_acquire+0xd8d/0xe55
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] lock_acquire+0x160/0x1af
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] mutex_lock_nested+0x64/0x4e9
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] rtnl_lock+0x1e/0x27
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] bond_mii_monitor+0x39f/0x74b [bonding]
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] worker_thread+0x2da/0x46c
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] kthread+0xdd/0xec
> Mar 31 16:15:02 dhcp-66-70-5 kernel: [] kernel_thread_helper+0x4/0x10

OK, so work->func() takes rtnl_mutex. This means it is not safe to do
flush_workqueue() or destroy_workqueue() under rtnl_lock(). This is a
known fact.

> Mar 31 16:15:03 dhcp-66-70-5 kernel: -> #0 ((bond_dev->name)){+.+...}:
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] validate_chain+0xaee/0x1540
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] __lock_acquire+0xd8d/0xe55
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] lock_acquire+0x160/0x1af
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] cleanup_workqueue_thread+0x59/0x10b
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] destroy_workqueue+0x9c/0x107
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] bond_uninit+0x524/0x58a [bonding]
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] rollback_registered_many+0x205/0x2e3
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] unregister_netdevice_many+0x2a/0x75
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] __rtnl_kill_links+0x8b/0x9d
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] __rtnl_link_unregister+0x35/0x72
> Mar 31 16:15:03 dhcp-66-70-5 kernel: [] rtnl_link_unregister+0x2c/0x43

However, rtnl_link_unregister() takes rtnl_mutex and then bond_uninit()
does cleanup_workqueue_thread().

So it looks like this warning is valid: this path can deadlock if
destroy_workqueue() is called while bond->mii_work is queued. Lockdep
decided to blame cpu_add_remove_lock in this chain.

Oleg.