From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, "Rafael J. Wysocki" <rjw@sisk.pl>,
	bhelgaas@google.com
Subject: Re: workqueue, pci: INFO: possible recursive locking detected
Date: Fri, 19 Jul 2013 09:47:39 +0800
Message-ID: <51E89ABB.20808@cn.fujitsu.com>
In-Reply-To: <51E84EDC.5090502@linux.vnet.ibm.com>

On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
> 
> On 07/17/2013 03:37 PM, Lai Jiangshan wrote:
>> On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
>>> Hi,
>>>
>>> I have been seeing this warning every time during boot. I haven't
>>> spent time digging through it though... Please let me know if
>>> any machine-specific info is needed.
>>>
>>> Regards,
>>> Srivatsa S. Bhat
>>>
>>>
>>> ----------------------------------------------------
>>>
>>> =============================================
>>> [ INFO: possible recursive locking detected ]
>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>> ---------------------------------------------
>>> kworker/0:1/142 is trying to acquire lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>
>>> but task is already holding lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>
>>> other info that might help us debug this:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0
>>>        ----
>>>   lock((&wfc.work));
>>>   lock((&wfc.work));
>>
>>
> 
> 
> Hi Lai,
> 
> Thanks for taking a look into this!
> 
>>
>> This is false negative,
> 
> I believe you meant false positive...
> 
>> the two "wfc"s are different; they are
>> both on the stack. flush_work() can't deadlock in such a case:
>>
>> long foo(void *arg)
>> {
>> 	...
>> 	if (xxx)
>> 		work_on_cpu(..., foo, ...);
>> 	...
>> }
>>
>> void bar(void)
>> {
>> 	work_on_cpu(..., foo, ...);
>> }
>>
>> The complaint is caused by work_on_cpu() using a static lock_class_key,
>> so we should fix work_on_cpu().
>> (But the caller should also be careful: foo()/local_pci_probe() is re-entered.)
>>
>> But I can't find an elegant fix.
>>
>> long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>> {
>> 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>> +#ifdef CONFIG_LOCKDEP
>> +	static struct lock_class_key __key;
>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>> +#else
>> 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +#endif
>> 	schedule_work_on(cpu, &wfc.work);
>> 	flush_work(&wfc.work);
>> 	return wfc.ret;
>> }
>>
> 
> Unfortunately, that didn't seem to fix it. I applied the patch
> shown below and got the same warning.
> 
> ---
> 
>  kernel/workqueue.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..07d9a67 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  {
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>  
> +#ifdef CONFIG_LOCKDEP
> +	static struct lock_class_key __key;

Sorry, this "static" should be removed.

Thanks,
Lai


> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
> +#else
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
>  	schedule_work_on(cpu, &wfc.work);
>  	flush_work(&wfc.work);
>  	return wfc.ret;
> 
> 
> 
> Warning:
> --------
> 
> wmi: Mapper loaded
> be2net 0000:11:00.0: irq 102 for MSI/MSI-X
> be2net 0000:11:00.0: enabled 1 MSI-x vector(s)
> be2net 0000:11:00.0: created 0 RSS queue(s) and 1 default RX queue
> be2net 0000:11:00.0: created 1 TX queue(s)
> pci 0000:11:04.0: [19a2:0710] type 00 class 0x020000
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-wq-fix #10 Not tainted
> ---------------------------------------------
> kworker/0:1/126 is trying to acquire lock:
>  (&wfc.work){+.+.+.}, at: [<ffffffff810770f0>] flush_work+0x0/0xb0
> 
> but task is already holding lock:
>  (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&wfc.work);
>   lock(&wfc.work);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by kworker/0:1/126:
>  #0:  (events){.+.+.+}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>  #1:  (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff81398ada>] device_attach+0x2a/0xc0
> 
> stack backtrace:
> CPU: 0 PID: 126 Comm: kworker/0:1 Not tainted 3.11.0-rc1-wq-fix #10
> Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: events work_for_cpu_fn
>  ffff881036887408 ffff881036889668 ffffffff81619059 0000000000000003
>  ffff881036886a80 ffff881036889698 ffffffff810c1624 ffff881036886a80
>  ffff881036887408 ffff881036886a80 0000000000000000 ffff8810368896f8
> Call Trace:
>  [<ffffffff81619059>] dump_stack+0x59/0x80
>  [<ffffffff810c1624>] print_deadlock_bug+0xf4/0x100
>  [<ffffffff810c3104>] validate_chain+0x504/0x750
>  [<ffffffff810c365d>] __lock_acquire+0x30d/0x580
>  [<ffffffff810c3967>] lock_acquire+0x97/0x170
>  [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
>  [<ffffffff81077138>] flush_work+0x48/0xb0
>  [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
>  [<ffffffff810c2000>] ? mark_held_locks+0x80/0x130
>  [<ffffffff81074ceb>] ? queue_work_on+0x4b/0xa0
>  [<ffffffff810c2375>] ? trace_hardirqs_on_caller+0x105/0x1d0
>  [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81077334>] work_on_cpu+0xa4/0xc0
>  [<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
>  [<ffffffff812d1ed0>] ? pci_pm_prepare+0x60/0x60
>  [<ffffffff812d1ffa>] __pci_device_probe+0x9a/0xe0
>  [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff812d3302>] ? pci_dev_get+0x22/0x30
>  [<ffffffff812d334a>] pci_device_probe+0x3a/0x60
>  [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff81398bdc>] really_probe+0x6c/0x320
>  [<ffffffff81398ed7>] driver_probe_device+0x47/0xa0
>  [<ffffffff81398fe0>] ? __driver_attach+0xb0/0xb0
>  [<ffffffff81399033>] __device_attach+0x53/0x60
>  [<ffffffff81396b24>] bus_for_each_drv+0x74/0xa0
>  [<ffffffff81398b50>] device_attach+0xa0/0xc0
>  [<ffffffff812c99f9>] pci_bus_add_device+0x39/0x60
>  [<ffffffff812ed341>] virtfn_add+0x251/0x3e0
>  [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff812ed9bf>] sriov_enable+0x22f/0x3d0
>  [<ffffffff812edbad>] pci_enable_sriov+0x4d/0x60
>  [<ffffffffa0127045>] be_vf_setup+0x175/0x410 [be2net]
>  [<ffffffffa012d3ca>] be_setup+0x37a/0x4b0 [be2net]
>  [<ffffffffa012dac0>] be_probe+0x5c0/0x820 [be2net]
>  [<ffffffff812d1f1e>] local_pci_probe+0x4e/0x90
>  [<ffffffff8106f958>] work_for_cpu_fn+0x18/0x30
>  [<ffffffff81075e3a>] process_one_work+0x1da/0x610
>  [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
>  [<ffffffff810764fc>] worker_thread+0x28c/0x3a0
>  [<ffffffff81076270>] ? process_one_work+0x610/0x610
>  [<ffffffff8107da5e>] kthread+0xee/0x100
>  [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
>  [<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
> be2net 0000:11:04.0: enabling device (0040 -> 0042)
> be2net 0000:11:04.0: Could not use PCIe error reporting
> be2net 0000:11:04.0: VF is not privileged to issue opcode 89-1
> be2net 0000:11:04.0: VF is not privileged to issue opcode 125-1
> 
> 
> Regards,
> Srivatsa S. Bhat
> 
> 

