From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, "Rafael J. Wysocki" <rjw@sisk.pl>,
	bhelgaas@google.com
Subject: Re: workqueue, pci: INFO: possible recursive locking detected
Date: Wed, 17 Jul 2013 18:07:08 +0800
Message-ID: <51E66CCC.9010600@cn.fujitsu.com>
In-Reply-To: <51E55B7D.2040209@linux.vnet.ibm.com>

On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
> Hi,
> 
> I have been seeing this warning every time during boot. I haven't
> spent time digging through it though... Please let me know if
> any machine-specific info is needed.
> 
> Regards,
> Srivatsa S. Bhat
> 
> 
> ----------------------------------------------------
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
> 
> but task is already holding lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock((&wfc.work));
>   lock((&wfc.work));


Hi, Srivatsa

This is a false positive: the two "wfc"s are different objects, each on
its own stack, so flush_work() cannot deadlock in a case like this:

long foo(void *arg)
{
	...
	if (xxx)
		work_on_cpu(..., foo, ...);	/* nested call: a second on-stack wfc */
	...
}

bar()
{
	work_on_cpu(..., foo, ...);	/* outer call: the first on-stack wfc */
}

The complaint arises because work_on_cpu() uses a static lock_class_key,
so every on-stack "wfc.work" falls into the same lock class.
We should fix work_on_cpu().
(The caller should also be careful, though: foo()/local_pci_probe() is being re-entered.)

But I can't find an elegant fix.

long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	struct work_for_cpu wfc = { .fn = fn, .arg = arg };

+#ifdef CONFIG_LOCKDEP
+	static struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, &wfc.work, &__key, 0);
+#else
	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
	schedule_work_on(cpu, &wfc.work);
	flush_work(&wfc.work);
	return wfc.ret;
}


Any thoughts, Tejun?

thanks,
Lai

> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by kworker/0:1/142:
>  #0:  (events){.+.+.+}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>  #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8139a3ba>] device_attach+0x2a/0xc0
> 
> stack backtrace:
> CPU: 0 PID: 142 Comm: kworker/0:1 Not tainted 3.11.0-rc1-lockdep-fix-a #6
> Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: events work_for_cpu_fn
>  ffff881036fecd88 ffff881036fef678 ffffffff8161a919 0000000000000003
>  ffff881036fec400 ffff881036fef6a8 ffffffff810c2234 ffff881036fec400
>  ffff881036fecd88 ffff881036fec400 0000000000000000 ffff881036fef708
> Call Trace:
>  [<ffffffff8161a919>] dump_stack+0x59/0x80
>  [<ffffffff810c2234>] print_deadlock_bug+0xf4/0x100
>  [<ffffffff810c3d14>] validate_chain+0x504/0x750
>  [<ffffffff810c426d>] __lock_acquire+0x30d/0x580
>  [<ffffffff810c4577>] lock_acquire+0x97/0x170
>  [<ffffffff81077100>] ? start_flush_work+0x220/0x220
>  [<ffffffff81077148>] flush_work+0x48/0xb0
>  [<ffffffff81077100>] ? start_flush_work+0x220/0x220
>  [<ffffffff810c2c10>] ? mark_held_locks+0x80/0x130
>  [<ffffffff81074cfb>] ? queue_work_on+0x4b/0xa0
>  [<ffffffff810c2f85>] ? trace_hardirqs_on_caller+0x105/0x1d0
>  [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81077320>] work_on_cpu+0x80/0x90
>  [<ffffffff8106f950>] ? wqattrs_hash+0x190/0x190
>  [<ffffffff812d37b0>] ? pci_pm_prepare+0x60/0x60
>  [<ffffffff812a0059>] ? cpumask_next_and+0x29/0x50
>  [<ffffffff812d38da>] __pci_device_probe+0x9a/0xe0
>  [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff812d4be2>] ? pci_dev_get+0x22/0x30
>  [<ffffffff812d4c2a>] pci_device_probe+0x3a/0x60
>  [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff8139a4bc>] really_probe+0x6c/0x320
>  [<ffffffff8139a7b7>] driver_probe_device+0x47/0xa0
>  [<ffffffff8139a8c0>] ? __driver_attach+0xb0/0xb0
>  [<ffffffff8139a913>] __device_attach+0x53/0x60
>  [<ffffffff81398404>] bus_for_each_drv+0x74/0xa0
>  [<ffffffff8139a430>] device_attach+0xa0/0xc0
>  [<ffffffff812cb2d9>] pci_bus_add_device+0x39/0x60
>  [<ffffffff812eec21>] virtfn_add+0x251/0x3e0
>  [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff812ef29f>] sriov_enable+0x22f/0x3d0
>  [<ffffffff812ef48d>] pci_enable_sriov+0x4d/0x60
>  [<ffffffffa0143045>] be_vf_setup+0x175/0x410 [be2net]
>  [<ffffffffa01493ca>] be_setup+0x37a/0x4b0 [be2net]
>  [<ffffffffa0149ac0>] be_probe+0x5c0/0x820 [be2net]
>  [<ffffffff812d37fe>] local_pci_probe+0x4e/0x90
>  [<ffffffff8106f968>] work_for_cpu_fn+0x18/0x30
>  [<ffffffff81075e4a>] process_one_work+0x1da/0x610
>  [<ffffffff81075dd9>] ? process_one_work+0x169/0x610
>  [<ffffffff8107650c>] worker_thread+0x28c/0x3a0
>  [<ffffffff81076280>] ? process_one_work+0x610/0x610
>  [<ffffffff8107da3e>] kthread+0xee/0x100
>  [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
>  [<ffffffff8162a71c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
> 
> 


Thread overview: 16+ messages
2013-07-16 14:41 workqueue, pci: INFO: possible recursive locking detected Srivatsa S. Bhat
2013-07-17 10:07 ` Lai Jiangshan [this message]
2013-07-18 20:23   ` Srivatsa S. Bhat
2013-07-19  1:47     ` Lai Jiangshan
2013-07-19  8:57       ` Srivatsa S. Bhat
2013-07-22 11:52         ` Lai Jiangshan
2013-07-22 15:37           ` Srivatsa S. Bhat
2013-07-22 21:38             ` Bjorn Helgaas
2013-07-22 22:06               ` Yinghai Lu
2013-07-22 22:33               ` Alexander Duyck
2013-07-22 21:32           ` Tejun Heo
2013-07-23  1:23             ` Lai Jiangshan
2013-07-23 14:38               ` Tejun Heo
2013-07-24 10:31                 ` Lai Jiangshan
2013-07-24 16:25                   ` [PATCH] workqueue: allow work_on_cpu() to be called recursively Tejun Heo
2013-07-27 17:11                     ` Srivatsa S. Bhat
