From: Alexander Duyck <alexander.h.duyck@intel.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, "Rafael J. Wysocki" <rjw@sisk.pl>,
	Yinghai Lu <yinghai@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: workqueue, pci: INFO: possible recursive locking detected
Date: Mon, 22 Jul 2013 15:33:01 -0700
Message-ID: <51EDB31D.6090103@intel.com>
In-Reply-To: <CAErSpo4vbfctQwGzexEuLoGMh33oU1oJOM8bkuq4uk7EtkE0xw@mail.gmail.com>

On 07/22/2013 02:38 PM, Bjorn Helgaas wrote:
> [+cc Alex, Yinghai, linux-pci]
>
> On Mon, Jul 22, 2013 at 9:37 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 07/22/2013 05:22 PM, Lai Jiangshan wrote:
>>> On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
>>>> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>>>>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>>>> ---
>>>>>>
>>>>>>  kernel/workqueue.c |    6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>>
>>>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>>>> index f02c4a4..07d9a67 100644
>>>>>> --- a/kernel/workqueue.c
>>>>>> +++ b/kernel/workqueue.c
>>>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>>>  {
>>>>>>    struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>>>
>>>>>> +#ifdef CONFIG_LOCKDEP
>>>>>> +  static struct lock_class_key __key;
>>>>> Sorry, this "static" should be removed.
>>>>>
>>>> That didn't help either :-( Because it makes lockdep unhappy,
>>>> since the key isn't persistent.
>>>>
>>>> This is the patch I used:
>>>>
>>>> ---
>>>>
>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>> index f02c4a4..7967e3b 100644
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>  {
>>>>      struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>
>>>> +#ifdef CONFIG_LOCKDEP
>>>> +    struct lock_class_key __key;
>>>> +    INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>>> +    lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>>>> +#else
>>>>      INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>>> +#endif
>>>>      schedule_work_on(cpu, &wfc.work);
>>>>      flush_work(&wfc.work);
>>>>      return wfc.ret;
>>>>
>>>>
>>>> And here are the new warnings:
>>>>
>>>>
>>>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
>>>> io scheduler noop registered
>>>> io scheduler deadline registered
>>>> io scheduler cfq registered (default)
>>>> BUG: key ffff881039557b98 not in .data!
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
>>> Sorry again.
>>>
>>> From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Date: Mon, 22 Jul 2013 19:08:51 +0800
>>> Subject: [PATCH] [PATCH] workqueue: allow the function of work_on_cpu() can
>>>  call work_on_cpu()
>>>
>>> If @fn calls work_on_cpu() again, lockdep will complain:
>>>
>>>> [ INFO: possible recursive locking detected ]
>>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>>> ---------------------------------------------
>>>> kworker/0:1/142 is trying to acquire lock:
>>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>>
>>>> but task is already holding lock:
>>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>>
>>>> other info that might help us debug this:
>>>>  Possible unsafe locking scenario:
>>>>
>>>>        CPU0
>>>>        ----
>>>>   lock((&wfc.work));
>>>>   lock((&wfc.work));
>>>>
>>>>  *** DEADLOCK ***
>>> This is a false-positive lockdep report. In this situation,
>>> the two "wfc"s of the two work_on_cpu() calls are different;
>>> they are both on the stack, so flush_work() cannot deadlock.
>>>
>>> To fix this, we need to avoid the lockdep check in this case.
>>> But we don't want to change flush_work(), so we use a
>>> completion instead of flush_work() in work_on_cpu().
>>>
>>> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> ---
>> That worked, thanks a lot!
>>
>> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>>  kernel/workqueue.c |    5 ++++-
>>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index f02c4a4..b021a45 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>>>       long (*fn)(void *);
>>>       void *arg;
>>>       long ret;
>>> +     struct completion done;
>>>  };
>>>
>>>  static void work_for_cpu_fn(struct work_struct *work)
>>> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>>>       struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>>>
>>>       wfc->ret = wfc->fn(wfc->arg);
>>> +     complete(&wfc->done);
>>>  }
>>>
>>>  /**
>>> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>       struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>
>>>       INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>> +     init_completion(&wfc.done);
>>>       schedule_work_on(cpu, &wfc.work);
>>> -     flush_work(&wfc.work);
>>> +     wait_for_completion(&wfc.done);
>>>       return wfc.ret;
>>>  }
>>>  EXPORT_SYMBOL_GPL(work_on_cpu);
>>>
> Isn't this for the same issue Alex and others have been working on?
>
> It doesn't feel like we have consensus on how this should be fixed.
> You're proposing a change to work_on_cpu(), Alex proposed a change to
> pci_call_probe() [1], Yinghai proposed some changes to the PCI core
> SR-IOV code and several drivers [2].
>
> [1] https://lkml.kernel.org/r/20130624195942.40795.27292.stgit@ahduyck-cp1.jf.intel.com
> [2] https://lkml.kernel.org/r/1368498506-25857-7-git-send-email-yinghai@kernel.org

The solution I proposed was flawed due to possible preemption issues. 
If this solution resolves the issue then I am fine with it as long as it
doesn't introduce any new issues.

Thanks,

Alex
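
For readers who want to try the completion-based wait outside the kernel,
here is a rough userspace sketch of the idea in the quoted patch. It is not
kernel code: struct completion is re-implemented here with a pthread mutex
and condition variable, a freshly created thread stands in for the per-CPU
worker (no CPU pinning), and work_on_thread(), inner() and outer() are
hypothetical names made up for this example. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (assumption:
 * the real kernel type is implemented differently). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)		/* guard against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Same fields as the patched struct, minus the work_struct. */
struct work_for_cpu {
	long (*fn)(void *);
	void *arg;
	long ret;
	struct completion done;
};

static void *work_for_cpu_fn(void *data)
{
	struct work_for_cpu *wfc = data;

	wfc->ret = wfc->fn(wfc->arg);
	complete(&wfc->done);
	return NULL;
}

/* Hypothetical analog of work_on_cpu(): run fn on another thread and
 * wait on a per-call, on-stack completion. Returns -1 if the worker
 * thread cannot be created. */
static long work_on_thread(long (*fn)(void *), void *arg)
{
	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
	pthread_t worker;

	assert(fn != NULL);
	init_completion(&wfc.done);
	if (pthread_create(&worker, NULL, work_for_cpu_fn, &wfc) != 0)
		return -1;
	wait_for_completion(&wfc.done);
	/* Join so the worker is finished with wfc before it leaves scope. */
	pthread_join(worker, NULL);
	pthread_cond_destroy(&wfc.done.cond);
	pthread_mutex_destroy(&wfc.done.lock);
	return wfc.ret;
}

/* Nested use: outer() itself calls work_on_thread() from a worker. */
static long inner(void *arg)
{
	return *(long *)arg + 1;
}

static long outer(void *arg)
{
	return work_on_thread(inner, arg) * 2;
}
```

Calling work_on_thread(outer, &x) nests one call inside another. Each
level waits on its own on-stack completion, which mirrors the point made in
the quoted commit message that the two "wfc"s are different objects. In this
sketch the pthread_join() alone would be enough to wait for the worker; the
completion is kept only to show the shape of the kernel change.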

Thread overview: 16+ messages
2013-07-16 14:41 workqueue, pci: INFO: possible recursive locking detected Srivatsa S. Bhat
2013-07-17 10:07 ` Lai Jiangshan
2013-07-18 20:23   ` Srivatsa S. Bhat
2013-07-19  1:47     ` Lai Jiangshan
2013-07-19  8:57       ` Srivatsa S. Bhat
2013-07-22 11:52         ` Lai Jiangshan
2013-07-22 15:37           ` Srivatsa S. Bhat
2013-07-22 21:38             ` Bjorn Helgaas
2013-07-22 22:06               ` Yinghai Lu
2013-07-22 22:33               ` Alexander Duyck [this message]
2013-07-22 21:32           ` Tejun Heo
2013-07-23  1:23             ` Lai Jiangshan
2013-07-23 14:38               ` Tejun Heo
2013-07-24 10:31                 ` Lai Jiangshan
2013-07-24 16:25                   ` [PATCH] workqueue: allow work_on_cpu() to be called recursively Tejun Heo
2013-07-27 17:11                     ` Srivatsa S. Bhat
