From: "Michael S. Tsirkin" <mst@redhat.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>,
Ming Lei <ming.lei@canonical.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
David Miller <davem@davemloft.net>,
Roland Dreier <roland@kernel.org>,
netdev <netdev@vger.kernel.org>, Yan Burman <yanb@mellanox.com>,
Jack Morgenstein <jackm@dev.mellanox.co.il>,
Tejun Heo <tj@kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [PATCHv2 for-3.9] pci: avoid work_on_cpu for nested SRIOV probes
Date: Fri, 19 Apr 2013 17:36:10 +0300 [thread overview]
Message-ID: <20130419143610.GB1926@redhat.com> (raw)
In-Reply-To: <CAErSpo7jf-ytJPhuphzLFH80c9YMM6UzxWAVEnD-25nQgXmARQ@mail.gmail.com>
On Thu, Apr 18, 2013 at 03:57:36PM -0600, Bjorn Helgaas wrote:
> On Thu, Apr 18, 2013 at 3:40 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Thu, Apr 18, 2013 at 2:08 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> The following lockdep report triggers since 3.9-rc1:
> >>
> >> =============================================
> >> [ INFO: possible recursive locking detected ]
> >> 3.9.0-rc1 #96 Not tainted
> >> ---------------------------------------------
> >> kworker/0:1/734 is trying to acquire lock:
> >> ((&wfc.work)){+.+.+.}, at: [<ffffffff81066cb0>] flush_work+0x0/0x250
> >>
> >> but task is already holding lock:
> >> ((&wfc.work)){+.+.+.}, at: [<ffffffff81064352>]
> >> process_one_work+0x162/0x4c0
> >>
> >> other info that might help us debug this:
> >> Possible unsafe locking scenario:
> >>
> >> CPU0
> >> ----
> >> lock((&wfc.work));
> >> lock((&wfc.work));
> >>
> >> *** DEADLOCK ***
> >>
> >> May be due to missing lock nesting notation
> >>
> >> 3 locks held by kworker/0:1/734:
> >> #0: (events){.+.+.+}, at: [<ffffffff81064352>]
> >> process_one_work+0x162/0x4c0
> >> #1: ((&wfc.work)){+.+.+.}, at: [<ffffffff81064352>]
> >> process_one_work+0x162/0x4c0
> >> #2: (&__lockdep_no_validate__){......}, at: [<ffffffff812db225>]
> >> device_attach+0x25/0xb0
> >>
> >> stack backtrace:
> >> Pid: 734, comm: kworker/0:1 Not tainted 3.9.0-rc1 #96
> >> Call Trace:
> >> [<ffffffff810948ec>] validate_chain+0xdcc/0x11f0
> >> [<ffffffff81095150>] __lock_acquire+0x440/0xc70
> >> [<ffffffff81095150>] ? __lock_acquire+0x440/0xc70
> >> [<ffffffff810959da>] lock_acquire+0x5a/0x70
> >> [<ffffffff81066cb0>] ? wq_worker_waking_up+0x60/0x60
> >> [<ffffffff81066cf5>] flush_work+0x45/0x250
> >> [<ffffffff81066cb0>] ? wq_worker_waking_up+0x60/0x60
> >> [<ffffffff810922be>] ? mark_held_locks+0x9e/0x130
> >> [<ffffffff81066a96>] ? queue_work_on+0x46/0x90
> >> [<ffffffff810925dd>] ? trace_hardirqs_on_caller+0xfd/0x190
> >> [<ffffffff8109267d>] ? trace_hardirqs_on+0xd/0x10
> >> [<ffffffff81066f74>] work_on_cpu+0x74/0x90
> >> [<ffffffff81063820>] ? keventd_up+0x20/0x20
> >> [<ffffffff8121fd30>] ? pci_pm_prepare+0x60/0x60
> >> [<ffffffff811f9293>] ? cpumask_next_and+0x23/0x40
> >> [<ffffffff81220a1a>] pci_device_probe+0xba/0x110
> >> [<ffffffff812dadca>] ? driver_sysfs_add+0x7a/0xb0
> >> [<ffffffff812daf1f>] driver_probe_device+0x8f/0x230
> >> [<ffffffff812db170>] ? __driver_attach+0xb0/0xb0
> >> [<ffffffff812db1bb>] __device_attach+0x4b/0x60
> >> [<ffffffff812d9314>] bus_for_each_drv+0x64/0x90
> >> [<ffffffff812db298>] device_attach+0x98/0xb0
> >> [<ffffffff81218474>] pci_bus_add_device+0x24/0x50
> >> [<ffffffff81232e80>] virtfn_add+0x240/0x3e0
> >> [<ffffffff8146ce3d>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
> >> [<ffffffff812333be>] pci_enable_sriov+0x23e/0x500
> >> [<ffffffffa011fa1a>] __mlx4_init_one+0x5da/0xce0 [mlx4_core]
> >> [<ffffffffa012016d>] mlx4_init_one+0x2d/0x60 [mlx4_core]
> >> [<ffffffff8121fd79>] local_pci_probe+0x49/0x80
> >> [<ffffffff81063833>] work_for_cpu_fn+0x13/0x20
> >> [<ffffffff810643b8>] process_one_work+0x1c8/0x4c0
> >> [<ffffffff81064352>] ? process_one_work+0x162/0x4c0
> >> [<ffffffff81064cfb>] worker_thread+0x30b/0x430
> >> [<ffffffff810649f0>] ? manage_workers+0x340/0x340
> >> [<ffffffff8106cea6>] kthread+0xd6/0xe0
> >> [<ffffffff8106cdd0>] ? __init_kthread_worker+0x70/0x70
> >> [<ffffffff8146daac>] ret_from_fork+0x7c/0xb0
> >> [<ffffffff8106cdd0>] ? __init_kthread_worker+0x70/0x70
> >>
> >> The issue is that a driver, in it's probe function, calls
> >> pci_sriov_enable so a PF device probe causes VF probe (AKA nested
> >> probe). Each probe in pci_device_probe which is (normally) run through
> >> work_on_cpu (this is to get the right numa node for memory allocated by
> >> the driver). In turn work_on_cpu does this internally:
> >>
> >> schedule_work_on(cpu, &wfc.work);
> >> flush_work(&wfc.work);
> >>
> >> So if you are running probe on CPU1, and cause another
> >> probe on the same CPU, this will try to flush
> >> workqueue from inside same workqueue which causes
> >> a lockep warning.
> >>
> >> Nested probing might be tricky to get right generally.
> >>
> >> But for pci_sriov_enable, the situation is actually very simple: all VFs
> >> naturally have same affinity as the PF, and cpumask_any_and is actually
> >> same as cpumask_first_and, so it always gives us the same CPU.
> >> So let's just detect that, and run the probing for VFs locally without a
> >> workqueue.
> >>
> >> This is hardly elegant, but looks to me like an appropriate quick fix
> >> for 3.9.
> >>
> >> Tested-by: Or Gerlitz <ogerlitz@mellanox.com>
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> Acked-by: Tejun Heo <tj@kernel.org>
> >
> > Thanks, Michael. I put this in my for-linus branch:
> >
> > http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus
> >
> > I'll send a pull request to Linus today.
>
> Actually, let me make sure I understand this correctly: This patch
> fixes the lockdep warning, but it does not fix an actual deadlock or
> make any functional change. Is that right?
Tejun said that this warning is a false positive, so yes.
> If that's true, how much pain would it cause to just hold this for
> v3.9.1? I'm nervous about doing a warning fix when we're only a day
> or two before releasing v3.9.
>
> Bjorn
I don't have this hardware, so I don't know. It was apparently reported
by real users ...
> >> ---
> >>
> >> Changes from v1:
> >> - clarified commit log and added Ack by Tejun Heo
> >> patch is unchanged.
> >>
> >> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> >> index 1fa1e48..6eeb5ec 100644
> >> --- a/drivers/pci/pci-driver.c
> >> +++ b/drivers/pci/pci-driver.c
> >> @@ -286,8 +286,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> >> int cpu;
> >>
> >> get_online_cpus();
> >> cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
> >> - if (cpu < nr_cpu_ids)
> >> + if (cpu != raw_smp_processor_id() && cpu < nr_cpu_ids)
> >> error = work_on_cpu(cpu, local_pci_probe, &ddi);
> >> else
> >> error = local_pci_probe(&ddi);
next prev parent reply other threads:[~2013-04-19 15:35 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-05 15:21 hitting lockdep warning as of too early VF probe with 3.9-rc1 Or Gerlitz
2013-03-06 2:43 ` Ming Lei
2013-03-06 20:54 ` Or Gerlitz
2013-03-07 2:03 ` Ming Lei
2013-03-10 15:28 ` Jack Morgenstein
2013-03-10 16:37 ` Greg Kroah-Hartman
2013-03-11 1:26 ` Ming Lei
2013-03-11 20:24 ` Ben Hutchings
2013-04-17 15:14 ` Or Gerlitz
2013-04-11 15:25 ` [PATCH for-3.9] pci: avoid work_on_cpu for nested SRIOV probes Michael S. Tsirkin
2013-04-18 20:08 ` [PATCHv2 " Michael S. Tsirkin
2013-04-18 21:40 ` Bjorn Helgaas
2013-04-18 21:57 ` Bjorn Helgaas
2013-04-19 14:36 ` Michael S. Tsirkin [this message]
[not found] ` <CAOS58YO+uV5KkS=sTP9Y3BPPh1nVnQ06yRyNU8GvEbym7R+X+Q@mail.gmail.com>
2013-04-19 16:39 ` Bjorn Helgaas
2013-04-20 19:05 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130419143610.GB1926@redhat.com \
--to=mst@redhat.com \
--cc=bhelgaas@google.com \
--cc=davem@davemloft.net \
--cc=gregkh@linuxfoundation.org \
--cc=jackm@dev.mellanox.co.il \
--cc=linux-pci@vger.kernel.org \
--cc=ming.lei@canonical.com \
--cc=netdev@vger.kernel.org \
--cc=ogerlitz@mellanox.com \
--cc=roland@kernel.org \
--cc=tj@kernel.org \
--cc=yanb@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.