From: Bjorn Helgaas <bhelgaas@google.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
Joe Lawrence <Joe.Lawrence@stratus.com>,
linux-pci@vger.kernel.org, Matthew Garrett <mjg59@srcf.ucam.org>,
Myron Stowe <mstowe@redhat.com>,
David Bulkow <david.bulkow@stratus.com>
Subject: Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
Date: Wed, 27 Feb 2013 13:14:49 -0700 [thread overview]
Message-ID: <20130227201449.GA6409@google.com> (raw)
In-Reply-To: <CAE9FiQV814mCKqcOKAE_y3JSUq2HotVjde6VYW4M8Zb2-S=NRA@mail.gmail.com>
On Sat, Feb 23, 2013 at 07:13:11PM -0800, Yinghai Lu wrote:
> On Sat, Feb 23, 2013 at 4:20 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > I think this is a general object lifetime issue that really has
> > nothing to do with ASPM except that ASPM happens to be the victim.
> >
> > You're doing this:
> >
> > echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
> >> /sys/bus/pci/devices/0000\:1a\:01.0/remove
> >
> > The 1a:01.0 device is downstream from the 10:00.0 bridge. The sysfs
> > interface remove_store() uses device_schedule_callback() to schedule
> > the remove for later. I think what's happening is that we schedule
> > remove_callback() for both devices before 10:00.0 has been removed,
> > like this:
> >
> > # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
> > remove_store # for 10:00.0
> > device_schedule_callback(10:00.0, remove_callback)
> > sysfs_schedule_callback
> > kobject_get
> > queue_work
> > # echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
> > remove_store # for 1a:01.0
> > device_schedule_callback(1a:01.0, remove_callback)
> > sysfs_schedule_callback
> > kobject_get
> > queue_work
> >
> > Note that we acquire a reference on each pci_dev before queuing the work item.
> >
> > Later, we run the callbacks, starting with 10:00.0. This calls
> > remove_callback() to perform the remove:
> >
> > remove_callback(10:00.0)
> > mutex_lock(&pci_remove_rescan_mutex)
> > pci_stop_and_remove_bus_device(pdev)
> > mutex_unlock(&pci_remove_rescan_mutex)
> >
> > This will stop and remove the subtree below 10:00.0, but it does not
> > actually free the pci_dev for 1a:01.0 because we increased its ref
> > count in sysfs_schedule_callback. So after completing
> > remove_callback(10:00.0), we run the second callback for 1a:01.0.
> >
> > The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
> > pci_stop_dev(). This is where we blow up because, according to your
> > debugging, pdev->bus->self is no longer valid.
> >
> > The PCI core did this removal wrong. If we have a valid pci_dev
> > pointer, as we do in pcie_aspm_exit_link_state(), the whole object
> > ought to be valid. But the PCI core deallocated the struct pci_bus
> > for bus 0000:1a too soon.
> >
> > My guess is that when we build a pci_dev, we need to increase the ref
> > count on the pci_bus where that pci_dev lives. That way we can keep
> > around all the buses and bridges leading from the root to the device
> > in question.
>
> Agreed, Please check if attached is what you want.
I included the patch directly below for convenience.
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index b494066..6df5428 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1337,6 +1337,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
> */
> down_write(&pci_bus_sem);
> list_add_tail(&dev->bus_list, &bus->devices);
> + get_device(&bus->dev);
> up_write(&pci_bus_sem);
This suggests that taking the reference must be protected by
pci_bus_sem, but I don't think that's the case, is it?
> pci_fixup_device(pci_fixup_final, dev);
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index cc875e6..a72b700 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -36,6 +36,7 @@ static void pci_destroy_dev(struct pci_dev *dev)
> {
> down_write(&pci_bus_sem);
> list_del(&dev->bus_list);
> + put_device(&dev->bus->dev);
> up_write(&pci_bus_sem);
>
> pci_free_resources(dev);
The problem is that we capture the struct pci_bus pointer without
taking a reference on the bus object. The capture occurs in
pci_scan_device() (and other callers of alloc_pci_dev()) where we do
this:
dev = alloc_pci_dev();
dev->bus = bus;
To make code analysis easier, I think we should take the reference at
the exact point where we capture the pointer, using something like:
dev->bus = pci_bus_get(bus);
It'd probably be even better to deprecate alloc_pci_dev() and make a
pci_alloc_dev(struct pci_bus *bus) that takes care of acquiring the
reference itself.
But I don't think this takes care of the whole issue. What protects
the pci_scan_slot() path against concurrent removal of the pci_bus?
I don't see anything in the hotplug drivers or in the
pci_scan_child_bus() path that acquires a reference or does locking
for this.
Bjorn
next prev parent reply other threads:[~2013-02-27 20:14 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
2013-01-31 23:29 ` Myron Stowe
2013-02-01 19:55 ` Joe Lawrence
2013-02-01 22:31 ` Bjorn Helgaas
2013-02-06 10:09 ` Gu Zheng
2013-02-06 15:23 ` Joe Lawrence
2013-02-09 0:35 ` Bjorn Helgaas
[not found] ` <5122F276.80807@cn.fujitsu.com>
2013-02-24 0:20 ` Bjorn Helgaas
2013-02-24 3:13 ` Yinghai Lu
2013-02-27 20:14 ` Bjorn Helgaas [this message]
2013-02-25 5:59 ` Gu Zheng
2013-02-26 16:03 ` Myron Stowe
2013-02-27 6:42 ` Gu Zheng
2013-02-27 6:47 ` Yinghai Lu
2013-02-28 10:47 ` Gu Zheng
2013-02-28 11:50 ` Yijing Wang
2013-02-28 15:11 ` Yinghai Lu
2013-03-01 1:14 ` Gu Zheng
2013-01-18 18:24 ` [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled Joe Lawrence
2013-01-18 22:54 ` Myron Stowe
2013-02-01 22:32 ` Bjorn Helgaas
[not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
2013-01-18 19:53 ` PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 23:15 ` Myron Stowe
2013-01-18 23:41 ` Myron Stowe
2013-01-19 1:03 ` Joe Lawrence
2013-02-01 22:45 ` Bjorn Helgaas
2013-01-18 19:57 ` Myron Stowe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130227201449.GA6409@google.com \
--to=bhelgaas@google.com \
--cc=Joe.Lawrence@stratus.com \
--cc=david.bulkow@stratus.com \
--cc=guz.fnst@cn.fujitsu.com \
--cc=linux-pci@vger.kernel.org \
--cc=mjg59@srcf.ucam.org \
--cc=mstowe@redhat.com \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).