From: Bjorn Helgaas <bhelgaas@google.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
Joe Lawrence <Joe.Lawrence@stratus.com>,
linux-pci@vger.kernel.org, Matthew Garrett <mjg59@srcf.ucam.org>,
Myron Stowe <mstowe@redhat.com>,
David Bulkow <david.bulkow@stratus.com>
Subject: Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
Date: Wed, 27 Feb 2013 13:14:49 -0700 [thread overview]
Message-ID: <20130227201449.GA6409@google.com> (raw)
In-Reply-To: <CAE9FiQV814mCKqcOKAE_y3JSUq2HotVjde6VYW4M8Zb2-S=NRA@mail.gmail.com>
On Sat, Feb 23, 2013 at 07:13:11PM -0800, Yinghai Lu wrote:
> On Sat, Feb 23, 2013 at 4:20 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > I think this is a general object lifetime issue that really has
> > nothing to do with ASPM except that ASPM happens to be the victim.
> >
> > You're doing this:
> >
> > echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
> >> /sys/bus/pci/devices/0000\:1a\:01.0/remove
> >
> > The 1a:01.0 device is downstream from the 10:00.0 bridge. The sysfs
> > interface remove_store() uses device_schedule_callback() to schedule
> > the remove for later. I think what's happening is that we schedule
> > remove_callback() for both devices before 10:00.0 has been removed,
> > like this:
> >
> > # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
> > remove_store # for 10:00.0
> > device_schedule_callback(10:00.0, remove_callback)
> > sysfs_schedule_callback
> > kobject_get
> > queue_work
> > # echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
> > remove_store # for 1a:01.0
> > device_schedule_callback(1a:01.0, remove_callback)
> > sysfs_schedule_callback
> > kobject_get
> > queue_work
> >
> > Note that we acquire a reference on each pci_dev before queuing the work item.
> >
> > Later, we run the callbacks, starting with 10:00.0. This calls
> > remove_callback() to perform the remove:
> >
> > remove_callback(10:00.0)
> > mutex_lock(&pci_remove_rescan_mutex)
> > pci_stop_and_remove_bus_device(pdev)
> > mutex_unlock(&pci_remove_rescan_mutex)
> >
> > This will stop and remove the subtree below 10:00.0, but it does not
> > actually free the pci_dev for 1a:01.0 because we increased its ref
> > count in sysfs_schedule_callback. So after completing
> > remove_callback(10:00.0), we run the second callback for 1a:01.0.
> >
> > The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
> > pci_stop_dev(). This is where we blow up because, according to your
> > debugging, pdev->bus->self is no longer valid.
> >
> > The PCI core did this removal wrong. If we have a valid pci_dev
> > pointer, as we do in pcie_aspm_exit_link_state(), the whole object
> > ought to be valid. But the PCI core deallocated the struct pci_bus
> > for bus 0000:1a too soon.
> >
> > My guess is that when we build a pci_dev, we need to increase the ref
> > count on the pci_bus where that pci_dev lives. That way we can keep
> > around all the buses and bridges leading from the root to the device
> > in question.
>
> Agreed, Please check if attached is what you want.
I included the patch directly below for convenience.
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index b494066..6df5428 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1337,6 +1337,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
> */
> down_write(&pci_bus_sem);
> list_add_tail(&dev->bus_list, &bus->devices);
> + get_device(&bus->dev);
> up_write(&pci_bus_sem);
This suggests that taking the reference must be protected by
pci_bus_sem, but I don't think that's the case, is it?
> pci_fixup_device(pci_fixup_final, dev);
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index cc875e6..a72b700 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -36,6 +36,7 @@ static void pci_destroy_dev(struct pci_dev *dev)
> {
> down_write(&pci_bus_sem);
> list_del(&dev->bus_list);
> + put_device(&dev->bus->dev);
> up_write(&pci_bus_sem);
>
> pci_free_resources(dev);
The problem is that we capture the struct pci_bus pointer without
taking a reference on the bus object. The capture occurs in
pci_scan_device() (and other callers of alloc_pci_dev()) where we do
this:
dev = alloc_pci_dev();
dev->bus = bus;
To make code analysis easier, I think we should take the reference at
the exact point where we capture the pointer, using something like:
dev->bus = pci_bus_get(bus);
It'd probably be even better to deprecate alloc_pci_dev() and make a
pci_alloc_dev(struct pci_bus *bus) that takes care of acquiring the
reference itself.
But I don't think this takes care of the whole issue. What protects
the pci_scan_slot() path against concurrent removal of the pci_bus?
I don't see anything in the hotplug drivers or in the
pci_scan_child_bus() path that acquires a reference or does locking
for this.
Bjorn
next prev parent reply other threads:[~2013-02-27 20:14 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
2013-01-31 23:29 ` Myron Stowe
2013-02-01 19:55 ` Joe Lawrence
2013-02-01 22:31 ` Bjorn Helgaas
2013-02-06 10:09 ` Gu Zheng
2013-02-06 15:23 ` Joe Lawrence
2013-02-09 0:35 ` Bjorn Helgaas
[not found] ` <5122F276.80807@cn.fujitsu.com>
2013-02-24 0:20 ` Bjorn Helgaas
2013-02-24 3:13 ` Yinghai Lu
2013-02-27 20:14 ` Bjorn Helgaas [this message]
2013-02-25 5:59 ` Gu Zheng
2013-02-26 16:03 ` Myron Stowe
2013-02-27 6:42 ` Gu Zheng
2013-02-27 6:47 ` Yinghai Lu
2013-02-28 10:47 ` Gu Zheng
2013-02-28 11:50 ` Yijing Wang
2013-02-28 15:11 ` Yinghai Lu
2013-03-01 1:14 ` Gu Zheng
2013-01-18 18:24 ` [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled Joe Lawrence
2013-01-18 22:54 ` Myron Stowe
2013-02-01 22:32 ` Bjorn Helgaas
[not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
2013-01-18 19:53 ` PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 23:15 ` Myron Stowe
2013-01-18 23:41 ` Myron Stowe
2013-01-19 1:03 ` Joe Lawrence
2013-02-01 22:45 ` Bjorn Helgaas
2013-01-18 19:57 ` Myron Stowe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130227201449.GA6409@google.com \
--to=bhelgaas@google.com \
--cc=Joe.Lawrence@stratus.com \
--cc=david.bulkow@stratus.com \
--cc=guz.fnst@cn.fujitsu.com \
--cc=linux-pci@vger.kernel.org \
--cc=mjg59@srcf.ucam.org \
--cc=mstowe@redhat.com \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.