linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <bhelgaas@google.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
	Joe Lawrence <Joe.Lawrence@stratus.com>,
	linux-pci@vger.kernel.org, Matthew Garrett <mjg59@srcf.ucam.org>,
	Myron Stowe <mstowe@redhat.com>,
	David Bulkow <david.bulkow@stratus.com>
Subject: Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
Date: Wed, 27 Feb 2013 13:14:49 -0700	[thread overview]
Message-ID: <20130227201449.GA6409@google.com> (raw)
In-Reply-To: <CAE9FiQV814mCKqcOKAE_y3JSUq2HotVjde6VYW4M8Zb2-S=NRA@mail.gmail.com>

On Sat, Feb 23, 2013 at 07:13:11PM -0800, Yinghai Lu wrote:
> On Sat, Feb 23, 2013 at 4:20 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > I think this is a general object lifetime issue that really has
> > nothing to do with ASPM except that ASPM happens to be the victim.
> >
> > You're doing this:
> >
> >     echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
> >>  /sys/bus/pci/devices/0000\:1a\:01.0/remove
> >
> > The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
> > interface remove_store() uses device_schedule_callback() to schedule
> > the remove for later.  I think what's happening is that we schedule
> > remove_callback() for both devices before 10:00.0 has been removed,
> > like this:
> >
> >     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
> >     remove_store  # for 10:00.0
> >       device_schedule_callback(10:00.0, remove_callback)
> >         sysfs_schedule_callback
> >           kobject_get
> >           queue_work
> >     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
> >     remove_store  # for 1a:01.0
> >       device_schedule_callback(1a:01.0, remove_callback)
> >         sysfs_schedule_callback
> >           kobject_get
> >           queue_work
> >
> > Note that we acquire a reference on each pci_dev before queuing the work item.
> >
> > Later, we run the callbacks, starting with 10:00.0.  This calls
> > remove_callback() to perform the remove:
> >
> >     remove_callback(10:00.0)
> >       mutex_lock(&pci_remove_rescan_mutex)
> >       pci_stop_and_remove_bus_device(pdev)
> >       mutex_unlock(&pci_remove_rescan_mutex)
> >
> > This will stop and remove the subtree below 10:00.0, but it does not
> > actually free the pci_dev for 1a:01.0 because we increased its ref
> > count in sysfs_schedule_callback.  So after completing
> > remove_callback(10:00.0), we run the second callback for 1a:01.0.
> >
> > The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
> > pci_stop_dev().  This is where we blow up because, according to your
> > debugging, pdev->bus->self is no longer valid.
> >
> > The PCI core did this removal wrong.  If we have a valid pci_dev
> > pointer, as we do in pcie_aspm_exit_link_state(), the whole object
> > ought to be valid.  But the PCI core deallocated the struct pci_bus
> > for bus 0000:1a too soon.
> >
> > My guess is that when we build a pci_dev, we need to increase the ref
> > count on the pci_bus where that pci_dev lives.  That way we can keep
> > around all the buses and bridges leading from the root to the device
> > in question.
> 
> Agreed, Please check if attached is what you want.

I included the patch directly below for convenience.

> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index b494066..6df5428 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1337,6 +1337,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>  	 */
>  	down_write(&pci_bus_sem);
>  	list_add_tail(&dev->bus_list, &bus->devices);
> +	get_device(&bus->dev);
>  	up_write(&pci_bus_sem);

This suggests that taking the reference must be protected by
pci_bus_sem, but I don't think that's the case, is it?

>  	pci_fixup_device(pci_fixup_final, dev);
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index cc875e6..a72b700 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -36,6 +36,7 @@ static void pci_destroy_dev(struct pci_dev *dev)
>  {
>  	down_write(&pci_bus_sem);
>  	list_del(&dev->bus_list);
> +	put_device(&dev->bus->dev);
>  	up_write(&pci_bus_sem);
>  
>  	pci_free_resources(dev);

The problem is that we capture the struct pci_bus pointer without
taking a reference on the bus object.  The capture occurs in
pci_scan_device() (and other callers of alloc_pci_dev()) where we do
this:

    dev = alloc_pci_dev();
    dev->bus = bus;

To make code analysis easier, I think we should take the reference at
the exact point where we capture the pointer, using something like:

    dev->bus = pci_bus_get(bus);

It'd probably be even better to deprecate alloc_pci_dev() and make a
pci_alloc_dev(struct pci_bus *bus) that takes care of acquiring the
reference itself.

But I don't think this takes care of the whole issue.  What protects
the pci_scan_slot() path against concurrent removal of the pci_bus?
I don't see anything in the hotplug drivers or in the
pci_scan_child_bus() path that acquires a reference or does locking
for this.

Bjorn

  reply	other threads:[~2013-02-27 20:14 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
2013-01-31 23:29   ` Myron Stowe
2013-02-01 19:55     ` Joe Lawrence
2013-02-01 22:31       ` Bjorn Helgaas
2013-02-06 10:09   ` Gu Zheng
2013-02-06 15:23     ` Joe Lawrence
2013-02-09  0:35     ` Bjorn Helgaas
     [not found]       ` <5122F276.80807@cn.fujitsu.com>
2013-02-24  0:20         ` Bjorn Helgaas
2013-02-24  3:13           ` Yinghai Lu
2013-02-27 20:14             ` Bjorn Helgaas [this message]
2013-02-25  5:59           ` Gu Zheng
2013-02-26 16:03             ` Myron Stowe
2013-02-27  6:42               ` Gu Zheng
2013-02-27  6:47                 ` Yinghai Lu
2013-02-28 10:47                   ` Gu Zheng
2013-02-28 11:50                     ` Yijing Wang
2013-02-28 15:11                     ` Yinghai Lu
2013-03-01  1:14                       ` Gu Zheng
2013-01-18 18:24 ` [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled Joe Lawrence
2013-01-18 22:54   ` Myron Stowe
2013-02-01 22:32     ` Bjorn Helgaas
     [not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
2013-01-18 19:53   ` PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 23:15     ` Myron Stowe
2013-01-18 23:41       ` Myron Stowe
2013-01-19  1:03         ` Joe Lawrence
2013-02-01 22:45           ` Bjorn Helgaas
2013-01-18 19:57 ` Myron Stowe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130227201449.GA6409@google.com \
    --to=bhelgaas@google.com \
    --cc=Joe.Lawrence@stratus.com \
    --cc=david.bulkow@stratus.com \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=mjg59@srcf.ucam.org \
    --cc=mstowe@redhat.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).