linux-pci.vger.kernel.org archive mirror
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
To: Myron Stowe <myron.stowe@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Joe Lawrence <Joe.Lawrence@stratus.com>,
	linux-pci@vger.kernel.org, Matthew Garrett <mjg59@srcf.ucam.org>,
	Myron Stowe <mstowe@redhat.com>,
	David Bulkow <david.bulkow@stratus.com>,
	Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
Date: Wed, 27 Feb 2013 14:42:17 +0800
Message-ID: <512DAAC9.8030400@cn.fujitsu.com>
In-Reply-To: <CAL-B5D0xN6bs3oXugS9FOS5bQiFRJQ9766u5wsXw4UQdUmwfyw@mail.gmail.com>

On 02/27/2013 12:03 AM, Myron Stowe wrote:

> On Sun, Feb 24, 2013 at 10:59 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>> On 02/24/2013 08:20 AM, Bjorn Helgaas wrote:
>>
>>> [+cc Yinghai]
>>>
...snip...
>>> Please create a bugzilla for this issue.
>>>
>>
>> Here: https://bugzilla.kernel.org/show_bug.cgi?id=54411
>>
>>> I think this is a general object lifetime issue that really has
>>> nothing to do with ASPM except that ASPM happens to be the victim.
>>>
>>> You're doing this:
>>>
>>>     echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>>
>>> The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
>>> interface remove_store() uses device_schedule_callback() to schedule
>>> the remove for later.  I think what's happening is that we schedule
>>> remove_callback() for both devices before 10:00.0 has been removed,
>>> like this:
>>>
>>>     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>>>     remove_store  # for 10:00.0
>>>       device_schedule_callback(10:00.0, remove_callback)
>>>         sysfs_schedule_callback
>>>           kobject_get
>>>           queue_work
>>>     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>>     remove_store  # for 1a:01.0
>>>       device_schedule_callback(1a:01.0, remove_callback)
>>>         sysfs_schedule_callback
>>>           kobject_get
>>>           queue_work
>>>
>>> Note that we acquire a reference on each pci_dev before queuing the work item.
>>>
>>> Later, we run the callbacks, starting with 10:00.0.  This calls
>>> remove_callback() to perform the remove:
>>>
>>>     remove_callback(10:00.0)
>>>       mutex_lock(&pci_remove_rescan_mutex)
>>>       pci_stop_and_remove_bus_device(pdev)
>>>       mutex_unlock(&pci_remove_rescan_mutex)
>>>
>>> This will stop and remove the subtree below 10:00.0, but it does not
>>> actually free the pci_dev for 1a:01.0 because we increased its ref
>>> count in sysfs_schedule_callback.  So after completing
>>> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>>>
>>> The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
>>> pci_stop_dev().  This is where we blow up because, according to your
>>> debugging, pdev->bus->self is no longer valid.
>>>
>>> The PCI core did this removal wrong.  If we have a valid pci_dev
>>> pointer, as we do in pcie_aspm_exit_link_state(), the whole object
>>> ought to be valid.  But the PCI core deallocated the struct pci_bus
>>> for bus 0000:1a too soon.
>>
>> Your analysis is perfect, and it resolves my doubt. Thanks very much! :)
> 
> Gu:
> 
> I can't tell from your response whether you are stating that you agree
> with Bjorn's analysis -or- that you applied Yinghai's patch above
> (dated Feb 23) and found that the testing scenario was now successful.
> Could you please state specifically which? If it was just the
> former, would you be able to test Yinghai's implementation of Bjorn's
> analysis?

Hi Myron,
    Sorry for the confusion.
    I only meant that I agree with Bjorn's analysis. I have also tested
Yinghai's patch on kernel 3.8, but it does not seem to work. For more
information, please refer to the bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=54411

Thanks,
Gu

> 
> Thanks,
>  Myron
>>
>>>
>>> My guess is that when we build a pci_dev, we need to increase the ref
>>> count on the pci_bus where that pci_dev lives.  That way we can keep
>>> around all the buses and bridges leading from the root to the device
>>> in question.
>>>
>>> Bjorn
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
> 
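
For readers following the lifetime argument quoted above, here is a minimal
userspace sketch of the rule Bjorn suggests: a device takes a reference on
the bus it lives on, so the bus cannot go away while any device reference
(such as the one held by a pending sysfs callback) is outstanding. This is
only a model under stated assumptions. The names model_bus, model_dev,
bus_get, bus_put, dev_get, dev_put and bus_alive_when_callback_runs are
hypothetical and are not the kernel's struct pci_bus, struct pci_dev or
kobject API, and a "freed" flag stands in for real deallocation.

```c
#include <assert.h>

/* Hypothetical userspace model, not kernel code. */
struct model_bus {
    int refcount;
    int freed;              /* set when the last reference is dropped */
};

struct model_dev {
    int refcount;
    int freed;
    struct model_bus *bus;  /* the bus this device lives on */
};

void bus_init(struct model_bus *b) { b->refcount = 1; b->freed = 0; }

void bus_get(struct model_bus *b)
{
    assert(!b->freed);
    b->refcount++;
}

void bus_put(struct model_bus *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0)
        b->freed = 1;
}

/* Creating a device pins its bus: this is the proposed rule. */
void dev_init(struct model_dev *d, struct model_bus *b)
{
    d->refcount = 1;
    d->freed = 0;
    d->bus = b;
    bus_get(b);
}

void dev_get(struct model_dev *d)
{
    assert(!d->freed);
    d->refcount++;
}

/* The device releases its bus reference only when it is itself released. */
void dev_put(struct model_dev *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        d->freed = 1;
        bus_put(d->bus);
    }
}

/* Replays the two-remove sequence from the thread against the model. */
int bus_alive_when_callback_runs(void)
{
    struct model_bus bus;
    struct model_dev dev;
    int alive;

    bus_init(&bus);        /* core's own reference on the bus */
    dev_init(&dev, &bus);  /* device created on the bus, pins it */
    dev_get(&dev);         /* second remove queued: extra device reference */

    dev_put(&dev);         /* first remove tears down the subtree ... */
    bus_put(&bus);         /* ... and the core drops its bus reference */

    /* The queued callback now runs; dev.bus must still be usable. */
    alive = !dev.freed && !bus.freed;

    dev_put(&dev);         /* callback finishes and drops its reference */
    return alive && dev.freed && bus.freed;
}
```

In this model the bus outlives the first remove because the device still
holds a reference to it, and both objects are released only after the queued
callback drops the last device reference. Whether the same approach fits the
real PCI core is exactly what is under discussion in this thread.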




Thread overview: 28+ messages
2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
2013-01-31 23:29   ` Myron Stowe
2013-02-01 19:55     ` Joe Lawrence
2013-02-01 22:31       ` Bjorn Helgaas
2013-02-06 10:09   ` Gu Zheng
2013-02-06 15:23     ` Joe Lawrence
2013-02-09  0:35     ` Bjorn Helgaas
     [not found]       ` <5122F276.80807@cn.fujitsu.com>
2013-02-24  0:20         ` Bjorn Helgaas
2013-02-24  3:13           ` Yinghai Lu
2013-02-27 20:14             ` Bjorn Helgaas
2013-02-25  5:59           ` Gu Zheng
2013-02-26 16:03             ` Myron Stowe
2013-02-27  6:42               ` Gu Zheng [this message]
2013-02-27  6:47                 ` Yinghai Lu
2013-02-28 10:47                   ` Gu Zheng
2013-02-28 11:50                     ` Yijing Wang
2013-02-28 15:11                     ` Yinghai Lu
2013-03-01  1:14                       ` Gu Zheng
2013-01-18 18:24 ` [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled Joe Lawrence
2013-01-18 22:54   ` Myron Stowe
2013-02-01 22:32     ` Bjorn Helgaas
     [not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
2013-01-18 19:53   ` PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 23:15     ` Myron Stowe
2013-01-18 23:41       ` Myron Stowe
2013-01-19  1:03         ` Joe Lawrence
2013-02-01 22:45           ` Bjorn Helgaas
2013-01-18 19:57 ` Myron Stowe
