From: wangyijing <wangyijing@huawei.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: <linux-pci@vger.kernel.org>
Subject: Re: [PATCH] PCI/ASPM: Don't remove pcie_link_state until we stop the last device
Date: Tue, 1 Sep 2015 08:59:30 +0800 [thread overview]
Message-ID: <55E4F872.3040504@huawei.com> (raw)
In-Reply-To: <20150831135601.GC647@google.com>
在 2015/8/31 21:56, Bjorn Helgaas 写道:
> On Mon, Aug 31, 2015 at 09:24:56AM +0800, wangyijing wrote:
>> 在 2015/8/29 20:20, Bjorn Helgaas 写道:
>>> On Thu, Jul 30, 2015 at 12:09:20PM +0800, Yijing Wang wrote:
>>>> Now we stop the pci_bus->devices in reverse order, but in
>>>> pcie_aspm_exit_link_state(), we only would do something when
>>>> the device is the last one.
>>>>
>>>> void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>>>> {
>>>> ...
>>>> if (!list_is_last(&pdev->bus_list, &parent->subordinate->devices))
>> ...
>
>>> What seems wrong to me is that when we're removing device X, we free the
>>> link_state for a *parent* of X. I think the code would be much simpler and
>>
>> What I am worried about here is if we hot remove a endpoint device here, and leave
>> the parent device, so we don't call pci_aspm_exit_link_state() for this link anymore ?
>>
>> pcie link
>> downstream port ---------- endpoint device
>
> If we remove the endpoint and leave the parent, we do call
> pcie_aspm_exit_link_state() for the endpoint. I'm proposing to change
> that path so that when the endpoint is removed, we do whatever is
> necessary for changing the ASPM configuration on the link, but leave
> the ASPM structure allocated, since the parent still exists.
>
> Today we allocate link_state only when a port has a downstream link
> *and* there's a device on the other end of the link. I'm proposing
> that we allocate link_state even if there's no downstream device.
>
> Then the alloc/free path is always tied directly to the parent. We
> can still change ASPM configuration as needed when downstream devices
> are added or removed. It's not a trivial change, but it seems
> possible.
>
Basically, I agree with you, but it's not a trival change, and now
we have no platform in hand, so I think it's better to leave it until we have
platform to reproduce it.
>>> easier to get right if we freed the link_state for X when we remove X.
>>>
>>> Can you look at fixing the problem that way?
>>
>> I'm sorry, I don't have the platform now, this issue was found in product department, and they moved
>> the platform away.
>
> OK. I'll drop this patch for now. I definitely agree this is a
> problem, but the fact that you don't have the platform any more
> doesn't mean we need to throw in a band-aid instead of designing a
> better solution.
OK.
>
> This problem should be pretty easy to reproduce on any platform,
> possibly with a little bit of scaffolding to tweak the topology.
>
>>>> goto out;
>>>> ...
>>>> }
>>>>
>>>> So if we have the following pcie tree, system may crash.
>>>>
>>>> [b7-bd]--+-02.0-[bb-bd]--+-00.0-[bc-bd]----01.0-[bd]----00.0 PLX Technology, Inc. Device 0002
>>>> +-00.1 PLX Technology, Inc. Device 0002
>>>> +-00.2 PLX Technology, Inc. Device 0002
>>>> +-00.3 PLX Technology, Inc. Device 0002
>> ...
>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>> CC: stable@vger.kernel.org #3.4+
>>>
>>> I need a clue about why you picked v3.4 here. Is it because ac205b7bb72f
>>> ("PCI: make sriov work with hotplug remove") appeared in v3.4?
>>
>> Actually, this issue was found at v3.4 stable kernel, which was introduced in
>> 3419c75e15f8 ("PCI: properly clean up ASPM link state on device remove") I think.
>
> Did you bisect to this or figure it out by analysis? It's good to
> mention the actual commit that broke it so people backporting can
> figure out if they need the fix.
By analysis, if I have the platform agagin, I would try to bisect it.
Thanks!
Yijing.
>
> Bjorn
>
>
prev parent reply other threads:[~2015-09-01 0:59 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-30 4:09 [PATCH] PCI/ASPM: Don't remove pcie_link_state until we stop the last device Yijing Wang
2015-08-29 12:20 ` Bjorn Helgaas
2015-08-31 1:24 ` wangyijing
2015-08-31 13:56 ` Bjorn Helgaas
2015-09-01 0:59 ` wangyijing [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55E4F872.3040504@huawei.com \
--to=wangyijing@huawei.com \
--cc=bhelgaas@google.com \
--cc=linux-pci@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.