All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <bhelgaas@google.com>
To: wangyijing <wangyijing@huawei.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI/ASPM: Don't remove pcie_link_state until we stop the last device
Date: Mon, 31 Aug 2015 08:56:01 -0500	[thread overview]
Message-ID: <20150831135601.GC647@google.com> (raw)
In-Reply-To: <55E3ACE8.20002@huawei.com>

On Mon, Aug 31, 2015 at 09:24:56AM +0800, wangyijing wrote:
> 在 2015/8/29 20:20, Bjorn Helgaas 写道:
> > On Thu, Jul 30, 2015 at 12:09:20PM +0800, Yijing Wang wrote:
> >> Now we stop the pci_bus->devices in reverse order, but in
> >> pcie_aspm_exit_link_state(), we only would do something when
> >> the device is the last one.
> >>
> >> void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> >> {
> >> 	...
> >> 	if (!list_is_last(&pdev->bus_list, &parent->subordinate->devices))
> ...

> > What seems wrong to me is that when we're removing device X, we free the
> > link_state for a *parent* of X.  I think the code would be much simpler and
> 
> What I am worried about here is if we hot remove a endpoint device here, and leave
> the parent device, so we don't call pci_aspm_exit_link_state() for this link anymore ?
> 
>                   pcie link
> downstream port ---------- endpoint device

If we remove the endpoint and leave the parent, we do call
pcie_aspm_exit_link_state() for the endpoint.  I'm proposing to change
that path so that when the endpoint is removed, we do whatever is
necessary for changing the ASPM configuration on the link, but leave
the ASPM structure allocated, since the parent still exists.

Today we allocate link_state only when a port has a downstream link
*and* there's a device on the other end of the link.  I'm proposing
that we allocate link_state even if there's no downstream device.

Then the alloc/free path is always tied directly to the parent.  We
can still change ASPM configuration as needed when downstream devices
are added or removed.  It's not a trivial change, but it seems
possible.

> > easier to get right if we freed the link_state for X when we remove X.
> > 
> > Can you look at fixing the problem that way?
> 
> I'm sorry, I don't have the platform now, this issue was found in product department, and they moved
> the platform away.

OK.  I'll drop this patch for now.  I definitely agree this is a
problem, but the fact that you don't have the platform any more
doesn't mean we need to throw in a band-aid instead of designing a
better solution.

This problem should be pretty easy to reproduce on any platform,
possibly with a little bit of scaffolding to tweak the topology.

> >> 		goto out;
> >> 	...
> >> }
> >>
> >> So if we have the following pcie tree, system may crash.
> >>
> >> [b7-bd]--+-02.0-[bb-bd]--+-00.0-[bc-bd]----01.0-[bd]----00.0  PLX Technology, Inc. Device 0002
> >>                          +-00.1  PLX Technology, Inc. Device 0002
> >>                          +-00.2  PLX Technology, Inc. Device 0002
> >>                          +-00.3  PLX Technology, Inc. Device 0002
> ...
> >> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> >> CC: stable@vger.kernel.org #3.4+
> > 
> > I need a clue about why you picked v3.4 here.  Is it because ac205b7bb72f
> > ("PCI: make sriov work with hotplug remove") appeared in v3.4?
> 
> Actually, this issue was found at v3.4 stable kernel, which was introduced in
> 3419c75e15f8 ("PCI: properly clean up ASPM link state on device remove") I think.

Did you bisect to this or figure it out by analysis?  It's good to
mention the actual commit that broke it so people backporting can
figure out if they need the fix.

Bjorn

  reply	other threads:[~2015-08-31 13:56 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-30  4:09 [PATCH] PCI/ASPM: Don't remove pcie_link_state until we stop the last device Yijing Wang
2015-08-29 12:20 ` Bjorn Helgaas
2015-08-31  1:24   ` wangyijing
2015-08-31 13:56     ` Bjorn Helgaas [this message]
2015-09-01  0:59       ` wangyijing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150831135601.GC647@google.com \
    --to=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=wangyijing@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.