Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, Richard Weinberger <richard@nod.at>,
	aaron@sigma-star.at
Subject: Re: [bugzilla-daemon@kernel.org: [Bug 216511] New: Spurious PCI_EXP_SLTSTA_DLLSC when hot plugging]
Date: Wed, 21 Sep 2022 20:56:09 +0200	[thread overview]
Message-ID: <20220921185609.GA20860@wunner.de> (raw)
In-Reply-To: <20220921180326.GA1221419@bhelgaas>

On Wed, Sep 21, 2022 at 01:03:26PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 21, 2022 at 06:40:20AM -0500, Bjorn Helgaas wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216511
[...]
> Here's the call chain when handling that DLL state change:
> 
>   pciehp_ist
>     pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status)
>     status &= ... PCI_EXP_SLTSTA_DLLSC
>     events |= status
>     if (events & PCI_EXP_SLTSTA_DLLSC)
>       pciehp_handle_presence_or_link_change
>         pciehp_disable_slot
>           __pciehp_disable_slot
>             remove_board
>               pciehp_unconfigure_device
>                 pci_stop_and_remove_bus_device
> 
> Per spec, "software must read the Data Link Layer Link Active bit of
> the Link Status Register to determine if the Link is active before
> initiating configuration cycles to the hot plugged device" (PCIe r6.0,
> sec 7.5.3.11).
> 
> It looks like Linux depends on PCI_EXP_SLTSTA_DLLSC but does not
> actually read PCI_EXP_LNKSTA in this path, so this looks like a pciehp
> defect.

I disagree.  The spec citation pertains to *bringup* of the slot,
but this is the bringdown code path.

The logic in pciehp is such that if we receive DLLSC or PDC and the
slot is up, we always bring it down.  Only then do we check whether
the slot is occupied or link is up.  If that's the case, we attempt
to bring the slot up again.

pciehp assumes that the card may have changed when it receives DLLSC
or PDC.  That's the rationale behind this behavior.

In theory one might think that if DLLSC is received without a concurrent
PDC event, then the card in the slot is still the same and only the
link went down (probably flapped).  Unfortunately the reality is not
that simple:  For one, DLLSC and PDC events may come in arbitrary order
and with quite a delay between them.  Second, there are broken slots
which hardwire PDC to 0 and we support those.  So we can't reliably
determine if presence hasn't changed and only link has.

In this particular case, the PEX switch is clearly broken because it
shouldn't signal DLLSC both for a slot where the link change occurred
and its sibling.

A while ago Jon Derrick submitted a patch for a similar problem:
A bifurcated SSD where bringing down one half of the SSD results
in a spurious DLLSC event for the other half:

https://lore.kernel.org/linux-pci/20210830155628.130054-1-jonathan.derrick@linux.dev/

I'm not really happy with that patch because it adds a quirk
in the middle of the code path for interpreting slot events
which makes it difficult to reason about the code's correctness.

I'm starting to wonder if instead of Jon's patch, we should
just disable DLLSC events on broken devices such as this
PEX switch or Jon's SSD.  We'd only rely on PDC then but
that's probably sufficient.  And the code changes would be
less intrusive.

FWIW, Jon is still interested in upstreaming his quirk:

https://lore.kernel.org/linux-pci/446a21e2-aea2-773f-ca88-b6676b54b292@linux.dev/

@Richard:  I think Jon's patch doesn't solve your issue does it?
Because I think the issue he's seeing is slightly different
albeit likewise caused by unreliable DLLSC.  (His pertains to
bringdown, yours to bringup of the slot it seems.)

Thanks,

Lukas

      reply	other threads:[~2022-09-21 18:56 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-21 11:40 [bugzilla-daemon@kernel.org: [Bug 216511] New: Spurious PCI_EXP_SLTSTA_DLLSC when hot plugging] Bjorn Helgaas
2022-09-21 18:03 ` Bjorn Helgaas
2022-09-21 18:56   ` Lukas Wunner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220921185609.GA20860@wunner.de \
    --to=lukas@wunner.de \
    --cc=aaron@sigma-star.at \
    --cc=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=richard@nod.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox