Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Sinan Kaya <Okaya@kernel.org>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Oza Pawandeep <poza@codeaurora.org>
Subject: Re: [PATCH v6 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending
Date: Sun, 29 Jul 2018 19:39:07 +0200	[thread overview]
Message-ID: <20180729173907.GA27216@wunner.de> (raw)
In-Reply-To: <CAK9iUCNfu8pKd1bHiQWVSYg+dppTSPV-Qa7YWBtaNLU_cwL8AA@mail.gmail.com>

On Sun, Jul 29, 2018 at 09:44:50AM -0700, Sinan Kaya wrote:
> On 7/29/18, Lukas Wunner <lukas@wunner.de> wrote:
> > After re-reading all the e-mails we've exchanged since early July,
> > the approach Oza suggested in this e-mail seems the most sensible to me:
> > https://lkml.kernel.org/r/324f8cf2fe6f7bdc43ca8a646eea908d@codeaurora.org
> >
> > He suggested skipping removal of devices in pcie_do_fatal_recovery()
> > for hotplug ports.
> >
> > The rationale is that when a PCIe port raises a fatal error, that port
> > will normally not have the ability to remove its children from the
> > hierarchy unless it's a hotplug port.  Thus, if the port is a hotplug
> > port, pcie_do_fatal_recovery() should let pciehp handle removal of the
> > devices.  Only if it's not a hotplug port should pcie_do_fatal_recovery()
> > remove the devices.  My understanding is that after the error has been
> > cleared, the link should automatically come back up, is that correct?
> > pciehp will then bring up the slot on its own.
> >
> 
> I understand your slot power concerns. I don't have an answer at this moment.
> 
> Here is the issue with mixing pciehp link handler and fatal error
> recovery paths.
> 
> 1. fatal error happens
> 2. Pciehp ISR executes and turns off slot power
> 3. Fatal error recovery executes either secondary bus reset or dpc
>    status trigger via reset_link() function to recover the link.
> 4. Link doesn't recover because power is off.

With the new code on the pci/hotplug branch, the expected behavior is:

4. Having brought the slot down after the DLLSC event,
   pciehp_handle_presence_or_link_change() notices that the slot is
   occupied ("Slot(...): Card present") and immediately tries to
   bring the slot up again:
   Slot power is re-enabled, green LED is set to blinking.
   pciehp_check_link_status() then waits 1 second for the link to
   go up.  If the link does not come up within 1 second, an error
   message is logged but the procedure does not abort yet.  We then
   wait for 100 ms and poll for 1 second whether the vendor ID of
   slot 0 function 0 becomes readable.  Only if the link has not come
   up at this point is the procedure aborted.

So the question is:

a. Is DPC/AER able to recover from the error if slot power is disabled?
   Or do we need to synchronize with pciehp to keep slot power enabled
   so that the error can be handled?

b. How fast is DPC/AER able to recover from a fatal error?
   If it's more than 2.1 seconds, pciehp will not bring the slot up
   automatically and the user is required to bring it up manually via
   sysfs.  If we know that DPC/AER need a specific amount of time that
   is longer than 2.1 seconds, we can amend board_added() to poll until
   recovery from the fatal error (or use a waitqueue which is woken on
   recovery), with a sufficient timeout.

Does this work for you?

Thanks,

Lukas

  reply	other threads:[~2018-07-29 19:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-28  9:13 [PATCH v6 0/1] PCI: Mask and unmask hotplug interrupts during reset Sinan Kaya
2018-07-28  9:13 ` [PATCH v6 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending Sinan Kaya
2018-07-29 11:57   ` Lukas Wunner
2018-07-29 16:44     ` Sinan Kaya
2018-07-29 17:39       ` Lukas Wunner [this message]
2018-07-29 18:30         ` Sinan Kaya
2018-07-29 19:07           ` Lukas Wunner
2018-07-29 19:21             ` Sinan Kaya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180729173907.GA27216@wunner.de \
    --to=lukas@wunner.de \
    --cc=Okaya@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=keith.busch@intel.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=poza@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox