From: Keith Busch <keith.busch@intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Linux PCI <linux-pci@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Sinan Kaya <okaya@kernel.org>, Thomas Tai <thomas.tai@oracle.com>,
"poza@codeaurora.org" <poza@codeaurora.org>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCHv2 16/20] PCI/pciehp: Implement error handling callbacks
Date: Mon, 10 Sep 2018 08:56:42 -0600
Message-ID: <20180910145641.GA7466@localhost.localdomain>
In-Reply-To: <20180910132033.ei5nk4iibt7pesd5@wunner.de>
On Mon, Sep 10, 2018 at 06:20:33AM -0700, Lukas Wunner wrote:
> On Wed, Sep 05, 2018 at 02:35:42PM -0600, Keith Busch wrote:
> > Error handling may trigger spurious link events, which may trigger
> > hotplug handling to re-enumerate the topology. This patch temporarily
> > disables notification during such error handling and checks the topology
> > for changes after reset.
> [...]
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c
> > @@ -301,6 +301,43 @@ static void pciehp_remove(struct pcie_device *dev)
> > pciehp_release_ctrl(ctrl);
> > }
> >
> > +static void pciehp_error_detected(struct pcie_device *dev)
> > +{
> > + struct controller *ctrl = get_service_data(dev);
> > +
> > + /*
> > + * Shut down notification to ignore hotplug events during error
> > + * handling. We'll recheck the status when error handling completes.
> > + *
> > + * It is possible that a link event related to this error handling
> > + * triggered the hotplug interrupt ahead of this notification, but we
> > + * can't do anything about that race.
> > + */
> > + pcie_shutdown_notification(ctrl);
>
> Unfortunately this patch does not take into account my comment on
> patch [13/16] in v1 of this series that pcie_shutdown_notification()
> can't be used here because the sysfs user interface to enable/disable
> the slot is still present but no longer functions once the IRQ is
> released:
> https://patchwork.ozlabs.org/patch/964715/
>
> My suggestion was:
>
> "I think what you want to do instead is "down_write(&ctrl->reset_lock)",
> see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset").
> And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does."
The sysfs entries still function. Their actions are only temporarily
stalled during error handling. Once the slot_reset callback runs,
ctrl->pending_events is checked and the requested actions are carried out.
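As a rough illustration of the stall-then-replay behaviour described in this reply, here is a user-space model in C. It is a sketch under stated assumptions, not pciehp code: `struct model_ctrl`, the `in_recovery` flag, the `MODEL_EVT_*` bits and every `model_*` function are hypothetical. Only the idea that requests are recorded in a pending-events bitmask and acted on once slot reset runs comes from the message above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event bits for this model only (not the real pciehp bits). */
#define MODEL_EVT_ENABLE  (1u << 0)
#define MODEL_EVT_DISABLE (1u << 1)

struct model_ctrl {
	atomic_uint pending_events; /* requests recorded but not yet handled */
	atomic_bool in_recovery;    /* set from error_detected until slot_reset */
	unsigned int handled;       /* events actually processed (for the demo) */
};

static void model_ctrl_init(struct model_ctrl *ctrl)
{
	atomic_init(&ctrl->pending_events, 0u);
	atomic_init(&ctrl->in_recovery, false);
	ctrl->handled = 0;
}

/* Handle everything queued so far, unless error recovery is in progress. */
static void model_process(struct model_ctrl *ctrl)
{
	if (atomic_load(&ctrl->in_recovery))
		return; /* stalled: leave the bits set for later */
	ctrl->handled |= atomic_exchange(&ctrl->pending_events, 0u);
}

/* sysfs-style request: always recorded, processed only if not stalled. */
static void model_request(struct model_ctrl *ctrl, unsigned int evt)
{
	assert(evt != 0);
	atomic_fetch_or(&ctrl->pending_events, evt);
	model_process(ctrl);
}

static void model_error_detected(struct model_ctrl *ctrl)
{
	atomic_store(&ctrl->in_recovery, true);
}

static void model_slot_reset(struct model_ctrl *ctrl)
{
	atomic_store(&ctrl->in_recovery, false);
	model_process(ctrl); /* replay requests made during recovery */
}
```

A request made between model_error_detected() and model_slot_reset() stays in pending_events and is only handled once model_slot_reset() clears the recovery flag. The model ignores locking and concurrency entirely; it only shows that such requests are deferred rather than lost.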
> > +static void pciehp_slot_reset(struct pcie_device *dev)
> > +{
> > + struct controller *ctrl = get_service_data(dev);
> > + u8 changed;
> > +
> > + /*
> > + * Cache presence detect change, but ignore other hotplug events that
> > + * may occur during error handling. Ports that implement only in-band
> > + * presence detection may inadvertently believe the device has changed,
> > + * so those devices will have to be re-enumerated.
> > + */
> > + pciehp_get_adapter_changed(ctrl->slot, &changed);
> > + pcie_clear_hotplug_events(ctrl);
> > + pcie_init_notification(ctrl);
> > +
> > + if (changed) {
> > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
> > + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>
> The __pci_walk_bus() seems superfluous because the devices are also
> marked disconnected when bringing down the slot as a result of the
> synthesized PCI_EXP_SLTSTA_PDC event.
Right, but we can't do that inline with the slot reset because of the
circular locking dependency it would create with pci_bus_sem. We still
want to fence off drivers for downstream devices from mucking with config
space in a topology that is reported to be different from the one we're
recovering from; otherwise the results are undefined.
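The split described here, fencing off downstream devices immediately while deferring the actual removal to the synthesized presence-detect-change event, can be modelled roughly as below. This is a hypothetical user-space sketch: `struct model_dev`, `model_walk_bus()`, `model_read_config()`, `model_fence_on_change()` and `MODEL_EVT_PDC` are much simpler stand-ins for `struct pci_dev`, `__pci_walk_bus()`, the config accessors and `PCI_EXP_SLTSTA_PDC`, and their signatures do not match the kernel's. Returning all-ones for a disconnected device is an assumption of the model that mirrors what a read from an absent device typically yields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_EVT_PDC (1u << 0) /* stand-in for a presence detect change */

/* Much-simplified, hypothetical stand-in for struct pci_dev. */
struct model_dev {
	const char *name;
	bool disconnected;
	uint32_t cfg_dword0;           /* pretend config dword at offset 0 */
	struct model_dev *children[4]; /* devices below this one, if a bridge */
	size_t nr_children;
};

/* Apply fn to every device below (not including) bridge, depth first. */
static void model_walk_bus(struct model_dev *bridge,
			   void (*fn)(struct model_dev *))
{
	assert(bridge != NULL);
	for (size_t i = 0; i < bridge->nr_children; i++) {
		fn(bridge->children[i]);
		model_walk_bus(bridge->children[i], fn);
	}
}

static void model_set_disconnected(struct model_dev *dev)
{
	dev->disconnected = true;
}

/* Config read that never reaches "hardware" for a fenced-off device. */
static bool model_read_config(const struct model_dev *dev, uint32_t *val)
{
	if (dev->disconnected) {
		*val = UINT32_MAX; /* all-ones, like a read from an absent device */
		return false;
	}
	*val = dev->cfg_dword0;
	return true;
}

/* After reset: if the slot changed, fence off drivers now, remove later. */
static void model_fence_on_change(struct model_dev *port, bool changed,
				  unsigned int *pending_events)
{
	if (!changed)
		return;
	*pending_events |= MODEL_EVT_PDC;             /* removal is deferred */
	model_walk_bus(port, model_set_disconnected); /* fencing is immediate */
}
```

In this model, fencing is only a flag write, so it can run inline in the reset path; the removal itself is left to whoever later processes the queued event, which is the split the reply argues for.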