From: Keith Busch <keith.busch@intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Linux PCI <linux-pci@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Sinan Kaya <okaya@kernel.org>, Thomas Tai <thomas.tai@oracle.com>,
"poza@codeaurora.org" <poza@codeaurora.org>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCHv2 16/20] PCI/pciehp: Implement error handling callbacks
Date: Mon, 10 Sep 2018 08:56:42 -0600
Message-ID: <20180910145641.GA7466@localhost.localdomain>
In-Reply-To: <20180910132033.ei5nk4iibt7pesd5@wunner.de>

On Mon, Sep 10, 2018 at 06:20:33AM -0700, Lukas Wunner wrote:
> On Wed, Sep 05, 2018 at 02:35:42PM -0600, Keith Busch wrote:
> > Error handling may trigger spurious link events, which may trigger
> > hotplug handling to re-enumerate the topology. This patch temporarily
> > disables notification during such error handling and checks the topology
> > for changes after reset.
> [...]
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c
> > @@ -301,6 +301,43 @@ static void pciehp_remove(struct pcie_device *dev)
> > pciehp_release_ctrl(ctrl);
> > }
> >
> > +static void pciehp_error_detected(struct pcie_device *dev)
> > +{
> > + struct controller *ctrl = get_service_data(dev);
> > +
> > + /*
> > + * Shutdown notification to ignore hotplug events during error
> > + * handling. We'll recheck the status when error handling completes.
> > + *
> > + * It is possible link event related to this error handling may have
> > + * triggered a the hotplug interrupt ahead of this notification, but we
> > + * can't do anything about that race.
> > + */
> > + pcie_shutdown_notification(ctrl);
>
> Unfortunately this patch does not take into account my comment on
> patch [13/16] in v1 of this series that pcie_shutdown_notification()
> can't be used here because the sysfs user interface to enable/disable
> the slot is still present but no longer functions once the IRQ is
> released:
> https://patchwork.ozlabs.org/patch/964715/
>
> My suggestion was:
>
> "I think what you want to do instead is "down_write(&ctrl->reset_lock)",
> see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset").
> And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does."

The sysfs entries still function. Their actions are only temporarily
stalled during error handling. Once the slot reset callback is called,
ctrl->pending_events is queried and the requested actions are taken.

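To illustrate the deferral described above, here is a toy user-space
model. It is only a sketch, not the pciehp code: struct toy_ctrl, the
toy_* functions and the TOY_EV_* bits are all made up for this example,
and it assumes a request is just a bit OR'd into a pending mask that
gets drained once notification is back on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event bits for illustration; not the kernel's definitions. */
#define TOY_EV_PDC          (1u << 3)   /* presence detect changed */
#define TOY_EV_DISABLE_SLOT (1u << 16)  /* user asked to disable the slot */

/* Toy stand-in for the driver's controller state. */
struct toy_ctrl {
	atomic_uint pending_events;  /* requests not yet acted upon */
	bool notify_enabled;         /* false while error handling runs */
	unsigned int handled;        /* events acted upon so far */
};

/* Atomically collect and clear all pending events, then "handle" them. */
static unsigned int toy_handle_pending(struct toy_ctrl *c)
{
	unsigned int ev = atomic_exchange(&c->pending_events, 0);

	c->handled |= ev;
	return ev;
}

/* Record a request; it is acted upon now only if notification is on. */
static void toy_request(struct toy_ctrl *c, unsigned int ev)
{
	assert(ev != 0);
	atomic_fetch_or(&c->pending_events, ev);
	if (c->notify_enabled)
		toy_handle_pending(c);
}

/* Error detected: stop acting on events, but keep recording them. */
static void toy_error_detected(struct toy_ctrl *c)
{
	c->notify_enabled = false;
}

/* Slot reset: turn notification back on and drain what was stalled. */
static void toy_slot_reset(struct toy_ctrl *c)
{
	c->notify_enabled = true;
	toy_handle_pending(c);
}
```

In this model a toy_request() made between toy_error_detected() and
toy_slot_reset() is not lost; it stays in pending_events until the
reset drains it.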
> > +static void pciehp_slot_reset(struct pcie_device *dev)
> > +{
> > + struct controller *ctrl = get_service_data(dev);
> > + u8 changed;
> > +
> > + /*
> > + * Cache presence detect change, but ignore other hotplug events that
> > + * may occur during error handling. Ports that implement only in-band
> > + * presence detection may inadvertently believe the device has changed,
> > + * so those devices will have to be re-enumerated.
> > + */
> > + pciehp_get_adapter_changed(ctrl->slot, &changed);
> > + pcie_clear_hotplug_events(ctrl);
> > + pcie_init_notification(ctrl);
> > +
> > + if (changed) {
> > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
> > + __pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>
> The __pci_walk_bus() seems superfluous because the devices are also
> marked disconnected when bringing down the slot as a result of the
> synthesized PCI_EXP_SLTSTA_PDC event.

Right, but we can't do that inline with the slot reset because of the
circular locking that creates with the pci_bus_sem. We still want to
fence off drivers for downstream devices from mucking with config space
in a topology that is reported to be different than the one we're
recovering from. You can get undefined results that way.
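
The fencing idea can be sketched with a toy user-space model. This is
not the kernel implementation: struct toy_dev and the toy_* helpers are
hypothetical, and the sketch assumes that a device marked disconnected
simply fails config reads with all-ones data instead of reaching the
hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device node; child/sibling links form the bus hierarchy. */
struct toy_dev {
	bool disconnected;        /* set once the topology is suspect */
	uint32_t cfg0;            /* stand-in for one config register */
	struct toy_dev *child;    /* first device on the subordinate bus */
	struct toy_dev *sibling;  /* next device on the same bus */
};

/* Visit every device on this bus and, recursively, on buses below it. */
static void toy_walk(struct toy_dev *first, void (*cb)(struct toy_dev *))
{
	for (struct toy_dev *d = first; d; d = d->sibling) {
		cb(d);
		toy_walk(d->child, cb);
	}
}

/* Callback for toy_walk(): flag the device as no longer trustworthy. */
static void toy_set_disconnected(struct toy_dev *d)
{
	d->disconnected = true;
}

/* Config read that fails fast once the device is marked disconnected. */
static int toy_cfg_read(const struct toy_dev *d, uint32_t *val)
{
	assert(val != NULL);
	if (d->disconnected) {
		*val = UINT32_MAX;  /* all-ones, as for an absent device */
		return -1;
	}
	*val = d->cfg0;
	return 0;
}
```

Marking the whole subtree up front means a driver below the port gets
a clean failure right away rather than talking to whatever now sits in
the slot, while the actual teardown is left to the hotplug handler.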