From: Keith Busch <keith.busch@intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Sinan Kaya <okaya@kernel.org>, Thomas Tai <thomas.tai@oracle.com>,
	"poza@codeaurora.org" <poza@codeaurora.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCHv2 16/20] PCI/pciehp: Implement error handling callbacks
Date: Mon, 10 Sep 2018 08:56:42 -0600
Message-ID: <20180910145641.GA7466@localhost.localdomain>
In-Reply-To: <20180910132033.ei5nk4iibt7pesd5@wunner.de>

On Mon, Sep 10, 2018 at 06:20:33AM -0700, Lukas Wunner wrote:
> On Wed, Sep 05, 2018 at 02:35:42PM -0600, Keith Busch wrote:
> > Error handling may trigger spurious link events, which may trigger
> > hotplug handling to re-enumerate the topology. This patch temporarily
> > disables notification during such error handling and checks the topology
> > for changes after reset.
> [...]
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c
> > @@ -301,6 +301,43 @@ static void pciehp_remove(struct pcie_device *dev)
> >  	pciehp_release_ctrl(ctrl);
> >  }
> >  
> > +static void pciehp_error_detected(struct pcie_device *dev)
> > +{
> > +	struct controller *ctrl = get_service_data(dev);
> > +
> > +	/*
> > +	 * Shutdown notification to ignore hotplug events during error
> > +	 * handling. We'll recheck the status when error handling completes.
> > +	 *
> > +	 * It is possible a link event related to this error handling may have
> > +	 * triggered the hotplug interrupt ahead of this notification, but we
> > +	 * can't do anything about that race.
> > +	 */
> > +	pcie_shutdown_notification(ctrl);
> 
> Unfortunately this patch does not take into account my comment on
> patch [13/16] in v1 of this series that pcie_shutdown_notification()
> can't be used here because the sysfs user interface to enable/disable
> the slot is still present but no longer functions once the IRQ is
> released:
> https://patchwork.ozlabs.org/patch/964715/
> 
> My suggestion was:
> 
>    "I think what you want to do instead is "down_write(&ctrl->reset_lock)",
>     see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset").
>     And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does."

The sysfs entries still function. Their actions are only temporarily
stalled during error handling. Once the slot reset is called, the
ctrl->pending_events is queried to take requested actions.
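
To make the "stalled, then replayed" behaviour concrete, here is a minimal
user-space sketch. It is a model under stated assumptions, not the pciehp
code: every model_* name and both EV_* bits are hypothetical, and the real
driver handles events in its IRQ thread rather than inline. The only idea
taken from the text above is that requests accumulate in pending_events
while notification is off and are acted on once slot reset runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event bits; the real driver uses PCI_EXP_SLTSTA_* values. */
#define EV_PRESENCE_CHANGE 0x1u
#define EV_LINK_CHANGE     0x2u

/* Hypothetical stand-in for the hotplug controller state. */
struct model_ctrl {
	atomic_uint pending_events; /* requested but not yet handled */
	bool notify_enabled;        /* false while error handling runs */
	unsigned int handled;       /* events the handler has acted on */
};

static void model_init(struct model_ctrl *c)
{
	atomic_init(&c->pending_events, 0);
	c->notify_enabled = true;
	c->handled = 0;
}

/* Drain everything that was requested and act on it. */
static void model_process(struct model_ctrl *c)
{
	unsigned int ev = atomic_exchange(&c->pending_events, 0);

	c->handled |= ev;
}

/* A request (e.g. from sysfs) is always recorded, never dropped. */
static void model_request(struct model_ctrl *c, unsigned int ev)
{
	atomic_fetch_or(&c->pending_events, ev);
	if (c->notify_enabled)
		model_process(c);
}

/* Error detected: stop reacting to events, but keep recording them. */
static void model_error_detected(struct model_ctrl *c)
{
	c->notify_enabled = false;
}

/* Slot reset: resume notification and replay what was deferred. */
static void model_slot_reset(struct model_ctrl *c)
{
	c->notify_enabled = true;
	model_process(c);
}
```

In this model a model_request() issued between model_error_detected() and
model_slot_reset() leaves handled untouched until the reset callback runs.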

> > +static void pciehp_slot_reset(struct pcie_device *dev)
> > +{
> > +	struct controller *ctrl = get_service_data(dev);
> > +	u8 changed;
> > +
> > +	/*
> > +	 * Cache presence detect change, but ignore other hotplug events that
> > +	 * may occur during error handling. Ports that implement only in-band
> > +	 * presence detection may inadvertently believe the device has changed,
> > +	 * so those devices will have to be re-enumerated.
> > +	 */
> > +	pciehp_get_adapter_changed(ctrl->slot, &changed);
> > +	pcie_clear_hotplug_events(ctrl);
> > +	pcie_init_notification(ctrl);
> > +
> > +	if (changed) {
> > +		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
> > +		__pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> 
> The __pci_walk_bus() seems superfluous because the devices are also
> marked disconnected when bringing down the slot as a result of the
> synthesized PCI_EXP_SLTSTA_PDC event.

Right, but we can't do that inline with the slot reset because of the
circular locks that creates with the pci_bus_sem. We still want to
fence off drivers for downstream devices from mucking with config space
in a topology that is reported to be different than the one we're
recovering from. You can get undefined results that way.
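
A rough user-space sketch of the fencing idea follows. It is an
illustration only: the model_* names and the struct layout are
hypothetical, and the failed-read convention (all ones plus an error code)
is an assumption of this model rather than a statement about the kernel's
config accessors. It shows a recursive walk that marks every device below
the port as disconnected, after which config reads fail fast instead of
touching hardware that may have changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device node; not struct pci_dev. */
struct model_dev {
	bool disconnected;
	uint32_t config0;          /* pretend config-space dword */
	struct model_dev *child;   /* first device on the subordinate bus */
	struct model_dev *sibling; /* next device on the same bus */
};

/* Walk a bus and everything below it, marking each device disconnected. */
static void model_walk_mark_disconnected(struct model_dev *first)
{
	for (struct model_dev *d = first; d; d = d->sibling) {
		d->disconnected = true;
		model_walk_mark_disconnected(d->child);
	}
}

/*
 * Config read that refuses to touch a fenced-off device. Returning all
 * ones with an error code is this model's convention for a failed read.
 */
static int model_config_read(const struct model_dev *d, uint32_t *val)
{
	if (d->disconnected) {
		*val = UINT32_MAX;
		return -1;
	}
	*val = d->config0;
	return 0;
}
```

Because the walk only sets a flag, it can run inline in the reset path,
while the actual teardown is left to the deferred presence-change handling.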
