Re: [PATCH 1/4] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset

linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Bjorn Helgaas <helgaas@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Sinan Kaya <okaya@kernel.org>, Ashok Raj <ashok.raj@intel.com>,
	Keith Busch <kbusch@kernel.org>,
	Yicong Yang <yangyicong@hisilicon.com>,
	linux-pci@vger.kernel.org, Russell Currey <ruscur@russell.cc>,
	Oliver OHalloran <oohall@gmail.com>,
	Stuart Hayes <stuart.w.hayes@gmail.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH 1/4] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
Date: Thu, 7 Oct 2021 18:00:20 -0500	[thread overview]
Message-ID: <20211007230020.GA1273969@bhelgaas> (raw)
In-Reply-To: <251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de>

On Sat, Jul 31, 2021 at 02:39:01PM +0200, Lukas Wunner wrote:
> Stuart Hayes reports that an error handled by DPC at a Root Port results
> in pciehp gratuitously bringing down a subordinate hotplug port:
> 
>   RP -- UP -- DP -- UP -- DP (hotplug) -- EP
> 
> pciehp brings the slot down because the Link to the Endpoint goes down.
> That is caused by a Hot Reset being propagated as a result of DPC.
> Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
> 
>   For a Switch, the following must cause a hot reset to be sent on all
>   Downstream Ports: [...]
> 
>   * The Data Link Layer of the Upstream Port reporting DL_Down status.
>     In Switches that support Link speeds greater than 5.0 GT/s, the
>     Upstream Port must direct the LTSSM of each Downstream Port to the
>     Hot Reset state, but not hold the LTSSMs in that state. This permits
>     each Downstream Port to begin Link training immediately after its
>     hot reset completes. This behavior is recommended for all Switches.
> 
>   * Receiving a hot reset on the Upstream Port.
> 
> Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
> invokes pcie_portdrv_slot_reset() to restore each port's config space.
> At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
> section 6.7.3.4 "Software Notification of Hot-Plug Events":
> 
>   If the Port is enabled for edge-triggered interrupt signaling using
>   MSI or MSI-X, an interrupt message must be sent every time the logical
>   AND of the following conditions transitions from FALSE to TRUE: [...]
> 
>   * The Hot-Plug Interrupt Enable bit in the Slot Control register is
>     set to 1b.
> 
>   * At least one hot-plug event status bit in the Slot Status register
>     and its associated enable bit in the Slot Control register are both
>     set to 1b.
> 
> Prevent pciehp from gratuitously bringing down the slot by clearing the
> error-induced Data Link Layer State Changed event before restoring
> config space.  Afterwards, check whether the link has unexpectedly
> failed to retrain and synthesize a DLLSC event if so.
> 
> Allow each pcie_port_service_driver (one of them being pciehp) to define
> a slot_reset callback and re-use the existing pm_iter() function to
> iterate over the callbacks.
> 
> Thereby, the Endpoint driver remains bound throughout error recovery and
> may restore the device to working state.
> 
> Surprise removal during error recovery is detected through a Presence
> Detect Changed event.  The hotplug port is expected to not signal that
> event as a result of a Hot Reset.
> 
> The issue isn't DPC-specific, it also occurs when an error is handled by
> AER through aer_root_reset().  So while the issue was noticed only now,
> it's been around since 2006 when AER support was first introduced.
> 
> Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
> Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
> Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel

Have you tried backporting this to v2.6.19?  I bet it's tough.  Not
sure we should suggest that stable pick this up unless there's a
reasonable path to do that.

> Cc: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/pci/Kconfig               |  3 +++
>  drivers/pci/hotplug/pciehp.h      |  2 ++
>  drivers/pci/hotplug/pciehp_core.c |  4 ++++
>  drivers/pci/hotplug/pciehp_hpc.c  | 28 ++++++++++++++++++++++++++++
>  drivers/pci/pcie/portdrv.h        |  2 ++
>  drivers/pci/pcie/portdrv_core.c   | 20 ++++++++++----------
>  drivers/pci/pcie/portdrv_pci.c    |  3 +++
>  7 files changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 0c473d75e625..a295d3c48927 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -182,6 +182,9 @@ config PCI_LABEL
>  	def_bool y if (DMI || ACPI)
>  	select NLS
>  
> +config PCI_ERROR_RECOVERY
> +	def_bool PCIEAER || EEH

I'm having a hard time connecting this to the code.  

pcie_portdrv_slot_reset() is the .slot_reset() method for
pcie_portdriver.  When CONFIG_PCIEPORTBUS=y, portdrv is bound to every
PCIe port and RCEC.

The callers of pci_driver->err_handler->slot_reset() that I see are:

  eeh_report_reset		# arch/powerpc/kernel/eeh_driver.c
  report_slot_reset		# drivers/pci/pcie/err.c
  pci_error_handlers		# drivers/misc/cxl/guest.c
  cxl_pci_slot_reset		# drivers/misc/cxl/pci.c
  pcifront_common_process	# drivers/pci/xen-pcifront.c

I guess the cxl and xen cases probably do not involve PCIe ports;
they're probably only concerned with endpoints, so maybe we can
exclude those.

But this still seems like it's kind of dangling.  It's not obvious to
me why pciehp_slot_reset() should be inside that #ifdef.  Do we need
the #ifdef for a functional reason, or is it there because we know it
will never be called?  If the latter, I don't think the savings are
worth it.

>  config PCI_HYPERV
>  	tristate "Hyper-V PCI Frontend"
>  	depends on X86_64 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && SYSFS
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index d4a930881054..7f24fe30a898 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
>  int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
>  int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
>  
> +int pciehp_slot_reset(struct pcie_device *dev);
> +
>  static inline const char *slot_name(struct controller *ctrl)
>  {
>  	return hotplug_slot_name(&ctrl->hotplug_slot);
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index ad3393930ecb..46a62237dcc8 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -351,6 +351,10 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
>  	.runtime_suspend = pciehp_runtime_suspend,
>  	.runtime_resume	= pciehp_runtime_resume,
>  #endif	/* PM */
> +
> +#ifdef	CONFIG_PCI_ERROR_RECOVERY
> +	.slot_reset	= pciehp_slot_reset,
> +#endif
>  };
>  
>  int __init pcie_hp_init(void)
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 9d06939736c0..72ef22d0c2c9 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -862,6 +862,34 @@ void pcie_disable_interrupt(struct controller *ctrl)
>  	pcie_write_cmd(ctrl, 0, mask);
>  }
>  
> +#ifdef CONFIG_PCI_ERROR_RECOVERY
> +/**
> + * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
> + * @dev: PCI Express port service device
> + *
> + * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
> + * further up in the hierarchy to recover from an error.  The reset was
> + * propagated down to this hotplug port.  Ignore the resulting link flap.
> + * If the link failed to retrain successfully, synthesize the ignored event.
> + * Surprise removal during reset is detected through Presence Detect Changed.
> + */
> +int pciehp_slot_reset(struct pcie_device *dev)
> +{
> +	struct controller *ctrl = get_service_data(dev);
> +
> +	if (ctrl->state != ON_STATE)
> +		return 0;
> +
> +	pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
> +				   PCI_EXP_SLTSTA_DLLSC);
> +
> +	if (!pciehp_check_link_active(ctrl))
> +		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
> +
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
>   * bus reset of the bridge, but at the same time we want to ensure that it is
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 2ff5724b8f13..92a776d211ec 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -85,6 +85,7 @@ struct pcie_port_service_driver {
>  	int (*runtime_suspend)(struct pcie_device *dev);
>  	int (*runtime_resume)(struct pcie_device *dev);
>  
> +	int (*slot_reset)(struct pcie_device *dev);
>  	/* Device driver may resume normal operations */
>  	void (*error_resume)(struct pci_dev *dev);
>  
> @@ -110,6 +111,7 @@ void pcie_port_service_unregister(struct pcie_port_service_driver *new);
>  
>  extern struct bus_type pcie_port_bus_type;
>  int pcie_port_device_register(struct pci_dev *dev);
> +int pcie_port_device_iter(struct device *dev, void *data);
>  #ifdef CONFIG_PM
>  int pcie_port_device_suspend(struct device *dev);
>  int pcie_port_device_resume_noirq(struct device *dev);
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index e1fed6649c41..ebcec7daa245 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -362,24 +362,24 @@ int pcie_port_device_register(struct pci_dev *dev)
>  	return status;
>  }
>  
> -#ifdef CONFIG_PM
> -typedef int (*pcie_pm_callback_t)(struct pcie_device *);
> +typedef int (*pcie_callback_t)(struct pcie_device *);
>  
> -static int pm_iter(struct device *dev, void *data)
> +int pcie_port_device_iter(struct device *dev, void *data)

I like this change, and it seems like it's basically a rename that
could be split off from rest to make the slot_reset part a little more
focused.

>  {
>  	struct pcie_port_service_driver *service_driver;
>  	size_t offset = *(size_t *)data;
> -	pcie_pm_callback_t cb;
> +	pcie_callback_t cb;
>  
>  	if ((dev->bus == &pcie_port_bus_type) && dev->driver) {
>  		service_driver = to_service_driver(dev->driver);
> -		cb = *(pcie_pm_callback_t *)((void *)service_driver + offset);
> +		cb = *(pcie_callback_t *)((void *)service_driver + offset);
>  		if (cb)
>  			return cb(to_pcie_device(dev));
>  	}
>  	return 0;
>  }
>  
> +#ifdef CONFIG_PM
>  /**
>   * pcie_port_device_suspend - suspend port services associated with a PCIe port
>   * @dev: PCI Express port to handle
> @@ -387,13 +387,13 @@ static int pm_iter(struct device *dev, void *data)
>  int pcie_port_device_suspend(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, suspend);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  int pcie_port_device_resume_noirq(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, resume_noirq);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  /**
> @@ -403,7 +403,7 @@ int pcie_port_device_resume_noirq(struct device *dev)
>  int pcie_port_device_resume(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, resume);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  /**
> @@ -413,7 +413,7 @@ int pcie_port_device_resume(struct device *dev)
>  int pcie_port_device_runtime_suspend(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, runtime_suspend);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  /**
> @@ -423,7 +423,7 @@ int pcie_port_device_runtime_suspend(struct device *dev)
>  int pcie_port_device_runtime_resume(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, runtime_resume);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  #endif /* PM */
>  
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index c7ff1eea225a..1af74c3d9d5d 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
>  
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> +	size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
> +	device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
> +
>  	pci_restore_state(dev);
>  	pci_save_state(dev);
>  	return PCI_ERS_RESULT_RECOVERED;
> -- 
> 2.31.1
>

next prev parent reply	other threads:[~2021-10-07 23:00 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-31 12:39 [PATCH 0/4] pciehp error recovery fix + cleanups Lukas Wunner
2021-07-31 12:39 ` [PATCH 1/4] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset Lukas Wunner
2021-09-29 20:40   ` stuart hayes
2021-10-07 23:00   ` Bjorn Helgaas [this message]
2021-10-10  9:14     ` Lukas Wunner
2021-10-11 17:25       ` Bjorn Helgaas
2021-07-31 12:39 ` [PATCH 2/4] PCI/portdrv: Remove unused resume err_handler Lukas Wunner
2021-07-31 12:39 ` [PATCH 3/4] PCI/portdrv: Remove unused pcie_port_bus_{,un}register() declarations Lukas Wunner
2021-07-31 12:39 ` [PATCH 4/4] PCI/ERR: Reduce compile time for CONFIG_PCIEAER=n Lukas Wunner
2021-10-15 19:29 ` [PATCH 0/4] pciehp error recovery fix + cleanups Bjorn Helgaas
2021-10-16  9:06   ` Lukas Wunner
2021-10-16 14:23     ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211007230020.GA1273969@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=ashok.raj@intel.com \
    --cc=kbusch@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=okaya@kernel.org \
    --cc=oohall@gmail.com \
    --cc=ruscur@russell.cc \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=stuart.w.hayes@gmail.com \
    --cc=yangyicong@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).