All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Lukas Wunner <lukas@wunner.de>, Bjorn Helgaas <helgaas@kernel.org>
Cc: Riana Tauro <riana.tauro@intel.com>,
	Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>,
	"Sean C. Dardis" <sean.c.dardis@intel.com>,
	Terry Bowman <terry.bowman@amd.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Linas Vepstas <linasvepstas@gmail.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Oliver OHalloran <oohall@gmail.com>,
	Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH 3/5] PCI/ERR: Notify drivers on failure to recover
Date: Wed, 13 Aug 2025 16:05:00 -0700	[thread overview]
Message-ID: <fa9f42ab-bced-4c7f-9977-c0b611e92e2e@linux.intel.com> (raw)
In-Reply-To: <ec212d4d4f5c65d29349df33acdc9768ff8279d1.1755008151.git.lukas@wunner.de>


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> According to Documentation/PCI/pci-error-recovery.rst, the following shall
> occur on failure to recover from a PCIe Uncorrectable Error:
>
>    STEP 6: Permanent Failure
>    -------------------------
>    A "permanent failure" has occurred, and the platform cannot recover
>    the device.  The platform will call error_detected() with a
>    pci_channel_state_t value of pci_channel_io_perm_failure.
>
>    The device driver should, at this point, assume the worst. It should
>    cancel all pending I/O, refuse all new I/O, returning -EIO to
>    higher layers. The device driver should then clean up all of its
>    memory and remove itself from kernel operations, much as it would
>    during system shutdown.
>
> Sathya notes that AER does not call error_detected() on failure and thus
> deviates from the document (as well as EEH, for which the document was
> originally added).
>
> Most drivers do nothing on permanent failure, but the SCSI drivers and a
> number of Ethernet drivers do take advantage of the notification to flush
> queues and give up resources.
>
> Amend AER to notify such drivers and align with the documentation and EEH.
>
> Link: https://lore.kernel.org/r/f496fc0f-64d7-46a4-8562-dba74e31a956@linux.intel.com/
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>   drivers/pci/pcie/err.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 21d554359fb1..930bb60fb761 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -110,7 +110,19 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
>   
>   static int report_perm_failure_detected(struct pci_dev *dev, void *data)
>   {
> +	struct pci_driver *pdrv;
> +	const struct pci_error_handlers *err_handler;
> +
> +	device_lock(&dev->dev);
> +	pdrv = dev->driver;
> +	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->error_detected)
> +		goto out;
> +
> +	err_handler = pdrv->err_handler;
> +	err_handler->error_detected(dev, pci_channel_io_perm_failure);
> +out:
>   	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +	device_unlock(&dev->dev);
>   	return 0;
>   }
>   

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


  reply	other threads:[~2025-08-13 23:05 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-13  5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
2025-08-13  5:11 ` Lukas Wunner
2025-08-13  5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
2025-08-17 13:45     ` Lukas Wunner
2025-08-14  7:56   ` Niklas Schnelle
2025-08-14  9:36     ` Lukas Wunner
2025-08-14 19:29       ` Sathyanarayanan Kuppuswamy
2025-08-17 13:17         ` Lukas Wunner
2025-08-17 16:10           ` Sathyanarayanan Kuppuswamy
2025-08-14 20:31       ` Niklas Schnelle
2025-08-18 23:17     ` Linas Vepstas
2025-08-17 16:11   ` Sathyanarayanan Kuppuswamy
2025-08-13  5:11 ` [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover Lukas Wunner
2025-08-13 23:01   ` Sathyanarayanan Kuppuswamy
2025-08-14  7:08   ` Niklas Schnelle
2025-08-13  5:11 ` [PATCH 3/5] PCI/ERR: Notify drivers " Lukas Wunner
2025-08-13 23:05   ` Sathyanarayanan Kuppuswamy [this message]
2025-08-13  5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
2025-08-13 23:43   ` Sathyanarayanan Kuppuswamy
2025-08-13  5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
2025-08-14  0:40   ` Sathyanarayanan Kuppuswamy
2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas
2025-08-14  0:30 ` Linas Vepstas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa9f42ab-bced-4c7f-9977-c0b611e92e2e@linux.intel.com \
    --to=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=aravind.iddamsetty@linux.intel.com \
    --cc=helgaas@kernel.org \
    --cc=linasvepstas@gmail.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lukas@wunner.de \
    --cc=mahesh@linux.ibm.com \
    --cc=manivannan.sadhasivam@oss.qualcomm.com \
    --cc=oohall@gmail.com \
    --cc=riana.tauro@intel.com \
    --cc=schnelle@linux.ibm.com \
    --cc=sean.c.dardis@intel.com \
    --cc=terry.bowman@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.