From: Lukas Wunner <lukas@wunner.de>
To: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Niklas Schnelle" <schnelle@linux.ibm.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
"Linas Vepstas" <linasvepstas@gmail.com>,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
"Manivannan Sadhasivam" <mani@kernel.org>,
"Gerald Schaefer" <gerald.schaefer@linux.ibm.com>,
"Heiko Carstens" <hca@linux.ibm.com>,
"Vasily Gorbik" <gor@linux.ibm.com>,
"Alexander Gordeev" <agordeev@linux.ibm.com>,
"Christian Borntraeger" <borntraeger@linux.ibm.com>,
"Sven Schnelle" <svens@linux.ibm.com>,
"Peter Oberparleiter" <oberpar@linux.ibm.com>,
"Matthew Rosato" <mjrosato@linux.ibm.com>,
"Oliver O'Halloran" <oohall@gmail.com>,
"Sinan Kaya" <okaya@kernel.org>,
linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
"Keith Busch" <kbusch@kernel.org>
Subject: Re: [PATCH v3 1/2] PCI/AER: Fix missing uevent on recovery when a reset is requested
Date: Fri, 1 Aug 2025 07:44:14 +0200
Message-ID: <aIxULlDfQw4yhFDv@wunner.de>
In-Reply-To: <4969c441-fe2a-470f-9efd-4661efca56ec@linux.intel.com>

On Thu, Jul 31, 2025 at 10:04:38AM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 7/31/25 6:01 AM, Lukas Wunner wrote:
> > +++ b/drivers/pci/pcie/err.c
> > @@ -165,6 +165,12 @@ static int report_resume(struct pci_dev *dev, void *data)
> >  	return 0;
> >  }
> > +static int report_disconnect(struct pci_dev *dev, void *data)
> > +{
> > +	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> > +	return 0;
> > +}
>
> Since you are notifying the user space, I am wondering whether the drivers
> should be notified about the recovery failure?

The drivers are usually *causing* the recovery failure by returning
PCI_ERS_RESULT_DISCONNECT from their pci_error_handlers callbacks
(or by lacking pci_error_handlers, in particular ->error_detected()).
So in principle the drivers should be aware of recovery failure.
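
To make the above concrete, here is a minimal userspace sketch, not
kernel code. All model_* names are hypothetical stand-ins for the
kernel's pci_ers_result_t and struct pci_error_handlers, and a missing
->error_detected() is folded into the DISCONNECT vote as a
simplification; the real code in drivers/pci/pcie/err.c distinguishes
more cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a subset of the pci_ers_result_t values. */
enum model_ers_result {
	MODEL_ERS_CAN_RECOVER,
	MODEL_ERS_DISCONNECT,
};

struct model_dev;

/* Hypothetical stand-in for struct pci_error_handlers. */
struct model_err_handlers {
	enum model_ers_result (*error_detected)(struct model_dev *dev);
};

struct model_dev {
	const char *name;
	const struct model_err_handlers *err_handler;	/* NULL: none */
};

/* A driver that gives up on its device as soon as an error is reported. */
static enum model_ers_result gpu_error_detected(struct model_dev *dev)
{
	(void)dev;
	return MODEL_ERS_DISCONNECT;
}

const struct model_err_handlers gpu_err_handlers = {
	.error_detected = gpu_error_detected,
};

/*
 * Collect one device's vote. Assumption of this sketch: a driver without
 * ->error_detected() counts as a DISCONNECT vote.
 */
enum model_ers_result model_vote(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->err_handler || !dev->err_handler->error_detected)
		return MODEL_ERS_DISCONNECT;
	return dev->err_handler->error_detected(dev);
}
```

In both cases the failing vote originates in the driver itself (or in
its lack of handlers), which is why that driver can be expected to know
about the failure.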

There are cases where multiple drivers are involved. E.g. on GPUs,
there's often a PCIe switch with a graphics device and various sound
or telemetry devices. Typically errors are reported by the Upstream
Port, so the Secondary Bus Reset occurs at the Root or Downstream Port
above the Upstream Port and affects the switch and all subordinate
devices. In cases like this, recovery failure may be caused by a
single driver (e.g. GPU) and the other drivers (e.g. telemetry) may
be unaware of it.
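
The multi-driver case can be modelled roughly as follows. Again this is
a userspace sketch under stated assumptions: the model_* types are
hypothetical, each device's vote is reduced to a single value, and one
DISCONNECT vote is taken to fail recovery for the whole hierarchy. It
is not the kernel's actual result-merging logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of recovery results. */
enum model_ers_result {
	MODEL_ERS_RECOVERED,
	MODEL_ERS_DISCONNECT,
};

struct model_dev {
	const char *name;
	enum model_ers_result vote;	/* what the driver's callbacks return */
	bool knows_recovery_failed;	/* did this driver learn the outcome? */
};

/*
 * Walk every device affected by the reset and combine the votes.
 * Assumption of this sketch: a single DISCONNECT vote fails recovery
 * for all devices below the port.
 */
enum model_ers_result model_recover(struct model_dev *devs, size_t n)
{
	enum model_ers_result status = MODEL_ERS_RECOVERED;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (devs[i].vote == MODEL_ERS_DISCONNECT) {
			/* Only the driver that voted DISCONNECT knows. */
			devs[i].knows_recovery_failed = true;
			status = MODEL_ERS_DISCONNECT;
		}
	}
	return status;
}
```

With a switch port, a GPU voting DISCONNECT and a telemetry device
voting RECOVERED, the overall result is DISCONNECT, yet the telemetry
entry's knows_recovery_failed stays false.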

The recovery flow documented in Documentation/PCI/pci-error-recovery.rst
was originally conceived for EEH and indeed EEH does notify all drivers
of recovery failures by invoking the ->error_detected() callback with
channel_state pci_channel_io_perm_failure. See this call ...

	eeh_pe_report("error_detected(permanent failure)", pe,
		      eeh_report_failure, NULL);

... in arch/powerpc/kernel/eeh_driver.c below the recover_failed label
in eeh_handle_normal_event().
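
For illustration, the EEH-style notification described above amounts to
a broadcast over all affected drivers. The following is a userspace
sketch only: the model_* names are hypothetical, the callback returns
void rather than pci_ers_result_t, and nothing here is taken from
eeh_driver.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pci_channel_state_t. */
enum model_channel_state {
	MODEL_CHANNEL_IO_NORMAL,
	MODEL_CHANNEL_IO_FROZEN,
	MODEL_CHANNEL_IO_PERM_FAILURE,
};

struct model_dev {
	const char *name;
	enum model_channel_state last_state;	/* last state the driver saw */
	void (*error_detected)(struct model_dev *dev,
			       enum model_channel_state state);
};

/* Simplified driver callback: just record the reported channel state. */
void model_error_detected(struct model_dev *dev,
			  enum model_channel_state state)
{
	dev->last_state = state;
}

/*
 * On recovery failure, tell every driver that has a callback, not just
 * the one that gave up. Returns the number of drivers notified.
 */
size_t model_report_failure(struct model_dev *devs, size_t n)
{
	size_t i, notified = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!devs[i].error_detected)
			continue;
		devs[i].error_detected(&devs[i],
				       MODEL_CHANNEL_IO_PERM_FAILURE);
		notified++;
	}
	return notified;
}
```

After such a broadcast, every driver with a callback has seen the
permanent-failure state, regardless of which driver caused the failure.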

I don't know why pcie_do_recovery() doesn't do the same on recovery
failure. This is one of several annoying deviations between AER and
EEH. Ideally the behavior should be the same across all platforms
so that drivers don't have to cope with platform-specific quirks.

However, I think that's orthogonal to the pci_uevent_ers() invocation
in pcie_do_recovery().

Thanks,

Lukas