From: Lukas Wunner <lukas@wunner.de>
To: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>,
Bjorn Helgaas <helgaas@kernel.org>,
Riana Tauro <riana.tauro@intel.com>,
Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>,
"Sean C. Dardis" <sean.c.dardis@intel.com>,
Terry Bowman <terry.bowman@amd.com>,
Linas Vepstas <linasvepstas@gmail.com>,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Oliver OHalloran <oohall@gmail.com>,
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>,
linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
Date: Sun, 17 Aug 2025 15:17:51 +0200 [thread overview]
Message-ID: <aKHWf3L0NCl_CET5@wunner.de> (raw)
In-Reply-To: <eec85830-e024-4801-bbc8-5cfa60048932@linux.intel.com>
On Thu, Aug 14, 2025 at 12:29:25PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 8/14/25 2:36 AM, Lukas Wunner wrote:
> > On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
> > > On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> > > > @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > > > pci_walk_bridge(bridge, report_mmio_enabled, &status);
> > > > }
> > > > + if (status == PCI_ERS_RESULT_NEED_RESET ||
> > > > + state == pci_channel_io_frozen) {
> > > > + if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> > > > + pci_warn(bridge, "subordinate device reset failed\n");
> > > > + goto failed;
> > > > + }
> > > > + }
> > > > +
> > > > if (status == PCI_ERS_RESULT_NEED_RESET) {
> > > > /*
> > > > * TODO: Should call platform-specific
> > >
> > > I wonder if it might make sense to merge the reset into the above
> > > existing if.
> >
> > There are drivers such as drivers/bus/mhi/host/pci_generic.c which
> > return PCI_ERS_RESULT_RECOVERED from ->error_detected(). So they
> > fall through directly to the ->resume() stage. They're doing this
> > even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
> >
> > But for DPC we must call reset_subordinates() to bring the link back up.
> > And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
> > we must likewise call it because the link may be unreliable.
>
> For fatal errors, since we already ignore the value returned by
> ->error_detected() (by calling reset_subordinates() unconditionally), why
> not update status accordingly in report_frozen_detected() and notify the
> driver about the reset?
>
> That way, the reset logic could be unified under a single if
> (status == PCI_ERS_RESULT_NEED_RESET) condition.
>
> Checking the drivers/bus/mhi/host/pci_generic.c implementation, it looks
> like calling slot_reset callback looks harmless.
Unfortunately it's not harmless:
mhi_pci_slot_reset() calls pci_enable_device(). But a corresponding
call to pci_disable_device() is only performed before in
mhi_pci_error_detected() if that function returns
PCI_ERS_RESULT_NEED_RESET.
So there would be an enable_cnt imbalance if I'd change the logic to
overwrite the driver's vote with PCI_ERS_RESULT_NEED_RESET in the
pci_channel_io_frozen case and call its ->slot_reset() callback.
The approach taken by this patch is to minimize risk, avoid any changes
to drivers, make do with minimal changes to pcie_do_recovery() and
limit the behavioral change.
I think overriding status = PCI_ERS_RESULT_NEED_RESET and calling drivers'
->slot_reset() would have to be done in a separate patch on top and would
require going through all drivers again to see which ones need to be
amended.
Also, note that report_frozen_detected() is too early to set
"status = PCI_ERS_RESULT_NEED_RESET". That needs to happen after the
->mmio_enabled() step, so that drivers get a chance to examine the
device even in the pci_channel_io_frozen case before a reset is
performed. (The ->mmio_enabled() step is only performed if "status" is
PCI_ERS_RESULT_CAN_RECOVER.)
So then the code would look like this:
if (state == pci_channel_io_frozen)
status = PCI_ERS_RESULT_NEED_RESET;
if (status == PCI_ERS_RESULT_NEED_RESET) {
if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
pci_warn(bridge, "subordinate device reset failed\n");
goto failed;
}
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(bridge, "broadcast slot_reset message\n");
pci_walk_bridge(bridge, report_slot_reset, &status);
}
... which isn't very different from the present patch:
if (status == PCI_ERS_RESULT_NEED_RESET ||
state == pci_channel_io_frozen) {
if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
pci_warn(bridge, "subordinate device reset failed\n");
goto failed;
}
}
if (status == PCI_ERS_RESULT_NEED_RESET) {
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(bridge, "broadcast slot_reset message\n");
pci_walk_bridge(bridge, report_slot_reset, &status);
}
... except that this patch avoids touching any drivers.
Thanks,
Lukas
next prev parent reply other threads:[~2025-08-17 13:17 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-13 5:11 [PATCH 0/5] PCI: Reduce AER / EEH deviations Lukas Wunner
2025-08-13 5:11 ` Lukas Wunner
2025-08-13 5:11 ` [PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors Lukas Wunner
2025-08-13 23:01 ` Sathyanarayanan Kuppuswamy
2025-08-17 13:45 ` Lukas Wunner
2025-08-14 7:56 ` Niklas Schnelle
2025-08-14 9:36 ` Lukas Wunner
2025-08-14 19:29 ` Sathyanarayanan Kuppuswamy
2025-08-17 13:17 ` Lukas Wunner [this message]
2025-08-17 16:10 ` Sathyanarayanan Kuppuswamy
2025-08-14 20:31 ` Niklas Schnelle
2025-08-18 23:17 ` Linas Vepstas
2025-08-17 16:11 ` Sathyanarayanan Kuppuswamy
2025-08-13 5:11 ` [PATCH 2/5] PCI/ERR: Fix uevent on failure to recover Lukas Wunner
2025-08-13 23:01 ` Sathyanarayanan Kuppuswamy
2025-08-14 7:08 ` Niklas Schnelle
2025-08-13 5:11 ` [PATCH 3/5] PCI/ERR: Notify drivers " Lukas Wunner
2025-08-13 23:05 ` Sathyanarayanan Kuppuswamy
2025-08-13 5:11 ` [PATCH 4/5] PCI/ERR: Update device error_state already after reset Lukas Wunner
2025-08-13 23:43 ` Sathyanarayanan Kuppuswamy
2025-08-13 5:11 ` [PATCH 5/5] PCI/ERR: Remove remnants of .link_reset() callback Lukas Wunner
2025-08-14 0:40 ` Sathyanarayanan Kuppuswamy
2025-08-13 18:21 ` [PATCH 0/5] PCI: Reduce AER / EEH deviations Bjorn Helgaas
2025-08-14 0:30 ` Linas Vepstas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aKHWf3L0NCl_CET5@wunner.de \
--to=lukas@wunner.de \
--cc=aravind.iddamsetty@linux.intel.com \
--cc=helgaas@kernel.org \
--cc=linasvepstas@gmail.com \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mahesh@linux.ibm.com \
--cc=manivannan.sadhasivam@oss.qualcomm.com \
--cc=oohall@gmail.com \
--cc=riana.tauro@intel.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=schnelle@linux.ibm.com \
--cc=sean.c.dardis@intel.com \
--cc=terry.bowman@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.