* [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers [not found] <20240617200411.1426554-1-terry.bowman@amd.com> @ 2024-06-17 20:04 ` Terry Bowman 2024-06-20 11:21 ` Jonathan Cameron 2024-06-21 19:17 ` Dan Williams 2024-06-17 20:04 ` [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register Terry Bowman ` (2 subsequent siblings) 3 siblings, 2 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-17 20:04 UTC (permalink / raw) To: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, terry.bowman, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci The AER service driver does not currently call a handler for AER uncorrectable errors (UCE) detected in root ports or downstream ports. This is not needed in most cases because common PCIe port functionality is handled by portdrv service drivers. CXL root ports include CXL specific RAS registers that need logging before starting do_recovery() in the UCE case. Update the AER service driver to call the UCE handler for root ports and downstream ports. These PCIe port devices are bound to the portdrv driver that includes a CE and UCE handler to be called. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 705893b5f7b0..a4db474b2be5 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); + /* + * PCIe ports may include functionality beyond the standard + * extended port capabilities. This may present a need to log and + * handle errors not addressed in this driver. Examples are CXL + * root ports and CXL downstream switch ports using AER UIE to + * indicate CXL UCE RAS protocol errors. + */ + if (type == PCI_EXP_TYPE_ROOT_PORT || + type == PCI_EXP_TYPE_DOWNSTREAM) { + struct pci_driver *pdrv = dev->driver; + + if (pdrv && pdrv->err_handler && + pdrv->err_handler->error_detected) { + const struct pci_error_handlers *err_handler; + + err_handler = pdrv->err_handler; + status = err_handler->error_detected(dev, state); + } + } + /* * If the error was detected by a Root Port, Downstream Port, RCEC, * or RCiEP, recovery runs on the device itself. For Ports, that -- 2.34.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-06-17 20:04 ` [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers Terry Bowman @ 2024-06-20 11:21 ` Jonathan Cameron 2024-06-24 14:58 ` Terry Bowman 2024-06-21 19:17 ` Dan Williams 1 sibling, 1 reply; 29+ messages in thread From: Jonathan Cameron @ 2024-06-20 11:21 UTC (permalink / raw) To: Terry Bowman Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Mon, 17 Jun 2024 15:04:03 -0500 Terry Bowman <terry.bowman@amd.com> wrote: > The AER service driver does not currently call a handler for AER > uncorrectable errors (UCE) detected in root ports or downstream > ports. This is not needed in most cases because common PCIe port > functionality is handled by portdrv service drivers. > > CXL root ports include CXL specific RAS registers that need logging > before starting do_recovery() in the UCE case. > > Update the AER service driver to call the UCE handler for root ports > and downstream ports. These PCIe port devices are bound to the portdrv > driver that includes a CE and UCE handler to be called. > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 705893b5f7b0..a4db474b2be5 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > > + /* > + * PCIe ports may include functionality beyond the standard > + * extended port capabilities. This may present a need to log and > + * handle errors not addressed in this driver. Examples are CXL > + * root ports and CXL downstream switch ports using AER UIE to > + * indicate CXL UCE RAS protocol errors. > + */ > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM) { > + struct pci_driver *pdrv = dev->driver; > + > + if (pdrv && pdrv->err_handler && > + pdrv->err_handler->error_detected) { > + const struct pci_error_handlers *err_handler; > + > + err_handler = pdrv->err_handler; > + status = err_handler->error_detected(dev, state); This status is going to get overridden by one of the pci_walk_bridge() calls. Should it be kept around and acted on, or dropped silently? (I'd guess no for silent!). > + } > + } > + > /* > * If the error was detected by a Root Port, Downstream Port, RCEC, > * or RCiEP, recovery runs on the device itself. For Ports, that ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-06-20 11:21 ` Jonathan Cameron @ 2024-06-24 14:58 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-24 14:58 UTC (permalink / raw) To: Jonathan Cameron Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci Hi Jonathan, I added a response below. On 6/20/24 06:21, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:03 -0500 > Terry Bowman <terry.bowman@amd.com> wrote: > >> The AER service driver does not currently call a handler for AER >> uncorrectable errors (UCE) detected in root ports or downstream >> ports. This is not needed in most cases because common PCIe port >> functionality is handled by portdrv service drivers. >> >> CXL root ports include CXL specific RAS registers that need logging >> before starting do_recovery() in the UCE case. >> >> Update the AER service driver to call the UCE handler for root ports >> and downstream ports. These PCIe port devices are bound to the portdrv >> driver that includes a CE and UCE handler to be called. >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 705893b5f7b0..a4db474b2be5 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >> >> + /* >> + * PCIe ports may include functionality beyond the standard >> + * extended port capabilities. This may present a need to log and >> + * handle errors not addressed in this driver. Examples are CXL >> + * root ports and CXL downstream switch ports using AER UIE to >> + * indicate CXL UCE RAS protocol errors. >> + */ >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM) { >> + struct pci_driver *pdrv = dev->driver; >> + >> + if (pdrv && pdrv->err_handler && >> + pdrv->err_handler->error_detected) { >> + const struct pci_error_handlers *err_handler; >> + >> + err_handler = pdrv->err_handler; >> + status = err_handler->error_detected(dev, state); > > This status is going to get overridden by one of the pci_walk_bridge() > calls. Should it be kept around and acted on, or dropped silently? > (I'd guess no for silent!). > It should be used later. According to PCI spec "The only method of recovering from an Uncorrectable Internal Error is reset or hardware replacement." I need to make certain that carries through below. Regards, Terry >> + } >> + } >> + >> /* >> * If the error was detected by a Root Port, Downstream Port, RCEC, >> * or RCiEP, recovery runs on the device itself. For Ports, that > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-06-17 20:04 ` [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers Terry Bowman 2024-06-20 11:21 ` Jonathan Cameron @ 2024-06-21 19:17 ` Dan Williams 2024-06-24 17:56 ` Terry Bowman 1 sibling, 1 reply; 29+ messages in thread From: Dan Williams @ 2024-06-21 19:17 UTC (permalink / raw) To: Terry Bowman, dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci Terry Bowman wrote: > The AER service driver does not currently call a handler for AER > uncorrectable errors (UCE) detected in root ports or downstream > ports. This is not needed in most cases because common PCIe port > functionality is handled by portdrv service drivers. > > CXL root ports include CXL specific RAS registers that need logging > before starting do_recovery() in the UCE case. > > Update the AER service driver to call the UCE handler for root ports > and downstream ports. These PCIe port devices are bound to the portdrv > driver that includes a CE and UCE handler to be called. > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 705893b5f7b0..a4db474b2be5 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > > + /* > + * PCIe ports may include functionality beyond the standard > + * extended port capabilities. This may present a need to log and > + * handle errors not addressed in this driver. Examples are CXL > + * root ports and CXL downstream switch ports using AER UIE to > + * indicate CXL UCE RAS protocol errors. > + */ > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM) { > + struct pci_driver *pdrv = dev->driver; > + > + if (pdrv && pdrv->err_handler && > + pdrv->err_handler->error_detected) { > + const struct pci_error_handlers *err_handler; > + > + err_handler = pdrv->err_handler; > + status = err_handler->error_detected(dev, state); > + } > + } > + Would not a more appropriate place for this be pci_walk_bridge() where the ->subordinate == NULL and these type-check cases are unified? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-06-21 19:17 ` Dan Williams @ 2024-06-24 17:56 ` Terry Bowman 2024-07-10 20:48 ` nifan.cxl 2024-08-19 18:35 ` Fan Ni 0 siblings, 2 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-24 17:56 UTC (permalink / raw) To: Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci Hi Dan, I added a response below. On 6/21/24 14:17, Dan Williams wrote: > Terry Bowman wrote: >> The AER service driver does not currently call a handler for AER >> uncorrectable errors (UCE) detected in root ports or downstream >> ports. This is not needed in most cases because common PCIe port >> functionality is handled by portdrv service drivers. >> >> CXL root ports include CXL specific RAS registers that need logging >> before starting do_recovery() in the UCE case. >> >> Update the AER service driver to call the UCE handler for root ports >> and downstream ports. These PCIe port devices are bound to the portdrv >> driver that includes a CE and UCE handler to be called. >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 705893b5f7b0..a4db474b2be5 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >> >> + /* >> + * PCIe ports may include functionality beyond the standard >> + * extended port capabilities. This may present a need to log and >> + * handle errors not addressed in this driver. Examples are CXL >> + * root ports and CXL downstream switch ports using AER UIE to >> + * indicate CXL UCE RAS protocol errors. >> + */ >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM) { >> + struct pci_driver *pdrv = dev->driver; >> + >> + if (pdrv && pdrv->err_handler && >> + pdrv->err_handler->error_detected) { >> + const struct pci_error_handlers *err_handler; >> + >> + err_handler = pdrv->err_handler; >> + status = err_handler->error_detected(dev, state); >> + } >> + } >> + > > Would not a more appropriate place for this be pci_walk_bridge() where > the ->subordinate == NULL and these type-check cases are unified? It does. I can take a look at moving that. Regards, Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-06-24 17:56 ` Terry Bowman @ 2024-07-10 20:48 ` nifan.cxl 2024-07-10 21:48 ` Terry Bowman 2024-08-19 18:35 ` Fan Ni 1 sibling, 1 reply; 29+ messages in thread From: nifan.cxl @ 2024-07-10 20:48 UTC (permalink / raw) To: Terry Bowman Cc: Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: > Hi Dan, > > I added a response below. > > On 6/21/24 14:17, Dan Williams wrote: > > Terry Bowman wrote: > >> The AER service driver does not currently call a handler for AER > >> uncorrectable errors (UCE) detected in root ports or downstream > >> ports. This is not needed in most cases because common PCIe port > >> functionality is handled by portdrv service drivers. > >> > >> CXL root ports include CXL specific RAS registers that need logging > >> before starting do_recovery() in the UCE case. > >> > >> Update the AER service driver to call the UCE handler for root ports > >> and downstream ports. These PCIe port devices are bound to the portdrv > >> driver that includes a CE and UCE handler to be called. > >> > >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >> Cc: Bjorn Helgaas <bhelgaas@google.com> > >> Cc: linux-pci@vger.kernel.org > >> --- > >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > >> 1 file changed, 20 insertions(+) > >> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >> index 705893b5f7b0..a4db474b2be5 100644 > >> --- a/drivers/pci/pcie/err.c > >> +++ b/drivers/pci/pcie/err.c > >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > >> > >> + /* > >> + * PCIe ports may include functionality beyond the standard > >> + * extended port capabilities. This may present a need to log and > >> + * handle errors not addressed in this driver. Examples are CXL > >> + * root ports and CXL downstream switch ports using AER UIE to > >> + * indicate CXL UCE RAS protocol errors. > >> + */ > >> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >> + type == PCI_EXP_TYPE_DOWNSTREAM) { > >> + struct pci_driver *pdrv = dev->driver; > >> + > >> + if (pdrv && pdrv->err_handler && > >> + pdrv->err_handler->error_detected) { > >> + const struct pci_error_handlers *err_handler; > >> + > >> + err_handler = pdrv->err_handler; > >> + status = err_handler->error_detected(dev, state); > >> + } > >> + } > >> + > > > > Would not a more appropriate place for this be pci_walk_bridge() where > > the ->subordinate == NULL and these type-check cases are unified? > > It does. I can take a look at moving that. Has that already been handled in pci_walk_bridge? The function pci_walk_bridge() will call report_error_detected, where the err handler will be called. https://elixir.bootlin.com/linux/v6.10-rc6/source/drivers/pci/pcie/err.c#L80 Fan > > Regards, > Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-07-10 20:48 ` nifan.cxl @ 2024-07-10 21:48 ` Terry Bowman 2024-07-11 1:14 ` fan 0 siblings, 1 reply; 29+ messages in thread From: Terry Bowman @ 2024-07-10 21:48 UTC (permalink / raw) To: nifan.cxl Cc: Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci Hi Fan, On 7/10/24 15:48, nifan.cxl@gmail.com wrote: > On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: >> Hi Dan, >> >> I added a response below. >> >> On 6/21/24 14:17, Dan Williams wrote: >>> Terry Bowman wrote: >>>> The AER service driver does not currently call a handler for AER >>>> uncorrectable errors (UCE) detected in root ports or downstream >>>> ports. This is not needed in most cases because common PCIe port >>>> functionality is handled by portdrv service drivers. >>>> >>>> CXL root ports include CXL specific RAS registers that need logging >>>> before starting do_recovery() in the UCE case. >>>> >>>> Update the AER service driver to call the UCE handler for root ports >>>> and downstream ports. These PCIe port devices are bound to the portdrv >>>> driver that includes a CE and UCE handler to be called. >>>> >>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >>>> Cc: Bjorn Helgaas <bhelgaas@google.com> >>>> Cc: linux-pci@vger.kernel.org >>>> --- >>>> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >>>> 1 file changed, 20 insertions(+) >>>> >>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>>> index 705893b5f7b0..a4db474b2be5 100644 >>>> --- a/drivers/pci/pcie/err.c >>>> +++ b/drivers/pci/pcie/err.c >>>> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >>>> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >>>> >>>> + /* >>>> + * PCIe ports may include functionality beyond the standard >>>> + * extended port capabilities. This may present a need to log and >>>> + * handle errors not addressed in this driver. Examples are CXL >>>> + * root ports and CXL downstream switch ports using AER UIE to >>>> + * indicate CXL UCE RAS protocol errors. >>>> + */ >>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>> + type == PCI_EXP_TYPE_DOWNSTREAM) { >>>> + struct pci_driver *pdrv = dev->driver; >>>> + >>>> + if (pdrv && pdrv->err_handler && >>>> + pdrv->err_handler->error_detected) { >>>> + const struct pci_error_handlers *err_handler; >>>> + >>>> + err_handler = pdrv->err_handler; >>>> + status = err_handler->error_detected(dev, state); >>>> + } >>>> + } >>>> + >>> >>> Would not a more appropriate place for this be pci_walk_bridge() where >>> the ->subordinate == NULL and these type-check cases are unified? >> >> It does. I can take a look at moving that. > > Has that already been handled in pci_walk_bridge? > > The function pci_walk_bridge() will call report_error_detected, where > the err handler will be called. > https://elixir.bootlin.com/linux/v6.10-rc6/source/drivers/pci/pcie/err.c#L80 > > Fan > You would think so but the UCE handler was not called in my testing for the PCIe ports (RP,USP,DSP). The pci_walk_bridge() function has 2 cases: - If there is a subordinate/secondary bus then the callback is called for those downstream devices but not the port itself. - If there is no subordinate/secondary bus then the callback is invoked for the port itself. The function header comment may explain it better: /** * pci_walk_bridge - walk bridges potentially AER affected * @bridge: bridge which may be a Port, an RCEC, or an RCiEP * @cb: callback to be called for each device found * @userdata: arbitrary pointer to be passed to callback * * If the device provided is a bridge, walk the subordinate bus, including * any bridged devices on buses under this bus. Call the provided callback * on each device found. * * If the device provided has no subordinate bus, e.g., an RCEC or RCiEP, * call the callback on the device itself. */ Regards, Terry >> >> Regards, >> Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-07-10 21:48 ` Terry Bowman @ 2024-07-11 1:14 ` fan 0 siblings, 0 replies; 29+ messages in thread From: fan @ 2024-07-11 1:14 UTC (permalink / raw) To: Terry Bowman Cc: nifan.cxl, Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Wed, Jul 10, 2024 at 04:48:09PM -0500, Terry Bowman wrote: > Hi Fan, > > On 7/10/24 15:48, nifan.cxl@gmail.com wrote: > > On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: > >> Hi Dan, > >> > >> I added a response below. > >> > >> On 6/21/24 14:17, Dan Williams wrote: > >>> Terry Bowman wrote: > >>>> The AER service driver does not currently call a handler for AER > >>>> uncorrectable errors (UCE) detected in root ports or downstream > >>>> ports. This is not needed in most cases because common PCIe port > >>>> functionality is handled by portdrv service drivers. > >>>> > >>>> CXL root ports include CXL specific RAS registers that need logging > >>>> before starting do_recovery() in the UCE case. > >>>> > >>>> Update the AER service driver to call the UCE handler for root ports > >>>> and downstream ports. These PCIe port devices are bound to the portdrv > >>>> driver that includes a CE and UCE handler to be called. > >>>> > >>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >>>> Cc: Bjorn Helgaas <bhelgaas@google.com> > >>>> Cc: linux-pci@vger.kernel.org > >>>> --- > >>>> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > >>>> 1 file changed, 20 insertions(+) > >>>> > >>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >>>> index 705893b5f7b0..a4db474b2be5 100644 > >>>> --- a/drivers/pci/pcie/err.c > >>>> +++ b/drivers/pci/pcie/err.c > >>>> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >>>> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > >>>> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > >>>> > >>>> + /* > >>>> + * PCIe ports may include functionality beyond the standard > >>>> + * extended port capabilities. This may present a need to log and > >>>> + * handle errors not addressed in this driver. Examples are CXL > >>>> + * root ports and CXL downstream switch ports using AER UIE to > >>>> + * indicate CXL UCE RAS protocol errors. > >>>> + */ > >>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >>>> + type == PCI_EXP_TYPE_DOWNSTREAM) { > >>>> + struct pci_driver *pdrv = dev->driver; > >>>> + > >>>> + if (pdrv && pdrv->err_handler && > >>>> + pdrv->err_handler->error_detected) { > >>>> + const struct pci_error_handlers *err_handler; > >>>> + > >>>> + err_handler = pdrv->err_handler; > >>>> + status = err_handler->error_detected(dev, state); > >>>> + } > >>>> + } > >>>> + > >>> > >>> Would not a more appropriate place for this be pci_walk_bridge() where > >>> the ->subordinate == NULL and these type-check cases are unified? > >> > >> It does. I can take a look at moving that. > > > > Has that already been handled in pci_walk_bridge? > > > > The function pci_walk_bridge() will call report_error_detected, where > > the err handler will be called. > > https://elixir.bootlin.com/linux/v6.10-rc6/source/drivers/pci/pcie/err.c#L80 > > > > Fan > > > > You would think so but the UCE handler was not called in my testing for the PCIe > ports (RP,USP,DSP). The pci_walk_bridge() function has 2 cases: > - If there is a subordinate/secondary bus then the callback is called for > those downstream devices but not the port itself. > - If there is no subordinate/secondary bus then the callback is invoked for the > port itself. > > The function header comment may explain it better: > /** > * pci_walk_bridge - walk bridges potentially AER affected > * @bridge: bridge which may be a Port, an RCEC, or an RCiEP > * @cb: callback to be called for each device found > * @userdata: arbitrary pointer to be passed to callback > * > * If the device provided is a bridge, walk the subordinate bus, including > * any bridged devices on buses under this bus. Call the provided callback > * on each device found. > * > * If the device provided has no subordinate bus, e.g., an RCEC or RCiEP, > * call the callback on the device itself. > */ > > Regards, > Terry OK, interesting. Btw, what is the "state" passed to pcie_do_recovery(...state...)? Fan > > >> > >> Regards, > >> Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers 2024-06-24 17:56 ` Terry Bowman 2024-07-10 20:48 ` nifan.cxl @ 2024-08-19 18:35 ` Fan Ni 1 sibling, 0 replies; 29+ messages in thread From: Fan Ni @ 2024-08-19 18:35 UTC (permalink / raw) To: Terry Bowman Cc: Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci, a.manzanares On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: > Hi Dan, > > I added a response below. > > On 6/21/24 14:17, Dan Williams wrote: > > Terry Bowman wrote: > >> The AER service driver does not currently call a handler for AER > >> uncorrectable errors (UCE) detected in root ports or downstream > >> ports. This is not needed in most cases because common PCIe port > >> functionality is handled by portdrv service drivers. > >> > >> CXL root ports include CXL specific RAS registers that need logging > >> before starting do_recovery() in the UCE case. > >> > >> Update the AER service driver to call the UCE handler for root ports > >> and downstream ports. These PCIe port devices are bound to the portdrv > >> driver that includes a CE and UCE handler to be called. > >> > >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >> Cc: Bjorn Helgaas <bhelgaas@google.com> > >> Cc: linux-pci@vger.kernel.org > >> --- > >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > >> 1 file changed, 20 insertions(+) > >> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >> index 705893b5f7b0..a4db474b2be5 100644 > >> --- a/drivers/pci/pcie/err.c > >> +++ b/drivers/pci/pcie/err.c > >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > >> > >> + /* > >> + * PCIe ports may include functionality beyond the standard > >> + * extended port capabilities. This may present a need to log and > >> + * handle errors not addressed in this driver. Examples are CXL > >> + * root ports and CXL downstream switch ports using AER UIE to > >> + * indicate CXL UCE RAS protocol errors. > >> + */ > >> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >> + type == PCI_EXP_TYPE_DOWNSTREAM) { > >> + struct pci_driver *pdrv = dev->driver; > >> + > >> + if (pdrv && pdrv->err_handler && > >> + pdrv->err_handler->error_detected) { > >> + const struct pci_error_handlers *err_handler; > >> + > >> + err_handler = pdrv->err_handler; > >> + status = err_handler->error_detected(dev, state); > >> + } > >> + } > >> + > > > > Would not a more appropriate place for this be pci_walk_bridge() where > > the ->subordinate == NULL and these type-check cases are unified? > > It does. I can take a look at moving that. > Based on current code logic, the code added here will be executed as long as the type matches (downstream port or root port), and I also noticed the case ->subordinate == NULL never gets touched when I try to inject an error through the aer_inject module and the user space tool. If my way to do error injection is right, it means the behaviour will get changed after the code move. Here is some of my experimental setup: QEMU + cxl topology (one type3 memdev directly attached to a HB with a single root port). 1. Load the cxl related drivers before error injection 2. Do aer inject with aer_inject inside the QEMU VM # aer_inject ~/nonfatal aer inject input file looks like below ----------------------------------------------------- fan:~/cxl/linux-fixes$ cat ~/nonfatal # Inject an uncorrectable/non-fatal training error into the device # with header log words 0 1 2 3. # # Either specify the PCI id on the command-line option or uncomment and edit # the PCI_ID line below using the correct PCI ID. # # Note that system firmware/BIOS may mask certain errors, change their severity # and/or not report header log words. # AER PCI_ID 0000:0c:00.0 UNCOR_STATUS COMP_ABORT HEADER_LOG 0 1 2 3 ----------------------------------------------------- The "lspci" output on the VM looks like below ---------------------------------------------------- Qemu: execute "lspci" on VM 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller 00:01.0 VGA compatible controller: Device 1234:1111 (rev 02) 00:02.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) 00:03.0 Unclassified device [0002]: Red Hat, Inc. Virtio filesystem 00:04.0 Unclassified device [0002]: Red Hat, Inc. Virtio filesystem 00:05.0 Host bridge: Red Hat, Inc. QEMU PCIe Expander bridge 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02) 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02) 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02) 0c:00.0 PCI bridge: Intel Corporation Device 7075 0d:00.0 CXL: Intel Corporation Device 0d93 (rev 01) -------------------------------------------------- Fan > Regards, > Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register [not found] <20240617200411.1426554-1-terry.bowman@amd.com> 2024-06-17 20:04 ` [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers Terry Bowman @ 2024-06-17 20:04 ` Terry Bowman 2024-06-20 11:31 ` Jonathan Cameron 2024-06-21 19:23 ` Dan Williams 2024-06-17 20:04 ` [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors Terry Bowman 2024-06-17 20:04 ` [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman 3 siblings, 2 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-17 20:04 UTC (permalink / raw) To: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, terry.bowman, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci The AER service driver clears the AER correctable error (CE) status before calling the correctable error handler. This results in the error's status not correctly reflected if read from the CE handler. The AER CE status is needed by the portdrv's CE handler. The portdrv's CE handler is intended to only call the registered notifier callbacks if the CE error status has correctable internal error (CIE) set. This is not a problem for AER uncorrrectbale errors (UCE). The UCE status is still present in the AER capability and available for reading, if needed, when the UCE handler is called. Change the order of clearing the CE status and calling the CE handler. Make it to call the CE handler first and then clear the CE status after returning. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pcie/aer.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index ac6293c24976..4dc03cb9aff0 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1094,9 +1094,6 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) * Correctable error does not need software intervention. * No need to go through error recovery process. */ - if (aer) - pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, - info->status); if (pcie_aer_is_native(dev)) { struct pci_driver *pdrv = dev->driver; @@ -1105,6 +1102,10 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) pdrv->err_handler->cor_error_detected(dev); pcie_clear_device_status(dev); } + if (aer) + pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, + info->status); + } else if (info->severity == AER_NONFATAL) pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); else if (info->severity == AER_FATAL) -- 2.34.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register 2024-06-17 20:04 ` [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register Terry Bowman @ 2024-06-20 11:31 ` Jonathan Cameron 2024-06-24 15:08 ` Terry Bowman 2024-06-21 19:23 ` Dan Williams 1 sibling, 1 reply; 29+ messages in thread From: Jonathan Cameron @ 2024-06-20 11:31 UTC (permalink / raw) To: Terry Bowman Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Mon, 17 Jun 2024 15:04:04 -0500 Terry Bowman <terry.bowman@amd.com> wrote: > The AER service driver clears the AER correctable error (CE) status before > calling the correctable error handler. This results in the error's status > not correctly reflected if read from the CE handler. > > The AER CE status is needed by the portdrv's CE handler. The portdrv's > CE handler is intended to only call the registered notifier callbacks > if the CE error status has correctable internal error (CIE) set. > > This is not a problem for AER uncorrrectbale errors (UCE). The UCE status uncorrectable > is still present in the AER capability and available for reading, if > needed, when the UCE handler is called. I'm seeing the clear in the DPC path for UCE. For other cases is it a side effect of the reset? > > Change the order of clearing the CE status and calling the CE handler. > Make it to call the CE handler first and then clear the CE status > after returning. > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org Seems reasonable, but many gremlins around the ordering in these flows, so I'm to particularly confident. With that in mind. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com> > --- > drivers/pci/pcie/aer.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index ac6293c24976..4dc03cb9aff0 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1094,9 +1094,6 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) > * Correctable error does not need software intervention. > * No need to go through error recovery process. > */ > - if (aer) > - pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, > - info->status); > if (pcie_aer_is_native(dev)) { > struct pci_driver *pdrv = dev->driver; > > @@ -1105,6 +1102,10 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) > pdrv->err_handler->cor_error_detected(dev); > pcie_clear_device_status(dev); > } > + if (aer) > + pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, > + info->status); > + > } else if (info->severity == AER_NONFATAL) > pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); > else if (info->severity == AER_FATAL) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register 2024-06-20 11:31 ` Jonathan Cameron @ 2024-06-24 15:08 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-24 15:08 UTC (permalink / raw) To: Jonathan Cameron Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On 6/20/24 06:31, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:04 -0500 > Terry Bowman <terry.bowman@amd.com> wrote: > >> The AER service driver clears the AER correctable error (CE) status before >> calling the correctable error handler. This results in the error's status >> not correctly reflected if read from the CE handler. >> >> The AER CE status is needed by the portdrv's CE handler. The portdrv's >> CE handler is intended to only call the registered notifier callbacks >> if the CE error status has correctable internal error (CIE) set. >> >> This is not a problem for AER uncorrrectbale errors (UCE). The UCE status > > uncorrectable > Thank you. >> is still present in the AER capability and available for reading, if >> needed, when the UCE handler is called. > > I'm seeing the clear in the DPC path for UCE. For other cases is > it a side effect of the reset? > Depends on when its being read. I'm assuming this is after recovery in your case. And after recovery it will be zeroed. Regards, Terry >> >> Change the order of clearing the CE status and calling the CE handler. >> Make it to call the CE handler first and then clear the CE status >> after returning. >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org > Seems reasonable, but many gremlins around the ordering in these > flows, so I'm to particularly confident. With that in mind. > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com> > >> --- >> drivers/pci/pcie/aer.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index ac6293c24976..4dc03cb9aff0 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -1094,9 +1094,6 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) >> * Correctable error does not need software intervention. >> * No need to go through error recovery process. >> */ >> - if (aer) >> - pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, >> - info->status); >> if (pcie_aer_is_native(dev)) { >> struct pci_driver *pdrv = dev->driver; >> >> @@ -1105,6 +1102,10 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) >> pdrv->err_handler->cor_error_detected(dev); >> pcie_clear_device_status(dev); >> } >> + if (aer) >> + pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, >> + info->status); >> + >> } else if (info->severity == AER_NONFATAL) >> pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); >> else if (info->severity == AER_FATAL) > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register 2024-06-17 20:04 ` [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register Terry Bowman 2024-06-20 11:31 ` Jonathan Cameron @ 2024-06-21 19:23 ` Dan Williams 2024-06-24 18:00 ` Terry Bowman 1 sibling, 1 reply; 29+ messages in thread From: Dan Williams @ 2024-06-21 19:23 UTC (permalink / raw) To: Terry Bowman, dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci Terry Bowman wrote: > The AER service driver clears the AER correctable error (CE) status before > calling the correctable error handler. This results in the error's status > not correctly reflected if read from the CE handler. > > The AER CE status is needed by the portdrv's CE handler. The portdrv's > CE handler is intended to only call the registered notifier callbacks > if the CE error status has correctable internal error (CIE) set. Is this a fix or a prep patch? It reads like a "fix", but there are no notifiers to worry about today. > This is not a problem for AER uncorrrectbale errors (UCE). The UCE status > is still present in the AER capability and available for reading, if > needed, when the UCE handler is called. > > Change the order of clearing the CE status and calling the CE handler. > Make it to call the CE handler first and then clear the CE status > after returning. With the changelog clarified to indicate whether this has any impact on current behavior you can add: Acked-by: Dan Williams <dan.j.williams@intel.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register 2024-06-21 19:23 ` Dan Williams @ 2024-06-24 18:00 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-24 18:00 UTC (permalink / raw) To: Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci Hi Dan, I added a response below. On 6/21/24 14:23, Dan Williams wrote: > Terry Bowman wrote: >> The AER service driver clears the AER correctable error (CE) status before >> calling the correctable error handler. This results in the error's status >> not correctly reflected if read from the CE handler. >> >> The AER CE status is needed by the portdrv's CE handler. The portdrv's >> CE handler is intended to only call the registered notifier callbacks >> if the CE error status has correctable internal error (CIE) set. > > Is this a fix or a prep patch? It reads like a "fix", but there are no > notifiers to worry about today. > I will add mention "in preparation for future patch". >> This is not a problem for AER uncorrrectbale errors (UCE). The UCE status >> is still present in the AER capability and available for reading, if >> needed, when the UCE handler is called. >> >> Change the order of clearing the CE status and calling the CE handler. >> Make it to call the CE handler first and then clear the CE status >> after returning. > > With the changelog clarified to indicate whether this has any impact on > current behavior you can add: > > Acked-by: Dan Williams <dan.j.williams@intel.com> Regards, Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors [not found] <20240617200411.1426554-1-terry.bowman@amd.com> 2024-06-17 20:04 ` [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers Terry Bowman 2024-06-17 20:04 ` [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register Terry Bowman @ 2024-06-17 20:04 ` Terry Bowman 2024-06-20 12:30 ` Jonathan Cameron ` (2 more replies) 2024-06-17 20:04 ` [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman 3 siblings, 3 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-17 20:04 UTC (permalink / raw) To: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, terry.bowman, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv does not implement an AER correctable handler (CE) but does implement the AER uncorrectable error (UCE). The UCE handler is fairly straightforward in that it only checks for frozen error state and returns the next step for recovery accordingly. As a result, port devices relying on AER correctable internal errors (CIE) and AER uncorrectable internal errors (UIE) will not be handled. Note, the PCIe spec indicates AER CIE/UIE can be used to report implementation specific errors.[1] CXL root ports, CXL downstream switch ports, and CXL upstream switch ports are examples of devices using the AER CIE/UIE for implementation specific purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to report CXL RAS errors.[2] Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic notifier to report CIE/UIE errors to the registered functions. This will require adding a CE handler and updating the existing UCE handler. For the UCE handler, the CXL spec states UIE errors should return need reset: "The only method of recovering from an Uncorrectable Internal Error is reset or hardware replacement."[1] [1] PCI6.0 - 6.2.10 Internal Errors [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and Upstream Switch Ports Signed-off-by: Terry Bowman <terry.bowman@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ drivers/pci/pcie/portdrv.h | 2 ++ 2 files changed, 34 insertions(+) diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c index 14a4b89a3b83..86d80e0e9606 100644 --- a/drivers/pci/pcie/portdrv.c +++ b/drivers/pci/pcie/portdrv.c @@ -37,6 +37,9 @@ struct portdrv_service_data { u32 service; }; +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); + /** * release_pcie_device - free PCI Express port service device structure * @dev: Port service device to release @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, pci_channel_state_t error) { + if (dev->aer_cap) { + u32 status; + + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, + &status); + + if (status & PCI_ERR_UNC_INTN) { + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, + AER_FATAL, (void *)dev); + return PCI_ERS_RESULT_NEED_RESET; + } + } + if (error == pci_channel_io_frozen) return PCI_ERS_RESULT_NEED_RESET; return PCI_ERS_RESULT_CAN_RECOVER; } +static void pcie_portdrv_cor_error_detected(struct pci_dev *dev) +{ + u32 status; + + if (!dev->aer_cap) + return; + + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_COR_STATUS, + &status); + + if (status & PCI_ERR_COR_INTERNAL) + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, + AER_CORRECTABLE, (void *)dev); +} + static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev) { size_t off = offsetof(struct pcie_port_service_driver, slot_reset); @@ -780,6 +811,7 @@ static const struct pci_device_id port_pci_ids[] = { static const struct pci_error_handlers pcie_portdrv_err_handler = { .error_detected = pcie_portdrv_error_detected, + .cor_error_detected = pcie_portdrv_cor_error_detected, .slot_reset = pcie_portdrv_slot_reset, .mmio_enabled = pcie_portdrv_mmio_enabled, }; diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h index 12c89ea0313b..8a39197f0203 100644 --- a/drivers/pci/pcie/portdrv.h +++ b/drivers/pci/pcie/portdrv.h @@ -121,4 +121,6 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {} #endif /* !CONFIG_PCIE_PME */ struct device *pcie_port_find_device(struct pci_dev *dev, u32 service); + +extern struct atomic_notifier_head portdrv_aer_internal_err_chain; #endif /* _PORTDRV_H_ */ -- 2.34.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-17 20:04 ` [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors Terry Bowman @ 2024-06-20 12:30 ` Jonathan Cameron 2024-06-24 15:22 ` Terry Bowman 2024-06-21 19:36 ` Dan Williams 2024-06-26 2:54 ` Li, Ming4 2 siblings, 1 reply; 29+ messages in thread From: Jonathan Cameron @ 2024-06-20 12:30 UTC (permalink / raw) To: Terry Bowman Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Mon, 17 Jun 2024 15:04:05 -0500 Terry Bowman <terry.bowman@amd.com> wrote: > PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv > does not implement an AER correctable handler (CE) but does implement the > AER uncorrectable error (UCE). The UCE handler is fairly straightforward > in that it only checks for frozen error state and returns the next step > for recovery accordingly. > > As a result, port devices relying on AER correctable internal errors (CIE) > and AER uncorrectable internal errors (UIE) will not be handled. Note, > the PCIe spec indicates AER CIE/UIE can be used to report implementation > specific errors.[1] > > CXL root ports, CXL downstream switch ports, and CXL upstream switch ports > are examples of devices using the AER CIE/UIE for implementation specific > purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to > report CXL RAS errors.[2] > > Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic > notifier to report CIE/UIE errors to the registered functions. This will > require adding a CE handler and updating the existing UCE handler. > > For the UCE handler, the CXL spec states UIE errors should return need > reset: "The only method of recovering from an Uncorrectable Internal Error > is reset or hardware replacement."[1] > > [1] PCI6.0 - 6.2.10 Internal Errors > [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and > Upstream Switch Ports > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ > drivers/pci/pcie/portdrv.h | 2 ++ > 2 files changed, 34 insertions(+) > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 14a4b89a3b83..86d80e0e9606 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -37,6 +37,9 @@ struct portdrv_service_data { > u32 service; > }; > > +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); > +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); Perhaps these should be per instance of the portdrv? I'd imagine we only want to register CXL ones on CXL ports etc and it's annoying to have to check at runtime for relevance of a particular notifier. > + > /** > * release_pcie_device - free PCI Express port service device structure > * @dev: Port service device to release > @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) > static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, > pci_channel_state_t error) > { > + if (dev->aer_cap) { > + u32 status; > + > + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, > + &status); > + > + if (status & PCI_ERR_UNC_INTN) { > + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, > + AER_FATAL, (void *)dev); Don't think the cast is needed as always fine to implicitly cast to and from void * in C. > + return PCI_ERS_RESULT_NEED_RESET; > + } > + } > + > if (error == pci_channel_io_frozen) > return PCI_ERS_RESULT_NEED_RESET; > return PCI_ERS_RESULT_CAN_RECOVER; > } > > +static void pcie_portdrv_cor_error_detected(struct pci_dev *dev) > +{ > + u32 status; > + > + if (!dev->aer_cap) > + return; > + > + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_COR_STATUS, > + &status); > + > + if (status & PCI_ERR_COR_INTERNAL) > + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, > + AER_CORRECTABLE, (void *)dev); No need for the cast. > +} > + > static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev) > { > size_t off = offsetof(struct pcie_port_service_driver, slot_reset); > @@ -780,6 +811,7 @@ static const struct pci_device_id port_pci_ids[] = { > > static const struct pci_error_handlers pcie_portdrv_err_handler = { > .error_detected = pcie_portdrv_error_detected, > + .cor_error_detected = pcie_portdrv_cor_error_detected, > .slot_reset = pcie_portdrv_slot_reset, > .mmio_enabled = pcie_portdrv_mmio_enabled, > }; > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > index 12c89ea0313b..8a39197f0203 100644 > --- a/drivers/pci/pcie/portdrv.h > +++ b/drivers/pci/pcie/portdrv.h > @@ -121,4 +121,6 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {} > #endif /* !CONFIG_PCIE_PME */ > > struct device *pcie_port_find_device(struct pci_dev *dev, u32 service); > + > +extern struct atomic_notifier_head portdrv_aer_internal_err_chain; > #endif /* _PORTDRV_H_ */ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-20 12:30 ` Jonathan Cameron @ 2024-06-24 15:22 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-24 15:22 UTC (permalink / raw) To: Jonathan Cameron Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci Hi Jonathan, I added responses inline below. On 6/20/24 07:30, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:05 -0500 > Terry Bowman <terry.bowman@amd.com> wrote: > >> PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv >> does not implement an AER correctable handler (CE) but does implement the >> AER uncorrectable error (UCE). The UCE handler is fairly straightforward >> in that it only checks for frozen error state and returns the next step >> for recovery accordingly. >> >> As a result, port devices relying on AER correctable internal errors (CIE) >> and AER uncorrectable internal errors (UIE) will not be handled. Note, >> the PCIe spec indicates AER CIE/UIE can be used to report implementation >> specific errors.[1] >> >> CXL root ports, CXL downstream switch ports, and CXL upstream switch ports >> are examples of devices using the AER CIE/UIE for implementation specific >> purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to >> report CXL RAS errors.[2] >> >> Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic >> notifier to report CIE/UIE errors to the registered functions. This will >> require adding a CE handler and updating the existing UCE handler. >> >> For the UCE handler, the CXL spec states UIE errors should return need >> reset: "The only method of recovering from an Uncorrectable Internal Error >> is reset or hardware replacement."[1] >> >> [1] PCI6.0 - 6.2.10 Internal Errors >> [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and >> Upstream Switch Ports >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ >> drivers/pci/pcie/portdrv.h | 2 ++ >> 2 files changed, 34 insertions(+) >> >> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c >> index 14a4b89a3b83..86d80e0e9606 100644 >> --- a/drivers/pci/pcie/portdrv.c >> +++ b/drivers/pci/pcie/portdrv.c >> @@ -37,6 +37,9 @@ struct portdrv_service_data { >> u32 service; >> }; >> >> +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); >> +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); > > Perhaps these should be per instance of the portdrv? > I'd imagine we only want to register CXL ones on CXL ports etc > and it's annoying to have to check at runtime for relevance > of a particular notifier. > This could be made per-instance by moving to the PCI/device drvdata. This would likely need a portdrv setup-init helper function to enable for a particular PCI device. >> + >> /** >> * release_pcie_device - free PCI Express port service device structure >> * @dev: Port service device to release >> @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, >> pci_channel_state_t error) >> { >> + if (dev->aer_cap) { >> + u32 status; >> + >> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, >> + &status); >> + >> + if (status & PCI_ERR_UNC_INTN) { >> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, >> + AER_FATAL, (void *)dev); > > Don't think the cast is needed as always fine to implicitly cast to and from > void * in C. > Ok. >> + return PCI_ERS_RESULT_NEED_RESET; >> + } >> + } >> + >> if (error == pci_channel_io_frozen) >> return PCI_ERS_RESULT_NEED_RESET; >> return PCI_ERS_RESULT_CAN_RECOVER; >> } >> >> +static void pcie_portdrv_cor_error_detected(struct pci_dev *dev) >> +{ >> + u32 status; >> + >> + if (!dev->aer_cap) >> + return; >> + >> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_COR_STATUS, >> + &status); >> + >> + if (status & PCI_ERR_COR_INTERNAL) >> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, >> + AER_CORRECTABLE, (void *)dev); > > No need for the cast. > Ok Regards, Terry >> +} >> + >> static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev) >> { >> size_t off = offsetof(struct pcie_port_service_driver, slot_reset); >> @@ -780,6 +811,7 @@ static const struct pci_device_id port_pci_ids[] = { >> >> static const struct pci_error_handlers pcie_portdrv_err_handler = { >> .error_detected = pcie_portdrv_error_detected, >> + .cor_error_detected = pcie_portdrv_cor_error_detected, >> .slot_reset = pcie_portdrv_slot_reset, >> .mmio_enabled = pcie_portdrv_mmio_enabled, >> }; >> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h >> index 12c89ea0313b..8a39197f0203 100644 >> --- a/drivers/pci/pcie/portdrv.h >> +++ b/drivers/pci/pcie/portdrv.h >> @@ -121,4 +121,6 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {} >> #endif /* !CONFIG_PCIE_PME */ >> >> struct device *pcie_port_find_device(struct pci_dev *dev, u32 service); >> + >> +extern struct atomic_notifier_head portdrv_aer_internal_err_chain; >> #endif /* _PORTDRV_H_ */ > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-17 20:04 ` [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors Terry Bowman 2024-06-20 12:30 ` Jonathan Cameron @ 2024-06-21 19:36 ` Dan Williams 2024-06-24 18:21 ` Terry Bowman 2024-06-26 2:54 ` Li, Ming4 2 siblings, 1 reply; 29+ messages in thread From: Dan Williams @ 2024-06-21 19:36 UTC (permalink / raw) To: Terry Bowman, dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci Terry Bowman wrote: > PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv > does not implement an AER correctable handler (CE) but does implement the > AER uncorrectable error (UCE). The UCE handler is fairly straightforward > in that it only checks for frozen error state and returns the next step > for recovery accordingly. > > As a result, port devices relying on AER correctable internal errors (CIE) > and AER uncorrectable internal errors (UIE) will not be handled. Note, > the PCIe spec indicates AER CIE/UIE can be used to report implementation > specific errors.[1] > > CXL root ports, CXL downstream switch ports, and CXL upstream switch ports > are examples of devices using the AER CIE/UIE for implementation specific > purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to > report CXL RAS errors.[2] > > Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic > notifier to report CIE/UIE errors to the registered functions. This will > require adding a CE handler and updating the existing UCE handler. > > For the UCE handler, the CXL spec states UIE errors should return need > reset: "The only method of recovering from an Uncorrectable Internal Error > is reset or hardware replacement."[1] > > [1] PCI6.0 - 6.2.10 Internal Errors > [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and > Upstream Switch Ports > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ > drivers/pci/pcie/portdrv.h | 2 ++ > 2 files changed, 34 insertions(+) > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 14a4b89a3b83..86d80e0e9606 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -37,6 +37,9 @@ struct portdrv_service_data { > u32 service; > }; > > +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); > +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); > + > /** > * release_pcie_device - free PCI Express port service device structure > * @dev: Port service device to release > @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) > static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, > pci_channel_state_t error) > { > + if (dev->aer_cap) { > + u32 status; > + > + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, > + &status); > + > + if (status & PCI_ERR_UNC_INTN) { > + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, > + AER_FATAL, (void *)dev); > + return PCI_ERS_RESULT_NEED_RESET; > + } > + } > + Oh, this is a finer grained / lower-level location than I was expecting. I was expecting that the notifier was just conveying the port interrupt notification to a driver that knew how to take the next step. This pcie_portdrv_error_detected() is a notification that is already "downstream" of the AER notification. If PCIe does not care about CIE and UIE then don't make it care, but redirect the notifications to the CXL side that may care. Leave the portdrv handlers PCIe native as much as possible. Now, I have not thought through the full implications of that suggestion, but for now am reacting to this AER -> PCIe err_handler -> CXL notfier as potentially more awkward than AER -> CXL notifier. It's a separate error handling domain that the PCIe side likely does not want to worry about. PCIe side is only responsible for allowing CXL to register for the notifications beacuse the AER interrupt is shared. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-21 19:36 ` Dan Williams @ 2024-06-24 18:21 ` Terry Bowman 2024-06-24 21:46 ` Dan Williams 0 siblings, 1 reply; 29+ messages in thread From: Terry Bowman @ 2024-06-24 18:21 UTC (permalink / raw) To: Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci Hi Dan, I added responses inline below. On 6/21/24 14:36, Dan Williams wrote: > Terry Bowman wrote: >> PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv >> does not implement an AER correctable handler (CE) but does implement the >> AER uncorrectable error (UCE). The UCE handler is fairly straightforward >> in that it only checks for frozen error state and returns the next step >> for recovery accordingly. >> >> As a result, port devices relying on AER correctable internal errors (CIE) >> and AER uncorrectable internal errors (UIE) will not be handled. Note, >> the PCIe spec indicates AER CIE/UIE can be used to report implementation >> specific errors.[1] >> >> CXL root ports, CXL downstream switch ports, and CXL upstream switch ports >> are examples of devices using the AER CIE/UIE for implementation specific >> purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to >> report CXL RAS errors.[2] >> >> Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic >> notifier to report CIE/UIE errors to the registered functions. This will >> require adding a CE handler and updating the existing UCE handler. >> >> For the UCE handler, the CXL spec states UIE errors should return need >> reset: "The only method of recovering from an Uncorrectable Internal Error >> is reset or hardware replacement."[1] >> >> [1] PCI6.0 - 6.2.10 Internal Errors >> [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and >> Upstream Switch Ports >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ >> drivers/pci/pcie/portdrv.h | 2 ++ >> 2 files changed, 34 insertions(+) >> >> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c >> index 14a4b89a3b83..86d80e0e9606 100644 >> --- a/drivers/pci/pcie/portdrv.c >> +++ b/drivers/pci/pcie/portdrv.c >> @@ -37,6 +37,9 @@ struct portdrv_service_data { >> u32 service; >> }; >> >> +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); >> +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); >> + >> /** >> * release_pcie_device - free PCI Express port service device structure >> * @dev: Port service device to release >> @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, >> pci_channel_state_t error) >> { >> + if (dev->aer_cap) { >> + u32 status; >> + >> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, >> + &status); >> + >> + if (status & PCI_ERR_UNC_INTN) { >> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, >> + AER_FATAL, (void *)dev); >> + return PCI_ERS_RESULT_NEED_RESET; >> + } >> + } >> + > > Oh, this is a finer grained / lower-level location than I was > expecting. I was expecting that the notifier was just conveying the port > interrupt notification to a driver that knew how to take the next step. > This pcie_portdrv_error_detected() is a notification that is already > "downstream" of the AER notification. > My intent was to implement the UIE/CIE "implementation specific" behavior as mentioned in the PCI spec. This included allowing port devices to be notified if needed. This plan is not ideal but works within the PCI portdrv situation and before we can introduce a CXL specific portdriver. > If PCIe does not care about CIE and UIE then don't make it care, but > redirect the notifications to the CXL side that may care. > > Leave the portdrv handlers PCIe native as much as possible. > > Now, I have not thought through the full implications of that > suggestion, but for now am reacting to this AER -> PCIe err_handler -> > CXL notfier as potentially more awkward than AER -> CXL notifier. It's a > separate error handling domain that the PCIe side likely does not want > to worry about. PCIe side is only responsible for allowing CXL to > register for the notifications beacuse the AER interrupt is shared. Hmmm, this sounds like either option#2 or introducing a CXL portdrv service driver. Thanks for the reviews and please let me know which option you would like me to purse. Regards, Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-24 18:21 ` Terry Bowman @ 2024-06-24 21:46 ` Dan Williams 2024-06-25 14:41 ` Terry Bowman 0 siblings, 1 reply; 29+ messages in thread From: Dan Williams @ 2024-06-24 21:46 UTC (permalink / raw) To: Terry Bowman, Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci Terry Bowman wrote: > Hi Dan, > > I added responses inline below. > > On 6/21/24 14:36, Dan Williams wrote: > > Terry Bowman wrote: > >> PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv > >> does not implement an AER correctable handler (CE) but does implement the > >> AER uncorrectable error (UCE). The UCE handler is fairly straightforward > >> in that it only checks for frozen error state and returns the next step > >> for recovery accordingly. > >> > >> As a result, port devices relying on AER correctable internal errors (CIE) > >> and AER uncorrectable internal errors (UIE) will not be handled. Note, > >> the PCIe spec indicates AER CIE/UIE can be used to report implementation > >> specific errors.[1] > >> > >> CXL root ports, CXL downstream switch ports, and CXL upstream switch ports > >> are examples of devices using the AER CIE/UIE for implementation specific > >> purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to > >> report CXL RAS errors.[2] > >> > >> Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic > >> notifier to report CIE/UIE errors to the registered functions. This will > >> require adding a CE handler and updating the existing UCE handler. > >> > >> For the UCE handler, the CXL spec states UIE errors should return need > >> reset: "The only method of recovering from an Uncorrectable Internal Error > >> is reset or hardware replacement."[1] > >> > >> [1] PCI6.0 - 6.2.10 Internal Errors > >> [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and > >> Upstream Switch Ports > >> > >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >> Cc: Bjorn Helgaas <bhelgaas@google.com> > >> Cc: linux-pci@vger.kernel.org > >> --- > >> drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ > >> drivers/pci/pcie/portdrv.h | 2 ++ > >> 2 files changed, 34 insertions(+) > >> > >> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > >> index 14a4b89a3b83..86d80e0e9606 100644 > >> --- a/drivers/pci/pcie/portdrv.c > >> +++ b/drivers/pci/pcie/portdrv.c > >> @@ -37,6 +37,9 @@ struct portdrv_service_data { > >> u32 service; > >> }; > >> > >> +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); > >> +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); > >> + > >> /** > >> * release_pcie_device - free PCI Express port service device structure > >> * @dev: Port service device to release > >> @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) > >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, > >> pci_channel_state_t error) > >> { > >> + if (dev->aer_cap) { > >> + u32 status; > >> + > >> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, > >> + &status); > >> + > >> + if (status & PCI_ERR_UNC_INTN) { > >> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, > >> + AER_FATAL, (void *)dev); > >> + return PCI_ERS_RESULT_NEED_RESET; > >> + } > >> + } > >> + > > > > Oh, this is a finer grained / lower-level location than I was > > expecting. I was expecting that the notifier was just conveying the port > > interrupt notification to a driver that knew how to take the next step. > > This pcie_portdrv_error_detected() is a notification that is already > > "downstream" of the AER notification. > > > > My intent was to implement the UIE/CIE "implementation specific" behavior as > mentioned in the PCI spec. This included allowing port devices to be notified if > needed. This plan is not ideal but works within the PCI portdrv situation > and before we can introduce a CXL specific portdriver. ...but it really isn't implementation specific behavior like all the other anonymous internal error cases. This is an open standard definition that just happens to alias with the PCIe "internal" notification mechanism. > > > If PCIe does not care about CIE and UIE then don't make it care, but > > redirect the notifications to the CXL side that may care. > > > > Leave the portdrv handlers PCIe native as much as possible. > > > > Now, I have not thought through the full implications of that > > suggestion, but for now am reacting to this AER -> PCIe err_handler -> > > CXL notfier as potentially more awkward than AER -> CXL notifier. It's a > > separate error handling domain that the PCIe side likely does not want > > to worry about. PCIe side is only responsible for allowing CXL to > > register for the notifications beacuse the AER interrupt is shared. > > Hmmm, this sounds like either option#2 or introducing a CXL portdrv service > driver. > > Thanks for the reviews and please let me know which option you > would like me to purse. So after looking at this patchset I think calling the PCIe portdrv error handler set for anything other than PCIe errors is likely a mistake. The CXL protocol side of the house can experience errors that have no relation to errors that PCIe needs to handle or care about. I am thinking something like cxl_rch_handle_error() becomes cxl_handle_error() and when that successfully handles the error then no need to trigger pcie_do_recovery(). pcie_do_recovery() is too tightly scoped to error recovery that is reasonable for PCIe links. That may not be reasonable to CXL devices where protocol errors potentially implicate that a system memory transaction failed. The blast radius of CXL protocol errors are not constrained to single devices like the PCIe case. With that change something like a new cxl_do_recovery() can operate on the cxl_port topology and know that it has exclusive control of the error handling registers. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-24 21:46 ` Dan Williams @ 2024-06-25 14:41 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-25 14:41 UTC (permalink / raw) To: Dan Williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci On 6/24/24 16:46, Dan Williams wrote: > Terry Bowman wrote: >> Hi Dan, >> >> I added responses inline below. >> >> On 6/21/24 14:36, Dan Williams wrote: >>> Terry Bowman wrote: >>>> PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv >>>> does not implement an AER correctable handler (CE) but does implement the >>>> AER uncorrectable error (UCE). The UCE handler is fairly straightforward >>>> in that it only checks for frozen error state and returns the next step >>>> for recovery accordingly. >>>> >>>> As a result, port devices relying on AER correctable internal errors (CIE) >>>> and AER uncorrectable internal errors (UIE) will not be handled. Note, >>>> the PCIe spec indicates AER CIE/UIE can be used to report implementation >>>> specific errors.[1] >>>> >>>> CXL root ports, CXL downstream switch ports, and CXL upstream switch ports >>>> are examples of devices using the AER CIE/UIE for implementation specific >>>> purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to >>>> report CXL RAS errors.[2] >>>> >>>> Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic >>>> notifier to report CIE/UIE errors to the registered functions. This will >>>> require adding a CE handler and updating the existing UCE handler. >>>> >>>> For the UCE handler, the CXL spec states UIE errors should return need >>>> reset: "The only method of recovering from an Uncorrectable Internal Error >>>> is reset or hardware replacement."[1] >>>> >>>> [1] PCI6.0 - 6.2.10 Internal Errors >>>> [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and >>>> Upstream Switch Ports >>>> >>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >>>> Cc: Bjorn Helgaas <bhelgaas@google.com> >>>> Cc: linux-pci@vger.kernel.org >>>> --- >>>> drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ >>>> drivers/pci/pcie/portdrv.h | 2 ++ >>>> 2 files changed, 34 insertions(+) >>>> >>>> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c >>>> index 14a4b89a3b83..86d80e0e9606 100644 >>>> --- a/drivers/pci/pcie/portdrv.c >>>> +++ b/drivers/pci/pcie/portdrv.c >>>> @@ -37,6 +37,9 @@ struct portdrv_service_data { >>>> u32 service; >>>> }; >>>> >>>> +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); >>>> +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); >>>> + >>>> /** >>>> * release_pcie_device - free PCI Express port service device structure >>>> * @dev: Port service device to release >>>> @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) >>>> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, >>>> pci_channel_state_t error) >>>> { >>>> + if (dev->aer_cap) { >>>> + u32 status; >>>> + >>>> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, >>>> + &status); >>>> + >>>> + if (status & PCI_ERR_UNC_INTN) { >>>> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, >>>> + AER_FATAL, (void *)dev); >>>> + return PCI_ERS_RESULT_NEED_RESET; >>>> + } >>>> + } >>>> + >>> >>> Oh, this is a finer grained / lower-level location than I was >>> expecting. I was expecting that the notifier was just conveying the port >>> interrupt notification to a driver that knew how to take the next step. >>> This pcie_portdrv_error_detected() is a notification that is already >>> "downstream" of the AER notification. >>> >> >> My intent was to implement the UIE/CIE "implementation specific" behavior as >> mentioned in the PCI spec. This included allowing port devices to be notified if >> needed. This plan is not ideal but works within the PCI portdrv situation >> and before we can introduce a CXL specific portdriver. > > ...but it really isn't implementation specific behavior like all the > other anonymous internal error cases. This is an open standard > definition that just happens to alias with the PCIe "internal" > notification mechanism. > >> >>> If PCIe does not care about CIE and UIE then don't make it care, but >>> redirect the notifications to the CXL side that may care. >>> >>> Leave the portdrv handlers PCIe native as much as possible. >>> >>> Now, I have not thought through the full implications of that >>> suggestion, but for now am reacting to this AER -> PCIe err_handler -> >>> CXL notfier as potentially more awkward than AER -> CXL notifier. It's a >>> separate error handling domain that the PCIe side likely does not want >>> to worry about. PCIe side is only responsible for allowing CXL to >>> register for the notifications beacuse the AER interrupt is shared. >> >> Hmmm, this sounds like either option#2 or introducing a CXL portdrv service >> driver. >> >> Thanks for the reviews and please let me know which option you >> would like me to purse. > > So after looking at this patchset I think calling the PCIe portdrv error > handler set for anything other than PCIe errors is likely a mistake. The > CXL protocol side of the house can experience errors that have no > relation to errors that PCIe needs to handle or care about. > > I am thinking something like cxl_rch_handle_error() becomes > cxl_handle_error() and when that successfully handles the error then no > need to trigger pcie_do_recovery(). > > pcie_do_recovery() is too tightly scoped to error recovery that is > reasonable for PCIe links. That may not be reasonable to CXL devices > where protocol errors potentially implicate that a system memory > transaction failed. The blast radius of CXL protocol errors are not > constrained to single devices like the PCIe case. > > With that change something like a new cxl_do_recovery() can operate on > the cxl_port topology and know that it has exclusive control of the > error handling registers. Ok, I'll refactor the existing AER RCH downstream port handling to support CXL USP, DSP, and RP as well. I can incorporate much of the feedback from this RFC into the new patchset. Thanks Dan. Regards, Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-17 20:04 ` [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors Terry Bowman 2024-06-20 12:30 ` Jonathan Cameron 2024-06-21 19:36 ` Dan Williams @ 2024-06-26 2:54 ` Li, Ming4 2024-06-26 13:39 ` Terry Bowman 2 siblings, 1 reply; 29+ messages in thread From: Li, Ming4 @ 2024-06-26 2:54 UTC (permalink / raw) To: Terry Bowman, Williams, Dan J, Weiny, Ira, dave@stgolabs.net, Jiang, Dave, Schofield, Alison, Verma, Vishal L, jim.harris@samsung.com, ilpo.jarvinen@linux.intel.com, ardb@kernel.org, sathyanarayanan.kuppuswamy@linux.intel.com, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Yazen.Ghannam@amd.com, Robert.Richter@amd.com Cc: Bjorn Helgaas, linux-pci@vger.kernel.org On 6/18/2024 4:04 AM, Terry Bowman wrote: > PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv > does not implement an AER correctable handler (CE) but does implement the > AER uncorrectable error (UCE). The UCE handler is fairly straightforward > in that it only checks for frozen error state and returns the next step > for recovery accordingly. > > As a result, port devices relying on AER correctable internal errors (CIE) > and AER uncorrectable internal errors (UIE) will not be handled. Note, > the PCIe spec indicates AER CIE/UIE can be used to report implementation > specific errors.[1] > > CXL root ports, CXL downstream switch ports, and CXL upstream switch ports > are examples of devices using the AER CIE/UIE for implementation specific > purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to > report CXL RAS errors.[2] > > Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic > notifier to report CIE/UIE errors to the registered functions. This will > require adding a CE handler and updating the existing UCE handler. > > For the UCE handler, the CXL spec states UIE errors should return need > reset: "The only method of recovering from an Uncorrectable Internal Error > is reset or hardware replacement."[1] > > [1] PCI6.0 - 6.2.10 Internal Errors > [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and > Upstream Switch Ports > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ > drivers/pci/pcie/portdrv.h | 2 ++ > 2 files changed, 34 insertions(+) > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 14a4b89a3b83..86d80e0e9606 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -37,6 +37,9 @@ struct portdrv_service_data { > u32 service; > }; > > +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); > +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); > + > /** > * release_pcie_device - free PCI Express port service device structure > * @dev: Port service device to release > @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) > static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, > pci_channel_state_t error) > { > + if (dev->aer_cap) { > + u32 status; > + > + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, > + &status); > + > + if (status & PCI_ERR_UNC_INTN) { > + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, > + AER_FATAL, (void *)dev); > + return PCI_ERS_RESULT_NEED_RESET; > + } > + } > + > if (error == pci_channel_io_frozen) > return PCI_ERS_RESULT_NEED_RESET; > return PCI_ERS_RESULT_CAN_RECOVER; > } > > +static void pcie_portdrv_cor_error_detected(struct pci_dev *dev) > +{ > + u32 status; > + > + if (!dev->aer_cap) > + return; Seems like that dev->aer_cap checking is not needed for cor_error_detected, aer_get_device_error_info() already checked it and won't call handle_error_source() if device has not AER capability. But I am curious why pci_aer_handle_error() checks dev->aer_cap again after aer_get_device_error_info(). > + > + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_COR_STATUS, > + &status); > + > + if (status & PCI_ERR_COR_INTERNAL) > + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, > + AER_CORRECTABLE, (void *)dev); > +} > + > static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev) > { > size_t off = offsetof(struct pcie_port_service_driver, slot_reset); > @@ -780,6 +811,7 @@ static const struct pci_device_id port_pci_ids[] = { > > static const struct pci_error_handlers pcie_portdrv_err_handler = { > .error_detected = pcie_portdrv_error_detected, > + .cor_error_detected = pcie_portdrv_cor_error_detected, > .slot_reset = pcie_portdrv_slot_reset, > .mmio_enabled = pcie_portdrv_mmio_enabled, > }; > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > index 12c89ea0313b..8a39197f0203 100644 > --- a/drivers/pci/pcie/portdrv.h > +++ b/drivers/pci/pcie/portdrv.h > @@ -121,4 +121,6 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {} > #endif /* !CONFIG_PCIE_PME */ > > struct device *pcie_port_find_device(struct pci_dev *dev, u32 service); > + > +extern struct atomic_notifier_head portdrv_aer_internal_err_chain; > #endif /* _PORTDRV_H_ */ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors 2024-06-26 2:54 ` Li, Ming4 @ 2024-06-26 13:39 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-26 13:39 UTC (permalink / raw) To: Li, Ming4, Williams, Dan J, Weiny, Ira, dave@stgolabs.net, Jiang, Dave, Schofield, Alison, Verma, Vishal L, jim.harris@samsung.com, ilpo.jarvinen@linux.intel.com, ardb@kernel.org, sathyanarayanan.kuppuswamy@linux.intel.com, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Yazen.Ghannam@amd.com, Robert.Richter@amd.com Cc: Bjorn Helgaas, linux-pci@vger.kernel.org On 6/25/24 21:54, Li, Ming4 wrote: > On 6/18/2024 4:04 AM, Terry Bowman wrote: >> PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv >> does not implement an AER correctable handler (CE) but does implement the >> AER uncorrectable error (UCE). The UCE handler is fairly straightforward >> in that it only checks for frozen error state and returns the next step >> for recovery accordingly. >> >> As a result, port devices relying on AER correctable internal errors (CIE) >> and AER uncorrectable internal errors (UIE) will not be handled. Note, >> the PCIe spec indicates AER CIE/UIE can be used to report implementation >> specific errors.[1] >> >> CXL root ports, CXL downstream switch ports, and CXL upstream switch ports >> are examples of devices using the AER CIE/UIE for implementation specific >> purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to >> report CXL RAS errors.[2] >> >> Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic >> notifier to report CIE/UIE errors to the registered functions. This will >> require adding a CE handler and updating the existing UCE handler. >> >> For the UCE handler, the CXL spec states UIE errors should return need >> reset: "The only method of recovering from an Uncorrectable Internal Error >> is reset or hardware replacement."[1] >> >> [1] PCI6.0 - 6.2.10 Internal Errors >> [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and >> Upstream Switch Ports >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++ >> drivers/pci/pcie/portdrv.h | 2 ++ >> 2 files changed, 34 insertions(+) >> >> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c >> index 14a4b89a3b83..86d80e0e9606 100644 >> --- a/drivers/pci/pcie/portdrv.c >> +++ b/drivers/pci/pcie/portdrv.c >> @@ -37,6 +37,9 @@ struct portdrv_service_data { >> u32 service; >> }; >> >> +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain); >> +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain); >> + >> /** >> * release_pcie_device - free PCI Express port service device structure >> * @dev: Port service device to release >> @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev) >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, >> pci_channel_state_t error) >> { >> + if (dev->aer_cap) { >> + u32 status; >> + >> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS, >> + &status); >> + >> + if (status & PCI_ERR_UNC_INTN) { >> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain, >> + AER_FATAL, (void *)dev); >> + return PCI_ERS_RESULT_NEED_RESET; >> + } >> + } >> + >> if (error == pci_channel_io_frozen) >> return PCI_ERS_RESULT_NEED_RESET; >> return PCI_ERS_RESULT_CAN_RECOVER; >> } >> >> +static void pcie_portdrv_cor_error_detected(struct pci_dev *dev) >> +{ >> + u32 status; >> + >> + if (!dev->aer_cap) >> + return; > > Seems like that dev->aer_cap checking is not needed for cor_error_detected, aer_get_device_error_info() already checked it and won't call handle_error_source() if device has not AER capability. But I am curious why pci_aer_handle_error() checks dev->aer_cap again after aer_get_device_error_info(). > Hi Ming, I agree this check should be removed. Regards, Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() [not found] <20240617200411.1426554-1-terry.bowman@amd.com> ` (2 preceding siblings ...) 2024-06-17 20:04 ` [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors Terry Bowman @ 2024-06-17 20:04 ` Terry Bowman 2024-06-19 7:09 ` Christoph Hellwig ` (2 more replies) 3 siblings, 3 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-17 20:04 UTC (permalink / raw) To: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, terry.bowman, Yazen.Ghannam, Robert.Richter Cc: Bjorn Helgaas, linux-pci AER correctable internal errors (CIE) and AER uncorrectable internal errors (UIE) are disabled through the AER mask register by default.[1] CXL PCIe ports use the CIE/UIE to report RAS errors and as a result need CIE/UIE enabled.[2] Change pci_aer_unmask_internal_errors() function to be exported for the CXL driver and other drivers to use. [1] PCI6.0 - 7.8.4.3 Uncorrectable [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and Upstream Switch Ports Signed-off-by: Terry Bowman <terry.bowman@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pcie/aer.c | 3 ++- include/linux/aer.h | 6 ++++++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 4dc03cb9aff0..d7a1982f0c50 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -951,7 +951,7 @@ static bool find_source_device(struct pci_dev *parent, * Note: AER must be enabled and supported by the device which must be * checked in advance, e.g. with pcie_aer_is_native(). */ -static void pci_aer_unmask_internal_errors(struct pci_dev *dev) +void pci_aer_unmask_internal_errors(struct pci_dev *dev) { int aer = dev->aer_cap; u32 mask; @@ -964,6 +964,7 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev) mask &= ~PCI_ERR_COR_INTERNAL; pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask); } +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors); static bool is_cxl_mem_dev(struct pci_dev *dev) { diff --git a/include/linux/aer.h b/include/linux/aer.h index 4b97f38f3fcf..a4fd25ea0280 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -50,6 +50,12 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev) static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; } #endif +#ifdef CONFIG_PCIEAER_CXL +void pci_aer_unmask_internal_errors(struct pci_dev *dev); +#else +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { } +#endif + void pci_print_aer(struct pci_dev *dev, int aer_severity, struct aer_capability_regs *aer); int cper_severity_to_aer(int cper_severity); -- 2.34.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() 2024-06-17 20:04 ` [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman @ 2024-06-19 7:09 ` Christoph Hellwig 2024-06-19 15:40 ` Terry Bowman 2024-06-20 13:11 ` Jonathan Cameron 2024-07-10 21:47 ` Bjorn Helgaas 2 siblings, 1 reply; 29+ messages in thread From: Christoph Hellwig @ 2024-06-19 7:09 UTC (permalink / raw) To: Terry Bowman Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Mon, Jun 17, 2024 at 03:04:10PM -0500, Terry Bowman wrote: > AER correctable internal errors (CIE) and AER uncorrectable internal > errors (UIE) are disabled through the AER mask register by default.[1] > > CXL PCIe ports use the CIE/UIE to report RAS errors and as a result > need CIE/UIE enabled.[2] > > Change pci_aer_unmask_internal_errors() function to be exported for > the CXL driver and other drivers to use. I can't actually find a user for this. Maybe that's because you did weird partial CCs for your series, or maybe it's because you don't want to tell us. Either way it's a no-go. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() 2024-06-19 7:09 ` Christoph Hellwig @ 2024-06-19 15:40 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-19 15:40 UTC (permalink / raw) To: Christoph Hellwig Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On 6/19/24 02:09, Christoph Hellwig wrote: > On Mon, Jun 17, 2024 at 03:04:10PM -0500, Terry Bowman wrote: >> AER correctable internal errors (CIE) and AER uncorrectable internal >> errors (UIE) are disabled through the AER mask register by default.[1] >> >> CXL PCIe ports use the CIE/UIE to report RAS errors and as a result >> need CIE/UIE enabled.[2] >> >> Change pci_aer_unmask_internal_errors() function to be exported for >> the CXL driver and other drivers to use. > > I can't actually find a user for this. Maybe that's because you did > weird partial CCs for your series, or maybe it's because you don't > want to tell us. Either way it's a no-go. The use is in the following patchset (9/9) that missed being shared with PCI list. If there is rework I'll fix so both are sent to PCI list. https://lore.kernel.org/all/20240617200411.1426554-10-terry.bowman@amd.com/ Regards, Terry ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() 2024-06-17 20:04 ` [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman 2024-06-19 7:09 ` Christoph Hellwig @ 2024-06-20 13:11 ` Jonathan Cameron 2024-06-24 16:22 ` Terry Bowman 2024-07-10 21:47 ` Bjorn Helgaas 2 siblings, 1 reply; 29+ messages in thread From: Jonathan Cameron @ 2024-06-20 13:11 UTC (permalink / raw) To: Terry Bowman Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Mon, 17 Jun 2024 15:04:10 -0500 Terry Bowman <terry.bowman@amd.com> wrote: > AER correctable internal errors (CIE) and AER uncorrectable internal > errors (UIE) are disabled through the AER mask register by default.[1] > > CXL PCIe ports use the CIE/UIE to report RAS errors and as a result > need CIE/UIE enabled.[2] > > Change pci_aer_unmask_internal_errors() function to be exported for > the CXL driver and other drivers to use. I've perhaps forgotten the end conclusion, but I thought there was a request to just try enabling this in general and mask it out only for known broken devices? Admittedly that's a more daring path, so maybe I hallucinated it! > > [1] PCI6.0 - 7.8.4.3 Uncorrectable > [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and Upstream > Switch Ports > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/aer.c | 3 ++- > include/linux/aer.h | 6 ++++++ > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 4dc03cb9aff0..d7a1982f0c50 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -951,7 +951,7 @@ static bool find_source_device(struct pci_dev *parent, > * Note: AER must be enabled and supported by the device which must be > * checked in advance, e.g. with pcie_aer_is_native(). > */ > -static void pci_aer_unmask_internal_errors(struct pci_dev *dev) > +void pci_aer_unmask_internal_errors(struct pci_dev *dev) > { > int aer = dev->aer_cap; > u32 mask; > @@ -964,6 +964,7 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev) > mask &= ~PCI_ERR_COR_INTERNAL; > pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask); > } > +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors); > > static bool is_cxl_mem_dev(struct pci_dev *dev) > { > diff --git a/include/linux/aer.h b/include/linux/aer.h > index 4b97f38f3fcf..a4fd25ea0280 100644 > --- a/include/linux/aer.h > +++ b/include/linux/aer.h > @@ -50,6 +50,12 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev) > static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; } > #endif > > +#ifdef CONFIG_PCIEAER_CXL > +void pci_aer_unmask_internal_errors(struct pci_dev *dev); > +#else > +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { } > +#endif > + > void pci_print_aer(struct pci_dev *dev, int aer_severity, > struct aer_capability_regs *aer); > int cper_severity_to_aer(int cper_severity); ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() 2024-06-20 13:11 ` Jonathan Cameron @ 2024-06-24 16:22 ` Terry Bowman 0 siblings, 0 replies; 29+ messages in thread From: Terry Bowman @ 2024-06-24 16:22 UTC (permalink / raw) To: Jonathan Cameron Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci Hi Jonathan, I added a response inline below. On 6/20/24 08:11, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:10 -0500 > Terry Bowman <terry.bowman@amd.com> wrote: > >> AER correctable internal errors (CIE) and AER uncorrectable internal >> errors (UIE) are disabled through the AER mask register by default.[1] >> >> CXL PCIe ports use the CIE/UIE to report RAS errors and as a result >> need CIE/UIE enabled.[2] >> >> Change pci_aer_unmask_internal_errors() function to be exported for >> the CXL driver and other drivers to use. > > I've perhaps forgotten the end conclusion, but I thought there was > a request to just try enabling this in general and mask it out only > for known broken devices? > > Admittedly that's a more daring path, so maybe I hallucinated it! > I remember there was discussion. A quick search for PCI_ERR_COR_INTERNAL and PCI_ERR_UNC_INTERNAL doesn't find any default enablement. Regards, Terry >> >> [1] PCI6.0 - 7.8.4.3 Uncorrectable >> [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and Upstream >> Switch Ports >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/aer.c | 3 ++- >> include/linux/aer.h | 6 ++++++ >> 2 files changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index 4dc03cb9aff0..d7a1982f0c50 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -951,7 +951,7 @@ static bool find_source_device(struct pci_dev *parent, >> * Note: AER must be enabled and supported by the device which must be >> * checked in advance, e.g. with pcie_aer_is_native(). >> */ >> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev) >> +void pci_aer_unmask_internal_errors(struct pci_dev *dev) >> { >> int aer = dev->aer_cap; >> u32 mask; >> @@ -964,6 +964,7 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev) >> mask &= ~PCI_ERR_COR_INTERNAL; >> pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask); >> } >> +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors); >> >> static bool is_cxl_mem_dev(struct pci_dev *dev) >> { >> diff --git a/include/linux/aer.h b/include/linux/aer.h >> index 4b97f38f3fcf..a4fd25ea0280 100644 >> --- a/include/linux/aer.h >> +++ b/include/linux/aer.h >> @@ -50,6 +50,12 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev) >> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; } >> #endif >> >> +#ifdef CONFIG_PCIEAER_CXL >> +void pci_aer_unmask_internal_errors(struct pci_dev *dev); >> +#else >> +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { } >> +#endif >> + >> void pci_print_aer(struct pci_dev *dev, int aer_severity, >> struct aer_capability_regs *aer); >> int cper_severity_to_aer(int cper_severity); > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() 2024-06-17 20:04 ` [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman 2024-06-19 7:09 ` Christoph Hellwig 2024-06-20 13:11 ` Jonathan Cameron @ 2024-07-10 21:47 ` Bjorn Helgaas 2 siblings, 0 replies; 29+ messages in thread From: Bjorn Helgaas @ 2024-07-10 21:47 UTC (permalink / raw) To: Terry Bowman Cc: dan.j.williams, ira.weiny, dave, dave.jiang, alison.schofield, ming4.li, vishal.l.verma, jim.harris, ilpo.jarvinen, ardb, sathyanarayanan.kuppuswamy, linux-cxl, linux-kernel, Yazen.Ghannam, Robert.Richter, Bjorn Helgaas, linux-pci On Mon, Jun 17, 2024 at 03:04:10PM -0500, Terry Bowman wrote: > AER correctable internal errors (CIE) and AER uncorrectable internal > errors (UIE) are disabled through the AER mask register by default.[1] Nit: "Correctable Errors" and "Uncorrectable Errors" are generic PCIe concepts that exist independent of AER, so I wouldn't prefix them with AER. The AER mask registers control *reporting* of errors, but of course they don't disable the errors themselves. > CXL PCIe ports use the CIE/UIE to report RAS errors and as a result > need CIE/UIE enabled.[2] > > Change pci_aer_unmask_internal_errors() function to be exported for > the CXL driver and other drivers to use. > > [1] PCI6.0 - 7.8.4.3 Uncorrectable s/PCI6.0 .../PCIe r6.0, sec 7.8.4.3/ since there is a conventional PCI spec as well. > [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and Upstream > Switch Ports > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/aer.c | 3 ++- > include/linux/aer.h | 6 ++++++ > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 4dc03cb9aff0..d7a1982f0c50 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -951,7 +951,7 @@ static bool find_source_device(struct pci_dev *parent, > * Note: AER must be enabled and supported by the device which must be > * checked in advance, e.g. with pcie_aer_is_native(). > */ > -static void pci_aer_unmask_internal_errors(struct pci_dev *dev) > +void pci_aer_unmask_internal_errors(struct pci_dev *dev) > { > int aer = dev->aer_cap; > u32 mask; > @@ -964,6 +964,7 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev) > mask &= ~PCI_ERR_COR_INTERNAL; > pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask); > } > +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors); > > static bool is_cxl_mem_dev(struct pci_dev *dev) > { > diff --git a/include/linux/aer.h b/include/linux/aer.h > index 4b97f38f3fcf..a4fd25ea0280 100644 > --- a/include/linux/aer.h > +++ b/include/linux/aer.h > @@ -50,6 +50,12 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev) > static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; } > #endif > > +#ifdef CONFIG_PCIEAER_CXL > +void pci_aer_unmask_internal_errors(struct pci_dev *dev); > +#else > +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { } > +#endif I don't like the idea of exporting a generic PCI interface that only does something when CONFIG_PCIEAER_CXL is enabled. If there's ever a non-CXL caller, it will be confused. Bjorn ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2024-08-19 18:35 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240617200411.1426554-1-terry.bowman@amd.com>
2024-06-17 20:04 ` [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers Terry Bowman
2024-06-20 11:21 ` Jonathan Cameron
2024-06-24 14:58 ` Terry Bowman
2024-06-21 19:17 ` Dan Williams
2024-06-24 17:56 ` Terry Bowman
2024-07-10 20:48 ` nifan.cxl
2024-07-10 21:48 ` Terry Bowman
2024-07-11 1:14 ` fan
2024-08-19 18:35 ` Fan Ni
2024-06-17 20:04 ` [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register Terry Bowman
2024-06-20 11:31 ` Jonathan Cameron
2024-06-24 15:08 ` Terry Bowman
2024-06-21 19:23 ` Dan Williams
2024-06-24 18:00 ` Terry Bowman
2024-06-17 20:04 ` [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors Terry Bowman
2024-06-20 12:30 ` Jonathan Cameron
2024-06-24 15:22 ` Terry Bowman
2024-06-21 19:36 ` Dan Williams
2024-06-24 18:21 ` Terry Bowman
2024-06-24 21:46 ` Dan Williams
2024-06-25 14:41 ` Terry Bowman
2024-06-26 2:54 ` Li, Ming4
2024-06-26 13:39 ` Terry Bowman
2024-06-17 20:04 ` [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
2024-06-19 7:09 ` Christoph Hellwig
2024-06-19 15:40 ` Terry Bowman
2024-06-20 13:11 ` Jonathan Cameron
2024-06-24 16:22 ` Terry Bowman
2024-07-10 21:47 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox