From: Bjorn Helgaas <helgaas@kernel.org>
To: Terry Bowman <terry.bowman@amd.com>
Cc: Dave Jiang <dave.jiang@intel.com>,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
dan.j.williams@intel.com, ira.weiny@intel.com,
vishal.l.verma@intel.com, alison.schofield@intel.com,
Jonathan.Cameron@huawei.com, bhelgaas@google.com
Subject: Re: [v6 11/11 PATCH] cxl/pci: Add callback to log AER correctable error
Date: Wed, 7 Dec 2022 14:29:20 -0600
Message-ID: <20221207202920.GA1468863@bhelgaas>
In-Reply-To: <59c6e507-f67a-6ae5-4b3d-d836d86d5c0d@amd.com>
Hi Terry,
On Wed, Dec 07, 2022 at 02:04:17PM -0600, Terry Bowman wrote:
> On 11/30/22 18:02, Dave Jiang wrote:
> > Add AER error handler callback to read the RAS capability structure
> > correctable error (CE) status register for the CXL device. Log the
> > error as a trace event and clear the error. For CXL devices, the driver
> > also needs to write back to the status register to clear the
> > unmasked correctable errors.
> >
> > See CXL spec rev3.0 8.2.4.16 for RAS capability structure CE Status
> > Register.
> >
> > Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > ---
> >
> > v6:
> > - Update commit log to point to RAS capability structure. (Bjorn)
> > - Change cxl_correctable_error_logging() to cxl_cor_error_detected().
> > (Bjorn)
> >
> > drivers/cxl/pci.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 11f842df9807..02342830b612 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -622,10 +622,30 @@ static void cxl_error_resume(struct pci_dev *pdev)
> > dev->driver ? "successful" : "failed");
> > }
> >
> > +static void cxl_cor_error_detected(struct pci_dev *pdev)
> > +{
> > + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> > + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> > + struct device *dev = &cxlmd->dev;
> > + void __iomem *addr;
> > + u32 status;
> > +
> > + if (!cxlds->regs.ras)
> > + return;
> > +
> > + addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> > + status = le32_to_cpu(readl(addr));
> > + if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> > + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> > + trace_cxl_aer_correctable_error(dev_name(dev), status);
> > + }
> > +}
> > +
>
> This will log PCI AER CEs only if there is also a RAS CE. My
> understanding (could be the problem) is AER CE's are normally
> reported. Will this be inconsistent with other error AER CE
> handling?

I can't quite parse this, so let me ramble and see if we accidentally
converge on some understanding :)

cxl_cor_error_detected() is the .cor_error_detected handler, which is
called by the AER code in the PCI core. So IIUC, we'll only get here
if that PCI core AER code is invoked via an AER interrupt, AER
polling, or an event from the ACPI APEI framework.

So I would expect "this will only log CXL RAS CEs if there is a PCI
AER CE", which is the opposite of what you said. But I have no idea
at all about how CXL RAS CEs work.

It looks like aer_enable_rootport() sets PCI_ERR_ROOT_CMD_COR_EN, so I
would expect that AER CEs normally cause interrupts and would be
discovered that way.

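
To make the ordering above concrete, here is a rough userspace sketch
of it. This is not kernel code: struct mock_dev, mock_aer_core_handle(),
mock_cxl_cor_error_detected(), and MOCK_RAS_CE_STATUS_MASK are all
made-up names for illustration, and the write-back-to-clear behavior is
only emulated the way the patch appears to rely on it. The point is
just that the driver callback, and therefore the RAS CE logging, is
only reached when the core has an AER CE to handle.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical mask for this sketch only; the real
 * CXL_RAS_CORRECTABLE_STATUS_MASK lives in the CXL driver headers.
 */
#define MOCK_RAS_CE_STATUS_MASK 0x7fu

/* Stand-in for a device; none of these fields are real kernel fields. */
struct mock_dev {
	bool aer_ce_pending;     /* AER CE seen via interrupt, polling, or APEI */
	bool has_ras;            /* mirrors the cxlds->regs.ras NULL check */
	uint32_t ras_ce_status;  /* emulated RAS CE status register */
	unsigned int logged;     /* number of "trace events" emitted */
	void (*cor_error_detected)(struct mock_dev *dev);
};

/* Mirrors the shape of the handler in the patch, against mock state. */
static void mock_cxl_cor_error_detected(struct mock_dev *dev)
{
	uint32_t status;

	if (!dev->has_ras)
		return;

	status = dev->ras_ce_status;
	if (status & MOCK_RAS_CE_STATUS_MASK) {
		/* Emulate clearing the bits that were written back. */
		dev->ras_ce_status &= ~(status & MOCK_RAS_CE_STATUS_MASK);
		dev->logged++;
		printf("mock trace: RAS CE status 0x%x\n", (unsigned int)status);
	}
}

/* Stand-in for the PCI core AER path that calls the driver callback. */
static void mock_aer_core_handle(struct mock_dev *dev)
{
	/* No AER CE means the driver callback is never reached. */
	if (!dev->aer_ce_pending)
		return;

	dev->aer_ce_pending = false;
	if (dev->cor_error_detected)
		dev->cor_error_detected(dev);
}
```

With a RAS CE bit set but aer_ce_pending false, calling
mock_aer_core_handle() logs nothing and leaves the status untouched.
Once aer_ce_pending is true, the same call logs once and clears the
bit.
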
> > static const struct pci_error_handlers cxl_error_handlers = {
> > .error_detected = cxl_error_detected,
> > .slot_reset = cxl_slot_reset,
> > .resume = cxl_error_resume,
> > + .cor_error_detected = cxl_cor_error_detected,
> > };
> >
> > static struct pci_driver cxl_pci_driver = {
> >
> >
>
Thread overview: 25+ messages
2022-11-29 17:48 [PATCH v4 00/11] cxl/pci: Add fundamental error handling Dave Jiang
2022-11-29 17:48 ` [PATCH v4 01/11] cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers Dave Jiang
2022-11-29 17:48 ` [PATCH v4 02/11] cxl/pci: Cleanup cxl_map_device_regs() Dave Jiang
2022-11-29 17:48 ` [PATCH v4 03/11] cxl/pci: Kill cxl_map_regs() Dave Jiang
2022-11-29 17:48 ` [PATCH v4 04/11] cxl/core/regs: Make cxl_map_{component, device}_regs() device generic Dave Jiang
2022-11-29 17:48 ` [PATCH v4 05/11] cxl/port: Limit the port driver to just the HDM Decoder Capability Dave Jiang
2022-11-29 17:48 ` [PATCH v4 06/11] cxl/pci: Prepare for mapping RAS Capability Structure Dave Jiang
2022-11-29 17:48 ` [PATCH v4 07/11] cxl/pci: Find and map the " Dave Jiang
2022-11-29 17:48 ` [PATCH v4 08/11] cxl/pci: add tracepoint events for CXL RAS Dave Jiang
2022-11-29 19:45 ` Steven Rostedt
2022-11-29 17:48 ` [PATCH v4 09/11] cxl/pci: Add (hopeful) error handling support Dave Jiang
2023-01-06 16:05 ` Jonathan Cameron
2023-01-06 16:12 ` Dave Jiang
2022-11-29 17:49 ` [PATCH v4 10/11] PCI/AER: Add optional logging callback for correctable error Dave Jiang
2022-11-30 19:45 ` Bjorn Helgaas
2022-11-30 21:37 ` Dave Jiang
2022-11-30 22:11 ` [v5 10/11 PATCH] " Dave Jiang
2022-11-30 22:13 ` [v5 11/11 PATCH] cxl/pci: Add callback to log AER " Dave Jiang
2022-11-30 22:47 ` Bjorn Helgaas
2022-12-01 0:02 ` [v6 " Dave Jiang
2022-12-07 20:04 ` Terry Bowman
2022-12-07 20:29 ` Bjorn Helgaas [this message]
2022-12-07 20:54 ` Terry Bowman
2022-11-29 17:49 ` [PATCH v4 11/11] " Dave Jiang
2022-12-13 15:17 ` [PATCH v4 00/11] cxl/pci: Add fundamental error handling Jonathan Cameron