From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Dave Jiang <dave.jiang@intel.com>
Cc: <linux-cxl@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<dan.j.williams@intel.com>, <ira.weiny@intel.com>,
<vishal.l.verma@intel.com>, <alison.schofield@intel.com>,
<rostedt@goodmis.org>, <terry.bowman@amd.com>,
<bhelgaas@google.com>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<shiju.jose@huawei.com>
Subject: Re: [PATCH v4 09/11] cxl/pci: Add (hopeful) error handling support
Date: Fri, 6 Jan 2023 16:05:15 +0000 [thread overview]
Message-ID: <20230106160515.000046b8@huawei.com> (raw)
In-Reply-To: <166974413966.1608150.15522782911404473932.stgit@djiang5-desk3.ch.intel.com>
On Tue, 29 Nov 2022 10:48:59 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Add nominal error handling that tears down CXL.mem in response to error
> notifications that imply a device reset. Given some CXL.mem may be
> operating as System RAM, there is a high likelihood that these error
> events are fatal. However, if the system survives the notification the
> expectation is that the driver behavior is equivalent to a hot-unplug
> and re-plug of an endpoint.
>
> Note that this does not change the mask values from the default. That
> awaits CXL _OSC support to determine whether platform firmware is in
> control of the mask registers.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
I've been messing around with improving the qemu injection to do multiple
errors and ran into a bug...
I'll send a patch next week, but in meantime...
> ---
> +/*
> + * Log the state of the RAS status registers and prepare them to log the
> + * next error status. Return 1 if reset needed.
> + */
> +static bool cxl_report_and_clear(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> + struct device *dev = &cxlmd->dev;
> + u32 hl[CXL_HEADERLOG_SIZE_U32];
> + void __iomem *addr;
> + u32 status;
> + u32 fe;
> +
> + if (!cxlds->regs.ras)
> + return false;
> +
> + addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
> + status = le32_to_cpu((__force __le32)readl(addr));
> + if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
> + return false;
> +
> + /* If multiple errors, log header points to first error from ctrl reg */
> + if (hweight32(status) > 1) {
> + addr = cxlds->regs.ras + CXL_RAS_CAP_CONTROL_OFFSET;
> + fe = BIT(le32_to_cpu((__force __le32)readl(addr)) &
> + CXL_RAS_CAP_CONTROL_FE_MASK);
> + } else {
> + fe = status;
> + }
> +
> + header_log_copy(cxlds, hl);
> + trace_cxl_aer_uncorrectable_error(dev_name(dev), status, fe, hl);
> + writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
This address is meant to be that of the CXL_RAS_UNCORRECTABLE_STATUS register
but in the event hweight32(status) > 1 it's been ovewritten with the
address of CXL_RAS_CAP_CONTROL.
> +
> + return true;
> +}
> +
next prev parent reply other threads:[~2023-01-06 16:05 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-29 17:48 [PATCH v4 00/11] cxl/pci: Add fundamental error handling Dave Jiang
2022-11-29 17:48 ` [PATCH v4 01/11] cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers Dave Jiang
2022-11-29 17:48 ` [PATCH v4 02/11] cxl/pci: Cleanup cxl_map_device_regs() Dave Jiang
2022-11-29 17:48 ` [PATCH v4 03/11] cxl/pci: Kill cxl_map_regs() Dave Jiang
2022-11-29 17:48 ` [PATCH v4 04/11] cxl/core/regs: Make cxl_map_{component, device}_regs() device generic Dave Jiang
2022-11-29 17:48 ` [PATCH v4 05/11] cxl/port: Limit the port driver to just the HDM Decoder Capability Dave Jiang
2022-11-29 17:48 ` [PATCH v4 06/11] cxl/pci: Prepare for mapping RAS Capability Structure Dave Jiang
2022-11-29 17:48 ` [PATCH v4 07/11] cxl/pci: Find and map the " Dave Jiang
2022-11-29 17:48 ` [PATCH v4 08/11] cxl/pci: add tracepoint events for CXL RAS Dave Jiang
2022-11-29 19:45 ` Steven Rostedt
2022-11-29 17:48 ` [PATCH v4 09/11] cxl/pci: Add (hopeful) error handling support Dave Jiang
2023-01-06 16:05 ` Jonathan Cameron [this message]
2023-01-06 16:12 ` Dave Jiang
2022-11-29 17:49 ` [PATCH v4 10/11] PCI/AER: Add optional logging callback for correctable error Dave Jiang
2022-11-30 19:45 ` Bjorn Helgaas
2022-11-30 21:37 ` Dave Jiang
2022-11-30 22:11 ` [v5 10/11 PATCH] " Dave Jiang
2022-11-30 22:13 ` [v5 11/11 PATCH] cxl/pci: Add callback to log AER " Dave Jiang
2022-11-30 22:47 ` Bjorn Helgaas
2022-12-01 0:02 ` [v6 " Dave Jiang
2022-12-07 20:04 ` Terry Bowman
2022-12-07 20:29 ` Bjorn Helgaas
2022-12-07 20:54 ` Terry Bowman
2022-11-29 17:49 ` [PATCH v4 11/11] " Dave Jiang
2022-12-13 15:17 ` [PATCH v4 00/11] cxl/pci: Add fundamental error handling Jonathan Cameron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230106160515.000046b8@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.