* [PATCH v6 0/8] cxl: support CXL memory RAS features
@ 2025-05-21 12:47 shiju.jose
2025-05-21 12:47 ` [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
` (11 more replies)
0 siblings, 12 replies; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Support for CXL memory EDAC features: patrol scrub, ECS, soft-PPR and
memory sparing.
A detailed history of the complete EDAC series with the CXL EDAC patches
is available up to V20 [1]; this CXL-specific series was split out from
V20 of that series.
The series is based on [2] v6.15-rc4 (per Dave's comment in the
thread [4]).
It also applies cleanly (no conflicts) and was tested on the cxl.git [3]
branch: next
1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
2. https://github.com/torvalds/linux.git
3. https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git
4. https://lore.kernel.org/all/d83a83d1-37e7-4192-913f-243098f679e3@intel.com/
Userspace code for the CXL memory repair features is at [5], and a
sample boot script for CXL memory repair is at [6].
[5]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
[6]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/
Changes
=======
v5 -> v6:
1. Addressed feedback from Randy Dunlap on the CXL EDAC documentation.
2. Feedback from Alison:
- Replaced #ifdef with IS_ENABLED() across the series.
- Fixed the kfree() oops in devm_cxl_memdev_edac_release() when
unloading the cxl-test module.
- Added separate helper functions for setting scrub attributes for
device scrub and region scrub.
- Renamed fields to scrub_cycle and scrub_region_id.
3. Feedback from Dave:
- Fixed the kfree() oops in devm_cxl_memdev_edac_release() when
unloading the cxl-test module.
- Added cxl_test inclusion of edac.o.
- Checked the return from cxl_feature_info() with IS_ERR across the series.
4. Rebased onto linux.git [2] v6.15-rc4 (per Dave's comment in the
thread [4]).
v4 -> v5:
1. Fixed a compilation warning introduced in v3->v4, reported by Dave Jiang on v4.
drivers/cxl/core/edac.c: In function ‘cxl_mem_perform_sparing’:
drivers/cxl/core/edac.c:1335:29: warning: the comparison will always evaluate as ‘true’ for the address of ‘validity_flags’ will never be NULL [-Waddress]
1335 | if (!rec->media_hdr.validity_flags)
| ^
In file included from ./drivers/cxl/cxlmem.h:10,
from drivers/cxl/core/edac.c:21:
./include/cxl/event.h:35:12: note: ‘validity_flags’ declared here
35 | u8 validity_flags[2];
| ^~~~~~~~~~~~~~
2. Updated patches with the tags received.
v3 -> v4:
1. Feedback from Dave Jiang on v3,
1.1. Changes for comments in EDAC scrub documentation for CXL use cases.
https://lore.kernel.org/all/2df68c68-f1a8-4327-abc9-d265326c133d@intel.com/
1.2. Changes for comments in CXL memory sparing control feature.
https://lore.kernel.org/all/4ee3323c-fb27-4fbe-b032-78fd54bc21a0@intel.com/
v2 -> v3:
1. Feedback from Dan Williams on v2,
https://lore.kernel.org/linux-mm/20250320180450.539-1-shiju.jose@huawei.com/
- Made get_support_feature_info() from the fwctl series generic for use in
both cxl/fwctl and cxl/edac, replacing cxl_get_feature_entry() in the CXL
EDAC series.
- Add usecase note for CXL ECS in Documentation/edac/scrub.rst.
- Added an info message for when a device scrub rate set by a region is
overwritten by a local device scrub rate or another region's scrub rate.
- Replace 'ps' with patrol_scrub in the patrol scrub feature.
- Removed the intermediate objects struct cxl_memdev_ps_params, enum
cxl_scrub_param, etc. for patrol scrub, and did the same for ECS.
- Rename CXL_MEMDEV_PS_* macros.
- Renamed scrub_cycle_hrs -> scrub_cycle_hours.
- Added 'if (!cxl_dev_name) return -ENOMEM;' to
devm_cxl_memdev_edac_register().
- Add devm_cxl_region_edac_register(cxlr) for CXL_PARTMODE_PMEM case.
- Add separate configurations for CXL scrub, ECS and memory repair
CXL_EDAC_SCRUB, CXL_EDAC_ECS and CXL_EDAC_MEM_REPAIR.
- Added 'if (!capable(CAP_SYS_RAWIO)) return -EPERM;' to the set-attribute
callbacks for CXL scrub, ECS and memory repair.
- In patch "cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command"
* cxl_do_maintenance() -> cxl_perform_maintenance() and moved to cxl/core/edac.c
* kmalloc() -> kvzalloc()
- In patch, "cxl: Support for finding memory operation attributes from the current boot"
* Moved code from drivers/cxl/core/ras.c to drivers/cxl/core/edac.c
* Add few logics to releasing the cache to give safety with respect to error storms and burning
* unlimited memory.
* Add estimated memory overhead expense of this feature documented in the Kconfig.
* Unified various names such as attr, param, attrbs throughout the patches.
* Moved > struct xarray rec_gen_media and struct xarray rec_dram; out of struct cxl_memdev
to CXL edac object, but there is required a pointer to this object in struct cxl_memdev
because the error records are reported and thus stored in the cxl_memdev context not
in the CXL EDAC context.
2. Feedback from Borislav on v2,
- In include/linux/edac.h
Replace EDAC_PPR -> EDAC_REPAIR_PPR
EDAC_CACHELINE_SPARING -> EDAC_REPAIR_CACHELINE_SPARING etc.
v1 -> v2:
1. Feedback from Dan Williams on v1,
https://lore.kernel.org/linux-mm/20250307091137.00006a0a@huawei.com/T/
- Fixed lock issues in region scrubbing; added local cxl_acquire()
and cxl_unlock().
- Replaced the CXL cat/echo examples in the EDAC .rst docs with a short
description and a reference to the ABI docs. Also corrected existing
descriptions as suggested by Dan.
- Added a policy description for the scrub control feature; this may
require input from CXL experts.
- Replaced CONFIG_CXL_RAS_FEATURES with CONFIG_CXL_EDAC_MEM_FEATURES.
- A few changes to the 'depends on' part of CONFIG_CXL_EDAC_MEM_FEATURES.
- Renamed drivers/cxl/core/memfeatures.c to drivers/cxl/core/edac.c.
- snprintf() -> kasprintf() in a few places.
2. Feedback from Alison on v1,
- In cxl_get_feature_entry() (patch 1), return NULL on failures and
reintroduced checks in cxl_get_feature_entry().
- Changed the logic of the for loop in the region-based scrubbing code.
- Replaced cxl_are_decoders_committed() with cxl_is_memdev_memory_online()
and added it as a local function in drivers/cxl/core/edac.c.
- Changed a few multiline comments to single-line comments.
- Removed unnecessary comments from the code.
- Reduced the line length of a few macros in the ECS and memory repair code.
- In new files, changed "GPL-2.0-or-later" -> "GPL-2.0-only".
- Ran clang-format for new files and updated.
3. Changes for feedback from Jonathan on v1:
- Changed a few multiline comments to single-line comments.
Shiju Jose (8):
EDAC: Update documentation for the CXL memory patrol scrub control
feature
cxl: Update prototype of function get_support_feature_info()
cxl/edac: Add CXL memory device patrol scrub control feature
cxl/edac: Add CXL memory device ECS control feature
cxl/edac: Add support for PERFORM_MAINTENANCE command
cxl/edac: Support for finding memory operation attributes from the
current boot
cxl/edac: Add CXL memory device memory sparing control feature
cxl/edac: Add CXL memory device soft PPR control feature
Documentation/edac/memory_repair.rst | 31 +
Documentation/edac/scrub.rst | 76 +
drivers/cxl/Kconfig | 71 +
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/edac.c | 2103 ++++++++++++++++++++++++++
drivers/cxl/core/features.c | 17 +-
drivers/cxl/core/mbox.c | 11 +-
drivers/cxl/core/memdev.c | 1 +
drivers/cxl/core/region.c | 10 +
drivers/cxl/cxl.h | 10 +
drivers/cxl/cxlmem.h | 30 +
drivers/cxl/mem.c | 4 +
drivers/edac/mem_repair.c | 9 +
include/linux/edac.h | 7 +
tools/testing/cxl/Kbuild | 1 +
16 files changed, 2372 insertions(+), 12 deletions(-)
create mode 100644 drivers/cxl/core/edac.c
--
2.43.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-21 16:28 ` Fan Ni
2025-05-21 12:47 ` [PATCH v6 2/8] cxl: Update prototype of function get_support_feature_info() shiju.jose
` (10 subsequent siblings)
11 siblings, 1 reply; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Update Documentation/edac/scrub.rst to include use cases and policies
for CXL memory device-based patrol scrub control, CXL region-based
patrol scrub control, and CXL Error Check Scrub (ECS).
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/scrub.rst | 76 ++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)
diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
index daab929cdba1..2cfa74fa1ffd 100644
--- a/Documentation/edac/scrub.rst
+++ b/Documentation/edac/scrub.rst
@@ -264,3 +264,79 @@ Sysfs files are documented in
`Documentation/ABI/testing/sysfs-edac-scrub`
`Documentation/ABI/testing/sysfs-edac-ecs`
+
+Examples
+--------
+
+The usage takes the form shown in these examples:
+
+1. CXL memory Patrol Scrub
+
+The following use cases have been identified for increasing the scrub rate:
+
+- Scrubbing is needed at device granularity because a device is showing
+ unexpectedly high errors.
+
+- Scrubbing may apply to memory that isn't online at all yet. Likely this
+ is a system wide default setting on boot.
+
+- Scrubbing at a higher rate because the monitor software has determined that
+ more reliability is necessary for a particular data set. This is called
+ Differentiated Reliability.
+
+1.1. Device based scrubbing
+
+CXL memory is exposed to the memory management subsystem, and ultimately to
+userspace, via CXL devices. Device-based scrubbing is used for the first use
+case described in "Section 1 CXL Memory Patrol Scrub".
+
+For combining control via the device interfaces and region interfaces,
+see "Section 1.2 Region based scrubbing".
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
+
+1.2. Region based scrubbing
+
+CXL memory is exposed to the memory management subsystem, and ultimately to
+userspace, via CXL regions. CXL regions represent mapped memory capacity in system
+physical address space. These can incorporate one or more parts of multiple CXL
+memory devices with traffic interleaved across them. The user may want to control
+the scrub rate via this more abstract region instead of having to figure out the
+constituent devices and program them separately. The scrub rate for each device
+covers the whole device. Thus if multiple regions use parts of that device then
+requests for scrubbing of other regions may result in a higher scrub rate than
+requested for this specific region.
+
+Region-based scrubbing is used for the third use case described in
+"Section 1 CXL Memory Patrol Scrub".
+
+Userspace must follow the rules below when setting the scrub rates for any
+mixture of requirements.
+
+1. Take each region in turn, from lowest desired scrub rate to highest, and
+   set its scrub rate. Later regions may override the scrub rate on individual
+   devices (and hence potentially whole regions).
+
+2. Take each device for which enhanced scrubbing is required (higher rate)
+   and set its scrub rate. This overrides the scrub rates of individual
+   devices, setting them to the maximum rate required for any of the regions
+   they help to back, unless a specific rate is already defined (see the example below).
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
+
+2. CXL memory Error Check Scrub (ECS)
+
+The Error Check Scrub (ECS) feature enables a memory device to perform error
+checking and correction (ECC) and count single-bit errors. The associated
+memory controller sets the ECS mode with a trigger sent to the memory
+device. CXL ECS control allows the host, and thus userspace, to change the
+attributes for the error count mode, the threshold number of errors per
+segment (indicating how many segments have at least that number of errors)
+used for reporting errors, and to reset the ECS counter. The responsibility
+for initiating Error Check Scrub on a memory device may thus lie with the
+memory controller or platform when unexpectedly high error rates are detected.
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-ecs`
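As an illustration of the region-then-device ordering in section 1.2 above,
here is a minimal userspace sketch. It is not part of the patch: it assumes
the current_cycle_duration attribute name documented in
Documentation/ABI/testing/sysfs-edac-scrub, a scrub instance named scrub0,
and hypothetical region/device names.

  #include <stdio.h>

  /* Write a scrub cycle (in seconds) to an EDAC scrub control. */
  static int set_scrub_cycle(const char *edac_dev, unsigned int secs)
  {
          char path[256];
          FILE *f;

          snprintf(path, sizeof(path),
                   "/sys/bus/edac/devices/%s/scrub0/current_cycle_duration",
                   edac_dev);
          f = fopen(path, "w");
          if (!f)
                  return -1;
          fprintf(f, "%u\n", secs);
          return fclose(f);
  }

  int main(void)
  {
          /* Rule 1: regions first, lowest desired scrub rate (longest
           * cycle) to highest. */
          set_scrub_cycle("cxl_region1", 24 * 3600);
          set_scrub_cycle("cxl_region0", 12 * 3600);
          /* Rule 2: a device needing enhanced scrubbing overrides the
           * region derived rate on that device. */
          set_scrub_cycle("cxl_mem0", 6 * 3600);
          return 0;
  }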
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v6 2/8] cxl: Update prototype of function get_support_feature_info()
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
2025-05-21 12:47 ` [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-21 16:31 ` Fan Ni
2025-05-21 12:47 ` [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
` (9 subsequent siblings)
11 siblings, 1 reply; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Make the following changes to function get_support_feature_info():
1. Make it generic so it can be shared between the cxl-fwctl and cxl-edac paths.
2. Rename get_support_feature_info() to cxl_feature_info().
3. Change the parameter const struct fwctl_rpc_cxl *rpc_in to
const uuid_t *uuid.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/core.h | 2 ++
drivers/cxl/core/features.c | 17 +++++++----------
2 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 17b692eb3257..613cce5c4f7b 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -124,6 +124,8 @@ int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
int nid, resource_size_t *size);
#ifdef CONFIG_CXL_FEATURES
+struct cxl_feat_entry *
+cxl_feature_info(struct cxl_features_state *cxlfs, const uuid_t *uuid);
size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
enum cxl_get_feat_selection selection,
void *feat_out, size_t feat_out_size, u16 offset,
diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
index 1498e2369c37..a83a2214a136 100644
--- a/drivers/cxl/core/features.c
+++ b/drivers/cxl/core/features.c
@@ -355,17 +355,11 @@ static void cxlctl_close_uctx(struct fwctl_uctx *uctx)
{
}
-static struct cxl_feat_entry *
-get_support_feature_info(struct cxl_features_state *cxlfs,
- const struct fwctl_rpc_cxl *rpc_in)
+struct cxl_feat_entry *
+cxl_feature_info(struct cxl_features_state *cxlfs,
+ const uuid_t *uuid)
{
struct cxl_feat_entry *feat;
- const uuid_t *uuid;
-
- if (rpc_in->op_size < sizeof(uuid))
- return ERR_PTR(-EINVAL);
-
- uuid = &rpc_in->set_feat_in.uuid;
for (int i = 0; i < cxlfs->entries->num_features; i++) {
feat = &cxlfs->entries->ent[i];
@@ -547,7 +541,10 @@ static bool cxlctl_validate_set_features(struct cxl_features_state *cxlfs,
struct cxl_feat_entry *feat;
u32 flags;
- feat = get_support_feature_info(cxlfs, rpc_in);
+ if (rpc_in->op_size < sizeof(uuid_t))
+ return false;
+
+ feat = cxl_feature_info(cxlfs, &rpc_in->set_feat_in.uuid);
if (IS_ERR(feat))
return false;
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
2025-05-21 12:47 ` [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
2025-05-21 12:47 ` [PATCH v6 2/8] cxl: Update prototype of function get_support_feature_info() shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-21 14:40 ` Jonathan Cameron
2025-05-21 17:07 ` Alison Schofield
2025-05-21 12:47 ` [PATCH v6 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
` (8 subsequent siblings)
11 siblings, 2 replies; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
control feature. The device patrol scrub proactively locates and makes
corrections to errors in a regular cycle.
Allow specifying the number of hours within which the patrol scrub must be
completed, subject to minimum and maximum limits reported by the device.
Also allow disabling scrub, trading off error rates against performance.
Add support for patrol scrub control on CXL memory devices.
Register with the EDAC device driver, which retrieves the scrub attribute
descriptors from EDAC scrub and exposes the sysfs scrub control attributes
to userspace. For example, scrub control for the CXL memory device
"cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
Additionally, add support for region-based CXL memory patrol scrub control.
CXL memory regions may be interleaved across one or more CXL memory
devices. For example, region-based scrub control for "cxl_region1" is
exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
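For reference, the two-byte writable payload that this patch encodes with
CXL_SET_SCRUB_CYCLE()/CXL_SET_SCRUB_EN() can be sketched standalone as
below; plain shifts stand in for FIELD_PREP(), and the 12 hour cycle is an
arbitrary example value.

  #include <stdint.h>
  #include <stdio.h>

  /* Writable attributes, per CXL 3.2 Table 8-223 (two bytes). */
  struct scrub_wr_attrbs {
          uint8_t scrub_cycle_hours;      /* bits[7:0]: requested cycle */
          uint8_t scrub_flags;            /* bit[0]: scrub enable */
  };

  int main(void)
  {
          uint32_t secs = 12 * 3600;      /* requested 12 hour cycle */
          struct scrub_wr_attrbs wr = {
                  .scrub_cycle_hours = secs / 3600,
                  .scrub_flags = 1 << 0,  /* enable background scrub */
          };

          printf("cycle=%u hours, flags=0x%x\n",
                 wr.scrub_cycle_hours, wr.scrub_flags);
          return 0;
  }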
[dj: Add cxl_test inclusion of edac.o]
[dj: Check return from cxl_feature_info() with IS_ERR]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/Kconfig | 33 +++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/edac.c | 520 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/region.c | 10 +
drivers/cxl/cxl.h | 10 +
drivers/cxl/cxlmem.h | 14 +
drivers/cxl/mem.c | 4 +
tools/testing/cxl/Kbuild | 1 +
8 files changed, 593 insertions(+)
create mode 100644 drivers/cxl/core/edac.c
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index cf1ba673b8c2..af72416edcd4 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -114,6 +114,39 @@ config CXL_FEATURES
If unsure say 'n'
+config CXL_EDAC_MEM_FEATURES
+ bool "CXL: EDAC Memory Features"
+ depends on EXPERT
+ depends on CXL_MEM
+ depends on CXL_FEATURES
+ depends on EDAC >= CXL_BUS
+ help
+ The CXL EDAC memory feature is optional and allows the host to
+ control the EDAC memory feature configurations of CXL memory
+ expander devices.
+
+ Say 'y' if you have an expert need to change default settings
+ of a memory RAS feature established by the platform/device.
+ Otherwise say 'n'.
+
+config CXL_EDAC_SCRUB
+ bool "Enable CXL Patrol Scrub Control (Patrol Read)"
+ depends on CXL_EDAC_MEM_FEATURES
+ depends on EDAC_SCRUB
+ help
+ The CXL EDAC scrub control is optional and allows the host to
+ control the scrub feature configurations of CXL memory expander
+ devices.
+
+ When enabled, 'cxl_mem' and 'cxl_region' EDAC devices are
+ published with memory scrub control attributes as described by
+ Documentation/ABI/testing/sysfs-edac-scrub.
+
+ Say 'y' if you have an expert need to change default settings
+ of a memory scrub feature established by the platform/device
+ (e.g. scrub rates for the patrol scrub feature).
+ Otherwise say 'n'.
+
config CXL_PORT
default CXL_BUS
tristate
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 086df97a0fcf..79e2ef81fde8 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -20,3 +20,4 @@ cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
cxl_core-$(CONFIG_CXL_MCE) += mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += features.o
+cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
new file mode 100644
index 000000000000..eae99ed7c018
--- /dev/null
+++ b/drivers/cxl/core/edac.c
@@ -0,0 +1,520 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CXL EDAC memory feature driver.
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ *
+ * - Supports functions to configure the EDAC features of
+ * CXL memory devices.
+ * - Registers with the EDAC device subsystem driver to expose
+ * the features' sysfs attributes to the user for configuring
+ * CXL memory RAS features.
+ */
+
+#include <linux/cleanup.h>
+#include <linux/edac.h>
+#include <linux/limits.h>
+#include <cxl/features.h>
+#include <cxl.h>
+#include <cxlmem.h>
+#include "core.h"
+
+#define CXL_NR_EDAC_DEV_FEATURES 1
+
+#define CXL_SCRUB_NO_REGION -1
+
+struct cxl_patrol_scrub_context {
+ u8 instance;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 effects;
+ struct cxl_memdev *cxlmd;
+ struct cxl_region *cxlr;
+};
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub Control
+ * Feature Readable Attributes.
+ */
+struct cxl_scrub_rd_attrbs {
+ u8 scrub_cycle_cap;
+ __le16 scrub_cycle_hours;
+ u8 scrub_flags;
+} __packed;
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-223 Device Patrol Scrub Control
+ * Feature Writable Attributes.
+ */
+struct cxl_scrub_wr_attrbs {
+ u8 scrub_cycle_hours;
+ u8 scrub_flags;
+} __packed;
+
+#define CXL_SCRUB_CONTROL_CHANGEABLE BIT(0)
+#define CXL_SCRUB_CONTROL_REALTIME BIT(1)
+#define CXL_SCRUB_CONTROL_CYCLE_MASK GENMASK(7, 0)
+#define CXL_SCRUB_CONTROL_MIN_CYCLE_MASK GENMASK(15, 8)
+#define CXL_SCRUB_CONTROL_ENABLE BIT(0)
+
+#define CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap) \
+ FIELD_GET(CXL_SCRUB_CONTROL_CHANGEABLE, cap)
+#define CXL_GET_SCRUB_CYCLE(cycle) \
+ FIELD_GET(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
+#define CXL_GET_SCRUB_MIN_CYCLE(cycle) \
+ FIELD_GET(CXL_SCRUB_CONTROL_MIN_CYCLE_MASK, cycle)
+#define CXL_GET_SCRUB_EN_STS(flags) FIELD_GET(CXL_SCRUB_CONTROL_ENABLE, flags)
+
+#define CXL_SET_SCRUB_CYCLE(cycle) \
+ FIELD_PREP(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
+#define CXL_SET_SCRUB_EN(en) FIELD_PREP(CXL_SCRUB_CONTROL_ENABLE, en)
+
+static int cxl_mem_scrub_get_attrbs(struct cxl_mailbox *cxl_mbox, u8 *cap,
+ u16 *cycle, u8 *flags, u8 *min_cycle)
+{
+ size_t rd_data_size = sizeof(struct cxl_scrub_rd_attrbs);
+ size_t data_size;
+ struct cxl_scrub_rd_attrbs *rd_attrbs __free(kfree) =
+ kzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrbs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+ rd_data_size, 0, NULL);
+ if (!data_size)
+ return -EIO;
+
+ *cap = rd_attrbs->scrub_cycle_cap;
+ *cycle = le16_to_cpu(rd_attrbs->scrub_cycle_hours);
+ *flags = rd_attrbs->scrub_flags;
+ if (min_cycle)
+ *min_cycle = CXL_GET_SCRUB_MIN_CYCLE(*cycle);
+
+ return 0;
+}
+
+static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
+ u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
+{
+ struct cxl_mailbox *cxl_mbox;
+ u8 min_scrub_cycle = U8_MAX;
+ struct cxl_region_params *p;
+ struct cxl_memdev *cxlmd;
+ struct cxl_region *cxlr;
+ int i, ret;
+
+ if (!cxl_ps_ctx->cxlr) {
+ cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
+ return cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
+ flags, min_cycle);
+ }
+
+ struct rw_semaphore *region_lock __free(rwsem_read_release) =
+ rwsem_read_intr_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ cxlr = cxl_ps_ctx->cxlr;
+ p = &cxlr->params;
+
+ for (i = 0; i < p->nr_targets; i++) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
+ flags, min_cycle);
+ if (ret)
+ return ret;
+
+ if (min_cycle)
+ min_scrub_cycle =
+ min(*min_cycle, min_scrub_cycle);
+ }
+
+ if (min_cycle)
+ *min_cycle = min_scrub_cycle;
+
+ return 0;
+}
+
+static int cxl_scrub_set_attrbs_region(struct device *dev,
+ struct cxl_patrol_scrub_context *cxl_ps_ctx,
+ u8 cycle, u8 flags)
+{
+ struct cxl_scrub_wr_attrbs wr_attrbs;
+ struct cxl_mailbox *cxl_mbox;
+ struct cxl_region_params *p;
+ struct cxl_memdev *cxlmd;
+ struct cxl_region *cxlr;
+ int ret, i;
+
+ struct rw_semaphore *region_lock __free(rwsem_read_release) =
+ rwsem_read_intr_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ cxlr = cxl_ps_ctx->cxlr;
+ p = &cxlr->params;
+ wr_attrbs.scrub_cycle_hours = cycle;
+ wr_attrbs.scrub_flags = flags;
+
+ for (i = 0; i < p->nr_targets; i++) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+ cxl_ps_ctx->set_version, &wr_attrbs,
+ sizeof(wr_attrbs),
+ CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET,
+ 0, NULL);
+ if (ret)
+ return ret;
+
+ if (cycle != cxlmd->scrub_cycle) {
+ if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
+ dev_info(dev,
+ "Device scrub rate(%d hours) set by region%d rate overwritten by region%d scrub rate(%d hours)\n",
+ cxlmd->scrub_cycle,
+ cxlmd->scrub_region_id, cxlr->id,
+ cycle);
+
+ cxlmd->scrub_cycle = cycle;
+ cxlmd->scrub_region_id = cxlr->id;
+ }
+ }
+
+ return 0;
+}
+
+static int cxl_scrub_set_attrbs_device(struct device *dev,
+ struct cxl_patrol_scrub_context *cxl_ps_ctx,
+ u8 cycle, u8 flags)
+{
+ struct cxl_scrub_wr_attrbs wr_attrbs;
+ struct cxl_mailbox *cxl_mbox;
+ struct cxl_memdev *cxlmd;
+ int ret;
+
+ wr_attrbs.scrub_cycle_hours = cycle;
+ wr_attrbs.scrub_flags = flags;
+
+ cxlmd = cxl_ps_ctx->cxlmd;
+ cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+ cxl_ps_ctx->set_version, &wr_attrbs,
+ sizeof(wr_attrbs),
+ CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET, 0,
+ NULL);
+ if (ret)
+ return ret;
+
+ if (cycle != cxlmd->scrub_cycle) {
+ if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
+ dev_info(dev,
+ "Device scrub rate(%d hours) set by region%d rate overwritten with device local scrub rate(%d hours)\n",
+ cxlmd->scrub_cycle, cxlmd->scrub_region_id,
+ cycle);
+
+ cxlmd->scrub_cycle = cycle;
+ cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
+ }
+
+ return 0;
+}
+
+static int cxl_scrub_set_attrbs(struct device *dev,
+ struct cxl_patrol_scrub_context *cxl_ps_ctx,
+ u8 cycle, u8 flags)
+{
+ if (cxl_ps_ctx->cxlr)
+ return cxl_scrub_set_attrbs_region(dev, cxl_ps_ctx, cycle, flags);
+
+ return cxl_scrub_set_attrbs_device(dev, cxl_ps_ctx, cycle, flags);
+}
+
+static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data,
+ bool *enabled)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ u8 cap, flags;
+ u16 cycle;
+ int ret;
+
+ ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
+ if (ret)
+ return ret;
+
+ *enabled = CXL_GET_SCRUB_EN_STS(flags);
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data,
+ bool enable)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ u8 cap, flags, wr_cycle;
+ u16 rd_cycle;
+ int ret;
+
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
+ ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, NULL);
+ if (ret)
+ return ret;
+
+ wr_cycle = CXL_GET_SCRUB_CYCLE(rd_cycle);
+ flags = CXL_SET_SCRUB_EN(enable);
+
+ return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
+}
+
+static int cxl_patrol_scrub_get_min_scrub_cycle(struct device *dev,
+ void *drv_data, u32 *min)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ u8 cap, flags, min_cycle;
+ u16 cycle;
+ int ret;
+
+ ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, &min_cycle);
+ if (ret)
+ return ret;
+
+ *min = min_cycle * 3600;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_get_max_scrub_cycle(struct device *dev,
+ void *drv_data, u32 *max)
+{
+ *max = U8_MAX * 3600; /* Max set by register size */
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_get_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *scrub_cycle_secs)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ u8 cap, flags;
+ u16 cycle;
+ int ret;
+
+ ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
+ if (ret)
+ return ret;
+
+ *scrub_cycle_secs = CXL_GET_SCRUB_CYCLE(cycle) * 3600;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_set_scrub_cycle(struct device *dev, void *drv_data,
+ u32 scrub_cycle_secs)
+{
+ struct cxl_patrol_scrub_context *ctx = drv_data;
+ u8 scrub_cycle_hours = scrub_cycle_secs / 3600;
+ u8 cap, wr_cycle, flags, min_cycle;
+ u16 rd_cycle;
+ int ret;
+
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
+ ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, &min_cycle);
+ if (ret)
+ return ret;
+
+ if (!CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap))
+ return -EOPNOTSUPP;
+
+ if (scrub_cycle_hours < min_cycle) {
+ dev_dbg(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
+ scrub_cycle_hours);
+ dev_dbg(dev,
+ "Minimum supported CXL patrol scrub cycle in hour %d\n",
+ min_cycle);
+ return -EINVAL;
+ }
+ wr_cycle = CXL_SET_SCRUB_CYCLE(scrub_cycle_hours);
+
+ return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
+}
+
+static const struct edac_scrub_ops cxl_ps_scrub_ops = {
+ .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
+ .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
+ .get_min_cycle = cxl_patrol_scrub_get_min_scrub_cycle,
+ .get_max_cycle = cxl_patrol_scrub_get_max_scrub_cycle,
+ .get_cycle_duration = cxl_patrol_scrub_get_scrub_cycle,
+ .set_cycle_duration = cxl_patrol_scrub_set_scrub_cycle,
+};
+
+static int cxl_memdev_scrub_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature,
+ u8 scrub_inst)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx;
+ struct cxl_feat_entry *feat_entry;
+ u8 cap, flags;
+ u16 cycle;
+ int rc;
+
+ feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
+ &CXL_FEAT_PATROL_SCRUB_UUID);
+ if (IS_ERR(feat_entry))
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+ if (!cxl_ps_ctx)
+ return -ENOMEM;
+
+ *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .instance = scrub_inst,
+ .cxlmd = cxlmd,
+ };
+
+ rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap, &cycle,
+ &flags, NULL);
+ if (rc)
+ return rc;
+
+ cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
+ cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
+
+ ras_feature->ft_type = RAS_FEAT_SCRUB;
+ ras_feature->instance = cxl_ps_ctx->instance;
+ ras_feature->scrub_ops = &cxl_ps_scrub_ops;
+ ras_feature->ctx = cxl_ps_ctx;
+
+ return 0;
+}
+
+static int cxl_region_scrub_init(struct cxl_region *cxlr,
+ struct edac_dev_feature *ras_feature,
+ u8 scrub_inst)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx;
+ struct cxl_region_params *p = &cxlr->params;
+ struct cxl_feat_entry *feat_entry = NULL;
+ struct cxl_memdev *cxlmd;
+ u8 cap, flags;
+ u16 cycle;
+ int i, rc;
+
+ /*
+ * The cxl_region_rwsem must be held if the code below is used in a context
+ * other than when the region is in the probe state, as shown here.
+ */
+ for (i = 0; i < p->nr_targets; i++) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
+ &CXL_FEAT_PATROL_SCRUB_UUID);
+ if (IS_ERR(feat_entry))
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) &
+ CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap,
+ &cycle, &flags, NULL);
+ if (rc)
+ return rc;
+
+ cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
+ cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
+ }
+
+ cxl_ps_ctx = devm_kzalloc(&cxlr->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+ if (!cxl_ps_ctx)
+ return -ENOMEM;
+
+ *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .instance = scrub_inst,
+ .cxlr = cxlr,
+ };
+
+ ras_feature->ft_type = RAS_FEAT_SCRUB;
+ ras_feature->instance = cxl_ps_ctx->instance;
+ ras_feature->scrub_ops = &cxl_ps_scrub_ops;
+ ras_feature->ctx = cxl_ps_ctx;
+
+ return 0;
+}
+
+int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
+{
+ struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
+ int num_ras_features = 0;
+ int rc;
+
+ if (IS_ENABLED(CONFIG_CXL_EDAC_SCRUB)) {
+ rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features], 0);
+ if (rc < 0 && rc != -EOPNOTSUPP)
+ return rc;
+
+ if (rc != -EOPNOTSUPP)
+ num_ras_features++;
+ }
+
+ if (!num_ras_features)
+ return -EINVAL;
+
+ char *cxl_dev_name __free(kfree) =
+ kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
+ if (!cxl_dev_name)
+ return -ENOMEM;
+
+ return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
+ num_ras_features, ras_features);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
+
+int devm_cxl_region_edac_register(struct cxl_region *cxlr)
+{
+ struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
+ int num_ras_features = 0;
+ int rc;
+
+ if (!IS_ENABLED(CONFIG_CXL_EDAC_SCRUB))
+ return 0;
+
+ rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features], 0);
+ if (rc < 0)
+ return rc;
+
+ num_ras_features++;
+
+ char *cxl_dev_name __free(kfree) =
+ kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
+ if (!cxl_dev_name)
+ return -ENOMEM;
+
+ return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
+ num_ras_features, ras_features);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c3f4dc244df7..d5b8108c4a6d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3537,8 +3537,18 @@ static int cxl_region_probe(struct device *dev)
switch (cxlr->mode) {
case CXL_PARTMODE_PMEM:
+ rc = devm_cxl_region_edac_register(cxlr);
+ if (rc)
+ dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
+ cxlr->id);
+
return devm_cxl_add_pmem_region(cxlr);
case CXL_PARTMODE_RAM:
+ rc = devm_cxl_region_edac_register(cxlr);
+ if (rc)
+ dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
+ cxlr->id);
+
/*
* The region can not be managed by CXL if any portion of
* it is already online as 'System RAM'
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a9ab46eb0610..8a252f8483f7 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -912,4 +912,14 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
u16 cxl_gpf_get_dvsec(struct device *dev);
+static inline struct rw_semaphore *rwsem_read_intr_acquire(struct rw_semaphore *rwsem)
+{
+ if (down_read_interruptible(rwsem))
+ return NULL;
+
+ return rwsem;
+}
+
+DEFINE_FREE(rwsem_read_release, struct rw_semaphore *, if (_T) up_read(_T))
+
#endif /* __CXL_H__ */
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..872131009e4c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -45,6 +45,8 @@
* @endpoint: connection to the CXL port topology for this memory device
* @id: id number of this memdev instance.
* @depth: endpoint port depth
+ * @scrub_cycle: current scrub cycle set for this device
+ * @scrub_region_id: id number of the backed region (if any) for which the current scrub cycle is set
*/
struct cxl_memdev {
struct device dev;
@@ -56,6 +58,8 @@ struct cxl_memdev {
struct cxl_port *endpoint;
int id;
int depth;
+ u8 scrub_cycle;
+ int scrub_region_id;
};
static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
@@ -853,6 +857,16 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+#ifdef CONFIG_CXL_EDAC_MEM_FEATURES
+int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd);
+int devm_cxl_region_edac_register(struct cxl_region *cxlr);
+#else
+static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
+{ return 0; }
+static inline int devm_cxl_region_edac_register(struct cxl_region *cxlr)
+{ return 0; }
+#endif
+
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 9675243bd05b..6e6777b7bafb 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -180,6 +180,10 @@ static int cxl_mem_probe(struct device *dev)
return rc;
}
+ rc = devm_cxl_memdev_edac_register(cxlmd);
+ if (rc)
+ dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
+
/*
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 387f3df8b988..31a2d73c963f 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -67,6 +67,7 @@ cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
+cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
cxl_core-y += config_check.o
cxl_core-y += cxl_core_test.o
cxl_core-y += cxl_core_exports.o
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v6 4/8] cxl/edac: Add CXL memory device ECS control feature
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (2 preceding siblings ...)
2025-05-21 12:47 ` [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-21 12:47 ` [PATCH v6 5/8] cxl/edac: Add support for PERFORM_MAINTENANCE command shiju.jose
` (7 subsequent siblings)
11 siblings, 0 replies; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.2 section 8.2.10.9.11.2 describes the DDR5 ECS (Error Check
Scrub) control feature.
Error Check Scrub (ECS) is a feature defined in the JEDEC DDR5 SDRAM
Specification (JESD79-5). It allows the DRAM to internally read data,
correct single-bit errors, and write back the corrected data bits to the
DRAM array while providing transparency to error counts.
The ECS control allows the requester to change the log entry type, the ECS
threshold count (provided the request falls within the limits specified in
DDR5 mode registers), switch between codeword mode and row count mode, and
reset the ECS counter.
Register with the EDAC device driver, which retrieves the ECS attribute
descriptors from EDAC ECS and exposes the ECS control attributes to
userspace via sysfs. For example, the ECS control for memory media FRU0
in the CXL mem0 device is located at /sys/bus/edac/devices/cxl_mem0/ecs_fru0/.
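To make the threshold encoding concrete: the writable ECS config field
carries an index rather than the raw count. A standalone sketch mirroring
the ecs_supp_threshold[] table added by this patch:

  #include <stdint.h>
  #include <stdio.h>

  /* Index encoding follows the DDR5 mode-register style used here. */
  static const uint16_t ecs_supp_threshold[] = {
          [3] = 256, [4] = 1024, [5] = 4096,
  };

  static int ecs_threshold_to_index(uint32_t val)
  {
          switch (val) {
          case 256:  return 3;
          case 1024: return 4;
          case 4096: return 5;
          default:   return -1;   /* unsupported threshold count */
          }
  }

  int main(void)
  {
          int idx = ecs_threshold_to_index(1024);

          printf("count=1024 -> index=%d -> count=%u\n",
                 idx, (unsigned int)ecs_supp_threshold[idx]);
          return 0;
  }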
[dj: Check return from cxl_feature_info() with IS_ERR]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/Kconfig | 17 ++
drivers/cxl/core/edac.c | 359 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 375 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index af72416edcd4..51987f2a2548 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -147,6 +147,23 @@ config CXL_EDAC_SCRUB
(e.g. scrub rates for the patrol scrub feature).
Otherwise say 'n'.
+config CXL_EDAC_ECS
+ bool "Enable CXL Error Check Scrub (Repair)"
+ depends on CXL_EDAC_MEM_FEATURES
+ depends on EDAC_ECS
+ help
+ The CXL EDAC ECS control is optional and allows the host to
+ control the ECS feature configurations of CXL memory expander
+ devices.
+
+ When enabled, 'cxl_mem' EDAC devices are published with memory
+ ECS control attributes as described by
+ Documentation/ABI/testing/sysfs-edac-ecs.
+
+ Say 'y' if you have an expert need to change default settings
+ of a memory ECS feature established by the platform/device.
+ Otherwise say 'n'.
+
config CXL_PORT
default CXL_BUS
tristate
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index eae99ed7c018..9c15d4ce2162 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -19,7 +19,7 @@
#include <cxlmem.h>
#include "core.h"
-#define CXL_NR_EDAC_DEV_FEATURES 1
+#define CXL_NR_EDAC_DEV_FEATURES 2
#define CXL_SCRUB_NO_REGION -1
@@ -466,6 +466,354 @@ static int cxl_region_scrub_init(struct cxl_region *cxlr,
return 0;
}
+struct cxl_ecs_context {
+ u16 num_media_frus;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 effects;
+ struct cxl_memdev *cxlmd;
+};
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-225 DDR5 ECS Control Feature
+ * Readable Attributes.
+ */
+struct cxl_ecs_fru_rd_attrbs {
+ u8 ecs_cap;
+ __le16 ecs_config;
+ u8 ecs_flags;
+} __packed;
+
+struct cxl_ecs_rd_attrbs {
+ u8 ecs_log_cap;
+ struct cxl_ecs_fru_rd_attrbs fru_attrbs[];
+} __packed;
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-226 DDR5 ECS Control Feature
+ * Writable Attributes.
+ */
+struct cxl_ecs_fru_wr_attrbs {
+ __le16 ecs_config;
+} __packed;
+
+struct cxl_ecs_wr_attrbs {
+ u8 ecs_log_cap;
+ struct cxl_ecs_fru_wr_attrbs fru_attrbs[];
+} __packed;
+
+#define CXL_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0)
+#define CXL_ECS_REALTIME_REPORT_CAP_MASK BIT(0)
+#define CXL_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0)
+#define CXL_ECS_COUNT_MODE_MASK BIT(3)
+#define CXL_ECS_RESET_COUNTER_MASK BIT(4)
+#define CXL_ECS_RESET_COUNTER 1
+
+enum {
+ ECS_THRESHOLD_256 = 256,
+ ECS_THRESHOLD_1024 = 1024,
+ ECS_THRESHOLD_4096 = 4096,
+};
+
+enum {
+ ECS_THRESHOLD_IDX_256 = 3,
+ ECS_THRESHOLD_IDX_1024 = 4,
+ ECS_THRESHOLD_IDX_4096 = 5,
+};
+
+static const u16 ecs_supp_threshold[] = {
+ [ECS_THRESHOLD_IDX_256] = 256,
+ [ECS_THRESHOLD_IDX_1024] = 1024,
+ [ECS_THRESHOLD_IDX_4096] = 4096,
+};
+
+enum {
+ ECS_LOG_ENTRY_TYPE_DRAM = 0x0,
+ ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1,
+};
+
+enum cxl_ecs_count_mode {
+ ECS_MODE_COUNTS_ROWS = 0,
+ ECS_MODE_COUNTS_CODEWORDS = 1,
+};
+
+static int cxl_mem_ecs_get_attrbs(struct device *dev,
+ struct cxl_ecs_context *cxl_ecs_ctx,
+ int fru_id, u8 *log_cap, u16 *config)
+{
+ struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ struct cxl_ecs_fru_rd_attrbs *fru_rd_attrbs;
+ size_t rd_data_size;
+ size_t data_size;
+
+ rd_data_size = cxl_ecs_ctx->get_feat_size;
+
+ struct cxl_ecs_rd_attrbs *rd_attrbs __free(kvfree) =
+ kvzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrbs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+ rd_data_size, 0, NULL);
+ if (!data_size)
+ return -EIO;
+
+ fru_rd_attrbs = rd_attrbs->fru_attrbs;
+ *log_cap = rd_attrbs->ecs_log_cap;
+ *config = le16_to_cpu(fru_rd_attrbs[fru_id].ecs_config);
+
+ return 0;
+}
+
+static int cxl_mem_ecs_set_attrbs(struct device *dev,
+ struct cxl_ecs_context *cxl_ecs_ctx,
+ int fru_id, u8 log_cap, u16 config)
+{
+ struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ struct cxl_ecs_fru_rd_attrbs *fru_rd_attrbs;
+ struct cxl_ecs_fru_wr_attrbs *fru_wr_attrbs;
+ size_t rd_data_size, wr_data_size;
+ u16 num_media_frus, count;
+ size_t data_size;
+
+ num_media_frus = cxl_ecs_ctx->num_media_frus;
+ rd_data_size = cxl_ecs_ctx->get_feat_size;
+ wr_data_size = cxl_ecs_ctx->set_feat_size;
+ struct cxl_ecs_rd_attrbs *rd_attrbs __free(kvfree) =
+ kvzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrbs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+ rd_data_size, 0, NULL);
+ if (!data_size)
+ return -EIO;
+
+ struct cxl_ecs_wr_attrbs *wr_attrbs __free(kvfree) =
+ kvzalloc(wr_data_size, GFP_KERNEL);
+ if (!wr_attrbs)
+ return -ENOMEM;
+
+ /*
+ * Fill writable attributes from the current attributes read
+ * for all the media FRUs.
+ */
+ fru_rd_attrbs = rd_attrbs->fru_attrbs;
+ fru_wr_attrbs = wr_attrbs->fru_attrbs;
+ wr_attrbs->ecs_log_cap = log_cap;
+ for (count = 0; count < num_media_frus; count++)
+ fru_wr_attrbs[count].ecs_config =
+ fru_rd_attrbs[count].ecs_config;
+
+ fru_wr_attrbs[fru_id].ecs_config = cpu_to_le16(config);
+
+ return cxl_set_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+ cxl_ecs_ctx->set_version, wr_attrbs,
+ wr_data_size,
+ CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET,
+ 0, NULL);
+}
+
+static u8 cxl_get_ecs_log_entry_type(u8 log_cap, u16 config)
+{
+ return FIELD_GET(CXL_ECS_LOG_ENTRY_TYPE_MASK, log_cap);
+}
+
+static u16 cxl_get_ecs_threshold(u8 log_cap, u16 config)
+{
+ u8 index = FIELD_GET(CXL_ECS_THRESHOLD_COUNT_MASK, config);
+
+ return ecs_supp_threshold[index];
+}
+
+static u8 cxl_get_ecs_count_mode(u8 log_cap, u16 config)
+{
+ return FIELD_GET(CXL_ECS_COUNT_MODE_MASK, config);
+}
+
+#define CXL_ECS_GET_ATTR(attrb) \
+ static int cxl_ecs_get_##attrb(struct device *dev, void *drv_data, \
+ int fru_id, u32 *val) \
+ { \
+ struct cxl_ecs_context *ctx = drv_data; \
+ u8 log_cap; \
+ u16 config; \
+ int ret; \
+ \
+ ret = cxl_mem_ecs_get_attrbs(dev, ctx, fru_id, &log_cap, \
+ &config); \
+ if (ret) \
+ return ret; \
+ \
+ *val = cxl_get_ecs_##attrb(log_cap, config); \
+ \
+ return 0; \
+ }
+
+CXL_ECS_GET_ATTR(log_entry_type)
+CXL_ECS_GET_ATTR(count_mode)
+CXL_ECS_GET_ATTR(threshold)
+
+static int cxl_set_ecs_log_entry_type(struct device *dev, u8 *log_cap,
+ u16 *config, u32 val)
+{
+ if (val != ECS_LOG_ENTRY_TYPE_DRAM &&
+ val != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
+ return -EINVAL;
+
+ *log_cap = FIELD_PREP(CXL_ECS_LOG_ENTRY_TYPE_MASK, val);
+
+ return 0;
+}
+
+static int cxl_set_ecs_threshold(struct device *dev, u8 *log_cap, u16 *config,
+ u32 val)
+{
+ *config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
+
+ switch (val) {
+ case ECS_THRESHOLD_256:
+ *config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_IDX_256);
+ break;
+ case ECS_THRESHOLD_1024:
+ *config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_IDX_1024);
+ break;
+ case ECS_THRESHOLD_4096:
+ *config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_IDX_4096);
+ break;
+ default:
+ dev_dbg(dev, "Invalid CXL ECS threshold count(%d) to set\n",
+ val);
+ dev_dbg(dev, "Supported ECS threshold counts: %u, %u, %u\n",
+ ECS_THRESHOLD_256, ECS_THRESHOLD_1024,
+ ECS_THRESHOLD_4096);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int cxl_set_ecs_count_mode(struct device *dev, u8 *log_cap, u16 *config,
+ u32 val)
+{
+ if (val != ECS_MODE_COUNTS_ROWS && val != ECS_MODE_COUNTS_CODEWORDS) {
+ dev_dbg(dev, "Invalid CXL ECS scrub mode(%d) to set\n", val);
+ dev_dbg(dev,
+ "Supported ECS Modes: 0: ECS counts rows with errors,"
+ " 1: ECS counts codewords with errors\n");
+ return -EINVAL;
+ }
+
+ *config &= ~CXL_ECS_COUNT_MODE_MASK;
+ *config |= FIELD_PREP(CXL_ECS_COUNT_MODE_MASK, val);
+
+ return 0;
+}
+
+static int cxl_set_ecs_reset_counter(struct device *dev, u8 *log_cap,
+ u16 *config, u32 val)
+{
+ if (val != CXL_ECS_RESET_COUNTER)
+ return -EINVAL;
+
+ *config &= ~CXL_ECS_RESET_COUNTER_MASK;
+ *config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK, val);
+
+ return 0;
+}
+
+#define CXL_ECS_SET_ATTR(attrb) \
+ static int cxl_ecs_set_##attrb(struct device *dev, void *drv_data, \
+ int fru_id, u32 val) \
+ { \
+ struct cxl_ecs_context *ctx = drv_data; \
+ u8 log_cap; \
+ u16 config; \
+ int ret; \
+ \
+ if (!capable(CAP_SYS_RAWIO)) \
+ return -EPERM; \
+ \
+ ret = cxl_mem_ecs_get_attrbs(dev, ctx, fru_id, &log_cap, \
+ &config); \
+ if (ret) \
+ return ret; \
+ \
+ ret = cxl_set_ecs_##attrb(dev, &log_cap, &config, val); \
+ if (ret) \
+ return ret; \
+ \
+ return cxl_mem_ecs_set_attrbs(dev, ctx, fru_id, log_cap, \
+ config); \
+ }
+CXL_ECS_SET_ATTR(log_entry_type)
+CXL_ECS_SET_ATTR(count_mode)
+CXL_ECS_SET_ATTR(reset_counter)
+CXL_ECS_SET_ATTR(threshold)
+
+static const struct edac_ecs_ops cxl_ecs_ops = {
+ .get_log_entry_type = cxl_ecs_get_log_entry_type,
+ .set_log_entry_type = cxl_ecs_set_log_entry_type,
+ .get_mode = cxl_ecs_get_count_mode,
+ .set_mode = cxl_ecs_set_count_mode,
+ .reset = cxl_ecs_set_reset_counter,
+ .get_threshold = cxl_ecs_get_threshold,
+ .set_threshold = cxl_ecs_set_threshold,
+};
+
+static int cxl_memdev_ecs_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature)
+{
+ struct cxl_ecs_context *cxl_ecs_ctx;
+ struct cxl_feat_entry *feat_entry;
+ int num_media_frus;
+
+ feat_entry =
+ cxl_feature_info(to_cxlfs(cxlmd->cxlds), &CXL_FEAT_ECS_UUID);
+ if (IS_ERR(feat_entry))
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ num_media_frus = (le16_to_cpu(feat_entry->get_feat_size) -
+ sizeof(struct cxl_ecs_rd_attrbs)) /
+ sizeof(struct cxl_ecs_fru_rd_attrbs);
+ if (!num_media_frus)
+ return -EOPNOTSUPP;
+
+ cxl_ecs_ctx =
+ devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx), GFP_KERNEL);
+ if (!cxl_ecs_ctx)
+ return -ENOMEM;
+
+ *cxl_ecs_ctx = (struct cxl_ecs_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .num_media_frus = num_media_frus,
+ .cxlmd = cxlmd,
+ };
+
+ ras_feature->ft_type = RAS_FEAT_ECS;
+ ras_feature->ecs_ops = &cxl_ecs_ops;
+ ras_feature->ctx = cxl_ecs_ctx;
+ ras_feature->ecs_info.num_media_frus = num_media_frus;
+
+ return 0;
+}
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
@@ -481,6 +829,15 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
num_ras_features++;
}
+ if (IS_ENABLED(CONFIG_CXL_EDAC_ECS)) {
+ rc = cxl_memdev_ecs_init(cxlmd, &ras_features[num_ras_features]);
+ if (rc < 0 && rc != -EOPNOTSUPP)
+ return rc;
+
+ if (rc != -EOPNOTSUPP)
+ num_ras_features++;
+ }
+
if (!num_ras_features)
return -EINVAL;
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v6 5/8] cxl/edac: Add support for PERFORM_MAINTENANCE command
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (3 preceding siblings ...)
2025-05-21 12:47 ` [PATCH v6 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-21 12:47 ` [PATCH v6 6/8] cxl/edac: Support for finding memory operation attributes from the current boot shiju.jose
` (6 subsequent siblings)
11 siblings, 0 replies; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for the PERFORM_MAINTENANCE command.
CXL spec 3.2 section 8.2.10.7.1 describes the Perform Maintenance command.
This command requests the device to execute the maintenance operation
specified by the maintenance operation class and the maintenance operation
subclass.
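The input payload is the two-byte header of CXL 3.2 Table 8-117 followed by
operation-specific data. Below is a standalone sketch of the layout and the
size check applied against the mailbox payload size in
cxl_perform_maintenance(); the class/subclass values are placeholders.

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  struct maintenance_hdr {
          uint8_t op_class;
          uint8_t op_subclass;
  } __attribute__((packed));

  /* Build the Perform Maintenance input payload into 'buf', rejecting
   * requests that do not fit in the mailbox payload. */
  static int build_maintenance_pi(uint8_t *buf, size_t payload_size,
                                  uint8_t class, uint8_t subclass,
                                  const void *data, size_t data_size)
  {
          struct maintenance_hdr hdr = { class, subclass };

          if (sizeof(hdr) + data_size > payload_size)
                  return -1;      /* -ENOMEM in the patch */

          memcpy(buf, &hdr, sizeof(hdr));
          memcpy(buf + sizeof(hdr), data, data_size);
          return (int)(sizeof(hdr) + data_size);  /* mbox size_in */
  }

  int main(void)
  {
          uint8_t buf[256], op_data[4] = { 0 };

          printf("size_in=%d\n",
                 build_maintenance_pi(buf, sizeof(buf), 0x02, 0x00,
                                      op_data, sizeof(op_data)));
          return 0;
  }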
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/edac.c | 49 +++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 1 +
2 files changed, 50 insertions(+)
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index 9c15d4ce2162..36e222dbbbf0 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -814,6 +814,55 @@ static int cxl_memdev_ecs_init(struct cxl_memdev *cxlmd,
return 0;
}
+/*
+ * Perform Maintenance CXL 3.2 Spec 8.2.10.7.1
+ */
+
+/*
+ * Perform Maintenance input payload
+ * CXL rev 3.2 section 8.2.10.7.1 Table 8-117
+ */
+struct cxl_mbox_maintenance_hdr {
+ u8 op_class;
+ u8 op_subclass;
+} __packed;
+
+static int cxl_perform_maintenance(struct cxl_mailbox *cxl_mbox, u8 class,
+ u8 subclass, void *data_in,
+ size_t data_in_size)
+{
+ struct cxl_memdev_maintenance_pi {
+ struct cxl_mbox_maintenance_hdr hdr;
+ u8 data[];
+ } __packed;
+ struct cxl_mbox_cmd mbox_cmd;
+ size_t hdr_size;
+
+ struct cxl_memdev_maintenance_pi *pi __free(kvfree) =
+ kvzalloc(cxl_mbox->payload_size, GFP_KERNEL);
+ if (!pi)
+ return -ENOMEM;
+
+ pi->hdr.op_class = class;
+ pi->hdr.op_subclass = subclass;
+ hdr_size = sizeof(pi->hdr);
+ /*
+ * Check that the mbox payload size is sufficient for
+ * the maintenance data transfer.
+ */
+ if (hdr_size + data_in_size > cxl_mbox->payload_size)
+ return -ENOMEM;
+
+ memcpy(pi->data, data_in, data_in_size);
+ mbox_cmd = (struct cxl_mbox_cmd){
+ .opcode = CXL_MBOX_OP_DO_MAINTENANCE,
+ .size_in = hdr_size + data_in_size,
+ .payload_in = pi,
+ };
+
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+}
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 872131009e4c..1d4fe19c554d 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -531,6 +531,7 @@ enum cxl_opcode {
CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
CXL_MBOX_OP_GET_FEATURE = 0x0501,
CXL_MBOX_OP_SET_FEATURE = 0x0502,
+ CXL_MBOX_OP_DO_MAINTENANCE = 0x0600,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v6 6/8] cxl/edac: Support for finding memory operation attributes from the current boot
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (4 preceding siblings ...)
2025-05-21 12:47 ` [PATCH v6 5/8] cxl/edac: Add support for PERFORM_MAINTENANCE command shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-21 12:47 ` [PATCH v6 7/8] cxl/edac: Add CXL memory device memory sparing control feature shiju.jose
` (5 subsequent siblings)
11 siblings, 0 replies; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Certain operations on memory, such as memory repair, are permitted
only when the address and other attributes for the operation are
from the current boot. This is determined by checking whether the
memory attributes for the operation match those in the CXL gen_media
or CXL DRAM memory event records reported during the current boot.
The CXL event records must be backed up because they are cleared
in the hardware after being processed by the kernel.
Support is added for storing CXL gen_media and CXL DRAM memory event
records in xarrays. Old records are deleted when they expire or when
there is an overflow; this relies on the platform correctly reporting
the Event Record Timestamp field of CXL spec Table 8-55 Common Event
Record Format.
Additionally, helper functions are implemented to find a matching
record in the xarray storage based on the memory attributes and
repair type.
Add a validity check when matching attributes for sparing, using the
validity flags in the DRAM event record, to ensure that all attributes
required for a requested repair operation are valid and set.
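As a minimal, illustrative sketch (not part of this patch), a repair
path could use the helpers and types added here to validate that
row-sparing attributes come from the current boot:

  /* Hypothetical helper: returns true only if a CXL DRAM event record
   * matching the row-sparing attributes was stored during this boot.
   */
  static bool row_attrbs_from_current_boot(struct cxl_memdev *cxlmd,
                                           u64 dpa, u8 channel, u8 rank,
                                           u8 bank_group, u8 bank, u32 row)
  {
          struct cxl_mem_repair_attrbs attrbs = {
                  .repair_type = CXL_ROW_SPARING,
                  .dpa = dpa,
                  .channel = channel,
                  .rank = rank,
                  .bank_group = bank_group,
                  .bank = bank,
                  .row = row,
          };

          return cxl_find_rec_dram(cxlmd, &attrbs) != NULL;
  }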
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/Kconfig | 21 +++
drivers/cxl/core/edac.c | 311 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/mbox.c | 11 +-
drivers/cxl/core/memdev.c | 1 +
drivers/cxl/cxlmem.h | 15 ++
5 files changed, 357 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 51987f2a2548..48b7314afdb8 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -164,6 +164,27 @@ config CXL_EDAC_ECS
of a memory ECS feature established by the platform/device.
Otherwise say 'n'.
+config CXL_EDAC_MEM_REPAIR
+ bool "Enable CXL Memory Repair"
+ depends on CXL_EDAC_MEM_FEATURES
+ depends on EDAC_MEM_REPAIR
+ help
+ The CXL EDAC memory repair control is optional and allows the
+ host to control the configuration of memory repair features
+ (e.g. sparing, PPR) of CXL memory expander devices.
+
+ When enabled, the memory repair feature requires approximately
+ 43KB of additional memory to store CXL DRAM and CXL general
+ media event records.
+
+ When enabled, 'cxl_mem' EDAC devices are published with memory
+ repair control attributes as described by
+ Documentation/ABI/testing/sysfs-edac-memory-repair.
+
+ Say 'y' if you have an expert need to change default settings
+ of a memory repair feature established by the platform/device.
+ Otherwise say 'n'.
+
config CXL_PORT
default CXL_BUS
tristate
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index 36e222dbbbf0..f54c6a78259e 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -14,10 +14,12 @@
#include <linux/cleanup.h>
#include <linux/edac.h>
#include <linux/limits.h>
+#include <linux/xarray.h>
#include <cxl/features.h>
#include <cxl.h>
#include <cxlmem.h>
#include "core.h"
+#include "trace.h"
#define CXL_NR_EDAC_DEV_FEATURES 2
@@ -863,10 +865,285 @@ static int cxl_perform_maintenance(struct cxl_mailbox *cxl_mbox, u8 class,
return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
+/*
+ * Support for checking whether the attributes of a memory operation
+ * are from the current boot.
+ */
+
+struct cxl_mem_err_rec {
+ struct xarray rec_gen_media;
+ struct xarray rec_dram;
+};
+
+enum cxl_mem_repair_type {
+ CXL_PPR,
+ CXL_CACHELINE_SPARING,
+ CXL_ROW_SPARING,
+ CXL_BANK_SPARING,
+ CXL_RANK_SPARING,
+ CXL_REPAIR_MAX,
+};
+
+/**
+ * struct cxl_mem_repair_attrbs - CXL memory repair attributes
+ * @dpa: DPA of memory to repair
+ * @nibble_mask: nibble mask, identifies one or more nibbles on the memory bus
+ * @row: row of memory to repair
+ * @column: column of memory to repair
+ * @channel: channel of memory to repair
+ * @sub_channel: sub channel of memory to repair
+ * @rank: rank of memory to repair
+ * @bank_group: bank group of memory to repair
+ * @bank: bank of memory to repair
+ * @repair_type: repair type. For eg. PPR, memory sparing etc.
+ */
+struct cxl_mem_repair_attrbs {
+ u64 dpa;
+ u32 nibble_mask;
+ u32 row;
+ u16 column;
+ u8 channel;
+ u8 sub_channel;
+ u8 rank;
+ u8 bank_group;
+ u8 bank;
+ enum cxl_mem_repair_type repair_type;
+};
+
+static struct cxl_event_gen_media *
+cxl_find_rec_gen_media(struct cxl_memdev *cxlmd,
+ struct cxl_mem_repair_attrbs *attrbs)
+{
+ struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+ struct cxl_event_gen_media *rec;
+
+ if (!array_rec)
+ return NULL;
+
+ rec = xa_load(&array_rec->rec_gen_media, attrbs->dpa);
+ if (!rec)
+ return NULL;
+
+ if (attrbs->repair_type == CXL_PPR)
+ return rec;
+
+ return NULL;
+}
+
+static struct cxl_event_dram *
+cxl_find_rec_dram(struct cxl_memdev *cxlmd,
+ struct cxl_mem_repair_attrbs *attrbs)
+{
+ struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+ struct cxl_event_dram *rec;
+ u16 validity_flags;
+
+ if (!array_rec)
+ return NULL;
+
+ rec = xa_load(&array_rec->rec_dram, attrbs->dpa);
+ if (!rec)
+ return NULL;
+
+ validity_flags = get_unaligned_le16(rec->media_hdr.validity_flags);
+ if (!(validity_flags & CXL_DER_VALID_CHANNEL) ||
+ !(validity_flags & CXL_DER_VALID_RANK))
+ return NULL;
+
+ switch (attrbs->repair_type) {
+ case CXL_PPR:
+ if (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) == attrbs->nibble_mask)
+ return rec;
+ break;
+ case CXL_CACHELINE_SPARING:
+ if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+ !(validity_flags & CXL_DER_VALID_BANK) ||
+ !(validity_flags & CXL_DER_VALID_ROW) ||
+ !(validity_flags & CXL_DER_VALID_COLUMN))
+ return NULL;
+
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ rec->bank_group == attrbs->bank_group &&
+ rec->bank == attrbs->bank &&
+ get_unaligned_le24(rec->row) == attrbs->row &&
+ get_unaligned_le16(rec->column) == attrbs->column &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask) &&
+ (!(validity_flags & CXL_DER_VALID_SUB_CHANNEL) ||
+ rec->sub_channel == attrbs->sub_channel))
+ return rec;
+ break;
+ case CXL_ROW_SPARING:
+ if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+ !(validity_flags & CXL_DER_VALID_BANK) ||
+ !(validity_flags & CXL_DER_VALID_ROW))
+ return NULL;
+
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ rec->bank_group == attrbs->bank_group &&
+ rec->bank == attrbs->bank &&
+ get_unaligned_le24(rec->row) == attrbs->row &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask))
+ return rec;
+ break;
+ case CXL_BANK_SPARING:
+ if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+ !(validity_flags & CXL_DER_VALID_BANK))
+ return NULL;
+
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ rec->bank_group == attrbs->bank_group &&
+ rec->bank == attrbs->bank &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask))
+ return rec;
+ break;
+ case CXL_RANK_SPARING:
+ if (rec->media_hdr.channel == attrbs->channel &&
+ rec->media_hdr.rank == attrbs->rank &&
+ (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+ get_unaligned_le24(rec->nibble_mask) ==
+ attrbs->nibble_mask))
+ return rec;
+ break;
+ default:
+ return NULL;
+ }
+
+ return NULL;
+}
+
+#define CXL_MAX_STORAGE_DAYS 10
+#define CXL_MAX_STORAGE_TIME_SECS (CXL_MAX_STORAGE_DAYS * 24 * 60 * 60)
+
+static void cxl_del_expired_gmedia_recs(struct xarray *rec_xarray,
+ struct cxl_event_gen_media *cur_rec)
+{
+ u64 cur_ts = le64_to_cpu(cur_rec->media_hdr.hdr.timestamp);
+ struct cxl_event_gen_media *rec;
+ unsigned long index;
+ u64 delta_ts_secs;
+
+ xa_for_each(rec_xarray, index, rec) {
+ delta_ts_secs = (cur_ts -
+ le64_to_cpu(rec->media_hdr.hdr.timestamp)) / 1000000000ULL;
+ if (delta_ts_secs >= CXL_MAX_STORAGE_TIME_SECS) {
+ xa_erase(rec_xarray, index);
+ kfree(rec);
+ }
+ }
+}
+
+static void cxl_del_expired_dram_recs(struct xarray *rec_xarray,
+ struct cxl_event_dram *cur_rec)
+{
+ u64 cur_ts = le64_to_cpu(cur_rec->media_hdr.hdr.timestamp);
+ struct cxl_event_dram *rec;
+ unsigned long index;
+ u64 delta_secs;
+
+ xa_for_each(rec_xarray, index, rec) {
+ delta_secs = (cur_ts -
+ le64_to_cpu(rec->media_hdr.hdr.timestamp)) / 1000000000ULL;
+ if (delta_secs >= CXL_MAX_STORAGE_TIME_SECS) {
+ xa_erase(rec_xarray, index);
+ kfree(rec);
+ }
+ }
+}
+
+#define CXL_MAX_REC_STORAGE_COUNT 200
+
+static void cxl_del_overflow_old_recs(struct xarray *rec_xarray)
+{
+ void *err_rec;
+ unsigned long index, count = 0;
+
+ xa_for_each(rec_xarray, index, err_rec)
+ count++;
+
+ if (count <= CXL_MAX_REC_STORAGE_COUNT)
+ return;
+
+ count -= CXL_MAX_REC_STORAGE_COUNT;
+ xa_for_each(rec_xarray, index, err_rec) {
+ xa_erase(rec_xarray, index);
+ kfree(err_rec);
+ count--;
+ if (!count)
+ break;
+ }
+}
+
+int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt)
+{
+ struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+ struct cxl_event_gen_media *rec;
+ void *old_rec;
+
+ if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec)
+ return 0;
+
+ rec = kmemdup(&evt->gen_media, sizeof(*rec), GFP_KERNEL);
+ if (!rec)
+ return -ENOMEM;
+
+ old_rec = xa_store(&array_rec->rec_gen_media,
+ le64_to_cpu(rec->media_hdr.phys_addr), rec,
+ GFP_KERNEL);
+ if (xa_is_err(old_rec))
+ return xa_err(old_rec);
+
+ kfree(old_rec);
+
+ cxl_del_expired_gmedia_recs(&array_rec->rec_gen_media, rec);
+ cxl_del_overflow_old_recs(&array_rec->rec_gen_media);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_store_rec_gen_media, "CXL");
+
+int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt)
+{
+ struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+ struct cxl_event_dram *rec;
+ void *old_rec;
+
+ if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec)
+ return 0;
+
+ rec = kmemdup(&evt->dram, sizeof(*rec), GFP_KERNEL);
+ if (!rec)
+ return -ENOMEM;
+
+ old_rec = xa_store(&array_rec->rec_dram,
+ le64_to_cpu(rec->media_hdr.phys_addr), rec,
+ GFP_KERNEL);
+ if (xa_is_err(old_rec))
+ return xa_err(old_rec);
+
+ kfree(old_rec);
+
+ cxl_del_expired_dram_recs(&array_rec->rec_dram, rec);
+ cxl_del_overflow_old_recs(&array_rec->rec_dram);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_store_rec_dram, "CXL");
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
int num_ras_features = 0;
+ u8 repair_inst = 0;
int rc;
if (IS_ENABLED(CONFIG_CXL_EDAC_SCRUB)) {
@@ -887,6 +1164,20 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
num_ras_features++;
}
+ if (IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR)) {
+ if (repair_inst) {
+ struct cxl_mem_err_rec *array_rec =
+ devm_kzalloc(&cxlmd->dev, sizeof(*array_rec),
+ GFP_KERNEL);
+ if (!array_rec)
+ return -ENOMEM;
+
+ xa_init(&array_rec->rec_gen_media);
+ xa_init(&array_rec->rec_dram);
+ cxlmd->err_rec_array = array_rec;
+ }
+ }
+
if (!num_ras_features)
return -EINVAL;
@@ -924,3 +1215,23 @@ int devm_cxl_region_edac_register(struct cxl_region *cxlr)
num_ras_features, ras_features);
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
+
+void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd)
+{
+ struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+ struct cxl_event_gen_media *rec_gen_media;
+ struct cxl_event_dram *rec_dram;
+ unsigned long index;
+
+ if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec)
+ return;
+
+ xa_for_each(&array_rec->rec_dram, index, rec_dram)
+ kfree(rec_dram);
+ xa_destroy(&array_rec->rec_dram);
+
+ xa_for_each(&array_rec->rec_gen_media, index, rec_gen_media)
+ kfree(rec_gen_media);
+ xa_destroy(&array_rec->rec_gen_media);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_release, "CXL");
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..2689e6453c5a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -922,12 +922,19 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
hpa_alias = hpa - cache_size;
}
- if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
+ if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
+ if (cxl_store_rec_gen_media((struct cxl_memdev *)cxlmd, evt))
+ dev_dbg(&cxlmd->dev, "CXL store rec_gen_media failed\n");
+
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
hpa_alias, &evt->gen_media);
- else if (event_type == CXL_CPER_EVENT_DRAM)
+ } else if (event_type == CXL_CPER_EVENT_DRAM) {
+ if (cxl_store_rec_dram((struct cxl_memdev *)cxlmd, evt))
+ dev_dbg(&cxlmd->dev, "CXL store rec_dram failed\n");
+
trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
&evt->dram);
+ }
}
}
EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index a16a5886d40a..953d8407d0dd 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -27,6 +27,7 @@ static void cxl_memdev_release(struct device *dev)
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
ida_free(&cxl_memdev_ida, cxlmd->id);
+ devm_cxl_memdev_edac_release(cxlmd);
kfree(cxlmd);
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 1d4fe19c554d..551b0ba2caa1 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -47,6 +47,9 @@
* @depth: endpoint port depth
* @scrub_cycle: current scrub cycle set for this device
* @scrub_region_id: id number of a backed region (if any) for which current scrub cycle set
+ * @err_rec_array: Pointer to xarrays storing the memdev error records,
* used to check that attributes for a memory repair operation are
* from the current boot.
*/
struct cxl_memdev {
struct device dev;
@@ -60,6 +63,7 @@ struct cxl_memdev {
int depth;
u8 scrub_cycle;
int scrub_region_id;
+ void *err_rec_array;
};
static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
@@ -861,11 +865,22 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
#ifdef CONFIG_CXL_EDAC_MEM_FEATURES
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd);
int devm_cxl_region_edac_register(struct cxl_region *cxlr);
+int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt);
+int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt);
+void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd);
#else
static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{ return 0; }
static inline int devm_cxl_region_edac_register(struct cxl_region *cxlr)
{ return 0; }
+static inline int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd,
+ union cxl_event *evt)
+{ return 0; }
+static inline int cxl_store_rec_dram(struct cxl_memdev *cxlmd,
+ union cxl_event *evt)
+{ return 0; }
+static inline void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd)
+{ return; }
#endif
#ifdef CONFIG_CXL_SUSPEND
--
2.43.0
* [PATCH v6 7/8] cxl/edac: Add CXL memory device memory sparing control feature
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (5 preceding siblings ...)
2025-05-21 12:47 ` [PATCH v6 6/8] cxl/edac: Support for finding memory operation attributes from the current boot shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-23 18:50 ` Dan Williams
2025-05-21 12:47 ` [PATCH v6 8/8] cxl/edac: Add CXL memory device soft PPR " shiju.jose
` (4 subsequent siblings)
11 siblings, 1 reply; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Memory sparing is defined as a repair function that replaces a portion of
memory with a portion of functional memory at that same DPA. The subclasses
for this operation vary in terms of the scope of the sparing being
performed. The cacheline sparing subclass refers to a sparing action that
can replace a full cacheline. Row sparing is provided as an alternative to
PPR sparing functions and its scope is that of a single DDR row.
As per CXL r3.2 Table 8-125, footnote 1, memory sparing is preferred
over PPR when possible.
Bank sparing allows an entire bank to be replaced. Rank sparing is defined
as an operation in which an entire DDR rank is replaced.
Memory sparing maintenance operations may be supported by CXL devices
that implement CXL.mem protocol. A sparing maintenance operation requests
the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support memory sparing
features may implement sparing maintenance operations.
The host may issue a query command by setting the query resources flag
in the input payload (CXL spec 3.2 Table 8-120) to determine the
availability of sparing resources for a given address. In response to a
query request,
the device shall report the resource availability by producing the memory
sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank,
Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
of the values specified in the request.
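For illustration only (this series ultimately does not expose a query
interface, see the note further below), a query request built with the
structures and macros added by this patch might look like the following
hypothetical sketch:

  /* Hypothetical sketch: issue a query (dry-run) sparing request per
   * CXL spec 3.2 Table 8-120 by setting only the query resources flag.
   */
  static int cxl_sparing_query_resources(struct cxl_mem_sparing_context *ctx)
  {
          struct cxl_memdev_sparing_in_payload pi = {
                  .flags = CXL_SET_SPARING_QUERY_RESOURCE(1),
                  .channel = ctx->channel,
                  .rank = ctx->rank,
          };

          return cxl_perform_maintenance(&ctx->cxlmd->cxlds->cxl_mbox,
                                         ctx->op_class, ctx->op_subclass,
                                         &pi, sizeof(pi));
  }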
During the execution of a sparing maintenance operation, a CXL memory
device:
- may not retain data
- may not be able to process CXL.mem requests correctly.
These CXL memory device capabilities are specified by restriction flags
in the memory sparing feature readable attributes.
When a CXL device identifies an error on a memory component, the device
may inform the host about the need for a memory sparing maintenance
operation by using a DRAM event record, in which the 'maintenance
needed' flag may be set. The event record contains some of the DPA,
Channel, Rank, Nibble Mask, Bank Group, Bank, Row, Column, and
Sub-Channel fields that identify the memory to be repaired. The
userspace tool requests a maintenance operation if the 'maintenance
needed' flag is set in the CXL DRAM error record.
CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
maintenance operation feature.
CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
discovery and configuration.
Add support for controlling the CXL memory device memory sparing
feature. Register with the EDAC driver, which gets the memory repair
attribute descriptors from the EDAC memory repair driver and exposes
sysfs repair control attributes for memory sparing to userspace. For
example, CXL memory sparing control for the CXL mem0 device is exposed
in /sys/bus/edac/devices/cxl_mem0/mem_repairX/
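For illustration, a repair request through this interface might look
like the shell sketch below. Attribute names follow
Documentation/ABI/testing/sysfs-edac-memory-repair; the mem_repairX
instance that maps to a given sparing type and the values written are
hypothetical and device-dependent.

  # Request a soft row-sparing repair on cxl_mem0 using attributes
  # taken from a CXL DRAM trace event reported in the current boot.
  cd /sys/bus/edac/devices/cxl_mem0/mem_repairX
  cat repair_type        # e.g. "row-sparing"
  echo 0 > persist_mode  # soft (temporary) sparing
  echo 0x8a2d00 > dpa
  echo 2 > channel
  echo 1 > rank
  echo 3 > bank_group
  echo 0 > bank
  echo 0x150 > row
  echo 1 > repair        # trigger the repair (EDAC_DO_MEM_REPAIR)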
Use case
========
1. The CXL device identifies a failure in a memory component and reports
it to userspace in a CXL DRAM trace event with the DPA and other
attributes of the memory to repair, such as channel, rank, nibble mask,
bank group, bank, row, column, and sub-channel.
2. Rasdaemon processes the trace event and may issue a query request in
sysfs to check the resources available for memory sparing if any of the
following conditions are met.
- The 'maintenance needed' flag is set in the event record.
- The 'threshold event' flag is set for the CVME threshold feature.
- The number of corrected errors reported to userspace for a CXL.mem
media exceeds the corrected error count threshold defined by the
userspace policy.
3. Rasdaemon processes the memory sparing trace event and issues a
repair request for memory sparing.
The kernel CXL driver shall report the memory sparing event record to
userspace with the resource availability so that rasdaemon can process
the event record and issue a repair request in sysfs for the memory
sparing operation on the CXL device.
Note: Based on feedback from the community, the 'query' sysfs attribute
is removed and reporting the memory sparing error record to userspace is
not supported. Instead, userspace issues the sparing operation and the
kernel forwards it to the CXL memory device when the 'maintenance
needed' flag is set in the DRAM event record.
Add checks to ensure that the memory to be repaired is offline or, if
online, that it originates from a CXL DRAM error record reported in the
current boot, before requesting a memory sparing operation on the
device.
Note: Tested memory sparing feature control with QEMU patch
"hw/cxl: Add emulation for memory sparing control feature"
https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6
[dj: Move cxl_is_memdev_memory_online() before its caller. (Alison)]
[dj: Check return from cxl_feature_info() with IS_ERR]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/memory_repair.rst | 21 ++
drivers/cxl/core/edac.c | 542 ++++++++++++++++++++++++++-
drivers/edac/mem_repair.c | 9 +
include/linux/edac.h | 7 +
4 files changed, 578 insertions(+), 1 deletion(-)
diff --git a/Documentation/edac/memory_repair.rst b/Documentation/edac/memory_repair.rst
index 52162a422864..5b1cd8297442 100644
--- a/Documentation/edac/memory_repair.rst
+++ b/Documentation/edac/memory_repair.rst
@@ -119,3 +119,24 @@ sysfs
Sysfs files are documented in
`Documentation/ABI/testing/sysfs-edac-memory-repair`.
+
+Examples
+--------
+
+The memory repair usage takes the form shown in this example:
+
+1. CXL memory sparing
+
+Memory sparing is defined as a repair function that replaces a portion of
+memory with a portion of functional memory at that same DPA. The subclasses
+for this operation (cacheline/row/bank/rank sparing) vary in terms of the
+scope of the sparing being performed.
+
+Memory sparing maintenance operations may be supported by CXL devices that
+implement CXL.mem protocol. A sparing maintenance operation requests the
+CXL device to perform a repair operation on its media. For example, a CXL
+device with DRAM components that support memory sparing features may
+implement sparing maintenance operations.
+
+Sysfs files for memory repair are documented in
+`Documentation/ABI/testing/sysfs-edac-memory-repair`
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index f54c6a78259e..90f7f8df34bf 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -21,7 +21,7 @@
#include "core.h"
#include "trace.h"
-#define CXL_NR_EDAC_DEV_FEATURES 2
+#define CXL_NR_EDAC_DEV_FEATURES 6
#define CXL_SCRUB_NO_REGION -1
@@ -1139,6 +1139,533 @@ int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt)
}
EXPORT_SYMBOL_NS_GPL(cxl_store_rec_dram, "CXL");
+static bool cxl_is_memdev_memory_online(const struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *port = cxlmd->endpoint;
+
+ if (port && cxl_num_decoders_committed(port))
+ return true;
+
+ return false;
+}
+
+/*
+ * CXL memory sparing control
+ */
+enum cxl_mem_sparing_granularity {
+ CXL_MEM_SPARING_CACHELINE,
+ CXL_MEM_SPARING_ROW,
+ CXL_MEM_SPARING_BANK,
+ CXL_MEM_SPARING_RANK,
+ CXL_MEM_SPARING_MAX
+};
+
+struct cxl_mem_sparing_context {
+ struct cxl_memdev *cxlmd;
+ uuid_t repair_uuid;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u16 effects;
+ u8 instance;
+ u8 get_version;
+ u8 set_version;
+ u8 op_class;
+ u8 op_subclass;
+ bool cap_safe_when_in_use;
+ bool cap_hard_sparing;
+ bool cap_soft_sparing;
+ u8 channel;
+ u8 rank;
+ u8 bank_group;
+ u32 nibble_mask;
+ u64 dpa;
+ u32 row;
+ u16 column;
+ u8 bank;
+ u8 sub_channel;
+ enum edac_mem_repair_type repair_type;
+ bool persist_mode;
+};
+
+#define CXL_SPARING_RD_CAP_SAFE_IN_USE_MASK BIT(0)
+#define CXL_SPARING_RD_CAP_HARD_SPARING_MASK BIT(1)
+#define CXL_SPARING_RD_CAP_SOFT_SPARING_MASK BIT(2)
+
+#define CXL_SPARING_WR_DEVICE_INITIATED_MASK BIT(0)
+
+#define CXL_SPARING_QUERY_RESOURCE_FLAG BIT(0)
+#define CXL_SET_HARD_SPARING_FLAG BIT(1)
+#define CXL_SPARING_SUB_CHNL_VALID_FLAG BIT(2)
+#define CXL_SPARING_NIB_MASK_VALID_FLAG BIT(3)
+
+#define CXL_GET_SPARING_SAFE_IN_USE(flags) \
+ (FIELD_GET(CXL_SPARING_RD_CAP_SAFE_IN_USE_MASK, \
+ flags) ^ 1)
+#define CXL_GET_CAP_HARD_SPARING(flags) \
+ FIELD_GET(CXL_SPARING_RD_CAP_HARD_SPARING_MASK, \
+ flags)
+#define CXL_GET_CAP_SOFT_SPARING(flags) \
+ FIELD_GET(CXL_SPARING_RD_CAP_SOFT_SPARING_MASK, \
+ flags)
+
+#define CXL_SET_SPARING_QUERY_RESOURCE(val) \
+ FIELD_PREP(CXL_SPARING_QUERY_RESOURCE_FLAG, val)
+#define CXL_SET_HARD_SPARING(val) \
+ FIELD_PREP(CXL_SET_HARD_SPARING_FLAG, val)
+#define CXL_SET_SPARING_SUB_CHNL_VALID(val) \
+ FIELD_PREP(CXL_SPARING_SUB_CHNL_VALID_FLAG, val)
+#define CXL_SET_SPARING_NIB_MASK_VALID(val) \
+ FIELD_PREP(CXL_SPARING_NIB_MASK_VALID_FLAG, val)
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.7.2.3 Table 8-134 Memory Sparing Feature
+ * Readable Attributes.
+ */
+struct cxl_memdev_repair_rd_attrbs_hdr {
+ u8 max_op_latency;
+ __le16 op_cap;
+ __le16 op_mode;
+ u8 op_class;
+ u8 op_subclass;
+ u8 rsvd[9];
+} __packed;
+
+struct cxl_memdev_sparing_rd_attrbs {
+ struct cxl_memdev_repair_rd_attrbs_hdr hdr;
+ u8 rsvd;
+ __le16 restriction_flags;
+} __packed;
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.7.1.4 Table 8-120 Memory Sparing Input Payload.
+ */
+struct cxl_memdev_sparing_in_payload {
+ u8 flags;
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ __le16 column;
+ u8 sub_channel;
+} __packed;
+
+static int
+cxl_mem_sparing_get_attrbs(struct cxl_mem_sparing_context *cxl_sparing_ctx)
+{
+ size_t rd_data_size = sizeof(struct cxl_memdev_sparing_rd_attrbs);
+ struct cxl_memdev *cxlmd = cxl_sparing_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ u16 restriction_flags;
+ size_t data_size;
+ u16 return_code;
+ struct cxl_memdev_sparing_rd_attrbs *rd_attrbs __free(kfree) =
+ kzalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrbs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &cxl_sparing_ctx->repair_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+ rd_data_size, 0, &return_code);
+ if (!data_size)
+ return -EIO;
+
+ cxl_sparing_ctx->op_class = rd_attrbs->hdr.op_class;
+ cxl_sparing_ctx->op_subclass = rd_attrbs->hdr.op_subclass;
+ restriction_flags = le16_to_cpu(rd_attrbs->restriction_flags);
+ cxl_sparing_ctx->cap_safe_when_in_use =
+ CXL_GET_SPARING_SAFE_IN_USE(restriction_flags);
+ cxl_sparing_ctx->cap_hard_sparing =
+ CXL_GET_CAP_HARD_SPARING(restriction_flags);
+ cxl_sparing_ctx->cap_soft_sparing =
+ CXL_GET_CAP_SOFT_SPARING(restriction_flags);
+
+ return 0;
+}
+
+static struct cxl_event_dram *
+cxl_mem_get_rec_dram(struct cxl_memdev *cxlmd,
+ struct cxl_mem_sparing_context *ctx)
+{
+ struct cxl_mem_repair_attrbs attrbs = { 0 };
+
+ attrbs.dpa = ctx->dpa;
+ attrbs.channel = ctx->channel;
+ attrbs.rank = ctx->rank;
+ attrbs.nibble_mask = ctx->nibble_mask;
+ switch (ctx->repair_type) {
+ case EDAC_REPAIR_CACHELINE_SPARING:
+ attrbs.repair_type = CXL_CACHELINE_SPARING;
+ attrbs.bank_group = ctx->bank_group;
+ attrbs.bank = ctx->bank;
+ attrbs.row = ctx->row;
+ attrbs.column = ctx->column;
+ attrbs.sub_channel = ctx->sub_channel;
+ break;
+ case EDAC_REPAIR_ROW_SPARING:
+ attrbs.repair_type = CXL_ROW_SPARING;
+ attrbs.bank_group = ctx->bank_group;
+ attrbs.bank = ctx->bank;
+ attrbs.row = ctx->row;
+ break;
+ case EDAC_REPAIR_BANK_SPARING:
+ attrbs.repair_type = CXL_BANK_SPARING;
+ attrbs.bank_group = ctx->bank_group;
+ attrbs.bank = ctx->bank;
+ break;
+ case EDAC_REPAIR_RANK_SPARING:
+ attrbs.repair_type = CXL_RANK_SPARING;
+ break;
+ default:
+ return NULL;
+ }
+
+ return cxl_find_rec_dram(cxlmd, &attrbs);
+}
+
+static int
+cxl_mem_perform_sparing(struct device *dev,
+ struct cxl_mem_sparing_context *cxl_sparing_ctx)
+{
+ struct cxl_memdev *cxlmd = cxl_sparing_ctx->cxlmd;
+ struct cxl_memdev_sparing_in_payload sparing_pi;
+ struct cxl_event_dram *rec = NULL;
+ u16 validity_flags = 0;
+
+ struct rw_semaphore *region_lock __free(rwsem_read_release) =
+ rwsem_read_intr_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ struct rw_semaphore *dpa_lock __free(rwsem_read_release) =
+ rwsem_read_intr_acquire(&cxl_dpa_rwsem);
+ if (!dpa_lock)
+ return -EINTR;
+
+ if (!cxl_sparing_ctx->cap_safe_when_in_use) {
+ /* Memory to repair must be offline */
+ if (cxl_is_memdev_memory_online(cxlmd))
+ return -EBUSY;
+ } else {
+ if (cxl_is_memdev_memory_online(cxlmd)) {
+ rec = cxl_mem_get_rec_dram(cxlmd, cxl_sparing_ctx);
+ if (!rec)
+ return -EINVAL;
+
+ if (!get_unaligned_le16(rec->media_hdr.validity_flags))
+ return -EINVAL;
+ }
+ }
+
+ memset(&sparing_pi, 0, sizeof(sparing_pi));
+ sparing_pi.flags = CXL_SET_SPARING_QUERY_RESOURCE(0);
+ if (cxl_sparing_ctx->persist_mode)
+ sparing_pi.flags |= CXL_SET_HARD_SPARING(1);
+
+ if (rec)
+ validity_flags = get_unaligned_le16(rec->media_hdr.validity_flags);
+
+ switch (cxl_sparing_ctx->repair_type) {
+ case EDAC_REPAIR_CACHELINE_SPARING:
+ sparing_pi.column = cpu_to_le16(cxl_sparing_ctx->column);
+ if (!rec || (validity_flags & CXL_DER_VALID_SUB_CHANNEL)) {
+ sparing_pi.flags |= CXL_SET_SPARING_SUB_CHNL_VALID(1);
+ sparing_pi.sub_channel = cxl_sparing_ctx->sub_channel;
+ }
+ fallthrough;
+ case EDAC_REPAIR_ROW_SPARING:
+ put_unaligned_le24(cxl_sparing_ctx->row, sparing_pi.row);
+ fallthrough;
+ case EDAC_REPAIR_BANK_SPARING:
+ sparing_pi.bank_group = cxl_sparing_ctx->bank_group;
+ sparing_pi.bank = cxl_sparing_ctx->bank;
+ fallthrough;
+ case EDAC_REPAIR_RANK_SPARING:
+ sparing_pi.rank = cxl_sparing_ctx->rank;
+ fallthrough;
+ default:
+ sparing_pi.channel = cxl_sparing_ctx->channel;
+ if ((rec && (validity_flags & CXL_DER_VALID_NIBBLE)) ||
+ (!rec && (!cxl_sparing_ctx->nibble_mask ||
+ (cxl_sparing_ctx->nibble_mask & 0xFFFFFF)))) {
+ sparing_pi.flags |= CXL_SET_SPARING_NIB_MASK_VALID(1);
+ put_unaligned_le24(cxl_sparing_ctx->nibble_mask,
+ sparing_pi.nibble_mask);
+ }
+ break;
+ }
+
+ return cxl_perform_maintenance(&cxlmd->cxlds->cxl_mbox,
+ cxl_sparing_ctx->op_class,
+ cxl_sparing_ctx->op_subclass,
+ &sparing_pi, sizeof(sparing_pi));
+}
+
+static int cxl_mem_sparing_get_repair_type(struct device *dev, void *drv_data,
+ const char **repair_type)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+
+ switch (ctx->repair_type) {
+ case EDAC_REPAIR_CACHELINE_SPARING:
+ case EDAC_REPAIR_ROW_SPARING:
+ case EDAC_REPAIR_BANK_SPARING:
+ case EDAC_REPAIR_RANK_SPARING:
+ *repair_type = edac_repair_type[ctx->repair_type];
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+#define CXL_SPARING_GET_ATTR(attrb, data_type) \
+ static int cxl_mem_sparing_get_##attrb( \
+ struct device *dev, void *drv_data, data_type *val) \
+ { \
+ struct cxl_mem_sparing_context *ctx = drv_data; \
+ \
+ *val = ctx->attrb; \
+ \
+ return 0; \
+ }
+CXL_SPARING_GET_ATTR(persist_mode, bool)
+CXL_SPARING_GET_ATTR(dpa, u64)
+CXL_SPARING_GET_ATTR(nibble_mask, u32)
+CXL_SPARING_GET_ATTR(bank_group, u32)
+CXL_SPARING_GET_ATTR(bank, u32)
+CXL_SPARING_GET_ATTR(rank, u32)
+CXL_SPARING_GET_ATTR(row, u32)
+CXL_SPARING_GET_ATTR(column, u32)
+CXL_SPARING_GET_ATTR(channel, u32)
+CXL_SPARING_GET_ATTR(sub_channel, u32)
+
+#define CXL_SPARING_SET_ATTR(attrb, data_type) \
+ static int cxl_mem_sparing_set_##attrb(struct device *dev, \
+ void *drv_data, data_type val) \
+ { \
+ struct cxl_mem_sparing_context *ctx = drv_data; \
+ \
+ ctx->attrb = val; \
+ \
+ return 0; \
+ }
+CXL_SPARING_SET_ATTR(nibble_mask, u32)
+CXL_SPARING_SET_ATTR(bank_group, u32)
+CXL_SPARING_SET_ATTR(bank, u32)
+CXL_SPARING_SET_ATTR(rank, u32)
+CXL_SPARING_SET_ATTR(row, u32)
+CXL_SPARING_SET_ATTR(column, u32)
+CXL_SPARING_SET_ATTR(channel, u32)
+CXL_SPARING_SET_ATTR(sub_channel, u32)
+
+static int cxl_mem_sparing_set_persist_mode(struct device *dev, void *drv_data,
+ bool persist_mode)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+
+ if ((persist_mode && ctx->cap_hard_sparing) ||
+ (!persist_mode && ctx->cap_soft_sparing))
+ ctx->persist_mode = persist_mode;
+ else
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static int cxl_get_mem_sparing_safe_when_in_use(struct device *dev,
+ void *drv_data, bool *safe)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+
+ *safe = ctx->cap_safe_when_in_use;
+
+ return 0;
+}
+
+static int cxl_mem_sparing_get_min_dpa(struct device *dev, void *drv_data,
+ u64 *min_dpa)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *min_dpa = cxlds->dpa_res.start;
+
+ return 0;
+}
+
+static int cxl_mem_sparing_get_max_dpa(struct device *dev, void *drv_data,
+ u64 *max_dpa)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *max_dpa = cxlds->dpa_res.end;
+
+ return 0;
+}
+
+static int cxl_mem_sparing_set_dpa(struct device *dev, void *drv_data, u64 dpa)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end)
+ return -EINVAL;
+
+ ctx->dpa = dpa;
+
+ return 0;
+}
+
+static int cxl_do_mem_sparing(struct device *dev, void *drv_data, u32 val)
+{
+ struct cxl_mem_sparing_context *ctx = drv_data;
+
+ if (val != EDAC_DO_MEM_REPAIR)
+ return -EINVAL;
+
+ return cxl_mem_perform_sparing(dev, ctx);
+}
+
+#define RANK_OPS \
+ .get_repair_type = cxl_mem_sparing_get_repair_type, \
+ .get_persist_mode = cxl_mem_sparing_get_persist_mode, \
+ .set_persist_mode = cxl_mem_sparing_set_persist_mode, \
+ .get_repair_safe_when_in_use = cxl_get_mem_sparing_safe_when_in_use, \
+ .get_min_dpa = cxl_mem_sparing_get_min_dpa, \
+ .get_max_dpa = cxl_mem_sparing_get_max_dpa, \
+ .get_dpa = cxl_mem_sparing_get_dpa, \
+ .set_dpa = cxl_mem_sparing_set_dpa, \
+ .get_nibble_mask = cxl_mem_sparing_get_nibble_mask, \
+ .set_nibble_mask = cxl_mem_sparing_set_nibble_mask, \
+ .get_rank = cxl_mem_sparing_get_rank, \
+ .set_rank = cxl_mem_sparing_set_rank, \
+ .get_channel = cxl_mem_sparing_get_channel, \
+ .set_channel = cxl_mem_sparing_set_channel, \
+ .do_repair = cxl_do_mem_sparing
+
+#define BANK_OPS \
+ RANK_OPS, .get_bank_group = cxl_mem_sparing_get_bank_group, \
+ .set_bank_group = cxl_mem_sparing_set_bank_group, \
+ .get_bank = cxl_mem_sparing_get_bank, \
+ .set_bank = cxl_mem_sparing_set_bank
+
+#define ROW_OPS \
+ BANK_OPS, .get_row = cxl_mem_sparing_get_row, \
+ .set_row = cxl_mem_sparing_set_row
+
+#define CACHELINE_OPS \
+ ROW_OPS, .get_column = cxl_mem_sparing_get_column, \
+ .set_column = cxl_mem_sparing_set_column, \
+ .get_sub_channel = cxl_mem_sparing_get_sub_channel, \
+ .set_sub_channel = cxl_mem_sparing_set_sub_channel
+
+static const struct edac_mem_repair_ops cxl_rank_sparing_ops = {
+ RANK_OPS,
+};
+
+static const struct edac_mem_repair_ops cxl_bank_sparing_ops = {
+ BANK_OPS,
+};
+
+static const struct edac_mem_repair_ops cxl_row_sparing_ops = {
+ ROW_OPS,
+};
+
+static const struct edac_mem_repair_ops cxl_cacheline_sparing_ops = {
+ CACHELINE_OPS,
+};
+
+struct cxl_mem_sparing_desc {
+ const uuid_t repair_uuid;
+ enum edac_mem_repair_type repair_type;
+ const struct edac_mem_repair_ops *repair_ops;
+};
+
+static const struct cxl_mem_sparing_desc mem_sparing_desc[] = {
+ {
+ .repair_uuid = CXL_FEAT_CACHELINE_SPARING_UUID,
+ .repair_type = EDAC_REPAIR_CACHELINE_SPARING,
+ .repair_ops = &cxl_cacheline_sparing_ops,
+ },
+ {
+ .repair_uuid = CXL_FEAT_ROW_SPARING_UUID,
+ .repair_type = EDAC_REPAIR_ROW_SPARING,
+ .repair_ops = &cxl_row_sparing_ops,
+ },
+ {
+ .repair_uuid = CXL_FEAT_BANK_SPARING_UUID,
+ .repair_type = EDAC_REPAIR_BANK_SPARING,
+ .repair_ops = &cxl_bank_sparing_ops,
+ },
+ {
+ .repair_uuid = CXL_FEAT_RANK_SPARING_UUID,
+ .repair_type = EDAC_REPAIR_RANK_SPARING,
+ .repair_ops = &cxl_rank_sparing_ops,
+ },
+};
+
+static int cxl_memdev_sparing_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature,
+ const struct cxl_mem_sparing_desc *desc,
+ u8 repair_inst)
+{
+ struct cxl_mem_sparing_context *cxl_sparing_ctx;
+ struct cxl_feat_entry *feat_entry;
+ int ret;
+
+ feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
+ &desc->repair_uuid);
+ if (IS_ERR(feat_entry))
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ cxl_sparing_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_sparing_ctx),
+ GFP_KERNEL);
+ if (!cxl_sparing_ctx)
+ return -ENOMEM;
+
+ *cxl_sparing_ctx = (struct cxl_mem_sparing_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .cxlmd = cxlmd,
+ .repair_type = desc->repair_type,
+ .instance = repair_inst++,
+ };
+ uuid_copy(&cxl_sparing_ctx->repair_uuid, &desc->repair_uuid);
+
+ ret = cxl_mem_sparing_get_attrbs(cxl_sparing_ctx);
+ if (ret)
+ return ret;
+
+ if (cxl_sparing_ctx->cap_soft_sparing)
+ cxl_sparing_ctx->persist_mode = 0;
+ else if (cxl_sparing_ctx->cap_hard_sparing)
+ cxl_sparing_ctx->persist_mode = 1;
+ else
+ return -EOPNOTSUPP;
+
+ ras_feature->ft_type = RAS_FEAT_MEM_REPAIR;
+ ras_feature->instance = cxl_sparing_ctx->instance;
+ ras_feature->mem_repair_ops = desc->repair_ops;
+ ras_feature->ctx = cxl_sparing_ctx;
+
+ return 0;
+}
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
@@ -1165,6 +1692,19 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
}
if (IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR)) {
+ for (int i = 0; i < CXL_MEM_SPARING_MAX; i++) {
+ rc = cxl_memdev_sparing_init(cxlmd,
+ &ras_features[num_ras_features],
+ &mem_sparing_desc[i], repair_inst);
+ if (rc == -EOPNOTSUPP)
+ continue;
+ if (rc < 0)
+ return rc;
+
+ repair_inst++;
+ num_ras_features++;
+ }
+
if (repair_inst) {
struct cxl_mem_err_rec *array_rec =
devm_kzalloc(&cxlmd->dev, sizeof(*array_rec),
diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
index 3b1a845457b0..d1a8caa85369 100755
--- a/drivers/edac/mem_repair.c
+++ b/drivers/edac/mem_repair.c
@@ -45,6 +45,15 @@ struct edac_mem_repair_context {
struct attribute_group group;
};
+const char * const edac_repair_type[] = {
+ [EDAC_REPAIR_PPR] = "ppr",
+ [EDAC_REPAIR_CACHELINE_SPARING] = "cacheline-sparing",
+ [EDAC_REPAIR_ROW_SPARING] = "row-sparing",
+ [EDAC_REPAIR_BANK_SPARING] = "bank-sparing",
+ [EDAC_REPAIR_RANK_SPARING] = "rank-sparing",
+};
+EXPORT_SYMBOL_GPL(edac_repair_type);
+
#define TO_MR_DEV_ATTR(_dev_attr) \
container_of(_dev_attr, struct edac_mem_repair_dev_attr, dev_attr)
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 451f9c152c99..fa32f2aca22f 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -745,9 +745,16 @@ static inline int edac_ecs_get_desc(struct device *ecs_dev,
#endif /* CONFIG_EDAC_ECS */
enum edac_mem_repair_type {
+ EDAC_REPAIR_PPR,
+ EDAC_REPAIR_CACHELINE_SPARING,
+ EDAC_REPAIR_ROW_SPARING,
+ EDAC_REPAIR_BANK_SPARING,
+ EDAC_REPAIR_RANK_SPARING,
EDAC_REPAIR_MAX
};
+extern const char * const edac_repair_type[];
+
enum edac_mem_repair_cmd {
EDAC_DO_MEM_REPAIR = 1,
};
--
2.43.0
* [PATCH v6 8/8] cxl/edac: Add CXL memory device soft PPR control feature
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (6 preceding siblings ...)
2025-05-21 12:47 ` [PATCH v6 7/8] cxl/edac: Add CXL memory device memory sparing control feature shiju.jose
@ 2025-05-21 12:47 ` shiju.jose
2025-05-21 14:59 ` [PATCH v6 0/8] cxl: support CXL memory RAS features Jonathan Cameron
` (3 subsequent siblings)
11 siblings, 0 replies; 21+ messages in thread
From: shiju.jose @ 2025-05-21 12:47 UTC (permalink / raw)
To: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR Maintenance operations. DRAM components may support two
types of PPR, hard PPR (hPPR), for a permanent row repair, and Soft PPR
(sPPR), for a temporary row repair. Soft PPR is much faster than hPPR,
but the repair is lost with a power cycle.
During the execution of a PPR Maintenance operation, a CXL memory device:
- May or may not retain data
- May or may not be able to process CXL.mem requests correctly, including
the ones that target the DPA involved in the repair.
These CXL Memory Device capabilities are specified by Restriction Flags
in the sPPR Feature and hPPR Feature.
A soft PPR maintenance operation may be executed at runtime, if data is
retained and CXL.mem requests are correctly processed. For CXL devices
with DRAM components, an hPPR maintenance operation may be executed only
at boot because data is typically not retained across an hPPR
maintenance operation.
When a CXL device identifies an error on a memory component, the device
may inform the host about the need for a PPR maintenance operation by using
an Event Record, where the Maintenance Needed flag is set. The Event Record
specifies the DPA that should be repaired. A CXL device may not keep track
of the requests that have already been sent and the information on which
DPA should be repaired may be lost upon power cycle.
The userspace tool requests a maintenance operation if the number of
corrected errors reported on a CXL.mem media exceeds the error threshold.
CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR)
maintenance operation and section 8.2.10.7.1.3 describes the device's
hPPR (hard PPR) maintenance operation feature.
CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and
configuration.
CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and
configuration.
Add support for controlling the CXL memory device soft PPR (sPPR)
feature. Register with the EDAC driver, which gets the memory repair
attribute descriptors from the EDAC memory repair driver and exposes
sysfs repair control attributes for PPR to userspace. For example, CXL
PPR control for the CXL mem0 device is exposed in
/sys/bus/edac/devices/cxl_mem0/mem_repairX/
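For illustration, a soft PPR request through this interface might look
like the shell sketch below (attribute names per
Documentation/ABI/testing/sysfs-edac-memory-repair; the instance number
and values are hypothetical):

  # Request a soft PPR repair on cxl_mem0 for a DPA taken from a CXL
  # error record reported in the current boot.
  cd /sys/bus/edac/devices/cxl_mem0/mem_repairX
  cat repair_type        # "ppr"
  echo 0x700000 > dpa    # a DPA must be set; zero is rejected
  echo 0x3 > nibble_mask # optional, when reported in the event record
  echo 1 > repair        # trigger the repair (EDAC_DO_MEM_REPAIR)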
Add checks to ensure that the memory to be repaired is offline or, if
online, that it originates from a CXL DRAM or CXL gen_media error record
reported in the current boot, before requesting a PPR operation on the
device.
Note: Tested with QEMU patch for CXL PPR feature.
https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m70b2b010f43f7f4a6f9acee5ec9008498bf292c3
[dj: Check return from cxl_feature_info() with IS_ERR]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/memory_repair.rst | 10 +
drivers/cxl/core/edac.c | 328 ++++++++++++++++++++++++++-
2 files changed, 337 insertions(+), 1 deletion(-)
diff --git a/Documentation/edac/memory_repair.rst b/Documentation/edac/memory_repair.rst
index 5b1cd8297442..5f8da7c9b186 100644
--- a/Documentation/edac/memory_repair.rst
+++ b/Documentation/edac/memory_repair.rst
@@ -138,5 +138,15 @@ CXL device to perform a repair operation on its media. For example, a CXL
device with DRAM components that support memory sparing features may
implement sparing maintenance operations.
+2. CXL memory Soft Post Package Repair (sPPR)
+
+Post Package Repair (PPR) maintenance operations may be supported by CXL
+devices that implement CXL.mem protocol. A PPR maintenance operation
+requests the CXL device to perform a repair operation on its media.
+For example, a CXL device with DRAM components that support PPR features
+may implement PPR Maintenance operations. Soft PPR (sPPR) is a temporary
+row repair. Soft PPR may be faster, but the repair is lost with a power
+cycle.
+
Sysfs files for memory repair are documented in
`Documentation/ABI/testing/sysfs-edac-memory-repair`
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index 90f7f8df34bf..eba00e99bf4b 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -14,6 +14,7 @@
#include <linux/cleanup.h>
#include <linux/edac.h>
#include <linux/limits.h>
+#include <linux/unaligned.h>
#include <linux/xarray.h>
#include <cxl/features.h>
#include <cxl.h>
@@ -21,7 +22,7 @@
#include "core.h"
#include "trace.h"
-#define CXL_NR_EDAC_DEV_FEATURES 6
+#define CXL_NR_EDAC_DEV_FEATURES 7
#define CXL_SCRUB_NO_REGION -1
@@ -1666,6 +1667,321 @@ static int cxl_memdev_sparing_init(struct cxl_memdev *cxlmd,
return 0;
}
+/*
+ * CXL memory soft PPR & hard PPR control
+ */
+struct cxl_ppr_context {
+ uuid_t repair_uuid;
+ u8 instance;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 effects;
+ u8 op_class;
+ u8 op_subclass;
+ bool cap_dpa;
+ bool cap_nib_mask;
+ bool media_accessible;
+ bool data_retained;
+ struct cxl_memdev *cxlmd;
+ enum edac_mem_repair_type repair_type;
+ bool persist_mode;
+ u64 dpa;
+ u32 nibble_mask;
+};
+
+/*
+ * See CXL rev 3.2 @8.2.10.7.2.1 Table 8-128 sPPR Feature Readable Attributes
+ *
+ * See CXL rev 3.2 @8.2.10.7.2.2 Table 8-131 hPPR Feature Readable Attributes
+ */
+
+#define CXL_PPR_OP_CAP_DEVICE_INITIATED BIT(0)
+#define CXL_PPR_OP_MODE_DEV_INITIATED BIT(0)
+
+#define CXL_PPR_FLAG_DPA_SUPPORT_MASK BIT(0)
+#define CXL_PPR_FLAG_NIB_SUPPORT_MASK BIT(1)
+#define CXL_PPR_FLAG_MEM_SPARING_EV_REC_SUPPORT_MASK BIT(2)
+#define CXL_PPR_FLAG_DEV_INITED_PPR_AT_BOOT_CAP_MASK BIT(3)
+
+#define CXL_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK BIT(0)
+#define CXL_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK BIT(2)
+
+#define CXL_PPR_SPARING_EV_REC_EN_MASK BIT(0)
+#define CXL_PPR_DEV_INITED_PPR_AT_BOOT_EN_MASK BIT(1)
+
+#define CXL_PPR_GET_CAP_DPA(flags) \
+ FIELD_GET(CXL_PPR_FLAG_DPA_SUPPORT_MASK, flags)
+#define CXL_PPR_GET_CAP_NIB_MASK(flags) \
+ FIELD_GET(CXL_PPR_FLAG_NIB_SUPPORT_MASK, flags)
+#define CXL_PPR_GET_MEDIA_ACCESSIBLE(restriction_flags) \
+ (FIELD_GET(CXL_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK, \
+ restriction_flags) ^ 1)
+#define CXL_PPR_GET_DATA_RETAINED(restriction_flags) \
+ (FIELD_GET(CXL_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK, \
+ restriction_flags) ^ 1)
+
+struct cxl_memdev_ppr_rd_attrbs {
+ struct cxl_memdev_repair_rd_attrbs_hdr hdr;
+ u8 ppr_flags;
+ __le16 restriction_flags;
+ u8 ppr_op_mode;
+} __packed;
+
+/*
+ * See CXL rev 3.2 @8.2.10.7.1.2 Table 8-118 sPPR Maintenance Input Payload
+ *
+ * See CXL rev 3.2 @8.2.10.7.1.3 Table 8-119 hPPR Maintenance Input Payload
+ */
+struct cxl_memdev_ppr_maintenance_attrbs {
+ u8 flags;
+ __le64 dpa;
+ u8 nibble_mask[3];
+} __packed;
+
+static int cxl_mem_ppr_get_attrbs(struct cxl_ppr_context *cxl_ppr_ctx)
+{
+ size_t rd_data_size = sizeof(struct cxl_memdev_ppr_rd_attrbs);
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+ u16 restriction_flags;
+ size_t data_size;
+ u16 return_code;
+
+ struct cxl_memdev_ppr_rd_attrbs *rd_attrbs __free(kfree) =
+ kmalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrbs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxl_mbox, &cxl_ppr_ctx->repair_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+ rd_data_size, 0, &return_code);
+ if (!data_size)
+ return -EIO;
+
+ cxl_ppr_ctx->op_class = rd_attrbs->hdr.op_class;
+ cxl_ppr_ctx->op_subclass = rd_attrbs->hdr.op_subclass;
+ cxl_ppr_ctx->cap_dpa = CXL_PPR_GET_CAP_DPA(rd_attrbs->ppr_flags);
+ cxl_ppr_ctx->cap_nib_mask =
+ CXL_PPR_GET_CAP_NIB_MASK(rd_attrbs->ppr_flags);
+
+ restriction_flags = le16_to_cpu(rd_attrbs->restriction_flags);
+ cxl_ppr_ctx->media_accessible =
+ CXL_PPR_GET_MEDIA_ACCESSIBLE(restriction_flags);
+ cxl_ppr_ctx->data_retained =
+ CXL_PPR_GET_DATA_RETAINED(restriction_flags);
+
+ return 0;
+}
+
+static int cxl_mem_perform_ppr(struct cxl_ppr_context *cxl_ppr_ctx)
+{
+ struct cxl_memdev_ppr_maintenance_attrbs maintenance_attrbs;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_mem_repair_attrbs attrbs = { 0 };
+
+ struct rw_semaphore *region_lock __free(rwsem_read_release) =
+ rwsem_read_intr_acquire(&cxl_region_rwsem);
+ if (!region_lock)
+ return -EINTR;
+
+ struct rw_semaphore *dpa_lock __free(rwsem_read_release) =
+ rwsem_read_intr_acquire(&cxl_dpa_rwsem);
+ if (!dpa_lock)
+ return -EINTR;
+
+ if (!cxl_ppr_ctx->media_accessible || !cxl_ppr_ctx->data_retained) {
+ /* Memory to repair must be offline */
+ if (cxl_is_memdev_memory_online(cxlmd))
+ return -EBUSY;
+ } else {
+ if (cxl_is_memdev_memory_online(cxlmd)) {
+ /* Check memory to repair is from the current boot */
+ attrbs.repair_type = CXL_PPR;
+ attrbs.dpa = cxl_ppr_ctx->dpa;
+ attrbs.nibble_mask = cxl_ppr_ctx->nibble_mask;
+ if (!cxl_find_rec_dram(cxlmd, &attrbs) &&
+ !cxl_find_rec_gen_media(cxlmd, &attrbs))
+ return -EINVAL;
+ }
+ }
+
+ memset(&maintenance_attrbs, 0, sizeof(maintenance_attrbs));
+ maintenance_attrbs.flags = 0;
+ maintenance_attrbs.dpa = cpu_to_le64(cxl_ppr_ctx->dpa);
+ put_unaligned_le24(cxl_ppr_ctx->nibble_mask,
+ maintenance_attrbs.nibble_mask);
+
+ return cxl_perform_maintenance(&cxlmd->cxlds->cxl_mbox,
+ cxl_ppr_ctx->op_class,
+ cxl_ppr_ctx->op_subclass,
+ &maintenance_attrbs,
+ sizeof(maintenance_attrbs));
+}
+
+static int cxl_ppr_get_repair_type(struct device *dev, void *drv_data,
+ const char **repair_type)
+{
+ *repair_type = edac_repair_type[EDAC_REPAIR_PPR];
+
+ return 0;
+}
+
+static int cxl_ppr_get_persist_mode(struct device *dev, void *drv_data,
+ bool *persist_mode)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *persist_mode = cxl_ppr_ctx->persist_mode;
+
+ return 0;
+}
+
+static int cxl_get_ppr_safe_when_in_use(struct device *dev, void *drv_data,
+ bool *safe)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *safe = cxl_ppr_ctx->media_accessible & cxl_ppr_ctx->data_retained;
+
+ return 0;
+}
+
+static int cxl_ppr_get_min_dpa(struct device *dev, void *drv_data, u64 *min_dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *min_dpa = cxlds->dpa_res.start;
+
+ return 0;
+}
+
+static int cxl_ppr_get_max_dpa(struct device *dev, void *drv_data, u64 *max_dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ *max_dpa = cxlds->dpa_res.end;
+
+ return 0;
+}
+
+static int cxl_ppr_get_dpa(struct device *dev, void *drv_data, u64 *dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *dpa = cxl_ppr_ctx->dpa;
+
+ return 0;
+}
+
+static int cxl_ppr_set_dpa(struct device *dev, void *drv_data, u64 dpa)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end)
+ return -EINVAL;
+
+ cxl_ppr_ctx->dpa = dpa;
+
+ return 0;
+}
+
+static int cxl_ppr_get_nibble_mask(struct device *dev, void *drv_data,
+ u32 *nibble_mask)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *nibble_mask = cxl_ppr_ctx->nibble_mask;
+
+ return 0;
+}
+
+static int cxl_ppr_set_nibble_mask(struct device *dev, void *drv_data,
+ u32 nibble_mask)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ cxl_ppr_ctx->nibble_mask = nibble_mask;
+
+ return 0;
+}
+
+static int cxl_do_ppr(struct device *dev, void *drv_data, u32 val)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ if (!cxl_ppr_ctx->dpa || val != EDAC_DO_MEM_REPAIR)
+ return -EINVAL;
+
+ return cxl_mem_perform_ppr(cxl_ppr_ctx);
+}
+
+static const struct edac_mem_repair_ops cxl_sppr_ops = {
+ .get_repair_type = cxl_ppr_get_repair_type,
+ .get_persist_mode = cxl_ppr_get_persist_mode,
+ .get_repair_safe_when_in_use = cxl_get_ppr_safe_when_in_use,
+ .get_min_dpa = cxl_ppr_get_min_dpa,
+ .get_max_dpa = cxl_ppr_get_max_dpa,
+ .get_dpa = cxl_ppr_get_dpa,
+ .set_dpa = cxl_ppr_set_dpa,
+ .get_nibble_mask = cxl_ppr_get_nibble_mask,
+ .set_nibble_mask = cxl_ppr_set_nibble_mask,
+ .do_repair = cxl_do_ppr,
+};
+
+static int cxl_memdev_soft_ppr_init(struct cxl_memdev *cxlmd,
+ struct edac_dev_feature *ras_feature,
+ u8 repair_inst)
+{
+ struct cxl_ppr_context *cxl_sppr_ctx;
+ struct cxl_feat_entry *feat_entry;
+ int ret;
+
+ feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
+ &CXL_FEAT_SPPR_UUID);
+ if (IS_ERR(feat_entry))
+ return -EOPNOTSUPP;
+
+ if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+ return -EOPNOTSUPP;
+
+ cxl_sppr_ctx =
+ devm_kzalloc(&cxlmd->dev, sizeof(*cxl_sppr_ctx), GFP_KERNEL);
+ if (!cxl_sppr_ctx)
+ return -ENOMEM;
+
+ *cxl_sppr_ctx = (struct cxl_ppr_context){
+ .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+ .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+ .get_version = feat_entry->get_feat_ver,
+ .set_version = feat_entry->set_feat_ver,
+ .effects = le16_to_cpu(feat_entry->effects),
+ .cxlmd = cxlmd,
+ .repair_type = EDAC_REPAIR_PPR,
+ .persist_mode = 0,
+ .instance = repair_inst,
+ };
+ uuid_copy(&cxl_sppr_ctx->repair_uuid, &CXL_FEAT_SPPR_UUID);
+
+ ret = cxl_mem_ppr_get_attrbs(cxl_sppr_ctx);
+ if (ret)
+ return ret;
+
+ ras_feature->ft_type = RAS_FEAT_MEM_REPAIR;
+ ras_feature->instance = cxl_sppr_ctx->instance;
+ ras_feature->mem_repair_ops = &cxl_sppr_ops;
+ ras_feature->ctx = cxl_sppr_ctx;
+
+ return 0;
+}
+
int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
{
struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
@@ -1705,6 +2021,16 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
num_ras_features++;
}
+ rc = cxl_memdev_soft_ppr_init(cxlmd, &ras_features[num_ras_features],
+ repair_inst);
+ if (rc < 0 && rc != -EOPNOTSUPP)
+ return rc;
+
+ if (rc != -EOPNOTSUPP) {
+ repair_inst++;
+ num_ras_features++;
+ }
+
if (repair_inst) {
struct cxl_mem_err_rec *array_rec =
devm_kzalloc(&cxlmd->dev, sizeof(*array_rec),
--
2.43.0
* Re: [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-05-21 12:47 ` [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
@ 2025-05-21 14:40 ` Jonathan Cameron
2025-05-21 23:55 ` Dave Jiang
2025-05-21 17:07 ` Alison Schofield
1 sibling, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2025-05-21 14:40 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, dave.jiang, dave, alison.schofield,
vishal.l.verma, ira.weiny, linux-edac, linux-doc, bp, tony.luck,
lenb, Yazen.Ghannam, mchehab, nifan.cxl, linuxarm, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang
On Wed, 21 May 2025 13:47:41 +0100
<shiju.jose@huawei.com> wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
> control feature. The device patrol scrub proactively locates and makes
> corrections to errors in regular cycle.
>
> Allow specifying the number of hours within which the patrol scrub must be
> completed, subject to minimum and maximum limits reported by the device.
> Also allow disabling scrub allowing trade-off error rates against
> performance.
>
> Add support for patrol scrub control on CXL memory devices.
> Register with the EDAC device driver, which retrieves the scrub attribute
> descriptors from EDAC scrub and exposes the sysfs scrub control attributes
> to userspace. For example, scrub control for the CXL memory device
> "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
>
> Additionally, add support for region-based CXL memory patrol scrub control.
> CXL memory regions may be interleaved across one or more CXL memory
> devices. For example, region-based scrub control for "cxl_region1" is
> exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
>
> [dj: Add cxl_test inclusion of edac.o]
> [dj: Check return from cxl_feature_info() with IS_ERR]
Trivial question on these. What do they reflect? Some changes
Dave made on a prior version? Or changes in response to feedback
(in which case they should be below the ---)
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
A couple of trivial formatting things inline from the refactors
in this version. Maybe Dave can tweak them whilst applying if
nothing else comes up?
J
> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> new file mode 100644
> index 000000000000..eae99ed7c018
> --- /dev/null
> +++ b/drivers/cxl/core/edac.c
> @@ -0,0 +1,520 @@
> +static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
> + u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
> +{
> + struct cxl_mailbox *cxl_mbox;
> + u8 min_scrub_cycle = U8_MAX;
> + struct cxl_region_params *p;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> + int i, ret;
> +
> + if (!cxl_ps_ctx->cxlr) {
> + cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
> + return cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
> + flags, min_cycle);
> + }
> +
> + struct rw_semaphore *region_lock __free(rwsem_read_release) =
> + rwsem_read_intr_acquire(&cxl_region_rwsem);
Trivial but that should be indented one tab more.
> + if (!region_lock)
> + return -EINTR;
> +
> + cxlr = cxl_ps_ctx->cxlr;
> + p = &cxlr->params;
> +
> + for (i = 0; i < p->nr_targets; i++) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> + ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
> + flags, min_cycle);
Maybe move flags to previous line.
> + if (ret)
> + return ret;
> +
> + if (min_cycle)
> + min_scrub_cycle =
> + min(*min_cycle, min_scrub_cycle);
No need for the line wrap any more.
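i.e., putting both tweaks together, something like this (a
whitespace-only sketch, untested):

	ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle, flags,
				       min_cycle);
	if (ret)
		return ret;

	if (min_cycle)
		min_scrub_cycle = min(*min_cycle, min_scrub_cycle);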
> + }
> +
> + if (min_cycle)
> + *min_cycle = min_scrub_cycle;
> +
> + return 0;
> +}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 0/8] cxl: support CXL memory RAS features
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (7 preceding siblings ...)
2025-05-21 12:47 ` [PATCH v6 8/8] cxl/edac: Add CXL memory device soft PPR " shiju.jose
@ 2025-05-21 14:59 ` Jonathan Cameron
2025-05-21 20:19 ` Alison Schofield
` (2 subsequent siblings)
11 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2025-05-21 14:59 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, dave.jiang, dave, alison.schofield,
vishal.l.verma, ira.weiny, linux-edac, linux-doc, bp, tony.luck,
lenb, Yazen.Ghannam, mchehab, nifan.cxl, linuxarm, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang
On Wed, 21 May 2025 13:47:38 +0100
<shiju.jose@huawei.com> wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Support for CXL memory EDAC features: patrol scrub, ECS, soft-PPR and
> memory sparing.
Thanks for the quick turnaround! I took a (hopefully) final look through
and all I found was a couple of places where the whitespace was slightly off
after the refactors for v6. Maybe Dave is fine tweaking those whilst
applying if there is no other reason to do another spin?
Thanks,
Jonathan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
2025-05-21 12:47 ` [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
@ 2025-05-21 16:28 ` Fan Ni
0 siblings, 0 replies; 21+ messages in thread
From: Fan Ni @ 2025-05-21 16:28 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny, linux-edac,
linux-doc, bp, tony.luck, lenb, Yazen.Ghannam, mchehab, nifan.cxl,
linuxarm, tanxiaofei, prime.zeng, roberto.sassu, kangkang.shen,
wanghuiqiang
On Wed, May 21, 2025 at 01:47:39PM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Update Documentation/edac/scrub.rst to include use cases and policies
> for CXL memory device-based and CXL region-based patrol scrub control,
> and for CXL Error Check Scrub (ECS).
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> ---
> Documentation/edac/scrub.rst | 76 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 76 insertions(+)
>
> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
> index daab929cdba1..2cfa74fa1ffd 100644
> --- a/Documentation/edac/scrub.rst
> +++ b/Documentation/edac/scrub.rst
> @@ -264,3 +264,79 @@ Sysfs files are documented in
> `Documentation/ABI/testing/sysfs-edac-scrub`
>
> `Documentation/ABI/testing/sysfs-edac-ecs`
> +
> +Examples
> +--------
> +
> +The usage takes the form shown in these examples:
> +
> +1. CXL memory Patrol Scrub
> +
> +The following are the identified use cases for increasing the scrub rate.
> +
> +- Scrubbing is needed at device granularity because a device is showing
> + unexpectedly high errors.
> +
> +- Scrubbing may apply to memory that isn't online at all yet. This is
> +  likely a system-wide default setting on boot.
> +
> +- Scrubbing at a higher rate because monitoring software has determined
> +  that more reliability is necessary for a particular data set. This is
> +  called Differentiated Reliability.
> +
> +1.1. Device based scrubbing
> +
> +CXL memory is exposed to the memory management subsystem and ultimately
> +userspace via CXL devices. Device-based scrubbing is used for the first
> +use case described in "Section 1 CXL Memory Patrol Scrub".
> +
> +When combining control via the device interfaces and region interfaces,
> +see "Section 1.2 Region based scrubbing".
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub`
> +
> +1.2. Region based scrubbing
> +
> +CXL memory is exposed to the memory management subsystem and ultimately
> +userspace via CXL regions. CXL regions represent mapped memory capacity
> +in system physical address space. These can incorporate one or more parts
> +of multiple CXL memory devices, with traffic interleaved across them. The
> +user may want to control the scrub rate via this more abstract region
> +instead of having to figure out the constituent devices and program them
> +separately. The scrub rate for each device covers the whole device, so if
> +multiple regions use parts of that device, requests for scrubbing of other
> +regions may result in a higher scrub rate than requested for this specific
> +region.
> +
> +Region-based scrubbing is used for the third use case described in
> +"Section 1 CXL Memory Patrol Scrub".
> +
> +Userspace must follow the rules below when setting the scrub rates for
> +any mixture of requirements.
> +
> +1. Take each region in turn, from lowest desired scrub rate to highest,
> +   and set its scrub rate. Later regions may override the scrub rate on
> +   individual devices (and hence potentially whole regions).
> +
> +2. Take each device for which enhanced scrubbing is required (a higher
> +   rate) and set its scrub rate. This will override the scrub rates of
> +   the individual devices, setting them to the maximum rate required for
> +   any of the regions they help back, unless a specific rate is already
> +   defined.
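To make the ordering concrete, here is a rough userspace sketch. The
region/device names are hypothetical, and the attribute path assumes the
current_cycle_duration file from
Documentation/ABI/testing/sysfs-edac-scrub (values in seconds):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	static void set_cycle(const char *attr, unsigned int secs)
	{
		int fd = open(attr, O_WRONLY);

		if (fd < 0)
			return;
		dprintf(fd, "%u\n", secs);
		close(fd);
	}

	int main(void)
	{
		/* Rule 1: regions first, lowest desired rate (longest cycle) first */
		set_cycle("/sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration",
			  24 * 3600);
		set_cycle("/sys/bus/edac/devices/cxl_region1/scrub0/current_cycle_duration",
			  12 * 3600);
		/* Rule 2: per-device enhanced scrubbing overrides, applied last */
		set_cycle("/sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration",
			  3600);
		return 0;
	}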
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub`
> +
> +2. CXL memory Error Check Scrub (ECS)
> +
> +The Error Check Scrub (ECS) feature enables a memory device to perform
> +error checking and correction (ECC) and to count single-bit errors. The
> +associated memory controller sets the ECS mode with a trigger sent to
> +the memory device. CXL ECS control allows the host, and thus userspace,
> +to change the error count mode, to change the threshold number of errors
> +per segment used for reporting errors (indicating how many segments have
> +at least that number of errors), and to reset the ECS counter. The
> +responsibility for initiating Error Check Scrub on a memory device may
> +therefore lie with the memory controller or platform when unexpectedly
> +high error rates are detected.
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-ecs`
> --
> 2.43.0
>
--
Fan Ni (From gmail)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 2/8] cxl: Update prototype of function get_support_feature_info()
2025-05-21 12:47 ` [PATCH v6 2/8] cxl: Update prototype of function get_support_feature_info() shiju.jose
@ 2025-05-21 16:31 ` Fan Ni
0 siblings, 0 replies; 21+ messages in thread
From: Fan Ni @ 2025-05-21 16:31 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
alison.schofield, vishal.l.verma, ira.weiny, linux-edac,
linux-doc, bp, tony.luck, lenb, Yazen.Ghannam, mchehab, nifan.cxl,
linuxarm, tanxiaofei, prime.zeng, roberto.sassu, kangkang.shen,
wanghuiqiang
On Wed, May 21, 2025 at 01:47:40PM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Make the following changes to get_support_feature_info():
> 1. Make it generic so it can be shared between the cxl-fwctl and
>    cxl-edac paths.
> 2. Rename get_support_feature_info() to cxl_feature_info().
> 3. Change the parameter const struct fwctl_rpc_cxl *rpc_in to
>    const uuid_t *uuid.
>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> ---
> drivers/cxl/core/core.h | 2 ++
> drivers/cxl/core/features.c | 17 +++++++----------
> 2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 17b692eb3257..613cce5c4f7b 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -124,6 +124,8 @@ int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
> int nid, resource_size_t *size);
>
> #ifdef CONFIG_CXL_FEATURES
> +struct cxl_feat_entry *
> +cxl_feature_info(struct cxl_features_state *cxlfs, const uuid_t *uuid);
> size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
> enum cxl_get_feat_selection selection,
> void *feat_out, size_t feat_out_size, u16 offset,
> diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
> index 1498e2369c37..a83a2214a136 100644
> --- a/drivers/cxl/core/features.c
> +++ b/drivers/cxl/core/features.c
> @@ -355,17 +355,11 @@ static void cxlctl_close_uctx(struct fwctl_uctx *uctx)
> {
> }
>
> -static struct cxl_feat_entry *
> -get_support_feature_info(struct cxl_features_state *cxlfs,
> - const struct fwctl_rpc_cxl *rpc_in)
> +struct cxl_feat_entry *
> +cxl_feature_info(struct cxl_features_state *cxlfs,
> + const uuid_t *uuid)
> {
> struct cxl_feat_entry *feat;
> - const uuid_t *uuid;
> -
> - if (rpc_in->op_size < sizeof(uuid))
> - return ERR_PTR(-EINVAL);
> -
> - uuid = &rpc_in->set_feat_in.uuid;
>
> for (int i = 0; i < cxlfs->entries->num_features; i++) {
> feat = &cxlfs->entries->ent[i];
> @@ -547,7 +541,10 @@ static bool cxlctl_validate_set_features(struct cxl_features_state *cxlfs,
> struct cxl_feat_entry *feat;
> u32 flags;
>
> - feat = get_support_feature_info(cxlfs, rpc_in);
> + if (rpc_in->op_size < sizeof(uuid_t))
> + return false;
> +
> + feat = cxl_feature_info(cxlfs, &rpc_in->set_feat_in.uuid);
> if (IS_ERR(feat))
> return false;
>
> --
> 2.43.0
>
--
Fan Ni (From gmail)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-05-21 12:47 ` [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
2025-05-21 14:40 ` Jonathan Cameron
@ 2025-05-21 17:07 ` Alison Schofield
2025-05-21 17:48 ` Jonathan Cameron
1 sibling, 1 reply; 21+ messages in thread
From: Alison Schofield @ 2025-05-21 17:07 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
vishal.l.verma, ira.weiny, linux-edac, linux-doc, bp, tony.luck,
lenb, Yazen.Ghannam, mchehab, nifan.cxl, linuxarm, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang
On Wed, May 21, 2025 at 01:47:41PM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
> > control feature. The device patrol scrub proactively locates and
> > corrects errors on a regular cycle.
snip
> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> new file mode 100644
> index 000000000000..eae99ed7c018
> --- /dev/null
> +++ b/drivers/cxl/core/edac.c
> @@ -0,0 +1,520 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * CXL EDAC memory feature driver.
> + *
> + * Copyright (c) 2024-2025 HiSilicon Limited.
> + *
> + * - Supports functions to configure EDAC features of the
> + * CXL memory devices.
> + * - Registers with the EDAC device subsystem driver to expose
> + * the features sysfs attributes to the user for configuring
> + * CXL memory RAS feature.
> + */
> +
> +#include <linux/cleanup.h>
> +#include <linux/edac.h>
> +#include <linux/limits.h>
> +#include <cxl/features.h>
This needs tidying up. It's not clear that the tidy-up belongs in this patch.
sparse now complains:
drivers/cxl/core/edac.c: note: in included file:
./include/cxl/features.h:67:43: error: marked inline, but without a definition
because there is a proto of this in include/cxl/features.h and it is defined in
core/features.c. Compiler is looking for the definition in edac.c.
Removing the inline works for me, but that may not be the right solution:
diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
index a83a2214a136..4599e1d7668a 100644
--- a/drivers/cxl/core/features.c
+++ b/drivers/cxl/core/features.c
@@ -36,7 +36,7 @@ static bool is_cxl_feature_exclusive(struct cxl_feat_entry *entry)
return is_cxl_feature_exclusive_by_uuid(&entry->uuid);
}
-inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
+struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
{
return cxlds->cxlfs;
}
diff --git a/include/cxl/features.h b/include/cxl/features.h
index 5f7f842765a5..b9297693dae7 100644
--- a/include/cxl/features.h
+++ b/include/cxl/features.h
@@ -64,7 +64,7 @@ struct cxl_features_state {
struct cxl_mailbox;
struct cxl_memdev;
#ifdef CONFIG_CXL_FEATURES
-inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);
+struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);
int devm_cxl_setup_features(struct cxl_dev_state *cxlds);
int devm_cxl_setup_fwctl(struct device *host, struct cxl_memdev *cxlmd);
#else
> +#include <cxl.h>
> +#include <cxlmem.h>
> +#include "core.h"
> +
> +#define CXL_NR_EDAC_DEV_FEATURES 1
> +
> +#define CXL_SCRUB_NO_REGION -1
> +
> +struct cxl_patrol_scrub_context {
> + u8 instance;
> + u16 get_feat_size;
> + u16 set_feat_size;
> + u8 get_version;
> + u8 set_version;
> + u16 effects;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> +};
> +
> +/*
> + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub Control
> + * Feature Readable Attributes.
> + */
> +struct cxl_scrub_rd_attrbs {
> + u8 scrub_cycle_cap;
> + __le16 scrub_cycle_hours;
> + u8 scrub_flags;
> +} __packed;
> +
> +/*
> + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-223 Device Patrol Scrub Control
> + * Feature Writable Attributes.
> + */
> +struct cxl_scrub_wr_attrbs {
> + u8 scrub_cycle_hours;
> + u8 scrub_flags;
> +} __packed;
> +
> +#define CXL_SCRUB_CONTROL_CHANGEABLE BIT(0)
> +#define CXL_SCRUB_CONTROL_REALTIME BIT(1)
> +#define CXL_SCRUB_CONTROL_CYCLE_MASK GENMASK(7, 0)
> +#define CXL_SCRUB_CONTROL_MIN_CYCLE_MASK GENMASK(15, 8)
> +#define CXL_SCRUB_CONTROL_ENABLE BIT(0)
> +
> +#define CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap) \
> + FIELD_GET(CXL_SCRUB_CONTROL_CHANGEABLE, cap)
> +#define CXL_GET_SCRUB_CYCLE(cycle) \
> + FIELD_GET(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
> +#define CXL_GET_SCRUB_MIN_CYCLE(cycle) \
> + FIELD_GET(CXL_SCRUB_CONTROL_MIN_CYCLE_MASK, cycle)
> +#define CXL_GET_SCRUB_EN_STS(flags) FIELD_GET(CXL_SCRUB_CONTROL_ENABLE, flags)
> +
> +#define CXL_SET_SCRUB_CYCLE(cycle) \
> + FIELD_PREP(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
> +#define CXL_SET_SCRUB_EN(en) FIELD_PREP(CXL_SCRUB_CONTROL_ENABLE, en)
> +
> +static int cxl_mem_scrub_get_attrbs(struct cxl_mailbox *cxl_mbox, u8 *cap,
> + u16 *cycle, u8 *flags, u8 *min_cycle)
> +{
> + size_t rd_data_size = sizeof(struct cxl_scrub_rd_attrbs);
> + size_t data_size;
> + struct cxl_scrub_rd_attrbs *rd_attrbs __free(kfree) =
> + kzalloc(rd_data_size, GFP_KERNEL);
> + if (!rd_attrbs)
> + return -ENOMEM;
> +
> + data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
> + CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
> + rd_data_size, 0, NULL);
> + if (!data_size)
> + return -EIO;
> +
> + *cap = rd_attrbs->scrub_cycle_cap;
> + *cycle = le16_to_cpu(rd_attrbs->scrub_cycle_hours);
> + *flags = rd_attrbs->scrub_flags;
> + if (min_cycle)
> + *min_cycle = CXL_GET_SCRUB_MIN_CYCLE(*cycle);
> +
> + return 0;
> +}
> +
> +static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
> + u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
> +{
> + struct cxl_mailbox *cxl_mbox;
> + u8 min_scrub_cycle = U8_MAX;
> + struct cxl_region_params *p;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> + int i, ret;
> +
> + if (!cxl_ps_ctx->cxlr) {
> + cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
> + return cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
> + flags, min_cycle);
> + }
> +
> + struct rw_semaphore *region_lock __free(rwsem_read_release) =
> + rwsem_read_intr_acquire(&cxl_region_rwsem);
> + if (!region_lock)
> + return -EINTR;
> +
> + cxlr = cxl_ps_ctx->cxlr;
> + p = &cxlr->params;
> +
> + for (i = 0; i < p->nr_targets; i++) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> + ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
> + flags, min_cycle);
> + if (ret)
> + return ret;
> +
> + if (min_cycle)
> + min_scrub_cycle =
> + min(*min_cycle, min_scrub_cycle);
> + }
> +
> + if (min_cycle)
> + *min_cycle = min_scrub_cycle;
> +
> + return 0;
> +}
> +
> +static int cxl_scrub_set_attrbs_region(struct device *dev,
> + struct cxl_patrol_scrub_context *cxl_ps_ctx,
> + u8 cycle, u8 flags)
> +{
> + struct cxl_scrub_wr_attrbs wr_attrbs;
> + struct cxl_mailbox *cxl_mbox;
> + struct cxl_region_params *p;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> + int ret, i;
> +
> + struct rw_semaphore *region_lock __free(rwsem_read_release) =
> + rwsem_read_intr_acquire(&cxl_region_rwsem);
> + if (!region_lock)
> + return -EINTR;
> +
> + cxlr = cxl_ps_ctx->cxlr;
> + p = &cxlr->params;
> + wr_attrbs.scrub_cycle_hours = cycle;
> + wr_attrbs.scrub_flags = flags;
> +
> + for (i = 0; i < p->nr_targets; i++) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> + ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
> + cxl_ps_ctx->set_version, &wr_attrbs,
> + sizeof(wr_attrbs),
> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET,
> + 0, NULL);
> + if (ret)
> + return ret;
> +
> + if (cycle != cxlmd->scrub_cycle) {
> + if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
> + dev_info(dev,
> + "Device scrub rate(%d hours) set by region%d rate overwritten by region%d scrub rate(%d hours)\n",
> + cxlmd->scrub_cycle,
> + cxlmd->scrub_region_id, cxlr->id,
> + cycle);
> +
> + cxlmd->scrub_cycle = cycle;
> + cxlmd->scrub_region_id = cxlr->id;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_scrub_set_attrbs_device(struct device *dev,
> + struct cxl_patrol_scrub_context *cxl_ps_ctx,
> + u8 cycle, u8 flags)
> +{
> + struct cxl_scrub_wr_attrbs wr_attrbs;
> + struct cxl_mailbox *cxl_mbox;
> + struct cxl_memdev *cxlmd;
> + int ret;
> +
> + wr_attrbs.scrub_cycle_hours = cycle;
> + wr_attrbs.scrub_flags = flags;
> +
> + cxlmd = cxl_ps_ctx->cxlmd;
> + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> + ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
> + cxl_ps_ctx->set_version, &wr_attrbs,
> + sizeof(wr_attrbs),
> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET, 0,
> + NULL);
> + if (ret)
> + return ret;
> +
> + if (cycle != cxlmd->scrub_cycle) {
> + if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
> + dev_info(dev,
> + "Device scrub rate(%d hours) set by region%d rate overwritten with device local scrub rate(%d hours)\n",
> + cxlmd->scrub_cycle, cxlmd->scrub_region_id,
> + cycle);
> +
> + cxlmd->scrub_cycle = cycle;
> + cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_scrub_set_attrbs(struct device *dev,
> + struct cxl_patrol_scrub_context *cxl_ps_ctx,
> + u8 cycle, u8 flags)
> +{
> + if (cxl_ps_ctx->cxlr)
> + return cxl_scrub_set_attrbs_region(dev, cxl_ps_ctx, cycle, flags);
> +
> + return cxl_scrub_set_attrbs_device(dev, cxl_ps_ctx, cycle, flags);
> +}
> +
> +static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data,
> + bool *enabled)
> +{
> + struct cxl_patrol_scrub_context *ctx = drv_data;
> + u8 cap, flags;
> + u16 cycle;
> + int ret;
> +
> + ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
> + if (ret)
> + return ret;
> +
> + *enabled = CXL_GET_SCRUB_EN_STS(flags);
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data,
> + bool enable)
> +{
> + struct cxl_patrol_scrub_context *ctx = drv_data;
> + u8 cap, flags, wr_cycle;
> + u16 rd_cycle;
> + int ret;
> +
> + if (!capable(CAP_SYS_RAWIO))
> + return -EPERM;
> +
> + ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, NULL);
> + if (ret)
> + return ret;
> +
> + wr_cycle = CXL_GET_SCRUB_CYCLE(rd_cycle);
> + flags = CXL_SET_SCRUB_EN(enable);
> +
> + return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
> +}
> +
> +static int cxl_patrol_scrub_get_min_scrub_cycle(struct device *dev,
> + void *drv_data, u32 *min)
> +{
> + struct cxl_patrol_scrub_context *ctx = drv_data;
> + u8 cap, flags, min_cycle;
> + u16 cycle;
> + int ret;
> +
> + ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, &min_cycle);
> + if (ret)
> + return ret;
> +
> + *min = min_cycle * 3600;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_get_max_scrub_cycle(struct device *dev,
> + void *drv_data, u32 *max)
> +{
> + *max = U8_MAX * 3600; /* Max set by register size */
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_get_scrub_cycle(struct device *dev, void *drv_data,
> + u32 *scrub_cycle_secs)
> +{
> + struct cxl_patrol_scrub_context *ctx = drv_data;
> + u8 cap, flags;
> + u16 cycle;
> + int ret;
> +
> + ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
> + if (ret)
> + return ret;
> +
> + *scrub_cycle_secs = CXL_GET_SCRUB_CYCLE(cycle) * 3600;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_set_scrub_cycle(struct device *dev, void *drv_data,
> + u32 scrub_cycle_secs)
> +{
> + struct cxl_patrol_scrub_context *ctx = drv_data;
> + u8 scrub_cycle_hours = scrub_cycle_secs / 3600;
> + u8 cap, wr_cycle, flags, min_cycle;
> + u16 rd_cycle;
> + int ret;
> +
> + if (!capable(CAP_SYS_RAWIO))
> + return -EPERM;
> +
> + ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, &min_cycle);
> + if (ret)
> + return ret;
> +
> + if (!CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap))
> + return -EOPNOTSUPP;
> +
> + if (scrub_cycle_hours < min_cycle) {
> + dev_dbg(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
> + scrub_cycle_hours);
> + dev_dbg(dev,
> + "Minimum supported CXL patrol scrub cycle in hour %d\n",
> + min_cycle);
> + return -EINVAL;
> + }
> + wr_cycle = CXL_SET_SCRUB_CYCLE(scrub_cycle_hours);
> +
> + return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
> +}
> +
> +static const struct edac_scrub_ops cxl_ps_scrub_ops = {
> + .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
> + .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
> + .get_min_cycle = cxl_patrol_scrub_get_min_scrub_cycle,
> + .get_max_cycle = cxl_patrol_scrub_get_max_scrub_cycle,
> + .get_cycle_duration = cxl_patrol_scrub_get_scrub_cycle,
> + .set_cycle_duration = cxl_patrol_scrub_set_scrub_cycle,
> +};
> +
> +static int cxl_memdev_scrub_init(struct cxl_memdev *cxlmd,
> + struct edac_dev_feature *ras_feature,
> + u8 scrub_inst)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx;
> + struct cxl_feat_entry *feat_entry;
> + u8 cap, flags;
> + u16 cycle;
> + int rc;
> +
> + feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
> + &CXL_FEAT_PATROL_SCRUB_UUID);
> + if (IS_ERR(feat_entry))
> + return -EOPNOTSUPP;
> +
> + if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
> + return -EOPNOTSUPP;
> +
> + cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> + if (!cxl_ps_ctx)
> + return -ENOMEM;
> +
> + *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
> + .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
> + .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
> + .get_version = feat_entry->get_feat_ver,
> + .set_version = feat_entry->set_feat_ver,
> + .effects = le16_to_cpu(feat_entry->effects),
> + .instance = scrub_inst,
> + .cxlmd = cxlmd,
> + };
> +
> + rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap, &cycle,
> + &flags, NULL);
> + if (rc)
> + return rc;
> +
> + cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
> + cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
> +
> + ras_feature->ft_type = RAS_FEAT_SCRUB;
> + ras_feature->instance = cxl_ps_ctx->instance;
> + ras_feature->scrub_ops = &cxl_ps_scrub_ops;
> + ras_feature->ctx = cxl_ps_ctx;
> +
> + return 0;
> +}
> +
> +static int cxl_region_scrub_init(struct cxl_region *cxlr,
> + struct edac_dev_feature *ras_feature,
> + u8 scrub_inst)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx;
> + struct cxl_region_params *p = &cxlr->params;
> + struct cxl_feat_entry *feat_entry = NULL;
> + struct cxl_memdev *cxlmd;
> + u8 cap, flags;
> + u16 cycle;
> + int i, rc;
> +
> + /*
> + * The cxl_region_rwsem must be held if the code below is used in any
> + * context other than region probe; it runs in the probe path here, so
> + * no locking is needed.
> + */
> + for (i = 0; i < p->nr_targets; i++) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
> + &CXL_FEAT_PATROL_SCRUB_UUID);
> + if (IS_ERR(feat_entry))
> + return -EOPNOTSUPP;
> +
> + if (!(le32_to_cpu(feat_entry->flags) &
> + CXL_FEATURE_F_CHANGEABLE))
> + return -EOPNOTSUPP;
> +
> + rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap,
> + &cycle, &flags, NULL);
> + if (rc)
> + return rc;
> +
> + cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
> + cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
> + }
> +
> + cxl_ps_ctx = devm_kzalloc(&cxlr->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> + if (!cxl_ps_ctx)
> + return -ENOMEM;
> +
> + *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
> + .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
> + .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
> + .get_version = feat_entry->get_feat_ver,
> + .set_version = feat_entry->set_feat_ver,
> + .effects = le16_to_cpu(feat_entry->effects),
> + .instance = scrub_inst,
> + .cxlr = cxlr,
> + };
> +
> + ras_feature->ft_type = RAS_FEAT_SCRUB;
> + ras_feature->instance = cxl_ps_ctx->instance;
> + ras_feature->scrub_ops = &cxl_ps_scrub_ops;
> + ras_feature->ctx = cxl_ps_ctx;
> +
> + return 0;
> +}
> +
> +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
> +{
> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> + int num_ras_features = 0;
> + int rc;
> +
> + if (IS_ENABLED(CONFIG_CXL_EDAC_SCRUB)) {
> + rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features], 0);
> + if (rc < 0 && rc != -EOPNOTSUPP)
> + return rc;
> +
> + if (rc != -EOPNOTSUPP)
> + num_ras_features++;
> + }
> +
> + if (!num_ras_features)
> + return -EINVAL;
> +
> + char *cxl_dev_name __free(kfree) =
> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
> + if (!cxl_dev_name)
> + return -ENOMEM;
> +
> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
> +
> +int devm_cxl_region_edac_register(struct cxl_region *cxlr)
> +{
> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> + int num_ras_features = 0;
> + int rc;
> +
> + if (!IS_ENABLED(CONFIG_CXL_EDAC_SCRUB))
> + return 0;
> +
> + rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features], 0);
> + if (rc < 0)
> + return rc;
> +
> + num_ras_features++;
> +
> + char *cxl_dev_name __free(kfree) =
> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
> + if (!cxl_dev_name)
> + return -ENOMEM;
> +
> + return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c3f4dc244df7..d5b8108c4a6d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3537,8 +3537,18 @@ static int cxl_region_probe(struct device *dev)
>
> switch (cxlr->mode) {
> case CXL_PARTMODE_PMEM:
> + rc = devm_cxl_region_edac_register(cxlr);
> + if (rc)
> + dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
> + cxlr->id);
> +
> return devm_cxl_add_pmem_region(cxlr);
> case CXL_PARTMODE_RAM:
> + rc = devm_cxl_region_edac_register(cxlr);
> + if (rc)
> + dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
> + cxlr->id);
> +
> /*
> * The region cannot be managed by CXL if any portion of
> * it is already online as 'System RAM'
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index a9ab46eb0610..8a252f8483f7 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -912,4 +912,14 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
>
> u16 cxl_gpf_get_dvsec(struct device *dev);
>
> +static inline struct rw_semaphore *rwsem_read_intr_acquire(struct rw_semaphore *rwsem)
> +{
> + if (down_read_interruptible(rwsem))
> + return NULL;
> +
> + return rwsem;
> +}
> +
> +DEFINE_FREE(rwsem_read_release, struct rw_semaphore *, if (_T) up_read(_T))
> +
> #endif /* __CXL_H__ */
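Aside for readers new to the cleanup helpers: DEFINE_FREE() above
registers up_read() as the scope-exit action, with the if (_T) guard so
that a failed interruptible acquire, which returns NULL, releases
nothing. A minimal usage sketch, matching what the new edac.c code
above does:

	struct rw_semaphore *lock __free(rwsem_read_release) =
		rwsem_read_intr_acquire(&cxl_region_rwsem);
	if (!lock)
		return -EINTR;

	/* read side of cxl_region_rwsem held until 'lock' leaves scope */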
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 3ec6b906371b..872131009e4c 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -45,6 +45,8 @@
> * @endpoint: connection to the CXL port topology for this memory device
> * @id: id number of this memdev instance.
> * @depth: endpoint port depth
> + * @scrub_cycle: current scrub cycle set for this device
> + * @scrub_region_id: id number of a backed region (if any) for which the current scrub cycle is set
> */
> struct cxl_memdev {
> struct device dev;
> @@ -56,6 +58,8 @@ struct cxl_memdev {
> struct cxl_port *endpoint;
> int id;
> int depth;
> + u8 scrub_cycle;
> + int scrub_region_id;
> };
>
> static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
> @@ -853,6 +857,16 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
> int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
> int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>
> +#ifdef CONFIG_CXL_EDAC_MEM_FEATURES
> +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd);
> +int devm_cxl_region_edac_register(struct cxl_region *cxlr);
> +#else
> +static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
> +{ return 0; }
> +static inline int devm_cxl_region_edac_register(struct cxl_region *cxlr)
> +{ return 0; }
> +#endif
> +
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 9675243bd05b..6e6777b7bafb 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -180,6 +180,10 @@ static int cxl_mem_probe(struct device *dev)
> return rc;
> }
>
> + rc = devm_cxl_memdev_edac_register(cxlmd);
> + if (rc)
> + dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
> +
> /*
> * The kernel may be operating out of CXL memory on this device,
> * there is no spec defined way to determine whether this device
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 387f3df8b988..31a2d73c963f 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -67,6 +67,7 @@ cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
> cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
> cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
> +cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
> cxl_core-y += config_check.o
> cxl_core-y += cxl_core_test.o
> cxl_core-y += cxl_core_exports.o
> --
> 2.43.0
>
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-05-21 17:07 ` Alison Schofield
@ 2025-05-21 17:48 ` Jonathan Cameron
2025-05-21 20:17 ` Alison Schofield
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2025-05-21 17:48 UTC (permalink / raw)
To: Alison Schofield
Cc: shiju.jose, linux-cxl, dan.j.williams, dave.jiang, dave,
vishal.l.verma, ira.weiny, linux-edac, linux-doc, bp, tony.luck,
lenb, Yazen.Ghannam, mchehab, nifan.cxl, linuxarm, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang
On Wed, 21 May 2025 10:07:57 -0700
Alison Schofield <alison.schofield@intel.com> wrote:
> On Wed, May 21, 2025 at 01:47:41PM +0100, shiju.jose@huawei.com wrote:
> > From: Shiju Jose <shiju.jose@huawei.com>
> >
> > CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
> > control feature. The device patrol scrub proactively locates and
> > corrects errors on a regular cycle.
>
> snip
>
> > diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> > new file mode 100644
> > index 000000000000..eae99ed7c018
> > --- /dev/null
> > +++ b/drivers/cxl/core/edac.c
> > @@ -0,0 +1,520 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * CXL EDAC memory feature driver.
> > + *
> > + * Copyright (c) 2024-2025 HiSilicon Limited.
> > + *
> > + * - Supports functions to configure EDAC features of the
> > + * CXL memory devices.
> > + * - Registers with the EDAC device subsystem driver to expose
> > + * the features sysfs attributes to the user for configuring
> > + * CXL memory RAS feature.
> > + */
> > +
> > +#include <linux/cleanup.h>
> > +#include <linux/edac.h>
> > +#include <linux/limits.h>
> > +#include <cxl/features.h>
>
> > This needs tidying up. It's not clear that the tidy-up belongs in this patch.
> sparse now complains:
>
> drivers/cxl/core/edac.c: note: in included file:
> ./include/cxl/features.h:67:43: error: marked inline, but without a definition
>
> because there is a proto of this in include/cxl/features.h and it is defined in
> core/features.c. Compiler is looking for the definition in edac.c.
>
> > Removing the inline works for me, but that may not be the right solution:
We definitely shouldn't have a non-static inline in a c file.
What would that actually mean?
So I think it's the right fix, but it needs to be a separate precursor
patch as the issue predates this series (no idea why we didn't see it
before).
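For reference, a minimal sketch of the C99 semantics in question
(ignoring the kernel's gnu_inline wrapper, which is probably why this
happened to link anyway):

	/* features.h */
	inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);

	/*
	 * features.c: every declaration in this translation unit says plain
	 * 'inline', so C99 treats the body as an inline definition only and
	 * need not emit an external symbol for other objects to call.
	 */
	inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
	{
		return cxlds->cxlfs;
	}

	/*
	 * edac.c includes features.h and calls to_cxlfs(); sparse sees a
	 * prototype marked inline with no inline body in this unit, hence
	 * "marked inline, but without a definition".
	 */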
Alison, fancy spinning below into a patch?
Thanks,
Jonathan
>
> diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
> index a83a2214a136..4599e1d7668a 100644
> --- a/drivers/cxl/core/features.c
> +++ b/drivers/cxl/core/features.c
> @@ -36,7 +36,7 @@ static bool is_cxl_feature_exclusive(struct cxl_feat_entry *entry)
> return is_cxl_feature_exclusive_by_uuid(&entry->uuid);
> }
>
> -inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
> +struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
> {
> return cxlds->cxlfs;
> }
> diff --git a/include/cxl/features.h b/include/cxl/features.h
> index 5f7f842765a5..b9297693dae7 100644
> --- a/include/cxl/features.h
> +++ b/include/cxl/features.h
> @@ -64,7 +64,7 @@ struct cxl_features_state {
> struct cxl_mailbox;
> struct cxl_memdev;
> #ifdef CONFIG_CXL_FEATURES
> -inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);
> +struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);
> int devm_cxl_setup_features(struct cxl_dev_state *cxlds);
> int devm_cxl_setup_fwctl(struct device *host, struct cxl_memdev *cxlmd);
> #else
>
>
[snip]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-05-21 17:48 ` Jonathan Cameron
@ 2025-05-21 20:17 ` Alison Schofield
0 siblings, 0 replies; 21+ messages in thread
From: Alison Schofield @ 2025-05-21 20:17 UTC (permalink / raw)
To: Jonathan Cameron
Cc: shiju.jose, linux-cxl, dan.j.williams, dave.jiang, dave,
vishal.l.verma, ira.weiny, linux-edac, linux-doc, bp, tony.luck,
lenb, Yazen.Ghannam, mchehab, nifan.cxl, linuxarm, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang
On Wed, May 21, 2025 at 06:48:47PM +0100, Jonathan Cameron wrote:
> On Wed, 21 May 2025 10:07:57 -0700
> Alison Schofield <alison.schofield@intel.com> wrote:
>
> > On Wed, May 21, 2025 at 01:47:41PM +0100, shiju.jose@huawei.com wrote:
> > > From: Shiju Jose <shiju.jose@huawei.com>
> > >
> > > CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
> > > control feature. The device patrol scrub proactively locates and
> > > corrects errors on a regular cycle.
> >
> > snip
> >
> > > diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> > > new file mode 100644
> > > index 000000000000..eae99ed7c018
> > > --- /dev/null
> > > +++ b/drivers/cxl/core/edac.c
> > > @@ -0,0 +1,520 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * CXL EDAC memory feature driver.
> > > + *
> > > + * Copyright (c) 2024-2025 HiSilicon Limited.
> > > + *
> > > + * - Supports functions to configure EDAC features of the
> > > + * CXL memory devices.
> > > + * - Registers with the EDAC device subsystem driver to expose
> > > + * the features sysfs attributes to the user for configuring
> > > + * CXL memory RAS feature.
> > > + */
> > > +
> > > +#include <linux/cleanup.h>
> > > +#include <linux/edac.h>
> > > +#include <linux/limits.h>
> > > +#include <cxl/features.h>
> >
> > > This needs tidying up. It's not clear that the tidy-up belongs in this patch.
> > sparse now complains:
> >
> > drivers/cxl/core/edac.c: note: in included file:
> > ./include/cxl/features.h:67:43: error: marked inline, but without a definition
> >
> > because there is a proto of this in include/cxl/features.h and it is defined in
> > core/features.c. Compiler is looking for the definition in edac.c.
> >
> > > Removing the inline works for me, but that may not be the right solution:
>
> We definitely shouldn't have a non-static inline in a .c file.
> What would that actually mean?
>
> So I think it's the right fix, but it needs to be a separate precursor patch,
> as the issue predates this series (no idea why we didn't see it before).
>
> Alison, fancy spinning below into a patch?
So - yes, we will handle it in a precursor patch, and I'll submit it.
Mulling over the options with DaveJ.
>
> Thanks,
>
> Jonathan
>
> >
> > diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
> > index a83a2214a136..4599e1d7668a 100644
> > --- a/drivers/cxl/core/features.c
> > +++ b/drivers/cxl/core/features.c
> > @@ -36,7 +36,7 @@ static bool is_cxl_feature_exclusive(struct cxl_feat_entry *entry)
> > return is_cxl_feature_exclusive_by_uuid(&entry->uuid);
> > }
> >
> > -inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
> > +struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
> > {
> > return cxlds->cxlfs;
> > }
> > diff --git a/include/cxl/features.h b/include/cxl/features.h
> > index 5f7f842765a5..b9297693dae7 100644
> > --- a/include/cxl/features.h
> > +++ b/include/cxl/features.h
> > @@ -64,7 +64,7 @@ struct cxl_features_state {
> > struct cxl_mailbox;
> > struct cxl_memdev;
> > #ifdef CONFIG_CXL_FEATURES
> > -inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);
> > +struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);
> > int devm_cxl_setup_features(struct cxl_dev_state *cxlds);
> > int devm_cxl_setup_fwctl(struct device *host, struct cxl_memdev *cxlmd);
> > #else
> >
> >
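[Editor's note: a condensed illustration of the inline issue discussed above, distilled from Alison's diff; this sketch is not code posted in the thread. Under C99/C11 inline semantics, a non-static "inline" declaration tells callers to expect an inline definition in their own translation unit, so a bare inline prototype whose only definition is out-of-line in another .c file is exactly what sparse reports as "marked inline, but without a definition":]

    /* include/cxl/features.h (sketch): a plain extern declaration ... */
    struct cxl_dev_state;
    struct cxl_features_state;
    struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds);

    /* drivers/cxl/core/features.c (sketch): ... paired with the single
     * out-of-line definition. Marking either side "inline", with no
     * inline body visible to callers, is what triggered the sparse
     * error quoted above.
     */
    struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
    {
            return cxlds->cxlfs;
    }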
> > > +#include <cxl.h>
> > > +#include <cxlmem.h>
> > > +#include "core.h"
> > > +
> > > +#define CXL_NR_EDAC_DEV_FEATURES 1
> > > +
> > > +#define CXL_SCRUB_NO_REGION -1
> > > +
> > > +struct cxl_patrol_scrub_context {
> > > + u8 instance;
> > > + u16 get_feat_size;
> > > + u16 set_feat_size;
> > > + u8 get_version;
> > > + u8 set_version;
> > > + u16 effects;
> > > + struct cxl_memdev *cxlmd;
> > > + struct cxl_region *cxlr;
> > > +};
> > > +
> > > +/*
> > > + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub Control
> > > + * Feature Readable Attributes.
> > > + */
> > > +struct cxl_scrub_rd_attrbs {
> > > + u8 scrub_cycle_cap;
> > > + __le16 scrub_cycle_hours;
> > > + u8 scrub_flags;
> > > +} __packed;
> > > +
> > > +/*
> > > + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-223 Device Patrol Scrub Control
> > > + * Feature Writable Attributes.
> > > + */
> > > +struct cxl_scrub_wr_attrbs {
> > > + u8 scrub_cycle_hours;
> > > + u8 scrub_flags;
> > > +} __packed;
> > > +
> > > +#define CXL_SCRUB_CONTROL_CHANGEABLE BIT(0)
> > > +#define CXL_SCRUB_CONTROL_REALTIME BIT(1)
> > > +#define CXL_SCRUB_CONTROL_CYCLE_MASK GENMASK(7, 0)
> > > +#define CXL_SCRUB_CONTROL_MIN_CYCLE_MASK GENMASK(15, 8)
> > > +#define CXL_SCRUB_CONTROL_ENABLE BIT(0)
> > > +
> > > +#define CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap) \
> > > + FIELD_GET(CXL_SCRUB_CONTROL_CHANGEABLE, cap)
> > > +#define CXL_GET_SCRUB_CYCLE(cycle) \
> > > + FIELD_GET(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
> > > +#define CXL_GET_SCRUB_MIN_CYCLE(cycle) \
> > > + FIELD_GET(CXL_SCRUB_CONTROL_MIN_CYCLE_MASK, cycle)
> > > +#define CXL_GET_SCRUB_EN_STS(flags) FIELD_GET(CXL_SCRUB_CONTROL_ENABLE, flags)
> > > +
> > > +#define CXL_SET_SCRUB_CYCLE(cycle) \
> > > + FIELD_PREP(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
> > > +#define CXL_SET_SCRUB_EN(en) FIELD_PREP(CXL_SCRUB_CONTROL_ENABLE, en)
> > > +
> > > +static int cxl_mem_scrub_get_attrbs(struct cxl_mailbox *cxl_mbox, u8 *cap,
> > > + u16 *cycle, u8 *flags, u8 *min_cycle)
> > > +{
> > > + size_t rd_data_size = sizeof(struct cxl_scrub_rd_attrbs);
> > > + size_t data_size;
> > > + struct cxl_scrub_rd_attrbs *rd_attrbs __free(kfree) =
> > > + kzalloc(rd_data_size, GFP_KERNEL);
> > > + if (!rd_attrbs)
> > > + return -ENOMEM;
> > > +
> > > + data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
> > > + CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
> > > + rd_data_size, 0, NULL);
> > > + if (!data_size)
> > > + return -EIO;
> > > +
> > > + *cap = rd_attrbs->scrub_cycle_cap;
> > > + *cycle = le16_to_cpu(rd_attrbs->scrub_cycle_hours);
> > > + *flags = rd_attrbs->scrub_flags;
> > > + if (min_cycle)
> > > + *min_cycle = CXL_GET_SCRUB_MIN_CYCLE(*cycle);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
> > > + u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
> > > +{
> > > + struct cxl_mailbox *cxl_mbox;
> > > + u8 min_scrub_cycle = U8_MAX;
> > > + struct cxl_region_params *p;
> > > + struct cxl_memdev *cxlmd;
> > > + struct cxl_region *cxlr;
> > > + int i, ret;
> > > +
> > > + if (!cxl_ps_ctx->cxlr) {
> > > + cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
> > > + return cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
> > > + flags, min_cycle);
> > > + }
> > > +
> > > + struct rw_semaphore *region_lock __free(rwsem_read_release) =
> > > + rwsem_read_intr_acquire(&cxl_region_rwsem);
> > > + if (!region_lock)
> > > + return -EINTR;
> > > +
> > > + cxlr = cxl_ps_ctx->cxlr;
> > > + p = &cxlr->params;
> > > +
> > > + for (i = 0; i < p->nr_targets; i++) {
> > > + struct cxl_endpoint_decoder *cxled = p->targets[i];
> > > +
> > > + cxlmd = cxled_to_memdev(cxled);
> > > + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> > > + ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
> > > + flags, min_cycle);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (min_cycle)
> > > + min_scrub_cycle =
> > > + min(*min_cycle, min_scrub_cycle);
> > > + }
> > > +
> > > + if (min_cycle)
> > > + *min_cycle = min_scrub_cycle;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_scrub_set_attrbs_region(struct device *dev,
> > > + struct cxl_patrol_scrub_context *cxl_ps_ctx,
> > > + u8 cycle, u8 flags)
> > > +{
> > > + struct cxl_scrub_wr_attrbs wr_attrbs;
> > > + struct cxl_mailbox *cxl_mbox;
> > > + struct cxl_region_params *p;
> > > + struct cxl_memdev *cxlmd;
> > > + struct cxl_region *cxlr;
> > > + int ret, i;
> > > +
> > > + struct rw_semaphore *region_lock __free(rwsem_read_release) =
> > > + rwsem_read_intr_acquire(&cxl_region_rwsem);
> > > + if (!region_lock)
> > > + return -EINTR;
> > > +
> > > + cxlr = cxl_ps_ctx->cxlr;
> > > + p = &cxlr->params;
> > > + wr_attrbs.scrub_cycle_hours = cycle;
> > > + wr_attrbs.scrub_flags = flags;
> > > +
> > > + for (i = 0; i < p->nr_targets; i++) {
> > > + struct cxl_endpoint_decoder *cxled = p->targets[i];
> > > +
> > > + cxlmd = cxled_to_memdev(cxled);
> > > + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> > > + ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
> > > + cxl_ps_ctx->set_version, &wr_attrbs,
> > > + sizeof(wr_attrbs),
> > > + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET,
> > > + 0, NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (cycle != cxlmd->scrub_cycle) {
> > > + if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
> > > + dev_info(dev,
> > > + "Device scrub rate(%d hours) set by region%d rate overwritten by region%d scrub rate(%d hours)\n",
> > > + cxlmd->scrub_cycle,
> > > + cxlmd->scrub_region_id, cxlr->id,
> > > + cycle);
> > > +
> > > + cxlmd->scrub_cycle = cycle;
> > > + cxlmd->scrub_region_id = cxlr->id;
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_scrub_set_attrbs_device(struct device *dev,
> > > + struct cxl_patrol_scrub_context *cxl_ps_ctx,
> > > + u8 cycle, u8 flags)
> > > +{
> > > + struct cxl_scrub_wr_attrbs wr_attrbs;
> > > + struct cxl_mailbox *cxl_mbox;
> > > + struct cxl_memdev *cxlmd;
> > > + int ret;
> > > +
> > > + wr_attrbs.scrub_cycle_hours = cycle;
> > > + wr_attrbs.scrub_flags = flags;
> > > +
> > > + cxlmd = cxl_ps_ctx->cxlmd;
> > > + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
> > > + ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
> > > + cxl_ps_ctx->set_version, &wr_attrbs,
> > > + sizeof(wr_attrbs),
> > > + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET, 0,
> > > + NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (cycle != cxlmd->scrub_cycle) {
> > > + if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
> > > + dev_info(dev,
> > > + "Device scrub rate(%d hours) set by region%d rate overwritten with device local scrub rate(%d hours)\n",
> > > + cxlmd->scrub_cycle, cxlmd->scrub_region_id,
> > > + cycle);
> > > +
> > > + cxlmd->scrub_cycle = cycle;
> > > + cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_scrub_set_attrbs(struct device *dev,
> > > + struct cxl_patrol_scrub_context *cxl_ps_ctx,
> > > + u8 cycle, u8 flags)
> > > +{
> > > + if (cxl_ps_ctx->cxlr)
> > > + return cxl_scrub_set_attrbs_region(dev, cxl_ps_ctx, cycle, flags);
> > > +
> > > + return cxl_scrub_set_attrbs_device(dev, cxl_ps_ctx, cycle, flags);
> > > +}
> > > +
> > > +static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data,
> > > + bool *enabled)
> > > +{
> > > + struct cxl_patrol_scrub_context *ctx = drv_data;
> > > + u8 cap, flags;
> > > + u16 cycle;
> > > + int ret;
> > > +
> > > + ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + *enabled = CXL_GET_SCRUB_EN_STS(flags);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data,
> > > + bool enable)
> > > +{
> > > + struct cxl_patrol_scrub_context *ctx = drv_data;
> > > + u8 cap, flags, wr_cycle;
> > > + u16 rd_cycle;
> > > + int ret;
> > > +
> > > + if (!capable(CAP_SYS_RAWIO))
> > > + return -EPERM;
> > > +
> > > + ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + wr_cycle = CXL_GET_SCRUB_CYCLE(rd_cycle);
> > > + flags = CXL_SET_SCRUB_EN(enable);
> > > +
> > > + return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
> > > +}
> > > +
> > > +static int cxl_patrol_scrub_get_min_scrub_cycle(struct device *dev,
> > > + void *drv_data, u32 *min)
> > > +{
> > > + struct cxl_patrol_scrub_context *ctx = drv_data;
> > > + u8 cap, flags, min_cycle;
> > > + u16 cycle;
> > > + int ret;
> > > +
> > > + ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, &min_cycle);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + *min = min_cycle * 3600;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_patrol_scrub_get_max_scrub_cycle(struct device *dev,
> > > + void *drv_data, u32 *max)
> > > +{
> > > + *max = U8_MAX * 3600; /* Max set by register size */
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_patrol_scrub_get_scrub_cycle(struct device *dev, void *drv_data,
> > > + u32 *scrub_cycle_secs)
> > > +{
> > > + struct cxl_patrol_scrub_context *ctx = drv_data;
> > > + u8 cap, flags;
> > > + u16 cycle;
> > > + int ret;
> > > +
> > > + ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + *scrub_cycle_secs = CXL_GET_SCRUB_CYCLE(cycle) * 3600;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_patrol_scrub_set_scrub_cycle(struct device *dev, void *drv_data,
> > > + u32 scrub_cycle_secs)
> > > +{
> > > + struct cxl_patrol_scrub_context *ctx = drv_data;
> > > + u8 scrub_cycle_hours = scrub_cycle_secs / 3600;
> > > + u8 cap, wr_cycle, flags, min_cycle;
> > > + u16 rd_cycle;
> > > + int ret;
> > > +
> > > + if (!capable(CAP_SYS_RAWIO))
> > > + return -EPERM;
> > > +
> > > + ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, &min_cycle);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (!CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap))
> > > + return -EOPNOTSUPP;
> > > +
> > > + if (scrub_cycle_hours < min_cycle) {
> > > + dev_dbg(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
> > > + scrub_cycle_hours);
> > > + dev_dbg(dev,
> > > + "Minimum supported CXL patrol scrub cycle in hour %d\n",
> > > + min_cycle);
> > > + return -EINVAL;
> > > + }
> > > + wr_cycle = CXL_SET_SCRUB_CYCLE(scrub_cycle_hours);
> > > +
> > > + return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
> > > +}
> > > +
> > > +static const struct edac_scrub_ops cxl_ps_scrub_ops = {
> > > + .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
> > > + .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
> > > + .get_min_cycle = cxl_patrol_scrub_get_min_scrub_cycle,
> > > + .get_max_cycle = cxl_patrol_scrub_get_max_scrub_cycle,
> > > + .get_cycle_duration = cxl_patrol_scrub_get_scrub_cycle,
> > > + .set_cycle_duration = cxl_patrol_scrub_set_scrub_cycle,
> > > +};
> > > +
> > > +static int cxl_memdev_scrub_init(struct cxl_memdev *cxlmd,
> > > + struct edac_dev_feature *ras_feature,
> > > + u8 scrub_inst)
> > > +{
> > > + struct cxl_patrol_scrub_context *cxl_ps_ctx;
> > > + struct cxl_feat_entry *feat_entry;
> > > + u8 cap, flags;
> > > + u16 cycle;
> > > + int rc;
> > > +
> > > + feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
> > > + &CXL_FEAT_PATROL_SCRUB_UUID);
> > > + if (IS_ERR(feat_entry))
> > > + return -EOPNOTSUPP;
> > > +
> > > + if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
> > > + return -EOPNOTSUPP;
> > > +
> > > + cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> > > + if (!cxl_ps_ctx)
> > > + return -ENOMEM;
> > > +
> > > + *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
> > > + .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
> > > + .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
> > > + .get_version = feat_entry->get_feat_ver,
> > > + .set_version = feat_entry->set_feat_ver,
> > > + .effects = le16_to_cpu(feat_entry->effects),
> > > + .instance = scrub_inst,
> > > + .cxlmd = cxlmd,
> > > + };
> > > +
> > > + rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap, &cycle,
> > > + &flags, NULL);
> > > + if (rc)
> > > + return rc;
> > > +
> > > + cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
> > > + cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
> > > +
> > > + ras_feature->ft_type = RAS_FEAT_SCRUB;
> > > + ras_feature->instance = cxl_ps_ctx->instance;
> > > + ras_feature->scrub_ops = &cxl_ps_scrub_ops;
> > > + ras_feature->ctx = cxl_ps_ctx;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_region_scrub_init(struct cxl_region *cxlr,
> > > + struct edac_dev_feature *ras_feature,
> > > + u8 scrub_inst)
> > > +{
> > > + struct cxl_patrol_scrub_context *cxl_ps_ctx;
> > > + struct cxl_region_params *p = &cxlr->params;
> > > + struct cxl_feat_entry *feat_entry = NULL;
> > > + struct cxl_memdev *cxlmd;
> > > + u8 cap, flags;
> > > + u16 cycle;
> > > + int i, rc;
> > > +
> > > + /*
> > > + * The cxl_region_rwsem must be held if the code below is used in a context
> > > + * other than when the region is in the probe state, as shown here.
> > > + */
> > > + for (i = 0; i < p->nr_targets; i++) {
> > > + struct cxl_endpoint_decoder *cxled = p->targets[i];
> > > +
> > > + cxlmd = cxled_to_memdev(cxled);
> > > + feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
> > > + &CXL_FEAT_PATROL_SCRUB_UUID);
> > > + if (IS_ERR(feat_entry))
> > > + return -EOPNOTSUPP;
> > > +
> > > + if (!(le32_to_cpu(feat_entry->flags) &
> > > + CXL_FEATURE_F_CHANGEABLE))
> > > + return -EOPNOTSUPP;
> > > +
> > > + rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap,
> > > + &cycle, &flags, NULL);
> > > + if (rc)
> > > + return rc;
> > > +
> > > + cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
> > > + cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
> > > + }
> > > +
> > > + cxl_ps_ctx = devm_kzalloc(&cxlr->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> > > + if (!cxl_ps_ctx)
> > > + return -ENOMEM;
> > > +
> > > + *cxl_ps_ctx = (struct cxl_patrol_scrub_context){
> > > + .get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
> > > + .set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
> > > + .get_version = feat_entry->get_feat_ver,
> > > + .set_version = feat_entry->set_feat_ver,
> > > + .effects = le16_to_cpu(feat_entry->effects),
> > > + .instance = scrub_inst,
> > > + .cxlr = cxlr,
> > > + };
> > > +
> > > + ras_feature->ft_type = RAS_FEAT_SCRUB;
> > > + ras_feature->instance = cxl_ps_ctx->instance;
> > > + ras_feature->scrub_ops = &cxl_ps_scrub_ops;
> > > + ras_feature->ctx = cxl_ps_ctx;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
> > > +{
> > > + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> > > + int num_ras_features = 0;
> > > + int rc;
> > > +
> > > + if (IS_ENABLED(CONFIG_CXL_EDAC_SCRUB)) {
> > > + rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features], 0);
> > > + if (rc < 0 && rc != -EOPNOTSUPP)
> > > + return rc;
> > > +
> > > + if (rc != -EOPNOTSUPP)
> > > + num_ras_features++;
> > > + }
> > > +
> > > + if (!num_ras_features)
> > > + return -EINVAL;
> > > +
> > > + char *cxl_dev_name __free(kfree) =
> > > + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
> > > + if (!cxl_dev_name)
> > > + return -ENOMEM;
> > > +
> > > + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> > > + num_ras_features, ras_features);
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
> > > +
> > > +int devm_cxl_region_edac_register(struct cxl_region *cxlr)
> > > +{
> > > + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> > > + int num_ras_features = 0;
> > > + int rc;
> > > +
> > > + if (!IS_ENABLED(CONFIG_CXL_EDAC_SCRUB))
> > > + return 0;
> > > +
> > > + rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features], 0);
> > > + if (rc < 0)
> > > + return rc;
> > > +
> > > + num_ras_features++;
> > > +
> > > + char *cxl_dev_name __free(kfree) =
> > > + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
> > > + if (!cxl_dev_name)
> > > + return -ENOMEM;
> > > +
> > > + return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
> > > + num_ras_features, ras_features);
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index c3f4dc244df7..d5b8108c4a6d 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -3537,8 +3537,18 @@ static int cxl_region_probe(struct device *dev)
> > >
> > > switch (cxlr->mode) {
> > > case CXL_PARTMODE_PMEM:
> > > + rc = devm_cxl_region_edac_register(cxlr);
> > > + if (rc)
> > > + dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
> > > + cxlr->id);
> > > +
> > > return devm_cxl_add_pmem_region(cxlr);
> > > case CXL_PARTMODE_RAM:
> > > + rc = devm_cxl_region_edac_register(cxlr);
> > > + if (rc)
> > > + dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
> > > + cxlr->id);
> > > +
> > > /*
> > >  * The region cannot be managed by CXL if any portion of
> > > * it is already online as 'System RAM'
> > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > > index a9ab46eb0610..8a252f8483f7 100644
> > > --- a/drivers/cxl/cxl.h
> > > +++ b/drivers/cxl/cxl.h
> > > @@ -912,4 +912,14 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
> > >
> > > u16 cxl_gpf_get_dvsec(struct device *dev);
> > >
> > > +static inline struct rw_semaphore *rwsem_read_intr_acquire(struct rw_semaphore *rwsem)
> > > +{
> > > + if (down_read_interruptible(rwsem))
> > > + return NULL;
> > > +
> > > + return rwsem;
> > > +}
> > > +
> > > +DEFINE_FREE(rwsem_read_release, struct rw_semaphore *, if (_T) up_read(_T))
> > > +
> > > #endif /* __CXL_H__ */
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 3ec6b906371b..872131009e4c 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -45,6 +45,8 @@
> > > * @endpoint: connection to the CXL port topology for this memory device
> > > * @id: id number of this memdev instance.
> > > * @depth: endpoint port depth
> > > + * @scrub_cycle: current scrub cycle set for this device
> > > + * @scrub_region_id: id of the region (if any) backed by this device that set the current scrub cycle
> > > */
> > > struct cxl_memdev {
> > > struct device dev;
> > > @@ -56,6 +58,8 @@ struct cxl_memdev {
> > > struct cxl_port *endpoint;
> > > int id;
> > > int depth;
> > > + u8 scrub_cycle;
> > > + int scrub_region_id;
> > > };
> > >
> > > static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
> > > @@ -853,6 +857,16 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
> > > int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
> > > int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
> > >
> > > +#ifdef CONFIG_CXL_EDAC_MEM_FEATURES
> > > +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd);
> > > +int devm_cxl_region_edac_register(struct cxl_region *cxlr);
> > > +#else
> > > +static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
> > > +{ return 0; }
> > > +static inline int devm_cxl_region_edac_register(struct cxl_region *cxlr)
> > > +{ return 0; }
> > > +#endif
> > > +
> > > #ifdef CONFIG_CXL_SUSPEND
> > > void cxl_mem_active_inc(void);
> > > void cxl_mem_active_dec(void);
> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > index 9675243bd05b..6e6777b7bafb 100644
> > > --- a/drivers/cxl/mem.c
> > > +++ b/drivers/cxl/mem.c
> > > @@ -180,6 +180,10 @@ static int cxl_mem_probe(struct device *dev)
> > > return rc;
> > > }
> > >
> > > + rc = devm_cxl_memdev_edac_register(cxlmd);
> > > + if (rc)
> > > + dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
> > > +
> > > /*
> > > * The kernel may be operating out of CXL memory on this device,
> > > * there is no spec defined way to determine whether this device
> > > diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> > > index 387f3df8b988..31a2d73c963f 100644
> > > --- a/tools/testing/cxl/Kbuild
> > > +++ b/tools/testing/cxl/Kbuild
> > > @@ -67,6 +67,7 @@ cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
> > > cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
> > > cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
> > > cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
> > > +cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
> > > cxl_core-y += config_check.o
> > > cxl_core-y += cxl_core_test.o
> > > cxl_core-y += cxl_core_exports.o
> > > --
> > > 2.43.0
> > >
> >
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
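[Editor's note: for readers following along, once the patch above lands the scrub controls surface through the EDAC scrub sysfs ABI referenced in this series. Below is a minimal userspace sketch, assuming the generic EDAC scrub attribute names (enable_background, current_cycle_duration) and an illustrative device/instance path of cxl_mem0/scrub0; verify both against the ABI documentation of the running kernel:]

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int write_attr(const char *path, const char *val)
    {
            int fd = open(path, O_WRONLY);

            if (fd < 0)
                    return -1;
            if (write(fd, val, strlen(val)) < 0) {
                    close(fd);
                    return -1;
            }
            return close(fd);
    }

    int main(void)
    {
            const char *base = "/sys/bus/edac/devices/cxl_mem0/scrub0";
            char path[256];

            /* 43200 s == 12 h; values below min_cycle_duration are
             * rejected (see cxl_patrol_scrub_set_scrub_cycle() above).
             */
            snprintf(path, sizeof(path), "%s/current_cycle_duration", base);
            if (write_attr(path, "43200"))
                    perror("set scrub cycle");

            snprintf(path, sizeof(path), "%s/enable_background", base);
            if (write_attr(path, "1"))
                    perror("enable background scrub");

            return 0;
    }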
* Re: [PATCH v6 0/8] cxl: support CXL memory RAS features
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (8 preceding siblings ...)
2025-05-21 14:59 ` [PATCH v6 0/8] cxl: support CXL memory RAS features Jonathan Cameron
@ 2025-05-21 20:19 ` Alison Schofield
2025-05-23 18:53 ` Dan Williams
2025-05-23 20:38 ` Dave Jiang
11 siblings, 0 replies; 21+ messages in thread
From: Alison Schofield @ 2025-05-21 20:19 UTC (permalink / raw)
To: shiju.jose
Cc: linux-cxl, dan.j.williams, jonathan.cameron, dave.jiang, dave,
vishal.l.verma, ira.weiny, linux-edac, linux-doc, bp, tony.luck,
lenb, Yazen.Ghannam, mchehab, nifan.cxl, linuxarm, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang
On Wed, May 21, 2025 at 01:47:38PM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Support for CXL memory EDAC features: patrol scrub, ECS, soft-PPR and
> memory sparing.
For the series:
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
2025-05-21 14:40 ` Jonathan Cameron
@ 2025-05-21 23:55 ` Dave Jiang
0 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2025-05-21 23:55 UTC (permalink / raw)
To: Jonathan Cameron, shiju.jose
Cc: linux-cxl, dan.j.williams, dave, alison.schofield, vishal.l.verma,
ira.weiny, linux-edac, linux-doc, bp, tony.luck, lenb,
Yazen.Ghannam, mchehab, nifan.cxl, linuxarm, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang
On 5/21/25 7:40 AM, Jonathan Cameron wrote:
> On Wed, 21 May 2025 13:47:41 +0100
> <shiju.jose@huawei.com> wrote:
>
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
>> control feature. The device patrol scrub proactively locates and makes
>> corrections to errors in a regular cycle.
>>
>> Allow specifying the number of hours within which the patrol scrub must be
>> completed, subject to minimum and maximum limits reported by the device.
>> Also allow disabling scrub, trading off error rates against
>> performance.
>>
>> Add support for patrol scrub control on CXL memory devices.
>> Register with the EDAC device driver, which retrieves the scrub attribute
>> descriptors from EDAC scrub and exposes the sysfs scrub control attributes
>> to userspace. For example, scrub control for the CXL memory device
>> "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
>>
>> Additionally, add support for region-based CXL memory patrol scrub control.
>> CXL memory regions may be interleaved across one or more CXL memory
>> devices. For example, region-based scrub control for "cxl_region1" is
>> exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
>>
>> [dj: Add cxl_test inclusion of edac.o]
>> [dj: Check return from cxl_feature_info() with IS_ERR]
>
> Trivial question on these. What do they reflect? Some changes
> Dave made on a prior version? Or changes in response to feedback
> (in which case they should be below the ---)
Ah, those are the notations I inserted when I pulled the branch for merge testing, and Shiju picked them up. Normally those would go to Linus. But in this case they can be dropped, since there is another revision of the series and the changes are folded in by Shiju.
DJ
>
>>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>
> A couple of trivial formatting things inline from the refactors
> in this version. Maybe Dave can tweak them whilst applying if
> nothing else comes up?
>
> J
>
>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>> new file mode 100644
>> index 000000000000..eae99ed7c018
>> --- /dev/null
>> +++ b/drivers/cxl/core/edac.c
>> @@ -0,0 +1,520 @@
>
>> +static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>> + u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>> +{
>> + struct cxl_mailbox *cxl_mbox;
>> + u8 min_scrub_cycle = U8_MAX;
>> + struct cxl_region_params *p;
>> + struct cxl_memdev *cxlmd;
>> + struct cxl_region *cxlr;
>> + int i, ret;
>> +
>> + if (!cxl_ps_ctx->cxlr) {
>> + cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
>> + return cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
>> + flags, min_cycle);
>> + }
>> +
>> + struct rw_semaphore *region_lock __free(rwsem_read_release) =
>> + rwsem_read_intr_acquire(&cxl_region_rwsem);
>
> Trivial but that should be indented one tab more.
>
>> + if (!region_lock)
>> + return -EINTR;
>> +
>> + cxlr = cxl_ps_ctx->cxlr;
>> + p = &cxlr->params;
>> +
>> + for (i = 0; i < p->nr_targets; i++) {
>> + struct cxl_endpoint_decoder *cxled = p->targets[i];
>> +
>> + cxlmd = cxled_to_memdev(cxled);
>> + cxl_mbox = &cxlmd->cxlds->cxl_mbox;
>> + ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
>> + flags, min_cycle);
>
> Maybe move flags to previous line.
>
>> + if (ret)
>> + return ret;
>> +
>> + if (min_cycle)
>> + min_scrub_cycle =
>> + min(*min_cycle, min_scrub_cycle);
>
> No need for the line wrap any more.
>
>
>> + }
>> +
>> + if (min_cycle)
>> + *min_cycle = min_scrub_cycle;
>> +
>> + return 0;
>> +}
>
>
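[Editor's note: applying Jonathan's three nits above (one more tab of indent on the lock acquisition, "flags" pulled up a line, the now-unneeded wrap dropped), the loop would read roughly as follows; this is an editor's rendering, not a diff posted in the thread:]

    	struct rw_semaphore *region_lock __free(rwsem_read_release) =
    			rwsem_read_intr_acquire(&cxl_region_rwsem);
    	if (!region_lock)
    		return -EINTR;
    	/* ... */
    	for (i = 0; i < p->nr_targets; i++) {
    		struct cxl_endpoint_decoder *cxled = p->targets[i];

    		cxlmd = cxled_to_memdev(cxled);
    		cxl_mbox = &cxlmd->cxlds->cxl_mbox;
    		ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle, flags,
    					       min_cycle);
    		if (ret)
    			return ret;

    		if (min_cycle)
    			min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
    	}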
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 7/8] cxl/edac: Add CXL memory device memory sparing control feature
2025-05-21 12:47 ` [PATCH v6 7/8] cxl/edac: Add CXL memory device memory sparing control feature shiju.jose
@ 2025-05-23 18:50 ` Dan Williams
0 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2025-05-23 18:50 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, jonathan.cameron,
dave.jiang, dave, alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Memory sparing is defined as a repair function that replaces a portion of
> memory with a portion of functional memory at that same DPA. The subclasses
> for this operation vary in terms of the scope of the sparing being
> performed. The cacheline sparing subclass refers to a sparing action that
> can replace a full cacheline. Row sparing is provided as an alternative to
> PPR sparing functions and its scope is that of a single DDR row.
> As per CXL r3.2 Table 8-125, footnote 1, memory sparing is preferred over
> PPR when possible.
> Bank sparing allows an entire bank to be replaced. Rank sparing is defined
> as an operation in which an entire DDR rank is replaced.
>
> Memory sparing maintenance operations may be supported by CXL devices
> that implement CXL.mem protocol. A sparing maintenance operation requests
> the CXL device to perform a repair operation on its media.
> For example, a CXL device with DRAM components that support memory sparing
> features may implement sparing maintenance operations.
>
> The host may issue a query command by setting query resources flag in the
> input payload (CXL spec 3.2 Table 8-120) to determine availability of
> sparing resources for a given address. In response to a query request,
> the device shall report the resource availability by producing the memory
> sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank,
> Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
> of the values specified in the request.
>
> During the execution of a sparing maintenance operation, a CXL memory
> device:
> - may not retain data
> - may not be able to process CXL.mem requests correctly.
> These CXL memory device capabilities are specified by restriction flags
> in the memory sparing feature readable attributes.
>
> When a CXL device identifies an error on a memory component, the device
> may inform the host about the need for a memory sparing maintenance
> operation by using a DRAM event record, where the 'maintenance needed'
> flag may be set. The event record contains some of the DPA, Channel, Rank,
> Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields that
> should be repaired. The userspace tool requests a maintenance operation
> if the 'maintenance needed' flag is set in the CXL DRAM error record.
>
> CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
> maintenance operation feature.
>
> CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
> discovery and configuration.
>
> Add support for controlling CXL memory device memory sparing feature.
> Register with EDAC driver, which gets the memory repair attr descriptors
> from the EDAC memory repair driver and exposes sysfs repair control
> attributes for memory sparing to the userspace. For example CXL memory
> sparing control for the CXL mem0 device is exposed in
> /sys/bus/edac/devices/cxl_mem0/mem_repairX/
>
> Use case
> ========
> 1. The CXL device identifies a failure in a memory component and reports it
> to userspace in a CXL DRAM trace event with the DPA and other attributes
> of the memory to repair, such as channel, rank, nibble mask, bank group,
> bank, row, column, sub-channel.
>
> 2. Rasdaemon processes the trace event and may issue a query request in
> sysfs to check the resources available for memory sparing if any of the
> following conditions is met.
> - 'maintenance needed' flag set in the event record.
> - 'threshold event' flag set for the CVME threshold feature.
> - The number of corrected errors reported on a CXL.mem media to
> userspace exceeds the threshold value for corrected error count defined
> by the userspace policy.
>
> 3. Rasdaemon processes the memory sparing trace event and issues a repair
> request for memory sparing.
>
> The kernel CXL driver shall report the memory sparing event record to
> userspace with the resource availability so that rasdaemon can process the
> event record and issue a repair request in sysfs for the memory sparing
> operation on the CXL device.
>
> Note: Based on feedback from the community, the 'query' sysfs attribute is
> removed and reporting the memory sparing error record to userspace is not
> supported. Instead, userspace issues the sparing operation and the kernel
> forwards it to the CXL memory device when the 'maintenance needed' flag is
> set in the DRAM event record.
>
> Add checks to ensure the memory to be repaired is offline and if online,
> then originates from a CXL DRAM error record reported in the current boot
> before requesting a memory sparing operation on the device.
>
> Note: Tested memory sparing feature control with QEMU patch
> "hw/cxl: Add emulation for memory sparing control feature"
> https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6
>
> [dj: Move cxl_is_memdev_memory_online() before its caller. (Alison)]
> [dj: Check return from cxl_feature_info() with IS_ERR]
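[Editor's note: a sketch of the userspace half of the use case quoted above, i.e. what a rasdaemon-style agent might do after seeing a DRAM event record with 'maintenance needed' set. The mem_repair0 attribute names follow the EDAC memory repair sysfs ABI this series targets, and all values are placeholders standing in for fields of a hypothetical trace event; treat both as illustrative and check the ABI documentation of the running kernel:]

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int repair_attr(const char *name, const char *val)
    {
            char path[256];
            int fd;

            snprintf(path, sizeof(path),
                     "/sys/bus/edac/devices/cxl_mem0/mem_repair0/%s", name);
            fd = open(path, O_WRONLY);
            if (fd < 0)
                    return -1;
            if (write(fd, val, strlen(val)) < 0) {
                    close(fd);
                    return -1;
            }
            return close(fd);
    }

    int main(void)
    {
            /* Placeholder values; in practice these come from the fields
             * of the CXL DRAM trace event carrying 'maintenance needed'.
             */
            repair_attr("channel", "2");
            repair_attr("rank", "1");
            repair_attr("bank_group", "0");
            repair_attr("bank", "3");
            repair_attr("row", "0x12a5");
            repair_attr("nibble_mask", "0x3");

            return repair_attr("repair", "1"); /* trigger the sparing op */
    }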
I would love for more of this changelog to make it into the
documentation, but that can be a follow-up. For example the policy
described by:
"Add checks to ensure the memory to be repaired is offline and if online,
then originates from a CXL DRAM error record reported in the current boot
before requesting a memory sparing operation on the device."
...is important information for the interface, but that can arrive in a
follow-on change.
It should probably also clarify the data consistency and access latency
impacts of the repair. Like it is a hardware bug if data changes over
the repair event, and to consult product documentation about the latency
of repair.
> +static int cxl_mem_sparing_get_repair_type(struct device *dev, void *drv_data,
> + const char **repair_type)
A lot of my unease with this patch arises from the abandonment of
type-safety in all these callbacks... but that ship has sailed
at this point so that unease will need to be addressed as a follow-on,
if ever.
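[Editor's note: a toy, plain-C illustration of the type-safety gap being described; nothing below is from the series. Because the EDAC ops receive their context as void *drv_data, the cast at the top of each callback is taken on faith and the compiler cannot catch a mismatched context:]

    #include <stdio.h>

    struct scrub_ctx  { int cycle_hours; };
    struct repair_ctx { int bank; };

    static void show_cycle(void *drv_data)
    {
            struct scrub_ctx *ctx = drv_data; /* trusted, unchecked cast */

            printf("cycle: %d hours\n", ctx->cycle_hours);
    }

    int main(void)
    {
            struct scrub_ctx s = { .cycle_hours = 12 };
            struct repair_ctx r = { .bank = 3 };

            show_cycle(&s); /* correct */
            show_cycle(&r); /* also compiles; misinterprets the context */
            return 0;
    }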
I also think the edac_dev_register() scheme and its usage of drvdata
outside of a driver context looks odd, i.e. the normal expectations about the
device_lock() relative to sysfs attribute visibility cannot be applied.
However, nothing looks obviously broken, so:
Acked-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 0/8] cxl: support CXL memory RAS features
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (9 preceding siblings ...)
2025-05-21 20:19 ` Alison Schofield
@ 2025-05-23 18:53 ` Dan Williams
2025-05-23 20:38 ` Dave Jiang
11 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2025-05-23 18:53 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, jonathan.cameron,
dave.jiang, dave, alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, shiju.jose
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Support for CXL memory EDAC features: patrol scrub, ECS, soft-PPR and
> memory sparing.
>
> Detailed history of the complete EDAC series with CXL EDAC patches
> up to V20 [1] and this CXL specific series had separated from V20 of
> the above series.
>
> The series is based on [2] v6.15-rc4 (based on comment from Dave
> in the thread [4]).
>
> Also applied(no conflicts) and tested on cxl.git [3] branch: next
>
> 1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
> 2. https://github.com/torvalds/linux.git
> 3. https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git
> 4. https://lore.kernel.org/all/d83a83d1-37e7-4192-913f-243098f679e3@intel.com/
>
> Userspace code for CXL memory repair features [5] and
> sample boot-script for CXL memory repair [6].
>
> [5]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
> [6]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/
All of my prior review comments are addressed. The maze of type-unsafe
callbacks gives me pause, but it is not disqualifying since it is all
self-contained out of the way in drivers/cxl/core/edac.c.
For the series you can add:
Acked-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v6 0/8] cxl: support CXL memory RAS features
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
` (10 preceding siblings ...)
2025-05-23 18:53 ` Dan Williams
@ 2025-05-23 20:38 ` Dave Jiang
11 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2025-05-23 20:38 UTC (permalink / raw)
To: shiju.jose, linux-cxl, dan.j.williams, jonathan.cameron, dave,
alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-edac, linux-doc, bp, tony.luck, lenb, Yazen.Ghannam,
mchehab, nifan.cxl, linuxarm, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang
On 5/21/25 5:47 AM, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Support for CXL memory EDAC features: patrol scrub, ECS, soft-PPR and
> memory sparing.
>
> Detailed history of the complete EDAC series with CXL EDAC patches
> up to V20 [1] and this CXL specific series had separated from V20 of
> the above series.
>
> The series is based on [2] v6.15-rc4 (based on comment from Dave
> in the thread [4]).
>
> Also applied(no conflicts) and tested on cxl.git [3] branch: next
>
> 1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
> 2. https://github.com/torvalds/linux.git
> 3. https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git
> 4. https://lore.kernel.org/all/d83a83d1-37e7-4192-913f-243098f679e3@intel.com/
>
> Userspace code for CXL memory repair features [5] and
> sample boot-script for CXL memory repair [6].
>
> [5]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
> [6]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/
Applied to cxl/next
>
> Changes
> =======
> v5 -> v6:
> 1. Fixed feedback from Randy Dunlap on CXL EDAC documentation.
>
> 2. Feedback from Alison:
> - Replace #ifdef using IS_ENABLED() in the series
> - Fix for the kfree() oops in devm_cxl_memdev_edac_release()
> while unloading cxl-test module.
> - Added separate helper functions for scrub set attributes for
> dev scrub and region scrub.
> - renaming to scrub_cycle and scrub_region_id.
>
> 3. Feedback from Dave:
> - Fix for the kfree() oops in devm_cxl_memdev_edac_release()
> while unloading cxl-test module.
> - Add cxl_test inclusion of edac.o
> - Check return from cxl_feature_info() with IS_ERR in the series.
>
> 4. Rebased to linux.git [2] v6.15-rc4 (based on comment from Dave
> in the thread [4]).
>
> v4 -> v5:
> 1. Fixed a compilation warning introduced by v3->v4, reported by Dave Jiang on v4.
> drivers/cxl/core/edac.c: In function ‘cxl_mem_perform_sparing’:
> drivers/cxl/core/edac.c:1335:29: warning: the comparison will always evaluate as ‘true’ for the address of ‘validity_flags’ will never be NULL [-Waddress]
> 1335 | if (!rec->media_hdr.validity_flags)
> | ^
> In file included from ./drivers/cxl/cxlmem.h:10,
> from drivers/cxl/core/edac.c:21:
> ./include/cxl/event.h:35:12: note: ‘validity_flags’ declared here
> 35 | u8 validity_flags[2];
> | ^~~~~~~~~~~~~~
> 2. Updated patches for tags given.
>
> v3 -> v4:
> 1. Feedback from Dave Jiang on v3,
> 1.1. Changes for comments in EDAC scrub documentation for CXL use cases.
> https://lore.kernel.org/all/2df68c68-f1a8-4327-abc9-d265326c133d@intel.com/
> 1.2. Changes for comments in CXL memory sparing control feature.
> https://lore.kernel.org/all/4ee3323c-fb27-4fbe-b032-78fd54bc21a0@intel.com/
>
> v2 -> v3:
> 1. Feedback from Dan Williams on v2,
> https://lore.kernel.org/linux-mm/20250320180450.539-1-shiju.jose@huawei.com/
> - Modified get_support_feature_info() in fwctl series generic to use in
> cxl/fxctl and cxl/edac and replace cxl_get_feature_entry() in the CXL edac
> series.
> - Add usecase note for CXL ECS in Documentation/edac/scrub.rst.
> - Add info message when device scrub rate set by a region overwritten with a
> local device scrub rate or another region's scrub rate.
> - Replace 'ps' with patrol_scrub in the patrol scrub feature.
> - Replaced usage of intermediate objects struct cxl_memdev_ps_params and
> enum cxl_scrub_param etc for patrol scrub and did same for ECS.
> - Rename CXL_MEMDEV_PS_* macros.
> - Rename scrub_cycle_hrs-> scrub_cycle_hours
> - Add if (!cxl_dev_name)
> return -ENOMEM; to devm_cxl_memdev_edac_register()
> - Add devm_cxl_region_edac_register(cxlr) for CXL_PARTMODE_PMEM case.
> - Add separate configurations for CXL scrub, ECS and memory repair
> CXL_EDAC_SCRUB, CXL_EDAC_ECS and CXL_EDAC_MEM_REPAIR.
> - Add
> if (!capable(CAP_SYS_RAWIO))
> return -EPERM; for set attributes callbacks for CXL scrub, ECS and
> memory repair.
> - In patch "cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command"
> * cxl_do_maintenance() -> cxl_perform_maintenance() and moved to cxl/core/edac.c
> * kmalloc() -> kvzalloc()
> - In patch, "cxl: Support for finding memory operation attributes from the current boot"
> * Moved code from drivers/cxl/core/ras.c to drivers/cxl/core/edac.c
> * Add few logics to releasing the cache to give safety with respect to error storms and burning
> * unlimited memory.
> * Add estimated memory overhead expense of this feature documented in the Kconfig.
> * Unified various names such as attr, param, attrbs throughout the patches.
> * Moved > struct xarray rec_gen_media and struct xarray rec_dram; out of struct cxl_memdev
> to CXL edac object, but there is required a pointer to this object in struct cxl_memdev
> because the error records are reported and thus stored in the cxl_memdev context not
> in the CXL EDAC context.
>
> 2. Feedback from Borislav on v2,
> - In include/linux/edac.h
> Replace EDAC_PPR -> EDAC_REPAIR_PPR
> EDAC_CACHELINE_SPARING -> EDAC_REPAIR_CACHELINE_SPARING etc.
>
> v1 -> v2:
> 1. Feedback from Dan Williams on v1,
> https://lore.kernel.org/linux-mm/20250307091137.00006a0a@huawei.com/T/
> - Fixed lock issues in region scrubbing, added local cxl_acquire()
> and cxl_unlock.
> - Replaced CXL examples using cat and echo from EDAC .rst docs
> with short description and ref to ABI docs. Also corrections
> in existing descriptions as suggested by Dan.
> - Add policy description for the scrub control feature.
> However this may require inputs from CXL experts.
> - Replaced CONFIG_CXL_RAS_FEATURES with CONFIG_CXL_EDAC_MEM_FEATURES.
> - Few changes to depends part of CONFIG_CXL_EDAC_MEM_FEATURES.
> - Rename drivers/cxl/core/memfeatures.c as drivers/cxl/core/edac.c
> - snprintf() -> kasprintf() in few places.
>
> 2. Feedback from Alison on v1,
> - In cxl_get_feature_entry()(patch 1), return NULL on failures and
> reintroduced checks in cxl_get_feature_entry().
> - Changed logic in for loop in region based scrubbing code.
> - Replace cxl_are_decoders_committed() to cxl_is_memdev_memory_online()
> and add as a local function to drivers/cxl/core/edac.c
> - Changed few multiline comments to single line comments.
> - Removed unnecessary comments from the code.
> - Reduced line length of few macros in ECS and memory repair code.
> - In new files, changed "GPL-2.0-or-later" -> "GPL-2.0-only".
> - Ran clang-format for new files and updated.
> 3. Changes for feedbacks from Jonathan on v1.
> - Changed few multiline comments to single line comments.
>
> Shiju Jose (8):
> EDAC: Update documentation for the CXL memory patrol scrub control
> feature
> cxl: Update prototype of function get_support_feature_info()
> cxl/edac: Add CXL memory device patrol scrub control feature
> cxl/edac: Add CXL memory device ECS control feature
> cxl/edac: Add support for PERFORM_MAINTENANCE command
> cxl/edac: Support for finding memory operation attributes from the
> current boot
> cxl/edac: Add CXL memory device memory sparing control feature
> cxl/edac: Add CXL memory device soft PPR control feature
>
> Documentation/edac/memory_repair.rst | 31 +
> Documentation/edac/scrub.rst | 76 +
> drivers/cxl/Kconfig | 71 +
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/core.h | 2 +
> drivers/cxl/core/edac.c | 2103 ++++++++++++++++++++++++++
> drivers/cxl/core/features.c | 17 +-
> drivers/cxl/core/mbox.c | 11 +-
> drivers/cxl/core/memdev.c | 1 +
> drivers/cxl/core/region.c | 10 +
> drivers/cxl/cxl.h | 10 +
> drivers/cxl/cxlmem.h | 30 +
> drivers/cxl/mem.c | 4 +
> drivers/edac/mem_repair.c | 9 +
> include/linux/edac.h | 7 +
> tools/testing/cxl/Kbuild | 1 +
> 16 files changed, 2372 insertions(+), 12 deletions(-)
> create mode 100644 drivers/cxl/core/edac.c
>
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-05-23 20:38 UTC | newest]
Thread overview: 21+ messages
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
2025-05-21 12:47 ` [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
2025-05-21 16:28 ` Fan Ni
2025-05-21 12:47 ` [PATCH v6 2/8] cxl: Update prototype of function get_support_feature_info() shiju.jose
2025-05-21 16:31 ` Fan Ni
2025-05-21 12:47 ` [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
2025-05-21 14:40 ` Jonathan Cameron
2025-05-21 23:55 ` Dave Jiang
2025-05-21 17:07 ` Alison Schofield
2025-05-21 17:48 ` Jonathan Cameron
2025-05-21 20:17 ` Alison Schofield
2025-05-21 12:47 ` [PATCH v6 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
2025-05-21 12:47 ` [PATCH v6 5/8] cxl/edac: Add support for PERFORM_MAINTENANCE command shiju.jose
2025-05-21 12:47 ` [PATCH v6 6/8] cxl/edac: Support for finding memory operation attributes from the current boot shiju.jose
2025-05-21 12:47 ` [PATCH v6 7/8] cxl/edac: Add CXL memory device memory sparing control feature shiju.jose
2025-05-23 18:50 ` Dan Williams
2025-05-21 12:47 ` [PATCH v6 8/8] cxl/edac: Add CXL memory device soft PPR " shiju.jose
2025-05-21 14:59 ` [PATCH v6 0/8] cxl: support CXL memory RAS features Jonathan Cameron
2025-05-21 20:19 ` Alison Schofield
2025-05-23 18:53 ` Dan Williams
2025-05-23 20:38 ` Dave Jiang