* [RFC PATCH v2 1/5] acpi: numa: Add support to enumerate and store extended linear address mode
2024-11-12 22:12 [RFC PATCH v2 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
@ 2024-11-12 22:12 ` Dave Jiang
2024-11-26 16:16 ` Jonathan Cameron
2024-11-12 22:12 ` [RFC PATCH v2 2/5] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2024-11-12 22:12 UTC
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Store the address mode as part of the cache attributes. Export the mode
attribute to sysfs like all other cache attributes.

Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v2:
- Fix spelling errors (Jonathan)
- Change UNKNOWN to RESERVED (Jonathan)
---
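A minimal userspace sketch of consuming the new attribute. It assumes the attribute keeps the `mode` name and the 0/1 encoding documented in this v2 (the review below suggests renaming it to `address_mode`). The helpers `cache_mode_name()`, `parse_cache_mode()` and `read_cache_mode()` are hypothetical; the parser expects the plain "%u\n" text that `CACHE_ATTR(mode, "%u")` produces.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Values documented for .../memory_side_cache/indexY/mode in this patch. */
enum cache_mode_value {
	CACHE_MODE_RESERVED = 0,
	CACHE_MODE_EXTENDED_LINEAR = 1,
};

/* Map the numeric sysfs value to the names used in the ABI text. */
static const char *cache_mode_name(unsigned int mode)
{
	switch (mode) {
	case CACHE_MODE_RESERVED:
		return "reserved";
	case CACHE_MODE_EXTENDED_LINEAR:
		return "extended-linear";
	default:
		return "unknown";
	}
}

/* Parse the "%u\n" text emitted by the attribute; returns 0 on success. */
static int parse_cache_mode(const char *buf, unsigned int *mode)
{
	char *end;
	unsigned long val = strtoul(buf, &end, 10);

	if (end == buf || (*end != '\0' && *end != '\n'))
		return -1;
	*mode = (unsigned int)val;
	return 0;
}

/* Read nodeX/memory_side_cache/indexY/mode; returns 0 on success, -1 otherwise. */
static int read_cache_mode(int node, int index, unsigned int *mode)
{
	char path[128], buf[32];
	int ret = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/memory_side_cache/index%d/mode",
		 node, index);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_cache_mode(buf, mode);
	fclose(f);
	return ret;
}
```

On a system without a memory-side cache the sysfs directory does not exist, so `read_cache_mode()` simply fails; callers can treat that the same as "no extended linear cache".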
Documentation/ABI/stable/sysfs-devices-node | 6 ++++++
drivers/acpi/numa/hmat.c | 3 +++
drivers/base/node.c | 2 ++
include/linux/node.h | 7 +++++++
4 files changed, 18 insertions(+)
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 402af4b2b905..725ef0e1e01f 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -177,6 +177,12 @@ Description:
The cache write policy: 0 for write-back, 1 for write-through,
other or unknown.
+What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/mode
+Date: September 2024
+Contact: Dave Jiang <dave.jiang@intel.com>
+Description:
+ The address mode: 0 for reserved, 1 for extended-linear.
+
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 1a902a02390f..39524f36be5b 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -506,6 +506,9 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
case ACPI_HMAT_CA_DIRECT_MAPPED:
tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
+ /* Extended Linear mode is only valid if cache is direct mapped */
+ if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR)
+ tcache->cache_attrs.mode = NODE_CACHE_MODE_EXTENDED_LINEAR;
break;
case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
diff --git a/drivers/base/node.c b/drivers/base/node.c
index eb72580288e6..744be5470728 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -244,12 +244,14 @@ CACHE_ATTR(size, "%llu")
CACHE_ATTR(line_size, "%u")
CACHE_ATTR(indexing, "%u")
CACHE_ATTR(write_policy, "%u")
+CACHE_ATTR(mode, "%u")
static struct attribute *cache_attrs[] = {
&dev_attr_indexing.attr,
&dev_attr_size.attr,
&dev_attr_line_size.attr,
&dev_attr_write_policy.attr,
+ &dev_attr_mode.attr,
NULL,
};
ATTRIBUTE_GROUPS(cache);
diff --git a/include/linux/node.h b/include/linux/node.h
index 9a881c2208b3..fdecb760ef49 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -57,6 +57,11 @@ enum cache_write_policy {
NODE_CACHE_WRITE_OTHER,
};
+enum cache_mode {
+ NODE_CACHE_MODE_RESERVED,
+ NODE_CACHE_MODE_EXTENDED_LINEAR,
+};
+
/**
* struct node_cache_attrs - system memory caching attributes
*
@@ -65,6 +70,7 @@ enum cache_write_policy {
* @size: Total size of cache in bytes
* @line_size: Number of bytes fetched on a cache miss
* @level: The cache hierarchy level
+ * @mode: The address mode
*/
struct node_cache_attrs {
enum cache_indexing indexing;
@@ -72,6 +78,7 @@ struct node_cache_attrs {
u64 size;
u16 line_size;
u8 level;
+ u16 mode;
};
#ifdef CONFIG_HMEM_REPORTING
--
2.47.0
* Re: [RFC PATCH v2 1/5] acpi: numa: Add support to enumerate and store extended linear address mode
2024-11-12 22:12 ` [RFC PATCH v2 1/5] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
@ 2024-11-26 16:16 ` Jonathan Cameron
2024-12-03 23:05 ` Dave Jiang
0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2024-11-26 16:16 UTC
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Tue, 12 Nov 2024 15:12:33 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Store the address mode as part of the cache attributes. Export the mode
> attribute to sysfs as all other cache attributes.
>
> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
One trivial suggestion that I don't care that much about.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> v2:
> - Fix spelling errors (Jonathan)
> - Change UNKNOWN to RESERVED (Jonathan)
> ---
> Documentation/ABI/stable/sysfs-devices-node | 6 ++++++
> drivers/acpi/numa/hmat.c | 3 +++
> drivers/base/node.c | 2 ++
> include/linux/node.h | 7 +++++++
> 4 files changed, 18 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 402af4b2b905..725ef0e1e01f 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -177,6 +177,12 @@ Description:
> The cache write policy: 0 for write-back, 1 for write-through,
> other or unknown.
>
> +What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/mode
Mode feels perhaps a bit too vague. Maybe address_mode?
> +Date: September 2024
> +Contact: Dave Jiang <dave.jiang@intel.com>
> +Description:
> + The address mode: 0 for reserved, 1 for extended-linear.
> +
> What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> Date: November 2021
> Contact: Jarkko Sakkinen <jarkko@kernel.org>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 1a902a02390f..39524f36be5b 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -506,6 +506,9 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
> switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
> case ACPI_HMAT_CA_DIRECT_MAPPED:
> tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
> + /* Extended Linear mode is only valid if cache is direct mapped */
> + if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR)
> + tcache->cache_attrs.mode = NODE_CACHE_MODE_EXTENDED_LINEAR;
> break;
> case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
> tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
* Re: [RFC PATCH v2 1/5] acpi: numa: Add support to enumerate and store extended linear address mode
2024-11-26 16:16 ` Jonathan Cameron
@ 2024-12-03 23:05 ` Dave Jiang
0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2024-12-03 23:05 UTC
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 11/26/24 9:16 AM, Jonathan Cameron wrote:
> On Tue, 12 Nov 2024 15:12:33 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> Store the address mode as part of the cache attributes. Export the mode
>> attribute to sysfs as all other cache attributes.
>>
>> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> One trivial suggestion that I don't care that much about.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
>> ---
>> v2:
>> - Fix spelling errors (Jonathan)
>> - Change UNKNOWN to RESERVED (Jonathan)
>> ---
>> Documentation/ABI/stable/sysfs-devices-node | 6 ++++++
>> drivers/acpi/numa/hmat.c | 3 +++
>> drivers/base/node.c | 2 ++
>> include/linux/node.h | 7 +++++++
>> 4 files changed, 18 insertions(+)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
>> index 402af4b2b905..725ef0e1e01f 100644
>> --- a/Documentation/ABI/stable/sysfs-devices-node
>> +++ b/Documentation/ABI/stable/sysfs-devices-node
>> @@ -177,6 +177,12 @@ Description:
>> The cache write policy: 0 for write-back, 1 for write-through,
>> other or unknown.
>>
>> +What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/mode
>
> Mode feels perhaps a bit too vague. Maybe address_mode?
ok
DJ
>
>> +Date: September 2024
>> +Contact: Dave Jiang <dave.jiang@intel.com>
>> +Description:
>> + The address mode: 0 for reserved, 1 for extended-linear.
>> +
>> What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
>> Date: November 2021
>> Contact: Jarkko Sakkinen <jarkko@kernel.org>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>> index 1a902a02390f..39524f36be5b 100644
>> --- a/drivers/acpi/numa/hmat.c
>> +++ b/drivers/acpi/numa/hmat.c
>> @@ -506,6 +506,9 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
>> switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
>> case ACPI_HMAT_CA_DIRECT_MAPPED:
>> tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
>> + /* Extended Linear mode is only valid if cache is direct mapped */
>> + if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR)
>> + tcache->cache_attrs.mode = NODE_CACHE_MODE_EXTENDED_LINEAR;
>> break;
>> case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
>> tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
>
* [RFC PATCH v2 2/5] acpi/hmat / cxl: Add extended linear cache support for CXL
2024-11-12 22:12 [RFC PATCH v2 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
2024-11-12 22:12 ` [RFC PATCH v2 1/5] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
@ 2024-11-12 22:12 ` Dave Jiang
2024-11-26 16:23 ` Jonathan Cameron
2024-11-12 22:12 ` [RFC PATCH v2 3/5] acpi/hmat: Add helper functions to provide extended linear cache translation Dave Jiang
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2024-11-12 22:12 UTC
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
The current CXL region size only indicates the size of the CXL memory
region, without accounting for the extended linear cache size. Retrieve the
cache size from HMAT and add it to the CXL region size for the CXL region
range that matches an SRAT range with the extended linear cache enabled.

The SRAT defines the whole memory range, which includes both the extended
linear cache and the CXL memory region. The new HMAT ECN/ECR to the Memory
Side Cache Information Structure defines the size of the extended linear
cache and matches it to the SRAT Memory Affinity Structure by the memory
proximity domain. Add a helper to match the CXL range to the SRAT memory
range in order to retrieve the cache size.

There are several places that check the CXL region range against the
decoder range. Use the new helper to compare the two ranges while
accounting for the new cache size.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v2:
- Fix spelling errors (Jonathan)
- Move matching of res range to the match loop. (Jonathan)
- Rename region_res_match_range() to region_res_match_cxl_range() and add
comments. (Jonathan)
- Refactor region_res_match_cxl_range() to simplify code. (Jonathan)
- Remove unintended blank line. (Jonathan)
- Add warning emission when cache is not 1:1 to cxl region. (Jonathan)
---
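To make the range arithmetic in this patch concrete, here is a small userspace model. It assumes the 1:1 layout the patch relies on (the DRAM cache range sits directly in front of the CXL range). The types and function names are hypothetical stand-ins for `cxl_extended_linear_cache_resize()`, `region_res_match_cxl_range()` and the size check in `cxl_region_attach()`; ranges use inclusive ends like `struct resource`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct range / struct resource (inclusive end). */
struct hpa_range {
	uint64_t start;
	uint64_t end;
};

static uint64_t range_size(const struct hpa_range *r)
{
	return r->end - r->start + 1;
}

/*
 * Models cxl_extended_linear_cache_resize(): the region resource starts out
 * covering only the CXL HPA range. With a 1:1 extended linear cache, grow it
 * downward so the cache range sits in front. Returns -1 if not 1:1.
 */
static int resize_for_cache(struct hpa_range *res, uint64_t cache_size,
			    uint64_t *region_cache_size)
{
	*region_cache_size = 0;
	if (!cache_size)
		return 0;		/* no extended linear cache */
	if (cache_size != range_size(res))
		return -1;		/* only a 1:1 ratio is handled */
	res->start -= cache_size;
	*region_cache_size = cache_size;
	return 0;
}

/* Models region_res_match_cxl_range(): skip the cache part, then compare. */
static bool match_cxl_range(const struct hpa_range *res, uint64_t cache_size,
			    const struct hpa_range *decoder)
{
	return res->start + cache_size == decoder->start &&
	       res->end == decoder->end;
}

/* Models the cxl_region_attach() size check with the cache included. */
static bool region_size_ok(uint64_t dpa_size, int ways, uint64_t cache_size,
			   const struct hpa_range *res)
{
	return dpa_size * ways + cache_size == range_size(res);
}
```

For example, a 4 GiB CXL window at 0x200000000-0x2ffffffff with a 4 GiB cache becomes a region at 0x100000000-0x2ffffffff, and a 2-way interleave of 2 GiB endpoint decoders still passes the size check once the cache size is added.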
drivers/acpi/numa/hmat.c | 44 ++++++++++++++++++++++++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/acpi.c | 11 ++++++
drivers/cxl/core/core.h | 3 ++
drivers/cxl/core/region.c | 70 ++++++++++++++++++++++++++++++++++++---
drivers/cxl/cxl.h | 2 ++
include/linux/acpi.h | 19 +++++++++++
tools/testing/cxl/Kbuild | 1 +
8 files changed, 147 insertions(+), 4 deletions(-)
create mode 100644 drivers/cxl/core/acpi.c
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 39524f36be5b..92b818b72ecc 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -108,6 +108,50 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
return NULL;
}
+/**
+ * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
+ * @backing_res: resource from the backing media
+ * @nid: node id for the memory region
+ * @cache_size: (Output) size of extended linear cache.
+ *
+ * Return: 0 on success. Errno on failure.
+ *
+ */
+int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
+ resource_size_t *cache_size)
+{
+ unsigned int pxm = node_to_pxm(nid);
+ struct memory_target *target;
+ struct target_cache *tcache;
+ bool cache_found = false;
+ struct resource *res;
+
+ target = find_mem_target(pxm);
+ if (!target)
+ return -ENOENT;
+
+ list_for_each_entry(tcache, &target->caches, node) {
+ if (tcache->cache_attrs.mode == NODE_CACHE_MODE_EXTENDED_LINEAR) {
+ res = &target->memregions;
+ if (!resource_contains(res, backing_res))
+ continue;
+
+ cache_found = true;
+ break;
+ }
+ }
+
+ if (!cache_found) {
+ *cache_size = 0;
+ return 0;
+ }
+
+ *cache_size = tcache->cache_attrs.size;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
+
static struct memory_target *acpi_find_genport_target(u32 uid)
{
struct memory_target *target;
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..1a0c9c6ca818 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -14,5 +14,6 @@ cxl_core-y += pci.o
cxl_core-y += hdm.o
cxl_core-y += pmu.o
cxl_core-y += cdat.o
+cxl_core-y += acpi.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
new file mode 100644
index 000000000000..f13b4dae6ac5
--- /dev/null
+++ b/drivers/cxl/core/acpi.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
+#include <linux/acpi.h>
+#include "cxl.h"
+#include "core.h"
+
+int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size)
+{
+ return hmat_get_extended_linear_cache_size(backing_res, nid, size);
+}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 0c62b4069ba0..c4dc9aefe25f 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -110,4 +110,7 @@ bool cxl_need_node_perf_attrs_update(int nid);
int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
struct access_coordinate *c);
+int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size);
+
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e701e4b04032..a37923c030a3 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -816,6 +816,21 @@ static int match_free_decoder(struct device *dev, void *data)
return 0;
}
+static bool region_res_match_cxl_range(struct cxl_region_params *p,
+ struct range *range)
+{
+ if (!p->res)
+ return false;
+
+ /*
+ * The CXL range is assumed to be fronted by the DRAM range in the
+ * currently known implementation. This assumption holds until a
+ * variant implementation exists.
+ */
+ return p->res->start + p->cache_size == range->start &&
+ p->res->end == range->end;
+}
+
static int match_auto_decoder(struct device *dev, void *data)
{
struct cxl_region_params *p = data;
@@ -828,7 +843,7 @@ static int match_auto_decoder(struct device *dev, void *data)
cxld = to_cxl_decoder(dev);
r = &cxld->hpa_range;
- if (p->res && p->res->start == r->start && p->res->end == r->end)
+ if (region_res_match_cxl_range(p, r))
return 1;
return 0;
@@ -1406,8 +1421,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
if (cxld->interleave_ways != iw ||
cxld->interleave_granularity != ig ||
- cxld->hpa_range.start != p->res->start ||
- cxld->hpa_range.end != p->res->end ||
+ !region_res_match_cxl_range(p, &cxld->hpa_range) ||
((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
dev_err(&cxlr->dev,
"%s:%s %s expected iw: %d ig: %d %pr\n",
@@ -1931,7 +1945,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return -ENXIO;
}
- if (resource_size(cxled->dpa_res) * p->interleave_ways !=
+ if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
resource_size(p->res)) {
dev_dbg(&cxlr->dev,
"%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
@@ -3215,6 +3229,42 @@ static int match_region_by_range(struct device *dev, void *data)
return rc;
}
+static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
+ struct resource *res)
+{
+ struct cxl_region_params *p = &cxlr->params;
+ int nid = phys_to_target_node(res->start);
+ resource_size_t size, cache_size;
+ int rc;
+
+ size = resource_size(res);
+ if (!size)
+ return -EINVAL;
+
+ rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
+ if (rc)
+ return rc;
+
+ if (!cache_size)
+ return 0;
+
+ if (size != cache_size) {
+ dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!\n");
+ return -EOPNOTSUPP;
+ }
+
+ /*
+ * Move the start of the range to where the cache range starts. The
+ * implementation assumes that the cache range is in front of the
+ * CXL range. This is not dictated by the HMAT spec but is how the
+ * currently known implementation is configured.
+ */
+ res->start -= cache_size;
+ p->cache_size = cache_size;
+
+ return 0;
+}
+
/* Establish an empty region covering the given HPA range */
static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
struct cxl_endpoint_decoder *cxled)
@@ -3261,6 +3311,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
dev_name(&cxlr->dev));
+
+ rc = cxl_extended_linear_cache_resize(cxlr, res);
+ if (rc) {
+ /*
+ * Failing to support extended linear cache region resize does not
+ * prevent the region from functioning. It only causes cxl list to
+ * show an incorrect region size.
+ */
+ dev_warn(cxlmd->dev.parent,
+ "Failed to support extended linear cache.\n");
+ }
+
rc = insert_resource(cxlrd->res, res);
if (rc) {
/*
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 0d8b810a51f0..26466807fa7a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -483,6 +483,7 @@ enum cxl_config_state {
* @res: allocated iomem capacity for this region
* @targets: active ordered targets in current decoder configuration
* @nr_targets: number of targets
+ * @cache_size: extended linear cache size, if it exists
*
* State transitions are protected by the cxl_region_rwsem
*/
@@ -494,6 +495,7 @@ struct cxl_region_params {
struct resource *res;
struct cxl_endpoint_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
int nr_targets;
+ resource_size_t cache_size;
};
/*
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4d5ee84c468b..10ffba7cb9ad 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -436,12 +436,20 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp);
#ifdef CONFIG_ACPI_HMAT
int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
+int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
+ resource_size_t *size);
#else
static inline int acpi_get_genport_coordinates(u32 uid,
struct access_coordinate *coord)
{
return -EOPNOTSUPP;
}
+
+static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size)
+{
+ return -EOPNOTSUPP;
+}
#endif
#ifdef CONFIG_ACPI_NUMA
@@ -1090,6 +1098,17 @@ static inline acpi_handle acpi_get_processor_handle(int cpu)
#endif /* !CONFIG_ACPI */
+#ifdef CONFIG_ACPI_HMAT
+int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
+ resource_size_t *size);
+#else
+static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
extern void arch_post_acpi_subsys_init(void);
#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index b1256fee3567..1ae13987a8a2 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -61,6 +61,7 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o
cxl_core-y += $(CXL_CORE_SRC)/hdm.o
cxl_core-y += $(CXL_CORE_SRC)/pmu.o
cxl_core-y += $(CXL_CORE_SRC)/cdat.o
+cxl_core-y += $(CXL_CORE_SRC)/acpi.o
cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
cxl_core-y += config_check.o
--
2.47.0
* Re: [RFC PATCH v2 2/5] acpi/hmat / cxl: Add extended linear cache support for CXL
2024-11-12 22:12 ` [RFC PATCH v2 2/5] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
@ 2024-11-26 16:23 ` Jonathan Cameron
2024-12-03 23:08 ` Dave Jiang
0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2024-11-26 16:23 UTC
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Tue, 12 Nov 2024 15:12:34 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> The current cxl region size only indicates the size of the CXL memory
> region without accounting for the extended linear cache size. Retrieve the
> cache size from HMAT and append that to the cxl region size for the cxl
> region range that matches the SRAT range that has extended linear cache
> enabled.
>
> The SRAT defines the whole memory range that includes the extended linear
> cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
> Cache Information Structure defines the size of the extended linear cache
> size and matches to the SRAT Memory Affinity Structure by the memory
> proximity domain. Add a helper to match the cxl range to the SRAT memory
> range in order to retrieve the cache size.
>
> There are several places that check the cxl region range against the
> decoder range. Use new helper to check between the two ranges and address
> the new cache size.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Hi Dave,
A few minor comments inline.
Thanks,
Jonathan
> ---
> drivers/acpi/numa/hmat.c | 44 ++++++++++++++++++++++++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/acpi.c | 11 ++++++
> drivers/cxl/core/core.h | 3 ++
> drivers/cxl/core/region.c | 70 ++++++++++++++++++++++++++++++++++++---
> drivers/cxl/cxl.h | 2 ++
> include/linux/acpi.h | 19 +++++++++++
> tools/testing/cxl/Kbuild | 1 +
> 8 files changed, 147 insertions(+), 4 deletions(-)
> create mode 100644 drivers/cxl/core/acpi.c
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 39524f36be5b..92b818b72ecc 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -108,6 +108,50 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
> return NULL;
> }
>
> +/**
> + * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
> + * @backing_res: resource from the backing media
> + * @nid: node id for the memory region
> + * @cache_size: (Output) size of extended linear cache.
> + *
> + * Return: 0 on success. Errno on failure.
> + *
> + */
> +int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> + resource_size_t *cache_size)
> +{
> + unsigned int pxm = node_to_pxm(nid);
> + struct memory_target *target;
> + struct target_cache *tcache;
> + bool cache_found = false;
> + struct resource *res;
> +
> + target = find_mem_target(pxm);
> + if (!target)
> + return -ENOENT;
> +
> + list_for_each_entry(tcache, &target->caches, node) {
> + if (tcache->cache_attrs.mode == NODE_CACHE_MODE_EXTENDED_LINEAR) {
I'd flip this for slightly better readability.
if (tcache->cache_attrs.mode != NODE_CACHE_MODE_EXTENDED_LINEAR)
continue;
res = ...
> + res = &target->memregions;
> + if (!resource_contains(res, backing_res))
> + continue;
> +
> + cache_found = true;
> + break;
> + }
> + }
> +
> + if (!cache_found) {
> + *cache_size = 0;
> + return 0;
> + }
> +
> + *cache_size = tcache->cache_attrs.size;
Why not set this and return in the loop?
That way no need to have a local variable.
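The two suggestions above, combined, might look roughly like the following userspace model. This is a hypothetical sketch, not the kernel code: an array stands in for the `target->caches` list, and `res_contains()` is a simplified version of `resource_contains()` that omits its resource-type and `IORESOURCE_UNSET` checks.

```c
#include <stddef.h>
#include <stdint.h>

enum { MODE_RESERVED, MODE_EXTENDED_LINEAR };

/* Hypothetical stand-ins for struct target_cache and struct resource. */
struct cache_entry {
	int mode;
	uint64_t size;
};

struct mem_res {
	uint64_t start, end;	/* inclusive */
};

static int res_contains(const struct mem_res *outer, const struct mem_res *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* Early 'continue' on the mode test, and return from inside the loop. */
static int get_extended_linear_cache_size(const struct cache_entry *caches,
					  size_t n,
					  const struct mem_res *memregions,
					  const struct mem_res *backing,
					  uint64_t *cache_size)
{
	for (size_t i = 0; i < n; i++) {
		if (caches[i].mode != MODE_EXTENDED_LINEAR)
			continue;
		if (!res_contains(memregions, backing))
			continue;
		*cache_size = caches[i].size;
		return 0;
	}

	*cache_size = 0;	/* no extended linear cache covers this range */
	return 0;
}
```

With the early return, the `cache_found` flag and the post-loop branch both go away.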
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
> diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
> new file mode 100644
> index 000000000000..f13b4dae6ac5
> --- /dev/null
> +++ b/drivers/cxl/core/acpi.c
> @@ -0,0 +1,11 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
> +#include <linux/acpi.h>
> +#include "cxl.h"
> +#include "core.h"
Why do you need the cxl headers? Maybe a forwards def of
struct resource, but I'm not seeing anything else being needed.
> +
> +int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
> + int nid, resource_size_t *size)
> +{
> + return hmat_get_extended_linear_cache_size(backing_res, nid, size);
> +}
> @@ -3215,6 +3229,42 @@ static int match_region_by_range(struct device *dev, void *data)
> return rc;
> }
>
> +static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
> + struct resource *res)
> +{
> + struct cxl_region_params *p = &cxlr->params;
> + int nid = phys_to_target_node(res->start);
> + resource_size_t size, cache_size;
> + int rc;
> +
> + size = resource_size(res);
> + if (!size)
> + return -EINVAL;
> +
> + rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
> + if (rc)
> + return rc;
> +
> + if (!cache_size)
> + return 0;
> +
> + if (size != cache_size) {
> + dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!");
> + return -EOPNOTSUPP;
> + }
> +
> + /*
> + * Move the start of the range to where the cache range starts. The
> + * implementation assumes that the cache range is in front of the
> + * CXL range. This is not dictated by the HMAT spec but is how the
> + * currently known implementation configured.
is configured
> + */
> + res->start -= cache_size;
> + p->cache_size = cache_size;
> +
> + return 0;
> +}
* Re: [RFC PATCH v2 2/5] acpi/hmat / cxl: Add extended linear cache support for CXL
2024-11-26 16:23 ` Jonathan Cameron
@ 2024-12-03 23:08 ` Dave Jiang
0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2024-12-03 23:08 UTC
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 11/26/24 9:23 AM, Jonathan Cameron wrote:
> On Tue, 12 Nov 2024 15:12:34 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> The current cxl region size only indicates the size of the CXL memory
>> region without accounting for the extended linear cache size. Retrieve the
>> cache size from HMAT and append that to the cxl region size for the cxl
>> region range that matches the SRAT range that has extended linear cache
>> enabled.
>>
>> The SRAT defines the whole memory range that includes the extended linear
>> cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
>> Cache Information Structure defines the size of the extended linear cache
>> size and matches to the SRAT Memory Affinity Structure by the memory
>> proximity domain. Add a helper to match the cxl range to the SRAT memory
>> range in order to retrieve the cache size.
>>
>> There are several places that check the cxl region range against the
>> decoder range. Use new helper to check between the two ranges and address
>> the new cache size.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Hi Dave,
>
> A few minor comments inline.
>
> Thanks,
>
> Jonathan
>
>> ---
>> drivers/acpi/numa/hmat.c | 44 ++++++++++++++++++++++++
>> drivers/cxl/core/Makefile | 1 +
>> drivers/cxl/core/acpi.c | 11 ++++++
>> drivers/cxl/core/core.h | 3 ++
>> drivers/cxl/core/region.c | 70 ++++++++++++++++++++++++++++++++++++---
>> drivers/cxl/cxl.h | 2 ++
>> include/linux/acpi.h | 19 +++++++++++
>> tools/testing/cxl/Kbuild | 1 +
>> 8 files changed, 147 insertions(+), 4 deletions(-)
>> create mode 100644 drivers/cxl/core/acpi.c
>>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>> index 39524f36be5b..92b818b72ecc 100644
>> --- a/drivers/acpi/numa/hmat.c
>> +++ b/drivers/acpi/numa/hmat.c
>> @@ -108,6 +108,50 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
>> return NULL;
>> }
>>
>> +/**
>> + * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
>> + * @backing_res: resource from the backing media
>> + * @nid: node id for the memory region
>> + * @cache_size: (Output) size of extended linear cache.
>> + *
>> + * Return: 0 on success. Errno on failure.
>> + *
>> + */
>> +int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
>> + resource_size_t *cache_size)
>> +{
>> + unsigned int pxm = node_to_pxm(nid);
>> + struct memory_target *target;
>> + struct target_cache *tcache;
>> + bool cache_found = false;
>> + struct resource *res;
>> +
>> + target = find_mem_target(pxm);
>> + if (!target)
>> + return -ENOENT;
>> +
>> + list_for_each_entry(tcache, &target->caches, node) {
>> + if (tcache->cache_attrs.mode == NODE_CACHE_MODE_EXTENDED_LINEAR) {
>
> I'd flip this for slightly better readability.
ok
> if (tcache->cache_attrs.mode != NODE_CACHE_MODE_EXTENDED_LINEAR)
> continue;
>
> res = ...
>
>
>> + res = &target->memregions;
>> + if (!resource_contains(res, backing_res))
>> + continue;
>> +
>> + cache_found = true;
>> + break;
>> + }
>> + }
>> +
>> + if (!cache_found) {
>> + *cache_size = 0;
>> + return 0;
>> + }
>> +
>> + *cache_size = tcache->cache_attrs.size;
>
> Why not set this and return in the loop?
> That way no need to have a local variable.
ok
>
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
>
>> diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
>> new file mode 100644
>> index 000000000000..f13b4dae6ac5
>> --- /dev/null
>> +++ b/drivers/cxl/core/acpi.c
>> @@ -0,0 +1,11 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
>> +#include <linux/acpi.h>
>> +#include "cxl.h"
>> +#include "core.h"
>
> Why do you need the cxl headers? Maybe a forwards def of
> struct resource, but I'm not seeing anything else being needed.
The prototype is declared in core.h, and it seems core.h needs cxl.h. I wonder if core.h should just include cxl.h.
>
>
>> +
>> +int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
>> + int nid, resource_size_t *size)
>> +{
>> + return hmat_get_extended_linear_cache_size(backing_res, nid, size);
>> +}
>
>
>> @@ -3215,6 +3229,42 @@ static int match_region_by_range(struct device *dev, void *data)
>> return rc;
>> }
>>
>> +static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
>> + struct resource *res)
>> +{
>> + struct cxl_region_params *p = &cxlr->params;
>> + int nid = phys_to_target_node(res->start);
>> + resource_size_t size, cache_size;
>> + int rc;
>> +
>> + size = resource_size(res);
>> + if (!size)
>> + return -EINVAL;
>> +
>> + rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
>> + if (rc)
>> + return rc;
>> +
>> + if (!cache_size)
>> + return 0;
>> +
>> + if (size != cache_size) {
>> + dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!");
>> + return -EOPNOTSUPP;
>> + }
>> +
>> + /*
>> + * Move the start of the range to where the cache range starts. The
>> + * implementation assumes that the cache range is in front of the
>> + * CXL range. This is not dictated by the HMAT spec but is how the
>> + * currently known implementation configured.
>
> is configured
will fix
>
>> + */
>> + res->start -= cache_size;
>> + p->cache_size = cache_size;
>> +
>> + return 0;
>> +}
>
>
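The resize step quoted above can be sketched in userspace under the same assumption the patch comment states: the DRAM cache range sits directly in front of the CXL range and has the same size. `struct res_sketch` is a hypothetical stand-in for `struct resource` (with an inclusive `end`, as in the kernel). This is an illustration, not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource; 'end' is inclusive. */
struct res_sketch {
	uint64_t start;
	uint64_t end;
};

/*
 * Same steps as cxl_extended_linear_cache_resize(): grow the CXL range
 * downward so that it also covers the equally sized DRAM cache in front.
 */
static int linear_cache_resize(struct res_sketch *res, uint64_t cache_size,
			       uint64_t *region_cache_size)
{
	uint64_t size = res->end - res->start + 1;

	if (!size)
		return -EINVAL;
	if (!cache_size)		/* no extended linear cache: nothing to do */
		return 0;
	if (size != cache_size)		/* only a 1:1 cache is handled */
		return -EOPNOTSUPP;

	assert(res->start >= cache_size);	/* cache must fit below the CXL range */
	res->start -= cache_size;
	*region_cache_size = cache_size;
	return 0;
}
```

With a 128G CXL range at [128G, 256G) and a 128G cache, the region becomes [0, 256G); a 64G cache in front of the same range is rejected with -EOPNOTSUPP.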
* [RFC PATCH v2 3/5] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-11-12 22:12 [RFC PATCH v2 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
2024-11-12 22:12 ` [RFC PATCH v2 1/5] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
2024-11-12 22:12 ` [RFC PATCH v2 2/5] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
@ 2024-11-12 22:12 ` Dave Jiang
2024-11-27 10:23 ` Jonathan Cameron
2024-11-12 22:12 ` [RFC PATCH v2 4/5] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
2024-11-12 22:12 ` [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
4 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2024-11-12 22:12 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Add helper functions to translate between an extended linear cache address
and its alias address. The translation functions attempt to detect an I/O
hole in the proximity domain and adjust the address if the hole affects the
aliasing. The range of the I/O hole is retrieved by walking through the
associated memory target resources.
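A minimal userspace sketch of the alias direction, assuming a plain sorted array of inclusive ranges (the hypothetical `struct mem_range`) in place of the target's `memregions` resource list. It follows the same steps as alias_address_find_iohole() and hmat_extended_linear_cache_alias_xlat() in this patch: find the resource that holds the naive alias, treat the gap before it as the I/O hole, and push the alias past the hole only when the cache address is below the hole and the alias is above it. As in the patch, a naive alias that lands inside the hole itself is rejected with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memory resource of an HMAT memory target. */
struct mem_range {
	uint64_t start;
	uint64_t end;		/* inclusive, like struct resource */
};

/* {0, UINT64_MAX} wraps to a length of 0, which means "no hole". */
static uint64_t mem_range_len(const struct mem_range *r)
{
	return r->end - r->start + 1;
}

/* Same steps as alias_address_find_iohole(); res[] is sorted by address. */
static int find_iohole(const struct mem_range *res, size_t nr,
		       uint64_t address, uint64_t alias, struct mem_range *hole)
{
	size_t i, prev = 0;

	assert(nr > 0);
	*hole = (struct mem_range){ .start = 0, .end = UINT64_MAX };

	for (i = 0; i < nr; i++) {
		if (alias >= res[i].start && alias <= res[i].end)
			break;
		prev = i;
	}
	if (i == nr)			/* alias is not backed by any resource */
		return -EINVAL;
	if (i == prev)			/* alias is in the first resource */
		return 0;
	if (address >= res[i].start)	/* address is within the alias's resource */
		return 0;

	hole->start = res[prev].end + 1;
	hole->end = res[i].start - 1;
	return 0;
}

/* Same checks as hmat_extended_linear_cache_alias_xlat(). */
static int alias_xlat(const struct mem_range *res, size_t nr,
		      uint64_t address, uint64_t *alias)
{
	struct mem_range hole;
	int rc = find_iohole(res, nr, address, *alias, &hole);

	if (rc)
		return rc;
	if (!mem_range_len(&hole))
		return 0;
	if (address >= hole.start)	/* cache address is above the hole */
		return 0;
	if (*alias <= hole.start)	/* alias is below the hole */
		return 0;

	*alias += mem_range_len(&hole);
	return 0;
}
```

For example, with memory at [0, 2G) and [4G, 8G), a cache address of 0x70000000 and a naive alias of 0x170000000, the 2G hole moves the alias to 0x1f0000000.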
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v2:
- Drop extra variable and use 'res' from the loop. (Jonathan)
- Break up multiple if statements into single blocks and add comments. (Jonathan)
---
drivers/acpi/numa/hmat.c | 148 +++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 14 ++++
2 files changed, 162 insertions(+)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 92b818b72ecc..6c686d3c7266 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -152,6 +152,154 @@ int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
}
EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
+static int alias_address_find_iohole(struct memory_target *target,
+ u64 address, u64 alias, struct range *hole)
+{
+ struct resource *res, *prev;
+
+ *hole = (struct range) {
+ .start = 0,
+ .end = -1,
+ };
+
+ /* First find the resource that the address is in */
+ prev = target->memregions.child;
+ for (res = target->memregions.child; res; res = res->sibling) {
+ if (alias >= res->start && alias <= res->end)
+ break;
+ prev = res;
+ }
+ if (!res)
+ return -EINVAL;
+
+ /* No memory hole */
+ if (res == prev)
+ return 0;
+
+ /* If address is within the current resource, no need to deal with memory hole */
+ if (address >= res->start)
+ return 0;
+
+ *hole = (struct range) {
+ .start = prev->end + 1,
+ .end = res->start - 1,
+ };
+
+ return 0;
+}
+
+int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
+{
+ unsigned int pxm = node_to_pxm(nid);
+ struct memory_target *target;
+ struct range iohole;
+ int rc;
+
+ target = find_mem_target(pxm);
+ if (!target)
+ return -EINVAL;
+
+ rc = alias_address_find_iohole(target, address, *alias, &iohole);
+ if (rc)
+ return rc;
+
+ if (!range_len(&iohole))
+ return 0;
+
+ /*
+ * If the cache start (address) is behind the MMIO I/O hole then there
+ * is no change to the passed in CXL address (alias).
+ */
+ if (address >= iohole.start)
+ return 0;
+
+ /*
+ * If the aliased CXL address is before the MMIO I/O hole start then
+ * CXL address (alias) is also not impacted.
+ */
+ if (*alias <= iohole.start)
+ return 0;
+
+ *alias += range_len(&iohole);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_alias_xlat, CXL);
+
+static int target_address_find_iohole(struct memory_target *target,
+ u64 address, u64 alias,
+ struct range *hole)
+{
+ struct resource *res, *next;
+
+ *hole = (struct range) {
+ .start = 0,
+ .end = -1,
+ };
+
+ /* First find the resource that the address is in */
+ for (res = target->memregions.child; res; res = res->sibling) {
+ if (address >= res->start && address <= res->end)
+ break;
+ }
+ if (!res)
+ return -EINVAL;
+
+ next = res->sibling;
+ /* No memory hole after the region */
+ if (!next)
+ return 0;
+
+ /* If alias is within the current resource, no need to deal with memory hole */
+ if (alias <= res->end)
+ return 0;
+
+ *hole = (struct range) {
+ .start = res->end + 1,
+ .end = next->start - 1,
+ };
+
+ return 0;
+}
+
+int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
+{
+ unsigned int pxm = node_to_pxm(nid);
+ struct memory_target *target;
+ struct range iohole;
+ int rc;
+
+ target = find_mem_target(pxm);
+ if (!target)
+ return -EINVAL;
+
+ rc = target_address_find_iohole(target, *address, alias, &iohole);
+ if (rc)
+ return rc;
+
+ if (!range_len(&iohole))
+ return 0;
+
+ /*
+ * If the CXL address is before the MMIO hole then there is no change
+ * to the passed in cache address.
+ */
+ if (alias <= iohole.end)
+ return 0;
+
+ /*
+ * If the calculated cache address is after the MMIO hole then there
+ * is no change to the passed in cache address.
+ */
+ if (*address >= iohole.end)
+ return 0;
+
+ *address -= range_len(&iohole);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_address_xlat, CXL);
+
static struct memory_target *acpi_find_genport_target(u32 uid)
{
struct memory_target *target;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 10ffba7cb9ad..18a94d382d40 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -438,6 +438,8 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp);
int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
resource_size_t *size);
+int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
+int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
#else
static inline int acpi_get_genport_coordinates(u32 uid,
struct access_coordinate *coord)
@@ -450,6 +452,18 @@ static inline int hmat_get_extended_linear_cache_size(struct resource *backing_r
{
return -EOPNOTSUPP;
}
+
+static inline int hmat_extended_linear_cache_alias_xlat(u64 address,
+ u64 *alias, int nid)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int hmat_extended_linear_cache_address_xlat(u64 *address,
+ u64 alias, int nid)
+{
+ return -EOPNOTSUPP;
+}
#endif
#ifdef CONFIG_ACPI_NUMA
--
2.47.0
* Re: [RFC PATCH v2 3/5] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-11-12 22:12 ` [RFC PATCH v2 3/5] acpi/hmat: Add helper functions to provide extended linear cache translation Dave Jiang
@ 2024-11-27 10:23 ` Jonathan Cameron
0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2024-11-27 10:23 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Tue, 12 Nov 2024 15:12:35 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Add helper functions to translate between an extended linear cache address
> and its alias address. The translation functions attempt to detect an I/O
> hole in the proximity domain and adjust the address if the hole affects the
> aliasing. The range of the I/O hole is retrieved by walking through the
> associated memory target resources.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
Trivial comment inline. I'm far from expert on requirements here but it
seems to match your description.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> v2:
> - Drop extra variable and use 'res' from the loop. (Jonathan)
> - Break up multiple if statements into single blocks and add comments. (Jonathan)
> ---
> drivers/acpi/numa/hmat.c | 148 +++++++++++++++++++++++++++++++++++++++
> include/linux/acpi.h | 14 ++++
> 2 files changed, 162 insertions(+)
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 92b818b72ecc..6c686d3c7266 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -152,6 +152,154 @@ int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> }
> EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
>
> +static int alias_address_find_iohole(struct memory_target *target,
> + u64 address, u64 alias, struct range *hole)
> +{
> + struct resource *res, *prev;
> +
> + *hole = (struct range) {
> + .start = 0,
> + .end = -1,
> + };
> +
> + /* First find the resource that the address is in */
> + prev = target->memregions.child;
> + for (res = target->memregions.child; res; res = res->sibling) {
> + if (alias >= res->start && alias <= res->end)
> + break;
> + prev = res;
> + }
> + if (!res)
> + return -EINVAL;
> +
> + /* No memory hole */
> + if (res == prev)
> + return 0;
> +
> + /* If address is within the current resource, no need to deal with memory hole */
Rather long line that could be easily broken.
> + if (address >= res->start)
> + return 0;
> +
> + *hole = (struct range) {
> + .start = prev->end + 1,
> + .end = res->start - 1,
> + };
> +
> + return 0;
> +}
* [RFC PATCH v2 4/5] cxl: Add extended linear cache address alias emission for cxl events
2024-11-12 22:12 [RFC PATCH v2 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
` (2 preceding siblings ...)
2024-11-12 22:12 ` [RFC PATCH v2 3/5] acpi/hmat: Add helper functions to provide extended linear cache translation Dave Jiang
@ 2024-11-12 22:12 ` Dave Jiang
2024-11-27 16:40 ` Jonathan Cameron
2024-11-12 22:12 ` [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
4 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2024-11-12 22:12 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Add the aliased address of the extended linear cache when emitting the
trace events for CXL DRAM and General Media events.
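The direction choice made by cxlr_hpa_cache_alias() in this patch can be sketched in userspace as follows. The sketch assumes no I/O hole (the patch additionally passes the result through the HMAT translation helpers) and uses a hypothetical `struct region_sketch` in place of `struct cxl_region_params`. An HPA in the CXL half maps down by the cache size; an HPA in the DRAM cache half maps up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel for "no alias", the same value as ~0ULL / ULLONG_MAX on Linux. */
#define NO_ALIAS UINT64_MAX

/* Hypothetical subset of struct cxl_region_params used by this sketch. */
struct region_sketch {
	uint64_t res_start;	/* region start, including the DRAM cache half */
	uint64_t cache_size;	/* 0 when there is no extended linear cache */
};

/*
 * Return the other member of the 1:1 DRAM/CXL pair for @hpa, ignoring
 * I/O holes. The NULL-region and NO_ALIAS-input checks are additions of
 * this sketch.
 */
static uint64_t hpa_cache_alias(const struct region_sketch *r, uint64_t hpa)
{
	if (!r || !r->cache_size || hpa == NO_ALIAS)
		return NO_ALIAS;

	assert(hpa >= r->res_start);
	if (hpa >= r->res_start + r->cache_size)	/* CXL half -> DRAM cache */
		return hpa - r->cache_size;

	return hpa + r->cache_size;			/* DRAM cache -> CXL half */
}
```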
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v2:
- Fix spelling errors. (Jonathan)
---
drivers/cxl/core/acpi.c | 10 ++++++++++
drivers/cxl/core/core.h | 7 +++++++
drivers/cxl/core/mbox.c | 42 ++++++++++++++++++++++++++++++++++++---
drivers/cxl/core/region.c | 12 +++++++++++
drivers/cxl/core/trace.h | 24 ++++++++++++++--------
include/linux/acpi.h | 36 +++++++++++++--------------------
6 files changed, 98 insertions(+), 33 deletions(-)
diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
index f13b4dae6ac5..f74136320fc3 100644
--- a/drivers/cxl/core/acpi.c
+++ b/drivers/cxl/core/acpi.c
@@ -9,3 +9,13 @@ int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
{
return hmat_get_extended_linear_cache_size(backing_res, nid, size);
}
+
+int cxl_acpi_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
+{
+ return hmat_extended_linear_cache_address_xlat(address, alias, nid);
+}
+
+int cxl_acpi_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
+{
+ return hmat_extended_linear_cache_alias_xlat(address, alias, nid);
+}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index c4dc9aefe25f..9b10039b0ca7 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -30,8 +30,13 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port);
struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
u64 dpa);
+int cxl_region_nid(struct cxl_region *cxlr);
#else
+static inline int cxl_region_nid(struct cxl_region *cxlr)
+{
+ return NUMA_NO_NODE;
+}
static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
const struct cxl_memdev *cxlmd, u64 dpa)
{
@@ -112,5 +117,7 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
int nid, resource_size_t *size);
+int cxl_acpi_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
+int cxl_acpi_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 5175138c4fb7..3fa9c658a253 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -856,6 +856,39 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
+static u64 cxlr_hpa_cache_alias(struct cxl_region *cxlr, u64 hpa)
+{
+ struct cxl_region_params *p = &cxlr->params;
+ u64 alias, address;
+ int nid, rc;
+
+ if (!cxlr || !p->cache_size)
+ return ~0ULL;
+
+ nid = cxl_region_nid(cxlr);
+ if (nid == NUMA_NO_NODE)
+ nid = 0;
+
+ if (hpa >= p->res->start + p->cache_size) {
+ address = hpa - p->cache_size;
+ alias = hpa;
+ rc = cxl_acpi_extended_linear_cache_address_xlat(&address,
+ alias, nid);
+ if (rc)
+ return ~0ULL;
+
+ return address;
+ }
+
+ address = hpa;
+ alias = hpa + p->cache_size;
+ rc = cxl_acpi_extended_linear_cache_alias_xlat(address, &alias, nid);
+ if (rc)
+ return ~0ULL;
+
+ return alias;
+}
+
void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
enum cxl_event_type event_type,
@@ -871,7 +904,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
}
if (trace_cxl_general_media_enabled() || trace_cxl_dram_enabled()) {
- u64 dpa, hpa = ULLONG_MAX;
+ u64 dpa, hpa = ULLONG_MAX, hpa_alias;
struct cxl_region *cxlr;
/*
@@ -887,11 +920,14 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
if (cxlr)
hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+ hpa_alias = cxlr_hpa_cache_alias(cxlr, hpa);
+
if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
- &evt->gen_media);
+ hpa_alias, &evt->gen_media);
else if (event_type == CXL_CPER_EVENT_DRAM)
- trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
+ trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
+ &evt->dram);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a37923c030a3..a7479b4aad8d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2405,6 +2405,18 @@ static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
return true;
}
+int cxl_region_nid(struct cxl_region *cxlr)
+{
+ struct cxl_region_params *p = &cxlr->params;
+ struct resource *res;
+
+ guard(rwsem_read)(&cxl_region_rwsem);
+ res = p->res;
+ if (!res)
+ return NUMA_NO_NODE;
+ return phys_to_target_node(res->start);
+}
+
static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
unsigned long action, void *arg)
{
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 8672b42ee4d1..a63183e23ac8 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -316,9 +316,10 @@ TRACE_EVENT(cxl_generic_event,
TRACE_EVENT(cxl_general_media,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_region *cxlr, u64 hpa, struct cxl_event_gen_media *rec),
+ struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
+ struct cxl_event_gen_media *rec),
- TP_ARGS(cxlmd, log, cxlr, hpa, rec),
+ TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -332,6 +333,7 @@ TRACE_EVENT(cxl_general_media,
__array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
/* Following are out of order to pack trace record */
__field(u64, hpa)
+ __field(u64, hpa_alias)
__field_struct(uuid_t, region_uuid)
__field(u16, validity_flags)
__field(u8, rank)
@@ -358,6 +360,7 @@ TRACE_EVENT(cxl_general_media,
CXL_EVENT_GEN_MED_COMP_ID_SIZE);
__entry->validity_flags = get_unaligned_le16(&rec->media_hdr.validity_flags);
__entry->hpa = hpa;
+ __entry->hpa_alias = hpa_alias;
if (cxlr) {
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
@@ -370,7 +373,7 @@ TRACE_EVENT(cxl_general_media,
CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
"descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
"device=%x comp_id=%s validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
+ "hpa=%llx hpa_alias=%llx region=%s region_uuid=%pUb",
__entry->dpa, show_dpa_flags(__entry->dpa_flags),
show_event_desc_flags(__entry->descriptor),
show_mem_event_type(__entry->type),
@@ -378,7 +381,8 @@ TRACE_EVENT(cxl_general_media,
__entry->channel, __entry->rank, __entry->device,
__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
show_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ __entry->hpa, __entry->hpa_alias, __get_str(region_name),
+ &__entry->region_uuid
)
);
@@ -413,9 +417,10 @@ TRACE_EVENT(cxl_general_media,
TRACE_EVENT(cxl_dram,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_region *cxlr, u64 hpa, struct cxl_event_dram *rec),
+ struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
+ struct cxl_event_dram *rec),
- TP_ARGS(cxlmd, log, cxlr, hpa, rec),
+ TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -431,6 +436,7 @@ TRACE_EVENT(cxl_dram,
__field(u32, row)
__array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
__field(u64, hpa)
+ __field(u64, hpa_alias)
__field_struct(uuid_t, region_uuid)
__field(u8, rank) /* Out of order to pack trace record */
__field(u8, bank_group) /* Out of order to pack trace record */
@@ -461,6 +467,7 @@ TRACE_EVENT(cxl_dram,
memcpy(__entry->cor_mask, &rec->correction_mask,
CXL_EVENT_DER_CORRECTION_MASK_SIZE);
__entry->hpa = hpa;
+ __entry->hpa_alias = hpa_alias;
if (cxlr) {
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
@@ -474,7 +481,7 @@ TRACE_EVENT(cxl_dram,
"transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
"bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
"validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
+ "hpa=%llx hpa_alias=%llx region=%s region_uuid=%pUb",
__entry->dpa, show_dpa_flags(__entry->dpa_flags),
show_event_desc_flags(__entry->descriptor),
show_mem_event_type(__entry->type),
@@ -484,7 +491,8 @@ TRACE_EVENT(cxl_dram,
__entry->row, __entry->column,
__print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
show_dram_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ __entry->hpa, __entry->hpa_alias, __get_str(region_name),
+ &__entry->region_uuid
)
);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 18a94d382d40..cdf6d42f5a94 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -436,34 +436,12 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp);
#ifdef CONFIG_ACPI_HMAT
int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
-int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
- resource_size_t *size);
-int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
-int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
#else
static inline int acpi_get_genport_coordinates(u32 uid,
struct access_coordinate *coord)
{
return -EOPNOTSUPP;
}
-
-static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
- int nid, resource_size_t *size)
-{
- return -EOPNOTSUPP;
-}
-
-static inline int hmat_extended_linear_cache_alias_xlat(u64 address,
- u64 *alias, int nid)
-{
- return -EOPNOTSUPP;
-}
-
-static inline int hmat_extended_linear_cache_address_xlat(u64 *address,
- u64 alias, int nid)
-{
- return -EOPNOTSUPP;
-}
#endif
#ifdef CONFIG_ACPI_NUMA
@@ -1115,12 +1093,26 @@ static inline acpi_handle acpi_get_processor_handle(int cpu)
#ifdef CONFIG_ACPI_HMAT
int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
resource_size_t *size);
+int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
+int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
#else
static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
int nid, resource_size_t *size)
{
return -EOPNOTSUPP;
}
+
+static inline int hmat_extended_linear_cache_alias_xlat(u64 address,
+ u64 *alias, int nid)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int hmat_extended_linear_cache_address_xlat(u64 *address,
+ u64 alias, int nid)
+{
+ return -EOPNOTSUPP;
+}
#endif
extern void arch_post_acpi_subsys_init(void);
--
2.47.0
* Re: [RFC PATCH v2 4/5] cxl: Add extended linear cache address alias emission for cxl events
2024-11-12 22:12 ` [RFC PATCH v2 4/5] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
@ 2024-11-27 16:40 ` Jonathan Cameron
0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2024-11-27 16:40 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Tue, 12 Nov 2024 15:12:36 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Add the aliased address of extended linear cache when emitting event
> trace for DRAM and general media of CXL events.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
There is some code movement in here I wasn't expecting to see.
Otherwise looks fine to me.
Jonathan
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 18a94d382d40..cdf6d42f5a94 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -436,34 +436,12 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp);
>
> #ifdef CONFIG_ACPI_HMAT
> int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
> -int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> - resource_size_t *size);
> -int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
> -int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
> #else
> static inline int acpi_get_genport_coordinates(u32 uid,
> struct access_coordinate *coord)
> {
> return -EOPNOTSUPP;
> }
> -
> -static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
> - int nid, resource_size_t *size)
> -{
> - return -EOPNOTSUPP;
> -}
> -
> -static inline int hmat_extended_linear_cache_alias_xlat(u64 address,
> - u64 *alias, int nid)
> -{
> - return -EOPNOTSUPP;
> -}
> -
> -static inline int hmat_extended_linear_cache_address_xlat(u64 *address,
> - u64 alias, int nid)
> -{
> - return -EOPNOTSUPP;
> -}
> #endif
>
> #ifdef CONFIG_ACPI_NUMA
> @@ -1115,12 +1093,26 @@ static inline acpi_handle acpi_get_processor_handle(int cpu)
> #ifdef CONFIG_ACPI_HMAT
> int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> resource_size_t *size);
> +int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
> +int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
If this makes sense, can we put them here in the first place?
> #else
> static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
> int nid, resource_size_t *size)
> {
> return -EOPNOTSUPP;
> }
> +
> +static inline int hmat_extended_linear_cache_alias_xlat(u64 address,
> + u64 *alias, int nid)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline int hmat_extended_linear_cache_address_xlat(u64 *address,
> + u64 alias, int nid)
> +{
> + return -EOPNOTSUPP;
> +}
> #endif
>
> extern void arch_post_acpi_subsys_init(void);
* [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-11-12 22:12 [RFC PATCH v2 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
` (3 preceding siblings ...)
2024-11-12 22:12 ` [RFC PATCH v2 4/5] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
@ 2024-11-12 22:12 ` Dave Jiang
2024-11-13 8:11 ` Borislav Petkov
4 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2024-11-12 22:12 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Below is an example extended linear cache configuration. The memory is
presented as a single 256G memory region consisting of 128G of DRAM and
128G of CXL memory, so the kernel sees a total of 256G of system memory.
128G DRAM 128G CXL memory
|-----------------------------------|-------------------------------------|
Data resides in either DRAM or far memory (FM) with no replication. Hot
data is swapped into DRAM by the hardware behind the scenes. When an error
is detected in one location, the error may also reside in the aliased
location. Therefore, when a memory location flagged by an MCE is part of
the special region, the aliased memory location needs to be offlined as
well.
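As a worked example for the 128G/128G layout above, assuming a hypothetical region base $B$, no I/O hole inside the region, and 4 KiB pages, the aliased address $a$ of a flagged system physical address $s$ is:

$$
a(s) =
\begin{cases}
s + C, & B \le s < B + C \\
s - C, & B + C \le s < B + 2C
\end{cases}
\qquad C = 128\,\mathrm{GiB} = 2^{37}\ \text{bytes}
$$

Because $C$ is a multiple of the page size, the aliased page frame is always $C / 2^{12} = 2^{25} = 33{,}554{,}432$ frames away from the flagged one, with the same offset within the page.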
Add an MCE notifier callback to identify whether the MCE address is part
of an extended linear cache region and handle it accordingly.
Export set_mce_nospec() from the x86 code so that it can be called from
the CXL MCE notifier callback.
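Linux notifier chains invoke callbacks in descending priority order, and a callback registered with the same priority as an existing one is placed after it. The userspace sketch below uses hypothetical types; only the relative order of the enum values is taken from the asm/mce.h hunk of this patch. It shows where a callback registered at MCE_PRIO_CXL runs relative to its neighbors: after the CEC and EARLY callbacks and before the UC callback.

```c
#include <assert.h>
#include <stddef.h>

/* Only the relative order matches the patch's enum; the values are hypothetical. */
enum prio_sketch {
	PRIO_NFIT = 1,
	PRIO_EXTLOG,
	PRIO_UC,
	PRIO_CXL,
	PRIO_EARLY,
	PRIO_CEC,
};

/* Hypothetical, reduced stand-in for struct notifier_block. */
struct nb_sketch {
	const char *name;
	int priority;
	struct nb_sketch *next;
};

/*
 * Keep the chain sorted by descending priority; an entry with the same
 * priority as an existing one goes after it, as in the kernel's chains.
 */
static void chain_register(struct nb_sketch **head, struct nb_sketch *n)
{
	assert(n != NULL);
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}
```

Registering the UC, CXL, CEC and EARLY callbacks in any order yields the call order CEC, EARLY, CXL, UC.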
Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v2:
- Move mce code to core/mce.c and add arch wrappers for CONFIG_X86_64. (Jonathan)
---
arch/x86/include/asm/mce.h | 1 +
arch/x86/mm/pat/set_memory.c | 1 +
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/mbox.c | 3 +++
drivers/cxl/core/mce.c | 52 ++++++++++++++++++++++++++++++++++++
drivers/cxl/core/mce.h | 14 ++++++++++
drivers/cxl/core/region.c | 25 +++++++++++++++++
drivers/cxl/cxl.h | 6 +++++
drivers/cxl/cxlmem.h | 2 ++
9 files changed, 105 insertions(+)
create mode 100644 drivers/cxl/core/mce.c
create mode 100644 drivers/cxl/core/mce.h
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3b9970117a0f..a8ad140d5692 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -182,6 +182,7 @@ enum mce_notifier_prios {
MCE_PRIO_NFIT,
MCE_PRIO_EXTLOG,
MCE_PRIO_UC,
+ MCE_PRIO_CXL,
MCE_PRIO_EARLY,
MCE_PRIO_CEC,
MCE_PRIO_HIGHEST = MCE_PRIO_CEC
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 44f7b2ea6a07..1f85c29e118e 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2083,6 +2083,7 @@ int set_mce_nospec(unsigned long pfn)
pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
return rc;
}
+EXPORT_SYMBOL_GPL(set_mce_nospec);
/* Restore full speculative operation to the pfn. */
int clear_mce_nospec(unsigned long pfn)
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 1a0c9c6ca818..d619fb11febd 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -17,3 +17,4 @@ cxl_core-y += cdat.o
cxl_core-y += acpi.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
+cxl_core-$(CONFIG_X86_MCE) += mce.o
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 3fa9c658a253..2304786a1333 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -11,6 +11,7 @@
#include "core.h"
#include "trace.h"
+#include "mce.h"
static bool cxl_raw_allow_all;
@@ -1489,6 +1490,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+ cxl_register_mce_notifier(&mds->mce_notifier);
+
return mds;
}
EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, CXL);
diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
new file mode 100644
index 000000000000..801e4e4ef91a
--- /dev/null
+++ b/drivers/cxl/core/mce.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
+#include <linux/notifier.h>
+#include <linux/set_memory.h>
+#include <asm/mce.h>
+#include <cxlmem.h>
+#include "mce.h"
+
+static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
+ void *data)
+{
+ struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
+ mce_notifier);
+ struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct mce *mce = (struct mce *)data;
+ u64 spa, spa_alias;
+ unsigned long pfn;
+
+ if (!mce || !mce_usable_address(mce))
+ return NOTIFY_DONE;
+
+ spa = mce->addr & MCI_ADDR_PHYSADDR;
+
+ pfn = spa >> PAGE_SHIFT;
+ if (!pfn_valid(pfn))
+ return NOTIFY_DONE;
+
+ spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa);
+ if (!spa_alias)
+ return NOTIFY_DONE;
+
+ pfn = spa_alias >> PAGE_SHIFT;
+
+ /*
+ * Take down the aliased memory page. The original memory page flagged
+ * by the MCE will be taken care of by the standard MCE handler.
+ */
+ dev_emerg(mds->cxlds.dev, "Offlining aliased SPA address: %#llx\n",
+ spa_alias);
+ if (!memory_failure(pfn, 0))
+ set_mce_nospec(pfn);
+
+ return NOTIFY_OK;
+}
+
+void cxl_register_mce_notifier(struct notifier_block *mce_notifier)
+{
+ mce_notifier->notifier_call = cxl_handle_mce;
+ mce_notifier->priority = MCE_PRIO_CXL;
+ mce_register_decode_chain(mce_notifier);
+}
diff --git a/drivers/cxl/core/mce.h b/drivers/cxl/core/mce.h
new file mode 100644
index 000000000000..1d747618bbc0
--- /dev/null
+++ b/drivers/cxl/core/mce.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
+#ifndef _CXL_CORE_MCE_H_
+#define _CXL_CORE_MCE_H_
+
+#include <linux/notifier.h>
+
+#ifdef CONFIG_X86_MCE
+void cxl_register_mce_notifier(struct notifier_block *mce_notifier);
+#else
+static inline void cxl_register_mce_notifier(struct notifier_block *mce_notifier) {}
+#endif
+
+#endif
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a7479b4aad8d..d141ab11f784 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3440,6 +3440,31 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
}
EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL);
+u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa)
+{
+ struct cxl_region_ref *iter;
+ unsigned long index;
+
+ guard(rwsem_write)(&cxl_region_rwsem);
+
+ xa_for_each(&endpoint->regions, index, iter) {
+ struct cxl_region_params *p = &iter->region->params;
+
+ if (p->res->start <= spa && spa <= p->res->end) {
+ if (!p->cache_size)
+ return 0;
+
+ if (spa >= p->res->start + p->cache_size)
+ return spa - p->cache_size;
+
+ return spa + p->cache_size;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_get_spa_cache_alias, CXL);
+
static int is_system_ram(struct resource *res, void *arg)
{
struct cxl_region *cxlr = arg;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 26466807fa7a..6b612a87469d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -866,6 +866,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled);
struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
+u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
#else
static inline bool is_cxl_pmem_region(struct device *dev)
{
@@ -884,6 +885,11 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
{
return NULL;
}
+static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
+ u64 spa)
+{
+ return 0;
+}
#endif
void cxl_endpoint_parse_cdat(struct cxl_port *port);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2a25d1957ddb..55752cbf408c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -477,6 +477,7 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
* @poison: poison driver state info
* @security: security driver state info
* @fw: firmware upload / activation state
+ * @mce_notifier: MCE notifier for handling the extended linear cache alias of a reported error address
*
* See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
* details on capacity parameters.
@@ -503,6 +504,7 @@ struct cxl_memdev_state {
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
+ struct notifier_block mce_notifier;
};
static inline struct cxl_memdev_state *
--
2.47.0
* Re: [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-11-12 22:12 ` [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
@ 2024-11-13 8:11 ` Borislav Petkov
2024-11-13 15:27 ` Dave Jiang
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2024-11-13 8:11 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, dan.j.williams, tony.luck, dave,
jonathan.cameron, alison.schofield, ira.weiny
On Tue, Nov 12, 2024 at 03:12:37PM -0700, Dave Jiang wrote:
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 3b9970117a0f..a8ad140d5692 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -182,6 +182,7 @@ enum mce_notifier_prios {
> MCE_PRIO_NFIT,
> MCE_PRIO_EXTLOG,
> MCE_PRIO_UC,
> + MCE_PRIO_CXL,
> MCE_PRIO_EARLY,
> MCE_PRIO_CEC,
> MCE_PRIO_HIGHEST = MCE_PRIO_CEC
Why this priority exactly?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-11-13 8:11 ` Borislav Petkov
@ 2024-11-13 15:27 ` Dave Jiang
2024-11-14 9:32 ` Borislav Petkov
0 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2024-11-13 15:27 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-cxl, linux-acpi, rafael, dan.j.williams, tony.luck, dave,
jonathan.cameron, alison.schofield, ira.weiny
On 11/13/24 1:11 AM, Borislav Petkov wrote:
> On Tue, Nov 12, 2024 at 03:12:37PM -0700, Dave Jiang wrote:
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index 3b9970117a0f..a8ad140d5692 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -182,6 +182,7 @@ enum mce_notifier_prios {
>> MCE_PRIO_NFIT,
>> MCE_PRIO_EXTLOG,
>> MCE_PRIO_UC,
>> + MCE_PRIO_CXL,
>> MCE_PRIO_EARLY,
>> MCE_PRIO_CEC,
>> MCE_PRIO_HIGHEST = MCE_PRIO_CEC
>
> Why this priority exactly?
Hi Boris,
I'm actually looking for recommendation on what the proper one is. The handler is expected to offline the aliased address of the reported MCE if there is one.
>
* Re: [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-11-13 15:27 ` Dave Jiang
@ 2024-11-14 9:32 ` Borislav Petkov
2024-11-14 15:52 ` Dave Jiang
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2024-11-14 9:32 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, dan.j.williams, tony.luck, dave,
jonathan.cameron, alison.schofield, ira.weiny
On Wed, Nov 13, 2024 at 08:27:35AM -0700, Dave Jiang wrote:
> I'm actually looking for recommendation on what the proper one is. The
> handler is expected to offline the aliased address of the reported MCE if
> there is one.
Well, MCE_PRIO_EARLY will emit a trace record so that if you have error events
consumers like rasdaemon, it'll get that error record for reporting etc.
MCE_PRIO_UC calls memory_failure() on the error and thus offlines the page.
Functionality which you're partly replicating in your notifier.
And since you wanna do the same thing, why are you even adding a new priority
instead of using MCE_PRIO_UC? amdgpu_bad_page_notifier() uses that same prio
because it does a similar thing.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [RFC PATCH v2 5/5] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-11-14 9:32 ` Borislav Petkov
@ 2024-11-14 15:52 ` Dave Jiang
0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2024-11-14 15:52 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-cxl, linux-acpi, rafael, dan.j.williams, tony.luck, dave,
jonathan.cameron, alison.schofield, ira.weiny
On 11/14/24 2:32 AM, Borislav Petkov wrote:
> On Wed, Nov 13, 2024 at 08:27:35AM -0700, Dave Jiang wrote:
>> I'm actually looking for recommendation on what the proper one is. The
>> handler is expected to offline the aliased address of the reported MCE if
>> there is one.
>
> Well, MCE_PRIO_EARLY will emit a trace record so that if you have error events
> consumers like rasdaemon, it'll get that error record for reporting etc.
>
> MCE_PRIO_UC calls memory_failure() on the error and thus offlines the page.
> Functionality which you're partly replicating in your notifier.
>
> And since you wanna do the same thing, why are you even adding a new priority
> instead of using MCE_PRIO_UC? amdgpu_bad_page_notifier() uses that same prio
> because it does a similar thing.
Ok thanks for the explanation. I will use MCE_PRIO_UC.
>