* [RFC PATCH 1/6] ACPICA: actbl1.h: Add extended linear address mode to MSCIS
2024-09-27 14:16 [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
@ 2024-09-27 14:16 ` Dave Jiang
2024-10-02 17:57 ` Rafael J. Wysocki
2024-09-27 14:16 ` [RFC PATCH 2/6] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
` (5 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-09-27 14:16 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
ECN for "Extended-linear" addressing for direct-mapped memory-side caches
adds a field in the SRAT Memory Side Cache Information Structure to
indicate the address mode at the previously reserved bytes at offset 28.
The field is described as:
When Address Mode is 1 'Extended-Linear' it indicates that the
associated address range (SRAT.MemoryAffinityStructure.Length) is
comprised of the backing store capacity extended by the cache
capacity. It is arranged such that there are N directly addressable
aliases of a given cacheline where N is an integer ratio of target memory
proximity domain size and the memory side cache size. Where the N
aliased addresses for a given cacheline all share the same result
for the operation 'address modulo cache size'. This setting is only
allowed when 'Cache Associativity' is 'Direct Map'."
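The aliasing rule in the quoted description can be sketched in standalone C (illustrative helpers only, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration of the 'Extended-Linear' layout described above: for a
 * direct-mapped memory-side cache, every cacheline in the combined
 * (backing store + cache) range has N directly addressable aliases,
 * where N is the ratio of proximity domain size to cache size, and all
 * N aliases agree on 'address modulo cache size'.
 */

/* Number of directly addressable aliases of a given cacheline. */
static uint64_t num_aliases(uint64_t domain_size, uint64_t cache_size)
{
	return domain_size / cache_size;
}

/* Two addresses alias in the cache iff they agree modulo cache size. */
static int addrs_alias(uint64_t a, uint64_t b, uint64_t cache_size)
{
	return (a % cache_size) == (b % cache_size);
}
```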
Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Link: https://github.com/acpica/acpica/pull/961
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
include/acpi/actbl1.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 841ef9f22795..95ddc858a0c3 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1791,7 +1791,7 @@ struct acpi_hmat_cache {
u32 reserved1;
u64 cache_size;
u32 cache_attributes;
- u16 reserved2;
+ u16 address_mode;
u16 number_of_SMBIOShandles;
};
@@ -1803,6 +1803,9 @@ struct acpi_hmat_cache {
#define ACPI_HMAT_WRITE_POLICY (0x0000F000)
#define ACPI_HMAT_CACHE_LINE_SIZE (0xFFFF0000)
+#define ACPI_HMAT_CACHE_MODE_UNKNOWN (0)
+#define ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR (1)
+
/* Values for cache associativity flag */
#define ACPI_HMAT_CA_NONE (0)
--
2.46.1
* Re: [RFC PATCH 1/6] ACPICA: actbl1.h: Add extended linear address mode to MSCIS
2024-09-27 14:16 ` [RFC PATCH 1/6] ACPICA: actbl1.h: Add extended linear address mode to MSCIS Dave Jiang
@ 2024-10-02 17:57 ` Rafael J. Wysocki
0 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2024-10-02 17:57 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, jonathan.cameron, alison.schofield, ira.weiny
On Fri, Sep 27, 2024 at 4:21 PM Dave Jiang <dave.jiang@intel.com> wrote:
>
> ECN for "Extended-linear" addressing for direct-mapped memory-side caches
> adds a field in the SRAT Memory Side Cache Information Structure to
> indicate the address mode at the previously reserved bytes at offset 28.
>
> The field is described as:
> When Address Mode is 1 'Extended-Linear' it indicates that the
> associated address range (SRAT.MemoryAffinityStructure.Length) is
> comprised of the backing store capacity extended by the cache
> capacity. It is arranged such that there are N directly addressable
> aliases of a given cacheline where N is an integer ratio of target memory
> proximity domain size and the memory side cache size. Where the N
> aliased addresses for a given cacheline all share the same result
> for the operation 'address modulo cache size'. This setting is only
> allowed when 'Cache Associativity' is 'Direct Map'."
>
> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> Link: https://github.com/acpica/acpica/pull/961
This pull request has been merged into upstream ACPICA, so
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
and I'm assuming that it will be routed through the CXL tree along
with the rest of the patch series.
Thanks!
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> include/acpi/actbl1.h | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index 841ef9f22795..95ddc858a0c3 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -1791,7 +1791,7 @@ struct acpi_hmat_cache {
> u32 reserved1;
> u64 cache_size;
> u32 cache_attributes;
> - u16 reserved2;
> + u16 address_mode;
> u16 number_of_SMBIOShandles;
> };
>
> @@ -1803,6 +1803,9 @@ struct acpi_hmat_cache {
> #define ACPI_HMAT_WRITE_POLICY (0x0000F000)
> #define ACPI_HMAT_CACHE_LINE_SIZE (0xFFFF0000)
>
> +#define ACPI_HMAT_CACHE_MODE_UNKNOWN (0)
> +#define ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR (1)
> +
> /* Values for cache associativity flag */
>
> #define ACPI_HMAT_CA_NONE (0)
> --
> 2.46.1
>
* [RFC PATCH 2/6] acpi: numa: Add support to enumerate and store extended linear address mode
2024-09-27 14:16 [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
2024-09-27 14:16 ` [RFC PATCH 1/6] ACPICA: actbl1.h: Add extended linear address mode to MSCIS Dave Jiang
@ 2024-09-27 14:16 ` Dave Jiang
2024-10-17 16:00 ` Jonathan Cameron
2024-09-27 14:16 ` [RFC PATCH 3/6] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
` (4 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-09-27 14:16 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Store the address mode as part of the cache attributes. Export the mode
attribute to sysfs like all other cache attributes.
Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
Documentation/ABI/stable/sysfs-devices-node | 7 +++++++
drivers/acpi/numa/hmat.c | 3 +++
drivers/base/node.c | 2 ++
include/linux/node.h | 7 +++++++
4 files changed, 19 insertions(+)
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 402af4b2b905..9016cc4f027c 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -177,6 +177,13 @@ Description:
The cache write policy: 0 for write-back, 1 for write-through,
other or unknown.
+What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/mode
+Date: September 2024
+Contact: Dave Jiang <dave.jiang@intel.com>
+Description:
+ The address mode: 0 for reserved, 1 for extended-lniear,
+ other unknown.
+
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 1a902a02390f..39524f36be5b 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -506,6 +506,9 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
case ACPI_HMAT_CA_DIRECT_MAPPED:
tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
+ /* Extended Linear mode is only valid if cache is direct mapped */
+ if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR)
+ tcache->cache_attrs.mode = NODE_CACHE_MODE_EXTENDED_LINEAR;
break;
case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
diff --git a/drivers/base/node.c b/drivers/base/node.c
index eb72580288e6..744be5470728 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -244,12 +244,14 @@ CACHE_ATTR(size, "%llu")
CACHE_ATTR(line_size, "%u")
CACHE_ATTR(indexing, "%u")
CACHE_ATTR(write_policy, "%u")
+CACHE_ATTR(mode, "%u")
static struct attribute *cache_attrs[] = {
&dev_attr_indexing.attr,
&dev_attr_size.attr,
&dev_attr_line_size.attr,
&dev_attr_write_policy.attr,
+ &dev_attr_mode.attr,
NULL,
};
ATTRIBUTE_GROUPS(cache);
diff --git a/include/linux/node.h b/include/linux/node.h
index 9a881c2208b3..589951c5e36f 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -57,6 +57,11 @@ enum cache_write_policy {
NODE_CACHE_WRITE_OTHER,
};
+enum cache_mode {
+ NODE_CACHE_MODE_UNKOWN,
+ NODE_CACHE_MODE_EXTENDED_LINEAR,
+};
+
/**
* struct node_cache_attrs - system memory caching attributes
*
@@ -65,6 +70,7 @@ enum cache_write_policy {
* @size: Total size of cache in bytes
* @line_size: Number of bytes fetched on a cache miss
* @level: The cache hierarchy level
+ * @mode: The address mode
*/
struct node_cache_attrs {
enum cache_indexing indexing;
@@ -72,6 +78,7 @@ struct node_cache_attrs {
u64 size;
u16 line_size;
u8 level;
+ u16 mode;
};
#ifdef CONFIG_HMEM_REPORTING
--
2.46.1
* Re: [RFC PATCH 2/6] acpi: numa: Add support to enumerate and store extended linear address mode
2024-09-27 14:16 ` [RFC PATCH 2/6] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
@ 2024-10-17 16:00 ` Jonathan Cameron
2024-10-29 21:01 ` Dave Jiang
0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-10-17 16:00 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Fri, 27 Sep 2024 07:16:54 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Store the address mode as part of the cache attributes. Export the mode
> attribute to sysfs like all other cache attributes.
>
> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Minor things inline. Basically looks fine.
Jonathan
> ---
> Documentation/ABI/stable/sysfs-devices-node | 7 +++++++
> drivers/acpi/numa/hmat.c | 3 +++
> drivers/base/node.c | 2 ++
> include/linux/node.h | 7 +++++++
> 4 files changed, 19 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 402af4b2b905..9016cc4f027c 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -177,6 +177,13 @@ Description:
> The cache write policy: 0 for write-back, 1 for write-through,
> other or unknown.
>
> +What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/mode
> +Date: September 2024
> +Contact: Dave Jiang <dave.jiang@intel.com>
> +Description:
> + The address mode: 0 for reserved, 1 for extended-lniear,
linear
also, is 0 reserved or unknown? I'm confused.
> + other unknown.
> +
> What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> Date: November 2021
> Contact: Jarkko Sakkinen <jarkko@kernel.org>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 1a902a02390f..39524f36be5b 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -506,6 +506,9 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
> switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
> case ACPI_HMAT_CA_DIRECT_MAPPED:
> tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
> + /* Extended Linear mode is only valid if cache is direct mapped */
> + if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR)
> + tcache->cache_attrs.mode = NODE_CACHE_MODE_EXTENDED_LINEAR;
> break;
> case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
> tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index eb72580288e6..744be5470728 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -244,12 +244,14 @@ CACHE_ATTR(size, "%llu")
> CACHE_ATTR(line_size, "%u")
> CACHE_ATTR(indexing, "%u")
> CACHE_ATTR(write_policy, "%u")
> +CACHE_ATTR(mode, "%u")
>
> static struct attribute *cache_attrs[] = {
> &dev_attr_indexing.attr,
> &dev_attr_size.attr,
> &dev_attr_line_size.attr,
> &dev_attr_write_policy.attr,
> + &dev_attr_mode.attr,
> NULL,
> };
> ATTRIBUTE_GROUPS(cache);
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 9a881c2208b3..589951c5e36f 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -57,6 +57,11 @@ enum cache_write_policy {
> NODE_CACHE_WRITE_OTHER,
> };
>
> +enum cache_mode {
> + NODE_CACHE_MODE_UNKOWN,
UNKNOWN
> + NODE_CACHE_MODE_EXTENDED_LINEAR,
> +};
> +
> /**
> * struct node_cache_attrs - system memory caching attributes
> *
> @@ -65,6 +70,7 @@ enum cache_write_policy {
> * @size: Total size of cache in bytes
> * @line_size: Number of bytes fetched on a cache miss
> * @level: The cache hierarchy level
> + * @mode: The address mode
> */
> struct node_cache_attrs {
> enum cache_indexing indexing;
> @@ -72,6 +78,7 @@ struct node_cache_attrs {
> u64 size;
> u16 line_size;
> u8 level;
> + u16 mode;
> };
>
> #ifdef CONFIG_HMEM_REPORTING
* Re: [RFC PATCH 2/6] acpi: numa: Add support to enumerate and store extended linear address mode
2024-10-17 16:00 ` Jonathan Cameron
@ 2024-10-29 21:01 ` Dave Jiang
0 siblings, 0 replies; 25+ messages in thread
From: Dave Jiang @ 2024-10-29 21:01 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 10/17/24 9:00 AM, Jonathan Cameron wrote:
> On Fri, 27 Sep 2024 07:16:54 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> Store the address mode as part of the cache attributes. Export the mode
>> attribute to sysfs like all other cache attributes.
>>
>> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Minor things inline. Basically looks fine.
>
> Jonathan
>
>> ---
>> Documentation/ABI/stable/sysfs-devices-node | 7 +++++++
>> drivers/acpi/numa/hmat.c | 3 +++
>> drivers/base/node.c | 2 ++
>> include/linux/node.h | 7 +++++++
>> 4 files changed, 19 insertions(+)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
>> index 402af4b2b905..9016cc4f027c 100644
>> --- a/Documentation/ABI/stable/sysfs-devices-node
>> +++ b/Documentation/ABI/stable/sysfs-devices-node
>> @@ -177,6 +177,13 @@ Description:
>> The cache write policy: 0 for write-back, 1 for write-through,
>> other or unknown.
>>
>> +What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/mode
>> +Date: September 2024
>> +Contact: Dave Jiang <dave.jiang@intel.com>
>> +Description:
>> + The address mode: 0 for reserved, 1 for extended-lniear,
>
> linear
>
> also, is 0 reserved or unknown? I'm confused.
It's labeled Reserved and indicates unknown in the document:
0 - Reserved (Unknown Address Mode)
I'll just remove the "others unknown" line, have 0 as reserved, and rename the define to RESERVED below as well.
DJ
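A hypothetical sketch of that rename (the actual next revision may differ):

```c
/* Hypothetical next-revision enum, per the plan above: 0 becomes RESERVED. */
enum cache_mode {
	NODE_CACHE_MODE_RESERVED,		/* 0: reserved (unknown address mode) */
	NODE_CACHE_MODE_EXTENDED_LINEAR,	/* 1: extended-linear */
};
```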
>
>
>> + other unknown.
>> +
>> What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
>> Date: November 2021
>> Contact: Jarkko Sakkinen <jarkko@kernel.org>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>> index 1a902a02390f..39524f36be5b 100644
>> --- a/drivers/acpi/numa/hmat.c
>> +++ b/drivers/acpi/numa/hmat.c
>> @@ -506,6 +506,9 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
>> switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
>> case ACPI_HMAT_CA_DIRECT_MAPPED:
>> tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
>> + /* Extended Linear mode is only valid if cache is direct mapped */
>> + if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR)
>> + tcache->cache_attrs.mode = NODE_CACHE_MODE_EXTENDED_LINEAR;
>> break;
>> case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
>> tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index eb72580288e6..744be5470728 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -244,12 +244,14 @@ CACHE_ATTR(size, "%llu")
>> CACHE_ATTR(line_size, "%u")
>> CACHE_ATTR(indexing, "%u")
>> CACHE_ATTR(write_policy, "%u")
>> +CACHE_ATTR(mode, "%u")
>>
>> static struct attribute *cache_attrs[] = {
>> &dev_attr_indexing.attr,
>> &dev_attr_size.attr,
>> &dev_attr_line_size.attr,
>> &dev_attr_write_policy.attr,
>> + &dev_attr_mode.attr,
>> NULL,
>> };
>> ATTRIBUTE_GROUPS(cache);
>> diff --git a/include/linux/node.h b/include/linux/node.h
>> index 9a881c2208b3..589951c5e36f 100644
>> --- a/include/linux/node.h
>> +++ b/include/linux/node.h
>> @@ -57,6 +57,11 @@ enum cache_write_policy {
>> NODE_CACHE_WRITE_OTHER,
>> };
>>
>> +enum cache_mode {
>> + NODE_CACHE_MODE_UNKOWN,
> UNKNOWN
>
>> + NODE_CACHE_MODE_EXTENDED_LINEAR,
>> +};
>> +
>> /**
>> * struct node_cache_attrs - system memory caching attributes
>> *
>> @@ -65,6 +70,7 @@ enum cache_write_policy {
>> * @size: Total size of cache in bytes
>> * @line_size: Number of bytes fetched on a cache miss
>> * @level: The cache hierarchy level
>> + * @mode: The address mode
>> */
>> struct node_cache_attrs {
>> enum cache_indexing indexing;
>> @@ -72,6 +78,7 @@ struct node_cache_attrs {
>> u64 size;
>> u16 line_size;
>> u8 level;
>> + u16 mode;
>> };
>>
>> #ifdef CONFIG_HMEM_REPORTING
>
* [RFC PATCH 3/6] acpi/hmat / cxl: Add extended linear cache support for CXL
2024-09-27 14:16 [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
2024-09-27 14:16 ` [RFC PATCH 1/6] ACPICA: actbl1.h: Add extended linear address mode to MSCIS Dave Jiang
2024-09-27 14:16 ` [RFC PATCH 2/6] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
@ 2024-09-27 14:16 ` Dave Jiang
2024-10-17 16:20 ` Jonathan Cameron
2024-09-27 14:16 ` [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation Dave Jiang
` (3 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-09-27 14:16 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
The current cxl region size only indicates the size of the CXL memory region
without accounting for the extended linear cache size. Retrieve the cache
size from HMAT and append that to the cxl region size for the cxl region
range that matches the SRAT range that has extended linear cache enabled.
The SRAT defines the whole memory range that inclues the extended linear
cache and the CXL memory region. The new HMAT update to the Memory Side
Cache Information Structure defines the size of the extended linear cache
and matches the SRAT Memory Affinity Structure by the memory proximity
domain. Add a helper to match the cxl range to the SRAT memory range in order
to retrieve the cache size.
There are several places that check the cxl region range against the
decoder range. Use the new helper to check between the two ranges while
accounting for the new cache size.
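The size adjustment described above can be sketched standalone (the struct and helper names here are illustrative, not the kernel's actual structures):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone sketch of the region resize this patch performs.  The SRAT
 * range covers cache + CXL capacity, so the region resource is grown
 * downward by the cache size reported by HMAT, and the decoder capacity
 * check must account for the extra cache capacity.
 */
struct region_sketch {
	uint64_t start;      /* HPA start of the region resource */
	uint64_t size;       /* total region size */
	uint64_t cache_size; /* extended linear cache capacity */
};

/* Grow the region to also cover the cache portion of the SRAT range. */
static void extended_linear_cache_resize(struct region_sketch *r,
					 uint64_t cache_size)
{
	r->start -= cache_size;
	r->size += cache_size;
	r->cache_size = cache_size;
}

/* The decoder capacity check from the patch, with the cache added in. */
static int decoder_size_ok(const struct region_sketch *r,
			   uint64_t dpa_size, int interleave_ways)
{
	return dpa_size * interleave_ways + r->cache_size == r->size;
}
```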
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/acpi/numa/hmat.c | 44 +++++++++++++++++++++++++++++++++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/acpi.c | 11 +++++++++
drivers/cxl/core/core.h | 3 +++
drivers/cxl/core/region.c | 51 ++++++++++++++++++++++++++++++++++++---
drivers/cxl/cxl.h | 2 ++
include/linux/acpi.h | 8 ++++++
tools/testing/cxl/Kbuild | 1 +
8 files changed, 117 insertions(+), 4 deletions(-)
create mode 100644 drivers/cxl/core/acpi.c
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 39524f36be5b..d299f8d7af8c 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -108,6 +108,50 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
return NULL;
}
+/**
+ * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
+ * @backing_res: resource from the backing media
+ * @nid: node id for the memory region
+ * @cache_size: (Output) size of extended linear cache.
+ *
+ * Return: 0 on success. Errno on failure.
+ *
+ */
+int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
+ resource_size_t *cache_size)
+{
+ unsigned int pxm = node_to_pxm(nid);
+ struct memory_target *target;
+ struct target_cache *tcache;
+ bool cache_found = false;
+ struct resource *res;
+
+ target = find_mem_target(pxm);
+ if (!target)
+ return -ENOENT;
+
+ list_for_each_entry(tcache, &target->caches, node) {
+ if (tcache->cache_attrs.mode == NODE_CACHE_MODE_EXTENDED_LINEAR) {
+ cache_found = true;
+ break;
+ }
+ }
+
+ if (!cache_found) {
+ *cache_size = 0;
+ return 0;
+ }
+
+ res = &target->memregions;
+ if (!resource_contains(res, backing_res))
+ return -ENOENT;
+
+ *cache_size = tcache->cache_attrs.size;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
+
static struct memory_target *acpi_find_genport_target(u32 uid)
{
struct memory_target *target;
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..1a0c9c6ca818 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -14,5 +14,6 @@ cxl_core-y += pci.o
cxl_core-y += hdm.o
cxl_core-y += pmu.o
cxl_core-y += cdat.o
+cxl_core-y += acpi.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
new file mode 100644
index 000000000000..f13b4dae6ac5
--- /dev/null
+++ b/drivers/cxl/core/acpi.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
+#include <linux/acpi.h>
+#include "cxl.h"
+#include "core.h"
+
+int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size)
+{
+ return hmat_get_extended_linear_cache_size(backing_res, nid, size);
+}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 72a506c9dbd0..dd586c76c773 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -108,4 +108,7 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
enum access_coordinate_class access);
bool cxl_need_node_perf_attrs_update(int nid);
+int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size);
+
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 21ad5f242875..ddfb1e1a8909 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -816,6 +816,17 @@ static int match_free_decoder(struct device *dev, void *data)
return 0;
}
+static bool region_res_match_range(struct cxl_region_params *p,
+ struct range *range)
+{
+ if (p->res &&
+ p->res->start + p->cache_size == range->start &&
+ p->res->end == range->end)
+ return true;
+
+ return false;
+}
+
static int match_auto_decoder(struct device *dev, void *data)
{
struct cxl_region_params *p = data;
@@ -828,7 +839,7 @@ static int match_auto_decoder(struct device *dev, void *data)
cxld = to_cxl_decoder(dev);
r = &cxld->hpa_range;
- if (p->res && p->res->start == r->start && p->res->end == r->end)
+ if (region_res_match_range(p, r))
return 1;
return 0;
@@ -1406,8 +1417,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
if (cxld->interleave_ways != iw ||
cxld->interleave_granularity != ig ||
- cxld->hpa_range.start != p->res->start ||
- cxld->hpa_range.end != p->res->end ||
+ !region_res_match_range(p, &cxld->hpa_range) ||
((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
dev_err(&cxlr->dev,
"%s:%s %s expected iw: %d ig: %d %pr\n",
@@ -1931,7 +1941,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return -ENXIO;
}
- if (resource_size(cxled->dpa_res) * p->interleave_ways !=
+ if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
resource_size(p->res)) {
dev_dbg(&cxlr->dev,
"%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
@@ -3226,6 +3236,34 @@ static int match_region_by_range(struct device *dev, void *data)
return rc;
}
+static int cxl_extended_linear_cache_resize(struct cxl_region_params *p,
+ struct resource *res)
+{
+ int nid = phys_to_target_node(res->start);
+ resource_size_t size, cache_size;
+ int rc;
+
+ size = resource_size(res);
+ if (!size)
+ return -EINVAL;
+
+ rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
+ if (rc)
+ return rc;
+
+ if (!cache_size)
+ return 0;
+
+ if (size != cache_size)
+ return -EINVAL;
+
+ res->start -= cache_size;
+ p->cache_size = cache_size;
+
+ return 0;
+}
+
+
/* Establish an empty region covering the given HPA range */
static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
struct cxl_endpoint_decoder *cxled)
@@ -3272,6 +3310,11 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
dev_name(&cxlr->dev));
+
+ rc = cxl_extended_linear_cache_resize(p, res);
+ if (rc)
+ goto err;
+
rc = insert_resource(cxlrd->res, res);
if (rc) {
/*
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9afb407d438f..d8d715090779 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -483,6 +483,7 @@ enum cxl_config_state {
* @res: allocated iomem capacity for this region
* @targets: active ordered targets in current decoder configuration
* @nr_targets: number of targets
+ * @cache_size: extended linear cache size, if exists
*
* State transitions are protected by the cxl_region_rwsem
*/
@@ -494,6 +495,7 @@ struct cxl_region_params {
struct resource *res;
struct cxl_endpoint_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
int nr_targets;
+ resource_size_t cache_size;
};
/*
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 0687a442fec7..8ed72d431dca 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -435,12 +435,20 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp);
#ifdef CONFIG_ACPI_HMAT
int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
+int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
+ resource_size_t *size);
#else
static inline int acpi_get_genport_coordinates(u32 uid,
struct access_coordinate *coord)
{
return -EOPNOTSUPP;
}
+
+static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size)
+{
+ return -EOPNOTSUPP;
+}
#endif
#ifdef CONFIG_ACPI_NUMA
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 3d1ca9e38b1f..c687ef56717d 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -61,6 +61,7 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o
cxl_core-y += $(CXL_CORE_SRC)/hdm.o
cxl_core-y += $(CXL_CORE_SRC)/pmu.o
cxl_core-y += $(CXL_CORE_SRC)/cdat.o
+cxl_core-y += $(CXL_CORE_SRC)/acpi.o
cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
cxl_core-y += config_check.o
--
2.46.1
* Re: [RFC PATCH 3/6] acpi/hmat / cxl: Add extended linear cache support for CXL
2024-09-27 14:16 ` [RFC PATCH 3/6] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
@ 2024-10-17 16:20 ` Jonathan Cameron
2024-10-29 22:04 ` Dave Jiang
0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-10-17 16:20 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Fri, 27 Sep 2024 07:16:55 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> The current cxl region size only indicates the size of the CXL memory region
> without accounting for the extended linear cache size. Retrieve the cache
> size from HMAT and append that to the cxl region size for the cxl region
> range that matches the SRAT range that has extended linear cache enabled.
>
> The SRAT defines the whole memory range that inclues the extended linear
includes
> cache and the CXL memory region. The new HMAT update to the Memory Side
ECN/ECR, not update. After all update might mean _HMA
> Cache Information Structure defines the size of the extended linear cache
> and matches the SRAT Memory Affinity Structure by the memory proximity
> domain. Add a helper to match the cxl range to the SRAT memory range in order
> to retrieve the cache size.
>
> There are several places that check the cxl region range against the
> decoder range. Use the new helper to check between the two ranges while
> accounting for the new cache size.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Various comments inline.
> ---
> drivers/acpi/numa/hmat.c | 44 +++++++++++++++++++++++++++++++++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/acpi.c | 11 +++++++++
> drivers/cxl/core/core.h | 3 +++
> drivers/cxl/core/region.c | 51 ++++++++++++++++++++++++++++++++++++---
> drivers/cxl/cxl.h | 2 ++
> include/linux/acpi.h | 8 ++++++
> tools/testing/cxl/Kbuild | 1 +
> 8 files changed, 117 insertions(+), 4 deletions(-)
> create mode 100644 drivers/cxl/core/acpi.c
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 39524f36be5b..d299f8d7af8c 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -108,6 +108,50 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
> return NULL;
> }
>
> +/**
> + * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
> + * @backing_res: resource from the backing media
> + * @nid: node id for the memory region
> + * @cache_size: (Output) size of extended linear cache.
> + *
> + * Return: 0 on success. Errno on failure.
> + *
> + */
> +int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> + resource_size_t *cache_size)
> +{
> + unsigned int pxm = node_to_pxm(nid);
> + struct memory_target *target;
> + struct target_cache *tcache;
> + bool cache_found = false;
> + struct resource *res;
> +
> + target = find_mem_target(pxm);
> + if (!target)
> + return -ENOENT;
> +
> + list_for_each_entry(tcache, &target->caches, node) {
> + if (tcache->cache_attrs.mode == NODE_CACHE_MODE_EXTENDED_LINEAR) {
Why is finding the first one appropriate? Maybe you have more than one?
I'd move the code below up here, then carry on to see if there is another
entry if resource_contains fails.
> + cache_found = true;
> + break;
> + }
> + }
> +
> + if (!cache_found) {
> + *cache_size = 0;
> + return 0;
> + }
> +
> + res = &target->memregions;
> + if (!resource_contains(res, backing_res))
> + return -ENOENT;
> +
> + *cache_size = tcache->cache_attrs.size;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
> +
> static struct memory_target *acpi_find_genport_target(u32 uid)
> {
> struct memory_target *target;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 21ad5f242875..ddfb1e1a8909 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -816,6 +816,17 @@ static int match_free_decoder(struct device *dev, void *data)
> return 0;
> }
>
> +static bool region_res_match_range(struct cxl_region_params *p,
This is a little odd, so a comment on what it is doing is needed.
I think it is patching the CXL backed bit of the region
by offsetting the start back to where it was before you
subtracted the dram cache size.
> + struct range *range)
> +{
> + if (p->res &&
I'd break the
if (!p->res)
return false;
off then
return p->res->start + p->cache_size == range->start &&
p->res->end == range->end;
> + p->res->start + p->cache_size == range->start &&
> + p->res->end == range->end)
> + return true;
> +
> + return false;
> +}
Reasonable to factor this out first.
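In standalone form, the early-return shape Jonathan suggests reads roughly as follows (mock types stand in for the kernel's struct cxl_region_params and struct range):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock-ups of the kernel structures, for illustration only. */
struct range_sketch { uint64_t start, end; };
struct params_sketch {
	bool has_res;			/* stands in for p->res != NULL */
	uint64_t res_start, res_end;
	uint64_t cache_size;
};

/* Early-return form: region start sits cache_size below the decoder range. */
static bool region_res_match_range(const struct params_sketch *p,
				   const struct range_sketch *r)
{
	if (!p->has_res)
		return false;

	return p->res_start + p->cache_size == r->start &&
	       p->res_end == r->end;
}
```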
> +
> static int match_auto_decoder(struct device *dev, void *data)
> {
> struct cxl_region_params *p = data;
> @@ -828,7 +839,7 @@ static int match_auto_decoder(struct device *dev, void *data)
> cxld = to_cxl_decoder(dev);
> r = &cxld->hpa_range;
>
> - if (p->res && p->res->start == r->start && p->res->end == r->end)
> + if (region_res_match_range(p, r))
> return 1;
>
> return 0;
> @@ -1406,8 +1417,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
> if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> if (cxld->interleave_ways != iw ||
> cxld->interleave_granularity != ig ||
> - cxld->hpa_range.start != p->res->start ||
> - cxld->hpa_range.end != p->res->end ||
> + !region_res_match_range(p, &cxld->hpa_range) ||
> ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
> dev_err(&cxlr->dev,
> "%s:%s %s expected iw: %d ig: %d %pr\n",
> @@ -1931,7 +1941,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return -ENXIO;
> }
>
> - if (resource_size(cxled->dpa_res) * p->interleave_ways !=
> + if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
> resource_size(p->res)) {
> dev_dbg(&cxlr->dev,
> "%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
> @@ -3226,6 +3236,34 @@ static int match_region_by_range(struct device *dev, void *data)
> return rc;
> }
>
> +static int cxl_extended_linear_cache_resize(struct cxl_region_params *p,
> + struct resource *res)
> +{
> + int nid = phys_to_target_node(res->start);
> + resource_size_t size, cache_size;
> + int rc;
> +
> + size = resource_size(res);
> + if (!size)
> + return -EINVAL;
> +
> + rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
> + if (rc)
> + return rc;
> +
> + if (!cache_size)
> + return 0;
> +
> + if (size != cache_size)
> + return -EINVAL;
> +
> + res->start -= cache_size;
I don't recall the ECN saying which way round they were (and it didn't
occur to me at the time) i.e. local dram first or CXL dram first.
Did I miss that? I was kind of thinking extra capacity at higher
addresses but no particular reason why...
> + p->cache_size = cache_size;
> +
> + return 0;
> +}
> +
Trivial but 1 blank line probably appropriate.
> +
> /* Establish an empty region covering the given HPA range */
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [RFC PATCH 3/6] acpi/hmat / cxl: Add extended linear cache support for CXL
2024-10-17 16:20 ` Jonathan Cameron
@ 2024-10-29 22:04 ` Dave Jiang
0 siblings, 0 replies; 25+ messages in thread
From: Dave Jiang @ 2024-10-29 22:04 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 10/17/24 9:20 AM, Jonathan Cameron wrote:
> On Fri, 27 Sep 2024 07:16:55 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> The current cxl region size only indicates the size of the CXL memory region
>> without accounting for the extended linear cache size. Retrieve the cache
>> size from HMAT and append that to the cxl region size for the cxl region
>> range that matches the SRAT range that has extended linear cache enabled.
>>
>> The SRAT defines the whole memory range that inclues the extended linear
>
> includes
>
>> cache and the CXL memory region. The new HMAT update to the Memory Side
>
> ECN/ECR, not update. After all update might mean _HMA
>
>> Cache Information Structure defines the size of the extended linear cache
>> and matches to the SRAT Memory Affinity Structure by the memory proximity
>> domain. Add a helper to match the cxl range to the SRAT memory range in order
>> to retrieve the cache size.
>>
>> There are several places that check the cxl region range against the
>> decoder range. Use the new helper to check between the two ranges and address
>> the new cache size.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Various comments inline.
>> ---
>> drivers/acpi/numa/hmat.c | 44 +++++++++++++++++++++++++++++++++
>> drivers/cxl/core/Makefile | 1 +
>> drivers/cxl/core/acpi.c | 11 +++++++++
>> drivers/cxl/core/core.h | 3 +++
>> drivers/cxl/core/region.c | 51 ++++++++++++++++++++++++++++++++++++---
>> drivers/cxl/cxl.h | 2 ++
>> include/linux/acpi.h | 8 ++++++
>> tools/testing/cxl/Kbuild | 1 +
>> 8 files changed, 117 insertions(+), 4 deletions(-)
>> create mode 100644 drivers/cxl/core/acpi.c
>>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>> index 39524f36be5b..d299f8d7af8c 100644
>> --- a/drivers/acpi/numa/hmat.c
>> +++ b/drivers/acpi/numa/hmat.c
>> @@ -108,6 +108,50 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
>> return NULL;
>> }
>>
>> +/**
>> + * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
>> + * @backing_res: resource from the backing media
>> + * @nid: node id for the memory region
>> + * @cache_size: (Output) size of extended linear cache.
>> + *
>> + * Return: 0 on success. Errno on failure.
>> + *
>> + */
>> +int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
>> + resource_size_t *cache_size)
>> +{
>> + unsigned int pxm = node_to_pxm(nid);
>> + struct memory_target *target;
>> + struct target_cache *tcache;
>> + bool cache_found = false;
>> + struct resource *res;
>> +
>> + target = find_mem_target(pxm);
>> + if (!target)
>> + return -ENOENT;
>> +
>> + list_for_each_entry(tcache, &target->caches, node) {
>> + if (tcache->cache_attrs.mode == NODE_CACHE_MODE_EXTENDED_LINEAR) {
>
> Why is finding the first one appropriate? Maybe you have more than one?
> I'd move the code below up here then carry on to see if there is another
> entry if resource_contains fails.
Yes I'll move that.
>
>
>> + cache_found = true;
>> + break;
>> + }
>> + }
>> +
>> + if (!cache_found) {
>> + *cache_size = 0;
>> + return 0;
>> + }
>> +
>> + res = &target->memregions;
>> + if (!resource_contains(res, backing_res))
>> + return -ENOENT;
>> +
>> + *cache_size = tcache->cache_attrs.size;
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
>> +
>> static struct memory_target *acpi_find_genport_target(u32 uid)
>> {
>> struct memory_target *target;
>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 21ad5f242875..ddfb1e1a8909 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -816,6 +816,17 @@ static int match_free_decoder(struct device *dev, void *data)
>> return 0;
>> }
>>
>> +static bool region_res_match_range(struct cxl_region_params *p,
>
> This is a little odd, so a comment on what it is doing is needed.
> I think it is patching the CXL-backed bit of the region
> by offsetting the start back to where it was before you
> subtracted the dram cache size.
Does it make it more clear if I rename it to region_res_match_cxl_range()? Regardless, I'll add some comments.
>
>> + struct range *range)
>> +{
>> + if (p->res &&
> I'd break the
> if (!p->res)
> return false;
> off then
> return p->res->start + p->cache_size == range->start &&
> p->res->end == range->end;
>
>> + p->res->start + p->cache_size == range->start &&
>> + p->res->end == range->end)
>> + return true;
>> +
>> + return false;
>> +}
> Reasonable to factor this out first.
>> +
>> static int match_auto_decoder(struct device *dev, void *data)
>> {
>> struct cxl_region_params *p = data;
>> @@ -828,7 +839,7 @@ static int match_auto_decoder(struct device *dev, void *data)
>> cxld = to_cxl_decoder(dev);
>> r = &cxld->hpa_range;
>>
>> - if (p->res && p->res->start == r->start && p->res->end == r->end)
>> + if (region_res_match_range(p, r))
>> return 1;
>>
>> return 0;
>> @@ -1406,8 +1417,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>> if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
>> if (cxld->interleave_ways != iw ||
>> cxld->interleave_granularity != ig ||
>> - cxld->hpa_range.start != p->res->start ||
>> - cxld->hpa_range.end != p->res->end ||
>> + !region_res_match_range(p, &cxld->hpa_range) ||
>> ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
>> dev_err(&cxlr->dev,
>> "%s:%s %s expected iw: %d ig: %d %pr\n",
>> @@ -1931,7 +1941,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>> return -ENXIO;
>> }
>>
>> - if (resource_size(cxled->dpa_res) * p->interleave_ways !=
>> + if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
>> resource_size(p->res)) {
>> dev_dbg(&cxlr->dev,
>> "%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
>> @@ -3226,6 +3236,34 @@ static int match_region_by_range(struct device *dev, void *data)
>> return rc;
>> }
>>
>> +static int cxl_extended_linear_cache_resize(struct cxl_region_params *p,
>> + struct resource *res)
>> +{
>> + int nid = phys_to_target_node(res->start);
>> + resource_size_t size, cache_size;
>> + int rc;
>> +
>> + size = resource_size(res);
>> + if (!size)
>> + return -EINVAL;
>> +
>> + rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
>> + if (rc)
>> + return rc;
>> +
>> + if (!cache_size)
>> + return 0;
>> +
>> + if (size != cache_size)
>> + return -EINVAL;
>> +
>> + res->start -= cache_size;
>
> I don't recall the ECN saying which way round they were (and it didn't
> occur to me at the time) i.e. local dram first or CXL dram first.
> Did I miss that? I was kind of thinking extra capacity at higher
> addresses but no particular reason why...
No, the spec does not dictate that. However, the current implementation by Intel firmware always puts local DRAM first as cache and CXL at the higher address. I just coded up the simple case for now. Not sure if we want to support the complex variation detection or just leave it to what's being implemented until someone shows up with a variation. I'll add some comments.
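A sketch of that assumed layout (a user-space mock, not the kernel code; the "local DRAM first" ordering is the observed firmware behavior described above, not a spec requirement):

```c
#include <assert.h>
#include <stdint.h>

struct mock_resource { uint64_t start, end; };

/* Grow the CXL-only resource downward so it also covers the leading
 * near-DRAM cache range, mirroring what the resize helper in this patch
 * does with res->start -= cache_size. */
static void mock_extend_region(struct mock_resource *res, uint64_t cache_size)
{
	res->start -= cache_size;
}

/* First HPA backed by CXL capacity after the extension */
static uint64_t mock_cxl_start(const struct mock_resource *res,
			       uint64_t cache_size)
{
	return res->start + cache_size;
}
```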
>
>> + p->cache_size = cache_size;
>> +
>> + return 0;
>> +}
>> +
> Trivial but 1 blank line probably appropriate.
thank you for spotting that.
>> +
>> /* Establish an empty region covering the given HPA range */
^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-09-27 14:16 [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
` (2 preceding siblings ...)
2024-09-27 14:16 ` [RFC PATCH 3/6] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
@ 2024-09-27 14:16 ` Dave Jiang
2024-10-17 16:33 ` Jonathan Cameron
2024-09-27 14:16 ` [RFC PATCH 5/6] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
` (2 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-09-27 14:16 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Add helper functions to do address translation for either the address
of the extended linear cache or its alias address. The translation functions
attempt to detect an I/O hole in the proximity domain and adjust the address
if the hole impacts the aliasing of the address. The range of the I/O hole
is retrieved by walking through the associated memory target resources.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/acpi/numa/hmat.c | 136 +++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 14 ++++
2 files changed, 150 insertions(+)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index d299f8d7af8c..834314582f4c 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -152,6 +152,142 @@ int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
}
EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
+static int alias_address_find_iohole(struct memory_target *target,
+ u64 address, u64 alias, struct range *hole)
+{
+ struct resource *alias_res = NULL;
+ struct resource *res, *prev;
+
+ *hole = (struct range) {
+ .start = 0,
+ .end = -1,
+ };
+
+ /* First find the resource that the address is in */
+ prev = target->memregions.child;
+ for (res = target->memregions.child; res; res = res->sibling) {
+ if (alias >= res->start && alias <= res->end) {
+ alias_res = res;
+ break;
+ }
+ prev = res;
+ }
+ if (!alias_res)
+ return -EINVAL;
+
+ /* No memory hole */
+ if (alias_res == prev)
+ return 0;
+
+ /* If address is within the current resource, no need to deal with memory hole */
+ if (address >= alias_res->start)
+ return 0;
+
+ *hole = (struct range) {
+ .start = prev->end + 1,
+ .end = alias_res->start - 1,
+ };
+
+ return 0;
+}
+
+int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
+{
+ unsigned int pxm = node_to_pxm(nid);
+ struct memory_target *target;
+ struct range iohole;
+ int rc;
+
+ target = find_mem_target(pxm);
+ if (!target)
+ return -EINVAL;
+
+ rc = alias_address_find_iohole(target, address, *alias, &iohole);
+ if (rc)
+ return rc;
+
+ if (!range_len(&iohole))
+ return 0;
+
+ if (address < iohole.start) {
+ if (*alias > iohole.start) {
+ *alias = *alias + range_len(&iohole);
+ return 0;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_alias_xlat, CXL);
+
+static int target_address_find_iohole(struct memory_target *target,
+ u64 address, u64 alias,
+ struct range *hole)
+{
+ struct resource *addr_res = NULL;
+ struct resource *res, *next;
+
+ *hole = (struct range) {
+ .start = 0,
+ .end = -1,
+ };
+
+ /* First find the resource that the address is in */
+ for (res = target->memregions.child; res; res = res->sibling) {
+ if (address >= res->start && address <= res->end) {
+ addr_res = res;
+ break;
+ }
+ }
+ if (!addr_res)
+ return -EINVAL;
+
+ next = res->sibling;
+ /* No memory hole after the region */
+ if (!next)
+ return 0;
+
+ /* If alias is within the current resource, no need to deal with memory hole */
+ if (alias <= addr_res->end)
+ return 0;
+
+ *hole = (struct range) {
+ .start = addr_res->end + 1,
+ .end = next->start - 1,
+ };
+
+ return 0;
+}
+
+int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
+{
+ unsigned int pxm = node_to_pxm(nid);
+ struct memory_target *target;
+ struct range iohole;
+ int rc;
+
+ target = find_mem_target(pxm);
+ if (!target)
+ return -EINVAL;
+
+ rc = target_address_find_iohole(target, *address, alias, &iohole);
+ if (rc)
+ return rc;
+
+ if (!range_len(&iohole))
+ return 0;
+
+ if (alias > iohole.end) {
+ if (*address < iohole.end) {
+ *address = *address - range_len(&iohole);
+ return 0;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_address_xlat, CXL);
+
static struct memory_target *acpi_find_genport_target(u32 uid)
{
struct memory_target *target;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 8ed72d431dca..704bdfc79f85 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -437,6 +437,8 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp);
int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
resource_size_t *size);
+int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
+int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
#else
static inline int acpi_get_genport_coordinates(u32 uid,
struct access_coordinate *coord)
@@ -449,6 +451,18 @@ static inline int hmat_get_extended_linear_cache_size(struct resource *backing_r
{
return -EOPNOTSUPP;
}
+
+static inline int hmat_extended_linear_cache_alias_xlat(u64 address,
+ u64 *alias, int nid)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int hmat_extended_linear_cache_address_xlat(u64 *address,
+ u64 alias, int nid)
+{
+ return -EOPNOTSUPP;
+}
#endif
#ifdef CONFIG_ACPI_NUMA
--
2.46.1
^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-09-27 14:16 ` [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation Dave Jiang
@ 2024-10-17 16:33 ` Jonathan Cameron
2024-10-17 16:46 ` Luck, Tony
2024-10-30 22:53 ` Dave Jiang
0 siblings, 2 replies; 25+ messages in thread
From: Jonathan Cameron @ 2024-10-17 16:33 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Fri, 27 Sep 2024 07:16:56 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Add helper functions to help do address translation for either the address
> of the extended linear cache or its alias address. The translation function
> attempt to detect an I/O hole in the proximity domain and adjusts the address
> if the hole impacts the aliasing of the address. The range of the I/O hole
> is retrieved by walking through the associated memory target resources.
What does the I/O hole correspond to in the system?
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/acpi/numa/hmat.c | 136 +++++++++++++++++++++++++++++++++++++++
> include/linux/acpi.h | 14 ++++
> 2 files changed, 150 insertions(+)
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index d299f8d7af8c..834314582f4c 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -152,6 +152,142 @@ int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> }
> EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
>
> +static int alias_address_find_iohole(struct memory_target *target,
> + u64 address, u64 alias, struct range *hole)
> +{
> + struct resource *alias_res = NULL;
> + struct resource *res, *prev;
> +
> + *hole = (struct range) {
> + .start = 0,
> + .end = -1,
> + };
> +
> + /* First find the resource that the address is in */
> + prev = target->memregions.child;
> + for (res = target->memregions.child; res; res = res->sibling) {
> + if (alias >= res->start && alias <= res->end) {
> + alias_res = res;
> + break;
> + }
> + prev = res;
> + }
> + if (!alias_res)
if (!res) and you can just use res instead of alias_res for the following
as you exit the loop with it set to the right value.
> + return -EINVAL;
> +
> + /* No memory hole */
> + if (alias_res == prev)
> + return 0;
> +
> + /* If address is within the current resource, no need to deal with memory hole */
> + if (address >= alias_res->start)
> + return 0;
> +
> + *hole = (struct range) {
> + .start = prev->end + 1,
> + .end = alias_res->start - 1,
> + };
Ordering assumptions should be avoided in such a generic-sounding
function. Can the hole be first?
Or rename the function to include preceding_hole or something like that.
> +
> + return 0;
> +}
> +
> +int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
> +{
> + unsigned int pxm = node_to_pxm(nid);
> + struct memory_target *target;
> + struct range iohole;
> + int rc;
> +
> + target = find_mem_target(pxm);
> + if (!target)
> + return -EINVAL;
> +
> + rc = alias_address_find_iohole(target, address, *alias, &iohole);
> + if (rc)
> + return rc;
> +
> + if (!range_len(&iohole))
> + return 0;
> +
Maybe reformat like this and add comments on each condition.
if (address >= iohole.start)
return 0;
if (*alias <= iohole.start)
return 0;
*alias += range_len(&iohole);
return 0;
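That early-return form, folded into a compilable user-space sketch of the alias fix-up (mock range type; names are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range; end < start encodes "no hole" */
struct mock_range { uint64_t start, end; };

static uint64_t mock_range_len(const struct mock_range *r)
{
	return r->end >= r->start ? r->end - r->start + 1 : 0;
}

/* When the backing address sits below the MMIO hole but the naively
 * computed alias lands past the hole's start, push the alias up by the
 * hole's length so it points at real memory again. */
static uint64_t mock_adjust_alias(uint64_t address, uint64_t alias,
				  const struct mock_range *hole)
{
	if (!mock_range_len(hole))
		return alias;		/* no hole in the way */
	if (address >= hole->start)
		return alias;		/* address already at/above the hole */
	if (alias <= hole->start)
		return alias;		/* alias stops short of the hole */
	return alias + mock_range_len(hole);
}
```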
> + if (address < iohole.start) {
> + if (*alias > iohole.start) {
> + *alias = *alias + range_len(&iohole);
> + return 0;
> + }
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_alias_xlat, CXL);
> +
> +static int target_address_find_iohole(struct memory_target *target,
> + u64 address, u64 alias,
> + struct range *hole)
> +{
> + struct resource *addr_res = NULL;
> + struct resource *res, *next;
> +
> + *hole = (struct range) {
> + .start = 0,
> + .end = -1,
> + };
> +
> + /* First find the resource that the address is in */
> + for (res = target->memregions.child; res; res = res->sibling) {
> + if (address >= res->start && address <= res->end) {
> + addr_res = res;
Could just use res as its scope is outside the loop.
> + break;
> + }
> + }
> + if (!addr_res)
> + return -EINVAL;
> +
> + next = res->sibling;
> + /* No memory hole after the region */
> + if (!next)
> + return 0;
> +
> + /* If alias is within the current resource, no need to deal with memory hole */
> + if (alias <= addr_res->end)
> + return 0;
> +
> + *hole = (struct range) {
> + .start = addr_res->end + 1,
> + .end = next->start - 1,
> + };
> +
> + return 0;
> +}
> +
> +int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
> +{
> + unsigned int pxm = node_to_pxm(nid);
> + struct memory_target *target;
> + struct range iohole;
> + int rc;
> +
> + target = find_mem_target(pxm);
> + if (!target)
> + return -EINVAL;
> +
> + rc = target_address_find_iohole(target, *address, alias, &iohole);
> + if (rc)
> + return rc;
> +
> + if (!range_len(&iohole))
> + return 0;
> +
Similar to above, maybe break into multiple reasons to exit early.
> + if (alias > iohole.end) {
> + if (*address < iohole.end) {
> + *address = *address - range_len(&iohole);
> + return 0;
> + }
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_address_xlat, CXL);
>
Jonathan
^ permalink raw reply	[flat|nested] 25+ messages in thread
* RE: [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-10-17 16:33 ` Jonathan Cameron
@ 2024-10-17 16:46 ` Luck, Tony
2024-10-17 16:59 ` Jonathan Cameron
2024-10-30 22:53 ` Dave Jiang
1 sibling, 1 reply; 25+ messages in thread
From: Luck, Tony @ 2024-10-17 16:46 UTC (permalink / raw)
To: Jonathan Cameron, Jiang, Dave
Cc: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
rafael@kernel.org, bp@alien8.de, Williams, Dan J,
dave@stgolabs.net, Schofield, Alison, Weiny, Ira
> What does the I/O hole correspond to in the system?
PCIe mmio mapped space. 32-bit devices must have addresses below 4G
so X86 systems have a physical memory map that looks like:
0 - 2G: RAM
2G-4G: MMIO
4G-end of memory: RAM
end of memory-infinity: 64-bit MMIO
Depending on how much MMIO there is, different systems put the
dividing line at addresses other than 2G.
-Tony
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-10-17 16:46 ` Luck, Tony
@ 2024-10-17 16:59 ` Jonathan Cameron
2024-10-29 22:51 ` Dave Jiang
0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-10-17 16:59 UTC (permalink / raw)
To: Luck, Tony
Cc: Jiang, Dave, linux-cxl@vger.kernel.org,
linux-acpi@vger.kernel.org, rafael@kernel.org, bp@alien8.de,
Williams, Dan J, dave@stgolabs.net, Schofield, Alison, Weiny, Ira
On Thu, 17 Oct 2024 16:46:35 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:
> > What does the I/O hole correspond to in the system?
>
> PCIe mmio mapped space. 32-bit devices must have addresses below 4G
> so X86 systems have a physical memory map that looks like:
>
> 0 - 2G: RAM
> 2G-4G: MMIO
> 4G-end of memory: RAM
> end of memory-infinity: 64-bit MMIO
>
> Depending on how much MMIO there is different systems put the
> dividing line at other addresses than 2G.
Ah, thanks. So this weird cache setup might be not quite linear
modulo N aliases as described in the ACPI spec (System vs host
physical addresses I guess).
Had wrong mental model :(
Ouch.
>
> -Tony
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-10-17 16:59 ` Jonathan Cameron
@ 2024-10-29 22:51 ` Dave Jiang
0 siblings, 0 replies; 25+ messages in thread
From: Dave Jiang @ 2024-10-29 22:51 UTC (permalink / raw)
To: Jonathan Cameron, Luck, Tony
Cc: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
rafael@kernel.org, bp@alien8.de, Williams, Dan J,
dave@stgolabs.net, Schofield, Alison, Weiny, Ira
On 10/17/24 9:59 AM, Jonathan Cameron wrote:
> On Thu, 17 Oct 2024 16:46:35 +0000
> "Luck, Tony" <tony.luck@intel.com> wrote:
>
>>> What does the I/O hole correspond to in the system?
>>
>> PCIe mmio mapped space. 32-bit devices must have addresses below 4G
>> so X86 systems have a physical memory map that looks like:
>>
>> 0 - 2G: RAM
>> 2G-4G: MMIO
>> 4G-end of memory: RAM
>> end of memory-infinity: 64-bit MMIO
>>
>> Depending on how much MMIO there is different systems put the
>> dividing line at other addresses than 2G.
>
> Ah, thanks. So this weird cache setup might be not quite linear
> modulo N aliases as described in the ACPI spec (System vs host
> physical addresses I guess).
Well, as long as the cache range does not overlap the MMIO hole. I'm not coming up with an elegant way to enumerate the MMIO hole and am looking for ideas/suggestions on how to do it nicely.
>
> Had wrong mental model :(
>
> Ouch.
>
>
>>
>> -Tony
>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-10-17 16:33 ` Jonathan Cameron
2024-10-17 16:46 ` Luck, Tony
@ 2024-10-30 22:53 ` Dave Jiang
2024-11-01 11:56 ` Jonathan Cameron
1 sibling, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-10-30 22:53 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 10/17/24 9:33 AM, Jonathan Cameron wrote:
> On Fri, 27 Sep 2024 07:16:56 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> Add helper functions to help do address translation for either the address
>> of the extended linear cache or its alias address. The translation function
>> attempt to detect an I/O hole in the proximity domain and adjusts the address
>> if the hole impacts the aliasing of the address. The range of the I/O hole
>> is retrieved by walking through the associated memory target resources.
>
> What does the I/O hole correspond to in the system?
>
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> drivers/acpi/numa/hmat.c | 136 +++++++++++++++++++++++++++++++++++++++
>> include/linux/acpi.h | 14 ++++
>> 2 files changed, 150 insertions(+)
>>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>> index d299f8d7af8c..834314582f4c 100644
>> --- a/drivers/acpi/numa/hmat.c
>> +++ b/drivers/acpi/numa/hmat.c
>> @@ -152,6 +152,142 @@ int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
>> }
>> EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
>>
>> +static int alias_address_find_iohole(struct memory_target *target,
>> + u64 address, u64 alias, struct range *hole)
>> +{
>> + struct resource *alias_res = NULL;
>> + struct resource *res, *prev;
>> +
>> + *hole = (struct range) {
>> + .start = 0,
>> + .end = -1,
>> + };
>> +
>> + /* First find the resource that the address is in */
>> + prev = target->memregions.child;
>> + for (res = target->memregions.child; res; res = res->sibling) {
>> + if (alias >= res->start && alias <= res->end) {
>> + alias_res = res;
>> + break;
>> + }
>> + prev = res;
>> + }
>> + if (!alias_res)
>
> if (!res) and you can just use res instead of alias_res for the following
> as you exit the loop with it set to the right value.
>
Ok will do that
>
>
>> + return -EINVAL;
>> +
>> + /* No memory hole */
>> + if (alias_res == prev)
>> + return 0;
>> +
>> + /* If address is within the current resource, no need to deal with memory hole */
>> + if (address >= alias_res->start)
>> + return 0;
>> +
>> + *hole = (struct range) {
>> + .start = prev->end + 1,
>> + .end = alias_res->start - 1,
>> + };
> Ordering assumption should be avoided in such a generic
> sounding function. Can the hole be first?
Do you mean if the address mapping starts out with an MMIO range and then memory range? I'm not sure if such an implementation exists in the x86 world. And if the hole is behind all the ranges, then it shouldn't impact the address calculations.
>
> or rename the function to include preceding_hole or something like that.
>> +
>> + return 0;
>> +}
>> +
>> +int hmat_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
>> +{
>> + unsigned int pxm = node_to_pxm(nid);
>> + struct memory_target *target;
>> + struct range iohole;
>> + int rc;
>> +
>> + target = find_mem_target(pxm);
>> + if (!target)
>> + return -EINVAL;
>> +
>> + rc = alias_address_find_iohole(target, address, *alias, &iohole);
>> + if (rc)
>> + return rc;
>> +
>> + if (!range_len(&iohole))
>> + return 0;
>> +
> Maybe reformat like this and add comments on each condition.
>
> if (address >= iohole.start)
> return 0;
>
> if (*alias <= iohole.start)
> return 0;
>
> *alias += range_len(&iohole);
>
> return 0;
>
>
Will change to that and add comments.
>> + if (address < iohole.start) {
>> + if (*alias > iohole.start) {
>> + *alias = *alias + range_len(&iohole);
>> + return 0;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_alias_xlat, CXL);
>> +
>> +static int target_address_find_iohole(struct memory_target *target,
>> + u64 address, u64 alias,
>> + struct range *hole)
>> +{
>> + struct resource *addr_res = NULL;
>> + struct resource *res, *next;
>> +
>> + *hole = (struct range) {
>> + .start = 0,
>> + .end = -1,
>> + };
>> +
>> + /* First find the resource that the address is in */
>> + for (res = target->memregions.child; res; res = res->sibling) {
>> + if (address >= res->start && address <= res->end) {
>> + addr_res = res;
>
> Could just use res as its scope is outside the loop.
Will update.
>
>> + break;
>> + }
>> + }
>> + if (!addr_res)
>> + return -EINVAL;
>> +
>> + next = res->sibling;
>> + /* No memory hole after the region */
>> + if (!next)
>> + return 0;
>> +
>> + /* If alias is within the current resource, no need to deal with memory hole */
>> + if (alias <= addr_res->end)
>> + return 0;
>> +
>> + *hole = (struct range) {
>> + .start = addr_res->end + 1,
>> + .end = next->start - 1,
>> + };
>> +
>> + return 0;
>> +}
>> +
>> +int hmat_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
>> +{
>> + unsigned int pxm = node_to_pxm(nid);
>> + struct memory_target *target;
>> + struct range iohole;
>> + int rc;
>> +
>> + target = find_mem_target(pxm);
>> + if (!target)
>> + return -EINVAL;
>> +
>> + rc = target_address_find_iohole(target, *address, alias, &iohole);
>> + if (rc)
>> + return rc;
>> +
>> + if (!range_len(&iohole))
>> + return 0;
>> +
>
> Similar to above, maybe break into multiple reasons to exit early.
>
Will update.
>> + if (alias > iohole.end) {
>> + if (*address < iohole.end) {
>> + *address = *address - range_len(&iohole);
>> + return 0;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(hmat_extended_linear_cache_address_xlat, CXL);
>>
>
> Jonathan
^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation
2024-10-30 22:53 ` Dave Jiang
@ 2024-11-01 11:56 ` Jonathan Cameron
0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-01 11:56 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Wed, 30 Oct 2024 15:53:20 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> On 10/17/24 9:33 AM, Jonathan Cameron wrote:
> > On Fri, 27 Sep 2024 07:16:56 -0700
> > Dave Jiang <dave.jiang@intel.com> wrote:
> >
> >> Add helper functions to help do address translation for either the address
> >> of the extended linear cache or its alias address. The translation function
> >> attempt to detect an I/O hole in the proximity domain and adjusts the address
> >> if the hole impacts the aliasing of the address. The range of the I/O hole
> >> is retrieved by walking through the associated memory target resources.
> >
> > What does the I/O hole correspond to in the system?
> >
> >>
> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> >> ---
> >> drivers/acpi/numa/hmat.c | 136 +++++++++++++++++++++++++++++++++++++++
> >> include/linux/acpi.h | 14 ++++
> >> 2 files changed, 150 insertions(+)
> >>
> >> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> >> index d299f8d7af8c..834314582f4c 100644
> >> --- a/drivers/acpi/numa/hmat.c
> >> +++ b/drivers/acpi/numa/hmat.c
> >> @@ -152,6 +152,142 @@ int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> >> }
> >> EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, CXL);
> >>
> >> +static int alias_address_find_iohole(struct memory_target *target,
> >> + u64 address, u64 alias, struct range *hole)
> >> +{
> >> + struct resource *alias_res = NULL;
> >> + struct resource *res, *prev;
> >> +
> >> + *hole = (struct range) {
> >> + .start = 0,
> >> + .end = -1,
> >> + };
> >> +
> >> + /* First find the resource that the address is in */
> >> + prev = target->memregions.child;
> >> + for (res = target->memregions.child; res; res = res->sibling) {
> >> + if (alias >= res->start && alias <= res->end) {
> >> + alias_res = res;
> >> + break;
> >> + }
> >> + prev = res;
> >> + }
> >> + if (!alias_res)
> >
> > if (!res) and you can just use res instead of alias_res for the following
> > as you exit the loop with it set to the right value.
> >
> Ok will do that
>
> >
> >
> >> + return -EINVAL;
> >> +
> >> + /* No memory hole */
> >> + if (alias_res == prev)
> >> + return 0;
> >> +
> >> + /* If address is within the current resource, no need to deal with memory hole */
> >> + if (address >= alias_res->start)
> >> + return 0;
> >> +
> >> + *hole = (struct range) {
> >> + .start = prev->end + 1,
> >> + .end = alias_res->start - 1,
> >> + };
> > Ordering assumption should be avoided in such a generic
> > sounding function. Can the hole be first?
>
> Do you mean if the address mapping starts out with an MMIO range and then memory range? I'm not sure if such an implementation exists in the x86 world. And if the hole is behind all the ranges, then it shouldn't impact the address calculations.
>
That was me not really understanding what the hole was.
Tony filled in that gap.
> >
> > or rename the function to include preceding_hole or something like that.
> >> +
> >> + return 0;
> >> +}
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC PATCH 5/6] cxl: Add extended linear cache address alias emission for cxl events
2024-09-27 14:16 [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
` (3 preceding siblings ...)
2024-09-27 14:16 ` [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation Dave Jiang
@ 2024-09-27 14:16 ` Dave Jiang
2024-10-17 16:38 ` Jonathan Cameron
2024-09-27 14:16 ` [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
2024-10-17 16:46 ` [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Jonathan Cameron
6 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-09-27 14:16 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Add the aliased address of exteneded linear cache when emitting event
trace for DRAM and general media of CXL events.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/acpi.c | 10 ++++++++++
drivers/cxl/core/core.h | 7 +++++++
drivers/cxl/core/mbox.c | 42 ++++++++++++++++++++++++++++++++++++---
drivers/cxl/core/region.c | 2 +-
drivers/cxl/core/trace.h | 24 ++++++++++++++--------
5 files changed, 73 insertions(+), 12 deletions(-)
diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
index f13b4dae6ac5..f74136320fc3 100644
--- a/drivers/cxl/core/acpi.c
+++ b/drivers/cxl/core/acpi.c
@@ -9,3 +9,13 @@ int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
{
return hmat_get_extended_linear_cache_size(backing_res, nid, size);
}
+
+int cxl_acpi_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
+{
+ return hmat_extended_linear_cache_address_xlat(address, alias, nid);
+}
+
+int cxl_acpi_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
+{
+ return hmat_extended_linear_cache_alias_xlat(address, alias, nid);
+}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index dd586c76c773..f23bff1b38a6 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -30,8 +30,13 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port);
struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
u64 dpa);
+int cxl_region_nid(struct cxl_region *cxlr);
#else
+static inline int cxl_region_nid(struct cxl_region *cxlr)
+{
+ return NUMA_NO_NODE;
+}
static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
const struct cxl_memdev *cxlmd, u64 dpa)
{
@@ -110,5 +115,7 @@ bool cxl_need_node_perf_attrs_update(int nid);
int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
int nid, resource_size_t *size);
+int cxl_acpi_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid);
+int cxl_acpi_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid);
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index e5cdeafdf76e..ac170fd85a1a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -849,6 +849,39 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
+static u64 cxlr_hpa_cache_alias(struct cxl_region *cxlr, u64 hpa)
+{
+ struct cxl_region_params *p = &cxlr->params;
+ u64 alias, address;
+ int nid, rc;
+
+ if (!p->cache_size)
+ return ~0ULL;
+
+ nid = cxl_region_nid(cxlr);
+ if (nid == NUMA_NO_NODE)
+ nid = 0;
+
+ if (hpa >= p->res->start + p->cache_size) {
+ address = hpa - p->cache_size;
+ alias = hpa;
+ rc = cxl_acpi_extended_linear_cache_address_xlat(&address,
+ alias, nid);
+ if (rc)
+ return rc;
+
+ return address;
+ }
+
+ address = hpa;
+ alias = hpa + p->cache_size;
+ rc = cxl_acpi_extended_linear_cache_alias_xlat(address, &alias, nid);
+ if (rc)
+ return rc;
+
+ return alias;
+}
+
void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
enum cxl_event_type event_type,
@@ -864,7 +897,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
}
if (trace_cxl_general_media_enabled() || trace_cxl_dram_enabled()) {
- u64 dpa, hpa = ULLONG_MAX;
+ u64 dpa, hpa = ULLONG_MAX, hpa_alias;
struct cxl_region *cxlr;
/*
@@ -880,11 +913,14 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
if (cxlr)
hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+ hpa_alias = cxlr_hpa_cache_alias(cxlr, hpa);
+
if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
- &evt->gen_media);
+ hpa_alias, &evt->gen_media);
else if (event_type == CXL_CPER_EVENT_DRAM)
- trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
+ trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
+ &evt->dram);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ddfb1e1a8909..c19bbbf8079d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2401,7 +2401,7 @@ static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
return true;
}
-static int cxl_region_nid(struct cxl_region *cxlr)
+int cxl_region_nid(struct cxl_region *cxlr)
{
struct cxl_region_params *p = &cxlr->params;
struct resource *res;
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 9167cfba7f59..79bee3fd7d25 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -316,9 +316,10 @@ TRACE_EVENT(cxl_generic_event,
TRACE_EVENT(cxl_general_media,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_region *cxlr, u64 hpa, struct cxl_event_gen_media *rec),
+ struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
+ struct cxl_event_gen_media *rec),
- TP_ARGS(cxlmd, log, cxlr, hpa, rec),
+ TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -332,6 +333,7 @@ TRACE_EVENT(cxl_general_media,
__array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
/* Following are out of order to pack trace record */
__field(u64, hpa)
+ __field(u64, hpa_alias)
__field_struct(uuid_t, region_uuid)
__field(u16, validity_flags)
__field(u8, rank)
@@ -358,6 +360,7 @@ TRACE_EVENT(cxl_general_media,
CXL_EVENT_GEN_MED_COMP_ID_SIZE);
__entry->validity_flags = get_unaligned_le16(&rec->media_hdr.validity_flags);
__entry->hpa = hpa;
+ __entry->hpa_alias = hpa_alias;
if (cxlr) {
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
@@ -370,7 +373,7 @@ TRACE_EVENT(cxl_general_media,
CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
"descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
"device=%x comp_id=%s validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
+ "hpa=%llx hpa_alias=%llx region=%s region_uuid=%pUb",
__entry->dpa, show_dpa_flags(__entry->dpa_flags),
show_event_desc_flags(__entry->descriptor),
show_mem_event_type(__entry->type),
@@ -378,7 +381,8 @@ TRACE_EVENT(cxl_general_media,
__entry->channel, __entry->rank, __entry->device,
__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
show_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ __entry->hpa, __entry->hpa_alias, __get_str(region_name),
+ &__entry->region_uuid
)
);
@@ -413,9 +417,10 @@ TRACE_EVENT(cxl_general_media,
TRACE_EVENT(cxl_dram,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_region *cxlr, u64 hpa, struct cxl_event_dram *rec),
+ struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
+ struct cxl_event_dram *rec),
- TP_ARGS(cxlmd, log, cxlr, hpa, rec),
+ TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -431,6 +436,7 @@ TRACE_EVENT(cxl_dram,
__field(u32, row)
__array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
__field(u64, hpa)
+ __field(u64, hpa_alias)
__field_struct(uuid_t, region_uuid)
__field(u8, rank) /* Out of order to pack trace record */
__field(u8, bank_group) /* Out of order to pack trace record */
@@ -461,6 +467,7 @@ TRACE_EVENT(cxl_dram,
memcpy(__entry->cor_mask, &rec->correction_mask,
CXL_EVENT_DER_CORRECTION_MASK_SIZE);
__entry->hpa = hpa;
+ __entry->hpa_alias = hpa_alias;
if (cxlr) {
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
@@ -474,7 +481,7 @@ TRACE_EVENT(cxl_dram,
"transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
"bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
"validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
+ "hpa=%llx hpa_alias=%llx region=%s region_uuid=%pUb",
__entry->dpa, show_dpa_flags(__entry->dpa_flags),
show_event_desc_flags(__entry->descriptor),
show_mem_event_type(__entry->type),
@@ -484,7 +491,8 @@ TRACE_EVENT(cxl_dram,
__entry->row, __entry->column,
__print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
show_dram_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ __entry->hpa, __entry->hpa_alias, __get_str(region_name),
+ &__entry->region_uuid
)
);
--
2.46.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 5/6] cxl: Add extended linear cache address alias emission for cxl events
2024-09-27 14:16 ` [RFC PATCH 5/6] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
@ 2024-10-17 16:38 ` Jonathan Cameron
2024-10-30 23:29 ` Dave Jiang
0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-10-17 16:38 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Fri, 27 Sep 2024 07:16:57 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Add the aliased address of exteneded linear cache when emitting event
extended
> trace for DRAM and general media of CXL events.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Trivial comment inline.
> ---
> drivers/cxl/core/acpi.c | 10 ++++++++++
> drivers/cxl/core/core.h | 7 +++++++
> drivers/cxl/core/mbox.c | 42 ++++++++++++++++++++++++++++++++++++---
> drivers/cxl/core/region.c | 2 +-
> drivers/cxl/core/trace.h | 24 ++++++++++++++--------
> 5 files changed, 73 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
> index f13b4dae6ac5..f74136320fc3 100644
> --- a/drivers/cxl/core/acpi.c
> +++ b/drivers/cxl/core/acpi.c
> @@ -9,3 +9,13 @@ int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
> {
> return hmat_get_extended_linear_cache_size(backing_res, nid, size);
> }
> +
> +int cxl_acpi_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
> +{
> + return hmat_extended_linear_cache_address_xlat(address, alias, nid);
Can we just stub them out in acpi.h? I'm not sure wrapping them gives us
anything useful.
> +}
> +
> +int cxl_acpi_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
> +{
> + return hmat_extended_linear_cache_alias_xlat(address, alias, nid);
> +}
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 5/6] cxl: Add extended linear cache address alias emission for cxl events
2024-10-17 16:38 ` Jonathan Cameron
@ 2024-10-30 23:29 ` Dave Jiang
0 siblings, 0 replies; 25+ messages in thread
From: Dave Jiang @ 2024-10-30 23:29 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 10/17/24 9:38 AM, Jonathan Cameron wrote:
> On Fri, 27 Sep 2024 07:16:57 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> Add the aliased address of exteneded linear cache when emitting event
> extended
>
>> trace for DRAM and general media of CXL events.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>
> Trivial comment inline.
>> ---
>> drivers/cxl/core/acpi.c | 10 ++++++++++
>> drivers/cxl/core/core.h | 7 +++++++
>> drivers/cxl/core/mbox.c | 42 ++++++++++++++++++++++++++++++++++++---
>> drivers/cxl/core/region.c | 2 +-
>> drivers/cxl/core/trace.h | 24 ++++++++++++++--------
>> 5 files changed, 73 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
>> index f13b4dae6ac5..f74136320fc3 100644
>> --- a/drivers/cxl/core/acpi.c
>> +++ b/drivers/cxl/core/acpi.c
>> @@ -9,3 +9,13 @@ int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
>> {
>> return hmat_get_extended_linear_cache_size(backing_res, nid, size);
>> }
>> +
>> +int cxl_acpi_extended_linear_cache_address_xlat(u64 *address, u64 alias, int nid)
>> +{
>> + return hmat_extended_linear_cache_address_xlat(address, alias, nid);
>
> Can we just stub them out in acpi.h? I'm not sure wrapping them gives us
> anything useful.
It's already stubbed out in acpi.h. Here it's wrapped so that core/mbox.c does not need to pull in the linux/acpi.h header.
>
>> +}
>> +
>> +int cxl_acpi_extended_linear_cache_alias_xlat(u64 address, u64 *alias, int nid)
>> +{
>> + return hmat_extended_linear_cache_alias_xlat(address, alias, nid);
>> +}
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-09-27 14:16 [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
` (4 preceding siblings ...)
2024-09-27 14:16 ` [RFC PATCH 5/6] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
@ 2024-09-27 14:16 ` Dave Jiang
2024-10-17 16:40 ` Jonathan Cameron
2024-10-17 16:46 ` [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Jonathan Cameron
6 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-09-27 14:16 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
Below is a setup with an extended linear cache configuration, with an example
memory region layout shown below. It is presented as a single memory region
consisting of 256G of memory: 128G of DRAM and 128G of CXL memory.
The kernel sees a region of 256G total system memory.
128G DRAM 128G CXL memory
|-----------------------------------|-------------------------------------|
Data resides in either DRAM or far memory (FM) with no replication. Hot data
is swapped into DRAM by the hardware behind the scenes. When an error is detected
in one location, it is possible that the error also resides in the aliased
location. Therefore, when a memory location flagged by MCE is part of
the special region, the aliased memory location needs to be offlined as well.
Add an mce notify callback to identify if the MCE address location is part of
an extended linear cache region and handle accordingly.
Added symbol export to set_mce_nospec() in x86 code in order to call
set_mce_nospec() from the CXL MCE notify callback.
Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
arch/x86/include/asm/mce.h | 1 +
arch/x86/mm/pat/set_memory.c | 1 +
drivers/cxl/core/mbox.c | 45 ++++++++++++++++++++++++++++++++++++
drivers/cxl/core/region.c | 25 ++++++++++++++++++++
drivers/cxl/cxl.h | 6 +++++
drivers/cxl/cxlmem.h | 2 ++
6 files changed, 80 insertions(+)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..5da45e870858 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -182,6 +182,7 @@ enum mce_notifier_prios {
MCE_PRIO_NFIT,
MCE_PRIO_EXTLOG,
MCE_PRIO_UC,
+ MCE_PRIO_CXL,
MCE_PRIO_EARLY,
MCE_PRIO_CEC,
MCE_PRIO_HIGHEST = MCE_PRIO_CEC
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 44f7b2ea6a07..1f85c29e118e 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2083,6 +2083,7 @@ int set_mce_nospec(unsigned long pfn)
pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
return rc;
}
+EXPORT_SYMBOL_GPL(set_mce_nospec);
/* Restore full speculative operation to the pfn. */
int clear_mce_nospec(unsigned long pfn)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index ac170fd85a1a..4488f30abc64 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -4,6 +4,9 @@
#include <linux/debugfs.h>
#include <linux/ktime.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/set_memory.h>
+#include <asm/mce.h>
#include <asm/unaligned.h>
#include <cxlpci.h>
#include <cxlmem.h>
@@ -1444,6 +1447,44 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
+static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
+ void *data)
+{
+ struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
+ mce_notifier);
+ struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct mce *mce = (struct mce *)data;
+ u64 spa, spa_alias;
+ unsigned long pfn;
+
+ if (!mce || !mce_usable_address(mce))
+ return NOTIFY_DONE;
+
+ spa = mce->addr & MCI_ADDR_PHYSADDR;
+
+ pfn = spa >> PAGE_SHIFT;
+ if (!pfn_valid(pfn))
+ return NOTIFY_DONE;
+
+ spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa);
+ if (!spa_alias)
+ return NOTIFY_DONE;
+
+ pfn = spa_alias >> PAGE_SHIFT;
+
+ /*
+ * Take down the aliased memory page. The original memory page flagged
+ * by the MCE will be taken cared of by the standard MCE handler.
+ */
+ dev_emerg(mds->cxlds.dev, "Offlining aliased SPA address: %#llx\n",
+ spa_alias);
+ if (!memory_failure(pfn, 0))
+ set_mce_nospec(pfn);
+
+ return NOTIFY_OK;
+}
+
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
{
struct cxl_memdev_state *mds;
@@ -1463,6 +1504,10 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+ mds->mce_notifier.notifier_call = cxl_handle_mce;
+ mds->mce_notifier.priority = MCE_PRIO_CXL;
+ mce_register_decode_chain(&mds->mce_notifier);
+
return mds;
}
EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, CXL);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c19bbbf8079d..a60af9763a95 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3420,6 +3420,31 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
}
EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL);
+u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa)
+{
+ struct cxl_region_ref *iter;
+ unsigned long index;
+
+ guard(rwsem_write)(&cxl_region_rwsem);
+
+ xa_for_each(&endpoint->regions, index, iter) {
+ struct cxl_region_params *p = &iter->region->params;
+
+ if (p->res->start <= spa && spa <= p->res->end) {
+ if (!p->cache_size)
+ return 0;
+
+ if (spa > p->res->start + p->cache_size)
+ return spa - p->cache_size;
+
+ return spa + p->cache_size;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_get_spa_cache_alias, CXL);
+
static int is_system_ram(struct resource *res, void *arg)
{
struct cxl_region *cxlr = arg;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index d8d715090779..8516f6da620c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -864,6 +864,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled);
struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
+u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
#else
static inline bool is_cxl_pmem_region(struct device *dev)
{
@@ -882,6 +883,11 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
{
return NULL;
}
+static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
+ u64 spa)
+{
+ return 0;
+}
#endif
void cxl_endpoint_parse_cdat(struct cxl_port *port);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index afb53d058d62..46515d2a49cb 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -470,6 +470,7 @@ struct cxl_dev_state {
* @poison: poison driver state info
* @security: security driver state info
* @fw: firmware upload / activation state
+ * @mce_notifier: MCE notifier
* @mbox_wait: RCU wait for mbox send completely
* @mbox_send: @dev specific transport for transmitting mailbox commands
*
@@ -500,6 +501,7 @@ struct cxl_memdev_state {
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
+ struct notifier_block mce_notifier;
struct rcuwait mbox_wait;
int (*mbox_send)(struct cxl_memdev_state *mds,
--
2.46.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-09-27 14:16 ` [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
@ 2024-10-17 16:40 ` Jonathan Cameron
2024-10-30 23:37 ` Dave Jiang
0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-10-17 16:40 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Fri, 27 Sep 2024 07:16:58 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Below is a setup with an extended linear cache configuration, with an example
> memory region layout shown below. It is presented as a single memory region
> consisting of 256G of memory: 128G of DRAM and 128G of CXL memory.
> The kernel sees a region of 256G total system memory.
>
> 128G DRAM 128G CXL memory
> |-----------------------------------|-------------------------------------|
>
> Data resides in either DRAM or far memory (FM) with no replication. Hot data
> is swapped into DRAM by the hardware behind the scenes. When an error is detected
> in one location, it is possible that the error also resides in the aliased
> location. Therefore, when a memory location flagged by MCE is part of
> the special region, the aliased memory location needs to be offlined as well.
>
> Add an mce notify callback to identify if the MCE address location is part of
> an extended linear cache region and handle accordingly.
>
> Added symbol export to set_mce_nospec() in x86 code in order to call
> set_mce_nospec() from the CXL MCE notify callback.
Whilst not commenting on whether any other implementation might exist,
this code should be written to be arch independent at some level.
>
> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-10-17 16:40 ` Jonathan Cameron
@ 2024-10-30 23:37 ` Dave Jiang
2024-10-31 21:12 ` Dave Jiang
0 siblings, 1 reply; 25+ messages in thread
From: Dave Jiang @ 2024-10-30 23:37 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 10/17/24 9:40 AM, Jonathan Cameron wrote:
> On Fri, 27 Sep 2024 07:16:58 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> Below is a setup with an extended linear cache configuration, with an example
>> memory region layout shown below. It is presented as a single memory region
>> consisting of 256G of memory: 128G of DRAM and 128G of CXL memory.
>> The kernel sees a region of 256G total system memory.
>>
>> 128G DRAM 128G CXL memory
>> |-----------------------------------|-------------------------------------|
>>
>> Data resides in either DRAM or far memory (FM) with no replication. Hot data
>> is swapped into DRAM by the hardware behind the scenes. When an error is detected
>> in one location, it is possible that the error also resides in the aliased
>> location. Therefore, when a memory location flagged by MCE is part of
>> the special region, the aliased memory location needs to be offlined as well.
>>
>> Add an mce notify callback to identify if the MCE address location is part of
>> an extended linear cache region and handle accordingly.
>>
>> Added symbol export to set_mce_nospec() in x86 code in order to call
>> set_mce_nospec() from the CXL MCE notify callback.
>
> Whilst not commenting on whether any other implementation might exist,
> this code should be written to be arch independent at some level.
I did get a 0-day report on this with mce bits. But with asm/mce.h included, it seems to make other archs happy as well AFAICT.
>
>>
>> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache
2024-10-30 23:37 ` Dave Jiang
@ 2024-10-31 21:12 ` Dave Jiang
0 siblings, 0 replies; 25+ messages in thread
From: Dave Jiang @ 2024-10-31 21:12 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 10/30/24 4:37 PM, Dave Jiang wrote:
>
>
> On 10/17/24 9:40 AM, Jonathan Cameron wrote:
>> On Fri, 27 Sep 2024 07:16:58 -0700
>> Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>> Below is a setup with an extended linear cache configuration, with an example
>>> memory region layout shown below. It is presented as a single memory region
>>> consisting of 256G of memory: 128G of DRAM and 128G of CXL memory.
>>> The kernel sees a region of 256G total system memory.
>>>
>>> 128G DRAM 128G CXL memory
>>> |-----------------------------------|-------------------------------------|
>>>
>>> Data resides in either DRAM or far memory (FM) with no replication. Hot data
>>> is swapped into DRAM by the hardware behind the scenes. When an error is detected
>>> in one location, it is possible that the error also resides in the aliased
>>> location. Therefore, when a memory location flagged by MCE is part of
>>> the special region, the aliased memory location needs to be offlined as well.
>>>
>>> Add an mce notify callback to identify if the MCE address location is part of
>>> an extended linear cache region and handle accordingly.
>>>
>>> Added symbol export to set_mce_nospec() in x86 code in order to call
>>> set_mce_nospec() from the CXL MCE notify callback.
>>
>> Whilst not commenting on whether any other implementation might exist,
>> this code should be written to be arch independent at some level.
>
> I did get a 0-day report on this with mce bits. But with asm/mce.h included, it seems to make other archs happy as well AFAICT.
Ok I was wrong. Arch wrappers needed to deal with the MCE bits only exist for x86.
>>
>>>
>>> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support
2024-09-27 14:16 [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
` (5 preceding siblings ...)
2024-09-27 14:16 ` [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
@ 2024-10-17 16:46 ` Jonathan Cameron
2024-10-29 22:55 ` Dave Jiang
6 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-10-17 16:46 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On Fri, 27 Sep 2024 07:16:52 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Hi all,
> I'm looking for comments on the approach and the implementation of dealing with
> this exclusive caching configuration. I have concerns with the discovering and
> handling of I/O hole in the memory mapping and looking for suggestions on if
> there are better ways to do it. I will be taking a 4 weeks sabbatical starting
> next week and I apologize in advance in the delay on responses. Thank you in
> advance for reviewing the patches.
>
> The MCE folks will be interested in patch 6/6 where MCE_PRIO_CXL is added.
>
>
> Certain systems provide an exclusive caching memory configurations where a
> 1:1 layout of DRAM and far memory (FR) such as CXL memory is utilized. In
(FM) at least that is what you use later.
> this configuration, the memory region is provided as a single memory region
> to the OS. For example such as below:
>
> 128GB DRAM 128GB CXL memory
> |------------------------------------|------------------------------------|
So this differs slightly from what I expected.
The ACPI spec change I believe allows for the CXL memory to be N times
bigger than the cache.
I'm not against only supporting 1:1, but I didn't immediately see code
to check for that and scream if it sees something different.
Also as I mention in one of the patches, I don't recall the ACPI stuff
giving an 'order' to the two types of memory. Maybe I'm missing that
but in theory at least I think the code needs to be more flexible
(or renamed perhaps).
Jonathan
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support
2024-10-17 16:46 ` [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Jonathan Cameron
@ 2024-10-29 22:55 ` Dave Jiang
0 siblings, 0 replies; 25+ messages in thread
From: Dave Jiang @ 2024-10-29 22:55 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny
On 10/17/24 9:46 AM, Jonathan Cameron wrote:
> On Fri, 27 Sep 2024 07:16:52 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> Hi all,
>> I'm looking for comments on the approach and the implementation of dealing with
>> this exclusive caching configuration. I have concerns with the discovering and
>> handling of I/O hole in the memory mapping and looking for suggestions on if
>> there are better ways to do it. I will be taking a 4 weeks sabbatical starting
>> next week and I apologize in advance in the delay on responses. Thank you in
>> advance for reviewing the patches.
>>
>> The MCE folks will be interested in patch 6/6 where MCE_PRIO_CXL is added.
>>
>>
>> Certain systems provide an exclusive caching memory configurations where a
>> 1:1 layout of DRAM and far memory (FR) such as CXL memory is utilized. In
> (FM) at least that is what you use later.
>
>
>> this configuration, the memory region is provided as a single memory region
>> to the OS. For example such as below:
>>
>> 128GB DRAM 128GB CXL memory
>> |------------------------------------|------------------------------------|
>
> So this differs slightly from what I expected.
> The ACPI spec change I believe allows for the CXL memory to be N times
> bigger than the cache.
Right. The spec allows that. The implementation I'm dealing with is only 1:1, so it's a limited implementation for now.
>
> I'm not against only supporting 1:1, but I didn't immediately see code
> to check for that and scream if it sees something different.
Yes. I need to add detection for that and emit a warning.
>
> Also as I mention in one of the patches, I don't recall the ACPI stuff
> giving an 'order' to the two types of memory. Maybe I'm missing that
> but in theory at least I think the code needs to be more flexible
> (or renamed perhaps).
Yes, there is no requirement by the spec on the ordering, just the available implementation.
>
> Jonathan
>
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread