* [PATCH v3 1/4] acpi: numa: Add support to enumerate and store extended linear address mode
2025-01-17 17:28 [PATCH v3 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
@ 2025-01-17 17:28 ` Dave Jiang
2025-02-21 1:42 ` Alison Schofield
2025-01-17 17:28 ` [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Dave Jiang @ 2025-01-17 17:28 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny, ming.li, Jonathan Cameron
Store the address mode as part of the cache attributes. Export the mode
attribute to sysfs like all other cache attributes.
Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
Documentation/ABI/stable/sysfs-devices-node | 6 ++++++
drivers/acpi/numa/hmat.c | 5 +++++
drivers/base/node.c | 2 ++
include/linux/node.h | 7 +++++++
4 files changed, 20 insertions(+)
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 402af4b2b905..c46b910dfe00 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -177,6 +177,12 @@ Description:
The cache write policy: 0 for write-back, 1 for write-through,
other or unknown.
+What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/address_mode
+Date: December 2024
+Contact: Dave Jiang <dave.jiang@intel.com>
+Description:
+ The address mode: 0 for reserved, 1 for extended-linear.
+
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 80a3481c0470..a9172cf90002 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -506,6 +506,11 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
case ACPI_HMAT_CA_DIRECT_MAPPED:
tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
+ /* Extended Linear mode is only valid if cache is direct mapped */
+ if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR) {
+ tcache->cache_attrs.address_mode =
+ NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR;
+ }
break;
case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ea653fa3433..cd13ef287011 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -244,12 +244,14 @@ CACHE_ATTR(size, "%llu")
CACHE_ATTR(line_size, "%u")
CACHE_ATTR(indexing, "%u")
CACHE_ATTR(write_policy, "%u")
+CACHE_ATTR(address_mode, "%#x")
static struct attribute *cache_attrs[] = {
&dev_attr_indexing.attr,
&dev_attr_size.attr,
&dev_attr_line_size.attr,
&dev_attr_write_policy.attr,
+ &dev_attr_address_mode.attr,
NULL,
};
ATTRIBUTE_GROUPS(cache);
diff --git a/include/linux/node.h b/include/linux/node.h
index 9a881c2208b3..2b7517892230 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -57,6 +57,11 @@ enum cache_write_policy {
NODE_CACHE_WRITE_OTHER,
};
+enum cache_mode {
+ NODE_CACHE_ADDR_MODE_RESERVED,
+ NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR,
+};
+
/**
* struct node_cache_attrs - system memory caching attributes
*
@@ -65,6 +70,7 @@ enum cache_write_policy {
* @size: Total size of cache in bytes
* @line_size: Number of bytes fetched on a cache miss
* @level: The cache hierarchy level
+ * @address_mode: The address mode
*/
struct node_cache_attrs {
enum cache_indexing indexing;
@@ -72,6 +78,7 @@ struct node_cache_attrs {
u64 size;
u16 line_size;
u8 level;
+ u16 address_mode;
};
#ifdef CONFIG_HMEM_REPORTING
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH v3 1/4] acpi: numa: Add support to enumerate and store extended linear address mode
2025-01-17 17:28 ` [PATCH v3 1/4] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
@ 2025-02-21 1:42 ` Alison Schofield
2025-02-21 23:45 ` Dave Jiang
0 siblings, 1 reply; 14+ messages in thread
From: Alison Schofield @ 2025-02-21 1:42 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, jonathan.cameron, ira.weiny, ming.li
On Fri, Jan 17, 2025 at 10:28:30AM -0700, Dave Jiang wrote:
> Store the address mode as part of the cache attriutes. Export the mode
> attribute to sysfs as all other cache attributes.
>
> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> Documentation/ABI/stable/sysfs-devices-node | 6 ++++++
> drivers/acpi/numa/hmat.c | 5 +++++
> drivers/base/node.c | 2 ++
> include/linux/node.h | 7 +++++++
> 4 files changed, 20 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 402af4b2b905..c46b910dfe00 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -177,6 +177,12 @@ Description:
> The cache write policy: 0 for write-back, 1 for write-through,
> other or unknown.
>
> +What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/address_mode
> +Date: December 2024
> +Contact: Dave Jiang <dave.jiang@intel.com>
> +Description:
> + The address mode: 0 for reserved, 1 for extended-linear.
> +
I was going to say something about the brevity of the description,
but when I looked in the file, I see this is like all the other
memory_side_cache descriptions.
So - I'll just say - update that Date :)
> What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> Date: November 2021
> Contact: Jarkko Sakkinen <jarkko@kernel.org>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 80a3481c0470..a9172cf90002 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -506,6 +506,11 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
> switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
> case ACPI_HMAT_CA_DIRECT_MAPPED:
> tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
> + /* Extended Linear mode is only valid if cache is direct mapped */
> + if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR) {
> + tcache->cache_attrs.address_mode =
> + NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR;
> + }
> break;
> case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
> tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 0ea653fa3433..cd13ef287011 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -244,12 +244,14 @@ CACHE_ATTR(size, "%llu")
> CACHE_ATTR(line_size, "%u")
> CACHE_ATTR(indexing, "%u")
> CACHE_ATTR(write_policy, "%u")
> +CACHE_ATTR(address_mode, "%#x")
why not "%u" fmt ?
>
> static struct attribute *cache_attrs[] = {
> &dev_attr_indexing.attr,
> &dev_attr_size.attr,
> &dev_attr_line_size.attr,
> &dev_attr_write_policy.attr,
> + &dev_attr_address_mode.attr,
> NULL,
> };
> ATTRIBUTE_GROUPS(cache);
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 9a881c2208b3..2b7517892230 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -57,6 +57,11 @@ enum cache_write_policy {
> NODE_CACHE_WRITE_OTHER,
> };
>
> +enum cache_mode {
> + NODE_CACHE_ADDR_MODE_RESERVED,
> + NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR,
> +};
> +
> /**
> * struct node_cache_attrs - system memory caching attributes
> *
> @@ -65,6 +70,7 @@ enum cache_write_policy {
> * @size: Total size of cache in bytes
> * @line_size: Number of bytes fetched on a cache miss
> * @level: The cache hierarchy level
> + * @address_mode: The address mode
> */
> struct node_cache_attrs {
> enum cache_indexing indexing;
> @@ -72,6 +78,7 @@ struct node_cache_attrs {
> u64 size;
> u16 line_size;
> u8 level;
> + u16 address_mode;
> };
>
> #ifdef CONFIG_HMEM_REPORTING
> --
> 2.47.1
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v3 1/4] acpi: numa: Add support to enumerate and store extended linear address mode
2025-02-21 1:42 ` Alison Schofield
@ 2025-02-21 23:45 ` Dave Jiang
0 siblings, 0 replies; 14+ messages in thread
From: Dave Jiang @ 2025-02-21 23:45 UTC (permalink / raw)
To: Alison Schofield
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, jonathan.cameron, ira.weiny, ming.li
On 2/20/25 6:42 PM, Alison Schofield wrote:
> On Fri, Jan 17, 2025 at 10:28:30AM -0700, Dave Jiang wrote:
>> Store the address mode as part of the cache attriutes. Export the mode
>> attribute to sysfs as all other cache attributes.
>>
>> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> Documentation/ABI/stable/sysfs-devices-node | 6 ++++++
>> drivers/acpi/numa/hmat.c | 5 +++++
>> drivers/base/node.c | 2 ++
>> include/linux/node.h | 7 +++++++
>> 4 files changed, 20 insertions(+)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
>> index 402af4b2b905..c46b910dfe00 100644
>> --- a/Documentation/ABI/stable/sysfs-devices-node
>> +++ b/Documentation/ABI/stable/sysfs-devices-node
>> @@ -177,6 +177,12 @@ Description:
>> The cache write policy: 0 for write-back, 1 for write-through,
>> other or unknown.
>>
>> +What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/address_mode
>> +Date: December 2024
>> +Contact: Dave Jiang <dave.jiang@intel.com>
>> +Description:
>> + The address mode: 0 for reserved, 1 for extended-linear.
>> +
>
> I was going to say something about the brevity of the description,
> but when I looked in the file, I see this is like all the other
> memory_side_cache descriptions.
>
> So - I'll just say - update that Date :)
>
>
>> What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
>> Date: November 2021
>> Contact: Jarkko Sakkinen <jarkko@kernel.org>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>> index 80a3481c0470..a9172cf90002 100644
>> --- a/drivers/acpi/numa/hmat.c
>> +++ b/drivers/acpi/numa/hmat.c
>> @@ -506,6 +506,11 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header,
>> switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) {
>> case ACPI_HMAT_CA_DIRECT_MAPPED:
>> tcache->cache_attrs.indexing = NODE_CACHE_DIRECT_MAP;
>> + /* Extended Linear mode is only valid if cache is direct mapped */
>> + if (cache->address_mode == ACPI_HMAT_CACHE_MODE_EXTENDED_LINEAR) {
>> + tcache->cache_attrs.address_mode =
>> + NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR;
>> + }
>> break;
>> case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING:
>> tcache->cache_attrs.indexing = NODE_CACHE_INDEXED;
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 0ea653fa3433..cd13ef287011 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -244,12 +244,14 @@ CACHE_ATTR(size, "%llu")
>> CACHE_ATTR(line_size, "%u")
>> CACHE_ATTR(indexing, "%u")
>> CACHE_ATTR(write_policy, "%u")
>> +CACHE_ATTR(address_mode, "%#x")
>
> why not "%u" fmt ?
It's a bitfield value and not decimal.
DJ
>
>>
>> static struct attribute *cache_attrs[] = {
>> &dev_attr_indexing.attr,
>> &dev_attr_size.attr,
>> &dev_attr_line_size.attr,
>> &dev_attr_write_policy.attr,
>> + &dev_attr_address_mode.attr,
>> NULL,
>> };
>> ATTRIBUTE_GROUPS(cache);
>> diff --git a/include/linux/node.h b/include/linux/node.h
>> index 9a881c2208b3..2b7517892230 100644
>> --- a/include/linux/node.h
>> +++ b/include/linux/node.h
>> @@ -57,6 +57,11 @@ enum cache_write_policy {
>> NODE_CACHE_WRITE_OTHER,
>> };
>>
>> +enum cache_mode {
>> + NODE_CACHE_ADDR_MODE_RESERVED,
>> + NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR,
>> +};
>> +
>> /**
>> * struct node_cache_attrs - system memory caching attributes
>> *
>> @@ -65,6 +70,7 @@ enum cache_write_policy {
>> * @size: Total size of cache in bytes
>> * @line_size: Number of bytes fetched on a cache miss
>> * @level: The cache hierarchy level
>> + * @address_mode: The address mode
>> */
>> struct node_cache_attrs {
>> enum cache_indexing indexing;
>> @@ -72,6 +78,7 @@ struct node_cache_attrs {
>> u64 size;
>> u16 line_size;
>> u8 level;
>> + u16 address_mode;
>> };
>>
>> #ifdef CONFIG_HMEM_REPORTING
>> --
>> 2.47.1
>>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL
2025-01-17 17:28 [PATCH v3 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
2025-01-17 17:28 ` [PATCH v3 1/4] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
@ 2025-01-17 17:28 ` Dave Jiang
2025-02-21 3:09 ` Alison Schofield
2025-01-17 17:28 ` [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
2025-01-17 17:28 ` [PATCH v3 4/4] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
3 siblings, 1 reply; 14+ messages in thread
From: Dave Jiang @ 2025-01-17 17:28 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny, ming.li, Jonathan Cameron
The current cxl region size only indicates the size of the CXL memory
region without accounting for the extended linear cache size. Retrieve the
cache size from HMAT and append that to the cxl region size for the cxl
region range that matches the SRAT range that has extended linear cache
enabled.
The SRAT defines the whole memory range that includes the extended linear
cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
Cache Information Structure defines the size of the extended linear
cache and matches the SRAT Memory Affinity Structure by the memory
proximity domain. Add a helper to match the cxl range to the SRAT memory
range in order to retrieve the cache size.
There are several places that check the cxl region range against the
decoder range. Use the new helper to check between the two ranges and
account for the new cache size.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/acpi/numa/hmat.c | 39 +++++++++++++++++++++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/acpi.c | 11 ++++++
drivers/cxl/core/core.h | 3 ++
drivers/cxl/core/region.c | 73 ++++++++++++++++++++++++++++++++++++---
drivers/cxl/cxl.h | 2 ++
include/linux/acpi.h | 11 ++++++
tools/testing/cxl/Kbuild | 1 +
8 files changed, 137 insertions(+), 4 deletions(-)
create mode 100644 drivers/cxl/core/acpi.c
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index a9172cf90002..6a210abb4a32 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -108,6 +108,45 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
return NULL;
}
+/**
+ * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
+ * @backing_res: resource from the backing media
+ * @nid: node id for the memory region
+ * @cache_size: (Output) size of extended linear cache.
+ *
+ * Return: 0 on success. Errno on failure.
+ *
+ */
+int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
+ resource_size_t *cache_size)
+{
+ unsigned int pxm = node_to_pxm(nid);
+ struct memory_target *target;
+ struct target_cache *tcache;
+ struct resource *res;
+
+ target = find_mem_target(pxm);
+ if (!target)
+ return -ENOENT;
+
+ list_for_each_entry(tcache, &target->caches, node) {
+ if (tcache->cache_attrs.address_mode !=
+ NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR)
+ continue;
+
+ res = &target->memregions;
+ if (!resource_contains(res, backing_res))
+ continue;
+
+ *cache_size = tcache->cache_attrs.size;
+ return 0;
+ }
+
+ *cache_size = 0;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(hmat_get_extended_linear_cache_size, "CXL");
+
static struct memory_target *acpi_find_genport_target(u32 uid)
{
struct memory_target *target;
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..1a0c9c6ca818 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -14,5 +14,6 @@ cxl_core-y += pci.o
cxl_core-y += hdm.o
cxl_core-y += pmu.o
cxl_core-y += cdat.o
+cxl_core-y += acpi.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
new file mode 100644
index 000000000000..f13b4dae6ac5
--- /dev/null
+++ b/drivers/cxl/core/acpi.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
+#include <linux/acpi.h>
+#include "cxl.h"
+#include "core.h"
+
+int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size)
+{
+ return hmat_get_extended_linear_cache_size(backing_res, nid, size);
+}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 800466f96a68..0fb779b612d1 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -115,4 +115,7 @@ bool cxl_need_node_perf_attrs_update(int nid);
int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
struct access_coordinate *c);
+int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size);
+
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b98b1ccffd1c..2d8699a86b24 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -824,6 +824,21 @@ static int match_free_decoder(struct device *dev, void *data)
return 1;
}
+static bool region_res_match_cxl_range(struct cxl_region_params *p,
+ struct range *range)
+{
+ if (!p->res)
+ return false;
+
+ /*
+ * If an extended linear cache region then the CXL range is assumed
+ * to be fronted by the DRAM range in current known implementation.
+ * This assumption will be made until a variant implementation exists.
+ */
+ return p->res->start + p->cache_size == range->start &&
+ p->res->end == range->end;
+}
+
static int match_auto_decoder(struct device *dev, void *data)
{
struct cxl_region_params *p = data;
@@ -836,7 +851,7 @@ static int match_auto_decoder(struct device *dev, void *data)
cxld = to_cxl_decoder(dev);
r = &cxld->hpa_range;
- if (p->res && p->res->start == r->start && p->res->end == r->end)
+ if (region_res_match_cxl_range(p, r))
return 1;
return 0;
@@ -1424,8 +1439,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
if (cxld->interleave_ways != iw ||
cxld->interleave_granularity != ig ||
- cxld->hpa_range.start != p->res->start ||
- cxld->hpa_range.end != p->res->end ||
+ !region_res_match_cxl_range(p, &cxld->hpa_range) ||
((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
dev_err(&cxlr->dev,
"%s:%s %s expected iw: %d ig: %d %pr\n",
@@ -1949,7 +1963,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return -ENXIO;
}
- if (resource_size(cxled->dpa_res) * p->interleave_ways !=
+ if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
resource_size(p->res)) {
dev_dbg(&cxlr->dev,
"%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
@@ -3221,6 +3235,45 @@ static int match_region_by_range(struct device *dev, void *data)
return rc;
}
+static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
+ struct resource *res)
+{
+ struct cxl_region_params *p = &cxlr->params;
+ int nid = phys_to_target_node(res->start);
+ resource_size_t size, cache_size;
+ int rc;
+
+ size = resource_size(res);
+ if (!size)
+ return -EINVAL;
+
+ rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
+ if (rc)
+ return rc;
+
+ if (!cache_size)
+ return 0;
+
+ if (size != cache_size) {
+ dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!");
+ return -EOPNOTSUPP;
+ }
+
+ /*
+ * Move the start of the range to where the cache range starts. The
+ * implementation assumes that the cache range is in front of the
+ * CXL range. This is not dictated by the HMAT spec but is how the
+ * current known implementation is configured.
+ *
+ * The cache range is expected to be within the CFMWS. The adjusted
+ * res->start should not be less than cxlrd->res->start.
+ */
+ res->start -= cache_size;
+ p->cache_size = cache_size;
+
+ return 0;
+}
+
/* Establish an empty region covering the given HPA range */
static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
struct cxl_endpoint_decoder *cxled)
@@ -3267,6 +3320,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
dev_name(&cxlr->dev));
+
+ rc = cxl_extended_linear_cache_resize(cxlr, res);
+ if (rc) {
+ /*
+ * Failing to support extended linear cache region resize does not
+ * prevent the region from functioning. Only causes cxl list showing
+ * incorrect region size.
+ */
+ dev_warn(cxlmd->dev.parent,
+ "Failed to support extended linear cache.\n");
+ }
+
rc = insert_resource(cxlrd->res, res);
if (rc) {
/*
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f6015f24ad38..6a1fb784f74a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -492,6 +492,7 @@ enum cxl_config_state {
* @res: allocated iomem capacity for this region
* @targets: active ordered targets in current decoder configuration
* @nr_targets: number of targets
+ * @cache_size: extended linear cache size if exists, otherwise zero.
*
* State transitions are protected by the cxl_region_rwsem
*/
@@ -503,6 +504,7 @@ struct cxl_region_params {
struct resource *res;
struct cxl_endpoint_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
int nr_targets;
+ resource_size_t cache_size;
};
/*
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 6adcd1b92b20..1bf5368337bc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1090,6 +1090,17 @@ static inline acpi_handle acpi_get_processor_handle(int cpu)
#endif /* !CONFIG_ACPI */
+#ifdef CONFIG_ACPI_HMAT
+int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
+ resource_size_t *size);
+#else
+static inline int hmat_get_extended_linear_cache_size(struct resource *backing_res,
+ int nid, resource_size_t *size)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
extern void arch_post_acpi_subsys_init(void);
#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index b1256fee3567..1ae13987a8a2 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -61,6 +61,7 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o
cxl_core-y += $(CXL_CORE_SRC)/hdm.o
cxl_core-y += $(CXL_CORE_SRC)/pmu.o
cxl_core-y += $(CXL_CORE_SRC)/cdat.o
+cxl_core-y += $(CXL_CORE_SRC)/acpi.o
cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
cxl_core-y += config_check.o
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL
2025-01-17 17:28 ` [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
@ 2025-02-21 3:09 ` Alison Schofield
2025-02-22 0:13 ` Dave Jiang
0 siblings, 1 reply; 14+ messages in thread
From: Alison Schofield @ 2025-02-21 3:09 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, jonathan.cameron, ira.weiny, ming.li
On Fri, Jan 17, 2025 at 10:28:31AM -0700, Dave Jiang wrote:
> The current cxl region size only indicates the size of the CXL memory
> region without accounting for the extended linear cache size. Retrieve the
> cache size from HMAT and append that to the cxl region size for the cxl
> region range that matches the SRAT range that has extended linear cache
> enabled.
>
> The SRAT defines the whole memory range that includes the extended linear
> cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
> Cache Information Structure defines the size of the extended linear cache
> size and matches to the SRAT Memory Affinity Structure by the memory
> proxmity domain. Add a helper to match the cxl range to the SRAT memory
> range in order to retrieve the cache size.
>
> There are several places that checks the cxl region range against the
> decoder range. Use new helper to check between the two ranges and address
> the new cache size.
This reads like we are inflating the region size by cache size, and then
changing region set up code to account for the inflation. So, I'm going
to question where we need to do that inflation.
When the new region param p->cache_size is calculated it is added directly
to the p->res and that leads to much of the other work in region.c
Could p->cache_size be used as an addend when needed, like:
- Add it to the insert_resource in construct_region().
- Add it to the sysfs show's for region resource start and resource size.
Then when we get to dpa to hpa address translation, the p->res start
doesn't need adjusting either. As it is now, it's the cache start
and I think it should be the cxl resource start.
The touchpoints may grow in the direction I'm suggesting that make
it a poorer choice than what is here now. Maybe its time for the
something like a cxl_resource and a non_cxl_resource that add together
to make the region_resource.
I haven't been following this patch set all along, just started looking
yesterday, so I'm prepared to be way off base. Figure blurting it out
at this point is the faster path forward.
More comments related below...
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
snip
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
snip
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index b98b1ccffd1c..2d8699a86b24 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -824,6 +824,21 @@ static int match_free_decoder(struct device *dev, void *data)
> return 1;
> }
>
> +static bool region_res_match_cxl_range(struct cxl_region_params *p,
> + struct range *range)
> +{
> + if (!p->res)
> + return false;
> +
> + /*
> + * If an extended linear cache region then the CXL range is assumed
> + * to be fronted by the DRAM range in current known implementation.
> + * This assumption will be made until a variant implementation exists.
> + */
> + return p->res->start + p->cache_size == range->start &&
> + p->res->end == range->end;
> +}
> +
> static int match_auto_decoder(struct device *dev, void *data)
> {
> struct cxl_region_params *p = data;
> @@ -836,7 +851,7 @@ static int match_auto_decoder(struct device *dev, void *data)
> cxld = to_cxl_decoder(dev);
> r = &cxld->hpa_range;
>
> - if (p->res && p->res->start == r->start && p->res->end == r->end)
> + if (region_res_match_cxl_range(p, r))
> return 1;
if we don't change p->res directly, this isn't needed.
> return 0;
> @@ -1424,8 +1439,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
> if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> if (cxld->interleave_ways != iw ||
> cxld->interleave_granularity != ig ||
> - cxld->hpa_range.start != p->res->start ||
> - cxld->hpa_range.end != p->res->end ||
> + !region_res_match_cxl_range(p, &cxld->hpa_range) ||
similar
> ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
> dev_err(&cxlr->dev,
> "%s:%s %s expected iw: %d ig: %d %pr\n",
> @@ -1949,7 +1963,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return -ENXIO;
> }
>
> - if (resource_size(cxled->dpa_res) * p->interleave_ways !=
> + if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
> resource_size(p->res)) {
similar
> dev_dbg(&cxlr->dev,
> "%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
> @@ -3221,6 +3235,45 @@ static int match_region_by_range(struct device *dev, void *data)
> return rc;
> }
>
> +static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
> + struct resource *res)
> +{
> + struct cxl_region_params *p = &cxlr->params;
> + int nid = phys_to_target_node(res->start);
> + resource_size_t size, cache_size;
> + int rc;
> +
> + size = resource_size(res);
> + if (!size)
> + return -EINVAL;
> +
> + rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
> + if (rc)
> + return rc;
> +
> + if (!cache_size)
> + return 0;
> +
> + if (size != cache_size) {
> + dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!");
> + return -EOPNOTSUPP;
> + }
> +
> + /*
> + * Move the start of the range to where the cache range starts. The
> + * implementation assumes that the cache range is in front of the
> + * CXL range. This is not dictated by the HMAT spec but is how the
> + * current known implementation is configured.
> + *
> + * The cache range is expected to be within the CFMWS. The adjusted
> + * res->start should not be less than cxlrd->res->start.
Check for 'cache range is expected to be within the CFMWS' ?
> + */
> + res->start -= cache_size;
> + p->cache_size = cache_size;
> +
> + return 0;
> +}
> +
> /* Establish an empty region covering the given HPA range */
> static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> struct cxl_endpoint_decoder *cxled)
> @@ -3267,6 +3320,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>
> *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
> dev_name(&cxlr->dev));
> +
> + rc = cxl_extended_linear_cache_resize(cxlr, res);
> + if (rc) {
> + /*
> + * Failing to support extended linear cache region resize does not
> + * prevent the region from functioning. Only causes cxl list showing
> + * incorrect region size.
Also cxlr_hpa_cache_alias() lookups will fail for cxl events, so no
hpa_alias in trace events.
> + */
> + dev_warn(cxlmd->dev.parent,
> + "Failed to support extended linear cache.\n");
Maybe more specifics of what is/isn't present.
> + }
> +
> rc = insert_resource(cxlrd->res, res);
Cut off in this diff is the "p->res = res" assignment that follows,
which then makes all the previous changes regarding matching decoder
ranges necessary.
> if (rc) {
> /*
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
snip
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
snip
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
snip
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL
2025-02-21 3:09 ` Alison Schofield
@ 2025-02-22 0:13 ` Dave Jiang
0 siblings, 0 replies; 14+ messages in thread
From: Dave Jiang @ 2025-02-22 0:13 UTC (permalink / raw)
To: Alison Schofield
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, jonathan.cameron, ira.weiny, ming.li
On 2/20/25 8:09 PM, Alison Schofield wrote:
> On Fri, Jan 17, 2025 at 10:28:31AM -0700, Dave Jiang wrote:
>> The current cxl region size only indicates the size of the CXL memory
>> region without accounting for the extended linear cache size. Retrieve the
>> cache size from HMAT and append that to the cxl region size for the cxl
>> region range that matches the SRAT range that has extended linear cache
>> enabled.
>>
>> The SRAT defines the whole memory range that includes the extended linear
>> cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
>> Cache Information Structure defines the size of the extended linear cache
>> size and matches to the SRAT Memory Affinity Structure by the memory
>> proximity domain. Add a helper to match the cxl range to the SRAT memory
>> range in order to retrieve the cache size.
>>
>> There are several places that check the cxl region range against the
>> decoder range. Use new helper to check between the two ranges and address
>> the new cache size.
>
> This reads like we are inflating the region size by cache size, and then
> changing region set up code to account for the inflation. So, I'm going
> to question where we need to do that inflation.
>
> When the new region param p->cache_size is calculated it is added directly
> to the p->res and that leads to much of the other work in region.c
>
> Could p->cache_size be used as an addend when needed, like:
> - Add it to the insert_resource in construct_region().
> - Add it to the sysfs show's for region resource start and resource size.
>
> Then when we get to dpa to hpa address translation, the p->res start
> doesn't need adjusting either. As it is now, it's the cache start
> and I think it should be the cxl resource start.
>
> The touchpoints may grow in the direction I'm suggesting that make
> it a poorer choice than what is here now. Maybe its time for the
> something like a cxl_resource and a non_cxl_resource that add together
> to make the region_resource.
>
> I haven't been following this patch set all along, just started looking
> yesterday, so I'm prepared to be way off base. Figure blurting it out
> at this point is the faster path forward.
>
> More comments related below...
>
>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> snip
>
>> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> snip
>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index b98b1ccffd1c..2d8699a86b24 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -824,6 +824,21 @@ static int match_free_decoder(struct device *dev, void *data)
>> return 1;
>> }
>>
>> +static bool region_res_match_cxl_range(struct cxl_region_params *p,
>> + struct range *range)
>> +{
>> + if (!p->res)
>> + return false;
>> +
>> + /*
>> + * If an extended linear cache region then the CXL range is assumed
>> + * to be fronted by the DRAM range in current known implementation.
>> + * This assumption will be made until a variant implementation exists.
>> + */
>> + return p->res->start + p->cache_size == range->start &&
>> + p->res->end == range->end;
>> +}
>> +
>> static int match_auto_decoder(struct device *dev, void *data)
>> {
>> struct cxl_region_params *p = data;
>> @@ -836,7 +851,7 @@ static int match_auto_decoder(struct device *dev, void *data)
>> cxld = to_cxl_decoder(dev);
>> r = &cxld->hpa_range;
>>
>> - if (p->res && p->res->start == r->start && p->res->end == r->end)
>> + if (region_res_match_cxl_range(p, r))
>> return 1;
>
> if we don't change p->res directly, this isn't needed.
It does get changed so it's needed. A lot of these changes are done after tripping setup failures during testing and debugging.
>
>> return 0;
>> @@ -1424,8 +1439,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>> if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
>> if (cxld->interleave_ways != iw ||
>> cxld->interleave_granularity != ig ||
>> - cxld->hpa_range.start != p->res->start ||
>> - cxld->hpa_range.end != p->res->end ||
>> + !region_res_match_cxl_range(p, &cxld->hpa_range) ||
>
> similar
>
>> ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
>> dev_err(&cxlr->dev,
>> "%s:%s %s expected iw: %d ig: %d %pr\n",
>> @@ -1949,7 +1963,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>> return -ENXIO;
>> }
>>
>> - if (resource_size(cxled->dpa_res) * p->interleave_ways !=
>> + if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
>> resource_size(p->res)) {
>
> similar
>
>> dev_dbg(&cxlr->dev,
>> "%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n",
>> @@ -3221,6 +3235,45 @@ static int match_region_by_range(struct device *dev, void *data)
>> return rc;
>> }
>>
>> +static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
>> + struct resource *res)
>> +{
>> + struct cxl_region_params *p = &cxlr->params;
>> + int nid = phys_to_target_node(res->start);
>> + resource_size_t size, cache_size;
>> + int rc;
>> +
>> + size = resource_size(res);
>> + if (!size)
>> + return -EINVAL;
>> +
>> + rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
>> + if (rc)
>> + return rc;
>> +
>> + if (!cache_size)
>> + return 0;
>> +
>> + if (size != cache_size) {
>> + dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!");
>> + return -EOPNOTSUPP;
>> + }
>> +
>> + /*
>> + * Move the start of the range to where the cache range starts. The
>> + * implementation assumes that the cache range is in front of the
>> + * CXL range. This is not dictated by the HMAT spec but is how the
>> + * current known implementation is configured.
>> + *
>> + * The cache range is expected to be within the CFMWS. The adjusted
>> + * res->start should not be less than cxlrd->res->start.
>
> Check for 'cache range is expected to be within the CFMWS' ?
Will add
>
>
>> + */
>> + res->start -= cache_size;
>> + p->cache_size = cache_size;
>> +
>> + return 0;
>> +}
>> +
>> /* Establish an empty region covering the given HPA range */
>> static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> struct cxl_endpoint_decoder *cxled)
>> @@ -3267,6 +3320,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>
>> *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
>> dev_name(&cxlr->dev));
>> +
>> + rc = cxl_extended_linear_cache_resize(cxlr, res);
>> + if (rc) {
>> + /*
>> + * Failing to support extended linear cache region resize does not
>> + * prevent the region from functioning. Only causes cxl list showing
>> + * incorrect region size.
>
> Also cxlr_hpa_cache_alias() lookups will fail for cxl events, so no
> hpa_alias in trace events.
Right. But it needs to report the near memory alias vs the CXL address. hpa_alias is used interchangeably and not necessarily specific to near or far memory.
>
>> + */
>> + dev_warn(cxlmd->dev.parent,
>> + "Failed to support extended linear cache.\n");
>
> Maybe more specifics of what is/isn't present.
It's just a general catch all for whatever failures from retrieving the cache size and calculate the start address.
>
>> + }
>> +
>> rc = insert_resource(cxlrd->res, res);
>
> Cut off in this diff is the "p->res = res" assignment that follows,
> which then makes all the previous changes regarding matching decoder
> ranges necessary.
yes
>
>
>> if (rc) {
>> /*
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> snip
>
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> snip
>
>> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> snip
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events
2025-01-17 17:28 [PATCH v3 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
2025-01-17 17:28 ` [PATCH v3 1/4] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
2025-01-17 17:28 ` [PATCH v3 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
@ 2025-01-17 17:28 ` Dave Jiang
2025-01-21 15:02 ` Jonathan Cameron
2025-02-21 1:30 ` Alison Schofield
2025-01-17 17:28 ` [PATCH v3 4/4] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
3 siblings, 2 replies; 14+ messages in thread
From: Dave Jiang @ 2025-01-17 17:28 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny, ming.li
Add the aliased address of extended linear cache when emitting event
trace for DRAM and general media of CXL events.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v3:
- Drop unused region to nid function
- Make sure hpa_alias defaults to ~0ULL. (Jonathan)
---
drivers/cxl/core/mbox.c | 28 ++++++++++++++++++++++++----
drivers/cxl/core/trace.h | 24 ++++++++++++++++--------
2 files changed, 40 insertions(+), 12 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 548564c770c0..f42c4c56dc43 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -856,6 +856,23 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, "CXL");
+static u64 cxlr_hpa_cache_alias(struct cxl_region *cxlr, u64 hpa)
+{
+ struct cxl_region_params *p;
+
+ if (!cxlr)
+ return ~0ULL;
+
+ p = &cxlr->params;
+ if (!p->cache_size)
+ return ~0ULL;
+
+ if (hpa >= p->res->start + p->cache_size)
+ return hpa - p->cache_size;
+
+ return hpa + p->cache_size;
+}
+
void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
enum cxl_event_type event_type,
@@ -871,7 +888,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
}
if (trace_cxl_general_media_enabled() || trace_cxl_dram_enabled()) {
- u64 dpa, hpa = ULLONG_MAX;
+ u64 dpa, hpa = ULLONG_MAX, hpa_alias = ~0ULL;
struct cxl_region *cxlr;
/*
@@ -884,14 +901,17 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
dpa = le64_to_cpu(evt->media_hdr.phys_addr) & CXL_DPA_MASK;
cxlr = cxl_dpa_to_region(cxlmd, dpa);
- if (cxlr)
+ if (cxlr) {
hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+ hpa_alias = cxlr_hpa_cache_alias(cxlr, hpa);
+ }
if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
- &evt->gen_media);
+ hpa_alias, &evt->gen_media);
else if (event_type == CXL_CPER_EVENT_DRAM)
- trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
+ trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
+ &evt->dram);
}
}
EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 8389a94adb1a..257f60f16e4c 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -316,9 +316,10 @@ TRACE_EVENT(cxl_generic_event,
TRACE_EVENT(cxl_general_media,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_region *cxlr, u64 hpa, struct cxl_event_gen_media *rec),
+ struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
+ struct cxl_event_gen_media *rec),
- TP_ARGS(cxlmd, log, cxlr, hpa, rec),
+ TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -332,6 +333,7 @@ TRACE_EVENT(cxl_general_media,
__array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
/* Following are out of order to pack trace record */
__field(u64, hpa)
+ __field(u64, hpa_alias)
__field_struct(uuid_t, region_uuid)
__field(u16, validity_flags)
__field(u8, rank)
@@ -358,6 +360,7 @@ TRACE_EVENT(cxl_general_media,
CXL_EVENT_GEN_MED_COMP_ID_SIZE);
__entry->validity_flags = get_unaligned_le16(&rec->media_hdr.validity_flags);
__entry->hpa = hpa;
+ __entry->hpa_alias = hpa_alias;
if (cxlr) {
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
@@ -370,7 +373,7 @@ TRACE_EVENT(cxl_general_media,
CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
"descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
"device=%x comp_id=%s validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
+ "hpa=%llx hpa_alias0=%llx region=%s region_uuid=%pUb",
__entry->dpa, show_dpa_flags(__entry->dpa_flags),
show_event_desc_flags(__entry->descriptor),
show_gmer_mem_event_type(__entry->type),
@@ -378,7 +381,8 @@ TRACE_EVENT(cxl_general_media,
__entry->channel, __entry->rank, __entry->device,
__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
show_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ __entry->hpa, __entry->hpa_alias, __get_str(region_name),
+ &__entry->region_uuid
)
);
@@ -424,9 +428,10 @@ TRACE_EVENT(cxl_general_media,
TRACE_EVENT(cxl_dram,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_region *cxlr, u64 hpa, struct cxl_event_dram *rec),
+ struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
+ struct cxl_event_dram *rec),
- TP_ARGS(cxlmd, log, cxlr, hpa, rec),
+ TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -442,6 +447,7 @@ TRACE_EVENT(cxl_dram,
__field(u32, row)
__array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
__field(u64, hpa)
+ __field(u64, hpa_alias)
__field_struct(uuid_t, region_uuid)
__field(u8, rank) /* Out of order to pack trace record */
__field(u8, bank_group) /* Out of order to pack trace record */
@@ -472,6 +478,7 @@ TRACE_EVENT(cxl_dram,
memcpy(__entry->cor_mask, &rec->correction_mask,
CXL_EVENT_DER_CORRECTION_MASK_SIZE);
__entry->hpa = hpa;
+ __entry->hpa_alias = hpa_alias;
if (cxlr) {
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
@@ -485,7 +492,7 @@ TRACE_EVENT(cxl_dram,
"transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
"bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
"validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
+ "hpa=%llx hpa_alias0=%llx region=%s region_uuid=%pUb",
__entry->dpa, show_dpa_flags(__entry->dpa_flags),
show_event_desc_flags(__entry->descriptor),
show_dram_mem_event_type(__entry->type),
@@ -495,7 +502,8 @@ TRACE_EVENT(cxl_dram,
__entry->row, __entry->column,
__print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
show_dram_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ __entry->hpa_alias, __entry->hpa, __get_str(region_name),
+ &__entry->region_uuid
)
);
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events
2025-01-17 17:28 ` [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
@ 2025-01-21 15:02 ` Jonathan Cameron
2025-02-21 1:30 ` Alison Schofield
1 sibling, 0 replies; 14+ messages in thread
From: Jonathan Cameron @ 2025-01-21 15:02 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, alison.schofield, ira.weiny, ming.li
On Fri, 17 Jan 2025 10:28:32 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> Add the aliased address of extended linear cache when emitting event
> trace for DRAM and general media of CXL events.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events
2025-01-17 17:28 ` [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
2025-01-21 15:02 ` Jonathan Cameron
@ 2025-02-21 1:30 ` Alison Schofield
2025-02-24 17:32 ` Dave Jiang
1 sibling, 1 reply; 14+ messages in thread
From: Alison Schofield @ 2025-02-21 1:30 UTC (permalink / raw)
To: Dave Jiang
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, jonathan.cameron, ira.weiny, ming.li
On Fri, Jan 17, 2025 at 10:28:32AM -0700, Dave Jiang wrote:
> Add the aliased address of extended linear cache when emitting event
> trace for DRAM and general media of CXL events.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> v3:
> - Drop unused region to nid function
> - Make sure hpa_alias defaults to ~0ULL. (Jonathan)
> ---
> drivers/cxl/core/mbox.c | 28 ++++++++++++++++++++++++----
> drivers/cxl/core/trace.h | 24 ++++++++++++++++--------
> 2 files changed, 40 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 548564c770c0..f42c4c56dc43 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -856,6 +856,23 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, "CXL");
>
> +static u64 cxlr_hpa_cache_alias(struct cxl_region *cxlr, u64 hpa)
> +{
> + struct cxl_region_params *p;
> +
> + if (!cxlr)
> + return ~0ULL;
> +
> + p = &cxlr->params;
> + if (!p->cache_size)
> + return ~0ULL;
> +
> + if (hpa >= p->res->start + p->cache_size)
> + return hpa - p->cache_size;
> +
> + return hpa + p->cache_size;
> +}
> +
> void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> enum cxl_event_log_type type,
> enum cxl_event_type event_type,
> @@ -871,7 +888,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> }
>
> if (trace_cxl_general_media_enabled() || trace_cxl_dram_enabled()) {
> - u64 dpa, hpa = ULLONG_MAX;
> + u64 dpa, hpa = ULLONG_MAX, hpa_alias = ~0ULL;
A bit odd to use 2 different notations for same thing.
Prefer ULLONG_MAX here and in previous function.
> struct cxl_region *cxlr;
>
> /*
> @@ -884,14 +901,17 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>
> dpa = le64_to_cpu(evt->media_hdr.phys_addr) & CXL_DPA_MASK;
> cxlr = cxl_dpa_to_region(cxlmd, dpa);
> - if (cxlr)
> + if (cxlr) {
> hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
> + hpa_alias = cxlr_hpa_cache_alias(cxlr, hpa);
> + }
>
> if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
> trace_cxl_general_media(cxlmd, type, cxlr, hpa,
> - &evt->gen_media);
> + hpa_alias, &evt->gen_media);
> else if (event_type == CXL_CPER_EVENT_DRAM)
> - trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
> + trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
> + &evt->dram);
> }
> }
> EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index 8389a94adb1a..257f60f16e4c 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -316,9 +316,10 @@ TRACE_EVENT(cxl_generic_event,
> TRACE_EVENT(cxl_general_media,
>
> TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
> - struct cxl_region *cxlr, u64 hpa, struct cxl_event_gen_media *rec),
> + struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
> + struct cxl_event_gen_media *rec),
>
> - TP_ARGS(cxlmd, log, cxlr, hpa, rec),
> + TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
>
> TP_STRUCT__entry(
> CXL_EVT_TP_entry
> @@ -332,6 +333,7 @@ TRACE_EVENT(cxl_general_media,
> __array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
> /* Following are out of order to pack trace record */
> __field(u64, hpa)
> + __field(u64, hpa_alias)
I saw Jonathan's ask for hpa_alias to be hpa_alias0 or something?
That's done in the CXL_EVT_TP_printk output below. It needs to be
used here in the field name to be picked up in the trace event.
(same for cxl_dram event)
But...what's the deal with hpa_alias0. If we anticipate an array
of aliases, then maybe an array like is done for comp_id would
work. Expect that's overkill at the moment.
> __field_struct(uuid_t, region_uuid)
> __field(u16, validity_flags)
> __field(u8, rank)
> @@ -358,6 +360,7 @@ TRACE_EVENT(cxl_general_media,
> CXL_EVENT_GEN_MED_COMP_ID_SIZE);
> __entry->validity_flags = get_unaligned_le16(&rec->media_hdr.validity_flags);
> __entry->hpa = hpa;
> + __entry->hpa_alias = hpa_alias;
> if (cxlr) {
> __assign_str(region_name);
> uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
> @@ -370,7 +373,7 @@ TRACE_EVENT(cxl_general_media,
> CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
> "descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
> "device=%x comp_id=%s validity_flags='%s' " \
> - "hpa=%llx region=%s region_uuid=%pUb",
> + "hpa=%llx hpa_alias0=%llx region=%s region_uuid=%pUb",
> __entry->dpa, show_dpa_flags(__entry->dpa_flags),
> show_event_desc_flags(__entry->descriptor),
> show_gmer_mem_event_type(__entry->type),
> @@ -378,7 +381,8 @@ TRACE_EVENT(cxl_general_media,
> __entry->channel, __entry->rank, __entry->device,
> __print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
> show_valid_flags(__entry->validity_flags),
> - __entry->hpa, __get_str(region_name), &__entry->region_uuid
> + __entry->hpa, __entry->hpa_alias, __get_str(region_name),
> + &__entry->region_uuid
> )
> );
>
> @@ -424,9 +428,10 @@ TRACE_EVENT(cxl_general_media,
> TRACE_EVENT(cxl_dram,
>
> TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
> - struct cxl_region *cxlr, u64 hpa, struct cxl_event_dram *rec),
> + struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
> + struct cxl_event_dram *rec),
>
> - TP_ARGS(cxlmd, log, cxlr, hpa, rec),
> + TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
>
> TP_STRUCT__entry(
> CXL_EVT_TP_entry
> @@ -442,6 +447,7 @@ TRACE_EVENT(cxl_dram,
> __field(u32, row)
> __array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
> __field(u64, hpa)
> + __field(u64, hpa_alias)
> __field_struct(uuid_t, region_uuid)
> __field(u8, rank) /* Out of order to pack trace record */
> __field(u8, bank_group) /* Out of order to pack trace record */
> @@ -472,6 +478,7 @@ TRACE_EVENT(cxl_dram,
> memcpy(__entry->cor_mask, &rec->correction_mask,
> CXL_EVENT_DER_CORRECTION_MASK_SIZE);
> __entry->hpa = hpa;
> + __entry->hpa_alias = hpa_alias;
> if (cxlr) {
> __assign_str(region_name);
> uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
> @@ -485,7 +492,7 @@ TRACE_EVENT(cxl_dram,
> "transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
> "bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
> "validity_flags='%s' " \
> - "hpa=%llx region=%s region_uuid=%pUb",
> + "hpa=%llx hpa_alias0=%llx region=%s region_uuid=%pUb",
> __entry->dpa, show_dpa_flags(__entry->dpa_flags),
> show_event_desc_flags(__entry->descriptor),
> show_dram_mem_event_type(__entry->type),
> @@ -495,7 +502,8 @@ TRACE_EVENT(cxl_dram,
> __entry->row, __entry->column,
> __print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
> show_dram_valid_flags(__entry->validity_flags),
> - __entry->hpa, __get_str(region_name), &__entry->region_uuid
> + __entry->hpa_alias, __entry->hpa, __get_str(region_name),
Needs swapping - hpa then hpa_alias
> + &__entry->region_uuid
> )
> );
>
> --
> 2.47.1
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events
2025-02-21 1:30 ` Alison Schofield
@ 2025-02-24 17:32 ` Dave Jiang
0 siblings, 0 replies; 14+ messages in thread
From: Dave Jiang @ 2025-02-24 17:32 UTC (permalink / raw)
To: Alison Schofield
Cc: linux-cxl, linux-acpi, rafael, bp, dan.j.williams, tony.luck,
dave, jonathan.cameron, ira.weiny, ming.li
On 2/20/25 6:30 PM, Alison Schofield wrote:
> On Fri, Jan 17, 2025 at 10:28:32AM -0700, Dave Jiang wrote:
>> Add the aliased address of extended linear cache when emitting event
>> trace for DRAM and general media of CXL events.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> v3:
>> - Drop unused region to nid function
>> - Make sure hpa_alias defaults to ~0ULL. (Jonathan)
>> ---
>> drivers/cxl/core/mbox.c | 28 ++++++++++++++++++++++++----
>> drivers/cxl/core/trace.h | 24 ++++++++++++++++--------
>> 2 files changed, 40 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index 548564c770c0..f42c4c56dc43 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -856,6 +856,23 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, "CXL");
>>
>> +static u64 cxlr_hpa_cache_alias(struct cxl_region *cxlr, u64 hpa)
>> +{
>> + struct cxl_region_params *p;
>> +
>> + if (!cxlr)
>> + return ~0ULL;
>> +
>> + p = &cxlr->params;
>> + if (!p->cache_size)
>> + return ~0ULL;
>> +
>> + if (hpa >= p->res->start + p->cache_size)
>> + return hpa - p->cache_size;
>> +
>> + return hpa + p->cache_size;
>> +}
>> +
>> void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> enum cxl_event_log_type type,
>> enum cxl_event_type event_type,
>> @@ -871,7 +888,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> }
>>
>> if (trace_cxl_general_media_enabled() || trace_cxl_dram_enabled()) {
>> - u64 dpa, hpa = ULLONG_MAX;
>> + u64 dpa, hpa = ULLONG_MAX, hpa_alias = ~0ULL;
>
> A bit odd to use 2 different notations for same thing.
> Prefer ULLONG_MAX here and in previous function.
Will fix
>
>
>> struct cxl_region *cxlr;
>>
>> /*
>> @@ -884,14 +901,17 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>>
>> dpa = le64_to_cpu(evt->media_hdr.phys_addr) & CXL_DPA_MASK;
>> cxlr = cxl_dpa_to_region(cxlmd, dpa);
>> - if (cxlr)
>> + if (cxlr) {
>> hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
>> + hpa_alias = cxlr_hpa_cache_alias(cxlr, hpa);
>> + }
>>
>> if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
>> trace_cxl_general_media(cxlmd, type, cxlr, hpa,
>> - &evt->gen_media);
>> + hpa_alias, &evt->gen_media);
>> else if (event_type == CXL_CPER_EVENT_DRAM)
>> - trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
>> + trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
>> + &evt->dram);
>> }
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");
>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
>> index 8389a94adb1a..257f60f16e4c 100644
>> --- a/drivers/cxl/core/trace.h
>> +++ b/drivers/cxl/core/trace.h
>> @@ -316,9 +316,10 @@ TRACE_EVENT(cxl_generic_event,
>> TRACE_EVENT(cxl_general_media,
>>
>> TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
>> - struct cxl_region *cxlr, u64 hpa, struct cxl_event_gen_media *rec),
>> + struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
>> + struct cxl_event_gen_media *rec),
>>
>> - TP_ARGS(cxlmd, log, cxlr, hpa, rec),
>> + TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
>>
>> TP_STRUCT__entry(
>> CXL_EVT_TP_entry
>> @@ -332,6 +333,7 @@ TRACE_EVENT(cxl_general_media,
>> __array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
>> /* Following are out of order to pack trace record */
>> __field(u64, hpa)
>> + __field(u64, hpa_alias)
>
> I saw Jonathan's ask for hpa_alias to be hpa_alias0 or something?
> That's done in the CXL_EVT_TP_printk output below. It needs to be
> used here in the field name to be picked up in the trace event.
> (same for cxl_dram event)
>
> But...what's the deal with hpa_alias0. If we anticipate an array
> of aliases, then maybe an array like is done for comp_id would
> work. Expect that's overkill at the moment.
So basically the spec language allows for multiple aliases, but the current implementation only has 1. So this is basically allowing for that happen in the future but not doing much more than that until actual implementation happens.
>
>
>> __field_struct(uuid_t, region_uuid)
>> __field(u16, validity_flags)
>> __field(u8, rank)
>> @@ -358,6 +360,7 @@ TRACE_EVENT(cxl_general_media,
>> CXL_EVENT_GEN_MED_COMP_ID_SIZE);
>> __entry->validity_flags = get_unaligned_le16(&rec->media_hdr.validity_flags);
>> __entry->hpa = hpa;
>> + __entry->hpa_alias = hpa_alias;
>> if (cxlr) {
>> __assign_str(region_name);
>> uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
>> @@ -370,7 +373,7 @@ TRACE_EVENT(cxl_general_media,
>> CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
>> "descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
>> "device=%x comp_id=%s validity_flags='%s' " \
>> - "hpa=%llx region=%s region_uuid=%pUb",
>> + "hpa=%llx hpa_alias0=%llx region=%s region_uuid=%pUb",
>> __entry->dpa, show_dpa_flags(__entry->dpa_flags),
>> show_event_desc_flags(__entry->descriptor),
>> show_gmer_mem_event_type(__entry->type),
>> @@ -378,7 +381,8 @@ TRACE_EVENT(cxl_general_media,
>> __entry->channel, __entry->rank, __entry->device,
>> __print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
>> show_valid_flags(__entry->validity_flags),
>> - __entry->hpa, __get_str(region_name), &__entry->region_uuid
>> + __entry->hpa, __entry->hpa_alias, __get_str(region_name),
>> + &__entry->region_uuid
>> )
>> );
>>
>> @@ -424,9 +428,10 @@ TRACE_EVENT(cxl_general_media,
>> TRACE_EVENT(cxl_dram,
>>
>> TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
>> - struct cxl_region *cxlr, u64 hpa, struct cxl_event_dram *rec),
>> + struct cxl_region *cxlr, u64 hpa, u64 hpa_alias,
>> + struct cxl_event_dram *rec),
>>
>> - TP_ARGS(cxlmd, log, cxlr, hpa, rec),
>> + TP_ARGS(cxlmd, log, cxlr, hpa, hpa_alias, rec),
>>
>> TP_STRUCT__entry(
>> CXL_EVT_TP_entry
>> @@ -442,6 +447,7 @@ TRACE_EVENT(cxl_dram,
>> __field(u32, row)
>> __array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
>> __field(u64, hpa)
>> + __field(u64, hpa_alias)
>> __field_struct(uuid_t, region_uuid)
>> __field(u8, rank) /* Out of order to pack trace record */
>> __field(u8, bank_group) /* Out of order to pack trace record */
>> @@ -472,6 +478,7 @@ TRACE_EVENT(cxl_dram,
>> memcpy(__entry->cor_mask, &rec->correction_mask,
>> CXL_EVENT_DER_CORRECTION_MASK_SIZE);
>> __entry->hpa = hpa;
>> + __entry->hpa_alias = hpa_alias;
>> if (cxlr) {
>> __assign_str(region_name);
>> uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
>> @@ -485,7 +492,7 @@ TRACE_EVENT(cxl_dram,
>> "transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
>> "bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
>> "validity_flags='%s' " \
>> - "hpa=%llx region=%s region_uuid=%pUb",
>> + "hpa=%llx hpa_alias0=%llx region=%s region_uuid=%pUb",
>> __entry->dpa, show_dpa_flags(__entry->dpa_flags),
>> show_event_desc_flags(__entry->descriptor),
>> show_dram_mem_event_type(__entry->type),
>> @@ -495,7 +502,8 @@ TRACE_EVENT(cxl_dram,
>> __entry->row, __entry->column,
>> __print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
>> show_dram_valid_flags(__entry->validity_flags),
>> - __entry->hpa, __get_str(region_name), &__entry->region_uuid
>> + __entry->hpa_alias, __entry->hpa, __get_str(region_name),
>
> Needs swapping - hpa then hpa_alias
Will fix
>
>
>> + &__entry->region_uuid
>> )
>> );
>>
>> --
>> 2.47.1
>>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 4/4] cxl: Add mce notifier to emit aliased address for extended linear cache
2025-01-17 17:28 [PATCH v3 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Dave Jiang
` (2 preceding siblings ...)
2025-01-17 17:28 ` [PATCH v3 3/4] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
@ 2025-01-17 17:28 ` Dave Jiang
2025-01-20 5:05 ` Li Ming
3 siblings, 1 reply; 14+ messages in thread
From: Dave Jiang @ 2025-01-17 17:28 UTC (permalink / raw)
To: linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny, ming.li, Jonathan Cameron
Below is a setup with an extended linear cache configuration, with an example
layout of a memory region shown below, presented as a single memory region
consisting of 256G of memory where there's 128G of DRAM and 128G of CXL memory.
The kernel sees a region of total 256G of system memory.
128G DRAM 128G CXL memory
|-----------------------------------|-------------------------------------|
Data resides in either DRAM or far memory (FM) with no replication. Hot
data is swapped into DRAM by the hardware behind the scenes. When an error is
detected in one location, it is possible that the error also resides in the
aliased location. Therefore, when a memory location that is flagged by MCE
is part of the special region, the aliased memory location needs to be
offlined as well.
Add an mce notify callback to identify if the MCE address location is part
of an extended linear cache region and handle accordingly.
Added symbol export to set_mce_nospec() in x86 code in order to call
set_mce_nospec() from the CXL MCE notify callback.
Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v3:
- Add endpoint pointer check. (Ming)
- Add mce notifier removal. (Ming)
- Return ~0ULL for no cache alias.
---
arch/x86/mm/pat/set_memory.c | 1 +
drivers/cxl/Kconfig | 4 +++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/mbox.c | 8 +++++
drivers/cxl/core/mce.c | 63 ++++++++++++++++++++++++++++++++++++
drivers/cxl/core/mce.h | 16 +++++++++
drivers/cxl/core/region.c | 28 ++++++++++++++++
drivers/cxl/cxl.h | 6 ++++
drivers/cxl/cxlmem.h | 2 ++
tools/testing/cxl/Kbuild | 1 +
10 files changed, 130 insertions(+)
create mode 100644 drivers/cxl/core/mce.c
create mode 100644 drivers/cxl/core/mce.h
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 95bc50a8541c..a0df698f46a2 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2083,6 +2083,7 @@ int set_mce_nospec(unsigned long pfn)
pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
return rc;
}
+EXPORT_SYMBOL_GPL(set_mce_nospec);
/* Restore full speculative operation to the pfn. */
int clear_mce_nospec(unsigned long pfn)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 876469e23f7a..d1c91dacae56 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -146,4 +146,8 @@ config CXL_REGION_INVALIDATION_TEST
If unsure, or if this kernel is meant for production environments,
say N.
+config CXL_MCE
+ def_bool y
+ depends on X86_MCE && MEMORY_FAILURE
+
endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 1a0c9c6ca818..61c9332b3582 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -17,3 +17,4 @@ cxl_core-y += cdat.o
cxl_core-y += acpi.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
+cxl_core-$(CONFIG_CXL_MCE) += mce.o
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f42c4c56dc43..ad11f49cb117 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -11,6 +11,7 @@
#include "core.h"
#include "trace.h"
+#include "mce.h"
static bool cxl_raw_allow_all;
@@ -1458,6 +1459,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
{
struct cxl_memdev_state *mds;
+ int rc;
mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
if (!mds) {
@@ -1473,6 +1475,12 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+ cxl_register_mce_notifier(&mds->mce_notifier);
+ rc = devm_add_action_or_reset(dev, cxl_unregister_mce_notifier,
+ &mds->mce_notifier);
+ if (rc)
+ return ERR_PTR(rc);
+
return mds;
}
EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, "CXL");
diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
new file mode 100644
index 000000000000..dab5acce249e
--- /dev/null
+++ b/drivers/cxl/core/mce.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
+#include <linux/mm.h>
+#include <linux/notifier.h>
+#include <linux/set_memory.h>
+#include <asm/mce.h>
+#include <cxlmem.h>
+#include "mce.h"
+
+static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
+ void *data)
+{
+ struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
+ mce_notifier);
+ struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct mce *mce = data;
+ u64 spa, spa_alias;
+ unsigned long pfn;
+
+ if (!mce || !mce_usable_address(mce))
+ return NOTIFY_DONE;
+
+ if (!endpoint)
+ return NOTIFY_DONE;
+
+ spa = mce->addr & MCI_ADDR_PHYSADDR;
+
+ pfn = spa >> PAGE_SHIFT;
+ if (!pfn_valid(pfn))
+ return NOTIFY_DONE;
+
+ spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa);
+ if (spa_alias == ~0ULL)
+ return NOTIFY_DONE;
+
+ pfn = spa_alias >> PAGE_SHIFT;
+
+ /*
+ * Take down the aliased memory page. The original memory page flagged
+ * by the MCE will be taken care of by the standard MCE handler.
+ */
+ dev_emerg(mds->cxlds.dev, "Offlining aliased SPA address0: %#llx\n",
+ spa_alias);
+ if (!memory_failure(pfn, 0))
+ set_mce_nospec(pfn);
+
+ return NOTIFY_OK;
+}
+
+void cxl_register_mce_notifier(struct notifier_block *mce_notifier)
+{
+ mce_notifier->notifier_call = cxl_handle_mce;
+ mce_notifier->priority = MCE_PRIO_UC;
+ mce_register_decode_chain(mce_notifier);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_register_mce_notifier, "CXL");
+
+void cxl_unregister_mce_notifier(void *mce_notifier)
+{
+ mce_unregister_decode_chain(mce_notifier);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_unregister_mce_notifier, "CXL");
diff --git a/drivers/cxl/core/mce.h b/drivers/cxl/core/mce.h
new file mode 100644
index 000000000000..b92381f4c1e8
--- /dev/null
+++ b/drivers/cxl/core/mce.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
+#ifndef _CXL_CORE_MCE_H_
+#define _CXL_CORE_MCE_H_
+
+#include <linux/notifier.h>
+
+#ifdef CONFIG_CXL_MCE
+void cxl_register_mce_notifier(struct notifier_block *mce_notifer);
+void cxl_unregister_mce_notifier(void *mce_notifer);
+#else
+static inline void cxl_register_mce_notifier(struct notifier_block *mce_notifier) {}
+static inline void cxl_unregister_mce_notifier(void *mce_notifier) {}
+#endif
+
+#endif
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 2d8699a86b24..7a9ea8394876 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3437,6 +3437,34 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
}
EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
+u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa)
+{
+ struct cxl_region_ref *iter;
+ unsigned long index;
+
+ if (!endpoint)
+ return ~0ULL;
+
+ guard(rwsem_write)(&cxl_region_rwsem);
+
+ xa_for_each(&endpoint->regions, index, iter) {
+ struct cxl_region_params *p = &iter->region->params;
+
+ if (p->res->start <= spa && spa <= p->res->end) {
+ if (!p->cache_size)
+ return ~0ULL;
+
+ if (spa > p->res->start + p->cache_size)
+ return spa - p->cache_size;
+
+ return spa + p->cache_size;
+ }
+ }
+
+ return ~0ULL;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_get_spa_cache_alias, "CXL");
+
static int is_system_ram(struct resource *res, void *arg)
{
struct cxl_region *cxlr = arg;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6a1fb784f74a..cff98e803722 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -876,6 +876,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled);
struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
+u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
#else
static inline bool is_cxl_pmem_region(struct device *dev)
{
@@ -894,6 +895,11 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
{
return NULL;
}
+static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
+ u64 spa)
+{
+ return 0;
+}
#endif
void cxl_endpoint_parse_cdat(struct cxl_port *port);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2a25d1957ddb..55752cbf408c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -477,6 +477,7 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
* @poison: poison driver state info
* @security: security driver state info
* @fw: firmware upload / activation state
+ * @mce_notifier: MCE notifier
*
* See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
* details on capacity parameters.
@@ -503,6 +504,7 @@ struct cxl_memdev_state {
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
+ struct notifier_block mce_notifier;
};
static inline struct cxl_memdev_state *
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 1ae13987a8a2..f625eb2d2dc5 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -64,6 +64,7 @@ cxl_core-y += $(CXL_CORE_SRC)/cdat.o
cxl_core-y += $(CXL_CORE_SRC)/acpi.o
cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
+cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
cxl_core-y += config_check.o
cxl_core-y += cxl_core_test.o
cxl_core-y += cxl_core_exports.o
--
2.47.1
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH v3 4/4] cxl: Add mce notifier to emit aliased address for extended linear cache
2025-01-17 17:28 ` [PATCH v3 4/4] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
@ 2025-01-20 5:05 ` Li Ming
2025-01-21 15:14 ` Dave Jiang
0 siblings, 1 reply; 14+ messages in thread
From: Li Ming @ 2025-01-20 5:05 UTC (permalink / raw)
To: Dave Jiang, linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
On 1/18/2025 1:28 AM, Dave Jiang wrote:
> Below is a setup with extended linear cache configuration with an example
> layout of memory region shown below presented as a single memory region
> consists of 256G memory where there's 128G of DRAM and 128G of CXL memory.
> The kernel sees a region of total 256G of system memory.
>
> 128G DRAM 128G CXL memory
> |-----------------------------------|-------------------------------------|
>
> Data resides in either DRAM or far memory (FM) with no replication. Hot
> data is swapped into DRAM by the hardware behind the scenes. When an error is
> detected in one location, it is possible that the error also resides in the
> aliased location. Therefore when a memory location that is flagged by MCE
> is part of the special region, the aliased memory location needs to be
> offlined as well.
>
> Add an mce notify callback to identify if the MCE address location is part
> of an extended linear cache region and handle accordingly.
>
> Added symbol export to set_mce_nospec() in x86 code in order to call
> set_mce_nospec() from the CXL MCE notify callback.
>
> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> v3:
> - Add endpoint pointer check. (Ming)
> - Add mce notifier removal. (Ming)
> - Return ~0ULL for no cache alias.
> ---
> arch/x86/mm/pat/set_memory.c | 1 +
> drivers/cxl/Kconfig | 4 +++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/mbox.c | 8 +++++
> drivers/cxl/core/mce.c | 63 ++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/mce.h | 16 +++++++++
> drivers/cxl/core/region.c | 28 ++++++++++++++++
> drivers/cxl/cxl.h | 6 ++++
> drivers/cxl/cxlmem.h | 2 ++
> tools/testing/cxl/Kbuild | 1 +
> 10 files changed, 130 insertions(+)
> create mode 100644 drivers/cxl/core/mce.c
> create mode 100644 drivers/cxl/core/mce.h
>
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 95bc50a8541c..a0df698f46a2 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2083,6 +2083,7 @@ int set_mce_nospec(unsigned long pfn)
> pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
> return rc;
> }
> +EXPORT_SYMBOL_GPL(set_mce_nospec);
>
> /* Restore full speculative operation to the pfn. */
> int clear_mce_nospec(unsigned long pfn)
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 876469e23f7a..d1c91dacae56 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -146,4 +146,8 @@ config CXL_REGION_INVALIDATION_TEST
> If unsure, or if this kernel is meant for production environments,
> say N.
>
> +config CXL_MCE
> + def_bool y
> + depends on X86_MCE && MEMORY_FAILURE
> +
> endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 1a0c9c6ca818..61c9332b3582 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -17,3 +17,4 @@ cxl_core-y += cdat.o
> cxl_core-y += acpi.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> +cxl_core-$(CONFIG_CXL_MCE) += mce.o
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index f42c4c56dc43..ad11f49cb117 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -11,6 +11,7 @@
>
> #include "core.h"
> #include "trace.h"
> +#include "mce.h"
>
> static bool cxl_raw_allow_all;
>
> @@ -1458,6 +1459,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
> struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> {
> struct cxl_memdev_state *mds;
> + int rc;
>
> mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
> if (!mds) {
> @@ -1473,6 +1475,12 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
> mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
>
> + cxl_register_mce_notifier(&mds->mce_notifier);
> + rc = devm_add_action_or_reset(dev, cxl_unregister_mce_notifier,
> + &mds->mce_notifier);
> + if (rc)
> + return ERR_PTR(rc);
> +
maybe we can put this devm release action into cxl_register_mce_notifier() and rename cxl_register_mce_notifier() to devm_cxl_register_mce_notifier()?
> return mds;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, "CXL");
> diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
> new file mode 100644
> index 000000000000..dab5acce249e
> --- /dev/null
> +++ b/drivers/cxl/core/mce.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
> +#include <linux/mm.h>
> +#include <linux/notifier.h>
> +#include <linux/set_memory.h>
> +#include <asm/mce.h>
> +#include <cxlmem.h>
> +#include "mce.h"
> +
> +static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
> + void *data)
> +{
> + struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
> + mce_notifier);
> + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct mce *mce = data;
> + u64 spa, spa_alias;
> + unsigned long pfn;
> +
> + if (!mce || !mce_usable_address(mce))
> + return NOTIFY_DONE;
> +
> + if (!endpoint)
> + return NOTIFY_DONE;
> +
> + spa = mce->addr & MCI_ADDR_PHYSADDR;
> +
> + pfn = spa >> PAGE_SHIFT;
> + if (!pfn_valid(pfn))
> + return NOTIFY_DONE;
> +
> + spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa);
> + if (spa_alias == ~0ULL)
> + return NOTIFY_DONE;
> +
> + pfn = spa_alias >> PAGE_SHIFT;
> +
> + /*
> + * Take down the aliased memory page. The original memory page flagged
> > + * by the MCE will be taken care of by the standard MCE handler.
> + */
> + dev_emerg(mds->cxlds.dev, "Offlining aliased SPA address0: %#llx\n",
> + spa_alias);
> + if (!memory_failure(pfn, 0))
> + set_mce_nospec(pfn);
> +
> + return NOTIFY_OK;
> +}
> +
> +void cxl_register_mce_notifier(struct notifier_block *mce_notifier)
> +{
> + mce_notifier->notifier_call = cxl_handle_mce;
> + mce_notifier->priority = MCE_PRIO_UC;
> + mce_register_decode_chain(mce_notifier);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_register_mce_notifier, "CXL");
> +
> +void cxl_unregister_mce_notifier(void *mce_notifier)
> +{
> + mce_unregister_decode_chain(mce_notifier);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_unregister_mce_notifier, "CXL");
My understanding is that these two functions are no need to be exported, because they are invoked inside cxl_core.ko.
I check that they are not exported in v2, any reason for this change?
Ming
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v3 4/4] cxl: Add mce notifier to emit aliased address for extended linear cache
2025-01-20 5:05 ` Li Ming
@ 2025-01-21 15:14 ` Dave Jiang
0 siblings, 0 replies; 14+ messages in thread
From: Dave Jiang @ 2025-01-21 15:14 UTC (permalink / raw)
To: Li Ming, linux-cxl, linux-acpi
Cc: rafael, bp, dan.j.williams, tony.luck, dave, jonathan.cameron,
alison.schofield, ira.weiny
On 1/19/25 10:05 PM, Li Ming wrote:
> On 1/18/2025 1:28 AM, Dave Jiang wrote:
>> Below is a setup with extended linear cache configuration with an example
>> layout of memory region shown below presented as a single memory region
>> consists of 256G memory where there's 128G of DRAM and 128G of CXL memory.
>> The kernel sees a region of total 256G of system memory.
>>
>> 128G DRAM 128G CXL memory
>> |-----------------------------------|-------------------------------------|
>>
>> Data resides in either DRAM or far memory (FM) with no replication. Hot
>> data is swapped into DRAM by the hardware behind the scenes. When an error is
>> detected in one location, it is possible that the error also resides in the
>> aliased location. Therefore when a memory location that is flagged by MCE
>> is part of the special region, the aliased memory location needs to be
>> offlined as well.
>>
>> Add an mce notify callback to identify if the MCE address location is part
>> of an extended linear cache region and handle accordingly.
>>
>> Added symbol export to set_mce_nospec() in x86 code in order to call
>> set_mce_nospec() from the CXL MCE notify callback.
>>
>> Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> v3:
>> - Add endpoint pointer check. (Ming)
>> - Add mce notifier removal. (Ming)
>> - Return ~0ULL for no cache alias.
>> ---
>> arch/x86/mm/pat/set_memory.c | 1 +
>> drivers/cxl/Kconfig | 4 +++
>> drivers/cxl/core/Makefile | 1 +
>> drivers/cxl/core/mbox.c | 8 +++++
>> drivers/cxl/core/mce.c | 63 ++++++++++++++++++++++++++++++++++++
>> drivers/cxl/core/mce.h | 16 +++++++++
>> drivers/cxl/core/region.c | 28 ++++++++++++++++
>> drivers/cxl/cxl.h | 6 ++++
>> drivers/cxl/cxlmem.h | 2 ++
>> tools/testing/cxl/Kbuild | 1 +
>> 10 files changed, 130 insertions(+)
>> create mode 100644 drivers/cxl/core/mce.c
>> create mode 100644 drivers/cxl/core/mce.h
>>
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>> index 95bc50a8541c..a0df698f46a2 100644
>> --- a/arch/x86/mm/pat/set_memory.c
>> +++ b/arch/x86/mm/pat/set_memory.c
>> @@ -2083,6 +2083,7 @@ int set_mce_nospec(unsigned long pfn)
>> pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
>> return rc;
>> }
>> +EXPORT_SYMBOL_GPL(set_mce_nospec);
>>
>> /* Restore full speculative operation to the pfn. */
>> int clear_mce_nospec(unsigned long pfn)
>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>> index 876469e23f7a..d1c91dacae56 100644
>> --- a/drivers/cxl/Kconfig
>> +++ b/drivers/cxl/Kconfig
>> @@ -146,4 +146,8 @@ config CXL_REGION_INVALIDATION_TEST
>> If unsure, or if this kernel is meant for production environments,
>> say N.
>>
>> +config CXL_MCE
>> + def_bool y
>> + depends on X86_MCE && MEMORY_FAILURE
>> +
>> endif
>> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
>> index 1a0c9c6ca818..61c9332b3582 100644
>> --- a/drivers/cxl/core/Makefile
>> +++ b/drivers/cxl/core/Makefile
>> @@ -17,3 +17,4 @@ cxl_core-y += cdat.o
>> cxl_core-y += acpi.o
>> cxl_core-$(CONFIG_TRACING) += trace.o
>> cxl_core-$(CONFIG_CXL_REGION) += region.o
>> +cxl_core-$(CONFIG_CXL_MCE) += mce.o
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index f42c4c56dc43..ad11f49cb117 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -11,6 +11,7 @@
>>
>> #include "core.h"
>> #include "trace.h"
>> +#include "mce.h"
>>
>> static bool cxl_raw_allow_all;
>>
>> @@ -1458,6 +1459,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
>> struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>> {
>> struct cxl_memdev_state *mds;
>> + int rc;
>>
>> mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
>> if (!mds) {
>> @@ -1473,6 +1475,12 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>> mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
>> mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
>>
>> + cxl_register_mce_notifier(&mds->mce_notifier);
>> + rc = devm_add_action_or_reset(dev, cxl_unregister_mce_notifier,
>> + &mds->mce_notifier);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>
> maybe we can put this devm release action into cxl_register_mce_notifier() and rename cxl_register_mce_notifier() to devm_cxl_register_mce_notifier()?
ok I'll do that
>
>
>> return mds;
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, "CXL");
>> diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
>> new file mode 100644
>> index 000000000000..dab5acce249e
>> --- /dev/null
>> +++ b/drivers/cxl/core/mce.c
>> @@ -0,0 +1,63 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright(c) 2024 Intel Corporation. All rights reserved. */
>> +#include <linux/mm.h>
>> +#include <linux/notifier.h>
>> +#include <linux/set_memory.h>
>> +#include <asm/mce.h>
>> +#include <cxlmem.h>
>> +#include "mce.h"
>> +
>> +static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
>> + void *data)
>> +{
>> + struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
>> + mce_notifier);
>> + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct mce *mce = data;
>> + u64 spa, spa_alias;
>> + unsigned long pfn;
>> +
>> + if (!mce || !mce_usable_address(mce))
>> + return NOTIFY_DONE;
>> +
>> + if (!endpoint)
>> + return NOTIFY_DONE;
>> +
>> + spa = mce->addr & MCI_ADDR_PHYSADDR;
>> +
>> + pfn = spa >> PAGE_SHIFT;
>> + if (!pfn_valid(pfn))
>> + return NOTIFY_DONE;
>> +
>> + spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa);
>> + if (spa_alias == ~0ULL)
>> + return NOTIFY_DONE;
>> +
>> + pfn = spa_alias >> PAGE_SHIFT;
>> +
>> + /*
>> + * Take down the aliased memory page. The original memory page flagged
>> + * by the MCE will be taken care of by the standard MCE handler.
>> + */
>> + dev_emerg(mds->cxlds.dev, "Offlining aliased SPA address0: %#llx\n",
>> + spa_alias);
>> + if (!memory_failure(pfn, 0))
>> + set_mce_nospec(pfn);
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>> +void cxl_register_mce_notifier(struct notifier_block *mce_notifier)
>> +{
>> + mce_notifier->notifier_call = cxl_handle_mce;
>> + mce_notifier->priority = MCE_PRIO_UC;
>> + mce_register_decode_chain(mce_notifier);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_register_mce_notifier, "CXL");
>> +
>> +void cxl_unregister_mce_notifier(void *mce_notifier)
>> +{
>> + mce_unregister_decode_chain(mce_notifier);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_unregister_mce_notifier, "CXL");
>
> My understanding is that these two functions are no need to be exported, because they are invoked inside cxl_core.ko.
>
> I check that they are not exported in v2, any reason for this change?
It seems cxl_test needed the export.
>
>
> Ming
>
^ permalink raw reply [flat|nested] 14+ messages in thread