public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement
@ 2025-02-18 13:23 Robert Richter
  2025-02-18 13:23 ` [PATCH v2 01/15] cxl: Modify address translation callback for generic use Robert Richter
                   ` (15 more replies)
  0 siblings, 16 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

This patch set adds support of address translation and enables this
for AMD Zen5 platforms. This is a new appoach in response to an
earlier attempt to implement CXL address translation [1] and the
comments on it, esp. Dan's [2]. Dan suggested to solve this by walking
the port hierarchy from the host port to the host bridge. When
crossing memory domains from one port to the other, HPA translations
are applied using a callback function to handle platform specifics.

The CXL driver currently does not implement address translation which
assumes the host physical addresses (HPA) and system physical
addresses (SPA) are equal.

Systems with different HPA and SPA addresses need address translation.
If this is the case, the hardware addresses esp. used in the HDM
decoder configurations are different to the system's or parent port
address ranges. E.g. AMD Zen5 systems may be configured to use
'Normalized addresses'. Then, CXL endpoints have their own physical
address base which is not the same as the SPA used by the CXL host
bridge. Thus, addresses need to be translated from the endpoint's to
its CXL host bridge's address range.

To enable address translation, the endpoint's HPA range must be
translated to each of the parent port's address ranges up to the root
decoder. This is implemented by traversing the decoder and port
hierarchy from the endpoint up to the root port and applying platform
specific translation functions to determine the next HPA range of the
parent port where needed:

  if (cxl_port->to_hpa)
    hpa = cxl_port->to_hpa(cxl_decoder, hpa)

A callback is introduced to translate an HPA range from a port to its
parent. Not all ports in the hierarchy must implement this function,
e.g. on Zen5 only the first root or switch port that connects the
endpoints require address translation.

The root port's HPA range is equivalent to the system's SPA range and
can then be used to find an endpoint's root port and region.  

Also, translated HPA ranges must be used to calculate the endpoint
position in the region.

Once the region was found, the decoders of all ports between the
endpoint and the root port need to be found based on the translated
HPA. Configuration checks and interleaving setup must be modified as
necessary to support address translation.

Note that only auto-discovery of decoders is supported. Thus, decoders
are locked and cannot be configured manually.

Finally, Zen5 address translation is enabled using ACPI PRMT.

This series bases on:

 [PATCH v3 00/18] cxl: Address translation support, part 1: Cleanups and refactoring

Purpose of patches:
 * Patches #1-#2: Introduction of address translation callback,
 * Patches #3-#12: Functional changes for address
   translation (common code).
 * #13: Architectural platform setup
 * Patch #15, #15: AMD Zen5 address translation.

V2:
 * rebased onto cxl/next,
 * split of v1 in two parts:
   * removed cleanups and updates from this series to post them as a
     separate series (Dave),
   * this part 2 applies on top of part 1, v3,
 * added tags to SOB chain,
 * reworked architecture, vendor and platform setup (Jonathan):
   * added patch "cxl/x86: Prepare for architectural platform setup",
   * added function arch_cxl_port_platform_setup() plus a __weak
     versions for archs other than x86,
   * moved code to core/x86,
 * added comment to cxl_to_hpa_fn (Ben),
 * updated year in copyright statement (Ben),
 * cxl_port_calc_hpa(): Removed HPA check for zero (Jonathan), return
   1 if modified,
 * cxl_port_calc_pos(): Updated description and wording (Ben),
 * added sereral patches around interleaving and SPA calculation in
   cxl_endpoint_decoder_initialize(),
 * reworked iterator in cxl_endpoint_decoder_initialize() (Gregory),
 * fixed region interleaving parameters() (Alison),
 * fixed check in cxl_region_attach() (Alison),
 * Clarified in coverletter that not all ports in a system must
   implement the to_hpa() callback (Terry).

[1] https://lore.kernel.org/linux-cxl/20240701174754.967954-1-rrichter@amd.com/
[2] https://lore.kernel.org/linux-cxl/669086821f136_5fffa29473@dwillia2-xfh.jf.intel.com.notmuch/

Robert Richter (15):
  cxl: Modify address translation callback for generic use
  cxl: Introduce callback to translate an HPA range from a port to its
    parent
  cxl/region: Factor out code for interleaving calculations
  cxl/region: Calculate endpoint's region position during init
  cxl/region: Calculate and store the SPA range of an endpoint
  cxl/region: Use endpoint's HPA range to find the port's decoder
  cxl/region: Use translated HPA ranges to find the port's decoder
  cxl/region: Use the endpoint's SPA range to find a region
  cxl/region: Use the endpoint's SPA range to create a region
  cxl/region: Use root decoders interleaving parameters to create a
    region
  cxl/region: Use the endpoint's SPA range to check a region
  cxl/region: Lock decoders that need address translation
  cxl/x86: Prepare for architectural platform setup
  cxl/amd: Enable Zen5 address translation using ACPI PRMT
  MAINTAINERS: CXL: Add entry for AMD platform support (CXL_AMD)

 MAINTAINERS                   |   7 +
 drivers/cxl/Kconfig           |   4 +
 drivers/cxl/acpi.c            |   4 +-
 drivers/cxl/core/Makefile     |   3 +
 drivers/cxl/core/core.h       |   4 +
 drivers/cxl/core/port.c       |   4 +
 drivers/cxl/core/region.c     | 323 ++++++++++++++++++++++++++++------
 drivers/cxl/core/x86/amd.c    | 259 +++++++++++++++++++++++++++
 drivers/cxl/core/x86/common.c |  14 ++
 drivers/cxl/cxl.h             |  19 +-
 10 files changed, 583 insertions(+), 58 deletions(-)
 create mode 100644 drivers/cxl/core/x86/amd.c
 create mode 100644 drivers/cxl/core/x86/common.c


base-commit: ed50d9abb9177ba106f47e2c48e1bf1804a6956c
-- 
2.39.5


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v2 01/15] cxl: Modify address translation callback for generic use
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 16:00   ` Gregory Price
  2025-02-18 13:23 ` [PATCH v2 02/15] cxl: Introduce callback to translate an HPA range from a port to its parent Robert Richter
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

The root decoder address translation callback could be reused for
other decoders too. For generic use of the callback, change the
function interface to use a decoder argument instead of the root
decoder.

Note that a root decoder's HPA is equal to its SPA, but else it may be
different. Thus, change the name of the function type to
cxl_to_hpa_fn.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c        | 4 ++--
 drivers/cxl/core/region.c | 7 +++++--
 drivers/cxl/cxl.h         | 5 ++---
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 93c73b163c28..48ad90025d77 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -22,9 +22,9 @@ static const guid_t acpi_cxl_qtg_id_guid =
 	GUID_INIT(0xF365F9A6, 0xA7DE, 0x4071,
 		  0xA6, 0x6A, 0xB4, 0x0C, 0x0B, 0x4F, 0x8E, 0x52);
 
-
-static u64 cxl_xor_hpa_to_spa(struct cxl_root_decoder *cxlrd, u64 hpa)
+static u64 cxl_xor_hpa_to_spa(struct cxl_decoder *cxld, u64 hpa)
 {
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(&cxld->dev);
 	struct cxl_cxims_data *cximsd = cxlrd->platform_data;
 	int hbiw = cxlrd->cxlsd.nr_targets;
 	u64 val;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a56b84e7103a..c118bda93e86 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2936,9 +2936,12 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
 	/* Apply the hpa_offset to the region base address */
 	hpa = hpa_offset + p->res->start;
 
-	/* Root decoder translation overrides typical modulo decode */
+	/*
+	 * Root decoder translation overrides typical modulo decode.
+	 * Note that a root decoder's HPA is equal to its SPA.
+	 */
 	if (cxlrd->hpa_to_spa)
-		hpa = cxlrd->hpa_to_spa(cxlrd, hpa);
+		hpa = cxlrd->hpa_to_spa(&cxlrd->cxlsd.cxld, hpa);
 
 	if (hpa < p->res->start || hpa > p->res->end) {
 		dev_dbg(&cxlr->dev,
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 50e7d878bb6f..b19ba47242c6 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -418,8 +418,7 @@ struct cxl_switch_decoder {
 	struct cxl_dport *target[];
 };
 
-struct cxl_root_decoder;
-typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
+typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
 
 /**
  * struct cxl_root_decoder - Static platform CXL address decoder
@@ -434,7 +433,7 @@ typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
 struct cxl_root_decoder {
 	struct resource *res;
 	atomic_t region_id;
-	cxl_hpa_to_spa_fn hpa_to_spa;
+	cxl_to_hpa_fn hpa_to_spa;
 	void *platform_data;
 	struct mutex range_lock;
 	int qos_class;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 02/15] cxl: Introduce callback to translate an HPA range from a port to its parent
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
  2025-02-18 13:23 ` [PATCH v2 01/15] cxl: Modify address translation callback for generic use Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 21:19   ` Dave Jiang
  2025-02-18 13:23 ` [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations Robert Richter
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

To enable address translation, the endpoint's HPA range must be
translated to each of the parent port's address ranges up to the root
decoder. Traverse the decoder and port hierarchy from the endpoint up
to the root port and apply platform specific translation functions to
determine the next HPA range of the parent port where needed:

  if (cxl_port->to_hpa)
    hpa = cxl_port->to_hpa(cxl_decoder, hpa)

The root port's HPA range is equivalent to the system's SPA range.

Introduce a callback to translate an HPA range from a port to its
parent.

Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
---
 drivers/cxl/cxl.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b19ba47242c6..17496784f021 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -418,6 +418,15 @@ struct cxl_switch_decoder {
 	struct cxl_dport *target[];
 };
 
+/**
+ * cxl_to_hpa_fn - type of a callback function to translate an HPA
+ * @cxld: cxl_decoder to translate from
+ * @hpa: HPA of the @cxld decoder's address range
+ *
+ * The callback translates a decoder's HPA to the next upper domain
+ * which is the address range of the decoder's parent port. The return
+ * value is the translated HPA on success or ULLONG_MAX otherwise.
+ */
 typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
 
 /**
@@ -581,6 +590,7 @@ struct cxl_dax_region {
  * @parent_dport: dport that points to this port in the parent
  * @decoder_ida: allocator for decoder ids
  * @reg_map: component and ras register mapping parameters
+ * @to_hpa: Callback to translate a child port's decoder address to the port's HPA address range
  * @nr_dports: number of entries in @dports
  * @hdm_end: track last allocated HDM decoder instance for allocation ordering
  * @commit_end: cursor to track highest committed decoder for commit ordering
@@ -602,6 +612,7 @@ struct cxl_port {
 	struct cxl_dport *parent_dport;
 	struct ida decoder_ida;
 	struct cxl_register_map reg_map;
+	cxl_to_hpa_fn to_hpa;
 	int nr_dports;
 	int hdm_end;
 	int commit_end;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
  2025-02-18 13:23 ` [PATCH v2 01/15] cxl: Modify address translation callback for generic use Robert Richter
  2025-02-18 13:23 ` [PATCH v2 02/15] cxl: Introduce callback to translate an HPA range from a port to its parent Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 16:28   ` Gregory Price
  2025-03-14 12:45   ` Jonathan Cameron
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

Function cxl_calc_interleave_pos() contains code to calculate the
interleaving parameters of a port. Factor out that code for later
reuse. Add function cxl_port_calc_interleave() for this and introduce
struct cxl_interleave_context to collect all interleaving data.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 63 ++++++++++++++++++++++++++-------------
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c118bda93e86..ad4a6ce37216 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1800,27 +1800,34 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
 	return rc;
 }
 
+struct cxl_interleave_context {
+	struct range *hpa_range;
+	int pos;
+};
+
 /**
- * cxl_calc_interleave_pos() - calculate an endpoint position in a region
- * @cxled: endpoint decoder member of given region
+ * cxl_port_calc_interleave() - calculate interleave config of an endpoint for @port
+ * @port: Port the new position is calculated for.
+ * @ctx: Interleave context
  *
- * The endpoint position is calculated by traversing the topology from
- * the endpoint to the root decoder and iteratively applying this
- * calculation:
+ * The endpoint position for the next port is calculated by applying
+ * this calculation:
  *
  *    position = position * parent_ways + parent_pos;
  *
  * ...where @position is inferred from switch and root decoder target lists.
  *
+ * The endpoint's position in a region can be calculated by traversing
+ * the topology from the endpoint to the root decoder and iteratively
+ * applying the function for each port.
+ *
  * Return: position >= 0 on success
  *	   -ENXIO on failure
  */
-static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
+static int cxl_port_calc_interleave(struct cxl_port *port,
+				    struct cxl_interleave_context *ctx)
 {
-	struct cxl_port *iter, *port = cxled_to_port(cxled);
-	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
-	struct range *range = &cxled->cxld.hpa_range;
-	int parent_ways = 0, parent_pos = 0, pos = 0;
+	int parent_ways = 0, parent_pos = 0;
 	int rc;
 
 	/*
@@ -1852,22 +1859,38 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
 	 * complex topologies, including those with switches.
 	 */
 
-	/* Iterate from endpoint to root_port refining the position */
-	for (iter = port; iter; iter = parent_port_of(iter)) {
-		if (is_cxl_root(iter))
-			break;
+	if (is_cxl_root(port))
+		return 0;
 
-		rc = find_pos_and_ways(iter, range, &parent_pos, &parent_ways);
-		if (rc)
-			return rc;
+	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways);
+	if (rc)
+		return rc;
 
-		pos = pos * parent_ways + parent_pos;
-	}
+	ctx->pos = ctx->pos * parent_ways + parent_pos;
+
+	return ctx->pos;
+}
+
+static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_port *iter, *port = cxled_to_port(cxled);
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_interleave_context ctx;
+	int pos = 0;
+
+	ctx = (struct cxl_interleave_context) {
+		.hpa_range = &cxled->cxld.hpa_range,
+	};
+
+	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
+	     iter = parent_port_of(iter))
+		pos = cxl_port_calc_interleave(iter, &ctx);
 
 	dev_dbg(&cxlmd->dev,
 		"decoder:%s parent:%s port:%s range:%#llx-%#llx pos:%d\n",
 		dev_name(&cxled->cxld.dev), dev_name(cxlmd->dev.parent),
-		dev_name(&port->dev), range->start, range->end, pos);
+		dev_name(&port->dev), ctx.hpa_range->start, ctx.hpa_range->end,
+		ctx.pos);
 
 	return pos;
 }
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (2 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-19 23:32   ` Gregory Price
                     ` (5 more replies)
  2025-02-18 13:23 ` [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint Robert Richter
                   ` (11 subsequent siblings)
  15 siblings, 6 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

The calculation of an endpoint's position in a region traverses all
ports up to the root port and determines the corresponding decoders
for that particular address range. For address translation the HPA
range must be recalculated between ports. In order to prepare the
implementation of address translation, move code to
cxl_endpoint_decoder_initialize() and reuse the existing iterator
there.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ad4a6ce37216..6f106bfa115f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1903,7 +1903,6 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
 	for (i = 0; i < p->nr_targets; i++) {
 		struct cxl_endpoint_decoder *cxled = p->targets[i];
 
-		cxled->pos = cxl_calc_interleave_pos(cxled);
 		/*
 		 * Record that sorting failed, but still continue to calc
 		 * cxled->pos so that follow-on code paths can reliably
@@ -3264,10 +3263,22 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 	struct cxl_port *iter = cxled_to_port(cxled);
 	struct cxl_decoder *root, *cxld = &cxled->cxld;
-	struct range *hpa = &cxld->hpa_range;
+	struct range hpa = cxld->hpa_range;
+	struct cxl_interleave_context ctx;
+	int rc;
 
-	while (iter && !is_cxl_root(iter))
-		iter = to_cxl_port(iter->dev.parent);
+	ctx = (struct cxl_interleave_context) {
+		.hpa_range = &hpa,
+	};
+
+	while (iter && !is_cxl_root(iter)) {
+		/* Convert interleave settings to next port upstream. */
+		rc = cxl_port_calc_interleave(iter, &ctx);
+		if (rc < 0)
+			return rc;
+
+		iter = parent_port_of(iter);
+	}
 
 	if (!iter)
 		return -ENXIO;
@@ -3281,7 +3292,13 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
 		return -ENXIO;
 	}
 
+	dev_dbg(cxld->dev.parent,
+		"%s:%s: range:%#llx-%#llx pos:%d\n",
+		dev_name(&cxled->cxld.dev), dev_name(&cxld->dev),
+		hpa.start, hpa.end, ctx.pos);
+
 	cxled->cxlrd = to_cxl_root_decoder(&root->dev);
+	cxled->pos = ctx.pos;
 
 	return 0;
 }
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (3 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 18:42   ` Gregory Price
                     ` (2 more replies)
  2025-02-18 13:23 ` [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder Robert Richter
                   ` (10 subsequent siblings)
  15 siblings, 3 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

To find the correct region and root port of an endpoint of a system
needing address translation, the endpoint's HPA range must be
translated to each of the parent port address ranges up to the root
decoder.

Calculate the SPA range using the newly introduced callback function
port->to_hpa() that translates the decoder's HPA range to its parent
port's HPA range of the next outer memory domain. Introduce the helper
function cxl_port_calc_hpa() for this to calculate address ranges
using the low-level port->to_hpa() callbacks. Determine the root port
SPA range by iterating all the ports up to the root. Store the
endpoint's SPA range for later use.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 81 ++++++++++++++++++++++++++++++++-------
 drivers/cxl/cxl.h         |  1 +
 2 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6f106bfa115f..d898c9f51113 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -832,6 +832,44 @@ static int match_free_decoder(struct device *dev, const void *data)
 	return 1;
 }
 
+static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
+			     struct range *hpa_range)
+{
+	struct range hpa = *hpa_range;
+	u64 len = range_len(&hpa);
+
+	if (!port->to_hpa)
+		return 0;
+
+	/* Translate HPA to the next upper domain. */
+	hpa.start = port->to_hpa(cxld, hpa.start);
+	hpa.end = port->to_hpa(cxld, hpa.end);
+
+	if (hpa.start == ULLONG_MAX || hpa.end == ULLONG_MAX) {
+		dev_warn(&port->dev,
+			"CXL address translation: HPA range invalid: %#llx-%#llx:%#llx-%#llx(%s)\n",
+			hpa.start, hpa.end, hpa_range->start,
+			hpa_range->end, dev_name(&cxld->dev));
+		return -ENXIO;
+	}
+
+	if (range_len(&hpa) != len * cxld->interleave_ways) {
+		dev_warn(&port->dev,
+			"CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
+			hpa.start, hpa.end, hpa_range->start,
+			hpa_range->end, dev_name(&cxld->dev));
+		return -ENXIO;
+	}
+
+	if (hpa.start == hpa_range->start && hpa.end == hpa_range->end)
+		return 0;
+
+	*hpa_range = hpa;
+
+	/* Return 1 if modified. */
+	return 1;
+}
+
 static int match_auto_decoder(struct device *dev, const void *data)
 {
 	const struct cxl_region_params *p = data;
@@ -1882,6 +1920,11 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
 		.hpa_range = &cxled->cxld.hpa_range,
 	};
 
+	/*
+	 * Address translation is only supported for auto-discovery of
+	 * decoders. There is no need to support address translation
+	 * here. That is, do not recalculate ctx.hpa_range here.
+	 */
 	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
 	     iter = parent_port_of(iter))
 		pos = cxl_port_calc_interleave(iter, &ctx);
@@ -3262,7 +3305,8 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
 {
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 	struct cxl_port *iter = cxled_to_port(cxled);
-	struct cxl_decoder *root, *cxld = &cxled->cxld;
+	struct cxl_port *parent = parent_port_of(iter);
+	struct cxl_decoder *cxld = &cxled->cxld;
 	struct range hpa = cxld->hpa_range;
 	struct cxl_interleave_context ctx;
 	int rc;
@@ -3271,25 +3315,33 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
 		.hpa_range = &hpa,
 	};
 
-	while (iter && !is_cxl_root(iter)) {
+	if (!iter || !parent)
+		return -ENXIO;
+
+	while (iter && parent) {
+		/* Translate HPA to the next upper memory domain. */
+		rc = cxl_port_calc_hpa(parent, cxld, &hpa);
+		if (rc < 0)
+			return rc;
+
 		/* Convert interleave settings to next port upstream. */
 		rc = cxl_port_calc_interleave(iter, &ctx);
 		if (rc < 0)
 			return rc;
 
-		iter = parent_port_of(iter);
-	}
+		iter = parent;
+		parent = parent_port_of(iter);
 
-	if (!iter)
-		return -ENXIO;
+		if (!parent || parent->to_hpa)
+			cxld = cxl_port_find_switch_decoder(iter, &hpa);
 
-	root = cxl_port_find_switch_decoder(iter, hpa);
-	if (!root) {
-		dev_err(cxlmd->dev.parent,
-			"%s:%s no CXL window for range %#llx:%#llx\n",
-			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
-			cxld->hpa_range.start, cxld->hpa_range.end);
-		return -ENXIO;
+		if (!cxld) {
+			dev_err(cxlmd->dev.parent,
+				"%s:%s no CXL window for range %#llx:%#llx\n",
+				dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+				hpa.start, hpa.end);
+			return -ENXIO;
+		}
 	}
 
 	dev_dbg(cxld->dev.parent,
@@ -3297,7 +3349,8 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
 		dev_name(&cxled->cxld.dev), dev_name(&cxld->dev),
 		hpa.start, hpa.end, ctx.pos);
 
-	cxled->cxlrd = to_cxl_root_decoder(&root->dev);
+	cxled->cxlrd = to_cxl_root_decoder(&cxld->dev);
+	cxled->spa_range = hpa;
 	cxled->pos = ctx.pos;
 
 	return 0;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 17496784f021..7303aec1c31c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -394,6 +394,7 @@ struct cxl_endpoint_decoder {
 	struct cxl_decoder cxld;
 	struct cxl_root_decoder *cxlrd;
 	struct resource *dpa_res;
+	struct range spa_range;
 	resource_size_t skip;
 	enum cxl_decoder_state state;
 	int part;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (4 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-04-24  0:28   ` Gregory Price
  2025-02-18 13:23 ` [PATCH v2 07/15] cxl/region: Use translated HPA ranges " Robert Richter
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

For the implementation of address translation it might not be possible
to determine the root decoder in the early enumeration state since the
SPA range is still unknown. Instead, the endpoint's HPA range is known
and from there the topology can be traversed up to the root port while
the memory range is adjusted from one memory domain to the next up to
the root port.

In a first step, use endpoint's HPA range to find the port's decoder.
Without address translation HPA == SPA, so the endpoint's HPA range
can be used since it is the same as the root decoder's.

Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
---
 drivers/cxl/core/region.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index d898c9f51113..5048511f9de5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -872,9 +872,8 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
 
 static int match_auto_decoder(struct device *dev, const void *data)
 {
-	const struct cxl_region_params *p = data;
+	const struct range *r, *hpa = data;
 	struct cxl_decoder *cxld;
-	struct range *r;
 
 	if (!is_switch_decoder(dev))
 		return 0;
@@ -882,7 +881,7 @@ static int match_auto_decoder(struct device *dev, const void *data)
 	cxld = to_cxl_decoder(dev);
 	r = &cxld->hpa_range;
 
-	if (p->res && p->res->start == r->start && p->res->end == r->end)
+	if (hpa && hpa->start == r->start && hpa->end == r->end)
 		return 1;
 
 	return 0;
@@ -906,7 +905,7 @@ cxl_find_decoder_early(struct cxl_port *port,
 		return &cxled->cxld;
 
 	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
-		dev = device_find_child(&port->dev, &cxlr->params,
+		dev = device_find_child(&port->dev, &cxled->cxld.hpa_range,
 					match_auto_decoder);
 	else
 		dev = device_find_child(&port->dev, NULL, match_free_decoder);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 07/15] cxl/region: Use translated HPA ranges to find the port's decoder
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (5 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 19:13   ` Gregory Price
  2025-02-18 13:23 ` [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region Robert Richter
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

This is the second step to find the port's decoder with address
translation enabled. The translated HPA range must be used to find a
decoder. The port's HPA range is determined by applying address
translation when crossing memory domains for the HPA range to each
port while traversing the topology from the endpoint up to the port.

Introduce a function cxl_find_auto_decoder() that calculates the
port's translated address range to determine the corresponding
decoder. Use the existing helper function cxl_port_calc_hpa() for HPA
range calculation.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 60 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 5048511f9de5..6d5ede5b4c43 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -887,6 +887,63 @@ static int match_auto_decoder(struct device *dev, const void *data)
 	return 0;
 }
 
+static struct device *
+cxl_find_auto_decoder(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
+		      struct cxl_region *cxlr)
+{
+	struct cxl_port *parent, *iter = cxled_to_port(cxled);
+	struct cxl_decoder *cxld = &cxled->cxld;
+	struct range hpa = cxld->hpa_range;
+	struct cxl_region_ref *rr;
+
+	while (iter != port) {
+		parent = parent_port_of(iter);
+		if (!parent) {
+			dev_warn(&port->dev,
+				"port not a parent of endpoint decoder %s\n",
+				dev_name(&cxled->cxld.dev));
+			return NULL;
+		}
+
+		if (!parent->to_hpa) {
+			iter = parent;
+			continue;
+		}
+
+		/* Lower domain decoders are already attached. */
+		rr = cxl_rr_load(iter, cxlr);
+		cxld = rr ? rr->decoder : NULL;
+		if (!cxld) {
+			dev_warn(&iter->dev,
+				"no decoder found for region %s\n",
+				dev_name(&cxlr->dev));
+			return NULL;
+		}
+
+		/* Check switch decoder range. */
+		if (cxld != &cxled->cxld &&
+		    !match_auto_decoder(&cxld->dev, &hpa)) {
+			dev_warn(&iter->dev,
+				"decoder %s out of range %#llx-%#llx:%#llx-%#llx(%s)\n",
+				dev_name(&cxld->dev), cxld->hpa_range.start,
+				cxld->hpa_range.end, hpa.start, hpa.end,
+				dev_name(&cxled->cxld.dev));
+			return NULL;
+		}
+
+		if (cxl_port_calc_hpa(parent, cxld, &hpa) < 0)
+			return NULL;
+
+		iter = parent;
+	}
+
+	dev_dbg(cxld->dev.parent, "%s: range: %#llx-%#llx iw: %d ig: %d\n",
+		dev_name(&cxld->dev), hpa.start, hpa.end,
+		cxld->interleave_ways, cxld->interleave_granularity);
+
+	return device_find_child(&port->dev, &hpa, match_auto_decoder);
+}
+
 /*
  * Use cxl_find_decoder_early() only during region setup in the early
  * setup stage. Once a port is attached to a region, the region
@@ -905,8 +962,7 @@ cxl_find_decoder_early(struct cxl_port *port,
 		return &cxled->cxld;
 
 	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
-		dev = device_find_child(&port->dev, &cxled->cxld.hpa_range,
-					match_auto_decoder);
+		dev = cxl_find_auto_decoder(port, cxled, cxlr);
 	else
 		dev = device_find_child(&port->dev, NULL, match_free_decoder);
 	if (!dev)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (6 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 07/15] cxl/region: Use translated HPA ranges " Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 19:28   ` Gregory Price
                     ` (2 more replies)
  2025-02-18 13:23 ` [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create " Robert Richter
                   ` (7 subsequent siblings)
  15 siblings, 3 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

To find the correct region and root port of an endpoint of a system
needing address translation, the endpoint's HPA range must be
translated to each of the parent port address ranges up to the root
decoder.

Use the calculated SPA range of an endpoint to find the endpoint's
region.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6d5ede5b4c43..ffe6038249ed 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3535,7 +3535,6 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 
 static int cxl_endpoint_decoder_add(struct cxl_endpoint_decoder *cxled)
 {
-	struct range *hpa = &cxled->cxld.hpa_range;
 	struct cxl_root_decoder *cxlrd = cxled->cxlrd;
 	struct cxl_region_params *p;
 	struct cxl_region *cxlr;
@@ -3547,7 +3546,7 @@ static int cxl_endpoint_decoder_add(struct cxl_endpoint_decoder *cxled)
 	 * one does the construction and the others add to that.
 	 */
 	mutex_lock(&cxlrd->range_lock);
-	cxlr = cxl_find_region_by_range(cxlrd, hpa);
+	cxlr = cxl_find_region_by_range(cxlrd, &cxled->spa_range);
 	if (!cxlr)
 		cxlr = construct_region(cxlrd, cxled);
 	mutex_unlock(&cxlrd->range_lock);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create a region
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (7 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 19:31   ` Gregory Price
                     ` (2 more replies)
  2025-02-18 13:23 ` [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters " Robert Richter
                   ` (6 subsequent siblings)
  15 siblings, 3 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

To create a region, SPA ranges must be used. With address translation
the endpoint's HPA range is not the same as the SPA range. Use the
previously calculated SPA range instead.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ffe6038249ed..6e0434eee6df 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3445,14 +3445,14 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
 	return to_cxl_region(region_dev);
 }
 
-/* Establish an empty region covering the given HPA range */
+/* Establish an empty region covering the given SPA range */
 static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 					   struct cxl_endpoint_decoder *cxled)
 {
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 	struct cxl_port *port = cxlrd_to_port(cxlrd);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct range *hpa = &cxled->cxld.hpa_range;
+	struct range *spa = &cxled->spa_range;
 	int rc, part = READ_ONCE(cxled->part);
 	struct cxl_region_params *p;
 	struct cxl_region *cxlr;
@@ -3493,7 +3493,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 		goto err;
 	}
 
-	*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
+	*res = DEFINE_RES_MEM_NAMED(spa->start, range_len(spa),
 				    dev_name(&cxlr->dev));
 	rc = insert_resource(cxlrd->res, res);
 	if (rc) {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters to create a region
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (8 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create " Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 19:43   ` Gregory Price
                     ` (3 more replies)
  2025-02-18 13:23 ` [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check " Robert Richter
                   ` (5 subsequent siblings)
  15 siblings, 4 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

Endpoints requiring address translation might not be aware of the
system's interleaving configuration. Instead, interleaving can be
configured on an upper memory domain (from an endpoint view) and thus
is not visible to the endpoint. For region creation this might cause
an invalid interleaving config that does not match the CFMWS entries.

Use the interleaving configuration of the root decoders to create a
region which bases on CFMWS entries. This always matches the system's
interleaving configuration and is independent of the underlying memory
topology.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 39 ++++++++++++++++++++++++++++++++++-----
 drivers/cxl/cxl.h         |  2 ++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6e0434eee6df..3afcc9ca06ae 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1749,6 +1749,15 @@ static int cxl_region_validate_position(struct cxl_region *cxlr,
 		}
 	}
 
+	if (p->interleave_ways != cxled->interleave_ways ||
+	    p->interleave_granularity != cxled->interleave_granularity ) {
+		dev_dbg(&cxlr->dev, "interleaving config mismatch with %s: ways: %d:%d granularity: %d:%d\n",
+			dev_name(&cxled->cxld.dev), p->interleave_ways,
+			cxled->interleave_ways, p->interleave_granularity,
+			cxled->interleave_granularity);
+		return -ENXIO;
+	}
+
 	return 0;
 }
 
@@ -1852,7 +1861,7 @@ static int match_switch_decoder_by_range(struct device *dev,
 }
 
 static int find_pos_and_ways(struct cxl_port *port, struct range *range,
-			     int *pos, int *ways)
+			     int *pos, int *ways, int *granularity)
 {
 	struct cxl_switch_decoder *cxlsd;
 	struct cxl_port *parent;
@@ -1873,6 +1882,7 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
 	}
 	cxlsd = to_cxl_switch_decoder(dev);
 	*ways = cxlsd->cxld.interleave_ways;
+	*granularity = cxlsd->cxld.interleave_granularity;
 
 	for (int i = 0; i < *ways; i++) {
 		if (cxlsd->target[i] == port->parent_dport) {
@@ -1896,6 +1906,8 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
 struct cxl_interleave_context {
 	struct range *hpa_range;
 	int pos;
+	int interleave_ways;
+	int interleave_granularity;
 };
 
 /**
@@ -1914,13 +1926,17 @@ struct cxl_interleave_context {
  * the topology from the endpoint to the root decoder and iteratively
  * applying the function for each port.
  *
+ * Calculation of interleaving ways:
+ *
+ *    interleave_ways = interleave_ways * parent_ways;
+ *
  * Return: position >= 0 on success
  *	   -ENXIO on failure
  */
 static int cxl_port_calc_interleave(struct cxl_port *port,
 				    struct cxl_interleave_context *ctx)
 {
-	int parent_ways = 0, parent_pos = 0;
+	int parent_ways = 0, parent_pos = 0, parent_granularity = 0;
 	int rc;
 
 	/*
@@ -1955,12 +1971,23 @@ static int cxl_port_calc_interleave(struct cxl_port *port,
 	if (is_cxl_root(port))
 		return 0;
 
-	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways);
+	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways,
+			&parent_granularity);
 	if (rc)
 		return rc;
 
 	ctx->pos = ctx->pos * parent_ways + parent_pos;
 
+	if (ctx->interleave_ways)
+		ctx->interleave_ways *= parent_ways;
+	else
+		ctx->interleave_ways = parent_ways;
+
+	if (ctx->interleave_granularity)
+		ctx->interleave_granularity *= ctx->interleave_ways;
+	else
+		ctx->interleave_granularity = parent_granularity;
+
 	return ctx->pos;
 }
 
@@ -3407,6 +3434,8 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
 	cxled->cxlrd = to_cxl_root_decoder(&cxld->dev);
 	cxled->spa_range = hpa;
 	cxled->pos = ctx.pos;
+	cxled->interleave_ways = ctx.interleave_ways;
+	cxled->interleave_granularity = ctx.interleave_granularity;
 
 	return 0;
 }
@@ -3508,8 +3537,8 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 	}
 
 	p->res = res;
-	p->interleave_ways = cxled->cxld.interleave_ways;
-	p->interleave_granularity = cxled->cxld.interleave_granularity;
+	p->interleave_ways = cxled->interleave_ways;
+	p->interleave_granularity = cxled->interleave_granularity;
 	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
 
 	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 7303aec1c31c..31afd71c3c8e 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -399,6 +399,8 @@ struct cxl_endpoint_decoder {
 	enum cxl_decoder_state state;
 	int part;
 	int pos;
+	int interleave_ways;
+	int interleave_granularity;
 };
 
 /**
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check a region
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (9 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters " Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 19:50   ` Gregory Price
  2025-04-08 15:54   ` Gregory Price
  2025-02-18 13:23 ` [PATCH v2 12/15] cxl/region: Lock decoders that need address translation Robert Richter
                   ` (4 subsequent siblings)
  15 siblings, 2 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

Endpoints or switches requiring address translation might not be aware
of the system's interleaving configuration. Then, the configured
endpoint's address range might not match the expected range. In
contrast, the SPA range of an endpoint is calculated applying platform
specific address translation. That range is correct and can be used to
check a region range.

Adjust the region range check and use the endpoint's SPA range to
check it.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3afcc9ca06ae..2ca24565757a 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1531,22 +1531,26 @@ static int cxl_port_setup_targets(struct cxl_port *port,
 	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
 		if (cxld->interleave_ways != iw ||
 		    cxld->interleave_granularity != ig ||
-		    cxld->hpa_range.start != p->res->start ||
-		    cxld->hpa_range.end != p->res->end ||
+		    cxled->spa_range.start != p->res->start ||
+		    cxled->spa_range.end != p->res->end ||
 		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
 			dev_err(&cxlr->dev,
 				"%s:%s %s expected iw: %d ig: %d %pr\n",
 				dev_name(port->uport_dev), dev_name(&port->dev),
 				__func__, iw, ig, p->res);
 			dev_err(&cxlr->dev,
-				"%s:%s %s got iw: %d ig: %d state: %s %#llx:%#llx\n",
+				"%s:%s %s got iw: %d ig: %d state: %s %#llx-%#llx:%#llx-%#llx(%s):%#llx-%#llx(%s)\n",
 				dev_name(port->uport_dev), dev_name(&port->dev),
 				__func__, cxld->interleave_ways,
 				cxld->interleave_granularity,
 				(cxld->flags & CXL_DECODER_F_ENABLE) ?
 					"enabled" :
 					"disabled",
-				cxld->hpa_range.start, cxld->hpa_range.end);
+				p->res->start, p->res->end,
+				cxled->spa_range.start, cxled->spa_range.end,
+				dev_name(&cxled->cxld.dev),
+				cxld->hpa_range.start, cxld->hpa_range.end,
+				dev_name(&cxld->dev));
 			return -ENXIO;
 		}
 	} else {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 12/15] cxl/region: Lock decoders that need address translation
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (10 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check " Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 19:57   ` Gregory Price
  2025-02-18 13:23 ` [PATCH v2 13/15] cxl/x86: Prepare for architectural platform setup Robert Richter
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

There is only support to translate from the endpoint to its parent
port, but not in the opposite direction from the parent to the
endpoint. Thus, the endpoint address range cannot be determined and
setup manually. If the parent implements the ->to_hpa() callback and
needs address translation, forbid reprogramming of the decoders and
lock them.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/region.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 2ca24565757a..dab059ee26ef 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3410,6 +3410,17 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
 		if (rc < 0)
 			return rc;
 
+		/*
+		 * There is only support to translate from the endpoint to its
+		 * parent port, but not in the opposite direction from the
+		 * parent to the endpoint. Thus, the endpoint address range
+		 * cannot be determined and setup manually. If the address range
+		 * was translated and modified, forbid reprogramming of the
+		 * decoders and lock them.
+		 */
+		if (rc)
+			cxld->flags |= CXL_DECODER_F_LOCK;
+
 		/* Convert interleave settings to next port upstream. */
 		rc = cxl_port_calc_interleave(iter, &ctx);
 		if (rc < 0)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 13/15] cxl/x86: Prepare for architectural platform setup
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (11 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 12/15] cxl/region: Lock decoders that need address translation Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20 19:57   ` Gregory Price
  2025-02-18 13:23 ` [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT Robert Richter
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

In preparation of an architectural, vendor and platform specific port
setup, add a function arch_cxl_port_platform_setup() that can be used
to handle and implement architecture and platform specific code.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/Makefile     |  2 ++
 drivers/cxl/core/core.h       |  1 +
 drivers/cxl/core/port.c       |  4 ++++
 drivers/cxl/core/x86/common.c | 12 ++++++++++++
 4 files changed, 19 insertions(+)
 create mode 100644 drivers/cxl/core/x86/common.c

diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..db1d16d39037 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -16,3 +16,5 @@ cxl_core-y += pmu.o
 cxl_core-y += cdat.o
 cxl_core-$(CONFIG_TRACING) += trace.o
 cxl_core-$(CONFIG_CXL_REGION) += region.o
+
+cxl_core-$(CONFIG_X86)		+= x86/common.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index a20ea2b7d1a4..e2955f91fd98 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -118,5 +118,6 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
 					struct access_coordinate *c);
 
 int cxl_gpf_port_setup(struct device *dport_dev, struct cxl_port *port);
+void arch_cxl_port_platform_setup(struct cxl_port *port);
 
 #endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index d19930c009ce..e94671ea8455 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -841,6 +841,8 @@ static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
 			    &cxl_einj_inject_fops);
 }
 
+void __weak arch_cxl_port_platform_setup(struct cxl_port *port) { }
+
 static int cxl_port_add(struct cxl_port *port,
 			resource_size_t component_reg_phys,
 			struct cxl_dport *parent_dport)
@@ -878,6 +880,8 @@ static int cxl_port_add(struct cxl_port *port,
 			return rc;
 	}
 
+	arch_cxl_port_platform_setup(port);
+
 	rc = device_add(dev);
 	if (rc)
 		return rc;
diff --git a/drivers/cxl/core/x86/common.c b/drivers/cxl/core/x86/common.c
new file mode 100644
index 000000000000..eeb9bdadb26d
--- /dev/null
+++ b/drivers/cxl/core/x86/common.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <cxl.h>
+
+#include "../core.h"
+
+void arch_cxl_port_platform_setup(struct cxl_port *port)
+{
+}
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (12 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 13/15] cxl/x86: Prepare for architectural platform setup Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-21  0:40   ` Dave Jiang
                     ` (2 more replies)
  2025-02-18 13:23 ` [PATCH v2 15/15] MAINTAINERS: CXL: Add entry for AMD platform support (CXL_AMD) Robert Richter
  2025-02-20  1:00 ` [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Gregory Price
  15 siblings, 3 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, Robert Richter,
	Terry Bowman
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco

Add AMD platform specific Zen5 support for address translation.

Zen5 systems may be configured to use 'Normalized addresses'. Then,
CXL endpoints use their own physical address space and are programmed
passthrough (DPA == HPA), the number of interleaving ways for the
endpoint is set to one. The Host Physical Addresses (HPAs) need to be
translated from the endpoint to its CXL host bridge. The HPA of a CXL
host bridge is equivalent to the System Physical Address (SPA).

ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
Device Physical Address (DPA) to its System Physical Address. This is
documented in:

 AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
 ACPI v6.5 Porting Guide, Publication # 58088
 https://www.amd.com/en/search/documentation/hub.html

To implement AMD Zen5 address translation the following steps are
needed:

Apply platform specific changes to each port where necessary using
platform detection and the existing architectural framework.

Add a function cxl_port_setup_amd() to implement AMD platform specific
code. Use Kbuild and Kconfig options respectively to enable the code
depending on architecture and platform options. Handle architecture
specifics in arch_cxl_port_platform_setup() and create a new file
core/x86/amd.c for this.

Introduce a function cxl_zen5_init() to handle Zen5 specific
enablement. Zen5 platforms are detected using the PCIe vendor and
device ID of the corresponding CXL root port. There is a check for
ACPI PRMT CXL address translation support.

Apply cxl_zen5_to_hpa() as cxl_port->to_hpa() callback to Zen5 CXL
host bridges to enable platform specific address translation.

Use ACPI PRM DPA-to-SPA translation to determine an endpoint's
interleaving configuration and base address during the early
initialization process. This is used to determine an endpoint's SPA
range and to check the address translation setup.

The configuration can be determined calling the PRM for specific HPAs
given. The resulting SPAs are used to calculate interleaving
parameters of the host bridge and root port. The maximum granularity
(chunk size) is 16k, minimum is 256. Use the following calculation for
the given HPAs:

 ways    = hpa_len(SZ_16K) / SZ_16K
 gran    = (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
          / (ways - 1)
 pos     = (hpa_len(SZ_16K) - ways * SZ_16K) / gran

Before the endpoint is attached to a region the translation is checked
for reasonable values.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/Kconfig           |   4 +
 drivers/cxl/core/Makefile     |   1 +
 drivers/cxl/core/core.h       |   3 +
 drivers/cxl/core/region.c     |  25 +++-
 drivers/cxl/core/x86/amd.c    | 259 ++++++++++++++++++++++++++++++++++
 drivers/cxl/core/x86/common.c |   2 +
 6 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cxl/core/x86/amd.c

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 876469e23f7a..e576028dd983 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -146,4 +146,8 @@ config CXL_REGION_INVALIDATION_TEST
 	  If unsure, or if this kernel is meant for production environments,
 	  say N.
 
+config CXL_AMD
+       def_bool y
+       depends on AMD_NB
+
 endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index db1d16d39037..cfe41b8edfd3 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -18,3 +18,4 @@ cxl_core-$(CONFIG_TRACING) += trace.o
 cxl_core-$(CONFIG_CXL_REGION) += region.o
 
 cxl_core-$(CONFIG_X86)		+= x86/common.o
+cxl_core-$(CONFIG_CXL_AMD)	+= x86/amd.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index e2955f91fd98..d5c94e8cea42 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -119,5 +119,8 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
 
 int cxl_gpf_port_setup(struct device *dport_dev, struct cxl_port *port);
 void arch_cxl_port_platform_setup(struct cxl_port *port);
+#if defined(CONFIG_X86)
+void cxl_port_setup_amd(struct cxl_port *port);
+#endif
 
 #endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index dab059ee26ef..b6806e67c62a 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -837,10 +837,24 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
 {
 	struct range hpa = *hpa_range;
 	u64 len = range_len(&hpa);
+	int bits;
 
 	if (!port->to_hpa)
 		return 0;
 
+	/*
+	 * Check range and length alignment of 256 MB for the
+	 * interleaved address range. With max. 16-way interleaving
+	 * applied this is at least 16 KB.
+	 */
+
+	if (!len || hpa_range->start & (SZ_16K - 1) || len & (SZ_16K - 1)) {
+		dev_warn(&port->dev,
+			"HPA range not aligned or multiple of 16 kB: %#llx-%#llx(%s)\n",
+			hpa_range->start, hpa_range->end, dev_name(&cxld->dev));
+		return -ENXIO;
+	}
+
 	/* Translate HPA to the next upper domain. */
 	hpa.start = port->to_hpa(cxld, hpa.start);
 	hpa.end = port->to_hpa(cxld, hpa.end);
@@ -853,7 +867,16 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
 		return -ENXIO;
 	}
 
-	if (range_len(&hpa) != len * cxld->interleave_ways) {
+	/*
+	 *  Apply min and max interleaving addresses to the range.
+	 *  Determine the interleave ways and expand the 16 KB range
+	 *  by the power-of-2 part it.
+	 */
+	bits = range_len(&hpa) > len ? __ffs(range_len(&hpa) / len) : 0;
+	hpa.start &= ~((SZ_16K << bits) - 1);
+	hpa.end |= (SZ_16K << bits) - 1;
+
+	if (range_len(&hpa) % len) {
 		dev_warn(&port->dev,
 			"CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
 			hpa.start, hpa.end, hpa_range->start,
diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
new file mode 100644
index 000000000000..483c92c18054
--- /dev/null
+++ b/drivers/cxl/core/x86/amd.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/prmt.h>
+#include <linux/pci.h>
+
+#include <cxlmem.h>
+#include "../core.h"
+
+#define PCI_DEVICE_ID_AMD_ZEN5_ROOT		0x153e
+
+static const struct pci_device_id zen5_root_port_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ZEN5_ROOT) },
+	{},
+};
+
+static int is_zen5_root_port(struct device *dev, void *unused)
+{
+	if (!dev_is_pci(dev))
+		return 0;
+
+	return !!pci_match_id(zen5_root_port_ids, to_pci_dev(dev));
+}
+
+static bool is_zen5(struct cxl_port *port)
+{
+	if (!IS_ENABLED(CONFIG_ACPI_PRMT))
+		return false;
+
+	/* To get the CXL root port, find the CXL host bridge first. */
+	if (is_cxl_root(port) ||
+	    !port->host_bridge ||
+	    !is_cxl_root(to_cxl_port(port->dev.parent)))
+		return false;
+
+	return !!device_for_each_child(port->host_bridge, NULL,
+				       is_zen5_root_port);
+}
+
+/*
+ * PRM Address Translation - CXL DPA to System Physical Address
+ *
+ * Reference:
+ *
+ * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
+ * ACPI v6.5 Porting Guide, Publication # 58088
+ */
+
+static const guid_t prm_cxl_dpa_spa_guid =
+	GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
+		  0x48, 0x0b, 0x94);
+
+struct prm_cxl_dpa_spa_data {
+	u64 dpa;
+	u8 reserved;
+	u8 devfn;
+	u8 bus;
+	u8 segment;
+	void *out;
+} __packed;
+
+static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
+{
+	struct prm_cxl_dpa_spa_data data;
+	u64 spa;
+	int rc;
+
+	data = (struct prm_cxl_dpa_spa_data) {
+		.dpa     = dpa,
+		.devfn   = pci_dev->devfn,
+		.bus     = pci_dev->bus->number,
+		.segment = pci_domain_nr(pci_dev->bus),
+		.out     = &spa,
+	};
+
+	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+	if (rc) {
+		pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
+		return ULLONG_MAX;
+	}
+
+	pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
+
+	return spa;
+}
+
+/* Bits used for interleaving. */
+#define SPA_INTERLEAVING_BITS	GENMASK_ULL(14, 8)
+
+static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)
+{
+	struct cxl_memdev *cxlmd;
+	struct pci_dev *pci_dev;
+	struct cxl_port *port;
+	u64 base, spa, spa2, len, len2, offset, granularity, gran_mask;
+	int ways, pos, ways_bits, gran_bits;
+
+	/*
+	 * Nothing to do if base is non-zero and Normalized Addressing
+	 * is disabled.
+	 */
+	if (cxld->hpa_range.start)
+		return hpa;
+
+	/* Only translate from endpoint to its parent port. */
+	if (!is_endpoint_decoder(&cxld->dev))
+		return hpa;
+
+	/*
+	 * Endpoints are programmed passthrough in Normalized
+	 * Addressing mode.
+	 */
+	if (cxld->interleave_ways != 1) {
+		dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
+			cxld->interleave_ways, cxld->interleave_granularity);
+		return ULLONG_MAX;
+	}
+
+	if (hpa > cxld->hpa_range.end) {
+		dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
+			hpa, cxld->hpa_range.start, cxld->hpa_range.end);
+		return ULLONG_MAX;
+	}
+
+	port = to_cxl_port(cxld->dev.parent);
+	cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
+	if (!port || !dev_is_pci(cxlmd->dev.parent)) {
+		dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
+			dev_name(cxld->dev.parent), cxld->hpa_range.start,
+			cxld->hpa_range.end);
+		return ULLONG_MAX;
+	}
+	pci_dev = to_pci_dev(cxlmd->dev.parent);
+
+	/*
+	 * If the decoder is already attached we are past the decoder
+	 * initialization, do not determine the address mapping and
+	 * just return here.
+	 */
+	if (cxld->region)
+		return prm_cxl_dpa_spa(pci_dev, hpa);
+
+	/*
+	 * Determine the interleaving config. Maximum granularity
+	 * (chunk size) is 16k, minimum is 256. Calculated with:
+	 *
+	 *	ways	= hpa_len(SZ_16K) / SZ_16K
+	 * 	gran	= (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
+	 *                / (ways - 1)
+	 *	pos	= (hpa_len(SZ_16K) - ways * SZ_16K) / gran
+	 */
+
+	base = prm_cxl_dpa_spa(pci_dev, 0);
+	spa  = prm_cxl_dpa_spa(pci_dev, SZ_16K);
+	spa2 = prm_cxl_dpa_spa(pci_dev, SZ_16K - SZ_256);
+
+	/* Includes checks to avoid div by zero */
+	if (!base || base == ULLONG_MAX || spa == ULLONG_MAX ||
+	    spa2 == ULLONG_MAX || spa < base + SZ_16K || spa2 <= base ||
+	    (spa > base + SZ_16K && spa - spa2 < SZ_256 * 2)) {
+		dev_dbg(&cxld->dev, "Error translating HPA: base: %#llx spa: %#llx spa2: %#llx\n",
+			base, spa, spa2);
+		return ULLONG_MAX;
+	}
+
+	len = spa - base;
+	len2 = spa2 - base;
+
+	/* offset = pos * granularity */
+	if (len == SZ_16K && len2 == SZ_16K - SZ_256) {
+		ways = 1;
+		offset = 0;
+		granularity = SZ_256;
+		pos = 0;
+		ways_bits = 0;
+		gran_bits = 8;
+	} else {
+		ways = len / SZ_16K;
+		offset = spa & (SZ_16K - 1);
+		granularity = (len - len2 - SZ_256) / (ways - 1);
+		ways_bits = __ffs(ways);
+		gran_bits = __ffs(granularity);
+		pos = offset >> gran_bits;
+	}
+
+	/*
+	 * Check the mapping: Number of ways is power of 2 or a
+	 * multiple of 3 ways (len == ways * SZ_16K), granularitys is
+	 * power of 2.
+	 */
+	if (len & ~(3ULL << (ways_bits + 14)) ||
+	    granularity != 1 << gran_bits || offset != pos << gran_bits) {
+		dev_dbg(&cxld->dev, "Error determining address mapping: base: %#llx spa: %#llx spa2: %#llx ways: %d pos: %d granularity: %llu\n",
+			base, spa, spa2, ways, pos, granularity);
+		return ULLONG_MAX;
+	}
+
+	spa = prm_cxl_dpa_spa(pci_dev, hpa);
+
+	/*
+	 * Check SPA using a PRM call for the closest DPA calculated
+	 * for the HPA. If the HPA matches a different interleaving
+	 * position other than the decoder's, determine its offset to
+	 * adjust the SPA.
+	 */
+
+	gran_mask = GENMASK_ULL(gran_bits, 0);
+	spa2 = base + (hpa & ~gran_mask) * ways + (hpa & gran_mask);
+	base = base - pos * granularity;
+
+	dev_dbg(&cxld->dev,
+		"address mapping found for %s (hpa -> spa): %#llx -> %#llx (%#llx+%#llx) ways: %d pos: %d granularity: %llu\n",
+		pci_name(pci_dev), hpa, spa, base, spa - base, ways, pos,
+		granularity);
+
+
+	if ((spa ^ spa2) & ~SPA_INTERLEAVING_BITS) {
+		dev_dbg(&cxld->dev, "SPA calculation failed: %#llx:%#llx\n",
+			spa, spa2);
+		return ULLONG_MAX;
+	}
+
+	return spa;
+}
+
+static void cxl_zen5_init(struct cxl_port *port)
+{
+	u64 spa;
+	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
+	int rc;
+
+	if (!is_zen5(port))
+		return;
+
+	/* Check kernel and firmware support */
+	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+
+	if (rc == -EOPNOTSUPP) {
+		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
+		return;
+	}
+
+	if (rc == -ENODEV) {
+		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
+		return;
+	}
+
+	port->to_hpa = cxl_zen5_to_hpa;
+
+	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
+		dev_name(&port->dev));
+}
+
+void cxl_port_setup_amd(struct cxl_port *port)
+{
+	cxl_zen5_init(port);
+}
diff --git a/drivers/cxl/core/x86/common.c b/drivers/cxl/core/x86/common.c
index eeb9bdadb26d..7ccd68b035e6 100644
--- a/drivers/cxl/core/x86/common.c
+++ b/drivers/cxl/core/x86/common.c
@@ -9,4 +9,6 @@
 
 void arch_cxl_port_platform_setup(struct cxl_port *port)
 {
+	if (IS_ENABLED(CONFIG_CXL_AMD))
+		cxl_port_setup_amd(port);
 }
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 15/15] MAINTAINERS: CXL: Add entry for AMD platform support (CXL_AMD)
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (13 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT Robert Richter
@ 2025-02-18 13:23 ` Robert Richter
  2025-02-20  1:00 ` [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Gregory Price
  15 siblings, 0 replies; 61+ messages in thread
From: Robert Richter @ 2025-02-18 13:23 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman, Robert Richter

Adding a maintainer's entry for AMD platform specific CXL support.

Cc: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 896a307fa065..12eca796fadf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5842,6 +5842,13 @@ F:	include/cxl/
 F:	include/uapi/linux/cxl_mem.h
 F:	tools/testing/cxl/
 
+COMPUTE EXPRESS LINK - AMD (CXL_AMD)
+M:	Robert Richter <rrichter@amd.com>
+M:	Terry Bowman <terry.bowman@amd.com>
+L:	linux-cxl@vger.kernel.org
+S:	Maintained
+F:	drivers/cxl/core/x86/amd.c
+
 COMPUTE EXPRESS LINK PMU (CPMU)
 M:	Jonathan Cameron <jonathan.cameron@huawei.com>
 L:	linux-cxl@vger.kernel.org
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
@ 2025-02-19 23:32   ` Gregory Price
  2025-02-20 17:31   ` Gregory Price
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-19 23:32 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:45PM +0100, Robert Richter wrote:
> The calculation of an endpoint's position in a region traverses all
> ports up to the root port and determines the corresponding decoders
> for that particular address range. For address translation the HPA
> range must be recalculated between ports. In order to prepare the
> implementation of address translation, move code to
> cxl_endpoint_decoder_initialize() and reuse the existing iterator
> there.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ad4a6ce37216..6f106bfa115f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1903,7 +1903,6 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
>  	for (i = 0; i < p->nr_targets; i++) {
>  		struct cxl_endpoint_decoder *cxled = p->targets[i];
>  
> -		cxled->pos = cxl_calc_interleave_pos(cxled);
>  		/*
>  		 * Record that sorting failed, but still continue to calc
>  		 * cxled->pos so that follow-on code paths can reliably
> @@ -3264,10 +3263,22 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  	struct cxl_port *iter = cxled_to_port(cxled);
>  	struct cxl_decoder *root, *cxld = &cxled->cxld;
> -	struct range *hpa = &cxld->hpa_range;
> +	struct range hpa = cxld->hpa_range;

I believe you have a build error here:

drivers/cxl/core/region.c: In function ‘cxl_endpoint_decoder_initialize’:
drivers/cxl/core/region.c:3286:51: error: incompatible type for argument 2 of ‘cxl_port_find_switch_decoder’
 3286 |         root = cxl_port_find_switch_decoder(iter, hpa);
      |                                                   ^~~
      |                                                   |
      |                                                   struct range
drivers/cxl/core/region.c:3244:67: note: expected ‘struct range *’ but argument is of type ‘struct range’
 3244 | cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa)
      |                                                     ~~~~~~~~~~~~~~^~~

Just a missed & later in the function

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement
  2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
                   ` (14 preceding siblings ...)
  2025-02-18 13:23 ` [PATCH v2 15/15] MAINTAINERS: CXL: Add entry for AMD platform support (CXL_AMD) Robert Richter
@ 2025-02-20  1:00 ` Gregory Price
  15 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20  1:00 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:41PM +0100, Robert Richter wrote:
> This patch set adds support of address translation and enables this
> for AMD Zen5 platforms. This is a new appoach in response to an
> earlier attempt to implement CXL address translation [1] and the
> comments on it, esp. Dan's [2]. Dan suggested to solve this by walking
> the port hierarchy from the host port to the host bridge. When
> crossing memory domains from one port to the other, HPA translations
> are applied using a callback function to handle platform specifics.
> 
> This series bases on:
> 
>  [PATCH v3 00/18] cxl: Address translation support, part 1: Cleanups and refactoring
> 
> Purpose of patches:
>  * Patches #1-#2: Introduction of address translation callback,
>  * Patches #3-#12: Functional changes for address
>    translation (common code).
>  * #13: Architectural platform setup
>  * Patch #15, #15: AMD Zen5 address translation.
> 
> [1] https://lore.kernel.org/linux-cxl/20240701174754.967954-1-rrichter@amd.com/
> [2] https://lore.kernel.org/linux-cxl/669086821f136_5fffa29473@dwillia2-xfh.jf.intel.com.notmuch/
> 

With the one build fix i've reported, I have tested this with Part 1 on
a Zen5 system w/ the PRM functionality.

Will review patches individually, but for the set:

Tested-by: Gregory Price <gourry@gourry.net>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 01/15] cxl: Modify address translation callback for generic use
  2025-02-18 13:23 ` [PATCH v2 01/15] cxl: Modify address translation callback for generic use Robert Richter
@ 2025-02-20 16:00   ` Gregory Price
  2025-02-20 21:03     ` Dave Jiang
  0 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-02-20 16:00 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:42PM +0100, Robert Richter wrote:
> The root decoder address translation callback could be reused for
> other decoders too. For generic use of the callback, change the
> function interface to use a decoder argument instead of the root
> decoder.
> 
> Note that a root decoder's HPA is equal to its SPA, but else it may be

slighlty better wording:
"but it may different for non-root decoders"

> different. Thus, change the name of the function type to
> cxl_to_hpa_fn.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>

Reviewed-by: Gregory Price <gourry@gourry.net>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations
  2025-02-18 13:23 ` [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations Robert Richter
@ 2025-02-20 16:28   ` Gregory Price
  2025-02-20 16:41     ` Gregory Price
  2025-03-14 12:45   ` Jonathan Cameron
  1 sibling, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-02-20 16:28 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:44PM +0100, Robert Richter wrote:
> Function cxl_calc_interleave_pos() contains code to calculate the
> interleaving parameters of a port. Factor out that code for later
> reuse. Add function cxl_port_calc_interleave() for this and introduce
> struct cxl_interleave_context to collect all interleaving data.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 63 ++++++++++++++++++++++++++-------------
>  1 file changed, 43 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c118bda93e86..ad4a6ce37216 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1800,27 +1800,34 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
>  	return rc;
>  }
>  
> +struct cxl_interleave_context {
> +	struct range *hpa_range;
> +	int pos;
> +};
> +

I get that this will be used later to pass information back, but this
patch by itself is a little confusing because ctx seems pointless since
the function still returns the position and accesses the hpa_range directly

Looked at in isolation, having the context structure change in this
patch than just adding the hpa_range as an argument and adding the
context later when it's actually relevant.

static int cxl_port_calc_interleave(struct cxl_port *port,
				    struct range *hpa_range);

> +static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_port *iter, *port = cxled_to_port(cxled);
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_interleave_context ctx;
> +	int pos = 0;
> +
> +	ctx = (struct cxl_interleave_context) {
> +		.hpa_range = &cxled->cxld.hpa_range,
> +	};
> +
> +	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
> +	     iter = parent_port_of(iter))
> +		pos = cxl_port_calc_interleave(iter, &ctx);
>  
>  	dev_dbg(&cxlmd->dev,
>  		"decoder:%s parent:%s port:%s range:%#llx-%#llx pos:%d\n",
>  		dev_name(&cxled->cxld.dev), dev_name(cxlmd->dev.parent),
> -		dev_name(&port->dev), range->start, range->end, pos);
> +		dev_name(&port->dev), ctx.hpa_range->start, ctx.hpa_range->end,
> +		ctx.pos);
> 

	context just gets discarded here and hpa_range and pos are
	otherwise still accessible if you just pass hpa_range as an
	argument.  So I would push off adding the context argument to
	the patch that actually needs it.

>  	return pos;
>  }

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations
  2025-02-20 16:28   ` Gregory Price
@ 2025-02-20 16:41     ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 16:41 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Thu, Feb 20, 2025 at 11:28:40AM -0500, Gregory Price wrote:
> On Tue, Feb 18, 2025 at 02:23:44PM +0100, Robert Richter wrote:
> I get that this will be used later to pass information back, but this
> patch by itself is a little confusing because ctx seems pointless since
> the function still returns the position and accesses the hpa_range directly
> 
> Looked at in isolation, having the context structure change in this
> patch than just adding the hpa_range as an argument and adding the
> context later when it's actually relevant.
> 
> static int cxl_port_calc_interleave(struct cxl_port *port,
> 				    struct range *hpa_range);
>

Disregard my note here, I missed that pos has to be carried through.

Didn't notice until I looked at patch 3 and 4 together.

> +	ctx->pos = ctx->pos * parent_ways + parent_pos;
> +
> +	return ctx->pos;

This looks fine

Reviewed-by: Gregory Price <gourry@gourry.net>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
  2025-02-19 23:32   ` Gregory Price
@ 2025-02-20 17:31   ` Gregory Price
  2025-02-20 21:56   ` Dave Jiang
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 17:31 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:45PM +0100, Robert Richter wrote:
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ad4a6ce37216..6f106bfa115f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1903,7 +1903,6 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
>  	for (i = 0; i < p->nr_targets; i++) {
>  		struct cxl_endpoint_decoder *cxled = p->targets[i];
>  
> -		cxled->pos = cxl_calc_interleave_pos(cxled);
>  		/*
>  		 * Record that sorting failed, but still continue to calc
>  		 * cxled->pos so that follow-on code paths can reliably
> @@ -3264,10 +3263,22 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  	struct cxl_port *iter = cxled_to_port(cxled);
>  	struct cxl_decoder *root, *cxld = &cxled->cxld;
> -	struct range *hpa = &cxld->hpa_range;
> +	struct range hpa = cxld->hpa_range;
> +	struct cxl_interleave_context ctx;
> +	int rc;
>  
> -	while (iter && !is_cxl_root(iter))
> -		iter = to_cxl_port(iter->dev.parent);
> +	ctx = (struct cxl_interleave_context) {
> +		.hpa_range = &hpa,
> +	};
> +
> +	while (iter && !is_cxl_root(iter)) {
> +		/* Convert interleave settings to next port upstream. */
> +		rc = cxl_port_calc_interleave(iter, &ctx);

Thinking about it a bit more, you still have cxl_port_calc_interleave
returning the position, but you have the position captured in ctx.

I think you just want to return 0/ERR now since ctx will have the pos.

This makes me think patch 3 and 4 should just be 1 patch.

> +		if (rc < 0)
> +			return rc;
> +
> +		iter = parent_port_of(iter);
> +	}
>  
>  	if (!iter)
>  		return -ENXIO;
> @@ -3281,7 +3292,13 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  		return -ENXIO;
>  	}
>  
> +	dev_dbg(cxld->dev.parent,
> +		"%s:%s: range:%#llx-%#llx pos:%d\n",
> +		dev_name(&cxled->cxld.dev), dev_name(&cxld->dev),
> +		hpa.start, hpa.end, ctx.pos);
> +
>  	cxled->cxlrd = to_cxl_root_decoder(&root->dev);
> +	cxled->pos = ctx.pos;
>  
>  	return 0;
>  }
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint
  2025-02-18 13:23 ` [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint Robert Richter
@ 2025-02-20 18:42   ` Gregory Price
  2025-02-20 22:31   ` Dave Jiang
  2025-03-14 12:41   ` Jonathan Cameron
  2 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 18:42 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:46PM +0100, Robert Richter wrote:
> To find the correct region and root port of an endpoint of a system
> needing address translation, the endpoint's HPA range must be
> translated to each of the parent port address ranges up to the root
> decoder.
> 
> Calculate the SPA range using the newly introduced callback function
> port->to_hpa() that translates the decoder's HPA range to its parent
> port's HPA range of the next outer memory domain. Introduce the helper
> function cxl_port_calc_hpa() for this to calculate address ranges
> using the low-level port->to_hpa() callbacks. Determine the root port
> SPA range by iterating all the ports up to the root. Store the
> endpoint's SPA range for later use.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
>  
> +	/*
> +	 * Address translation is only supported for auto-discovery of
> +	 * decoders. There is no need to support address translation
> +	 * here. That is, do not recalculate ctx.hpa_range here.
> +	 */

Can we at least add why translation is not supported / needed?  Just
saying "there is no need" doesn't help devs after us understand the code.

With that change:

Reviewed-by: Gregory Price <gourry@gourry.net>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 07/15] cxl/region: Use translated HPA ranges to find the port's decoder
  2025-02-18 13:23 ` [PATCH v2 07/15] cxl/region: Use translated HPA ranges " Robert Richter
@ 2025-02-20 19:13   ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 19:13 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:48PM +0100, Robert Richter wrote:
> This is the second step to find the port's decoder with address
> translation enabled. The translated HPA range must be used to find a
> decoder. The port's HPA range is determined by applying address
> translation when crossing memory domains for the HPA range to each
> port while traversing the topology from the endpoint up to the port.
> 
> Introduce a function cxl_find_auto_decoder() that calculates the
> port's translated address range to determine the corresponding
> decoder. Use the existing helper function cxl_port_calc_hpa() for HPA
> range calculation.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 60 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 5048511f9de5..6d5ede5b4c43 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -887,6 +887,63 @@ static int match_auto_decoder(struct device *dev, const void *data)
>  	return 0;
>  }
>  
> +static struct device *
> +cxl_find_auto_decoder(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
> +		      struct cxl_region *cxlr)
> +{
> +	struct cxl_port *parent, *iter = cxled_to_port(cxled);
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	struct range hpa = cxld->hpa_range;
> +	struct cxl_region_ref *rr;
> +
> +	while (iter != port) {

I understand this is used to map an endpoint to a region associated with
the given port.  Is there a scenario where `port` is *not* a root decoder?

Seems like port will (should) always be a root decoder here in practice.

May or may not be a useful check on input, either way:

Reviewed-by: Gregory Price <gourry@gourry.net>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region
  2025-02-18 13:23 ` [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region Robert Richter
@ 2025-02-20 19:28   ` Gregory Price
  2025-03-14 12:45   ` Jonathan Cameron
  2025-04-08 15:45   ` Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 19:28 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:49PM +0100, Robert Richter wrote:
> To find the correct region and root port of an endpoint of a system
> needing address translation, the endpoint's HPA range must be
> translated to each of the parent port address ranges up to the root
> decoder.
> 
> Use the calculated SPA range of an endpoint to find the endpoint's
> region.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>

May be worth noting that cxl_endpoint_initialize() will set
cxled->spa_range regardless of whether HPA=SPA or not, and so it will
always contain the correct value at this point - even if translation
was not required.

Reviewed-by: Gregory Price <gourry@gourry.net>

> ---
>  drivers/cxl/core/region.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6d5ede5b4c43..ffe6038249ed 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3535,7 +3535,6 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>  
>  static int cxl_endpoint_decoder_add(struct cxl_endpoint_decoder *cxled)
>  {
> -	struct range *hpa = &cxled->cxld.hpa_range;
>  	struct cxl_root_decoder *cxlrd = cxled->cxlrd;
>  	struct cxl_region_params *p;
>  	struct cxl_region *cxlr;
> @@ -3547,7 +3546,7 @@ static int cxl_endpoint_decoder_add(struct cxl_endpoint_decoder *cxled)
>  	 * one does the construction and the others add to that.
>  	 */
>  	mutex_lock(&cxlrd->range_lock);
> -	cxlr = cxl_find_region_by_range(cxlrd, hpa);
> +	cxlr = cxl_find_region_by_range(cxlrd, &cxled->spa_range);
>  	if (!cxlr)
>  		cxlr = construct_region(cxlrd, cxled);
>  	mutex_unlock(&cxlrd->range_lock);
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create a region
  2025-02-18 13:23 ` [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create " Robert Richter
@ 2025-02-20 19:31   ` Gregory Price
  2025-03-14 12:46   ` Jonathan Cameron
  2025-04-08 15:50   ` Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 19:31 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:50PM +0100, Robert Richter wrote:
> To create a region, SPA ranges must be used. With address translation
> the endpoint's HPA range is not the same as the SPA range. Use the
> previously calculated SPA range instead.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>

Reviewed-by: Gregory Price <gourry@gourry.net>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters to create a region
  2025-02-18 13:23 ` [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters " Robert Richter
@ 2025-02-20 19:43   ` Gregory Price
  2025-03-14 12:49   ` Jonathan Cameron
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 19:43 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:51PM +0100, Robert Richter wrote:
... snip ...
>  static int find_pos_and_ways(struct cxl_port *port, struct range *range,
> -			     int *pos, int *ways)
> +			     int *pos, int *ways, int *granularity)

Function name starting to diverge from functionality

find_interleave_config() ?

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check a region
  2025-02-18 13:23 ` [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check " Robert Richter
@ 2025-02-20 19:50   ` Gregory Price
  2025-04-08 15:54   ` Gregory Price
  1 sibling, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 19:50 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:52PM +0100, Robert Richter wrote:
> Endpoints or switches requiring address translation might not be aware
> of the system's interleaving configuration. Then, the configured
> endpoint's address range might not match the expected range. In
> contrast, the SPA range of an endpoint is calculated applying platform
> specific address translation. That range is correct and can be used to
> check a region range.
> 
> Adjust the region range check and use the endpoint's SPA range to
> check it.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>

Explanation could be a little clearer, something like:

Endpoint and switch decoders may have an HPA range that differs from
the memory region's SPA range.  Utilize the device's calculated
spa_range instead of its hpa_range to validate the region mapping.


The interleave comments don't seem particularly relevant here unless i'm
missing something.

The content of the patch looks good.

Reviewed-by: Gregory Price <gourry@gourry.net>

> ---
>  drivers/cxl/core/region.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3afcc9ca06ae..2ca24565757a 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1531,22 +1531,26 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>  	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
>  		if (cxld->interleave_ways != iw ||
>  		    cxld->interleave_granularity != ig ||
> -		    cxld->hpa_range.start != p->res->start ||
> -		    cxld->hpa_range.end != p->res->end ||
> +		    cxled->spa_range.start != p->res->start ||
> +		    cxled->spa_range.end != p->res->end ||
>  		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
>  			dev_err(&cxlr->dev,
>  				"%s:%s %s expected iw: %d ig: %d %pr\n",
>  				dev_name(port->uport_dev), dev_name(&port->dev),
>  				__func__, iw, ig, p->res);
>  			dev_err(&cxlr->dev,
> -				"%s:%s %s got iw: %d ig: %d state: %s %#llx:%#llx\n",
> +				"%s:%s %s got iw: %d ig: %d state: %s %#llx-%#llx:%#llx-%#llx(%s):%#llx-%#llx(%s)\n",
>  				dev_name(port->uport_dev), dev_name(&port->dev),
>  				__func__, cxld->interleave_ways,
>  				cxld->interleave_granularity,
>  				(cxld->flags & CXL_DECODER_F_ENABLE) ?
>  					"enabled" :
>  					"disabled",
> -				cxld->hpa_range.start, cxld->hpa_range.end);
> +				p->res->start, p->res->end,
> +				cxled->spa_range.start, cxled->spa_range.end,
> +				dev_name(&cxled->cxld.dev),
> +				cxld->hpa_range.start, cxld->hpa_range.end,
> +				dev_name(&cxld->dev));
>  			return -ENXIO;
>  		}
>  	} else {
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 12/15] cxl/region: Lock decoders that need address translation
  2025-02-18 13:23 ` [PATCH v2 12/15] cxl/region: Lock decoders that need address translation Robert Richter
@ 2025-02-20 19:57   ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 19:57 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:53PM +0100, Robert Richter wrote:
> There is only support to translate from the endpoint to its parent
> port, but not in the opposite direction from the parent to the
> endpoint. Thus, the endpoint address range cannot be determined and
> setup manually. If the parent implements the ->to_hpa() callback and
> needs address translation, forbid reprogramming of the decoders and
> lock them.
>

Re-reading this explanation, it reads oddly.

I think what you are trying to say is:

On platforms where endpoint decoders require HPA-to-SPA translation,
decoders cannot be reprogrammed due to opaque translation done by
the platform's memory controllers. When address range is modified
(translated) lock the decoder to prevent reprogramming.

Is this accurate?


> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 2ca24565757a..dab059ee26ef 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3410,6 +3410,17 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  		if (rc < 0)
>  			return rc;
>  
> +		/*
> +		 * There is only support to translate from the endpoint to its
> +		 * parent port, but not in the opposite direction from the
> +		 * parent to the endpoint. Thus, the endpoint address range
> +		 * cannot be determined and setup manually. If the address range
> +		 * was translated and modified, forbid reprogramming of the
> +		 * decoders and lock them.
> +		 */
> +		if (rc)
> +			cxld->flags |= CXL_DECODER_F_LOCK;
> +
>  		/* Convert interleave settings to next port upstream. */
>  		rc = cxl_port_calc_interleave(iter, &ctx);
>  		if (rc < 0)
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 13/15] cxl/x86: Prepare for architectural platform setup
  2025-02-18 13:23 ` [PATCH v2 13/15] cxl/x86: Prepare for architectural platform setup Robert Richter
@ 2025-02-20 19:57   ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 19:57 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:54PM +0100, Robert Richter wrote:
> In preparation of an architectural, vendor and platform specific port
> setup, add a function arch_cxl_port_platform_setup() that can be used
> to handle and implement architecture and platform specific code.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>

Reviewed-by: Gregory Price <gourry@gourry.net>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 01/15] cxl: Modify address translation callback for generic use
  2025-02-20 16:00   ` Gregory Price
@ 2025-02-20 21:03     ` Dave Jiang
  0 siblings, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2025-02-20 21:03 UTC (permalink / raw)
  To: Gregory Price, Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
	Fabio M. De Francesco, Terry Bowman



On 2/20/25 9:00 AM, Gregory Price wrote:
> On Tue, Feb 18, 2025 at 02:23:42PM +0100, Robert Richter wrote:
>> The root decoder address translation callback could be reused for
>> other decoders too. For generic use of the callback, change the
>> function interface to use a decoder argument instead of the root
>> decoder.
>>
>> Note that a root decoder's HPA is equal to its SPA, but else it may be
> 
> slighlty better wording:
> "but it may different for non-root decoders"

This change helped me understand what that sentence was trying to say.

> 
>> different. Thus, change the name of the function type to
>> cxl_to_hpa_fn.

I would also drop the "thus" so it's in imperative tone. 

>>
>> Signed-off-by: Robert Richter <rrichter@amd.com>
> 
> Reviewed-by: Gregory Price <gourry@gourry.net>
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 02/15] cxl: Introduce callback to translate an HPA range from a port to its parent
  2025-02-18 13:23 ` [PATCH v2 02/15] cxl: Introduce callback to translate an HPA range from a port to its parent Robert Richter
@ 2025-02-20 21:19   ` Dave Jiang
  0 siblings, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2025-02-20 21:19 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman



On 2/18/25 6:23 AM, Robert Richter wrote:
> To enable address translation, the endpoint's HPA range must be
> translated to each of the parent port's address ranges up to the root
> decoder. Traverse the decoder and port hierarchy from the endpoint up
> to the root port and apply platform specific translation functions to
> determine the next HPA range of the parent port where needed:
> 
>   if (cxl_port->to_hpa)
>     hpa = cxl_port->to_hpa(cxl_decoder, hpa)
> 
> The root port's HPA range is equivalent to the system's SPA range.
> 
> Introduce a callback to translate an HPA range from a port to its
> parent.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> ---
>  drivers/cxl/cxl.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)

With such a small change, I would fold this into the patch where you are using it. 

> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b19ba47242c6..17496784f021 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -418,6 +418,15 @@ struct cxl_switch_decoder {
>  	struct cxl_dport *target[];
>  };
>  
> +/**
> + * cxl_to_hpa_fn - type of a callback function to translate an HPA
> + * @cxld: cxl_decoder to translate from
> + * @hpa: HPA of the @cxld decoder's address range
> + *
> + * The callback translates a decoder's HPA to the next upper domain
> + * which is the address range of the decoder's parent port. The return
> + * value is the translated HPA on success or ULLONG_MAX otherwise.
> + */
>  typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
>  
>  /**
> @@ -581,6 +590,7 @@ struct cxl_dax_region {
>   * @parent_dport: dport that points to this port in the parent
>   * @decoder_ida: allocator for decoder ids
>   * @reg_map: component and ras register mapping parameters
> + * @to_hpa: Callback to translate a child port's decoder address to the port's HPA address range
>   * @nr_dports: number of entries in @dports
>   * @hdm_end: track last allocated HDM decoder instance for allocation ordering
>   * @commit_end: cursor to track highest committed decoder for commit ordering
> @@ -602,6 +612,7 @@ struct cxl_port {
>  	struct cxl_dport *parent_dport;
>  	struct ida decoder_ida;
>  	struct cxl_register_map reg_map;
> +	cxl_to_hpa_fn to_hpa;
>  	int nr_dports;
>  	int hdm_end;
>  	int commit_end;


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
  2025-02-19 23:32   ` Gregory Price
  2025-02-20 17:31   ` Gregory Price
@ 2025-02-20 21:56   ` Dave Jiang
  2025-04-04  4:38   ` Gregory Price
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2025-02-20 21:56 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman



On 2/18/25 6:23 AM, Robert Richter wrote:
> The calculation of an endpoint's position in a region traverses all
> ports up to the root port and determines the corresponding decoders
> for that particular address range. For address translation the HPA
> range must be recalculated between ports. In order to prepare the
> implementation of address translation, move code to
> cxl_endpoint_decoder_initialize() and reuse the existing iterator
> there.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ad4a6ce37216..6f106bfa115f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1903,7 +1903,6 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
>  	for (i = 0; i < p->nr_targets; i++) {
>  		struct cxl_endpoint_decoder *cxled = p->targets[i];
>  
> -		cxled->pos = cxl_calc_interleave_pos(cxled);
>  		/*
>  		 * Record that sorting failed, but still continue to calc
>  		 * cxled->pos so that follow-on code paths can reliably

Do the comments need to be updated here with the deletion of that line above?

DJ

> @@ -3264,10 +3263,22 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  	struct cxl_port *iter = cxled_to_port(cxled);
>  	struct cxl_decoder *root, *cxld = &cxled->cxld;
> -	struct range *hpa = &cxld->hpa_range;
> +	struct range hpa = cxld->hpa_range;
> +	struct cxl_interleave_context ctx;
> +	int rc;
>  
> -	while (iter && !is_cxl_root(iter))
> -		iter = to_cxl_port(iter->dev.parent);
> +	ctx = (struct cxl_interleave_context) {
> +		.hpa_range = &hpa,
> +	};
> +
> +	while (iter && !is_cxl_root(iter)) {
> +		/* Convert interleave settings to next port upstream. */
> +		rc = cxl_port_calc_interleave(iter, &ctx);
> +		if (rc < 0)
> +			return rc;
> +
> +		iter = parent_port_of(iter);
> +	}
>  
>  	if (!iter)
>  		return -ENXIO;
> @@ -3281,7 +3292,13 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  		return -ENXIO;
>  	}
>  
> +	dev_dbg(cxld->dev.parent,
> +		"%s:%s: range:%#llx-%#llx pos:%d\n",
> +		dev_name(&cxled->cxld.dev), dev_name(&cxld->dev),
> +		hpa.start, hpa.end, ctx.pos);
> +
>  	cxled->cxlrd = to_cxl_root_decoder(&root->dev);
> +	cxled->pos = ctx.pos;
>  
>  	return 0;
>  }


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint
  2025-02-18 13:23 ` [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint Robert Richter
  2025-02-20 18:42   ` Gregory Price
@ 2025-02-20 22:31   ` Dave Jiang
  2025-02-20 22:37     ` Gregory Price
  2025-03-14 12:41   ` Jonathan Cameron
  2 siblings, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2025-02-20 22:31 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
	Terry Bowman



On 2/18/25 6:23 AM, Robert Richter wrote:
> To find the correct region and root port of an endpoint of a system
> needing address translation, the endpoint's HPA range must be
> translated to each of the parent port address ranges up to the root
> decoder.
> 
> Calculate the SPA range using the newly introduced callback function
> port->to_hpa() that translates the decoder's HPA range to its parent
> port's HPA range of the next outer memory domain. Introduce the helper
> function cxl_port_calc_hpa() for this to calculate address ranges
> using the low-level port->to_hpa() callbacks. Determine the root port
> SPA range by iterating all the ports up to the root. Store the
> endpoint's SPA range for later use.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 81 ++++++++++++++++++++++++++++++++-------
>  drivers/cxl/cxl.h         |  1 +
>  2 files changed, 68 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6f106bfa115f..d898c9f51113 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -832,6 +832,44 @@ static int match_free_decoder(struct device *dev, const void *data)
>  	return 1;
>  }
>  
> +static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
> +			     struct range *hpa_range)
> +{
> +	struct range hpa = *hpa_range;
> +	u64 len = range_len(&hpa);
> +
> +	if (!port->to_hpa)
> +		return 0;
> +
> +	/* Translate HPA to the next upper domain. */
> +	hpa.start = port->to_hpa(cxld, hpa.start);
> +	hpa.end = port->to_hpa(cxld, hpa.end);
> +
> +	if (hpa.start == ULLONG_MAX || hpa.end == ULLONG_MAX) {
> +		dev_warn(&port->dev,
> +			"CXL address translation: HPA range invalid: %#llx-%#llx:%#llx-%#llx(%s)\n",
> +			hpa.start, hpa.end, hpa_range->start,
> +			hpa_range->end, dev_name(&cxld->dev));
> +		return -ENXIO;
> +	}
> +
> +	if (range_len(&hpa) != len * cxld->interleave_ways) {
> +		dev_warn(&port->dev,
> +			"CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
> +			hpa.start, hpa.end, hpa_range->start,
> +			hpa_range->end, dev_name(&cxld->dev));
> +		return -ENXIO;
> +	}
> +
> +	if (hpa.start == hpa_range->start && hpa.end == hpa_range->end)
> +		return 0;
> +
> +	*hpa_range = hpa;
> +
> +	/* Return 1 if modified. */
> +	return 1;
> +}
> +
>  static int match_auto_decoder(struct device *dev, const void *data)
>  {
>  	const struct cxl_region_params *p = data;
> @@ -1882,6 +1920,11 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
>  		.hpa_range = &cxled->cxld.hpa_range,
>  	};
>  
> +	/*
> +	 * Address translation is only supported for auto-discovery of
> +	 * decoders. There is no need to support address translation
> +	 * here. That is, do not recalculate ctx.hpa_range here.
> +	 */

Should this go with patch 3?

>  	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
>  	     iter = parent_port_of(iter))
>  		pos = cxl_port_calc_interleave(iter, &ctx);
> @@ -3262,7 +3305,8 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  {
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  	struct cxl_port *iter = cxled_to_port(cxled);
> -	struct cxl_decoder *root, *cxld = &cxled->cxld;
> +	struct cxl_port *parent = parent_port_of(iter);
> +	struct cxl_decoder *cxld = &cxled->cxld;
>  	struct range hpa = cxld->hpa_range;
>  	struct cxl_interleave_context ctx;
>  	int rc;
> @@ -3271,25 +3315,33 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  		.hpa_range = &hpa,
>  	};
>  
> -	while (iter && !is_cxl_root(iter)) {
> +	if (!iter || !parent)

While parent_port_of() will check NULL and return NULL, it just seems icky checking the pointer after use.

DJ

> +		return -ENXIO;
> +
> +	while (iter && parent) {
> +		/* Translate HPA to the next upper memory domain. */
> +		rc = cxl_port_calc_hpa(parent, cxld, &hpa);
> +		if (rc < 0)
> +			return rc;
> +
>  		/* Convert interleave settings to next port upstream. */
>  		rc = cxl_port_calc_interleave(iter, &ctx);
>  		if (rc < 0)
>  			return rc;
>  
> -		iter = parent_port_of(iter);
> -	}
> +		iter = parent;
> +		parent = parent_port_of(iter);
>  
> -	if (!iter)
> -		return -ENXIO;
> +		if (!parent || parent->to_hpa)
> +			cxld = cxl_port_find_switch_decoder(iter, &hpa);
>  
> -	root = cxl_port_find_switch_decoder(iter, hpa);
> -	if (!root) {
> -		dev_err(cxlmd->dev.parent,
> -			"%s:%s no CXL window for range %#llx:%#llx\n",
> -			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> -			cxld->hpa_range.start, cxld->hpa_range.end);
> -		return -ENXIO;
> +		if (!cxld) {
> +			dev_err(cxlmd->dev.parent,
> +				"%s:%s no CXL window for range %#llx:%#llx\n",
> +				dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +				hpa.start, hpa.end);
> +			return -ENXIO;
> +		}
>  	}
>  
>  	dev_dbg(cxld->dev.parent,
> @@ -3297,7 +3349,8 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  		dev_name(&cxled->cxld.dev), dev_name(&cxld->dev),
>  		hpa.start, hpa.end, ctx.pos);
>  
> -	cxled->cxlrd = to_cxl_root_decoder(&root->dev);
> +	cxled->cxlrd = to_cxl_root_decoder(&cxld->dev);
> +	cxled->spa_range = hpa;
>  	cxled->pos = ctx.pos;
>  
>  	return 0;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 17496784f021..7303aec1c31c 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -394,6 +394,7 @@ struct cxl_endpoint_decoder {
>  	struct cxl_decoder cxld;
>  	struct cxl_root_decoder *cxlrd;
>  	struct resource *dpa_res;
> +	struct range spa_range;
>  	resource_size_t skip;
>  	enum cxl_decoder_state state;
>  	int part;


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint
  2025-02-20 22:31   ` Dave Jiang
@ 2025-02-20 22:37     ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-02-20 22:37 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Thu, Feb 20, 2025 at 03:31:45PM -0700, Dave Jiang wrote:
> >  	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
> >  	     iter = parent_port_of(iter))
> >  		pos = cxl_port_calc_interleave(iter, &ctx);
> > @@ -3262,7 +3305,8 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
> >  {
> >  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> >  	struct cxl_port *iter = cxled_to_port(cxled);
> > -	struct cxl_decoder *root, *cxld = &cxled->cxld;
> > +	struct cxl_port *parent = parent_port_of(iter);
> > +	struct cxl_decoder *cxld = &cxled->cxld;
> >  	struct range hpa = cxld->hpa_range;
> >  	struct cxl_interleave_context ctx;
> >  	int rc;
> > @@ -3271,25 +3315,33 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
> >  		.hpa_range = &hpa,
> >  	};
> >  
> > -	while (iter && !is_cxl_root(iter)) {
> > +	if (!iter || !parent)
> 
> While parent_port_of() will check NULL and return NULL, it just seems icky checking the pointer after use.
> 

I briefly had the same thought and dug into parent_port_of, erred on the
side of "How many places is cxl_endpoint_decoder_initialize going to be
called?" - but you're probably right, we probably should do the null
check if only for style. 

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT
  2025-02-18 13:23 ` [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT Robert Richter
@ 2025-02-21  0:40   ` Dave Jiang
  2025-03-14 13:01   ` Jonathan Cameron
  2025-04-05  2:38   ` [PATCH] [HACK] drop zen5_init checks due to segfault Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2025-02-21  0:40 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Davidlohr Bueso, Terry Bowman
  Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco



On 2/18/25 6:23 AM, Robert Richter wrote:
> Add AMD platform specific Zen5 support for address translation.
> 
> Zen5 systems may be configured to use 'Normalized addresses'. Then,
> CXL endpoints use their own physical address space and are programmed
> passthrough (DPA == HPA), the number of interleaving ways for the
> endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> translated from the endpoint to its CXL host bridge. The HPA of a CXL
> host bridge is equivalent to the System Physical Address (SPA).
> 
> ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> Device Physical Address (DPA) to its System Physical Address. This is
> documented in:
> 
>  AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
>  ACPI v6.5 Porting Guide, Publication # 58088
>  https://www.amd.com/en/search/documentation/hub.html
> 
> To implement AMD Zen5 address translation the following steps are
> needed:
> 
> Apply platform specific changes to each port where necessary using
> platform detection and the existing architectural framework.
> 
> Add a function cxl_port_setup_amd() to implement AMD platform specific
> code. Use Kbuild and Kconfig options respectively to enable the code
> depending on architecture and platform options. Handle architecture
> specifics in arch_cxl_port_platform_setup() and create a new file
> core/x86/amd.c for this.
> 
> Introduce a function cxl_zen5_init() to handle Zen5 specific
> enablement. Zen5 platforms are detected using the PCIe vendor and
> device ID of the corresponding CXL root port. There is a check for
> ACPI PRMT CXL address translation support.
> 
> Apply cxl_zen5_to_hpa() as cxl_port->to_hpa() callback to Zen5 CXL
> host bridges to enable platform specific address translation.
> 
> Use ACPI PRM DPA-to-SPA translation to determine an endpoint's
> interleaving configuration and base address during the early
> initialization process. This is used to determine an endpoint's SPA
> range and to check the address translation setup.
> 
> The configuration can be determined calling the PRM for specific HPAs
> given. The resulting SPAs are used to calculate interleaving
> parameters of the host bridge and root port. The maximum granularity
> (chunk size) is 16k, minimum is 256. Use the following calculation for
> the given HPAs:
> 
>  ways    = hpa_len(SZ_16K) / SZ_16K
>  gran    = (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
>           / (ways - 1)
>  pos     = (hpa_len(SZ_16K) - ways * SZ_16K) / gran
> 
> Before the endpoint is attached to a region the translation is checked
> for reasonable values.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>

What is the possibilty of making this ACPI PRM call interface public and teach drivers/cxl/acpi.c to make it common code? That way if the platform implements the ACPI call then the translation gets registered and we don't need to have platform specific code. Really would like to avoid drivers/cxl/core/$arch if that's possible. 

DJ

> ---
>  drivers/cxl/Kconfig           |   4 +
>  drivers/cxl/core/Makefile     |   1 +
>  drivers/cxl/core/core.h       |   3 +
>  drivers/cxl/core/region.c     |  25 +++-
>  drivers/cxl/core/x86/amd.c    | 259 ++++++++++++++++++++++++++++++++++
>  drivers/cxl/core/x86/common.c |   2 +
>  6 files changed, 293 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/cxl/core/x86/amd.c
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 876469e23f7a..e576028dd983 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -146,4 +146,8 @@ config CXL_REGION_INVALIDATION_TEST
>  	  If unsure, or if this kernel is meant for production environments,
>  	  say N.
>  
> +config CXL_AMD
> +       def_bool y
> +       depends on AMD_NB
> +
>  endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index db1d16d39037..cfe41b8edfd3 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -18,3 +18,4 @@ cxl_core-$(CONFIG_TRACING) += trace.o
>  cxl_core-$(CONFIG_CXL_REGION) += region.o
>  
>  cxl_core-$(CONFIG_X86)		+= x86/common.o
> +cxl_core-$(CONFIG_CXL_AMD)	+= x86/amd.o
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index e2955f91fd98..d5c94e8cea42 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -119,5 +119,8 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
>  
>  int cxl_gpf_port_setup(struct device *dport_dev, struct cxl_port *port);
>  void arch_cxl_port_platform_setup(struct cxl_port *port);
> +#if defined(CONFIG_X86)
> +void cxl_port_setup_amd(struct cxl_port *port);
> +#endif
>  
>  #endif /* __CXL_CORE_H__ */
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index dab059ee26ef..b6806e67c62a 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -837,10 +837,24 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
>  {
>  	struct range hpa = *hpa_range;
>  	u64 len = range_len(&hpa);
> +	int bits;
>  
>  	if (!port->to_hpa)
>  		return 0;
>  
> +	/*
> +	 * Check range and length alignment of 256 MB for the
> +	 * interleaved address range. With max. 16-way interleaving
> +	 * applied this is at least 16 KB.
> +	 */
> +
> +	if (!len || hpa_range->start & (SZ_16K - 1) || len & (SZ_16K - 1)) {
> +		dev_warn(&port->dev,
> +			"HPA range not aligned or multiple of 16 kB: %#llx-%#llx(%s)\n",
> +			hpa_range->start, hpa_range->end, dev_name(&cxld->dev));
> +		return -ENXIO;
> +	}
> +
>  	/* Translate HPA to the next upper domain. */
>  	hpa.start = port->to_hpa(cxld, hpa.start);
>  	hpa.end = port->to_hpa(cxld, hpa.end);
> @@ -853,7 +867,16 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
>  		return -ENXIO;
>  	}
>  
> -	if (range_len(&hpa) != len * cxld->interleave_ways) {
> +	/*
> +	 *  Apply min and max interleaving addresses to the range.
> +	 *  Determine the interleave ways and expand the 16 KB range
> +	 *  by the power-of-2 part it.
> +	 */
> +	bits = range_len(&hpa) > len ? __ffs(range_len(&hpa) / len) : 0;
> +	hpa.start &= ~((SZ_16K << bits) - 1);
> +	hpa.end |= (SZ_16K << bits) - 1;
> +
> +	if (range_len(&hpa) % len) {
>  		dev_warn(&port->dev,
>  			"CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
>  			hpa.start, hpa.end, hpa_range->start,
> diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
> new file mode 100644
> index 000000000000..483c92c18054
> --- /dev/null
> +++ b/drivers/cxl/core/x86/amd.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> + */
> +
> +#include <linux/prmt.h>
> +#include <linux/pci.h>
> +
> +#include <cxlmem.h>
> +#include "../core.h"
> +
> +#define PCI_DEVICE_ID_AMD_ZEN5_ROOT		0x153e
> +
> +static const struct pci_device_id zen5_root_port_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ZEN5_ROOT) },
> +	{},
> +};
> +
> +static int is_zen5_root_port(struct device *dev, void *unused)
> +{
> +	if (!dev_is_pci(dev))
> +		return 0;
> +
> +	return !!pci_match_id(zen5_root_port_ids, to_pci_dev(dev));
> +}
> +
> +static bool is_zen5(struct cxl_port *port)
> +{
> +	if (!IS_ENABLED(CONFIG_ACPI_PRMT))
> +		return false;
> +
> +	/* To get the CXL root port, find the CXL host bridge first. */
> +	if (is_cxl_root(port) ||
> +	    !port->host_bridge ||
> +	    !is_cxl_root(to_cxl_port(port->dev.parent)))
> +		return false;
> +
> +	return !!device_for_each_child(port->host_bridge, NULL,
> +				       is_zen5_root_port);
> +}
> +
> +/*
> + * PRM Address Translation - CXL DPA to System Physical Address
> + *
> + * Reference:
> + *
> + * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> + * ACPI v6.5 Porting Guide, Publication # 58088
> + */
> +
> +static const guid_t prm_cxl_dpa_spa_guid =
> +	GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
> +		  0x48, 0x0b, 0x94);
> +
> +struct prm_cxl_dpa_spa_data {
> +	u64 dpa;
> +	u8 reserved;
> +	u8 devfn;
> +	u8 bus;
> +	u8 segment;
> +	void *out;
> +} __packed;
> +
> +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
> +{
> +	struct prm_cxl_dpa_spa_data data;
> +	u64 spa;
> +	int rc;
> +
> +	data = (struct prm_cxl_dpa_spa_data) {
> +		.dpa     = dpa,
> +		.devfn   = pci_dev->devfn,
> +		.bus     = pci_dev->bus->number,
> +		.segment = pci_domain_nr(pci_dev->bus),
> +		.out     = &spa,
> +	};
> +
> +	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> +	if (rc) {
> +		pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
> +		return ULLONG_MAX;
> +	}
> +
> +	pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
> +
> +	return spa;
> +}
> +
> +/* Bits used for interleaving. */
> +#define SPA_INTERLEAVING_BITS	GENMASK_ULL(14, 8)
> +
> +static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> +{
> +	struct cxl_memdev *cxlmd;
> +	struct pci_dev *pci_dev;
> +	struct cxl_port *port;
> +	u64 base, spa, spa2, len, len2, offset, granularity, gran_mask;
> +	int ways, pos, ways_bits, gran_bits;
> +
> +	/*
> +	 * Nothing to do if base is non-zero and Normalized Addressing
> +	 * is disabled.
> +	 */
> +	if (cxld->hpa_range.start)
> +		return hpa;
> +
> +	/* Only translate from endpoint to its parent port. */
> +	if (!is_endpoint_decoder(&cxld->dev))
> +		return hpa;
> +
> +	/*
> +	 * Endpoints are programmed passthrough in Normalized
> +	 * Addressing mode.
> +	 */
> +	if (cxld->interleave_ways != 1) {
> +		dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
> +			cxld->interleave_ways, cxld->interleave_granularity);
> +		return ULLONG_MAX;
> +	}
> +
> +	if (hpa > cxld->hpa_range.end) {
> +		dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
> +			hpa, cxld->hpa_range.start, cxld->hpa_range.end);
> +		return ULLONG_MAX;
> +	}
> +
> +	port = to_cxl_port(cxld->dev.parent);
> +	cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
> +	if (!port || !dev_is_pci(cxlmd->dev.parent)) {
> +		dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
> +			dev_name(cxld->dev.parent), cxld->hpa_range.start,
> +			cxld->hpa_range.end);
> +		return ULLONG_MAX;
> +	}
> +	pci_dev = to_pci_dev(cxlmd->dev.parent);
> +
> +	/*
> +	 * If the decoder is already attached we are past the decoder
> +	 * initialization, do not determine the address mapping and
> +	 * just return here.
> +	 */
> +	if (cxld->region)
> +		return prm_cxl_dpa_spa(pci_dev, hpa);
> +
> +	/*
> +	 * Determine the interleaving config. Maximum granularity
> +	 * (chunk size) is 16k, minimum is 256. Calculated with:
> +	 *
> +	 *	ways	= hpa_len(SZ_16K) / SZ_16K
> +	 * 	gran	= (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
> +	 *                / (ways - 1)
> +	 *	pos	= (hpa_len(SZ_16K) - ways * SZ_16K) / gran
> +	 */
> +
> +	base = prm_cxl_dpa_spa(pci_dev, 0);
> +	spa  = prm_cxl_dpa_spa(pci_dev, SZ_16K);
> +	spa2 = prm_cxl_dpa_spa(pci_dev, SZ_16K - SZ_256);
> +
> +	/* Includes checks to avoid div by zero */
> +	if (!base || base == ULLONG_MAX || spa == ULLONG_MAX ||
> +	    spa2 == ULLONG_MAX || spa < base + SZ_16K || spa2 <= base ||
> +	    (spa > base + SZ_16K && spa - spa2 < SZ_256 * 2)) {
> +		dev_dbg(&cxld->dev, "Error translating HPA: base: %#llx spa: %#llx spa2: %#llx\n",
> +			base, spa, spa2);
> +		return ULLONG_MAX;
> +	}
> +
> +	len = spa - base;
> +	len2 = spa2 - base;
> +
> +	/* offset = pos * granularity */
> +	if (len == SZ_16K && len2 == SZ_16K - SZ_256) {
> +		ways = 1;
> +		offset = 0;
> +		granularity = SZ_256;
> +		pos = 0;
> +		ways_bits = 0;
> +		gran_bits = 8;
> +	} else {
> +		ways = len / SZ_16K;
> +		offset = spa & (SZ_16K - 1);
> +		granularity = (len - len2 - SZ_256) / (ways - 1);
> +		ways_bits = __ffs(ways);
> +		gran_bits = __ffs(granularity);
> +		pos = offset >> gran_bits;
> +	}
> +
> +	/*
> +	 * Check the mapping: Number of ways is power of 2 or a
> +	 * multiple of 3 ways (len == ways * SZ_16K), granularitys is
> +	 * power of 2.
> +	 */
> +	if (len & ~(3ULL << (ways_bits + 14)) ||
> +	    granularity != 1 << gran_bits || offset != pos << gran_bits) {
> +		dev_dbg(&cxld->dev, "Error determining address mapping: base: %#llx spa: %#llx spa2: %#llx ways: %d pos: %d granularity: %llu\n",
> +			base, spa, spa2, ways, pos, granularity);
> +		return ULLONG_MAX;
> +	}
> +
> +	spa = prm_cxl_dpa_spa(pci_dev, hpa);
> +
> +	/*
> +	 * Check SPA using a PRM call for the closest DPA calculated
> +	 * for the HPA. If the HPA matches a different interleaving
> +	 * position other than the decoder's, determine its offset to
> +	 * adjust the SPA.
> +	 */
> +
> +	gran_mask = GENMASK_ULL(gran_bits, 0);
> +	spa2 = base + (hpa & ~gran_mask) * ways + (hpa & gran_mask);
> +	base = base - pos * granularity;
> +
> +	dev_dbg(&cxld->dev,
> +		"address mapping found for %s (hpa -> spa): %#llx -> %#llx (%#llx+%#llx) ways: %d pos: %d granularity: %llu\n",
> +		pci_name(pci_dev), hpa, spa, base, spa - base, ways, pos,
> +		granularity);
> +
> +
> +	if ((spa ^ spa2) & ~SPA_INTERLEAVING_BITS) {
> +		dev_dbg(&cxld->dev, "SPA calculation failed: %#llx:%#llx\n",
> +			spa, spa2);
> +		return ULLONG_MAX;
> +	}
> +
> +	return spa;
> +}
> +
> +static void cxl_zen5_init(struct cxl_port *port)
> +{
> +	u64 spa;
> +	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> +	int rc;
> +
> +	if (!is_zen5(port))
> +		return;
> +
> +	/* Check kernel and firmware support */
> +	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> +
> +	if (rc == -EOPNOTSUPP) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
> +		return;
> +	}
> +
> +	if (rc == -ENODEV) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
> +		return;
> +	}
> +
> +	port->to_hpa = cxl_zen5_to_hpa;
> +
> +	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> +		dev_name(&port->dev));
> +}
> +
> +void cxl_port_setup_amd(struct cxl_port *port)
> +{
> +	cxl_zen5_init(port);
> +}
> diff --git a/drivers/cxl/core/x86/common.c b/drivers/cxl/core/x86/common.c
> index eeb9bdadb26d..7ccd68b035e6 100644
> --- a/drivers/cxl/core/x86/common.c
> +++ b/drivers/cxl/core/x86/common.c
> @@ -9,4 +9,6 @@
>  
>  void arch_cxl_port_platform_setup(struct cxl_port *port)
>  {
> +	if (IS_ENABLED(CONFIG_CXL_AMD))
> +		cxl_port_setup_amd(port);
>  }


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint
  2025-02-18 13:23 ` [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint Robert Richter
  2025-02-20 18:42   ` Gregory Price
  2025-02-20 22:31   ` Dave Jiang
@ 2025-03-14 12:41   ` Jonathan Cameron
  2 siblings, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2025-03-14 12:41 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
	Gregory Price, Fabio M. De Francesco, Terry Bowman

On Tue, 18 Feb 2025 14:23:46 +0100
Robert Richter <rrichter@amd.com> wrote:

> To find the correct region and root port of an endpoint of a system
> needing address translation, the endpoint's HPA range must be
> translated to each of the parent port address ranges up to the root
> decoder.
> 
> Calculate the SPA range using the newly introduced callback function
> port->to_hpa() that translates the decoder's HPA range to its parent
> port's HPA range of the next outer memory domain. Introduce the helper
> function cxl_port_calc_hpa() for this to calculate address ranges
> using the low-level port->to_hpa() callbacks. Determine the root port
> SPA range by iterating all the ports up to the root. Store the
> endpoint's SPA range for later use.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 81 ++++++++++++++++++++++++++++++++-------
>  drivers/cxl/cxl.h         |  1 +
>  2 files changed, 68 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6f106bfa115f..d898c9f51113 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c

> +
>  static int match_auto_decoder(struct device *dev, const void *data)
>  {
>  	const struct cxl_region_params *p = data;
> @@ -1882,6 +1920,11 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
>  		.hpa_range = &cxled->cxld.hpa_range,
>  	};
>  
> +	/*
> +	 * Address translation is only supported for auto-discovery of
> +	 * decoders. There is no need to support address translation
> +	 * here. That is, do not recalculate ctx.hpa_range here.
> +	 */
>  	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
>  	     iter = parent_port_of(iter))
>  		pos = cxl_port_calc_interleave(iter, &ctx);
> @@ -3262,7 +3305,8 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  {
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  	struct cxl_port *iter = cxled_to_port(cxled);
> -	struct cxl_decoder *root, *cxld = &cxled->cxld;
> +	struct cxl_port *parent = parent_port_of(iter);
> +	struct cxl_decoder *cxld = &cxled->cxld;
>  	struct range hpa = cxld->hpa_range;
>  	struct cxl_interleave_context ctx;
>  	int rc;
> @@ -3271,25 +3315,33 @@ static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
>  		.hpa_range = &hpa,
>  	};
>  
> -	while (iter && !is_cxl_root(iter)) {
> +	if (!iter || !parent)
> +		return -ENXIO;
> +
> +	while (iter && parent) {
> +		/* Translate HPA to the next upper memory domain. */
> +		rc = cxl_port_calc_hpa(parent, cxld, &hpa);
> +		if (rc < 0)
> +			return rc;
> +
>  		/* Convert interleave settings to next port upstream. */
>  		rc = cxl_port_calc_interleave(iter, &ctx);
>  		if (rc < 0)
>  			return rc;
>  
> -		iter = parent_port_of(iter);
> -	}
> +		iter = parent;
> +		parent = parent_port_of(iter);
>  
> -	if (!iter)
> -		return -ENXIO;
> +		if (!parent || parent->to_hpa)
> +			cxld = cxl_port_find_switch_decoder(iter, &hpa);
>  
> -	root = cxl_port_find_switch_decoder(iter, hpa);
> -	if (!root) {

I'm not immediately spotting why root needs a rename.  Is it used
for other things now that it wasn't before?

> -		dev_err(cxlmd->dev.parent,
> -			"%s:%s no CXL window for range %#llx:%#llx\n",
> -			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> -			cxld->hpa_range.start, cxld->hpa_range.end);
> -		return -ENXIO;
> +		if (!cxld) {
> +			dev_err(cxlmd->dev.parent,
> +				"%s:%s no CXL window for range %#llx:%#llx\n",
> +				dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> +				hpa.start, hpa.end);
> +			return -ENXIO;
> +		}
>  	}



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations
  2025-02-18 13:23 ` [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations Robert Richter
  2025-02-20 16:28   ` Gregory Price
@ 2025-03-14 12:45   ` Jonathan Cameron
  1 sibling, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2025-03-14 12:45 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
	Gregory Price, Fabio M. De Francesco, Terry Bowman

On Tue, 18 Feb 2025 14:23:44 +0100
Robert Richter <rrichter@amd.com> wrote:

> Function cxl_calc_interleave_pos() contains code to calculate the
> interleaving parameters of a port. Factor out that code for later
> reuse. Add function cxl_port_calc_interleave() for this and introduce
> struct cxl_interleave_context to collect all interleaving data.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 63 ++++++++++++++++++++++++++-------------
>  1 file changed, 43 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c118bda93e86..ad4a6ce37216 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1800,27 +1800,34 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
>  	return rc;
>  }
>  
> +struct cxl_interleave_context {
> +	struct range *hpa_range;
> +	int pos;

If this isn't going to get bigger later in the series I'd be inclined to just pass
the two separately.  Update the pos based on return value if non negative.

Maybe I'm missing a reason that doesn't work.



> +};
> +
>  /**
> - * cxl_calc_interleave_pos() - calculate an endpoint position in a region
> - * @cxled: endpoint decoder member of given region
> + * cxl_port_calc_interleave() - calculate interleave config of an endpoint for @port
> + * @port: Port the new position is calculated for.
> + * @ctx: Interleave context

>   *
> - * The endpoint position is calculated by traversing the topology from
> - * the endpoint to the root decoder and iteratively applying this
> - * calculation:
> + * The endpoint position for the next port is calculated by applying
> + * this calculation:
>   *
>   *    position = position * parent_ways + parent_pos;
>   *
>   * ...where @position is inferred from switch and root decoder target lists.
>   *
> + * The endpoint's position in a region can be calculated by traversing
> + * the topology from the endpoint to the root decoder and iteratively
> + * applying the function for each port.
> + *
>   * Return: position >= 0 on success
>   *	   -ENXIO on failure
>   */
> -static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> +static int cxl_port_calc_interleave(struct cxl_port *port,
> +				    struct cxl_interleave_context *ctx)
>  {
> -	struct cxl_port *iter, *port = cxled_to_port(cxled);
> -	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> -	struct range *range = &cxled->cxld.hpa_range;
> -	int parent_ways = 0, parent_pos = 0, pos = 0;
> +	int parent_ways = 0, parent_pos = 0;
>  	int rc;
>  
>  	/*
> @@ -1852,22 +1859,38 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
>  	 * complex topologies, including those with switches.
>  	 */
>  
> -	/* Iterate from endpoint to root_port refining the position */
> -	for (iter = port; iter; iter = parent_port_of(iter)) {
> -		if (is_cxl_root(iter))
> -			break;
> +	if (is_cxl_root(port))
> +		return 0;
>  
> -		rc = find_pos_and_ways(iter, range, &parent_pos, &parent_ways);
> -		if (rc)
> -			return rc;
> +	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways);
> +	if (rc)
> +		return rc;
>  
> -		pos = pos * parent_ways + parent_pos;
> -	}
> +	ctx->pos = ctx->pos * parent_ways + parent_pos;
> +
> +	return ctx->pos;
> +}
> +
> +static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_port *iter, *port = cxled_to_port(cxled);
> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_interleave_context ctx;
> +	int pos = 0;
> +
> +	ctx = (struct cxl_interleave_context) {
> +		.hpa_range = &cxled->cxld.hpa_range,
> +	};
> +
> +	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
> +	     iter = parent_port_of(iter))
> +		pos = cxl_port_calc_interleave(iter, &ctx);
>  
>  	dev_dbg(&cxlmd->dev,
>  		"decoder:%s parent:%s port:%s range:%#llx-%#llx pos:%d\n",
>  		dev_name(&cxled->cxld.dev), dev_name(cxlmd->dev.parent),
> -		dev_name(&port->dev), range->start, range->end, pos);
> +		dev_name(&port->dev), ctx.hpa_range->start, ctx.hpa_range->end,
> +		ctx.pos);
>  
>  	return pos;
>  }


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region
  2025-02-18 13:23 ` [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region Robert Richter
  2025-02-20 19:28   ` Gregory Price
@ 2025-03-14 12:45   ` Jonathan Cameron
  2025-04-08 15:45   ` Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2025-03-14 12:45 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
	Gregory Price, Fabio M. De Francesco, Terry Bowman

On Tue, 18 Feb 2025 14:23:49 +0100
Robert Richter <rrichter@amd.com> wrote:

> To find the correct region and root port of an endpoint of a system
> needing address translation, the endpoint's HPA range must be
> translated to each of the parent port address ranges up to the root
> decoder.
> 
> Use the calculated SPA range of an endpoint to find the endpoint's
> region.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create a region
  2025-02-18 13:23 ` [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create " Robert Richter
  2025-02-20 19:31   ` Gregory Price
@ 2025-03-14 12:46   ` Jonathan Cameron
  2025-04-08 15:50   ` Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2025-03-14 12:46 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
	Gregory Price, Fabio M. De Francesco, Terry Bowman

On Tue, 18 Feb 2025 14:23:50 +0100
Robert Richter <rrichter@amd.com> wrote:

> To create a region, SPA ranges must be used. With address translation
> the endpoint's HPA range is not the same as the SPA range. Use the
> previously calculated SPA range instead.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters to create a region
  2025-02-18 13:23 ` [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters " Robert Richter
  2025-02-20 19:43   ` Gregory Price
@ 2025-03-14 12:49   ` Jonathan Cameron
  2025-04-01  1:59   ` Gregory Price
  2025-04-01 18:03   ` Gregory Price
  3 siblings, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2025-03-14 12:49 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
	Gregory Price, Fabio M. De Francesco, Terry Bowman

On Tue, 18 Feb 2025 14:23:51 +0100
Robert Richter <rrichter@amd.com> wrote:

> Endpoints requiring address translation might not be aware of the
> system's interleaving configuration. Instead, interleaving can be
> configured on an upper memory domain (from an endpoint view) and thus
> is not visible to the endpoint. For region creation this might cause
> an invalid interleaving config that does not match the CFMWS entries.
> 
> Use the interleaving configuration of the root decoders to create a
> region which bases on CFMWS entries. This always matches the system's
> interleaving configuration and is independent of the underlying memory
> topology.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 39 ++++++++++++++++++++++++++++++++++-----
>  drivers/cxl/cxl.h         |  2 ++
>  2 files changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6e0434eee6df..3afcc9ca06ae 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1749,6 +1749,15 @@ static int cxl_region_validate_position(struct cxl_region *cxlr,
>  		}
>  	}
>  
> +	if (p->interleave_ways != cxled->interleave_ways ||
> +	    p->interleave_granularity != cxled->interleave_granularity ) {
> +		dev_dbg(&cxlr->dev, "interleaving config mismatch with %s: ways: %d:%d granularity: %d:%d\n",
> +			dev_name(&cxled->cxld.dev), p->interleave_ways,
> +			cxled->interleave_ways, p->interleave_granularity,
> +			cxled->interleave_granularity);
> +		return -ENXIO;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -1852,7 +1861,7 @@ static int match_switch_decoder_by_range(struct device *dev,
>  }
>  
>  static int find_pos_and_ways(struct cxl_port *port, struct range *range,
> -			     int *pos, int *ways)
> +			     int *pos, int *ways, int *granularity)
>  {
>  	struct cxl_switch_decoder *cxlsd;
>  	struct cxl_port *parent;
> @@ -1873,6 +1882,7 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
>  	}
>  	cxlsd = to_cxl_switch_decoder(dev);
>  	*ways = cxlsd->cxld.interleave_ways;
> +	*granularity = cxlsd->cxld.interleave_granularity;
>  
>  	for (int i = 0; i < *ways; i++) {
>  		if (cxlsd->target[i] == port->parent_dport) {
> @@ -1896,6 +1906,8 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
>  struct cxl_interleave_context {
>  	struct range *hpa_range;
>  	int pos;
> +	int interleave_ways;
> +	int interleave_granularity;

Ah. And here is our context expansion

>  };
>  
>  /**
> @@ -1914,13 +1926,17 @@ struct cxl_interleave_context {
>   * the topology from the endpoint to the root decoder and iteratively
>   * applying the function for each port.
>   *
> + * Calculation of interleaving ways:
> + *
> + *    interleave_ways = interleave_ways * parent_ways;
> + *
>   * Return: position >= 0 on success
>   *	   -ENXIO on failure
>   */
>  static int cxl_port_calc_interleave(struct cxl_port *port,
>  				    struct cxl_interleave_context *ctx)
>  {
> -	int parent_ways = 0, parent_pos = 0;
> +	int parent_ways = 0, parent_pos = 0, parent_granularity = 0;
>  	int rc;
>  
>  	/*
> @@ -1955,12 +1971,23 @@ static int cxl_port_calc_interleave(struct cxl_port *port,
>  	if (is_cxl_root(port))
>  		return 0;
>  
> -	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways);
> +	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways,
> +			&parent_granularity);
>  	if (rc)
>  		return rc;
>  
>  	ctx->pos = ctx->pos * parent_ways + parent_pos;
>  
> +	if (ctx->interleave_ways)
> +		ctx->interleave_ways *= parent_ways;
> +	else
> +		ctx->interleave_ways = parent_ways;
> +
> +	if (ctx->interleave_granularity)
> +		ctx->interleave_granularity *= ctx->interleave_ways;
> +	else
> +		ctx->interleave_granularity = parent_granularity;
> +
>  	return ctx->pos;

I think Gregory called this out in earlier patch.  Mixing and matching
between returning pos and use of ctx makes things hard to read.  If
we need to have it in context, then make this return 0 or -ERR instead.

>  }



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT
  2025-02-18 13:23 ` [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT Robert Richter
  2025-02-21  0:40   ` Dave Jiang
@ 2025-03-14 13:01   ` Jonathan Cameron
  2025-04-05  2:38   ` [PATCH] [HACK] drop zen5_init checks due to segfault Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2025-03-14 13:01 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Dave Jiang, Davidlohr Bueso, Terry Bowman, linux-cxl,
	linux-kernel, Gregory Price, Fabio M. De Francesco

On Tue, 18 Feb 2025 14:23:55 +0100
Robert Richter <rrichter@amd.com> wrote:

> Add AMD platform specific Zen5 support for address translation.
> 
> Zen5 systems may be configured to use 'Normalized addresses'. Then,
> CXL endpoints use their own physical address space and are programmed
> passthrough (DPA == HPA), the number of interleaving ways for the
> endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> translated from the endpoint to its CXL host bridge. The HPA of a CXL
> host bridge is equivalent to the System Physical Address (SPA).
> 
> ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> Device Physical Address (DPA) to its System Physical Address. This is
> documented in:
> 
>  AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
>  ACPI v6.5 Porting Guide, Publication # 58088
>  https://www.amd.com/en/search/documentation/hub.html
> 
> To implement AMD Zen5 address translation the following steps are
> needed:
> 
> Apply platform specific changes to each port where necessary using
> platform detection and the existing architectural framework.
> 
> Add a function cxl_port_setup_amd() to implement AMD platform specific
> code. Use Kbuild and Kconfig options respectively to enable the code
> depending on architecture and platform options. Handle architecture
> specifics in arch_cxl_port_platform_setup() and create a new file
> core/x86/amd.c for this.
> 
> Introduce a function cxl_zen5_init() to handle Zen5 specific
> enablement. Zen5 platforms are detected using the PCIe vendor and
> device ID of the corresponding CXL root port. There is a check for
> ACPI PRMT CXL address translation support.
> 
> Apply cxl_zen5_to_hpa() as cxl_port->to_hpa() callback to Zen5 CXL
> host bridges to enable platform specific address translation.
> 
> Use ACPI PRM DPA-to-SPA translation to determine an endpoint's
> interleaving configuration and base address during the early
> initialization process. This is used to determine an endpoint's SPA
> range and to check the address translation setup.
> 
> The configuration can be determined calling the PRM for specific HPAs
> given. The resulting SPAs are used to calculate interleaving
> parameters of the host bridge and root port. The maximum granularity
> (chunk size) is 16k, minimum is 256. Use the following calculation for
> the given HPAs:
> 
>  ways    = hpa_len(SZ_16K) / SZ_16K
>  gran    = (hpa_len(SZ_16K) - hpa_len(SZ_16K - SZ_256) - SZ_256)
>           / (ways - 1)
>  pos     = (hpa_len(SZ_16K) - ways * SZ_16K) / gran
> 
> Before the endpoint is attached to a region the translation is checked
> for reasonable values.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
Some trivial comments inline.  The rest I may take another look at as
not gotten my head fully around it yet.

> diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
> new file mode 100644
> index 000000000000..483c92c18054
> --- /dev/null
> +++ b/drivers/cxl/core/x86/amd.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> + */
> +
> +#include <linux/prmt.h>
> +#include <linux/pci.h>
> +
> +#include <cxlmem.h>
> +#include "../core.h"
> +
> +#define PCI_DEVICE_ID_AMD_ZEN5_ROOT		0x153e

Might as well just put it inline unless you have lots of uses.

> +
> +static const struct pci_device_id zen5_root_port_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ZEN5_ROOT) },
> +	{},
	{}

Don't want to make it easy to add stuff after.

> +};
> +
> +static int is_zen5_root_port(struct device *dev, void *unused)
> +{
> +	if (!dev_is_pci(dev))
> +		return 0;
> +
> +	return !!pci_match_id(zen5_root_port_ids, to_pci_dev(dev));
> +}
> +
> +static bool is_zen5(struct cxl_port *port)
> +{
> +	if (!IS_ENABLED(CONFIG_ACPI_PRMT))
> +		return false;
> +
> +	/* To get the CXL root port, find the CXL host bridge first. */
> +	if (is_cxl_root(port) ||
> +	    !port->host_bridge ||
combine a few of these on same line perhaps?
> +	    !is_cxl_root(to_cxl_port(port->dev.parent)))
> +		return false;

> +
> +/* Bits used for interleaving. */
> +#define SPA_INTERLEAVING_BITS	GENMASK_ULL(14, 8)
> +
> +static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> +{


> +	port = to_cxl_port(cxld->dev.parent);
> +	cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
> +	if (!port || !dev_is_pci(cxlmd->dev.parent)) {

Given you are going to dereference cxlmd, maybe more logical to do

	if (!cxlmd || !dev_is_pci()

> +		dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
> +			dev_name(cxld->dev.parent), cxld->hpa_range.start,
> +			cxld->hpa_range.end);
> +		return ULLONG_MAX;
> +	}
> +	pci_dev = to_pci_dev(cxlmd->dev.parent);
> +
> +	/*
> +	 * If the decoder is already attached we are past the decoder
> +	 * initialization, do not determine the address mapping and
> +	 * just return here.
> +	 */
> +	if (cxld->region)
> +		return prm_cxl_dpa_spa(pci_dev, hpa);
> +
>
> +	/*
> +	 * Check the mapping: Number of ways is power of 2 or a
> +	 * multiple of 3 ways (len == ways * SZ_16K), granularitys is

granularity is 

> +	 * power of 2.
> +	 */
> +	if (len & ~(3ULL << (ways_bits + 14)) ||
> +	    granularity != 1 << gran_bits || offset != pos << gran_bits) {
> +		dev_dbg(&cxld->dev, "Error determining address mapping: base: %#llx spa: %#llx spa2: %#llx ways: %d pos: %d granularity: %llu\n",
> +			base, spa, spa2, ways, pos, granularity);
> +		return ULLONG_MAX;
> +	}
> +
> +	spa = prm_cxl_dpa_spa(pci_dev, hpa);
> +
> +	/*
> +	 * Check SPA using a PRM call for the closest DPA calculated
> +	 * for the HPA. If the HPA matches a different interleaving
> +	 * position other than the decoder's, determine its offset to
> +	 * adjust the SPA.
> +	 */
> +
> +	gran_mask = GENMASK_ULL(gran_bits, 0);
> +	spa2 = base + (hpa & ~gran_mask) * ways + (hpa & gran_mask);
> +	base = base - pos * granularity;
> +
> +	dev_dbg(&cxld->dev,
> +		"address mapping found for %s (hpa -> spa): %#llx -> %#llx (%#llx+%#llx) ways: %d pos: %d granularity: %llu\n",
> +		pci_name(pci_dev), hpa, spa, base, spa - base, ways, pos,
> +		granularity);
> +
> +
> +	if ((spa ^ spa2) & ~SPA_INTERLEAVING_BITS) {
> +		dev_dbg(&cxld->dev, "SPA calculation failed: %#llx:%#llx\n",
> +			spa, spa2);
> +		return ULLONG_MAX;
> +	}
> +
> +	return spa;
> +}
> +
> +static void cxl_zen5_init(struct cxl_port *port)
> +{
> +	u64 spa;
> +	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> +	int rc;
> +
> +	if (!is_zen5(port))
> +		return;
> +
> +	/* Check kernel and firmware support */
> +	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> +
No blank line here. Keen the error check right up against the call.
> +	if (rc == -EOPNOTSUPP) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
> +		return;
> +	}
> +
> +	if (rc == -ENODEV) {
> +		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
> +		return;
> +	}
> +
> +	port->to_hpa = cxl_zen5_to_hpa;
> +
> +	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> +		dev_name(&port->dev));
> +}
> +
> +void cxl_port_setup_amd(struct cxl_port *port)
> +{
> +	cxl_zen5_init(port);
> +}


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters to create a region
  2025-02-18 13:23 ` [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters " Robert Richter
  2025-02-20 19:43   ` Gregory Price
  2025-03-14 12:49   ` Jonathan Cameron
@ 2025-04-01  1:59   ` Gregory Price
  2025-04-01  5:26     ` Gregory Price
  2025-04-01 18:03   ` Gregory Price
  3 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-04-01  1:59 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:51PM +0100, Robert Richter wrote:
> @@ -1955,12 +1971,23 @@ static int cxl_port_calc_interleave(struct cxl_port *port,
>  	if (is_cxl_root(port))
>  		return 0;
>  
> -	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways);
> +	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways,
> +			&parent_granularity);
>  	if (rc)
>  		return rc;
>  
>  	ctx->pos = ctx->pos * parent_ways + parent_pos;
>  
> +	if (ctx->interleave_ways)
> +		ctx->interleave_ways *= parent_ways;
> +	else
> +		ctx->interleave_ways = parent_ways;
> +
> +	if (ctx->interleave_granularity)
> +		ctx->interleave_granularity *= ctx->interleave_ways;
> +	else
> +		ctx->interleave_granularity = parent_granularity;
> +
>  	return ctx->pos;
>  }
>  

I have discovered on my Zen5 that either this code is incorrect, or my
decoders are programmed incorrectly.

decoderN.M  |  ig  iw
----------------------
decoder0.0  |  2   256
decoder1.0  |  1   256
decoder3.0  |  1   256
decoder5.0  |  1   256
decoder6.0  |  1   256
region0     |  2   512 <--- Wrong

*Arch quirk aside*, everything except region is as expected.


I finally dropped a bunch of hacks from my branch, and my Zen5 stopped
bringing devices up correctly, with the error:

[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets expected
               iw: 1 ig: 1024   [... snip ...]
[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets got
               iw: 1 ig: 256    [... snip ...]

Sitting here scratching my head how I could possibly end up with an ig
of 1024 with the above set of decoders when I realized the region
inherited interleave_ways/granularity from the ENDPOINT decoder, which
is not exposed to sysfs.

Had to come back around to realize this patch set adds new
ways/granularity fields to the endpoint decoder.

struct cxl_endpoint_decoder {
        struct cxl_decoder cxld;
	...
        int interleave_ways;
        int interleave_granularity;
}

struct cxl_decoder {
	...
        int interleave_ways;
        int interleave_granularity;
}


1) the cxl_endpoint_decoder descriptor needs to be updated to explain
   why these ways/granularity differ from the cxl_decoder inside of the
   cxl_endpoint_decoder.  This is very, very confusing.

   The reason appears to be that the endpoint decoder ways/granularity
   is the region ways/granularity.  So the endpoint decoder is passing
   this information along.

   Makes me think the region creation code should call this directly,
   rather than all this indirection.


2) This calculation appears to just be plain wrong.


static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
{
	ctx = (struct cxl_interleave_context) {
                .hpa_range = &hpa,
        };
	...
        while (iter && parent) {
	        endpoint        host bridge      root
		decoder6.0  ->  decoder3.0   ->  decoder0.0

                /* Convert interleave settings to next port upstream. */
                rc = cxl_port_calc_interleave(iter, &ctx);
		...
	}
	...
	cxled->interleave_ways = ctx.interleave_ways;
        cxled->interleave_granularity = ctx.interleave_granularity;
}

On my setup, I would expect to iterate decoder3.0 and decoder0.0
	decoderN.M  |  ig  iw
	----------------------
	decoder0.0  |  2   256
	decoder3.0  |  1   256

	on entry [iw,ig] = [0,0]
	[parent_ways, parent_gran] -> [1,256]
	[iw * piw, ig * piw]       -> [2,512]


Looking at a normal system, we'd expect this configuration:

decoderN.M  |  ig  iw
----------------------
decoder0.0  |  2   256
decoder1.0  |  1   512
decoder3.0  |  1   512
decoder5.0  |  2   256
decoder6.0  |  2   256

The above code produces the following:
	[1,512]
	[2,1024] <--- still wrong


in cxl_port_setup_targets we have this comment:

        if (is_cxl_root(parent_port)) {
                /*
                 * Root decoder IG is always set to value in CFMWS which
                 * may be different than this region's IG.  We can use the
                 * region's IG here since interleave_granularity_store()
                 * does not allow interleaved host-bridges with
                 * root IG != region IG.
                 */
                parent_ig = p->interleave_granularity;
                parent_iw = cxlrd->cxlsd.cxld.interleave_ways;
	}


Can we not just always report the parent ways/granularity, and skip all
the math?  We'll always iterate to the root, and that's what we want the
region to match, right?

Better yet, can we not just do this in the region creation code, rather
than having the endpoint carry it through to the region for some reason?
Avoid adding the duplicate ways/granularity field all together.

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters to create a region
  2025-04-01  1:59   ` Gregory Price
@ 2025-04-01  5:26     ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-01  5:26 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Mon, Mar 31, 2025 at 09:59:45PM -0400, Gregory Price wrote:
> I have discovered on my Zen5 that either this code is incorrect, or my
> decoders are programmed incorrectly.
> 
> decoderN.M  |  ig  iw
> ----------------------
> decoder0.0  |  2   256
> decoder3.0  |  1   256
> decoder6.0  |  1   256
> region0     |  2   512 <--- Wrong
> 
> *Arch quirk aside*, everything except region is as expected.
>
... snip ...
> 
> Looking at a normal system, we'd expect this configuration:
> 
> decoderN.M  |  ig  iw
> ----------------------
> decoder0.0  |  2   256
> decoder3.0  |  1   512
> decoder6.0  |  2   256
> 
> The above code produces the following:
> 	[1,512]
> 	[2,1024] <--- still wrong
> 
... snip ...
> 
> Can we not just always report the parent ways/granularity, and skip all
> the math?  We'll always iterate to the root, and that's what we want the
> region to match, right?
> 
> Better yet, can we not just do this in the region creation code, rather
> than having the endpoint carry it through to the region for some reason?
> Avoid adding the duplicate ways/granularity field all together.
> 

Having tested just using the root decoder data, i now get the expected
1:512, but i realized the issue is also that the regionref uses the
endpoint->decoder interleave ways/granularity

Before:
[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets expected
               iw: 1 ig: 1024   [... snip ...]
[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets got
               iw: 1 ig: 256    [... snip ...]

After:
[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets expected
               iw: 1 ig: 512 
[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets got 
               iw: 1 ig: 256

This makes sense, as the Zen5 quirk here is that the endpoints are
programmed with a 0-base for just their capacity, and they have no
interleave set on them - while the host bridges have 1:256 to match
the endpoint and 2:256 in the root is the only "correct" (in-spec)
programming of the topology.

I think the only choice here is to have another arch_check_interleave()
check here in `cxl_port_setup_targets()` that checks for this.  I
haven't run the numbers on what the results would be if the HB had 1:512
instead of 1:256, but I imagine there lies another round of madness.

~Gregory


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters to create a region
  2025-02-18 13:23 ` [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters " Robert Richter
                     ` (2 preceding siblings ...)
  2025-04-01  1:59   ` Gregory Price
@ 2025-04-01 18:03   ` Gregory Price
  3 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-01 18:03 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:51PM +0100, Robert Richter wrote:
> Endpoints requiring address translation might not be aware of the
> system's interleaving configuration. Instead, interleaving can be
> configured on an upper memory domain (from an endpoint view) and thus
> is not visible to the endpoint. For region creation this might cause
> an invalid interleaving config that does not match the CFMWS entries.
> 
> Use the interleaving configuration of the root decoders to create a
> region which bases on CFMWS entries. This always matches the system's
> interleaving configuration and is independent of the underlying memory
> topology.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
... snip ...
> @@ -1955,12 +1971,23 @@ static int cxl_port_calc_interleave(struct cxl_port *port,
>  	if (is_cxl_root(port))
>  		return 0;
>  
> -	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways);
> +	rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways,
> +			&parent_granularity);
>  	if (rc)
>  		return rc;
>  
>  	ctx->pos = ctx->pos * parent_ways + parent_pos;
>  
> +	if (ctx->interleave_ways)
> +		ctx->interleave_ways *= parent_ways;
> +	else
> +		ctx->interleave_ways = parent_ways;
> +
> +	if (ctx->interleave_granularity)
> +		ctx->interleave_granularity *= ctx->interleave_ways;
> +	else
> +		ctx->interleave_granularity = parent_granularity;
> +
>  	return ctx->pos;
>  }


The root is always build from the CFMWS, so you don't need to do any
math, you just need to take the root.  Since the logic calling this will
always climb to the root, you'll always get the right value.

So I think you just want this

--- 

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index fe23dc106956..05b24b008a1b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1977,17 +1977,8 @@ static int cxl_port_calc_interleave(struct cxl_port *port,
                return rc;

        ctx->pos = ctx->pos * parent_ways + parent_pos;
-
-       if (ctx->interleave_ways)
-               ctx->interleave_ways *= parent_ways;
-       else
-               ctx->interleave_ways = parent_ways;
-
-       if (ctx->interleave_granularity)
-               ctx->interleave_granularity *= ctx->interleave_ways;
-       else
-               ctx->interleave_granularity = parent_granularity;
-
+       ctx->interleave_ways = parent_ways;
+       ctx->interleave_granularity = parent_granularity;
        return ctx->pos;
 }

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
                     ` (2 preceding siblings ...)
  2025-02-20 21:56   ` Dave Jiang
@ 2025-04-04  4:38   ` Gregory Price
  2025-04-04 15:36   ` [PATCH] cxl/region: Continue recalculating position during sort Gregory Price
  2025-04-05  2:35   ` [PATCH] cxl region: recalculate interleave pos during region probe Gregory Price
  5 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-04  4:38 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:45PM +0100, Robert Richter wrote:
> The calculation of an endpoint's position in a region traverses all
> ports up to the root port and determines the corresponding decoders
> for that particular address range. For address translation the HPA
> range must be recalculated between ports. In order to prepare the
> implementation of address translation, move code to
> cxl_endpoint_decoder_initialize() and reuse the existing iterator
> there.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ad4a6ce37216..6f106bfa115f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1903,7 +1903,6 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
>  	for (i = 0; i < p->nr_targets; i++) {
>  		struct cxl_endpoint_decoder *cxled = p->targets[i];
>  
> -		cxled->pos = cxl_calc_interleave_pos(cxled);
>  		/*

This change breaks the entire sort process, because in
cxl_region_attach_auto the positions are temporarily overwritten

To make this work, I had to drop over-write of the position when doing
the attach process - but this is just incorrect as we now have a
cxled->pos that indexes incorrectly into p->targets[N]

The result of the above change (without the below hack) is that decoder
probe order causes a race condition - whoever shows up first gets
position 0, and the sort does nothing (because the above line is
dropped).

TL;DR: This line should just be left as-is.  It's perfectly fine to
re-calculate the position.

~Gregory

--- DO NOT USE - added for context ---

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 902b04b875b3..e75eb1c815f1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1855,7 +1855,6 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
         */
        pos = p->nr_targets;
        p->targets[pos] = cxled;
-       cxled->pos = pos;
        p->nr_targets++;

        return 0;


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH] cxl/region: Continue recalculating position during sort
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
                     ` (3 preceding siblings ...)
  2025-04-04  4:38   ` Gregory Price
@ 2025-04-04 15:36   ` Gregory Price
  2025-04-04 17:22     ` Gregory Price
  2025-04-05  2:35   ` [PATCH] cxl region: recalculate interleave pos during region probe Gregory Price
  5 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-04-04 15:36 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

The auto decoder probe proess overwrites the endpoint position
temporarily to record its temporary location in the region target list.
This patch restores the pos recalculation during the sort target process
so that decoder probe order doesn't affect region probe.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 drivers/cxl/core/region.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e1ef0d577b35..8c79c0a39d56 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2061,6 +2061,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
        for (i = 0; i < p->nr_targets; i++) {
                struct cxl_endpoint_decoder *cxled = p->targets[i];

+               cxled->pos = cxl_calc_interleave_pos(cxled);
                /*
                 * Record that sorting failed, but still continue to calc
                 * cxled->pos so that follow-on code paths can reliably
--
2.47.1

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] cxl/region: Continue recalculating position during sort
  2025-04-04 15:36   ` [PATCH] cxl/region: Continue recalculating position during sort Gregory Price
@ 2025-04-04 17:22     ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-04 17:22 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Fri, Apr 04, 2025 at 10:36:49AM -0500, Gregory Price wrote:
> The auto decoder probe proess overwrites the endpoint position
> temporarily to record its temporary location in the region target list.
> This patch restores the pos recalculation during the sort target process
> so that decoder probe order doesn't affect region probe.
> 
> Signed-off-by: Gregory Price <gourry@gourry.net>

Disregard this patch, it appears to break is subtly different ways now -
sigh :[.

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH] cxl region: recalculate interleave pos during region probe
  2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
                     ` (4 preceding siblings ...)
  2025-04-04 15:36   ` [PATCH] cxl/region: Continue recalculating position during sort Gregory Price
@ 2025-04-05  2:35   ` Gregory Price
  2025-04-08 15:30     ` Gregory Price
  5 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-04-05  2:35 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

There are at least 3 bugs at this point in the patch series.

1. cxl_calc_interleave_pos should be using cxled->spa_range
2. cxl_calc_interleave_pos should be returning ctx->pos
3. cxl_region_sort_targets still needs to call cxl_calc_interleave_pos

The auto decoder probe proess overwrites the endpoint position
temporarily to record its temporary location in the region target list.
This patch restores the pos recalculation during the sort target process
so that decoder probe order doesn't affect region probe.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 drivers/cxl/core/region.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 934dcb2daa15..5c9e2b747731 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1879,7 +1879,7 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
 	int pos = 0;

 	ctx = (struct cxl_interleave_context) {
-		.hpa_range = &cxled->cxld.hpa_range,
+		.hpa_range = &cxled->spa_range,
 	};

 	for (iter = cxled_to_port(cxled); pos >= 0 && iter;
@@ -1892,7 +1892,7 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
 		dev_name(&port->dev), ctx.hpa_range->start, ctx.hpa_range->end,
 		ctx.pos);

-	return pos;
+	return ctx.pos;
 }

 static int cxl_region_sort_targets(struct cxl_region *cxlr)
@@ -1903,6 +1903,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
 	for (i = 0; i < p->nr_targets; i++) {
 		struct cxl_endpoint_decoder *cxled = p->targets[i];

+		cxled->pos = cxl_calc_interleave_pos(cxled);
 		/*
 		 * Record that sorting failed, but still continue to calc
 		 * cxled->pos so that follow-on code paths can reliably
--
2.47.1

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH] [HACK] drop zen5_init checks due to segfault
  2025-02-18 13:23 ` [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT Robert Richter
  2025-02-21  0:40   ` Dave Jiang
  2025-03-14 13:01   ` Jonathan Cameron
@ 2025-04-05  2:38   ` Gregory Price
  2025-05-13 21:10     ` Robert Richter
  2 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-04-05  2:38 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, Terry Bowman,
	linux-cxl, linux-kernel, Fabio M. De Francesco

Unclear why this is occuring, but a raw call to the PRM at this point
causes segfaults on my Zen5 system.  Later calls to the prm work just
fine, and modifying the structure to include pci_dev info still results
in a segfault.

Debugging this is not possible on my end since the crash happens deep in
the ACPI prm code.  Seems maybe the PRM interface isn't ready or something?

---
 drivers/cxl/core/x86/amd.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/drivers/cxl/core/x86/amd.c b/drivers/cxl/core/x86/amd.c
index 483c92c18054..5e3708f9e179 100644
--- a/drivers/cxl/core/x86/amd.c
+++ b/drivers/cxl/core/x86/amd.c
@@ -227,26 +227,9 @@ static u64 cxl_zen5_to_hpa(struct cxl_decoder *cxld, u64 hpa)

 static void cxl_zen5_init(struct cxl_port *port)
 {
-	u64 spa;
-	struct prm_cxl_dpa_spa_data data = { .out = &spa, };
-	int rc;
-
 	if (!is_zen5(port))
 		return;

-	/* Check kernel and firmware support */
-	rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
-
-	if (rc == -EOPNOTSUPP) {
-		pr_warn_once("ACPI PRMT: PRM address translation not supported by kernel\n");
-		return;
-	}
-
-	if (rc == -ENODEV) {
-		pr_warn_once("ACPI PRMT: PRM address translation not supported by firmware\n");
-		return;
-	}
-
 	port->to_hpa = cxl_zen5_to_hpa;

 	dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
--
2.47.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] cxl region: recalculate interleave pos during region probe
  2025-04-05  2:35   ` [PATCH] cxl region: recalculate interleave pos during region probe Gregory Price
@ 2025-04-08 15:30     ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-08 15:30 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Fri, Apr 04, 2025 at 10:35:20PM -0400, Gregory Price wrote:
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 934dcb2daa15..5c9e2b747731 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1879,7 +1879,7 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
>  	int pos = 0;
> 
>  	ctx = (struct cxl_interleave_context) {
> -		.hpa_range = &cxled->cxld.hpa_range,
> +		.hpa_range = &cxled->spa_range,
>  	};
>

Realizing just now that this line belongs to the next patch.

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region
  2025-02-18 13:23 ` [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region Robert Richter
  2025-02-20 19:28   ` Gregory Price
  2025-03-14 12:45   ` Jonathan Cameron
@ 2025-04-08 15:45   ` Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-08 15:45 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:49PM +0100, Robert Richter wrote:
> To find the correct region and root port of an endpoint of a system
> needing address translation, the endpoint's HPA range must be
> translated to each of the parent port address ranges up to the root
> decoder.
> 
> Use the calculated SPA range of an endpoint to find the endpoint's
> region.
> 

After debugging some other patches, I think this patch needs to just be
rolled in with the introduction of cxled->spa_range (Patch 5).

(spa_range == hpa_range) up to this point, so this is effectively a NOP.

> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/region.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6d5ede5b4c43..ffe6038249ed 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3535,7 +3535,6 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>  
>  static int cxl_endpoint_decoder_add(struct cxl_endpoint_decoder *cxled)
>  {
> -	struct range *hpa = &cxled->cxld.hpa_range;
>  	struct cxl_root_decoder *cxlrd = cxled->cxlrd;
>  	struct cxl_region_params *p;
>  	struct cxl_region *cxlr;
> @@ -3547,7 +3546,7 @@ static int cxl_endpoint_decoder_add(struct cxl_endpoint_decoder *cxled)
>  	 * one does the construction and the others add to that.
>  	 */
>  	mutex_lock(&cxlrd->range_lock);
> -	cxlr = cxl_find_region_by_range(cxlrd, hpa);
> +	cxlr = cxl_find_region_by_range(cxlrd, &cxled->spa_range);
>  	if (!cxlr)
>  		cxlr = construct_region(cxlrd, cxled);
>  	mutex_unlock(&cxlrd->range_lock);
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create a region
  2025-02-18 13:23 ` [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create " Robert Richter
  2025-02-20 19:31   ` Gregory Price
  2025-03-14 12:46   ` Jonathan Cameron
@ 2025-04-08 15:50   ` Gregory Price
  2 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-08 15:50 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:50PM +0100, Robert Richter wrote:
> To create a region, SPA ranges must be used. With address translation
> the endpoint's HPA range is not the same as the SPA range. Use the
> previously calculated SPA range instead.
>

Same with patch 8, I think this probably should just be rolled in with
patch 5.  You can simplify the notes in patch 5 by saying:

"""
 since cxled->spa_range is set to cxled->cxld.hpa_range at endpoint
 initialization, this patch is effectively a no-op for all systems
 that do not provide a translation callback.

 Change all refs to cxled->cxld.hpa_range to cxled->spa_range.
"""

This should also help simplify the patch series overall.

I haven't gone through to check if there are any other missed hpa_range
references yet (e.g. [1])

[1] https://lore.kernel.org/linux-cxl/20250218132356.1809075-1-rrichter@amd.com/T/#m26a8e4d8b1783dfb837553f184f7f174b61b7825

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check a region
  2025-02-18 13:23 ` [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check " Robert Richter
  2025-02-20 19:50   ` Gregory Price
@ 2025-04-08 15:54   ` Gregory Price
  1 sibling, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-08 15:54 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:52PM +0100, Robert Richter wrote:
> Endpoints or switches requiring address translation might not be aware
> of the system's interleaving configuration. Then, the configured
> endpoint's address range might not match the expected range. In
> contrast, the SPA range of an endpoint is calculated applying platform
> specific address translation. That range is correct and can be used to
> check a region range.
> 
> Adjust the region range check and use the endpoint's SPA range to
> check it.
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>

Like other patches, I think this can probably just be folded into patch
5 under the comment that cxled->spa_range == cxled->cxld.hpa_range for
all systems that do not implement a translation mechanism.

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder
  2025-02-18 13:23 ` [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder Robert Richter
@ 2025-04-24  0:28   ` Gregory Price
  2025-04-24 21:49     ` Gregory Price
  0 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-04-24  0:28 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Tue, Feb 18, 2025 at 02:23:47PM +0100, Robert Richter wrote:
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index d898c9f51113..5048511f9de5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -906,7 +905,7 @@ cxl_find_decoder_early(struct cxl_port *port,
>  		return &cxled->cxld;
>  
>  	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> -		dev = device_find_child(&port->dev, &cxlr->params,
> +		dev = device_find_child(&port->dev, &cxled->cxld.hpa_range,
>  					match_auto_decoder);

This semantic has now changed because of the linear caching set.

Working around this with something like this hack for now

Probably we want to pull the range out of the resource and put it right
in the params instead of the local variable, but just getting it working
for testing for now

~Gregory

---

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index eac873125e6d..c8d38ce55045 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -833,7 +833,8 @@ static int match_free_decoder(struct device *dev, const void *data)
 }

 static bool region_res_match_cxl_range(const struct cxl_region_params *p,
-				       struct range *range)
+				       const struct range *range1,
+				       const struct range *range2)
 {
 	if (!p->res)
 		return false;
@@ -843,8 +844,8 @@ static bool region_res_match_cxl_range(const struct cxl_region_params *p,
 	 * to be fronted by the DRAM range in current known implementation.
 	 * This assumption will be made until a variant implementation exists.
 	 */
-	return p->res->start + p->cache_size == range->start &&
-		p->res->end == range->end;
+	return range1->start + p->cache_size == range2->start &&
+		range1->end == range2->end;
 }

 static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
@@ -885,11 +886,15 @@ static int cxl_port_calc_hpa(struct cxl_port *port, struct cxl_decoder *cxld,
 	return 1;
 }

+struct mad_context {
+	struct cxl_region_params *p;
+	struct range *r;
+};
 static int match_auto_decoder(struct device *dev, const void *data)
 {
-	const struct cxl_region_params *p = data;
+	const struct range *r;
 	struct cxl_decoder *cxld;
-	struct range *r;
+	const struct mad_context *ctx = data;

 	if (!is_switch_decoder(dev))
 		return 0;
@@ -897,7 +902,7 @@ static int match_auto_decoder(struct device *dev, const void *data)
 	cxld = to_cxl_decoder(dev);
 	r = &cxld->hpa_range;

-	if (region_res_match_cxl_range(p, r))
+	if (region_res_match_cxl_range(ctx->p, ctx->r, r))
 		return 1;

 	return 0;
@@ -916,13 +921,14 @@ cxl_find_decoder_early(struct cxl_port *port,
 		       struct cxl_region *cxlr)
 {
 	struct device *dev;
+	struct mad_context mad = { .p = &cxlr->params,
+				   .r =&cxled->cxld.hpa_range };

 	if (port == cxled_to_port(cxled))
 		return &cxled->cxld;

 	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
-		dev = device_find_child(&port->dev, &cxlr->params,
-					match_auto_decoder);
+		dev = device_find_child(&port->dev, &mad, match_auto_decoder);
 	else
 		dev = device_find_child(&port->dev, NULL, match_free_decoder);
 	if (!dev)
@@ -1363,6 +1369,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
 	struct cxl_decoder *cxld = cxl_rr->decoder;
 	struct cxl_switch_decoder *cxlsd;
 	struct cxl_port *iter = port;
+	struct range r;
 	u16 eig, peig;
 	u8 eiw, peiw;

@@ -1488,10 +1495,12 @@ static int cxl_port_setup_targets(struct cxl_port *port,
 		return -ENXIO;
 	}

+	r.start = p ? p->res->start : 0;
+	r.end = p ? p->res->end : 0;
 	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
 		if (cxld->interleave_ways != iw ||
 		    cxld->interleave_granularity != ig ||
-		    !region_res_match_cxl_range(p, &cxld->hpa_range) ||
+		    !region_res_match_cxl_range(p, &r, &cxld->hpa_range) ||
 		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
 			dev_err(&cxlr->dev,
 				"%s:%s %s expected iw: %d ig: %d %pr\n",

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder
  2025-04-24  0:28   ` Gregory Price
@ 2025-04-24 21:49     ` Gregory Price
  2025-04-24 23:46       ` Gregory Price
  0 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2025-04-24 21:49 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Wed, Apr 23, 2025 at 08:28:03PM -0400, Gregory Price wrote:
> @@ -916,13 +921,14 @@ cxl_find_decoder_early(struct cxl_port *port,
>  		       struct cxl_region *cxlr)
>  {
>  	struct device *dev;
> +	struct mad_context mad = { .p = &cxlr->params,
> +				   .r =&cxled->cxld.hpa_range };
                                               ^^^^^^^^^^^^^^
					       spa_range

Woops, missed this.  Not sure if it goes here or a later patch, but
that's needed to make this work.

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder
  2025-04-24 21:49     ` Gregory Price
@ 2025-04-24 23:46       ` Gregory Price
  0 siblings, 0 replies; 61+ messages in thread
From: Gregory Price @ 2025-04-24 23:46 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel, Fabio M. De Francesco, Terry Bowman

On Thu, Apr 24, 2025 at 05:49:35PM -0400, Gregory Price wrote:
> On Wed, Apr 23, 2025 at 08:28:03PM -0400, Gregory Price wrote:
> > @@ -916,13 +921,14 @@ cxl_find_decoder_early(struct cxl_port *port,
> >  		       struct cxl_region *cxlr)
> >  {
> >  	struct device *dev;
> > +	struct mad_context mad = { .p = &cxlr->params,
> > +				   .r =&cxled->cxld.hpa_range };
>                                                ^^^^^^^^^^^^^^
> 					       spa_range
> 
> Woops, missed this.  Not sure if it goes here or a later patch, but
> that's needed to make this work.
> 
> ~Gregory

And for the sake of completeness - I've confirmed that this is
sufficient to get a Zen5 working on top of

   v6.13 + (v6.14, v6.15, cxl-next) PCI/CXL backports

So a smaller set of changes than I was expecting (some other mild fixups
but nothing major).  Just kind of have to decide what the shape of this
change looks like, if not like this.

I can share my modified line if you'd like, but I haven't incorporated
any suggestions from the chain here.

~Gregory

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] [HACK] drop zen5_init checks due to segfault
  2025-04-05  2:38   ` [PATCH] [HACK] drop zen5_init checks due to segfault Gregory Price
@ 2025-05-13 21:10     ` Robert Richter
  2025-06-17 20:33       ` Joshua Hahn
  0 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-05-13 21:10 UTC (permalink / raw)
  To: Gregory Price
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, Terry Bowman,
	linux-cxl, linux-kernel, Fabio M. De Francesco

On 04.04.25 22:38:58, Gregory Price wrote:
> Unclear why this is occuring, but a raw call to the PRM at this point
> causes segfaults on my Zen5 system.  Later calls to the prm work just
> fine, and modifying the structure to include pci_dev info still results
> in a segfault.
> 
> Debugging this is not possible on my end since the crash happens deep in
> the ACPI prm code.  Seems maybe the PRM interface isn't ready or something?

There is a subsys_initcall order dependency if driver is builtin:

subsys_initcall(cxl_acpi_init);
subsys_initcall(efisubsys_init);

A fix using subsys_initcall_sync(cxl_acpi_init) solves the issue.

efi_rts_wq workqueue is used by cxl_acpi_init() before its allocation
in efisubsys_init(). I will address that in the next submission.

Thanks for looking into this.

-Robert

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] [HACK] drop zen5_init checks due to segfault
  2025-05-13 21:10     ` Robert Richter
@ 2025-06-17 20:33       ` Joshua Hahn
  2025-06-24  5:43         ` Robert Richter
  0 siblings, 1 reply; 61+ messages in thread
From: Joshua Hahn @ 2025-06-17 20:33 UTC (permalink / raw)
  To: Robert Richter
  Cc: Gregory Price, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Dave Jiang, Davidlohr Bueso,
	Terry Bowman, linux-cxl, linux-kernel, Fabio M. De Francesco

On Tue, 13 May 2025 23:10:36 +0200 Robert Richter <rrichter@amd.com> wrote:

> On 04.04.25 22:38:58, Gregory Price wrote:
> > Unclear why this is occuring, but a raw call to the PRM at this point
> > causes segfaults on my Zen5 system.  Later calls to the prm work just
> > fine, and modifying the structure to include pci_dev info still results
> > in a segfault.
> > 
> > Debugging this is not possible on my end since the crash happens deep in
> > the ACPI prm code.  Seems maybe the PRM interface isn't ready or something?
> 
> There is a subsys_initcall order dependency if driver is builtin:
> 
> subsys_initcall(cxl_acpi_init);
> subsys_initcall(efisubsys_init);
> 
> A fix using subsys_initcall_sync(cxl_acpi_init) solves the issue.
> 
> efi_rts_wq workqueue is used by cxl_acpi_init() before its allocation
> in efisubsys_init(). I will address that in the next submission.
> 
> Thanks for looking into this.
> 
> -Robert
> 

Hello Robert,

I hope you are doing well! Sorry for reviving an old thread. I'm currently
trying to apply this patchset, and saw the same issue that Gregory was having.
Keeping the PRM checks would be helpful for debugging when things go wrong, so
I wanted to try and apply your suggestion, but had a bit of trouble
understanding what the core of the problem was.

I was hoping for some help in understanding your explanation here -- I don't
think I can see where the dependency appears. (In particular, I'm having
trouble understanding where the efi_rts_wq dependnecy matters during the
cxl_zen5_init function). 

Thank you for this patchset, and for your help!
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] [HACK] drop zen5_init checks due to segfault
  2025-06-17 20:33       ` Joshua Hahn
@ 2025-06-24  5:43         ` Robert Richter
  2025-06-24 21:46           ` Joshua Hahn
  0 siblings, 1 reply; 61+ messages in thread
From: Robert Richter @ 2025-06-24  5:43 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Gregory Price, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Dave Jiang, Davidlohr Bueso,
	Terry Bowman, linux-cxl, linux-kernel, Fabio M. De Francesco

On 17.06.25 13:33:18, Joshua Hahn wrote:
> I was hoping for some help in understanding your explanation here -- I don't
> think I can see where the dependency appears. (In particular, I'm having
> trouble understanding where the efi_rts_wq dependnecy matters during the
> cxl_zen5_init function). 

Here a temporary patch with an explanation in the description:


From a540b814d48574b67a9aaa97a5d7536c61d4deda Mon Sep 17 00:00:00 2001
From: Robert Richter <rrichter@amd.com>
Date: Tue, 13 May 2025 15:02:16 +0200
Subject: [PATCH] cxl/acpi: Prepare use of EFI runtime services

In order to use EFI runtime services, esp. ACPI PRM which uses the
efi_rts_wq workqueue, initialize EFI before CXL ACPI.

There is a subsys_initcall order dependency if driver is builtin:

 subsys_initcall(cxl_acpi_init);
 subsys_initcall(efisubsys_init);

Prevent the efi_rts_wq workqueue being used by cxl_acpi_init() before
its allocation. Use subsys_initcall_sync(cxl_acpi_init) to always run
efisubsys_init() first.

Reported-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index a1a99ec3f12c..af750a8bd373 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -932,8 +932,12 @@ static void __exit cxl_acpi_exit(void)
 	cxl_bus_drain();
 }
 
-/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
-subsys_initcall(cxl_acpi_init);
+/*
+ * Load before dax_hmem sees 'Soft Reserved' CXL ranges. Use
+ * subsys_initcall_sync() since there is an order dependency with
+ * subsys_initcall(efisubsys_init), which must run first.
+ */
+subsys_initcall_sync(cxl_acpi_init);
 
 /*
  * Arrange for host-bridge ports to be active synchronous with
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH] [HACK] drop zen5_init checks due to segfault
  2025-06-24  5:43         ` Robert Richter
@ 2025-06-24 21:46           ` Joshua Hahn
  0 siblings, 0 replies; 61+ messages in thread
From: Joshua Hahn @ 2025-06-24 21:46 UTC (permalink / raw)
  To: Robert Richter
  Cc: Gregory Price, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Dave Jiang, Davidlohr Bueso,
	Terry Bowman, linux-cxl, linux-kernel, Fabio M. De Francesco

On Tue, 24 Jun 2025 07:43:13 +0200 Robert Richter <rrichter@amd.com> wrote:

> On 17.06.25 13:33:18, Joshua Hahn wrote:
> > I was hoping for some help in understanding your explanation here -- I don't
> > think I can see where the dependency appears. (In particular, I'm having
> > trouble understanding where the efi_rts_wq dependnecy matters during the
> > cxl_zen5_init function). 
> 
> Here a temporary patch with an explanation in the description:

Hi Robert,

Thank you for this patch! I just tested on my machine, and can confirm that
this does indeed fix the problem. I'm not sure if this will be folded into
the rest of the patchset or if it will be its own, but I will add my
signatures below.

Thank you again, Have a great day!

Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>

> From a540b814d48574b67a9aaa97a5d7536c61d4deda Mon Sep 17 00:00:00 2001
> From: Robert Richter <rrichter@amd.com>
> Date: Tue, 13 May 2025 15:02:16 +0200
> Subject: [PATCH] cxl/acpi: Prepare use of EFI runtime services
> 
> In order to use EFI runtime services, esp. ACPI PRM which uses the
> efi_rts_wq workqueue, initialize EFI before CXL ACPI.
> 
> There is a subsys_initcall order dependency if driver is builtin:
> 
>  subsys_initcall(cxl_acpi_init);
>  subsys_initcall(efisubsys_init);
> 
> Prevent the efi_rts_wq workqueue being used by cxl_acpi_init() before
> its allocation. Use subsys_initcall_sync(cxl_acpi_init) to always run
> efisubsys_init() first.
> 
> Reported-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Robert Richter <rrichter@amd.com>

[...snip...]

Sent using hkml (https://github.com/sjp38/hackermail)

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2025-06-24 21:46 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-18 13:23 [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Robert Richter
2025-02-18 13:23 ` [PATCH v2 01/15] cxl: Modify address translation callback for generic use Robert Richter
2025-02-20 16:00   ` Gregory Price
2025-02-20 21:03     ` Dave Jiang
2025-02-18 13:23 ` [PATCH v2 02/15] cxl: Introduce callback to translate an HPA range from a port to its parent Robert Richter
2025-02-20 21:19   ` Dave Jiang
2025-02-18 13:23 ` [PATCH v2 03/15] cxl/region: Factor out code for interleaving calculations Robert Richter
2025-02-20 16:28   ` Gregory Price
2025-02-20 16:41     ` Gregory Price
2025-03-14 12:45   ` Jonathan Cameron
2025-02-18 13:23 ` [PATCH v2 04/15] cxl/region: Calculate endpoint's region position during init Robert Richter
2025-02-19 23:32   ` Gregory Price
2025-02-20 17:31   ` Gregory Price
2025-02-20 21:56   ` Dave Jiang
2025-04-04  4:38   ` Gregory Price
2025-04-04 15:36   ` [PATCH] cxl/region: Continue recalculating position during sort Gregory Price
2025-04-04 17:22     ` Gregory Price
2025-04-05  2:35   ` [PATCH] cxl region: recalculate interleave pos during region probe Gregory Price
2025-04-08 15:30     ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 05/15] cxl/region: Calculate and store the SPA range of an endpoint Robert Richter
2025-02-20 18:42   ` Gregory Price
2025-02-20 22:31   ` Dave Jiang
2025-02-20 22:37     ` Gregory Price
2025-03-14 12:41   ` Jonathan Cameron
2025-02-18 13:23 ` [PATCH v2 06/15] cxl/region: Use endpoint's HPA range to find the port's decoder Robert Richter
2025-04-24  0:28   ` Gregory Price
2025-04-24 21:49     ` Gregory Price
2025-04-24 23:46       ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 07/15] cxl/region: Use translated HPA ranges " Robert Richter
2025-02-20 19:13   ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 08/15] cxl/region: Use the endpoint's SPA range to find a region Robert Richter
2025-02-20 19:28   ` Gregory Price
2025-03-14 12:45   ` Jonathan Cameron
2025-04-08 15:45   ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 09/15] cxl/region: Use the endpoint's SPA range to create " Robert Richter
2025-02-20 19:31   ` Gregory Price
2025-03-14 12:46   ` Jonathan Cameron
2025-04-08 15:50   ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters " Robert Richter
2025-02-20 19:43   ` Gregory Price
2025-03-14 12:49   ` Jonathan Cameron
2025-04-01  1:59   ` Gregory Price
2025-04-01  5:26     ` Gregory Price
2025-04-01 18:03   ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 11/15] cxl/region: Use the endpoint's SPA range to check " Robert Richter
2025-02-20 19:50   ` Gregory Price
2025-04-08 15:54   ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 12/15] cxl/region: Lock decoders that need address translation Robert Richter
2025-02-20 19:57   ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 13/15] cxl/x86: Prepare for architectural platform setup Robert Richter
2025-02-20 19:57   ` Gregory Price
2025-02-18 13:23 ` [PATCH v2 14/15] cxl/amd: Enable Zen5 address translation using ACPI PRMT Robert Richter
2025-02-21  0:40   ` Dave Jiang
2025-03-14 13:01   ` Jonathan Cameron
2025-04-05  2:38   ` [PATCH] [HACK] drop zen5_init checks due to segfault Gregory Price
2025-05-13 21:10     ` Robert Richter
2025-06-17 20:33       ` Joshua Hahn
2025-06-24  5:43         ` Robert Richter
2025-06-24 21:46           ` Joshua Hahn
2025-02-18 13:23 ` [PATCH v2 15/15] MAINTAINERS: CXL: Add entry for AMD platform support (CXL_AMD) Robert Richter
2025-02-20  1:00 ` [PATCH v2 00/15] cxl: Address translation support, part 2: Generic support and AMD Zen5 platform enablement Gregory Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox