Linux CXL
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Address translation for HDM decoding
@ 2024-07-01 17:47 Robert Richter
  2024-07-01 17:47 ` [PATCH v2 1/5] cxl/hdm: Moving HDM specific code to core/hdm.c Robert Richter
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Robert Richter @ 2024-07-01 17:47 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Default expectation of Linux is that HPA == SPA, which means that
hardware addresses in the decoders are the same as the kernel sees
them. However, there are platforms where this is not the case and an
address translation between decoder's (HPA) and the system's physical
addresses (SPA) is needed.

This series implements address translation for HDM decoding. The
implementation follows the rule that the representation of hardware
address ranges in the kernel are all SPA. If decoder registers (HDM
decoder cap or register range) are not SPA, a base offset must be
applied. Translation happens when accessing the registers back and
forth. After a read access an address will be converted to SPA and
before a write access the programmed address is translated from an
SPA. The decoder register access can be easily encapsulated by address
translation and thus there are only a few places where translation is
needed and the code must be changed. This is implemented in patch #2,
patch #1 is a prerequisite.

Address translation is restricted to platforms that need it. As such a
platform check is needed and a flag is introduced for this (patch #3).

For address translation the base offset must be determined for the
memory domain. Depending on the platform there are various options for
this. The address range in the CEDT's CFWMS entry of the CXL host
bridge can be used to determine the decoder's base address (patch
#4). This is enabled for AMD Zen4 platforms (patch #5).

Changelog:

v2:
 * Fixed build error for other archs [kbot]


Robert Richter (5):
  cxl/hdm: Moving HDM specific code to core/hdm.c.
  cxl/hdm: Implement address translation for HDM decoding
  cxl/acpi: Add platform flag for HPA address translation
  cxl/hdm: Setup HPA base for address translation using the HPA window
    in CFMWS
  cxl/acpi: Enable address translation for Zen4 platforms

 drivers/cxl/acpi.c      |  16 +++
 drivers/cxl/core/core.h |   2 +
 drivers/cxl/core/hdm.c  | 245 ++++++++++++++++++++++++++++++++++++++--
 drivers/cxl/core/pci.c  | 119 +------------------
 drivers/cxl/cxl.h       |   2 +
 drivers/cxl/cxlmem.h    |   4 +
 drivers/cxl/cxlpci.h    |   3 -
 7 files changed, 262 insertions(+), 129 deletions(-)


base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
-- 
2.39.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/5] cxl/hdm: Moving HDM specific code to core/hdm.c.
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
@ 2024-07-01 17:47 ` Robert Richter
  2024-07-12  0:32   ` Dan Williams
  2024-07-01 17:47 ` [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding Robert Richter
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Robert Richter @ 2024-07-01 17:47 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Code to handle HDM specifics is implemented in core/pci.c and
core/hdm.c. In both files helper functions will be needed for address
translation. To simplify the introduction of static helper functions,
move all HDM code to core/hdm.c.

The function devm_cxl_enable_mem() is no longer static and is shared
now within the core module. No functional changes otherwise.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/core.h |   2 +
 drivers/cxl/core/hdm.c  | 117 +++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/pci.c  | 119 +---------------------------------------
 drivers/cxl/cxlmem.h    |   3 +
 drivers/cxl/cxlpci.h    |   3 -
 5 files changed, 123 insertions(+), 121 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 625394486459..d8c3c27ce100 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -70,6 +70,8 @@ int cxl_query_cmd(struct cxl_memdev *cxlmd,
 int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s);
 void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
 				   resource_size_t length);
+struct cxl_dev_state;
+int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds);
 
 struct dentry *cxl_debugfs_create_dir(const char *dir);
 int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 784843fa2a22..605da9a55d89 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -1027,3 +1027,120 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_decoders, CXL);
+
+/* require dvsec ranges to be covered by a locked platform window */
+static int dvsec_range_allowed(struct device *dev, void *arg)
+{
+	struct range *dev_range = arg;
+	struct cxl_decoder *cxld;
+
+	if (!is_root_decoder(dev))
+		return 0;
+
+	cxld = to_cxl_decoder(dev);
+
+	if (!(cxld->flags & CXL_DECODER_F_RAM))
+		return 0;
+
+	return range_contains(&cxld->hpa_range, dev_range);
+}
+
+static void disable_hdm(void *_cxlhdm)
+{
+	u32 global_ctrl;
+	struct cxl_hdm *cxlhdm = _cxlhdm;
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+
+	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	writel(global_ctrl & ~CXL_HDM_DECODER_ENABLE,
+	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+}
+
+static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
+{
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	u32 global_ctrl;
+
+	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
+	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+	return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
+}
+
+/**
+ * cxl_hdm_decode_init() - Setup HDM decoding for the endpoint
+ * @cxlds: Device state
+ * @cxlhdm: Mapped HDM decoder Capability
+ * @info: Cached DVSEC range registers info
+ *
+ * Try to enable the endpoint's HDM Decoder Capability
+ */
+int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
+			struct cxl_endpoint_dvsec_info *info)
+{
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	struct cxl_port *port = cxlhdm->port;
+	struct device *dev = cxlds->dev;
+	struct cxl_port *root;
+	int i, rc, allowed;
+	u32 global_ctrl = 0;
+
+	if (hdm)
+		global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+	/*
+	 * If the HDM Decoder Capability is already enabled then assume
+	 * that some other agent like platform firmware set it up.
+	 */
+	if (global_ctrl & CXL_HDM_DECODER_ENABLE || (!hdm && info->mem_enabled))
+		return devm_cxl_enable_mem(&port->dev, cxlds);
+	else if (!hdm)
+		return -ENODEV;
+
+	root = to_cxl_port(port->dev.parent);
+	while (!is_cxl_root(root) && is_cxl_port(root->dev.parent))
+		root = to_cxl_port(root->dev.parent);
+	if (!is_cxl_root(root)) {
+		dev_err(dev, "Failed to acquire root port for HDM enable\n");
+		return -ENODEV;
+	}
+
+	for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
+		struct device *cxld_dev;
+
+		cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
+					     dvsec_range_allowed);
+		if (!cxld_dev) {
+			dev_dbg(dev, "DVSEC Range%d denied by platform\n", i);
+			continue;
+		}
+		dev_dbg(dev, "DVSEC Range%d allowed by platform\n", i);
+		put_device(cxld_dev);
+		allowed++;
+	}
+
+	if (!allowed && info->mem_enabled) {
+		dev_err(dev, "Range register decodes outside platform defined CXL ranges.\n");
+		return -ENXIO;
+	}
+
+	/*
+	 * Per CXL 2.0 Section 8.1.3.8.3 and 8.1.3.8.4 DVSEC CXL Range 1 Base
+	 * [High,Low] when HDM operation is enabled the range register values
+	 * are ignored by the device, but the spec also recommends matching the
+	 * DVSEC Range 1,2 to HDM Decoder Range 0,1. So, non-zero info->ranges
+	 * are expected even though Linux does not require or maintain that
+	 * match. If at least one DVSEC range is enabled and allowed, skip HDM
+	 * Decoder Capability Enable.
+	 */
+	if (info->mem_enabled)
+		return 0;
+
+	rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
+	if (rc)
+		return rc;
+
+	return devm_cxl_enable_mem(&port->dev, cxlds);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, CXL);
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 8567dd11eaac..4f67e3ae7a05 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -270,7 +270,7 @@ static void clear_mem_enable(void *cxlds)
 	cxl_set_mem_enable(cxlds, 0);
 }
 
-static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
+int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
 {
 	int rc;
 
@@ -282,46 +282,6 @@ static int devm_cxl_enable_mem(struct device *host, struct cxl_dev_state *cxlds)
 	return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
 }
 
-/* require dvsec ranges to be covered by a locked platform window */
-static int dvsec_range_allowed(struct device *dev, void *arg)
-{
-	struct range *dev_range = arg;
-	struct cxl_decoder *cxld;
-
-	if (!is_root_decoder(dev))
-		return 0;
-
-	cxld = to_cxl_decoder(dev);
-
-	if (!(cxld->flags & CXL_DECODER_F_RAM))
-		return 0;
-
-	return range_contains(&cxld->hpa_range, dev_range);
-}
-
-static void disable_hdm(void *_cxlhdm)
-{
-	u32 global_ctrl;
-	struct cxl_hdm *cxlhdm = _cxlhdm;
-	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
-
-	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-	writel(global_ctrl & ~CXL_HDM_DECODER_ENABLE,
-	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-}
-
-static int devm_cxl_enable_hdm(struct device *host, struct cxl_hdm *cxlhdm)
-{
-	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
-	u32 global_ctrl;
-
-	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-	writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
-	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-
-	return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
-}
-
 int cxl_dvsec_rr_decode(struct device *dev, int d,
 			struct cxl_endpoint_dvsec_info *info)
 {
@@ -425,83 +385,6 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dvsec_rr_decode, CXL);
 
-/**
- * cxl_hdm_decode_init() - Setup HDM decoding for the endpoint
- * @cxlds: Device state
- * @cxlhdm: Mapped HDM decoder Capability
- * @info: Cached DVSEC range registers info
- *
- * Try to enable the endpoint's HDM Decoder Capability
- */
-int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
-			struct cxl_endpoint_dvsec_info *info)
-{
-	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
-	struct cxl_port *port = cxlhdm->port;
-	struct device *dev = cxlds->dev;
-	struct cxl_port *root;
-	int i, rc, allowed;
-	u32 global_ctrl = 0;
-
-	if (hdm)
-		global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
-
-	/*
-	 * If the HDM Decoder Capability is already enabled then assume
-	 * that some other agent like platform firmware set it up.
-	 */
-	if (global_ctrl & CXL_HDM_DECODER_ENABLE || (!hdm && info->mem_enabled))
-		return devm_cxl_enable_mem(&port->dev, cxlds);
-	else if (!hdm)
-		return -ENODEV;
-
-	root = to_cxl_port(port->dev.parent);
-	while (!is_cxl_root(root) && is_cxl_port(root->dev.parent))
-		root = to_cxl_port(root->dev.parent);
-	if (!is_cxl_root(root)) {
-		dev_err(dev, "Failed to acquire root port for HDM enable\n");
-		return -ENODEV;
-	}
-
-	for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
-		struct device *cxld_dev;
-
-		cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
-					     dvsec_range_allowed);
-		if (!cxld_dev) {
-			dev_dbg(dev, "DVSEC Range%d denied by platform\n", i);
-			continue;
-		}
-		dev_dbg(dev, "DVSEC Range%d allowed by platform\n", i);
-		put_device(cxld_dev);
-		allowed++;
-	}
-
-	if (!allowed && info->mem_enabled) {
-		dev_err(dev, "Range register decodes outside platform defined CXL ranges.\n");
-		return -ENXIO;
-	}
-
-	/*
-	 * Per CXL 2.0 Section 8.1.3.8.3 and 8.1.3.8.4 DVSEC CXL Range 1 Base
-	 * [High,Low] when HDM operation is enabled the range register values
-	 * are ignored by the device, but the spec also recommends matching the
-	 * DVSEC Range 1,2 to HDM Decoder Range 0,1. So, non-zero info->ranges
-	 * are expected even though Linux does not require or maintain that
-	 * match. If at least one DVSEC range is enabled and allowed, skip HDM
-	 * Decoder Capability Enable.
-	 */
-	if (info->mem_enabled)
-		return 0;
-
-	rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
-	if (rc)
-		return rc;
-
-	return devm_cxl_enable_mem(&port->dev, cxlds);
-}
-EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, CXL);
-
 #define CXL_DOE_TABLE_ACCESS_REQ_CODE		0x000000ff
 #define   CXL_DOE_TABLE_ACCESS_REQ_CODE_READ	0
 #define CXL_DOE_TABLE_ACCESS_TABLE_TYPE		0x0000ff00
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 19aba81cdf13..61d9f4e00921 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -862,4 +862,7 @@ struct cxl_hdm {
 struct seq_file;
 struct dentry *cxl_debugfs_create_dir(const char *dir);
 void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
+int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
+			struct cxl_endpoint_dvsec_info *info);
+
 #endif /* __CXL_MEM_H__ */
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 4da07727ab9c..fe01793af1cb 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -122,9 +122,6 @@ static inline bool cxl_pci_flit_256(struct pci_dev *pdev)
 }
 
 int devm_cxl_port_enumerate_dports(struct cxl_port *port);
-struct cxl_dev_state;
-int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
-			struct cxl_endpoint_dvsec_info *info);
 void read_cdat_data(struct cxl_port *port);
 void cxl_cor_error_detected(struct pci_dev *pdev);
 pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
  2024-07-01 17:47 ` [PATCH v2 1/5] cxl/hdm: Moving HDM specific code to core/hdm.c Robert Richter
@ 2024-07-01 17:47 ` Robert Richter
  2024-07-12  1:03   ` Dan Williams
  2024-07-01 17:47 ` [PATCH v2 3/5] cxl/acpi: Add platform flag for HPA address translation Robert Richter
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Robert Richter @ 2024-07-01 17:47 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Default expectation of Linux is that HPA == SPA, which means that
hardware addresses in the decoders are the same as the kernel sees
them. However, there are platforms where this is not the case and an
address translation between decoder's (HPA) and the system's physical
addresses (SPA) is needed.

The CXL driver stores all internal hardware address ranges as SPA.
When accessing the HDM decoder's registers, hardware addresses must be
translated back and forth. This is needed for the base addresses in
the CXL Range Registers of the PCIe DVSEC for CXL Devices
(CXL_DVSEC_RANGE_BASE*) or the CXL HDM Decoder Capability Structure
(CXL_HDM_DECODER0_BASE*).

To handle address translation the kernel needs to keep track of the
system's base HPA the decoder bases on. The base can be different
between memory domains, each port may have its own domain. Thus, the
HPA base cannot be shared between CXL ports and its decoders, instead
the base HPA must be stored per port. Each port has its own struct
cxl_hdm to handle the sets of decoders and targets, that struct can
also be used for storing the base.

Add @base_hpa to struct cxl_hdm. Also Introduce helper functions for
the translation and use them to convert the HDM decoder base addresses
to or from an SPA.

While at this, rename len to size for the common base/size naming used
with ranges.

Link: https://lore.kernel.org/all/65c6b8c9a42e4_d2d4294f1@dwillia2-xfh.jf.intel.com.notmuch/
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++--------
 drivers/cxl/cxlmem.h   |  1 +
 2 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 605da9a55d89..50078013f4e3 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -125,6 +125,17 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
 	return true;
 }
 
+static void setup_base_hpa(struct cxl_hdm *cxlhdm)
+{
+	/*
+	 * Address translation is not needed on platforms with HPA ==
+	 * SPA. HDM decoder addresses all base on system addresses,
+	 * there is no offset and the base is zero (cxlhdm->base_hpa
+	 * == 0). Nothing to do here as it is already pre-initialized
+	 * zero.
+	 */
+}
+
 /**
  * devm_cxl_setup_hdm - map HDM decoder component registers
  * @port: cxl_port to map
@@ -144,6 +155,8 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port,
 	cxlhdm->port = port;
 	dev_set_drvdata(dev, cxlhdm);
 
+	setup_base_hpa(cxlhdm);
+
 	/* Memory devices can configure device HDM using DVSEC range regs. */
 	if (reg_map->resource == CXL_RESOURCE_NONE) {
 		if (!info || !info->mem_enabled) {
@@ -611,6 +624,23 @@ static int cxld_await_commit(void __iomem *hdm, int id)
 	return -ETIMEDOUT;
 }
 
+/*
+ * Default expectation is that decoder base addresses match
+ * HPA resource values (that is cxlhdm->base_hpa == 0).
+ */
+
+static inline resource_size_t cxl_xlat_to_hpa(resource_size_t base,
+					      struct cxl_hdm *cxlhdm)
+{
+	return cxlhdm->base_hpa + base;
+}
+
+static inline resource_size_t cxl_xlat_to_base(resource_size_t hpa,
+					       struct cxl_hdm *cxlhdm)
+{
+	return hpa - cxlhdm->base_hpa;
+}
+
 static int cxl_decoder_commit(struct cxl_decoder *cxld)
 {
 	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
@@ -655,7 +685,7 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld)
 	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id));
 	cxld_set_interleave(cxld, &ctrl);
 	cxld_set_type(cxld, &ctrl);
-	base = cxld->hpa_range.start;
+	base = cxl_xlat_to_base(cxld->hpa_range.start, cxlhdm);
 	size = range_len(&cxld->hpa_range);
 
 	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
@@ -746,22 +776,27 @@ static int cxl_setup_hdm_decoder_from_dvsec(
 	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
 	int which, struct cxl_endpoint_dvsec_info *info)
 {
+	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
 	struct cxl_endpoint_decoder *cxled;
-	u64 len;
+	u64 base, size;
 	int rc;
 
 	if (!is_cxl_endpoint(port))
 		return -EOPNOTSUPP;
 
 	cxled = to_cxl_endpoint_decoder(&cxld->dev);
-	len = range_len(&info->dvsec_range[which]);
-	if (!len)
+	size = range_len(&info->dvsec_range[which]);
+	if (!size)
 		return -ENOENT;
+	base = cxl_xlat_to_hpa(info->dvsec_range[which].start, cxlhdm);
 
 	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
 	cxld->commit = NULL;
 	cxld->reset = NULL;
-	cxld->hpa_range = info->dvsec_range[which];
+	cxld->hpa_range = (struct range) {
+		.start = base,
+		.end = base + size -1,
+	};
 
 	/*
 	 * Set the emulated decoder as locked pending additional support to
@@ -770,14 +805,14 @@ static int cxl_setup_hdm_decoder_from_dvsec(
 	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
 	port->commit_end = cxld->id;
 
-	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
+	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, size, 0);
 	if (rc) {
 		dev_err(&port->dev,
 			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
-			port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
+			port->id, cxld->id, *dpa_base, *dpa_base + size - 1, rc);
 		return rc;
 	}
-	*dpa_base += len;
+	*dpa_base += size;
 	cxled->state = CXL_DECODER_STATE_AUTO;
 
 	return 0;
@@ -787,6 +822,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 			    int *target_map, void __iomem *hdm, int which,
 			    u64 *dpa_base, struct cxl_endpoint_dvsec_info *info)
 {
+	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
 	struct cxl_endpoint_decoder *cxled = NULL;
 	u64 size, base, skip, dpa_size, lo, hi;
 	bool committed;
@@ -823,6 +859,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 
 	if (info)
 		cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
+	base = cxl_xlat_to_hpa(base, cxlhdm);
+
 	cxld->hpa_range = (struct range) {
 		.start = base,
 		.end = base + size - 1,
@@ -1107,16 +1146,20 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
 	}
 
 	for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
-		struct device *cxld_dev;
-
-		cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
-					     dvsec_range_allowed);
+		u64 base = cxl_xlat_to_hpa(info->dvsec_range[i].start, cxlhdm);
+		u64 size = range_len(&info->dvsec_range[i]);
+		struct range hpa_range = {
+			.start = base,
+			.end = base + size -1,
+		};
+		struct device *cxld_dev __free(put_device) =
+			cxld_dev = device_find_child(&root->dev, &hpa_range,
+						     dvsec_range_allowed);
 		if (!cxld_dev) {
 			dev_dbg(dev, "DVSEC Range%d denied by platform\n", i);
 			continue;
 		}
 		dev_dbg(dev, "DVSEC Range%d allowed by platform\n", i);
-		put_device(cxld_dev);
 		allowed++;
 	}
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 61d9f4e00921..849ea97385c9 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -856,6 +856,7 @@ struct cxl_hdm {
 	unsigned int decoder_count;
 	unsigned int target_count;
 	unsigned int interleave_mask;
+	u64 base_hpa;
 	struct cxl_port *port;
 };
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/5] cxl/acpi: Add platform flag for HPA address translation
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
  2024-07-01 17:47 ` [PATCH v2 1/5] cxl/hdm: Moving HDM specific code to core/hdm.c Robert Richter
  2024-07-01 17:47 ` [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding Robert Richter
@ 2024-07-01 17:47 ` Robert Richter
  2024-07-12  1:27   ` Dan Williams
  2024-07-01 17:47 ` [PATCH v2 4/5] cxl/hdm: Setup HPA base for address translation using the HPA window in CFMWS Robert Richter
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Robert Richter @ 2024-07-01 17:47 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Adding an early check to detect platform specifics to (later) enable
HPA address translation. The cxl_root structure is used to store that
information.

Note: The platform check will be added later when enabling address
translation.

Link: https://lore.kernel.org/all/65c68969903b1_afa429460@dwillia2-xfh.jf.intel.com.notmuch/
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c | 7 +++++++
 drivers/cxl/cxl.h  | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 571069863c62..67f73a831bd3 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -536,6 +536,11 @@ static int cxl_get_chbs(struct device *dev, struct acpi_device *hb,
 	return 0;
 }
 
+static void setup_platform_quirks(struct cxl_root *root)
+{
+	root->hpa_xlat_enable = 0;
+}
+
 static int get_genport_coordinates(struct device *dev, struct cxl_dport *dport)
 {
 	struct acpi_device *hb = to_cxl_host_bridge(NULL, dev);
@@ -838,6 +843,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 		return PTR_ERR(cxl_root);
 	root_port = &cxl_root->port;
 
+	setup_platform_quirks(cxl_root);
+
 	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
 			      add_host_bridge_dport);
 	if (rc < 0)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 603c0120cff8..95d054dc1af0 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -627,10 +627,12 @@ struct cxl_port {
  *
  * @port: cxl_port member
  * @ops: cxl root operations
+ * @hpa_xlat_enable: enable HPA translation
  */
 struct cxl_root {
 	struct cxl_port port;
 	const struct cxl_root_ops *ops;
+	bool hpa_xlat_enable;
 };
 
 static inline struct cxl_root *
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/5] cxl/hdm: Setup HPA base for address translation using the HPA window in CFMWS
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
                   ` (2 preceding siblings ...)
  2024-07-01 17:47 ` [PATCH v2 3/5] cxl/acpi: Add platform flag for HPA address translation Robert Richter
@ 2024-07-01 17:47 ` Robert Richter
  2024-07-01 17:47 ` [PATCH v2 5/5] cxl/acpi: Enable address translation for Zen4 platforms Robert Richter
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Robert Richter @ 2024-07-01 17:47 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

There are platforms where an address translation between decoder's
(HPA) and the system's physical addresses (SPA) is needed. The HPA
window in the CFMWS can be used to determine the address offset for
the translation. Each CXL endpoint or switch is uniquely attached to a
CXL host bridge. The host bridge is assigned a unique HPA window in an
CFMWS entry of the CEDT (host bridge is in target list). The hardware
base addresses of a CFMWS is an SPA. With that, the offset can be
determined using the HDM decoder's base address from the registers and
the HPA window in the CFMWS entry of the corresponding CXL host
bridge.

The CFMWS entries are parsed during host bridge enablement and set up
in the CXL root decoder during CXL decoder enumeration before a CXL
endpoint is enabled. That is, the endpoint's host bridge's root
decoder can be determined. The HPA range of it marks the beginning of
the HDM decoder's base address and the offset between both can be used
for later address translation.

Setup HPA base address (@base_hpa) of a struct cxl_hdm by determining
the offset as described. Use the port's host bridge and CXL root port
to find the corresponding CXL root decoder containing the HPA window
in the bridge's CFMWS entry. Only enable this for platforms with the
@hpa_xlat_enable flag set.

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 50078013f4e3..5164ff807537 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -125,8 +125,73 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
 	return true;
 }
 
+static int match_root_decoder(struct device *dev, void *dport_dev)
+{
+	struct cxl_switch_decoder *cxlsd;
+
+	if (!is_switch_decoder(dev))
+		return 0;
+
+	cxlsd = to_cxl_switch_decoder(dev);
+
+	guard(rwsem_read)(&cxl_region_rwsem);
+
+	for (int i = 0; i < cxlsd->nr_targets; i++) {
+		if (dport_dev == cxlsd->target[i]->dport_dev)
+			return 1;
+	}
+
+	return 0;
+}
+
+static struct cxl_decoder *find_root_decoder(struct cxl_port *port,
+					     struct device *dport_dev)
+{
+	struct device *dev;
+
+	dev = device_find_child(&port->dev, dport_dev, match_root_decoder);
+
+	return dev ? to_cxl_decoder(dev) : NULL;
+}
+
+static void setup_base_hpa_cfmws(struct cxl_hdm *cxlhdm,
+				 struct cxl_root *cxl_root)
+{
+	struct cxl_port *port = cxlhdm->port;
+	struct cxl_decoder *cxld;
+	u64 base;
+
+	if (!port->host_bridge) {
+		dev_dbg(&port->dev, "No host bridge found for port.\n");
+		return;
+	}
+
+	cxld = find_root_decoder(&cxl_root->port, port->host_bridge);
+	if (!cxld) {
+		dev_dbg(&port->dev,
+			"CFMWS missing for host bridge %s, HPA range not found.\n",
+			dev_name(port->host_bridge));
+		return;
+	}
+
+	base = cxld->hpa_range.start;
+	dev_dbg(&port->dev,
+		"HPA translation for decoders enabled, base 0x%08llx\n",
+		base);
+	put_device(&cxld->dev);
+
+	cxlhdm->base_hpa = base;
+}
+
 static void setup_base_hpa(struct cxl_hdm *cxlhdm)
 {
+	struct cxl_port *port = cxlhdm->port;
+
+	struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
+
+	if (!cxl_root)
+		return;
+
 	/*
 	 * Address translation is not needed on platforms with HPA ==
 	 * SPA. HDM decoder addresses all base on system addresses,
@@ -134,6 +199,10 @@ static void setup_base_hpa(struct cxl_hdm *cxlhdm)
 	 * == 0). Nothing to do here as it is already pre-initialized
 	 * zero.
 	 */
+	if (!cxl_root->hpa_xlat_enable)
+		return;
+
+	setup_base_hpa_cfmws(cxlhdm, cxl_root);
 }
 
 /**
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/5] cxl/acpi: Enable address translation for Zen4 platforms
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
                   ` (3 preceding siblings ...)
  2024-07-01 17:47 ` [PATCH v2 4/5] cxl/hdm: Setup HPA base for address translation using the HPA window in CFMWS Robert Richter
@ 2024-07-01 17:47 ` Robert Richter
  2024-07-11  0:02 ` [PATCH v2 0/5] Address translation for HDM decoding Alison Schofield
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Robert Richter @ 2024-07-01 17:47 UTC (permalink / raw)
  To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Enable address translation for Zen4 platforms.

Link: https://lore.kernel.org/all/65c68969903b1_afa429460@dwillia2-xfh.jf.intel.com.notmuch/
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/acpi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 67f73a831bd3..70fdb806016e 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -11,6 +11,10 @@
 #include "cxlpci.h"
 #include "cxl.h"
 
+#ifdef CONFIG_X86
+#include <asm/cpu.h>
+#endif
+
 #define CXL_RCRB_SIZE	SZ_8K
 
 struct cxl_cxims_data {
@@ -538,7 +542,12 @@ static int cxl_get_chbs(struct device *dev, struct acpi_device *hb,
 
 static void setup_platform_quirks(struct cxl_root *root)
 {
-	root->hpa_xlat_enable = 0;
+#ifdef CONFIG_X86_64
+	bool is_amd_zen4 = boot_cpu_has(X86_FEATURE_ZEN4);
+#else
+	bool is_amd_zen4 = false;
+#endif
+	root->hpa_xlat_enable = is_amd_zen4;
 }
 
 static int get_genport_coordinates(struct device *dev, struct cxl_dport *dport)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] Address translation for HDM decoding
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
                   ` (4 preceding siblings ...)
  2024-07-01 17:47 ` [PATCH v2 5/5] cxl/acpi: Enable address translation for Zen4 platforms Robert Richter
@ 2024-07-11  0:02 ` Alison Schofield
  2024-07-11  0:26   ` Alison Schofield
  2024-07-11 19:03   ` Robert Richter
  2024-07-25 22:00 ` Gregory Price
  2024-08-16 18:32 ` Gregory Price
  7 siblings, 2 replies; 17+ messages in thread
From: Alison Schofield @ 2024-07-11  0:02 UTC (permalink / raw)
  To: Robert Richter
  Cc: Vishal Verma, Ira Weiny, Dan Williams, Jonathan Cameron,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel

On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
> 
> This series implements address translation for HDM decoding. The
> implementation follows the rule that the representation of hardware
> address ranges in the kernel are all SPA. If decoder registers (HDM
> decoder cap or register range) are not SPA, a base offset must be
> applied. Translation happens when accessing the registers back and
> forth. After a read access an address will be converted to SPA and
> before a write access the programmed address is translated from an
> SPA. The decoder register access can be easily encapsulated by address
> translation and thus there are only a few places where translation is
> needed and the code must be changed. This is implemented in patch #2,
> patch #1 is a prerequisite.
> 
> Address translation is restricted to platforms that need it. As such a
> platform check is needed and a flag is introduced for this (patch #3).
> 
> For address translation the base offset must be determined for the
> memory domain. Depending on the platform there are various options for
> this. The address range in the CEDT's CFWMS entry of the CXL host
> bridge can be used to determine the decoder's base address (patch
> #4). This is enabled for AMD Zen4 platforms (patch #5).


Hi Robert,

This HPA->SPA work needs to be done for addresses reported directly by
devices, ie DPAs in poison and other events - right?

For the XOR case, we discover the need for HPA->SPA while parsing the
CFMWS and add a cxl_hpa_to_spa_fn to the struct cxl_root_decoder. Later,
when the driver wants to translate a DPA (to include in a TRACE_EVENT)
it uses that 'extra mile' HPA->SPA function.

See Patch 2 in this series:
https://lore.kernel.org/cover.1719980933.git.alison.schofield@intel.com/T/#m9206e1f872ef252dbb54ce7f0365f0b267179fda

It seems the Zen4 extra mile is a simple offset from the base calc.
Do you think a zen4 hpa->spa function will fit in with what I've done?

FWIW I took this code for a spin through cxl-test on it's own and combined
w the xor address tranlation patch set and no collisions, all humming along
nicely (for non-zen config).

--Alison

> 
snip
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] Address translation for HDM decoding
  2024-07-11  0:02 ` [PATCH v2 0/5] Address translation for HDM decoding Alison Schofield
@ 2024-07-11  0:26   ` Alison Schofield
  2024-07-11 19:03   ` Robert Richter
  1 sibling, 0 replies; 17+ messages in thread
From: Alison Schofield @ 2024-07-11  0:26 UTC (permalink / raw)
  To: Robert Richter
  Cc: Vishal Verma, Ira Weiny, Dan Williams, Jonathan Cameron,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel

On Wed, Jul 10, 2024 at 05:02:46PM -0700, Alison Schofield wrote:
> On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> > Default expectation of Linux is that HPA == SPA, which means that
> > hardware addresses in the decoders are the same as the kernel sees
> > them. However, there are platforms where this is not the case and an
> > address translation between decoder's (HPA) and the system's physical
> > addresses (SPA) is needed.
> > 
> > This series implements address translation for HDM decoding. The
> > implementation follows the rule that the representation of hardware
> > address ranges in the kernel are all SPA. If decoder registers (HDM
> > decoder cap or register range) are not SPA, a base offset must be
> > applied. Translation happens when accessing the registers back and
> > forth. After a read access an address will be converted to SPA and
> > before a write access the programmed address is translated from an
> > SPA. The decoder register access can be easily encapsulated by address
> > translation and thus there are only a few places where translation is
> > needed and the code must be changed. This is implemented in patch #2,
> > patch #1 is a prerequisite.
> > 
> > Address translation is restricted to platforms that need it. As such a
> > platform check is needed and a flag is introduced for this (patch #3).
> > 
> > For address translation the base offset must be determined for the
> > memory domain. Depending on the platform there are various options for
> > this. The address range in the CEDT's CFWMS entry of the CXL host
> > bridge can be used to determine the decoder's base address (patch
> > #4). This is enabled for AMD Zen4 platforms (patch #5).
> 
> 
> Hi Robert,
> 
> This HPA->SPA work needs to be done for addresses reported directly by
> devices, ie DPAs in poison and other events - right?
> 
> For the XOR case, we discover the need for HPA->SPA while parsing the
> CFMWS and add a cxl_hpa_to_spa_fn to the struct cxl_root_decoder. Later,
> when the driver wants to translate a DPA (to include in a TRACE_EVENT)
> it uses that 'extra mile' HPA->SPA function.
> 
> See Patch 2 in this series:
> https://lore.kernel.org/cover.1719980933.git.alison.schofield@intel.com/T/#m9206e1f872ef252dbb54ce7f0365f0b267179fda

Better link:
https://lore.kernel.org/linux-cxl/cover.1719980933.git.alison.schofield@intel.com/

> 
> It seems the Zen4 extra mile is a simple offset from the base calc.
> Do you think a zen4 hpa->spa function will fit in with what I've done?
> 
> FWIW I took this code for a spin through cxl-test on it's own and combined
> w the xor address tranlation patch set and no collisions, all humming along
> nicely (for non-zen config).
> 
> --Alison
> 
> > 
> snip
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] Address translation for HDM decoding
  2024-07-11  0:02 ` [PATCH v2 0/5] Address translation for HDM decoding Alison Schofield
  2024-07-11  0:26   ` Alison Schofield
@ 2024-07-11 19:03   ` Robert Richter
  1 sibling, 0 replies; 17+ messages in thread
From: Robert Richter @ 2024-07-11 19:03 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Vishal Verma, Ira Weiny, Dan Williams, Jonathan Cameron,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel

Alison,

On 10.07.24 17:02:46, Alison Schofield wrote:
> This HPA->SPA work needs to be done for addresses reported directly by
> devices, ie DPAs in poison and other events - right?

we need it to translate the base address of the HDM decoder cap
registers or the range registers to an HPA when accessing it.

> For the XOR case, we discover the need for HPA->SPA while parsing the
> CFMWS and add a cxl_hpa_to_spa_fn to the struct cxl_root_decoder. Later,

As some platforms allow an even more fine grained HPA/SPA mapping that
may devide a memory domain in separate SPA ranges we might just want
to move it directly into struct cxl_decoder so that endpoints or
switches can use it too.

> when the driver wants to translate a DPA (to include in a TRACE_EVENT)
> it uses that 'extra mile' HPA->SPA function.
> 
> See Patch 2 in this series:
> https://lore.kernel.org/cover.1719980933.git.alison.schofield@intel.com/T/#m9206e1f872ef252dbb54ce7f0365f0b267179fda
> 
> It seems the Zen4 extra mile is a simple offset from the base calc.
> Do you think a zen4 hpa->spa function will fit in with what I've done?

Thanks for the pointer, I will take look into the details here and try
to reuse most of that infrastructure.

> 
> FWIW I took this code for a spin through cxl-test on it's own and combined
> w the xor address tranlation patch set and no collisions, all humming along
> nicely (for non-zen config).

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/5] cxl/hdm: Moving HDM specific code to core/hdm.c.
  2024-07-01 17:47 ` [PATCH v2 1/5] cxl/hdm: Moving HDM specific code to core/hdm.c Robert Richter
@ 2024-07-12  0:32   ` Dan Williams
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2024-07-12  0:32 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Robert Richter wrote:
> Code to handle HDM specifics is implemented in core/pci.c and
> core/hdm.c. In both files helper functions will be needed for address
> translation. To simplify the introduction of static helper functions,
> move all HDM code to core/hdm.c.
> 
> The function devm_cxl_enable_mem() is no longer static and is shared
> now within the core module. No functional changes otherwise.

Looks ok.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding
  2024-07-01 17:47 ` [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding Robert Richter
@ 2024-07-12  1:03   ` Dan Williams
  2024-07-13  7:40     ` Robert Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2024-07-12  1:03 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
> 
> The CXL driver stores all internal hardware address ranges as SPA.
> When accessing the HDM decoder's registers, hardware addresses must be
> translated back and forth. This is needed for the base addresses in
> the CXL Range Registers of the PCIe DVSEC for CXL Devices
> (CXL_DVSEC_RANGE_BASE*) or the CXL HDM Decoder Capability Structure
> (CXL_HDM_DECODER0_BASE*).
> 
> To handle address translation the kernel needs to keep track of the
> system's base HPA the decoder bases on. The base can be different
> between memory domains, each port may have its own domain. Thus, the
> HPA base cannot be shared between CXL ports and its decoders, instead
> the base HPA must be stored per port. Each port has its own struct
> cxl_hdm to handle the sets of decoders and targets, that struct can
> also be used for storing the base.
> 
> Add @base_hpa to struct cxl_hdm. Also Introduce helper functions for
> the translation and use them to convert the HDM decoder base addresses
> to or from an SPA.
> 
> While at this, rename len to size for the common base/size naming used
> with ranges.
> 
> Link: https://lore.kernel.org/all/65c6b8c9a42e4_d2d4294f1@dwillia2-xfh.jf.intel.com.notmuch/
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++--------
>  drivers/cxl/cxlmem.h   |  1 +
>  2 files changed, 57 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 605da9a55d89..50078013f4e3 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -125,6 +125,17 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
>  	return true;
>  }
>  
> +static void setup_base_hpa(struct cxl_hdm *cxlhdm)
> +{
> +	/*
> +	 * Address translation is not needed on platforms with HPA ==
> +	 * SPA. HDM decoder addresses all base on system addresses,
> +	 * there is no offset and the base is zero (cxlhdm->base_hpa
> +	 * == 0). Nothing to do here as it is already pre-initialized
> +	 * zero.
> +	 */
> +}

I am not grokking why this is per @cxlhdm? More on this concern below...

> +	base = cxl_xlat_to_base(cxld->hpa_range.start, cxlhdm);

Would prefer this operation is called cxl_hpa_to_spa(), which is
different than the cxldrd->hpa_to_spa() in Alison's patches. This is
about an HPA domain per-host-bridge.

The cxlrd->hpa_to_spa() is after the address exits the host bridge there
is a translation to a Window interleave. Both of those are "SPAs" in my
mind.

>  	size = range_len(&cxld->hpa_range);
>  
>  	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> @@ -746,22 +776,27 @@ static int cxl_setup_hdm_decoder_from_dvsec(
>  	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
>  	int which, struct cxl_endpoint_dvsec_info *info)
>  {
> +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
>  	struct cxl_endpoint_decoder *cxled;
> -	u64 len;
> +	u64 base, size;

Please don't include renames like this s/@len/@size/ in the same patch
that changes some logic.

>  	int rc;
>  
>  	if (!is_cxl_endpoint(port))
>  		return -EOPNOTSUPP;
>  
>  	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> -	len = range_len(&info->dvsec_range[which]);
> -	if (!len)
> +	size = range_len(&info->dvsec_range[which]);
> +	if (!size)
>  		return -ENOENT;
> +	base = cxl_xlat_to_hpa(info->dvsec_range[which].start, cxlhdm);

Wait, the dvsec_range addresses are read from the registers, they are
already HPAs, shouldn't this be the "to SPA" flavor translation?

>  	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
>  	cxld->commit = NULL;
>  	cxld->reset = NULL;
> -	cxld->hpa_range = info->dvsec_range[which];
> +	cxld->hpa_range = (struct range) {
> +		.start = base,
> +		.end = base + size -1,
> +	};

I think it is confusing to change the software tracking of 'struct
cxl_decoder' to be in SPA, can this be kept as HPA tracking and then
fixup the locations that compare against SPAs, like CFWMS values and the
iomem_resource tree, to do the conversion?

>  
>  	/*
>  	 * Set the emulated decoder as locked pending additional support to
> @@ -770,14 +805,14 @@ static int cxl_setup_hdm_decoder_from_dvsec(
>  	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
>  	port->commit_end = cxld->id;
>  
> -	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
> +	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, size, 0);
>  	if (rc) {
>  		dev_err(&port->dev,
>  			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> -			port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
> +			port->id, cxld->id, *dpa_base, *dpa_base + size - 1, rc);
>  		return rc;
>  	}
> -	*dpa_base += len;
> +	*dpa_base += size;
>  	cxled->state = CXL_DECODER_STATE_AUTO;
>  
>  	return 0;
> @@ -787,6 +822,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  			    int *target_map, void __iomem *hdm, int which,
>  			    u64 *dpa_base, struct cxl_endpoint_dvsec_info *info)
>  {
> +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
>  	struct cxl_endpoint_decoder *cxled = NULL;
>  	u64 size, base, skip, dpa_size, lo, hi;
>  	bool committed;
> @@ -823,6 +859,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  
>  	if (info)
>  		cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +
> +	base = cxl_xlat_to_hpa(base, cxlhdm);
> +
>  	cxld->hpa_range = (struct range) {
>  		.start = base,
>  		.end = base + size - 1,
> @@ -1107,16 +1146,20 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
>  	}
>  
>  	for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
> -		struct device *cxld_dev;
> -
> -		cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
> -					     dvsec_range_allowed);
> +		u64 base = cxl_xlat_to_hpa(info->dvsec_range[i].start, cxlhdm);
> +		u64 size = range_len(&info->dvsec_range[i]);
> +		struct range hpa_range = {
> +			.start = base,
> +			.end = base + size -1,
> +		};
> +		struct device *cxld_dev __free(put_device) =
> +			cxld_dev = device_find_child(&root->dev, &hpa_range,
> +						     dvsec_range_allowed);
>  		if (!cxld_dev) {
>  			dev_dbg(dev, "DVSEC Range%d denied by platform\n", i);
>  			continue;
>  		}
>  		dev_dbg(dev, "DVSEC Range%d allowed by platform\n", i);
> -		put_device(cxld_dev);

Jarring to see this cleanup in the same patch. It deserves to be its own
standalone cleanup patch.

>  		allowed++;
>  	}
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 61d9f4e00921..849ea97385c9 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -856,6 +856,7 @@ struct cxl_hdm {
>  	unsigned int decoder_count;
>  	unsigned int target_count;
>  	unsigned int interleave_mask;
> +	u64 base_hpa;

So, per the concern above, each @cxlhdm instance exists within a port
hierarchy. It is only the top of that port hiearchy that understands
that everything underneath is within it's own CXL HPA address domain.

So I would expect that only place this value needs to be stored is in
the cxl_port objects associated with host-bridges. The only place it
would need to be considered is when comparing iomem_resource and region
addresses with decoder addresses.

In other words, I think it is potentially mentally taxing to remember
that 'struct cxl_decoder' stores translated addresses vs teaching paths
that compare region address about the translation needed for endpoint
decoders.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] cxl/acpi: Add platform flag for HPA address translation
  2024-07-01 17:47 ` [PATCH v2 3/5] cxl/acpi: Add platform flag for HPA address translation Robert Richter
@ 2024-07-12  1:27   ` Dan Williams
  2024-07-13  7:41     ` Robert Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2024-07-12  1:27 UTC (permalink / raw)
  To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Dave Jiang, Davidlohr Bueso
  Cc: linux-cxl, linux-kernel, Robert Richter

Robert Richter wrote:
> Adding an early check to detect platform specifics to (later) enable
> HPA address translation. The cxl_root structure is used to store that
> information.
> 
> Note: The platform check will be added later when enabling address
> translation.

It feels odd to have a flag at the root for this because the translation
is at the host-bridge level, right?

I was more thinking of a solution that does:

spa = cxld_hpa_to_spa(cxld);

...and then internal to that the code walks the port hierarchy from it's
host port to the host bridge. Then does something like

hb->hpa_to_spa(hb, hpa)

Where @hb is:

struct cxl_hb {
	struct cxl_port port;
	u64 (*hpa_to_spa)(struct cxl_hb *, u64);
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding
  2024-07-12  1:03   ` Dan Williams
@ 2024-07-13  7:40     ` Robert Richter
  0 siblings, 0 replies; 17+ messages in thread
From: Robert Richter @ 2024-07-13  7:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Jonathan Cameron,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel

On 11.07.24 18:03:13, Dan Williams wrote:
> Robert Richter wrote:
> > Default expectation of Linux is that HPA == SPA, which means that
> > hardware addresses in the decoders are the same as the kernel sees
> > them. However, there are platforms where this is not the case and an
> > address translation between decoder's (HPA) and the system's physical
> > addresses (SPA) is needed.
> > 
> > The CXL driver stores all internal hardware address ranges as SPA.
> > When accessing the HDM decoder's registers, hardware addresses must be
> > translated back and forth. This is needed for the base addresses in
> > the CXL Range Registers of the PCIe DVSEC for CXL Devices
> > (CXL_DVSEC_RANGE_BASE*) or the CXL HDM Decoder Capability Structure
> > (CXL_HDM_DECODER0_BASE*).
> > 
> > To handle address translation the kernel needs to keep track of the
> > system's base HPA the decoder bases on. The base can be different
> > between memory domains, each port may have its own domain. Thus, the
> > HPA base cannot be shared between CXL ports and its decoders, instead
> > the base HPA must be stored per port. Each port has its own struct
> > cxl_hdm to handle the sets of decoders and targets, that struct can
> > also be used for storing the base.
> > 
> > Add @base_hpa to struct cxl_hdm. Also Introduce helper functions for
> > the translation and use them to convert the HDM decoder base addresses
> > to or from an SPA.
> > 
> > While at this, rename len to size for the common base/size naming used
> > with ranges.
> > 
> > Link: https://lore.kernel.org/all/65c6b8c9a42e4_d2d4294f1@dwillia2-xfh.jf.intel.com.notmuch/
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++--------
> >  drivers/cxl/cxlmem.h   |  1 +
> >  2 files changed, 57 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 605da9a55d89..50078013f4e3 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -125,6 +125,17 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> >  	return true;
> >  }
> >  
> > +static void setup_base_hpa(struct cxl_hdm *cxlhdm)
> > +{
> > +	/*
> > +	 * Address translation is not needed on platforms with HPA ==
> > +	 * SPA. HDM decoder addresses all base on system addresses,
> > +	 * there is no offset and the base is zero (cxlhdm->base_hpa
> > +	 * == 0). Nothing to do here as it is already pre-initialized
> > +	 * zero.
> > +	 */
> > +}
> 
> I am not grokking why this is per @cxlhdm? More on this concern below...

Yes, I was going to solve only this single use case. Moving that down
to the decoders as you suggested below would better generalize this
and code could be reused for other implementations or platforms where
conversion is needed.

> 
> > +	base = cxl_xlat_to_base(cxld->hpa_range.start, cxlhdm);
> 
> Would prefer this operation is called cxl_hpa_to_spa(), which is
> different than the cxldrd->hpa_to_spa() in Alison's patches. This is
> about an HPA domain per-host-bridge.
> 
> The cxlrd->hpa_to_spa() is after the address exits the host bridge there
> is a translation to a Window interleave. Both of those are "SPAs" in my
> mind.

I am going to align this with Alison's patches.

> 
> >  	size = range_len(&cxld->hpa_range);
> >  
> >  	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> > @@ -746,22 +776,27 @@ static int cxl_setup_hdm_decoder_from_dvsec(
> >  	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> >  	int which, struct cxl_endpoint_dvsec_info *info)
> >  {
> > +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> >  	struct cxl_endpoint_decoder *cxled;
> > -	u64 len;
> > +	u64 base, size;
> 
> Please don't include renames like this s/@len/@size/ in the same patch
> that changes some logic.

I will separate all cleanups in this patch.

> 
> >  	int rc;
> >  
> >  	if (!is_cxl_endpoint(port))
> >  		return -EOPNOTSUPP;
> >  
> >  	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> > -	len = range_len(&info->dvsec_range[which]);
> > -	if (!len)
> > +	size = range_len(&info->dvsec_range[which]);
> > +	if (!size)
> >  		return -ENOENT;
> > +	base = cxl_xlat_to_hpa(info->dvsec_range[which].start, cxlhdm);
> 
> Wait, the dvsec_range addresses are read from the registers, they are
> already HPAs, shouldn't this be the "to SPA" flavor translation?

Yes, the naming is a little confusing at all.

The intention here was somehow "corrected" hpa, as it is later used in
cxld->"hpa"_range, and that is meant to be hpa == spa. That will
change anyway with a rework.

> 
> >  	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
> >  	cxld->commit = NULL;
> >  	cxld->reset = NULL;
> > -	cxld->hpa_range = info->dvsec_range[which];
> > +	cxld->hpa_range = (struct range) {
> > +		.start = base,
> > +		.end = base + size -1,
> > +	};
> 
> I think it is confusing to change the software tracking of 'struct
> cxl_decoder' to be in SPA, can this be kept as HPA tracking and then
> fixup the locations that compare against SPAs, like CFWMS values and the
> iomem_resource tree, to do the conversion?

In sysfs the addresses should be shown as SPA. If hpa_range is always
an SPA there is the advantage that only the access to the registers
needs translation. If that changes, the accessors of hpa_range need
conversion. Need to check impact of this.

Possibly we could maintain both ranges, e.g. another spa_range that
contains cached values with the translated addresses.

That said, regarding naming, an HPA is an address that can be
read/written from/to the HDM decoder cap regs or the range regs,
right?

> 
> >  
> >  	/*
> >  	 * Set the emulated decoder as locked pending additional support to
> > @@ -770,14 +805,14 @@ static int cxl_setup_hdm_decoder_from_dvsec(
> >  	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
> >  	port->commit_end = cxld->id;
> >  
> > -	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
> > +	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, size, 0);
> >  	if (rc) {
> >  		dev_err(&port->dev,
> >  			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> > -			port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
> > +			port->id, cxld->id, *dpa_base, *dpa_base + size - 1, rc);
> >  		return rc;
> >  	}
> > -	*dpa_base += len;
> > +	*dpa_base += size;
> >  	cxled->state = CXL_DECODER_STATE_AUTO;
> >  
> >  	return 0;
> > @@ -787,6 +822,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> >  			    int *target_map, void __iomem *hdm, int which,
> >  			    u64 *dpa_base, struct cxl_endpoint_dvsec_info *info)
> >  {
> > +	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> >  	struct cxl_endpoint_decoder *cxled = NULL;
> >  	u64 size, base, skip, dpa_size, lo, hi;
> >  	bool committed;
> > @@ -823,6 +859,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> >  
> >  	if (info)
> >  		cxled = to_cxl_endpoint_decoder(&cxld->dev);
> > +
> > +	base = cxl_xlat_to_hpa(base, cxlhdm);
> > +
> >  	cxld->hpa_range = (struct range) {
> >  		.start = base,
> >  		.end = base + size - 1,
> > @@ -1107,16 +1146,20 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> >  	}
> >  
> >  	for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
> > -		struct device *cxld_dev;
> > -
> > -		cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
> > -					     dvsec_range_allowed);
> > +		u64 base = cxl_xlat_to_hpa(info->dvsec_range[i].start, cxlhdm);
> > +		u64 size = range_len(&info->dvsec_range[i]);
> > +		struct range hpa_range = {
> > +			.start = base,
> > +			.end = base + size -1,
> > +		};
> > +		struct device *cxld_dev __free(put_device) =
> > +			cxld_dev = device_find_child(&root->dev, &hpa_range,
> > +						     dvsec_range_allowed);
> >  		if (!cxld_dev) {
> >  			dev_dbg(dev, "DVSEC Range%d denied by platform\n", i);
> >  			continue;
> >  		}
> >  		dev_dbg(dev, "DVSEC Range%d allowed by platform\n", i);
> > -		put_device(cxld_dev);
> 
> Jarring to see this cleanup in the same patch. It deserves to be its own
> standalone cleanup patch.

Ok.

> 
> >  		allowed++;
> >  	}
> >  
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 61d9f4e00921..849ea97385c9 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -856,6 +856,7 @@ struct cxl_hdm {
> >  	unsigned int decoder_count;
> >  	unsigned int target_count;
> >  	unsigned int interleave_mask;
> > +	u64 base_hpa;
> 
> So, per the concern above, each @cxlhdm instance exists within a port
> hierarchy. It is only the top of that port hiearchy that understands
> that everything underneath is within it's own CXL HPA address domain.
> 
> So I would expect that only place this value needs to be stored is in
> the cxl_port objects associated with host-bridges. The only place it
> would need to be considered is when comparing iomem_resource and region
> addresses with decoder addresses.
> 
> In other words, I think it is potentially mentally taxing to remember
> that 'struct cxl_decoder' stores translated addresses vs teaching paths
> that compare region address about the translation needed for endpoint
> decoders.

Yes, that could be moved from HDM down to the decoders and translated
addresses being kept there. That approach is more flexible to also
solve other needs, such as XOR math translations or those of other
platforms.

So will look into this.

Thanks for review.

-Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] cxl/acpi: Add platform flag for HPA address translation
  2024-07-12  1:27   ` Dan Williams
@ 2024-07-13  7:41     ` Robert Richter
  0 siblings, 0 replies; 17+ messages in thread
From: Robert Richter @ 2024-07-13  7:41 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Jonathan Cameron,
	Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel

On 11.07.24 18:27:30, Dan Williams wrote:
> Robert Richter wrote:
> > Adding an early check to detect platform specifics to (later) enable
> > HPA address translation. The cxl_root structure is used to store that
> > information.
> > 
> > Note: The platform check will be added later when enabling address
> > translation.
> 
> It feels odd to have a flag at the root for this because the translation
> is at the host-bridge level, right?
> 
> I was more thinking of a solution that does:
> 
> spa = cxld_hpa_to_spa(cxld);
> 
> ...and then internal to that the code walks the port hierarchy from it's
> host port to the host bridge. Then does something like
> 
> hb->hpa_to_spa(hb, hpa)
> 
> Where @hb is:
> 
> struct cxl_hb {
> 	struct cxl_port port;
> 	u64 (*hpa_to_spa)(struct cxl_hb *, u64);
> }

Ok, will consider that for the rework.

-Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] Address translation for HDM decoding
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
                   ` (5 preceding siblings ...)
  2024-07-11  0:02 ` [PATCH v2 0/5] Address translation for HDM decoding Alison Schofield
@ 2024-07-25 22:00 ` Gregory Price
  2024-08-06 10:54   ` Robert Richter
  2024-08-16 18:32 ` Gregory Price
  7 siblings, 1 reply; 17+ messages in thread
From: Gregory Price @ 2024-07-25 22:00 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel

On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
> 
> This series implements address translation for HDM decoding. The
> implementation follows the rule that the representation of hardware
> address ranges in the kernel are all SPA. If decoder registers (HDM
> decoder cap or register range) are not SPA, a base offset must be
> applied. Translation happens when accessing the registers back and
> forth. After a read access an address will be converted to SPA and
> before a write access the programmed address is translated from an
> SPA. The decoder register access can be easily encapsulated by address
> translation and thus there are only a few places where translation is
> needed and the code must be changed. This is implemented in patch #2,
> patch #1 is a prerequisite.
> 
> Address translation is restricted to platforms that need it. As such a
> platform check is needed and a flag is introduced for this (patch #3).
> 
> For address translation the base offset must be determined for the
> memory domain. Depending on the platform there are various options for
> this. The address range in the CEDT's CFWMS entry of the CXL host
> bridge can be used to determine the decoder's base address (patch
> #4). This is enabled for AMD Zen4 platforms (patch #5).
> 
> Changelog:
> 
> v2:
>  * Fixed build error for other archs [kbot]
> 

Hi Robert,

I'm looking to test this patch series but saw you were looking at
reworking a portion of it.  Just wanted to inquire as to whether
you think I should wait for a v3 given this is a few weeks old now.

Thanks!
~Gregory

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] Address translation for HDM decoding
  2024-07-25 22:00 ` Gregory Price
@ 2024-08-06 10:54   ` Robert Richter
  0 siblings, 0 replies; 17+ messages in thread
From: Robert Richter @ 2024-08-06 10:54 UTC (permalink / raw)
  To: Gregory Price
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel

On 25.07.24 18:00:00, Gregory Price wrote:
> On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> > Default expectation of Linux is that HPA == SPA, which means that
> > hardware addresses in the decoders are the same as the kernel sees
> > them. However, there are platforms where this is not the case and an
> > address translation between decoder's (HPA) and the system's physical
> > addresses (SPA) is needed.
> > 
> > This series implements address translation for HDM decoding. The
> > implementation follows the rule that the representation of hardware
> > address ranges in the kernel are all SPA. If decoder registers (HDM
> > decoder cap or register range) are not SPA, a base offset must be
> > applied. Translation happens when accessing the registers back and
> > forth. After a read access an address will be converted to SPA and
> > before a write access the programmed address is translated from an
> > SPA. The decoder register access can be easily encapsulated by address
> > translation and thus there are only a few places where translation is
> > needed and the code must be changed. This is implemented in patch #2,
> > patch #1 is a prerequisite.
> > 
> > Address translation is restricted to platforms that need it. As such a
> > platform check is needed and a flag is introduced for this (patch #3).
> > 
> > For address translation the base offset must be determined for the
> > memory domain. Depending on the platform there are various options for
> > this. The address range in the CEDT's CFWMS entry of the CXL host
> > bridge can be used to determine the decoder's base address (patch
> > #4). This is enabled for AMD Zen4 platforms (patch #5).
> > 
> > Changelog:
> > 
> > v2:
> >  * Fixed build error for other archs [kbot]
> > 
> 
> Hi Robert,
> 
> I'm looking to test this patch series but saw you were looking at
> reworking a portion of it.  Just wanted to inquire as to whether
> you think I should wait for a v3 given this is a few weeks old now.

Yes, please wait with it.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] Address translation for HDM decoding
  2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
                   ` (6 preceding siblings ...)
  2024-07-25 22:00 ` Gregory Price
@ 2024-08-16 18:32 ` Gregory Price
  7 siblings, 0 replies; 17+ messages in thread
From: Gregory Price @ 2024-08-16 18:32 UTC (permalink / raw)
  To: Robert Richter
  Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
	linux-kernel

On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
> 
> This series implements address translation for HDM decoding. The
> implementation follows the rule that the representation of hardware
> address ranges in the kernel are all SPA. If decoder registers (HDM
> decoder cap or register range) are not SPA, a base offset must be
> applied. Translation happens when accessing the registers back and
> forth. After a read access an address will be converted to SPA and
> before a write access the programmed address is translated from an
> SPA. The decoder register access can be easily encapsulated by address
> translation and thus there are only a few places where translation is
> needed and the code must be changed. This is implemented in patch #2,
> patch #1 is a prerequisite.
> 
> Address translation is restricted to platforms that need it. As such a
> platform check is needed and a flag is introduced for this (patch #3).
> 
> For address translation the base offset must be determined for the
> memory domain. Depending on the platform there are various options for
> this. The address range in the CEDT's CFWMS entry of the CXL host
> bridge can be used to determine the decoder's base address (patch
> #4). This is enabled for AMD Zen4 platforms (patch #5).
> 

Just testing this out, and I'm seeing an inability to actually map the
memory, though the reason escapes me.

Do you have the expected workflow of this down?

For example this fails:

echo ram > /sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
echo 4096 > /sys/bus/cxl/devices/region0/interleave_granularity
echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
echo 0x40000000 > /sys/bus/cxl/devices/region0/size
echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
^^^^ at this point: -bash: echo: write error: Invalid argument

echo 1 > /sys/bus/cxl/devices/region0/commit

and while the cxl driver sees the correct topology, the current version
of cxl-cli reports the memdevs as "anonymous" and reports a failure to
lookup the targets for a region

without adding too much bulk to the email

[~/ndctl]# ./build/cxl/cxl list -vvvvv
libcxl: cxl_mappings_init: region0 target0:  lookup failure
[
  {
    "anon memdevs":[
      {
        "memdev":"mem0",
        "ram_size":137438953472,
...
      {
        "memdev":"mem1",
        "ram_size":137438953472,
...
  {
    "buses":[
      {
...
        "decoders:root0":[
          {
            "decoder":"decoder0.0",
            "resource":825975898112,
            "size":260382392320,
            "interleave_ways":1,
            "max_available_extent":260382392320,
            "volatile_capable":true,
            "qos_class":1,
            "nr_targets":1,
            "targets":[
              {
                "target":"pci0000:00",
                "alias":"ACPI0016:01",
                "position":0,
                "id":0
              }
            ],
            "regions:decoder0.0":[
              {
                "region":"region0",
                "resource":825975898112,
                "size":260382392320,
                "type":"ram",
                "interleave_ways":1,
                "interleave_granularity":256,
                "decode_state":"reset",
                "state":"disabled",
                "mappings":[
                ]
              }
...

Do you have a sense of what might generate this behavior?

~Gregory

> Changelog:
> 
> v2:
>  * Fixed build error for other archs [kbot]
> 
> 
> Robert Richter (5):
>   cxl/hdm: Moving HDM specific code to core/hdm.c.
>   cxl/hdm: Implement address translation for HDM decoding
>   cxl/acpi: Add platform flag for HPA address translation
>   cxl/hdm: Setup HPA base for address translation using the HPA window
>     in CFMWS
>   cxl/acpi: Enable address translation for Zen4 platforms
> 
>  drivers/cxl/acpi.c      |  16 +++
>  drivers/cxl/core/core.h |   2 +
>  drivers/cxl/core/hdm.c  | 245 ++++++++++++++++++++++++++++++++++++++--
>  drivers/cxl/core/pci.c  | 119 +------------------
>  drivers/cxl/cxl.h       |   2 +
>  drivers/cxl/cxlmem.h    |   4 +
>  drivers/cxl/cxlpci.h    |   3 -
>  7 files changed, 262 insertions(+), 129 deletions(-)
> 
> 
> base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-08-16 18:32 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-01 17:47 [PATCH v2 0/5] Address translation for HDM decoding Robert Richter
2024-07-01 17:47 ` [PATCH v2 1/5] cxl/hdm: Moving HDM specific code to core/hdm.c Robert Richter
2024-07-12  0:32   ` Dan Williams
2024-07-01 17:47 ` [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding Robert Richter
2024-07-12  1:03   ` Dan Williams
2024-07-13  7:40     ` Robert Richter
2024-07-01 17:47 ` [PATCH v2 3/5] cxl/acpi: Add platform flag for HPA address translation Robert Richter
2024-07-12  1:27   ` Dan Williams
2024-07-13  7:41     ` Robert Richter
2024-07-01 17:47 ` [PATCH v2 4/5] cxl/hdm: Setup HPA base for address translation using the HPA window in CFMWS Robert Richter
2024-07-01 17:47 ` [PATCH v2 5/5] cxl/acpi: Enable address translation for Zen4 platforms Robert Richter
2024-07-11  0:02 ` [PATCH v2 0/5] Address translation for HDM decoding Alison Schofield
2024-07-11  0:26   ` Alison Schofield
2024-07-11 19:03   ` Robert Richter
2024-07-25 22:00 ` Gregory Price
2024-08-06 10:54   ` Robert Richter
2024-08-16 18:32 ` Gregory Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox