Linux CXL
* [PATCH v4 0/4] XOR Math Fixups: translation & position
@ 2024-07-03  5:29 alison.schofield
  2024-07-03  5:29 ` [PATCH v4 1/4] cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() alison.schofield
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: alison.schofield @ 2024-07-03  5:29 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>


Dropped tags on Patch 2 due to changes. Please tag again.

Changes in v4:
- Patch 1: Updated commit msg/log 
  The name tidy-ups eventually led to a 'fold' not a 'rename'
- Patch 2: Rename the root decoder callback hpa_to_spa (Dan)
- Patch 2: Remove hpa_to_spa as a param to cxl_root_decoder_alloc()
- Patch 2: Add code comment that chunk check is modulo only (Fabio)
- Patch 2: Add lore link to unit test in commit log (Fabio)
- Cover Letter: Add an introduction (Dan)

Link to v3:
https://lore.kernel.org/cover.1719275633.git.alison.schofield@intel.com/


Begin cover letter:

XOR Math Fixups are presented for both translation and position.

Translation:
The CXL driver intends to report DPAs and their SPA translation in
the TRACE logs for CXL poison, general_media, and dram events. It
is actually only logging the HPA, not the SPA. That works for CXL
decodes using typical MODULO arithmetic where HPA==SPA, but not for
XOR decodes. The driver needs to restore the XOR'd bits in order to
get to the SPA and it doesn't. This means that address translations
for root decoders using XOR maps are wrong.

Specifically, regions that interleave across 2, 4, 6, 8, 12, or 16 host
bridges are affected. Interleaves using 1 or 3 host bridges, even if
configured with XOR Arithmetic, do not use xormaps, and are safe.

Address translations for 1 or 3 way host bridge interleaves are correct
no matter the decode (XOR or MODULO). All others are suspect, because
users cannot see which decode is in use.
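
For readers following along, here is a minimal userspace sketch of the
restore step described above. It is illustrative only and is not the
driver code: xor_all_bits() and xor_hpa_to_spa() are hypothetical names,
the GCC/Clang builtins are assumed stand-ins for kernel bit helpers, and
the xormap value used in the note below is made up, not taken from a
real CXIMS.

```c
#include <stddef.h>
#include <stdint.h>

/* XORALLBITS: 1 if an odd number of bits are set in value, else 0. */
static unsigned int xor_all_bits(uint64_t value)
{
	return (unsigned int)(__builtin_popcountll(value) & 1);
}

/*
 * Hypothetical stand-in for the XOR restore step. For each non-zero
 * xormap, the lowest bit covered by the map is rewritten with the
 * parity of (hpa & xormap). Zero maps are skipped.
 */
static uint64_t xor_hpa_to_spa(uint64_t hpa, const uint64_t *xormaps,
			       size_t nr_maps)
{
	for (size_t i = 0; i < nr_maps; i++) {
		uint64_t map = xormaps[i];
		unsigned int pos;
		uint64_t val;

		if (!map)
			continue;

		pos = (unsigned int)__builtin_ctzll(map); /* lowest set bit */
		val = xor_all_bits(hpa & map);
		hpa = (hpa & ~(1ULL << pos)) | (val << pos);
	}

	return hpa;
}
```

With a single made-up xormap of 0x4100 (bits 8 and 14), an input of
0x4000 has odd parity under the map, so bit 8 is set and the result is
0x4100. An empty or all-zero map list leaves the address unchanged,
which matches the HPA==SPA behaviour expected when no xormaps apply.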

Position:
The position part of this patchset came from the discovery that
the driver doesn't need to calculate a target's position in a region
interleave set. The BIOS sets the target list and the driver can
simply use that order.


Presentation is as follows:

Patch 1: Clean up - cxl_trace_hpa() -> cxl_dpa_to_hpa()

Patch 2: cxl: Restore XOR'd position bits during address translation
This completes the DPA->HPA->SPA translation, correcting the XOR
address translation problem described above.

Patch 3 & Patch 4 are paired. Patch 3 presents the new method for
verifying a target position in the list and Patch 4 removes the
old method.

FYI - the reason I don't present the code removal first is that
I think it is easier to read the diff if I leave in the old root
decoder callback setup for calc_hb, insert the new callback along
the same path, and then rip out the defunct calc_hb. That's the
way I created the patchset and it may be an easier way for reviewers
to follow along with the root decoder callback setup.


Alison Schofield (4):
  cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
  cxl: Restore XOR'd position bits during address translation
  cxl/region: Verify target positions using the ordered target list
  cxl: Remove defunct code calculating host bridge target positions

 drivers/cxl/acpi.c        | 84 ++++++++++++++++-----------------------
 drivers/cxl/core/core.h   |  8 ++--
 drivers/cxl/core/mbox.c   |  2 +-
 drivers/cxl/core/port.c   | 20 +---------
 drivers/cxl/core/region.c | 61 ++++++++++++++--------------
 drivers/cxl/core/trace.h  |  4 +-
 drivers/cxl/cxl.h         | 11 ++---
 7 files changed, 77 insertions(+), 113 deletions(-)


base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826
-- 
2.37.3



* [PATCH v4 1/4] cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
  2024-07-03  5:29 [PATCH v4 0/4] XOR Math Fixups: translation & position alison.schofield
@ 2024-07-03  5:29 ` alison.schofield
  2024-07-11 20:52   ` Robert Richter
  2024-07-03  5:29 ` [PATCH v4 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: alison.schofield @ 2024-07-03  5:29 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPAs,
the work it performs is a DPA to HPA translation, not a trace. Tidy
up this naming by moving the minimal work done in cxl_trace_hpa()
into cxl_dpa_to_hpa() and use cxl_dpa_to_hpa() for trace event
callbacks.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/core.h   |  8 ++++----
 drivers/cxl/core/mbox.c   |  2 +-
 drivers/cxl/core/region.c | 33 +++++++++++++--------------------
 drivers/cxl/core/trace.h  |  4 ++--
 4 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 625394486459..72a506c9dbd0 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -28,12 +28,12 @@ int cxl_region_init(void);
 void cxl_region_exit(void);
 int cxl_get_poison_by_endpoint(struct cxl_port *port);
 struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
-u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
-		  u64 dpa);
+u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
+		   u64 dpa);
 
 #else
-static inline u64
-cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa)
+static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
+				 const struct cxl_memdev *cxlmd, u64 dpa)
 {
 	return ULLONG_MAX;
 }
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 2626f3fff201..eb0b08e5136f 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -878,7 +878,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 		dpa = le64_to_cpu(evt->common.phys_addr) & CXL_DPA_MASK;
 		cxlr = cxl_dpa_to_region(cxlmd, dpa);
 		if (cxlr)
-			hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
+			hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
 
 		if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
 			trace_cxl_general_media(cxlmd, type, cxlr, hpa,
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3c2b6144be23..237c28d5f2cc 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2749,15 +2749,25 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
 	return false;
 }
 
-static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
-			  struct cxl_endpoint_decoder *cxled)
+u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
+		   u64 dpa)
 {
 	u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
 	struct cxl_region_params *p = &cxlr->params;
-	int pos = cxled->pos;
+	struct cxl_endpoint_decoder *cxled = NULL;
 	u16 eig = 0;
 	u8 eiw = 0;
+	int pos;
 
+	for (int i = 0; i < p->nr_targets; i++) {
+		cxled = p->targets[i];
+		if (cxlmd == cxled_to_memdev(cxled))
+			break;
+	}
+	if (!cxled || cxlmd != cxled_to_memdev(cxled))
+		return ULLONG_MAX;
+
+	pos = cxled->pos;
 	ways_to_eiw(p->interleave_ways, &eiw);
 	granularity_to_eig(p->interleave_granularity, &eig);
 
@@ -2797,23 +2807,6 @@ static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
 	return hpa;
 }
 
-u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
-		  u64 dpa)
-{
-	struct cxl_region_params *p = &cxlr->params;
-	struct cxl_endpoint_decoder *cxled = NULL;
-
-	for (int i = 0; i <  p->nr_targets; i++) {
-		cxled = p->targets[i];
-		if (cxlmd == cxled_to_memdev(cxled))
-			break;
-	}
-	if (!cxled || cxlmd != cxled_to_memdev(cxled))
-		return ULLONG_MAX;
-
-	return cxl_dpa_to_hpa(dpa, cxlr, cxled);
-}
-
 static struct lock_class_key cxl_pmem_region_key;
 
 static int cxl_pmem_region_alloc(struct cxl_region *cxlr)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index ee5cd4eb2f16..21b76e9c5c60 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -704,8 +704,8 @@ TRACE_EVENT(cxl_poison,
 		if (cxlr) {
 			__assign_str(region);
 			memcpy(__entry->uuid, &cxlr->params.uuid, 16);
-			__entry->hpa = cxl_trace_hpa(cxlr, cxlmd,
-						     __entry->dpa);
+			__entry->hpa = cxl_dpa_to_hpa(cxlr, cxlmd,
+						      __entry->dpa);
 		} else {
 			__assign_str(region);
 			memset(__entry->uuid, 0, 16);
-- 
2.37.3



* [PATCH v4 2/4] cxl: Restore XOR'd position bits during address translation
  2024-07-03  5:29 [PATCH v4 0/4] XOR Math Fixups: translation & position alison.schofield
  2024-07-03  5:29 ` [PATCH v4 1/4] cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() alison.schofield
@ 2024-07-03  5:29 ` alison.schofield
  2024-07-11 22:35   ` Dan Williams
  2024-07-03  5:29 ` [PATCH v4 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: alison.schofield @ 2024-07-03  5:29 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, Diego Garcia Rodriguez

From: Alison Schofield <alison.schofield@intel.com>

When a device reports a DPA in events like poison, general_media,
and dram, the driver translates that DPA back to an HPA. Presently,
the CXL driver translation only considers the Modulo position and
will report the wrong HPA for XOR configured root decoders.

Add a helper function that restores the XOR'd bits during DPA->HPA
address translation. Plumb a root decoder callback to the new helper
when XOR interleave arithmetic is in use. For Modulo arithmetic, just
let the callback be NULL - as in no extra work required.

Upon completion of a DPA->HPA translation, a couple of checks are
performed on the result. One simply confirms that the calculated
HPA is within the address range of the region. That test is useful
for both Modulo and XOR interleave arithmetic decodes.

A second check confirms that the HPA is within an expected chunk
based on the endpoint's position in the region and the region
granularity. An XOR decode disrupts the Modulo pattern, making the
chunk check useless.

To align the checks with the proper decode, pull the region range
check inline and use the helper to do the chunk check for Modulo
decodes only.
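
The two checks and how they are gated can be sketched in userspace as
below. This is an illustration under stated assumptions, not the code
in the diff: the function names are hypothetical, the region is reduced
to plain start/end values, and the expected chunk for a position is
assumed to be [pos * gran, (pos + 1) * gran) within each gran * ways
stripe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range check: useful for both Modulo and XOR decodes. */
static bool hpa_in_region(uint64_t hpa, uint64_t start, uint64_t end)
{
	return hpa >= start && hpa <= end;
}

/* Chunk check: only meaningful when the Modulo pattern holds. */
static bool hpa_in_chunk(uint64_t hpa, uint64_t start, unsigned int gran,
			 unsigned int ways, unsigned int pos)
{
	/* Offset of the address within its gran * ways stripe */
	uint64_t offset = (hpa - start) % ((uint64_t)gran * ways);

	return offset >= (uint64_t)pos * gran &&
	       offset < ((uint64_t)pos + 1) * gran;
}

/* Hypothetical wrapper: skip the chunk check for XOR decodes. */
static bool hpa_checks_pass(uint64_t hpa, uint64_t start, uint64_t end,
			    unsigned int gran, unsigned int ways,
			    unsigned int pos, bool xor_decode)
{
	if (!hpa_in_region(hpa, start, end))
		return false;

	if (!xor_decode && !hpa_in_chunk(hpa, start, gran, ways, pos))
		return false;

	return true;
}
```

For a region at 0x1000-0x1fff with 256 byte granularity and 4 ways, the
address 0x150a sits at stripe offset 266, so it passes the chunk check
for position 1 and fails it for position 0. With xor_decode set, only
the range check applies.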

A cxl-test unit test is posted for upstream review here:
https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/

Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
---
 drivers/cxl/acpi.c        | 40 +++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/region.c | 23 +++++++++++++---------
 drivers/cxl/cxl.h         |  3 +++
 3 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 571069863c62..6b6ae9c81368 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
 	return cxlrd->cxlsd.target[n];
 }
 
+static u64 cxl_xor_hpa_to_spa(struct cxl_root_decoder *cxlrd, u64 hpa)
+{
+	struct cxl_cxims_data *cximsd = cxlrd->platform_data;
+	int hbiw = cxlrd->cxlsd.nr_targets;
+	u64 val;
+	int pos;
+
+	/* No xormaps for host bridge interleave ways of 1 or 3 */
+	if (hbiw == 1 || hbiw == 3)
+		return hpa;
+
+	/*
+	 * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
+	 * the position bit to its value before the xormap was applied at
+	 * HPA->DPA translation.
+	 *
+	 * pos is the lowest set bit in an XORMAP
+	 * val is the XORALLBITS(HPA & XORMAP)
+	 *
+	 * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
+	 * as an operation that outputs a single bit by XORing all the
+	 * bits in the input (hpa & xormap). Implement XORALLBITS using
+	 * hweight64(). If the hamming weight is even the XOR of those
+	 * bits results in val==0, if odd the XOR result is val==1.
+	 */
+
+	for (int i = 0; i < cximsd->nr_maps; i++) {
+		if (!cximsd->xormaps[i])
+			continue;
+		pos = __ffs(cximsd->xormaps[i]);
+		val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
+		hpa = (hpa & ~(1ULL << pos)) | (val << pos);
+	}
+
+	return hpa;
+}
+
 struct cxl_cxims_context {
 	struct device *dev;
 	struct cxl_root_decoder *cxlrd;
@@ -434,6 +471,9 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
 
 	cxlrd->qos_class = cfmws->qtg_id;
 
+	if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_XOR)
+		cxlrd->hpa_to_spa = cxl_xor_hpa_to_spa;
+
 	rc = cxl_decoder_add(cxld, target_map);
 	if (rc)
 		return rc;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 237c28d5f2cc..23abd0f7b856 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2723,20 +2723,13 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
 	return ctx.cxlr;
 }
 
-static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
+static bool cxl_is_hpa_in_chunk(u64 hpa, struct cxl_region *cxlr, int pos)
 {
 	struct cxl_region_params *p = &cxlr->params;
 	int gran = p->interleave_granularity;
 	int ways = p->interleave_ways;
 	u64 offset;
 
-	/* Is the hpa within this region at all */
-	if (hpa < p->res->start || hpa > p->res->end) {
-		dev_dbg(&cxlr->dev,
-			"Addr trans fail: hpa 0x%llx not in region\n", hpa);
-		return false;
-	}
-
 	/* Is the hpa in an expected chunk for its pos(-ition) */
 	offset = hpa - p->res->start;
 	offset = do_div(offset, gran * ways);
@@ -2752,6 +2745,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
 u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
 		   u64 dpa)
 {
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
 	u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
 	struct cxl_region_params *p = &cxlr->params;
 	struct cxl_endpoint_decoder *cxled = NULL;
@@ -2801,7 +2795,18 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
 	/* Apply the hpa_offset to the region base address */
 	hpa = hpa_offset + p->res->start;
 
-	if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
+	/* Root decoder translation overrides typical modulo decode */
+	if (cxlrd->hpa_to_spa)
+		hpa = cxlrd->hpa_to_spa(cxlrd, hpa);
+
+	if (hpa < p->res->start || hpa > p->res->end) {
+		dev_dbg(&cxlr->dev,
+			"Addr trans fail: hpa 0x%llx not in region\n", hpa);
+		return ULLONG_MAX;
+	}
+
+	/* Simple chunk check, by pos & gran, only applies to modulo decodes */
+	if (!cxlrd->hpa_to_spa && (!cxl_is_hpa_in_chunk(hpa, cxlr, pos)))
 		return ULLONG_MAX;
 
 	return hpa;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 603c0120cff8..dfc4f27d195c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -434,12 +434,14 @@ struct cxl_switch_decoder {
 struct cxl_root_decoder;
 typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
 					    int pos);
+typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
 
 /**
  * struct cxl_root_decoder - Static platform CXL address decoder
  * @res: host / parent resource for region allocations
  * @region_id: region id for next region provisioning event
  * @calc_hb: which host bridge covers the n'th position by granularity
+ * @hpa_to_spa: translate CXL host-physical-address to Platform system-physical-address
  * @platform_data: platform specific configuration data
  * @range_lock: sync region autodiscovery by address range
  * @qos_class: QoS performance class cookie
@@ -449,6 +451,7 @@ struct cxl_root_decoder {
 	struct resource *res;
 	atomic_t region_id;
 	cxl_calc_hb_fn calc_hb;
+	cxl_hpa_to_spa_fn hpa_to_spa;
 	void *platform_data;
 	struct mutex range_lock;
 	int qos_class;
-- 
2.37.3



* [PATCH v4 3/4] cxl/region: Verify target positions using the ordered target list
  2024-07-03  5:29 [PATCH v4 0/4] XOR Math Fixups: translation & position alison.schofield
  2024-07-03  5:29 ` [PATCH v4 1/4] cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() alison.schofield
  2024-07-03  5:29 ` [PATCH v4 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
@ 2024-07-03  5:29 ` alison.schofield
  2024-07-03  5:29 ` [PATCH v4 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
  2024-07-11 19:59 ` [PATCH v4 0/4] XOR Math Fixups: translation & position Dan Williams
  4 siblings, 0 replies; 8+ messages in thread
From: alison.schofield @ 2024-07-03  5:29 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

When a root decoder is configured, the interleave target list is read
from the BIOS-populated CFMWS structure. Per the CXL spec 3.1 Table
9-22, the target list is in interleave order. The CXL driver populates
its decoder target list in the same order and stores it in 'struct
cxl_switch_decoder' field "@target: active ordered target list in
current decoder configuration".

Given the promise of an ordered list, the driver can stop duplicating
the work of BIOS and simply check target positions against the ordered
list during region configuration.
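
As a rough userspace illustration of checking a position against an
ordered list: struct dport and target_position_valid() below are
hypothetical stand-ins, not driver types, and the guard against zero
interleave ways is an addition for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a downstream port (host bridge). */
struct dport {
	int id;
};

/*
 * The target list is assumed to already be in interleave order, so a
 * position maps to its target by simple indexing, wrapping at the
 * number of interleave ways.
 */
static bool target_position_valid(const struct dport *dport,
				  const struct dport *const *targets,
				  unsigned int iw, unsigned int pos)
{
	if (iw == 0)
		return false;

	return dport == targets[pos % iw];
}
```

With two targets in the list, positions 0 and 2 both map to the first
target and positions 1 and 3 to the second. No arithmetic-specific
calculation is involved in the lookup.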

The simplified check against the ordered list is presented here.
A follow-on patch will remove the unused code.

For Modulo arithmetic this is not a fix, only a simplification.
For XOR arithmetic this is a fix for HB IW of 3,6,12.

Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/core/region.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 23abd0f7b856..2772828ca6ca 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1559,10 +1559,13 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
 				      const struct cxl_dport *dport, int pos)
 {
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
+	struct cxl_decoder *cxld = &cxlsd->cxld;
+	int iw = cxld->interleave_ways;
 	struct cxl_port *iter;
 	int rc;
 
-	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
+	if (dport != cxlrd->cxlsd.target[pos % iw]) {
 		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
 			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
 			dev_name(&cxlrd->cxlsd.cxld.dev));
-- 
2.37.3



* [PATCH v4 4/4] cxl: Remove defunct code calculating host bridge target positions
  2024-07-03  5:29 [PATCH v4 0/4] XOR Math Fixups: translation & position alison.schofield
                   ` (2 preceding siblings ...)
  2024-07-03  5:29 ` [PATCH v4 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
@ 2024-07-03  5:29 ` alison.schofield
  2024-07-11 19:59 ` [PATCH v4 0/4] XOR Math Fixups: translation & position Dan Williams
  4 siblings, 0 replies; 8+ messages in thread
From: alison.schofield @ 2024-07-03  5:29 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS
target list in interleave target order. This means the calculations
the CXL driver added to determine positions when XOR math is in use,
along with the entire XOR vs Modulo callback setup, are not needed.

A prior patch added a common method to verify positions.

Remove the now unused code related to the cxl_calc_hb_fn.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/acpi.c      | 60 ++---------------------------------------
 drivers/cxl/core/port.c | 20 +-------------
 drivers/cxl/cxl.h       |  8 +-----
 3 files changed, 4 insertions(+), 84 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 6b6ae9c81368..921cee3bb980 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -22,57 +22,6 @@ static const guid_t acpi_cxl_qtg_id_guid =
 	GUID_INIT(0xF365F9A6, 0xA7DE, 0x4071,
 		  0xA6, 0x6A, 0xB4, 0x0C, 0x0B, 0x4F, 0x8E, 0x52);
 
-/*
- * Find a targets entry (n) in the host bridge interleave list.
- * CXL Specification 3.0 Table 9-22
- */
-static int cxl_xor_calc_n(u64 hpa, struct cxl_cxims_data *cximsd, int iw,
-			  int ig)
-{
-	int i = 0, n = 0;
-	u8 eiw;
-
-	/* IW: 2,4,6,8,12,16 begin building 'n' using xormaps */
-	if (iw != 3) {
-		for (i = 0; i < cximsd->nr_maps; i++)
-			n |= (hweight64(hpa & cximsd->xormaps[i]) & 1) << i;
-	}
-	/* IW: 3,6,12 add a modulo calculation to 'n' */
-	if (!is_power_of_2(iw)) {
-		if (ways_to_eiw(iw, &eiw))
-			return -1;
-		hpa &= GENMASK_ULL(51, eiw + ig);
-		n |= do_div(hpa, 3) << i;
-	}
-	return n;
-}
-
-static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
-{
-	struct cxl_cxims_data *cximsd = cxlrd->platform_data;
-	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
-	struct cxl_decoder *cxld = &cxlsd->cxld;
-	int ig = cxld->interleave_granularity;
-	int iw = cxld->interleave_ways;
-	int n = 0;
-	u64 hpa;
-
-	if (dev_WARN_ONCE(&cxld->dev,
-			  cxld->interleave_ways != cxlsd->nr_targets,
-			  "misconfigured root decoder\n"))
-		return NULL;
-
-	hpa = cxlrd->res->start + pos * ig;
-
-	/* Entry (n) is 0 for no interleave (iw == 1) */
-	if (iw != 1)
-		n = cxl_xor_calc_n(hpa, cximsd, iw, ig);
-
-	if (n < 0)
-		return NULL;
-
-	return cxlrd->cxlsd.target[n];
-}
 
 static u64 cxl_xor_hpa_to_spa(struct cxl_root_decoder *cxlrd, u64 hpa)
 {
@@ -398,7 +347,6 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
 	struct cxl_port *root_port = ctx->root_port;
 	struct cxl_cxims_context cxims_ctx;
 	struct device *dev = ctx->dev;
-	cxl_calc_hb_fn cxl_calc_hb;
 	struct cxl_decoder *cxld;
 	unsigned int ways, i, ig;
 	int rc;
@@ -426,13 +374,9 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
 	if (rc)
 		return rc;
 
-	if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
-		cxl_calc_hb = cxl_hb_modulo;
-	else
-		cxl_calc_hb = cxl_hb_xor;
-
 	struct cxl_root_decoder *cxlrd __free(put_cxlrd) =
-		cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
+		cxl_root_decoder_alloc(root_port, ways);
+
 	if (IS_ERR(cxlrd))
 		return PTR_ERR(cxlrd);
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 887ed6e358fb..bc9e18d02598 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1733,21 +1733,6 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
 	return 0;
 }
 
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos)
-{
-	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
-	struct cxl_decoder *cxld = &cxlsd->cxld;
-	int iw;
-
-	iw = cxld->interleave_ways;
-	if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
-			  "misconfigured root decoder\n"))
-		return NULL;
-
-	return cxlrd->cxlsd.target[pos % iw];
-}
-EXPORT_SYMBOL_NS_GPL(cxl_hb_modulo, CXL);
-
 static struct lock_class_key cxl_decoder_key;
 
 /**
@@ -1807,7 +1792,6 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
  * cxl_root_decoder_alloc - Allocate a root level decoder
  * @port: owning CXL root of this decoder
  * @nr_targets: static number of downstream targets
- * @calc_hb: which host bridge covers the n'th position by granularity
  *
  * Return: A new cxl decoder to be registered by cxl_decoder_add(). A
  * 'CXL root' decoder is one that decodes from a top-level / static platform
@@ -1815,8 +1799,7 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
  * topology.
  */
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
-						unsigned int nr_targets,
-						cxl_calc_hb_fn calc_hb)
+						unsigned int nr_targets)
 {
 	struct cxl_root_decoder *cxlrd;
 	struct cxl_switch_decoder *cxlsd;
@@ -1838,7 +1821,6 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 		return ERR_PTR(rc);
 	}
 
-	cxlrd->calc_hb = calc_hb;
 	mutex_init(&cxlrd->range_lock);
 
 	cxld = &cxlsd->cxld;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index dfc4f27d195c..e27a0cd4f107 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -432,15 +432,12 @@ struct cxl_switch_decoder {
 };
 
 struct cxl_root_decoder;
-typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
-					    int pos);
 typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
 
 /**
  * struct cxl_root_decoder - Static platform CXL address decoder
  * @res: host / parent resource for region allocations
  * @region_id: region id for next region provisioning event
- * @calc_hb: which host bridge covers the n'th position by granularity
  * @hpa_to_spa: translate CXL host-physical-address to Platform system-physical-address
  * @platform_data: platform specific configuration data
  * @range_lock: sync region autodiscovery by address range
@@ -450,7 +447,6 @@ typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
 struct cxl_root_decoder {
 	struct resource *res;
 	atomic_t region_id;
-	cxl_calc_hb_fn calc_hb;
 	cxl_hpa_to_spa_fn hpa_to_spa;
 	void *platform_data;
 	struct mutex range_lock;
@@ -775,9 +771,7 @@ bool is_root_decoder(struct device *dev);
 bool is_switch_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
-						unsigned int nr_targets,
-						cxl_calc_hb_fn calc_hb);
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
+						unsigned int nr_targets);
 struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
 						    unsigned int nr_targets);
 int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
-- 
2.37.3



* Re: [PATCH v4 0/4] XOR Math Fixups: translation & position
  2024-07-03  5:29 [PATCH v4 0/4] XOR Math Fixups: translation & position alison.schofield
                   ` (3 preceding siblings ...)
  2024-07-03  5:29 ` [PATCH v4 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
@ 2024-07-11 19:59 ` Dan Williams
  4 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2024-07-11 19:59 UTC (permalink / raw)
  To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Dropped tags on Patch 2 due to changes. Please Tag again.

Short of obvious cases where the approach is unrecognizable from the
previous version, I would say always include tags with a "please holler
if you disagree" explanation as to why someone might want to withdraw
their previous tag. Email communication is already lossy enough without
constant revalidation.

Otherwise, if you do ask for re-tag then please explain why you think
the old tag is invalidated. I.e. I could have acked your explanation and
trusted that description here rather than go re-review patch 2.

> 
> Changes in v4:
> - Patch 1: Updated commit msg/log 
>   The name tidy-ups eventually led to a 'fold' not a 'rename'
> - Patch 2: Rename the root decoder callback hpa_to_spa (Dan)
> - Patch 2: Remove hpa_to_spa as a param to cxl_root_decoder_alloc()
> - Patch 2: Add code comment that chunk check is modulo only (Fabio)
> - Patch 2: Add lore link to unit test in commit log (Fabio)
> - Cover Letter: Add an introduction (Dan)
> 
> Link to v3:
> https://lore.kernel.org/cover.1719275633.git.alison.schofield@intel.com/
> 
> 
> Begin cover letter:
> 
> XOR Math Fixups are presented for both translation and position.
> 
> Translation:
> The CXL driver intends to report DPAs and their SPA translation in

s/intends/has a responsibility/, right?

Don't fix this up with a re-post, but for anyone new to this patchset
they should know that RAS is one of the main reasons to have OS awareness
of CXL at all. If address translation is broken a fundamental reason for
the driver to even exist is broken.  So "intends" undersells how core
this functionality is to even having a CXL subsystem in the kernel.

> the TRACE logs for CXL poison, general_media, and dram events. It
> is actually only logging the HPA, not the SPA. That works for CXL
> decodes using typical MODULO arithmetic where HPA==SPA, but not for
> XOR decodes. The driver needs to restore the XOR'd bits in order to
> get to the SPA and it doesn't. This means that address translations
> for root decoders using XOR maps are wrong.
> 
> Specifically regions that interleave across 2,4,6,8,12, or 16 host
> bridges are affected. Interleaves using 1 or 3 host bridges, even if
> configured with XOR Arithmetic, do not use xormaps, and are safe.
> 
> Aside from knowing that any address translation of a 1 or 3 way host
> bridge interleave is correct no matter the decode (XOR or MODULO),
> all others are suspect because the decode is actually transparent to
> users.
> 
> Position:
> The position part of this patchset came from the discovery that
> the driver doesn't need to calculate a target's position in a region
> interleave set. The BIOS sets the target list and the driver can
> simply use that order.

Thanks for this writeup and the insight about XOR vs 1 or 3 way
interleaves.

May I ask that you incorporate this useful bit of prose as a kdoc for
cxl_dpa_to_hpa()? This can be a follow-on patch, no need to respin this
set again. Otherwise this gets lost to the sands of time where only
someone savvy enough to do the lore archive search will see it.


* Re: [PATCH v4 1/4] cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
  2024-07-03  5:29 ` [PATCH v4 1/4] cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() alison.schofield
@ 2024-07-11 20:52   ` Robert Richter
  0 siblings, 0 replies; 8+ messages in thread
From: Robert Richter @ 2024-07-11 20:52 UTC (permalink / raw)
  To: alison.schofield
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, linux-cxl

On 02.07.24 22:29:49, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
> addresses the work it performs is a DPA to HPA translation not a
> trace. Tidy up this naming by moving the minimal work done in
> cxl_trace_hpa() into cxl_dpa_to_hpa() and use cxl_dpa_to_hpa()
> for trace event callbacks.
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Robert Richter <rrichter@amd.com>

> ---
>  drivers/cxl/core/core.h   |  8 ++++----
>  drivers/cxl/core/mbox.c   |  2 +-
>  drivers/cxl/core/region.c | 33 +++++++++++++--------------------
>  drivers/cxl/core/trace.h  |  4 ++--
>  4 files changed, 20 insertions(+), 27 deletions(-)


* Re: [PATCH v4 2/4] cxl: Restore XOR'd position bits during address translation
  2024-07-03  5:29 ` [PATCH v4 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
@ 2024-07-11 22:35   ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2024-07-11 22:35 UTC (permalink / raw)
  To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, Diego Garcia Rodriguez

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> When a device reports a DPA in events like poison, general_media,
> and dram, the driver translates that DPA back to an HPA. Presently,
> the CXL driver translation only considers the Modulo position and
> will report the wrong HPA for XOR configured root decoders.
> 
> Add a helper function that restores the XOR'd bits during DPA->HPA
> address translation. Plumb a root decoder callback to the new helper
> when XOR interleave arithmetic is in use. For Modulo arithmetic, just
> let the callback be NULL - as in no extra work required.
> 
> Upon completion of a DPA->HPA translation a couple of checks are
> performed on the result. One simply confirms that the calculated
> HPA is within the address range of the region. That test is useful
> for both Modulo and XOR interleave arithmetic decodes.
> 
> A second check confirms that the HPA is within an expected chunk
> based on the endpoint's position in the region and the region
> granularity. An XOR decode disrupts the Modulo pattern making the
> chunk check useless.
> 
> To align the checks with the proper decode, pull the region range
> check inline and use the helper to do the chunk check for Modulo
> decodes only.
> 
> A cxl-test unit test is posted for upstream review here:
> https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/
> 
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
> ---
>  drivers/cxl/acpi.c        | 40 +++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/core/region.c | 23 +++++++++++++---------
>  drivers/cxl/cxl.h         |  3 +++
>  3 files changed, 57 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 571069863c62..6b6ae9c81368 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
>  	return cxlrd->cxlsd.target[n];
>  }
>  
> +static u64 cxl_xor_hpa_to_spa(struct cxl_root_decoder *cxlrd, u64 hpa)
> +{
> +	struct cxl_cxims_data *cximsd = cxlrd->platform_data;
> +	int hbiw = cxlrd->cxlsd.nr_targets;
> +	u64 val;
> +	int pos;
> +
> +	/* No xormaps for host bridge interleave ways of 1 or 3 */
> +	if (hbiw == 1 || hbiw == 3)
> +		return hpa;
> +
> +	/*
> +	 * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
> +	 * the position bit to its value before the xormap was applied at
> +	 * HPA->DPA translation.
> +	 *
> +	 * pos is the lowest set bit in an XORMAP
> +	 * val is the XORALLBITS(HPA & XORMAP)
> +	 *
> +	 * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
> +	 * as an operation that outputs a single bit by XORing all the
> +	 * bits in the input (hpa & xormap). Implement XORALLBITS using
> +	 * hweight64(). If the hamming weight is even the XOR of those
> +	 * bits results in val==0, if odd the XOR result is val==1.
> +	 */
> +
> +	for (int i = 0; i < cximsd->nr_maps; i++) {
> +		if (!cximsd->xormaps[i])
> +			continue;
> +		pos = __ffs(cximsd->xormaps[i]);
> +		val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> +		hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> +	}
> +
> +	return hpa;
> +}
> +
>  struct cxl_cxims_context {
>  	struct device *dev;
>  	struct cxl_root_decoder *cxlrd;
> @@ -434,6 +471,9 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
>  
>  	cxlrd->qos_class = cfmws->qtg_id;
>  
> +	if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_XOR)
> +		cxlrd->hpa_to_spa = cxl_xor_hpa_to_spa;
> +
>  	rc = cxl_decoder_add(cxld, target_map);
>  	if (rc)
>  		return rc;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 237c28d5f2cc..23abd0f7b856 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2723,20 +2723,13 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
>  	return ctx.cxlr;
>  }
>  
> -static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
> +static bool cxl_is_hpa_in_chunk(u64 hpa, struct cxl_region *cxlr, int pos)

Minor note going forward, no need to respin: a static function does not
need a global namespace prefix, so this could be renamed to
is_hpa_in_chunk().
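
For readers following the chunk check described in the commit log, here
is a minimal userspace sketch of the Modulo-only test. The quoted hunk
cuts off after the do_div() line, so the final comparison against
pos * gran is an assumption about how the remainder is used, and the
name hpa_in_chunk() and its parameters are hypothetical stand-ins for
the region params.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: with Modulo arithmetic, the endpoint at position 'pos'
 * owns one 'gran'-sized chunk in every stripe of gran * ways bytes.
 * The comparison below is assumed, it is not shown in the quoted diff.
 */
static bool hpa_in_chunk(uint64_t hpa, uint64_t region_start,
			 uint32_t gran, uint32_t ways, uint32_t pos)
{
	uint64_t stripe = (uint64_t)gran * ways;
	/* Remainder within the stripe, as do_div() returns in the patch */
	uint64_t offset = (hpa - region_start) % stripe;

	return offset >= (uint64_t)pos * gran &&
	       offset < (uint64_t)(pos + 1) * gran;
}
```

With a 4-way, 256 byte granularity region, an offset of 0x110 into the
region lands in the chunk for position 1 and in no other. An XOR decode
does not preserve that pattern, which is why the patch limits this
check to Modulo decodes.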

>  {
>  	struct cxl_region_params *p = &cxlr->params;
>  	int gran = p->interleave_granularity;
>  	int ways = p->interleave_ways;
>  	u64 offset;
>  
> -	/* Is the hpa within this region at all */
> -	if (hpa < p->res->start || hpa > p->res->end) {
> -		dev_dbg(&cxlr->dev,
> -			"Addr trans fail: hpa 0x%llx not in region\n", hpa);
> -		return false;
> -	}
> -
>  	/* Is the hpa in an expected chunk for its pos(-ition) */
>  	offset = hpa - p->res->start;
>  	offset = do_div(offset, gran * ways);
> @@ -2752,6 +2745,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
>  u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
>  		   u64 dpa)
>  {
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>  	u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
>  	struct cxl_region_params *p = &cxlr->params;
>  	struct cxl_endpoint_decoder *cxled = NULL;
> @@ -2801,7 +2795,18 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
>  	/* Apply the hpa_offset to the region base address */
>  	hpa = hpa_offset + p->res->start;
>  
> -	if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
> +	/* Root decoder translation overrides typical modulo decode */

Another minor / don't respin quibble: I would not say "overrides" here.
The root decoder translation "supplements" the modulo decode, since
"overrides" to me implies that the dpa_to_hpa translation is completely
replaced.

Otherwise, looks good, you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
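
As an illustration of the restore step in the quoted
cxl_xor_hpa_to_spa(), the following is a minimal userspace sketch. It
assumes a GCC or Clang toolchain, substituting __builtin_popcountll()
and __builtin_ctzll() for the kernel's hweight64() and __ffs(). The
function names and the xormap value used in the usage note are
hypothetical and do not come from any real CXIMS table.

```c
#include <stdint.h>

/* XORALLBITS: 1 if an odd number of bits are set in v, else 0 */
static uint64_t xor_all_bits(uint64_t v)
{
	return (uint64_t)(__builtin_popcountll(v) & 1);
}

/*
 * Sketch only: for each non-zero xormap, recompute the position bit
 * (the lowest set bit of the map) as the XOR of all HPA bits that the
 * map selects, then write that value back into the address.
 */
static uint64_t restore_xor_bits(uint64_t hpa, const uint64_t *xormaps,
				 int nr_maps)
{
	for (int i = 0; i < nr_maps; i++) {
		uint64_t map = xormaps[i];
		uint64_t val;
		int pos;

		if (!map)
			continue;
		pos = __builtin_ctzll(map);	/* lowest set bit in the map */
		val = xor_all_bits(hpa & map);
		hpa = (hpa & ~(1ULL << pos)) | (val << pos);
	}
	return hpa;
}
```

With a single made-up map of 0x2100 (bits 8 and 13), an input of 0x2000
has odd parity under the map, so bit 8 is set and the result is 0x2100.
Feeding 0x2100 back in gives even parity and returns 0x2000, so with a
single map the operation is its own inverse.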


Thread overview: 8+ messages
2024-07-03  5:29 [PATCH v4 0/4] XOR Math Fixups: translation & position alison.schofield
2024-07-03  5:29 ` [PATCH v4 1/4] cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() alison.schofield
2024-07-11 20:52   ` Robert Richter
2024-07-03  5:29 ` [PATCH v4 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
2024-07-11 22:35   ` Dan Williams
2024-07-03  5:29 ` [PATCH v4 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
2024-07-03  5:29 ` [PATCH v4 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
2024-07-11 19:59 ` [PATCH v4 0/4] XOR Math Fixups: translation & position Dan Williams
