* [PATCH v2 0/4] XOR Math Fixups: translation & position
@ 2024-05-08 18:47 alison.schofield
2024-05-08 18:47 ` [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate() alison.schofield
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: alison.schofield @ 2024-05-08 18:47 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
Changes in v2:
- Rebase on cxl/next
- Remove 'misconfigured root decoder' warn in cxl_region_attach_position() (Dan)
- Comment the implementation of CXL Spec's XORALLBITS in cxl_xor_restore (Dan)
- Prepend new Patch 1 to rename cxl_trace_hpa()-> cxl_translate() (Dan)
- Simplify the setting of position bit in cxl_xor_restore() (Dan)
- Collapse the helper restore_xor_pos() into cxl_xor_restore()
- Use host bridge ways, not region ways, in cxl_xor_restore().
- Update 'override' comment in cxl_translate() (Dan)
- Clarify via renames (Dan)
cxl_addr_trans_fn -> cxl_translate_fn
cxl_xor_trans() -> cxl_xor_translate()
Link to v1:
https://lore.kernel.org/cover.1714159486.git.alison.schofield@intel.com/
Begin cover letter:
Rather than repeat the individual patch commit message content,
let me describe the flow of this set:
Patch 1: Rename an existing function: cxl_trace_hpa() -> cxl_translate()
Patch 2: cxl/acpi: Restore XOR'd position bits during address translation
The problem fixed in this patch, bad HPA translations with XOR math,
came to my attention recently. Patch 2 can stand alone, but since that
discovery also shed light on how to repair an issue with calculating
positions in interleave sets (Patches 3 and 4), they are presented together.
Patch 3 & Patch 4 are paired. Patch 3 presents the new method for
verifying a target position in the list and Patch 4 removes the
old method. These could be squashed.
FYI: I don't present the code removal first because I think the diff
is easier to read if I leave in the old root decoder callback setup
for calc_hb, insert the new callback along the same path, and then
rip out the defunct calc_hb. That's the way I created the patchset,
and it may be an easier way for reviewers to follow along with the
root decoder callback setup.
Alison Schofield (4):
cxl/core: Rename cxl_trace_hpa() to cxl_translate()
cxl/acpi: Restore XOR'd position bits during address translation
cxl/region: Verify target positions using the ordered target list
cxl: Remove defunct code calculating host bridge target positions
drivers/cxl/acpi.c | 80 ++++++++++++++++-----------------------
drivers/cxl/core/core.h | 4 +-
drivers/cxl/core/mbox.c | 2 +-
drivers/cxl/core/port.c | 21 ++--------
drivers/cxl/core/region.c | 12 +++++-
drivers/cxl/core/trace.h | 2 +-
drivers/cxl/cxl.h | 10 ++---
7 files changed, 54 insertions(+), 77 deletions(-)
base-commit: d99f13843237cf9dbdc1bd873a901662b4aee16f
--
2.37.3
* [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate()
2024-05-08 18:47 [PATCH v2 0/4] XOR Math Fixups: translation & position alison.schofield
@ 2024-05-08 18:47 ` alison.schofield
2024-05-30 3:45 ` Dan Williams
2024-06-07 14:40 ` Jonathan Cameron
2024-05-08 18:47 ` [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
` (2 subsequent siblings)
3 siblings, 2 replies; 18+ messages in thread
From: alison.schofield @ 2024-05-08 18:47 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
addresses, the work it performs is a translation (dpa->hpa), not a
trace. Rename it. Since this is the only translate work in CXL,
drop the _hpa suffix in the rename.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/core.h | 4 ++--
drivers/cxl/core/mbox.c | 2 +-
drivers/cxl/core/region.c | 2 +-
drivers/cxl/core/trace.h | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 625394486459..8ceebd3b51d2 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -28,12 +28,12 @@ int cxl_region_init(void);
void cxl_region_exit(void);
int cxl_get_poison_by_endpoint(struct cxl_port *port);
struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
-u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
+u64 cxl_translate(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
u64 dpa);
#else
static inline u64
-cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa)
+cxl_translate(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa)
{
return ULLONG_MAX;
}
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 2626f3fff201..edc54a1ca298 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -878,7 +878,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
dpa = le64_to_cpu(evt->common.phys_addr) & CXL_DPA_MASK;
cxlr = cxl_dpa_to_region(cxlmd, dpa);
if (cxlr)
- hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
+ hpa = cxl_translate(cxlr, cxlmd, dpa);
if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 00a9f0eef8dd..245edf748906 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2797,7 +2797,7 @@ static u64 cxl_dpa_to_hpa(u64 dpa, struct cxl_region *cxlr,
return hpa;
}
-u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
+u64 cxl_translate(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
u64 dpa)
{
struct cxl_region_params *p = &cxlr->params;
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 07a0394b1d99..6925a6a31f01 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -704,7 +704,7 @@ TRACE_EVENT(cxl_poison,
if (cxlr) {
__assign_str(region, dev_name(&cxlr->dev));
memcpy(__entry->uuid, &cxlr->params.uuid, 16);
- __entry->hpa = cxl_trace_hpa(cxlr, cxlmd,
+ __entry->hpa = cxl_translate(cxlr, cxlmd,
__entry->dpa);
} else {
__assign_str(region, "");
--
2.37.3
* [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation
2024-05-08 18:47 [PATCH v2 0/4] XOR Math Fixups: translation & position alison.schofield
2024-05-08 18:47 ` [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate() alison.schofield
@ 2024-05-08 18:47 ` alison.schofield
2024-05-30 3:55 ` Dan Williams
2024-06-07 15:01 ` Jonathan Cameron
2024-05-08 18:47 ` [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
2024-05-08 18:47 ` [PATCH v2 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
3 siblings, 2 replies; 18+ messages in thread
From: alison.schofield @ 2024-05-08 18:47 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
When a CXL region is created in a CXL Window (CFMWS) that uses XOR
interleave arithmetic, XOR maps are applied during the HPA->DPA
translation. The XOR function changes the interleave selector
bit (aka position bit) in the HPA, thereby varying which host bridge
services an HPA. The purpose is to minimize hot spots, thereby
improving performance.
When a device reports a DPA in events such as poison, general_media,
and dram, the driver translates that DPA back to an HPA. Presently,
the CXL driver translation only considers the modulo position and
will report the wrong HPA for XOR-configured CFMWSs.
Add a helper function that restores the XOR'd bits during DPA->HPA
address translation. Plumb a root decoder callback to the new helper
when XOR interleave arithmetic is in use. For MODULO arithmetic, leave
the callback NULL, since no extra work is required.
Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
drivers/cxl/core/port.c | 5 +++-
drivers/cxl/core/region.c | 5 ++++
drivers/cxl/cxl.h | 6 ++++-
4 files changed, 59 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 571069863c62..20488e7b09ac 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
return cxlrd->cxlsd.target[n];
}
+static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
+{
+ struct cxl_cxims_data *cximsd = cxlrd->platform_data;
+ int hbiw = cxlrd->cxlsd.nr_targets;
+ u64 val;
+ int pos;
+
+ /* No xormaps for host bridge interleave ways of 1 or 3 */
+ if (hbiw == 1 || hbiw == 3)
+ return hpa;
+
+ /*
+ * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
+ * the position bit to its value before the xormap was applied at
+ * HPA->DPA translation.
+ *
+ * pos is the lowest set bit in an XORMAP
+ * val is the XORALLBITS(HPA & XORMAP)
+ *
+ * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
+ * as an operation that outputs a single bit by XORing all the
+ * bits in the input (hpa & xormap). Implement XORALLBITS using
+ * hweight64(). If the hamming weight is even the XOR of those
+ * bits results in 0, if odd the XOR result is 1.
+ */
+
+ for (int i = 0; i < cximsd->nr_maps; i++) {
+ if (!cximsd->xormaps[i])
+ continue;
+ pos = __ffs(cximsd->xormaps[i]);
+ val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
+ hpa = (hpa & ~(1ULL << pos)) | (val << pos);
+ }
+
+ return hpa;
+}
+
struct cxl_cxims_context {
struct device *dev;
struct cxl_root_decoder *cxlrd;
@@ -362,6 +399,7 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
struct cxl_cxims_context cxims_ctx;
struct device *dev = ctx->dev;
cxl_calc_hb_fn cxl_calc_hb;
+ cxl_translate_fn translate;
struct cxl_decoder *cxld;
unsigned int ways, i, ig;
int rc;
@@ -389,13 +427,17 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
if (rc)
return rc;
- if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
+ if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
cxl_calc_hb = cxl_hb_modulo;
- else
+ translate = NULL;
+
+ } else {
cxl_calc_hb = cxl_hb_xor;
+ translate = cxl_xor_translate;
+ }
struct cxl_root_decoder *cxlrd __free(put_cxlrd) =
- cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
+ cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, translate);
if (IS_ERR(cxlrd))
return PTR_ERR(cxlrd);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 762783bb091a..32346c171892 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1808,6 +1808,7 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
* @port: owning CXL root of this decoder
* @nr_targets: static number of downstream targets
* @calc_hb: which host bridge covers the n'th position by granularity
+ * @translate: decoder specific address translation function
*
* Return: A new cxl decoder to be registered by cxl_decoder_add(). A
* 'CXL root' decoder is one that decodes from a top-level / static platform
@@ -1816,7 +1817,8 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
*/
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb)
+ cxl_calc_hb_fn calc_hb,
+ cxl_translate_fn translate)
{
struct cxl_root_decoder *cxlrd;
struct cxl_switch_decoder *cxlsd;
@@ -1839,6 +1841,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
}
cxlrd->calc_hb = calc_hb;
+ cxlrd->translate = translate;
mutex_init(&cxlrd->range_lock);
cxld = &cxlsd->cxld;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 245edf748906..2fe93c5a8072 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2752,6 +2752,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
static u64 cxl_dpa_to_hpa(u64 dpa, struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled)
{
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
struct cxl_region_params *p = &cxlr->params;
int pos = cxled->pos;
@@ -2791,6 +2792,10 @@ static u64 cxl_dpa_to_hpa(u64 dpa, struct cxl_region *cxlr,
/* Apply the hpa_offset to the region base address */
hpa = hpa_offset + p->res->start;
+ /* Root decoder translation overrides typical modulo decode */
+ if (cxlrd->translate)
+ hpa = cxlrd->translate(cxlrd, hpa);
+
if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
return ULLONG_MAX;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 80f58b96dc1c..e11155002213 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -434,12 +434,14 @@ struct cxl_switch_decoder {
struct cxl_root_decoder;
typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
int pos);
+typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
/**
* struct cxl_root_decoder - Static platform CXL address decoder
* @res: host / parent resource for region allocations
* @region_id: region id for next region provisioning event
* @calc_hb: which host bridge covers the n'th position by granularity
+ * @translate: decoder specific address translation function
* @platform_data: platform specific configuration data
* @range_lock: sync region autodiscovery by address range
* @qos_class: QoS performance class cookie
@@ -449,6 +451,7 @@ struct cxl_root_decoder {
struct resource *res;
atomic_t region_id;
cxl_calc_hb_fn calc_hb;
+ cxl_translate_fn translate;
void *platform_data;
struct mutex range_lock;
int qos_class;
@@ -773,7 +776,8 @@ bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb);
+ cxl_calc_hb_fn calc_hb,
+ cxl_translate_fn translate);
struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets);
--
2.37.3
* [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list
2024-05-08 18:47 [PATCH v2 0/4] XOR Math Fixups: translation & position alison.schofield
2024-05-08 18:47 ` [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate() alison.schofield
2024-05-08 18:47 ` [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
@ 2024-05-08 18:47 ` alison.schofield
2024-06-07 15:04 ` Jonathan Cameron
2024-06-11 21:50 ` Dan Williams
2024-05-08 18:47 ` [PATCH v2 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
3 siblings, 2 replies; 18+ messages in thread
From: alison.schofield @ 2024-05-08 18:47 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
When a root decoder is configured, the interleave target list is read
from the BIOS-populated CFMWS structure. Per CXL spec 3.1 Table 9-22,
the target list is in interleave order. The CXL driver populates its
decoder target list in the same order and stores it in the 'struct
cxl_switch_decoder' field "@target: active ordered target list in
current decoder configuration".
Given the promise of an ordered list, the driver can stop duplicating
the work of BIOS and simply check target positions against the ordered
list during region configuration.
The simplified check against the ordered list is presented here.
A follow-on patch will remove the unused code.
For Modulo arithmetic, this is not a fix, only a simplification.
For XOR arithmetic, this is a fix for host bridge interleave ways of 3, 6, and 12.
Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/region.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 2fe93c5a8072..6aa2c981f1c4 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1559,10 +1559,13 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
const struct cxl_dport *dport, int pos)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
+ struct cxl_decoder *cxld = &cxlsd->cxld;
+ int iw = cxld->interleave_ways;
struct cxl_port *iter;
int rc;
- if (cxlrd->calc_hb(cxlrd, pos) != dport) {
+ if (dport != cxlrd->cxlsd.target[pos % iw]) {
dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
dev_name(&cxlrd->cxlsd.cxld.dev));
--
2.37.3
* [PATCH v2 4/4] cxl: Remove defunct code calculating host bridge target positions
2024-05-08 18:47 [PATCH v2 0/4] XOR Math Fixups: translation & position alison.schofield
` (2 preceding siblings ...)
2024-05-08 18:47 ` [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
@ 2024-05-08 18:47 ` alison.schofield
2024-06-07 15:06 ` Jonathan Cameron
3 siblings, 1 reply; 18+ messages in thread
From: alison.schofield @ 2024-05-08 18:47 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS
target list in interleave target order. This means that the calculations
the CXL driver added to determine positions when XOR math is in use,
along with the entire XOR vs Modulo callback setup, are not needed.
A prior patch added a common method to verify positions.
Remove the now unused code related to the cxl_calc_hb_fn.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/acpi.c | 62 ++---------------------------------------
drivers/cxl/core/port.c | 18 ------------
drivers/cxl/cxl.h | 6 ----
3 files changed, 3 insertions(+), 83 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 20488e7b09ac..25da55337834 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -22,57 +22,6 @@ static const guid_t acpi_cxl_qtg_id_guid =
GUID_INIT(0xF365F9A6, 0xA7DE, 0x4071,
0xA6, 0x6A, 0xB4, 0x0C, 0x0B, 0x4F, 0x8E, 0x52);
-/*
- * Find a targets entry (n) in the host bridge interleave list.
- * CXL Specification 3.0 Table 9-22
- */
-static int cxl_xor_calc_n(u64 hpa, struct cxl_cxims_data *cximsd, int iw,
- int ig)
-{
- int i = 0, n = 0;
- u8 eiw;
-
- /* IW: 2,4,6,8,12,16 begin building 'n' using xormaps */
- if (iw != 3) {
- for (i = 0; i < cximsd->nr_maps; i++)
- n |= (hweight64(hpa & cximsd->xormaps[i]) & 1) << i;
- }
- /* IW: 3,6,12 add a modulo calculation to 'n' */
- if (!is_power_of_2(iw)) {
- if (ways_to_eiw(iw, &eiw))
- return -1;
- hpa &= GENMASK_ULL(51, eiw + ig);
- n |= do_div(hpa, 3) << i;
- }
- return n;
-}
-
-static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
-{
- struct cxl_cxims_data *cximsd = cxlrd->platform_data;
- struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
- struct cxl_decoder *cxld = &cxlsd->cxld;
- int ig = cxld->interleave_granularity;
- int iw = cxld->interleave_ways;
- int n = 0;
- u64 hpa;
-
- if (dev_WARN_ONCE(&cxld->dev,
- cxld->interleave_ways != cxlsd->nr_targets,
- "misconfigured root decoder\n"))
- return NULL;
-
- hpa = cxlrd->res->start + pos * ig;
-
- /* Entry (n) is 0 for no interleave (iw == 1) */
- if (iw != 1)
- n = cxl_xor_calc_n(hpa, cximsd, iw, ig);
-
- if (n < 0)
- return NULL;
-
- return cxlrd->cxlsd.target[n];
-}
static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
{
@@ -398,7 +347,6 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
struct cxl_port *root_port = ctx->root_port;
struct cxl_cxims_context cxims_ctx;
struct device *dev = ctx->dev;
- cxl_calc_hb_fn cxl_calc_hb;
cxl_translate_fn translate;
struct cxl_decoder *cxld;
unsigned int ways, i, ig;
@@ -427,17 +375,13 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
if (rc)
return rc;
- if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
- cxl_calc_hb = cxl_hb_modulo;
+ if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
translate = NULL;
-
- } else {
- cxl_calc_hb = cxl_hb_xor;
+ else
translate = cxl_xor_translate;
- }
struct cxl_root_decoder *cxlrd __free(put_cxlrd) =
- cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, translate);
+ cxl_root_decoder_alloc(root_port, ways, translate);
if (IS_ERR(cxlrd))
return PTR_ERR(cxlrd);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 32346c171892..9c0e4c6387aa 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1733,21 +1733,6 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
return 0;
}
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos)
-{
- struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
- struct cxl_decoder *cxld = &cxlsd->cxld;
- int iw;
-
- iw = cxld->interleave_ways;
- if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
- "misconfigured root decoder\n"))
- return NULL;
-
- return cxlrd->cxlsd.target[pos % iw];
-}
-EXPORT_SYMBOL_NS_GPL(cxl_hb_modulo, CXL);
-
static struct lock_class_key cxl_decoder_key;
/**
@@ -1807,7 +1792,6 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
* cxl_root_decoder_alloc - Allocate a root level decoder
* @port: owning CXL root of this decoder
* @nr_targets: static number of downstream targets
- * @calc_hb: which host bridge covers the n'th position by granularity
* @translate: decoder specific address translation function
*
* Return: A new cxl decoder to be registered by cxl_decoder_add(). A
@@ -1817,7 +1801,6 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
*/
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb,
cxl_translate_fn translate)
{
struct cxl_root_decoder *cxlrd;
@@ -1840,7 +1823,6 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
return ERR_PTR(rc);
}
- cxlrd->calc_hb = calc_hb;
cxlrd->translate = translate;
mutex_init(&cxlrd->range_lock);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index e11155002213..68ac68506670 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -432,15 +432,12 @@ struct cxl_switch_decoder {
};
struct cxl_root_decoder;
-typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
- int pos);
typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
/**
* struct cxl_root_decoder - Static platform CXL address decoder
* @res: host / parent resource for region allocations
* @region_id: region id for next region provisioning event
- * @calc_hb: which host bridge covers the n'th position by granularity
* @translate: decoder specific address translation function
* @platform_data: platform specific configuration data
* @range_lock: sync region autodiscovery by address range
@@ -450,7 +447,6 @@ typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
struct cxl_root_decoder {
struct resource *res;
atomic_t region_id;
- cxl_calc_hb_fn calc_hb;
cxl_translate_fn translate;
void *platform_data;
struct mutex range_lock;
@@ -776,9 +772,7 @@ bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb,
cxl_translate_fn translate);
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets);
int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
--
2.37.3
* Re: [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate()
2024-05-08 18:47 ` [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate() alison.schofield
@ 2024-05-30 3:45 ` Dan Williams
2024-06-07 14:40 ` Jonathan Cameron
1 sibling, 0 replies; 18+ messages in thread
From: Dan Williams @ 2024-05-30 3:45 UTC
To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
> addresses, the work it performs is a translation (dpa->hpa), not a
> trace. Rename it. Since this is the only translate work in CXL,
> drop the _hpa suffix in the rename.
>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
LGTM
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation
2024-05-08 18:47 ` [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
@ 2024-05-30 3:55 ` Dan Williams
2024-05-30 22:29 ` Alison Schofield
2024-06-07 15:01 ` Jonathan Cameron
1 sibling, 1 reply; 18+ messages in thread
From: Dan Williams @ 2024-05-30 3:55 UTC
To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> When a CXL region is created in a CXL Window (CFMWS) that uses XOR
> interleave arithmetic XOR maps are applied during the HPA->DPA
> translation. The XOR function changes the interleave selector
> bit (aka position bit) in the HPA thereby varying which host bridge
> services an HPA. The purpose is to minimize hot spots thereby
> improving performance.
I think it is important to either detail how "interleave hot spots"
arise, or just omit that description since it is ultimately not relevant
to Linux. I.e. the "why XOR" question is irrelevant compared to "what
happens to an end user with a kernel that does not comprehend XOR
interleave maths".
> When a device reports a DPA in events such as poison, general_media,
> and dram, the driver translates that DPA back to an HPA. Presently,
> the CXL driver translation only considers the modulo position and
> will report the wrong HPA for XOR configured CFMWS's.
>
> Add a helper function that restores the XOR'd bits during DPA->HPA
> address translation. Plumb a root decoder callback to the new helper
> when XOR interleave arithmetic is in use. For MODULO arithmetic, just
> let the callback be NULL - as in no extra work required.
Perhaps add a handful of sentences about the testing for this patch to
make sure the maths are now correct compared to what was there before?
>
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
> drivers/cxl/core/port.c | 5 +++-
> drivers/cxl/core/region.c | 5 ++++
> drivers/cxl/cxl.h | 6 ++++-
> 4 files changed, 59 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 571069863c62..20488e7b09ac 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
> return cxlrd->cxlsd.target[n];
> }
>
> +static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
> +{
> + struct cxl_cxims_data *cximsd = cxlrd->platform_data;
> + int hbiw = cxlrd->cxlsd.nr_targets;
> + u64 val;
> + int pos;
> +
> + /* No xormaps for host bridge interleave ways of 1 or 3 */
> + if (hbiw == 1 || hbiw == 3)
> + return hpa;
> +
> + /*
> + * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
> + * the position bit to its value before the xormap was applied at
> + * HPA->DPA translation.
> + *
> + * pos is the lowest set bit in an XORMAP
> + * val is the XORALLBITS(HPA & XORMAP)
> + *
> + * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
> + * as an operation that outputs a single bit by XORing all the
> + * bits in the input (hpa & xormap). Implement XORALLBITS using
> + * hweight64(). If the hamming weight is even the XOR of those
> + * bits results in 0, if odd the XOR result is 1.
> + */
> +
> + for (int i = 0; i < cximsd->nr_maps; i++) {
> + if (!cximsd->xormaps[i])
> + continue;
> + pos = __ffs(cximsd->xormaps[i]);
> + val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> + hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> + }
This looks correct to me, but so did the original broken implementation.
I would feel better about a self test either integrated into
tools/testing/cxl/, or documented / referenced in the CXL Device Driver
Writer's Guide that gives confidence that "This is the way".
* Re: [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation
2024-05-30 3:55 ` Dan Williams
@ 2024-05-30 22:29 ` Alison Schofield
2024-05-31 1:46 ` Dan Williams
0 siblings, 1 reply; 18+ messages in thread
From: Alison Schofield @ 2024-05-30 22:29 UTC
To: Dan Williams
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, linux-cxl
On Wed, May 29, 2024 at 08:55:30PM -0700, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > When a CXL region is created in a CXL Window (CFMWS) that uses XOR
> > interleave arithmetic XOR maps are applied during the HPA->DPA
> > translation. The XOR function changes the interleave selector
> > bit (aka position bit) in the HPA thereby varying which host bridge
> > services an HPA. The purpose is to minimize hot spots thereby
> > improving performance.
>
> I think it is important to either detail how "interleave hot spots"
> arise, or just omit that description since it is ultimately not relevant
> to Linux. I.e. the "why XOR" question is irrelevant compared to "what
> happens to an end user with a kernel that does not comprehend XOR
> interleave maths".
I think I understand. Long ago we decided to add XOR Math support to
the CXL Driver. The point now is to do it correctly, not to re-justify
its existence.
I'll chop 'The purpose...'
>
> > When a device reports a DPA in events such as poison, general_media,
> > and dram, the driver translates that DPA back to an HPA. Presently,
> > the CXL driver translation only considers the modulo position and
> > will report the wrong HPA for XOR configured CFMWS's.
> >
> > Add a helper function that restores the XOR'd bits during DPA->HPA
> > address translation. Plumb a root decoder callback to the new helper
> > when XOR interleave arithmetic is in use. For MODULO arithmetic, just
> > let the callback be NULL - as in no extra work required.
>
> Perhaps add a handful of sentences about the testing for this patch to
> make sure the maths are now correct compared to what was there before?
>
There was nothing there before, but I think I know what you mean.
More on that below...
> >
> > Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
snip
> > +
> > + for (int i = 0; i < cximsd->nr_maps; i++) {
> > + if (!cximsd->xormaps[i])
> > + continue;
> > + pos = __ffs(cximsd->xormaps[i]);
> > + val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> > + hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> > + }
>
> This looks correct to me, but so did the original broken implementation.
> I would feel better about a self test either integrated into
> tools/testing/cxl/, or documented / referenced in the CXL Device Driver
> Writer's Guide that gives confidence that "This is the way".
Let me see about adding a unit test to the cxl suite that exercises the
translation path using data from the sample XOR translation tables that are
currently being added to the CXL Driver Writer's Guide. That would allow the
test to verify against a known set of DPA->HPA offsets for a given XOR region
config.
-- Alison
* Re: [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation
2024-05-30 22:29 ` Alison Schofield
@ 2024-05-31 1:46 ` Dan Williams
0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2024-05-31 1:46 UTC (permalink / raw)
To: Alison Schofield, Dan Williams
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, linux-cxl
Alison Schofield wrote:
> Let me see about adding a unit test to the cxl suite that exercises the
> translation path using data from the sample XOR translation tables that are
> currently being added to the CXL Driver Writer's Guide. That would allow the
> test to verify against a known set of DPA->HPA offsets for a given XOR region
> config.
Yeah, that sounds good. No need to worry about integrating that into
cxl_test. It could just be a sample test program in cxl-cli that shows
the algorithm and then can be referenced as "see, the kernel does the
same thing as this test that passes with the sample data from the Driver
Writer's Guide".
* Re: [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate()
2024-05-08 18:47 ` [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate() alison.schofield
2024-05-30 3:45 ` Dan Williams
@ 2024-06-07 14:40 ` Jonathan Cameron
2024-06-07 17:45 ` Alison Schofield
1 sibling, 1 reply; 18+ messages in thread
From: Jonathan Cameron @ 2024-06-07 14:40 UTC (permalink / raw)
To: alison.schofield
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Wed, 8 May 2024 11:47:50 -0700
alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
> addresses, the work it performs is a translation (dpa->hpa), not a
> trace. Rename it. Since this is the only translate work in CXL,
> drop the _hpa suffix in the rename.
>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Maybe makes some sense to keep the direction in the name?
I can see for things like PPR we may need to go HPA to DPA as
it'll be based on error counts that may be reported in HPA.
So Friday bikeshedding time.
cxl_dpa_to_hpa()?
> ---
> drivers/cxl/core/core.h | 4 ++--
> drivers/cxl/core/mbox.c | 2 +-
> drivers/cxl/core/region.c | 2 +-
> drivers/cxl/core/trace.h | 2 +-
> 4 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 625394486459..8ceebd3b51d2 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -28,12 +28,12 @@ int cxl_region_init(void);
> void cxl_region_exit(void);
> int cxl_get_poison_by_endpoint(struct cxl_port *port);
> struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
> -u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> +u64 cxl_translate(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> u64 dpa);
>
> #else
> static inline u64
> -cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa)
> +cxl_translate(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa)
> {
> return ULLONG_MAX;
> }
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 2626f3fff201..edc54a1ca298 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -878,7 +878,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> dpa = le64_to_cpu(evt->common.phys_addr) & CXL_DPA_MASK;
> cxlr = cxl_dpa_to_region(cxlmd, dpa);
> if (cxlr)
> - hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
> + hpa = cxl_translate(cxlr, cxlmd, dpa);
>
> if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
> trace_cxl_general_media(cxlmd, type, cxlr, hpa,
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 00a9f0eef8dd..245edf748906 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2797,7 +2797,7 @@ static u64 cxl_dpa_to_hpa(u64 dpa, struct cxl_region *cxlr,
> return hpa;
> }
>
> -u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> +u64 cxl_translate(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> u64 dpa)
> {
> struct cxl_region_params *p = &cxlr->params;
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index 07a0394b1d99..6925a6a31f01 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -704,7 +704,7 @@ TRACE_EVENT(cxl_poison,
> if (cxlr) {
> __assign_str(region, dev_name(&cxlr->dev));
> memcpy(__entry->uuid, &cxlr->params.uuid, 16);
> - __entry->hpa = cxl_trace_hpa(cxlr, cxlmd,
> + __entry->hpa = cxl_translate(cxlr, cxlmd,
> __entry->dpa);
> } else {
> __assign_str(region, "");
* Re: [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation
2024-05-08 18:47 ` [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
2024-05-30 3:55 ` Dan Williams
@ 2024-06-07 15:01 ` Jonathan Cameron
2024-06-07 18:20 ` Alison Schofield
1 sibling, 1 reply; 18+ messages in thread
From: Jonathan Cameron @ 2024-06-07 15:01 UTC (permalink / raw)
To: alison.schofield
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Wed, 8 May 2024 11:47:51 -0700
alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> When a CXL region is created in a CXL Window (CFMWS) that uses XOR
> interleave arithmetic XOR maps are applied during the HPA->DPA
> translation. The XOR function changes the interleave selector
> bit (aka position bit) in the HPA thereby varying which host bridge
> services an HPA. The purpose is to minimize hot spots thereby
> improving performance.
>
> When a device reports a DPA in events such as poison, general_media,
> and dram, the driver translates that DPA back to an HPA. Presently,
> the CXL driver translation only considers the modulo position and
> will report the wrong HPA for XOR configured CFMWS's.
>
> Add a helper function that restores the XOR'd bits during DPA->HPA
> address translation. Plumb a root decoder callback to the new helper
> when XOR interleave arithmetic is in use. For MODULO arithmetic, just
> let the callback be NULL - as in no extra work required.
>
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Trivial comment inline. Agree entirely that some tests would be good.
I ran through a few trivial cases on a bit of paper and it looks to
me like it works but that hardly counts as testing :)
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
> drivers/cxl/core/port.c | 5 +++-
> drivers/cxl/core/region.c | 5 ++++
> drivers/cxl/cxl.h | 6 ++++-
> 4 files changed, 59 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 571069863c62..20488e7b09ac 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
> return cxlrd->cxlsd.target[n];
> }
>
> +static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
> +{
> + struct cxl_cxims_data *cximsd = cxlrd->platform_data;
> + int hbiw = cxlrd->cxlsd.nr_targets;
> + u64 val;
> + int pos;
> +
> + /* No xormaps for host bridge interleave ways of 1 or 3 */
> + if (hbiw == 1 || hbiw == 3)
> + return hpa;
> +
> + /*
> + * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
> + * the position bit to its value before the xormap was applied at
> + * HPA->DPA translation.
> + *
> + * pos is the lowest set bit in an XORMAP
> + * val is the XORALLBITS(HPA & XORMAP)
> + *
> + * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
> + * as an operation that outputs a single bit by XORing all the
> + * bits in the input (hpa & xormap). Implement XORALLBITS using
> + * hweight64(). If the hamming weight is even the XOR of those
> + * bits results in 0, if odd the XOR result is 1.
> + */
> +
> + for (int i = 0; i < cximsd->nr_maps; i++) {
> + if (!cximsd->xormaps[i])
> + continue;
> + pos = __ffs(cximsd->xormaps[i]);
At the moment the comment on XORALLBITS isn't associated with this
code very well. I'd factor it out as cxl_xorallbits() mostly so
you can stick the comment next to the bit that does the work.
Or maybe a #define XORALLBITS(hpa, xormap) is good enough if
you move it up under the comment.
> + val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> + hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> + }
> +
> + return hpa;
> +}
> +
* Re: [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list
2024-05-08 18:47 ` [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
@ 2024-06-07 15:04 ` Jonathan Cameron
2024-06-11 21:50 ` Dan Williams
1 sibling, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2024-06-07 15:04 UTC (permalink / raw)
To: alison.schofield
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Wed, 8 May 2024 11:47:52 -0700
alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> When a root decoder is configured the interleave target list is read
> from the BIOS populated CFMWS structure. Per the CXL spec 3.1 Table
> 9-22 the target list is in interleave order. The CXL driver populates
> its decoder target list in the same order and stores it in 'struct
> cxl_switch_decoder' field "@target: active ordered target list in
> current decoder configuration"
>
> Given the promise of an ordered list, the driver can stop duplicating
> the work of BIOS and simply check target positions against the ordered
> list during region configuration.
>
> The simplified check against the ordered list is presented here.
> A follow-on patch will remove the unused code.
>
> For Modulo arithmetic this is not a fix, only a simplification.
> For XOR arithmetic this is a fix for HB IW of 3,6,12.
>
> Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v2 4/4] cxl: Remove defunct code calculating host bridge target positions
2024-05-08 18:47 ` [PATCH v2 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
@ 2024-06-07 15:06 ` Jonathan Cameron
0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2024-06-07 15:06 UTC (permalink / raw)
To: alison.schofield
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Wed, 8 May 2024 11:47:53 -0700
alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS
> target list in interleave target order. This means that the calculations
> the CXL driver added to determine positions when XOR math is in use,
> along with the entire XOR vs Modulo call back setup is not needed.
>
> A prior patch added a common method to verify positions.
>
> Remove the now unused code related to the cxl_calc_hb_fn.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
FWIW given this just removes unused code.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate()
2024-06-07 14:40 ` Jonathan Cameron
@ 2024-06-07 17:45 ` Alison Schofield
2024-06-07 17:55 ` Jonathan Cameron
0 siblings, 1 reply; 18+ messages in thread
From: Alison Schofield @ 2024-06-07 17:45 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Fri, Jun 07, 2024 at 03:40:25PM +0100, Jonathan Cameron wrote:
> On Wed, 8 May 2024 11:47:50 -0700
> alison.schofield@intel.com wrote:
>
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
> > addresses, the work it performs is a translation (dpa->hpa), not a
> > trace. Rename it. Since this is the only translate work in CXL,
> > drop the _hpa suffix in the rename.
> >
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
>
> Maybe makes some sense to keep the direction in the name?
>
> I can see for things like PPR we may need to go HPA to DPA as
> it'll be based on error counts that may be reported in HPA.
>
> So Friday bikeshedding time.
> cxl_dpa_to_hpa()?
Thanks for the review Jonathan!
Let's sync on the possible dpa->hpa->dpa use cases and then I'll
weigh in on your bikeshedding.
Agree we need HPA->DPA support for userspace. User needs ability
to find which device maps an HPA.
wrt PPR, do you think that is an in kernel need?
If the PPR maintenance was done in response to a CXL event, the DPA
is already in the event trace along with the HPA so I wouldn't expect
that a lookup is needed, at least not within the kernel.
Your explicit cxl_dpa_to_hpa() name is good.
--Alison
* Re: [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate()
2024-06-07 17:45 ` Alison Schofield
@ 2024-06-07 17:55 ` Jonathan Cameron
0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2024-06-07 17:55 UTC (permalink / raw)
To: Alison Schofield
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Fri, 7 Jun 2024 10:45:42 -0700
Alison Schofield <alison.schofield@intel.com> wrote:
> On Fri, Jun 07, 2024 at 03:40:25PM +0100, Jonathan Cameron wrote:
> > On Wed, 8 May 2024 11:47:50 -0700
> > alison.schofield@intel.com wrote:
> >
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
> > > addresses, the work it performs is a translation (dpa->hpa), not a
> > > trace. Rename it. Since this is the only translate work in CXL,
> > > drop the _hpa suffix in the rename.
> > >
> > > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> >
> > Maybe makes some sense to keep the direction in the name?
> >
> > I can see for things like PPR we may need to go HPA to DPA as
> > it'll be based on error counts that may be reported in HPA.
> >
> > So Friday bikeshedding time.
> > cxl_dpa_to_hpa()?
>
> Thanks for the review Jonathan!
>
> Let's sync on the possible dpa->hpa->dpa use cases and then I'll
> weigh in on your bikeshedding.
>
> Agree we need HPA->DPA support for userspace. User needs ability
> to find which device maps an HPA.
>
> wrt PPR, do you think that is an in kernel need?
> If the PPR maintenance was done in response to a CXL event, the DPA
> is already in the event trace along with the HPA so I wouldn't expect
> that a lookup is needed, at least not within the kernel.
Might be a synchronous poison event that ends up
triggering things. And for that we'll be lucky to get any useful
info beyond an HPA.
I've only started thinking about this in the last few days, so
not 100% sure yet.
Given that PPR can, I think, interrupt data flow, we need to be very
careful to mediate any attempt, so enabling it at all will
need kernel support. Whether the interface takes a DPA or
an HPA is an open question. We'll see!
Jonathan
* Re: [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation
2024-06-07 15:01 ` Jonathan Cameron
@ 2024-06-07 18:20 ` Alison Schofield
2024-06-10 10:20 ` Jonathan Cameron
0 siblings, 1 reply; 18+ messages in thread
From: Alison Schofield @ 2024-06-07 18:20 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Fri, Jun 07, 2024 at 04:01:14PM +0100, Jonathan Cameron wrote:
> On Wed, 8 May 2024 11:47:51 -0700
> alison.schofield@intel.com wrote:
>
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > When a CXL region is created in a CXL Window (CFMWS) that uses XOR
> > interleave arithmetic XOR maps are applied during the HPA->DPA
> > translation. The XOR function changes the interleave selector
> > bit (aka position bit) in the HPA thereby varying which host bridge
> > services an HPA. The purpose is to minimize hot spots thereby
> > improving performance.
> >
> > When a device reports a DPA in events such as poison, general_media,
> > and dram, the driver translates that DPA back to an HPA. Presently,
> > the CXL driver translation only considers the modulo position and
> > will report the wrong HPA for XOR configured CFMWS's.
> >
> > Add a helper function that restores the XOR'd bits during DPA->HPA
> > address translation. Plumb a root decoder callback to the new helper
> > when XOR interleave arithmetic is in use. For MODULO arithmetic, just
> > let the callback be NULL - as in no extra work required.
> >
> > Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Trivial comment inline. Agree entirely that some tests would be good.
> I ran through a few trivial cases on a bit of paper and it looks to
> me like it works but that hardly counts as testing :)
Thanks for the review and for doing some calcs.
I've become very adept at working these out with paper and pencil; the hop
to a C implementation is the challenge ;)
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Hmm...well, let me know if I can keep that after you read below...
>
> > ---
> > drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
> > drivers/cxl/core/port.c | 5 +++-
> > drivers/cxl/core/region.c | 5 ++++
> > drivers/cxl/cxl.h | 6 ++++-
> > 4 files changed, 59 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 571069863c62..20488e7b09ac 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
> > return cxlrd->cxlsd.target[n];
> > }
> >
> > +static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
> > +{
> > + struct cxl_cxims_data *cximsd = cxlrd->platform_data;
> > + int hbiw = cxlrd->cxlsd.nr_targets;
> > + u64 val;
> > + int pos;
> > +
> > + /* No xormaps for host bridge interleave ways of 1 or 3 */
> > + if (hbiw == 1 || hbiw == 3)
> > + return hpa;
> > +
> > + /*
> > + * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
> > + * the position bit to its value before the xormap was applied at
> > + * HPA->DPA translation.
> > + *
> > + * pos is the lowest set bit in an XORMAP
> > + * val is the XORALLBITS(HPA & XORMAP)
> > + *
> > + * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
> > + * as an operation that outputs a single bit by XORing all the
> > + * bits in the input (hpa & xormap). Implement XORALLBITS using
> > + * hweight64(). If the hamming weight is even the XOR of those
> > + * bits results in 0, if odd the XOR result is 1.
> > + */
> > +
> > + for (int i = 0; i < cximsd->nr_maps; i++) {
> > + if (!cximsd->xormaps[i])
> > + continue;
> > + pos = __ffs(cximsd->xormaps[i]);
>
> At the moment the comment on XORALLBITS isn't associated with this
> code very well. I'd factor it out as cxl_xorallbits() mostly so
> you can stick the comment next to the bit that does the work.
> Or maybe a #define XORALLBITS(hpa, xormap) is good enough if
> you move it up under the comment.
>
> > + val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> > + hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> > + }
> > +
> > + return hpa;
> > +}
> > +
You haven't convinced me that readers will not be able to associate
the block comment directly above the for-loop with the work inside
the for-loop. Especially since this is a 25 line function with a
single focus.
I intentionally didn't insert line-by-line commentary in the
for loop, but rather told the story in the comment and then
just did it.
Maybe repeating 'val' here, wraps up the comment better:
- * bits results in 0, if odd the XOR result is 1.
+ * bits results in val==0, if odd the XOR results in val==1.
-- Alison
* Re: [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation
2024-06-07 18:20 ` Alison Schofield
@ 2024-06-10 10:20 ` Jonathan Cameron
0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Cameron @ 2024-06-10 10:20 UTC (permalink / raw)
To: Alison Schofield
Cc: Davidlohr Bueso, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Fri, 7 Jun 2024 11:20:25 -0700
Alison Schofield <alison.schofield@intel.com> wrote:
> On Fri, Jun 07, 2024 at 04:01:14PM +0100, Jonathan Cameron wrote:
> > On Wed, 8 May 2024 11:47:51 -0700
> > alison.schofield@intel.com wrote:
> >
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > When a CXL region is created in a CXL Window (CFMWS) that uses XOR
> > > interleave arithmetic XOR maps are applied during the HPA->DPA
> > > translation. The XOR function changes the interleave selector
> > > bit (aka position bit) in the HPA thereby varying which host bridge
> > > services an HPA. The purpose is to minimize hot spots thereby
> > > improving performance.
> > >
> > > When a device reports a DPA in events such as poison, general_media,
> > > and dram, the driver translates that DPA back to an HPA. Presently,
> > > the CXL driver translation only considers the modulo position and
> > > will report the wrong HPA for XOR configured CFMWS's.
> > >
> > > Add a helper function that restores the XOR'd bits during DPA->HPA
> > > address translation. Plumb a root decoder callback to the new helper
> > > when XOR interleave arithmetic is in use. For MODULO arithmetic, just
> > > let the callback be NULL - as in no extra work required.
> > >
> > > Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Trivial comment inline. Agree entirely that some tests would be good.
> > I ran through a few trivial cases on a bit of paper and it looks to
> > me like it works but that hardly counts as testing :)
>
> Thanks for the review and for doing some calcs.
>
> I've become very adept at working these out with paper/pencil, that hop
> to C implementation is the challenge ;)
>
> >
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Hmm...well, let me know if I can keep that after you read below...
Meh. I don't care that much. It just made me think a tiny bit more
than I like to ;) Fine with or without the change you suggest.
Jonathan
>
> >
> > > ---
> > > drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
> > > drivers/cxl/core/port.c | 5 +++-
> > > drivers/cxl/core/region.c | 5 ++++
> > > drivers/cxl/cxl.h | 6 ++++-
> > > 4 files changed, 59 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > > index 571069863c62..20488e7b09ac 100644
> > > --- a/drivers/cxl/acpi.c
> > > +++ b/drivers/cxl/acpi.c
> > > @@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
> > > return cxlrd->cxlsd.target[n];
> > > }
> > >
> > > +static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
> > > +{
> > > + struct cxl_cxims_data *cximsd = cxlrd->platform_data;
> > > + int hbiw = cxlrd->cxlsd.nr_targets;
> > > + u64 val;
> > > + int pos;
> > > +
> > > + /* No xormaps for host bridge interleave ways of 1 or 3 */
> > > + if (hbiw == 1 || hbiw == 3)
> > > + return hpa;
> > > +
> > > + /*
> > > + * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
> > > + * the position bit to its value before the xormap was applied at
> > > + * HPA->DPA translation.
> > > + *
> > > + * pos is the lowest set bit in an XORMAP
> > > + * val is the XORALLBITS(HPA & XORMAP)
> > > + *
> > > + * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
> > > + * as an operation that outputs a single bit by XORing all the
> > > + * bits in the input (hpa & xormap). Implement XORALLBITS using
> > > + * hweight64(). If the hamming weight is even the XOR of those
> > > + * bits results in 0, if odd the XOR result is 1.
> > > + */
> > > +
> > > + for (int i = 0; i < cximsd->nr_maps; i++) {
> > > + if (!cximsd->xormaps[i])
> > > + continue;
> > > + pos = __ffs(cximsd->xormaps[i]);
> >
> > At the moment the comment on XORALLBITS isn't associated with this
> > code very well. I'd factor it out as cxl_xorallbits() mostly so
> > you can stick the comment next to the bit that does the work.
> > Or maybe a #define XORALLBITS(hpa, xormap) is good enough if
> > you move it up under the comment.
> >
> > > + val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> > > + hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> > > + }
> > > +
> > > + return hpa;
> > > +}
> > > +
>
> You haven't convinced me that readers will not be able to associate
> the block comment directly above the for-loop with the work inside
> the for-loop, especially since this is a 25-line function with a
> single focus.
>
> I intentionally didn't insert line-by-line commentary in the
> for loop, but rather told the story in the comment and then
> just did it.
>
> Maybe repeating 'val' here wraps up the comment better:
>
> - * bits results in 0, if odd the XOR result is 1.
> + * bits results in val==0, if odd the XOR results in val==1.
>
> -- Alison
>
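The XORALLBITS step debated above can be exercised outside the kernel. Below is a minimal user-space sketch of the helper Jonathan suggests factoring out, not the patch itself: the names xorallbits() and xor_restore() are hypothetical, the xormaps are a plain array rather than struct cxl_cxims_data, and the GCC/Clang builtins __builtin_popcountll() and __builtin_ctzll() stand in for the kernel's hweight64() and __ffs().

```c
#include <assert.h>
#include <stdint.h>

/*
 * XORALLBITS (CXL 3.1 Table 9-22): XOR together every bit of (hpa & xormap).
 * That single-bit result is the parity of the masked value, so an odd
 * population count yields 1 and an even one yields 0. The GCC/Clang builtin
 * __builtin_popcountll() stands in for the kernel's hweight64() here.
 */
static unsigned int xorallbits(uint64_t hpa, uint64_t xormap)
{
	return __builtin_popcountll(hpa & xormap) & 1;
}

/*
 * Hypothetical user-space mirror of the quoted loop: for each non-zero
 * xormap, overwrite the bit at the xormap's lowest set position with
 * XORALLBITS(hpa & xormap). __builtin_ctzll() plays the role of __ffs().
 */
static uint64_t xor_restore(uint64_t hpa, int hbiw, const uint64_t *xormaps,
			    int nr_maps)
{
	/* As in the quoted code: no xormaps for 1 or 3 host bridge ways */
	if (hbiw == 1 || hbiw == 3)
		return hpa;

	for (int i = 0; i < nr_maps; i++) {
		if (!xormaps[i])
			continue; /* an empty map selects no bits */
		int pos = __builtin_ctzll(xormaps[i]);
		uint64_t val = xorallbits(hpa, xormaps[i]);

		hpa = (hpa & ~(1ULL << pos)) | (val << pos);
	}
	return hpa;
}
```

With a single illustrative xormap of 0x2100 (bits 8 and 13) and 2 host bridge ways, an HPA of 0x2000 comes back as 0x2100: one selected bit is set (odd weight, val == 1), so bit 8 is set. Feeding 0x2100 back in with the same map gives an even weight (val == 0) and returns 0x2000.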
* Re: [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list
2024-05-08 18:47 ` [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
2024-06-07 15:04 ` Jonathan Cameron
@ 2024-06-11 21:50 ` Dan Williams
From: Dan Williams @ 2024-06-11 21:50 UTC
To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> When a root decoder is configured, the interleave target list is read
> from the BIOS-populated CFMWS structure. Per CXL spec 3.1 Table 9-22,
> the target list is in interleave order. The CXL driver populates its
> decoder target list in the same order and stores it in the 'struct
> cxl_switch_decoder' field "@target: active ordered target list in
> current decoder configuration".
>
> Given the promise of an ordered list, the driver can stop duplicating
> the work of the BIOS and simply check target positions against the ordered
> list during region configuration.
>
> The simplified check against the ordered list is presented here.
> A follow-on patch will remove the unused code.
>
> For modulo arithmetic this is not a fix, only a simplification.
> For XOR arithmetic this fixes host bridge interleave ways (HB IW) of 3, 6, and 12.
>
> Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
LGTM
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
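Once the target list is trusted to be in interleave order, the check described in the commit message reduces to a table lookup. A minimal sketch, not the code from the patch: it assumes a region position maps to host bridge slot pos % hbiw, and uses hypothetical names (struct ordered_targets, target_position_ok()) with plain integer IDs in place of struct cxl_dport pointers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a root decoder's target list: dport_ids[i] names
 * the host bridge at interleave slot i, already in interleave order as
 * populated from the CFMWS.
 */
struct ordered_targets {
	int nr_targets;       /* host bridge interleave ways (hbiw) */
	const int *dport_ids; /* ordered target list */
};

/*
 * Return true if an endpoint at region position @pos sits below the host
 * bridge the ordered list expects for that position. Assumes positions map
 * to host bridge slots as pos % hbiw; no XOR or modulo decode is redone.
 */
static bool target_position_ok(const struct ordered_targets *t, int pos,
			       int dport_id)
{
	if (t->nr_targets <= 0 || pos < 0)
		return false;
	return t->dport_ids[pos % t->nr_targets] == dport_id;
}
```

With an ordered list {10, 20, 30} (hbiw 3), position 4 is expected under the host bridge at slot 1 (ID 20). Because the lookup never recomputes the interleave math, the same check serves modulo and XOR decoders alike, in line with the commit message's point that the driver need not duplicate the BIOS's work.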
Thread overview: 18+ messages
2024-05-08 18:47 [PATCH v2 0/4] XOR Math Fixups: translation & position alison.schofield
2024-05-08 18:47 ` [PATCH v2 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_translate() alison.schofield
2024-05-30 3:45 ` Dan Williams
2024-06-07 14:40 ` Jonathan Cameron
2024-06-07 17:45 ` Alison Schofield
2024-06-07 17:55 ` Jonathan Cameron
2024-05-08 18:47 ` [PATCH v2 2/4] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
2024-05-30 3:55 ` Dan Williams
2024-05-30 22:29 ` Alison Schofield
2024-05-31 1:46 ` Dan Williams
2024-06-07 15:01 ` Jonathan Cameron
2024-06-07 18:20 ` Alison Schofield
2024-06-10 10:20 ` Jonathan Cameron
2024-05-08 18:47 ` [PATCH v2 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
2024-06-07 15:04 ` Jonathan Cameron
2024-06-11 21:50 ` Dan Williams
2024-05-08 18:47 ` [PATCH v2 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
2024-06-07 15:06 ` Jonathan Cameron