* [PATCH v3 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_dpa_to_hpa()
2024-06-25 0:55 [PATCH v3 0/4] XOR Math Fixups: translation & position alison.schofield
@ 2024-06-25 0:55 ` alison.schofield
2024-06-27 1:45 ` Dan Williams
2024-06-25 0:55 ` [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: alison.schofield @ 2024-06-25 0:55 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
addresses, the work it performs is a DPA to HPA translation, not a
trace. Rename it.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/core.h | 8 ++++----
drivers/cxl/core/mbox.c | 2 +-
drivers/cxl/core/region.c | 33 +++++++++++++--------------------
drivers/cxl/core/trace.h | 4 ++--
4 files changed, 20 insertions(+), 27 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 625394486459..72a506c9dbd0 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -28,12 +28,12 @@ int cxl_region_init(void);
void cxl_region_exit(void);
int cxl_get_poison_by_endpoint(struct cxl_port *port);
struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
-u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
- u64 dpa);
+u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
+ u64 dpa);
#else
-static inline u64
-cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa)
+static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
+ const struct cxl_memdev *cxlmd, u64 dpa)
{
return ULLONG_MAX;
}
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 2626f3fff201..eb0b08e5136f 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -878,7 +878,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
dpa = le64_to_cpu(evt->common.phys_addr) & CXL_DPA_MASK;
cxlr = cxl_dpa_to_region(cxlmd, dpa);
if (cxlr)
- hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
+ hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3c2b6144be23..237c28d5f2cc 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2749,15 +2749,25 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
return false;
}
-static u64 cxl_dpa_to_hpa(u64 dpa, struct cxl_region *cxlr,
- struct cxl_endpoint_decoder *cxled)
+u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
+ u64 dpa)
{
u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
struct cxl_region_params *p = &cxlr->params;
- int pos = cxled->pos;
+ struct cxl_endpoint_decoder *cxled = NULL;
u16 eig = 0;
u8 eiw = 0;
+ int pos;
+ for (int i = 0; i < p->nr_targets; i++) {
+ cxled = p->targets[i];
+ if (cxlmd == cxled_to_memdev(cxled))
+ break;
+ }
+ if (!cxled || cxlmd != cxled_to_memdev(cxled))
+ return ULLONG_MAX;
+
+ pos = cxled->pos;
ways_to_eiw(p->interleave_ways, &eiw);
granularity_to_eig(p->interleave_granularity, &eig);
@@ -2797,23 +2807,6 @@ static u64 cxl_dpa_to_hpa(u64 dpa, struct cxl_region *cxlr,
return hpa;
}
-u64 cxl_trace_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
- u64 dpa)
-{
- struct cxl_region_params *p = &cxlr->params;
- struct cxl_endpoint_decoder *cxled = NULL;
-
- for (int i = 0; i < p->nr_targets; i++) {
- cxled = p->targets[i];
- if (cxlmd == cxled_to_memdev(cxled))
- break;
- }
- if (!cxled || cxlmd != cxled_to_memdev(cxled))
- return ULLONG_MAX;
-
- return cxl_dpa_to_hpa(dpa, cxlr, cxled);
-}
-
static struct lock_class_key cxl_pmem_region_key;
static int cxl_pmem_region_alloc(struct cxl_region *cxlr)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index ee5cd4eb2f16..21b76e9c5c60 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -704,8 +704,8 @@ TRACE_EVENT(cxl_poison,
if (cxlr) {
__assign_str(region);
memcpy(__entry->uuid, &cxlr->params.uuid, 16);
- __entry->hpa = cxl_trace_hpa(cxlr, cxlmd,
- __entry->dpa);
+ __entry->hpa = cxl_dpa_to_hpa(cxlr, cxlmd,
+ __entry->dpa);
} else {
__assign_str(region);
memset(__entry->uuid, 0, 16);
--
2.37.3
* Re: [PATCH v3 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_dpa_to_hpa()
2024-06-25 0:55 ` [PATCH v3 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_dpa_to_hpa() alison.schofield
@ 2024-06-27 1:45 ` Dan Williams
0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2024-06-27 1:45 UTC
To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
> addresses, the work it performs is a DPA to HPA translation, not a
> trace. Rename it.
>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Looks good,
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation
2024-06-25 0:55 [PATCH v3 0/4] XOR Math Fixups: translation & position alison.schofield
2024-06-25 0:55 ` [PATCH v3 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_dpa_to_hpa() alison.schofield
@ 2024-06-25 0:55 ` alison.schofield
2024-06-27 2:04 ` Dan Williams
2024-07-01 9:28 ` Fabio M. De Francesco
2024-06-25 0:55 ` [PATCH v3 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
` (2 subsequent siblings)
4 siblings, 2 replies; 11+ messages in thread
From: alison.schofield @ 2024-06-25 0:55 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl, Diego Garcia Rodriguez
From: Alison Schofield <alison.schofield@intel.com>
When a device reports a DPA in events like poison, general_media,
and dram, the driver translates that DPA back to an HPA. Presently,
the CXL driver translation only considers the Modulo position and
will report the wrong HPA for XOR configured root decoders.
Add a helper function that restores the XOR'd bits during DPA->HPA
address translation. Plumb a root decoder callback to the new helper
when XOR interleave arithmetic is in use. For Modulo arithmetic, just
let the callback be NULL - as in no extra work required.
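As a rough illustration of the restore step described above (not the driver code itself), the user-space sketch below recomputes one position bit per xormap as the parity of the masked address. The names xor_all_bits(), lowest_set_bit() and restore_xor_bits() are hypothetical, and the 0x4100 xormap mentioned afterwards is made up for the example; real xormaps come from the platform's CXIMS data.

```c
#include <assert.h>
#include <stdint.h>

/* XORALLBITS: 1 if an odd number of bits are set in x, else 0. */
static uint64_t xor_all_bits(uint64_t x)
{
	uint64_t parity = 0;

	while (x) {
		parity ^= 1;
		x &= x - 1;	/* clear the lowest set bit */
	}
	return parity;
}

/* Index of the lowest set bit; the caller must pass a non-zero value. */
static int lowest_set_bit(uint64_t x)
{
	int pos = 0;

	assert(x != 0);
	while (!(x & 1)) {
		x >>= 1;
		pos++;
	}
	return pos;
}

/*
 * For each non-zero xormap, overwrite the bit at the map's lowest set
 * position with the parity of (hpa & xormap).
 */
static uint64_t restore_xor_bits(uint64_t hpa, const uint64_t *xormaps,
				 int nr_maps)
{
	for (int i = 0; i < nr_maps; i++) {
		if (!xormaps[i])
			continue;
		int pos = lowest_set_bit(xormaps[i]);
		uint64_t val = xor_all_bits(hpa & xormaps[i]);

		hpa = (hpa & ~(1ULL << pos)) | (val << pos);
	}
	return hpa;
}
```

With the single illustrative map 0x4100 (bits 8 and 14), an input of 0x4000 becomes 0x4100, and running the same step on 0x4100 gives back 0x4000, which is why one routine can serve both directions for a single map.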
Upon completion of a DPA->HPA translation a couple of checks are
performed on the result. One simply confirms that the calculated
HPA is within the address range of the region. That test is useful
for both Modulo and XOR interleave arithmetic decodes.
A second check confirms that the HPA is within an expected chunk
based on the endpoint's position in the region and the region
granularity. An XOR decode disrupts the Modulo pattern making the
chunk check useless.
To align the checks with the proper decode, pull the region range
check inline and use the helper to do the chunk check for Modulo
decodes only.
A cxl-test unit test of address translations is in upstream review.
Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
---
drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
drivers/cxl/core/port.c | 5 +++-
drivers/cxl/core/region.c | 22 ++++++++++--------
drivers/cxl/cxl.h | 6 ++++-
4 files changed, 67 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 571069863c62..010741da0176 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
return cxlrd->cxlsd.target[n];
}
+static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
+{
+ struct cxl_cxims_data *cximsd = cxlrd->platform_data;
+ int hbiw = cxlrd->cxlsd.nr_targets;
+ u64 val;
+ int pos;
+
+ /* No xormaps for host bridge interleave ways of 1 or 3 */
+ if (hbiw == 1 || hbiw == 3)
+ return hpa;
+
+ /*
+ * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
+ * the position bit to its value before the xormap was applied at
+ * HPA->DPA translation.
+ *
+ * pos is the lowest set bit in an XORMAP
+ * val is the XORALLBITS(HPA & XORMAP)
+ *
+ * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
+ * as an operation that outputs a single bit by XORing all the
+ * bits in the input (hpa & xormap). Implement XORALLBITS using
+ * hweight64(). If the hamming weight is even the XOR of those
+ * bits results in val==0, if odd the XOR result is val==1.
+ */
+
+ for (int i = 0; i < cximsd->nr_maps; i++) {
+ if (!cximsd->xormaps[i])
+ continue;
+ pos = __ffs(cximsd->xormaps[i]);
+ val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
+ hpa = (hpa & ~(1ULL << pos)) | (val << pos);
+ }
+
+ return hpa;
+}
+
struct cxl_cxims_context {
struct device *dev;
struct cxl_root_decoder *cxlrd;
@@ -362,6 +399,7 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
struct cxl_cxims_context cxims_ctx;
struct device *dev = ctx->dev;
cxl_calc_hb_fn cxl_calc_hb;
+ cxl_translate_fn translate;
struct cxl_decoder *cxld;
unsigned int ways, i, ig;
int rc;
@@ -389,13 +427,17 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
if (rc)
return rc;
- if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
+ if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
cxl_calc_hb = cxl_hb_modulo;
- else
+ translate = NULL;
+
+ } else {
cxl_calc_hb = cxl_hb_xor;
+ translate = cxl_xor_translate;
+ }
struct cxl_root_decoder *cxlrd __free(put_cxlrd) =
- cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
+ cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, translate);
if (IS_ERR(cxlrd))
return PTR_ERR(cxlrd);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 887ed6e358fb..e5d5f7783857 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1808,6 +1808,7 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
* @port: owning CXL root of this decoder
* @nr_targets: static number of downstream targets
* @calc_hb: which host bridge covers the n'th position by granularity
+ * @translate: decoder specific address translation function
*
* Return: A new cxl decoder to be registered by cxl_decoder_add(). A
* 'CXL root' decoder is one that decodes from a top-level / static platform
@@ -1816,7 +1817,8 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
*/
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb)
+ cxl_calc_hb_fn calc_hb,
+ cxl_translate_fn translate)
{
struct cxl_root_decoder *cxlrd;
struct cxl_switch_decoder *cxlsd;
@@ -1839,6 +1841,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
}
cxlrd->calc_hb = calc_hb;
+ cxlrd->translate = translate;
mutex_init(&cxlrd->range_lock);
cxld = &cxlsd->cxld;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 237c28d5f2cc..bdb06dbe98a8 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2723,20 +2723,13 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
return ctx.cxlr;
}
-static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
+static bool cxl_is_hpa_in_chunk(u64 hpa, struct cxl_region *cxlr, int pos)
{
struct cxl_region_params *p = &cxlr->params;
int gran = p->interleave_granularity;
int ways = p->interleave_ways;
u64 offset;
- /* Is the hpa within this region at all */
- if (hpa < p->res->start || hpa > p->res->end) {
- dev_dbg(&cxlr->dev,
- "Addr trans fail: hpa 0x%llx not in region\n", hpa);
- return false;
- }
-
/* Is the hpa in an expected chunk for its pos(-ition) */
offset = hpa - p->res->start;
offset = do_div(offset, gran * ways);
@@ -2752,6 +2745,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
u64 dpa)
{
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
struct cxl_region_params *p = &cxlr->params;
struct cxl_endpoint_decoder *cxled = NULL;
@@ -2801,7 +2795,17 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
/* Apply the hpa_offset to the region base address */
hpa = hpa_offset + p->res->start;
- if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
+ /* Root decoder translation overrides typical modulo decode */
+ if (cxlrd->translate)
+ hpa = cxlrd->translate(cxlrd, hpa);
+
+ if (hpa < p->res->start || hpa > p->res->end) {
+ dev_dbg(&cxlr->dev,
+ "Addr trans fail: hpa 0x%llx not in region\n", hpa);
+ return ULLONG_MAX;
+ }
+
+ if (!cxlrd->translate && (!cxl_is_hpa_in_chunk(hpa, cxlr, pos)))
return ULLONG_MAX;
return hpa;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 603c0120cff8..3678235fc9ce 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -434,12 +434,14 @@ struct cxl_switch_decoder {
struct cxl_root_decoder;
typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
int pos);
+typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
/**
* struct cxl_root_decoder - Static platform CXL address decoder
* @res: host / parent resource for region allocations
* @region_id: region id for next region provisioning event
* @calc_hb: which host bridge covers the n'th position by granularity
+ * @translate: decoder specific address translation function
* @platform_data: platform specific configuration data
* @range_lock: sync region autodiscovery by address range
* @qos_class: QoS performance class cookie
@@ -449,6 +451,7 @@ struct cxl_root_decoder {
struct resource *res;
atomic_t region_id;
cxl_calc_hb_fn calc_hb;
+ cxl_translate_fn translate;
void *platform_data;
struct mutex range_lock;
int qos_class;
@@ -773,7 +776,8 @@ bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb);
+ cxl_calc_hb_fn calc_hb,
+ cxl_translate_fn translate);
struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets);
--
2.37.3
* Re: [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation
2024-06-25 0:55 ` [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
@ 2024-06-27 2:04 ` Dan Williams
2024-07-01 9:28 ` Fabio M. De Francesco
1 sibling, 0 replies; 11+ messages in thread
From: Dan Williams @ 2024-06-27 2:04 UTC
To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl, Diego Garcia Rodriguez
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> When a device reports a DPA in events like poison, general_media,
> and dram, the driver translates that DPA back to an HPA. Presently,
> the CXL driver translation only considers the Modulo position and
> will report the wrong HPA for XOR configured root decoders.
>
> Add a helper function that restores the XOR'd bits during DPA->HPA
> address translation. Plumb a root decoder callback to the new helper
> when XOR interleave arithmetic is in use. For Modulo arithmetic, just
> let the callback be NULL - as in no extra work required.
>
> Upon completion of a DPA->HPA translation a couple of checks are
> performed on the result. One simply confirms that the calculated
> HPA is within the address range of the region. That test is useful
> for both Modulo and XOR interleave arithmetic decodes.
>
> A second check confirms that the HPA is within an expected chunk
> based on the endpoint's position in the region and the region
> granularity. An XOR decode disrupts the Modulo pattern making the
> chunk check useless.
>
> To align the checks with the proper decode, pull the region range
> check inline and use the helper to do the chunk check for Modulo
> decodes only.
>
> A cxl-test unit test of address translations is in upstream review.
>
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
> ---
> drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
> drivers/cxl/core/port.c | 5 +++-
> drivers/cxl/core/region.c | 22 ++++++++++--------
> drivers/cxl/cxl.h | 6 ++++-
> 4 files changed, 67 insertions(+), 14 deletions(-)
>
[..]
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 603c0120cff8..3678235fc9ce 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -434,12 +434,14 @@ struct cxl_switch_decoder {
> struct cxl_root_decoder;
> typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
> int pos);
> +typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
>
> /**
> * struct cxl_root_decoder - Static platform CXL address decoder
> * @res: host / parent resource for region allocations
> * @region_id: region id for next region provisioning event
> * @calc_hb: which host bridge covers the n'th position by granularity
> + * @translate: decoder specific address translation function
> * @platform_data: platform specific configuration data
> * @range_lock: sync region autodiscovery by address range
> * @qos_class: QoS performance class cookie
> @@ -449,6 +451,7 @@ struct cxl_root_decoder {
> struct resource *res;
> atomic_t region_id;
> cxl_calc_hb_fn calc_hb;
> + cxl_translate_fn translate;
So the cxl_trace_hpa() => cxl_dpa_to_hpa() rename was good, but now this
name sticks out as not right because this routine is not doing dpa to
hpa translation. It is doing extended translation after the modulo
translation completes. It builds on the assumption that all address
decode below host-bridges is only modulo math, but that once the HPA
reaches the host-bridge it goes through a second stage translation.
In other parts of the driver this host-bridge level address has been
referred to as an SPA. Most times HPAs and SPAs are identical, but with
XOR math, or with AMD platforms like this [1], there is CXL HPA to
platform SPA translation.
All that said, let's call this method hpa_to_spa() and document it as
@hpa_to_spa: translate CXL host-physical-address to Platform system-physical-address
...then the code reads better:
if (!cxlrd->hpa_to_spa)
...then it is clear that this root decoder is not doing anything outside
of standard CXL address translation that all switch and device-endpoint
decoders support.
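To make the suggested shape concrete, here is a minimal stand-alone sketch, assuming a simplified struct root_decoder in place of the real struct cxl_root_decoder. The helper resolve_spa() and the toy callback toy_flip_bit8() are hypothetical and exist only to exercise the NULL-versus-callback distinction.

```c
#include <stddef.h>
#include <stdint.h>

struct root_decoder;

/* Second-stage translation: CXL HPA in, platform SPA out. */
typedef uint64_t (*hpa_to_spa_fn)(struct root_decoder *rd, uint64_t hpa);

/* Simplified stand-in for the root decoder (assumption, not the kernel type). */
struct root_decoder {
	hpa_to_spa_fn hpa_to_spa;	/* NULL when HPA and SPA are identical */
};

/* Apply the platform translation only when the root decoder provides one. */
static uint64_t resolve_spa(struct root_decoder *rd, uint64_t hpa)
{
	if (!rd->hpa_to_spa)
		return hpa;
	return rd->hpa_to_spa(rd, hpa);
}

/* Toy translation that flips bit 8, used only to exercise the callback. */
static uint64_t toy_flip_bit8(struct root_decoder *rd, uint64_t hpa)
{
	(void)rd;
	return hpa ^ 0x100;
}
```

A decoder with a NULL hpa_to_spa passes the address through untouched, which matches the reading that such a root decoder does nothing beyond standard CXL decode.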
With that rename you can add:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
[1]: https://lore.kernel.org/all/20240216160113.407141-1-rrichter@amd.com/
* Re: [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation
2024-06-25 0:55 ` [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
2024-06-27 2:04 ` Dan Williams
@ 2024-07-01 9:28 ` Fabio M. De Francesco
2024-07-01 9:42 ` Fabio M. De Francesco
2024-07-01 22:48 ` Alison Schofield
1 sibling, 2 replies; 11+ messages in thread
From: Fabio M. De Francesco @ 2024-07-01 9:28 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams, alison.schofield
Cc: linux-cxl, Diego Garcia Rodriguez
On Tuesday, June 25, 2024 2:55:53 AM GMT+2 alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
Hi Alison,
Below I have two questions...
> When a device reports a DPA in events like poison, general_media,
> and dram, the driver translates that DPA back to an HPA. Presently,
> the CXL driver translation only considers the Modulo position and
> will report the wrong HPA for XOR configured root decoders.
>
> Add a helper function that restores the XOR'd bits during DPA->HPA
> address translation. Plumb a root decoder callback to the new helper
> when XOR interleave arithmetic is in use. For Modulo arithmetic, just
> let the callback be NULL - as in no extra work required.
>
> Upon completion of a DPA->HPA translation a couple of checks are
> performed on the result. One simply confirms that the calculated
> HPA is within the address range of the region. That test is useful
> for both Modulo and XOR interleave arithmetic decodes.
>
> A second check confirms that the HPA is within an expected chunk
> based on the endpoint's position in the region and the region
> granularity. An XOR decode disrupts the Modulo pattern making the
> chunk check useless.
>
> To align the checks with the proper decode, pull the region range
> check inline and use the helper to do the chunk check for Modulo
> decodes only.
>
> A cxl-test unit test of address translations is in upstream review.
Would it be helpful to provide a link to the test?
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
> ---
> drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
> drivers/cxl/core/port.c | 5 +++-
> drivers/cxl/core/region.c | 22 ++++++++++--------
> drivers/cxl/cxl.h | 6 ++++-
> 4 files changed, 67 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 571069863c62..010741da0176 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
> return cxlrd->cxlsd.target[n];
> }
>
> +static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
> +{
> + struct cxl_cxims_data *cximsd = cxlrd->platform_data;
> + int hbiw = cxlrd->cxlsd.nr_targets;
> + u64 val;
> + int pos;
> +
> + /* No xormaps for host bridge interleave ways of 1 or 3 */
> + if (hbiw == 1 || hbiw == 3)
> + return hpa;
> +
> + /*
> + * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
> + * the position bit to its value before the xormap was applied at
> + * HPA->DPA translation.
> + *
> + * pos is the lowest set bit in an XORMAP
> + * val is the XORALLBITS(HPA & XORMAP)
> + *
> + * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
> + * as an operation that outputs a single bit by XORing all the
> + * bits in the input (hpa & xormap). Implement XORALLBITS using
> + * hweight64(). If the hamming weight is even the XOR of those
> + * bits results in val==0, if odd the XOR result is val==1.
> + */
> +
> + for (int i = 0; i < cximsd->nr_maps; i++) {
> + if (!cximsd->xormaps[i])
> + continue;
> + pos = __ffs(cximsd->xormaps[i]);
> + val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> + hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> + }
> +
> + return hpa;
> +}
> +
> struct cxl_cxims_context {
> struct device *dev;
> struct cxl_root_decoder *cxlrd;
> @@ -362,6 +399,7 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
> struct cxl_cxims_context cxims_ctx;
> struct device *dev = ctx->dev;
> cxl_calc_hb_fn cxl_calc_hb;
> + cxl_translate_fn translate;
> struct cxl_decoder *cxld;
> unsigned int ways, i, ig;
> int rc;
> @@ -389,13 +427,17 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
> if (rc)
> return rc;
>
> - if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
> + if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
> cxl_calc_hb = cxl_hb_modulo;
> - else
> + translate = NULL;
> +
> + } else {
> cxl_calc_hb = cxl_hb_xor;
> + translate = cxl_xor_translate;
> + }
>
> struct cxl_root_decoder *cxlrd __free(put_cxlrd) =
> - cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
> + cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, translate);
> if (IS_ERR(cxlrd))
> return PTR_ERR(cxlrd);
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 887ed6e358fb..e5d5f7783857 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1808,6 +1808,7 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
> * @port: owning CXL root of this decoder
> * @nr_targets: static number of downstream targets
> * @calc_hb: which host bridge covers the n'th position by granularity
> + * @translate: decoder specific address translation function
> *
> * Return: A new cxl decoder to be registered by cxl_decoder_add(). A
> * 'CXL root' decoder is one that decodes from a top-level / static platform
> @@ -1816,7 +1817,8 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
> */
> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> unsigned int nr_targets,
> - cxl_calc_hb_fn calc_hb)
> + cxl_calc_hb_fn calc_hb,
> + cxl_translate_fn translate)
> {
> struct cxl_root_decoder *cxlrd;
> struct cxl_switch_decoder *cxlsd;
> @@ -1839,6 +1841,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> }
>
> cxlrd->calc_hb = calc_hb;
> + cxlrd->translate = translate;
> mutex_init(&cxlrd->range_lock);
>
> cxld = &cxlsd->cxld;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 237c28d5f2cc..bdb06dbe98a8 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2723,20 +2723,13 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
> return ctx.cxlr;
> }
>
> -static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
> +static bool cxl_is_hpa_in_chunk(u64 hpa, struct cxl_region *cxlr, int pos)
> {
> struct cxl_region_params *p = &cxlr->params;
> int gran = p->interleave_granularity;
> int ways = p->interleave_ways;
> u64 offset;
>
> - /* Is the hpa within this region at all */
> - if (hpa < p->res->start || hpa > p->res->end) {
> - dev_dbg(&cxlr->dev,
> - "Addr trans fail: hpa 0x%llx not in region\n", hpa);
> - return false;
> - }
> -
> /* Is the hpa in an expected chunk for its pos(-ition) */
> offset = hpa - p->res->start;
> offset = do_div(offset, gran * ways);
> @@ -2752,6 +2745,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
> u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> u64 dpa)
> {
> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
> struct cxl_region_params *p = &cxlr->params;
> struct cxl_endpoint_decoder *cxled = NULL;
> @@ -2801,7 +2795,17 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> /* Apply the hpa_offset to the region base address */
> hpa = hpa_offset + p->res->start;
>
> - if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
> + /* Root decoder translation overrides typical modulo decode */
> + if (cxlrd->translate)
> + hpa = cxlrd->translate(cxlrd, hpa);
> +
> + if (hpa < p->res->start || hpa > p->res->end) {
> + dev_dbg(&cxlr->dev,
> + "Addr trans fail: hpa 0x%llx not in region\n", hpa);
> + return ULLONG_MAX;
> + }
> +
> + if (!cxlrd->translate && (!cxl_is_hpa_in_chunk(hpa, cxlr, pos)))
> return ULLONG_MAX;
I needed some time to understand this. It was not immediately clear why, for
XOR translations, this chunk check is skipped.
Wouldn't it be helpful to add a comment to explain why that check is skipped?
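One way to see why the chunk check cannot hold for XOR decodes is a small user-space experiment. This is only a sketch under stated assumptions: a 2-way region with 256-byte granularity, a region base of 0 so offsets equal addresses, and a made-up xormap of 0x4100 (bits 8 and 14). The helpers parity64(), xor_restore() and in_chunk() are hypothetical names; in_chunk() follows the Modulo pattern from the commit message and is not copied from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 if an odd number of bits are set in x, else 0. */
static uint64_t parity64(uint64_t x)
{
	uint64_t parity = 0;

	while (x) {
		parity ^= 1;
		x &= x - 1;	/* clear the lowest set bit */
	}
	return parity;
}

/* Rewrite bit 'pos' (the xormap's lowest set bit) with the masked parity. */
static uint64_t xor_restore(uint64_t hpa, uint64_t xormap, int pos)
{
	uint64_t val = parity64(hpa & xormap);

	return (hpa & ~(1ULL << pos)) | (val << pos);
}

/* Modulo pattern: position 'pos' owns [pos * gran, (pos + 1) * gran). */
static bool in_chunk(uint64_t offset, uint64_t gran, uint64_t ways,
		     uint64_t pos)
{
	uint64_t rem = offset % (gran * ways);

	return rem >= pos * gran && rem < (pos + 1) * gran;
}
```

Under these assumptions, offset 0x4000 sits in the chunk for position 0 before the XOR step. After xor_restore() it becomes 0x4100, which falls in the chunk that the Modulo pattern assigns to position 1, even though the translation for the position 0 endpoint is correct. The chunk test would therefore reject valid XOR results, while the region range test still applies to both arithmetics.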
Thanks,
Fabio
>
> return hpa;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 603c0120cff8..3678235fc9ce 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -434,12 +434,14 @@ struct cxl_switch_decoder {
> struct cxl_root_decoder;
> typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
> int pos);
> +typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
>
> /**
> * struct cxl_root_decoder - Static platform CXL address decoder
> * @res: host / parent resource for region allocations
> * @region_id: region id for next region provisioning event
> * @calc_hb: which host bridge covers the n'th position by granularity
> + * @translate: decoder specific address translation function
> * @platform_data: platform specific configuration data
> * @range_lock: sync region autodiscovery by address range
> * @qos_class: QoS performance class cookie
> @@ -449,6 +451,7 @@ struct cxl_root_decoder {
> struct resource *res;
> atomic_t region_id;
> cxl_calc_hb_fn calc_hb;
> + cxl_translate_fn translate;
> void *platform_data;
> struct mutex range_lock;
> int qos_class;
> @@ -773,7 +776,8 @@ bool is_switch_decoder(struct device *dev);
> bool is_endpoint_decoder(struct device *dev);
> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> unsigned int nr_targets,
> - cxl_calc_hb_fn calc_hb);
> + cxl_calc_hb_fn calc_hb,
> + cxl_translate_fn translate);
> struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
> struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> unsigned int nr_targets);
>
* Re: [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation
2024-07-01 9:28 ` Fabio M. De Francesco
@ 2024-07-01 9:42 ` Fabio M. De Francesco
2024-07-01 22:48 ` Alison Schofield
1 sibling, 0 replies; 11+ messages in thread
From: Fabio M. De Francesco @ 2024-07-01 9:42 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams, alison.schofield
Cc: linux-cxl, Diego Garcia Rodriguez
On Monday, July 1, 2024 11:28:24 AM GMT+2 Fabio M. De Francesco wrote:
> On Tuesday, June 25, 2024 2:55:53 AM GMT+2 alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
>
> Hi Alison,
>
> Below I have two questions...
>
> > When a device reports a DPA in events like poison, general_media,
> > and dram, the driver translates that DPA back to an HPA. Presently,
> > the CXL driver translation only considers the Modulo position and
> > will report the wrong HPA for XOR configured root decoders.
> >
> > Add a helper function that restores the XOR'd bits during DPA->HPA
> > address translation. Plumb a root decoder callback to the new helper
> > when XOR interleave arithmetic is in use. For Modulo arithmetic, just
> > let the callback be NULL - as in no extra work required.
> >
> > Upon completion of a DPA->HPA translation a couple of checks are
> > performed on the result. One simply confirms that the calculated
> > HPA is within the address range of the region. That test is useful
> > for both Modulo and XOR interleave arithmetic decodes.
> >
> > A second check confirms that the HPA is within an expected chunk
> > based on the endpoint's position in the region and the region
> > granularity. An XOR decode disrupts the Modulo pattern making the
> > chunk check useless.
> >
> > To align the checks with the proper decode, pull the region range
> > check inline and use the helper to do the chunk check for Modulo
> > decodes only.
> >
> > A cxl-test unit test of address translations is in upstream review.
>
> Would it be helpful to provide a link to the test?
>
> > Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
> > ---
> > drivers/cxl/acpi.c | 48 ++++++++++++++++++++++++++++++++++++---
> > drivers/cxl/core/port.c | 5 +++-
> > drivers/cxl/core/region.c | 22 ++++++++++--------
> > drivers/cxl/cxl.h | 6 ++++-
> > 4 files changed, 67 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 571069863c62..010741da0176 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -74,6 +74,43 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
> > return cxlrd->cxlsd.target[n];
> > }
> >
> > +static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
> > +{
> > + struct cxl_cxims_data *cximsd = cxlrd->platform_data;
> > + int hbiw = cxlrd->cxlsd.nr_targets;
> > + u64 val;
> > + int pos;
> > +
> > + /* No xormaps for host bridge interleave ways of 1 or 3 */
> > + if (hbiw == 1 || hbiw == 3)
> > + return hpa;
> > +
> > + /*
> > + * For root decoders using xormaps (hbiw: 2,4,6,8,12,16) restore
> > + * the position bit to its value before the xormap was applied at
> > + * HPA->DPA translation.
> > + *
> > + * pos is the lowest set bit in an XORMAP
> > + * val is the XORALLBITS(HPA & XORMAP)
> > + *
> > + * XORALLBITS: The CXL spec (3.1 Table 9-22) defines XORALLBITS
> > + * as an operation that outputs a single bit by XORing all the
> > + * bits in the input (hpa & xormap). Implement XORALLBITS using
> > + * hweight64(). If the hamming weight is even the XOR of those
> > + * bits results in val==0, if odd the XOR result is val==1.
> > + */
> > +
> > + for (int i = 0; i < cximsd->nr_maps; i++) {
> > + if (!cximsd->xormaps[i])
> > + continue;
> > + pos = __ffs(cximsd->xormaps[i]);
> > + val = (hweight64(hpa & cximsd->xormaps[i]) & 1);
> > + hpa = (hpa & ~(1ULL << pos)) | (val << pos);
> > + }
> > +
> > + return hpa;
> > +}
> > +
> > struct cxl_cxims_context {
> > struct device *dev;
> > struct cxl_root_decoder *cxlrd;
> > @@ -362,6 +399,7 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
> > struct cxl_cxims_context cxims_ctx;
> > struct device *dev = ctx->dev;
> > cxl_calc_hb_fn cxl_calc_hb;
> > + cxl_translate_fn translate;
> > struct cxl_decoder *cxld;
> > unsigned int ways, i, ig;
> > int rc;
> > @@ -389,13 +427,17 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
> > if (rc)
> > return rc;
> >
> > - if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
> > + if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
> > cxl_calc_hb = cxl_hb_modulo;
> > - else
> > + translate = NULL;
> > +
> > + } else {
> > cxl_calc_hb = cxl_hb_xor;
> > + translate = cxl_xor_translate;
> > + }
> >
> > struct cxl_root_decoder *cxlrd __free(put_cxlrd) =
> > - cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
> > + cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, translate);
> > if (IS_ERR(cxlrd))
> > return PTR_ERR(cxlrd);
> >
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 887ed6e358fb..e5d5f7783857 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1808,6 +1808,7 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
> > * @port: owning CXL root of this decoder
> > * @nr_targets: static number of downstream targets
> > * @calc_hb: which host bridge covers the n'th position by granularity
> > + * @translate: decoder specific address translation function
> > *
> > * Return: A new cxl decoder to be registered by cxl_decoder_add(). A
> > * 'CXL root' decoder is one that decodes from a top-level / static platform
> > @@ -1816,7 +1817,8 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
> > */
> > struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > unsigned int nr_targets,
> > - cxl_calc_hb_fn calc_hb)
> > + cxl_calc_hb_fn calc_hb,
> > + cxl_translate_fn translate)
> > {
> > struct cxl_root_decoder *cxlrd;
> > struct cxl_switch_decoder *cxlsd;
> > @@ -1839,6 +1841,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > }
> >
> > cxlrd->calc_hb = calc_hb;
> > + cxlrd->translate = translate;
> > mutex_init(&cxlrd->range_lock);
> >
> > cxld = &cxlsd->cxld;
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 237c28d5f2cc..bdb06dbe98a8 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2723,20 +2723,13 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
> > return ctx.cxlr;
> > }
> >
> > -static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
> > +static bool cxl_is_hpa_in_chunk(u64 hpa, struct cxl_region *cxlr, int pos)
> > {
> > struct cxl_region_params *p = &cxlr->params;
> > int gran = p->interleave_granularity;
> > int ways = p->interleave_ways;
> > u64 offset;
> >
> > - /* Is the hpa within this region at all */
> > - if (hpa < p->res->start || hpa > p->res->end) {
> > - dev_dbg(&cxlr->dev,
> > - "Addr trans fail: hpa 0x%llx not in region\n", hpa);
> > - return false;
> > - }
> > -
> > /* Is the hpa in an expected chunk for its pos(-ition) */
> > offset = hpa - p->res->start;
> > offset = do_div(offset, gran * ways);
> > @@ -2752,6 +2745,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
> > u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> > u64 dpa)
> > {
> > + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> > u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
> > struct cxl_region_params *p = &cxlr->params;
> > struct cxl_endpoint_decoder *cxled = NULL;
> > @@ -2801,7 +2795,17 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> > /* Apply the hpa_offset to the region base address */
> > hpa = hpa_offset + p->res->start;
> >
> > - if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
> > + /* Root decoder translation overrides typical modulo decode */
> > + if (cxlrd->translate)
> > + hpa = cxlrd->translate(cxlrd, hpa);
> > +
> > + if (hpa < p->res->start || hpa > p->res->end) {
> > + dev_dbg(&cxlr->dev,
> > + "Addr trans fail: hpa 0x%llx not in region\n", hpa);
> > + return ULLONG_MAX;
> > + }
> > +
> > + if (!cxlrd->translate && (!cxl_is_hpa_in_chunk(hpa, cxlr, pos)))
> > return ULLONG_MAX;
>
> I needed some time to understand this. It was not immediately clear why, for
> XOR translations, this chunk check is skipped.
Sorry, I meant to say for "no translations" (i.e., for !cxlrd->translate), but
something went wrong with the copy-paste.
Fabio
>
> Wouldn't it be helpful to add a comment to explain why that check is skipped?
>
> Thanks,
>
> Fabio
>
> >
> > return hpa;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 603c0120cff8..3678235fc9ce 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -434,12 +434,14 @@ struct cxl_switch_decoder {
> > struct cxl_root_decoder;
> > typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
> > int pos);
> > +typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
> >
> > /**
> > * struct cxl_root_decoder - Static platform CXL address decoder
> > * @res: host / parent resource for region allocations
> > * @region_id: region id for next region provisioning event
> > * @calc_hb: which host bridge covers the n'th position by granularity
> > + * @translate: decoder specific address translation function
> > * @platform_data: platform specific configuration data
> > * @range_lock: sync region autodiscovery by address range
> > * @qos_class: QoS performance class cookie
> > @@ -449,6 +451,7 @@ struct cxl_root_decoder {
> > struct resource *res;
> > atomic_t region_id;
> > cxl_calc_hb_fn calc_hb;
> > + cxl_translate_fn translate;
> > void *platform_data;
> > struct mutex range_lock;
> > int qos_class;
> > @@ -773,7 +776,8 @@ bool is_switch_decoder(struct device *dev);
> > bool is_endpoint_decoder(struct device *dev);
> > struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > unsigned int nr_targets,
> > - cxl_calc_hb_fn calc_hb);
> > + cxl_calc_hb_fn calc_hb,
> > + cxl_translate_fn translate);
> > struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
> > struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> > unsigned int nr_targets);
> >
>
>
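For readers following the XORALLBITS discussion, here is a minimal user-space
sketch of the bit restore that cxl_xor_translate() performs in the quoted diff.
It is not the driver code: xor_all_bits() and lowest_set_bit() are hypothetical
stand-ins for the kernel's hweight64() & 1 and __ffs(), xor_restore() is an
illustrative name, and any xormap value passed in is made up rather than taken
from a real CXIMS.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR of all bits in x: 1 when an odd number of bits are set, else 0. */
static unsigned int xor_all_bits(uint64_t x)
{
	unsigned int parity = 0;

	while (x) {
		parity ^= 1;
		x &= x - 1;	/* clear the lowest set bit */
	}
	return parity;
}

/* Index of the lowest set bit. The caller must pass a non-zero map. */
static unsigned int lowest_set_bit(uint64_t map)
{
	unsigned int pos = 0;

	while (!(map & 1)) {
		map >>= 1;
		pos++;
	}
	return pos;
}

/*
 * For each non-zero xormap, overwrite the position bit (the lowest set
 * bit of the map) with the XOR of all HPA bits selected by that map.
 */
static uint64_t xor_restore(uint64_t hpa, const uint64_t *xormaps,
			    size_t nr_maps)
{
	for (size_t i = 0; i < nr_maps; i++) {
		unsigned int pos;
		uint64_t val;

		if (!xormaps[i])
			continue;
		pos = lowest_set_bit(xormaps[i]);
		val = xor_all_bits(hpa & xormaps[i]);
		hpa = (hpa & ~(1ULL << pos)) | (val << pos);
	}
	return hpa;
}
```

With a made-up xormap of 0x4100 (bits 8 and 14), an input of 0x4000 comes back
as 0x4100 because bit 14 is set and so flips position bit 8. Feeding 0x4100
back in returns 0x4000, and an input with bit 14 clear is left unchanged.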
* Re: [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation
2024-07-01 9:28 ` Fabio M. De Francesco
2024-07-01 9:42 ` Fabio M. De Francesco
@ 2024-07-01 22:48 ` Alison Schofield
1 sibling, 0 replies; 11+ messages in thread
From: Alison Schofield @ 2024-07-01 22:48 UTC (permalink / raw)
To: Fabio M. De Francesco
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, Dan Williams, linux-cxl, Diego Garcia Rodriguez
On Mon, Jul 01, 2024 at 11:28:24AM +0200, Fabio M. De Francesco wrote:
> On Tuesday, June 25, 2024 2:55:53 AM GMT+2 alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
>
> Hi Alison,
>
> Below I have two questions...
>
Thanks for reviewing -
snip
> >
> > A cxl-test unit test of address translations is in upstream review.
>
> Would it be helpful to provide a link to the test?
Sure. Will add lore link in next rev.
snip
>
> > +
How about this comment:
/* Predictable chunk guarantee only applies to modulo decodes */
> > + if (!cxlrd->translate && (!cxl_is_hpa_in_chunk(hpa, cxlr, pos)))
> > return ULLONG_MAX;
>
> I needed some time to understand this. It was not immediately clear why, for
> XOR translations, this chunk check is skipped.
>
> Wouldn't it be helpful to add a comment to explain why that check is skipped?
Beyond that comment, I'd like readers to use git blame and look at
the commit log:
A second check confirms that the HPA is within an expected chunk
based on the endpoint's position in the region and the region
granularity. An XOR decode disrupts the Modulo pattern making the
chunk check useless.
I'm intentionally avoiding making the CXL driver and its documentation
the source of XOR interleave education.
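For what it's worth, the modulo-only assumption behind the chunk check can be
sketched in user space as below. This is an illustration and not the driver
code: hpa_in_chunk() and its parameters are hypothetical names, and the
granularity, ways, and addresses used in any call are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Under modulo interleave, the endpoint at position 'pos' owns the
 * byte range [pos * gran, (pos + 1) * gran) within every stripe of
 * gran * ways bytes, counted from the region base.
 */
static bool hpa_in_chunk(uint64_t hpa, uint64_t region_start,
			 unsigned int gran, unsigned int ways,
			 unsigned int pos)
{
	uint64_t stripe = (uint64_t)gran * ways;
	uint64_t offset = (hpa - region_start) % stripe;

	return offset >= (uint64_t)pos * gran &&
	       offset < (uint64_t)(pos + 1) * gran;
}
```

With gran 256 and ways 2, region offset 0x4000 sits in position 0's chunk. If
an XOR decode flips bit 8 of that address, the result at offset 0x4100 sits in
position 1's chunk, so the check would reject a correct translation.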
>
> Thanks,
>
> Fabio
>
> >
> > return hpa;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 603c0120cff8..3678235fc9ce 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -434,12 +434,14 @@ struct cxl_switch_decoder {
> > struct cxl_root_decoder;
> > typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
> > int pos);
> > +typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
> >
> > /**
> > * struct cxl_root_decoder - Static platform CXL address decoder
> > * @res: host / parent resource for region allocations
> > * @region_id: region id for next region provisioning event
> > * @calc_hb: which host bridge covers the n'th position by granularity
> > + * @translate: decoder specific address translation function
> > * @platform_data: platform specific configuration data
> > * @range_lock: sync region autodiscovery by address range
> > * @qos_class: QoS performance class cookie
> > @@ -449,6 +451,7 @@ struct cxl_root_decoder {
> > struct resource *res;
> > atomic_t region_id;
> > cxl_calc_hb_fn calc_hb;
> > + cxl_translate_fn translate;
> > void *platform_data;
> > struct mutex range_lock;
> > int qos_class;
> > @@ -773,7 +776,8 @@ bool is_switch_decoder(struct device *dev);
> > bool is_endpoint_decoder(struct device *dev);
> > struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> > unsigned int nr_targets,
> > - cxl_calc_hb_fn calc_hb);
> > + cxl_calc_hb_fn calc_hb,
> > + cxl_translate_fn translate);
> > struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
> > struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
> > unsigned int nr_targets);
> >
>
>
>
>
* [PATCH v3 3/4] cxl/region: Verify target positions using the ordered target list
2024-06-25 0:55 [PATCH v3 0/4] XOR Math Fixups: translation & position alison.schofield
2024-06-25 0:55 ` [PATCH v3 1/4] cxl/core: Rename cxl_trace_hpa() to cxl_dpa_to_hpa() alison.schofield
2024-06-25 0:55 ` [PATCH v3 2/4] cxl: Restore XOR'd position bits during address translation alison.schofield
@ 2024-06-25 0:55 ` alison.schofield
2024-06-25 0:55 ` [PATCH v3 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
2024-06-27 1:52 ` [PATCH v3 0/4] XOR Math Fixups: translation & position Dan Williams
4 siblings, 0 replies; 11+ messages in thread
From: alison.schofield @ 2024-06-25 0:55 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
When a root decoder is configured the interleave target list is read
from the BIOS populated CFMWS structure. Per the CXL spec 3.1 Table
9-22 the target list is in interleave order. The CXL driver populates
its decoder target list in the same order and stores it in 'struct
cxl_switch_decoder' field "@target: active ordered target list in
current decoder configuration"
Given the promise of an ordered list, the driver can stop duplicating
the work of BIOS and simply check target positions against the ordered
list during region configuration.
The simplified check against the ordered list is presented here.
A follow-on patch will remove the unused code.
For Modulo arithmetic this is not a fix, only a simplification.
For XOR arithmetic this is a fix for HB IW of 3,6,12.
Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/region.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index bdb06dbe98a8..77aa74250c87 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1559,10 +1559,13 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
const struct cxl_dport *dport, int pos)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
+ struct cxl_decoder *cxld = &cxlsd->cxld;
+ int iw = cxld->interleave_ways;
struct cxl_port *iter;
int rc;
- if (cxlrd->calc_hb(cxlrd, pos) != dport) {
+ if (dport != cxlrd->cxlsd.target[pos % iw]) {
dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
dev_name(&cxlrd->cxlsd.cxld.dev));
--
2.37.3
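A minimal user-space sketch of the check this patch introduces, assuming the
target list is already in interleave order as the commit message describes.
The struct dport type and position_is_valid() are hypothetical names used for
illustration only, not the driver's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a downstream port (host bridge). */
struct dport {
	int id;
};

/*
 * With an ordered target list, position 'pos' must map to the target
 * at index pos % iw, where iw is the number of interleave ways.
 */
static bool position_is_valid(const struct dport *const *targets, size_t iw,
			      const struct dport *dport, size_t pos)
{
	if (iw == 0)
		return false;

	return targets[pos % iw] == dport;
}
```

For a two-way list { hb0, hb1 }, positions 0 and 2 are valid only for hb0 and
positions 1 and 3 only for hb1, whichever interleave arithmetic is in use.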
* [PATCH v3 4/4] cxl: Remove defunct code calculating host bridge target positions
2024-06-25 0:55 [PATCH v3 0/4] XOR Math Fixups: translation & position alison.schofield
` (2 preceding siblings ...)
2024-06-25 0:55 ` [PATCH v3 3/4] cxl/region: Verify target positions using the ordered target list alison.schofield
@ 2024-06-25 0:55 ` alison.schofield
2024-06-27 1:52 ` [PATCH v3 0/4] XOR Math Fixups: translation & position Dan Williams
4 siblings, 0 replies; 11+ messages in thread
From: alison.schofield @ 2024-06-25 0:55 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS
target list in interleave target order. This means the calculations
the CXL driver added to determine positions when XOR math is in use,
along with the entire XOR vs Modulo callback setup, are not needed.
A prior patch added a common method to verify positions.
Remove the now unused code related to the cxl_calc_hb_fn.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/acpi.c | 62 ++---------------------------------------
drivers/cxl/core/port.c | 18 ------------
drivers/cxl/cxl.h | 6 ----
3 files changed, 3 insertions(+), 83 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 010741da0176..18c7cb78504e 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -22,57 +22,6 @@ static const guid_t acpi_cxl_qtg_id_guid =
GUID_INIT(0xF365F9A6, 0xA7DE, 0x4071,
0xA6, 0x6A, 0xB4, 0x0C, 0x0B, 0x4F, 0x8E, 0x52);
-/*
- * Find a targets entry (n) in the host bridge interleave list.
- * CXL Specification 3.0 Table 9-22
- */
-static int cxl_xor_calc_n(u64 hpa, struct cxl_cxims_data *cximsd, int iw,
- int ig)
-{
- int i = 0, n = 0;
- u8 eiw;
-
- /* IW: 2,4,6,8,12,16 begin building 'n' using xormaps */
- if (iw != 3) {
- for (i = 0; i < cximsd->nr_maps; i++)
- n |= (hweight64(hpa & cximsd->xormaps[i]) & 1) << i;
- }
- /* IW: 3,6,12 add a modulo calculation to 'n' */
- if (!is_power_of_2(iw)) {
- if (ways_to_eiw(iw, &eiw))
- return -1;
- hpa &= GENMASK_ULL(51, eiw + ig);
- n |= do_div(hpa, 3) << i;
- }
- return n;
-}
-
-static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
-{
- struct cxl_cxims_data *cximsd = cxlrd->platform_data;
- struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
- struct cxl_decoder *cxld = &cxlsd->cxld;
- int ig = cxld->interleave_granularity;
- int iw = cxld->interleave_ways;
- int n = 0;
- u64 hpa;
-
- if (dev_WARN_ONCE(&cxld->dev,
- cxld->interleave_ways != cxlsd->nr_targets,
- "misconfigured root decoder\n"))
- return NULL;
-
- hpa = cxlrd->res->start + pos * ig;
-
- /* Entry (n) is 0 for no interleave (iw == 1) */
- if (iw != 1)
- n = cxl_xor_calc_n(hpa, cximsd, iw, ig);
-
- if (n < 0)
- return NULL;
-
- return cxlrd->cxlsd.target[n];
-}
static u64 cxl_xor_translate(struct cxl_root_decoder *cxlrd, u64 hpa)
{
@@ -398,7 +347,6 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
struct cxl_port *root_port = ctx->root_port;
struct cxl_cxims_context cxims_ctx;
struct device *dev = ctx->dev;
- cxl_calc_hb_fn cxl_calc_hb;
cxl_translate_fn translate;
struct cxl_decoder *cxld;
unsigned int ways, i, ig;
@@ -427,17 +375,13 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
if (rc)
return rc;
- if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
- cxl_calc_hb = cxl_hb_modulo;
+ if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
translate = NULL;
-
- } else {
- cxl_calc_hb = cxl_hb_xor;
+ else
translate = cxl_xor_translate;
- }
struct cxl_root_decoder *cxlrd __free(put_cxlrd) =
- cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, translate);
+ cxl_root_decoder_alloc(root_port, ways, translate);
if (IS_ERR(cxlrd))
return PTR_ERR(cxlrd);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e5d5f7783857..9e19f2072ba0 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1733,21 +1733,6 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
return 0;
}
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos)
-{
- struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
- struct cxl_decoder *cxld = &cxlsd->cxld;
- int iw;
-
- iw = cxld->interleave_ways;
- if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
- "misconfigured root decoder\n"))
- return NULL;
-
- return cxlrd->cxlsd.target[pos % iw];
-}
-EXPORT_SYMBOL_NS_GPL(cxl_hb_modulo, CXL);
-
static struct lock_class_key cxl_decoder_key;
/**
@@ -1807,7 +1792,6 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
* cxl_root_decoder_alloc - Allocate a root level decoder
* @port: owning CXL root of this decoder
* @nr_targets: static number of downstream targets
- * @calc_hb: which host bridge covers the n'th position by granularity
* @translate: decoder specific address translation function
*
* Return: A new cxl decoder to be registered by cxl_decoder_add(). A
@@ -1817,7 +1801,6 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
*/
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb,
cxl_translate_fn translate)
{
struct cxl_root_decoder *cxlrd;
@@ -1840,7 +1823,6 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
return ERR_PTR(rc);
}
- cxlrd->calc_hb = calc_hb;
cxlrd->translate = translate;
mutex_init(&cxlrd->range_lock);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3678235fc9ce..75959f6147de 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -432,15 +432,12 @@ struct cxl_switch_decoder {
};
struct cxl_root_decoder;
-typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
- int pos);
typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
/**
* struct cxl_root_decoder - Static platform CXL address decoder
* @res: host / parent resource for region allocations
* @region_id: region id for next region provisioning event
- * @calc_hb: which host bridge covers the n'th position by granularity
* @translate: decoder specific address translation function
* @platform_data: platform specific configuration data
* @range_lock: sync region autodiscovery by address range
@@ -450,7 +447,6 @@ typedef u64 (*cxl_translate_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
struct cxl_root_decoder {
struct resource *res;
atomic_t region_id;
- cxl_calc_hb_fn calc_hb;
cxl_translate_fn translate;
void *platform_data;
struct mutex range_lock;
@@ -776,9 +772,7 @@ bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
- cxl_calc_hb_fn calc_hb,
cxl_translate_fn translate);
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets);
int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
--
2.37.3
* Re: [PATCH v3 0/4] XOR Math Fixups: translation & position
2024-06-25 0:55 [PATCH v3 0/4] XOR Math Fixups: translation & position alison.schofield
` (3 preceding siblings ...)
2024-06-25 0:55 ` [PATCH v3 4/4] cxl: Remove defunct code calculating host bridge target positions alison.schofield
@ 2024-06-27 1:52 ` Dan Williams
4 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2024-06-27 1:52 UTC (permalink / raw)
To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Changes in v3:
> - Patch 2: Perform the 'chunk' check on Modulo decodes only
> - Patch 1: Fold cxl_translate() into cxl_dpa_to_hpa() (Jonathan)
> Jonathan asked for a rename of cxl_translate to cxl_dpa_to_hpa()
> but the latter already existed and the work of cxl_translate() was
> minimal. They are now one.
> - Remove the mention of XOR's purpose in Patch 2 commit log (Dan)
> - Reword hamming weight wrt XORALLBITS code comment (Jonathan)
> - Post a unit test upstream[1] (Dan, Jonathan)
> - Remove Reviewed-by Tags on Patch 1 & 2 due to rework
> - Add Diego's Tested-by tag to Patch 2,3
>
> Link to v2:
> https://lore.kernel.org/cover.1714159486.git.alison.schofield@intel.com/
>
> [1] https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/
>
>
> Begin cover letter:
>
> Rather than repeat the individual patch commit message content,
> let me describe the flow of this set:
For a future cover letter, a lead in like this does not capture
potential new reviewers compared to a boilerplate flow of:
"Kernel fails at X... this is bad because... here is the flow of the
patches to get the kernel back on track..."
>
> Patch 1: Rename an existing fn - cxl_trace_hpa()-> cxl_dpa_to_hpa()
> A tiny, yet essential cleanup to take first.
>
> Patch 2: cxl: Restore XOR'd position bits during address translation
> The problem fixed in this patch, bad HPA translations with XOR math,
> came to my attention recently.
>
> Patch 3 & Patch 4 are paired. Patch 3 presents the new method for
> verifying a target position in the list and Patch 4 removes the
> old method. These could be squashed.
>
> FYI - the reason I don't present the code removal first is because
> I think it is easier to read the diff if I leave in the old root
> decoder call back setup for calc_hb, insert the new call back along
> the same path, and then rip out the defunct calc_hb. That's the
> way I created the patchset and it may be an easier way for reviewers
> to follow along with the root decoder callback setup.
If I did not say it before, I appreciate notes like this, and agree this
makes the patches easier to read.