* [PATCH v3 1/6] cxl: Remove the CXL_DECODER_MIXED mistake
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
@ 2025-02-04 4:24 ` Dan Williams
2025-02-04 17:42 ` Fan Ni
2025-02-04 4:24 ` [PATCH v3 2/6] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
` (6 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2025-02-04 4:24 UTC (permalink / raw)
To: linux-cxl
Cc: Ira Weiny, Jonathan Cameron, Alejandro Lucero, Dave Jiang,
dave.jiang, Jonathan.Cameron
CXL_DECODER_MIXED is a safety mechanism introduced for the case where
platform firmware has programmed an endpoint decoder that straddles a
DPA partition boundary. While the kernel is careful to only allocate DPA
capacity within a single partition there is no guarantee that platform
firmware, or anything that touched the device before the current kernel,
gets that right.
However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
designation because of the way it tracks partition boundaries. A
request_resource() that spans ->ram_res and ->pmem_res fails with the
following signature:
__cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation
CXL_DECODER_MIXED is dead defensive programming after the driver has
already given up on the device. It has never offered any protection in
practice, so just delete it.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/hdm.c | 6 +++---
drivers/cxl/core/region.c | 12 ------------
drivers/cxl/cxl.h | 4 +---
3 files changed, 4 insertions(+), 18 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 50e6a45b30ba..b7f6a2d69f78 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -332,9 +332,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
else if (resource_contains(&cxlds->ram_res, res))
cxled->mode = CXL_DECODER_RAM;
else {
- dev_warn(dev, "decoder%d.%d: %pr mixed mode not supported\n",
- port->id, cxled->cxld.id, cxled->dpa_res);
- cxled->mode = CXL_DECODER_MIXED;
+ dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
+ port->id, cxled->cxld.id, res);
+ cxled->mode = CXL_DECODER_NONE;
}
port->hdm_end++;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e8d11a988fd9..34aa3d45fafa 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2738,18 +2738,6 @@ static int poison_by_decoder(struct device *dev, void *arg)
if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
return rc;
- /*
- * Regions are only created with single mode decoders: pmem or ram.
- * Linux does not support mixed mode decoders. This means that
- * reading poison per endpoint decoder adheres to the requirement
- * that poison reads of pmem and ram must be separated.
- * CXL 3.0 Spec 8.2.9.8.4.1
- */
- if (cxled->mode == CXL_DECODER_MIXED) {
- dev_dbg(dev, "poison list read unsupported in mixed mode\n");
- return rc;
- }
-
cxlmd = cxled_to_memdev(cxled);
if (cxled->skip) {
offset = cxled->dpa_res->start - cxled->skip;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index bbbaa0d0a670..9fd1524ed150 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -380,7 +380,6 @@ enum cxl_decoder_mode {
CXL_DECODER_NONE,
CXL_DECODER_RAM,
CXL_DECODER_PMEM,
- CXL_DECODER_MIXED,
CXL_DECODER_DEAD,
};
@@ -390,10 +389,9 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
[CXL_DECODER_NONE] = "none",
[CXL_DECODER_RAM] = "ram",
[CXL_DECODER_PMEM] = "pmem",
- [CXL_DECODER_MIXED] = "mixed",
};
- if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
+ if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
return names[mode];
return "mixed";
}
* Re: [PATCH v3 1/6] cxl: Remove the CXL_DECODER_MIXED mistake
2025-02-04 4:24 ` [PATCH v3 1/6] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
@ 2025-02-04 17:42 ` Fan Ni
0 siblings, 0 replies; 19+ messages in thread
From: Fan Ni @ 2025-02-04 17:42 UTC (permalink / raw)
To: Dan Williams
Cc: linux-cxl, Ira Weiny, Jonathan Cameron, Alejandro Lucero,
Dave Jiang
On Mon, Feb 03, 2025 at 08:24:06PM -0800, Dan Williams wrote:
> CXL_DECODER_MIXED is a safety mechanism introduced for the case where
> platform firmware has programmed an endpoint decoder that straddles a
> DPA partition boundary. While the kernel is careful to only allocate DPA
> capacity within a single partition there is no guarantee that platform
> firmware, or anything that touched the device before the current kernel,
> gets that right.
>
> However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
> designation because of the way it tracks partition boundaries. A
> request_resource() that spans ->ram_res and ->pmem_res fails with the
> following signature:
>
> __cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation
>
> CXL_DECODER_MIXED is dead defensive programming after the driver has
> already given up on the device. It has never offered any protection in
> practice, so just delete it.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> ---
> drivers/cxl/core/hdm.c | 6 +++---
> drivers/cxl/core/region.c | 12 ------------
> drivers/cxl/cxl.h | 4 +---
> 3 files changed, 4 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 50e6a45b30ba..b7f6a2d69f78 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -332,9 +332,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> else if (resource_contains(&cxlds->ram_res, res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> - dev_warn(dev, "decoder%d.%d: %pr mixed mode not supported\n",
> - port->id, cxled->cxld.id, cxled->dpa_res);
> - cxled->mode = CXL_DECODER_MIXED;
> + dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> + port->id, cxled->cxld.id, res);
> + cxled->mode = CXL_DECODER_NONE;
> }
>
> port->hdm_end++;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e8d11a988fd9..34aa3d45fafa 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2738,18 +2738,6 @@ static int poison_by_decoder(struct device *dev, void *arg)
> if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> return rc;
>
> - /*
> - * Regions are only created with single mode decoders: pmem or ram.
> - * Linux does not support mixed mode decoders. This means that
> - * reading poison per endpoint decoder adheres to the requirement
> - * that poison reads of pmem and ram must be separated.
> - * CXL 3.0 Spec 8.2.9.8.4.1
> - */
> - if (cxled->mode == CXL_DECODER_MIXED) {
> - dev_dbg(dev, "poison list read unsupported in mixed mode\n");
> - return rc;
> - }
> -
> cxlmd = cxled_to_memdev(cxled);
> if (cxled->skip) {
> offset = cxled->dpa_res->start - cxled->skip;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index bbbaa0d0a670..9fd1524ed150 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -380,7 +380,6 @@ enum cxl_decoder_mode {
> CXL_DECODER_NONE,
> CXL_DECODER_RAM,
> CXL_DECODER_PMEM,
> - CXL_DECODER_MIXED,
> CXL_DECODER_DEAD,
> };
>
> @@ -390,10 +389,9 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
> [CXL_DECODER_NONE] = "none",
> [CXL_DECODER_RAM] = "ram",
> [CXL_DECODER_PMEM] = "pmem",
> - [CXL_DECODER_MIXED] = "mixed",
> };
>
> - if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
> + if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
> return names[mode];
> return "mixed";
> }
>
--
Fan Ni
* [PATCH v3 2/6] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
2025-02-04 4:24 ` [PATCH v3 1/6] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
@ 2025-02-04 4:24 ` Dan Williams
2025-02-04 11:30 ` Jonathan Cameron
2025-02-04 17:50 ` Fan Ni
2025-02-04 4:24 ` [PATCH v3 3/6] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
` (5 subsequent siblings)
7 siblings, 2 replies; 19+ messages in thread
From: Dan Williams @ 2025-02-04 4:24 UTC (permalink / raw)
To: linux-cxl
Cc: Alejandro Lucero, Ira Weiny, Dave Jiang, dave.jiang,
Jonathan.Cameron
In preparation for consolidating all DPA partition information into an
array of DPA metadata, introduce helpers that hide the layout of the
current data. I.e. make the eventual replacement of ->ram_res,
->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
no-op for code paths that consume that information, and reduce the noise
of follow-on patches.
The end goal is to consolidate all DPA information in 'struct
cxl_dev_state', but for now the helpers just make it appear that all DPA
metadata is relative to @cxlds.
As the conversion to generic partition metadata walking is completed,
these helpers will naturally be eliminated, or reduced in scope.
Cc: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/cdat.c | 71 +++++++++++++++++++++++++++---------------
drivers/cxl/core/hdm.c | 26 ++++++++-------
drivers/cxl/core/mbox.c | 18 ++++++-----
drivers/cxl/core/memdev.c | 42 +++++++++++++------------
drivers/cxl/core/region.c | 10 ++++--
drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++----
drivers/cxl/mem.c | 2 +
tools/testing/cxl/test/cxl.c | 25 ++++++++-------
8 files changed, 162 insertions(+), 90 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 8153f8d83a16..797baad483cb 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -258,27 +258,36 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
struct xarray *dsmas_xa)
{
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
struct device *dev = cxlds->dev;
- struct range pmem_range = {
- .start = cxlds->pmem_res.start,
- .end = cxlds->pmem_res.end,
- };
- struct range ram_range = {
- .start = cxlds->ram_res.start,
- .end = cxlds->ram_res.end,
- };
struct dsmas_entry *dent;
unsigned long index;
+ const struct resource *partition[] = {
+ to_ram_res(cxlds),
+ to_pmem_res(cxlds),
+ };
+ struct cxl_dpa_perf *perf[] = {
+ to_ram_perf(cxlds),
+ to_pmem_perf(cxlds),
+ };
xa_for_each(dsmas_xa, index, dent) {
- if (resource_size(&cxlds->ram_res) &&
- range_contains(&ram_range, &dent->dpa_range))
- update_perf_entry(dev, dent, &mds->ram_perf);
- else if (resource_size(&cxlds->pmem_res) &&
- range_contains(&pmem_range, &dent->dpa_range))
- update_perf_entry(dev, dent, &mds->pmem_perf);
- else
+ bool found = false;
+
+ for (int i = 0; i < ARRAY_SIZE(partition); i++) {
+ const struct resource *res = partition[i];
+ struct range range = {
+ .start = res->start,
+ .end = res->end,
+ };
+
+ if (range_contains(&range, &dent->dpa_range)) {
+ update_perf_entry(dev, dent, perf[i]);
+ found = true;
+ break;
+ }
+ }
+
+ if (!found)
dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
&dent->dpa_range);
}
@@ -304,6 +313,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
{
+ if (!dpa_perf)
+ return;
+
*dpa_perf = (struct cxl_dpa_perf) {
.qos_class = CXL_QOS_CLASS_INVALID,
};
@@ -312,6 +324,9 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
static bool cxl_qos_match(struct cxl_port *root_port,
struct cxl_dpa_perf *dpa_perf)
{
+ if (!dpa_perf)
+ return false;
+
if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
return false;
@@ -346,7 +361,8 @@ static int match_cxlrd_hb(struct device *dev, void *data)
static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
{
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds),
+ *pmem_perf = to_pmem_perf(cxlds);
struct cxl_port *root_port;
int rc;
@@ -359,17 +375,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
root_port = &cxl_root->port;
/* Check that the QTG IDs are all sane between end device and root decoders */
- if (!cxl_qos_match(root_port, &mds->ram_perf))
- reset_dpa_perf(&mds->ram_perf);
- if (!cxl_qos_match(root_port, &mds->pmem_perf))
- reset_dpa_perf(&mds->pmem_perf);
+ if (!cxl_qos_match(root_port, ram_perf))
+ reset_dpa_perf(ram_perf);
+ if (!cxl_qos_match(root_port, pmem_perf))
+ reset_dpa_perf(pmem_perf);
/* Check to make sure that the device's host bridge is under a root decoder */
rc = device_for_each_child(&root_port->dev,
cxlmd->endpoint->host_bridge, match_cxlrd_hb);
if (!rc) {
- reset_dpa_perf(&mds->ram_perf);
- reset_dpa_perf(&mds->pmem_perf);
+ reset_dpa_perf(ram_perf);
+ reset_dpa_perf(pmem_perf);
}
return rc;
@@ -574,20 +590,23 @@ static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxle
enum cxl_decoder_mode mode)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_dpa_perf *perf;
switch (mode) {
case CXL_DECODER_RAM:
- perf = &mds->ram_perf;
+ perf = to_ram_perf(cxlds);
break;
case CXL_DECODER_PMEM:
- perf = &mds->pmem_perf;
+ perf = to_pmem_perf(cxlds);
break;
default:
return ERR_PTR(-EINVAL);
}
+ if (!perf)
+ return ERR_PTR(-EINVAL);
+
if (!dpa_perf_contains(perf, cxled->dpa_res))
return ERR_PTR(-EINVAL);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index b7f6a2d69f78..48b26d3dad26 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
cxled->dpa_res = res;
cxled->skip = skipped;
- if (resource_contains(&cxlds->pmem_res, res))
+ if (resource_contains(to_pmem_res(cxlds), res))
cxled->mode = CXL_DECODER_PMEM;
- else if (resource_contains(&cxlds->ram_res, res))
+ else if (resource_contains(to_ram_res(cxlds), res))
cxled->mode = CXL_DECODER_RAM;
else {
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
@@ -442,11 +442,11 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
* Only allow modes that are supported by the current partition
* configuration
*/
- if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
+ if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
dev_dbg(dev, "no available pmem capacity\n");
return -ENXIO;
}
- if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
+ if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
dev_dbg(dev, "no available ram capacity\n");
return -ENXIO;
}
@@ -464,6 +464,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
struct device *dev = &cxled->cxld.dev;
resource_size_t start, avail, skip;
struct resource *p, *last;
+ const struct resource *ram_res = to_ram_res(cxlds);
+ const struct resource *pmem_res = to_pmem_res(cxlds);
int rc;
down_write(&cxl_dpa_rwsem);
@@ -480,37 +482,37 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
goto out;
}
- for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
+ for (p = ram_res->child, last = NULL; p; p = p->sibling)
last = p;
if (last)
free_ram_start = last->end + 1;
else
- free_ram_start = cxlds->ram_res.start;
+ free_ram_start = ram_res->start;
- for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
+ for (p = pmem_res->child, last = NULL; p; p = p->sibling)
last = p;
if (last)
free_pmem_start = last->end + 1;
else
- free_pmem_start = cxlds->pmem_res.start;
+ free_pmem_start = pmem_res->start;
if (cxled->mode == CXL_DECODER_RAM) {
start = free_ram_start;
- avail = cxlds->ram_res.end - start + 1;
+ avail = ram_res->end - start + 1;
skip = 0;
} else if (cxled->mode == CXL_DECODER_PMEM) {
resource_size_t skip_start, skip_end;
start = free_pmem_start;
- avail = cxlds->pmem_res.end - start + 1;
+ avail = pmem_res->end - start + 1;
skip_start = free_ram_start;
/*
* If some pmem is already allocated, then that allocation
* already handled the skip.
*/
- if (cxlds->pmem_res.child &&
- skip_start == cxlds->pmem_res.child->start)
+ if (pmem_res->child &&
+ skip_start == pmem_res->child->start)
skip_end = skip_start - 1;
else
skip_end = start - 1;
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 548564c770c0..3502f1633ad2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent,
int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct resource *ram_res = to_ram_res(cxlds);
+ struct resource *pmem_res = to_pmem_res(cxlds);
struct device *dev = cxlds->dev;
int rc;
if (!cxlds->media_ready) {
cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
- cxlds->ram_res = DEFINE_RES_MEM(0, 0);
- cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
+ *ram_res = DEFINE_RES_MEM(0, 0);
+ *pmem_res = DEFINE_RES_MEM(0, 0);
return 0;
}
cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
if (mds->partition_align_bytes == 0) {
- rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+ rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
mds->volatile_only_bytes, "ram");
if (rc)
return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+ return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
mds->volatile_only_bytes,
mds->persistent_only_bytes, "pmem");
}
@@ -1298,11 +1300,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
return rc;
}
- rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+ rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
mds->active_volatile_bytes, "ram");
if (rc)
return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+ return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
mds->active_volatile_bytes,
mds->active_persistent_bytes, "pmem");
}
@@ -1450,8 +1452,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->cxlds.reg_map.host = dev;
mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
- mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
- mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+ to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
+ to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
return mds;
}
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index ae3dfcbe8938..c5f8320ed330 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- unsigned long long len = resource_size(&cxlds->ram_res);
+ unsigned long long len = resource_size(to_ram_res(cxlds));
return sysfs_emit(buf, "%#llx\n", len);
}
@@ -93,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- unsigned long long len = resource_size(&cxlds->pmem_res);
+ unsigned long long len = cxl_pmem_size(cxlds);
return sysfs_emit(buf, "%#llx\n", len);
}
@@ -198,16 +198,20 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
int rc = 0;
/* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
- if (resource_size(&cxlds->pmem_res)) {
- offset = cxlds->pmem_res.start;
- length = resource_size(&cxlds->pmem_res);
+ if (cxl_pmem_size(cxlds)) {
+ const struct resource *res = to_pmem_res(cxlds);
+
+ offset = res->start;
+ length = resource_size(res);
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
if (rc)
return rc;
}
- if (resource_size(&cxlds->ram_res)) {
- offset = cxlds->ram_res.start;
- length = resource_size(&cxlds->ram_res);
+ if (cxl_ram_size(cxlds)) {
+ const struct resource *res = to_ram_res(cxlds);
+
+ offset = res->start;
+ length = resource_size(res);
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
/*
* Invalid Physical Address is not an error for
@@ -409,9 +413,8 @@ static ssize_t pmem_qos_class_show(struct device *dev,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
+ return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class);
}
static struct device_attribute dev_attr_pmem_qos_class =
@@ -428,9 +431,8 @@ static ssize_t ram_qos_class_show(struct device *dev,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
+ return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class);
}
static struct device_attribute dev_attr_ram_qos_class =
@@ -466,11 +468,11 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds);
- if (a == &dev_attr_ram_qos_class.attr)
- if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
- return 0;
+ if (a == &dev_attr_ram_qos_class.attr &&
+ (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
+ return 0;
return a->mode;
}
@@ -485,11 +487,11 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds);
- if (a == &dev_attr_pmem_qos_class.attr)
- if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
- return 0;
+ if (a == &dev_attr_pmem_qos_class.attr &&
+ (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
+ return 0;
return a->mode;
}
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 34aa3d45fafa..6706a1b79549 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2701,7 +2701,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
if (ctx->mode == CXL_DECODER_RAM) {
offset = ctx->offset;
- length = resource_size(&cxlds->ram_res) - offset;
+ length = cxl_ram_size(cxlds) - offset;
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
if (rc == -EFAULT)
rc = 0;
@@ -2713,9 +2713,11 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
length = resource_size(&cxlds->dpa_res) - offset;
if (!length)
return 0;
- } else if (resource_size(&cxlds->pmem_res)) {
- offset = cxlds->pmem_res.start;
- length = resource_size(&cxlds->pmem_res);
+ } else if (cxl_pmem_size(cxlds)) {
+ const struct resource *res = to_pmem_res(cxlds);
+
+ offset = res->start;
+ length = resource_size(res);
} else {
return 0;
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2a25d1957ddb..78e92e24d7b5 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -423,8 +423,8 @@ struct cxl_dpa_perf {
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
* @media_ready: Indicate whether the device media is usable
* @dpa_res: Overall DPA resource tree for the device
- * @pmem_res: Active Persistent memory capacity configuration
- * @ram_res: Active Volatile memory capacity configuration
+ * @_pmem_res: Active Persistent memory capacity configuration
+ * @_ram_res: Active Volatile memory capacity configuration
* @serial: PCIe Device Serial Number
* @type: Generic Memory Class device or Vendor Specific Memory device
* @cxl_mbox: CXL mailbox context
@@ -438,13 +438,41 @@ struct cxl_dev_state {
bool rcd;
bool media_ready;
struct resource dpa_res;
- struct resource pmem_res;
- struct resource ram_res;
+ struct resource _pmem_res;
+ struct resource _ram_res;
u64 serial;
enum cxl_devtype type;
struct cxl_mailbox cxl_mbox;
};
+static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
+{
+ return &cxlds->_ram_res;
+}
+
+static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
+{
+ return &cxlds->_pmem_res;
+}
+
+static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
+{
+ const struct resource *res = to_ram_res(cxlds);
+
+ if (!res)
+ return 0;
+ return resource_size(res);
+}
+
+static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
+{
+ const struct resource *res = to_pmem_res(cxlds);
+
+ if (!res)
+ return 0;
+ return resource_size(res);
+}
+
static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
{
return dev_get_drvdata(cxl_mbox->host);
@@ -471,8 +499,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
* @active_persistent_bytes: sum of hard + soft persistent
* @next_volatile_bytes: volatile capacity change pending device reset
* @next_persistent_bytes: persistent capacity change pending device reset
- * @ram_perf: performance data entry matched to RAM partition
- * @pmem_perf: performance data entry matched to PMEM partition
+ * @_ram_perf: performance data entry matched to RAM partition
+ * @_pmem_perf: performance data entry matched to PMEM partition
* @event: event log driver state
* @poison: poison driver state info
* @security: security driver state info
@@ -496,8 +524,8 @@ struct cxl_memdev_state {
u64 next_volatile_bytes;
u64 next_persistent_bytes;
- struct cxl_dpa_perf ram_perf;
- struct cxl_dpa_perf pmem_perf;
+ struct cxl_dpa_perf _ram_perf;
+ struct cxl_dpa_perf _pmem_perf;
struct cxl_event_state event;
struct cxl_poison_state poison;
@@ -505,6 +533,20 @@ struct cxl_memdev_state {
struct cxl_fw_state fw;
};
+static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
+{
+ struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
+
+ return &mds->_ram_perf;
+}
+
+static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
+{
+ struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
+
+ return &mds->_pmem_perf;
+}
+
static inline struct cxl_memdev_state *
to_cxl_memdev_state(struct cxl_dev_state *cxlds)
{
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 2f03a4d5606e..9675243bd05b 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -152,7 +152,7 @@ static int cxl_mem_probe(struct device *dev)
return -ENXIO;
}
- if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
+ if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
if (rc) {
if (rc == -ENODEV)
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index cc8948f49117..9654513bbccf 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1000,25 +1000,28 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
find_cxl_root(port);
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
- struct range pmem_range = {
- .start = cxlds->pmem_res.start,
- .end = cxlds->pmem_res.end,
+ const struct resource *partition[] = {
+ to_ram_res(cxlds),
+ to_pmem_res(cxlds),
};
- struct range ram_range = {
- .start = cxlds->ram_res.start,
- .end = cxlds->ram_res.end,
+ struct cxl_dpa_perf *perf[] = {
+ to_ram_perf(cxlds),
+ to_pmem_perf(cxlds),
};
if (!cxl_root)
return;
- if (range_len(&ram_range))
- dpa_perf_setup(port, &ram_range, &mds->ram_perf);
+ for (int i = 0; i < ARRAY_SIZE(partition); i++) {
+ const struct resource *res = partition[i];
+ struct range range = {
+ .start = res->start,
+ .end = res->end,
+ };
- if (range_len(&pmem_range))
- dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
+ dpa_perf_setup(port, &range, perf[i]);
+ }
cxl_memdev_update_perf(cxlmd);
* Re: [PATCH v3 2/6] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-02-04 4:24 ` [PATCH v3 2/6] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
@ 2025-02-04 11:30 ` Jonathan Cameron
2025-02-04 17:50 ` Fan Ni
1 sibling, 0 replies; 19+ messages in thread
From: Jonathan Cameron @ 2025-02-04 11:30 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Alejandro Lucero, Ira Weiny, Dave Jiang
On Mon, 03 Feb 2025 20:24:12 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> In preparation for consolidating all DPA partition information into an
> array of DPA metadata, introduce helpers that hide the layout of the
> current data. I.e. make the eventual replacement of ->ram_res,
> ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> no-op for code paths that consume that information, and reduce the noise
> of follow-on patches.
>
> The end goal is to consolidate all DPA information in 'struct
> cxl_dev_state', but for now the helpers just make it appear that all DPA
> metadata is relative to @cxlds.
>
> As the conversion to generic partition metadata walking is completed,
> these helpers will naturally be eliminated, or reduced in scope.
>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
One observation inline. I don't care much either way.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/cdat.c | 71 +++++++++++++++++++++++++++---------------
> drivers/cxl/core/hdm.c | 26 ++++++++-------
> drivers/cxl/core/mbox.c | 18 ++++++-----
> drivers/cxl/core/memdev.c | 42 +++++++++++++------------
> drivers/cxl/core/region.c | 10 ++++--
> drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++----
> drivers/cxl/mem.c | 2 +
> tools/testing/cxl/test/cxl.c | 25 ++++++++-------
> 8 files changed, 162 insertions(+), 90 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 8153f8d83a16..797baad483cb 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -258,27 +258,36 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
> static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> struct xarray *dsmas_xa)
> {
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct device *dev = cxlds->dev;
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> - };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> - };
> struct dsmas_entry *dent;
> unsigned long index;
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> + };
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> + };
>
> xa_for_each(dsmas_xa, index, dent) {
> - if (resource_size(&cxlds->ram_res) &&
> - range_contains(&ram_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->ram_perf);
> - else if (resource_size(&cxlds->pmem_res) &&
> - range_contains(&pmem_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->pmem_perf);
> - else
> + bool found = false;
> +
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
> +
> + if (range_contains(&range, &dent->dpa_range)) {
> + update_perf_entry(dev, dent, perf[i]);
> + found = true;
> + break;
> + }
> + }
> +
> + if (!found)
Could use a check on whether we got to the end of the array, if you expand
the scope of i outside the loop:
if (i == ARRAY_SIZE(partition))
dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
&dent->dpa_range);
I don't care much either way.
> dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> &dent->dpa_range);
> }
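As a standalone, user-space illustration of the idiom suggested above (type and
function names are simplified stand-ins for the kernel's `struct range` and
`range_contains()`, not the actual driver code):

```c
#include <assert.h>
#include <stdbool.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Simplified stand-in for the kernel's struct range */
struct range {
	long long start;
	long long end;
};

static bool range_contains(const struct range *outer, const struct range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/*
 * The variant sketched in the review: hoist i out of the loop and test
 * i == nr on exit, instead of carrying a separate 'found' flag.
 * Returns the matching partition index, or -1 when the range maps to
 * no partition.
 */
static int find_partition(const struct range *partition, int nr,
			  const struct range *dpa_range)
{
	int i;

	for (i = 0; i < nr; i++)
		if (range_contains(&partition[i], dpa_range))
			break;

	if (i == nr)	/* reached the end of the array: no match */
		return -1;
	return i;
}
```

Either style is correct; the end-of-array test just avoids the extra flag
variable at the cost of widening i's scope.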
^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-02-04 4:24 ` [PATCH v3 2/6] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-02-04 11:30 ` Jonathan Cameron
@ 2025-02-04 17:50 ` Fan Ni
1 sibling, 0 replies; 19+ messages in thread
From: Fan Ni @ 2025-02-04 17:50 UTC (permalink / raw)
To: Dan Williams
Cc: linux-cxl, Alejandro Lucero, Ira Weiny, Dave Jiang,
Jonathan.Cameron
On Mon, Feb 03, 2025 at 08:24:12PM -0800, Dan Williams wrote:
> In preparation for consolidating all DPA partition information into an
> array of DPA metadata, introduce helpers that hide the layout of the
> current data. I.e. make the eventual replacement of ->ram_res,
> ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> no-op for code paths that consume that information, and reduce the noise
> of follow-on patches.
>
> The end goal is to consolidate all DPA information in 'struct
> cxl_dev_state', but for now the helpers just make it appear that all DPA
> metadata is relative to @cxlds.
>
> As the conversion to generic partition metadata walking is completed,
> these helpers will naturally be eliminated, or reduced in scope.
>
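The indirection described above can be sketched in user space as follows
(structure fields are simplified stand-ins; the real `struct resource` and the
helpers live in drivers/cxl/cxlmem.h):

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct resource */
struct resource {
	unsigned long long start;
	unsigned long long end;
};

struct cxl_dev_state {
	/* leading underscore: direct access discouraged, use the helpers */
	struct resource _ram_res;
	struct resource _pmem_res;
};

/*
 * Consumers see only the helpers, so the backing layout can later be
 * swapped for a partition array without touching any call sites.
 */
static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
{
	return &cxlds->_ram_res;
}

static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
{
	return &cxlds->_pmem_res;
}

static inline unsigned long long cxl_ram_size(struct cxl_dev_state *cxlds)
{
	const struct resource *res = to_ram_res(cxlds);

	if (!res)
		return 0;
	return res->end - res->start + 1;
}
```

The NULL check in the size helper is what lets a later conversion return no
resource at all for a partition that does not exist.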
> Cc: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> drivers/cxl/core/cdat.c | 71 +++++++++++++++++++++++++++---------------
> drivers/cxl/core/hdm.c | 26 ++++++++-------
> drivers/cxl/core/mbox.c | 18 ++++++-----
> drivers/cxl/core/memdev.c | 42 +++++++++++++------------
> drivers/cxl/core/region.c | 10 ++++--
> drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++----
> drivers/cxl/mem.c | 2 +
> tools/testing/cxl/test/cxl.c | 25 ++++++++-------
> 8 files changed, 162 insertions(+), 90 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 8153f8d83a16..797baad483cb 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -258,27 +258,36 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
> static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> struct xarray *dsmas_xa)
> {
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct device *dev = cxlds->dev;
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> - };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> - };
> struct dsmas_entry *dent;
> unsigned long index;
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> + };
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> + };
>
> xa_for_each(dsmas_xa, index, dent) {
> - if (resource_size(&cxlds->ram_res) &&
> - range_contains(&ram_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->ram_perf);
> - else if (resource_size(&cxlds->pmem_res) &&
> - range_contains(&pmem_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->pmem_perf);
> - else
> + bool found = false;
> +
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
> +
> + if (range_contains(&range, &dent->dpa_range)) {
> + update_perf_entry(dev, dent, perf[i]);
> + found = true;
> + break;
> + }
> + }
> +
> + if (!found)
> dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> &dent->dpa_range);
> }
> @@ -304,6 +313,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
>
> static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
> {
> + if (!dpa_perf)
> + return;
> +
> *dpa_perf = (struct cxl_dpa_perf) {
> .qos_class = CXL_QOS_CLASS_INVALID,
> };
> @@ -312,6 +324,9 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
> static bool cxl_qos_match(struct cxl_port *root_port,
> struct cxl_dpa_perf *dpa_perf)
> {
> + if (!dpa_perf)
> + return false;
> +
> if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
> return false;
>
> @@ -346,7 +361,8 @@ static int match_cxlrd_hb(struct device *dev, void *data)
> static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
> {
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds),
> + *pmem_perf = to_pmem_perf(cxlds);
> struct cxl_port *root_port;
> int rc;
>
> @@ -359,17 +375,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
> root_port = &cxl_root->port;
>
> /* Check that the QTG IDs are all sane between end device and root decoders */
> - if (!cxl_qos_match(root_port, &mds->ram_perf))
> - reset_dpa_perf(&mds->ram_perf);
> - if (!cxl_qos_match(root_port, &mds->pmem_perf))
> - reset_dpa_perf(&mds->pmem_perf);
> + if (!cxl_qos_match(root_port, ram_perf))
> + reset_dpa_perf(ram_perf);
> + if (!cxl_qos_match(root_port, pmem_perf))
> + reset_dpa_perf(pmem_perf);
>
> /* Check to make sure that the device's host bridge is under a root decoder */
> rc = device_for_each_child(&root_port->dev,
> cxlmd->endpoint->host_bridge, match_cxlrd_hb);
> if (!rc) {
> - reset_dpa_perf(&mds->ram_perf);
> - reset_dpa_perf(&mds->pmem_perf);
> + reset_dpa_perf(ram_perf);
> + reset_dpa_perf(pmem_perf);
> }
>
> return rc;
> @@ -574,20 +590,23 @@ static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxle
> enum cxl_decoder_mode mode)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_dpa_perf *perf;
>
> switch (mode) {
> case CXL_DECODER_RAM:
> - perf = &mds->ram_perf;
> + perf = to_ram_perf(cxlds);
> break;
> case CXL_DECODER_PMEM:
> - perf = &mds->pmem_perf;
> + perf = to_pmem_perf(cxlds);
> break;
> default:
> return ERR_PTR(-EINVAL);
> }
>
> + if (!perf)
> + return ERR_PTR(-EINVAL);
> +
> if (!dpa_perf_contains(perf, cxled->dpa_res))
> return ERR_PTR(-EINVAL);
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index b7f6a2d69f78..48b26d3dad26 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - if (resource_contains(&cxlds->pmem_res, res))
> + if (resource_contains(to_pmem_res(cxlds), res))
> cxled->mode = CXL_DECODER_PMEM;
> - else if (resource_contains(&cxlds->ram_res, res))
> + else if (resource_contains(to_ram_res(cxlds), res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> @@ -442,11 +442,11 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> * Only allow modes that are supported by the current partition
> * configuration
> */
> - if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
> + if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> dev_dbg(dev, "no available pmem capacity\n");
> return -ENXIO;
> }
> - if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
> + if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
> dev_dbg(dev, "no available ram capacity\n");
> return -ENXIO;
> }
> @@ -464,6 +464,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> struct device *dev = &cxled->cxld.dev;
> resource_size_t start, avail, skip;
> struct resource *p, *last;
> + const struct resource *ram_res = to_ram_res(cxlds);
> + const struct resource *pmem_res = to_pmem_res(cxlds);
> int rc;
>
> down_write(&cxl_dpa_rwsem);
> @@ -480,37 +482,37 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
> + for (p = ram_res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> free_ram_start = last->end + 1;
> else
> - free_ram_start = cxlds->ram_res.start;
> + free_ram_start = ram_res->start;
>
> - for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
> + for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> free_pmem_start = last->end + 1;
> else
> - free_pmem_start = cxlds->pmem_res.start;
> + free_pmem_start = pmem_res->start;
>
> if (cxled->mode == CXL_DECODER_RAM) {
> start = free_ram_start;
> - avail = cxlds->ram_res.end - start + 1;
> + avail = ram_res->end - start + 1;
> skip = 0;
> } else if (cxled->mode == CXL_DECODER_PMEM) {
> resource_size_t skip_start, skip_end;
>
> start = free_pmem_start;
> - avail = cxlds->pmem_res.end - start + 1;
> + avail = pmem_res->end - start + 1;
> skip_start = free_ram_start;
>
> /*
> * If some pmem is already allocated, then that allocation
> * already handled the skip.
> */
> - if (cxlds->pmem_res.child &&
> - skip_start == cxlds->pmem_res.child->start)
> + if (pmem_res->child &&
> + skip_start == pmem_res->child->start)
> skip_end = skip_start - 1;
> else
> skip_end = start - 1;
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 548564c770c0..3502f1633ad2 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent,
> int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct resource *ram_res = to_ram_res(cxlds);
> + struct resource *pmem_res = to_pmem_res(cxlds);
> struct device *dev = cxlds->dev;
> int rc;
>
> if (!cxlds->media_ready) {
> cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> - cxlds->ram_res = DEFINE_RES_MEM(0, 0);
> - cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
> + *ram_res = DEFINE_RES_MEM(0, 0);
> + *pmem_res = DEFINE_RES_MEM(0, 0);
> return 0;
> }
>
> cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
>
> if (mds->partition_align_bytes == 0) {
> - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> mds->volatile_only_bytes, "ram");
> if (rc)
> return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> mds->volatile_only_bytes,
> mds->persistent_only_bytes, "pmem");
> }
> @@ -1298,11 +1300,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> return rc;
> }
>
> - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> mds->active_volatile_bytes, "ram");
> if (rc)
> return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> mds->active_volatile_bytes,
> mds->active_persistent_bytes, "pmem");
> }
> @@ -1450,8 +1452,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> mds->cxlds.reg_map.host = dev;
> mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
> - mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
> - mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
> + to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
> + to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
>
> return mds;
> }
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index ae3dfcbe8938..c5f8320ed330 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(&cxlds->ram_res);
> + unsigned long long len = resource_size(to_ram_res(cxlds));
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> @@ -93,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(&cxlds->pmem_res);
> + unsigned long long len = cxl_pmem_size(cxlds);
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> @@ -198,16 +198,20 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> int rc = 0;
>
> /* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
> - if (resource_size(&cxlds->pmem_res)) {
> - offset = cxlds->pmem_res.start;
> - length = resource_size(&cxlds->pmem_res);
> + if (cxl_pmem_size(cxlds)) {
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> if (rc)
> return rc;
> }
> - if (resource_size(&cxlds->ram_res)) {
> - offset = cxlds->ram_res.start;
> - length = resource_size(&cxlds->ram_res);
> + if (cxl_ram_size(cxlds)) {
> + const struct resource *res = to_ram_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> /*
> * Invalid Physical Address is not an error for
> @@ -409,9 +413,8 @@ static ssize_t pmem_qos_class_show(struct device *dev,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>
> - return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
> + return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class);
> }
>
> static struct device_attribute dev_attr_pmem_qos_class =
> @@ -428,9 +431,8 @@ static ssize_t ram_qos_class_show(struct device *dev,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>
> - return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
> + return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class);
> }
>
> static struct device_attribute dev_attr_ram_qos_class =
> @@ -466,11 +468,11 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
> {
> struct device *dev = kobj_to_dev(kobj);
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds);
>
> - if (a == &dev_attr_ram_qos_class.attr)
> - if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
> - return 0;
> + if (a == &dev_attr_ram_qos_class.attr &&
> + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
> + return 0;
>
> return a->mode;
> }
> @@ -485,11 +487,11 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n
> {
> struct device *dev = kobj_to_dev(kobj);
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds);
>
> - if (a == &dev_attr_pmem_qos_class.attr)
> - if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
> - return 0;
> + if (a == &dev_attr_pmem_qos_class.attr &&
> + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
> + return 0;
>
> return a->mode;
> }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 34aa3d45fafa..6706a1b79549 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2701,7 +2701,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
>
> if (ctx->mode == CXL_DECODER_RAM) {
> offset = ctx->offset;
> - length = resource_size(&cxlds->ram_res) - offset;
> + length = cxl_ram_size(cxlds) - offset;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> if (rc == -EFAULT)
> rc = 0;
> @@ -2713,9 +2713,11 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
> length = resource_size(&cxlds->dpa_res) - offset;
> if (!length)
> return 0;
> - } else if (resource_size(&cxlds->pmem_res)) {
> - offset = cxlds->pmem_res.start;
> - length = resource_size(&cxlds->pmem_res);
> + } else if (cxl_pmem_size(cxlds)) {
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> } else {
> return 0;
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 2a25d1957ddb..78e92e24d7b5 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -423,8 +423,8 @@ struct cxl_dpa_perf {
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> - * @pmem_res: Active Persistent memory capacity configuration
> - * @ram_res: Active Volatile memory capacity configuration
> + * @_pmem_res: Active Persistent memory capacity configuration
> + * @_ram_res: Active Volatile memory capacity configuration
> * @serial: PCIe Device Serial Number
> * @type: Generic Memory Class device or Vendor Specific Memory device
> * @cxl_mbox: CXL mailbox context
> @@ -438,13 +438,41 @@ struct cxl_dev_state {
> bool rcd;
> bool media_ready;
> struct resource dpa_res;
> - struct resource pmem_res;
> - struct resource ram_res;
> + struct resource _pmem_res;
> + struct resource _ram_res;
> u64 serial;
> enum cxl_devtype type;
> struct cxl_mailbox cxl_mbox;
> };
>
> +static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +{
> + return &cxlds->_ram_res;
> +}
> +
> +static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +{
> + return &cxlds->_pmem_res;
> +}
> +
> +static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
> +{
> + const struct resource *res = to_ram_res(cxlds);
> +
> + if (!res)
> + return 0;
> + return resource_size(res);
> +}
> +
> +static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> +{
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + if (!res)
> + return 0;
> + return resource_size(res);
> +}
> +
> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> {
> return dev_get_drvdata(cxl_mbox->host);
> @@ -471,8 +499,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> * @active_persistent_bytes: sum of hard + soft persistent
> * @next_volatile_bytes: volatile capacity change pending device reset
> * @next_persistent_bytes: persistent capacity change pending device reset
> - * @ram_perf: performance data entry matched to RAM partition
> - * @pmem_perf: performance data entry matched to PMEM partition
> + * @_ram_perf: performance data entry matched to RAM partition
> + * @_pmem_perf: performance data entry matched to PMEM partition
> * @event: event log driver state
> * @poison: poison driver state info
> * @security: security driver state info
> @@ -496,8 +524,8 @@ struct cxl_memdev_state {
> u64 next_volatile_bytes;
> u64 next_persistent_bytes;
>
> - struct cxl_dpa_perf ram_perf;
> - struct cxl_dpa_perf pmem_perf;
> + struct cxl_dpa_perf _ram_perf;
> + struct cxl_dpa_perf _pmem_perf;
>
> struct cxl_event_state event;
> struct cxl_poison_state poison;
> @@ -505,6 +533,20 @@ struct cxl_memdev_state {
> struct cxl_fw_state fw;
> };
>
> +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> +
> + return &mds->_ram_perf;
> +}
> +
> +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> +
> + return &mds->_pmem_perf;
> +}
> +
> static inline struct cxl_memdev_state *
> to_cxl_memdev_state(struct cxl_dev_state *cxlds)
> {
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 2f03a4d5606e..9675243bd05b 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -152,7 +152,7 @@ static int cxl_mem_probe(struct device *dev)
> return -ENXIO;
> }
>
> - if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> + if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
> if (rc) {
> if (rc == -ENODEV)
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index cc8948f49117..9654513bbccf 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1000,25 +1000,28 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
> find_cxl_root(port);
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> };
>
> if (!cxl_root)
> return;
>
> - if (range_len(&ram_range))
> - dpa_perf_setup(port, &ram_range, &mds->ram_perf);
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
>
> - if (range_len(&pmem_range))
> - dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
> + dpa_perf_setup(port, &range, perf[i]);
> + }
>
> cxl_memdev_update_perf(cxlmd);
>
>
--
Fan Ni
* [PATCH v3 3/6] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
2025-02-04 4:24 ` [PATCH v3 1/6] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-02-04 4:24 ` [PATCH v3 2/6] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
@ 2025-02-04 4:24 ` Dan Williams
2025-02-04 11:50 ` Jonathan Cameron
2025-02-04 4:24 ` [PATCH v3 4/6] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
` (4 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2025-02-04 4:24 UTC (permalink / raw)
To: linux-cxl
Cc: Ira Weiny, Dave Jiang, Alejandro Lucero, dave.jiang,
Jonathan.Cameron
The pending efforts to add CXL Accelerator (type-2) device support [1] and
Dynamic Capacity (DCD) support [2] tripped on the
no-longer-fit-for-purpose design in the CXL subsystem for tracking
device-physical-address (DPA) metadata. Trip hazards include:
- CXL Memory Devices need to consider a PMEM partition, but Accelerator
devices with CXL.mem likely do not in the common case.
- CXL Memory Devices enumerate DPA through Memory Device mailbox
commands like Partition Info; Accelerator devices do not.
- CXL Memory Devices that support DCD support more than 2 partitions.
Some of the driver algorithms are awkward to expand to > 2 partition
cases.
- DPA performance data is a general capability that can be shared with
accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
suitable.
- Hardcoded assumptions around the PMEM partition always being index-1,
even when RAM is zero-sized or PMEM is zero-sized.
- 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
memory property; it should be phased out in favor of a partition id,
with the memory property coming from the partition info.
Towards cleaning up those issues and allowing a smoother landing for the
aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
way for Memory Devices and Accelerators to initialize the DPA information
in 'struct cxl_dev_state'.
For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
get the new data structure initialized, and clean up some qos_class
initialization.
Follow-on patches will go further and use the new data structure to
clean up algorithms that are better suited to looping over all possible
partitions.
cxl_dpa_setup() follows the locking expectations of mutating the device
DPA map, and is suitable for Accelerator drivers to use. Accelerators
likely only have one hardcoded 'ram' partition to convey to the
cxl_core.
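As a user-space sketch of that contract (the types mirror the cxl_dpa_info /
cxl_dpa_part_info layout introduced by the patch below; the validation logic
paraphrases the ordered + contiguous check in cxl_dpa_setup(), and is not the
kernel code itself):

```c
#include <assert.h>

enum cxl_partition_mode {
	CXL_PARTMODE_RAM,
	CXL_PARTMODE_PMEM,
};

#define CXL_NR_PARTITIONS_MAX 2

struct range {
	unsigned long long start;
	unsigned long long end;
};

struct cxl_dpa_info {
	unsigned long long size;
	struct cxl_dpa_part_info {
		struct range range;
		enum cxl_partition_mode mode;
	} part[CXL_NR_PARTITIONS_MAX];
	int nr_partitions;
};

/*
 * The ordered + contiguous requirement enforced by cxl_dpa_setup():
 * each partition must start immediately after its predecessor ends.
 */
static int validate_dpa_info(const struct cxl_dpa_info *info)
{
	for (int i = 1; i < info->nr_partitions; i++) {
		const struct cxl_dpa_part_info *prev = &info->part[i - 1];

		if (prev->range.end + 1 != info->part[i].range.start)
			return -22;	/* -EINVAL */
	}
	return 0;
}
```

An accelerator driver would fill in a single CXL_PARTMODE_RAM entry and hand
the info to cxl_dpa_setup(); a memory device fills in up to two entries from
its mailbox-reported partition layout.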
Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/cdat.c | 15 ++-----
drivers/cxl/core/hdm.c | 88 +++++++++++++++++++++++++++++++++++++++-
drivers/cxl/core/mbox.c | 68 ++++++++++---------------------
drivers/cxl/core/memdev.c | 2 -
drivers/cxl/cxlmem.h | 93 +++++++++++++++++++++++++++++-------------
drivers/cxl/pci.c | 7 +++
tools/testing/cxl/test/cxl.c | 15 ++-----
tools/testing/cxl/test/mem.c | 7 +++
8 files changed, 195 insertions(+), 100 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 797baad483cb..33cabe4aa026 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -261,27 +261,20 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
struct device *dev = cxlds->dev;
struct dsmas_entry *dent;
unsigned long index;
- const struct resource *partition[] = {
- to_ram_res(cxlds),
- to_pmem_res(cxlds),
- };
- struct cxl_dpa_perf *perf[] = {
- to_ram_perf(cxlds),
- to_pmem_perf(cxlds),
- };
xa_for_each(dsmas_xa, index, dent) {
bool found = false;
- for (int i = 0; i < ARRAY_SIZE(partition); i++) {
- const struct resource *res = partition[i];
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ struct resource *res = &cxlds->part[i].res;
struct range range = {
.start = res->start,
.end = res->end,
};
if (range_contains(&range, &dent->dpa_range)) {
- update_perf_entry(dev, dent, perf[i]);
+ update_perf_entry(dev, dent,
+ &cxlds->part[i].perf);
found = true;
break;
}
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 48b26d3dad26..ea4e792f789f 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
cxled->dpa_res = res;
cxled->skip = skipped;
- if (resource_contains(to_pmem_res(cxlds), res))
+ if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
cxled->mode = CXL_DECODER_PMEM;
- else if (resource_contains(to_ram_res(cxlds), res))
+ else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
cxled->mode = CXL_DECODER_RAM;
else {
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
@@ -342,6 +342,90 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
return 0;
}
+static int add_dpa_res(struct device *dev, struct resource *parent,
+ struct resource *res, resource_size_t start,
+ resource_size_t size, const char *type)
+{
+ int rc;
+
+ *res = (struct resource) {
+ .name = type,
+ .start = start,
+ .end = start + size - 1,
+ .flags = IORESOURCE_MEM,
+ };
+ if (resource_size(res) == 0) {
+ dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
+ return 0;
+ }
+ rc = request_resource(parent, res);
+ if (rc) {
+ dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
+ res, rc);
+ return rc;
+ }
+
+ dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
+
+ return 0;
+}
+
+static const char *cxl_mode_name(enum cxl_partition_mode mode)
+{
+ switch (mode) {
+ case CXL_PARTMODE_RAM:
+ return "ram";
+ case CXL_PARTMODE_PMEM:
+ return "pmem";
+ default:
+ return "";
+ };
+}
+
+/* if this fails the caller must destroy @cxlds, there is no recovery */
+int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
+{
+ struct device *dev = cxlds->dev;
+
+ guard(rwsem_write)(&cxl_dpa_rwsem);
+
+ if (cxlds->nr_partitions)
+ return -EBUSY;
+
+ if (!info->size || !info->nr_partitions) {
+ cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
+ cxlds->nr_partitions = 0;
+ return 0;
+ }
+
+ cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
+
+ for (int i = 0; i < info->nr_partitions; i++) {
+ const struct cxl_dpa_part_info *part = &info->part[i];
+ int rc;
+
+ cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
+ cxlds->part[i].mode = part->mode;
+
+ /* Require ordered + contiguous partitions */
+ if (i) {
+ const struct cxl_dpa_part_info *prev = &info->part[i - 1];
+
+ if (prev->range.end + 1 != part->range.start)
+ return -EINVAL;
+ }
+ rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
+ part->range.start, range_len(&part->range),
+ cxl_mode_name(part->mode));
+ if (rc)
+ return rc;
+ cxlds->nr_partitions++;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_dpa_setup);
+
int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 3502f1633ad2..62bb3653362f 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1241,57 +1241,39 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
return rc;
}
-static int add_dpa_res(struct device *dev, struct resource *parent,
- struct resource *res, resource_size_t start,
- resource_size_t size, const char *type)
+static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
{
- int rc;
+ int i = info->nr_partitions;
- res->name = type;
- res->start = start;
- res->end = start + size - 1;
- res->flags = IORESOURCE_MEM;
- if (resource_size(res) == 0) {
- dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
- return 0;
- }
- rc = request_resource(parent, res);
- if (rc) {
- dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
- res, rc);
- return rc;
- }
-
- dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
+ if (size == 0)
+ return;
- return 0;
+ info->part[i].range = (struct range) {
+ .start = start,
+ .end = start + size - 1,
+ };
+ info->part[i].mode = mode;
+ info->nr_partitions++;
}
-int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
+int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
- struct resource *ram_res = to_ram_res(cxlds);
- struct resource *pmem_res = to_pmem_res(cxlds);
struct device *dev = cxlds->dev;
int rc;
if (!cxlds->media_ready) {
- cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
- *ram_res = DEFINE_RES_MEM(0, 0);
- *pmem_res = DEFINE_RES_MEM(0, 0);
+ info->size = 0;
return 0;
}
- cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
+ info->size = mds->total_bytes;
if (mds->partition_align_bytes == 0) {
- rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
- mds->volatile_only_bytes, "ram");
- if (rc)
- return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
- mds->volatile_only_bytes,
- mds->persistent_only_bytes, "pmem");
+ add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->volatile_only_bytes,
+ mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
+ return 0;
}
rc = cxl_mem_get_partition_info(mds);
@@ -1300,15 +1282,13 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
return rc;
}
- rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
- mds->active_volatile_bytes, "ram");
- if (rc)
- return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
- mds->active_volatile_bytes,
- mds->active_persistent_bytes, "pmem");
+ add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
+ CXL_PARTMODE_PMEM);
+
+ return 0;
}
-EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
int cxl_set_timestamp(struct cxl_memdev_state *mds)
{
@@ -1452,8 +1432,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->cxlds.reg_map.host = dev;
mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
- to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
- to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
return mds;
}
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index c5f8320ed330..be0eb57086e1 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- unsigned long long len = resource_size(to_ram_res(cxlds));
+ unsigned long long len = cxl_ram_size(cxlds);
return sysfs_emit(buf, "%#llx\n", len);
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 78e92e24d7b5..e33b2d5efed9 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -97,6 +97,24 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
+enum cxl_partition_mode {
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
+#define CXL_NR_PARTITIONS_MAX 2
+
+struct cxl_dpa_info {
+ u64 size;
+ struct cxl_dpa_part_info {
+ struct range range;
+ enum cxl_partition_mode mode;
+ } part[CXL_NR_PARTITIONS_MAX];
+ int nr_partitions;
+};
+
+int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
+
static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
struct cxl_memdev *cxlmd)
{
@@ -408,6 +426,18 @@ struct cxl_dpa_perf {
int qos_class;
};
+/**
+ * struct cxl_dpa_partition - DPA partition descriptor
+ * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
+ * @perf: performance attributes of the partition from CDAT
+ * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
+ */
+struct cxl_dpa_partition {
+ struct resource res;
+ struct cxl_dpa_perf perf;
+ enum cxl_partition_mode mode;
+};
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -423,8 +453,8 @@ struct cxl_dpa_perf {
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
* @media_ready: Indicate whether the device media is usable
* @dpa_res: Overall DPA resource tree for the device
- * @_pmem_res: Active Persistent memory capacity configuration
- * @_ram_res: Active Volatile memory capacity configuration
+ * @part: DPA partition array
+ * @nr_partitions: Number of DPA partitions
* @serial: PCIe Device Serial Number
* @type: Generic Memory Class device or Vendor Specific Memory device
* @cxl_mbox: CXL mailbox context
@@ -438,21 +468,47 @@ struct cxl_dev_state {
bool rcd;
bool media_ready;
struct resource dpa_res;
- struct resource _pmem_res;
- struct resource _ram_res;
+ struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
+ unsigned int nr_partitions;
u64 serial;
enum cxl_devtype type;
struct cxl_mailbox cxl_mbox;
};
-static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
+
+/* Static RAM is only expected at partition 0. */
+static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
+{
+ if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
+ return NULL;
+ return &cxlds->part[0].res;
+}
+
+/*
+ * Static PMEM may be at partition index 0 when there is no static RAM
+ * capacity.
+ */
+static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
+{
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
+ return &cxlds->part[i].res;
+ return NULL;
+}
+
+static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
{
- return &cxlds->_ram_res;
+ if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
+ return NULL;
+ return &cxlds->part[0].perf;
}
-static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
+static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
{
- return &cxlds->_pmem_res;
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
+ return &cxlds->part[i].perf;
+ return NULL;
}
static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
@@ -499,8 +555,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
* @active_persistent_bytes: sum of hard + soft persistent
* @next_volatile_bytes: volatile capacity change pending device reset
* @next_persistent_bytes: persistent capacity change pending device reset
- * @_ram_perf: performance data entry matched to RAM partition
- * @_pmem_perf: performance data entry matched to PMEM partition
* @event: event log driver state
* @poison: poison driver state info
* @security: security driver state info
@@ -524,29 +578,12 @@ struct cxl_memdev_state {
u64 next_volatile_bytes;
u64 next_persistent_bytes;
- struct cxl_dpa_perf _ram_perf;
- struct cxl_dpa_perf _pmem_perf;
-
struct cxl_event_state event;
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
};
-static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
-{
- struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
-
- return &mds->_ram_perf;
-}
-
-static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
-{
- struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
-
- return &mds->_pmem_perf;
-}
-
static inline struct cxl_memdev_state *
to_cxl_memdev_state(struct cxl_dev_state *cxlds)
{
@@ -860,7 +897,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
-int cxl_mem_create_range_info(struct cxl_memdev_state *mds);
+int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index a96e54c6259e..b2c943a4de0a 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -903,6 +903,7 @@ __ATTRIBUTE_GROUPS(cxl_rcd);
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
+ struct cxl_dpa_info range_info = { 0 };
struct cxl_memdev_state *mds;
struct cxl_dev_state *cxlds;
struct cxl_register_map map;
@@ -993,7 +994,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
- rc = cxl_mem_create_range_info(mds);
+ rc = cxl_mem_dpa_fetch(mds, &range_info);
+ if (rc)
+ return rc;
+
+ rc = cxl_dpa_setup(cxlds, &range_info);
if (rc)
return rc;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 9654513bbccf..083a66a52731 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
- const struct resource *partition[] = {
- to_ram_res(cxlds),
- to_pmem_res(cxlds),
- };
- struct cxl_dpa_perf *perf[] = {
- to_ram_perf(cxlds),
- to_pmem_perf(cxlds),
- };
if (!cxl_root)
return;
- for (int i = 0; i < ARRAY_SIZE(partition); i++) {
- const struct resource *res = partition[i];
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ struct resource *res = &cxlds->part[i].res;
+ struct cxl_dpa_perf *perf = &cxlds->part[i].perf;
struct range range = {
.start = res->start,
.end = res->end,
};
- dpa_perf_setup(port, &range, perf[i]);
+ dpa_perf_setup(port, &range, perf);
}
cxl_memdev_update_perf(cxlmd);
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 8d731bd63988..495199238335 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1494,6 +1494,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
struct cxl_dev_state *cxlds;
struct cxl_mockmem_data *mdata;
struct cxl_mailbox *cxl_mbox;
+ struct cxl_dpa_info range_info = { 0 };
int rc;
mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
@@ -1554,7 +1555,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
- rc = cxl_mem_create_range_info(mds);
+ rc = cxl_mem_dpa_fetch(mds, &range_info);
+ if (rc)
+ return rc;
+
+ rc = cxl_dpa_setup(cxlds, &range_info);
if (rc)
return rc;
^ permalink raw reply related [flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/6] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-02-04 4:24 ` [PATCH v3 3/6] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
@ 2025-02-04 11:50 ` Jonathan Cameron
2025-02-04 18:50 ` Dan Williams
0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Cameron @ 2025-02-04 11:50 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Ira Weiny, Dave Jiang, Alejandro Lucero
Hi Dan,
My only real question here is whether the assumption that the ram partition
must be at index zero is worth explicitly optimizing for. It reduces
some chances for code reuse.
Either way, this is fine. A few other tiny things inline.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/core/cdat.c | 15 ++-----
> drivers/cxl/core/hdm.c | 88 +++++++++++++++++++++++++++++++++++++++-
> drivers/cxl/core/mbox.c | 68 ++++++++++---------------------
> drivers/cxl/core/memdev.c | 2 -
> drivers/cxl/cxlmem.h | 93 +++++++++++++++++++++++++++++-------------
> drivers/cxl/pci.c | 7 +++
> tools/testing/cxl/test/cxl.c | 15 ++-----
> tools/testing/cxl/test/mem.c | 7 +++
> 8 files changed, 195 insertions(+), 100 deletions(-)
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 48b26d3dad26..ea4e792f789f 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> +/* if this fails the caller must destroy @cxlds, there is no recovery */
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
...
> +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
Trivial, but other exports in here are explicitly namespaced. Does it make
sense to namespace this one as well?
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index c5f8320ed330..be0eb57086e1 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(to_ram_res(cxlds));
> + unsigned long long len = cxl_ram_size(cxlds);
Belongs in previous patch perhaps?
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 78e92e24d7b5..e33b2d5efed9 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -97,6 +97,24 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>
> -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +
> +/* Static RAM is only expected at partition 0. */
I'm not sure this simplification is worth it. After all, one can define
a helper if it is useful to keep the to_x_res() functions, and define
them in terms of it.
static inline const struct resource *to_mem_res(struct cxl_dev_state *cxlds,
enum cxl_partition_mode mode)
{
for (int i = 0; i < cxlds->nr_partitions; i++)
if (cxlds->part[i].mode == mode)
return &cxlds->part[i].res;
return NULL;
}
static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
{
return to_mem_res(cxlds, CXL_PARTMODE_RAM);
}
static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
{
return to_mem_res(cxlds, CXL_PARTMODE_PMEM);
}
Or just use to_mem_res() throughout and drop the to_pmem/ram_res() helpers.
> +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +{
> + if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
> + return NULL;
> + return &cxlds->part[0].res;
> +}
> +
> +/*
> + * Static PMEM may be at partition index 0 when there is no static RAM
> + * capacity.
> + */
> +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +{
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
> + return &cxlds->part[i].res;
> + return NULL;
> +}
> +
> +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
Similar. Is it worth position 0 optimization?
> {
> - return &cxlds->_ram_res;
> + if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
> + return NULL;
> + return &cxlds->part[0].perf;
> }
>
> -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> {
> - return &cxlds->_pmem_res;
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
> + return &cxlds->part[i].perf;
> + return NULL;
> }
* Re: [PATCH v3 3/6] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-02-04 11:50 ` Jonathan Cameron
@ 2025-02-04 18:50 ` Dan Williams
0 siblings, 0 replies; 19+ messages in thread
From: Dan Williams @ 2025-02-04 18:50 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: linux-cxl, Ira Weiny, Dave Jiang, Alejandro Lucero
Jonathan Cameron wrote:
> Hi Dan,
>
> My only real question here is whether the assumption that the ram partition
> must be at index zero is worth explicitly optimizing for. It reduces
> some chances for code reuse.
>
> Either way, this is fine. A few other tiny things inline.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> > Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> > Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > drivers/cxl/core/cdat.c | 15 ++-----
> > drivers/cxl/core/hdm.c | 88 +++++++++++++++++++++++++++++++++++++++-
> > drivers/cxl/core/mbox.c | 68 ++++++++++---------------------
> > drivers/cxl/core/memdev.c | 2 -
> > drivers/cxl/cxlmem.h | 93 +++++++++++++++++++++++++++++-------------
> > drivers/cxl/pci.c | 7 +++
> > tools/testing/cxl/test/cxl.c | 15 ++-----
> > tools/testing/cxl/test/mem.c | 7 +++
> > 8 files changed, 195 insertions(+), 100 deletions(-)
>
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 48b26d3dad26..ea4e792f789f 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
>
> > +/* if this fails the caller must destroy @cxlds, there is no recovery */
> > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> ...
>
> > +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
>
> Trivial, but other exports in here are explicitly namespaced. Does it make
> sense to namespace this one as well?
This was a conscious decision to start distinguishing the symbol exports
that are between drivers/cxl/ components, or exports for CXL specific
use cases (like insert_resource_expand_to_fit) from public exports for
the wider kernel.
So, I would expect any symbols that get "promoted" to public usage in
the Type-2 series also drop the symbol namespace and are audited to not
assume a cxl_pci driver calling context.
For example, this is why cxl_dpa_setup() takes a lock where one was not
needed previously.
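
The distinction Dan describes can be sketched as follows (illustrative fragment only; the export choice for cxl_dpa_debug() is an assumption used for contrast):

```c
/*
 * Private to drivers/cxl/ components: importers must declare
 * MODULE_IMPORT_NS("CXL") before they can link against this symbol.
 */
EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");

/*
 * Public export for the wider kernel (e.g. future Type-2 accelerator
 * drivers): no namespace, so no MODULE_IMPORT_NS() is required, and the
 * symbol is audited to not assume a cxl_pci driver calling context.
 */
EXPORT_SYMBOL_GPL(cxl_dpa_setup);
```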
>
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index c5f8320ed330..be0eb57086e1 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
> > {
> > struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > - unsigned long long len = resource_size(to_ram_res(cxlds));
> > + unsigned long long len = cxl_ram_size(cxlds);
>
> Belongs in previous patch perhaps?
The motivation for including it here is that this is the first patch in
the series that makes to_ram_res() start returning NULL when the 'ram'
size is zero.
> > return sysfs_emit(buf, "%#llx\n", len);
> > }
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 78e92e24d7b5..e33b2d5efed9 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -97,6 +97,24 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>
> >
> > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> > +
> > +/* Static RAM is only expected at partition 0. */
>
> I'm not sure this simplification is worth it. After all, one can define
> a helper if it is useful to keep the to_x_res() functions, and define
> them in terms of it.
>
> static inline const struct resource *to_mem_res(struct cxl_dev_state *cxlds,
> enum cxl_partition_mode mode)
> {
> for (int i = 0; i < cxlds->nr_partitions; i++)
> if (cxlds->part[i].mode == mode)
> return &cxlds->part[i].res;
> return NULL;
> }
>
> static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> {
> return to_mem_res(cxlds, CXL_PARTMODE_RAM);
> }
> static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> {
> return to_mem_res(cxlds, CXL_PARTMODE_PMEM);
>
> Or just use the to_mem_res() throughout and drop the to_pmem/ram_res() helpers.
The to_*_res() helpers are only transitory to unwind the
partition-specific logic in favor of generic partition metadata. They
are gone in patch 6. Also, the hard coded expectation that 'ram' is at
zero is less a simplification and more documentation of device
expectations.
> > +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> > +{
> > + if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
> > + return NULL;
> > + return &cxlds->part[0].res;
> > +}
> > +
> > +/*
> > + * Static PMEM may be at partition index 0 when there is no static RAM
> > + * capacity.
> > + */
> > +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> > +{
> > + for (int i = 0; i < cxlds->nr_partitions; i++)
> > + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
> > + return &cxlds->part[i].res;
> > + return NULL;
> > +}
> > +
> > +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
>
> Similar. Is it worth position 0 optimization?
Again, it's less optimization and more documentation, especially because
shipping tooling embeds "ram before pmem" assumptions. It serves as a
cheap early warning system if a device under development breaks these
assumptions.
* [PATCH v3 4/6] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
` (2 preceding siblings ...)
2025-02-04 4:24 ` [PATCH v3 3/6] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
@ 2025-02-04 4:24 ` Dan Williams
2025-02-04 12:13 ` Jonathan Cameron
2025-02-04 4:24 ` [PATCH v3 5/6] cxl: Kill enum cxl_decoder_mode Dan Williams
` (3 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2025-02-04 4:24 UTC (permalink / raw)
To: linux-cxl
Cc: Ira Weiny, Alejandro Lucero, Dave Jiang, dave.jiang,
Jonathan.Cameron
cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
allocations being distinct from RAM allocations in specific ways when in
practice the allocation rules are only relative to DPA partition index.
The rules for cxl_dpa_alloc() are:
- allocations can only come from 1 partition
- if allocating at partition-index-N, all free space in partitions less
than partition-index-N must be skipped over
Use the new 'struct cxl_dpa_partition' array to support allocation with
an arbitrary number of DPA partitions on the device.
A follow-on patch can go further to cleanup 'enum cxl_decoder_mode'
concept and supersede it with looking up the memory properties from
partition metadata. Until then cxl_part_mode() temporarily bridges code
that looks up partitions by @cxled->mode.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/hdm.c | 199 ++++++++++++++++++++++++++++++++++--------------
drivers/cxl/cxlmem.h | 14 +++
2 files changed, 156 insertions(+), 57 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index ea4e792f789f..78ecb88bad7e 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -223,6 +223,37 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
+/* See request_skip() kernel-doc */
+static resource_size_t __adjust_skip(struct cxl_dev_state *cxlds,
+ const resource_size_t skip_base,
+ const resource_size_t skip_len,
+ const char *requester)
+{
+ const resource_size_t skip_end = skip_base + skip_len - 1;
+
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ const struct resource *part_res = &cxlds->part[i].res;
+ resource_size_t adjust_start, adjust_end, size;
+
+ adjust_start = max(skip_base, part_res->start);
+ adjust_end = min(skip_end, part_res->end);
+
+ if (adjust_end < adjust_start)
+ continue;
+
+ size = adjust_end - adjust_start + 1;
+
+ if (!requester)
+ __release_region(&cxlds->dpa_res, adjust_start, size);
+ else if (!__request_region(&cxlds->dpa_res, adjust_start, size,
+ requester, 0))
+ return adjust_start - skip_base;
+ }
+
+ return skip_len;
+}
+#define release_skip(c, b, l) __adjust_skip((c), (b), (l), NULL)
+
/*
* Must be called in a context that synchronizes against this decoder's
* port ->remove() callback (like an endpoint decoder sysfs attribute)
@@ -241,7 +272,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
skip_start = res->start - cxled->skip;
__release_region(&cxlds->dpa_res, res->start, resource_size(res));
if (cxled->skip)
- __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
+ release_skip(cxlds, skip_start, cxled->skip);
cxled->skip = 0;
cxled->dpa_res = NULL;
put_device(&cxled->cxld.dev);
@@ -268,6 +299,58 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
__cxl_dpa_release(cxled);
}
+/**
+ * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree
+ * @cxlds: CXL.mem device context that parents @cxled
+ * @cxled: Endpoint decoder establishing new allocation that skips lower DPA
+ * @skip_base: DPA < start of new DPA allocation (DPAnew)
+ * @skip_len: @skip_base + @skip_len == DPAnew
+ *
+ * DPA 'skip' arises from out-of-sequence DPA allocation events relative
+ * to free capacity across multiple partitions. It is a wasteful event
+ * as usable DPA gets thrown away, but if a deployment has, for example,
+ * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM
+ * DPA, the free RAM DPA must be sacrificed to start allocating PMEM.
+ * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder
+ * Protection" for more details.
+ *
+ * A 'skip' always covers the last allocated DPA in a previous partition
+ * to the start of the current partition to allocate. Allocations never
+ * start in the middle of a partition, and allocations are always
+ * de-allocated in reverse order (see cxl_dpa_free(), or natural devm
+ * unwind order from forced in-order allocation).
+ *
+ * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip'
+ * would always be contained to a single partition. Given
+ * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip'
+ * might span "tail capacity of partition[0], all of partition[1], ...,
+ * all of partition[N-1]" to support allocating from partition[N]. That
+ * in turn interacts with the partition 'struct resource' boundaries
+ * within @cxlds->dpa_res whereby 'skip' requests need to be divided by
+ * partition. I.e. this is a quirk of using a 'struct resource' tree to
+ * detect range conflicts while also tracking partition boundaries in
+ * @cxlds->dpa_res.
+ */
+static int request_skip(struct cxl_dev_state *cxlds,
+ struct cxl_endpoint_decoder *cxled,
+ const resource_size_t skip_base,
+ const resource_size_t skip_len)
+{
+ resource_size_t skipped = __adjust_skip(cxlds, skip_base, skip_len,
+ dev_name(&cxled->cxld.dev));
+
+ if (skipped == skip_len)
+ return 0;
+
+ dev_dbg(cxlds->dev,
+ "%s: failed to reserve skipped space (%pa %pa %pa)\n",
+ dev_name(&cxled->cxld.dev), &skip_base, &skip_len, &skipped);
+
+ release_skip(cxlds, skip_base, skipped);
+
+ return -EBUSY;
+}
+
static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped)
@@ -276,7 +359,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &port->dev;
+ enum cxl_decoder_mode mode;
struct resource *res;
+ int rc;
lockdep_assert_held_write(&cxl_dpa_rwsem);
@@ -305,14 +390,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
}
if (skipped) {
- res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
- dev_name(&cxled->cxld.dev), 0);
- if (!res) {
- dev_dbg(dev,
- "decoder%d.%d: failed to reserve skipped space\n",
- port->id, cxled->cxld.id);
- return -EBUSY;
- }
+ rc = request_skip(cxlds, cxled, base - skipped, skipped);
+ if (rc)
+ return rc;
}
res = __request_region(&cxlds->dpa_res, base, len,
dev_name(&cxled->cxld.dev), 0);
@@ -320,22 +400,23 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
port->id, cxled->cxld.id);
if (skipped)
- __release_region(&cxlds->dpa_res, base - skipped,
- skipped);
+ release_skip(cxlds, base - skipped, skipped);
return -EBUSY;
}
cxled->dpa_res = res;
cxled->skip = skipped;
- if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
- cxled->mode = CXL_DECODER_PMEM;
- else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
- cxled->mode = CXL_DECODER_RAM;
- else {
+ mode = CXL_DECODER_NONE;
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (resource_contains(&cxlds->part[i].res, res)) {
+ mode = cxl_part_mode(cxlds->part[i].mode);
+ break;
+ }
+
+ if (mode == CXL_DECODER_NONE)
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
port->id, cxled->cxld.id, res);
- cxled->mode = CXL_DECODER_NONE;
- }
+ cxled->mode = mode;
port->hdm_end++;
get_device(&cxled->cxld.dev);
@@ -542,15 +623,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- resource_size_t free_ram_start, free_pmem_start;
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &cxled->cxld.dev;
- resource_size_t start, avail, skip;
+ struct resource *res, *prev = NULL;
+ resource_size_t start, avail, skip, skip_start;
struct resource *p, *last;
- const struct resource *ram_res = to_ram_res(cxlds);
- const struct resource *pmem_res = to_pmem_res(cxlds);
- int rc;
+ int part, rc;
down_write(&cxl_dpa_rwsem);
if (cxled->cxld.region) {
@@ -566,47 +645,53 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
goto out;
}
- for (p = ram_res->child, last = NULL; p; p = p->sibling)
- last = p;
- if (last)
- free_ram_start = last->end + 1;
- else
- free_ram_start = ram_res->start;
+ part = -1;
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
+ part = i;
+ break;
+ }
+ }
+
+ if (part < 0) {
+ rc = -EBUSY;
+ goto out;
+ }
- for (p = pmem_res->child, last = NULL; p; p = p->sibling)
+ res = &cxlds->part[part].res;
+ for (p = res->child, last = NULL; p; p = p->sibling)
last = p;
if (last)
- free_pmem_start = last->end + 1;
+ start = last->end + 1;
else
- free_pmem_start = pmem_res->start;
+ start = res->start;
- if (cxled->mode == CXL_DECODER_RAM) {
- start = free_ram_start;
- avail = ram_res->end - start + 1;
- skip = 0;
- } else if (cxled->mode == CXL_DECODER_PMEM) {
- resource_size_t skip_start, skip_end;
-
- start = free_pmem_start;
- avail = pmem_res->end - start + 1;
- skip_start = free_ram_start;
-
- /*
- * If some pmem is already allocated, then that allocation
- * already handled the skip.
- */
- if (pmem_res->child &&
- skip_start == pmem_res->child->start)
- skip_end = skip_start - 1;
- else
- skip_end = start - 1;
- skip = skip_end - skip_start + 1;
- } else {
- dev_dbg(dev, "mode not set\n");
- rc = -EINVAL;
- goto out;
+ /*
+ * To allocate at partition N, a skip needs to be calculated for all
+ * unallocated space at lower partitions indices.
+ *
+ * If a partition has any allocations, the search can end because a
+ * previous cxl_dpa_alloc() invocation is assumed to have accounted for
+ * all previous partitions.
+ */
+ skip_start = CXL_RESOURCE_NONE;
+ for (int i = part; i; i--) {
+ prev = &cxlds->part[i - 1].res;
+ for (p = prev->child, last = NULL; p; p = p->sibling)
+ last = p;
+ if (last) {
+ skip_start = last->end + 1;
+ break;
+ }
+ skip_start = prev->start;
}
+ avail = res->end - start + 1;
+ if (skip_start == CXL_RESOURCE_NONE)
+ skip = 0;
+ else
+ skip = res->start - skip_start;
+
if (size > avail) {
dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
cxl_decoder_mode_name(cxled->mode), &avail);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e33b2d5efed9..9fe615d96573 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -529,6 +529,20 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
return resource_size(res);
}
+/*
+ * Translate the operational mode of memory capacity with the
+ * operational mode of a decoder
+ * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
+ */
+static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
+{
+ if (mode == CXL_PARTMODE_RAM)
+ return CXL_DECODER_RAM;
+ if (mode == CXL_PARTMODE_PMEM)
+ return CXL_DECODER_PMEM;
+ return CXL_DECODER_NONE;
+}
+
static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
{
return dev_get_drvdata(cxl_mbox->host);
* Re: [PATCH v3 4/6] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-02-04 4:24 ` [PATCH v3 4/6] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
@ 2025-02-04 12:13 ` Jonathan Cameron
0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron @ 2025-02-04 12:13 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Ira Weiny, Alejandro Lucero, Dave Jiang
On Mon, 03 Feb 2025 20:24:24 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> allocations being distinct from RAM allocations in specific ways when in
> practice the allocation rules are only relative to DPA partition index.
>
> The rules for cxl_dpa_alloc() are:
>
> - allocations can only come from 1 partition
>
> - if allocating at partition-index-N, all free space in partitions less
> than partition-index-N must be skipped over
>
> Use the new 'struct cxl_dpa_partition' array to support allocation with
> an arbitrary number of DPA partitions on the device.
>
> A follow-on patch can go further to clean up the 'enum cxl_decoder_mode'
> concept and supersede it by looking up the memory properties from
> partition metadata.
> that looks up partitions by @cxled->mode.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Nice. More comments than questions in line....
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> @@ -542,15 +623,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - resource_size_t free_ram_start, free_pmem_start;
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> - resource_size_t start, avail, skip;
> + struct resource *res, *prev = NULL;
> + resource_size_t start, avail, skip, skip_start;
> struct resource *p, *last;
> - const struct resource *ram_res = to_ram_res(cxlds);
> - const struct resource *pmem_res = to_pmem_res(cxlds);
> - int rc;
> + int part, rc;
>
> down_write(&cxl_dpa_rwsem);
> if (cxled->cxld.region) {
> @@ -566,47 +645,53 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - for (p = ram_res->child, last = NULL; p; p = p->sibling)
> - last = p;
> - if (last)
> - free_ram_start = last->end + 1;
> - else
> - free_ram_start = ram_res->start;
> + part = -1;
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
> + part = i;
This code could be made simpler but you delete it in patch 5 anyway
so I'll drop my comments on it to avoid confusion.
I wrote a nice essay that will never see the light of day. Ah well.
> + break;
> + }
> + }
> +
> + if (part < 0) {
> + rc = -EBUSY;
> + goto out;
> + }
>
> - for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> + res = &cxlds->part[part].res;
> + for (p = res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> - free_pmem_start = last->end + 1;
> + start = last->end + 1;
> else
> - free_pmem_start = pmem_res->start;
> + start = res->start;
>
> - if (cxled->mode == CXL_DECODER_RAM) {
> - start = free_ram_start;
> - avail = ram_res->end - start + 1;
> - skip = 0;
> - } else if (cxled->mode == CXL_DECODER_PMEM) {
> - resource_size_t skip_start, skip_end;
> -
> - start = free_pmem_start;
> - avail = pmem_res->end - start + 1;
> - skip_start = free_ram_start;
> -
> - /*
> - * If some pmem is already allocated, then that allocation
> - * already handled the skip.
> - */
> - if (pmem_res->child &&
> - skip_start == pmem_res->child->start)
> - skip_end = skip_start - 1;
> - else
> - skip_end = start - 1;
> - skip = skip_end - skip_start + 1;
> - } else {
> - dev_dbg(dev, "mode not set\n");
> - rc = -EINVAL;
> - goto out;
> + /*
> + * To allocate at partition N, a skip needs to be calculated for all
> + * unallocated space at lower partitions indices.
> + *
> + * If a partition has any allocations, the search can end because a
> + * previous cxl_dpa_alloc() invocation is assumed to have accounted for
> + * all previous partitions.
> + */
> + skip_start = CXL_RESOURCE_NONE;
> + for (int i = part; i; i--) {
> + prev = &cxlds->part[i - 1].res;
> + for (p = prev->child, last = NULL; p; p = p->sibling)
> + last = p;
This pattern keeps turning up. Maybe a helper is appropriate?
last = resource_last_child()
Perhaps a job for another day.
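The suggested helper might look like the following sketch (a user-space mock with 'struct resource' reduced to just its child/sibling linkage, not the kernel's real type; the name resource_last_child() is only the one floated above):

```c
#include <stddef.h>

/* Mock of the kernel's 'struct resource', reduced to the
 * child/sibling linkage this helper would walk; not the real type. */
struct resource {
	struct resource *child;
	struct resource *sibling;
};

/* Hypothetical shape of the suggested helper: walk the sibling list
 * and return the last child of @res, or NULL when @res has none. */
static struct resource *resource_last_child(struct resource *res)
{
	struct resource *p, *last = NULL;

	for (p = res->child; p; p = p->sibling)
		last = p;
	return last;
}
```

That would replace the repeated "for (p = res->child, last = NULL; ...)" walks with a single call site per partition.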
> + if (last) {
> + skip_start = last->end + 1;
> + break;
> + }
> + skip_start = prev->start;
> }
>
> + avail = res->end - start + 1;
> + if (skip_start == CXL_RESOURCE_NONE)
> + skip = 0;
> + else
> + skip = res->start - skip_start;
> +
> if (size > avail) {
> dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
> cxl_decoder_mode_name(cxled->mode), &avail);
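The skip rule spelled out in the comment above can be modeled in user space as follows (a hedged simplification, not the driver code: a sentinel stands in for CXL_RESOURCE_NONE, and alloc_end == 0 marks a partition with no allocations):

```c
#include <stdint.h>

#define SKIP_NONE UINT64_MAX	/* stands in for CXL_RESOURCE_NONE */

/* Toy partition: DPA span [start, end], plus the end of its last
 * allocation; alloc_end == 0 means "nothing allocated yet" here. */
struct toy_part {
	uint64_t start, end, alloc_end;
};

/*
 * Model of the rule in the hunk above: to allocate in partition @n,
 * skip the free space in all lower partitions, stopping at the first
 * lower partition that already holds an allocation (a previous
 * allocation is assumed to have accounted for the rest).
 */
static uint64_t calc_skip(const struct toy_part *p, int n)
{
	uint64_t skip_start = SKIP_NONE;

	for (int i = n; i; i--) {
		if (p[i - 1].alloc_end) {
			skip_start = p[i - 1].alloc_end + 1;
			break;
		}
		skip_start = p[i - 1].start;
	}
	if (skip_start == SKIP_NONE)
		return 0;
	return p[n].start - skip_start;
}
```

With two partitions [0, 99] and [100, 199] and nothing allocated, allocating in partition 1 skips all 100 bytes of partition 0; once partition 0 holds an allocation ending at 49, only the remaining 50 free bytes need skipping.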
* [PATCH v3 5/6] cxl: Kill enum cxl_decoder_mode
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
` (3 preceding siblings ...)
2025-02-04 4:24 ` [PATCH v3 4/6] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
@ 2025-02-04 4:24 ` Dan Williams
2025-02-04 12:23 ` Jonathan Cameron
2025-02-04 4:24 ` [PATCH v3 6/6] cxl: Cleanup partition size and perf helpers Dan Williams
` (2 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2025-02-04 4:24 UTC (permalink / raw)
To: linux-cxl
Cc: Ira Weiny, Alejandro Lucero, Dave Jiang, dave.jiang,
Jonathan.Cameron
Now that the operational mode of DPA capacity (ram vs pmem... etc) is
tracked in the partition, and no code paths have dependencies on the
mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
can be cleaned up, specifically the ambiguity over whether the operational
mode implied anything about the partition order.
Endpoint decoders simply reference their assigned partition where the
operational mode can be retrieved as partition mode.
With this in place PMEM can now be partition0 which happens today when
the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM"
assumption can now just iterate partitions and consult the partition
mode after the fact.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/cdat.c | 18 ++----
drivers/cxl/core/core.h | 4 +
drivers/cxl/core/hdm.c | 68 ++++++++++--------------
drivers/cxl/core/memdev.c | 15 +----
drivers/cxl/core/port.c | 21 ++++++-
drivers/cxl/core/region.c | 127 +++++++++++++++++++++++++--------------------
drivers/cxl/cxl.h | 37 +++----------
drivers/cxl/cxlmem.h | 19 -------
8 files changed, 134 insertions(+), 175 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 33cabe4aa026..f96ae28022b0 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -579,23 +579,15 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
return range_contains(&perf->dpa_range, &dpa);
}
-static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled,
- enum cxl_decoder_mode mode)
+static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_dpa_perf *perf;
- switch (mode) {
- case CXL_DECODER_RAM:
- perf = to_ram_perf(cxlds);
- break;
- case CXL_DECODER_PMEM:
- perf = to_pmem_perf(cxlds);
- break;
- default:
+ if (cxled->part < 0)
return ERR_PTR(-EINVAL);
- }
+ perf = &cxlds->part[cxled->part].perf;
if (!perf)
return ERR_PTR(-EINVAL);
@@ -659,7 +651,7 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
if (cxlds->rcd)
return -ENODEV;
- perf = cxled_get_dpa_perf(cxled, cxlr->mode);
+ perf = cxled_get_dpa_perf(cxled);
if (IS_ERR(perf))
return PTR_ERR(perf);
@@ -1065,7 +1057,7 @@ void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
lockdep_assert_held(&cxl_dpa_rwsem);
- perf = cxled_get_dpa_perf(cxled, cxlr->mode);
+ perf = cxled_get_dpa_perf(cxled);
if (IS_ERR(perf))
return;
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 800466f96a68..22dac79c5192 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -72,8 +72,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
resource_size_t length);
struct dentry *cxl_debugfs_create_dir(const char *dir);
-int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
- enum cxl_decoder_mode mode);
+int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
+ enum cxl_partition_mode mode);
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 78ecb88bad7e..d705dec1471e 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -359,7 +359,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &port->dev;
- enum cxl_decoder_mode mode;
struct resource *res;
int rc;
@@ -406,17 +405,21 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
cxled->dpa_res = res;
cxled->skip = skipped;
- mode = CXL_DECODER_NONE;
- for (int i = 0; i < cxlds->nr_partitions; i++)
- if (resource_contains(&cxlds->part[i].res, res)) {
- mode = cxl_part_mode(cxlds->part[i].mode);
- break;
- }
+ /*
+ * When allocating new capacity, ->part is already set; when
+ * discovering decoder settings at initial enumeration, ->part
+ * is not set.
+ */
+ if (cxled->part < 0)
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (resource_contains(&cxlds->part[i].res, res)) {
+ cxled->part = i;
+ break;
+ }
- if (mode == CXL_DECODER_NONE)
+ if (cxled->part < 0)
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
port->id, cxled->cxld.id, res);
- cxled->mode = mode;
port->hdm_end++;
get_device(&cxled->cxld.dev);
@@ -583,40 +586,33 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
return rc;
}
-int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
- enum cxl_decoder_mode mode)
+int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
+ enum cxl_partition_mode mode)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &cxled->cxld.dev;
-
- switch (mode) {
- case CXL_DECODER_RAM:
- case CXL_DECODER_PMEM:
- break;
- default:
- dev_dbg(dev, "unsupported mode: %d\n", mode);
- return -EINVAL;
- }
+ int part;
guard(rwsem_write)(&cxl_dpa_rwsem);
if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
return -EBUSY;
- /*
- * Only allow modes that are supported by the current partition
- * configuration
- */
- if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
- dev_dbg(dev, "no available pmem capacity\n");
- return -ENXIO;
+ for (part = 0; part < cxlds->nr_partitions; part++)
+ if (cxlds->part[part].mode == mode)
+ break;
+
+ if (part >= cxlds->nr_partitions) {
+ dev_dbg(dev, "unsupported mode: %d\n", mode);
+ return -EINVAL;
}
- if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
- dev_dbg(dev, "no available ram capacity\n");
+
+ if (!resource_size(&cxlds->part[part].res)) {
+ dev_dbg(dev, "no available capacity for mode: %d\n", mode);
return -ENXIO;
}
- cxled->mode = mode;
+ cxled->part = part;
return 0;
}
@@ -645,15 +641,9 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
goto out;
}
- part = -1;
- for (int i = 0; i < cxlds->nr_partitions; i++) {
- if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
- part = i;
- break;
- }
- }
-
+ part = cxled->part;
if (part < 0) {
+ dev_dbg(dev, "partition not set\n");
rc = -EBUSY;
goto out;
}
@@ -694,7 +684,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
if (size > avail) {
dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
- cxl_decoder_mode_name(cxled->mode), &avail);
+ res->name, &avail);
rc = -ENOSPC;
goto out;
}
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index be0eb57086e1..615cbd861f66 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -198,17 +198,8 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
int rc = 0;
/* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
- if (cxl_pmem_size(cxlds)) {
- const struct resource *res = to_pmem_res(cxlds);
-
- offset = res->start;
- length = resource_size(res);
- rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
- if (rc)
- return rc;
- }
- if (cxl_ram_size(cxlds)) {
- const struct resource *res = to_ram_res(cxlds);
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ const struct resource *res = &cxlds->part[i].res;
offset = res->start;
length = resource_size(res);
@@ -217,7 +208,7 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
* Invalid Physical Address is not an error for
* volatile addresses. Device support is optional.
*/
- if (rc == -EFAULT)
+ if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
rc = 0;
}
return rc;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 78a5c2c25982..fc24c5bc9f2a 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -194,25 +194,35 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ /* without @cxl_dpa_rwsem, make sure @part is not reloaded */
+ int part = READ_ONCE(cxled->part);
+ const char *desc;
+
+ if (part < 0)
+ desc = "none";
+ else
+ desc = cxlds->part[part].res.name;
- return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
+ return sysfs_emit(buf, "%s\n", desc);
}
static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
- enum cxl_decoder_mode mode;
+ enum cxl_partition_mode mode;
ssize_t rc;
if (sysfs_streq(buf, "pmem"))
- mode = CXL_DECODER_PMEM;
+ mode = CXL_PARTMODE_PMEM;
else if (sysfs_streq(buf, "ram"))
- mode = CXL_DECODER_RAM;
+ mode = CXL_PARTMODE_RAM;
else
return -EINVAL;
- rc = cxl_dpa_set_mode(cxled, mode);
+ rc = cxl_dpa_set_part(cxled, mode);
if (rc)
return rc;
@@ -1899,6 +1909,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port)
return ERR_PTR(-ENOMEM);
cxled->pos = -1;
+ cxled->part = -1;
cxld = &cxled->cxld;
rc = cxl_decoder_init(port, cxld);
if (rc) {
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6706a1b79549..84ce625b8591 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -144,7 +144,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
rc = down_read_interruptible(&cxl_region_rwsem);
if (rc)
return rc;
- if (cxlr->mode != CXL_DECODER_PMEM)
+ if (cxlr->mode != CXL_PARTMODE_PMEM)
rc = sysfs_emit(buf, "\n");
else
rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
@@ -441,7 +441,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
* Support tooling that expects to find a 'uuid' attribute for all
* regions regardless of mode.
*/
- if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
+ if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_PARTMODE_PMEM)
return 0444;
return a->mode;
}
@@ -603,8 +603,16 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_region *cxlr = to_cxl_region(dev);
+ const char *desc;
- return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
+ if (cxlr->mode == CXL_PARTMODE_RAM)
+ desc = "ram";
+ else if (cxlr->mode == CXL_PARTMODE_PMEM)
+ desc = "pmem";
+ else
+ desc = "";
+
+ return sysfs_emit(buf, "%s\n", desc);
}
static DEVICE_ATTR_RO(mode);
@@ -630,7 +638,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
/* ways, granularity and uuid (if PMEM) need to be set before HPA */
if (!p->interleave_ways || !p->interleave_granularity ||
- (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid)))
+ (cxlr->mode == CXL_PARTMODE_PMEM && uuid_is_null(&p->uuid)))
return -ENXIO;
div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder);
@@ -1888,6 +1896,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_region_params *p = &cxlr->params;
struct cxl_port *ep_port, *root_port;
struct cxl_dport *dport;
@@ -1902,17 +1911,17 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return rc;
}
- if (cxled->mode != cxlr->mode) {
- dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
- dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
- return -EINVAL;
- }
-
- if (cxled->mode == CXL_DECODER_DEAD) {
+ if (cxled->part < 0) {
dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
return -ENODEV;
}
+ if (cxlds->part[cxled->part].mode != cxlr->mode) {
+ dev_dbg(&cxlr->dev, "%s region mode: %d mismatch\n",
+ dev_name(&cxled->cxld.dev), cxlr->mode);
+ return -EINVAL;
+ }
+
/* all full of members, or interleave config not established? */
if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
dev_dbg(&cxlr->dev, "region already active\n");
@@ -2115,7 +2124,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
{
down_write(&cxl_region_rwsem);
- cxled->mode = CXL_DECODER_DEAD;
+ cxled->part = -1;
cxl_region_detach(cxled);
up_write(&cxl_region_rwsem);
}
@@ -2471,7 +2480,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb,
*/
static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
int id,
- enum cxl_decoder_mode mode,
+ enum cxl_partition_mode mode,
enum cxl_decoder_type type)
{
struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
@@ -2525,13 +2534,13 @@ static ssize_t create_ram_region_show(struct device *dev,
}
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_decoder_mode mode, int id)
+ enum cxl_partition_mode mode, int id)
{
int rc;
switch (mode) {
- case CXL_DECODER_RAM:
- case CXL_DECODER_PMEM:
+ case CXL_PARTMODE_RAM:
+ case CXL_PARTMODE_PMEM:
break;
default:
dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
@@ -2551,7 +2560,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
}
static ssize_t create_region_store(struct device *dev, const char *buf,
- size_t len, enum cxl_decoder_mode mode)
+ size_t len, enum cxl_partition_mode mode)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_region *cxlr;
@@ -2572,7 +2581,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_DECODER_PMEM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM);
}
DEVICE_ATTR_RW(create_pmem_region);
@@ -2580,7 +2589,7 @@ static ssize_t create_ram_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_DECODER_RAM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_RAM);
}
DEVICE_ATTR_RW(create_ram_region);
@@ -2678,7 +2687,7 @@ EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, "CXL");
struct cxl_poison_context {
struct cxl_port *port;
- enum cxl_decoder_mode mode;
+ int part;
u64 offset;
};
@@ -2686,49 +2695,45 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
struct cxl_poison_context *ctx)
{
struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ const struct resource *res;
+ struct resource *p, *last;
u64 offset, length;
int rc = 0;
+ if (ctx->part < 0)
+ return 0;
+
/*
- * Collect poison for the remaining unmapped resources
- * after poison is collected by committed endpoints.
- *
- * Knowing that PMEM must always follow RAM, get poison
- * for unmapped resources based on the last decoder's mode:
- * ram: scan remains of ram range, then any pmem range
- * pmem: scan remains of pmem range
+ * Collect poison for the remaining unmapped resources after
+ * poison is collected by committed endpoint decoders.
*/
-
- if (ctx->mode == CXL_DECODER_RAM) {
- offset = ctx->offset;
- length = cxl_ram_size(cxlds) - offset;
+ for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
+ res = &cxlds->part[i].res;
+ for (p = res->child, last = NULL; p; p = p->sibling)
+ last = p;
+ if (last)
+ offset = last->end + 1;
+ else
+ offset = res->start;
+ length = res->end - offset + 1;
+ if (!length)
+ break;
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
- if (rc == -EFAULT)
- rc = 0;
+ if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
+ continue;
if (rc)
- return rc;
- }
- if (ctx->mode == CXL_DECODER_PMEM) {
- offset = ctx->offset;
- length = resource_size(&cxlds->dpa_res) - offset;
- if (!length)
- return 0;
- } else if (cxl_pmem_size(cxlds)) {
- const struct resource *res = to_pmem_res(cxlds);
-
- offset = res->start;
- length = resource_size(res);
- } else {
- return 0;
+ break;
}
- return cxl_mem_get_poison(cxlmd, offset, length, NULL);
+ return rc;
}
static int poison_by_decoder(struct device *dev, void *arg)
{
struct cxl_poison_context *ctx = arg;
struct cxl_endpoint_decoder *cxled;
+ enum cxl_partition_mode mode;
+ struct cxl_dev_state *cxlds;
struct cxl_memdev *cxlmd;
u64 offset, length;
int rc = 0;
@@ -2737,15 +2742,18 @@ static int poison_by_decoder(struct device *dev, void *arg)
return rc;
cxled = to_cxl_endpoint_decoder(dev);
- if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
+ if (!cxled->dpa_res)
return rc;
cxlmd = cxled_to_memdev(cxled);
+ cxlds = cxlmd->cxlds;
+ mode = cxlds->part[cxled->part].mode;
+
if (cxled->skip) {
offset = cxled->dpa_res->start - cxled->skip;
length = cxled->skip;
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
- if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
+ if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
rc = 0;
if (rc)
return rc;
@@ -2754,7 +2762,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
offset = cxled->dpa_res->start;
length = cxled->dpa_res->end - offset + 1;
rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
- if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
+ if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
rc = 0;
if (rc)
return rc;
@@ -2762,7 +2770,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
/* Iterate until commit_end is reached */
if (cxled->cxld.id == ctx->port->commit_end) {
ctx->offset = cxled->dpa_res->end + 1;
- ctx->mode = cxled->mode;
+ ctx->part = cxled->part;
return 1;
}
@@ -2775,7 +2783,8 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
int rc = 0;
ctx = (struct cxl_poison_context) {
- .port = port
+ .port = port,
+ .part = -1,
};
rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
@@ -3220,14 +3229,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_port *port = cxlrd_to_port(cxlrd);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct range *hpa = &cxled->cxld.hpa_range;
+ int rc, part = READ_ONCE(cxled->part);
struct cxl_region_params *p;
struct cxl_region *cxlr;
struct resource *res;
- int rc;
+
+ if (part < 0)
+ return ERR_PTR(-EBUSY);
do {
- cxlr = __create_region(cxlrd, cxled->mode,
+ cxlr = __create_region(cxlrd, cxlds->part[part].mode,
atomic_read(&cxlrd->region_id));
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
@@ -3430,9 +3443,9 @@ static int cxl_region_probe(struct device *dev)
return rc;
switch (cxlr->mode) {
- case CXL_DECODER_PMEM:
+ case CXL_PARTMODE_PMEM:
return devm_cxl_add_pmem_region(cxlr);
- case CXL_DECODER_RAM:
+ case CXL_PARTMODE_RAM:
/*
* The region cannot be managed by CXL if any portion of
* it is already online as 'System RAM'
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9fd1524ed150..94a34833eedd 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -372,30 +372,6 @@ struct cxl_decoder {
void (*reset)(struct cxl_decoder *cxld);
};
-/*
- * CXL_DECODER_DEAD prevents endpoints from being reattached to regions
- * while cxld_unregister() is running
- */
-enum cxl_decoder_mode {
- CXL_DECODER_NONE,
- CXL_DECODER_RAM,
- CXL_DECODER_PMEM,
- CXL_DECODER_DEAD,
-};
-
-static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
-{
- static const char * const names[] = {
- [CXL_DECODER_NONE] = "none",
- [CXL_DECODER_RAM] = "ram",
- [CXL_DECODER_PMEM] = "pmem",
- };
-
- if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
- return names[mode];
- return "mixed";
-}
-
/*
* Track whether this decoder is reserved for region autodiscovery, or
* free for userspace provisioning.
@@ -410,16 +386,16 @@ enum cxl_decoder_state {
* @cxld: base cxl_decoder_object
* @dpa_res: actively claimed DPA span of this decoder
* @skip: offset into @dpa_res where @cxld.hpa_range maps
- * @mode: which memory type / access-mode-partition this decoder targets
* @state: autodiscovery state
+ * @part: partition index this decoder maps
* @pos: interleave position in @cxld.region
*/
struct cxl_endpoint_decoder {
struct cxl_decoder cxld;
struct resource *dpa_res;
resource_size_t skip;
- enum cxl_decoder_mode mode;
enum cxl_decoder_state state;
+ int part;
int pos;
};
@@ -504,6 +480,11 @@ struct cxl_region_params {
int nr_targets;
};
+enum cxl_partition_mode {
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
@@ -523,7 +504,7 @@ struct cxl_region_params {
* struct cxl_region - CXL region
* @dev: This region's device
* @id: This region's id. Id is globally unique across all regions
- * @mode: Endpoint decoder allocation / access mode
+ * @mode: Operational mode of the mapped capacity
* @type: Endpoint decoder target type
* @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
* @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
@@ -536,7 +517,7 @@ struct cxl_region_params {
struct cxl_region {
struct device dev;
int id;
- enum cxl_decoder_mode mode;
+ enum cxl_partition_mode mode;
enum cxl_decoder_type type;
struct cxl_nvdimm_bridge *cxl_nvb;
struct cxl_pmem_region *cxlr_pmem;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 9fe615d96573..f218d43dec9f 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -97,11 +97,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
-enum cxl_partition_mode {
- CXL_PARTMODE_RAM,
- CXL_PARTMODE_PMEM,
-};
-
#define CXL_NR_PARTITIONS_MAX 2
struct cxl_dpa_info {
@@ -529,20 +524,6 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
return resource_size(res);
}
-/*
- * Translate the operational mode of memory capacity with the
- * operational mode of a decoder
- * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
- */
-static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
-{
- if (mode == CXL_PARTMODE_RAM)
- return CXL_DECODER_RAM;
- if (mode == CXL_PARTMODE_PMEM)
- return CXL_DECODER_PMEM;
- return CXL_DECODER_NONE;
-}
-
static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
{
return dev_get_drvdata(cxl_mbox->host);
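The overall shape of the conversion can be illustrated with a hypothetical user-space reduction (the names here are invented for illustration, not the driver's): decoders carry a partition index with part < 0 meaning "unassigned" (subsuming CXL_DECODER_DEAD/NONE), and the per-mode switch statements collapse into a bounds-checked array lookup of partition metadata.

```c
#include <stddef.h>

struct toy_perf { int read_bw; };

/* Sketch of patch 5's central pattern: look up per-partition metadata
 * by the decoder's assigned partition index instead of switching on a
 * decoder mode; a negative index means no partition is assigned. */
static const struct toy_perf *part_perf(const struct toy_perf *parts,
					int nr_partitions, int part)
{
	if (part < 0 || part >= nr_partitions)
		return NULL;
	return &parts[part];
}
```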
* Re: [PATCH v3 5/6] cxl: Kill enum cxl_decoder_mode
2025-02-04 4:24 ` [PATCH v3 5/6] cxl: Kill enum cxl_decoder_mode Dan Williams
@ 2025-02-04 12:23 ` Jonathan Cameron
2025-02-04 18:57 ` Dan Williams
0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Cameron @ 2025-02-04 12:23 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Ira Weiny, Alejandro Lucero, Dave Jiang
On Mon, 03 Feb 2025 20:24:29 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> tracked in the partition, and no code paths have dependencies on the
> mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> can be cleaned up, specifically the ambiguity over whether the operational
> mode implied anything about the partition order.
>
> Endpoint decoders simply reference their assigned partition where the
> operational mode can be retrieved as partition mode.
>
> With this in place PMEM can now be partition0 which happens today when
> the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
> DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM"
> assumption can now just iterate partitions and consult the partition
> mode after the fact.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
One trivial equality check inline to tidy up; otherwise nice.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/cdat.c | 18 ++----
> drivers/cxl/core/core.h | 4 +
> drivers/cxl/core/hdm.c | 68 ++++++++++--------------
> drivers/cxl/core/memdev.c | 15 +----
> drivers/cxl/core/port.c | 21 ++++++-
> drivers/cxl/core/region.c | 127 +++++++++++++++++++++++++--------------------
> drivers/cxl/cxl.h | 37 +++----------
> drivers/cxl/cxlmem.h | 19 -------
> 8 files changed, 134 insertions(+), 175 deletions(-)
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 78ecb88bad7e..d705dec1471e 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -359,7 +359,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode)
> +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> + enum cxl_partition_mode mode)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> -
> - switch (mode) {
> - case CXL_DECODER_RAM:
> - case CXL_DECODER_PMEM:
> - break;
> - default:
> - dev_dbg(dev, "unsupported mode: %d\n", mode);
> - return -EINVAL;
> - }
> + int part;
>
> guard(rwsem_write)(&cxl_dpa_rwsem);
> if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> return -EBUSY;
>
> - /*
> - * Only allow modes that are supported by the current partition
> - * configuration
> - */
> - if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> - dev_dbg(dev, "no available pmem capacity\n");
> - return -ENXIO;
> + for (part = 0; part < cxlds->nr_partitions; part++)
> + if (cxlds->part[part].mode == mode)
> + break;
> +
> + if (part >= cxlds->nr_partitions) {
How would it be greater?
> + dev_dbg(dev, "unsupported mode: %d\n", mode);
> + return -EINVAL;
> }
> - if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
> - dev_dbg(dev, "no available ram capacity\n");
> +
> + if (!resource_size(&cxlds->part[part].res)) {
> + dev_dbg(dev, "no available capacity for mode: %d\n", mode);
> return -ENXIO;
> }
>
> - cxled->mode = mode;
> + cxled->part = part;
> return 0;
> }
>
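The exchange above reduces to this pattern (a hypothetical user-space sketch, not the driver code): the scan leaves the index exactly equal to the partition count when nothing matches, so '==' and '>=' behave identically here.

```c
enum toy_mode { TOY_RAM, TOY_PMEM };

/* Reduction of the lookup in cxl_dpa_set_part(): scan for the first
 * partition whose mode matches @mode. On a miss the loop exits with
 * i == nr, never greater, so the '>=' check is defensive style only. */
static int find_part(const enum toy_mode *part, int nr, enum toy_mode mode)
{
	int i;

	for (i = 0; i < nr; i++)
		if (part[i] == mode)
			break;
	if (i >= nr)
		return -1;	/* unsupported mode */
	return i;
}
```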
* Re: [PATCH v3 5/6] cxl: Kill enum cxl_decoder_mode
2025-02-04 12:23 ` Jonathan Cameron
@ 2025-02-04 18:57 ` Dan Williams
0 siblings, 0 replies; 19+ messages in thread
From: Dan Williams @ 2025-02-04 18:57 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: linux-cxl, Ira Weiny, Alejandro Lucero, Dave Jiang
Jonathan Cameron wrote:
> On Mon, 03 Feb 2025 20:24:29 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> > tracked in the partition, and no code paths have dependencies on the
> > mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> > can be cleaned up, specifically the ambiguity over whether the operational
> > mode implied anything about the partition order.
> >
> > Endpoint decoders simply reference their assigned partition where the
> > operational mode can be retrieved as partition mode.
> >
> > With this in place PMEM can now be partition0 which happens today when
> > the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
> > DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM"
> > assumption can now just iterate partitions and consult the partition
> > mode after the fact.
> >
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> One trivial equality check inline to tidy up otherwise nice.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[..]
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 78ecb88bad7e..d705dec1471e 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -359,7 +359,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>
> > -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> > - enum cxl_decoder_mode mode)
> > +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> > + enum cxl_partition_mode mode)
> > {
> > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > struct device *dev = &cxled->cxld.dev;
> > -
> > - switch (mode) {
> > - case CXL_DECODER_RAM:
> > - case CXL_DECODER_PMEM:
> > - break;
> > - default:
> > - dev_dbg(dev, "unsupported mode: %d\n", mode);
> > - return -EINVAL;
> > - }
> > + int part;
> >
> > guard(rwsem_write)(&cxl_dpa_rwsem);
> > if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> > return -EBUSY;
> >
> > - /*
> > - * Only allow modes that are supported by the current partition
> > - * configuration
> > - */
> > - if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> > - dev_dbg(dev, "no available pmem capacity\n");
> > - return -ENXIO;
> > + for (part = 0; part < cxlds->nr_partitions; part++)
> > + if (cxlds->part[part].mode == mode)
> > + break;
> > +
> > + if (part >= cxlds->nr_partitions) {
>
> How would it be greater?
It would never be greater in this context; this is just a personal
coding style preference for ">=" identifying the opposite set of numbers
from "<", rather than "==".
* [PATCH v3 6/6] cxl: Cleanup partition size and perf helpers
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
` (4 preceding siblings ...)
2025-02-04 4:24 ` [PATCH v3 5/6] cxl: Kill enum cxl_decoder_mode Dan Williams
@ 2025-02-04 4:24 ` Dan Williams
2025-02-04 12:32 ` Jonathan Cameron
2025-02-04 20:52 ` Ira Weiny
2025-02-04 10:42 ` [PATCH v3 0/6] cxl: DPA partition metadata is a mess Alejandro Lucero Palau
2025-02-04 21:33 ` Dave Jiang
7 siblings, 2 replies; 19+ messages in thread
From: Dan Williams @ 2025-02-04 4:24 UTC (permalink / raw)
To: linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang,
Jonathan.Cameron
Now that the 'struct cxl_dpa_partition' array contains both size and
performance information, all paths that iterate over that information
can use a loop rather than hard-code 'ram' and 'pmem' lookups.
Remove, or reduce the scope of, the temporary helpers that bridged the
pre-'struct cxl_dpa_partition' state of the code to the post-'struct
cxl_dpa_partition' state.
- to_{ram,pmem}_perf(): scope reduced to just sysfs_emit + is_visible()
helpers
- to_{ram,pmem}_res(): fold into their only users cxl_{ram,pmem}_size()
- cxl_ram_size(): scope reduced to ram_size_show() (Note,
cxl_pmem_size() also used to gate nvdimm registration)
In short, memdev sysfs ABI already made the promise that 0-sized
partitions will show for memdevs, but that can be avoided for future
partitions by using dynamic sysfs group visibility (new relative to when
the partition ABI first shipped upstream).
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alejandro Lucero <alucerop@amd.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/cdat.c | 51 ++++++++++++++++++++++-------------------
drivers/cxl/core/memdev.c | 23 ++++++++++++++++++
drivers/cxl/cxlmem.h | 56 ++++++---------------------------------------
3 files changed, 57 insertions(+), 73 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index f96ae28022b0..bfba7f5019cf 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -306,9 +306,6 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
{
- if (!dpa_perf)
- return;
-
*dpa_perf = (struct cxl_dpa_perf) {
.qos_class = CXL_QOS_CLASS_INVALID,
};
@@ -317,9 +314,6 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
static bool cxl_qos_match(struct cxl_port *root_port,
struct cxl_dpa_perf *dpa_perf)
{
- if (!dpa_perf)
- return false;
-
if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
return false;
@@ -351,37 +345,46 @@ static int match_cxlrd_hb(struct device *dev, void *data)
return 0;
}
-static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
+static void cxl_qos_class_verify(struct cxl_memdev *cxlmd)
{
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds),
- *pmem_perf = to_pmem_perf(cxlds);
struct cxl_port *root_port;
- int rc;
struct cxl_root *cxl_root __free(put_cxl_root) =
find_cxl_root(cxlmd->endpoint);
+ /*
+ * No need to reset_dpa_perf() here as find_cxl_root() is guaranteed to
+ * succeed when called in the cxl_endpoint_port_probe() path.
+ */
if (!cxl_root)
- return -ENODEV;
+ return;
root_port = &cxl_root->port;
- /* Check that the QTG IDs are all sane between end device and root decoders */
- if (!cxl_qos_match(root_port, ram_perf))
- reset_dpa_perf(ram_perf);
- if (!cxl_qos_match(root_port, pmem_perf))
- reset_dpa_perf(pmem_perf);
-
- /* Check to make sure that the device's host bridge is under a root decoder */
- rc = device_for_each_child(&root_port->dev,
- cxlmd->endpoint->host_bridge, match_cxlrd_hb);
- if (!rc) {
- reset_dpa_perf(ram_perf);
- reset_dpa_perf(pmem_perf);
+ /*
+ * Save userspace from needing to check if a qos class has any matches
+ * by hiding qos class info if the memdev is not mapped by a root
+ * decoder, or the partition class does not match any root decoder
+ * class.
+ */
+ if (!device_for_each_child(&root_port->dev,
+ cxlmd->endpoint->host_bridge,
+ match_cxlrd_hb)) {
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ struct cxl_dpa_perf *perf = &cxlds->part[i].perf;
+
+ reset_dpa_perf(perf);
+ }
+ return;
}
- return rc;
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ struct cxl_dpa_perf *perf = &cxlds->part[i].perf;
+
+ if (!cxl_qos_match(root_port, perf))
+ reset_dpa_perf(perf);
+ }
}
static void discard_dsmas(struct xarray *xa)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 615cbd861f66..63c6c681125d 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -75,6 +75,14 @@ static ssize_t label_storage_size_show(struct device *dev,
}
static DEVICE_ATTR_RO(label_storage_size);
+static resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
+{
+ /* Static RAM is only expected at partition 0. */
+ if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
+ return 0;
+ return resource_size(&cxlds->part[0].res);
+}
+
static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -399,6 +407,14 @@ static struct attribute *cxl_memdev_attributes[] = {
NULL,
};
+static struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
+{
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
+ return &cxlds->part[i].perf;
+ return NULL;
+}
+
static ssize_t pmem_qos_class_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -417,6 +433,13 @@ static struct attribute *cxl_memdev_pmem_attributes[] = {
NULL,
};
+static struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
+{
+ if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
+ return NULL;
+ return &cxlds->part[0].perf;
+}
+
static ssize_t ram_qos_class_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index f218d43dec9f..36a091597066 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -470,58 +470,16 @@ struct cxl_dev_state {
struct cxl_mailbox cxl_mbox;
};
-
-/* Static RAM is only expected at partition 0. */
-static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
-{
- if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
- return NULL;
- return &cxlds->part[0].res;
-}
-
-/*
- * Static PMEM may be at partition index 0 when there is no static RAM
- * capacity.
- */
-static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
-{
- for (int i = 0; i < cxlds->nr_partitions; i++)
- if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
- return &cxlds->part[i].res;
- return NULL;
-}
-
-static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
-{
- if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
- return NULL;
- return &cxlds->part[0].perf;
-}
-
-static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
+static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
{
+ /*
+ * Static PMEM may be at partition index 0 when there is no static RAM
+ * capacity.
+ */
for (int i = 0; i < cxlds->nr_partitions; i++)
if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
- return &cxlds->part[i].perf;
- return NULL;
-}
-
-static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
-{
- const struct resource *res = to_ram_res(cxlds);
-
- if (!res)
- return 0;
- return resource_size(res);
-}
-
-static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
-{
- const struct resource *res = to_pmem_res(cxlds);
-
- if (!res)
- return 0;
- return resource_size(res);
+ return resource_size(&cxlds->part[i].res);
+ return 0;
}
static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
* Re: [PATCH v3 6/6] cxl: Cleanup partition size and perf helpers
2025-02-04 4:24 ` [PATCH v3 6/6] cxl: Cleanup partition size and perf helpers Dan Williams
@ 2025-02-04 12:32 ` Jonathan Cameron
2025-02-04 20:52 ` Ira Weiny
1 sibling, 0 replies; 19+ messages in thread
From: Jonathan Cameron @ 2025-02-04 12:32 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
On Mon, 03 Feb 2025 20:24:35 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> Now that the 'struct cxl_dpa_partition' array contains both size and
> performance information, all paths that iterate over that information
> can use a loop rather than hard-code 'ram' and 'pmem' lookups.
>
> Remove, or reduce the scope of, the temporary helpers that bridged the
> pre-'struct cxl_dpa_partition' state of the code to the post-'struct
> cxl_dpa_partition' state.
>
> - to_{ram,pmem}_perf(): scope reduced to just sysfs_emit + is_visible()
> helpers
>
> - to_{ram,pmem}_res(): fold into their only users cxl_{ram,pmem}_size()
>
> - cxl_ram_size(): scope reduced to ram_size_show() (Note,
> cxl_pmem_size() also used to gate nvdimm registration)
>
> In short, memdev sysfs ABI already made the promise that 0-sized
> partitions will show for memdevs, but that can be avoided for future
> partitions by using dynamic sysfs group visibility (new relative to when
> the partition ABI first shipped upstream).
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ok. This might make my earlier suggestions about not special-casing
ram regions less effective, but I think they are perhaps worth
considering nonetheless.
Anyhow, this patch is fine.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v3 6/6] cxl: Cleanup partition size and perf helpers
2025-02-04 4:24 ` [PATCH v3 6/6] cxl: Cleanup partition size and perf helpers Dan Williams
2025-02-04 12:32 ` Jonathan Cameron
@ 2025-02-04 20:52 ` Ira Weiny
1 sibling, 0 replies; 19+ messages in thread
From: Ira Weiny @ 2025-02-04 20:52 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
> Now that the 'struct cxl_dpa_partition' array contains both size and
> performance information, all paths that iterate over that information
> can use a loop rather than hard-code 'ram' and 'pmem' lookups.
>
> Remove, or reduce the scope of, the temporary helpers that bridged the
> pre-'struct cxl_dpa_partition' state of the code to the post-'struct
> cxl_dpa_partition' state.
>
> - to_{ram,pmem}_perf(): scope reduced to just sysfs_emit + is_visible()
> helpers
>
> - to_{ram,pmem}_res(): fold into their only users cxl_{ram,pmem}_size()
>
> - cxl_ram_size(): scope reduced to ram_size_show() (Note,
> cxl_pmem_size() also used to gate nvdimm registration)
>
> In short, memdev sysfs ABI already made the promise that 0-sized
> partitions will show for memdevs, but that can be avoided for future
> partitions by using dynamic sysfs group visibility (new relative to when
> the partition ABI first shipped upstream).
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v3 0/6] cxl: DPA partition metadata is a mess...
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
` (5 preceding siblings ...)
2025-02-04 4:24 ` [PATCH v3 6/6] cxl: Cleanup partition size and perf helpers Dan Williams
@ 2025-02-04 10:42 ` Alejandro Lucero Palau
2025-02-04 21:33 ` Dave Jiang
7 siblings, 0 replies; 19+ messages in thread
From: Alejandro Lucero Palau @ 2025-02-04 10:42 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Ira Weiny, Jonathan Cameron, Dave Jiang
On 2/4/25 04:24, Dan Williams wrote:
> Changes since v2 [0]:
> - Fix "no partition found" debug messages in update_perf_entry() and
> cxl_memdev_set_qos_class() (Jonathan)
> - Avoid touching dpa_perf_contains() twice in the series to add / remove
> NULL check (Alejandro)
> - Add a cxl_mode_name() helper (Jonathan)
> - Use @part as the iterator in cxl_dpa_set_part() to simplify mode
> lookups (Jonathan)
> - Require ordered + contiguous partitions (Ira and Alejandro)
> - Fix partition detection for BIOS established reservations, and kill
> CXL_PARTMODE_NONE (Jonathan)
> - Simplify release_skip() reuse its internals for request_skip()
> (Jonathan)
> - Add patch6 to finish cleanup of explicit partition lookups
>
> [0]: http://lore.kernel.org/173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com
>
> ---
>
> As noted in patch3, the pending efforts to add CXL Accelerator (type-2)
> device [1], and Dynamic Capacity (DCD) support [2], tripped on the
> no-longer-fit-for-purpose design in the CXL subsystem for tracking
> device-physical-address (DPA) metadata.
>
> In fact there was no design at all, just a couple of open-coded 'struct
> resource' instances for 'ram' and 'pmem' and a pile of explicit code
> referencing those resources directly.
>
> See patch3 for more details on the specific problems that caused, and
> patch4 for the eyesore reduction of making the DPA allocation algorithm
> partition number agnostic.
>
> The motivation with this effort is to make it easier to land the Type-2
> and DCD series.
>
> [1]: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com
> [2]: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com
>
> ---
>
> Dan Williams (6):
> cxl: Remove the CXL_DECODER_MIXED mistake
> cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
> cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
> cxl: Make cxl_dpa_alloc() DPA partition number agnostic
> cxl: Kill enum cxl_decoder_mode
> cxl: Cleanup partition size and perf helpers
>
>
> drivers/cxl/core/cdat.c | 101 +++++++------
> drivers/cxl/core/core.h | 4 -
> drivers/cxl/core/hdm.c | 319 ++++++++++++++++++++++++++++++++----------
> drivers/cxl/core/mbox.c | 66 +++------
> drivers/cxl/core/memdev.c | 66 +++++----
> drivers/cxl/core/port.c | 21 ++-
> drivers/cxl/core/region.c | 137 +++++++++---------
> drivers/cxl/cxl.h | 39 +----
> drivers/cxl/cxlmem.h | 52 ++++++-
> drivers/cxl/mem.c | 2
> drivers/cxl/pci.c | 7 +
> tools/testing/cxl/test/cxl.c | 22 +--
> tools/testing/cxl/test/mem.c | 7 +
> 13 files changed, 519 insertions(+), 324 deletions(-)
>
> base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
I have used this latest version with the impending Type2 v10 patchset
and it works as expected, using two different type2 devices/drivers.
So, for all the patches here:
Tested-by: Alejandro Lucero <alucerop@amd.com>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH v3 0/6] cxl: DPA partition metadata is a mess...
2025-02-04 4:24 [PATCH v3 0/6] cxl: DPA partition metadata is a mess Dan Williams
` (6 preceding siblings ...)
2025-02-04 10:42 ` [PATCH v3 0/6] cxl: DPA partition metadata is a mess Alejandro Lucero Palau
@ 2025-02-04 21:33 ` Dave Jiang
7 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2025-02-04 21:33 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Ira Weiny, Jonathan Cameron, Alejandro Lucero
On 2/3/25 9:24 PM, Dan Williams wrote:
> Changes since v2 [0]:
> - Fix "no partition found" debug messages in update_perf_entry() and
> cxl_memdev_set_qos_class() (Jonathan)
> - Avoid touching dpa_perf_contains() twice in the series to add / remove
> NULL check (Alejandro)
> - Add a cxl_mode_name() helper (Jonathan)
> - Use @part as the iterator in cxl_dpa_set_part() to simplify mode
> lookups (Jonathan)
> - Require ordered + contiguous partitions (Ira and Alejandro)
> - Fix partition detection for BIOS established reservations, and kill
> CXL_PARTMODE_NONE (Jonathan)
> - Simplify release_skip() reuse its internals for request_skip()
> (Jonathan)
> - Add patch6 to finish cleanup of explicit partition lookups
>
> [0]: http://lore.kernel.org/173753635014.3849855.17902348420186052714.stgit@dwillia2-xfh.jf.intel.com
>
Series applied to cxl/next
> ---
>
> As noted in patch3, the pending efforts to add CXL Accelerator (type-2)
> device [1], and Dynamic Capacity (DCD) support [2], tripped on the
> no-longer-fit-for-purpose design in the CXL subsystem for tracking
> device-physical-address (DPA) metadata.
>
> In fact there was no design at all, just a couple of open-coded 'struct
> resource' instances for 'ram' and 'pmem' and a pile of explicit code
> referencing those resources directly.
>
> See patch3 for more details on the specific problems that caused, and
> patch4 for the eyesore reduction of making the DPA allocation algorithm
> partition number agnostic.
>
> The motivation with this effort is to make it easier to land the Type-2
> and DCD series.
>
> [1]: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com
> [2]: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com
>
> ---
>
> Dan Williams (6):
> cxl: Remove the CXL_DECODER_MIXED mistake
> cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
> cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
> cxl: Make cxl_dpa_alloc() DPA partition number agnostic
> cxl: Kill enum cxl_decoder_mode
> cxl: Cleanup partition size and perf helpers
>
>
> drivers/cxl/core/cdat.c | 101 +++++++------
> drivers/cxl/core/core.h | 4 -
> drivers/cxl/core/hdm.c | 319 ++++++++++++++++++++++++++++++++----------
> drivers/cxl/core/mbox.c | 66 +++------
> drivers/cxl/core/memdev.c | 66 +++++----
> drivers/cxl/core/port.c | 21 ++-
> drivers/cxl/core/region.c | 137 +++++++++---------
> drivers/cxl/cxl.h | 39 +----
> drivers/cxl/cxlmem.h | 52 ++++++-
> drivers/cxl/mem.c | 2
> drivers/cxl/pci.c | 7 +
> tools/testing/cxl/test/cxl.c | 22 +--
> tools/testing/cxl/test/mem.c | 7 +
> 13 files changed, 519 insertions(+), 324 deletions(-)
>
> base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b