* [PATCH v2 0/5] cxl: DPA partition metadata is a mess...
@ 2025-01-22 8:59 Dan Williams
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
` (5 more replies)
0 siblings, 6 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-22 8:59 UTC (permalink / raw)
To: linux-cxl
Cc: Ira Weiny, Dave Jiang, Alejandro Lucero, dave.jiang,
Jonathan.Cameron
Changes since v1: [0]
- Stop requiring PMEM to be at partition-index 1, i.e. remove empty
partitions. (Jonathan)
- Document the assumptions and implementation of
{request,release}_skip() (Jonathan, Alejandro)
- Kill 'enum cxl_decoder_mode' to cleanup remainder of hard-coded
expectations of a static PMEM partition always being present
[0]: http://lore.kernel.org/173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com
---
As noted in patch3, the pending efforts to add CXL Accelerator (type-2)
device [1], and Dynamic Capacity (DCD) support [2], tripped on the
no-longer-fit-for-purpose design in the CXL subsystem for tracking
device-physical-address (DPA) metadata.
In fact there was no design at all, just a couple of open-coded 'struct
resource' instances for 'ram' and 'pmem' and a pile of explicit code
referencing those resources directly.
See patch3 for more details on the specific problems that design caused,
and patch4 for the eyesore reduction of making the DPA allocation
algorithm partition number agnostic.
The motivation for this effort is to make it easier to land the Type-2
and DCD series.
[1]: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com
[2]: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com
---
Dan Williams (5):
cxl: Remove the CXL_DECODER_MIXED mistake
cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
cxl: Make cxl_dpa_alloc() DPA partition number agnostic
cxl: Kill enum cxl_decoder_mode
drivers/cxl/core/cdat.c | 74 +++++-----
drivers/cxl/core/core.h | 4 -
drivers/cxl/core/hdm.c | 310 +++++++++++++++++++++++++++++++-----------
drivers/cxl/core/mbox.c | 66 +++------
drivers/cxl/core/memdev.c | 43 ++----
drivers/cxl/core/port.c | 20 ++-
drivers/cxl/core/region.c | 138 ++++++++++---------
drivers/cxl/cxl.h | 40 +----
drivers/cxl/cxlmem.h | 94 +++++++++++--
drivers/cxl/mem.c | 2
drivers/cxl/pci.c | 7 +
tools/testing/cxl/test/cxl.c | 22 +--
tools/testing/cxl/test/mem.c | 7 +
13 files changed, 511 insertions(+), 316 deletions(-)
base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
* [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake
2025-01-22 8:59 [PATCH v2 0/5] cxl: DPA partition metadata is a mess Dan Williams
@ 2025-01-22 8:59 ` Dan Williams
2025-01-22 14:11 ` Ira Weiny
` (3 more replies)
2025-01-22 8:59 ` [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
` (4 subsequent siblings)
5 siblings, 4 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-22 8:59 UTC (permalink / raw)
To: linux-cxl; +Cc: dave.jiang, Jonathan.Cameron
CXL_DECODER_MIXED is a safety mechanism introduced for the case where
platform firmware has programmed an endpoint decoder that straddles a
DPA partition boundary. While the kernel is careful to only allocate DPA
capacity within a single partition there is no guarantee that platform
firmware, or anything that touched the device before the current kernel,
gets that right.
However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
designation because of the way it tracks partition boundaries. A
request_resource() that spans ->ram_res and ->pmem_res fails with the
following signature:
__cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation
CXL_DECODER_MIXED is dead defensive programming after the driver has
already given up on the device. It has never offered any protection in
practice; just delete it.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/hdm.c | 6 +++---
drivers/cxl/core/region.c | 12 ------------
drivers/cxl/cxl.h | 4 +---
3 files changed, 4 insertions(+), 18 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 28edd5822486..2848d6991d45 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -332,9 +332,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
else if (resource_contains(&cxlds->ram_res, res))
cxled->mode = CXL_DECODER_RAM;
else {
- dev_warn(dev, "decoder%d.%d: %pr mixed mode not supported\n",
- port->id, cxled->cxld.id, cxled->dpa_res);
- cxled->mode = CXL_DECODER_MIXED;
+ dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
+ port->id, cxled->cxld.id, res);
+ cxled->mode = CXL_DECODER_NONE;
}
port->hdm_end++;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index d77899650798..e4885acac853 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2725,18 +2725,6 @@ static int poison_by_decoder(struct device *dev, void *arg)
if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
return rc;
- /*
- * Regions are only created with single mode decoders: pmem or ram.
- * Linux does not support mixed mode decoders. This means that
- * reading poison per endpoint decoder adheres to the requirement
- * that poison reads of pmem and ram must be separated.
- * CXL 3.0 Spec 8.2.9.8.4.1
- */
- if (cxled->mode == CXL_DECODER_MIXED) {
- dev_dbg(dev, "poison list read unsupported in mixed mode\n");
- return rc;
- }
-
cxlmd = cxled_to_memdev(cxled);
if (cxled->skip) {
offset = cxled->dpa_res->start - cxled->skip;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f6015f24ad38..4d0550367042 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -379,7 +379,6 @@ enum cxl_decoder_mode {
CXL_DECODER_NONE,
CXL_DECODER_RAM,
CXL_DECODER_PMEM,
- CXL_DECODER_MIXED,
CXL_DECODER_DEAD,
};
@@ -389,10 +388,9 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
[CXL_DECODER_NONE] = "none",
[CXL_DECODER_RAM] = "ram",
[CXL_DECODER_PMEM] = "pmem",
- [CXL_DECODER_MIXED] = "mixed",
};
- if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
+ if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
return names[mode];
return "mixed";
}
* [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-22 8:59 [PATCH v2 0/5] cxl: DPA partition metadata is a mess Dan Williams
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
@ 2025-01-22 8:59 ` Dan Williams
2025-01-22 14:18 ` Ira Weiny
` (3 more replies)
2025-01-22 8:59 ` [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
` (3 subsequent siblings)
5 siblings, 4 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-22 8:59 UTC (permalink / raw)
To: linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang,
Jonathan.Cameron
In preparation for consolidating all DPA partition information into an
array of DPA metadata, introduce helpers that hide the layout of the
current data. I.e. make the eventual replacement of ->ram_res,
->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
no-op for code paths that consume that information, and reduce the noise
of follow-on patches.
The end goal is to consolidate all DPA information in 'struct
cxl_dev_state', but for now the helpers just make it appear that all DPA
metadata is relative to @cxlds.
Note that a follow-on patch also cleans up the temporary placeholders of
@ram_res, and @pmem_res in the qos_class manipulation code,
cxl_dpa_alloc(), and cxl_mem_create_range_info().
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alejandro Lucero <alucerop@amd.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/cdat.c | 70 +++++++++++++++++++++++++-----------------
drivers/cxl/core/hdm.c | 26 ++++++++--------
drivers/cxl/core/mbox.c | 18 ++++++-----
drivers/cxl/core/memdev.c | 42 +++++++++++++------------
drivers/cxl/core/region.c | 10 ++++--
drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++-----
drivers/cxl/mem.c | 2 +
tools/testing/cxl/test/cxl.c | 25 ++++++++-------
8 files changed, 159 insertions(+), 92 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 8153f8d83a16..b177a488e29b 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -258,29 +258,33 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
struct xarray *dsmas_xa)
{
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
struct device *dev = cxlds->dev;
- struct range pmem_range = {
- .start = cxlds->pmem_res.start,
- .end = cxlds->pmem_res.end,
- };
- struct range ram_range = {
- .start = cxlds->ram_res.start,
- .end = cxlds->ram_res.end,
- };
struct dsmas_entry *dent;
unsigned long index;
+ const struct resource *partition[] = {
+ to_ram_res(cxlds),
+ to_pmem_res(cxlds),
+ };
+ struct cxl_dpa_perf *perf[] = {
+ to_ram_perf(cxlds),
+ to_pmem_perf(cxlds),
+ };
xa_for_each(dsmas_xa, index, dent) {
- if (resource_size(&cxlds->ram_res) &&
- range_contains(&ram_range, &dent->dpa_range))
- update_perf_entry(dev, dent, &mds->ram_perf);
- else if (resource_size(&cxlds->pmem_res) &&
- range_contains(&pmem_range, &dent->dpa_range))
- update_perf_entry(dev, dent, &mds->pmem_perf);
- else
- dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
- &dent->dpa_range);
+ for (int i = 0; i < ARRAY_SIZE(partition); i++) {
+ const struct resource *res = partition[i];
+ struct range range = {
+ .start = res->start,
+ .end = res->end,
+ };
+
+ if (range_contains(&range, &dent->dpa_range))
+ update_perf_entry(dev, dent, perf[i]);
+ else
+ dev_dbg(dev,
+ "no partition for dsmas dpa: %pra\n",
+ &dent->dpa_range);
+ }
}
}
@@ -304,6 +308,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
{
+ if (!dpa_perf)
+ return;
+
*dpa_perf = (struct cxl_dpa_perf) {
.qos_class = CXL_QOS_CLASS_INVALID,
};
@@ -312,6 +319,9 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
static bool cxl_qos_match(struct cxl_port *root_port,
struct cxl_dpa_perf *dpa_perf)
{
+ if (!dpa_perf)
+ return false;
+
if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
return false;
@@ -346,7 +356,8 @@ static int match_cxlrd_hb(struct device *dev, void *data)
static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
{
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds),
+ *pmem_perf = to_pmem_perf(cxlds);
struct cxl_port *root_port;
int rc;
@@ -359,17 +370,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
root_port = &cxl_root->port;
/* Check that the QTG IDs are all sane between end device and root decoders */
- if (!cxl_qos_match(root_port, &mds->ram_perf))
- reset_dpa_perf(&mds->ram_perf);
- if (!cxl_qos_match(root_port, &mds->pmem_perf))
- reset_dpa_perf(&mds->pmem_perf);
+ if (!cxl_qos_match(root_port, ram_perf))
+ reset_dpa_perf(ram_perf);
+ if (!cxl_qos_match(root_port, pmem_perf))
+ reset_dpa_perf(pmem_perf);
/* Check to make sure that the device's host bridge is under a root decoder */
rc = device_for_each_child(&root_port->dev,
cxlmd->endpoint->host_bridge, match_cxlrd_hb);
if (!rc) {
- reset_dpa_perf(&mds->ram_perf);
- reset_dpa_perf(&mds->pmem_perf);
+ reset_dpa_perf(ram_perf);
+ reset_dpa_perf(pmem_perf);
}
return rc;
@@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
.end = dpa_res->end,
};
+ if (!perf)
+ return false;
+
return range_contains(&perf->dpa_range, &dpa);
}
@@ -574,15 +588,15 @@ static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxle
enum cxl_decoder_mode mode)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_dpa_perf *perf;
switch (mode) {
case CXL_DECODER_RAM:
- perf = &mds->ram_perf;
+ perf = to_ram_perf(cxlds);
break;
case CXL_DECODER_PMEM:
- perf = &mds->pmem_perf;
+ perf = to_pmem_perf(cxlds);
break;
default:
return ERR_PTR(-EINVAL);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 2848d6991d45..7a85522294ad 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
cxled->dpa_res = res;
cxled->skip = skipped;
- if (resource_contains(&cxlds->pmem_res, res))
+ if (resource_contains(to_pmem_res(cxlds), res))
cxled->mode = CXL_DECODER_PMEM;
- else if (resource_contains(&cxlds->ram_res, res))
+ else if (resource_contains(to_ram_res(cxlds), res))
cxled->mode = CXL_DECODER_RAM;
else {
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
@@ -442,11 +442,11 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
* Only allow modes that are supported by the current partition
* configuration
*/
- if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
+ if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
dev_dbg(dev, "no available pmem capacity\n");
return -ENXIO;
}
- if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
+ if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
dev_dbg(dev, "no available ram capacity\n");
return -ENXIO;
}
@@ -464,6 +464,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
struct device *dev = &cxled->cxld.dev;
resource_size_t start, avail, skip;
struct resource *p, *last;
+ const struct resource *ram_res = to_ram_res(cxlds);
+ const struct resource *pmem_res = to_pmem_res(cxlds);
int rc;
down_write(&cxl_dpa_rwsem);
@@ -480,37 +482,37 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
goto out;
}
- for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
+ for (p = ram_res->child, last = NULL; p; p = p->sibling)
last = p;
if (last)
free_ram_start = last->end + 1;
else
- free_ram_start = cxlds->ram_res.start;
+ free_ram_start = ram_res->start;
- for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
+ for (p = pmem_res->child, last = NULL; p; p = p->sibling)
last = p;
if (last)
free_pmem_start = last->end + 1;
else
- free_pmem_start = cxlds->pmem_res.start;
+ free_pmem_start = pmem_res->start;
if (cxled->mode == CXL_DECODER_RAM) {
start = free_ram_start;
- avail = cxlds->ram_res.end - start + 1;
+ avail = ram_res->end - start + 1;
skip = 0;
} else if (cxled->mode == CXL_DECODER_PMEM) {
resource_size_t skip_start, skip_end;
start = free_pmem_start;
- avail = cxlds->pmem_res.end - start + 1;
+ avail = pmem_res->end - start + 1;
skip_start = free_ram_start;
/*
* If some pmem is already allocated, then that allocation
* already handled the skip.
*/
- if (cxlds->pmem_res.child &&
- skip_start == cxlds->pmem_res.child->start)
+ if (pmem_res->child &&
+ skip_start == pmem_res->child->start)
skip_end = skip_start - 1;
else
skip_end = start - 1;
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 548564c770c0..3502f1633ad2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent,
int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct resource *ram_res = to_ram_res(cxlds);
+ struct resource *pmem_res = to_pmem_res(cxlds);
struct device *dev = cxlds->dev;
int rc;
if (!cxlds->media_ready) {
cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
- cxlds->ram_res = DEFINE_RES_MEM(0, 0);
- cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
+ *ram_res = DEFINE_RES_MEM(0, 0);
+ *pmem_res = DEFINE_RES_MEM(0, 0);
return 0;
}
cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
if (mds->partition_align_bytes == 0) {
- rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+ rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
mds->volatile_only_bytes, "ram");
if (rc)
return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+ return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
mds->volatile_only_bytes,
mds->persistent_only_bytes, "pmem");
}
@@ -1298,11 +1300,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
return rc;
}
- rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+ rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
mds->active_volatile_bytes, "ram");
if (rc)
return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+ return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
mds->active_volatile_bytes,
mds->active_persistent_bytes, "pmem");
}
@@ -1450,8 +1452,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->cxlds.reg_map.host = dev;
mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
- mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
- mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+ to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
+ to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
return mds;
}
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index ae3dfcbe8938..c5f8320ed330 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- unsigned long long len = resource_size(&cxlds->ram_res);
+ unsigned long long len = resource_size(to_ram_res(cxlds));
return sysfs_emit(buf, "%#llx\n", len);
}
@@ -93,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- unsigned long long len = resource_size(&cxlds->pmem_res);
+ unsigned long long len = cxl_pmem_size(cxlds);
return sysfs_emit(buf, "%#llx\n", len);
}
@@ -198,16 +198,20 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
int rc = 0;
/* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
- if (resource_size(&cxlds->pmem_res)) {
- offset = cxlds->pmem_res.start;
- length = resource_size(&cxlds->pmem_res);
+ if (cxl_pmem_size(cxlds)) {
+ const struct resource *res = to_pmem_res(cxlds);
+
+ offset = res->start;
+ length = resource_size(res);
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
if (rc)
return rc;
}
- if (resource_size(&cxlds->ram_res)) {
- offset = cxlds->ram_res.start;
- length = resource_size(&cxlds->ram_res);
+ if (cxl_ram_size(cxlds)) {
+ const struct resource *res = to_ram_res(cxlds);
+
+ offset = res->start;
+ length = resource_size(res);
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
/*
* Invalid Physical Address is not an error for
@@ -409,9 +413,8 @@ static ssize_t pmem_qos_class_show(struct device *dev,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
+ return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class);
}
static struct device_attribute dev_attr_pmem_qos_class =
@@ -428,9 +431,8 @@ static ssize_t ram_qos_class_show(struct device *dev,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
+ return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class);
}
static struct device_attribute dev_attr_ram_qos_class =
@@ -466,11 +468,11 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds);
- if (a == &dev_attr_ram_qos_class.attr)
- if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
- return 0;
+ if (a == &dev_attr_ram_qos_class.attr &&
+ (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
+ return 0;
return a->mode;
}
@@ -485,11 +487,11 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds);
- if (a == &dev_attr_pmem_qos_class.attr)
- if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
- return 0;
+ if (a == &dev_attr_pmem_qos_class.attr &&
+ (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
+ return 0;
return a->mode;
}
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e4885acac853..9f0f6fdbc841 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2688,7 +2688,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
if (ctx->mode == CXL_DECODER_RAM) {
offset = ctx->offset;
- length = resource_size(&cxlds->ram_res) - offset;
+ length = cxl_ram_size(cxlds) - offset;
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
if (rc == -EFAULT)
rc = 0;
@@ -2700,9 +2700,11 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
length = resource_size(&cxlds->dpa_res) - offset;
if (!length)
return 0;
- } else if (resource_size(&cxlds->pmem_res)) {
- offset = cxlds->pmem_res.start;
- length = resource_size(&cxlds->pmem_res);
+ } else if (cxl_pmem_size(cxlds)) {
+ const struct resource *res = to_pmem_res(cxlds);
+
+ offset = res->start;
+ length = resource_size(res);
} else {
return 0;
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2a25d1957ddb..78e92e24d7b5 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -423,8 +423,8 @@ struct cxl_dpa_perf {
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
* @media_ready: Indicate whether the device media is usable
* @dpa_res: Overall DPA resource tree for the device
- * @pmem_res: Active Persistent memory capacity configuration
- * @ram_res: Active Volatile memory capacity configuration
+ * @_pmem_res: Active Persistent memory capacity configuration
+ * @_ram_res: Active Volatile memory capacity configuration
* @serial: PCIe Device Serial Number
* @type: Generic Memory Class device or Vendor Specific Memory device
* @cxl_mbox: CXL mailbox context
@@ -438,13 +438,41 @@ struct cxl_dev_state {
bool rcd;
bool media_ready;
struct resource dpa_res;
- struct resource pmem_res;
- struct resource ram_res;
+ struct resource _pmem_res;
+ struct resource _ram_res;
u64 serial;
enum cxl_devtype type;
struct cxl_mailbox cxl_mbox;
};
+static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
+{
+ return &cxlds->_ram_res;
+}
+
+static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
+{
+ return &cxlds->_pmem_res;
+}
+
+static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
+{
+ const struct resource *res = to_ram_res(cxlds);
+
+ if (!res)
+ return 0;
+ return resource_size(res);
+}
+
+static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
+{
+ const struct resource *res = to_pmem_res(cxlds);
+
+ if (!res)
+ return 0;
+ return resource_size(res);
+}
+
static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
{
return dev_get_drvdata(cxl_mbox->host);
@@ -471,8 +499,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
* @active_persistent_bytes: sum of hard + soft persistent
* @next_volatile_bytes: volatile capacity change pending device reset
* @next_persistent_bytes: persistent capacity change pending device reset
- * @ram_perf: performance data entry matched to RAM partition
- * @pmem_perf: performance data entry matched to PMEM partition
+ * @_ram_perf: performance data entry matched to RAM partition
+ * @_pmem_perf: performance data entry matched to PMEM partition
* @event: event log driver state
* @poison: poison driver state info
* @security: security driver state info
@@ -496,8 +524,8 @@ struct cxl_memdev_state {
u64 next_volatile_bytes;
u64 next_persistent_bytes;
- struct cxl_dpa_perf ram_perf;
- struct cxl_dpa_perf pmem_perf;
+ struct cxl_dpa_perf _ram_perf;
+ struct cxl_dpa_perf _pmem_perf;
struct cxl_event_state event;
struct cxl_poison_state poison;
@@ -505,6 +533,20 @@ struct cxl_memdev_state {
struct cxl_fw_state fw;
};
+static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
+{
+ struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
+
+ return &mds->_ram_perf;
+}
+
+static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
+{
+ struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
+
+ return &mds->_pmem_perf;
+}
+
static inline struct cxl_memdev_state *
to_cxl_memdev_state(struct cxl_dev_state *cxlds)
{
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 2f03a4d5606e..9675243bd05b 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -152,7 +152,7 @@ static int cxl_mem_probe(struct device *dev)
return -ENXIO;
}
- if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
+ if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
if (rc) {
if (rc == -ENODEV)
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index d0337c11f9ee..7f1c5061307b 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1000,25 +1000,28 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
find_cxl_root(port);
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
- struct range pmem_range = {
- .start = cxlds->pmem_res.start,
- .end = cxlds->pmem_res.end,
+ const struct resource *partition[] = {
+ to_ram_res(cxlds),
+ to_pmem_res(cxlds),
};
- struct range ram_range = {
- .start = cxlds->ram_res.start,
- .end = cxlds->ram_res.end,
+ struct cxl_dpa_perf *perf[] = {
+ to_ram_perf(cxlds),
+ to_pmem_perf(cxlds),
};
if (!cxl_root)
return;
- if (range_len(&ram_range))
- dpa_perf_setup(port, &ram_range, &mds->ram_perf);
+ for (int i = 0; i < ARRAY_SIZE(partition); i++) {
+ const struct resource *res = partition[i];
+ struct range range = {
+ .start = res->start,
+ .end = res->end,
+ };
- if (range_len(&pmem_range))
- dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
+ dpa_perf_setup(port, &range, perf[i]);
+ }
cxl_memdev_update_perf(cxlmd);
* [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 8:59 [PATCH v2 0/5] cxl: DPA partition metadata is a mess Dan Williams
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-22 8:59 ` [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
@ 2025-01-22 8:59 ` Dan Williams
2025-01-22 14:53 ` Ira Weiny
` (4 more replies)
2025-01-22 8:59 ` [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
` (2 subsequent siblings)
5 siblings, 5 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-22 8:59 UTC (permalink / raw)
To: linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang,
Jonathan.Cameron
The pending efforts to add CXL Accelerator (type-2) device [1], and
Dynamic Capacity (DCD) support [2], tripped on the
no-longer-fit-for-purpose design in the CXL subsystem for tracking
device-physical-address (DPA) metadata. Trip hazards include:
- CXL Memory Devices need to consider a PMEM partition, but Accelerator
devices with CXL.mem likely do not in the common case.
- CXL Memory Devices enumerate DPA through Memory Device mailbox
commands like Partition Info; Accelerator devices do not.
- CXL Memory Devices that support DCD support more than 2 partitions.
Some of the driver algorithms are awkward to expand to > 2 partition
cases.
- DPA performance data is a general capability that can be shared with
accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
suitable.
- Hardcoded assumptions around the PMEM partition always being index-1
if RAM is zero-sized or PMEM is zero-sized.
- 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
memory property; it should be phased out in favor of a partition id,
with the memory property coming from the partition info.
Towards cleaning up those issues and allowing a smoother landing for the
aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
way for Memory Devices and Accelerators to initialize the DPA information
in 'struct cxl_dev_state'.
For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
get the new data structure initialized, and cleanup some qos_class init.
Follow-on patches will go further and use the new data structure to
clean up algorithms that are better suited to looping over all possible
partitions.
cxl_dpa_setup() follows the locking expectations of mutating the device
DPA map, and is suitable for Accelerator drivers to use. Accelerators
likely only have one hardcoded 'ram' partition to convey to the
cxl_core.
Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alejandro Lucero <alucerop@amd.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/cdat.c | 15 ++-----
drivers/cxl/core/hdm.c | 75 +++++++++++++++++++++++++++++++++-
drivers/cxl/core/mbox.c | 68 ++++++++++--------------------
drivers/cxl/core/memdev.c | 2 -
drivers/cxl/cxlmem.h | 94 +++++++++++++++++++++++++++++-------------
drivers/cxl/pci.c | 7 +++
tools/testing/cxl/test/cxl.c | 15 ++-----
tools/testing/cxl/test/mem.c | 7 +++
8 files changed, 183 insertions(+), 100 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index b177a488e29b..5400a421ad30 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
struct device *dev = cxlds->dev;
struct dsmas_entry *dent;
unsigned long index;
- const struct resource *partition[] = {
- to_ram_res(cxlds),
- to_pmem_res(cxlds),
- };
- struct cxl_dpa_perf *perf[] = {
- to_ram_perf(cxlds),
- to_pmem_perf(cxlds),
- };
xa_for_each(dsmas_xa, index, dent) {
- for (int i = 0; i < ARRAY_SIZE(partition); i++) {
- const struct resource *res = partition[i];
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ struct resource *res = &cxlds->part[i].res;
struct range range = {
.start = res->start,
.end = res->end,
};
if (range_contains(&range, &dent->dpa_range))
- update_perf_entry(dev, dent, perf[i]);
+ update_perf_entry(dev, dent,
+ &cxlds->part[i].perf);
else
dev_dbg(dev,
"no partition for dsmas dpa: %pra\n",
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 7a85522294ad..3f8a54ca4624 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
cxled->dpa_res = res;
cxled->skip = skipped;
- if (resource_contains(to_pmem_res(cxlds), res))
+ if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
cxled->mode = CXL_DECODER_PMEM;
- else if (resource_contains(to_ram_res(cxlds), res))
+ else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
cxled->mode = CXL_DECODER_RAM;
else {
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
@@ -342,6 +342,77 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
return 0;
}
+static int add_dpa_res(struct device *dev, struct resource *parent,
+ struct resource *res, resource_size_t start,
+ resource_size_t size, const char *type)
+{
+ int rc;
+
+ *res = (struct resource) {
+ .name = type,
+ .start = start,
+ .end = start + size - 1,
+ .flags = IORESOURCE_MEM,
+ };
+ if (resource_size(res) == 0) {
+ dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
+ return 0;
+ }
+ rc = request_resource(parent, res);
+ if (rc) {
+ dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
+ res, rc);
+ return rc;
+ }
+
+ dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
+
+ return 0;
+}
+
+/* if this fails the caller must destroy @cxlds, there is no recovery */
+int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
+{
+ struct device *dev = cxlds->dev;
+
+ guard(rwsem_write)(&cxl_dpa_rwsem);
+
+ if (cxlds->nr_partitions)
+ return -EBUSY;
+
+ if (!info->size || !info->nr_partitions) {
+ cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
+ cxlds->nr_partitions = 0;
+ return 0;
+ }
+
+ cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
+
+ for (int i = 0; i < info->nr_partitions; i++) {
+ const struct cxl_dpa_part_info *part = &info->part[i];
+ const char *desc;
+ int rc;
+
+ if (part->mode == CXL_PARTMODE_RAM)
+ desc = "ram";
+ else if (part->mode == CXL_PARTMODE_PMEM)
+ desc = "pmem";
+ else
+ desc = "";
+ cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
+ cxlds->part[i].mode = part->mode;
+ rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
+ part->range.start, range_len(&part->range),
+ desc);
+ if (rc)
+ return rc;
+ cxlds->nr_partitions++;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_dpa_setup);
+
int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 3502f1633ad2..62bb3653362f 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1241,57 +1241,39 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
return rc;
}
-static int add_dpa_res(struct device *dev, struct resource *parent,
- struct resource *res, resource_size_t start,
- resource_size_t size, const char *type)
+static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
{
- int rc;
+ int i = info->nr_partitions;
- res->name = type;
- res->start = start;
- res->end = start + size - 1;
- res->flags = IORESOURCE_MEM;
- if (resource_size(res) == 0) {
- dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
- return 0;
- }
- rc = request_resource(parent, res);
- if (rc) {
- dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
- res, rc);
- return rc;
- }
-
- dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
+ if (size == 0)
+ return;
- return 0;
+ info->part[i].range = (struct range) {
+ .start = start,
+ .end = start + size - 1,
+ };
+ info->part[i].mode = mode;
+ info->nr_partitions++;
}
-int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
+int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
- struct resource *ram_res = to_ram_res(cxlds);
- struct resource *pmem_res = to_pmem_res(cxlds);
struct device *dev = cxlds->dev;
int rc;
if (!cxlds->media_ready) {
- cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
- *ram_res = DEFINE_RES_MEM(0, 0);
- *pmem_res = DEFINE_RES_MEM(0, 0);
+ info->size = 0;
return 0;
}
- cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
+ info->size = mds->total_bytes;
if (mds->partition_align_bytes == 0) {
- rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
- mds->volatile_only_bytes, "ram");
- if (rc)
- return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
- mds->volatile_only_bytes,
- mds->persistent_only_bytes, "pmem");
+ add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->volatile_only_bytes,
+ mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
+ return 0;
}
rc = cxl_mem_get_partition_info(mds);
@@ -1300,15 +1282,13 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
return rc;
}
- rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
- mds->active_volatile_bytes, "ram");
- if (rc)
- return rc;
- return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
- mds->active_volatile_bytes,
- mds->active_persistent_bytes, "pmem");
+ add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
+ CXL_PARTMODE_PMEM);
+
+ return 0;
}
-EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL");
+EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
int cxl_set_timestamp(struct cxl_memdev_state *mds)
{
@@ -1452,8 +1432,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->cxlds.reg_map.host = dev;
mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
- to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
- to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
return mds;
}
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index c5f8320ed330..be0eb57086e1 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- unsigned long long len = resource_size(to_ram_res(cxlds));
+ unsigned long long len = cxl_ram_size(cxlds);
return sysfs_emit(buf, "%#llx\n", len);
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 78e92e24d7b5..15f549afab7c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -97,6 +97,25 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
+enum cxl_partition_mode {
+ CXL_PARTMODE_NONE,
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
+#define CXL_NR_PARTITIONS_MAX 2
+
+struct cxl_dpa_info {
+ u64 size;
+ struct cxl_dpa_part_info {
+ struct range range;
+ enum cxl_partition_mode mode;
+ } part[CXL_NR_PARTITIONS_MAX];
+ int nr_partitions;
+};
+
+int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
+
static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
struct cxl_memdev *cxlmd)
{
@@ -408,6 +427,18 @@ struct cxl_dpa_perf {
int qos_class;
};
+/**
+ * struct cxl_dpa_partition - DPA partition descriptor
+ * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
+ * @perf: performance attributes of the partition from CDAT
+ * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
+ */
+struct cxl_dpa_partition {
+ struct resource res;
+ struct cxl_dpa_perf perf;
+ enum cxl_partition_mode mode;
+};
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -423,8 +454,8 @@ struct cxl_dpa_perf {
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
* @media_ready: Indicate whether the device media is usable
* @dpa_res: Overall DPA resource tree for the device
- * @_pmem_res: Active Persistent memory capacity configuration
- * @_ram_res: Active Volatile memory capacity configuration
+ * @part: DPA partition array
+ * @nr_partitions: Number of DPA partitions
* @serial: PCIe Device Serial Number
* @type: Generic Memory Class device or Vendor Specific Memory device
* @cxl_mbox: CXL mailbox context
@@ -438,21 +469,47 @@ struct cxl_dev_state {
bool rcd;
bool media_ready;
struct resource dpa_res;
- struct resource _pmem_res;
- struct resource _ram_res;
+ struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
+ unsigned int nr_partitions;
u64 serial;
enum cxl_devtype type;
struct cxl_mailbox cxl_mbox;
};
-static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
+
+/* Static RAM is only expected at partition 0. */
+static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
+{
+ if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
+ return NULL;
+ return &cxlds->part[0].res;
+}
+
+/*
+ * Static PMEM may be at partition index 0 when there is no static RAM
+ * capacity.
+ */
+static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
+{
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
+ return &cxlds->part[i].res;
+ return NULL;
+}
+
+static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
{
- return &cxlds->_ram_res;
+ if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
+ return NULL;
+ return &cxlds->part[0].perf;
}
-static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
+static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
{
- return &cxlds->_pmem_res;
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
+ return &cxlds->part[i].perf;
+ return NULL;
}
static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
@@ -499,8 +556,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
* @active_persistent_bytes: sum of hard + soft persistent
* @next_volatile_bytes: volatile capacity change pending device reset
* @next_persistent_bytes: persistent capacity change pending device reset
- * @_ram_perf: performance data entry matched to RAM partition
- * @_pmem_perf: performance data entry matched to PMEM partition
* @event: event log driver state
* @poison: poison driver state info
* @security: security driver state info
@@ -524,29 +579,12 @@ struct cxl_memdev_state {
u64 next_volatile_bytes;
u64 next_persistent_bytes;
- struct cxl_dpa_perf _ram_perf;
- struct cxl_dpa_perf _pmem_perf;
-
struct cxl_event_state event;
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
};
-static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
-{
- struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
-
- return &mds->_ram_perf;
-}
-
-static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
-{
- struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
-
- return &mds->_pmem_perf;
-}
-
static inline struct cxl_memdev_state *
to_cxl_memdev_state(struct cxl_dev_state *cxlds)
{
@@ -860,7 +898,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
-int cxl_mem_create_range_info(struct cxl_memdev_state *mds);
+int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0241d1d7133a..47dbfe406236 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -900,6 +900,7 @@ __ATTRIBUTE_GROUPS(cxl_rcd);
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
+ struct cxl_dpa_info range_info = { 0 };
struct cxl_memdev_state *mds;
struct cxl_dev_state *cxlds;
struct cxl_register_map map;
@@ -989,7 +990,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
- rc = cxl_mem_create_range_info(mds);
+ rc = cxl_mem_dpa_fetch(mds, &range_info);
+ if (rc)
+ return rc;
+
+ rc = cxl_dpa_setup(cxlds, &range_info);
if (rc)
return rc;
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 7f1c5061307b..ba3d48b37de3 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
- const struct resource *partition[] = {
- to_ram_res(cxlds),
- to_pmem_res(cxlds),
- };
- struct cxl_dpa_perf *perf[] = {
- to_ram_perf(cxlds),
- to_pmem_perf(cxlds),
- };
if (!cxl_root)
return;
- for (int i = 0; i < ARRAY_SIZE(partition); i++) {
- const struct resource *res = partition[i];
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ struct resource *res = &cxlds->part[i].res;
+ struct cxl_dpa_perf *perf = &cxlds->part[i].perf;
struct range range = {
.start = res->start,
.end = res->end,
};
- dpa_perf_setup(port, &range, perf[i]);
+ dpa_perf_setup(port, &range, perf);
}
cxl_memdev_update_perf(cxlmd);
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 347c1e7b37bd..ed365e083c8f 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1477,6 +1477,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
struct cxl_dev_state *cxlds;
struct cxl_mockmem_data *mdata;
struct cxl_mailbox *cxl_mbox;
+ struct cxl_dpa_info range_info = { 0 };
int rc;
mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
@@ -1537,7 +1538,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
- rc = cxl_mem_create_range_info(mds);
+ rc = cxl_mem_dpa_fetch(mds, &range_info);
+ if (rc)
+ return rc;
+
+ rc = cxl_dpa_setup(cxlds, &range_info);
if (rc)
return rc;
* [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
From: Dan Williams @ 2025-01-22 8:59 UTC (permalink / raw)
To: linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang,
Jonathan.Cameron
cxl_dpa_alloc() is a hard-coded nest of assumptions that PMEM
allocations are distinct from RAM allocations in specific ways, when in
practice the allocation rules are only relative to the DPA partition index.
The rules for cxl_dpa_alloc() are:
- allocations can only come from one partition
- if allocating at partition-index-N, all free space in partitions less
than partition-index-N must be skipped over
Use the new 'struct cxl_dpa_partition' array to support allocation with
an arbitrary number of DPA partitions on the device.
A follow-on patch can go further to clean up the 'enum cxl_decoder_mode'
concept and supersede it by looking up the memory properties from
partition metadata. Until then, cxl_part_mode() temporarily bridges code
that looks up partitions by @cxled->mode.
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alejandro Lucero <alucerop@amd.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/hdm.c | 215 +++++++++++++++++++++++++++++++++++-------------
drivers/cxl/cxlmem.h | 14 +++
2 files changed, 172 insertions(+), 57 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 3f8a54ca4624..591aeb26c9e1 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
+/* See request_skip() kernel-doc */
+static void release_skip(struct cxl_dev_state *cxlds,
+ const resource_size_t skip_base,
+ const resource_size_t skip_len)
+{
+ resource_size_t skip_start = skip_base, skip_rem = skip_len;
+
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ const struct resource *part_res = &cxlds->part[i].res;
+ resource_size_t skip_end, skip_size;
+
+ if (skip_start < part_res->start || skip_start > part_res->end)
+ continue;
+
+ skip_end = min(part_res->end, skip_start + skip_rem - 1);
+ skip_size = skip_end - skip_start + 1;
+ __release_region(&cxlds->dpa_res, skip_start, skip_size);
+ skip_start += skip_size;
+ skip_rem -= skip_size;
+
+ if (!skip_rem)
+ break;
+ }
+}
+
/*
* Must be called in a context that synchronizes against this decoder's
* port ->remove() callback (like an endpoint decoder sysfs attribute)
@@ -241,7 +266,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
skip_start = res->start - cxled->skip;
__release_region(&cxlds->dpa_res, res->start, resource_size(res));
if (cxled->skip)
- __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
+ release_skip(cxlds, skip_start, cxled->skip);
cxled->skip = 0;
cxled->dpa_res = NULL;
put_device(&cxled->cxld.dev);
@@ -268,6 +293,79 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
__cxl_dpa_release(cxled);
}
+/**
+ * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree
+ * @cxlds: CXL.mem device context that parents @cxled
+ * @cxled: Endpoint decoder establishing new allocation that skips lower DPA
+ * @skip_base: DPA < start of new DPA allocation (DPAnew)
+ * @skip_len: @skip_base + @skip_len == DPAnew
+ *
+ * DPA 'skip' arises from out-of-sequence DPA allocation events relative
+ * to free capacity across multiple partitions. It is a wasteful event
+ * as usable DPA gets thrown away, but if a deployment has, for example,
+ * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM
+ * DPA, the free RAM DPA must be sacrificed to start allocating PMEM.
+ * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder
+ * Protection" for more details.
+ *
+ * A 'skip' always covers the last allocated DPA in a previous partition
+ * to the start of the current partition to allocate. Allocations never
+ * start in the middle of a partition, and allocations are always
+ * de-allocated in reverse order (see cxl_dpa_free(), or natural devm
+ * unwind order from forced in-order allocation).
+ *
+ * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip'
+ * would always be contained to a single partition. Given
+ * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip'
+ * might span "tail capacity of partition[0], all of partition[1], ...,
+ * all of partition[N-1]" to support allocating from partition[N]. That
+ * in turn interacts with the partition 'struct resource' boundaries
+ * within @cxlds->dpa_res whereby 'skip' requests need to be divided by
+ * partition. I.e. this is a quirk of using a 'struct resource' tree to
+ * detect range conflicts while also tracking partition boundaries in
+ * @cxlds->dpa_res.
+ */
+static int request_skip(struct cxl_dev_state *cxlds,
+ struct cxl_endpoint_decoder *cxled,
+ const resource_size_t skip_base,
+ const resource_size_t skip_len)
+{
+ resource_size_t skip_start = skip_base, skip_rem = skip_len;
+
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ const struct resource *part_res = &cxlds->part[i].res;
+ struct cxl_port *port = cxled_to_port(cxled);
+ resource_size_t skip_end, skip_size;
+ struct resource *res;
+
+ if (skip_start < part_res->start || skip_start > part_res->end)
+ continue;
+
+ skip_end = min(part_res->end, skip_start + skip_rem - 1);
+ skip_size = skip_end - skip_start + 1;
+
+ res = __request_region(&cxlds->dpa_res, skip_start, skip_size,
+ dev_name(&cxled->cxld.dev), 0);
+ if (!res) {
+ dev_dbg(cxlds->dev,
+ "decoder%d.%d: failed to reserve skipped space\n",
+ port->id, cxled->cxld.id);
+ break;
+ }
+ skip_start += skip_size;
+ skip_rem -= skip_size;
+ if (!skip_rem)
+ break;
+ }
+
+ if (skip_rem == 0)
+ return 0;
+
+ release_skip(cxlds, skip_base, skip_len - skip_rem);
+
+ return -EBUSY;
+}
+
static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped)
@@ -276,7 +374,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &port->dev;
+ enum cxl_decoder_mode mode;
struct resource *res;
+ int rc;
lockdep_assert_held_write(&cxl_dpa_rwsem);
@@ -305,14 +405,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
}
if (skipped) {
- res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
- dev_name(&cxled->cxld.dev), 0);
- if (!res) {
- dev_dbg(dev,
- "decoder%d.%d: failed to reserve skipped space\n",
- port->id, cxled->cxld.id);
- return -EBUSY;
- }
+ rc = request_skip(cxlds, cxled, base - skipped, skipped);
+ if (rc)
+ return rc;
}
res = __request_region(&cxlds->dpa_res, base, len,
dev_name(&cxled->cxld.dev), 0);
@@ -320,22 +415,23 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
port->id, cxled->cxld.id);
if (skipped)
- __release_region(&cxlds->dpa_res, base - skipped,
- skipped);
+ release_skip(cxlds, base - skipped, skipped);
return -EBUSY;
}
cxled->dpa_res = res;
cxled->skip = skipped;
- if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
- cxled->mode = CXL_DECODER_PMEM;
- else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
- cxled->mode = CXL_DECODER_RAM;
- else {
+ mode = CXL_DECODER_NONE;
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (resource_contains(&cxlds->part[i].res, res)) {
+ mode = cxl_part_mode(cxlds->part[i].mode);
+ break;
+ }
+
+ if (mode == CXL_DECODER_NONE)
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
port->id, cxled->cxld.id, res);
- cxled->mode = CXL_DECODER_NONE;
- }
+ cxled->mode = mode;
port->hdm_end++;
get_device(&cxled->cxld.dev);
@@ -529,15 +625,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- resource_size_t free_ram_start, free_pmem_start;
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &cxled->cxld.dev;
- resource_size_t start, avail, skip;
+ struct resource *res, *prev = NULL;
+ resource_size_t start, avail, skip, skip_start;
struct resource *p, *last;
- const struct resource *ram_res = to_ram_res(cxlds);
- const struct resource *pmem_res = to_pmem_res(cxlds);
- int rc;
+ int part, rc;
down_write(&cxl_dpa_rwsem);
if (cxled->cxld.region) {
@@ -553,47 +647,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
goto out;
}
- for (p = ram_res->child, last = NULL; p; p = p->sibling)
- last = p;
- if (last)
- free_ram_start = last->end + 1;
- else
- free_ram_start = ram_res->start;
+ part = -1;
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
+ part = i;
+ break;
+ }
+ }
- for (p = pmem_res->child, last = NULL; p; p = p->sibling)
+ if (part < 0) {
+ dev_dbg(dev, "partition %d not found\n", part);
+ rc = -EBUSY;
+ goto out;
+ }
+
+ res = &cxlds->part[part].res;
+ for (p = res->child, last = NULL; p; p = p->sibling)
last = p;
if (last)
- free_pmem_start = last->end + 1;
+ start = last->end + 1;
else
- free_pmem_start = pmem_res->start;
-
- if (cxled->mode == CXL_DECODER_RAM) {
- start = free_ram_start;
- avail = ram_res->end - start + 1;
- skip = 0;
- } else if (cxled->mode == CXL_DECODER_PMEM) {
- resource_size_t skip_start, skip_end;
+ start = res->start;
- start = free_pmem_start;
- avail = pmem_res->end - start + 1;
- skip_start = free_ram_start;
-
- /*
- * If some pmem is already allocated, then that allocation
- * already handled the skip.
- */
- if (pmem_res->child &&
- skip_start == pmem_res->child->start)
- skip_end = skip_start - 1;
- else
- skip_end = start - 1;
- skip = skip_end - skip_start + 1;
- } else {
- dev_dbg(dev, "mode not set\n");
- rc = -EINVAL;
- goto out;
+ /*
+ * To allocate at partition N, a skip needs to be calculated for all
+ * unallocated space at lower partition indices.
+ *
+ * If a partition has any allocations, the search can end because a
+ * previous cxl_dpa_alloc() invocation is assumed to have accounted for
+ * all previous partitions.
+ */
+ skip_start = CXL_RESOURCE_NONE;
+ for (int i = part; i; i--) {
+ prev = &cxlds->part[i - 1].res;
+ for (p = prev->child, last = NULL; p; p = p->sibling)
+ last = p;
+ if (last) {
+ skip_start = last->end + 1;
+ break;
+ }
+ skip_start = prev->start;
}
+ avail = res->end - start + 1;
+ if (skip_start == CXL_RESOURCE_NONE)
+ skip = 0;
+ else
+ skip = res->start - skip_start;
+
if (size > avail) {
dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
cxl_decoder_mode_name(cxled->mode), &avail);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 15f549afab7c..bad99456e901 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -530,6 +530,20 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
return resource_size(res);
}
+/*
+ * Translate the operational mode of memory capacity with the
+ * operational mode of a decoder
+ * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
+ */
+static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
+{
+ if (mode == CXL_PARTMODE_RAM)
+ return CXL_DECODER_RAM;
+ if (mode == CXL_PARTMODE_PMEM)
+ return CXL_DECODER_PMEM;
+ return CXL_DECODER_NONE;
+}
+
static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
{
return dev_get_drvdata(cxl_mbox->host);
* [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
From: Dan Williams @ 2025-01-22 8:59 UTC (permalink / raw)
To: linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang,
Jonathan.Cameron
Now that the operational mode of DPA capacity (ram vs pmem, etc.) is
tracked in the partition, and no code paths depend on the mode implying
the partition index, the ambiguous 'enum cxl_decoder_mode' can be
cleaned up, specifically the ambiguity over whether the operational
mode implies anything about the partition order.
Endpoint decoders simply reference their assigned partition where the
operational mode can be retrieved as partition mode.
With this in place, PMEM can now be partition0, which happens today when
the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
DCD arrives, etc. Code sequences that hard-coded the "PMEM after RAM"
assumption can now just iterate partitions and consult the partition
mode after the fact.
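The lookup pattern that replaces the "mode implies index" assumption can
be sketched as follows (illustration only: hypothetical userspace types,
not the kernel code): a device with a single PMEM partition at index 0
resolves correctly, whereas the old scheme would have assumed ram ==
partition0 and pmem == partition1.

```c
#include <stdint.h>

/* Illustration only: hypothetical partition-mode lookup sketch */
enum part_mode { MODE_RAM, MODE_PMEM };

struct cxl_dev {
	struct {
		enum part_mode mode;
		uint64_t size;
	} part[2];
	int nr_partitions;
};

/* Return the index of the first partition operating in @mode, or -1 */
static int find_part(const struct cxl_dev *d, enum part_mode mode)
{
	/* iterate the array; no fixed index is implied by the mode */
	for (int i = 0; i < d->nr_partitions; i++)
		if (d->part[i].mode == mode)
			return i;
	return -1;
}
```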
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alejandro Lucero <alucerop@amd.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/cdat.c | 21 ++-----
drivers/cxl/core/core.h | 4 +
drivers/cxl/core/hdm.c | 64 +++++++----------------
drivers/cxl/core/memdev.c | 15 +----
drivers/cxl/core/port.c | 20 +++++--
drivers/cxl/core/region.c | 128 +++++++++++++++++++++++++--------------------
drivers/cxl/cxl.h | 38 ++++---------
drivers/cxl/cxlmem.h | 20 -------
8 files changed, 127 insertions(+), 183 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 5400a421ad30..ca7fb2b182ed 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -571,29 +571,18 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
.end = dpa_res->end,
};
- if (!perf)
- return false;
-
return range_contains(&perf->dpa_range, &dpa);
}
-static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled,
- enum cxl_decoder_mode mode)
+static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_dpa_perf *perf;
- switch (mode) {
- case CXL_DECODER_RAM:
- perf = to_ram_perf(cxlds);
- break;
- case CXL_DECODER_PMEM:
- perf = to_pmem_perf(cxlds);
- break;
- default:
+ if (cxled->part < 0)
return ERR_PTR(-EINVAL);
- }
+ perf = &cxlds->part[cxled->part].perf;
if (!dpa_perf_contains(perf, cxled->dpa_res))
return ERR_PTR(-EINVAL);
@@ -654,7 +643,7 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
if (cxlds->rcd)
return -ENODEV;
- perf = cxled_get_dpa_perf(cxled, cxlr->mode);
+ perf = cxled_get_dpa_perf(cxled);
if (IS_ERR(perf))
return PTR_ERR(perf);
@@ -1060,7 +1049,7 @@ void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
lockdep_assert_held(&cxl_dpa_rwsem);
- perf = cxled_get_dpa_perf(cxled, cxlr->mode);
+ perf = cxled_get_dpa_perf(cxled);
if (IS_ERR(perf))
return;
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 800466f96a68..22dac79c5192 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -72,8 +72,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
resource_size_t length);
struct dentry *cxl_debugfs_create_dir(const char *dir);
-int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
- enum cxl_decoder_mode mode);
+int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
+ enum cxl_partition_mode mode);
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 591aeb26c9e1..bb478e7b12f6 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -374,7 +374,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &port->dev;
- enum cxl_decoder_mode mode;
struct resource *res;
int rc;
@@ -421,18 +420,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
cxled->dpa_res = res;
cxled->skip = skipped;
- mode = CXL_DECODER_NONE;
- for (int i = 0; i < cxlds->nr_partitions; i++)
- if (resource_contains(&cxlds->part[i].res, res)) {
- mode = cxl_part_mode(cxlds->part[i].mode);
- break;
- }
-
- if (mode == CXL_DECODER_NONE)
- dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
- port->id, cxled->cxld.id, res);
- cxled->mode = mode;
-
port->hdm_end++;
get_device(&cxled->cxld.dev);
return 0;
@@ -585,40 +572,36 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
return rc;
}
-int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
- enum cxl_decoder_mode mode)
+int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
+ enum cxl_partition_mode mode)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *dev = &cxled->cxld.dev;
-
- switch (mode) {
- case CXL_DECODER_RAM:
- case CXL_DECODER_PMEM:
- break;
- default:
- dev_dbg(dev, "unsupported mode: %d\n", mode);
- return -EINVAL;
- }
+ int part;
guard(rwsem_write)(&cxl_dpa_rwsem);
if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
return -EBUSY;
- /*
- * Only allow modes that are supported by the current partition
- * configuration
- */
- if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
- dev_dbg(dev, "no available pmem capacity\n");
- return -ENXIO;
+ part = -1;
+ for (int i = 0; i < cxlds->nr_partitions; i++)
+ if (cxlds->part[i].mode == mode) {
+ part = i;
+ break;
+ }
+
+ if (part < 0) {
+ dev_dbg(dev, "unsupported mode: %d\n", mode);
+ return -EINVAL;
}
- if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
- dev_dbg(dev, "no available ram capacity\n");
+
+ if (!resource_size(&cxlds->part[part].res)) {
+ dev_dbg(dev, "no available capacity for mode: %d\n", mode);
return -ENXIO;
}
- cxled->mode = mode;
+ cxled->part = part;
return 0;
}
@@ -647,16 +630,9 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
goto out;
}
- part = -1;
- for (int i = 0; i < cxlds->nr_partitions; i++) {
- if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
- part = i;
- break;
- }
- }
-
+ part = cxled->part;
if (part < 0) {
- dev_dbg(dev, "partition %d not found\n", part);
+ dev_dbg(dev, "partition not set\n");
rc = -EBUSY;
goto out;
}
@@ -697,7 +673,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
if (size > avail) {
dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
- cxl_decoder_mode_name(cxled->mode), &avail);
+ res->name, &avail);
rc = -ENOSPC;
goto out;
}
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index be0eb57086e1..615cbd861f66 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -198,17 +198,8 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
int rc = 0;
/* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
- if (cxl_pmem_size(cxlds)) {
- const struct resource *res = to_pmem_res(cxlds);
-
- offset = res->start;
- length = resource_size(res);
- rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
- if (rc)
- return rc;
- }
- if (cxl_ram_size(cxlds)) {
- const struct resource *res = to_ram_res(cxlds);
+ for (int i = 0; i < cxlds->nr_partitions; i++) {
+ const struct resource *res = &cxlds->part[i].res;
offset = res->start;
length = resource_size(res);
@@ -217,7 +208,7 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
* Invalid Physical Address is not an error for
* volatile addresses. Device support is optional.
*/
- if (rc == -EFAULT)
+ if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
rc = 0;
}
return rc;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 78a5c2c25982..f5f2701c8771 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -194,25 +194,35 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ /* without @cxl_dpa_rwsem, make sure @part is not reloaded */
+ int part = READ_ONCE(cxled->part);
+ const char *desc;
+
+ if (part < 0)
+ desc = "none";
+ else
+ desc = cxlds->part[part].res.name;
- return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
+ return sysfs_emit(buf, "%s\n", desc);
}
static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
- enum cxl_decoder_mode mode;
+ enum cxl_partition_mode mode;
ssize_t rc;
if (sysfs_streq(buf, "pmem"))
- mode = CXL_DECODER_PMEM;
+ mode = CXL_PARTMODE_PMEM;
else if (sysfs_streq(buf, "ram"))
- mode = CXL_DECODER_RAM;
+ mode = CXL_PARTMODE_RAM;
else
return -EINVAL;
- rc = cxl_dpa_set_mode(cxled, mode);
+ rc = cxl_dpa_set_part(cxled, mode);
if (rc)
return rc;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 9f0f6fdbc841..83b985d2ba76 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -144,7 +144,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
rc = down_read_interruptible(&cxl_region_rwsem);
if (rc)
return rc;
- if (cxlr->mode != CXL_DECODER_PMEM)
+ if (cxlr->mode != CXL_PARTMODE_PMEM)
rc = sysfs_emit(buf, "\n");
else
rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
@@ -441,7 +441,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
* Support tooling that expects to find a 'uuid' attribute for all
* regions regardless of mode.
*/
- if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
+ if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_PARTMODE_PMEM)
return 0444;
return a->mode;
}
@@ -603,8 +603,16 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_region *cxlr = to_cxl_region(dev);
+ const char *desc;
- return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
+ if (cxlr->mode == CXL_PARTMODE_RAM)
+ desc = "ram";
+ else if (cxlr->mode == CXL_PARTMODE_PMEM)
+ desc = "pmem";
+ else
+ desc = "";
+
+ return sysfs_emit(buf, "%s\n", desc);
}
static DEVICE_ATTR_RO(mode);
@@ -630,7 +638,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
/* ways, granularity and uuid (if PMEM) need to be set before HPA */
if (!p->interleave_ways || !p->interleave_granularity ||
- (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid)))
+ (cxlr->mode == CXL_PARTMODE_PMEM && uuid_is_null(&p->uuid)))
return -ENXIO;
div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder);
@@ -1875,6 +1883,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_region_params *p = &cxlr->params;
struct cxl_port *ep_port, *root_port;
struct cxl_dport *dport;
@@ -1889,17 +1898,17 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return rc;
}
- if (cxled->mode != cxlr->mode) {
- dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
- dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
- return -EINVAL;
- }
-
- if (cxled->mode == CXL_DECODER_DEAD) {
+ if (cxled->part < 0) {
dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
return -ENODEV;
}
+ if (cxlds->part[cxled->part].mode != cxlr->mode) {
+ dev_dbg(&cxlr->dev, "%s region mode: %d mismatch\n",
+ dev_name(&cxled->cxld.dev), cxlr->mode);
+ return -EINVAL;
+ }
+
/* all full of members, or interleave config not established? */
if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
dev_dbg(&cxlr->dev, "region already active\n");
@@ -2102,7 +2111,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
{
down_write(&cxl_region_rwsem);
- cxled->mode = CXL_DECODER_DEAD;
+ cxled->part = -1;
cxl_region_detach(cxled);
up_write(&cxl_region_rwsem);
}
@@ -2458,7 +2467,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb,
*/
static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
int id,
- enum cxl_decoder_mode mode,
+ enum cxl_partition_mode mode,
enum cxl_decoder_type type)
{
struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
@@ -2512,13 +2521,13 @@ static ssize_t create_ram_region_show(struct device *dev,
}
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_decoder_mode mode, int id)
+ enum cxl_partition_mode mode, int id)
{
int rc;
switch (mode) {
- case CXL_DECODER_RAM:
- case CXL_DECODER_PMEM:
+ case CXL_PARTMODE_RAM:
+ case CXL_PARTMODE_PMEM:
break;
default:
dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
@@ -2538,7 +2547,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
}
static ssize_t create_region_store(struct device *dev, const char *buf,
- size_t len, enum cxl_decoder_mode mode)
+ size_t len, enum cxl_partition_mode mode)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_region *cxlr;
@@ -2559,7 +2568,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_DECODER_PMEM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM);
}
DEVICE_ATTR_RW(create_pmem_region);
@@ -2567,7 +2576,7 @@ static ssize_t create_ram_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_DECODER_RAM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_RAM);
}
DEVICE_ATTR_RW(create_ram_region);
@@ -2665,7 +2674,7 @@ EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, "CXL");
struct cxl_poison_context {
struct cxl_port *port;
- enum cxl_decoder_mode mode;
+ int part;
u64 offset;
};
@@ -2673,49 +2682,45 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
struct cxl_poison_context *ctx)
{
struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ const struct resource *res;
+ struct resource *p, *last;
u64 offset, length;
int rc = 0;
+ if (ctx->part < 0)
+ return 0;
+
/*
- * Collect poison for the remaining unmapped resources
- * after poison is collected by committed endpoints.
- *
- * Knowing that PMEM must always follow RAM, get poison
- * for unmapped resources based on the last decoder's mode:
- * ram: scan remains of ram range, then any pmem range
- * pmem: scan remains of pmem range
+ * Collect poison for the remaining unmapped resources after
+ * poison is collected by committed endpoint decoders.
*/
-
- if (ctx->mode == CXL_DECODER_RAM) {
- offset = ctx->offset;
- length = cxl_ram_size(cxlds) - offset;
+ for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
+ res = &cxlds->part[i].res;
+ for (p = res->child, last = NULL; p; p = p->sibling)
+ last = p;
+ if (last)
+ offset = last->end + 1;
+ else
+ offset = res->start;
+ length = res->end - offset + 1;
+ if (!length)
+ break;
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
- if (rc == -EFAULT)
- rc = 0;
+ if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
+ continue;
if (rc)
- return rc;
- }
- if (ctx->mode == CXL_DECODER_PMEM) {
- offset = ctx->offset;
- length = resource_size(&cxlds->dpa_res) - offset;
- if (!length)
- return 0;
- } else if (cxl_pmem_size(cxlds)) {
- const struct resource *res = to_pmem_res(cxlds);
-
- offset = res->start;
- length = resource_size(res);
- } else {
- return 0;
+ break;
}
- return cxl_mem_get_poison(cxlmd, offset, length, NULL);
+ return rc;
}
static int poison_by_decoder(struct device *dev, void *arg)
{
struct cxl_poison_context *ctx = arg;
struct cxl_endpoint_decoder *cxled;
+ enum cxl_partition_mode mode;
+ struct cxl_dev_state *cxlds;
struct cxl_memdev *cxlmd;
u64 offset, length;
int rc = 0;
@@ -2728,11 +2733,17 @@ static int poison_by_decoder(struct device *dev, void *arg)
return rc;
cxlmd = cxled_to_memdev(cxled);
+ cxlds = cxlmd->cxlds;
+ if (cxled->part < 0)
+ mode = CXL_PARTMODE_NONE;
+ else
+ mode = cxlds->part[cxled->part].mode;
+
if (cxled->skip) {
offset = cxled->dpa_res->start - cxled->skip;
length = cxled->skip;
rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
- if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
+ if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
rc = 0;
if (rc)
return rc;
@@ -2741,7 +2752,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
offset = cxled->dpa_res->start;
length = cxled->dpa_res->end - offset + 1;
rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
- if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
+ if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
rc = 0;
if (rc)
return rc;
@@ -2749,7 +2760,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
/* Iterate until commit_end is reached */
if (cxled->cxld.id == ctx->port->commit_end) {
ctx->offset = cxled->dpa_res->end + 1;
- ctx->mode = cxled->mode;
+ ctx->part = cxled->part;
return 1;
}
@@ -2762,7 +2773,8 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
int rc = 0;
ctx = (struct cxl_poison_context) {
- .port = port
+ .port = port,
+ .part = -1,
};
rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
@@ -3206,14 +3218,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_port *port = cxlrd_to_port(cxlrd);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct range *hpa = &cxled->cxld.hpa_range;
+ int rc, part = READ_ONCE(cxled->part);
struct cxl_region_params *p;
struct cxl_region *cxlr;
struct resource *res;
- int rc;
+
+ if (part < 0)
+ return ERR_PTR(-EBUSY);
do {
- cxlr = __create_region(cxlrd, cxled->mode,
+ cxlr = __create_region(cxlrd, cxlds->part[part].mode,
atomic_read(&cxlrd->region_id));
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
@@ -3416,9 +3432,9 @@ static int cxl_region_probe(struct device *dev)
return rc;
switch (cxlr->mode) {
- case CXL_DECODER_PMEM:
+ case CXL_PARTMODE_PMEM:
return devm_cxl_add_pmem_region(cxlr);
- case CXL_DECODER_RAM:
+ case CXL_PARTMODE_RAM:
/*
* The region can not be manged by CXL if any portion of
* it is already online as 'System RAM'
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 4d0550367042..cb6f0b761b24 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -371,30 +371,6 @@ struct cxl_decoder {
void (*reset)(struct cxl_decoder *cxld);
};
-/*
- * CXL_DECODER_DEAD prevents endpoints from being reattached to regions
- * while cxld_unregister() is running
- */
-enum cxl_decoder_mode {
- CXL_DECODER_NONE,
- CXL_DECODER_RAM,
- CXL_DECODER_PMEM,
- CXL_DECODER_DEAD,
-};
-
-static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
-{
- static const char * const names[] = {
- [CXL_DECODER_NONE] = "none",
- [CXL_DECODER_RAM] = "ram",
- [CXL_DECODER_PMEM] = "pmem",
- };
-
- if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
- return names[mode];
- return "mixed";
-}
-
/*
* Track whether this decoder is reserved for region autodiscovery, or
* free for userspace provisioning.
@@ -409,16 +385,16 @@ enum cxl_decoder_state {
* @cxld: base cxl_decoder_object
* @dpa_res: actively claimed DPA span of this decoder
* @skip: offset into @dpa_res where @cxld.hpa_range maps
- * @mode: which memory type / access-mode-partition this decoder targets
* @state: autodiscovery state
+ * @part: partition index this decoder maps
* @pos: interleave position in @cxld.region
*/
struct cxl_endpoint_decoder {
struct cxl_decoder cxld;
struct resource *dpa_res;
resource_size_t skip;
- enum cxl_decoder_mode mode;
enum cxl_decoder_state state;
+ int part;
int pos;
};
@@ -503,6 +479,12 @@ struct cxl_region_params {
int nr_targets;
};
+enum cxl_partition_mode {
+ CXL_PARTMODE_NONE,
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
@@ -522,7 +504,7 @@ struct cxl_region_params {
* struct cxl_region - CXL region
* @dev: This region's device
* @id: This region's id. Id is globally unique across all regions
- * @mode: Endpoint decoder allocation / access mode
+ * @mode: Operational mode of the mapped capacity
* @type: Endpoint decoder target type
* @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
* @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
@@ -535,7 +517,7 @@ struct cxl_region_params {
struct cxl_region {
struct device dev;
int id;
- enum cxl_decoder_mode mode;
+ enum cxl_partition_mode mode;
enum cxl_decoder_type type;
struct cxl_nvdimm_bridge *cxl_nvb;
struct cxl_pmem_region *cxlr_pmem;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index bad99456e901..f218d43dec9f 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -97,12 +97,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
-enum cxl_partition_mode {
- CXL_PARTMODE_NONE,
- CXL_PARTMODE_RAM,
- CXL_PARTMODE_PMEM,
-};
-
#define CXL_NR_PARTITIONS_MAX 2
struct cxl_dpa_info {
@@ -530,20 +524,6 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
return resource_size(res);
}
-/*
- * Translate the operational mode of memory capacity with the
- * operational mode of a decoder
- * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
- */
-static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
-{
- if (mode == CXL_PARTMODE_RAM)
- return CXL_DECODER_RAM;
- if (mode == CXL_PARTMODE_PMEM)
- return CXL_DECODER_PMEM;
- return CXL_DECODER_NONE;
-}
-
static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
{
return dev_get_drvdata(cxl_mbox->host);
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
@ 2025-01-22 14:11 ` Ira Weiny
2025-01-23 15:49 ` Jonathan Cameron
` (2 subsequent siblings)
3 siblings, 0 replies; 48+ messages in thread
From: Ira Weiny @ 2025-01-22 14:11 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: dave.jiang, Jonathan.Cameron
Dan Williams wrote:
> CXL_DECODER_MIXED is a safety mechanism introduced for the case where
> platform firmware has programmed an endpoint decoder that straddles a
> DPA partition boundary. While the kernel is careful to only allocate DPA
> capacity within a single partition there is no guarantee that platform
> firmware, or anything that touched the device before the current kernel,
> gets that right.
>
> However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
> designation because of the way it tracks partition boundaries. A
> request_resource() that spans ->ram_res and ->pmem_res fails with the
> following signature:
>
> __cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation
>
> CXL_DECODER_MIXED is dead defensive programming after the driver has
> already given up on the device. It has never offered any protection in
> practice, just delete it.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-22 8:59 ` [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
@ 2025-01-22 14:18 ` Ira Weiny
2025-01-23 15:57 ` Jonathan Cameron
` (2 subsequent siblings)
3 siblings, 0 replies; 48+ messages in thread
From: Ira Weiny @ 2025-01-22 14:18 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
> In preparation for consolidating all DPA partition information into an
> array of DPA metadata, introduce helpers that hide the layout of the
> current data. I.e. make the eventual replacement of ->ram_res,
> ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> no-op for code paths that consume that information, and reduce the noise
> of follow-on patches.
>
> The end goal is to consolidate all DPA information in 'struct
> cxl_dev_state', but for now the helpers just make it appear that all DPA
> metadata is relative to @cxlds.
>
> Note that a follow-on patch also cleans up the temporary placeholders of
> @ram_res, and @pmem_res in the qos_class manipulation code,
> cxl_dpa_alloc(), and cxl_mem_create_range_info().
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 8:59 ` [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
@ 2025-01-22 14:53 ` Ira Weiny
2025-01-22 22:24 ` Dan Williams
2025-01-23 16:09 ` Jonathan Cameron
` (3 subsequent siblings)
4 siblings, 1 reply; 48+ messages in thread
From: Ira Weiny @ 2025-01-22 14:53 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
[snip]
> +/* if this fails the caller must destroy @cxlds, there is no recovery */
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> +{
> + struct device *dev = cxlds->dev;
> +
> + guard(rwsem_write)(&cxl_dpa_rwsem);
> +
> + if (cxlds->nr_partitions)
> + return -EBUSY;
> +
> + if (!info->size || !info->nr_partitions) {
> + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> + cxlds->nr_partitions = 0;
> + return 0;
> + }
> +
> + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> +
> + for (int i = 0; i < info->nr_partitions; i++) {
> + const struct cxl_dpa_part_info *part = &info->part[i];
> + const char *desc;
> + int rc;
> +
> + if (part->mode == CXL_PARTMODE_RAM)
> + desc = "ram";
> + else if (part->mode == CXL_PARTMODE_PMEM)
> + desc = "pmem";
> + else
> + desc = "";
This can be a follow-on patch, but why not allow devices to name their
partitions?
> + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
> + cxlds->part[i].mode = part->mode;
> + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
> + part->range.start, range_len(&part->range),
> + desc);
> + if (rc)
> + return rc;
> + cxlds->nr_partitions++;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
> +
> int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped)
[snip]
>
> -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +
> +/* Static RAM is only expected at partition 0. */
Is this because the spec requires RAM first and the partition array must
remain in DPA order?
This could be in a follow-on patch, but unless I'm missing something the
partition information must be specified in increasing DPA order. Perhaps
that is accounted for in this series later.
Regardless, for this patch I don't see any brokenness.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-22 8:59 ` [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
@ 2025-01-22 16:29 ` Ira Weiny
2025-01-22 22:35 ` Dan Williams
2025-01-23 16:41 ` Jonathan Cameron
` (2 subsequent siblings)
3 siblings, 1 reply; 48+ messages in thread
From: Ira Weiny @ 2025-01-22 16:29 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
> cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> allocations being distinct from RAM allocations in specific ways when in
> practice the allocation rules are only relative to DPA partition index.
>
> The rules for cxl_dpa_alloc() are:
>
> - allocations can only come from 1 partition
>
> - if allocating at partition-index-N, all free space in partitions less
> than partition-index-N must be skipped over
I think this is a bit deeper. The partition index must also correspond to
the DPA order. The DCD code verifies the partition indices are in DPA
order when reading them from the device. Therefore, that code will add
them to cxl_dpa_info in order. But general device driver writers may miss
this point.
[snip]
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3f8a54ca4624..591aeb26c9e1 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
>
> +/* See request_skip() kernel-doc */
> +static void release_skip(struct cxl_dev_state *cxlds,
> + const resource_size_t skip_base,
> + const resource_size_t skip_len)
> +{
> + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> +
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *part_res = &cxlds->part[i].res;
> + resource_size_t skip_end, skip_size;
> +
> + if (skip_start < part_res->start || skip_start > part_res->end)
> + continue;
> +
> + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> + skip_size = skip_end - skip_start + 1;
> + __release_region(&cxlds->dpa_res, skip_start, skip_size);
> + skip_start += skip_size;
> + skip_rem -= skip_size;
> +
> + if (!skip_rem)
> + break;
> + }
> +}
> +
> /*
> * Must be called in a context that synchronizes against this decoder's
> * port ->remove() callback (like an endpoint decoder sysfs attribute)
> @@ -241,7 +266,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> skip_start = res->start - cxled->skip;
> __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> if (cxled->skip)
> - __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> + release_skip(cxlds, skip_start, cxled->skip);
> cxled->skip = 0;
> cxled->dpa_res = NULL;
> put_device(&cxled->cxld.dev);
> @@ -268,6 +293,79 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> __cxl_dpa_release(cxled);
> }
>
> +/**
> + * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree
> + * @cxlds: CXL.mem device context that parents @cxled
> + * @cxled: Endpoint decoder establishing new allocation that skips lower DPA
> + * @skip_base: DPA < start of new DPA allocation (DPAnew)
> + * @skip_len: @skip_base + @skip_len == DPAnew
> + *
> + * DPA 'skip' arises from out-of-sequence DPA allocation events relative
> + * to free capacity across multiple partitions. It is a wasteful event
> + * as usable DPA gets thrown away, but if a deployment has, for example,
> + * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM
> + * DPA, the free RAM DPA must be sacrificed to start allocating PMEM.
> + * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder
> + * Protection" for more details.
I think this is a great comment here.
> + *
> + * A 'skip' always covers the last allocated DPA in a previous partition
> + * to the start of the current partition to allocate. Allocations never
> + * start in the middle of a partition, and allocations are always
> + * de-allocated in reverse order (see cxl_dpa_free(), or natural devm
> + * unwind order from forced in-order allocation).
> + *
> + * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip'
> + * would always be contained to a single partition. Given
> + * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip'
> + * might span "tail capacity of partition[0], all of partition[1], ...,
> + * all of partition[N-1]" to support allocating from partition[N]. That
> + * in turn interacts with the partition 'struct resource' boundaries
> + * within @cxlds->dpa_res whereby 'skip' requests need to be divided by
> + * partition. I.e. this is a quirk of using a 'struct resource' tree to
> + * detect range conflicts while also tracking partition boundaries in
> + * @cxlds->dpa_res.
Another great comment but it does not actually cover the DCD case. This
is because in DCD the partitions might also have skips between them.
That said, the update should come with DCD, or with type-2 devices if they
end up with the same loosening of device partitions.
This is a good clean up though,
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-22 8:59 ` [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode Dan Williams
@ 2025-01-22 17:42 ` Ira Weiny
2025-01-22 22:58 ` Dan Williams
2025-01-23 21:30 ` Dave Jiang
2025-01-23 16:51 ` Jonathan Cameron
` (2 subsequent siblings)
3 siblings, 2 replies; 48+ messages in thread
From: Ira Weiny @ 2025-01-22 17:42 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
> Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> tracked in the partition, and no code paths have dependencies on the
> mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> can be cleaned up, specifically this ambiguity on whether the operation
> mode implied anything about the partition order.
>
> Endpoint decoders simply reference their assigned partition where the
> operational mode can be retrieved as partition mode.
You really seem to be defining a region mode, not a partition mode.
I did a lot of work to resolve this for DCD interleave in the future.
This included the introduction of the DC region mode. I __think__ that
what you have here will work fine.
However, from a user ABI standpoint I'm going to have to play games with
having the DCD partitions in a well defined sub-array such that the user
can specify which DCD partition they want to use. So the user concept of
decoder mode does not really go away.
In the interest of urgency I'm going to give my tag on this. But I would
have preferred this be called region mode. But I can see why partition mode
makes sense too.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 14:53 ` Ira Weiny
@ 2025-01-22 22:24 ` Dan Williams
2025-01-23 3:10 ` Ira Weiny
0 siblings, 1 reply; 48+ messages in thread
From: Dan Williams @ 2025-01-22 22:24 UTC (permalink / raw)
To: Ira Weiny, Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Ira Weiny wrote:
> Dan Williams wrote:
>
> [snip]
>
> > +/* if this fails the caller must destroy @cxlds, there is no recovery */
> > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> > +{
> > + struct device *dev = cxlds->dev;
> > +
> > + guard(rwsem_write)(&cxl_dpa_rwsem);
> > +
> > + if (cxlds->nr_partitions)
> > + return -EBUSY;
> > +
> > + if (!info->size || !info->nr_partitions) {
> > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> > + cxlds->nr_partitions = 0;
> > + return 0;
> > + }
> > +
> > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> > +
> > + for (int i = 0; i < info->nr_partitions; i++) {
> > + const struct cxl_dpa_part_info *part = &info->part[i];
> > + const char *desc;
> > + int rc;
> > +
> > + if (part->mode == CXL_PARTMODE_RAM)
> > + desc = "ram";
> > + else if (part->mode == CXL_PARTMODE_PMEM)
> > + desc = "pmem";
> > + else
> > + desc = "";
>
> This can be a follow-on patch, but why not allow devices to name their
> partitions?
The proposal in patch5 is that the partition resource name is the
operation mode. See the changes to mode_show().
So the name is there for the kernel/user ABI to tie the decoder's
assigned partition to an operation mode.
Now, what may need to happen is that the partitions and their modes get
exported in case userspace needs to know that allocating a decoder to
"dynamic ram" before it allocates a decoder to "ram" implies that future
attempts to allocate "ram" will fail. That may not need to include the
actual partition indices, just an ordering of operation modes.
So far we have been saved from needing such a thing as "ram+pmem"
devices are only an emulation test case, not something where end users
can get themselves into trouble by doing out-of-order allocations.
Usually RAM is pre-allocated by BIOS which also limits the possibility
of the 'skip' code ever being used.
> > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
> > + cxlds->part[i].mode = part->mode;
> > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
> > + part->range.start, range_len(&part->range),
> > + desc);
> > + if (rc)
> > + return rc;
> > + cxlds->nr_partitions++;
> > + }
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
> > +
> > int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > resource_size_t base, resource_size_t len,
> > resource_size_t skipped)
>
> [snip]
>
> >
> > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> > +
> > +/* Static RAM is only expected at partition 0. */
>
> Is this because the spec requires RAM first and the partition array must
> remain in DPA order?
Yes.
> This could be in a follow-on patch, but unless I'm missing something the
> partition information must be specified in increasing DPA order. Perhaps
> that is accounted for in this series later.
The partition information is specified in increasing DPA order in these
patches, so am I missing the concern?
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-22 16:29 ` Ira Weiny
@ 2025-01-22 22:35 ` Dan Williams
2025-01-23 3:14 ` Ira Weiny
0 siblings, 1 reply; 48+ messages in thread
From: Dan Williams @ 2025-01-22 22:35 UTC (permalink / raw)
To: Ira Weiny, Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Ira Weiny wrote:
> Dan Williams wrote:
> > cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> > allocations being distinct from RAM allocations in specific ways when in
> > practice the allocation rules are only relative to DPA partition index.
> >
> > The rules for cxl_dpa_alloc() are:
> >
> > - allocations can only come from 1 partition
> >
> > - if allocating at partition-index-N, all free space in partitions less
> > than partition-index-N must be skipped over
>
> I think this is a bit deeper. The partition index must also correspond to
> the DPA order. The DCD code verifies the partition indices are in DPA
> order when reading them from the device. Therefore, that code will add
> them to cxl_dpa_info in order. But general device driver writers may miss
> this point.
We could save them from themselves with some paranoia in
cxl_dpa_setup(), but as Alejandro said accelerators are typically
single-static-RAM-partition devices. The risk is low that someone builds
a multi-partition accelerator *and* builds a driver that messes that up,
but I would not say no to a comment that notes that expectation.
> [snip]
>
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 3f8a54ca4624..591aeb26c9e1 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
> >
> > +/* See request_skip() kernel-doc */
> > +static void release_skip(struct cxl_dev_state *cxlds,
> > + const resource_size_t skip_base,
> > + const resource_size_t skip_len)
> > +{
> > + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> > +
> > + for (int i = 0; i < cxlds->nr_partitions; i++) {
> > + const struct resource *part_res = &cxlds->part[i].res;
> > + resource_size_t skip_end, skip_size;
> > +
> > + if (skip_start < part_res->start || skip_start > part_res->end)
> > + continue;
> > +
> > + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> > + skip_size = skip_end - skip_start + 1;
> > + __release_region(&cxlds->dpa_res, skip_start, skip_size);
> > + skip_start += skip_size;
> > + skip_rem -= skip_size;
> > +
> > + if (!skip_rem)
> > + break;
> > + }
> > +}
> > +
> > /*
> > * Must be called in a context that synchronizes against this decoder's
> > * port ->remove() callback (like an endpoint decoder sysfs attribute)
> > @@ -241,7 +266,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > skip_start = res->start - cxled->skip;
> > __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> > if (cxled->skip)
> > - __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> > + release_skip(cxlds, skip_start, cxled->skip);
> > cxled->skip = 0;
> > cxled->dpa_res = NULL;
> > put_device(&cxled->cxld.dev);
> > @@ -268,6 +293,79 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > __cxl_dpa_release(cxled);
> > }
> >
> > +/**
> > + * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree
> > + * @cxlds: CXL.mem device context that parents @cxled
> > + * @cxled: Endpoint decoder establishing new allocation that skips lower DPA
> > + * @skip_base: DPA < start of new DPA allocation (DPAnew)
> > + * @skip_len: @skip_base + @skip_len == DPAnew
> > + *
> > + * DPA 'skip' arises from out-of-sequence DPA allocation events relative
> > + * to free capacity across multiple partitions. It is a wasteful event
> > + * as usable DPA gets thrown away, but if a deployment has, for example,
> > + * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM
> > + * DPA, the free RAM DPA must be sacrificed to start allocating PMEM.
> > + * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder
> > + * Protection" for more details.
>
> I think this is a great comment here.
Appreciate that, never know how these things are going to translate.
>
> > + *
> > + * A 'skip' always covers the last allocated DPA in a previous partition
> > + * to the start of the current partition to allocate. Allocations never
> > + * start in the middle of a partition, and allocations are always
> > + * de-allocated in reverse order (see cxl_dpa_free(), or natural devm
> > + * unwind order from forced in-order allocation).
> > + *
> > + * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip'
> > + * would always be contained to a single partition. Given
> > + * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip'
> > + * might span "tail capacity of partition[0], all of partition[1], ...,
> > + * all of partition[N-1]" to support allocating from partition[N]. That
> > + * in turn interacts with the partition 'struct resource' boundaries
> > + * within @cxlds->dpa_res whereby 'skip' requests need to be divided by
> > + * partition. I.e. this is a quirk of using a 'struct resource' tree to
> > + * detect range conflicts while also tracking partition boundaries in
> > + * @cxlds->dpa_res.
>
> Another great comment but it does not actually cover the DCD case. This
> is because in DCD the partitions might also have skips between them.
I think that "just works". The allocation will be bound by the
partition, and the skip is calculated from the "end of last allocation
in a previous partition". So, the distance between "end of last" and
"allocation start" will naturally include inter-partition holes, right?
> That said, the update should come with DCD, or with type 2 if those
> devices have the same loosening of device partitions.
>
> This is a good clean up though,
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Appreciate the quick turnaround... I will endeavor to do the same with
the next DCD posting.
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-22 17:42 ` Ira Weiny
@ 2025-01-22 22:58 ` Dan Williams
2025-01-23 3:39 ` Ira Weiny
2025-01-23 21:30 ` Dave Jiang
1 sibling, 1 reply; 48+ messages in thread
From: Dan Williams @ 2025-01-22 22:58 UTC (permalink / raw)
To: Ira Weiny, Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Ira Weiny wrote:
> Dan Williams wrote:
> > Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> > tracked in the partition, and no code paths have dependencies on the
> > mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> > can be cleaned up, specifically this ambiguity on whether the operation
> > mode implied anything about the partition order.
> >
> > Endpoint decoders simply reference their assigned partition where the
> > operational mode can be retrieved as partition mode.
>
> You really seem to be defining a region mode not a partition mode.
To me it comes down to the hierarchy of building up a region.
The DPA is in a fixed operational mode regardless of whether a region is
mapped to it. "pmem is always pmem", "ram is always ram" (modulo online
re-partition which no device has ever built). So calling it a "partition
mode" reflects that the partition comes first, then the endpoint decoder
is mapped to a partition, then the region is mapped to an endpoint
decoder. Region mode is subordinate to partition mode.
> I did a lot of work to resolve this for DCD interleave in the future.
> This included the introduction of the DC region mode. I __think__ that
> what you have here will work fine.
>
> However, from a user ABI standpoint I'm going to have to play games with
> having the DCD partitions in a well defined sub-array such that the user
> can specify which DCD partition they want to use. So the user concept of
> decoder mode does not really go away.
This is the question, do we need to rip that "give userspace explicit
partition control" ABI band-aid?
As I mentioned over here [1], I admit that someone might build a "ram,
dynamic ram, shared ram" device, I remain skeptical that someone will
build a, for example, "ram, dynamic ram, dynamic ram, shared ram"
device. We can always make the ABI more complicated in the future, but
the common case of "userspace need only care about mode and let the
kernel find the partition", probably carries the implementation for the
foreseeable future.
[1]: http://lore.kernel.org/67915ce296030_20fa29457@dwillia2-xfh.jf.intel.com.notmuch
> In the interest of urgency I'm going to give my tag on this. But I would
> have preferred this called region mode. But I can see why partition mode
> makes sense too.
It is a fair comment that deserves to be captured in the Glossary of
Terms entry for "partition".
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 22:24 ` Dan Williams
@ 2025-01-23 3:10 ` Ira Weiny
0 siblings, 0 replies; 48+ messages in thread
From: Ira Weiny @ 2025-01-23 3:10 UTC (permalink / raw)
To: Dan Williams, Ira Weiny, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
> Ira Weiny wrote:
> > Dan Williams wrote:
> >
> > [snip]
> >
> > > +/* if this fails the caller must destroy @cxlds, there is no recovery */
> > > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> > > +{
> > > + struct device *dev = cxlds->dev;
> > > +
> > > + guard(rwsem_write)(&cxl_dpa_rwsem);
> > > +
> > > + if (cxlds->nr_partitions)
> > > + return -EBUSY;
> > > +
> > > + if (!info->size || !info->nr_partitions) {
> > > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> > > + cxlds->nr_partitions = 0;
> > > + return 0;
> > > + }
> > > +
> > > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> > > +
> > > + for (int i = 0; i < info->nr_partitions; i++) {
> > > + const struct cxl_dpa_part_info *part = &info->part[i];
> > > + const char *desc;
> > > + int rc;
> > > +
> > > + if (part->mode == CXL_PARTMODE_RAM)
> > > + desc = "ram";
> > > + else if (part->mode == CXL_PARTMODE_PMEM)
> > > + desc = "pmem";
> > > + else
> > > + desc = "";
> >
> > This can be a follow on patch but why not allow devices to name their
> > partitions?
>
> The proposal in patch5 is that the partition resource name is the
> operation mode. See the changes to mode_show().
>
> So the name is there for the kernel/user ABI to tie the decoder's
> assigned partition to an operation mode.
>
> Now, what may need to happen is that the partitions and their modes get
> exported in case userspace needs to know that allocating a decoder to
> "dynamic ram" before it allocates a decoder to "ram" implies that future
> attempts to allocate "ram" will fail. That may not need to include the
> actual partition indices, just an ordering of operation modes.
I'm not buying the 'dynamic ram' partition type just yet. What evidence
do we have that a device will not have 2 dynamic ram partitions?
Further, in all our discussions on the use cases of DCD no one has
mentioned a need for the partition attributes to be communicated to the
user.
I'm starting to think this is better left to the user, with the
attributes of each partition exported to userspace.
But that has little to do with this patch now.
>
> So far we have been saved from needing such a thing, as "ram+pmem"
> devices are only an emulation test case, not something where end users
> can get themselves into trouble by doing out-of-order allocations.
>
> Usually RAM is pre-allocated by BIOS which also limits the possibility
> of the 'skip' code ever being used.
>
> > > + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
> > > + cxlds->part[i].mode = part->mode;
> > > + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
> > > + part->range.start, range_len(&part->range),
> > > + desc);
> > > + if (rc)
> > > + return rc;
> > > + cxlds->nr_partitions++;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
> > > +
> > > int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > > resource_size_t base, resource_size_t len,
> > > resource_size_t skipped)
> >
> > [snip]
> >
> > >
> > > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> > > +
> > > +/* Static RAM is only expected at partition 0. */
> >
> > Is this because the spec requires RAM first and the partition array must
> > remain in DPA order?
>
> Yes.
>
> > This could be in a follow on patch, but unless I'm missing something the
> > partition information must be specified in increasing DPA order. Perhaps
> > that is accounted for in this series later.
>
> The partition information is specified in increasing DPA order in these
> patches, so am I missing the concern?
No not missing anything.
Ira
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-22 22:35 ` Dan Williams
@ 2025-01-23 3:14 ` Ira Weiny
2025-01-23 3:28 ` Dan Williams
0 siblings, 1 reply; 48+ messages in thread
From: Ira Weiny @ 2025-01-23 3:14 UTC (permalink / raw)
To: Dan Williams, Ira Weiny, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
> Ira Weiny wrote:
> > Dan Williams wrote:
> > > cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> > > allocations being distinct from RAM allocations in specific ways when in
> > > practice the allocation rules are only relative to DPA partition index.
> > >
> > > The rules for cxl_dpa_alloc() are:
> > >
> > > - allocations can only come from 1 partition
> > >
> > > - if allocating at partition-index-N, all free space in partitions less
> > > than partition-index-N must be skipped over
> >
> > I think this is a bit deeper. The partition index must also correspond to
> > the DPA order. The DCD code verifies the partition indices are in DPA
> > order when reading them from the device. Therefore, that code will add
> > them to cxl_dpa_info in order. But general device driver writers may miss
> > this point.
>
> We could save them from themselves with some paranoia in
> cxl_dpa_setup(), but as Alejandro said accelerators are typically
> single-static-RAM-partition devices. The risk is low that someone builds
> a multi-partition accelerator *and* builds a driver that messes that up,
> but I would not say no to a comment that notes that expectation.
>
> > [snip]
> >
> > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > > index 3f8a54ca4624..591aeb26c9e1 100644
> > > --- a/drivers/cxl/core/hdm.c
> > > +++ b/drivers/cxl/core/hdm.c
> > > @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> > > }
> > > EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
> > >
> > > +/* See request_skip() kernel-doc */
> > > +static void release_skip(struct cxl_dev_state *cxlds,
> > > + const resource_size_t skip_base,
> > > + const resource_size_t skip_len)
> > > +{
> > > + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> > > +
> > > + for (int i = 0; i < cxlds->nr_partitions; i++) {
> > > + const struct resource *part_res = &cxlds->part[i].res;
> > > + resource_size_t skip_end, skip_size;
> > > +
> > > + if (skip_start < part_res->start || skip_start > part_res->end)
> > > + continue;
> > > +
> > > + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> > > + skip_size = skip_end - skip_start + 1;
> > > + __release_region(&cxlds->dpa_res, skip_start, skip_size);
> > > + skip_start += skip_size;
> > > + skip_rem -= skip_size;
> > > +
> > > + if (!skip_rem)
> > > + break;
> > > + }
> > > +}
> > > +
> > > /*
> > > * Must be called in a context that synchronizes against this decoder's
> > > * port ->remove() callback (like an endpoint decoder sysfs attribute)
> > > @@ -241,7 +266,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > > skip_start = res->start - cxled->skip;
> > > __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> > > if (cxled->skip)
> > > - __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> > > + release_skip(cxlds, skip_start, cxled->skip);
> > > cxled->skip = 0;
> > > cxled->dpa_res = NULL;
> > > put_device(&cxled->cxld.dev);
> > > @@ -268,6 +293,79 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > > __cxl_dpa_release(cxled);
> > > }
> > >
> > > +/**
> > > + * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree
> > > + * @cxlds: CXL.mem device context that parents @cxled
> > > + * @cxled: Endpoint decoder establishing new allocation that skips lower DPA
> > > + * @skip_base: DPA < start of new DPA allocation (DPAnew)
> > > + * @skip_len: @skip_base + @skip_len == DPAnew
> > > + *
> > > + * DPA 'skip' arises from out-of-sequence DPA allocation events relative
> > > + * to free capacity across multiple partitions. It is a wasteful event
> > > + * as usable DPA gets thrown away, but if a deployment has, for example,
> > > + * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM
> > > + * DPA, the free RAM DPA must be sacrificed to start allocating PMEM.
> > > + * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder
> > > + * Protection" for more details.
> >
> > I think this is a great comment here.
>
> Appreciate that, never know how these things are going to translate.
>
> >
> > > + *
> > > + * A 'skip' always covers the last allocated DPA in a previous partition
> > > + * to the start of the current partition to allocate. Allocations never
> > > + * start in the middle of a partition, and allocations are always
> > > + * de-allocated in reverse order (see cxl_dpa_free(), or natural devm
> > > + * unwind order from forced in-order allocation).
> > > + *
> > > + * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip'
> > > + * would always be contained to a single partition. Given
> > > + * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip'
> > > + * might span "tail capacity of partition[0], all of partition[1], ...,
> > > + * all of partition[N-1]" to support allocating from partition[N]. That
> > > + * in turn interacts with the partition 'struct resource' boundaries
> > > + * within @cxlds->dpa_res whereby 'skip' requests need to be divided by
> > > + * partition. I.e. this is a quirk of using a 'struct resource' tree to
> > > + * detect range conflicts while also tracking partition boundaries in
> > > + * @cxlds->dpa_res.
> >
> > Another great comment but it does not actually cover the DCD case. This
> > is because in DCD the partitions might also have skips between them.
>
> I think that "just works". The allocation will be bound by the
> partition, and the skip is calculated from the "end of last allocation
> in a previous partition". So, the distance between "end of last" and
> "allocation start" will naturally include inter-partition holes, right?
Not without a change to the algorithm I came up with. We could create
phantom partitions which represent the skips between partitions.
Otherwise the skip resources need a different parent.
From my commit message:
Two complications arise with Dynamic Capacity regions which did not
exist with RAM and PMEM partitions. First, gaps in the DPA space can
exist between and around the DC partitions. Second, the Linux resource
tree does not allow a resource to be marked across existing nodes within
a tree.
For clarity, below is an example of a 60GB device with 10GB of RAM,
10GB of PMEM and 10GB for each of 2 DC partitions. The desired CXL
mapping is 5GB of RAM, 5GB of PMEM, and 5GB of DC1.
                             DPA RANGE
                             (dpa_res)
0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|

   RAM        PMEM                  DC0                    DC1
(ram_res)  (pmem_res)           (dc_res[0])            (dc_res[1])
|----------|----------|  <gap>  |----------|   <gap>   |----------|

  RAM        PMEM                                        DC1
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
0GB   5GB  10GB  15GB 20GB       30GB       40GB       50GB      60GB
The previous skip resource between RAM and PMEM was always a child of
the RAM resource and fit nicely [see (S) below]. Because of this
simplicity this skip resource reference was not stored in any CXL state.
On release the skip range could be calculated based on the endpoint
decoders stored values.
Now when DC1 is being mapped, 4 skip resources must be created as
children: one of the PMEM resource (A), two of the parent DPA resource
(B, D), and one more child of the DC0 resource (C).
0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|
                          |                      |
|----------|----------|   |     |----------|     |     |----------|
        |         |       |         |            |
       (S)       (A)     (B)       (C)          (D)
        v         v       v         v            v
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
      skip        skip    skip     skip        skip
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-23 3:14 ` Ira Weiny
@ 2025-01-23 3:28 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-23 3:28 UTC (permalink / raw)
To: Ira Weiny, Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Ira Weiny wrote:
[..]
> > > Another great comment but it does not actually cover the DCD case. This
> > > is because in DCD the partitions might also have skips between them.
> >
> > I think that "just works". The allocation will be bound by the
> > partition, and the skip is calculated from the "end of last allocation
> > in a previous partition". So, the distance between "end of last" and
> > "allocation start" will naturally include inter-partition holes, right?
>
> Not without a change to the algorithm I came up with. We could create
> phantom partitions which represent the skips between partitions.
> Otherwise the skip resources need a different parent.
Oh, that is true. Same problem as tracking skips across more than 2
partitions.
However, I highly doubt anyone is going to build a device with
inter-partition skips to the point where I am comfortable with the
driver rejecting the possibility of such devices. Effectively dare
someone to build such a needlessly complicated device, especially when
Capacity Configuration supports the concept of decode-length vs usable
capacity.
>
> From my commit message:
>
> Two complications arise with Dynamic Capacity regions which did not
> exist with RAM and PMEM partitions. First, gaps in the DPA space can
> exist between and around the DC partitions. Second, the Linux resource
> tree does not allow a resource to be marked across existing nodes within
> a tree.
>
> For clarity, below is an example of a 60GB device with 10GB of RAM,
> 10GB of PMEM and 10GB for each of 2 DC partitions. The desired CXL
> mapping is 5GB of RAM, 5GB of PMEM, and 5GB of DC1.
>
>                              DPA RANGE
>                              (dpa_res)
> 0GB        10GB       20GB       30GB       40GB       50GB       60GB
> |----------|----------|----------|----------|----------|----------|
>
>    RAM        PMEM                  DC0                    DC1
> (ram_res)  (pmem_res)           (dc_res[0])            (dc_res[1])
> |----------|----------|  <gap>  |----------|   <gap>   |----------|
>
>   RAM        PMEM                                        DC1
> |XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
> 0GB   5GB  10GB  15GB 20GB       30GB       40GB       50GB      60GB
>
> The previous skip resource between RAM and PMEM was always a child of
> the RAM resource and fit nicely [see (S) below]. Because of this
> simplicity this skip resource reference was not stored in any CXL state.
> On release the skip range could be calculated based on the endpoint
> decoders stored values.
>
> Now when DC1 is being mapped, 4 skip resources must be created as
> children: one of the PMEM resource (A), two of the parent DPA resource
> (B, D), and one more child of the DC0 resource (C).
>
> 0GB        10GB       20GB       30GB       40GB       50GB       60GB
> |----------|----------|----------|----------|----------|----------|
>                           |                      |
> |----------|----------|   |     |----------|     |     |----------|
>         |         |       |         |            |
>        (S)       (A)     (B)       (C)          (D)
>         v         v       v         v            v
> |XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
>       skip        skip    skip     skip        skip
Yeah, I simply have low interest in reviewing a patch implementing that
scheme unless and until someone says "our device wants to do that, and
it is too late to change".
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-22 22:58 ` Dan Williams
@ 2025-01-23 3:39 ` Ira Weiny
2025-01-23 4:11 ` Dan Williams
0 siblings, 1 reply; 48+ messages in thread
From: Ira Weiny @ 2025-01-23 3:39 UTC (permalink / raw)
To: Dan Williams, Ira Weiny, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Dan Williams wrote:
> Ira Weiny wrote:
> > Dan Williams wrote:
> > > Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> > > tracked in the partition, and no code paths have dependencies on the
> > > mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> > > can be cleaned up, specifically this ambiguity on whether the operation
> > > mode implied anything about the partition order.
> > >
> > > Endpoint decoders simply reference their assigned partition where the
> > > operational mode can be retrieved as partition mode.
> >
> > You really seem to be defining a region mode not a partition mode.
>
> To me it comes down to the hierarchy of building up a region.
>
> The DPA is in a fixed operational mode regardless of whether a region is
> mapped to it. "pmem is always pmem", "ram is always ram" (modulo online
> re-partition which no device has ever built). So calling it a "partition
> mode" reflects that the partition comes first, then the endpoint decoder
> is mapped to a partition, then the region is mapped to an endpoint
> decoder. Region mode is subordinate to partition mode.
Exactly but
@@ -535,7 +517,7 @@ struct cxl_region_params {
struct cxl_region {
struct device dev;
int id;
- enum cxl_decoder_mode mode;
+ enum cxl_partition_mode mode;
enum cxl_decoder_type type;
struct cxl_nvdimm_bridge *cxl_nvb;
struct cxl_pmem_region *cxlr_pmem;
... is assigning that partition mode to the region.
>
> > I did a lot of work to resolve this for DCD interleave in the future.
> > This included the introduction of the DC region mode. I __think__ that
> > what you have here will work fine.
> >
> > However, from a user ABI standpoint I'm going to have to play games with
> > having the DCD partitions in a well defined sub-array such that the user
> > can specify which DCD partition they want to use. So the user concept of
> > decoder mode does not really go away.
>
> This is the question, do we need to rip that "give userspace explicit
> partition control" ABI band-aid?
>
> As I mentioned over here [1], I admit that someone might build a "ram,
> dynamic ram, shared ram" device, I remain skeptical that someone will
> build a, for example, "ram, dynamic ram, dynamic ram, shared ram"
> device. We can always make the ABI more complicated in the future, but
> the common case of "userspace need only care about mode and let the
> kernel find the partition", probably carries the implementation for the
> foreseeable future.
I've thought about this all afternoon. I feel like this is baking policy
into the kernel. Wouldn't it be better to export the attributes of the
partitions to the user and have them sort it out?
>
> [1]: http://lore.kernel.org/67915ce296030_20fa29457@dwillia2-xfh.jf.intel.com.notmuch
>
> > In the interest of urgency I'm going to give my tag on this. But I would
> > have preferred this called region mode. But I can see why partition mode
> > makes sense too.
>
> It is a fair comment that deserves to be captured in the Glossary of
> Terms entry for "partition".
eh... sure.
Ira
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-23 3:39 ` Ira Weiny
@ 2025-01-23 4:11 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-23 4:11 UTC (permalink / raw)
To: Ira Weiny, Dan Williams, linux-cxl
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, Jonathan.Cameron
Ira Weiny wrote:
> Dan Williams wrote:
> > Ira Weiny wrote:
> > > Dan Williams wrote:
> > > > Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> > > > tracked in the partition, and no code paths have dependencies on the
> > > > mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> > > > can be cleaned up, specifically this ambiguity on whether the operation
> > > > mode implied anything about the partition order.
> > > >
> > > > Endpoint decoders simply reference their assigned partition where the
> > > > operational mode can be retrieved as partition mode.
> > >
> > > You really seem to be defining a region mode not a partition mode.
> >
> > To me it comes down to the hierarchy of building up a region.
> >
> > The DPA is in a fixed operational mode regardless of whether a region is
> > mapped to it. "pmem is always pmem", "ram is always ram" (modulo online
> > re-partition which no device has ever built). So calling it a "partition
> > mode" reflects that the partition comes first, then the endpoint decoder
> > is mapped to a partition, then the region is mapped to an endpoint
> > decoder. Region mode is subordinate to partition mode.
>
> Exactly but
>
> @@ -535,7 +517,7 @@ struct cxl_region_params {
> struct cxl_region {
> struct device dev;
> int id;
> - enum cxl_decoder_mode mode;
> + enum cxl_partition_mode mode;
> enum cxl_decoder_type type;
> struct cxl_nvdimm_bridge *cxl_nvb;
> struct cxl_pmem_region *cxlr_pmem;
>
> ... is assigning that partition mode to the region.
Right, to cache it and simplify other code paths. Otherwise, say more, I
am not picking up the argument.
The alternative is an awkward helper along the lines of:
to_cxl_region_mode(struct cxl_region *cxlr)
{
guard(rwsem_read)(&cxl_region_rwsem);
return cxlds->part[cxlr->params->targets[0]->part].mode;
}
...and some other gymnastics when assigning that first decoder to the
targets list.
> > > I did a lot of work to resolve this for DCD interleave in the future.
> > > This included the introduction of the DC region mode. I __think__ that
> > > what you have here will work fine.
> > >
> > > However, from a user ABI standpoint I'm going to have to play games with
> > > having the DCD partitions in a well defined sub-array such that the user
> > > can specify which DCD partition they want to use. So the user concept of
> > > decoder mode does not really go away.
> >
> > This is the question, do we need to rip that "give userspace explicit
> > partition control" ABI band-aid?
> >
> > As I mentioned over here [1], I admit that someone might build a "ram,
> > dynamic ram, shared ram" device, I remain skeptical that someone will
> > build a, for example, "ram, dynamic ram, dynamic ram, shared ram"
> > device. We can always make the ABI more complicated in the future, but
> > the common case of "userspace need only care about mode and let the
> > kernel find the partition", probably carries the implementation for the
> > foreseeable future.
>
> I've thought about this all afternoon. I feel like this is baking in policy in
> the kernel. Wouldn't it be better to export the attributes of the partitions
> to the user and have them sort it out?
We have gotten by without expanding the user ABI. A large swath of DCD
functionality can be had with the kernel doing the mode-to-partition
lookup. Dare a device to require the kernel to expand its ABI.
History is littered with premature spec enabling.
* Re: [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-22 14:11 ` Ira Weiny
@ 2025-01-23 15:49 ` Jonathan Cameron
2025-01-23 15:58 ` Alejandro Lucero Palau
2025-01-23 16:03 ` Dave Jiang
3 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2025-01-23 15:49 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, dave.jiang
On Wed, 22 Jan 2025 00:59:16 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> CXL_DECODER_MIXED is a safety mechanism introduced for the case where
> platform firmware has programmed an endpoint decoder that straddles a
> DPA partition boundary. While the kernel is careful to only allocate DPA
> capacity within a single partition there is no guarantee that platform
> firmware, or anything that touched the device before the current kernel,
> gets that right.
>
> However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
> designation because of the way it tracks partition boundaries. A
> request_resource() that spans ->ram_res and ->pmem_res fails with the
> following signature:
>
> __cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation
>
> CXL_DECODER_MIXED is dead defensive programming after the driver has
> already given up on the device. It has never offered any protection in
> practice, just delete it.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-22 8:59 ` [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-01-22 14:18 ` Ira Weiny
@ 2025-01-23 15:57 ` Jonathan Cameron
2025-01-23 20:01 ` Dan Williams
2025-01-23 16:13 ` Dave Jiang
2025-01-23 16:25 ` Alejandro Lucero Palau
3 siblings, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2025-01-23 15:57 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
On Wed, 22 Jan 2025 00:59:21 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> In preparation for consolidating all DPA partition information into an
> array of DPA metadata, introduce helpers that hide the layout of the
> current data. I.e. make the eventual replacement of ->ram_res,
> ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> no-op for code paths that consume that information, and reduce the noise
> of follow-on patches.
>
> The end goal is to consolidate all DPA information in 'struct
> cxl_dev_state', but for now the helpers just make it appear that all DPA
> metadata is relative to @cxlds.
>
> Note that a follow-on patch also cleans up the temporary placeholders of
> @ram_res, and @pmem_res in the qos_class manipulation code,
> cxl_dpa_alloc(), and cxl_mem_create_range_info().
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/core/cdat.c | 70 +++++++++++++++++++++++++-----------------
> drivers/cxl/core/hdm.c | 26 ++++++++--------
> drivers/cxl/core/mbox.c | 18 ++++++-----
> drivers/cxl/core/memdev.c | 42 +++++++++++++------------
> drivers/cxl/core/region.c | 10 ++++--
> drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++-----
> drivers/cxl/mem.c | 2 +
> tools/testing/cxl/test/cxl.c | 25 ++++++++-------
> 8 files changed, 159 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 8153f8d83a16..b177a488e29b 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -258,29 +258,33 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
> static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> struct xarray *dsmas_xa)
> {
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct device *dev = cxlds->dev;
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> - };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> - };
> struct dsmas_entry *dent;
> unsigned long index;
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> + };
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> + };
>
> xa_for_each(dsmas_xa, index, dent) {
> - if (resource_size(&cxlds->ram_res) &&
> - range_contains(&ram_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->ram_perf);
> - else if (resource_size(&cxlds->pmem_res) &&
> - range_contains(&pmem_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->pmem_perf);
> - else
> - dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> - &dent->dpa_range);
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
> +
> + if (range_contains(&range, &dent->dpa_range))
> + update_perf_entry(dev, dent, perf[i]);
> + else
> + dev_dbg(dev,
> + "no partition for dsmas dpa: %pra\n",
> + &dent->dpa_range);
This else branch looks less than helpful if I read the code right.
It will fire at least once for every dsmas entry, implying no partition
when in reality it probably matched on the next partition.
Probably want to break out on match and check if i == ARRAY_SIZE(partition)
after the for loop, and only then print the message.
> + }
> }
> }
>
> @@ -304,6 +308,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
* Re: [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-22 14:11 ` Ira Weiny
2025-01-23 15:49 ` Jonathan Cameron
@ 2025-01-23 15:58 ` Alejandro Lucero Palau
2025-01-23 16:03 ` Dave Jiang
3 siblings, 0 replies; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-23 15:58 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: dave.jiang, Jonathan.Cameron
On 1/22/25 08:59, Dan Williams wrote:
> CXL_DECODER_MIXED is a safety mechanism introduced for the case where
> platform firmware has programmed an endpoint decoder that straddles a
> DPA partition boundary. While the kernel is careful to only allocate DPA
> capacity within a single partition there is no guarantee that platform
> firmware, or anything that touched the device before the current kernel,
> gets that right.
>
> However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
> designation because of the way it tracks partition boundaries. A
> request_resource() that spans ->ram_res and ->pmem_res fails with the
> following signature:
>
> __cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation
>
> CXL_DECODER_MIXED is dead defensive programming after the driver has
> already given up on the device. It has never offered any protection in
> practice, just delete it.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/core/hdm.c | 6 +++---
> drivers/cxl/core/region.c | 12 ------------
> drivers/cxl/cxl.h | 4 +---
> 3 files changed, 4 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 28edd5822486..2848d6991d45 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -332,9 +332,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> else if (resource_contains(&cxlds->ram_res, res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> - dev_warn(dev, "decoder%d.%d: %pr mixed mode not supported\n",
> - port->id, cxled->cxld.id, cxled->dpa_res);
> - cxled->mode = CXL_DECODER_MIXED;
> + dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> + port->id, cxled->cxld.id, res);
> + cxled->mode = CXL_DECODER_NONE;
> }
>
> port->hdm_end++;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index d77899650798..e4885acac853 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2725,18 +2725,6 @@ static int poison_by_decoder(struct device *dev, void *arg)
> if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> return rc;
>
> - /*
> - * Regions are only created with single mode decoders: pmem or ram.
> - * Linux does not support mixed mode decoders. This means that
> - * reading poison per endpoint decoder adheres to the requirement
> - * that poison reads of pmem and ram must be separated.
> - * CXL 3.0 Spec 8.2.9.8.4.1
> - */
> - if (cxled->mode == CXL_DECODER_MIXED) {
> - dev_dbg(dev, "poison list read unsupported in mixed mode\n");
> - return rc;
> - }
> -
> cxlmd = cxled_to_memdev(cxled);
> if (cxled->skip) {
> offset = cxled->dpa_res->start - cxled->skip;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f6015f24ad38..4d0550367042 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -379,7 +379,6 @@ enum cxl_decoder_mode {
> CXL_DECODER_NONE,
> CXL_DECODER_RAM,
> CXL_DECODER_PMEM,
> - CXL_DECODER_MIXED,
> CXL_DECODER_DEAD,
> };
>
> @@ -389,10 +388,9 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
> [CXL_DECODER_NONE] = "none",
> [CXL_DECODER_RAM] = "ram",
> [CXL_DECODER_PMEM] = "pmem",
> - [CXL_DECODER_MIXED] = "mixed",
> };
>
> - if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
> + if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
> return names[mode];
> return "mixed";
> }
>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
* Re: [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
` (2 preceding siblings ...)
2025-01-23 15:58 ` Alejandro Lucero Palau
@ 2025-01-23 16:03 ` Dave Jiang
3 siblings, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2025-01-23 16:03 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Jonathan.Cameron
On 1/22/25 1:59 AM, Dan Williams wrote:
> CXL_DECODER_MIXED is a safety mechanism introduced for the case where
> platform firmware has programmed an endpoint decoder that straddles a
> DPA partition boundary. While the kernel is careful to only allocate DPA
> capacity within a single partition there is no guarantee that platform
> firmware, or anything that touched the device before the current kernel,
> gets that right.
>
> However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
> designation because of the way it tracks partition boundaries. A
> request_resource() that spans ->ram_res and ->pmem_res fails with the
> following signature:
>
> __cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation
>
> CXL_DECODER_MIXED is dead defensive programming after the driver has
> already given up on the device. It has never offered any protection in
> practice, just delete it.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/hdm.c | 6 +++---
> drivers/cxl/core/region.c | 12 ------------
> drivers/cxl/cxl.h | 4 +---
> 3 files changed, 4 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 28edd5822486..2848d6991d45 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -332,9 +332,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> else if (resource_contains(&cxlds->ram_res, res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> - dev_warn(dev, "decoder%d.%d: %pr mixed mode not supported\n",
> - port->id, cxled->cxld.id, cxled->dpa_res);
> - cxled->mode = CXL_DECODER_MIXED;
> + dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> + port->id, cxled->cxld.id, res);
> + cxled->mode = CXL_DECODER_NONE;
> }
>
> port->hdm_end++;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index d77899650798..e4885acac853 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2725,18 +2725,6 @@ static int poison_by_decoder(struct device *dev, void *arg)
> if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> return rc;
>
> - /*
> - * Regions are only created with single mode decoders: pmem or ram.
> - * Linux does not support mixed mode decoders. This means that
> - * reading poison per endpoint decoder adheres to the requirement
> - * that poison reads of pmem and ram must be separated.
> - * CXL 3.0 Spec 8.2.9.8.4.1
> - */
> - if (cxled->mode == CXL_DECODER_MIXED) {
> - dev_dbg(dev, "poison list read unsupported in mixed mode\n");
> - return rc;
> - }
> -
> cxlmd = cxled_to_memdev(cxled);
> if (cxled->skip) {
> offset = cxled->dpa_res->start - cxled->skip;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f6015f24ad38..4d0550367042 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -379,7 +379,6 @@ enum cxl_decoder_mode {
> CXL_DECODER_NONE,
> CXL_DECODER_RAM,
> CXL_DECODER_PMEM,
> - CXL_DECODER_MIXED,
> CXL_DECODER_DEAD,
> };
>
> @@ -389,10 +388,9 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
> [CXL_DECODER_NONE] = "none",
> [CXL_DECODER_RAM] = "ram",
> [CXL_DECODER_PMEM] = "pmem",
> - [CXL_DECODER_MIXED] = "mixed",
> };
>
> - if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
> + if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
> return names[mode];
> return "mixed";
> }
>
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 8:59 ` [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
2025-01-22 14:53 ` Ira Weiny
@ 2025-01-23 16:09 ` Jonathan Cameron
2025-01-23 20:24 ` Dan Williams
2025-01-23 16:57 ` Dave Jiang
` (2 subsequent siblings)
4 siblings, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2025-01-23 16:09 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
On Wed, 22 Jan 2025 00:59:27 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> The pending efforts to add CXL Accelerator (type-2) device [1], and
> Dynamic Capacity (DCD) support [2], tripped on the
> no-longer-fit-for-purpose design in the CXL subsystem for tracking
> device-physical-address (DPA) metadata. Trip hazards include:
>
> - CXL Memory Devices need to consider a PMEM partition, but Accelerator
> devices with CXL.mem likely do not in the common case.
>
> - CXL Memory Devices enumerate DPA through Memory Device mailbox
> commands like Partition Info; Accelerator devices do not.
>
> - CXL Memory Devices that support DCD support more than 2 partitions.
> Some of the driver algorithms are awkward to expand to > 2 partition
> cases.
>
> - DPA performance data is a general capability that can be shared with
> accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
> suitable.
>
> - Hardcoded assumptions around the PMEM partition always being index-1
> if RAM is zero-sized or PMEM is zero sized.
>
> - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
> memory property; it should be phased out in favor of a partition id,
> with the memory property coming from the partition info.
>
> Towards cleaning up those issues and allowing a smoother landing for the
> aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
> array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
> way for Memory Devices and Accelerators to initialize the DPA information
> in 'struct cxl_dev_state'.
>
> For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
> get the new data structure initialized, and cleanup some qos_class init.
> Follow on patches will go further to use the new data structure to
> cleanup algorithms that are better suited to loop over all possible
> partitions.
>
> cxl_dpa_setup() follows the locking expectations of mutating the device
> DPA map, and is suitable for Accelerator drivers to use. Accelerators
> likely only have one hardcoded 'ram' partition to convey to the
> cxl_core.
>
> Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A few trivial comments inline but looking better to me.
One question about what smells to me like our next MIXED mode.
> ---
> drivers/cxl/core/cdat.c | 15 ++-----
> drivers/cxl/core/hdm.c | 75 +++++++++++++++++++++++++++++++++-
> drivers/cxl/core/mbox.c | 68 ++++++++++--------------------
> drivers/cxl/core/memdev.c | 2 -
> drivers/cxl/cxlmem.h | 94 +++++++++++++++++++++++++++++-------------
> drivers/cxl/pci.c | 7 +++
> tools/testing/cxl/test/cxl.c | 15 ++-----
> tools/testing/cxl/test/mem.c | 7 +++
> 8 files changed, 183 insertions(+), 100 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index b177a488e29b..5400a421ad30 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> +/* if this fails the caller must destroy @cxlds, there is no recovery */
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> +{
> + struct device *dev = cxlds->dev;
> +
> + guard(rwsem_write)(&cxl_dpa_rwsem);
> +
> + if (cxlds->nr_partitions)
> + return -EBUSY;
> +
> + if (!info->size || !info->nr_partitions) {
> + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> + cxlds->nr_partitions = 0;
> + return 0;
> + }
> +
> + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> +
> + for (int i = 0; i < info->nr_partitions; i++) {
> + const struct cxl_dpa_part_info *part = &info->part[i];
> + const char *desc;
> + int rc;
> +
> + if (part->mode == CXL_PARTMODE_RAM)
> + desc = "ram";
> + else if (part->mode == CXL_PARTMODE_PMEM)
> + desc = "pmem";
I'd go with a switch statement now to save having to fix this up later,
or an array of strings with a bounds check.
(Not important though if you want to shunt that into another day.)
> + else
> + desc = "";
> + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
> + cxlds->part[i].mode = part->mode;
> + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
> + part->range.start, range_len(&part->range),
> + desc);
> + if (rc)
> + return rc;
> + cxlds->nr_partitions++;
> + }
> +
> + return 0;
> +}
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 3502f1633ad2..62bb3653362f 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 78e92e24d7b5..15f549afab7c 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -97,6 +97,25 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped);
>
> +enum cxl_partition_mode {
> + CXL_PARTMODE_NONE,
What is NONE for? Given you are now packing the partitions and
counting them, when would we get an 'empty' one?
> + CXL_PARTMODE_RAM,
> + CXL_PARTMODE_PMEM,
> +};
> +
> +#define CXL_NR_PARTITIONS_MAX 2
> +
> +struct cxl_dpa_info {
> + u64 size;
> + struct cxl_dpa_part_info {
> + struct range range;
> + enum cxl_partition_mode mode;
> + } part[CXL_NR_PARTITIONS_MAX];
> + int nr_partitions;
> +};
> +
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-22 8:59 ` [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-01-22 14:18 ` Ira Weiny
2025-01-23 15:57 ` Jonathan Cameron
@ 2025-01-23 16:13 ` Dave Jiang
2025-01-23 16:25 ` Alejandro Lucero Palau
3 siblings, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2025-01-23 16:13 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Alejandro Lucero, Ira Weiny, Jonathan.Cameron
On 1/22/25 1:59 AM, Dan Williams wrote:
> In preparation for consolidating all DPA partition information into an
> array of DPA metadata, introduce helpers that hide the layout of the
> current data. I.e. make the eventual replacement of ->ram_res,
> ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> no-op for code paths that consume that information, and reduce the noise
> of follow-on patches.
>
> The end goal is to consolidate all DPA information in 'struct
> cxl_dev_state', but for now the helpers just make it appear that all DPA
> metadata is relative to @cxlds.
>
> Note that a follow-on patch also cleans up the temporary placeholders of
> @ram_res, and @pmem_res in the qos_class manipulation code,
> cxl_dpa_alloc(), and cxl_mem_create_range_info().
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With Jonathan's comment addressed
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/cdat.c | 70 +++++++++++++++++++++++++-----------------
> drivers/cxl/core/hdm.c | 26 ++++++++--------
> drivers/cxl/core/mbox.c | 18 ++++++-----
> drivers/cxl/core/memdev.c | 42 +++++++++++++------------
> drivers/cxl/core/region.c | 10 ++++--
> drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++-----
> drivers/cxl/mem.c | 2 +
> tools/testing/cxl/test/cxl.c | 25 ++++++++-------
> 8 files changed, 159 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 8153f8d83a16..b177a488e29b 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -258,29 +258,33 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
> static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> struct xarray *dsmas_xa)
> {
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct device *dev = cxlds->dev;
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> - };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> - };
> struct dsmas_entry *dent;
> unsigned long index;
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> + };
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> + };
>
> xa_for_each(dsmas_xa, index, dent) {
> - if (resource_size(&cxlds->ram_res) &&
> - range_contains(&ram_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->ram_perf);
> - else if (resource_size(&cxlds->pmem_res) &&
> - range_contains(&pmem_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->pmem_perf);
> - else
> - dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> - &dent->dpa_range);
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
> +
> + if (range_contains(&range, &dent->dpa_range))
> + update_perf_entry(dev, dent, perf[i]);
> + else
> + dev_dbg(dev,
> + "no partition for dsmas dpa: %pra\n",
> + &dent->dpa_range);
> + }
> }
> }
>
> @@ -304,6 +308,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
>
> static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
> {
> + if (!dpa_perf)
> + return;
> +
> *dpa_perf = (struct cxl_dpa_perf) {
> .qos_class = CXL_QOS_CLASS_INVALID,
> };
> @@ -312,6 +319,9 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
> static bool cxl_qos_match(struct cxl_port *root_port,
> struct cxl_dpa_perf *dpa_perf)
> {
> + if (!dpa_perf)
> + return false;
> +
> if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
> return false;
>
> @@ -346,7 +356,8 @@ static int match_cxlrd_hb(struct device *dev, void *data)
> static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
> {
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds),
> + *pmem_perf = to_pmem_perf(cxlds);
> struct cxl_port *root_port;
> int rc;
>
> @@ -359,17 +370,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
> root_port = &cxl_root->port;
>
> /* Check that the QTG IDs are all sane between end device and root decoders */
> - if (!cxl_qos_match(root_port, &mds->ram_perf))
> - reset_dpa_perf(&mds->ram_perf);
> - if (!cxl_qos_match(root_port, &mds->pmem_perf))
> - reset_dpa_perf(&mds->pmem_perf);
> + if (!cxl_qos_match(root_port, ram_perf))
> + reset_dpa_perf(ram_perf);
> + if (!cxl_qos_match(root_port, pmem_perf))
> + reset_dpa_perf(pmem_perf);
>
> /* Check to make sure that the device's host bridge is under a root decoder */
> rc = device_for_each_child(&root_port->dev,
> cxlmd->endpoint->host_bridge, match_cxlrd_hb);
> if (!rc) {
> - reset_dpa_perf(&mds->ram_perf);
> - reset_dpa_perf(&mds->pmem_perf);
> + reset_dpa_perf(ram_perf);
> + reset_dpa_perf(pmem_perf);
> }
>
> return rc;
> @@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
> .end = dpa_res->end,
> };
>
> + if (!perf)
> + return false;
> +
> return range_contains(&perf->dpa_range, &dpa);
> }
>
> @@ -574,15 +588,15 @@ static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxle
> enum cxl_decoder_mode mode)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_dpa_perf *perf;
>
> switch (mode) {
> case CXL_DECODER_RAM:
> - perf = &mds->ram_perf;
> + perf = to_ram_perf(cxlds);
> break;
> case CXL_DECODER_PMEM:
> - perf = &mds->pmem_perf;
> + perf = to_pmem_perf(cxlds);
> break;
> default:
> return ERR_PTR(-EINVAL);
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 2848d6991d45..7a85522294ad 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - if (resource_contains(&cxlds->pmem_res, res))
> + if (resource_contains(to_pmem_res(cxlds), res))
> cxled->mode = CXL_DECODER_PMEM;
> - else if (resource_contains(&cxlds->ram_res, res))
> + else if (resource_contains(to_ram_res(cxlds), res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> @@ -442,11 +442,11 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> * Only allow modes that are supported by the current partition
> * configuration
> */
> - if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
> + if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> dev_dbg(dev, "no available pmem capacity\n");
> return -ENXIO;
> }
> - if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
> + if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
> dev_dbg(dev, "no available ram capacity\n");
> return -ENXIO;
> }
> @@ -464,6 +464,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> struct device *dev = &cxled->cxld.dev;
> resource_size_t start, avail, skip;
> struct resource *p, *last;
> + const struct resource *ram_res = to_ram_res(cxlds);
> + const struct resource *pmem_res = to_pmem_res(cxlds);
> int rc;
>
> down_write(&cxl_dpa_rwsem);
> @@ -480,37 +482,37 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
> + for (p = ram_res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> free_ram_start = last->end + 1;
> else
> - free_ram_start = cxlds->ram_res.start;
> + free_ram_start = ram_res->start;
>
> - for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
> + for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> free_pmem_start = last->end + 1;
> else
> - free_pmem_start = cxlds->pmem_res.start;
> + free_pmem_start = pmem_res->start;
>
> if (cxled->mode == CXL_DECODER_RAM) {
> start = free_ram_start;
> - avail = cxlds->ram_res.end - start + 1;
> + avail = ram_res->end - start + 1;
> skip = 0;
> } else if (cxled->mode == CXL_DECODER_PMEM) {
> resource_size_t skip_start, skip_end;
>
> start = free_pmem_start;
> - avail = cxlds->pmem_res.end - start + 1;
> + avail = pmem_res->end - start + 1;
> skip_start = free_ram_start;
>
> /*
> * If some pmem is already allocated, then that allocation
> * already handled the skip.
> */
> - if (cxlds->pmem_res.child &&
> - skip_start == cxlds->pmem_res.child->start)
> + if (pmem_res->child &&
> + skip_start == pmem_res->child->start)
> skip_end = skip_start - 1;
> else
> skip_end = start - 1;
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 548564c770c0..3502f1633ad2 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent,
> int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct resource *ram_res = to_ram_res(cxlds);
> + struct resource *pmem_res = to_pmem_res(cxlds);
> struct device *dev = cxlds->dev;
> int rc;
>
> if (!cxlds->media_ready) {
> cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> - cxlds->ram_res = DEFINE_RES_MEM(0, 0);
> - cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
> + *ram_res = DEFINE_RES_MEM(0, 0);
> + *pmem_res = DEFINE_RES_MEM(0, 0);
> return 0;
> }
>
> cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
>
> if (mds->partition_align_bytes == 0) {
> - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> mds->volatile_only_bytes, "ram");
> if (rc)
> return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> mds->volatile_only_bytes,
> mds->persistent_only_bytes, "pmem");
> }
> @@ -1298,11 +1300,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> return rc;
> }
>
> - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> mds->active_volatile_bytes, "ram");
> if (rc)
> return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> mds->active_volatile_bytes,
> mds->active_persistent_bytes, "pmem");
> }
> @@ -1450,8 +1452,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> mds->cxlds.reg_map.host = dev;
> mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
> - mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
> - mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
> + to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
> + to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
>
> return mds;
> }
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index ae3dfcbe8938..c5f8320ed330 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(&cxlds->ram_res);
> + unsigned long long len = resource_size(to_ram_res(cxlds));
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> @@ -93,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(&cxlds->pmem_res);
> + unsigned long long len = cxl_pmem_size(cxlds);
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> @@ -198,16 +198,20 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> int rc = 0;
>
> /* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
> - if (resource_size(&cxlds->pmem_res)) {
> - offset = cxlds->pmem_res.start;
> - length = resource_size(&cxlds->pmem_res);
> + if (cxl_pmem_size(cxlds)) {
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> if (rc)
> return rc;
> }
> - if (resource_size(&cxlds->ram_res)) {
> - offset = cxlds->ram_res.start;
> - length = resource_size(&cxlds->ram_res);
> + if (cxl_ram_size(cxlds)) {
> + const struct resource *res = to_ram_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> /*
> * Invalid Physical Address is not an error for
> @@ -409,9 +413,8 @@ static ssize_t pmem_qos_class_show(struct device *dev,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>
> - return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
> + return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class);
> }
>
> static struct device_attribute dev_attr_pmem_qos_class =
> @@ -428,9 +431,8 @@ static ssize_t ram_qos_class_show(struct device *dev,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>
> - return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
> + return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class);
> }
>
> static struct device_attribute dev_attr_ram_qos_class =
> @@ -466,11 +468,11 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
> {
> struct device *dev = kobj_to_dev(kobj);
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds);
>
> - if (a == &dev_attr_ram_qos_class.attr)
> - if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
> - return 0;
> + if (a == &dev_attr_ram_qos_class.attr &&
> + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
> + return 0;
>
> return a->mode;
> }
> @@ -485,11 +487,11 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n
> {
> struct device *dev = kobj_to_dev(kobj);
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds);
>
> - if (a == &dev_attr_pmem_qos_class.attr)
> - if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
> - return 0;
> + if (a == &dev_attr_pmem_qos_class.attr &&
> + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
> + return 0;
>
> return a->mode;
> }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e4885acac853..9f0f6fdbc841 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2688,7 +2688,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
>
> if (ctx->mode == CXL_DECODER_RAM) {
> offset = ctx->offset;
> - length = resource_size(&cxlds->ram_res) - offset;
> + length = cxl_ram_size(cxlds) - offset;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> if (rc == -EFAULT)
> rc = 0;
> @@ -2700,9 +2700,11 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
> length = resource_size(&cxlds->dpa_res) - offset;
> if (!length)
> return 0;
> - } else if (resource_size(&cxlds->pmem_res)) {
> - offset = cxlds->pmem_res.start;
> - length = resource_size(&cxlds->pmem_res);
> + } else if (cxl_pmem_size(cxlds)) {
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> } else {
> return 0;
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 2a25d1957ddb..78e92e24d7b5 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -423,8 +423,8 @@ struct cxl_dpa_perf {
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> - * @pmem_res: Active Persistent memory capacity configuration
> - * @ram_res: Active Volatile memory capacity configuration
> + * @_pmem_res: Active Persistent memory capacity configuration
> + * @_ram_res: Active Volatile memory capacity configuration
> * @serial: PCIe Device Serial Number
> * @type: Generic Memory Class device or Vendor Specific Memory device
> * @cxl_mbox: CXL mailbox context
> @@ -438,13 +438,41 @@ struct cxl_dev_state {
> bool rcd;
> bool media_ready;
> struct resource dpa_res;
> - struct resource pmem_res;
> - struct resource ram_res;
> + struct resource _pmem_res;
> + struct resource _ram_res;
> u64 serial;
> enum cxl_devtype type;
> struct cxl_mailbox cxl_mbox;
> };
>
> +static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +{
> + return &cxlds->_ram_res;
> +}
> +
> +static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +{
> + return &cxlds->_pmem_res;
> +}
> +
> +static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
> +{
> + const struct resource *res = to_ram_res(cxlds);
> +
> + if (!res)
> + return 0;
> + return resource_size(res);
> +}
> +
> +static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> +{
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + if (!res)
> + return 0;
> + return resource_size(res);
> +}
> +
> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> {
> return dev_get_drvdata(cxl_mbox->host);
> @@ -471,8 +499,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> * @active_persistent_bytes: sum of hard + soft persistent
> * @next_volatile_bytes: volatile capacity change pending device reset
> * @next_persistent_bytes: persistent capacity change pending device reset
> - * @ram_perf: performance data entry matched to RAM partition
> - * @pmem_perf: performance data entry matched to PMEM partition
> + * @_ram_perf: performance data entry matched to RAM partition
> + * @_pmem_perf: performance data entry matched to PMEM partition
> * @event: event log driver state
> * @poison: poison driver state info
> * @security: security driver state info
> @@ -496,8 +524,8 @@ struct cxl_memdev_state {
> u64 next_volatile_bytes;
> u64 next_persistent_bytes;
>
> - struct cxl_dpa_perf ram_perf;
> - struct cxl_dpa_perf pmem_perf;
> + struct cxl_dpa_perf _ram_perf;
> + struct cxl_dpa_perf _pmem_perf;
>
> struct cxl_event_state event;
> struct cxl_poison_state poison;
> @@ -505,6 +533,20 @@ struct cxl_memdev_state {
> struct cxl_fw_state fw;
> };
>
> +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> +
> + return &mds->_ram_perf;
> +}
> +
> +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> +
> + return &mds->_pmem_perf;
> +}
> +
> static inline struct cxl_memdev_state *
> to_cxl_memdev_state(struct cxl_dev_state *cxlds)
> {
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 2f03a4d5606e..9675243bd05b 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -152,7 +152,7 @@ static int cxl_mem_probe(struct device *dev)
> return -ENXIO;
> }
>
> - if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> + if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
> if (rc) {
> if (rc == -ENODEV)
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index d0337c11f9ee..7f1c5061307b 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1000,25 +1000,28 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
> find_cxl_root(port);
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> };
>
> if (!cxl_root)
> return;
>
> - if (range_len(&ram_range))
> - dpa_perf_setup(port, &ram_range, &mds->ram_perf);
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
>
> - if (range_len(&pmem_range))
> - dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
> + dpa_perf_setup(port, &range, perf[i]);
> + }
>
> cxl_memdev_update_perf(cxlmd);
>
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-22 8:59 ` [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
` (2 preceding siblings ...)
2025-01-23 16:13 ` Dave Jiang
@ 2025-01-23 16:25 ` Alejandro Lucero Palau
2025-01-23 21:04 ` Dan Williams
3 siblings, 1 reply; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-23 16:25 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
On 1/22/25 08:59, Dan Williams wrote:
> In preparation for consolidating all DPA partition information into an
> array of DPA metadata, introduce helpers that hide the layout of the
> current data. I.e. make the eventual replacement of ->ram_res,
> ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> no-op for code paths that consume that information, and reduce the noise
> of follow-on patches.
>
> The end goal is to consolidate all DPA information in 'struct
> cxl_dev_state', but for now the helpers just make it appear that all DPA
> metadata is relative to @cxlds.
>
> Note that a follow-on patch also cleans up the temporary placeholders of
> @ram_res, and @pmem_res in the qos_class manipulation code,
> cxl_dpa_alloc(), and cxl_mem_create_range_info().
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/core/cdat.c | 70 +++++++++++++++++++++++++-----------------
> drivers/cxl/core/hdm.c | 26 ++++++++--------
> drivers/cxl/core/mbox.c | 18 ++++++-----
> drivers/cxl/core/memdev.c | 42 +++++++++++++------------
> drivers/cxl/core/region.c | 10 ++++--
> drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++-----
> drivers/cxl/mem.c | 2 +
> tools/testing/cxl/test/cxl.c | 25 ++++++++-------
> 8 files changed, 159 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 8153f8d83a16..b177a488e29b 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -258,29 +258,33 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
> static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> struct xarray *dsmas_xa)
> {
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct device *dev = cxlds->dev;
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> - };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> - };
> struct dsmas_entry *dent;
> unsigned long index;
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> + };
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> + };
>
> xa_for_each(dsmas_xa, index, dent) {
> - if (resource_size(&cxlds->ram_res) &&
> - range_contains(&ram_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->ram_perf);
> - else if (resource_size(&cxlds->pmem_res) &&
> - range_contains(&pmem_range, &dent->dpa_range))
> - update_perf_entry(dev, dent, &mds->pmem_perf);
> - else
> - dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> - &dent->dpa_range);
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
> +
> + if (range_contains(&range, &dent->dpa_range))
> + update_perf_entry(dev, dent, perf[i]);
This is checking whether the partition range contains dent->dpa_range.
I think it should be the opposite.
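For reference in this discussion, a minimal re-statement of the kernel's range_contains() from include/linux/range.h (a sketch, not the exact kernel declaration): the first argument is the container, so it returns true when r1 fully encloses r2.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of range_contains() semantics: true when r1 encloses r2. */
struct range {
	long long start;
	long long end;
};

static bool range_contains(const struct range *r1, const struct range *r2)
{
	return r1->start <= r2->start && r1->end >= r2->end;
}
```

In the hunk above, the partition range is passed as the first argument, i.e. the check asks whether the partition encloses the DSMAS entry's DPA range.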
> + else
> + dev_dbg(dev,
> + "no partition for dsmas dpa: %pra\n",
> + &dent->dpa_range);
> + }
> }
> }
>
> @@ -304,6 +308,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
>
> static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
> {
> + if (!dpa_perf)
> + return;
> +
> *dpa_perf = (struct cxl_dpa_perf) {
> .qos_class = CXL_QOS_CLASS_INVALID,
> };
> @@ -312,6 +319,9 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
> static bool cxl_qos_match(struct cxl_port *root_port,
> struct cxl_dpa_perf *dpa_perf)
> {
> + if (!dpa_perf)
> + return false;
> +
> if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
> return false;
>
> @@ -346,7 +356,8 @@ static int match_cxlrd_hb(struct device *dev, void *data)
> static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
> {
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds),
> + *pmem_perf = to_pmem_perf(cxlds);
> struct cxl_port *root_port;
> int rc;
>
> @@ -359,17 +370,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
> root_port = &cxl_root->port;
>
> /* Check that the QTG IDs are all sane between end device and root decoders */
> - if (!cxl_qos_match(root_port, &mds->ram_perf))
> - reset_dpa_perf(&mds->ram_perf);
> - if (!cxl_qos_match(root_port, &mds->pmem_perf))
> - reset_dpa_perf(&mds->pmem_perf);
> + if (!cxl_qos_match(root_port, ram_perf))
> + reset_dpa_perf(ram_perf);
> + if (!cxl_qos_match(root_port, pmem_perf))
> + reset_dpa_perf(pmem_perf);
>
> /* Check to make sure that the device's host bridge is under a root decoder */
> rc = device_for_each_child(&root_port->dev,
> cxlmd->endpoint->host_bridge, match_cxlrd_hb);
> if (!rc) {
> - reset_dpa_perf(&mds->ram_perf);
> - reset_dpa_perf(&mds->pmem_perf);
> + reset_dpa_perf(ram_perf);
> + reset_dpa_perf(pmem_perf);
> }
>
> return rc;
> @@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
> .end = dpa_res->end,
> };
>
> + if (!perf)
> + return false;
> +
This change seems to be an improvement, or hardening. I am not against
doing it, but while touching this code anyway, the function could be
simplified to use the parameter directly, without any local variable.
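One way to read that suggestion, sketched with stand-in struct definitions (the real types live in include/linux/range.h and include/linux/ioport.h, and this is a hypothetical simplification, not the posted code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for this sketch only. */
struct range { long long start, end; };
struct resource { long long start, end; };
struct cxl_dpa_perf { struct range dpa_range; };

/* Hypothetical simplification of dpa_perf_contains(): do the NULL
 * check first, then compare endpoints directly, so no local
 * 'struct range' has to be built from dpa_res. */
static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
			      struct resource *dpa_res)
{
	if (!perf)
		return false;
	return perf->dpa_range.start <= dpa_res->start &&
	       perf->dpa_range.end >= dpa_res->end;
}
```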
> return range_contains(&perf->dpa_range, &dpa);
> }
>
> @@ -574,15 +588,15 @@ static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxle
> enum cxl_decoder_mode mode)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_dpa_perf *perf;
>
> switch (mode) {
> case CXL_DECODER_RAM:
> - perf = &mds->ram_perf;
> + perf = to_ram_perf(cxlds);
> break;
> case CXL_DECODER_PMEM:
> - perf = &mds->pmem_perf;
> + perf = to_pmem_perf(cxlds);
> break;
> default:
> return ERR_PTR(-EINVAL);
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 2848d6991d45..7a85522294ad 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - if (resource_contains(&cxlds->pmem_res, res))
> + if (resource_contains(to_pmem_res(cxlds), res))
> cxled->mode = CXL_DECODER_PMEM;
> - else if (resource_contains(&cxlds->ram_res, res))
> + else if (resource_contains(to_ram_res(cxlds), res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> @@ -442,11 +442,11 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> * Only allow modes that are supported by the current partition
> * configuration
> */
> - if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
> + if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> dev_dbg(dev, "no available pmem capacity\n");
> return -ENXIO;
> }
> - if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
> + if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
> dev_dbg(dev, "no available ram capacity\n");
> return -ENXIO;
> }
> @@ -464,6 +464,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> struct device *dev = &cxled->cxld.dev;
> resource_size_t start, avail, skip;
> struct resource *p, *last;
> + const struct resource *ram_res = to_ram_res(cxlds);
> + const struct resource *pmem_res = to_pmem_res(cxlds);
> int rc;
>
> down_write(&cxl_dpa_rwsem);
> @@ -480,37 +482,37 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
> + for (p = ram_res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> free_ram_start = last->end + 1;
> else
> - free_ram_start = cxlds->ram_res.start;
> + free_ram_start = ram_res->start;
>
> - for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
> + for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> free_pmem_start = last->end + 1;
> else
> - free_pmem_start = cxlds->pmem_res.start;
> + free_pmem_start = pmem_res->start;
>
> if (cxled->mode == CXL_DECODER_RAM) {
> start = free_ram_start;
> - avail = cxlds->ram_res.end - start + 1;
> + avail = ram_res->end - start + 1;
> skip = 0;
> } else if (cxled->mode == CXL_DECODER_PMEM) {
> resource_size_t skip_start, skip_end;
>
> start = free_pmem_start;
> - avail = cxlds->pmem_res.end - start + 1;
> + avail = pmem_res->end - start + 1;
> skip_start = free_ram_start;
>
> /*
> * If some pmem is already allocated, then that allocation
> * already handled the skip.
> */
> - if (cxlds->pmem_res.child &&
> - skip_start == cxlds->pmem_res.child->start)
> + if (pmem_res->child &&
> + skip_start == pmem_res->child->start)
> skip_end = skip_start - 1;
> else
> skip_end = start - 1;
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 548564c770c0..3502f1633ad2 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent,
> int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct resource *ram_res = to_ram_res(cxlds);
> + struct resource *pmem_res = to_pmem_res(cxlds);
> struct device *dev = cxlds->dev;
> int rc;
>
> if (!cxlds->media_ready) {
> cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> - cxlds->ram_res = DEFINE_RES_MEM(0, 0);
> - cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
> + *ram_res = DEFINE_RES_MEM(0, 0);
> + *pmem_res = DEFINE_RES_MEM(0, 0);
This is a good example for the discussion about the patch hardening
resource_contains(). The initialization seems fine, but IORESOURCE_UNSET
is not used. It could be argued that the resource is set, but it is a
zero-size resource, which leads to problems in current CXL code.
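A simplified mock of the ioport.h behavior illustrates the point (the constant value and struct layout here are stand-ins for this sketch): DEFINE_RES_MEM(0, 0) produces a resource whose size is 0, yet IORESOURCE_UNSET is not raised, so flag-based checks and size-based checks disagree about whether the partition "exists".

```c
#include <assert.h>

/* Stand-in for the real flag in include/linux/ioport.h. */
#define IORESOURCE_UNSET 0x20000000

struct resource {
	long long start, end;
	unsigned long flags;
};

/* Kernel convention: a resource spanning [start, end] has
 * size end - start + 1, so start == 0, end == -1 means size 0. */
static long long resource_size(const struct resource *res)
{
	return res->end - res->start + 1;
}
```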
> return 0;
> }
>
> cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
>
> if (mds->partition_align_bytes == 0) {
> - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> mds->volatile_only_bytes, "ram");
> if (rc)
> return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> mds->volatile_only_bytes,
> mds->persistent_only_bytes, "pmem");
> }
> @@ -1298,11 +1300,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> return rc;
> }
>
> - rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
> + rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> mds->active_volatile_bytes, "ram");
> if (rc)
> return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
> + return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> mds->active_volatile_bytes,
> mds->active_persistent_bytes, "pmem");
> }
> @@ -1450,8 +1452,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> mds->cxlds.reg_map.host = dev;
> mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
> - mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
> - mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
> + to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
> + to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
>
> return mds;
> }
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index ae3dfcbe8938..c5f8320ed330 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(&cxlds->ram_res);
> + unsigned long long len = resource_size(to_ram_res(cxlds));
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> @@ -93,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(&cxlds->pmem_res);
> + unsigned long long len = cxl_pmem_size(cxlds);
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> @@ -198,16 +198,20 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> int rc = 0;
>
> /* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
> - if (resource_size(&cxlds->pmem_res)) {
> - offset = cxlds->pmem_res.start;
> - length = resource_size(&cxlds->pmem_res);
> + if (cxl_pmem_size(cxlds)) {
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> if (rc)
> return rc;
> }
> - if (resource_size(&cxlds->ram_res)) {
> - offset = cxlds->ram_res.start;
> - length = resource_size(&cxlds->ram_res);
> + if (cxl_ram_size(cxlds)) {
> + const struct resource *res = to_ram_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> /*
> * Invalid Physical Address is not an error for
> @@ -409,9 +413,8 @@ static ssize_t pmem_qos_class_show(struct device *dev,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>
> - return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
> + return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class);
> }
>
> static struct device_attribute dev_attr_pmem_qos_class =
> @@ -428,9 +431,8 @@ static ssize_t ram_qos_class_show(struct device *dev,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>
> - return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
> + return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class);
> }
>
> static struct device_attribute dev_attr_ram_qos_class =
> @@ -466,11 +468,11 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
> {
> struct device *dev = kobj_to_dev(kobj);
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds);
>
> - if (a == &dev_attr_ram_qos_class.attr)
> - if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
> - return 0;
> + if (a == &dev_attr_ram_qos_class.attr &&
> + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
> + return 0;
>
> return a->mode;
> }
> @@ -485,11 +487,11 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n
> {
> struct device *dev = kobj_to_dev(kobj);
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds);
>
> - if (a == &dev_attr_pmem_qos_class.attr)
> - if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
> - return 0;
> + if (a == &dev_attr_pmem_qos_class.attr &&
> + (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
> + return 0;
>
> return a->mode;
> }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e4885acac853..9f0f6fdbc841 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2688,7 +2688,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
>
> if (ctx->mode == CXL_DECODER_RAM) {
> offset = ctx->offset;
> - length = resource_size(&cxlds->ram_res) - offset;
> + length = cxl_ram_size(cxlds) - offset;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> if (rc == -EFAULT)
> rc = 0;
> @@ -2700,9 +2700,11 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
> length = resource_size(&cxlds->dpa_res) - offset;
> if (!length)
> return 0;
> - } else if (resource_size(&cxlds->pmem_res)) {
> - offset = cxlds->pmem_res.start;
> - length = resource_size(&cxlds->pmem_res);
> + } else if (cxl_pmem_size(cxlds)) {
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + offset = res->start;
> + length = resource_size(res);
> } else {
> return 0;
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 2a25d1957ddb..78e92e24d7b5 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -423,8 +423,8 @@ struct cxl_dpa_perf {
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> - * @pmem_res: Active Persistent memory capacity configuration
> - * @ram_res: Active Volatile memory capacity configuration
> + * @_pmem_res: Active Persistent memory capacity configuration
> + * @_ram_res: Active Volatile memory capacity configuration
> * @serial: PCIe Device Serial Number
> * @type: Generic Memory Class device or Vendor Specific Memory device
> * @cxl_mbox: CXL mailbox context
> @@ -438,13 +438,41 @@ struct cxl_dev_state {
> bool rcd;
> bool media_ready;
> struct resource dpa_res;
> - struct resource pmem_res;
> - struct resource ram_res;
> + struct resource _pmem_res;
> + struct resource _ram_res;
> u64 serial;
> enum cxl_devtype type;
> struct cxl_mailbox cxl_mbox;
> };
>
> +static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +{
> + return &cxlds->_ram_res;
> +}
> +
> +static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +{
> + return &cxlds->_pmem_res;
> +}
> +
> +static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
> +{
> + const struct resource *res = to_ram_res(cxlds);
> +
> + if (!res)
> + return 0;
> + return resource_size(res);
> +}
> +
> +static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> +{
> + const struct resource *res = to_pmem_res(cxlds);
> +
> + if (!res)
> + return 0;
> + return resource_size(res);
> +}
> +
> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> {
> return dev_get_drvdata(cxl_mbox->host);
> @@ -471,8 +499,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> * @active_persistent_bytes: sum of hard + soft persistent
> * @next_volatile_bytes: volatile capacity change pending device reset
> * @next_persistent_bytes: persistent capacity change pending device reset
> - * @ram_perf: performance data entry matched to RAM partition
> - * @pmem_perf: performance data entry matched to PMEM partition
> + * @_ram_perf: performance data entry matched to RAM partition
> + * @_pmem_perf: performance data entry matched to PMEM partition
> * @event: event log driver state
> * @poison: poison driver state info
> * @security: security driver state info
> @@ -496,8 +524,8 @@ struct cxl_memdev_state {
> u64 next_volatile_bytes;
> u64 next_persistent_bytes;
>
> - struct cxl_dpa_perf ram_perf;
> - struct cxl_dpa_perf pmem_perf;
> + struct cxl_dpa_perf _ram_perf;
> + struct cxl_dpa_perf _pmem_perf;
>
> struct cxl_event_state event;
> struct cxl_poison_state poison;
> @@ -505,6 +533,20 @@ struct cxl_memdev_state {
> struct cxl_fw_state fw;
> };
>
> +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> +
> + return &mds->_ram_perf;
> +}
> +
> +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> +
> + return &mds->_pmem_perf;
> +}
> +
> static inline struct cxl_memdev_state *
> to_cxl_memdev_state(struct cxl_dev_state *cxlds)
> {
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 2f03a4d5606e..9675243bd05b 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -152,7 +152,7 @@ static int cxl_mem_probe(struct device *dev)
> return -ENXIO;
> }
>
> - if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> + if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
> if (rc) {
> if (rc == -ENODEV)
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index d0337c11f9ee..7f1c5061307b 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1000,25 +1000,28 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
> find_cxl_root(port);
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
> - struct range pmem_range = {
> - .start = cxlds->pmem_res.start,
> - .end = cxlds->pmem_res.end,
> + const struct resource *partition[] = {
> + to_ram_res(cxlds),
> + to_pmem_res(cxlds),
> };
> - struct range ram_range = {
> - .start = cxlds->ram_res.start,
> - .end = cxlds->ram_res.end,
> + struct cxl_dpa_perf *perf[] = {
> + to_ram_perf(cxlds),
> + to_pmem_perf(cxlds),
> };
>
> if (!cxl_root)
> return;
>
> - if (range_len(&ram_range))
> - dpa_perf_setup(port, &ram_range, &mds->ram_perf);
> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> + const struct resource *res = partition[i];
> + struct range range = {
> + .start = res->start,
> + .end = res->end,
> + };
>
> - if (range_len(&pmem_range))
> - dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
> + dpa_perf_setup(port, &range, perf[i]);
> + }
>
> cxl_memdev_update_perf(cxlmd);
>
>
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-22 8:59 ` [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
2025-01-22 16:29 ` Ira Weiny
@ 2025-01-23 16:41 ` Jonathan Cameron
2025-01-23 21:34 ` Dan Williams
2025-01-23 17:21 ` Alejandro Lucero Palau
2025-01-23 20:52 ` Dave Jiang
3 siblings, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2025-01-23 16:41 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
On Wed, 22 Jan 2025 00:59:33 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> allocations being distinct from RAM allocations in specific ways when in
> practice the allocation rules are only relative to DPA partition index.
>
> The rules for cxl_dpa_alloc() are:
>
> - allocations can only come from 1 partition
>
> - if allocating at partition-index-N, all free space in partitions less
> than partition-index-N must be skipped over
>
> Use the new 'struct cxl_dpa_partition' array to support allocation with
> an arbitrary number of DPA partitions on the device.
>
> A follow-on patch can go further to cleanup 'enum cxl_decoder_mode'
> concept and supersede it with looking up the memory properties from
> partition metadata. Until then cxl_part_mode() temporarily bridges code
> that looks up partitions by @cxled->mode.
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A few possible simplifications below, plus a trivial comment on a debug
message that prints a not-so-useful value.
Jonathan
> ---
> drivers/cxl/core/hdm.c | 215 +++++++++++++++++++++++++++++++++++-------------
> drivers/cxl/cxlmem.h | 14 +++
> 2 files changed, 172 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3f8a54ca4624..591aeb26c9e1 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
>
> +/* See request_skip() kernel-doc */
> +static void release_skip(struct cxl_dev_state *cxlds,
> + const resource_size_t skip_base,
> + const resource_size_t skip_len)
> +{
> + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> +
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *part_res = &cxlds->part[i].res;
> + resource_size_t skip_end, skip_size;
> +
> + if (skip_start < part_res->start || skip_start > part_res->end)
> + continue;
> +
> + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> + skip_size = skip_end - skip_start + 1;
> + __release_region(&cxlds->dpa_res, skip_start, skip_size);
> + skip_start += skip_size;
> + skip_rem -= skip_size;
> +
> + if (!skip_rem)
> + break;
> + }
> +}
This could ignore all explicit ordering constraints and be perhaps simpler
(even simpler if there is an overlap helper we can use). The assumption is
that we want to blow away anything in the skip range, whatever partition
it is in:
	for (int i = 0; i < cxlds->nr_partitions; i++) {
		const struct resource *part_res = &cxlds->part[i].res;
		resource_size_t toremove_start, toremove_end;

		toremove_start = max(skip_start, part_res->start);
		toremove_end = min(skip_end, part_res->end);
		if (toremove_end >= toremove_start) {
			resource_size_t rem_size = toremove_end - toremove_start + 1;

			__release_region(&cxlds->dpa_res, toremove_start, rem_size);
		}
	}
We can track skip_rem, or not bother with that optimization.
Mind you, your code is fine, so I don't really mind.
I think we can build something similar for request_skip(), though there we
do need to keep track of how far we got so as to unwind only that bit.
> +
> +static int request_skip(struct cxl_dev_state *cxlds,
> + struct cxl_endpoint_decoder *cxled,
> + const resource_size_t skip_base,
> + const resource_size_t skip_len)
> +{
> + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> +
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *part_res = &cxlds->part[i].res;
> + struct cxl_port *port = cxled_to_port(cxled);
> + resource_size_t skip_end, skip_size;
> + struct resource *res;
> +
> + if (skip_start < part_res->start || skip_start > part_res->end)
> + continue;
> +
> + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> + skip_size = skip_end - skip_start + 1;
> +
> + res = __request_region(&cxlds->dpa_res, skip_start, skip_size,
> + dev_name(&cxled->cxld.dev), 0);
> + if (!res) {
> + dev_dbg(cxlds->dev,
> + "decoder%d.%d: failed to reserve skipped space\n",
> + port->id, cxled->cxld.id);
> + break;
> + }
> + skip_start += skip_size;
> + skip_rem -= skip_size;
> + if (!skip_rem)
> + break;
> + }
> +
> + if (skip_rem == 0)
> + return 0;
> +
> + release_skip(cxlds, skip_base, skip_len - skip_rem);
> +
> + return -EBUSY;
> +}
> @@ -529,15 +625,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - resource_size_t free_ram_start, free_pmem_start;
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> - resource_size_t start, avail, skip;
> + struct resource *res, *prev = NULL;
> + resource_size_t start, avail, skip, skip_start;
> struct resource *p, *last;
> - const struct resource *ram_res = to_ram_res(cxlds);
> - const struct resource *pmem_res = to_pmem_res(cxlds);
> - int rc;
> + int part, rc;
>
> down_write(&cxl_dpa_rwsem);
> if (cxled->cxld.region) {
> @@ -553,47 +647,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - for (p = ram_res->child, last = NULL; p; p = p->sibling)
> - last = p;
> - if (last)
> - free_ram_start = last->end + 1;
> - else
> - free_ram_start = ram_res->start;
> + part = -1;
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
> + part = i;
> + break;
> + }
> + }
>
> - for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> + if (part < 0) {
> + dev_dbg(dev, "partition %d not found\n", part);
how is part useful to print here? it's -1
> + rc = -EBUSY;
> + goto out;
> + }
Maybe tidier as a check on the loop exiting early:

	for (part = 0; part < cxlds->nr_partitions; part++) {
		if (cxled->mode == cxl_part_mode(cxlds->part[part].mode))
			break;
	}
	if (part == cxlds->nr_partitions) {
		dev_dbg(dev, "partition mode %d not found\n", cxled->mode);
		rc = -EBUSY;
		goto out;
	}
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-22 8:59 ` [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode Dan Williams
2025-01-22 17:42 ` Ira Weiny
@ 2025-01-23 16:51 ` Jonathan Cameron
2025-01-23 21:50 ` Dan Williams
2025-01-23 17:20 ` Alejandro Lucero Palau
2025-01-23 21:29 ` Dave Jiang
3 siblings, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2025-01-23 16:51 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
On Wed, 22 Jan 2025 00:59:38 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> tracked in the partition, and no code paths have dependencies on the
> mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> can be cleaned up, specifically this ambiguity on whether the operation
> mode implied anything about the partition order.
>
> Endpoint decoders simply reference their assigned partition where the
> operational mode can be retrieved as partition mode.
>
> With this in place PMEM can now be partition0 which happens today when
> the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
> DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM"
> assumption can now just iterate partitions and consult the partition
> mode after the fact.
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A few things inline.
Jonathan
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 591aeb26c9e1..bb478e7b12f6 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
>
> -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode)
> +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> + enum cxl_partition_mode mode)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> -
> - switch (mode) {
> - case CXL_DECODER_RAM:
> - case CXL_DECODER_PMEM:
> - break;
> - default:
> - dev_dbg(dev, "unsupported mode: %d\n", mode);
> - return -EINVAL;
> - }
> + int part;
>
> guard(rwsem_write)(&cxl_dpa_rwsem);
> if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> return -EBUSY;
>
> - /*
> - * Only allow modes that are supported by the current partition
> - * configuration
> - */
> - if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> - dev_dbg(dev, "no available pmem capacity\n");
> - return -ENXIO;
> + part = -1;
> + for (int i = 0; i < cxlds->nr_partitions; i++)
Similar to the previous comment: this can use an early loop exit with part
as the loop iteration variable; shorter code, and no magic 'i' appears.
> + if (cxlds->part[i].mode == mode) {
> + part = i;
> + break;
> + }
> +
> + if (part < 0) {
> + dev_dbg(dev, "unsupported mode: %d\n", mode);
> + return -EINVAL;
> }
> - if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
> - dev_dbg(dev, "no available ram capacity\n");
> +
> + if (!resource_size(&cxlds->part[part].res)) {
> + dev_dbg(dev, "no available capacity for mode: %d\n", mode);
> return -ENXIO;
> }
>
> - cxled->mode = mode;
> + cxled->part = part;
> return 0;
> }
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 9f0f6fdbc841..83b985d2ba76 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
>
> static int poison_by_decoder(struct device *dev, void *arg)
> {
> struct cxl_poison_context *ctx = arg;
> struct cxl_endpoint_decoder *cxled;
> + enum cxl_partition_mode mode;
> + struct cxl_dev_state *cxlds;
> struct cxl_memdev *cxlmd;
> u64 offset, length;
> int rc = 0;
> @@ -2728,11 +2733,17 @@ static int poison_by_decoder(struct device *dev, void *arg)
> return rc;
>
> cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + if (cxled->part < 0)
> + mode = CXL_PARTMODE_NONE;
Ah. Here is our mysterious 'none'. Maybe add a comment on what
this means in practice: a race condition, an actual hole, a crazy decoder
someone (e.g. BIOS) set up?
> + else
> + mode = cxlds->part[cxled->part].mode;
> +
> if (cxled->skip) {
> offset = cxled->dpa_res->start - cxled->skip;
> length = cxled->skip;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
> + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
> rc = 0;
> if (rc)
> return rc;
> @@ -2741,7 +2752,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
> offset = cxled->dpa_res->start;
> length = cxled->dpa_res->end - offset + 1;
> rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
> - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
> + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
> rc = 0;
> if (rc)
> return rc;
> @@ -2749,7 +2760,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
> /* Iterate until commit_end is reached */
> if (cxled->cxld.id == ctx->port->commit_end) {
> ctx->offset = cxled->dpa_res->end + 1;
> - ctx->mode = cxled->mode;
> + ctx->part = cxled->part;
> return 1;
> }
>
> @@ -2762,7 +2773,8 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> int rc = 0;
>
> ctx = (struct cxl_poison_context) {
> - .port = port
> + .port = port,
> + .part = -1,
> };
>
> rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
> @@ -3206,14 +3218,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_port *port = cxlrd_to_port(cxlrd);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct range *hpa = &cxled->cxld.hpa_range;
> + int rc, part = READ_ONCE(cxled->part);
> struct cxl_region_params *p;
> struct cxl_region *cxlr;
> struct resource *res;
> - int rc;
> +
> + if (part < 0)
> + return ERR_PTR(-EBUSY);
>
> do {
> - cxlr = __create_region(cxlrd, cxled->mode,
> + cxlr = __create_region(cxlrd, cxlds->part[part].mode,
> atomic_read(&cxlrd->region_id));
> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>
> @@ -3416,9 +3432,9 @@ static int cxl_region_probe(struct device *dev)
> return rc;
>
> switch (cxlr->mode) {
> - case CXL_DECODER_PMEM:
> + case CXL_PARTMODE_PMEM:
> return devm_cxl_add_pmem_region(cxlr);
> - case CXL_DECODER_RAM:
> + case CXL_PARTMODE_RAM:
> /*
> * The region can not be manged by CXL if any portion of
> * it is already online as 'System RAM'
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 8:59 ` [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
2025-01-22 14:53 ` Ira Weiny
2025-01-23 16:09 ` Jonathan Cameron
@ 2025-01-23 16:57 ` Dave Jiang
2025-01-23 17:00 ` Alejandro Lucero Palau
2025-01-23 17:17 ` Alejandro Lucero Palau
4 siblings, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2025-01-23 16:57 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Alejandro Lucero, Ira Weiny, Jonathan.Cameron
On 1/22/25 1:59 AM, Dan Williams wrote:
> The pending efforts to add CXL Accelerator (type-2) device [1], and
> Dynamic Capacity (DCD) support [2], tripped on the
> no-longer-fit-for-purpose design in the CXL subsystem for tracking
> device-physical-address (DPA) metadata. Trip hazards include:
>
> - CXL Memory Devices need to consider a PMEM partition, but Accelerator
> devices with CXL.mem likely do not in the common case.
>
> - CXL Memory Devices enumerate DPA through Memory Device mailbox
> commands like Partition Info, Accelerators devices do not.
>
> - CXL Memory Devices that support DCD support more than 2 partitions.
> Some of the driver algorithms are awkward to expand to > 2 partition
> cases.
>
> - DPA performance data is a general capability that can be shared with
> accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
> suitable.
>
> - Hardcoded assumptions around the PMEM partition always being index-1
> if RAM is zero-sized or PMEM is zero sized.
>
> - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
> memory property, it should be phased in favor of a partition id and
> the memory property comes from the partition info.
>
> Towards cleaning up those issues and allowing a smoother landing for the
> aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
> array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
> way for Memory Devices and Accelerators to initialize the DPA information
> in 'struct cxl_dev_state'.
>
> For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
> get the new data structure initialized, and cleanup some qos_class init.
> Follow on patches will go further to use the new data structure to
> cleanup algorithms that are better suited to loop over all possible
> partitions.
>
> cxl_dpa_setup() follows the locking expectations of mutating the device
> DPA map, and is suitable for Accelerator drivers to use. Accelerators
> likely only have one hardcoded 'ram' partition to convey to the
> cxl_core.
>
> Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/cdat.c | 15 ++-----
> drivers/cxl/core/hdm.c | 75 +++++++++++++++++++++++++++++++++-
> drivers/cxl/core/mbox.c | 68 ++++++++++--------------------
> drivers/cxl/core/memdev.c | 2 -
> drivers/cxl/cxlmem.h | 94 +++++++++++++++++++++++++++++-------------
> drivers/cxl/pci.c | 7 +++
> tools/testing/cxl/test/cxl.c | 15 ++-----
> tools/testing/cxl/test/mem.c | 7 +++
> 8 files changed, 183 insertions(+), 100 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index b177a488e29b..5400a421ad30 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> struct device *dev = cxlds->dev;
> struct dsmas_entry *dent;
> unsigned long index;
> - const struct resource *partition[] = {
> - to_ram_res(cxlds),
> - to_pmem_res(cxlds),
> - };
> - struct cxl_dpa_perf *perf[] = {
> - to_ram_perf(cxlds),
> - to_pmem_perf(cxlds),
> - };
>
> xa_for_each(dsmas_xa, index, dent) {
> - for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> - const struct resource *res = partition[i];
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + struct resource *res = &cxlds->part[i].res;
> struct range range = {
> .start = res->start,
> .end = res->end,
> };
>
> if (range_contains(&range, &dent->dpa_range))
> - update_perf_entry(dev, dent, perf[i]);
> + update_perf_entry(dev, dent,
> + &cxlds->part[i].perf);
> else
> dev_dbg(dev,
> "no partition for dsmas dpa: %pra\n",
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 7a85522294ad..3f8a54ca4624 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - if (resource_contains(to_pmem_res(cxlds), res))
> + if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
> cxled->mode = CXL_DECODER_PMEM;
> - else if (resource_contains(to_ram_res(cxlds), res))
> + else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> @@ -342,6 +342,77 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> return 0;
> }
>
> +static int add_dpa_res(struct device *dev, struct resource *parent,
> + struct resource *res, resource_size_t start,
> + resource_size_t size, const char *type)
> +{
> + int rc;
> +
> + *res = (struct resource) {
> + .name = type,
> + .start = start,
> + .end = start + size - 1,
> + .flags = IORESOURCE_MEM,
> + };
> + if (resource_size(res) == 0) {
> + dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
> + return 0;
> + }
> + rc = request_resource(parent, res);
> + if (rc) {
> + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
> + res, rc);
> + return rc;
> + }
> +
> + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
> +
> + return 0;
> +}
> +
> +/* if this fails the caller must destroy @cxlds, there is no recovery */
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> +{
> + struct device *dev = cxlds->dev;
> +
> + guard(rwsem_write)(&cxl_dpa_rwsem);
> +
> + if (cxlds->nr_partitions)
> + return -EBUSY;
> +
> + if (!info->size || !info->nr_partitions) {
> + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> + cxlds->nr_partitions = 0;
> + return 0;
> + }
> +
> + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> +
> + for (int i = 0; i < info->nr_partitions; i++) {
> + const struct cxl_dpa_part_info *part = &info->part[i];
> + const char *desc;
> + int rc;
> +
> + if (part->mode == CXL_PARTMODE_RAM)
> + desc = "ram";
> + else if (part->mode == CXL_PARTMODE_PMEM)
> + desc = "pmem";
> + else
> + desc = "";
> + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
> + cxlds->part[i].mode = part->mode;
> + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
> + part->range.start, range_len(&part->range),
> + desc);
> + if (rc)
> + return rc;
> + cxlds->nr_partitions++;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
> +
> int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped)
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 3502f1633ad2..62bb3653362f 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1241,57 +1241,39 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
> return rc;
> }
>
> -static int add_dpa_res(struct device *dev, struct resource *parent,
> - struct resource *res, resource_size_t start,
> - resource_size_t size, const char *type)
> +static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
> {
> - int rc;
> + int i = info->nr_partitions;
>
> - res->name = type;
> - res->start = start;
> - res->end = start + size - 1;
> - res->flags = IORESOURCE_MEM;
> - if (resource_size(res) == 0) {
> - dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
> - return 0;
> - }
> - rc = request_resource(parent, res);
> - if (rc) {
> - dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
> - res, rc);
> - return rc;
> - }
> -
> - dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
> + if (size == 0)
> + return;
>
> - return 0;
> + info->part[i].range = (struct range) {
> + .start = start,
> + .end = start + size - 1,
> + };
> + info->part[i].mode = mode;
> + info->nr_partitions++;
> }
>
> -int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> - struct resource *ram_res = to_ram_res(cxlds);
> - struct resource *pmem_res = to_pmem_res(cxlds);
> struct device *dev = cxlds->dev;
> int rc;
>
> if (!cxlds->media_ready) {
> - cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> - *ram_res = DEFINE_RES_MEM(0, 0);
> - *pmem_res = DEFINE_RES_MEM(0, 0);
> + info->size = 0;
> return 0;
> }
>
> - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
> + info->size = mds->total_bytes;
>
> if (mds->partition_align_bytes == 0) {
> - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> - mds->volatile_only_bytes, "ram");
> - if (rc)
> - return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> - mds->volatile_only_bytes,
> - mds->persistent_only_bytes, "pmem");
> + add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
> + add_part(info, mds->volatile_only_bytes,
> + mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
> + return 0;
> }
>
> rc = cxl_mem_get_partition_info(mds);
> @@ -1300,15 +1282,13 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> return rc;
> }
>
> - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> - mds->active_volatile_bytes, "ram");
> - if (rc)
> - return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> - mds->active_volatile_bytes,
> - mds->active_persistent_bytes, "pmem");
> + add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
> + add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
> + CXL_PARTMODE_PMEM);
> +
> + return 0;
> }
> -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
>
> int cxl_set_timestamp(struct cxl_memdev_state *mds)
> {
> @@ -1452,8 +1432,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> mds->cxlds.reg_map.host = dev;
> mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
> - to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
> - to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
>
> return mds;
> }
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index c5f8320ed330..be0eb57086e1 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(to_ram_res(cxlds));
> + unsigned long long len = cxl_ram_size(cxlds);
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 78e92e24d7b5..15f549afab7c 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -97,6 +97,25 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped);
>
> +enum cxl_partition_mode {
> + CXL_PARTMODE_NONE,
> + CXL_PARTMODE_RAM,
> + CXL_PARTMODE_PMEM,
> +};
> +
> +#define CXL_NR_PARTITIONS_MAX 2
> +
> +struct cxl_dpa_info {
> + u64 size;
> + struct cxl_dpa_part_info {
> + struct range range;
> + enum cxl_partition_mode mode;
> + } part[CXL_NR_PARTITIONS_MAX];
> + int nr_partitions;
> +};
> +
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
> +
> static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> struct cxl_memdev *cxlmd)
> {
> @@ -408,6 +427,18 @@ struct cxl_dpa_perf {
> int qos_class;
> };
>
> +/**
> + * struct cxl_dpa_partition - DPA partition descriptor
> + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
> + * @perf: performance attributes of the partition from CDAT
> + * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
> + */
> +struct cxl_dpa_partition {
> + struct resource res;
> + struct cxl_dpa_perf perf;
> + enum cxl_partition_mode mode;
> +};
> +
> /**
> * struct cxl_dev_state - The driver device state
> *
> @@ -423,8 +454,8 @@ struct cxl_dpa_perf {
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> - * @_pmem_res: Active Persistent memory capacity configuration
> - * @_ram_res: Active Volatile memory capacity configuration
> + * @part: DPA partition array
> + * @nr_partitions: Number of DPA partitions
> * @serial: PCIe Device Serial Number
> * @type: Generic Memory Class device or Vendor Specific Memory device
> * @cxl_mbox: CXL mailbox context
> @@ -438,21 +469,47 @@ struct cxl_dev_state {
> bool rcd;
> bool media_ready;
> struct resource dpa_res;
> - struct resource _pmem_res;
> - struct resource _ram_res;
> + struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
> + unsigned int nr_partitions;
> u64 serial;
> enum cxl_devtype type;
> struct cxl_mailbox cxl_mbox;
> };
>
> -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +
> +/* Static RAM is only expected at partition 0. */
> +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +{
> + if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
> + return NULL;
> + return &cxlds->part[0].res;
> +}
> +
> +/*
> + * Static PMEM may be at partition index 0 when there is no static RAM
> + * capacity.
> + */
> +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +{
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
> + return &cxlds->part[i].res;
> + return NULL;
> +}
> +
> +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
> {
> - return &cxlds->_ram_res;
> + if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
> + return NULL;
> + return &cxlds->part[0].perf;
> }
>
> -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> {
> - return &cxlds->_pmem_res;
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
> + return &cxlds->part[i].perf;
> + return NULL;
> }
>
> static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
> @@ -499,8 +556,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> * @active_persistent_bytes: sum of hard + soft persistent
> * @next_volatile_bytes: volatile capacity change pending device reset
> * @next_persistent_bytes: persistent capacity change pending device reset
> - * @_ram_perf: performance data entry matched to RAM partition
> - * @_pmem_perf: performance data entry matched to PMEM partition
> * @event: event log driver state
> * @poison: poison driver state info
> * @security: security driver state info
> @@ -524,29 +579,12 @@ struct cxl_memdev_state {
> u64 next_volatile_bytes;
> u64 next_persistent_bytes;
>
> - struct cxl_dpa_perf _ram_perf;
> - struct cxl_dpa_perf _pmem_perf;
> -
> struct cxl_event_state event;
> struct cxl_poison_state poison;
> struct cxl_security_state security;
> struct cxl_fw_state fw;
> };
>
> -static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
> -{
> - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> -
> - return &mds->_ram_perf;
> -}
> -
> -static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> -{
> - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> -
> - return &mds->_pmem_perf;
> -}
> -
> static inline struct cxl_memdev_state *
> to_cxl_memdev_state(struct cxl_dev_state *cxlds)
> {
> @@ -860,7 +898,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
> int cxl_dev_state_identify(struct cxl_memdev_state *mds);
> int cxl_await_media_ready(struct cxl_dev_state *cxlds);
> int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
> -int cxl_mem_create_range_info(struct cxl_memdev_state *mds);
> +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
> struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
> void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0241d1d7133a..47dbfe406236 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -900,6 +900,7 @@ __ATTRIBUTE_GROUPS(cxl_rcd);
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> + struct cxl_dpa_info range_info = { 0 };
> struct cxl_memdev_state *mds;
> struct cxl_dev_state *cxlds;
> struct cxl_register_map map;
> @@ -989,7 +990,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> - rc = cxl_mem_create_range_info(mds);
> + rc = cxl_mem_dpa_fetch(mds, &range_info);
> + if (rc)
> + return rc;
> +
> + rc = cxl_dpa_setup(cxlds, &range_info);
> if (rc)
> return rc;
>
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 7f1c5061307b..ba3d48b37de3 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
> - const struct resource *partition[] = {
> - to_ram_res(cxlds),
> - to_pmem_res(cxlds),
> - };
> - struct cxl_dpa_perf *perf[] = {
> - to_ram_perf(cxlds),
> - to_pmem_perf(cxlds),
> - };
>
> if (!cxl_root)
> return;
>
> - for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> - const struct resource *res = partition[i];
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + struct resource *res = &cxlds->part[i].res;
> + struct cxl_dpa_perf *perf = &cxlds->part[i].perf;
> struct range range = {
> .start = res->start,
> .end = res->end,
> };
>
> - dpa_perf_setup(port, &range, perf[i]);
> + dpa_perf_setup(port, &range, perf);
> }
>
> cxl_memdev_update_perf(cxlmd);
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 347c1e7b37bd..ed365e083c8f 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1477,6 +1477,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> struct cxl_dev_state *cxlds;
> struct cxl_mockmem_data *mdata;
> struct cxl_mailbox *cxl_mbox;
> + struct cxl_dpa_info range_info = { 0 };
> int rc;
>
> mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
> @@ -1537,7 +1538,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> if (rc)
> return rc;
>
> - rc = cxl_mem_create_range_info(mds);
> + rc = cxl_mem_dpa_fetch(mds, &range_info);
> + if (rc)
> + return rc;
> +
> + rc = cxl_dpa_setup(cxlds, &range_info);
> if (rc)
> return rc;
>
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 8:59 ` [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
` (2 preceding siblings ...)
2025-01-23 16:57 ` Dave Jiang
@ 2025-01-23 17:00 ` Alejandro Lucero Palau
2025-01-23 22:43 ` Dan Williams
2025-01-23 17:17 ` Alejandro Lucero Palau
4 siblings, 1 reply; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-23 17:00 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
<snip>
>
> -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +
> +/* Static RAM is only expected at partition 0. */
This may seem silly, but "Static RAM" carries SRAM connotations that
could confuse people.
Maybe better to say "static partition for RAM".
Far more concerning to me, though, is Ira's comment about how we can be
sure there will be only one RAM or one PMEM partition.
Apart from that, LGTM:
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
<snip>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-22 8:59 ` [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
` (3 preceding siblings ...)
2025-01-23 17:00 ` Alejandro Lucero Palau
@ 2025-01-23 17:17 ` Alejandro Lucero Palau
2025-01-23 22:48 ` Dan Williams
4 siblings, 1 reply; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-23 17:17 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
On 1/22/25 08:59, Dan Williams wrote:
> The pending efforts to add CXL Accelerator (type-2) device [1], and
> Dynamic Capacity (DCD) support [2], tripped on the
> no-longer-fit-for-purpose design in the CXL subsystem for tracking
> device-physical-address (DPA) metadata. Trip hazards include:
>
> - CXL Memory Devices need to consider a PMEM partition, but Accelerator
> devices with CXL.mem likely do not in the common case.
>
> - CXL Memory Devices enumerate DPA through Memory Device mailbox
> commands like Partition Info, Accelerators devices do not.
Forgot to mention this one: the mailbox is optional for accelerators.
So maybe say "but this path is optional for accelerators, implying
accel-driver-defined/hardcoded enumeration".
>
> - CXL Memory Devices that support DCD support more than 2 partitions.
> Some of the driver algorithms are awkward to expand to > 2 partition
> cases.
>
> - DPA performance data is a general capability that can be shared with
> accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
> suitable.
>
> - Hardcoded assumptions around the PMEM partition always being index-1
> if RAM is zero-sized or PMEM is zero sized.
>
> - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
> memory property, it should be phased in favor of a partition id and
> the memory property comes from the partition info.
>
> Towards cleaning up those issues and allowing a smoother landing for the
> aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
> array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
> way for Memory Devices and Accelerators to initialize the DPA information
> in 'struct cxl_dev_state'.
>
> For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
> get the new data structure initialized, and cleanup some qos_class init.
> Follow on patches will go further to use the new data structure to
> cleanup algorithms that are better suited to loop over all possible
> partitions.
>
> cxl_dpa_setup() follows the locking expectations of mutating the device
> DPA map, and is suitable for Accelerator drivers to use. Accelerators
> likely only have one hardcoded 'ram' partition to convey to the
> cxl_core.
>
> Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/core/cdat.c | 15 ++-----
> drivers/cxl/core/hdm.c | 75 +++++++++++++++++++++++++++++++++-
> drivers/cxl/core/mbox.c | 68 ++++++++++--------------------
> drivers/cxl/core/memdev.c | 2 -
> drivers/cxl/cxlmem.h | 94 +++++++++++++++++++++++++++++-------------
> drivers/cxl/pci.c | 7 +++
> tools/testing/cxl/test/cxl.c | 15 ++-----
> tools/testing/cxl/test/mem.c | 7 +++
> 8 files changed, 183 insertions(+), 100 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index b177a488e29b..5400a421ad30 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> struct device *dev = cxlds->dev;
> struct dsmas_entry *dent;
> unsigned long index;
> - const struct resource *partition[] = {
> - to_ram_res(cxlds),
> - to_pmem_res(cxlds),
> - };
> - struct cxl_dpa_perf *perf[] = {
> - to_ram_perf(cxlds),
> - to_pmem_perf(cxlds),
> - };
>
> xa_for_each(dsmas_xa, index, dent) {
> - for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> - const struct resource *res = partition[i];
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + struct resource *res = &cxlds->part[i].res;
> struct range range = {
> .start = res->start,
> .end = res->end,
> };
>
> if (range_contains(&range, &dent->dpa_range))
> - update_perf_entry(dev, dent, perf[i]);
> + update_perf_entry(dev, dent,
> + &cxlds->part[i].perf);
> else
> dev_dbg(dev,
> "no partition for dsmas dpa: %pra\n",
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 7a85522294ad..3f8a54ca4624 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - if (resource_contains(to_pmem_res(cxlds), res))
> + if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
> cxled->mode = CXL_DECODER_PMEM;
> - else if (resource_contains(to_ram_res(cxlds), res))
> + else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
> cxled->mode = CXL_DECODER_RAM;
> else {
> dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> @@ -342,6 +342,77 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> return 0;
> }
>
> +static int add_dpa_res(struct device *dev, struct resource *parent,
> + struct resource *res, resource_size_t start,
> + resource_size_t size, const char *type)
> +{
> + int rc;
> +
> + *res = (struct resource) {
> + .name = type,
> + .start = start,
> + .end = start + size - 1,
> + .flags = IORESOURCE_MEM,
> + };
> + if (resource_size(res) == 0) {
> + dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
> + return 0;
> + }
> + rc = request_resource(parent, res);
> + if (rc) {
> + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
> + res, rc);
> + return rc;
> + }
> +
> + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
> +
> + return 0;
> +}
> +
> +/* if this fails the caller must destroy @cxlds, there is no recovery */
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> +{
> + struct device *dev = cxlds->dev;
> +
> + guard(rwsem_write)(&cxl_dpa_rwsem);
> +
> + if (cxlds->nr_partitions)
> + return -EBUSY;
> +
> + if (!info->size || !info->nr_partitions) {
> + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> + cxlds->nr_partitions = 0;
> + return 0;
> + }
> +
> + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> +
> + for (int i = 0; i < info->nr_partitions; i++) {
> + const struct cxl_dpa_part_info *part = &info->part[i];
> + const char *desc;
> + int rc;
> +
> + if (part->mode == CXL_PARTMODE_RAM)
> + desc = "ram";
> + else if (part->mode == CXL_PARTMODE_PMEM)
> + desc = "pmem";
> + else
> + desc = "";
> + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID;
> + cxlds->part[i].mode = part->mode;
> + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res,
> + part->range.start, range_len(&part->range),
> + desc);
> + if (rc)
> + return rc;
> + cxlds->nr_partitions++;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(cxl_dpa_setup);
> +
> int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped)
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 3502f1633ad2..62bb3653362f 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1241,57 +1241,39 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
> return rc;
> }
>
> -static int add_dpa_res(struct device *dev, struct resource *parent,
> - struct resource *res, resource_size_t start,
> - resource_size_t size, const char *type)
> +static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
> {
> - int rc;
> + int i = info->nr_partitions;
>
> - res->name = type;
> - res->start = start;
> - res->end = start + size - 1;
> - res->flags = IORESOURCE_MEM;
> - if (resource_size(res) == 0) {
> - dev_dbg(dev, "DPA(%s): no capacity\n", res->name);
> - return 0;
> - }
> - rc = request_resource(parent, res);
> - if (rc) {
> - dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name,
> - res, rc);
> - return rc;
> - }
> -
> - dev_dbg(dev, "DPA(%s): %pr\n", res->name, res);
> + if (size == 0)
> + return;
>
> - return 0;
> + info->part[i].range = (struct range) {
> + .start = start,
> + .end = start + size - 1,
> + };
> + info->part[i].mode = mode;
> + info->nr_partitions++;
> }
>
> -int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> - struct resource *ram_res = to_ram_res(cxlds);
> - struct resource *pmem_res = to_pmem_res(cxlds);
> struct device *dev = cxlds->dev;
> int rc;
>
> if (!cxlds->media_ready) {
> - cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> - *ram_res = DEFINE_RES_MEM(0, 0);
> - *pmem_res = DEFINE_RES_MEM(0, 0);
> + info->size = 0;
> return 0;
> }
>
> - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
> + info->size = mds->total_bytes;
>
> if (mds->partition_align_bytes == 0) {
> - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> - mds->volatile_only_bytes, "ram");
> - if (rc)
> - return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> - mds->volatile_only_bytes,
> - mds->persistent_only_bytes, "pmem");
> + add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
> + add_part(info, mds->volatile_only_bytes,
> + mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
> + return 0;
> }
>
> rc = cxl_mem_get_partition_info(mds);
> @@ -1300,15 +1282,13 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> return rc;
> }
>
> - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
> - mds->active_volatile_bytes, "ram");
> - if (rc)
> - return rc;
> - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
> - mds->active_volatile_bytes,
> - mds->active_persistent_bytes, "pmem");
> + add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
> + add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
> + CXL_PARTMODE_PMEM);
> +
> + return 0;
> }
> -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
>
> int cxl_set_timestamp(struct cxl_memdev_state *mds)
> {
> @@ -1452,8 +1432,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> mds->cxlds.reg_map.host = dev;
> mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
> - to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
> - to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
>
> return mds;
> }
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index c5f8320ed330..be0eb57086e1 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - unsigned long long len = resource_size(to_ram_res(cxlds));
> + unsigned long long len = cxl_ram_size(cxlds);
>
> return sysfs_emit(buf, "%#llx\n", len);
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 78e92e24d7b5..15f549afab7c 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -97,6 +97,25 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped);
>
> +enum cxl_partition_mode {
> + CXL_PARTMODE_NONE,
> + CXL_PARTMODE_RAM,
> + CXL_PARTMODE_PMEM,
> +};
> +
> +#define CXL_NR_PARTITIONS_MAX 2
> +
> +struct cxl_dpa_info {
> + u64 size;
> + struct cxl_dpa_part_info {
> + struct range range;
> + enum cxl_partition_mode mode;
> + } part[CXL_NR_PARTITIONS_MAX];
> + int nr_partitions;
> +};
> +
> +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
> +
> static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> struct cxl_memdev *cxlmd)
> {
> @@ -408,6 +427,18 @@ struct cxl_dpa_perf {
> int qos_class;
> };
>
> +/**
> + * struct cxl_dpa_partition - DPA partition descriptor
> + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
> + * @perf: performance attributes of the partition from CDAT
> + * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
> + */
> +struct cxl_dpa_partition {
> + struct resource res;
> + struct cxl_dpa_perf perf;
> + enum cxl_partition_mode mode;
> +};
> +
> /**
> * struct cxl_dev_state - The driver device state
> *
> @@ -423,8 +454,8 @@ struct cxl_dpa_perf {
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> - * @_pmem_res: Active Persistent memory capacity configuration
> - * @_ram_res: Active Volatile memory capacity configuration
> + * @part: DPA partition array
> + * @nr_partitions: Number of DPA partitions
> * @serial: PCIe Device Serial Number
> * @type: Generic Memory Class device or Vendor Specific Memory device
> * @cxl_mbox: CXL mailbox context
> @@ -438,21 +469,47 @@ struct cxl_dev_state {
> bool rcd;
> bool media_ready;
> struct resource dpa_res;
> - struct resource _pmem_res;
> - struct resource _ram_res;
> + struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
> + unsigned int nr_partitions;
> u64 serial;
> enum cxl_devtype type;
> struct cxl_mailbox cxl_mbox;
> };
>
> -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +
> +/* Static RAM is only expected at partition 0. */
> +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> +{
> + if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
> + return NULL;
> + return &cxlds->part[0].res;
> +}
> +
> +/*
> + * Static PMEM may be at partition index 0 when there is no static RAM
> + * capacity.
> + */
> +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +{
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
> + return &cxlds->part[i].res;
> + return NULL;
> +}
> +
> +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
> {
> - return &cxlds->_ram_res;
> + if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
> + return NULL;
> + return &cxlds->part[0].perf;
> }
>
> -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
> +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> {
> - return &cxlds->_pmem_res;
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
> + return &cxlds->part[i].perf;
> + return NULL;
> }
>
> static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
> @@ -499,8 +556,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> * @active_persistent_bytes: sum of hard + soft persistent
> * @next_volatile_bytes: volatile capacity change pending device reset
> * @next_persistent_bytes: persistent capacity change pending device reset
> - * @_ram_perf: performance data entry matched to RAM partition
> - * @_pmem_perf: performance data entry matched to PMEM partition
> * @event: event log driver state
> * @poison: poison driver state info
> * @security: security driver state info
> @@ -524,29 +579,12 @@ struct cxl_memdev_state {
> u64 next_volatile_bytes;
> u64 next_persistent_bytes;
>
> - struct cxl_dpa_perf _ram_perf;
> - struct cxl_dpa_perf _pmem_perf;
> -
> struct cxl_event_state event;
> struct cxl_poison_state poison;
> struct cxl_security_state security;
> struct cxl_fw_state fw;
> };
>
> -static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
> -{
> - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> -
> - return &mds->_ram_perf;
> -}
> -
> -static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
> -{
> - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
> -
> - return &mds->_pmem_perf;
> -}
> -
> static inline struct cxl_memdev_state *
> to_cxl_memdev_state(struct cxl_dev_state *cxlds)
> {
> @@ -860,7 +898,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
> int cxl_dev_state_identify(struct cxl_memdev_state *mds);
> int cxl_await_media_ready(struct cxl_dev_state *cxlds);
> int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
> -int cxl_mem_create_range_info(struct cxl_memdev_state *mds);
> +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
> struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
> void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0241d1d7133a..47dbfe406236 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -900,6 +900,7 @@ __ATTRIBUTE_GROUPS(cxl_rcd);
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> + struct cxl_dpa_info range_info = { 0 };
> struct cxl_memdev_state *mds;
> struct cxl_dev_state *cxlds;
> struct cxl_register_map map;
> @@ -989,7 +990,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> - rc = cxl_mem_create_range_info(mds);
> + rc = cxl_mem_dpa_fetch(mds, &range_info);
> + if (rc)
> + return rc;
> +
> + rc = cxl_dpa_setup(cxlds, &range_info);
> if (rc)
> return rc;
>
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 7f1c5061307b..ba3d48b37de3 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
> struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
> - const struct resource *partition[] = {
> - to_ram_res(cxlds),
> - to_pmem_res(cxlds),
> - };
> - struct cxl_dpa_perf *perf[] = {
> - to_ram_perf(cxlds),
> - to_pmem_perf(cxlds),
> - };
>
> if (!cxl_root)
> return;
>
> - for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> - const struct resource *res = partition[i];
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + struct resource *res = &cxlds->part[i].res;
> + struct cxl_dpa_perf *perf = &cxlds->part[i].perf;
> struct range range = {
> .start = res->start,
> .end = res->end,
> };
>
> - dpa_perf_setup(port, &range, perf[i]);
> + dpa_perf_setup(port, &range, perf);
> }
>
> cxl_memdev_update_perf(cxlmd);
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 347c1e7b37bd..ed365e083c8f 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1477,6 +1477,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> struct cxl_dev_state *cxlds;
> struct cxl_mockmem_data *mdata;
> struct cxl_mailbox *cxl_mbox;
> + struct cxl_dpa_info range_info = { 0 };
> int rc;
>
> mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
> @@ -1537,7 +1538,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> if (rc)
> return rc;
>
> - rc = cxl_mem_create_range_info(mds);
> + rc = cxl_mem_dpa_fetch(mds, &range_info);
> + if (rc)
> + return rc;
> +
> + rc = cxl_dpa_setup(cxlds, &range_info);
> if (rc)
> return rc;
>
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-22 8:59 ` [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode Dan Williams
2025-01-22 17:42 ` Ira Weiny
2025-01-23 16:51 ` Jonathan Cameron
@ 2025-01-23 17:20 ` Alejandro Lucero Palau
2025-01-23 21:29 ` Dave Jiang
3 siblings, 0 replies; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-23 17:20 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
On 1/22/25 08:59, Dan Williams wrote:
> Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> tracked in the partition, and no code paths have dependencies on the
> mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> can be cleaned up, specifically this ambiguity on whether the operation
> mode implied anything about the partition order.
>
> Endpoint decoders simply reference their assigned partition where the
> operational mode can be retrieved as partition mode.
>
> With this in place PMEM can now be partition0 which happens today when
> the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
> DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM"
> assumption can now just iterate partitions and consult the partition
> mode after the fact.
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/cdat.c | 21 ++-----
> drivers/cxl/core/core.h | 4 +
> drivers/cxl/core/hdm.c | 64 +++++++----------------
> drivers/cxl/core/memdev.c | 15 +----
> drivers/cxl/core/port.c | 20 +++++--
> drivers/cxl/core/region.c | 128 +++++++++++++++++++++++++--------------------
> drivers/cxl/cxl.h | 38 ++++---------
> drivers/cxl/cxlmem.h | 20 -------
> 8 files changed, 127 insertions(+), 183 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 5400a421ad30..ca7fb2b182ed 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -571,29 +571,18 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
> .end = dpa_res->end,
> };
>
> - if (!perf)
> - return false;
> -
> return range_contains(&perf->dpa_range, &dpa);
> }
>
> -static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode)
> +static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_dpa_perf *perf;
>
> - switch (mode) {
> - case CXL_DECODER_RAM:
> - perf = to_ram_perf(cxlds);
> - break;
> - case CXL_DECODER_PMEM:
> - perf = to_pmem_perf(cxlds);
> - break;
> - default:
> + if (cxled->part < 0)
> return ERR_PTR(-EINVAL);
> - }
> + perf = &cxlds->part[cxled->part].perf;
>
> if (!dpa_perf_contains(perf, cxled->dpa_res))
> return ERR_PTR(-EINVAL);
> @@ -654,7 +643,7 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
> if (cxlds->rcd)
> return -ENODEV;
>
> - perf = cxled_get_dpa_perf(cxled, cxlr->mode);
> + perf = cxled_get_dpa_perf(cxled);
> if (IS_ERR(perf))
> return PTR_ERR(perf);
>
> @@ -1060,7 +1049,7 @@ void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
>
> lockdep_assert_held(&cxl_dpa_rwsem);
>
> - perf = cxled_get_dpa_perf(cxled, cxlr->mode);
> + perf = cxled_get_dpa_perf(cxled);
> if (IS_ERR(perf))
> return;
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 800466f96a68..22dac79c5192 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -72,8 +72,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> resource_size_t length);
>
> struct dentry *cxl_debugfs_create_dir(const char *dir);
> -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode);
> +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> + enum cxl_partition_mode mode);
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 591aeb26c9e1..bb478e7b12f6 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -374,7 +374,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &port->dev;
> - enum cxl_decoder_mode mode;
> struct resource *res;
> int rc;
>
> @@ -421,18 +420,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - mode = CXL_DECODER_NONE;
> - for (int i = 0; cxlds->nr_partitions; i++)
> - if (resource_contains(&cxlds->part[i].res, res)) {
> - mode = cxl_part_mode(cxlds->part[i].mode);
> - break;
> - }
> -
> - if (mode == CXL_DECODER_NONE)
> - dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> - port->id, cxled->cxld.id, res);
> - cxled->mode = mode;
> -
> port->hdm_end++;
> get_device(&cxled->cxld.dev);
> return 0;
> @@ -585,40 +572,36 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> return rc;
> }
>
> -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode)
> +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> + enum cxl_partition_mode mode)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> -
> - switch (mode) {
> - case CXL_DECODER_RAM:
> - case CXL_DECODER_PMEM:
> - break;
> - default:
> - dev_dbg(dev, "unsupported mode: %d\n", mode);
> - return -EINVAL;
> - }
> + int part;
>
> guard(rwsem_write)(&cxl_dpa_rwsem);
> if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> return -EBUSY;
>
> - /*
> - * Only allow modes that are supported by the current partition
> - * configuration
> - */
> - if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> - dev_dbg(dev, "no available pmem capacity\n");
> - return -ENXIO;
> + part = -1;
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == mode) {
> + part = i;
> + break;
> + }
> +
> + if (part < 0) {
> + dev_dbg(dev, "unsupported mode: %d\n", mode);
> + return -EINVAL;
> }
> - if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
> - dev_dbg(dev, "no available ram capacity\n");
> +
> + if (!resource_size(&cxlds->part[part].res)) {
> + dev_dbg(dev, "no available capacity for mode: %d\n", mode);
> return -ENXIO;
> }
>
> - cxled->mode = mode;
> + cxled->part = part;
> return 0;
> }
>
> @@ -647,16 +630,9 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - part = -1;
> - for (int i = 0; i < cxlds->nr_partitions; i++) {
> - if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
> - part = i;
> - break;
> - }
> - }
> -
> + part = cxled->part;
> if (part < 0) {
> - dev_dbg(dev, "partition %d not found\n", part);
> + dev_dbg(dev, "partition not set\n");
> rc = -EBUSY;
> goto out;
> }
> @@ -697,7 +673,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
>
> if (size > avail) {
> dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
> - cxl_decoder_mode_name(cxled->mode), &avail);
> + res->name, &avail);
> rc = -ENOSPC;
> goto out;
> }
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index be0eb57086e1..615cbd861f66 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -198,17 +198,8 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> int rc = 0;
>
> /* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
> - if (cxl_pmem_size(cxlds)) {
> - const struct resource *res = to_pmem_res(cxlds);
> -
> - offset = res->start;
> - length = resource_size(res);
> - rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> - if (rc)
> - return rc;
> - }
> - if (cxl_ram_size(cxlds)) {
> - const struct resource *res = to_ram_res(cxlds);
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *res = &cxlds->part[i].res;
>
> offset = res->start;
> length = resource_size(res);
> @@ -217,7 +208,7 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> * Invalid Physical Address is not an error for
> * volatile addresses. Device support is optional.
> */
> - if (rc == -EFAULT)
> + if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
> rc = 0;
> }
> return rc;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 78a5c2c25982..f5f2701c8771 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -194,25 +194,35 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + /* without @cxl_dpa_rwsem, make sure @part is not reloaded */
> + int part = READ_ONCE(cxled->part);
> + const char *desc;
> +
> + if (part < 0)
> + desc = "none";
> + else
> + desc = cxlds->part[part].res.name;
>
> - return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
> + return sysfs_emit(buf, "%s\n", desc);
> }
>
> static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> - enum cxl_decoder_mode mode;
> + enum cxl_partition_mode mode;
> ssize_t rc;
>
> if (sysfs_streq(buf, "pmem"))
> - mode = CXL_DECODER_PMEM;
> + mode = CXL_PARTMODE_PMEM;
> else if (sysfs_streq(buf, "ram"))
> - mode = CXL_DECODER_RAM;
> + mode = CXL_PARTMODE_RAM;
> else
> return -EINVAL;
>
> - rc = cxl_dpa_set_mode(cxled, mode);
> + rc = cxl_dpa_set_part(cxled, mode);
> if (rc)
> return rc;
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 9f0f6fdbc841..83b985d2ba76 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -144,7 +144,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
> rc = down_read_interruptible(&cxl_region_rwsem);
> if (rc)
> return rc;
> - if (cxlr->mode != CXL_DECODER_PMEM)
> + if (cxlr->mode != CXL_PARTMODE_PMEM)
> rc = sysfs_emit(buf, "\n");
> else
> rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
> @@ -441,7 +441,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
> * Support tooling that expects to find a 'uuid' attribute for all
> * regions regardless of mode.
> */
> - if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
> + if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_PARTMODE_PMEM)
> return 0444;
> return a->mode;
> }
> @@ -603,8 +603,16 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> struct cxl_region *cxlr = to_cxl_region(dev);
> + const char *desc;
>
> - return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
> + if (cxlr->mode == CXL_PARTMODE_RAM)
> + desc = "ram";
> + else if (cxlr->mode == CXL_PARTMODE_PMEM)
> + desc = "pmem";
> + else
> + desc = "";
> +
> + return sysfs_emit(buf, "%s\n", desc);
> }
> static DEVICE_ATTR_RO(mode);
>
> @@ -630,7 +638,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
>
> /* ways, granularity and uuid (if PMEM) need to be set before HPA */
> if (!p->interleave_ways || !p->interleave_granularity ||
> - (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid)))
> + (cxlr->mode == CXL_PARTMODE_PMEM && uuid_is_null(&p->uuid)))
> return -ENXIO;
>
> div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder);
> @@ -1875,6 +1883,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> {
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_region_params *p = &cxlr->params;
> struct cxl_port *ep_port, *root_port;
> struct cxl_dport *dport;
> @@ -1889,17 +1898,17 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return rc;
> }
>
> - if (cxled->mode != cxlr->mode) {
> - dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> - dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> - return -EINVAL;
> - }
> -
> - if (cxled->mode == CXL_DECODER_DEAD) {
> + if (cxled->part < 0) {
> dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> return -ENODEV;
> }
>
> + if (cxlds->part[cxled->part].mode != cxlr->mode) {
> + dev_dbg(&cxlr->dev, "%s region mode: %d mismatch\n",
> + dev_name(&cxled->cxld.dev), cxlr->mode);
> + return -EINVAL;
> + }
> +
> /* all full of members, or interleave config not established? */
> if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> dev_dbg(&cxlr->dev, "region already active\n");
> @@ -2102,7 +2111,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
> {
> down_write(&cxl_region_rwsem);
> - cxled->mode = CXL_DECODER_DEAD;
> + cxled->part = -1;
> cxl_region_detach(cxled);
> up_write(&cxl_region_rwsem);
> }
> @@ -2458,7 +2467,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb,
> */
> static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> int id,
> - enum cxl_decoder_mode mode,
> + enum cxl_partition_mode mode,
> enum cxl_decoder_type type)
> {
> struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> @@ -2512,13 +2521,13 @@ static ssize_t create_ram_region_show(struct device *dev,
> }
>
> static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> - enum cxl_decoder_mode mode, int id)
> + enum cxl_partition_mode mode, int id)
> {
> int rc;
>
> switch (mode) {
> - case CXL_DECODER_RAM:
> - case CXL_DECODER_PMEM:
> + case CXL_PARTMODE_RAM:
> + case CXL_PARTMODE_PMEM:
> break;
> default:
> dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
> @@ -2538,7 +2547,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> }
>
> static ssize_t create_region_store(struct device *dev, const char *buf,
> - size_t len, enum cxl_decoder_mode mode)
> + size_t len, enum cxl_partition_mode mode)
> {
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> struct cxl_region *cxlr;
> @@ -2559,7 +2568,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> {
> - return create_region_store(dev, buf, len, CXL_DECODER_PMEM);
> + return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM);
> }
> DEVICE_ATTR_RW(create_pmem_region);
>
> @@ -2567,7 +2576,7 @@ static ssize_t create_ram_region_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> {
> - return create_region_store(dev, buf, len, CXL_DECODER_RAM);
> + return create_region_store(dev, buf, len, CXL_PARTMODE_RAM);
> }
> DEVICE_ATTR_RW(create_ram_region);
>
> @@ -2665,7 +2674,7 @@ EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, "CXL");
>
> struct cxl_poison_context {
> struct cxl_port *port;
> - enum cxl_decoder_mode mode;
> + int part;
> u64 offset;
> };
>
> @@ -2673,49 +2682,45 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
> struct cxl_poison_context *ctx)
> {
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + const struct resource *res;
> + struct resource *p, *last;
> u64 offset, length;
> int rc = 0;
>
> + if (ctx->part < 0)
> + return 0;
> +
> /*
> - * Collect poison for the remaining unmapped resources
> - * after poison is collected by committed endpoints.
> - *
> - * Knowing that PMEM must always follow RAM, get poison
> - * for unmapped resources based on the last decoder's mode:
> - * ram: scan remains of ram range, then any pmem range
> - * pmem: scan remains of pmem range
> + * Collect poison for the remaining unmapped resources after
> + * poison is collected by committed endpoint decoders.
> */
> -
> - if (ctx->mode == CXL_DECODER_RAM) {
> - offset = ctx->offset;
> - length = cxl_ram_size(cxlds) - offset;
> + for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
> + res = &cxlds->part[i].res;
> + for (p = res->child, last = NULL; p; p = p->sibling)
> + last = p;
> + if (last)
> + offset = last->end + 1;
> + else
> + offset = res->start;
> + length = res->end - offset + 1;
> + if (!length)
> + break;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> - if (rc == -EFAULT)
> - rc = 0;
> + if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
> + continue;
> if (rc)
> - return rc;
> - }
> - if (ctx->mode == CXL_DECODER_PMEM) {
> - offset = ctx->offset;
> - length = resource_size(&cxlds->dpa_res) - offset;
> - if (!length)
> - return 0;
> - } else if (cxl_pmem_size(cxlds)) {
> - const struct resource *res = to_pmem_res(cxlds);
> -
> - offset = res->start;
> - length = resource_size(res);
> - } else {
> - return 0;
> + break;
> }
>
> - return cxl_mem_get_poison(cxlmd, offset, length, NULL);
> + return rc;
> }
>
> static int poison_by_decoder(struct device *dev, void *arg)
> {
> struct cxl_poison_context *ctx = arg;
> struct cxl_endpoint_decoder *cxled;
> + enum cxl_partition_mode mode;
> + struct cxl_dev_state *cxlds;
> struct cxl_memdev *cxlmd;
> u64 offset, length;
> int rc = 0;
> @@ -2728,11 +2733,17 @@ static int poison_by_decoder(struct device *dev, void *arg)
> return rc;
>
> cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + if (cxled->part < 0)
> + mode = CXL_PARTMODE_NONE;
> + else
> + mode = cxlds->part[cxled->part].mode;
> +
> if (cxled->skip) {
> offset = cxled->dpa_res->start - cxled->skip;
> length = cxled->skip;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
> + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
> rc = 0;
> if (rc)
> return rc;
> @@ -2741,7 +2752,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
> offset = cxled->dpa_res->start;
> length = cxled->dpa_res->end - offset + 1;
> rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
> - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
> + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
> rc = 0;
> if (rc)
> return rc;
> @@ -2749,7 +2760,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
> /* Iterate until commit_end is reached */
> if (cxled->cxld.id == ctx->port->commit_end) {
> ctx->offset = cxled->dpa_res->end + 1;
> - ctx->mode = cxled->mode;
> + ctx->part = cxled->part;
> return 1;
> }
>
> @@ -2762,7 +2773,8 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> int rc = 0;
>
> ctx = (struct cxl_poison_context) {
> - .port = port
> + .port = port,
> + .part = -1,
> };
>
> rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
> @@ -3206,14 +3218,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_port *port = cxlrd_to_port(cxlrd);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct range *hpa = &cxled->cxld.hpa_range;
> + int rc, part = READ_ONCE(cxled->part);
> struct cxl_region_params *p;
> struct cxl_region *cxlr;
> struct resource *res;
> - int rc;
> +
> + if (part < 0)
> + return ERR_PTR(-EBUSY);
>
> do {
> - cxlr = __create_region(cxlrd, cxled->mode,
> + cxlr = __create_region(cxlrd, cxlds->part[part].mode,
> atomic_read(&cxlrd->region_id));
> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>
> @@ -3416,9 +3432,9 @@ static int cxl_region_probe(struct device *dev)
> return rc;
>
> switch (cxlr->mode) {
> - case CXL_DECODER_PMEM:
> + case CXL_PARTMODE_PMEM:
> return devm_cxl_add_pmem_region(cxlr);
> - case CXL_DECODER_RAM:
> + case CXL_PARTMODE_RAM:
> /*
> + * The region cannot be managed by CXL if any portion of
> * it is already online as 'System RAM'
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 4d0550367042..cb6f0b761b24 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -371,30 +371,6 @@ struct cxl_decoder {
> void (*reset)(struct cxl_decoder *cxld);
> };
>
> -/*
> - * CXL_DECODER_DEAD prevents endpoints from being reattached to regions
> - * while cxld_unregister() is running
> - */
> -enum cxl_decoder_mode {
> - CXL_DECODER_NONE,
> - CXL_DECODER_RAM,
> - CXL_DECODER_PMEM,
> - CXL_DECODER_DEAD,
> -};
> -
> -static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
> -{
> - static const char * const names[] = {
> - [CXL_DECODER_NONE] = "none",
> - [CXL_DECODER_RAM] = "ram",
> - [CXL_DECODER_PMEM] = "pmem",
> - };
> -
> - if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
> - return names[mode];
> - return "mixed";
> -}
> -
> /*
> * Track whether this decoder is reserved for region autodiscovery, or
> * free for userspace provisioning.
> @@ -409,16 +385,16 @@ enum cxl_decoder_state {
> * @cxld: base cxl_decoder_object
> * @dpa_res: actively claimed DPA span of this decoder
> * @skip: offset into @dpa_res where @cxld.hpa_range maps
> - * @mode: which memory type / access-mode-partition this decoder targets
> * @state: autodiscovery state
> + * @part: partition index this decoder maps
> * @pos: interleave position in @cxld.region
> */
> struct cxl_endpoint_decoder {
> struct cxl_decoder cxld;
> struct resource *dpa_res;
> resource_size_t skip;
> - enum cxl_decoder_mode mode;
> enum cxl_decoder_state state;
> + int part;
> int pos;
> };
>
> @@ -503,6 +479,12 @@ struct cxl_region_params {
> int nr_targets;
> };
>
> +enum cxl_partition_mode {
> + CXL_PARTMODE_NONE,
> + CXL_PARTMODE_RAM,
> + CXL_PARTMODE_PMEM,
> +};
> +
> /*
> * Indicate whether this region has been assembled by autodetection or
> * userspace assembly. Prevent endpoint decoders outside of automatic
> @@ -522,7 +504,7 @@ struct cxl_region_params {
> * struct cxl_region - CXL region
> * @dev: This region's device
> * @id: This region's id. Id is globally unique across all regions
> - * @mode: Endpoint decoder allocation / access mode
> + * @mode: Operational mode of the mapped capacity
> * @type: Endpoint decoder target type
> * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
> * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
> @@ -535,7 +517,7 @@ struct cxl_region_params {
> struct cxl_region {
> struct device dev;
> int id;
> - enum cxl_decoder_mode mode;
> + enum cxl_partition_mode mode;
> enum cxl_decoder_type type;
> struct cxl_nvdimm_bridge *cxl_nvb;
> struct cxl_pmem_region *cxlr_pmem;
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index bad99456e901..f218d43dec9f 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -97,12 +97,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped);
>
> -enum cxl_partition_mode {
> - CXL_PARTMODE_NONE,
> - CXL_PARTMODE_RAM,
> - CXL_PARTMODE_PMEM,
> -};
> -
> #define CXL_NR_PARTITIONS_MAX 2
>
> struct cxl_dpa_info {
> @@ -530,20 +524,6 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> return resource_size(res);
> }
>
> -/*
> - * Translate the operational mode of memory capacity with the
> - * operational mode of a decoder
> - * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
> - */
> -static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
> -{
> - if (mode == CXL_PARTMODE_RAM)
> - return CXL_DECODER_RAM;
> - if (mode == CXL_PARTMODE_PMEM)
> - return CXL_DECODER_PMEM;
> - return CXL_DECODER_NONE;
> -}
> -
> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> {
> return dev_get_drvdata(cxl_mbox->host);
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-22 8:59 ` [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
2025-01-22 16:29 ` Ira Weiny
2025-01-23 16:41 ` Jonathan Cameron
@ 2025-01-23 17:21 ` Alejandro Lucero Palau
2025-01-23 20:52 ` Dave Jiang
3 siblings, 0 replies; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-23 17:21 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
On 1/22/25 08:59, Dan Williams wrote:
> cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> allocations being distinct from RAM allocations in specific ways when in
> practice the allocation rules are only relative to DPA partition index.
>
> The rules for cxl_dpa_alloc() are:
>
> - allocations can only come from 1 partition
>
> - if allocating at partition-index-N, all free space in partitions less
> than partition-index-N must be skipped over
>
> Use the new 'struct cxl_dpa_partition' array to support allocation with
> an arbitrary number of DPA partitions on the device.
>
> A follow-on patch can go further to clean up the 'enum cxl_decoder_mode'
> concept and supersede it with looking up the memory properties from
> partition metadata. Until then cxl_part_mode() temporarily bridges code
> that looks up partitions by @cxled->mode.
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/hdm.c | 215 +++++++++++++++++++++++++++++++++++-------------
> drivers/cxl/cxlmem.h | 14 +++
> 2 files changed, 172 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3f8a54ca4624..591aeb26c9e1 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
>
> +/* See request_skip() kernel-doc */
> +static void release_skip(struct cxl_dev_state *cxlds,
> + const resource_size_t skip_base,
> + const resource_size_t skip_len)
> +{
> + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> +
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *part_res = &cxlds->part[i].res;
> + resource_size_t skip_end, skip_size;
> +
> + if (skip_start < part_res->start || skip_start > part_res->end)
> + continue;
> +
> + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> + skip_size = skip_end - skip_start + 1;
> + __release_region(&cxlds->dpa_res, skip_start, skip_size);
> + skip_start += skip_size;
> + skip_rem -= skip_size;
> +
> + if (!skip_rem)
> + break;
> + }
> +}
> +
> /*
> * Must be called in a context that synchronizes against this decoder's
> * port ->remove() callback (like an endpoint decoder sysfs attribute)
> @@ -241,7 +266,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> skip_start = res->start - cxled->skip;
> __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> if (cxled->skip)
> - __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> + release_skip(cxlds, skip_start, cxled->skip);
> cxled->skip = 0;
> cxled->dpa_res = NULL;
> put_device(&cxled->cxld.dev);
> @@ -268,6 +293,79 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> __cxl_dpa_release(cxled);
> }
>
> +/**
> + * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree
> + * @cxlds: CXL.mem device context that parents @cxled
> + * @cxled: Endpoint decoder establishing new allocation that skips lower DPA
> + * @skip_base: DPA < start of new DPA allocation (DPAnew)
> + * @skip_len: @skip_base + @skip_len == DPAnew
> + *
> + * DPA 'skip' arises from out-of-sequence DPA allocation events relative
> + * to free capacity across multiple partitions. It is a wasteful event
> + * as usable DPA gets thrown away, but if a deployment has, for example,
> + * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM
> + * DPA, the free RAM DPA must be sacrificed to start allocating PMEM.
> + * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder
> + * Protection" for more details.
> + *
> + * A 'skip' always covers the last allocated DPA in a previous partition
> + * to the start of the current partition to allocate. Allocations never
> + * start in the middle of a partition, and allocations are always
> + * de-allocated in reverse order (see cxl_dpa_free(), or natural devm
> + * unwind order from forced in-order allocation).
> + *
> + * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip'
> + * would always be contained to a single partition. Given
> + * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip'
> + * might span "tail capacity of partition[0], all of partition[1], ...,
> + * all of partition[N-1]" to support allocating from partition[N]. That
> + * in turn interacts with the partition 'struct resource' boundaries
> + * within @cxlds->dpa_res whereby 'skip' requests need to be divided by
> + * partition. I.e. this is a quirk of using a 'struct resource' tree to
> + * detect range conflicts while also tracking partition boundaries in
> + * @cxlds->dpa_res.
> + */
> +static int request_skip(struct cxl_dev_state *cxlds,
> + struct cxl_endpoint_decoder *cxled,
> + const resource_size_t skip_base,
> + const resource_size_t skip_len)
> +{
> + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> +
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *part_res = &cxlds->part[i].res;
> + struct cxl_port *port = cxled_to_port(cxled);
> + resource_size_t skip_end, skip_size;
> + struct resource *res;
> +
> + if (skip_start < part_res->start || skip_start > part_res->end)
> + continue;
> +
> + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> + skip_size = skip_end - skip_start + 1;
> +
> + res = __request_region(&cxlds->dpa_res, skip_start, skip_size,
> + dev_name(&cxled->cxld.dev), 0);
> + if (!res) {
> + dev_dbg(cxlds->dev,
> + "decoder%d.%d: failed to reserve skipped space\n",
> + port->id, cxled->cxld.id);
> + break;
> + }
> + skip_start += skip_size;
> + skip_rem -= skip_size;
> + if (!skip_rem)
> + break;
> + }
> +
> + if (skip_rem == 0)
> + return 0;
> +
> + release_skip(cxlds, skip_base, skip_len - skip_rem);
> +
> + return -EBUSY;
> +}
> +
> static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped)
> @@ -276,7 +374,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &port->dev;
> + enum cxl_decoder_mode mode;
> struct resource *res;
> + int rc;
>
> lockdep_assert_held_write(&cxl_dpa_rwsem);
>
> @@ -305,14 +405,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> }
>
> if (skipped) {
> - res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
> - dev_name(&cxled->cxld.dev), 0);
> - if (!res) {
> - dev_dbg(dev,
> - "decoder%d.%d: failed to reserve skipped space\n",
> - port->id, cxled->cxld.id);
> - return -EBUSY;
> - }
> + rc = request_skip(cxlds, cxled, base - skipped, skipped);
> + if (rc)
> + return rc;
> }
> res = __request_region(&cxlds->dpa_res, base, len,
> dev_name(&cxled->cxld.dev), 0);
> @@ -320,22 +415,23 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
> port->id, cxled->cxld.id);
> if (skipped)
> - __release_region(&cxlds->dpa_res, base - skipped,
> - skipped);
> + release_skip(cxlds, base - skipped, skipped);
> return -EBUSY;
> }
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
> - cxled->mode = CXL_DECODER_PMEM;
> - else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
> - cxled->mode = CXL_DECODER_RAM;
> - else {
> + mode = CXL_DECODER_NONE;
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (resource_contains(&cxlds->part[i].res, res)) {
> + mode = cxl_part_mode(cxlds->part[i].mode);
> + break;
> + }
> +
> + if (mode == CXL_DECODER_NONE)
> dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> port->id, cxled->cxld.id, res);
> - cxled->mode = CXL_DECODER_NONE;
> - }
> + cxled->mode = mode;
>
> port->hdm_end++;
> get_device(&cxled->cxld.dev);
> @@ -529,15 +625,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - resource_size_t free_ram_start, free_pmem_start;
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> - resource_size_t start, avail, skip;
> + struct resource *res, *prev = NULL;
> + resource_size_t start, avail, skip, skip_start;
> struct resource *p, *last;
> - const struct resource *ram_res = to_ram_res(cxlds);
> - const struct resource *pmem_res = to_pmem_res(cxlds);
> - int rc;
> + int part, rc;
>
> down_write(&cxl_dpa_rwsem);
> if (cxled->cxld.region) {
> @@ -553,47 +647,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - for (p = ram_res->child, last = NULL; p; p = p->sibling)
> - last = p;
> - if (last)
> - free_ram_start = last->end + 1;
> - else
> - free_ram_start = ram_res->start;
> + part = -1;
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
> + part = i;
> + break;
> + }
> + }
>
> - for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> + if (part < 0) {
> + dev_dbg(dev, "partition %d not found\n", part);
> + rc = -EBUSY;
> + goto out;
> + }
> +
> + res = &cxlds->part[part].res;
> + for (p = res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> - free_pmem_start = last->end + 1;
> + start = last->end + 1;
> else
> - free_pmem_start = pmem_res->start;
> -
> - if (cxled->mode == CXL_DECODER_RAM) {
> - start = free_ram_start;
> - avail = ram_res->end - start + 1;
> - skip = 0;
> - } else if (cxled->mode == CXL_DECODER_PMEM) {
> - resource_size_t skip_start, skip_end;
> + start = res->start;
>
> - start = free_pmem_start;
> - avail = pmem_res->end - start + 1;
> - skip_start = free_ram_start;
> -
> - /*
> - * If some pmem is already allocated, then that allocation
> - * already handled the skip.
> - */
> - if (pmem_res->child &&
> - skip_start == pmem_res->child->start)
> - skip_end = skip_start - 1;
> - else
> - skip_end = start - 1;
> - skip = skip_end - skip_start + 1;
> - } else {
> - dev_dbg(dev, "mode not set\n");
> - rc = -EINVAL;
> - goto out;
> + /*
> + * To allocate at partition N, a skip needs to be calculated for all
> + * unallocated space at lower partition indices.
> + *
> + * If a partition has any allocations, the search can end because a
> + * previous cxl_dpa_alloc() invocation is assumed to have accounted for
> + * all previous partitions.
> + */
> + skip_start = CXL_RESOURCE_NONE;
> + for (int i = part; i; i--) {
> + prev = &cxlds->part[i - 1].res;
> + for (p = prev->child, last = NULL; p; p = p->sibling)
> + last = p;
> + if (last) {
> + skip_start = last->end + 1;
> + break;
> + }
> + skip_start = prev->start;
> }
>
> + avail = res->end - start + 1;
> + if (skip_start == CXL_RESOURCE_NONE)
> + skip = 0;
> + else
> + skip = res->start - skip_start;
> +
> if (size > avail) {
> dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
> cxl_decoder_mode_name(cxled->mode), &avail);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 15f549afab7c..bad99456e901 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -530,6 +530,20 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> return resource_size(res);
> }
>
> +/*
> + * Translate the operational mode of memory capacity with the
> + * operational mode of a decoder
> + * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
> + */
> +static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
> +{
> + if (mode == CXL_PARTMODE_RAM)
> + return CXL_DECODER_RAM;
> + if (mode == CXL_PARTMODE_PMEM)
> + return CXL_DECODER_PMEM;
> + return CXL_DECODER_NONE;
> +}
> +
> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> {
> return dev_get_drvdata(cxl_mbox->host);
>
* Re: [PATCH v2 0/5] cxl: DPA partition metadata is a mess...
2025-01-22 8:59 [PATCH v2 0/5] cxl: DPA partition metadata is a mess Dan Williams
` (4 preceding siblings ...)
2025-01-22 8:59 ` [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode Dan Williams
@ 2025-01-23 17:23 ` Alejandro Lucero Palau
5 siblings, 0 replies; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-23 17:23 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Ira Weiny, Dave Jiang, Jonathan.Cameron
On 1/22/25 08:59, Dan Williams wrote:
> Changes since v1: [0]
> - Stop requiring PMEM to be at partition-index 1, i.e. remove empty
> partitions. (Jonathan)
> - Document the assumptions and implementation of
> {request,release}_skip() (Jonathan, Alejandro)
> - Kill 'enum cxl_decoder_mode' to cleanup remainder of hard-coded
> expectations of a static PMEM partition always being present
>
> [0]: http://lore.kernel.org/173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com
>
> ---
>
> As noted in patch3, the pending efforts to add CXL Accelerator (type-2)
> device [1], and Dynamic Capacity (DCD) support [2], tripped on the
> no-longer-fit-for-purpose design in the CXL subsystem for tracking
> device-physical-address (DPA) metadata.
>
> In fact there was no design at all, just a couple of open-coded 'struct
> resource' instances for 'ram' and 'pmem' and a pile of explicit code
> referencing those resources directly.
>
> See patch3 for more details on the specific problems that caused, and
> patch4 for the eyesore reduction of making the DPA allocation algorithm
> partition number agnostic.
>
> The motivation with this effort is to make it easier to land the Type-2
> and DCD series.
>
> [1]: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com
> [2]: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com
>
> ---
>
> Dan Williams (5):
> cxl: Remove the CXL_DECODER_MIXED mistake
> cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
> cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
> cxl: Make cxl_dpa_alloc() DPA partition number agnostic
> cxl: Kill enum cxl_decoder_mode
>
FWIW, I'll adapt the current CXL patchset (as of v9) to test against the
changes in this series and report back here in the coming days.
Thank you
> drivers/cxl/core/cdat.c | 74 +++++-----
> drivers/cxl/core/core.h | 4 -
> drivers/cxl/core/hdm.c | 310 +++++++++++++++++++++++++++++++-----------
> drivers/cxl/core/mbox.c | 66 +++------
> drivers/cxl/core/memdev.c | 43 ++----
> drivers/cxl/core/port.c | 20 ++-
> drivers/cxl/core/region.c | 138 ++++++++++---------
> drivers/cxl/cxl.h | 40 +----
> drivers/cxl/cxlmem.h | 94 +++++++++++--
> drivers/cxl/mem.c | 2
> drivers/cxl/pci.c | 7 +
> tools/testing/cxl/test/cxl.c | 22 +--
> tools/testing/cxl/test/mem.c | 7 +
> 13 files changed, 511 insertions(+), 316 deletions(-)
>
> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-23 15:57 ` Jonathan Cameron
@ 2025-01-23 20:01 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-23 20:01 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
Jonathan Cameron wrote:
> On Wed, 22 Jan 2025 00:59:21 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > In preparation for consolidating all DPA partition information into an
> > array of DPA metadata, introduce helpers that hide the layout of the
> > current data. I.e. make the eventual replacement of ->ram_res,
> > ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> > no-op for code paths that consume that information, and reduce the noise
> > of follow-on patches.
> >
> > The end goal is to consolidate all DPA information in 'struct
> > cxl_dev_state', but for now the helpers just make it appear that all DPA
> > metadata is relative to @cxlds.
> >
> > Note that a follow-on patch also cleans up the temporary placeholders of
> > @ram_res, and @pmem_res in the qos_class manipulation code,
> > cxl_dpa_alloc(), and cxl_mem_create_range_info().
> >
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: Alejandro Lucero <alucerop@amd.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[..]
> > xa_for_each(dsmas_xa, index, dent) {
> > - if (resource_size(&cxlds->ram_res) &&
> > - range_contains(&ram_range, &dent->dpa_range))
> > - update_perf_entry(dev, dent, &mds->ram_perf);
> > - else if (resource_size(&cxlds->pmem_res) &&
> > - range_contains(&pmem_range, &dent->dpa_range))
> > - update_perf_entry(dev, dent, &mds->pmem_perf);
> > - else
> > - dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> > - &dent->dpa_range);
> > + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> > + const struct resource *res = partition[i];
> > + struct range range = {
> > + .start = res->start,
> > + .end = res->end,
> > + };
> > +
> > + if (range_contains(&range, &dent->dpa_range))
> > + update_perf_entry(dev, dent, perf[i]);
> > + else
> > + dev_dbg(dev,
> > + "no partition for dsmas dpa: %pra\n",
> > + &dent->dpa_range);
>
> This else branch looks less than helpful if I read the code right.
It will fire at least once for every dsmas entry, implying no partition,
when in reality the entry probably matched the next partition.
> Probably want to break out on match and check if i == ARRAY_SIZE(partition)
> after the for loop and only then print the message.
Ah, true, good catch.
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-23 16:09 ` Jonathan Cameron
@ 2025-01-23 20:24 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-23 20:24 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
Jonathan Cameron wrote:
> On Wed, 22 Jan 2025 00:59:27 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > The pending efforts to add CXL Accelerator (type-2) device [1], and
> > Dynamic Capacity (DCD) support [2], tripped on the
> > no-longer-fit-for-purpose design in the CXL subsystem for tracking
> > device-physical-address (DPA) metadata. Trip hazards include:
> >
> > - CXL Memory Devices need to consider a PMEM partition, but Accelerator
> > devices with CXL.mem likely do not in the common case.
> >
> > - CXL Memory Devices enumerate DPA through Memory Device mailbox
> > commands like Partition Info; Accelerator devices do not.
> >
> > - CXL Memory Devices that support DCD support more than 2 partitions.
> > Some of the driver algorithms are awkward to expand to > 2 partition
> > cases.
> >
> > - DPA performance data is a general capability that can be shared with
> > accelerators, so tracking it in 'struct cxl_memdev_state' is no longer
> > suitable.
> >
> > - Hardcoded assumptions around the PMEM partition always being index-1
> > if RAM is zero-sized or PMEM is zero sized.
> >
> > - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a
> > memory property; it should be phased out in favor of a partition id,
> > with the memory property coming from the partition info.
> >
> > Towards cleaning up those issues and allowing a smoother landing for the
> > aforementioned pending efforts, introduce a 'struct cxl_dpa_partition'
> > array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared
> > way for Memory Devices and Accelerators to initialize the DPA information
> > in 'struct cxl_dev_state'.
> >
> > For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to
> > get the new data structure initialized, and cleanup some qos_class init.
> > Follow on patches will go further to use the new data structure to
> > cleanup algorithms that are better suited to loop over all possible
> > partitions.
> >
> > cxl_dpa_setup() follows the locking expectations of mutating the device
> > DPA map, and is suitable for Accelerator drivers to use. Accelerators
> > likely only have one hardcoded 'ram' partition to convey to the
> > cxl_core.
> >
> > Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1]
> > Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2]
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: Alejandro Lucero <alucerop@amd.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
>
> A few trivial comments inline but looking better to me.
>
> One question about what smells to me like our next MIXED mode.
>
>
[..]
> > diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> > index b177a488e29b..5400a421ad30 100644
> > --- a/drivers/cxl/core/cdat.c
> > +++ b/drivers/cxl/core/cdat.c
>
> > +/* if this fails the caller must destroy @cxlds, there is no recovery */
> > +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
> > +{
> > + struct device *dev = cxlds->dev;
> > +
> > + guard(rwsem_write)(&cxl_dpa_rwsem);
> > +
> > + if (cxlds->nr_partitions)
> > + return -EBUSY;
> > +
> > + if (!info->size || !info->nr_partitions) {
> > + cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> > + cxlds->nr_partitions = 0;
> > + return 0;
> > + }
> > +
> > + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
> > +
> > + for (int i = 0; i < info->nr_partitions; i++) {
> > + const struct cxl_dpa_part_info *part = &info->part[i];
> > + const char *desc;
> > + int rc;
> > +
> > + if (part->mode == CXL_PARTMODE_RAM)
> > + desc = "ram";
> > + else if (part->mode == CXL_PARTMODE_PMEM)
> > + desc = "pmem";
>
> I'd go switch statement now to save having to fix this up later, or
> an array of strings with a bounds check.
> (not important though if you want to shunt that into another day)
Might as well do it now.
[..]
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 78e92e24d7b5..15f549afab7c 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -97,6 +97,25 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > resource_size_t base, resource_size_t len,
> > resource_size_t skipped);
> >
> > +enum cxl_partition_mode {
> > + CXL_PARTMODE_NONE,
>
> What is NONE for? Given you are now packing the partitions and
> counting them when would we get an 'empty' one?
Looks like another thinko during the conversion. It gets used later on
in the series to check for endpoint-decoders that have not been assigned
a partition. However, that path also guarantees that the endpoint
decoder *has* been assigned. So long CXL_PARTMODE_NONE, we hardly knew
ye.
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-22 8:59 ` [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
` (2 preceding siblings ...)
2025-01-23 17:21 ` Alejandro Lucero Palau
@ 2025-01-23 20:52 ` Dave Jiang
3 siblings, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2025-01-23 20:52 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Alejandro Lucero, Ira Weiny, Jonathan.Cameron
On 1/22/25 1:59 AM, Dan Williams wrote:
> cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> allocations being distinct from RAM allocations in specific ways when in
> practice the allocation rules are only relative to DPA partition index.
>
> The rules for cxl_dpa_alloc() are:
>
> - allocations can only come from 1 partition
>
> - if allocating at partition-index-N, all free space in partitions less
> than partition-index-N must be skipped over
>
> Use the new 'struct cxl_dpa_partition' array to support allocation with
> an arbitrary number of DPA partitions on the device.
>
> A follow-on patch can go further to clean up the 'enum cxl_decoder_mode'
> concept and supersede it by looking up the memory properties from
> partition metadata. Until then, cxl_part_mode() temporarily bridges code
> that looks up partitions by @cxled->mode.
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/hdm.c | 215 +++++++++++++++++++++++++++++++++++-------------
> drivers/cxl/cxlmem.h | 14 +++
> 2 files changed, 172 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3f8a54ca4624..591aeb26c9e1 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
>
> +/* See request_skip() kernel-doc */
> +static void release_skip(struct cxl_dev_state *cxlds,
> + const resource_size_t skip_base,
> + const resource_size_t skip_len)
> +{
> + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> +
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *part_res = &cxlds->part[i].res;
> + resource_size_t skip_end, skip_size;
> +
> + if (skip_start < part_res->start || skip_start > part_res->end)
> + continue;
> +
> + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> + skip_size = skip_end - skip_start + 1;
> + __release_region(&cxlds->dpa_res, skip_start, skip_size);
> + skip_start += skip_size;
> + skip_rem -= skip_size;
> +
> + if (!skip_rem)
> + break;
> + }
> +}
> +
> /*
> * Must be called in a context that synchronizes against this decoder's
> * port ->remove() callback (like an endpoint decoder sysfs attribute)
> @@ -241,7 +266,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> skip_start = res->start - cxled->skip;
> __release_region(&cxlds->dpa_res, res->start, resource_size(res));
> if (cxled->skip)
> - __release_region(&cxlds->dpa_res, skip_start, cxled->skip);
> + release_skip(cxlds, skip_start, cxled->skip);
> cxled->skip = 0;
> cxled->dpa_res = NULL;
> put_device(&cxled->cxld.dev);
> @@ -268,6 +293,79 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> __cxl_dpa_release(cxled);
> }
>
> +/**
> + * request_skip() - Track DPA 'skip' in @cxlds->dpa_res resource tree
> + * @cxlds: CXL.mem device context that parents @cxled
> + * @cxled: Endpoint decoder establishing new allocation that skips lower DPA
> + * @skip_base: DPA < start of new DPA allocation (DPAnew)
> + * @skip_len: @skip_base + @skip_len == DPAnew
> + *
> + * DPA 'skip' arises from out-of-sequence DPA allocation events relative
> + * to free capacity across multiple partitions. It is a wasteful event
> + * as usable DPA gets thrown away, but if a deployment has, for example,
> + * a dual RAM+PMEM device, wants to use PMEM, and has unallocated RAM
> + * DPA, the free RAM DPA must be sacrificed to start allocating PMEM.
> + * See third "Implementation Note" in CXL 3.1 8.2.4.19.13 "Decoder
> + * Protection" for more details.
> + *
> + * A 'skip' always covers the last allocated DPA in a previous partition
> + * to the start of the current partition to allocate. Allocations never
> + * start in the middle of a partition, and allocations are always
> + * de-allocated in reverse order (see cxl_dpa_free(), or natural devm
> + * unwind order from forced in-order allocation).
> + *
> + * If @cxlds->nr_partitions was guaranteed to be <= 2 then the 'skip'
> + * would always be contained to a single partition. Given
> + * @cxlds->nr_partitions may be > 2 it results in cases where the 'skip'
> + * might span "tail capacity of partition[0], all of partition[1], ...,
> + * all of partition[N-1]" to support allocating from partition[N]. That
> + * in turn interacts with the partition 'struct resource' boundaries
> + * within @cxlds->dpa_res whereby 'skip' requests need to be divided by
> + * partition. I.e. this is a quirk of using a 'struct resource' tree to
> + * detect range conflicts while also tracking partition boundaries in
> + * @cxlds->dpa_res.
> + */
> +static int request_skip(struct cxl_dev_state *cxlds,
> + struct cxl_endpoint_decoder *cxled,
> + const resource_size_t skip_base,
> + const resource_size_t skip_len)
> +{
> + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> +
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *part_res = &cxlds->part[i].res;
> + struct cxl_port *port = cxled_to_port(cxled);
> + resource_size_t skip_end, skip_size;
> + struct resource *res;
> +
> + if (skip_start < part_res->start || skip_start > part_res->end)
> + continue;
> +
> + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> + skip_size = skip_end - skip_start + 1;
> +
> + res = __request_region(&cxlds->dpa_res, skip_start, skip_size,
> + dev_name(&cxled->cxld.dev), 0);
> + if (!res) {
> + dev_dbg(cxlds->dev,
> + "decoder%d.%d: failed to reserve skipped space\n",
> + port->id, cxled->cxld.id);
> + break;
> + }
> + skip_start += skip_size;
> + skip_rem -= skip_size;
> + if (!skip_rem)
> + break;
> + }
> +
> + if (skip_rem == 0)
> + return 0;
> +
> + release_skip(cxlds, skip_base, skip_len - skip_rem);
> +
> + return -EBUSY;
> +}
> +
> static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped)
> @@ -276,7 +374,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &port->dev;
> + enum cxl_decoder_mode mode;
> struct resource *res;
> + int rc;
>
> lockdep_assert_held_write(&cxl_dpa_rwsem);
>
> @@ -305,14 +405,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> }
>
> if (skipped) {
> - res = __request_region(&cxlds->dpa_res, base - skipped, skipped,
> - dev_name(&cxled->cxld.dev), 0);
> - if (!res) {
> - dev_dbg(dev,
> - "decoder%d.%d: failed to reserve skipped space\n",
> - port->id, cxled->cxld.id);
> - return -EBUSY;
> - }
> + rc = request_skip(cxlds, cxled, base - skipped, skipped);
> + if (rc)
> + return rc;
> }
> res = __request_region(&cxlds->dpa_res, base, len,
> dev_name(&cxled->cxld.dev), 0);
> @@ -320,22 +415,23 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
> port->id, cxled->cxld.id);
> if (skipped)
> - __release_region(&cxlds->dpa_res, base - skipped,
> - skipped);
> + release_skip(cxlds, base - skipped, skipped);
> return -EBUSY;
> }
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - if (to_pmem_res(cxlds) && resource_contains(to_pmem_res(cxlds), res))
> - cxled->mode = CXL_DECODER_PMEM;
> - else if (to_ram_res(cxlds) && resource_contains(to_ram_res(cxlds), res))
> - cxled->mode = CXL_DECODER_RAM;
> - else {
> + mode = CXL_DECODER_NONE;
> + for (int i = 0; cxlds->nr_partitions; i++)
> + if (resource_contains(&cxlds->part[i].res, res)) {
> + mode = cxl_part_mode(cxlds->part[i].mode);
> + break;
> + }
> +
> + if (mode == CXL_DECODER_NONE)
> dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> port->id, cxled->cxld.id, res);
> - cxled->mode = CXL_DECODER_NONE;
> - }
> + cxled->mode = mode;
>
> port->hdm_end++;
> get_device(&cxled->cxld.dev);
> @@ -529,15 +625,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - resource_size_t free_ram_start, free_pmem_start;
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> - resource_size_t start, avail, skip;
> + struct resource *res, *prev = NULL;
> + resource_size_t start, avail, skip, skip_start;
> struct resource *p, *last;
> - const struct resource *ram_res = to_ram_res(cxlds);
> - const struct resource *pmem_res = to_pmem_res(cxlds);
> - int rc;
> + int part, rc;
>
> down_write(&cxl_dpa_rwsem);
> if (cxled->cxld.region) {
> @@ -553,47 +647,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - for (p = ram_res->child, last = NULL; p; p = p->sibling)
> - last = p;
> - if (last)
> - free_ram_start = last->end + 1;
> - else
> - free_ram_start = ram_res->start;
> + part = -1;
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
> + part = i;
> + break;
> + }
> + }
>
> - for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> + if (part < 0) {
> + dev_dbg(dev, "partition %d not found\n", part);
> + rc = -EBUSY;
> + goto out;
> + }
> +
> + res = &cxlds->part[part].res;
> + for (p = res->child, last = NULL; p; p = p->sibling)
> last = p;
> if (last)
> - free_pmem_start = last->end + 1;
> + start = last->end + 1;
> else
> - free_pmem_start = pmem_res->start;
> -
> - if (cxled->mode == CXL_DECODER_RAM) {
> - start = free_ram_start;
> - avail = ram_res->end - start + 1;
> - skip = 0;
> - } else if (cxled->mode == CXL_DECODER_PMEM) {
> - resource_size_t skip_start, skip_end;
> + start = res->start;
>
> - start = free_pmem_start;
> - avail = pmem_res->end - start + 1;
> - skip_start = free_ram_start;
> -
> - /*
> - * If some pmem is already allocated, then that allocation
> - * already handled the skip.
> - */
> - if (pmem_res->child &&
> - skip_start == pmem_res->child->start)
> - skip_end = skip_start - 1;
> - else
> - skip_end = start - 1;
> - skip = skip_end - skip_start + 1;
> - } else {
> - dev_dbg(dev, "mode not set\n");
> - rc = -EINVAL;
> - goto out;
> + /*
> + * To allocate at partition N, a skip needs to be calculated for all
> > + * unallocated space at lower partition indices.
> + *
> + * If a partition has any allocations, the search can end because a
> + * previous cxl_dpa_alloc() invocation is assumed to have accounted for
> + * all previous partitions.
> + */
> + skip_start = CXL_RESOURCE_NONE;
> + for (int i = part; i; i--) {
> + prev = &cxlds->part[i - 1].res;
> + for (p = prev->child, last = NULL; p; p = p->sibling)
> + last = p;
> + if (last) {
> + skip_start = last->end + 1;
> + break;
> + }
> + skip_start = prev->start;
> }
>
> + avail = res->end - start + 1;
> + if (skip_start == CXL_RESOURCE_NONE)
> + skip = 0;
> + else
> + skip = res->start - skip_start;
> +
> if (size > avail) {
> dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
> cxl_decoder_mode_name(cxled->mode), &avail);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 15f549afab7c..bad99456e901 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -530,6 +530,20 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> return resource_size(res);
> }
>
> +/*
> + * Translate the operational mode of memory capacity with the
> + * operational mode of a decoder
> + * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
> + */
> +static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
> +{
> + if (mode == CXL_PARTMODE_RAM)
> + return CXL_DECODER_RAM;
> + if (mode == CXL_PARTMODE_PMEM)
> + return CXL_DECODER_PMEM;
> + return CXL_DECODER_NONE;
> +}
> +
> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> {
> return dev_get_drvdata(cxl_mbox->host);
>
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-23 16:25 ` Alejandro Lucero Palau
@ 2025-01-23 21:04 ` Dan Williams
2025-01-24 10:15 ` Alejandro Lucero Palau
0 siblings, 1 reply; 48+ messages in thread
From: Dan Williams @ 2025-01-23 21:04 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, linux-cxl
Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
Alejandro Lucero Palau wrote:
>
> On 1/22/25 08:59, Dan Williams wrote:
> > In preparation for consolidating all DPA partition information into an
> > array of DPA metadata, introduce helpers that hide the layout of the
> > current data. I.e. make the eventual replacement of ->ram_res,
> > ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
> > no-op for code paths that consume that information, and reduce the noise
> > of follow-on patches.
> >
> > The end goal is to consolidate all DPA information in 'struct
> > cxl_dev_state', but for now the helpers just make it appear that all DPA
> > metadata is relative to @cxlds.
> >
> > Note that a follow-on patch also cleans up the temporary placeholders of
> > @ram_res, and @pmem_res in the qos_class manipulation code,
> > cxl_dpa_alloc(), and cxl_mem_create_range_info().
> >
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: Alejandro Lucero <alucerop@amd.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > drivers/cxl/core/cdat.c | 70 +++++++++++++++++++++++++-----------------
> > drivers/cxl/core/hdm.c | 26 ++++++++--------
> > drivers/cxl/core/mbox.c | 18 ++++++-----
> > drivers/cxl/core/memdev.c | 42 +++++++++++++------------
> > drivers/cxl/core/region.c | 10 ++++--
> > drivers/cxl/cxlmem.h | 58 ++++++++++++++++++++++++++++++-----
> > drivers/cxl/mem.c | 2 +
> > tools/testing/cxl/test/cxl.c | 25 ++++++++-------
> > 8 files changed, 159 insertions(+), 92 deletions(-)
> >
> > diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> > index 8153f8d83a16..b177a488e29b 100644
> > --- a/drivers/cxl/core/cdat.c
> > +++ b/drivers/cxl/core/cdat.c
> > @@ -258,29 +258,33 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
> > static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
> > struct xarray *dsmas_xa)
[..]
> > xa_for_each(dsmas_xa, index, dent) {
> > - if (resource_size(&cxlds->ram_res) &&
> > - range_contains(&ram_range, &dent->dpa_range))
> > - update_perf_entry(dev, dent, &mds->ram_perf);
> > - else if (resource_size(&cxlds->pmem_res) &&
> > - range_contains(&pmem_range, &dent->dpa_range))
> > - update_perf_entry(dev, dent, &mds->pmem_perf);
> > - else
> > - dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> > - &dent->dpa_range);
> > + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> > + const struct resource *res = partition[i];
> > + struct range range = {
> > + .start = res->start,
> > + .end = res->end,
> > + };
> > +
> > + if (range_contains(&range, &dent->dpa_range))
> > + update_perf_entry(dev, dent, perf[i]);
>
> This is checking if range contains dent->dpa_range.
>
> I think it is just the opposite.
This looks like an equivalent conversion to me, what am I missing?
> > @@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
> > .end = dpa_res->end,
> > };
> >
> > + if (!perf)
> > + return false;
> > +
>
> This change seems to be an improvement or hardening. Not against doing
> it, but seizing the change, the function can be simplified using the
> parameter without any local variable.
No, it's not pure hardening, it is actively avoiding NULL pointer
de-references introduced by the new scheme to not track empty
partitions.
I.e. the new to_{ram,pmem}_perf() helpers return NULL when the partition
is zero-sized. Previously this code path would do range checks on empty
partitions.
> > return range_contains(&perf->dpa_range, &dpa);
> > }
> >
[..]
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 548564c770c0..3502f1633ad2 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent,
> > int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
> > {
> > struct cxl_dev_state *cxlds = &mds->cxlds;
> > + struct resource *ram_res = to_ram_res(cxlds);
> > + struct resource *pmem_res = to_pmem_res(cxlds);
> > struct device *dev = cxlds->dev;
> > int rc;
> >
> > if (!cxlds->media_ready) {
> > cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> > - cxlds->ram_res = DEFINE_RES_MEM(0, 0);
> > - cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
> > + *ram_res = DEFINE_RES_MEM(0, 0);
> > + *pmem_res = DEFINE_RES_MEM(0, 0);
>
>
> This is a good example for the discussion about the patch hardening
> resource_contains. The initialization seems fine but IORESOURCE_UNSET
> not used.
To the contrary, I think these changes are an example of "updating
resource_contains() to check for zero is a band-aid that does not fix
the root problem".
> it could be argued the resource is set, but it is a zero-size resource
> leading to problems in current CXL code.
I challenge you to find problems in current CXL code after these
partition reworks.
Passing empty ranges to resource_contains() (and don't forget
range_contains()) is either a sign of a confused caller, or a caller
that expressly wants "contains" to return true on empty matches. With
these DPA metadata changes, a primary source (if not all) of the 0-sized
resources passed to resource_contains() in drivers/cxl/ is eliminated. The new
cxl_dpa_setup() arranges for any walk of cxlds->nr_partitions to never
find a zero-sized resource.
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-22 8:59 ` [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode Dan Williams
` (2 preceding siblings ...)
2025-01-23 17:20 ` Alejandro Lucero Palau
@ 2025-01-23 21:29 ` Dave Jiang
3 siblings, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2025-01-23 21:29 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Alejandro Lucero, Ira Weiny, Jonathan.Cameron
On 1/22/25 1:59 AM, Dan Williams wrote:
> Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> tracked in the partition, and no code paths have dependencies on the
> mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> can be cleaned up; specifically, the ambiguity over whether the
> operational mode implied anything about the partition order.
>
> Endpoint decoders simply reference their assigned partition where the
> operational mode can be retrieved as partition mode.
>
> With this in place PMEM can now be partition0 which happens today when
> the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
> DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM"
> assumption can now just iterate partitions and consult the partition
> mode after the fact.
>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alejandro Lucero <alucerop@amd.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/cdat.c | 21 ++-----
> drivers/cxl/core/core.h | 4 +
> drivers/cxl/core/hdm.c | 64 +++++++----------------
> drivers/cxl/core/memdev.c | 15 +----
> drivers/cxl/core/port.c | 20 +++++--
> drivers/cxl/core/region.c | 128 +++++++++++++++++++++++++--------------------
> drivers/cxl/cxl.h | 38 ++++---------
> drivers/cxl/cxlmem.h | 20 -------
> 8 files changed, 127 insertions(+), 183 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 5400a421ad30..ca7fb2b182ed 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -571,29 +571,18 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
> .end = dpa_res->end,
> };
>
> - if (!perf)
> - return false;
> -
> return range_contains(&perf->dpa_range, &dpa);
> }
>
> -static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode)
> +static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_dpa_perf *perf;
>
> - switch (mode) {
> - case CXL_DECODER_RAM:
> - perf = to_ram_perf(cxlds);
> - break;
> - case CXL_DECODER_PMEM:
> - perf = to_pmem_perf(cxlds);
> - break;
> - default:
> + if (cxled->part < 0)
> return ERR_PTR(-EINVAL);
> - }
> + perf = &cxlds->part[cxled->part].perf;
>
> if (!dpa_perf_contains(perf, cxled->dpa_res))
> return ERR_PTR(-EINVAL);
> @@ -654,7 +643,7 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
> if (cxlds->rcd)
> return -ENODEV;
>
> - perf = cxled_get_dpa_perf(cxled, cxlr->mode);
> + perf = cxled_get_dpa_perf(cxled);
> if (IS_ERR(perf))
> return PTR_ERR(perf);
>
> @@ -1060,7 +1049,7 @@ void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
>
> lockdep_assert_held(&cxl_dpa_rwsem);
>
> - perf = cxled_get_dpa_perf(cxled, cxlr->mode);
> + perf = cxled_get_dpa_perf(cxled);
> if (IS_ERR(perf))
> return;
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 800466f96a68..22dac79c5192 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -72,8 +72,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> resource_size_t length);
>
> struct dentry *cxl_debugfs_create_dir(const char *dir);
> -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode);
> +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> + enum cxl_partition_mode mode);
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 591aeb26c9e1..bb478e7b12f6 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -374,7 +374,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &port->dev;
> - enum cxl_decoder_mode mode;
> struct resource *res;
> int rc;
>
> @@ -421,18 +420,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> cxled->dpa_res = res;
> cxled->skip = skipped;
>
> - mode = CXL_DECODER_NONE;
> - for (int i = 0; cxlds->nr_partitions; i++)
> - if (resource_contains(&cxlds->part[i].res, res)) {
> - mode = cxl_part_mode(cxlds->part[i].mode);
> - break;
> - }
> -
> - if (mode == CXL_DECODER_NONE)
> - dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
> - port->id, cxled->cxld.id, res);
> - cxled->mode = mode;
> -
> port->hdm_end++;
> get_device(&cxled->cxld.dev);
> return 0;
> @@ -585,40 +572,36 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> return rc;
> }
>
> -int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> - enum cxl_decoder_mode mode)
> +int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> + enum cxl_partition_mode mode)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
> -
> - switch (mode) {
> - case CXL_DECODER_RAM:
> - case CXL_DECODER_PMEM:
> - break;
> - default:
> - dev_dbg(dev, "unsupported mode: %d\n", mode);
> - return -EINVAL;
> - }
> + int part;
>
> guard(rwsem_write)(&cxl_dpa_rwsem);
> if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> return -EBUSY;
>
> - /*
> - * Only allow modes that are supported by the current partition
> - * configuration
> - */
> - if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> - dev_dbg(dev, "no available pmem capacity\n");
> - return -ENXIO;
> + part = -1;
> + for (int i = 0; i < cxlds->nr_partitions; i++)
> + if (cxlds->part[i].mode == mode) {
> + part = i;
> + break;
> + }
> +
> + if (part < 0) {
> + dev_dbg(dev, "unsupported mode: %d\n", mode);
> + return -EINVAL;
> }
> - if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
> - dev_dbg(dev, "no available ram capacity\n");
> +
> + if (!resource_size(&cxlds->part[part].res)) {
> + dev_dbg(dev, "no available capacity for mode: %d\n", mode);
> return -ENXIO;
> }
>
> - cxled->mode = mode;
> + cxled->part = part;
> return 0;
> }
>
> @@ -647,16 +630,9 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> goto out;
> }
>
> - part = -1;
> - for (int i = 0; i < cxlds->nr_partitions; i++) {
> - if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
> - part = i;
> - break;
> - }
> - }
> -
> + part = cxled->part;
> if (part < 0) {
> - dev_dbg(dev, "partition %d not found\n", part);
> + dev_dbg(dev, "partition not set\n");
> rc = -EBUSY;
> goto out;
> }
> @@ -697,7 +673,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
>
> if (size > avail) {
> dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
> - cxl_decoder_mode_name(cxled->mode), &avail);
> + res->name, &avail);
> rc = -ENOSPC;
> goto out;
> }
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index be0eb57086e1..615cbd861f66 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -198,17 +198,8 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> int rc = 0;
>
> /* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
> - if (cxl_pmem_size(cxlds)) {
> - const struct resource *res = to_pmem_res(cxlds);
> -
> - offset = res->start;
> - length = resource_size(res);
> - rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> - if (rc)
> - return rc;
> - }
> - if (cxl_ram_size(cxlds)) {
> - const struct resource *res = to_ram_res(cxlds);
> + for (int i = 0; i < cxlds->nr_partitions; i++) {
> + const struct resource *res = &cxlds->part[i].res;
>
> offset = res->start;
> length = resource_size(res);
> @@ -217,7 +208,7 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> * Invalid Physical Address is not an error for
> * volatile addresses. Device support is optional.
> */
> - if (rc == -EFAULT)
> + if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
> rc = 0;
> }
> return rc;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 78a5c2c25982..f5f2701c8771 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -194,25 +194,35 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + /* without @cxl_dpa_rwsem, make sure @part is not reloaded */
> + int part = READ_ONCE(cxled->part);
> + const char *desc;
> +
> + if (part < 0)
> + desc = "none";
> + else
> + desc = cxlds->part[part].res.name;
>
> - return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
> + return sysfs_emit(buf, "%s\n", desc);
> }
>
> static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> - enum cxl_decoder_mode mode;
> + enum cxl_partition_mode mode;
> ssize_t rc;
>
> if (sysfs_streq(buf, "pmem"))
> - mode = CXL_DECODER_PMEM;
> + mode = CXL_PARTMODE_PMEM;
> else if (sysfs_streq(buf, "ram"))
> - mode = CXL_DECODER_RAM;
> + mode = CXL_PARTMODE_RAM;
> else
> return -EINVAL;
>
> - rc = cxl_dpa_set_mode(cxled, mode);
> + rc = cxl_dpa_set_part(cxled, mode);
> if (rc)
> return rc;
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 9f0f6fdbc841..83b985d2ba76 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -144,7 +144,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
> rc = down_read_interruptible(&cxl_region_rwsem);
> if (rc)
> return rc;
> - if (cxlr->mode != CXL_DECODER_PMEM)
> + if (cxlr->mode != CXL_PARTMODE_PMEM)
> rc = sysfs_emit(buf, "\n");
> else
> rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
> @@ -441,7 +441,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
> * Support tooling that expects to find a 'uuid' attribute for all
> * regions regardless of mode.
> */
> - if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
> + if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_PARTMODE_PMEM)
> return 0444;
> return a->mode;
> }
> @@ -603,8 +603,16 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> struct cxl_region *cxlr = to_cxl_region(dev);
> + const char *desc;
>
> - return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
> + if (cxlr->mode == CXL_PARTMODE_RAM)
> + desc = "ram";
> + else if (cxlr->mode == CXL_PARTMODE_PMEM)
> + desc = "pmem";
> + else
> + desc = "";
> +
> + return sysfs_emit(buf, "%s\n", desc);
> }
> static DEVICE_ATTR_RO(mode);
>
> @@ -630,7 +638,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
>
> /* ways, granularity and uuid (if PMEM) need to be set before HPA */
> if (!p->interleave_ways || !p->interleave_granularity ||
> - (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid)))
> + (cxlr->mode == CXL_PARTMODE_PMEM && uuid_is_null(&p->uuid)))
> return -ENXIO;
>
> div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder);
> @@ -1875,6 +1883,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> {
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_region_params *p = &cxlr->params;
> struct cxl_port *ep_port, *root_port;
> struct cxl_dport *dport;
> @@ -1889,17 +1898,17 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return rc;
> }
>
> - if (cxled->mode != cxlr->mode) {
> - dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
> - dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
> - return -EINVAL;
> - }
> -
> - if (cxled->mode == CXL_DECODER_DEAD) {
> + if (cxled->part < 0) {
> dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
> return -ENODEV;
> }
>
> + if (cxlds->part[cxled->part].mode != cxlr->mode) {
> + dev_dbg(&cxlr->dev, "%s region mode: %d mismatch\n",
> + dev_name(&cxled->cxld.dev), cxlr->mode);
> + return -EINVAL;
> + }
> +
> /* all full of members, or interleave config not established? */
> if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> dev_dbg(&cxlr->dev, "region already active\n");
> @@ -2102,7 +2111,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
> {
> down_write(&cxl_region_rwsem);
> - cxled->mode = CXL_DECODER_DEAD;
> + cxled->part = -1;
> cxl_region_detach(cxled);
> up_write(&cxl_region_rwsem);
> }
> @@ -2458,7 +2467,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb,
> */
> static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> int id,
> - enum cxl_decoder_mode mode,
> + enum cxl_partition_mode mode,
> enum cxl_decoder_type type)
> {
> struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> @@ -2512,13 +2521,13 @@ static ssize_t create_ram_region_show(struct device *dev,
> }
>
> static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> - enum cxl_decoder_mode mode, int id)
> + enum cxl_partition_mode mode, int id)
> {
> int rc;
>
> switch (mode) {
> - case CXL_DECODER_RAM:
> - case CXL_DECODER_PMEM:
> + case CXL_PARTMODE_RAM:
> + case CXL_PARTMODE_PMEM:
> break;
> default:
> dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
> @@ -2538,7 +2547,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> }
>
> static ssize_t create_region_store(struct device *dev, const char *buf,
> - size_t len, enum cxl_decoder_mode mode)
> + size_t len, enum cxl_partition_mode mode)
> {
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> struct cxl_region *cxlr;
> @@ -2559,7 +2568,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> {
> - return create_region_store(dev, buf, len, CXL_DECODER_PMEM);
> + return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM);
> }
> DEVICE_ATTR_RW(create_pmem_region);
>
> @@ -2567,7 +2576,7 @@ static ssize_t create_ram_region_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> {
> - return create_region_store(dev, buf, len, CXL_DECODER_RAM);
> + return create_region_store(dev, buf, len, CXL_PARTMODE_RAM);
> }
> DEVICE_ATTR_RW(create_ram_region);
>
> @@ -2665,7 +2674,7 @@ EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, "CXL");
>
> struct cxl_poison_context {
> struct cxl_port *port;
> - enum cxl_decoder_mode mode;
> + int part;
> u64 offset;
> };
>
> @@ -2673,49 +2682,45 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
> struct cxl_poison_context *ctx)
> {
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + const struct resource *res;
> + struct resource *p, *last;
> u64 offset, length;
> int rc = 0;
>
> + if (ctx->part < 0)
> + return 0;
> +
> /*
> - * Collect poison for the remaining unmapped resources
> - * after poison is collected by committed endpoints.
> - *
> - * Knowing that PMEM must always follow RAM, get poison
> - * for unmapped resources based on the last decoder's mode:
> - * ram: scan remains of ram range, then any pmem range
> - * pmem: scan remains of pmem range
> + * Collect poison for the remaining unmapped resources after
> + * poison is collected by committed endpoints decoders.
> */
> -
> - if (ctx->mode == CXL_DECODER_RAM) {
> - offset = ctx->offset;
> - length = cxl_ram_size(cxlds) - offset;
> + for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
> + res = &cxlds->part[i].res;
> + for (p = res->child, last = NULL; p; p = p->sibling)
> + last = p;
> + if (last)
> + offset = last->end + 1;
> + else
> + offset = res->start;
> + length = res->end - offset + 1;
> + if (!length)
> + break;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> - if (rc == -EFAULT)
> - rc = 0;
> + if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
> + continue;
> if (rc)
> - return rc;
> - }
> - if (ctx->mode == CXL_DECODER_PMEM) {
> - offset = ctx->offset;
> - length = resource_size(&cxlds->dpa_res) - offset;
> - if (!length)
> - return 0;
> - } else if (cxl_pmem_size(cxlds)) {
> - const struct resource *res = to_pmem_res(cxlds);
> -
> - offset = res->start;
> - length = resource_size(res);
> - } else {
> - return 0;
> + break;
> }
>
> - return cxl_mem_get_poison(cxlmd, offset, length, NULL);
> + return rc;
> }
>
> static int poison_by_decoder(struct device *dev, void *arg)
> {
> struct cxl_poison_context *ctx = arg;
> struct cxl_endpoint_decoder *cxled;
> + enum cxl_partition_mode mode;
> + struct cxl_dev_state *cxlds;
> struct cxl_memdev *cxlmd;
> u64 offset, length;
> int rc = 0;
> @@ -2728,11 +2733,17 @@ static int poison_by_decoder(struct device *dev, void *arg)
> return rc;
>
> cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + if (cxled->part < 0)
> + mode = CXL_PARTMODE_NONE;
> + else
> + mode = cxlds->part[cxled->part].mode;
> +
> if (cxled->skip) {
> offset = cxled->dpa_res->start - cxled->skip;
> length = cxled->skip;
> rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
> + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
> rc = 0;
> if (rc)
> return rc;
> @@ -2741,7 +2752,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
> offset = cxled->dpa_res->start;
> length = cxled->dpa_res->end - offset + 1;
> rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
> - if (rc == -EFAULT && cxled->mode == CXL_DECODER_RAM)
> + if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
> rc = 0;
> if (rc)
> return rc;
> @@ -2749,7 +2760,7 @@ static int poison_by_decoder(struct device *dev, void *arg)
> /* Iterate until commit_end is reached */
> if (cxled->cxld.id == ctx->port->commit_end) {
> ctx->offset = cxled->dpa_res->end + 1;
> - ctx->mode = cxled->mode;
> + ctx->part = cxled->part;
> return 1;
> }
>
> @@ -2762,7 +2773,8 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> int rc = 0;
>
> ctx = (struct cxl_poison_context) {
> - .port = port
> + .port = port,
> + .part = -1,
> };
>
> rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
> @@ -3206,14 +3218,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_port *port = cxlrd_to_port(cxlrd);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct range *hpa = &cxled->cxld.hpa_range;
> + int rc, part = READ_ONCE(cxled->part);
> struct cxl_region_params *p;
> struct cxl_region *cxlr;
> struct resource *res;
> - int rc;
> +
> + if (part < 0)
> + return ERR_PTR(-EBUSY);
>
> do {
> - cxlr = __create_region(cxlrd, cxled->mode,
> + cxlr = __create_region(cxlrd, cxlds->part[part].mode,
> atomic_read(&cxlrd->region_id));
> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>
> @@ -3416,9 +3432,9 @@ static int cxl_region_probe(struct device *dev)
> return rc;
>
> switch (cxlr->mode) {
> - case CXL_DECODER_PMEM:
> + case CXL_PARTMODE_PMEM:
> return devm_cxl_add_pmem_region(cxlr);
> - case CXL_DECODER_RAM:
> + case CXL_PARTMODE_RAM:
> /*
> * The region can not be manged by CXL if any portion of
> * it is already online as 'System RAM'
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 4d0550367042..cb6f0b761b24 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -371,30 +371,6 @@ struct cxl_decoder {
> void (*reset)(struct cxl_decoder *cxld);
> };
>
> -/*
> - * CXL_DECODER_DEAD prevents endpoints from being reattached to regions
> - * while cxld_unregister() is running
> - */
> -enum cxl_decoder_mode {
> - CXL_DECODER_NONE,
> - CXL_DECODER_RAM,
> - CXL_DECODER_PMEM,
> - CXL_DECODER_DEAD,
> -};
> -
> -static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
> -{
> - static const char * const names[] = {
> - [CXL_DECODER_NONE] = "none",
> - [CXL_DECODER_RAM] = "ram",
> - [CXL_DECODER_PMEM] = "pmem",
> - };
> -
> - if (mode >= CXL_DECODER_NONE && mode < CXL_DECODER_DEAD)
> - return names[mode];
> - return "mixed";
> -}
> -
> /*
> * Track whether this decoder is reserved for region autodiscovery, or
> * free for userspace provisioning.
> @@ -409,16 +385,16 @@ enum cxl_decoder_state {
> * @cxld: base cxl_decoder_object
> * @dpa_res: actively claimed DPA span of this decoder
> * @skip: offset into @dpa_res where @cxld.hpa_range maps
> - * @mode: which memory type / access-mode-partition this decoder targets
> * @state: autodiscovery state
> + * @part: partition index this decoder maps
> * @pos: interleave position in @cxld.region
> */
> struct cxl_endpoint_decoder {
> struct cxl_decoder cxld;
> struct resource *dpa_res;
> resource_size_t skip;
> - enum cxl_decoder_mode mode;
> enum cxl_decoder_state state;
> + int part;
> int pos;
> };
>
> @@ -503,6 +479,12 @@ struct cxl_region_params {
> int nr_targets;
> };
>
> +enum cxl_partition_mode {
> + CXL_PARTMODE_NONE,
> + CXL_PARTMODE_RAM,
> + CXL_PARTMODE_PMEM,
> +};
> +
> /*
> * Indicate whether this region has been assembled by autodetection or
> * userspace assembly. Prevent endpoint decoders outside of automatic
> @@ -522,7 +504,7 @@ struct cxl_region_params {
> * struct cxl_region - CXL region
> * @dev: This region's device
> * @id: This region's id. Id is globally unique across all regions
> - * @mode: Endpoint decoder allocation / access mode
> + * @mode: Operational mode of the mapped capacity
> * @type: Endpoint decoder target type
> * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
> * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
> @@ -535,7 +517,7 @@ struct cxl_region_params {
> struct cxl_region {
> struct device dev;
> int id;
> - enum cxl_decoder_mode mode;
> + enum cxl_partition_mode mode;
> enum cxl_decoder_type type;
> struct cxl_nvdimm_bridge *cxl_nvb;
> struct cxl_pmem_region *cxlr_pmem;
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index bad99456e901..f218d43dec9f 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -97,12 +97,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped);
>
> -enum cxl_partition_mode {
> - CXL_PARTMODE_NONE,
> - CXL_PARTMODE_RAM,
> - CXL_PARTMODE_PMEM,
> -};
> -
> #define CXL_NR_PARTITIONS_MAX 2
>
> struct cxl_dpa_info {
> @@ -530,20 +524,6 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> return resource_size(res);
> }
>
> -/*
> - * Translate the operational mode of memory capacity with the
> - * operational mode of a decoder
> - * TODO: kill 'enum cxl_decoder_mode' to obviate this helper
> - */
> -static inline enum cxl_decoder_mode cxl_part_mode(enum cxl_partition_mode mode)
> -{
> - if (mode == CXL_PARTMODE_RAM)
> - return CXL_DECODER_RAM;
> - if (mode == CXL_PARTMODE_PMEM)
> - return CXL_DECODER_PMEM;
> - return CXL_DECODER_NONE;
> -}
> -
> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
> {
> return dev_get_drvdata(cxl_mbox->host);
>
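[Editor's note] The cxl_get_poison_unmapped() rework in the hunk above walks partitions from the last committed decoder onward, finds the end of the last allocation in each partition, and scans the remaining free span. That walk can be sanity-checked with a small userspace model; the struct and function names here (`part_span`, `unmapped_span`) are illustrative only, not kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of one DPA partition: its span plus the inclusive
 * end of the last allocation (decoder mapping) inside it, or 0 if the
 * partition has no allocations. */
struct part_span {
	unsigned long long start;
	unsigned long long end;            /* inclusive */
	unsigned long long last_alloc_end; /* 0 if no child allocations */
};

/* Mirror the hunk above: the unmapped span of a partition begins just
 * past the last child allocation (or at the partition start when there
 * are no allocations) and runs to the partition end. Returns the
 * length of that span and its offset via @offset. */
static unsigned long long unmapped_span(const struct part_span *p,
					unsigned long long *offset)
{
	if (p->last_alloc_end)
		*offset = p->last_alloc_end + 1;
	else
		*offset = p->start;
	return p->end - *offset + 1;
}
```

Note that a fully mapped partition (last_alloc_end == end) yields a zero length, which is what the `if (!length) break;` in the hunk relies on.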
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-22 17:42 ` Ira Weiny
2025-01-22 22:58 ` Dan Williams
@ 2025-01-23 21:30 ` Dave Jiang
2025-01-24 22:22 ` Ira Weiny
1 sibling, 1 reply; 48+ messages in thread
From: Dave Jiang @ 2025-01-23 21:30 UTC (permalink / raw)
To: Ira Weiny, Dan Williams, linux-cxl; +Cc: Alejandro Lucero, Jonathan.Cameron
On 1/22/25 10:42 AM, Ira Weiny wrote:
> Dan Williams wrote:
>> Now that the operational mode of DPA capacity (ram vs pmem... etc) is
>> tracked in the partition, and no code paths have dependencies on the
>> mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
>> can be cleaned up, specifically this ambiguity on whether the operation
>> mode implied anything about the partition order.
>>
>> Endpoint decoders simply reference their assigned partition where the
>> operational mode can be retrieved as partition mode.
>
> You really seem to be defining a region mode not a partition mode.
>
> I did a lot of work to resolve this for DCD interleave in the future.
> This included the introduction of the DC region mode. I __think__ that
> what you have here will work fine.
>
> However, from a user ABI standpoint I'm going to have to play games with

> having the DCD partitions in a well defined sub-array such that the user
Are you talking about, instead of having additional elements in the partition array, DCD having its own array?
DJ
> can specify which DCD partition they want to use. So the user concept of
> decoder mode does not really go away.
>
> In the interest of urgency I'm going to give my tag on this. But I would
> have preferred this called region mode. But I can see why partition mode
> makes sense too.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
>
> [snip]
* Re: [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
2025-01-23 16:41 ` Jonathan Cameron
@ 2025-01-23 21:34 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-23 21:34 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
Jonathan Cameron wrote:
> On Wed, 22 Jan 2025 00:59:33 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > cxl_dpa_alloc() is a hard coded nest of assumptions around PMEM
> > allocations being distinct from RAM allocations in specific ways when in
> > practice the allocation rules are only relative to DPA partition index.
> >
> > The rules for cxl_dpa_alloc() are:
> >
> > - allocations can only come from 1 partition
> >
> > - if allocating at partition-index-N, all free space in partitions less
> > than partition-index-N must be skipped over
> >
> > Use the new 'struct cxl_dpa_partition' array to support allocation with
> > an arbitrary number of DPA partitions on the device.
> >
> > A follow-on patch can go further to cleanup 'enum cxl_decoder_mode'
> > concept and supersede it with looking up the memory properties from
> > partition metadata. Until then cxl_part_mode() temporarily bridges code
> > that looks up partitions by @cxled->mode.
> >
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: Alejandro Lucero <alucerop@amd.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> A few possible simplifications below + a trivial debug message printing
> a useful value comment.
>
> Jonathan
>
> > ---
> > drivers/cxl/core/hdm.c | 215 +++++++++++++++++++++++++++++++++++-------------
> > drivers/cxl/cxlmem.h | 14 +++
> > 2 files changed, 172 insertions(+), 57 deletions(-)
> >
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 3f8a54ca4624..591aeb26c9e1 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -223,6 +223,31 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds)
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL");
> >
> > +/* See request_skip() kernel-doc */
> > +static void release_skip(struct cxl_dev_state *cxlds,
> > + const resource_size_t skip_base,
> > + const resource_size_t skip_len)
> > +{
> > + resource_size_t skip_start = skip_base, skip_rem = skip_len;
> > +
> > + for (int i = 0; i < cxlds->nr_partitions; i++) {
> > + const struct resource *part_res = &cxlds->part[i].res;
> > + resource_size_t skip_end, skip_size;
> > +
> > + if (skip_start < part_res->start || skip_start > part_res->end)
> > + continue;
> > +
> > + skip_end = min(part_res->end, skip_start + skip_rem - 1);
> > + skip_size = skip_end - skip_start + 1;
> > + __release_region(&cxlds->dpa_res, skip_start, skip_size);
> > + skip_start += skip_size;
> > + skip_rem -= skip_size;
> > +
> > + if (!skip_rem)
> > + break;
> > + }
> > +}
>
> Could ignore all explicit ordering constraints and have perhaps simpler
> (Even simpler if there is an overlap helper we can use)
> Assumption is we want to blow away anything in the skip range, whatever
> partition it is in.
>
> for (int i = 0; i < cxlds->nr_paritions; i++) {
> const struct resource *part_res = &cxlds->part[i].res;
> resource_size_t toremove_start, toremove_end;
>
> toremove_start = max(skip_start, part_res->start);
> toremove_end = min(skip_end, part_res->end);
> if (toremove_end > toremove_start) {
> resource_size_t rem_size = toremove_end - toremove_start + 1;
> __release_region(&cxlds->dpa_res, toremove_start, rem_size);
> }
>
> }
> Can track skip_rem or not bother with that optimization.
I like it, I'll switch to this.
>
> Mind you your code is fine so I don't really mind.
> I think we can build similar for request_skip based on ordering assumption, though
> there we do need to keep track of how far we got so as to unwind only
> that bit.
Will give it a spin and see how it looks.
[..]
> > @@ -553,47 +647,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> > goto out;
> > }
> >
> > - for (p = ram_res->child, last = NULL; p; p = p->sibling)
> > - last = p;
> > - if (last)
> > - free_ram_start = last->end + 1;
> > - else
> > - free_ram_start = ram_res->start;
> > + part = -1;
> > + for (int i = 0; i < cxlds->nr_partitions; i++) {
> > + if (cxled->mode == cxl_part_mode(cxlds->part[i].mode)) {
> > + part = i;
> > + break;
> > + }
> > + }
> >
> > - for (p = pmem_res->child, last = NULL; p; p = p->sibling)
> > + if (part < 0) {
> > + dev_dbg(dev, "partition %d not found\n", part);
>
> how is part useful to print here? it's -1
Yeah, another thinko, likely because I was already thinking about how
cxled->part should already be set before entering this function in the
next patch.
For this one, I'll just delete the message because in the next patch the
loop is gone and only the following remains:
part = cxled->part;
if (part < 0) {
dev_dbg(dev, "partition not set\n");
rc = -EBUSY;
goto out;
}
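[Editor's note] Jonathan's overlap-based simplification quoted earlier in this message can likewise be checked with a userspace model. This is an illustrative sketch, not the kernel's `__release_region()` machinery; the names `span`/`release_skip` are invented for the example:

```c
#include <assert.h>

struct span {
	unsigned long long start, end; /* inclusive */
};

/* For each partition, release the portion that overlaps the skip
 * range, regardless of partition order -- the same max/min clamp
 * Jonathan suggests. Records released spans in @released and returns
 * how many there were. */
static int release_skip(const struct span *parts, int nr_parts,
			struct span skip, struct span *released)
{
	int nr = 0;

	for (int i = 0; i < nr_parts; i++) {
		unsigned long long lo = parts[i].start > skip.start ?
					parts[i].start : skip.start;
		unsigned long long hi = parts[i].end < skip.end ?
					parts[i].end : skip.end;

		if (hi >= lo) { /* overlap exists */
			released[nr].start = lo;
			released[nr].end = hi;
			nr++;
		}
	}
	return nr;
}
```

One detail worth noting: the clamp here tests `hi >= lo`, so a single-byte overlap is still released; the strict `toremove_end > toremove_start` in the quoted sketch would skip that edge case.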
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-23 16:51 ` Jonathan Cameron
@ 2025-01-23 21:50 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-23 21:50 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: linux-cxl, Dave Jiang, Alejandro Lucero, Ira Weiny
Jonathan Cameron wrote:
> On Wed, 22 Jan 2025 00:59:38 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> > tracked in the partition, and no code paths have dependencies on the
> > mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> > can be cleaned up, specifically this ambiguity on whether the operation
> > mode implied anything about the partition order.
> >
> > Endpoint decoders simply reference their assigned partition where the
> > operational mode can be retrieved as partition mode.
> >
> > With this in place PMEM can now be partition0 which happens today when
> > the RAM capacity size is zero. Dynamic RAM can appear above PMEM when
> > DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM"
> > assumption can now just iterate partitions and consult the partition
> > mode after the fact.
> >
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: Alejandro Lucero <alucerop@amd.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> A few things inline.
>
> Jonathan
>
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 591aeb26c9e1..bb478e7b12f6 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
[..]
> > - /*
> > - * Only allow modes that are supported by the current partition
> > - * configuration
> > - */
> > - if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
> > - dev_dbg(dev, "no available pmem capacity\n");
> > - return -ENXIO;
> > + part = -1;
> > + for (int i = 0; i < cxlds->nr_partitions; i++)
>
> Similar to previous comment can use early loop exit and
> part as the loop iteration variable short code and no magic i
> appears.
Yeah, might as well.
[..]
> > @@ -2728,11 +2733,17 @@ static int poison_by_decoder(struct device *dev, void *arg)
> > return rc;
> >
> > cxlmd = cxled_to_memdev(cxled);
> > + cxlds = cxlmd->cxlds;
> > + if (cxled->part < 0)
> > + mode = CXL_PARTMODE_NONE;
> Ah. Here is our mysterious none. Maybe add a comment on what
> this means in practice. Race condition, actual hole, crazy decoder
> someone (e.g bios) setup?
I just fixed it up like this:
@@ -2724,15 +2729,18 @@ static int poison_by_decoder(struct device *dev, void *arg)
return rc;
cxled = to_cxl_endpoint_decoder(dev);
- if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
+ if (!cxled->dpa_res)
return rc;
cxlmd = cxled_to_memdev(cxled);
+ cxlds = cxlmd->cxlds;
+ mode = cxlds->part[cxled->part].mode;
+
...because there is no such thing as a decoder with allocated capacity
but no partition set, and an endpoint decoder will never be zero sized.
I also thought about adding a lockdep_assert_held(&cxl_dpa_rwsem)
because this function is a few calls away from that context, but will
just leave that alone for now.
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-23 17:00 ` Alejandro Lucero Palau
@ 2025-01-23 22:43 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-23 22:43 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, linux-cxl
Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
Alejandro Lucero Palau wrote:
> <snip>
> >
> > -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
> > +
> > +/* Static RAM is only expected at partition 0. */
>
>
> This may be seen silly but Static Ram as SRAM has connotations which
> could confuse people.
>
> Maybe better to use Static partition for RAM.
Without the DCD code upstream even "Static partition" is "confusing",
"what's a non-static partition?". At some point people need to be
trusted to have context and share in the responsibility to resolve their
own confusion.
> It is far more concerning to me though Ira's comment about how can we be
> sure there is going to be only one RAM or one PMEM partition.
Simply put, we can't.
Use cases evolve in unpredictable ways and folks will show up to address
those new use case when they arrive. I do not lose sleep over it due to
the open ecosystem, and in this case, economics:
- open ecosystem means someone can see us chatting about theoretical
devices that complicate the implementation and either chime in and say
"we have such a use case, here are the patches", or "it looks like
mainline does not care about that particular degree of spec freedom, do
we really want to go there with our device?".
- on economics: It is hard enough to get developers to care about NUMA
effects let alone heterogenous memory performance within a single
device. It would mean building a complicated device with heterogeneous
media, or mixed memory controllers. The Type-2 series confirmed that
single range RAM accelerator support is all that is needed for now.
My bar is "as simple as possible, but no simpler" especially when it
comes to user ABI extensions, because ABIs are forever.
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-23 17:17 ` Alejandro Lucero Palau
@ 2025-01-23 22:48 ` Dan Williams
2025-01-24 10:29 ` Alejandro Lucero Palau
0 siblings, 1 reply; 48+ messages in thread
From: Dan Williams @ 2025-01-23 22:48 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, linux-cxl
Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
Alejandro Lucero Palau wrote:
>
> On 1/22/25 08:59, Dan Williams wrote:
> > The pending efforts to add CXL Accelerator (type-2) device [1], and
> > Dynamic Capacity (DCD) support [2], tripped on the
> > no-longer-fit-for-purpose design in the CXL subsystem for tracking
> > device-physical-address (DPA) metadata. Trip hazards include:
> >
> > - CXL Memory Devices need to consider a PMEM partition, but Accelerator
> > devices with CXL.mem likely do not in the common case.
> >
> > - CXL Memory Devices enumerate DPA through Memory Device mailbox
> > commands like Partition Info, Accelerators devices do not.
>
>
> Forgot to mention this one. Mailbox is optional for accelerators.
>
>
> So maybe "but this way is optional for accelerators, implying
> accel-driver defined/hardcoded enumeration".
That's what "Accelerators do not" means. I.e. that setting up the DPA
layout needs to be refactored into infrastructure that can be shared
with drivers that will not use a mailbox for this purpose.
[..scrolls to end to see if any additional comments..]
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-23 21:04 ` Dan Williams
@ 2025-01-24 10:15 ` Alejandro Lucero Palau
2025-01-25 0:45 ` Dan Williams
0 siblings, 1 reply; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-24 10:15 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
On 1/23/25 21:04, Dan Williams wrote:
<snip>
> xa_for_each(dsmas_xa, index, dent) {
>>> - if (resource_size(&cxlds->ram_res) &&
>>> - range_contains(&ram_range, &dent->dpa_range))
>>> - update_perf_entry(dev, dent, &mds->ram_perf);
>>> - else if (resource_size(&cxlds->pmem_res) &&
>>> - range_contains(&pmem_range, &dent->dpa_range))
>>> - update_perf_entry(dev, dent, &mds->pmem_perf);
>>> - else
>>> - dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
>>> - &dent->dpa_range);
>>> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
>>> + const struct resource *res = partition[i];
>>> + struct range range = {
>>> + .start = res->start,
>>> + .end = res->end,
>>> + };
>>> +
>>> + if (range_contains(&range, &dent->dpa_range))
>>> + update_perf_entry(dev, dent, perf[i]);
>> This is checking if range contains dent->dpa_range.
>>
>> I think it is just the opposite.
> This looks like an equivalent conversion to me, what am I missing?
I guess I was confused by dpa_range in dent and assumed it was the
full device DPA. But now, after studying the code in more detail, I see
it is right ... but confusing :-)
If I'm not wrong again, that range_contains should really be a check for
the ranges being equal, which range_contains also covers ...
It would help to have a comment explaining what the function is doing
with the information coming from two different sources: the device
CDAT, with potentially per-partition performance numbers, and the
mailbox GET_PARTITION_INFO command.
Anyway, apologies for the noise.
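[Editor's note] For reference in the exchange above: the kernel's range_contains(r1, r2) helper is true when r1 fully covers r2, so two equal ranges do satisfy it, which is the behavior being discussed. A minimal restatement of the predicate (userspace sketch, not the kernel header itself):

```c
#include <assert.h>
#include <stdbool.h>

struct range {
	unsigned long long start, end; /* inclusive */
};

/* Same predicate as the kernel helper: does r1 fully cover r2?
 * Equality of the two ranges is the degenerate "contains" case. */
static bool range_contains(const struct range *r1, const struct range *r2)
{
	return r1->start <= r2->start && r1->end >= r2->end;
}
```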
<snip>
>>> @@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
>>> .end = dpa_res->end,
>>> };
>>>
>>> + if (!perf)
>>> + return false;
>>> +
>> This change seems to be an improvement or hardening. Not against doing
>> it, but seizing the change, the function can be simplified using the
>> parameter without any local variable.
> No, it's not pure hardening, it is actively avoiding NULL pointer
> de-references introduced by the new scheme to not track empty
> partitions.
>
> I.e the new to_{ram,pmem}_perf() helpers return NULL when the partition
> is zero-sized. Previously this code path would do range checks on empty
> partitions.
That's true.
But I do not see the reason for using that local variable, which I know
is not related to your change.
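[Editor's note] The NULL-guard pattern discussed just above can be condensed into a tiny model. The names here (`to_part_perf`, `perf_contains`) are illustrative stand-ins for the to_{ram,pmem}_perf() helpers and dpa_perf_contains() in the patch, under the assumption that a zero-sized partition yields NULL:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dpa_perf {
	unsigned long long start, end; /* span the perf data covers */
};

/* Model of the new helpers: a zero-sized partition has no perf data,
 * so the lookup returns NULL instead of a dummy entry. */
static const struct dpa_perf *to_part_perf(const struct dpa_perf *perf,
					   unsigned long long size)
{
	return size ? perf : NULL;
}

/* Mirror of the guarded dpa_perf_contains(): callers can pass the
 * helper result directly without first checking the partition size. */
static bool perf_contains(const struct dpa_perf *perf,
			  unsigned long long start, unsigned long long end)
{
	if (!perf)
		return false;
	return perf->start <= start && perf->end >= end;
}
```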
<snip>
>>>
>>> if (!cxlds->media_ready) {
>>> cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
>>> - cxlds->ram_res = DEFINE_RES_MEM(0, 0);
>>> - cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
>>> + *ram_res = DEFINE_RES_MEM(0, 0);
>>> + *pmem_res = DEFINE_RES_MEM(0, 0);
>>
>> This is a good example for the discussion about the patch hardening
>> resource_contains. The initialization seems fine but IORESOURCE_UNSET
>> not used.
> To the contrary, I think these changes are an example of "updating
> resource_contains() to check for zero is a band-aid that does not fix
> the root problem".
>
I agree, but it does not preclude some resource initialization, like the
one you do here, lacking the IORESOURCE_UNSET flag, with some other code
then triggering the same problem I faced, so the hardening of
resource_contains is still useful, IMO. But I'll tell Bjorn Helgaas the
patch is not required for CXL anymore, leaving it to him to decide
whether to apply it, given the concerns you expressed there.
>> it could be argued the resource is set, but it is a zero-size resource
>> leading to problems in current CXL code.
> I challenge you to find problems in current CXL code after these
> partition reworks.
That's an unfair challenge ... you removed resource_contains ... :-)
Let me think I got some credit for that after my heads-up ;-)
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
2025-01-23 22:48 ` Dan Williams
@ 2025-01-24 10:29 ` Alejandro Lucero Palau
0 siblings, 0 replies; 48+ messages in thread
From: Alejandro Lucero Palau @ 2025-01-24 10:29 UTC (permalink / raw)
To: Dan Williams, linux-cxl; +Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
On 1/23/25 22:48, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
>> On 1/22/25 08:59, Dan Williams wrote:
>>> The pending efforts to add CXL Accelerator (type-2) device [1], and
>>> Dynamic Capacity (DCD) support [2], tripped on the
>>> no-longer-fit-for-purpose design in the CXL subsystem for tracking
>>> device-physical-address (DPA) metadata. Trip hazards include:
>>>
>>> - CXL Memory Devices need to consider a PMEM partition, but Accelerator
>>> devices with CXL.mem likely do not in the common case.
>>>
>>> - CXL Memory Devices enumerate DPA through Memory Device mailbox
>>> commands like Partition Info; Accelerator devices do not.
>>
>> Forgot to mention this one. Mailbox is optional for accelerators.
>>
>>
>> So maybe "but this way is optional for accelerators, implying
>> accel-driver defined/hardcoded enumeration".
> That's what "Accelerators do not" means. I.e. that setting up the DPA
> layout needs to be refactored into infrastructure that can be shared
> with drivers that will not use a mailbox for this purpose.
I do not want to spend time on this little thing, but I understand the
line above as an absolute for accelerators.
I agree the shared infrastructure needs to support both cases.
>
> [..scrolls to end to see if any additional comments..]
You are not the only one suffering this ... I'll do my best to avoid it
in the future.
* Re: [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode
2025-01-23 21:30 ` Dave Jiang
@ 2025-01-24 22:22 ` Ira Weiny
0 siblings, 0 replies; 48+ messages in thread
From: Ira Weiny @ 2025-01-24 22:22 UTC (permalink / raw)
To: Dave Jiang, Ira Weiny, Dan Williams, linux-cxl
Cc: Alejandro Lucero, Jonathan.Cameron
Dave Jiang wrote:
>
>
> On 1/22/25 10:42 AM, Ira Weiny wrote:
> > Dan Williams wrote:
> >> Now that the operational mode of DPA capacity (ram vs pmem... etc) is
> >> tracked in the partition, and no code paths have dependencies on the
> >> mode implying the partition index, the ambiguous 'enum cxl_decoder_mode'
> >> can be cleaned up, specifically the ambiguity of whether the operational
> >> mode implied anything about the partition order.
> >>
> >> Endpoint decoders simply reference their assigned partition where the
> >> operational mode can be retrieved as partition mode.
> >
> > You really seem to be defining a region mode not a partition mode.
> >
> > I did a lot of work to resolve this for DCD interleave in the future.
> > This included the introduction of the DC region mode. I __think__ that
> > what you have here will work fine.
> >
> > However, from a user ABI standpoint I'm going to have to play games with
> > having the DCD partitions in a well defined sub-array such that the user
>
> Are you talking about, instead of having additional elements in the partition array, DCD having its own array?
Not its own, but a well-known sub-array.
Dan does not like that though. He wants the partitions to be anonymous to
the user and based only on their attributes. So I'll defer to that and
let the next person deal with the fallout if anyone ever makes an actual
device.
Ira
* Re: [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
2025-01-24 10:15 ` Alejandro Lucero Palau
@ 2025-01-25 0:45 ` Dan Williams
0 siblings, 0 replies; 48+ messages in thread
From: Dan Williams @ 2025-01-25 0:45 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, linux-cxl
Cc: Dave Jiang, Ira Weiny, Jonathan.Cameron
Alejandro Lucero Palau wrote:
>
> On 1/23/25 21:04, Dan Williams wrote:
>
> <snip>
>
> > xa_for_each(dsmas_xa, index, dent) {
> >>> - if (resource_size(&cxlds->ram_res) &&
> >>> - range_contains(&ram_range, &dent->dpa_range))
> >>> - update_perf_entry(dev, dent, &mds->ram_perf);
> >>> - else if (resource_size(&cxlds->pmem_res) &&
> >>> - range_contains(&pmem_range, &dent->dpa_range))
> >>> - update_perf_entry(dev, dent, &mds->pmem_perf);
> >>> - else
> >>> - dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
> >>> - &dent->dpa_range);
> >>> + for (int i = 0; i < ARRAY_SIZE(partition); i++) {
> >>> + const struct resource *res = partition[i];
> >>> + struct range range = {
> >>> + .start = res->start,
> >>> + .end = res->end,
> >>> + };
> >>> +
> >>> + if (range_contains(&range, &dent->dpa_range))
> >>> + update_perf_entry(dev, dent, perf[i]);
> >> This is checking if range contains dent->dpa_range.
> >>
> >> I think it is just the opposite.
> > This looks like an equivalent conversion to me, what am I missing?
>
>
> I guess I was confused with dpa_range in dent and assumed that was the
> full device DPA. But now after studying the code in more detail, I see
> it is right ... but confusing :-)
>
>
> If I'm not wrong again, that range_contains should really be a check for
> the ranges being equal, which is also covered by what range_contains does ...
>
>
> It would help to have a comment explaining what the function is doing
> with the information coming from two different sources: the device CDAT,
> with potentially per-partition performance numbers, and the mailbox
> GET_PARTITION_INFO command.
>
>
> Anyway, apologies for the noise.
Not noise at all, a good review of this function's decisions after the
fact. Dave, do you want to follow on with some commentary for this
function?
I recall part of the motivation for range_contains() was the fact that
theoretically the CDAT DSMAS range could be less than a partition, or
cross a partition boundary. The policy here is about explicitly not
entertaining devices that want to get fancy with the CDAT. So, the
kernel just picks any DSMAS in the partition and uses that for the
performance information. If somebody screams and says the kernel needs
to be more precise (force mainline to carry more maintenance burden for
their use case) then we can revisit.
>
> >>> @@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
> >>> .end = dpa_res->end,
> >>> };
> >>>
> >>> + if (!perf)
> >>> + return false;
> >>> +
> >> This change seems to be an improvement or hardening. Not against doing
> >> it, but seizing the chance, the function could be simplified by using
> >> the parameter without any local variable.
> > No, it's not pure hardening, it is actively avoiding NULL pointer
> > de-references introduced by the new scheme to not track empty
> > partitions.
> >
> > I.e. the new to_{ram,pmem}_perf() helpers return NULL when the partition
> > is zero-sized. Previously this code path would do range checks on empty
> > partitions.
>
>
> That's true.
>
>
> But I do not see the reason for keeping that local variable, which I
> know is not related to your change.
@perf was never NULL before this patch; it may be NULL afterwards, so I
am not understanding the "not related" charge?
I.e. previously, cxled_get_dpa_perf() would pass perf entries related to zero
ranges, but now to_ram_perf() returns NULL for those cases.
However, I could move that decision outside of dpa_perf_contains()
rather than deleting this NULL check in patch5. I'll do that.
[..]
> >>>
> >>> if (!cxlds->media_ready) {
> >>> cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
> >>> - cxlds->ram_res = DEFINE_RES_MEM(0, 0);
> >>> - cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
> >>> + *ram_res = DEFINE_RES_MEM(0, 0);
> >>> + *pmem_res = DEFINE_RES_MEM(0, 0);
> >>
> >> This is a good example for the discussion about the patch hardening
> >> resource_contains. The initialization seems fine but IORESOURCE_UNSET
> >> not used.
> > To the contrary, I think these changes are an example of "updating
> > resource_contains() to check for zero is a band-aid that does not fix
> > the root problem".
> >
>
> I agree, but it does not preclude a resource initialization like the one
> here, which does not set the IORESOURCE_UNSET flag, from letting some
> other code trigger the same problem I faced, so the hardening of
> resource_contains is still useful, IMO. But I'll tell Bjorn Helgaas that
> the patch is no longer required for CXL, leaving the decision to apply
> it to him, given the concerns you expressed there.
>
> >> it could be argued the resource is set, but it is a zero-size resource
> >> leading to problems in current CXL code.
> > I challenge you to find problems in current CXL code after these
> > partition reworks.
>
> That's an unfair challenge ... you removed resource_contains ... :-)
>
> Let me think I got some credit for that after my heads-up ;-)
Oh definitely, you did.
That was part of my internal monologue / uneasiness with the Type-2
patches, something like => "why is CXL sending 0 sized resources to
resource_contains()... oh, no, this whole partition tracking
implementation needs a re-think".
end of thread, other threads:[~2025-01-25 0:45 UTC | newest]
Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-22 8:59 [PATCH v2 0/5] cxl: DPA partition metadata is a mess Dan Williams
2025-01-22 8:59 ` [PATCH v2 1/5] cxl: Remove the CXL_DECODER_MIXED mistake Dan Williams
2025-01-22 14:11 ` Ira Weiny
2025-01-23 15:49 ` Jonathan Cameron
2025-01-23 15:58 ` Alejandro Lucero Palau
2025-01-23 16:03 ` Dave Jiang
2025-01-22 8:59 ` [PATCH v2 2/5] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers Dan Williams
2025-01-22 14:18 ` Ira Weiny
2025-01-23 15:57 ` Jonathan Cameron
2025-01-23 20:01 ` Dan Williams
2025-01-23 16:13 ` Dave Jiang
2025-01-23 16:25 ` Alejandro Lucero Palau
2025-01-23 21:04 ` Dan Williams
2025-01-24 10:15 ` Alejandro Lucero Palau
2025-01-25 0:45 ` Dan Williams
2025-01-22 8:59 ` [PATCH v2 3/5] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' Dan Williams
2025-01-22 14:53 ` Ira Weiny
2025-01-22 22:24 ` Dan Williams
2025-01-23 3:10 ` Ira Weiny
2025-01-23 16:09 ` Jonathan Cameron
2025-01-23 20:24 ` Dan Williams
2025-01-23 16:57 ` Dave Jiang
2025-01-23 17:00 ` Alejandro Lucero Palau
2025-01-23 22:43 ` Dan Williams
2025-01-23 17:17 ` Alejandro Lucero Palau
2025-01-23 22:48 ` Dan Williams
2025-01-24 10:29 ` Alejandro Lucero Palau
2025-01-22 8:59 ` [PATCH v2 4/5] cxl: Make cxl_dpa_alloc() DPA partition number agnostic Dan Williams
2025-01-22 16:29 ` Ira Weiny
2025-01-22 22:35 ` Dan Williams
2025-01-23 3:14 ` Ira Weiny
2025-01-23 3:28 ` Dan Williams
2025-01-23 16:41 ` Jonathan Cameron
2025-01-23 21:34 ` Dan Williams
2025-01-23 17:21 ` Alejandro Lucero Palau
2025-01-23 20:52 ` Dave Jiang
2025-01-22 8:59 ` [PATCH v2 5/5] cxl: Kill enum cxl_decoder_mode Dan Williams
2025-01-22 17:42 ` Ira Weiny
2025-01-22 22:58 ` Dan Williams
2025-01-23 3:39 ` Ira Weiny
2025-01-23 4:11 ` Dan Williams
2025-01-23 21:30 ` Dave Jiang
2025-01-24 22:22 ` Ira Weiny
2025-01-23 16:51 ` Jonathan Cameron
2025-01-23 21:50 ` Dan Williams
2025-01-23 17:20 ` Alejandro Lucero Palau
2025-01-23 21:29 ` Dave Jiang
2025-01-23 17:23 ` [PATCH v2 0/5] cxl: DPA partition metadata is a mess Alejandro Lucero Palau