[RFC PATCH 0/4] cxl: Region attach for accelerators

public inbox for linux-cxl@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH 0/4] cxl: Region attach for accelerators
@ 2026-04-03 21:00 Dan Williams
  2026-04-03 21:00 ` [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Dan Williams @ 2026-04-03 21:00 UTC (permalink / raw)
  To: dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

Here is a collection of proposed fixups to get basic CXL accelerator
driver attachment operational. The primary goal here is communicate the
auto-assembled region parameters to an accelerator driver while also
following existing CXL subsystem lifetime rules.

It is RFC due to lack of testing, but passing basic smoke tests so far.

It applies on top of the latest state of the for-7.1/cxl-type2-support
branch in cxl.git [1].

https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-7.1/cxl-type2-support


Dan Williams (4):
  cxl/mem: Add support to cleanly continue after attach
  cxl/region: Move region lock error code to -EBUSY
  cxl/region: Block region delete for locked regions
  cxl/region: Introduce cxl_memdev_attach_region

 include/cxl/cxl.h         |  16 +++++
 drivers/cxl/core/region.c | 136 ++++++++++++++++++++++++++++++++++++--
 drivers/cxl/mem.c         |  14 +++-
 3 files changed, 161 insertions(+), 5 deletions(-)


base-commit: ccabeb15ced4c06eeff7e35d6a54e7835d3d160a
-- 
2.53.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach
  2026-04-03 21:00 [RFC PATCH 0/4] cxl: Region attach for accelerators Dan Williams
@ 2026-04-03 21:00 ` Dan Williams
  2026-04-09 22:02   ` Cheatham, Benjamin
  2026-04-03 21:00 ` [RFC PATCH 2/4] cxl/region: Move region lock error code to -EBUSY Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Dan Williams @ 2026-04-03 21:00 UTC (permalink / raw)
  To: dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

For drivers that want to fallback to PCI-only operation, immediately
cleanup on attach failure. Otherwise vestigial topology objects are left
until driver unload.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index ff858318091f..5e7fa378dd66 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -201,7 +201,19 @@ static int cxl_mem_probe(struct device *dev)
 struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
 				       const struct cxl_memdev_attach *attach)
 {
-	return __devm_cxl_add_memdev(cxlds, attach);
+	struct cxl_memdev *cxlmd;
+	void *group;
+
+	group = devres_open_group(cxlds->dev, NULL, GFP_KERNEL);
+	if (!group)
+		return ERR_PTR(-ENOMEM);
+
+	cxlmd = __devm_cxl_add_memdev(cxlds, attach);
+	if (IS_ERR(cxlmd))
+		devres_release_group(cxlds->dev, group);
+	else
+		devres_remove_group(cxlds->dev, group);
+	return cxlmd;
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 2/4] cxl/region: Move region lock error code to -EBUSY
  2026-04-03 21:00 [RFC PATCH 0/4] cxl: Region attach for accelerators Dan Williams
  2026-04-03 21:00 ` [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach Dan Williams
@ 2026-04-03 21:00 ` Dan Williams
  2026-04-03 21:00 ` [RFC PATCH 3/4] cxl/region: Block region delete for locked regions Dan Williams
  2026-04-03 21:00 ` [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region Dan Williams
  3 siblings, 0 replies; 14+ messages in thread
From: Dan Williams @ 2026-04-03 21:00 UTC (permalink / raw)
  To: dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

-EPERM is permission check. Root is permitted to manage a region, but at
any given point in time the region may be busy. In preparation for adding
another region operation failure due to region lock, move the existing
error indication to -EBUSY.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3edb5703d6de..600c96be0888 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -426,7 +426,7 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
 	}
 
 	if (test_bit(CXL_REGION_F_LOCK, &cxlr->flags))
-		return -EPERM;
+		return -EBUSY;
 
 	rc = queue_reset(cxlr);
 	if (rc)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 3/4] cxl/region: Block region delete for locked regions
  2026-04-03 21:00 [RFC PATCH 0/4] cxl: Region attach for accelerators Dan Williams
  2026-04-03 21:00 ` [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach Dan Williams
  2026-04-03 21:00 ` [RFC PATCH 2/4] cxl/region: Move region lock error code to -EBUSY Dan Williams
@ 2026-04-03 21:00 ` Dan Williams
  2026-04-03 21:00 ` [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region Dan Williams
  3 siblings, 0 replies; 14+ messages in thread
From: Dan Williams @ 2026-04-03 21:00 UTC (permalink / raw)
  To: dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

In support of reducing the number of failure conditions an accelerator
driver needs to consider, block user initiated delete of locked regions.

This might turn out to be too heavy and some region unlock mechanism needs
to be considered, but in the interest of doing something simple to get
accelerators enabled, take this coarse step.

Switch to scope-based cleanup now that there is an early exit case.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 600c96be0888..11bc0b88b05f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2768,17 +2768,20 @@ static ssize_t delete_region_store(struct device *dev,
 {
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
 	struct cxl_port *port = to_cxl_port(dev->parent);
-	struct cxl_region *cxlr;
 
-	cxlr = cxl_find_region_by_name(cxlrd, buf);
+	struct cxl_region *cxlr __free(put_cxl_region) =
+		cxl_find_region_by_name(cxlrd, buf);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
 
+	if (test_bit(CXL_REGION_F_LOCK, &cxlr->flags))
+		return -EBUSY;
+
 	devm_release_action(port->uport_dev, unregister_region, cxlr);
-	put_device(&cxlr->dev);
 
 	return len;
 }
+
 DEVICE_ATTR_WO(delete_region);
 
 static void cxl_pmem_region_release(struct device *dev)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-03 21:00 [RFC PATCH 0/4] cxl: Region attach for accelerators Dan Williams
                   ` (2 preceding siblings ...)
  2026-04-03 21:00 ` [RFC PATCH 3/4] cxl/region: Block region delete for locked regions Dan Williams
@ 2026-04-03 21:00 ` Dan Williams
  2026-04-07 10:25   ` Alejandro Lucero Palau
  2026-04-09 22:02   ` Cheatham, Benjamin
  3 siblings, 2 replies; 14+ messages in thread
From: Dan Williams @ 2026-04-03 21:00 UTC (permalink / raw)
  To: dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

To date, platform firmware maps accelerator memory and accelerator drivers
simply want an address range that they can map themselves. This typically
results in a single region being auto-assembled upon registration of a
memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
parameter to retrieve that region while also adhering to CXL subsystem
locking and lifetime rules. As part of adhering to current object lifetime
rules, if the region or the CXL port topology is invalidated, the CXL core
arranges for the accelertor driver to be detached as well.

The locking and lifetime rules were validated by this local change to
cxl_mock_mem. This tests all the lock acquisition and cleanup at
modprobe -r cxl_test time. More work needed to also test the full positive
case.

    struct cxl_attach_region
            attach = { .attach = {
                               .probe = cxl_memdev_attach_region,
                       } };
    cxlmd = devm_cxl_add_memdev(cxlds, &attach.attach);

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/cxl/cxl.h         |  16 +++++
 drivers/cxl/core/region.c | 125 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 10a9b8fa2f6b..1698d15ec1ca 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -153,6 +153,22 @@ struct cxl_memdev_attach {
 	int (*probe)(struct cxl_memdev *cxlmd);
 };
 
+/**
+ * struct cxl_attach_region - coordinate mapping a region at memdev registration
+ * @attach: common core attachment descriptor
+ * @region: physical address range of the region
+ *
+ * For the common simple case of a CXL device with private (non-general purpose
+ * / "accelerator") memory, enumerate firmware instantiated region, or
+ * instantiate a region for the device's capacity. Destroy the region on detach.
+ */
+struct cxl_attach_region {
+	struct cxl_memdev_attach attach;
+	struct range region;
+};
+
+int cxl_memdev_attach_region(struct cxl_memdev *cxlmd);
+
 /**
  * struct cxl_dev_state - The driver device state
  *
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 11bc0b88b05f..090f52392b20 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1123,6 +1123,19 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
 static void cxl_region_setup_flags(struct cxl_region *cxlr,
 				   struct cxl_decoder *cxld)
 {
+	if (is_endpoint_decoder(&cxld->dev)) {
+		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+
+		/*
+		 * When a region's memdevs specify an @attach method the attach
+		 * provider is responsible for dispositioning the region for
+		 * both probe and userspace management
+		 */
+		if (cxlmd->attach)
+			set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
+	}
+
 	if (cxld->flags & CXL_DECODER_F_LOCK) {
 		set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
 		clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
@@ -4226,6 +4239,115 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
 	return 0;
 }
 
+static int first_mapped_decoder(struct device *dev, const void *data)
+{
+	struct cxl_endpoint_decoder *cxled;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	if (cxled->cxld.region)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * As this is running in endpoint port remove context it does not race cxl_root
+ * destruction since port topologies are always removed depth first.
+ */
+static void cxl_endpoint_region_autoremove(void *_cxlr)
+{
+	struct cxl_region *cxlr = _cxlr;
+	struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
+	struct cxl_port *port = cxlrd_to_port(cxlrd);
+
+	devm_release_action(port->uport_dev, unregister_region, cxlr);
+}
+
+/*
+ * Runs in cxl_mem_probe context after successful endpoint probe, assumes the
+ * simple case of single mapped decoder per memdev.
+ */
+int cxl_memdev_attach_region(struct cxl_memdev *cxlmd)
+{
+	struct cxl_attach_region *attach =
+		container_of(cxlmd->attach, typeof(*attach), attach);
+	struct cxl_port *endpoint = cxlmd->endpoint;
+	struct cxl_endpoint_decoder *cxled;
+	struct cxl_region *cxlr;
+	int rc;
+
+	/* hold endpoint lock to setup autoremove of the region */
+	guard(device)(&endpoint->dev);
+	if (!endpoint->dev.driver)
+		return -ENXIO;
+	guard(rwsem_read)(&cxl_rwsem.region);
+	guard(rwsem_read)(&cxl_rwsem.dpa);
+
+	/*
+	 * TODO auto-instantiate a region, for now assume this will find an
+	 * auto-region
+	 */
+	struct device *dev __free(put_device) =
+		device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
+
+	if (!dev) {
+		dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
+			dev_name(&cxlmd->dev));
+		return -ENXIO;
+	}
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	cxlr = cxled->cxld.region;
+
+	if (cxlr->params.state < CXL_CONFIG_COMMIT) {
+		dev_dbg(cxlmd->cxlds->dev,
+			"region %s not committed for memdev %s\n",
+			dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
+		return -ENXIO;
+	}
+
+	if (cxlr->params.nr_targets > 1) {
+		dev_dbg(cxlmd->cxlds->dev,
+			"Only attach to local non-interleaved region\n");
+		return -ENXIO;
+	}
+
+	/* Only teardown regions that pass validation, ignore the rest */
+	rc = devm_add_action_or_reset(&endpoint->dev,
+				      cxl_endpoint_region_autoremove, cxlr);
+	if (rc)
+		return rc;
+
+	attach->region = (struct range) {
+		.start = cxlr->params.res->start,
+		.end = cxlr->params.res->end,
+	};
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
+
+/*
+ * The presence of an attach method indicates that the region is designated for
+ * a purpose outside of CXL core memory expansion defaults.
+ */
+static bool cxl_region_has_memdev_attach(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+
+	for (int i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+
+		if (cxlmd->attach)
+			return true;
+	}
+
+	return false;
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -4257,6 +4379,9 @@ static int cxl_region_probe(struct device *dev)
 	if (rc)
 		return rc;
 
+	if (cxl_region_has_memdev_attach(cxlr))
+		return 0;
+
 	switch (cxlr->mode) {
 	case CXL_PARTMODE_PMEM:
 		rc = devm_cxl_region_edac_register(cxlr);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-03 21:00 ` [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region Dan Williams
@ 2026-04-07 10:25   ` Alejandro Lucero Palau
  2026-04-11 21:42     ` Dan Williams
  2026-04-09 22:02   ` Cheatham, Benjamin
  1 sibling, 1 reply; 14+ messages in thread
From: Alejandro Lucero Palau @ 2026-04-07 10:25 UTC (permalink / raw)
  To: Dan Williams, dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau


On 4/3/26 22:00, Dan Williams wrote:
> To date, platform firmware maps accelerator memory and accelerator drivers
> simply want an address range that they can map themselves. This typically
> results in a single region being auto-assembled upon registration of a
> memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
> parameter to retrieve that region while also adhering to CXL subsystem
> locking and lifetime rules.


I doubt this approach is safer than the type2 v25 case. Apparently you 
are reducing the potential concurrency issue if someone removes 
cxl_acpi, but I think it is as strong as the v25 approach. With the 
rwsem lock, the region will exist or it will not. If the removal happens 
after the driver gets the region, the ioremap can race, with your 
approach and with the v25 one.


Also, you are keen to use your attach function for doing the same thing 
I have, which needs to be justified. I can not see a reason for that 
attach functionality with the sfc case. You did not reply to that comment:


https://lore.kernel.org/linux-cxl/20260330143827.1278677-1-alejandro.lucero-palau@amd.com/T/#m21968fd0ddf02df3641194d1450cb2bd392def26


And although I have not tested it, I think this is buggy. Removal of 
cxl_acpi will trigger unregister_region and a subsequent driver exit 
will invoke, inside the autoremove call linked to the endpoint device, 
devm_release_action on something not existing anymore. It is the same 
problem with v25 type2 and potential cxl_acpi removal. Maybe we should 
avoid linking a type2-related region to the cxl root port ...


And IMO, hiding what we are doing to the user/driver is not convenient. 
Although current type2 is going to rely on a committed decoder, that 
will not be the case if we look beyond impending needs. A function with 
a name describing this situation can add clarity to the different 
possibilities to support as a follow up of this initial work. About 
naming, attaching a region to a memdev is confusing ... as it is already 
"attached" or linked, and the reason you give for having that explicit 
attachment not being clear from a safety point of view.


> As part of adhering to current object lifetime
> rules, if the region or the CXL port topology is invalidated, the CXL core
> arranges for the accelertor driver to be detached as well.
>
> The locking and lifetime rules were validated by this local change to
> cxl_mock_mem. This tests all the lock acquisition and cleanup at
> modprobe -r cxl_test time. More work needed to also test the full positive
> case.
>
>      struct cxl_attach_region
>              attach = { .attach = {
>                                 .probe = cxl_memdev_attach_region,
>                         } };
>      cxlmd = devm_cxl_add_memdev(cxlds, &attach.attach);
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>   include/cxl/cxl.h         |  16 +++++
>   drivers/cxl/core/region.c | 125 ++++++++++++++++++++++++++++++++++++++
>   2 files changed, 141 insertions(+)
>
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 10a9b8fa2f6b..1698d15ec1ca 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -153,6 +153,22 @@ struct cxl_memdev_attach {
>   	int (*probe)(struct cxl_memdev *cxlmd);
>   };
>   
> +/**
> + * struct cxl_attach_region - coordinate mapping a region at memdev registration
> + * @attach: common core attachment descriptor
> + * @region: physical address range of the region
> + *
> + * For the common simple case of a CXL device with private (non-general purpose
> + * / "accelerator") memory, enumerate firmware instantiated region, or
> + * instantiate a region for the device's capacity. Destroy the region on detach.
> + */
> +struct cxl_attach_region {
> +	struct cxl_memdev_attach attach;
> +	struct range region;
> +};
> +
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd);
> +
>   /**
>    * struct cxl_dev_state - The driver device state
>    *
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 11bc0b88b05f..090f52392b20 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1123,6 +1123,19 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
>   static void cxl_region_setup_flags(struct cxl_region *cxlr,
>   				   struct cxl_decoder *cxld)
>   {
> +	if (is_endpoint_decoder(&cxld->dev)) {
> +		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +
> +		/*
> +		 * When a region's memdevs specify an @attach method the attach
> +		 * provider is responsible for dispositioning the region for
> +		 * both probe and userspace management
> +		 */
> +		if (cxlmd->attach)
> +			set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
> +	}
> +
>   	if (cxld->flags & CXL_DECODER_F_LOCK) {
>   		set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
>   		clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
> @@ -4226,6 +4239,115 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
>   	return 0;
>   }
>   
> +static int first_mapped_decoder(struct device *dev, const void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if (cxled->cxld.region)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * As this is running in endpoint port remove context it does not race cxl_root
> + * destruction since port topologies are always removed depth first.
> + */
> +static void cxl_endpoint_region_autoremove(void *_cxlr)
> +{
> +	struct cxl_region *cxlr = _cxlr;
> +	struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> +	struct cxl_port *port = cxlrd_to_port(cxlrd);
> +
> +	devm_release_action(port->uport_dev, unregister_region, cxlr);
> +}
> +
> +/*
> + * Runs in cxl_mem_probe context after successful endpoint probe, assumes the
> + * simple case of single mapped decoder per memdev.
> + */
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd)
> +{
> +	struct cxl_attach_region *attach =
> +		container_of(cxlmd->attach, typeof(*attach), attach);
> +	struct cxl_port *endpoint = cxlmd->endpoint;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_region *cxlr;
> +	int rc;
> +
> +	/* hold endpoint lock to setup autoremove of the region */
> +	guard(device)(&endpoint->dev);
> +	if (!endpoint->dev.driver)
> +		return -ENXIO;
> +	guard(rwsem_read)(&cxl_rwsem.region);
> +	guard(rwsem_read)(&cxl_rwsem.dpa);
> +
> +	/*
> +	 * TODO auto-instantiate a region, for now assume this will find an
> +	 * auto-region
> +	 */
> +	struct device *dev __free(put_device) =
> +		device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
> +
> +	if (!dev) {
> +		dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
> +			dev_name(&cxlmd->dev));
> +		return -ENXIO;
> +	}
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	cxlr = cxled->cxld.region;
> +
> +	if (cxlr->params.state < CXL_CONFIG_COMMIT) {
> +		dev_dbg(cxlmd->cxlds->dev,
> +			"region %s not committed for memdev %s\n",
> +			dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
> +		return -ENXIO;
> +	}
> +
> +	if (cxlr->params.nr_targets > 1) {
> +		dev_dbg(cxlmd->cxlds->dev,
> +			"Only attach to local non-interleaved region\n");
> +		return -ENXIO;
> +	}
> +
> +	/* Only teardown regions that pass validation, ignore the rest */
> +	rc = devm_add_action_or_reset(&endpoint->dev,
> +				      cxl_endpoint_region_autoremove, cxlr);
> +	if (rc)
> +		return rc;
> +
> +	attach->region = (struct range) {
> +		.start = cxlr->params.res->start,
> +		.end = cxlr->params.res->end,
> +	};
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
> +
> +/*
> + * The presence of an attach method indicates that the region is designated for
> + * a purpose outside of CXL core memory expansion defaults.
> + */
> +static bool cxl_region_has_memdev_attach(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +
> +	for (int i = 0; i < p->nr_targets; i++) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +
> +		if (cxlmd->attach)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>   static int cxl_region_probe(struct device *dev)
>   {
>   	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -4257,6 +4379,9 @@ static int cxl_region_probe(struct device *dev)
>   	if (rc)
>   		return rc;
>   
> +	if (cxl_region_has_memdev_attach(cxlr))
> +		return 0;
> +
>   	switch (cxlr->mode) {
>   	case CXL_PARTMODE_PMEM:
>   		rc = devm_cxl_region_edac_register(cxlr);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach
  2026-04-03 21:00 ` [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach Dan Williams
@ 2026-04-09 22:02   ` Cheatham, Benjamin
  2026-04-11 22:33     ` Dan Williams
  0 siblings, 1 reply; 14+ messages in thread
From: Cheatham, Benjamin @ 2026-04-09 22:02 UTC (permalink / raw)
  To: Dan Williams, dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

On 4/3/2026 4:00 PM, Dan Williams wrote:
> For drivers that want to fallback to PCI-only operation, immediately
> cleanup on attach failure. Otherwise vestigial topology objects are left
> until driver unload.
> 

The usage of "attach" here is somewhat ambiguous. Does this mean the attach callback,
or cxl_mem attach? Also, I know that providing the attach callback is probably what
indicates a driver wants to fallback to PCIe operation, but it's not evident based on
the description or contents of the patch.

> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/mem.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index ff858318091f..5e7fa378dd66 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -201,7 +201,19 @@ static int cxl_mem_probe(struct device *dev)
>  struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
>  				       const struct cxl_memdev_attach *attach)
>  {
> -	return __devm_cxl_add_memdev(cxlds, attach);
> +	struct cxl_memdev *cxlmd;
> +	void *group;
> +
> +	group = devres_open_group(cxlds->dev, NULL, GFP_KERNEL);
> +	if (!group)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cxlmd = __devm_cxl_add_memdev(cxlds, attach);
> +	if (IS_ERR(cxlmd))
> +		devres_release_group(cxlds->dev, group);
> +	else
> +		devres_remove_group(cxlds->dev, group);
> +	return cxlmd;
>  }
>  EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
>  

I'm having trouble figuring out how exactly this is supposed to work. I guess I have two questions:
1) What is this devres group doing? There aren't any actions added to it (afaik), so it seems to do nothing? Maybe there's some magic
going on here I don't know about?

2) Isn't the memdev already cleaned up if cxl_mem probe fails and attach is provided? What's the extra cleanup happening here?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-03 21:00 ` [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region Dan Williams
  2026-04-07 10:25   ` Alejandro Lucero Palau
@ 2026-04-09 22:02   ` Cheatham, Benjamin
  2026-04-11 23:02     ` Dan Williams
  1 sibling, 1 reply; 14+ messages in thread
From: Cheatham, Benjamin @ 2026-04-09 22:02 UTC (permalink / raw)
  To: Dan Williams, dave.jiang; +Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

On 4/3/2026 4:00 PM, Dan Williams wrote:
> To date, platform firmware maps accelerator memory and accelerator drivers
> simply want an address range that they can map themselves. This typically
> results in a single region being auto-assembled upon registration of a
> memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
> parameter to retrieve that region while also adhering to CXL subsystem
> locking and lifetime rules. As part of adhering to current object lifetime
> rules, if the region or the CXL port topology is invalidated, the CXL core
> arranges for the accelertor driver to be detached as well.

I'm probably missing something, but where does the core detach the accelerator driver?
I assume it's part of unregister_region(), but I'm not seeing any driver_detach() calls
on the accelerator device.

I don't think there's any demand for this at the moment, so it's more of a "food for thought"
kind of thing, but why not allow for a fallback mode of operation instead of detaching
the accelerator driver? First thing that comes to mind is an optional callback as part
of struct cxl_memdev_attach that is called by the core during CXL removal. The assumption
would be that an accelerator driver that provides the callback would clean up all of its CXL
usage in the callback then continue in a PCIe-only mode of operation (hardware willing).
When the callback isn't provided, you get the behavior in this patch. Again, this isn't
needed here but may be a nice middle-ground later on down the line.

> 
> The locking and lifetime rules were validated by this local change to
> cxl_mock_mem. This tests all the lock acquisition and cleanup at
> modprobe -r cxl_test time. More work needed to also test the full positive
> case.
> 
>     struct cxl_attach_region
>             attach = { .attach = {
>                                .probe = cxl_memdev_attach_region,
>                        } };
>     cxlmd = devm_cxl_add_memdev(cxlds, &attach.attach);
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/cxl/cxl.h         |  16 +++++
>  drivers/cxl/core/region.c | 125 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)
> 
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 10a9b8fa2f6b..1698d15ec1ca 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -153,6 +153,22 @@ struct cxl_memdev_attach {
>  	int (*probe)(struct cxl_memdev *cxlmd);
>  };
>  
> +/**
> + * struct cxl_attach_region - coordinate mapping a region at memdev registration
> + * @attach: common core attachment descriptor
> + * @region: physical address range of the region
> + *
> + * For the common simple case of a CXL device with private (non-general purpose
> + * / "accelerator") memory, enumerate firmware instantiated region, or
> + * instantiate a region for the device's capacity. Destroy the region on detach.
> + */
> +struct cxl_attach_region {
> +	struct cxl_memdev_attach attach;
> +	struct range region;
> +};
> +
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd);
> +
>  /**
>   * struct cxl_dev_state - The driver device state
>   *
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 11bc0b88b05f..090f52392b20 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1123,6 +1123,19 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
>  static void cxl_region_setup_flags(struct cxl_region *cxlr,
>  				   struct cxl_decoder *cxld)
>  {
> +	if (is_endpoint_decoder(&cxld->dev)) {
> +		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +
> +		/*
> +		 * When a region's memdevs specify an @attach method the attach
> +		 * provider is responsible for dispositioning the region for
> +		 * both probe and userspace management
> +		 */
> +		if (cxlmd->attach)
> +			set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
> +	}
> +

It seems unfortunate that you set the region lock bit here and then immediately do a check
to set the bit again. I can't think of a way to leverage cxld->flags for type 2 in an
ergonomic way though, so I guess it's fine.
>  	if (cxld->flags & CXL_DECODER_F_LOCK) {
>  		set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
>  		clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
> @@ -4226,6 +4239,115 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
>  	return 0;
>  }
>  
> +static int first_mapped_decoder(struct device *dev, const void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if (cxled->cxld.region)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * As this is running in endpoint port remove context it does not race cxl_root
> + * destruction since port topologies are always removed depth first.
> + */
> +static void cxl_endpoint_region_autoremove(void *_cxlr)
> +{
> +	struct cxl_region *cxlr = _cxlr;
> +	struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> +	struct cxl_port *port = cxlrd_to_port(cxlrd);
> +
> +	devm_release_action(port->uport_dev, unregister_region, cxlr);
> +}
> +
> +/*
> + * Runs in cxl_mem_probe context after successful endpoint probe, assumes the
> + * simple case of single mapped decoder per memdev.
> + */
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd)
> +{
> +	struct cxl_attach_region *attach =
> +		container_of(cxlmd->attach, typeof(*attach), attach);
> +	struct cxl_port *endpoint = cxlmd->endpoint;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_region *cxlr;
> +	int rc;
> +
> +	/* hold endpoint lock to setup autoremove of the region */
> +	guard(device)(&endpoint->dev);
> +	if (!endpoint->dev.driver)
> +		return -ENXIO;
> +	guard(rwsem_read)(&cxl_rwsem.region);
> +	guard(rwsem_read)(&cxl_rwsem.dpa);
> +
> +	/*
> +	 * TODO auto-instantiate a region, for now assume this will find an
> +	 * auto-region
> +	 */
> +	struct device *dev __free(put_device) =
> +		device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
> +
> +	if (!dev) {
> +		dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
> +			dev_name(&cxlmd->dev));
> +		return -ENXIO;
> +	}
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	cxlr = cxled->cxld.region;
> +
> +	if (cxlr->params.state < CXL_CONFIG_COMMIT) {
> +		dev_dbg(cxlmd->cxlds->dev,
> +			"region %s not committed for memdev %s\n",
> +			dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
> +		return -ENXIO;
> +	}
> +
> +	if (cxlr->params.nr_targets > 1) {
> +		dev_dbg(cxlmd->cxlds->dev,
> +			"Only attach to local non-interleaved region\n");
> +		return -ENXIO;
> +	}
> +
> +	/* Only teardown regions that pass validation, ignore the rest */
> +	rc = devm_add_action_or_reset(&endpoint->dev,
> +				      cxl_endpoint_region_autoremove, cxlr);
> +	if (rc)
> +		return rc;
> +
> +	attach->region = (struct range) {
> +		.start = cxlr->params.res->start,
> +		.end = cxlr->params.res->end,
> +	};
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
> +
> +/*
> + * The presence of an attach method indicates that the region is designated for
> + * a purpose outside of CXL core memory expansion defaults.
> + */
> +static bool cxl_region_has_memdev_attach(struct cxl_region *cxlr)
> +{

I think this should be renamed to something like "cxl_region_is_private()"; it
better matches the intended use of the function while lessening the burden on
a non-type 2 educated reader. Also has the added benefit of being one less
change site if the mechanism for determining if a region belongs to an accelerator
changes in the future.

Thanks,
Ben

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-07 10:25   ` Alejandro Lucero Palau
@ 2026-04-11 21:42     ` Dan Williams
  2026-04-14 15:41       ` Alejandro Lucero Palau
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Williams @ 2026-04-11 21:42 UTC (permalink / raw)
  To: Alejandro Lucero Palau, Dan Williams, dave.jiang
  Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

Alejandro Lucero Palau wrote:
> 
> On 4/3/26 22:00, Dan Williams wrote:
> > To date, platform firmware maps accelerator memory and accelerator drivers
> > simply want an address range that they can map themselves. This typically
> > results in a single region being auto-assembled upon registration of a
> > memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
> > parameter to retrieve that region while also adhering to CXL subsystem
> > locking and lifetime rules.
> 
> 
> I doubt this approach is safer than the type2 v25 case. 

I believe it is safer, see evidence below...

> Apparently you are reducing the potential concurrency issue if someone
> removes cxl_acpi, but I think it is as strong as the v25 approach.
> With the rwsem lock, the region will exist or it will not. If the
> removal happens after the driver gets the region, the ioremap can
> race, with your approach and with the v25 one.

Yes, ioremap can race and in the attach case when the ioremap is
invalidated the sfc driver gets unloaded. In v25 it leads to
use-after-free issues.

> Also, you are keen to use your attach function for doing the same thing 
> I have, which needs to be justified.

No, you need to justify usage and export of cxl_core objects without
taking care of the current lifetime rules. Did you see these reports
from sashiko about v25? These issues are the motivation to pull
responsibility away from client drivers.

https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=6
https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=7

Now, I do not think the "attach" appraoch is the final end state. This
needs to get simpler still, but it can not get simpler with an
increasing attack surface.

> I can not see a reason for that attach functionality with the sfc
> case. You did not reply to that comment:
> 
> 
> https://lore.kernel.org/linux-cxl/20260330143827.1278677-1-alejandro.lucero-palau@amd.com/T/#m21968fd0ddf02df3641194d1450cb2bd392def26

Yes, did not have bandwidth to reply to that beyond sending out the
attach approach to better contain CXL core complexity.

> And although I have not tested it, I think this is buggy. Removal of 
> cxl_acpi will trigger unregister_region and a subsequent driver exit 
> will invoke, inside the autoremove call linked to the endpoint device, 
> devm_release_action on something not existing anymore.

If there is a bug here I would like to see that test case because as far
as I can see it would be shared with the generic expander case.

> It is the same problem with v25 type2 and potential cxl_acpi removal.
> Maybe we should avoid linking a type2-related region to the cxl root
> port ...
> 
> 
> And IMO, hiding what we are doing to the user/driver is not convenient. 
> Although current type2 is going to rely on a committed decoder, that 
> will not be the case if we look beyond impending needs.

Yes, start with the simple committed decoder case. Later, follow-on with
auto-create support. That would have no need to go back and teach
accelerator drivers how to create regions in the simple case the same
helper would "just work".

> A function with a name describing this situation can add clarity to
> the different possibilities to support as a follow up of this initial
> work. About naming, attaching a region to a memdev is confusing ... as
> it is already "attached" or linked, and the reason you give for having
> that explicit attachment not being clear from a safety point of view.

I would be happy if the only detail left to discuss was naming.

In this case "attach" means "attach to CXL the topology". A PCI driver
can create a cxl_memdev, but if that cxl_memdev finds no active CXL
topology then that "attachment" fails.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach
  2026-04-09 22:02   ` Cheatham, Benjamin
@ 2026-04-11 22:33     ` Dan Williams
  0 siblings, 0 replies; 14+ messages in thread
From: Dan Williams @ 2026-04-11 22:33 UTC (permalink / raw)
  To: Cheatham, Benjamin, dave.jiang
  Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

Cheatham, Benjamin wrote:
> On 4/3/2026 4:00 PM, Dan Williams wrote:
> > For drivers that want to fallback to PCI-only operation, immediately
> > cleanup on attach failure. Otherwise vestigial topology objects are left
> > until driver unload.
> > 
> 
> The usage of "attach" here is somewhat ambiguous. Does this mean the attach callback,
> or cxl_mem attach?

Fair point.

> Also, I know that providing the attach callback is probably what
> indicates a driver wants to fallback to PCIe operation, but it's not
> evident based on the description or contents of the patch.

I rushed this... it was a hectic week when I sent this.

> 2) Isn't the memdev already cleaned up if cxl_mem probe fails and
> attach is provided? What's the extra cleanup happening here?

Exactly right. cxl_mem_probe() failure does the job.

Sorry for the noise.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-09 22:02   ` Cheatham, Benjamin
@ 2026-04-11 23:02     ` Dan Williams
  2026-04-12  8:57       ` Lukas Wunner
  2026-04-13 14:25       ` Cheatham, Benjamin
  0 siblings, 2 replies; 14+ messages in thread
From: Dan Williams @ 2026-04-11 23:02 UTC (permalink / raw)
  To: Cheatham, Benjamin, dave.jiang
  Cc: linux-cxl, linux-kernel, alejandro.lucero-palau, lukas

[ add Lukas ]

Cheatham, Benjamin wrote:
> On 4/3/2026 4:00 PM, Dan Williams wrote:
> > To date, platform firmware maps accelerator memory and accelerator drivers
> > simply want an address range that they can map themselves. This typically
> > results in a single region being auto-assembled upon registration of a
> > memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
> > parameter to retrieve that region while also adhering to CXL subsystem
> > locking and lifetime rules. As part of adhering to current object lifetime
> > rules, if the region or the CXL port topology is invalidated, the CXL core
> > arranges for the accelertor driver to be detached as well.
> 
> I'm probably missing something, but where does the core detach the
> accelerator driver?  I assume it's part of unregister_region(), but
> I'm not seeing any driver_detach() calls on the accelerator device.

It happens in detach_memdev().

> I don't think there's any demand for this at the moment, so it's more
> of a "food for thought" kind of thing, but why not allow for a
> fallback mode of operation instead of detaching the accelerator
> driver? First thing that comes to mind is an optional callback as part
> of struct cxl_memdev_attach that is called by the core during CXL
> removal. The assumption would be that an accelerator driver that
> provides the callback would clean up all of its CXL usage in the
> callback then continue in a PCIe-only mode of operation (hardware
> willing).  When the callback isn't provided, you get the behavior in
> this patch. Again, this isn't needed here but may be a nice
> middle-ground later on down the line.

I think that is possible, and also is not really a CXL specific feature. A
capability to move resources in active use by drivers has been proposed before
[1] (credit: Lukas).

[..]
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 11bc0b88b05f..090f52392b20 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -1123,6 +1123,19 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
> >  static void cxl_region_setup_flags(struct cxl_region *cxlr,
> >  				   struct cxl_decoder *cxld)
> >  {
> > +	if (is_endpoint_decoder(&cxld->dev)) {
> > +		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
> > +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > +
> > +		/*
> > +		 * When a region's memdevs specify an @attach method the attach
> > +		 * provider is responsible for dispositioning the region for
> > +		 * both probe and userspace management
> > +		 */
> > +		if (cxlmd->attach)
> > +			set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
> > +	}
> > +
> 
> It seems unfortunate that you set the region lock bit here and then
> immediately do a check to set the bit again.

There are 2 different cases to lock the region here, presence of attach, or
presence of hardware locked decoder.

> I can't think of a way to leverage cxld->flags for type 2 in an
> ergonomic way though, so I guess it's fine.

The flags are just caching hardware state, I do not want to have drivers
play games with the flags that dilute that "cached hardware value"
meaning.

[..]
> > +/*
> > + * The presence of an attach method indicates that the region is designated for
> > + * a purpose outside of CXL core memory expansion defaults.
> > + */
> > +static bool cxl_region_has_memdev_attach(struct cxl_region *cxlr)
> > +{
> 
> I think this should be renamed to something like "cxl_region_is_private()"; it
> better matches the intended use of the function while lessening the burden on
> a non-type 2 educated reader. Also has the added benefit of being one less
> change site if the mechanism for determining if a region belongs to an accelerator
> changes in the future.

Why does the CXL core naming need to cater to accelerator vs expansion
use case naming?  CXL core developer needs to understand all the use
cases and wants to make them unified as much as possible.
"cxl_region_is_private()" says nothing about what "private" means and
where to look next. It also muddies the namespace for when CXL regions
start becoming the backing store for "private" nodes. Is that a
"private" region?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-11 23:02     ` Dan Williams
@ 2026-04-12  8:57       ` Lukas Wunner
  2026-04-13 14:25       ` Cheatham, Benjamin
  1 sibling, 0 replies; 14+ messages in thread
From: Lukas Wunner @ 2026-04-12  8:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Cheatham, Benjamin, dave.jiang, linux-cxl, linux-kernel,
	alejandro.lucero-palau

On Sat, Apr 11, 2026 at 04:02:34PM -0700, Dan Williams wrote:
> Cheatham, Benjamin wrote:
> > On 4/3/2026 4:00 PM, Dan Williams wrote:
> > > To date, platform firmware maps accelerator memory and accelerator drivers
> > > simply want an address range that they can map themselves. This typically
> > > results in a single region being auto-assembled upon registration of a
> > > memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
> > > parameter to retrieve that region while also adhering to CXL subsystem
> > > locking and lifetime rules. As part of adhering to current object lifetime
> > > rules, if the region or the CXL port topology is invalidated, the CXL core
> > > arranges for the accelertor driver to be detached as well.

Nit: s/accelertor/accelerator/ and wrap to 72 chars

> I think that is possible, and also is not really a CXL specific feature. A
> capability to move resources in active use by drivers has been proposed before
> [1] (credit: Lukas).

There's no link provided for [1] in your e-mail but you may be referring to:

https://lore.kernel.org/all/20201218174011.340514-1-s.miroshnichenko@yadro.com/

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-11 23:02     ` Dan Williams
  2026-04-12  8:57       ` Lukas Wunner
@ 2026-04-13 14:25       ` Cheatham, Benjamin
  1 sibling, 0 replies; 14+ messages in thread
From: Cheatham, Benjamin @ 2026-04-13 14:25 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: linux-cxl, linux-kernel, alejandro.lucero-palau, lukas

On 4/11/2026 6:02 PM, Dan Williams wrote:

[snip]

>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index 11bc0b88b05f..090f52392b20 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -1123,6 +1123,19 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
>>>  static void cxl_region_setup_flags(struct cxl_region *cxlr,
>>>                                struct cxl_decoder *cxld)
>>>  {
>>> +   if (is_endpoint_decoder(&cxld->dev)) {
>>> +           struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
>>> +           struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>>> +
>>> +           /*
>>> +            * When a region's memdevs specify an @attach method the attach
>>> +            * provider is responsible for dispositioning the region for
>>> +            * both probe and userspace management
>>> +            */
>>> +           if (cxlmd->attach)
>>> +                   set_bit(CXL_REGION_F_LOCK, &cxlr->flags);
>>> +   }
>>> +
>>
>> It seems unfortunate that you set the region lock bit here and then
>> immediately do a check to set the bit again.
> 
> There are 2 different cases to lock the region here, presence of attach, or
> presence of hardware locked decoder.
> 
>> I can't think of a way to leverage cxld->flags for type 2 in an
>> ergonomic way though, so I guess it's fine.
> 
> The flags are just caching hardware state, I do not want to have drivers
> play games with the flags that dilute that "cached hardware value"
> meaning.
> 

Didn't realize that, makes sense.

> [..]
>>> +/*
>>> + * The presence of an attach method indicates that the region is designated for
>>> + * a purpose outside of CXL core memory expansion defaults.
>>> + */
>>> +static bool cxl_region_has_memdev_attach(struct cxl_region *cxlr)
>>> +{
>>
>> I think this should be renamed to something like "cxl_region_is_private()"; it
>> better matches the intended use of the function while lessening the burden on
>> a non-type 2 educated reader. Also has the added benefit of being one less
>> change site if the mechanism for determining if a region belongs to an accelerator
>> changes in the future.
> 
> Why does the CXL core naming need to cater to accelerator vs expansion
> use case naming?  CXL core developer needs to understand all the use
> cases and wants to make them unified as much as possible.

It doesn't. I agree that developers that are going to touch this code should
understand all the use cases but the naming here doesn't accurately convey
the intended use case. memdev_attach is only used by accelerator drivers, so
this function is checking the device type/driver.

That does somewhat go against unifying the use cases but the use cases
definitely do diverge here. It's really more about clearly delineating where
the cases diverge to someone who is less knowledgeable about accelerators
(which is most people in my experience).

> "cxl_region_is_private()" says nothing about what "private" means and
> where to look next. It also muddies the namespace for when CXL regions
> start becoming the backing store for "private" nodes. Is that a
> "private" region?

Bad naming suggestion on my part, I forgot about the private nodes set.
I was about to suggest changing it to "cxl_region_is_reserved()" but that's
an even more overloaded term :/. Going along the lines of what I said above,
maybe something like "cxl_region_reserved_by_acclerator()"? Kind of long, but
gets to the point.

Thanks,
Ben


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
  2026-04-11 21:42     ` Dan Williams
@ 2026-04-14 15:41       ` Alejandro Lucero Palau
  0 siblings, 0 replies; 14+ messages in thread
From: Alejandro Lucero Palau @ 2026-04-14 15:41 UTC (permalink / raw)
  To: Dan Williams, Dan Williams, dave.jiang
  Cc: linux-cxl, linux-kernel, alejandro.lucero-palau

On 4/11/26 22:42, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
>> On 4/3/26 22:00, Dan Williams wrote:
>>> To date, platform firmware maps accelerator memory and accelerator drivers
>>> simply want an address range that they can map themselves. This typically
>>> results in a single region being auto-assembled upon registration of a
>>> memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
>>> parameter to retrieve that region while also adhering to CXL subsystem
>>> locking and lifetime rules.
>>
>> I doubt this approach is safer than the type2 v25 case.
> I believe it is safer, see evidence below...

It is safer for the case of user space unbinding a memdev device from 
cxl_mem racing with the type2 initialization regarding the linked 
autodiscovered region to the memdev. But, if I am not misunderstanding 
something here, the problem remains ... (below explained).

BTW, I know about unbinding for the virtualization case, with a pci 
device being bound to UIO/VFIO instead of to the host vendor-specific 
pci driver, but I can not see the point of user space doing this for a 
cxl memdev. I know the unbinding is standard functionality but I wonder 
if this should be avoided under certain scenarios like this one. 
Otherwise, if there exists a reason for allowing this, I would like to know.

>
>> Apparently you are reducing the potential concurrency issue if someone
>> removes cxl_acpi, but I think it is as strong as the v25 approach.
>> With the rwsem lock, the region will exist or it will not. If the
>> removal happens after the driver gets the region, the ioremap can
>> race, with your approach and with the v25 one.
> Yes, ioremap can race and in the attach case when the ioremap is
> invalidated the sfc driver gets unloaded. In v25 it leads to
> use-after-free issues.

I do not follow this. I can not see in the code a link between the 
autoremove functionality and that triggering the sfc driver being unload.

And the new function added, cxl_memdev_attach_region(), supposedly being 
called by a type2 driver after getting the memdev, does only ensure the 
region will be there for updating the attach struct owned by the type2 
driver, but as soon as the endpoint device lock is release, that could 
not be the case, I mean the region not there anymore but the attach 
struct having still the region data.

Also, if the cxl_mem unbinding happens after the type2 driver has 
invoked ioremap, the driver will be using something it should not. Won't 
it? I do not think that is correct and the type2 driver should somehow 
get a notification for stopping using such a range, some sort of detach 
function the cxl core can invoke. That is something I unsuccessfully 
tried to add in the past:

https://lore.kernel.org/linux-cxl/20250624141355.269056-19-alejandro.lucero-palau@amd.com/

>
>> Also, you are keen to use your attach function for doing the same thing
>> I have, which needs to be justified.
> No, you need to justify usage and export of cxl_core objects without
> taking care of the current lifetime rules. Did you see these reports
> from sashiko about v25? These issues are the motivation to pull
> responsibility away from client drivers.
>
> https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=6
> https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=7
>
> Now, I do not think the "attach" appraoch is the final end state. This
> needs to get simpler still, but it can not get simpler with an
> increasing attack surface.

What I mean is the protection you are adding is doable inside the type2 
series without the attach option for cxl memdev.

There is a justification for the protection avoiding racing with 
unbinding from user space, but there is no justification for the attach 
itself, at least for the sfc case.

>
>> I can not see a reason for that attach functionality with the sfc
>> case. You did not reply to that comment:
>>
>>
>> https://lore.kernel.org/linux-cxl/20260330143827.1278677-1-alejandro.lucero-palau@amd.com/T/#m21968fd0ddf02df3641194d1450cb2bd392def26
> Yes, did not have bandwidth to reply to that beyond sending out the
> attach approach to better contain CXL core complexity.
>
>> And although I have not tested it, I think this is buggy. Removal of
>> cxl_acpi will trigger unregister_region and a subsequent driver exit
>> will invoke, inside the autoremove call linked to the endpoint device,
>> devm_release_action on something not existing anymore.
> If there is a bug here I would like to see that test case because as far
> as I can see it would be shared with the generic expander case.

You are introducing the cxl_endpoint_region_autoremove() linked to the 
endpoint device. If cxl_acpi is removed, that will lead to the region 
unregistered first, then the delete_endpoint() trying to unregister the 
region with devm_release_action() again. That will make a kernel warn at 
least.

>> It is the same problem with v25 type2 and potential cxl_acpi removal.
>> Maybe we should avoid linking a type2-related region to the cxl root
>> port ...
>>
>>
>> And IMO, hiding what we are doing to the user/driver is not convenient.
>> Although current type2 is going to rely on a committed decoder, that
>> will not be the case if we look beyond impending needs.
> Yes, start with the simple committed decoder case. Later, follow-on with
> auto-create support. That would have no need to go back and teach
> accelerator drivers how to create regions in the simple case the same
> helper would "just work".
>
>> A function with a name describing this situation can add clarity to
>> the different possibilities to support as a follow up of this initial
>> work. About naming, attaching a region to a memdev is confusing ... as
>> it is already "attached" or linked, and the reason you give for having
>> that explicit attachment not being clear from a safety point of view.
> I would be happy if the only detail left to discuss was naming.
>
> In this case "attach" means "attach to CXL the topology". A PCI driver
> can create a cxl_memdev, but if that cxl_memdev finds no active CXL
> topology then that "attachment" fails.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-04-14 15:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-03 21:00 [RFC PATCH 0/4] cxl: Region attach for accelerators Dan Williams
2026-04-03 21:00 ` [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach Dan Williams
2026-04-09 22:02   ` Cheatham, Benjamin
2026-04-11 22:33     ` Dan Williams
2026-04-03 21:00 ` [RFC PATCH 2/4] cxl/region: Move region lock error code to -EBUSY Dan Williams
2026-04-03 21:00 ` [RFC PATCH 3/4] cxl/region: Block region delete for locked regions Dan Williams
2026-04-03 21:00 ` [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region Dan Williams
2026-04-07 10:25   ` Alejandro Lucero Palau
2026-04-11 21:42     ` Dan Williams
2026-04-14 15:41       ` Alejandro Lucero Palau
2026-04-09 22:02   ` Cheatham, Benjamin
2026-04-11 23:02     ` Dan Williams
2026-04-12  8:57       ` Lukas Wunner
2026-04-13 14:25       ` Cheatham, Benjamin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox