Linux CXL
 help / color / mirror / Atom feed
From: Alejandro Lucero Palau <alucerop@amd.com>
To: "Cheatham, Benjamin" <benjamin.cheatham@amd.com>,
	alejandro.lucero-palau@amd.com, linux-cxl@vger.kernel.org,
	djbw@kernel.org, edward.cree@amd.com, davem@davemloft.net,
	kuba@kernel.org, pabeni@redhat.com, edumazet@google.com,
	dave.jiang@intel.com
Subject: Re: [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
Date: Fri, 1 May 2026 11:35:04 +0100	[thread overview]
Message-ID: <4277a34e-4edf-44b7-a6a2-05668791e080@amd.com> (raw)
In-Reply-To: <a8390b56-e5a9-4e06-b8f8-e66a8bd9e824@amd.com>


On 4/29/26 22:14, Cheatham, Benjamin wrote:
> On 4/23/2026 1:05 PM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Support an accelerator driver to safely work with an autodiscovered
>> region from a committed HDM decoder through:
>>
>>          1) an accelerator driver cxl_attach_region struct with attach
>>             and detach callbacks.
>>
>>          2) a specific function, cxl_memdev_attach_region() keeping the
>>             required locks for finding a region linked to the memdev
>>             endpoint, and
>>
>>          3) invoking attach callback while keeping the locking allowing to
>>             work (ioremap and other internal stuff) with the related physical
>>             range by the accelerator driver, and
>>
>>          4) linking a detach callback to the endpoint device removal where
>>             the accelerator driver can stop using the region range.
>>
>> This covers the cases of a potential removal of cxl_acpi module or a
>> accelerator memdev unbinding from cxl_mem driver through sysfs.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>>   drivers/cxl/core/region.c          | 118 ++++++++++++++++++++++++++++-
>>   drivers/net/ethernet/sfc/efx_cxl.c |  37 +++++++++
>>   drivers/net/ethernet/sfc/efx_cxl.h |   2 +
>>   include/cxl/cxl.h                  |  17 +++++
>>   4 files changed, 171 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index e50dc716d4e8..68f5a1fd1b1c 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2711,9 +2711,16 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>>   	if (rc)
>>   		goto err;
>>   
>> -	rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
>> -	if (rc)
>> -		return ERR_PTR(rc);
>> +	/*
>> +	 * For accelerators/type2, region release linked to endpoint device.
>> +	 * See handling of cxl_endpoint_region_autoremove() below by
>> +	 * cxl_memdev_attach_region().
>> +	 */
>> +	if (type == CXL_DECODER_HOSTONLYMEM) {
>> +		rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
>> +		if (rc)
>> +			return ERR_PTR(rc);
>> +	}
>>   
>>   	dev_dbg(port->uport_dev, "%s: created %s\n",
>>   		dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
>> @@ -4043,6 +4050,111 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
>>   	return 0;
>>   }
>>   
>> +static int first_mapped_decoder(struct device *dev, const void *data)
>> +{
>> +	struct cxl_endpoint_decoder *cxled;
>> +
>> +	if (!is_endpoint_decoder(dev))
>> +		return 0;
>> +
>> +	cxled = to_cxl_endpoint_decoder(dev);
>> +	if (cxled->cxld.region)
>> +		return 1;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * As this is running in endpoint port remove context it does not race cxl_root
>> + * destruction since port topologies are always removed depth first.
>> + */
>> +static void cxl_endpoint_region_autoremove(void *_cxlr)
>> +{
>> +	unregister_region(_cxlr);
>> +}
>> +
>> +/**
>> + * cxl_memdev_attach_region - bind region to accelerator memdev
>> + *
>> + * @cxlmd: a pointer to cxl_memdev to use
>> + * @attach: a pointer to region attach struct with callbacks for
>> + * 	    safely working with a region range by the caller
>> + *
>> + * Returns 0 or error.
>> + */
>> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd,
>> +			     struct cxl_attach_region *attach)
>> +{
>> +	struct cxl_port *endpoint = cxlmd->endpoint;
>> +	struct cxl_endpoint_decoder *cxled;
>> +	struct cxl_region *cxlr;
>> +	int rc;
>> +
>> +	/* hold endpoint lock to setup autoremove of the region */
>> +	guard(device)(&endpoint->dev);
>> +	if (!endpoint->dev.driver)
>> +		return -ENXIO;
>> +	guard(rwsem_read)(&cxl_rwsem.region);
>> +	guard(rwsem_read)(&cxl_rwsem.dpa);
>> +
>> +	/*
>> +	 * TODO auto-instantiate a region, for now assume this will find an
>> +	 * auto-region
>> +	 */
>> +	struct device *dev __free(put_device) =
>> +		device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
>> +
>> +	if (!dev) {
>> +		dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
>> +			dev_name(&cxlmd->dev));
>> +		return -ENXIO;
>> +	}
>> +
>> +	cxled = to_cxl_endpoint_decoder(dev);
>> +	cxlr = cxled->cxld.region;
>> +
>> +	if (cxlr->params.state < CXL_CONFIG_COMMIT) {
>> +		dev_dbg(cxlmd->cxlds->dev,
>> +			"region %s not committed for memdev %s\n",
>> +			dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
>> +		return -ENXIO;
>> +	}
>> +
>> +	if (cxlr->params.nr_targets > 1) {
>> +		dev_dbg(cxlmd->cxlds->dev,
>> +			"Only attach to local non-interleaved region\n");
>> +		return -ENXIO;
>> +	}
>> +
>> +	attach->region = (struct range) {
>> +		.start = cxlr->params.res->start,
>> +		.end = cxlr->params.res->end,
>> +	};
>> +
>> +	/*
>> +	 * With endpoint locked leave the caller to safely work with the region
>> +	 * range.
>> +	 */
> This reads a bit weirdly, how about: "Let the caller do its region-specific setup"? I
> don't think it's important to mention the endpoint being locked here.


I wanted to emphasize the safety of such attachment regarding the 
problem of concurrent unwinding triggered from user space.


>> +	rc = attach->attach(attach->data);
>> +	if (rc)
>> +		return rc;
>> +
>> +	/* Only teardown regions that pass validation, ignore the rest */
>> +	rc = devm_add_action_or_reset(&endpoint->dev,
>> +				      cxl_endpoint_region_autoremove, cxlr);
>> +	if (rc)
>> +		return rc;
>> +
>> +	/* Link type2 driver callback for stopping use of the region range. */
>> +	rc = devm_add_action_or_reset(&endpoint->dev,
>> +				      attach->detach, attach->data);
>> +	if (rc)
>> +		return rc;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
>> +
>>   static int cxl_region_probe(struct device *dev)
>>   {
>>   	struct cxl_region *cxlr = to_cxl_region(dev);
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 7d8a6c2133c8..9ca16b051d62 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -14,6 +14,36 @@
>>   
>>   #define EFX_CTPIO_BUFFER_SIZE	SZ_256M
>>   
>> +/* Called with cxl endpoint device locked for precluding potential related
>> + * cxl region removal triggered from user space, allowing safely mapping of
>> + * such cxl region by the sfc driver.
>> + */
>> +static int efx_cxl_map_region(void *data) {
>> +	struct efx_probe_data *probe_data = data;
>> +	struct efx_nic *efx = &probe_data->efx;
>> +	struct pci_dev *pci_dev = efx->pci_dev;
>> +	struct efx_cxl *cxl = probe_data->cxl;
>> +	struct range *cxl_pio_range = &cxl->attach_region.region;
>> +
>> +	cxl->ctpio_cxl = ioremap(cxl_pio_range->start,
>> +				 cxl_pio_range->end - cxl_pio_range->start + 1);
>> +	if (!cxl->ctpio_cxl) {
>> +		pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
>> +				 cxl_pio_range);
>> +		return -ENOMEM;
>> +	}
>> +	probe_data->cxl_pio_initialised = true;
>> +	return 0;
>> +}
>> +
>> +/* Called at driver exit or when user space triggers cxl region removal. */
>> +static void efx_cxl_unmap_region(void *data) {
>> +	struct efx_probe_data *probe_data = data;
>> +
>> +	probe_data->cxl_pio_initialised = false;
>> +	iounmap(probe_data->cxl->ctpio_cxl);
>> +}
>> +
>>   int efx_cxl_init(struct efx_probe_data *probe_data)
>>   {
>>   	struct efx_nic *efx = &probe_data->efx;
>> @@ -81,6 +111,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>   		return PTR_ERR(cxl->cxlmd);
>>   	}
>>   
>> +	probe_data->cxl->attach_region.attach = efx_cxl_map_region;
>> +	probe_data->cxl->attach_region.detach = efx_cxl_unmap_region;
>> +	probe_data->cxl->attach_region.data = probe_data;
>> +
>> +	cxl_memdev_attach_region(cxl->cxlmd,
>> +				 &probe_data->cxl->attach_region);
> I would think you want to check the return code here, no? I looked at the last patch
> and don't think this will cause issues, but I would at least expect some debug output
> here.


Of course, I forgot to do so. RFC hastiness ...


>> +
>>   	probe_data->cxl = cxl;
>>   
>>   	return 0;
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
>> index 04e46278464d..1c294cd1df56 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.h
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
>> @@ -20,6 +20,8 @@ struct efx_probe_data;
>>   struct efx_cxl {
>>   	struct cxl_dev_state cxlds;
>>   	struct cxl_memdev *cxlmd;
>> +	struct cxl_attach_region attach_region;
>> +	void __iomem *ctpio_cxl;
>>   };
>>   
>>   int efx_cxl_init(struct efx_probe_data *probe_data);
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 10a9b8fa2f6b..22d9435b351f 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -7,6 +7,7 @@
>>   
>>   #include <linux/node.h>
>>   #include <linux/ioport.h>
>> +#include <linux/pci.h>
>>   #include <cxl/mailbox.h>
>>   
>>   /**
>> @@ -153,6 +154,20 @@ struct cxl_memdev_attach {
>>   	int (*probe)(struct cxl_memdev *cxlmd);
>>   };
>>   
>> +/**
>> + * struct cxl_attach_region - accelerator region handling
>> + * @attach: invoked at cxl_memdev_attach_region() with endpoint device locked.
>> + * @detach: invoked at endpoint release.
>> + * @data: pointer referencing accelerator data for attach and detach calls.
>> + * @region: initialised with autodiscovered region values linked to memdev.
>> + */
>> +struct cxl_attach_region {
>> +	int (*attach)(void *);
>> +	void (*detach)(void *);
>> +	void *data;
>> +	struct range region;
>> +};
> I like your approach here,


Thank you.


>   but have a problem with this. It seems like an attempt to circumvent the cxl_memdev_attach
> struct above, which makes it borderline useless (as of now). Would you be opposed to folding it into cxl_memdev_attach
> instead? I'm thinking doing something like:
>
> /**
>   * struct cxl_memdev_attach - CXL memdev driver and region attach handlers
>   *
>   * @mem_probe: Callback used during cxl_mem::probe for memdev specific setup
>   * @region_attach: Device specific setup invoked during cxl_memdev_attach_region() with endpoint device locked
>   * @region_detach: Device specific cleanup invoked at endpoint release
>   * @region_data: Pointer passed to region_attach and region_detach calls
>   * @region: Memory range covered by autodiscovered region linked to memdev
>   */
> struct cxl_memdev_attach {
> 	int (*mem_probe)(struct cxl_memdev *cxlmd);
> 	int (*region_attach)(void *data);
> 	int (*region_detach)(void *data);
> 	void *region_data;
> 	struct range region;
> };
>
> and then replace cxl_attach_region with cxl_memdev_attach accordingly. I think that's a 3-way win: You get
> what you want to do, Dan's work isn't invalidated, and cxl_memdev_attach becomes a one-stop shop for all
> CXL accelerator attach shenanigans.


I would prefer to have such mem_probe aka attach by DanW decoupled from 
region attachment.


First of all, because they target different situations, endpoint not 
being ready in the first case, dealing with a potential committed 
decoder and auto discovered region in the second.


Also, the mem_probe thing is not necessary for our device and for any 
other one connected to one  CXL Root Bridge as I have been saying since 
Dan added that option (or waiting for another reason behind its 
necessity). I agree it could be necessary for other drivers, so happy to 
have it ... for getting the memdev. Note the current patchset version 
does only cover the case of a region already there, but as the effort 
had since first version, there will be a need for getting a region where 
the caller can decide the size based on not only available CFMWS but 
whatever the driver considers knowing the amount available and the own 
driver configuration. Once you have DRAM in the device, I envision cases 
where the amount of memory used for CXL purposes being only part of it 
because the device/driver configuration decides so, then using some 
portion of that memory for other internal purposes.



>> +
>>   /**
>>    * struct cxl_dev_state - The driver device state
>>    *
>> @@ -231,4 +246,6 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>>   int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
>>   struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
>>   				       const struct cxl_memdev_attach *attach);
>> +struct cxl_region;
>> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd, struct cxl_attach_region *attach);
>>   #endif /* __CXL_CXL_H__ */

  reply	other threads:[~2026-05-01 10:35 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 1/8] sfc: add cxl support alejandro.lucero-palau
2026-04-29 21:14   ` Cheatham, Benjamin
2026-05-01 10:07     ` Alejandro Lucero Palau
2026-04-23 18:05 ` [PATCH v26 2/8] cxl/sfc: Map cxl regs alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 3/8] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 4/8] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2026-04-30 23:23   ` Dan Williams (nvidia)
2026-04-23 18:05 ` [PATCH v26 5/8] sfc: create type2 cxl memdev alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev alejandro.lucero-palau
2026-04-29 21:14   ` Cheatham, Benjamin
2026-05-01 10:35     ` Alejandro Lucero Palau [this message]
2026-05-01  2:00   ` Dan Williams (nvidia)
2026-05-01 10:59     ` Alejandro Lucero Palau
2026-05-02  0:46       ` Dan Williams (nvidia)
2026-05-05 20:51         ` Alejandro Lucero Palau
2026-04-23 18:05 ` [PATCH v26 7/8] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2026-04-29 21:14   ` Cheatham, Benjamin
2026-04-23 18:05 ` [PATCH v26 8/8] sfc: support pio mapping based on cxl alejandro.lucero-palau
2026-04-23 22:07 ` [PATCH v26 0/8] Type2 device basic support Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4277a34e-4edf-44b7-a6a2-05668791e080@amd.com \
    --to=alucerop@amd.com \
    --cc=alejandro.lucero-palau@amd.com \
    --cc=benjamin.cheatham@amd.com \
    --cc=dave.jiang@intel.com \
    --cc=davem@davemloft.net \
    --cc=djbw@kernel.org \
    --cc=edumazet@google.com \
    --cc=edward.cree@amd.com \
    --cc=kuba@kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox