public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Alejandro Lucero Palau <alucerop@amd.com>
To: Dan Williams <dan.j.williams@intel.com>,
	alejandro.lucero-palau@amd.com, linux-cxl@vger.kernel.org,
	netdev@vger.kernel.org, dave.jiang@intel.com,
	edward.cree@amd.com, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com
Cc: Martin Habets <habetsm.xilinx@gmail.com>,
	Fan Ni <fan.ni@samsung.com>, Edward Cree <ecree.xilinx@gmail.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH v25 05/11] sfc: create type2 cxl memdev
Date: Wed, 1 Apr 2026 11:16:17 +0100	[thread overview]
Message-ID: <f0f5a510-4073-444c-9f05-7153d0b2087b@amd.com> (raw)
In-Reply-To: <69ccaa8448da1_1b0cc61009c@dwillia2-mobl4.notmuch>


On 4/1/26 06:17, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl API for creating a cxl memory device using the type2
>> cxl_dev_state struct.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Reviewed-by: Fan Ni <fan.ni@samsung.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>   drivers/net/ethernet/sfc/efx_cxl.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 6619084a77d8..63e6f277ae9f 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -75,6 +75,12 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>   		return -ENODEV;
>>   	}
>>   
>> +	cxl->cxlmd = devm_cxl_add_memdev(&cxl->cxlds, NULL);
> Did this forget about:
>
> 29317f8dc6ed cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
>
> ?


I did not. The idea behind attach does not make sense for sfc, so I do 
not use that option. I did discuss this with you at LPC and based on 
previous comments which had not replies (case 1 in here):


https://lore.kernel.org/linux-cxl/836d06d6-a36f-4ba3-b7c9-ba8687ba2190@amd.com/


 From our conversation at LPC, there is no reason for a CXL device 
connected to the CXL Root Bridge not having its memdev properly 
initialized. I did ask you specifically about this, and you mentioned 
links on CXL switches not ready, but as I said then and repeat now, this 
is not what sfc driver expects at all. Only a port from a CXL Root 
Bridge makes sense due to the latencies involved with more complex CXL 
topologies.


So, from that list I wrote in that old thread where I tried to summarize 
the problems and clarify the confusion behind different concerns, the 
only issue is someone removing cxl_acpi. I do not think that should be 
possible at all, so something should be added for other modules 
depending on it avoiding its removal. I would say any CXL port created 
is (in X86) dependent on cxl_acpi, so something at port creation 
invoking a exported cxl_acpi function could do it, like a cxl_acpi_users 
counter. This could also help for visibility to user space about its usage.


Therefore, I do not like the changes you propose here. But, if this 
proposal is the only way this can go through in time for 7.1, I will 
resign myself to it. It is not clear I'll be working on CXL in the near 
future, so any progress even without my agreement is better than nothing.


> tl;dr: see the replacement patch below for this and the next several
> patches. Untested! I will write a cxl_test for it tomorrow. For now,
> take it as a "something like this".
>
> ---
> ...longer story.
>
> The difference between general memory expansion and other CXL memory is
> that the driver has a use case for the memory. For general memory
> expansion the memdev *is* the use case. A NULL attach parameter means
> there is value in continuing if the memdev is not attached to the CXL
> domain or not mapped in any region.
>
> A non-NULL @attach addresses all the caller's requirements at attach
> time. It also arranges by default to trigger ->remove() if anything in
> the CXL topology is disturbed while the region is in use. Anything from
> hotplug, to modprobe -r cxl_acpi, or cxl disable-port.
>
> For efx a finer grained failure mode can be built on top of this by
> adding a remove action that something like efx_tx_may_pio() could use to
> dynamically fallback away from CXL if something goes wrong.
>
> Until then though the CXL subsystem, because CXL allows for dynamic
> memory reassignment, goes to great lengths to ensure that it can all be
> dynamically torn down. So consumers that register relative to a cxl_port
> topology need to be prepared for that port hierarchy to be torn down and
> evacuate their usage of any regions.
>
> The rest of the patches in this set are racy because none of the results
> are stable (locks dropped on return) and any teardown is silent (no
> notification of teardown).
>
> All of this wants to stay local to the cxl_core and should not be
> anything that accelerator drivers need to worry about. Accelerator
> drivers just register, get a physical address range to map, and off they
> go.
>
> The meat of this patch is cxl_memdev_attach_region(). The rest of is
> protection against userspace messing up the region configuration.
>
> ---
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
> index 961639cef692..bfd8633a1e01 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.h
> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
> @@ -24,10 +24,7 @@ struct efx_probe_data;
>   struct efx_cxl {
>   	struct cxl_dev_state cxlds;
>   	struct cxl_memdev *cxlmd;
> -	struct cxl_root_decoder *cxlrd;
> -	struct cxl_port *endpoint;
> -	struct cxl_endpoint_decoder *cxled;
> -	struct cxl_region *efx_region;
> +	struct cxl_attach_region attach;
>   	void __iomem *ctpio_cxl;
>   };
>   
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 10a9b8fa2f6b..1698d15ec1ca 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -153,6 +153,22 @@ struct cxl_memdev_attach {
>   	int (*probe)(struct cxl_memdev *cxlmd);
>   };
>   
> +/**
> + * struct cxl_attach_region - coordinate mapping a region at memdev registration
> + * @attach: common core attachment descriptor
> + * @region: physical address range of the region
> + *
> + * For the common simple case of a CXL device with private (non-general purpose
> + * / "accelerator") memory, enumerate firmware instantiated region, or
> + * instantiate a region for the device's capacity. Destroy the region on detach.
> + */
> +struct cxl_attach_region {
> +	struct cxl_memdev_attach attach;
> +	struct range region;
> +};
> +
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd);
> +
>   /**
>    * struct cxl_dev_state - The driver device state
>    *
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3edb5703d6de..5d60e6c0a89e 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -406,6 +406,40 @@ static int __commit(struct cxl_region *cxlr)
>   	return 0;
>   }
>   
> +/*
> + * When a region's memdevs specify an @attach method the attach provider is
> + * responsible for dispositioning the region for both probe and userspace
> + * management
> + */
> +static bool cxl_region_has_memdev_attach(struct cxl_region *cxlr)
> +{
> +	struct cxl_region_params *p = &cxlr->params;
> +
> +	for (int i = 0; i < p->nr_targets; i++) {
> +		struct cxl_endpoint_decoder *cxled = p->targets[i];
> +		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +
> +		if (cxlmd->attach)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static int check_region_mutable(struct cxl_region *cxlr)
> +{
> +	int rc;
> +
> +	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
> +	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
> +		return rc;
> +
> +	if (cxl_region_has_memdev_attach(cxlr))
> +		return -EBUSY;
> +
> +	return 0;
> +}
> +
>   static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
>   			    const char *buf, size_t len)
>   {
> @@ -418,6 +452,10 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
>   	if (rc)
>   		return rc;
>   
> +	rc = check_region_mutable(cxlr);
> +	if (rc)
> +		return rc;
> +
>   	if (commit) {
>   		rc = __commit(cxlr);
>   		if (rc)
> @@ -2768,17 +2806,23 @@ static ssize_t delete_region_store(struct device *dev,
>   {
>   	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>   	struct cxl_port *port = to_cxl_port(dev->parent);
> -	struct cxl_region *cxlr;
> +	int rc;
>   
> -	cxlr = cxl_find_region_by_name(cxlrd, buf);
> +	struct cxl_region *cxlr __free(put_cxl_region) =
> +		cxl_find_region_by_name(cxlrd, buf);
>   	if (IS_ERR(cxlr))
>   		return PTR_ERR(cxlr);
>   
> +	rc = check_region_mutable(cxlr);
> +	if (rc)
> +		return rc;
> +
>   	devm_release_action(port->uport_dev, unregister_region, cxlr);
> -	put_device(&cxlr->dev);
>   
>   	return len;
>   }
> +
> +
>   DEVICE_ATTR_WO(delete_region);
>   
>   static void cxl_pmem_region_release(struct device *dev)
> @@ -4223,6 +4267,88 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
>   	return 0;
>   }
>   
> +static int first_mapped_decoder(struct device *dev, const void *data)
> +{
> +	struct cxl_endpoint_decoder *cxled;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if (cxled->cxld.region)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * As this is running in endpoint port remove context it does not race cxl_root
> + * destruction since port topologies are always removed depth first.
> + */
> +static void cxl_endpoint_region_autoremove(void *_cxlr)
> +{
> +	struct cxl_region *cxlr = _cxlr;
> +	struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> +	struct cxl_port *port = cxlrd_to_port(cxlrd);
> +
> +	devm_release_action(port->uport_dev, unregister_region, cxlr);
> +}
> +
> +/*
> + * Runs in cxl_mem_probe context after successful endpoint probe, assumes the
> + * simple case of single mapped decoder per memdev.
> + */
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd)
> +{
> +	struct cxl_attach_region *attach =
> +		container_of(cxlmd->attach, typeof(*attach), attach);
> +	struct cxl_port *endpoint = cxlmd->endpoint;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_region *cxlr;
> +	int rc;
> +
> +	/* hold endpoint lock to setup autoremove of the region */
> +	guard(device)(&endpoint->dev);
> +	if (!endpoint->dev.driver)
> +		return -ENXIO;
> +	guard(rwsem_read)(&cxl_rwsem.region);
> +	guard(rwsem_read)(&cxl_rwsem.dpa);
> +
> +	/*
> +	 * TODO auto-instantiate a region, for now assume this will find an
> +	 * auto-region
> +	 */
> +	struct device *dev __free(put_device) =
> +		device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
> +
> +	if (!dev) {
> +		dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
> +			dev_name(&cxlmd->dev));
> +		return -ENXIO;
> +	}
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	cxlr = cxled->cxld.region;
> +	rc = devm_add_action_or_reset(&endpoint->dev,
> +				      cxl_endpoint_region_autoremove, cxlr);
> +	if (rc)
> +		return rc;
> +
> +	if (cxlr->params.state < CXL_CONFIG_COMMIT) {
> +		dev_dbg(cxlmd->cxlds->dev,
> +			"region %s not committed for memdev %s\n",
> +			dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
> +		return -ENXIO;
> +	}
> +
> +	attach->region = (struct range) {
> +		.start = cxlr->params.res->start,
> +		.end = cxlr->params.res->end,
> +	};
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
> +
>   static int cxl_region_probe(struct device *dev)
>   {
>   	struct cxl_region *cxlr = to_cxl_region(dev);
> @@ -4254,6 +4380,9 @@ static int cxl_region_probe(struct device *dev)
>   	if (rc)
>   		return rc;
>   
> +	if (cxl_region_has_memdev_attach(cxlr))
> +		return 0;
> +
>   	switch (cxlr->mode) {
>   	case CXL_PARTMODE_PMEM:
>   		rc = devm_cxl_region_edac_register(cxlr);
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 6619084a77d8..a388f3120dd4 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -75,6 +75,25 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>   		return -ENODEV;
>   	}
>   
> +	cxl->attach = (struct cxl_attach_region) {
> +		.attach = {
> +			.probe = cxl_memdev_attach_region,
> +		},
> +	};
> +	cxl->cxlmd = devm_cxl_add_memdev(&cxl->cxlds, &cxl->attach.attach);
> +	if (IS_ERR(cxl->cxlmd)) {
> +		pci_err(pci_dev, "CXL accel memdev creation failed");
> +		return PTR_ERR(cxl->cxlmd);
> +	}
> +
> +	cxl->ctpio_cxl = ioremap(cxl->attach.region.start,
> +				 range_len(&cxl->attach.region));
> +	if (!cxl->ctpio_cxl) {
> +		pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
> +		return -ENOMEM;
> +	}
> +
> +
>   	probe_data->cxl = cxl;
>   
>   	return 0;
> @@ -82,6 +101,10 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>   
>   void efx_cxl_exit(struct efx_probe_data *probe_data)
>   {
> +	if (!probe_data->cxl)
> +		return;
> +
> +	iounmap(probe_data->cxl->ctpio_cxl);
>   }
>   
>   MODULE_IMPORT_NS("CXL");

  reply	other threads:[~2026-04-01 10:16 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-30 14:38 [PATCH v25 00/11] Type2 device basic support alejandro.lucero-palau
2026-03-30 14:38 ` [PATCH v25 01/11] sfc: add cxl support alejandro.lucero-palau
2026-03-31  3:37   ` Dan Williams
2026-03-30 14:38 ` [PATCH v25 02/11] cxl/sfc: Map cxl regs alejandro.lucero-palau
2026-03-30 14:38 ` [PATCH v25 03/11] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
2026-03-30 14:38 ` [PATCH v25 04/11] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2026-03-31  3:46   ` Dan Williams
2026-03-30 14:38 ` [PATCH v25 05/11] sfc: create type2 cxl memdev alejandro.lucero-palau
2026-03-31 16:47   ` kernel test robot
2026-04-01  5:17   ` Dan Williams
2026-04-01 10:16     ` Alejandro Lucero Palau [this message]
2026-04-01 21:53       ` Dan Williams
2026-04-02  6:30         ` Alejandro Lucero Palau
2026-04-02 18:32           ` Dan Williams
2026-03-30 14:38 ` [PATCH v25 06/11] cxl/hdm: Add support for getting region from committed decoder alejandro.lucero-palau
2026-04-01  5:18   ` Dan Williams
2026-03-30 14:38 ` [PATCH v25 07/11] cxl: Add function for obtaining region range alejandro.lucero-palau
2026-04-01  5:20   ` Dan Williams
2026-03-30 14:38 ` [PATCH v25 08/11] cxl: Export function for unwinding cxl by accelerators alejandro.lucero-palau
2026-04-01  5:21   ` Dan Williams
2026-03-30 14:38 ` [PATCH v25 09/11] sfc: obtain decoder and region if committed by firmware alejandro.lucero-palau
2026-03-31 16:23   ` kernel test robot
2026-03-30 14:38 ` [PATCH v25 10/11] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2026-04-01  5:27   ` Dan Williams
2026-03-30 14:38 ` [PATCH v25 11/11] sfc: support pio mapping based on cxl alejandro.lucero-palau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f0f5a510-4073-444c-9f05-7153d0b2087b@amd.com \
    --to=alucerop@amd.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alejandro.lucero-palau@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=davem@davemloft.net \
    --cc=ecree.xilinx@gmail.com \
    --cc=edumazet@google.com \
    --cc=edward.cree@amd.com \
    --cc=fan.ni@samsung.com \
    --cc=habetsm.xilinx@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox