From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: <alejandro.lucero-palau@amd.com>
Cc: <linux-cxl@vger.kernel.org>, <netdev@vger.kernel.org>,
<dan.j.williams@intel.com>, <edward.cree@amd.com>,
<davem@davemloft.net>, <kuba@kernel.org>, <pabeni@redhat.com>,
<edumazet@google.com>, <dave.jiang@intel.com>
Subject: Re: [PATCH v20 01/22] cxl/mem: Arrange for always-synchronous memdev attach
Date: Wed, 12 Nov 2025 14:53:41 +0000 [thread overview]
Message-ID: <20251112145341.00005b4e@huawei.com> (raw)
In-Reply-To: <20251110153657.2706192-2-alejandro.lucero-palau@amd.com>
On Mon, 10 Nov 2025 15:36:36 +0000
alejandro.lucero-palau@amd.com wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In preparation for CXL accelerator drivers that have a hard dependency on
> CXL capability initialization, arrange for the endpoint probe result to be
> conveyed to the caller of devm_cxl_add_memdev().
>
> As it stands cxl_pci does not care about the attach state of the cxl_memdev
> because all generic memory expansion functionality can be handled by the
> cxl_core. For accelerators, that driver needs to know perform driver
> specific initialization if CXL is available, or exectute a fallback to PCIe
> only operation.
>
> By moving devm_cxl_add_memdev() to cxl_mem.ko it removes async module
> loading as one reason that a memdev may not be attached upon return from
> devm_cxl_add_memdev().
>
> The diff is busy as this moves cxl_memdev_alloc() down below the definition
> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
> preclude needing to export more symbols from the cxl_core.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Alejandro, read submitting patches again. Whilst the first sign off should
indeed by Dan's this also needs one from you as a 'handler' of the patch.
Be very careful checking these tag chains. If they are wrong no one can
merge the set and it just acts as a silly blocker.
I would have split this up and made the changes to cxl_memdev_alloc in
a precursor patch (use of __free is obvious one) then could have stated
that that was simply moved in this patch.
There are other changes in there that are really hard to spot though
and I think there are some bugs lurking in error paths.
Jonathan
> ---
> drivers/cxl/Kconfig | 2 +-
> drivers/cxl/core/memdev.c | 101 ++++++++++++++------------------------
> drivers/cxl/mem.c | 41 ++++++++++++++++
> drivers/cxl/private.h | 10 ++++
> 4 files changed, 90 insertions(+), 64 deletions(-)
> create mode 100644 drivers/cxl/private.h
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index e370d733e440..14b4601faf66 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -8,6 +8,7 @@
> #include <linux/idr.h>
> #include <linux/pci.h>
> #include <cxlmem.h>
> +#include "private.h"
> #include "trace.h"
> #include "core.h"
>
> @@ -648,42 +649,25 @@ static void detach_memdev(struct work_struct *work)
>
> static struct lock_class_key cxl_memdev_key;
>
> -static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> - const struct file_operations *fops)
> +int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd)
> {
> - struct cxl_memdev *cxlmd;
> - struct device *dev;
> - struct cdev *cdev;
> + struct device *dev = &cxlmd->dev;
> + struct cdev *cdev = &cxlmd->cdev;
> int rc;
>
> - cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
> - if (!cxlmd)
> - return ERR_PTR(-ENOMEM);
> -
> - rc = ida_alloc_max(&cxl_memdev_ida, CXL_MEM_MAX_DEVS - 1, GFP_KERNEL);
> - if (rc < 0)
> - goto err;
> - cxlmd->id = rc;
> - cxlmd->depth = -1;
> -
> - dev = &cxlmd->dev;
> - device_initialize(dev);
> - lockdep_set_class(&dev->mutex, &cxl_memdev_key);
> - dev->parent = cxlds->dev;
> - dev->bus = &cxl_bus_type;
> - dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> - dev->type = &cxl_memdev_type;
> - device_set_pm_not_required(dev);
> - INIT_WORK(&cxlmd->detach_work, detach_memdev);
> -
> - cdev = &cxlmd->cdev;
> - cdev_init(cdev, fops);
> - return cxlmd;
> + rc = cdev_device_add(cdev, dev);
> + if (rc) {
> + /*
> + * The cdev was briefly live, shutdown any ioctl operations that
> + * saw that state.
> + */
> + cxl_memdev_shutdown(dev);
> + return rc;
> + }
>
> -err:
> - kfree(cxlmd);
> - return ERR_PTR(rc);
> + return devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
> }
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_add_or_reset, "CXL");
>
> static long __cxl_memdev_ioctl(struct cxl_memdev *cxlmd, unsigned int cmd,
> unsigned long arg)
> @@ -1051,50 +1035,41 @@ static const struct file_operations cxl_memdev_fops = {
> .llseek = noop_llseek,
> };
>
> -struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> - struct cxl_dev_state *cxlds)
> +struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds)
> {
> - struct cxl_memdev *cxlmd;
> + struct cxl_memdev *cxlmd __free(kfree) =
> + kzalloc(sizeof(*cxlmd), GFP_KERNEL);
Trivial and perhaps not worth the hassle.
I'd pull this out of the declarations block to have
struct device *dev;
struct cdev *cdev;
int rc;
struct cxl_memdev *cxlmd __free(kfree) =
kzalloc(sizeof(*cxlmd), GFP_KERNEL);
if (!cxlmd)
return ERR_PTR(-ENOMEM);
That is treat the __free() related statement as an inline declaration of
the type we only really allow for these.
> struct device *dev;
> struct cdev *cdev;
> int rc;
>
> - cxlmd = cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
> - if (IS_ERR(cxlmd))
> - return cxlmd;
>
> - dev = &cxlmd->dev;
> - rc = dev_set_name(dev, "mem%d", cxlmd->id);
> - if (rc)
> - goto err;
> + if (!cxlmd)
> + return ERR_PTR(-ENOMEM);
>
> - /*
> - * Activate ioctl operations, no cxl_memdev_rwsem manipulation
> - * needed as this is ordered with cdev_add() publishing the device.
> - */
> + rc = ida_alloc_max(&cxl_memdev_ida, CXL_MEM_MAX_DEVS - 1, GFP_KERNEL);
> + if (rc < 0)
> + return ERR_PTR(rc);
> + cxlmd->id = rc;
> + cxlmd->depth = -1;
> cxlmd->cxlds = cxlds;
> cxlds->cxlmd = cxlmd;
These two lines weren't previously in cxl_memdev_alloc()
I'd like a statement in the commit message of why they are now. It seems
harmless because they are still ordered before the add and are
ultimately freed
I'm not immediately spotting why they now are. This whole code shift
and complex diff is enough of a pain I'd be tempted to do the move first
so that we can then see what is actually changed much more easily.
>
> - cdev = &cxlmd->cdev;
> - rc = cdev_device_add(cdev, dev);
> - if (rc)
> - goto err;
> -
> - rc = devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
> - if (rc)
> - return ERR_PTR(rc);
> - return cxlmd;
> + dev = &cxlmd->dev;
> + device_initialize(dev);
> + lockdep_set_class(&dev->mutex, &cxl_memdev_key);
> + dev->parent = cxlds->dev;
> + dev->bus = &cxl_bus_type;
> + dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> + dev->type = &cxl_memdev_type;
> + device_set_pm_not_required(dev);
> + INIT_WORK(&cxlmd->detach_work, detach_memdev);
>
> -err:
> - /*
> - * The cdev was briefly live, shutdown any ioctl operations that
> - * saw that state.
> - */
> - cxl_memdev_shutdown(dev);
> - put_device(dev);
> - return ERR_PTR(rc);
> + cdev = &cxlmd->cdev;
> + cdev_init(cdev, &cxl_memdev_fops);
> + return_ptr(cxlmd);
> }
> -EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_alloc, "CXL");
>
> static void sanitize_teardown_notifier(void *data)
> {
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index d2155f45240d..fa5d901ee817 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -7,6 +7,7 @@
>
> #include "cxlmem.h"
> #include "cxlpci.h"
> +#include "private.h"
>
> /**
> * DOC: cxl mem
> @@ -202,6 +203,45 @@ static int cxl_mem_probe(struct device *dev)
> return devm_add_action_or_reset(dev, enable_suspend, NULL);
> }
>
> +static void __cxlmd_free(struct cxl_memdev *cxlmd)
> +{
> + cxlmd->cxlds->cxlmd = NULL;
> + put_device(&cxlmd->dev);
> + kfree(cxlmd);
> +}
> +
> +DEFINE_FREE(cxlmd_free, struct cxl_memdev *, __cxlmd_free(_T))
> +
> +/**
> + * devm_cxl_add_memdev - Add a CXL memory device
> + * @host: devres alloc/release context and parent for the memdev
> + * @cxlds: CXL device state to associate with the memdev
> + *
> + * Upon return the device will have had a chance to attach to the
> + * cxl_mem driver, but may fail if the CXL topology is not ready
> + * (hardware CXL link down, or software platform CXL root not attached)
> + */
> +struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> + struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds);
> + int rc;
> +
> + if (IS_ERR(cxlmd))
> + return cxlmd;
> +
> + rc = dev_set_name(&cxlmd->dev, "mem%d", cxlmd->id);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + rc = devm_cxl_memdev_add_or_reset(host, cxlmd);
> + if (rc)
> + return ERR_PTR(rc);
Is the reference tracking right here? If the above call fails
then it is possible cxl_memdev_unregister() has been called
or just cxl_memdev_shutdown().
If nothing else (and I suspect there is worse but haven't
counted references) that will set
cxlmd->cxlds = NULL;
s part of cxl_memdev_shutdown()
The __cxlmd_free() then dereferences that and boom.
> +
> + return no_free_ptr(cxlmd);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
next prev parent reply other threads:[~2025-11-12 14:53 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-10 15:36 [PATCH v20 00/22] Type2 device basic support alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 01/22] cxl/mem: Arrange for always-synchronous memdev attach alejandro.lucero-palau
2025-11-12 14:53 ` Jonathan Cameron [this message]
2025-11-14 11:10 ` Alejandro Lucero Palau
2025-11-14 15:24 ` Dave Jiang
2025-11-10 15:36 ` [PATCH v20 02/22] cxl/port: Arrange for always synchronous endpoint attach alejandro.lucero-palau
2025-11-12 14:57 ` Jonathan Cameron
2025-11-13 23:01 ` Dave Jiang
2025-11-10 15:36 ` [PATCH v20 03/22] cxl/mem: Introduce a memdev creation ->probe() operation alejandro.lucero-palau
2025-11-12 15:00 ` Jonathan Cameron
2025-11-13 23:02 ` Dave Jiang
2025-11-10 15:36 ` [PATCH v20 04/22] cxl: Add type2 device basic support alejandro.lucero-palau
2025-11-12 15:33 ` Jonathan Cameron
2025-11-15 8:11 ` Alejandro Lucero Palau
2025-11-10 15:36 ` [PATCH v20 05/22] sfc: add cxl support alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 06/22] cxl: Move pci generic code alejandro.lucero-palau
2025-11-12 15:41 ` Jonathan Cameron
2025-11-15 8:12 ` Alejandro Lucero Palau
2025-11-17 15:00 ` Dave Jiang
2025-11-18 14:52 ` Alejandro Lucero Palau
2025-11-14 0:25 ` Alison Schofield
2025-11-14 16:15 ` Dave Jiang
2025-11-15 8:16 ` Alejandro Lucero Palau
2025-11-16 2:07 ` Alison Schofield
2025-11-18 14:55 ` Alejandro Lucero Palau
2025-11-10 15:36 ` [PATCH v20 07/22] cxl/sfc: Map cxl component regs alejandro.lucero-palau
2025-11-12 15:45 ` Jonathan Cameron
2025-11-12 15:52 ` Jonathan Cameron
2025-11-10 15:36 ` [PATCH v20 08/22] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
2025-11-12 15:52 ` Jonathan Cameron
2025-11-10 15:36 ` [PATCH v20 09/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 10/22] sfc: create type2 cxl memdev alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-11-12 16:10 ` Jonathan Cameron
2025-11-19 17:16 ` Alejandro Lucero Palau
2025-11-10 15:36 ` [PATCH v20 12/22] sfc: get root decoder alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 14/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-11-13 23:52 ` Dave Jiang
2025-11-10 15:36 ` [PATCH v20 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-11-12 16:19 ` Jonathan Cameron
2025-11-19 18:31 ` Alejandro Lucero Palau
2025-11-14 0:00 ` Dave Jiang
2025-11-10 15:36 ` [PATCH v20 19/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 20/22] sfc: create cxl region alejandro.lucero-palau
2025-11-12 16:21 ` Jonathan Cameron
2025-11-14 0:02 ` Dave Jiang
2025-11-10 15:36 ` [PATCH v20 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-11-10 15:36 ` [PATCH v20 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
2025-11-12 16:24 ` Jonathan Cameron
2025-11-14 0:03 ` Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251112145341.00005b4e@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=alejandro.lucero-palau@amd.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=edward.cree@amd.com \
--cc=kuba@kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).