From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Dave Jiang <dave.jiang@intel.com>
Cc: <mhonap@nvidia.com>, <aniketa@nvidia.com>, <ankita@nvidia.com>,
<alwilliamson@nvidia.com>, <vsethi@nvidia.com>, <jgg@nvidia.com>,
<mochs@nvidia.com>, <skolothumtho@nvidia.com>,
<alejandro.lucero-palau@amd.com>, <dave@stgolabs.net>,
<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
<ira.weiny@intel.com>, <dan.j.williams@intel.com>, <jgg@ziepe.ca>,
<yishaih@nvidia.com>, <kevin.tian@intel.com>, <cjia@nvidia.com>,
<targupta@nvidia.com>, <zhiw@nvidia.com>, <kjaju@nvidia.com>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
<kvm@vger.kernel.org>
Subject: Re: [PATCH 09/20] vfio/cxl: Implement CXL device detection and HDM register probing
Date: Fri, 13 Mar 2026 12:43:42 +0000 [thread overview]
Message-ID: <20260313124342.00001add@huawei.com> (raw)
In-Reply-To: <26be5ee3-d7ba-4bea-9824-c514341cf324@intel.com>
On Thu, 12 Mar 2026 15:31:03 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> On 3/11/26 1:34 PM, mhonap@nvidia.com wrote:
> > From: Manish Honap <mhonap@nvidia.com>
> >
> > Implement the core CXL Type-2 device detection and component register
> > probing logic in vfio_pci_cxl_detect_and_init().
> >
> > Three private helpers are introduced:
> >
> > vfio_cxl_create_device_state() allocates the per-device
> > vfio_pci_cxl_state structure using devm_cxl_dev_state_create() so
> > that lifetime is tied to the PCI device binding.
> >
> > vfio_cxl_find_bar() locates the PCI BAR that contains a given HPA
> > range, returning the BAR index and offset within it.
> >
> > vfio_cxl_setup_regs() uses the CXL core helpers cxl_find_regblock()
> > and cxl_probe_component_regs() to enumerate the HDM decoder register
> > block, then records its BAR index, offset and size in the CXL state.
> >
> > vfio_pci_cxl_detect_and_init() orchestrates detection:
> > 1. Check for CXL DVSEC via pcie_is_cxl() + pci_find_dvsec_capability().
> > 2. Allocate CXL device state.
> > 3. Temporarily call pci_enable_device_mem() for ioremap, then disable.
> > 4. Probe component registers to find the HDM decoder block.
> >
> > On any failure vdev->cxl is devm_kfree'd so that device falls back to
> > plain PCI mode transparently.
> >
> > Signed-off-by: Manish Honap <mhonap@nvidia.com>
Given Dave didn't crop anything I'll just reply on top and avoid duplication.
Mostly lifetime handling comments. I get nervous when devm occurs in the middle
of non devm code. It needs to be done very carefully. I don't think you
have a bug here, but I'm not keen on the resulting difference in order of setup
and tear down. I'd like cleanup to tidy it up even though it would be safe
to do later.
> > ---
> > drivers/vfio/pci/cxl/vfio_cxl_core.c | 151 +++++++++++++++++++++++++++
> > drivers/vfio/pci/cxl/vfio_cxl_priv.h | 8 ++
> > 2 files changed, 159 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
> > index 7698d94e16be..2da6da1c0605 100644
> > --- a/drivers/vfio/pci/cxl/vfio_cxl_core.c
> > +++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
> > @@ -18,6 +18,114 @@
> >
> > MODULE_IMPORT_NS("CXL");
> >
> > +static int vfio_cxl_create_device_state(struct vfio_pci_core_device *vdev,
> > + u16 dvsec)
> > +{
> > + struct pci_dev *pdev = vdev->pdev;
> > + struct device *dev = &pdev->dev;
> > + struct vfio_pci_cxl_state *cxl;
> > + bool cxl_mem_capable, is_cxl_type3;
> > + u16 cap_word;
> > +
> > + /*
> > + * The devm allocation for the CXL state remains for the entire time
> > + * the PCI device is bound to vfio-pci. From successful CXL init
> > + * in probe until the device is released on unbind.
> > + * No extra explicit free is needed; devm handles it when
> > + * pdev->dev is released.
> > + */
> > + vdev->cxl = devm_cxl_dev_state_create(dev,
Rather than assigning this in here, I'd use a local variable for
the return of this, operate on that and return it from the function.
That both creates a clean separation and possibly make error
handling simpler later...
Also similar to some of the feedback Alejandro had on his type 2
series, be very careful with mixing devm calls and non devm,
generally that's a path to hard to read and reason about code.
I know you've stated this is fine because it's tied to the
PCI device lifetime and you are probably right on that, but
I'm not keen.
This might be a case where manual unwinding of devres is needed
(maybe using devres groups if we end up with a bunch of stuff
to undo).
> > + CXL_DEVTYPE_DEVMEM,
> > + pdev->dev.id, dvsec,
> > + struct vfio_pci_cxl_state,
> > + cxlds, false);
> > + if (!vdev->cxl)
> > + return -ENOMEM;
> > +
> > + cxl = vdev->cxl;
> > + cxl->dvsec = dvsec;
That's a bit odd given we pass dvsec into devm_cxl_dev_state_create()
why doesn't it assign it?
> > +
> > + pci_read_config_word(pdev, dvsec + CXL_DVSEC_CAPABILITY_OFFSET,
> > + &cap_word);
> > +
> > + cxl_mem_capable = !!(cap_word & CXL_DVSEC_MEM_CAPABLE);
> > + is_cxl_type3 = ((pdev->class >> 8) == PCI_CLASS_MEMORY_CXL);
>
> Both of these can use FIELD_GET().
>
> > +
> > + /*
> > + * Type 2 = CXL memory capable but NOT Type 3 (e.g. accelerator/GPU)
> > + * Unsupported for non cxl type-2 class of devices.
As in other places, type 3 doesn't mean that. What you mean is class
code compliant type 3.
> > + */
> > + if (!(cxl_mem_capable && !is_cxl_type3)) {
> > + devm_kfree(&pdev->dev, vdev->cxl);
As below. That needs a name that makes it clear it is the right devm call.
> > + vdev->cxl = NULL;
> > + return -ENODEV;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int vfio_cxl_setup_regs(struct vfio_pci_core_device *vdev)
> > +{
> > + struct vfio_pci_cxl_state *cxl = vdev->cxl;
> > + struct cxl_register_map *map = &cxl->cxlds.reg_map;
> > + resource_size_t offset, bar_offset, size;
> > + struct pci_dev *pdev = vdev->pdev;
> > + void __iomem *base;
> > + u32 count;
> > + int ret;
> > + u8 bar;
> > +
> > + if (WARN_ON_ONCE(!pci_is_enabled(pdev)))
> > + return -EINVAL;
> > +
> > + /* Find component register block via Register Locator DVSEC */
> > + ret = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, map);
> > + if (ret)
> > + return ret;
> > +
> > + /* Temporarily map the register block */
> > + base = ioremap(map->resource, map->max_size);
>
> Request the mem region before mapping it?
>
> DJ
>
> > + if (!base)
> > + return -ENOMEM;
> > +
> > + /* Probe component register capabilities */
> > + cxl_probe_component_regs(&pdev->dev, base, &map->component_map);
> > +
> > + /* Unmap immediately */
> > + iounmap(base);
> > +
> > + /* Check if HDM decoder was found */
> > + if (!map->component_map.hdm_decoder.valid)
> > + return -ENODEV;
> > +
> > + pci_dbg(pdev,
> > + "vfio_cxl: HDM decoder at offset=0x%lx, size=0x%lx\n",
> > + map->component_map.hdm_decoder.offset,
> > + map->component_map.hdm_decoder.size);
> > +
> > + /* Get HDM register info */
> > + ret = cxl_get_hdm_reg_info(&cxl->cxlds, &count, &offset, &size);
> > + if (ret)
> > + return ret;
> > +
> > + if (!count || !size)
> > + return -ENODEV;
> > +
> > + cxl->hdm_count = count;
> > + cxl->hdm_reg_offset = offset;
> > + cxl->hdm_reg_size = size;
> > +
> > + ret = cxl_regblock_get_bar_info(map, &bar, &bar_offset);
> > + if (ret)
> > + return ret;
> > +
> > + cxl->comp_reg_bar = bar;
> > + cxl->comp_reg_offset = bar_offset;
> > + cxl->comp_reg_size = CXL_COMPONENT_REG_BLOCK_SIZE;
> > +
> > + return 0;
> > +}
> > +
> > /**
> > * vfio_pci_cxl_detect_and_init - Detect and initialize CXL Type-2 device
> > * @vdev: VFIO PCI device
> > @@ -28,8 +136,51 @@ MODULE_IMPORT_NS("CXL");
> > */
> > void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
> > {
> > + struct pci_dev *pdev = vdev->pdev;
> > + struct vfio_pci_cxl_state *cxl;
> > + u16 dvsec;
> > + int ret;
> > +
> > + if (!pcie_is_cxl(pdev))
> > + return;
> > +
> > + dvsec = pci_find_dvsec_capability(pdev,
> > + PCI_VENDOR_ID_CXL,
> > + PCI_DVSEC_CXL_DEVICE);
> > + if (!dvsec)
> > + return;
> > +
> > + ret = vfio_cxl_create_device_state(vdev, dvsec);
Suggestion above would lead to.
cxl = vfio_cxl_create_device_state(vdev, dvsec);
if (IS_ERR(cxl))
return PTR_ERR(cxl); //assuming failing at this point is fatal.
then only set vdev->cxl once you are sure this function succeeded.
Thus removing the need to set it to NULL on failure.
You need to pass it into a few more calls though.
> > + if (ret)
> > + return;
> > +
> > + cxl = vdev->cxl;
> > +
> > + /*
> > + * Required for ioremap of the component register block and
> > + * calls to cxl_probe_component_regs().
> > + */
> > + ret = pci_enable_device_mem(pdev);
> > + if (ret)
> > + goto failed;
> > +
> > + ret = vfio_cxl_setup_regs(vdev);
> > + if (ret) {
> > + pci_disable_device(pdev);
> > + goto failed;
> > + }
> > +
> > + pci_disable_device(pdev);
> > +
AS above, I think this would be cleaner if only here do we have
vdev->cxl = cxl;
> > + return;
> > +
> > +failed:
> > + devm_kfree(&pdev->dev, vdev->cxl);
If you get here you found a CXL device but couldn't handle it. Is it useful
to continue? I'd suggest probably not. If returning an error then devm
cleanup happens. Also, we should have a wrapper around devm_kfree to
make it clear that it is valid to use it for undoing the
devm_cxl_dev_state_create() (I haven't even checked it is valid)
> > + vdev->cxl = NULL;
> > }
if /* __LINUX_VFIO_CXL_PRIV_H */
>
>
next prev parent reply other threads:[~2026-03-13 12:43 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-11 20:34 [PATCH 00/20] vfio/pci: Add CXL Type-2 device passthrough support mhonap
2026-03-11 20:34 ` [PATCH 01/20] cxl: Introduce cxl_get_hdm_reg_info() mhonap
2026-03-12 11:28 ` Jonathan Cameron
2026-03-12 16:33 ` Dave Jiang
2026-03-11 20:34 ` [PATCH 02/20] cxl: Expose cxl subsystem specific functions for vfio mhonap
2026-03-12 16:49 ` Dave Jiang
2026-03-13 10:05 ` Manish Honap
2026-03-11 20:34 ` [PATCH 03/20] cxl: Move CXL spec defines to public header mhonap
2026-03-13 12:18 ` Jonathan Cameron
2026-03-13 16:56 ` Dave Jiang
2026-03-18 14:56 ` Jonathan Cameron
2026-03-18 17:51 ` Manish Honap
2026-03-11 20:34 ` [PATCH 04/20] cxl: Media ready check refactoring mhonap
2026-03-12 20:29 ` Dave Jiang
2026-03-13 10:05 ` Manish Honap
2026-03-11 20:34 ` [PATCH 05/20] cxl: Expose BAR index and offset from register map mhonap
2026-03-12 20:58 ` Dave Jiang
2026-03-13 10:11 ` Manish Honap
2026-03-11 20:34 ` [PATCH 06/20] vfio/cxl: Add UAPI for CXL Type-2 device passthrough mhonap
2026-03-12 21:04 ` Dave Jiang
2026-03-11 20:34 ` [PATCH 07/20] vfio/pci: Add CXL state to vfio_pci_core_device mhonap
2026-03-11 20:34 ` [PATCH 08/20] vfio/pci: Add vfio-cxl Kconfig and build infrastructure mhonap
2026-03-13 12:27 ` Jonathan Cameron
2026-03-18 17:21 ` Manish Honap
2026-03-11 20:34 ` [PATCH 09/20] vfio/cxl: Implement CXL device detection and HDM register probing mhonap
2026-03-12 22:31 ` Dave Jiang
2026-03-13 12:43 ` Jonathan Cameron [this message]
2026-03-18 17:43 ` Manish Honap
2026-03-11 20:34 ` [PATCH 10/20] vfio/cxl: CXL region management mhonap
2026-03-12 22:55 ` Dave Jiang
2026-03-13 12:52 ` Jonathan Cameron
2026-03-18 17:48 ` Manish Honap
2026-03-11 20:34 ` [PATCH 11/20] vfio/cxl: Expose DPA memory region to userspace with fault+zap mmap mhonap
2026-03-13 17:07 ` Dave Jiang
2026-03-18 17:54 ` Manish Honap
2026-03-11 20:34 ` [PATCH 12/20] vfio/pci: Export config access helpers mhonap
2026-03-11 20:34 ` [PATCH 13/20] vfio/cxl: Introduce HDM decoder register emulation framework mhonap
2026-03-13 19:05 ` Dave Jiang
2026-03-18 17:58 ` Manish Honap
2026-03-11 20:34 ` [PATCH 14/20] vfio/cxl: Check media readiness and create CXL memdev mhonap
2026-03-11 20:34 ` [PATCH 15/20] vfio/cxl: Introduce CXL DVSEC configuration space emulation mhonap
2026-03-13 22:07 ` Dave Jiang
2026-03-18 18:41 ` Manish Honap
2026-03-11 20:34 ` [PATCH 16/20] vfio/pci: Expose CXL device and region info via VFIO ioctl mhonap
2026-03-11 20:34 ` [PATCH 17/20] vfio/cxl: Provide opt-out for CXL feature mhonap
2026-03-11 20:34 ` [PATCH 18/20] docs: vfio-pci: Document CXL Type-2 device passthrough mhonap
2026-03-13 12:13 ` Jonathan Cameron
2026-03-17 21:24 ` Alex Williamson
2026-03-19 16:06 ` Jonathan Cameron
2026-03-23 14:36 ` Manish Honap
2026-03-11 20:34 ` [PATCH 19/20] selftests/vfio: Add CXL Type-2 passthrough tests mhonap
2026-03-11 20:34 ` [PATCH 20/20] selftests/vfio: Fix VLA initialisation in vfio_pci_irq_set() mhonap
2026-03-13 22:23 ` Dave Jiang
2026-03-18 18:07 ` Manish Honap
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260313124342.00001add@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=alejandro.lucero-palau@amd.com \
--cc=alison.schofield@intel.com \
--cc=alwilliamson@nvidia.com \
--cc=aniketa@nvidia.com \
--cc=ankita@nvidia.com \
--cc=cjia@nvidia.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jgg@nvidia.com \
--cc=jgg@ziepe.ca \
--cc=kevin.tian@intel.com \
--cc=kjaju@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhonap@nvidia.com \
--cc=mochs@nvidia.com \
--cc=skolothumtho@nvidia.com \
--cc=targupta@nvidia.com \
--cc=vishal.l.verma@intel.com \
--cc=vsethi@nvidia.com \
--cc=yishaih@nvidia.com \
--cc=zhiw@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox