From: Dan Williams <dan.j.williams@intel.com>
To: Manish Honap <mhonap@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Alex Williamson <alex@shazbot.org>,
"jonathan.cameron@huawei.com" <jonathan.cameron@huawei.com>
Cc: "alex@shazbot.org" <alex@shazbot.org>,
Srirangan Madhavan <smadhavan@nvidia.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"dave.jiang@intel.com" <dave.jiang@intel.com>,
"ira.weiny@intel.com" <ira.weiny@intel.com>,
"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
"alison.schofield@intel.com" <alison.schofield@intel.com>,
"dave@stgolabs.net" <dave@stgolabs.net>,
Jeshua Smith <jeshuas@nvidia.com>,
Vikram Sethi <vsethi@nvidia.com>,
Sai Yashwanth Reddy Kancherla <skancherla@nvidia.com>,
Vishal Aslot <vaslot@nvidia.com>,
Shanker Donthineni <sdonthineni@nvidia.com>,
Vidya Sagar <vidyas@nvidia.com>, Jiandi An <jan@nvidia.com>,
Matt Ochs <mochs@nvidia.com>,
Derek Schumacher <dschumacher@nvidia.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Manish Honap <mhonap@nvidia.com>
Subject: RE: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Date: Tue, 17 Mar 2026 10:03:28 -0700 [thread overview]
Message-ID: <69b98960907e9_7ee31003b@dwillia2-mobl4.notmuch> (raw)
In-Reply-To: <IA1PR12MB90304014081461742C3977F4BD41A@IA1PR12MB9030.namprd12.prod.outlook.com>
Manish Honap wrote:
[..]
> > The CXL accelerator series is currently contending with being able to
> > restore device configuration after reset. I expect vfio-cxl to build on
> > that, not push CXL flows into the PCI core.
>
> Hello Dan,
>
> My VFIO CXL Type-2 passthrough series [1] takes a position on this that I
> would like to explain because I expect you will have similar concerns about
> it and I'd rather have this conversation now.
>
> The Type-2 passthrough series takes the opposite structural approach to the
> one you are suggesting here: CXL Type-2 support is an optional extension
> compiled into vfio-pci-core (CONFIG_VFIO_CXL_CORE), not a separate driver.
>
> Here is the reasoning:
>
> 1. Device enumeration
> =====================
>
> CXL Type-2 devices (GPU + accelerator class) are enumerated as struct pci_dev
> objects. The kernel discovers them through PCI config space scan, not through
> the CXL bus. The CXL capability is advertised via the DVSEC (PCI_EXT_CAP_ID
> 0x23, Vendor ID 0x1E98), which is PCI config space. There is no CXL bus
> device to bind to.
>
> A standalone vfio-cxl driver would therefore need to match on the PCI device
> just like vfio-pci does, and then call into vfio-pci-core for every PCI
> concern: config space emulation, BAR region handling, MSI/MSI-X, INTx, DMA
> mapping, FLR, and migration callbacks. That is the variant driver pattern
> we rejected in favour of generic CXL passthrough. We have seen this exact
Lore link for this "rejection" discussion?
> outcome with the prior iterations of this series before we moved to the
> enlightened vfio-pci model.
I still do not understand the argument. CXL functionality is a library
that PCI drivers can use. If vfio-pci functionality is also a library
then vfio-cxl is a driver that uses services from both libraries. Where
the module and driver name boundaries are drawn is more an
organizational decision than a functional one.
The argument for vfio-cxl organizational independence is more about
being able to tell at a diffstat level the relative PCI vs CXL
maintenance impact / regression risk.
> 2. CXL-CORE involvement
> =======================
>
> The CXL Type-2 passthrough series does not bypass the CXL core. At
> vfio_pci_probe() time the CXL enlightenment layer:
>
> - calls cxl_get_hdm_info() to probe the HDM Decoder Capability block,
> - calls cxl_get_committed_decoder() to locate pre-committed firmware regions,
> - calls cxl_create_region() / cxl_request_dpa() for dynamic allocation,
> - creates a struct cxl_memdev via the CXL core (via cxl_probe_component_regs,
> the same path Alejandro's v23 series uses).
>
> The CXL core is fully involved. The difference is that the binding to
> userspace is still through vfio-pci, which already manages the pci_dev
> lifecycle, reset sequencing, and VFIO region/irq API.
Sure, every CXL driver in the system will do the same.
> 3. Standalone vfio-cxl
> ======================
>
> To match the model you are suggesting, vfio-cxl would need to:
>
> (a) Register a new driver on the CXL bus (struct cxl_driver), probing
> struct cxl_memdev or a new struct cxl_endpoint,
What, why? Just like this patch series was proposing to extend the PCI
core with additional common functionality, the proposal is to extend the
CXL core object drivers with the same.
> (b) Re-implement or delegate everything vfio-pci-core provides — config
> space, BAR regions, IRQs, DMA, FLR, and VFIO container management —
> either by calling vfio-pci-core as a library or by duplicating it, and
What is the argument against a library?
> (c) present to userspace through a new device model distinct from
> vfio-pci.
CXL is a distinct operational model. What breaks if userspace is
required to explicitly account for CXL passthrough?
> This is a significant new surface. QEMU's CXL passthrough support already
> builds on vfio-pci: it receives the PCI device via VFIO, reads the
> VFIO_DEVICE_INFO_CAP_CXL capability chain, and exposes the CXL topology.
> A vfio-cxl object model would require non-trivial QEMU changes for something
> that already works in the enlightened vfio-pci model.
What specifically about a kernel code organization choice affects the
QEMU implementation? A uAPI is kernel code organization agnostic.
The concern is designing ourselves into a PCI corner when longterm QEMU
benefits from understanding CXL objects. For example, CXL error handling
/ recovery is already well on its way to being performed in terms of CXL
port objects.
> 4. Module dependency
> ====================
>
> Current solution: CONFIG_VFIO_CXL_CORE depends on CONFIG_CXL_BUS. We do not
> add CXL knowledge to the PCI core;
drivers/pci/cxl.c
> we add it to the VFIO layer that is already CXL_BUS-dependent.
Yes, VFIO layer needs CXL enlightenment and VFIO's requirements imply
wider benefits to other CXL capable devices.
> I would very much appreciate your thoughts on [1] considering the above. I want
> to understand your thoughts on whether vfio-pci-core can remain the single
> entry point from userspace, or whether you envision a new VFIO device type.
>
> Jonathan has indicated he has thoughts on this as well; hopefully, we
> can converge on a direction that doesn't require duplicating vfio-pci-core.
No one is suggesting we "require duplicating vfio-pci-core"; please do
not argue with strawman caricatures like this.
> [1] https://lore.kernel.org/linux-cxl/20260311203440.752648-1-mhonap@nvidia.com/
Will take a look...