public inbox for linux-cxl@vger.kernel.org
From: Dan Williams <dan.j.williams@intel.com>
To: Manish Honap <mhonap@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Alex Williamson <alex@shazbot.org>,
	"jonathan.cameron@huawei.com" <jonathan.cameron@huawei.com>
Cc: "alex@shazbot.org" <alex@shazbot.org>,
	Srirangan Madhavan <smadhavan@nvidia.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	Jeshua Smith <jeshuas@nvidia.com>,
	Vikram Sethi <vsethi@nvidia.com>,
	Sai Yashwanth Reddy Kancherla <skancherla@nvidia.com>,
	Vishal Aslot <vaslot@nvidia.com>,
	Shanker Donthineni <sdonthineni@nvidia.com>,
	Vidya Sagar <vidyas@nvidia.com>, Jiandi An <jan@nvidia.com>,
	Matt Ochs <mochs@nvidia.com>,
	Derek Schumacher <dschumacher@nvidia.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Manish Honap <mhonap@nvidia.com>
Subject: RE: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Date: Tue, 17 Mar 2026 10:03:28 -0700
Message-ID: <69b98960907e9_7ee31003b@dwillia2-mobl4.notmuch>
In-Reply-To: <IA1PR12MB90304014081461742C3977F4BD41A@IA1PR12MB9030.namprd12.prod.outlook.com>

Manish Honap wrote:
[..]
> > The CXL accelerator series is currently contending with being able to
> > restore device configuration after reset. I expect vfio-cxl to build on
> > that, not push CXL flows into the PCI core.
> 
> Hello Dan,
> 
> My VFIO CXL Type-2 passthrough series [1] takes a position on this that I
> would like to explain because I expect you will have similar concerns about
> it and I'd rather have this conversation now.
> 
> The Type-2 passthrough series takes the opposite structural approach to the
> one you are suggesting here: CXL Type-2 support is an optional extension
> compiled into vfio-pci-core (CONFIG_VFIO_CXL_CORE), not a separate driver.
> 
> Here is the reasoning:
> 
> 1. Device enumeration
> =====================
> 
> CXL Type-2 devices (GPU + accelerator class) are enumerated as struct pci_dev
> objects.  The kernel discovers them through PCI config space scan, not through
> the CXL bus. The CXL capability is advertised via the DVSEC (PCI_EXT_CAP_ID
> 0x23, Vendor ID 0x1E98), which is PCI config space. There is no CXL bus
> device to bind to.
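
[Editorial note: the enumeration point above can be made concrete with a
config-space walk. The sketch below is a self-contained mock, not kernel
code: `find_cxl_dvsec()` mirrors what the kernel's
pci_find_dvsec_capability() does, but against a plain 4 KiB buffer so it
runs anywhere; the constants are the ones quoted above.]

```c
#include <stdint.h>
#include <string.h>

#define PCI_EXT_CAP_ID_DVSEC    0x23    /* DVSEC extended capability ID */
#define CXL_DVSEC_VENDOR_ID     0x1e98  /* CXL vendor ID, per the text above */
#define PCI_CFG_SPACE_EXP_START 0x100   /* extended config space base */

/*
 * Walk the extended capability list in a (mock, little-endian) 4 KiB
 * config space and return the offset of the first DVSEC whose DVSEC
 * Header 1 carries the CXL vendor ID, or 0 if none is present.
 */
static uint16_t find_cxl_dvsec(const uint8_t *cfg)
{
	uint16_t pos = PCI_CFG_SPACE_EXP_START;

	while (pos) {
		uint32_t hdr, dvsec1;

		memcpy(&hdr, cfg + pos, 4);        /* extended cap header */
		if ((hdr & 0xffff) == PCI_EXT_CAP_ID_DVSEC) {
			memcpy(&dvsec1, cfg + pos + 4, 4); /* DVSEC Header 1 */
			if ((dvsec1 & 0xffff) == CXL_DVSEC_VENDOR_ID)
				return pos;
		}
		pos = hdr >> 20;                   /* next capability offset */
	}
	return 0;
}
```

Either way the argument cuts, this walk starts from config space owned by
a struct pci_dev, which is the crux of the "no CXL bus device to bind to"
observation.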
> 
> A standalone vfio-cxl driver would therefore need to match on the PCI device
> just like vfio-pci does, and then call into vfio-pci-core for every PCI
> concern: config space emulation, BAR region handling, MSI/MSI-X, INTx, DMA
> mapping, FLR, and migration callbacks. That is the variant driver pattern
> we rejected in favour of generic CXL passthrough. We have seen this exact

Lore link for this "rejection" discussion?

> outcome with the prior iterations of this series before we moved to the
> enlightened vfio-pci model.

I still do not understand the argument. CXL functionality is a library
that PCI drivers can use. If vfio-pci functionality is also a library,
then vfio-cxl is a driver that uses services from both libraries. Where
the module and driver name boundaries are drawn is more an organizational
decision than a functional one.

The argument for vfio-cxl organizational independence is more about
being able to tell at a diffstat level the relative PCI vs CXL
maintenance impact / regression risk.

> 2. CXL-CORE involvement
> =======================
> 
> The CXL Type-2 passthrough series does not bypass the CXL core. At
> vfio_pci_probe() time the CXL enlightenment layer:
> 
>   - calls cxl_get_hdm_info() to probe the HDM Decoder Capability block,
>   - calls cxl_get_committed_decoder() to locate pre-committed firmware regions,
>   - calls cxl_create_region() / cxl_request_dpa() for dynamic allocation,
>   - creates a struct cxl_memdev via the CXL core (via cxl_probe_component_regs,
>     the same path Alejandro's v23 series uses).
> 
> The CXL core is fully involved.  The difference is that the binding to
> userspace is still through vfio-pci, which already manages the pci_dev
> lifecycle, reset sequencing, and VFIO region/irq API.

Sure, every CXL driver in the system will do the same.
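
[Editorial note: for concreteness, the probe-time sequence in the quoted
list can be sketched with stubs. Everything below is illustrative: the
four function names come from the quoted series description,
`vfio_cxl_enlighten()` is a made-up wrapper, and the bodies only record
call order so the flow compiles and runs anywhere.]

```c
static int trace[4], ntrace;
static void record(int id) { trace[ntrace++] = id; }

/* Stand-ins for the calls named in the quoted list. */
static int cxl_get_hdm_info(void)          { record(1); return 0; }
static int cxl_get_committed_decoder(void) { record(2); return 0; } /* 0: none committed */
static int cxl_create_region(void)         { record(3); return 0; }
static int cxl_memdev_create(void)         { record(4); return 0; }

/*
 * Hypothetical probe-time flow: probe the HDM Decoder Capability,
 * reuse a firmware-committed region if one exists, otherwise allocate
 * dynamically, then register a memdev with the CXL core.
 */
static int vfio_cxl_enlighten(void)
{
	if (cxl_get_hdm_info())
		return -1;	/* no HDM capability: behave as plain vfio-pci */
	if (!cxl_get_committed_decoder() && cxl_create_region())
		return -1;	/* dynamic allocation failed */
	return cxl_memdev_create();
}
```

Note that nothing in this flow is VFIO-specific, which is the point of
the reply above: any CXL-capable PCI driver performs the same sequence.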

> 3. Standalone vfio-cxl
> ======================
> 
> To match the model you are suggesting, vfio-cxl would need to:
> 
>   (a) Register a new driver on the CXL bus (struct cxl_driver), probing
>       struct cxl_memdev or a new struct cxl_endpoint,

What, why? Just like this patch series was proposing to extend the
PCI core with additional common functionality, the proposal is to extend
the CXL core object drivers with the same.

>   (b) Re-implement or delegate everything vfio-pci-core provides — config
>       space, BAR regions, IRQs, DMA, FLR, and VFIO container management —
>       either by calling vfio-pci-core as a library or by duplicating it, and

What is the argument against a library?

>   (c) present to userspace through a new device model distinct from
>       vfio-pci.

CXL is a distinct operational model. What breaks if userspace is
required to explicitly account for CXL passthrough?

> This is a significant new surface. QEMU's CXL passthrough support already
> builds on vfio-pci: it receives the PCI device via VFIO, reads the
> VFIO_DEVICE_INFO_CAP_CXL capability chain, and exposes the CXL topology.
> A vfio-cxl object model would require non-trivial QEMU changes for something
> that already works in the enlightened vfio-pci model.

What specifically about a kernel code organization choice affects the
QEMU implementation? A uAPI is kernel code organization agnostic.

The concern is designing ourselves into a PCI corner when, long term,
QEMU benefits from understanding CXL objects. For example, CXL error
handling / recovery is already well on its way to being performed in
terms of CXL port objects.

> 4. Module dependency
> ====================
> 
> Current solution: CONFIG_VFIO_CXL_CORE depends on CONFIG_CXL_BUS. We do not
> add CXL knowledge to the PCI core;

drivers/pci/cxl.c

> we add it to the VFIO layer that is already CXL_BUS-dependent.

Yes, VFIO layer needs CXL enlightenment and VFIO's requirements imply
wider benefits to other CXL capable devices.

> I would very much appreciate your thoughts on [1] considering the above. I
> want to understand whether you see vfio-pci-core remaining the single entry
> point from userspace, or whether you envision a new VFIO device type.
> 
> Jonathan has indicated he has thoughts on this as well; hopefully, we
> can converge on a direction that doesn't require duplicating vfio-pci-core.

No one is suggesting "require duplicating vfio-pci-core"; please do not
argue with strawman caricatures like this.

> [1] https://lore.kernel.org/linux-cxl/20260311203440.752648-1-mhonap@nvidia.com/

Will take a look...

