From: Dan Williams <dan.j.williams@intel.com>
To: Manish Honap <mhonap@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Alex Williamson <alex@shazbot.org>,
	"jonathan.cameron@huawei.com" <jonathan.cameron@huawei.com>
Cc: "alex@shazbot.org" <alex@shazbot.org>,
	Srirangan Madhavan <smadhavan@nvidia.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	Jeshua Smith <jeshuas@nvidia.com>,
	Vikram Sethi <vsethi@nvidia.com>,
	Sai Yashwanth Reddy Kancherla <skancherla@nvidia.com>,
	Vishal Aslot <vaslot@nvidia.com>,
	Shanker Donthineni <sdonthineni@nvidia.com>,
	Vidya Sagar <vidyas@nvidia.com>, Jiandi An <jan@nvidia.com>,
	Matt Ochs <mochs@nvidia.com>,
	Derek Schumacher <dschumacher@nvidia.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Manish Honap <mhonap@nvidia.com>
Subject: RE: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
Date: Tue, 17 Mar 2026 10:03:28 -0700	[thread overview]
Message-ID: <69b98960907e9_7ee31003b@dwillia2-mobl4.notmuch> (raw)
In-Reply-To: <IA1PR12MB90304014081461742C3977F4BD41A@IA1PR12MB9030.namprd12.prod.outlook.com>

Manish Honap wrote:
[..]
> > The CXL accelerator series is currently contending with being able to
> > restore device configuration after reset. I expect vfio-cxl to build on
> > that, not push CXL flows into the PCI core.
> 
> Hello Dan,
> 
> My VFIO CXL Type-2 passthrough series [1] takes a position on this that I
> would like to explain because I expect you will have similar concerns about
> it and I'd rather have this conversation now.
> 
> The Type-2 passthrough series takes the opposite structural approach to the one you are
> suggesting here: CXL Type-2 support is an optional extension compiled into
> vfio-pci-core (CONFIG_VFIO_CXL_CORE), not a separate driver.
> 
> Here is the reasoning:
> 
> 1. Device enumeration
> =====================
> 
> CXL Type-2 devices (GPU + accelerator class) are enumerated as struct pci_dev
> objects.  The kernel discovers them through PCI config space scan, not through
> the CXL bus. The CXL capability is advertised via the DVSEC (PCI_EXT_CAP_ID
> 0x23, Vendor ID 0x1E98), which is PCI config space. There is no CXL bus
> device to bind to.
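> 
> As a concrete illustration of the enumeration described above, here is a
> minimal userspace sketch (not kernel code) of walking a PCIe extended
> capability list for the CXL DVSEC over a synthetic config-space buffer.
> In the kernel the equivalent lookup is pci_find_dvsec_capability(); the
> buffer contents below are fabricated for the example.
> 
> ```c
> #include <assert.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
> 
> #define PCI_CFG_SPACE_EXP_SIZE 4096
> #define PCI_EXT_CAP_START      0x100
> #define PCI_EXT_CAP_ID_DVSEC   0x23
> #define PCI_DVSEC_VENDOR_CXL   0x1e98
> 
> static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
> {
> 	uint32_t v;
> 
> 	memcpy(&v, cfg + off, sizeof(v)); /* assumes little-endian host */
> 	return v;
> }
> 
> /*
>  * Walk the extended capability list looking for a CXL DVSEC; return its
>  * offset, or 0 if absent. Header layout per the PCIe spec: bits 15:0 =
>  * cap ID, bits 31:20 = next capability offset; DVSEC header 1 (at +4)
>  * carries the vendor ID in bits 15:0.
>  */
> static unsigned int find_cxl_dvsec(const uint8_t *cfg)
> {
> 	unsigned int off = PCI_EXT_CAP_START;
> 	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_EXT_CAP_START) / 8;
> 
> 	while (off && ttl--) {
> 		uint32_t hdr = cfg_read32(cfg, off);
> 
> 		if ((hdr & 0xffff) == PCI_EXT_CAP_ID_DVSEC &&
> 		    (cfg_read32(cfg, off + 4) & 0xffff) == PCI_DVSEC_VENDOR_CXL)
> 			return off;
> 		off = hdr >> 20;
> 	}
> 	return 0;
> }
> 
> int main(void)
> {
> 	uint8_t cfg[PCI_CFG_SPACE_EXP_SIZE] = { 0 };
> 	uint32_t hdr;
> 
> 	/* Synthetic chain: some cap (0x01) at 0x100 -> CXL DVSEC at 0x200 */
> 	hdr = 0x01 | (1u << 16) | (0x200u << 20);
> 	memcpy(cfg + 0x100, &hdr, 4);
> 	hdr = PCI_EXT_CAP_ID_DVSEC | (1u << 16); /* next = 0, end of list */
> 	memcpy(cfg + 0x200, &hdr, 4);
> 	hdr = PCI_DVSEC_VENDOR_CXL;              /* DVSEC header 1 */
> 	memcpy(cfg + 0x204, &hdr, 4);
> 
> 	assert(find_cxl_dvsec(cfg) == 0x200);
> 	printf("CXL DVSEC at 0x%x\n", find_cxl_dvsec(cfg));
> 	return 0;
> }
> ```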
> 
> A standalone vfio-cxl driver would therefore need to match on the PCI device
> just like vfio-pci does, and then call into vfio-pci-core for every PCI
> concern: config space emulation, BAR region handling, MSI/MSI-X, INTx, DMA
> mapping, FLR, and migration callbacks. That is the variant driver pattern
> we rejected in favour of generic CXL passthrough. We have seen this exact

Lore link for this "rejection" discussion?

> outcome with the prior iterations of this series before we moved to the
> enlightened vfio-pci model.

I still do not understand the argument. CXL functionality is a library
that PCI drivers can use. If vfio-pci functionality is also a library
then vfio-cxl is a driver that uses services from both libraries. Where
the module and driver name boundaries are drawn is more an organizational
decision than a functional one.

The argument for vfio-cxl organizational independence is more about
being able to tell at a diffstat level the relative PCI vs CXL
maintenance impact / regression risk.

> 2. CXL-CORE involvement
> =======================
> 
> The CXL Type-2 passthrough series does not bypass the CXL core. At vfio_pci_probe()
> time the CXL enlightenment layer:
> 
>   - calls cxl_get_hdm_info() to probe the HDM Decoder Capability block,
>   - calls cxl_get_committed_decoder() to locate pre-committed firmware regions,
>   - calls cxl_create_region() / cxl_request_dpa() for dynamic allocation,
>   - creates a struct cxl_memdev via the CXL core (via cxl_probe_component_regs,
>     the same path Alejandro's v23 series uses).
> 
> The CXL core is fully involved.  The difference is that the binding to
> userspace is still through vfio-pci, which already manages the pci_dev
> lifecycle, reset sequencing, and VFIO region/irq API.

Sure, every CXL driver in the system will do the same.

> 3. Standalone vfio-cxl
> ======================
> 
> To match the model you are suggesting, vfio-cxl would need to:
> 
>   (a) Register a new driver on the CXL bus (struct cxl_driver), probing
>       struct cxl_memdev or a new struct cxl_endpoint,

What, why? Just like this patch series was proposing to extend the PCI
core with additional common functionality, the proposal is to extend the
CXL core object drivers with the same.

>   (b) Re-implement or delegate everything vfio-pci-core provides — config
>       space, BAR regions, IRQs, DMA, FLR, and VFIO container management —
>       either by calling vfio-pci-core as a library or by duplicating it, and

What is the argument against a library?

>   (c) Present to userspace through a new device model distinct from
>       vfio-pci.

CXL is a distinct operational model. What breaks if userspace is
required to explicitly account for CXL passthrough?

> This is a significant new surface. QEMU's CXL passthrough support already
> builds on vfio-pci: it receives the PCI device via VFIO, reads the
> VFIO_DEVICE_INFO_CAP_CXL capability chain, and exposes the CXL topology.
> A vfio-cxl object model would require non-trivial QEMU changes for something
> that already works in the enlightened vfio-pci model.

What specifically about a kernel code organization choice affects the
QEMU implementation? A uAPI is kernel code organization agnostic.

The concern is designing ourselves into a PCI corner when, long term, QEMU
benefits from understanding CXL objects. For example, CXL error handling
/ recovery is already well on its way to being performed in terms of CXL
port objects.

> 4. Module dependency
> ====================
> 
> Current solution: CONFIG_VFIO_CXL_CORE depends on CONFIG_CXL_BUS. We do not
> add CXL knowledge to the PCI core;

drivers/pci/cxl.c

> we add it to the VFIO layer that is already CXL_BUS-dependent.

Yes, VFIO layer needs CXL enlightenment and VFIO's requirements imply
wider benefits to other CXL capable devices.

> I would very much appreciate your thoughts on [1] considering the above. I want
> to understand your thoughts on whether vfio-pci-core can remain the single
> entry point from userspace, or whether you envision a new VFIO device type.
> 
> Jonathan has indicated he has thoughts on this as well; hopefully, we
> can converge on a direction that doesn't require duplicating vfio-pci-core.

No one is suggesting "require duplicating vfio-pci-core"; please do not
argue with strawman caricatures like this.

> [1] https://lore.kernel.org/linux-cxl/20260311203440.752648-1-mhonap@nvidia.com/

Will take a look...

