From: Dan Williams <djbw@kernel.org>
To: mhonap@nvidia.com,  alwilliamson@nvidia.com,
	 jonathan.cameron@huawei.com,  dave.jiang@intel.com,
	 alejandro.lucero-palau@amd.com,  dave@stgolabs.net,
	 alison.schofield@intel.com,  vishal.l.verma@intel.com,
	 ira.weiny@intel.com,  dmatlack@google.com,  shuah@kernel.org,
	 jgg@ziepe.ca,  yishaih@nvidia.com,  skolothumtho@nvidia.com,
	 kevin.tian@intel.com,  ankita@nvidia.com
Cc: vsethi@nvidia.com,  cjia@nvidia.com,  targupta@nvidia.com,
	 zhiw@nvidia.com,  kjaju@nvidia.com,
	 linux-kselftest@vger.kernel.org,  linux-kernel@vger.kernel.org,
	 linux-cxl@vger.kernel.org,  kvm@vger.kernel.org,
	 mhonap@nvidia.com,  Alex Williamson <alex@shazbot.org>,
	 Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH v2 00/20] vfio/pci: Add CXL Type-2 device passthrough support
Date: Mon, 13 Apr 2026 21:08:38 -0700
Message-ID: <69ddbdc62016d_147c801004d@djbw-dev.notmuch>
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>

Forgive me if any of the commentary below was already hashed out in the
v1 discussion. Your excellent changelog notes make catching up much
easier, thanks!

mhonap@ wrote:
> From: Manish Honap <mhonap@nvidia.com>
> 
> CXL Type-2 accelerators (e.g. CXL.mem-capable GPUs) cannot be passed
> through to virtual machines with stock vfio-pci because the driver has
> no concept of HDM decoder management, DPA region exposure, or component
> register emulation.  This series wires all of that into vfio-pci-core
> behind a new CONFIG_VFIO_CXL_CORE optional module, without requiring a
> variant driver.
> 
> When a CXL Device DVSEC (Vendor ID 0x1E98, ID 0x0000) is detected at
> device open time, the driver:
> 
>   - Probes the HDM Decoder Capability block in the component registers
>     and allocates a DPA region through the CXL subsystem.  On devices
>     where firmware has already committed a decoder, the kernel skips
>     allocation and re-uses the committed range.
> 
>   - Builds a kernel-owned shadow of the HDM register block.  The VMM
>     reads and writes this shadow through a dedicated COMP_REGS VFIO
>     region rather than touching the hardware directly.  The kernel
>     enforces CXL 3.1 bit-field rules: reserved bits, read-only bits,
>     the COMMIT/COMMITTED latch, and the LOCK→0 reprogram path for
>     firmware-committed decoders.
> 
>   - Exposes the DPA range as a second VFIO region (VFIO_REGION_SUBTYPE_CXL)
>     backed by the kernel-assigned HPA.  PTEs are inserted lazily on first
>     page fault and torn down atomically under memory_lock during FLR.

I assume, or at least hope, this means exposing a CXL region as
VFIO_REGION_SUBTYPE_CXL, since DPA is a device-internal address space
that VFIO probably does not need to worry about. VFIO likely only needs
to care about system-visible resources.

If / when interleaving arrives for CXL accelerators, the 1:1 association
of vfio-pci device to DPA to CXL region HPA breaks. It is OK to assume
1:1 for now.

>   - Intercepts writes to the CXL DVSEC configuration-space registers
>     (Control, Status, Control2, Status2, Lock, Range Base) and replays

Range Base is ignored once HDM Decoder Enable is set in the HDM Decoder
Global Control register. I would hope that this enabling ditches CXL 1.x
legacy wherever possible.

>     them through a per-device vconfig shadow, enforcing RWL/RW1CS/RWO
>     access semantics and the CONFIG_LOCK one-shot latch.

Linux should have no need to ever trigger CXL register bit locks. That
is only for firmware to make changes immutable if the firmware has
requirements that nothing moves for its own purposes.

Now, it makes sense to configure the vCXL device to be locked at setup,
but I do not currently see the use case for the vBIOS to mutate and lock
the configuration.
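
To make the access-type discussion concrete, here is a minimal sketch of
a shadow-register write that honors RW, RWL, RW1CS and RWO semantics. It
is not code from the series: struct vreg, vreg_write() and the mask
fields are hypothetical names, the masks are assumed to be disjoint, and
RWO is modeled per register (the first write consumes all write-once
bits). Passing locked=true from the start models a vCXL device that is
set up already locked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one 16-bit shadow register. */
struct vreg {
	uint16_t val;		/* current shadow value */
	uint16_t rw_mask;	/* plain read-write bits */
	uint16_t rwl_mask;	/* writable only while not locked */
	uint16_t rw1c_mask;	/* write 1 to clear */
	uint16_t rwo_mask;	/* write-once bits */
	uint16_t rwo_done;	/* write-once bits already consumed */
};

/* Apply a guest write to the shadow and return the new value. */
static uint16_t vreg_write(struct vreg *r, uint16_t wval, bool locked)
{
	uint16_t rwo_open = r->rwo_mask & ~r->rwo_done;
	uint16_t writable = r->rw_mask | rwo_open;

	if (!locked)
		writable |= r->rwl_mask;

	/* Only writable bits take the guest value; the rest are kept. */
	r->val = (r->val & ~writable) | (wval & writable);

	/* RW1C: a written 1 clears the bit, a written 0 leaves it. */
	r->val &= ~(wval & r->rw1c_mask);

	/* Any write uses up the write-once opportunity. */
	r->rwo_done |= rwo_open;

	return r->val;
}
```

With a shadow created locked, the RWL bits never change from the guest
side, so the emulation reduces to the RW, RW1C and RWO cases.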

[..] 
>   - Includes selftests

Yay!

>     covering device detection, capability parsing,
>     region enumeration, HDM register emulation, DPA mmap with page-fault
>     insertion, FLR invalidation, and DVSEC register emulation.
> 
> The series is applied on top of the cxl/next branch using the base
> specified at the end of this cover letter plus Alejandro's v23 Type-2
> device support patches [1].

One of the sticking points of the accelerator series has been how many
details of the CXL core internal object lifetime leak out.

My hope / thought experiment is that the initial version of this
enabling only needs to facilitate getting a VMM-established CXL region
into a guest. With that, all VFIO needs is the CXL region HPA and the
MMIO layout, so that CXL registers can be trapped and non-CXL registers
can be direct mapped.
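
As a rough illustration of the trap vs. direct-map split, the sketch
below carves a BAR into the pieces that could be mmap'd directly while
the page-aligned window holding the CXL component registers stays
trapped. The helper name, struct range, and the idea of feeding the
result into something like a sparse-mmap capability are assumptions for
illustration, not the series' implementation. The page size is assumed
to be a power of two and the trapped window is assumed to lie inside the
BAR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offset/length pair within a BAR. */
struct range {
	uint64_t off;
	uint64_t len;
};

/*
 * Compute the directly mappable parts of a BAR of bar_len bytes when
 * [trap_off, trap_off + trap_len) must be trapped. The trapped window
 * is widened to page granularity because a page cannot be partially
 * mapped. Returns the number of entries written to out[] (0, 1 or 2).
 */
static size_t bar_mmap_ranges(uint64_t bar_len, uint64_t trap_off,
			      uint64_t trap_len, uint64_t page,
			      struct range out[2])
{
	uint64_t start, end;
	size_t n = 0;

	if (trap_len == 0) {
		/* Nothing to trap: the whole BAR is mappable. */
		out[n++] = (struct range){ 0, bar_len };
		return n;
	}

	start = trap_off & ~(page - 1);
	end = (trap_off + trap_len + page - 1) & ~(page - 1);
	if (end > bar_len)
		end = bar_len;

	if (start > 0)
		out[n++] = (struct range){ 0, start };
	if (end < bar_len)
		out[n++] = (struct range){ end, bar_len - end };

	return n;
}
```

For example, a 128 KiB BAR with 0x200 bytes of CXL registers at offset
0x10100 and 4 KiB pages yields two mappable ranges, one below 0x10000
and one starting at 0x11000, with a single trapped page in between.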

> Series structure
> ================
> 
>   Patches 1-5 extend the CXL subsystem with the APIs vfio-pci needs.
> 
>   Patches 6-8 add the vfio-pci-core plumbing (UAPI, device state,
>   Kconfig/build).
> 
>   Patches 9-15 implement the core device lifecycle: detection, HDM
>   emulation, media readiness, region management, DPA region, and DVSEC
>   emulation.
> 
>   Patches 16-18 wire everything together at open/close time and
>   populate the VFIO ioctl paths.
> 
>   Patches 19-20 add documentation and selftests.
> 
> Changes since v1
> ================
[..]
> HDM API simplification (patch 1)
> 
>   v1 exported cxl_get_hdm_reg_info() which returned a raw struct with
>   offset and size fields. v2 replaces it with cxl_get_hdm_info() which
>   uses the cached count already populated by cxl_probe_component_regs()
>   and returns a single struct with all HDM metadata, removing the need
>   for callers to re-read the hardware.

What is the accelerator use case to support multiple CXL regions per
device?

In other words, it feels ambitious to support that while simultaneously
kicking the "interleave" question down the road. If we are going for
initial simplicity that also means single region to start.

> cxl_await_range_active() split (patch 4)
> 
>   cxl_await_media_ready() requires a CXLMDEV mailbox register, which
>   Type-2 accelerators may not have.  v2 splits out cxl_await_range_active()
>   so the HDM range-active poll can be used independently of the media
>   ready path.

This feels like a detail vfio-pci does not need to worry about. The core
knows that the device does not have a mailbox, and the core knows it
needs to await range active when probing HDM. Something is broken if
vfio-pci needs to duplicate this part of the setup.

> LOCK→0 transition in HDM ctrl write emulation (patch 11)
> 
>   v1 did not handle the case where a guest tries to clear the LOCK bit
>   to reprogram a firmware-committed decoder. v2 allows this transition
>   and re-programs the hardware accordingly.

? A guest has no ability to manipulate host HPA mappings. A protocol for
a guest to work with a host to remap HPA does not sound like a v1
requirement. This would be equivalent to a guest asking to move a host
PCI BAR.
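
A minimal sketch of the simpler behavior, where a decoder that is both
locked and committed ignores guest writes and nothing ever reaches the
hardware. The function name is hypothetical, and the bit positions
(LOCK bit 8, COMMIT bit 9, COMMITTED bit 10) are my understanding of the
HDM Decoder Control register layout and should be treated as
assumptions. Reserved-bit masking is omitted, and dropping the whole
write is a simplification of the per-field RWL rules.

```c
#include <stdint.h>

/* Assumed HDM Decoder Control bit positions; illustrative only. */
#define HDM_CTRL_LOCK		(1u << 8)
#define HDM_CTRL_COMMIT		(1u << 9)
#define HDM_CTRL_COMMITTED	(1u << 10)

/*
 * Emulate a guest write to a virtual HDM decoder control register.
 * Only the shadow value changes; the host mapping is never touched.
 */
static uint32_t vhdm_ctrl_write(uint32_t shadow, uint32_t wval)
{
	const uint32_t ro = HDM_CTRL_COMMITTED;
	uint32_t next;

	/* Locked and committed: the guest cannot reprogram the decoder. */
	if ((shadow & HDM_CTRL_LOCK) && (shadow & HDM_CTRL_COMMITTED))
		return shadow;

	/* COMMITTED is read-only to the guest; keep the shadow's copy. */
	next = (shadow & ro) | (wval & ~ro);

	/* Virtual latch: COMMITTED simply follows COMMIT. */
	if (next & HDM_CTRL_COMMIT)
		next |= HDM_CTRL_COMMITTED;
	else
		next &= ~HDM_CTRL_COMMITTED;

	return next;
}
```

If the VMM always presents the decoder as locked and committed, the
first branch is the only one a guest can ever reach.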
