[RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs


* [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
@ 2026-04-05  7:28 Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 01/10] vfio/pci: Use the write side of EventNotifier for IRQ signaling Scott J. Goldman
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman

This series adds VFIO PCI device passthrough support for Apple Silicon
Macs running macOS, using a DriverKit extension (dext) as the host
backend instead of the Linux VFIO kernel driver.

I'm sending this as an RFC because I'd like feedback before investing
further in upstreaming.  The code is functional.  I've tested it with
an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
[1]), likely due to the BAR access penalty described below.  AI
inference workloads appear less affected.  Ollama with Qwen 3.5
generates around 140 tok/sec on the same setup [2].

How it works:

On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
for device access and DMA mapping.  On macOS, there is no equivalent
kernel interface.  Instead, a userspace DriverKit extension
(VFIOUserPCIDriver) mediates access to the physical PCI device through
IOKit's IOUserClient and PCIDriverKit APIs.

The series keeps the existing VFIOPCIDevice model and reuses QEMU's
passthrough infrastructure.  A few ioctl callsites are refactored into
io_ops callbacks, the build system is extended for Darwin, and the
Apple-specific backend plugs in behind those abstractions.
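The io_ops pattern described above can be illustrated with a rough sketch. A shared reset call is routed through a per-device table of function pointers, so common code never issues ioctl() itself. The type and function names (SketchIOOps, sketch_device_reset, and the two backends) are hypothetical and deliberately simpler than the real VFIODeviceIOOps callbacks in this series.

```c
#include <stddef.h>

typedef struct SketchDevice SketchDevice;

/* Hypothetical, cut-down ops table; not the real VFIODeviceIOOps. */
typedef struct SketchIOOps {
    int (*device_reset)(SketchDevice *dev);
} SketchIOOps;

struct SketchDevice {
    const SketchIOOps *io_ops;  /* backend chosen when the device is set up */
    int ioctl_resets;           /* counters so the example is observable */
    int dext_resets;
};

static int ioctl_backend_reset(SketchDevice *dev)
{
    /* A Linux backend would issue ioctl(fd, VFIO_DEVICE_RESET) here. */
    dev->ioctl_resets++;
    return 0;
}

static int dext_backend_reset(SketchDevice *dev)
{
    /* A macOS backend would ask the DriverKit extension to reset instead. */
    dev->dext_resets++;
    return 0;
}

static const SketchIOOps ioctl_ops = { .device_reset = ioctl_backend_reset };
static const SketchIOOps dext_ops = { .device_reset = dext_backend_reset };

/* Shared code dispatches through the table instead of calling ioctl(). */
static int sketch_device_reset(SketchDevice *dev)
{
    if (!dev->io_ops || !dev->io_ops->device_reset) {
        return -1;  /* backend does not implement this operation */
    }
    return dev->io_ops->device_reset(dev);
}
```

With this shape, supporting another host means supplying another ops table rather than adding an #ifdef at each callsite.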

The guest sees two PCI devices: the passthrough device itself
(vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
DMA mapping device (apple-dma-pci).  On the QEMU side, an
AppleVFIOContainer implements the IOMMU backend, and a C client
library wraps the IOUserClient calls to the dext for config space,
BAR MMIO, interrupts, reset, and DMA.

DMA limitations:

This is the biggest platform constraint.  Unlike a typical IOMMU
mapping operation where the caller specifies the IOVA, the
PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
system-assigned IOVA.  There is no way to request a specific address.
This means the guest's requested DMA addresses cannot be used
directly.  The guest kernel module must intercept DMA mapping calls
and forward them through the companion device to get the actual
hardware IOVA.

There are also hard platform limits: approximately 1.5 GB total
mapped memory and roughly 64k concurrent mappings.  Not all
workloads will fit within these limits, though GPU gaming and LLM
inference have worked in practice.
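The DMA constraint can be made concrete with a minimal sketch. It assumes a simple slot table and uses a bump allocator as a stand-in for the system-assigned IOVAs that IODMACommand::PrepareForDMA would return. The caller supplies a guest-physical address but gets back a different IOVA, and a map fails once the approximate byte or mapping-count limits above would be exceeded. All names (SketchDmaTable, sketch_dma_map, sketch_dma_unmap) and the 16-slot capacity are illustrative and are not taken from apple-dma.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Approximate platform limits quoted above; treat them as assumptions. */
#define SKETCH_MAX_BYTES    (1536ULL << 20)  /* ~1.5 GB total mapped */
#define SKETCH_MAX_MAPPINGS 65536u           /* ~64k concurrent mappings */
#define SKETCH_CAPACITY     16u              /* tiny table, for the sketch only */

typedef struct {
    uint64_t gpa;   /* address the guest driver asked to map */
    uint64_t iova;  /* address the host picked; the device must use this */
    uint64_t len;
    bool used;
} SketchMapping;

typedef struct {
    SketchMapping slots[SKETCH_CAPACITY];
    uint64_t mapped_bytes;
    unsigned count;
    uint64_t next_iova;  /* stands in for the system-assigned allocator */
} SketchDmaTable;

/* Map [gpa, gpa + len); the IOVA is chosen for the caller, not by it. */
static bool sketch_dma_map(SketchDmaTable *t, uint64_t gpa, uint64_t len,
                           uint64_t *iova)
{
    if (len == 0 || t->count >= SKETCH_MAX_MAPPINGS ||
        t->mapped_bytes + len > SKETCH_MAX_BYTES) {
        return false;  /* would exceed the platform budget */
    }
    for (unsigned i = 0; i < SKETCH_CAPACITY; i++) {
        if (!t->slots[i].used) {
            t->slots[i] = (SketchMapping){ gpa, t->next_iova, len, true };
            *iova = t->next_iova;
            t->next_iova += (len + 0xfff) & ~0xfffULL;  /* 4 KiB granule */
            t->mapped_bytes += len;
            t->count++;
            return true;
        }
    }
    return false;  /* sketch table is full */
}

/* Unmap by the IOVA the host handed out earlier. */
static bool sketch_dma_unmap(SketchDmaTable *t, uint64_t iova)
{
    for (unsigned i = 0; i < SKETCH_CAPACITY; i++) {
        if (t->slots[i].used && t->slots[i].iova == iova) {
            t->mapped_bytes -= t->slots[i].len;
            t->count--;
            t->slots[i].used = false;
            return true;
        }
    }
    return false;
}
```

Because the returned IOVA differs from the guest-physical address, the guest driver has to program the device with the value it receives. That is why the guest kernel module must forward its mapping calls rather than rely on the guest's own addresses. A real table would also need room for tens of thousands of entries.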

BAR access has performance issues as well.  HVF does not expose
controls to map device memory as cacheable in the guest, so BAR MMIO
pays a significant performance penalty.  Uncached mappings work
correctly, but they are much slower than what the hardware could do.

What works:
- PCI config space passthrough
- BAR MMIO via direct-mapped device memory
- MSI/MSI-X interrupts via async notification from the dext
- Device reset (FLR with hot-reset fallback)
- DMA mapping for guest device drivers
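The reset behaviour listed above (FLR first, with hot reset as a fallback) can be sketched as a small selection function. The callback table, enum, and stub primitives are hypothetical stand-ins for whatever the dext client actually exposes. The sketch shows only the ordering, not how either reset is performed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reset primitives; the real ones sit behind the dext. */
typedef struct {
    bool (*flr)(void *opaque);        /* PCIe Function Level Reset */
    bool (*hot_reset)(void *opaque);  /* bus/link-level reset */
    void *opaque;
} SketchResetOps;

typedef enum { RESET_NONE, RESET_FLR, RESET_HOT } SketchResetKind;

/* Stub primitives so the sketch can be exercised without hardware. */
static bool sketch_ok(void *opaque) { (void)opaque; return true; }
static bool sketch_fail(void *opaque) { (void)opaque; return false; }

/* Prefer FLR; try a hot reset only if FLR is unavailable or fails. */
static SketchResetKind sketch_reset_with_fallback(const SketchResetOps *ops)
{
    if (ops->flr && ops->flr(ops->opaque)) {
        return RESET_FLR;
    }
    if (ops->hot_reset && ops->hot_reset(ops->opaque)) {
        return RESET_HOT;
    }
    return RESET_NONE;
}
```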

What doesn't work:
- Expansion ROM / VBIOS passthrough
- PCI BAR quirks
- VGA region passthrough
- Migration and dirty page tracking
- Hot-unplug

Questions for reviewers:

1. Is this something the VFIO maintainers would consider carrying
   upstream?  The refactoring patches (3-6) are benign, but the Apple
   backend is a new platform with real limitations.  That said, if Apple
   lifts some of the DART (Apple's IOMMU) or HVF restrictions in a future
   macOS release, the code changes needed to take advantage of that
   would likely be minor.  I'd like to understand whether this is in
   scope before doing the work to address review feedback on the full
   series.

2. The apple-dma-pci companion device: should this be a virtio device
   instead?  I went with a simple custom PCI device because the virtio
   infrastructure didn't buy much for what is essentially a {map, unmap}
   register interface, but if virtio is preferred, what is the process
   for allocating a device ID?  If a custom PCI device is the right
   approach, I've tentatively claimed 1b36:0015.  Is there a process
   for reserving a device ID under the Red Hat PCI vendor, or is
   listing it in docs/specs/pci-ids.rst sufficient?

   Relatedly, the guest-side kernel module hooks all DMA mapping
   functions for passed-through devices.  That is unusual enough that
   I'm not sure it's upstreamable in the Linux kernel, so I can
   maintain it out of tree if needed.

3. Should the macOS host-side DriverKit extension live in the QEMU
   tree?  It's not included in this series and requires Apple code
   signing.  I'm happy to keep it out of tree if that's preferred,
   or include the source if reviewers want it co-located.

4. The existing VFIO code includes <linux/vfio.h> from the
   linux-headers/ tree, which is intended to track upstream Linux
   UAPI headers.  To make this compile on macOS, I added minimal
   stub headers (include/compat/linux/types.h and linux/ioctl.h)
   so the existing vfio.h parses on macOS without modification.  An
   alternative would be to move an approximation of vfio.h into
   standard-headers/, but that felt against the spirit of tracking
   the latest upstream headers, and the standard-headers import
   process strips ioctls which the VFIO code relies on.  I felt
   the stub approach was the least invasive, but I'm open to
   changing it if there's a preferred way to handle this.

[1] https://imgur.com/a/xoRS9kT
[2] https://imgur.com/a/ui4pYF0

Scott J. Goldman (10):
  vfio/pci: Use the write side of EventNotifier for IRQ signaling
  accel/hvf: avoid executable mappings for RAM-device memory
  vfio: Allow building on Darwin hosts
  vfio: Prepare existing code for Apple VFIO backend
  vfio: Add region_map and region_unmap callbacks to VFIODeviceIOOps
  vfio: Add device_reset callback to VFIODeviceIOOps
  vfio/apple: Add DriverKit dext client library
  vfio/apple: Add IOMMU container and PCI device
  vfio/apple: Add apple-dma-pci companion device
  docs: Add vfio-apple documentation and MAINTAINERS entry

 Kconfig.host                       |   3 +
 MAINTAINERS                        |  11 +
 accel/hvf/hvf-all.c                |  10 +-
 backends/Kconfig                   |   2 +-
 docs/specs/pci-ids.rst             |   3 +
 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/vfio-apple.rst | 160 +++++
 hw/vfio-user/device.c              |  16 +-
 hw/vfio/Kconfig                    |   4 +-
 hw/vfio/ap.c                       |   4 +-
 hw/vfio/apple-device.c             | 945 +++++++++++++++++++++++++++++
 hw/vfio/apple-dext-client.c        | 681 +++++++++++++++++++++
 hw/vfio/apple-dext-client.h        | 253 ++++++++
 hw/vfio/apple-dma.c                | 540 +++++++++++++++++
 hw/vfio/apple.h                    |  74 +++
 hw/vfio/ccw.c                      |   2 +-
 hw/vfio/container-apple.c          | 241 ++++++++
 hw/vfio/device.c                   |  42 ++
 hw/vfio/meson.build                |  12 +-
 hw/vfio/migration.c                |   5 +-
 hw/vfio/pci.c                      |  50 +-
 hw/vfio/pci.h                      |   1 +
 hw/vfio/region.c                   | 108 ++--
 hw/vfio/types.h                    |   2 +
 hw/vfio/vfio-helpers.h             |   2 +-
 hw/vfio/vfio-migration-internal.h  |   4 +-
 hw/vfio/vfio-region.h              |   4 +
 include/compat/linux/ioctl.h       |   2 +
 include/compat/linux/types.h       |  26 +
 include/hw/pci/pci.h               |   1 +
 include/hw/vfio/vfio-container.h   |   1 +
 include/hw/vfio/vfio-device.h      |  40 +-
 meson.build                        |  10 +-
 util/event_notifier-posix.c        |   5 +-
 34 files changed, 3197 insertions(+), 68 deletions(-)
 create mode 100644 docs/system/devices/vfio-apple.rst
 create mode 100644 hw/vfio/apple-device.c
 create mode 100644 hw/vfio/apple-dext-client.c
 create mode 100644 hw/vfio/apple-dext-client.h
 create mode 100644 hw/vfio/apple-dma.c
 create mode 100644 hw/vfio/apple.h
 create mode 100644 hw/vfio/container-apple.c
 create mode 100644 include/compat/linux/ioctl.h
 create mode 100644 include/compat/linux/types.h

-- 
2.50.1 (Apple Git-155)


* [RFC PATCH 01/10] vfio/pci: Use the write side of EventNotifier for IRQ signaling
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 02/10] accel/hvf: avoid executable mappings for RAM-device memory Scott J. Goldman
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman

When passing an fd to a vfio-user server for interrupt signaling, the
write side of the EventNotifier must be used. For eventfd-backed
notifiers the read and write descriptors are the same, so the existing
kernel VFIO path is unchanged. However, on hosts that emulate
EventNotifier with a pipe pair (e.g. macOS), event_notifier_get_fd()
returns the read side, which the vfio-user server cannot write to.

Introduce vfio_irq_signal_fd() which returns event_notifier_get_wfd()
and use it at every site in vfio/pci that hands an fd to
vfio_device_irq_set_signaling() or the bulk IRQ path. The
qemu_set_fd_handler() calls continue to use the read fd as before.
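The descriptor mix-up this patch fixes is easy to reproduce with a plain POSIX pipe. In this minimal sketch, PipeNotifier is a hypothetical stand-in for a pipe-backed EventNotifier. Writing to the read side fails (with EBADF on Linux, for example), while writing to the write side and draining the read side works. This is why the signalling path must hand out the write descriptor, whereas qemu_set_fd_handler() keeps polling the read one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical model of a pipe-backed notifier with separate ends. */
typedef struct {
    int rfd;  /* QEMU polls this side */
    int wfd;  /* the signaller (kernel driver or vfio-user server) writes here */
} PipeNotifier;

static bool pipe_notifier_init(PipeNotifier *n)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return false;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return true;
}

/* Signal by writing one byte to @fd; returns 0 on success, else errno. */
static int signal_on(int fd)
{
    uint8_t one = 1;

    return write(fd, &one, 1) == 1 ? 0 : errno;
}

/* Drain pending signals from the read side; returns bytes consumed. */
static ssize_t consume(PipeNotifier *n)
{
    uint8_t buf[16];

    return read(n->rfd, buf, sizeof(buf));
}

static void pipe_notifier_cleanup(PipeNotifier *n)
{
    close(n->rfd);
    close(n->wfd);
}
```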

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 hw/vfio/pci.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1945751ffd..ee1a42e7e0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -80,6 +80,17 @@ static bool vfio_notifier_init(VFIOPCIDevice *vdev, EventNotifier *e,
     return true;
 }
 
+/*
+ * Return the fd that the vfio kernel driver or vfio-user server should
+ * write to in order to signal an interrupt.  For eventfd-backed notifiers
+ * this is the same descriptor QEMU reads, but on hosts that emulate
+ * EventNotifier with a pipe pair the write side must be used instead.
+ */
+static int vfio_irq_signal_fd(EventNotifier *e)
+{
+    return event_notifier_get_wfd(e);
+}
+
 static void vfio_notifier_cleanup(VFIOPCIDevice *vdev, EventNotifier *e,
                                   const char *name, int nr)
 {
@@ -378,7 +389,9 @@ static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
     }
 
     if (!vfio_device_irq_set_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
-                                VFIO_IRQ_SET_ACTION_TRIGGER, fd, errp)) {
+                                VFIO_IRQ_SET_ACTION_TRIGGER,
+                                vfio_irq_signal_fd(&vdev->intx.interrupt),
+                                errp)) {
         qemu_set_fd_handler(fd, NULL, NULL, vdev);
         vfio_notifier_cleanup(vdev, &vdev->intx.interrupt, "intx-interrupt", 0);
         return false;
@@ -548,9 +561,9 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
         if (vdev->msi_vectors[i].use) {
             if (vdev->msi_vectors[i].virq < 0 ||
                 (msix && msix_is_masked(pdev, i))) {
-                fd = event_notifier_get_fd(&vdev->msi_vectors[i].interrupt);
+                fd = vfio_irq_signal_fd(&vdev->msi_vectors[i].interrupt);
             } else {
-                fd = event_notifier_get_fd(&vdev->msi_vectors[i].kvm_interrupt);
+                fd = vfio_irq_signal_fd(&vdev->msi_vectors[i].kvm_interrupt);
             }
         }
 
@@ -628,9 +641,9 @@ static void set_irq_signalling(VFIODevice *vbasedev, VFIOMSIVector *vector,
     int32_t fd;
 
     if (vector->virq >= 0) {
-        fd = event_notifier_get_fd(&vector->kvm_interrupt);
+        fd = vfio_irq_signal_fd(&vector->kvm_interrupt);
     } else {
-        fd = event_notifier_get_fd(&vector->interrupt);
+        fd = vfio_irq_signal_fd(&vector->interrupt);
     }
 
     if (!vfio_device_irq_set_signaling(vbasedev, VFIO_PCI_MSIX_IRQ_INDEX, nr,
@@ -770,7 +783,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
      * be re-asserted on unmask.  Nothing to do if already using QEMU mode.
      */
     if (vector->virq >= 0) {
-        int32_t fd = event_notifier_get_fd(&vector->interrupt);
+        int32_t fd = vfio_irq_signal_fd(&vector->interrupt);
         Error *err = NULL;
 
         if (!vfio_device_irq_set_signaling(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX,
@@ -3181,7 +3194,9 @@ void vfio_pci_register_err_notifier(VFIOPCIDevice *vdev)
     }
 
     if (!vfio_device_irq_set_signaling(&vdev->vbasedev, VFIO_PCI_ERR_IRQ_INDEX, 0,
-                                       VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
+                                       VFIO_IRQ_SET_ACTION_TRIGGER,
+                                       vfio_irq_signal_fd(&vdev->err_notifier),
+                                       &err)) {
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
         qemu_set_fd_handler(fd, NULL, NULL, vdev);
         vfio_notifier_cleanup(vdev, &vdev->err_notifier, "err_notifier", 0);
@@ -3254,7 +3269,9 @@ void vfio_pci_register_req_notifier(VFIOPCIDevice *vdev)
     }
 
     if (!vfio_device_irq_set_signaling(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX, 0,
-                                       VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
+                                       VFIO_IRQ_SET_ACTION_TRIGGER,
+                                       vfio_irq_signal_fd(&vdev->req_notifier),
+                                       &err)) {
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
         qemu_set_fd_handler(fd, NULL, NULL, vdev);
         vfio_notifier_cleanup(vdev, &vdev->req_notifier, "req_notifier", 0);
-- 
2.50.1 (Apple Git-155)




* [RFC PATCH 02/10] accel/hvf: avoid executable mappings for RAM-device memory
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 01/10] vfio/pci: Use the write side of EventNotifier for IRQ signaling Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-22 17:05   ` Philippe Mathieu-Daudé
  2026-04-05  7:28 ` [RFC PATCH 03/10] vfio: Allow building on Darwin hosts Scott J. Goldman
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman

On macOS, HVF can panic the host kernel if a guest accesses device-backed
memory through an executable mapping. Leave RAM-device/MMIO regions
mapped read/write only and keep EXEC for ordinary guest RAM.

This works around the immediate crash seen with passthrough BAR
mappings. There are still platform-specific performance issues with
guest write-combining mappings, but uncached mappings behave much more
like the host-side mapping and this at least avoids the panic.
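The permission choice made in hvf_set_phys_mem() reduces to a small truth table. This sketch uses local stand-in constants rather than the real HV_MEMORY_* values from Hypervisor.framework, and a plain boolean in place of memory_region_is_ram_device(), so it can be built and checked on any host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for HV_MEMORY_READ/WRITE/EXEC; values are illustrative. */
enum {
    SKETCH_MEM_READ  = 1u << 0,
    SKETCH_MEM_WRITE = 1u << 1,
    SKETCH_MEM_EXEC  = 1u << 2,
};

/*
 * Guest-physical mapping permissions for one memory section:
 * device-backed (RAM-device/MMIO) regions never get EXEC,
 * ordinary guest RAM always does.
 */
static uint32_t sketch_map_flags(bool writable, bool is_ram_device)
{
    uint32_t flags = SKETCH_MEM_READ | (writable ? SKETCH_MEM_WRITE : 0);

    if (!is_ram_device) {
        flags |= SKETCH_MEM_EXEC;
    }
    return flags;
}
```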

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 accel/hvf/hvf-all.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
index 5f357c6d19..76cec4655b 100644
--- a/accel/hvf/hvf-all.c
+++ b/accel/hvf/hvf-all.c
@@ -114,7 +114,15 @@ static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
         return;
     }
 
-    flags = HV_MEMORY_READ | HV_MEMORY_EXEC | (writable ? HV_MEMORY_WRITE : 0);
+    flags = HV_MEMORY_READ | (writable ? HV_MEMORY_WRITE : 0);
+    /*
+     * Leave RAM-device/MMIO mappings RW-only: on macOS, accessing them through
+     * executable HVF mappings can panic the host kernel. Ordinary guest RAM
+     * still needs EXEC.
+     */
+    if (!memory_region_is_ram_device(area)) {
+        flags |= HV_MEMORY_EXEC;
+    }
     mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
 
     trace_hvf_vm_map(gpa, size, mem, flags,
-- 
2.50.1 (Apple Git-155)




* Re: [RFC PATCH 02/10] accel/hvf: avoid executable mappings for RAM-device memory
  2026-04-05  7:28 ` [RFC PATCH 02/10] accel/hvf: avoid executable mappings for RAM-device memory Scott J. Goldman
@ 2026-04-22 17:05   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 25+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-04-22 17:05 UTC
  To: Scott J. Goldman, qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x

On 5/4/26 09:28, Scott J. Goldman wrote:
> On macOS, HVF can panic the host kernel if a guest accesses device-backed
> memory through an executable mapping. Leave RAM-device/MMIO regions
> mapped read/write only and keep EXEC for ordinary guest RAM.
> 
> This works around the immediate crash seen with passthrough BAR
> mappings. There are still platform-specific performance issues with
> guest write-combining mappings, but uncached mappings behave much more
> like the host-side mapping and this at least avoids the panic.
> 
> Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
> ---
>   accel/hvf/hvf-all.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* [RFC PATCH 03/10] vfio: Allow building on Darwin hosts
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 01/10] vfio/pci: Use the write side of EventNotifier for IRQ signaling Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 02/10] accel/hvf: avoid executable mappings for RAM-device memory Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 04/10] vfio: Prepare existing code for Apple VFIO backend Scott J. Goldman
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman

Enable the VFIO subsystem to compile on macOS/Darwin in addition to
Linux.  This is build infrastructure only — no Apple-specific device
code is added yet.

Key changes:
- Add CONFIG_DARWIN Kconfig symbol and propagate it through the build
  system so that VFIO and VFIO_PCI are selectable on Darwin hosts.
- Provide minimal <linux/types.h> and <linux/ioctl.h> shim headers
  under include/compat/ so the Linux UAPI VFIO headers parse on macOS.
- Widen CONFIG_LINUX guards to CONFIG_LINUX || CONFIG_DARWIN in the
  VFIO device, helpers, and migration headers.
- Make container-legacy.c (the Linux /dev/vfio ioctl container) build
  only on Linux, and restrict IOMMUFD to Linux in Kconfig.
- Remove the CONFIG_EVENTFD guard around event_notifier_init_fd() so
  pipe-based EventNotifiers can be initialized on Darwin.
- Mark vfio-pci as not user-creatable on Darwin since the Linux VFIO
  kernel driver is not available; the Apple-specific device type will
  be added in a subsequent commit.
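One way to sanity-check the <linux/types.h> shim is with compile-time layout assertions. The sketch below repeats the shim's typedefs and checks them against a hypothetical UAPI-style struct (sketch_uapi_arg, not a real VFIO structure) whose 64-bit member follows a 32-bit one. The aligned(8) attribute keeps that member at offset 8 regardless of the ABI's natural alignment for uint64_t. The sketch assumes a GCC- or Clang-compatible compiler for __attribute__.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Same typedefs as the include/compat/linux/types.h shim in this patch. */
typedef uint32_t __u32;
typedef uint64_t __u64;
typedef __u64 __aligned_u64 __attribute__((aligned(8)));

/* Hypothetical UAPI-style argument struct: u32 header, then a u64 field. */
struct sketch_uapi_arg {
    __u32 argsz;
    __aligned_u64 iova;
};

/* Fail the build if the shim ever drifts from the Linux UAPI layout. */
_Static_assert(sizeof(__u32) == 4, "__u32 must be 4 bytes");
_Static_assert(sizeof(__u64) == 8, "__u64 must be 8 bytes");
_Static_assert(alignof(__aligned_u64) == 8, "__aligned_u64 must be 8-aligned");
_Static_assert(offsetof(struct sketch_uapi_arg, iova) == 8,
               "64-bit field must start at offset 8");
```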

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 Kconfig.host                      |  3 +++
 backends/Kconfig                  |  2 +-
 hw/vfio/Kconfig                   |  4 ++--
 hw/vfio/meson.build               |  4 +++-
 hw/vfio/pci.c                     |  3 +++
 hw/vfio/vfio-helpers.h            |  2 +-
 hw/vfio/vfio-migration-internal.h |  4 ++--
 include/compat/linux/ioctl.h      |  2 ++
 include/compat/linux/types.h      | 26 ++++++++++++++++++++++++++
 include/hw/vfio/vfio-device.h     |  4 ++--
 meson.build                       | 10 +++++++++-
 util/event_notifier-posix.c       |  5 ++---
 12 files changed, 56 insertions(+), 13 deletions(-)
 create mode 100644 include/compat/linux/ioctl.h
 create mode 100644 include/compat/linux/types.h

diff --git a/Kconfig.host b/Kconfig.host
index 933425c74b..bb2780293c 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -5,6 +5,9 @@
 config LINUX
     bool
 
+config DARWIN
+    bool
+
 config LIBCBOR
     bool
 
diff --git a/backends/Kconfig b/backends/Kconfig
index d3dbe19868..d1be4148d3 100644
--- a/backends/Kconfig
+++ b/backends/Kconfig
@@ -2,7 +2,7 @@ source tpm/Kconfig
 
 config IOMMUFD
     bool
-    depends on VFIO
+    depends on VFIO && LINUX
 
 config SPDM_SOCKET
     bool
diff --git a/hw/vfio/Kconfig b/hw/vfio/Kconfig
index 27de24e4db..a409483c34 100644
--- a/hw/vfio/Kconfig
+++ b/hw/vfio/Kconfig
@@ -2,14 +2,14 @@
 
 config VFIO
     bool
-    depends on LINUX
+    depends on LINUX || DARWIN
 
 config VFIO_PCI
     bool
     default y
     select VFIO
     select EDID
-    depends on LINUX && PCI
+    depends on (LINUX || DARWIN) && PCI
 
 config VFIO_CCW
     bool
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 82f68698fb..1ee9c11d5b 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -4,9 +4,11 @@ vfio_ss = ss.source_set()
 vfio_ss.add(files(
   'listener.c',
   'container.c',
-  'container-legacy.c',
   'helpers.c',
 ))
+if host_os == 'linux'
+  vfio_ss.add(files('container-legacy.c'))
+endif
 vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'pci-quirks.c',
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ee1a42e7e0..5a1c2d8c2e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3862,6 +3862,9 @@ static void vfio_pci_class_init(ObjectClass *klass, const void *data)
 #endif
     dc->vmsd = &vfio_cpr_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
+#ifdef CONFIG_DARWIN
+    dc->user_creatable = false;
+#endif
     pdc->realize = vfio_pci_realize;
 
     object_class_property_set_description(klass, /* 1.3 */
diff --git a/hw/vfio/vfio-helpers.h b/hw/vfio/vfio-helpers.h
index 54a327ffbc..2afb360797 100644
--- a/hw/vfio/vfio-helpers.h
+++ b/hw/vfio/vfio-helpers.h
@@ -9,7 +9,7 @@
 #ifndef HW_VFIO_VFIO_HELPERS_H
 #define HW_VFIO_VFIO_HELPERS_H
 
-#ifdef CONFIG_LINUX
+#if defined(CONFIG_LINUX) || defined(CONFIG_DARWIN)
 #include <linux/vfio.h>
 
 extern int vfio_kvm_device_fd;
diff --git a/hw/vfio/vfio-migration-internal.h b/hw/vfio/vfio-migration-internal.h
index 814fbd9eba..566cd6a871 100644
--- a/hw/vfio/vfio-migration-internal.h
+++ b/hw/vfio/vfio-migration-internal.h
@@ -9,7 +9,7 @@
 #ifndef HW_VFIO_VFIO_MIGRATION_INTERNAL_H
 #define HW_VFIO_VFIO_MIGRATION_INTERNAL_H
 
-#ifdef CONFIG_LINUX
+#if defined(CONFIG_LINUX) || defined(CONFIG_DARWIN)
 #include <linux/vfio.h>
 #endif
 
@@ -62,7 +62,7 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev);
 int vfio_save_device_config_state(QEMUFile *f, void *opaque, Error **errp);
 int vfio_load_device_config_state(QEMUFile *f, void *opaque);
 
-#ifdef CONFIG_LINUX
+#if defined(CONFIG_LINUX) || defined(CONFIG_DARWIN)
 int vfio_migration_set_state(VFIODevice *vbasedev,
                              enum vfio_device_mig_state new_state,
                              enum vfio_device_mig_state recover_state,
diff --git a/include/compat/linux/ioctl.h b/include/compat/linux/ioctl.h
new file mode 100644
index 0000000000..2c789fefc6
--- /dev/null
+++ b/include/compat/linux/ioctl.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* empty Darwin shim - ioctl macros not needed */
diff --git a/include/compat/linux/types.h b/include/compat/linux/types.h
new file mode 100644
index 0000000000..d6620aaf7f
--- /dev/null
+++ b/include/compat/linux/types.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Darwin shim for <linux/types.h>
+ *
+ * The Linux UAPI headers that QEMU copies into linux-headers/ expect
+ * these typedefs from <linux/types.h>.  Provide the small subset we
+ * need so those headers parse on macOS.
+ */
+#ifndef COMPAT_LINUX_TYPES_H
+#define COMPAT_LINUX_TYPES_H
+
+#include <stdint.h>
+
+typedef uint8_t  __u8;
+typedef uint16_t __u16;
+typedef uint32_t __u32;
+typedef uint64_t __u64;
+typedef int8_t   __s8;
+typedef int16_t  __s16;
+typedef int32_t  __s32;
+typedef int64_t  __s64;
+typedef int64_t  loff_t;
+
+typedef __u64 __aligned_u64 __attribute__((aligned(8)));
+
+#endif /* COMPAT_LINUX_TYPES_H */
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 828a31c006..17c5db369c 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -23,7 +23,7 @@
 
 #include "system/memory.h"
 #include "qemu/queue.h"
-#ifdef CONFIG_LINUX
+#if defined(CONFIG_LINUX) || defined(CONFIG_DARWIN)
 #include <linux/vfio.h>
 #endif
 #include "system/system.h"
@@ -171,7 +171,7 @@ VFIODevice *vfio_get_vfio_device(Object *obj);
 typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIODeviceList vfio_device_list;
 
-#ifdef CONFIG_LINUX
+#if defined(CONFIG_LINUX) || defined(CONFIG_DARWIN)
 /*
  * How devices communicate with the server.  The default option is through
  * ioctl() to the kernel VFIO driver, but vfio-user can use a socket to a remote
diff --git a/meson.build b/meson.build
index daa58e46a3..b12466b730 100644
--- a/meson.build
+++ b/meson.build
@@ -764,7 +764,14 @@ if 'objc' in all_languages
   add_project_arguments(objc.get_supported_arguments(qemu_common_flags + warn_flags),
                         native: false, language: 'objc')
 endif
-if host_os == 'linux'
+if host_os == 'darwin'
+  # linux-headers/linux/vfio.h includes <linux/types.h> and <linux/ioctl.h>
+  # which are system headers on Linux but absent on macOS.  Point at the
+  # static shims in include/compat/
+  add_project_arguments('-isystem', meson.current_source_dir() / 'include/compat',
+                        language: all_languages)
+endif
+if host_os in ['linux', 'darwin']
   add_project_arguments('-isystem', meson.current_source_dir() / 'linux-headers',
                         '-isystem', 'linux-headers',
                         language: all_languages)
@@ -3282,6 +3289,7 @@ host_kconfig = \
   (have_vhost_kernel ? ['CONFIG_VHOST_KERNEL=y'] : []) + \
   (have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
   (host_os == 'linux' ? ['CONFIG_LINUX=y'] : []) + \
+  (host_os == 'darwin' ? ['CONFIG_DARWIN=y'] : []) + \
   (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
   (vfio_user_server_allowed ? ['CONFIG_VFIO_USER_SERVER_ALLOWED=y'] : []) + \
   (hv_balloon ? ['CONFIG_HV_BALLOON_POSSIBLE=y'] : []) + \
diff --git a/util/event_notifier-posix.c b/util/event_notifier-posix.c
index 83fdbb96bb..0e05e81aca 100644
--- a/util/event_notifier-posix.c
+++ b/util/event_notifier-posix.c
@@ -20,10 +20,10 @@
 #include <sys/eventfd.h>
 #endif
 
-#ifdef CONFIG_EVENTFD
 /*
  * Initialize @e with existing file descriptor @fd.
- * @fd must be a genuine eventfd object, emulation with pipe won't do.
+ * On hosts without eventfd(), callers can still restore a single descriptor
+ * for cases that only need eventfd-like semantics.
  */
 void event_notifier_init_fd(EventNotifier *e, int fd)
 {
@@ -31,7 +31,6 @@ void event_notifier_init_fd(EventNotifier *e, int fd)
     e->wfd = fd;
     e->initialized = true;
 }
-#endif
 
 int event_notifier_init(EventNotifier *e, int active)
 {
-- 
2.50.1 (Apple Git-155)




* [RFC PATCH 04/10] vfio: Prepare existing code for Apple VFIO backend
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (2 preceding siblings ...)
  2026-04-05  7:28 ` [RFC PATCH 03/10] vfio: Allow building on Darwin hosts Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 05/10] vfio: Add region_map and region_unmap callbacks to VFIODeviceIOOps Scott J. Goldman
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman, Scott J. Goldman

From: "Scott J. Goldman" <scottjg@umich.edu>

Adjust the shared VFIO code so a non-Linux backend can plug in:

- vfio_device_get_name(): accept a device that already has a name set
  without requiring sysfsdev or an fd.
- vfio_device_is_mdev(): return true on Darwin — the dext mediates all
  device access and manages DMA mappings explicitly, so the mdev
  assumptions (software-managed DMA, balloon-safe) hold.
- vfio_device_attach(): select the Apple IOMMU container type on Darwin.
- vfio_pci_realize(): allow realize when name is pre-set (no sysfsdev),
  and add a no_bar_quirks flag so subclasses can skip BAR quirk setup.
- Add TYPE_VFIO_IOMMU_APPLE and TYPE_VFIO_APPLE_PCI type strings.
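After this patch, vfio_device_get_name() effectively chooses among the following sources for the device name. The sketch below captures only that precedence. SketchVbase and SketchNameSource are hypothetical simplifications, and the real function also performs the stat() and fd-passing checks and reports failures through Error **.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of VFIODevice fields that drive the decision. */
typedef struct {
    int fd;                /* pre-opened device fd, or -1 */
    const char *sysfsdev;  /* Linux sysfs path, or NULL */
    const char *name;      /* may be pre-set by a backend such as Apple's */
} SketchVbase;

typedef enum {
    NAME_FROM_FD,     /* fd-passing path, unchanged by this patch */
    NAME_FROM_SYSFS,  /* stat() the sysfs path and take its basename */
    NAME_PRESET,      /* new: the backend already named the device */
    NAME_ERROR,       /* "No provided host device" */
} SketchNameSource;

static SketchNameSource sketch_name_source(const SketchVbase *v)
{
    if (v->fd >= 0) {
        return NAME_FROM_FD;
    }
    if (v->sysfsdev) {
        return NAME_FROM_SYSFS;
    }
    return v->name ? NAME_PRESET : NAME_ERROR;
}
```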

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 hw/vfio/device.c                 | 20 ++++++++++++++++++++
 hw/vfio/pci.c                    |  8 +++++---
 hw/vfio/pci.h                    |  1 +
 hw/vfio/types.h                  |  2 ++
 include/hw/vfio/vfio-container.h |  1 +
 5 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 973fc35b59..338becffa7 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -316,6 +316,13 @@ bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
     struct stat st;
 
     if (vbasedev->fd < 0) {
+        if (!vbasedev->sysfsdev) {
+            if (vbasedev->name) {
+                return true;
+            }
+            error_setg(errp, "No provided host device");
+            return false;
+        }
         if (stat(vbasedev->sysfsdev, &st) < 0) {
             error_setg_errno(errp, errno, "no such host device");
             error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
@@ -404,7 +411,16 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
     g_autofree char *tmp = NULL;
 
     if (!vbasedev->sysfsdev) {
+#ifdef CONFIG_DARWIN
+        /*
+         * On Darwin the dext mediates all device access and manages DMA
+         * mappings explicitly, so the mdev assumptions (software-managed
+         * DMA, balloon-safe) hold.
+         */
+        return true;
+#else
         return false;
+#endif
     }
 
     tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
@@ -462,9 +478,13 @@ bool vfio_device_attach_by_iommu_type(const char *iommu_type, char *name,
 bool vfio_device_attach(char *name, VFIODevice *vbasedev,
                         AddressSpace *as, Error **errp)
 {
+#ifdef CONFIG_DARWIN
+    const char *iommu_type = TYPE_VFIO_IOMMU_APPLE;
+#else
     const char *iommu_type = vbasedev->iommufd ?
                              TYPE_VFIO_IOMMU_IOMMUFD :
                              TYPE_VFIO_IOMMU_LEGACY;
+#endif
 
     return vfio_device_attach_by_iommu_type(iommu_type, name, vbasedev,
                                             as, errp);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5a1c2d8c2e..cf817d9ae7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3483,7 +3483,7 @@ static void vfio_pci_realize(PCIDevice *pdev, Error **errp)
     char uuid[UUID_STR_LEN];
     g_autofree char *name = NULL;
 
-    if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
+    if (vbasedev->fd < 0 && !vbasedev->sysfsdev && !vbasedev->name) {
         if (!(~vdev->host.domain || ~vdev->host.bus ||
               ~vdev->host.slot || ~vdev->host.function)) {
             error_setg(errp, "No provided host device");
@@ -3558,8 +3558,10 @@ static void vfio_pci_realize(PCIDevice *pdev, Error **errp)
         vfio_vga_quirk_setup(vdev);
     }
 
-    for (i = 0; i < PCI_ROM_SLOT; i++) {
-        vfio_bar_quirk_setup(vdev, i);
+    if (!vdev->no_bar_quirks) {
+        for (i = 0; i < PCI_ROM_SLOT; i++) {
+            vfio_bar_quirk_setup(vdev, i);
+        }
     }
 
     if (!vfio_pci_interrupt_setup(vdev, errp)) {
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index d6495d7f29..424acd71b6 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -187,6 +187,7 @@ struct VFIOPCIDevice {
     bool defer_kvm_irq_routing;
     bool clear_parent_atomics_on_exit;
     bool skip_vsc_check;
+    bool no_bar_quirks;
     uint16_t vpasid_cap_offset;
     VFIODisplay *dpy;
     Notifier irqchip_change_notifier;
diff --git a/hw/vfio/types.h b/hw/vfio/types.h
index 5482d90808..b44c234ac4 100644
--- a/hw/vfio/types.h
+++ b/hw/vfio/types.h
@@ -20,4 +20,6 @@
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
+#define TYPE_VFIO_APPLE_PCI "vfio-apple-pci"
+
 #endif /* HW_VFIO_VFIO_TYPES_H */
diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
index a7d5c5ed67..5ccbabccb4 100644
--- a/include/hw/vfio/vfio-container.h
+++ b/include/hw/vfio/vfio-container.h
@@ -117,6 +117,7 @@ vfio_container_get_page_size_mask(const VFIOContainer *bcontainer)
 #define TYPE_VFIO_IOMMU_SPAPR TYPE_VFIO_IOMMU "-spapr"
 #define TYPE_VFIO_IOMMU_IOMMUFD TYPE_VFIO_IOMMU "-iommufd"
 #define TYPE_VFIO_IOMMU_USER TYPE_VFIO_IOMMU "-user"
+#define TYPE_VFIO_IOMMU_APPLE TYPE_VFIO_IOMMU "-apple"
 
 struct VFIOIOMMUClass {
     ObjectClass parent_class;
-- 
2.50.1 (Apple Git-155)




* [RFC PATCH 05/10] vfio: Add region_map and region_unmap callbacks to VFIODeviceIOOps
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (3 preceding siblings ...)
  2026-04-05  7:28 ` [RFC PATCH 04/10] vfio: Prepare existing code for Apple VFIO backend Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 06/10] vfio: Add device_reset callback " Scott J. Goldman
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman

Rename vfio_region_mmap() and vfio_region_unmap() to
vfio_region_mmap_fd() and vfio_region_unmap_fd() respectively, and
introduce new region_map and region_unmap callbacks in
VFIODeviceIOOps.

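The new vfio_region_mmap() and vfio_region_unmap() functions
dispatch through these io_ops callbacks, so each backend can provide
its own region mapping implementation. Both the ioctl and vfio-user
backends implement the callbacks by calling the renamed fd-based
variants. vfio_region_exit() and vfio_region_finalize() now also tear
down mappings through vfio_region_unmap().

The MemoryRegion setup and teardown that used to be open-coded in
vfio_region_mmap() and vfio_subregion_unmap() is factored out into
vfio_region_register_mmap() and vfio_region_unregister_mmap(), so a
backend that obtains its mappings some other way can reuse it.

This lets later backends, such as the Apple DriverKit backend in this
series, map regions without relying on mmap() of a device fd.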
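The dispatch described above can be modelled in isolation. This is a simplified sketch, not QEMU code: `Device`, `Region`, `IOOps`, `region_mmap()` and `region_unmap()` are hypothetical stand-ins for VFIODevice, VFIORegion, VFIODeviceIOOps and the new vfio_region_mmap()/vfio_region_unmap(), and the fd-style backend only flips a flag where QEMU would mmap() the device fd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for VFIODevice, VFIORegion and VFIODeviceIOOps;
 * the real structures carry much more state.
 */
typedef struct Device Device;
typedef struct Region Region;

typedef struct {
    int (*region_map)(Device *dev, Region *region);
    void (*region_unmap)(Device *dev, Region *region);
} IOOps;

struct Device {
    const IOOps *io_ops;
};

struct Region {
    Device *dev;
    int has_mem;   /* stands in for region->mem != NULL */
    int mapped;    /* stands in for a live mmaps[] entry */
};

/* Hypothetical fd-style backend: QEMU would mmap() the device fd here. */
static int fd_region_map(Device *dev, Region *region)
{
    (void)dev;
    region->mapped = 1;
    return 0;
}

static void fd_region_unmap(Device *dev, Region *region)
{
    (void)dev;
    region->mapped = 0;
}

static const IOOps fd_ops = {
    .region_map = fd_region_map,
    .region_unmap = fd_region_unmap,
};

/* Generic entry point that only dispatches, like the new vfio_region_mmap(). */
static int region_mmap(Region *region)
{
    Device *dev;

    if (!region->has_mem) {
        return 0;            /* nothing to map */
    }
    dev = region->dev;
    if (!dev->io_ops || !dev->io_ops->region_map) {
        return -EINVAL;      /* backend cannot map regions */
    }
    return dev->io_ops->region_map(dev, region);
}

/* Teardown is a no-op when the backend has no region_unmap callback. */
static void region_unmap(Region *region)
{
    Device *dev;

    if (!region->has_mem) {
        return;
    }
    dev = region->dev;
    if (dev->io_ops && dev->io_ops->region_unmap) {
        dev->io_ops->region_unmap(dev, region);
    }
}
```

As in the patch, mapping without a region_map callback fails with -EINVAL, while unmapping without a region_unmap callback silently does nothing, which keeps the exit and finalize paths safe for any backend.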

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 hw/vfio-user/device.c         |  16 ++++-
 hw/vfio/device.c              |  14 +++++
 hw/vfio/region.c              | 108 +++++++++++++++++++++++-----------
 hw/vfio/vfio-region.h         |   4 ++
 include/hw/vfio/vfio-device.h |  25 ++++++++
 5 files changed, 131 insertions(+), 36 deletions(-)

diff --git a/hw/vfio-user/device.c b/hw/vfio-user/device.c
index 64ef35b320..957d19217b 100644
--- a/hw/vfio-user/device.c
+++ b/hw/vfio-user/device.c
@@ -12,6 +12,7 @@
 #include "qemu/lockable.h"
 #include "qemu/thread.h"
 
+#include "hw/vfio/vfio-region.h"
 #include "hw/vfio-user/device.h"
 #include "hw/vfio-user/trace.h"
 
@@ -428,6 +429,18 @@ static int vfio_user_device_io_region_write(VFIODevice *vbasedev, uint8_t index,
     return ret;
 }
 
+static int vfio_user_device_io_region_map(VFIODevice *vbasedev,
+                                          VFIORegion *region)
+{
+    return vfio_region_mmap_fd(region);
+}
+
+static void vfio_user_device_io_region_unmap(VFIODevice *vbasedev,
+                                             VFIORegion *region)
+{
+    vfio_region_unmap_fd(region);
+}
+
 /*
  * Socket-based io_ops
  */
@@ -437,5 +450,6 @@ VFIODeviceIOOps vfio_user_device_io_ops_sock = {
     .set_irqs = vfio_user_device_io_set_irqs,
     .region_read = vfio_user_device_io_region_read,
     .region_write = vfio_user_device_io_region_write,
-
+    .region_map = vfio_user_device_io_region_map,
+    .region_unmap = vfio_user_device_io_region_unmap,
 };
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 338becffa7..1b703dcbec 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -23,6 +23,7 @@
 
 #include "hw/vfio/vfio-device.h"
 #include "hw/vfio/pci.h"
+#include "hw/vfio/vfio-region.h"
 #include "hw/core/iommu.h"
 #include "hw/core/hw-error.h"
 #include "trace.h"
@@ -656,6 +657,17 @@ static int vfio_device_io_region_write(VFIODevice *vbasedev, uint8_t index,
     return ret < 0 ? -errno : ret;
 }
 
+static int vfio_device_io_region_map(VFIODevice *vbasedev, VFIORegion *region)
+{
+    return vfio_region_mmap_fd(region);
+}
+
+static void vfio_device_io_region_unmap(VFIODevice *vbasedev,
+                                        VFIORegion *region)
+{
+    vfio_region_unmap_fd(region);
+}
+
 static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
     .device_feature = vfio_device_io_device_feature,
     .get_region_info = vfio_device_io_get_region_info,
@@ -663,4 +675,6 @@ static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
     .set_irqs = vfio_device_io_set_irqs,
     .region_read = vfio_device_io_region_read,
     .region_write = vfio_device_io_region_write,
+    .region_map = vfio_device_io_region_map,
+    .region_unmap = vfio_device_io_region_unmap,
 };
diff --git a/hw/vfio/region.c b/hw/vfio/region.c
index 47fdc2df34..9f7780e06c 100644
--- a/hw/vfio/region.c
+++ b/hw/vfio/region.c
@@ -273,15 +273,48 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
     return 0;
 }
 
-static void vfio_subregion_unmap(VFIORegion *region, int index)
+void vfio_region_register_mmap(VFIORegion *region, int index)
 {
+    char *name;
+
+    if (!region->mmaps[index].mmap) {
+        return;
+    }
+
+    name = g_strdup_printf("%s mmaps[%d]",
+                           memory_region_name(region->mem), index);
+    memory_region_init_ram_device_ptr(&region->mmaps[index].mem,
+                                      memory_region_owner(region->mem),
+                                      name, region->mmaps[index].size,
+                                      region->mmaps[index].mmap);
+    g_free(name);
+    memory_region_add_subregion(region->mem, region->mmaps[index].offset,
+                                &region->mmaps[index].mem);
+
+    trace_vfio_region_mmap(memory_region_name(&region->mmaps[index].mem),
+                           region->mmaps[index].offset,
+                           region->mmaps[index].offset +
+                           region->mmaps[index].size - 1);
+}
+
+void vfio_region_unregister_mmap(VFIORegion *region, int index)
+{
+    if (!region->mmaps[index].mmap) {
+        return;
+    }
+
     trace_vfio_region_unmap(memory_region_name(&region->mmaps[index].mem),
                             region->mmaps[index].offset,
                             region->mmaps[index].offset +
                             region->mmaps[index].size - 1);
     memory_region_del_subregion(region->mem, &region->mmaps[index].mem);
-    munmap(region->mmaps[index].mmap, region->mmaps[index].size);
     object_unparent(OBJECT(&region->mmaps[index].mem));
+}
+
+static void vfio_region_unmap_fd_one(VFIORegion *region, int index)
+{
+    vfio_region_unregister_mmap(region, index);
+    munmap(region->mmaps[index].mmap, region->mmaps[index].size);
     region->mmaps[index].mmap = NULL;
 }
 
@@ -342,14 +375,13 @@ static bool vfio_region_create_dma_buf(VFIORegion *region, Error **errp)
     return true;
 }
 
-int vfio_region_mmap(VFIORegion *region)
+int vfio_region_mmap_fd(VFIORegion *region)
 {
     void *map_base, *map_align;
     Error *local_err = NULL;
     int i, ret, prot = 0;
     off_t map_offset = 0;
     size_t align;
-    char *name;
     int fd;
 
     if (!region->mem || !region->nr_mmaps) {
@@ -417,21 +449,7 @@ int vfio_region_mmap(VFIORegion *region)
             goto no_mmap;
         }
 
-        name = g_strdup_printf("%s mmaps[%d]",
-                               memory_region_name(region->mem), i);
-        memory_region_init_ram_device_ptr(&region->mmaps[i].mem,
-                                          memory_region_owner(region->mem),
-                                          name, region->mmaps[i].size,
-                                          region->mmaps[i].mmap);
-        g_free(name);
-        memory_region_add_subregion(region->mem, region->mmaps[i].offset,
-                                    &region->mmaps[i].mem);
-
-        trace_vfio_region_mmap(memory_region_name(&region->mmaps[i].mem),
-                               region->mmaps[i].offset,
-                               region->mmaps[i].offset +
-                               region->mmaps[i].size - 1);
-
+        vfio_region_register_mmap(region, i);
         map_offset = region->mmaps[i].offset + region->mmaps[i].size;
     }
 
@@ -457,13 +475,13 @@ no_mmap:
     region->mmaps[i].mmap = NULL;
 
     for (i--; i >= 0; i--) {
-        vfio_subregion_unmap(region, i);
+        vfio_region_unmap_fd_one(region, i);
     }
 
     return ret;
 }
 
-void vfio_region_unmap(VFIORegion *region)
+void vfio_region_unmap_fd(VFIORegion *region)
 {
     int i;
 
@@ -473,41 +491,61 @@ void vfio_region_unmap(VFIORegion *region)
 
     for (i = 0; i < region->nr_mmaps; i++) {
         if (region->mmaps[i].mmap) {
-            vfio_subregion_unmap(region, i);
+            vfio_region_unmap_fd_one(region, i);
         }
     }
 }
 
-void vfio_region_exit(VFIORegion *region)
+int vfio_region_mmap(VFIORegion *region)
 {
-    int i;
+    VFIODevice *vbasedev;
+
+    if (!region->mem) {
+        return 0;
+    }
+
+    vbasedev = region->vbasedev;
+    if (!vbasedev->io_ops || !vbasedev->io_ops->region_map) {
+        return -EINVAL;
+    }
+
+    return vbasedev->io_ops->region_map(vbasedev, region);
+}
+
+void vfio_region_unmap(VFIORegion *region)
+{
+    VFIODevice *vbasedev;
 
     if (!region->mem) {
         return;
     }
 
-    for (i = 0; i < region->nr_mmaps; i++) {
-        if (region->mmaps[i].mmap) {
-            memory_region_del_subregion(region->mem, &region->mmaps[i].mem);
-        }
+    vbasedev = region->vbasedev;
+    if (!vbasedev->io_ops || !vbasedev->io_ops->region_unmap) {
+        return;
+    }
+
+    vbasedev->io_ops->region_unmap(vbasedev, region);
+}
+
+void vfio_region_exit(VFIORegion *region)
+{
+    if (!region->mem) {
+        return;
     }
 
+    vfio_region_unmap(region);
+
     trace_vfio_region_exit(region->vbasedev->name, region->nr);
 }
 
 void vfio_region_finalize(VFIORegion *region)
 {
-    int i;
-
     if (!region->mem) {
         return;
     }
 
-    for (i = 0; i < region->nr_mmaps; i++) {
-        if (region->mmaps[i].mmap) {
-            munmap(region->mmaps[i].mmap, region->mmaps[i].size);
-        }
-    }
+    vfio_region_unmap(region);
 
     g_free(region->mem);
     g_free(region->mmaps);
diff --git a/hw/vfio/vfio-region.h b/hw/vfio/vfio-region.h
index 9b21d4ee5b..afdce466b1 100644
--- a/hw/vfio/vfio-region.h
+++ b/hw/vfio/vfio-region.h
@@ -39,6 +39,10 @@ uint64_t vfio_region_read(void *opaque,
                           hwaddr addr, unsigned size);
 int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
                       int index, const char *name, Error **errp);
+void vfio_region_register_mmap(VFIORegion *region, int index);
+void vfio_region_unregister_mmap(VFIORegion *region, int index);
+int vfio_region_mmap_fd(VFIORegion *region);
+void vfio_region_unmap_fd(VFIORegion *region);
 int vfio_region_mmap(VFIORegion *region);
 void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled);
 void vfio_region_unmap(VFIORegion *region);
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 17c5db369c..1a3b42bcaf 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -44,6 +44,7 @@ enum {
 typedef struct VFIODeviceOps VFIODeviceOps;
 typedef struct VFIODeviceIOOps VFIODeviceIOOps;
 typedef struct VFIOMigration VFIOMigration;
+typedef struct VFIORegion VFIORegion;
 
 typedef struct IOMMUFDBackend IOMMUFDBackend;
 typedef struct VFIOIOASHwpt VFIOIOASHwpt;
@@ -260,6 +261,30 @@ struct VFIODeviceIOOps {
      */
     int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
                         void *data, bool post);
+
+    /**
+     * @region_map
+     *
+     * Map a region's directly accessible subranges and register any mmap-backed
+     * subregions with QEMU.
+     *
+     * @vdev: #VFIODevice to use
+     * @region: #VFIORegion to map
+     *
+     * Returns 0 on success or -errno.
+     */
+    int (*region_map)(VFIODevice *vdev, VFIORegion *region);
+
+    /**
+     * @region_unmap
+     *
+     * Unregister any mmap-backed subregions for a region and release their
+     * backend mappings.
+     *
+     * @vdev: #VFIODevice to use
+     * @region: #VFIORegion to unmap
+     */
+    void (*region_unmap)(VFIODevice *vdev, VFIORegion *region);
 };
 
 void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainer *bcontainer,
-- 
2.50.1 (Apple Git-155)




* [RFC PATCH 06/10] vfio: Add device_reset callback to VFIODeviceIOOps
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (4 preceding siblings ...)
  2026-04-05  7:28 ` [RFC PATCH 05/10] vfio: Add region_map and region_unmap callbacks to VFIODeviceIOOps Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 07/10] vfio/apple: Add DriverKit dext client library Scott J. Goldman
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman

Route all VFIO_DEVICE_RESET ioctl calls through a new device_reset
io_ops callback, matching the pattern established for region_map and
region_unmap. This allows non-Linux backends to provide their own
reset implementation.

The Linux ioctl backend implements the callback by issuing the
VFIO_DEVICE_RESET ioctl and returns 0 or -errno. All existing
callsites in pci.c, ccw.c, ap.c, and migration.c are converted to use
the callback; those that report errors now use strerror(-ret) instead
of reading errno.
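The new callback returns 0 on success or a negative errno, rather than the raw ioctl convention of -1 plus errno. A minimal sketch of that conversion and of a caller reporting the error follows. `Device`, `IOOps`, `fake_reset_ioctl()` and `reset_and_report()` are hypothetical; `fake_reset_ioctl()` stands in for `ioctl(fd, VFIO_DEVICE_RESET)` so the example runs without a VFIO device. The NULL-callback check mirrors vfio_pci_reset() in this patch; the ap, ccw and migration callers invoke the callback directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef struct Device Device;

typedef struct {
    int (*device_reset)(Device *dev);   /* returns 0 or -errno */
} IOOps;

struct Device {
    const char *name;
    const IOOps *io_ops;
    int fail_errno;   /* hypothetical: errno the fake ioctl should report */
};

/* Hypothetical stand-in for ioctl(fd, VFIO_DEVICE_RESET): -1 and errno on failure. */
static int fake_reset_ioctl(Device *dev)
{
    if (dev->fail_errno) {
        errno = dev->fail_errno;
        return -1;
    }
    return 0;
}

/* Same shape as the ioctl backend in this patch: turn -1/errno into -errno. */
static int ioctl_device_reset(Device *dev)
{
    int ret = fake_reset_ioctl(dev);

    return ret < 0 ? -errno : ret;
}

static const IOOps ioctl_ops = { .device_reset = ioctl_device_reset };

/* A caller in the style of vfio_ap_reset(): report with strerror(-ret). */
static int reset_and_report(Device *dev)
{
    int ret;

    if (!dev->io_ops || !dev->io_ops->device_reset) {
        return -ENOTSUP;     /* backend provides no reset hook */
    }
    ret = dev->io_ops->device_reset(dev);
    if (ret) {
        fprintf(stderr, "failed to reset %s device: %s\n", dev->name,
                strerror(-ret));
    }
    return ret;
}
```

Because the error travels in the return value, callers no longer depend on errno surviving between the ioctl and the error report.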

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 hw/vfio/ap.c                  |  4 ++--
 hw/vfio/ccw.c                 |  2 +-
 hw/vfio/device.c              |  8 ++++++++
 hw/vfio/migration.c           |  5 +++--
 hw/vfio/pci.c                 |  6 ++++--
 include/hw/vfio/vfio-device.h | 11 +++++++++++
 6 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 5c8f305653..2f2f17e666 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -290,10 +290,10 @@ static void vfio_ap_reset(DeviceState *dev)
     int ret;
     VFIOAPDevice *vapdev = VFIO_AP_DEVICE(dev);
 
-    ret = ioctl(vapdev->vdev.fd, VFIO_DEVICE_RESET);
+    ret = vapdev->vdev.io_ops->device_reset(&vapdev->vdev);
     if (ret) {
         error_report("%s: failed to reset %s device: %s", __func__,
-                     vapdev->vdev.name, strerror(errno));
+                     vapdev->vdev.name, strerror(-ret));
     }
 }
 
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index ce9c014e6a..330b733b7e 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -242,7 +242,7 @@ static void vfio_ccw_reset(DeviceState *dev)
 {
     VFIOCCWDevice *vcdev = VFIO_CCW(dev);
 
-    ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
+    vcdev->vdev.io_ops->device_reset(&vcdev->vdev);
 }
 
 static void vfio_ccw_crw_read(VFIOCCWDevice *vcdev)
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 1b703dcbec..cf3953d975 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -657,6 +657,13 @@ static int vfio_device_io_region_write(VFIODevice *vbasedev, uint8_t index,
     return ret < 0 ? -errno : ret;
 }
 
+static int vfio_device_io_device_reset(VFIODevice *vbasedev)
+{
+    int ret = ioctl(vbasedev->fd, VFIO_DEVICE_RESET);
+
+    return ret < 0 ? -errno : ret;
+}
+
 static int vfio_device_io_region_map(VFIODevice *vbasedev, VFIORegion *region)
 {
     return vfio_region_mmap_fd(region);
@@ -673,6 +680,7 @@ static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
     .get_region_info = vfio_device_io_get_region_info,
     .get_irq_info = vfio_device_io_get_irq_info,
     .set_irqs = vfio_device_io_set_irqs,
+    .device_reset = vfio_device_io_device_reset,
     .region_read = vfio_device_io_region_read,
     .region_write = vfio_device_io_region_write,
     .region_map = vfio_device_io_region_map,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 83327b6573..b31253ea90 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -216,9 +216,10 @@ int vfio_migration_set_state(VFIODevice *vbasedev,
     return 0;
 
 reset_device:
-    if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
+    ret = vbasedev->io_ops->device_reset(vbasedev);
+    if (ret) {
         hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
-                 strerror(errno));
+                 strerror(-ret));
     }
 
     vfio_migration_set_device_state(vbasedev, VFIO_DEVICE_STATE_RUNNING);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index cf817d9ae7..458b3400cc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3691,7 +3691,8 @@ static void vfio_pci_reset(DeviceState *dev)
 
     if (vdev->vbasedev.reset_works &&
         (vdev->has_flr || !vdev->has_pm_reset) &&
-        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
+        vdev->vbasedev.io_ops && vdev->vbasedev.io_ops->device_reset &&
+        !vdev->vbasedev.io_ops->device_reset(&vdev->vbasedev)) {
         trace_vfio_pci_reset_flr(vdev->vbasedev.name);
         goto post_reset;
     }
@@ -3703,7 +3704,8 @@ static void vfio_pci_reset(DeviceState *dev)
 
     /* If nothing else works and the device supports PM reset, use it */
     if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
-        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
+        vdev->vbasedev.io_ops && vdev->vbasedev.io_ops->device_reset &&
+        !vdev->vbasedev.io_ops->device_reset(&vdev->vbasedev)) {
         trace_vfio_pci_reset_pm(vdev->vbasedev.name);
         goto post_reset;
     }
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 1a3b42bcaf..0e6bff774e 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -262,6 +262,17 @@ struct VFIODeviceIOOps {
     int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
                         void *data, bool post);
 
+    /**
+     * @device_reset
+     *
+     * Reset the device.
+     *
+     * @vdev: #VFIODevice to reset
+     *
+     * Returns 0 on success or -errno.
+     */
+    int (*device_reset)(VFIODevice *vdev);
+
     /**
      * @region_map
      *
-- 
2.50.1 (Apple Git-155)




* [RFC PATCH 07/10] vfio/apple: Add DriverKit dext client library
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (5 preceding siblings ...)
  2026-04-05  7:28 ` [RFC PATCH 06/10] vfio: Add device_reset callback " Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 08/10] vfio/apple: Add IOMMU container and PCI device Scott J. Goldman
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman, Scott J. Goldman

From: "Scott J. Goldman" <scottjg@umich.edu>

Add the C client library for communicating with the VFIOUserPCIDriver
DriverKit extension (dext) on macOS.  This provides the low-level
IOUserClient wrappers that the Apple VFIO backend will use:

- Connection management (connect, disconnect, claim)
- PCI config space read/write (individual and block)
- BAR info queries, BAR mapping/unmapping, MMIO read/write
- DMA region registration and unregistration
- Interrupt setup, pending IRQ polling, async notification
- Device reset (FLR with hot-reset fallback)
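The block variant of config space access is built from individual reads: as many 4-byte reads as fit, then single bytes for the tail. The sketch below models that splitting against an in-memory config space. `fake_cfg`, `fake_calls`, `fake_config_read()` and `config_read_block()` are hypothetical; `fake_config_read()` stands in for apple_dext_config_read(), which in this patch makes one IOConnectCallMethod call per access. The width and bounds checks in the fake are additions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-byte config space and a counter of simulated dext calls. */
static uint8_t fake_cfg[256];
static unsigned fake_calls;

/* Hypothetical stand-in for apple_dext_config_read(): width 1 or 4 only. */
static int fake_config_read(uint64_t offset, uint64_t width, uint64_t *out)
{
    uint32_t dword = 0;

    if ((width != 1 && width != 4) || offset + width > sizeof(fake_cfg)) {
        return -1;
    }
    fake_calls++;
    if (width == 4) {
        memcpy(&dword, &fake_cfg[offset], 4);
        *out = dword;
    } else {
        *out = fake_cfg[offset];
    }
    return 0;
}

/* Same strategy as apple_dext_config_read_block(): dwords first, then bytes. */
static int config_read_block(uint64_t offset, void *buf, size_t len)
{
    uint8_t *dst = buf;
    uint64_t val;

    while (len >= 4) {
        uint32_t dword;

        if (fake_config_read(offset, 4, &val)) {
            return -1;
        }
        dword = (uint32_t)val;
        memcpy(dst, &dword, 4);
        dst += 4;
        offset += 4;
        len -= 4;
    }
    while (len > 0) {
        if (fake_config_read(offset, 1, &val)) {
            return -1;
        }
        *dst++ = (uint8_t)val;
        offset++;
        len--;
    }
    return 0;
}
```

A 7-byte read therefore costs one dword access plus three byte accesses, and each access is a separate round trip to the dext.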

Commands go through IOKit's IOConnectCallMethod with scalar
arguments, and BAR mappings through IOConnectMapMemory64; the dext
mediates access to the physical PCI device via PCIDriverKit.
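Each scalar wrapper follows the same handshake: a selector, an array of uint64_t inputs, and an output array whose capacity is passed in and whose filled count comes back. A sketch of that handshake, modelled on dext_connection_matches_bdf(), is below. `FakeDext`, `SEL_GET_IDENTITY`, `fake_call_method()` and `connection_matches_bdf()` are hypothetical stand-ins for the dext, kSelectorGetIdentity and IOConnectCallMethod(), so it runs without IOKit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selector value, modelled on kSelectorGetIdentity (0). */
enum { SEL_GET_IDENTITY = 0 };

/* Hypothetical fake dext that answers the identity query with a PCI BDF. */
typedef struct {
    uint8_t bus, device, function;
    uint32_t scalars_returned;   /* lets a test simulate a short reply */
} FakeDext;

/*
 * Hypothetical stand-in for a scalar-only IOConnectCallMethod() call:
 * *output_count holds the capacity of `output` on entry and the number
 * of scalars actually written on return.
 */
static int fake_call_method(FakeDext *dext, uint32_t selector,
                            uint64_t *output, uint32_t *output_count)
{
    uint64_t reply[3];
    uint32_t n, i;

    if (selector != SEL_GET_IDENTITY) {
        return -1;
    }
    reply[0] = dext->bus;
    reply[1] = dext->device;
    reply[2] = dext->function;

    n = dext->scalars_returned;
    if (n > 3) {
        n = 3;
    }
    if (n > *output_count) {
        n = *output_count;
    }
    for (i = 0; i < n; i++) {
        output[i] = reply[i];
    }
    *output_count = n;
    return 0;
}

/* Mirrors dext_connection_matches_bdf(): only trust scalars that came back. */
static bool connection_matches_bdf(FakeDext *dext, uint8_t bus,
                                   uint8_t device, uint8_t function)
{
    uint64_t output[6] = {0};
    uint32_t output_count = 6;

    if (fake_call_method(dext, SEL_GET_IDENTITY, output, &output_count) != 0 ||
        output_count < 3) {
        return false;
    }
    return (uint8_t)output[0] == bus &&
           (uint8_t)output[1] == device &&
           (uint8_t)output[2] == function;
}
```

Checking the returned count before reading the outputs is what lets apple_dext_connect() skip dext instances that answer with fewer scalars than expected.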

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 hw/vfio/apple-dext-client.c | 681 ++++++++++++++++++++++++++++++++++++
 hw/vfio/apple-dext-client.h | 253 ++++++++++++++
 hw/vfio/meson.build         |   7 +
 3 files changed, 941 insertions(+)
 create mode 100644 hw/vfio/apple-dext-client.c
 create mode 100644 hw/vfio/apple-dext-client.h

diff --git a/hw/vfio/apple-dext-client.c b/hw/vfio/apple-dext-client.c
new file mode 100644
index 0000000000..7ba03fc6e9
--- /dev/null
+++ b/hw/vfio/apple-dext-client.c
@@ -0,0 +1,681 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * C client implementation for communicating with the VFIOUserPCIDriver dext
+ * via IOKit IOUserClient.
+ *
+ * Copyright (c) 2026 Scott J. Goldman
+ */
+
+#include "qemu/osdep.h"
+
+#include "apple-dext-client.h"
+
+#include <CoreFoundation/CoreFoundation.h>
+#include <IOKit/IOKitLib.h>
+#include <dispatch/dispatch.h>
+#include <string.h>
+
+enum {
+    kSelectorGetIdentity        = 0,
+    kSelectorClaim              = 1,
+    kSelectorTerminate          = 2,
+    kSelectorAllocateDMABuffer  = 3,
+    kSelectorFreeDMABuffer      = 4,
+    kSelectorRegisterDMARegion  = 5,
+    kSelectorUnregisterDMARegion = 6,
+    kSelectorProbeDMARegion     = 7,
+    kSelectorConfigRead         = 8,
+    kSelectorConfigWrite        = 9,
+    kSelectorGetBARInfo         = 10,
+    kSelectorMMIORead           = 11,
+    kSelectorMMIOWrite          = 12,
+    kSelectorSetupInterrupts    = 13,
+    kSelectorCheckInterrupt     = 14,
+    kSelectorWaitInterrupt      = 15,
+    kSelectorSetIRQMask         = 16,
+    kSelectorResetDevice        = 17,
+};
+
+/*
+ * Keep this in sync with PCIDriverKit BAR type encoding. Bit 3 indicates
+ * prefetchability for memory BARs.
+ */
+#define APPLE_DEXT_BAR_PREFETCHABLE_MASK 0x08
+#ifndef kIOMapWriteCombineCache
+#define kIOMapWriteCombineCache 0x00000400
+#endif
+
+static bool
+dext_service_matches_class(io_service_t service, const char *className)
+{
+    bool match = false;
+    CFTypeRef ref;
+
+    ref = IORegistryEntryCreateCFProperty(service, CFSTR("IOUserClass"),
+                                          kCFAllocatorDefault, 0);
+    if (ref == NULL) {
+        return false;
+    }
+
+    if (CFGetTypeID(ref) == CFStringGetTypeID()) {
+        CFStringRef expected = CFStringCreateWithCString(
+            kCFAllocatorDefault, className, kCFStringEncodingUTF8);
+        if (expected != NULL) {
+            match = CFStringCompare((CFStringRef)ref, expected, 0)
+                    == kCFCompareEqualTo;
+            CFRelease(expected);
+        }
+    }
+    CFRelease(ref);
+    return match;
+}
+
+static bool
+dext_connection_matches_bdf(io_connect_t connection,
+                            uint8_t bus, uint8_t device, uint8_t function)
+{
+    uint64_t output[6] = {0};
+    uint32_t outputCount = 6;
+    kern_return_t kr;
+
+    kr = IOConnectCallMethod(connection, kSelectorGetIdentity,
+                             NULL, 0, NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS || outputCount < 3) {
+        return false;
+    }
+
+    return (uint8_t)output[0] == bus &&
+           (uint8_t)output[1] == device &&
+           (uint8_t)output[2] == function;
+}
+
+io_connect_t
+apple_dext_connect(uint8_t bus, uint8_t device, uint8_t function)
+{
+    CFMutableDictionaryRef matching;
+    io_iterator_t iterator = IO_OBJECT_NULL;
+    io_connect_t result = IO_OBJECT_NULL;
+    io_service_t service;
+    kern_return_t kr;
+
+    matching = IOServiceMatching("IOUserService");
+    if (matching == NULL) {
+        return IO_OBJECT_NULL;
+    }
+
+    kr = IOServiceGetMatchingServices(kIOMainPortDefault, matching, &iterator);
+    if (kr != KERN_SUCCESS) {
+        return IO_OBJECT_NULL;
+    }
+
+    while ((service = IOIteratorNext(iterator)) != IO_OBJECT_NULL) {
+        io_connect_t connection = IO_OBJECT_NULL;
+
+        if (!dext_service_matches_class(service, "VFIOUserPCIDriver")) {
+            IOObjectRelease(service);
+            continue;
+        }
+
+        kr = IOServiceOpen(service, mach_task_self(), 0, &connection);
+        IOObjectRelease(service);
+
+        if (kr != KERN_SUCCESS) {
+            continue;
+        }
+
+        if (dext_connection_matches_bdf(connection, bus, device, function)) {
+            result = connection;
+            break;
+        }
+
+        IOServiceClose(connection);
+    }
+
+    IOObjectRelease(iterator);
+    return result;
+}
+
+void
+apple_dext_disconnect(io_connect_t connection)
+{
+    if (connection != IO_OBJECT_NULL) {
+        IOServiceClose(connection);
+    }
+}
+
+kern_return_t
+apple_dext_claim(io_connect_t connection)
+{
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    return IOConnectCallMethod(connection,
+                               kSelectorClaim,
+                               NULL, 0, NULL, 0,
+                               NULL, NULL, NULL, NULL);
+}
+
+kern_return_t
+apple_dext_register_dma(io_connect_t connection,
+                        uint64_t iova,
+                        uint64_t client_va,
+                        uint64_t size,
+                        uint64_t *out_bus_addr,
+                        uint64_t *out_bus_len)
+{
+    uint64_t input[3] = { iova, client_va, size };
+    uint64_t output[3] = {0};
+    uint32_t outputCount = 3;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = IOConnectCallMethod(connection,
+                             kSelectorRegisterDMARegion,
+                             input, 3,
+                             NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    if (out_bus_addr != NULL && outputCount >= 2) {
+        *out_bus_addr = output[1];
+    }
+    if (out_bus_len != NULL && outputCount >= 3) {
+        *out_bus_len = output[2];
+    }
+
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_unregister_dma(io_connect_t connection,
+                          uint64_t iova)
+{
+    uint64_t input[1] = { iova };
+
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    return IOConnectCallMethod(connection,
+                               kSelectorUnregisterDMARegion,
+                               input, 1,
+                               NULL, 0,
+                               NULL, NULL,
+                               NULL, NULL);
+}
+
+kern_return_t
+apple_dext_probe_dma(io_connect_t connection,
+                     uint64_t iova,
+                     uint64_t offset,
+                     uint64_t *out_word)
+{
+    uint64_t input[2] = { iova, offset };
+    uint64_t output[1] = {0};
+    uint32_t outputCount = 1;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL || out_word == NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = IOConnectCallMethod(connection,
+                             kSelectorProbeDMARegion,
+                             input, 2,
+                             NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    *out_word = output[0];
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_config_read(io_connect_t connection,
+                       uint64_t offset,
+                       uint64_t width,
+                       uint64_t *out_value)
+{
+    uint64_t input[2] = { offset, width };
+    uint64_t output[1] = {0};
+    uint32_t outputCount = 1;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL || out_value == NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = IOConnectCallMethod(connection,
+                             kSelectorConfigRead,
+                             input, 2,
+                             NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    *out_value = output[0];
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_config_write(io_connect_t connection,
+                        uint64_t offset,
+                        uint64_t width,
+                        uint64_t value)
+{
+    uint64_t input[3] = { offset, width, value };
+
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    return IOConnectCallMethod(connection,
+                               kSelectorConfigWrite,
+                               input, 3,
+                               NULL, 0,
+                               NULL, NULL,
+                               NULL, NULL);
+}
+
+kern_return_t
+apple_dext_config_read_block(io_connect_t connection,
+                             uint64_t offset,
+                             void *buf,
+                             size_t len)
+{
+    uint8_t *dst = (uint8_t *)buf;
+    uint64_t pos = offset;
+    size_t remaining = len;
+
+    if (connection == IO_OBJECT_NULL || buf == NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    while (remaining >= 4) {
+        uint64_t val = 0;
+        uint32_t dword;
+        kern_return_t kr;
+
+        kr = apple_dext_config_read(connection, pos, 4, &val);
+        if (kr != KERN_SUCCESS) {
+            return kr;
+        }
+        dword = (uint32_t)val;
+        memcpy(dst, &dword, 4);
+        dst += 4;
+        pos += 4;
+        remaining -= 4;
+    }
+
+    while (remaining > 0) {
+        uint64_t val = 0;
+        kern_return_t kr;
+
+        kr = apple_dext_config_read(connection, pos, 1, &val);
+        if (kr != KERN_SUCCESS) {
+            return kr;
+        }
+        *dst = (uint8_t)val;
+        dst++;
+        pos++;
+        remaining--;
+    }
+
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_get_bar_info(io_connect_t connection,
+                        uint8_t bar,
+                        uint8_t *out_mem_idx,
+                        uint64_t *out_size,
+                        uint8_t *out_type)
+{
+    uint64_t input[1] = { bar };
+    uint64_t output[3] = {0};
+    uint32_t outputCount = 3;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = IOConnectCallMethod(connection,
+                             kSelectorGetBARInfo,
+                             input, 1,
+                             NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    if (out_mem_idx != NULL) {
+        *out_mem_idx = (uint8_t)output[0];
+    }
+    if (out_size != NULL) {
+        *out_size = output[1];
+    }
+    if (out_type != NULL) {
+        *out_type = (uint8_t)output[2];
+    }
+
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_map_bar(io_connect_t connection,
+                       uint8_t bar,
+                       mach_vm_address_t *out_addr,
+                       mach_vm_size_t *out_size,
+                       uint8_t *out_type)
+{
+    uint64_t bar_size = 0;
+    uint8_t bar_type = 0;
+    uint32_t mem_type;
+    mach_vm_address_t addr = 0;
+    mach_vm_size_t size = 0;
+    IOOptionBits opts = kIOMapAnywhere;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL || out_addr == NULL || out_size == NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = apple_dext_get_bar_info(connection, bar, NULL,
+                                     &bar_size, &bar_type);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    /*
+     * The memory type for IOConnectMapMemory64 must match the dext's
+     * CopyClientMemoryForType expectation:
+     * kVFIOUserPCIDriverUserClientMemoryTypeBAR0 (= 1) plus the BAR index.
+     * This is NOT the same as the PCIDriverKit internal memoryIndex returned
+     * by GetBARInfo.
+     */
+    mem_type = 1 + (uint32_t)bar;
+
+    if (bar_type & APPLE_DEXT_BAR_PREFETCHABLE_MASK) {
+        opts |= kIOMapWriteCombineCache;
+    }
+
+    kr = IOConnectMapMemory64(connection, mem_type, mach_task_self(),
+                              &addr, &size, opts);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    *out_addr = addr;
+    *out_size = size;
+    if (out_type != NULL) {
+        *out_type = bar_type;
+    }
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_unmap_bar(io_connect_t connection,
+                         uint8_t bar,
+                         mach_vm_address_t addr)
+{
+    uint32_t mem_type = 1 + (uint32_t)bar;
+
+    if (connection == IO_OBJECT_NULL || addr == 0) {
+        return kIOReturnBadArgument;
+    }
+
+    return IOConnectUnmapMemory64(connection, mem_type, mach_task_self(), addr);
+}
+
+kern_return_t
+apple_dext_mmio_read(io_connect_t connection,
+                         uint8_t mem_idx,
+                         uint64_t offset,
+                         uint64_t width,
+                         uint64_t *out_value)
+{
+    uint64_t input[3] = { mem_idx, offset, width };
+    uint64_t output[1] = {0};
+    uint32_t outputCount = 1;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL || out_value == NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = IOConnectCallMethod(connection,
+                             kSelectorMMIORead,
+                             input, 3,
+                             NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    *out_value = output[0];
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_mmio_write(io_connect_t connection,
+                          uint8_t mem_idx,
+                          uint64_t offset,
+                          uint64_t width,
+                          uint64_t value)
+{
+    uint64_t input[4] = { mem_idx, offset, width, value };
+
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    return IOConnectCallMethod(connection,
+                               kSelectorMMIOWrite,
+                               input, 4,
+                               NULL, 0,
+                               NULL, NULL,
+                               NULL, NULL);
+}
+
+kern_return_t
+apple_dext_setup_interrupts(io_connect_t connection,
+                                uint32_t *out_num_vectors)
+{
+    uint64_t output[1] = {0};
+    uint32_t outputCount = 1;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = IOConnectCallMethod(connection,
+                             kSelectorSetupInterrupts,
+                             NULL, 0,
+                             NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    if (out_num_vectors != NULL && outputCount >= 1) {
+        *out_num_vectors = (uint32_t)output[0];
+    }
+
+    return kIOReturnSuccess;
+}
+
+kern_return_t
+apple_dext_reset_device(io_connect_t connection)
+{
+    if (connection == IO_OBJECT_NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    return IOConnectCallMethod(connection,
+                               kSelectorResetDevice,
+                               NULL, 0, NULL, 0,
+                               NULL, NULL, NULL, NULL);
+}
+
+kern_return_t
+apple_dext_set_irq_mask(io_connect_t connection, const uint64_t mask[4])
+{
+    if (connection == IO_OBJECT_NULL || mask == NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    return IOConnectCallMethod(connection,
+                               kSelectorSetIRQMask,
+                               mask, 4,
+                               NULL, 0,
+                               NULL, NULL,
+                               NULL, NULL);
+}
+
+kern_return_t
+apple_dext_read_pending_irqs(io_connect_t connection, uint64_t pending[4])
+{
+    uint64_t output[4] = {0};
+    uint32_t outputCount = 4;
+    kern_return_t kr;
+    uint32_t i;
+
+    if (connection == IO_OBJECT_NULL || pending == NULL) {
+        return kIOReturnBadArgument;
+    }
+
+    kr = IOConnectCallMethod(connection,
+                             kSelectorCheckInterrupt,
+                             NULL, 0,
+                             NULL, 0,
+                             output, &outputCount,
+                             NULL, NULL);
+    if (kr != KERN_SUCCESS) {
+        return kr;
+    }
+
+    for (i = 0; i < 4; i++) {
+        pending[i] = (i < outputCount) ? output[i] : 0;
+    }
+
+    return kIOReturnSuccess;
+}
+
+struct AppleDextInterruptNotify {
+    io_connect_t connection;
+    IONotificationPortRef notifyPort;
+    mach_port_t machPort;
+    dispatch_queue_t dispatchQueue;
+    void (*handler_fn)(void *opaque);
+    void *opaque;
+};
+
+static void
+apple_dext_async_callback(void *refcon, IOReturn result,
+                          void **args, uint32_t numArgs)
+{
+    AppleDextInterruptNotify *notify = refcon;
+
+    if (result == kIOReturnSuccess && notify->handler_fn) {
+        notify->handler_fn(notify->opaque);
+    }
+}
+
+static kern_return_t
+apple_dext_interrupt_notify_arm(AppleDextInterruptNotify *notify)
+{
+    uint64_t asyncRef[kIOAsyncCalloutCount];
+
+    asyncRef[kIOAsyncCalloutFuncIndex] =
+        (uint64_t)(uintptr_t)apple_dext_async_callback;
+    asyncRef[kIOAsyncCalloutRefconIndex] =
+        (uint64_t)(uintptr_t)notify;
+
+    return IOConnectCallAsyncMethod(notify->connection,
+                                    kSelectorWaitInterrupt,
+                                    notify->machPort,
+                                    asyncRef, kIOAsyncCalloutCount,
+                                    NULL, 0, NULL, 0,
+                                    NULL, NULL, NULL, NULL);
+}
+
+AppleDextInterruptNotify *
+apple_dext_interrupt_notify_create(io_connect_t connection,
+                                   void (*handler_fn)(void *opaque),
+                                   void *opaque)
+{
+    AppleDextInterruptNotify *notify;
+    kern_return_t kr;
+
+    if (connection == IO_OBJECT_NULL || handler_fn == NULL) {
+        return NULL;
+    }
+
+    notify = g_new0(AppleDextInterruptNotify, 1);
+    notify->connection = connection;
+    notify->handler_fn = handler_fn;
+    notify->opaque = opaque;
+
+    notify->notifyPort = IONotificationPortCreate(kIOMainPortDefault);
+    if (!notify->notifyPort) {
+        g_free(notify);
+        return NULL;
+    }
+
+    notify->dispatchQueue = dispatch_queue_create(
+        "org.qemu.vfio-apple.irq-notify", DISPATCH_QUEUE_SERIAL);
+    IONotificationPortSetDispatchQueue(notify->notifyPort,
+                                       notify->dispatchQueue);
+    notify->machPort = IONotificationPortGetMachPort(notify->notifyPort);
+
+    kr = apple_dext_interrupt_notify_arm(notify);
+    if (kr != KERN_SUCCESS) {
+        IONotificationPortDestroy(notify->notifyPort);
+        dispatch_release(notify->dispatchQueue);
+        g_free(notify);
+        return NULL;
+    }
+
+    return notify;
+}
+
+kern_return_t
+apple_dext_interrupt_notify_rearm(AppleDextInterruptNotify *notify)
+{
+    if (!notify) {
+        return kIOReturnBadArgument;
+    }
+    return apple_dext_interrupt_notify_arm(notify);
+}
+
+void
+apple_dext_interrupt_notify_destroy(AppleDextInterruptNotify *notify)
+{
+    if (!notify) {
+        return;
+    }
+
+    IONotificationPortDestroy(notify->notifyPort);
+    dispatch_release(notify->dispatchQueue);
+    g_free(notify);
+}
diff --git a/hw/vfio/apple-dext-client.h b/hw/vfio/apple-dext-client.h
new file mode 100644
index 0000000000..07574493e6
--- /dev/null
+++ b/hw/vfio/apple-dext-client.h
@@ -0,0 +1,253 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * C API for connecting to the VFIOUserPCIDriver DriverKit extension.
+ *
+ * The vfio-user server process uses this to:
+ *  1. Find and open an IOUserClient to the dext for a given PCI BDF.
+ *  2. Claim the device so the dext opens its IOPCIDevice provider.
+ *  3. Register client-owned memory (QEMU guest RAM mapped via shared file)
+ *     for DMA by the physical PCI device.
+ *  4. Unregister DMA regions when QEMU removes them.
+ *
+ * Integration with libvfio-user:
+ *   vfu_dma_register_cb_t  ->  apple_dext_register_dma()
+ *   vfu_dma_unregister_cb_t -> apple_dext_unregister_dma()
+ *
+ * Copyright (c) 2026 Scott J. Goldman
+ */
+
+#ifndef HW_VFIO_APPLE_DEXT_CLIENT_H
+#define HW_VFIO_APPLE_DEXT_CLIENT_H
+
+#include <IOKit/IOKitLib.h>
+#include <stdint.h>
+
+/*
+ * Find the VFIOUserPCIDriver dext instance matching the given PCI BDF
+ * and open an IOUserClient connection to it.
+ * Returns IO_OBJECT_NULL on failure.
+ */
+io_connect_t apple_dext_connect(uint8_t bus, uint8_t device,
+                                    uint8_t function);
+
+/*
+ * Close a previously opened connection.
+ */
+void apple_dext_disconnect(io_connect_t connection);
+
+/*
+ * Claim the PCI device through the dext (opens the IOPCIDevice provider).
+ * Must be called before registering DMA regions.
+ */
+kern_return_t apple_dext_claim(io_connect_t connection);
+
+/*
+ * Register a region of this process's address space for DMA.
+ *
+ * @iova:         guest IOVA (device-visible DMA address)
+ * @client_va:    virtual address of the memory in this process
+ * @size:         region size in bytes
+ * @out_bus_addr: receives the first DMA bus address segment (may be NULL)
+ * @out_bus_len:  receives the length of that segment (may be NULL)
+ *
+ * The memory at client_va must remain valid and mapped until the region
+ * is unregistered.
+ */
+kern_return_t apple_dext_register_dma(io_connect_t connection,
+                                          uint64_t iova,
+                                          uint64_t client_va,
+                                          uint64_t size,
+                                          uint64_t *out_bus_addr,
+                                          uint64_t *out_bus_len);
+
+/*
+ * Unregister a previously registered DMA region identified by its IOVA.
+ */
+kern_return_t apple_dext_unregister_dma(io_connect_t connection,
+                                            uint64_t iova);
+
+/*
+ * Read 8 bytes from a registered DMA region's IOMemoryDescriptor.
+ * Used to verify the descriptor references the same physical pages
+ * as the client's virtual mapping.
+ *
+ * @iova:     base IOVA of the registered region
+ * @offset:   byte offset within the region to read from
+ * @out_word: receives the 8-byte value read from the descriptor
+ */
+kern_return_t apple_dext_probe_dma(io_connect_t connection,
+                                       uint64_t iova,
+                                       uint64_t offset,
+                                       uint64_t *out_word);
+
+/*
+ * Read from PCI configuration space.
+ *
+ * @offset: byte offset into config space
+ * @width:  access width in bytes (1, 2, or 4)
+ * @out_value: receives the value read
+ */
+kern_return_t apple_dext_config_read(io_connect_t connection,
+                                         uint64_t offset,
+                                         uint64_t width,
+                                         uint64_t *out_value);
+
+/*
+ * Write to PCI configuration space.
+ *
+ * @offset: byte offset into config space
+ * @width:  access width in bytes (1, 2, or 4)
+ * @value:  value to write
+ */
+kern_return_t apple_dext_config_write(io_connect_t connection,
+                                          uint64_t offset,
+                                          uint64_t width,
+                                          uint64_t value);
+
+/*
+ * Read a contiguous block of PCI configuration space.
+ * Internally issues repeated 32-bit reads, followed by single-byte
+ * reads for any trailing bytes.
+ *
+ * @offset: starting byte offset
+ * @buf:    destination buffer
+ * @len:    number of bytes to read
+ */
+kern_return_t apple_dext_config_read_block(io_connect_t connection,
+                                               uint64_t offset,
+                                               void *buf,
+                                               size_t len);
+
+/*
+ * Query BAR information from the PCI device.
+ *
+ * @bar:          BAR index (0-5)
+ * @out_mem_idx:  receives the memory index for MemoryRead/Write calls
+ * @out_size:     receives the BAR size in bytes
+ * @out_type:     receives the BAR type (mem32, mem64, io, etc.)
+ */
+kern_return_t apple_dext_get_bar_info(io_connect_t connection,
+                                          uint8_t bar,
+                                          uint8_t *out_mem_idx,
+                                          uint64_t *out_size,
+                                          uint8_t *out_type);
+
+/*
+ * Map a PCI BAR directly into this process through the dext.
+ *
+ * The dext supplies the BAR's IOMemoryDescriptor.  The mapping uses the
+ * default cache mode for register BARs and requests write-combining
+ * (kIOMapWriteCombineCache) for prefetchable apertures.
+ *
+ * @bar:      BAR index (0-5)
+ * @out_addr: receives the mapped virtual address
+ * @out_size: receives the mapped size
+ * @out_type: receives the BAR type (may be NULL)
+ */
+kern_return_t apple_dext_map_bar(io_connect_t connection,
+                                     uint8_t bar,
+                                     mach_vm_address_t *out_addr,
+                                     mach_vm_size_t *out_size,
+                                     uint8_t *out_type);
+
+/*
+ * Unmap a BAR previously mapped with apple_dext_map_bar().
+ */
+kern_return_t apple_dext_unmap_bar(io_connect_t connection,
+                                       uint8_t bar,
+                                       mach_vm_address_t addr);
+
+/*
+ * Read from a PCI BAR (MMIO).
+ *
+ * @mem_idx: memory index from apple_dext_get_bar_info
+ * @offset:  byte offset within the BAR
+ * @width:   access width in bytes (1, 2, 4, or 8)
+ * @out_value: receives the value read
+ */
+kern_return_t apple_dext_mmio_read(io_connect_t connection,
+                                       uint8_t mem_idx,
+                                       uint64_t offset,
+                                       uint64_t width,
+                                       uint64_t *out_value);
+
+/*
+ * Write to a PCI BAR (MMIO).
+ *
+ * @mem_idx: memory index from apple_dext_get_bar_info
+ * @offset:  byte offset within the BAR
+ * @width:   access width in bytes (1, 2, 4, or 8)
+ * @value:   value to write
+ */
+kern_return_t apple_dext_mmio_write(io_connect_t connection,
+                                        uint8_t mem_idx,
+                                        uint64_t offset,
+                                        uint64_t width,
+                                        uint64_t value);
+
+/*
+ * Set up interrupt forwarding for the PCI device.
+ * Creates IOInterruptDispatchSource handlers for all available
+ * MSI/MSI-X vectors in the dext. Fired vectors are recorded as pending
+ * bits and retrieved via apple_dext_read_pending_irqs().
+ *
+ * @out_num_vectors: receives the number of interrupt vectors registered
+ */
+kern_return_t apple_dext_setup_interrupts(io_connect_t connection,
+                                              uint32_t *out_num_vectors);
+
+/*
+ * Reset the PCI device via the dext.  Tries FLR first, then falls
+ * back to PM reset (D3hot → D0 transition).
+ */
+kern_return_t apple_dext_reset_device(io_connect_t connection);
+
+/*
+ * Set the IRQ enable mask in the dext.  Only vectors with their
+ * corresponding bit set will be recorded as pending when the
+ * hardware fires.  mask[] is 4 x uint64_t covering 256 vectors.
+ */
+kern_return_t apple_dext_set_irq_mask(io_connect_t connection,
+                                      const uint64_t mask[4]);
+
+/*
+ * Read and clear all pending interrupt bits from the dext.
+ * Returns up to 256 bits (one bit per MSI/MSI-X vector) across
+ * 4 uint64_t words.  Each bit that was set is atomically cleared
+ * in the dext.
+ */
+kern_return_t apple_dext_read_pending_irqs(io_connect_t connection,
+                                           uint64_t pending[4]);
+
+/*
+ * Opaque state for async interrupt notification from the dext.
+ */
+typedef struct AppleDextInterruptNotify AppleDextInterruptNotify;
+
+/*
+ * Create async interrupt notification.  handler_fn is called on a GCD
+ * dispatch queue whenever the dext signals that one or more interrupt
+ * bits have been set.  The handler should wake the QEMU main loop,
+ * which then calls apple_dext_read_pending_irqs() to drain the bits.
+ *
+ * The notification is armed immediately upon creation.
+ */
+AppleDextInterruptNotify *
+apple_dext_interrupt_notify_create(io_connect_t connection,
+                                   void (*handler_fn)(void *opaque),
+                                   void *opaque);
+
+/*
+ * Re-arm the async interrupt notification after draining pending bits.
+ * Must be called after each wakeup to receive subsequent notifications.
+ */
+kern_return_t
+apple_dext_interrupt_notify_rearm(AppleDextInterruptNotify *notify);
+
+/*
+ * Tear down and free async interrupt notification state.
+ */
+void apple_dext_interrupt_notify_destroy(AppleDextInterruptNotify *notify);
+
+#endif /* HW_VFIO_APPLE_DEXT_CLIENT_H */
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 1ee9c11d5b..965c8e5b80 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -36,3 +36,10 @@ system_ss.add(when: 'CONFIG_IOMMUFD', if_false: files('iommufd-stubs.c'))
 system_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'display.c',
 ))
+
+# Apple VFIO backend
+if host_os == 'darwin'
+  system_ss.add(when: 'CONFIG_VFIO',
+                if_true: [files('apple-dext-client.c'),
+                          coref, iokit])
+endif
-- 
2.50.1 (Apple Git-155)




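For readers of apple-dext-client.h above: the IRQ mask passed to apple_dext_set_irq_mask() and the pending words returned by apple_dext_read_pending_irqs() use one bit per vector, with vector v at bit (v % 64) of word (v / 64). This matches the `word * 64 + bit` decoding in apple_vfio_irq_handler() in patch 08. The sketch below builds such a mask and decodes it back. irq_mask_from_vectors() and irq_vectors_from_bits() are hypothetical helpers, not part of the series, and __builtin_ctzll assumes GCC or Clang (the patch relies on it too).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IRQ_MASK_WORDS  4
#define IRQ_MAX_VECTORS (IRQ_MASK_WORDS * 64)   /* 256 vectors */

/*
 * Hypothetical helper: build the 4 x uint64_t mask layout expected by
 * apple_dext_set_irq_mask().  Vector v sets bit (v % 64) of word (v / 64).
 * Vectors >= 256 cannot be represented; they are skipped and counted.
 */
static size_t irq_mask_from_vectors(const uint32_t *vectors, size_t n,
                                    uint64_t mask[IRQ_MASK_WORDS])
{
    size_t skipped = 0;

    assert(mask != NULL && (vectors != NULL || n == 0));
    memset(mask, 0, IRQ_MASK_WORDS * sizeof(mask[0]));

    for (size_t i = 0; i < n; i++) {
        uint32_t v = vectors[i];

        if (v >= IRQ_MAX_VECTORS) {
            skipped++;
            continue;
        }
        mask[v / 64] |= UINT64_C(1) << (v % 64);
    }
    return skipped;
}

/*
 * Hypothetical helper: list the vectors flagged in a pending[] array such
 * as the one filled by apple_dext_read_pending_irqs(), lowest vector
 * first.  Writes at most max_out entries and returns how many it wrote.
 */
static size_t irq_vectors_from_bits(const uint64_t words[IRQ_MASK_WORDS],
                                    uint32_t *out, size_t max_out)
{
    size_t n = 0;

    for (uint32_t w = 0; w < IRQ_MASK_WORDS; w++) {
        uint64_t bits = words[w];

        while (bits != 0 && n < max_out) {
            out[n++] = w * 64 + (uint32_t)__builtin_ctzll(bits);
            bits &= bits - 1;   /* clear the lowest set bit */
        }
    }
    return n;
}
```

Passing the resulting array to apple_dext_set_irq_mask() would restrict which vectors the dext records as pending. The `bits &= bits - 1` step visits only set bits, so sparse pending words cost little to decode.
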
* [RFC PATCH 08/10] vfio/apple: Add IOMMU container and PCI device
From: Scott J. Goldman @ 2026-04-05  7:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman, Scott J. Goldman

From: "Scott J. Goldman" <scottjg@umich.edu>

Add the core Apple VFIO backend: an IOMMU container type and a PCI
device type that together provide VFIO passthrough on macOS via the
VFIOUserPCIDriver DriverKit extension.

AppleVFIOContainer (container-apple.c):
  Implements VFIOIOMMUClass for DMA map/unmap by calling through to the
  dext client's register/unregister_dma functions.  Synthesizes
  VFIO_DEVICE_GET_INFO responses with appropriate flags for the
  passthrough device.

VFIOApplePCIDevice (apple-device.c):
  QOM type "vfio-apple-pci" subclassing VFIOPCIDevice.  Provides the
  full VFIODeviceIOOps implementation:
  - Config space reads forwarded to the dext; writes filtered to block
    BAR and status register reprogramming (macOS/DART owns those).
  - PCI COMMAND register writes forwarded for bus-master/memory-space
    enable.
  - BAR regions directly mapped via IOConnectMapMemory64 and accessed
    as host MMIO loads/stores.
  - MSI/MSI-X interrupt delivery through a bitmap-poll model with
    async GCD notification bridged to an EventNotifier.
  - Device reset via the dext (IOPCIDevice::Reset FLR/hot-reset).
  - Shared dext connection management for multi-function devices.
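
The interrupt path described above crosses threads. The dext's async completion runs apple_vfio_irq_wakeup() on a GCD queue, which only sets an EventNotifier. The main-loop fd handler then drains the notifier, reads the pending bitmap, delivers vectors, and re-arms the wait. Below is a minimal POSIX sketch of that hand-off. It assumes a pipe-backed notifier (QEMU's EventNotifier falls back to a pipe where eventfd is unavailable) and uses a pthread in place of the GCD queue. Notifier, notifier_set(), notifier_test_and_clear(), irq_source() and wait_readable() are hypothetical names, not QEMU APIs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for EventNotifier: a pipe with a non-blocking read end. */
typedef struct {
    int rfd;
    int wfd;
} Notifier;

static int notifier_init(Notifier *n)
{
    int fds[2];
    int flags;

    if (pipe(fds) < 0) {
        return -1;
    }
    flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Producer side (the GCD callback in the patch): only pokes the fd. */
static void notifier_set(Notifier *n)
{
    static const char byte = 1;
    ssize_t ret;

    assert(n != NULL);
    do {
        ret = write(n->wfd, &byte, 1);
    } while (ret < 0 && errno == EINTR);
}

/*
 * Consumer side (the main-loop fd handler): drain every queued wakeup and
 * report whether there was at least one.  Several wakeups coalesce into a
 * single pass, which is fine because the handler reads the whole pending
 * bitmap at once.
 */
static int notifier_test_and_clear(Notifier *n)
{
    char buf[64];
    ssize_t len;
    int got = 0;

    for (;;) {
        len = read(n->rfd, buf, sizeof(buf));
        if (len > 0) {
            got = 1;
        } else if (len < 0 && errno == EINTR) {
            continue;
        } else {
            break;   /* EAGAIN: pipe drained (or EOF/error) */
        }
    }
    return got;
}

/* Simulated interrupt source thread: signals three "interrupts". */
static void *irq_source(void *opaque)
{
    for (int i = 0; i < 3; i++) {
        notifier_set(opaque);
    }
    return NULL;
}

/* Block like the main loop's poll() until the read end is readable. */
static int wait_readable(Notifier *n, int timeout_ms)
{
    struct pollfd pfd = { .fd = n->rfd, .events = POLLIN };

    return poll(&pfd, 1, timeout_ms);
}
```

Keeping the GCD side down to a single write means no QEMU state is touched off the main loop; all vector delivery stays in the fd handler.
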

apple.h defines the shared types: AppleVFIOContainer, AppleVFIOState,
AppleVFIOBarMap, and VFIOApplePCIDevice.
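
On the shared-connection bullet above: apple-device.c keeps an AppleVFIOSharedDext { conn, refs } entry per key in a GHashTable. apple_vfio_dext_key() packs that key as bus << 16 | device << 8 | function. The acquire/release code is not part of this excerpt, so what follows is only a sketch of the usual refcounting pattern under that assumption. A fixed array replaces the GHashTable. conn_t, fake_connect() and fake_disconnect() are hypothetical stand-ins for io_connect_t, apple_dext_connect() and apple_dext_disconnect().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t conn_t;           /* hypothetical stand-in for io_connect_t */
#define CONN_NULL ((conn_t)0)      /* stand-in for IO_OBJECT_NULL */
#define MAX_SHARED 8

typedef struct {
    uint32_t key;    /* packed bus/device/function */
    conn_t conn;
    uint32_t refs;   /* 0 means the slot is free */
} SharedDext;

static SharedDext shared[MAX_SHARED];
static unsigned connects, disconnects;   /* counters for the test */

static uint32_t dext_key(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)bus << 16) | ((uint32_t)dev << 8) | fn;
}

/* Hypothetical stubs for apple_dext_connect()/apple_dext_disconnect(). */
static conn_t fake_connect(uint32_t key) { connects++; return key + 1; }
static void fake_disconnect(conn_t c) { (void)c; disconnects++; }

/* Return the shared connection for a BDF, opening it on first use. */
static conn_t shared_dext_acquire(uint8_t bus, uint8_t dev, uint8_t fn)
{
    uint32_t key = dext_key(bus, dev, fn);
    SharedDext *free_slot = NULL;

    for (size_t i = 0; i < MAX_SHARED; i++) {
        if (shared[i].refs && shared[i].key == key) {
            shared[i].refs++;
            return shared[i].conn;
        }
        if (!shared[i].refs && !free_slot) {
            free_slot = &shared[i];
        }
    }
    if (!free_slot) {
        return CONN_NULL;
    }
    free_slot->conn = fake_connect(key);
    if (free_slot->conn == CONN_NULL) {
        return CONN_NULL;
    }
    free_slot->key = key;
    free_slot->refs = 1;
    return free_slot->conn;
}

/* Drop one reference; the last user closes the connection. */
static void shared_dext_release(conn_t conn)
{
    assert(conn != CONN_NULL);
    for (size_t i = 0; i < MAX_SHARED; i++) {
        if (shared[i].refs && shared[i].conn == conn) {
            if (--shared[i].refs == 0) {
                fake_disconnect(conn);
                shared[i].conn = CONN_NULL;
            }
            return;
        }
    }
}
```
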

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 hw/vfio/apple-device.c    | 945 ++++++++++++++++++++++++++++++++++++++
 hw/vfio/apple.h           |  74 +++
 hw/vfio/container-apple.c | 241 ++++++++++
 hw/vfio/meson.build       |   6 +-
 4 files changed, 1264 insertions(+), 2 deletions(-)
 create mode 100644 hw/vfio/apple-device.c
 create mode 100644 hw/vfio/apple.h
 create mode 100644 hw/vfio/container-apple.c

diff --git a/hw/vfio/apple-device.c b/hw/vfio/apple-device.c
new file mode 100644
index 0000000000..9291ac845b
--- /dev/null
+++ b/hw/vfio/apple-device.c
@@ -0,0 +1,945 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Apple/macOS VFIO PCI device passthrough via DriverKit dext.
+ *
+ * Copyright (c) 2026 Scott J. Goldman
+ */
+
+#include "qemu/osdep.h"
+
+#include <errno.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <linux/vfio.h>
+
+#include "apple-dext-client.h"
+#include "hw/vfio/apple.h"
+#include "hw/vfio/vfio-container.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/host-pci-mmio.h"
+#include "qemu/main-loop.h"
+
+typedef struct AppleVFIOSharedDext {
+    io_connect_t conn;
+    uint32_t refs;
+} AppleVFIOSharedDext;
+
+typedef struct AppleVFIODMAProbe {
+    uint64_t managed_bdf;
+    uint64_t host_bus;
+    uint64_t host_device;
+    uint64_t host_function;
+    DeviceState *match;
+} AppleVFIODMAProbe;
+
+static GHashTable *apple_vfio_shared_dexts;
+
+static inline guint apple_vfio_dext_key(uint8_t bus, uint8_t device,
+                                        uint8_t function)
+{
+    return ((guint)bus << 16) | ((guint)device << 8) | function;
+}
+
+static inline AppleVFIOContainer *apple_vfio_container(VFIODevice *vbasedev)
+{
+    return VFIO_IOMMU_APPLE(vbasedev->bcontainer);
+}
+
+static inline io_connect_t apple_vfio_connection(VFIODevice *vbasedev)
+{
+    AppleVFIOContainer *container = apple_vfio_container(vbasedev);
+
+    return container ? container->dext_conn : IO_OBJECT_NULL;
+}
+
+static void apple_vfio_find_dma_companion_cb(PCIBus *bus, PCIDevice *pdev,
+                                             void *opaque)
+{
+    AppleVFIODMAProbe *probe = opaque;
+    Error *err = NULL;
+    uint64_t managed_bdf;
+    uint64_t host_bus;
+    uint64_t host_device;
+    uint64_t host_function;
+
+    if (probe->match ||
+        !object_dynamic_cast(OBJECT(pdev), "apple-dma-pci")) {
+        return;
+    }
+
+    managed_bdf = object_property_get_uint(OBJECT(pdev), "managed-bdf", &err);
+    if (err) {
+        error_free(err);
+        return;
+    }
+
+    host_bus = object_property_get_uint(OBJECT(pdev), "x-apple-host-bus", &err);
+    if (err) {
+        error_free(err);
+        return;
+    }
+
+    host_device = object_property_get_uint(OBJECT(pdev),
+                                           "x-apple-host-device", &err);
+    if (err) {
+        error_free(err);
+        return;
+    }
+
+    host_function = object_property_get_uint(OBJECT(pdev),
+                                             "x-apple-host-function", &err);
+    if (err) {
+        error_free(err);
+        return;
+    }
+
+    if (managed_bdf == probe->managed_bdf &&
+        host_bus == probe->host_bus &&
+        host_device == probe->host_device &&
+        host_function == probe->host_function) {
+        probe->match = DEVICE(pdev);
+    }
+}
+
+static DeviceState *apple_vfio_find_dma_companion(VFIOApplePCIDevice *adev)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+    PCIDevice *pdev = PCI_DEVICE(vdev);
+    AppleVFIODMAProbe probe = {
+        .managed_bdf = PCI_BUILD_BDF(pci_dev_bus_num(pdev), pdev->devfn),
+        .host_bus = vdev->host.bus,
+        .host_device = vdev->host.slot,
+        .host_function = vdev->host.function,
+    };
+
+    pci_for_each_device_under_bus(pci_device_root_bus(pdev),
+                                  apple_vfio_find_dma_companion_cb, &probe);
+    return probe.match;
+}
+
+static void apple_vfio_signal_irqfd(int fd)
+{
+    static const uint64_t value = 1;
+    ssize_t ret;
+
+    if (fd < 0) {
+        return;
+    }
+
+    do {
+        ret = write(fd, &value, sizeof(value));
+    } while (ret < 0 && errno == EINTR);
+}
+
+static void apple_vfio_deliver_irq(VFIOPCIDevice *vdev, uint32_t vector)
+{
+    switch (vdev->interrupt) {
+    case VFIO_INT_MSI:
+    case VFIO_INT_MSIX:
+        if (vector < vdev->nr_vectors && vdev->msi_vectors[vector].use) {
+            apple_vfio_signal_irqfd(
+                event_notifier_get_wfd(&vdev->msi_vectors[vector].interrupt));
+        }
+        break;
+    case VFIO_INT_INTx:
+        apple_vfio_signal_irqfd(
+            event_notifier_get_wfd(&vdev->intx.interrupt));
+        break;
+    default:
+        break;
+    }
+}
+
+/*
+ * Called on a GCD dispatch queue when the dext signals pending interrupts.
+ * Just pokes the EventNotifier to wake the QEMU main loop.
+ */
+static void apple_vfio_irq_wakeup(void *opaque)
+{
+    VFIOApplePCIDevice *adev = opaque;
+
+    event_notifier_set(&adev->apple->irq_notifier);
+}
+
+/*
+ * QEMU main-loop fd handler: drain the pending-interrupt bitfield from
+ * the dext, deliver each flagged vector, then re-arm the async wait.
+ */
+static void apple_vfio_irq_handler(void *opaque)
+{
+    VFIOApplePCIDevice *adev = opaque;
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+    AppleVFIOState *apple = adev->apple;
+    uint64_t pending[4];
+    int word;
+
+    if (!event_notifier_test_and_clear(&apple->irq_notifier)) {
+        return;
+    }
+
+    if (conn == IO_OBJECT_NULL) {
+        return;
+    }
+
+    if (apple_dext_read_pending_irqs(conn, pending) != KERN_SUCCESS) {
+        apple_dext_interrupt_notify_rearm(apple->irq_notify);
+        return;
+    }
+
+    for (word = 0; word < 4; word++) {
+        uint64_t bits = pending[word];
+
+        while (bits) {
+            int bit = __builtin_ctzll(bits);
+            uint32_t vector = word * 64 + bit;
+
+            apple_vfio_deliver_irq(vdev, vector);
+            bits &= bits - 1;
+        }
+    }
+
+    apple_dext_interrupt_notify_rearm(apple->irq_notify);
+}
+
+bool apple_vfio_get_bar_info(VFIOApplePCIDevice *adev, uint8_t bar,
+                             uint8_t *mem_idx, uint64_t *size,
+                             uint8_t *type)
+{
+    io_connect_t conn = apple_vfio_connection(&VFIO_PCI_DEVICE(adev)->vbasedev);
+
+    if (conn != IO_OBJECT_NULL) {
+        return apple_dext_get_bar_info(conn, bar, mem_idx, size, type) ==
+               KERN_SUCCESS;
+    }
+
+    if (mem_idx) {
+        *mem_idx = 0;
+    }
+    if (size) {
+        *size = 0;
+    }
+    if (type) {
+        *type = 0;
+    }
+    return false;
+}
+
+static void apple_vfio_pci_init(VFIOApplePCIDevice *adev)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+
+    /*
+     * On macOS, HVF can only map memory on 16 KiB page boundaries, so
+     * the BAR quirk fixups end up breaking things.  Likewise, the
+     * performance enhancements there rely on KVM-specific features.
+     * Disable them for now, but we should revisit this.
+     */
+    vdev->no_bar_quirks = true;
+}
+
+static bool apple_vfio_pci_pre_realize(VFIOApplePCIDevice *adev, Error **errp)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+
+    adev->apple = g_new0(AppleVFIOState, 1);
+
+    if (!vbasedev->name) {
+        vbasedev->name = g_strdup_printf("apple-%04x:%02x:%02x.%x",
+                                         vdev->host.domain,
+                                         vdev->host.bus,
+                                         vdev->host.slot,
+                                         vdev->host.function);
+    }
+
+    return true;
+}
+
+static bool apple_vfio_create_dma_companion(VFIOApplePCIDevice *adev,
+                                            Error **errp)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+    PCIDevice *pdev = PCI_DEVICE(vdev);
+    DeviceState *dev;
+
+    if (adev->dma_companion_autocreated && adev->dma_companion) {
+        return true;
+    }
+
+    if (apple_vfio_find_dma_companion(adev) != NULL) {
+        return true;
+    }
+
+    dev = qdev_new("apple-dma-pci");
+    if (!object_property_set_uint(OBJECT(dev), "managed-bdf",
+                                  PCI_BUILD_BDF(pci_dev_bus_num(pdev),
+                                                pdev->devfn), errp) ||
+        !object_property_set_uint(OBJECT(dev), "x-apple-host-bus",
+                                  vdev->host.bus, errp) ||
+        !object_property_set_uint(OBJECT(dev), "x-apple-host-device",
+                                  vdev->host.slot, errp) ||
+        !object_property_set_uint(OBJECT(dev), "x-apple-host-function",
+                                  vdev->host.function, errp)) {
+        object_unref(OBJECT(dev));
+        return false;
+    }
+
+    if (!qdev_realize(dev, BUS(pci_get_bus(pdev)), errp)) {
+        object_unref(OBJECT(dev));
+        return false;
+    }
+
+    adev->dma_companion = dev;
+    adev->dma_companion_autocreated = true;
+    object_unref(OBJECT(dev));
+    return true;
+}
+
+static void apple_vfio_destroy_dma_companion(VFIOApplePCIDevice *adev)
+{
+    if (!adev->dma_companion_autocreated || adev->dma_companion == NULL) {
+        return;
+    }
+
+    object_unparent(OBJECT(adev->dma_companion));
+    adev->dma_companion = NULL;
+    adev->dma_companion_autocreated = false;
+}
+
+bool apple_vfio_device_setup(VFIOApplePCIDevice *adev, Error **errp)
+{
+    VFIODevice *vbasedev = &VFIO_PCI_DEVICE(adev)->vbasedev;
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+    uint32_t num_vectors = 0;
+    kern_return_t kr;
+
+    if (conn == IO_OBJECT_NULL) {
+        error_setg(errp, "vfio-apple: missing dext connection");
+        return false;
+    }
+
+    kr = apple_dext_setup_interrupts(conn, &num_vectors);
+    if (kr != KERN_SUCCESS) {
+        error_setg(errp, "vfio-apple: failed to setup interrupts (kr=0x%x)",
+                   kr);
+        return false;
+    }
+
+    adev->apple->num_irq_vectors = num_vectors;
+
+    if (event_notifier_init(&adev->apple->irq_notifier, false) < 0) {
+        error_setg(errp, "vfio-apple: failed to create IRQ event notifier");
+        return false;
+    }
+
+    qemu_set_fd_handler(event_notifier_get_fd(&adev->apple->irq_notifier),
+                        apple_vfio_irq_handler, NULL, adev);
+
+    adev->apple->irq_notify =
+        apple_dext_interrupt_notify_create(conn, apple_vfio_irq_wakeup, adev);
+    if (!adev->apple->irq_notify) {
+        error_setg(errp,
+                   "vfio-apple: failed to create IRQ async notification");
+        qemu_set_fd_handler(
+            event_notifier_get_fd(&adev->apple->irq_notifier),
+            NULL, NULL, NULL);
+        event_notifier_cleanup(&adev->apple->irq_notifier);
+        return false;
+    }
+
+    return true;
+}
+
+void apple_vfio_device_cleanup(VFIOApplePCIDevice *adev)
+{
+    AppleVFIOState *apple = adev->apple;
+
+    if (!apple) {
+        return;
+    }
+
+    if (apple->irq_notify) {
+        apple_dext_interrupt_notify_destroy(apple->irq_notify);
+        apple->irq_notify = NULL;
+
+        qemu_set_fd_handler(event_notifier_get_fd(&apple->irq_notifier),
+                            NULL, NULL, NULL);
+        event_notifier_cleanup(&apple->irq_notifier);
+    }
+}
+
+static int apple_vfio_device_feature(VFIODevice *vdev,
+                                     struct vfio_device_feature *feat)
+{
+    return -ENOTTY;
+}
+
+static int apple_vfio_device_reset(VFIODevice *vbasedev)
+{
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+
+    if (conn == IO_OBJECT_NULL) {
+        return -ENODEV;
+    }
+
+    return apple_dext_reset_device(conn) == KERN_SUCCESS ? 0 : -EIO;
+}
+
+static int apple_vfio_get_region_info(VFIODevice *vbasedev,
+                                      struct vfio_region_info *info,
+                                      int *fd)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    uint32_t index = info->index;
+    uint64_t size = 0;
+
+    if (fd) {
+        *fd = -1;
+    }
+
+    memset((char *)info + offsetof(struct vfio_region_info, flags), 0,
+           sizeof(*info) - offsetof(struct vfio_region_info, flags));
+
+    info->index = index;
+    info->flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+    info->offset = (uint64_t)index << 20;
+
+    switch (info->index) {
+    case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+        if (!apple_vfio_get_bar_info(adev, info->index, NULL, &size, NULL)) {
+            size = 0;
+        }
+        info->size = size;
+        info->flags |= VFIO_REGION_INFO_FLAG_MMAP;
+        break;
+    case VFIO_PCI_CONFIG_REGION_INDEX:
+        info->size = PCIE_CONFIG_SPACE_SIZE;
+        break;
+    case VFIO_PCI_ROM_REGION_INDEX:
+    case VFIO_PCI_VGA_REGION_INDEX:
+        info->size = 0;
+        break;
+    default:
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int apple_vfio_get_irq_info(VFIODevice *vbasedev,
+                                   struct vfio_irq_info *info)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+
+    switch (info->index) {
+    case VFIO_PCI_MSI_IRQ_INDEX:
+        info->flags = VFIO_IRQ_INFO_EVENTFD;
+        info->count = adev->apple->num_irq_vectors;
+        break;
+    case VFIO_PCI_MSIX_IRQ_INDEX:
+        info->flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;
+        info->count = vdev->msix ? vdev->msix->entries : 0;
+        break;
+    case VFIO_PCI_INTX_IRQ_INDEX:
+        info->flags = VFIO_IRQ_INFO_EVENTFD;
+        info->count = 1;
+        break;
+    case VFIO_PCI_ERR_IRQ_INDEX:
+    case VFIO_PCI_REQ_IRQ_INDEX:
+        /*
+         * Apple dext passthrough has no kernel-side AER or device-request
+         * notification currently; return count 0 to tell the core to skip
+         * these.
+         */
+        info->flags = 0;
+        info->count = 0;
+        break;
+    default:
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static void apple_vfio_update_irq_mask(VFIODevice *vbasedev)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+    uint64_t mask[4] = {0};
+    uint32_t i;
+
+    if (conn == IO_OBJECT_NULL) {
+        return;
+    }
+
+    switch (vdev->interrupt) {
+    case VFIO_INT_MSI:
+    case VFIO_INT_MSIX:
+        for (i = 0; i < MIN(vdev->nr_vectors, 64 * ARRAY_SIZE(mask)); i++) {
+            if (vdev->msi_vectors[i].use) {
+                mask[i / 64] |= 1ULL << (i % 64);
+            }
+        }
+        break;
+    case VFIO_INT_INTx:
+        mask[0] = 1;
+        break;
+    default:
+        break;
+    }
+
+    apple_dext_set_irq_mask(conn, mask);
+}
+
+static int apple_vfio_set_irqs(VFIODevice *vbasedev, struct vfio_irq_set *irq)
+{
+    apple_vfio_update_irq_mask(vbasedev);
+    return 0;
+}
+
+static int apple_vfio_bar_read(VFIODevice *vbasedev, uint8_t nr, off_t off,
+                               uint32_t size, void *data)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    AppleVFIOBarMap *bm = &adev->apple->bar_maps[nr];
+    const void *p;
+    uint64_t value;
+
+    if (!bm->addr || off + size > bm->size) {
+        error_report("vfio-apple: BAR%d read out of range or unmapped", nr);
+        return -EINVAL;
+    }
+
+    if (size != 1 && size != 2 && size != 4 && size != 8) {
+        return -EINVAL;
+    }
+
+    p = (const char *)bm->addr + off;
+    value = host_pci_ldn_le_p(p, size);
+    memcpy(data, &value, size);
+
+    return size;
+}
+
+static int apple_vfio_region_read(VFIODevice *vbasedev, uint8_t nr, off_t off,
+                                  uint32_t size, void *data)
+{
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+    kern_return_t kr;
+    uint32_t legacy_size = 0;
+
+    if (nr != VFIO_PCI_CONFIG_REGION_INDEX) {
+        return apple_vfio_bar_read(vbasedev, nr, off, size, data);
+    }
+
+    if (conn == IO_OBJECT_NULL) {
+        return -ENODEV;
+    }
+
+    legacy_size = MIN(size, PCIE_CONFIG_SPACE_SIZE - off);
+
+    if (legacy_size == 1 || legacy_size == 2 || legacy_size == 4) {
+        uint64_t value = 0;
+
+        kr = apple_dext_config_read(conn, off, legacy_size, &value);
+        if (kr != KERN_SUCCESS) {
+            return -EIO;
+        }
+
+        memcpy(data, &value, legacy_size);
+        if (legacy_size < size) {
+            memset((uint8_t *)data + legacy_size, 0, size - legacy_size);
+        }
+        return size;
+    }
+
+    kr = apple_dext_config_read_block(conn, off, data, legacy_size);
+    if (kr != KERN_SUCCESS) {
+        return -EIO;
+    }
+    if (legacy_size < size) {
+        memset((uint8_t *)data + legacy_size, 0, size - legacy_size);
+    }
+    return size;
+}
+
+static bool apple_vfio_config_write_is_safe(off_t off, uint32_t size)
+{
+    off_t end = off + size;
+
+    /*
+     * Block writes that would reprogram the device's bus identity or
+     * address decoders.  macOS / DART owns those registers; touching
+     * them from the guest breaks the IOKit mapping and the device
+     * "falls off the bus."
+     *
+     * Everything else (vendor capabilities, MSI/MSI-X, PCIe cap, etc.)
+     * is forwarded.
+     */
+
+    /* PCI_STATUS stays emulated/blocked */
+    if (off < PCI_STATUS + 2 && end > PCI_STATUS) {
+        return false;
+    }
+
+    /* BAR0-BAR5 */
+    if (off < PCI_BASE_ADDRESS_5 + 4 && end > PCI_BASE_ADDRESS_0) {
+        return false;
+    }
+
+    return true;
+}
+
+static int apple_vfio_forward_command_write(io_connect_t conn, off_t off,
+                                            uint32_t size, const void *data)
+{
+    const uint8_t *bytes = data;
+    off_t end = off + size;
+    off_t cmd_start = MAX(off, (off_t)PCI_COMMAND);
+    off_t cmd_end = MIN(end, (off_t)(PCI_COMMAND + 2));
+    off_t pos;
+
+    if (conn == IO_OBJECT_NULL) {
+        return -ENODEV;
+    }
+
+    for (pos = cmd_start; pos < cmd_end; pos++) {
+        uint64_t value = bytes[pos - off];
+        kern_return_t kr = apple_dext_config_write(conn, pos, 1, value);
+
+        if (kr != KERN_SUCCESS) {
+            return -EIO;
+        }
+    }
+
+    return 0;
+}
+
+static int apple_vfio_bar_write(VFIODevice *vbasedev, uint8_t nr, off_t off,
+                                uint32_t size, void *data)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    AppleVFIOBarMap *bm = &adev->apple->bar_maps[nr];
+    void *p;
+    uint64_t value = 0;
+
+    if (!bm->addr || off + size > bm->size) {
+        error_report("vfio-apple: BAR%d write out of range or unmapped", nr);
+        return -EINVAL;
+    }
+
+    if (size != 1 && size != 2 && size != 4 && size != 8) {
+        return -EINVAL;
+    }
+
+    p = (char *)bm->addr + off;
+    memcpy(&value, data, size);
+    host_pci_stn_le_p(p, size, value);
+
+    return size;
+}
+
+static int apple_vfio_region_write(VFIODevice *vbasedev, uint8_t nr, off_t off,
+                                   uint32_t size, void *data, bool post)
+{
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+    uint64_t value = 0;
+    kern_return_t kr;
+    uint32_t legacy_size;
+
+    if (nr != VFIO_PCI_CONFIG_REGION_INDEX) {
+        return apple_vfio_bar_write(vbasedev, nr, off, size, data);
+    }
+
+    if (off < PCI_COMMAND + 2 && off + size > PCI_COMMAND) {
+        int ret = apple_vfio_forward_command_write(conn, off, size, data);
+
+        if (ret) {
+            return ret;
+        }
+
+        if (off >= PCI_COMMAND && off + size <= PCI_COMMAND + 2) {
+            return size;
+        }
+    }
+
+    if (!apple_vfio_config_write_is_safe(off, size)) {
+        return size;
+    }
+
+    if (conn == IO_OBJECT_NULL) {
+        return -ENODEV;
+    }
+
+    legacy_size = MIN(size, PCIE_CONFIG_SPACE_SIZE - off);
+    if (!(legacy_size == 1 || legacy_size == 2 || legacy_size == 4)) {
+        return -EINVAL;
+    }
+    memcpy(&value, data, legacy_size);
+
+    kr = apple_dext_config_write(conn, off, legacy_size, value);
+    if (kr != KERN_SUCCESS) {
+        return -EIO;
+    }
+    return size;
+}
+
+static int apple_vfio_region_map(VFIODevice *vbasedev, VFIORegion *region);
+static void apple_vfio_region_unmap(VFIODevice *vbasedev, VFIORegion *region);
+
+static int apple_vfio_region_map(VFIODevice *vbasedev, VFIORegion *region)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(adev);
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+    int bar = region->nr;
+    VFIOBAR *vbar;
+    mach_vm_address_t local_addr = 0;
+    mach_vm_size_t bar_size = 0;
+    uint8_t bar_type = 0;
+    kern_return_t kr;
+    int i;
+
+    if (bar < VFIO_PCI_BAR0_REGION_INDEX || bar >= VFIO_PCI_ROM_REGION_INDEX) {
+        return 0;
+    }
+
+    vbar = &vdev->bars[bar];
+
+    if (conn == IO_OBJECT_NULL || !vbar->size || vbar->ioport) {
+        return 0;
+    }
+
+    if (bar > 0 && vdev->bars[bar - 1].mem64) {
+        return 0;
+    }
+
+    if (adev->apple->bar_maps[bar].addr != NULL) {
+        return 0;
+    }
+
+    kr = apple_dext_map_bar(conn, bar, &local_addr, &bar_size, &bar_type);
+    if (kr != KERN_SUCCESS) {
+        warn_report("vfio-apple: BAR%d map failed for %s: 0x%x",
+                    bar, vbasedev->name, kr);
+        return -EIO;
+    }
+
+    if (bar_size > vbar->size) {
+        bar_size = vbar->size;
+    }
+
+    adev->apple->bar_maps[bar].addr = (void *)local_addr;
+    adev->apple->bar_maps[bar].size = bar_size;
+
+    /*
+     * Use the pre-computed mmap regions — already split around the MSI-X
+     * table/PBA hole by vfio_pci_fixup_msix_region() during realize.
+     * We just need to fill in the host pointers from our dext mapping.
+     */
+    for (i = 0; i < region->nr_mmaps; i++) {
+        region->mmaps[i].mmap = (char *)local_addr + region->mmaps[i].offset;
+        vfio_region_register_mmap(region, i);
+    }
+
+    return 0;
+}
+
+static void apple_vfio_region_unmap(VFIODevice *vbasedev, VFIORegion *region)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    io_connect_t conn = apple_vfio_connection(vbasedev);
+    int bar = region->nr;
+    AppleVFIOBarMap *bm;
+    int i;
+
+    if (bar < VFIO_PCI_BAR0_REGION_INDEX || bar >= VFIO_PCI_ROM_REGION_INDEX) {
+        return;
+    }
+
+    bm = &adev->apple->bar_maps[bar];
+
+    for (i = 0; i < region->nr_mmaps; i++) {
+        if (region->mmaps[i].mmap) {
+            vfio_region_unregister_mmap(region, i);
+            region->mmaps[i].mmap = NULL;
+        }
+    }
+
+    if (conn != IO_OBJECT_NULL && bm->addr != NULL) {
+        apple_dext_unmap_bar(conn, bar, (mach_vm_address_t)bm->addr);
+    }
+
+    bm->addr = NULL;
+    bm->size = 0;
+}
+
+VFIODeviceIOOps apple_vfio_device_io_ops = {
+    .device_feature = apple_vfio_device_feature,
+    .get_region_info = apple_vfio_get_region_info,
+    .get_irq_info = apple_vfio_get_irq_info,
+    .set_irqs = apple_vfio_set_irqs,
+    .device_reset = apple_vfio_device_reset,
+    .region_read = apple_vfio_region_read,
+    .region_write = apple_vfio_region_write,
+    .region_map = apple_vfio_region_map,
+    .region_unmap = apple_vfio_region_unmap,
+};
+
+bool apple_vfio_dext_publish(uint8_t bus, uint8_t device, uint8_t function,
+                             io_connect_t conn)
+{
+    AppleVFIOSharedDext *shared;
+    guint key = apple_vfio_dext_key(bus, device, function);
+
+    if (!apple_vfio_shared_dexts) {
+        apple_vfio_shared_dexts =
+            g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL, g_free);
+    }
+
+    if (g_hash_table_lookup(apple_vfio_shared_dexts, GUINT_TO_POINTER(key))) {
+        return false;
+    }
+
+    shared = g_new0(AppleVFIOSharedDext, 1);
+    shared->conn = conn;
+    shared->refs = 1;
+    g_hash_table_insert(apple_vfio_shared_dexts, GUINT_TO_POINTER(key), shared);
+    return true;
+}
+
+io_connect_t apple_vfio_dext_lookup(uint8_t bus, uint8_t device,
+                                    uint8_t function)
+{
+    AppleVFIOSharedDext *shared;
+    guint key = apple_vfio_dext_key(bus, device, function);
+
+    if (!apple_vfio_shared_dexts) {
+        return IO_OBJECT_NULL;
+    }
+
+    shared = g_hash_table_lookup(apple_vfio_shared_dexts,
+                                 GUINT_TO_POINTER(key));
+    if (!shared) {
+        return IO_OBJECT_NULL;
+    }
+
+    shared->refs++;
+    return shared->conn;
+}
+
+void apple_vfio_dext_release(uint8_t bus, uint8_t device, uint8_t function,
+                             io_connect_t conn)
+{
+    AppleVFIOSharedDext *shared;
+    guint key = apple_vfio_dext_key(bus, device, function);
+
+    if (!apple_vfio_shared_dexts) {
+        return;
+    }
+
+    shared = g_hash_table_lookup(apple_vfio_shared_dexts,
+                                 GUINT_TO_POINTER(key));
+    if (!shared || shared->conn != conn) {
+        return;
+    }
+
+    if (--shared->refs == 0) {
+        apple_dext_disconnect(conn);
+        g_hash_table_remove(apple_vfio_shared_dexts, GUINT_TO_POINTER(key));
+    }
+}
+
+/* ------------------------------------------------------------------ */
+/* QOM type: vfio-apple-pci                                           */
+/* ------------------------------------------------------------------ */
+
+static void (*parent_realize)(PCIDevice *, Error **);
+static void (*parent_exit)(PCIDevice *);
+
+static void apple_vfio_pci_instance_init(Object *obj)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(obj);
+
+    apple_vfio_pci_init(adev);
+}
+
+static void apple_vfio_pci_realize_fn(PCIDevice *pdev, Error **errp)
+{
+    ERRP_GUARD();
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(pdev);
+
+    if (!apple_vfio_pci_pre_realize(adev, errp)) {
+        return;
+    }
+
+    parent_realize(pdev, errp);
+    if (*errp) {
+        g_clear_pointer(&adev->apple, g_free);
+        return;
+    }
+
+    if (!apple_vfio_create_dma_companion(adev, errp)) {
+        if (parent_exit) {
+            parent_exit(pdev);
+        }
+        g_clear_pointer(&adev->apple, g_free);
+        return;
+    }
+}
+
+static void apple_vfio_pci_exit_fn(PCIDevice *pdev)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(pdev);
+
+    apple_vfio_destroy_dma_companion(adev);
+
+    if (parent_exit) {
+        parent_exit(pdev);
+    }
+}
+
+static void apple_vfio_pci_finalize_fn(Object *obj)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(obj);
+
+    apple_vfio_device_cleanup(adev);
+    g_clear_pointer(&adev->apple, g_free);
+}
+
+static void apple_vfio_pci_class_init(ObjectClass *klass, const void *data)
+{
+    PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    parent_realize = pdc->realize;
+    parent_exit = pdc->exit;
+
+    pdc->realize = apple_vfio_pci_realize_fn;
+    pdc->exit = apple_vfio_pci_exit_fn;
+    dc->user_creatable = true;
+    dc->desc = "VFIO-based PCI device assignment (Apple/macOS)";
+}
+
+static const TypeInfo vfio_apple_pci_info = {
+    .name = TYPE_VFIO_APPLE_PCI,
+    .parent = TYPE_VFIO_PCI,
+    .instance_size = sizeof(VFIOApplePCIDevice),
+    .instance_init = apple_vfio_pci_instance_init,
+    .instance_finalize = apple_vfio_pci_finalize_fn,
+    .class_init = apple_vfio_pci_class_init,
+};
+
+static void register_vfio_apple_pci_type(void)
+{
+    type_register_static(&vfio_apple_pci_info);
+}
+
+type_init(register_vfio_apple_pci_type)
diff --git a/hw/vfio/apple.h b/hw/vfio/apple.h
new file mode 100644
index 0000000000..81d4bd2b66
--- /dev/null
+++ b/hw/vfio/apple.h
@@ -0,0 +1,74 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Apple/macOS VFIO passthrough common definitions.
+ *
+ * Copyright (c) 2026 Scott J. Goldman
+ */
+
+#ifndef HW_VFIO_APPLE_H
+#define HW_VFIO_APPLE_H
+
+#include <stdint.h>
+
+#include "hw/vfio/pci.h"
+#include "hw/vfio/vfio-container.h"
+#include "qapi/error.h"
+#include "qemu/event_notifier.h"
+
+#ifdef CONFIG_DARWIN
+#include <IOKit/IOKitLib.h>
+#else
+typedef uintptr_t io_connect_t;
+#define IO_OBJECT_NULL ((io_connect_t)0)
+#endif
+
+OBJECT_DECLARE_SIMPLE_TYPE(AppleVFIOContainer, VFIO_IOMMU_APPLE)
+
+struct AppleVFIOContainer {
+    VFIOContainer parent_obj;
+    io_connect_t dext_conn;
+    uint8_t host_bus;
+    uint8_t host_device;
+    uint8_t host_function;
+};
+
+typedef struct AppleDextInterruptNotify AppleDextInterruptNotify;
+
+typedef struct AppleVFIOBarMap {
+    void *addr;
+    size_t size;
+} AppleVFIOBarMap;
+
+typedef struct AppleVFIOState {
+    AppleDextInterruptNotify *irq_notify;
+    EventNotifier irq_notifier;
+    uint32_t num_irq_vectors;
+    AppleVFIOBarMap bar_maps[PCI_ROM_SLOT];
+} AppleVFIOState;
+
+OBJECT_DECLARE_SIMPLE_TYPE(VFIOApplePCIDevice, VFIO_APPLE_PCI)
+
+struct VFIOApplePCIDevice {
+    VFIOPCIDevice parent_obj;
+    AppleVFIOState *apple;
+    DeviceState *dma_companion;
+    bool dma_companion_autocreated;
+};
+
+extern VFIODeviceIOOps apple_vfio_device_io_ops;
+
+bool apple_vfio_device_setup(VFIOApplePCIDevice *adev, Error **errp);
+void apple_vfio_device_cleanup(VFIOApplePCIDevice *adev);
+bool apple_vfio_get_bar_info(VFIOApplePCIDevice *adev, uint8_t bar,
+                             uint8_t *mem_idx, uint64_t *size,
+                             uint8_t *type);
+
+bool apple_vfio_dext_publish(uint8_t bus, uint8_t device, uint8_t function,
+                             io_connect_t conn);
+io_connect_t apple_vfio_dext_lookup(uint8_t bus, uint8_t device,
+                                    uint8_t function);
+void apple_vfio_dext_release(uint8_t bus, uint8_t device, uint8_t function,
+                             io_connect_t conn);
+
+#endif /* HW_VFIO_APPLE_H */
diff --git a/hw/vfio/container-apple.c b/hw/vfio/container-apple.c
new file mode 100644
index 0000000000..5a5c55b622
--- /dev/null
+++ b/hw/vfio/container-apple.c
@@ -0,0 +1,241 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Apple/macOS VFIO IOMMU container backend.
+ *
+ * Copyright (c) 2026 Scott J. Goldman
+ */
+
+#include "qemu/osdep.h"
+
+#include <linux/vfio.h>
+
+#include "apple-dext-client.h"
+#include "hw/vfio/apple.h"
+#include "hw/vfio/vfio-device.h"
+#include "hw/vfio/vfio-listener.h"
+#include "qapi/error.h"
+#include "system/ramblock.h"
+
+static bool apple_vfio_setup(VFIOContainer *bcontainer, Error **errp)
+{
+    bcontainer->pgsizes = qemu_real_host_page_size();
+    bcontainer->dma_max_mappings = UINT_MAX;
+    bcontainer->dirty_pages_supported = false;
+    bcontainer->max_dirty_bitmap_size = 0;
+    bcontainer->dirty_pgsizes = 0;
+    return true;
+}
+
+/*
+ * DMA map/unmap are no-ops: Apple passthrough handles DMA mapping through
+ * the companion apple-dma-pci device, which talks to the dext directly,
+ * bypassing the IOMMU container's DMA path.  The stubs are required because
+ * the VFIO listener asserts they are non-NULL.
+ */
+static int apple_vfio_dma_map(const VFIOContainer *bcontainer, hwaddr iova,
+                              uint64_t size, void *vaddr, bool readonly,
+                              MemoryRegion *mr)
+{
+    return 0;
+}
+
+static int apple_vfio_dma_unmap(const VFIOContainer *bcontainer, hwaddr iova,
+                                uint64_t size, IOMMUTLBEntry *iotlb,
+                                bool unmap_all)
+{
+    return 0;
+}
+
+static int apple_vfio_set_dirty_page_tracking(const VFIOContainer *bcontainer,
+                                              bool start, Error **errp)
+{
+    error_setg_errno(errp, ENOTSUP, "vfio-apple does not support migration");
+    return -ENOTSUP;
+}
+
+static int apple_vfio_query_dirty_bitmap(const VFIOContainer *bcontainer,
+                                         VFIOBitmap *vbmap, hwaddr iova,
+                                         hwaddr size, uint64_t backend_flag,
+                                         Error **errp)
+{
+    error_setg_errno(errp, ENOTSUP, "vfio-apple does not support migration");
+    return -ENOTSUP;
+}
+
+static int apple_vfio_pci_hot_reset(VFIODevice *vbasedev, bool single)
+{
+    return 0;
+}
+
+static AppleVFIOContainer *apple_vfio_container_connect(AddressSpace *as,
+                                                        VFIODevice *vbasedev,
+                                                        Error **errp)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI_DEVICE(vbasedev->dev);
+    AppleVFIOContainer *container;
+    VFIOContainer *bcontainer;
+    VFIOAddressSpace *space;
+    VFIOIOMMUClass *vioc;
+    int ret;
+
+    space = vfio_address_space_get(as);
+    container = VFIO_IOMMU_APPLE(object_new(TYPE_VFIO_IOMMU_APPLE));
+    bcontainer = VFIO_IOMMU(container);
+    vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+
+    container->host_bus = vdev->host.bus;
+    container->host_device = vdev->host.slot;
+    container->host_function = vdev->host.function;
+
+    ret = ram_block_uncoordinated_discard_disable(true);
+    if (ret) {
+        error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
+        goto fail_unref;
+    }
+
+    container->dext_conn = apple_dext_connect(container->host_bus,
+                                              container->host_device,
+                                              container->host_function);
+    if (container->dext_conn == IO_OBJECT_NULL) {
+        error_setg(errp,
+                   "vfio-apple: could not connect to dext for host PCI "
+                   "%02x:%02x.%x",
+                   container->host_bus, container->host_device,
+                   container->host_function);
+        goto fail_discards;
+    }
+
+    if (apple_dext_claim(container->dext_conn) != KERN_SUCCESS) {
+        error_setg(errp,
+                   "vfio-apple: failed to claim dext-backed PCI device "
+                   "%02x:%02x.%x",
+                   container->host_bus, container->host_device,
+                   container->host_function);
+        goto fail_release_conn;
+    }
+
+    if (!apple_vfio_dext_publish(container->host_bus, container->host_device,
+                                 container->host_function,
+                                 container->dext_conn)) {
+        error_setg(errp,
+                   "vfio-apple: duplicate dext owner for host PCI %02x:%02x.%x",
+                   container->host_bus, container->host_device,
+                   container->host_function);
+        goto fail_release_conn;
+    }
+
+    if (!vioc->setup(bcontainer, errp)) {
+        goto fail_shared_conn;
+    }
+
+    vfio_address_space_insert(space, bcontainer);
+
+    if (!vfio_listener_register(bcontainer, errp)) {
+        goto fail_address_space;
+    }
+
+    bcontainer->initialized = true;
+    return container;
+
+fail_address_space:
+    vfio_listener_unregister(bcontainer);
+    QLIST_REMOVE(bcontainer, next);
+    bcontainer->space = NULL;
+fail_shared_conn:
+    apple_vfio_dext_release(container->host_bus, container->host_device,
+                            container->host_function, container->dext_conn);
+    container->dext_conn = IO_OBJECT_NULL;
+fail_discards:
+    ram_block_uncoordinated_discard_disable(false);
+fail_unref:
+    object_unref(container);
+    vfio_address_space_put(space);
+    return NULL;
+
+fail_release_conn:
+    apple_dext_disconnect(container->dext_conn);
+    container->dext_conn = IO_OBJECT_NULL;
+    goto fail_discards;
+}
+
+static void apple_vfio_container_disconnect(AppleVFIOContainer *container)
+{
+    VFIOContainer *bcontainer = VFIO_IOMMU(container);
+    VFIOAddressSpace *space = bcontainer->space;
+
+    ram_block_uncoordinated_discard_disable(false);
+    vfio_listener_unregister(bcontainer);
+
+    apple_vfio_dext_release(container->host_bus, container->host_device,
+                            container->host_function, container->dext_conn);
+    container->dext_conn = IO_OBJECT_NULL;
+
+    object_unref(container);
+    vfio_address_space_put(space);
+}
+
+static bool apple_vfio_attach_device(const char *name, VFIODevice *vbasedev,
+                                     AddressSpace *as, Error **errp)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    AppleVFIOContainer *container;
+    struct vfio_device_info info = {
+        .argsz = sizeof(info),
+        .flags = VFIO_DEVICE_FLAGS_PCI | VFIO_DEVICE_FLAGS_RESET,
+        .num_regions = VFIO_PCI_NUM_REGIONS,
+        .num_irqs = VFIO_PCI_NUM_IRQS,
+    };
+
+    container = apple_vfio_container_connect(as, vbasedev, errp);
+    if (!container) {
+        return false;
+    }
+
+    vbasedev->fd = -1;
+    vbasedev->io_ops = &apple_vfio_device_io_ops;
+    vfio_device_prepare(vbasedev, VFIO_IOMMU(container), &info);
+
+    if (!apple_vfio_device_setup(adev, errp)) {
+        vfio_device_unprepare(vbasedev);
+        apple_vfio_container_disconnect(container);
+        return false;
+    }
+
+    return true;
+}
+
+static void apple_vfio_detach_device(VFIODevice *vbasedev)
+{
+    VFIOApplePCIDevice *adev = VFIO_APPLE_PCI(vbasedev->dev);
+    AppleVFIOContainer *container = VFIO_IOMMU_APPLE(vbasedev->bcontainer);
+
+    apple_vfio_device_cleanup(adev);
+    vfio_device_unprepare(vbasedev);
+    apple_vfio_container_disconnect(container);
+}
+
+static void vfio_iommu_apple_class_init(ObjectClass *klass, const void *data)
+{
+    VFIOIOMMUClass *vioc = VFIO_IOMMU_CLASS(klass);
+
+    vioc->setup = apple_vfio_setup;
+    vioc->dma_map = apple_vfio_dma_map;
+    vioc->dma_unmap = apple_vfio_dma_unmap;
+    vioc->attach_device = apple_vfio_attach_device;
+    vioc->detach_device = apple_vfio_detach_device;
+    vioc->set_dirty_page_tracking = apple_vfio_set_dirty_page_tracking;
+    vioc->query_dirty_bitmap = apple_vfio_query_dirty_bitmap;
+    vioc->pci_hot_reset = apple_vfio_pci_hot_reset;
+}
+
+static const TypeInfo apple_vfio_types[] = {
+    {
+        .name = TYPE_VFIO_IOMMU_APPLE,
+        .parent = TYPE_VFIO_IOMMU,
+        .instance_size = sizeof(AppleVFIOContainer),
+        .class_init = vfio_iommu_apple_class_init,
+    },
+};
+
+DEFINE_TYPES(apple_vfio_types)
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 965c8e5b80..473f8669f9 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -40,6 +40,8 @@ system_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
 # Apple VFIO backend
 if host_os == 'darwin'
   system_ss.add(when: 'CONFIG_VFIO',
-                if_true: [files('apple-dext-client.c'),
-                coref, iokit])
+                if_true: [files('apple-device.c',
+                                'container-apple.c',
+                                'apple-dext-client.c'),
+                          coref, iokit])
 endif
-- 
2.50.1 (Apple Git-155)




* [RFC PATCH 09/10] vfio/apple: Add apple-dma-pci companion device
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (7 preceding siblings ...)
  2026-04-05  7:28 ` [RFC PATCH 08/10] vfio/apple: Add IOMMU container and PCI device Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  7:28 ` [RFC PATCH 10/10] docs: Add vfio-apple documentation and MAINTAINERS entry Scott J. Goldman
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman, Scott J. Goldman

From: "Scott J. Goldman" <scottjg@umich.edu>

Add a virtual PCI device ("apple-dma-pci") that acts as the DMA
mapping companion for passthrough devices on macOS.

On Linux, the kernel VFIO driver can directly map guest RAM into the
hardware IOMMU.  On macOS, the DriverKit dext can only DMA-map memory
that belongs to its client process (QEMU).  The guest kernel needs a
way to request DMA mappings for its buffers through the host.

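apple-dma-pci exposes a small MMIO register interface in BAR0.  The
guest driver places batches of map/unmap requests (a GPA and length
for each map) in guest RAM, programs the GPA of a command page into
BAR0 once, and then submits each batch with a single doorbell write.
The QEMU device model translates each GPA to an HVA, calls through to
the dext to establish the physical DART mapping, and writes the
resulting DMA addresses and per-entry status back to guest RAM.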
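
A minimal sketch of the command-page handshake follows, showing the guest
filling the page and the kind of check the device might run when the
doorbell is written.  The offsets, command types, entry sizes, batch limit
and status codes are copied from hw/vfio/apple-dma.c.  The helper names
(put_le, get_le, guest_fill_cmd, device_check_cmd) are hypothetical.  Guest
RAM is modelled as a plain byte array, and little-endian field encoding is
an assumption; the protocol comment does not state it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from hw/vfio/apple-dma.c. */
#define APPLE_DMA_MAX_ENTRIES  4096
#define APPLE_DMA_CMD_MAP      1
#define APPLE_DMA_CMD_UNMAP    2
#define APPLE_DMA_S_OK         0
#define APPLE_DMA_S_INVAL      3

#define CMD_OFF_TYPE      0x00
#define CMD_OFF_COUNT     0x04
#define CMD_OFF_STATUS    0x08
#define CMD_OFF_REQ_GPA   0x10
#define CMD_OFF_RESP_GPA  0x18
#define CMD_PAGE_SIZE     0x20

/* Entry layouts from the apple-dma.c comment; sizes checked at build time. */
struct map_req    { uint64_t gpa; uint32_t len; uint32_t flags; };
struct map_resp   { uint64_t id; uint64_t dma_addr; uint32_t dma_len;
                    uint32_t status; };
struct unmap_req  { uint64_t id; uint64_t size; };
struct unmap_resp { uint64_t id; uint32_t status; uint32_t reserved; };

_Static_assert(sizeof(struct map_req) == 16, "map request is 16 bytes");
_Static_assert(sizeof(struct map_resp) == 24, "map response is 24 bytes");
_Static_assert(sizeof(struct unmap_req) == 16, "unmap request is 16 bytes");
_Static_assert(sizeof(struct unmap_resp) == 16, "unmap response is 16 bytes");

/* Hypothetical helpers: store/load a little-endian field of 'bytes' bytes. */
static void put_le(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = 0; i < bytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint64_t get_le(const uint8_t *p, int bytes)
{
    uint64_t v = 0;

    for (int i = 0; i < bytes; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Guest side, protocol step 3: plain RAM stores, so no VM exit here. */
static void guest_fill_cmd(uint8_t *page, uint32_t type, uint32_t count,
                           uint64_t req_gpa, uint64_t resp_gpa)
{
    assert(page != NULL);
    memset(page, 0, CMD_PAGE_SIZE);
    put_le(page + CMD_OFF_TYPE, type, 4);
    put_le(page + CMD_OFF_COUNT, count, 4);
    put_le(page + CMD_OFF_REQ_GPA, req_gpa, 8);
    put_le(page + CMD_OFF_RESP_GPA, resp_gpa, 8);
}

/*
 * Device side, protocol step 4 (hypothetical validation only): after the
 * doorbell write, reject unknown command types and oversized batches,
 * then publish the result in the command page's status field.
 */
static uint32_t device_check_cmd(uint8_t *page)
{
    uint32_t type = (uint32_t)get_le(page + CMD_OFF_TYPE, 4);
    uint32_t count = (uint32_t)get_le(page + CMD_OFF_COUNT, 4);
    uint32_t status = APPLE_DMA_S_OK;

    if ((type != APPLE_DMA_CMD_MAP && type != APPLE_DMA_CMD_UNMAP) ||
        count > APPLE_DMA_MAX_ENTRIES) {
        status = APPLE_DMA_S_INVAL;
    }
    put_le(page + CMD_OFF_STATUS, status, 4);
    return status;
}
```

Batching is what keeps this cheap.  Filling the command page and the
request array are ordinary guest RAM stores, so each batch costs only one
trapping MMIO write (the doorbell), however many entries it carries.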

The device uses PCI vendor/device ID 1b36:0015 (Red Hat, Inc.).

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 docs/specs/pci-ids.rst |   3 +
 hw/vfio/apple-dma.c    | 540 +++++++++++++++++++++++++++++++++++++++++
 hw/vfio/meson.build    |   5 +-
 include/hw/pci/pci.h   |   1 +
 4 files changed, 546 insertions(+), 3 deletions(-)
 create mode 100644 hw/vfio/apple-dma.c

diff --git a/docs/specs/pci-ids.rst b/docs/specs/pci-ids.rst
index 261b0f359f..48229dab5d 100644
--- a/docs/specs/pci-ids.rst
+++ b/docs/specs/pci-ids.rst
@@ -100,6 +100,9 @@ PCI devices (other than virtio):
   PCI UFS device (``-device ufs``)
 1b36:0014
   PCI RISC-V IOMMU device
+1b36:0015
+  Apple DMA mapping device (``-device apple-dma-pci``,
+  :doc:`../system/devices/vfio-apple`)
 
 All these devices are documented in :doc:`index`.
 
diff --git a/hw/vfio/apple-dma.c b/hw/vfio/apple-dma.c
new file mode 100644
index 0000000000..e705179b0d
--- /dev/null
+++ b/hw/vfio/apple-dma.c
@@ -0,0 +1,540 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Apple DMA mapping PCI device
+ *
+ * A simple PCI device that receives batched DMA map/unmap requests from
+ * the guest via a shared command page + doorbell register, resolves guest
+ * physical addresses to host virtual addresses, and registers them with
+ * the macOS DriverKit dext for DART mapping.
+ *
+ * Protocol:
+ *   1. Guest allocates a command page and request/response buffers in RAM.
+ *   2. Guest writes the command page GPA to BAR registers (one-time setup).
+ *   3. Per batch: guest fills the command page and request buffer (no VMEXIT),
+ *      then writes the doorbell register (single VMEXIT triggers processing).
+ *   4. Device reads the command page, processes all entries, writes responses
+ *      and status back to guest RAM before the doorbell write returns.
+ *
+ * Copyright (c) 2026 Scott J. Goldman
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/pci/pci_device.h"
+#include "hw/core/qdev-properties.h"
+#include "hw/vfio/apple-dext-client.h"
+#include "hw/vfio/apple.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "system/address-spaces.h"
+#include "system/dma.h"
+#include "system/memory.h"
+
+#include "hw/pci/pci.h"
+
+/* BAR0 register offsets */
+#define APPLE_DMA_REG_VERSION       0x00    /* R:  protocol version */
+#define APPLE_DMA_REG_MANAGED_BDF   0x04    /* R:  guest BDF this maps for */
+#define APPLE_DMA_REG_MAX_ENTRIES   0x08    /* R:  max entries per batch */
+#define APPLE_DMA_REG_STATUS        0x0C    /* R:  result of last doorbell */
+#define APPLE_DMA_REG_CMD_GPA_LO    0x10    /* W:  command page GPA [31:0] */
+#define APPLE_DMA_REG_CMD_GPA_HI    0x14    /* W:  command page GPA [63:32] */
+#define APPLE_DMA_REG_DOORBELL      0x18    /* W:  any write triggers batch */
+#define APPLE_DMA_BAR_SIZE          0x1000  /* page-aligned */
+
+#define APPLE_DMA_VERSION           2
+#define APPLE_DMA_MAX_ENTRIES       4096
+
+/* Command types (in command page) */
+#define APPLE_DMA_CMD_MAP           1
+#define APPLE_DMA_CMD_UNMAP         2
+
+/* Status codes */
+#define APPLE_DMA_S_OK              0
+#define APPLE_DMA_S_IOERR           1
+#define APPLE_DMA_S_INVAL           3
+
+/*
+ * Command page layout (in guest RAM, 32 bytes):
+ *
+ *   0x00  uint32_t  type         MAP=1, UNMAP=2
+ *   0x04  uint32_t  count        number of entries
+ *   0x08  uint32_t  status       (written by device) 0=OK
+ *   0x0C  uint32_t  reserved
+ *   0x10  uint64_t  req_gpa      GPA of request entries array
+ *   0x18  uint64_t  resp_gpa     GPA of response entries array
+ */
+#define CMD_OFF_TYPE      0x00
+#define CMD_OFF_COUNT     0x04
+#define CMD_OFF_STATUS    0x08
+#define CMD_OFF_REQ_GPA   0x10
+#define CMD_OFF_RESP_GPA  0x18
+#define CMD_PAGE_SIZE     0x20
+
+/*
+ * Map request entry (16 bytes):
+ *   uint64_t gpa, uint32_t len, uint32_t flags
+ *
+ * Map response entry (24 bytes):
+ *   uint64_t id, uint64_t dma_addr, uint32_t dma_len, uint32_t status
+ *
+ * Unmap request entry (16 bytes):
+ *   uint64_t id, uint64_t size
+ *
+ * Unmap response entry (16 bytes):
+ *   uint64_t id, uint32_t status, uint32_t reserved
+ */
+
+typedef struct AppleDMAMapReq {
+    uint64_t gpa;
+    uint32_t len;
+    uint32_t flags;
+} QEMU_PACKED AppleDMAMapReq;
+
+typedef struct AppleDMAMapResp {
+    uint64_t id;
+    uint64_t dma_addr;
+    uint32_t dma_len;
+    uint32_t status;
+} QEMU_PACKED AppleDMAMapResp;
+
+typedef struct AppleDMAUnmapReq {
+    uint64_t id;
+    uint64_t size;
+} QEMU_PACKED AppleDMAUnmapReq;
+
+typedef struct AppleDMAUnmapResp {
+    uint64_t id;
+    uint32_t status;
+    uint32_t reserved;
+} QEMU_PACKED AppleDMAUnmapResp;
+
+#define TYPE_APPLE_DMA_PCI "apple-dma-pci"
+OBJECT_DECLARE_SIMPLE_TYPE(AppleDMAState, APPLE_DMA_PCI)
+
+struct AppleDMAState {
+    PCIDevice parent_obj;
+
+    MemoryRegion bar;
+
+    /* Configuration (set via properties) */
+    uint32_t managed_bdf;
+    uint32_t max_entries;
+    uint32_t apple_host_bus;
+    uint32_t apple_host_device;
+    uint32_t apple_host_function;
+
+    /* Runtime state */
+    uint64_t cmd_gpa;
+    uint32_t last_status;
+    io_connect_t dext_conn;
+    bool shared_dext_conn;
+};
+
+/* ------------------------------------------------------------------ */
+/* DMA backend operations                                              */
+/* ------------------------------------------------------------------ */
+
+static bool apple_dma_backend_map(AppleDMAState *s, uint64_t gpa, uint32_t size,
+                                  uint64_t *out_dma_addr, uint32_t *out_dma_len)
+{
+    hwaddr map_len = size;
+    void *hva;
+    uint64_t bus_addr = 0, bus_len = 0;
+
+    hva = dma_memory_map(&address_space_memory, gpa, &map_len,
+                         DMA_DIRECTION_TO_DEVICE, MEMTXATTRS_UNSPECIFIED);
+    if (!hva || map_len < size) {
+        if (hva) {
+            dma_memory_unmap(&address_space_memory, hva, map_len,
+                             DMA_DIRECTION_TO_DEVICE, 0);
+        }
+        return false;
+    }
+
+    /*
+     * Use the GPA as the dext lookup key. The dext treats this as an
+     * opaque handle for matching register/unregister calls; the actual
+     * DMA bus address is assigned by the platform and returned in
+     * bus_addr.
+     */
+    if (s->dext_conn != IO_OBJECT_NULL) {
+        kern_return_t kr;
+
+        kr = apple_dext_register_dma(s->dext_conn, gpa,
+                                     (uint64_t)hva, size,
+                                     &bus_addr, &bus_len);
+        dma_memory_unmap(&address_space_memory, hva, map_len,
+                         DMA_DIRECTION_TO_DEVICE, 0);
+        if (kr != KERN_SUCCESS) {
+            return false;
+        }
+        *out_dma_addr = bus_addr;
+        *out_dma_len = (uint32_t)bus_len;
+    } else {
+        dma_memory_unmap(&address_space_memory, hva, map_len,
+                         DMA_DIRECTION_TO_DEVICE, 0);
+        *out_dma_addr = gpa;
+        *out_dma_len = size;
+    }
+
+    return true;
+}
+
+static uint32_t apple_dma_backend_unmap(AppleDMAState *s, uint64_t id)
+{
+    if (s->dext_conn != IO_OBJECT_NULL) {
+        kern_return_t kr;
+
+        kr = apple_dext_unregister_dma(s->dext_conn, id);
+        if (kr != KERN_SUCCESS) {
+            return APPLE_DMA_S_IOERR;
+        }
+    }
+
+    return APPLE_DMA_S_OK;
+}
+
+/* ------------------------------------------------------------------ */
+/* Doorbell — process a batch from the command page                    */
+/* ------------------------------------------------------------------ */
+
+static void apple_dma_handle_map(AppleDMAState *s, uint64_t req_gpa,
+                                 uint64_t resp_gpa, uint32_t count)
+{
+    AddressSpace *as = &address_space_memory;
+    hwaddr req_len = count * sizeof(AppleDMAMapReq);
+    hwaddr resp_len = count * sizeof(AppleDMAMapResp);
+    AppleDMAMapReq *reqs;
+    AppleDMAMapResp *resps;
+    uint32_t i;
+    bool ok = true;
+
+    reqs = dma_memory_map(as, req_gpa, &req_len, DMA_DIRECTION_TO_DEVICE,
+                          MEMTXATTRS_UNSPECIFIED);
+    if (!reqs || req_len < count * sizeof(AppleDMAMapReq)) {
+        if (reqs) {
+            dma_memory_unmap(as, reqs, req_len, DMA_DIRECTION_TO_DEVICE, 0);
+        }
+        s->last_status = APPLE_DMA_S_INVAL;
+        return;
+    }
+
+    resps = dma_memory_map(as, resp_gpa, &resp_len, DMA_DIRECTION_FROM_DEVICE,
+                           MEMTXATTRS_UNSPECIFIED);
+    if (!resps || resp_len < count * sizeof(AppleDMAMapResp)) {
+        if (resps) {
+            dma_memory_unmap(as, resps, resp_len,
+                             DMA_DIRECTION_FROM_DEVICE, 0);
+        }
+        dma_memory_unmap(as, reqs, req_len, DMA_DIRECTION_TO_DEVICE, 0);
+        s->last_status = APPLE_DMA_S_INVAL;
+        return;
+    }
+
+    for (i = 0; i < count; i++) {
+        uint64_t gpa = le64_to_cpu(reqs[i].gpa);
+        uint64_t dma_addr = 0;
+        uint32_t dma_len = 0;
+        bool mapped;
+
+        mapped = apple_dma_backend_map(s, gpa, le32_to_cpu(reqs[i].len),
+                                       &dma_addr, &dma_len);
+        /* Fill every field so a failed entry never carries stale data. */
+        resps[i].id = cpu_to_le64(gpa);
+        resps[i].dma_addr = cpu_to_le64(dma_addr);
+        resps[i].dma_len = cpu_to_le32(dma_len);
+        resps[i].status = cpu_to_le32(mapped ? APPLE_DMA_S_OK
+                                             : APPLE_DMA_S_IOERR);
+        ok = ok && mapped;
+    }
+
+    s->last_status = ok ? APPLE_DMA_S_OK : APPLE_DMA_S_IOERR;
+    dma_memory_unmap(as, resps, resp_len, DMA_DIRECTION_FROM_DEVICE,
+                     resp_len);
+    dma_memory_unmap(as, reqs, req_len, DMA_DIRECTION_TO_DEVICE, 0);
+}
+
+static void apple_dma_handle_unmap(AppleDMAState *s, uint64_t req_gpa,
+                                   uint64_t resp_gpa, uint32_t count)
+{
+    AddressSpace *as = &address_space_memory;
+    hwaddr req_len = count * sizeof(AppleDMAUnmapReq);
+    hwaddr resp_len = count * sizeof(AppleDMAUnmapResp);
+    AppleDMAUnmapReq *reqs;
+    AppleDMAUnmapResp *resps;
+    uint32_t i;
+    bool ok = true;
+
+    reqs = dma_memory_map(as, req_gpa, &req_len, DMA_DIRECTION_TO_DEVICE,
+                          MEMTXATTRS_UNSPECIFIED);
+    if (!reqs || req_len < count * sizeof(AppleDMAUnmapReq)) {
+        if (reqs) {
+            dma_memory_unmap(as, reqs, req_len, DMA_DIRECTION_TO_DEVICE, 0);
+        }
+        s->last_status = APPLE_DMA_S_INVAL;
+        return;
+    }
+
+    resps = dma_memory_map(as, resp_gpa, &resp_len, DMA_DIRECTION_FROM_DEVICE,
+                           MEMTXATTRS_UNSPECIFIED);
+    if (!resps || resp_len < count * sizeof(AppleDMAUnmapResp)) {
+        if (resps) {
+            dma_memory_unmap(as, resps, resp_len,
+                             DMA_DIRECTION_FROM_DEVICE, 0);
+        }
+        dma_memory_unmap(as, reqs, req_len, DMA_DIRECTION_TO_DEVICE, 0);
+        s->last_status = APPLE_DMA_S_INVAL;
+        return;
+    }
+
+    for (i = 0; i < count; i++) {
+        uint64_t id = le64_to_cpu(reqs[i].id);
+        uint32_t status = apple_dma_backend_unmap(s, id);
+
+        resps[i].id = cpu_to_le64(id);
+        resps[i].status = cpu_to_le32(status);
+        if (status != APPLE_DMA_S_OK) {
+            ok = false;
+        }
+    }
+
+    s->last_status = ok ? APPLE_DMA_S_OK : APPLE_DMA_S_IOERR;
+    dma_memory_unmap(as, resps, resp_len, DMA_DIRECTION_FROM_DEVICE,
+                     resp_len);
+    dma_memory_unmap(as, reqs, req_len, DMA_DIRECTION_TO_DEVICE, 0);
+}
+
+static void apple_dma_doorbell(AppleDMAState *s)
+{
+    AddressSpace *as = &address_space_memory;
+    uint8_t cmd_buf[CMD_PAGE_SIZE];
+    uint32_t type, count;
+    uint64_t req_gpa, resp_gpa;
+    uint32_t le_status;
+
+    if (!s->cmd_gpa) {
+        s->last_status = APPLE_DMA_S_INVAL;
+        return;
+    }
+
+    if (dma_memory_read(as, s->cmd_gpa, cmd_buf, CMD_PAGE_SIZE,
+                        MEMTXATTRS_UNSPECIFIED) != MEMTX_OK) {
+        s->last_status = APPLE_DMA_S_INVAL;
+        return;
+    }
+
+    type = ldl_le_p(cmd_buf + CMD_OFF_TYPE);
+    count = ldl_le_p(cmd_buf + CMD_OFF_COUNT);
+    req_gpa = ldq_le_p(cmd_buf + CMD_OFF_REQ_GPA);
+    resp_gpa = ldq_le_p(cmd_buf + CMD_OFF_RESP_GPA);
+
+    if (!count || count > s->max_entries || !req_gpa || !resp_gpa) {
+        s->last_status = APPLE_DMA_S_INVAL;
+        goto write_status;
+    }
+
+    switch (type) {
+    case APPLE_DMA_CMD_MAP:
+        apple_dma_handle_map(s, req_gpa, resp_gpa, count);
+        break;
+    case APPLE_DMA_CMD_UNMAP:
+        apple_dma_handle_unmap(s, req_gpa, resp_gpa, count);
+        break;
+    default:
+        s->last_status = APPLE_DMA_S_INVAL;
+        break;
+    }
+
+write_status:
+    le_status = cpu_to_le32(s->last_status);
+    dma_memory_write(as, s->cmd_gpa + CMD_OFF_STATUS, &le_status, 4,
+                     MEMTXATTRS_UNSPECIFIED);
+}
+
+/* ------------------------------------------------------------------ */
+/* MMIO BAR handlers                                                   */
+/* ------------------------------------------------------------------ */
+
+static uint64_t apple_dma_bar_read(void *opaque, hwaddr addr, unsigned size)
+{
+    AppleDMAState *s = opaque;
+
+    switch (addr) {
+    case APPLE_DMA_REG_VERSION:
+        return APPLE_DMA_VERSION;
+    case APPLE_DMA_REG_MANAGED_BDF:
+        return s->managed_bdf;
+    case APPLE_DMA_REG_MAX_ENTRIES:
+        return s->max_entries;
+    case APPLE_DMA_REG_STATUS:
+        return s->last_status;
+    default:
+        return 0;
+    }
+}
+
+static void apple_dma_bar_write(void *opaque, hwaddr addr, uint64_t val,
+                                unsigned size)
+{
+    AppleDMAState *s = opaque;
+
+    switch (addr) {
+    case APPLE_DMA_REG_CMD_GPA_LO:
+        s->cmd_gpa = deposit64(s->cmd_gpa, 0, 32, val);
+        break;
+    case APPLE_DMA_REG_CMD_GPA_HI:
+        s->cmd_gpa = deposit64(s->cmd_gpa, 32, 32, val);
+        break;
+    case APPLE_DMA_REG_DOORBELL:
+        apple_dma_doorbell(s);
+        break;
+    default:
+        break;
+    }
+}
+
+static const MemoryRegionOps apple_dma_bar_ops = {
+    .read = apple_dma_bar_read,
+    .write = apple_dma_bar_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+/* ------------------------------------------------------------------ */
+/* Dext connection                                                     */
+/* ------------------------------------------------------------------ */
+
+static bool apple_dma_connect_dext(AppleDMAState *s, Error **errp)
+{
+    io_connect_t conn;
+    kern_return_t kr;
+
+    conn = apple_vfio_dext_lookup(s->apple_host_bus, s->apple_host_device,
+                                  s->apple_host_function);
+    if (conn != IO_OBJECT_NULL) {
+        s->dext_conn = conn;
+        s->shared_dext_conn = true;
+        return true;
+    }
+
+    conn = apple_dext_connect(s->apple_host_bus, s->apple_host_device,
+                              s->apple_host_function);
+    if (conn == IO_OBJECT_NULL) {
+        error_setg(errp,
+                   "apple-dma: could not connect to dext for host PCI "
+                   "%02x:%02x.%x",
+                   s->apple_host_bus, s->apple_host_device,
+                   s->apple_host_function);
+        return false;
+    }
+
+    kr = apple_dext_claim(conn);
+    if (kr != KERN_SUCCESS) {
+        error_setg(errp,
+                   "apple-dma: failed to claim dext-backed PCI device "
+                   "%02x:%02x.%x (kr=0x%x)",
+                   s->apple_host_bus, s->apple_host_device,
+                   s->apple_host_function, kr);
+        apple_dext_disconnect(conn);
+        return false;
+    }
+
+    s->dext_conn = conn;
+    s->shared_dext_conn = false;
+    return true;
+}
+
+/* ------------------------------------------------------------------ */
+/* PCI device lifecycle                                                */
+/* ------------------------------------------------------------------ */
+
+static void apple_dma_pci_realize(PCIDevice *pdev, Error **errp)
+{
+    AppleDMAState *s = APPLE_DMA_PCI(pdev);
+
+    if (s->apple_host_bus == UINT32_MAX ||
+        s->apple_host_device == UINT32_MAX ||
+        s->apple_host_function == UINT32_MAX) {
+        error_setg(errp, "apple-dma: requires x-apple-host-bus, "
+                   "x-apple-host-device, and x-apple-host-function");
+        return;
+    }
+
+    if (!s->max_entries) {
+        s->max_entries = APPLE_DMA_MAX_ENTRIES;
+    }
+
+    if (!apple_dma_connect_dext(s, errp)) {
+        return;
+    }
+
+    memory_region_init_io(&s->bar, OBJECT(s), &apple_dma_bar_ops, s,
+                          "apple-dma-bar", APPLE_DMA_BAR_SIZE);
+    pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+}
+
+static void apple_dma_pci_exit(PCIDevice *pdev)
+{
+    AppleDMAState *s = APPLE_DMA_PCI(pdev);
+
+    if (s->dext_conn != IO_OBJECT_NULL) {
+        if (s->shared_dext_conn) {
+            apple_vfio_dext_release(s->apple_host_bus, s->apple_host_device,
+                                    s->apple_host_function, s->dext_conn);
+        } else {
+            apple_dext_disconnect(s->dext_conn);
+        }
+        s->dext_conn = IO_OBJECT_NULL;
+    }
+}
+
+static const Property apple_dma_pci_properties[] = {
+    DEFINE_PROP_UINT32("managed-bdf", AppleDMAState, managed_bdf, 0),
+    DEFINE_PROP_UINT32("max-entries", AppleDMAState, max_entries, 0),
+    DEFINE_PROP_UINT32("x-apple-host-bus", AppleDMAState,
+                       apple_host_bus, UINT32_MAX),
+    DEFINE_PROP_UINT32("x-apple-host-device", AppleDMAState,
+                       apple_host_device, UINT32_MAX),
+    DEFINE_PROP_UINT32("x-apple-host-function", AppleDMAState,
+                       apple_host_function, UINT32_MAX),
+};
+
+static void apple_dma_pci_class_init(ObjectClass *klass, const void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
+
+    pdc->realize = apple_dma_pci_realize;
+    pdc->exit = apple_dma_pci_exit;
+    pdc->vendor_id = PCI_VENDOR_ID_REDHAT;
+    pdc->device_id = PCI_DEVICE_ID_REDHAT_APPLE_DMA;
+    pdc->class_id = PCI_CLASS_SYSTEM_OTHER;
+
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    device_class_set_props(dc, apple_dma_pci_properties);
+    dc->desc = "Apple DMA mapping device";
+}
+
+static const TypeInfo apple_dma_pci_info = {
+    .name = TYPE_APPLE_DMA_PCI,
+    .parent = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(AppleDMAState),
+    .class_init = apple_dma_pci_class_init,
+    .interfaces = (const InterfaceInfo[]) {
+        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { },
+    },
+};
+
+static void apple_dma_pci_register(void)
+{
+    type_register_static(&apple_dma_pci_info);
+}
+
+type_init(apple_dma_pci_register)
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 473f8669f9..d7b4cbcc19 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -40,8 +40,7 @@ system_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
 # Apple VFIO backend
 if host_os == 'darwin'
   system_ss.add(when: 'CONFIG_VFIO',
-                if_true: [files('apple-device.c',
-                                'container-apple.c',
-                                'apple-dext-client.c'),
+                if_true: [files('apple-device.c', 'apple-dma.c',
+                                'container-apple.c', 'apple-dext-client.c'),
                           coref, iokit])
 endif
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 5b179091de..8e3fe77cc7 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -121,6 +121,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_REDHAT_ACPI_ERST   0x0012
 #define PCI_DEVICE_ID_REDHAT_UFS         0x0013
 #define PCI_DEVICE_ID_REDHAT_RISCV_IOMMU 0x0014
+#define PCI_DEVICE_ID_REDHAT_APPLE_DMA   0x0015
 #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
 
 #define FMT_PCIBUS                      PRIx64
-- 
2.50.1 (Apple Git-155)

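The command page protocol implemented by apple-dma.c can be illustrated from the guest side. What follows is a minimal, hypothetical sketch, not code from this series or from the out-of-tree guest module. It encodes one map request entry and the 32-byte command page using the offsets defined in apple-dma.c, and it repeats the checks apple_dma_doorbell() applies before dispatching a batch. The helper names encode_map_req(), encode_cmd_page() and validate_cmd_page() are illustrative only. A real guest driver would place these buffers in DMA-able RAM, then write CMD_GPA_LO/CMD_GPA_HI and DOORBELL through BAR0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets copied from the command page layout comment in apple-dma.c. */
#define CMD_OFF_TYPE      0x00
#define CMD_OFF_COUNT     0x04
#define CMD_OFF_STATUS    0x08
#define CMD_OFF_REQ_GPA   0x10
#define CMD_OFF_RESP_GPA  0x18
#define CMD_PAGE_SIZE     0x20
#define MAP_REQ_SIZE      16    /* uint64_t gpa, uint32_t len, uint32_t flags */

#define APPLE_DMA_CMD_MAP   1
#define APPLE_DMA_CMD_UNMAP 2
#define APPLE_DMA_S_OK      0
#define APPLE_DMA_S_INVAL   3

/* The last field must end exactly at the 32-byte command page size. */
static_assert(CMD_OFF_RESP_GPA + 8 == CMD_PAGE_SIZE, "command page layout");

/* Byte-wise little-endian accessors, independent of host endianness. */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;

    for (int i = 0; i < n; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Hypothetical guest helper: encode one 16-byte map request entry. */
void encode_map_req(uint8_t *entry, uint64_t gpa, uint32_t len)
{
    put_le(entry, gpa, 8);
    put_le(entry + 8, len, 4);
    put_le(entry + 12, 0, 4);   /* flags: not interpreted by this patch */
}

/* Hypothetical guest helper: fill the command page before the doorbell. */
void encode_cmd_page(uint8_t *cmd, uint32_t type, uint32_t count,
                     uint64_t req_gpa, uint64_t resp_gpa)
{
    memset(cmd, 0, CMD_PAGE_SIZE);      /* status and reserved start at 0 */
    put_le(cmd + CMD_OFF_TYPE, type, 4);
    put_le(cmd + CMD_OFF_COUNT, count, 4);
    put_le(cmd + CMD_OFF_REQ_GPA, req_gpa, 8);
    put_le(cmd + CMD_OFF_RESP_GPA, resp_gpa, 8);
}

/*
 * Mirrors the checks in apple_dma_doorbell(): a batch is rejected with
 * INVAL if it is empty, too large, has a null buffer GPA, or has an
 * unknown command type.
 */
uint32_t validate_cmd_page(const uint8_t *cmd, uint32_t max_entries)
{
    uint32_t type = (uint32_t)get_le(cmd + CMD_OFF_TYPE, 4);
    uint32_t count = (uint32_t)get_le(cmd + CMD_OFF_COUNT, 4);
    uint64_t req_gpa = get_le(cmd + CMD_OFF_REQ_GPA, 8);
    uint64_t resp_gpa = get_le(cmd + CMD_OFF_RESP_GPA, 8);

    if (!count || count > max_entries || !req_gpa || !resp_gpa) {
        return APPLE_DMA_S_INVAL;
    }
    if (type != APPLE_DMA_CMD_MAP && type != APPLE_DMA_CMD_UNMAP) {
        return APPLE_DMA_S_INVAL;
    }
    return APPLE_DMA_S_OK;
}
```

The sketch uses byte-wise little-endian stores so that it does not depend on guest endianness. This matches the device's use of ldl_le_p()/ldq_le_p() when it reads the page.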



* [RFC PATCH 10/10] docs: Add vfio-apple documentation and MAINTAINERS entry
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (8 preceding siblings ...)
  2026-04-05  7:28 ` [RFC PATCH 09/10] vfio/apple: Add apple-dma-pci companion device Scott J. Goldman
@ 2026-04-05  7:28 ` Scott J. Goldman
  2026-04-05  8:01 ` [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Mohamed Mediouni
  2026-04-05 10:36 ` BALATON Zoltan
  11 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05  7:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex, clg, pbonzini, rbolshakov, phil, mst, john.levon,
	thanos.makatos, qemu-s390x, Scott J. Goldman, Scott J. Goldman

From: "Scott J. Goldman" <scottjg@umich.edu>

Add documentation for the Apple VFIO passthrough backend covering
requirements, usage, the DMA companion device protocol, and known
limitations.  Register the vfio-apple files in MAINTAINERS.

Signed-off-by: Scott J. Goldman <scottjgo@gmail.com>
---
 MAINTAINERS                        |  11 ++
 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/vfio-apple.rst | 160 +++++++++++++++++++++++++++++
 3 files changed, 172 insertions(+)
 create mode 100644 docs/system/devices/vfio-apple.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ad215eced8..baed89cbf7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2338,6 +2338,17 @@ F: qapi/vfio.json
 F: migration/vfio-stub.c
 F: tests/functional/aarch64/test_device_passthrough.py
 
+vfio-apple
+M: Scott J. Goldman <scottjgo@gmail.com>
+S: Maintained
+F: hw/vfio/apple-device.c
+F: hw/vfio/apple-dext-client.c
+F: hw/vfio/apple-dext-client.h
+F: hw/vfio/apple-dma.c
+F: hw/vfio/apple.h
+F: hw/vfio/container-apple.c
+F: include/compat/linux/
+
 vfio-igd
 M: Alex Williamson <alex@shazbot.org>
 M: Cédric Le Goater <clg@redhat.com>
diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 40054bb7df..aa3dbfa1e0 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -98,4 +98,5 @@ Emulated Devices
    devices/scsi/index.rst
    devices/usb-u2f.rst
    devices/usb.rst
+   devices/vfio-apple.rst
    devices/vfio-user.rst
diff --git a/docs/system/devices/vfio-apple.rst b/docs/system/devices/vfio-apple.rst
new file mode 100644
index 0000000000..b0e92f3b96
--- /dev/null
+++ b/docs/system/devices/vfio-apple.rst
@@ -0,0 +1,160 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+.. _vfio-apple:
+
+======================
+vfio-apple (macOS/ARM)
+======================
+
+QEMU supports PCI device passthrough on Apple Silicon Macs using a macOS
+DriverKit extension (dext) as the host backend.  Unlike Linux VFIO, which
+relies on kernel-managed IOMMU groups and ``/dev/vfio``, the Apple backend
+communicates with a userspace driver extension through IOKit's
+``IOUserClient`` interface.
+
+Requirements
+============
+
+- Apple Silicon Mac running macOS
+- A DriverKit extension (``VFIOUserPCIDriver``) installed and running for
+  the target PCI device
+- QEMU built with ``--enable-hvf`` on a Darwin host
+- A guest kernel module that speaks the ``apple-dma-pci`` protocol (see
+  below)
+
+Usage
+=====
+
+Specify the host PCI device by its bus/device/function address:
+
+.. code-block:: console
+
+  -device vfio-apple-pci,host=01:00.0
+
+This creates a ``vfio-apple-pci`` device that connects to the dext instance
+managing the given host PCI BDF.  A companion ``apple-dma-pci`` device is
+automatically created on the same bus.
+
+Architecture
+============
+
+The implementation consists of several components:
+
+``vfio-apple-pci``
+  A QOM subclass of ``vfio-pci`` that overrides realize/exit to set up the
+  Apple-specific IOMMU container and interrupt delivery.
+
+``vfio-iommu-apple``
+  An IOMMU container backend.  DMA map/unmap through the container are
+  no-ops because DMA is managed separately through the companion device
+  (see below).
+
+``apple-dma-pci``
+  A paravirtualized PCI device that provides batched DMA map/unmap
+  operations between the guest and the dext.  The guest driver writes a
+  command page GPA to BAR0 registers, fills request/response buffers in
+  RAM, then triggers processing with a single doorbell write (one VMEXIT
+  per batch).
+
+Dext communication
+  All host device access (config space, BAR MMIO, DMA registration,
+  interrupt notification) goes through IOKit ``IOConnectCallMethod`` calls to
+  the dext's ``IOUserClient``.
+
+DMA mapping
+-----------
+
+On Linux, VFIO programs the hardware IOMMU directly via kernel ioctls.
+QEMU chooses IOVAs and the kernel maps them through the IOMMU.
+
+On macOS, the DART (Apple's IOMMU) is managed entirely by the DriverKit
+extension.  QEMU cannot choose IOVAs; it can only request that a host
+virtual address be mapped for DMA, and the platform assigns the resulting
+IOVA.  The guest therefore cannot assume any particular IOVA layout: each
+``apple-dma-pci`` map response entry carries the platform-assigned IOVA
+(the bus address the passthrough device must use) and its length.
+
+Because of this architecture, a **guest kernel module** is required to
+drive the ``apple-dma-pci`` device.  The module discovers the companion
+device, submits map/unmap batches, and translates between guest physical
+addresses and the platform-assigned DMA addresses that the passthrough
+device will use.
+
+Platform constraints
+--------------------
+
+The macOS DART imposes limits that do not exist with Linux VFIO:
+
+- **No IOVA alignment guarantees**: the platform may return any address.
+  Guest drivers that assume page-aligned or naturally-aligned DMA
+  addresses must account for this.
+- **Total DMA memory limit**: approximately 1.5 GB of guest memory can be
+  registered for DMA at any time.
+- **Mapping count limit**: approximately 64k concurrent DMA mappings.
+
+These limits are enforced by the DART hardware and DriverKit, not by QEMU.
+Exceeding them will cause map requests to fail.
+
+``apple-dma-pci`` register interface
+-------------------------------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Offset
+     - Name
+     - Access
+     - Description
+   * - 0x00
+     - VERSION
+     - R
+     - Protocol version
+   * - 0x04
+     - MANAGED_BDF
+     - R
+     - Guest BDF this device maps for
+   * - 0x08
+     - MAX_ENTRIES
+     - R
+     - Maximum entries per batch
+   * - 0x0C
+     - STATUS
+     - R
+     - Result of last doorbell
+   * - 0x10
+     - CMD_GPA_LO
+     - W
+     - Command page GPA [31:0]
+   * - 0x14
+     - CMD_GPA_HI
+     - W
+     - Command page GPA [63:32]
+   * - 0x18
+     - DOORBELL
+     - W
+     - Any write triggers batch processing
+
+Interrupts
+----------
+
+The dext tracks pending hardware interrupts (MSI/MSI-X) in a bitfield
+(one bit per vector, up to 256 vectors).  When a hardware interrupt fires,
+the dext sets the corresponding bit and completes an asynchronous
+notification to QEMU via ``IOConnectCallAsyncMethod``.  The notification
+is dispatched through a GCD queue, which wakes the QEMU main loop via an
+``EventNotifier`` pipe.  QEMU then atomically reads and clears the pending
+bits and delivers each flagged vector to the guest.
+
+Limitations
+===========
+
+- **Guest kernel module required**: the ``apple-dma-pci`` protocol is not
+  handled by standard guest drivers.
+- **No migration support**: the Apple container does not support dirty page
+  tracking.
+- **Interrupt delivery**: interrupts are delivered asynchronously via IOKit
+  rather than directly by the kernel, adding some overhead compared to
+  Linux VFIO.
+- **No hot-plug**: devices must be configured at VM startup.
+- **DMA constraints**: see `Platform constraints`_ above.
+- **No plain ``vfio-pci`` on Darwin**: the base ``vfio-pci`` device type is not
+  user-creatable on Darwin hosts; use ``vfio-apple-pci`` instead.
-- 
2.50.1 (Apple Git-155)

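According to the DMA mapping section of the documentation, the guest module translates guest physical addresses into the platform-assigned DMA addresses. The sketch below shows one way such a module might track that state. It assumes each successful MAP response is recorded as a (gpa, dma_addr, dma_len) triple and that mappings do not overlap. The structure and function names are hypothetical. The fixed MAX_MAPPINGS bound is illustrative only; the documented platform limit is roughly 64k concurrent mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 64     /* illustrative bound, not the platform limit */

/* Hypothetical record built from one successful MAP response entry. */
struct dma_mapping {
    uint64_t gpa;           /* request GPA; also the id passed to UNMAP */
    uint64_t dma_addr;      /* platform-assigned IOVA from the response */
    uint32_t len;           /* dma_len from the response */
};

struct dma_table {
    struct dma_mapping m[MAX_MAPPINGS];
    size_t n;               /* entries are kept sorted by gpa */
};

/* Record a mapping, keeping the table sorted by GPA.  Returns 0 on success. */
int dma_table_add(struct dma_table *t, uint64_t gpa, uint64_t dma_addr,
                  uint32_t len)
{
    size_t i;

    if (len == 0 || t->n == MAX_MAPPINGS) {
        return -1;
    }
    for (i = t->n; i > 0 && t->m[i - 1].gpa > gpa; i--) {
        t->m[i] = t->m[i - 1];          /* shift larger entries up */
    }
    t->m[i] = (struct dma_mapping){ gpa, dma_addr, len };
    t->n++;
    return 0;
}

/*
 * Translate a GPA into the address the passthrough device must use.
 * Returns 0 and stores the IOVA in *iova, or -1 if no mapping covers gpa.
 */
int dma_table_translate(const struct dma_table *t, uint64_t gpa,
                        uint64_t *iova)
{
    size_t lo = 0, hi = t->n;
    const struct dma_mapping *e;

    assert(t->n <= MAX_MAPPINGS);
    /* Binary search for the last mapping whose base GPA is <= gpa. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (t->m[mid].gpa <= gpa) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == 0) {
        return -1;
    }
    e = &t->m[lo - 1];
    if (gpa - e->gpa >= e->len) {
        return -1;                      /* gap between mappings */
    }
    /* Add the byte offset: the platform gives no IOVA alignment guarantee. */
    *iova = e->dma_addr + (gpa - e->gpa);
    return 0;
}
```

The translation adds the byte offset to the returned IOVA instead of combining page-frame bits. The platform constraints section states that returned IOVAs carry no alignment guarantee, so page-bit arithmetic would be unsafe.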

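The Interrupts section of the documentation describes a pending-vector bitfield of up to 256 vectors. The dext sets bits in it, and QEMU atomically reads and clears them. A minimal C11 sketch of that handshake follows. It is not the dext or QEMU implementation, and the type and function names are hypothetical. It also relies on the GCC/Clang builtin __builtin_ctzll to locate set bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_VECTORS   256               /* limit stated in the documentation */
#define PENDING_WORDS (MAX_VECTORS / 64)

/* Hypothetical pending bitfield: one bit per MSI/MSI-X vector. */
typedef struct {
    _Atomic uint64_t word[PENDING_WORDS];
} pending_bits;

void pending_init(pending_bits *p)
{
    for (unsigned w = 0; w < PENDING_WORDS; w++) {
        atomic_init(&p->word[w], 0);
    }
}

/* Producer side (the dext's role): flag a vector as pending. */
void pending_set(pending_bits *p, unsigned vector)
{
    assert(vector < MAX_VECTORS);
    atomic_fetch_or(&p->word[vector / 64], UINT64_C(1) << (vector % 64));
}

/*
 * Consumer side (QEMU's role): atomically take and clear each word,
 * then record every flagged vector in out[], lowest first.  out must
 * have room for MAX_VECTORS entries.  Returns the number of vectors.
 */
unsigned pending_drain(pending_bits *p, unsigned out[MAX_VECTORS])
{
    unsigned n = 0;

    for (unsigned w = 0; w < PENDING_WORDS; w++) {
        uint64_t bits = atomic_exchange(&p->word[w], 0);

        while (bits) {
            unsigned bit = (unsigned)__builtin_ctzll(bits); /* lowest set bit */

            out[n++] = w * 64 + bit;
            bits &= bits - 1;                            /* clear that bit */
        }
    }
    return n;
}
```

Each 64-bit word is swapped with zero in a single atomic operation. A bit that the producer sets after the exchange therefore stays pending for the next drain, and each set is consumed exactly once. Setting an already-pending vector again before a drain coalesces into one delivery, as expected for message-signalled interrupts.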


* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (9 preceding siblings ...)
  2026-04-05  7:28 ` [RFC PATCH 10/10] docs: Add vfio-apple documentation and MAINTAINERS entry Scott J. Goldman
@ 2026-04-05  8:01 ` Mohamed Mediouni
  2026-04-05  8:14   ` Mohamed Mediouni
  2026-04-05 10:36 ` BALATON Zoltan
  11 siblings, 1 reply; 25+ messages in thread
From: Mohamed Mediouni @ 2026-04-05  8:01 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x


> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
> 
> This series adds VFIO PCI device passthrough support for Apple Silicon
> Macs running macOS, using a DriverKit extension (dext) as the host
> backend instead of the Linux VFIO kernel driver.
> 
> I'm sending this as an RFC because I'd like feedback before investing
> further in upstreaming.  The code is functional.  I've tested it with
> an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
> gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
> [1]), likely due to the BAR access penalty described below.  AI
> inference workloads appear less affected.  Ollama with Qwen 3.5
> generates around 140 tok/sec on the same setup [2].
> 
> How it works:
> 
> On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
> for device access and DMA mapping.  On macOS, there is no equivalent
> kernel interface.  Instead, a userspace DriverKit extension
> (VFIOUserPCIDriver) mediates access to the physical PCI device through
> IOKit's IOUserClient and PCIDriverKit APIs.
> 
> The series keeps the existing VFIOPCIDevice model and reuses QEMU's
> passthrough infrastructure.  A few ioctl callsites are refactored into
> io_ops callbacks, the build system is extended for Darwin, and the
> Apple-specific backend plugs in behind those abstractions.
> 
> The guest sees two PCI devices: the passthrough device itself
> (vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
> DMA mapping device (apple-dma-pci).  On the QEMU side, an
> AppleVFIOContainer implements the IOMMU backend, and a C client
> library wraps the IOUserClient calls to the dext for config space,
> BAR MMIO, interrupts, reset, and DMA.
> 
> DMA limitations:
> 
> This is the biggest platform constraint.  Unlike a typical IOMMU
> mapping operation where the caller specifies the IOVA, the
> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
> system-assigned IOVA.  There is no way to request a specific address.
> This means the guest's requested DMA addresses cannot be used
> directly.  The guest kernel module must intercept DMA mapping calls
> and forward them through the companion device to get the actual
> hardware IOVA.

Hello,

Ugh this one is not great. By the way, Apple has a private PCIe passthrough
API used by Virtualization.framework but that’s a different design.

Would bounce buffering work, using something akin to the confidential compute
path and a pre-defined chunk of host memory accessible from the device, and
then managing the guest address map? (See swiotlb.)

If the last part isn’t possible, something minimal to export a swiotlb window
through the device tree, giving the IOVA there, would be good too.

And that will get rid of the need for an apple-dma-pci device.

> There are also hard platform limits: approximately 1.5 GB total
> mapped memory and roughly 64k concurrent mappings.  Not all
> workloads will fit within these limits, though GPU gaming and LLM
> inference have worked in practice.

That’s not too dissimilar from the confidential compute limitations.

> 
> BAR access has performance issues as well.  HVF does not expose
> controls to map device memory as cacheable in the guest, creating a
> significant performance penalty on BAR MMIO.  Uncached mappings work
> correctly but slowly compared to what the hardware could do.

That’s not a macOS limitation and not an Apple hardware limitation, but
it’s more fundamental to how PCIe works.

Unlike CXL, PCIe doesn’t have a coherency protocol story, and the alternative
of uncached and doing manual software-managed flushes isn’t really tenable.

> 
> What works:
> - PCI config space passthrough
> - BAR MMIO via direct-mapped device memory
> - MSI/MSI-X interrupts via async notification from the dext
> - Device reset (FLR with hot-reset fallback)
> - DMA mapping for guest device drivers
> 
This is very interesting to see :)

> What doesn't work:
> - Expansion ROM / VBIOS passthrough
> - PCI BAR quirks
> - VGA region passthrough
> - Migration and dirty page tracking
> - Hot-unplug
> 



> Questions for reviewers:
> 
> 1. Is this something the VFIO maintainers would consider carrying
>   upstream?  The refactoring patches (3-6) are benign, but the Apple
>   backend is a new platform with real limitations.  That said, if Apple
>   lifts some of the DART/HVF restrictions in a future macOS release, the
>   code changes to take advantage would likely be minor.  I'd like to
>   understand whether this is in scope before doing the work to
>   address review feedback on the full series.
> 
> 2. The apple-dma-pci companion device: should this be a virtio device
>   instead?  I went with a simple custom PCI device because the virtio
>   infrastructure didn't buy much for what is essentially a {map, unmap}
>   register interface, but if virtio is preferred, what is the process
>   for allocating a device ID?  If a custom PCI device is the right
>   approach, I've tentatively allocated 1b36:0015.  Is there a process
>   for reserving a device ID under the Red Hat PCI vendor, or is
>   claiming it in pci-ids.rst sufficient?  The guest-side kernel module
>   hooks all DMA mapping functions for passed-through devices, which is
>   unusual enough that I'm not sure it's upstreamable in the Linux
>   kernel.  I can maintain it out of tree if needed.

I’d recommend using bounce buffers like the CoCo case if possible. I don’t
think that the apple-dma-pci definitely-not-an-IOMMU is a good idea.

> 
> 3. Should the macOS host-side DriverKit extension live in the QEMU
>   tree?  It's not included in this series and requires Apple code
>   signing.  I'm happy to keep it out of tree if that's preferred,
>   or include the source if reviewers want it co-located.

Both are fine I think. Could you look into compatibility with the tinygrad
one at https://github.com/tinygrad/tinygrad/tree/7e54992bf600789dbe5d37b99fe12a19c32e36a1/extra/usbgpu/tbgpu/installer (prebuilt at https://raw.githubusercontent.com/tinygrad/tinygpu_releases/refs/heads/main/TinyGPU.zip)?

> 
> 4. The existing VFIO code includes <linux/vfio.h> from the
>   linux-headers/ tree, which is intended to track upstream Linux
>   UAPI headers.  To make this compile on macOS, I added minimal
>   stub headers (include/compat/linux/types.h and linux/ioctl.h)
>   so the existing vfio.h parses on macOS without modification.  An
>   alternative would be to move an approximation of vfio.h into
>   standard-headers/, but that felt against the spirit of tracking
>   the latest upstream headers, and the standard-headers import
>   process strips ioctls which the VFIO code relies on.  I felt
>   the stub approach was the least invasive, but I'm open to
>   changing it if there's a preferred way to handle this.
> 
> [1] https://imgur.com/a/xoRS9kT
> [2] https://imgur.com/a/ui4pYF0
> 
> Scott J. Goldman (10):
>  vfio/pci: Use the write side of EventNotifier for IRQ signaling
>  accel/hvf: avoid executable mappings for RAM-device memory
>  vfio: Allow building on Darwin hosts
>  vfio: Prepare existing code for Apple VFIO backend
>  vfio: Add region_map and region_unmap callbacks to VFIODeviceIOOps
>  vfio: Add device_reset callback to VFIODeviceIOOps
>  vfio/apple: Add DriverKit dext client library
>  vfio/apple: Add IOMMU container and PCI device
>  vfio/apple: Add apple-dma-pci companion device
>  docs: Add vfio-apple documentation and MAINTAINERS entry
> 
> Kconfig.host                       |   3 +
> MAINTAINERS                        |  11 +
> accel/hvf/hvf-all.c                |  10 +-
> backends/Kconfig                   |   2 +-
> docs/specs/pci-ids.rst             |   3 +
> docs/system/device-emulation.rst   |   1 +
> docs/system/devices/vfio-apple.rst | 160 +++++
> hw/vfio-user/device.c              |  16 +-
> hw/vfio/Kconfig                    |   4 +-
> hw/vfio/ap.c                       |   4 +-
> hw/vfio/apple-device.c             | 945 +++++++++++++++++++++++++++++
> hw/vfio/apple-dext-client.c        | 681 +++++++++++++++++++++
> hw/vfio/apple-dext-client.h        | 253 ++++++++
> hw/vfio/apple-dma.c                | 540 +++++++++++++++++
> hw/vfio/apple.h                    |  74 +++
> hw/vfio/ccw.c                      |   2 +-
> hw/vfio/container-apple.c          | 241 ++++++++
> hw/vfio/device.c                   |  42 ++
> hw/vfio/meson.build                |  12 +-
> hw/vfio/migration.c                |   5 +-
> hw/vfio/pci.c                      |  50 +-
> hw/vfio/pci.h                      |   1 +
> hw/vfio/region.c                   | 108 ++--
> hw/vfio/types.h                    |   2 +
> hw/vfio/vfio-helpers.h             |   2 +-
> hw/vfio/vfio-migration-internal.h  |   4 +-
> hw/vfio/vfio-region.h              |   4 +
> include/compat/linux/ioctl.h       |   2 +
> include/compat/linux/types.h       |  26 +
> include/hw/pci/pci.h               |   1 +
> include/hw/vfio/vfio-container.h   |   1 +
> include/hw/vfio/vfio-device.h      |  40 +-
> meson.build                        |  10 +-
> util/event_notifier-posix.c        |   5 +-
> 34 files changed, 3197 insertions(+), 68 deletions(-)
> create mode 100644 docs/system/devices/vfio-apple.rst
> create mode 100644 hw/vfio/apple-device.c
> create mode 100644 hw/vfio/apple-dext-client.c
> create mode 100644 hw/vfio/apple-dext-client.h
> create mode 100644 hw/vfio/apple-dma.c
> create mode 100644 hw/vfio/apple.h
> create mode 100644 hw/vfio/container-apple.c
> create mode 100644 include/compat/linux/ioctl.h
> create mode 100644 include/compat/linux/types.h
> 
> -- 
> 2.50.1 (Apple Git-155)
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-05  8:01 ` [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Mohamed Mediouni
@ 2026-04-05  8:14   ` Mohamed Mediouni
  2026-04-05 23:20     ` Scott J. Goldman
  0 siblings, 1 reply; 25+ messages in thread
From: Mohamed Mediouni @ 2026-04-05  8:14 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x



> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
> 
>> 
>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>> 
>> This series adds VFIO PCI device passthrough support for Apple Silicon
>> Macs running macOS, using a DriverKit extension (dext) as the host
>> backend instead of the Linux VFIO kernel driver.
>> 
>> I'm sending this as an RFC because I'd like feedback before investing
>> further in upstreaming.  The code is functional.  I've tested it with
>> an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
>> gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
>> [1]), likely due to the BAR access penalty described below.  AI
>> inference workloads appear less affected.  Ollama with Qwen 3.5
>> generates around 140 tok/sec on the same setup [2].
>> 
>> How it works:
>> 
>> On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
>> for device access and DMA mapping.  On macOS, there is no equivalent
>> kernel interface.  Instead, a userspace DriverKit extension
>> (VFIOUserPCIDriver) mediates access to the physical PCI device through
>> IOKit's IOUserClient and PCIDriverKit APIs.
>> 
>> The series keeps the existing VFIOPCIDevice model and reuses QEMU's
>> passthrough infrastructure.  A few ioctl callsites are refactored into
>> io_ops callbacks, the build system is extended for Darwin, and the
>> Apple-specific backend plugs in behind those abstractions.
>> 
>> The guest sees two PCI devices: the passthrough device itself
>> (vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
>> DMA mapping device (apple-dma-pci).  On the QEMU side, an
>> AppleVFIOContainer implements the IOMMU backend, and a C client
>> library wraps the IOUserClient calls to the dext for config space,
>> BAR MMIO, interrupts, reset, and DMA.
>> 
>> DMA limitations:
>> 
>> This is the biggest platform constraint.  Unlike a typical IOMMU
>> mapping operation where the caller specifies the IOVA, the
>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>> system-assigned IOVA.  There is no way to request a specific address.
>> This means the guest's requested DMA addresses cannot be used
>> directly.  The guest kernel module must intercept DMA mapping calls
>> and forward them through the companion device to get the actual
>> hardware IOVA.
> 
> Hello,
> 
> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
> API used by Virtualization.framework but that’s a different design.
> 
> Would bounce buffering using something akin to the confidential compute path and
> a pre-defined chunk of host memory accessible from the device, and then managing
> the guest address map work? (see swiotlb).

see restricted-dma-pool

I think in this specific case that ACPI support isn’t worth it and that FDT
will be good enough.

The limitation that I can see there is that if you can’t match IOVA and GPA for that
restricted DMA pool, then you’ll need a small (and hopefully easy to merge) kernel
change.
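To make the IOVA/GPA mismatch concrete, here is a minimal C sketch of the translation a guest would need if the pool's guest-physical base and its host-assigned IOVA differ. The struct and function names are hypothetical and this is not an existing kernel interface; it only shows the constant-offset arithmetic, conceptually similar to what a device tree `dma-ranges` offset describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a restricted DMA pool window. */
struct dma_pool_window {
    uint64_t gpa_base;   /* guest-physical start of the pool */
    uint64_t iova_base;  /* device-visible (host-assigned IOVA) start */
    uint64_t size;       /* pool length in bytes */
};

/* True if [gpa, gpa + len) lies entirely inside the pool window. */
static bool pool_contains(const struct dma_pool_window *w,
                          uint64_t gpa, uint64_t len)
{
    return len <= w->size && gpa >= w->gpa_base &&
           gpa - w->gpa_base <= w->size - len;
}

/*
 * Translate a guest-physical address inside the pool to the IOVA the
 * device must use: a constant offset from the pool base.
 */
static uint64_t pool_gpa_to_iova(const struct dma_pool_window *w, uint64_t gpa)
{
    assert(gpa >= w->gpa_base && gpa - w->gpa_base < w->size);
    return w->iova_base + (gpa - w->gpa_base);
}
```

When `gpa_base == iova_base` the translation is the identity, which is the case where no guest kernel change would be needed.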

> 
> If the last part isn’t possible, something minimal to export an swiotlb window
> through the device tree, giving its IOVA there, would be good too.
> 
> And that will get rid of the need for an apple-dma-pci device.


> 
>> There are also hard platform limits: approximately 1.5 GB total
>> mapped memory and roughly 64k concurrent mappings.  Not all
>> workloads will fit within these limits, though GPU gaming and LLM
>> inference have worked in practice.
> 
> That’s not too dissimilar from the confidential compute limitations.
> 
>> 
>> BAR access has performance issues as well.  HVF does not expose
>> controls to map device memory as cacheable in the guest, creating a
>> significant performance penalty on BAR MMIO.  Uncached mappings work
>> correctly but slowly compared to what the hardware could do.
> 
> That’s not a macOS limitation and not an Apple hardware limitation, but
> it’s more fundamental to how PCIe works.
> 
> Unlike CXL, PCIe doesn’t have a coherency protocol story, and the alternative
> of uncached and doing manual software-managed flushes isn’t really tenable.
> 
>> 
>> What works:
>> - PCI config space passthrough
>> - BAR MMIO via direct-mapped device memory
>> - MSI/MSI-X interrupts via async notification from the dext
>> - Device reset (FLR with hot-reset fallback)
>> - DMA mapping for guest device drivers
>> 
> This is very interesting to see :)
> 
>> What doesn't work:
>> - Expansion ROM / VBIOS passthrough
>> - PCI BAR quirks
>> - VGA region passthrough
>> - Migration and dirty page tracking
>> - Hot-unplug
>> 
> 
> 
> 
>> Questions for reviewers:
>> 
>> 1. Is this something the VFIO maintainers would consider carrying
>>  upstream?  The refactoring patches (3-6) are benign, but the Apple
>>  backend is a new platform with real limitations.  That said, if Apple
>>  lifts some of the DART/HVF restrictions in a future macOS release, the
>>  code changes to take advantage would likely be minor.  I'd like to
>>  understand whether this is in scope before doing the work to
>>  address review feedback on the full series.
>> 
>> 2. The apple-dma-pci companion device: should this be a virtio device
>>  instead?  I went with a simple custom PCI device because the virtio
>>  infrastructure didn't buy much for what is essentially a {map, unmap}
>>  register interface, but if virtio is preferred, what is the process
>>  for allocating a device ID?  If a custom PCI device is the right
>>  approach, I've tentatively allocated 1b36:0015.  Is there a process
>>  for reserving a device ID under the Red Hat PCI vendor, or is
>>  claiming it in pci-ids.rst sufficient?  The guest-side kernel module
>>  hooks all DMA mapping functions for passed-through devices, which is
>>  unusual enough that I'm not sure it's upstreamable in the Linux
>>  kernel.  I can maintain it out of tree if needed.
> 
> I’d recommend using bounce buffers like the CoCo case if possible. I don’t
> think that the apple-dma-pci definitely-not-an-IOMMU is a good idea.
> 
>> 
>> 3. Should the macOS host-side DriverKit extension live in the QEMU
>>  tree?  It's not included in this series and requires Apple code
>>  signing.  I'm happy to keep it out of tree if that's preferred,
>>  or include the source if reviewers want it co-located.
> 
> Both are fine I think. Could you look into compatibility with the tinygrad
> one at https://github.com/tinygrad/tinygrad/tree/7e54992bf600789dbe5d37b99fe12a19c32e36a1/extra/usbgpu/tbgpu/installer (prebuilt at https://raw.githubusercontent.com/tinygrad/tinygpu_releases/refs/heads/main/TinyGPU.zip)?
> 
>> 
>> 4. The existing VFIO code includes <linux/vfio.h> from the
>>  linux-headers/ tree, which is intended to track upstream Linux
>>  UAPI headers.  To make this compile on macOS, I added minimal
>>  stub headers (include/compat/linux/types.h and linux/ioctl.h)
>>  so the existing vfio.h parses on macOS without modification.  An
>>  alternative would be to move an approximation of vfio.h into
>>  standard-headers/, but that felt against the spirit of tracking
>>  the latest upstream headers, and the standard-headers import
>>  process strips ioctls which the VFIO code relies on.  I felt
>>  the stub approach was the least invasive, but I'm open to
>>  changing it if there's a preferred way to handle this.
>> 
>> [1] https://imgur.com/a/xoRS9kT
>> [2] https://imgur.com/a/ui4pYF0





* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-05  8:14   ` Mohamed Mediouni
@ 2026-04-05 23:20     ` Scott J. Goldman
  2026-04-06  0:16       ` Mohamed Mediouni
  0 siblings, 1 reply; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05 23:20 UTC (permalink / raw)
  To: Mohamed Mediouni, Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x

On Sun Apr 5, 2026 at 1:14 AM PDT, Mohamed Mediouni wrote:
>
>
>> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>> 
>>> 
>>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>> 
>>> This series adds VFIO PCI device passthrough support for Apple Silicon
>>> Macs running macOS, using a DriverKit extension (dext) as the host
>>> backend instead of the Linux VFIO kernel driver.
>>> 
>>> I'm sending this as an RFC because I'd like feedback before investing
>>> further in upstreaming.  The code is functional.  I've tested it with
>>> an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
>>> gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
>>> [1]), likely due to the BAR access penalty described below.  AI
>>> inference workloads appear less affected.  Ollama with Qwen 3.5
>>> generates around 140 tok/sec on the same setup [2].
>>> 
>>> How it works:
>>> 
>>> On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
>>> for device access and DMA mapping.  On macOS, there is no equivalent
>>> kernel interface.  Instead, a userspace DriverKit extension
>>> (VFIOUserPCIDriver) mediates access to the physical PCI device through
>>> IOKit's IOUserClient and PCIDriverKit APIs.
>>> 
>>> The series keeps the existing VFIOPCIDevice model and reuses QEMU's
>>> passthrough infrastructure.  A few ioctl callsites are refactored into
>>> io_ops callbacks, the build system is extended for Darwin, and the
>>> Apple-specific backend plugs in behind those abstractions.
>>> 
>>> The guest sees two PCI devices: the passthrough device itself
>>> (vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
>>> DMA mapping device (apple-dma-pci).  On the QEMU side, an
>>> AppleVFIOContainer implements the IOMMU backend, and a C client
>>> library wraps the IOUserClient calls to the dext for config space,
>>> BAR MMIO, interrupts, reset, and DMA.
>>> 
>>> DMA limitations:
>>> 
>>> This is the biggest platform constraint.  Unlike a typical IOMMU
>>> mapping operation where the caller specifies the IOVA, the
>>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>>> system-assigned IOVA.  There is no way to request a specific address.
>>> This means the guest's requested DMA addresses cannot be used
>>> directly.  The guest kernel module must intercept DMA mapping calls
>>> and forward them through the companion device to get the actual
>>> hardware IOVA.
>> 
>> Hello,
>> 
>> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
>> API used by Virtualization.framework but that’s a different design.

This is really interesting and I had not heard about this. Are you
able to elaborate on this one at all? Maybe this is something where an
internal API to manipulate the DART is available inside
Virtualization.framework?

>> Would bounce buffering using something akin to the confidential compute path and
>> a pre-defined chunk of host memory accessible from the device, and then managing
>> the guest address map work? (see swiotlb).

I tested this approach early on, but ran into a couple issues:

1. Not only does PrepareForDMA() limit the total size of the pool, but
   it also limits the size of individual allocations. IIRC it's not very
   large, around 16MB. Thankfully, I found that the allocator seemed
   to just keep allocating contiguously across multiple allocations, so
   maybe that's fine?
2. Linux swiotlb default configuration is too small for GPU drivers. The
   max single mapping is 256KB and the total pool size is 64MB. The
   overall pool size is configurable, but the max single mapping is
   derived from IO_TLB_SEGSIZE and IO_TLB_SIZE, which are compile-time
   constants. During games, I have seen roughly 900MB of active DMA
   mappings and mappings much larger than 256KB.
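The 256KB ceiling in point 2 is plain arithmetic on the swiotlb constants. The C sketch below mirrors the values from include/linux/swiotlb.h in recent kernels (an assumption: check the kernel you actually build, and note that newer kernels also subtract a small alignment amount in swiotlb_max_mapping_size()). The helper functions are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from include/linux/swiotlb.h (assumed, version-dependent). */
#define IO_TLB_SEGSIZE      128                    /* max contiguous slabs per mapping */
#define IO_TLB_SHIFT        11
#define IO_TLB_SIZE         (1UL << IO_TLB_SHIFT)  /* 2 KiB per slab */
#define IO_TLB_DEFAULT_SIZE (64UL << 20)           /* 64 MiB default pool */

static_assert(IO_TLB_SIZE == 2048, "slab size is 2 KiB");

/* Largest single mapping that can be bounced: 128 * 2 KiB = 256 KiB. */
static uint64_t swiotlb_max_mapping(void)
{
    return (uint64_t)IO_TLB_SEGSIZE * IO_TLB_SIZE;
}

/* Number of slabs a mapping of len bytes occupies (rounded up). */
static uint64_t swiotlb_slabs_needed(uint64_t len)
{
    return (len + IO_TLB_SIZE - 1) >> IO_TLB_SHIFT;
}

/* Can a single mapping of len bytes be bounced without changing constants? */
static int swiotlb_fits_single(uint64_t len)
{
    return swiotlb_slabs_needed(len) <= IO_TLB_SEGSIZE;
}
```

With these values the default 64MB pool holds 32768 slabs, and ~900MB of active mappings is roughly 14 times the default pool size.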

I abandoned this approach because it seemed like the CPU penalty of
bouncing all the DMA buffers would be pretty severe and the swiotlb
allocator just didn't seem designed for this much memory pressure. I
also was hoping to avoid the requirement of recompiling the entire guest
kernel as a prerequisite for guests to use this passthrough feature. On
top of that, I wasn't sure if upstream would even be willing to take
changes to support this use case, since it's so far outside what the
existing swiotlb allocator would normally be doing.
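The CPU cost of bouncing comes from the copy on each map and unmap. A toy C model of bounce buffering looks roughly like this. It assumes a single device-visible pool and a bump allocator (real swiotlb tracks and reuses slots), and `bounce_map()`/`bounce_unmap()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

struct bounce_pool {
    uint8_t *base;       /* device-visible memory (stand-in) */
    size_t size;
    size_t next;         /* bump-allocation cursor, never rewound here */
    size_t bytes_copied; /* CPU cost: bytes memcpy'd so far */
};

/* Map: reserve space in the pool and, for device reads, copy the data in. */
static uint8_t *bounce_map(struct bounce_pool *p, const void *buf, size_t len,
                           enum dma_dir dir)
{
    if (len > p->size - p->next) {
        return NULL;                 /* pool exhausted */
    }
    uint8_t *slot = p->base + p->next;
    p->next += len;
    if (dir != DMA_FROM_DEVICE) {
        memcpy(slot, buf, len);
        p->bytes_copied += len;
    }
    return slot;
}

/* Unmap: for device writes, copy the result back to the caller's buffer. */
static void bounce_unmap(struct bounce_pool *p, uint8_t *slot, void *buf,
                         size_t len, enum dma_dir dir)
{
    assert(slot >= p->base && slot + len <= p->base + p->size);
    if (dir != DMA_TO_DEVICE) {
        memcpy(buf, slot, len);
        p->bytes_copied += len;
    }
}
```

Every byte transferred is copied once per direction, so `bytes_copied` grows with total DMA traffic rather than with the number of mappings.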

That said, you were saying that CoCo is fine with this restriction? Do
other devices just not have drivers that do this much allocation? I
didn't actually try changing the swiotlb constants and recompiling the
guest kernel to make the pool big enough to really work with the NVIDIA
guest driver; I will have to see what happens.

>
> see restricted-dma-pool
>
> I think in this specific case that ACPI support isn’t worth it and that FDT
> will be good enough.

Yes, this seems fine to me as well if we went the swiotlb route. It
could be a different `-machine` type, or perhaps a machine-specific
parameter.

>
> The limitation that I can see there is that if you can’t match IOVA and GPA for that
> restricted DMA pool, then you’ll need a small (and hopefully easy to merge) kernel
> change.
>> If the last part isn’t possible, something minimal to export an swiotlb window
>> through the device tree, giving its IOVA there, would be good too.
>> 
>> And that will get rid of the need for an apple-dma-pci device.

I am not 100% sure since I didn't try this exactly, but it seems like
you could have the DriverKit side allocate a big DMA buffer before the
guest starts, and then identity map the region somewhere inside the
guest with the `restricted-dma-pool` attribute attached to it. The
caveat being that you might have to pray that the region is contiguous
or introduce a much more complicated swiotlb subsystem allocator.
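The contiguity caveat, combined with the ~16MB per-allocation limit from point 1, could be checked at pool setup time: allocate in chunks, then verify that the returned IOVAs form one window before exposing it as a restricted-dma-pool. A minimal C sketch follows; `struct dma_segment` and `coalesce_segments()` are hypothetical, and how the segments are obtained from PrepareForDMA() is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One IOVA range returned by a host-side DMA preparation call (hypothetical). */
struct dma_segment {
    uint64_t iova;
    uint64_t len;
};

/*
 * Check whether segments, in allocation order, form one contiguous IOVA
 * window. On success, store the combined window in *out and return 1;
 * otherwise return 0.
 */
static int coalesce_segments(const struct dma_segment *segs, size_t n,
                             struct dma_segment *out)
{
    assert(out != NULL);
    if (n == 0) {
        return 0;
    }
    uint64_t start = segs[0].iova;
    uint64_t end = segs[0].iova + segs[0].len;
    for (size_t i = 1; i < n; i++) {
        if (segs[i].iova != end) {
            return 0;   /* gap or out-of-order segment: not one window */
        }
        end += segs[i].len;
    }
    out->iova = start;
    out->len = end - start;
    return 1;
}
```

If the check fails, the pool cannot be described to the guest as a single identity-mapped window.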

WRT a kernel patch to make it easier, can you elaborate on what you were thinking there?

>>> There are also hard platform limits: approximately 1.5 GB total
>>> mapped memory and roughly 64k concurrent mappings.  Not all
>>> workloads will fit within these limits, though GPU gaming and LLM
>>> inference have worked in practice.
>> 
>> That’s not too dissimilar from the confidential compute limitations.
>> 
>>> 
>>> BAR access has performance issues as well.  HVF does not expose
>>> controls to map device memory as cacheable in the guest, creating a
>>> significant performance penalty on BAR MMIO.  Uncached mappings work
>>> correctly but slowly compared to what the hardware could do.
>> 
>> That’s not a macOS limitation and not an Apple hardware limitation, but
>> it’s more fundamental to how PCIe works.
>> 
>> Unlike CXL, PCIe doesn’t have a coherency protocol story, and the alternative
>> of uncached and doing manual software-managed flushes isn’t really tenable.

Apologies, I misspoke. It's not cacheability that's the issue; I think
it's write-combining. Specifically, the question is how HVF sets the
attributes in the stage-2 page tables. The behavior is observable by
looking at the performance of sweeping writes across the BARs.

As part of the work to implement and test this change I wrote such a
benchmark as a client of the dext in the host, and a Linux kernel module
that runs in the guest. It takes BAR1 (VRAM aperture) and does a write
sweep of 8MB with 4 passes and measures the results.
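A write sweep like the one described could look roughly like the C sketch below. It is not the benchmark from the series; it only shows the measurement loop, with `bar` standing in for however BAR1 is mapped (kIOWriteCombineCache or kIOInhibitCache on the host, ioremap_wc() or ioremap() in the guest). Timing here uses POSIX clock_gettime(); a guest kernel module would use kernel timing helpers instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Sweep 8-byte writes across `bytes` of the mapping `passes` times and
 * return throughput in MiB/s. volatile keeps the compiler from eliding
 * or merging the stores.
 */
static double write_sweep_mib_per_s(volatile uint64_t *bar, size_t bytes,
                                    int passes)
{
    assert(bar != NULL && passes > 0);
    struct timespec t0, t1;
    size_t words = bytes / sizeof(uint64_t);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < words; i++) {
            bar[i] = (uint64_t)i ^ (uint64_t)p;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    double mib = (double)bytes * passes / (1024.0 * 1024.0);
    return secs > 0 ? mib / secs : 0.0;
}
```

For the setup described, the call would be roughly `write_sweep_mib_per_s(bar1, 8u << 20, 4)`.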

Host  (mapped with kIOWriteCombineCache): 386 MB/s
Host  (mapped with kIOInhibitCache):       46 MB/s

Guest (mapped with ioremap_wc):            31 MB/s
Guest (mapped with ioremap):               31 MB/s

In the case of BAR1, it is marked prefetchable, so I believe you would
usually want to map it with write-combining. I'm not sure why the case
without write-combining is worse in the guest, but it's the same order
of magnitude. I think the really interesting thing there is that the
write-combining map in the guest performs identically to the one
without. To me, that indicates that perhaps the stage-2 bits are not set
properly. Even though the host has mapped the memory with
kIOWriteCombineCache, this wasn't propagated when HVF mapped it into the
guest, so the effective attributes probably fall back to the more
restrictive of the stage-1 and stage-2 mappings (i.e. write-combining is
disabled).
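The stage-1/stage-2 combination can be sketched as a small C model. It assumes the simplified Armv8 ordering below and ignores FEAT_S2FWB and separate inner/outer attributes. The stage-2 type HVF actually uses is not known here; Device-nGnRE in the usage note is an assumption that happens to match the measurements.

```c
#include <assert.h>

/*
 * Simplified Armv8 memory types, ordered from most to least restrictive.
 * Without FEAT_S2FWB, the effective type of a guest access is the more
 * restrictive of the stage-1 (guest) and stage-2 (hypervisor) attributes.
 */
enum mem_type {
    DEVICE_nGnRnE,
    DEVICE_nGnRE,   /* what Linux arm64 ioremap() uses */
    DEVICE_nGRE,
    DEVICE_GRE,
    NORMAL_NC,      /* what Linux arm64 ioremap_wc() uses (write-combining) */
    NORMAL_WT,
    NORMAL_WB,
};

static_assert(DEVICE_nGnRnE < NORMAL_WB, "most restrictive type sorts first");

/* Lower enum value = more restrictive, so the combination is the minimum. */
static enum mem_type combine_stages(enum mem_type s1, enum mem_type s2)
{
    return s1 < s2 ? s1 : s2;
}
```

Under that assumption, `combine_stages(NORMAL_NC, DEVICE_nGnRE)` and `combine_stages(DEVICE_nGnRE, DEVICE_nGnRE)` both yield `DEVICE_nGnRE`, which would explain why ioremap_wc() and ioremap() perform identically in the guest.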

>> 
>>> 
>>> What works:
>>> - PCI config space passthrough
>>> - BAR MMIO via direct-mapped device memory
>>> - MSI/MSI-X interrupts via async notification from the dext
>>> - Device reset (FLR with hot-reset fallback)
>>> - DMA mapping for guest device drivers
>>> 
>> This is very interesting to see :)

Thanks! It's always nice to catch some interest/advice for a strange
project like this.

>> 
>>> What doesn't work:
>>> - Expansion ROM / VBIOS passthrough
>>> - PCI BAR quirks
>>> - VGA region passthrough
>>> - Migration and dirty page tracking
>>> - Hot-unplug
>>> 
>>
>> 
>> 
>>> Questions for reviewers:
>>> 
>>> 1. Is this something the VFIO maintainers would consider carrying
>>>  upstream?  The refactoring patches (3-6) are benign, but the Apple
>>>  backend is a new platform with real limitations.  That said, if Apple
>>>  lifts some of the DART/HVF restrictions in a future macOS release, the
>>>  code changes to take advantage would likely be minor.  I'd like to
>>>  understand whether this is in scope before doing the work to
>>>  address review feedback on the full series.
>>> 
>>> 2. The apple-dma-pci companion device: should this be a virtio device
>>>  instead?  I went with a simple custom PCI device because the virtio
>>>  infrastructure didn't buy much for what is essentially a {map, unmap}
>>>  register interface, but if virtio is preferred, what is the process
>>>  for allocating a device ID?  If a custom PCI device is the right
>>>  approach, I've tentatively allocated 1b36:0015.  Is there a process
>>>  for reserving a device ID under the Red Hat PCI vendor, or is
>>>  claiming it in pci-ids.rst sufficient?  The guest-side kernel module
>>>  hooks all DMA mapping functions for passed-through devices, which is
>>>  unusual enough that I'm not sure it's upstreamable in the Linux
>>>  kernel.  I can maintain it out of tree if needed.
>> 
>> I’d recommend using bounce buffers like the CoCo case if possible. I don’t
>> think that the apple-dma-pci definitely-not-an-IOMMU is a good idea.

To be clear, it definitely is weird and bad, but it was seemingly the
least bad option that I was able to get working with minimal guest
changes (just one guest kmod).

>> 
>>> 
>>> 3. Should the macOS host-side DriverKit extension live in the QEMU
>>>  tree?  It's not included in this series and requires Apple code
>>>  signing.  I'm happy to keep it out of tree if that's preferred,
>>>  or include the source if reviewers want it co-located.
>> 
>> Both are fine I think. Could you look into compatibility with the tinygrad
>> one at https://github.com/tinygrad/tinygrad/tree/7e54992bf600789dbe5d37b99fe12a19c32e36a1/extra/usbgpu/tbgpu/installer (prebuilt at https://raw.githubusercontent.com/tinygrad/tinygpu_releases/refs/heads/main/TinyGPU.zip)?

This is a good question and not something I had considered. My module
probably works a little differently from theirs. It's possible I'm
wrong, but my understanding was:

1. They got Apple entitlements for AMD/NVIDIA driver vendor IDs only.
   That said, if it became compatible with QEMU, I suppose it would be
   an easy case to make that it could be expanded to a wildcard (another
   developer indicated to me that Apple was willing to grant the
   wildcard entitlement if the use case was justifiable).
2. The architecture of their driver is a little different. I believe
   they are allocating DMA-able memory in the driver and mapping it down
   to userland, so it's kind of the reverse of what I'm doing now. I
   guess they could conceivably change how they are doing this to unify
   our efforts.

Thanks,
-sjg


* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-05 23:20     ` Scott J. Goldman
@ 2026-04-06  0:16       ` Mohamed Mediouni
  2026-04-08  7:02         ` Scott J. Goldman
  0 siblings, 1 reply; 25+ messages in thread
From: Mohamed Mediouni @ 2026-04-06  0:16 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x



> On 6. Apr 2026, at 01:20, Scott J. Goldman <scottjgo@gmail.com> wrote:
> 
> On Sun Apr 5, 2026 at 1:14 AM PDT, Mohamed Mediouni wrote:
>> 
>> 
>>> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>>> 
>>>> 
>>>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>> 
>>>> This series adds VFIO PCI device passthrough support for Apple Silicon
>>>> Macs running macOS, using a DriverKit extension (dext) as the host
>>>> backend instead of the Linux VFIO kernel driver.
>>>> 
>>>> I'm sending this as an RFC because I'd like feedback before investing
>>>> further in upstreaming.  The code is functional.  I've tested it with
>>>> an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
>>>> gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
>>>> [1]), likely due to the BAR access penalty described below.  AI
>>>> inference workloads appear less affected.  Ollama with Qwen 3.5
>>>> generates around 140 tok/sec on the same setup [2].
>>>> 
>>>> How it works:
>>>> 
>>>> On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
>>>> for device access and DMA mapping.  On macOS, there is no equivalent
>>>> kernel interface.  Instead, a userspace DriverKit extension
>>>> (VFIOUserPCIDriver) mediates access to the physical PCI device through
>>>> IOKit's IOUserClient and PCIDriverKit APIs.
>>>> 
>>>> The series keeps the existing VFIOPCIDevice model and reuses QEMU's
>>>> passthrough infrastructure.  A few ioctl callsites are refactored into
>>>> io_ops callbacks, the build system is extended for Darwin, and the
>>>> Apple-specific backend plugs in behind those abstractions.
>>>> 
>>>> The guest sees two PCI devices: the passthrough device itself
>>>> (vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
>>>> DMA mapping device (apple-dma-pci).  On the QEMU side, an
>>>> AppleVFIOContainer implements the IOMMU backend, and a C client
>>>> library wraps the IOUserClient calls to the dext for config space,
>>>> BAR MMIO, interrupts, reset, and DMA.
>>>> 
>>>> DMA limitations:
>>>> 
>>>> This is the biggest platform constraint.  Unlike a typical IOMMU
>>>> mapping operation where the caller specifies the IOVA, the
>>>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>>>> system-assigned IOVA.  There is no way to request a specific address.
>>>> This means the guest's requested DMA addresses cannot be used
>>>> directly.  The guest kernel module must intercept DMA mapping calls
>>>> and forward them through the companion device to get the actual
>>>> hardware IOVA.
>>> 
>>> Hello,
>>> 
>>> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
>>> API used by Virtualization.framework but that’s a different design.
> 
> This is really interesting and I had not heard about this. Are you
> able to elaborate on this one at all? Maybe this is something where an
> internal API to manipulate the DART is available inside
> Virtualization.framework?

Hello,

All of it currently requires private entitlements.

It’s _VZPCIPassthroughDeviceConfiguration, a private class that requires the com.apple.private.virtualization entitlement.

The VMM process itself then uses the com.apple.private.PCIPassthrough.access entitlement. I’m not
sure whether current OS versions even ship all of the code, though.

>>> Would bounce buffering using something akin to the confidential compute path and
>>> a pre-defined chunk of host memory accessible from the device, and then managing
>>> the guest address map work? (see swiotlb).
> 
> I tested this approach early on, but ran into a couple issues:
> 
> 1. Not only does PrepareForDMA() limit the total size of the pool, but
>   it also limits the size of individual allocations. IIRC it's not very
>   large, around 16MB.
Sigh.

> Thankfully, I found that the allocator seemed
>   to just keep allocating contiguously across multiple allocations, so
>   maybe that's fine?
That’s good… but it sounds brittle…

> 2. Linux swiotlb default configuration is too small for GPU drivers. The
>   max single mapping is 256KB and the total pool size is 64MB. The
>   overall pool size is configurable but the max single mapping is
>   derived from IO_TLB_SEGSIZE and IO_TLB_SIZE which are compile-time
>   constants. During games, I have seen roughly ~900MB of active DMA
>   mappings and mappings much larger than 256KB.

Pre-defined mappings with restricted-dma-pool sound like a good idea there.
> 
> I abandoned this approach because it seemed like the CPU penalty of
> bouncing all the DMA buffers would be pretty severe and the swiotlb
> allocator just didn't seem designed for this much memory pressure. I
> also was hoping to avoid the requirement of recompiling the entire guest
> kernel as a prerequisite for guests to use this passthrough feature. On
> top of that, I wasn't sure if upstream would even be willing to take
> changes to support this use case, since it's so far outside what the
> existing swiotlb allocator would normally be doing.
> 
> That said, you were saying that CoCo is fine with this restriction? Do
> other devices just not have drivers that are doing so much allocation? I
> didn't actually try changing the constants and recompiling the guest
> kernel in swiotlb to make the pool big enough for it to really work at
> all with the nvidia guest driver, I will have to see what happens.

CoCo with bounce buffering works with NVIDIA GPUs. It had to be done because
there is no trusted I/O path (and implementing one is a quagmire).

A recent Intel post about it claiming production-readiness:

https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Confidential-AI-with-GPU-Acceleration-Bounce-Buffers-Offer-a/post/1740417
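For readers less familiar with the mechanism, the toy model below shows what bounce buffering does for a device-bound (DMA_TO_DEVICE) streaming mapping: the CPU buffer is copied into a device-reachable pool when it is mapped, so anything the driver writes afterwards stays invisible to the device until it syncs. It is a simplified sketch with hypothetical names (bounce_pool, bounce_map_to_device, bounce_sync_for_device) and a fixed 4KB pool with a bump allocator, not the kernel's swiotlb code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy bounce pool: memory the device can reach (hypothetical model,
 * not the Linux swiotlb implementation). */
#define POOL_SIZE 4096

struct bounce_pool {
    unsigned char mem[POOL_SIZE]; /* device-visible memory */
    size_t used;                  /* simple bump allocator */
};

/* Map a CPU buffer for a device read: reserve a slot in the pool and copy
 * the CPU data into it at map time. Returns the slot offset (the "DMA
 * address" handed to the device), or (size_t)-1 if the pool is full. */
static size_t bounce_map_to_device(struct bounce_pool *p,
                                   const void *cpu_buf, size_t len)
{
    if (len > POOL_SIZE - p->used)
        return (size_t)-1;
    size_t off = p->used;
    memcpy(p->mem + off, cpu_buf, len);
    p->used += len;
    return off;
}

/* Re-copy the CPU data into its slot: the analogue of
 * dma_sync_single_for_device() for a bounced mapping. */
static void bounce_sync_for_device(struct bounce_pool *p, size_t off,
                                   const void *cpu_buf, size_t len)
{
    assert(off + len <= p->used);
    memcpy(p->mem + off, cpu_buf, len);
}
```

A driver that fills a buffer after mapping it has to either allocate it with dma_alloc_coherent() or call dma_sync_single_for_device() before the device reads it. Drivers that skip this step happen to work on coherent, non-bounced setups and break once swiotlb is in the path.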

> 
>> 
>> see restricted-dma-pool
>> 
>> I think in this specific case that ACPI support isn’t worth it and that FDT
>> will be good enough.
> 
> Yes, this seems fine to me as well if we went the swiotlb route. It
> could be a different `-machine` type or perhaps a machine-specific
> parameter.
> 
>> 
>> The limitation that I can see there is that if you can’t match IOVA and GPA for that
>> restricted DMA pool, then you’ll need a small (and hopefully easy to merge) kernel
>> change.
>>> If the last part isn’t possible, something minimal to export a swiotlb window
>>> through the device tree, giving the IOVA there, would be good too.
>>> 
>>> And that will get rid of the need for an apple-dma-pci device.
> 
> I am not 100% sure since I didn't try this exactly, but it seems like
> you could have the DriverKit side allocate a big DMA buffer before the
> guest starts, and then identity map the region somewhere inside the
> guest with the `restricted-dma-pool` attribute attached to it. The
> caveat being that you might have to pray that the region is contiguous
> or introduce a much more complicated swiotlb subsystem allocator.
> 
> WRT a kernel patch to make it easier, can you elaborate on what you were thinking there?

Using restricted-dma-pool, with changes to be able to specify the IOVA base if necessary.
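A minimal sketch of what specifying an IOVA base would mean for the guest, assuming the pool appears at one guest-physical address while the device reaches the same bytes at a different, PCIDriverKit-assigned IOVA. The struct and function names and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a restricted DMA pool whose device-visible
 * IOVA base differs from its guest-physical base. */
struct dma_pool_window {
    uint64_t gpa_base;  /* where the pool appears in guest physical memory */
    uint64_t iova_base; /* where the device sees the same bytes */
    uint64_t size;      /* pool size in bytes */
};

/* Translate a guest-physical address inside the pool to the IOVA the
 * device must be programmed with. Returns false if gpa is outside the pool. */
static bool pool_gpa_to_iova(const struct dma_pool_window *w,
                             uint64_t gpa, uint64_t *iova)
{
    if (gpa < w->gpa_base || gpa - w->gpa_base >= w->size)
        return false;
    *iova = w->iova_base + (gpa - w->gpa_base);
    return true;
}
```

When the pool can be identity-mapped, iova_base equals gpa_base and the translation is a no-op. That is the case that would need no kernel change.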
> 
>>>> There are also hard platform limits: approximately 1.5 GB total
>>>> mapped memory and roughly 64k concurrent mappings.  Not all
>>>> workloads will fit within these limits, though GPU gaming and LLM
>>>> inference have worked in practice.
>>> 
>>> That’s not too dissimilar from the confidential compute limitations.
>>> 
>>>> 
>>>> BAR access has performance issues as well.  HVF does not expose
>>>> controls to map device memory as cacheable in the guest, creating a
>>>> significant performance penalty on BAR MMIO.  Uncached mappings work
>>>> correctly but slowly compared to what the hardware could do.
>>> 
>>> That’s not a macOS limitation and not an Apple hardware limitation, but
>>> it’s more fundamental to how PCIe works.
>>> 
>>> Unlike CXL, PCIe doesn’t have a coherency protocol story, and the alternative
>>> of uncached and doing manual software-managed flushes isn’t really tenable.
> 
> Apologies, I misspoke. It's not cacheability that's the issue. I think
> it's write-combining. Specifically, the question is how HVF sets the 
> attributes in the stage-2 page tables. The behavior is observable by
> looking at the performance of sweeping writes across the BARs.
> 

Hello,

Oh that makes a lot more sense.

There are also other oddities going on there. For device memory, macOS does this thing:
https://github.com/apple-oss-distributions/xnu/blob/main/osfmk/arm64/sleh.c#L1756C1-L1756C33

This function is empty in open-source XNU, but it’s very much *not* empty in closed-source XNU (sigh).

> As part of the work to implement and test this change I wrote such a
> benchmark as a client of the dext in the host, and a Linux kernel module
> that runs in the guest. It takes BAR1 (VRAM aperture) and does a write
> sweep of 8MB with 4 passes and measures the results.
> 
> Host (mapped with kIOWriteCombineCache): 386 MB/s
> Host (mapped with kIOInhibitCache): 46 MB/s
> 
> Guest (mapped with ioremap_wc): 31 MB/s
> Guest (mapped with ioremap): 31 MB/s
> 
> In the case of BAR1, it is marked prefetchable so I believe you would
> usually want to map it with write-combining. I'm not sure why the case
> without write-combining is worse in the guest, but it's the same order
> of magnitude. I think the real interesting thing there is that the
> write-combining map in the guest performs identically to the one 
> without. To me, that indicates that perhaps the stage-2 bits are not set
> properly.

That indeed looks like that...
> Even though the host has mapped the memory with
> kIOWriteCombineCache, this wasn't propagated when HVF mapped it into the
> guest, which probably falls back to the lesser of the stage-1 vs stage-2
> mappings (i.e. disabling write-combining). 
> 
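The write-sweep measurement described above can be approximated in userspace with a sketch like the following. It assumes `dst` already points at a mapping of the BAR (obtaining that mapping, via the dext on the host or ioremap_wc() in a guest kernel module, is out of scope here). The function names are hypothetical, the 8MB/4-pass parameters come from the description, and the result is reported in MiB/s. Volatile stores keep the compiler from eliding the writes. Whether the CPU actually combines them still depends on the memory attributes of the mapping, which is exactly what the benchmark is probing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Throughput in MiB/s for `bytes` written in `seconds`. */
static double throughput_mib_per_s(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Sweep `len` bytes of `dst` with 64-bit stores, `passes` times, and
 * return the measured throughput in MiB/s (0.0 if the clock did not advance). */
static double write_sweep(volatile uint64_t *dst, size_t len, int passes)
{
    struct timespec t0, t1;
    size_t words = len / sizeof(uint64_t);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            dst[i] = (uint64_t)i ^ (uint64_t)p;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return 0.0;
    return throughput_mib_per_s((uint64_t)words * sizeof(uint64_t) *
                                (uint64_t)passes, secs);
}
```

Running the same loop against a write-combined and an uncached mapping of the same BAR should show the gap seen on the host. Identical numbers in the guest point at the stage-2 attributes rather than at the loop.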
> 
>>> 
>>>> 
>>>> What works:
>>>> - PCI config space passthrough
>>>> - BAR MMIO via direct-mapped device memory
>>>> - MSI/MSI-X interrupts via async notification from the dext
>>>> - Device reset (FLR with hot-reset fallback)
>>>> - DMA mapping for guest device drivers
>>>> 
>>> This is very interesting to see :)
> 
> Thanks! It's always nice to catch some interest/advice for a strange
> project like this.
> 
>>> 
>>>> What doesn't work:
>>>> - Expansion ROM / VBIOS passthrough
>>>> - PCI BAR quirks
>>>> - VGA region passthrough
>>>> - Migration and dirty page tracking
>>>> - Hot-unplug
>>>> 
>>> 
>>> 
>>> 
>>>> Questions for reviewers:
>>>> 
>>>> 1. Is this something the VFIO maintainers would consider carrying
>>>> upstream?  The refactoring patches (3-6) are benign, but the Apple
>>>> backend is a new platform with real limitations.  That said, if Apple
>>>> lifts some of the DART/HVF restrictions in a future macOS release, the
>>>> code changes to take advantage would likely be minor.  I'd like to
>>>> understand whether this is in scope before doing the work to
>>>> address review feedback on the full series.
>>>> 
>>>> 2. The apple-dma-pci companion device: should this be a virtio device
>>>> instead?  I went with a simple custom PCI device because the virtio
>>>> infrastructure didn't buy much for what is essentially a {map, unmap}
>>>> register interface, but if virtio is preferred, what is the process
>>>> for allocating a device ID?  If a custom PCI device is the right
>>>> approach, I've tentatively allocated 1b36:0015.  Is there a process
>>>> for reserving a device ID under the Red Hat PCI vendor, or is
>>>> claiming it in pci-ids.rst sufficient?  The guest-side kernel module
>>>> hooks all DMA mapping functions for passed-through devices, which is
>>>> unusual enough that I'm not sure it's upstreamable in the Linux
>>>> kernel.  I can maintain it out of tree if needed.
>>> 
>>> I’d recommend using bounce buffers like the CoCo case if possible. I don’t
>>> think that the apple-dma-pci definitely-not-an-IOMMU is a good idea.
> 
> To be clear, it definitely is weird and bad, but it was seemingly the
> least bad option that I was able to get working with minimal guest
> changes (just one guest kmod).
> 
>>> 
>>>> 
>>>> 3. Should the macOS host-side DriverKit extension live in the QEMU
>>>> tree?  It's not included in this series and requires Apple code
>>>> signing.  I'm happy to keep it out of tree if that's preferred,
>>>> or include the source if reviewers want it co-located.
>>> 
>>> Both are fine I think. Could you share compatibility with the tinygrad
>>> one at https://github.com/tinygrad/tinygrad/tree/7e54992bf600789dbe5d37b99fe12a19c32e36a1/extra/usbgpu/tbgpu/installer and prebuilt at https://raw.githubusercontent.com/tinygrad/tinygpu_releases/refs/heads/main/TinyGPU.zip?
> 
> This is a good question and not something I had considered. My module
> probably works a little differently from theirs. It's possible I'm
> wrong, but my understanding was:
> 
> 1. They got Apple entitlements for the AMD/NVIDIA vendor IDs only.
>   That said, if it became compatible with QEMU, I suppose it would be
>   an easy case to make that it could be expanded to a wildcard (another
>   developer indicated to me that Apple was willing to grant the
>   wildcard entitlement if the use case was justifiable).
> 2. The architecture of their driver is a little different. I believe
>   they are allocating DMA-able memory in the driver and mapping it down
>   to userland, so it's kind of the reverse of what I'm doing now. I
>   guess, conceivably they could change how they are doing this to unify
>   our efforts.
> 
> Thanks,
> -sjg
> 




* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-06  0:16       ` Mohamed Mediouni
@ 2026-04-08  7:02         ` Scott J. Goldman
  2026-04-08  8:33           ` Mohamed Mediouni
  2026-04-08 19:09           ` Mohamed Mediouni
  0 siblings, 2 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-08  7:02 UTC (permalink / raw)
  To: Mohamed Mediouni, Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x

On Sun Apr 5, 2026 at 5:16 PM PDT, Mohamed Mediouni wrote:
>
>
>> On 6. Apr 2026, at 01:20, Scott J. Goldman <scottjgo@gmail.com> wrote:
>> 
>> On Sun Apr 5, 2026 at 1:14 AM PDT, Mohamed Mediouni wrote:
>>> 
>>> 
>>>> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>>>> 
>>>>> 
>>>>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>>> 
>>>>> This series adds VFIO PCI device passthrough support for Apple Silicon
>>>>> Macs running macOS, using a DriverKit extension (dext) as the host
>>>>> backend instead of the Linux VFIO kernel driver.
>>>>> 
>>>>> I'm sending this as an RFC because I'd like feedback before investing
>>>>> further in upstreaming.  The code is functional.  I've tested it with
>>>>> an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
>>>>> gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
>>>>> [1]), likely due to the BAR access penalty described below.  AI
>>>>> inference workloads appear less affected.  Ollama with Qwen 3.5
>>>>> generates around 140 tok/sec on the same setup [2].
>>>>> 
>>>>> How it works:
>>>>> 
>>>>> On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
>>>>> for device access and DMA mapping.  On macOS, there is no equivalent
>>>>> kernel interface.  Instead, a userspace DriverKit extension
>>>>> (VFIOUserPCIDriver) mediates access to the physical PCI device through
>>>>> IOKit's IOUserClient and PCIDriverKit APIs.
>>>>> 
>>>>> The series keeps the existing VFIOPCIDevice model and reuses QEMU's
>>>>> passthrough infrastructure.  A few ioctl callsites are refactored into
>>>>> io_ops callbacks, the build system is extended for Darwin, and the
>>>>> Apple-specific backend plugs in behind those abstractions.
>>>>> 
>>>>> The guest sees two PCI devices: the passthrough device itself
>>>>> (vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
>>>>> DMA mapping device (apple-dma-pci).  On the QEMU side, an
>>>>> AppleVFIOContainer implements the IOMMU backend, and a C client
>>>>> library wraps the IOUserClient calls to the dext for config space,
>>>>> BAR MMIO, interrupts, reset, and DMA.
>>>>> 
>>>>> DMA limitations:
>>>>> 
>>>>> This is the biggest platform constraint.  Unlike a typical IOMMU
>>>>> mapping operation where the caller specifies the IOVA, the
>>>>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>>>>> system-assigned IOVA.  There is no way to request a specific address.
>>>>> This means the guest's requested DMA addresses cannot be used
>>>>> directly.  The guest kernel module must intercept DMA mapping calls
>>>>> and forward them through the companion device to get the actual
>>>>> hardware IOVA.
>>>> 
>>>> Hello,
>>>> 
>>>> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
>>>> API used by Virtualization.framework but that's a different design.
>> 
>> This is really interesting and I had not heard about this. Are you
>> able to elaborate on this one at all? Maybe this is something where an
>> internal API to manipulate the DART is available inside
>> Virtualization.framework?
>
> Hello,
>
> All of it needs using private entitlements currently.
>
> It's _VZPCIPassthroughDeviceConfiguration, a private class needing com.apple.private.virtualization to use.
>
> The VMM process itself then uses the com.apple.private.PCIPassthrough.access entitlement. I'm not
> sure whether OS versions even have all the code currently though.
>

Appreciate the pointers here. It looks like, as you said, the framework
taps into a bunch of code that isn't shipped to us mere mortals. I can
see from some of the code in Virtualization.framework the general shape
of what they're doing, though.

It looks like they implement a virtio-iommu device that ultimately calls
into the host kernel with some internal APIs to do the DART mappings. 

>>>> Would bounce buffering using something akin to the confidential compute path and 
>>>> a pre-defined chunk of host memory accessible from the device, and then managing
>>>> the guest address map work? (see swiotlb).
>> 
>> I tested this approach early on, but ran into a couple issues:
>> 
>> 1. Not only does PrepareForDMA() limit the total size of the pool, but
>>   it also limits the size of individual allocations. IIRC it is not very
>>   large, at around 16MB.
> Sigh.
>
>> Thankfully, I found that the allocator seemed
>>   to just keep allocating contiguously across multiple allocations, so
>>   maybe that's fine?
> That's good… but it sounds brittle…
>
>> 2. Linux swiotlb default configuration is too small for GPU drivers. The
>>   max single mapping is 256KB and the total pool size is 64MB. The
>>   overall pool size is configurable but the max single mapping is
>>   derived from IO_TLB_SEGSIZE and IO_TLB_SIZE which are compile-time
>>   constants. During games, I have seen roughly ~900MB of active DMA
>>   mappings and mappings much larger than 256KB.
>
> Pre-defined mappings with restricted-dma-pool sound like a good idea there.
>> 
>> I abandoned this approach because it seemed like the CPU penalty of
>> bouncing all the DMA buffers would be pretty severe and the swiotlb
>> allocator just didn't seem designed for this much memory pressure. I
>> also was hoping to avoid the requirement of recompiling the entire guest
>> kernel as a prerequisite for guests to use this passthrough feature. On
>> top of that, I wasn't sure if upstream would even be willing to take
>> changes to support this use case, since it's so far outside what the
>> existing swiotlb allocator would normally be doing.
>> 
>> That said, you were saying that CoCo is fine with this restriction? Do
>> other devices just not have drivers that are doing so much allocation? I
>> didn't actually try changing the constants and recompiling the guest
>> kernel in swiotlb to make the pool big enough for it to really work at
>> all with the nvidia guest driver, I will have to see what happens.
>
> CoCo with bounce buffering works with NVIDIA GPUs. It had to be done because
> no trusted I/O path (and implementing that is a quagmire).
>
> A recent Intel post about it claiming production-readiness:
>
> https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Confidential-AI-with-GPU-Acceleration-Bounce-Buffers-Offer-a/post/1740417

I dug in here and implemented the restricted-dma-pool solution. Still
needs some cleanups but it's working enough to test. To start with the
bad news:

1. As I mentioned previously, the mainline kernel has a 256KB limit
for any single swiotlb mapping. This has been debated a few times on
LKML, but the general consensus has been that it should not be changed
or made configurable. You can see the threads:
  - https://lkml.org/lkml/2015/3/3/84
  - https://patchwork.kernel.org/project/linux-mips/patch/20210914151016.3174924-1-Roman_Skakun@epam.com/

2. NVIDIA drivers immediately make a contiguous 528,384-byte allocation,
at least on my hardware (NVIDIA RTX 5090), which is required as part of
initializing the firmware on the card. This obviously fails immediately.
It happens with both the NVIDIA-provided "open" drivers [1] and the in-tree
`nouveau` [2], so it's more a hardware-specific issue than just a driver
problem. If you hack around that (allocate 3 smaller buffers and hope
they are contiguous), you'll see that both drivers assume coherent DMA
memory (more so in the nvidia driver than nouveau, but it's a problem in
both): they map DMA buffers and only then write data into them. So the
card ends up receiving empty swiotlb buffers and ultimately fails to
initialize.
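The arithmetic behind the 256KB ceiling, and why a 528,384-byte mapping cannot fit, is short enough to check directly. The constants mirror IO_TLB_SHIFT (11, i.e. 2KB slots) and IO_TLB_SEGSIZE (128 contiguous slots) from include/linux/swiotlb.h in current mainline; the helper names are hypothetical, and alignment-mask adjustments the kernel applies on top are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the mainline swiotlb constants (include/linux/swiotlb.h). */
#define IO_TLB_SHIFT   11
#define IO_TLB_SIZE    (1UL << IO_TLB_SHIFT) /* 2048-byte slots */
#define IO_TLB_SEGSIZE 128UL                 /* max contiguous slots per mapping */

/* Number of 2KB slots needed for a mapping of `len` bytes. */
static size_t swiotlb_slots_needed(size_t len)
{
    return (len + IO_TLB_SIZE - 1) / IO_TLB_SIZE;
}

/* Hypothetical helper: can a single mapping of `len` bytes be bounced? */
static bool swiotlb_fits_single_mapping(size_t len)
{
    return swiotlb_slots_needed(len) <= IO_TLB_SEGSIZE;
}
```

128 slots of 2KB give 262,144 bytes (256KB), while the firmware allocation needs 258 slots, roughly twice the limit. That is why at least 3 smaller mappings are needed to cover it.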

It's possible the Intel post was referring to the closed NVIDIA
drivers, but those are now deprecated and don't support my newer GPU.

But, there is good news:

1. The IOVA range that PCIDriverKit seems to always hand out is far
outside QEMU's default `-machine virt` memory map, so the range can be
cleanly identity-mapped in the VM without overlap. One of the
restrictions I noted earlier (16MB max contiguous mapping) was actually
just a bug in my code. A large contiguous mapping seems to work fine,
though the ~1.5GB limit is still real.

2. A restricted-dma-pool DT node can be assigned per device, so it
doesn't affect other drivers on the system, and potentially you can
have different pools for multiple devices (I have not actually tried
this yet, but it seems like it would work).

3. More ordinary devices can work. I purchased a Thunderbolt NVMe
enclosure, and it works with swiotlb bounce buffering with no kernel
modifications.

4. With enough hacks in the driver, the NVIDIA "open" driver can be
made to work, albeit with the already slow gaming performance reduced
to about 30% (~10 fps, vs. ~30 fps with paravirtualized DMA mapping).
I wasn't able to get CUDA working, but presumably that just needs more
elbow grease.
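For reference, the per-device pool in item 2 is described, per the kernel's shared-dma-pool binding, with a reserved-memory child node (compatible = "restricted-dma-pool") plus a memory-region phandle on the device's node. The sketch below just prints such a DTS fragment from C. The node label, the example addresses and the function name are hypothetical, and a real QEMU implementation would build the nodes with its FDT helpers instead of text.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print a DTS fragment for a restricted DMA pool at `base` of `size` bytes
 * and the memory-region property that attaches it to one device. Uses
 * #address-cells = #size-cells = 2, so each 64-bit value becomes two
 * 32-bit cells. Returns what snprintf() returns. */
static int emit_restricted_pool_dts(char *out, size_t cap,
                                    uint64_t base, uint64_t size)
{
    return snprintf(out, cap,
        "reserved-memory {\n"
        "\t#address-cells = <2>;\n"
        "\t#size-cells = <2>;\n"
        "\tranges;\n"
        "\tgpu_dma_pool: restricted-dma-pool@%" PRIx64 " {\n"
        "\t\tcompatible = \"restricted-dma-pool\";\n"
        "\t\treg = <0x%" PRIx32 " 0x%" PRIx32 " 0x%" PRIx32 " 0x%" PRIx32 ">;\n"
        "\t};\n"
        "};\n"
        "/* in the passthrough device's node: */\n"
        "memory-region = <&gpu_dma_pool>;\n",
        base,
        (uint32_t)(base >> 32), (uint32_t)base,
        (uint32_t)(size >> 32), (uint32_t)size);
}
```

Because the memory-region property lives on one device's node, other devices keep using the normal DMA path. That property is what makes the per-device assignment possible.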

After sleeping on this a bit, I think my proposal would be:

- The `restricted-dma-pool` method can be the default. For most devices
  this will work seamlessly, though users may have to specify a size for
  the pool, since the optimal size will vary for each device.

- The apple-dma-pci thing can be downgraded from an actual device to an
  out-of-tree workaround. I have not yet tested it, but presumably it
  can use ivshmem or a virtual serial port to communicate the mappings.
  It's mostly a guest-side hack so it doesn't really need qemu
  involvement necessarily. 

- I doubt Apple will actually approve this for distribution, but I can
  write a kext that uses the kernel API to manipulate the DART directly.
  I didn't realize this was an option before. This can act as a kind of
  companion to my dext, and as a follow-on to this patchset, I can teach
  the vIOMMU device to use it. Eventually, if Apple exposes this as
  something you can use in a dext, then the functionality can be moved
  into the dext and all of these concerns become moot. Until then, it
  can be an optimization if you're willing to run without SIP.
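If the companion device becomes an out-of-tree ivshmem or serial-port protocol, the messages only need to carry a {map, unmap} request and the IOVA returned by the host. The layout below is a hypothetical fixed-size message (all names are illustrative, not from the patchset), encoded explicitly as little-endian so the guest and the host agree on the byte layout regardless of compiler struct padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dma_op { DMA_OP_MAP = 1, DMA_OP_UNMAP = 2 };

/* Hypothetical request: the guest asks the host to map `len` bytes at
 * guest-physical `gpa`; for unmap, `iova` identifies the mapping. */
struct dma_req {
    uint32_t op;
    uint32_t tag;  /* echoed back so replies can be matched to requests */
    uint64_t gpa;
    uint64_t len;
    uint64_t iova; /* filled in by the host on map, sent by the guest on unmap */
};

#define DMA_REQ_WIRE_SIZE 32

static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Encode into a fixed 32-byte little-endian wire format. */
static void dma_req_encode(const struct dma_req *r, uint8_t out[DMA_REQ_WIRE_SIZE])
{
    put_le(out + 0, r->op, 4);
    put_le(out + 4, r->tag, 4);
    put_le(out + 8, r->gpa, 8);
    put_le(out + 16, r->len, 8);
    put_le(out + 24, r->iova, 8);
}

static void dma_req_decode(const uint8_t in[DMA_REQ_WIRE_SIZE], struct dma_req *r)
{
    r->op = (uint32_t)get_le(in + 0, 4);
    r->tag = (uint32_t)get_le(in + 4, 4);
    r->gpa = get_le(in + 8, 8);
    r->len = get_le(in + 16, 8);
    r->iova = get_le(in + 24, 8);
}
```

Because the format is a flat 32-byte record, the same encoder works whether the transport is a ring in an ivshmem BAR or a byte stream over a virtual serial port.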

If you think this is OK, I can prepare a new version of the patchset.

Thanks,
-sjg

[1] https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/src/kernel/gpu/gsp/kernel_gsp.c#L5404
[2] https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c#L1827




* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-08  7:02         ` Scott J. Goldman
@ 2026-04-08  8:33           ` Mohamed Mediouni
  2026-04-08 19:09           ` Mohamed Mediouni
  1 sibling, 0 replies; 25+ messages in thread
From: Mohamed Mediouni @ 2026-04-08  8:33 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x



> On 8. Apr 2026, at 09:02, Scott J. Goldman <scottjgo@gmail.com> wrote:
> 
> 
> I dug in here and implemented the restricted-dma-pool solution. Still
> needs some cleanups but it's working enough to test. To start with the
> bad news:
> 
> 1. As I mentioned previously, the mainline kernel has a max 256k limit
> for any single swiotlb mapping. This has been debated a few times on
> LKML, but the consensus has been generally that it should not be changed
> or made configurable. You can see the threads:
>  - https://lkml.org/lkml/2015/3/3/84
>  - https://patchwork.kernel.org/project/linux-mips/patch/20210914151016.3174924-1-Roman_Skakun@epam.com/
> 
> 2. NVIDIA drivers immediately make a contiguous 528384 byte allocation,
> at least on my hardware (NVIDIA RTX 5090), which is required as part of
> initializing the firmware on the card. This obviously fails immediately.
> It happens on both the NVIDIA-provided "open" drivers[1] and the in-tree
> `nouveau` [2], so it's more a hardware-specific issue than just a driver
> problem. If you hack around that (allocate 3 smaller buffers and hope
> they are contiguous), you'll see that both drivers assume coherent DMA
> memory (more so in the nvidia driver than nouveau, but it's a problem in
> both). They map DMA buffers and then write data into the buffers
> afterward. So you end up sending empty swiotlb buffers to the card and
> it'll ultimately fail to initialize.

Hello,

Interesting. On x86 the bounce-buffer situation is a bit different from
plain conventional swiotlb.

It looks like some of the NVIDIA bounce-buffering support code is gated
behind x86_64-only checks:

https://github.com/NVIDIA/open-gpu-kernel-modules/blob/db0c4e65c8e34c678d745ddb1317f53f90d1072b/src/nvidia/src/kernel/gpu/ce/arch/blackwell/kernel_ce_gb100.c#L1833

> 
> It's possible the press release was referring to using the closed NVIDIA
> drivers, but those are now deprecated and don't support my newer GPU.
> 

They’re using the open ones on x86, Hopper doesn’t support the closed ones.

> But, there is good news:
> 
> 1. The IOVA range that seems to always come from PCIDriverKit is pretty
> far outside the default qemu mapping from `-machine virt`, so the range
> can be cleanly identity mapped in the VM without overlap. One of the
> restrictions I noted earlier (16MB max contiguous mapping) was actually
> just a bug in my code. A large contiguous mapping seems to work fine,
> though the ~1.5GB limit is still real.

There’s a catch there for early platforms with a small IPA space (i.e. early
non-Pro/Max chips), which only had a 64GB IPA space.

But documenting those as not supported is probably fine.
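The two conditions discussed here reduce to simple range checks: the PCIDriverKit IOVA window must not collide with anything already in the guest memory map, and it must end inside the host's IPA space. A sketch with hypothetical names follows. Neither the values in the checks nor the memory map comes from QEMU's real `virt` layout or from a measured IOVA range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, start + size). */
struct range {
    uint64_t start;
    uint64_t size;
};

static bool ranges_overlap(struct range a, struct range b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Can `win` be identity-mapped into the guest? It must not overlap any
 * region already in the guest memory map and must end within an IPA space
 * of `ipa_bits` bits (36 bits = 64GB). */
static bool iova_window_identity_mappable(struct range win,
                                          const struct range *map, size_t n,
                                          unsigned ipa_bits)
{
    assert(ipa_bits < 64);
    uint64_t ipa_limit = 1ULL << ipa_bits;
    if (win.size == 0 || win.start > ipa_limit ||
        win.size > ipa_limit - win.start)
        return false;
    for (size_t i = 0; i < n; i++)
        if (ranges_overlap(win, map[i]))
            return false;
    return true;
}
```

For example, a 1.5GB window starting at 64GB fits a 40-bit IPA space but is rejected outright with a 36-bit (64GB) IPA space. That matches the early-platform case above.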

> 
> 2. restricted-dma-pool DT attribute can be assigned per-device. So it
> doesn't affect other drivers on the system, and potentially that means
> you can have different pools for multiple devices (have not actually
> tried this yet, but seems like it would work).

Yes.

> 
> 3. More normal devices can work. I purchased a thunderbolt nvme
> enclosure and it works with the swiotlb bounce buffering with no kernel
> modifications.
> 
> 4. With a sufficient amount of hacks in the driver, the NVIDIA "open"
> driver can be made to work, albeit with already slow gaming performance
> reduced to about 30% (~10fps) vs paravirt dma mapping (~30fps). I wasn't
> able to get CUDA working, but presumably that just needs more elbow
> grease.

UVM, which CUDA uses, has its own separate memory allocator, so CUDA
changes are expected, unless you force-disable UVM, which you can do by
not loading nvidia-uvm and using this:

https://gist.githubusercontent.com/shkhln/40ef290463e78fb2b0000c60f4ad797e/raw/0e1fd8e8ea52b7445c3d33f5e5975efd20388dcb/uvm_ioctl_override.c

> 
> After sleeping on this a bit, I think my proposal would be:
> 
> - The `restricted-dma-pool` method can be the default. For most devices
>  this will work seamlessly, though users may have to specify a size for
>  the pool, since the optimal size will vary for each device.
> 
> - The apple-dma-pci thing can be downgraded from an actual device to an
>  out-of-tree workaround. I have not yet tested it, but presumably it
>  can use ivshmem or a virtual serial port to communicate the mappings.
>  It's mostly a guest-side hack so it doesn't really need qemu
>  involvement necessarily.

If it’s going to ship, I think keeping it an actual device is a good idea.

> 
> - I doubt Apple will actually approve this for distribution, but I can
>  write a kext that uses the kernel API to manipulate the DART directly.

Unfortunately the kext situation is pretty much a hard no for signing on
the Apple side. :/

>  I didn't realize this was an option before. This can act as kind of a
>  companion for my dext and as follow-on to this patchset, I can teach
>  the vIOMMU device to use it. Eventually if Apple exposes this as
>  something you can use in a dext, then the functionality can be moved
>  into the dext and all of these concerns become moot. Until then, it
>  can be an optimization if you're willing to run without SIP.

Yeah, proper virtio-iommu can be available as an option when SIP is off,
with a custom kext, and I think having support for that in-tree would be
a good idea.
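The key difference from the current series is who picks the IOVA. In a virtio-iommu MAP request, the guest supplies the IOVA range (virt_start..virt_end, inclusive) together with the physical address. A PrepareForDMA-based backend cannot honor that request, whereas a kext that programs the DART directly could. Below is a simplified model that does not follow the wire layout. Its field names come from struct virtio_iommu_req_map in the virtio specification. The backend callback type and both stand-in backends are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified view of a virtio-iommu MAP request (not the wire layout). */
struct viommu_map_req {
    uint32_t domain;
    uint64_t virt_start; /* first IOVA, chosen by the guest */
    uint64_t virt_end;   /* last IOVA, inclusive */
    uint64_t phys_start; /* guest-physical address to map */
};

/* Hypothetical host backend: maps `len` bytes of `phys`, trying to place
 * them at `want_iova`. Returns the IOVA actually used. */
typedef uint64_t (*dart_map_fn)(uint64_t phys, uint64_t len, uint64_t want_iova);

/* Stand-in for a kext that can program the DART at a chosen IOVA. */
static uint64_t fixed_iova_backend(uint64_t phys, uint64_t len, uint64_t want_iova)
{
    (void)phys;
    (void)len;
    return want_iova;
}

/* Stand-in for PrepareForDMA: the system picks the IOVA. */
static uint64_t system_assigned_backend(uint64_t phys, uint64_t len, uint64_t want_iova)
{
    (void)phys;
    (void)len;
    (void)want_iova;
    return 0xf00000000ULL; /* arbitrary illustrative address */
}

/* A MAP can only succeed if the backend honors the guest's IOVA. A real
 * handler would also undo a mapping that landed at the wrong address. */
static bool handle_map(const struct viommu_map_req *req, dart_map_fn map)
{
    if (req->virt_end < req->virt_start)
        return false;
    uint64_t len = req->virt_end - req->virt_start + 1;
    return map(req->phys_start, len, req->virt_start) == req->virt_start;
}
```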

> 
> If you think this is OK, I can prepare a new version of the patchset.

This all looks cool :)

> 
> Thanks,
> -sjg
> 
> [1] https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/src/kernel/gpu/gsp/kernel_gsp.c#L5404
> [2] https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c#L1827
> 




* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-08  7:02         ` Scott J. Goldman
  2026-04-08  8:33           ` Mohamed Mediouni
@ 2026-04-08 19:09           ` Mohamed Mediouni
  2026-04-08 20:45             ` Scott J. Goldman
  1 sibling, 1 reply; 25+ messages in thread
From: Mohamed Mediouni @ 2026-04-08 19:09 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x



> On 8. Apr 2026, at 09:02, Scott J. Goldman <scottjgo@gmail.com> wrote:
> 
> On Sun Apr 5, 2026 at 5:16 PM PDT, Mohamed Mediouni wrote:
>> 
>> 
>>> On 6. Apr 2026, at 01:20, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>> 
>>> On Sun Apr 5, 2026 at 1:14 AM PDT, Mohamed Mediouni wrote:
>>>> 
>>>> 
>>>>> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>>>>> 
>>>>>> 
>>>>>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>>>> 
>>>>>> This series adds VFIO PCI device passthrough support for Apple Silicon
>>>>>> Macs running macOS, using a DriverKit extension (dext) as the host
>>>>>> backend instead of the Linux VFIO kernel driver.
>>>>>> 
>>>>>> I'm sending this as an RFC because I'd like feedback before investing
>>>>>> further in upstreaming.  The code is functional.  I've tested it with
>>>>>> an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
>>>>>> gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
>>>>>> [1]), likely due to the BAR access penalty described below.  AI
>>>>>> inference workloads appear less affected.  Ollama with Qwen 3.5
>>>>>> generates around 140 tok/sec on the same setup [2].
>>>>>> 
>>>>>> How it works:
>>>>>> 
>>>>>> On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
>>>>>> for device access and DMA mapping.  On macOS, there is no equivalent
>>>>>> kernel interface.  Instead, a userspace DriverKit extension
>>>>>> (VFIOUserPCIDriver) mediates access to the physical PCI device through
>>>>>> IOKit's IOUserClient and PCIDriverKit APIs.
>>>>>> 
>>>>>> The series keeps the existing VFIOPCIDevice model and reuses QEMU's
>>>>>> passthrough infrastructure.  A few ioctl callsites are refactored into
>>>>>> io_ops callbacks, the build system is extended for Darwin, and the
>>>>>> Apple-specific backend plugs in behind those abstractions.
>>>>>> 
>>>>>> The guest sees two PCI devices: the passthrough device itself
>>>>>> (vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
>>>>>> DMA mapping device (apple-dma-pci).  On the QEMU side, an
>>>>>> AppleVFIOContainer implements the IOMMU backend, and a C client
>>>>>> library wraps the IOUserClient calls to the dext for config space,
>>>>>> BAR MMIO, interrupts, reset, and DMA.
>>>>>> 
>>>>>> DMA limitations:
>>>>>> 
>>>>>> This is the biggest platform constraint.  Unlike a typical IOMMU
>>>>>> mapping operation where the caller specifies the IOVA, the
>>>>>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>>>>>> system-assigned IOVA.  There is no way to request a specific address.
>>>>>> This means the guest's requested DMA addresses cannot be used
>>>>>> directly.  The guest kernel module must intercept DMA mapping calls
>>>>>> and forward them through the companion device to get the actual
>>>>>> hardware IOVA.
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
>>>>> API used by Virtualization.framework but that's a different design.
>>> 
>>> This is really interesting and I had not heard about this. Are you
>>> able to elaborate on this one at all? Maybe this is something where an
>>> internal API to manipulate the DART is available inside
>>> Virtualization.framework?
>> 
>> Hello,
>> 
>> All of it needs using private entitlements currently.
>> 
>> It's _VZPCIPassthroughDeviceConfiguration, a private class needing com.apple.private.virtualization to use.
>> 
>> The VMM process itself then uses the com.apple.private.PCIPassthrough.access entitlement. I'm not
>> sure whether OS versions even have all the code currently though.
>> 
> 
> Appreciate the pointers here. It looks like, as you said, the framework
> taps into a bunch of code that isn't shipped to us mere mortals. I can
> see from some of the code in Virtualization.framework the general shape
> of what they're doing, though.
> 
> It looks like they implement a virtio-iommu device that ultimately calls
> into the host kernel with some internal APIs to do the DART mappings. 
> 

Hello,

Some more details:

When using Virtualization.framework, the VMM side is the XPC service at /System/Library/Frameworks/Virtualization.framework/XPCServices/com.apple.Virtualization.VirtualMachine.xpc/Contents/MacOS/com.apple.Virtualization.VirtualMachine

And that directly communicates with IOPCIDevice...

And the source code for PCIDriverKit is at https://github.com/apple-oss-distributions/IOPCIFamily/tree/main/PCIDriverKit 

And the kernel side of PrepareForDMA is at https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOUserServer.cpp#L1001

IOMemoryDescriptor has a kIOMemoryMapFixedAddress option: https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/DriverKit/IOMemoryDescriptor.iig#L56

But I’m not sure whether that’s allowed for user-mode drivers.

Hopefully that helps.

Thank you,
-Mohamed









* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-08 19:09           ` Mohamed Mediouni
@ 2026-04-08 20:45             ` Scott J. Goldman
  2026-04-08 22:12               ` Mohamed Mediouni
  0 siblings, 1 reply; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-08 20:45 UTC (permalink / raw)
  To: Mohamed Mediouni, Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x

On Wed Apr 8, 2026 at 12:09 PM PDT, Mohamed Mediouni wrote:
>
>
>> On 8. Apr 2026, at 09:02, Scott J. Goldman <scottjgo@gmail.com> wrote:
>> 
>> On Sun Apr 5, 2026 at 5:16 PM PDT, Mohamed Mediouni wrote:
>>> 
>>> 
>>>> On 6. Apr 2026, at 01:20, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>> 
>>>> On Sun Apr 5, 2026 at 1:14 AM PDT, Mohamed Mediouni wrote:
>>>>> 
>>>>> 
>>>>>> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>>>>> 
>>>>>>> This series adds VFIO PCI device passthrough support for Apple Silicon
>>>>>>> Macs running macOS, using a DriverKit extension (dext) as the host
>>>>>>> backend instead of the Linux VFIO kernel driver.
>>>>>>> 
>>>>>>> I'm sending this as an RFC because I'd like feedback before investing
>>>>>>> further in upstreaming.  The code is functional.  I've tested it with
>>>>>>> an NVIDIA RTX 5090 in a Thunderbolt dock on an M4 MacBook Air.  GPU
>>>>>>> gaming works but is slow (~30 fps on high settings in Cyberpunk 2077
>>>>>>> [1]), likely due to the BAR access penalty described below.  AI
>>>>>>> inference workloads appear less affected.  Ollama with Qwen 3.5
>>>>>>> generates around 140 tok/sec on the same setup [2].
>>>>>>> 
>>>>>>> How it works:
>>>>>>> 
>>>>>>> On Linux, VFIO relies on kernel-managed IOMMU groups and /dev/vfio
>>>>>>> for device access and DMA mapping.  On macOS, there is no equivalent
>>>>>>> kernel interface.  Instead, a userspace DriverKit extension
>>>>>>> (VFIOUserPCIDriver) mediates access to the physical PCI device through
>>>>>>> IOKit's IOUserClient and PCIDriverKit APIs.
>>>>>>> 
>>>>>>> The series keeps the existing VFIOPCIDevice model and reuses QEMU's
>>>>>>> passthrough infrastructure.  A few ioctl callsites are refactored into
>>>>>>> io_ops callbacks, the build system is extended for Darwin, and the
>>>>>>> Apple-specific backend plugs in behind those abstractions.
>>>>>>> 
>>>>>>> The guest sees two PCI devices: the passthrough device itself
>>>>>>> (vfio-apple-pci, which subclasses VFIOPCIDevice) and a companion
>>>>>>> DMA mapping device (apple-dma-pci).  On the QEMU side, an
>>>>>>> AppleVFIOContainer implements the IOMMU backend, and a C client
>>>>>>> library wraps the IOUserClient calls to the dext for config space,
>>>>>>> BAR MMIO, interrupts, reset, and DMA.
>>>>>>> 
>>>>>>> DMA limitations:
>>>>>>> 
>>>>>>> This is the biggest platform constraint.  Unlike a typical IOMMU
>>>>>>> mapping operation where the caller specifies the IOVA, the
>>>>>>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>>>>>>> system-assigned IOVA.  There is no way to request a specific address.
>>>>>>> This means the guest's requested DMA addresses cannot be used
>>>>>>> directly.  The guest kernel module must intercept DMA mapping calls
>>>>>>> and forward them through the companion device to get the actual
>>>>>>> hardware IOVA.
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
>>>>>> API used by Virtualization.framework but that's a different design.
>>>> 
>>>> This is really interesting and I had not heard about this. Are you
>>>> able to elaborate on this one at all? Maybe this is something where an
>>>> internal API to manipulate the DART is available inside
>>>> Virtualization.framework?
>>> 
>>> Hello,
>>> 
>>> All of it needs using private entitlements currently.
>>> 
>>> It's _VZPCIPassthroughDeviceConfiguration, a private class needing com.apple.private.virtualization to use.
>>> 
>>> The VMM process itself then uses the com.apple.private.PCIPassthrough.access entitlement. I'm not
>>> sure whether OS versions even have all the code currently though.
>>> 
>> 
>> Appreciate the pointers here. It looks like, as you said, the framework
>> taps into a bunch of code that isn't shipped to us mere mortals. I can
>> see from some of the code in Virtualization.framework the general shape
>> of what they're doing, though.
>> 
>> It looks like they implement a virtio-iommu device that ultimately calls
>> into the host kernel with some internal APIs to do the DART mappings. 
>> 
>
> Hello,
>
> Some more details:
>
> The VMM side when using Virtualization.framework is at /System/Library/Frameworks/Virtualization.framework/XPCServices/com.apple.Virtualization.VirtualMachine.xpc/Contents/MacOS/com.apple.Virtualization.VirtualMachine
> as Virtualization.framework
>
> And that directly communicates with IOPCIDevice...
>
> And the source code side of PCIDriverKit is at https://github.com/apple-oss-distributions/IOPCIFamily/tree/main/PCIDriverKit 
>
> And for PrepareForDMA at https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOUserServer.cpp#L1001
>
> IOMemoryDescriptor has this option: https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/DriverKit/IOMemoryDescriptor.iig#L56 - kIOMemoryMapFixedAddress
>
> But not sure whether that’s allowed for user-mode drivers
>
> Hopefully that helps.
>

Appreciate the pointers. It seems the flag does work for DriverKit
user-level drivers. Unfortunately, it controls where the mapping is
placed in the process's virtual address space, not the IOVA used for
DMA. If you follow the call path:

PrepareForDMA_Impl(options, memDesc, offset, length, ...)
 -> IODMACommand::prepare(offset, length)
   -> md->dmaCommandOperation(kIOMDDMAMap, &mapArgs)
     -> IOGeneralMemoryDescriptor::dmaMap(mapper, ..., &mapArgs.fAlloc)

You can see that these flags are never passed through to the final
call to iovmMapMemory. The base class path is at:

https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOMemoryDescriptor.cpp#L4581-L4587C54

and the IOGeneralMemoryDescriptor override (which is the path taken for client memory) has the same gap:

https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOMemoryDescriptor.cpp#L4671-L4742

and mapOptions itself is set in IODMACommand::prepare():

https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IODMACommand.cpp#L985C1-L996C1

I think the flag that would need to make it all the way down that path
is kIODMAMapFixedAddress.
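
To make the gap concrete, here's a toy C model of that call chain. It is
not xnu code: prepare(), dma_map(), iovm_map() and MAP_FIXED_ADDRESS are
hypothetical stand-ins for IODMACommand::prepare(),
IOGeneralMemoryDescriptor::dmaMap(), iovmMapMemory and
kIODMAMapFixedAddress, and the IOVA values are made up. It only shows the
shape of the problem: the option gets computed, nothing below receives
it, so the mapper always picks the address itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for kIODMAMapFixedAddress. */
#define MAP_FIXED_ADDRESS 0x1u

/* Hypothetical cursor for the mapper's own IOVA allocator. */
static uint64_t next_system_iova = 0x40000000ULL;

/* Models the final mapper call: a caller-chosen IOVA is honoured only
 * if the fixed-address flag actually reaches this function. */
static uint64_t iovm_map(uint32_t options, uint64_t wanted_iova, uint64_t len)
{
    if (options & MAP_FIXED_ADDRESS) {
        return wanted_iova;
    }
    uint64_t iova = next_system_iova;
    next_system_iova += len;
    return iova;
}

/* Models dmaMap(): its arguments carry no options and no wanted IOVA. */
static uint64_t dma_map(uint64_t len)
{
    return iovm_map(0 /* nothing to forward */, 0, len);
}

/* Models prepare(): it derives map options from the caller's request,
 * then never passes them (or the wanted IOVA) further down. */
static uint64_t prepare(uint32_t caller_options, uint64_t wanted_iova, uint64_t len)
{
    assert(len > 0);
    uint32_t map_options = caller_options & MAP_FIXED_ADDRESS;
    (void)map_options; /* set, but never forwarded */
    (void)wanted_iova; /* dropped as well */
    return dma_map(len);
}
```

Calling iovm_map() directly with the flag returns the requested address;
going through prepare() never does, whatever the caller asks for.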

Thanks,
-sjg

* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-08 20:45             ` Scott J. Goldman
@ 2026-04-08 22:12               ` Mohamed Mediouni
  2026-04-08 23:33                 ` Scott J. Goldman
  0 siblings, 1 reply; 25+ messages in thread
From: Mohamed Mediouni @ 2026-04-08 22:12 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x



> On 8. Apr 2026, at 22:45, Scott J. Goldman <scottjgo@gmail.com> wrote:
> 
> On Wed Apr 8, 2026 at 12:09 PM PDT, Mohamed Mediouni wrote:
>> 
>> 
>>> On 8. Apr 2026, at 09:02, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>> 
>>> On Sun Apr 5, 2026 at 5:16 PM PDT, Mohamed Mediouni wrote:
>>>> 
>>>> 
>>>>> On 6. Apr 2026, at 01:20, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>>> 
>>>>> On Sun Apr 5, 2026 at 1:14 AM PDT, Mohamed Mediouni wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> [...]
>>>>>>>> 
>>>>>>>> DMA limitations:
>>>>>>>> 
>>>>>>>> This is the biggest platform constraint.  Unlike a typical IOMMU
>>>>>>>> mapping operation where the caller specifies the IOVA, the
>>>>>>>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>>>>>>>> system-assigned IOVA.  There is no way to request a specific address.
>>>>>>>> This means the guest's requested DMA addresses cannot be used
>>>>>>>> directly.  The guest kernel module must intercept DMA mapping calls
>>>>>>>> and forward them through the companion device to get the actual
>>>>>>>> hardware IOVA.
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
>>>>>>> API used by Virtualization.framework but that's a different design.
>>>>> 
>>>>> This is really interesting and I had not heard about this. Are you
>>>>> able to elaborate on this one at all? Maybe this is something where an
>>>>> internal API to manipulate the DART is available inside
>>>>> Virtualization.framework?
>>>> 
>>>> Hello,
>>>> 
>>>> All of it needs using private entitlements currently.
>>>> 
>>>> It's _VZPCIPassthroughDeviceConfiguration, a private class needing com.apple.private.virtualization to use.
>>>> 
>>>> The VMM process itself then uses the com.apple.private.PCIPassthrough.access entitlement. I'm not
>>>> sure whether OS versions even have all the code currently though.
>>>> 
>>> 
>>> Appreciate the pointers here. It looks like, as you said, the framework
>>> taps into a bunch of code that isn't shipped to us mere mortals. I can
>>> see from some of the code in Virtualization.framework the general shape
>>> of what they're doing, though.
>>> 
>>> It looks like they implement a virtio-iommu device that ultimately calls
>>> into the host kernel with some internal APIs to do the DART mappings. 
>>> 
>> 
>> Hello,
>> 
>> Some more details:
>> 
>> The VMM side when using Virtualization.framework is at /System/Library/Frameworks/Virtualization.framework/XPCServices/com.apple.Virtualization.VirtualMachine.xpc/Contents/MacOS/com.apple.Virtualization.VirtualMachine
>> as Virtualization.framework
>> 
>> And that directly communicates with IOPCIDevice...
>> 
>> And the source code side of PCIDriverKit is at https://github.com/apple-oss-distributions/IOPCIFamily/tree/main/PCIDriverKit 
>> 
>> And for PrepareForDMA at https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOUserServer.cpp#L1001
>> 
>> IOMemoryDescriptor has this option: https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/DriverKit/IOMemoryDescriptor.iig#L56 - kIOMemoryMapFixedAddress
>> 
>> But not sure whether that’s allowed for user-mode drivers
>> 
>> Hopefully that helps.
>> 
> 
> Appreciate the pointers. It seems like the flag does work on DriverKit
> user-level drivers. Unfortunately it controls the virtual address
> placement in the process, not the IOVA for DMA. If you follow the path
> through:
> 

That’s indeed the case.

> PrepareForDMA_Impl(options, memDesc, offset, length, ...)
> -> IODMACommand::prepare(offset, length)
>   -> md->dmaCommandOperation(kIOMDDMAMap, &mapArgs)
>     -> IOGeneralMemoryDescriptor::dmaMap(mapper, ..., &mapArgs.fAlloc)
> 
> You can see that the code doesn't pass these flags through to the
> ultimate call to iovmMapMemory. The base class path is at:
> 
> https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOMemoryDescriptor.cpp#L4581-L4587C54
> 
> and the IOGeneralMemoryDescriptor override (which is the path taken for client memory) has the same gap:
> 
> https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOMemoryDescriptor.cpp#L4671-L4742
> 
> where the mapOptions are also set in IODMACommand::prepare():
> 
> https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IODMACommand.cpp#L985C1-L996C1
> 
> I think the flag that would have to be in the path would be
> kIODMAMapFixedAddress.

Intriguingly, mapOptions is declared at
https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IODMACommand.cpp#L943C39-L943C49
and set at
https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IODMACommand.cpp#L985C1-L996C1
but never used afterwards.

And the flags that actually get passed in are here:
https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOUserServer.cpp#L979
so it isn’t in IODMACommandSpecification either.

That makes me wonder even more what Apple does for its own VM stack. Maybe
the simplest answer is that they’re relying on something non-public,
especially as they’re doing this from user mode through the IOPCIDevice
interface. Or the Apple code in Virtualization.framework is simply
unfinished and I’m overthinking this.

* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-08 22:12               ` Mohamed Mediouni
@ 2026-04-08 23:33                 ` Scott J. Goldman
  2026-04-09  0:02                   ` Mohamed Mediouni
  0 siblings, 1 reply; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-08 23:33 UTC (permalink / raw)
  To: Mohamed Mediouni, Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x

On Wed Apr 8, 2026 at 3:12 PM PDT, Mohamed Mediouni wrote:
>
>
>> On 8. Apr 2026, at 22:45, Scott J. Goldman <scottjgo@gmail.com> wrote:
>> 
>> On Wed Apr 8, 2026 at 12:09 PM PDT, Mohamed Mediouni wrote:
>>> 
>>> 
>>>> On 8. Apr 2026, at 09:02, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>> 
>>>> On Sun Apr 5, 2026 at 5:16 PM PDT, Mohamed Mediouni wrote:
>>>>> 
>>>>> 
>>>>>> On 6. Apr 2026, at 01:20, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>>>> 
>>>>>> On Sun Apr 5, 2026 at 1:14 AM PDT, Mohamed Mediouni wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On 5. Apr 2026, at 10:01, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 5. Apr 2026, at 09:28, Scott J. Goldman <scottjgo@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> [...]
>>>>>>>>> 
>>>>>>>>> DMA limitations:
>>>>>>>>> 
>>>>>>>>> This is the biggest platform constraint.  Unlike a typical IOMMU
>>>>>>>>> mapping operation where the caller specifies the IOVA, the
>>>>>>>>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>>>>>>>>> system-assigned IOVA.  There is no way to request a specific address.
>>>>>>>>> This means the guest's requested DMA addresses cannot be used
>>>>>>>>> directly.  The guest kernel module must intercept DMA mapping calls
>>>>>>>>> and forward them through the companion device to get the actual
>>>>>>>>> hardware IOVA.
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Ugh this one is not great. By the way, Apple has a private PCIe passthrough
>>>>>>>> API used by Virtualization.framework but that's a different design.
>>>>>> 
>>>>>> This is really interesting and I had not heard about this. Are you
>>>>>> able to elaborate on this one at all? Maybe this is something where an
>>>>>> internal API to manipulate the DART is available inside
>>>>>> Virtualization.framework?
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> All of it needs using private entitlements currently.
>>>>> 
>>>>> It's _VZPCIPassthroughDeviceConfiguration, a private class needing com.apple.private.virtualization to use.
>>>>> 
>>>>> The VMM process itself then uses the com.apple.private.PCIPassthrough.access entitlement. I'm not
>>>>> sure whether OS versions even have all the code currently though.
>>>>> 
>>>> 
>>>> Appreciate the pointers here. It looks like, as you said, the framework
>>>> taps into a bunch of code that isn't shipped to us mere mortals. I can
>>>> see from some of the code in Virtualization.framework the general shape
>>>> of what they're doing, though.
>>>> 
>>>> It looks like they implement a virtio-iommu device that ultimately calls
>>>> into the host kernel with some internal APIs to do the DART mappings. 
>>>> 
>>> 
>>> Hello,
>>> 
>>> Some more details:
>>> 
>>> The VMM side when using Virtualization.framework is at /System/Library/Frameworks/Virtualization.framework/XPCServices/com.apple.Virtualization.VirtualMachine.xpc/Contents/MacOS/com.apple.Virtualization.VirtualMachine
>>> as Virtualization.framework
>>> 
>>> And that directly communicates with IOPCIDevice...
>>> 
>>> And the source code side of PCIDriverKit is at https://github.com/apple-oss-distributions/IOPCIFamily/tree/main/PCIDriverKit 
>>> 
>>> And for PrepareForDMA at https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOUserServer.cpp#L1001
>>> 
>>> IOMemoryDescriptor has this option: https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/DriverKit/IOMemoryDescriptor.iig#L56 - kIOMemoryMapFixedAddress
>>> 
>>> But not sure whether that’s allowed for user-mode drivers
>>> 
>>> Hopefully that helps.
>>> 
>> 
>> Appreciate the pointers. It seems like the flag does work on DriverKit
>> user-level drivers. Unfortunately it controls the virtual address
>> placement in the process, not the IOVA for DMA. If you follow the path
>> through:
>> 
> That’s indeed the case… 
>> PrepareForDMA_Impl(options, memDesc, offset, length, ...)
>> -> IODMACommand::prepare(offset, length)
>>   -> md->dmaCommandOperation(kIOMDDMAMap, &mapArgs)
>>     -> IOGeneralMemoryDescriptor::dmaMap(mapper, ..., &mapArgs.fAlloc)
>> 
>> You can see that the code doesn't pass these flags through to the
>> ultimate call to iovmMapMemory. The base class path is at:
>> 
>> https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOMemoryDescriptor.cpp#L4581-L4587C54
>> 
>> and the IOGeneralMemoryDescriptor override (which is the path taken for client memory) has the same gap:
>> 
>> https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOMemoryDescriptor.cpp#L4671-L4742
>> 
>> where the mapOptions are also set in IODMACommand::prepare():
>> 
>> https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IODMACommand.cpp#L985C1-L996C1
>> 
>> I think the flag that would have to be in the path would be
>> kIODMAMapFixedAddress.
>
> Intriguingly, mapOptions is defined at https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IODMACommand.cpp#L943C39-L943C49 and set there in https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IODMACommand.cpp#L985C1-L996C1 but then not used afterwards…
>
> And the flags passed are:
> https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/iokit/Kernel/IOUserServer.cpp#L979
> so it’s not in IODMACommandSpecification either.
>
> Makes me wonder even more what Apple does for their own VM thing, or maybe
> the simplest answer is that they’re not using something public - especially
> as they’re doing this from user-mode using the IOPCIDevice interface…
> Or… the Apple code in Virtualization.framework is unfinished and I’m just thinking too hard about this

My (possibly incorrect) read of the strings and some disassembly of
Virtualization.framework is that it looks for
IOServiceMatching("PCIPassthrough"), and no service I can find on my
Mac provides that class. Maybe internal builds ship another kext for
this?
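
For anyone who wants to repeat the check on their own machine, a small
probe along these lines should work. It's a sketch assuming a macOS 12+
SDK (for kIOMainPortDefault) and linking with -framework IOKit;
ioservice_class_present() is a hypothetical helper name, and the
non-Apple branch is only a stub so the file still builds elsewhere. It
asks the I/O Registry whether any service of the given class is
registered, nothing more; it says nothing about entitlements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__APPLE__)
#include <IOKit/IOKitLib.h>
#endif

/* Returns true if the I/O Registry has at least one service that is an
 * instance of class_name (or of a subclass of it). */
static bool ioservice_class_present(const char *class_name)
{
    assert(class_name != NULL);
#if defined(__APPLE__)
    /* IOServiceMatching() builds an IOProviderClass matching dictionary;
     * IOServiceGetMatchingService() consumes that reference for us. */
    io_service_t svc = IOServiceGetMatchingService(kIOMainPortDefault,
                                                   IOServiceMatching(class_name));
    if (svc == IO_OBJECT_NULL) {
        return false;
    }
    IOObjectRelease(svc);
    return true;
#else
    (void)class_name;
    return false; /* no I/O Registry on non-Apple hosts */
#endif
}
```

Given the above, I'd expect ioservice_class_present("PCIPassthrough") to
return false on a retail macOS install.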


* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-08 23:33                 ` Scott J. Goldman
@ 2026-04-09  0:02                   ` Mohamed Mediouni
  0 siblings, 0 replies; 25+ messages in thread
From: Mohamed Mediouni @ 2026-04-09  0:02 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x

> On 9. Apr 2026, at 01:33, Scott J. Goldman <scottjgo@gmail.com> wrote:
> 
> My possibly incorrect read of the strings and some disassembly of
> Virtualization.framework is that they look for
> IOServiceMatching("PCIPassthrough") which is simply not exported by
> anything I can find on my Mac. For internal builds maybe they ship
> another kext for this?
> 

Hello,

I looked further and tried running that code.

The service names that com.apple.Virtualization.VirtualMachine passes to
IOServiceMatching at runtime in this case are IOPCIDevice… and PCIPassthroughController.

PCIPassthroughController isn’t mentioned anywhere on the internet, it isn’t
present on my machine running macOS 26.4, and I can’t find any reference to
the com.apple.private.PCIPassthrough.access entitlement outside the
VirtualMachine service’s entitlements list. So I think we can conclude
definitively that Apple doesn’t ship the kernel side of this at this point.

That’s pretty unfortunate, but oh well :/ Hopefully it was worth looking at anyway.

* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
                   ` (10 preceding siblings ...)
  2026-04-05  8:01 ` [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Mohamed Mediouni
@ 2026-04-05 10:36 ` BALATON Zoltan
  2026-04-05 18:16   ` Scott J. Goldman
  11 siblings, 1 reply; 25+ messages in thread
From: BALATON Zoltan @ 2026-04-05 10:36 UTC (permalink / raw)
  To: Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x

On Sun, 5 Apr 2026, Scott J. Goldman wrote:
> [...]
>
> DMA limitations:
>
> This is the biggest platform constraint.  Unlike a typical IOMMU
> mapping operation where the caller specifies the IOVA, the
> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
> system-assigned IOVA.  There is no way to request a specific address.
> This means the guest's requested DMA addresses cannot be used
> directly.  The guest kernel module must intercept DMA mapping calls
> and forward them through the companion device to get the actual
> hardware IOVA.

I don't know this area, so what I say might not make sense, but I think
QEMU has IOMMU emulation. Could that be used to do this in QEMU and avoid
needing a kernel module for it in the guest?

Regards,
BALATON Zoltan

* Re: [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs
  2026-04-05 10:36 ` BALATON Zoltan
@ 2026-04-05 18:16   ` Scott J. Goldman
  0 siblings, 0 replies; 25+ messages in thread
From: Scott J. Goldman @ 2026-04-05 18:16 UTC (permalink / raw)
  To: BALATON Zoltan, Scott J. Goldman
  Cc: qemu-devel, alex, clg, pbonzini, rbolshakov, phil, mst,
	john.levon, thanos.makatos, qemu-s390x

On Sun Apr 5, 2026 at 3:36 AM PDT, BALATON Zoltan wrote:
> On Sun, 5 Apr 2026, Scott J. Goldman wrote:
>> [...]
>>
>> DMA limitations:
>>
>> This is the biggest platform constraint.  Unlike a typical IOMMU
>> mapping operation where the caller specifies the IOVA, the
>> PCIDriverKit API (IODMACommand::PrepareForDMA) returns a
>> system-assigned IOVA.  There is no way to request a specific address.
>> This means the guest's requested DMA addresses cannot be used
>> directly.  The guest kernel module must intercept DMA mapping calls
>> and forward them through the companion device to get the actual
>> hardware IOVA.
>
> I don't know this so what I say might not make sense but I think there is 
> iommu emulation in QEMU so could that be used to do this in QEMU and avoid 
> needing a kernel module for it in the guest?
>
> Regards,
> BALATON Zoltan

I think the challenge is that this is a passthrough device doing
DMA directly on the physical PCIe bus.  The device's DMA
transactions go through the real hardware IOMMU (DART), not
through QEMU.

If the guest programs the device with IOVA 0x1000 (assigned by
a virtual IOMMU), the device will issue a DMA read for 0x1000 on
the physical bus.  But DART only knows about the IOVAs that
PrepareForDMA assigned, so the transaction would fault.
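
The fault case can be modelled with a toy DART in C. This is a
simplified sketch, not how DART or PrepareForDMA actually work
internally: a flat table, a bump allocator standing in for the system's
IOVA choice, and made-up addresses. The Dart type,
dart_prepare_for_dma() and dart_translate() are hypothetical. The point
is that only the host picks IOVAs, so a device transaction on any other
address finds no entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DART_MAX_ENTRIES 16 /* tiny fixed table, illustration only */

typedef struct {
    uint64_t iova; /* chosen by the host, never by the guest */
    uint64_t size;
    uint64_t hpa;
} DartEntry;

typedef struct {
    DartEntry entries[DART_MAX_ENTRIES];
    size_t count;
    uint64_t next_iova; /* bump-allocator cursor; must start non-zero */
} Dart;

/* Stand-in for PrepareForDMA: the caller hands over a buffer and gets
 * back whatever IOVA the "system" picked. Returns 0 if the table is full
 * (0 is never handed out because next_iova starts non-zero). */
static uint64_t dart_prepare_for_dma(Dart *d, uint64_t hpa, uint64_t size)
{
    assert(d != NULL && d->next_iova != 0 && size > 0);
    if (d->count == DART_MAX_ENTRIES) {
        return 0;
    }
    DartEntry *e = &d->entries[d->count++];
    e->iova = d->next_iova;
    e->size = size;
    e->hpa = hpa;
    d->next_iova += size;
    return e->iova;
}

/* What the hardware does for each device transaction: look the IOVA up
 * and fault (return false) if no entry covers it. */
static bool dart_translate(const Dart *d, uint64_t iova, uint64_t *hpa)
{
    for (size_t i = 0; i < d->count; i++) {
        const DartEntry *e = &d->entries[i];
        if (iova >= e->iova && iova - e->iova < e->size) {
            *hpa = e->hpa + (iova - e->iova);
            return true;
        }
    }
    return false;
}
```

A buffer registered this way translates fine at its returned IOVA, while
the guest-chosen 0x1000 from the example misses and would fault.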

My understanding is that on other platforms this is simpler, because
the host IOMMU can be programmed directly:

1. Guest boots with 2 GB of RAM.  QEMU maps guest physical
   address (GPA) 0x10000000-0x90000000 to host physical memory
   at, say, 0x110000000-0x190000000.
2. QEMU programs the host IOMMU so the device's view of
   0x10000000-0x90000000 translates to the real host addresses.
3. Guest programs the PCI device to DMA to GPA 0x20000000.
4. The device issues the transaction, the IOMMU translates it
   to 0x120000000, and it hits the right physical memory.

On macOS, step 2 is missing.  The DriverKit APIs don't provide
a way to program arbitrary IOVA-to-HPA translations into DART.
You can only hand a buffer to PrepareForDMA and get back whatever
IOVA the system assigns.
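The mismatch is in the shape of the call. The IOVA is an output, not an input, so the guest cannot get its GPA used as the device address and must instead use whatever comes back. This is why the guest-side module forwards mappings through the companion device. The sketch below is illustrative only. `system_prepare_for_dma` is a stand-in whose bump-allocator policy and 16 KiB granule are assumptions, and `guest_dma_map` is a hypothetical name for the guest hook. Neither reflects the real PrepareForDMA signature or the series' actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the system's IOVA allocator.  Apple's real policy is
 * not modelled; a bump allocator is used purely for illustration. */
static uint64_t next_system_iova = 0x80000000ULL;

/* Shape of a PrepareForDMA-style call: the caller supplies only the
 * buffer and gets an IOVA back.  No parameter exists through which a
 * desired IOVA (such as the guest's GPA) could be requested. */
static uint64_t system_prepare_for_dma(uint64_t buf, uint64_t len)
{
    (void)buf;                            /* this model ignores which buffer */
    const uint64_t granule = 0x4000;      /* 16 KiB granule assumed */
    uint64_t iova = next_system_iova;
    next_system_iova += (len + granule - 1) & ~(granule - 1);
    return iova;
}

/* Hypothetical guest-side hook: rather than programming the device
 * with the GPA, ask the host (via the companion device) for a mapping
 * and program the device with the IOVA that comes back. */
static uint64_t guest_dma_map(uint64_t gpa, uint64_t len)
{
    assert(len > 0);
    return system_prepare_for_dma(gpa, len);
}
```

Two back-to-back 4 KiB mappings at GPA 0x20000000 and 0x20001000 come back as 0x80000000 and 0x80004000 in this model. Neither address is related to the GPA, which is why the device must be given the returned value.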

On top of that, the platform limits the total amount of DMA-mapped
memory to roughly 1.5 GB across ~64k mappings, so you can't even
map all of a 2 GB guest's RAM.  I believe this limit comes from
the host device tree and can't be changed by users, though Apple
could potentially change it in firmware.
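The budget can be modelled as two counters, bytes and mappings. A mapping is refused if either limit would be exceeded. The constants below take "roughly 1.5 GB" as 1.5 GiB and "~64k" as 65536. Both are assumptions, because the real values come from the device tree. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Approximate platform limits (assumed values). */
#define DMA_BUDGET_BYTES    (3ULL << 29)   /* 1.5 GiB */
#define DMA_BUDGET_MAPPINGS 65536u         /* ~64k mappings */

struct dma_budget {
    uint64_t bytes;       /* bytes currently mapped */
    uint32_t mappings;    /* number of live mappings */
};

/* Account for a new mapping of len bytes; refuse it if either the
 * byte budget or the mapping-count budget would be exceeded. */
static bool dma_budget_reserve(struct dma_budget *b, uint64_t len)
{
    assert(len > 0 && b->bytes <= DMA_BUDGET_BYTES);
    if (b->mappings >= DMA_BUDGET_MAPPINGS ||
        len > DMA_BUDGET_BYTES - b->bytes) {
        return false;
    }
    b->bytes += len;
    b->mappings++;
    return true;
}
```

Trying to map all of a 2 GiB guest in one go fails immediately. Even 1 GiB followed by 512 MiB exhausts the byte budget, and after that a single 4 KiB page is refused.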

A vIOMMU could reduce the amount of memory that needs to be
mapped (only what the guest actually uses for DMA, not all of
guest RAM), but fundamentally you still need something akin to
step 2 to make the device's physical DMA transactions land at
the right addresses, and we don't have that on this platform.


end of thread, other threads:[~2026-04-22 17:06 UTC | newest]

Thread overview: 25+ messages
2026-04-05  7:28 [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 01/10] vfio/pci: Use the write side of EventNotifier for IRQ signaling Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 02/10] accel/hvf: avoid executable mappings for RAM-device memory Scott J. Goldman
2026-04-22 17:05   ` Philippe Mathieu-Daudé
2026-04-05  7:28 ` [RFC PATCH 03/10] vfio: Allow building on Darwin hosts Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 04/10] vfio: Prepare existing code for Apple VFIO backend Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 05/10] vfio: Add region_map and region_unmap callbacks to VFIODeviceIOOps Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 06/10] vfio: Add device_reset callback " Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 07/10] vfio/apple: Add DriverKit dext client library Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 08/10] vfio/apple: Add IOMMU container and PCI device Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 09/10] vfio/apple: Add apple-dma-pci companion device Scott J. Goldman
2026-04-05  7:28 ` [RFC PATCH 10/10] docs: Add vfio-apple documentation and MAINTAINERS entry Scott J. Goldman
2026-04-05  8:01 ` [RFC PATCH 00/10] vfio: PCI device passthrough on Apple Silicon Macs Mohamed Mediouni
2026-04-05  8:14   ` Mohamed Mediouni
2026-04-05 23:20     ` Scott J. Goldman
2026-04-06  0:16       ` Mohamed Mediouni
2026-04-08  7:02         ` Scott J. Goldman
2026-04-08  8:33           ` Mohamed Mediouni
2026-04-08 19:09           ` Mohamed Mediouni
2026-04-08 20:45             ` Scott J. Goldman
2026-04-08 22:12               ` Mohamed Mediouni
2026-04-08 23:33                 ` Scott J. Goldman
2026-04-09  0:02                   ` Mohamed Mediouni
2026-04-05 10:36 ` BALATON Zoltan
2026-04-05 18:16   ` Scott J. Goldman
