* Re: [PATCH] vfio/igd: Update IGD passthrough docoumentation
2025-03-12 15:50 [PATCH] vfio/igd: Update IGD passthrough docoumentation Tomita Moeko
@ 2025-03-12 16:29 ` Alex Williamson
2025-03-13 8:37 ` Cédric Le Goater
2025-03-13 11:04 ` Corvin Köhne
2 siblings, 0 replies; 4+ messages in thread
From: Alex Williamson @ 2025-03-12 16:29 UTC (permalink / raw)
To: Tomita Moeko; +Cc: Cédric Le Goater, qemu-devel, Corvin Köhne
On Wed, 12 Mar 2025 23:50:02 +0800
Tomita Moeko <tomitamoeko@gmail.com> wrote:
> A previous change made the OpRegion and LPC quirks independent of the
> exising legacy mode, update the docoumentation accordingly. More related
> topics, like creating EFI Option ROM of IGD for OVMF, how to solve the
> VFIO_DMA_MAP Invalid Argument warning, as well as details on IGD memory
> internals, are also added.
>
> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
> ---
> docs/igd-assign.txt | 262 ++++++++++++++++++++++++++++++++------------
> 1 file changed, 193 insertions(+), 69 deletions(-)
>
> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> index e17bb50789..c7c4565906 100644
> --- a/docs/igd-assign.txt
> +++ b/docs/igd-assign.txt
> @@ -1,44 +1,69 @@
> Intel Graphics Device (IGD) assignment with vfio-pci
> ====================================================
>
> -IGD has two different modes for assignment using vfio-pci:
> -
> -1) Universal Pass-Through (UPT) mode:
> -
> - In this mode the IGD device is added as a *secondary* (ie. non-primary)
> - graphics device in combination with an emulated primary graphics device.
> - This mode *requires* guest driver support to remove the external
> - dependencies generally associated with IGD (see below). Those guest
> - drivers only support this mode for Broadwell and newer IGD, according to
> - Intel. Additionally, this mode by default, and as officially supported
> - by Intel, does not support direct video output. The intention is to use
> - this mode either to provide hardware acceleration to the emulated graphics
> - or to use this mode in combination with guest-based remote access software,
> - for example VNC (see below for optional output support). This mode
> - theoretically has no device specific handling dependencies on vfio-pci or
> - the VM firmware.
> -
> -2) "Legacy" mode:
> -
> - In this mode the IGD device is intended to be the primary and exclusive
> - graphics device in the VM[1], as such QEMU does not facilitate any sort
> - of remote graphics to the VM in this mode. A connected physical monitor
> - is the intended output device for IGD. This mode includes several
> - requirements and restrictions:
> -
> - * IGD must be given address 02.0 on the PCI root bus in the VM
> - * The host kernel must support vfio extensions for IGD (v4.6)
> - * vfio VGA support very likely needs to be enabled in the host kernel
> - * The VM firmware must support specific fw_cfg enablers for IGD
> - * The VM machine type must support a PCI host bridge at 00.0 (standard)
> - * The VM machine type must provide or allow to be created a special
> - ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
> - PCI address 1f.0.
> - * The IGD device must have a VGA ROM, either provided via the romfile
> - option or loaded automatically through vfio (standard). rombar=0
> - will disable legacy mode support.
> - * Hotplug of the IGD device is not supported.
> - * The IGD device must be a SandyBridge or newer model device.
> +Using vfio-pci, an Intel Graphics Device (IGD) can be passed through to a
> +guest, either as the primary and exclusive graphics adapter or in combination
> +with an emulated primary graphics device, depending on the configuration and
> +guest driver support. However, IGD devices are not "clean" PCI devices: they
> +use extra memory regions in addition to their BARs. Special handling is
> +required to make them work properly, including:
> +
> +* OpRegion for accessing Virtual BIOS Table (VBT) that contains display output
> + information.
> +* Data Stolen Memory (DSM) region used as VRAM at early stage (BIOS/UEFI)
> +
> +Certain guest software also depends on the following conditions to work
> +(* = required by):
> +
> +| Condition | Linux | Windows | VBIOS | EFI GOP |
> +|---------------------------------------------|-------|---------|-------|---------|
> +| #1 IGD has a valid OpRegion containing VBT | * ^1 | * | * | * |
> +| #2 VID/DID of LPC bridge at 00:1f.0 matches | | | * | * |
> +| #3 IGD is assigned to BDF 00:02.0 | | | * | * |
> +| #4 IGD has VGA controller device class | | | * | * |
> +| #5 Host's VGA ranges are mapped to IGD | | | * | |
> +| #6 Guest has valid VBIOS or UEFI Option ROM | | | * | * |
> +
> +^1 Though the i915 driver can mock an OpRegion, it is still recommended to
> + use the VBT copied from the host OpRegion to prevent incorrect configuration.
> +
> +For #1, the "x-igd-opregion=on" option exposes a copy of the host IGD OpRegion
> +to the guest via fw_cfg, which guest firmware can use to set up the guest OpRegion.
> +
> +For #2, the "x-igd-lpc=on" option copies the IDs of the host LPC bridge and host
> +bridge to the guest. Currently this is only supported on i440fx machines: q35
> +machines already have an ICH9 LPC bridge, and overwriting its IDs may lead to
> +unexpected behavior.
> +
> +For #3, "addr=2.0" assigns IGD to 00:02.0.
> +
> +For #4, the primary display must be set to IGD in host BIOS.
> +
> +For #5, "x-vga=on" enables guest access to standard VGA IO/MMIO ranges.
> +
> +For #6, a ROM provided either via the ROM BAR or the romfile= option is needed.
> +This Intel document [1] shows how to dump the VBIOS to a file. For the UEFI
> +Option ROM, see the "Guest firmware" section.
> +
> +QEMU also provides a "Legacy" mode that implicitly enables full functionality
> +on IGD. It is automatically enabled when:
> +* Machine type is i440fx
> +* IGD is assigned to guest BDF 00:02.0
> +* ROM BAR or romfile is present
> +
> +In "Legacy" mode, QEMU will automatically set up OpRegion, LPC bridge IDs and
> +VGA range access, which is equivalent to:
> + x-igd-opregion=on,x-igd-lpc=on,x-vga=on
> +
> +By default, "Legacy" mode won't fail; it continues on error. Users can set
> +"x-igd-legacy-mode=on" to force-enable legacy mode. This also checks whether the
> +conditions above for legacy mode are met, and if any error occurs, QEMU will
> +fail immediately. Users can also set "x-igd-legacy-mode=off" to disable legacy
> +mode.
> +
> +In legacy mode, as the guest VGA ranges are assigned to the IGD device, all other
> +graphics devices should be removed. This can be done using "-nographic",
> +"-vga none" or "-nodefaults", along with adding the device using vfio-pci.
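As a rough illustration of the equivalence described above, the sketch below assembles the explicit (non-legacy) form of the vfio-pci device argument in shell. Only the option names come from the text; the helper name igd_device_arg and the default host address are assumptions made for this example, and nothing here launches QEMU.

```shell
#!/bin/sh
# Hypothetical helper: build a -device argument that spells out the
# options legacy mode would otherwise enable implicitly.
# $1: host BDF of the IGD (assumed to be 00:02.0 when omitted)
igd_device_arg() {
    host_bdf="${1:-00:02.0}"
    printf '%s' "vfio-pci,host=${host_bdf},addr=2.0"
    printf '%s' ",x-igd-legacy-mode=off"
    printf '%s\n' ",x-igd-opregion=on,x-igd-lpc=on,x-vga=on"
}

# Print the argument only; QEMU is not started here.
echo "-device $(igd_device_arg 00:02.0)"
```

Spelling the options out this way may make it easier to drop one of them (for example x-vga=on for UEFI guests) than relying on the automatic mode.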
>
> For either mode, depending on the host kernel, the i915 driver in the host
> may generate faults and errors upon re-binding to an IGD device after it
> @@ -73,31 +98,39 @@ DVI, or DisplayPort) may be unsupported in some use cases. In the author's
> experience, even DP to VGA adapters can be troublesome while adapters between
> digital formats work well.
>
> -Usage
> -=====
> -The intention is for IGD assignment to be transparent for users and thus for
> -management tools like libvirt. To make use of legacy mode, simply remove all
> -other graphics options and use "-nographic" and either "-vga none" or
> -"-nodefaults", along with adding the device using vfio-pci:
>
> - -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2
> +Options
> +=======
> +* x-igd-opregion=[on|*off*]
> + Copy host IGD OpRegion and expose it to guest with fw_cfg
> +
> +* x-igd-lpc=[on|*off*]
> + Create a dummy LPC bridge at 00:1f.0 with host VID/DID (i440fx only)
> +
> +* x-igd-legacy-mode=[on|off|*auto*]
> + Enable/Disable legacy mode
> +
> +* x-igd-gms=[hex, default 0]
> + Override the DSM region size in the GGC register; 0 means use the host value.
> + Use this only when the DSM size cannot be changed through the
> + 'DVMT Pre-Allocated' option in host BIOS.
>
> -For UPT mode, retain the default emulated graphics and simply add the vfio-pci
> -device making use of any other bus address other than 02.0. libvirt will
> -default to assigning the device a UPT compatible address while legacy mode
> -users will need to manually edit the XML if using a tool like virt-manager
> -where the VM device address is not expressly specified.
>
> -An experimental vfio-pci option also exists to enable OpRegion, and thus
> -external monitor support, for UPT mode. This can be enabled by adding
> -"x-igd-opregion=on" to the vfio-pci device options for the IGD device. As
> -with legacy mode, this requires the host to support features introduced in
> -the v4.6 kernel. If Intel chooses to embrace this support, the option may
> -be made non-experimental in the future, opening it to libvirt support.
> +Examples
> +========
> +* Adding IGD with automatic legacy mode support
> + -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0
>
> -Developer ABI
> -=============
> -Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
> +* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
> + (For UEFI guests)
> + -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-opregion=on,x-igd-lpc=on,romfile=efi_oprom.rom
> +
> +
> +Guest firmware
> +==============
> +Guest firmware is responsible for setting up OpRegion and Base of Data Stolen
> +Memory (BDSM) in guest address space. IGD passthrough support imposes two
> +fw_cfg requirements on the VM firmware:
>
> 1) "etc/igd-opregion"
>
> @@ -117,17 +150,108 @@ Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
> Firmware must allocate a reserved memory below 4GB with required 1MB
> alignment equal to this size. Additionally the base address of this
> reserved region must be written to the dword BDSM register in PCI config
> - space of the IGD device at offset 0x5C. As this support is related to
> - running the IGD ROM, which has other dependencies on the device appearing
> - at guest address 00:02.0, it's expected that this fw_cfg file is only
> - relevant to a single PCI class VGA device with Intel vendor ID, appearing
> - at PCI bus address 00:02.0.
> + space of the IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using
> + 64-bit BDSM). As this support is related to running the IGD ROM, which
> + has other dependencies on the device appearing at guest address 00:02.0,
> + it's expected that this fw_cfg file is only relevant to a single PCI
> + class VGA device with Intel vendor ID, appearing at PCI bus address 00:02.0.
> +
> +Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support.
> +However, this support has not been accepted by upstream EDK2/OVMF. A recommended
> +solution is to create a virtual OpRom with the following DXE drivers:
> +
> +* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (required)
> +* IntelGopDriver: Closed-source Intel GOP driver
> +* PlatformGopPolicy: Protocol required by IntelGopDriver
> +
> +IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on IGD.
> +
> +The original IgdAssignmentDxe can be found at [3]. An Intel-maintained version
> +with PlatformGopPolicy for industrial computing is at [4]. There is also an
> +unofficially maintained version with newer Gen11+ device support at [5].
> +You need to build them with EDK2.
> +
> +Intel never released the IntelGopDriver to the public. You may contact
> +Intel support to get one as [4] said, if you are an Intel primer customer,
s/primer/premier/ ?
> +or you can try to extract it from your host firmware using "UEFI BIOS Updater"[6].
> +
> +Once you have all the required DXE drivers, an Option ROM can be generated with
> +the EfiRom utility in EDK2, using
> + EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
> + -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi
> +
> +
> +Known issues
> +============
> +When using OVMF as guest firmware, you may encounter the following warning:
> +warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument)
> +Solution:
> +Set the host physical address bits to IOMMU address width using
> + -cpu host,host-phys-bits-limit=<IOMMU address width>
> +Or in libvirt XML with
> + <cpu>
> + <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
> + </cpu>
> +The IOMMU address width can be determined with
> +echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 ))
That's handy!
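A slightly more defensive variant of the one-liner above, as a sketch: it decodes the same field (bits 21:16 of the capability value, plus one) in a function that can be fed a value directly, and only reads sysfs when the file is present. The function name and the sample capability value are made up for illustration; the dmar0 path is the one used in the patch.

```shell
#!/bin/sh
# Decode the address width from a VT-d capability register value.
# $1: capability value as hex digits without the 0x prefix, as sysfs prints it.
iommu_width_from_cap() {
    echo $(( ((0x$1 & 0x3F0000) >> 16) + 1 ))
}

# Sample value chosen for illustration only.
iommu_width_from_cap d2008c40660462

# On a host with an Intel IOMMU, read the real value if it is available.
cap_file=/sys/devices/virtual/iommu/dmar0/intel-iommu/cap
if [ -r "$cap_file" ]; then
    width=$(iommu_width_from_cap "$(cat "$cap_file")")
    echo "-cpu host,host-phys-bits-limit=${width}"
fi
```

Keeping the decoding in a function makes it possible to check the arithmetic on a machine that has no IOMMU exposed in sysfs.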
> +Refer to https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more details.
> +
> +
> +Memory View
> +===========
> +IGD has its own address space. To use system RAM as VRAM, a single-level page
> +table named Graphics Translation Table (GTT) is used for the address
> +translation. Each page table entry points to a 4KB page. The translation flow is:
> +
> +(PTE size 8)             +-------------+---+
> +                         |   Address   | V | V: Valid Bit
> +                         +-------------+---+
> +                         | ...         |   |
> +IGD:0x01ae9010     0xd740| 0x70ffc000  | 1 | Mem:0x42ba3e010^
> +-----------------> 0xd748| 0x42ba3e000 | 1 +------------------>
> +(addr << 12) * 8   0xd750| 0x42ba3f000 | 1 |
> +                         | ...         |   |
> +                         +-------------+---+
I think this was meant to be '(addr >> 12) * 8'. A simpler
representation is just (addr >> 9), but maybe you're trying to
emphasize the PTE size here.
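To make the corrected arithmetic concrete, here is a small shell sketch using the numbers from the diagram. The function names are invented for the example, and the PTE is treated as a bare page address with the valid bit kept separately, as the diagram draws it.

```shell
#!/bin/sh
# Byte offset of the PTE inside the GTT for an IGD address:
# page number (addr >> 12) times the 8-byte PTE size.
gtt_pte_offset() {
    printf '0x%x\n' $(( ($1 >> 12) * 8 ))
}

# Final memory address: page base taken from the PTE plus the low 12 bits
# of the IGD address.
# $1: IGD address, $2: page base address stored in the PTE
gtt_translate() {
    printf '0x%x\n' $(( $2 + ($1 & 0xFFF) ))
}

gtt_pte_offset 0x01ae9010               # offset of the entry to look up
gtt_translate  0x01ae9010 0x42ba3e000   # page base read from that entry
```

With the diagram's values this gives the 0xd748 entry and the 0x42ba3e010 memory address, which is what the figure shows once the shift direction is fixed.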
> +^ The address may be remapped by IOMMU
> +
> +The memory region that stores the GTT is called GTT Stolen Memory (GSM). It is
> +located right below the Data Stolen Memory (DSM). Accessing this region directly
> +is not allowed; any access will immediately freeze the whole system. The only
> +way to access it is through the second half of MMIO BAR0.
> +
> +The Data Stolen Memory is reserved by firmware, and acts as the VRAM in pre-OS
> +environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for
> +reserving a contiguous region and programming its base address into the BDSM
> +register, then letting the VBIOS/GOP driver initialize this region. The
> +illustration below shows how DSM is mapped.
> +
> +       IGD Addr Space                 Host Addr Space         Guest Addr Space
> +       +-------------+                +-------------+         +-------------+
> +       |             |                |             |         |             |
> +       |             |                |             |         |             |
> +       |             |                +-------------+         +-------------+
> +       |             |                | Data Stolen |         | Data Stolen |
> +       |             |                |   (Guest)   |         |   (Guest)   |
> +       |             |  +------------>+-------------+<------->+-------------+<--Guest BDSM
> +       |             |  | Passthrough |             | EPT     |             |   Emulated by QEMU
> +DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             |   Programmed by guest FW
> +       |             |  |             |             |         |             |
> +       |             |  |             |             |         |             |
> +      0+-------------+--+             |             |         |             |
> +                        |             +-------------+         |             |
> +                        |             | Data Stolen |         +-------------+
> +                        |             |   (Host)    |
> +                        +------------>+-------------+<--Host BDSM
> +                            Non-      |             |   "real" one in HW
> +                         Passthrough  |             |   Programmed by host FW
> +                                      +-------------+
>
> Footnotes
> =========
> -[1] Nothing precludes adding additional emulated or assigned graphics devices
> - as non-primary, other than the combination typically not working. I only
> - intend to set user expectations, others are welcome to find working
> - combinations or fix whatever issues prevent this from working in the common
> - case.
> +[1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
> [2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
> +[3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
> + Tianocore bugzilla has been down since Jan 2025 :(
> +[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004
> +[5] https://github.com/tomitamoeko/VfioIgdPkg
> +[6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357
This is great and a much needed update. Thanks!
With above corrections:
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* Re: [PATCH] vfio/igd: Update IGD passthrough docoumentation
From: Cédric Le Goater @ 2025-03-13 8:37 UTC (permalink / raw)
To: Tomita Moeko, Alex Williamson; +Cc: qemu-devel, Corvin Köhne
Please fix in the Subject "documentation"
On 3/12/25 16:50, Tomita Moeko wrote:
> A previous change made the OpRegion and LPC quirks independent of the
> exising legacy mode, update the docoumentation accordingly. More related
existing documentation
Thanks,
C.
* Re: [PATCH] vfio/igd: Update IGD passthrough docoumentation
From: Corvin Köhne @ 2025-03-13 11:04 UTC (permalink / raw)
To: Tomita Moeko, Alex Williamson, Cédric Le Goater; +Cc: qemu-devel
On Wed, 2025-03-12 at 23:50 +0800, Tomita Moeko wrote:
> A previous change made the OpRegion and LPC quirks independent of the
> existing legacy mode; update the documentation accordingly. More related
> topics, such as creating an EFI Option ROM of IGD for OVMF, solving the
> VFIO_DMA_MAP Invalid Argument warning, and details on IGD memory
> internals, are also added.
>
> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
> ---
> docs/igd-assign.txt | 262 ++++++++++++++++++++++++++++++++------------
> 1 file changed, 193 insertions(+), 69 deletions(-)
>
> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> index e17bb50789..c7c4565906 100644
> --- a/docs/igd-assign.txt
> +++ b/docs/igd-assign.txt
> @@ -1,44 +1,69 @@
> Intel Graphics Device (IGD) assignment with vfio-pci
> ====================================================
>
> -IGD has two different modes for assignment using vfio-pci:
> -
> -1) Universal Pass-Through (UPT) mode:
> -
> - In this mode the IGD device is added as a *secondary* (ie. non-primary)
> - graphics device in combination with an emulated primary graphics device.
> - This mode *requires* guest driver support to remove the external
> - dependencies generally associated with IGD (see below). Those guest
> - drivers only support this mode for Broadwell and newer IGD, according to
> - Intel. Additionally, this mode by default, and as officially supported
> - by Intel, does not support direct video output. The intention is to use
> - this mode either to provide hardware acceleration to the emulated graphics
> - or to use this mode in combination with guest-based remote access software,
> - for example VNC (see below for optional output support). This mode
> - theoretically has no device specific handling dependencies on vfio-pci or
> - the VM firmware.
> -
> -2) "Legacy" mode:
> -
> - In this mode the IGD device is intended to be the primary and exclusive
> - graphics device in the VM[1], as such QEMU does not facilitate any sort
> - of remote graphics to the VM in this mode. A connected physical monitor
> - is the intended output device for IGD. This mode includes several
> - requirements and restrictions:
> -
> - * IGD must be given address 02.0 on the PCI root bus in the VM
> - * The host kernel must support vfio extensions for IGD (v4.6)
> - * vfio VGA support very likely needs to be enabled in the host kernel
> - * The VM firmware must support specific fw_cfg enablers for IGD
> - * The VM machine type must support a PCI host bridge at 00.0 (standard)
> - * The VM machine type must provide or allow to be created a special
> - ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
> - PCI address 1f.0.
> - * The IGD device must have a VGA ROM, either provided via the romfile
> - option or loaded automatically through vfio (standard). rombar=0
> - will disable legacy mode support.
> - * Hotplug of the IGD device is not supported.
> - * The IGD device must be a SandyBridge or newer model device.
> +Using vfio-pci, we can pass an Intel Graphics Device (IGD) through to a guest,
> +where it either serves as the primary and exclusive graphics adapter or is used
> +in combination with an emulated primary graphics device, depending on the
> +config and guest driver support. However, IGD devices are not "clean" PCI
> +devices: they use extra memory regions other than BARs. Special handling is
> +required to make them work properly, including:
> +
> +* OpRegion for accessing Virtual BIOS Table (VBT) that contains display output
> + information.
> +* Data Stolen Memory (DSM) region used as VRAM at early stage (BIOS/UEFI)
> +
> +Certain guest software also depends on following conditions to work:
> +(*-Required by)
> +
> +| Condition                                   | Linux | Windows | VBIOS | EFI GOP |
> +|---------------------------------------------|-------|---------|-------|---------|
> +| #1 IGD has a valid OpRegion containing VBT  | * ^1  | *       | *     | *       |
> +| #2 VID/DID of LPC bridge at 00:1f.0 matches |       |         | *     | *       |
> +| #3 IGD is assigned to BDF 00:02.0           |       |         | *     | *       |
> +| #4 IGD has VGA controller device class      |       |         | *     | *       |
> +| #5 Host's VGA ranges are mapped to IGD      |       |         | *     |         |
> +| #6 Guest has valid VBIOS or UEFI Option ROM |       |         | *     | *       |
> +
> +^1 Though the i915 driver is able to mock an OpRegion, it is still recommended
> + to use the VBT copied from host OpRegion to prevent incorrect configuration.
> +
> +For #1, the "x-igd-opregion=on" option exposes a copy of host IGD OpRegion to
> +guest via fw_cfg, where guest firmware can set up guest OpRegion with it.
> +
> +For #2, "x-igd-lpc=on" option copies the IDs of host LPC bridge and host bridge
> +to guest. Currently this is only supported on i440fx machines as there is
> +already an ICH9 LPC bridge present on q35 machines, overwriting its IDs may
> +lead to unexpected behavior.
> +
> +For #3, "addr=2.0" assigns IGD to 00:02.0.
> +
> +For #4, the primary display must be set to IGD in host BIOS.
> +
> +For #5, "x-vga=on" enables guest access to standard VGA IO/MMIO ranges.
> +
> +For #6, a ROM provided via either the ROM BAR or the romfile= option is needed.
> +This Intel document [1] shows how to dump the VBIOS to a file. For UEFI Option
> +ROM, see the "Guest firmware" section.
> +
> +QEMU also provides a "Legacy" mode that implicitly enables full functionality
> +on IGD. It is automatically enabled when:
> +* Machine type is i440fx
> +* IGD is assigned to guest BDF 00:02.0
> +* ROM BAR or romfile is present
> +
> +In "Legacy" mode, QEMU will automatically set up OpRegion, LPC bridge IDs and
> +VGA range access, which is equivalent to:
> + x-igd-opregion=on,x-igd-lpc=on,x-vga=on
> +
> +By default, "Legacy" mode won't fail; it continues on error. Users can set
> +"x-igd-legacy-mode=on" to force-enable legacy mode. This also checks whether
> +the conditions above for legacy mode are met, and if any error occurs, QEMU
> +will fail immediately. Users can also set "x-igd-legacy-mode=off" to disable
> +legacy mode.
> +
> +In legacy mode, as the guest VGA ranges are assigned to the IGD device, all
> +other graphics devices should be removed. This can be done using "-nographic"
> +or "-vga none" or "-nodefaults", along with adding the device using vfio-pci.
>
> For either mode, depending on the host kernel, the i915 driver in the host
> may generate faults and errors upon re-binding to an IGD device after it
> @@ -73,31 +98,39 @@ DVI, or DisplayPort) may be unsupported in some use cases. In the author's
> experience, even DP to VGA adapters can be troublesome while adapters between
> digital formats work well.
>
> -Usage
> -=====
> -The intention is for IGD assignment to be transparent for users and thus for
> -management tools like libvirt. To make use of legacy mode, simply remove all
> -other graphics options and use "-nographic" and either "-vga none" or
> -"-nodefaults", along with adding the device using vfio-pci:
>
> - -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2
> +Options
> +=======
> +* x-igd-opregion=[on|*off*]
> + Copy host IGD OpRegion and expose it to guest with fw_cfg
> +
> +* x-igd-lpc=[on|*off*]
> + Creates a dummy LPC bridge at 00:1f.0 with host VID/DID (i440fx only)
> +
> +* x-igd-legacy-mode=[on|off|*auto*]
> + Enable/Disable legacy mode
> +
> +* x-igd-gms=[hex, default 0]
> + Overrides the DSM region size in the GGC register; 0 means use the host value.
> + Use this only when the DSM size cannot be changed through the
> + 'DVMT Pre-Allocated' option in host BIOS.
>
> -For UPT mode, retain the default emulated graphics and simply add the vfio-pci
> -device making use of any other bus address other than 02.0. libvirt will
> -default to assigning the device a UPT compatible address while legacy mode
> -users will need to manually edit the XML if using a tool like virt-manager
> -where the VM device address is not expressly specified.
>
> -An experimental vfio-pci option also exists to enable OpRegion, and thus
> -external monitor support, for UPT mode. This can be enabled by adding
> -"x-igd-opregion=on" to the vfio-pci device options for the IGD device. As
> -with legacy mode, this requires the host to support features introduced in
> -the v4.6 kernel. If Intel chooses to embrace this support, the option may
> -be made non-experimental in the future, opening it to libvirt support.
> +Examples
> +========
> +* Adding IGD with automatic legacy mode support
> + -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0
>
> -Developer ABI
> -=============
> -Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
> +* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
> + (For UEFI guests)
> + -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-opregion=on,x-igd-lpc=on,romfile=efi_oprom.rom
> +
> +
> +Guest firmware
> +==============
> +Guest firmware is responsible for setting up OpRegion and Base of Data Stolen
> +Memory (BDSM) in guest address space. IGD passthrough support imposes two
> +fw_cfg requirements on the VM firmware:
>
> 1) "etc/igd-opregion"
>
> @@ -117,17 +150,108 @@ Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
> Firmware must allocate a reserved memory below 4GB with required 1MB
> alignment equal to this size. Additionally the base address of this
> reserved region must be written to the dword BDSM register in PCI config
> - space of the IGD device at offset 0x5C. As this support is related to
> - running the IGD ROM, which has other dependencies on the device appearing
> - at guest address 00:02.0, it's expected that this fw_cfg file is only
> - relevant to a single PCI class VGA device with Intel vendor ID, appearing
> - at PCI bus address 00:02.0.
> + space of the IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using
> + 64-bit BDSM). As this support is related to running the IGD ROM, which
> + has other dependencies on the device appearing at guest address 00:02.0,
> + it's expected that this fw_cfg file is only relevant to a single PCI
> + class VGA device with Intel vendor ID, appearing at PCI bus address 00:02.0.
> +
> +Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support.
> +However, this support has not been accepted by upstream EDK2/OVMF. A
> +recommended solution is to create a virtual OpRom with the following DXE drivers:
> +
> +* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (must)
> +* IntelGopDriver: Closed-source Intel GOP driver
> +* PlatformGopPolicy: Protocol required by IntelGopDriver
> +
> +IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on IGD.
> +
> +The original IgdAssignmentDxe can be found at [3]. An Intel-maintained version
> +with PlatformGopPolicy for industrial computing is at [4]. There is also an
> +unofficially maintained version with newer Gen11+ device support at [5].
> +You need to build them with EDK2.
> +
> +Intel never released IntelGopDriver to the public. You may contact Intel
> +support to get one, as [4] says, if you are an Intel premier customer, or you
> +can try to extract it from your host firmware using "UEFI BIOS Updater"[6].
> +
> +Once you have all the required DXE drivers, an Option ROM can be generated with
> +the EfiRom utility in EDK2, using
> + EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
> + -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi
> +
> +
> +Known issues
> +============
> +When using OVMF as guest firmware, you may encounter the following warning:
> +warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument)
> +Solution:
> +Set the host physical address bits to IOMMU address width using
> + -cpu host,host-phys-bits-limit=<IOMMU address width>
> +Or in libvirt XML with
> + <cpu>
> + <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
> + </cpu>
> +The IOMMU address width can be determined with
> +echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 ))
> +Refer to https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more details.
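For anyone checking the width calculation without a VT-d host at hand, here is a minimal sketch that wraps the same mask and shift in a function. The name mgaw_bits and the sample capability values in the usage note are illustrative assumptions, not part of the patch; only the 0x3F0000 mask, the shift by 16 and the "+ 1" come from the one-liner above.

```shell
#!/bin/sh
# mgaw_bits CAP_HEX (hypothetical helper name)
# Print the address width encoded in bits 21:16 of a capability register
# value, using the same arithmetic as the one-liner in the patch.
# CAP_HEX may be given with or without a leading "0x".
mgaw_bits() {
    cap=${1#0x}    # sysfs prints the register without a 0x prefix
    echo $(( ((0x$cap & 0x3F0000) >> 16) + 1 ))
}
```

On a real host this would be called as mgaw_bits "$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap)", and the result passed to host-phys-bits-limit. With a made-up register value such as d2008c40660462 the function prints 39.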
> +
> +
> +Memory View
> +===========
> +IGD has its own address space. To use system RAM as VRAM, a single-level page
> +table named Graphics Translation Table (GTT) is used for the address
> +translation. Each page table entry points to a 4KB page. The translation flow is:
> +(PTE size 8) +-------------+---+
> + | Address | V | V: Valid Bit
> + +-------------+---+
> + | ... | |
> +IGD:0x01ae9010 0xd740| 0x70ffc000 | 1 | Mem:0x42ba3e010^
> +-----------------> 0xd748| 0x42ba3e000 | 1 +------------------>
> +(addr >> 12) * 8 0xd750| 0x42ba3f000 | 1 |
> + | ... | |
> + +-------------+---+
> +^ The address may be remapped by IOMMU
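The lookup in the diagram can be reproduced with shell arithmetic. This is only a rough sketch: it takes the 8-byte PTE size and 4KB page size from the text, and additionally assumes that the valid bit is bit 0 of the PTE and that the low 12 bits of the PTE carry no address bits. The function names are made up for illustration.

```shell
#!/bin/sh
# gtt_pte_offset IGD_ADDR (hypothetical helper)
# Byte offset of the page table entry inside the GTT:
# page index (address / 4KB) times the 8-byte PTE size.
gtt_pte_offset() {
    printf '0x%x\n' $(( ($1 >> 12) * 8 ))
}

# gtt_translate PTE IGD_ADDR (hypothetical helper)
# Combine the page address held in the PTE with the low 12 bits of the
# IGD address. Assumes bit 0 of the PTE is the valid bit.
gtt_translate() {
    pte=$1
    addr=$2
    if [ $(( $pte & 1 )) -eq 0 ]; then
        echo invalid
        return 1
    fi
    printf '0x%x\n' $(( ($pte & ~0xFFF) | ($addr & 0xFFF) ))
}
```

With the values from the diagram, gtt_pte_offset 0x01ae9010 prints 0xd748, and gtt_translate 0x42ba3e001 0x01ae9010 prints 0x42ba3e010 (before any IOMMU remapping).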
> +
> +The memory region that stores the GTT is called GTT Stolen Memory (GSM). It is
> +located right below the Data Stolen Memory (DSM). Accessing this region
> +directly is not allowed; any access will immediately freeze the whole system.
> +The only way to access it is through the second half of MMIO BAR0.
> +
> +The Data Stolen Memory is reserved by firmware, and acts as the VRAM in pre-OS
> +environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for
> +reserving a contiguous region and programming its base address into the BDSM
> +register, then letting the VBIOS/GOP driver initialize this region. The
> +illustration below shows how DSM is mapped.
> +
> + IGD Addr Space Host Addr Space Guest Addr Space
> + +-------------+ +-------------+ +-------------+
> + | | | | | |
> + | | | | | |
> + | | +-------------+ +-------------+
> + | | | Data Stolen | | Data Stolen |
> + | | | (Guest) | | (Guest) |
> + | | +------------>+-------------+<------->+-------------+<--Guest BDSM
> + | | | Passthrough | | EPT | | Emulated by QEMU
> +DSMSIZE+-------------+ | with IOMMU | | Mapping | | Programmed by guest FW
> + | | | | | | |
> + | | | | | | |
> + 0+-------------+--+ | | | |
> + | +-------------+ | |
> + | | Data Stolen | +-------------+
> + | | (Host) |
> + +------------>+-------------+<--Host BDSM
> + Non- | | "real" one in HW
> + Passthrough | | Programmed by host FW
> + +-------------+
>
> Footnotes
> =========
> -[1] Nothing precludes adding additional emulated or assigned graphics devices
> - as non-primary, other than the combination typically not working. I only
> - intend to set user expectations, others are welcome to find working
> - combinations or fix whatever issues prevent this from working in the common
> - case.
> +[1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
> [2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
> +[3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
> + Tianocore bugzilla was down since Jan 2025 :(
> +[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004
> +[5] https://github.com/tomitamoeko/VfioIgdPkg
> +[6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>