qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] vfio/igd: remove incorrect IO BAR4 quirk
@ 2025-01-24 19:12 Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 1/5] vfio/igd: remove GTT write quirk in IO BAR 4 Tomita Moeko
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Tomita Moeko @ 2025-01-24 19:12 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater; +Cc: qemu-devel, Tomita Moeko

Based on experiments and reverse engineering about the mysterious IO
BAR4, it appears that the current quirk implementation is incorrect.
As discussed in a previous mail thread [1], current implementation
believes VBIOS is writing HPA of Data Stolen Memory (DSM) in GTT
entries, so it intercepts and writes (addr - Host BDSM + Guest BDSM)
instead, probably believes the addresses are hardcoded in VBIOS.

With disassembling a VBIOS of Intel Sandy/Ivy Bridge and Haswell [2]
(thankfully there is only a few 32-bit out instructions), VBIOS read
BDSM from pci config space, which is programmed by SeaBIOS [3] in
guest. That is to say, VBIOS actually writes GPA when programming GTT
initial entries, the current quirk is incorrect.

The change was tested on a Kaby Lake HD620 (Gen 9) by comparing GTT
entries read from MMIO BAR0 and IO BAR4.

This patchset also creates a new function vfio_config_quirk_setup()
for the config space quirk previously in vfio_probe_igd_bar4_quirk().
The OpRegion quirk in vfio_realize() is also merged into the config
qurk. Finally all the igd-related codes are moved to igd.c.

[1] https://lore.kernel.org/qemu-devel/b373a030-5ccf-418c-9213-865ddc6748fd@gmail.com/
[2] https://winraid.level1techs.com/t/offer-intel-sandy-ivy-bridge-and-haswell-vbios-modules/30272
[3] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332

Related VBIOS disassembly:

# read BDSM
6498:	66 53                	push   %ebx
649a:	66 2e 83 3e 94 64 00 	cmpl   $0x0,%cs:0x6494  # BDSM already read?
64a1:	74 07                	je     0x64aa
64a3:	66 2e a1 94 64       	mov    %cs:0x6494,%eax 
64a8:	eb 0f                	jmp    0x64b9
64aa:	b8 5e 10             	mov    $0x105e,%ax      # 5e == 5c + 2, that's why shl 16 later?
                                                    # ((2 << 11) + 0x5e)
64ad:	e8 55 05             	call   0x6a05           # read config space via port cf8/cfc
64b0:	66 c1 e0 10          	shl    $0x10,%eax       # eax << 16
64b4:	66 2e a3 94 64       	mov    %eax,%cs:0x6494  # save BDSM
64b9:	66 5b                	pop    %ebx
64bb:	c3                   	ret
# program single GTT entry, esi: addr, eax: data
6d16:	66 50                	push   %eax
6d18:	52                   	push   %dx
6d19:	2e 8b 16 4a e4       	mov    %cs:-0x1bb6,%dx  # saved MMIO_Index port
6d1e:	66 96                	xchg   %eax,%esi
6d20:	66 ef                	out    %eax,(%dx)       # write MMIO_Index
6d22:	2e 8b 16 4c e4       	mov    %cs:-0x1bb4,%dx  # saved MMIO_Data port
6d27:	66 96                	xchg   %eax,%esi
6d29:	66 83 c8 01          	or     $0x1,%eax        # set valid bit
6d2d:	66 ef                	out    %eax,(%dx)       # write MMIO_Data
6d2f:	5a                   	pop    %dx
6d30:	66 58                	pop    %eax
6d32:	c3                   	ret
# program GTT entries, esi: entry addr, eax: page addr
e929:	51                   	push   %cx
e92a:	e8 6b 7b             	call   0x6498       # read BDSM
e92d:	25 00 e0             	and    $0xe000,%ax
e930:	66 83 c8 01          	or     $0x1,%eax    # valid bit
e934:	66 be 01 00 00 00    	mov    $0x1,%esi    # gtt access bit in addr
e93a:	e8 d9 83             	call   0x6d16       # single entry routine
e93d:	66 83 c6 04          	add    $0x4,%esi    # next entry
e941:	66 05 00 10 00 00    	add    $0x1000,%eax # next page
e947:	e2 f1                	loop   0xe93a
e949:	59                   	pop    %cx
e94a:	c3                   	ret

Observed layout of the undocumented MMIO_Index register:
 31                                                   2   1      0
+-------------------------------------------------------------------+
|                        Offset                        | Rsvd | Sel |
+-------------------------------------------------------------------+
- Offset: Byte offset in specified region, 4-byte aligned.
- Sel: Region selector
       0: MMIO register region (first half of MMIO BAR0)
       1: GTT region (second half of MMIO BAR0). Pre Gen11 only.


Changelog:
v2:
* Move the OpRegion quirk from vfio_realize() to vfio_probe_igd_config_
  quirk(), as proposed in v1, making it reasonable to return bool.
Link: https://lore.kernel.org/qemu-devel/20250122171731.40444-1-tomitamoeko@gmail.com/

Tomita Moeko (5):
  vfio/igd: remove GTT write quirk in IO BAR 4
  vfio/pci: add placeholder for device-specific config space quirks
  vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk
  vfio/igd: do not include GTT stolen size in etc/igd-bdsm-size
  vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()

 hw/vfio/igd.c        | 381 ++++++++++++++-----------------------------
 hw/vfio/pci-quirks.c |  51 +-----
 hw/vfio/pci.c        |  29 +---
 hw/vfio/pci.h        |   7 +-
 4 files changed, 136 insertions(+), 332 deletions(-)

-- 
2.45.2



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/5] vfio/igd: remove GTT write quirk in IO BAR 4
  2025-01-24 19:12 [PATCH v2 0/5] vfio/igd: remove incorrect IO BAR4 quirk Tomita Moeko
@ 2025-01-24 19:12 ` Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 2/5] vfio/pci: add placeholder for device-specific config space quirks Tomita Moeko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Tomita Moeko @ 2025-01-24 19:12 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater; +Cc: qemu-devel, Tomita Moeko

The IO BAR4 of IGD devices contains a pair of 32-bit address/data
registers, MMIO_Index (0x0) and MMIO_Data (0x4), which provide access
to the MMIO BAR0 (GTTMMADR) from IO space. These registers are probably
only used by the VBIOS, and are not documented by intel. The observed
layout of MMIO_Index register is:
 31                                                   2   1      0
+-------------------------------------------------------------------+
|                        Offset                        | Rsvd | Sel |
+-------------------------------------------------------------------+
- Offset: Byte offset in specified region, 4-byte aligned.
- Sel: Region selector
       0: MMIO register region (first half of MMIO BAR0)
       1: GTT region (second half of MMIO BAR0). Pre Gen11 only.

Currently, QEMU implements a quirk that adjusts the guest Data Stolen
Memory (DSM) region address to be (addr - host BDSM + guest BDSM) when
programming GTT entries via IO BAR4, assuming guest still programs GTT
with host DSM address, which is not the case. Guest's BDSM register is
emulated and initialized to 0 at startup by QEMU, then SeaBIOS programs
its value[1]. As result, the address programmed to GTT entries by VBIOS
running in guest are valid GPA, and this unnecessary adjustment brings
inconsistency.

[1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
---
 hw/vfio/igd.c | 191 +-------------------------------------------------
 1 file changed, 1 insertion(+), 190 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index b1a237edd6..ca3a32f4f2 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -106,12 +106,6 @@ static int igd_gen(VFIOPCIDevice *vdev)
     return -1;
 }
 
-typedef struct VFIOIGDQuirk {
-    struct VFIOPCIDevice *vdev;
-    uint32_t index;
-    uint64_t bdsm;
-} VFIOIGDQuirk;
-
 #define IGD_GMCH 0x50 /* Graphics Control Register */
 #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
 #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
@@ -300,129 +294,6 @@ static int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev,
     return ret;
 }
 
-/*
- * IGD Gen8 and newer support up to 8MB for the GTT and use a 64bit PTE
- * entry, older IGDs use 2MB and 32bit.  Each PTE maps a 4k page.  Therefore
- * we either have 2M/4k * 4 = 2k or 8M/4k * 8 = 16k as the maximum iobar index
- * for programming the GTT.
- *
- * See linux:include/drm/i915_drm.h for shift and mask values.
- */
-static int vfio_igd_gtt_max(VFIOPCIDevice *vdev)
-{
-    uint32_t gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, sizeof(gmch));
-    int gen = igd_gen(vdev);
-    uint64_t ggms_size = igd_gtt_memory_size(gen, gmch);
-
-    return (ggms_size / (4 * KiB)) * (gen < 8 ? 4 : 8);
-}
-
-/*
- * The IGD ROM will make use of stolen memory (GGMS) for support of VESA modes.
- * Somehow the host stolen memory range is used for this, but how the ROM gets
- * it is a mystery, perhaps it's hardcoded into the ROM.  Thankfully though, it
- * reprograms the GTT through the IOBAR where we can trap it and transpose the
- * programming to the VM allocated buffer.  That buffer gets reserved by the VM
- * firmware via the fw_cfg entry added below.  Here we're just monitoring the
- * IOBAR address and data registers to detect a write sequence targeting the
- * GTTADR.  This code is developed by observed behavior and doesn't have a
- * direct spec reference, unfortunately.
- */
-static uint64_t vfio_igd_quirk_data_read(void *opaque,
-                                         hwaddr addr, unsigned size)
-{
-    VFIOIGDQuirk *igd = opaque;
-    VFIOPCIDevice *vdev = igd->vdev;
-
-    igd->index = ~0;
-
-    return vfio_region_read(&vdev->bars[4].region, addr + 4, size);
-}
-
-static void vfio_igd_quirk_data_write(void *opaque, hwaddr addr,
-                                      uint64_t data, unsigned size)
-{
-    VFIOIGDQuirk *igd = opaque;
-    VFIOPCIDevice *vdev = igd->vdev;
-    uint64_t val = data;
-    int gen = igd_gen(vdev);
-
-    /*
-     * Programming the GGMS starts at index 0x1 and uses every 4th index (ie.
-     * 0x1, 0x5, 0x9, 0xd,...).  For pre-Gen8 each 4-byte write is a whole PTE
-     * entry, with 0th bit enable set.  For Gen8 and up, PTEs are 64bit, so
-     * entries 0x5 & 0xd are the high dword, in our case zero.  Each PTE points
-     * to a 4k page, which we translate to a page from the VM allocated region,
-     * pointed to by the BDSM register.  If this is not set, we fail.
-     *
-     * We trap writes to the full configured GTT size, but we typically only
-     * see the vBIOS writing up to (nearly) the 1MB barrier.  In fact it often
-     * seems to miss the last entry for an even 1MB GTT.  Doing a gratuitous
-     * write of that last entry does work, but is hopefully unnecessary since
-     * we clear the previous GTT on initialization.
-     */
-    if ((igd->index % 4 == 1) && igd->index < vfio_igd_gtt_max(vdev)) {
-        if (gen < 8 || (igd->index % 8 == 1)) {
-            uint64_t base;
-
-            if (gen < 11) {
-                base = pci_get_long(vdev->pdev.config + IGD_BDSM);
-            } else {
-                base = pci_get_quad(vdev->pdev.config + IGD_BDSM_GEN11);
-            }
-            if (!base) {
-                hw_error("vfio-igd: Guest attempted to program IGD GTT before "
-                         "BIOS reserved stolen memory.  Unsupported BIOS?");
-            }
-
-            val = data - igd->bdsm + base;
-        } else {
-            val = 0; /* upper 32bits of pte, we only enable below 4G PTEs */
-        }
-
-        trace_vfio_pci_igd_bar4_write(vdev->vbasedev.name,
-                                      igd->index, data, val);
-    }
-
-    vfio_region_write(&vdev->bars[4].region, addr + 4, val, size);
-
-    igd->index = ~0;
-}
-
-static const MemoryRegionOps vfio_igd_data_quirk = {
-    .read = vfio_igd_quirk_data_read,
-    .write = vfio_igd_quirk_data_write,
-    .endianness = DEVICE_LITTLE_ENDIAN,
-};
-
-static uint64_t vfio_igd_quirk_index_read(void *opaque,
-                                          hwaddr addr, unsigned size)
-{
-    VFIOIGDQuirk *igd = opaque;
-    VFIOPCIDevice *vdev = igd->vdev;
-
-    igd->index = ~0;
-
-    return vfio_region_read(&vdev->bars[4].region, addr, size);
-}
-
-static void vfio_igd_quirk_index_write(void *opaque, hwaddr addr,
-                                       uint64_t data, unsigned size)
-{
-    VFIOIGDQuirk *igd = opaque;
-    VFIOPCIDevice *vdev = igd->vdev;
-
-    igd->index = data;
-
-    vfio_region_write(&vdev->bars[4].region, addr, data, size);
-}
-
-static const MemoryRegionOps vfio_igd_index_quirk = {
-    .read = vfio_igd_quirk_index_read,
-    .write = vfio_igd_quirk_index_write,
-    .endianness = DEVICE_LITTLE_ENDIAN,
-};
-
 #define IGD_GGC_MMIO_OFFSET     0x108040
 #define IGD_BDSM_MMIO_OFFSET    0x1080C0
 
@@ -494,14 +365,11 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     g_autofree struct vfio_region_info *opregion = NULL;
     g_autofree struct vfio_region_info *host = NULL;
     g_autofree struct vfio_region_info *lpc = NULL;
-    VFIOQuirk *quirk;
-    VFIOIGDQuirk *igd;
     PCIDevice *lpc_bridge;
-    int i, ret, gen;
+    int ret, gen;
     uint64_t ggms_size, gms_size;
     uint64_t *bdsm_size;
     uint32_t gmch;
-    uint16_t cmd_orig, cmd;
     Error *err = NULL;
 
     /*
@@ -634,32 +502,6 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    /* Setup our quirk to munge GTT addresses to the VM allocated buffer */
-    quirk = vfio_quirk_alloc(2);
-    igd = quirk->data = g_malloc0(sizeof(*igd));
-    igd->vdev = vdev;
-    igd->index = ~0;
-    if (gen < 11) {
-        igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM, 4);
-    } else {
-        igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM_GEN11, 4);
-        igd->bdsm |=
-            (uint64_t)vfio_pci_read_config(&vdev->pdev, IGD_BDSM_GEN11 + 4, 4) << 32;
-    }
-    igd->bdsm &= ~((1 * MiB) - 1); /* 1MB aligned */
-
-    memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_igd_index_quirk,
-                          igd, "vfio-igd-index-quirk", 4);
-    memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
-                                        0, &quirk->mem[0], 1);
-
-    memory_region_init_io(&quirk->mem[1], OBJECT(vdev), &vfio_igd_data_quirk,
-                          igd, "vfio-igd-data-quirk", 4);
-    memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
-                                        4, &quirk->mem[1], 1);
-
-    QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
-
     /*
      * Allow user to override dsm size using x-igd-gms option, in multiples of
      * 32MiB. This option should only be used when the desired size cannot be
@@ -717,37 +559,6 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         pci_set_quad(vdev->emulated_config_bits + IGD_BDSM_GEN11, ~0);
     }
 
-    /*
-     * This IOBAR gives us access to GTTADR, which allows us to write to
-     * the GTT itself.  So let's go ahead and write zero to all the GTT
-     * entries to avoid spurious DMA faults.  Be sure I/O access is enabled
-     * before talking to the device.
-     */
-    if (pread(vdev->vbasedev.fd, &cmd_orig, sizeof(cmd_orig),
-              vdev->config_offset + PCI_COMMAND) != sizeof(cmd_orig)) {
-        error_report("IGD device %s - failed to read PCI command register",
-                     vdev->vbasedev.name);
-    }
-
-    cmd = cmd_orig | PCI_COMMAND_IO;
-
-    if (pwrite(vdev->vbasedev.fd, &cmd, sizeof(cmd),
-               vdev->config_offset + PCI_COMMAND) != sizeof(cmd)) {
-        error_report("IGD device %s - failed to write PCI command register",
-                     vdev->vbasedev.name);
-    }
-
-    for (i = 1; i < vfio_igd_gtt_max(vdev); i += 4) {
-        vfio_region_write(&vdev->bars[4].region, 0, i, 4);
-        vfio_region_write(&vdev->bars[4].region, 4, 0, 4);
-    }
-
-    if (pwrite(vdev->vbasedev.fd, &cmd_orig, sizeof(cmd_orig),
-               vdev->config_offset + PCI_COMMAND) != sizeof(cmd_orig)) {
-        error_report("IGD device %s - failed to restore PCI command register",
-                     vdev->vbasedev.name);
-    }
-
     trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name,
                                     (ggms_size + gms_size) / MiB);
 }
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/5] vfio/pci: add placeholder for device-specific config space quirks
  2025-01-24 19:12 [PATCH v2 0/5] vfio/igd: remove incorrect IO BAR4 quirk Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 1/5] vfio/igd: remove GTT write quirk in IO BAR 4 Tomita Moeko
@ 2025-01-24 19:12 ` Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 3/5] vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk Tomita Moeko
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Tomita Moeko @ 2025-01-24 19:12 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater; +Cc: qemu-devel, Tomita Moeko

Some devices, such as IGD, require device-specific quirks to be applied
to their pci config spaces. Currently, these quirks are either part of
BAR quirk, or being a part of vfio_realize(). Add a placeholder for pci
config quirks for moving the quirks to one place later.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
---
 hw/vfio/pci-quirks.c | 5 +++++
 hw/vfio/pci.c        | 4 ++++
 hw/vfio/pci.h        | 1 +
 3 files changed, 10 insertions(+)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index fbe43b0a79..c40e3ca88f 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1167,6 +1167,11 @@ bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
 /*
  * Common quirk probe entry points.
  */
+bool vfio_config_quirk_setup(VFIOPCIDevice *vdev, Error **errp)
+{
+    return true;
+}
+
 void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
 {
     vfio_vga_probe_ati_3c3_quirk(vdev);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ab17a98ee5..e624ae56c4 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3124,6 +3124,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         goto out_unset_idev;
     }
 
+    if (!vfio_config_quirk_setup(vdev, errp)) {
+        goto out_unset_idev;
+    }
+
     if (vdev->vga) {
         vfio_vga_quirk_setup(vdev);
     }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 43c166680a..5359e94f18 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -205,6 +205,7 @@ uint64_t vfio_vga_read(void *opaque, hwaddr addr, unsigned size);
 void vfio_vga_write(void *opaque, hwaddr addr, uint64_t data, unsigned size);
 
 bool vfio_opt_rom_in_denylist(VFIOPCIDevice *vdev);
+bool vfio_config_quirk_setup(VFIOPCIDevice *vdev, Error **errp);
 void vfio_vga_quirk_setup(VFIOPCIDevice *vdev);
 void vfio_vga_quirk_exit(VFIOPCIDevice *vdev);
 void vfio_vga_quirk_finalize(VFIOPCIDevice *vdev);
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/5] vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk
  2025-01-24 19:12 [PATCH v2 0/5] vfio/igd: remove incorrect IO BAR4 quirk Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 1/5] vfio/igd: remove GTT write quirk in IO BAR 4 Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 2/5] vfio/pci: add placeholder for device-specific config space quirks Tomita Moeko
@ 2025-01-24 19:12 ` Tomita Moeko
  2025-01-31  9:14   ` Cédric Le Goater
  2025-01-24 19:12 ` [PATCH v2 4/5] vfio/igd: do not include GTT stolen size in etc/igd-bdsm-size Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk() Tomita Moeko
  4 siblings, 1 reply; 15+ messages in thread
From: Tomita Moeko @ 2025-01-24 19:12 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater; +Cc: qemu-devel, Tomita Moeko

The actual IO BAR4 write quirk in vfio_probe_igd_bar4_quirk() was
removed in previous change, leaving the function not matching its name,
so move it into the newly introduced vfio_config_quirk_setup(). There
is no functional change in this commit. If any failure occurs, the
function simply returns and proceeds.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
---
 hw/vfio/igd.c        | 31 +++++++++++++++++--------------
 hw/vfio/pci-quirks.c |  6 +++++-
 hw/vfio/pci.h        |  2 +-
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index ca3a32f4f2..376a26fbae 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -359,7 +359,8 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
 }
 
-void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
+bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
+                                 Error **errp G_GNUC_UNUSED)
 {
     g_autofree struct vfio_region_info *rom = NULL;
     g_autofree struct vfio_region_info *opregion = NULL;
@@ -378,10 +379,10 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
      * PCI bus address.
      */
     if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
-        !vfio_is_vga(vdev) || nr != 4 ||
+        !vfio_is_vga(vdev) ||
         &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
                                        0, PCI_DEVFN(0x2, 0))) {
-        return;
+        return true;
     }
 
     /*
@@ -395,7 +396,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
                                            "vfio-pci-igd-lpc-bridge")) {
         error_report("IGD device %s cannot support legacy mode due to existing "
                      "devices at address 1f.0", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     /*
@@ -407,7 +408,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (gen == -1) {
         error_report("IGD device %s is unsupported in legacy mode, "
                      "try SandyBridge or newer", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     /*
@@ -420,7 +421,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if ((ret || !rom->size) && !vdev->pdev.romfile) {
         error_report("IGD device %s has no ROM, legacy mode disabled",
                      vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     /*
@@ -431,7 +432,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         error_report("IGD device %s hotplugged, ROM disabled, "
                      "legacy mode disabled", vdev->vbasedev.name);
         vdev->rom_read_failed = true;
-        return;
+        return true;
     }
 
     /*
@@ -444,7 +445,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s does not support OpRegion access,"
                      "legacy mode disabled", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     ret = vfio_get_dev_region_info(&vdev->vbasedev,
@@ -453,7 +454,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s does not support host bridge access,"
                      "legacy mode disabled", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     ret = vfio_get_dev_region_info(&vdev->vbasedev,
@@ -462,7 +463,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s does not support LPC bridge access,"
                      "legacy mode disabled", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, 4);
@@ -476,7 +477,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
         error_report("IGD device %s failed to enable VGA access, "
                      "legacy mode disabled", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     /* Create our LPC/ISA bridge */
@@ -484,7 +485,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s failed to create LPC bridge, "
                      "legacy mode disabled", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     /* Stuff some host values into the VM PCI host bridge */
@@ -492,14 +493,14 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s failed to modify host bridge, "
                      "legacy mode disabled", vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     /* Setup OpRegion access */
     if (!vfio_pci_igd_opregion_init(vdev, opregion, &err)) {
         error_append_hint(&err, "IGD legacy mode disabled\n");
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
-        return;
+        return true;
     }
 
     /*
@@ -561,4 +562,6 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 
     trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name,
                                     (ggms_size + gms_size) / MiB);
+
+    return true;
 }
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index c40e3ca88f..b8379cb512 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1169,6 +1169,11 @@ bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
  */
 bool vfio_config_quirk_setup(VFIOPCIDevice *vdev, Error **errp)
 {
+#ifdef CONFIG_VFIO_IGD
+    if (!vfio_probe_igd_config_quirk(vdev, errp)) {
+        return false;
+    }
+#endif
     return true;
 }
 
@@ -1220,7 +1225,6 @@ void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
     vfio_probe_rtl8168_bar2_quirk(vdev, nr);
 #ifdef CONFIG_VFIO_IGD
     vfio_probe_igd_bar0_quirk(vdev, nr);
-    vfio_probe_igd_bar4_quirk(vdev, nr);
 #endif
 }
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 5359e94f18..5c64de0270 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -217,7 +217,7 @@ bool vfio_add_virt_caps(VFIOPCIDevice *vdev, Error **errp);
 void vfio_quirk_reset(VFIOPCIDevice *vdev);
 VFIOQuirk *vfio_quirk_alloc(int nr_mem);
 void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr);
-void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
+bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev, Error **errp);
 
 extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
 
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/5] vfio/igd: do not include GTT stolen size in etc/igd-bdsm-size
  2025-01-24 19:12 [PATCH v2 0/5] vfio/igd: remove incorrect IO BAR4 quirk Tomita Moeko
                   ` (2 preceding siblings ...)
  2025-01-24 19:12 ` [PATCH v2 3/5] vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk Tomita Moeko
@ 2025-01-24 19:12 ` Tomita Moeko
  2025-01-24 19:12 ` [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk() Tomita Moeko
  4 siblings, 0 replies; 15+ messages in thread
From: Tomita Moeko @ 2025-01-24 19:12 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater; +Cc: qemu-devel, Tomita Moeko

Though GTT Stolen Memory (GSM) is right below Data Stolen Memory (DSM)
in host address space, direct access to GSM is prohibited, and it is
not mapped to guest address space. Both host and guest accesses GSM
indirectly through the second half of MMIO BAR0 (GTTMMADR).

Guest firmware only need to reserve a memory region for DSM and program
the BDSM register with the base address of that region, that's actually
what both SeaBIOS[1] and OVMF does now.

[1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
---
 hw/vfio/igd.c | 28 +++-------------------------
 1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 376a26fbae..6e06dd774a 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -112,28 +112,8 @@ static int igd_gen(VFIOPCIDevice *vdev)
 
 #define IGD_GMCH_GEN6_GMS_SHIFT     3       /* SNB_GMCH in i915 */
 #define IGD_GMCH_GEN6_GMS_MASK      0x1f
-#define IGD_GMCH_GEN6_GGMS_SHIFT    8
-#define IGD_GMCH_GEN6_GGMS_MASK     0x3
 #define IGD_GMCH_GEN8_GMS_SHIFT     8       /* BDW_GMCH in i915 */
 #define IGD_GMCH_GEN8_GMS_MASK      0xff
-#define IGD_GMCH_GEN8_GGMS_SHIFT    6
-#define IGD_GMCH_GEN8_GGMS_MASK     0x3
-
-static uint64_t igd_gtt_memory_size(int gen, uint16_t gmch)
-{
-    uint64_t ggms;
-
-    if (gen < 8) {
-        ggms = (gmch >> IGD_GMCH_GEN6_GGMS_SHIFT) & IGD_GMCH_GEN6_GGMS_MASK;
-    } else {
-        ggms = (gmch >> IGD_GMCH_GEN8_GGMS_SHIFT) & IGD_GMCH_GEN8_GGMS_MASK;
-        if (ggms != 0) {
-            ggms = 1ULL << ggms;
-        }
-    }
-
-    return ggms * MiB;
-}
 
 static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
 {
@@ -368,7 +348,7 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
     g_autofree struct vfio_region_info *lpc = NULL;
     PCIDevice *lpc_bridge;
     int ret, gen;
-    uint64_t ggms_size, gms_size;
+    uint64_t gms_size;
     uint64_t *bdsm_size;
     uint32_t gmch;
     Error *err = NULL;
@@ -528,7 +508,6 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
         }
     }
 
-    ggms_size = igd_gtt_memory_size(gen, gmch);
     gms_size = igd_stolen_memory_size(gen, gmch);
 
     /*
@@ -540,7 +519,7 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
      * config offset 0x5C.
      */
     bdsm_size = g_malloc(sizeof(*bdsm_size));
-    *bdsm_size = cpu_to_le64(ggms_size + gms_size);
+    *bdsm_size = cpu_to_le64(gms_size);
     fw_cfg_add_file(fw_cfg_find(), "etc/igd-bdsm-size",
                     bdsm_size, sizeof(*bdsm_size));
 
@@ -560,8 +539,7 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
         pci_set_quad(vdev->emulated_config_bits + IGD_BDSM_GEN11, ~0);
     }
 
-    trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name,
-                                    (ggms_size + gms_size) / MiB);
+    trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name, (gms_size / MiB));
 
     return true;
 }
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-24 19:12 [PATCH v2 0/5] vfio/igd: remove incorrect IO BAR4 quirk Tomita Moeko
                   ` (3 preceding siblings ...)
  2025-01-24 19:12 ` [PATCH v2 4/5] vfio/igd: do not include GTT stolen size in etc/igd-bdsm-size Tomita Moeko
@ 2025-01-24 19:12 ` Tomita Moeko
  2025-01-24 21:13   ` Alex Williamson
  4 siblings, 1 reply; 15+ messages in thread
From: Tomita Moeko @ 2025-01-24 19:12 UTC (permalink / raw)
  To: Alex Williamson, Cédric Le Goater; +Cc: qemu-devel, Tomita Moeko

Both enable opregion option (x-igd-opregion) and legacy mode require
setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
code. Finally we moved all the IGD-related code into igd.c.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
---
 hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
 hw/vfio/pci-quirks.c |  50 ---------------
 hw/vfio/pci.c        |  25 --------
 hw/vfio/pci.h        |   4 --
 4 files changed, 110 insertions(+), 112 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 6e06dd774a..015beacf5f 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
     return -1;
 }
 
+#define IGD_ASLS 0xfc /* ASL Storage Register */
 #define IGD_GMCH 0x50 /* Graphics Control Register */
 #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
 #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
@@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
     return 0;
 }
 
+/*
+ * The OpRegion includes the Video BIOS Table, which seems important for
+ * telling the driver what sort of outputs it has.  Without this, the device
+ * may work in the guest, but we may not get output.  This also requires BIOS
+ * support to reserve and populate a section of guest memory sufficient for
+ * the table and to write the base address of that memory to the ASLS register
+ * of the IGD device.
+ */
+static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
+                                       struct vfio_region_info *info,
+                                       Error **errp)
+{
+    int ret;
+
+    vdev->igd_opregion = g_malloc0(info->size);
+    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
+                info->size, info->offset);
+    if (ret != info->size) {
+        error_setg(errp, "failed to read IGD OpRegion");
+        g_free(vdev->igd_opregion);
+        vdev->igd_opregion = NULL;
+        return false;
+    }
+
+    /*
+     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
+     * allocate 32bit reserved memory for, copy these contents into, and write
+     * the reserved memory base address to the device ASLS register at 0xFC.
+     * Alignment of this reserved region seems flexible, but using a 4k page
+     * alignment seems to work well.  This interface assumes a single IGD
+     * device, which may be at VM address 00:02.0 in legacy mode or another
+     * address in UPT mode.
+     *
+     * NB, there may be future use cases discovered where the VM should have
+     * direct interaction with the host OpRegion, in which case the write to
+     * the ASLS register would trigger MemoryRegion setup to enable that.
+     */
+    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
+                    vdev->igd_opregion, info->size);
+
+    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
+
+    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
+    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
+    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
+
+    return true;
+}
+
 /*
  * The rather short list of registers that we copy from the host devices.
  * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
@@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
 }
 
+static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
+{
+    g_autofree struct vfio_region_info *opregion = NULL;
+    int ret;
+
+    /*
+     * Hotplugging is not supprted for both opregion access and legacy mode.
+     * For legacy mode, we also need to mark the ROM failed.
+     */
+    if (vdev->pdev.qdev.hotplugged) {
+        vdev->rom_read_failed = true;
+        error_setg(errp,
+                   "IGD OpRegion is not supported on hotplugged device");
+        return false;
+    }
+
+    ret = vfio_get_dev_region_info(&vdev->vbasedev,
+                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
+                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
+    if (ret) {
+        error_setg_errno(errp, -ret,
+                         "device does not supports IGD OpRegion feature");
+        return false;
+    }
+
+    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
+        return false;
+    }
+
+    return true;
+}
+
 bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
-                                 Error **errp G_GNUC_UNUSED)
+                                 Error **errp)
 {
     g_autofree struct vfio_region_info *rom = NULL;
-    g_autofree struct vfio_region_info *opregion = NULL;
     g_autofree struct vfio_region_info *host = NULL;
     g_autofree struct vfio_region_info *lpc = NULL;
+    PCIBus *bus;
     PCIDevice *lpc_bridge;
     int ret, gen;
+    bool legacy_mode, enable_opregion;
     uint64_t gms_size;
     uint64_t *bdsm_size;
     uint32_t gmch;
     Error *err = NULL;
 
+    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
+        !vfio_is_vga(vdev)) {
+        return true;
+    }
+
     /*
      * This must be an Intel VGA device at address 00:02.0 for us to even
      * consider enabling legacy mode.  The vBIOS has dependencies on the
      * PCI bus address.
      */
-    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
-        !vfio_is_vga(vdev) ||
-        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
-                                       0, PCI_DEVFN(0x2, 0))) {
+    bus = pci_device_root_bus(&vdev->pdev);
+    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
+    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
+
+    if (!enable_opregion && !legacy_mode) {
+        return true;
+    }
+
+    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
+        if (enable_opregion) {
+            error_propagate(errp, err);
+            return false;
+        } else if (legacy_mode) {
+            error_append_hint(&err, "IGD legacy mode disabled\n");
+            error_report_err(err);
+            return true;
+        }
+    }
+
+    if (!legacy_mode) {
         return true;
     }
 
@@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
         return true;
     }
 
-    /*
-     * Ignore the hotplug corner case, mark the ROM failed, we can't
-     * create the devices we need for legacy mode in the hotplug scenario.
-     */
-    if (vdev->pdev.qdev.hotplugged) {
-        error_report("IGD device %s hotplugged, ROM disabled, "
-                     "legacy mode disabled", vdev->vbasedev.name);
-        vdev->rom_read_failed = true;
-        return true;
-    }
-
     /*
      * Check whether we have all the vfio device specific regions to
      * support legacy mode (added in Linux v4.6).  If not, bail.
      */
-    ret = vfio_get_dev_region_info(&vdev->vbasedev,
-                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
-                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
-    if (ret) {
-        error_report("IGD device %s does not support OpRegion access,"
-                     "legacy mode disabled", vdev->vbasedev.name);
-        return true;
-    }
-
     ret = vfio_get_dev_region_info(&vdev->vbasedev,
                         VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
                         VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG, &host);
@@ -476,13 +560,6 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
         return true;
     }
 
-    /* Setup OpRegion access */
-    if (!vfio_pci_igd_opregion_init(vdev, opregion, &err)) {
-        error_append_hint(&err, "IGD legacy mode disabled\n");
-        error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
-        return true;
-    }
-
     /*
      * Allow user to override dsm size using x-igd-gms option, in multiples of
      * 32MiB. This option should only be used when the desired size cannot be
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index b8379cb512..f2b37910f0 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1114,56 +1114,6 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
     trace_vfio_quirk_rtl8168_probe(vdev->vbasedev.name);
 }
 
-#define IGD_ASLS 0xfc /* ASL Storage Register */
-
-/*
- * The OpRegion includes the Video BIOS Table, which seems important for
- * telling the driver what sort of outputs it has.  Without this, the device
- * may work in the guest, but we may not get output.  This also requires BIOS
- * support to reserve and populate a section of guest memory sufficient for
- * the table and to write the base address of that memory to the ASLS register
- * of the IGD device.
- */
-bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
-                                struct vfio_region_info *info, Error **errp)
-{
-    int ret;
-
-    vdev->igd_opregion = g_malloc0(info->size);
-    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
-                info->size, info->offset);
-    if (ret != info->size) {
-        error_setg(errp, "failed to read IGD OpRegion");
-        g_free(vdev->igd_opregion);
-        vdev->igd_opregion = NULL;
-        return false;
-    }
-
-    /*
-     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
-     * allocate 32bit reserved memory for, copy these contents into, and write
-     * the reserved memory base address to the device ASLS register at 0xFC.
-     * Alignment of this reserved region seems flexible, but using a 4k page
-     * alignment seems to work well.  This interface assumes a single IGD
-     * device, which may be at VM address 00:02.0 in legacy mode or another
-     * address in UPT mode.
-     *
-     * NB, there may be future use cases discovered where the VM should have
-     * direct interaction with the host OpRegion, in which case the write to
-     * the ASLS register would trigger MemoryRegion setup to enable that.
-     */
-    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
-                    vdev->igd_opregion, info->size);
-
-    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
-
-    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
-    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
-    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
-
-    return true;
-}
-
 /*
  * Common quirk probe entry points.
  */
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e624ae56c4..1b90c78c5a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3136,31 +3136,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         vfio_bar_quirk_setup(vdev, i);
     }
 
-    if (!vdev->igd_opregion &&
-        vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION) {
-        g_autofree struct vfio_region_info *opregion = NULL;
-
-        if (vdev->pdev.qdev.hotplugged) {
-            error_setg(errp,
-                       "cannot support IGD OpRegion feature on hotplugged "
-                       "device");
-            goto out_unset_idev;
-        }
-
-        ret = vfio_get_dev_region_info(vbasedev,
-                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
-                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
-        if (ret) {
-            error_setg_errno(errp, -ret,
-                             "does not support requested IGD OpRegion feature");
-            goto out_unset_idev;
-        }
-
-        if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
-            goto out_unset_idev;
-        }
-    }
-
     /* QEMU emulates all of MSI & MSIX */
     if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
         memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 5c64de0270..b9e920a6b4 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -229,10 +229,6 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
 
 bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
 
-bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
-                                struct vfio_region_info *info,
-                                Error **errp);
-
 void vfio_display_reset(VFIOPCIDevice *vdev);
 bool vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
 void vfio_display_finalize(VFIOPCIDevice *vdev);
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-24 19:12 ` [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk() Tomita Moeko
@ 2025-01-24 21:13   ` Alex Williamson
  2025-01-25  7:42     ` Tomita Moeko
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Williamson @ 2025-01-24 21:13 UTC (permalink / raw)
  To: Tomita Moeko; +Cc: Cédric Le Goater, qemu-devel

On Sat, 25 Jan 2025 03:12:45 +0800
Tomita Moeko <tomitamoeko@gmail.com> wrote:

> Both enable opregion option (x-igd-opregion) and legacy mode require
> setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
> in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
> code. Finally we moved all the IGD-related code into igd.c.
> 
> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
> ---
>  hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
>  hw/vfio/pci-quirks.c |  50 ---------------
>  hw/vfio/pci.c        |  25 --------
>  hw/vfio/pci.h        |   4 --
>  4 files changed, 110 insertions(+), 112 deletions(-)
> 
> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> index 6e06dd774a..015beacf5f 100644
> --- a/hw/vfio/igd.c
> +++ b/hw/vfio/igd.c
> @@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
>      return -1;
>  }
>  
> +#define IGD_ASLS 0xfc /* ASL Storage Register */
>  #define IGD_GMCH 0x50 /* Graphics Control Register */
>  #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
>  #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
> @@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
>      return 0;
>  }
>  
> +/*
> + * The OpRegion includes the Video BIOS Table, which seems important for
> + * telling the driver what sort of outputs it has.  Without this, the device
> + * may work in the guest, but we may not get output.  This also requires BIOS
> + * support to reserve and populate a section of guest memory sufficient for
> + * the table and to write the base address of that memory to the ASLS register
> + * of the IGD device.
> + */
> +static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> +                                       struct vfio_region_info *info,
> +                                       Error **errp)
> +{
> +    int ret;
> +
> +    vdev->igd_opregion = g_malloc0(info->size);
> +    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
> +                info->size, info->offset);
> +    if (ret != info->size) {
> +        error_setg(errp, "failed to read IGD OpRegion");
> +        g_free(vdev->igd_opregion);
> +        vdev->igd_opregion = NULL;
> +        return false;
> +    }
> +
> +    /*
> +     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
> +     * allocate 32bit reserved memory for, copy these contents into, and write
> +     * the reserved memory base address to the device ASLS register at 0xFC.
> +     * Alignment of this reserved region seems flexible, but using a 4k page
> +     * alignment seems to work well.  This interface assumes a single IGD
> +     * device, which may be at VM address 00:02.0 in legacy mode or another
> +     * address in UPT mode.
> +     *
> +     * NB, there may be future use cases discovered where the VM should have
> +     * direct interaction with the host OpRegion, in which case the write to
> +     * the ASLS register would trigger MemoryRegion setup to enable that.
> +     */
> +    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
> +                    vdev->igd_opregion, info->size);
> +
> +    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
> +
> +    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
> +    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
> +    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
> +
> +    return true;
> +}
> +
>  /*
>   * The rather short list of registers that we copy from the host devices.
>   * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
> @@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>      QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>  }
>  
> +static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
> +{
> +    g_autofree struct vfio_region_info *opregion = NULL;
> +    int ret;
> +
> +    /*
> +     * Hotplugging is not supprted for both opregion access and legacy mode.
> +     * For legacy mode, we also need to mark the ROM failed.
> +     */

The explanation was a little better in the removed comment.

> +    if (vdev->pdev.qdev.hotplugged) {
> +        vdev->rom_read_failed = true;
> +        error_setg(errp,
> +                   "IGD OpRegion is not supported on hotplugged device");

As was the error log.

> +        return false;
> +    }
> +
> +    ret = vfio_get_dev_region_info(&vdev->vbasedev,
> +                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
> +                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
> +    if (ret) {
> +        error_setg_errno(errp, -ret,
> +                         "device does not supports IGD OpRegion feature");
> +        return false;
> +    }
> +
> +    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
> -                                 Error **errp G_GNUC_UNUSED)
> +                                 Error **errp)
>  {
>      g_autofree struct vfio_region_info *rom = NULL;
> -    g_autofree struct vfio_region_info *opregion = NULL;
>      g_autofree struct vfio_region_info *host = NULL;
>      g_autofree struct vfio_region_info *lpc = NULL;
> +    PCIBus *bus;
>      PCIDevice *lpc_bridge;
>      int ret, gen;
> +    bool legacy_mode, enable_opregion;
>      uint64_t gms_size;
>      uint64_t *bdsm_size;
>      uint32_t gmch;
>      Error *err = NULL;
>  
> +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> +        !vfio_is_vga(vdev)) {
> +        return true;
> +    }
> +
>      /*
>       * This must be an Intel VGA device at address 00:02.0 for us to even
>       * consider enabling legacy mode.  The vBIOS has dependencies on the
>       * PCI bus address.
>       */
> -    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> -        !vfio_is_vga(vdev) ||
> -        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
> -                                       0, PCI_DEVFN(0x2, 0))) {
> +    bus = pci_device_root_bus(&vdev->pdev);
> +    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
> +    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
> +
> +    if (!enable_opregion && !legacy_mode) {
> +        return true;
> +    }
> +
> +    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
> +        if (enable_opregion) {
> +            error_propagate(errp, err);
> +            return false;
> +        } else if (legacy_mode) {
> +            error_append_hint(&err, "IGD legacy mode disabled\n");
> +            error_report_err(err);
> +            return true;
> +        }
> +    }
> +
> +    if (!legacy_mode) {
>          return true;
>      }
>  
> @@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>          return true;
>      }
>  
> -    /*
> -     * Ignore the hotplug corner case, mark the ROM failed, we can't
> -     * create the devices we need for legacy mode in the hotplug scenario.
> -     */
> -    if (vdev->pdev.qdev.hotplugged) {
> -        error_report("IGD device %s hotplugged, ROM disabled, "
> -                     "legacy mode disabled", vdev->vbasedev.name);
> -        vdev->rom_read_failed = true;
> -        return true;
> -    }
> -
>      /*
>       * Check whether we have all the vfio device specific regions to
>       * support legacy mode (added in Linux v4.6).  If not, bail.
>       */

And we're disassociating opregion setup from this useful comment.

What are we actually accomplishing here?  What specific code
duplication is eliminated?

Why is it important to move all this code to igd.c?

It's really difficult to untangle this refactor, I think it could be
done more iteratively, if it's really even beneficial.  Thanks,

Alex

> -    ret = vfio_get_dev_region_info(&vdev->vbasedev,
> -                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
> -                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
> -    if (ret) {
> -        error_report("IGD device %s does not support OpRegion access,"
> -                     "legacy mode disabled", vdev->vbasedev.name);
> -        return true;
> -    }
> -
>      ret = vfio_get_dev_region_info(&vdev->vbasedev,
>                          VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>                          VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG, &host);
> @@ -476,13 +560,6 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>          return true;
>      }
>  
> -    /* Setup OpRegion access */
> -    if (!vfio_pci_igd_opregion_init(vdev, opregion, &err)) {
> -        error_append_hint(&err, "IGD legacy mode disabled\n");
> -        error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> -        return true;
> -    }
> -
>      /*
>       * Allow user to override dsm size using x-igd-gms option, in multiples of
>       * 32MiB. This option should only be used when the desired size cannot be
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index b8379cb512..f2b37910f0 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1114,56 +1114,6 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
>      trace_vfio_quirk_rtl8168_probe(vdev->vbasedev.name);
>  }
>  
> -#define IGD_ASLS 0xfc /* ASL Storage Register */
> -
> -/*
> - * The OpRegion includes the Video BIOS Table, which seems important for
> - * telling the driver what sort of outputs it has.  Without this, the device
> - * may work in the guest, but we may not get output.  This also requires BIOS
> - * support to reserve and populate a section of guest memory sufficient for
> - * the table and to write the base address of that memory to the ASLS register
> - * of the IGD device.
> - */
> -bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> -                                struct vfio_region_info *info, Error **errp)
> -{
> -    int ret;
> -
> -    vdev->igd_opregion = g_malloc0(info->size);
> -    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
> -                info->size, info->offset);
> -    if (ret != info->size) {
> -        error_setg(errp, "failed to read IGD OpRegion");
> -        g_free(vdev->igd_opregion);
> -        vdev->igd_opregion = NULL;
> -        return false;
> -    }
> -
> -    /*
> -     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
> -     * allocate 32bit reserved memory for, copy these contents into, and write
> -     * the reserved memory base address to the device ASLS register at 0xFC.
> -     * Alignment of this reserved region seems flexible, but using a 4k page
> -     * alignment seems to work well.  This interface assumes a single IGD
> -     * device, which may be at VM address 00:02.0 in legacy mode or another
> -     * address in UPT mode.
> -     *
> -     * NB, there may be future use cases discovered where the VM should have
> -     * direct interaction with the host OpRegion, in which case the write to
> -     * the ASLS register would trigger MemoryRegion setup to enable that.
> -     */
> -    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
> -                    vdev->igd_opregion, info->size);
> -
> -    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
> -
> -    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
> -    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
> -    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
> -
> -    return true;
> -}
> -
>  /*
>   * Common quirk probe entry points.
>   */
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e624ae56c4..1b90c78c5a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3136,31 +3136,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          vfio_bar_quirk_setup(vdev, i);
>      }
>  
> -    if (!vdev->igd_opregion &&
> -        vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION) {
> -        g_autofree struct vfio_region_info *opregion = NULL;
> -
> -        if (vdev->pdev.qdev.hotplugged) {
> -            error_setg(errp,
> -                       "cannot support IGD OpRegion feature on hotplugged "
> -                       "device");
> -            goto out_unset_idev;
> -        }
> -
> -        ret = vfio_get_dev_region_info(vbasedev,
> -                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
> -                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
> -        if (ret) {
> -            error_setg_errno(errp, -ret,
> -                             "does not support requested IGD OpRegion feature");
> -            goto out_unset_idev;
> -        }
> -
> -        if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
> -            goto out_unset_idev;
> -        }
> -    }
> -
>      /* QEMU emulates all of MSI & MSIX */
>      if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
>          memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 5c64de0270..b9e920a6b4 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -229,10 +229,6 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>  
>  bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>  
> -bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> -                                struct vfio_region_info *info,
> -                                Error **errp);
> -
>  void vfio_display_reset(VFIOPCIDevice *vdev);
>  bool vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
>  void vfio_display_finalize(VFIOPCIDevice *vdev);



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-24 21:13   ` Alex Williamson
@ 2025-01-25  7:42     ` Tomita Moeko
  2025-01-30 18:33       ` Tomita Moeko
  0 siblings, 1 reply; 15+ messages in thread
From: Tomita Moeko @ 2025-01-25  7:42 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Cédric Le Goater, qemu-devel

On 1/25/25 05:13, Alex Williamson wrote:
> On Sat, 25 Jan 2025 03:12:45 +0800
> Tomita Moeko <tomitamoeko@gmail.com> wrote:
> 
>> Both enable opregion option (x-igd-opregion) and legacy mode require
>> setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
>> in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
>> code. Finally we moved all the IGD-related code into igd.c.
>>
>> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
>> ---
>>  hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
>>  hw/vfio/pci-quirks.c |  50 ---------------
>>  hw/vfio/pci.c        |  25 --------
>>  hw/vfio/pci.h        |   4 --
>>  4 files changed, 110 insertions(+), 112 deletions(-)
>>
>> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
>> index 6e06dd774a..015beacf5f 100644
>> --- a/hw/vfio/igd.c
>> +++ b/hw/vfio/igd.c
>> @@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
>>      return -1;
>>  }
>>  
>> +#define IGD_ASLS 0xfc /* ASL Storage Register */
>>  #define IGD_GMCH 0x50 /* Graphics Control Register */
>>  #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
>>  #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
>> @@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
>>      return 0;
>>  }
>>  
>> +/*
>> + * The OpRegion includes the Video BIOS Table, which seems important for
>> + * telling the driver what sort of outputs it has.  Without this, the device
>> + * may work in the guest, but we may not get output.  This also requires BIOS
>> + * support to reserve and populate a section of guest memory sufficient for
>> + * the table and to write the base address of that memory to the ASLS register
>> + * of the IGD device.
>> + */
>> +static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>> +                                       struct vfio_region_info *info,
>> +                                       Error **errp)
>> +{
>> +    int ret;
>> +
>> +    vdev->igd_opregion = g_malloc0(info->size);
>> +    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
>> +                info->size, info->offset);
>> +    if (ret != info->size) {
>> +        error_setg(errp, "failed to read IGD OpRegion");
>> +        g_free(vdev->igd_opregion);
>> +        vdev->igd_opregion = NULL;
>> +        return false;
>> +    }
>> +
>> +    /*
>> +     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
>> +     * allocate 32bit reserved memory for, copy these contents into, and write
>> +     * the reserved memory base address to the device ASLS register at 0xFC.
>> +     * Alignment of this reserved region seems flexible, but using a 4k page
>> +     * alignment seems to work well.  This interface assumes a single IGD
>> +     * device, which may be at VM address 00:02.0 in legacy mode or another
>> +     * address in UPT mode.
>> +     *
>> +     * NB, there may be future use cases discovered where the VM should have
>> +     * direct interaction with the host OpRegion, in which case the write to
>> +     * the ASLS register would trigger MemoryRegion setup to enable that.
>> +     */
>> +    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
>> +                    vdev->igd_opregion, info->size);
>> +
>> +    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
>> +
>> +    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
>> +    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
>> +    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
>> +
>> +    return true;
>> +}
>> +
>>  /*
>>   * The rather short list of registers that we copy from the host devices.
>>   * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
>> @@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>>      QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>>  }
>>  
>> +static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
>> +{
>> +    g_autofree struct vfio_region_info *opregion = NULL;
>> +    int ret;
>> +
>> +    /*
>> +     * Hotplugging is not supprted for both opregion access and legacy mode.
>> +     * For legacy mode, we also need to mark the ROM failed.
>> +     */
> 
> The explanation was a little better in the removed comment.
> 
>> +    if (vdev->pdev.qdev.hotplugged) {
>> +        vdev->rom_read_failed = true;
>> +        error_setg(errp,
>> +                   "IGD OpRegion is not supported on hotplugged device");
> 
> As was the error log.
> 
>> +        return false;
>> +    }
>> +
>> +    ret = vfio_get_dev_region_info(&vdev->vbasedev,
>> +                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>> +                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>> +    if (ret) {
>> +        error_setg_errno(errp, -ret,
>> +                         "device does not supports IGD OpRegion feature");
>> +        return false;
>> +    }
>> +
>> +    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>  bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>> -                                 Error **errp G_GNUC_UNUSED)
>> +                                 Error **errp)
>>  {
>>      g_autofree struct vfio_region_info *rom = NULL;
>> -    g_autofree struct vfio_region_info *opregion = NULL;
>>      g_autofree struct vfio_region_info *host = NULL;
>>      g_autofree struct vfio_region_info *lpc = NULL;
>> +    PCIBus *bus;
>>      PCIDevice *lpc_bridge;
>>      int ret, gen;
>> +    bool legacy_mode, enable_opregion;
>>      uint64_t gms_size;
>>      uint64_t *bdsm_size;
>>      uint32_t gmch;
>>      Error *err = NULL;
>>  
>> +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>> +        !vfio_is_vga(vdev)) {
>> +        return true;
>> +    }
>> +
>>      /*
>>       * This must be an Intel VGA device at address 00:02.0 for us to even
>>       * consider enabling legacy mode.  The vBIOS has dependencies on the
>>       * PCI bus address.
>>       */
>> -    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>> -        !vfio_is_vga(vdev) ||
>> -        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
>> -                                       0, PCI_DEVFN(0x2, 0))) {
>> +    bus = pci_device_root_bus(&vdev->pdev);
>> +    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
>> +    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
>> +
>> +    if (!enable_opregion && !legacy_mode) {
>> +        return true;
>> +    }
>> +
>> +    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
>> +        if (enable_opregion) {
>> +            error_propagate(errp, err);
>> +            return false;
>> +        } else if (legacy_mode) {
>> +            error_append_hint(&err, "IGD legacy mode disabled\n");
>> +            error_report_err(err);
>> +            return true;
>> +        }
>> +    }
>> +
>> +    if (!legacy_mode) {
>>          return true;
>>      }
>>  
>> @@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>          return true;
>>      }
>>  
>> -    /*
>> -     * Ignore the hotplug corner case, mark the ROM failed, we can't
>> -     * create the devices we need for legacy mode in the hotplug scenario.
>> -     */
>> -    if (vdev->pdev.qdev.hotplugged) {
>> -        error_report("IGD device %s hotplugged, ROM disabled, "
>> -                     "legacy mode disabled", vdev->vbasedev.name);
>> -        vdev->rom_read_failed = true;
>> -        return true;
>> -    }
>> -
>>      /*
>>       * Check whether we have all the vfio device specific regions to
>>       * support legacy mode (added in Linux v4.6).  If not, bail.
>>       */
> > And we're disassociating opregion setup from this useful comment.
> 
> What are we actually accomplishing here?  What specific code
> duplication is eliminated?

This patch is designed for moving the opregion quirk in vfio_realize()
to igd.c, for better isolation of the igd-specific code. Legacy mode
also need to initialize opregion as x-igd-opregion=on option. These
code are almost the same, except legacy mode continues on error, while
x-igd-opregion fails.

I am going to clearify that in the commit message of v3.

> Why is it important to move all this code to igd.c?
> 
> It's really difficult to untangle this refactor, I think it could be
> done more iteratively, if it's really even beneficial.  Thanks,
> 
> Alex

Agreed. Actually I'd like to totally remove the "legacy mode" and "UPT
mode" concept in future, my proposal is:
* Emulate and initialize ASLS and BDSM register unconditionally. These
  registers holds HPA, keeping the old value to guest is not a good
  idea
* Make the host bridge and LPC bridge ID quirks optional like OpRegion.
  Recent Linux kernel and Windows driver seems not relying on it. This
  enables IGD passthrough on Q35 machines, but probably without UEFI
  GOP or VBIOS support, as it is observed they require specific LPC
  bridge DID to work.
* Remove the requirement of IGD device class being VGA controller, this
  was previous discussed in my kernel change [1]
* Update the document

It would time consuming to implement all them, coding is not difficult,
but I have to verify my change on diffrent platforms. And they are out
of this patchset's scope I think. I personally perfers doing it in a
future patchset.

[1] https://lore.kernel.org/all/20250123163416.7653-1-tomitamoeko@gmail.com/

Thanks,
Moeko

>> -    ret = vfio_get_dev_region_info(&vdev->vbasedev,
>> -                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>> -                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>> -    if (ret) {
>> -        error_report("IGD device %s does not support OpRegion access,"
>> -                     "legacy mode disabled", vdev->vbasedev.name);
>> -        return true;
>> -    }
>> -
>>      ret = vfio_get_dev_region_info(&vdev->vbasedev,
>>                          VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>                          VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG, &host);
>> @@ -476,13 +560,6 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>          return true;
>>      }
>>  
>> -    /* Setup OpRegion access */
>> -    if (!vfio_pci_igd_opregion_init(vdev, opregion, &err)) {
>> -        error_append_hint(&err, "IGD legacy mode disabled\n");
>> -        error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>> -        return true;
>> -    }
>> -
>>      /*
>>       * Allow user to override dsm size using x-igd-gms option, in multiples of
>>       * 32MiB. This option should only be used when the desired size cannot be
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index b8379cb512..f2b37910f0 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -1114,56 +1114,6 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
>>      trace_vfio_quirk_rtl8168_probe(vdev->vbasedev.name);
>>  }
>>  
>> -#define IGD_ASLS 0xfc /* ASL Storage Register */
>> -
>> -/*
>> - * The OpRegion includes the Video BIOS Table, which seems important for
>> - * telling the driver what sort of outputs it has.  Without this, the device
>> - * may work in the guest, but we may not get output.  This also requires BIOS
>> - * support to reserve and populate a section of guest memory sufficient for
>> - * the table and to write the base address of that memory to the ASLS register
>> - * of the IGD device.
>> - */
>> -bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>> -                                struct vfio_region_info *info, Error **errp)
>> -{
>> -    int ret;
>> -
>> -    vdev->igd_opregion = g_malloc0(info->size);
>> -    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
>> -                info->size, info->offset);
>> -    if (ret != info->size) {
>> -        error_setg(errp, "failed to read IGD OpRegion");
>> -        g_free(vdev->igd_opregion);
>> -        vdev->igd_opregion = NULL;
>> -        return false;
>> -    }
>> -
>> -    /*
>> -     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
>> -     * allocate 32bit reserved memory for, copy these contents into, and write
>> -     * the reserved memory base address to the device ASLS register at 0xFC.
>> -     * Alignment of this reserved region seems flexible, but using a 4k page
>> -     * alignment seems to work well.  This interface assumes a single IGD
>> -     * device, which may be at VM address 00:02.0 in legacy mode or another
>> -     * address in UPT mode.
>> -     *
>> -     * NB, there may be future use cases discovered where the VM should have
>> -     * direct interaction with the host OpRegion, in which case the write to
>> -     * the ASLS register would trigger MemoryRegion setup to enable that.
>> -     */
>> -    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
>> -                    vdev->igd_opregion, info->size);
>> -
>> -    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
>> -
>> -    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
>> -    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
>> -    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
>> -
>> -    return true;
>> -}
>> -
>>  /*
>>   * Common quirk probe entry points.
>>   */
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e624ae56c4..1b90c78c5a 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3136,31 +3136,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>          vfio_bar_quirk_setup(vdev, i);
>>      }
>>  
>> -    if (!vdev->igd_opregion &&
>> -        vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION) {
>> -        g_autofree struct vfio_region_info *opregion = NULL;
>> -
>> -        if (vdev->pdev.qdev.hotplugged) {
>> -            error_setg(errp,
>> -                       "cannot support IGD OpRegion feature on hotplugged "
>> -                       "device");
>> -            goto out_unset_idev;
>> -        }
>> -
>> -        ret = vfio_get_dev_region_info(vbasedev,
>> -                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>> -                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>> -        if (ret) {
>> -            error_setg_errno(errp, -ret,
>> -                             "does not support requested IGD OpRegion feature");
>> -            goto out_unset_idev;
>> -        }
>> -
>> -        if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
>> -            goto out_unset_idev;
>> -        }
>> -    }
>> -
>>      /* QEMU emulates all of MSI & MSIX */
>>      if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
>>          memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 5c64de0270..b9e920a6b4 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -229,10 +229,6 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>>  
>>  bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>>  
>> -bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>> -                                struct vfio_region_info *info,
>> -                                Error **errp);
>> -
>>  void vfio_display_reset(VFIOPCIDevice *vdev);
>>  bool vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
>>  void vfio_display_finalize(VFIOPCIDevice *vdev);
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-25  7:42     ` Tomita Moeko
@ 2025-01-30 18:33       ` Tomita Moeko
  2025-01-30 20:41         ` Alex Williamson
  0 siblings, 1 reply; 15+ messages in thread
From: Tomita Moeko @ 2025-01-30 18:33 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Cédric Le Goater, qemu-devel

On 1/25/25 15:42, Tomita Moeko wrote:
> On 1/25/25 05:13, Alex Williamson wrote:
>> On Sat, 25 Jan 2025 03:12:45 +0800
>> Tomita Moeko <tomitamoeko@gmail.com> wrote:
>>
>>> Both enable opregion option (x-igd-opregion) and legacy mode require
>>> setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
>>> in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
>>> code. Finally we moved all the IGD-related code into igd.c.
>>>
>>> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
>>> ---
>>>  hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
>>>  hw/vfio/pci-quirks.c |  50 ---------------
>>>  hw/vfio/pci.c        |  25 --------
>>>  hw/vfio/pci.h        |   4 --
>>>  4 files changed, 110 insertions(+), 112 deletions(-)
>>>
>>> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
>>> index 6e06dd774a..015beacf5f 100644
>>> --- a/hw/vfio/igd.c
>>> +++ b/hw/vfio/igd.c
>>> @@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
>>>      return -1;
>>>  }
>>>  
>>> +#define IGD_ASLS 0xfc /* ASL Storage Register */
>>>  #define IGD_GMCH 0x50 /* Graphics Control Register */
>>>  #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
>>>  #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
>>> @@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
>>>      return 0;
>>>  }
>>>  
>>> +/*
>>> + * The OpRegion includes the Video BIOS Table, which seems important for
>>> + * telling the driver what sort of outputs it has.  Without this, the device
>>> + * may work in the guest, but we may not get output.  This also requires BIOS
>>> + * support to reserve and populate a section of guest memory sufficient for
>>> + * the table and to write the base address of that memory to the ASLS register
>>> + * of the IGD device.
>>> + */
>>> +static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>>> +                                       struct vfio_region_info *info,
>>> +                                       Error **errp)
>>> +{
>>> +    int ret;
>>> +
>>> +    vdev->igd_opregion = g_malloc0(info->size);
>>> +    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
>>> +                info->size, info->offset);
>>> +    if (ret != info->size) {
>>> +        error_setg(errp, "failed to read IGD OpRegion");
>>> +        g_free(vdev->igd_opregion);
>>> +        vdev->igd_opregion = NULL;
>>> +        return false;
>>> +    }
>>> +
>>> +    /*
>>> +     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
>>> +     * allocate 32bit reserved memory for, copy these contents into, and write
>>> +     * the reserved memory base address to the device ASLS register at 0xFC.
>>> +     * Alignment of this reserved region seems flexible, but using a 4k page
>>> +     * alignment seems to work well.  This interface assumes a single IGD
>>> +     * device, which may be at VM address 00:02.0 in legacy mode or another
>>> +     * address in UPT mode.
>>> +     *
>>> +     * NB, there may be future use cases discovered where the VM should have
>>> +     * direct interaction with the host OpRegion, in which case the write to
>>> +     * the ASLS register would trigger MemoryRegion setup to enable that.
>>> +     */
>>> +    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
>>> +                    vdev->igd_opregion, info->size);
>>> +
>>> +    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
>>> +
>>> +    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
>>> +    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
>>> +    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
>>> +
>>> +    return true;
>>> +}
>>> +
>>>  /*
>>>   * The rather short list of registers that we copy from the host devices.
>>>   * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
>>> @@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>>>      QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>>>  }
>>>  
>>> +static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
>>> +{
>>> +    g_autofree struct vfio_region_info *opregion = NULL;
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Hotplugging is not supprted for both opregion access and legacy mode.
>>> +     * For legacy mode, we also need to mark the ROM failed.
>>> +     */
>>
>> The explanation was a little better in the removed comment.
>>
>>> +    if (vdev->pdev.qdev.hotplugged) {
>>> +        vdev->rom_read_failed = true;
>>> +        error_setg(errp,
>>> +                   "IGD OpRegion is not supported on hotplugged device");
>>
>> As was the error log.
>>
>>> +        return false;
>>> +    }
>>> +
>>> +    ret = vfio_get_dev_region_info(&vdev->vbasedev,
>>> +                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>> +                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>>> +    if (ret) {
>>> +        error_setg_errno(errp, -ret,
>>> +                         "device does not supports IGD OpRegion feature");
>>> +        return false;
>>> +    }
>>> +
>>> +    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>>  bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>> -                                 Error **errp G_GNUC_UNUSED)
>>> +                                 Error **errp)
>>>  {
>>>      g_autofree struct vfio_region_info *rom = NULL;
>>> -    g_autofree struct vfio_region_info *opregion = NULL;
>>>      g_autofree struct vfio_region_info *host = NULL;
>>>      g_autofree struct vfio_region_info *lpc = NULL;
>>> +    PCIBus *bus;
>>>      PCIDevice *lpc_bridge;
>>>      int ret, gen;
>>> +    bool legacy_mode, enable_opregion;
>>>      uint64_t gms_size;
>>>      uint64_t *bdsm_size;
>>>      uint32_t gmch;
>>>      Error *err = NULL;
>>>  
>>> +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>> +        !vfio_is_vga(vdev)) {
>>> +        return true;
>>> +    }
>>> +
>>>      /*
>>>       * This must be an Intel VGA device at address 00:02.0 for us to even
>>>       * consider enabling legacy mode.  The vBIOS has dependencies on the
>>>       * PCI bus address.
>>>       */
>>> -    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>> -        !vfio_is_vga(vdev) ||
>>> -        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
>>> -                                       0, PCI_DEVFN(0x2, 0))) {
>>> +    bus = pci_device_root_bus(&vdev->pdev);
>>> +    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
>>> +    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
>>> +
>>> +    if (!enable_opregion && !legacy_mode) {
>>> +        return true;
>>> +    }
>>> +
>>> +    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
>>> +        if (enable_opregion) {
>>> +            error_propagate(errp, err);
>>> +            return false;
>>> +        } else if (legacy_mode) {
>>> +            error_append_hint(&err, "IGD legacy mode disabled\n");
>>> +            error_report_err(err);
>>> +            return true;
>>> +        }
>>> +    }
>>> +
>>> +    if (!legacy_mode) {
>>>          return true;
>>>      }
>>>  
>>> @@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>          return true;
>>>      }
>>>  
>>> -    /*
>>> -     * Ignore the hotplug corner case, mark the ROM failed, we can't
>>> -     * create the devices we need for legacy mode in the hotplug scenario.
>>> -     */
>>> -    if (vdev->pdev.qdev.hotplugged) {
>>> -        error_report("IGD device %s hotplugged, ROM disabled, "
>>> -                     "legacy mode disabled", vdev->vbasedev.name);
>>> -        vdev->rom_read_failed = true;
>>> -        return true;
>>> -    }
>>> -
>>>      /*
>>>       * Check whether we have all the vfio device specific regions to
>>>       * support legacy mode (added in Linux v4.6).  If not, bail.
>>>       */
>>> And we're disassociating opregion setup from this useful comment.
>>
>> What are we actually accomplishing here?  What specific code
>> duplication is eliminated?
> 
> This patch is designed for moving the opregion quirk in vfio_realize()
> to igd.c, for better isolation of the igd-specific code. Legacy mode
> also need to initialize opregion as x-igd-opregion=on option. These
> code are almost the same, except legacy mode continues on error, while
> x-igd-opregion fails.
> 
> I am going to clearify that in the commit message of v3.
> 
>> Why is it important to move all this code to igd.c?

x-igd-opreqion quirk is currently located in pci-quirks.c, which is not
controlled by CONFIG_VFIO_IGD, moving it to igd.c prevents building
that unnecessary code in certain binaries, for example, non x86 builds. 

>> It's really difficult to untangle this refactor, I think it could be
>> done more iteratively, if it's really even beneficial.  Thanks,
>>
>> Alex
> 
> Agreed. Actually I'd like to totally remove the "legacy mode" and "UPT
> mode" concept in future, my proposal is:
> * Emulate and initialize ASLS and BDSM register unconditionally. These
>   registers holds HPA, keeping the old value to guest is not a good
>   idea
> * Make the host bridge and LPC bridge ID quirks optional like OpRegion.
>   Recent Linux kernel and Windows driver seems not relying on it. This
>   enables IGD passthrough on Q35 machines, but probably without UEFI
>   GOP or VBIOS support, as it is observed they require specific LPC
>   bridge DID to work.
> * Remove the requirement of IGD device class being VGA controller, this
>   was previous discussed in my kernel change [1]
> * Update the document
> 
> It would time consuming to implement all them, coding is not difficult,
> but I have to verify my change on diffrent platforms. And they are out
> of this patchset's scope I think. I personally perfers doing it in a
> future patchset.
> 
> [1] https://lore.kernel.org/all/20250123163416.7653-1-tomitamoeko@gmail.com/
> 
> Thanks,
> Moeko

Please let me know if you have any thoughts or suggestions, in case
you missed the previous mail.

Best Regards,
Moeko

>>> -    ret = vfio_get_dev_region_info(&vdev->vbasedev,
>>> -                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>> -                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>>> -    if (ret) {
>>> -        error_report("IGD device %s does not support OpRegion access,"
>>> -                     "legacy mode disabled", vdev->vbasedev.name);
>>> -        return true;
>>> -    }
>>> -
>>>      ret = vfio_get_dev_region_info(&vdev->vbasedev,
>>>                          VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>>                          VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG, &host);
>>> @@ -476,13 +560,6 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>          return true;
>>>      }
>>>  
>>> -    /* Setup OpRegion access */
>>> -    if (!vfio_pci_igd_opregion_init(vdev, opregion, &err)) {
>>> -        error_append_hint(&err, "IGD legacy mode disabled\n");
>>> -        error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>>> -        return true;
>>> -    }
>>> -
>>>      /*
>>>       * Allow user to override dsm size using x-igd-gms option, in multiples of
>>>       * 32MiB. This option should only be used when the desired size cannot be
>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>> index b8379cb512..f2b37910f0 100644
>>> --- a/hw/vfio/pci-quirks.c
>>> +++ b/hw/vfio/pci-quirks.c
>>> @@ -1114,56 +1114,6 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
>>>      trace_vfio_quirk_rtl8168_probe(vdev->vbasedev.name);
>>>  }
>>>  
>>> -#define IGD_ASLS 0xfc /* ASL Storage Register */
>>> -
>>> -/*
>>> - * The OpRegion includes the Video BIOS Table, which seems important for
>>> - * telling the driver what sort of outputs it has.  Without this, the device
>>> - * may work in the guest, but we may not get output.  This also requires BIOS
>>> - * support to reserve and populate a section of guest memory sufficient for
>>> - * the table and to write the base address of that memory to the ASLS register
>>> - * of the IGD device.
>>> - */
>>> -bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>>> -                                struct vfio_region_info *info, Error **errp)
>>> -{
>>> -    int ret;
>>> -
>>> -    vdev->igd_opregion = g_malloc0(info->size);
>>> -    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
>>> -                info->size, info->offset);
>>> -    if (ret != info->size) {
>>> -        error_setg(errp, "failed to read IGD OpRegion");
>>> -        g_free(vdev->igd_opregion);
>>> -        vdev->igd_opregion = NULL;
>>> -        return false;
>>> -    }
>>> -
>>> -    /*
>>> -     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
>>> -     * allocate 32bit reserved memory for, copy these contents into, and write
>>> -     * the reserved memory base address to the device ASLS register at 0xFC.
>>> -     * Alignment of this reserved region seems flexible, but using a 4k page
>>> -     * alignment seems to work well.  This interface assumes a single IGD
>>> -     * device, which may be at VM address 00:02.0 in legacy mode or another
>>> -     * address in UPT mode.
>>> -     *
>>> -     * NB, there may be future use cases discovered where the VM should have
>>> -     * direct interaction with the host OpRegion, in which case the write to
>>> -     * the ASLS register would trigger MemoryRegion setup to enable that.
>>> -     */
>>> -    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
>>> -                    vdev->igd_opregion, info->size);
>>> -
>>> -    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
>>> -
>>> -    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
>>> -    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
>>> -    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
>>> -
>>> -    return true;
>>> -}
>>> -
>>>  /*
>>>   * Common quirk probe entry points.
>>>   */
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index e624ae56c4..1b90c78c5a 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3136,31 +3136,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>>          vfio_bar_quirk_setup(vdev, i);
>>>      }
>>>  
>>> -    if (!vdev->igd_opregion &&
>>> -        vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION) {
>>> -        g_autofree struct vfio_region_info *opregion = NULL;
>>> -
>>> -        if (vdev->pdev.qdev.hotplugged) {
>>> -            error_setg(errp,
>>> -                       "cannot support IGD OpRegion feature on hotplugged "
>>> -                       "device");
>>> -            goto out_unset_idev;
>>> -        }
>>> -
>>> -        ret = vfio_get_dev_region_info(vbasedev,
>>> -                        VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>> -                        VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>>> -        if (ret) {
>>> -            error_setg_errno(errp, -ret,
>>> -                             "does not support requested IGD OpRegion feature");
>>> -            goto out_unset_idev;
>>> -        }
>>> -
>>> -        if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
>>> -            goto out_unset_idev;
>>> -        }
>>> -    }
>>> -
>>>      /* QEMU emulates all of MSI & MSIX */
>>>      if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
>>>          memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
>>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>>> index 5c64de0270..b9e920a6b4 100644
>>> --- a/hw/vfio/pci.h
>>> +++ b/hw/vfio/pci.h
>>> @@ -229,10 +229,6 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>>>  
>>>  bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>>>  
>>> -bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>>> -                                struct vfio_region_info *info,
>>> -                                Error **errp);
>>> -
>>>  void vfio_display_reset(VFIOPCIDevice *vdev);
>>>  bool vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
>>>  void vfio_display_finalize(VFIOPCIDevice *vdev);
>>
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-30 18:33       ` Tomita Moeko
@ 2025-01-30 20:41         ` Alex Williamson
  2025-01-31 10:15           ` Cédric Le Goater
  2025-02-01  7:48           ` Tomita Moeko
  0 siblings, 2 replies; 15+ messages in thread
From: Alex Williamson @ 2025-01-30 20:41 UTC (permalink / raw)
  To: Tomita Moeko; +Cc: Cédric Le Goater, qemu-devel

On Fri, 31 Jan 2025 02:33:03 +0800
Tomita Moeko <tomitamoeko@gmail.com> wrote:

> On 1/25/25 15:42, Tomita Moeko wrote:
> > On 1/25/25 05:13, Alex Williamson wrote:  
> >> On Sat, 25 Jan 2025 03:12:45 +0800
> >> Tomita Moeko <tomitamoeko@gmail.com> wrote:
> >>  
> >>> Both enable opregion option (x-igd-opregion) and legacy mode require
> >>> setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
> >>> in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
> >>> code. Finally we moved all the IGD-related code into igd.c.
> >>>
> >>> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
> >>> ---
> >>>  hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
> >>>  hw/vfio/pci-quirks.c |  50 ---------------
> >>>  hw/vfio/pci.c        |  25 --------
> >>>  hw/vfio/pci.h        |   4 --
> >>>  4 files changed, 110 insertions(+), 112 deletions(-)
> >>>
> >>> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> >>> index 6e06dd774a..015beacf5f 100644
> >>> --- a/hw/vfio/igd.c
> >>> +++ b/hw/vfio/igd.c
> >>> @@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
> >>>      return -1;
> >>>  }
> >>>  
> >>> +#define IGD_ASLS 0xfc /* ASL Storage Register */
> >>>  #define IGD_GMCH 0x50 /* Graphics Control Register */
> >>>  #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
> >>>  #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
> >>> @@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
> >>>      return 0;
> >>>  }
> >>>  
> >>> +/*
> >>> + * The OpRegion includes the Video BIOS Table, which seems important for
> >>> + * telling the driver what sort of outputs it has.  Without this, the device
> >>> + * may work in the guest, but we may not get output.  This also requires BIOS
> >>> + * support to reserve and populate a section of guest memory sufficient for
> >>> + * the table and to write the base address of that memory to the ASLS register
> >>> + * of the IGD device.
> >>> + */
> >>> +static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> >>> +                                       struct vfio_region_info *info,
> >>> +                                       Error **errp)
> >>> +{
> >>> +    int ret;
> >>> +
> >>> +    vdev->igd_opregion = g_malloc0(info->size);
> >>> +    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
> >>> +                info->size, info->offset);
> >>> +    if (ret != info->size) {
> >>> +        error_setg(errp, "failed to read IGD OpRegion");
> >>> +        g_free(vdev->igd_opregion);
> >>> +        vdev->igd_opregion = NULL;
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    /*
> >>> +     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
> >>> +     * allocate 32bit reserved memory for, copy these contents into, and write
> >>> +     * the reserved memory base address to the device ASLS register at 0xFC.
> >>> +     * Alignment of this reserved region seems flexible, but using a 4k page
> >>> +     * alignment seems to work well.  This interface assumes a single IGD
> >>> +     * device, which may be at VM address 00:02.0 in legacy mode or another
> >>> +     * address in UPT mode.
> >>> +     *
> >>> +     * NB, there may be future use cases discovered where the VM should have
> >>> +     * direct interaction with the host OpRegion, in which case the write to
> >>> +     * the ASLS register would trigger MemoryRegion setup to enable that.
> >>> +     */
> >>> +    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
> >>> +                    vdev->igd_opregion, info->size);
> >>> +
> >>> +    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
> >>> +
> >>> +    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
> >>> +    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
> >>> +    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
> >>> +
> >>> +    return true;
> >>> +}
> >>> +
> >>>  /*
> >>>   * The rather short list of registers that we copy from the host devices.
> >>>   * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
> >>> @@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
> >>>      QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
> >>>  }
> >>>  
> >>> +static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
> >>> +{
> >>> +    g_autofree struct vfio_region_info *opregion = NULL;
> >>> +    int ret;
> >>> +
> >>> +    /*
> >>> +     * Hotplugging is not supprted for both opregion access and legacy mode.
> >>> +     * For legacy mode, we also need to mark the ROM failed.
> >>> +     */  
> >>
> >> The explanation was a little better in the removed comment.
> >>  
> >>> +    if (vdev->pdev.qdev.hotplugged) {
> >>> +        vdev->rom_read_failed = true;
> >>> +        error_setg(errp,
> >>> +                   "IGD OpRegion is not supported on hotplugged device");  
> >>
> >> As was the error log.
> >>  
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    ret = vfio_get_dev_region_info(&vdev->vbasedev,
> >>> +                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
> >>> +                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
> >>> +    if (ret) {
> >>> +        error_setg_errno(errp, -ret,
> >>> +                         "device does not supports IGD OpRegion feature");
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    return true;
> >>> +}
> >>> +
> >>>  bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
> >>> -                                 Error **errp G_GNUC_UNUSED)
> >>> +                                 Error **errp)
> >>>  {
> >>>      g_autofree struct vfio_region_info *rom = NULL;
> >>> -    g_autofree struct vfio_region_info *opregion = NULL;
> >>>      g_autofree struct vfio_region_info *host = NULL;
> >>>      g_autofree struct vfio_region_info *lpc = NULL;
> >>> +    PCIBus *bus;
> >>>      PCIDevice *lpc_bridge;
> >>>      int ret, gen;
> >>> +    bool legacy_mode, enable_opregion;
> >>>      uint64_t gms_size;
> >>>      uint64_t *bdsm_size;
> >>>      uint32_t gmch;
> >>>      Error *err = NULL;
> >>>  
> >>> +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> >>> +        !vfio_is_vga(vdev)) {
> >>> +        return true;
> >>> +    }
> >>> +
> >>>      /*
> >>>       * This must be an Intel VGA device at address 00:02.0 for us to even
> >>>       * consider enabling legacy mode.  The vBIOS has dependencies on the
> >>>       * PCI bus address.
> >>>       */
> >>> -    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> >>> -        !vfio_is_vga(vdev) ||
> >>> -        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
> >>> -                                       0, PCI_DEVFN(0x2, 0))) {
> >>> +    bus = pci_device_root_bus(&vdev->pdev);
> >>> +    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
> >>> +    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
> >>> +
> >>> +    if (!enable_opregion && !legacy_mode) {
> >>> +        return true;
> >>> +    }
> >>> +
> >>> +    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
> >>> +        if (enable_opregion) {
> >>> +            error_propagate(errp, err);
> >>> +            return false;
> >>> +        } else if (legacy_mode) {
> >>> +            error_append_hint(&err, "IGD legacy mode disabled\n");
> >>> +            error_report_err(err);
> >>> +            return true;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    if (!legacy_mode) {
> >>>          return true;
> >>>      }
> >>>  
> >>> @@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
> >>>          return true;
> >>>      }
> >>>  
> >>> -    /*
> >>> -     * Ignore the hotplug corner case, mark the ROM failed, we can't
> >>> -     * create the devices we need for legacy mode in the hotplug scenario.
> >>> -     */
> >>> -    if (vdev->pdev.qdev.hotplugged) {
> >>> -        error_report("IGD device %s hotplugged, ROM disabled, "
> >>> -                     "legacy mode disabled", vdev->vbasedev.name);
> >>> -        vdev->rom_read_failed = true;
> >>> -        return true;
> >>> -    }
> >>> -
> >>>      /*
> >>>       * Check whether we have all the vfio device specific regions to
> >>>       * support legacy mode (added in Linux v4.6).  If not, bail.
> >>>       */
> >>> And we're disassociating opregion setup from this useful comment.  
> >>
> >> What are we actually accomplishing here?  What specific code
> >> duplication is eliminated?  
> > 
> > This patch is designed for moving the opregion quirk in vfio_realize()
> > to igd.c, for better isolation of the igd-specific code. Legacy mode
> > also need to initialize opregion as x-igd-opregion=on option. These
> > code are almost the same, except legacy mode continues on error, while
> > x-igd-opregion fails.
> > 
> > I am going to clearify that in the commit message of v3.
> >   
> >> Why is it important to move all this code to igd.c?  
> 
> x-igd-opreqion quirk is currently located in pci-quirks.c, which is not
> controlled by CONFIG_VFIO_IGD, moving it to igd.c prevents building
> that unnecessary code in certain binaries, for example, non x86 builds. 
> 
> >> It's really difficult to untangle this refactor, I think it could be
> >> done more iteratively, if it's really even beneficial.  Thanks,
> >>
> >> Alex  
> > 
> > Agreed. Actually I'd like to totally remove the "legacy mode" and "UPT
> > mode" concept in future, my proposal is:
> > * Emulate and initialize ASLS and BDSM register unconditionally. These
> >   registers holds HPA, keeping the old value to guest is not a good
> >   idea
> > * Make the host bridge and LPC bridge ID quirks optional like OpRegion.
> >   Recent Linux kernel and Windows driver seems not relying on it. This
> >   enables IGD passthrough on Q35 machines, but probably without UEFI
> >   GOP or VBIOS support, as it is observed they require specific LPC
> >   bridge DID to work.
> > * Remove the requirement of IGD device class being VGA controller, this
> >   was previous discussed in my kernel change [1]
> > * Update the document
> > 
> > It would time consuming to implement all them, coding is not difficult,
> > but I have to verify my change on diffrent platforms. And they are out
> > of this patchset's scope I think. I personally perfers doing it in a
> > future patchset.
> > 
> > [1] https://lore.kernel.org/all/20250123163416.7653-1-tomitamoeko@gmail.com/
> > 
> > Thanks,
> > Moeko  
> 
> Please let me know if you have any thoughts or suggestions, in case
> you missed the previous mail.

TBH, I'm surprised there's so much interest in direct assignment of
igd.  I'd be curious in your motivation, if you can share it.

Regardless, it's nice to see it updated for newer hardware and I don't
mind the goal of isolating the code so it can be disabled on other
archs, so long as we can do so in small, logical steps that are easy to
follow.

At this point, the idea of legacy vs UPT might only exist in QEMU.
There are going to be some challenges to avoid breaking existing VM
command lines if the host and LPC bridge quirks become optional.  There
are a couple x-igd- options that we're free to break as they've always
been experimental, but the implicit LPC bridge and host bridge quirks
need to be considered carefully.  The fact that "legacy" mode has never
previously worked on q35 could mean that we can tie those quirks to a
new experimental option that's off by default and only enabled for
440fx machine types.

I'm glad you included the documentation update in your list, it's
clearly out of date, as is some of my knowledge regarding guest driver
requirements.  I hope we can make some progress on uefi support as well,
as that's essentially a requirement for newer guests.  If we can't get
the code upstream into edk2, maybe we can at least document steps for
others to create images.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk
  2025-01-24 19:12 ` [PATCH v2 3/5] vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk Tomita Moeko
@ 2025-01-31  9:14   ` Cédric Le Goater
  2025-02-02 16:24     ` Tomita Moeko
  0 siblings, 1 reply; 15+ messages in thread
From: Cédric Le Goater @ 2025-01-31  9:14 UTC (permalink / raw)
  To: Tomita Moeko, Alex Williamson; +Cc: qemu-devel

Hello Tomita,

On 1/24/25 20:12, Tomita Moeko wrote:
> The actual IO BAR4 write quirk in vfio_probe_igd_bar4_quirk() was
> removed in previous change, leaving the function not matching its name,
> so move it into the newly introduced vfio_config_quirk_setup(). There
> is no functional change in this commit. If any failure occurs, the
> function simply returns and proceeds.
> 
> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
> ---
>   hw/vfio/igd.c        | 31 +++++++++++++++++--------------
>   hw/vfio/pci-quirks.c |  6 +++++-
>   hw/vfio/pci.h        |  2 +-
>   3 files changed, 23 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> index ca3a32f4f2..376a26fbae 100644
> --- a/hw/vfio/igd.c
> +++ b/hw/vfio/igd.c
> @@ -359,7 +359,8 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>       QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>   }
>   
> -void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
> +bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
> +                                 Error **errp G_GNUC_UNUSED)

Adding an 'Error **' parameter is in improvement indeed. All the error_report
of this routine need to be converted too.

>   {
>       g_autofree struct vfio_region_info *rom = NULL;
>       g_autofree struct vfio_region_info *opregion = NULL;
> @@ -378,10 +379,10 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>        * PCI bus address.
>        */
>       if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> -        !vfio_is_vga(vdev) || nr != 4 ||
> +        !vfio_is_vga(vdev) ||
>           &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
>                                          0, PCI_DEVFN(0x2, 0))) {
> -        return;
> +        return true;
>       }
>   
>       /*
> @@ -395,7 +396,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>                                              "vfio-pci-igd-lpc-bridge")) {
>           error_report("IGD device %s cannot support legacy mode due to existing "
>                        "devices at address 1f.0", vdev->vbasedev.name);
> -        return;
> +        return true;

if there is an error_report, why is this returning true ? It should be false.
  
>       }
>   
>       /*
> @@ -407,7 +408,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>       if (gen == -1) {
>           error_report("IGD device %s is unsupported in legacy mode, "
>                        "try SandyBridge or newer", vdev->vbasedev.name);
> -        return;
> +        return true;

same here and else where.

>       }
>   
>       /*
> @@ -420,7 +421,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>       if ((ret || !rom->size) && !vdev->pdev.romfile) {
>           error_report("IGD device %s has no ROM, legacy mode disabled",
>                        vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       /*
> @@ -431,7 +432,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>           error_report("IGD device %s hotplugged, ROM disabled, "
>                        "legacy mode disabled", vdev->vbasedev.name);
>           vdev->rom_read_failed = true;
> -        return;
> +        return true;
>       }
>   
>       /*
> @@ -444,7 +445,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>       if (ret) {
>           error_report("IGD device %s does not support OpRegion access,"
>                        "legacy mode disabled", vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       ret = vfio_get_dev_region_info(&vdev->vbasedev,
> @@ -453,7 +454,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>       if (ret) {
>           error_report("IGD device %s does not support host bridge access,"
>                        "legacy mode disabled", vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       ret = vfio_get_dev_region_info(&vdev->vbasedev,
> @@ -462,7 +463,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>       if (ret) {
>           error_report("IGD device %s does not support LPC bridge access,"
>                        "legacy mode disabled", vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, 4);
> @@ -476,7 +477,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>           error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>           error_report("IGD device %s failed to enable VGA access, "
>                        "legacy mode disabled", vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       /* Create our LPC/ISA bridge */
> @@ -484,7 +485,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>       if (ret) {
>           error_report("IGD device %s failed to create LPC bridge, "
>                        "legacy mode disabled", vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       /* Stuff some host values into the VM PCI host bridge */
> @@ -492,14 +493,14 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>       if (ret) {
>           error_report("IGD device %s failed to modify host bridge, "
>                        "legacy mode disabled", vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       /* Setup OpRegion access */
>       if (!vfio_pci_igd_opregion_init(vdev, opregion, &err)) {
>           error_append_hint(&err, "IGD legacy mode disabled\n");
>           error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> -        return;
> +        return true;
>       }
>   
>       /*
> @@ -561,4 +562,6 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>   
>       trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name,
>                                       (ggms_size + gms_size) / MiB);
> +
> +    return true;
>   }
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index c40e3ca88f..b8379cb512 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1169,6 +1169,11 @@ bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>    */
>   bool vfio_config_quirk_setup(VFIOPCIDevice *vdev, Error **errp)
>   {
> +#ifdef CONFIG_VFIO_IGD

oh. We try to avoid such ifdef in QEMU. Only very specific and high level
configs are discarded at compile time (KVM, LINUX, etc).

One way to adress this case, would be to use QOM. Example below :

  
Declare a base class :
     
     #define TYPE_VFIO_PCI_QUIRK_PROVIDER "vfio-pci-quirk-provider"
     
     OBJECT_DECLARE_TYPE(VFIOPCIQuirkProvider, VFIOPCIQuirkProviderClass,
                         VFIO_PCI_QUIRK_PROVIDER)
     
     struct VFIOPCIQuirkProviderClass {
         ObjectClass parent;
     
         bool (*probe)(VFIOPCIDevice *vdev, Error **errp);
         bool (*setup)(VFIOPCIDevice *vdev, Error **errp);
     };
     
     static const TypeInfo vfio_pci_quirk_info = {
         .name = TYPE_VFIO_PCI_QUIRK_PROVIDER,
         .parent = TYPE_OBJECT,
         .class_size = sizeof(VFIOPCIQuirkClass),
         .abstract = true,
     };
     
     static void register_vfio_pci_quirk_type(void)
     {
         type_register_static(&vfio_pci_quirk_info);
     }
     
     type_init(register_vfio_pci_quirk_type)


Declare one for IGD
     
     static void vfio_pci_quirk_igd_class_init(ObjectClass *klass, void *data)
     {
         VFIOPCIQuirkClass* vpqc = VFIO_PCI_QUIRK_PROVIDER_CLASS(klass);
     
         vpqc->setup = vfio_probe_igd_quirk_probe;
         vpqc->probe = vfio_probe_igd_quirk_probe;
     }
     
     static const TypeInfo vfio_pci_quirk_igd_info = {
         .name = TYPE_VFIO_PCI_QUIRK_PROVIDER "-igd",
         .parent = TYPE_VFIO_PCI_QUIRK_PROVIDER,
         .class_init = vfio_pci_quirk_igd_class_init,
         .class_size = sizeof(VFIOPCIQuirkClass),
     };
     
     static void vfio_pci_quirk_igd_register_types(void)
     {
         type_register_static(&vfio_pci_quirk_igd_info);
     }
     
     type_init(vfio_pci_quirk_igd_register_types)


and in the common part, loop on all the classes to probe and setup :


     static void vfio_pci_quirk_class_foreach(ObjectClass *klass, void *opaque)
     {
         VFIOPCIQuirkProviderClass* vpqc = VFIO_PCI_QUIRK_PROVIDER_CLASS(klass);
         Error *local_err = NULL;
     
         vpqc->setup(opaque, &local_err);
     }
     
     bool vfio_config_quirk_setup(VFIOPCIDevice *vdev, Error **errp)
     {
         object_class_foreach(vfio_pci_quirk_class_foreach,
                              TYPE_VFIO_PCI_QUIRK_PROVIDER, false, vdev);
        ...




Thanks,

C.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-30 20:41         ` Alex Williamson
@ 2025-01-31 10:15           ` Cédric Le Goater
  2025-02-01  7:57             ` Tomita Moeko
  2025-02-01  7:48           ` Tomita Moeko
  1 sibling, 1 reply; 15+ messages in thread
From: Cédric Le Goater @ 2025-01-31 10:15 UTC (permalink / raw)
  To: Alex Williamson, Tomita Moeko; +Cc: qemu-devel

On 1/30/25 21:41, Alex Williamson wrote:
> On Fri, 31 Jan 2025 02:33:03 +0800
> Tomita Moeko <tomitamoeko@gmail.com> wrote:
> 
>> On 1/25/25 15:42, Tomita Moeko wrote:
>>> On 1/25/25 05:13, Alex Williamson wrote:
>>>> On Sat, 25 Jan 2025 03:12:45 +0800
>>>> Tomita Moeko <tomitamoeko@gmail.com> wrote:
>>>>   
>>>>> Both enable opregion option (x-igd-opregion) and legacy mode require
>>>>> setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
>>>>> in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
>>>>> code. Finally we moved all the IGD-related code into igd.c.
>>>>>
>>>>> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
>>>>> ---
>>>>>   hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
>>>>>   hw/vfio/pci-quirks.c |  50 ---------------
>>>>>   hw/vfio/pci.c        |  25 --------
>>>>>   hw/vfio/pci.h        |   4 --
>>>>>   4 files changed, 110 insertions(+), 112 deletions(-)
>>>>>
>>>>> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
>>>>> index 6e06dd774a..015beacf5f 100644
>>>>> --- a/hw/vfio/igd.c
>>>>> +++ b/hw/vfio/igd.c
>>>>> @@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
>>>>>       return -1;
>>>>>   }
>>>>>   
>>>>> +#define IGD_ASLS 0xfc /* ASL Storage Register */
>>>>>   #define IGD_GMCH 0x50 /* Graphics Control Register */
>>>>>   #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
>>>>>   #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
>>>>> @@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
>>>>>       return 0;
>>>>>   }
>>>>>   
>>>>> +/*
>>>>> + * The OpRegion includes the Video BIOS Table, which seems important for
>>>>> + * telling the driver what sort of outputs it has.  Without this, the device
>>>>> + * may work in the guest, but we may not get output.  This also requires BIOS
>>>>> + * support to reserve and populate a section of guest memory sufficient for
>>>>> + * the table and to write the base address of that memory to the ASLS register
>>>>> + * of the IGD device.
>>>>> + */
>>>>> +static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>>>>> +                                       struct vfio_region_info *info,
>>>>> +                                       Error **errp)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    vdev->igd_opregion = g_malloc0(info->size);
>>>>> +    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
>>>>> +                info->size, info->offset);
>>>>> +    if (ret != info->size) {
>>>>> +        error_setg(errp, "failed to read IGD OpRegion");
>>>>> +        g_free(vdev->igd_opregion);
>>>>> +        vdev->igd_opregion = NULL;
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    /*
>>>>> +     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
>>>>> +     * allocate 32bit reserved memory for, copy these contents into, and write
>>>>> +     * the reserved memory base address to the device ASLS register at 0xFC.
>>>>> +     * Alignment of this reserved region seems flexible, but using a 4k page
>>>>> +     * alignment seems to work well.  This interface assumes a single IGD
>>>>> +     * device, which may be at VM address 00:02.0 in legacy mode or another
>>>>> +     * address in UPT mode.
>>>>> +     *
>>>>> +     * NB, there may be future use cases discovered where the VM should have
>>>>> +     * direct interaction with the host OpRegion, in which case the write to
>>>>> +     * the ASLS register would trigger MemoryRegion setup to enable that.
>>>>> +     */
>>>>> +    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
>>>>> +                    vdev->igd_opregion, info->size);
>>>>> +
>>>>> +    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
>>>>> +
>>>>> +    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
>>>>> +    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
>>>>> +    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
>>>>> +
>>>>> +    return true;
>>>>> +}
>>>>> +
>>>>>   /*
>>>>>    * The rather short list of registers that we copy from the host devices.
>>>>>    * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
>>>>> @@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>>>>>       QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>>>>>   }
>>>>>   
>>>>> +static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
>>>>> +{
>>>>> +    g_autofree struct vfio_region_info *opregion = NULL;
>>>>> +    int ret;
>>>>> +
>>>>> +    /*
>>>>> +     * Hotplugging is not supprted for both opregion access and legacy mode.
>>>>> +     * For legacy mode, we also need to mark the ROM failed.
>>>>> +     */
>>>>
>>>> The explanation was a little better in the removed comment.
>>>>   
>>>>> +    if (vdev->pdev.qdev.hotplugged) {
>>>>> +        vdev->rom_read_failed = true;
>>>>> +        error_setg(errp,
>>>>> +                   "IGD OpRegion is not supported on hotplugged device");
>>>>
>>>> As was the error log.
>>>>   
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    ret = vfio_get_dev_region_info(&vdev->vbasedev,
>>>>> +                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>>>> +                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>>>>> +    if (ret) {
>>>>> +        error_setg_errno(errp, -ret,
>>>>> +                         "device does not supports IGD OpRegion feature");
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    return true;
>>>>> +}
>>>>> +
>>>>>   bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>>> -                                 Error **errp G_GNUC_UNUSED)
>>>>> +                                 Error **errp)
>>>>>   {
>>>>>       g_autofree struct vfio_region_info *rom = NULL;
>>>>> -    g_autofree struct vfio_region_info *opregion = NULL;
>>>>>       g_autofree struct vfio_region_info *host = NULL;
>>>>>       g_autofree struct vfio_region_info *lpc = NULL;
>>>>> +    PCIBus *bus;
>>>>>       PCIDevice *lpc_bridge;
>>>>>       int ret, gen;
>>>>> +    bool legacy_mode, enable_opregion;
>>>>>       uint64_t gms_size;
>>>>>       uint64_t *bdsm_size;
>>>>>       uint32_t gmch;
>>>>>       Error *err = NULL;
>>>>>   
>>>>> +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>>>> +        !vfio_is_vga(vdev)) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>>       /*
>>>>>        * This must be an Intel VGA device at address 00:02.0 for us to even
>>>>>        * consider enabling legacy mode.  The vBIOS has dependencies on the
>>>>>        * PCI bus address.
>>>>>        */
>>>>> -    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>>>> -        !vfio_is_vga(vdev) ||
>>>>> -        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
>>>>> -                                       0, PCI_DEVFN(0x2, 0))) {
>>>>> +    bus = pci_device_root_bus(&vdev->pdev);
>>>>> +    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
>>>>> +    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
>>>>> +
>>>>> +    if (!enable_opregion && !legacy_mode) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
>>>>> +        if (enable_opregion) {
>>>>> +            error_propagate(errp, err);
>>>>> +            return false;
>>>>> +        } else if (legacy_mode) {
>>>>> +            error_append_hint(&err, "IGD legacy mode disabled\n");
>>>>> +            error_report_err(err);
>>>>> +            return true;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    if (!legacy_mode) {
>>>>>           return true;
>>>>>       }
>>>>>   
>>>>> @@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>>>           return true;
>>>>>       }
>>>>>   
>>>>> -    /*
>>>>> -     * Ignore the hotplug corner case, mark the ROM failed, we can't
>>>>> -     * create the devices we need for legacy mode in the hotplug scenario.
>>>>> -     */
>>>>> -    if (vdev->pdev.qdev.hotplugged) {
>>>>> -        error_report("IGD device %s hotplugged, ROM disabled, "
>>>>> -                     "legacy mode disabled", vdev->vbasedev.name);
>>>>> -        vdev->rom_read_failed = true;
>>>>> -        return true;
>>>>> -    }
>>>>> -
>>>>>       /*
>>>>>        * Check whether we have all the vfio device specific regions to
>>>>>        * support legacy mode (added in Linux v4.6).  If not, bail.
>>>>>        */
>>>>> And we're disassociating opregion setup from this useful comment.
>>>>
>>>> What are we actually accomplishing here?  What specific code
>>>> duplication is eliminated?
>>>
>>> This patch is designed for moving the opregion quirk in vfio_realize()
>>> to igd.c, for better isolation of the igd-specific code. Legacy mode
>>> also need to initialize opregion as x-igd-opregion=on option. These
>>> code are almost the same, except legacy mode continues on error, while
>>> x-igd-opregion fails.
>>>
>>> I am going to clearify that in the commit message of v3.
>>>    
>>>> Why is it important to move all this code to igd.c?
>>
>> x-igd-opreqion quirk is currently located in pci-quirks.c, which is not
>> controlled by CONFIG_VFIO_IGD, moving it to igd.c prevents building
>> that unnecessary code in certain binaries, for example, non x86 builds.
>>
>>>> It's really difficult to untangle this refactor, I think it could be
>>>> done more iteratively, if it's really even beneficial.  Thanks,
>>>>
>>>> Alex
>>>
>>> Agreed. Actually I'd like to totally remove the "legacy mode" and "UPT
>>> mode" concept in future, my proposal is:
>>> * Emulate and initialize ASLS and BDSM register unconditionally. These
>>>    registers holds HPA, keeping the old value to guest is not a good
>>>    idea
>>> * Make the host bridge and LPC bridge ID quirks optional like OpRegion.
>>>    Recent Linux kernel and Windows driver seems not relying on it. This
>>>    enables IGD passthrough on Q35 machines, but probably without UEFI
>>>    GOP or VBIOS support, as it is observed they require specific LPC
>>>    bridge DID to work.
>>> * Remove the requirement of IGD device class being VGA controller, this
>>>    was previous discussed in my kernel change [1]
>>> * Update the document
>>>
>>> It would time consuming to implement all them, coding is not difficult,
>>> but I have to verify my change on diffrent platforms. And they are out
>>> of this patchset's scope I think. I personally perfers doing it in a
>>> future patchset.
>>>
>>> [1] https://lore.kernel.org/all/20250123163416.7653-1-tomitamoeko@gmail.com/
>>>
>>> Thanks,
>>> Moeko
>>
>> Please let me know if you have any thoughts or suggestions, in case
>> you missed the previous mail.
> 
> TBH, I'm surprised there's so much interest in direct assignment of
> igd.  I'd be curious in your motivation, if you can share it.
> 
> Regardless, it's nice to see it updated for newer hardware and I don't
> mind the goal of isolating the code so it can be disabled on other
> archs, so long as we can do so in small, logical steps that are easy to
> follow.
> 
> At this point, the idea of legacy vs UPT might only exist in QEMU.
> There are going to be some challenges to avoid breaking existing VM
> command lines if the host and LPC bridge quirks become optional.  There
> are a couple x-igd- options that we're free to break as they've always
> been experimental, but the implicit LPC bridge and host bridge quirks
> need to be considered carefully.  The fact that "legacy" mode has never
> previously worked on q35 could mean that we can tie those quirks to a
> new experimental option that's off by default and only enabled for
> 440fx machine types.
> 
> I'm glad you included the documentation update in your list, it's
> clearly out of date, as is some of my knowledge regarding guest driver
> requirements.  

Could we please have an update of docs/igd-assign.txt too ?

As some point, we should consolidate all VFIO documentation under
one section. That's another topic.

> I hope we can make some progress on uefi support as well,
> as that's essentially a requirement for newer guests.  If we can't get
> the code upstream into edk2, maybe we can at least document steps for
> others to create images.  Thanks,
>

So, I am bit lot here, forgive my ignorance.

I am seeing issues (a black screen and nothing else to report) with :

   00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c)

using uefi, seabios, pc or q35 does not change the result.


However, it works fine with a uefi q35 machine using :

   00:02.0 VGA compatible controller: Intel Corporation Alder Lake-N [UHD Graphics]

How can I dig into the first issue ?


Also, if we know that there are platform requirements for IGD assignment to work, we
should try to verify that they are met when the machine boots.


Thanks,

C.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-30 20:41         ` Alex Williamson
  2025-01-31 10:15           ` Cédric Le Goater
@ 2025-02-01  7:48           ` Tomita Moeko
  1 sibling, 0 replies; 15+ messages in thread
From: Tomita Moeko @ 2025-02-01  7:48 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Cédric Le Goater, qemu-devel

On 1/31/25 04:41, Alex Williamson wrote:
> On Fri, 31 Jan 2025 02:33:03 +0800
> Tomita Moeko <tomitamoeko@gmail.com> wrote:
> 
>> On 1/25/25 15:42, Tomita Moeko wrote:
>>> On 1/25/25 05:13, Alex Williamson wrote:  
>>>> On Sat, 25 Jan 2025 03:12:45 +0800
>>>> Tomita Moeko <tomitamoeko@gmail.com> wrote:
>>>>  
>>>>> Both enable opregion option (x-igd-opregion) and legacy mode require
>>>>> setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
>>>>> in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
>>>>> code. Finally we moved all the IGD-related code into igd.c.
>>>>>
>>>>> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
>>>>> ---
>>>>>  hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
>>>>>  hw/vfio/pci-quirks.c |  50 ---------------
>>>>>  hw/vfio/pci.c        |  25 --------
>>>>>  hw/vfio/pci.h        |   4 --
>>>>>  4 files changed, 110 insertions(+), 112 deletions(-)
>>>>>
>>>>> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
>>>>> index 6e06dd774a..015beacf5f 100644
>>>>> --- a/hw/vfio/igd.c
>>>>> +++ b/hw/vfio/igd.c
>>>>> @@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
>>>>>      return -1;
>>>>>  }
>>>>>  
>>>>> +#define IGD_ASLS 0xfc /* ASL Storage Register */
>>>>>  #define IGD_GMCH 0x50 /* Graphics Control Register */
>>>>>  #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
>>>>>  #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
>>>>> @@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
>>>>>      return 0;
>>>>>  }
>>>>>  
>>>>> +/*
>>>>> + * The OpRegion includes the Video BIOS Table, which seems important for
>>>>> + * telling the driver what sort of outputs it has.  Without this, the device
>>>>> + * may work in the guest, but we may not get output.  This also requires BIOS
>>>>> + * support to reserve and populate a section of guest memory sufficient for
>>>>> + * the table and to write the base address of that memory to the ASLS register
>>>>> + * of the IGD device.
>>>>> + */
>>>>> +static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>>>>> +                                       struct vfio_region_info *info,
>>>>> +                                       Error **errp)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    vdev->igd_opregion = g_malloc0(info->size);
>>>>> +    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
>>>>> +                info->size, info->offset);
>>>>> +    if (ret != info->size) {
>>>>> +        error_setg(errp, "failed to read IGD OpRegion");
>>>>> +        g_free(vdev->igd_opregion);
>>>>> +        vdev->igd_opregion = NULL;
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    /*
>>>>> +     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
>>>>> +     * allocate 32bit reserved memory for, copy these contents into, and write
>>>>> +     * the reserved memory base address to the device ASLS register at 0xFC.
>>>>> +     * Alignment of this reserved region seems flexible, but using a 4k page
>>>>> +     * alignment seems to work well.  This interface assumes a single IGD
>>>>> +     * device, which may be at VM address 00:02.0 in legacy mode or another
>>>>> +     * address in UPT mode.
>>>>> +     *
>>>>> +     * NB, there may be future use cases discovered where the VM should have
>>>>> +     * direct interaction with the host OpRegion, in which case the write to
>>>>> +     * the ASLS register would trigger MemoryRegion setup to enable that.
>>>>> +     */
>>>>> +    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
>>>>> +                    vdev->igd_opregion, info->size);
>>>>> +
>>>>> +    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
>>>>> +
>>>>> +    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
>>>>> +    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
>>>>> +    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
>>>>> +
>>>>> +    return true;
>>>>> +}
>>>>> +
>>>>>  /*
>>>>>   * The rather short list of registers that we copy from the host devices.
>>>>>   * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
>>>>> @@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>>>>>      QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>>>>>  }
>>>>>  
>>>>> +static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
>>>>> +{
>>>>> +    g_autofree struct vfio_region_info *opregion = NULL;
>>>>> +    int ret;
>>>>> +
>>>>> +    /*
>>>>> +     * Hotplugging is not supprted for both opregion access and legacy mode.
>>>>> +     * For legacy mode, we also need to mark the ROM failed.
>>>>> +     */  
>>>>
>>>> The explanation was a little better in the removed comment.
>>>>  
>>>>> +    if (vdev->pdev.qdev.hotplugged) {
>>>>> +        vdev->rom_read_failed = true;
>>>>> +        error_setg(errp,
>>>>> +                   "IGD OpRegion is not supported on hotplugged device");  
>>>>
>>>> As was the error log.
>>>>  
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    ret = vfio_get_dev_region_info(&vdev->vbasedev,
>>>>> +                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>>>> +                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>>>>> +    if (ret) {
>>>>> +        error_setg_errno(errp, -ret,
>>>>> +                         "device does not supports IGD OpRegion feature");
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    return true;
>>>>> +}
>>>>> +
>>>>>  bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>>> -                                 Error **errp G_GNUC_UNUSED)
>>>>> +                                 Error **errp)
>>>>>  {
>>>>>      g_autofree struct vfio_region_info *rom = NULL;
>>>>> -    g_autofree struct vfio_region_info *opregion = NULL;
>>>>>      g_autofree struct vfio_region_info *host = NULL;
>>>>>      g_autofree struct vfio_region_info *lpc = NULL;
>>>>> +    PCIBus *bus;
>>>>>      PCIDevice *lpc_bridge;
>>>>>      int ret, gen;
>>>>> +    bool legacy_mode, enable_opregion;
>>>>>      uint64_t gms_size;
>>>>>      uint64_t *bdsm_size;
>>>>>      uint32_t gmch;
>>>>>      Error *err = NULL;
>>>>>  
>>>>> +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>>>> +        !vfio_is_vga(vdev)) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>>      /*
>>>>>       * This must be an Intel VGA device at address 00:02.0 for us to even
>>>>>       * consider enabling legacy mode.  The vBIOS has dependencies on the
>>>>>       * PCI bus address.
>>>>>       */
>>>>> -    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>>>> -        !vfio_is_vga(vdev) ||
>>>>> -        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
>>>>> -                                       0, PCI_DEVFN(0x2, 0))) {
>>>>> +    bus = pci_device_root_bus(&vdev->pdev);
>>>>> +    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
>>>>> +    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
>>>>> +
>>>>> +    if (!enable_opregion && !legacy_mode) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
>>>>> +        if (enable_opregion) {
>>>>> +            error_propagate(errp, err);
>>>>> +            return false;
>>>>> +        } else if (legacy_mode) {
>>>>> +            error_append_hint(&err, "IGD legacy mode disabled\n");
>>>>> +            error_report_err(err);
>>>>> +            return true;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    if (!legacy_mode) {
>>>>>          return true;
>>>>>      }
>>>>>  
>>>>> @@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>>>          return true;
>>>>>      }
>>>>>  
>>>>> -    /*
>>>>> -     * Ignore the hotplug corner case, mark the ROM failed, we can't
>>>>> -     * create the devices we need for legacy mode in the hotplug scenario.
>>>>> -     */
>>>>> -    if (vdev->pdev.qdev.hotplugged) {
>>>>> -        error_report("IGD device %s hotplugged, ROM disabled, "
>>>>> -                     "legacy mode disabled", vdev->vbasedev.name);
>>>>> -        vdev->rom_read_failed = true;
>>>>> -        return true;
>>>>> -    }
>>>>> -
>>>>>      /*
>>>>>       * Check whether we have all the vfio device specific regions to
>>>>>       * support legacy mode (added in Linux v4.6).  If not, bail.
>>>>>       */
>>>>> And we're disassociating opregion setup from this useful comment.  
>>>>
>>>> What are we actually accomplishing here?  What specific code
>>>> duplication is eliminated?  
>>>
>>> This patch is designed for moving the opregion quirk in vfio_realize()
>>> to igd.c, for better isolation of the igd-specific code. Legacy mode
>>> also need to initialize opregion as x-igd-opregion=on option. These
>>> code are almost the same, except legacy mode continues on error, while
>>> x-igd-opregion fails.
>>>
>>> I am going to clearify that in the commit message of v3.
>>>   
>>>> Why is it important to move all this code to igd.c?  
>>
>> x-igd-opreqion quirk is currently located in pci-quirks.c, which is not
>> controlled by CONFIG_VFIO_IGD, moving it to igd.c prevents building
>> that unnecessary code in certain binaries, for example, non x86 builds. 
>>
>>>> It's really difficult to untangle this refactor, I think it could be
>>>> done more iteratively, if it's really even beneficial.  Thanks,
>>>>
>>>> Alex  
>>>
>>> Agreed. Actually I'd like to totally remove the "legacy mode" and "UPT
>>> mode" concept in future, my proposal is:
>>> * Emulate and initialize ASLS and BDSM register unconditionally. These
>>>   registers holds HPA, keeping the old value to guest is not a good
>>>   idea
>>> * Make the host bridge and LPC bridge ID quirks optional like OpRegion.
>>>   Recent Linux kernel and Windows driver seems not relying on it. This
>>>   enables IGD passthrough on Q35 machines, but probably without UEFI
>>>   GOP or VBIOS support, as it is observed they require specific LPC
>>>   bridge DID to work.
>>> * Remove the requirement of IGD device class being VGA controller, this
>>>   was previous discussed in my kernel change [1]
>>> * Update the document
>>>
>>> It would time consuming to implement all them, coding is not difficult,
>>> but I have to verify my change on diffrent platforms. And they are out
>>> of this patchset's scope I think. I personally perfers doing it in a
>>> future patchset.
>>>
>>> [1] https://lore.kernel.org/all/20250123163416.7653-1-tomitamoeko@gmail.com/
>>>
>>> Thanks,
>>> Moeko  
>>
>> Please let me know if you have any thoughts or suggestions, in case
>> you missed the previous mail.
> 
> TBH, I'm surprised there's so much interest in direct assignment of
> igd.  I'd be curious in your motivation, if you can share it.

I was setting up a windows guest on the linux box in my university lab
since some software used in the university only supports Windows :(.
As the linux box only runs background services like nfs share, and I
only ssh to it for coding. I'd like to passthrough the GPU to guset so
it can have baremetal-like GUI experience. During setup, I found there
were limitations and conflicting tutorials, which makes me curious
about how it actually works.

> Regardless, it's nice to see it updated for newer hardware and I don't
> mind the goal of isolating the code so it can be disabled on other
> archs, so long as we can do so in small, logical steps that are easy to
> follow.
> 
> At this point, the idea of legacy vs UPT might only exist in QEMU.
> There are going to be some challenges to avoid breaking existing VM
> command lines if the host and LPC bridge quirks become optional.  There
> are a couple x-igd- options that we're free to break as they've always
> been experimental, but the implicit LPC bridge and host bridge quirks
> need to be considered carefully.  The fact that "legacy" mode has never
> previously worked on q35 could mean that we can tie those quirks to a
> new experimental option that's off by default and only enabled for
> 440fx machine types.

Currently legacy mode requires creating a dummy LPC bridge at 00:1f.0,
but on q35, there is already an ICH9 LPC there. In my trials, LPC is
not a must for linux guests and windows guests with recent driver.
Only UEFI GOP requires it. Making it as another x-igd- option will
give more flexibility I believe.

"Legacy" mode implicitly means enabling the options below I think
* x-igd-opregion=on
* dummy lpc bridge, as well as host bridge ids
* x-vga=on, not a must, but code tries to enable VGA access

Remoing legacy mode while keeping the current behavior is challenging,
as legacy mode is implicitly enabled by addr=0x2 and doesn't fail if
any condition is not met. It seems we cannot replace legacy mode in
a completely clean mannar, my proposal for the default values are:

| config          | x-igd-opregion | lpc quirk |
|-----------------|----------------|-----------|
| i440fx,addr=0x2 | yes            | yes       |
| i440fx,others   | no             | no        |
| q35             | no             | no        |

However, it doesn't handle the cases legacy mode fails halfway :(,
any suggestions would be greatly appreciated.

> I'm glad you included the documentation update in your list, it's
> clearly out of date, as is some of my knowledge regarding guest driver
> requirements.  I hope we can make some progress on uefi support as well,
> as that's essentially a requirement for newer guests.  If we can't get
> the code upstream into edk2, maybe we can at least document steps for
> others to create images.  Thanks,

Sure. I will include it in the documentation update.

> Alex
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk()
  2025-01-31 10:15           ` Cédric Le Goater
@ 2025-02-01  7:57             ` Tomita Moeko
  0 siblings, 0 replies; 15+ messages in thread
From: Tomita Moeko @ 2025-02-01  7:57 UTC (permalink / raw)
  To: Cédric Le Goater, Alex Williamson; +Cc: qemu-devel

On 1/31/25 18:15, Cédric Le Goater wrote:
> On 1/30/25 21:41, Alex Williamson wrote:
>> On Fri, 31 Jan 2025 02:33:03 +0800
>> Tomita Moeko <tomitamoeko@gmail.com> wrote:
>>
>>> On 1/25/25 15:42, Tomita Moeko wrote:
>>>> On 1/25/25 05:13, Alex Williamson wrote:
>>>>> On Sat, 25 Jan 2025 03:12:45 +0800
>>>>> Tomita Moeko <tomitamoeko@gmail.com> wrote:
>>>>>  
>>>>>> Both enable opregion option (x-igd-opregion) and legacy mode require
>>>>>> setting up OpRegion copy for IGD devices. Move x-igd-opregion handler
>>>>>> in vfio_realize() to vfio_probe_igd_config_quirk() to elimate duplicate
>>>>>> code. Finally we moved all the IGD-related code into igd.c.
>>>>>>
>>>>>> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
>>>>>> ---
>>>>>>   hw/vfio/igd.c        | 143 +++++++++++++++++++++++++++++++++----------
>>>>>>   hw/vfio/pci-quirks.c |  50 ---------------
>>>>>>   hw/vfio/pci.c        |  25 --------
>>>>>>   hw/vfio/pci.h        |   4 --
>>>>>>   4 files changed, 110 insertions(+), 112 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
>>>>>> index 6e06dd774a..015beacf5f 100644
>>>>>> --- a/hw/vfio/igd.c
>>>>>> +++ b/hw/vfio/igd.c
>>>>>> @@ -106,6 +106,7 @@ static int igd_gen(VFIOPCIDevice *vdev)
>>>>>>       return -1;
>>>>>>   }
>>>>>>   +#define IGD_ASLS 0xfc /* ASL Storage Register */
>>>>>>   #define IGD_GMCH 0x50 /* Graphics Control Register */
>>>>>>   #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
>>>>>>   #define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
>>>>>> @@ -138,6 +139,55 @@ static uint64_t igd_stolen_memory_size(int gen, uint32_t gmch)
>>>>>>       return 0;
>>>>>>   }
>>>>>>   +/*
>>>>>> + * The OpRegion includes the Video BIOS Table, which seems important for
>>>>>> + * telling the driver what sort of outputs it has.  Without this, the device
>>>>>> + * may work in the guest, but we may not get output.  This also requires BIOS
>>>>>> + * support to reserve and populate a section of guest memory sufficient for
>>>>>> + * the table and to write the base address of that memory to the ASLS register
>>>>>> + * of the IGD device.
>>>>>> + */
>>>>>> +static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>>>>>> +                                       struct vfio_region_info *info,
>>>>>> +                                       Error **errp)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    vdev->igd_opregion = g_malloc0(info->size);
>>>>>> +    ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
>>>>>> +                info->size, info->offset);
>>>>>> +    if (ret != info->size) {
>>>>>> +        error_setg(errp, "failed to read IGD OpRegion");
>>>>>> +        g_free(vdev->igd_opregion);
>>>>>> +        vdev->igd_opregion = NULL;
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to
>>>>>> +     * allocate 32bit reserved memory for, copy these contents into, and write
>>>>>> +     * the reserved memory base address to the device ASLS register at 0xFC.
>>>>>> +     * Alignment of this reserved region seems flexible, but using a 4k page
>>>>>> +     * alignment seems to work well.  This interface assumes a single IGD
>>>>>> +     * device, which may be at VM address 00:02.0 in legacy mode or another
>>>>>> +     * address in UPT mode.
>>>>>> +     *
>>>>>> +     * NB, there may be future use cases discovered where the VM should have
>>>>>> +     * direct interaction with the host OpRegion, in which case the write to
>>>>>> +     * the ASLS register would trigger MemoryRegion setup to enable that.
>>>>>> +     */
>>>>>> +    fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
>>>>>> +                    vdev->igd_opregion, info->size);
>>>>>> +
>>>>>> +    trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
>>>>>> +
>>>>>> +    pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
>>>>>> +    pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
>>>>>> +    pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
>>>>>> +
>>>>>> +    return true;
>>>>>> +}
>>>>>> +
>>>>>>   /*
>>>>>>    * The rather short list of registers that we copy from the host devices.
>>>>>>    * The LPC/ISA bridge values are definitely needed to support the vBIOS, the
>>>>>> @@ -339,29 +389,83 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>>>>>>       QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>>>>>>   }
>>>>>>   +static bool vfio_igd_try_enable_opregion(VFIOPCIDevice *vdev, Error **errp)
>>>>>> +{
>>>>>> +    g_autofree struct vfio_region_info *opregion = NULL;
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Hotplugging is not supprted for both opregion access and legacy mode.
>>>>>> +     * For legacy mode, we also need to mark the ROM failed.
>>>>>> +     */
>>>>>
>>>>> The explanation was a little better in the removed comment.
>>>>>  
>>>>>> +    if (vdev->pdev.qdev.hotplugged) {
>>>>>> +        vdev->rom_read_failed = true;
>>>>>> +        error_setg(errp,
>>>>>> +                   "IGD OpRegion is not supported on hotplugged device");
>>>>>
>>>>> As was the error log.
>>>>>  
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    ret = vfio_get_dev_region_info(&vdev->vbasedev,
>>>>>> +                    VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
>>>>>> +                    VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
>>>>>> +    if (ret) {
>>>>>> +        error_setg_errno(errp, -ret,
>>>>>> +                         "device does not supports IGD OpRegion feature");
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    return true;
>>>>>> +}
>>>>>> +
>>>>>>   bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>>>> -                                 Error **errp G_GNUC_UNUSED)
>>>>>> +                                 Error **errp)
>>>>>>   {
>>>>>>       g_autofree struct vfio_region_info *rom = NULL;
>>>>>> -    g_autofree struct vfio_region_info *opregion = NULL;
>>>>>>       g_autofree struct vfio_region_info *host = NULL;
>>>>>>       g_autofree struct vfio_region_info *lpc = NULL;
>>>>>> +    PCIBus *bus;
>>>>>>       PCIDevice *lpc_bridge;
>>>>>>       int ret, gen;
>>>>>> +    bool legacy_mode, enable_opregion;
>>>>>>       uint64_t gms_size;
>>>>>>       uint64_t *bdsm_size;
>>>>>>       uint32_t gmch;
>>>>>>       Error *err = NULL;
>>>>>>   +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>>>>> +        !vfio_is_vga(vdev)) {
>>>>>> +        return true;
>>>>>> +    }
>>>>>> +
>>>>>>       /*
>>>>>>        * This must be an Intel VGA device at address 00:02.0 for us to even
>>>>>>        * consider enabling legacy mode.  The vBIOS has dependencies on the
>>>>>>        * PCI bus address.
>>>>>>        */
>>>>>> -    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>>>>>> -        !vfio_is_vga(vdev) ||
>>>>>> -        &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
>>>>>> -                                       0, PCI_DEVFN(0x2, 0))) {
>>>>>> +    bus = pci_device_root_bus(&vdev->pdev);
>>>>>> +    legacy_mode = (&vdev->pdev == pci_find_device(bus, 0, PCI_DEVFN(0x2, 0)));
>>>>>> +    enable_opregion = (vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION);
>>>>>> +
>>>>>> +    if (!enable_opregion && !legacy_mode) {
>>>>>> +        return true;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!vfio_igd_try_enable_opregion(vdev, &err)) {
>>>>>> +        if (enable_opregion) {
>>>>>> +            error_propagate(errp, err);
>>>>>> +            return false;
>>>>>> +        } else if (legacy_mode) {
>>>>>> +            error_append_hint(&err, "IGD legacy mode disabled\n");
>>>>>> +            error_report_err(err);
>>>>>> +            return true;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!legacy_mode) {
>>>>>>           return true;
>>>>>>       }
>>>>>>   @@ -404,30 +508,10 @@ bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>>>>>>           return true;
>>>>>>       }
>>>>>>   -    /*
>>>>>> -     * Ignore the hotplug corner case, mark the ROM failed, we can't
>>>>>> -     * create the devices we need for legacy mode in the hotplug scenario.
>>>>>> -     */
>>>>>> -    if (vdev->pdev.qdev.hotplugged) {
>>>>>> -        error_report("IGD device %s hotplugged, ROM disabled, "
>>>>>> -                     "legacy mode disabled", vdev->vbasedev.name);
>>>>>> -        vdev->rom_read_failed = true;
>>>>>> -        return true;
>>>>>> -    }
>>>>>> -
>>>>>>       /*
>>>>>>        * Check whether we have all the vfio device specific regions to
>>>>>>        * support legacy mode (added in Linux v4.6).  If not, bail.
>>>>>>        */
>>>>>> And we're disassociating opregion setup from this useful comment.
>>>>>
>>>>> What are we actually accomplishing here?  What specific code
>>>>> duplication is eliminated?
>>>>
>>>> This patch is designed for moving the opregion quirk in vfio_realize()
>>>> to igd.c, for better isolation of the igd-specific code. Legacy mode
>>>> also need to initialize opregion as x-igd-opregion=on option. These
>>>> code are almost the same, except legacy mode continues on error, while
>>>> x-igd-opregion fails.
>>>>
>>>> I am going to clearify that in the commit message of v3.
>>>>   
>>>>> Why is it important to move all this code to igd.c?
>>>
>>> x-igd-opreqion quirk is currently located in pci-quirks.c, which is not
>>> controlled by CONFIG_VFIO_IGD, moving it to igd.c prevents building
>>> that unnecessary code in certain binaries, for example, non x86 builds.
>>>
>>>>> It's really difficult to untangle this refactor, I think it could be
>>>>> done more iteratively, if it's really even beneficial.  Thanks,
>>>>>
>>>>> Alex
>>>>
>>>> Agreed. Actually I'd like to totally remove the "legacy mode" and "UPT
>>>> mode" concept in future, my proposal is:
>>>> * Emulate and initialize ASLS and BDSM register unconditionally. These
>>>>    registers holds HPA, keeping the old value to guest is not a good
>>>>    idea
>>>> * Make the host bridge and LPC bridge ID quirks optional like OpRegion.
>>>>    Recent Linux kernel and Windows driver seems not relying on it. This
>>>>    enables IGD passthrough on Q35 machines, but probably without UEFI
>>>>    GOP or VBIOS support, as it is observed they require specific LPC
>>>>    bridge DID to work.
>>>> * Remove the requirement of IGD device class being VGA controller, this
>>>>    was previous discussed in my kernel change [1]
>>>> * Update the document
>>>>
>>>> It would time consuming to implement all them, coding is not difficult,
>>>> but I have to verify my change on diffrent platforms. And they are out
>>>> of this patchset's scope I think. I personally perfers doing it in a
>>>> future patchset.
>>>>
>>>> [1] https://lore.kernel.org/all/20250123163416.7653-1-tomitamoeko@gmail.com/
>>>>
>>>> Thanks,
>>>> Moeko
>>>
>>> Please let me know if you have any thoughts or suggestions, in case
>>> you missed the previous mail.
>>
>> TBH, I'm surprised there's so much interest in direct assignment of
>> igd.  I'd be curious in your motivation, if you can share it.
>>
>> Regardless, it's nice to see it updated for newer hardware and I don't
>> mind the goal of isolating the code so it can be disabled on other
>> archs, so long as we can do so in small, logical steps that are easy to
>> follow.
>>
>> At this point, the idea of legacy vs UPT might only exist in QEMU.
>> There are going to be some challenges to avoid breaking existing VM
>> command lines if the host and LPC bridge quirks become optional.  There
>> are a couple x-igd- options that we're free to break as they've always
>> been experimental, but the implicit LPC bridge and host bridge quirks
>> need to be considered carefully.  The fact that "legacy" mode has never
>> previously worked on q35 could mean that we can tie those quirks to a
>> new experimental option that's off by default and only enabled for
>> 440fx machine types.
>>
>> I'm glad you included the documentation update in your list, it's
>> clearly out of date, as is some of my knowledge regarding guest driver
>> requirements.  
> 
> Could we please have an update of docs/igd-assign.txt too ?
> 
> As some point, we should consolidate all VFIO documentation under
> one section. That's another topic.
> 
>> I hope we can make some progress on uefi support as well,
>> as that's essentially a requirement for newer guests.  If we can't get
>> the code upstream into edk2, maybe we can at least document steps for
>> others to create images.  Thanks,
>>
> 
> So, I am bit lot here, forgive my ignorance.
> 
> I am seeing issues (a black screen and nothing else to report) with :
> 
>   00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c)
> 
> using uefi, seabios, pc or q35 does not change the result.
> 
> 
> However, it works fine with a uefi q35 machine using :
> 
>   00:02.0 VGA compatible controller: Intel Corporation Alder Lake-N [UHD Graphics]
> 
> How can I dig into the first issue ?
> 

If you are running a linux guest, `dmesg | grep i915` in guest is
always a good start. Adding `drm_debug` to kernel cmdline logs more
details. I mainly uses `drm_debug=0x6` for igd passthrough (Enable
Driver and KMS logging).

https://docs.kernel.org/gpu/drm-internals.html

> Also, if we know that there are platform requirements for IGD assignment to work, we
> should try to verify that they are met when the machine boots.

Well noted.
 
> Thanks,
> 
> C.
> 
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk
  2025-01-31  9:14   ` Cédric Le Goater
@ 2025-02-02 16:24     ` Tomita Moeko
  0 siblings, 0 replies; 15+ messages in thread
From: Tomita Moeko @ 2025-02-02 16:24 UTC (permalink / raw)
  To: Cédric Le Goater, Alex Williamson; +Cc: qemu-devel

On 1/31/25 17:14, Cédric Le Goater wrote:
> Hello Tomita,
> 
> On 1/24/25 20:12, Tomita Moeko wrote:
>> The actual IO BAR4 write quirk in vfio_probe_igd_bar4_quirk() was
>> removed in previous change, leaving the function not matching its name,
>> so move it into the newly introduced vfio_config_quirk_setup(). There
>> is no functional change in this commit. If any failure occurs, the
>> function simply returns and proceeds.
>>
>> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
>> ---
>>   hw/vfio/igd.c        | 31 +++++++++++++++++--------------
>>   hw/vfio/pci-quirks.c |  6 +++++-
>>   hw/vfio/pci.h        |  2 +-
>>   3 files changed, 23 insertions(+), 16 deletions(-)
>>
>> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
>> index ca3a32f4f2..376a26fbae 100644
>> --- a/hw/vfio/igd.c
>> +++ b/hw/vfio/igd.c
>> @@ -359,7 +359,8 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>>       QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, bdsm_quirk, next);
>>   }
>>   -void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>> +bool vfio_probe_igd_config_quirk(VFIOPCIDevice *vdev,
>> +                                 Error **errp G_GNUC_UNUSED)
> 
> Adding an 'Error **' parameter is in improvement indeed. All the error_report
> of this routine need to be converted too.

Sorry I missed this mail previously.

Originally this function simply return and continue on error, I prefer
keeping current behavior until legacy mode was finally removed, as I
described in my reply to Alex. At that point, this function will
terminate and report on error.
 
>>   {
>>       g_autofree struct vfio_region_info *rom = NULL;
>>       g_autofree struct vfio_region_info *opregion = NULL;
>> @@ -378,10 +379,10 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>        * PCI bus address.
>>        */
>>       if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
>> -        !vfio_is_vga(vdev) || nr != 4 ||
>> +        !vfio_is_vga(vdev) ||
>>           &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
>>                                          0, PCI_DEVFN(0x2, 0))) {
>> -        return;
>> +        return true;
>>       }
>>         /*
>> @@ -395,7 +396,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>                                              "vfio-pci-igd-lpc-bridge")) {
>>           error_report("IGD device %s cannot support legacy mode due to existing "
>>                        "devices at address 1f.0", vdev->vbasedev.name);
>> -        return;
>> +        return true;
> 
> if there is an error_report, why is this returning true ? It should be false.

Please see my comment above
  
>>       }
>>         /*
>> @@ -407,7 +408,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>       if (gen == -1) {
>>           error_report("IGD device %s is unsupported in legacy mode, "
>>                        "try SandyBridge or newer", vdev->vbasedev.name);
>> -        return;
>> +        return true;
> 
> same here and else where.
> 
>>       }
>>         /*
>> @@ -420,7 +421,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>       if ((ret || !rom->size) && !vdev->pdev.romfile) {
>>           error_report("IGD device %s has no ROM, legacy mode disabled",
>>                        vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         /*
>> @@ -431,7 +432,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>           error_report("IGD device %s hotplugged, ROM disabled, "
>>                        "legacy mode disabled", vdev->vbasedev.name);
>>           vdev->rom_read_failed = true;
>> -        return;
>> +        return true;
>>       }
>>         /*
>> @@ -444,7 +445,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>       if (ret) {
>>           error_report("IGD device %s does not support OpRegion access,"
>>                        "legacy mode disabled", vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         ret = vfio_get_dev_region_info(&vdev->vbasedev,
>> @@ -453,7 +454,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>       if (ret) {
>>           error_report("IGD device %s does not support host bridge access,"
>>                        "legacy mode disabled", vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         ret = vfio_get_dev_region_info(&vdev->vbasedev,
>> @@ -462,7 +463,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>       if (ret) {
>>           error_report("IGD device %s does not support LPC bridge access,"
>>                        "legacy mode disabled", vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, 4);
>> @@ -476,7 +477,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>           error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>>           error_report("IGD device %s failed to enable VGA access, "
>>                        "legacy mode disabled", vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         /* Create our LPC/ISA bridge */
>> @@ -484,7 +485,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>       if (ret) {
>>           error_report("IGD device %s failed to create LPC bridge, "
>>                        "legacy mode disabled", vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         /* Stuff some host values into the VM PCI host bridge */
>> @@ -492,14 +493,14 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>       if (ret) {
>>           error_report("IGD device %s failed to modify host bridge, "
>>                        "legacy mode disabled", vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         /* Setup OpRegion access */
>>       if (!vfio_pci_igd_opregion_init(vdev, opregion, &err)) {
>>           error_append_hint(&err, "IGD legacy mode disabled\n");
>>           error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>> -        return;
>> +        return true;
>>       }
>>         /*
>> @@ -561,4 +562,6 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
>>         trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name,
>>                                       (ggms_size + gms_size) / MiB);
>> +
>> +    return true;
>>   }
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index c40e3ca88f..b8379cb512 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -1169,6 +1169,11 @@ bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>>    */
>>   bool vfio_config_quirk_setup(VFIOPCIDevice *vdev, Error **errp)
>>   {
>> +#ifdef CONFIG_VFIO_IGD
> 
> oh. We try to avoid such ifdef in QEMU. Only very specific and high level
> configs are discarded at compile time (KVM, LINUX, etc).

This ifdef has been in QEMU code, for only building igd-related code
into x86 target (VFIO_IGD is only seleted when PC_PCI=on, and I440FX/
Q35 selests PC_PCI). IMO using the ifdef here is reasonable, as IGD
quirks is a part of vfio-pci, but using QOM in furure would be a
good direction.

> One way to adress this case, would be to use QOM. Example below :
> 
>  
> Declare a base class :
>         #define TYPE_VFIO_PCI_QUIRK_PROVIDER "vfio-pci-quirk-provider"
>         OBJECT_DECLARE_TYPE(VFIOPCIQuirkProvider, VFIOPCIQuirkProviderClass,
>                         VFIO_PCI_QUIRK_PROVIDER)
>         struct VFIOPCIQuirkProviderClass {
>         ObjectClass parent;
>             bool (*probe)(VFIOPCIDevice *vdev, Error **errp);
>         bool (*setup)(VFIOPCIDevice *vdev, Error **errp);
>     };
>         static const TypeInfo vfio_pci_quirk_info = {
>         .name = TYPE_VFIO_PCI_QUIRK_PROVIDER,
>         .parent = TYPE_OBJECT,
>         .class_size = sizeof(VFIOPCIQuirkClass),
>         .abstract = true,
>     };
>         static void register_vfio_pci_quirk_type(void)
>     {
>         type_register_static(&vfio_pci_quirk_info);
>     }
>         type_init(register_vfio_pci_quirk_type)
> 
> 
> Declare one for IGD
>         static void vfio_pci_quirk_igd_class_init(ObjectClass *klass, void *data)
>     {
>         VFIOPCIQuirkClass* vpqc = VFIO_PCI_QUIRK_PROVIDER_CLASS(klass);
>             vpqc->setup = vfio_probe_igd_quirk_probe;
>         vpqc->probe = vfio_probe_igd_quirk_probe;
>     }
>         static const TypeInfo vfio_pci_quirk_igd_info = {
>         .name = TYPE_VFIO_PCI_QUIRK_PROVIDER "-igd",
>         .parent = TYPE_VFIO_PCI_QUIRK_PROVIDER,
>         .class_init = vfio_pci_quirk_igd_class_init,
>         .class_size = sizeof(VFIOPCIQuirkClass),
>     };
>         static void vfio_pci_quirk_igd_register_types(void)
>     {
>         type_register_static(&vfio_pci_quirk_igd_info);
>     }
>         type_init(vfio_pci_quirk_igd_register_types)
> 
> 
> and in the common part, loop on all the classes to probe and setup :
> 
> 
>     static void vfio_pci_quirk_class_foreach(ObjectClass *klass, void *opaque)
>     {
>         VFIOPCIQuirkProviderClass* vpqc = VFIO_PCI_QUIRK_PROVIDER_CLASS(klass);
>         Error *local_err = NULL;
>             vpqc->setup(opaque, &local_err);
>     }
>         bool vfio_config_quirk_setup(VFIOPCIDevice *vdev, Error **errp)
>     {
>         object_class_foreach(vfio_pci_quirk_class_foreach,
>                              TYPE_VFIO_PCI_QUIRK_PROVIDER, false, vdev);
>        ...
> 
> 
> 
> 
> Thanks,
> 
> C.
> 
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-02 16:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-24 19:12 [PATCH v2 0/5] vfio/igd: remove incorrect IO BAR4 quirk Tomita Moeko
2025-01-24 19:12 ` [PATCH v2 1/5] vfio/igd: remove GTT write quirk in IO BAR 4 Tomita Moeko
2025-01-24 19:12 ` [PATCH v2 2/5] vfio/pci: add placeholder for device-specific config space quirks Tomita Moeko
2025-01-24 19:12 ` [PATCH v2 3/5] vfio/igd: refactor vfio_probe_igd_bar4_quirk() into pci config quirk Tomita Moeko
2025-01-31  9:14   ` Cédric Le Goater
2025-02-02 16:24     ` Tomita Moeko
2025-01-24 19:12 ` [PATCH v2 4/5] vfio/igd: do not include GTT stolen size in etc/igd-bdsm-size Tomita Moeko
2025-01-24 19:12 ` [PATCH v2 5/5] vfio/igd: handle x-igd-opregion in vfio_probe_igd_config_quirk() Tomita Moeko
2025-01-24 21:13   ` Alex Williamson
2025-01-25  7:42     ` Tomita Moeko
2025-01-30 18:33       ` Tomita Moeko
2025-01-30 20:41         ` Alex Williamson
2025-01-31 10:15           ` Cédric Le Goater
2025-02-01  7:57             ` Tomita Moeko
2025-02-01  7:48           ` Tomita Moeko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).