* [Qemu-devel] [PATCH v6 0/8] Series short description @ 2016-05-17 20:19 Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 1/8] vfio: Enable sparse mmap capability Alex Williamson ` (8 more replies) 0 siblings, 9 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:19 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm The following series implements... --- Alex Williamson (8): vfio: Enable sparse mmap capability vfio: Create device specific region info helper vfio/pci: Fix return of vfio_populate_vga() vfio/pci: Consolidate VGA setup vfio/pci: Setup BAR quirks after capabilities probing vfio/pci: Intel graphics legacy mode assignment vfio/pci: Add a separate option for IGD OpRegion support vfio/pci: Add IGD documentation docs/igd-assign.txt | 133 ++++++++ hw/vfio/common.c | 103 ++++++- hw/vfio/pci-quirks.c | 643 +++++++++++++++++++++++++++++++++++++++++ hw/vfio/pci.c | 151 ++++++---- hw/vfio/pci.h | 8 + include/hw/vfio/vfio-common.h | 2 trace-events | 11 + 7 files changed, 989 insertions(+), 62 deletions(-) create mode 100644 docs/igd-assign.txt -- Signature ^ permalink raw reply [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 1/8] vfio: Enable sparse mmap capability 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson @ 2016-05-17 20:19 ` Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 2/8] vfio: Create device specific region info helper Alex Williamson ` (7 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:19 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm The sparse mmap capability in a vfio region info allows vfio to tell us which sub-areas of a region may be mmap'd. Thus rather than assuming a single mmap covers the entire region and later frobbing it ourselves for things like the PCI MSI-X vector table, we can read that directly from vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- hw/vfio/common.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++--- trace-events | 2 ++ 2 files changed, 65 insertions(+), 4 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index f27db36..eb6f04e 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -496,6 +496,54 @@ static void vfio_listener_release(VFIOContainer *container) memory_listener_unregister(&container->listener); } +static struct vfio_info_cap_header * +vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id) +{ + struct vfio_info_cap_header *hdr; + void *ptr = info; + + if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) { + return NULL; + } + + for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) { + if (hdr->id == id) { + return hdr; + } + } + + return NULL; +} + +static void vfio_setup_region_sparse_mmaps(VFIORegion *region, + struct vfio_region_info *info) +{ + struct vfio_info_cap_header *hdr; + struct vfio_region_info_cap_sparse_mmap *sparse; + int i; + + hdr = vfio_get_region_info_cap(info, VFIO_REGION_INFO_CAP_SPARSE_MMAP); + if (!hdr) { + return; + } + + sparse = container_of(hdr, struct vfio_region_info_cap_sparse_mmap, header); + + trace_vfio_region_sparse_mmap_header(region->vbasedev->name, + region->nr, sparse->nr_areas); + + region->nr_mmaps = sparse->nr_areas; + region->mmaps = g_new0(VFIOMmap, region->nr_mmaps); + + for (i = 0; i < region->nr_mmaps; i++) { + region->mmaps[i].offset = sparse->areas[i].offset; + region->mmaps[i].size = sparse->areas[i].size; + trace_vfio_region_sparse_mmap_entry(i, region->mmaps[i].offset, + region->mmaps[i].offset + + region->mmaps[i].size); + } +} + int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region, int index, const char *name) { @@ -522,11 +570,14 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region, region->flags & VFIO_REGION_INFO_FLAG_MMAP && !(region->size & ~qemu_real_host_page_mask)) { - region->nr_mmaps = 1; - region->mmaps = g_new0(VFIOMmap, region->nr_mmaps); + vfio_setup_region_sparse_mmaps(region, info); - region->mmaps[0].offset = 0; - region->mmaps[0].size = region->size; + if (!region->nr_mmaps) { + region->nr_mmaps = 1; + region->mmaps = g_new0(VFIOMmap, region->nr_mmaps); + region->mmaps[0].offset = 0; + region->mmaps[0].size = region->size; + } } } @@ -1086,6 +1137,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index, *info = g_malloc0(argsz); (*info)->index = index; +retry: (*info)->argsz = argsz; if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) { @@ -1093,6 +1145,13 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index, return -errno; } + if ((*info)->argsz > argsz) { + argsz = (*info)->argsz; + *info = g_realloc(*info, argsz); + + goto retry; + } + return 0; } diff --git a/trace-events b/trace-events index 4fce005..99c5132 100644 --- a/trace-events +++ b/trace-events @@ -1734,6 +1734,8 @@ vfio_region_mmap(const char *name, unsigned long offset, unsigned long end) "Reg vfio_region_exit(const char *name, int index) "Device %s, region %d" vfio_region_finalize(const char *name, int index) "Device %s, region %d" vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d" +vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries" +vfio_region_sparse_mmap_entry(int i, off_t start, off_t end) "sparse entry %d [0x%lx - 0x%lx]" # hw/vfio/platform.c vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d" ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 2/8] vfio: Create device specific region info helper 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 1/8] vfio: Enable sparse mmap capability Alex Williamson @ 2016-05-17 20:19 ` Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 3/8] vfio/pci: Fix return of vfio_populate_vga() Alex Williamson ` (6 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:19 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm Given a device specific region type and sub-type, find it. Also cleanup return point on error in vfio_get_region_info() so that we always return 0 with a valid pointer or -errno and NULL. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- hw/vfio/common.c | 36 ++++++++++++++++++++++++++++++++++++ include/hw/vfio/vfio-common.h | 2 ++ trace-events | 4 ++-- 3 files changed, 40 insertions(+), 2 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index eb6f04e..ae2c13b 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1142,6 +1142,7 @@ retry: if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) { g_free(*info); + *info = NULL; return -errno; } @@ -1155,6 +1156,41 @@ retry: return 0; } +int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type, + uint32_t subtype, struct vfio_region_info **info) +{ + int i; + + for (i = 0; i < vbasedev->num_regions; i++) { + struct vfio_info_cap_header *hdr; + struct vfio_region_info_cap_type *cap_type; + + if (vfio_get_region_info(vbasedev, i, info)) { + continue; + } + + hdr = vfio_get_region_info_cap(*info, VFIO_REGION_INFO_CAP_TYPE); + if (!hdr) { + g_free(*info); + continue; + } + + cap_type = container_of(hdr, struct vfio_region_info_cap_type, header); + + trace_vfio_get_dev_region(vbasedev->name, i, + cap_type->type, cap_type->subtype); + + if (cap_type->type == type && cap_type->subtype == subtype) { + return 0; + } + + g_free(*info); + } + + *info = NULL; + return -ENODEV; +} + /* * Interfaces for IBM EEH (Enhanced Error Handling) */ diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index eb0e1b0..a947f9c 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -154,5 +154,7 @@ extern QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces; #ifdef CONFIG_LINUX int vfio_get_region_info(VFIODevice *vbasedev, int index, struct vfio_region_info **info); +int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type, + uint32_t subtype, struct vfio_region_info **info); #endif #endif /* !HW_VFIO_VFIO_COMMON_H */ diff --git a/trace-events b/trace-events index 99c5132..32e7a78 100644 --- a/trace-events +++ b/trace-events @@ -1714,8 +1714,7 @@ vfio_quirk_ati_bonaire_reset_timeout(const char *name) "%s" vfio_quirk_ati_bonaire_reset_done(const char *name) "%s" vfio_quirk_ati_bonaire_reset(const char *name) "%s" - -# hw/vfio/vfio-common.c +# hw/vfio/common.c vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)" vfio_region_read(char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64 vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64 @@ -1736,6 +1735,7 @@ vfio_region_finalize(const char *name, int index) "Device %s, region %d" vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d" vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries" vfio_region_sparse_mmap_entry(int i, off_t start, off_t end) "sparse entry %d [0x%lx - 0x%lx]" +vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8" # hw/vfio/platform.c vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d" ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 3/8] vfio/pci: Fix return of vfio_populate_vga() 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 1/8] vfio: Enable sparse mmap capability Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 2/8] vfio: Create device specific region info helper Alex Williamson @ 2016-05-17 20:19 ` Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 4/8] vfio/pci: Consolidate VGA setup Alex Williamson ` (5 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:19 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm This function returns success if either we setup the VGA region or the host vfio doesn't return enough regions to support the VGA index. This latter case doesn't make any sense. If we're asked to populate VGA, fail if it doesn't exist and let the caller decide if that's important. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- hw/vfio/pci.c | 55 ++++++++++++++++++++++++++----------------------------- 1 file changed, 26 insertions(+), 29 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index d091d8c..dfce313 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2061,42 +2061,39 @@ int vfio_populate_vga(VFIOPCIDevice *vdev) struct vfio_region_info *reg_info; int ret; - if (vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) { - ret = vfio_get_region_info(vbasedev, - VFIO_PCI_VGA_REGION_INDEX, ®_info); - if (ret) { - return ret; - } + ret = vfio_get_region_info(vbasedev, VFIO_PCI_VGA_REGION_INDEX, ®_info); + if (ret) { + return ret; + } - if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) || - !(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) || - reg_info->size < 0xbffff + 1) { - error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx", - (unsigned long)reg_info->flags, - (unsigned long)reg_info->size); - g_free(reg_info); - return -EINVAL; - } + if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) || + !(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) || + reg_info->size < 0xbffff + 1) { + error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx", + (unsigned long)reg_info->flags, + (unsigned long)reg_info->size); + g_free(reg_info); + return -EINVAL; + } - vdev->vga = g_new0(VFIOVGA, 1); + vdev->vga = g_new0(VFIOVGA, 1); - vdev->vga->fd_offset = reg_info->offset; - vdev->vga->fd = vdev->vbasedev.fd; + vdev->vga->fd_offset = reg_info->offset; + vdev->vga->fd = vdev->vbasedev.fd; - g_free(reg_info); + g_free(reg_info); - vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE; - vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM; - QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks); + vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE; + vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM; + QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks); - vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE; - vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO; - QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks); + vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE; + vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO; + QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks); - vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE; - vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI; - QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks); - } + vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE; + vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI; + QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks); return 0; } ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 4/8] vfio/pci: Consolidate VGA setup 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson ` (2 preceding siblings ...) 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 3/8] vfio/pci: Fix return of vfio_populate_vga() Alex Williamson @ 2016-05-17 20:19 ` Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 5/8] vfio/pci: Setup BAR quirks after capabilities probing Alex Williamson ` (4 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:19 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm Combine VGA discovery and registration. Quirks can have dependencies on BARs, so the quirks push out until after we've scanned the BARs. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- hw/vfio/pci.c | 49 ++++++++++++++++++++++++++----------------------- 1 file changed, 26 insertions(+), 23 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index dfce313..daf10b8 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1452,29 +1452,6 @@ static void vfio_bars_setup(VFIOPCIDevice *vdev) for (i = 0; i < PCI_ROM_SLOT; i++) { vfio_bar_setup(vdev, i); } - - if (vdev->vga) { - memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_MEM].mem, - OBJECT(vdev), &vfio_vga_ops, - &vdev->vga->region[QEMU_PCI_VGA_MEM], - "vfio-vga-mmio@0xa0000", - QEMU_PCI_VGA_MEM_SIZE); - memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem, - OBJECT(vdev), &vfio_vga_ops, - &vdev->vga->region[QEMU_PCI_VGA_IO_LO], - "vfio-vga-io@0x3b0", - QEMU_PCI_VGA_IO_LO_SIZE); - memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem, - OBJECT(vdev), &vfio_vga_ops, - &vdev->vga->region[QEMU_PCI_VGA_IO_HI], - "vfio-vga-io@0x3c0", - QEMU_PCI_VGA_IO_HI_SIZE); - - pci_register_vga(&vdev->pdev, &vdev->vga->region[QEMU_PCI_VGA_MEM].mem, - &vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem, - &vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem); - vfio_vga_quirk_setup(vdev); - } } static void vfio_bars_exit(VFIOPCIDevice *vdev) @@ -2087,14 +2064,36 @@ int vfio_populate_vga(VFIOPCIDevice *vdev) vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM; QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks); + memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_MEM].mem, + OBJECT(vdev), &vfio_vga_ops, + &vdev->vga->region[QEMU_PCI_VGA_MEM], + "vfio-vga-mmio@0xa0000", + QEMU_PCI_VGA_MEM_SIZE); + vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE; vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO; QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks); + memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem, + OBJECT(vdev), &vfio_vga_ops, + &vdev->vga->region[QEMU_PCI_VGA_IO_LO], + "vfio-vga-io@0x3b0", + QEMU_PCI_VGA_IO_LO_SIZE); + vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE; vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI; QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks); + memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem, + OBJECT(vdev), &vfio_vga_ops, + &vdev->vga->region[QEMU_PCI_VGA_IO_HI], + "vfio-vga-io@0x3c0", + QEMU_PCI_VGA_IO_HI_SIZE); + + pci_register_vga(&vdev->pdev, &vdev->vga->region[QEMU_PCI_VGA_MEM].mem, + &vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem, + &vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem); + return 0; } @@ -2557,6 +2556,10 @@ static int vfio_initfn(PCIDevice *pdev) goto out_teardown; } + if (vdev->vga) { + vfio_vga_quirk_setup(vdev); + } + /* QEMU emulates all of MSI & MSIX */ if (pdev->cap_present & QEMU_PCI_CAP_MSIX) { memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff, ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 5/8] vfio/pci: Setup BAR quirks after capabilities probing 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson ` (3 preceding siblings ...) 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 4/8] vfio/pci: Consolidate VGA setup Alex Williamson @ 2016-05-17 20:19 ` Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 6/8] vfio/pci: Intel graphics legacy mode assignment Alex Williamson ` (3 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:19 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm Capability probing modifies wmask, which quirks may be interested in changing themselves. Apply our BAR quirks after the capability scan to make this possible. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- hw/vfio/pci.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index daf10b8..aa6fb7b 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1440,8 +1440,6 @@ static void vfio_bar_setup(VFIOPCIDevice *vdev, int nr) vdev->vbasedev.name, nr); } - vfio_bar_quirk_setup(vdev, nr); - pci_register_bar(&vdev->pdev, nr, type, bar->region.mem); } @@ -2394,7 +2392,7 @@ static int vfio_initfn(PCIDevice *pdev) ssize_t len; struct stat st; int groupid; - int ret; + int i, ret; if (!vdev->vbasedev.sysfsdev) { vdev->vbasedev.sysfsdev = @@ -2560,6 +2558,10 @@ static int vfio_initfn(PCIDevice *pdev) vfio_vga_quirk_setup(vdev); } + for (i = 0; i < PCI_ROM_SLOT; i++) { + vfio_bar_quirk_setup(vdev, i); + } + /* QEMU emulates all of MSI & MSIX */ if (pdev->cap_present & QEMU_PCI_CAP_MSIX) { memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff, ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 6/8] vfio/pci: Intel graphics legacy mode assignment 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson ` (4 preceding siblings ...) 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 5/8] vfio/pci: Setup BAR quirks after capabilities probing Alex Williamson @ 2016-05-17 20:19 ` Alex Williamson 2016-05-18 14:21 ` Gerd Hoffmann 2016-05-17 20:20 ` [Qemu-devel] [PATCH v6 7/8] vfio/pci: Add a separate option for IGD OpRegion support Alex Williamson ` (2 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:19 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm Enable quirks to support SandyBridge and newer IGD devices as primary VM graphics. This requires new vfio-pci device specific regions added in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config space access to the PCI host bridge and LPC/ISA bridge. VM firmware support, SeaBIOS only so far, is also required for reserving memory regions for IGD specific use. In order to enable this mode, IGD must be assigned to the VM at PCI bus address 00:02.0, it must have a ROM, it must be able to enable VGA, it must have or be able to create on its own an LPC/ISA bridge of the proper type at PCI bus address 00:1f.0 (sorry, not compatible with Q35 yet), and it must have the above noted vfio-pci kernel features and BIOS. The intention is that to enable this mode, a user simply needs to assign 00:02.0 from the host to 00:02.0 in the VM: -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0 and everything either happens automatically or it doesn't. In the case that it doesn't, we leave error reports, but assume the device will operate in universal passthrough mode (UPT), which doesn't require any of this, but has a much more narrow window of supported devices, supported use cases, and supported guest drivers. When using IGD in this mode, the VM firmware is required to reserve some VM RAM for the OpRegion (on the order or several 4k pages) and stolen memory for the GTT (up to 8MB for the latest GPUs). An additional option, x-igd-gms allows the user to specify some amount of additional memory (value is number of 32MB chunks up to 512MB) that is pre-allocated for graphics use. TBH, I don't know of anything that requires this or makes use of this memory, which is why we don't allocate any by default, but the specification suggests this is not actually a valid combination, so the option exists as a workaround. Please report if it's actually necessary in some environment. See code comments for further discussion about the actual operation of the quirks necessary to assign these devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- hw/vfio/pci-quirks.c | 643 ++++++++++++++++++++++++++++++++++++++++++++++++++ hw/vfio/pci.c | 8 + hw/vfio/pci.h | 2 trace-events | 5 4 files changed, 657 insertions(+), 1 deletion(-) diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c index 49ecf11..f2b8ed5 100644 --- a/hw/vfio/pci-quirks.c +++ b/hw/vfio/pci-quirks.c @@ -11,9 +11,12 @@ */ #include "qemu/osdep.h" +#include "qemu/error-report.h" +#include "qemu/range.h" +#include "qapi/error.h" +#include "hw/nvram/fw_cfg.h" #include "pci.h" #include "trace.h" -#include "qemu/range.h" /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */ static bool vfio_pci_is(VFIOPCIDevice *vdev, uint32_t vendor, uint32_t device) @@ -962,6 +965,643 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr) } /* + * Intel IGD support + * + * Obviously IGD is not a discrete device, this is evidenced not only by it + * being integrated into the CPU, but by the various chipset and BIOS + * dependencies that it brings along with it. Intel is trying to move away + * from this and Broadwell and newer devices can run in what Intel calls + * "Universal Pass-Through" mode, or UPT. Theoretically in UPT mode, nothing + * more is required beyond assigning the IGD device to a VM. There are + * however support limitations to this mode. It only supports IGD as a + * secondary graphics device in the VM and it doesn't officially support any + * physical outputs. + * + * The code here attempts to enable what we'll call legacy mode assignment, + * IGD retains most of the capabilities we expect for it to have on bare + * metal. To enable this mode, the IGD device must be assigned to the VM + * at PCI address 00:02.0, it must have a ROM, it very likely needs VGA + * support, we must have VM BIOS support for reserving and populating some + * of the required tables, and we need to tweak the chipset with revisions + * and IDs and an LPC/ISA bridge device. The intention is to make all of + * this happen automatically by installing the device at the correct VM PCI + * bus address. If any of the conditions are not met, we cross our fingers + * and hope the user knows better. + * + * NB - It is possible to enable physical outputs in UPT mode by supplying + * an OpRegion table. We don't do this by default because the guest driver + * behaves differently if an OpRegion is provided and no monitor is attached + * vs no OpRegion and a monitor being attached or not. Effectively, if a + * headless setup is desired, the OpRegion gets in the way of that. + */ + +/* + * This presumes the device is already known to be an Intel VGA device, so we + * take liberties in which device ID bits match which generation. This should + * not be taken as an indication that all the devices are supported, or even + * supportable, some of them don't even support VT-d. + * See linux:include/drm/i915_pciids.h for IDs. + */ +static int igd_gen(VFIOPCIDevice *vdev) +{ + if ((vdev->device_id & 0xfff) == 0xa84) { + return 8; /* Broxton */ + } + + switch (vdev->device_id & 0xff00) { + /* Old, untested, unavailable, unknown */ + case 0x0000: + case 0x2500: + case 0x2700: + case 0x2900: + case 0x2a00: + case 0x2e00: + case 0x3500: + case 0xa000: + return -1; + /* SandyBridge, IvyBridge, ValleyView, Haswell */ + case 0x0100: + case 0x0400: + case 0x0a00: + case 0x0c00: + case 0x0d00: + case 0x0f00: + return 6; + /* BroadWell, CherryView, SkyLake, KabyLake */ + case 0x1600: + case 0x1900: + case 0x2200: + case 0x5900: + return 8; + } + + return 8; /* Assume newer is compatible */ +} + +typedef struct VFIOIGDQuirk { + struct VFIOPCIDevice *vdev; + uint32_t index; +} VFIOIGDQuirk; + +#define IGD_GMCH 0x50 /* Graphics Control Register */ +#define IGD_BDSM 0x5c /* Base Data of Stolen Memory */ +#define IGD_ASLS 0xfc /* ASL Storage Register */ + +/* + * The OpRegion includes the Video BIOS Table, which seems important for + * telling the driver what sort of outputs it has. Without this, the device + * may work in the guest, but we may not get output. This also requires BIOS + * support to reserve and populate a section of guest memory sufficient for + * the table and to write the base address of that memory to the ASLS register + * of the IGD device. + */ +static int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev, + struct vfio_region_info *info) +{ + int ret; + + vdev->igd_opregion = g_malloc0(info->size); + ret = pread(vdev->vbasedev.fd, vdev->igd_opregion, + info->size, info->offset); + if (ret != info->size) { + error_report("vfio: Error reading IGD OpRegion"); + g_free(vdev->igd_opregion); + vdev->igd_opregion = NULL; + return -EINVAL; + } + + /* + * Provide fw_cfg with a copy of the OpRegion which the VM firmware is to + * allocate 32bit reserved memory for, copy these contents into, and write + * the reserved memory base address to the device ASLS register at 0xFC. + * Alignment of this reserved region seems flexible, but using a 4k page + * alignment seems to work well. This interface assumes a single IGD + * device, which may be at VM address 00:02.0 in legacy mode or another + * address in UPT mode. + * + * NB, there may be future use cases discovered where the VM should have + * direct interaction with the host OpRegion, in which case the write to + * the ASLS register would trigger MemoryRegion setup to enable that. + */ + fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion", + vdev->igd_opregion, info->size); + + trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name); + + pci_set_long(vdev->pdev.config + IGD_ASLS, 0); + pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0); + pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0); + + return 0; +} + +/* + * The rather short list of registers that we copy from the host devices. + * The LPC/ISA bridge values are definitely needed to support the vBIOS, the + * host bridge values may or may not be needed depending on the guest OS. + * Since we're only munging revision and subsystem values on the host bridge, + * we don't require our own device. The LPC/ISA bridge needs to be our very + * own though. + */ +typedef struct { + uint8_t offset; + uint8_t len; +} IGDHostInfo; + +static const IGDHostInfo igd_host_bridge_infos[] = { + {PCI_REVISION_ID, 2}, + {PCI_SUBSYSTEM_VENDOR_ID, 2}, + {PCI_SUBSYSTEM_ID, 2}, +}; + +static const IGDHostInfo igd_lpc_bridge_infos[] = { + {PCI_VENDOR_ID, 2}, + {PCI_DEVICE_ID, 2}, + {PCI_REVISION_ID, 2}, + {PCI_SUBSYSTEM_VENDOR_ID, 2}, + {PCI_SUBSYSTEM_ID, 2}, +}; + +static int vfio_pci_igd_copy(VFIOPCIDevice *vdev, PCIDevice *pdev, + struct vfio_region_info *info, + const IGDHostInfo *list, int len) +{ + int i, ret; + + for (i = 0; i < len; i++) { + ret = pread(vdev->vbasedev.fd, pdev->config + list[i].offset, + list[i].len, info->offset + list[i].offset); + if (ret != list[i].len) { + error_report("IGD copy failed: %m"); + return -errno; + } + } + + return 0; +} + +/* + * Stuff a few values into the host bridge. + */ +static int vfio_pci_igd_host_init(VFIOPCIDevice *vdev, + struct vfio_region_info *info) +{ + PCIBus *bus; + PCIDevice *host_bridge; + int ret; + + bus = pci_device_root_bus(&vdev->pdev); + host_bridge = pci_find_device(bus, 0, PCI_DEVFN(0, 0)); + + if (!host_bridge) { + error_report("Can't find host bridge"); + return -ENODEV; + } + + ret = vfio_pci_igd_copy(vdev, host_bridge, info, igd_host_bridge_infos, + ARRAY_SIZE(igd_host_bridge_infos)); + if (!ret) { + trace_vfio_pci_igd_host_bridge_enabled(vdev->vbasedev.name); + } + + return ret; +} + +/* + * IGD LPC/ISA bridge support code. The vBIOS needs this, but we can't write + * arbitrary values into just any bridge, so we must create our own. We try + * to handle if the user has created it for us, which they might want to do + * to enable multifuction so we don't occupy the whole PCI slot. + */ +static void vfio_pci_igd_lpc_bridge_realize(PCIDevice *pdev, Error **errp) +{ + if (pdev->devfn != PCI_DEVFN(0x1f, 0)) { + error_setg(errp, "VFIO dummy ISA/LPC bridge must have address 1f.0"); + } +} + +static void vfio_pci_igd_lpc_bridge_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + + dc->desc = "VFIO dummy ISA/LPC bridge for IGD assignment"; + dc->hotpluggable = false; + k->realize = vfio_pci_igd_lpc_bridge_realize; + k->class_id = PCI_CLASS_BRIDGE_ISA; +} + +static TypeInfo vfio_pci_igd_lpc_bridge_info = { + .name = "vfio-pci-igd-lpc-bridge", + .parent = TYPE_PCI_DEVICE, + .class_init = vfio_pci_igd_lpc_bridge_class_init, +}; + +static void vfio_pci_igd_register_types(void) +{ + type_register_static(&vfio_pci_igd_lpc_bridge_info); +} + +type_init(vfio_pci_igd_register_types) + +static int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev, + struct vfio_region_info *info) +{ + PCIDevice *lpc_bridge; + int ret; + + lpc_bridge = pci_find_device(pci_device_root_bus(&vdev->pdev), + 0, PCI_DEVFN(0x1f, 0)); + if (!lpc_bridge) { + lpc_bridge = pci_create_simple(pci_device_root_bus(&vdev->pdev), + PCI_DEVFN(0x1f, 0), "vfio-pci-igd-lpc-bridge"); + } + + ret = vfio_pci_igd_copy(vdev, lpc_bridge, info, igd_lpc_bridge_infos, + ARRAY_SIZE(igd_lpc_bridge_infos)); + if (!ret) { + trace_vfio_pci_igd_lpc_bridge_enabled(vdev->vbasedev.name); + } + + return ret; +} + +/* + * IGD Gen8 and newer support up to 8MB for the GTT and use a 64bit PTE + * entry, older IGDs use 2MB and 32bit. Each PTE maps a 4k page. Therefore + * we either have 2M/4k * 4 = 2k or 8M/4k * 8 = 16k as the maximum iobar index + * for programming the GTT. + * + * See linux:include/drm/i915_drm.h for shift and mask values. + */ +static int vfio_igd_gtt_max(VFIOPCIDevice *vdev) +{ + uint32_t gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, sizeof(gmch)); + int ggms, gen = igd_gen(vdev); + + gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, sizeof(gmch)); + ggms = (gmch >> (gen < 8 ? 8 : 6)) & 0x3; + if (gen > 6) { + ggms = 1 << ggms; + } + + ggms *= 1024 * 1024; + + return (ggms / (4 * 1024)) * (gen < 8 ? 4 : 8); +} + +/* + * The IGD ROM will make use of stolen memory (GGMS) for support of VESA modes. + * Somehow the host stolen memory range is used for this, but how the ROM gets + * it is a mystery, perhaps it's hardcoded into the ROM. Thankfully though, it + * reprograms the GTT through the IOBAR where we can trap it and transpose the + * programming to the VM allocated buffer. That buffer gets reserved by the VM + * firmware via the fw_cfg entry added below. Here we're just monitoring the + * IOBAR address and data registers to detect a write sequence targeting the + * GTTADR. This code is developed by observed behavior and doesn't have a + * direct spec reference, unfortunately. + */ +static uint64_t vfio_igd_quirk_data_read(void *opaque, + hwaddr addr, unsigned size) +{ + VFIOIGDQuirk *igd = opaque; + VFIOPCIDevice *vdev = igd->vdev; + + igd->index = ~0; + + return vfio_region_read(&vdev->bars[4].region, addr + 4, size); +} + +static void vfio_igd_quirk_data_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ + VFIOIGDQuirk *igd = opaque; + VFIOPCIDevice *vdev = igd->vdev; + uint64_t val = data; + int gen = igd_gen(vdev); + + /* + * Programming the GGMS starts at index 0x1 and uses every 4th index (ie. + * 0x1, 0x5, 0x9, 0xd,...). For pre-Gen8 each 4-byte write is a whole PTE + * entry, with 0th bit enable set. For Gen8 and up, PTEs are 64bit, so + * entries 0x5 & 0xd are the high dword, in our case zero. Each PTE points + * to a 4k page, which we translate to a page from the VM allocated region, + * pointed to by the BDSM register. If this is not set, we fail. + * + * We trap writes to the full configured GTT size, but we typically only + * see the vBIOS writing up to (nearly) the 1MB barrier. In fact it often + * seems to miss the last entry for an even 1MB GTT. Doing a gratuitous + * write of that last entry does work, but is hopefully unnecessary since + * we clear the previous GTT on initialization. + */ + if ((igd->index % 4 == 1) && igd->index < vfio_igd_gtt_max(vdev)) { + if (gen < 8 || (igd->index % 8 == 1)) { + uint32_t base; + + base = pci_get_long(vdev->pdev.config + IGD_BDSM); + if (!base) { + hw_error("vfio-igd: Guest attempted to program IGD GTT before " + "BIOS reserved stolen memory. Unsupported BIOS?"); + } + + val = base | (data & ((1 << 20) - 1)); + } else { + val = 0; /* upper 32bits of pte, we only enable below 4G PTEs */ + } + + trace_vfio_pci_igd_bar4_write(vdev->vbasedev.name, + igd->index, data, val); + } + + vfio_region_write(&vdev->bars[4].region, addr + 4, val, size); + + igd->index = ~0; +} + +static const MemoryRegionOps vfio_igd_data_quirk = { + .read = vfio_igd_quirk_data_read, + .write = vfio_igd_quirk_data_write, + .endianness = DEVICE_LITTLE_ENDIAN, +}; + +static uint64_t vfio_igd_quirk_index_read(void *opaque, + hwaddr addr, unsigned size) +{ + VFIOIGDQuirk *igd = opaque; + VFIOPCIDevice *vdev = igd->vdev; + + igd->index = ~0; + + return vfio_region_read(&vdev->bars[4].region, addr, size); +} + +static void vfio_igd_quirk_index_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ + VFIOIGDQuirk *igd = opaque; + VFIOPCIDevice *vdev = igd->vdev; + + igd->index = data; + + vfio_region_write(&vdev->bars[4].region, addr, data, size); +} + +static const MemoryRegionOps vfio_igd_index_quirk = { + .read = vfio_igd_quirk_index_read, + .write = vfio_igd_quirk_index_write, + .endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr) +{ + struct vfio_region_info *rom = NULL, *opregion = NULL, + *host = NULL, *lpc = NULL; + VFIOQuirk *quirk; + VFIOIGDQuirk *igd; + PCIDevice *lpc_bridge; + int i, ret, ggms_mb, gms_mb = 0, gen; + uint64_t *bdsm_size; + uint32_t gmch; + uint16_t cmd_orig, cmd; + + /* + * This must be an Intel VGA device at address 00:02.0 for us to even + * consider enabling legacy mode. The vBIOS has dependencies on the + * PCI bus address. + */ + if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) || + !vfio_is_vga(vdev) || nr != 4 || + &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev), + 0, PCI_DEVFN(0x2, 0))) { + return; + } + + /* + * We need to create an LPC/ISA bridge at PCI bus address 00:1f.0 that we + * can stuff host values into, so if there's already one there and it's not + * one we can hack on, legacy mode is no-go. Sorry Q35. + */ + lpc_bridge = pci_find_device(pci_device_root_bus(&vdev->pdev), + 0, PCI_DEVFN(0x1f, 0)); + if (lpc_bridge && !object_dynamic_cast(OBJECT(lpc_bridge), + "vfio-pci-igd-lpc-bridge")) { + error_report("IGD device %s cannot support legacy mode due to existing " + "devices at address 1f.0", vdev->vbasedev.name); + return; + } + + /* + * IGD is not a standard, they like to change their specs often. We + * only attempt to support back to SandBridge and we hope that newer + * devices maintain compatibility with generation 8. + */ + gen = igd_gen(vdev); + if (gen != 6 && gen != 8) { + error_report("IGD device %s is unsupported in legacy mode, " + "try SandyBridge or newer", vdev->vbasedev.name); + return; + } + + /* + * Most of what we're doing here is to enable the ROM to run, so if + * there's no ROM, there's no point in setting up this quirk. + * NB. We only seem to get BIOS ROMs, so a UEFI VM would need CSM support. + */ + ret = vfio_get_region_info(&vdev->vbasedev, + VFIO_PCI_ROM_REGION_INDEX, &rom); + if ((ret || !rom->size) && !vdev->pdev.romfile) { + error_report("IGD device %s has no ROM, legacy mode disabled", + vdev->vbasedev.name); + goto out; + } + + /* + * Ignore the hotplug corner case, mark the ROM failed, we can't + * create the devices we need for legacy mode in the hotplug scenario. + */ + if (vdev->pdev.qdev.hotplugged) { + error_report("IGD device %s hotplugged, ROM disabled, " + "legacy mode disabled", vdev->vbasedev.name); + vdev->rom_read_failed = true; + goto out; + } + + /* + * Check whether we have all the vfio device specific regions to + * support legacy mode (added in Linux v4.6). If not, bail. + */ + ret = vfio_get_dev_region_info(&vdev->vbasedev, + VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL, + VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion); + if (ret) { + error_report("IGD device %s does not support OpRegion access," + "legacy mode disabled", vdev->vbasedev.name); + goto out; + } + + ret = vfio_get_dev_region_info(&vdev->vbasedev, + VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL, + VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG, &host); + if (ret) { + error_report("IGD device %s does not support host bridge access," + "legacy mode disabled", vdev->vbasedev.name); + goto out; + } + + ret = vfio_get_dev_region_info(&vdev->vbasedev, + VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL, + VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG, &lpc); + if (ret) { + error_report("IGD device %s does not support LPC bridge access," + "legacy mode disabled", vdev->vbasedev.name); + goto out; + } + + gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, 4); + + /* + * If IGD VGA Disable is clear (expected) and VGA is not already enabled, + * try to enable it. Probably shouldn't be using legacy mode without VGA, + * but also no point in us enabling VGA if disabled in hardware. + */ + if (!(gmch & 0x2) && !vdev->vga && vfio_populate_vga(vdev)) { + error_report("IGD device %s failed to enable VGA access, " + "legacy mode disabled", vdev->vbasedev.name); + goto out; + } + + /* Create our LPC/ISA bridge */ + ret = vfio_pci_igd_lpc_init(vdev, lpc); + if (ret) { + error_report("IGD device %s failed to create LPC bridge, " + "legacy mode disabled", vdev->vbasedev.name); + goto out; + } + + /* Stuff some host values into the VM PCI host bridge */ + ret = vfio_pci_igd_host_init(vdev, host); + if (ret) { + error_report("IGD device %s failed to modify host bridge, " + "legacy mode disabled", vdev->vbasedev.name); + goto out; + } + + /* Setup OpRegion access */ + ret = vfio_pci_igd_opregion_init(vdev, opregion); + if (ret) { + error_report("IGD device %s failed to setup OpRegion, " + "legacy mode disabled", vdev->vbasedev.name); + goto out; + } + + /* Setup our quirk to munge GTT addresses to the VM allocated buffer */ + quirk = g_malloc0(sizeof(*quirk)); + quirk->mem = g_new0(MemoryRegion, 2); + quirk->nr_mem = 2; + igd = quirk->data = g_malloc0(sizeof(*igd)); + igd->vdev = vdev; + igd->index = ~0; + + memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_igd_index_quirk, + igd, "vfio-igd-index-quirk", 4); + memory_region_add_subregion_overlap(vdev->bars[nr].region.mem, + 0, &quirk->mem[0], 1); + + memory_region_init_io(&quirk->mem[1], OBJECT(vdev), &vfio_igd_data_quirk, + igd, "vfio-igd-data-quirk", 4); + memory_region_add_subregion_overlap(vdev->bars[nr].region.mem, + 4, &quirk->mem[1], 1); + + QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next); + + /* Determine the size of stolen memory needed for GTT */ + ggms_mb = (gmch >> (gen < 8 ? 8 : 6)) & 0x3; + if (gen > 6) { + ggms_mb = 1 << ggms_mb; + } + + /* + * Assume we have no GMS memory, but allow it to be overrided by device + * option (experimental). The spec doesn't actually allow zero GMS when + * when IVD (IGD VGA Disable) is clear, but the claim is that it's unused, + * so let's not waste VM memory for it. + */ + gmch &= ~((gen < 8 ? 0x1f : 0xff) << (gen < 8 ? 3 : 8)); + + if (vdev->igd_gms) { + if (vdev->igd_gms <= 0x10) { + gms_mb = vdev->igd_gms * 32; + gmch |= vdev->igd_gms << (gen < 8 ? 3 : 8); + } else { + error_report("Unsupported IGD GMS value 0x%x", vdev->igd_gms); + vdev->igd_gms = 0; + } + } + + /* + * Request reserved memory for stolen memory via fw_cfg. VM firmware + * must allocate a 1MB aligned reserved memory region below 4GB with + * the requested size (in bytes) for use by the Intel PCI class VGA + * device at VM address 00:02.0. The base address of this reserved + * memory region must be written to the device BDSM regsiter at PCI + * config offset 0x5C. + */ + bdsm_size = g_malloc(sizeof(*bdsm_size)); + *bdsm_size = cpu_to_le64((ggms_mb + gms_mb) * 1024 * 1024); + fw_cfg_add_file(fw_cfg_find(), "etc/igd-bdsm-size", + bdsm_size, sizeof(*bdsm_size)); + + /* GMCH is read-only, emulated */ + pci_set_long(vdev->pdev.config + IGD_GMCH, gmch); + pci_set_long(vdev->pdev.wmask + IGD_GMCH, 0); + pci_set_long(vdev->emulated_config_bits + IGD_GMCH, ~0); + + /* BDSM is read-write, emulated. The BIOS needs to be able to write it */ + pci_set_long(vdev->pdev.config + IGD_BDSM, 0); + pci_set_long(vdev->pdev.wmask + IGD_BDSM, ~0); + pci_set_long(vdev->emulated_config_bits + IGD_BDSM, ~0); + + /* + * This IOBAR gives us access to GTTADR, which allows us to write to + * the GTT itself. So let's go ahead and write zero to all the GTT + * entries to avoid spurious DMA faults. Be sure I/O access is enabled + * before talking to the device. + */ + if (pread(vdev->vbasedev.fd, &cmd_orig, sizeof(cmd_orig), + vdev->config_offset + PCI_COMMAND) != sizeof(cmd_orig)) { + error_report("IGD device %s - failed to read PCI command register", + vdev->vbasedev.name); + } + + cmd = cmd_orig | PCI_COMMAND_IO; + + if (pwrite(vdev->vbasedev.fd, &cmd, sizeof(cmd), + vdev->config_offset + PCI_COMMAND) != sizeof(cmd)) { + error_report("IGD device %s - failed to write PCI command register", + vdev->vbasedev.name); + } + + for (i = 1; i < vfio_igd_gtt_max(vdev); i += 4) { + vfio_region_write(&vdev->bars[4].region, 0, i, 4); + vfio_region_write(&vdev->bars[4].region, 4, 0, 4); + } + + if (pwrite(vdev->vbasedev.fd, &cmd_orig, sizeof(cmd_orig), + vdev->config_offset + PCI_COMMAND) != sizeof(cmd_orig)) { + error_report("IGD device %s - failed to restore PCI command register", + vdev->vbasedev.name); + } + + trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name, ggms_mb + gms_mb); + +out: + g_free(rom); + g_free(opregion); + g_free(host); + g_free(lpc); +} + +/* * Common quirk probe entry points. */ void vfio_vga_quirk_setup(VFIOPCIDevice *vdev) @@ -1010,6 +1650,7 @@ void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr) vfio_probe_nvidia_bar5_quirk(vdev, nr); vfio_probe_nvidia_bar0_quirk(vdev, nr); vfio_probe_rtl8168_bar2_quirk(vdev, nr); + vfio_probe_igd_bar4_quirk(vdev, nr); } void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index aa6fb7b..06d91cc 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2605,6 +2605,13 @@ static void vfio_instance_finalize(Object *obj) vfio_bars_finalize(vdev); g_free(vdev->emulated_config_bits); g_free(vdev->rom); + /* + * XXX Leaking igd_opregion is not an oversight, we can't remove the + * fw_cfg entry therefore leaking this allocation seems like the safest + * option. + * + * g_free(vdev->igd_opregion); + */ vfio_put_device(vdev); vfio_put_group(group); } @@ -2689,6 +2696,7 @@ static Property vfio_pci_dev_properties[] = { sub_vendor_id, PCI_ANY_ID), DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice, sub_device_id, PCI_ANY_ID), + DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0), /* * TODO - support passed fds... is this necessary? * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name), diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 3976f68..31ee8da 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -115,6 +115,7 @@ typedef struct VFIOPCIDevice { int interrupt; /* Current interrupt type */ VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */ VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */ + void *igd_opregion; PCIHostDeviceAddress host; EventNotifier err_notifier; EventNotifier req_notifier; @@ -129,6 +130,7 @@ typedef struct VFIOPCIDevice { #define VFIO_FEATURE_ENABLE_REQ_BIT 1 #define VFIO_FEATURE_ENABLE_REQ (1 << VFIO_FEATURE_ENABLE_REQ_BIT) int32_t bootindex; + uint32_t igd_gms; uint8_t pm_cap; bool has_vga; bool pci_aer; diff --git a/trace-events b/trace-events index 32e7a78..3659990 100644 --- a/trace-events +++ b/trace-events @@ -1713,6 +1713,11 @@ vfio_quirk_ati_bonaire_reset_no_smc(const char *name) "%s" vfio_quirk_ati_bonaire_reset_timeout(const char *name) "%s" vfio_quirk_ati_bonaire_reset_done(const char *name) "%s" vfio_quirk_ati_bonaire_reset(const char *name) "%s" +vfio_pci_igd_bar4_write(const char *name, uint32_t index, uint32_t data, uint32_t base) "%s [%03x] %08x -> %08x" +vfio_pci_igd_bdsm_enabled(const char *name, int size) "%s %dMB" +vfio_pci_igd_opregion_enabled(const char *name) "%s" +vfio_pci_igd_host_bridge_enabled(const char *name) "%s" +vfio_pci_igd_lpc_bridge_enabled(const char *name) "%s" # hw/vfio/common.c vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)" ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v6 6/8] vfio/pci: Intel graphics legacy mode assignment 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 6/8] vfio/pci: Intel graphics legacy mode assignment Alex Williamson @ 2016-05-18 14:21 ` Gerd Hoffmann 2016-05-18 16:50 ` Alex Williamson 0 siblings, 1 reply; 21+ messages in thread From: Gerd Hoffmann @ 2016-05-18 14:21 UTC (permalink / raw) To: Alex Williamson; +Cc: qemu-devel, allen.m.kay, kvm, Zhao, Xinda, Kevin Tian On Di, 2016-05-17 at 14:19 -0600, Alex Williamson wrote: > +static int igd_gen(VFIOPCIDevice *vdev) > +{ > + if ((vdev->device_id & 0xfff) == 0xa84) { > + return 8; /* Broxton */ > + } > + > + switch (vdev->device_id & 0xff00) { > + /* Old, untested, unavailable, unknown */ > + case 0x0000: > + case 0x2500: > + case 0x2700: > + case 0x2900: > + case 0x2a00: > + case 0x2e00: > + case 0x3500: > + case 0xa000: > + return -1; > + /* SandyBridge, IvyBridge, ValleyView, Haswell */ > + case 0x0100: > + case 0x0400: > + case 0x0a00: > + case 0x0c00: > + case 0x0d00: > + case 0x0f00: > + return 6; > + /* BroadWell, CherryView, SkyLake, KabyLake */ > + case 0x1600: > + case 0x1900: > + case 0x2200: > + case 0x5900: > + return 8; > + } > + > + return 8; /* Assume newer is compatible */ > +} > + This link: https://github.com/01org/Igvtg-qemu/commit/ea32e6769004d6eb98d2dbd859d81bf1885c6ad2#diff-9d4d99332b83a7de33cbeed489d60448R920 happened to land in my inbox these days. It provides a bunch of functions called is_$codename() and intel_gen_version(). Looks more complete to me. Maybe we should pick up them and find a place in the qemu source tree where both vfio and intel-vgpu (and maybe xen device assignment too) can share those functions? cheers, Gerd ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v6 6/8] vfio/pci: Intel graphics legacy mode assignment 2016-05-18 14:21 ` Gerd Hoffmann @ 2016-05-18 16:50 ` Alex Williamson 0 siblings, 0 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-18 16:50 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, allen.m.kay, kvm, Zhao, Xinda, Kevin Tian On Wed, 18 May 2016 16:21:58 +0200 Gerd Hoffmann <kraxel@redhat.com> wrote: > On Di, 2016-05-17 at 14:19 -0600, Alex Williamson wrote: > > +static int igd_gen(VFIOPCIDevice *vdev) > > +{ > > + if ((vdev->device_id & 0xfff) == 0xa84) { > > + return 8; /* Broxton */ > > + } > > + > > + switch (vdev->device_id & 0xff00) { > > + /* Old, untested, unavailable, unknown */ > > + case 0x0000: > > + case 0x2500: > > + case 0x2700: > > + case 0x2900: > > + case 0x2a00: > > + case 0x2e00: > > + case 0x3500: > > + case 0xa000: > > + return -1; > > + /* SandyBridge, IvyBridge, ValleyView, Haswell */ > > + case 0x0100: > > + case 0x0400: > > + case 0x0a00: > > + case 0x0c00: > > + case 0x0d00: > > + case 0x0f00: > > + return 6; > > + /* BroadWell, CherryView, SkyLake, KabyLake */ > > + case 0x1600: > > + case 0x1900: > > + case 0x2200: > > + case 0x5900: > > + return 8; > > + } > > + > > + return 8; /* Assume newer is compatible */ > > +} > > + > > This link: > https://github.com/01org/Igvtg-qemu/commit/ea32e6769004d6eb98d2dbd859d81bf1885c6ad2#diff-9d4d99332b83a7de33cbeed489d60448R920 > > happened to land in my inbox these days. It provides a bunch of > functions called is_$codename() and intel_gen_version(). Looks more > complete to me. Maybe we should pick up them and find a place in the > qemu source tree where both vfio and intel-vgpu (and maybe xen device > assignment too) can share those functions? Yeah, I used to have something more similar to that, this was an attempt to simplify since I know I'm already dealing with an Intel VGA device. That let's me not care that some of the device IDs I'll match won't be IGD devices because they'll never get to this filter. For me the is_sandbridge() and is_ivybridge() tests simplifies to that single first case of 0x0100. All of is_haswell() is covered by the next four lines, is_broadwell() and is_skylake() the first two in the next section. So I think I have those sufficiently covered and I add a few more, but it's entirely possible that there's no VT-d support on those extras, I didn't lookup ark specs for every processor in the series. I'm all for re-using a common function once we have more than one user, but I think for now this is simple and does the job. Thanks, Alex ^ permalink raw reply [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 7/8] vfio/pci: Add a separate option for IGD OpRegion support 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson ` (5 preceding siblings ...) 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 6/8] vfio/pci: Intel graphics legacy mode assignment Alex Williamson @ 2016-05-17 20:20 ` Alex Williamson 2016-05-17 20:20 ` [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation Alex Williamson 2016-05-17 20:42 ` [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) Alex Williamson 8 siblings, 0 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:20 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm The IGD OpRegion is enabled automatically when running in legacy mode, but it can sometimes be useful in universal passthrough mode as well. Without an OpRegion, output spigots don't work, and even though Intel doesn't officially support physical outputs in UPT mode, it's a useful feature. Note that if an OpRegion is enabled but a monitor is not connected, some graphics features will be disabled in the guest versus a headless system without an OpRegion, where they would work. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- hw/vfio/pci-quirks.c | 4 ++-- hw/vfio/pci.c | 31 +++++++++++++++++++++++++++++++ hw/vfio/pci.h | 6 ++++++ 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c index f2b8ed5..35d32b7 100644 --- a/hw/vfio/pci-quirks.c +++ b/hw/vfio/pci-quirks.c @@ -1055,8 +1055,8 @@ typedef struct VFIOIGDQuirk { * the table and to write the base address of that memory to the ASLS register * of the IGD device. */ -static int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev, - struct vfio_region_info *info) +int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev, + struct vfio_region_info *info) { int ret; diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 06d91cc..deab0c6 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2562,6 +2562,35 @@ static int vfio_initfn(PCIDevice *pdev) vfio_bar_quirk_setup(vdev, i); } + if (!vdev->igd_opregion && + vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION) { + struct vfio_region_info *opregion; + + if (vdev->pdev.qdev.hotplugged) { + error_report("Cannot support IGD OpRegion feature on hotplugged " + "device %s", vdev->vbasedev.name); + ret = -EINVAL; + goto out_teardown; + } + + ret = vfio_get_dev_region_info(&vdev->vbasedev, + VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL, + VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion); + if (ret) { + error_report("Device %s does not support requested IGD OpRegion " + "feature", vdev->vbasedev.name); + goto out_teardown; + } + + ret = vfio_pci_igd_opregion_init(vdev, opregion); + g_free(opregion); + if (ret) { + error_report("Device %s IGD OpRegion initialization failed", + vdev->vbasedev.name); + goto out_teardown; + } + } + /* QEMU emulates all of MSI & MSIX */ if (pdev->cap_present & QEMU_PCI_CAP_MSIX) { memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff, @@ -2686,6 +2715,8 @@ static Property vfio_pci_dev_properties[] = { VFIO_FEATURE_ENABLE_VGA_BIT, false), DEFINE_PROP_BIT("x-req", VFIOPCIDevice, features, VFIO_FEATURE_ENABLE_REQ_BIT, true), + DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features, + VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false), DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false), DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false), diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 31ee8da..b3eb0d8 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -129,6 +129,9 @@ typedef struct VFIOPCIDevice { #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT) #define VFIO_FEATURE_ENABLE_REQ_BIT 1 #define VFIO_FEATURE_ENABLE_REQ (1 << VFIO_FEATURE_ENABLE_REQ_BIT) +#define VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT 2 +#define VFIO_FEATURE_ENABLE_IGD_OPREGION \ + (1 << VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT) int32_t bootindex; uint32_t igd_gms; uint8_t pm_cap; @@ -161,4 +164,7 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev); int vfio_populate_vga(VFIOPCIDevice *vdev); +int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev, + struct vfio_region_info *info); + #endif /* HW_VFIO_VFIO_PCI_H */ ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson ` (6 preceding siblings ...) 2016-05-17 20:20 ` [Qemu-devel] [PATCH v6 7/8] vfio/pci: Add a separate option for IGD OpRegion support Alex Williamson @ 2016-05-17 20:20 ` Alex Williamson 2016-05-18 2:23 ` Eric Blake 2016-05-17 20:42 ` [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) Alex Williamson 8 siblings, 1 reply; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:20 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm Document the usage modes, host primary graphics considerations, usage, and fw_cfg ABI required for IGD assignment with vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- docs/igd-assign.txt | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 133 insertions(+) create mode 100644 docs/igd-assign.txt diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt new file mode 100644 index 0000000..e17bb50 --- /dev/null +++ b/docs/igd-assign.txt @@ -0,0 +1,133 @@ +Intel Graphics Device (IGD) assignment with vfio-pci +==================================================== + +IGD has two different modes for assignment using vfio-pci: + +1) Universal Pass-Through (UPT) mode: + + In this mode the IGD device is added as a *secondary* (ie. non-primary) + graphics device in combination with an emulated primary graphics device. + This mode *requires* guest driver support to remove the external + dependencies generally associated with IGD (see below). Those guest + drivers only support this mode for Broadwell and newer IGD, according to + Intel. Additionally, this mode by default, and as officially supported + by Intel, does not support direct video output. The intention is to use + this mode either to provide hardware acceleration to the emulated graphics + or to use this mode in combination with guest-based remote access software, + for example VNC (see below for optional output support). This mode + theoretically has no device specific handling dependencies on vfio-pci or + the VM firmware. + +2) "Legacy" mode: + + In this mode the IGD device is intended to be the primary and exclusive + graphics device in the VM[1], as such QEMU does not facilitate any sort + of remote graphics to the VM in this mode. A connected physical monitor + is the intended output device for IGD. This mode includes several + requirements and restrictions: + + * IGD must be given address 02.0 on the PCI root bus in the VM + * The host kernel must support vfio extensions for IGD (v4.6) + * vfio VGA support very likely needs to be enabled in the host kernel + * The VM firmware must support specific fw_cfg enablers for IGD + * The VM machine type must support a PCI host bridge at 00.0 (standard) + * The VM machine type must provide or allow to be created a special + ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at + PCI address 1f.0. + * The IGD device must have a VGA ROM, either provided via the romfile + option or loaded automatically through vfio (standard). rombar=0 + will disable legacy mode support. + * Hotplug of the IGD device is not supported. + * The IGD device must be a SandyBridge or newer model device. + +For either mode, depending on the host kernel, the i915 driver in the host +may generate faults and errors upon re-binding to an IGD device after it +has been assigned to a VM. It's therefore generally recommended to prevent +such driver binding unless the host driver is known to work well for this. +There are numerous ways to do this, i915 can be blacklisted on the host, +the driver_override option can be used to ensure that only vfio-pci can bind +to the device on the host[2], virsh nodedev-detach can be used to bind the +device to vfio drivers and then managed='no' set in the VM xml to prevent +re-binding to i915, etc. Also note that IGD is also typically the primary +graphics in the host and special options may be required beyond simply +blacklisting i915 or using pci-stub/vfio-pci to take ownership of IGD as a +PCI class device. Lower level drivers exist that may still claim the device. +It may therefore be necessary to use kernel boot options video=vesafb:off or +video=efifb:off (depending on host BIOS/UEFI) or these can be combined to +a catch-all, video=vesafb:off,efifb:off. Error messages such as: + + Failed to mmap 0000:00:02.0 BAR <>. Performance may be slow + +are a good indicator that such a problem exists. The host files /proc/iomem +and /proc/ioports are often useful for identifying drivers consuming ranges +of the device to cause such conflicts. + +Additionally, IGD device are known to generate small numbers of DMAR faults +when initially assigned. It is believed that this is simply the IGD attempting +to access the reserved GTT space after reset, which it no longer has access to +when accessed from userspace. So long as the DMAR faults are small in number +and most importantly, not ongoing, these are not an indication of an error. + +Additionally++, analog VGA output (as opposed to digital outputs like HDMI, +DVI, or DisplayPort) may be unsupported in some use cases. In the author's +experience, even DP to VGA adapters can be troublesome while adapters between +digital formats work well. + +Usage +===== +The intention is for IGD assignment to be transparent for users and thus for +management tools like libvirt. To make use of legacy mode, simply remove all +other graphics options and use "-nographic" and either "-vga none" or +"-nodefaults", along with adding the device using vfio-pci: + + -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2 + +For UPT mode, retain the default emulated graphics and simply add the vfio-pci +device making use of any other bus address other than 02.0. libvirt will +default to assigning the device a UPT compatible address while legacy mode +users will need to manually edit the XML if using a tool like virt-manager +where the VM device address is not expressly specified. + +An experimental vfio-pci option also exists to enable OpRegion, and thus +external monitor support, for UPT mode. This can be enabled by adding +"x-igd-opregion=on" to the vfio-pci device options for the IGD device. As +with legacy mode, this requires the host to support features introduced in +the v4.6 kernel. If Intel chooses to embrace this support, the option may +be made non-experimental in the future, opening it to libvirt support. + +Developer ABI +============= +Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware: + +1) "etc/igd-opregion" + + This fw_cfg file exposes the OpRegion for the IGD device. A reserved + region should be created below 4GB (recommended 4KB alignment), sized + sufficient for the fw_cfg file size, and the content of this file copied + to it. The dword based address of this reserved memory region must also + be written to the ASLS register at offset 0xFC on the IGD device. It is + recommended that firmware should make use of this fw_cfg entry for any + PCI class VGA device with Intel vendor ID. Multiple of such devices + within a VM is undefined. + +2) "etc/igd-bdsm-size" + + This fw_cfg file contains an 8-byte, little endian integer indicating + the size of the reserved memory region required for IGD stolen memory. + Firmware must allocate a reserved memory below 4GB with required 1MB + alignment equal to this size. Additionally the base address of this + reserved region must be written to the dword BDSM register in PCI config + space of the IGD device at offset 0x5C. As this support is related to + running the IGD ROM, which has other dependencies on the device appearing + at guest address 00:02.0, it's expected that this fw_cfg file is only + relevant to a single PCI class VGA device with Intel vendor ID, appearing + at PCI bus address 00:02.0. + +Footnotes +========= +[1] Nothing precludes adding additional emulated or assigned graphics devices + as non-primary, other than the combination typically not working. I only + intend to set user expectations, others are welcome to find working + combinations or fix whatever issues prevent this from working in the common + case. +[2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation 2016-05-17 20:20 ` [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation Alex Williamson @ 2016-05-18 2:23 ` Eric Blake 2016-05-18 18:35 ` Alex Williamson 0 siblings, 1 reply; 21+ messages in thread From: Eric Blake @ 2016-05-18 2:23 UTC (permalink / raw) To: Alex Williamson, qemu-devel; +Cc: allen.m.kay, kraxel, kvm [-- Attachment #1: Type: text/plain, Size: 1048 bytes --] On 05/17/2016 02:20 PM, Alex Williamson wrote: > Document the usage modes, host primary graphics considerations, usage, > and fw_cfg ABI required for IGD assignment with vfio. > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > docs/igd-assign.txt | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 133 insertions(+) > create mode 100644 docs/igd-assign.txt > > diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt > new file mode 100644 > index 0000000..e17bb50 > --- /dev/null > +++ b/docs/igd-assign.txt > @@ -0,0 +1,133 @@ > +Intel Graphics Device (IGD) assignment with vfio-pci > +==================================================== Without an explicit copyright notice and license, your documentation defaults to GPLv2+, per the top level COPYING. If you want something else, it may be worth explicitly calling it out. Otherwise, it was a good read. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 604 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation 2016-05-18 2:23 ` Eric Blake @ 2016-05-18 18:35 ` Alex Williamson 2016-05-18 19:28 ` Eric Blake 0 siblings, 1 reply; 21+ messages in thread From: Alex Williamson @ 2016-05-18 18:35 UTC (permalink / raw) To: Eric Blake; +Cc: qemu-devel, allen.m.kay, kraxel, kvm On Tue, 17 May 2016 20:23:42 -0600 Eric Blake <eblake@redhat.com> wrote: > On 05/17/2016 02:20 PM, Alex Williamson wrote: > > Document the usage modes, host primary graphics considerations, usage, > > and fw_cfg ABI required for IGD assignment with vfio. > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > docs/igd-assign.txt | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 133 insertions(+) > > create mode 100644 docs/igd-assign.txt > > > > diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt > > new file mode 100644 > > index 0000000..e17bb50 > > --- /dev/null > > +++ b/docs/igd-assign.txt > > @@ -0,0 +1,133 @@ > > +Intel Graphics Device (IGD) assignment with vfio-pci > > +==================================================== > > Without an explicit copyright notice and license, your documentation > defaults to GPLv2+, per the top level COPYING. If you want something > else, it may be worth explicitly calling it out. > > Otherwise, it was a good read. Thanks for reviewing, Eric. Surveying existing docs, I see a mix of a few with explicit GPLv2+, some PROMELA code under WTFPL/public domain, and one lone file under FreeBSD documentation license. The vast majority don't set an explicit license. I don't have a strong feeling about this and the documentation is certainly no opus. Should I have a strong feeling about this? Thanks, Alex ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation 2016-05-18 18:35 ` Alex Williamson @ 2016-05-18 19:28 ` Eric Blake 0 siblings, 0 replies; 21+ messages in thread From: Eric Blake @ 2016-05-18 19:28 UTC (permalink / raw) To: Alex Williamson; +Cc: qemu-devel, allen.m.kay, kraxel, kvm [-- Attachment #1: Type: text/plain, Size: 3080 bytes --] On 05/18/2016 12:35 PM, Alex Williamson wrote: >> Without an explicit copyright notice and license, your documentation >> defaults to GPLv2+, per the top level COPYING. If you want something >> else, it may be worth explicitly calling it out. >> >> Otherwise, it was a good read. > > > Thanks for reviewing, Eric. Surveying existing docs, I see a mix of a > few with explicit GPLv2+, some PROMELA code under WTFPL/public domain, > and one lone file under FreeBSD documentation license. The vast > majority don't set an explicit license. I don't have a strong feeling > about this and the documentation is certainly no opus. Should I have a > strong feeling about this? Thanks, My personal opinion: I can see arguments for two different camps: 1. Will this document ever be useful to someone implementing a parallel code base to the same thing? If so, does a GPLv2+ document taint their implementation to also have to be GPLv2+ compatible? I'm not a lawyer, so I have NO IDEA if someone can legally claim that an independent implementation of code based on a GPLv2+ specification is constrained by the license of the doc it was implementing. If it were me answering, I'd say that there is no tainting involved (GPL applies to source code in its preferred original form; and how would you justify documentation to be a preferred original form?). But precisely because it is a fuzzy question, being explicit makes the solution easier to reason about: giving an explicit license looser than GPL means that no one can argue that the documentation is forcing a particular license on implementations. 2. Why bother? I like GPLv2+ because of its strong copyleft, and want to make life harder for anyone that doesn't believe in the same way. Meanwhile, sticking an explicit copyright blurb in the doc makes the doc harder to read (you have to scroll to get to the real meat), particularly if I'm just fine with the project-default license, so omitting the text is okay. Of course, once a copyright blurb is omitted, if qemu ever changes its own license (such as to GPLv3+ [hah, we know that won't happen without a rewrite of all the GPLv2-only stuff]), then my document follows that change in lockstep, so explicitly stating GPLv2+ cements my intention regardless of external context. So at the end of the day, it boils down to how comfortable are you with the license you've chosen, how likely do you think your doc will be important to someone writing parallel code, and how much do you care. I'm perfectly okay with no change, and was merely pointing it out in case you have a stronger opinion than me. As you rightly noted, we have a number of interesting precedents in the directory, so there is no real common standard, so much as personal preference, although I do prefer that if you use an explicit license, pick one already in use rather than adding yet another license to the overall mix of qemu. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 604 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson ` (7 preceding siblings ...) 2016-05-17 20:20 ` [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation Alex Williamson @ 2016-05-17 20:42 ` Alex Williamson 2016-05-18 14:24 ` Gerd Hoffmann 8 siblings, 1 reply; 21+ messages in thread From: Alex Williamson @ 2016-05-17 20:42 UTC (permalink / raw) To: qemu-devel; +Cc: allen.m.kay, kraxel, kvm On Tue, 17 May 2016 14:19:19 -0600 Alex Williamson <alex.williamson@redhat.com> wrote: > The following series implements... Gag, I blame a bum vpn connection and botch vim recovery ;) Here's what I meant to list: This is the first non-RFC posting, so if you wish to review and provide comments, now is the time to do so. What's new in v6: * The fw_cfg entry for BDSM reservation has changed per Gerd's feedback. This will require a new SeaBIOS to make the matching change (incompatible with previous postings). * Added a docs entry for IGD assignment covering both the usage of UPT vs legacy mode as well as the ABI using fw_cfg. * Fix some uninitialized pointer issues if we bail out of legacy mode setup. * Leak igd_opregion if we happen to be hot-unplugged. I don't see a better option given the fw_cfg dependency, we really don't want to support unplug of IGD at all. I believe we're no longer blocked by SeaBIOS being willing to consume the new fw_cfg entries, so barring feedback otherwise, my plan will be to send a pull request including these changes (after an appropriate review period), ask Kevin to apply the matching SeaBIOS change, then see if Gerd will help me pull that support into the default QEMU SeaBIOS image. There is a plethora of comments and documentation in this series, so I will forego further description of what this enables. Jump to the last patch to read about it or hit patch 6 for the very well documented (IMO) meat of IGD support. Thanks! Alex > --- > > Alex Williamson (8): > vfio: Enable sparse mmap capability > vfio: Create device specific region info helper > vfio/pci: Fix return of vfio_populate_vga() > vfio/pci: Consolidate VGA setup > vfio/pci: Setup BAR quirks after capabilities probing > vfio/pci: Intel graphics legacy mode assignment > vfio/pci: Add a separate option for IGD OpRegion support > vfio/pci: Add IGD documentation > > > docs/igd-assign.txt | 133 ++++++++ > hw/vfio/common.c | 103 ++++++- > hw/vfio/pci-quirks.c | 643 +++++++++++++++++++++++++++++++++++++++++ > hw/vfio/pci.c | 151 ++++++---- > hw/vfio/pci.h | 8 + > include/hw/vfio/vfio-common.h | 2 > trace-events | 11 + > 7 files changed, 989 insertions(+), 62 deletions(-) > create mode 100644 docs/igd-assign.txt ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) 2016-05-17 20:42 ` [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) Alex Williamson @ 2016-05-18 14:24 ` Gerd Hoffmann 2016-05-18 18:45 ` Alex Williamson 0 siblings, 1 reply; 21+ messages in thread From: Gerd Hoffmann @ 2016-05-18 14:24 UTC (permalink / raw) To: Alex Williamson; +Cc: qemu-devel, allen.m.kay, kvm Hi, > I believe we're no longer blocked by SeaBIOS being willing to > consume the new fw_cfg entries, so barring feedback otherwise, my > plan will be to send a pull request including these changes (after > an appropriate review period), ask Kevin to apply the matching > SeaBIOS change, then see if Gerd will help me pull that support > into the default QEMU SeaBIOS image. Sounds good. Patches look sane to me too. Will try to set aside some time to test them later this week (and also push a branch with updated seabios etc for others to try). cheers, Gerd ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) 2016-05-18 14:24 ` Gerd Hoffmann @ 2016-05-18 18:45 ` Alex Williamson 2016-05-19 9:00 ` Gerd Hoffmann 2016-05-20 12:19 ` Gerd Hoffmann 0 siblings, 2 replies; 21+ messages in thread From: Alex Williamson @ 2016-05-18 18:45 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: allen.m.kay, qemu-devel, kvm On Wed, 18 May 2016 16:24:49 +0200 Gerd Hoffmann <kraxel@redhat.com> wrote: > Hi, > > > I believe we're no longer blocked by SeaBIOS being willing to > > consume the new fw_cfg entries, so barring feedback otherwise, my > > plan will be to send a pull request including these changes (after > > an appropriate review period), ask Kevin to apply the matching > > SeaBIOS change, then see if Gerd will help me pull that support > > into the default QEMU SeaBIOS image. > > Sounds good. Patches look sane to me too. > > Will try to set aside some time to test them later this week (and also > push a branch with updated seabios etc for others to try). Thanks Gerd! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) 2016-05-18 18:45 ` Alex Williamson @ 2016-05-19 9:00 ` Gerd Hoffmann 2016-05-20 12:19 ` Gerd Hoffmann 1 sibling, 0 replies; 21+ messages in thread From: Gerd Hoffmann @ 2016-05-19 9:00 UTC (permalink / raw) To: Alex Williamson; +Cc: allen.m.kay, qemu-devel, kvm On Mi, 2016-05-18 at 12:45 -0600, Alex Williamson wrote: > On Wed, 18 May 2016 16:24:49 +0200 > Gerd Hoffmann <kraxel@redhat.com> wrote: > > > Hi, > > > > > I believe we're no longer blocked by SeaBIOS being willing to > > > consume the new fw_cfg entries, so barring feedback otherwise, my > > > plan will be to send a pull request including these changes (after > > > an appropriate review period), ask Kevin to apply the matching > > > SeaBIOS change, then see if Gerd will help me pull that support > > > into the default QEMU SeaBIOS image. > > > > Sounds good. Patches look sane to me too. > > > > Will try to set aside some time to test them later this week (and also > > push a branch with updated seabios etc for others to try). > > Thanks Gerd! Branch pushed meanwhile: https://www.kraxel.org/cgit/qemu/log/?h=work/vfio-igd [ not tested yet, kernel builds still running ... ] cheers, Gerd ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) 2016-05-18 18:45 ` Alex Williamson 2016-05-19 9:00 ` Gerd Hoffmann @ 2016-05-20 12:19 ` Gerd Hoffmann 2016-05-20 15:08 ` Alex Williamson 1 sibling, 1 reply; 21+ messages in thread From: Gerd Hoffmann @ 2016-05-20 12:19 UTC (permalink / raw) To: Alex Williamson; +Cc: allen.m.kay, qemu-devel, kvm On Mi, 2016-05-18 at 12:45 -0600, Alex Williamson wrote: > On Wed, 18 May 2016 16:24:49 +0200 > Gerd Hoffmann <kraxel@redhat.com> wrote: > > > Hi, > > > > > I believe we're no longer blocked by SeaBIOS being willing to > > > consume the new fw_cfg entries, so barring feedback otherwise, my > > > plan will be to send a pull request including these changes (after > > > an appropriate review period), ask Kevin to apply the matching > > > SeaBIOS change, then see if Gerd will help me pull that support > > > into the default QEMU SeaBIOS image. > > > > Sounds good. Patches look sane to me too. > > > > Will try to set aside some time to test them later this week (and also > > push a branch with updated seabios etc for others to try). > > Thanks Gerd! Works flawlessly. Linux works fine with the igd assigned as primary (pc machine type and igd using slot 2). Linux doesn't work in UPT mode for me. That is more a guest driver issue though. The i915 driver of older kernels Oopses. The i915 driver of recent kernels (4.4.5+) loads fine on q35, but fails to drive my monitor, no matter whenever I enable the opregion or not. For pc an additional kernel patch is needed (which hopefully lands upstream soon) to reach feature parity (as-in: no Oops on load) with q35. For windows UPT mode is the only thing supported by the drivers. Needs the opregion enabled to show something on the connected display. Works fine on both pc and q35, in parallel with a emulated display, and you can configure things as you want (mirror, expand desktop, use only one of the displays). Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com> cheers, Gerd ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) 2016-05-20 12:19 ` Gerd Hoffmann @ 2016-05-20 15:08 ` Alex Williamson 2016-05-23 13:34 ` Gerd Hoffmann 0 siblings, 1 reply; 21+ messages in thread From: Alex Williamson @ 2016-05-20 15:08 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: allen.m.kay, qemu-devel, kvm On Fri, 20 May 2016 14:19:19 +0200 Gerd Hoffmann <kraxel@redhat.com> wrote: > On Mi, 2016-05-18 at 12:45 -0600, Alex Williamson wrote: > > On Wed, 18 May 2016 16:24:49 +0200 > > Gerd Hoffmann <kraxel@redhat.com> wrote: > > > > > Hi, > > > > > > > I believe we're no longer blocked by SeaBIOS being willing to > > > > consume the new fw_cfg entries, so barring feedback otherwise, my > > > > plan will be to send a pull request including these changes (after > > > > an appropriate review period), ask Kevin to apply the matching > > > > SeaBIOS change, then see if Gerd will help me pull that support > > > > into the default QEMU SeaBIOS image. > > > > > > Sounds good. Patches look sane to me too. > > > > > > Will try to set aside some time to test them later this week (and also > > > push a branch with updated seabios etc for others to try). > > > > Thanks Gerd! > > Works flawlessly. > > Linux works fine with the igd assigned as primary (pc machine type and > igd using slot 2). > > Linux doesn't work in UPT mode for me. That is more a guest driver > issue though. The i915 driver of older kernels Oopses. The i915 driver > of recent kernels (4.4.5+) loads fine on q35, but fails to drive my > monitor, no matter whenever I enable the opregion or not. For pc an > additional kernel patch is needed (which hopefully lands upstream soon) > to reach feature parity (as-in: no Oops on load) with q35. Yes, I've never had success with UPT for Linux guests either. > For windows UPT mode is the only thing supported by the drivers. Needs > the opregion enabled to show something on the connected display. Works > fine on both pc and q35, in parallel with a emulated display, and you > can configure things as you want (mirror, expand desktop, use only one > of the displays). UPT is probably the only thing Intel cares to support on Windows, yes, but did you have any issues or try legacy mode on Windows? I find that Windows 7/8/10 all work well with legacy mode, including the basic drivers in the shrink-wrap images. Please let me know if you find otherwise. > > Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> > Tested-by: Gerd Hoffmann <kraxel@redhat.com> Awesome, thanks! Alex ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) 2016-05-20 15:08 ` Alex Williamson @ 2016-05-23 13:34 ` Gerd Hoffmann 0 siblings, 0 replies; 21+ messages in thread From: Gerd Hoffmann @ 2016-05-23 13:34 UTC (permalink / raw) To: Alex Williamson; +Cc: allen.m.kay, qemu-devel, kvm Hi, > UPT is probably the only thing Intel cares to support on Windows, yes, > but did you have any issues or try legacy mode on Windows? Just tried that with Win10 -- working fine. cheers, Gerd ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2016-05-23 13:35 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-05-17 20:19 [Qemu-devel] [PATCH v6 0/8] Series short description Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 1/8] vfio: Enable sparse mmap capability Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 2/8] vfio: Create device specific region info helper Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 3/8] vfio/pci: Fix return of vfio_populate_vga() Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 4/8] vfio/pci: Consolidate VGA setup Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 5/8] vfio/pci: Setup BAR quirks after capabilities probing Alex Williamson 2016-05-17 20:19 ` [Qemu-devel] [PATCH v6 6/8] vfio/pci: Intel graphics legacy mode assignment Alex Williamson 2016-05-18 14:21 ` Gerd Hoffmann 2016-05-18 16:50 ` Alex Williamson 2016-05-17 20:20 ` [Qemu-devel] [PATCH v6 7/8] vfio/pci: Add a separate option for IGD OpRegion support Alex Williamson 2016-05-17 20:20 ` [Qemu-devel] [PATCH v6 8/8] vfio/pci: Add IGD documentation Alex Williamson 2016-05-18 2:23 ` Eric Blake 2016-05-18 18:35 ` Alex Williamson 2016-05-18 19:28 ` Eric Blake 2016-05-17 20:42 ` [Qemu-devel] vfio IGD assignment (was Re: [PATCH v6 0/8] Series short description) Alex Williamson 2016-05-18 14:24 ` Gerd Hoffmann 2016-05-18 18:45 ` Alex Williamson 2016-05-19 9:00 ` Gerd Hoffmann 2016-05-20 12:19 ` Gerd Hoffmann 2016-05-20 15:08 ` Alex Williamson 2016-05-23 13:34 ` Gerd Hoffmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).