* [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support
@ 2016-02-16 21:31 Alex Williamson
2016-02-16 21:31 ` [Qemu-devel] [RFC PATCH v3 1/9] vfio: Add sysfsdev property for pci & platform Alex Williamson
` (10 more replies)
0 siblings, 11 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:31 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
v3:
Quite a bit of restructuring, functional differences include exposing
another fw_cfg file to indicate the size of the stolen memory region.
We really have no need to copy anything into stolen memory, so while
we tell SeaBIOS about it via a fw_cfg file, the data pointer is NULL
so it can't be read.
I'm also now reading the GGMS size from the GMCH register which
determines the size of the GTT region of stolen memory. The vBIOS
is typically only using 1MB, but this is often 2MB in hardware. I
also give the user the ability to specify a GMS value for further
stolen memory. We default it to zero and it's an experimental option
so we can remove if it's not useful. QEMU now does the virtualization
of the GMCH and BDSM registers, which is was sort of doing before
anyway, but vfio kernel no longer does anything special for them.
Getting the GGMS size requires that we know something about the IGD
version we're using, so code added for that.
One fun thing, IGD is really part of the reason that the x-vga option
is experimental. IGD doesn't like to give up VGA routing. Now we
can use that to our advantage. If the hardware doesn't report VGA
disabled, we can automatically turn it on. I position this around
all the other stuff of doing vBIOS and BDSM quirks, so it can be
disabled by specifying rombar=0 or using the new x-no-auto-vga
options. I'm also using this to signal when to skip creating the
ISA bridge and messing with the host bridge. If we have a Gen8 or
newer and rombar=0 is specified then we don't do any special setup,
which should enable Intel's Universal Passthrough Mode. This is
already supported by libvirt, so it should make an easy path between
old and new modes.
Not seen here is a whole revision that created fake BARs on the ISA
bride for the opregion and stolen memory such that they were
automatically mapped with no BIOS requirement. That has a gap that
stolen memory gets disabled during BAR sizing and breaks altogether
if the guest moves the BAR. It really only affects VESA mode, but
it's still enough to abandon that hack approach.
This will work with the previous kernel patches, but I'd recommend
v3 anyway, plus the PCI FLR reset delay on latops. You will
definitely need new SeaBIOS for this or else you'll get a hw_error.
Happy testing, reviews and feedback welcome. Thanks,
Alex
v2:
IGD support is greatly expanded. Due to feedback on the previous
serious QEMU no longer maps the OpRegion to the guest, we simply fill
a buffer and expose it as fw_cfg. We could still do the mapping in
the future if there's value to it.
New features include the use of host and LPC bridge config space
provided through new vfio device specific regions. This eliminates
the need for QEMU to go poking around in pci-sysfs. Additionally the
host and LPC changes are now initiated by vfio-pci upon finding the
necessary regions to support these. Thus the igd_passthru=on machine
option is not needed for this series. This series no longer has any
dependency on Gerd's previous IGD series.
Also included is PCI option ROM fixups, which automatically fixes the
device ID in the ROM and recalculates the checksum for ROMs loaded
through vfio. This is necessary for IGD as the ROM vfio provides us
through the shadow ROM space typically has the wrong ID and bogus
checksum. It would also be useful for anyone "soft modding" a card
by specifying a different device ID and manually hacking the ROM.
Finally is a quirk to handle stolen memory and requires cooperation
with SeaBIOS. We need the vBIOS, as enabled by the ROM support
above, for lighting up laptop panels (at least for my SNB system),
but that vBIOS tries to make use of host stolen memory, which either
overlaps VM RAM or empty space, which leads to VM memory corruption
or DMAR faults respectively. We can prevent this by intercepting
the vBIOS programming of the device to instead use a buffer allocated
by SeaBIOS. I'm amazed this works, but it does... at least for me.
Comments and testing feedback welcome. You'll need this QEMU patch
series, the latest vfio patch series (including the PCI reset path
on laptops), and a new SeaBIOS patch series. Thanks,
Alex
v1:
This is the QEMU compliment to the vfio kernel capability chain
series. This is RFC since it depends on those non-upstream kernel
changes. Patch 1/ will be posted separately, it's somewhat unrelated,
but is in my build tree so I include it here for anyone that wants to
build this series.
This series includes sparse mmap support for avoiding mmaps over the
MSI-X vector table and device specific memory regions for IGD OpRegion
support. MemoryRegions are significantly generalize for the former,
to make it really easy for each vfio region to be backed by none or
more mmap MemoryRegion. The MSI-X vector table then either adds an
mmap region, or not via a legacy quirk or explicit sparse mmap
support.
IGD OpRegions are exposed as new device specific region, which simply
entails searching regions past those known for matching type and
sub-type regions that we know how to handle. Writes to the OpRegion
register (ASL storage) pop the host OpRegion into VM system memory.
This isn't exactly like how real hardware works, but it makes for a
convenient implementation. Alternatively we could pass the entire
OpRegion table via fw_cfg, but this makes write through to the host
impossible (if that's even useful). This is certainly something that
I'm looking for comments about in this series. Thanks,
Alex
---
Alex Williamson (9):
vfio: Add sysfsdev property for pci & platform
vfio: Wrap VFIO_DEVICE_GET_REGION_INFO
vfio: Generalize region support
vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions
linux-headers/vfio: Update for proposed capabilities list
vfio: Enable sparse mmap capability
vfio/pci: Fixup PCI option ROMs
vfio/pci: Split out VGA setup
Intel IGD support for vfio
hw/arm/sysbus-fdt.c | 2
hw/vfio/common.c | 249 +++++++++++++++--
hw/vfio/pci-quirks.c | 538 +++++++++++++++++++++++++++++++++++--
hw/vfio/pci.c | 593 +++++++++++++++++++++++------------------
hw/vfio/pci.h | 21 +
hw/vfio/platform.c | 126 +++------
include/hw/vfio/vfio-common.h | 29 ++
linux-headers/linux/vfio.h | 101 +++++++
trace-events | 19 +
9 files changed, 1262 insertions(+), 416 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 1/9] vfio: Add sysfsdev property for pci & platform
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
@ 2016-02-16 21:31 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 2/9] vfio: Wrap VFIO_DEVICE_GET_REGION_INFO Alex Williamson
` (9 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:31 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:]<bus:slot.function> notation. We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/. vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/. On the PCI side, we have
some interest in using vfio to expose vGPU devices. These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it. There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device. To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.
To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs. The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.
With this, a vfio-pci device could either be specified as:
-device vfio-pci,host=02:00.0
or
-device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0
or even
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0
When vGPU support comes along, this might look something more like:
-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0
NB - This is only a made up example path, but it should be noted that
the device namespace is global for vfio, a virtual device cannot
overlap with existing namespaces and should not create a name prone to
conflict, such as a simple instance number.
The same changes is made for vfio-platform, specifying sysfsdev has
precedence over the old host option.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/pci.c | 130 +++++++++++++++++------------------------
hw/vfio/platform.c | 55 ++++++++++-------
include/hw/vfio/vfio-common.h | 1
3 files changed, 86 insertions(+), 100 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 49f3d2d..5524121 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -895,12 +895,8 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
/* Since pci handles romfile, just print a message and return */
if (vfio_blacklist_opt_rom(vdev) && vdev->pdev.romfile) {
- error_printf("Warning : Device at %04x:%02x:%02x.%x "
- "is known to cause system instability issues during "
- "option rom execution. "
- "Proceeding anyway since user specified romfile\n",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
+ error_printf("Warning : Device at %s is known to cause system instability issues during option rom execution. Proceeding anyway since user specified romfile\n",
+ vdev->vbasedev.name);
}
return;
}
@@ -913,9 +909,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
pwrite(fd, &size, 4, offset) != 4 ||
pread(fd, &size, 4, offset) != 4 ||
pwrite(fd, &orig, 4, offset) != 4) {
- error_report("%s(%04x:%02x:%02x.%x) failed: %m",
- __func__, vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function);
+ error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
return;
}
@@ -927,29 +921,18 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
if (vfio_blacklist_opt_rom(vdev)) {
if (dev->opts && qemu_opt_get(dev->opts, "rombar")) {
- error_printf("Warning : Device at %04x:%02x:%02x.%x "
- "is known to cause system instability issues during "
- "option rom execution. "
- "Proceeding anyway since user specified non zero value for "
- "rombar\n",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
+ error_printf("Warning : Device at %s is known to cause system instability issues during option rom execution. Proceeding anyway since user specified non zero value for rombar\n",
+ vdev->vbasedev.name);
} else {
- error_printf("Warning : Rom loading for device at "
- "%04x:%02x:%02x.%x has been disabled due to "
- "system instability issues. "
- "Specify rombar=1 or romfile to force\n",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
+ error_printf("Warning : Rom loading for device at %s has been disabled due to system instability issues. Specify rombar=1 or romfile to force\n",
+ vdev->vbasedev.name);
return;
}
}
trace_vfio_pci_size_rom(vdev->vbasedev.name, size);
- snprintf(name, sizeof(name), "vfio[%04x:%02x:%02x.%x].rom",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
+ snprintf(name, sizeof(name), "vfio[%s].rom", vdev->vbasedev.name);
memory_region_init_io(&vdev->pdev.rom, OBJECT(vdev),
&vfio_rom_ops, vdev, name, size);
@@ -1063,9 +1046,8 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
ret = pread(vdev->vbasedev.fd, &phys_val, len,
vdev->config_offset + addr);
if (ret != len) {
- error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
- __func__, vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function, addr, len);
+ error_report("%s(%s, 0x%x, 0x%x) failed: %m",
+ __func__, vdev->vbasedev.name, addr, len);
return -errno;
}
phys_val = le32_to_cpu(phys_val);
@@ -1089,9 +1071,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
/* Write everything to VFIO, let it filter out what we can't write */
if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
!= len) {
- error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
- __func__, vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function, addr, val, len);
+ error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %m",
+ __func__, vdev->vbasedev.name, addr, val, len);
}
/* MSI/MSI-X Enabling/Disabling */
@@ -1383,9 +1364,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
return;
}
- snprintf(name, sizeof(name), "VFIO %04x:%02x:%02x.%x BAR %d",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function, nr);
+ snprintf(name, sizeof(name), "VFIO %s BAR %d", vdev->vbasedev.name, nr);
/* Determine what type of BAR this is for registration */
ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
@@ -1755,9 +1734,8 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
}
if (ret < 0) {
- error_report("vfio: %04x:%02x:%02x.%x Error adding PCI capability "
- "0x%x[0x%x]@0x%x: %d", vdev->host.domain,
- vdev->host.bus, vdev->host.slot, vdev->host.function,
+ error_report("vfio: %s Error adding PCI capability "
+ "0x%x[0x%x]@0x%x: %d", vdev->vbasedev.name,
cap_id, size, pos, ret);
return ret;
}
@@ -1819,11 +1797,14 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
vfio_intx_enable(vdev);
}
-static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
- PCIHostDeviceAddress *host2)
+static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
{
- return (host1->domain == host2->domain && host1->bus == host2->bus &&
- host1->slot == host2->slot && host1->function == host2->function);
+ char tmp[13];
+
+ sprintf(tmp, "%04x:%02x:%02x.%1x", addr->domain,
+ addr->bus, addr->slot, addr->function);
+
+ return (strcmp(tmp, name) == 0);
}
static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
@@ -1848,9 +1829,8 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
if (ret && errno != ENOSPC) {
ret = -errno;
if (!vdev->has_pm_reset) {
- error_report("vfio: Cannot reset device %04x:%02x:%02x.%x, "
- "no available reset mechanism.", vdev->host.domain,
- vdev->host.bus, vdev->host.slot, vdev->host.function);
+ error_report("vfio: Cannot reset device %s, "
+ "no available reset mechanism.", vdev->vbasedev.name);
}
goto out_single;
}
@@ -1883,7 +1863,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
trace_vfio_pci_hot_reset_dep_devices(host.domain,
host.bus, host.slot, host.function, devices[i].group_id);
- if (vfio_pci_host_match(&host, &vdev->host)) {
+ if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
continue;
}
@@ -1909,7 +1889,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
continue;
}
tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
- if (vfio_pci_host_match(&host, &tmp->host)) {
+ if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
if (single) {
ret = -EINVAL;
goto out_single;
@@ -1971,7 +1951,7 @@ out:
host.slot = PCI_SLOT(devices[i].devfn);
host.function = PCI_FUNC(devices[i].devfn);
- if (vfio_pci_host_match(&host, &vdev->host)) {
+ if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
continue;
}
@@ -1990,7 +1970,7 @@ out:
continue;
}
tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
- if (vfio_pci_host_match(&host, &tmp->host)) {
+ if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
vfio_pci_post_reset(tmp);
break;
}
@@ -2196,10 +2176,7 @@ static void vfio_err_notifier_handler(void *opaque)
* guest to contain the error.
*/
- error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected. "
- "Please collect any data possible and then kill the guest",
- __func__, vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function);
+ error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
vm_stop(RUN_STATE_INTERNAL_ERROR);
}
@@ -2380,42 +2357,43 @@ static int vfio_initfn(PCIDevice *pdev)
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
VFIODevice *vbasedev_iter;
VFIOGroup *group;
- char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
+ char *tmp, group_path[PATH_MAX], *group_name;
ssize_t len;
struct stat st;
int groupid;
int ret;
- /* Check that the host device exists */
- snprintf(path, sizeof(path),
- "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
- if (stat(path, &st) < 0) {
- error_report("vfio: error: no such host device: %s", path);
+ if (!vdev->vbasedev.sysfsdev) {
+ vdev->vbasedev.sysfsdev =
+ g_strdup_printf("/sys/bus/pci/devices/%04x:%02x:%02x.%01x",
+ vdev->host.domain, vdev->host.bus,
+ vdev->host.slot, vdev->host.function);
+ }
+
+ if (stat(vdev->vbasedev.sysfsdev, &st) < 0) {
+ error_report("vfio: error: no such host device: %s",
+ vdev->vbasedev.sysfsdev);
return -errno;
}
+ vdev->vbasedev.name = g_strdup(basename(vdev->vbasedev.sysfsdev));
vdev->vbasedev.ops = &vfio_pci_ops;
-
vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
- vdev->vbasedev.name = g_strdup_printf("%04x:%02x:%02x.%01x",
- vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function);
- strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
+ tmp = g_strdup_printf("%s/iommu_group", vdev->vbasedev.sysfsdev);
+ len = readlink(tmp, group_path, sizeof(group_path));
+ g_free(tmp);
- len = readlink(path, iommu_group_path, sizeof(path));
- if (len <= 0 || len >= sizeof(path)) {
+ if (len <= 0 || len >= sizeof(group_path)) {
error_report("vfio: error no iommu_group for device");
return len < 0 ? -errno : -ENAMETOOLONG;
}
- iommu_group_path[len] = 0;
- group_name = basename(iommu_group_path);
+ group_path[len] = 0;
+ group_name = basename(group_path);
if (sscanf(group_name, "%d", &groupid) != 1) {
- error_report("vfio: error reading %s: %m", path);
+ error_report("vfio: error reading %s: %m", group_path);
return -errno;
}
@@ -2427,21 +2405,18 @@ static int vfio_initfn(PCIDevice *pdev)
return -ENOENT;
}
- snprintf(path, sizeof(path), "%04x:%02x:%02x.%01x",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
-
QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
- error_report("vfio: error: device %s is already attached", path);
+ error_report("vfio: error: device %s is already attached",
+ vdev->vbasedev.name);
vfio_put_group(group);
return -EBUSY;
}
}
- ret = vfio_get_device(group, path, &vdev->vbasedev);
+ ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev);
if (ret) {
- error_report("vfio: failed to get device %s", path);
+ error_report("vfio: failed to get device %s", vdev->vbasedev.name);
vfio_put_group(group);
return ret;
}
@@ -2658,6 +2633,7 @@ static void vfio_instance_init(Object *obj)
static Property vfio_pci_dev_properties[] = {
DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
+ DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
intx.mmap_timeout, 1100),
DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index ebc9dcb..6c8b54a 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -560,38 +560,45 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
{
VFIOGroup *group;
VFIODevice *vbasedev_iter;
- char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
+ char *tmp, group_path[PATH_MAX], *group_name;
ssize_t len;
struct stat st;
int groupid;
int ret;
- /* name must be set prior to the call */
- if (!vbasedev->name || strchr(vbasedev->name, '/')) {
- return -EINVAL;
- }
+ /* @sysfsdev takes precedence over @host */
+ if (vbasedev->sysfsdev) {
+ g_free(vbasedev->name);
+ vbasedev->name = g_strdup(basename(vbasedev->sysfsdev));
+ } else {
+ if (!vbasedev->name || strchr(vbasedev->name, '/')) {
+ return -EINVAL;
+ }
- /* Check that the host device exists */
- g_snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
- vbasedev->name);
+ vbasedev->sysfsdev = g_strdup_printf("/sys/bus/platform/devices/%s",
+ vbasedev->name);
+ }
- if (stat(path, &st) < 0) {
- error_report("vfio: error: no such host device: %s", path);
+ if (stat(vbasedev->sysfsdev, &st) < 0) {
+ error_report("vfio: error: no such host device: %s",
+ vbasedev->sysfsdev);
return -errno;
}
- g_strlcat(path, "iommu_group", sizeof(path));
- len = readlink(path, iommu_group_path, sizeof(iommu_group_path));
- if (len < 0 || len >= sizeof(iommu_group_path)) {
+ tmp = g_strdup_printf("%s/iommu_group", vbasedev->sysfsdev);
+ len = readlink(tmp, group_path, sizeof(group_path));
+ g_free(tmp);
+
+ if (len < 0 || len >= sizeof(group_path)) {
error_report("vfio: error no iommu_group for device");
return len < 0 ? -errno : -ENAMETOOLONG;
}
- iommu_group_path[len] = 0;
- group_name = basename(iommu_group_path);
+ group_path[len] = 0;
+ group_name = basename(group_path);
if (sscanf(group_name, "%d", &groupid) != 1) {
- error_report("vfio: error reading %s: %m", path);
+ error_report("vfio: error reading %s: %m", group_path);
return -errno;
}
@@ -603,25 +610,24 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
return -ENOENT;
}
- g_snprintf(path, sizeof(path), "%s", vbasedev->name);
-
QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
- error_report("vfio: error: device %s is already attached", path);
+ error_report("vfio: error: device %s is already attached",
+ vbasedev->name);
vfio_put_group(group);
return -EBUSY;
}
}
- ret = vfio_get_device(group, path, vbasedev);
+ ret = vfio_get_device(group, vbasedev->name, vbasedev);
if (ret) {
- error_report("vfio: failed to get device %s", path);
+ error_report("vfio: failed to get device %s", vbasedev->name);
vfio_put_group(group);
return ret;
}
ret = vfio_populate_device(vbasedev);
if (ret) {
- error_report("vfio: failed to populate device %s", path);
+ error_report("vfio: failed to populate device %s", vbasedev->name);
vfio_put_group(group);
}
@@ -681,7 +687,9 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
vbasedev->ops = &vfio_platform_ops;
- trace_vfio_platform_realize(vbasedev->name, vdev->compat);
+ trace_vfio_platform_realize(vbasedev->sysfsdev ?
+ vbasedev->sysfsdev : vbasedev->name,
+ vdev->compat);
ret = vfio_base_device_init(vbasedev);
if (ret) {
@@ -703,6 +711,7 @@ static const VMStateDescription vfio_platform_vmstate = {
static Property vfio_platform_dev_properties[] = {
DEFINE_PROP_STRING("host", VFIOPlatformDevice, vbasedev.name),
+ DEFINE_PROP_STRING("sysfsdev", VFIOPlatformDevice, vbasedev.sysfsdev),
DEFINE_PROP_BOOL("x-no-mmap", VFIOPlatformDevice, vbasedev.no_mmap, false),
DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
mmap_timeout, 1100),
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f037f3c..7e00ffc 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -89,6 +89,7 @@ typedef struct VFIODeviceOps VFIODeviceOps;
typedef struct VFIODevice {
QLIST_ENTRY(VFIODevice) next;
struct VFIOGroup *group;
+ char *sysfsdev;
char *name;
int fd;
int type;
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 2/9] vfio: Wrap VFIO_DEVICE_GET_REGION_INFO
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
2016-02-16 21:31 ` [Qemu-devel] [RFC PATCH v3 1/9] vfio: Add sysfsdev property for pci & platform Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 3/9] vfio: Generalize region support Alex Williamson
` (8 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
In preparation for supporting capability chains on regions, wrap
ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for
each caller.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/vfio/common.c | 18 +++++++++
hw/vfio/pci.c | 81 +++++++++++++++++++++--------------------
hw/vfio/platform.c | 13 ++++---
include/hw/vfio/vfio-common.h | 3 ++
4 files changed, 69 insertions(+), 46 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 607ec70..e20fc4f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -959,6 +959,24 @@ void vfio_put_base_device(VFIODevice *vbasedev)
close(vbasedev->fd);
}
+int vfio_get_region_info(VFIODevice *vbasedev, int index,
+ struct vfio_region_info **info)
+{
+ size_t argsz = sizeof(struct vfio_region_info);
+
+ *info = g_malloc0(argsz);
+
+ (*info)->index = index;
+ (*info)->argsz = argsz;
+
+ if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
+ g_free(*info);
+ return -errno;
+ }
+
+ return 0;
+}
+
static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
int req, void *param)
{
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5524121..a52947b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -783,25 +783,25 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
{
- struct vfio_region_info reg_info = {
- .argsz = sizeof(reg_info),
- .index = VFIO_PCI_ROM_REGION_INDEX
- };
+ struct vfio_region_info *reg_info;
uint64_t size;
off_t off = 0;
ssize_t bytes;
- if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, ®_info)) {
+ if (vfio_get_region_info(&vdev->vbasedev,
+ VFIO_PCI_ROM_REGION_INDEX, ®_info)) {
error_report("vfio: Error getting ROM info: %m");
return;
}
- trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info.size,
- (unsigned long)reg_info.offset,
- (unsigned long)reg_info.flags);
+ trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info->size,
+ (unsigned long)reg_info->offset,
+ (unsigned long)reg_info->flags);
+
+ vdev->rom_size = size = reg_info->size;
+ vdev->rom_offset = reg_info->offset;
- vdev->rom_size = size = reg_info.size;
- vdev->rom_offset = reg_info.offset;
+ g_free(reg_info);
if (!vdev->rom_size) {
vdev->rom_read_failed = true;
@@ -2026,7 +2026,7 @@ static VFIODeviceOps vfio_pci_ops = {
static int vfio_populate_device(VFIOPCIDevice *vdev)
{
VFIODevice *vbasedev = &vdev->vbasedev;
- struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+ struct vfio_region_info *reg_info;
struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
int i, ret = -1;
@@ -2048,72 +2048,73 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
}
for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
- reg_info.index = i;
-
- ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, ®_info);
+ ret = vfio_get_region_info(vbasedev, i, ®_info);
if (ret) {
error_report("vfio: Error getting region %d info: %m", i);
goto error;
}
trace_vfio_populate_device_region(vbasedev->name, i,
- (unsigned long)reg_info.size,
- (unsigned long)reg_info.offset,
- (unsigned long)reg_info.flags);
+ (unsigned long)reg_info->size,
+ (unsigned long)reg_info->offset,
+ (unsigned long)reg_info->flags);
vdev->bars[i].region.vbasedev = vbasedev;
- vdev->bars[i].region.flags = reg_info.flags;
- vdev->bars[i].region.size = reg_info.size;
- vdev->bars[i].region.fd_offset = reg_info.offset;
+ vdev->bars[i].region.flags = reg_info->flags;
+ vdev->bars[i].region.size = reg_info->size;
+ vdev->bars[i].region.fd_offset = reg_info->offset;
vdev->bars[i].region.nr = i;
QLIST_INIT(&vdev->bars[i].quirks);
- }
- reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
+ g_free(reg_info);
+ }
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, ®_info);
+ ret = vfio_get_region_info(vbasedev,
+ VFIO_PCI_CONFIG_REGION_INDEX, ®_info);
if (ret) {
error_report("vfio: Error getting config info: %m");
goto error;
}
trace_vfio_populate_device_config(vdev->vbasedev.name,
- (unsigned long)reg_info.size,
- (unsigned long)reg_info.offset,
- (unsigned long)reg_info.flags);
+ (unsigned long)reg_info->size,
+ (unsigned long)reg_info->offset,
+ (unsigned long)reg_info->flags);
- vdev->config_size = reg_info.size;
+ vdev->config_size = reg_info->size;
if (vdev->config_size == PCI_CONFIG_SPACE_SIZE) {
vdev->pdev.cap_present &= ~QEMU_PCI_CAP_EXPRESS;
}
- vdev->config_offset = reg_info.offset;
+ vdev->config_offset = reg_info->offset;
+
+ g_free(reg_info);
if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
- struct vfio_region_info vga_info = {
- .argsz = sizeof(vga_info),
- .index = VFIO_PCI_VGA_REGION_INDEX,
- };
-
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
+ ret = vfio_get_region_info(vbasedev,
+ VFIO_PCI_VGA_REGION_INDEX, ®_info);
if (ret) {
error_report(
"vfio: Device does not support requested feature x-vga");
goto error;
}
- if (!(vga_info.flags & VFIO_REGION_INFO_FLAG_READ) ||
- !(vga_info.flags & VFIO_REGION_INFO_FLAG_WRITE) ||
- vga_info.size < 0xbffff + 1) {
+ if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) ||
+ !(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) ||
+ reg_info->size < 0xbffff + 1) {
error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx",
- (unsigned long)vga_info.flags,
- (unsigned long)vga_info.size);
+ (unsigned long)reg_info->flags,
+ (unsigned long)reg_info->size);
+ g_free(reg_info);
+ ret = -1;
goto error;
}
- vdev->vga.fd_offset = vga_info.offset;
+ vdev->vga.fd_offset = reg_info->offset;
vdev->vga.fd = vdev->vbasedev.fd;
+ g_free(reg_info);
+
vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
QLIST_INIT(&vdev->vga.region[QEMU_PCI_VGA_MEM].quirks);
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 6c8b54a..f9b9c20 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -476,23 +476,24 @@ static int vfio_populate_device(VFIODevice *vbasedev)
vdev->regions = g_new0(VFIORegion *, vbasedev->num_regions);
for (i = 0; i < vbasedev->num_regions; i++) {
- struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+ struct vfio_region_info *reg_info;
VFIORegion *ptr;
vdev->regions[i] = g_new0(VFIORegion, 1);
ptr = vdev->regions[i];
- reg_info.index = i;
- ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, ®_info);
+ ret = vfio_get_region_info(vbasedev, i, ®_info);
if (ret) {
error_report("vfio: Error getting region %d info: %m", i);
goto reg_error;
}
- ptr->flags = reg_info.flags;
- ptr->size = reg_info.size;
- ptr->fd_offset = reg_info.offset;
+ ptr->flags = reg_info->flags;
+ ptr->size = reg_info->size;
+ ptr->fd_offset = reg_info->offset;
ptr->nr = i;
ptr->vbasedev = vbasedev;
+ g_free(reg_info);
+
trace_vfio_platform_populate_regions(ptr->nr,
(unsigned long)ptr->flags,
(unsigned long)ptr->size,
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7e00ffc..371383c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -25,6 +25,7 @@
#include "exec/memory.h"
#include "qemu/queue.h"
#include "qemu/notify.h"
+#include <linux/vfio.h>
/*#define DEBUG_VFIO*/
#ifdef DEBUG_VFIO
@@ -134,6 +135,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as);
void vfio_put_group(VFIOGroup *group);
int vfio_get_device(VFIOGroup *group, const char *name,
VFIODevice *vbasedev);
+int vfio_get_region_info(VFIODevice *vbasedev, int index,
+ struct vfio_region_info **info);
extern const MemoryRegionOps vfio_region_ops;
extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 3/9] vfio: Generalize region support
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
2016-02-16 21:31 ` [Qemu-devel] [RFC PATCH v3 1/9] vfio: Add sysfsdev property for pci & platform Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 2/9] vfio: Wrap VFIO_DEVICE_GET_REGION_INFO Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 4/9] vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions Alex Williamson
` (7 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
Both platform and PCI vfio drivers create a "slow", I/O memory region
with one or more mmap memory regions overlayed when supported by the
device. Generalize this to a set of common helpers in the core that
pulls the region info from vfio, fills the region data, configures
slow mapping, and adds helpers for comleting the mmap, enable/disable,
and teardown. This can be immediately used by the PCI MSI-X code,
which needs to mmap around the MSI-X vector table.
This also changes VFIORegion.mem to be dynamically allocated because
otherwise we don't know how the caller has allocated VFIORegion and
therefore don't know whether to unreference it to destroy the
MemoryRegion or not.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/arm/sysbus-fdt.c | 2
hw/vfio/common.c | 172 ++++++++++++++++++++++++++++++++++-------
hw/vfio/pci-quirks.c | 24 +++---
hw/vfio/pci.c | 168 +++++++++++++++++++++-------------------
hw/vfio/platform.c | 72 +++--------------
include/hw/vfio/vfio-common.h | 23 ++++-
trace-events | 10 ++
7 files changed, 282 insertions(+), 189 deletions(-)
diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 68a3de5..4dfa0e4 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -94,7 +94,7 @@ static int add_calxeda_midway_xgmac_fdt_node(SysBusDevice *sbdev, void *opaque)
mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
reg_attr[2 * i] = cpu_to_be32(mmio_base);
reg_attr[2 * i + 1] = cpu_to_be32(
- memory_region_size(&vdev->regions[i]->mem));
+ memory_region_size(vdev->regions[i]->mem));
}
ret = qemu_fdt_setprop(fdt, nodename, "reg", reg_attr,
vbasedev->num_regions * 2 * sizeof(uint32_t));
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e20fc4f..96ccb79 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -493,46 +493,162 @@ static void vfio_listener_release(VFIOContainer *container)
memory_listener_unregister(&container->listener);
}
-int vfio_mmap_region(Object *obj, VFIORegion *region,
- MemoryRegion *mem, MemoryRegion *submem,
- void **map, size_t size, off_t offset,
- const char *name)
+int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
+ int index, const char *name)
{
- int ret = 0;
- VFIODevice *vbasedev = region->vbasedev;
+ struct vfio_region_info *info;
+ int ret;
+
+ ret = vfio_get_region_info(vbasedev, index, &info);
+ if (ret) {
+ return ret;
+ }
+
+ region->vbasedev = vbasedev;
+ region->flags = info->flags;
+ region->size = info->size;
+ region->fd_offset = info->offset;
+ region->nr = index;
- if (!vbasedev->no_mmap && size && region->flags &
- VFIO_REGION_INFO_FLAG_MMAP) {
- int prot = 0;
+ if (region->size) {
+ region->mem = g_new0(MemoryRegion, 1);
+ memory_region_init_io(region->mem, obj, &vfio_region_ops,
+ region, name, region->size);
- if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
- prot |= PROT_READ;
+ if (!vbasedev->no_mmap &&
+ region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
+ !(region->size & ~qemu_real_host_page_mask)) {
+
+ region->nr_mmaps = 1;
+ region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+
+ region->mmaps[0].offset = 0;
+ region->mmaps[0].size = region->size;
}
+ }
+
+ g_free(info);
+
+ trace_vfio_region_setup(vbasedev->name, index, name,
+ region->flags, region->fd_offset, region->size);
+ return 0;
+}
- if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
- prot |= PROT_WRITE;
+int vfio_region_mmap(VFIORegion *region)
+{
+ int i, prot = 0;
+ char *name;
+
+ if (!region->mem) {
+ return 0;
+ }
+
+ prot |= region->flags & VFIO_REGION_INFO_FLAG_READ ? PROT_READ : 0;
+ prot |= region->flags & VFIO_REGION_INFO_FLAG_WRITE ? PROT_WRITE : 0;
+
+ for (i = 0; i < region->nr_mmaps; i++) {
+ region->mmaps[i].mmap = mmap(NULL, region->mmaps[i].size, prot,
+ MAP_SHARED, region->vbasedev->fd,
+ region->fd_offset +
+ region->mmaps[i].offset);
+ if (region->mmaps[i].mmap == MAP_FAILED) {
+ int ret = -errno;
+
+ trace_vfio_region_mmap_fault(memory_region_name(region->mem), i,
+ region->fd_offset +
+ region->mmaps[i].offset,
+ region->fd_offset +
+ region->mmaps[i].offset +
+ region->mmaps[i].size - 1, ret);
+
+ region->mmaps[i].mmap = NULL;
+
+ for (i--; i >= 0; i--) {
+ memory_region_del_subregion(region->mem, ®ion->mmaps[i].mem);
+ munmap(region->mmaps[i].mmap, region->mmaps[i].size);
+ object_unparent(OBJECT(®ion->mmaps[i].mem));
+ region->mmaps[i].mmap = NULL;
+ }
+
+ return ret;
}
- *map = mmap(NULL, size, prot, MAP_SHARED,
- vbasedev->fd,
- region->fd_offset + offset);
- if (*map == MAP_FAILED) {
- *map = NULL;
- ret = -errno;
- goto empty_region;
+ name = g_strdup_printf("%s mmaps[%d]",
+ memory_region_name(region->mem), i);
+ memory_region_init_ram_ptr(®ion->mmaps[i].mem,
+ memory_region_owner(region->mem),
+ name, region->mmaps[i].size,
+ region->mmaps[i].mmap);
+ g_free(name);
+ memory_region_set_skip_dump(®ion->mmaps[i].mem);
+ memory_region_add_subregion(region->mem, region->mmaps[i].offset,
+ ®ion->mmaps[i].mem);
+
+ trace_vfio_region_mmap(memory_region_name(®ion->mmaps[i].mem),
+ region->mmaps[i].offset,
+ region->mmaps[i].offset +
+ region->mmaps[i].size - 1);
+ }
+
+ return 0;
+}
+
+void vfio_region_exit(VFIORegion *region)
+{
+ int i;
+
+ if (!region->mem) {
+ return;
+ }
+
+ for (i = 0; i < region->nr_mmaps; i++) {
+ if (region->mmaps[i].mmap) {
+ memory_region_del_subregion(region->mem, ®ion->mmaps[i].mem);
}
+ }
- memory_region_init_ram_ptr(submem, obj, name, size, *map);
- memory_region_set_skip_dump(submem);
- } else {
-empty_region:
- /* Create a zero sized sub-region to make cleanup easy. */
- memory_region_init(submem, obj, name, 0);
+ trace_vfio_region_exit(region->vbasedev->name, region->nr);
+}
+
+void vfio_region_finalize(VFIORegion *region)
+{
+ int i;
+
+ if (!region->mem) {
+ return;
}
- memory_region_add_subregion(mem, offset, submem);
+ for (i = 0; i < region->nr_mmaps; i++) {
+ if (region->mmaps[i].mmap) {
+ munmap(region->mmaps[i].mmap, region->mmaps[i].size);
+ object_unparent(OBJECT(®ion->mmaps[i].mem));
+ }
+ }
- return ret;
+ object_unparent(OBJECT(region->mem));
+
+ g_free(region->mem);
+ g_free(region->mmaps);
+
+ trace_vfio_region_finalize(region->vbasedev->name, region->nr);
+}
+
+void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled)
+{
+ int i;
+
+ if (!region->mem) {
+ return;
+ }
+
+ for (i = 0; i < region->nr_mmaps; i++) {
+ if (region->mmaps[i].mmap) {
+ memory_region_set_enabled(®ion->mmaps[i].mem, enabled);
+ }
+ }
+
+ trace_vfio_region_mmaps_set_enabled(memory_region_name(region->mem),
+ enabled);
}
void vfio_reset_handler(void *opaque)
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 4815527..d626ec9 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -337,14 +337,14 @@ static void vfio_probe_ati_bar4_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(window->addr_mem, OBJECT(vdev),
&vfio_generic_window_address_quirk, window,
"vfio-ati-bar4-window-address-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
window->address_offset,
window->addr_mem, 1);
memory_region_init_io(window->data_mem, OBJECT(vdev),
&vfio_generic_window_data_quirk, window,
"vfio-ati-bar4-window-data-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
window->data_offset,
window->data_mem, 1);
@@ -378,7 +378,7 @@ static void vfio_probe_ati_bar2_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(mirror->mem, OBJECT(vdev),
&vfio_generic_mirror_quirk, mirror,
"vfio-ati-bar2-4000-quirk", PCI_CONFIG_SPACE_SIZE);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
mirror->offset, mirror->mem, 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -683,7 +683,7 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(window->addr_mem, OBJECT(vdev),
&vfio_generic_window_address_quirk, window,
"vfio-nvidia-bar5-window-address-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
window->address_offset,
window->addr_mem, 1);
memory_region_set_enabled(window->addr_mem, false);
@@ -691,7 +691,7 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(window->data_mem, OBJECT(vdev),
&vfio_generic_window_data_quirk, window,
"vfio-nvidia-bar5-window-data-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
window->data_offset,
window->data_mem, 1);
memory_region_set_enabled(window->data_mem, false);
@@ -699,13 +699,13 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem[2], OBJECT(vdev),
&vfio_nvidia_bar5_quirk_master, bar5,
"vfio-nvidia-bar5-master-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
0, &quirk->mem[2], 1);
memory_region_init_io(&quirk->mem[3], OBJECT(vdev),
&vfio_nvidia_bar5_quirk_enable, bar5,
"vfio-nvidia-bar5-enable-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
4, &quirk->mem[3], 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -767,7 +767,7 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
&vfio_nvidia_mirror_quirk, mirror,
"vfio-nvidia-bar0-88000-mirror-quirk",
vdev->config_size);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
mirror->offset, mirror->mem, 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -786,7 +786,7 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
&vfio_nvidia_mirror_quirk, mirror,
"vfio-nvidia-bar0-1800-mirror-quirk",
PCI_CONFIG_SPACE_SIZE);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
mirror->offset, mirror->mem, 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -947,13 +947,13 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem[0], OBJECT(vdev),
&vfio_rtl_address_quirk, rtl,
"vfio-rtl8168-window-address-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
0x74, &quirk->mem[0], 1);
memory_region_init_io(&quirk->mem[1], OBJECT(vdev),
&vfio_rtl_data_quirk, rtl,
"vfio-rtl8168-window-data-quirk", 4);
- memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
0x70, &quirk->mem[1], 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1020,7 +1020,7 @@ void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
QLIST_FOREACH(quirk, &bar->quirks, next) {
for (i = 0; i < quirk->nr_mem; i++) {
- memory_region_del_subregion(&bar->region.mem, &quirk->mem[i]);
+ memory_region_del_subregion(bar->region.mem, &quirk->mem[i]);
}
}
}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a52947b..a1a42a2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1166,6 +1166,74 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
return 0;
}
+static void vfio_pci_fixup_msix_region(VFIOPCIDevice *vdev)
+{
+ off_t start, end;
+ VFIORegion *region = &vdev->bars[vdev->msix->table_bar].region;
+
+ /*
+ * We expect to find a single mmap covering the whole BAR, anything else
+ * means it's either unsupported or already setup.
+ */
+ if (region->nr_mmaps != 1 || region->mmaps[0].offset ||
+ region->size != region->mmaps[0].size) {
+ return;
+ }
+
+ /* MSI-X table start and end aligned to host page size */
+ start = vdev->msix->table_offset & qemu_real_host_page_mask;
+ end = REAL_HOST_PAGE_ALIGN((uint64_t)vdev->msix->table_offset +
+ (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
+
+ /*
+ * Does the MSI-X table cover the beginning of the BAR? The whole BAR?
+ * NB - Host page size is necessarily a power of two and so is the PCI
+ * BAR (not counting EA yet), therefore if we have host page aligned
+ * @start and @end, then any remainder of the BAR before or after those
+ * must be at least host page sized and therefore mmap'able.
+ */
+ if (!start) {
+ if (end >= region->size) {
+ region->nr_mmaps = 0;
+ g_free(region->mmaps);
+ region->mmaps = NULL;
+ trace_vfio_msix_fixup(vdev->vbasedev.name,
+ vdev->msix->table_bar, 0, 0);
+ } else {
+ region->mmaps[0].offset = end;
+ region->mmaps[0].size = region->size - end;
+ trace_vfio_msix_fixup(vdev->vbasedev.name,
+ vdev->msix->table_bar, region->mmaps[0].offset,
+ region->mmaps[0].offset + region->mmaps[0].size);
+ }
+
+ /* Maybe it's aligned at the end of the BAR */
+ } else if (end >= region->size) {
+ region->mmaps[0].size = start;
+ trace_vfio_msix_fixup(vdev->vbasedev.name,
+ vdev->msix->table_bar, region->mmaps[0].offset,
+ region->mmaps[0].offset + region->mmaps[0].size);
+
+ /* Otherwise it must split the BAR */
+ } else {
+ region->nr_mmaps = 2;
+ region->mmaps = g_renew(VFIOMmap, region->mmaps, 2);
+
+ memcpy(®ion->mmaps[1], ®ion->mmaps[0], sizeof(VFIOMmap));
+
+ region->mmaps[0].size = start;
+ trace_vfio_msix_fixup(vdev->vbasedev.name,
+ vdev->msix->table_bar, region->mmaps[0].offset,
+ region->mmaps[0].offset + region->mmaps[0].size);
+
+ region->mmaps[1].offset = end;
+ region->mmaps[1].size = region->size - end;
+ trace_vfio_msix_fixup(vdev->vbasedev.name,
+ vdev->msix->table_bar, region->mmaps[1].offset,
+ region->mmaps[1].offset + region->mmaps[1].size);
+ }
+}
+
/*
* We don't have any control over how pci_add_capability() inserts
* capabilities into the chain. In order to setup MSI-X we need a
@@ -1240,6 +1308,8 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev)
msix->table_offset, msix->entries);
vdev->msix = msix;
+ vfio_pci_fixup_msix_region(vdev);
+
return 0;
}
@@ -1250,9 +1320,9 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
sizeof(unsigned long));
ret = msix_init(&vdev->pdev, vdev->msix->entries,
- &vdev->bars[vdev->msix->table_bar].region.mem,
+ vdev->bars[vdev->msix->table_bar].region.mem,
vdev->msix->table_bar, vdev->msix->table_offset,
- &vdev->bars[vdev->msix->pba_bar].region.mem,
+ vdev->bars[vdev->msix->pba_bar].region.mem,
vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
if (ret < 0) {
if (ret == -ENOTSUP) {
@@ -1289,8 +1359,8 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
if (vdev->msix) {
msix_uninit(&vdev->pdev,
- &vdev->bars[vdev->msix->table_bar].region.mem,
- &vdev->bars[vdev->msix->pba_bar].region.mem);
+ vdev->bars[vdev->msix->table_bar].region.mem,
+ vdev->bars[vdev->msix->pba_bar].region.mem);
g_free(vdev->msix->pending);
}
}
@@ -1303,16 +1373,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
int i;
for (i = 0; i < PCI_ROM_SLOT; i++) {
- VFIOBAR *bar = &vdev->bars[i];
-
- if (!bar->region.size) {
- continue;
- }
-
- memory_region_set_enabled(&bar->region.mmap_mem, enabled);
- if (vdev->msix && vdev->msix->table_bar == i) {
- memory_region_set_enabled(&vdev->msix->mmap_mem, enabled);
- }
+ vfio_region_mmaps_set_enabled(&vdev->bars[i].region, enabled);
}
}
@@ -1326,11 +1387,7 @@ static void vfio_unregister_bar(VFIOPCIDevice *vdev, int nr)
vfio_bar_quirk_teardown(vdev, nr);
- memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
-
- if (vdev->msix && vdev->msix->table_bar == nr) {
- memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
- }
+ vfio_region_exit(&bar->region);
}
static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
@@ -1343,18 +1400,13 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
vfio_bar_quirk_free(vdev, nr);
- munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
-
- if (vdev->msix && vdev->msix->table_bar == nr) {
- munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
- }
+ vfio_region_finalize(&bar->region);
}
static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
uint64_t size = bar->region.size;
- char name[64];
uint32_t pci_bar;
uint8_t type;
int ret;
@@ -1364,8 +1416,6 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
return;
}
- snprintf(name, sizeof(name), "VFIO %s BAR %d", vdev->vbasedev.name, nr);
-
/* Determine what type of BAR this is for registration */
ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
@@ -1380,41 +1430,11 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
type = pci_bar & (bar->ioport ? ~PCI_BASE_ADDRESS_IO_MASK :
~PCI_BASE_ADDRESS_MEM_MASK);
- /* A "slow" read/write mapping underlies all BARs */
- memory_region_init_io(&bar->region.mem, OBJECT(vdev), &vfio_region_ops,
- bar, name, size);
- pci_register_bar(&vdev->pdev, nr, type, &bar->region.mem);
-
- /*
- * We can't mmap areas overlapping the MSIX vector table, so we
- * potentially insert a direct-mapped subregion before and after it.
- */
- if (vdev->msix && vdev->msix->table_bar == nr) {
- size = vdev->msix->table_offset & qemu_real_host_page_mask;
- }
-
- strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
- if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
- &bar->region.mmap_mem, &bar->region.mmap,
- size, 0, name)) {
- error_report("%s unsupported. Performance may be slow", name);
- }
-
- if (vdev->msix && vdev->msix->table_bar == nr) {
- uint64_t start;
-
- start = REAL_HOST_PAGE_ALIGN((uint64_t)vdev->msix->table_offset +
- (vdev->msix->entries *
- PCI_MSIX_ENTRY_SIZE));
+ pci_register_bar(&vdev->pdev, nr, type, bar->region.mem);
- size = start < bar->region.size ? bar->region.size - start : 0;
- strncat(name, " msix-hi", sizeof(name) - strlen(name) - 1);
- /* VFIOMSIXInfo contains another MemoryRegion for this mapping */
- if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
- &vdev->msix->mmap_mem,
- &vdev->msix->mmap, size, start, name)) {
- error_report("%s unsupported. Performance may be slow", name);
- }
+ if (vfio_region_mmap(&bar->region)) {
+ error_report("Failed to mmap %s BAR %d. Performance may be slow",
+ vdev->vbasedev.name, nr);
}
vfio_bar_quirk_setup(vdev, nr);
@@ -2048,25 +2068,18 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
}
for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
- ret = vfio_get_region_info(vbasedev, i, ®_info);
+ char *name = g_strdup_printf("%s BAR %d", vbasedev->name, i);
+
+ ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+ &vdev->bars[i].region, i, name);
+ g_free(name);
+
if (ret) {
error_report("vfio: Error getting region %d info: %m", i);
goto error;
}
- trace_vfio_populate_device_region(vbasedev->name, i,
- (unsigned long)reg_info->size,
- (unsigned long)reg_info->offset,
- (unsigned long)reg_info->flags);
-
- vdev->bars[i].region.vbasedev = vbasedev;
- vdev->bars[i].region.flags = reg_info->flags;
- vdev->bars[i].region.size = reg_info->size;
- vdev->bars[i].region.fd_offset = reg_info->offset;
- vdev->bars[i].region.nr = i;
QLIST_INIT(&vdev->bars[i].quirks);
-
- g_free(reg_info);
}
ret = vfio_get_region_info(vbasedev,
@@ -2152,11 +2165,8 @@ error:
static void vfio_put_device(VFIOPCIDevice *vdev)
{
g_free(vdev->vbasedev.name);
- if (vdev->msix) {
- object_unparent(OBJECT(&vdev->msix->mmap_mem));
- g_free(vdev->msix);
- vdev->msix = NULL;
- }
+ g_free(vdev->msix);
+
vfio_put_base_device(&vdev->vbasedev);
}
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index f9b9c20..a2ab75d 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -143,12 +143,8 @@ static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
{
int i;
- trace_vfio_platform_mmap_set_enabled(enabled);
-
for (i = 0; i < vdev->vbasedev.num_regions; i++) {
- VFIORegion *region = vdev->regions[i];
-
- memory_region_set_enabled(®ion->mmap_mem, enabled);
+ vfio_region_mmaps_set_enabled(vdev->regions[i], enabled);
}
}
@@ -476,29 +472,16 @@ static int vfio_populate_device(VFIODevice *vbasedev)
vdev->regions = g_new0(VFIORegion *, vbasedev->num_regions);
for (i = 0; i < vbasedev->num_regions; i++) {
- struct vfio_region_info *reg_info;
- VFIORegion *ptr;
+ char *name = g_strdup_printf("VFIO %s region %d\n", vbasedev->name, i);
vdev->regions[i] = g_new0(VFIORegion, 1);
- ptr = vdev->regions[i];
- ret = vfio_get_region_info(vbasedev, i, ®_info);
+ ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+ vdev->regions[i], i, name);
+ g_free(name);
if (ret) {
error_report("vfio: Error getting region %d info: %m", i);
goto reg_error;
}
- ptr->flags = reg_info->flags;
- ptr->size = reg_info->size;
- ptr->fd_offset = reg_info->offset;
- ptr->nr = i;
- ptr->vbasedev = vbasedev;
-
- g_free(reg_info);
-
- trace_vfio_platform_populate_regions(ptr->nr,
- (unsigned long)ptr->flags,
- (unsigned long)ptr->size,
- ptr->vbasedev->fd,
- (unsigned long)ptr->fd_offset);
}
vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
@@ -535,6 +518,9 @@ irq_err:
}
reg_error:
for (i = 0; i < vbasedev->num_regions; i++) {
+ if (vdev->regions[i]) {
+ vfio_region_finalize(vdev->regions[i]);
+ }
g_free(vdev->regions[i]);
}
g_free(vdev->regions);
@@ -636,41 +622,6 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
}
/**
- * vfio_map_region - initialize the 2 memory regions for a given
- * MMIO region index
- * @vdev: the VFIO platform device handle
- * @nr: the index of the region
- *
- * Init the top memory region and the mmapped memory region beneath
- * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
- * and could not be passed to memory region functions
-*/
-static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
-{
- VFIORegion *region = vdev->regions[nr];
- uint64_t size = region->size;
- char name[64];
-
- if (!size) {
- return;
- }
-
- g_snprintf(name, sizeof(name), "VFIO %s region %d",
- vdev->vbasedev.name, nr);
-
- /* A "slow" read/write mapping underlies all regions */
- memory_region_init_io(®ion->mem, OBJECT(vdev), &vfio_region_ops,
- region, name, size);
-
- g_strlcat(name, " mmap", sizeof(name));
-
- if (vfio_mmap_region(OBJECT(vdev), region, ®ion->mem,
- ®ion->mmap_mem, ®ion->mmap, size, 0, name)) {
- error_report("%s unsupported. Performance may be slow", name);
- }
-}
-
-/**
* vfio_platform_realize - the device realize function
* @dev: device state pointer
* @errp: error
@@ -700,8 +651,11 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
}
for (i = 0; i < vbasedev->num_regions; i++) {
- vfio_map_region(vdev, i);
- sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
+ if (vfio_region_mmap(vdev->regions[i])) {
+ error_report("%s mmap unsupported. Performance may be slow",
+ memory_region_name(vdev->regions[i]->mem));
+ }
+ sysbus_init_mmio(sbdev, vdev->regions[i]->mem);
}
}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 371383c..594905a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -41,14 +41,21 @@ enum {
VFIO_DEVICE_TYPE_PLATFORM = 1,
};
+typedef struct VFIOMmap {
+ MemoryRegion mem;
+ void *mmap;
+ off_t offset;
+ size_t size;
+} VFIOMmap;
+
typedef struct VFIORegion {
struct VFIODevice *vbasedev;
off_t fd_offset; /* offset of region within device fd */
- MemoryRegion mem; /* slow, read/write access */
- MemoryRegion mmap_mem; /* direct mapped access */
- void *mmap;
+ MemoryRegion *mem; /* slow, read/write access */
size_t size;
uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
+ uint32_t nr_mmaps;
+ VFIOMmap *mmaps;
uint8_t nr; /* cache the region number for debug */
} VFIORegion;
@@ -126,10 +133,12 @@ void vfio_region_write(void *opaque, hwaddr addr,
uint64_t data, unsigned size);
uint64_t vfio_region_read(void *opaque,
hwaddr addr, unsigned size);
-int vfio_mmap_region(Object *vdev, VFIORegion *region,
- MemoryRegion *mem, MemoryRegion *submem,
- void **map, size_t size, off_t offset,
- const char *name);
+int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
+ int index, const char *name);
+int vfio_region_mmap(VFIORegion *region);
+void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled);
+void vfio_region_exit(VFIORegion *region);
+void vfio_region_finalize(VFIORegion *region);
void vfio_reset_handler(void *opaque);
VFIOGroup *vfio_get_group(int groupid, AddressSpace *as);
void vfio_put_group(VFIOGroup *group);
diff --git a/trace-events b/trace-events
index f986c81..21cd28d 100644
--- a/trace-events
+++ b/trace-events
@@ -1656,6 +1656,7 @@ vfio_msix_enable(const char *name) " (%s)"
vfio_msix_pba_disable(const char *name) " (%s)"
vfio_msix_pba_enable(const char *name) " (%s)"
vfio_msix_disable(const char *name) " (%s)"
+vfio_msix_fixup(const char *name, int bar, off_t offset, size_t size) " (%s) MSI-X region %d mmap fixup [0x%lx - 0x%lx]"
vfio_msi_enable(const char *name, int nr_vectors) " (%s) Enabled %d MSI vectors"
vfio_msi_disable(const char *name) " (%s)"
vfio_pci_load_rom(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s ROM:\n size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
@@ -1674,7 +1675,6 @@ vfio_pci_hot_reset(const char *name, const char *type) " (%s) %s"
vfio_pci_hot_reset_has_dep_devices(const char *name) "%s: hot reset dependent devices:"
vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
-vfio_populate_device_region(const char *region_name, int index, unsigned long size, unsigned long offset, unsigned long flags) "Device %s region %d:\n size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
vfio_initfn(const char *name, int group_id) " (%s) group %d"
@@ -1730,13 +1730,17 @@ vfio_disconnect_container(int fd) "close container->fd=%d"
vfio_put_group(int fd) "close group->fd=%d"
vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
vfio_put_base_device(int fd) "close vdev->fd=%d"
+vfio_region_setup(const char *dev, int index, const char *name, unsigned long flags, unsigned long offset, unsigned long size) "Device %s, region %d \"%s\", flags: %lx, offset: %lx, size: %lx"
+vfio_region_mmap_fault(const char *name, int index, unsigned long offset, unsigned long size, int fault) "Region %s mmaps[%d], [%lx - %lx], fault: %d"
+vfio_region_mmap(const char *name, unsigned long offset, unsigned long end) "Region %s [%lx - %lx]"
+vfio_region_exit(const char *name, int index) "Device %s, region %d"
+vfio_region_finalize(const char *name, int index) "Device %s, region %d"
+vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d"
# hw/vfio/platform.c
-vfio_platform_populate_regions(int region_index, unsigned long flag, unsigned long size, int fd, unsigned long offset) "- region %d flags = 0x%lx, size = 0x%lx, fd= %d, offset = 0x%lx"
vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
vfio_platform_eoi(int pin, int fd) "EOI IRQ pin %d (fd=%d)"
-vfio_platform_mmap_set_enabled(bool enabled) "fast path = %d"
vfio_platform_intp_mmap_enable(int pin) "IRQ #%d still active, stay in slow path"
vfio_platform_intp_interrupt(int pin, int fd) "Inject IRQ #%d (fd = %d)"
vfio_platform_intp_inject_pending_lockheld(int pin, int fd) "Inject pending IRQ #%d (fd = %d)"
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 4/9] vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (2 preceding siblings ...)
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 3/9] vfio: Generalize region support Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 5/9] linux-headers/vfio: Update for proposed capabilities list Alex Williamson
` (6 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
Match common vfio code with setup, exit, and finalize functions for
BAR, quirk, and VGA management. VGA is also changed to dynamic
allocation to match the other MemoryRegions.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/vfio/pci-quirks.c | 38 ++++++++---------
hw/vfio/pci.c | 114 +++++++++++++++++++++-----------------------------
hw/vfio/pci.h | 10 ++--
3 files changed, 71 insertions(+), 91 deletions(-)
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index d626ec9..49ecf11 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -290,10 +290,10 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
memory_region_init_io(quirk->mem, OBJECT(vdev), &vfio_ati_3c3_quirk, vdev,
"vfio-ati-3c3-quirk", 1);
- memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
+ memory_region_add_subregion(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
3 /* offset 3 bytes from 0x3c0 */, quirk->mem);
- QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
+ QLIST_INSERT_HEAD(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks,
quirk, next);
trace_vfio_quirk_ati_3c3_probe(vdev->vbasedev.name);
@@ -428,7 +428,7 @@ static uint64_t vfio_nvidia_3d4_quirk_read(void *opaque,
quirk->state = NONE;
- return vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+ return vfio_vga_read(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
addr + 0x14, size);
}
@@ -465,7 +465,7 @@ static void vfio_nvidia_3d4_quirk_write(void *opaque, hwaddr addr,
break;
}
- vfio_vga_write(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+ vfio_vga_write(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
addr + 0x14, data, size);
}
@@ -481,7 +481,7 @@ static uint64_t vfio_nvidia_3d0_quirk_read(void *opaque,
VFIONvidia3d0Quirk *quirk = opaque;
VFIOPCIDevice *vdev = quirk->vdev;
VFIONvidia3d0State old_state = quirk->state;
- uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+ uint64_t data = vfio_vga_read(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
addr + 0x10, size);
quirk->state = NONE;
@@ -523,7 +523,7 @@ static void vfio_nvidia_3d0_quirk_write(void *opaque, hwaddr addr,
}
}
- vfio_vga_write(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+ vfio_vga_write(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
addr + 0x10, data, size);
}
@@ -551,15 +551,15 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_nvidia_3d4_quirk,
data, "vfio-nvidia-3d4-quirk", 2);
- memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
+ memory_region_add_subregion(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
0x14 /* 0x3c0 + 0x14 */, &quirk->mem[0]);
memory_region_init_io(&quirk->mem[1], OBJECT(vdev), &vfio_nvidia_3d0_quirk,
data, "vfio-nvidia-3d0-quirk", 2);
- memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
+ memory_region_add_subregion(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
0x10 /* 0x3c0 + 0x10 */, &quirk->mem[1]);
- QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
+ QLIST_INSERT_HEAD(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks,
quirk, next);
trace_vfio_quirk_nvidia_3d0_probe(vdev->vbasedev.name);
@@ -970,28 +970,28 @@ void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
vfio_vga_probe_nvidia_3d0_quirk(vdev);
}
-void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
+void vfio_vga_quirk_exit(VFIOPCIDevice *vdev)
{
VFIOQuirk *quirk;
int i, j;
- for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
- QLIST_FOREACH(quirk, &vdev->vga.region[i].quirks, next) {
+ for (i = 0; i < ARRAY_SIZE(vdev->vga->region); i++) {
+ QLIST_FOREACH(quirk, &vdev->vga->region[i].quirks, next) {
for (j = 0; j < quirk->nr_mem; j++) {
- memory_region_del_subregion(&vdev->vga.region[i].mem,
+ memory_region_del_subregion(&vdev->vga->region[i].mem,
&quirk->mem[j]);
}
}
}
}
-void vfio_vga_quirk_free(VFIOPCIDevice *vdev)
+void vfio_vga_quirk_finalize(VFIOPCIDevice *vdev)
{
int i, j;
- for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
- while (!QLIST_EMPTY(&vdev->vga.region[i].quirks)) {
- VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga.region[i].quirks);
+ for (i = 0; i < ARRAY_SIZE(vdev->vga->region); i++) {
+ while (!QLIST_EMPTY(&vdev->vga->region[i].quirks)) {
+ VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga->region[i].quirks);
QLIST_REMOVE(quirk, next);
for (j = 0; j < quirk->nr_mem; j++) {
object_unparent(OBJECT(&quirk->mem[j]));
@@ -1012,7 +1012,7 @@ void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
vfio_probe_rtl8168_bar2_quirk(vdev, nr);
}
-void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
+void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
VFIOQuirk *quirk;
@@ -1025,7 +1025,7 @@ void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
}
}
-void vfio_bar_quirk_free(VFIOPCIDevice *vdev, int nr)
+void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
int i;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a1a42a2..8e20781 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1377,42 +1377,16 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
}
}
-static void vfio_unregister_bar(VFIOPCIDevice *vdev, int nr)
+static void vfio_bar_setup(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
- if (!bar->region.size) {
- return;
- }
-
- vfio_bar_quirk_teardown(vdev, nr);
-
- vfio_region_exit(&bar->region);
-}
-
-static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
-{
- VFIOBAR *bar = &vdev->bars[nr];
-
- if (!bar->region.size) {
- return;
- }
-
- vfio_bar_quirk_free(vdev, nr);
-
- vfio_region_finalize(&bar->region);
-}
-
-static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
-{
- VFIOBAR *bar = &vdev->bars[nr];
- uint64_t size = bar->region.size;
uint32_t pci_bar;
uint8_t type;
int ret;
/* Skip both unimplemented BARs and the upper half of 64bit BARS. */
- if (!size) {
+ if (!bar->region.size) {
return;
}
@@ -1430,72 +1404,78 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
type = pci_bar & (bar->ioport ? ~PCI_BASE_ADDRESS_IO_MASK :
~PCI_BASE_ADDRESS_MEM_MASK);
- pci_register_bar(&vdev->pdev, nr, type, bar->region.mem);
-
if (vfio_region_mmap(&bar->region)) {
error_report("Failed to mmap %s BAR %d. Performance may be slow",
vdev->vbasedev.name, nr);
}
vfio_bar_quirk_setup(vdev, nr);
+
+ pci_register_bar(&vdev->pdev, nr, type, bar->region.mem);
}
-static void vfio_map_bars(VFIOPCIDevice *vdev)
+static void vfio_bars_setup(VFIOPCIDevice *vdev)
{
int i;
for (i = 0; i < PCI_ROM_SLOT; i++) {
- vfio_map_bar(vdev, i);
+ vfio_bar_setup(vdev, i);
}
- if (vdev->has_vga) {
- memory_region_init_io(&vdev->vga.region[QEMU_PCI_VGA_MEM].mem,
+ if (vdev->vga) {
+ memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_MEM].mem,
OBJECT(vdev), &vfio_vga_ops,
- &vdev->vga.region[QEMU_PCI_VGA_MEM],
+ &vdev->vga->region[QEMU_PCI_VGA_MEM],
"vfio-vga-mmio@0xa0000",
QEMU_PCI_VGA_MEM_SIZE);
- memory_region_init_io(&vdev->vga.region[QEMU_PCI_VGA_IO_LO].mem,
+ memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem,
OBJECT(vdev), &vfio_vga_ops,
- &vdev->vga.region[QEMU_PCI_VGA_IO_LO],
+ &vdev->vga->region[QEMU_PCI_VGA_IO_LO],
"vfio-vga-io@0x3b0",
QEMU_PCI_VGA_IO_LO_SIZE);
- memory_region_init_io(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
+ memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
OBJECT(vdev), &vfio_vga_ops,
- &vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+ &vdev->vga->region[QEMU_PCI_VGA_IO_HI],
"vfio-vga-io@0x3c0",
QEMU_PCI_VGA_IO_HI_SIZE);
- pci_register_vga(&vdev->pdev, &vdev->vga.region[QEMU_PCI_VGA_MEM].mem,
- &vdev->vga.region[QEMU_PCI_VGA_IO_LO].mem,
- &vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem);
+ pci_register_vga(&vdev->pdev, &vdev->vga->region[QEMU_PCI_VGA_MEM].mem,
+ &vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem,
+ &vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem);
vfio_vga_quirk_setup(vdev);
}
}
-static void vfio_unregister_bars(VFIOPCIDevice *vdev)
+static void vfio_bars_exit(VFIOPCIDevice *vdev)
{
int i;
for (i = 0; i < PCI_ROM_SLOT; i++) {
- vfio_unregister_bar(vdev, i);
+ vfio_bar_quirk_exit(vdev, i);
+ vfio_region_exit(&vdev->bars[i].region);
}
- if (vdev->has_vga) {
- vfio_vga_quirk_teardown(vdev);
+ if (vdev->vga) {
pci_unregister_vga(&vdev->pdev);
+ vfio_vga_quirk_exit(vdev);
}
}
-static void vfio_unmap_bars(VFIOPCIDevice *vdev)
+static void vfio_bars_finalize(VFIOPCIDevice *vdev)
{
int i;
for (i = 0; i < PCI_ROM_SLOT; i++) {
- vfio_unmap_bar(vdev, i);
+ vfio_bar_quirk_finalize(vdev, i);
+ vfio_region_finalize(&vdev->bars[i].region);
}
- if (vdev->has_vga) {
- vfio_vga_quirk_free(vdev);
+ if (vdev->vga) {
+ vfio_vga_quirk_finalize(vdev);
+ for (i = 0; i < ARRAY_SIZE(vdev->vga->region); i++) {
+ object_unparent(OBJECT(&vdev->vga->region[i].mem));
+ }
+ g_free(vdev->vga);
}
}
@@ -2123,24 +2103,24 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
goto error;
}
- vdev->vga.fd_offset = reg_info->offset;
- vdev->vga.fd = vdev->vbasedev.fd;
+ vdev->vga = g_new0(VFIOVGA, 1);
- g_free(reg_info);
+ vdev->vga->fd_offset = reg_info->offset;
+ vdev->vga->fd = vdev->vbasedev.fd;
- vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
- vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
- QLIST_INIT(&vdev->vga.region[QEMU_PCI_VGA_MEM].quirks);
+ g_free(reg_info);
- vdev->vga.region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE;
- vdev->vga.region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO;
- QLIST_INIT(&vdev->vga.region[QEMU_PCI_VGA_IO_LO].quirks);
+ vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
+ vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
+ QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks);
- vdev->vga.region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE;
- vdev->vga.region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI;
- QLIST_INIT(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks);
+ vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE;
+ vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO;
+ QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks);
- vdev->has_vga = true;
+ vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE;
+ vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI;
+ QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks);
}
irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
@@ -2527,7 +2507,7 @@ static int vfio_initfn(PCIDevice *pdev)
return ret;
}
- vfio_map_bars(vdev);
+ vfio_bars_setup(vdev);
ret = vfio_add_capabilities(vdev);
if (ret) {
@@ -2564,7 +2544,7 @@ static int vfio_initfn(PCIDevice *pdev)
out_teardown:
pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
vfio_teardown_msi(vdev);
- vfio_unregister_bars(vdev);
+ vfio_bars_exit(vdev);
return ret;
}
@@ -2574,7 +2554,7 @@ static void vfio_instance_finalize(Object *obj)
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pci_dev);
VFIOGroup *group = vdev->vbasedev.group;
- vfio_unmap_bars(vdev);
+ vfio_bars_finalize(vdev);
g_free(vdev->emulated_config_bits);
g_free(vdev->rom);
vfio_put_device(vdev);
@@ -2593,7 +2573,7 @@ static void vfio_exitfn(PCIDevice *pdev)
timer_free(vdev->intx.mmap_timer);
}
vfio_teardown_msi(vdev);
- vfio_unregister_bars(vdev);
+ vfio_bars_exit(vdev);
}
static void vfio_pci_reset(DeviceState *dev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 6256587..b8a7189 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -114,7 +114,7 @@ typedef struct VFIOPCIDevice {
int nr_vectors; /* Number of MSI/MSIX vectors currently in use */
int interrupt; /* Current interrupt type */
VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
- VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
+ VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
PCIHostDeviceAddress host;
EventNotifier err_notifier;
EventNotifier req_notifier;
@@ -150,11 +150,11 @@ void vfio_vga_write(void *opaque, hwaddr addr, uint64_t data, unsigned size);
bool vfio_blacklist_opt_rom(VFIOPCIDevice *vdev);
void vfio_vga_quirk_setup(VFIOPCIDevice *vdev);
-void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev);
-void vfio_vga_quirk_free(VFIOPCIDevice *vdev);
+void vfio_vga_quirk_exit(VFIOPCIDevice *vdev);
+void vfio_vga_quirk_finalize(VFIOPCIDevice *vdev);
void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr);
-void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr);
-void vfio_bar_quirk_free(VFIOPCIDevice *vdev, int nr);
+void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr);
+void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr);
void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
#endif /* HW_VFIO_VFIO_PCI_H */
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 5/9] linux-headers/vfio: Update for proposed capabilities list
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (3 preceding siblings ...)
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 4/9] vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 6/9] vfio: Enable sparse mmap capability Alex Williamson
` (5 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
| 101 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 99 insertions(+), 2 deletions(-)
--git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index aa276bc..759b850 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -39,6 +39,13 @@
#define VFIO_SPAPR_TCE_v2_IOMMU 7
/*
+ * The No-IOMMU IOMMU offers no translation or isolation for devices and
+ * supports no ioctls outside of VFIO_CHECK_EXTENSION. Use of VFIO's No-IOMMU
+ * code will taint the host kernel and should be used with extreme caution.
+ */
+#define VFIO_NOIOMMU_IOMMU 8
+
+/*
* The IOCTL interface is designed for extensibility by embedding the
* structure length (argsz) and flags into structures passed between
* kernel and userspace. We therefore use the _IO() macro for these
@@ -52,6 +59,33 @@
#define VFIO_TYPE (';')
#define VFIO_BASE 100
+/*
+ * For extension of INFO ioctls, VFIO makes use of a capability chain
+ * designed after PCI/e capabilities. A flag bit indicates whether
+ * this capability chain is supported and a field defined in the fixed
+ * structure defines the offset of the first capability in the chain.
+ * This field is only valid when the corresponding bit in the flags
+ * bitmap is set. This offset field is relative to the start of the
+ * INFO buffer, as is the next field within each capability header.
+ * The id within the header is a shared address space per INFO ioctl,
+ * while the version field is specific to the capability id. The
+ * contents following the header are specific to the capability id.
+ */
+struct vfio_info_cap_header {
+ __u16 id; /* Identifies capability */
+ __u16 version; /* Version specific to the capability ID */
+ __u32 next; /* Offset of next capability */
+};
+
+/*
+ * Callers of INFO ioctls passing insufficiently sized buffers will see
+ * the capability chain flag bit set, a zero value for the first capability
+ * offset (if available within the provided argsz), and argsz will be
+ * updated to report the necessary buffer size. For compatibility, the
+ * INFO ioctl will not report error in this case, but the capability chain
+ * will not be available.
+ */
+
/* -------- IOCTLs for VFIO file descriptor (/dev/vfio/vfio) -------- */
/**
@@ -187,13 +221,73 @@ struct vfio_region_info {
#define VFIO_REGION_INFO_FLAG_READ (1 << 0) /* Region supports read */
#define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
#define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */
+#define VFIO_REGION_INFO_FLAG_CAPS (1 << 3) /* Info supports caps */
__u32 index; /* Region index */
- __u32 resv; /* Reserved for alignment */
+ __u32 cap_offset; /* Offset within info struct of first cap */
__u64 size; /* Region size (bytes) */
__u64 offset; /* Region offset from start of device fd */
};
#define VFIO_DEVICE_GET_REGION_INFO _IO(VFIO_TYPE, VFIO_BASE + 8)
+/*
+ * The sparse mmap capability allows finer granularity of specifying areas
+ * within a region with mmap support. When specified, the user should only
+ * mmap the offset ranges specified by the areas array. mmaps outside of the
+ * areas specified may fail (such as the range covering a PCI MSI-X table) or
+ * may result in improper device behavior.
+ *
+ * The structures below define version 1 of this capability.
+ */
+#define VFIO_REGION_INFO_CAP_SPARSE_MMAP 1
+
+struct vfio_region_sparse_mmap_area {
+ __u64 offset; /* Offset of mmap'able area within region */
+ __u64 size; /* Size of mmap'able area */
+};
+
+struct vfio_region_info_cap_sparse_mmap {
+ struct vfio_info_cap_header header;
+ __u32 nr_areas;
+ __u32 reserved;
+ struct vfio_region_sparse_mmap_area areas[];
+};
+
+/*
+ * The device specific type capability allows regions unique to a specific
+ * device or class of devices to be exposed. This helps solve the problem for
+ * vfio bus drivers of defining which region indexes correspond to which region
+ * on the device, without needing to resort to static indexes, as done by
+ * vfio-pci. For instance, if we were to go back in time, we might remove
+ * VFIO_PCI_VGA_REGION_INDEX and let vfio-pci simply define that all indexes
+ * greater than or equal to VFIO_PCI_NUM_REGIONS are device specific and we'd
+ * make a "VGA" device specific type to describe the VGA access space. This
+ * means that non-VGA devices wouldn't need to waste this index, and thus the
+ * address space associated with it due to implementation of device file
+ * descriptor offsets in vfio-pci.
+ *
+ * The current implementation is now part of the user ABI, so we can't use this
+ * for VGA, but there are other upcoming use cases, such as opregions for Intel
+ * IGD devices and framebuffers for vGPU devices. We missed VGA, but we'll
+ * use this for future additions.
+ *
+ * The structure below defines version 1 of this capability.
+ */
+#define VFIO_REGION_INFO_CAP_TYPE 2
+
+struct vfio_region_info_cap_type {
+ struct vfio_info_cap_header header;
+ __u32 type; /* global per bus driver */
+ __u32 subtype; /* type specific */
+};
+
+#define VFIO_REGION_TYPE_PCI_VENDOR_TYPE (1 << 31)
+#define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0xffff)
+
+/* 8086 Vendor sub-types */
+#define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION (1)
+#define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG (2)
+#define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG (3)
+
/**
* VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
* struct vfio_irq_info)
@@ -329,7 +423,8 @@ enum {
* between described ranges are unimplemented.
*/
VFIO_PCI_VGA_REGION_INDEX,
- VFIO_PCI_NUM_REGIONS
+ VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
+ /* device specific cap to define content. */
};
enum {
@@ -568,8 +663,10 @@ struct vfio_iommu_spapr_tce_create {
__u32 flags;
/* in */
__u32 page_shift;
+ __u32 __resv1;
__u64 window_size;
__u32 levels;
+ __u32 __resv2;
/* out */
__u64 start_addr;
};
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 6/9] vfio: Enable sparse mmap capability
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (4 preceding siblings ...)
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 5/9] linux-headers/vfio: Update for proposed capabilities list Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 7/9] vfio/pci: Fixup PCI option ROMs Alex Williamson
` (4 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
The sparse mmap capability in a vfio region info allows vfio to tell
us which sub-areas of a region may be mmap'd. Thus rather than
assuming a single mmap covers the entire region and later frobbing it
ourselves for things like the PCI MSI-X vector table, we can read that
directly from vfio.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/vfio/common.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++---
trace-events | 2 ++
2 files changed, 65 insertions(+), 4 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 96ccb79..879a657 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -493,6 +493,54 @@ static void vfio_listener_release(VFIOContainer *container)
memory_listener_unregister(&container->listener);
}
+static struct vfio_info_cap_header *
+vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
+{
+ struct vfio_info_cap_header *hdr;
+ void *ptr = info;
+
+ if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) {
+ return NULL;
+ }
+
+ for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+ if (hdr->id == id) {
+ return hdr;
+ }
+ }
+
+ return NULL;
+}
+
+static void vfio_setup_region_sparse_mmaps(VFIORegion *region,
+ struct vfio_region_info *info)
+{
+ struct vfio_info_cap_header *hdr;
+ struct vfio_region_info_cap_sparse_mmap *sparse;
+ int i;
+
+ hdr = vfio_get_region_info_cap(info, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+ if (!hdr) {
+ return;
+ }
+
+ sparse = container_of(hdr, struct vfio_region_info_cap_sparse_mmap, header);
+
+ trace_vfio_region_sparse_mmap_header(region->vbasedev->name,
+ region->nr, sparse->nr_areas);
+
+ region->nr_mmaps = sparse->nr_areas;
+ region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+
+ for (i = 0; i < region->nr_mmaps; i++) {
+ region->mmaps[i].offset = sparse->areas[i].offset;
+ region->mmaps[i].size = sparse->areas[i].size;
+ trace_vfio_region_sparse_mmap_entry(i, region->mmaps[i].offset,
+ region->mmaps[i].offset +
+ region->mmaps[i].size);
+ }
+}
+
int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
int index, const char *name)
{
@@ -519,11 +567,14 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
!(region->size & ~qemu_real_host_page_mask)) {
- region->nr_mmaps = 1;
- region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+ vfio_setup_region_sparse_mmaps(region, info);
- region->mmaps[0].offset = 0;
- region->mmaps[0].size = region->size;
+ if (!region->nr_mmaps) {
+ region->nr_mmaps = 1;
+ region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+ region->mmaps[0].offset = 0;
+ region->mmaps[0].size = region->size;
+ }
}
}
@@ -1083,6 +1134,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
*info = g_malloc0(argsz);
(*info)->index = index;
+retry:
(*info)->argsz = argsz;
if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
@@ -1090,6 +1142,13 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
return -errno;
}
+ if ((*info)->argsz > argsz) {
+ argsz = (*info)->argsz;
+ *info = g_realloc(*info, argsz);
+
+ goto retry;
+ }
+
return 0;
}
diff --git a/trace-events b/trace-events
index 21cd28d..c2f48af 100644
--- a/trace-events
+++ b/trace-events
@@ -1736,6 +1736,8 @@ vfio_region_mmap(const char *name, unsigned long offset, unsigned long end) "Reg
vfio_region_exit(const char *name, int index) "Device %s, region %d"
vfio_region_finalize(const char *name, int index) "Device %s, region %d"
vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d"
+vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
+vfio_region_sparse_mmap_entry(int i, off_t start, off_t end) "sparse entry %d [0x%lx - 0x%lx]"
# hw/vfio/platform.c
vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 7/9] vfio/pci: Fixup PCI option ROMs
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (5 preceding siblings ...)
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 6/9] vfio: Enable sparse mmap capability Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 8/9] vfio/pci: Split out VGA setup Alex Williamson
` (3 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
Devices like Intel graphics are known to not only have bad checksums,
but also the wrong device ID. This is not so surprising given that
the video BIOS is typically part of the system firmware image rather
that embedded into the device and needs to support any IGD device
installed into the system.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/vfio/pci.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8e20781..d0d0864 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -832,6 +832,36 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
break;
}
}
+
+ /*
+ * Test the ROM signature against our device, if the vendor is correct
+ * but the device ID doesn't match, store the correct device ID and
+ * recompute the checksum. Intel IGD devices need this and are known
+ * to have bogus checksums so we can't simply adjust the checksum.
+ */
+ if (pci_get_word(vdev->rom) == 0xaa55 &&
+ pci_get_word(vdev->rom + 0x18) + 8 < vdev->rom_size &&
+ !memcmp(vdev->rom + pci_get_word(vdev->rom + 0x18), "PCIR", 4)) {
+ uint16_t vid, did;
+
+ vid = pci_get_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 4);
+ did = pci_get_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 6);
+
+ if (vid == vdev->vendor_id && did != vdev->device_id) {
+ int i;
+ uint8_t csum, *data = vdev->rom;
+
+ pci_set_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 6,
+ vdev->device_id);
+ data[6] = 0;
+
+ for (csum = 0, i = 0; i < vdev->rom_size; i++) {
+ csum += data[i];
+ }
+
+ data[6] = -csum;
+ }
+ }
}
static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 8/9] vfio/pci: Split out VGA setup
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (6 preceding siblings ...)
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 7/9] vfio/pci: Fixup PCI option ROMs Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 9/9] Intel IGD support for vfio Alex Williamson
` (2 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
This could be setup later by device specific code, such as IGD
initialization.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/vfio/pci.c | 82 +++++++++++++++++++++++++++++++++------------------------
hw/vfio/pci.h | 2 +
2 files changed, 50 insertions(+), 34 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d0d0864..368f40d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2053,6 +2053,52 @@ static VFIODeviceOps vfio_pci_ops = {
.vfio_eoi = vfio_intx_eoi,
};
+int vfio_populate_vga(VFIOPCIDevice *vdev)
+{
+ VFIODevice *vbasedev = &vdev->vbasedev;
+ struct vfio_region_info *reg_info;
+ int ret;
+
+ if (vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
+ ret = vfio_get_region_info(vbasedev,
+ VFIO_PCI_VGA_REGION_INDEX, ®_info);
+ if (ret) {
+ return ret;
+ }
+
+ if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) ||
+ !(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) ||
+ reg_info->size < 0xbffff + 1) {
+ error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx",
+ (unsigned long)reg_info->flags,
+ (unsigned long)reg_info->size);
+ g_free(reg_info);
+ return -EINVAL;
+ }
+
+ vdev->vga = g_new0(VFIOVGA, 1);
+
+ vdev->vga->fd_offset = reg_info->offset;
+ vdev->vga->fd = vdev->vbasedev.fd;
+
+ g_free(reg_info);
+
+ vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
+ vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
+ QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks);
+
+ vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE;
+ vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO;
+ QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks);
+
+ vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE;
+ vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI;
+ QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks);
+ }
+
+ return 0;
+}
+
static int vfio_populate_device(VFIOPCIDevice *vdev)
{
VFIODevice *vbasedev = &vdev->vbasedev;
@@ -2112,45 +2158,13 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
g_free(reg_info);
- if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
- vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
- ret = vfio_get_region_info(vbasedev,
- VFIO_PCI_VGA_REGION_INDEX, ®_info);
+ if (vdev->features & VFIO_FEATURE_ENABLE_VGA) {
+ ret = vfio_populate_vga(vdev);
if (ret) {
error_report(
"vfio: Device does not support requested feature x-vga");
goto error;
}
-
- if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) ||
- !(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) ||
- reg_info->size < 0xbffff + 1) {
- error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx",
- (unsigned long)reg_info->flags,
- (unsigned long)reg_info->size);
- g_free(reg_info);
- ret = -1;
- goto error;
- }
-
- vdev->vga = g_new0(VFIOVGA, 1);
-
- vdev->vga->fd_offset = reg_info->offset;
- vdev->vga->fd = vdev->vbasedev.fd;
-
- g_free(reg_info);
-
- vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
- vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
- QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks);
-
- vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE;
- vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO;
- QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks);
-
- vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE;
- vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI;
- QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks);
}
irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b8a7189..3976f68 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -157,4 +157,6 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr);
void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr);
void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
+int vfio_populate_vga(VFIOPCIDevice *vdev);
+
#endif /* HW_VFIO_VFIO_PCI_H */
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [RFC PATCH v3 9/9] Intel IGD support for vfio
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (7 preceding siblings ...)
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 8/9] vfio/pci: Split out VGA setup Alex Williamson
@ 2016-02-16 21:32 ` Alex Williamson
2016-02-17 11:16 ` [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Tian, Kevin
2016-03-01 14:49 ` Gerd Hoffmann
10 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-16 21:32 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, allen.m.kay, kvm
---
hw/vfio/common.c | 2
hw/vfio/pci-quirks.c | 476 +++++++++++++++++++++++++++++++++++++++++
hw/vfio/pci.c | 68 ++++++
hw/vfio/pci.h | 9 +
include/hw/vfio/vfio-common.h | 2
trace-events | 7 +
6 files changed, 562 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 879a657..c201bee 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -493,7 +493,7 @@ static void vfio_listener_release(VFIOContainer *container)
memory_listener_unregister(&container->listener);
}
-static struct vfio_info_cap_header *
+struct vfio_info_cap_header *
vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
{
struct vfio_info_cap_header *hdr;
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 49ecf11..5828362 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -11,9 +11,11 @@
*/
#include "qemu/osdep.h"
+#include "hw/nvram/fw_cfg.h"
#include "pci.h"
#include "trace.h"
#include "qemu/range.h"
+#include "qemu/error-report.h"
/* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
static bool vfio_pci_is(VFIOPCIDevice *vdev, uint32_t vendor, uint32_t device)
@@ -962,6 +964,479 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
}
/*
+ * Intel IGD support
+ *
+ * We need to do a few things to support Intel Integrated Graphics Devices:
+ * 1) Define a stolen memory region and trap I/O port writes programming it
+ * 2) Expose the OpRegion if one is provided to us
+ * 3) Copy key PCI config space register values from the host bridge
+ * 4) Create an LPC/ISA bridge and do the same for it.
+ *
+ * We mostly try to hide IGD stolen memory (GGMS/GMS) from the VM, but if a ROM
+ * is exposed it will try to use at least 1MB of GGMS stolen memory, apparently
+ * for VESA modes. The ROM itself seems to contain the address of the host
+ * stolen memory range as execution of the vBIOS writes these addresses without
+ * probing the hardware. Fortunately the vBIOS writes these addresses through
+ * I/O port space and they're only for use by the graphics device itself.
+ * Therefore we can intercept them without a performance penalty to native
+ * drivers and we can modify them to a new range without affecting
+ * functionality. We ask the VM BIOS to allocate a new reserved range for this
+ * with the "etc/igd-bdsm" fw_cfg file, which is not actually readable, just a
+ * convenient way of providing a named tag with size. If VGA is not disabled
+ * on IGD, we'll also automatically enable it through this process since
+ * execution of the vBIOS sort of implies VGA. This can all be disabled by
+ * passing rombar=0 to the device.
+ *
+ * The remaining quirks are all enabled through vfio device specific regions
+ * and are triggered through discovery of those regions. Exposing the OpRegion
+ * is mainly useful for the Video BIOS Table (VBT). We create a copy of the
+ * OpRegion data and ask the VM BIOS to create storage space for it and copy it
+ * into VM memory using the "etc/igd-opregion" fw_cfg file.
+ *
+ * The host and ISA bridge features are necessary for IGD versions that do not
+ * support Intel's Universal Passthrough Mode (UPT). UPT should be supported
+ * on BroadWell and newer GPUs. However, not all guest drivers support this.
+ * Therefore if the IGD is an 8th generation or newer, we'll only initialize
+ * these devices if VGA mode is not supported. Since VGA mode can be
+ * automatically enabled for the stolen memory support above, this means
+ * specifically disabling ROM support.
+ */
+
+/*
+ * This presumes the device is already known to be an Intel VGA device, so we
+ * take liberties in which device ID bits match which generation.
+ * See linux:include/drm/i915_pciids.h for IDs.
+ */
+static int igd_gen(VFIOPCIDevice *vdev)
+{
+ if ((vdev->device_id & 0xfff) == 0xa84) {
+ return 8; /* Broxton */
+ }
+
+ switch (vdev->device_id & 0xff00) {
+ /* Old, untested, unavailable, unknown */
+ case 0x0000:
+ case 0x2500:
+ case 0x2700:
+ case 0x2900:
+ case 0x2a00:
+ case 0x2e00:
+ case 0x3500:
+ case 0xa000:
+ return -1;
+ /* SandyBridge, IvyBridge, ValleyView, Haswell */
+ case 0x0100:
+ case 0x0400:
+ case 0x0a00:
+ case 0x0c00:
+ case 0x0d00:
+ case 0x0f00:
+ return 6;
+ /* BroadWell, CherryView, SkyLake, KabyLake */
+ case 0x1600:
+ case 0x1900:
+ case 0x2200:
+ case 0x5900:
+ return 8;
+ }
+
+ return 8; /* Assume newer is compatible */
+}
+
+typedef struct VFIOIGDQuirk {
+ struct VFIOPCIDevice *vdev;
+ uint32_t index;
+} VFIOIGDQuirk;
+
+#define IGD_GMCH 0x50 /* Graphics Control Register */
+#define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
+#define IGD_ASLS 0xfc /* ASL Storage Register */
+
+static uint64_t vfio_igd_quirk_data_read(void *opaque,
+ hwaddr addr, unsigned size)
+{
+ VFIOIGDQuirk *igd = opaque;
+ VFIOPCIDevice *vdev = igd->vdev;
+
+ igd->index = ~0;
+
+ return vfio_region_read(&vdev->bars[4].region, addr + 4, size);
+}
+
+static void vfio_igd_quirk_data_write(void *opaque, hwaddr addr,
+ uint64_t data, unsigned size)
+{
+ VFIOIGDQuirk *igd = opaque;
+ VFIOPCIDevice *vdev = igd->vdev;
+
+ /*
+ * Programming the GGMS starts at index 0x1 and uses every 4th index up
+ * through 0x3fd (ie. 0x1, 0x5, 0x9, ..., 0x3f9, 0x3fd). The address
+ * written at each index is incremented by 4k. This only accounts for 1MB,
+ * while the GGMS is potentially up to 2MB, the vBIOS doesn't seem to go
+ * beyond this and we don't have a spec reference to know if it goes up
+ * through 0x7fd.
+ */
+ if (igd->index < 0x400 && (igd->index % 4 == 1)) {
+ uint32_t base;
+
+ base = pci_get_long(vdev->pdev.config + IGD_BDSM);
+ if (!base) {
+ hw_error("vfio-igd: Guest attempted to program IGD GTT before BIOS reserved stolen memory. Unsupported BIOS?");
+ }
+
+ base |= (data & ((1 << 20) - 1));
+
+ trace_vfio_pci_igd_bar4_write(vdev->vbasedev.name,
+ igd->index, data, base);
+ data = base;
+ }
+
+ vfio_region_write(&vdev->bars[4].region, addr + 4, data, size);
+
+ /*
+ * Observation: On IVB system the vBIOS writes up through index 0x3f9,
+ * which correlates to offset 0xfe000 within the stolen memory range. That
+ * suspiciously leaves exactly one 4k page of the first 1MB unwritten and
+ * generates DMAR faults at offset 0xff000 from the host BDSM. If we do
+ * one more step, triggered on the write to index 0x3f9 to write the index
+ * and data for that last page, the issue appears resolved. Note that the
+ * GGMS may be 1MB or 2MB, but the vBIOS seems to program 1MB regardless
+ * and inspection shows the following index registers in the sequence
+ * already addresses within the first 1MB programmed.
+ */
+ if (igd->index == 0x3f9) {
+ vfio_region_write(&vdev->bars[4].region, addr, igd->index + 4, 4);
+ vfio_region_write(&vdev->bars[4].region, addr + 4, data + 0x1000, size);
+ }
+
+ igd->index = ~0;
+}
+
+static const MemoryRegionOps vfio_igd_data_quirk = {
+ .read = vfio_igd_quirk_data_read,
+ .write = vfio_igd_quirk_data_write,
+ .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static uint64_t vfio_igd_quirk_index_read(void *opaque,
+ hwaddr addr, unsigned size)
+{
+ VFIOIGDQuirk *igd = opaque;
+ VFIOPCIDevice *vdev = igd->vdev;
+
+ igd->index = ~0;
+
+ return vfio_region_read(&vdev->bars[4].region, addr, size);
+}
+
+static void vfio_igd_quirk_index_write(void *opaque, hwaddr addr,
+ uint64_t data, unsigned size)
+{
+ VFIOIGDQuirk *igd = opaque;
+ VFIOPCIDevice *vdev = igd->vdev;
+
+ igd->index = data;
+
+ vfio_region_write(&vdev->bars[4].region, addr, data, size);
+}
+
+static const MemoryRegionOps vfio_igd_index_quirk = {
+ .read = vfio_igd_quirk_index_read,
+ .write = vfio_igd_quirk_index_write,
+ .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
+{
+ DeviceClass *dc = DEVICE_GET_CLASS(vdev);
+ struct vfio_region_info *reg_info;
+ VFIOQuirk *quirk;
+ VFIOIGDQuirk *igd;
+ int rom_size, ggms_mb, gms_mb = 0;
+ uint32_t gmch, gms_mask, gms_shift, ggms_mask, ggms_shift;
+
+ if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
+ !vfio_is_vga(vdev) || nr != 4) {
+ return;
+ }
+
+ if (vfio_get_region_info(&vdev->vbasedev,
+ VFIO_PCI_ROM_REGION_INDEX, ®_info)) {
+ error_report("Failed to get IGD ROM info");
+ return;
+ }
+
+ rom_size = reg_info->size;
+ g_free(reg_info);
+
+ if (!rom_size) {
+ return; /* No ROM, no VBIOS, no need */
+ }
+
+ /* See linux:include/drm/i915_drm.h */
+ switch (igd_gen(vdev)) {
+ case 6:
+ gms_mask = 0x1f;
+ gms_shift = 3;
+ ggms_mask = 0x3;
+ ggms_shift = 8;
+ break;
+ case 8:
+ gms_mask = 0xff;
+ gms_shift = 8;
+ ggms_mask = 0x3;
+ ggms_shift = 6;
+ break;
+ default:
+ error_report("Unknown IGD version, try SandyBridge or newer.");
+ return;
+ }
+
+ if (vdev->pdev.qdev.hotplugged) {
+ error_report("IGD should not be hot-added, good luck");
+ return;
+ }
+
+ gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, 4);
+ ggms_mb = (gmch >> ggms_shift) & ggms_mask; /* Read GGMS */
+ gmch &= ~(gms_mask << gms_shift); /* Mask out GMS */
+
+ if (!ggms_mb) {
+ ggms_mb = 1; /* vBIOS seems to use 1MB without checking hardware */
+ gmch |= ggms_mb << ggms_shift;
+ }
+
+ if (!(gmch & 0x2) && !vdev->vga && !vdev->no_auto_vga) {
+ if (vfio_populate_vga(vdev)) {
+ error_report("IGD VGA auto-enable failed");
+ }
+ }
+
+ dc->hotpluggable = false;
+
+ quirk = g_malloc0(sizeof(*quirk));
+ quirk->mem = g_new0(MemoryRegion, 2);
+ quirk->nr_mem = 2;
+ igd = quirk->data = g_malloc0(sizeof(*igd));
+ igd->vdev = vdev;
+ igd->index = ~0;
+
+ memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_igd_index_quirk,
+ igd, "vfio-igd-index-quirk", 4);
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
+ 0, &quirk->mem[0], 1);
+
+ memory_region_init_io(&quirk->mem[1], OBJECT(vdev), &vfio_igd_data_quirk,
+ igd, "vfio-igd-data-quirk", 4);
+ memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
+ 4, &quirk->mem[1], 1);
+
+ QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
+
+ /*
+ * Not sure if this is worthwhile, but we can ask the BIOS to reserve GMS
+ * stolen memory as well and report it out through the emulated GMCH
+ * register. It's only an experimental option, so it can be dropped.
+ */
+ if (vdev->igd_gms) {
+ if (vdev->igd_gms <= 0x10) {
+ gms_mb = vdev->igd_gms * 32;
+ gmch |= vdev->igd_gms << gms_shift;
+ } else {
+ error_report("Unsupported IGD GMS value 0x%x", vdev->igd_gms);
+ vdev->igd_gms = 0;
+ }
+ }
+
+ fw_cfg_add_file(fw_cfg_find(), "etc/igd-bdsm",
+ NULL, (ggms_mb + gms_mb) * 1024 * 1024);
+
+ /* GMCH is read-only, emulated */
+ pci_set_long(vdev->pdev.config + IGD_GMCH, gmch);
+ pci_set_long(vdev->pdev.wmask + IGD_BDSM, 0);
+ pci_set_long(vdev->emulated_config_bits + IGD_GMCH, ~0);
+
+ /* BDSM is read-write, emulated. The BIOS needs to be able to write it */
+ pci_set_long(vdev->pdev.config + IGD_BDSM, 0);
+ pci_set_long(vdev->pdev.wmask + IGD_BDSM, ~0);
+ pci_set_long(vdev->emulated_config_bits + IGD_BDSM, ~0);
+
+ trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name, ggms_mb + gms_mb);
+}
+
+
+int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
+ struct vfio_region_info *region)
+{
+ DeviceClass *dc = DEVICE_GET_CLASS(vdev);
+ int ret;
+
+ if (vdev->pdev.qdev.hotplugged) {
+ return -EINVAL;
+ }
+
+ dc->hotpluggable = false;
+
+ vdev->igd_opregion = g_malloc0(region->size);
+ ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
+ region->size, region->offset);
+ if (ret != region->size) {
+ error_report("vfio: Error reading IGD OpRegion");
+ g_free(vdev->igd_opregion);
+ vdev->igd_opregion = NULL;
+ return -EINVAL;
+ }
+
+ fw_cfg_add_file(fw_cfg_find(), "etc/igd-opregion",
+ vdev->igd_opregion, region->size);
+
+ trace_vfio_pci_igd_opregion_enabled(vdev->vbasedev.name);
+
+ /* Like BDSM, the BIOS writes the location of the reserved memory here */
+ pci_set_long(vdev->pdev.config + IGD_ASLS, 0);
+ pci_set_long(vdev->pdev.wmask + IGD_ASLS, ~0);
+ pci_set_long(vdev->emulated_config_bits + IGD_ASLS, ~0);
+
+ return 0;
+}
+
+/* Register sets for host and LPC/ISA bridge to copy into VM */
+typedef struct {
+ uint8_t offset;
+ uint8_t len;
+} IGDHostInfo;
+
+static const IGDHostInfo igd_host_bridge_infos[] = {
+ {PCI_REVISION_ID, 2},
+ {PCI_SUBSYSTEM_VENDOR_ID, 2},
+ {PCI_SUBSYSTEM_ID, 2},
+};
+
+static const IGDHostInfo igd_lpc_bridge_infos[] = {
+ {PCI_VENDOR_ID, 2},
+ {PCI_DEVICE_ID, 2},
+ {PCI_REVISION_ID, 2},
+ {PCI_SUBSYSTEM_VENDOR_ID, 2},
+ {PCI_SUBSYSTEM_ID, 2},
+};
+
+static int vfio_pci_igd_copy(VFIOPCIDevice *vdev, PCIDevice *pdev,
+ struct vfio_region_info *region,
+ const IGDHostInfo *list, int len)
+{
+ int i, ret;
+
+ for (i = 0; i < len; i++) {
+ ret = pread(vdev->vbasedev.fd, pdev->config + list[i].offset,
+ list[i].len, region->offset + list[i].offset);
+ if (ret != list[i].len) {
+ error_report("IGD copy failed: %m");
+ return -errno;
+ }
+ }
+
+ return 0;
+}
+
+int vfio_pci_igd_host_init(VFIOPCIDevice *vdev,
+ struct vfio_region_info *region)
+{
+ DeviceClass *dc = DEVICE_GET_CLASS(vdev);
+ PCIBus *bus;
+ PCIDevice *host_bridge;
+ int ret;
+
+ if (igd_gen(vdev) >= 8 && !vdev->vga) {
+ return 0;
+ }
+
+ if (vdev->pdev.qdev.hotplugged) {
+ return -EINVAL;
+ }
+
+ dc->hotpluggable = false;
+
+ bus = pci_device_root_bus(&vdev->pdev);
+ host_bridge = pci_find_device(bus, 0, PCI_DEVFN(0, 0));
+
+ if (!host_bridge) {
+ error_report("Can't find host bridge");
+ return -ENODEV;
+ }
+
+ ret = vfio_pci_igd_copy(vdev, host_bridge, region, igd_host_bridge_infos,
+ ARRAY_SIZE(igd_host_bridge_infos));
+ if (!ret) {
+ trace_vfio_pci_igd_host_bridge_enabled(vdev->vbasedev.name);
+ }
+
+ return ret;
+}
+
+static void vfio_pci_igd_lpc_bridge_realize(PCIDevice *pdev, Error **errp)
+{
+ if (pdev->devfn != PCI_DEVFN(0x1f, 0)) {
+ error_setg(errp, "VFIO dummy ISA/LPC bridge must have address 1f.0");
+ return;
+ }
+}
+
+static void vfio_pci_igd_lpc_bridge_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+ PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+ dc->desc = "VFIO dummy ISA/LPC bridge for IGD assignment";
+ dc->hotpluggable = false;
+ k->realize = vfio_pci_igd_lpc_bridge_realize;
+ k->class_id = PCI_CLASS_BRIDGE_ISA;
+}
+
+static TypeInfo vfio_pci_igd_lpc_bridge_info = {
+ .name = "vfio-pci-igd-lpc-bridge",
+ .parent = TYPE_PCI_DEVICE,
+ .class_init = vfio_pci_igd_lpc_bridge_class_init,
+};
+
+static void vfio_pci_igd_register_types(void)
+{
+ type_register_static(&vfio_pci_igd_lpc_bridge_info);
+}
+
+type_init(vfio_pci_igd_register_types)
+
+int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev,
+ struct vfio_region_info *region)
+{
+ DeviceClass *dc = DEVICE_GET_CLASS(vdev);
+ PCIDevice *lpc_bridge;
+ int ret;
+
+ if (igd_gen(vdev) >= 8 && !vdev->vga) {
+ return 0;
+ }
+
+ if (vdev->pdev.qdev.hotplugged) {
+ return -EINVAL;
+ }
+
+ dc->hotpluggable = false;
+
+ lpc_bridge = pci_create_simple(pci_device_root_bus(&vdev->pdev),
+ PCI_DEVFN(0x1f, 0),
+ "vfio-pci-igd-lpc-bridge");
+
+ ret = vfio_pci_igd_copy(vdev, lpc_bridge, region, igd_lpc_bridge_infos,
+ ARRAY_SIZE(igd_lpc_bridge_infos));
+ if (!ret) {
+ trace_vfio_pci_igd_lpc_bridge_enabled(vdev->vbasedev.name);
+ }
+
+ return ret;
+}
+
+/*
* Common quirk probe entry points.
*/
void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
@@ -1010,6 +1485,7 @@ void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
vfio_probe_nvidia_bar5_quirk(vdev, nr);
vfio_probe_nvidia_bar0_quirk(vdev, nr);
vfio_probe_rtl8168_bar2_quirk(vdev, nr);
+ vfio_probe_igd_bar4_quirk(vdev, nr);
}
void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 368f40d..a288fae 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2099,6 +2099,66 @@ int vfio_populate_vga(VFIOPCIDevice *vdev)
return 0;
}
+static int vfio_populate_dev_regions(VFIOPCIDevice *vdev)
+{
+ VFIODevice *vbasedev = &vdev->vbasedev;
+ struct vfio_region_info *reg_info;
+ int i, ret;
+
+ if (vbasedev->num_regions > VFIO_PCI_NUM_REGIONS) {
+ for (i = VFIO_PCI_NUM_REGIONS; i < vbasedev->num_regions; i++) {
+ struct vfio_info_cap_header *hdr;
+ struct vfio_region_info_cap_type *type;
+
+ ret = vfio_get_region_info(vbasedev, i, ®_info);
+ if (ret) {
+ continue;
+ }
+
+ hdr = vfio_get_region_info_cap(reg_info, VFIO_REGION_INFO_CAP_TYPE);
+ if (!hdr) {
+ g_free(reg_info);
+ continue;
+ }
+
+ type = container_of(hdr, struct vfio_region_info_cap_type, header);
+
+ trace_vfio_populate_dev_region(vdev->vbasedev.name, i,
+ type->type, type->subtype);
+
+ if (type->type ==
+ (VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL) &&
+ type->subtype == VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION) {
+
+ ret = vfio_pci_igd_opregion_init(vdev, reg_info);
+ if (ret) {
+ return ret;
+ }
+ } else if (type->type ==
+ (VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL) &&
+ type->subtype == VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG) {
+
+ ret = vfio_pci_igd_host_init(vdev, reg_info);
+ if (ret) {
+ return ret;
+ }
+ } else if (type->type ==
+ (VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL) &&
+ type->subtype == VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG) {
+
+ ret = vfio_pci_igd_lpc_init(vdev, reg_info);
+ if (ret) {
+ return ret;
+ }
+ }
+
+ g_free(reg_info);
+ }
+ }
+
+ return 0;
+}
+
static int vfio_populate_device(VFIOPCIDevice *vdev)
{
VFIODevice *vbasedev = &vdev->vbasedev;
@@ -2579,6 +2639,11 @@ static int vfio_initfn(PCIDevice *pdev)
}
}
+ ret = vfio_populate_dev_regions(vdev);
+ if (ret) {
+ goto out_teardown;
+ }
+
vfio_register_err_notifier(vdev);
vfio_register_req_notifier(vdev);
vfio_setup_resetfn_quirk(vdev);
@@ -2601,6 +2666,7 @@ static void vfio_instance_finalize(Object *obj)
vfio_bars_finalize(vdev);
g_free(vdev->emulated_config_bits);
g_free(vdev->rom);
+ g_free(vdev->igd_opregion);
vfio_put_device(vdev);
vfio_put_group(group);
}
@@ -2679,12 +2745,14 @@ static Property vfio_pci_dev_properties[] = {
DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false),
DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false),
DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),
+ DEFINE_PROP_BOOL("x-no-auto-vga", VFIOPCIDevice, no_auto_vga, false),
DEFINE_PROP_UINT32("x-pci-vendor-id", VFIOPCIDevice, vendor_id, PCI_ANY_ID),
DEFINE_PROP_UINT32("x-pci-device-id", VFIOPCIDevice, device_id, PCI_ANY_ID),
DEFINE_PROP_UINT32("x-pci-sub-vendor-id", VFIOPCIDevice,
sub_vendor_id, PCI_ANY_ID),
DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
sub_device_id, PCI_ANY_ID),
+ DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
/*
* TODO - support passed fds... is this necessary?
* DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 3976f68..850de17 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -115,6 +115,7 @@ typedef struct VFIOPCIDevice {
int interrupt; /* Current interrupt type */
VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
+ void *igd_opregion;
PCIHostDeviceAddress host;
EventNotifier err_notifier;
EventNotifier req_notifier;
@@ -129,6 +130,7 @@ typedef struct VFIOPCIDevice {
#define VFIO_FEATURE_ENABLE_REQ_BIT 1
#define VFIO_FEATURE_ENABLE_REQ (1 << VFIO_FEATURE_ENABLE_REQ_BIT)
int32_t bootindex;
+ uint32_t igd_gms;
uint8_t pm_cap;
bool has_vga;
bool pci_aer;
@@ -139,6 +141,7 @@ typedef struct VFIOPCIDevice {
bool no_kvm_intx;
bool no_kvm_msi;
bool no_kvm_msix;
+ bool no_auto_vga;
} VFIOPCIDevice;
uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
@@ -159,4 +162,10 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
int vfio_populate_vga(VFIOPCIDevice *vdev);
+int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
+ struct vfio_region_info *region);
+int vfio_pci_igd_host_init(VFIOPCIDevice *vdev,
+ struct vfio_region_info *region);
+int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev,
+ struct vfio_region_info *region);
#endif /* HW_VFIO_VFIO_PCI_H */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 594905a..45ba6a3 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -146,6 +146,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
VFIODevice *vbasedev);
int vfio_get_region_info(VFIODevice *vbasedev, int index,
struct vfio_region_info **info);
+struct vfio_info_cap_header *
+ vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id);
extern const MemoryRegionOps vfio_region_ops;
extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;
diff --git a/trace-events b/trace-events
index c2f48af..06af953 100644
--- a/trace-events
+++ b/trace-events
@@ -1677,6 +1677,7 @@ vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int
vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
+vfio_populate_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
vfio_initfn(const char *name, int group_id) " (%s) group %d"
vfio_pci_reset(const char *name) " (%s)"
vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET"
@@ -1715,7 +1716,11 @@ vfio_quirk_ati_bonaire_reset_no_smc(const char *name) "%s"
vfio_quirk_ati_bonaire_reset_timeout(const char *name) "%s"
vfio_quirk_ati_bonaire_reset_done(const char *name) "%s"
vfio_quirk_ati_bonaire_reset(const char *name) "%s"
-
+vfio_pci_igd_bar4_write(const char *name, uint32_t index, uint32_t data, uint32_t base) "%s [%03x] %08x -> %08x"
+vfio_pci_igd_bdsm_enabled(const char *name, int size) "%s %dMB"
+vfio_pci_igd_opregion_enabled(const char *name) "%s"
+vfio_pci_igd_host_bridge_enabled(const char *name) "%s"
+vfio_pci_igd_lpc_bridge_enabled(const char *name) "%s"
# hw/vfio/vfio-common.c
vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (8 preceding siblings ...)
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 9/9] Intel IGD support for vfio Alex Williamson
@ 2016-02-17 11:16 ` Tian, Kevin
2016-02-17 14:02 ` Alex Williamson
2016-03-01 14:49 ` Gerd Hoffmann
10 siblings, 1 reply; 15+ messages in thread
From: Tian, Kevin @ 2016-02-17 11:16 UTC (permalink / raw)
To: Alex Williamson, qemu-devel@nongnu.org; +Cc: Kay, Allen M, kvm@vger.kernel.org
> From: Alex Williamson
> Sent: Wednesday, February 17, 2016 5:32 AM
>
> v3:
Alex, thanks a lot for your enhancement, which should make future
vgpu work easier!
>
> Quite a bit of restructuring, functional differences include exposing
> another fw_cfg file to indicate the size of the stolen memory region.
> We really have no need to copy anything into stolen memory, so while
> we tell SeaBIOS about it via a fw_cfg file, the data pointer is NULL
> so it can't be read.
>
> I'm also now reading the GGMS size from the GMCH register which
> determines the size of the GTT region of stolen memory. The vBIOS
> is typically only using 1MB, but this is often 2MB in hardware. I
> also give the user the ability to specify a GMS value for further
> stolen memory. We default it to zero and it's an experimental option
> so we can remove if it's not useful. QEMU now does the virtualization
> of the GMCH and BDSM registers, which is was sort of doing before
> anyway, but vfio kernel no longer does anything special for them.
> Getting the GGMS size requires that we know something about the IGD
> version we're using, so code added for that.
I'm not sure how useful keeping stolen memory is. Stolen memory
is not accessed by CPU directly. One is for GTT, but GTT is alias in
PCI bar already. The other is to back as framebuffer for early boot-up,
which is mapped to GTT, so the access actually comes from GPU which
is what RMRR should care about for IOMMU setup.
Thanks
Kevin
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support
2016-02-17 11:16 ` [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Tian, Kevin
@ 2016-02-17 14:02 ` Alex Williamson
0 siblings, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-02-17 14:02 UTC (permalink / raw)
To: Tian, Kevin; +Cc: Kay, Allen M, qemu-devel@nongnu.org, kvm@vger.kernel.org
On Wed, 17 Feb 2016 11:16:37 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:
> > From: Alex Williamson
> > Sent: Wednesday, February 17, 2016 5:32 AM
> >
> > v3:
>
> Alex, thanks a lot for your enhancement, which should make future
> vgpu work easier!
Yes, there are a number of vGPU enablers in this series, sparse mmaps
and device specific regions.
> > Quite a bit of restructuring, functional differences include exposing
> > another fw_cfg file to indicate the size of the stolen memory region.
> > We really have no need to copy anything into stolen memory, so while
> > we tell SeaBIOS about it via a fw_cfg file, the data pointer is NULL
> > so it can't be read.
> >
> > I'm also now reading the GGMS size from the GMCH register which
> > determines the size of the GTT region of stolen memory. The vBIOS
> > is typically only using 1MB, but this is often 2MB in hardware. I
> > also give the user the ability to specify a GMS value for further
> > stolen memory. We default it to zero and it's an experimental option
> > so we can remove if it's not useful. QEMU now does the virtualization
> > of the GMCH and BDSM registers, which is was sort of doing before
> > anyway, but vfio kernel no longer does anything special for them.
> > Getting the GGMS size requires that we know something about the IGD
> > version we're using, so code added for that.
>
> I'm not sure how useful keeping stolen memory is. Stolen memory
> is not accessed by CPU directly. One is for GTT, but GTT is alias in
> PCI bar already. The other is to back as framebuffer for early boot-up,
> which is mapped to GTT, so the access actually comes from GPU which
> is what RMRR should care about for IOMMU setup.
I explained the reason why we need stolen memory in reply to the
seabios patch, the vBIOS needs GTT space therefore we need stolen
memory allocated somewhere for the VM so that we can redirect to that
rather than the host stolen memory address space. Allowing the user to
specify a GMS size may not be useful, but note this in the GPU spec:
BIOS Requirement: BIOS must not set this bit to 0 [IVD] if the GMS
field bits 7:3 of this register pre-allocates no memory.
This is a requirement of the VGA disable bit in the MGGC register.
We expose this bit unmodified from hardware, which will almost always
have VGA enabled and therefore expose this bit as zero. Therefore it's
not legal per the spec to expose GMS as zero with VGA enabled. I don't
know of anything that requires this, but I left the option open in case
we find something. Thanks,
Alex
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
` (9 preceding siblings ...)
2016-02-17 11:16 ` [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Tian, Kevin
@ 2016-03-01 14:49 ` Gerd Hoffmann
2016-03-01 15:29 ` Eric Auger
2016-03-01 20:21 ` Alex Williamson
10 siblings, 2 replies; 15+ messages in thread
From: Gerd Hoffmann @ 2016-03-01 14:49 UTC (permalink / raw)
To: Alex Williamson; +Cc: allen.m.kay, qemu-devel, kvm
On Di, 2016-02-16 at 14:31 -0700, Alex Williamson wrote:
> v3:
Have a new version of this?
Want look at this, but it doesn't apply to qemu master.
thanks,
Gerd
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support
2016-03-01 14:49 ` Gerd Hoffmann
@ 2016-03-01 15:29 ` Eric Auger
2016-03-01 20:21 ` Alex Williamson
1 sibling, 0 replies; 15+ messages in thread
From: Eric Auger @ 2016-03-01 15:29 UTC (permalink / raw)
To: Gerd Hoffmann, Alex Williamson; +Cc: allen.m.kay, qemu-devel, kvm
Hi Gerd,
On 03/01/2016 03:49 PM, Gerd Hoffmann wrote:
> On Di, 2016-02-16 at 14:31 -0700, Alex Williamson wrote:
>> v3:
>
> Have a new version of this?
> Want look at this, but it doesn't apply to qemu master.
I think you have subset of this series in:
"[Qemu-devel] [PATCH 0/6] vfio: Prep for vGPU & IGD",
http://lists.gnu.org/archive/html/qemu-devel/2016-02/msg05089.html
I don't know if it can help you.
Best Regards
Eric
>
> thanks,
> Gerd
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support
2016-03-01 14:49 ` Gerd Hoffmann
2016-03-01 15:29 ` Eric Auger
@ 2016-03-01 20:21 ` Alex Williamson
1 sibling, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2016-03-01 20:21 UTC (permalink / raw)
To: Gerd Hoffmann; +Cc: allen.m.kay, qemu-devel, kvm
On Tue, 01 Mar 2016 15:49:26 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:
> On Di, 2016-02-16 at 14:31 -0700, Alex Williamson wrote:
> > v3:
>
> Have a new version of this?
> Want look at this, but it doesn't apply to qemu master.
Try this branch for now:
https://github.com/awilliam/qemu-vfio/tree/vfio-igd-20160301
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-03-01 20:21 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-16 21:31 [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Alex Williamson
2016-02-16 21:31 ` [Qemu-devel] [RFC PATCH v3 1/9] vfio: Add sysfsdev property for pci & platform Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 2/9] vfio: Wrap VFIO_DEVICE_GET_REGION_INFO Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 3/9] vfio: Generalize region support Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 4/9] vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 5/9] linux-headers/vfio: Update for proposed capabilities list Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 6/9] vfio: Enable sparse mmap capability Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 7/9] vfio/pci: Fixup PCI option ROMs Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 8/9] vfio/pci: Split out VGA setup Alex Williamson
2016-02-16 21:32 ` [Qemu-devel] [RFC PATCH v3 9/9] Intel IGD support for vfio Alex Williamson
2016-02-17 11:16 ` [Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support Tian, Kevin
2016-02-17 14:02 ` Alex Williamson
2016-03-01 14:49 ` Gerd Hoffmann
2016-03-01 15:29 ` Eric Auger
2016-03-01 20:21 ` Alex Williamson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).