* [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough
@ 2014-07-07 12:27 Eric Auger
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
This RFC series aims at enabling KVM platform device passthrough.
It implements a VFIO platform device that is meant to be dynamically
instantiated using the -device option.
The VFIO platform device uses a host VFIO platform driver, which must
be bound to the assigned device before QEMU starts.
- the guest can directly access the device register space
- assigned device IRQs are transparently routed to the guest by
QEMU/KVM (two methods are currently supported)
- the IOMMU is transparently programmed to prevent the device from
accessing physical pages outside of the guest address space
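As an illustration of the last point only (this is not code from the
series), the following self-contained C sketch models the effect of
programming the IOMMU with guest memory only: a DMA access is accepted
only if it falls entirely inside a mapped range. The names dma_mapping
and dma_allowed are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one IOMMU mapping: a range visible to the device. */
struct dma_mapping {
    uint64_t iova;   /* start of the range as seen by the device */
    uint64_t size;   /* length of the range in bytes */
};

/* Return true if [addr, addr + len) lies entirely inside one mapping. */
static bool dma_allowed(const struct dma_mapping *maps, size_t n,
                        uint64_t addr, uint64_t len)
{
    assert(maps != NULL || n == 0);

    if (len == 0) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        uint64_t start = maps[i].iova;
        uint64_t end = start + maps[i].size;   /* exclusive upper bound */

        if (addr >= start && addr < end && len <= end - addr) {
            return true;
        }
    }
    return false;
}
```

An access that starts inside a mapped range but runs past its end is
rejected, which is the behaviour the guest isolation relies on.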
The patch series was fully reworked between v3 and v4 to ease the
review of PCI modifications. Dynamic instantiation from command
line was cleaned up thanks to Alex Graf's "Dynamic sysbus device
allocation support" patch series and its port to machvirt.
The patch series relies on the following QEMU patch series:
- Alex Graf's "Dynamic sysbus device allocation support"
http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
- "machvirt dynamic sysbus device instantiation"
Ports Alex's mechanics from e500 to virt and proposes implementing
device tree generation in the devices instead of in the machine file
The patch series is made of the following patch files:
1-6) Modifications to PCI code to prepare for VFIO platform device:
7) split of PCI specific code and generic code (move)
8) EXEC_FLAG setting
9) creation of the VFIO platform device, without irqfd support
(MMIO direct access and IRQ assignment).
10-11) addition of irqfd/virqfd support
12) capability to dynamically instantiate the device
13) example derived VFIO device: calxeda xgmac
v3->v4 changes (Eric Auger, Alvise Rigo):
- rebase on latest VFIO PCI code (v2.1.0-rc0)
- full git history rework to ease PCI code change review
- mv include files in hw/vfio
- DPRINTF reformatting temporarily moved out
- support of VFIO virq (removal of resamplefd handler on user-side)
- integration with sysbus dynamic instantiation framework
- removal of unrealize and cleanup routines until it is better
understood what is really needed
- Support of VFIO for Amba devices should be handled in an inherited
device to specialize the device tree generation (clock handle currently
missing in framework however)
- "Always use eventfd as notifying mechanism" temporarily moved out
- static instantiation is no longer the mainstream approach (although it
remains possible). Note that if static instantiation is used, irqfd must
be set up in the machine file once the virtual IRQ is known
- create the GSI routing table on qemu side
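To make the last item more concrete, here is a small self-contained
model of a GSI routing table in C. It is only a sketch: the structure,
the function names and the identity mapping (GSI i routed to pin i of
irqchip 0) are assumptions chosen for illustration, not the layout used
by this series or by the KVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one routing entry: GSI -> (irqchip, pin). */
struct gsi_route {
    uint32_t gsi;
    uint32_t irqchip;
    uint32_t pin;
};

/* Fill 'table' with an identity mapping for 'n' lines (assumed layout). */
static size_t build_identity_routes(struct gsi_route *table, size_t n)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        table[i].gsi = (uint32_t)i;
        table[i].irqchip = 0;
        table[i].pin = (uint32_t)i;
    }
    return n;
}

/* Look up the pin routed to 'gsi'; return -1 if no route exists. */
static int gsi_to_pin(const struct gsi_route *table, size_t n, uint32_t gsi)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].gsi == gsi) {
            return (int)table[i].pin;
        }
    }
    return -1;
}
```

With such a table in place, an irqfd only needs to name a GSI; the
routing entry decides which interrupt controller input is raised.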
v2->v3 changes (Alvise Rigo, Eric Auger):
- Following Alex W's recommendations, further efforts to factorize the
code between PCI and platform: introduction of VFIODevice and VFIORegion
as base classes
- unique reset handler for platform and PCI
- cleanup following Kim's comments
- multiple IRQ support mechanics should be in place although not
tested
- Better handling of MMIO multiple regions
- New features and fixes by Alvise (multiple compat string, exec
flag, force eventfd usage, amba device tree support)
- irqfd support
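The "base class" idea mentioned in the first item of this list can be
sketched in plain C by embedding a common structure in each bus-specific
one and converting back with container_of. The structures below are
simplified, hypothetical stand-ins and do not reproduce the actual
VFIODevice layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common part shared by all device flavours. */
struct base_device {
    int fd;              /* device file descriptor */
    const char *name;
};

struct pci_device {
    int bus_specific;          /* placeholder for PCI-only state */
    struct base_device base;   /* embedded "base class" */
};

struct platform_device {
    struct base_device base;   /* embedded "base class" */
    int num_irqs;              /* placeholder for platform-only state */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Common code only sees the base part ... */
static int base_fd(const struct base_device *dev)
{
    assert(dev != NULL);
    return dev->fd;
}

/* ... while bus-specific code converts back to its own type. */
static struct pci_device *to_pci(struct base_device *dev)
{
    return container_of(dev, struct pci_device, base);
}

static struct platform_device *to_platform(struct base_device *dev)
{
    return container_of(dev, struct platform_device, base);
}
```

The conversion works whether or not the base member sits at offset 0,
so common helpers such as a shared reset handler can take a pointer to
the base part.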
v1->v2 changes (Kim Phillips, Eric Auger):
- IRQ initial support (legacy mode where eventfds are handled on
user side)
- hacked dynamic instantiation
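For readers unfamiliar with the legacy mode mentioned above, the sketch
below shows the user-side half in isolation, using the Linux eventfd(2)
interface. signal_irq() stands in for the kernel signalling an
interrupt; both function names are hypothetical and the code is not
taken from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd once, standing in for one interrupt being reported. */
static int signal_irq(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * User-side handler: consume pending events and return how many were
 * signalled since the last read (0 if none, -1 on error). The eventfd
 * is expected to be non-blocking.
 */
static int64_t handle_irq(int efd)
{
    uint64_t count = 0;
    ssize_t n;

    assert(efd >= 0);
    n = read(efd, &count, sizeof(count));
    if (n == (ssize_t)sizeof(count)) {
        return (int64_t)count;
    }
    return errno == EAGAIN ? 0 : -1;
}
```

In a real setup the handler would go on to inject the interrupt into
the guest; with irqfd that step is done in the kernel and the user-side
read disappears from the fast path.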
v1 (Kim Phillips):
- initial split between PCI and platform
- MMIO support only
- static instantiation
This patch series has the following kernel-side dependencies:
- [RFC Patch v6 0/20] VFIO support for platform devices
https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
- [Patch] ARM: KVM: Handle IPA unmapping on memory region deletion
https://patches.linaro.org/27691/
- [PATCH v3] ARM: KVM: add irqfd and irq routing support
https://patches.linaro.org/32261/
- [PATCH] ARM: KVM: Enable the KVM-VFIO device
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-March/008629.html
- [PATCH v2] ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping
http://www.spinics.net/lists/kvm/msg105083.html
The patch series was tested on Calxeda Midway (ARMv7), where one xgmac
is assigned to the KVM host while the second one is assigned to the guest.
Unfortunately, only a single IRQ is exercised.
Next steps:
- use of "ARM: Forwarding physical interrupts to a guest VM"
- unbind/migration/reset issues
Here are the instructions to test on a Calxeda Midway:
https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
git://git.linaro.org/people/eric.auger/linux.git (branch irqfd_integ_v3)
git://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v4)
Best Regards
Eric
Alvise Rigo (1):
hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings
Eric Auger (11):
hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
hw/vfio/pci: Remove unneeded include files
hw/vfio/pci: introduce VFIODevice
hw/vfio/pci: Introduce VFIORegion
hw/vfio/pci: split vfio_get_device
hw/vfio: create common module
hw/vfio/platform: add vfio-platform support
hw/intc/arm_gic_kvm: enable irqfd and set routing table
hw/vfio/platform: Add irqfd support
hw/vfio/platform: add default dt generation for vfio device
hw/vfio: add an example calxeda_xgmac
Kim Phillips (1):
vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into
include/hw/vfio
LICENSE | 2 +-
MAINTAINERS | 2 +-
hw/Makefile.objs | 1 +
hw/intc/arm_gic_kvm.c | 11 +
hw/misc/Makefile.objs | 1 -
hw/ppc/spapr_pci_vfio.c | 2 +-
hw/vfio/Makefile.objs | 6 +
hw/vfio/calxeda_xgmac.c | 165 +++++
hw/vfio/common.c | 1003 +++++++++++++++++++++++++
hw/{misc/vfio.c => vfio/pci.c} | 1514 +++++++-------------------------------
hw/vfio/platform.c | 766 +++++++++++++++++++
include/hw/vfio/vfio-common.h | 149 ++++
include/hw/vfio/vfio-platform.h | 75 ++
include/hw/{misc => vfio}/vfio.h | 0
linux-headers/linux/vfio.h | 2 +
15 files changed, 2465 insertions(+), 1234 deletions(-)
create mode 100644 hw/vfio/Makefile.objs
create mode 100644 hw/vfio/calxeda_xgmac.c
create mode 100644 hw/vfio/common.c
rename hw/{misc/vfio.c => vfio/pci.c} (71%)
create mode 100644 hw/vfio/platform.c
create mode 100644 include/hw/vfio/vfio-common.h
create mode 100644 include/hw/vfio/vfio-platform.h
rename include/hw/{misc => vfio}/vfio.h (100%)
--
1.8.3.2
* [Qemu-devel] [RFC v4 01/13] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio
@ 2014-07-07 12:27 ` Eric Auger
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, Kim Phillips, patches, will.deacon,
agraf, stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis,
kvmarm
From: Kim Phillips <kim.phillips@linaro.org>
This is done in preparation for the addition of VFIO platform
device support.
Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
---
LICENSE | 2 +-
MAINTAINERS | 2 +-
hw/Makefile.objs | 1 +
hw/misc/Makefile.objs | 1 -
hw/ppc/spapr_pci_vfio.c | 2 +-
hw/vfio/Makefile.objs | 3 +++
hw/{misc/vfio.c => vfio/pci.c} | 2 +-
include/hw/{misc => vfio}/vfio.h | 0
8 files changed, 8 insertions(+), 5 deletions(-)
create mode 100644 hw/vfio/Makefile.objs
rename hw/{misc/vfio.c => vfio/pci.c} (99%)
rename include/hw/{misc => vfio}/vfio.h (100%)
diff --git a/LICENSE b/LICENSE
index da70e94..0e0b4b9 100644
--- a/LICENSE
+++ b/LICENSE
@@ -11,7 +11,7 @@ option) any later version.
As of July 2013, contributions under version 2 of the GNU General Public
License (and no later version) are only accepted for the following files
-or directories: bsd-user/, linux-user/, hw/misc/vfio.c, hw/xen/xen_pt*.
+or directories: bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*.
3) The Tiny Code Generator (TCG) is released under the BSD license
(see license headers in files).
diff --git a/MAINTAINERS b/MAINTAINERS
index e7dc907..eade89f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -618,7 +618,7 @@ F: tests/usb-hcd-ehci-test.c
VFIO
M: Alex Williamson <alex.williamson@redhat.com>
S: Supported
-F: hw/misc/vfio.c
+F: hw/vfio/*
vhost
M: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 52a1464..73afa41 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,6 +26,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += ssi/
devices-dirs-$(CONFIG_SOFTMMU) += timer/
devices-dirs-$(CONFIG_TPM) += tpm/
devices-dirs-$(CONFIG_SOFTMMU) += usb/
+devices-dirs-$(CONFIG_SOFTMMU) += vfio/
devices-dirs-$(CONFIG_VIRTIO) += virtio/
devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
devices-dirs-$(CONFIG_SOFTMMU) += xen/
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 97572a6..d081606 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -21,7 +21,6 @@ common-obj-$(CONFIG_MACIO) += macio/
ifeq ($(CONFIG_PCI), y)
obj-$(CONFIG_KVM) += ivshmem.o
-obj-$(CONFIG_LINUX) += vfio.o
endif
obj-$(CONFIG_REALVIEW) += arm_sysctl.o
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index d3bddf2..144912b 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -20,7 +20,7 @@
#include "hw/ppc/spapr.h"
#include "hw/pci-host/spapr.h"
#include "linux/vfio.h"
-#include "hw/misc/vfio.h"
+#include "hw/vfio/vfio.h"
static Property spapr_phb_vfio_properties[] = {
DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
new file mode 100644
index 0000000..31c7dab
--- /dev/null
+++ b/hw/vfio/Makefile.objs
@@ -0,0 +1,3 @@
+ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_PCI) += pci.o
+endif
diff --git a/hw/misc/vfio.c b/hw/vfio/pci.c
similarity index 99%
rename from hw/misc/vfio.c
rename to hw/vfio/pci.c
index aef4c9c..db2bdcd 100644
--- a/hw/misc/vfio.c
+++ b/hw/vfio/pci.c
@@ -39,7 +39,7 @@
#include "qemu/range.h"
#include "sysemu/kvm.h"
#include "sysemu/sysemu.h"
-#include "hw/misc/vfio.h"
+#include "hw/vfio/vfio.h"
/* #define DEBUG_VFIO */
#ifdef DEBUG_VFIO
diff --git a/include/hw/misc/vfio.h b/include/hw/vfio/vfio.h
similarity index 100%
rename from include/hw/misc/vfio.h
rename to include/hw/vfio/vfio.h
--
1.8.3.2
* [Qemu-devel] [RFC v4 02/13] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
@ 2014-07-07 12:27 ` Eric Auger
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
This prepares for the introduction of VFIOPlatformDevice
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/pci.c | 209 +++++++++++++++++++++++++++++-----------------------------
1 file changed, 105 insertions(+), 104 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index db2bdcd..5c7bfd5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -56,11 +56,11 @@
#define VFIO_ALLOW_KVM_MSI 1
#define VFIO_ALLOW_KVM_MSIX 1
-struct VFIODevice;
+struct VFIOPCIDevice;
typedef struct VFIOQuirk {
MemoryRegion mem;
- struct VFIODevice *vdev;
+ struct VFIOPCIDevice *vdev;
QLIST_ENTRY(VFIOQuirk) next;
struct {
uint32_t base_offset:TARGET_PAGE_BITS;
@@ -122,7 +122,7 @@ typedef struct VFIOINTx {
typedef struct VFIOMSIVector {
EventNotifier interrupt; /* eventfd triggered on interrupt */
EventNotifier kvm_interrupt; /* eventfd triggered for KVM irqfd bypass */
- struct VFIODevice *vdev; /* back pointer to device */
+ struct VFIOPCIDevice *vdev; /* back pointer to device */
MSIMessage msg; /* cache the MSI message so we know when it changes */
int virq; /* KVM irqchip route for QEMU bypass */
bool use;
@@ -185,7 +185,7 @@ typedef struct VFIOMSIXInfo {
void *mmap;
} VFIOMSIXInfo;
-typedef struct VFIODevice {
+typedef struct VFIOPCIDevice {
PCIDevice pdev;
int fd;
VFIOINTx intx;
@@ -203,7 +203,7 @@ typedef struct VFIODevice {
VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
PCIHostDeviceAddress host;
- QLIST_ENTRY(VFIODevice) next;
+ QLIST_ENTRY(VFIOPCIDevice) next;
struct VFIOGroup *group;
EventNotifier err_notifier;
uint32_t features;
@@ -218,13 +218,13 @@ typedef struct VFIODevice {
bool has_pm_reset;
bool needs_reset;
bool rom_read_failed;
-} VFIODevice;
+} VFIOPCIDevice;
typedef struct VFIOGroup {
int fd;
int groupid;
VFIOContainer *container;
- QLIST_HEAD(, VFIODevice) device_list;
+ QLIST_HEAD(, VFIOPCIDevice) device_list;
QLIST_ENTRY(VFIOGroup) next;
QLIST_ENTRY(VFIOGroup) container_next;
} VFIOGroup;
@@ -268,16 +268,16 @@ static QLIST_HEAD(, VFIOGroup)
static int vfio_kvm_device_fd = -1;
#endif
-static void vfio_disable_interrupts(VFIODevice *vdev);
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
uint32_t val, int len);
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
/*
* Common VFIO interrupt disable
*/
-static void vfio_disable_irqindex(VFIODevice *vdev, int index)
+static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
{
struct vfio_irq_set irq_set = {
.argsz = sizeof(irq_set),
@@ -293,7 +293,7 @@ static void vfio_disable_irqindex(VFIODevice *vdev, int index)
/*
* INTx
*/
-static void vfio_unmask_intx(VFIODevice *vdev)
+static void vfio_unmask_intx(VFIOPCIDevice *vdev)
{
struct vfio_irq_set irq_set = {
.argsz = sizeof(irq_set),
@@ -307,7 +307,7 @@ static void vfio_unmask_intx(VFIODevice *vdev)
}
#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIODevice *vdev)
+static void vfio_mask_intx(VFIOPCIDevice *vdev)
{
struct vfio_irq_set irq_set = {
.argsz = sizeof(irq_set),
@@ -338,7 +338,7 @@ static void vfio_mask_intx(VFIODevice *vdev)
*/
static void vfio_intx_mmap_enable(void *opaque)
{
- VFIODevice *vdev = opaque;
+ VFIOPCIDevice *vdev = opaque;
if (vdev->intx.pending) {
timer_mod(vdev->intx.mmap_timer,
@@ -351,7 +351,7 @@ static void vfio_intx_mmap_enable(void *opaque)
static void vfio_intx_interrupt(void *opaque)
{
- VFIODevice *vdev = opaque;
+ VFIOPCIDevice *vdev = opaque;
if (!event_notifier_test_and_clear(&vdev->intx.interrupt)) {
return;
@@ -370,7 +370,7 @@ static void vfio_intx_interrupt(void *opaque)
}
}
-static void vfio_eoi(VFIODevice *vdev)
+static void vfio_eoi(VFIOPCIDevice *vdev)
{
if (!vdev->intx.pending) {
return;
@@ -384,7 +384,7 @@ static void vfio_eoi(VFIODevice *vdev)
vfio_unmask_intx(vdev);
}
-static void vfio_enable_intx_kvm(VFIODevice *vdev)
+static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
{
#ifdef CONFIG_KVM
struct kvm_irqfd irqfd = {
@@ -463,7 +463,7 @@ fail:
#endif
}
-static void vfio_disable_intx_kvm(VFIODevice *vdev)
+static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
{
#ifdef CONFIG_KVM
struct kvm_irqfd irqfd = {
@@ -508,7 +508,7 @@ static void vfio_disable_intx_kvm(VFIODevice *vdev)
static void vfio_update_irq(PCIDevice *pdev)
{
- VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
PCIINTxRoute route;
if (vdev->interrupt != VFIO_INT_INTx) {
@@ -539,7 +539,7 @@ static void vfio_update_irq(PCIDevice *pdev)
vfio_eoi(vdev);
}
-static int vfio_enable_intx(VFIODevice *vdev)
+static int vfio_enable_intx(VFIOPCIDevice *vdev)
{
uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
int ret, argsz;
@@ -605,7 +605,7 @@ static int vfio_enable_intx(VFIODevice *vdev)
return 0;
}
-static void vfio_disable_intx(VFIODevice *vdev)
+static void vfio_disable_intx(VFIOPCIDevice *vdev)
{
int fd;
@@ -632,7 +632,7 @@ static void vfio_disable_intx(VFIODevice *vdev)
static void vfio_msi_interrupt(void *opaque)
{
VFIOMSIVector *vector = opaque;
- VFIODevice *vdev = vector->vdev;
+ VFIOPCIDevice *vdev = vector->vdev;
int nr = vector - vdev->msi_vectors;
if (!event_notifier_test_and_clear(&vector->interrupt)) {
@@ -664,7 +664,7 @@ static void vfio_msi_interrupt(void *opaque)
}
}
-static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
+static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
{
struct vfio_irq_set *irq_set;
int ret = 0, i, argsz;
@@ -746,7 +746,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg)
static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
MSIMessage *msg, IOHandler *handler)
{
- VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
VFIOMSIVector *vector;
int ret;
@@ -835,7 +835,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
{
- VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
VFIOMSIVector *vector = &vdev->msi_vectors[nr];
DPRINTF("%s(%04x:%02x:%02x.%x) vector %d released\n", __func__,
@@ -874,7 +874,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
}
}
-static void vfio_enable_msix(VFIODevice *vdev)
+static void vfio_enable_msix(VFIOPCIDevice *vdev)
{
vfio_disable_interrupts(vdev);
@@ -907,7 +907,7 @@ static void vfio_enable_msix(VFIODevice *vdev)
vdev->host.bus, vdev->host.slot, vdev->host.function);
}
-static void vfio_enable_msi(VFIODevice *vdev)
+static void vfio_enable_msi(VFIOPCIDevice *vdev)
{
int ret, i;
@@ -986,7 +986,7 @@ retry:
vdev->host.function, vdev->nr_vectors);
}
-static void vfio_disable_msi_common(VFIODevice *vdev)
+static void vfio_disable_msi_common(VFIOPCIDevice *vdev)
{
int i;
@@ -1010,7 +1010,7 @@ static void vfio_disable_msi_common(VFIODevice *vdev)
vfio_enable_intx(vdev);
}
-static void vfio_disable_msix(VFIODevice *vdev)
+static void vfio_disable_msix(VFIOPCIDevice *vdev)
{
int i;
@@ -1037,7 +1037,7 @@ static void vfio_disable_msix(VFIODevice *vdev)
vdev->host.bus, vdev->host.slot, vdev->host.function);
}
-static void vfio_disable_msi(VFIODevice *vdev)
+static void vfio_disable_msi(VFIOPCIDevice *vdev)
{
vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
vfio_disable_msi_common(vdev);
@@ -1046,7 +1046,7 @@ static void vfio_disable_msi(VFIODevice *vdev)
vdev->host.bus, vdev->host.slot, vdev->host.function);
}
-static void vfio_update_msi(VFIODevice *vdev)
+static void vfio_update_msi(VFIOPCIDevice *vdev)
{
int i;
@@ -1099,7 +1099,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
#ifdef DEBUG_VFIO
{
- VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+ VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx", 0x%"PRIx64
", %d)\n", __func__, vdev->host.domain, vdev->host.bus,
@@ -1116,7 +1116,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
* which access will service the interrupt, so we're potentially
* getting quite a few host interrupts per guest interrupt.
*/
- vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+ vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
}
static uint64_t vfio_bar_read(void *opaque,
@@ -1154,7 +1154,7 @@ static uint64_t vfio_bar_read(void *opaque,
#ifdef DEBUG_VFIO
{
- VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+ VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx
", %d) = 0x%"PRIx64"\n", __func__, vdev->host.domain,
@@ -1164,7 +1164,7 @@ static uint64_t vfio_bar_read(void *opaque,
#endif
/* Same as write above */
- vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+ vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
return data;
}
@@ -1175,7 +1175,7 @@ static const MemoryRegionOps vfio_bar_ops = {
.endianness = DEVICE_NATIVE_ENDIAN,
};
-static void vfio_pci_load_rom(VFIODevice *vdev)
+static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
{
struct vfio_region_info reg_info = {
.argsz = sizeof(reg_info),
@@ -1233,7 +1233,7 @@ static void vfio_pci_load_rom(VFIODevice *vdev)
static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
{
- VFIODevice *vdev = opaque;
+ VFIOPCIDevice *vdev = opaque;
union {
uint8_t byte;
uint16_t word;
@@ -1283,7 +1283,7 @@ static const MemoryRegionOps vfio_rom_ops = {
.endianness = DEVICE_NATIVE_ENDIAN,
};
-static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
+static bool vfio_blacklist_opt_rom(VFIOPCIDevice *vdev)
{
PCIDevice *pdev = &vdev->pdev;
uint16_t vendor_id, device_id;
@@ -1303,7 +1303,7 @@ static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
return false;
}
-static void vfio_pci_size_rom(VFIODevice *vdev)
+static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
{
uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
@@ -1482,7 +1482,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
hwaddr addr, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
uint64_t data;
if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
@@ -1515,7 +1515,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
uint64_t data, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
if (ranges_overlap(addr, size,
quirk->data.address_offset, quirk->data.address_size)) {
@@ -1569,7 +1569,7 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
hwaddr addr, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
uint64_t data;
@@ -1599,7 +1599,7 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
uint64_t data, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
@@ -1644,7 +1644,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
hwaddr addr, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
uint64_t data = vfio_pci_read_config(&vdev->pdev,
PCI_BASE_ADDRESS_0 + (4 * 4) + 1,
size);
@@ -1658,7 +1658,7 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
.endianness = DEVICE_LITTLE_ENDIAN,
};
-static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -1701,7 +1701,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
* that only read-only access is provided, but we drop writes when the window
* is enabled to config space nonetheless.
*/
-static void vfio_probe_ati_bar4_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -1763,7 +1763,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
hwaddr addr, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
switch (addr) {
case 4: /* address */
@@ -1805,7 +1805,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
uint64_t data, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
switch (addr) {
case 4: /* address */
@@ -1852,7 +1852,7 @@ static const MemoryRegionOps vfio_rtl8168_window_quirk = {
.endianness = DEVICE_LITTLE_ENDIAN,
};
-static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -1880,7 +1880,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
/*
* Trap the BAR2 MMIO window to config space as well.
*/
-static void vfio_probe_ati_bar2_4000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -1948,7 +1948,7 @@ static uint64_t vfio_nvidia_3d0_quirk_read(void *opaque,
hwaddr addr, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
PCIDevice *pdev = &vdev->pdev;
uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
addr + quirk->data.base_offset, size);
@@ -1967,7 +1967,7 @@ static void vfio_nvidia_3d0_quirk_write(void *opaque, hwaddr addr,
uint64_t data, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
PCIDevice *pdev = &vdev->pdev;
switch (quirk->data.flags) {
@@ -2014,7 +2014,7 @@ static const MemoryRegionOps vfio_nvidia_3d0_quirk = {
.endianness = DEVICE_LITTLE_ENDIAN,
};
-static void vfio_vga_probe_nvidia_3d0_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -2106,7 +2106,7 @@ static const MemoryRegionOps vfio_nvidia_bar5_window_quirk = {
.endianness = DEVICE_LITTLE_ENDIAN,
};
-static void vfio_probe_nvidia_bar5_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -2141,7 +2141,7 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
uint64_t data, unsigned size)
{
VFIOQuirk *quirk = opaque;
- VFIODevice *vdev = quirk->vdev;
+ VFIOPCIDevice *vdev = quirk->vdev;
PCIDevice *pdev = &vdev->pdev;
hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
@@ -2174,7 +2174,7 @@ static const MemoryRegionOps vfio_nvidia_88000_quirk = {
*
* Here's offset 0x88000...
*/
-static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -2208,7 +2208,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
/*
* And here's the same for BAR0 offset 0x1800...
*/
-static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
{
PCIDevice *pdev = &vdev->pdev;
VFIOQuirk *quirk;
@@ -2252,13 +2252,13 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
/*
* Common quirk probe entry points.
*/
-static void vfio_vga_quirk_setup(VFIODevice *vdev)
+static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
{
vfio_vga_probe_ati_3c3_quirk(vdev);
vfio_vga_probe_nvidia_3d0_quirk(vdev);
}
-static void vfio_vga_quirk_teardown(VFIODevice *vdev)
+static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
{
int i;
@@ -2273,7 +2273,7 @@ static void vfio_vga_quirk_teardown(VFIODevice *vdev)
}
}
-static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
{
vfio_probe_ati_bar4_window_quirk(vdev, nr);
vfio_probe_ati_bar2_4000_quirk(vdev, nr);
@@ -2283,7 +2283,7 @@ static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
vfio_probe_rtl8168_bar2_window_quirk(vdev, nr);
}
-static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
@@ -2301,7 +2301,7 @@ static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
*/
static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
{
- VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -2336,7 +2336,7 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
uint32_t val, int len)
{
- VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
uint32_t val_le = cpu_to_le32(val);
DPRINTF("%s(%04x:%02x:%02x.%x, @0x%x, 0x%x, len=0x%x)\n", __func__,
@@ -2693,7 +2693,7 @@ static void vfio_listener_release(VFIOContainer *container)
/*
* Interrupt setup
*/
-static void vfio_disable_interrupts(VFIODevice *vdev)
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
{
switch (vdev->interrupt) {
case VFIO_INT_INTx:
@@ -2708,7 +2708,7 @@ static void vfio_disable_interrupts(VFIODevice *vdev)
}
}
-static int vfio_setup_msi(VFIODevice *vdev, int pos)
+static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
{
uint16_t ctrl;
bool msi_64bit, msi_maskbit;
@@ -2748,7 +2748,7 @@ static int vfio_setup_msi(VFIODevice *vdev, int pos)
* need to first look for where the MSI-X table lives. So we
* unfortunately split MSI-X setup across two functions.
*/
-static int vfio_early_setup_msix(VFIODevice *vdev)
+static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
{
uint8_t pos;
uint16_t ctrl;
@@ -2794,7 +2794,7 @@ static int vfio_early_setup_msix(VFIODevice *vdev)
return 0;
}
-static int vfio_setup_msix(VFIODevice *vdev, int pos)
+static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
{
int ret;
@@ -2814,7 +2814,7 @@ static int vfio_setup_msix(VFIODevice *vdev, int pos)
return 0;
}
-static void vfio_teardown_msi(VFIODevice *vdev)
+static void vfio_teardown_msi(VFIOPCIDevice *vdev)
{
msi_uninit(&vdev->pdev);
@@ -2827,7 +2827,7 @@ static void vfio_teardown_msi(VFIODevice *vdev)
/*
* Resource setup
*/
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
{
int i;
@@ -2845,7 +2845,7 @@ static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
}
}
-static void vfio_unmap_bar(VFIODevice *vdev, int nr)
+static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
@@ -2868,7 +2868,7 @@ static void vfio_unmap_bar(VFIODevice *vdev, int nr)
memory_region_destroy(&bar->mem);
}
-static int vfio_mmap_bar(VFIODevice *vdev, VFIOBAR *bar,
+static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
MemoryRegion *mem, MemoryRegion *submem,
void **map, size_t size, off_t offset,
const char *name)
@@ -2906,7 +2906,7 @@ empty_region:
return ret;
}
-static void vfio_map_bar(VFIODevice *vdev, int nr)
+static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
unsigned size = bar->size;
@@ -2975,7 +2975,7 @@ static void vfio_map_bar(VFIODevice *vdev, int nr)
vfio_bar_quirk_setup(vdev, nr);
}
-static void vfio_map_bars(VFIODevice *vdev)
+static void vfio_map_bars(VFIOPCIDevice *vdev)
{
int i;
@@ -3007,7 +3007,7 @@ static void vfio_map_bars(VFIODevice *vdev)
}
}
-static void vfio_unmap_bars(VFIODevice *vdev)
+static void vfio_unmap_bars(VFIOPCIDevice *vdev)
{
int i;
@@ -3046,7 +3046,7 @@ static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
}
-static void vfio_add_emulated_word(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_word(VFIOPCIDevice *vdev, int pos,
uint16_t val, uint16_t mask)
{
vfio_set_word_bits(vdev->pdev.config + pos, val, mask);
@@ -3059,7 +3059,7 @@ static void vfio_set_long_bits(uint8_t *buf, uint32_t val, uint32_t mask)
pci_set_long(buf, (pci_get_long(buf) & ~mask) | val);
}
-static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_long(VFIOPCIDevice *vdev, int pos,
uint32_t val, uint32_t mask)
{
vfio_set_long_bits(vdev->pdev.config + pos, val, mask);
@@ -3067,7 +3067,7 @@ static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
}
-static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
+static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size)
{
uint16_t flags;
uint8_t type;
@@ -3159,7 +3159,7 @@ static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
return pos;
}
-static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pcie_flr(VFIOPCIDevice *vdev, uint8_t pos)
{
uint32_t cap = pci_get_long(vdev->pdev.config + pos + PCI_EXP_DEVCAP);
@@ -3171,7 +3171,7 @@ static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
}
}
-static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pm_reset(VFIOPCIDevice *vdev, uint8_t pos)
{
uint16_t csr = pci_get_word(vdev->pdev.config + pos + PCI_PM_CTRL);
@@ -3183,7 +3183,7 @@ static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
}
}
-static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
{
uint8_t cap = pci_get_byte(vdev->pdev.config + pos + PCI_AF_CAP);
@@ -3195,7 +3195,7 @@ static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
}
}
-static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
+static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
{
PCIDevice *pdev = &vdev->pdev;
uint8_t cap_id, next, size;
@@ -3270,7 +3270,7 @@ static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
return 0;
}
-static int vfio_add_capabilities(VFIODevice *vdev)
+static int vfio_add_capabilities(VFIOPCIDevice *vdev)
{
PCIDevice *pdev = &vdev->pdev;
@@ -3282,7 +3282,7 @@ static int vfio_add_capabilities(VFIODevice *vdev)
return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
}
-static void vfio_pci_pre_reset(VFIODevice *vdev)
+static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
{
PCIDevice *pdev = &vdev->pdev;
uint16_t cmd;
@@ -3319,7 +3319,7 @@ static void vfio_pci_pre_reset(VFIODevice *vdev)
vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
}
-static void vfio_pci_post_reset(VFIODevice *vdev)
+static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
{
vfio_enable_intx(vdev);
}
@@ -3331,7 +3331,7 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
host1->slot == host2->slot && host1->function == host2->function);
}
-static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
{
VFIOGroup *group;
struct vfio_pci_hot_reset_info *info;
@@ -3381,7 +3381,7 @@ static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
/* Verify that we have all the groups required */
for (i = 0; i < info->count; i++) {
PCIHostDeviceAddress host;
- VFIODevice *tmp;
+ VFIOPCIDevice *tmp;
host.domain = devices[i].segment;
host.bus = devices[i].bus;
@@ -3473,7 +3473,7 @@ out:
/* Re-enable INTx on affected devices */
for (i = 0; i < info->count; i++) {
PCIHostDeviceAddress host;
- VFIODevice *tmp;
+ VFIOPCIDevice *tmp;
host.domain = devices[i].segment;
host.bus = devices[i].bus;
@@ -3523,12 +3523,12 @@ out_single:
* _one() will only do a hot reset for the one in-use devices case, calling
* _multi() will do nothing if a _one() would have been sufficient.
*/
-static int vfio_pci_hot_reset_one(VFIODevice *vdev)
+static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
{
return vfio_pci_hot_reset(vdev, true);
}
-static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
{
return vfio_pci_hot_reset(vdev, false);
}
@@ -3536,7 +3536,7 @@ static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
static void vfio_pci_reset_handler(void *opaque)
{
VFIOGroup *group;
- VFIODevice *vdev;
+ VFIOPCIDevice *vdev;
QLIST_FOREACH(group, &group_list, next) {
QLIST_FOREACH(vdev, &group->device_list, next) {
@@ -3874,7 +3874,8 @@ static void vfio_put_group(VFIOGroup *group)
}
}
-static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev)
+static int vfio_get_device(VFIOGroup *group, const char *name,
+ VFIOPCIDevice *vdev)
{
struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
@@ -4028,7 +4029,7 @@ error:
return ret;
}
-static void vfio_put_device(VFIODevice *vdev)
+static void vfio_put_device(VFIOPCIDevice *vdev)
{
QLIST_REMOVE(vdev, next);
vdev->group = NULL;
@@ -4042,7 +4043,7 @@ static void vfio_put_device(VFIODevice *vdev)
static void vfio_err_notifier_handler(void *opaque)
{
- VFIODevice *vdev = opaque;
+ VFIOPCIDevice *vdev = opaque;
if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
return;
@@ -4071,7 +4072,7 @@ static void vfio_err_notifier_handler(void *opaque)
* and continue after disabling error recovery support for the
* device.
*/
-static void vfio_register_err_notifier(VFIODevice *vdev)
+static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
{
int ret;
int argsz;
@@ -4112,7 +4113,7 @@ static void vfio_register_err_notifier(VFIODevice *vdev)
g_free(irq_set);
}
-static void vfio_unregister_err_notifier(VFIODevice *vdev)
+static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
{
int argsz;
struct vfio_irq_set *irq_set;
@@ -4147,7 +4148,7 @@ static void vfio_unregister_err_notifier(VFIODevice *vdev)
static int vfio_initfn(PCIDevice *pdev)
{
- VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
VFIOGroup *group;
char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
ssize_t len;
@@ -4301,7 +4302,7 @@ out_put:
static void vfio_exitfn(PCIDevice *pdev)
{
- VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
VFIOGroup *group = vdev->group;
vfio_unregister_err_notifier(vdev);
@@ -4321,7 +4322,7 @@ static void vfio_exitfn(PCIDevice *pdev)
static void vfio_pci_reset(DeviceState *dev)
{
PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
- VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain,
vdev->host.bus, vdev->host.slot, vdev->host.function);
@@ -4353,16 +4354,16 @@ post_reset:
}
static Property vfio_pci_dev_properties[] = {
- DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIODevice, host),
- DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIODevice,
+ DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
+ DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
intx.mmap_timeout, 1100),
- DEFINE_PROP_BIT("x-vga", VFIODevice, features,
+ DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
VFIO_FEATURE_ENABLE_VGA_BIT, false),
- DEFINE_PROP_INT32("bootindex", VFIODevice, bootindex, -1),
+ DEFINE_PROP_INT32("bootindex", VFIOPCIDevice, bootindex, -1),
/*
* TODO - support passed fds... is this necessary?
- * DEFINE_PROP_STRING("vfiofd", VFIODevice, vfiofd_name),
- * DEFINE_PROP_STRING("vfiogroupfd, VFIODevice, vfiogroupfd_name),
+ * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
+ * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
*/
DEFINE_PROP_END_OF_LIST(),
};
@@ -4392,7 +4393,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
static const TypeInfo vfio_pci_dev_info = {
.name = "vfio-pci",
.parent = TYPE_PCI_DEVICE,
- .instance_size = sizeof(VFIODevice),
+ .instance_size = sizeof(VFIOPCIDevice),
.class_init = vfio_pci_dev_class_init,
};
--
1.8.3.2
* [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 01/13] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 02/13] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-08 18:55 ` Alex Williamson
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice Eric Auger
` (9 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/pci.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c7bfd5..a7df3de 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -18,26 +18,14 @@
* Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
*/
-#include <dirent.h>
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
-#include <sys/stat.h>
-#include <sys/types.h>
-#include <unistd.h>
-
-#include "config.h"
#include "exec/address-spaces.h"
-#include "exec/memory.h"
#include "hw/pci/msi.h"
#include "hw/pci/msix.h"
-#include "hw/pci/pci.h"
-#include "qemu-common.h"
#include "qemu/error-report.h"
-#include "qemu/event_notifier.h"
-#include "qemu/queue.h"
#include "qemu/range.h"
-#include "sysemu/kvm.h"
#include "sysemu/sysemu.h"
#include "hw/vfio/vfio.h"
--
1.8.3.2
* [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (2 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-08 22:41 ` Alex Williamson
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 05/13] hw/vfio/pci: Introduce VFIORegion Eric Auger
` (8 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
Introduce the VFIODevice struct, which will be shared by
VFIOPCIDevice and VFIOPlatformDevice.
Additional fields will be added to it in later patches, to keep
this one easy to review.
The group's device_list becomes a list of VFIODevice.
As a result, the reset handler is reworked: it becomes generic and
calls VFIODevice ops that are specialized by each parent object.
Functions that iterate over this list must also account for
devices that are not VFIOPCIDevice. The type field is used
to tell them apart.
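As a rough illustration of the scheme described above, here is a
standalone sketch, not code from this patch. All names (BaseDevice,
PciDevice, pci_from_base, count_needing_reset) are hypothetical
stand-ins for VFIODevice, VFIOPCIDevice and the reset handler; only
the reset condition mirrors the one in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local container_of, so the sketch builds outside of QEMU. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical type tags, playing the role of VFIO_DEVICE_TYPE_*. */
enum { DEV_TYPE_PCI = 0, DEV_TYPE_PLATFORM = 1 };

typedef struct BaseDevice BaseDevice;

/* Per-bus callbacks, playing the role of VFIODeviceOps. */
typedef struct BaseDeviceOps {
    bool (*compute_needs_reset)(BaseDevice *base);
} BaseDeviceOps;

/* Bus-agnostic state, playing the role of VFIODevice. */
struct BaseDevice {
    int type;
    bool reset_works;
    bool needs_reset;
    const BaseDeviceOps *ops;
};

/* Bus-specific object embedding the base (not as its first member). */
typedef struct PciDevice {
    int pci_private;    /* placeholder for PCI-only state */
    BaseDevice base;
    bool has_flr;
    bool has_pm_reset;
} PciDevice;

/* Downcast, guarded by the type tag. */
static PciDevice *pci_from_base(BaseDevice *base)
{
    assert(base->type == DEV_TYPE_PCI);
    return container_of(base, PciDevice, base);
}

/* PCI specialization of the op; same condition as in the patch. */
static bool pci_compute_needs_reset(BaseDevice *base)
{
    PciDevice *pdev = pci_from_base(base);

    if (!base->reset_works || (!pdev->has_flr && pdev->has_pm_reset)) {
        base->needs_reset = true;
    }
    return base->needs_reset;
}

static const BaseDeviceOps pci_ops = {
    .compute_needs_reset = pci_compute_needs_reset,
};

/* Generic code: only sees BaseDevice and dispatches through ops. */
static int count_needing_reset(BaseDevice **devs, size_t n)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i]->ops->compute_needs_reset(devs[i])) {
            count++;
        }
    }
    return count;
}
```

The generic loop never needs to know the concrete type; code that
does need PCI-only fields checks the tag first and then downcasts.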
Take the opportunity to change the prototypes of
vfio_unmask_intx, vfio_mask_intx and vfio_disable_irqindex, which now
take a VFIODevice. The first two are renamed vfio_unmask_irqindex
and vfio_mask_irqindex.
The IRQ index is passed as a parameter in anticipation of their use
for platform IRQs.
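To show why passing the index makes these helpers bus-agnostic, the
sketch below fills a struct vfio_irq_set for an arbitrary IRQ index.
The helper name fill_irq_mask_request is hypothetical and the sketch
deliberately stops short of the ioctl, so it can be built on a Linux
host with kernel headers but without a VFIO device; the struct and
flag names come from <linux/vfio.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: prepare a mask or unmask request for the first
 * interrupt of the given IRQ index. The caller would then pass it to
 * ioctl(fd, VFIO_DEVICE_SET_IRQS, req) on the device fd.
 */
static void fill_irq_mask_request(struct vfio_irq_set *req,
                                  uint32_t index, int mask)
{
    assert(req != NULL);

    memset(req, 0, sizeof(*req));
    req->argsz = sizeof(*req);
    req->flags = VFIO_IRQ_SET_DATA_NONE |
                 (mask ? VFIO_IRQ_SET_ACTION_MASK
                       : VFIO_IRQ_SET_ACTION_UNMASK);
    req->index = index;   /* no longer hard-coded to the PCI INTx index */
    req->start = 0;
    req->count = 1;
}
```

A PCI caller would pass VFIO_PCI_INTX_IRQ_INDEX, as the updated call
sites in this patch do; a platform caller could pass its own index.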
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/pci.c | 243 +++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 149 insertions(+), 94 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a7df3de..d0bee62 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -44,6 +44,11 @@
#define VFIO_ALLOW_KVM_MSI 1
#define VFIO_ALLOW_KVM_MSIX 1
+enum {
+ VFIO_DEVICE_TYPE_PCI = 0,
+ VFIO_DEVICE_TYPE_PLATFORM = 1,
+};
+
struct VFIOPCIDevice;
typedef struct VFIOQuirk {
@@ -173,9 +178,27 @@ typedef struct VFIOMSIXInfo {
void *mmap;
} VFIOMSIXInfo;
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+ QLIST_ENTRY(VFIODevice) next;
+ struct VFIOGroup *group;
+ char *name;
+ int fd;
+ int type;
+ bool reset_works;
+ bool needs_reset;
+ VFIODeviceOps *ops;
+} VFIODevice;
+
+struct VFIODeviceOps {
+ bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+ int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+};
+
typedef struct VFIOPCIDevice {
PCIDevice pdev;
- int fd;
+ VFIODevice vbasedev;
VFIOINTx intx;
unsigned int config_size;
uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
@@ -191,20 +214,16 @@ typedef struct VFIOPCIDevice {
VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
PCIHostDeviceAddress host;
- QLIST_ENTRY(VFIOPCIDevice) next;
- struct VFIOGroup *group;
EventNotifier err_notifier;
uint32_t features;
#define VFIO_FEATURE_ENABLE_VGA_BIT 0
#define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
int32_t bootindex;
uint8_t pm_cap;
- bool reset_works;
bool has_vga;
bool pci_aer;
bool has_flr;
bool has_pm_reset;
- bool needs_reset;
bool rom_read_failed;
} VFIOPCIDevice;
@@ -212,7 +231,7 @@ typedef struct VFIOGroup {
int fd;
int groupid;
VFIOContainer *container;
- QLIST_HEAD(, VFIOPCIDevice) device_list;
+ QLIST_HEAD(, VFIODevice) device_list;
QLIST_ENTRY(VFIOGroup) next;
QLIST_ENTRY(VFIOGroup) container_next;
} VFIOGroup;
@@ -265,7 +284,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
/*
* Common VFIO interrupt disable
*/
-static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
+static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
{
struct vfio_irq_set irq_set = {
.argsz = sizeof(irq_set),
@@ -275,37 +294,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
.count = 0,
};
- ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}
/*
* INTx
*/
-static void vfio_unmask_intx(VFIOPCIDevice *vdev)
+static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
{
struct vfio_irq_set irq_set = {
.argsz = sizeof(irq_set),
.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
- .index = VFIO_PCI_INTX_IRQ_INDEX,
+ .index = index,
.start = 0,
.count = 1,
};
- ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}
#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIOPCIDevice *vdev)
+static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
{
struct vfio_irq_set irq_set = {
.argsz = sizeof(irq_set),
.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
- .index = VFIO_PCI_INTX_IRQ_INDEX,
+ .index = index,
.start = 0,
.count = 1,
};
- ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}
#endif
@@ -369,7 +388,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
vdev->intx.pending = false;
pci_irq_deassert(&vdev->pdev);
- vfio_unmask_intx(vdev);
+ vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
}
static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -392,7 +411,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
/* Get to a known interrupt state */
qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
- vfio_mask_intx(vdev);
+ vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
vdev->intx.pending = false;
pci_irq_deassert(&vdev->pdev);
@@ -422,7 +441,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
*pfd = irqfd.resamplefd;
- ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
g_free(irq_set);
if (ret) {
error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
@@ -430,7 +449,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
}
/* Let'em rip */
- vfio_unmask_intx(vdev);
+ vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
vdev->intx.kvm_accel = true;
@@ -447,7 +466,7 @@ fail_irqfd:
event_notifier_cleanup(&vdev->intx.unmask);
fail:
qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
- vfio_unmask_intx(vdev);
+ vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
#endif
}
@@ -468,7 +487,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
* Get to a known state, hardware masked, QEMU ready to accept new
* interrupts, QEMU IRQ de-asserted.
*/
- vfio_mask_intx(vdev);
+ vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
vdev->intx.pending = false;
pci_irq_deassert(&vdev->pdev);
@@ -486,7 +505,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
vdev->intx.kvm_accel = false;
/* If we've missed an event, let it re-fire through QEMU */
- vfio_unmask_intx(vdev);
+ vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
DPRINTF("%s(%04x:%02x:%02x.%x) KVM INTx accel disabled\n",
__func__, vdev->host.domain, vdev->host.bus,
@@ -574,7 +593,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
*pfd = event_notifier_get_fd(&vdev->intx.interrupt);
qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
- ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
g_free(irq_set);
if (ret) {
error_report("vfio: Error: Failed to setup INTx fd: %m");
@@ -599,7 +618,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
timer_del(vdev->intx.mmap_timer);
vfio_disable_intx_kvm(vdev);
- vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+ vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
vdev->intx.pending = false;
pci_irq_deassert(&vdev->pdev);
vfio_mmap_set_enabled(vdev, true);
@@ -678,7 +697,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
}
}
- ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
g_free(irq_set);
@@ -777,7 +796,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
* increase them as needed.
*/
if (vdev->nr_vectors < nr + 1) {
- vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+ vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
vdev->nr_vectors = nr + 1;
ret = vfio_enable_vectors(vdev, true);
if (ret) {
@@ -805,7 +824,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
*pfd = event_notifier_get_fd(&vector->interrupt);
}
- ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
g_free(irq_set);
if (ret) {
error_report("vfio: failed to modify vector, %d", ret);
@@ -856,7 +875,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
*pfd = event_notifier_get_fd(&vector->interrupt);
- ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
g_free(irq_set);
}
@@ -1016,7 +1035,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
}
if (vdev->nr_vectors) {
- vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+ vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
}
vfio_disable_msi_common(vdev);
@@ -1027,7 +1046,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
static void vfio_disable_msi(VFIOPCIDevice *vdev)
{
- vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
+ vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
vfio_disable_msi_common(vdev);
DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain,
@@ -1173,7 +1192,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
off_t off = 0;
size_t bytes;
- if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
+ if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
error_report("vfio: Error getting ROM info: %m");
return;
}
@@ -1203,7 +1222,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
memset(vdev->rom, 0xff, size);
while (size) {
- bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
+ bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
+ size, vdev->rom_offset + off);
if (bytes == 0) {
break;
} else if (bytes > 0) {
@@ -1297,6 +1317,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
DeviceState *dev = DEVICE(vdev);
char name[32];
+ int fd = vdev->vbasedev.fd;
if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
/* Since pci handles romfile, just print a message and return */
@@ -1315,10 +1336,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
* Use the same size ROM BAR as the physical device. The contents
* will get filled in later when the guest tries to read it.
*/
- if (pread(vdev->fd, &orig, 4, offset) != 4 ||
- pwrite(vdev->fd, &size, 4, offset) != 4 ||
- pread(vdev->fd, &size, 4, offset) != 4 ||
- pwrite(vdev->fd, &orig, 4, offset) != 4) {
+ if (pread(fd, &orig, 4, offset) != 4 ||
+ pwrite(fd, &size, 4, offset) != 4 ||
+ pread(fd, &size, 4, offset) != 4 ||
+ pwrite(fd, &orig, 4, offset) != 4) {
error_report("%s(%04x:%02x:%02x.%x) failed: %m",
__func__, vdev->host.domain, vdev->host.bus,
vdev->host.slot, vdev->host.function);
@@ -2302,7 +2323,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
ssize_t ret;
- ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
+ ret = pread(vdev->vbasedev.fd, &phys_val, len,
+ vdev->config_offset + addr);
if (ret != len) {
error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
__func__, vdev->host.domain, vdev->host.bus,
@@ -2332,7 +2354,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
vdev->host.function, addr, val, len);
/* Write everything to VFIO, let it filter out what we can't write */
- if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
+ if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
+ != len) {
error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
__func__, vdev->host.domain, vdev->host.bus,
vdev->host.slot, vdev->host.function, addr, val, len);
@@ -2702,7 +2725,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
bool msi_64bit, msi_maskbit;
int ret, entries;
- if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+ if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
return -errno;
}
@@ -2741,23 +2764,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
uint8_t pos;
uint16_t ctrl;
uint32_t table, pba;
+ int fd = vdev->vbasedev.fd;
pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
if (!pos) {
return 0;
}
- if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+ if (pread(fd, &ctrl, sizeof(ctrl),
vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
return -errno;
}
- if (pread(vdev->fd, &table, sizeof(table),
+ if (pread(fd, &table, sizeof(table),
vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
return -errno;
}
- if (pread(vdev->fd, &pba, sizeof(pba),
+ if (pread(fd, &pba, sizeof(pba),
vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
return -errno;
}
@@ -2913,7 +2937,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
vdev->host.function, nr);
/* Determine what type of BAR this is for registration */
- ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
+ ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
if (ret != sizeof(pci_bar)) {
error_report("vfio: Failed to read BAR %d (%m)", nr);
@@ -3334,12 +3358,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
single ? "one" : "multi");
vfio_pci_pre_reset(vdev);
- vdev->needs_reset = false;
+ vdev->vbasedev.needs_reset = false;
info = g_malloc0(sizeof(*info));
info->argsz = sizeof(*info);
- ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
if (ret && errno != ENOSPC) {
ret = -errno;
if (!vdev->has_pm_reset) {
@@ -3355,7 +3379,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
info->argsz = sizeof(*info) + (count * sizeof(*devices));
devices = &info->devices[0];
- ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
if (ret) {
ret = -errno;
error_report("vfio: hot reset info failed: %m");
@@ -3370,6 +3394,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
for (i = 0; i < info->count; i++) {
PCIHostDeviceAddress host;
VFIOPCIDevice *tmp;
+ VFIODevice *vbasedev_iter;
host.domain = devices[i].segment;
host.bus = devices[i].bus;
@@ -3401,7 +3426,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
}
/* Prep dependent devices for reset and clear our marker. */
- QLIST_FOREACH(tmp, &group->device_list, next) {
+ QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+ if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+ continue;
+ }
+ tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
if (vfio_pci_host_match(&host, &tmp->host)) {
if (single) {
DPRINTF("vfio: found another in-use device "
@@ -3411,7 +3440,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
goto out_single;
}
vfio_pci_pre_reset(tmp);
- tmp->needs_reset = false;
+ tmp->vbasedev.needs_reset = false;
multi = true;
break;
}
@@ -3450,7 +3479,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
}
/* Bus reset! */
- ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
g_free(reset);
DPRINTF("%04x:%02x:%02x.%x hot reset: %s\n", vdev->host.domain,
@@ -3462,6 +3491,7 @@ out:
for (i = 0; i < info->count; i++) {
PCIHostDeviceAddress host;
VFIOPCIDevice *tmp;
+ VFIODevice *vbasedev_iter;
host.domain = devices[i].segment;
host.bus = devices[i].bus;
@@ -3482,7 +3512,11 @@ out:
break;
}
- QLIST_FOREACH(tmp, &group->device_list, next) {
+ QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+ if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+ continue;
+ }
+ tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
if (vfio_pci_host_match(&host, &tmp->host)) {
vfio_pci_post_reset(tmp);
break;
@@ -3516,28 +3550,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
return vfio_pci_hot_reset(vdev, true);
}
-static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
{
+ VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
return vfio_pci_hot_reset(vdev, false);
}
-static void vfio_pci_reset_handler(void *opaque)
+static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
+{
+ VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+ if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+ vbasedev->needs_reset = true;
+ }
+ return vbasedev->needs_reset;
+}
+
+static VFIODeviceOps vfio_pci_ops = {
+ .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
+ .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+};
+
+static void vfio_reset_handler(void *opaque)
{
VFIOGroup *group;
- VFIOPCIDevice *vdev;
+ VFIODevice *vbasedev;
QLIST_FOREACH(group, &group_list, next) {
- QLIST_FOREACH(vdev, &group->device_list, next) {
- if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
- vdev->needs_reset = true;
- }
+ QLIST_FOREACH(vbasedev, &group->device_list, next) {
+ vbasedev->ops->vfio_compute_needs_reset(vbasedev);
}
}
QLIST_FOREACH(group, &group_list, next) {
- QLIST_FOREACH(vdev, &group->device_list, next) {
- if (vdev->needs_reset) {
- vfio_pci_hot_reset_multi(vdev);
+ QLIST_FOREACH(vbasedev, &group->device_list, next) {
+ if (vbasedev->needs_reset) {
+ vbasedev->ops->vfio_hot_reset_multi(vbasedev);
}
}
}
@@ -3682,7 +3729,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
if (container->iommu_data.type1.error) {
ret = container->iommu_data.type1.error;
- error_report("vfio: memory listener initialization failed for container");
+ error_report("vfio: memory listener initialization failed"
+ " for container");
goto listener_release_exit;
}
@@ -3826,7 +3874,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
}
if (QLIST_EMPTY(&group_list)) {
- qemu_register_reset(vfio_pci_reset_handler, NULL);
+ qemu_register_reset(vfio_reset_handler, NULL);
}
QLIST_INSERT_HEAD(&group_list, group, next);
@@ -3858,7 +3906,7 @@ static void vfio_put_group(VFIOGroup *group)
g_free(group);
if (QLIST_EMPTY(&group_list)) {
- qemu_unregister_reset(vfio_pci_reset_handler, NULL);
+ qemu_unregister_reset(vfio_reset_handler, NULL);
}
}
@@ -3879,12 +3927,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
return ret;
}
- vdev->fd = ret;
- vdev->group = group;
- QLIST_INSERT_HEAD(&group->device_list, vdev, next);
+ vdev->vbasedev.fd = ret;
+ vdev->vbasedev.group = group;
+ QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
/* Sanity check device */
- ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
if (ret) {
error_report("vfio: error getting device info: %m");
goto error;
@@ -3898,7 +3946,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
goto error;
}
- vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+ vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
error_report("vfio: unexpected number of io regions %u",
@@ -3914,7 +3962,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
reg_info.index = i;
- ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
if (ret) {
error_report("vfio: Error getting region %d info: %m", i);
goto error;
@@ -3928,14 +3976,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
vdev->bars[i].flags = reg_info.flags;
vdev->bars[i].size = reg_info.size;
vdev->bars[i].fd_offset = reg_info.offset;
- vdev->bars[i].fd = vdev->fd;
+ vdev->bars[i].fd = vdev->vbasedev.fd;
vdev->bars[i].nr = i;
QLIST_INIT(&vdev->bars[i].quirks);
}
reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
- ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
if (ret) {
error_report("vfio: Error getting config info: %m");
goto error;
@@ -3959,7 +4007,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
.index = VFIO_PCI_VGA_REGION_INDEX,
};
- ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
if (ret) {
error_report(
"vfio: Device does not support requested feature x-vga");
@@ -3976,7 +4024,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
}
vdev->vga.fd_offset = vga_info.offset;
- vdev->vga.fd = vdev->fd;
+ vdev->vga.fd = vdev->vbasedev.fd;
vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
@@ -3994,7 +4042,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
}
irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
- ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
if (ret) {
/* This can fail for an old kernel or legacy PCI dev */
DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
@@ -4010,19 +4058,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
error:
if (ret) {
- QLIST_REMOVE(vdev, next);
- vdev->group = NULL;
- close(vdev->fd);
+ QLIST_REMOVE(&vdev->vbasedev, next);
+ vdev->vbasedev.group = NULL;
+ close(vdev->vbasedev.fd);
}
return ret;
}
static void vfio_put_device(VFIOPCIDevice *vdev)
{
- QLIST_REMOVE(vdev, next);
- vdev->group = NULL;
+ QLIST_REMOVE(&vdev->vbasedev, next);
+ vdev->vbasedev.group = NULL;
DPRINTF("vfio_put_device: close vdev->fd\n");
- close(vdev->fd);
+ close(vdev->vbasedev.fd);
+ g_free(vdev->vbasedev.name);
if (vdev->msix) {
g_free(vdev->msix);
vdev->msix = NULL;
@@ -4091,7 +4140,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
*pfd = event_notifier_get_fd(&vdev->err_notifier);
qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
- ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
if (ret) {
error_report("vfio: Failed to set up error notification");
qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
@@ -4124,7 +4173,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
pfd = (int32_t *)&irq_set->data;
*pfd = -1;
- ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
if (ret) {
error_report("vfio: Failed to de-assign error fd: %m");
}
@@ -4136,7 +4185,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
static int vfio_initfn(PCIDevice *pdev)
{
- VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+ VFIODevice *vbasedev_iter;
VFIOGroup *group;
char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
ssize_t len;
@@ -4154,6 +4204,14 @@ static int vfio_initfn(PCIDevice *pdev)
return -errno;
}
+ vdev->vbasedev.ops = &vfio_pci_ops;
+
+ vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
+ vdev->vbasedev.name = g_malloc0(PATH_MAX);
+ snprintf(vdev->vbasedev.name, PATH_MAX, "%04x:%02x:%02x.%01x",
+ vdev->host.domain, vdev->host.bus, vdev->host.slot,
+ vdev->host.function);
+
strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
len = readlink(path, iommu_group_path, sizeof(path));
@@ -4183,12 +4241,8 @@ static int vfio_initfn(PCIDevice *pdev)
vdev->host.domain, vdev->host.bus, vdev->host.slot,
vdev->host.function);
- QLIST_FOREACH(pvdev, &group->device_list, next) {
- if (pvdev->host.domain == vdev->host.domain &&
- pvdev->host.bus == vdev->host.bus &&
- pvdev->host.slot == vdev->host.slot &&
- pvdev->host.function == vdev->host.function) {
-
+ QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+ if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
error_report("vfio: error: device %s is already attached", path);
vfio_put_group(group);
return -EBUSY;
@@ -4203,7 +4257,7 @@ static int vfio_initfn(PCIDevice *pdev)
}
/* Get a copy of config space */
- ret = pread(vdev->fd, vdev->pdev.config,
+ ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
MIN(pci_config_size(&vdev->pdev), vdev->config_size),
vdev->config_offset);
if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
@@ -4291,7 +4345,7 @@ out_put:
static void vfio_exitfn(PCIDevice *pdev)
{
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
- VFIOGroup *group = vdev->group;
+ VFIOGroup *group = vdev->vbasedev.group;
vfio_unregister_err_notifier(vdev);
pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
@@ -4317,8 +4371,9 @@ static void vfio_pci_reset(DeviceState *dev)
vfio_pci_pre_reset(vdev);
- if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
- !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+ if (vdev->vbasedev.reset_works &&
+ (vdev->has_flr || !vdev->has_pm_reset) &&
+ !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
DPRINTF("%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET\n", vdev->host.domain,
vdev->host.bus, vdev->host.slot, vdev->host.function);
goto post_reset;
@@ -4330,8 +4385,8 @@ static void vfio_pci_reset(DeviceState *dev)
}
/* If nothing else works and the device supports PM reset, use it */
- if (vdev->reset_works && vdev->has_pm_reset &&
- !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+ if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
+ !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
DPRINTF("%04x:%02x:%02x.%x PCI PM Reset\n", vdev->host.domain,
vdev->host.bus, vdev->host.slot, vdev->host.function);
goto post_reset;
--
1.8.3.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [Qemu-devel] [RFC v4 05/13] hw/vfio/pci: Introduce VFIORegion
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (3 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-08 22:41 ` Alex Williamson
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device Eric Auger
` (7 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
The VFIORegion structure is going to be shared by VFIOPCIDevice and
VFIOPlatformDevice. VFIOBAR includes it.
vfio_eoi becomes an op of VFIODevice, specialized by the parent device.
This makes it possible to turn vfio_bar_write/read into generic
vfio_region_write/read functions that will be used by VFIOPlatformDevice too.
vfio_mmap_bar becomes vfio_mmap_region.
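For reviewers, the dispatch pattern described above can be sketched in isolation. This is only an illustration under assumptions, not the code of this patch: every Sketch* type and sketch_* function below is a hypothetical, simplified stand-in for VFIODevice, VFIORegion, VFIOPCIDevice and vfio_eoi. It shows a generic region accessor reaching the bus-specific EOI handler through the region's back pointer and the device ops, and it also shows that the back pointer has to be set when the region is populated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, equivalent in spirit to container_of(). */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct SketchDevice SketchDevice;

/* Per-bus callbacks, standing in for VFIODeviceOps. */
typedef struct SketchDeviceOps {
    void (*eoi)(SketchDevice *vbasedev);
} SketchDeviceOps;

/* Generic device part, standing in for VFIODevice. */
struct SketchDevice {
    const char *name;
    const SketchDeviceOps *ops;
};

/* Generic region, standing in for VFIORegion. */
typedef struct SketchRegion {
    SketchDevice *vbasedev; /* back pointer used by the generic accessors */
    int nr;
} SketchRegion;

/* Bus-specific device embedding the generic part. */
typedef struct SketchPCIDevice {
    SketchDevice vbasedev;
    SketchRegion bar0;
    bool intx_pending;
} SketchPCIDevice;

/* PCI flavour of the EOI op: recover the parent and clear the pending IRQ. */
static void sketch_pci_eoi(SketchDevice *vbasedev)
{
    SketchPCIDevice *vdev =
        sketch_container_of(vbasedev, SketchPCIDevice, vbasedev);

    if (!vdev->intx_pending) {
        return;
    }
    vdev->intx_pending = false;
}

static const SketchDeviceOps sketch_pci_ops = {
    .eoi = sketch_pci_eoi,
};

/* Generic accessor: knows nothing about PCI, only about the ops. */
static void sketch_region_access(SketchRegion *region)
{
    SketchDevice *vbasedev = region->vbasedev;

    assert(vbasedev && vbasedev->ops && vbasedev->ops->eoi);
    vbasedev->ops->eoi(vbasedev);
}

/* Returns the pending state after one region access. */
static bool sketch_demo(void)
{
    SketchPCIDevice pci = {
        .vbasedev = { .name = "0000:00:00.0", .ops = &sketch_pci_ops },
        .intx_pending = true,
    };

    pci.bar0.vbasedev = &pci.vbasedev; /* must be set before any access */
    pci.bar0.nr = 0;
    sketch_region_access(&pci.bar0);
    return pci.intx_pending;
}
```

Calling sketch_demo() from a test program exercises the whole path. A platform device would supply its own ops table and reuse sketch_region_access unchanged.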
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/pci.c | 169 ++++++++++++++++++++++++++++++++--------------------------
1 file changed, 93 insertions(+), 76 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d0bee62..5f0164a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -74,15 +74,20 @@ typedef struct VFIOQuirk {
} data;
} VFIOQuirk;
-typedef struct VFIOBAR {
- off_t fd_offset; /* offset of BAR within device fd */
- int fd; /* device fd, allows us to pass VFIOBAR as opaque data */
+typedef struct VFIORegion {
+ struct VFIODevice *vbasedev;
+ off_t fd_offset; /* offset of region within device fd */
+ int fd; /* device fd, allows us to pass VFIORegion as opaque data */
MemoryRegion mem; /* slow, read/write access */
MemoryRegion mmap_mem; /* direct mapped access */
void *mmap;
size_t size;
uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
- uint8_t nr; /* cache the BAR number for debug */
+ uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOBAR {
+ VFIORegion region;
bool ioport;
bool mem64;
QLIST_HEAD(, VFIOQuirk) quirks;
@@ -194,6 +199,7 @@ typedef struct VFIODevice {
struct VFIODeviceOps {
bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+ void (*vfio_eoi)(VFIODevice *vdev);
};
typedef struct VFIOPCIDevice {
@@ -377,8 +383,10 @@ static void vfio_intx_interrupt(void *opaque)
}
}
-static void vfio_eoi(VFIOPCIDevice *vdev)
+static void vfio_eoi(VFIODevice *vbasedev)
{
+ VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
if (!vdev->intx.pending) {
return;
}
@@ -388,7 +396,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
vdev->intx.pending = false;
pci_irq_deassert(&vdev->pdev);
- vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
+ vfio_unmask_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
}
static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -543,7 +551,7 @@ static void vfio_update_irq(PCIDevice *pdev)
vfio_enable_intx_kvm(vdev);
/* Re-enable the interrupt in cased we missed an EOI */
- vfio_eoi(vdev);
+ vfio_eoi(&vdev->vbasedev);
}
static int vfio_enable_intx(VFIOPCIDevice *vdev)
@@ -1073,10 +1081,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
/*
* IO Port/MMIO - Beware of the endians, VFIO is always little endian
*/
-static void vfio_bar_write(void *opaque, hwaddr addr,
+static void vfio_region_write(void *opaque, hwaddr addr,
uint64_t data, unsigned size)
{
- VFIOBAR *bar = opaque;
+ VFIORegion *region = opaque;
+ VFIODevice *vbasedev = region->vbasedev;
union {
uint8_t byte;
uint16_t word;
@@ -1099,19 +1108,16 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
break;
}
- if (pwrite(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
+ if (pwrite(region->fd, &buf, size, region->fd_offset + addr) != size) {
error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
__func__, addr, data, size);
}
#ifdef DEBUG_VFIO
{
- VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
- DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx", 0x%"PRIx64
- ", %d)\n", __func__, vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function, bar->nr, addr,
- data, size);
+ DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+ ", %d)\n", __func__, vbasedev->name,
+ region->nr, addr, data, size);
}
#endif
@@ -1123,13 +1129,15 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
* which access will service the interrupt, so we're potentially
* getting quite a few host interrupts per guest interrupt.
*/
- vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+ vbasedev->ops->vfio_eoi(vbasedev);
+
}
-static uint64_t vfio_bar_read(void *opaque,
+static uint64_t vfio_region_read(void *opaque,
hwaddr addr, unsigned size)
{
- VFIOBAR *bar = opaque;
+ VFIORegion *region = opaque;
+ VFIODevice *vbasedev = region->vbasedev;
union {
uint8_t byte;
uint16_t word;
@@ -1138,7 +1146,7 @@ static uint64_t vfio_bar_read(void *opaque,
} buf;
uint64_t data = 0;
- if (pread(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
+ if (pread(region->fd, &buf, size, region->fd_offset + addr) != size) {
error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
__func__, addr, size);
return (uint64_t)-1;
@@ -1161,24 +1169,21 @@ static uint64_t vfio_bar_read(void *opaque,
#ifdef DEBUG_VFIO
{
- VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
- DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx
- ", %d) = 0x%"PRIx64"\n", __func__, vdev->host.domain,
- vdev->host.bus, vdev->host.slot, vdev->host.function,
- bar->nr, addr, size, data);
+ DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
+ __func__, vbasedev->name,
+ region->nr, addr, size, data);
}
#endif
/* Same as write above */
- vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+ vbasedev->ops->vfio_eoi(vbasedev);
return data;
}
-static const MemoryRegionOps vfio_bar_ops = {
- .read = vfio_bar_read,
- .write = vfio_bar_write,
+static const MemoryRegionOps vfio_region_ops = {
+ .read = vfio_region_read,
+ .write = vfio_region_write,
.endianness = DEVICE_NATIVE_ENDIAN,
};
@@ -1513,7 +1518,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
vdev->host.bus, vdev->host.slot, vdev->host.function,
quirk->data.bar, addr, size, data);
} else {
- data = vfio_bar_read(&vdev->bars[quirk->data.bar],
+ data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
addr + quirk->data.base_offset, size);
}
@@ -1564,7 +1569,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
return;
}
- vfio_bar_write(&vdev->bars[quirk->data.bar],
+ vfio_region_write(&vdev->bars[quirk->data.bar].region,
addr + quirk->data.base_offset, data, size);
}
@@ -1598,7 +1603,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
vdev->host.bus, vdev->host.slot, vdev->host.function,
quirk->data.bar, addr + base, size, data);
} else {
- data = vfio_bar_read(&vdev->bars[quirk->data.bar], addr + base, size);
+ data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
+ addr + base, size);
}
return data;
@@ -1627,7 +1633,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
vdev->host.domain, vdev->host.bus, vdev->host.slot,
vdev->host.function, quirk->data.bar, addr + base, data, size);
} else {
- vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+ vfio_region_write(&vdev->bars[quirk->data.bar].region,
+ addr + base, data, size);
}
}
@@ -1680,7 +1687,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
* As long as the BAR is >= 256 bytes it will be aligned such that the
* lower byte is always zero. Filter out anything else, if it exists.
*/
- if (!vdev->bars[4].ioport || vdev->bars[4].size < 256) {
+ if (!vdev->bars[4].ioport || vdev->bars[4].region.size < 256) {
return;
}
@@ -1733,7 +1740,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem, OBJECT(vdev),
&vfio_generic_window_quirk, quirk,
"vfio-ati-bar4-window-quirk", 8);
- memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+ memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
quirk->data.base_offset, &quirk->mem, 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1807,7 +1814,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
memory_region_name(&quirk->mem), vdev->host.domain,
vdev->host.bus, vdev->host.slot, vdev->host.function);
- return vfio_bar_read(&vdev->bars[quirk->data.bar], addr + 0x70, size);
+ return vfio_region_read(&vdev->bars[quirk->data.bar].region,
+ addr + 0x70, size);
}
static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
@@ -1847,7 +1855,8 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
memory_region_name(&quirk->mem), vdev->host.domain,
vdev->host.bus, vdev->host.slot, vdev->host.function);
- vfio_bar_write(&vdev->bars[quirk->data.bar], addr + 0x70, data, size);
+ vfio_region_write(&vdev->bars[quirk->data.bar].region,
+ addr + 0x70, data, size);
}
static const MemoryRegionOps vfio_rtl8168_window_quirk = {
@@ -1877,7 +1886,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_rtl8168_window_quirk,
quirk, "vfio-rtl8168-window-quirk", 8);
- memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+ memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
0x70, &quirk->mem, 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1910,7 +1919,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
"vfio-ati-bar2-4000-quirk",
TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
- memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+ memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
quirk->data.address_match & TARGET_PAGE_MASK,
&quirk->mem, 1);
@@ -2029,7 +2038,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
VFIOQuirk *quirk;
if (pci_get_word(pdev->config + PCI_VENDOR_ID) != PCI_VENDOR_ID_NVIDIA ||
- !vdev->bars[1].size) {
+ !vdev->bars[1].region.size) {
return;
}
@@ -2137,7 +2146,8 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem, OBJECT(vdev),
&vfio_nvidia_bar5_window_quirk, quirk,
"vfio-nvidia-bar5-window-quirk", 16);
- memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 1);
+ memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+ 0, &quirk->mem, 1);
QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -2164,7 +2174,8 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
*/
if ((pdev->cap_present & QEMU_PCI_CAP_MSI) &&
vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) {
- vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+ vfio_region_write(&vdev->bars[quirk->data.bar].region,
+ addr + base, data, size);
}
}
@@ -2203,7 +2214,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
quirk, "vfio-nvidia-bar0-88000-quirk",
TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
- memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+ memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
quirk->data.address_match & TARGET_PAGE_MASK,
&quirk->mem, 1);
@@ -2229,7 +2240,8 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
/* Log the chipset ID */
DPRINTF("Nvidia NV%02x\n",
- (unsigned int)(vfio_bar_read(&vdev->bars[0], 0, 4) >> 20) & 0xff);
+ (unsigned int)(vfio_region_read(&vdev->bars[0].region, 0, 4) >> 20)
+ & 0xff);
quirk = g_malloc0(sizeof(*quirk));
quirk->vdev = vdev;
@@ -2241,7 +2253,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
"vfio-nvidia-bar0-1800-quirk",
TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
- memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+ memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
quirk->data.address_match & TARGET_PAGE_MASK,
&quirk->mem, 1);
@@ -2298,7 +2310,7 @@ static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
while (!QLIST_EMPTY(&bar->quirks)) {
VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
- memory_region_del_subregion(&bar->mem, &quirk->mem);
+ memory_region_del_subregion(&bar->region.mem, &quirk->mem);
memory_region_destroy(&quirk->mem);
QLIST_REMOVE(quirk, next);
g_free(quirk);
@@ -2811,9 +2823,9 @@ static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
int ret;
ret = msix_init(&vdev->pdev, vdev->msix->entries,
- &vdev->bars[vdev->msix->table_bar].mem,
+ &vdev->bars[vdev->msix->table_bar].region.mem,
vdev->msix->table_bar, vdev->msix->table_offset,
- &vdev->bars[vdev->msix->pba_bar].mem,
+ &vdev->bars[vdev->msix->pba_bar].region.mem,
vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
if (ret < 0) {
if (ret == -ENOTSUP) {
@@ -2831,8 +2843,9 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
msi_uninit(&vdev->pdev);
if (vdev->msix) {
- msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
- &vdev->bars[vdev->msix->pba_bar].mem);
+ msix_uninit(&vdev->pdev,
+ &vdev->bars[vdev->msix->table_bar].region.mem,
+ &vdev->bars[vdev->msix->pba_bar].region.mem);
}
}
@@ -2846,11 +2859,11 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
for (i = 0; i < PCI_ROM_SLOT; i++) {
VFIOBAR *bar = &vdev->bars[i];
- if (!bar->size) {
+ if (!bar->region.size) {
continue;
}
- memory_region_set_enabled(&bar->mmap_mem, enabled);
+ memory_region_set_enabled(&bar->region.mmap_mem, enabled);
if (vdev->msix && vdev->msix->table_bar == i) {
memory_region_set_enabled(&vdev->msix->mmap_mem, enabled);
}
@@ -2861,45 +2874,46 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
- if (!bar->size) {
+ if (!bar->region.size) {
return;
}
vfio_bar_quirk_teardown(vdev, nr);
- memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
- munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
- memory_region_destroy(&bar->mmap_mem);
+ memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
+ munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
+ memory_region_destroy(&bar->region.mmap_mem);
if (vdev->msix && vdev->msix->table_bar == nr) {
- memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
+ memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
memory_region_destroy(&vdev->msix->mmap_mem);
}
- memory_region_destroy(&bar->mem);
+ memory_region_destroy(&bar->region.mem);
}
-static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
+static int vfio_mmap_region(Object *vdev, VFIORegion *region,
MemoryRegion *mem, MemoryRegion *submem,
void **map, size_t size, off_t offset,
const char *name)
{
int ret = 0;
- if (VFIO_ALLOW_MMAP && size && bar->flags & VFIO_REGION_INFO_FLAG_MMAP) {
+ if (VFIO_ALLOW_MMAP && size && region->flags &
+ VFIO_REGION_INFO_FLAG_MMAP) {
int prot = 0;
- if (bar->flags & VFIO_REGION_INFO_FLAG_READ) {
+ if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
prot |= PROT_READ;
}
- if (bar->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+ if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
prot |= PROT_WRITE;
}
*map = mmap(NULL, size, prot, MAP_SHARED,
- bar->fd, bar->fd_offset + offset);
+ region->fd, region->fd_offset + offset);
if (*map == MAP_FAILED) {
*map = NULL;
ret = -errno;
@@ -2921,7 +2935,7 @@ empty_region:
static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
- unsigned size = bar->size;
+ unsigned size = bar->region.size;
char name[64];
uint32_t pci_bar;
uint8_t type;
@@ -2951,9 +2965,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
~PCI_BASE_ADDRESS_MEM_MASK);
/* A "slow" read/write mapping underlies all BARs */
- memory_region_init_io(&bar->mem, OBJECT(vdev), &vfio_bar_ops,
+ memory_region_init_io(&bar->region.mem, OBJECT(vdev), &vfio_region_ops,
bar, name, size);
- pci_register_bar(&vdev->pdev, nr, type, &bar->mem);
+ pci_register_bar(&vdev->pdev, nr, type, &bar->region.mem);
/*
* We can't mmap areas overlapping the MSIX vector table, so we
@@ -2964,8 +2978,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
}
strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
- if (vfio_mmap_bar(vdev, bar, &bar->mem,
- &bar->mmap_mem, &bar->mmap, size, 0, name)) {
+ if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+ &bar->region.mmap_mem, &bar->region.mmap,
+ size, 0, name)) {
error_report("%s unsupported. Performance may be slow", name);
}
@@ -2975,10 +2990,11 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
start = HOST_PAGE_ALIGN(vdev->msix->table_offset +
(vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
- size = start < bar->size ? bar->size - start : 0;
+ size = start < bar->region.size ? bar->region.size - start : 0;
strncat(name, " msix-hi", sizeof(name) - strlen(name) - 1);
/* VFIOMSIXInfo contains another MemoryRegion for this mapping */
- if (vfio_mmap_bar(vdev, bar, &bar->mem, &vdev->msix->mmap_mem,
+ if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+ &vdev->msix->mmap_mem,
&vdev->msix->mmap, size, start, name)) {
error_report("%s unsupported. Performance may be slow", name);
}
@@ -3568,6 +3584,7 @@ static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
static VFIODeviceOps vfio_pci_ops = {
.vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
.vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+ .vfio_eoi = vfio_eoi,
};
static void vfio_reset_handler(void *opaque)
@@ -3973,11 +3990,11 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
(unsigned long)reg_info.size, (unsigned long)reg_info.offset,
(unsigned long)reg_info.flags);
- vdev->bars[i].flags = reg_info.flags;
- vdev->bars[i].size = reg_info.size;
- vdev->bars[i].fd_offset = reg_info.offset;
- vdev->bars[i].fd = vdev->vbasedev.fd;
- vdev->bars[i].nr = i;
+ vdev->bars[i].region.flags = reg_info.flags;
+ vdev->bars[i].region.size = reg_info.size;
+ vdev->bars[i].region.fd_offset = reg_info.offset;
+ vdev->bars[i].region.fd = vdev->vbasedev.fd;
+ vdev->bars[i].region.nr = i;
QLIST_INIT(&vdev->bars[i].quirks);
}
--
1.8.3.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (4 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 05/13] hw/vfio/pci: Introduce VFIORegion Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-08 22:43 ` Alex Williamson
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 07/13] hw/vfio: create common module Eric Auger
` (6 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
vfio_get_device now takes a VFIODevice as argument. The function is split
into 4 functional parts: dev_info query, device check, region populate
and interrupt populate. The last 3 are specialized by the parent device and
are added to VFIODeviceOps.
3 new fields are introduced in VFIODevice to store dev_info.
vfio_put_base_device is created.
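The four-step split described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code of this patch: the Demo* types, the demo_* functions, DEMO_FLAG_PCI and DEMO_MIN_REGIONS are all hypothetical, and the dev_info query is replaced by plain function arguments so the sketch needs no VFIO file descriptor. Failures return an explicit -EINVAL here, which is a choice made for the sketch.

```c
#include <errno.h>

#define DEMO_FLAG_PCI    (1u << 1) /* hypothetical flag value */
#define DEMO_MIN_REGIONS 8u        /* hypothetical minimum region count */

typedef struct DemoDevice DemoDevice;

/* The three steps specialized by the parent device. */
typedef struct DemoDeviceOps {
    int (*check_device)(DemoDevice *dev);
    int (*populate_regions)(DemoDevice *dev);
    int (*populate_interrupts)(DemoDevice *dev);
} DemoDeviceOps;

struct DemoDevice {
    unsigned int flags;       /* cached from the device info query */
    unsigned int num_regions;
    unsigned int num_irqs;
    const DemoDeviceOps *ops;
    int steps_done;           /* bookkeeping for the sketch only */
};

/* PCI flavour of the device check: explicit error code on failure. */
static int demo_pci_check(DemoDevice *dev)
{
    if (!(dev->flags & DEMO_FLAG_PCI) ||
        dev->num_regions < DEMO_MIN_REGIONS) {
        return -EINVAL;
    }
    dev->steps_done++;
    return 0;
}

static int demo_pci_regions(DemoDevice *dev)
{
    dev->steps_done++;
    return 0;
}

static int demo_pci_irqs(DemoDevice *dev)
{
    dev->steps_done++;
    return 0;
}

static const DemoDeviceOps demo_pci_ops = {
    .check_device = demo_pci_check,
    .populate_regions = demo_pci_regions,
    .populate_interrupts = demo_pci_irqs,
};

/* Generic part: store the device info, then run the three ops in order. */
static int demo_get_device(DemoDevice *dev, unsigned int flags,
                           unsigned int num_regions, unsigned int num_irqs)
{
    int ret;

    /* Step 1: device info query (values are passed in for the sketch). */
    dev->flags = flags;
    dev->num_regions = num_regions;
    dev->num_irqs = num_irqs;

    /* Steps 2 to 4: device-specific callbacks, stopping at the first error. */
    ret = dev->ops->check_device(dev);
    if (ret) {
        return ret;
    }
    ret = dev->ops->populate_regions(dev);
    if (ret) {
        return ret;
    }
    return dev->ops->populate_interrupts(dev);
}
```

A device that carries the hypothetical PCI flag and enough regions runs all three callbacks, while one without the flag stops at the check step and reports -EINVAL.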
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/pci.c | 181 +++++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 121 insertions(+), 60 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5f0164a..d228cf8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -194,12 +194,18 @@ typedef struct VFIODevice {
bool reset_works;
bool needs_reset;
VFIODeviceOps *ops;
+ unsigned int num_irqs;
+ unsigned int num_regions;
+ unsigned int flags;
} VFIODevice;
struct VFIODeviceOps {
bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
int (*vfio_hot_reset_multi)(VFIODevice *vdev);
void (*vfio_eoi)(VFIODevice *vdev);
+ int (*vfio_check_device)(VFIODevice *vdev);
+ int (*vfio_populate_regions)(VFIODevice *vdev);
+ int (*vfio_populate_interrupts)(VFIODevice *vdev);
};
typedef struct VFIOPCIDevice {
@@ -286,6 +292,10 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
uint32_t val, int len);
static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_put_base_device(VFIODevice *vbasedev);
+static int vfio_check_device(VFIODevice *vbasedev);
+static int vfio_populate_regions(VFIODevice *vbasedev);
+static int vfio_populate_interrupts(VFIODevice *vbasedev);
/*
* Common VFIO interrupt disable
@@ -3585,6 +3595,9 @@ static VFIODeviceOps vfio_pci_ops = {
.vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
.vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
.vfio_eoi = vfio_eoi,
+ .vfio_check_device = vfio_check_device,
+ .vfio_populate_regions = vfio_populate_regions,
+ .vfio_populate_interrupts = vfio_populate_interrupts,
};
static void vfio_reset_handler(void *opaque)
@@ -3927,54 +3940,53 @@ static void vfio_put_group(VFIOGroup *group)
}
}
-static int vfio_get_device(VFIOGroup *group, const char *name,
- VFIOPCIDevice *vdev)
+static int vfio_check_device(VFIODevice *vbasedev)
{
- struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
- struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
- struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
- int ret, i;
-
- ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
- if (ret < 0) {
- error_report("vfio: error getting device %s from group %d: %m",
- name, group->groupid);
- error_printf("Verify all devices in group %d are bound to vfio-pci "
- "or pci-stub and not already in use\n", group->groupid);
- return ret;
+ if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
+ error_report("vfio: Um, this isn't a PCI device");
+ goto error;
}
-
- vdev->vbasedev.fd = ret;
- vdev->vbasedev.group = group;
- QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
-
- /* Sanity check device */
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
- if (ret) {
- error_report("vfio: error getting device info: %m");
+ if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
+ error_report("vfio: unexpected number of io regions %u",
+ vbasedev->num_regions);
goto error;
}
-
- DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
- dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
-
- if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
- error_report("vfio: Um, this isn't a PCI device");
+ if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
+ error_report("vfio: unexpected number of irqs %u",
+ vbasedev->num_irqs);
goto error;
}
+ return 0;
+error:
+ vfio_put_base_device(vbasedev);
+ return -errno;
+}
- vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+static int vfio_populate_interrupts(VFIODevice *vbasedev)
+{
+ VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+ int ret;
+ struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+ irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
- if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
- error_report("vfio: unexpected number of io regions %u",
- dev_info.num_regions);
- goto error;
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+ if (ret) {
+ /* This can fail for an old kernel or legacy PCI dev */
+ DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
+ } else if (irq_info.count == 1) {
+ vdev->pci_aer = true;
+ } else {
+ error_report("vfio: %s Could not enable error recovery for the device",
+ vbasedev->name);
}
+ return ret;
+}
- if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
- error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
- goto error;
- }
+static int vfio_populate_regions(VFIODevice *vbasedev)
+{
+ struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+ int i, ret;
+ VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
reg_info.index = i;
@@ -4018,7 +4030,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
vdev->config_offset = reg_info.offset;
if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
- dev_info.num_regions > VFIO_PCI_VGA_REGION_INDEX) {
+ vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
struct vfio_region_info vga_info = {
.argsz = sizeof(vga_info),
.index = VFIO_PCI_VGA_REGION_INDEX,
@@ -4057,38 +4069,87 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
vdev->has_vga = true;
}
- irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
+ return 0;
+error:
+ vfio_put_base_device(vbasedev);
+ return -errno;
+}
+
+static int vfio_get_device(VFIOGroup *group, const char *name,
+ VFIODevice *vbasedev)
+{
+ struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+ struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+ struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+ int ret;
+
+ ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+ if (ret < 0) {
+ error_report("vfio: error getting device %s from group %d: %m",
+ name, group->groupid);
+ error_printf("Verify all devices in group %d are bound to vfio-pci "
+ "or pci-stub and not already in use\n", group->groupid);
+ return ret;
+ }
+
+ vbasedev->fd = ret;
+ vbasedev->group = group;
+ QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
if (ret) {
- /* This can fail for an old kernel or legacy PCI dev */
- DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
- ret = 0;
- } else if (irq_info.count == 1) {
- vdev->pci_aer = true;
- } else {
- error_report("vfio: %04x:%02x:%02x.%x "
- "Could not enable error recovery for the device",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
+ error_report("vfio: error getting device info: %m");
+ goto error;
+ }
+
+ vbasedev->num_irqs = dev_info.num_irqs;
+ vbasedev->num_regions = dev_info.num_regions;
+ vbasedev->flags = dev_info.flags;
+
+ DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
+ dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
+
+ vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+ /* call device specific functions */
+ ret = vbasedev->ops->vfio_check_device(vbasedev);
+ if (ret) {
+ error_report("vfio: error when checking device %s",
+ vbasedev->name);
+ goto error;
+ }
+ ret = vbasedev->ops->vfio_populate_regions(vbasedev);
+ if (ret) {
+ error_report("vfio: error when populating regions of device %s",
+ vbasedev->name);
+ goto error;
+ }
+ ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
+ if (ret) {
+ error_report("vfio: error when populating interrupts of device %s",
+ vbasedev->name);
+ goto error;
}
error:
if (ret) {
- QLIST_REMOVE(&vdev->vbasedev, next);
- vdev->vbasedev.group = NULL;
- close(vdev->vbasedev.fd);
+ vfio_put_base_device(vbasedev);
}
return ret;
}
-static void vfio_put_device(VFIOPCIDevice *vdev)
+void vfio_put_base_device(VFIODevice *vbasedev)
{
- QLIST_REMOVE(&vdev->vbasedev, next);
- vdev->vbasedev.group = NULL;
+ QLIST_REMOVE(vbasedev, next);
+ vbasedev->group = NULL;
DPRINTF("vfio_put_device: close vdev->fd\n");
- close(vdev->vbasedev.fd);
- g_free(vdev->vbasedev.name);
+ close(vbasedev->fd);
+ g_free(vbasedev->name);
+}
+
+static void vfio_put_device(VFIOPCIDevice *vdev)
+{
+ vfio_put_base_device(&vdev->vbasedev);
if (vdev->msix) {
g_free(vdev->msix);
vdev->msix = NULL;
@@ -4266,7 +4327,7 @@ static int vfio_initfn(PCIDevice *pdev)
}
}
- ret = vfio_get_device(group, path, vdev);
+ ret = vfio_get_device(group, path, &vdev->vbasedev);
if (ret) {
error_report("vfio: failed to get device %s", path);
vfio_put_group(group);
--
1.8.3.2
* [Qemu-devel] [RFC v4 07/13] hw/vfio: create common module
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (5 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings Eric Auger
` (5 subsequent siblings)
12 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, Kim Phillips, patches, will.deacon,
agraf, stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis,
kvmarm
Create a new common module, hw/vfio/common.c. It implements all
functions that are not specific to a device type (PCI, platform).
This patch only moves code; there are no functional changes.
Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v3 -> v4:
[Eric Auger]
The move is now done after all PCI modifications, which
anticipate VFIO platform needs. The purpose is to ease the
review process.
<= v3
First split done by Kim Phillips
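As a rough illustration of the split (this is not code from the series):
the common module only handles a base device and reaches device-specific
behaviour through an ops table. The sketch below uses simplified stand-ins;
DemoDevice, DemoDeviceOps, demo_get_device and the demo_pci_* callbacks are
hypothetical names, loosely modelled on VFIODevice and VFIODeviceOps, and the
"at least one region" rule is an arbitrary demo condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for VFIODevice / VFIODeviceOps. */
typedef struct DemoDevice DemoDevice;

typedef struct DemoDeviceOps {
    int (*check_device)(DemoDevice *dev);
    int (*populate_regions)(DemoDevice *dev);
    int (*populate_interrupts)(DemoDevice *dev);
} DemoDeviceOps;

struct DemoDevice {
    const DemoDeviceOps *ops;   /* filled in by the bus-specific code */
    unsigned int num_regions;
    unsigned int num_irqs;
    int steps_done;             /* demo only: counts callbacks that ran */
};

/* Common side: knows the order of the steps, not what each step does. */
int demo_get_device(DemoDevice *dev)
{
    int ret;

    assert(dev != NULL && dev->ops != NULL);

    ret = dev->ops->check_device(dev);
    if (ret) {
        return ret;
    }
    ret = dev->ops->populate_regions(dev);
    if (ret) {
        return ret;
    }
    return dev->ops->populate_interrupts(dev);
}

/* Device-specific side (imagine this living in a PCI-only file). */
static int demo_pci_check_device(DemoDevice *dev)
{
    dev->steps_done++;
    /* arbitrary demo rule: require at least one region */
    return dev->num_regions > 0 ? 0 : -1;
}

static int demo_pci_populate_regions(DemoDevice *dev)
{
    dev->steps_done++;
    return 0;
}

static int demo_pci_populate_interrupts(DemoDevice *dev)
{
    dev->steps_done++;
    return 0;
}

const DemoDeviceOps demo_pci_ops = {
    .check_device = demo_pci_check_device,
    .populate_regions = demo_pci_populate_regions,
    .populate_interrupts = demo_pci_populate_interrupts,
};
```

In this sketch a platform backend would supply its own ops table and reuse
demo_get_device unchanged, which is the kind of reuse the common module is
meant to allow.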
---
hw/vfio/Makefile.objs | 1 +
hw/vfio/common.c | 994 +++++++++++++++++++++++++++++++++++++
hw/vfio/pci.c | 1082 +----------------------------------------
include/hw/vfio/vfio-common.h | 148 ++++++
4 files changed, 1150 insertions(+), 1075 deletions(-)
create mode 100644 hw/vfio/common.c
create mode 100644 include/hw/vfio/vfio-common.h
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 31c7dab..e31f30e 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,3 +1,4 @@
ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_SOFTMMU) += common.o
obj-$(CONFIG_PCI) += pci.o
endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
new file mode 100644
index 0000000..ed93cf3
--- /dev/null
+++ b/hw/vfio/common.c
@@ -0,0 +1,994 @@
+/*
+ * generic functions used by VFIO devices
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ * Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ * Adapted for KVM by Qumranet.
+ * Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ * Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ * Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ * Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ * Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio.h"
+#include <linux/vfio.h>
+#include <hw/hw.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include "qemu/error-report.h"
+#include "sysemu/kvm.h"
+
+QLIST_HEAD(, VFIOGroup)
+ group_list = QLIST_HEAD_INITIALIZER(group_list);
+
+QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
+ QLIST_HEAD_INITIALIZER(vfio_address_spaces);
+
+#ifdef CONFIG_KVM
+/*
+ * We have a single VFIO pseudo device per KVM VM. Once created it lives
+ * for the life of the VM. Closing the file descriptor only drops our
+ * reference to it and the device's reference to kvm. Therefore once
+ * initialized, this file descriptor is only released on QEMU exit and
+ * we'll re-use it should another vfio device be attached before then.
+ */
+static int vfio_kvm_device_fd = -1;
+#endif
+
+/*
+ * Common VFIO interrupt disable
+ */
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
+{
+ struct vfio_irq_set irq_set = {
+ .argsz = sizeof(irq_set),
+ .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+ .index = index,
+ .start = 0,
+ .count = 0,
+ };
+
+ ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
+{
+ struct vfio_irq_set irq_set = {
+ .argsz = sizeof(irq_set),
+ .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+ .index = index,
+ .start = 0,
+ .count = 1,
+ };
+
+ ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
+{
+ struct vfio_irq_set irq_set = {
+ .argsz = sizeof(irq_set),
+ .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+ .index = index,
+ .start = 0,
+ .count = 1,
+ };
+
+ ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+#endif
+
+/*
+ * IO Port/MMIO - Beware of the endians, VFIO is always little endian
+ */
+void vfio_region_write(void *opaque, hwaddr addr,
+ uint64_t data, unsigned size)
+{
+ VFIORegion *region = opaque;
+ VFIODevice *vbasedev = region->vbasedev;
+ union {
+ uint8_t byte;
+ uint16_t word;
+ uint32_t dword;
+ uint64_t qword;
+ } buf;
+
+ switch (size) {
+ case 1:
+ buf.byte = data;
+ break;
+ case 2:
+ buf.word = data;
+ break;
+ case 4:
+ buf.dword = data;
+ break;
+ default:
+ hw_error("vfio: unsupported write size, %d bytes", size);
+ break;
+ }
+
+ if (pwrite(region->fd, &buf, size, region->fd_offset + addr) != size) {
+ error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
+ __func__, addr, data, size);
+ }
+
+#ifdef DEBUG_VFIO
+ {
+ DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+ ", %d)\n", __func__, vbasedev->name,
+ region->nr, addr, data, size);
+ }
+#endif
+
+ /*
+ * A read or write to a BAR always signals an INTx EOI. This will
+ * do nothing if not pending (including not in INTx mode). We assume
+ * that a BAR access is in response to an interrupt and that BAR
+ * accesses will service the interrupt. Unfortunately, we don't know
+ * which access will service the interrupt, so we're potentially
+ * getting quite a few host interrupts per guest interrupt.
+ */
+ vbasedev->ops->vfio_eoi(vbasedev);
+
+}
+
+uint64_t vfio_region_read(void *opaque,
+ hwaddr addr, unsigned size)
+{
+ VFIORegion *region = opaque;
+ VFIODevice *vbasedev = region->vbasedev;
+ union {
+ uint8_t byte;
+ uint16_t word;
+ uint32_t dword;
+ uint64_t qword;
+ } buf;
+ uint64_t data = 0;
+
+ if (pread(region->fd, &buf, size, region->fd_offset + addr) != size) {
+ error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
+ __func__, addr, size);
+ return (uint64_t)-1;
+ }
+
+ switch (size) {
+ case 1:
+ data = buf.byte;
+ break;
+ case 2:
+ data = buf.word;
+ break;
+ case 4:
+ data = buf.dword;
+ break;
+ default:
+ hw_error("vfio: unsupported read size, %d bytes", size);
+ break;
+ }
+
+#ifdef DEBUG_VFIO
+ {
+ DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
+ __func__, vbasedev->name,
+ region->nr, addr, size, data);
+ }
+#endif
+
+ /* Same as write above */
+ vbasedev->ops->vfio_eoi(vbasedev);
+
+ return data;
+}
+
+const MemoryRegionOps vfio_region_ops = {
+ .read = vfio_region_read,
+ .write = vfio_region_write,
+ .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_dma_unmap(VFIOContainer *container,
+ hwaddr iova, ram_addr_t size)
+{
+ struct vfio_iommu_type1_dma_unmap unmap = {
+ .argsz = sizeof(unmap),
+ .flags = 0,
+ .iova = iova,
+ .size = size,
+ };
+
+ if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+ DPRINTF("VFIO_UNMAP_DMA: %d\n", -errno);
+ return -errno;
+ }
+
+ return 0;
+}
+
+static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+ ram_addr_t size, void *vaddr, bool readonly)
+{
+ struct vfio_iommu_type1_dma_map map = {
+ .argsz = sizeof(map),
+ .flags = VFIO_DMA_MAP_FLAG_READ,
+ .vaddr = (__u64)(uintptr_t)vaddr,
+ .iova = iova,
+ .size = size,
+ };
+
+ if (!readonly) {
+ map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
+ }
+
+ /*
+ * Try the mapping, if it fails with EBUSY, unmap the region and try
+ * again. This shouldn't be necessary, but we sometimes see it in
+ * the VGA ROM space.
+ */
+ if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
+ (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+ ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
+ return 0;
+ }
+
+ DPRINTF("VFIO_MAP_DMA: %d\n", -errno);
+ return -errno;
+}
+
+static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+{
+ return (!memory_region_is_ram(section->mr) &&
+ !memory_region_is_iommu(section->mr)) ||
+ /*
+ * Sizing an enabled 64-bit BAR can cause spurious mappings to
+ * addresses in the upper part of the 64-bit address space. These
+ * are never accessed by the CPU and beyond the address width of
+ * some IOMMU hardware. TODO: VFIO should tell us the IOMMU width.
+ */
+ section->offset_within_address_space & (1ULL << 63);
+}
+
+static void vfio_iommu_map_notify(Notifier *n, void *data)
+{
+ VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+ VFIOContainer *container = giommu->container;
+ IOMMUTLBEntry *iotlb = data;
+ MemoryRegion *mr;
+ hwaddr xlat;
+ hwaddr len = iotlb->addr_mask + 1;
+ void *vaddr;
+ int ret;
+
+ DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+ iotlb->iova, iotlb->iova + iotlb->addr_mask);
+
+ /*
+ * The IOMMU TLB entry we have just covers translation through
+ * this IOMMU to its immediate target. We need to translate
+ * it the rest of the way through to memory.
+ */
+ mr = address_space_translate(&address_space_memory,
+ iotlb->translated_addr,
+ &xlat, &len, iotlb->perm & IOMMU_WO);
+ if (!memory_region_is_ram(mr)) {
+ DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
+ xlat);
+ return;
+ }
+ /*
+ * Translation truncates length to the IOMMU page size,
+ * check that it did not truncate too much.
+ */
+ if (len & iotlb->addr_mask) {
+ DPRINTF("iommu has granularity incompatible with target AS\n");
+ return;
+ }
+
+ if (iotlb->perm != IOMMU_NONE) {
+ vaddr = memory_region_get_ram_ptr(mr) + xlat;
+
+ ret = vfio_dma_map(container, iotlb->iova,
+ iotlb->addr_mask + 1, vaddr,
+ !(iotlb->perm & IOMMU_WO) || mr->readonly);
+ if (ret) {
+ error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iotlb->iova,
+ iotlb->addr_mask + 1, vaddr, ret);
+ }
+ } else {
+ ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
+ if (ret) {
+ error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iotlb->iova,
+ iotlb->addr_mask + 1, ret);
+ }
+ }
+}
+
+static void vfio_listener_region_add(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+ VFIOContainer *container = container_of(listener, VFIOContainer,
+ iommu_data.type1.listener);
+ hwaddr iova, end;
+ Int128 llend;
+ void *vaddr;
+ int ret;
+
+ if (vfio_listener_skipped_section(section)) {
+ DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
+ section->offset_within_address_space,
+ section->offset_within_address_space +
+ int128_get64(int128_sub(section->size, int128_one())));
+ return;
+ }
+
+ if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+ (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+ error_report("%s received unaligned region", __func__);
+ return;
+ }
+
+ iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+ llend = int128_make64(section->offset_within_address_space);
+ llend = int128_add(llend, section->size);
+ llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+ if (int128_ge(int128_make64(iova), llend)) {
+ return;
+ }
+
+ memory_region_ref(section->mr);
+
+ if (memory_region_is_iommu(section->mr)) {
+ VFIOGuestIOMMU *giommu;
+
+ DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+ iova, int128_get64(int128_sub(llend, int128_one())));
+ /*
+ * FIXME: We should do some checking to see if the
+ * capabilities of the host VFIO IOMMU are adequate to model
+ * the guest IOMMU
+ *
+ * FIXME: For VFIO iommu types which have KVM acceleration to
+ * avoid bouncing all map/unmaps through qemu this way, this
+ * would be the right place to wire that up (tell the KVM
+ * device emulation the VFIO iommu handles to use).
+ */
+ /*
+ * This assumes that the guest IOMMU is empty of
+ * mappings at this point.
+ *
+ * One way of doing this is:
+ * 1. Avoid sharing IOMMUs between emulated devices or different
+ * IOMMU groups.
+ * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
+ * there are some mappings in IOMMU.
+ *
+ * VFIO on SPAPR does that. Other IOMMU models may do that differently,
+ * they must make sure there are no existing mappings or
+ * loop through existing mappings to map them into VFIO.
+ */
+ giommu = g_malloc0(sizeof(*giommu));
+ giommu->iommu = section->mr;
+ giommu->container = container;
+ giommu->n.notify = vfio_iommu_map_notify;
+ QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+ memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+ return;
+ }
+
+ /* Here we assume that memory_region_is_ram(section->mr)==true */
+
+ end = int128_get64(llend);
+ vaddr = memory_region_get_ram_ptr(section->mr) +
+ section->offset_within_region +
+ (iova - section->offset_within_address_space);
+
+ DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
+ iova, end - 1, vaddr);
+
+ ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
+ if (ret) {
+ error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, end - iova, vaddr, ret);
+
+ /*
+ * On the initfn path, store the first error in the container so we
+ * can gracefully fail. Runtime, there's not much we can do other
+ * than throw a hardware error.
+ */
+ if (!container->iommu_data.type1.initialized) {
+ if (!container->iommu_data.type1.error) {
+ container->iommu_data.type1.error = ret;
+ }
+ } else {
+ hw_error("vfio: DMA mapping failed, unable to continue");
+ }
+ }
+}
+
+static void vfio_listener_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+ VFIOContainer *container = container_of(listener, VFIOContainer,
+ iommu_data.type1.listener);
+ hwaddr iova, end;
+ int ret;
+
+ if (vfio_listener_skipped_section(section)) {
+ DPRINTF("SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n",
+ section->offset_within_address_space,
+ section->offset_within_address_space +
+ int128_get64(int128_sub(section->size, int128_one())));
+ return;
+ }
+
+ if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+ (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+ error_report("%s received unaligned region", __func__);
+ return;
+ }
+
+ if (memory_region_is_iommu(section->mr)) {
+ VFIOGuestIOMMU *giommu;
+
+ QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+ if (giommu->iommu == section->mr) {
+ memory_region_unregister_iommu_notifier(&giommu->n);
+ QLIST_REMOVE(giommu, giommu_next);
+ g_free(giommu);
+ break;
+ }
+ }
+
+ /*
+ * FIXME: We assume the one big unmap below is adequate to
+ * remove any individual page mappings in the IOMMU which
+ * might have been copied into VFIO. This works for a page table
+ * based IOMMU where a big unmap flattens a large range of IO-PTEs.
+ * That may not be true for all IOMMU types.
+ */
+ }
+
+ iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+ end = (section->offset_within_address_space + int128_get64(section->size)) &
+ TARGET_PAGE_MASK;
+
+ if (iova >= end) {
+ return;
+ }
+
+ DPRINTF("region_del %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+ iova, end - 1);
+
+ ret = vfio_dma_unmap(container, iova, end - iova);
+ memory_region_unref(section->mr);
+ if (ret) {
+ error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, end - iova, ret);
+ }
+}
+
+const MemoryListener vfio_memory_listener = {
+ .region_add = vfio_listener_region_add,
+ .region_del = vfio_listener_region_del,
+};
+
+void vfio_listener_release(VFIOContainer *container)
+{
+ memory_listener_unregister(&container->iommu_data.type1.listener);
+}
+
+int vfio_mmap_region(Object *vdev, VFIORegion *region,
+ MemoryRegion *mem, MemoryRegion *submem,
+ void **map, size_t size, off_t offset,
+ const char *name)
+{
+ int ret = 0;
+
+ if (VFIO_ALLOW_MMAP && size && region->flags &
+ VFIO_REGION_INFO_FLAG_MMAP) {
+ int prot = 0;
+
+ if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
+ prot |= PROT_READ;
+ }
+
+ if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+ prot |= PROT_WRITE;
+ }
+
+ *map = mmap(NULL, size, prot, MAP_SHARED,
+ region->fd, region->fd_offset + offset);
+ if (*map == MAP_FAILED) {
+ *map = NULL;
+ ret = -errno;
+ goto empty_region;
+ }
+
+ memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
+ } else {
+empty_region:
+ /* Create a zero sized sub-region to make cleanup easy. */
+ memory_region_init(submem, OBJECT(vdev), name, 0);
+ }
+
+ memory_region_add_subregion(mem, offset, submem);
+
+ return ret;
+}
+
+void vfio_reset_handler(void *opaque)
+{
+ VFIOGroup *group;
+ VFIODevice *vbasedev;
+
+ QLIST_FOREACH(group, &group_list, next) {
+ QLIST_FOREACH(vbasedev, &group->device_list, next) {
+ vbasedev->ops->vfio_compute_needs_reset(vbasedev);
+ }
+ }
+
+ QLIST_FOREACH(group, &group_list, next) {
+ QLIST_FOREACH(vbasedev, &group->device_list, next) {
+ if (vbasedev->needs_reset) {
+ vbasedev->ops->vfio_hot_reset_multi(vbasedev);
+ }
+ }
+ }
+}
+
+static void vfio_kvm_device_add_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+ struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_GROUP,
+ .attr = KVM_DEV_VFIO_GROUP_ADD,
+ .addr = (uint64_t)(unsigned long)&group->fd,
+ };
+
+ if (!kvm_enabled()) {
+ return;
+ }
+
+ if (vfio_kvm_device_fd < 0) {
+ struct kvm_create_device cd = {
+ .type = KVM_DEV_TYPE_VFIO,
+ };
+
+ if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+ DPRINTF("KVM_CREATE_DEVICE: %m\n");
+ return;
+ }
+
+ vfio_kvm_device_fd = cd.fd;
+ }
+
+ if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+ error_report("Failed to add group %d to KVM VFIO device: %m",
+ group->groupid);
+ }
+#endif
+}
+
+static void vfio_kvm_device_del_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+ struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_GROUP,
+ .attr = KVM_DEV_VFIO_GROUP_DEL,
+ .addr = (uint64_t)(unsigned long)&group->fd,
+ };
+
+ if (vfio_kvm_device_fd < 0) {
+ return;
+ }
+
+ if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+ error_report("Failed to remove group %d from KVM VFIO device: %m",
+ group->groupid);
+ }
+#endif
+}
+
+static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
+{
+ VFIOAddressSpace *space;
+
+ QLIST_FOREACH(space, &vfio_address_spaces, list) {
+ if (space->as == as) {
+ return space;
+ }
+ }
+
+ /* No suitable VFIOAddressSpace, create a new one */
+ space = g_malloc0(sizeof(*space));
+ space->as = as;
+ QLIST_INIT(&space->containers);
+
+ QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
+
+ return space;
+}
+
+static void vfio_put_address_space(VFIOAddressSpace *space)
+{
+ if (QLIST_EMPTY(&space->containers)) {
+ QLIST_REMOVE(space, list);
+ g_free(space);
+ }
+}
+
+static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
+{
+ VFIOContainer *container;
+ int ret, fd;
+ VFIOAddressSpace *space;
+
+ space = vfio_get_address_space(as);
+
+ QLIST_FOREACH(container, &space->containers, next) {
+ if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+ group->container = container;
+ QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+ return 0;
+ }
+ }
+
+ fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+ if (fd < 0) {
+ error_report("vfio: failed to open /dev/vfio/vfio: %m");
+ ret = -errno;
+ goto put_space_exit;
+ }
+
+ ret = ioctl(fd, VFIO_GET_API_VERSION);
+ if (ret != VFIO_API_VERSION) {
+ error_report("vfio: supported vfio version: %d, "
+ "reported version: %d", VFIO_API_VERSION, ret);
+ ret = -EINVAL;
+ goto close_fd_exit;
+ }
+
+ container = g_malloc0(sizeof(*container));
+ container->space = space;
+ container->fd = fd;
+
+ if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
+ ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+ if (ret) {
+ error_report("vfio: failed to set group container: %m");
+ ret = -errno;
+ goto free_container_exit;
+ }
+
+ ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
+ if (ret) {
+ error_report("vfio: failed to set iommu for container: %m");
+ ret = -errno;
+ goto free_container_exit;
+ }
+
+ container->iommu_data.type1.listener = vfio_memory_listener;
+ container->iommu_data.release = vfio_listener_release;
+
+ memory_listener_register(&container->iommu_data.type1.listener,
+ &address_space_memory);
+
+ if (container->iommu_data.type1.error) {
+ ret = container->iommu_data.type1.error;
+ error_report("vfio: memory listener initialization"
+ " failed for container");
+ goto listener_release_exit;
+ }
+
+ container->iommu_data.type1.initialized = true;
+
+ } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+ ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+ if (ret) {
+ error_report("vfio: failed to set group container: %m");
+ ret = -errno;
+ goto free_container_exit;
+ }
+
+ ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+ if (ret) {
+ error_report("vfio: failed to set iommu for container: %m");
+ ret = -errno;
+ goto free_container_exit;
+ }
+
+ /*
+ * The host kernel code implementing VFIO_IOMMU_DISABLE is called
+ * when container fd is closed so we do not call it explicitly
+ * in this file.
+ */
+ ret = ioctl(fd, VFIO_IOMMU_ENABLE);
+ if (ret) {
+ error_report("vfio: failed to enable container: %m");
+ ret = -errno;
+ goto free_container_exit;
+ }
+
+ container->iommu_data.type1.listener = vfio_memory_listener;
+ container->iommu_data.release = vfio_listener_release;
+
+ memory_listener_register(&container->iommu_data.type1.listener,
+ container->space->as);
+
+ } else {
+ error_report("vfio: No available IOMMU models");
+ ret = -EINVAL;
+ goto free_container_exit;
+ }
+
+ QLIST_INIT(&container->group_list);
+ QLIST_INSERT_HEAD(&space->containers, container, next);
+
+ group->container = container;
+ QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
+ return 0;
+
+listener_release_exit:
+ vfio_listener_release(container);
+
+free_container_exit:
+ g_free(container);
+
+close_fd_exit:
+ close(fd);
+
+put_space_exit:
+ vfio_put_address_space(space);
+
+ return ret;
+}
+
+static void vfio_disconnect_container(VFIOGroup *group)
+{
+ VFIOContainer *container = group->container;
+
+ if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
+ error_report("vfio: error disconnecting group %d from container",
+ group->groupid);
+ }
+
+ QLIST_REMOVE(group, container_next);
+ group->container = NULL;
+
+ if (QLIST_EMPTY(&container->group_list)) {
+ VFIOAddressSpace *space = container->space;
+
+ if (container->iommu_data.release) {
+ container->iommu_data.release(container);
+ }
+ QLIST_REMOVE(container, next);
+ DPRINTF("vfio_disconnect_container: close container->fd\n");
+ close(container->fd);
+ g_free(container);
+
+ vfio_put_address_space(space);
+ }
+}
+
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
+{
+ VFIOGroup *group;
+ char path[32];
+ struct vfio_group_status status = { .argsz = sizeof(status) };
+
+ QLIST_FOREACH(group, &group_list, next) {
+ if (group->groupid == groupid) {
+ /* Found it. Now is it already in the right context? */
+ if (group->container->space->as == as) {
+ return group;
+ } else {
+ error_report("vfio: group %d used in multiple address spaces",
+ group->groupid);
+ return NULL;
+ }
+ }
+ }
+
+ group = g_malloc0(sizeof(*group));
+
+ snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
+ group->fd = qemu_open(path, O_RDWR);
+ if (group->fd < 0) {
+ error_report("vfio: error opening %s: %m", path);
+ goto free_group_exit;
+ }
+
+ if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
+ error_report("vfio: error getting group status: %m");
+ goto close_fd_exit;
+ }
+
+ if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+ error_report("vfio: error, group %d is not viable, please ensure "
+ "all devices within the iommu_group are bound to their "
+ "vfio bus driver.", groupid);
+ goto close_fd_exit;
+ }
+
+ group->groupid = groupid;
+ QLIST_INIT(&group->device_list);
+
+ if (vfio_connect_container(group, as)) {
+ error_report("vfio: failed to setup container for group %d", groupid);
+ goto close_fd_exit;
+ }
+
+ if (QLIST_EMPTY(&group_list)) {
+ qemu_register_reset(vfio_reset_handler, NULL);
+ }
+
+ QLIST_INSERT_HEAD(&group_list, group, next);
+
+ vfio_kvm_device_add_group(group);
+
+ return group;
+
+close_fd_exit:
+ close(group->fd);
+
+free_group_exit:
+ g_free(group);
+
+ return NULL;
+}
+
+void vfio_put_group(VFIOGroup *group)
+{
+ if (!QLIST_EMPTY(&group->device_list)) {
+ return;
+ }
+
+ vfio_kvm_device_del_group(group);
+ vfio_disconnect_container(group);
+ QLIST_REMOVE(group, next);
+ DPRINTF("vfio_put_group: close group->fd\n");
+ close(group->fd);
+ g_free(group);
+
+ if (QLIST_EMPTY(&group_list)) {
+ qemu_unregister_reset(vfio_reset_handler, NULL);
+ }
+}
+
+int vfio_get_device(VFIOGroup *group, const char *name,
+ VFIODevice *vbasedev)
+{
+ struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+ struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+ struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+ int ret;
+
+ ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+ if (ret < 0) {
+ error_report("vfio: error getting device %s from group %d: %m",
+ name, group->groupid);
+ error_printf("Verify all devices in group %d are bound to vfio-pci "
+ "or pci-stub and not already in use\n", group->groupid);
+ return ret;
+ }
+
+ vbasedev->fd = ret;
+ vbasedev->group = group;
+ QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+ if (ret) {
+ error_report("vfio: error getting device info: %m");
+ goto error;
+ }
+
+ vbasedev->num_irqs = dev_info.num_irqs;
+ vbasedev->num_regions = dev_info.num_regions;
+ vbasedev->flags = dev_info.flags;
+
+ DPRINTF("Device %s flags: %u, regions: %u, irqs: %u\n", name,
+ dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
+
+ vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+ /* call device specific functions */
+ ret = vbasedev->ops->vfio_check_device(vbasedev);
+ if (ret) {
+ error_report("vfio: error when checking device %s",
+ vbasedev->name);
+ goto error;
+ }
+ ret = vbasedev->ops->vfio_populate_regions(vbasedev);
+ if (ret) {
+ error_report("vfio: error when populating regions of device %s",
+ vbasedev->name);
+ goto error;
+ }
+ ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
+ if (ret) {
+ error_report("vfio: error when populating interrupts of device %s",
+ vbasedev->name);
+ goto error;
+ }
+
+error:
+ if (ret) {
+ vfio_put_base_device(vbasedev);
+ }
+ return ret;
+}
+
+void vfio_put_base_device(VFIODevice *vbasedev)
+{
+ QLIST_REMOVE(vbasedev, next);
+ vbasedev->group = NULL;
+ DPRINTF("vfio_put_device: close vdev->fd\n");
+ close(vbasedev->fd);
+ g_free(vbasedev->name);
+}
+
+static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
+ int req, void *param)
+{
+ VFIOGroup *group;
+ VFIOContainer *container;
+ int ret = -1;
+
+ group = vfio_get_group(groupid, as);
+ if (!group) {
+ error_report("vfio: group %d not registered", groupid);
+ return ret;
+ }
+
+ container = group->container;
+ if (group->container) {
+ ret = ioctl(container->fd, req, param);
+ if (ret < 0) {
+ error_report("vfio: failed to ioctl container: ret=%d, %s",
+ ret, strerror(errno));
+ }
+ }
+
+ vfio_put_group(group);
+
+ return ret;
+}
+
+int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
+ int req, void *param)
+{
+ /* We allow only certain ioctls to the container */
+ switch (req) {
+ case VFIO_CHECK_EXTENSION:
+ case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+ break;
+ default:
+ /* Return an error on unknown requests */
+ error_report("vfio: unsupported ioctl %X", req);
+ return -1;
+ }
+
+ return vfio_container_do_ioctl(as, groupid, req, param);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d228cf8..4af5df9 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -21,33 +21,17 @@
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
-#include "exec/address-spaces.h"
#include "hw/pci/msi.h"
#include "hw/pci/msix.h"
#include "qemu/error-report.h"
#include "qemu/range.h"
#include "sysemu/sysemu.h"
-#include "hw/vfio/vfio.h"
+#include "hw/vfio/vfio-common.h"
-/* #define DEBUG_VFIO */
-#ifdef DEBUG_VFIO
-#define DPRINTF(fmt, ...) \
- do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
- do { } while (0)
-#endif
-
-/* Extra debugging, trap acceleration paths for more logging */
-#define VFIO_ALLOW_MMAP 1
-#define VFIO_ALLOW_KVM_INTX 1
-#define VFIO_ALLOW_KVM_MSI 1
-#define VFIO_ALLOW_KVM_MSIX 1
-
-enum {
- VFIO_DEVICE_TYPE_PCI = 0,
- VFIO_DEVICE_TYPE_PLATFORM = 1,
-};
+extern const MemoryRegionOps vfio_region_ops;
+extern const MemoryListener vfio_memory_listener;
+extern QLIST_HEAD(, VFIOGroup) group_list;
+extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
struct VFIOPCIDevice;
@@ -74,18 +58,6 @@ typedef struct VFIOQuirk {
} data;
} VFIOQuirk;
-typedef struct VFIORegion {
- struct VFIODevice *vbasedev;
- off_t fd_offset; /* offset of region within device fd */
- int fd; /* device fd, allows us to pass VFIORegion as opaque data */
- MemoryRegion mem; /* slow, read/write access */
- MemoryRegion mmap_mem; /* direct mapped access */
- void *mmap;
- size_t size;
- uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
- uint8_t nr; /* cache the region number for debug */
-} VFIORegion;
-
typedef struct VFIOBAR {
VFIORegion region;
bool ioport;
@@ -133,45 +105,6 @@ enum {
VFIO_INT_MSIX = 3,
};
-typedef struct VFIOAddressSpace {
- AddressSpace *as;
- QLIST_HEAD(, VFIOContainer) containers;
- QLIST_ENTRY(VFIOAddressSpace) list;
-} VFIOAddressSpace;
-
-static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
- QLIST_HEAD_INITIALIZER(vfio_address_spaces);
-
-struct VFIOGroup;
-
-typedef struct VFIOType1 {
- MemoryListener listener;
- int error;
- bool initialized;
-} VFIOType1;
-
-typedef struct VFIOContainer {
- VFIOAddressSpace *space;
- int fd; /* /dev/vfio/vfio, empowered by the attached groups */
- struct {
- /* enable abstraction to support various iommu backends */
- union {
- VFIOType1 type1;
- };
- void (*release)(struct VFIOContainer *);
- } iommu_data;
- QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
- QLIST_HEAD(, VFIOGroup) group_list;
- QLIST_ENTRY(VFIOContainer) next;
-} VFIOContainer;
-
-typedef struct VFIOGuestIOMMU {
- VFIOContainer *container;
- MemoryRegion *iommu;
- Notifier n;
- QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
-
/* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
typedef struct VFIOMSIXInfo {
uint8_t table_bar;
@@ -183,31 +116,6 @@ typedef struct VFIOMSIXInfo {
void *mmap;
} VFIOMSIXInfo;
-typedef struct VFIODeviceOps VFIODeviceOps;
-
-typedef struct VFIODevice {
- QLIST_ENTRY(VFIODevice) next;
- struct VFIOGroup *group;
- char *name;
- int fd;
- int type;
- bool reset_works;
- bool needs_reset;
- VFIODeviceOps *ops;
- unsigned int num_irqs;
- unsigned int num_regions;
- unsigned int flags;
-} VFIODevice;
-
-struct VFIODeviceOps {
- bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
- int (*vfio_hot_reset_multi)(VFIODevice *vdev);
- void (*vfio_eoi)(VFIODevice *vdev);
- int (*vfio_check_device)(VFIODevice *vdev);
- int (*vfio_populate_regions)(VFIODevice *vdev);
- int (*vfio_populate_interrupts)(VFIODevice *vdev);
-};
-
typedef struct VFIOPCIDevice {
PCIDevice pdev;
VFIODevice vbasedev;
@@ -239,15 +147,6 @@ typedef struct VFIOPCIDevice {
bool rom_read_failed;
} VFIOPCIDevice;
-typedef struct VFIOGroup {
- int fd;
- int groupid;
- VFIOContainer *container;
- QLIST_HEAD(, VFIODevice) device_list;
- QLIST_ENTRY(VFIOGroup) next;
- QLIST_ENTRY(VFIOGroup) container_next;
-} VFIOGroup;
-
typedef struct VFIORomBlacklistEntry {
uint16_t vendor_id;
uint16_t device_id;
@@ -273,78 +172,16 @@ static const VFIORomBlacklistEntry romblacklist[] = {
#define MSIX_CAP_LENGTH 12
-static QLIST_HEAD(, VFIOGroup)
- group_list = QLIST_HEAD_INITIALIZER(group_list);
-
-#ifdef CONFIG_KVM
-/*
- * We have a single VFIO pseudo device per KVM VM. Once created it lives
- * for the life of the VM. Closing the file descriptor only drops our
- * reference to it and the device's reference to kvm. Therefore once
- * initialized, this file descriptor is only released on QEMU exit and
- * we'll re-use it should another vfio device be attached before then.
- */
-static int vfio_kvm_device_fd = -1;
-#endif
-
static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
uint32_t val, int len);
static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
-static void vfio_put_base_device(VFIODevice *vbasedev);
static int vfio_check_device(VFIODevice *vbasedev);
static int vfio_populate_regions(VFIODevice *vbasedev);
static int vfio_populate_interrupts(VFIODevice *vbasedev);
/*
- * Common VFIO interrupt disable
- */
-static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
-{
- struct vfio_irq_set irq_set = {
- .argsz = sizeof(irq_set),
- .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
- .index = index,
- .start = 0,
- .count = 0,
- };
-
- ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-/*
- * INTx
- */
-static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
-{
- struct vfio_irq_set irq_set = {
- .argsz = sizeof(irq_set),
- .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
- .index = index,
- .start = 0,
- .count = 1,
- };
-
- ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
-{
- struct vfio_irq_set irq_set = {
- .argsz = sizeof(irq_set),
- .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
- .index = index,
- .start = 0,
- .count = 1,
- };
-
- ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-#endif
-
-/*
* Disabling BAR mmaping can be slow, but toggling it around INTx can
* also be a huge overhead. We try to get the best of both worlds by
* waiting until an interrupt to disable mmaps (subsequent transitions
@@ -1088,115 +925,6 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
}
}
-/*
- * IO Port/MMIO - Beware of the endians, VFIO is always little endian
- */
-static void vfio_region_write(void *opaque, hwaddr addr,
- uint64_t data, unsigned size)
-{
- VFIORegion *region = opaque;
- VFIODevice *vbasedev = region->vbasedev;
- union {
- uint8_t byte;
- uint16_t word;
- uint32_t dword;
- uint64_t qword;
- } buf;
-
- switch (size) {
- case 1:
- buf.byte = data;
- break;
- case 2:
- buf.word = data;
- break;
- case 4:
- buf.dword = data;
- break;
- default:
- hw_error("vfio: unsupported write size, %d bytes", size);
- break;
- }
-
- if (pwrite(region->fd, &buf, size, region->fd_offset + addr) != size) {
- error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
- __func__, addr, data, size);
- }
-
-#ifdef DEBUG_VFIO
- {
- DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
- ", %d)\n", __func__, vbasedev->name,
- region->nr, addr, data, size);
- }
-#endif
-
- /*
- * A read or write to a BAR always signals an INTx EOI. This will
- * do nothing if not pending (including not in INTx mode). We assume
- * that a BAR access is in response to an interrupt and that BAR
- * accesses will service the interrupt. Unfortunately, we don't know
- * which access will service the interrupt, so we're potentially
- * getting quite a few host interrupts per guest interrupt.
- */
- vbasedev->ops->vfio_eoi(vbasedev);
-
-}
-
-static uint64_t vfio_region_read(void *opaque,
- hwaddr addr, unsigned size)
-{
- VFIORegion *region = opaque;
- VFIODevice *vbasedev = region->vbasedev;
- union {
- uint8_t byte;
- uint16_t word;
- uint32_t dword;
- uint64_t qword;
- } buf;
- uint64_t data = 0;
-
- if (pread(region->fd, &buf, size, region->fd_offset + addr) != size) {
- error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
- __func__, addr, size);
- return (uint64_t)-1;
- }
-
- switch (size) {
- case 1:
- data = buf.byte;
- break;
- case 2:
- data = buf.word;
- break;
- case 4:
- data = buf.dword;
- break;
- default:
- hw_error("vfio: unsupported read size, %d bytes", size);
- break;
- }
-
-#ifdef DEBUG_VFIO
- {
- DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
- __func__, vdev->name,
- region->nr, addr, size, data);
- }
-#endif
-
- /* Same as write above */
- vbasedev->ops->vfio_eoi(vbasedev);
-
- return data;
-}
-
-static const MemoryRegionOps vfio_region_ops = {
- .read = vfio_region_read,
- .write = vfio_region_write,
- .endianness = DEVICE_NATIVE_ENDIAN,
-};
-
static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
{
struct vfio_region_info reg_info = {
@@ -2423,307 +2151,6 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
}
/*
- * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
- */
-static int vfio_dma_unmap(VFIOContainer *container,
- hwaddr iova, ram_addr_t size)
-{
- struct vfio_iommu_type1_dma_unmap unmap = {
- .argsz = sizeof(unmap),
- .flags = 0,
- .iova = iova,
- .size = size,
- };
-
- if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
- DPRINTF("VFIO_UNMAP_DMA: %d\n", -errno);
- return -errno;
- }
-
- return 0;
-}
-
-static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
- ram_addr_t size, void *vaddr, bool readonly)
-{
- struct vfio_iommu_type1_dma_map map = {
- .argsz = sizeof(map),
- .flags = VFIO_DMA_MAP_FLAG_READ,
- .vaddr = (__u64)(uintptr_t)vaddr,
- .iova = iova,
- .size = size,
- };
-
- if (!readonly) {
- map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
- }
-
- /*
- * Try the mapping, if it fails with EBUSY, unmap the region and try
- * again. This shouldn't be necessary, but we sometimes see it in
- * the the VGA ROM space.
- */
- if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
- (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
- ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
- return 0;
- }
-
- DPRINTF("VFIO_MAP_DMA: %d\n", -errno);
- return -errno;
-}
-
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
-{
- return (!memory_region_is_ram(section->mr) &&
- !memory_region_is_iommu(section->mr)) ||
- /*
- * Sizing an enabled 64-bit BAR can cause spurious mappings to
- * addresses in the upper part of the 64-bit address space. These
- * are never accessed by the CPU and beyond the address width of
- * some IOMMU hardware. TODO: VFIO should tell us the IOMMU width.
- */
- section->offset_within_address_space & (1ULL << 63);
-}
-
-static void vfio_iommu_map_notify(Notifier *n, void *data)
-{
- VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
- VFIOContainer *container = giommu->container;
- IOMMUTLBEntry *iotlb = data;
- MemoryRegion *mr;
- hwaddr xlat;
- hwaddr len = iotlb->addr_mask + 1;
- void *vaddr;
- int ret;
-
- DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
- iotlb->iova, iotlb->iova + iotlb->addr_mask);
-
- /*
- * The IOMMU TLB entry we have just covers translation through
- * this IOMMU to its immediate target. We need to translate
- * it the rest of the way through to memory.
- */
- mr = address_space_translate(&address_space_memory,
- iotlb->translated_addr,
- &xlat, &len, iotlb->perm & IOMMU_WO);
- if (!memory_region_is_ram(mr)) {
- DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
- xlat);
- return;
- }
- /*
- * Translation truncates length to the IOMMU page size,
- * check that it did not truncate too much.
- */
- if (len & iotlb->addr_mask) {
- DPRINTF("iommu has granularity incompatible with target AS\n");
- return;
- }
-
- if (iotlb->perm != IOMMU_NONE) {
- vaddr = memory_region_get_ram_ptr(mr) + xlat;
-
- ret = vfio_dma_map(container, iotlb->iova,
- iotlb->addr_mask + 1, vaddr,
- !(iotlb->perm & IOMMU_WO) || mr->readonly);
- if (ret) {
- error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx", %p) = %d (%m)",
- container, iotlb->iova,
- iotlb->addr_mask + 1, vaddr, ret);
- }
- } else {
- ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
- if (ret) {
- error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx") = %d (%m)",
- container, iotlb->iova,
- iotlb->addr_mask + 1, ret);
- }
- }
-}
-
-static void vfio_listener_region_add(MemoryListener *listener,
- MemoryRegionSection *section)
-{
- VFIOContainer *container = container_of(listener, VFIOContainer,
- iommu_data.type1.listener);
- hwaddr iova, end;
- Int128 llend;
- void *vaddr;
- int ret;
-
- if (vfio_listener_skipped_section(section)) {
- DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
- section->offset_within_address_space,
- section->offset_within_address_space +
- int128_get64(int128_sub(section->size, int128_one())));
- return;
- }
-
- if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
- (section->offset_within_region & ~TARGET_PAGE_MASK))) {
- error_report("%s received unaligned region", __func__);
- return;
- }
-
- iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
- llend = int128_make64(section->offset_within_address_space);
- llend = int128_add(llend, section->size);
- llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
-
- if (int128_ge(int128_make64(iova), llend)) {
- return;
- }
-
- memory_region_ref(section->mr);
-
- if (memory_region_is_iommu(section->mr)) {
- VFIOGuestIOMMU *giommu;
-
- DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
- iova, int128_get64(int128_sub(llend, int128_one())));
- /*
- * FIXME: We should do some checking to see if the
- * capabilities of the host VFIO IOMMU are adequate to model
- * the guest IOMMU
- *
- * FIXME: For VFIO iommu types which have KVM acceleration to
- * avoid bouncing all map/unmaps through qemu this way, this
- * would be the right place to wire that up (tell the KVM
- * device emulation the VFIO iommu handles to use).
- */
- /*
- * This assumes that the guest IOMMU is empty of
- * mappings at this point.
- *
- * One way of doing this is:
- * 1. Avoid sharing IOMMUs between emulated devices or different
- * IOMMU groups.
- * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
- * there are some mappings in IOMMU.
- *
- * VFIO on SPAPR does that. Other IOMMU models may do that different,
- * they must make sure there are no existing mappings or
- * loop through existing mappings to map them into VFIO.
- */
- giommu = g_malloc0(sizeof(*giommu));
- giommu->iommu = section->mr;
- giommu->container = container;
- giommu->n.notify = vfio_iommu_map_notify;
- QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
- memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-
- return;
- }
-
- /* Here we assume that memory_region_is_ram(section->mr)==true */
-
- end = int128_get64(llend);
- vaddr = memory_region_get_ram_ptr(section->mr) +
- section->offset_within_region +
- (iova - section->offset_within_address_space);
-
- DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
- iova, end - 1, vaddr);
-
- ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
- if (ret) {
- error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx", %p) = %d (%m)",
- container, iova, end - iova, vaddr, ret);
-
- /*
- * On the initfn path, store the first error in the container so we
- * can gracefully fail. Runtime, there's not much we can do other
- * than throw a hardware error.
- */
- if (!container->iommu_data.type1.initialized) {
- if (!container->iommu_data.type1.error) {
- container->iommu_data.type1.error = ret;
- }
- } else {
- hw_error("vfio: DMA mapping failed, unable to continue");
- }
- }
-}
-
-static void vfio_listener_region_del(MemoryListener *listener,
- MemoryRegionSection *section)
-{
- VFIOContainer *container = container_of(listener, VFIOContainer,
- iommu_data.type1.listener);
- hwaddr iova, end;
- int ret;
-
- if (vfio_listener_skipped_section(section)) {
- DPRINTF("SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n",
- section->offset_within_address_space,
- section->offset_within_address_space +
- int128_get64(int128_sub(section->size, int128_one())));
- return;
- }
-
- if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
- (section->offset_within_region & ~TARGET_PAGE_MASK))) {
- error_report("%s received unaligned region", __func__);
- return;
- }
-
- if (memory_region_is_iommu(section->mr)) {
- VFIOGuestIOMMU *giommu;
-
- QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
- if (giommu->iommu == section->mr) {
- memory_region_unregister_iommu_notifier(&giommu->n);
- QLIST_REMOVE(giommu, giommu_next);
- g_free(giommu);
- break;
- }
- }
-
- /*
- * FIXME: We assume the one big unmap below is adequate to
- * remove any individual page mappings in the IOMMU which
- * might have been copied into VFIO. This works for a page table
- * based IOMMU where a big unmap flattens a large range of IO-PTEs.
- * That may not be true for all IOMMU types.
- */
- }
-
- iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
- end = (section->offset_within_address_space + int128_get64(section->size)) &
- TARGET_PAGE_MASK;
-
- if (iova >= end) {
- return;
- }
-
- DPRINTF("region_del %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
- iova, end - 1);
-
- ret = vfio_dma_unmap(container, iova, end - iova);
- memory_region_unref(section->mr);
- if (ret) {
- error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx") = %d (%m)",
- container, iova, end - iova, ret);
- }
-}
-
-static MemoryListener vfio_memory_listener = {
- .region_add = vfio_listener_region_add,
- .region_del = vfio_listener_region_del,
-};
-
-static void vfio_listener_release(VFIOContainer *container)
-{
- memory_listener_unregister(&container->iommu_data.type1.listener);
-}
-
-/*
* Interrupt setup
*/
static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
@@ -2903,45 +2330,6 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
memory_region_destroy(&bar->region.mem);
}
-static int vfio_mmap_region(Object *vdev, VFIORegion *region,
- MemoryRegion *mem, MemoryRegion *submem,
- void **map, size_t size, off_t offset,
- const char *name)
-{
- int ret = 0;
-
- if (VFIO_ALLOW_MMAP && size && region->flags &
- VFIO_REGION_INFO_FLAG_MMAP) {
- int prot = 0;
-
- if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
- prot |= PROT_READ;
- }
-
- if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
- prot |= PROT_WRITE;
- }
-
- *map = mmap(NULL, size, prot, MAP_SHARED,
- region->fd, region->fd_offset + offset);
- if (*map == MAP_FAILED) {
- *map = NULL;
- ret = -errno;
- goto empty_region;
- }
-
- memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
- } else {
-empty_region:
- /* Create a zero sized sub-region to make cleanup easy. */
- memory_region_init(submem, OBJECT(vdev), name, 0);
- }
-
- memory_region_add_subregion(mem, offset, submem);
-
- return ret;
-}
-
static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
{
VFIOBAR *bar = &vdev->bars[nr];
@@ -3600,346 +2988,6 @@ static VFIODeviceOps vfio_pci_ops = {
.vfio_populate_interrupts = vfio_populate_interrupts,
};
-static void vfio_reset_handler(void *opaque)
-{
- VFIOGroup *group;
- VFIODevice *vbasedev;
-
- QLIST_FOREACH(group, &group_list, next) {
- QLIST_FOREACH(vbasedev, &group->device_list, next) {
- vbasedev->ops->vfio_compute_needs_reset(vbasedev);
- }
- }
-
- QLIST_FOREACH(group, &group_list, next) {
- QLIST_FOREACH(vbasedev, &group->device_list, next) {
- if (vbasedev->needs_reset) {
- vbasedev->ops->vfio_hot_reset_multi(vbasedev);
- }
- }
- }
-}
-
-static void vfio_kvm_device_add_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
- struct kvm_device_attr attr = {
- .group = KVM_DEV_VFIO_GROUP,
- .attr = KVM_DEV_VFIO_GROUP_ADD,
- .addr = (uint64_t)(unsigned long)&group->fd,
- };
-
- if (!kvm_enabled()) {
- return;
- }
-
- if (vfio_kvm_device_fd < 0) {
- struct kvm_create_device cd = {
- .type = KVM_DEV_TYPE_VFIO,
- };
-
- if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
- DPRINTF("KVM_CREATE_DEVICE: %m\n");
- return;
- }
-
- vfio_kvm_device_fd = cd.fd;
- }
-
- if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
- error_report("Failed to add group %d to KVM VFIO device: %m",
- group->groupid);
- }
-#endif
-}
-
-static void vfio_kvm_device_del_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
- struct kvm_device_attr attr = {
- .group = KVM_DEV_VFIO_GROUP,
- .attr = KVM_DEV_VFIO_GROUP_DEL,
- .addr = (uint64_t)(unsigned long)&group->fd,
- };
-
- if (vfio_kvm_device_fd < 0) {
- return;
- }
-
- if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
- error_report("Failed to remove group %d from KVM VFIO device: %m",
- group->groupid);
- }
-#endif
-}
-
-static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
-{
- VFIOAddressSpace *space;
-
- QLIST_FOREACH(space, &vfio_address_spaces, list) {
- if (space->as == as) {
- return space;
- }
- }
-
- /* No suitable VFIOAddressSpace, create a new one */
- space = g_malloc0(sizeof(*space));
- space->as = as;
- QLIST_INIT(&space->containers);
-
- QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
-
- return space;
-}
-
-static void vfio_put_address_space(VFIOAddressSpace *space)
-{
- if (QLIST_EMPTY(&space->containers)) {
- QLIST_REMOVE(space, list);
- g_free(space);
- }
-}
-
-static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
-{
- VFIOContainer *container;
- int ret, fd;
- VFIOAddressSpace *space;
-
- space = vfio_get_address_space(as);
-
- QLIST_FOREACH(container, &space->containers, next) {
- if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
- group->container = container;
- QLIST_INSERT_HEAD(&container->group_list, group, container_next);
- return 0;
- }
- }
-
- fd = qemu_open("/dev/vfio/vfio", O_RDWR);
- if (fd < 0) {
- error_report("vfio: failed to open /dev/vfio/vfio: %m");
- ret = -errno;
- goto put_space_exit;
- }
-
- ret = ioctl(fd, VFIO_GET_API_VERSION);
- if (ret != VFIO_API_VERSION) {
- error_report("vfio: supported vfio version: %d, "
- "reported version: %d", VFIO_API_VERSION, ret);
- ret = -EINVAL;
- goto close_fd_exit;
- }
-
- container = g_malloc0(sizeof(*container));
- container->space = space;
- container->fd = fd;
-
- if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
- ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
- if (ret) {
- error_report("vfio: failed to set group container: %m");
- ret = -errno;
- goto free_container_exit;
- }
-
- ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
- if (ret) {
- error_report("vfio: failed to set iommu for container: %m");
- ret = -errno;
- goto free_container_exit;
- }
-
- container->iommu_data.type1.listener = vfio_memory_listener;
- container->iommu_data.release = vfio_listener_release;
-
- memory_listener_register(&container->iommu_data.type1.listener,
- &address_space_memory);
-
- if (container->iommu_data.type1.error) {
- ret = container->iommu_data.type1.error;
- error_report("vfio: memory listener initialization failed"
- " for container");
- goto listener_release_exit;
- }
-
- container->iommu_data.type1.initialized = true;
-
- } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
- ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
- if (ret) {
- error_report("vfio: failed to set group container: %m");
- ret = -errno;
- goto free_container_exit;
- }
-
- ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
- if (ret) {
- error_report("vfio: failed to set iommu for container: %m");
- ret = -errno;
- goto free_container_exit;
- }
-
- /*
- * The host kernel code implementing VFIO_IOMMU_DISABLE is called
- * when container fd is closed so we do not call it explicitly
- * in this file.
- */
- ret = ioctl(fd, VFIO_IOMMU_ENABLE);
- if (ret) {
- error_report("vfio: failed to enable container: %m");
- ret = -errno;
- goto free_container_exit;
- }
-
- container->iommu_data.type1.listener = vfio_memory_listener;
- container->iommu_data.release = vfio_listener_release;
-
- memory_listener_register(&container->iommu_data.type1.listener,
- container->space->as);
-
- } else {
- error_report("vfio: No available IOMMU models");
- ret = -EINVAL;
- goto free_container_exit;
- }
-
- QLIST_INIT(&container->group_list);
- QLIST_INSERT_HEAD(&space->containers, container, next);
-
- group->container = container;
- QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-
- return 0;
-
-listener_release_exit:
- vfio_listener_release(container);
-
-free_container_exit:
- g_free(container);
-
-close_fd_exit:
- close(fd);
-
-put_space_exit:
- vfio_put_address_space(space);
-
- return ret;
-}
-
-static void vfio_disconnect_container(VFIOGroup *group)
-{
- VFIOContainer *container = group->container;
-
- if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
- error_report("vfio: error disconnecting group %d from container",
- group->groupid);
- }
-
- QLIST_REMOVE(group, container_next);
- group->container = NULL;
-
- if (QLIST_EMPTY(&container->group_list)) {
- VFIOAddressSpace *space = container->space;
-
- if (container->iommu_data.release) {
- container->iommu_data.release(container);
- }
- QLIST_REMOVE(container, next);
- DPRINTF("vfio_disconnect_container: close container->fd\n");
- close(container->fd);
- g_free(container);
-
- vfio_put_address_space(space);
- }
-}
-
-static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
-{
- VFIOGroup *group;
- char path[32];
- struct vfio_group_status status = { .argsz = sizeof(status) };
-
- QLIST_FOREACH(group, &group_list, next) {
- if (group->groupid == groupid) {
- /* Found it. Now is it already in the right context? */
- if (group->container->space->as == as) {
- return group;
- } else {
- error_report("vfio: group %d used in multiple address spaces",
- group->groupid);
- return NULL;
- }
- }
- }
-
- group = g_malloc0(sizeof(*group));
-
- snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
- group->fd = qemu_open(path, O_RDWR);
- if (group->fd < 0) {
- error_report("vfio: error opening %s: %m", path);
- goto free_group_exit;
- }
-
- if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
- error_report("vfio: error getting group status: %m");
- goto close_fd_exit;
- }
-
- if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
- error_report("vfio: error, group %d is not viable, please ensure "
- "all devices within the iommu_group are bound to their "
- "vfio bus driver.", groupid);
- goto close_fd_exit;
- }
-
- group->groupid = groupid;
- QLIST_INIT(&group->device_list);
-
- if (vfio_connect_container(group, as)) {
- error_report("vfio: failed to setup container for group %d", groupid);
- goto close_fd_exit;
- }
-
- if (QLIST_EMPTY(&group_list)) {
- qemu_register_reset(vfio_reset_handler, NULL);
- }
-
- QLIST_INSERT_HEAD(&group_list, group, next);
-
- vfio_kvm_device_add_group(group);
-
- return group;
-
-close_fd_exit:
- close(group->fd);
-
-free_group_exit:
- g_free(group);
-
- return NULL;
-}
-
-static void vfio_put_group(VFIOGroup *group)
-{
- if (!QLIST_EMPTY(&group->device_list)) {
- return;
- }
-
- vfio_kvm_device_del_group(group);
- vfio_disconnect_container(group);
- QLIST_REMOVE(group, next);
- DPRINTF("vfio_put_group: close group->fd\n");
- close(group->fd);
- g_free(group);
-
- if (QLIST_EMPTY(&group_list)) {
- qemu_unregister_reset(vfio_reset_handler, NULL);
- }
-}
-
static int vfio_check_device(VFIODevice *vbasedev)
{
if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
@@ -3997,7 +3045,7 @@ static int vfio_populate_regions(VFIODevice *vbasedev)
goto error;
}
- DPRINTF("Device %s region %d:\n", name, i);
+ DPRINTF("Device %s region %d:\n", vdev->vbasedev.name, i);
DPRINTF(" size: 0x%lx, offset: 0x%lx, flags: 0x%lx\n",
(unsigned long)reg_info.size, (unsigned long)reg_info.offset,
(unsigned long)reg_info.flags);
@@ -4018,7 +3066,7 @@ static int vfio_populate_regions(VFIODevice *vbasedev)
goto error;
}
- DPRINTF("Device %s config:\n", name);
+ DPRINTF("Device %s config:\n", vdev->vbasedev.name);
DPRINTF(" size: 0x%lx, offset: 0x%lx, flags: 0x%lx\n",
(unsigned long)reg_info.size, (unsigned long)reg_info.offset,
(unsigned long)reg_info.flags);
@@ -4075,78 +3123,6 @@ error:
return -errno;
}
-static int vfio_get_device(VFIOGroup *group, const char *name,
- VFIODevice *vbasedev)
-{
- struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
- struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
- struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
- int ret;
-
- ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
- if (ret < 0) {
- error_report("vfio: error getting device %s from group %d: %m",
- name, group->groupid);
- error_printf("Verify all devices in group %d are bound to vfio-pci "
- "or pci-stub and not already in use\n", group->groupid);
- return ret;
- }
-
- vbasedev->fd = ret;
- vbasedev->group = group;
- QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
-
- ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
- if (ret) {
- error_report("vfio: error getting device info: %m");
- goto error;
- }
-
- vbasedev->num_irqs = dev_info.num_irqs;
- vbasedev->num_regions = dev_info.num_regions;
- vbasedev->flags = dev_info.flags;
-
- DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
- dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
-
- vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-
- /* call device specific functions */
- ret = vbasedev->ops->vfio_check_device(vbasedev);
- if (ret) {
- error_report("vfio: error when checking device %s\n",
- vbasedev->name);
- goto error;
- }
- ret = vbasedev->ops->vfio_populate_regions(vbasedev);
- if (ret) {
- error_report("vfio: error when populating regions of device %s\n",
- vbasedev->name);
- goto error;
- }
- ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
- if (ret) {
- error_report("vfio: error when populating interrupts of device %s\n",
- vbasedev->name);
- goto error;
- }
-
-error:
- if (ret) {
- vfio_put_base_device(vbasedev);
- }
- return ret;
-}
-
-void vfio_put_base_device(VFIODevice *vbasedev)
-{
- QLIST_REMOVE(vbasedev, next);
- vbasedev->group = NULL;
- DPRINTF("vfio_put_device: close vdev->fd\n");
- close(vbasedev->fd);
- g_free(vbasedev->name);
-}
-
static void vfio_put_device(VFIOPCIDevice *vdev)
{
vfio_put_base_device(&vdev->vbasedev);
@@ -4524,47 +3500,3 @@ static void register_vfio_pci_dev_type(void)
}
type_init(register_vfio_pci_dev_type)
-
-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
- int req, void *param)
-{
- VFIOGroup *group;
- VFIOContainer *container;
- int ret = -1;
-
- group = vfio_get_group(groupid, as);
- if (!group) {
- error_report("vfio: group %d not registered", groupid);
- return ret;
- }
-
- container = group->container;
- if (group->container) {
- ret = ioctl(container->fd, req, param);
- if (ret < 0) {
- error_report("vfio: failed to ioctl container: ret=%d, %s",
- ret, strerror(errno));
- }
- }
-
- vfio_put_group(group);
-
- return ret;
-}
-
-int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
- int req, void *param)
-{
- /* We allow only certain ioctls to the container */
- switch (req) {
- case VFIO_CHECK_EXTENSION:
- case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
- break;
- default:
- /* Return an error on unknown requests */
- error_report("vfio: unsupported ioctl %X", req);
- return -1;
- }
-
- return vfio_container_do_ioctl(as, groupid, req, param);
-}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
new file mode 100644
index 0000000..d19622b
--- /dev/null
+++ b/include/hw/vfio/vfio-common.h
@@ -0,0 +1,148 @@
+/*
+ * common header for vfio based device assignment support
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ * Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ * Adapted for KVM by Qumranet.
+ * Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ * Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ * Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ * Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ * Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+#ifndef HW_VFIO_VFIO_COMMON_H
+#define HW_VFIO_VFIO_COMMON_H
+
+#include "exec/address-spaces.h"
+
+/*#define DEBUG_VFIO*/
+#ifdef DEBUG_VFIO
+#define DPRINTF(fmt, ...) \
+ do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+ do { } while (0)
+#endif
+
+/* Extra debugging, trap acceleration paths for more logging */
+#define VFIO_ALLOW_MMAP 1
+#define VFIO_ALLOW_KVM_INTX 1
+#define VFIO_ALLOW_KVM_MSI 1
+#define VFIO_ALLOW_KVM_MSIX 1
+
+enum {
+ VFIO_DEVICE_TYPE_PCI = 0,
+ VFIO_DEVICE_TYPE_PLATFORM = 1,
+};
+
+typedef struct VFIORegion {
+ struct VFIODevice *vbasedev;
+ off_t fd_offset; /* offset of region within device fd */
+ int fd; /* device fd, allows us to pass VFIORegion as opaque data */
+ MemoryRegion mem; /* slow, read/write access */
+ MemoryRegion mmap_mem; /* direct mapped access */
+ void *mmap;
+ size_t size;
+ uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
+ uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOAddressSpace {
+ AddressSpace *as;
+ QLIST_HEAD(, VFIOContainer) containers;
+ QLIST_ENTRY(VFIOAddressSpace) list;
+} VFIOAddressSpace;
+
+struct VFIOGroup;
+
+typedef struct VFIOType1 {
+ MemoryListener listener;
+ int error;
+ bool initialized;
+} VFIOType1;
+
+typedef struct VFIOContainer {
+ VFIOAddressSpace *space;
+ int fd; /* /dev/vfio/vfio, empowered by the attached groups */
+ struct {
+ /* enable abstraction to support various iommu backends */
+ union {
+ VFIOType1 type1;
+ };
+ void (*release)(struct VFIOContainer *);
+ } iommu_data;
+ QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+ QLIST_HEAD(, VFIOGroup) group_list;
+ QLIST_ENTRY(VFIOContainer) next;
+} VFIOContainer;
+
+typedef struct VFIOGuestIOMMU {
+ VFIOContainer *container;
+ MemoryRegion *iommu;
+ Notifier n;
+ QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
+} VFIOGuestIOMMU;
+
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+ QLIST_ENTRY(VFIODevice) next;
+ struct VFIOGroup *group;
+ char *name;
+ int fd;
+ int type;
+ bool reset_works;
+ bool needs_reset;
+ VFIODeviceOps *ops;
+ unsigned int num_irqs;
+ unsigned int num_regions;
+ unsigned int flags;
+} VFIODevice;
+
+struct VFIODeviceOps {
+ bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+ int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+ void (*vfio_eoi)(VFIODevice *vdev);
+ int (*vfio_check_device)(VFIODevice *vdev);
+ int (*vfio_populate_regions)(VFIODevice *vdev);
+ int (*vfio_populate_interrupts)(VFIODevice *vdev);
+};
+
+typedef struct VFIOGroup {
+ int fd;
+ int groupid;
+ VFIOContainer *container;
+ QLIST_HEAD(, VFIODevice) device_list;
+ QLIST_ENTRY(VFIOGroup) next;
+ QLIST_ENTRY(VFIOGroup) container_next;
+} VFIOGroup;
+
+void vfio_put_base_device(VFIODevice *vbasedev);
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index);
+#ifdef CONFIG_KVM
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index);
+#endif
+void vfio_region_write(void *opaque, hwaddr addr,
+ uint64_t data, unsigned size);
+uint64_t vfio_region_read(void *opaque,
+ hwaddr addr, unsigned size);
+void vfio_listener_release(VFIOContainer *container);
+int vfio_mmap_region(Object *vdev, VFIORegion *region,
+ MemoryRegion *mem, MemoryRegion *submem,
+ void **map, size_t size, off_t offset,
+ const char *name);
+void vfio_reset_handler(void *opaque);
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as);
+void vfio_put_group(VFIOGroup *group);
+int vfio_get_device(VFIOGroup *group, const char *name,
+ VFIODevice *vbasedev);
+
+#endif /* !HW_VFIO_VFIO_COMMON_H */
--
1.8.3.2
* [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (6 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 07/13] hw/vfio: create common module Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-07 12:40 ` Peter Maydell
2014-07-07 12:49 ` Will Deacon
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 09/13] hw/vfio/platform: add vfio-platform support Eric Auger
` (4 subsequent siblings)
12 siblings, 2 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
From: Alvise Rigo <a.rigo@virtualopensystems.com>
The flag is mandatory for the ARM SMMU, so we always add it if the
IOMMU supports it.
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
---
hw/vfio/common.c | 9 +++++++++
include/hw/vfio/vfio-common.h | 1 +
linux-headers/linux/vfio.h | 2 ++
3 files changed, 12 insertions(+)
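A minimal standalone sketch of the flag selection that this patch adds to vfio_dma_map(). The flag values mirror the ones defined in this patch; the helper build_dma_map_flags() and its parameters are hypothetical names used only for illustration, and the sketch assumes that every mapping is readable and that only read-only regions drop the write flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the linux-headers part of this patch. */
#define DMA_MAP_FLAG_READ  (1u << 0)   /* readable from device */
#define DMA_MAP_FLAG_WRITE (1u << 1)   /* writable from device */
#define DMA_MAP_FLAG_EXEC  (1u << 2)   /* executable from device */

/*
 * Hypothetical helper: compute the flags for one DMA mapping.
 * has_exec_cap stands for the result of the VFIO_CHECK_EXTENSION
 * probe done once when the container is connected.
 */
static uint32_t build_dma_map_flags(bool readonly, bool has_exec_cap)
{
    uint32_t flags = DMA_MAP_FLAG_READ;   /* assumption: always readable */

    if (!readonly) {
        flags |= DMA_MAP_FLAG_WRITE;
    }
    if (has_exec_cap) {
        flags |= DMA_MAP_FLAG_EXEC;       /* only when the IOMMU reports it */
    }
    return flags;
}
```

Probing the capability once per container and caching the result keeps the per-mapping path free of extra ioctls.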
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ed93cf3..e22f326 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -233,6 +233,11 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
}
+ /* add exec flag */
+ if (container->iommu_data.has_exec_cap) {
+ map.flags |= VFIO_DMA_MAP_FLAG_EXEC;
+ }
+
/*
* Try the mapping, if it fails with EBUSY, unmap the region and try
* again. This shouldn't be necessary, but we sometimes see it in
@@ -688,6 +693,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
goto free_container_exit;
}
+ if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_IOMMU_PROT_EXEC) > 0) {
+ container->iommu_data.has_exec_cap = true;
+ }
+
container->iommu_data.type1.listener = vfio_memory_listener;
container->iommu_data.release = vfio_listener_release;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index d19622b..e670ae3 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -76,6 +76,7 @@ typedef struct VFIOContainer {
union {
VFIOType1 type1;
};
+ bool has_exec_cap; /* support of exec capability by the IOMMU */
void (*release)(struct VFIOContainer *);
} iommu_data;
QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 26c218e..b13f7d3 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -30,6 +30,7 @@
*/
#define VFIO_DMA_CC_IOMMU 4
+#define VFIO_IOMMU_PROT_EXEC 5
/*
* The IOCTL interface is designed for extensibility by embedding the
* structure length (argsz) and flags into structures passed between
@@ -398,6 +399,7 @@ struct vfio_iommu_type1_dma_map {
__u32 flags;
#define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
+#define VFIO_DMA_MAP_FLAG_EXEC (1 << 2) /* executable from device */
__u64 vaddr; /* Process virtual address */
__u64 iova; /* IO virtual address */
__u64 size; /* Size of mapping (bytes) */
--
1.8.3.2
* [Qemu-devel] [RFC v4 09/13] hw/vfio/platform: add vfio-platform support
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (7 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 10/13] hw/intc/arm_gic_kvm: enable irqfd and set routing table Eric Auger
` (3 subsequent siblings)
12 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, Kim Phillips, patches, will.deacon,
agraf, stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis,
kvmarm
Minimal VFIO platform implementation supporting
- register space user mapping,
- IRQ assignment based on eventfds handled on the QEMU side.
irqfd kernel acceleration comes in a subsequent patch.
Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v3 -> v4:
[Eric Auger]
- merge of "vfio: Add initial IRQ support in platform device"
to get a fully functional patch, although performance is limited.
- removal of unrealize function since I currently understand
it is only used with device hot-plug feature.
v2 -> v3:
[Eric Auger]
- further factorization between PCI and platform (VFIORegion,
VFIODevice). same level of functionality.
<= v2:
[Kim Phillips]
- Initial Creation of the device supporting register space mapping
---
hw/vfio/Makefile.objs | 1 +
hw/vfio/platform.c | 528 ++++++++++++++++++++++++++++++++++++++++
include/hw/vfio/vfio-platform.h | 74 ++++++
3 files changed, 603 insertions(+)
create mode 100644 hw/vfio/platform.c
create mode 100644 include/hw/vfio/vfio-platform.h
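The IRQ handling in this patch (one active IRQ at a time, a queue of pending ones, and a timer that restores the mmap fast path) can be sketched as a small standalone state machine. This is a simplified model under stated assumptions, not the QEMU code: struct irq_model and the irq_*() functions are hypothetical names, the queue is a fixed-size ring, and the timer is reduced to a function that the caller invokes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum irq_state { IRQ_INACTIVE, IRQ_PENDING, IRQ_ACTIVE };

#define MAX_IRQS 8

/* Hypothetical model of one assigned device; not the QEMU structures. */
struct irq_model {
    enum irq_state state[MAX_IRQS];
    int pending[MAX_IRQS];   /* ring of IRQ indexes waiting for delivery */
    size_t head, tail;       /* ring read and write positions */
    bool fast_path;          /* true while MMIO regions are mmapped */
};

static void irq_model_init(struct irq_model *m)
{
    for (size_t i = 0; i < MAX_IRQS; i++) {
        m->state[i] = IRQ_INACTIVE;
    }
    m->head = m->tail = 0;
    m->fast_path = true;
}

static bool any_active(const struct irq_model *m)
{
    for (size_t i = 0; i < MAX_IRQS; i++) {
        if (m->state[i] == IRQ_ACTIVE) {
            return true;
        }
    }
    return false;
}

/* Eventfd handler: deliver the IRQ, or queue it if another one is active. */
static void irq_fire(struct irq_model *m, int irq)
{
    if (any_active(m)) {
        m->state[irq] = IRQ_PENDING;
        m->pending[m->tail++ % MAX_IRQS] = irq;
        return;
    }
    m->state[irq] = IRQ_ACTIVE;
    m->fast_path = false;    /* trap MMIO so that the EOI access is seen */
}

/* First trapped MMIO access after delivery: complete, then drain one. */
static void irq_eoi(struct irq_model *m)
{
    for (size_t i = 0; i < MAX_IRQS; i++) {
        if (m->state[i] == IRQ_ACTIVE) {
            m->state[i] = IRQ_INACTIVE;
        }
    }
    if (m->head != m->tail) {
        irq_fire(m, m->pending[m->head++ % MAX_IRQS]);
    }
}

/* Timer callback: restore the fast path only when nothing is active. */
static void irq_timer(struct irq_model *m)
{
    if (!any_active(m)) {
        m->fast_path = true;
    }
}
```

The model makes the main limitation visible: since completion is inferred from any trapped access, a second IRQ has to wait in the queue until the first one is completed.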
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index e31f30e..c5c76fe 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,5 @@
ifeq ($(CONFIG_LINUX), y)
obj-$(CONFIG_SOFTMMU) += common.o
obj-$(CONFIG_PCI) += pci.o
+obj-$(CONFIG_SOFTMMU) += platform.o
endif
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
new file mode 100644
index 0000000..a5fc22b
--- /dev/null
+++ b/hw/vfio/platform.c
@@ -0,0 +1,528 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ * Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ * Copyright Red Hat, Inc. 2012
+ */
+
+#include <linux/vfio.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "qemu/error-report.h"
+#include "qemu/range.h"
+#include "sysemu/sysemu.h"
+#include "hw/vfio/vfio-platform.h"
+
+extern const MemoryRegionOps vfio_region_ops;
+extern const MemoryListener vfio_memory_listener;
+extern QLIST_HEAD(, VFIOGroup) group_list;
+extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
+
+static void vfio_put_device(VFIOPlatformDevice *vdev)
+{
+ unsigned int i;
+ VFIODevice *vbasedev = &vdev->vbasedev;
+
+ for (i = 0; i < vbasedev->num_regions; i++) {
+ g_free(vdev->regions[i]);
+ }
+ g_free(vdev->regions);
+ vfio_put_base_device(&vdev->vbasedev);
+}
+
+/*
+ * It is mandatory to pass a VFIOPlatformDevice since VFIODevice
+ * is not a QOM Object and cannot be passed to memory region functions
+*/
+static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
+{
+ VFIORegion *region = vdev->regions[nr];
+ unsigned size = region->size;
+ char name[64];
+
+ snprintf(name, sizeof(name), "VFIO %s region %d",
+ vdev->vbasedev.name, nr);
+
+ /* A "slow" read/write mapping underlies all regions */
+ memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
+ region, name, size);
+
+ strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
+
+ if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
+ &region->mmap_mem, &region->mmap, size, 0, name)) {
+ error_report("%s unsupported. Performance may be slow", name);
+ }
+}
+
+static void print_regions(VFIOPlatformDevice *vdev)
+{
+ int i;
+
+ DPRINTF("Device \"%s\" counts %d region(s):\n",
+ vdev->vbasedev.name, vdev->vbasedev.num_regions);
+
+ for (i = 0; i < vdev->vbasedev.num_regions; i++) {
+ DPRINTF("- region %d flags = 0x%lx, size = 0x%lx, "
+ "fd= %d, offset = 0x%lx\n",
+ vdev->regions[i]->nr,
+ (unsigned long)vdev->regions[i]->flags,
+ (unsigned long)vdev->regions[i]->size,
+ vdev->regions[i]->fd,
+ (unsigned long)vdev->regions[i]->fd_offset);
+ }
+}
+
+static int vfio_populate_regions(VFIODevice *vbasedev)
+{
+ struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+ int i, ret = 0;
+ VFIOPlatformDevice *vdev =
+ container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+ vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
+
+ for (i = 0; i < vbasedev->num_regions; i++) {
+ vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
+ reg_info.index = i;
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+ if (ret) {
+ error_report("vfio: Error getting region %d info: %m", i);
+ goto error;
+ }
+
+ vdev->regions[i]->flags = reg_info.flags;
+ vdev->regions[i]->size = reg_info.size;
+ vdev->regions[i]->fd_offset = reg_info.offset;
+ vdev->regions[i]->fd = vbasedev->fd;
+ vdev->regions[i]->nr = i;
+ vdev->regions[i]->vbasedev = vbasedev;
+ }
+ print_regions(vdev);
+ return ret;
+error:
+ vfio_put_device(vdev);
+ return ret;
+}
+
+/* not implemented yet */
+static int vfio_platform_check_device(VFIODevice *vdev)
+{
+ return 0;
+}
+
+/* not implemented yet */
+static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
+{
+ return false;
+}
+
+/* not implemented yet */
+static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
+{
+ return 0;
+}
+
+/*
+ * eoi function is called on the first access to any MMIO region
+ * after an IRQ was triggered. It is assumed this access corresponds
+ * to the IRQ status register reset.
+ * With such a mechanism, a single IRQ can be handled at a time since
+ * there is no way to know which IRQ was completed by the guest.
+ * (we would need additional details about the IRQ status register mask)
+ */
+static void vfio_platform_eoi(VFIODevice *vbasedev)
+{
+ VFIOINTp *intp;
+ VFIOPlatformDevice *vdev =
+ container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+ bool eoi_done = false;
+
+ QLIST_FOREACH(intp, &vdev->intp_list, next) {
+ if (intp->state == VFIO_IRQ_ACTIVE) {
+ if (eoi_done) {
+ error_report("several IRQs pending: "
+ "this case should not happen!");
+ }
+ DPRINTF("EOI IRQ #%d fd=%d\n",
+ intp->pin, event_notifier_get_fd(&intp->interrupt));
+ intp->state = VFIO_IRQ_INACTIVE;
+
+ /* deassert the virtual IRQ and unmask physical one */
+ qemu_set_irq(intp->qemuirq, 0);
+ vfio_unmask_irqindex(vbasedev, intp->pin);
+ eoi_done = true;
+ }
+ }
+
+ /*
+ * in case there are pending IRQs, handle them one at a time */
+ if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
+ intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
+ vfio_intp_interrupt(intp);
+ QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
+ }
+ return;
+}
+
+/*
+ * enable/disable the fast path mode
+ * fast path = MMIO region is mmapped (no KVM trap)
+ * slow path = MMIO region is trapped and region callbacks are called
+ * slow path allows trapping the guest reset of the IRQ status register
+*/
+
+static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
+{
+ VFIORegion *region;
+ int i;
+
+ DPRINTF("fast path = %d\n", enabled);
+
+ for (i = 0; i < vdev->vbasedev.num_regions; i++) {
+ region = vdev->regions[i];
+
+ /* register space is unmapped to trap EOI */
+ memory_region_set_enabled(&region->mmap_mem, enabled);
+ }
+}
+
+/*
+ * Checks whether an IRQ is still active. If not, the fast path
+ * mode (where the register space is mmapped) can be restored.
+ * If an IRQ is still active, we must keep trapping the IRQ status
+ * register reset with mmap disabled (slow path).
+ * The function is called on mmap_timer event.
+ * By construction a single fd is handled at a time. See the EOI
+ * comment for additional details.
+ */
+static void vfio_intp_mmap_enable(void *opaque)
+{
+ VFIOINTp *tmp;
+ VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
+ bool one_active_irq = false;
+
+ QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+ if (tmp->state == VFIO_IRQ_ACTIVE) {
+ if (one_active_irq) {
+ error_report("several active IRQs: "
+ "this case should not happen!");
+ }
+ DPRINTF("IRQ #%d still pending, stay in slow path\n",
+ tmp->pin);
+ timer_mod(vdev->mmap_timer,
+ qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+ vdev->mmap_timeout);
+ one_active_irq = true;
+ }
+ }
+ if (one_active_irq) {
+ return;
+ }
+ DPRINTF("no pending IRQ, restore fast path\n");
+ vfio_mmap_set_enabled(vdev, true);
+}
+
+/*
+ * The fd handler
+ */
+void vfio_intp_interrupt(void *opaque)
+{
+ int ret;
+ VFIOINTp *tmp, *intp = (VFIOINTp *)opaque;
+ VFIOPlatformDevice *vdev = intp->vdev;
+ bool one_active_irq = false;
+
+ /*
+ * First check whether there is an active IRQ. If so, the new
+ * IRQ cannot be handled until the active one is completed.
+ * By construction the same IRQ as the active one cannot hit
+ * since the physical IRQ was disabled by the VFIO driver.
+ */
+ QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+ if (tmp->state == VFIO_IRQ_ACTIVE) {
+ one_active_irq = true;
+ }
+ }
+ if (one_active_irq) {
+ /*
+ * the new IRQ gets a pending status and is pushed in
+ * the pending queue
+ */
+ intp->state = VFIO_IRQ_PENDING;
+ QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
+ intp, pqnext);
+ return;
+ }
+
+ /* no active IRQ, the new IRQ can be forwarded to guest */
+ DPRINTF("Handle IRQ #%d (fd = %d)\n",
+ intp->pin, event_notifier_get_fd(&intp->interrupt));
+
+ ret = event_notifier_test_and_clear(&intp->interrupt);
+ if (!ret) {
+ DPRINTF("Error when clearing fd=%d\n",
+ event_notifier_get_fd(&intp->interrupt));
+ }
+
+ intp->state = VFIO_IRQ_ACTIVE;
+
+ /* sets slow path */
+ vfio_mmap_set_enabled(vdev, false);
+
+ /* trigger the virtual IRQ */
+ qemu_set_irq(intp->qemuirq, 1);
+
+ /* schedule the mmap timer which will restore mmap path after EOI*/
+ if (vdev->mmap_timeout) {
+ timer_mod(vdev->mmap_timer,
+ qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + vdev->mmap_timeout);
+ }
+}
+
+static int vfio_enable_intp(VFIODevice *vbasedev, unsigned int index)
+{
+ struct vfio_irq_set *irq_set;
+ int32_t *pfd;
+ int ret, argsz;
+ int device = vbasedev->fd;
+ VFIOPlatformDevice *vdev =
+ container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+ SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
+ VFIOINTp *intp;
+
+ /* allocate and populate a new VFIOINTp structure put in a queue list */
+ intp = g_malloc0(sizeof(*intp));
+ intp->vdev = vdev;
+ intp->pin = index;
+ intp->state = VFIO_IRQ_INACTIVE;
+ sysbus_init_irq(sbdev, &intp->qemuirq);
+
+ ret = event_notifier_init(&intp->interrupt, 0);
+
+ if (ret) {
+ error_report("vfio: Error: event_notifier_init failed ");
+ return ret;
+ }
+ /* build the irq_set to be passed to the vfio kernel driver */
+
+ argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+ irq_set = g_malloc0(argsz);
+ irq_set->argsz = argsz;
+ irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+ irq_set->index = index;
+ irq_set->start = 0;
+ irq_set->count = 1;
+ pfd = (int32_t *)&irq_set->data;
+
+ *pfd = event_notifier_get_fd(&intp->interrupt);
+
+ DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
+
+ qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
+
+ /*
+ * pass the index/fd binding to the kernel driver so that it
+ * triggers this fd on HW IRQ
+ */
+ ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
+ g_free(irq_set);
+ if (ret) {
+ ret = -errno;
+ error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
+ /* irq_set is freed: pfd points into it and must not be dereferenced */
+ qemu_set_fd_handler(event_notifier_get_fd(&intp->interrupt),
+ NULL, NULL, NULL);
+ event_notifier_cleanup(&intp->interrupt);
+ return ret;
+ }
+
+ /* store the new intp in qlist */
+ QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
+ return 0;
+}
+
+static int vfio_populate_interrupts(VFIODevice *vbasedev)
+{
+ struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+ int i, ret;
+ VFIOPlatformDevice *vdev =
+ container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+ vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+ vfio_intp_mmap_enable, vdev);
+
+ QSIMPLEQ_INIT(&vdev->pending_intp_queue);
+
+ for (i = 0; i < vbasedev->num_irqs; i++) {
+ irq.index = i;
+
+ DPRINTF("Retrieve IRQ info from vfio platform driver ...\n");
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+ if (ret) {
+ error_report("vfio: error getting device %s irq info",
+ vbasedev->name);
+ return ret;
+ }
+ DPRINTF("- IRQ index %d: count %d, flags=0x%x\n",
+ irq.index, irq.count, irq.flags);
+
+ vfio_enable_intp(vbasedev, irq.index);
+ }
+ return 0;
+}
+
+static VFIODeviceOps vfio_platform_ops = {
+ .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
+ .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
+ .vfio_eoi = vfio_platform_eoi,
+ .vfio_check_device = vfio_platform_check_device,
+ .vfio_populate_regions = vfio_populate_regions,
+ .vfio_populate_interrupts = vfio_populate_interrupts,
+};
+
+static int vfio_base_device_init(VFIODevice *vbasedev)
+{
+ VFIOGroup *group;
+ VFIODevice *vbasedev_iter;
+ char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
+ ssize_t len;
+ struct stat st;
+ int groupid;
+ int ret;
+
+ /* name must be set prior to the call */
+ if (vbasedev->name == NULL) {
+ return -EINVAL;
+ }
+
+ /* Check that the host device exists */
+ snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
+ vbasedev->name);
+
+ if (stat(path, &st) < 0) {
+ error_report("vfio: error: no such host device: %s", path);
+ return -errno;
+ }
+
+ strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
+ len = readlink(path, iommu_group_path, sizeof(path));
+ if (len <= 0 || len >= sizeof(path)) {
+ error_report("vfio: error no iommu_group for device");
+ return len < 0 ? -errno : -ENAMETOOLONG;
+ }
+
+ iommu_group_path[len] = 0;
+ group_name = basename(iommu_group_path);
+
+ if (sscanf(group_name, "%d", &groupid) != 1) {
+ error_report("vfio: error reading %s: %m", path);
+ return -errno;
+ }
+
+ DPRINTF("%s(%s) group %d\n", __func__, vbasedev->name, groupid);
+
+ group = vfio_get_group(groupid, &address_space_memory);
+ if (!group) {
+ error_report("vfio: failed to get group %d", groupid);
+ return -ENOENT;
+ }
+
+ snprintf(path, sizeof(path), "%s", vbasedev->name);
+
+ QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+ if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
+ error_report("vfio: error: device %s is already attached", path);
+ vfio_put_group(group);
+ return -EBUSY;
+ }
+ }
+ ret = vfio_get_device(group, path, vbasedev);
+ if (ret < 0) {
+ error_report("vfio: failed to get device %s", path);
+ vfio_put_group(group);
+ return ret;
+ }
+ return ret;
+}
+
+static void vfio_platform_realize(DeviceState *dev, Error **errp)
+{
+ VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+ SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
+ VFIODevice *vbasedev = &vdev->vbasedev;
+ int i, ret;
+
+ vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
+ vbasedev->ops = &vfio_platform_ops;
+
+ DPRINTF("vfio device %s, compat = %s\n", vbasedev->name, vdev->compat);
+
+ ret = vfio_base_device_init(vbasedev);
+ if (ret < 0) {
+ return;
+ }
+
+ for (i = 0; i < vbasedev->num_regions; i++) {
+ vfio_map_region(vdev, i);
+ sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
+ }
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+ .name = TYPE_VFIO_PLATFORM,
+ .version_id = 3,
+ .minimum_version_id = 2,
+ .fields = (VMStateField[]) {
+ VMSTATE_END_OF_LIST()
+ },
+ .unmigratable = 1,
+};
+
+static Property vfio_platform_dev_properties[] = {
+ DEFINE_PROP_STRING("vfio_device", VFIOPlatformDevice, vbasedev.name),
+ DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
+ DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
+ mmap_timeout, 1100),
+ DEFINE_PROP_UINT32("num_irqs", VFIOPlatformDevice,
+ vbasedev.num_irqs, 0),
+ DEFINE_PROP_UINT32("num_regions", VFIOPlatformDevice,
+ vbasedev.num_regions, 0),
+ DEFINE_PROP_BOOL("irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+ DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vfio_platform_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+
+ dc->realize = vfio_platform_realize;
+ dc->props = vfio_platform_dev_properties;
+ dc->vmsd = &vfio_platform_vmstate;
+ dc->desc = "VFIO-based platform device assignment";
+ set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo vfio_platform_dev_info = {
+ .name = TYPE_VFIO_PLATFORM,
+ .parent = TYPE_SYS_BUS_DEVICE,
+ .instance_size = sizeof(VFIOPlatformDevice),
+ .class_init = vfio_platform_class_init,
+ .class_size = sizeof(VFIOPlatformDeviceClass),
+};
+
+static void register_vfio_platform_dev_type(void)
+{
+ type_register_static(&vfio_platform_dev_info);
+}
+
+type_init(register_vfio_platform_dev_type)
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
new file mode 100644
index 0000000..134fc1e
--- /dev/null
+++ b/include/hw/vfio/vfio-platform.h
@@ -0,0 +1,74 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ * Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ * Copyright Red Hat, Inc. 2012
+ */
+
+#ifndef HW_VFIO_VFIO_PLATFORM_H
+#define HW_VFIO_VFIO_PLATFORM_H
+
+#include "hw/sysbus.h"
+#include "hw/vfio/vfio-common.h"
+
+#define TYPE_VFIO_PLATFORM "vfio-platform"
+
+enum {
+ VFIO_IRQ_INACTIVE = 0,
+ VFIO_IRQ_PENDING = 1,
+ VFIO_IRQ_ACTIVE = 2,
+ /* VFIO_IRQ_ACTIVE_AND_PENDING cannot happen with VFIO */
+};
+
+typedef struct VFIOINTp {
+ QLIST_ENTRY(VFIOINTp) next; /* entry for IRQ list */
+ QSIMPLEQ_ENTRY(VFIOINTp) pqnext; /* entry for pending IRQ queue */
+ EventNotifier interrupt; /* eventfd triggered on interrupt */
+ EventNotifier unmask; /* eventfd for unmask on QEMU bypass */
+ qemu_irq qemuirq;
+ struct VFIOPlatformDevice *vdev; /* back pointer to device */
+ int state; /* inactive, pending, active */
+ bool kvm_accel; /* set when QEMU bypass through KVM enabled */
+ uint8_t pin; /* index */
+ uint8_t virtualID; /* virtual IRQ */
+} VFIOINTp;
+
+typedef struct VFIOPlatformDevice {
+ SysBusDevice sbdev;
+ VFIODevice vbasedev; /* not a QOM object */
+ VFIORegion **regions;
+ QLIST_HEAD(, VFIOINTp) intp_list; /* list of IRQ */
+ /* queue of pending IRQ */
+ QSIMPLEQ_HEAD(pending_intp_queue, VFIOINTp) pending_intp_queue;
+ char *compat; /* compatibility string */
+ bool irqfd_allowed;
+ uint32_t mmap_timeout; /* delay to re-enable mmaps after interrupt */
+ QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
+} VFIOPlatformDevice;
+
+
+typedef struct VFIOPlatformDeviceClass {
+ /*< private >*/
+ SysBusDeviceClass parent_class;
+ /*< public >*/
+} VFIOPlatformDeviceClass;
+
+#define VFIO_PLATFORM_DEVICE(obj) \
+ OBJECT_CHECK(VFIOPlatformDevice, (obj), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_CLASS(klass) \
+ OBJECT_CLASS_CHECK(VFIOPlatformDeviceClass, (klass), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(VFIOPlatformDeviceClass, (obj), TYPE_VFIO_PLATFORM)
+
+void vfio_intp_interrupt(void *opaque);
+void vfio_setup_irqfd(SysBusDevice *dev, int index, int virq);
+
+#endif
--
1.8.3.2
* [Qemu-devel] [RFC v4 10/13] hw/intc/arm_gic_kvm: enable irqfd and set routing table
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (8 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 09/13] hw/vfio/platform: add vfio-platform support Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 11/13] hw/vfio/platform: Add irqfd support Eric Auger
` (2 subsequent siblings)
12 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
Makes it possible to use KVM irqfd. An identity GSI routing table
is defined so that virtual IRQ injection can happen.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/intc/arm_gic_kvm.c | 11 +++++++++++
1 file changed, 11 insertions(+)
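The identity routing table mentioned above can be pictured with the following standalone sketch, where GSI i is routed to pin i of irqchip 0. struct irq_route and build_identity_routes() are hypothetical stand-ins for the KVM routing entries and for the loop added to kvm_arm_gic_realize(); the value 32 for the number of internal (SGI and PPI) interrupts is an assumption matching GIC_INTERNAL.

```c
#include <assert.h>
#include <stddef.h>

#define GIC_INTERNAL_IRQS 32   /* assumed value of GIC_INTERNAL */

/* Hypothetical stand-in for one KVM irqchip routing entry. */
struct irq_route {
    unsigned gsi;       /* global system interrupt number */
    unsigned irqchip;   /* irqchip index, always 0 here */
    unsigned pin;       /* input pin on that irqchip */
};

/*
 * Fill routes[] with one identity entry per shared peripheral
 * interrupt and return the number of entries written.
 */
static size_t build_identity_routes(unsigned num_irq,
                                    struct irq_route *routes,
                                    size_t capacity)
{
    size_t n = 0;

    if (num_irq < GIC_INTERNAL_IRQS) {
        return 0;               /* no external interrupt lines */
    }
    for (unsigned i = 0; i < num_irq - GIC_INTERNAL_IRQS && n < capacity; i++) {
        routes[n].gsi = i;
        routes[n].irqchip = 0;
        routes[n].pin = i;
        n++;
    }
    return n;
}
```

With such a table an irqfd registered for GSI n injects the n-th external interrupt line, so no translation is needed between the two number spaces.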
diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index 5038885..29b9236 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -576,6 +576,17 @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
KVM_DEV_ARM_VGIC_GRP_ADDR,
KVM_VGIC_V2_ADDR_TYPE_CPU,
s->dev_fd);
+
+ /* set up irq routing */
+ kvm_init_irq_routing(kvm_state);
+ for (i = 0; i < s->num_irq - GIC_INTERNAL; ++i) {
+ kvm_irqchip_add_irq_route(kvm_state, i, 0, i);
+ }
+
+ kvm_irqfds_allowed = true;
+ kvm_gsi_routing_allowed = true;
+
+ kvm_irqchip_commit_routes(kvm_state);
}
static void kvm_arm_gic_class_init(ObjectClass *klass, void *data)
--
1.8.3.2
* [Qemu-devel] [RFC v4 11/13] hw/vfio/platform: Add irqfd support
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (9 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 10/13] hw/intc/arm_gic_kvm: enable irqfd and set routing table Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 12/13] hw/vfio/platform: add default dt generation for vfio device Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 13/13] hw/vfio: add an example calxeda_xgmac Eric Auger
12 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
This patch optimizes IRQ handling using the irqfd framework.
Instead of handling the eventfds on the user side, they are handled on
the kernel side using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.
The virtual IRQ completion is trapped at the interrupt controller
instead of on the guest's first access to any region after the IRQ hit.
This removes the need for the fast/slow path swap.
Overall this brings significant performance improvements.
It depends on the host kernel's KVM irqfd/GSI routing capability.
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
v3 -> v4:
[Alvise Rigo]
Use of VFIO Platform driver v6 unmask/virqfd feature and removal
of resamplefd handler. Physical IRQ unmasking is now done in
VFIO driver.
v3:
[Eric Auger]
initial support with resamplefd handled on QEMU side since the
unmask was not supported on VFIO platform driver v5.
---
hw/vfio/platform.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
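Both the trigger and the unmask setup pass an eventfd to the VFIO driver in a variable-length request: a fixed header followed by one 32-bit file descriptor. The sketch below shows how such a request is sized and filled. struct irq_set_sketch is a local stand-in whose layout is assumed to follow struct vfio_irq_set, the SKETCH_* flag values are placeholders, and build_unmask_request() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for struct vfio_irq_set; the layout is an assumption. */
struct irq_set_sketch {
    uint32_t argsz;   /* total size: header plus payload */
    uint32_t flags;   /* data type and action */
    uint32_t index;   /* IRQ index on the device */
    uint32_t start;   /* first sub-index */
    uint32_t count;   /* number of payload entries */
    uint8_t data[];   /* payload: here one int32_t eventfd */
};

/* Placeholder flag values, for illustration only. */
#define SKETCH_DATA_EVENTFD  (1u << 2)
#define SKETCH_ACTION_UNMASK (1u << 4)

/* Build a request asking the driver to unmask IRQ 'index' when fd fires. */
static struct irq_set_sketch *build_unmask_request(uint32_t index, int32_t fd)
{
    size_t argsz = sizeof(struct irq_set_sketch) + sizeof(fd);
    struct irq_set_sketch *set = calloc(1, argsz);

    if (!set) {
        return NULL;
    }
    set->argsz = (uint32_t)argsz;
    set->flags = SKETCH_DATA_EVENTFD | SKETCH_ACTION_UNMASK;
    set->index = index;
    set->start = 0;
    set->count = 1;
    memcpy(set->data, &fd, sizeof(fd));   /* payload follows the header */
    return set;
}
```

The payload lives inside the allocation, so any pointer into it becomes invalid once the request is freed; the fd should be kept in a separate variable if it is needed afterwards.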
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a5fc22b..fb0f7c9 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -381,6 +381,101 @@ static int vfio_populate_interrupts(VFIODevice *vbasedev)
return 0;
}
+static void vfio_enable_intp_kvm(VFIOINTp *intp)
+{
+#ifdef CONFIG_KVM
+ struct kvm_irqfd irqfd = {
+ .fd = event_notifier_get_fd(&intp->interrupt),
+ .gsi = intp->virtualID,
+ .flags = KVM_IRQFD_FLAG_RESAMPLE,
+ };
+
+ struct vfio_irq_set *irq_set;
+ int ret, argsz;
+ int32_t *pfd;
+ VFIODevice *vbasedev = &intp->vdev->vbasedev;
+
+ if (!kvm_irqfds_enabled() ||
+ !kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
+ return;
+ }
+
+ /* Get to a known interrupt state */
+ qemu_set_fd_handler(irqfd.fd, NULL, NULL, NULL);
+ vfio_mask_irqindex(vbasedev, intp->pin);
+ intp->state = VFIO_IRQ_INACTIVE;
+ qemu_set_irq(intp->qemuirq, 0);
+
+ /* Get an eventfd for resample/unmask */
+ if (event_notifier_init(&intp->unmask, 0)) {
+ error_report("vfio: Error: event_notifier_init failed eoi");
+ goto fail;
+ }
+
+ /* KVM triggers it, VFIO listens for it */
+ irqfd.resamplefd = event_notifier_get_fd(&intp->unmask);
+
+ if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+ error_report("vfio: Error: Failed to setup resample irqfd: %m");
+ goto fail_irqfd;
+ }
+
+ argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+ irq_set = g_malloc0(argsz);
+ irq_set->argsz = argsz;
+ irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+ irq_set->index = intp->pin;
+ irq_set->start = 0;
+ irq_set->count = 1;
+ pfd = (int32_t *)&irq_set->data;
+
+ *pfd = irqfd.resamplefd;
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ g_free(irq_set);
+ if (ret) {
+ error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
+ goto fail_vfio;
+ }
+
+ /* Let'em rip */
+ vfio_unmask_irqindex(vbasedev, intp->pin);
+
+ intp->kvm_accel = true;
+
+ DPRINTF("%s irqfd pin=%d to virtID = %d (fd=%d, resamplefd=%d)\n",
+ __func__, intp->pin, intp->virtualID,
+ irqfd.fd, irqfd.resamplefd);
+
+ return;
+
+fail_vfio:
+ irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
+ kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+fail_irqfd:
+ event_notifier_cleanup(&intp->unmask);
+fail:
+ qemu_set_fd_handler(irqfd.fd, vfio_intp_interrupt, NULL, intp);
+ vfio_unmask_irqindex(vbasedev, intp->pin);
+#endif
+}
+
+void vfio_setup_irqfd(SysBusDevice *s, int index, int virq)
+{
+ VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
+ VFIOINTp *intp;
+
+ QLIST_FOREACH(intp, &vdev->intp_list, next) {
+ if (intp->pin == index) {
+ intp->virtualID = virq;
+ DPRINTF("enable irqfd for irq index %d (virtual IRQ %d)\n",
+ index, virq);
+ vfio_enable_intp_kvm(intp);
+ }
+ }
+}
+
static VFIODeviceOps vfio_platform_ops = {
.vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
.vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
--
1.8.3.2
* [Qemu-devel] [RFC v4 12/13] hw/vfio/platform: add default dt generation for vfio device
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (10 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 11/13] hw/vfio/platform: Add irqfd support Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 13/13] hw/vfio: add an example calxeda_xgmac Eric Auger
12 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
Implement a default device tree (dt) generation for the VFIO platform
device. This enables dynamic instantiation. If the generated device
tree node does not match expectations, it is possible to derive a
component and write one's own fdt_add_device_node method.
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/platform.c | 143 ++++++++++++++++++++++++++++++++++++++++
include/hw/vfio/vfio-platform.h | 1 +
2 files changed, 144 insertions(+)
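The compat property processing can be illustrated with a standalone variant that uses only the C library. In the user-supplied string, '/' stands for ',' and ';' separates entries; the result is a list of NUL-terminated strings laid out back to back. compat_to_stringlist() is a hypothetical name, and returning the total length through an out parameter is a choice of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone variant of format_compat(): returns a
 * malloc'ed copy of compat where '/' becomes ',' and ';' becomes NUL,
 * and stores the total length (final NUL included) in *len.
 */
static char *compat_to_stringlist(const char *compat, size_t *len)
{
    size_t n = strlen(compat) + 1;
    char *out = malloc(n);

    if (!out) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        char c = compat[i];

        if (c == '/') {
            out[i] = ',';
        } else if (c == ';') {
            out[i] = '\0';    /* entry separator in the string list */
        } else {
            out[i] = c;       /* also copies the terminating NUL */
        }
    }
    *len = n;
    return out;
}
```

Since the converted buffer contains embedded NUL characters, it has to be copied and stored with an explicit length (memcpy() rather than a "%s" format), otherwise everything after the first entry is lost.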
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index fb0f7c9..447ce50 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -22,6 +22,7 @@
#include "qemu/range.h"
#include "sysemu/sysemu.h"
#include "hw/vfio/vfio-platform.h"
+#include "hw/misc/platform_devices.h"
extern const MemoryRegionOps vfio_region_ops;
extern const MemoryListener vfio_memory_listener;
@@ -573,6 +574,142 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
}
}
+static char *format_compat(char * compat)
+{
+ char *str_ptr, *corrected_compat;
+ /*
+ * process compatibility property string passed by end-user
+ * replaces / by , and ; by NUL character
+ */
+ corrected_compat = g_strdup(compat);
+ /*
+ * the total length of the string has to include also the last
+ * NUL char.
+ */
+
+ str_ptr = corrected_compat;
+ while ((str_ptr = strchr(str_ptr, '/')) != NULL) {
+ *str_ptr = ',';
+ }
+
+ /* substitute ";" with the NUL char */
+ str_ptr = corrected_compat;
+ while ((str_ptr = strchr(str_ptr, ';')) != NULL) {
+ *str_ptr = '\0';
+ }
+
+ return corrected_compat;
+}
+
+static void wrap_fdt_add_node(SysBusDevice *sbdev, void *opaque)
+{
+ PlatformDevtreeData *data = opaque;
+ VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+ VFIOPlatformDeviceClass *vdevc = VFIO_PLATFORM_DEVICE_GET_CLASS(sbdev);
+ VFIODevice *vbasedev = &vdev->vbasedev;
+ gchar irq_number_prop[8];
+ Object *obj = OBJECT(sbdev);
+ char *corrected_compat;
+ uint64_t irq_number;
+ int compat_str_len = strlen(vdev->compat)+1;
+ int i;
+
+ corrected_compat = format_compat(vdev->compat);
+ snprintf(vdev->compat, compat_str_len, "%s", corrected_compat);
+ g_free(corrected_compat);
+
+ vdevc->fdt_add_device_node(sbdev, opaque);
+
+ for (i = 0; i < vbasedev->num_irqs; i++) {
+ snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
+ irq_number = object_property_get_int(obj, irq_number_prop, NULL)
+ + data->irq_start;
+ /*
+ * this is a hack: this does not relate to dt creation
+ * for setting up irqfd we must give the virtual IRQ number
+ * which is the sum of irq_start and actual platform bus irq
+ * index. at realize point we do not have this info.
+ * in case of static instantiation, the irq connect is not yet
+ * done. In case of dynamic instantiation it was done but we
+ * miss irq_start which is not stored in the sysbus device.
+ */
+ if (vdev->irqfd_allowed) {
+ vfio_setup_irqfd(sbdev, i, irq_number);
+ }
+ }
+}
+
+static void default_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
+{
+ PlatformDevtreeData *data = opaque;
+ void *fdt = data->fdt;
+ const char *parent_node = data->node;
+ uint64_t platform_bus_base = data->platform_bus_base;
+ int compat_str_len;
+ char *nodename;
+ int i, ret;
+ uint32_t *irq_attr;
+ uint64_t *reg_attr;
+ uint64_t mmio_base;
+ uint64_t irq_number;
+ gchar mmio_base_prop[8];
+ gchar irq_number_prop[8];
+ VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+ VFIODevice *vbasedev = &vdev->vbasedev;
+ Object *obj = OBJECT(sbdev);
+
+ mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+ nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+ vbasedev->name,
+ mmio_base + platform_bus_base);
+
+ qemu_fdt_add_subnode(fdt, nodename);
+
+ compat_str_len = strlen(vdev->compat) + 1;
+ qemu_fdt_setprop(fdt, nodename, "compatible",
+ vdev->compat, compat_str_len);
+
+ reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+ for (i = 0; i < vbasedev->num_regions; i++) {
+ snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
+ mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
+ reg_attr[4*i] = 2;
+ reg_attr[4*i+1] = mmio_base + platform_bus_base;
+ reg_attr[4*i+2] = 2;
+ reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
+ }
+
+ ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
+ vbasedev->num_regions*2, reg_attr);
+ if (ret < 0) {
+ error_report("could not set reg property of node %s", nodename);
+ }
+
+ irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
+
+ for (i = 0; i < vbasedev->num_irqs; i++) {
+ snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
+ irq_number = object_property_get_int(obj, irq_number_prop, NULL)
+ + data->irq_start;
+ irq_attr[3*i] = cpu_to_be32(0);
+ irq_attr[3*i+1] = cpu_to_be32(irq_number);
+ irq_attr[3*i+2] = cpu_to_be32(0x4);
+ }
+
+ ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+ irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
+ if (ret < 0) {
+ error_report("could not set interrupts property of node %s",
+ nodename);
+ }
+
+ g_free(nodename);
+ g_free(irq_attr);
+ g_free(reg_attr);
+}
+
static const VMStateDescription vfio_platform_vmstate = {
.name = TYPE_VFIO_PLATFORM,
.version_id = 3,
@@ -599,6 +736,12 @@ static Property vfio_platform_dev_properties[] = {
static void vfio_platform_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
+ SysBusDeviceClass *sbdevc = SYS_BUS_DEVICE_CLASS(klass);
+ VFIOPlatformDeviceClass *vdc = VFIO_PLATFORM_DEVICE_CLASS(klass);
+
+ /* substitute wrap_fdt_add_node to parent fdt_add_node */
+ sbdevc->fdt_add_node = wrap_fdt_add_node;
+ vdc->fdt_add_device_node = default_fdt_add_device_node;
dc->realize = vfio_platform_realize;
dc->props = vfio_platform_dev_properties;
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index 134fc1e..1e9a5f4 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -59,6 +59,7 @@ typedef struct VFIOPlatformDeviceClass {
/*< private >*/
SysBusDeviceClass parent_class;
/*< public >*/
+ void (*fdt_add_device_node)(SysBusDevice *sbdev, void *opaque);
} VFIOPlatformDeviceClass;
#define VFIO_PLATFORM_DEVICE(obj) \
--
1.8.3.2
* [Qemu-devel] [RFC v4 13/13] hw/vfio: add an example calxeda_xgmac
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
` (11 preceding siblings ...)
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 12/13] hw/vfio/platform: add default dt generation for vfio device Eric Auger
@ 2014-07-07 12:27 ` Eric Auger
12 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 12:27 UTC (permalink / raw)
To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm
Example of a derived VFIO device. It illustrates how to specialize
the device node addition, although this does not make sense for
this particular device, which could reuse the default
VFIOPlatformDeviceClass fdt_add_device_node implementation.
The end-user does not need to specify the compat, which is hard
coded in the realize function.
-device vfio-calxeda-xgmac,vfio_device="fff51000.ethernet"
can be used instead of
-device vfio-platform,vfio_device="fff51000.ethernet",compat="calxeda/hb-xgmac"
This is an example. NOT TO BE APPLIED for this device.
Not-signed-off-by: Eric Auger <eric.auger@linaro.org>
---
hw/vfio/Makefile.objs | 1 +
hw/vfio/calxeda_xgmac.c | 165 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 166 insertions(+)
create mode 100644 hw/vfio/calxeda_xgmac.c
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index c5c76fe..913ab14 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -2,4 +2,5 @@ ifeq ($(CONFIG_LINUX), y)
obj-$(CONFIG_SOFTMMU) += common.o
obj-$(CONFIG_PCI) += pci.o
obj-$(CONFIG_SOFTMMU) += platform.o
+obj-$(CONFIG_SOFTMMU) += calxeda_xgmac.o
endif
diff --git a/hw/vfio/calxeda_xgmac.c b/hw/vfio/calxeda_xgmac.c
new file mode 100644
index 0000000..61010cd
--- /dev/null
+++ b/hw/vfio/calxeda_xgmac.c
@@ -0,0 +1,165 @@
+/*
+ * calxeda xgmac example VFIO device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ * Eric Auger <eric.auger@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/vfio/vfio-platform.h"
+#include "hw/misc/platform_devices.h"
+#include "qemu/error-report.h"
+
+#define TYPE_VFIO_CALXEDA_XGMAC "vfio-calxeda-xgmac"
+
+typedef struct VFIOCalxedaXgmacDevice {
+ VFIOPlatformDevice vdev;
+} VFIOCalxedaXgmacDevice;
+
+typedef struct VFIOCalxedaXgmacDeviceClass {
+ /*< private >*/
+ VFIOPlatformDeviceClass parent_class;
+ /*< public >*/
+ DeviceRealize parent_realize;
+} VFIOCalxedaXgmacDeviceClass;
+
+#define VFIO_CALXEDA_XGMAC_DEVICE(obj) \
+ OBJECT_CHECK(VFIOCalxedaXgmacDevice, (obj), TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass) \
+ OBJECT_CLASS_CHECK(VFIOCalxedaXgmacDeviceClass, (klass), \
+ TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(VFIOCalxedaXgmacDeviceClass, (obj), \
+ TYPE_VFIO_CALXEDA_XGMAC)
+
+static void calxeda_xgmac_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
+{
+ PlatformDevtreeData *data = opaque;
+ void *fdt = data->fdt;
+ const char *parent_node = data->node;
+ uint64_t platform_bus_base = data->platform_bus_base;
+ int compat_str_len;
+ char *nodename;
+ int i, ret;
+ uint32_t *irq_attr;
+ uint64_t *reg_attr;
+ uint64_t mmio_base;
+ uint64_t irq_number;
+ gchar mmio_base_prop[8];
+ gchar irq_number_prop[8];
+ VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+ VFIODevice *vbasedev = &vdev->vbasedev;
+ Object *obj = OBJECT(sbdev);
+
+ mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+ nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+ vbasedev->name,
+ mmio_base + platform_bus_base);
+
+ qemu_fdt_add_subnode(fdt, nodename);
+
+ compat_str_len = strlen(vdev->compat) + 1;
+ qemu_fdt_setprop(fdt, nodename, "compatible",
+ vdev->compat, compat_str_len);
+
+ reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+ for (i = 0; i < vbasedev->num_regions; i++) {
+ snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
+ mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
+ reg_attr[4*i] = 2;
+ reg_attr[4*i+1] = mmio_base + platform_bus_base;
+ reg_attr[4*i+2] = 2;
+ reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
+ }
+
+ ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
+ vbasedev->num_regions*2, reg_attr);
+ if (ret < 0) {
+ error_report("could not set reg property of node %s", nodename);
+ }
+
+ irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
+
+ for (i = 0; i < vbasedev->num_irqs; i++) {
+ snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
+ irq_number = object_property_get_int(obj, irq_number_prop, NULL)
+ + data->irq_start;
+ irq_attr[3*i] = cpu_to_be32(0);
+ irq_attr[3*i+1] = cpu_to_be32(irq_number);
+ irq_attr[3*i+2] = cpu_to_be32(0x4);
+ }
+
+ ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+ irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
+ if (ret < 0) {
+ error_report("could not set interrupts property of node %s",
+ nodename);
+ }
+
+ g_free(nodename);
+ g_free(irq_attr);
+ g_free(reg_attr);
+}
+
+static void calxeda_xgmac_realize(DeviceState *dev, Error **errp)
+{
+ VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+ VFIOCalxedaXgmacDeviceClass *k = VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(dev);
+ int compat_str_len;
+ const char compat[] = "calxeda,hb-xgmac";
+
+ if (vdev->compat != NULL) {
+ /* compat already provided by the user */
+ compat_str_len = strlen(vdev->compat)+1;
+ snprintf(vdev->compat, compat_str_len, "%s", compat);
+ } else {
+ vdev->compat = g_strdup(compat);
+ }
+
+ /* hard code the compat before calling parent realize */
+ k->parent_realize(dev, errp);
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+ .name = TYPE_VFIO_CALXEDA_XGMAC,
+ .version_id = 3,
+ .minimum_version_id = 2,
+ .fields = (VMStateField[]) {
+ VMSTATE_END_OF_LIST()
+ },
+ .unmigratable = 1,
+};
+
+static void vfio_calxeda_xgmac_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+ VFIOPlatformDeviceClass *vdc = VFIO_PLATFORM_DEVICE_CLASS(klass);
+ VFIOCalxedaXgmacDeviceClass *vcxc =
+ VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass);
+ vdc->fdt_add_device_node = calxeda_xgmac_fdt_add_device_node;
+ vcxc->parent_realize = dc->realize;
+ dc->realize = calxeda_xgmac_realize;
+ dc->desc = "VFIO Calxeda XGMAC";
+}
+
+static const TypeInfo vfio_calxeda_xgmac_dev_info = {
+ .name = TYPE_VFIO_CALXEDA_XGMAC,
+ .parent = TYPE_VFIO_PLATFORM,
+ .instance_size = sizeof(VFIOCalxedaXgmacDevice),
+ .class_init = vfio_calxeda_xgmac_class_init,
+ .class_size = sizeof(VFIOCalxedaXgmacDeviceClass),
+};
+
+static void register_calxeda_xgmac_dev_type(void)
+{
+ type_register_static(&vfio_calxeda_xgmac_dev_info);
+}
+
+type_init(register_calxeda_xgmac_dev_type)
--
1.8.3.2
* Re: [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings Eric Auger
@ 2014-07-07 12:40 ` Peter Maydell
2014-07-07 12:49 ` Will Deacon
1 sibling, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2014-07-07 12:40 UTC (permalink / raw)
To: Eric Auger
Cc: Alexander Graf, Kim Phillips, eric.auger, Patch Tracking,
Will Deacon, QEMU Developers, Alvise Rigo, Bharat Bhushan,
Alex Williamson, Stuart Yoder, Antonios Motakis,
kvmarm@lists.cs.columbia.edu, Christoffer Dall
On 7 July 2014 13:27, Eric Auger <eric.auger@linaro.org> wrote:
> From: Alvise Rigo <a.rigo@virtualopensystems.com>
>
> The flag is mandatory for the ARM SMMU so we always add it if the MMIO
> handles it.
>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
> hw/vfio/common.c | 9 +++++++++
> include/hw/vfio/vfio-common.h | 1 +
> linux-headers/linux/vfio.h | 2 ++
> 3 files changed, 12 insertions(+)
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 26c218e..b13f7d3 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -30,6 +30,7 @@
> */
> #define VFIO_DMA_CC_IOMMU 4
>
> +#define VFIO_IOMMU_PROT_EXEC 5
> /*
> * The IOCTL interface is designed for extensibility by embedding the
> * structure length (argsz) and flags into structures passed between
> @@ -398,6 +399,7 @@ struct vfio_iommu_type1_dma_map {
> __u32 flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
> +#define VFIO_DMA_MAP_FLAG_EXEC (1 << 2) /* executable from device */
> __u64 vaddr; /* Process virtual address */
> __u64 iova; /* IO virtual address */
> __u64 size; /* Size of mapping (bytes) */
You shouldn't change linux-headers/ files except by syncing them from
a kernel tree using scripts/update-linux-headers.sh. Those changes
should always be in a separate commit that includes the kernel tree
and commit hash synced against in its commit message. For an RFC
patchseries where the equivalent kernel changes haven't been
accepted upstream yet it's ok to sync against a local tree (and
clearly note in the commit message that it's not for committing
to upstream qemu), but the changes should still be in their own patch.
thanks
-- PMM
* Re: [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings Eric Auger
2014-07-07 12:40 ` Peter Maydell
@ 2014-07-07 12:49 ` Will Deacon
2014-07-07 13:25 ` Alvise Rigo
1 sibling, 1 reply; 29+ messages in thread
From: Will Deacon @ 2014-07-07 12:49 UTC (permalink / raw)
To: Eric Auger
Cc: agraf@suse.de, kim.phillips@freescale.com, eric.auger@st.com,
peter.maydell@linaro.org, patches@linaro.org,
qemu-devel@nongnu.org, a.rigo@virtualopensystems.com,
Bharat.Bhushan@freescale.com, alex.williamson@redhat.com,
stuart.yoder@freescale.com, a.motakis@virtualopensystems.com,
kvmarm@lists.cs.columbia.edu, christoffer.dall@linaro.org
On Mon, Jul 07, 2014 at 01:27:18PM +0100, Eric Auger wrote:
> From: Alvise Rigo <a.rigo@virtualopensystems.com>
>
> The flag is mandatory for the ARM SMMU so we always add it if the MMIO
> handles it.
I thought the logic of this flag was changing (so that you request an NX
mapping instead), so I'd hold off on this change until the kernel has
decided what it's doing.
Also, the ARM SMMU doesn't mandate any flags, you probably need this as
a result of using a PL330, which is an odd case of a DMA master that
spits out EXEC transactions (instruction fetch).
Will
* Re: [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings
2014-07-07 12:49 ` Will Deacon
@ 2014-07-07 13:25 ` Alvise Rigo
2014-07-07 13:29 ` Eric Auger
0 siblings, 1 reply; 29+ messages in thread
From: Alvise Rigo @ 2014-07-07 13:25 UTC (permalink / raw)
To: Will Deacon, Eric Auger
Cc: peter.maydell@linaro.org, kim.phillips@freescale.com,
eric.auger@st.com, patches@linaro.org, qemu-devel@nongnu.org,
agraf@suse.de, Bharat.Bhushan@freescale.com,
alex.williamson@redhat.com, stuart.yoder@freescale.com,
a.motakis@virtualopensystems.com, kvmarm@lists.cs.columbia.edu,
christoffer.dall@linaro.org
On 07/07/2014 14:49, Will Deacon wrote:
> On Mon, Jul 07, 2014 at 01:27:18PM +0100, Eric Auger wrote:
>> From: Alvise Rigo <a.rigo@virtualopensystems.com>
>>
>> The flag is mandatory for the ARM SMMU so we always add it if the MMIO
>> handles it.
>
> I thought the logic of this flag was changing (so that you request an NX
> mapping instead), so I'd hold off on this change until the kernel has
> decided what it's doing.
Yes, you are right.
This patch is not needed anymore, in fact it was dropped in my last
patch series.
It should not be here, please ignore it.
Regards,
alvise
>
> Also, the ARM SMMU doesn't mandate any flags, you probably need this as
> a result of using a PL330, which is an odd case of a DMA master that
> spits out EXEC transactions (instruction fetch).
>
> Will
>
* Re: [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings
2014-07-07 13:25 ` Alvise Rigo
@ 2014-07-07 13:29 ` Eric Auger
0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-07 13:29 UTC (permalink / raw)
To: Alvise Rigo, Will Deacon
Cc: peter.maydell@linaro.org, kim.phillips@freescale.com,
eric.auger@st.com, patches@linaro.org, qemu-devel@nongnu.org,
agraf@suse.de, Bharat.Bhushan@freescale.com,
alex.williamson@redhat.com, stuart.yoder@freescale.com,
a.motakis@virtualopensystems.com, kvmarm@lists.cs.columbia.edu,
christoffer.dall@linaro.org
On 07/07/2014 03:25 PM, Alvise Rigo wrote:
> On 07/07/2014 14:49, Will Deacon wrote:
>> On Mon, Jul 07, 2014 at 01:27:18PM +0100, Eric Auger wrote:
>>> From: Alvise Rigo <a.rigo@virtualopensystems.com>
>>>
>>> The flag is mandatory for the ARM SMMU so we always add it if the MMIO
>>> handles it.
>>
>> I thought the logic of this flag was changing (so that you request an NX
>> mapping instead), so I'd hold off on this change until the kernel has
>> decided what it's doing.
>
> Yes, you are right.
> This patch is not needed anymore, in fact it was dropped in my last
> patch series.
> It should not be here, please ignore it.
OK. My apologies.
Best Regards
Eric
>
> Regards,
> alvise
>
>>
>> Also, the ARM SMMU doesn't mandate any flags, you probably need this as
>> a result of using a PL330, which is an odd case of a DMA master that
>> spits out EXEC transactions (instruction fetch).
>>
>> Will
>>
* Re: [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files Eric Auger
@ 2014-07-08 18:55 ` Alex Williamson
2014-07-23 9:59 ` Eric Auger
0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2014-07-08 18:55 UTC (permalink / raw)
To: Eric Auger
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
> hw/vfio/pci.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 5c7bfd5..a7df3de 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -18,26 +18,14 @@
> * Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
> */
>
> -#include <dirent.h>
> #include <linux/vfio.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> -#include <sys/stat.h>
> -#include <sys/types.h>
> -#include <unistd.h>
> -
> -#include "config.h"
> #include "exec/address-spaces.h"
> -#include "exec/memory.h"
> #include "hw/pci/msi.h"
> #include "hw/pci/msix.h"
> -#include "hw/pci/pci.h"
> -#include "qemu-common.h"
> #include "qemu/error-report.h"
> -#include "qemu/event_notifier.h"
> -#include "qemu/queue.h"
> #include "qemu/range.h"
> -#include "sysemu/kvm.h"
> #include "sysemu/sysemu.h"
> #include "hw/vfio/vfio.h"
Was this just a remove and see if it still compiles exercise? I'm not
sure I'm a fan of removing includes that are arbitrarily included via
another include chain. Thanks,
Alex
* Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice Eric Auger
@ 2014-07-08 22:41 ` Alex Williamson
2014-07-23 10:02 ` Eric Auger
0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2014-07-08 22:41 UTC (permalink / raw)
To: Eric Auger
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
> Introduce the VFIODevice struct that is going to be shared by
> VFIOPCIDevice and VFIOPlatformDevice.
>
> Additional fields will be added there later on for review
> convenience.
>
> the group's device_list becomes a list of VFIODevice
>
> This obliges to rework the reset_handler which becomes generic and
> calls VFIODevice ops that are specialized in each parent object.
> Also functions that iterate on this list must take care that the
> devices can be something else than VFIOPCIDevice. The type is used
> to discriminate them.
>
> we profit from this step to change the prototype of
> vfio_unmask_intx, vfio_mask_intx, vfio_disable_irqindex which now
> apply to VFIODevice. They are renamed as *_irqindex.
> The index is passed as parameter to anticipate their usage for
> platform IRQs
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
> hw/vfio/pci.c | 243 +++++++++++++++++++++++++++++++++++-----------------------
> 1 file changed, 149 insertions(+), 94 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index a7df3de..d0bee62 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -44,6 +44,11 @@
> #define VFIO_ALLOW_KVM_MSI 1
> #define VFIO_ALLOW_KVM_MSIX 1
>
> +enum {
> + VFIO_DEVICE_TYPE_PCI = 0,
> + VFIO_DEVICE_TYPE_PLATFORM = 1,
> +};
> +
> struct VFIOPCIDevice;
>
> typedef struct VFIOQuirk {
> @@ -173,9 +178,27 @@ typedef struct VFIOMSIXInfo {
> void *mmap;
> } VFIOMSIXInfo;
>
> +typedef struct VFIODeviceOps VFIODeviceOps;
> +
> +typedef struct VFIODevice {
> + QLIST_ENTRY(VFIODevice) next;
> + struct VFIOGroup *group;
> + char *name;
> + int fd;
> + int type;
> + bool reset_works;
> + bool needs_reset;
> + VFIODeviceOps *ops;
> +} VFIODevice;
> +
> +struct VFIODeviceOps {
> + bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
> + int (*vfio_hot_reset_multi)(VFIODevice *vdev);
> +};
> +
> typedef struct VFIOPCIDevice {
> PCIDevice pdev;
> - int fd;
> + VFIODevice vbasedev;
> VFIOINTx intx;
> unsigned int config_size;
> uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
> @@ -191,20 +214,16 @@ typedef struct VFIOPCIDevice {
> VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
> VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
> PCIHostDeviceAddress host;
> - QLIST_ENTRY(VFIOPCIDevice) next;
> - struct VFIOGroup *group;
> EventNotifier err_notifier;
> uint32_t features;
> #define VFIO_FEATURE_ENABLE_VGA_BIT 0
> #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
> int32_t bootindex;
> uint8_t pm_cap;
> - bool reset_works;
> bool has_vga;
> bool pci_aer;
> bool has_flr;
> bool has_pm_reset;
> - bool needs_reset;
> bool rom_read_failed;
> } VFIOPCIDevice;
>
> @@ -212,7 +231,7 @@ typedef struct VFIOGroup {
> int fd;
> int groupid;
> VFIOContainer *container;
> - QLIST_HEAD(, VFIOPCIDevice) device_list;
> + QLIST_HEAD(, VFIODevice) device_list;
> QLIST_ENTRY(VFIOGroup) next;
> QLIST_ENTRY(VFIOGroup) container_next;
> } VFIOGroup;
> @@ -265,7 +284,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> /*
> * Common VFIO interrupt disable
> */
> -static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
> +static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
> {
> struct vfio_irq_set irq_set = {
> .argsz = sizeof(irq_set),
> @@ -275,37 +294,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
> .count = 0,
> };
>
> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> + ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> }
>
> /*
> * INTx
> */
> -static void vfio_unmask_intx(VFIOPCIDevice *vdev)
> +static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
> {
> struct vfio_irq_set irq_set = {
> .argsz = sizeof(irq_set),
> .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
> - .index = VFIO_PCI_INTX_IRQ_INDEX,
> + .index = index,
> .start = 0,
> .count = 1,
> };
>
> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> + ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> }
>
> #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
> -static void vfio_mask_intx(VFIOPCIDevice *vdev)
> +static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
> {
> struct vfio_irq_set irq_set = {
> .argsz = sizeof(irq_set),
> .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
> - .index = VFIO_PCI_INTX_IRQ_INDEX,
> + .index = index,
> .start = 0,
> .count = 1,
> };
>
> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> + ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> }
> #endif
>
> @@ -369,7 +388,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
>
> vdev->intx.pending = false;
> pci_irq_deassert(&vdev->pdev);
> - vfio_unmask_intx(vdev);
> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
> }
>
> static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
> @@ -392,7 +411,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>
> /* Get to a known interrupt state */
> qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
> - vfio_mask_intx(vdev);
> + vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
> vdev->intx.pending = false;
> pci_irq_deassert(&vdev->pdev);
>
> @@ -422,7 +441,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>
> *pfd = irqfd.resamplefd;
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> g_free(irq_set);
> if (ret) {
> error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
> @@ -430,7 +449,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
> }
>
> /* Let'em rip */
> - vfio_unmask_intx(vdev);
> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>
> vdev->intx.kvm_accel = true;
>
> @@ -447,7 +466,7 @@ fail_irqfd:
> event_notifier_cleanup(&vdev->intx.unmask);
> fail:
> qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
> - vfio_unmask_intx(vdev);
> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
> #endif
> }
>
> @@ -468,7 +487,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
> * Get to a known state, hardware masked, QEMU ready to accept new
> * interrupts, QEMU IRQ de-asserted.
> */
> - vfio_mask_intx(vdev);
> + vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
> vdev->intx.pending = false;
> pci_irq_deassert(&vdev->pdev);
>
> @@ -486,7 +505,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
> vdev->intx.kvm_accel = false;
>
> /* If we've missed an event, let it re-fire through QEMU */
> - vfio_unmask_intx(vdev);
> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>
> DPRINTF("%s(%04x:%02x:%02x.%x) KVM INTx accel disabled\n",
> __func__, vdev->host.domain, vdev->host.bus,
> @@ -574,7 +593,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
> *pfd = event_notifier_get_fd(&vdev->intx.interrupt);
> qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> g_free(irq_set);
> if (ret) {
> error_report("vfio: Error: Failed to setup INTx fd: %m");
> @@ -599,7 +618,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
>
> timer_del(vdev->intx.mmap_timer);
> vfio_disable_intx_kvm(vdev);
> - vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
> vdev->intx.pending = false;
> pci_irq_deassert(&vdev->pdev);
> vfio_mmap_set_enabled(vdev, true);
> @@ -678,7 +697,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> }
> }
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>
> g_free(irq_set);
>
> @@ -777,7 +796,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> * increase them as needed.
> */
> if (vdev->nr_vectors < nr + 1) {
> - vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> vdev->nr_vectors = nr + 1;
> ret = vfio_enable_vectors(vdev, true);
> if (ret) {
> @@ -805,7 +824,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> *pfd = event_notifier_get_fd(&vector->interrupt);
> }
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> g_free(irq_set);
> if (ret) {
> error_report("vfio: failed to modify vector, %d", ret);
> @@ -856,7 +875,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>
> *pfd = event_notifier_get_fd(&vector->interrupt);
>
> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>
> g_free(irq_set);
> }
> @@ -1016,7 +1035,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
> }
>
> if (vdev->nr_vectors) {
> - vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> }
>
> vfio_disable_msi_common(vdev);
> @@ -1027,7 +1046,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
>
> static void vfio_disable_msi(VFIOPCIDevice *vdev)
> {
> - vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
> vfio_disable_msi_common(vdev);
>
> DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain,
> @@ -1173,7 +1192,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
> off_t off = 0;
> size_t bytes;
>
> - if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
> + if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
> error_report("vfio: Error getting ROM info: %m");
> return;
> }
> @@ -1203,7 +1222,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
> memset(vdev->rom, 0xff, size);
>
> while (size) {
> - bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
> + bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
> + size, vdev->rom_offset + off);
> if (bytes == 0) {
> break;
> } else if (bytes > 0) {
> @@ -1297,6 +1317,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
> off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
> DeviceState *dev = DEVICE(vdev);
> char name[32];
> + int fd = vdev->vbasedev.fd;
>
> if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
> /* Since pci handles romfile, just print a message and return */
> @@ -1315,10 +1336,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
> * Use the same size ROM BAR as the physical device. The contents
> * will get filled in later when the guest tries to read it.
> */
> - if (pread(vdev->fd, &orig, 4, offset) != 4 ||
> - pwrite(vdev->fd, &size, 4, offset) != 4 ||
> - pread(vdev->fd, &size, 4, offset) != 4 ||
> - pwrite(vdev->fd, &orig, 4, offset) != 4) {
> + if (pread(fd, &orig, 4, offset) != 4 ||
> + pwrite(fd, &size, 4, offset) != 4 ||
> + pread(fd, &size, 4, offset) != 4 ||
> + pwrite(fd, &orig, 4, offset) != 4) {
> error_report("%s(%04x:%02x:%02x.%x) failed: %m",
> __func__, vdev->host.domain, vdev->host.bus,
> vdev->host.slot, vdev->host.function);
> @@ -2302,7 +2323,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
> if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
> ssize_t ret;
>
> - ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
> + ret = pread(vdev->vbasedev.fd, &phys_val, len,
> + vdev->config_offset + addr);
> if (ret != len) {
> error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
> __func__, vdev->host.domain, vdev->host.bus,
> @@ -2332,7 +2354,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
> vdev->host.function, addr, val, len);
>
> /* Write everything to VFIO, let it filter out what we can't write */
> - if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
> + if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
> + != len) {
> error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
> __func__, vdev->host.domain, vdev->host.bus,
> vdev->host.slot, vdev->host.function, addr, val, len);
> @@ -2702,7 +2725,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
> bool msi_64bit, msi_maskbit;
> int ret, entries;
>
> - if (pread(vdev->fd, &ctrl, sizeof(ctrl),
> + if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
> vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
> return -errno;
> }
> @@ -2741,23 +2764,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
> uint8_t pos;
> uint16_t ctrl;
> uint32_t table, pba;
> + int fd = vdev->vbasedev.fd;
>
> pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
> if (!pos) {
> return 0;
> }
>
> - if (pread(vdev->fd, &ctrl, sizeof(ctrl),
> + if (pread(fd, &ctrl, sizeof(ctrl),
> vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
> return -errno;
> }
>
> - if (pread(vdev->fd, &table, sizeof(table),
> + if (pread(fd, &table, sizeof(table),
> vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
> return -errno;
> }
>
> - if (pread(vdev->fd, &pba, sizeof(pba),
> + if (pread(fd, &pba, sizeof(pba),
> vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
> return -errno;
> }
> @@ -2913,7 +2937,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
> vdev->host.function, nr);
>
> /* Determine what type of BAR this is for registration */
> - ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
> + ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
> vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
> if (ret != sizeof(pci_bar)) {
> error_report("vfio: Failed to read BAR %d (%m)", nr);
> @@ -3334,12 +3358,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> single ? "one" : "multi");
>
> vfio_pci_pre_reset(vdev);
> - vdev->needs_reset = false;
> + vdev->vbasedev.needs_reset = false;
>
> info = g_malloc0(sizeof(*info));
> info->argsz = sizeof(*info);
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> if (ret && errno != ENOSPC) {
> ret = -errno;
> if (!vdev->has_pm_reset) {
> @@ -3355,7 +3379,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> info->argsz = sizeof(*info) + (count * sizeof(*devices));
> devices = &info->devices[0];
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> if (ret) {
> ret = -errno;
> error_report("vfio: hot reset info failed: %m");
> @@ -3370,6 +3394,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> for (i = 0; i < info->count; i++) {
> PCIHostDeviceAddress host;
> VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
>
> host.domain = devices[i].segment;
> host.bus = devices[i].bus;
> @@ -3401,7 +3426,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> }
>
> /* Prep dependent devices for reset and clear our marker. */
> - QLIST_FOREACH(tmp, &group->device_list, next) {
> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> + if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> if (vfio_pci_host_match(&host, &tmp->host)) {
> if (single) {
> DPRINTF("vfio: found another in-use device "
> @@ -3411,7 +3440,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> goto out_single;
> }
> vfio_pci_pre_reset(tmp);
> - tmp->needs_reset = false;
> + tmp->vbasedev.needs_reset = false;
> multi = true;
> break;
> }
> @@ -3450,7 +3479,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> }
>
> /* Bus reset! */
> - ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> g_free(reset);
>
> DPRINTF("%04x:%02x:%02x.%x hot reset: %s\n", vdev->host.domain,
> @@ -3462,6 +3491,7 @@ out:
> for (i = 0; i < info->count; i++) {
> PCIHostDeviceAddress host;
> VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
>
> host.domain = devices[i].segment;
> host.bus = devices[i].bus;
> @@ -3482,7 +3512,11 @@ out:
> break;
> }
>
> - QLIST_FOREACH(tmp, &group->device_list, next) {
> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> + if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> if (vfio_pci_host_match(&host, &tmp->host)) {
> vfio_pci_post_reset(tmp);
> break;
> @@ -3516,28 +3550,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
> return vfio_pci_hot_reset(vdev, true);
> }
>
> -static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
> +static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
> {
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
nit, extra white space ^
> return vfio_pci_hot_reset(vdev, false);
> }
>
> -static void vfio_pci_reset_handler(void *opaque)
> +static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
> +{
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> + if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> + vbasedev->needs_reset = true;
> + }
> + return vbasedev->needs_reset;
> +}
> +
> +static VFIODeviceOps vfio_pci_ops = {
> + .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
> + .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
> +};
> +
> +static void vfio_reset_handler(void *opaque)
> {
> VFIOGroup *group;
> - VFIOPCIDevice *vdev;
> + VFIODevice *vbasedev;
>
> QLIST_FOREACH(group, &group_list, next) {
> - QLIST_FOREACH(vdev, &group->device_list, next) {
> - if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> - vdev->needs_reset = true;
> - }
> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
> + vbasedev->ops->vfio_compute_needs_reset(vbasedev);
> }
> }
>
> QLIST_FOREACH(group, &group_list, next) {
> - QLIST_FOREACH(vdev, &group->device_list, next) {
> - if (vdev->needs_reset) {
> - vfio_pci_hot_reset_multi(vdev);
> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
> + if (vbasedev->needs_reset) {
> + vbasedev->ops->vfio_hot_reset_multi(vbasedev);
> }
> }
> }
> @@ -3682,7 +3729,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>
> if (container->iommu_data.type1.error) {
> ret = container->iommu_data.type1.error;
> - error_report("vfio: memory listener initialization failed for container");
> + error_report("vfio: memory listener initialization failed"
> + " for container");
It's generally not good to split strings that would otherwise be
searchable (e.g. with grep).
> goto listener_release_exit;
> }
>
> @@ -3826,7 +3874,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
> }
>
> if (QLIST_EMPTY(&group_list)) {
> - qemu_register_reset(vfio_pci_reset_handler, NULL);
> + qemu_register_reset(vfio_reset_handler, NULL);
> }
>
> QLIST_INSERT_HEAD(&group_list, group, next);
> @@ -3858,7 +3906,7 @@ static void vfio_put_group(VFIOGroup *group)
> g_free(group);
>
> if (QLIST_EMPTY(&group_list)) {
> - qemu_unregister_reset(vfio_pci_reset_handler, NULL);
> + qemu_unregister_reset(vfio_reset_handler, NULL);
> }
> }
>
> @@ -3879,12 +3927,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> return ret;
> }
>
> - vdev->fd = ret;
> - vdev->group = group;
> - QLIST_INSERT_HEAD(&group->device_list, vdev, next);
> + vdev->vbasedev.fd = ret;
> + vdev->vbasedev.group = group;
> + QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
>
> /* Sanity check device */
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
> if (ret) {
> error_report("vfio: error getting device info: %m");
> goto error;
> @@ -3898,7 +3946,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> goto error;
> }
>
> - vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
> + vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>
> if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
> error_report("vfio: unexpected number of io regions %u",
> @@ -3914,7 +3962,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
> reg_info.index = i;
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> if (ret) {
> error_report("vfio: Error getting region %d info: %m", i);
> goto error;
> @@ -3928,14 +3976,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> vdev->bars[i].flags = reg_info.flags;
> vdev->bars[i].size = reg_info.size;
> vdev->bars[i].fd_offset = reg_info.offset;
> - vdev->bars[i].fd = vdev->fd;
> + vdev->bars[i].fd = vdev->vbasedev.fd;
> vdev->bars[i].nr = i;
> QLIST_INIT(&vdev->bars[i].quirks);
> }
>
> reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> if (ret) {
> error_report("vfio: Error getting config info: %m");
> goto error;
> @@ -3959,7 +4007,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> .index = VFIO_PCI_VGA_REGION_INDEX,
> };
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
> if (ret) {
> error_report(
> "vfio: Device does not support requested feature x-vga");
> @@ -3976,7 +4024,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> }
>
> vdev->vga.fd_offset = vga_info.offset;
> - vdev->vga.fd = vdev->fd;
> + vdev->vga.fd = vdev->vbasedev.fd;
>
> vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
> vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
> @@ -3994,7 +4042,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> }
> irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> if (ret) {
> /* This can fail for an old kernel or legacy PCI dev */
> DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
> @@ -4010,19 +4058,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>
> error:
> if (ret) {
> - QLIST_REMOVE(vdev, next);
> - vdev->group = NULL;
> - close(vdev->fd);
> + QLIST_REMOVE(&vdev->vbasedev, next);
> + vdev->vbasedev.group = NULL;
> + close(vdev->vbasedev.fd);
> }
> return ret;
> }
>
> static void vfio_put_device(VFIOPCIDevice *vdev)
> {
> - QLIST_REMOVE(vdev, next);
> - vdev->group = NULL;
> + QLIST_REMOVE(&vdev->vbasedev, next);
> + vdev->vbasedev.group = NULL;
> DPRINTF("vfio_put_device: close vdev->fd\n");
> - close(vdev->fd);
> + close(vdev->vbasedev.fd);
> + g_free(vdev->vbasedev.name);
> if (vdev->msix) {
> g_free(vdev->msix);
> vdev->msix = NULL;
> @@ -4091,7 +4140,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
> *pfd = event_notifier_get_fd(&vdev->err_notifier);
> qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> if (ret) {
> error_report("vfio: Failed to set up error notification");
> qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
> @@ -4124,7 +4173,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
> pfd = (int32_t *)&irq_set->data;
> *pfd = -1;
>
> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> if (ret) {
> error_report("vfio: Failed to de-assign error fd: %m");
> }
> @@ -4136,7 +4185,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>
> static int vfio_initfn(PCIDevice *pdev)
> {
> - VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> + VFIODevice *vbasedev_iter;
> VFIOGroup *group;
> char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
> ssize_t len;
> @@ -4154,6 +4204,14 @@ static int vfio_initfn(PCIDevice *pdev)
> return -errno;
> }
>
> + vdev->vbasedev.ops = &vfio_pci_ops;
> +
> + vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
> + vdev->vbasedev.name = g_malloc0(PATH_MAX);
> + snprintf(vdev->vbasedev.name, PATH_MAX, "%04x:%02x:%02x.%01x",
> + vdev->host.domain, vdev->host.bus, vdev->host.slot,
> + vdev->host.function);
> +
asprintf(3)? This is a deterministic length, so PATH_MAX is especially
ridiculous.
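For illustration only, a minimal sketch of an exactly-sized allocation
for the name. The helper pci_host_name() and the PCI_NAME_FMT macro are
hypothetical and not part of this series; it uses plain C99 snprintf()
sizing so it builds without GLib. Inside QEMU, GLib's g_strdup_printf()
would probably be the more natural choice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same layout as the patch: domain:bus:slot.function */
#define PCI_NAME_FMT "%04x:%02x:%02x.%01x"

/*
 * Hypothetical helper: format a PCI host address into a heap buffer
 * that is exactly as large as the resulting string.
 * Returns NULL on failure; the caller releases the result with free().
 */
static char *pci_host_name(unsigned domain, unsigned bus,
                           unsigned slot, unsigned function)
{
    int len;
    char *name;

    /* A PCI slot is 5 bits wide and a function 3 bits wide. */
    assert(slot < 32 && function < 8);

    /* First pass: ask snprintf how many characters are needed. */
    len = snprintf(NULL, 0, PCI_NAME_FMT, domain, bus, slot, function);
    if (len < 0) {
        return NULL;
    }

    name = malloc((size_t)len + 1);
    if (name) {
        /* Second pass: write the string plus the trailing NUL. */
        snprintf(name, (size_t)len + 1, PCI_NAME_FMT,
                 domain, bus, slot, function);
    }
    return name;
}
```

With something like this the later strcmp() on the name keeps working,
and the matching g_free()/free() in vfio_put_device() releases a dozen
bytes rather than PATH_MAX.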
> strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>
> len = readlink(path, iommu_group_path, sizeof(path));
> @@ -4183,12 +4241,8 @@ static int vfio_initfn(PCIDevice *pdev)
> vdev->host.domain, vdev->host.bus, vdev->host.slot,
> vdev->host.function);
>
> - QLIST_FOREACH(pvdev, &group->device_list, next) {
> - if (pvdev->host.domain == vdev->host.domain &&
> - pvdev->host.bus == vdev->host.bus &&
> - pvdev->host.slot == vdev->host.slot &&
> - pvdev->host.function == vdev->host.function) {
> -
> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> + if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
> error_report("vfio: error: device %s is already attached", path);
> vfio_put_group(group);
> return -EBUSY;
> @@ -4203,7 +4257,7 @@ static int vfio_initfn(PCIDevice *pdev)
> }
>
> /* Get a copy of config space */
> - ret = pread(vdev->fd, vdev->pdev.config,
> + ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> vdev->config_offset);
> if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
> @@ -4291,7 +4345,7 @@ out_put:
> static void vfio_exitfn(PCIDevice *pdev)
> {
> VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> - VFIOGroup *group = vdev->group;
> + VFIOGroup *group = vdev->vbasedev.group;
>
> vfio_unregister_err_notifier(vdev);
> pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> @@ -4317,8 +4371,9 @@ static void vfio_pci_reset(DeviceState *dev)
>
> vfio_pci_pre_reset(vdev);
>
> - if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
> - !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
> + if (vdev->vbasedev.reset_works &&
> + (vdev->has_flr || !vdev->has_pm_reset) &&
> + !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
> DPRINTF("%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET\n", vdev->host.domain,
> vdev->host.bus, vdev->host.slot, vdev->host.function);
> goto post_reset;
> @@ -4330,8 +4385,8 @@ static void vfio_pci_reset(DeviceState *dev)
> }
>
> /* If nothing else works and the device supports PM reset, use it */
> - if (vdev->reset_works && vdev->has_pm_reset &&
> - !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
> + if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
> + !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
> DPRINTF("%04x:%02x:%02x.%x PCI PM Reset\n", vdev->host.domain,
> vdev->host.bus, vdev->host.slot, vdev->host.function);
> goto post_reset;
* Re: [Qemu-devel] [RFC v4 05/13] hw/vfio/pci: Introduce VFIORegion
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 05/13] hw/vfio/pci: Introduce VFIORegion Eric Auger
@ 2014-07-08 22:41 ` Alex Williamson
2014-07-23 13:50 ` Eric Auger
0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2014-07-08 22:41 UTC (permalink / raw)
To: Eric Auger
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
> This structure is going to be shared by VFIOPCIDevice and
> VFIOPlatformDevice. VFIOBAR includes it.
>
> vfio_eoi becomes an ops of VFIODevice specialized by parent device.
> This makes possible to transform vfio_bar_write/read into generic
> vfio_region_write/read that will be used by VFIOPlatformDevice too.
>
> vfio_mmap_bar becomes vfio_map_region
>
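To make the description above concrete, here is a stand-alone sketch of
the pattern as I read it: the generic region code only knows the base
device and calls through its ops, and the bus-specific callback gets
back to its own structure with container_of. All Demo* names are
hypothetical stand-ins, not the QEMU types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's container_of(): member pointer -> enclosing struct. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DemoDevice DemoDevice;

/* Per-bus callbacks, playing the role of VFIODeviceOps. */
typedef struct DemoDeviceOps {
    void (*eoi)(DemoDevice *vbasedev);
} DemoDeviceOps;

/* Generic part shared by every device flavour. */
struct DemoDevice {
    const DemoDeviceOps *ops;
};

/* PCI flavour embedding the generic part. */
typedef struct DemoPCIDevice {
    int unrelated;          /* keeps vbasedev away from offset 0 */
    DemoDevice vbasedev;
    bool intx_pending;
} DemoPCIDevice;

/* PCI implementation of the eoi op: recover the outer struct first. */
static void demo_pci_eoi(DemoDevice *vbasedev)
{
    DemoPCIDevice *vdev = demo_container_of(vbasedev, DemoPCIDevice,
                                            vbasedev);

    if (!vdev->intx_pending) {
        return;
    }
    vdev->intx_pending = false;
}

static const DemoDeviceOps demo_pci_ops = {
    .eoi = demo_pci_eoi,
};

/* Generic code: no knowledge of PCI, only of the base device. */
static void demo_region_access_done(DemoDevice *vbasedev)
{
    assert(vbasedev->ops && vbasedev->ops->eoi);
    vbasedev->ops->eoi(vbasedev);
}
```

A platform device would supply its own ops table, and the generic
read/write path would stay unchanged.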
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
> hw/vfio/pci.c | 169 ++++++++++++++++++++++++++++++++--------------------------
> 1 file changed, 93 insertions(+), 76 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index d0bee62..5f0164a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -74,15 +74,20 @@ typedef struct VFIOQuirk {
> } data;
> } VFIOQuirk;
>
> -typedef struct VFIOBAR {
> - off_t fd_offset; /* offset of BAR within device fd */
> - int fd; /* device fd, allows us to pass VFIOBAR as opaque data */
> +typedef struct VFIORegion {
> + struct VFIODevice *vbasedev;
> + off_t fd_offset; /* offset of region within device fd */
> + int fd; /* device fd, allows us to pass VFIORegion as opaque data */
The value of fd here is a bit diminished if we're adding a pointer to
the basedev.
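Roughly what I have in mind, as a sketch with hypothetical Demo* types
rather than the real ones: once the region carries a back-pointer, the
fd can be looked up through it and there is a single copy to keep in
sync.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Stand-in for the generic device: the one place the fd is stored. */
typedef struct DemoDevice {
    int fd;
} DemoDevice;

/* Stand-in for the region: no fd of its own, only a back-pointer. */
typedef struct DemoRegion {
    DemoDevice *vbasedev;   /* owning device */
    off_t fd_offset;        /* offset of the region within the device fd */
} DemoRegion;

/* Fetch the fd through the back-pointer instead of a cached copy. */
static int demo_region_fd(const DemoRegion *region)
{
    assert(region != NULL && region->vbasedev != NULL);
    return region->vbasedev->fd;
}

/* Absolute file offset for an access at 'addr' inside the region. */
static off_t demo_region_offset(const DemoRegion *region, off_t addr)
{
    return region->fd_offset + addr;
}
```

The cost is one extra dereference per access; whether that matters on
the slow path is a judgment call.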
> MemoryRegion mem; /* slow, read/write access */
> MemoryRegion mmap_mem; /* direct mapped access */
> void *mmap;
> size_t size;
> uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
> - uint8_t nr; /* cache the BAR number for debug */
> + uint8_t nr; /* cache the region number for debug */
> +} VFIORegion;
> +
> +typedef struct VFIOBAR {
> + VFIORegion region;
> bool ioport;
> bool mem64;
> QLIST_HEAD(, VFIOQuirk) quirks;
> @@ -194,6 +199,7 @@ typedef struct VFIODevice {
> struct VFIODeviceOps {
> bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
> int (*vfio_hot_reset_multi)(VFIODevice *vdev);
> + void (*vfio_eoi)(VFIODevice *vdev);
> };
>
> typedef struct VFIOPCIDevice {
> @@ -377,8 +383,10 @@ static void vfio_intx_interrupt(void *opaque)
> }
> }
>
> -static void vfio_eoi(VFIOPCIDevice *vdev)
> +static void vfio_eoi(VFIODevice *vbasedev)
> {
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +
> if (!vdev->intx.pending) {
> return;
> }
> @@ -388,7 +396,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
>
> vdev->intx.pending = false;
> pci_irq_deassert(&vdev->pdev);
> - vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
> + vfio_unmask_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
> }
>
> static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
> @@ -543,7 +551,7 @@ static void vfio_update_irq(PCIDevice *pdev)
> vfio_enable_intx_kvm(vdev);
>
> /* Re-enable the interrupt in cased we missed an EOI */
> - vfio_eoi(vdev);
> + vfio_eoi(&vdev->vbasedev);
> }
>
> static int vfio_enable_intx(VFIOPCIDevice *vdev)
> @@ -1073,10 +1081,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
> /*
> * IO Port/MMIO - Beware of the endians, VFIO is always little endian
> */
> -static void vfio_bar_write(void *opaque, hwaddr addr,
> +static void vfio_region_write(void *opaque, hwaddr addr,
> uint64_t data, unsigned size)
> {
> - VFIOBAR *bar = opaque;
> + VFIORegion *region = opaque;
> + VFIODevice *vbasedev = region->vbasedev;
> union {
> uint8_t byte;
> uint16_t word;
> @@ -1099,19 +1108,16 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
> break;
> }
>
> - if (pwrite(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
> + if (pwrite(region->fd, &buf, size, region->fd_offset + addr) != size) {
> error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
> __func__, addr, data, size);
Now that we've got vbasedev->name and region->nr we could make this
error report a bit more useful.
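Something along these lines, sketched as a stand-alone formatter so the
resulting layout is easy to see. demo_format_write_error() is a
hypothetical name; in the patch this would simply be a longer
error_report() format string, and the errno text (%m in the original)
is left out here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter: build a failure message that identifies the
 * device by name and the region by number, mirroring the layout of the
 * DPRINTF in the same function.
 * Returns the snprintf() result (length the full message needs).
 */
static int demo_format_write_error(char *buf, size_t len,
                                   const char *func, const char *dev_name,
                                   unsigned region_nr, uint64_t addr,
                                   uint64_t data, unsigned size)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "%s(%s:region%u+0x%" PRIx64 ", 0x%" PRIx64 ", %u) failed",
                    func, dev_name, region_nr, addr, data, size);
}
```

The read path could use the same layout minus the data argument.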
> }
>
> #ifdef DEBUG_VFIO
> {
> - VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
> -
> - DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx", 0x%"PRIx64
> - ", %d)\n", __func__, vdev->host.domain, vdev->host.bus,
> - vdev->host.slot, vdev->host.function, bar->nr, addr,
> - data, size);
> + DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
> + ", %d)\n", __func__, vbasedev->name,
> + region->nr, addr, data, size);
> }
> #endif
This no longer needs the #ifdef since we don't need any new variables to
make the debug info accessible. Thank goodness vfio maps BAR0 to
region0 or else this debug output would need a translator.
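As a sketch of why the guard is unnecessary, assuming the debug macro
already compiles to nothing when debugging is off: the call site can
stay unconditional. DEMO_DEBUG and DEMO_DPRINTF are hypothetical names
for this example and are not the macros in hw/vfio/pci.c.

```c
#include <assert.h>
#include <stdio.h>

/* Set to 1 to get the trace output on stderr. */
#define DEMO_DEBUG 0

/*
 * The arguments are always type-checked by the compiler, but the call
 * is dead code (and normally optimized away) when DEMO_DEBUG is 0.
 */
#define DEMO_DPRINTF(...) \
    do { \
        if (DEMO_DEBUG) { \
            fprintf(stderr, __VA_ARGS__); \
        } \
    } while (0)

/* No #ifdef here: every value the trace needs is already in scope. */
static unsigned demo_region_access(const char *name, unsigned nr,
                                   unsigned size)
{
    assert(name != NULL);
    DEMO_DPRINTF("%s(%s:region%u, %u)\n", __func__, name, nr, size);
    return size;
}
```

The block and the #ifdef were only there to scope the container_of
lookup of the PCI device, which the name string now replaces.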
>
> @@ -1123,13 +1129,15 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
> * which access will service the interrupt, so we're potentially
> * getting quite a few host interrupts per guest interrupt.
> */
> - vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
> + vbasedev->ops->vfio_eoi(vbasedev);
> +
> }
>
> -static uint64_t vfio_bar_read(void *opaque,
> +static uint64_t vfio_region_read(void *opaque,
> hwaddr addr, unsigned size)
> {
> - VFIOBAR *bar = opaque;
> + VFIORegion *region = opaque;
> + VFIODevice *vbasedev = region->vbasedev;
> union {
> uint8_t byte;
> uint16_t word;
> @@ -1138,7 +1146,7 @@ static uint64_t vfio_bar_read(void *opaque,
> } buf;
> uint64_t data = 0;
>
> - if (pread(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
> + if (pread(region->fd, &buf, size, region->fd_offset + addr) != size) {
> error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
> __func__, addr, size);
> return (uint64_t)-1;
> @@ -1161,24 +1169,21 @@ static uint64_t vfio_bar_read(void *opaque,
>
> #ifdef DEBUG_VFIO
> {
> - VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
> -
> - DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx
> - ", %d) = 0x%"PRIx64"\n", __func__, vdev->host.domain,
> - vdev->host.bus, vdev->host.slot, vdev->host.function,
> - bar->nr, addr, size, data);
> + DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
> + __func__, vdev->name,
> + region->nr, addr, size, data);
> }
> #endif
>
> /* Same as write above */
> - vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
> + vbasedev->ops->vfio_eoi(vbasedev);
>
> return data;
> }
>
> -static const MemoryRegionOps vfio_bar_ops = {
> - .read = vfio_bar_read,
> - .write = vfio_bar_write,
> +static const MemoryRegionOps vfio_region_ops = {
> + .read = vfio_region_read,
> + .write = vfio_region_write,
> .endianness = DEVICE_NATIVE_ENDIAN,
> };
>
> @@ -1513,7 +1518,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
> vdev->host.bus, vdev->host.slot, vdev->host.function,
> quirk->data.bar, addr, size, data);
> } else {
> - data = vfio_bar_read(&vdev->bars[quirk->data.bar],
> + data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
> addr + quirk->data.base_offset, size);
re-align the next line please
> }
>
> @@ -1564,7 +1569,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
> return;
> }
>
> - vfio_bar_write(&vdev->bars[quirk->data.bar],
> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
> addr + quirk->data.base_offset, data, size);
> }
>
> @@ -1598,7 +1603,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
> vdev->host.bus, vdev->host.slot, vdev->host.function,
> quirk->data.bar, addr + base, size, data);
> } else {
> - data = vfio_bar_read(&vdev->bars[quirk->data.bar], addr + base, size);
> + data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
> + addr + base, size);
> }
>
> return data;
> @@ -1627,7 +1633,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
> vdev->host.domain, vdev->host.bus, vdev->host.slot,
> vdev->host.function, quirk->data.bar, addr + base, data, size);
> } else {
> - vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
> + addr + base, data, size);
> }
> }
>
> @@ -1680,7 +1687,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
> * As long as the BAR is >= 256 bytes it will be aligned such that the
> * lower byte is always zero. Filter out anything else, if it exists.
> */
> - if (!vdev->bars[4].ioport || vdev->bars[4].size < 256) {
> + if (!vdev->bars[4].ioport || vdev->bars[4].region.size < 256) {
> return;
> }
>
> @@ -1733,7 +1740,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
> memory_region_init_io(&quirk->mem, OBJECT(vdev),
> &vfio_generic_window_quirk, quirk,
> "vfio-ati-bar4-window-quirk", 8);
> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> quirk->data.base_offset, &quirk->mem, 1);
>
> QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
> @@ -1807,7 +1814,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
> memory_region_name(&quirk->mem), vdev->host.domain,
> vdev->host.bus, vdev->host.slot, vdev->host.function);
>
> - return vfio_bar_read(&vdev->bars[quirk->data.bar], addr + 0x70, size);
> + return vfio_region_read(&vdev->bars[quirk->data.bar].region,
> + addr + 0x70, size);
> }
>
> static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
> @@ -1847,7 +1855,8 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
> memory_region_name(&quirk->mem), vdev->host.domain,
> vdev->host.bus, vdev->host.slot, vdev->host.function);
>
> - vfio_bar_write(&vdev->bars[quirk->data.bar], addr + 0x70, data, size);
> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
> + addr + 0x70, data, size);
> }
>
> static const MemoryRegionOps vfio_rtl8168_window_quirk = {
> @@ -1877,7 +1886,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
>
> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_rtl8168_window_quirk,
> quirk, "vfio-rtl8168-window-quirk", 8);
> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> 0x70, &quirk->mem, 1);
>
> QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
> @@ -1910,7 +1919,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
> "vfio-ati-bar2-4000-quirk",
> TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> quirk->data.address_match & TARGET_PAGE_MASK,
> &quirk->mem, 1);
>
> @@ -2029,7 +2038,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
> VFIOQuirk *quirk;
>
> if (pci_get_word(pdev->config + PCI_VENDOR_ID) != PCI_VENDOR_ID_NVIDIA ||
> - !vdev->bars[1].size) {
> + !vdev->bars[1].region.size) {
> return;
> }
>
> @@ -2137,7 +2146,8 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
> memory_region_init_io(&quirk->mem, OBJECT(vdev),
> &vfio_nvidia_bar5_window_quirk, quirk,
> "vfio-nvidia-bar5-window-quirk", 16);
> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 1);
> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> + 0, &quirk->mem, 1);
>
> QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
>
> @@ -2164,7 +2174,8 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
> */
> if ((pdev->cap_present & QEMU_PCI_CAP_MSI) &&
> vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) {
> - vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
> + addr + base, data, size);
> }
> }
>
> @@ -2203,7 +2214,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
> quirk, "vfio-nvidia-bar0-88000-quirk",
> TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> quirk->data.address_match & TARGET_PAGE_MASK,
> &quirk->mem, 1);
>
> @@ -2229,7 +2240,8 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
>
> /* Log the chipset ID */
> DPRINTF("Nvidia NV%02x\n",
> - (unsigned int)(vfio_bar_read(&vdev->bars[0], 0, 4) >> 20) & 0xff);
> + (unsigned int)(vfio_region_read(&vdev->bars[0].region, 0, 4) >> 20)
> + & 0xff);
>
> quirk = g_malloc0(sizeof(*quirk));
> quirk->vdev = vdev;
> @@ -2241,7 +2253,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
> "vfio-nvidia-bar0-1800-quirk",
> TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> quirk->data.address_match & TARGET_PAGE_MASK,
> &quirk->mem, 1);
>
> @@ -2298,7 +2310,7 @@ static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
>
> while (!QLIST_EMPTY(&bar->quirks)) {
> VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
> - memory_region_del_subregion(&bar->mem, &quirk->mem);
> + memory_region_del_subregion(&bar->region.mem, &quirk->mem);
> memory_region_destroy(&quirk->mem);
> QLIST_REMOVE(quirk, next);
> g_free(quirk);
> @@ -2811,9 +2823,9 @@ static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
> int ret;
>
> ret = msix_init(&vdev->pdev, vdev->msix->entries,
> - &vdev->bars[vdev->msix->table_bar].mem,
> + &vdev->bars[vdev->msix->table_bar].region.mem,
> vdev->msix->table_bar, vdev->msix->table_offset,
> - &vdev->bars[vdev->msix->pba_bar].mem,
> + &vdev->bars[vdev->msix->pba_bar].region.mem,
> vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
> if (ret < 0) {
> if (ret == -ENOTSUP) {
> @@ -2831,8 +2843,9 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
> msi_uninit(&vdev->pdev);
>
> if (vdev->msix) {
> - msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
> - &vdev->bars[vdev->msix->pba_bar].mem);
> + msix_uninit(&vdev->pdev,
> + &vdev->bars[vdev->msix->table_bar].region.mem,
> + &vdev->bars[vdev->msix->pba_bar].region.mem);
> }
> }
>
> @@ -2846,11 +2859,11 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
> for (i = 0; i < PCI_ROM_SLOT; i++) {
> VFIOBAR *bar = &vdev->bars[i];
>
> - if (!bar->size) {
> + if (!bar->region.size) {
> continue;
> }
>
> - memory_region_set_enabled(&bar->mmap_mem, enabled);
> + memory_region_set_enabled(&bar->region.mmap_mem, enabled);
> if (vdev->msix && vdev->msix->table_bar == i) {
> memory_region_set_enabled(&vdev->msix->mmap_mem, enabled);
> }
> @@ -2861,45 +2874,46 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
> {
> VFIOBAR *bar = &vdev->bars[nr];
>
> - if (!bar->size) {
> + if (!bar->region.size) {
> return;
> }
>
> vfio_bar_quirk_teardown(vdev, nr);
>
> - memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
> - munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
> - memory_region_destroy(&bar->mmap_mem);
> + memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
> + munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
> + memory_region_destroy(&bar->region.mmap_mem);
>
> if (vdev->msix && vdev->msix->table_bar == nr) {
> - memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
> + memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
> munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
> memory_region_destroy(&vdev->msix->mmap_mem);
> }
>
> - memory_region_destroy(&bar->mem);
> + memory_region_destroy(&bar->region.mem);
> }
>
> -static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
> +static int vfio_mmap_region(Object *vdev, VFIORegion *region,
"vdev" is effectively a reserved variable name here, let's not use it to
reference an Object.
> MemoryRegion *mem, MemoryRegion *submem,
> void **map, size_t size, off_t offset,
> const char *name)
> {
> int ret = 0;
>
> - if (VFIO_ALLOW_MMAP && size && bar->flags & VFIO_REGION_INFO_FLAG_MMAP) {
> + if (VFIO_ALLOW_MMAP && size && region->flags &
> + VFIO_REGION_INFO_FLAG_MMAP) {
> int prot = 0;
>
> - if (bar->flags & VFIO_REGION_INFO_FLAG_READ) {
> + if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
> prot |= PROT_READ;
> }
>
> - if (bar->flags & VFIO_REGION_INFO_FLAG_WRITE) {
> + if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
> prot |= PROT_WRITE;
> }
>
> *map = mmap(NULL, size, prot, MAP_SHARED,
> - bar->fd, bar->fd_offset + offset);
> + region->fd, region->fd_offset + offset);
> if (*map == MAP_FAILED) {
> *map = NULL;
> ret = -errno;
> @@ -2921,7 +2935,7 @@ empty_region:
> static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
> {
> VFIOBAR *bar = &vdev->bars[nr];
> - unsigned size = bar->size;
> + unsigned size = bar->region.size;
> char name[64];
> uint32_t pci_bar;
> uint8_t type;
> @@ -2951,9 +2965,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
> ~PCI_BASE_ADDRESS_MEM_MASK);
>
> /* A "slow" read/write mapping underlies all BARs */
> - memory_region_init_io(&bar->mem, OBJECT(vdev), &vfio_bar_ops,
> + memory_region_init_io(&bar->region.mem, OBJECT(vdev), &vfio_region_ops,
> bar, name, size);
> - pci_register_bar(&vdev->pdev, nr, type, &bar->mem);
> + pci_register_bar(&vdev->pdev, nr, type, &bar->region.mem);
>
> /*
> * We can't mmap areas overlapping the MSIX vector table, so we
> @@ -2964,8 +2978,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
> }
>
> strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
> - if (vfio_mmap_bar(vdev, bar, &bar->mem,
> - &bar->mmap_mem, &bar->mmap, size, 0, name)) {
> + if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
> + &bar->region.mmap_mem, &bar->region.mmap,
> + size, 0, name)) {
> error_report("%s unsupported. Performance may be slow", name);
> }
>
> @@ -2975,10 +2990,11 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
> start = HOST_PAGE_ALIGN(vdev->msix->table_offset +
> (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
>
> - size = start < bar->size ? bar->size - start : 0;
> + size = start < bar->region.size ? bar->region.size - start : 0;
> strncat(name, " msix-hi", sizeof(name) - strlen(name) - 1);
> /* VFIOMSIXInfo contains another MemoryRegion for this mapping */
> - if (vfio_mmap_bar(vdev, bar, &bar->mem, &vdev->msix->mmap_mem,
> + if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
> + &vdev->msix->mmap_mem,
> &vdev->msix->mmap, size, start, name)) {
> error_report("%s unsupported. Performance may be slow", name);
> }
> @@ -3568,6 +3584,7 @@ static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
> static VFIODeviceOps vfio_pci_ops = {
> .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
> .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
> + .vfio_eoi = vfio_eoi,
> };
>
> static void vfio_reset_handler(void *opaque)
> @@ -3973,11 +3990,11 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> (unsigned long)reg_info.size, (unsigned long)reg_info.offset,
> (unsigned long)reg_info.flags);
>
> - vdev->bars[i].flags = reg_info.flags;
> - vdev->bars[i].size = reg_info.size;
> - vdev->bars[i].fd_offset = reg_info.offset;
> - vdev->bars[i].fd = vdev->vbasedev.fd;
> - vdev->bars[i].nr = i;
> + vdev->bars[i].region.flags = reg_info.flags;
> + vdev->bars[i].region.size = reg_info.size;
> + vdev->bars[i].region.fd_offset = reg_info.offset;
> + vdev->bars[i].region.fd = vdev->vbasedev.fd;
> + vdev->bars[i].region.nr = i;
> QLIST_INIT(&vdev->bars[i].quirks);
> }
>
* Re: [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device Eric Auger
@ 2014-07-08 22:43 ` Alex Williamson
2014-07-24 9:51 ` Eric Auger
0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2014-07-08 22:43 UTC (permalink / raw)
To: Eric Auger
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
> vfio_get_device now takes a VFIODevice as argument. The function is split
> into 4 functional parts: dev_info query, device check, region populate
> and interrupt populate. The last 3 are specialized by parent device and
> are added into DeviceOps.
>
> 3 new fields are introduced in VFIODevice to store dev_info.
>
> vfio_put_base_device is created.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
> hw/vfio/pci.c | 181 +++++++++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 121 insertions(+), 60 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 5f0164a..d228cf8 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -194,12 +194,18 @@ typedef struct VFIODevice {
> bool reset_works;
> bool needs_reset;
> VFIODeviceOps *ops;
> + unsigned int num_irqs;
> + unsigned int num_regions;
> + unsigned int flags;
> } VFIODevice;
>
> struct VFIODeviceOps {
> bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
> int (*vfio_hot_reset_multi)(VFIODevice *vdev);
> void (*vfio_eoi)(VFIODevice *vdev);
> + int (*vfio_check_device)(VFIODevice *vdev);
> + int (*vfio_populate_regions)(VFIODevice *vdev);
> + int (*vfio_populate_interrupts)(VFIODevice *vdev);
> };
>
> typedef struct VFIOPCIDevice {
> @@ -286,6 +292,10 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
> uint32_t val, int len);
> static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> +static void vfio_put_base_device(VFIODevice *vbasedev);
> +static int vfio_check_device(VFIODevice *vbasedev);
> +static int vfio_populate_regions(VFIODevice *vbasedev);
> +static int vfio_populate_interrupts(VFIODevice *vbasedev);
>
> /*
> * Common VFIO interrupt disable
> @@ -3585,6 +3595,9 @@ static VFIODeviceOps vfio_pci_ops = {
> .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
> .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
> .vfio_eoi = vfio_eoi,
> + .vfio_check_device = vfio_check_device,
> + .vfio_populate_regions = vfio_populate_regions,
> + .vfio_populate_interrupts = vfio_populate_interrupts,
> };
>
> static void vfio_reset_handler(void *opaque)
> @@ -3927,54 +3940,53 @@ static void vfio_put_group(VFIOGroup *group)
> }
> }
>
> -static int vfio_get_device(VFIOGroup *group, const char *name,
> - VFIOPCIDevice *vdev)
> +static int vfio_check_device(VFIODevice *vbasedev)
> {
> - struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> - struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> - struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> - int ret, i;
> -
> - ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> - if (ret < 0) {
> - error_report("vfio: error getting device %s from group %d: %m",
> - name, group->groupid);
> - error_printf("Verify all devices in group %d are bound to vfio-pci "
> - "or pci-stub and not already in use\n", group->groupid);
> - return ret;
> + if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
> + error_report("vfio: Um, this isn't a PCI device");
> + goto error;
> }
> -
> - vdev->vbasedev.fd = ret;
> - vdev->vbasedev.group = group;
> - QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
> -
> - /* Sanity check device */
> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
> - if (ret) {
> - error_report("vfio: error getting device info: %m");
> + if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
> + error_report("vfio: unexpected number of io regions %u",
> + vbasedev->num_regions);
> goto error;
> }
> -
> - DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
> - dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
> -
> - if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
> - error_report("vfio: Um, this isn't a PCI device");
> + if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
> + error_report("vfio: unexpected number of irqs %u",
> + vbasedev->num_irqs);
> goto error;
> }
> + return 0;
> +error:
> + vfio_put_base_device(vbasedev);
This doesn't make much sense: this function never "got" the base device,
so why does it need to "put" it on error? It should simply return an
error, and the caller (presumably whoever got the device) should put it.
> + return -errno;
Nothing above seems to guarantee we have anything useful in errno (or
that it hasn't been clobbered).
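A minimal sketch of what that could look like, not the actual patch: the check helper only validates, returns a fixed error code, and leaves cleanup to its caller. `DemoDevice`, `DEMO_FLAG_PCI` and the two minimum counts are hypothetical stand-ins for `VFIODevice`, `VFIO_DEVICE_FLAGS_PCI` and the `VFIO_PCI_*_INDEX + 1` bounds used above.

```c
#include <errno.h>

/* Hypothetical stand-ins; these are not the QEMU or kernel definitions. */
#define DEMO_FLAG_PCI    (1u << 1)
#define DEMO_MIN_REGIONS 8u
#define DEMO_MIN_IRQS    3u

typedef struct DemoDevice {
    unsigned int flags;
    unsigned int num_regions;
    unsigned int num_irqs;
} DemoDevice;

/*
 * Validate only. There is no cleanup here, and errno is not consulted,
 * because nothing in this function sets it.
 */
static int demo_check_device(const DemoDevice *dev)
{
    if (!(dev->flags & DEMO_FLAG_PCI)) {
        return -EINVAL;
    }
    if (dev->num_regions < DEMO_MIN_REGIONS) {
        return -EINVAL;
    }
    if (dev->num_irqs < DEMO_MIN_IRQS) {
        return -EINVAL;
    }
    return 0;
}
```

Whether -EINVAL is the right code for each case is a judgment call; the point is only that the value is chosen explicitly.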
> +}
>
> - vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
> +static int vfio_populate_interrupts(VFIODevice *vbasedev)
> +{
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> + int ret;
> + struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> + irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>
> - if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
> - error_report("vfio: unexpected number of io regions %u",
> - dev_info.num_regions);
> - goto error;
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> + if (ret) {
> + /* This can fail for an old kernel or legacy PCI dev */
> + DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
> + } else if (irq_info.count == 1) {
> + vdev->pci_aer = true;
> + } else {
> + error_report("vfio: %s Could not enable error recovery for the device",
> + vbasedev->name);
> }
> + return ret;
This function returns an error if the device doesn't support error
reporting, which is an optional feature. The old code reset ret to 0 in
that case; I don't think dropping that is what we want.
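For illustration only, a sketch that treats the missing feature as non-fatal. The query callback is a hypothetical stand-in for the `VFIO_DEVICE_GET_IRQ_INFO` ioctl, and `DemoPciDevice` stands in for `VFIOPCIDevice`; neither is real QEMU code.

```c
#include <stdbool.h>

typedef struct DemoPciDevice {
    bool pci_aer;
} DemoPciDevice;

/* Hypothetical query: returns 0 and fills *count on success, <0 on failure. */
typedef int (*demo_irq_query_fn)(unsigned int *count);

/* Example queries, used to exercise both paths. */
static int demo_query_fail(unsigned int *count)
{
    (void)count;
    return -1;
}

static int demo_query_one(unsigned int *count)
{
    *count = 1;
    return 0;
}

static int demo_populate_interrupts(DemoPciDevice *dev, demo_irq_query_fn query)
{
    unsigned int count = 0;

    dev->pci_aer = false;
    if (query(&count) < 0) {
        /* Old kernel or legacy device: the feature is optional. */
        return 0;
    }
    if (count == 1) {
        dev->pci_aer = true;
    }
    /* Any other count leaves error recovery off, which is still not fatal. */
    return 0;
}
```

Device setup continues in both cases; only `pci_aer` records whether error recovery is available.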
> +}
>
> - if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
> - error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
> - goto error;
> - }
> +static int vfio_populate_regions(VFIODevice *vbasedev)
> +{
> + struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> + int i, ret;
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>
> for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
> reg_info.index = i;
> @@ -4018,7 +4030,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
> vdev->config_offset = reg_info.offset;
>
> if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
> - dev_info.num_regions > VFIO_PCI_VGA_REGION_INDEX) {
> + vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
> struct vfio_region_info vga_info = {
> .argsz = sizeof(vga_info),
> .index = VFIO_PCI_VGA_REGION_INDEX,
> @@ -4057,38 +4069,87 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>
> vdev->has_vga = true;
> }
> - irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
> + return 0;
> +error:
> + vfio_put_base_device(vbasedev);
> + return -errno;
errno may already have been clobbered by this point, so don't count on
it. Also, why does this put the base device when the error path of
vfio_populate_interrupts does not? IMHO the put needs to happen one
level above these functions.
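A small sketch of the errno hazard, with hypothetical helpers standing in for the failing ioctl and for whatever runs before the error label: the value is captured immediately after the failing call, so later calls cannot change what gets returned.

```c
#include <errno.h>

/* Hypothetical failing call: sets errno the way a failed ioctl() would. */
static int demo_failing_call(void)
{
    errno = ENODEV;
    return -1;
}

/* Hypothetical later step that overwrites errno as a side effect. */
static void demo_noisy_step(void)
{
    errno = EBADF;
}

static int demo_populate(void)
{
    int ret = 0;

    if (demo_failing_call() < 0) {
        ret = -errno;       /* capture before anything else can touch it */
        demo_noisy_step();  /* errno is now EBADF, ret is unaffected */
    }
    return ret;
}
```

Returning `-errno` at the end instead would report the error of the later step rather than the original failure.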
> +}
> +
> +static int vfio_get_device(VFIOGroup *group, const char *name,
> + VFIODevice *vbasedev)
> +{
> + struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> + struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> + struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> + int ret;
> +
> + ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> + if (ret < 0) {
> + error_report("vfio: error getting device %s from group %d: %m",
> + name, group->groupid);
> + error_printf("Verify all devices in group %d are bound to vfio-pci "
> + "or pci-stub and not already in use\n", group->groupid);
> + return ret;
> + }
> +
> + vbasedev->fd = ret;
> + vbasedev->group = group;
> + QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>
> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
> if (ret) {
> - /* This can fail for an old kernel or legacy PCI dev */
> - DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
> - ret = 0;
> - } else if (irq_info.count == 1) {
> - vdev->pci_aer = true;
> - } else {
> - error_report("vfio: %04x:%02x:%02x.%x "
> - "Could not enable error recovery for the device",
> - vdev->host.domain, vdev->host.bus, vdev->host.slot,
> - vdev->host.function);
> + error_report("vfio: error getting device info: %m");
> + goto error;
> + }
> +
> + vbasedev->num_irqs = dev_info.num_irqs;
> + vbasedev->num_regions = dev_info.num_regions;
> + vbasedev->flags = dev_info.flags;
> +
> + DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
> + dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
> +
> + vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
> +
> + /* call device specific functions */
> + ret = vbasedev->ops->vfio_check_device(vbasedev);
> + if (ret) {
> + error_report("vfio: error when checking device %s\n",
> + vbasedev->name);
> + goto error;
> + }
> + ret = vbasedev->ops->vfio_populate_regions(vbasedev);
> + if (ret) {
> + error_report("vfio: error when populating regions of device %s\n",
> + vbasedev->name);
> + goto error;
> + }
> + ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
> + if (ret) {
> + error_report("vfio: error when populating interrupts of device %s\n",
> + vbasedev->name);
> + goto error;
> }
>
> error:
> if (ret) {
> - QLIST_REMOVE(&vdev->vbasedev, next);
> - vdev->vbasedev.group = NULL;
> - close(vdev->vbasedev.fd);
> + vfio_put_base_device(vbasedev);
Whoops, more confusion: the call-out functions are doing put calls
(well, some of them) and so is this. This is the only place it should
occur.
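Roughly, and only as a sketch with hypothetical names (`DemoDev`, `DemoOps`, `demo_get`, `demo_put` are not QEMU symbols): the callbacks report status, and the function that acquired the device is the single place that releases it on failure.

```c
#include <errno.h>

typedef struct DemoDev DemoDev;

/* Hypothetical per-bus callbacks, mirroring the three new DeviceOps hooks. */
typedef struct DemoOps {
    int (*check)(DemoDev *dev);
    int (*populate_regions)(DemoDev *dev);
    int (*populate_interrupts)(DemoDev *dev);
} DemoOps;

struct DemoDev {
    const DemoOps *ops;
    int acquired;   /* non-zero while the device is held */
    int put_calls;  /* how many times cleanup ran */
};

/* Example callbacks: they only report status and never clean up. */
static int demo_step_ok(DemoDev *dev)
{
    (void)dev;
    return 0;
}

static int demo_step_fail(DemoDev *dev)
{
    (void)dev;
    return -EINVAL;
}

static void demo_put(DemoDev *dev)
{
    dev->acquired = 0;
    dev->put_calls++;
}

static int demo_get(DemoDev *dev)
{
    int ret;

    dev->acquired = 1;

    ret = dev->ops->check(dev);
    if (ret) {
        goto error;
    }
    ret = dev->ops->populate_regions(dev);
    if (ret) {
        goto error;
    }
    ret = dev->ops->populate_interrupts(dev);

error:
    if (ret) {
        demo_put(dev);  /* the only cleanup site */
    }
    return ret;
}
```

With this shape a failing callback can never cause a double put, because no callback calls `demo_put()` itself.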
> }
> return ret;
> }
>
> -static void vfio_put_device(VFIOPCIDevice *vdev)
> +void vfio_put_base_device(VFIODevice *vbasedev)
> {
> - QLIST_REMOVE(&vdev->vbasedev, next);
> - vdev->vbasedev.group = NULL;
> + QLIST_REMOVE(vbasedev, next);
> + vbasedev->group = NULL;
> DPRINTF("vfio_put_device: close vdev->fd\n");
> - close(vdev->vbasedev.fd);
> - g_free(vdev->vbasedev.name);
> + close(vbasedev->fd);
> + g_free(vbasedev->name);
get/put of the base device is still a bit messy. .name doesn't get
allocated by the get, but gets freed by the put.
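One possible way to make the pair symmetric, sketched with hypothetical names and plain libc instead of glib: whatever the put frees is allocated by the matching get. This is an assumption about a possible cleanup, not something the patch does.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the name and fd fields of VFIODevice. */
typedef struct DemoBase {
    char *name;
    int fd;
} DemoBase;

/* The get owns the allocation of name... */
static int demo_base_get(DemoBase *base, const char *name, int fd)
{
    size_t len = strlen(name) + 1;

    base->name = malloc(len);
    if (!base->name) {
        return -ENOMEM;
    }
    memcpy(base->name, name, len);
    base->fd = fd;
    return 0;
}

/* ...so the put is the natural place to free it. */
static void demo_base_put(DemoBase *base)
{
    free(base->name);
    base->name = NULL;
    base->fd = -1;
}
```

The alternative is to leave the name with whoever allocates it today and drop the free from the put; either way, allocation and release end up at the same level.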
> +}
> +
> +static void vfio_put_device(VFIOPCIDevice *vdev)
> +{
> + vfio_put_base_device(&vdev->vbasedev);
> if (vdev->msix) {
> g_free(vdev->msix);
> vdev->msix = NULL;
> @@ -4266,7 +4327,7 @@ static int vfio_initfn(PCIDevice *pdev)
> }
> }
>
> - ret = vfio_get_device(group, path, vdev);
> + ret = vfio_get_device(group, path, &vdev->vbasedev);
> if (ret) {
> error_report("vfio: failed to get device %s", path);
> vfio_put_group(group);
* Re: [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files
2014-07-08 18:55 ` Alex Williamson
@ 2014-07-23 9:59 ` Eric Auger
0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-23 9:59 UTC (permalink / raw)
To: Alex Williamson
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On 07/08/2014 08:55 PM, Alex Williamson wrote:
> On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>> hw/vfio/pci.c | 12 ------------
>> 1 file changed, 12 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 5c7bfd5..a7df3de 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -18,26 +18,14 @@
>> * Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>> */
>>
>> -#include <dirent.h>
>> #include <linux/vfio.h>
>> #include <sys/ioctl.h>
>> #include <sys/mman.h>
>> -#include <sys/stat.h>
>> -#include <sys/types.h>
>> -#include <unistd.h>
>> -
>> -#include "config.h"
>> #include "exec/address-spaces.h"
>> -#include "exec/memory.h"
>> #include "hw/pci/msi.h"
>> #include "hw/pci/msix.h"
>> -#include "hw/pci/pci.h"
>> -#include "qemu-common.h"
>> #include "qemu/error-report.h"
>> -#include "qemu/event_notifier.h"
>> -#include "qemu/queue.h"
>> #include "qemu/range.h"
>> -#include "sysemu/kvm.h"
>> #include "sysemu/sysemu.h"
>> #include "hw/vfio/vfio.h"
>
> Was this just a remove and see if it still compiles exercise? I'm not
> sure I'm a fan of removing includes that are arbitrarily included via
> another include chain. Thanks,
Hi Alex.
Sorry for the delay, I am just back from vacation.
Yes, it was a lazy way to sort things out for the PCI/platform split,
so I will drop that patch.
That said, some system includes might be removable thanks to the
inclusion of qemu-common.h, which seems stable and reliable? Perhaps
dirent.h as well? Anyway, it does not help with what I am working on.
Best Regards
Eric
>
> Alex
>
* Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice
2014-07-08 22:41 ` Alex Williamson
@ 2014-07-23 10:02 ` Eric Auger
2014-07-23 10:24 ` Peter Maydell
0 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2014-07-23 10:02 UTC (permalink / raw)
To: Alex Williamson
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On 07/09/2014 12:41 AM, Alex Williamson wrote:
> On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
>> Introduce the VFIODevice struct that is going to be shared by
>> VFIOPCIDevice and VFIOPlatformDevice.
>>
>> Additional fields will be added there later on for review
>> convenience.
>>
>> The group's device_list becomes a list of VFIODevice.
>>
>> This requires reworking the reset_handler, which becomes generic and
>> calls VFIODevice ops that are specialized in each parent object.
>> Also, functions that iterate on this list must take care that the
>> devices can be something other than VFIOPCIDevice. The type is used
>> to discriminate them.
>>
>> We take the opportunity to change the prototype of
>> vfio_unmask_intx, vfio_mask_intx, vfio_disable_irqindex, which now
>> apply to VFIODevice. They are renamed as *_irqindex.
>> The index is passed as a parameter to anticipate their usage for
>> platform IRQs.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>> hw/vfio/pci.c | 243 +++++++++++++++++++++++++++++++++++-----------------------
>> 1 file changed, 149 insertions(+), 94 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index a7df3de..d0bee62 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -44,6 +44,11 @@
>> #define VFIO_ALLOW_KVM_MSI 1
>> #define VFIO_ALLOW_KVM_MSIX 1
>>
>> +enum {
>> + VFIO_DEVICE_TYPE_PCI = 0,
>> + VFIO_DEVICE_TYPE_PLATFORM = 1,
>> +};
>> +
>> struct VFIOPCIDevice;
>>
>> typedef struct VFIOQuirk {
>> @@ -173,9 +178,27 @@ typedef struct VFIOMSIXInfo {
>> void *mmap;
>> } VFIOMSIXInfo;
>>
>> +typedef struct VFIODeviceOps VFIODeviceOps;
>> +
>> +typedef struct VFIODevice {
>> + QLIST_ENTRY(VFIODevice) next;
>> + struct VFIOGroup *group;
>> + char *name;
>> + int fd;
>> + int type;
>> + bool reset_works;
>> + bool needs_reset;
>> + VFIODeviceOps *ops;
>> +} VFIODevice;
>> +
>> +struct VFIODeviceOps {
>> + bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
>> + int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>> +};
>> +
>> typedef struct VFIOPCIDevice {
>> PCIDevice pdev;
>> - int fd;
>> + VFIODevice vbasedev;
>> VFIOINTx intx;
>> unsigned int config_size;
>> uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
>> @@ -191,20 +214,16 @@ typedef struct VFIOPCIDevice {
>> VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
>> VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
>> PCIHostDeviceAddress host;
>> - QLIST_ENTRY(VFIOPCIDevice) next;
>> - struct VFIOGroup *group;
>> EventNotifier err_notifier;
>> uint32_t features;
>> #define VFIO_FEATURE_ENABLE_VGA_BIT 0
>> #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
>> int32_t bootindex;
>> uint8_t pm_cap;
>> - bool reset_works;
>> bool has_vga;
>> bool pci_aer;
>> bool has_flr;
>> bool has_pm_reset;
>> - bool needs_reset;
>> bool rom_read_failed;
>> } VFIOPCIDevice;
>>
>> @@ -212,7 +231,7 @@ typedef struct VFIOGroup {
>> int fd;
>> int groupid;
>> VFIOContainer *container;
>> - QLIST_HEAD(, VFIOPCIDevice) device_list;
>> + QLIST_HEAD(, VFIODevice) device_list;
>> QLIST_ENTRY(VFIOGroup) next;
>> QLIST_ENTRY(VFIOGroup) container_next;
>> } VFIOGroup;
>> @@ -265,7 +284,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>> /*
>> * Common VFIO interrupt disable
>> */
>> -static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
>> +static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>> {
>> struct vfio_irq_set irq_set = {
>> .argsz = sizeof(irq_set),
>> @@ -275,37 +294,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
>> .count = 0,
>> };
>>
>> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> + ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> }
>>
>> /*
>> * INTx
>> */
>> -static void vfio_unmask_intx(VFIOPCIDevice *vdev)
>> +static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
>> {
>> struct vfio_irq_set irq_set = {
>> .argsz = sizeof(irq_set),
>> .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
>> - .index = VFIO_PCI_INTX_IRQ_INDEX,
>> + .index = index,
>> .start = 0,
>> .count = 1,
>> };
>>
>> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> + ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> }
>>
>> #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
>> -static void vfio_mask_intx(VFIOPCIDevice *vdev)
>> +static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
>> {
>> struct vfio_irq_set irq_set = {
>> .argsz = sizeof(irq_set),
>> .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
>> - .index = VFIO_PCI_INTX_IRQ_INDEX,
>> + .index = index,
>> .start = 0,
>> .count = 1,
>> };
>>
>> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> + ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> }
>> #endif
>>
>> @@ -369,7 +388,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
>>
>> vdev->intx.pending = false;
>> pci_irq_deassert(&vdev->pdev);
>> - vfio_unmask_intx(vdev);
>> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>> }
>>
>> static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>> @@ -392,7 +411,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>>
>> /* Get to a known interrupt state */
>> qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
>> - vfio_mask_intx(vdev);
>> + vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>> vdev->intx.pending = false;
>> pci_irq_deassert(&vdev->pdev);
>>
>> @@ -422,7 +441,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>>
>> *pfd = irqfd.resamplefd;
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> g_free(irq_set);
>> if (ret) {
>> error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
>> @@ -430,7 +449,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>> }
>>
>> /* Let'em rip */
>> - vfio_unmask_intx(vdev);
>> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>
>> vdev->intx.kvm_accel = true;
>>
>> @@ -447,7 +466,7 @@ fail_irqfd:
>> event_notifier_cleanup(&vdev->intx.unmask);
>> fail:
>> qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
>> - vfio_unmask_intx(vdev);
>> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>> #endif
>> }
>>
>> @@ -468,7 +487,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
>> * Get to a known state, hardware masked, QEMU ready to accept new
>> * interrupts, QEMU IRQ de-asserted.
>> */
>> - vfio_mask_intx(vdev);
>> + vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>> vdev->intx.pending = false;
>> pci_irq_deassert(&vdev->pdev);
>>
>> @@ -486,7 +505,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
>> vdev->intx.kvm_accel = false;
>>
>> /* If we've missed an event, let it re-fire through QEMU */
>> - vfio_unmask_intx(vdev);
>> + vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>
>> DPRINTF("%s(%04x:%02x:%02x.%x) KVM INTx accel disabled\n",
>> __func__, vdev->host.domain, vdev->host.bus,
>> @@ -574,7 +593,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
>> *pfd = event_notifier_get_fd(&vdev->intx.interrupt);
>> qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> g_free(irq_set);
>> if (ret) {
>> error_report("vfio: Error: Failed to setup INTx fd: %m");
>> @@ -599,7 +618,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
>>
>> timer_del(vdev->intx.mmap_timer);
>> vfio_disable_intx_kvm(vdev);
>> - vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
>> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>> vdev->intx.pending = false;
>> pci_irq_deassert(&vdev->pdev);
>> vfio_mmap_set_enabled(vdev, true);
>> @@ -678,7 +697,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>> }
>> }
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>
>> g_free(irq_set);
>>
>> @@ -777,7 +796,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>> * increase them as needed.
>> */
>> if (vdev->nr_vectors < nr + 1) {
>> - vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
>> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> vdev->nr_vectors = nr + 1;
>> ret = vfio_enable_vectors(vdev, true);
>> if (ret) {
>> @@ -805,7 +824,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>> *pfd = event_notifier_get_fd(&vector->interrupt);
>> }
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> g_free(irq_set);
>> if (ret) {
>> error_report("vfio: failed to modify vector, %d", ret);
>> @@ -856,7 +875,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>>
>> *pfd = event_notifier_get_fd(&vector->interrupt);
>>
>> - ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> + ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>
>> g_free(irq_set);
>> }
>> @@ -1016,7 +1035,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
>> }
>>
>> if (vdev->nr_vectors) {
>> - vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
>> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> }
>>
>> vfio_disable_msi_common(vdev);
>> @@ -1027,7 +1046,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
>>
>> static void vfio_disable_msi(VFIOPCIDevice *vdev)
>> {
>> - vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
>> + vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
>> vfio_disable_msi_common(vdev);
>>
>> DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain,
>> @@ -1173,7 +1192,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>> off_t off = 0;
>> size_t bytes;
>>
>> - if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
>> + if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
>> error_report("vfio: Error getting ROM info: %m");
>> return;
>> }
>> @@ -1203,7 +1222,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>> memset(vdev->rom, 0xff, size);
>>
>> while (size) {
>> - bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
>> + bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
>> + size, vdev->rom_offset + off);
>> if (bytes == 0) {
>> break;
>> } else if (bytes > 0) {
>> @@ -1297,6 +1317,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>> off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
>> DeviceState *dev = DEVICE(vdev);
>> char name[32];
>> + int fd = vdev->vbasedev.fd;
>>
>> if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
>> /* Since pci handles romfile, just print a message and return */
>> @@ -1315,10 +1336,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>> * Use the same size ROM BAR as the physical device. The contents
>> * will get filled in later when the guest tries to read it.
>> */
>> - if (pread(vdev->fd, &orig, 4, offset) != 4 ||
>> - pwrite(vdev->fd, &size, 4, offset) != 4 ||
>> - pread(vdev->fd, &size, 4, offset) != 4 ||
>> - pwrite(vdev->fd, &orig, 4, offset) != 4) {
>> + if (pread(fd, &orig, 4, offset) != 4 ||
>> + pwrite(fd, &size, 4, offset) != 4 ||
>> + pread(fd, &size, 4, offset) != 4 ||
>> + pwrite(fd, &orig, 4, offset) != 4) {
>> error_report("%s(%04x:%02x:%02x.%x) failed: %m",
>> __func__, vdev->host.domain, vdev->host.bus,
>> vdev->host.slot, vdev->host.function);
>> @@ -2302,7 +2323,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>> if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
>> ssize_t ret;
>>
>> - ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
>> + ret = pread(vdev->vbasedev.fd, &phys_val, len,
>> + vdev->config_offset + addr);
>> if (ret != len) {
>> error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
>> __func__, vdev->host.domain, vdev->host.bus,
>> @@ -2332,7 +2354,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
>> vdev->host.function, addr, val, len);
>>
>> /* Write everything to VFIO, let it filter out what we can't write */
>> - if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
>> + if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
>> + != len) {
>> error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
>> __func__, vdev->host.domain, vdev->host.bus,
>> vdev->host.slot, vdev->host.function, addr, val, len);
>> @@ -2702,7 +2725,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
>> bool msi_64bit, msi_maskbit;
>> int ret, entries;
>>
>> - if (pread(vdev->fd, &ctrl, sizeof(ctrl),
>> + if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
>> vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
>> return -errno;
>> }
>> @@ -2741,23 +2764,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
>> uint8_t pos;
>> uint16_t ctrl;
>> uint32_t table, pba;
>> + int fd = vdev->vbasedev.fd;
>>
>> pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
>> if (!pos) {
>> return 0;
>> }
>>
>> - if (pread(vdev->fd, &ctrl, sizeof(ctrl),
>> + if (pread(fd, &ctrl, sizeof(ctrl),
>> vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
>> return -errno;
>> }
>>
>> - if (pread(vdev->fd, &table, sizeof(table),
>> + if (pread(fd, &table, sizeof(table),
>> vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
>> return -errno;
>> }
>>
>> - if (pread(vdev->fd, &pba, sizeof(pba),
>> + if (pread(fd, &pba, sizeof(pba),
>> vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
>> return -errno;
>> }
>> @@ -2913,7 +2937,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
>> vdev->host.function, nr);
>>
>> /* Determine what type of BAR this is for registration */
>> - ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
>> + ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
>> vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
>> if (ret != sizeof(pci_bar)) {
>> error_report("vfio: Failed to read BAR %d (%m)", nr);
>> @@ -3334,12 +3358,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>> single ? "one" : "multi");
>>
>> vfio_pci_pre_reset(vdev);
>> - vdev->needs_reset = false;
>> + vdev->vbasedev.needs_reset = false;
>>
>> info = g_malloc0(sizeof(*info));
>> info->argsz = sizeof(*info);
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>> if (ret && errno != ENOSPC) {
>> ret = -errno;
>> if (!vdev->has_pm_reset) {
>> @@ -3355,7 +3379,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>> info->argsz = sizeof(*info) + (count * sizeof(*devices));
>> devices = &info->devices[0];
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>> if (ret) {
>> ret = -errno;
>> error_report("vfio: hot reset info failed: %m");
>> @@ -3370,6 +3394,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>> for (i = 0; i < info->count; i++) {
>> PCIHostDeviceAddress host;
>> VFIOPCIDevice *tmp;
>> + VFIODevice *vbasedev_iter;
>>
>> host.domain = devices[i].segment;
>> host.bus = devices[i].bus;
>> @@ -3401,7 +3426,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>> }
>>
>> /* Prep dependent devices for reset and clear our marker. */
>> - QLIST_FOREACH(tmp, &group->device_list, next) {
>> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> + if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>> + continue;
>> + }
>> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>> if (vfio_pci_host_match(&host, &tmp->host)) {
>> if (single) {
>> DPRINTF("vfio: found another in-use device "
>> @@ -3411,7 +3440,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>> goto out_single;
>> }
>> vfio_pci_pre_reset(tmp);
>> - tmp->needs_reset = false;
>> + tmp->vbasedev.needs_reset = false;
>> multi = true;
>> break;
>> }
>> @@ -3450,7 +3479,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>> }
>>
>> /* Bus reset! */
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
>> g_free(reset);
>>
>> DPRINTF("%04x:%02x:%02x.%x hot reset: %s\n", vdev->host.domain,
>> @@ -3462,6 +3491,7 @@ out:
>> for (i = 0; i < info->count; i++) {
>> PCIHostDeviceAddress host;
>> VFIOPCIDevice *tmp;
>> + VFIODevice *vbasedev_iter;
>>
>> host.domain = devices[i].segment;
>> host.bus = devices[i].bus;
>> @@ -3482,7 +3512,11 @@ out:
>> break;
>> }
>>
>> - QLIST_FOREACH(tmp, &group->device_list, next) {
>> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> + if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>> + continue;
>> + }
>> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>> if (vfio_pci_host_match(&host, &tmp->host)) {
>> vfio_pci_post_reset(tmp);
>> break;
>> @@ -3516,28 +3550,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
>> return vfio_pci_hot_reset(vdev, true);
>> }
>>
>> -static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
>> +static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
>> {
>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>
> nit, extra white space ^
Hi Alex,
OK corrected
>
>> return vfio_pci_hot_reset(vdev, false);
>> }
>>
>> -static void vfio_pci_reset_handler(void *opaque)
>> +static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
>> +{
>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> + if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
>> + vbasedev->needs_reset = true;
>> + }
>> + return vbasedev->needs_reset;
>> +}
>> +
>> +static VFIODeviceOps vfio_pci_ops = {
>> + .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
>> + .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
>> +};
>> +
>> +static void vfio_reset_handler(void *opaque)
>> {
>> VFIOGroup *group;
>> - VFIOPCIDevice *vdev;
>> + VFIODevice *vbasedev;
>>
>> QLIST_FOREACH(group, &group_list, next) {
>> - QLIST_FOREACH(vdev, &group->device_list, next) {
>> - if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
>> - vdev->needs_reset = true;
>> - }
>> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> + vbasedev->ops->vfio_compute_needs_reset(vbasedev);
>> }
>> }
>>
>> QLIST_FOREACH(group, &group_list, next) {
>> - QLIST_FOREACH(vdev, &group->device_list, next) {
>> - if (vdev->needs_reset) {
>> - vfio_pci_hot_reset_multi(vdev);
>> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> + if (vbasedev->needs_reset) {
>> + vbasedev->ops->vfio_hot_reset_multi(vbasedev);
>> }
>> }
>> }
>> @@ -3682,7 +3729,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>>
>> if (container->iommu_data.type1.error) {
>> ret = container->iommu_data.type1.error;
>> - error_report("vfio: memory listener initialization failed for container");
>> + error_report("vfio: memory listener initialization failed"
>> + " for container");
>
> Generally not good to split strings that would otherwise be search-able.
OK
>
>> goto listener_release_exit;
>> }
>>
>> @@ -3826,7 +3874,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
>> }
>>
>> if (QLIST_EMPTY(&group_list)) {
>> - qemu_register_reset(vfio_pci_reset_handler, NULL);
>> + qemu_register_reset(vfio_reset_handler, NULL);
>> }
>>
>> QLIST_INSERT_HEAD(&group_list, group, next);
>> @@ -3858,7 +3906,7 @@ static void vfio_put_group(VFIOGroup *group)
>> g_free(group);
>>
>> if (QLIST_EMPTY(&group_list)) {
>> - qemu_unregister_reset(vfio_pci_reset_handler, NULL);
>> + qemu_unregister_reset(vfio_reset_handler, NULL);
>> }
>> }
>>
>> @@ -3879,12 +3927,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> return ret;
>> }
>>
>> - vdev->fd = ret;
>> - vdev->group = group;
>> - QLIST_INSERT_HEAD(&group->device_list, vdev, next);
>> + vdev->vbasedev.fd = ret;
>> + vdev->vbasedev.group = group;
>> + QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
>>
>> /* Sanity check device */
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
>> if (ret) {
>> error_report("vfio: error getting device info: %m");
>> goto error;
>> @@ -3898,7 +3946,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> goto error;
>> }
>>
>> - vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>> + vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>>
>> if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
>> error_report("vfio: unexpected number of io regions %u",
>> @@ -3914,7 +3962,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
>> reg_info.index = i;
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> if (ret) {
>> error_report("vfio: Error getting region %d info: %m", i);
>> goto error;
>> @@ -3928,14 +3976,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> vdev->bars[i].flags = reg_info.flags;
>> vdev->bars[i].size = reg_info.size;
>> vdev->bars[i].fd_offset = reg_info.offset;
>> - vdev->bars[i].fd = vdev->fd;
>> + vdev->bars[i].fd = vdev->vbasedev.fd;
>> vdev->bars[i].nr = i;
>> QLIST_INIT(&vdev->bars[i].quirks);
>> }
>>
>> reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> if (ret) {
>> error_report("vfio: Error getting config info: %m");
>> goto error;
>> @@ -3959,7 +4007,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> .index = VFIO_PCI_VGA_REGION_INDEX,
>> };
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
>> if (ret) {
>> error_report(
>> "vfio: Device does not support requested feature x-vga");
>> @@ -3976,7 +4024,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> }
>>
>> vdev->vga.fd_offset = vga_info.offset;
>> - vdev->vga.fd = vdev->fd;
>> + vdev->vga.fd = vdev->vbasedev.fd;
>>
>> vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
>> vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
>> @@ -3994,7 +4042,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> }
>> irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> if (ret) {
>> /* This can fail for an old kernel or legacy PCI dev */
>> DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
>> @@ -4010,19 +4058,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>
>> error:
>> if (ret) {
>> - QLIST_REMOVE(vdev, next);
>> - vdev->group = NULL;
>> - close(vdev->fd);
>> + QLIST_REMOVE(&vdev->vbasedev, next);
>> + vdev->vbasedev.group = NULL;
>> + close(vdev->vbasedev.fd);
>> }
>> return ret;
>> }
>>
>> static void vfio_put_device(VFIOPCIDevice *vdev)
>> {
>> - QLIST_REMOVE(vdev, next);
>> - vdev->group = NULL;
>> + QLIST_REMOVE(&vdev->vbasedev, next);
>> + vdev->vbasedev.group = NULL;
>> DPRINTF("vfio_put_device: close vdev->fd\n");
>> - close(vdev->fd);
>> + close(vdev->vbasedev.fd);
>> + g_free(vdev->vbasedev.name);
>> if (vdev->msix) {
>> g_free(vdev->msix);
>> vdev->msix = NULL;
>> @@ -4091,7 +4140,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
>> *pfd = event_notifier_get_fd(&vdev->err_notifier);
>> qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> if (ret) {
>> error_report("vfio: Failed to set up error notification");
>> qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
>> @@ -4124,7 +4173,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>> pfd = (int32_t *)&irq_set->data;
>> *pfd = -1;
>>
>> - ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> if (ret) {
>> error_report("vfio: Failed to de-assign error fd: %m");
>> }
>> @@ -4136,7 +4185,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>>
>> static int vfio_initfn(PCIDevice *pdev)
>> {
>> - VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> + VFIODevice *vbasedev_iter;
>> VFIOGroup *group;
>> char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>> ssize_t len;
>> @@ -4154,6 +4204,14 @@ static int vfio_initfn(PCIDevice *pdev)
>> return -errno;
>> }
>>
>> + vdev->vbasedev.ops = &vfio_pci_ops;
>> +
>> + vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
>> + vdev->vbasedev.name = g_malloc0(PATH_MAX);
>> + snprintf(vdev->vbasedev.name, PATH_MAX, "%04x:%02x:%02x.%01x",
>> + vdev->host.domain, vdev->host.bus, vdev->host.slot,
>> + vdev->host.function);
>> +
>
> asprintf(3)? This is a deterministic length, so PATH_MAX is especially
> ridiculous.
Agreed, I will use asprintf instead.
Regarding the new VFIODevice overall, does it match your expectations in
terms of factorization? Do you think it contains the required fields?
Thank you in advance.
Best Regards
Eric
>
>> strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>>
>> len = readlink(path, iommu_group_path, sizeof(path));
>> @@ -4183,12 +4241,8 @@ static int vfio_initfn(PCIDevice *pdev)
>> vdev->host.domain, vdev->host.bus, vdev->host.slot,
>> vdev->host.function);
>>
>> - QLIST_FOREACH(pvdev, &group->device_list, next) {
>> - if (pvdev->host.domain == vdev->host.domain &&
>> - pvdev->host.bus == vdev->host.bus &&
>> - pvdev->host.slot == vdev->host.slot &&
>> - pvdev->host.function == vdev->host.function) {
>> -
>> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> + if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
>> error_report("vfio: error: device %s is already attached", path);
>> vfio_put_group(group);
>> return -EBUSY;
>> @@ -4203,7 +4257,7 @@ static int vfio_initfn(PCIDevice *pdev)
>> }
>>
>> /* Get a copy of config space */
>> - ret = pread(vdev->fd, vdev->pdev.config,
>> + ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>> MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>> vdev->config_offset);
>> if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
>> @@ -4291,7 +4345,7 @@ out_put:
>> static void vfio_exitfn(PCIDevice *pdev)
>> {
>> VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> - VFIOGroup *group = vdev->group;
>> + VFIOGroup *group = vdev->vbasedev.group;
>>
>> vfio_unregister_err_notifier(vdev);
>> pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>> @@ -4317,8 +4371,9 @@ static void vfio_pci_reset(DeviceState *dev)
>>
>> vfio_pci_pre_reset(vdev);
>>
>> - if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
>> - !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
>> + if (vdev->vbasedev.reset_works &&
>> + (vdev->has_flr || !vdev->has_pm_reset) &&
>> + !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
>> DPRINTF("%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET\n", vdev->host.domain,
>> vdev->host.bus, vdev->host.slot, vdev->host.function);
>> goto post_reset;
>> @@ -4330,8 +4385,8 @@ static void vfio_pci_reset(DeviceState *dev)
>> }
>>
>> /* If nothing else works and the device supports PM reset, use it */
>> - if (vdev->reset_works && vdev->has_pm_reset &&
>> - !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
>> + if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
>> + !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
>> DPRINTF("%04x:%02x:%02x.%x PCI PM Reset\n", vdev->host.domain,
>> vdev->host.bus, vdev->host.slot, vdev->host.function);
>> goto post_reset;
>
>
>
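For readers following the refactoring, here is a minimal, self-contained
sketch of the pattern this patch introduces: a base device embedded in a
bus-specific device, recovered with container_of, and a two-pass reset
handler that only goes through an ops table. All names (BaseDevice,
PciDevice, DeviceOps, reset_all, the DEV_TYPE_* values, resets_done) are
hypothetical stand-ins, not the QEMU definitions, and a plain array
replaces the QLIST of devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum { DEV_TYPE_PCI = 0, DEV_TYPE_PLATFORM = 1 }; /* hypothetical values */

typedef struct BaseDevice BaseDevice;

typedef struct DeviceOps {
    bool (*compute_needs_reset)(BaseDevice *base);
    int (*hot_reset_multi)(BaseDevice *base);
} DeviceOps;

struct BaseDevice {
    int type;
    bool reset_works;
    bool needs_reset;
    const DeviceOps *ops;
};

typedef struct PciDevice {
    BaseDevice base;   /* embedded, so container_of can find PciDevice */
    bool has_flr;
    bool has_pm_reset;
    int resets_done;   /* bookkeeping for this sketch only */
} PciDevice;

static bool pci_compute_needs_reset(BaseDevice *base)
{
    PciDevice *pci = container_of(base, PciDevice, base);

    if (!base->reset_works || (!pci->has_flr && pci->has_pm_reset)) {
        base->needs_reset = true;
    }
    return base->needs_reset;
}

static int pci_hot_reset_multi(BaseDevice *base)
{
    PciDevice *pci;

    /* Only downcast once the type tag says this really is a PCI device. */
    assert(base->type == DEV_TYPE_PCI);
    pci = container_of(base, PciDevice, base);
    pci->resets_done++;
    base->needs_reset = false;
    return 0;
}

static const DeviceOps pci_ops = {
    .compute_needs_reset = pci_compute_needs_reset,
    .hot_reset_multi = pci_hot_reset_multi,
};

static void pci_device_init(PciDevice *pci, bool reset_works,
                            bool has_flr, bool has_pm_reset)
{
    pci->base.type = DEV_TYPE_PCI;
    pci->base.reset_works = reset_works;
    pci->base.needs_reset = false;
    pci->base.ops = &pci_ops;
    pci->has_flr = has_flr;
    pci->has_pm_reset = has_pm_reset;
    pci->resets_done = 0;
}

/* Two passes, as in the patch: mark every device first, then reset. */
static void reset_all(BaseDevice **devs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        devs[i]->ops->compute_needs_reset(devs[i]);
    }
    for (i = 0; i < n; i++) {
        if (devs[i]->needs_reset) {
            devs[i]->ops->hot_reset_multi(devs[i]);
        }
    }
}
```

The point of the split is that the generic handler never touches
PCI-only fields such as has_flr; a platform device would only need to
supply its own ops table.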
* Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice
2014-07-23 10:02 ` Eric Auger
@ 2014-07-23 10:24 ` Peter Maydell
2014-07-23 11:40 ` Eric Auger
0 siblings, 1 reply; 29+ messages in thread
From: Peter Maydell @ 2014-07-23 10:24 UTC (permalink / raw)
To: Eric Auger
Cc: Alexander Graf, Kim Phillips, eric.auger, Patch Tracking,
Will Deacon, QEMU Developers, Alvise Rigo, Bharat Bhushan,
Alex Williamson, Stuart Yoder, Antonios Motakis,
kvmarm@lists.cs.columbia.edu, Christoffer Dall
On 23 July 2014 11:02, Eric Auger <eric.auger@linaro.org> wrote:
> On 07/09/2014 12:41 AM, Alex Williamson wrote:
>> On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
>>> + vdev->vbasedev.ops = &vfio_pci_ops;
>>> +
>>> + vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
>>> + vdev->vbasedev.name = g_malloc0(PATH_MAX);
>>> + snprintf(vdev->vbasedev.name, PATH_MAX, "%04x:%02x:%02x.%01x",
>>> + vdev->host.domain, vdev->host.bus, vdev->host.slot,
>>> + vdev->host.function);
>>> +
>>
>> asprintf(3)? This is a deterministic length, so PATH_MAX is especially
>> ridiculous.
> agreed, will use asprintf instead.
A minor nit given this is going to be in "only on Linux"
code, but we generally prefer g_strdup_printf() over
raw asprintf() (they do the same thing, but the glib
function is guaranteed to be present everywhere,
and the returned memory is freeable with g_free()
like most of our strings, rather than needing to remember
that it needs to be freed via free().)
thanks
-- PMM
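To illustrate the allocation point being discussed, here is a small
sketch that sizes the name buffer from the format itself instead of
reserving PATH_MAX bytes. It uses only standard C (snprintf called twice)
so that it builds without glib; inside QEMU a single g_strdup_printf()
call would do the same job and the result would be released with
g_free(). The helper names format_pci_name and pci_name_equal are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PCI_NAME_FMT "%04x:%02x:%02x.%01x"

/*
 * Return a malloc()ed "dddd:bb:ss.f" string, or NULL on failure.
 * The caller releases it with free().
 */
static char *format_pci_name(unsigned domain, unsigned bus,
                             unsigned slot, unsigned function)
{
    /* First call only measures: with a NULL buffer nothing is written. */
    int len = snprintf(NULL, 0, PCI_NAME_FMT, domain, bus, slot, function);
    char *name;

    if (len < 0) {
        return NULL;
    }
    name = malloc((size_t)len + 1);
    if (!name) {
        return NULL;
    }
    snprintf(name, (size_t)len + 1, PCI_NAME_FMT,
             domain, bus, slot, function);
    return name;
}

/* Duplicate detection by name, in the spirit of the strcmp() in the patch. */
static bool pci_name_equal(const char *a, const char *b)
{
    assert(a && b);
    return strcmp(a, b) == 0;
}
```

For a typical address the string is 12 characters plus the terminator,
which is the "deterministic length" referred to above.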
* Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice
2014-07-23 10:24 ` Peter Maydell
@ 2014-07-23 11:40 ` Eric Auger
0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-23 11:40 UTC (permalink / raw)
To: Peter Maydell
Cc: Alexander Graf, Kim Phillips, eric.auger, Patch Tracking,
Will Deacon, QEMU Developers, Alvise Rigo, Bharat Bhushan,
Alex Williamson, Stuart Yoder, Antonios Motakis,
kvmarm@lists.cs.columbia.edu, Christoffer Dall
On 07/23/2014 12:24 PM, Peter Maydell wrote:
> On 23 July 2014 11:02, Eric Auger <eric.auger@linaro.org> wrote:
>> On 07/09/2014 12:41 AM, Alex Williamson wrote:
>>> On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
>>>> + vdev->vbasedev.ops = &vfio_pci_ops;
>>>> +
>>>> + vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
>>>> + vdev->vbasedev.name = g_malloc0(PATH_MAX);
>>>> + snprintf(vdev->vbasedev.name, PATH_MAX, "%04x:%02x:%02x.%01x",
>>>> + vdev->host.domain, vdev->host.bus, vdev->host.slot,
>>>> + vdev->host.function);
>>>> +
>>>
>>> asprintf(3)? This is a deterministic length, so PATH_MAX is especially
>>> ridiculous.
>> agreed, will use asprintf instead.
>
> A minor nit given this is going to be in "only on Linux"
> code, but we generally prefer g_strdup_printf() over
> raw asprintf() (they do the same thing, but the glib
> function is guaranteed to be present everywhere,
> and the returned memory is freeable with g_free()
> like most of our strings, rather than needing to remember
> that it needs to be freed via free().)
Hi Peter,
Thanks, this is noted.
BR
Eric
>
> thanks
> -- PMM
>
* Re: [Qemu-devel] [RFC v4 05/13] hw/vfio/pci: Introduce VFIORegion
2014-07-08 22:41 ` Alex Williamson
@ 2014-07-23 13:50 ` Eric Auger
0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-23 13:50 UTC (permalink / raw)
To: Alex Williamson
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On 07/09/2014 12:41 AM, Alex Williamson wrote:
> On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
>> This structure is going to be shared by VFIOPCIDevice and
>> VFIOPlatformDevice. VFIOBAR includes it.
>>
>> vfio_eoi becomes an ops of VFIODevice specialized by parent device.
>> This makes possible to transform vfio_bar_write/read into generic
>> vfio_region_write/read that will be used by VFIOPlatformDevice too.
>>
>> vfio_mmap_bar becomes vfio_map_region
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>> hw/vfio/pci.c | 169 ++++++++++++++++++++++++++++++++--------------------------
>> 1 file changed, 93 insertions(+), 76 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index d0bee62..5f0164a 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -74,15 +74,20 @@ typedef struct VFIOQuirk {
>> } data;
>> } VFIOQuirk;
>>
>> -typedef struct VFIOBAR {
>> - off_t fd_offset; /* offset of BAR within device fd */
>> - int fd; /* device fd, allows us to pass VFIOBAR as opaque data */
>> +typedef struct VFIORegion {
>> + struct VFIODevice *vbasedev;
>> + off_t fd_offset; /* offset of region within device fd */
>> + int fd; /* device fd, allows us to pass VFIORegion as opaque data */
>
> The value of fd here is a bit diminished if we're adding a pointer to
> the basedev.
agreed. I removed it.
>
>> MemoryRegion mem; /* slow, read/write access */
>> MemoryRegion mmap_mem; /* direct mapped access */
>> void *mmap;
>> size_t size;
>> uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
>> - uint8_t nr; /* cache the BAR number for debug */
>> + uint8_t nr; /* cache the region number for debug */
>> +} VFIORegion;
>> +
>> +typedef struct VFIOBAR {
>> + VFIORegion region;
>> bool ioport;
>> bool mem64;
>> QLIST_HEAD(, VFIOQuirk) quirks;
>> @@ -194,6 +199,7 @@ typedef struct VFIODevice {
>> struct VFIODeviceOps {
>> bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
>> int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>> + void (*vfio_eoi)(VFIODevice *vdev);
>> };
>>
>> typedef struct VFIOPCIDevice {
>> @@ -377,8 +383,10 @@ static void vfio_intx_interrupt(void *opaque)
>> }
>> }
>>
>> -static void vfio_eoi(VFIOPCIDevice *vdev)
>> +static void vfio_eoi(VFIODevice *vbasedev)
>> {
>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +
>> if (!vdev->intx.pending) {
>> return;
>> }
>> @@ -388,7 +396,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
>>
>> vdev->intx.pending = false;
>> pci_irq_deassert(&vdev->pdev);
>> - vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>> + vfio_unmask_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>> }
>>
>> static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>> @@ -543,7 +551,7 @@ static void vfio_update_irq(PCIDevice *pdev)
>> vfio_enable_intx_kvm(vdev);
>>
>> /* Re-enable the interrupt in cased we missed an EOI */
>> - vfio_eoi(vdev);
>> + vfio_eoi(&vdev->vbasedev);
>> }
>>
>> static int vfio_enable_intx(VFIOPCIDevice *vdev)
>> @@ -1073,10 +1081,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
>> /*
>> * IO Port/MMIO - Beware of the endians, VFIO is always little endian
>> */
>> -static void vfio_bar_write(void *opaque, hwaddr addr,
>> +static void vfio_region_write(void *opaque, hwaddr addr,
>> uint64_t data, unsigned size)
>> {
>> - VFIOBAR *bar = opaque;
>> + VFIORegion *region = opaque;
>> + VFIODevice *vbasedev = region->vbasedev;
>> union {
>> uint8_t byte;
>> uint16_t word;
>> @@ -1099,19 +1108,16 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
>> break;
>> }
>>
>> - if (pwrite(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
>> + if (pwrite(region->fd, &buf, size, region->fd_offset + addr) != size) {
>> error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
>> __func__, addr, data, size);
>
> Now that we've got vbasedev->name and region->nr we could make this
> error report a bit more useful.
OK, done for both vfio_region_write and vfio_region_read.
>
>> }
>>
>> #ifdef DEBUG_VFIO
>> {
>> - VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
>> -
>> - DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx", 0x%"PRIx64
>> - ", %d)\n", __func__, vdev->host.domain, vdev->host.bus,
>> - vdev->host.slot, vdev->host.function, bar->nr, addr,
>> - data, size);
>> + DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
>> + ", %d)\n", __func__, vbasedev->name,
>> + region->nr, addr, data, size);
>> }
>> #endif
>
> This no longer needs the #ifdef since we don't need any new variables to
> make the debug info accessible.
OK removed
> Thank goodness vfio maps BAR0 to
> region0 or else this debug output would need a translator.
Yes ;-)
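As a rough illustration of the two review comments above (the region
reaches its device through the back-pointer, and failures can name the
device and region), here is a self-contained sketch. The types and the
helper region_describe_failure are hypothetical, and the message layout
is only modeled on the DPRINTF format in the patch; it is not the text
that was actually committed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct BaseDevice {
    const char *name;    /* e.g. "0000:01:00.0" for a PCI device */
} BaseDevice;

typedef struct Region {
    BaseDevice *base;    /* back-pointer replaces a per-region copy */
    uint8_t nr;          /* region index, cached for diagnostics */
} Region;

/*
 * Format a failure message that identifies the device and region.
 * Returns the snprintf() result: the length the full message needs.
 */
static int region_describe_failure(const Region *region, const char *func,
                                   uint64_t addr, unsigned size,
                                   char *buf, size_t len)
{
    assert(region && region->base && region->base->name);
    return snprintf(buf, len, "%s(%s:region%d+0x%" PRIx64 ", %u) failed",
                    func, region->base->name, region->nr, addr, size);
}
```

Because everything needed comes from the region and its base device, no
extra local variables are required, which is why the DEBUG_VFIO guard
becomes unnecessary.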
>
>>
>> @@ -1123,13 +1129,15 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
>> * which access will service the interrupt, so we're potentially
>> * getting quite a few host interrupts per guest interrupt.
>> */
>> - vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
>> + vbasedev->ops->vfio_eoi(vbasedev);
>> +
>> }
>>
>> -static uint64_t vfio_bar_read(void *opaque,
>> +static uint64_t vfio_region_read(void *opaque,
>> hwaddr addr, unsigned size)
>> {
>> - VFIOBAR *bar = opaque;
>> + VFIORegion *region = opaque;
>> + VFIODevice *vbasedev = region->vbasedev;
>> union {
>> uint8_t byte;
>> uint16_t word;
>> @@ -1138,7 +1146,7 @@ static uint64_t vfio_bar_read(void *opaque,
>> } buf;
>> uint64_t data = 0;
>>
>> - if (pread(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
>> + if (pread(region->fd, &buf, size, region->fd_offset + addr) != size) {
>> error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
>> __func__, addr, size);
>> return (uint64_t)-1;
>> @@ -1161,24 +1169,21 @@ static uint64_t vfio_bar_read(void *opaque,
>>
>> #ifdef DEBUG_VFIO
>> {
>> - VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
>> -
>> - DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx
>> - ", %d) = 0x%"PRIx64"\n", __func__, vdev->host.domain,
>> - vdev->host.bus, vdev->host.slot, vdev->host.function,
>> - bar->nr, addr, size, data);
>> + DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
>> + __func__, vdev->name,
>> + region->nr, addr, size, data);
>> }
>> #endif
>>
>> /* Same as write above */
>> - vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
>> + vbasedev->ops->vfio_eoi(vbasedev);
>>
>> return data;
>> }
>>
>> -static const MemoryRegionOps vfio_bar_ops = {
>> - .read = vfio_bar_read,
>> - .write = vfio_bar_write,
>> +static const MemoryRegionOps vfio_region_ops = {
>> + .read = vfio_region_read,
>> + .write = vfio_region_write,
>> .endianness = DEVICE_NATIVE_ENDIAN,
>> };
>>
>> @@ -1513,7 +1518,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
>> vdev->host.bus, vdev->host.slot, vdev->host.function,
>> quirk->data.bar, addr, size, data);
>> } else {
>> - data = vfio_bar_read(&vdev->bars[quirk->data.bar],
>> + data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
>> addr + quirk->data.base_offset, size);
>
> re-align the next line please
>
>> }
>>
>> @@ -1564,7 +1569,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
>> return;
>> }
>>
>> - vfio_bar_write(&vdev->bars[quirk->data.bar],
>> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
>> addr + quirk->data.base_offset, data, size);
>> }
>>
>> @@ -1598,7 +1603,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
>> vdev->host.bus, vdev->host.slot, vdev->host.function,
>> quirk->data.bar, addr + base, size, data);
>> } else {
>> - data = vfio_bar_read(&vdev->bars[quirk->data.bar], addr + base, size);
>> + data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
>> + addr + base, size);
>> }
>>
>> return data;
>> @@ -1627,7 +1633,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
>> vdev->host.domain, vdev->host.bus, vdev->host.slot,
>> vdev->host.function, quirk->data.bar, addr + base, data, size);
>> } else {
>> - vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
>> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
>> + addr + base, data, size);
>> }
>> }
>>
>> @@ -1680,7 +1687,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
>> * As long as the BAR is >= 256 bytes it will be aligned such that the
>> * lower byte is always zero. Filter out anything else, if it exists.
>> */
>> - if (!vdev->bars[4].ioport || vdev->bars[4].size < 256) {
>> + if (!vdev->bars[4].ioport || vdev->bars[4].region.size < 256) {
>> return;
>> }
>>
>> @@ -1733,7 +1740,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
>> memory_region_init_io(&quirk->mem, OBJECT(vdev),
>> &vfio_generic_window_quirk, quirk,
>> "vfio-ati-bar4-window-quirk", 8);
>> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
>> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
>> quirk->data.base_offset, &quirk->mem, 1);
>>
>> QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
>> @@ -1807,7 +1814,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
>> memory_region_name(&quirk->mem), vdev->host.domain,
>> vdev->host.bus, vdev->host.slot, vdev->host.function);
>>
>> - return vfio_bar_read(&vdev->bars[quirk->data.bar], addr + 0x70, size);
>> + return vfio_region_read(&vdev->bars[quirk->data.bar].region,
>> + addr + 0x70, size);
>> }
>>
>> static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
>> @@ -1847,7 +1855,8 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
>> memory_region_name(&quirk->mem), vdev->host.domain,
>> vdev->host.bus, vdev->host.slot, vdev->host.function);
>>
>> - vfio_bar_write(&vdev->bars[quirk->data.bar], addr + 0x70, data, size);
>> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
>> + addr + 0x70, data, size);
>> }
>>
>> static const MemoryRegionOps vfio_rtl8168_window_quirk = {
>> @@ -1877,7 +1886,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
>>
>> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_rtl8168_window_quirk,
>> quirk, "vfio-rtl8168-window-quirk", 8);
>> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
>> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
>> 0x70, &quirk->mem, 1);
>>
>> QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
>> @@ -1910,7 +1919,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
>> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
>> "vfio-ati-bar2-4000-quirk",
>> TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
>> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
>> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
>> quirk->data.address_match & TARGET_PAGE_MASK,
>> &quirk->mem, 1);
>>
>> @@ -2029,7 +2038,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
>> VFIOQuirk *quirk;
>>
>> if (pci_get_word(pdev->config + PCI_VENDOR_ID) != PCI_VENDOR_ID_NVIDIA ||
>> - !vdev->bars[1].size) {
>> + !vdev->bars[1].region.size) {
>> return;
>> }
>>
>> @@ -2137,7 +2146,8 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
>> memory_region_init_io(&quirk->mem, OBJECT(vdev),
>> &vfio_nvidia_bar5_window_quirk, quirk,
>> "vfio-nvidia-bar5-window-quirk", 16);
>> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 1);
>> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
>> + 0, &quirk->mem, 1);
>>
>> QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
>>
>> @@ -2164,7 +2174,8 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
>> */
>> if ((pdev->cap_present & QEMU_PCI_CAP_MSI) &&
>> vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) {
>> - vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
>> + vfio_region_write(&vdev->bars[quirk->data.bar].region,
>> + addr + base, data, size);
>> }
>> }
>>
>> @@ -2203,7 +2214,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
>> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
>> quirk, "vfio-nvidia-bar0-88000-quirk",
>> TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
>> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
>> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
>> quirk->data.address_match & TARGET_PAGE_MASK,
>> &quirk->mem, 1);
>>
>> @@ -2229,7 +2240,8 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
>>
>> /* Log the chipset ID */
>> DPRINTF("Nvidia NV%02x\n",
>> - (unsigned int)(vfio_bar_read(&vdev->bars[0], 0, 4) >> 20) & 0xff);
>> + (unsigned int)(vfio_region_read(&vdev->bars[0].region, 0, 4) >> 20)
>> + & 0xff);
>>
>> quirk = g_malloc0(sizeof(*quirk));
>> quirk->vdev = vdev;
>> @@ -2241,7 +2253,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
>> memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
>> "vfio-nvidia-bar0-1800-quirk",
>> TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
>> - memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
>> + memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
>> quirk->data.address_match & TARGET_PAGE_MASK,
>> &quirk->mem, 1);
>>
>> @@ -2298,7 +2310,7 @@ static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
>>
>> while (!QLIST_EMPTY(&bar->quirks)) {
>> VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
>> - memory_region_del_subregion(&bar->mem, &quirk->mem);
>> + memory_region_del_subregion(&bar->region.mem, &quirk->mem);
>> memory_region_destroy(&quirk->mem);
>> QLIST_REMOVE(quirk, next);
>> g_free(quirk);
>> @@ -2811,9 +2823,9 @@ static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
>> int ret;
>>
>> ret = msix_init(&vdev->pdev, vdev->msix->entries,
>> - &vdev->bars[vdev->msix->table_bar].mem,
>> + &vdev->bars[vdev->msix->table_bar].region.mem,
>> vdev->msix->table_bar, vdev->msix->table_offset,
>> - &vdev->bars[vdev->msix->pba_bar].mem,
>> + &vdev->bars[vdev->msix->pba_bar].region.mem,
>> vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
>> if (ret < 0) {
>> if (ret == -ENOTSUP) {
>> @@ -2831,8 +2843,9 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
>> msi_uninit(&vdev->pdev);
>>
>> if (vdev->msix) {
>> - msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
>> - &vdev->bars[vdev->msix->pba_bar].mem);
>> + msix_uninit(&vdev->pdev,
>> + &vdev->bars[vdev->msix->table_bar].region.mem,
>> + &vdev->bars[vdev->msix->pba_bar].region.mem);
>> }
>> }
>>
>> @@ -2846,11 +2859,11 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
>> for (i = 0; i < PCI_ROM_SLOT; i++) {
>> VFIOBAR *bar = &vdev->bars[i];
>>
>> - if (!bar->size) {
>> + if (!bar->region.size) {
>> continue;
>> }
>>
>> - memory_region_set_enabled(&bar->mmap_mem, enabled);
>> + memory_region_set_enabled(&bar->region.mmap_mem, enabled);
>> if (vdev->msix && vdev->msix->table_bar == i) {
>> memory_region_set_enabled(&vdev->msix->mmap_mem, enabled);
>> }
>> @@ -2861,45 +2874,46 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
>> {
>> VFIOBAR *bar = &vdev->bars[nr];
>>
>> - if (!bar->size) {
>> + if (!bar->region.size) {
>> return;
>> }
>>
>> vfio_bar_quirk_teardown(vdev, nr);
>>
>> - memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
>> - munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
>> - memory_region_destroy(&bar->mmap_mem);
>> + memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
>> + munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
>> + memory_region_destroy(&bar->region.mmap_mem);
>>
>> if (vdev->msix && vdev->msix->table_bar == nr) {
>> - memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
>> + memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
>> munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
>> memory_region_destroy(&vdev->msix->mmap_mem);
>> }
>>
>> - memory_region_destroy(&bar->mem);
>> + memory_region_destroy(&bar->region.mem);
>> }
>>
>> -static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
>> +static int vfio_mmap_region(Object *vdev, VFIORegion *region,
>
> "vdev" is effectively a reserved variable name here, let's not use it to
> reference an Object.
OK. Renamed to obj and removed the useless OBJECT().
Best Regards
Eric
>
>> MemoryRegion *mem, MemoryRegion *submem,
>> void **map, size_t size, off_t offset,
>> const char *name)
>> {
>> int ret = 0;
>>
>> - if (VFIO_ALLOW_MMAP && size && bar->flags & VFIO_REGION_INFO_FLAG_MMAP) {
>> + if (VFIO_ALLOW_MMAP && size && region->flags &
>> + VFIO_REGION_INFO_FLAG_MMAP) {
>> int prot = 0;
>>
>> - if (bar->flags & VFIO_REGION_INFO_FLAG_READ) {
>> + if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
>> prot |= PROT_READ;
>> }
>>
>> - if (bar->flags & VFIO_REGION_INFO_FLAG_WRITE) {
>> + if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
>> prot |= PROT_WRITE;
>> }
>>
>> *map = mmap(NULL, size, prot, MAP_SHARED,
>> - bar->fd, bar->fd_offset + offset);
>> + region->fd, region->fd_offset + offset);
>> if (*map == MAP_FAILED) {
>> *map = NULL;
>> ret = -errno;
>> @@ -2921,7 +2935,7 @@ empty_region:
>> static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
>> {
>> VFIOBAR *bar = &vdev->bars[nr];
>> - unsigned size = bar->size;
>> + unsigned size = bar->region.size;
>> char name[64];
>> uint32_t pci_bar;
>> uint8_t type;
>> @@ -2951,9 +2965,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
>> ~PCI_BASE_ADDRESS_MEM_MASK);
>>
>> /* A "slow" read/write mapping underlies all BARs */
>> - memory_region_init_io(&bar->mem, OBJECT(vdev), &vfio_bar_ops,
>> + memory_region_init_io(&bar->region.mem, OBJECT(vdev), &vfio_region_ops,
>> bar, name, size);
>> - pci_register_bar(&vdev->pdev, nr, type, &bar->mem);
>> + pci_register_bar(&vdev->pdev, nr, type, &bar->region.mem);
>>
>> /*
>> * We can't mmap areas overlapping the MSIX vector table, so we
>> @@ -2964,8 +2978,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
>> }
>>
>> strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
>> - if (vfio_mmap_bar(vdev, bar, &bar->mem,
>> - &bar->mmap_mem, &bar->mmap, size, 0, name)) {
>> + if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
>> + &bar->region.mmap_mem, &bar->region.mmap,
>> + size, 0, name)) {
>> error_report("%s unsupported. Performance may be slow", name);
>> }
>>
>> @@ -2975,10 +2990,11 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
>> start = HOST_PAGE_ALIGN(vdev->msix->table_offset +
>> (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
>>
>> - size = start < bar->size ? bar->size - start : 0;
>> + size = start < bar->region.size ? bar->region.size - start : 0;
>> strncat(name, " msix-hi", sizeof(name) - strlen(name) - 1);
>> /* VFIOMSIXInfo contains another MemoryRegion for this mapping */
>> - if (vfio_mmap_bar(vdev, bar, &bar->mem, &vdev->msix->mmap_mem,
>> + if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
>> + &vdev->msix->mmap_mem,
>> &vdev->msix->mmap, size, start, name)) {
>> error_report("%s unsupported. Performance may be slow", name);
>> }
>> @@ -3568,6 +3584,7 @@ static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
>> static VFIODeviceOps vfio_pci_ops = {
>> .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
>> .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
>> + .vfio_eoi = vfio_eoi,
>> };
>>
>> static void vfio_reset_handler(void *opaque)
>> @@ -3973,11 +3990,11 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> (unsigned long)reg_info.size, (unsigned long)reg_info.offset,
>> (unsigned long)reg_info.flags);
>>
>> - vdev->bars[i].flags = reg_info.flags;
>> - vdev->bars[i].size = reg_info.size;
>> - vdev->bars[i].fd_offset = reg_info.offset;
>> - vdev->bars[i].fd = vdev->vbasedev.fd;
>> - vdev->bars[i].nr = i;
>> + vdev->bars[i].region.flags = reg_info.flags;
>> + vdev->bars[i].region.size = reg_info.size;
>> + vdev->bars[i].region.fd_offset = reg_info.offset;
>> + vdev->bars[i].region.fd = vdev->vbasedev.fd;
>> + vdev->bars[i].region.nr = i;
>> QLIST_INIT(&vdev->bars[i].quirks);
>> }
>>
>
>
>
* Re: [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device
2014-07-08 22:43 ` Alex Williamson
@ 2014-07-24 9:51 ` Eric Auger
2014-07-24 10:25 ` Eric Auger
0 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2014-07-24 9:51 UTC (permalink / raw)
To: Alex Williamson
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On 07/09/2014 12:43 AM, Alex Williamson wrote:
> On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
>> vfio_get_device now takes a VFIODevice as argument. The function is split
>> into 4 functional parts: dev_info query, device check, region populate
>> and interrupt populate. the last 3 are specialized by parent device and
>> are added into DeviceOps.
>>
>> 3 new fields are introduced in VFIODevice to store dev_info.
>>
>> vfio_put_base_device is created.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>> hw/vfio/pci.c | 181 +++++++++++++++++++++++++++++++++++++++-------------------
>> 1 file changed, 121 insertions(+), 60 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 5f0164a..d228cf8 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -194,12 +194,18 @@ typedef struct VFIODevice {
>> bool reset_works;
>> bool needs_reset;
>> VFIODeviceOps *ops;
>> + unsigned int num_irqs;
>> + unsigned int num_regions;
>> + unsigned int flags;
>> } VFIODevice;
>>
>> struct VFIODeviceOps {
>> bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
>> int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>> void (*vfio_eoi)(VFIODevice *vdev);
>> + int (*vfio_check_device)(VFIODevice *vdev);
>> + int (*vfio_populate_regions)(VFIODevice *vdev);
>> + int (*vfio_populate_interrupts)(VFIODevice *vdev);
>> };
>>
>> typedef struct VFIOPCIDevice {
>> @@ -286,6 +292,10 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
>> static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
>> uint32_t val, int len);
>> static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>> +static void vfio_put_base_device(VFIODevice *vbasedev);
>> +static int vfio_check_device(VFIODevice *vbasedev);
>> +static int vfio_populate_regions(VFIODevice *vbasedev);
>> +static int vfio_populate_interrupts(VFIODevice *vbasedev);
>>
>> /*
>> * Common VFIO interrupt disable
>> @@ -3585,6 +3595,9 @@ static VFIODeviceOps vfio_pci_ops = {
>> .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
>> .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
>> .vfio_eoi = vfio_eoi,
>> + .vfio_check_device = vfio_check_device,
>> + .vfio_populate_regions = vfio_populate_regions,
>> + .vfio_populate_interrupts = vfio_populate_interrupts,
>> };
>>
>> static void vfio_reset_handler(void *opaque)
>> @@ -3927,54 +3940,53 @@ static void vfio_put_group(VFIOGroup *group)
>> }
>> }
>>
>> -static int vfio_get_device(VFIOGroup *group, const char *name,
>> - VFIOPCIDevice *vdev)
>> +static int vfio_check_device(VFIODevice *vbasedev)
>> {
>> - struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
>> - struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>> - struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
>> - int ret, i;
>> -
>> - ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>> - if (ret < 0) {
>> - error_report("vfio: error getting device %s from group %d: %m",
>> - name, group->groupid);
>> - error_printf("Verify all devices in group %d are bound to vfio-pci "
>> - "or pci-stub and not already in use\n", group->groupid);
>> - return ret;
>> + if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
>> + error_report("vfio: Um, this isn't a PCI device");
>> + goto error;
>> }
>> -
>> - vdev->vbasedev.fd = ret;
>> - vdev->vbasedev.group = group;
>> - QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
>> -
>> - /* Sanity check device */
>> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
>> - if (ret) {
>> - error_report("vfio: error getting device info: %m");
>> + if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
>> + error_report("vfio: unexpected number of io regions %u",
>> + vbasedev->num_regions);
>> goto error;
>> }
>> -
>> - DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
>> - dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
>> -
>> - if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
>> - error_report("vfio: Um, this isn't a PCI device");
>> + if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
>> + error_report("vfio: unexpected number of irqs %u",
>> + vbasedev->num_irqs);
>> goto error;
>> }
>> + return 0;
>> +error:
>> + vfio_put_base_device(vbasedev);
>
> This doesn't make much sense, this function never "got" the base device,
> so why does it need to "put" it on error? We should simply return error
> and the caller (presumably who got it) should put the device.
Hi Alex,
I definitely need to revisit and homogenize my error handling: all
sub-functions will just return errors (where sensible), with get/put done
at the upper level. The errno misuse will be fixed as well. Sorry for that :-(
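The convention discussed here (helpers only report failure, and whoever acquired the device is the one place that releases it) can be sketched as follows. This is a minimal, self-contained illustration and not the reworked patch: `Dev`, `dev_get`, `dev_put`, `check_device` and `DEV_FLAG_PCI` are hypothetical stand-ins for the VFIO structures and functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for VFIODevice; not the QEMU type. */
typedef struct Dev {
    bool acquired;       /* set by dev_get(), cleared by dev_put() */
    unsigned int flags;
    int put_count;       /* counts releases, to show there is exactly one */
} Dev;

#define DEV_FLAG_PCI 0x2u  /* illustrative value only */

static void dev_put(Dev *d)
{
    d->acquired = false;
    d->put_count++;
}

/* Helper: validates only. It never acquired the device, so it never puts it. */
static int check_device(const Dev *d)
{
    if (!(d->flags & DEV_FLAG_PCI)) {
        return -1;
    }
    return 0;
}

/* Caller: acquires the device, so it is the single place that releases it. */
static int dev_get(Dev *d, unsigned int flags)
{
    int ret;

    d->acquired = true;
    d->flags = flags;

    ret = check_device(d);
    if (ret) {
        dev_put(d);
    }
    return ret;
}
```

With this split, a failing check results in exactly one put, performed by the function that did the get.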
>
>> + return -errno;
>
> Nothing above seems to guarantee we have anything useful in errno (or
> that it hasn't been clobbered).
replaced by -1
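For the cases where an errno value really is wanted, the usual idiom is to copy it right after the failing call, before any other library call can overwrite it. A minimal sketch, where `open_or_errno` is a hypothetical helper and not part of the patch:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns a file descriptor on success or a negative
 * errno value on failure.
 */
static int open_or_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int saved = errno;   /* copy before anything else can change errno */

        /* Later calls (logging, cleanup) are allowed to modify errno. */
        fprintf(stderr, "open(%s) failed\n", path);
        errno = 0;           /* simulate a clobbering call */
        return -saved;
    }
    return fd;
}
```

Returning a fixed -1, as done in the reply above, is the simpler choice when no system call failed and errno carries no information.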
>
>> +}
>>
>> - vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>> +static int vfio_populate_interrupts(VFIODevice *vbasedev)
>> +{
>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> + int ret;
>> + struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
>> + irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>>
>> - if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
>> - error_report("vfio: unexpected number of io regions %u",
>> - dev_info.num_regions);
>> - goto error;
>> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> + if (ret) {
>> + /* This can fail for an old kernel or legacy PCI dev */
>> + DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
>> + } else if (irq_info.count == 1) {
>> + vdev->pci_aer = true;
>> + } else {
>> + error_report("vfio: %s Could not enable error recovery for the device",
>> + vbasedev->name);
>> }
>> + return ret;
>
> This function returns error if the device doesn't support error
> reporting, which is an optional feature. I don't think that's what we
> want.
OK, I had misunderstood the comment. The function prototype is changed to return void.
>
>> +}
>>
>> - if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
>> - error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
>> - goto error;
>> - }
>> +static int vfio_populate_regions(VFIODevice *vbasedev)
>> +{
>> + struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>> + int i, ret;
>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>
>> for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
>> reg_info.index = i;
>> @@ -4018,7 +4030,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>> vdev->config_offset = reg_info.offset;
>>
>> if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
>> - dev_info.num_regions > VFIO_PCI_VGA_REGION_INDEX) {
>> + vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
>> struct vfio_region_info vga_info = {
>> .argsz = sizeof(vga_info),
>> .index = VFIO_PCI_VGA_REGION_INDEX,
>> @@ -4057,38 +4069,87 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>
>> vdev->has_vga = true;
>> }
>> - irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>> + return 0;
>> +error:
>> + vfio_put_base_device(vbasedev);
>> + return -errno;
>
> errno can get clobbered by here, don't count on it. Also, why does this
> put the base device while interrupt_populate error does not? The put
> needs to happen a level above these functions imho.
ok
>
>> +}
>> +
>> +static int vfio_get_device(VFIOGroup *group, const char *name,
>> + VFIODevice *vbasedev)
>> +{
>> + struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
>> + struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>> + struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
>> + int ret;
>> +
>> + ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>> + if (ret < 0) {
>> + error_report("vfio: error getting device %s from group %d: %m",
>> + name, group->groupid);
>> + error_printf("Verify all devices in group %d are bound to vfio-pci "
>> + "or pci-stub and not already in use\n", group->groupid);
>> + return ret;
>> + }
>> +
>> + vbasedev->fd = ret;
>> + vbasedev->group = group;
>> + QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>>
>> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
>> if (ret) {
>> - /* This can fail for an old kernel or legacy PCI dev */
>> - DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
>> - ret = 0;
>> - } else if (irq_info.count == 1) {
>> - vdev->pci_aer = true;
>> - } else {
>> - error_report("vfio: %04x:%02x:%02x.%x "
>> - "Could not enable error recovery for the device",
>> - vdev->host.domain, vdev->host.bus, vdev->host.slot,
>> - vdev->host.function);
>> + error_report("vfio: error getting device info: %m");
>> + goto error;
>> + }
>> +
>> + vbasedev->num_irqs = dev_info.num_irqs;
>> + vbasedev->num_regions = dev_info.num_regions;
>> + vbasedev->flags = dev_info.flags;
>> +
>> + DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
>> + dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
>> +
>> + vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>> +
>> + /* call device specific functions */
>> + ret = vbasedev->ops->vfio_check_device(vbasedev);
>> + if (ret) {
>> + error_report("vfio: error when checking device %s\n",
>> + vbasedev->name);
>> + goto error;
>> + }
>> + ret = vbasedev->ops->vfio_populate_regions(vbasedev);
>> + if (ret) {
>> + error_report("vfio: error when populating regions of device %s\n",
>> + vbasedev->name);
>> + goto error;
>> + }
>> + ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
>> + if (ret) {
>> + error_report("vfio: error when populating interrupts of device %s\n",
>> + vbasedev->name);
>> + goto error;
>> }
>>
>> error:
>> if (ret) {
>> - QLIST_REMOVE(&vdev->vbasedev, next);
>> - vdev->vbasedev.group = NULL;
>> - close(vdev->vbasedev.fd);
>> + vfio_put_base_device(vbasedev);
>
> Whoops, more confusion, the call-out functions are doing put calls
> (well, some of them) and so is this. This is the only place it should
> occur.
OK
>
>> }
>> return ret;
>> }
>>
>> -static void vfio_put_device(VFIOPCIDevice *vdev)
>> +void vfio_put_base_device(VFIODevice *vbasedev)
>> {
>> - QLIST_REMOVE(&vdev->vbasedev, next);
>> - vdev->vbasedev.group = NULL;
>> + QLIST_REMOVE(vbasedev, next);
>> + vbasedev->group = NULL;
>> DPRINTF("vfio_put_device: close vdev->fd\n");
>> - close(vdev->vbasedev.fd);
>> - g_free(vdev->vbasedev.name);
>> + close(vbasedev->fd);
>> + g_free(vbasedev->name);
>
> get/put of the base device is still a bit messy. .name doesn't get
> allocated by the get, but gets freed by the put.
The .name deallocation is moved to vfio_put_device.
Thank you for your review
Best Regards
Eric
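The symmetry asked for in the review (the base put undoes only what the base get did, and the outer device frees what it allocated itself) could look roughly like this. It is a sketch under stated assumptions: `BaseDev`, `PciDev`, `base_get`, `base_put`, `pci_init` and `pci_put` are hypothetical names, and the file descriptor is only reset rather than closed so that the example stays self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for VFIODevice / VFIOPCIDevice. */
typedef struct BaseDev {
    int fd;          /* owned by base_get()/base_put() */
    char *name;      /* owned by the outer device, not by base get/put */
} BaseDev;

typedef struct PciDev {
    BaseDev base;
} PciDev;

static void base_get(BaseDev *b, int fd)
{
    b->fd = fd;
}

static void base_put(BaseDev *b)
{
    /* Real code would close(b->fd) here; the name is left untouched. */
    b->fd = -1;
}

/* The outer init allocates the name, then gets the base device. */
static int pci_init(PciDev *p, const char *name, int fd)
{
    size_t len = strlen(name) + 1;

    p->base.name = malloc(len);
    if (!p->base.name) {
        return -1;
    }
    memcpy(p->base.name, name, len);
    base_get(&p->base, fd);
    return 0;
}

/* The outer put releases the base device, then frees what pci_init allocated. */
static void pci_put(PciDev *p)
{
    base_put(&p->base);
    free(p->base.name);
    p->base.name = NULL;
}
```

Each allocation then has exactly one matching release at the same layer.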
>
>> +}
>> +
>> +static void vfio_put_device(VFIOPCIDevice *vdev)
>> +{
>> + vfio_put_base_device(&vdev->vbasedev);
>> if (vdev->msix) {
>> g_free(vdev->msix);
>> vdev->msix = NULL;
>> @@ -4266,7 +4327,7 @@ static int vfio_initfn(PCIDevice *pdev)
>> }
>> }
>>
>> - ret = vfio_get_device(group, path, vdev);
>> + ret = vfio_get_device(group, path, &vdev->vbasedev);
>> if (ret) {
>> error_report("vfio: failed to get device %s", path);
>> vfio_put_group(group);
>
>
>
* Re: [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device
2014-07-24 9:51 ` Eric Auger
@ 2014-07-24 10:25 ` Eric Auger
0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2014-07-24 10:25 UTC (permalink / raw)
To: Alex Williamson
Cc: agraf, kim.phillips, eric.auger, peter.maydell, patches,
will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, stuart.yoder,
a.motakis, kvmarm, christoffer.dall
On 07/24/2014 11:51 AM, Eric Auger wrote:
> On 07/09/2014 12:43 AM, Alex Williamson wrote:
>> On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
>>> vfio_get_device now takes a VFIODevice as argument. The function is split
>>> into 4 functional parts: dev_info query, device check, region populate
>>> and interrupt populate. the last 3 are specialized by parent device and
>>> are added into DeviceOps.
>>>
>>> 3 new fields are introduced in VFIODevice to store dev_info.
>>>
>>> vfio_put_base_device is created.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>> ---
>>> hw/vfio/pci.c | 181 +++++++++++++++++++++++++++++++++++++++-------------------
>>> 1 file changed, 121 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 5f0164a..d228cf8 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -194,12 +194,18 @@ typedef struct VFIODevice {
>>> bool reset_works;
>>> bool needs_reset;
>>> VFIODeviceOps *ops;
>>> + unsigned int num_irqs;
>>> + unsigned int num_regions;
>>> + unsigned int flags;
>>> } VFIODevice;
>>>
>>> struct VFIODeviceOps {
>>> bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
>>> int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>>> void (*vfio_eoi)(VFIODevice *vdev);
>>> + int (*vfio_check_device)(VFIODevice *vdev);
>>> + int (*vfio_populate_regions)(VFIODevice *vdev);
>>> + int (*vfio_populate_interrupts)(VFIODevice *vdev);
>>> };
>>>
>>> typedef struct VFIOPCIDevice {
>>> @@ -286,6 +292,10 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
>>> static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
>>> uint32_t val, int len);
>>> static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>>> +static void vfio_put_base_device(VFIODevice *vbasedev);
>>> +static int vfio_check_device(VFIODevice *vbasedev);
>>> +static int vfio_populate_regions(VFIODevice *vbasedev);
>>> +static int vfio_populate_interrupts(VFIODevice *vbasedev);
>>>
>>> /*
>>> * Common VFIO interrupt disable
>>> @@ -3585,6 +3595,9 @@ static VFIODeviceOps vfio_pci_ops = {
>>> .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
>>> .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
>>> .vfio_eoi = vfio_eoi,
>>> + .vfio_check_device = vfio_check_device,
>>> + .vfio_populate_regions = vfio_populate_regions,
>>> + .vfio_populate_interrupts = vfio_populate_interrupts,
>>> };
>>>
>>> static void vfio_reset_handler(void *opaque)
>>> @@ -3927,54 +3940,53 @@ static void vfio_put_group(VFIOGroup *group)
>>> }
>>> }
>>>
>>> -static int vfio_get_device(VFIOGroup *group, const char *name,
>>> - VFIOPCIDevice *vdev)
>>> +static int vfio_check_device(VFIODevice *vbasedev)
>>> {
>>> - struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
>>> - struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>>> - struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
>>> - int ret, i;
>>> -
>>> - ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>>> - if (ret < 0) {
>>> - error_report("vfio: error getting device %s from group %d: %m",
>>> - name, group->groupid);
>>> - error_printf("Verify all devices in group %d are bound to vfio-pci "
>>> - "or pci-stub and not already in use\n", group->groupid);
>>> - return ret;
>>> + if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
>>> + error_report("vfio: Um, this isn't a PCI device");
>>> + goto error;
>>> }
>>> -
>>> - vdev->vbasedev.fd = ret;
>>> - vdev->vbasedev.group = group;
>>> - QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
>>> -
>>> - /* Sanity check device */
>>> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
>>> - if (ret) {
>>> - error_report("vfio: error getting device info: %m");
>>> + if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
>>> + error_report("vfio: unexpected number of io regions %u",
>>> + vbasedev->num_regions);
>>> goto error;
>>> }
>>> -
>>> - DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
>>> - dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
>>> -
>>> - if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
>>> - error_report("vfio: Um, this isn't a PCI device");
>>> + if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
>>> + error_report("vfio: unexpected number of irqs %u",
>>> + vbasedev->num_irqs);
>>> goto error;
>>> }
>>> + return 0;
>>> +error:
>>> + vfio_put_base_device(vbasedev);
>>
>> This doesn't make much sense, this function never "got" the base device,
>> so why does it need to "put" it on error? We should simply return error
>> and the caller (presumably who got it) should put the device.
> Hi Alex,
>
> I definitely need to revisit and homogenize my error handling: all
> sub-functions will just return errors (where sensible), with get/put done
> at the upper level. The errno misuse will be fixed as well. Sorry for that :-(
>>
>>> + return -errno;
>>
>> Nothing above seems to guarantee we have anything useful in errno (or
>> that it hasn't been clobbered).
> replaced by -1
>>
>>> +}
>>>
>>> - vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>>> +static int vfio_populate_interrupts(VFIODevice *vbasedev)
>>> +{
>>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>> + int ret;
>>> + struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
>>> + irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>>>
>>> - if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
>>> - error_report("vfio: unexpected number of io regions %u",
>>> - dev_info.num_regions);
>>> - goto error;
>>> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>>> + if (ret) {
>>> + /* This can fail for an old kernel or legacy PCI dev */
>>> + DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
>>> + } else if (irq_info.count == 1) {
>>> + vdev->pci_aer = true;
>>> + } else {
>>> + error_report("vfio: %s Could not enable error recovery for the device",
>>> + vbasedev->name);
>>> }
>>> + return ret;
>>
>> This function returns error if the device doesn't support error
>> reporting, which is an optional feature. I don't think that's what we
>> want.
> OK, I had misunderstood the comment. The function prototype is changed to return void.
After some more thought about platform usage, I would like to keep
the return value and have the PCI implementation return 0.
BR
Eric
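A rough sketch of that idea: the callback keeps an int return type so that another device type can report a fatal error, while the PCI flavour treats missing error recovery as non-fatal and always returns 0. All names here (`Dev`, `DevOps`, `pci_populate_interrupts`, `platform_populate_interrupts`, `dev_setup`) are hypothetical, and the platform behaviour in particular is only an assumption made for illustration.

```c
#include <stdbool.h>

typedef struct Dev Dev;

/* Hypothetical ops table, loosely modelled on VFIODeviceOps. */
typedef struct DevOps {
    int (*populate_interrupts)(Dev *d);
} DevOps;

struct Dev {
    const DevOps *ops;
    int irq_count;   /* stand-in for an IRQ info query; negative = query failed */
    bool aer;        /* optional error-recovery capability */
};

/* PCI flavour: error recovery is optional, so its absence is never fatal. */
static int pci_populate_interrupts(Dev *d)
{
    d->aer = (d->irq_count == 1);
    return 0;
}

/* Platform flavour (assumption): a failed IRQ query is treated as fatal. */
static int platform_populate_interrupts(Dev *d)
{
    return d->irq_count < 0 ? -1 : 0;
}

static const DevOps pci_ops = { pci_populate_interrupts };
static const DevOps platform_ops = { platform_populate_interrupts };

/* Generic caller: propagates whatever the device-specific hook returns. */
static int dev_setup(Dev *d)
{
    return d->ops->populate_interrupts(d);
}
```

The generic caller stays identical for both device types; only the hook decides what counts as an error.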
>>
>>> +}
>>>
>>> - if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
>>> - error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
>>> - goto error;
>>> - }
>>> +static int vfio_populate_regions(VFIODevice *vbasedev)
>>> +{
>>> + struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>>> + int i, ret;
>>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>>
>>> for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
>>> reg_info.index = i;
>>> @@ -4018,7 +4030,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>> vdev->config_offset = reg_info.offset;
>>>
>>> if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
>>> - dev_info.num_regions > VFIO_PCI_VGA_REGION_INDEX) {
>>> + vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
>>> struct vfio_region_info vga_info = {
>>> .argsz = sizeof(vga_info),
>>> .index = VFIO_PCI_VGA_REGION_INDEX,
>>> @@ -4057,38 +4069,87 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>>
>>> vdev->has_vga = true;
>>> }
>>> - irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>>> + return 0;
>>> +error:
>>> + vfio_put_base_device(vbasedev);
>>> + return -errno;
>>
>> errno can get clobbered by here, don't count on it. Also, why does this
>> put the base device while interrupt_populate error does not? The put
>> needs to happen a level above these functions imho.
>
> ok
>>
>>> +}
>>> +
>>> +static int vfio_get_device(VFIOGroup *group, const char *name,
>>> + VFIODevice *vbasedev)
>>> +{
>>> + struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
>>> + struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>>> + struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
>>> + int ret;
>>> +
>>> + ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>>> + if (ret < 0) {
>>> + error_report("vfio: error getting device %s from group %d: %m",
>>> + name, group->groupid);
>>> + error_printf("Verify all devices in group %d are bound to vfio-pci "
>>> + "or pci-stub and not already in use\n", group->groupid);
>>> + return ret;
>>> + }
>>> +
>>> + vbasedev->fd = ret;
>>> + vbasedev->group = group;
>>> + QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>>>
>>> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>>> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
>>> if (ret) {
>>> - /* This can fail for an old kernel or legacy PCI dev */
>>> - DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
>>> - ret = 0;
>>> - } else if (irq_info.count == 1) {
>>> - vdev->pci_aer = true;
>>> - } else {
>>> - error_report("vfio: %04x:%02x:%02x.%x "
>>> - "Could not enable error recovery for the device",
>>> - vdev->host.domain, vdev->host.bus, vdev->host.slot,
>>> - vdev->host.function);
>>> + error_report("vfio: error getting device info: %m");
>>> + goto error;
>>> + }
>>> +
>>> + vbasedev->num_irqs = dev_info.num_irqs;
>>> + vbasedev->num_regions = dev_info.num_regions;
>>> + vbasedev->flags = dev_info.flags;
>>> +
>>> + DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
>>> + dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
>>> +
>>> + vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>>> +
>>> + /* call device specific functions */
>>> + ret = vbasedev->ops->vfio_check_device(vbasedev);
>>> + if (ret) {
>>> + error_report("vfio: error when checking device %s\n",
>>> + vbasedev->name);
>>> + goto error;
>>> + }
>>> + ret = vbasedev->ops->vfio_populate_regions(vbasedev);
>>> + if (ret) {
>>> + error_report("vfio: error when populating regions of device %s\n",
>>> + vbasedev->name);
>>> + goto error;
>>> + }
>>> + ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
>>> + if (ret) {
>>> + error_report("vfio: error when populating interrupts of device %s\n",
>>> + vbasedev->name);
>>> + goto error;
>>> }
>>>
>>> error:
>>> if (ret) {
>>> - QLIST_REMOVE(&vdev->vbasedev, next);
>>> - vdev->vbasedev.group = NULL;
>>> - close(vdev->vbasedev.fd);
>>> + vfio_put_base_device(vbasedev);
>>
>> Whoops, more confusion, the call-out functions are doing put calls
>> (well, some of them) and so is this. This is the only place it should
>> occur.
> OK
>>
>>> }
>>> return ret;
>>> }
>>>
>>> -static void vfio_put_device(VFIOPCIDevice *vdev)
>>> +void vfio_put_base_device(VFIODevice *vbasedev)
>>> {
>>> - QLIST_REMOVE(&vdev->vbasedev, next);
>>> - vdev->vbasedev.group = NULL;
>>> + QLIST_REMOVE(vbasedev, next);
>>> + vbasedev->group = NULL;
>>> DPRINTF("vfio_put_device: close vdev->fd\n");
>>> - close(vdev->vbasedev.fd);
>>> - g_free(vdev->vbasedev.name);
>>> + close(vbasedev->fd);
>>> + g_free(vbasedev->name);
>>
>> get/put of the base device is still a bit messy. .name doesn't get
>> allocated by the get, but gets freed by the put.
>
> The .name deallocation is moved to vfio_put_device.
>
> Thank you for your review
>
> Best Regards
>
> Eric
>>
>>> +}
>>> +
>>> +static void vfio_put_device(VFIOPCIDevice *vdev)
>>> +{
>>> + vfio_put_base_device(&vdev->vbasedev);
>>> if (vdev->msix) {
>>> g_free(vdev->msix);
>>> vdev->msix = NULL;
>>> @@ -4266,7 +4327,7 @@ static int vfio_initfn(PCIDevice *pdev)
>>> }
>>> }
>>>
>>> - ret = vfio_get_device(group, path, vdev);
>>> + ret = vfio_get_device(group, path, &vdev->vbasedev);
>>> if (ret) {
>>> error_report("vfio: failed to get device %s", path);
>>> vfio_put_group(group);
>>
>>
>>
>
Thread overview: 29+ messages
2014-07-07 12:27 [Qemu-devel] [RFC v4 00/13] KVM platform device passthrough Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 01/13] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 02/13] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files Eric Auger
2014-07-08 18:55 ` Alex Williamson
2014-07-23 9:59 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice Eric Auger
2014-07-08 22:41 ` Alex Williamson
2014-07-23 10:02 ` Eric Auger
2014-07-23 10:24 ` Peter Maydell
2014-07-23 11:40 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 05/13] hw/vfio/pci: Introduce VFIORegion Eric Auger
2014-07-08 22:41 ` Alex Williamson
2014-07-23 13:50 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 06/13] hw/vfio/pci: split vfio_get_device Eric Auger
2014-07-08 22:43 ` Alex Williamson
2014-07-24 9:51 ` Eric Auger
2014-07-24 10:25 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 07/13] hw/vfio: create common module Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 08/13] hw/vfio/common: Add EXEC_FLAG to VFIO DMA mappings Eric Auger
2014-07-07 12:40 ` Peter Maydell
2014-07-07 12:49 ` Will Deacon
2014-07-07 13:25 ` Alvise Rigo
2014-07-07 13:29 ` Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 09/13] hw/vfio/platform: add vfio-platform support Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 10/13] hw/intc/arm_gic_kvm: enable irqfd and set routing table Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 11/13] hw/vfio/platform: Add irqfd support Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 12/13] hw/vfio/platform: add default dt generation for vfio device Eric Auger
2014-07-07 12:27 ` [Qemu-devel] [RFC v4 13/13] hw/vfio: add an example calxeda_xgmac Eric Auger