qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH qemu 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb
@ 2015-09-02  8:16 Alexey Kardashevskiy
  2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
  2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
  0 siblings, 2 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-02  8:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
	qemu-ppc, David Gibson


This allows sharing the same PHB for both emulated and VFIO devices.
This used to be a part of the Dynamic DMA window patchset. Since it was
moved out and reworked quite a lot, all reviewed-by's got outdated.

This is based on dgibson/spapr-next sha1 2194742.


Please comment. Thanks.


Alexey Kardashevskiy (2):
  spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  spapr_pci: Remove constraints about VFIO-PCI devices

 hw/ppc/Makefile.objs        |   5 +-
 hw/ppc/spapr_pci.c          | 170 +++++++++++++++++++++++++-------------------
 hw/ppc/spapr_pci_vfio.c     | 147 ++++++++++++++------------------------
 hw/vfio/common.c            |  22 ++----
 include/hw/pci-host/spapr.h |  23 +++---
 include/hw/vfio/vfio.h      |   3 +-
 6 files changed, 170 insertions(+), 200 deletions(-)

-- 
2.4.0.rc3.8.gfb3e7d5

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  2015-09-02  8:16 [Qemu-devel] [PATCH qemu 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
@ 2015-09-02  8:16 ` Alexey Kardashevskiy
  2015-09-02  9:44   ` Alexey Kardashevskiy
  2015-09-03  2:05   ` David Gibson
  2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
  1 sibling, 2 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-02  8:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Gavin Shan,
	David Gibson

sPAPRTCETable is handling 2 TCE tables already:

1) guest view of the TCE table - emulated devices use only this table;

2) hardware IOMMU table - VFIO PCI devices use it for actual work but
it does not replace 1) and it is not visible to the guest.
The initialization of this table is driven by vfio-pci device,
DMA map/unmap requests are handled via MemoryListener so there is very
little to do in spapr-pci-vfio-host-bridge.

This moves VFIO bits to the generic spapr-pci-host-bridge which allows
putting emulated and VFIO devices on the same PHB. It is still possible
to create multiple PHBs and avoid sharing PHB resouces for emulated and
VFIO devices.

If there is no VFIO-PCI device attaches, no special ioctls will be called.
If there are some VFIO-PCI devices attached, PHB may refuse to attach
another VFIO-PCI device if a VFIO container on the host kernel side
does not support container sharing.

This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
still should be used for VFIO devices. The next patch will remove IOMMU ID
property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.

This adds a number of VFIO-PCI devices currently attached to a PHB as
PHB needs to know whether to do DMA setup for VFIO or not. Since
at the moment of the PHB's realize() invocation we cannot tell yet
how many VFIO-PCI devices are there (they are not attached yet),
this moves DMA setup to the reset handler.

This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
and decide whether to call spapr_phb_vfio_eeh_set_option() or not.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/Makefile.objs        |   5 +-
 hw/ppc/spapr_pci.c          | 170 +++++++++++++++++++++++++-------------------
 hw/ppc/spapr_pci_vfio.c     | 144 +++++++++++++++----------------------
 include/hw/pci-host/spapr.h |  23 +++---
 4 files changed, 164 insertions(+), 178 deletions(-)

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..6c06fcf 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,10 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
-ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
-obj-y += spapr_pci_vfio.o
-endif
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_pci_vfio.o spapr_rtc.o spapr_drc.o
 # PowerPC 4xx boards
 obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
 obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index bc30631..12f841f 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -430,7 +430,6 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
                                     target_ulong rets)
 {
     sPAPRPHBState *sphb;
-    sPAPRPHBClass *spc;
     PCIDevice *pdev;
     uint32_t addr, option;
     uint64_t buid;
@@ -445,7 +444,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
     option = rtas_ld(args, 3);
 
     sphb = spapr_pci_find_phb(spapr, buid);
-    if (!sphb) {
+    if (!sphb || (sphb->vfio_num == 0)) {
         goto param_error_exit;
     }
 
@@ -455,12 +454,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
         goto param_error_exit;
     }
 
-    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
-    if (!spc->eeh_set_option) {
-        goto param_error_exit;
-    }
-
-    ret = spc->eeh_set_option(sphb, addr, option);
+    ret = spapr_phb_vfio_eeh_set_option(sphb, pdev, option);
     rtas_st(rets, 0, ret);
     return;
 
@@ -475,7 +469,6 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
                                            target_ulong rets)
 {
     sPAPRPHBState *sphb;
-    sPAPRPHBClass *spc;
     PCIDevice *pdev;
     uint32_t addr, option;
     uint64_t buid;
@@ -486,12 +479,7 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
 
     buid = rtas_ldq(args, 1);
     sphb = spapr_pci_find_phb(spapr, buid);
-    if (!sphb) {
-        goto param_error_exit;
-    }
-
-    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
-    if (!spc->eeh_set_option) {
+    if (!sphb || (sphb->vfio_num == 0)) {
         goto param_error_exit;
     }
 
@@ -531,7 +519,6 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
                                             target_ulong rets)
 {
     sPAPRPHBState *sphb;
-    sPAPRPHBClass *spc;
     uint64_t buid;
     int state, ret;
 
@@ -541,16 +528,11 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
 
     buid = rtas_ldq(args, 1);
     sphb = spapr_pci_find_phb(spapr, buid);
-    if (!sphb) {
+    if (!sphb || (sphb->vfio_num == 0)) {
         goto param_error_exit;
     }
 
-    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
-    if (!spc->eeh_get_state) {
-        goto param_error_exit;
-    }
-
-    ret = spc->eeh_get_state(sphb, &state);
+    ret = spapr_phb_vfio_eeh_get_state(sphb, &state);
     rtas_st(rets, 0, ret);
     if (ret != RTAS_OUT_SUCCESS) {
         return;
@@ -575,7 +557,6 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu,
                                     target_ulong rets)
 {
     sPAPRPHBState *sphb;
-    sPAPRPHBClass *spc;
     uint32_t option;
     uint64_t buid;
     int ret;
@@ -587,16 +568,11 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu,
     buid = rtas_ldq(args, 1);
     option = rtas_ld(args, 3);
     sphb = spapr_pci_find_phb(spapr, buid);
-    if (!sphb) {
+    if (!sphb || (sphb->vfio_num == 0)) {
         goto param_error_exit;
     }
 
-    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
-    if (!spc->eeh_reset) {
-        goto param_error_exit;
-    }
-
-    ret = spc->eeh_reset(sphb, option);
+    ret = spapr_phb_vfio_eeh_reset(sphb, option);
     rtas_st(rets, 0, ret);
     return;
 
@@ -611,7 +587,6 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu,
                                   target_ulong rets)
 {
     sPAPRPHBState *sphb;
-    sPAPRPHBClass *spc;
     uint64_t buid;
     int ret;
 
@@ -621,16 +596,11 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu,
 
     buid = rtas_ldq(args, 1);
     sphb = spapr_pci_find_phb(spapr, buid);
-    if (!sphb) {
+    if (!sphb || (sphb->vfio_num == 0)) {
         goto param_error_exit;
     }
 
-    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
-    if (!spc->eeh_configure) {
-        goto param_error_exit;
-    }
-
-    ret = spc->eeh_configure(sphb);
+    ret = spapr_phb_vfio_eeh_configure(sphb);
     rtas_st(rets, 0, ret);
     return;
 
@@ -646,7 +616,6 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu,
                                        target_ulong rets)
 {
     sPAPRPHBState *sphb;
-    sPAPRPHBClass *spc;
     int option;
     uint64_t buid;
 
@@ -656,12 +625,7 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu,
 
     buid = rtas_ldq(args, 1);
     sphb = spapr_pci_find_phb(spapr, buid);
-    if (!sphb) {
-        goto param_error_exit;
-    }
-
-    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
-    if (!spc->eeh_set_option) {
+    if (!sphb || (sphb->vfio_num == 0)) {
         goto param_error_exit;
     }
 
@@ -810,6 +774,57 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev)
     return buf;
 }
 
+static void spapr_phb_walk_devices(PCIBus *bus, PCIDevice *pdev, void *opaque)
+{
+    sPAPRPHBState *sphb = opaque;
+    PCIBus *sec_bus;
+
+    if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
+        ++sphb->vfio_num;
+    }
+
+    if ((pci_default_read_config(pdev, PCI_HEADER_TYPE, 1) !=
+         PCI_HEADER_TYPE_BRIDGE)) {
+        return;
+    }
+
+    sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
+    if (!sec_bus) {
+        return;
+    }
+
+    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
+                        spapr_phb_walk_devices, sphb);
+}
+
+static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
+{
+    int ret;
+    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
+
+    sphb->dma32_window_start = 0;
+    sphb->dma32_window_size = SPAPR_PCI_DMA32_SIZE;
+
+    /* Update the number of VFIO-PCI devices on the PHB */
+    sphb->vfio_num = 0;
+    pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
+
+    if (sphb->vfio_num) {
+        if (sphb->iommugroupid == -1) {
+            error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
+            return -1;
+        }
+
+        ret = spapr_phb_vfio_dma_capabilities_update(sphb);
+        if (ret) {
+            error_report("Unable to get DMA32 info from VFIO");
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 /* Macros to operate with address in OF binding to PCI */
 #define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
 #define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
@@ -1217,7 +1232,6 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
     sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
     PCIHostState *phb = PCI_HOST_BRIDGE(s);
-    sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
     char *namebuf;
     int i;
     PCIBus *bus;
@@ -1371,35 +1385,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    if (!info->finish_realize) {
-        error_setg(errp, "finish_realize not defined");
-        return;
-    }
-
-    info->finish_realize(sphb, errp);
-
     sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
 }
 
-static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
-{
-    sPAPRTCETable *tcet;
-    uint32_t nb_table;
-
-    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
-    tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
-                               0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
-    if (!tcet) {
-        error_setg(errp, "Unable to create TCE table for %s",
-                   sphb->dtbusname);
-        return ;
-    }
-
-    /* Register default 32bit DMA window */
-    memory_region_add_subregion(&sphb->iommu_root, 0,
-                                spapr_tce_get_iommu(tcet));
-}
-
 static int spapr_phb_children_reset(Object *child, void *opaque)
 {
     DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
@@ -1413,8 +1401,42 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
 
 static void spapr_phb_reset(DeviceState *qdev)
 {
+    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
+    sPAPRTCETable *tcet;
+
     /* Reset the IOMMU state */
     object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
+
+    if (spapr_phb_dma_capabilities_update(sphb)) {
+        return;
+    }
+
+    /* Register default 32bit DMA window */
+    tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
+    if (!tcet) {
+        const unsigned nb = sphb->dma32_window_size >> SPAPR_TCE_PAGE_SHIFT;
+        tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
+                                   sphb->dma32_window_start,
+                                   SPAPR_TCE_PAGE_SHIFT, nb,
+                                   sphb->vfio_num > 0);
+        if (!tcet) {
+            error_report("No default TCE table for %s", sphb->dtbusname);
+            return;
+        }
+
+        memory_region_add_subregion(&sphb->iommu_root, 0,
+                                    spapr_tce_get_iommu(tcet));
+    }
+
+    if (sphb->vfio_num) {
+        /*
+         * The PE might be in frozen state. To reenable the EEH
+         * functionality on it will clean the frozen state, which
+         * ensures that the contained PCI devices will work properly
+         * after reboot.
+         */
+        spapr_phb_vfio_eeh_reenable(SPAPR_PCI_HOST_BRIDGE(qdev));
+    }
 }
 
 static Property spapr_phb_properties[] = {
@@ -1535,7 +1557,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
 {
     PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
     DeviceClass *dc = DEVICE_CLASS(klass);
-    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
     HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
 
     hc->root_bus_path = spapr_phb_root_bus_path;
@@ -1545,7 +1566,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     dc->vmsd = &vmstate_spapr_pci;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->cannot_instantiate_with_device_add_yet = false;
-    spc->finish_realize = spapr_phb_finish_realize;
     hp->plug = spapr_phb_hot_plug_child;
     hp->unplug = spapr_phb_hot_unplug_child;
 }
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index cca45ed..f94d8a4 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -20,84 +20,47 @@
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "hw/pci/msix.h"
-#include "linux/vfio.h"
 #include "hw/vfio/vfio.h"
 
+#ifdef CONFIG_LINUX
+#include "linux/vfio.h"
+
 static Property spapr_phb_vfio_properties[] = {
-    DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
+    DEFINE_PROP_INT32("iommu", sPAPRPHBState, iommugroupid, -1),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
+int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
 {
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
     int ret;
-    sPAPRTCETable *tcet;
-    uint32_t liobn = svphb->phb.dma_liobn;
 
-    if (svphb->iommugroupid == -1) {
-        error_setg(errp, "Wrong IOMMU group ID %d", svphb->iommugroupid);
-        return;
-    }
-
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
-                               VFIO_CHECK_EXTENSION,
-                               (void *) VFIO_SPAPR_TCE_IOMMU);
-    if (ret != 1) {
-        error_setg_errno(errp, -ret,
-                         "spapr-vfio: SPAPR extension is not supported");
-        return;
-    }
-
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
                                VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
     if (ret) {
-        error_setg_errno(errp, -ret,
-                         "spapr-vfio: get info from container failed");
-        return;
+        return ret;
     }
 
-    tcet = spapr_tce_new_table(DEVICE(sphb), liobn, info.dma32_window_start,
-                               SPAPR_TCE_PAGE_SHIFT,
-                               info.dma32_window_size >> SPAPR_TCE_PAGE_SHIFT,
-                               true);
-    if (!tcet) {
-        error_setg(errp, "spapr-vfio: failed to create VFIO TCE table");
-        return;
-    }
+    sphb->dma32_window_start = info.dma32_window_start;
+    sphb->dma32_window_size = info.dma32_window_size;
 
-    /* Register default 32bit DMA window */
-    memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
-                                spapr_tce_get_iommu(tcet));
+    return ret;
 }
 
-static void spapr_phb_vfio_eeh_reenable(sPAPRPHBVFIOState *svphb)
+void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
 {
     struct vfio_eeh_pe_op op = {
         .argsz = sizeof(op),
         .op    = VFIO_EEH_PE_ENABLE
     };
 
-    vfio_container_ioctl(&svphb->phb.iommu_as,
-                         svphb->iommugroupid, VFIO_EEH_PE_OP, &op);
+    vfio_container_ioctl(&sphb->iommu_as,
+                         sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
 }
 
-static void spapr_phb_vfio_reset(DeviceState *qdev)
+int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
+                                  PCIDevice *pdev, int option)
 {
-    /*
-     * The PE might be in frozen state. To reenable the EEH
-     * functionality on it will clean the frozen state, which
-     * ensures that the contained PCI devices will work properly
-     * after reboot.
-     */
-    spapr_phb_vfio_eeh_reenable(SPAPR_PCI_VFIO_HOST_BRIDGE(qdev));
-}
-
-static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
-                                         unsigned int addr, int option)
-{
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
     int ret;
 
@@ -105,25 +68,9 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
     case RTAS_EEH_DISABLE:
         op.op = VFIO_EEH_PE_DISABLE;
         break;
-    case RTAS_EEH_ENABLE: {
-        PCIHostState *phb;
-        PCIDevice *pdev;
-
-        /*
-         * The EEH functionality is enabled on basis of PCI device,
-         * instead of PE. We need check the validity of the PCI
-         * device address.
-         */
-        phb = PCI_HOST_BRIDGE(sphb);
-        pdev = pci_find_device(phb->bus,
-                               (addr >> 16) & 0xFF, (addr >> 8) & 0xFF);
-        if (!pdev) {
-            return RTAS_OUT_PARAM_ERROR;
-        }
-
+    case RTAS_EEH_ENABLE:
         op.op = VFIO_EEH_PE_ENABLE;
         break;
-    }
     case RTAS_EEH_THAW_IO:
         op.op = VFIO_EEH_PE_UNFREEZE_IO;
         break;
@@ -134,7 +81,7 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
         return RTAS_OUT_PARAM_ERROR;
     }
 
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
                                VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_HW_ERROR;
@@ -143,14 +90,13 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
     return RTAS_OUT_SUCCESS;
 }
 
-static int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
+int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
 {
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
     int ret;
 
     op.op = VFIO_EEH_PE_GET_STATE;
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
                                VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_PARAM_ERROR;
@@ -203,9 +149,8 @@ static void spapr_phb_vfio_eeh_pre_reset(sPAPRPHBState *sphb)
        pci_for_each_bus(phb->bus, spapr_phb_vfio_eeh_clear_bus_msix, NULL);
 }
 
-static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
+int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
 {
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
     int ret;
 
@@ -225,7 +170,7 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
         return RTAS_OUT_PARAM_ERROR;
     }
 
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
                                VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_HW_ERROR;
@@ -234,14 +179,13 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
     return RTAS_OUT_SUCCESS;
 }
 
-static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
+int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
 {
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
     int ret;
 
     op.op = VFIO_EEH_PE_CONFIGURE;
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
                                VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_PARAM_ERROR;
@@ -253,23 +197,14 @@ static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
 static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
-    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
 
     dc->props = spapr_phb_vfio_properties;
-    dc->reset = spapr_phb_vfio_reset;
-    spc->finish_realize = spapr_phb_vfio_finish_realize;
-    spc->eeh_set_option = spapr_phb_vfio_eeh_set_option;
-    spc->eeh_get_state = spapr_phb_vfio_eeh_get_state;
-    spc->eeh_reset = spapr_phb_vfio_eeh_reset;
-    spc->eeh_configure = spapr_phb_vfio_eeh_configure;
 }
 
 static const TypeInfo spapr_phb_vfio_info = {
     .name          = TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE,
     .parent        = TYPE_SPAPR_PCI_HOST_BRIDGE,
-    .instance_size = sizeof(sPAPRPHBVFIOState),
     .class_init    = spapr_phb_vfio_class_init,
-    .class_size    = sizeof(sPAPRPHBClass),
 };
 
 static void spapr_pci_vfio_register_types(void)
@@ -278,3 +213,36 @@ static void spapr_pci_vfio_register_types(void)
 }
 
 type_init(spapr_pci_vfio_register_types)
+
+#else /* !CONFIG_LINUX */
+
+int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
+{
+    return -EINVAL;
+}
+
+int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
+                                  PCIDevice *pdev, int option)
+{
+    return RTAS_OUT_HW_ERROR;
+}
+
+int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
+{
+    return RTAS_OUT_HW_ERROR;
+}
+
+int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
+{
+    return RTAS_OUT_HW_ERROR;
+}
+
+int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
+{
+    return RTAS_OUT_HW_ERROR;
+}
+
+void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
+{
+}
+#endif
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 5322b56..7f3c712 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -47,12 +47,6 @@ typedef struct sPAPRPHBVFIOState sPAPRPHBVFIOState;
 
 struct sPAPRPHBClass {
     PCIHostBridgeClass parent_class;
-
-    void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
-    int (*eeh_set_option)(sPAPRPHBState *sphb, unsigned int addr, int option);
-    int (*eeh_get_state)(sPAPRPHBState *sphb, int *state);
-    int (*eeh_reset)(sPAPRPHBState *sphb, int option);
-    int (*eeh_configure)(sPAPRPHBState *sphb);
 };
 
 typedef struct spapr_pci_msi {
@@ -91,12 +85,11 @@ struct sPAPRPHBState {
     spapr_pci_msi_mig *msi_devs;
 
     QLIST_ENTRY(sPAPRPHBState) list;
-};
 
-struct sPAPRPHBVFIOState {
-    sPAPRPHBState phb;
-
-    int32_t iommugroupid;
+    int32_t iommugroupid; /* obsolete */
+    uint32_t dma32_window_start;
+    uint32_t dma32_window_size;
+    unsigned vfio_num;
 };
 
 #define SPAPR_PCI_MAX_INDEX          255
@@ -138,4 +131,12 @@ sPAPRPHBState *spapr_pci_find_phb(sPAPRMachineState *spapr, uint64_t buid);
 PCIDevice *spapr_pci_find_dev(sPAPRMachineState *spapr, uint64_t buid,
                               uint32_t config_addr);
 
+int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb);
+int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
+                                  PCIDevice *pdev, int option);
+int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state);
+int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option);
+int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb);
+void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb);
+
 #endif /* __HW_SPAPR_PCI_H__ */
-- 
2.4.0.rc3.8.gfb3e7d5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH qemu 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
  2015-09-02  8:16 [Qemu-devel] [PATCH qemu 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
  2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
@ 2015-09-02  8:16 ` Alexey Kardashevskiy
  1 sibling, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-02  8:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Gavin Shan,
	David Gibson

So far there were 2 limitations enforced on an emulated PHB
regarding VFIO:
1) only one IOMMU group per IOMMU container was allowed and
the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
2) only one IOMMU container per PHB was allowed as a group
can only be attached to one container.

However these are not really necessary so we are removing them here.

This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
not to receive it. Instead of getting a container from a group ID,
this calls ioctl() for every container attached to the PHB address space.
This allows multiple containers on the same PHB which means multiple
groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
to be passed to every container separately which will affect
the performance. For 32bit devices it is still recommended to put
every group to a separate PHB.

Since the existing VFIO code is already trying to share a container for
multiple groups, just removing a group id from
the vfio_container_do_ioctl() allows the existing code to share
a container if it is supported by the host kernel.

This replaces the check for a group id to be set correctly with
the check that it is not set.

This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
members is accessed here.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_pci.c      | 10 +++++-----
 hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
 hw/vfio/common.c        | 22 ++++++----------------
 include/hw/vfio/vfio.h  |  3 +--
 4 files changed, 18 insertions(+), 34 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 12f841f..ec20e23 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
     pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
 
     if (sphb->vfio_num) {
-        if (sphb->iommugroupid == -1) {
-            error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
-            return -1;
-        }
-
         ret = spapr_phb_vfio_dma_capabilities_update(sphb);
         if (ret) {
             error_report("Unable to get DMA32 info from VFIO");
@@ -1237,6 +1232,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     PCIBus *bus;
     uint64_t msi_window_size = 4096;
 
+    if ((sphb->iommugroupid != -1) &&
+        object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
+        error_report("Warning: iommugroupid is deprecated and will be ignored");
+    }
+
     if (sphb->index != (uint32_t)-1) {
         hwaddr windows_base;
 
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index f94d8a4..48137d5 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
     struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
     int ret;
 
-    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as,
                                VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
     if (ret) {
         return ret;
@@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
         .op    = VFIO_EEH_PE_ENABLE
     };
 
-    vfio_container_ioctl(&sphb->iommu_as,
-                         sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
+    vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
 }
 
 int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
@@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
         return RTAS_OUT_PARAM_ERROR;
     }
 
-    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
-                               VFIO_EEH_PE_OP, &op);
+    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_HW_ERROR;
     }
@@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
     int ret;
 
     op.op = VFIO_EEH_PE_GET_STATE;
-    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
-                               VFIO_EEH_PE_OP, &op);
+    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_PARAM_ERROR;
     }
@@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
         return RTAS_OUT_PARAM_ERROR;
     }
 
-    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
-                               VFIO_EEH_PE_OP, &op);
+    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_HW_ERROR;
     }
@@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
     int ret;
 
     op.op = VFIO_EEH_PE_CONFIGURE;
-    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
-                               VFIO_EEH_PE_OP, &op);
+    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_PARAM_ERROR;
     }
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85ee9b0..a00644e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
     close(vbasedev->fd);
 }
 
-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
-                                   int req, void *param)
+static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
 {
-    VFIOGroup *group;
     VFIOContainer *container;
     int ret = -1;
+    VFIOAddressSpace *space = vfio_get_address_space(as);
 
-    group = vfio_get_group(groupid, as);
-    if (!group) {
-        error_report("vfio: group %d not registered", groupid);
-        return ret;
-    }
-
-    container = group->container;
-    if (group->container) {
+    QLIST_FOREACH(container, &space->containers, next) {
         ret = ioctl(container->fd, req, param);
         if (ret < 0) {
             error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
                          _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
+            return -errno;
         }
     }
 
-    vfio_put_group(group);
-
     return ret;
 }
 
-int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
-                         int req, void *param)
+int vfio_container_ioctl(AddressSpace *as, int req, void *param)
 {
     /* We allow only certain ioctls to the container */
     switch (req) {
@@ -968,5 +958,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
         return -1;
     }
 
-    return vfio_container_do_ioctl(as, groupid, req, param);
+    return vfio_container_do_ioctl(as, req, param);
 }
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 0b26cd8..e076c14 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -3,7 +3,6 @@
 
 #include "qemu/typedefs.h"
 
-extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
-                                int req, void *param);
+extern int vfio_container_ioctl(AddressSpace *as, int req, void *param);
 
 #endif
-- 
2.4.0.rc3.8.gfb3e7d5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
@ 2015-09-02  9:44   ` Alexey Kardashevskiy
  2015-09-03  2:07     ` David Gibson
  2015-09-03  2:05   ` David Gibson
  1 sibling, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-02  9:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, Alexander Graf, Gavin Shan, David Gibson

On 09/02/2015 06:16 PM, Alexey Kardashevskiy wrote:
> sPAPRTCETable is handling 2 TCE tables already:
>
> 1) guest view of the TCE table - emulated devices use only this table;
>
> 2) hardware IOMMU table - VFIO PCI devices use it for actual work but
> it does not replace 1) and it is not visible to the guest.
> The initialization of this table is driven by vfio-pci device,
> DMA map/unmap requests are handled via MemoryListener so there is very
> little to do in spapr-pci-vfio-host-bridge.
>
> This moves VFIO bits to the generic spapr-pci-host-bridge which allows
> putting emulated and VFIO devices on the same PHB. It is still possible
> to create multiple PHBs and avoid sharing PHB resouces for emulated and
> VFIO devices.
>
> If there is no VFIO-PCI device attaches, no special ioctls will be called.
> If there are some VFIO-PCI devices attached, PHB may refuse to attach
> another VFIO-PCI device if a VFIO container on the host kernel side
> does not support container sharing.
>
> This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
> except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
> still should be used for VFIO devices. The next patch will remove IOMMU ID
> property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.
>
> This adds a number of VFIO-PCI devices currently attached to a PHB as
> PHB needs to know whether to do DMA setup for VFIO or not. Since
> at the moment of the PHB's realize() invocation we cannot tell yet
> how many VFIO-PCI devices are there (they are not attached yet),
> this moves DMA setup to the reset handler.
>
> This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
> rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
> and decide whether to call spapr_phb_vfio_eeh_set_option() or not.
>
> This should cause no behavioural change.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   hw/ppc/Makefile.objs        |   5 +-
>   hw/ppc/spapr_pci.c          | 170 +++++++++++++++++++++++++-------------------
>   hw/ppc/spapr_pci_vfio.c     | 144 +++++++++++++++----------------------
>   include/hw/pci-host/spapr.h |  23 +++---
>   4 files changed, 164 insertions(+), 178 deletions(-)
>
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index c8ab06e..6c06fcf 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -3,10 +3,7 @@ obj-y += ppc.o ppc_booke.o
>   # IBM pSeries (sPAPR)
>   obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>   obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> -obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
> -ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> -obj-y += spapr_pci_vfio.o
> -endif
> +obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_pci_vfio.o spapr_rtc.o spapr_drc.o
>   # PowerPC 4xx boards
>   obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>   obj-y += ppc4xx_pci.o
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index bc30631..12f841f 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -430,7 +430,6 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
>                                       target_ulong rets)
>   {
>       sPAPRPHBState *sphb;
> -    sPAPRPHBClass *spc;
>       PCIDevice *pdev;
>       uint32_t addr, option;
>       uint64_t buid;
> @@ -445,7 +444,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
>       option = rtas_ld(args, 3);
>
>       sphb = spapr_pci_find_phb(spapr, buid);
> -    if (!sphb) {
> +    if (!sphb || (sphb->vfio_num == 0)) {
>           goto param_error_exit;
>       }
>
> @@ -455,12 +454,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
>           goto param_error_exit;
>       }
>
> -    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> -    if (!spc->eeh_set_option) {
> -        goto param_error_exit;
> -    }
> -
> -    ret = spc->eeh_set_option(sphb, addr, option);
> +    ret = spapr_phb_vfio_eeh_set_option(sphb, pdev, option);
>       rtas_st(rets, 0, ret);
>       return;
>
> @@ -475,7 +469,6 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
>                                              target_ulong rets)
>   {
>       sPAPRPHBState *sphb;
> -    sPAPRPHBClass *spc;
>       PCIDevice *pdev;
>       uint32_t addr, option;
>       uint64_t buid;
> @@ -486,12 +479,7 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
>
>       buid = rtas_ldq(args, 1);
>       sphb = spapr_pci_find_phb(spapr, buid);
> -    if (!sphb) {
> -        goto param_error_exit;
> -    }
> -
> -    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> -    if (!spc->eeh_set_option) {
> +    if (!sphb || (sphb->vfio_num == 0)) {
>           goto param_error_exit;
>       }
>
> @@ -531,7 +519,6 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
>                                               target_ulong rets)
>   {
>       sPAPRPHBState *sphb;
> -    sPAPRPHBClass *spc;
>       uint64_t buid;
>       int state, ret;
>
> @@ -541,16 +528,11 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
>
>       buid = rtas_ldq(args, 1);
>       sphb = spapr_pci_find_phb(spapr, buid);
> -    if (!sphb) {
> +    if (!sphb || (sphb->vfio_num == 0)) {
>           goto param_error_exit;
>       }
>
> -    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> -    if (!spc->eeh_get_state) {
> -        goto param_error_exit;
> -    }
> -
> -    ret = spc->eeh_get_state(sphb, &state);
> +    ret = spapr_phb_vfio_eeh_get_state(sphb, &state);
>       rtas_st(rets, 0, ret);
>       if (ret != RTAS_OUT_SUCCESS) {
>           return;
> @@ -575,7 +557,6 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu,
>                                       target_ulong rets)
>   {
>       sPAPRPHBState *sphb;
> -    sPAPRPHBClass *spc;
>       uint32_t option;
>       uint64_t buid;
>       int ret;
> @@ -587,16 +568,11 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu,
>       buid = rtas_ldq(args, 1);
>       option = rtas_ld(args, 3);
>       sphb = spapr_pci_find_phb(spapr, buid);
> -    if (!sphb) {
> +    if (!sphb || (sphb->vfio_num == 0)) {
>           goto param_error_exit;
>       }
>
> -    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> -    if (!spc->eeh_reset) {
> -        goto param_error_exit;
> -    }
> -
> -    ret = spc->eeh_reset(sphb, option);
> +    ret = spapr_phb_vfio_eeh_reset(sphb, option);
>       rtas_st(rets, 0, ret);
>       return;
>
> @@ -611,7 +587,6 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu,
>                                     target_ulong rets)
>   {
>       sPAPRPHBState *sphb;
> -    sPAPRPHBClass *spc;
>       uint64_t buid;
>       int ret;
>
> @@ -621,16 +596,11 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu,
>
>       buid = rtas_ldq(args, 1);
>       sphb = spapr_pci_find_phb(spapr, buid);
> -    if (!sphb) {
> +    if (!sphb || (sphb->vfio_num == 0)) {
>           goto param_error_exit;
>       }
>
> -    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> -    if (!spc->eeh_configure) {
> -        goto param_error_exit;
> -    }
> -
> -    ret = spc->eeh_configure(sphb);
> +    ret = spapr_phb_vfio_eeh_configure(sphb);
>       rtas_st(rets, 0, ret);
>       return;
>
> @@ -646,7 +616,6 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu,
>                                          target_ulong rets)
>   {
>       sPAPRPHBState *sphb;
> -    sPAPRPHBClass *spc;
>       int option;
>       uint64_t buid;
>
> @@ -656,12 +625,7 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu,
>
>       buid = rtas_ldq(args, 1);
>       sphb = spapr_pci_find_phb(spapr, buid);
> -    if (!sphb) {
> -        goto param_error_exit;
> -    }
> -
> -    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> -    if (!spc->eeh_set_option) {
> +    if (!sphb || (sphb->vfio_num == 0)) {
>           goto param_error_exit;
>       }
>
> @@ -810,6 +774,57 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev)
>       return buf;
>   }
>
> +static void spapr_phb_walk_devices(PCIBus *bus, PCIDevice *pdev, void *opaque)
> +{
> +    sPAPRPHBState *sphb = opaque;
> +    PCIBus *sec_bus;
> +
> +    if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> +        ++sphb->vfio_num;
> +    }
> +
> +    if ((pci_default_read_config(pdev, PCI_HEADER_TYPE, 1) !=
> +         PCI_HEADER_TYPE_BRIDGE)) {
> +        return;
> +    }
> +
> +    sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
> +    if (!sec_bus) {
> +        return;
> +    }
> +
> +    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> +                        spapr_phb_walk_devices, sphb);
> +}
> +
> +static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> +{
> +    int ret;
> +    PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
> +
> +    sphb->dma32_window_start = 0;
> +    sphb->dma32_window_size = SPAPR_PCI_DMA32_SIZE;
> +
> +    /* Update the number of VFIO-PCI devices on the PHB */
> +    sphb->vfio_num = 0;
> +    pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
> +
> +    if (sphb->vfio_num) {
> +        if (sphb->iommugroupid == -1) {
> +            error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> +            return -1;
> +        }
> +
> +        ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> +        if (ret) {
> +            error_report("Unable to get DMA32 info from VFIO");
> +            return ret;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>   /* Macros to operate with address in OF binding to PCI */
>   #define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
>   #define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> @@ -1217,7 +1232,6 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>       SysBusDevice *s = SYS_BUS_DEVICE(dev);
>       sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>       PCIHostState *phb = PCI_HOST_BRIDGE(s);
> -    sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
>       char *namebuf;
>       int i;
>       PCIBus *bus;
> @@ -1371,35 +1385,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>           }
>       }
>
> -    if (!info->finish_realize) {
> -        error_setg(errp, "finish_realize not defined");
> -        return;
> -    }
> -
> -    info->finish_realize(sphb, errp);
> -
>       sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
>   }
>
> -static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
> -{
> -    sPAPRTCETable *tcet;
> -    uint32_t nb_table;
> -
> -    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
> -    tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> -                               0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
> -    if (!tcet) {
> -        error_setg(errp, "Unable to create TCE table for %s",
> -                   sphb->dtbusname);
> -        return ;
> -    }
> -
> -    /* Register default 32bit DMA window */
> -    memory_region_add_subregion(&sphb->iommu_root, 0,
> -                                spapr_tce_get_iommu(tcet));
> -}
> -
>   static int spapr_phb_children_reset(Object *child, void *opaque)
>   {
>       DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
> @@ -1413,8 +1401,42 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
>
>   static void spapr_phb_reset(DeviceState *qdev)
>   {
> +    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
> +    sPAPRTCETable *tcet;
> +
>       /* Reset the IOMMU state */
>       object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
> +
> +    if (spapr_phb_dma_capabilities_update(sphb)) {
> +        return;
> +    }
> +
> +    /* Register default 32bit DMA window */
> +    tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
> +    if (!tcet) {
> +        const unsigned nb = sphb->dma32_window_size >> SPAPR_TCE_PAGE_SHIFT;
> +        tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> +                                   sphb->dma32_window_start,
> +                                   SPAPR_TCE_PAGE_SHIFT, nb,
> +                                   sphb->vfio_num > 0);
> +        if (!tcet) {
> +            error_report("No default TCE table for %s", sphb->dtbusname);
> +            return;
> +        }
> +
> +        memory_region_add_subregion(&sphb->iommu_root, 0,


Of course bug... should be bus_offset instead of 0. I'll wait a bit for 
more reviews and respin later.



> +                                    spapr_tce_get_iommu(tcet));
> +    }
> +
> +    if (sphb->vfio_num) {
> +        /*
> +         * The PE might be in frozen state. To reenable the EEH
> +         * functionality on it will clean the frozen state, which
> +         * ensures that the contained PCI devices will work properly
> +         * after reboot.
> +         */
> +        spapr_phb_vfio_eeh_reenable(SPAPR_PCI_HOST_BRIDGE(qdev));
> +    }
>   }
>
>   static Property spapr_phb_properties[] = {
> @@ -1535,7 +1557,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>   {
>       PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
>       DeviceClass *dc = DEVICE_CLASS(klass);
> -    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
>       HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
>
>       hc->root_bus_path = spapr_phb_root_bus_path;
> @@ -1545,7 +1566,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>       dc->vmsd = &vmstate_spapr_pci;
>       set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>       dc->cannot_instantiate_with_device_add_yet = false;
> -    spc->finish_realize = spapr_phb_finish_realize;
>       hp->plug = spapr_phb_hot_plug_child;
>       hp->unplug = spapr_phb_hot_unplug_child;
>   }
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index cca45ed..f94d8a4 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -20,84 +20,47 @@
>   #include "hw/ppc/spapr.h"
>   #include "hw/pci-host/spapr.h"
>   #include "hw/pci/msix.h"
> -#include "linux/vfio.h"
>   #include "hw/vfio/vfio.h"
>
> +#ifdef CONFIG_LINUX
> +#include "linux/vfio.h"
> +
>   static Property spapr_phb_vfio_properties[] = {
> -    DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
> +    DEFINE_PROP_INT32("iommu", sPAPRPHBState, iommugroupid, -1),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>
> -static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
> +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
>   {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>       struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>       int ret;
> -    sPAPRTCETable *tcet;
> -    uint32_t liobn = svphb->phb.dma_liobn;
>
> -    if (svphb->iommugroupid == -1) {
> -        error_setg(errp, "Wrong IOMMU group ID %d", svphb->iommugroupid);
> -        return;
> -    }
> -
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> -                               VFIO_CHECK_EXTENSION,
> -                               (void *) VFIO_SPAPR_TCE_IOMMU);
> -    if (ret != 1) {
> -        error_setg_errno(errp, -ret,
> -                         "spapr-vfio: SPAPR extension is not supported");
> -        return;
> -    }
> -
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                  VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
>       if (ret) {
> -        error_setg_errno(errp, -ret,
> -                         "spapr-vfio: get info from container failed");
> -        return;
> +        return ret;
>       }
>
> -    tcet = spapr_tce_new_table(DEVICE(sphb), liobn, info.dma32_window_start,
> -                               SPAPR_TCE_PAGE_SHIFT,
> -                               info.dma32_window_size >> SPAPR_TCE_PAGE_SHIFT,
> -                               true);
> -    if (!tcet) {
> -        error_setg(errp, "spapr-vfio: failed to create VFIO TCE table");
> -        return;
> -    }
> +    sphb->dma32_window_start = info.dma32_window_start;
> +    sphb->dma32_window_size = info.dma32_window_size;
>
> -    /* Register default 32bit DMA window */
> -    memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
> -                                spapr_tce_get_iommu(tcet));
> +    return ret;
>   }
>
> -static void spapr_phb_vfio_eeh_reenable(sPAPRPHBVFIOState *svphb)
> +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
>   {
>       struct vfio_eeh_pe_op op = {
>           .argsz = sizeof(op),
>           .op    = VFIO_EEH_PE_ENABLE
>       };
>
> -    vfio_container_ioctl(&svphb->phb.iommu_as,
> -                         svphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> +    vfio_container_ioctl(&sphb->iommu_as,
> +                         sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
>   }
>
> -static void spapr_phb_vfio_reset(DeviceState *qdev)
> +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> +                                  PCIDevice *pdev, int option)
>   {
> -    /*
> -     * The PE might be in frozen state. To reenable the EEH
> -     * functionality on it will clean the frozen state, which
> -     * ensures that the contained PCI devices will work properly
> -     * after reboot.
> -     */
> -    spapr_phb_vfio_eeh_reenable(SPAPR_PCI_VFIO_HOST_BRIDGE(qdev));
> -}
> -
> -static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> -                                         unsigned int addr, int option)
> -{
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>       struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>       int ret;
>
> @@ -105,25 +68,9 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>       case RTAS_EEH_DISABLE:
>           op.op = VFIO_EEH_PE_DISABLE;
>           break;
> -    case RTAS_EEH_ENABLE: {
> -        PCIHostState *phb;
> -        PCIDevice *pdev;
> -
> -        /*
> -         * The EEH functionality is enabled on basis of PCI device,
> -         * instead of PE. We need check the validity of the PCI
> -         * device address.
> -         */
> -        phb = PCI_HOST_BRIDGE(sphb);
> -        pdev = pci_find_device(phb->bus,
> -                               (addr >> 16) & 0xFF, (addr >> 8) & 0xFF);
> -        if (!pdev) {
> -            return RTAS_OUT_PARAM_ERROR;
> -        }
> -
> +    case RTAS_EEH_ENABLE:
>           op.op = VFIO_EEH_PE_ENABLE;
>           break;
> -    }
>       case RTAS_EEH_THAW_IO:
>           op.op = VFIO_EEH_PE_UNFREEZE_IO;
>           break;
> @@ -134,7 +81,7 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>           return RTAS_OUT_PARAM_ERROR;
>       }
>
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                  VFIO_EEH_PE_OP, &op);
>       if (ret < 0) {
>           return RTAS_OUT_HW_ERROR;
> @@ -143,14 +90,13 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>       return RTAS_OUT_SUCCESS;
>   }
>
> -static int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
>   {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>       struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>       int ret;
>
>       op.op = VFIO_EEH_PE_GET_STATE;
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                  VFIO_EEH_PE_OP, &op);
>       if (ret < 0) {
>           return RTAS_OUT_PARAM_ERROR;
> @@ -203,9 +149,8 @@ static void spapr_phb_vfio_eeh_pre_reset(sPAPRPHBState *sphb)
>          pci_for_each_bus(phb->bus, spapr_phb_vfio_eeh_clear_bus_msix, NULL);
>   }
>
> -static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>   {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>       struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>       int ret;
>
> @@ -225,7 +170,7 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>           return RTAS_OUT_PARAM_ERROR;
>       }
>
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                  VFIO_EEH_PE_OP, &op);
>       if (ret < 0) {
>           return RTAS_OUT_HW_ERROR;
> @@ -234,14 +179,13 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>       return RTAS_OUT_SUCCESS;
>   }
>
> -static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>   {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>       struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>       int ret;
>
>       op.op = VFIO_EEH_PE_CONFIGURE;
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                  VFIO_EEH_PE_OP, &op);
>       if (ret < 0) {
>           return RTAS_OUT_PARAM_ERROR;
> @@ -253,23 +197,14 @@ static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>   static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
>   {
>       DeviceClass *dc = DEVICE_CLASS(klass);
> -    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
>
>       dc->props = spapr_phb_vfio_properties;
> -    dc->reset = spapr_phb_vfio_reset;
> -    spc->finish_realize = spapr_phb_vfio_finish_realize;
> -    spc->eeh_set_option = spapr_phb_vfio_eeh_set_option;
> -    spc->eeh_get_state = spapr_phb_vfio_eeh_get_state;
> -    spc->eeh_reset = spapr_phb_vfio_eeh_reset;
> -    spc->eeh_configure = spapr_phb_vfio_eeh_configure;
>   }
>
>   static const TypeInfo spapr_phb_vfio_info = {
>       .name          = TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE,
>       .parent        = TYPE_SPAPR_PCI_HOST_BRIDGE,
> -    .instance_size = sizeof(sPAPRPHBVFIOState),
>       .class_init    = spapr_phb_vfio_class_init,
> -    .class_size    = sizeof(sPAPRPHBClass),
>   };
>
>   static void spapr_pci_vfio_register_types(void)
> @@ -278,3 +213,36 @@ static void spapr_pci_vfio_register_types(void)
>   }
>
>   type_init(spapr_pci_vfio_register_types)
> +
> +#else /* !CONFIG_LINUX */
> +
> +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> +{
> +    return -EINVAL;
> +}
> +
> +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> +                                  PCIDevice *pdev, int option)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> +{
> +}
> +#endif
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 5322b56..7f3c712 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -47,12 +47,6 @@ typedef struct sPAPRPHBVFIOState sPAPRPHBVFIOState;
>
>   struct sPAPRPHBClass {
>       PCIHostBridgeClass parent_class;
> -
> -    void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
> -    int (*eeh_set_option)(sPAPRPHBState *sphb, unsigned int addr, int option);
> -    int (*eeh_get_state)(sPAPRPHBState *sphb, int *state);
> -    int (*eeh_reset)(sPAPRPHBState *sphb, int option);
> -    int (*eeh_configure)(sPAPRPHBState *sphb);
>   };
>
>   typedef struct spapr_pci_msi {
> @@ -91,12 +85,11 @@ struct sPAPRPHBState {
>       spapr_pci_msi_mig *msi_devs;
>
>       QLIST_ENTRY(sPAPRPHBState) list;
> -};
>
> -struct sPAPRPHBVFIOState {
> -    sPAPRPHBState phb;
> -
> -    int32_t iommugroupid;
> +    int32_t iommugroupid; /* obsolete */
> +    uint32_t dma32_window_start;
> +    uint32_t dma32_window_size;
> +    unsigned vfio_num;
>   };
>
>   #define SPAPR_PCI_MAX_INDEX          255
> @@ -138,4 +131,12 @@ sPAPRPHBState *spapr_pci_find_phb(sPAPRMachineState *spapr, uint64_t buid);
>   PCIDevice *spapr_pci_find_dev(sPAPRMachineState *spapr, uint64_t buid,
>                                 uint32_t config_addr);
>
> +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb);
> +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> +                                  PCIDevice *pdev, int option);
> +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state);
> +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option);
> +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb);
> +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb);
> +
>   #endif /* __HW_SPAPR_PCI_H__ */
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
  2015-09-02  9:44   ` Alexey Kardashevskiy
@ 2015-09-03  2:05   ` David Gibson
  2015-09-03  3:19     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 9+ messages in thread
From: David Gibson @ 2015-09-03  2:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 16633 bytes --]

On Wed, Sep 02, 2015 at 06:16:02PM +1000, Alexey Kardashevskiy wrote:
> sPAPRTCETable is handling 2 TCE tables already:
> 
> 1) guest view of the TCE table - emulated devices use only this table;
> 
> 2) hardware IOMMU table - VFIO PCI devices use it for actual work but
> it does not replace 1) and it is not visible to the guest.
> The initialization of this table is driven by vfio-pci device,
> DMA map/unmap requests are handled via MemoryListener so there is very
> little to do in spapr-pci-vfio-host-bridge.
> 
> This moves VFIO bits to the generic spapr-pci-host-bridge which allows
> putting emulated and VFIO devices on the same PHB. It is still possible
> to create multiple PHBs and avoid sharing PHB resouces for emulated and
> VFIO devices.
> 
> If there is no VFIO-PCI device attaches, no special ioctls will be called.
> If there are some VFIO-PCI devices attached, PHB may refuse to attach
> another VFIO-PCI device if a VFIO container on the host kernel side
> does not support container sharing.
> 
> This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
> except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
> still should be used for VFIO devices. The next patch will remove IOMMU ID
> property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.
> 
> This adds a number of VFIO-PCI devices currently attached to a PHB as
> PHB needs to know whether to do DMA setup for VFIO or not. Since
> at the moment of the PHB's realize() invocation we cannot tell yet
> how many VFIO-PCI devices are there (they are not attached yet),
> this moves DMA setup to the reset handler.
> 
> This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
> rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
> and decide whether to call spapr_phb_vfio_eeh_set_option() or not.
> 
> This should cause no behavioural change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

[snip]
>  static int spapr_phb_children_reset(Object *child, void *opaque)
>  {
>      DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
> @@ -1413,8 +1401,42 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
>  
>  static void spapr_phb_reset(DeviceState *qdev)
>  {
> +    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
> +    sPAPRTCETable *tcet;
> +
>      /* Reset the IOMMU state */
>      object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
> +
> +    if (spapr_phb_dma_capabilities_update(sphb)) {
> +        return;
> +    }
> +
> +    /* Register default 32bit DMA window */
> +    tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
> +    if (!tcet) {
> +        const unsigned nb = sphb->dma32_window_size >> SPAPR_TCE_PAGE_SHIFT;
> +        tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> +                                   sphb->dma32_window_start,
> +                                   SPAPR_TCE_PAGE_SHIFT, nb,
> +                                   sphb->vfio_num > 0);

Could delaying the construction of the TCE table object until reset
time cause problems with migration?  i.e. can you be sure that the
destination will have the TCE table object present and in a suitable
state to accept the incoming table information from the source?

> +        if (!tcet) {
> +            error_report("No default TCE table for %s", sphb->dtbusname);
> +            return;
> +        }
> +
> +        memory_region_add_subregion(&sphb->iommu_root, 0,
> +                                    spapr_tce_get_iommu(tcet));
> +    }
> +
> +    if (sphb->vfio_num) {
> +        /*
> +         * The PE might be in frozen state. To reenable the EEH
> +         * functionality on it will clean the frozen state, which
> +         * ensures that the contained PCI devices will work properly
> +         * after reboot.
> +         */
> +        spapr_phb_vfio_eeh_reenable(SPAPR_PCI_HOST_BRIDGE(qdev));
> +    }
>  }
>  
>  static Property spapr_phb_properties[] = {
> @@ -1535,7 +1557,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>  {
>      PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
>      DeviceClass *dc = DEVICE_CLASS(klass);
> -    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
>      HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
>  
>      hc->root_bus_path = spapr_phb_root_bus_path;
> @@ -1545,7 +1566,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      dc->vmsd = &vmstate_spapr_pci;
>      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>      dc->cannot_instantiate_with_device_add_yet = false;
> -    spc->finish_realize = spapr_phb_finish_realize;
>      hp->plug = spapr_phb_hot_plug_child;
>      hp->unplug = spapr_phb_hot_unplug_child;
>  }
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index cca45ed..f94d8a4 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -20,84 +20,47 @@
>  #include "hw/ppc/spapr.h"
>  #include "hw/pci-host/spapr.h"
>  #include "hw/pci/msix.h"
> -#include "linux/vfio.h"
>  #include "hw/vfio/vfio.h"
>  
> +#ifdef CONFIG_LINUX
> +#include "linux/vfio.h"
> +
>  static Property spapr_phb_vfio_properties[] = {
> -    DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
> +    DEFINE_PROP_INT32("iommu", sPAPRPHBState, iommugroupid, -1),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> -static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
> +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
>  {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>      int ret;
> -    sPAPRTCETable *tcet;
> -    uint32_t liobn = svphb->phb.dma_liobn;
>  
> -    if (svphb->iommugroupid == -1) {
> -        error_setg(errp, "Wrong IOMMU group ID %d", svphb->iommugroupid);
> -        return;
> -    }
> -
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> -                               VFIO_CHECK_EXTENSION,
> -                               (void *) VFIO_SPAPR_TCE_IOMMU);
> -    if (ret != 1) {
> -        error_setg_errno(errp, -ret,
> -                         "spapr-vfio: SPAPR extension is not supported");
> -        return;
> -    }
> -
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                 VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
>      if (ret) {
> -        error_setg_errno(errp, -ret,
> -                         "spapr-vfio: get info from container failed");
> -        return;
> +        return ret;
>      }
>  
> -    tcet = spapr_tce_new_table(DEVICE(sphb), liobn, info.dma32_window_start,
> -                               SPAPR_TCE_PAGE_SHIFT,
> -                               info.dma32_window_size >> SPAPR_TCE_PAGE_SHIFT,
> -                               true);
> -    if (!tcet) {
> -        error_setg(errp, "spapr-vfio: failed to create VFIO TCE table");
> -        return;
> -    }
> +    sphb->dma32_window_start = info.dma32_window_start;
> +    sphb->dma32_window_size = info.dma32_window_size;
>  
> -    /* Register default 32bit DMA window */
> -    memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
> -                                spapr_tce_get_iommu(tcet));
> +    return ret;
>  }
>  
> -static void spapr_phb_vfio_eeh_reenable(sPAPRPHBVFIOState *svphb)
> +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
>  {
>      struct vfio_eeh_pe_op op = {
>          .argsz = sizeof(op),
>          .op    = VFIO_EEH_PE_ENABLE
>      };
>  
> -    vfio_container_ioctl(&svphb->phb.iommu_as,
> -                         svphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> +    vfio_container_ioctl(&sphb->iommu_as,
> +                         sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
>  }
>  
> -static void spapr_phb_vfio_reset(DeviceState *qdev)
> +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> +                                  PCIDevice *pdev, int option)
>  {
> -    /*
> -     * The PE might be in frozen state. To reenable the EEH
> -     * functionality on it will clean the frozen state, which
> -     * ensures that the contained PCI devices will work properly
> -     * after reboot.
> -     */
> -    spapr_phb_vfio_eeh_reenable(SPAPR_PCI_VFIO_HOST_BRIDGE(qdev));
> -}
> -
> -static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> -                                         unsigned int addr, int option)
> -{
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>      int ret;
>  
> @@ -105,25 +68,9 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>      case RTAS_EEH_DISABLE:
>          op.op = VFIO_EEH_PE_DISABLE;
>          break;
> -    case RTAS_EEH_ENABLE: {
> -        PCIHostState *phb;
> -        PCIDevice *pdev;
> -
> -        /*
> -         * The EEH functionality is enabled on basis of PCI device,
> -         * instead of PE. We need check the validity of the PCI
> -         * device address.
> -         */
> -        phb = PCI_HOST_BRIDGE(sphb);
> -        pdev = pci_find_device(phb->bus,
> -                               (addr >> 16) & 0xFF, (addr >> 8) & 0xFF);
> -        if (!pdev) {
> -            return RTAS_OUT_PARAM_ERROR;
> -        }
> -
> +    case RTAS_EEH_ENABLE:
>          op.op = VFIO_EEH_PE_ENABLE;
>          break;
> -    }
>      case RTAS_EEH_THAW_IO:
>          op.op = VFIO_EEH_PE_UNFREEZE_IO;
>          break;
> @@ -134,7 +81,7 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>          return RTAS_OUT_PARAM_ERROR;
>      }
>  
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                 VFIO_EEH_PE_OP, &op);
>      if (ret < 0) {
>          return RTAS_OUT_HW_ERROR;
> @@ -143,14 +90,13 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>      return RTAS_OUT_SUCCESS;
>  }
>  
> -static int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
>  {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>      int ret;
>  
>      op.op = VFIO_EEH_PE_GET_STATE;
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                 VFIO_EEH_PE_OP, &op);
>      if (ret < 0) {
>          return RTAS_OUT_PARAM_ERROR;
> @@ -203,9 +149,8 @@ static void spapr_phb_vfio_eeh_pre_reset(sPAPRPHBState *sphb)
>         pci_for_each_bus(phb->bus, spapr_phb_vfio_eeh_clear_bus_msix, NULL);
>  }
>  
> -static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>  {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>      int ret;
>  
> @@ -225,7 +170,7 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>          return RTAS_OUT_PARAM_ERROR;
>      }
>  
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                 VFIO_EEH_PE_OP, &op);
>      if (ret < 0) {
>          return RTAS_OUT_HW_ERROR;
> @@ -234,14 +179,13 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>      return RTAS_OUT_SUCCESS;
>  }
>  
> -static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>  {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
>      int ret;
>  
>      op.op = VFIO_EEH_PE_CONFIGURE;
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>                                 VFIO_EEH_PE_OP, &op);
>      if (ret < 0) {
>          return RTAS_OUT_PARAM_ERROR;
> @@ -253,23 +197,14 @@ static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>  static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> -    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
>  
>      dc->props = spapr_phb_vfio_properties;
> -    dc->reset = spapr_phb_vfio_reset;
> -    spc->finish_realize = spapr_phb_vfio_finish_realize;
> -    spc->eeh_set_option = spapr_phb_vfio_eeh_set_option;
> -    spc->eeh_get_state = spapr_phb_vfio_eeh_get_state;
> -    spc->eeh_reset = spapr_phb_vfio_eeh_reset;
> -    spc->eeh_configure = spapr_phb_vfio_eeh_configure;
>  }
>  
>  static const TypeInfo spapr_phb_vfio_info = {
>      .name          = TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE,
>      .parent        = TYPE_SPAPR_PCI_HOST_BRIDGE,
> -    .instance_size = sizeof(sPAPRPHBVFIOState),
>      .class_init    = spapr_phb_vfio_class_init,
> -    .class_size    = sizeof(sPAPRPHBClass),
>  };
>  
>  static void spapr_pci_vfio_register_types(void)
> @@ -278,3 +213,36 @@ static void spapr_pci_vfio_register_types(void)
>  }
>  
>  type_init(spapr_pci_vfio_register_types)
> +
> +#else /* !CONFIG_LINUX */
> +
> +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> +{
> +    return -EINVAL;
> +}
> +
> +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> +                                  PCIDevice *pdev, int option)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> +{
> +    return RTAS_OUT_HW_ERROR;
> +}
> +
> +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> +{
> +}
> +#endif
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 5322b56..7f3c712 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -47,12 +47,6 @@ typedef struct sPAPRPHBVFIOState sPAPRPHBVFIOState;
>  
>  struct sPAPRPHBClass {
>      PCIHostBridgeClass parent_class;
> -
> -    void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
> -    int (*eeh_set_option)(sPAPRPHBState *sphb, unsigned int addr, int option);
> -    int (*eeh_get_state)(sPAPRPHBState *sphb, int *state);
> -    int (*eeh_reset)(sPAPRPHBState *sphb, int option);
> -    int (*eeh_configure)(sPAPRPHBState *sphb);
>  };
>  
>  typedef struct spapr_pci_msi {
> @@ -91,12 +85,11 @@ struct sPAPRPHBState {
>      spapr_pci_msi_mig *msi_devs;
>  
>      QLIST_ENTRY(sPAPRPHBState) list;
> -};
>  
> -struct sPAPRPHBVFIOState {
> -    sPAPRPHBState phb;
> -
> -    int32_t iommugroupid;
> +    int32_t iommugroupid; /* obsolete */
> +    uint32_t dma32_window_start;
> +    uint32_t dma32_window_size;
> +    unsigned vfio_num;
>  };
>  
>  #define SPAPR_PCI_MAX_INDEX          255
> @@ -138,4 +131,12 @@ sPAPRPHBState *spapr_pci_find_phb(sPAPRMachineState *spapr, uint64_t buid);
>  PCIDevice *spapr_pci_find_dev(sPAPRMachineState *spapr, uint64_t buid,
>                                uint32_t config_addr);
>  
> +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb);
> +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> +                                  PCIDevice *pdev, int option);
> +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state);
> +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option);
> +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb);
> +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb);
> +
>  #endif /* __HW_SPAPR_PCI_H__ */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  2015-09-02  9:44   ` Alexey Kardashevskiy
@ 2015-09-03  2:07     ` David Gibson
  2015-09-03  3:14       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2015-09-03  2:07 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 3724 bytes --]

On Wed, Sep 02, 2015 at 07:44:07PM +1000, Alexey Kardashevskiy wrote:
> On 09/02/2015 06:16 PM, Alexey Kardashevskiy wrote:
> >sPAPRTCETable is handling 2 TCE tables already:
> >
> >1) guest view of the TCE table - emulated devices use only this table;
> >
> >2) hardware IOMMU table - VFIO PCI devices use it for actual work but
> >it does not replace 1) and it is not visible to the guest.
> >The initialization of this table is driven by vfio-pci device,
> >DMA map/unmap requests are handled via MemoryListener so there is very
> >little to do in spapr-pci-vfio-host-bridge.
> >
> >This moves VFIO bits to the generic spapr-pci-host-bridge which allows
> >putting emulated and VFIO devices on the same PHB. It is still possible
> >to create multiple PHBs and avoid sharing PHB resouces for emulated and
> >VFIO devices.
> >
> >If there is no VFIO-PCI device attaches, no special ioctls will be called.
> >If there are some VFIO-PCI devices attached, PHB may refuse to attach
> >another VFIO-PCI device if a VFIO container on the host kernel side
> >does not support container sharing.
> >
> >This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
> >except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
> >still should be used for VFIO devices. The next patch will remove IOMMU ID
> >property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.
> >
> >This adds a number of VFIO-PCI devices currently attached to a PHB as
> >PHB needs to know whether to do DMA setup for VFIO or not. Since
> >at the moment of the PHB's realize() invocation we cannot tell yet
> >how many VFIO-PCI devices are there (they are not attached yet),
> >this moves DMA setup to the reset handler.
> >
> >This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
> >rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
> >and decide whether to call spapr_phb_vfio_eeh_set_option() or not.
> >
> >This should cause no behavioural change.

[snip]

> >  static void spapr_phb_reset(DeviceState *qdev)
> >  {
> >+    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
> >+    sPAPRTCETable *tcet;
> >+
> >      /* Reset the IOMMU state */
> >      object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
> >+
> >+    if (spapr_phb_dma_capabilities_update(sphb)) {
> >+        return;
> >+    }
> >+
> >+    /* Register default 32bit DMA window */
> >+    tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
> >+    if (!tcet) {
> >+        const unsigned nb = sphb->dma32_window_size >> SPAPR_TCE_PAGE_SHIFT;
> >+        tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> >+                                   sphb->dma32_window_start,
> >+                                   SPAPR_TCE_PAGE_SHIFT, nb,
> >+                                   sphb->vfio_num > 0);
> >+        if (!tcet) {
> >+            error_report("No default TCE table for %s", sphb->dtbusname);
> >+            return;
> >+        }
> >+
> >+        memory_region_add_subregion(&sphb->iommu_root, 0,
> 
> 
> Of course bug... should be bus_offset instead of 0. I'll wait a bit for more
> reviews and respin later.

Are you sure?  I thought the idea of the iommu_root MR was that it was
always at offset 0, and the "real" IOMMU region sat inside it at the
right offset, allowing things to be moved around in the case of DDW.

It was at offset 0 in the original realize() time
memory_region_add_subregion().

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  2015-09-03  2:07     ` David Gibson
@ 2015-09-03  3:14       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-03  3:14 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf

On 09/03/2015 12:07 PM, David Gibson wrote:
> On Wed, Sep 02, 2015 at 07:44:07PM +1000, Alexey Kardashevskiy wrote:
>> On 09/02/2015 06:16 PM, Alexey Kardashevskiy wrote:
>>> sPAPRTCETable is handling 2 TCE tables already:
>>>
>>> 1) guest view of the TCE table - emulated devices use only this table;
>>>
>>> 2) hardware IOMMU table - VFIO PCI devices use it for actual work but
>>> it does not replace 1) and it is not visible to the guest.
>>> The initialization of this table is driven by vfio-pci device,
>>> DMA map/unmap requests are handled via MemoryListener so there is very
>>> little to do in spapr-pci-vfio-host-bridge.
>>>
>>> This moves VFIO bits to the generic spapr-pci-host-bridge which allows
>>> putting emulated and VFIO devices on the same PHB. It is still possible
>>> to create multiple PHBs and avoid sharing PHB resouces for emulated and
>>> VFIO devices.
>>>
>>> If there is no VFIO-PCI device attaches, no special ioctls will be called.
>>> If there are some VFIO-PCI devices attached, PHB may refuse to attach
>>> another VFIO-PCI device if a VFIO container on the host kernel side
>>> does not support container sharing.
>>>
>>> This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
>>> except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
>>> still should be used for VFIO devices. The next patch will remove IOMMU ID
>>> property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.
>>>
>>> This adds a number of VFIO-PCI devices currently attached to a PHB as
>>> PHB needs to know whether to do DMA setup for VFIO or not. Since
>>> at the moment of the PHB's realize() invocation we cannot tell yet
>>> how many VFIO-PCI devices are there (they are not attached yet),
>>> this moves DMA setup to the reset handler.
>>>
>>> This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
>>> rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
>>> and decide whether to call spapr_phb_vfio_eeh_set_option() or not.
>>>
>>> This should cause no behavioural change.
>
> [snip]
>
>>>   static void spapr_phb_reset(DeviceState *qdev)
>>>   {
>>> +    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
>>> +    sPAPRTCETable *tcet;
>>> +
>>>       /* Reset the IOMMU state */
>>>       object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
>>> +
>>> +    if (spapr_phb_dma_capabilities_update(sphb)) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* Register default 32bit DMA window */
>>> +    tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
>>> +    if (!tcet) {
>>> +        const unsigned nb = sphb->dma32_window_size >> SPAPR_TCE_PAGE_SHIFT;
>>> +        tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>> +                                   sphb->dma32_window_start,
>>> +                                   SPAPR_TCE_PAGE_SHIFT, nb,
>>> +                                   sphb->vfio_num > 0);
>>> +        if (!tcet) {
>>> +            error_report("No default TCE table for %s", sphb->dtbusname);
>>> +            return;
>>> +        }
>>> +
>>> +        memory_region_add_subregion(&sphb->iommu_root, 0,
>>
>>
>> Of course bug... should be bus_offset instead of 0. I'll wait a bit for more
>> reviews and respin later.
>
> Are you sure?  I thought the idea of the iommu_root MR was that it was
> always at offset 0, and the "real" IOMMU region sat inside it at the
> right offset, allowing things to be moved around in the case of DDW.

The "root iommu" patches are coming next. At this point I am just getting 
rid of vfio phb.


> It was at offset 0 in the original realize() time
> memory_region_add_subregion().


It was 0 for emulated PHB but it was tcet->bus_offset for VFIO PHB.




-- 
Alexey

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  2015-09-03  2:05   ` David Gibson
@ 2015-09-03  3:19     ` Alexey Kardashevskiy
  2015-09-03  3:22       ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-03  3:19 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf

On 09/03/2015 12:05 PM, David Gibson wrote:
> On Wed, Sep 02, 2015 at 06:16:02PM +1000, Alexey Kardashevskiy wrote:
>> sPAPRTCETable is handling 2 TCE tables already:
>>
>> 1) guest view of the TCE table - emulated devices use only this table;
>>
>> 2) hardware IOMMU table - VFIO PCI devices use it for actual work but
>> it does not replace 1) and it is not visible to the guest.
>> The initialization of this table is driven by vfio-pci device,
>> DMA map/unmap requests are handled via MemoryListener so there is very
>> little to do in spapr-pci-vfio-host-bridge.
>>
>> This moves VFIO bits to the generic spapr-pci-host-bridge which allows
>> putting emulated and VFIO devices on the same PHB. It is still possible
>> to create multiple PHBs and avoid sharing PHB resouces for emulated and
>> VFIO devices.
>>
>> If there is no VFIO-PCI device attaches, no special ioctls will be called.
>> If there are some VFIO-PCI devices attached, PHB may refuse to attach
>> another VFIO-PCI device if a VFIO container on the host kernel side
>> does not support container sharing.
>>
>> This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
>> except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
>> still should be used for VFIO devices. The next patch will remove IOMMU ID
>> property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.
>>
>> This adds a number of VFIO-PCI devices currently attached to a PHB as
>> PHB needs to know whether to do DMA setup for VFIO or not. Since
>> at the moment of the PHB's realize() invocation we cannot tell yet
>> how many VFIO-PCI devices are there (they are not attached yet),
>> this moves DMA setup to the reset handler.
>>
>> This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
>> rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
>> and decide whether to call spapr_phb_vfio_eeh_set_option() or not.
>>
>> This should cause no behavioural change.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> [snip]
>>   static int spapr_phb_children_reset(Object *child, void *opaque)
>>   {
>>       DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
>> @@ -1413,8 +1401,42 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
>>
>>   static void spapr_phb_reset(DeviceState *qdev)
>>   {
>> +    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
>> +    sPAPRTCETable *tcet;
>> +
>>       /* Reset the IOMMU state */
>>       object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
>> +
>> +    if (spapr_phb_dma_capabilities_update(sphb)) {
>> +        return;
>> +    }
>> +
>> +    /* Register default 32bit DMA window */
>> +    tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
>> +    if (!tcet) {
>> +        const unsigned nb = sphb->dma32_window_size >> SPAPR_TCE_PAGE_SHIFT;
>> +        tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>> +                                   sphb->dma32_window_start,
>> +                                   SPAPR_TCE_PAGE_SHIFT, nb,
>> +                                   sphb->vfio_num > 0);
>
> Could delaying the construction of the TCE table object until reset
> time cause problems with migration?  i.e. can you be sure that the
> destination will have the TCE table object present and in a suitable
> state to accept the incoming table information from the source?


This is a valid concern but the PHB reset handler is called (just checked) 
when QEMU is started with "-incoming tcp:vpl2:33333" so yes, I am sure :)




-- 
Alexey

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
  2015-09-03  3:19     ` Alexey Kardashevskiy
@ 2015-09-03  3:22       ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2015-09-03  3:22 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 4114 bytes --]

On Thu, Sep 03, 2015 at 01:19:01PM +1000, Alexey Kardashevskiy wrote:
> On 09/03/2015 12:05 PM, David Gibson wrote:
> >On Wed, Sep 02, 2015 at 06:16:02PM +1000, Alexey Kardashevskiy wrote:
> >>sPAPRTCETable is handling 2 TCE tables already:
> >>
> >>1) guest view of the TCE table - emulated devices use only this table;
> >>
> >>2) hardware IOMMU table - VFIO PCI devices use it for actual work but
> >>it does not replace 1) and it is not visible to the guest.
> >>The initialization of this table is driven by vfio-pci device,
> >>DMA map/unmap requests are handled via MemoryListener so there is very
> >>little to do in spapr-pci-vfio-host-bridge.
> >>
> >>This moves VFIO bits to the generic spapr-pci-host-bridge which allows
> >>putting emulated and VFIO devices on the same PHB. It is still possible
> >>to create multiple PHBs and avoid sharing PHB resouces for emulated and
> >>VFIO devices.
> >>
> >>If there is no VFIO-PCI device attaches, no special ioctls will be called.
> >>If there are some VFIO-PCI devices attached, PHB may refuse to attach
> >>another VFIO-PCI device if a VFIO container on the host kernel side
> >>does not support container sharing.
> >>
> >>This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
> >>except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
> >>still should be used for VFIO devices. The next patch will remove IOMMU ID
> >>property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.
> >>
> >>This adds a number of VFIO-PCI devices currently attached to a PHB as
> >>PHB needs to know whether to do DMA setup for VFIO or not. Since
> >>at the moment of the PHB's realize() invocation we cannot tell yet
> >>how many VFIO-PCI devices are there (they are not attached yet),
> >>this moves DMA setup to the reset handler.
> >>
> >>This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
> >>rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
> >>and decide whether to call spapr_phb_vfio_eeh_set_option() or not.
> >>
> >>This should cause no behavioural change.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >
> >[snip]
> >>  static int spapr_phb_children_reset(Object *child, void *opaque)
> >>  {
> >>      DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
> >>@@ -1413,8 +1401,42 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
> >>
> >>  static void spapr_phb_reset(DeviceState *qdev)
> >>  {
> >>+    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
> >>+    sPAPRTCETable *tcet;
> >>+
> >>      /* Reset the IOMMU state */
> >>      object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
> >>+
> >>+    if (spapr_phb_dma_capabilities_update(sphb)) {
> >>+        return;
> >>+    }
> >>+
> >>+    /* Register default 32bit DMA window */
> >>+    tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
> >>+    if (!tcet) {
> >>+        const unsigned nb = sphb->dma32_window_size >> SPAPR_TCE_PAGE_SHIFT;
> >>+        tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> >>+                                   sphb->dma32_window_start,
> >>+                                   SPAPR_TCE_PAGE_SHIFT, nb,
> >>+                                   sphb->vfio_num > 0);
> >
> >Could delaying the construction of the TCE table object until reset
> >time cause problems with migration?  i.e. can you be sure that the
> >destination will have the TCE table object present and in a suitable
> >state to accept the incoming table information from the source?
> 
> 
> This is a valid concern but the PHB reset handler is called (just checked)
> when QEMU is started with "-incoming tcp:vpl2:33333" so yes, I am sure :)

Ok.  The rest of these patches look fine, so if you fix the bus offset
problem, I'll merge into spapr-next.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-09-03  3:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-02  8:16 [Qemu-devel] [PATCH qemu 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-09-02  9:44   ` Alexey Kardashevskiy
2015-09-03  2:07     ` David Gibson
2015-09-03  3:14       ` Alexey Kardashevskiy
2015-09-03  2:05   ` David Gibson
2015-09-03  3:19     ` Alexey Kardashevskiy
2015-09-03  3:22       ` David Gibson
2015-09-02  8:16 ` [Qemu-devel] [PATCH qemu 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).