* [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb @ 2015-09-03 4:40 Alexey Kardashevskiy 2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy 2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy 0 siblings, 2 replies; 9+ messages in thread From: Alexey Kardashevskiy @ 2015-09-03 4:40 UTC (permalink / raw) To: qemu-devel Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Gavin Shan, David Gibson This allows sharing the same PHB for both emulated and VFIO devices. This used to be a part of the Dynamic DMA window patchset. Since it was moved out and reworked quite a lot, all reviewed-by's got outdated. This is based on dgibson/spapr-next sha1 f91ffef. Please comment. Thanks. Changes: v2: * fixed bus_offset * moved DMA setup to a spapr_phb_dma_reset() helper - DDW patches would add it anyway * verified that migration still works Alexey Kardashevskiy (2): spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge spapr_pci: Remove constraints about VFIO-PCI devices hw/ppc/Makefile.objs | 5 +- hw/ppc/spapr_pci.c | 195 +++++++++++++++++++++++++++----------------- hw/ppc/spapr_pci_vfio.c | 147 +++++++++++++-------------------- hw/vfio/common.c | 22 ++--- include/hw/pci-host/spapr.h | 23 +++--- include/hw/vfio/vfio.h | 3 +- 6 files changed, 195 insertions(+), 200 deletions(-) -- 2.4.0.rc3.8.gfb3e7d5 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge 2015-09-03 4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy @ 2015-09-03 4:40 ` Alexey Kardashevskiy 2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy 1 sibling, 0 replies; 9+ messages in thread From: Alexey Kardashevskiy @ 2015-09-03 4:40 UTC (permalink / raw) To: qemu-devel Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Gavin Shan, David Gibson sPAPRTCETable is handling 2 TCE tables already: 1) guest view of the TCE table - emulated devices use only this table; 2) hardware IOMMU table - VFIO PCI devices use it for actual work but it does not replace 1) and it is not visible to the guest. The initialization of this table is driven by vfio-pci device, DMA map/unmap requests are handled via MemoryListener so there is very little to do in spapr-pci-vfio-host-bridge. This moves VFIO bits to the generic spapr-pci-host-bridge which allows putting emulated and VFIO devices on the same PHB. It is still possible to create multiple PHBs and avoid sharing PHB resouces for emulated and VFIO devices. If there is no VFIO-PCI device attaches, no special ioctls will be called. If there are some VFIO-PCI devices attached, PHB may refuse to attach another VFIO-PCI device if a VFIO container on the host kernel side does not support container sharing. This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge except it has an additional "iommu" property so spapr-pci-vfio-host-bridge still should be used for VFIO devices. The next patch will remove IOMMU ID property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge. This adds a number of VFIO-PCI devices currently attached to a PHB as PHB needs to know whether to do DMA setup for VFIO or not. Since at the moment of the PHB's realize() invocation we cannot tell yet how many VFIO-PCI devices are there (they are not attached yet), this moves DMA setup to the reset handler (which is also called when QEMU is about to receive migration). This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci" and decide whether to call spapr_phb_vfio_eeh_set_option() or not. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- Changes: v2: * used tcet->bus_offset in the common PHB code as this is what VFIO PHB needs --- hw/ppc/Makefile.objs | 5 +- hw/ppc/spapr_pci.c | 195 +++++++++++++++++++++++++++----------------- hw/ppc/spapr_pci_vfio.c | 144 +++++++++++++------------------- include/hw/pci-host/spapr.h | 23 +++--- 4 files changed, 189 insertions(+), 178 deletions(-) diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs index c8ab06e..6c06fcf 100644 --- a/hw/ppc/Makefile.objs +++ b/hw/ppc/Makefile.objs @@ -3,10 +3,7 @@ obj-y += ppc.o ppc_booke.o # IBM pSeries (sPAPR) obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o -obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o -ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy) -obj-y += spapr_pci_vfio.o -endif +obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_pci_vfio.o spapr_rtc.o spapr_drc.o # PowerPC 4xx boards obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o obj-y += ppc4xx_pci.o diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index bc30631..4e289cb 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -430,7 +430,6 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu, target_ulong rets) { sPAPRPHBState *sphb; - sPAPRPHBClass *spc; PCIDevice *pdev; uint32_t addr, option; uint64_t buid; @@ -445,7 +444,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu, option = rtas_ld(args, 3); sphb = spapr_pci_find_phb(spapr, buid); - if (!sphb) { + if (!sphb || (sphb->vfio_num == 0)) { goto param_error_exit; } @@ -455,12 +454,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu, goto param_error_exit; } - spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); - if (!spc->eeh_set_option) { - goto param_error_exit; - } - - ret = spc->eeh_set_option(sphb, addr, option); + ret = spapr_phb_vfio_eeh_set_option(sphb, pdev, option); rtas_st(rets, 0, ret); return; @@ -475,7 +469,6 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu, target_ulong rets) { sPAPRPHBState *sphb; - sPAPRPHBClass *spc; PCIDevice *pdev; uint32_t addr, option; uint64_t buid; @@ -486,12 +479,7 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu, buid = rtas_ldq(args, 1); sphb = spapr_pci_find_phb(spapr, buid); - if (!sphb) { - goto param_error_exit; - } - - spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); - if (!spc->eeh_set_option) { + if (!sphb || (sphb->vfio_num == 0)) { goto param_error_exit; } @@ -531,7 +519,6 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu, target_ulong rets) { sPAPRPHBState *sphb; - sPAPRPHBClass *spc; uint64_t buid; int state, ret; @@ -541,16 +528,11 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu, buid = rtas_ldq(args, 1); sphb = spapr_pci_find_phb(spapr, buid); - if (!sphb) { + if (!sphb || (sphb->vfio_num == 0)) { goto param_error_exit; } - spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); - if (!spc->eeh_get_state) { - goto param_error_exit; - } - - ret = spc->eeh_get_state(sphb, &state); + ret = spapr_phb_vfio_eeh_get_state(sphb, &state); rtas_st(rets, 0, ret); if (ret != RTAS_OUT_SUCCESS) { return; @@ -575,7 +557,6 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu, target_ulong rets) { sPAPRPHBState *sphb; - sPAPRPHBClass *spc; uint32_t option; uint64_t buid; int ret; @@ -587,16 +568,11 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu, buid = rtas_ldq(args, 1); option = rtas_ld(args, 3); sphb = spapr_pci_find_phb(spapr, buid); - if (!sphb) { + if (!sphb || (sphb->vfio_num == 0)) { goto param_error_exit; } - spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); - if (!spc->eeh_reset) { - goto param_error_exit; - } - - ret = spc->eeh_reset(sphb, option); + ret = spapr_phb_vfio_eeh_reset(sphb, option); rtas_st(rets, 0, ret); return; @@ -611,7 +587,6 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu, target_ulong rets) { sPAPRPHBState *sphb; - sPAPRPHBClass *spc; uint64_t buid; int ret; @@ -621,16 +596,11 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu, buid = rtas_ldq(args, 1); sphb = spapr_pci_find_phb(spapr, buid); - if (!sphb) { + if (!sphb || (sphb->vfio_num == 0)) { goto param_error_exit; } - spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); - if (!spc->eeh_configure) { - goto param_error_exit; - } - - ret = spc->eeh_configure(sphb); + ret = spapr_phb_vfio_eeh_configure(sphb); rtas_st(rets, 0, ret); return; @@ -646,7 +616,6 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu, target_ulong rets) { sPAPRPHBState *sphb; - sPAPRPHBClass *spc; int option; uint64_t buid; @@ -656,12 +625,7 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu, buid = rtas_ldq(args, 1); sphb = spapr_pci_find_phb(spapr, buid); - if (!sphb) { - goto param_error_exit; - } - - spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); - if (!spc->eeh_set_option) { + if (!sphb || (sphb->vfio_num == 0)) { goto param_error_exit; } @@ -810,6 +774,102 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev) return buf; } +static void spapr_phb_walk_devices(PCIBus *bus, PCIDevice *pdev, void *opaque) +{ + sPAPRPHBState *sphb = opaque; + PCIBus *sec_bus; + + if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) { + ++sphb->vfio_num; + } + + if ((pci_default_read_config(pdev, PCI_HEADER_TYPE, 1) != + PCI_HEADER_TYPE_BRIDGE)) { + return; + } + + sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev)); + if (!sec_bus) { + return; + } + + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), + spapr_phb_walk_devices, sphb); +} + +static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) +{ + int ret; + PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus; + + sphb->dma32_window_start = 0; + sphb->dma32_window_size = SPAPR_PCI_DMA32_SIZE; + + /* Update the number of VFIO-PCI devices on the PHB */ + sphb->vfio_num = 0; + pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); + + if (sphb->vfio_num) { + if (sphb->iommugroupid == -1) { + error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); + return -1; + } + + ret = spapr_phb_vfio_dma_capabilities_update(sphb); + if (ret) { + error_report("Unable to get DMA32 info from VFIO"); + return ret; + } + } + + return 0; +} + +static int spapr_phb_dma_init_window(sPAPRPHBState *sphb, + uint32_t liobn, uint32_t page_shift, + uint64_t window_size) +{ + sPAPRTCETable *tcet; + uint64_t bus_offset = sphb->dma32_window_start; + uint32_t nb_table = window_size >> page_shift; + + if (!nb_table) { + return -1; + } + + tcet = spapr_tce_new_table(DEVICE(sphb), liobn, bus_offset, + page_shift, nb_table, sphb->vfio_num > 0); + + return tcet ? 0 : -1; +} + +static int spapr_phb_dma_reset(sPAPRPHBState *sphb) +{ + sPAPRTCETable *tcet; + + if (spapr_phb_dma_capabilities_update(sphb)) { + return -1; + } + + /* Register default 32bit DMA window */ + tcet = spapr_tce_find_by_liobn(sphb->dma_liobn); + if (!tcet) { + spapr_phb_dma_init_window(sphb, sphb->dma_liobn, SPAPR_TCE_PAGE_SHIFT, + sphb->dma32_window_size); + + tcet = spapr_tce_find_by_liobn(sphb->dma_liobn); + if (!tcet) { + error_report("No default TCE table for %s", sphb->dtbusname); + return -1; + } + + memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset, + spapr_tce_get_iommu(tcet)); + } + + return 0; +} + /* Macros to operate with address in OF binding to PCI */ #define b_x(x, p, l) (((x) & ((1<<(l))-1)) << (p)) #define b_n(x) b_x((x), 31, 1) /* 0 if relocatable */ @@ -1217,7 +1277,6 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) SysBusDevice *s = SYS_BUS_DEVICE(dev); sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s); PCIHostState *phb = PCI_HOST_BRIDGE(s); - sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s); char *namebuf; int i; PCIBus *bus; @@ -1371,35 +1430,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) } } - if (!info->finish_realize) { - error_setg(errp, "finish_realize not defined"); - return; - } - - info->finish_realize(sphb, errp); - sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free); } -static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp) -{ - sPAPRTCETable *tcet; - uint32_t nb_table; - - nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT; - tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn, - 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false); - if (!tcet) { - error_setg(errp, "Unable to create TCE table for %s", - sphb->dtbusname); - return ; - } - - /* Register default 32bit DMA window */ - memory_region_add_subregion(&sphb->iommu_root, 0, - spapr_tce_get_iommu(tcet)); -} - static int spapr_phb_children_reset(Object *child, void *opaque) { DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE); @@ -1413,8 +1446,22 @@ static int spapr_phb_children_reset(Object *child, void *opaque) static void spapr_phb_reset(DeviceState *qdev) { + sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev); + + spapr_phb_dma_reset(sphb); + /* Reset the IOMMU state */ object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL); + + if (sphb->vfio_num > 0) { + /* + * The PE might be in frozen state. To reenable the EEH + * functionality on it will clean the frozen state, which + * ensures that the contained PCI devices will work properly + * after reboot. + */ + spapr_phb_vfio_eeh_reenable(sphb); + } } static Property spapr_phb_properties[] = { @@ -1535,7 +1582,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data) { PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass); DeviceClass *dc = DEVICE_CLASS(klass); - sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass); HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass); hc->root_bus_path = spapr_phb_root_bus_path; @@ -1545,7 +1591,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data) dc->vmsd = &vmstate_spapr_pci; set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); dc->cannot_instantiate_with_device_add_yet = false; - spc->finish_realize = spapr_phb_finish_realize; hp->plug = spapr_phb_hot_plug_child; hp->unplug = spapr_phb_hot_unplug_child; } diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c index cca45ed..f94d8a4 100644 --- a/hw/ppc/spapr_pci_vfio.c +++ b/hw/ppc/spapr_pci_vfio.c @@ -20,84 +20,47 @@ #include "hw/ppc/spapr.h" #include "hw/pci-host/spapr.h" #include "hw/pci/msix.h" -#include "linux/vfio.h" #include "hw/vfio/vfio.h" +#ifdef CONFIG_LINUX +#include "linux/vfio.h" + static Property spapr_phb_vfio_properties[] = { - DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1), + DEFINE_PROP_INT32("iommu", sPAPRPHBState, iommugroupid, -1), DEFINE_PROP_END_OF_LIST(), }; -static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp) +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) { - sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; int ret; - sPAPRTCETable *tcet; - uint32_t liobn = svphb->phb.dma_liobn; - if (svphb->iommugroupid == -1) { - error_setg(errp, "Wrong IOMMU group ID %d", svphb->iommugroupid); - return; - } - - ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid, - VFIO_CHECK_EXTENSION, - (void *) VFIO_SPAPR_TCE_IOMMU); - if (ret != 1) { - error_setg_errno(errp, -ret, - "spapr-vfio: SPAPR extension is not supported"); - return; - } - - ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid, + ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); if (ret) { - error_setg_errno(errp, -ret, - "spapr-vfio: get info from container failed"); - return; + return ret; } - tcet = spapr_tce_new_table(DEVICE(sphb), liobn, info.dma32_window_start, - SPAPR_TCE_PAGE_SHIFT, - info.dma32_window_size >> SPAPR_TCE_PAGE_SHIFT, - true); - if (!tcet) { - error_setg(errp, "spapr-vfio: failed to create VFIO TCE table"); - return; - } + sphb->dma32_window_start = info.dma32_window_start; + sphb->dma32_window_size = info.dma32_window_size; - /* Register default 32bit DMA window */ - memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset, - spapr_tce_get_iommu(tcet)); + return ret; } -static void spapr_phb_vfio_eeh_reenable(sPAPRPHBVFIOState *svphb) +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) { struct vfio_eeh_pe_op op = { .argsz = sizeof(op), .op = VFIO_EEH_PE_ENABLE }; - vfio_container_ioctl(&svphb->phb.iommu_as, - svphb->iommugroupid, VFIO_EEH_PE_OP, &op); + vfio_container_ioctl(&sphb->iommu_as, + sphb->iommugroupid, VFIO_EEH_PE_OP, &op); } -static void spapr_phb_vfio_reset(DeviceState *qdev) +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, + PCIDevice *pdev, int option) { - /* - * The PE might be in frozen state. To reenable the EEH - * functionality on it will clean the frozen state, which - * ensures that the contained PCI devices will work properly - * after reboot. - */ - spapr_phb_vfio_eeh_reenable(SPAPR_PCI_VFIO_HOST_BRIDGE(qdev)); -} - -static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, - unsigned int addr, int option) -{ - sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct vfio_eeh_pe_op op = { .argsz = sizeof(op) }; int ret; @@ -105,25 +68,9 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, case RTAS_EEH_DISABLE: op.op = VFIO_EEH_PE_DISABLE; break; - case RTAS_EEH_ENABLE: { - PCIHostState *phb; - PCIDevice *pdev; - - /* - * The EEH functionality is enabled on basis of PCI device, - * instead of PE. We need check the validity of the PCI - * device address. - */ - phb = PCI_HOST_BRIDGE(sphb); - pdev = pci_find_device(phb->bus, - (addr >> 16) & 0xFF, (addr >> 8) & 0xFF); - if (!pdev) { - return RTAS_OUT_PARAM_ERROR; - } - + case RTAS_EEH_ENABLE: op.op = VFIO_EEH_PE_ENABLE; break; - } case RTAS_EEH_THAW_IO: op.op = VFIO_EEH_PE_UNFREEZE_IO; break; @@ -134,7 +81,7 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, return RTAS_OUT_PARAM_ERROR; } - ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid, + ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_HW_ERROR; @@ -143,14 +90,13 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, return RTAS_OUT_SUCCESS; } -static int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) { - sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct vfio_eeh_pe_op op = { .argsz = sizeof(op) }; int ret; op.op = VFIO_EEH_PE_GET_STATE; - ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid, + ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_PARAM_ERROR; @@ -203,9 +149,8 @@ static void spapr_phb_vfio_eeh_pre_reset(sPAPRPHBState *sphb) pci_for_each_bus(phb->bus, spapr_phb_vfio_eeh_clear_bus_msix, NULL); } -static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) { - sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct vfio_eeh_pe_op op = { .argsz = sizeof(op) }; int ret; @@ -225,7 +170,7 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) return RTAS_OUT_PARAM_ERROR; } - ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid, + ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_HW_ERROR; @@ -234,14 +179,13 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) return RTAS_OUT_SUCCESS; } -static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) { - sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct vfio_eeh_pe_op op = { .argsz = sizeof(op) }; int ret; op.op = VFIO_EEH_PE_CONFIGURE; - ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid, + ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_PARAM_ERROR; @@ -253,23 +197,14 @@ static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); - sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass); dc->props = spapr_phb_vfio_properties; - dc->reset = spapr_phb_vfio_reset; - spc->finish_realize = spapr_phb_vfio_finish_realize; - spc->eeh_set_option = spapr_phb_vfio_eeh_set_option; - spc->eeh_get_state = spapr_phb_vfio_eeh_get_state; - spc->eeh_reset = spapr_phb_vfio_eeh_reset; - spc->eeh_configure = spapr_phb_vfio_eeh_configure; } static const TypeInfo spapr_phb_vfio_info = { .name = TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE, .parent = TYPE_SPAPR_PCI_HOST_BRIDGE, - .instance_size = sizeof(sPAPRPHBVFIOState), .class_init = spapr_phb_vfio_class_init, - .class_size = sizeof(sPAPRPHBClass), }; static void spapr_pci_vfio_register_types(void) @@ -278,3 +213,36 @@ static void spapr_pci_vfio_register_types(void) } type_init(spapr_pci_vfio_register_types) + +#else /* !CONFIG_LINUX */ + +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) +{ + return -EINVAL; +} + +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, + PCIDevice *pdev, int option) +{ + return RTAS_OUT_HW_ERROR; +} + +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) +{ + return RTAS_OUT_HW_ERROR; +} + +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) +{ + return RTAS_OUT_HW_ERROR; +} + +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) +{ + return RTAS_OUT_HW_ERROR; +} + +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) +{ +} +#endif diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index 5322b56..7f3c712 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -47,12 +47,6 @@ typedef struct sPAPRPHBVFIOState sPAPRPHBVFIOState; struct sPAPRPHBClass { PCIHostBridgeClass parent_class; - - void (*finish_realize)(sPAPRPHBState *sphb, Error **errp); - int (*eeh_set_option)(sPAPRPHBState *sphb, unsigned int addr, int option); - int (*eeh_get_state)(sPAPRPHBState *sphb, int *state); - int (*eeh_reset)(sPAPRPHBState *sphb, int option); - int (*eeh_configure)(sPAPRPHBState *sphb); }; typedef struct spapr_pci_msi { @@ -91,12 +85,11 @@ struct sPAPRPHBState { spapr_pci_msi_mig *msi_devs; QLIST_ENTRY(sPAPRPHBState) list; -}; -struct sPAPRPHBVFIOState { - sPAPRPHBState phb; - - int32_t iommugroupid; + int32_t iommugroupid; /* obsolete */ + uint32_t dma32_window_start; + uint32_t dma32_window_size; + unsigned vfio_num; }; #define SPAPR_PCI_MAX_INDEX 255 @@ -138,4 +131,12 @@ sPAPRPHBState *spapr_pci_find_phb(sPAPRMachineState *spapr, uint64_t buid); PCIDevice *spapr_pci_find_dev(sPAPRMachineState *spapr, uint64_t buid, uint32_t config_addr); +int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb); +int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, + PCIDevice *pdev, int option); +int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state); +int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option); +int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb); +void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb); + #endif /* __HW_SPAPR_PCI_H__ */ -- 2.4.0.rc3.8.gfb3e7d5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices 2015-09-03 4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy 2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy @ 2015-09-03 4:40 ` Alexey Kardashevskiy 2015-09-10 2:43 ` Alex Williamson 1 sibling, 1 reply; 9+ messages in thread From: Alexey Kardashevskiy @ 2015-09-03 4:40 UTC (permalink / raw) To: qemu-devel Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Gavin Shan, David Gibson So far there were 2 limitations enforced on an emulated PHB regarding VFIO: 1) only one IOMMU group per IOMMU container was allowed and the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; 2) only one IOMMU container per PHB was allowed as a group can only be attached to one container. However these are not really necessary so we are removing them here. This deprecates IOMMU group ID and changes vfio_container_do_ioctl() not to receive it. Instead of getting a container from a group ID, this calls ioctl() for every container attached to the PHB address space. This allows multiple containers on the same PHB which means multiple groups per PHB. Note that this will lead to every H_PUT_TCE&etc call to be passed to every container separately which will affect the performance. For 32bit devices it is still recommended to put every group to a separate PHB. Since the existing VFIO code is already trying to share a container for multiple groups, just removing a group id from the vfio_container_do_ioctl() allows the existing code to share a container if it is supported by the host kernel. This replaces the check for a group id to be set correctly with the check that it is not set. This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState members is accessed here. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- hw/ppc/spapr_pci.c | 10 +++++----- hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- hw/vfio/common.c | 22 ++++++---------------- include/hw/vfio/vfio.h | 3 +-- 4 files changed, 18 insertions(+), 34 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 4e289cb..1b7559d 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); if (sphb->vfio_num) { - if (sphb->iommugroupid == -1) { - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); - return -1; - } - ret = spapr_phb_vfio_dma_capabilities_update(sphb); if (ret) { error_report("Unable to get DMA32 info from VFIO"); @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) PCIBus *bus; uint64_t msi_window_size = 4096; + if ((sphb->iommugroupid != -1) && + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { + error_report("Warning: iommugroupid is deprecated and will be ignored"); + } + if (sphb->index != (uint32_t)-1) { hwaddr windows_base; diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c index f94d8a4..48137d5 100644 --- a/hw/ppc/spapr_pci_vfio.c +++ b/hw/ppc/spapr_pci_vfio.c @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; int ret; - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); if (ret) { return ret; @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) .op = VFIO_EEH_PE_ENABLE }; - vfio_container_ioctl(&sphb->iommu_as, - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); } int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, return RTAS_OUT_PARAM_ERROR; } - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, - VFIO_EEH_PE_OP, &op); + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_HW_ERROR; } @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) int ret; op.op = VFIO_EEH_PE_GET_STATE; - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, - VFIO_EEH_PE_OP, &op); + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_PARAM_ERROR; } @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) return RTAS_OUT_PARAM_ERROR; } - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, - VFIO_EEH_PE_OP, &op); + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_HW_ERROR; } @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) int ret; op.op = VFIO_EEH_PE_CONFIGURE; - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, - VFIO_EEH_PE_OP, &op); + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); if (ret < 0) { return RTAS_OUT_PARAM_ERROR; } diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 85ee9b0..a00644e 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) close(vbasedev->fd); } -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, - int req, void *param) +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) { - VFIOGroup *group; VFIOContainer *container; int ret = -1; + VFIOAddressSpace *space = vfio_get_address_space(as); - group = vfio_get_group(groupid, as); - if (!group) { - error_report("vfio: group %d not registered", groupid); - return ret; - } - - container = group->container; - if (group->container) { + QLIST_FOREACH(container, &space->containers, next) { ret = ioctl(container->fd, req, param); if (ret < 0) { error_report("vfio: failed to ioctl %d to container: ret=%d, %s", _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); + return -errno; } } - vfio_put_group(group); - return ret; } -int vfio_container_ioctl(AddressSpace *as, int32_t groupid, - int req, void *param) +int vfio_container_ioctl(AddressSpace *as, int req, void *param) { /* We allow only certain ioctls to the container */ switch (req) { @@ -968,5 +958,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid, return -1; } - return vfio_container_do_ioctl(as, groupid, req, param); + return vfio_container_do_ioctl(as, req, param); } diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h index 0b26cd8..e076c14 100644 --- a/include/hw/vfio/vfio.h +++ b/include/hw/vfio/vfio.h @@ -3,7 +3,6 @@ #include "qemu/typedefs.h" -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid, - int req, void *param); +extern int vfio_container_ioctl(AddressSpace *as, int req, void *param); #endif -- 2.4.0.rc3.8.gfb3e7d5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices 2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy @ 2015-09-10 2:43 ` Alex Williamson 2015-09-11 20:03 ` Alex Williamson 0 siblings, 1 reply; 9+ messages in thread From: Alex Williamson @ 2015-09-10 2:43 UTC (permalink / raw) To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Gavin Shan, David Gibson On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: > So far there were 2 limitations enforced on an emulated PHB > regarding VFIO: > 1) only one IOMMU group per IOMMU container was allowed and > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; > 2) only one IOMMU container per PHB was allowed as a group > can only be attached to one container. > > However these are not really necessary so we are removing them here. > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl() > not to receive it. Instead of getting a container from a group ID, > this calls ioctl() for every container attached to the PHB address space. > This allows multiple containers on the same PHB which means multiple > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call > to be passed to every container separately which will affect > the performance. For 32bit devices it is still recommended to put > every group to a separate PHB. > > Since the existing VFIO code is already trying to share a container for > multiple groups, just removing a group id from > the vfio_container_do_ioctl() allows the existing code to share > a container if it is supported by the host kernel. > > This replaces the check for a group id to be set correctly with > the check that it is not set. > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState > members is accessed here. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > hw/ppc/spapr_pci.c | 10 +++++----- > hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- > hw/vfio/common.c | 22 ++++++---------------- > include/hw/vfio/vfio.h | 3 +-- > 4 files changed, 18 insertions(+), 34 deletions(-) > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 4e289cb..1b7559d 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); > > if (sphb->vfio_num) { > - if (sphb->iommugroupid == -1) { > - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); > - return -1; > - } > - > ret = spapr_phb_vfio_dma_capabilities_update(sphb); > if (ret) { > error_report("Unable to get DMA32 info from VFIO"); > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) > PCIBus *bus; > uint64_t msi_window_size = 4096; > > + if ((sphb->iommugroupid != -1) && > + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { > + error_report("Warning: iommugroupid is deprecated and will be ignored"); > + } > + > if (sphb->index != (uint32_t)-1) { > hwaddr windows_base; > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c > index f94d8a4..48137d5 100644 > --- a/hw/ppc/spapr_pci_vfio.c > +++ b/hw/ppc/spapr_pci_vfio.c > @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) > struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; > int ret; > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > + ret = vfio_container_ioctl(&sphb->iommu_as, > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > if (ret) { > return ret; > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) > .op = VFIO_EEH_PE_ENABLE > }; > > - vfio_container_ioctl(&sphb->iommu_as, > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > } > > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > return RTAS_OUT_PARAM_ERROR; > } > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > - VFIO_EEH_PE_OP, &op); > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > if (ret < 0) { > return RTAS_OUT_HW_ERROR; > } > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) > int ret; > > op.op = VFIO_EEH_PE_GET_STATE; > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > - VFIO_EEH_PE_OP, &op); > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > if (ret < 0) { > return RTAS_OUT_PARAM_ERROR; > } > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) > return RTAS_OUT_PARAM_ERROR; > } > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > - VFIO_EEH_PE_OP, &op); > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > if (ret < 0) { > return RTAS_OUT_HW_ERROR; > } > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) > int ret; > > op.op = VFIO_EEH_PE_CONFIGURE; > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > - VFIO_EEH_PE_OP, &op); > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > if (ret < 0) { > return RTAS_OUT_PARAM_ERROR; > } > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index 85ee9b0..a00644e 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) > close(vbasedev->fd); > } > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, > - int req, void *param) > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) > { > - VFIOGroup *group; > VFIOContainer *container; > int ret = -1; > + VFIOAddressSpace *space = vfio_get_address_space(as); > > - group = vfio_get_group(groupid, as); > - if (!group) { > - error_report("vfio: group %d not registered", groupid); > - return ret; > - } > - > - container = group->container; > - if (group->container) { > + QLIST_FOREACH(container, &space->containers, next) { > ret = ioctl(container->fd, req, param); > if (ret < 0) { > error_report("vfio: failed to ioctl %d to container: ret=%d, %s", > _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); > + return -errno; > } > } This highlights how terrible this ioctl passthrough interface is; what's a caller supposed to do on error? Here we don't have visibility into the ioctl being called in order to do any unwind on error. The caller doesn't have the context to unwind only the failed containers. Is returning errno here really sufficient for anyone to figure out how to proceed or should we just cut our loses and abort()? What's the least horrible way we can realistically handle a failure here? Thanks, Alex > > - vfio_put_group(group); > - > return ret; > } > > -int vfio_container_ioctl(AddressSpace *as, int32_t groupid, > - int req, void *param) > +int vfio_container_ioctl(AddressSpace *as, int req, void *param) > { > /* We allow only certain ioctls to the container */ > switch (req) { > @@ -968,5 +958,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid, > return -1; > } > > - return vfio_container_do_ioctl(as, groupid, req, param); > + return vfio_container_do_ioctl(as, req, param); > } > diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h > index 0b26cd8..e076c14 100644 > --- a/include/hw/vfio/vfio.h > +++ b/include/hw/vfio/vfio.h > @@ -3,7 +3,6 @@ > > #include "qemu/typedefs.h" > > -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid, > - int req, void *param); > +extern int vfio_container_ioctl(AddressSpace *as, int req, void *param); > > #endif ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices 2015-09-10 2:43 ` Alex Williamson @ 2015-09-11 20:03 ` Alex Williamson 2015-09-14 4:04 ` David Gibson 0 siblings, 1 reply; 9+ messages in thread From: Alex Williamson @ 2015-09-11 20:03 UTC (permalink / raw) To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Gavin Shan, David Gibson On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: > > So far there were 2 limitations enforced on an emulated PHB > > regarding VFIO: > > 1) only one IOMMU group per IOMMU container was allowed and > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; > > 2) only one IOMMU container per PHB was allowed as a group > > can only be attached to one container. > > > > However these are not really necessary so we are removing them here. > > > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl() > > not to receive it. Instead of getting a container from a group ID, > > this calls ioctl() for every container attached to the PHB address space. > > This allows multiple containers on the same PHB which means multiple > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call > > to be passed to every container separately which will affect > > the performance. For 32bit devices it is still recommended to put > > every group to a separate PHB. > > > > Since the existing VFIO code is already trying to share a container for > > multiple groups, just removing a group id from > > the vfio_container_do_ioctl() allows the existing code to share > > a container if it is supported by the host kernel. > > > > This replaces the check for a group id to be set correctly with > > the check that it is not set. > > > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState > > members is accessed here. > > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > > --- > > hw/ppc/spapr_pci.c | 10 +++++----- > > hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- > > hw/vfio/common.c | 22 ++++++---------------- > > include/hw/vfio/vfio.h | 3 +-- > > 4 files changed, 18 insertions(+), 34 deletions(-) > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index 4e289cb..1b7559d 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) > > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); > > > > if (sphb->vfio_num) { > > - if (sphb->iommugroupid == -1) { > > - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); > > - return -1; > > - } > > - > > ret = spapr_phb_vfio_dma_capabilities_update(sphb); > > if (ret) { > > error_report("Unable to get DMA32 info from VFIO"); > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) > > PCIBus *bus; > > uint64_t msi_window_size = 4096; > > > > + if ((sphb->iommugroupid != -1) && > > + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { > > + error_report("Warning: iommugroupid is deprecated and will be ignored"); > > + } > > + > > if (sphb->index != (uint32_t)-1) { > > hwaddr windows_base; > > > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c > > index f94d8a4..48137d5 100644 > > --- a/hw/ppc/spapr_pci_vfio.c > > +++ b/hw/ppc/spapr_pci_vfio.c > > @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) > > struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; > > int ret; > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > + ret = vfio_container_ioctl(&sphb->iommu_as, > > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > > if (ret) { > > return ret; > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) > > .op = VFIO_EEH_PE_ENABLE > > }; > > > > - vfio_container_ioctl(&sphb->iommu_as, > > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); > > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > } > > > > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > > return RTAS_OUT_PARAM_ERROR; > > } > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > - VFIO_EEH_PE_OP, &op); > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > if (ret < 0) { > > return RTAS_OUT_HW_ERROR; > > } > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) > > int ret; > > > > op.op = VFIO_EEH_PE_GET_STATE; > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > - VFIO_EEH_PE_OP, &op); > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > if (ret < 0) { > > return RTAS_OUT_PARAM_ERROR; > > } > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) > > return RTAS_OUT_PARAM_ERROR; > > } > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > - VFIO_EEH_PE_OP, &op); > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > if (ret < 0) { > > return RTAS_OUT_HW_ERROR; > > } > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) > > int ret; > > > > op.op = VFIO_EEH_PE_CONFIGURE; > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > - VFIO_EEH_PE_OP, &op); > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > if (ret < 0) { > > return RTAS_OUT_PARAM_ERROR; > > } > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > index 85ee9b0..a00644e 100644 > > --- a/hw/vfio/common.c > > +++ b/hw/vfio/common.c > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) > > close(vbasedev->fd); > > } > > > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, > > - int req, void *param) > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) > > { > > - VFIOGroup *group; > > VFIOContainer *container; > > int ret = -1; > > + VFIOAddressSpace *space = vfio_get_address_space(as); > > > > - group = vfio_get_group(groupid, as); > > - if (!group) { > > - error_report("vfio: group %d not registered", groupid); > > - return ret; > > - } > > - > > - container = group->container; > > - if (group->container) { > > + QLIST_FOREACH(container, &space->containers, next) { > > ret = ioctl(container->fd, req, param); > > if (ret < 0) { > > error_report("vfio: failed to ioctl %d to container: ret=%d, %s", > > _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); > > + return -errno; > > } > > } > > This highlights how terrible this ioctl passthrough interface is; what's > a caller supposed to do on error? Here we don't have visibility into > the ioctl being called in order to do any unwind on error. The caller > doesn't have the context to unwind only the failed containers. Is > returning errno here really sufficient for anyone to figure out how to > proceed or should we just cut our loses and abort()? What's the least > horrible way we can realistically handle a failure here? Thanks, It seemed like I asked this before and it turns out that I did: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html "Gavin says yes" is not really a supportable solution when I stumbled on it again and David doesn't know why it's safe either (nor does it answer my question of how does this work). At a minimum, the reasoning why this is safe for the ioctls we currently allow here needs to be spelled out in a comment. We could easily make the mistake of adding another ioctl to the passthrough for which those assumptions are not valid. We may even want to abort if it's not one of the ioctls we've evaluated so we don't overlook this problem later. Thanks, Alex > > > - vfio_put_group(group); > > - > > return ret; > > } > > > > -int vfio_container_ioctl(AddressSpace *as, int32_t groupid, > > - int req, void *param) > > +int vfio_container_ioctl(AddressSpace *as, int req, void *param) > > { > > /* We allow only certain ioctls to the container */ > > switch (req) { > > @@ -968,5 +958,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid, > > return -1; > > } > > > > - return vfio_container_do_ioctl(as, groupid, req, param); > > + return vfio_container_do_ioctl(as, req, param); > > } > > diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h > > index 0b26cd8..e076c14 100644 > > --- a/include/hw/vfio/vfio.h > > +++ b/include/hw/vfio/vfio.h > > @@ -3,7 +3,6 @@ > > > > #include "qemu/typedefs.h" > > > > -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid, > > - int req, void *param); > > +extern int vfio_container_ioctl(AddressSpace *as, int req, void *param); > > > > #endif > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices 2015-09-11 20:03 ` Alex Williamson @ 2015-09-14 4:04 ` David Gibson 2015-09-14 8:56 ` Alexey Kardashevskiy 0 siblings, 1 reply; 9+ messages in thread From: David Gibson @ 2015-09-14 4:04 UTC (permalink / raw) To: Alex Williamson; +Cc: Alexey Kardashevskiy, qemu-ppc, qemu-devel, Gavin Shan [-- Attachment #1: Type: text/plain, Size: 9309 bytes --] On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: > On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: > > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: > > > So far there were 2 limitations enforced on an emulated PHB > > > regarding VFIO: > > > 1) only one IOMMU group per IOMMU container was allowed and > > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; > > > 2) only one IOMMU container per PHB was allowed as a group > > > can only be attached to one container. > > > > > > However these are not really necessary so we are removing them here. > > > > > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl() > > > not to receive it. Instead of getting a container from a group ID, > > > this calls ioctl() for every container attached to the PHB address space. > > > This allows multiple containers on the same PHB which means multiple > > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call > > > to be passed to every container separately which will affect > > > the performance. For 32bit devices it is still recommended to put > > > every group to a separate PHB. > > > > > > Since the existing VFIO code is already trying to share a container for > > > multiple groups, just removing a group id from > > > the vfio_container_do_ioctl() allows the existing code to share > > > a container if it is supported by the host kernel. > > > > > > This replaces the check for a group id to be set correctly with > > > the check that it is not set. > > > > > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState > > > members is accessed here. > > > > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > > > --- > > > hw/ppc/spapr_pci.c | 10 +++++----- > > > hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- > > > hw/vfio/common.c | 22 ++++++---------------- > > > include/hw/vfio/vfio.h | 3 +-- > > > 4 files changed, 18 insertions(+), 34 deletions(-) > > > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > > index 4e289cb..1b7559d 100644 > > > --- a/hw/ppc/spapr_pci.c > > > +++ b/hw/ppc/spapr_pci.c > > > @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) > > > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); > > > > > > if (sphb->vfio_num) { > > > - if (sphb->iommugroupid == -1) { > > > - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); > > > - return -1; > > > - } > > > - > > > ret = spapr_phb_vfio_dma_capabilities_update(sphb); > > > if (ret) { > > > error_report("Unable to get DMA32 info from VFIO"); > > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) > > > PCIBus *bus; > > > uint64_t msi_window_size = 4096; > > > > > > + if ((sphb->iommugroupid != -1) && > > > + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { > > > + error_report("Warning: iommugroupid is deprecated and will be ignored"); > > > + } > > > + > > > if (sphb->index != (uint32_t)-1) { > > > hwaddr windows_base; > > > > > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c > > > index f94d8a4..48137d5 100644 > > > --- a/hw/ppc/spapr_pci_vfio.c > > > +++ b/hw/ppc/spapr_pci_vfio.c > > > @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) > > > struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; > > > int ret; > > > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > + ret = vfio_container_ioctl(&sphb->iommu_as, > > > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > > > if (ret) { > > > return ret; > > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) > > > .op = VFIO_EEH_PE_ENABLE > > > }; > > > > > > - vfio_container_ioctl(&sphb->iommu_as, > > > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); > > > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > } > > > > > > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_HW_ERROR; > > > } > > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) > > > int ret; > > > > > > op.op = VFIO_EEH_PE_GET_STATE; > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_HW_ERROR; > > > } > > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) > > > int ret; > > > > > > op.op = VFIO_EEH_PE_CONFIGURE; > > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > > > - VFIO_EEH_PE_OP, &op); > > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > > > if (ret < 0) { > > > return RTAS_OUT_PARAM_ERROR; > > > } > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > index 85ee9b0..a00644e 100644 > > > --- a/hw/vfio/common.c > > > +++ b/hw/vfio/common.c > > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) > > > close(vbasedev->fd); > > > } > > > > > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, > > > - int req, void *param) > > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) > > > { > > > - VFIOGroup *group; > > > VFIOContainer *container; > > > int ret = -1; > > > + VFIOAddressSpace *space = vfio_get_address_space(as); > > > > > > - group = vfio_get_group(groupid, as); > > > - if (!group) { > > > - error_report("vfio: group %d not registered", groupid); > > > - return ret; > > > - } > > > - > > > - container = group->container; > > > - if (group->container) { > > > + QLIST_FOREACH(container, &space->containers, next) { > > > ret = ioctl(container->fd, req, param); > > > if (ret < 0) { > > > error_report("vfio: failed to ioctl %d to container: ret=%d, %s", > > > _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); > > > + return -errno; > > > } > > > } > > > > This highlights how terrible this ioctl passthrough interface is; what's > > a caller supposed to do on error? Here we don't have visibility into > > the ioctl being called in order to do any unwind on error. The caller > > doesn't have the context to unwind only the failed containers. Is > > returning errno here really sufficient for anyone to figure out how to > > proceed or should we just cut our loses and abort()? What's the least > > horrible way we can realistically handle a failure here? Thanks, > > It seemed like I asked this before and it turns out that I did: > > https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html > > "Gavin says yes" is not really a supportable solution when I stumbled on > it again and David doesn't know why it's safe either (nor does it answer > my question of how does this work). At a minimum, the reasoning why > this is safe for the ioctls we currently allow here needs to be spelled > out in a comment. We could easily make the mistake of adding another > ioctl to the passthrough for which those assumptions are not valid. We > may even want to abort if it's not one of the ioctls we've evaluated so > we don't overlook this problem later. Thanks, Yeah, you're right. This is a mess. What we need to do here is to make vfio_container_ioctl() take a VFIOContainer object as the name suggests. The the callers will need to either use it on a specific container, or iterate across all the containers in the VFIOAddressSpace as appropriate for the specific operation. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices 2015-09-14 4:04 ` David Gibson @ 2015-09-14 8:56 ` Alexey Kardashevskiy 2015-09-14 11:42 ` David Gibson 0 siblings, 1 reply; 9+ messages in thread From: Alexey Kardashevskiy @ 2015-09-14 8:56 UTC (permalink / raw) To: David Gibson, Alex Williamson; +Cc: qemu-ppc, qemu-devel, Gavin Shan On 09/14/2015 02:04 PM, David Gibson wrote: > On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: >> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: >>> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: >>>> So far there were 2 limitations enforced on an emulated PHB >>>> regarding VFIO: >>>> 1) only one IOMMU group per IOMMU container was allowed and >>>> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; >>>> 2) only one IOMMU container per PHB was allowed as a group >>>> can only be attached to one container. >>>> >>>> However these are not really necessary so we are removing them here. >>>> >>>> This deprecates IOMMU group ID and changes vfio_container_do_ioctl() >>>> not to receive it. Instead of getting a container from a group ID, >>>> this calls ioctl() for every container attached to the PHB address space. >>>> This allows multiple containers on the same PHB which means multiple >>>> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call >>>> to be passed to every container separately which will affect >>>> the performance. For 32bit devices it is still recommended to put >>>> every group to a separate PHB. >>>> >>>> Since the existing VFIO code is already trying to share a container for >>>> multiple groups, just removing a group id from >>>> the vfio_container_do_ioctl() allows the existing code to share >>>> a container if it is supported by the host kernel. >>>> >>>> This replaces the check for a group id to be set correctly with >>>> the check that it is not set. >>>> >>>> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState >>>> members is accessed here. >>>> >>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>> --- >>>> hw/ppc/spapr_pci.c | 10 +++++----- >>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- >>>> hw/vfio/common.c | 22 ++++++---------------- >>>> include/hw/vfio/vfio.h | 3 +-- >>>> 4 files changed, 18 insertions(+), 34 deletions(-) >>>> >>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c >>>> index 4e289cb..1b7559d 100644 >>>> --- a/hw/ppc/spapr_pci.c >>>> +++ b/hw/ppc/spapr_pci.c >>>> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) >>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); >>>> >>>> if (sphb->vfio_num) { >>>> - if (sphb->iommugroupid == -1) { >>>> - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); >>>> - return -1; >>>> - } >>>> - >>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb); >>>> if (ret) { >>>> error_report("Unable to get DMA32 info from VFIO"); >>>> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) >>>> PCIBus *bus; >>>> uint64_t msi_window_size = 4096; >>>> >>>> + if ((sphb->iommugroupid != -1) && >>>> + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { >>>> + error_report("Warning: iommugroupid is deprecated and will be ignored"); >>>> + } >>>> + >>>> if (sphb->index != (uint32_t)-1) { >>>> hwaddr windows_base; >>>> >>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c >>>> index f94d8a4..48137d5 100644 >>>> --- a/hw/ppc/spapr_pci_vfio.c >>>> +++ b/hw/ppc/spapr_pci_vfio.c >>>> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) >>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; >>>> int ret; >>>> >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, >>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); >>>> if (ret) { >>>> return ret; >>>> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) >>>> .op = VFIO_EEH_PE_ENABLE >>>> }; >>>> >>>> - vfio_container_ioctl(&sphb->iommu_as, >>>> - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); >>>> + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> } >>>> >>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, >>>> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_HW_ERROR; >>>> } >>>> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) >>>> int ret; >>>> >>>> op.op = VFIO_EEH_PE_GET_STATE; >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_HW_ERROR; >>>> } >>>> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) >>>> int ret; >>>> >>>> op.op = VFIO_EEH_PE_CONFIGURE; >>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>> - VFIO_EEH_PE_OP, &op); >>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>> if (ret < 0) { >>>> return RTAS_OUT_PARAM_ERROR; >>>> } >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index 85ee9b0..a00644e 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) >>>> close(vbasedev->fd); >>>> } >>>> >>>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, >>>> - int req, void *param) >>>> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) >>>> { >>>> - VFIOGroup *group; >>>> VFIOContainer *container; >>>> int ret = -1; >>>> + VFIOAddressSpace *space = vfio_get_address_space(as); >>>> >>>> - group = vfio_get_group(groupid, as); >>>> - if (!group) { >>>> - error_report("vfio: group %d not registered", groupid); >>>> - return ret; >>>> - } >>>> - >>>> - container = group->container; >>>> - if (group->container) { >>>> + QLIST_FOREACH(container, &space->containers, next) { >>>> ret = ioctl(container->fd, req, param); >>>> if (ret < 0) { >>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s", >>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); >>>> + return -errno; >>>> } >>>> } >>> >>> This highlights how terrible this ioctl passthrough interface is; what's >>> a caller supposed to do on error? Here we don't have visibility into >>> the ioctl being called in order to do any unwind on error. The caller >>> doesn't have the context to unwind only the failed containers. Is >>> returning errno here really sufficient for anyone to figure out how to >>> proceed or should we just cut our loses and abort()? What's the least >>> horrible way we can realistically handle a failure here? Thanks, >> >> It seemed like I asked this before and it turns out that I did: >> >> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html >> >> "Gavin says yes" is not really a supportable solution when I stumbled on >> it again and David doesn't know why it's safe either (nor does it answer >> my question of how does this work). At a minimum, the reasoning why >> this is safe for the ioctls we currently allow here needs to be spelled >> out in a comment. We could easily make the mistake of adding another >> ioctl to the passthrough for which those assumptions are not valid. We >> may even want to abort if it's not one of the ioctls we've evaluated so >> we don't overlook this problem later. Thanks, > > Yeah, you're right. This is a mess. Right. I have been testing/studying this lately a lot, will fix this too. > What we need to do here is to make vfio_container_ioctl() take a > VFIOContainer object as the name suggests. The the callers will need > to either use it on a specific container, or iterate across all the > containers in the VFIOAddressSpace as appropriate for the specific > operation. The callers - spapr code - do not deal with containers, these are a VFIO internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace. So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?) makes more sense here. Which I can make a patch for. Or change vfio_container_ioctl() as you suggest (+make is static) and add public vfio_as_ioctl(). Which one to pick? -- Alexey ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices 2015-09-14 8:56 ` Alexey Kardashevskiy @ 2015-09-14 11:42 ` David Gibson 2015-09-14 14:37 ` Alexey Kardashevskiy 0 siblings, 1 reply; 9+ messages in thread From: David Gibson @ 2015-09-14 11:42 UTC (permalink / raw) To: Alexey Kardashevskiy; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan [-- Attachment #1: Type: text/plain, Size: 10589 bytes --] On Mon, Sep 14, 2015 at 06:56:35PM +1000, Alexey Kardashevskiy wrote: > On 09/14/2015 02:04 PM, David Gibson wrote: > >On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: > >>On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: > >>>On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: > >>>>So far there were 2 limitations enforced on an emulated PHB > >>>>regarding VFIO: > >>>>1) only one IOMMU group per IOMMU container was allowed and > >>>>the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; > >>>>2) only one IOMMU container per PHB was allowed as a group > >>>>can only be attached to one container. > >>>> > >>>>However these are not really necessary so we are removing them here. > >>>> > >>>>This deprecates IOMMU group ID and changes vfio_container_do_ioctl() > >>>>not to receive it. Instead of getting a container from a group ID, > >>>>this calls ioctl() for every container attached to the PHB address space. > >>>>This allows multiple containers on the same PHB which means multiple > >>>>groups per PHB. Note that this will lead to every H_PUT_TCE&etc call > >>>>to be passed to every container separately which will affect > >>>>the performance. For 32bit devices it is still recommended to put > >>>>every group to a separate PHB. > >>>> > >>>>Since the existing VFIO code is already trying to share a container for > >>>>multiple groups, just removing a group id from > >>>>the vfio_container_do_ioctl() allows the existing code to share > >>>>a container if it is supported by the host kernel. > >>>> > >>>>This replaces the check for a group id to be set correctly with > >>>>the check that it is not set. > >>>> > >>>>This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState > >>>>members is accessed here. > >>>> > >>>>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > >>>>--- > >>>> hw/ppc/spapr_pci.c | 10 +++++----- > >>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- > >>>> hw/vfio/common.c | 22 ++++++---------------- > >>>> include/hw/vfio/vfio.h | 3 +-- > >>>> 4 files changed, 18 insertions(+), 34 deletions(-) > >>>> > >>>>diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >>>>index 4e289cb..1b7559d 100644 > >>>>--- a/hw/ppc/spapr_pci.c > >>>>+++ b/hw/ppc/spapr_pci.c > >>>>@@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) > >>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); > >>>> > >>>> if (sphb->vfio_num) { > >>>>- if (sphb->iommugroupid == -1) { > >>>>- error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); > >>>>- return -1; > >>>>- } > >>>>- > >>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb); > >>>> if (ret) { > >>>> error_report("Unable to get DMA32 info from VFIO"); > >>>>@@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) > >>>> PCIBus *bus; > >>>> uint64_t msi_window_size = 4096; > >>>> > >>>>+ if ((sphb->iommugroupid != -1) && > >>>>+ object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { > >>>>+ error_report("Warning: iommugroupid is deprecated and will be ignored"); > >>>>+ } > >>>>+ > >>>> if (sphb->index != (uint32_t)-1) { > >>>> hwaddr windows_base; > >>>> > >>>>diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c > >>>>index f94d8a4..48137d5 100644 > >>>>--- a/hw/ppc/spapr_pci_vfio.c > >>>>+++ b/hw/ppc/spapr_pci_vfio.c > >>>>@@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) > >>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; > >>>> int ret; > >>>> > >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, > >>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > >>>> if (ret) { > >>>> return ret; > >>>>@@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) > >>>> .op = VFIO_EEH_PE_ENABLE > >>>> }; > >>>> > >>>>- vfio_container_ioctl(&sphb->iommu_as, > >>>>- sphb->iommugroupid, VFIO_EEH_PE_OP, &op); > >>>>+ vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > >>>> } > >>>> > >>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > >>>>@@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>> > >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_HW_ERROR; > >>>> } > >>>>@@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) > >>>> int ret; > >>>> > >>>> op.op = VFIO_EEH_PE_GET_STATE; > >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>>@@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>> > >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_HW_ERROR; > >>>> } > >>>>@@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) > >>>> int ret; > >>>> > >>>> op.op = VFIO_EEH_PE_CONFIGURE; > >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, > >>>>- VFIO_EEH_PE_OP, &op); > >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); > >>>> if (ret < 0) { > >>>> return RTAS_OUT_PARAM_ERROR; > >>>> } > >>>>diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>>>index 85ee9b0..a00644e 100644 > >>>>--- a/hw/vfio/common.c > >>>>+++ b/hw/vfio/common.c > >>>>@@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) > >>>> close(vbasedev->fd); > >>>> } > >>>> > >>>>-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, > >>>>- int req, void *param) > >>>>+static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) > >>>> { > >>>>- VFIOGroup *group; > >>>> VFIOContainer *container; > >>>> int ret = -1; > >>>>+ VFIOAddressSpace *space = vfio_get_address_space(as); > >>>> > >>>>- group = vfio_get_group(groupid, as); > >>>>- if (!group) { > >>>>- error_report("vfio: group %d not registered", groupid); > >>>>- return ret; > >>>>- } > >>>>- > >>>>- container = group->container; > >>>>- if (group->container) { > >>>>+ QLIST_FOREACH(container, &space->containers, next) { > >>>> ret = ioctl(container->fd, req, param); > >>>> if (ret < 0) { > >>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s", > >>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); > >>>>+ return -errno; > >>>> } > >>>> } > >>> > >>>This highlights how terrible this ioctl passthrough interface is; what's > >>>a caller supposed to do on error? Here we don't have visibility into > >>>the ioctl being called in order to do any unwind on error. The caller > >>>doesn't have the context to unwind only the failed containers. Is > >>>returning errno here really sufficient for anyone to figure out how to > >>>proceed or should we just cut our loses and abort()? What's the least > >>>horrible way we can realistically handle a failure here? Thanks, > >> > >>It seemed like I asked this before and it turns out that I did: > >> > >>https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html > >> > >>"Gavin says yes" is not really a supportable solution when I stumbled on > >>it again and David doesn't know why it's safe either (nor does it answer > >>my question of how does this work). At a minimum, the reasoning why > >>this is safe for the ioctls we currently allow here needs to be spelled > >>out in a comment. We could easily make the mistake of adding another > >>ioctl to the passthrough for which those assumptions are not valid. We > >>may even want to abort if it's not one of the ioctls we've evaluated so > >>we don't overlook this problem later. Thanks, > > > >Yeah, you're right. This is a mess. > > > Right. I have been testing/studying this lately a lot, will fix this too. > > > >What we need to do here is to make vfio_container_ioctl() take a > >VFIOContainer object as the name suggests. The the callers will need > >to either use it on a specific container, or iterate across all the > >containers in the VFIOAddressSpace as appropriate for the specific > >operation. > > The callers - spapr code - do not deal with containers, these are a VFIO > internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace. > > So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?) > makes more sense here. Which I can make a patch for. Or change > vfio_container_ioctl() as you suggest (+make is static) and add public > vfio_as_ioctl(). Which one to pick? No, that doesn't address the problem. The ioctl is operating on the container, the wrapper needs to take the container. Whether the ioctl can be safely replicated across all containers in the VFIOAddressSpace, or can only be used if you have additional information to select a specific container *cannot* be known unless you know exactly what ioctl() you're talking about. That decision has to move to the callers, which know what they're doing. If those callers don't know about individual containers now, then they have to learn. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices 2015-09-14 11:42 ` David Gibson @ 2015-09-14 14:37 ` Alexey Kardashevskiy 0 siblings, 0 replies; 9+ messages in thread From: Alexey Kardashevskiy @ 2015-09-14 14:37 UTC (permalink / raw) To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan On 09/14/2015 09:42 PM, David Gibson wrote: > On Mon, Sep 14, 2015 at 06:56:35PM +1000, Alexey Kardashevskiy wrote: >> On 09/14/2015 02:04 PM, David Gibson wrote: >>> On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote: >>>> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote: >>>>> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote: >>>>>> So far there were 2 limitations enforced on an emulated PHB >>>>>> regarding VFIO: >>>>>> 1) only one IOMMU group per IOMMU container was allowed and >>>>>> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this; >>>>>> 2) only one IOMMU container per PHB was allowed as a group >>>>>> can only be attached to one container. >>>>>> >>>>>> However these are not really necessary so we are removing them here. >>>>>> >>>>>> This deprecates IOMMU group ID and changes vfio_container_do_ioctl() >>>>>> not to receive it. Instead of getting a container from a group ID, >>>>>> this calls ioctl() for every container attached to the PHB address space. >>>>>> This allows multiple containers on the same PHB which means multiple >>>>>> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call >>>>>> to be passed to every container separately which will affect >>>>>> the performance. For 32bit devices it is still recommended to put >>>>>> every group to a separate PHB. >>>>>> >>>>>> Since the existing VFIO code is already trying to share a container for >>>>>> multiple groups, just removing a group id from >>>>>> the vfio_container_do_ioctl() allows the existing code to share >>>>>> a container if it is supported by the host kernel. >>>>>> >>>>>> This replaces the check for a group id to be set correctly with >>>>>> the check that it is not set. >>>>>> >>>>>> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState >>>>>> members is accessed here. >>>>>> >>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>> --- >>>>>> hw/ppc/spapr_pci.c | 10 +++++----- >>>>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++----------- >>>>>> hw/vfio/common.c | 22 ++++++---------------- >>>>>> include/hw/vfio/vfio.h | 3 +-- >>>>>> 4 files changed, 18 insertions(+), 34 deletions(-) >>>>>> >>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c >>>>>> index 4e289cb..1b7559d 100644 >>>>>> --- a/hw/ppc/spapr_pci.c >>>>>> +++ b/hw/ppc/spapr_pci.c >>>>>> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb) >>>>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb); >>>>>> >>>>>> if (sphb->vfio_num) { >>>>>> - if (sphb->iommugroupid == -1) { >>>>>> - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid); >>>>>> - return -1; >>>>>> - } >>>>>> - >>>>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb); >>>>>> if (ret) { >>>>>> error_report("Unable to get DMA32 info from VFIO"); >>>>>> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) >>>>>> PCIBus *bus; >>>>>> uint64_t msi_window_size = 4096; >>>>>> >>>>>> + if ((sphb->iommugroupid != -1) && >>>>>> + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) { >>>>>> + error_report("Warning: iommugroupid is deprecated and will be ignored"); >>>>>> + } >>>>>> + >>>>>> if (sphb->index != (uint32_t)-1) { >>>>>> hwaddr windows_base; >>>>>> >>>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c >>>>>> index f94d8a4..48137d5 100644 >>>>>> --- a/hw/ppc/spapr_pci_vfio.c >>>>>> +++ b/hw/ppc/spapr_pci_vfio.c >>>>>> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb) >>>>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; >>>>>> int ret; >>>>>> >>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, >>>>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); >>>>>> if (ret) { >>>>>> return ret; >>>>>> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb) >>>>>> .op = VFIO_EEH_PE_ENABLE >>>>>> }; >>>>>> >>>>>> - vfio_container_ioctl(&sphb->iommu_as, >>>>>> - sphb->iommugroupid, VFIO_EEH_PE_OP, &op); >>>>>> + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>>>> } >>>>>> >>>>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, >>>>>> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb, >>>>>> return RTAS_OUT_PARAM_ERROR; >>>>>> } >>>>>> >>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>>>> - VFIO_EEH_PE_OP, &op); >>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>>>> if (ret < 0) { >>>>>> return RTAS_OUT_HW_ERROR; >>>>>> } >>>>>> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state) >>>>>> int ret; >>>>>> >>>>>> op.op = VFIO_EEH_PE_GET_STATE; >>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>>>> - VFIO_EEH_PE_OP, &op); >>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>>>> if (ret < 0) { >>>>>> return RTAS_OUT_PARAM_ERROR; >>>>>> } >>>>>> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option) >>>>>> return RTAS_OUT_PARAM_ERROR; >>>>>> } >>>>>> >>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>>>> - VFIO_EEH_PE_OP, &op); >>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>>>> if (ret < 0) { >>>>>> return RTAS_OUT_HW_ERROR; >>>>>> } >>>>>> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb) >>>>>> int ret; >>>>>> >>>>>> op.op = VFIO_EEH_PE_CONFIGURE; >>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid, >>>>>> - VFIO_EEH_PE_OP, &op); >>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op); >>>>>> if (ret < 0) { >>>>>> return RTAS_OUT_PARAM_ERROR; >>>>>> } >>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>>>> index 85ee9b0..a00644e 100644 >>>>>> --- a/hw/vfio/common.c >>>>>> +++ b/hw/vfio/common.c >>>>>> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev) >>>>>> close(vbasedev->fd); >>>>>> } >>>>>> >>>>>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid, >>>>>> - int req, void *param) >>>>>> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param) >>>>>> { >>>>>> - VFIOGroup *group; >>>>>> VFIOContainer *container; >>>>>> int ret = -1; >>>>>> + VFIOAddressSpace *space = vfio_get_address_space(as); >>>>>> >>>>>> - group = vfio_get_group(groupid, as); >>>>>> - if (!group) { >>>>>> - error_report("vfio: group %d not registered", groupid); >>>>>> - return ret; >>>>>> - } >>>>>> - >>>>>> - container = group->container; >>>>>> - if (group->container) { >>>>>> + QLIST_FOREACH(container, &space->containers, next) { >>>>>> ret = ioctl(container->fd, req, param); >>>>>> if (ret < 0) { >>>>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s", >>>>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno)); >>>>>> + return -errno; >>>>>> } >>>>>> } >>>>> >>>>> This highlights how terrible this ioctl passthrough interface is; what's >>>>> a caller supposed to do on error? Here we don't have visibility into >>>>> the ioctl being called in order to do any unwind on error. The caller >>>>> doesn't have the context to unwind only the failed containers. Is >>>>> returning errno here really sufficient for anyone to figure out how to >>>>> proceed or should we just cut our loses and abort()? What's the least >>>>> horrible way we can realistically handle a failure here? Thanks, >>>> >>>> It seemed like I asked this before and it turns out that I did: >>>> >>>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html >>>> >>>> "Gavin says yes" is not really a supportable solution when I stumbled on >>>> it again and David doesn't know why it's safe either (nor does it answer >>>> my question of how does this work). At a minimum, the reasoning why >>>> this is safe for the ioctls we currently allow here needs to be spelled >>>> out in a comment. We could easily make the mistake of adding another >>>> ioctl to the passthrough for which those assumptions are not valid. We >>>> may even want to abort if it's not one of the ioctls we've evaluated so >>>> we don't overlook this problem later. Thanks, >>> >>> Yeah, you're right. This is a mess. >> >> >> Right. I have been testing/studying this lately a lot, will fix this too. >> >> >>> What we need to do here is to make vfio_container_ioctl() take a >>> VFIOContainer object as the name suggests. The the callers will need >>> to either use it on a specific container, or iterate across all the >>> containers in the VFIOAddressSpace as appropriate for the specific >>> operation. >> >> The callers - spapr code - do not deal with containers, these are a VFIO >> internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace. >> >> So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?) >> makes more sense here. Which I can make a patch for. Or change >> vfio_container_ioctl() as you suggest (+make is static) and add public >> vfio_as_ioctl(). Which one to pick? > > No, that doesn't address the problem. The ioctl is operating on the > container, the wrapper needs to take the container. Whether the ioctl > can be safely replicated across all containers in the > VFIOAddressSpace, or can only be used if you have additional > information to select a specific container *cannot* be known unless > you know exactly what ioctl() you're talking about. That decision has > to move to the callers, which know what they're doing. > > If those callers don't know about individual containers now, then they > have to learn. I lost you here. Guests (and therefore the vfio_container_ioctl() callers which are RTAS handlers) work with either LIOBNs or phbid+bus+devfn (where bus+devfn are not used) so it is LIOBN or PHB. If there are multiple containers, there is no way (and no need) to call these ioctls on some containers only. If that really matters, the user is supposed to use a separate PHB. If bus+devfn are used (which I'll double check), then we might want another helper in QEMU - vfio_pci_ioctl(PCIDevice *pdev) and new vfio-pci-spapr PCI driver on the host (of #ifdef in the existing vfio-pci driver) but I doubt this is the case. -- Alexey ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-09-14 14:37 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-09-03 4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy 2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy 2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy 2015-09-10 2:43 ` Alex Williamson 2015-09-11 20:03 ` Alex Williamson 2015-09-14 4:04 ` David Gibson 2015-09-14 8:56 ` Alexey Kardashevskiy 2015-09-14 11:42 ` David Gibson 2015-09-14 14:37 ` Alexey Kardashevskiy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).