* [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb
@ 2015-09-03 4:40 Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
0 siblings, 2 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-03 4:40 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Gavin Shan,
David Gibson
This allows sharing the same PHB for both emulated and VFIO devices.
This used to be a part of the Dynamic DMA window patchset. Since it was
moved out and reworked quite a lot, all reviewed-by's got outdated.
This is based on dgibson/spapr-next sha1 f91ffef.
Please comment. Thanks.
Changes:
v2:
* fixed bus_offset
* moved DMA setup to a spapr_phb_dma_reset() helper - DDW patches would
add it anyway
* verified that migration still works
Alexey Kardashevskiy (2):
spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
spapr_pci: Remove constraints about VFIO-PCI devices
hw/ppc/Makefile.objs | 5 +-
hw/ppc/spapr_pci.c | 195 +++++++++++++++++++++++++++-----------------
hw/ppc/spapr_pci_vfio.c | 147 +++++++++++++--------------------
hw/vfio/common.c | 22 ++---
include/hw/pci-host/spapr.h | 23 +++---
include/hw/vfio/vfio.h | 3 +-
6 files changed, 195 insertions(+), 200 deletions(-)
--
2.4.0.rc3.8.gfb3e7d5
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
2015-09-03 4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
@ 2015-09-03 4:40 ` Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
1 sibling, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-03 4:40 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Gavin Shan,
David Gibson
sPAPRTCETable is handling 2 TCE tables already:
1) guest view of the TCE table - emulated devices use only this table;
2) hardware IOMMU table - VFIO PCI devices use it for actual work but
it does not replace 1) and it is not visible to the guest.
The initialization of this table is driven by vfio-pci device,
DMA map/unmap requests are handled via MemoryListener so there is very
little to do in spapr-pci-vfio-host-bridge.
This moves VFIO bits to the generic spapr-pci-host-bridge which allows
putting emulated and VFIO devices on the same PHB. It is still possible
to create multiple PHBs and avoid sharing PHB resouces for emulated and
VFIO devices.
If there is no VFIO-PCI device attaches, no special ioctls will be called.
If there are some VFIO-PCI devices attached, PHB may refuse to attach
another VFIO-PCI device if a VFIO container on the host kernel side
does not support container sharing.
This makes spapr-pci-vfio-host-bridge type equal to spapr-pci-host-bridge
except it has an additional "iommu" property so spapr-pci-vfio-host-bridge
still should be used for VFIO devices. The next patch will remove IOMMU ID
property and allow putting VFIO-PCI devices onto spapr-pci-host-bridge.
This adds a number of VFIO-PCI devices currently attached to a PHB as
PHB needs to know whether to do DMA setup for VFIO or not. Since
at the moment of the PHB's realize() invocation we cannot tell yet
how many VFIO-PCI devices are there (they are not attached yet),
this moves DMA setup to the reset handler (which is also called when
QEMU is about to receive migration).
This moves PCI device lookup from spapr_phb_vfio_eeh_set_option() to
rtas_ibm_set_eeh_option() as we need to know if the device is "vfio-pci"
and decide whether to call spapr_phb_vfio_eeh_set_option() or not.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* used tcet->bus_offset in the common PHB code as this is what VFIO PHB
needs
---
hw/ppc/Makefile.objs | 5 +-
hw/ppc/spapr_pci.c | 195 +++++++++++++++++++++++++++-----------------
hw/ppc/spapr_pci_vfio.c | 144 +++++++++++++-------------------
include/hw/pci-host/spapr.h | 23 +++---
4 files changed, 189 insertions(+), 178 deletions(-)
diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..6c06fcf 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,10 +3,7 @@ obj-y += ppc.o ppc_booke.o
# IBM pSeries (sPAPR)
obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
-ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
-obj-y += spapr_pci_vfio.o
-endif
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_pci_vfio.o spapr_rtc.o spapr_drc.o
# PowerPC 4xx boards
obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index bc30631..4e289cb 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -430,7 +430,6 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
target_ulong rets)
{
sPAPRPHBState *sphb;
- sPAPRPHBClass *spc;
PCIDevice *pdev;
uint32_t addr, option;
uint64_t buid;
@@ -445,7 +444,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
option = rtas_ld(args, 3);
sphb = spapr_pci_find_phb(spapr, buid);
- if (!sphb) {
+ if (!sphb || (sphb->vfio_num == 0)) {
goto param_error_exit;
}
@@ -455,12 +454,7 @@ static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
goto param_error_exit;
}
- spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
- if (!spc->eeh_set_option) {
- goto param_error_exit;
- }
-
- ret = spc->eeh_set_option(sphb, addr, option);
+ ret = spapr_phb_vfio_eeh_set_option(sphb, pdev, option);
rtas_st(rets, 0, ret);
return;
@@ -475,7 +469,6 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
target_ulong rets)
{
sPAPRPHBState *sphb;
- sPAPRPHBClass *spc;
PCIDevice *pdev;
uint32_t addr, option;
uint64_t buid;
@@ -486,12 +479,7 @@ static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
buid = rtas_ldq(args, 1);
sphb = spapr_pci_find_phb(spapr, buid);
- if (!sphb) {
- goto param_error_exit;
- }
-
- spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
- if (!spc->eeh_set_option) {
+ if (!sphb || (sphb->vfio_num == 0)) {
goto param_error_exit;
}
@@ -531,7 +519,6 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
target_ulong rets)
{
sPAPRPHBState *sphb;
- sPAPRPHBClass *spc;
uint64_t buid;
int state, ret;
@@ -541,16 +528,11 @@ static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
buid = rtas_ldq(args, 1);
sphb = spapr_pci_find_phb(spapr, buid);
- if (!sphb) {
+ if (!sphb || (sphb->vfio_num == 0)) {
goto param_error_exit;
}
- spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
- if (!spc->eeh_get_state) {
- goto param_error_exit;
- }
-
- ret = spc->eeh_get_state(sphb, &state);
+ ret = spapr_phb_vfio_eeh_get_state(sphb, &state);
rtas_st(rets, 0, ret);
if (ret != RTAS_OUT_SUCCESS) {
return;
@@ -575,7 +557,6 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu,
target_ulong rets)
{
sPAPRPHBState *sphb;
- sPAPRPHBClass *spc;
uint32_t option;
uint64_t buid;
int ret;
@@ -587,16 +568,11 @@ static void rtas_ibm_set_slot_reset(PowerPCCPU *cpu,
buid = rtas_ldq(args, 1);
option = rtas_ld(args, 3);
sphb = spapr_pci_find_phb(spapr, buid);
- if (!sphb) {
+ if (!sphb || (sphb->vfio_num == 0)) {
goto param_error_exit;
}
- spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
- if (!spc->eeh_reset) {
- goto param_error_exit;
- }
-
- ret = spc->eeh_reset(sphb, option);
+ ret = spapr_phb_vfio_eeh_reset(sphb, option);
rtas_st(rets, 0, ret);
return;
@@ -611,7 +587,6 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu,
target_ulong rets)
{
sPAPRPHBState *sphb;
- sPAPRPHBClass *spc;
uint64_t buid;
int ret;
@@ -621,16 +596,11 @@ static void rtas_ibm_configure_pe(PowerPCCPU *cpu,
buid = rtas_ldq(args, 1);
sphb = spapr_pci_find_phb(spapr, buid);
- if (!sphb) {
+ if (!sphb || (sphb->vfio_num == 0)) {
goto param_error_exit;
}
- spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
- if (!spc->eeh_configure) {
- goto param_error_exit;
- }
-
- ret = spc->eeh_configure(sphb);
+ ret = spapr_phb_vfio_eeh_configure(sphb);
rtas_st(rets, 0, ret);
return;
@@ -646,7 +616,6 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu,
target_ulong rets)
{
sPAPRPHBState *sphb;
- sPAPRPHBClass *spc;
int option;
uint64_t buid;
@@ -656,12 +625,7 @@ static void rtas_ibm_slot_error_detail(PowerPCCPU *cpu,
buid = rtas_ldq(args, 1);
sphb = spapr_pci_find_phb(spapr, buid);
- if (!sphb) {
- goto param_error_exit;
- }
-
- spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
- if (!spc->eeh_set_option) {
+ if (!sphb || (sphb->vfio_num == 0)) {
goto param_error_exit;
}
@@ -810,6 +774,102 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev)
return buf;
}
+static void spapr_phb_walk_devices(PCIBus *bus, PCIDevice *pdev, void *opaque)
+{
+ sPAPRPHBState *sphb = opaque;
+ PCIBus *sec_bus;
+
+ if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
+ ++sphb->vfio_num;
+ }
+
+ if ((pci_default_read_config(pdev, PCI_HEADER_TYPE, 1) !=
+ PCI_HEADER_TYPE_BRIDGE)) {
+ return;
+ }
+
+ sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
+ if (!sec_bus) {
+ return;
+ }
+
+ pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
+ spapr_phb_walk_devices, sphb);
+}
+
+static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
+{
+ int ret;
+ PCIBus *bus = PCI_HOST_BRIDGE(sphb)->bus;
+
+ sphb->dma32_window_start = 0;
+ sphb->dma32_window_size = SPAPR_PCI_DMA32_SIZE;
+
+ /* Update the number of VFIO-PCI devices on the PHB */
+ sphb->vfio_num = 0;
+ pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
+
+ if (sphb->vfio_num) {
+ if (sphb->iommugroupid == -1) {
+ error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
+ return -1;
+ }
+
+ ret = spapr_phb_vfio_dma_capabilities_update(sphb);
+ if (ret) {
+ error_report("Unable to get DMA32 info from VFIO");
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static int spapr_phb_dma_init_window(sPAPRPHBState *sphb,
+ uint32_t liobn, uint32_t page_shift,
+ uint64_t window_size)
+{
+ sPAPRTCETable *tcet;
+ uint64_t bus_offset = sphb->dma32_window_start;
+ uint32_t nb_table = window_size >> page_shift;
+
+ if (!nb_table) {
+ return -1;
+ }
+
+ tcet = spapr_tce_new_table(DEVICE(sphb), liobn, bus_offset,
+ page_shift, nb_table, sphb->vfio_num > 0);
+
+ return tcet ? 0 : -1;
+}
+
+static int spapr_phb_dma_reset(sPAPRPHBState *sphb)
+{
+ sPAPRTCETable *tcet;
+
+ if (spapr_phb_dma_capabilities_update(sphb)) {
+ return -1;
+ }
+
+ /* Register default 32bit DMA window */
+ tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
+ if (!tcet) {
+ spapr_phb_dma_init_window(sphb, sphb->dma_liobn, SPAPR_TCE_PAGE_SHIFT,
+ sphb->dma32_window_size);
+
+ tcet = spapr_tce_find_by_liobn(sphb->dma_liobn);
+ if (!tcet) {
+ error_report("No default TCE table for %s", sphb->dtbusname);
+ return -1;
+ }
+
+ memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
+ spapr_tce_get_iommu(tcet));
+ }
+
+ return 0;
+}
+
/* Macros to operate with address in OF binding to PCI */
#define b_x(x, p, l) (((x) & ((1<<(l))-1)) << (p))
#define b_n(x) b_x((x), 31, 1) /* 0 if relocatable */
@@ -1217,7 +1277,6 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
SysBusDevice *s = SYS_BUS_DEVICE(dev);
sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
PCIHostState *phb = PCI_HOST_BRIDGE(s);
- sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
char *namebuf;
int i;
PCIBus *bus;
@@ -1371,35 +1430,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
}
}
- if (!info->finish_realize) {
- error_setg(errp, "finish_realize not defined");
- return;
- }
-
- info->finish_realize(sphb, errp);
-
sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
}
-static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
-{
- sPAPRTCETable *tcet;
- uint32_t nb_table;
-
- nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
- tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
- 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
- if (!tcet) {
- error_setg(errp, "Unable to create TCE table for %s",
- sphb->dtbusname);
- return ;
- }
-
- /* Register default 32bit DMA window */
- memory_region_add_subregion(&sphb->iommu_root, 0,
- spapr_tce_get_iommu(tcet));
-}
-
static int spapr_phb_children_reset(Object *child, void *opaque)
{
DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
@@ -1413,8 +1446,22 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
static void spapr_phb_reset(DeviceState *qdev)
{
+ sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
+
+ spapr_phb_dma_reset(sphb);
+
/* Reset the IOMMU state */
object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
+
+ if (sphb->vfio_num > 0) {
+ /*
+ * The PE might be in frozen state. To reenable the EEH
+ * functionality on it will clean the frozen state, which
+ * ensures that the contained PCI devices will work properly
+ * after reboot.
+ */
+ spapr_phb_vfio_eeh_reenable(sphb);
+ }
}
static Property spapr_phb_properties[] = {
@@ -1535,7 +1582,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
{
PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
DeviceClass *dc = DEVICE_CLASS(klass);
- sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
hc->root_bus_path = spapr_phb_root_bus_path;
@@ -1545,7 +1591,6 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_spapr_pci;
set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
dc->cannot_instantiate_with_device_add_yet = false;
- spc->finish_realize = spapr_phb_finish_realize;
hp->plug = spapr_phb_hot_plug_child;
hp->unplug = spapr_phb_hot_unplug_child;
}
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index cca45ed..f94d8a4 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -20,84 +20,47 @@
#include "hw/ppc/spapr.h"
#include "hw/pci-host/spapr.h"
#include "hw/pci/msix.h"
-#include "linux/vfio.h"
#include "hw/vfio/vfio.h"
+#ifdef CONFIG_LINUX
+#include "linux/vfio.h"
+
static Property spapr_phb_vfio_properties[] = {
- DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
+ DEFINE_PROP_INT32("iommu", sPAPRPHBState, iommugroupid, -1),
DEFINE_PROP_END_OF_LIST(),
};
-static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
+int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
{
- sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
int ret;
- sPAPRTCETable *tcet;
- uint32_t liobn = svphb->phb.dma_liobn;
- if (svphb->iommugroupid == -1) {
- error_setg(errp, "Wrong IOMMU group ID %d", svphb->iommugroupid);
- return;
- }
-
- ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
- VFIO_CHECK_EXTENSION,
- (void *) VFIO_SPAPR_TCE_IOMMU);
- if (ret != 1) {
- error_setg_errno(errp, -ret,
- "spapr-vfio: SPAPR extension is not supported");
- return;
- }
-
- ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+ ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
if (ret) {
- error_setg_errno(errp, -ret,
- "spapr-vfio: get info from container failed");
- return;
+ return ret;
}
- tcet = spapr_tce_new_table(DEVICE(sphb), liobn, info.dma32_window_start,
- SPAPR_TCE_PAGE_SHIFT,
- info.dma32_window_size >> SPAPR_TCE_PAGE_SHIFT,
- true);
- if (!tcet) {
- error_setg(errp, "spapr-vfio: failed to create VFIO TCE table");
- return;
- }
+ sphb->dma32_window_start = info.dma32_window_start;
+ sphb->dma32_window_size = info.dma32_window_size;
- /* Register default 32bit DMA window */
- memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
- spapr_tce_get_iommu(tcet));
+ return ret;
}
-static void spapr_phb_vfio_eeh_reenable(sPAPRPHBVFIOState *svphb)
+void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
{
struct vfio_eeh_pe_op op = {
.argsz = sizeof(op),
.op = VFIO_EEH_PE_ENABLE
};
- vfio_container_ioctl(&svphb->phb.iommu_as,
- svphb->iommugroupid, VFIO_EEH_PE_OP, &op);
+ vfio_container_ioctl(&sphb->iommu_as,
+ sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
}
-static void spapr_phb_vfio_reset(DeviceState *qdev)
+int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
+ PCIDevice *pdev, int option)
{
- /*
- * The PE might be in frozen state. To reenable the EEH
- * functionality on it will clean the frozen state, which
- * ensures that the contained PCI devices will work properly
- * after reboot.
- */
- spapr_phb_vfio_eeh_reenable(SPAPR_PCI_VFIO_HOST_BRIDGE(qdev));
-}
-
-static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
- unsigned int addr, int option)
-{
- sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
int ret;
@@ -105,25 +68,9 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
case RTAS_EEH_DISABLE:
op.op = VFIO_EEH_PE_DISABLE;
break;
- case RTAS_EEH_ENABLE: {
- PCIHostState *phb;
- PCIDevice *pdev;
-
- /*
- * The EEH functionality is enabled on basis of PCI device,
- * instead of PE. We need check the validity of the PCI
- * device address.
- */
- phb = PCI_HOST_BRIDGE(sphb);
- pdev = pci_find_device(phb->bus,
- (addr >> 16) & 0xFF, (addr >> 8) & 0xFF);
- if (!pdev) {
- return RTAS_OUT_PARAM_ERROR;
- }
-
+ case RTAS_EEH_ENABLE:
op.op = VFIO_EEH_PE_ENABLE;
break;
- }
case RTAS_EEH_THAW_IO:
op.op = VFIO_EEH_PE_UNFREEZE_IO;
break;
@@ -134,7 +81,7 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
return RTAS_OUT_PARAM_ERROR;
}
- ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+ ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_HW_ERROR;
@@ -143,14 +90,13 @@ static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
return RTAS_OUT_SUCCESS;
}
-static int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
+int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
{
- sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
int ret;
op.op = VFIO_EEH_PE_GET_STATE;
- ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+ ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_PARAM_ERROR;
@@ -203,9 +149,8 @@ static void spapr_phb_vfio_eeh_pre_reset(sPAPRPHBState *sphb)
pci_for_each_bus(phb->bus, spapr_phb_vfio_eeh_clear_bus_msix, NULL);
}
-static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
+int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
{
- sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
int ret;
@@ -225,7 +170,7 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
return RTAS_OUT_PARAM_ERROR;
}
- ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+ ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_HW_ERROR;
@@ -234,14 +179,13 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
return RTAS_OUT_SUCCESS;
}
-static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
+int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
{
- sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };
int ret;
op.op = VFIO_EEH_PE_CONFIGURE;
- ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+ ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_PARAM_ERROR;
@@ -253,23 +197,14 @@ static int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
- sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
dc->props = spapr_phb_vfio_properties;
- dc->reset = spapr_phb_vfio_reset;
- spc->finish_realize = spapr_phb_vfio_finish_realize;
- spc->eeh_set_option = spapr_phb_vfio_eeh_set_option;
- spc->eeh_get_state = spapr_phb_vfio_eeh_get_state;
- spc->eeh_reset = spapr_phb_vfio_eeh_reset;
- spc->eeh_configure = spapr_phb_vfio_eeh_configure;
}
static const TypeInfo spapr_phb_vfio_info = {
.name = TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE,
.parent = TYPE_SPAPR_PCI_HOST_BRIDGE,
- .instance_size = sizeof(sPAPRPHBVFIOState),
.class_init = spapr_phb_vfio_class_init,
- .class_size = sizeof(sPAPRPHBClass),
};
static void spapr_pci_vfio_register_types(void)
@@ -278,3 +213,36 @@ static void spapr_pci_vfio_register_types(void)
}
type_init(spapr_pci_vfio_register_types)
+
+#else /* !CONFIG_LINUX */
+
+int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
+{
+ return -EINVAL;
+}
+
+int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
+ PCIDevice *pdev, int option)
+{
+ return RTAS_OUT_HW_ERROR;
+}
+
+int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
+{
+ return RTAS_OUT_HW_ERROR;
+}
+
+int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
+{
+ return RTAS_OUT_HW_ERROR;
+}
+
+int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
+{
+ return RTAS_OUT_HW_ERROR;
+}
+
+void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
+{
+}
+#endif
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 5322b56..7f3c712 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -47,12 +47,6 @@ typedef struct sPAPRPHBVFIOState sPAPRPHBVFIOState;
struct sPAPRPHBClass {
PCIHostBridgeClass parent_class;
-
- void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
- int (*eeh_set_option)(sPAPRPHBState *sphb, unsigned int addr, int option);
- int (*eeh_get_state)(sPAPRPHBState *sphb, int *state);
- int (*eeh_reset)(sPAPRPHBState *sphb, int option);
- int (*eeh_configure)(sPAPRPHBState *sphb);
};
typedef struct spapr_pci_msi {
@@ -91,12 +85,11 @@ struct sPAPRPHBState {
spapr_pci_msi_mig *msi_devs;
QLIST_ENTRY(sPAPRPHBState) list;
-};
-struct sPAPRPHBVFIOState {
- sPAPRPHBState phb;
-
- int32_t iommugroupid;
+ int32_t iommugroupid; /* obsolete */
+ uint32_t dma32_window_start;
+ uint32_t dma32_window_size;
+ unsigned vfio_num;
};
#define SPAPR_PCI_MAX_INDEX 255
@@ -138,4 +131,12 @@ sPAPRPHBState *spapr_pci_find_phb(sPAPRMachineState *spapr, uint64_t buid);
PCIDevice *spapr_pci_find_dev(sPAPRMachineState *spapr, uint64_t buid,
uint32_t config_addr);
+int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb);
+int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
+ PCIDevice *pdev, int option);
+int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state);
+int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option);
+int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb);
+void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb);
+
#endif /* __HW_SPAPR_PCI_H__ */
--
2.4.0.rc3.8.gfb3e7d5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
2015-09-03 4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
@ 2015-09-03 4:40 ` Alexey Kardashevskiy
2015-09-10 2:43 ` Alex Williamson
1 sibling, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-03 4:40 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Gavin Shan,
David Gibson
So far there were 2 limitations enforced on an emulated PHB
regarding VFIO:
1) only one IOMMU group per IOMMU container was allowed and
the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
2) only one IOMMU container per PHB was allowed as a group
can only be attached to one container.
However these are not really necessary so we are removing them here.
This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
not to receive it. Instead of getting a container from a group ID,
this calls ioctl() for every container attached to the PHB address space.
This allows multiple containers on the same PHB which means multiple
groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
to be passed to every container separately which will affect
the performance. For 32bit devices it is still recommended to put
every group to a separate PHB.
Since the existing VFIO code is already trying to share a container for
multiple groups, just removing a group id from
the vfio_container_do_ioctl() allows the existing code to share
a container if it is supported by the host kernel.
This replaces the check for a group id to be set correctly with
the check that it is not set.
This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
members is accessed here.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
hw/ppc/spapr_pci.c | 10 +++++-----
hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
hw/vfio/common.c | 22 ++++++----------------
include/hw/vfio/vfio.h | 3 +--
4 files changed, 18 insertions(+), 34 deletions(-)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4e289cb..1b7559d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
if (sphb->vfio_num) {
- if (sphb->iommugroupid == -1) {
- error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
- return -1;
- }
-
ret = spapr_phb_vfio_dma_capabilities_update(sphb);
if (ret) {
error_report("Unable to get DMA32 info from VFIO");
@@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
PCIBus *bus;
uint64_t msi_window_size = 4096;
+ if ((sphb->iommugroupid != -1) &&
+ object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
+ error_report("Warning: iommugroupid is deprecated and will be ignored");
+ }
+
if (sphb->index != (uint32_t)-1) {
hwaddr windows_base;
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index f94d8a4..48137d5 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
int ret;
- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
+ ret = vfio_container_ioctl(&sphb->iommu_as,
VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
if (ret) {
return ret;
@@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
.op = VFIO_EEH_PE_ENABLE
};
- vfio_container_ioctl(&sphb->iommu_as,
- sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
+ vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
}
int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
@@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
return RTAS_OUT_PARAM_ERROR;
}
- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
- VFIO_EEH_PE_OP, &op);
+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_HW_ERROR;
}
@@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
int ret;
op.op = VFIO_EEH_PE_GET_STATE;
- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
- VFIO_EEH_PE_OP, &op);
+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_PARAM_ERROR;
}
@@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
return RTAS_OUT_PARAM_ERROR;
}
- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
- VFIO_EEH_PE_OP, &op);
+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_HW_ERROR;
}
@@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
int ret;
op.op = VFIO_EEH_PE_CONFIGURE;
- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
- VFIO_EEH_PE_OP, &op);
+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
if (ret < 0) {
return RTAS_OUT_PARAM_ERROR;
}
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85ee9b0..a00644e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
close(vbasedev->fd);
}
-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
- int req, void *param)
+static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
{
- VFIOGroup *group;
VFIOContainer *container;
int ret = -1;
+ VFIOAddressSpace *space = vfio_get_address_space(as);
- group = vfio_get_group(groupid, as);
- if (!group) {
- error_report("vfio: group %d not registered", groupid);
- return ret;
- }
-
- container = group->container;
- if (group->container) {
+ QLIST_FOREACH(container, &space->containers, next) {
ret = ioctl(container->fd, req, param);
if (ret < 0) {
error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
_IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
+ return -errno;
}
}
- vfio_put_group(group);
-
return ret;
}
-int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
- int req, void *param)
+int vfio_container_ioctl(AddressSpace *as, int req, void *param)
{
/* We allow only certain ioctls to the container */
switch (req) {
@@ -968,5 +958,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
return -1;
}
- return vfio_container_do_ioctl(as, groupid, req, param);
+ return vfio_container_do_ioctl(as, req, param);
}
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 0b26cd8..e076c14 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -3,7 +3,6 @@
#include "qemu/typedefs.h"
-extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
- int req, void *param);
+extern int vfio_container_ioctl(AddressSpace *as, int req, void *param);
#endif
--
2.4.0.rc3.8.gfb3e7d5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
@ 2015-09-10 2:43 ` Alex Williamson
2015-09-11 20:03 ` Alex Williamson
0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2015-09-10 2:43 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Gavin Shan, David Gibson
On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
> So far there were 2 limitations enforced on an emulated PHB
> regarding VFIO:
> 1) only one IOMMU group per IOMMU container was allowed and
> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
> 2) only one IOMMU container per PHB was allowed as a group
> can only be attached to one container.
>
> However these are not really necessary so we are removing them here.
>
> This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
> not to receive it. Instead of getting a container from a group ID,
> this calls ioctl() for every container attached to the PHB address space.
> This allows multiple containers on the same PHB which means multiple
> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
> to be passed to every container separately which will affect
> the performance. For 32bit devices it is still recommended to put
> every group to a separate PHB.
>
> Since the existing VFIO code is already trying to share a container for
> multiple groups, just removing a group id from
> the vfio_container_do_ioctl() allows the existing code to share
> a container if it is supported by the host kernel.
>
> This replaces the check for a group id to be set correctly with
> the check that it is not set.
>
> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> members is accessed here.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> hw/ppc/spapr_pci.c | 10 +++++-----
> hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
> hw/vfio/common.c | 22 ++++++----------------
> include/hw/vfio/vfio.h | 3 +--
> 4 files changed, 18 insertions(+), 34 deletions(-)
>
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 4e289cb..1b7559d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
>
> if (sphb->vfio_num) {
> - if (sphb->iommugroupid == -1) {
> - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> - return -1;
> - }
> -
> ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> if (ret) {
> error_report("Unable to get DMA32 info from VFIO");
> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> PCIBus *bus;
> uint64_t msi_window_size = 4096;
>
> + if ((sphb->iommugroupid != -1) &&
> + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
> + error_report("Warning: iommugroupid is deprecated and will be ignored");
> + }
> +
> if (sphb->index != (uint32_t)-1) {
> hwaddr windows_base;
>
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index f94d8a4..48137d5 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
> int ret;
>
> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> + ret = vfio_container_ioctl(&sphb->iommu_as,
> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> if (ret) {
> return ret;
> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> .op = VFIO_EEH_PE_ENABLE
> };
>
> - vfio_container_ioctl(&sphb->iommu_as,
> - sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> }
>
> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> return RTAS_OUT_PARAM_ERROR;
> }
>
> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> - VFIO_EEH_PE_OP, &op);
> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> if (ret < 0) {
> return RTAS_OUT_HW_ERROR;
> }
> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> int ret;
>
> op.op = VFIO_EEH_PE_GET_STATE;
> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> - VFIO_EEH_PE_OP, &op);
> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> if (ret < 0) {
> return RTAS_OUT_PARAM_ERROR;
> }
> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> return RTAS_OUT_PARAM_ERROR;
> }
>
> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> - VFIO_EEH_PE_OP, &op);
> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> if (ret < 0) {
> return RTAS_OUT_HW_ERROR;
> }
> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> int ret;
>
> op.op = VFIO_EEH_PE_CONFIGURE;
> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> - VFIO_EEH_PE_OP, &op);
> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> if (ret < 0) {
> return RTAS_OUT_PARAM_ERROR;
> }
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 85ee9b0..a00644e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
> close(vbasedev->fd);
> }
>
> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> - int req, void *param)
> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
> {
> - VFIOGroup *group;
> VFIOContainer *container;
> int ret = -1;
> + VFIOAddressSpace *space = vfio_get_address_space(as);
>
> - group = vfio_get_group(groupid, as);
> - if (!group) {
> - error_report("vfio: group %d not registered", groupid);
> - return ret;
> - }
> -
> - container = group->container;
> - if (group->container) {
> + QLIST_FOREACH(container, &space->containers, next) {
> ret = ioctl(container->fd, req, param);
> if (ret < 0) {
> error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> + return -errno;
> }
> }
This highlights how terrible this ioctl passthrough interface is; what's
a caller supposed to do on error? Here we don't have visibility into
the ioctl being called in order to do any unwind on error. The caller
doesn't have the context to unwind only the failed containers. Is
returning errno here really sufficient for anyone to figure out how to
proceed or should we just cut our loses and abort()? What's the least
horrible way we can realistically handle a failure here? Thanks,
Alex
>
> - vfio_put_group(group);
> -
> return ret;
> }
>
> -int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> - int req, void *param)
> +int vfio_container_ioctl(AddressSpace *as, int req, void *param)
> {
> /* We allow only certain ioctls to the container */
> switch (req) {
> @@ -968,5 +958,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> return -1;
> }
>
> - return vfio_container_do_ioctl(as, groupid, req, param);
> + return vfio_container_do_ioctl(as, req, param);
> }
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..e076c14 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -3,7 +3,6 @@
>
> #include "qemu/typedefs.h"
>
> -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> - int req, void *param);
> +extern int vfio_container_ioctl(AddressSpace *as, int req, void *param);
>
> #endif
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
2015-09-10 2:43 ` Alex Williamson
@ 2015-09-11 20:03 ` Alex Williamson
2015-09-14 4:04 ` David Gibson
0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2015-09-11 20:03 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Gavin Shan, David Gibson
On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
> > So far there were 2 limitations enforced on an emulated PHB
> > regarding VFIO:
> > 1) only one IOMMU group per IOMMU container was allowed and
> > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
> > 2) only one IOMMU container per PHB was allowed as a group
> > can only be attached to one container.
> >
> > However these are not really necessary so we are removing them here.
> >
> > This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
> > not to receive it. Instead of getting a container from a group ID,
> > this calls ioctl() for every container attached to the PHB address space.
> > This allows multiple containers on the same PHB which means multiple
> > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
> > to be passed to every container separately which will affect
> > the performance. For 32bit devices it is still recommended to put
> > every group to a separate PHB.
> >
> > Since the existing VFIO code is already trying to share a container for
> > multiple groups, just removing a group id from
> > the vfio_container_do_ioctl() allows the existing code to share
> > a container if it is supported by the host kernel.
> >
> > This replaces the check for a group id to be set correctly with
> > the check that it is not set.
> >
> > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> > members is accessed here.
> >
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> > hw/ppc/spapr_pci.c | 10 +++++-----
> > hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
> > hw/vfio/common.c | 22 ++++++----------------
> > include/hw/vfio/vfio.h | 3 +--
> > 4 files changed, 18 insertions(+), 34 deletions(-)
> >
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 4e289cb..1b7559d 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
> >
> > if (sphb->vfio_num) {
> > - if (sphb->iommugroupid == -1) {
> > - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> > - return -1;
> > - }
> > -
> > ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> > if (ret) {
> > error_report("Unable to get DMA32 info from VFIO");
> > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> > PCIBus *bus;
> > uint64_t msi_window_size = 4096;
> >
> > + if ((sphb->iommugroupid != -1) &&
> > + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
> > + error_report("Warning: iommugroupid is deprecated and will be ignored");
> > + }
> > +
> > if (sphb->index != (uint32_t)-1) {
> > hwaddr windows_base;
> >
> > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> > index f94d8a4..48137d5 100644
> > --- a/hw/ppc/spapr_pci_vfio.c
> > +++ b/hw/ppc/spapr_pci_vfio.c
> > @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> > struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
> > int ret;
> >
> > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > + ret = vfio_container_ioctl(&sphb->iommu_as,
> > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > if (ret) {
> > return ret;
> > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> > .op = VFIO_EEH_PE_ENABLE
> > };
> >
> > - vfio_container_ioctl(&sphb->iommu_as,
> > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > }
> >
> > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > return RTAS_OUT_PARAM_ERROR;
> > }
> >
> > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > - VFIO_EEH_PE_OP, &op);
> > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > if (ret < 0) {
> > return RTAS_OUT_HW_ERROR;
> > }
> > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> > int ret;
> >
> > op.op = VFIO_EEH_PE_GET_STATE;
> > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > - VFIO_EEH_PE_OP, &op);
> > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > if (ret < 0) {
> > return RTAS_OUT_PARAM_ERROR;
> > }
> > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> > return RTAS_OUT_PARAM_ERROR;
> > }
> >
> > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > - VFIO_EEH_PE_OP, &op);
> > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > if (ret < 0) {
> > return RTAS_OUT_HW_ERROR;
> > }
> > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> > int ret;
> >
> > op.op = VFIO_EEH_PE_CONFIGURE;
> > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > - VFIO_EEH_PE_OP, &op);
> > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > if (ret < 0) {
> > return RTAS_OUT_PARAM_ERROR;
> > }
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 85ee9b0..a00644e 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
> > close(vbasedev->fd);
> > }
> >
> > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> > - int req, void *param)
> > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
> > {
> > - VFIOGroup *group;
> > VFIOContainer *container;
> > int ret = -1;
> > + VFIOAddressSpace *space = vfio_get_address_space(as);
> >
> > - group = vfio_get_group(groupid, as);
> > - if (!group) {
> > - error_report("vfio: group %d not registered", groupid);
> > - return ret;
> > - }
> > -
> > - container = group->container;
> > - if (group->container) {
> > + QLIST_FOREACH(container, &space->containers, next) {
> > ret = ioctl(container->fd, req, param);
> > if (ret < 0) {
> > error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
> > _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> > + return -errno;
> > }
> > }
>
> This highlights how terrible this ioctl passthrough interface is; what's
> a caller supposed to do on error? Here we don't have visibility into
> the ioctl being called in order to do any unwind on error. The caller
> doesn't have the context to unwind only the failed containers. Is
> returning errno here really sufficient for anyone to figure out how to
> proceed or should we just cut our loses and abort()? What's the least
> horrible way we can realistically handle a failure here? Thanks,
It seemed like I asked this before and it turns out that I did:
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
"Gavin says yes" is not really a supportable solution when I stumbled on
it again and David doesn't know why it's safe either (nor does it answer
my question of how does this work). At a minimum, the reasoning why
this is safe for the ioctls we currently allow here needs to be spelled
out in a comment. We could easily make the mistake of adding another
ioctl to the passthrough for which those assumptions are not valid. We
may even want to abort if it's not one of the ioctls we've evaluated so
we don't overlook this problem later. Thanks,
Alex
>
> > - vfio_put_group(group);
> > -
> > return ret;
> > }
> >
> > -int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> > - int req, void *param)
> > +int vfio_container_ioctl(AddressSpace *as, int req, void *param)
> > {
> > /* We allow only certain ioctls to the container */
> > switch (req) {
> > @@ -968,5 +958,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> > return -1;
> > }
> >
> > - return vfio_container_do_ioctl(as, groupid, req, param);
> > + return vfio_container_do_ioctl(as, req, param);
> > }
> > diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> > index 0b26cd8..e076c14 100644
> > --- a/include/hw/vfio/vfio.h
> > +++ b/include/hw/vfio/vfio.h
> > @@ -3,7 +3,6 @@
> >
> > #include "qemu/typedefs.h"
> >
> > -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> > - int req, void *param);
> > +extern int vfio_container_ioctl(AddressSpace *as, int req, void *param);
> >
> > #endif
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
2015-09-11 20:03 ` Alex Williamson
@ 2015-09-14 4:04 ` David Gibson
2015-09-14 8:56 ` Alexey Kardashevskiy
0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2015-09-14 4:04 UTC (permalink / raw)
To: Alex Williamson; +Cc: Alexey Kardashevskiy, qemu-ppc, qemu-devel, Gavin Shan
[-- Attachment #1: Type: text/plain, Size: 9309 bytes --]
On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
> > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
> > > So far there were 2 limitations enforced on an emulated PHB
> > > regarding VFIO:
> > > 1) only one IOMMU group per IOMMU container was allowed and
> > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
> > > 2) only one IOMMU container per PHB was allowed as a group
> > > can only be attached to one container.
> > >
> > > However these are not really necessary so we are removing them here.
> > >
> > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
> > > not to receive it. Instead of getting a container from a group ID,
> > > this calls ioctl() for every container attached to the PHB address space.
> > > This allows multiple containers on the same PHB which means multiple
> > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
> > > to be passed to every container separately which will affect
> > > the performance. For 32bit devices it is still recommended to put
> > > every group to a separate PHB.
> > >
> > > Since the existing VFIO code is already trying to share a container for
> > > multiple groups, just removing a group id from
> > > the vfio_container_do_ioctl() allows the existing code to share
> > > a container if it is supported by the host kernel.
> > >
> > > This replaces the check for a group id to be set correctly with
> > > the check that it is not set.
> > >
> > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> > > members is accessed here.
> > >
> > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > ---
> > > hw/ppc/spapr_pci.c | 10 +++++-----
> > > hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
> > > hw/vfio/common.c | 22 ++++++----------------
> > > include/hw/vfio/vfio.h | 3 +--
> > > 4 files changed, 18 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index 4e289cb..1b7559d 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> > > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
> > >
> > > if (sphb->vfio_num) {
> > > - if (sphb->iommugroupid == -1) {
> > > - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> > > - return -1;
> > > - }
> > > -
> > > ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> > > if (ret) {
> > > error_report("Unable to get DMA32 info from VFIO");
> > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> > > PCIBus *bus;
> > > uint64_t msi_window_size = 4096;
> > >
> > > + if ((sphb->iommugroupid != -1) &&
> > > + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
> > > + error_report("Warning: iommugroupid is deprecated and will be ignored");
> > > + }
> > > +
> > > if (sphb->index != (uint32_t)-1) {
> > > hwaddr windows_base;
> > >
> > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> > > index f94d8a4..48137d5 100644
> > > --- a/hw/ppc/spapr_pci_vfio.c
> > > +++ b/hw/ppc/spapr_pci_vfio.c
> > > @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> > > struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
> > > int ret;
> > >
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > + ret = vfio_container_ioctl(&sphb->iommu_as,
> > > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > > if (ret) {
> > > return ret;
> > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> > > .op = VFIO_EEH_PE_ENABLE
> > > };
> > >
> > > - vfio_container_ioctl(&sphb->iommu_as,
> > > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> > > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > }
> > >
> > > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > >
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_HW_ERROR;
> > > }
> > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> > > int ret;
> > >
> > > op.op = VFIO_EEH_PE_GET_STATE;
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > >
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_HW_ERROR;
> > > }
> > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> > > int ret;
> > >
> > > op.op = VFIO_EEH_PE_CONFIGURE;
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 85ee9b0..a00644e 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
> > > close(vbasedev->fd);
> > > }
> > >
> > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> > > - int req, void *param)
> > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
> > > {
> > > - VFIOGroup *group;
> > > VFIOContainer *container;
> > > int ret = -1;
> > > + VFIOAddressSpace *space = vfio_get_address_space(as);
> > >
> > > - group = vfio_get_group(groupid, as);
> > > - if (!group) {
> > > - error_report("vfio: group %d not registered", groupid);
> > > - return ret;
> > > - }
> > > -
> > > - container = group->container;
> > > - if (group->container) {
> > > + QLIST_FOREACH(container, &space->containers, next) {
> > > ret = ioctl(container->fd, req, param);
> > > if (ret < 0) {
> > > error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
> > > _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> > > + return -errno;
> > > }
> > > }
> >
> > This highlights how terrible this ioctl passthrough interface is; what's
> > a caller supposed to do on error? Here we don't have visibility into
> > the ioctl being called in order to do any unwind on error. The caller
> > doesn't have the context to unwind only the failed containers. Is
> > returning errno here really sufficient for anyone to figure out how to
> > proceed or should we just cut our loses and abort()? What's the least
> > horrible way we can realistically handle a failure here? Thanks,
>
> It seemed like I asked this before and it turns out that I did:
>
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
>
> "Gavin says yes" is not really a supportable solution when I stumbled on
> it again and David doesn't know why it's safe either (nor does it answer
> my question of how does this work). At a minimum, the reasoning why
> this is safe for the ioctls we currently allow here needs to be spelled
> out in a comment. We could easily make the mistake of adding another
> ioctl to the passthrough for which those assumptions are not valid. We
> may even want to abort if it's not one of the ioctls we've evaluated so
> we don't overlook this problem later. Thanks,
Yeah, you're right. This is a mess.
What we need to do here is to make vfio_container_ioctl() take a
VFIOContainer object as the name suggests. The the callers will need
to either use it on a specific container, or iterate across all the
containers in the VFIOAddressSpace as appropriate for the specific
operation.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
2015-09-14 4:04 ` David Gibson
@ 2015-09-14 8:56 ` Alexey Kardashevskiy
2015-09-14 11:42 ` David Gibson
0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-14 8:56 UTC (permalink / raw)
To: David Gibson, Alex Williamson; +Cc: qemu-ppc, qemu-devel, Gavin Shan
On 09/14/2015 02:04 PM, David Gibson wrote:
> On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
>> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
>>> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
>>>> So far there were 2 limitations enforced on an emulated PHB
>>>> regarding VFIO:
>>>> 1) only one IOMMU group per IOMMU container was allowed and
>>>> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
>>>> 2) only one IOMMU container per PHB was allowed as a group
>>>> can only be attached to one container.
>>>>
>>>> However these are not really necessary so we are removing them here.
>>>>
>>>> This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
>>>> not to receive it. Instead of getting a container from a group ID,
>>>> this calls ioctl() for every container attached to the PHB address space.
>>>> This allows multiple containers on the same PHB which means multiple
>>>> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
>>>> to be passed to every container separately which will affect
>>>> the performance. For 32bit devices it is still recommended to put
>>>> every group to a separate PHB.
>>>>
>>>> Since the existing VFIO code is already trying to share a container for
>>>> multiple groups, just removing a group id from
>>>> the vfio_container_do_ioctl() allows the existing code to share
>>>> a container if it is supported by the host kernel.
>>>>
>>>> This replaces the check for a group id to be set correctly with
>>>> the check that it is not set.
>>>>
>>>> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
>>>> members is accessed here.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>> hw/ppc/spapr_pci.c | 10 +++++-----
>>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
>>>> hw/vfio/common.c | 22 ++++++----------------
>>>> include/hw/vfio/vfio.h | 3 +--
>>>> 4 files changed, 18 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index 4e289cb..1b7559d 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
>>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
>>>>
>>>> if (sphb->vfio_num) {
>>>> - if (sphb->iommugroupid == -1) {
>>>> - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
>>>> - return -1;
>>>> - }
>>>> -
>>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb);
>>>> if (ret) {
>>>> error_report("Unable to get DMA32 info from VFIO");
>>>> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>> PCIBus *bus;
>>>> uint64_t msi_window_size = 4096;
>>>>
>>>> + if ((sphb->iommugroupid != -1) &&
>>>> + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
>>>> + error_report("Warning: iommugroupid is deprecated and will be ignored");
>>>> + }
>>>> +
>>>> if (sphb->index != (uint32_t)-1) {
>>>> hwaddr windows_base;
>>>>
>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>>>> index f94d8a4..48137d5 100644
>>>> --- a/hw/ppc/spapr_pci_vfio.c
>>>> +++ b/hw/ppc/spapr_pci_vfio.c
>>>> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
>>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>>>> int ret;
>>>>
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as,
>>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
>>>> if (ret) {
>>>> return ret;
>>>> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
>>>> .op = VFIO_EEH_PE_ENABLE
>>>> };
>>>>
>>>> - vfio_container_ioctl(&sphb->iommu_as,
>>>> - sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
>>>> + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> }
>>>>
>>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>>
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_HW_ERROR;
>>>> }
>>>> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
>>>> int ret;
>>>>
>>>> op.op = VFIO_EEH_PE_GET_STATE;
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>>
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_HW_ERROR;
>>>> }
>>>> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>>>> int ret;
>>>>
>>>> op.op = VFIO_EEH_PE_CONFIGURE;
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 85ee9b0..a00644e 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
>>>> close(vbasedev->fd);
>>>> }
>>>>
>>>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
>>>> - int req, void *param)
>>>> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
>>>> {
>>>> - VFIOGroup *group;
>>>> VFIOContainer *container;
>>>> int ret = -1;
>>>> + VFIOAddressSpace *space = vfio_get_address_space(as);
>>>>
>>>> - group = vfio_get_group(groupid, as);
>>>> - if (!group) {
>>>> - error_report("vfio: group %d not registered", groupid);
>>>> - return ret;
>>>> - }
>>>> -
>>>> - container = group->container;
>>>> - if (group->container) {
>>>> + QLIST_FOREACH(container, &space->containers, next) {
>>>> ret = ioctl(container->fd, req, param);
>>>> if (ret < 0) {
>>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
>>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
>>>> + return -errno;
>>>> }
>>>> }
>>>
>>> This highlights how terrible this ioctl passthrough interface is; what's
>>> a caller supposed to do on error? Here we don't have visibility into
>>> the ioctl being called in order to do any unwind on error. The caller
>>> doesn't have the context to unwind only the failed containers. Is
>>> returning errno here really sufficient for anyone to figure out how to
>>> proceed or should we just cut our loses and abort()? What's the least
>>> horrible way we can realistically handle a failure here? Thanks,
>>
>> It seemed like I asked this before and it turns out that I did:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
>>
>> "Gavin says yes" is not really a supportable solution when I stumbled on
>> it again and David doesn't know why it's safe either (nor does it answer
>> my question of how does this work). At a minimum, the reasoning why
>> this is safe for the ioctls we currently allow here needs to be spelled
>> out in a comment. We could easily make the mistake of adding another
>> ioctl to the passthrough for which those assumptions are not valid. We
>> may even want to abort if it's not one of the ioctls we've evaluated so
>> we don't overlook this problem later. Thanks,
>
> Yeah, you're right. This is a mess.
Right. I have been testing/studying this lately a lot, will fix this too.
> What we need to do here is to make vfio_container_ioctl() take a
> VFIOContainer object as the name suggests. The the callers will need
> to either use it on a specific container, or iterate across all the
> containers in the VFIOAddressSpace as appropriate for the specific
> operation.
The callers - spapr code - do not deal with containers, these are a VFIO
internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace.
So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?)
makes more sense here. Which I can make a patch for. Or change
vfio_container_ioctl() as you suggest (+make is static) and add public
vfio_as_ioctl(). Which one to pick?
--
Alexey
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
2015-09-14 8:56 ` Alexey Kardashevskiy
@ 2015-09-14 11:42 ` David Gibson
2015-09-14 14:37 ` Alexey Kardashevskiy
0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2015-09-14 11:42 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan
[-- Attachment #1: Type: text/plain, Size: 10589 bytes --]
On Mon, Sep 14, 2015 at 06:56:35PM +1000, Alexey Kardashevskiy wrote:
> On 09/14/2015 02:04 PM, David Gibson wrote:
> >On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
> >>On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
> >>>On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
> >>>>So far there were 2 limitations enforced on an emulated PHB
> >>>>regarding VFIO:
> >>>>1) only one IOMMU group per IOMMU container was allowed and
> >>>>the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
> >>>>2) only one IOMMU container per PHB was allowed as a group
> >>>>can only be attached to one container.
> >>>>
> >>>>However these are not really necessary so we are removing them here.
> >>>>
> >>>>This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
> >>>>not to receive it. Instead of getting a container from a group ID,
> >>>>this calls ioctl() for every container attached to the PHB address space.
> >>>>This allows multiple containers on the same PHB which means multiple
> >>>>groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
> >>>>to be passed to every container separately which will affect
> >>>>the performance. For 32bit devices it is still recommended to put
> >>>>every group to a separate PHB.
> >>>>
> >>>>Since the existing VFIO code is already trying to share a container for
> >>>>multiple groups, just removing a group id from
> >>>>the vfio_container_do_ioctl() allows the existing code to share
> >>>>a container if it is supported by the host kernel.
> >>>>
> >>>>This replaces the check for a group id to be set correctly with
> >>>>the check that it is not set.
> >>>>
> >>>>This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> >>>>members is accessed here.
> >>>>
> >>>>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>---
> >>>> hw/ppc/spapr_pci.c | 10 +++++-----
> >>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
> >>>> hw/vfio/common.c | 22 ++++++----------------
> >>>> include/hw/vfio/vfio.h | 3 +--
> >>>> 4 files changed, 18 insertions(+), 34 deletions(-)
> >>>>
> >>>>diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >>>>index 4e289cb..1b7559d 100644
> >>>>--- a/hw/ppc/spapr_pci.c
> >>>>+++ b/hw/ppc/spapr_pci.c
> >>>>@@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> >>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
> >>>>
> >>>> if (sphb->vfio_num) {
> >>>>- if (sphb->iommugroupid == -1) {
> >>>>- error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> >>>>- return -1;
> >>>>- }
> >>>>-
> >>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> >>>> if (ret) {
> >>>> error_report("Unable to get DMA32 info from VFIO");
> >>>>@@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >>>> PCIBus *bus;
> >>>> uint64_t msi_window_size = 4096;
> >>>>
> >>>>+ if ((sphb->iommugroupid != -1) &&
> >>>>+ object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
> >>>>+ error_report("Warning: iommugroupid is deprecated and will be ignored");
> >>>>+ }
> >>>>+
> >>>> if (sphb->index != (uint32_t)-1) {
> >>>> hwaddr windows_base;
> >>>>
> >>>>diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> >>>>index f94d8a4..48137d5 100644
> >>>>--- a/hw/ppc/spapr_pci_vfio.c
> >>>>+++ b/hw/ppc/spapr_pci_vfio.c
> >>>>@@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> >>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
> >>>> int ret;
> >>>>
> >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as,
> >>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> >>>> if (ret) {
> >>>> return ret;
> >>>>@@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> >>>> .op = VFIO_EEH_PE_ENABLE
> >>>> };
> >>>>
> >>>>- vfio_container_ioctl(&sphb->iommu_as,
> >>>>- sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> >>>>+ vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> >>>> }
> >>>>
> >>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> >>>>@@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> >>>> return RTAS_OUT_PARAM_ERROR;
> >>>> }
> >>>>
> >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> >>>>- VFIO_EEH_PE_OP, &op);
> >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> >>>> if (ret < 0) {
> >>>> return RTAS_OUT_HW_ERROR;
> >>>> }
> >>>>@@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
> >>>> int ret;
> >>>>
> >>>> op.op = VFIO_EEH_PE_GET_STATE;
> >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> >>>>- VFIO_EEH_PE_OP, &op);
> >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> >>>> if (ret < 0) {
> >>>> return RTAS_OUT_PARAM_ERROR;
> >>>> }
> >>>>@@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> >>>> return RTAS_OUT_PARAM_ERROR;
> >>>> }
> >>>>
> >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> >>>>- VFIO_EEH_PE_OP, &op);
> >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> >>>> if (ret < 0) {
> >>>> return RTAS_OUT_HW_ERROR;
> >>>> }
> >>>>@@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> >>>> int ret;
> >>>>
> >>>> op.op = VFIO_EEH_PE_CONFIGURE;
> >>>>- ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> >>>>- VFIO_EEH_PE_OP, &op);
> >>>>+ ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> >>>> if (ret < 0) {
> >>>> return RTAS_OUT_PARAM_ERROR;
> >>>> }
> >>>>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>>>index 85ee9b0..a00644e 100644
> >>>>--- a/hw/vfio/common.c
> >>>>+++ b/hw/vfio/common.c
> >>>>@@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
> >>>> close(vbasedev->fd);
> >>>> }
> >>>>
> >>>>-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> >>>>- int req, void *param)
> >>>>+static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
> >>>> {
> >>>>- VFIOGroup *group;
> >>>> VFIOContainer *container;
> >>>> int ret = -1;
> >>>>+ VFIOAddressSpace *space = vfio_get_address_space(as);
> >>>>
> >>>>- group = vfio_get_group(groupid, as);
> >>>>- if (!group) {
> >>>>- error_report("vfio: group %d not registered", groupid);
> >>>>- return ret;
> >>>>- }
> >>>>-
> >>>>- container = group->container;
> >>>>- if (group->container) {
> >>>>+ QLIST_FOREACH(container, &space->containers, next) {
> >>>> ret = ioctl(container->fd, req, param);
> >>>> if (ret < 0) {
> >>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
> >>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> >>>>+ return -errno;
> >>>> }
> >>>> }
> >>>
> >>>This highlights how terrible this ioctl passthrough interface is; what's
> >>>a caller supposed to do on error? Here we don't have visibility into
> >>>the ioctl being called in order to do any unwind on error. The caller
> >>>doesn't have the context to unwind only the failed containers. Is
> >>>returning errno here really sufficient for anyone to figure out how to
> >>>proceed or should we just cut our loses and abort()? What's the least
> >>>horrible way we can realistically handle a failure here? Thanks,
> >>
> >>It seemed like I asked this before and it turns out that I did:
> >>
> >>https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
> >>
> >>"Gavin says yes" is not really a supportable solution when I stumbled on
> >>it again and David doesn't know why it's safe either (nor does it answer
> >>my question of how does this work). At a minimum, the reasoning why
> >>this is safe for the ioctls we currently allow here needs to be spelled
> >>out in a comment. We could easily make the mistake of adding another
> >>ioctl to the passthrough for which those assumptions are not valid. We
> >>may even want to abort if it's not one of the ioctls we've evaluated so
> >>we don't overlook this problem later. Thanks,
> >
> >Yeah, you're right. This is a mess.
>
>
> Right. I have been testing/studying this lately a lot, will fix this too.
>
>
> >What we need to do here is to make vfio_container_ioctl() take a
> >VFIOContainer object as the name suggests. The the callers will need
> >to either use it on a specific container, or iterate across all the
> >containers in the VFIOAddressSpace as appropriate for the specific
> >operation.
>
> The callers - spapr code - do not deal with containers, these are a VFIO
> internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace.
>
> So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?)
> makes more sense here. Which I can make a patch for. Or change
> vfio_container_ioctl() as you suggest (+make is static) and add public
> vfio_as_ioctl(). Which one to pick?
No, that doesn't address the problem. The ioctl is operating on the
container, the wrapper needs to take the container. Whether the ioctl
can be safely replicated across all containers in the
VFIOAddressSpace, or can only be used if you have additional
information to select a specific container *cannot* be known unless
you know exactly what ioctl() you're talking about. That decision has
to move to the callers, which know what they're doing.
If those callers don't know about individual containers now, then they
have to learn.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
2015-09-14 11:42 ` David Gibson
@ 2015-09-14 14:37 ` Alexey Kardashevskiy
0 siblings, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-14 14:37 UTC (permalink / raw)
To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan
On 09/14/2015 09:42 PM, David Gibson wrote:
> On Mon, Sep 14, 2015 at 06:56:35PM +1000, Alexey Kardashevskiy wrote:
>> On 09/14/2015 02:04 PM, David Gibson wrote:
>>> On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
>>>> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
>>>>> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
>>>>>> So far there were 2 limitations enforced on an emulated PHB
>>>>>> regarding VFIO:
>>>>>> 1) only one IOMMU group per IOMMU container was allowed and
>>>>>> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
>>>>>> 2) only one IOMMU container per PHB was allowed as a group
>>>>>> can only be attached to one container.
>>>>>>
>>>>>> However these are not really necessary so we are removing them here.
>>>>>>
>>>>>> This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
>>>>>> not to receive it. Instead of getting a container from a group ID,
>>>>>> this calls ioctl() for every container attached to the PHB address space.
>>>>>> This allows multiple containers on the same PHB which means multiple
>>>>>> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
>>>>>> to be passed to every container separately which will affect
>>>>>> the performance. For 32bit devices it is still recommended to put
>>>>>> every group to a separate PHB.
>>>>>>
>>>>>> Since the existing VFIO code is already trying to share a container for
>>>>>> multiple groups, just removing a group id from
>>>>>> the vfio_container_do_ioctl() allows the existing code to share
>>>>>> a container if it is supported by the host kernel.
>>>>>>
>>>>>> This replaces the check for a group id to be set correctly with
>>>>>> the check that it is not set.
>>>>>>
>>>>>> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
>>>>>> members is accessed here.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>> hw/ppc/spapr_pci.c | 10 +++++-----
>>>>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
>>>>>> hw/vfio/common.c | 22 ++++++----------------
>>>>>> include/hw/vfio/vfio.h | 3 +--
>>>>>> 4 files changed, 18 insertions(+), 34 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>>>> index 4e289cb..1b7559d 100644
>>>>>> --- a/hw/ppc/spapr_pci.c
>>>>>> +++ b/hw/ppc/spapr_pci.c
>>>>>> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
>>>>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
>>>>>>
>>>>>> if (sphb->vfio_num) {
>>>>>> - if (sphb->iommugroupid == -1) {
>>>>>> - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
>>>>>> - return -1;
>>>>>> - }
>>>>>> -
>>>>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb);
>>>>>> if (ret) {
>>>>>> error_report("Unable to get DMA32 info from VFIO");
>>>>>> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>>>> PCIBus *bus;
>>>>>> uint64_t msi_window_size = 4096;
>>>>>>
>>>>>> + if ((sphb->iommugroupid != -1) &&
>>>>>> + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
>>>>>> + error_report("Warning: iommugroupid is deprecated and will be ignored");
>>>>>> + }
>>>>>> +
>>>>>> if (sphb->index != (uint32_t)-1) {
>>>>>> hwaddr windows_base;
>>>>>>
>>>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>>>>>> index f94d8a4..48137d5 100644
>>>>>> --- a/hw/ppc/spapr_pci_vfio.c
>>>>>> +++ b/hw/ppc/spapr_pci_vfio.c
>>>>>> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
>>>>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>>>>>> int ret;
>>>>>>
>>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as,
>>>>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
>>>>>> if (ret) {
>>>>>> return ret;
>>>>>> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
>>>>>> .op = VFIO_EEH_PE_ENABLE
>>>>>> };
>>>>>>
>>>>>> - vfio_container_ioctl(&sphb->iommu_as,
>>>>>> - sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
>>>>>> + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>>> }
>>>>>>
>>>>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>>>> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>>>> return RTAS_OUT_PARAM_ERROR;
>>>>>> }
>>>>>>
>>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>>>> - VFIO_EEH_PE_OP, &op);
>>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>>> if (ret < 0) {
>>>>>> return RTAS_OUT_HW_ERROR;
>>>>>> }
>>>>>> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
>>>>>> int ret;
>>>>>>
>>>>>> op.op = VFIO_EEH_PE_GET_STATE;
>>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>>>> - VFIO_EEH_PE_OP, &op);
>>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>>> if (ret < 0) {
>>>>>> return RTAS_OUT_PARAM_ERROR;
>>>>>> }
>>>>>> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>>>>>> return RTAS_OUT_PARAM_ERROR;
>>>>>> }
>>>>>>
>>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>>>> - VFIO_EEH_PE_OP, &op);
>>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>>> if (ret < 0) {
>>>>>> return RTAS_OUT_HW_ERROR;
>>>>>> }
>>>>>> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>>>>>> int ret;
>>>>>>
>>>>>> op.op = VFIO_EEH_PE_CONFIGURE;
>>>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>>>> - VFIO_EEH_PE_OP, &op);
>>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>>>> if (ret < 0) {
>>>>>> return RTAS_OUT_PARAM_ERROR;
>>>>>> }
>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>>> index 85ee9b0..a00644e 100644
>>>>>> --- a/hw/vfio/common.c
>>>>>> +++ b/hw/vfio/common.c
>>>>>> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
>>>>>> close(vbasedev->fd);
>>>>>> }
>>>>>>
>>>>>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
>>>>>> - int req, void *param)
>>>>>> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
>>>>>> {
>>>>>> - VFIOGroup *group;
>>>>>> VFIOContainer *container;
>>>>>> int ret = -1;
>>>>>> + VFIOAddressSpace *space = vfio_get_address_space(as);
>>>>>>
>>>>>> - group = vfio_get_group(groupid, as);
>>>>>> - if (!group) {
>>>>>> - error_report("vfio: group %d not registered", groupid);
>>>>>> - return ret;
>>>>>> - }
>>>>>> -
>>>>>> - container = group->container;
>>>>>> - if (group->container) {
>>>>>> + QLIST_FOREACH(container, &space->containers, next) {
>>>>>> ret = ioctl(container->fd, req, param);
>>>>>> if (ret < 0) {
>>>>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
>>>>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
>>>>>> + return -errno;
>>>>>> }
>>>>>> }
>>>>>
>>>>> This highlights how terrible this ioctl passthrough interface is; what's
>>>>> a caller supposed to do on error? Here we don't have visibility into
>>>>> the ioctl being called in order to do any unwind on error. The caller
>>>>> doesn't have the context to unwind only the failed containers. Is
>>>>> returning errno here really sufficient for anyone to figure out how to
>>>>> proceed or should we just cut our loses and abort()? What's the least
>>>>> horrible way we can realistically handle a failure here? Thanks,
>>>>
>>>> It seemed like I asked this before and it turns out that I did:
>>>>
>>>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
>>>>
>>>> "Gavin says yes" is not really a supportable solution when I stumbled on
>>>> it again and David doesn't know why it's safe either (nor does it answer
>>>> my question of how does this work). At a minimum, the reasoning why
>>>> this is safe for the ioctls we currently allow here needs to be spelled
>>>> out in a comment. We could easily make the mistake of adding another
>>>> ioctl to the passthrough for which those assumptions are not valid. We
>>>> may even want to abort if it's not one of the ioctls we've evaluated so
>>>> we don't overlook this problem later. Thanks,
>>>
>>> Yeah, you're right. This is a mess.
>>
>>
>> Right. I have been testing/studying this lately a lot, will fix this too.
>>
>>
>>> What we need to do here is to make vfio_container_ioctl() take a
>>> VFIOContainer object as the name suggests. The the callers will need
>>> to either use it on a specific container, or iterate across all the
>>> containers in the VFIOAddressSpace as appropriate for the specific
>>> operation.
>>
>> The callers - spapr code - do not deal with containers, these are a VFIO
>> internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace.
>>
>> So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?)
>> makes more sense here. Which I can make a patch for. Or change
>> vfio_container_ioctl() as you suggest (+make is static) and add public
>> vfio_as_ioctl(). Which one to pick?
>
> No, that doesn't address the problem. The ioctl is operating on the
> container, the wrapper needs to take the container. Whether the ioctl
> can be safely replicated across all containers in the
> VFIOAddressSpace, or can only be used if you have additional
> information to select a specific container *cannot* be known unless
> you know exactly what ioctl() you're talking about. That decision has
> to move to the callers, which know what they're doing.
>
> If those callers don't know about individual containers now, then they
> have to learn.
I lost you here. Guests (and therefore the vfio_container_ioctl() callers
which are RTAS handlers) work with either LIOBNs or phbid+bus+devfn (where
bus+devfn are not used) so it is LIOBN or PHB. If there are multiple
containers, there is no way (and no need) to call these ioctls on some
containers only. If that really matters, the user is supposed to use a
separate PHB.
If bus+devfn are used (which I'll double check), then we might want another
helper in QEMU - vfio_pci_ioctl(PCIDevice *pdev) and new vfio-pci-spapr PCI
driver on the host (of #ifdef in the existing vfio-pci driver) but I doubt
this is the case.
--
Alexey
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-09-14 14:37 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-03 4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
2015-09-10 2:43 ` Alex Williamson
2015-09-11 20:03 ` Alex Williamson
2015-09-14 4:04 ` David Gibson
2015-09-14 8:56 ` Alexey Kardashevskiy
2015-09-14 11:42 ` David Gibson
2015-09-14 14:37 ` Alexey Kardashevskiy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).