* [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3
@ 2013-05-14 9:13 David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 01/11] iommu: Fix compile error in ioapic.c David Gibson
` (11 more replies)
0 siblings, 12 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, qemu-ppc, qemu-devel, mst
Hi,
Here's the next version of my patches working towards integrating vfio
with guest visible iommu support. These are on top of Paolo Bonzini's
iommu branch at:
git://github.com/bonzini/qemu.git
Paolo says he has already merged patches 1-4, but they don't seem to
have hit the published tree yet.
With this version, we have, in theory, working support for guest IOMMUs
with VFIO. It's untested, though, and needs considerable polish.
Still, it's a proof of concept, and should demonstrate what I have in
mind here.
* [Qemu-devel] [PATCH 01/11] iommu: Fix compile error in ioapic.c
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 02/11] pci: Don't del_subgregion on a non subregion David Gibson
` (10 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
The iommu tree introduced a build bug into hw/i386/kvm/ioapic.c. Looks
like a messed up copy and paste.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/i386/kvm/ioapic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
index e3c29da..fe24d26 100644
--- a/hw/i386/kvm/ioapic.c
+++ b/hw/i386/kvm/ioapic.c
@@ -108,7 +108,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
abort();
}
- memory_region_unref(mrs.mr);
+ memory_region_unref(&s->io_memory);
}
static void kvm_ioapic_reset(DeviceState *dev)
--
1.7.10.4
* [Qemu-devel] [PATCH 02/11] pci: Don't del_subgregion on a non subregion
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 01/11] iommu: Fix compile error in ioapic.c David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 03/11] pci: Rework PCI iommu lifetime assumptions David Gibson
` (9 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
Currently do_pci_unregister_device() calls memory_region_del_subregion()
on the device's bus_master_enable_region and the device's iommu region.
But the bus_master_enable_region is an _alias_ of the iommu region; the
iommu region is _not_ a subregion of it. I suspect this has slipped by
only because PCI hot unplug has not been tested with the new PCI DMA
address space handling. This patch removes the bogus call.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/pci/pci.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 0ba39e6..58d3f69 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -875,7 +875,6 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
pci_config_free(pci_dev);
address_space_destroy(&pci_dev->bus_master_as);
- memory_region_del_subregion(&pci_dev->bus_master_enable_region, pci_dev->iommu);
pci_dev->bus->iommu_dtor_fn(pci_dev->iommu);
memory_region_destroy(&pci_dev->bus_master_enable_region);
}
--
1.7.10.4
* [Qemu-devel] [PATCH 03/11] pci: Rework PCI iommu lifetime assumptions
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 01/11] iommu: Fix compile error in ioapic.c David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 02/11] pci: Don't del_subgregion on a non subregion David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 04/11] pci: Use AddressSpace rather than MemoryRegion to represent PCI DMA space David Gibson
` (8 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
The current bus callbacks to assign and destroy an iommu memory region for
a PCI device tacitly assume that the lifetime of that address space is
tied to the lifetime of the PCI device. Although that's certainly
possible, it's much more likely that the region will be (at least
potentially) shared between several devices and have a lifetime instead
tied to the PCI host bridge, or the IOMMU itself. This is demonstrated by
the fact that there are no existing users of the destructor hook.
This patch simplifies the code by moving to the more likely use model.
This means we can eliminate the destructor hook entirely, and the iommu_fn
hook is now assumed to inform us of the device's pre-existing DMA address
space, rather than creating or assigning it. That in turn means we have
no need to keep a reference to the region around in the PCIDevice
structure, which saves a little space.
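As an illustration only (not part of this patch): with the destructor gone, a
host bridge whose IOMMU region lives as long as the bridge itself would wire
things up roughly as below. FooPHBState and the foo_* names are invented for
the sketch.

    static MemoryRegion *foo_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
    {
        FooPHBState *phb = opaque;

        /* Return the bridge-owned region; its lifetime is tied to the PHB,
         * not to any one device, so no per-device destructor is needed. */
        return phb->iommu_mr;
    }

        /* at bridge init time, once */
        pci_setup_iommu(bus, foo_pci_dma_iommu, phb);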
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/pci/pci.c | 16 +++++-----------
hw/ppc/spapr_pci.c | 2 +-
include/hw/pci/pci.h | 5 +----
include/hw/pci/pci_bus.h | 1 -
4 files changed, 7 insertions(+), 17 deletions(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 58d3f69..3c947b3 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -285,10 +285,6 @@ static MemoryRegion *pci_default_iommu(PCIBus *bus, void *opaque, int devfn)
return get_system_memory();
}
-static void pci_default_iommu_dtor(MemoryRegion *mr)
-{
-}
-
static void pci_bus_init(PCIBus *bus, DeviceState *parent,
const char *name,
MemoryRegion *address_space_mem,
@@ -299,7 +295,7 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
bus->devfn_min = devfn_min;
bus->address_space_mem = address_space_mem;
bus->address_space_io = address_space_io;
- pci_setup_iommu(bus, pci_default_iommu, NULL, NULL);
+ pci_setup_iommu(bus, pci_default_iommu, NULL);
/* host bridge */
QLIST_INIT(&bus->child);
@@ -797,6 +793,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
PCIConfigReadFunc *config_read = pc->config_read;
PCIConfigWriteFunc *config_write = pc->config_write;
+ MemoryRegion *dma_mr;
if (devfn < 0) {
for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
@@ -814,9 +811,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
}
pci_dev->bus = bus;
- pci_dev->iommu = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
+ dma_mr = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
memory_region_init_alias(&pci_dev->bus_master_enable_region, "bus master",
- pci_dev->iommu, 0, memory_region_size(pci_dev->iommu));
+ dma_mr, 0, memory_region_size(dma_mr));
memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
address_space_init(&pci_dev->bus_master_as, &pci_dev->bus_master_enable_region,
name);
@@ -875,7 +872,6 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
pci_config_free(pci_dev);
address_space_destroy(&pci_dev->bus_master_as);
- pci_dev->bus->iommu_dtor_fn(pci_dev->iommu);
memory_region_destroy(&pci_dev->bus_master_enable_region);
}
@@ -2237,11 +2233,9 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
k->props = pci_props;
}
-void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, PCIIOMMUDestructorFunc dtor,
- void *opaque)
+void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
{
bus->iommu_fn = fn;
- bus->iommu_dtor_fn = dtor ? dtor : pci_default_iommu_dtor;
bus->iommu_opaque = opaque;
}
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7014b61..eb1d9e7 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -650,7 +650,7 @@ static int spapr_phb_init(SysBusDevice *s)
fprintf(stderr, "Unable to create TCE table for %s\n", sphb->dtbusname);
return -1;
}
- pci_setup_iommu(bus, spapr_pci_dma_iommu, NULL, sphb);
+ pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
QLIST_INSERT_HEAD(&spapr->phbs, sphb, list);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 400e772..61fe51e 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -242,7 +242,6 @@ struct PCIDevice {
PCIIORegion io_regions[PCI_NUM_REGIONS];
AddressSpace bus_master_as;
MemoryRegion bus_master_enable_region;
- MemoryRegion *iommu;
/* do not access the following fields */
PCIConfigReadFunc *config_read;
@@ -402,10 +401,8 @@ int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
void pci_device_deassert_intx(PCIDevice *dev);
typedef MemoryRegion *(*PCIIOMMUFunc)(PCIBus *, void *, int);
-typedef void (*PCIIOMMUDestructorFunc)(MemoryRegion *mr);
-void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, PCIIOMMUDestructorFunc dtor,
- void *opaque);
+void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
static inline void
pci_set_byte(uint8_t *config, uint8_t val)
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index fada8f5..66762f6 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -11,7 +11,6 @@
struct PCIBus {
BusState qbus;
PCIIOMMUFunc iommu_fn;
- PCIIOMMUDestructorFunc iommu_dtor_fn;
void *iommu_opaque;
uint8_t devfn_min;
pci_set_irq_fn set_irq;
--
1.7.10.4
* [Qemu-devel] [PATCH 04/11] pci: Use AddressSpace rather than MemoryRegion to represent PCI DMA space
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (2 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 03/11] pci: Rework PCI iommu lifetime assumptions David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 05/11] pci: Introduce helper to retrieve a PCI device's DMA address space David Gibson
` (7 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
Currently the PCI iommu_fn hook returns a MemoryRegion * to represent the
DMA address space of this bus's IOMMU, although that MemoryRegion does have to
be a root MemoryRegion. Several upcoming users of this need the extra
features of an AddressSpace object, rather than a MemoryRegion, and while
they could each construct their own AddressSpace wrapper for the iommu
MemoryRegion, that leads to unnecessary proliferation of essentially
identical AddressSpace objects. This patch avoids that, by instead having
iommu_fn return an AddressSpace *, assuming the referenced AS's lifetime
is managed somewhere else (probably the PCI host bridge).
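For illustration only: under the new scheme the bridge constructs its
AddressSpace once at init time and iommu_fn simply hands out a pointer to it.
The spapr hunk below does exactly this; the foo_* names here are invented.

    static AddressSpace *foo_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
    {
        FooPHBState *phb = opaque;

        return &phb->iommu_as;    /* one AS, shared by all devices on the bus */
    }

        /* at bridge init time */
        address_space_init(&phb->iommu_as, phb->iommu_mr, "foo-iommu");
        pci_setup_iommu(bus, foo_pci_dma_iommu, phb);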
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/pci/pci.c | 10 +++++-----
hw/ppc/spapr_pci.c | 6 ++++--
include/hw/pci-host/spapr.h | 1 +
include/hw/pci/pci.h | 2 +-
4 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 3c947b3..39085d8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -279,10 +279,10 @@ int pci_find_domain(const PCIBus *bus)
return -1;
}
-static MemoryRegion *pci_default_iommu(PCIBus *bus, void *opaque, int devfn)
+static AddressSpace *pci_default_iommu(PCIBus *bus, void *opaque, int devfn)
{
/* FIXME: inherit memory region from bus creator */
- return get_system_memory();
+ return &address_space_memory;
}
static void pci_bus_init(PCIBus *bus, DeviceState *parent,
@@ -793,7 +793,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
PCIConfigReadFunc *config_read = pc->config_read;
PCIConfigWriteFunc *config_write = pc->config_write;
- MemoryRegion *dma_mr;
+ AddressSpace *dma_as;
if (devfn < 0) {
for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
@@ -811,9 +811,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
}
pci_dev->bus = bus;
- dma_mr = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
+ dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
memory_region_init_alias(&pci_dev->bus_master_enable_region, "bus master",
- dma_mr, 0, memory_region_size(dma_mr));
+ dma_as->root, 0, memory_region_size(dma_as->root));
memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
address_space_init(&pci_dev->bus_master_as, &pci_dev->bus_master_enable_region,
name);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index eb1d9e7..762db62 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -506,11 +506,11 @@ static const MemoryRegionOps spapr_msi_ops = {
/*
* PHB PCI device
*/
-static MemoryRegion *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
+static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
{
sPAPRPHBState *phb = opaque;
- return spapr_tce_get_iommu(phb->tcet);
+ return &phb->iommu_as;
}
static int spapr_phb_init(SysBusDevice *s)
@@ -650,6 +650,8 @@ static int spapr_phb_init(SysBusDevice *s)
fprintf(stderr, "Unable to create TCE table for %s\n", sphb->dtbusname);
return -1;
}
+ address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
+ sphb->dtbusname);
pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
QLIST_INSERT_HEAD(&spapr->phbs, sphb, list);
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 653dd40..1e23dbf 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -50,6 +50,7 @@ typedef struct sPAPRPHBState {
uint64_t dma_window_start;
uint64_t dma_window_size;
sPAPRTCETable *tcet;
+ AddressSpace iommu_as;
struct {
uint32_t irq;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 61fe51e..6ef1f97 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -400,7 +400,7 @@ int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
void pci_device_deassert_intx(PCIDevice *dev);
-typedef MemoryRegion *(*PCIIOMMUFunc)(PCIBus *, void *, int);
+typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
--
1.7.10.4
* [Qemu-devel] [PATCH 05/11] pci: Introduce helper to retrieve a PCI device's DMA address space
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (3 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 04/11] pci: Use AddressSpace rather than MemoryRegion to represent PCI DMA space David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 06/11] memory: Sanity check that no listeners remain on a destroyed AddressSpace David Gibson
` (6 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
A PCI device's DMA address space (possibly an IOMMU) is returned by a
method on the PCIBus. At the moment that method has only one caller, so it
is simply open coded. We'll need another caller for VFIO, so
this patch introduces a helper/wrapper function.
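As a usage sketch (the VFIO caller arrives in a later patch; the listener
variable here is illustrative), a device model can now ask for its own DMA
address space instead of assuming the global one:

    AddressSpace *as = pci_iommu_as(pci_dev);

    /* e.g. watch the device's DMA address space rather than
     * address_space_memory */
    memory_listener_register(&listener, as);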
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/pci/pci.c | 9 ++++++++-
include/hw/pci/pci.h | 1 +
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 39085d8..e7a9735 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -811,7 +811,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
}
pci_dev->bus = bus;
- dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
+ dma_as = pci_iommu_as(pci_dev);
memory_region_init_alias(&pci_dev->bus_master_enable_region, "bus master",
dma_as->root, 0, memory_region_size(dma_as->root));
memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
@@ -2233,6 +2233,13 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
k->props = pci_props;
}
+AddressSpace *pci_iommu_as(PCIDevice *dev)
+{
+ PCIBus *bus = PCI_BUS(dev->bus);
+
+ return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
+}
+
void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
{
bus->iommu_fn = fn;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6ef1f97..21e2132 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -402,6 +402,7 @@ void pci_device_deassert_intx(PCIDevice *dev);
typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
+AddressSpace *pci_iommu_as(PCIDevice *dev);
void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
static inline void
--
1.7.10.4
* [Qemu-devel] [PATCH 06/11] memory: Sanity check that no listeners remain on a destroyed AddressSpace
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (4 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 05/11] pci: Introduce helper to retrieve a PCI device's DMA address space David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 07/11] vfio: Introduce VFIO address spaces David Gibson
` (5 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
At the moment, most AddressSpace objects last as long as the guest system
in practice, but that could well change in future. In addition, for VFIO
we will be introducing some private per-AddressSpace information, which must
be disposed of before the AddressSpace itself is destroyed.
To reduce the chances of subtle bugs in this area, this patch adds
assertions to ensure that when an AddressSpace is destroyed, there are no
remaining MemoryListeners using that AS as a filter.
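The assertion implies an ordering requirement on callers, sketched below (the
names are illustrative): any listener filtered on the address space must be
unregistered before the space is destroyed.

    memory_listener_register(&listener, &as);
    /* ... use the address space ... */

    /* teardown: listener first, otherwise the new assert fires */
    memory_listener_unregister(&listener);
    address_space_destroy(&as);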
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
memory.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/memory.c b/memory.c
index 1a86607..c409ee5 100644
--- a/memory.c
+++ b/memory.c
@@ -1727,11 +1727,18 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
void address_space_destroy(AddressSpace *as)
{
+ MemoryListener *listener;
+
/* Flush out anything from MemoryListeners listening in on this */
memory_region_transaction_begin();
as->root = NULL;
memory_region_transaction_commit();
QTAILQ_REMOVE(&address_spaces, as, address_spaces_link);
+
+ QTAILQ_FOREACH(listener, &memory_listeners, link) {
+ assert(listener->address_space_filter != as);
+ }
+
address_space_destroy_dispatch(as);
flatview_unref(as->current_map);
g_free(as->name);
--
1.7.10.4
* [Qemu-devel] [PATCH 07/11] vfio: Introduce VFIO address spaces
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (5 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 06/11] memory: Sanity check that no listeners remain on a destroyed AddressSpace David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 16:53 ` Alex Williamson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 08/11] vfio: Create VFIOAddressSpace objects as needed David Gibson
` (4 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
The only model so far supported for VFIO passthrough devices is the model
usually used on x86, where all of the guest's RAM is mapped into the
(host) IOMMU and there is no IOMMU visible in the guest.
This patch begins to relax this model, introducing the notion of a
VFIOAddressSpace. This represents a logical DMA address space which will
be visible to one or more VFIO devices by appropriate mapping in the (host)
IOMMU. Thus the currently global list of containers becomes local to
a VFIOAddressSpace, and we verify that we don't attempt to add a VFIO
group to multiple address spaces.
For now, only one VFIOAddressSpace is created and used, corresponding to
main system memory; that will change in future patches.
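Sketched below (not code from the patch) is the ownership hierarchy this ends
up creating:

    VFIOAddressSpace          one per guest-visible DMA address space
      `-- VFIOContainer       one per host IOMMU context (/dev/vfio/vfio fd)
            `-- VFIOGroup     one per host IOMMU group attached to the container
                  `-- VFIODevice   the assigned devices within that group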
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/misc/vfio.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 48 insertions(+), 15 deletions(-)
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index c4a8853..e557f69 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -113,9 +113,17 @@ enum {
VFIO_INT_MSIX = 3,
};
+typedef struct VFIOAddressSpace {
+ AddressSpace *as;
+ QLIST_HEAD(, VFIOContainer) containers;
+} VFIOAddressSpace;
+
+static VFIOAddressSpace vfio_address_space_memory;
+
struct VFIOGroup;
typedef struct VFIOContainer {
+ VFIOAddressSpace *space;
int fd; /* /dev/vfio/vfio, empowered by the attached groups */
struct {
/* enable abstraction to support various iommu backends */
@@ -178,9 +186,6 @@ typedef struct VFIOGroup {
#define MSIX_CAP_LENGTH 12
-static QLIST_HEAD(, VFIOContainer)
- container_list = QLIST_HEAD_INITIALIZER(container_list);
-
static QLIST_HEAD(, VFIOGroup)
group_list = QLIST_HEAD_INITIALIZER(group_list);
@@ -2624,16 +2629,28 @@ static int vfio_load_rom(VFIODevice *vdev)
return 0;
}
-static int vfio_connect_container(VFIOGroup *group)
+static void vfio_address_space_init(VFIOAddressSpace *space, AddressSpace *as)
+{
+ space->as = as;
+ QLIST_INIT(&space->containers);
+}
+
+static int vfio_connect_container(VFIOGroup *group, VFIOAddressSpace *space)
{
VFIOContainer *container;
int ret, fd;
if (group->container) {
- return 0;
+ if (group->container->space == space) {
+ return 0;
+ } else {
+ error_report("vfio: group %d used in multiple address spaces",
+ group->groupid);
+ return -EBUSY;
+ }
}
- QLIST_FOREACH(container, &container_list, next) {
+ QLIST_FOREACH(container, &space->containers, next) {
if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
group->container = container;
QLIST_INSERT_HEAD(&container->group_list, group, container_next);
@@ -2656,6 +2673,7 @@ static int vfio_connect_container(VFIOGroup *group)
}
container = g_malloc0(sizeof(*container));
+ container->space = space;
container->fd = fd;
if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
@@ -2678,7 +2696,8 @@ static int vfio_connect_container(VFIOGroup *group)
container->iommu_data.listener = vfio_memory_listener;
container->iommu_data.release = vfio_listener_release;
- memory_listener_register(&container->iommu_data.listener, &address_space_memory);
+ memory_listener_register(&container->iommu_data.listener,
+ container->space->as);
} else {
error_report("vfio: No available IOMMU models");
g_free(container);
@@ -2687,7 +2706,7 @@ static int vfio_connect_container(VFIOGroup *group)
}
QLIST_INIT(&container->group_list);
- QLIST_INSERT_HEAD(&container_list, container, next);
+ QLIST_INSERT_HEAD(&space->containers, container, next);
group->container = container;
QLIST_INSERT_HEAD(&container->group_list, group, container_next);
@@ -2700,7 +2719,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
VFIOContainer *container = group->container;
if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
- error_report("vfio: error disconnecting group %d from container",
+ error_report("vfio: error disconnecting group %d from context",
group->groupid);
}
@@ -2712,13 +2731,13 @@ static void vfio_disconnect_container(VFIOGroup *group)
container->iommu_data.release(container);
}
QLIST_REMOVE(container, next);
- DPRINTF("vfio_disconnect_container: close container->fd\n");
+ DPRINTF("vfio_disconnect: close container->fd\n");
close(container->fd);
g_free(container);
}
}
-static VFIOGroup *vfio_get_group(int groupid)
+static VFIOGroup *vfio_get_group(int groupid, VFIOAddressSpace *space)
{
VFIOGroup *group;
char path[32];
@@ -2726,7 +2745,15 @@ static VFIOGroup *vfio_get_group(int groupid)
QLIST_FOREACH(group, &group_list, next) {
if (group->groupid == groupid) {
- return group;
+ /* Found it. Now is it already in the right context? */
+ assert(group->container);
+ if (group->container->space == space) {
+ return group;
+ } else {
+ error_report("vfio: group %d used in multiple address spaces",
+ group->groupid);
+ return NULL;
+ }
}
}
@@ -2759,8 +2786,8 @@ static VFIOGroup *vfio_get_group(int groupid)
group->groupid = groupid;
QLIST_INIT(&group->device_list);
- if (vfio_connect_container(group)) {
- error_report("vfio: failed to setup container for group %d", groupid);
+ if (vfio_connect_container(group, space)) {
+ error_report("vfio: failed to setup context for group %d", groupid);
close(group->fd);
g_free(group);
return NULL;
@@ -2992,7 +3019,12 @@ static int vfio_initfn(PCIDevice *pdev)
DPRINTF("%s(%04x:%02x:%02x.%x) group %d\n", __func__, vdev->host.domain,
vdev->host.bus, vdev->host.slot, vdev->host.function, groupid);
- group = vfio_get_group(groupid);
+ if (pci_iommu_as(pdev) != &address_space_memory) {
+ error_report("vfio: DMA address space must be system memory");
+ return -ENXIO;
+ }
+
+ group = vfio_get_group(groupid, &vfio_address_space_memory);
if (!group) {
error_report("vfio: failed to get group %d", groupid);
return -ENOENT;
@@ -3212,6 +3244,7 @@ static const TypeInfo vfio_pci_dev_info = {
static void register_vfio_pci_dev_type(void)
{
+ vfio_address_space_init(&vfio_address_space_memory, &address_space_memory);
type_register_static(&vfio_pci_dev_info);
}
--
1.7.10.4
* [Qemu-devel] [PATCH 08/11] vfio: Create VFIOAddressSpace objects as needed
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (6 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 07/11] vfio: Introduce VFIO address spaces David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry David Gibson
` (3 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
So far, VFIO has a notion of different logical DMA address spaces, but
only ever uses one (system memory). This patch extends this, creating
new VFIOAddressSpace objects as necessary, according to the AddressSpace
reported by the PCI subsystem for this device's DMAs.
This isn't enough yet to support guest side IOMMUs with VFIO, but it does
mean we could now support VFIO devices on, for example, a guest side PCI
host bridge which maps system memory somewhere other than 0 in PCI
space.
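Condensed from the vfio_initfn() changes below, the per-device flow becomes
(error handling abbreviated):

    space = vfio_get_address_space(pci_iommu_as(pdev));
    group = vfio_get_group(groupid, space);
    if (!group) {
        vfio_put_address_space(space);   /* drops the space again if unused */
        return -ENOENT;
    }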
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/misc/vfio.c | 42 ++++++++++++++++++++++++++++++++++--------
1 file changed, 34 insertions(+), 8 deletions(-)
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index e557f69..f4e3792 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -116,9 +116,10 @@ enum {
typedef struct VFIOAddressSpace {
AddressSpace *as;
QLIST_HEAD(, VFIOContainer) containers;
+ QLIST_ENTRY(VFIOAddressSpace) list;
} VFIOAddressSpace;
-static VFIOAddressSpace vfio_address_space_memory;
+QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
struct VFIOGroup;
@@ -2629,10 +2630,33 @@ static int vfio_load_rom(VFIODevice *vdev)
return 0;
}
-static void vfio_address_space_init(VFIOAddressSpace *space, AddressSpace *as)
+static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
{
+ VFIOAddressSpace *space;
+
+ QLIST_FOREACH(space, &vfio_address_spaces, list) {
+ if (space->as == as)
+ return space;
+ }
+
+ /* No suitable VFIOAddressSpace, create a new one */
+ space = g_malloc0(sizeof(*space));
space->as = as;
QLIST_INIT(&space->containers);
+
+ QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
+
+ return space;
+}
+
+static void vfio_put_address_space(VFIOAddressSpace *space)
+{
+ if (!QLIST_EMPTY(&space->containers)) {
+ return;
+ }
+
+ QLIST_REMOVE(space, list);
+ g_free(space);
}
static int vfio_connect_container(VFIOGroup *group, VFIOAddressSpace *space)
@@ -2727,6 +2751,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
group->container = NULL;
if (QLIST_EMPTY(&container->group_list)) {
+ VFIOAddressSpace *space = container->space;
+
if (container->iommu_data.release) {
container->iommu_data.release(container);
}
@@ -2734,6 +2760,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
DPRINTF("vfio_disconnect: close container->fd\n");
close(container->fd);
g_free(container);
+
+ vfio_put_address_space(space);
}
}
@@ -2984,6 +3012,7 @@ static int vfio_initfn(PCIDevice *pdev)
{
VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
VFIOGroup *group;
+ VFIOAddressSpace *space;
char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
ssize_t len;
struct stat st;
@@ -3019,14 +3048,12 @@ static int vfio_initfn(PCIDevice *pdev)
DPRINTF("%s(%04x:%02x:%02x.%x) group %d\n", __func__, vdev->host.domain,
vdev->host.bus, vdev->host.slot, vdev->host.function, groupid);
- if (pci_iommu_as(pdev) != &address_space_memory) {
- error_report("vfio: DMA address space must be system memory");
- return -ENXIO;
- }
+ space = vfio_get_address_space(pci_iommu_as(pdev));
- group = vfio_get_group(groupid, &vfio_address_space_memory);
+ group = vfio_get_group(groupid, space);
if (!group) {
error_report("vfio: failed to get group %d", groupid);
+ vfio_put_address_space(space);
return -ENOENT;
}
@@ -3244,7 +3271,6 @@ static const TypeInfo vfio_pci_dev_info = {
static void register_vfio_pci_dev_type(void)
{
- vfio_address_space_init(&vfio_address_space_memory, &address_space_memory);
type_register_static(&vfio_pci_dev_info);
}
--
1.7.10.4
* [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (7 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 08/11] vfio: Create VFIOAddressSpace objects as needed David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 16:53 ` Alex Williamson
2013-05-14 21:22 ` [Qemu-devel] " Paolo Bonzini
2013-05-14 9:13 ` [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers David Gibson
` (2 subsequent siblings)
11 siblings, 2 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
Currently, the IOMMUTLBEntry structure contains the translated address,
but not the IOVA (original untranslated address). We're going to need it
for upcoming changes, so add it in here, and populate it correctly in our
one existing iommu implementation.
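For illustration, a translate callback is now expected to fill in both halves
of the mapping; page_size, pa and the permission names below are placeholders,
not code from this patch:

    return (IOMMUTLBEntry) {
        .iova            = addr & ~(page_size - 1),  /* untranslated bus address */
        .translated_addr = pa   & ~(page_size - 1),  /* where it maps to */
        .addr_mask       = page_size - 1,
        .perm            = { read_ok, write_ok },
    };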
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/ppc/spapr_iommu.c | 2 ++
include/exec/memory.h | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 90469b3..07a6307 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -80,6 +80,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
if (tcet->bypass) {
return (IOMMUTLBEntry) {
+ .iova = 0,
.translated_addr = 0,
.addr_mask = ~(hwaddr)0,
.perm = { true, true },
@@ -102,6 +103,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
#endif
return (IOMMUTLBEntry) {
+ .iova = addr & ~SPAPR_TCE_PAGE_MASK,
.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK,
.addr_mask = SPAPR_TCE_PAGE_MASK,
.perm = { [0] = tce & SPAPR_TCE_RO, [1] = tce & SPAPR_TCE_WO },
diff --git a/include/exec/memory.h b/include/exec/memory.h
index b97ace7..cd33439 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -116,7 +116,7 @@ typedef struct IOMMUTLBEntry IOMMUTLBEntry;
typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
struct IOMMUTLBEntry {
- hwaddr translated_addr;
+ hwaddr iova, translated_addr;
hwaddr addr_mask; /* 0xfff = 4k translation */
bool perm[2]; /* permissions, [0] for read, [1] for write */
};
--
1.7.10.4
* [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (8 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 17:15 ` Alex Williamson
2013-05-14 21:23 ` Paolo Bonzini
2013-05-14 9:13 ` [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support David Gibson
2013-05-14 9:39 ` [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 Paolo Bonzini
11 siblings, 2 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
This patch adds a NotifierList to MemoryRegions which represent IOMMUs
allowing other parts of the code to register interest in mappings or
unmappings from the IOMMU. All IOMMU implementations will need to call
memory_region_notify_iommu() to inform those waiting on the notifier list,
whenever an IOMMU mapping is made or removed.
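A minimal consumer of the new API would look roughly like this (patch 11 is
the real user; my_map_notify and notifier are invented names):

    static void my_map_notify(Notifier *n, void *data)
    {
        IOMMUTLBEntry *iotlb = data;

        /* perm[0] and perm[1] both false means the iova range was unmapped;
         * otherwise this is a new or updated mapping. */
    }

    static Notifier notifier = { .notify = my_map_notify };

    memory_region_register_iommu_notifier(iommu_mr, &notifier);
    /* ... */
    memory_region_unregister_iommu_notifier(&notifier);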
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/ppc/spapr_iommu.c | 8 ++++++++
include/exec/memory.h | 7 +++++++
memory.c | 18 ++++++++++++++++++
3 files changed, 33 insertions(+)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 07a6307..6d40485 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -194,6 +194,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
target_ulong tce)
{
sPAPRTCE *tcep;
+ IOMMUTLBEntry entry;
if (ioba >= tcet->window_size) {
hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
@@ -204,6 +205,13 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
tcep->tce = tce;
+ entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
+ entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
+ entry.addr_mask = SPAPR_TCE_PAGE_MASK;
+ entry.perm[0] = !!(tce & SPAPR_TCE_RO);
+ entry.perm[1] = !!(tce & SPAPR_TCE_WO);
+ memory_region_notify_iommu(&tcet->iommu, entry);
+
return H_SUCCESS;
}
diff --git a/include/exec/memory.h b/include/exec/memory.h
index cd33439..024b511 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -25,6 +25,7 @@
#include "exec/iorange.h"
#include "exec/ioport.h"
#include "qemu/int128.h"
+#include "qemu/notify.h"
typedef struct MemoryRegionOps MemoryRegionOps;
typedef struct MemoryRegionPortio MemoryRegionPortio;
@@ -160,6 +161,7 @@ struct MemoryRegion {
unsigned ioeventfd_nb;
MemoryRegionIoeventfd *ioeventfds;
struct AddressSpace *iommu_target_as;
+ NotifierList iommu_notify;
};
struct MemoryRegionPortio {
@@ -462,6 +464,11 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
*/
bool memory_region_is_iommu(MemoryRegion *mr);
+void memory_region_notify_iommu(MemoryRegion *mr,
+ IOMMUTLBEntry entry);
+void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
+void memory_region_unregister_iommu_notifier(Notifier *n);
+
/**
* memory_region_name: get a memory region's name
*
diff --git a/memory.c b/memory.c
index c409ee5..b11ca9f 100644
--- a/memory.c
+++ b/memory.c
@@ -1060,6 +1060,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
mr->terminates = true; /* then re-forwards */
mr->destructor = memory_region_destructor_none;
mr->iommu_target_as = target_as;
+ notifier_list_init(&mr->iommu_notify);
}
static uint64_t invalid_read(void *opaque, hwaddr addr,
@@ -1175,6 +1176,23 @@ bool memory_region_is_iommu(MemoryRegion *mr)
return mr->iommu_ops;
}
+void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
+{
+ notifier_list_add(&mr->iommu_notify, n);
+}
+
+void memory_region_unregister_iommu_notifier(Notifier *n)
+{
+ notifier_remove(n);
+}
+
+void memory_region_notify_iommu(MemoryRegion *mr,
+ IOMMUTLBEntry entry)
+{
+ assert(memory_region_is_iommu(mr));
+ notifier_list_notify(&mr->iommu_notify, &entry);
+}
+
void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
{
uint8_t mask = 1 << client;
--
1.7.10.4
* [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (9 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers David Gibson
@ 2013-05-14 9:13 ` David Gibson
2013-05-14 9:48 ` Paolo Bonzini
2013-05-14 17:15 ` [Qemu-devel] " Alex Williamson
2013-05-14 9:39 ` [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 Paolo Bonzini
11 siblings, 2 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 9:13 UTC (permalink / raw)
To: alex.williamson, pbonzini; +Cc: aik, David Gibson, qemu-ppc, qemu-devel, mst
This patch uses the new IOMMU notifiers to allow VFIO pass through devices
to work with guest side IOMMUs, as long as the host-side VFIO iommu has
sufficient capability and granularity to match the guest side. This works
by tracking all map and unmap operations on the guest IOMMU using the
notifiers, and mirroring them into VFIO.
There are a number of FIXMEs, and the scheme involves rather more notifier
structures than I'd like, but it should make for a reasonable proof of
concept.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/misc/vfio.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 126 insertions(+), 13 deletions(-)
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index f4e3792..62a83ca 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -133,10 +133,18 @@ typedef struct VFIOContainer {
};
void (*release)(struct VFIOContainer *);
} iommu_data;
+ QLIST_HEAD(, VFIOGuestIOMMU) guest_iommus;
QLIST_HEAD(, VFIOGroup) group_list;
QLIST_ENTRY(VFIOContainer) next;
} VFIOContainer;
+typedef struct VFIOGuestIOMMU {
+ VFIOContainer *container;
+ MemoryRegion *iommu;
+ Notifier n;
+ QLIST_ENTRY(VFIOGuestIOMMU) list;
+} VFIOGuestIOMMU;
+
/* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
typedef struct VFIOMSIXInfo {
uint8_t table_bar;
@@ -1940,7 +1948,64 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
static bool vfio_listener_skipped_section(MemoryRegionSection *section)
{
- return !memory_region_is_ram(section->mr);
+ return !memory_region_is_ram(section->mr) &&
+ !memory_region_is_iommu(section->mr);
+}
+
+static void vfio_iommu_map_notify(Notifier *n, void *data)
+{
+ VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+ MemoryRegion *iommu = giommu->iommu;
+ VFIOContainer *container = giommu->container;
+ IOMMUTLBEntry *iotlb = data;
+ MemoryRegionSection *mrs;
+ hwaddr xlat;
+ hwaddr len = iotlb->addr_mask + 1;
+ void *vaddr;
+ int ret;
+
+ DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+ iotlb->iova, iotlb->iova + iotlb->address_mask);
+
+ /* The IOMMU TLB entry we have just covers translation through
+ * this IOMMU to its immediate target. We need to translate
+ * it the rest of the way through to memory. */
+ mrs = address_space_translate(iommu->iommu_target_as,
+ iotlb->translated_addr,
+ &xlat, &len, iotlb->perm[1]);
+ if (!memory_region_is_ram(mrs->mr)) {
+ DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
+ xlat);
+ return;
+ }
+ if (len & iotlb->addr_mask) {
+ DPRINTF("iommu has granularity incompatible with target AS\n");
+ return;
+ }
+
+ vaddr = memory_region_get_ram_ptr(mrs->mr) +
+ mrs->offset_within_region +
+ (xlat - mrs->offset_within_address_space);
+
+ if (iotlb->perm[0] || iotlb->perm[1]) {
+ ret = vfio_dma_map(container, iotlb->iova,
+ iotlb->addr_mask + 1, vaddr,
+ !iotlb->perm[1] || mrs->readonly);
+ if (ret) {
+ error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iotlb->iova,
+ iotlb->addr_mask + 1, vaddr, ret);
+ }
+ } else {
+ ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
+ if (ret) {
+ error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iotlb->iova,
+ iotlb->addr_mask + 1, ret);
+ }
+ }
}
static void vfio_listener_region_add(MemoryListener *listener,
@@ -1949,11 +2014,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
VFIOContainer *container = container_of(listener, VFIOContainer,
iommu_data.listener);
hwaddr iova, end;
- void *vaddr;
int ret;
- assert(!memory_region_is_iommu(section->mr));
-
if (vfio_listener_skipped_section(section)) {
DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
section->offset_within_address_space,
@@ -1975,20 +2037,55 @@ static void vfio_listener_region_add(MemoryListener *listener,
return;
}
- vaddr = memory_region_get_ram_ptr(section->mr) +
+ memory_region_ref(section->mr);
+
+ if (memory_region_is_ram(section->mr)) {
+ void *vaddr;
+
+ DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
+ iova, end - 1, vaddr);
+
+ vaddr = memory_region_get_ram_ptr(section->mr) +
section->offset_within_region +
(iova - section->offset_within_address_space);
- DPRINTF("region_add %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
- iova, end - 1, vaddr);
- memory_region_ref(section->mr);
- ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
- if (ret) {
- error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx", %p) = %d (%m)",
- container, iova, end - iova, vaddr, ret);
+ ret = vfio_dma_map(container, iova, end - iova, vaddr,
+ section->readonly);
+ if (ret) {
+ error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, end - iova, vaddr, ret);
+ }
+ } else if (memory_region_is_iommu(section->mr)) {
+ VFIOGuestIOMMU *giommu;
+
+ DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+ iova, end - 1);
+
+ /*
+ * FIXME: We should do some checking to see if the
+ * capabilities of the host VFIO IOMMU are adequate to model
+ * the guest IOMMU
+ *
+ * FIXME: This assumes that the guest IOMMU is empty of
+ * mappings at this point - we should either enforce this, or
+ * loop through existing mappings to map them into VFIO.
+ *
+ * FIXME: For VFIO iommu types which have KVM acceleration to
+ * avoid bouncing all map/unmaps through qemu this way, this
+ * would be the right place to wire that up (tell the KVM
+ * device emulation the VFIO iommu handles to use).
+ */
+ giommu = g_malloc0(sizeof(*giommu));
+ giommu->iommu = section->mr;
+ giommu->container = container;
+ giommu->n.notify = vfio_iommu_map_notify;
+
+ QLIST_INSERT_HEAD(&container->guest_iommus, giommu, list);
+ memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
}
+
}
static void vfio_listener_region_del(MemoryListener *listener,
@@ -2012,6 +2109,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
return;
}
+ if (memory_region_is_iommu(section->mr)) {
+ VFIOGuestIOMMU *giommu;
+
+ QLIST_FOREACH(giommu, &container->guest_iommus, list) {
+ if (giommu->iommu == section->mr) {
+ memory_region_unregister_iommu_notifier(&giommu->n);
+ break;
+ }
+ }
+
+ /* FIXME: We assume the one big unmap below is adequate to
+ * remove any individual page mappings in the IOMMU which
+ * might have been copied into VFIO. That may not be true for
+ * all IOMMU types */
+ }
+
iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
end = (section->offset_within_address_space + section->size) &
TARGET_PAGE_MASK;
--
1.7.10.4
* Re: [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
` (10 preceding siblings ...)
2013-05-14 9:13 ` [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support David Gibson
@ 2013-05-14 9:39 ` Paolo Bonzini
11 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2013-05-14 9:39 UTC (permalink / raw)
To: David Gibson; +Cc: aik, alex.williamson, qemu-ppc, qemu-devel, mst
Il 14/05/2013 11:13, David Gibson ha scritto:
> Hi,
>
> Here's the next version of my patches working towards integrating vfio
> with guest visible iommu support. These are on top of Paolo Bonzini's
> iommu branch at:
> git://github.com/bonzini/qemu.git
> Paolo says he has already merged patches 1-4, but they don't seem to
> have hit the published tree yet.
Will do that ASAP.
> With this version, we have, theory, working support for guest IOMMUs
> with VFIO. It's untested, though, and needs considerable polish.
> Still, it's a proof of concept, and should demonstrate what I have in
> mind here.
It's a very nice proof of concept, as far as I can tell (don't know much
about VFIO).
Paolo
* Re: [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-14 9:13 ` [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support David Gibson
@ 2013-05-14 9:48 ` Paolo Bonzini
2013-05-14 13:57 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2013-05-14 17:15 ` [Qemu-devel] " Alex Williamson
1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2013-05-14 9:48 UTC (permalink / raw)
To: David Gibson; +Cc: aik, alex.williamson, qemu-ppc, qemu-devel, mst
Il 14/05/2013 11:13, David Gibson ha scritto:
> This patch uses the new IOMMU notifiers to allow VFIO pass through devices
> to work with guest side IOMMUs, as long as the host-side VFIO iommu has
> sufficient capability and granularity to match the guest side. This works
> by tracking all map and unmap operations on the guest IOMMU using the
> notifiers, and mirroring them into VFIO.
>
> There are a number of FIXMEs, and the scheme involves rather more notifier
> structures than I'd like, but it shuold make for a reasonable proof of
> concept.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/misc/vfio.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 126 insertions(+), 13 deletions(-)
>
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index f4e3792..62a83ca 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -133,10 +133,18 @@ typedef struct VFIOContainer {
> };
> void (*release)(struct VFIOContainer *);
> } iommu_data;
> + QLIST_HEAD(, VFIOGuestIOMMU) guest_iommus;
> QLIST_HEAD(, VFIOGroup) group_list;
> QLIST_ENTRY(VFIOContainer) next;
> } VFIOContainer;
>
> +typedef struct VFIOGuestIOMMU {
> + VFIOContainer *container;
> + MemoryRegion *iommu;
> + Notifier n;
> + QLIST_ENTRY(VFIOGuestIOMMU) list;
> +} VFIOGuestIOMMU;
> +
> /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
> typedef struct VFIOMSIXInfo {
> uint8_t table_bar;
> @@ -1940,7 +1948,64 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
>
> static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> {
> - return !memory_region_is_ram(section->mr);
> + return !memory_region_is_ram(section->mr) &&
> + !memory_region_is_iommu(section->mr);
> +}
> +
> +static void vfio_iommu_map_notify(Notifier *n, void *data)
> +{
> + VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> + MemoryRegion *iommu = giommu->iommu;
> + VFIOContainer *container = giommu->container;
> + IOMMUTLBEntry *iotlb = data;
> + MemoryRegionSection *mrs;
> + hwaddr xlat;
> + hwaddr len = iotlb->addr_mask + 1;
> + void *vaddr;
> + int ret;
> +
> + DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> + iotlb->iova, iotlb->iova + iotlb->address_mask);
> +
> + /* The IOMMU TLB entry we have just covers translation through
> + * this IOMMU to its immediate target. We need to translate
> + * it the rest of the way through to memory. */
> + mrs = address_space_translate(iommu->iommu_target_as,
> + iotlb->translated_addr,
> + &xlat, &len, iotlb->perm[1]);
> + if (!memory_region_is_ram(mrs->mr)) {
> + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
> + xlat);
> + return;
> + }
> + if (len & iotlb->addr_mask) {
> + DPRINTF("iommu has granularity incompatible with target AS\n");
> + return;
> + }
> +
> + vaddr = memory_region_get_ram_ptr(mrs->mr) +
> + mrs->offset_within_region +
> + (xlat - mrs->offset_within_address_space);
Are you sure you need this translation? It's done already in
address_space_translate, so it should be just
vaddr = memory_region_get_ram_ptr(mrs->mr) + xlat;
Paolo
> + if (iotlb->perm[0] || iotlb->perm[1]) {
> + ret = vfio_dma_map(container, iotlb->iova,
> + iotlb->addr_mask + 1, vaddr,
> + !iotlb->perm[1] || mrs->readonly);
> + if (ret) {
> + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> + container, iotlb->iova,
> + iotlb->addr_mask + 1, vaddr, ret);
> + }
> + } else {
> + ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
> + if (ret) {
> + error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> + "0x%"HWADDR_PRIx") = %d (%m)",
> + container, iotlb->iova,
> + iotlb->addr_mask + 1, ret);
> + }
> + }
> }
>
> static void vfio_listener_region_add(MemoryListener *listener,
> @@ -1949,11 +2014,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
> VFIOContainer *container = container_of(listener, VFIOContainer,
> iommu_data.listener);
> hwaddr iova, end;
> - void *vaddr;
> int ret;
>
> - assert(!memory_region_is_iommu(section->mr));
> -
> if (vfio_listener_skipped_section(section)) {
> DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
> section->offset_within_address_space,
> @@ -1975,20 +2037,55 @@ static void vfio_listener_region_add(MemoryListener *listener,
> return;
> }
>
> - vaddr = memory_region_get_ram_ptr(section->mr) +
> + memory_region_ref(section->mr);
> +
> + if (memory_region_is_ram(section->mr)) {
> + void *vaddr;
> +
> + DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> + iova, end - 1, vaddr);
> +
> + vaddr = memory_region_get_ram_ptr(section->mr) +
> section->offset_within_region +
> (iova - section->offset_within_address_space);
>
> - DPRINTF("region_add %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> - iova, end - 1, vaddr);
>
> - memory_region_ref(section->mr);
> - ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
> - if (ret) {
> - error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> - "0x%"HWADDR_PRIx", %p) = %d (%m)",
> - container, iova, end - iova, vaddr, ret);
> + ret = vfio_dma_map(container, iova, end - iova, vaddr,
> + section->readonly);
> + if (ret) {
> + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> + container, iova, end - iova, vaddr, ret);
> + }
> + } else if (memory_region_is_iommu(section->mr)) {
> + VFIOGuestIOMMU *giommu;
> +
> + DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> + iova, end - 1);
> +
> + /*
> + * FIXME: We should do some checking to see if the
> + * capabilities of the host VFIO IOMMU are adequate to model
> + * the guest IOMMU
> + *
> + * FIXME: This assumes that the guest IOMMU is empty of
> + * mappings at this point - we should either enforce this, or
> + * loop through existing mappings to map them into VFIO.
> + *
> + * FIXME: For VFIO iommu types which have KVM acceleration to
> + * avoid bouncing all map/unmaps through qemu this way, this
> + * would be the right place to wire that up (tell the KVM
> + * device emulation the VFIO iommu handles to use).
> + */
> + giommu = g_malloc0(sizeof(*giommu));
> + giommu->iommu = section->mr;
> + giommu->container = container;
> + giommu->n.notify = vfio_iommu_map_notify;
> +
> + QLIST_INSERT_HEAD(&container->guest_iommus, giommu, list);
> + memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> }
> +
> }
>
> static void vfio_listener_region_del(MemoryListener *listener,
> @@ -2012,6 +2109,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
> return;
> }
>
> + if (memory_region_is_iommu(section->mr)) {
> + VFIOGuestIOMMU *giommu;
> +
> + QLIST_FOREACH(giommu, &container->guest_iommus, list) {
> + if (giommu->iommu == section->mr) {
> + memory_region_unregister_iommu_notifier(&giommu->n);
> + break;
> + }
> + }
> +
> + /* FIXME: We assume the one big unmap below is adequate to
> + * remove any individual page mappings in the IOMMU which
> + * might have been copied into VFIO. That may not be true for
> + * all IOMMU types */
> + }
> +
> iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> end = (section->offset_within_address_space + section->size) &
> TARGET_PAGE_MASK;
>
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-14 9:48 ` Paolo Bonzini
@ 2013-05-14 13:57 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-14 13:57 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: alex.williamson, qemu-ppc, qemu-devel, mst
On Tue, May 14, 2013 at 11:48:47AM +0200, Paolo Bonzini wrote:
> Il 14/05/2013 11:13, David Gibson ha scritto:
> > This patch uses the new IOMMU notifiers to allow VFIO pass through devices
> > to work with guest side IOMMUs, as long as the host-side VFIO iommu has
> > sufficient capability and granularity to match the guest side. This works
> > by tracking all map and unmap operations on the guest IOMMU using the
> > notifiers, and mirroring them into VFIO.
> >
> > There are a number of FIXMEs, and the scheme involves rather more notifier
> > structures than I'd like, but it shuold make for a reasonable proof of
> > concept.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > hw/misc/vfio.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 126 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > index f4e3792..62a83ca 100644
> > --- a/hw/misc/vfio.c
> > +++ b/hw/misc/vfio.c
> > @@ -133,10 +133,18 @@ typedef struct VFIOContainer {
> > };
> > void (*release)(struct VFIOContainer *);
> > } iommu_data;
> > + QLIST_HEAD(, VFIOGuestIOMMU) guest_iommus;
> > QLIST_HEAD(, VFIOGroup) group_list;
> > QLIST_ENTRY(VFIOContainer) next;
> > } VFIOContainer;
> >
> > +typedef struct VFIOGuestIOMMU {
> > + VFIOContainer *container;
> > + MemoryRegion *iommu;
> > + Notifier n;
> > + QLIST_ENTRY(VFIOGuestIOMMU) list;
> > +} VFIOGuestIOMMU;
> > +
> > /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
> > typedef struct VFIOMSIXInfo {
> > uint8_t table_bar;
> > @@ -1940,7 +1948,64 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
> >
> > static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > {
> > - return !memory_region_is_ram(section->mr);
> > + return !memory_region_is_ram(section->mr) &&
> > + !memory_region_is_iommu(section->mr);
> > +}
> > +
> > +static void vfio_iommu_map_notify(Notifier *n, void *data)
> > +{
> > + VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > + MemoryRegion *iommu = giommu->iommu;
> > + VFIOContainer *container = giommu->container;
> > + IOMMUTLBEntry *iotlb = data;
> > + MemoryRegionSection *mrs;
> > + hwaddr xlat;
> > + hwaddr len = iotlb->addr_mask + 1;
> > + void *vaddr;
> > + int ret;
> > +
> > + DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> > + iotlb->iova, iotlb->iova + iotlb->address_mask);
> > +
> > + /* The IOMMU TLB entry we have just covers translation through
> > + * this IOMMU to its immediate target. We need to translate
> > + * it the rest of the way through to memory. */
> > + mrs = address_space_translate(iommu->iommu_target_as,
> > + iotlb->translated_addr,
> > + &xlat, &len, iotlb->perm[1]);
> > + if (!memory_region_is_ram(mrs->mr)) {
> > + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
> > + xlat);
> > + return;
> > + }
> > + if (len & iotlb->addr_mask) {
> > + DPRINTF("iommu has granularity incompatible with target AS\n");
> > + return;
> > + }
> > +
> > + vaddr = memory_region_get_ram_ptr(mrs->mr) +
> > + mrs->offset_within_region +
> > + (xlat - mrs->offset_within_address_space);
>
> Are you sure you need this translation? It's done already in
> address_space_translate, so it should be just
>
> vaddr = memory_region_get_ram_ptr(mrs->mr) + xlat;
Ah, yes. I wasn't sure what xlat was relative to (the final address
space, or the final memory region). Looking more carefully at the
similar path in address_space_map(), I see that it's the memory
region. I'll fix that up.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [PATCH 07/11] vfio: Introduce VFIO address spaces
2013-05-14 9:13 ` [Qemu-devel] [PATCH 07/11] vfio: Introduce VFIO address spaces David Gibson
@ 2013-05-14 16:53 ` Alex Williamson
2013-05-15 1:18 ` [Qemu-devel] [Qemu-ppc] " David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2013-05-14 16:53 UTC (permalink / raw)
To: David Gibson; +Cc: aik, pbonzini, qemu-ppc, qemu-devel, mst
On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> The only model so far supported for VFIO passthrough devices is the model
> usually used on x86, where all of the guest's RAM is mapped into the
> (host) IOMMU and there is no IOMMU visible in the guest.
>
> This patch begins to relax this model, introducing the notion of a
> VFIOAddressSpace. This represents a logical DMA address space which will
> be visible to one or more VFIO devices by appropriate mapping in the (host)
> IOMMU. Thus the currently global list of containers becomes local to
> a VFIOAddressSpace, and we verify that we don't attempt to add a VFIO
> group to multiple address spaces.
>
> For now, only one VFIOAddressSpace is created and used, corresponding to
> main system memory, that will change in future patches.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/misc/vfio.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 48 insertions(+), 15 deletions(-)
>
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index c4a8853..e557f69 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -113,9 +113,17 @@ enum {
> VFIO_INT_MSIX = 3,
> };
>
> +typedef struct VFIOAddressSpace {
> + AddressSpace *as;
> + QLIST_HEAD(, VFIOContainer) containers;
> +} VFIOAddressSpace;
> +
> +static VFIOAddressSpace vfio_address_space_memory;
> +
> struct VFIOGroup;
>
> typedef struct VFIOContainer {
> + VFIOAddressSpace *space;
> int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> struct {
> /* enable abstraction to support various iommu backends */
> @@ -178,9 +186,6 @@ typedef struct VFIOGroup {
>
> #define MSIX_CAP_LENGTH 12
>
> -static QLIST_HEAD(, VFIOContainer)
> - container_list = QLIST_HEAD_INITIALIZER(container_list);
> -
> static QLIST_HEAD(, VFIOGroup)
> group_list = QLIST_HEAD_INITIALIZER(group_list);
>
> @@ -2624,16 +2629,28 @@ static int vfio_load_rom(VFIODevice *vdev)
> return 0;
> }
>
> -static int vfio_connect_container(VFIOGroup *group)
> +static void vfio_address_space_init(VFIOAddressSpace *space, AddressSpace *as)
> +{
> + space->as = as;
> + QLIST_INIT(&space->containers);
> +}
> +
> +static int vfio_connect_container(VFIOGroup *group, VFIOAddressSpace *space)
> {
> VFIOContainer *container;
> int ret, fd;
>
> if (group->container) {
> - return 0;
> + if (group->container->space == space) {
> + return 0;
> + } else {
> + error_report("vfio: group %d used in multiple address spaces",
> + group->groupid);
> + return -EBUSY;
> + }
> }
>
> - QLIST_FOREACH(container, &container_list, next) {
> + QLIST_FOREACH(container, &space->containers, next) {
> if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> group->container = container;
> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> @@ -2656,6 +2673,7 @@ static int vfio_connect_container(VFIOGroup *group)
> }
>
> container = g_malloc0(sizeof(*container));
> + container->space = space;
> container->fd = fd;
>
> if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
> @@ -2678,7 +2696,8 @@ static int vfio_connect_container(VFIOGroup *group)
> container->iommu_data.listener = vfio_memory_listener;
> container->iommu_data.release = vfio_listener_release;
>
> - memory_listener_register(&container->iommu_data.listener, &address_space_memory);
> + memory_listener_register(&container->iommu_data.listener,
> + container->space->as);
> } else {
> error_report("vfio: No available IOMMU models");
> g_free(container);
> @@ -2687,7 +2706,7 @@ static int vfio_connect_container(VFIOGroup *group)
> }
>
> QLIST_INIT(&container->group_list);
> - QLIST_INSERT_HEAD(&container_list, container, next);
> + QLIST_INSERT_HEAD(&space->containers, container, next);
>
> group->container = container;
> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> @@ -2700,7 +2719,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
> VFIOContainer *container = group->container;
>
> if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
> - error_report("vfio: error disconnecting group %d from container",
> + error_report("vfio: error disconnecting group %d from context",
> group->groupid);
> }
>
> @@ -2712,13 +2731,13 @@ static void vfio_disconnect_container(VFIOGroup *group)
> container->iommu_data.release(container);
> }
> QLIST_REMOVE(container, next);
> - DPRINTF("vfio_disconnect_container: close container->fd\n");
> + DPRINTF("vfio_disconnect: close container->fd\n");
> close(container->fd);
> g_free(container);
> }
> }
Drop the above two chunks.
> -static VFIOGroup *vfio_get_group(int groupid)
> +static VFIOGroup *vfio_get_group(int groupid, VFIOAddressSpace *space)
> {
> VFIOGroup *group;
> char path[32];
> @@ -2726,7 +2745,15 @@ static VFIOGroup *vfio_get_group(int groupid)
>
> QLIST_FOREACH(group, &group_list, next) {
> if (group->groupid == groupid) {
> - return group;
> + /* Found it. Now is it already in the right context? */
> + assert(group->container);
> + if (group->container->space == space) {
> + return group;
> + } else {
> + error_report("vfio: group %d used in multiple address spaces",
> + group->groupid);
> + return NULL;
> + }
> }
> }
>
> @@ -2759,8 +2786,8 @@ static VFIOGroup *vfio_get_group(int groupid)
> group->groupid = groupid;
> QLIST_INIT(&group->device_list);
>
> - if (vfio_connect_container(group)) {
> - error_report("vfio: failed to setup container for group %d", groupid);
> + if (vfio_connect_container(group, space)) {
> + error_report("vfio: failed to setup context for group %d", groupid);
s/container/context/ is unnecessary now.
> close(group->fd);
> g_free(group);
> return NULL;
> @@ -2992,7 +3019,12 @@ static int vfio_initfn(PCIDevice *pdev)
> DPRINTF("%s(%04x:%02x:%02x.%x) group %d\n", __func__, vdev->host.domain,
> vdev->host.bus, vdev->host.slot, vdev->host.function, groupid);
>
> - group = vfio_get_group(groupid);
> + if (pci_iommu_as(pdev) != &address_space_memory) {
> + error_report("vfio: DMA address space must be system memory");
> + return -ENXIO;
> + }
> +
> + group = vfio_get_group(groupid, &vfio_address_space_memory);
> if (!group) {
> error_report("vfio: failed to get group %d", groupid);
> return -ENOENT;
> @@ -3212,6 +3244,7 @@ static const TypeInfo vfio_pci_dev_info = {
>
> static void register_vfio_pci_dev_type(void)
> {
> + vfio_address_space_init(&vfio_address_space_memory, &address_space_memory);
> type_register_static(&vfio_pci_dev_info);
> }
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry
2013-05-14 9:13 ` [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry David Gibson
@ 2013-05-14 16:53 ` Alex Williamson
2013-05-15 3:51 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2013-05-14 21:22 ` [Qemu-devel] " Paolo Bonzini
1 sibling, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2013-05-14 16:53 UTC (permalink / raw)
To: David Gibson; +Cc: aik, pbonzini, qemu-ppc, qemu-devel, mst
On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> Currently, the IOMMUTLBEntry structure contains the translated address,
> but not the IOVA (original untranslated address). We're going to need it
> for upcoming changes, so add it in here, and populate it correctly in our
> one existing iommu implementation.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/ppc/spapr_iommu.c | 2 ++
> include/exec/memory.h | 2 +-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 90469b3..07a6307 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -80,6 +80,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>
> if (tcet->bypass) {
> return (IOMMUTLBEntry) {
> + .iova = 0,
> .translated_addr = 0,
> .addr_mask = ~(hwaddr)0,
> .perm = { true, true },
> @@ -102,6 +103,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
> #endif
>
> return (IOMMUTLBEntry) {
> + .iova = addr & ~SPAPR_TCE_PAGE_MASK,
> .translated_addr = tce & ~SPAPR_TCE_PAGE_MASK,
> .addr_mask = SPAPR_TCE_PAGE_MASK,
> .perm = { [0] = tce & SPAPR_TCE_RO, [1] = tce & SPAPR_TCE_WO },
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index b97ace7..cd33439 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -116,7 +116,7 @@ typedef struct IOMMUTLBEntry IOMMUTLBEntry;
> typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
>
> struct IOMMUTLBEntry {
> - hwaddr translated_addr;
> + hwaddr iova, translated_addr;
Being a struct definition, I assume we'd want to keep these on separate
lines for clarity (and preferably comments).
> hwaddr addr_mask; /* 0xfff = 4k translation */
> bool perm[2]; /* permissions, [0] for read, [1] for write */
> };
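Something along these lines would do it (just a sketch of the layout, not
tested):

    struct IOMMUTLBEntry {
        hwaddr iova;             /* original untranslated address (IOVA) */
        hwaddr translated_addr;  /* output of this IOMMU's translation */
        hwaddr addr_mask;        /* 0xfff = 4k translation */
        bool perm[2];            /* permissions, [0] for read, [1] for write */
    };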
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-14 9:13 ` [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support David Gibson
2013-05-14 9:48 ` Paolo Bonzini
@ 2013-05-14 17:15 ` Alex Williamson
2013-05-15 1:33 ` [Qemu-devel] [Qemu-ppc] " David Gibson
1 sibling, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2013-05-14 17:15 UTC (permalink / raw)
To: David Gibson; +Cc: aik, pbonzini, qemu-ppc, qemu-devel, mst
On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> This patch uses the new IOMMU notifiers to allow VFIO pass through devices
> to work with guest side IOMMUs, as long as the host-side VFIO iommu has
> sufficient capability and granularity to match the guest side. This works
> by tracking all map and unmap operations on the guest IOMMU using the
> notifiers, and mirroring them into VFIO.
>
> There are a number of FIXMEs, and the scheme involves rather more notifier
> structures than I'd like, but it should make for a reasonable proof of
> concept.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/misc/vfio.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 126 insertions(+), 13 deletions(-)
>
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index f4e3792..62a83ca 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -133,10 +133,18 @@ typedef struct VFIOContainer {
> };
> void (*release)(struct VFIOContainer *);
> } iommu_data;
> + QLIST_HEAD(, VFIOGuestIOMMU) guest_iommus;
Seems like this would be related to the space, not the container.
> QLIST_HEAD(, VFIOGroup) group_list;
> QLIST_ENTRY(VFIOContainer) next;
> } VFIOContainer;
>
> +typedef struct VFIOGuestIOMMU {
> + VFIOContainer *container;
> + MemoryRegion *iommu;
> + Notifier n;
> + QLIST_ENTRY(VFIOGuestIOMMU) list;
> +} VFIOGuestIOMMU;
> +
> /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
> typedef struct VFIOMSIXInfo {
> uint8_t table_bar;
> @@ -1940,7 +1948,64 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
>
> static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> {
> - return !memory_region_is_ram(section->mr);
> + return !memory_region_is_ram(section->mr) &&
> + !memory_region_is_iommu(section->mr);
> +}
> +
> +static void vfio_iommu_map_notify(Notifier *n, void *data)
> +{
> + VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> + MemoryRegion *iommu = giommu->iommu;
> + VFIOContainer *container = giommu->container;
> + IOMMUTLBEntry *iotlb = data;
> + MemoryRegionSection *mrs;
> + hwaddr xlat;
> + hwaddr len = iotlb->addr_mask + 1;
> + void *vaddr;
> + int ret;
> +
> + DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> + iotlb->iova, iotlb->iova + iotlb->address_mask);
> +
> + /* The IOMMU TLB entry we have just covers translation through
> + * this IOMMU to its immediate target. We need to translate
> + * it the rest of the way through to memory. */
> + mrs = address_space_translate(iommu->iommu_target_as,
> + iotlb->translated_addr,
> + &xlat, &len, iotlb->perm[1]);
> + if (!memory_region_is_ram(mrs->mr)) {
> + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
> + xlat);
> + return;
> + }
> + if (len & iotlb->addr_mask) {
> + DPRINTF("iommu has granularity incompatible with target AS\n");
> + return;
> + }
> +
> + vaddr = memory_region_get_ram_ptr(mrs->mr) +
> + mrs->offset_within_region +
> + (xlat - mrs->offset_within_address_space);
> +
> + if (iotlb->perm[0] || iotlb->perm[1]) {
> + ret = vfio_dma_map(container, iotlb->iova,
> + iotlb->addr_mask + 1, vaddr,
> + !iotlb->perm[1] || mrs->readonly);
> + if (ret) {
> + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> + container, iotlb->iova,
> + iotlb->addr_mask + 1, vaddr, ret);
> + }
> + } else {
> + ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
> + if (ret) {
> + error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> + "0x%"HWADDR_PRIx") = %d (%m)",
> + container, iotlb->iova,
> + iotlb->addr_mask + 1, ret);
> + }
> + }
> }
>
> static void vfio_listener_region_add(MemoryListener *listener,
> @@ -1949,11 +2014,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
> VFIOContainer *container = container_of(listener, VFIOContainer,
> iommu_data.listener);
> hwaddr iova, end;
> - void *vaddr;
> int ret;
>
> - assert(!memory_region_is_iommu(section->mr));
> -
> if (vfio_listener_skipped_section(section)) {
> DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
> section->offset_within_address_space,
> @@ -1975,20 +2037,55 @@ static void vfio_listener_region_add(MemoryListener *listener,
> return;
> }
>
> - vaddr = memory_region_get_ram_ptr(section->mr) +
> + memory_region_ref(section->mr);
> +
> + if (memory_region_is_ram(section->mr)) {
> + void *vaddr;
> +
> + DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> + iova, end - 1, vaddr);
> +
> + vaddr = memory_region_get_ram_ptr(section->mr) +
> section->offset_within_region +
> (iova - section->offset_within_address_space);
>
> - DPRINTF("region_add %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> - iova, end - 1, vaddr);
>
> - memory_region_ref(section->mr);
> - ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
> - if (ret) {
> - error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> - "0x%"HWADDR_PRIx", %p) = %d (%m)",
> - container, iova, end - iova, vaddr, ret);
> + ret = vfio_dma_map(container, iova, end - iova, vaddr,
> + section->readonly);
> + if (ret) {
> + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> + container, iova, end - iova, vaddr, ret);
> + }
> + } else if (memory_region_is_iommu(section->mr)) {
> + VFIOGuestIOMMU *giommu;
> +
> + DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> + iova, end - 1);
> +
> + /*
> + * FIXME: We should do some checking to see if the
> + * capabilities of the host VFIO IOMMU are adequate to model
> + * the guest IOMMU
> + *
> + * FIXME: This assumes that the guest IOMMU is empty of
> + * mappings at this point - we should either enforce this, or
> + * loop through existing mappings to map them into VFIO.
> + *
> + * FIXME: For VFIO iommu types which have KVM acceleration to
> + * avoid bouncing all map/unmaps through qemu this way, this
> + * would be the right place to wire that up (tell the KVM
> + * device emulation the VFIO iommu handles to use).
> + */
> + giommu = g_malloc0(sizeof(*giommu));
> + giommu->iommu = section->mr;
> + giommu->container = container;
> + giommu->n.notify = vfio_iommu_map_notify;
> +
> + QLIST_INSERT_HEAD(&container->guest_iommus, giommu, list);
> + memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
And this is also filtered on the space, so we're not adding iommus that
aren't handling regions within this space, right?
Does the memory listener need to move to the space?
> }
> +
> }
>
> static void vfio_listener_region_del(MemoryListener *listener,
> @@ -2012,6 +2109,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
> return;
> }
>
> + if (memory_region_is_iommu(section->mr)) {
> + VFIOGuestIOMMU *giommu;
> +
> + QLIST_FOREACH(giommu, &container->guest_iommus, list) {
> + if (giommu->iommu == section->mr) {
> + memory_region_unregister_iommu_notifier(&giommu->n);
Missing free
> + break;
> + }
> + }
> +
> + /* FIXME: We assume the one big unmap below is adequate to
> + * remove any individual page mappings in the IOMMU which
> + * might have been copied into VFIO. That may not be true for
> + * all IOMMU types */
> + }
> +
> iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> end = (section->offset_within_address_space + section->size) &
> TARGET_PAGE_MASK;
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers
2013-05-14 9:13 ` [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers David Gibson
@ 2013-05-14 17:15 ` Alex Williamson
2013-05-14 21:02 ` Paolo Bonzini
2013-05-14 21:23 ` Paolo Bonzini
1 sibling, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2013-05-14 17:15 UTC (permalink / raw)
To: David Gibson; +Cc: aik, pbonzini, qemu-ppc, qemu-devel, mst
On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> This patch adds a NotifierList to MemoryRegions which represent IOMMUs
> allowing other parts of the code to register interest in mappings or
> unmappings from the IOMMU. All IOMMU implementations will need to call
> memory_region_notify_iommu() to inform those waiting on the notifier list,
> whenever an IOMMU mapping is made or removed.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/ppc/spapr_iommu.c | 8 ++++++++
> include/exec/memory.h | 7 +++++++
> memory.c | 18 ++++++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 07a6307..6d40485 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -194,6 +194,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
> target_ulong tce)
> {
> sPAPRTCE *tcep;
> + IOMMUTLBEntry entry;
>
> if (ioba >= tcet->window_size) {
> hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
> @@ -204,6 +205,13 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
> tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
> tcep->tce = tce;
>
> + entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
> + entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
> + entry.addr_mask = SPAPR_TCE_PAGE_MASK;
> + entry.perm[0] = !!(tce & SPAPR_TCE_RO);
> + entry.perm[1] = !!(tce & SPAPR_TCE_WO);
I really wish these perms were addressed by #defines (ie.
perm[READ]/perm[WRITE]) or explicitly named.
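E.g. (illustration only, the names here are invented, not existing macros):

    #define IOMMU_PERM_READ  0
    #define IOMMU_PERM_WRITE 1

    entry.perm[IOMMU_PERM_READ]  = !!(tce & SPAPR_TCE_RO);
    entry.perm[IOMMU_PERM_WRITE] = !!(tce & SPAPR_TCE_WO);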
> + memory_region_notify_iommu(&tcet->iommu, entry);
> +
> return H_SUCCESS;
> }
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index cd33439..024b511 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -25,6 +25,7 @@
> #include "exec/iorange.h"
> #include "exec/ioport.h"
> #include "qemu/int128.h"
> +#include "qemu/notify.h"
>
> typedef struct MemoryRegionOps MemoryRegionOps;
> typedef struct MemoryRegionPortio MemoryRegionPortio;
> @@ -160,6 +161,7 @@ struct MemoryRegion {
> unsigned ioeventfd_nb;
> MemoryRegionIoeventfd *ioeventfds;
> struct AddressSpace *iommu_target_as;
> + NotifierList iommu_notify;
> };
>
> struct MemoryRegionPortio {
> @@ -462,6 +464,11 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
> */
> bool memory_region_is_iommu(MemoryRegion *mr);
>
> +void memory_region_notify_iommu(MemoryRegion *mr,
> + IOMMUTLBEntry entry);
> +void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> +void memory_region_unregister_iommu_notifier(Notifier *n);
> +
> /**
> * memory_region_name: get a memory region's name
> *
> diff --git a/memory.c b/memory.c
> index c409ee5..b11ca9f 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1060,6 +1060,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
> mr->terminates = true; /* then re-forwards */
> mr->destructor = memory_region_destructor_none;
> mr->iommu_target_as = target_as;
> + notifier_list_init(&mr->iommu_notify);
> }
>
> static uint64_t invalid_read(void *opaque, hwaddr addr,
> @@ -1175,6 +1176,23 @@ bool memory_region_is_iommu(MemoryRegion *mr)
> return mr->iommu_ops;
> }
>
> +void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> +{
> + notifier_list_add(&mr->iommu_notify, n);
> +}
> +
> +void memory_region_unregister_iommu_notifier(Notifier *n)
> +{
> + notifier_remove(n);
> +}
> +
> +void memory_region_notify_iommu(MemoryRegion *mr,
> + IOMMUTLBEntry entry)
> +{
> + assert(memory_region_is_iommu(mr));
> + notifier_list_notify(&mr->iommu_notify, &entry);
> +}
> +
> void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> {
> uint8_t mask = 1 << client;
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers
2013-05-14 17:15 ` Alex Williamson
@ 2013-05-14 21:02 ` Paolo Bonzini
0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2013-05-14 21:02 UTC (permalink / raw)
To: Alex Williamson; +Cc: aik, mst, qemu-ppc, qemu-devel, David Gibson
On 14/05/2013 19:15, Alex Williamson wrote:
> On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
>> > This patch adds a NotifierList to MemoryRegions which represent IOMMUs
>> > allowing other parts of the code to register interest in mappings or
>> > unmappings from the IOMMU. All IOMMU implementations will need to call
>> > memory_region_notify_iommu() to inform those waiting on the notifier list,
>> > whenever an IOMMU mapping is made or removed.
>> >
>> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> > ---
>> > hw/ppc/spapr_iommu.c | 8 ++++++++
>> > include/exec/memory.h | 7 +++++++
>> > memory.c | 18 ++++++++++++++++++
>> > 3 files changed, 33 insertions(+)
>> >
>> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> > index 07a6307..6d40485 100644
>> > --- a/hw/ppc/spapr_iommu.c
>> > +++ b/hw/ppc/spapr_iommu.c
>> > @@ -194,6 +194,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>> > target_ulong tce)
>> > {
>> > sPAPRTCE *tcep;
>> > + IOMMUTLBEntry entry;
>> >
>> > if (ioba >= tcet->window_size) {
>> > hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
>> > @@ -204,6 +205,13 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>> > tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
>> > tcep->tce = tce;
>> >
>> > + entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
>> > + entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
>> > + entry.addr_mask = SPAPR_TCE_PAGE_MASK;
>> > + entry.perm[0] = !!(tce & SPAPR_TCE_RO);
>> > + entry.perm[1] = !!(tce & SPAPR_TCE_WO);
> I really wish these perms were addressed by #defines (ie.
> perm[READ]/perm[WRITE]) or explicitly named.
>
I will make it a 2-bit field.
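Roughly like this, I think (sketch only -- names are placeholders, not the
final code):

    typedef enum {
        IOMMU_NONE = 0,     /* no access */
        IOMMU_RO   = 1,     /* read only */
        IOMMU_WO   = 2,     /* write only */
        IOMMU_RW   = 3,     /* read/write */
    } IOMMUAccessPermissions;

so that put_tce_emu() could build the entry with

    entry.perm = (tce & SPAPR_TCE_RO ? IOMMU_RO : 0) |
                 (tce & SPAPR_TCE_WO ? IOMMU_WO : 0);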
Paolo
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry
2013-05-14 9:13 ` [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry David Gibson
2013-05-14 16:53 ` Alex Williamson
@ 2013-05-14 21:22 ` Paolo Bonzini
1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2013-05-14 21:22 UTC (permalink / raw)
To: David Gibson; +Cc: aik, alex.williamson, qemu-ppc, qemu-devel, mst
On 14/05/2013 11:13, David Gibson wrote:
> Currently, the IOMMUTLBEntry structure contains the translated address,
> but not the IOVA (original untranslated address). We're going to need it
> for upcoming changes, so add it in here, and populate it correctly in our
> one existing iommu implementation.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/ppc/spapr_iommu.c | 2 ++
> include/exec/memory.h | 2 +-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 90469b3..07a6307 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -80,6 +80,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>
> if (tcet->bypass) {
> return (IOMMUTLBEntry) {
> + .iova = 0,
> .translated_addr = 0,
> .addr_mask = ~(hwaddr)0,
> .perm = { true, true },
> @@ -102,6 +103,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
> #endif
>
> return (IOMMUTLBEntry) {
> + .iova = addr & ~SPAPR_TCE_PAGE_MASK,
> .translated_addr = tce & ~SPAPR_TCE_PAGE_MASK,
> .addr_mask = SPAPR_TCE_PAGE_MASK,
> .perm = { [0] = tce & SPAPR_TCE_RO, [1] = tce & SPAPR_TCE_WO },
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index b97ace7..cd33439 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -116,7 +116,7 @@ typedef struct IOMMUTLBEntry IOMMUTLBEntry;
> typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
>
> struct IOMMUTLBEntry {
> - hwaddr translated_addr;
> + hwaddr iova, translated_addr;
> hwaddr addr_mask; /* 0xfff = 4k translation */
> bool perm[2]; /* permissions, [0] for read, [1] for write */
> };
>
Split and squashed into various patches of my iommu tree.
Paolo
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers
2013-05-14 9:13 ` [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers David Gibson
2013-05-14 17:15 ` Alex Williamson
@ 2013-05-14 21:23 ` Paolo Bonzini
2013-05-15 1:21 ` [Qemu-devel] [Qemu-ppc] " David Gibson
1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2013-05-14 21:23 UTC (permalink / raw)
To: David Gibson; +Cc: aik, alex.williamson, qemu-ppc, qemu-devel, mst
On 14/05/2013 11:13, David Gibson wrote:
> This patch adds a NotifierList to MemoryRegions which represent IOMMUs
> allowing other parts of the code to register interest in mappings or
> unmappings from the IOMMU. All IOMMU implementations will need to call
> memory_region_notify_iommu() to inform those waiting on the notifier list,
> whenever an IOMMU mapping is made or removed.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
Split into various patches of my iommu tree, will push tomorrow.
Paolo
> hw/ppc/spapr_iommu.c | 8 ++++++++
> include/exec/memory.h | 7 +++++++
> memory.c | 18 ++++++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 07a6307..6d40485 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -194,6 +194,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
> target_ulong tce)
> {
> sPAPRTCE *tcep;
> + IOMMUTLBEntry entry;
>
> if (ioba >= tcet->window_size) {
> hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
> @@ -204,6 +205,13 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
> tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
> tcep->tce = tce;
>
> + entry.iova = ioba & ~SPAPR_TCE_PAGE_MASK;
> + entry.translated_addr = tce & ~SPAPR_TCE_PAGE_MASK;
> + entry.addr_mask = SPAPR_TCE_PAGE_MASK;
> + entry.perm[0] = !!(tce & SPAPR_TCE_RO);
> + entry.perm[1] = !!(tce & SPAPR_TCE_WO);
> + memory_region_notify_iommu(&tcet->iommu, entry);
> +
> return H_SUCCESS;
> }
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index cd33439..024b511 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -25,6 +25,7 @@
> #include "exec/iorange.h"
> #include "exec/ioport.h"
> #include "qemu/int128.h"
> +#include "qemu/notify.h"
>
> typedef struct MemoryRegionOps MemoryRegionOps;
> typedef struct MemoryRegionPortio MemoryRegionPortio;
> @@ -160,6 +161,7 @@ struct MemoryRegion {
> unsigned ioeventfd_nb;
> MemoryRegionIoeventfd *ioeventfds;
> struct AddressSpace *iommu_target_as;
> + NotifierList iommu_notify;
> };
>
> struct MemoryRegionPortio {
> @@ -462,6 +464,11 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
> */
> bool memory_region_is_iommu(MemoryRegion *mr);
>
> +void memory_region_notify_iommu(MemoryRegion *mr,
> + IOMMUTLBEntry entry);
> +void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> +void memory_region_unregister_iommu_notifier(Notifier *n);
> +
> /**
> * memory_region_name: get a memory region's name
> *
> diff --git a/memory.c b/memory.c
> index c409ee5..b11ca9f 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1060,6 +1060,7 @@ void memory_region_init_iommu(MemoryRegion *mr,
> mr->terminates = true; /* then re-forwards */
> mr->destructor = memory_region_destructor_none;
> mr->iommu_target_as = target_as;
> + notifier_list_init(&mr->iommu_notify);
> }
>
> static uint64_t invalid_read(void *opaque, hwaddr addr,
> @@ -1175,6 +1176,23 @@ bool memory_region_is_iommu(MemoryRegion *mr)
> return mr->iommu_ops;
> }
>
> +void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> +{
> + notifier_list_add(&mr->iommu_notify, n);
> +}
> +
> +void memory_region_unregister_iommu_notifier(Notifier *n)
> +{
> + notifier_remove(n);
> +}
> +
> +void memory_region_notify_iommu(MemoryRegion *mr,
> + IOMMUTLBEntry entry)
> +{
> + assert(memory_region_is_iommu(mr));
> + notifier_list_notify(&mr->iommu_notify, &entry);
> +}
> +
> void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> {
> uint8_t mask = 1 << client;
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 07/11] vfio: Introduce VFIO address spaces
2013-05-14 16:53 ` Alex Williamson
@ 2013-05-15 1:18 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-15 1:18 UTC (permalink / raw)
To: Alex Williamson; +Cc: pbonzini, qemu-ppc, qemu-devel, mst
[-- Attachment #1: Type: text/plain, Size: 7079 bytes --]
On Tue, May 14, 2013 at 10:53:12AM -0600, Alex Williamson wrote:
> On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> > The only model so far supported for VFIO passthrough devices is the model
> > usually used on x86, where all of the guest's RAM is mapped into the
> > (host) IOMMU and there is no IOMMU visible in the guest.
> >
> > This patch begins to relax this model, introducing the notion of a
> > VFIOAddressSpace. This represents a logical DMA address space which will
> > be visible to one or more VFIO devices by appropriate mapping in the (host)
> > IOMMU. Thus the currently global list of containers becomes local to
> > a VFIOAddressSpace, and we verify that we don't attempt to add a VFIO
> > group to multiple address spaces.
> >
> > For now, only one VFIOAddressSpace is created and used, corresponding to
> > main system memory, that will change in future patches.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > hw/misc/vfio.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
> > 1 file changed, 48 insertions(+), 15 deletions(-)
> >
> > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > index c4a8853..e557f69 100644
> > --- a/hw/misc/vfio.c
> > +++ b/hw/misc/vfio.c
> > @@ -113,9 +113,17 @@ enum {
> > VFIO_INT_MSIX = 3,
> > };
> >
> > +typedef struct VFIOAddressSpace {
> > + AddressSpace *as;
> > + QLIST_HEAD(, VFIOContainer) containers;
> > +} VFIOAddressSpace;
> > +
> > +static VFIOAddressSpace vfio_address_space_memory;
> > +
> > struct VFIOGroup;
> >
> > typedef struct VFIOContainer {
> > + VFIOAddressSpace *space;
> > int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> > struct {
> > /* enable abstraction to support various iommu backends */
> > @@ -178,9 +186,6 @@ typedef struct VFIOGroup {
> >
> > #define MSIX_CAP_LENGTH 12
> >
> > -static QLIST_HEAD(, VFIOContainer)
> > - container_list = QLIST_HEAD_INITIALIZER(container_list);
> > -
> > static QLIST_HEAD(, VFIOGroup)
> > group_list = QLIST_HEAD_INITIALIZER(group_list);
> >
> > @@ -2624,16 +2629,28 @@ static int vfio_load_rom(VFIODevice *vdev)
> > return 0;
> > }
> >
> > -static int vfio_connect_container(VFIOGroup *group)
> > +static void vfio_address_space_init(VFIOAddressSpace *space, AddressSpace *as)
> > +{
> > + space->as = as;
> > + QLIST_INIT(&space->containers);
> > +}
> > +
> > +static int vfio_connect_container(VFIOGroup *group, VFIOAddressSpace *space)
> > {
> > VFIOContainer *container;
> > int ret, fd;
> >
> > if (group->container) {
> > - return 0;
> > + if (group->container->space == space) {
> > + return 0;
> > + } else {
> > + error_report("vfio: group %d used in multiple address spaces",
> > + group->groupid);
> > + return -EBUSY;
> > + }
> > }
> >
> > - QLIST_FOREACH(container, &container_list, next) {
> > + QLIST_FOREACH(container, &space->containers, next) {
> > if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> > group->container = container;
> > QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > @@ -2656,6 +2673,7 @@ static int vfio_connect_container(VFIOGroup *group)
> > }
> >
> > container = g_malloc0(sizeof(*container));
> > + container->space = space;
> > container->fd = fd;
> >
> > if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
> > @@ -2678,7 +2696,8 @@ static int vfio_connect_container(VFIOGroup *group)
> > container->iommu_data.listener = vfio_memory_listener;
> > container->iommu_data.release = vfio_listener_release;
> >
> > - memory_listener_register(&container->iommu_data.listener, &address_space_memory);
> > + memory_listener_register(&container->iommu_data.listener,
> > + container->space->as);
> > } else {
> > error_report("vfio: No available IOMMU models");
> > g_free(container);
> > @@ -2687,7 +2706,7 @@ static int vfio_connect_container(VFIOGroup *group)
> > }
> >
> > QLIST_INIT(&container->group_list);
> > - QLIST_INSERT_HEAD(&container_list, container, next);
> > + QLIST_INSERT_HEAD(&space->containers, container, next);
> >
> > group->container = container;
> > QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > @@ -2700,7 +2719,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
> > VFIOContainer *container = group->container;
> >
> > if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
> > - error_report("vfio: error disconnecting group %d from container",
> > + error_report("vfio: error disconnecting group %d from context",
> > group->groupid);
> > }
> >
> > @@ -2712,13 +2731,13 @@ static void vfio_disconnect_container(VFIOGroup *group)
> > container->iommu_data.release(container);
> > }
> > QLIST_REMOVE(container, next);
> > - DPRINTF("vfio_disconnect_container: close container->fd\n");
> > + DPRINTF("vfio_disconnect: close container->fd\n");
> > close(container->fd);
> > g_free(container);
> > }
> > }
>
> Drop the above two chunks.
>
> > -static VFIOGroup *vfio_get_group(int groupid)
> > +static VFIOGroup *vfio_get_group(int groupid, VFIOAddressSpace *space)
> > {
> > VFIOGroup *group;
> > char path[32];
> > @@ -2726,7 +2745,15 @@ static VFIOGroup *vfio_get_group(int groupid)
> >
> > QLIST_FOREACH(group, &group_list, next) {
> > if (group->groupid == groupid) {
> > - return group;
> > + /* Found it. Now is it already in the right context? */
> > + assert(group->container);
> > + if (group->container->space == space) {
> > + return group;
> > + } else {
> > + error_report("vfio: group %d used in multiple address spaces",
> > + group->groupid);
> > + return NULL;
> > + }
> > }
> > }
> >
> > @@ -2759,8 +2786,8 @@ static VFIOGroup *vfio_get_group(int groupid)
> > group->groupid = groupid;
> > QLIST_INIT(&group->device_list);
> >
> > - if (vfio_connect_container(group)) {
> > - error_report("vfio: failed to setup container for group %d", groupid);
> > + if (vfio_connect_container(group, space)) {
> > + error_report("vfio: failed to setup context for group %d", groupid);
>
> s/container/context/ is unnecessary now.
Ah, yes, fixed.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 10/11] memory: Add iommu map/unmap notifiers
2013-05-14 21:23 ` Paolo Bonzini
@ 2013-05-15 1:21 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-15 1:21 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: alex.williamson, qemu-ppc, qemu-devel, mst
[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]
On Tue, May 14, 2013 at 11:23:16PM +0200, Paolo Bonzini wrote:
> On 14/05/2013 11:13, David Gibson wrote:
> > This patch adds a NotifierList to MemoryRegions which represent IOMMUs
> > allowing other parts of the code to register interest in mappings or
> > unmappings from the IOMMU. All IOMMU implementations will need to call
> > memory_region_notify_iommu() to inform those waiting on the notifier list,
> > whenever an IOMMU mapping is made or removed.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
>
> Split into various patches of my iommu tree, will push tomorrow.
Uhhh... ok. I would kind of have preferred to hold off on this one until
the VFIO/iommu stuff is more polished. I'm not totally convinced the
notifier structure doesn't need some tweaking.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-14 17:15 ` [Qemu-devel] " Alex Williamson
@ 2013-05-15 1:33 ` David Gibson
2013-05-15 2:51 ` Alex Williamson
0 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2013-05-15 1:33 UTC (permalink / raw)
To: Alex Williamson; +Cc: pbonzini, qemu-ppc, qemu-devel, mst
[-- Attachment #1: Type: text/plain, Size: 10097 bytes --]
On Tue, May 14, 2013 at 11:15:26AM -0600, Alex Williamson wrote:
> On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> > This patch uses the new IOMMU notifiers to allow VFIO pass through devices
> > to work with guest side IOMMUs, as long as the host-side VFIO iommu has
> > sufficient capability and granularity to match the guest side. This works
> > by tracking all map and unmap operations on the guest IOMMU using the
> > notifiers, and mirroring them into VFIO.
> >
> > There are a number of FIXMEs, and the scheme involves rather more notifier
> > structures than I'd like, but it should make for a reasonable proof of
> > concept.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > hw/misc/vfio.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 126 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > index f4e3792..62a83ca 100644
> > --- a/hw/misc/vfio.c
> > +++ b/hw/misc/vfio.c
> > @@ -133,10 +133,18 @@ typedef struct VFIOContainer {
> > };
> > void (*release)(struct VFIOContainer *);
> > } iommu_data;
> > + QLIST_HEAD(, VFIOGuestIOMMU) guest_iommus;
>
> Seems like this would be related to the space, not the container.
So, originally I was going to put it into the space, until I realised
that the MemoryListener which sets it up is already per-container.
The list still could be per-space, of course, but we'd have to do a
bunch of check-if-it's-already there stuff. And the remove path is
worse.
> > QLIST_HEAD(, VFIOGroup) group_list;
> > QLIST_ENTRY(VFIOContainer) next;
> > } VFIOContainer;
> >
> > +typedef struct VFIOGuestIOMMU {
> > + VFIOContainer *container;
> > + MemoryRegion *iommu;
> > + Notifier n;
> > + QLIST_ENTRY(VFIOGuestIOMMU) list;
> > +} VFIOGuestIOMMU;
> > +
> > /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
> > typedef struct VFIOMSIXInfo {
> > uint8_t table_bar;
> > @@ -1940,7 +1948,64 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
> >
> > static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > {
> > - return !memory_region_is_ram(section->mr);
> > + return !memory_region_is_ram(section->mr) &&
> > + !memory_region_is_iommu(section->mr);
> > +}
> > +
> > +static void vfio_iommu_map_notify(Notifier *n, void *data)
> > +{
> > + VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > + MemoryRegion *iommu = giommu->iommu;
> > + VFIOContainer *container = giommu->container;
> > + IOMMUTLBEntry *iotlb = data;
> > + MemoryRegionSection *mrs;
> > + hwaddr xlat;
> > + hwaddr len = iotlb->addr_mask + 1;
> > + void *vaddr;
> > + int ret;
> > +
> > + DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> > + iotlb->iova, iotlb->iova + iotlb->address_mask);
> > +
> > + /* The IOMMU TLB entry we have just covers translation through
> > + * this IOMMU to its immediate target. We need to translate
> > + * it the rest of the way through to memory. */
> > + mrs = address_space_translate(iommu->iommu_target_as,
> > + iotlb->translated_addr,
> > + &xlat, &len, iotlb->perm[1]);
> > + if (!memory_region_is_ram(mrs->mr)) {
> > + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
> > + xlat);
> > + return;
> > + }
> > + if (len & iotlb->addr_mask) {
> > + DPRINTF("iommu has granularity incompatible with target AS\n");
> > + return;
> > + }
> > +
> > + vaddr = memory_region_get_ram_ptr(mrs->mr) +
> > + mrs->offset_within_region +
> > + (xlat - mrs->offset_within_address_space);
> > +
> > + if (iotlb->perm[0] || iotlb->perm[1]) {
> > + ret = vfio_dma_map(container, iotlb->iova,
> > + iotlb->addr_mask + 1, vaddr,
> > + !iotlb->perm[1] || mrs->readonly);
> > + if (ret) {
> > + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > + container, iotlb->iova,
> > + iotlb->addr_mask + 1, vaddr, ret);
> > + }
> > + } else {
> > + ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
> > + if (ret) {
> > + error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> > + "0x%"HWADDR_PRIx") = %d (%m)",
> > + container, iotlb->iova,
> > + iotlb->addr_mask + 1, ret);
> > + }
> > + }
> > }
> >
> > static void vfio_listener_region_add(MemoryListener *listener,
> > @@ -1949,11 +2014,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > VFIOContainer *container = container_of(listener, VFIOContainer,
> > iommu_data.listener);
> > hwaddr iova, end;
> > - void *vaddr;
> > int ret;
> >
> > - assert(!memory_region_is_iommu(section->mr));
> > -
> > if (vfio_listener_skipped_section(section)) {
> > DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
> > section->offset_within_address_space,
> > @@ -1975,20 +2037,55 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > return;
> > }
> >
> > - vaddr = memory_region_get_ram_ptr(section->mr) +
> > + memory_region_ref(section->mr);
> > +
> > + if (memory_region_is_ram(section->mr)) {
> > + void *vaddr;
> > +
> > + DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> > + iova, end - 1, vaddr);
> > +
> > + vaddr = memory_region_get_ram_ptr(section->mr) +
> > section->offset_within_region +
> > (iova - section->offset_within_address_space);
> >
> > - DPRINTF("region_add %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> > - iova, end - 1, vaddr);
> >
> > - memory_region_ref(section->mr);
> > - ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
> > - if (ret) {
> > - error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > - "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > - container, iova, end - iova, vaddr, ret);
> > + ret = vfio_dma_map(container, iova, end - iova, vaddr,
> > + section->readonly);
> > + if (ret) {
> > + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > + container, iova, end - iova, vaddr, ret);
> > + }
> > + } else if (memory_region_is_iommu(section->mr)) {
> > + VFIOGuestIOMMU *giommu;
> > +
> > + DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> > + iova, end - 1);
> > +
> > + /*
> > + * FIXME: We should do some checking to see if the
> > + * capabilities of the host VFIO IOMMU are adequate to model
> > + * the guest IOMMU
> > + *
> > + * FIXME: This assumes that the guest IOMMU is empty of
> > + * mappings at this point - we should either enforce this, or
> > + * loop through existing mappings to map them into VFIO.
> > + *
> > + * FIXME: For VFIO iommu types which have KVM acceleration to
> > + * avoid bouncing all map/unmaps through qemu this way, this
> > + * would be the right place to wire that up (tell the KVM
> > + * device emulation the VFIO iommu handles to use).
> > + */
> > + giommu = g_malloc0(sizeof(*giommu));
> > + giommu->iommu = section->mr;
> > + giommu->container = container;
> > + giommu->n.notify = vfio_iommu_map_notify;
> > +
> > + QLIST_INSERT_HEAD(&container->guest_iommus, giommu, list);
> > + memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
>
> And this is also filtered on the space, so we're not adding iommus that
> aren't handling regions within this space, right?
I'm not sure what you're getting at.
> Does the memory listener need to move to the space?
That would make this simpler, but it has other problems. Having the
listener per-container means that when a container is added to a space
which already has a bunch of things in it, we automatically get those
replayed by the listener so we can set up the new container.
And.. when we come to deal with the case of adding an iommu region
that already has a bunch of mappings, we may well want the iommu
notifier to be per-container for the same reason.
In some ways I'd prefer to add the iommu notifications as new
callbacks in the MemoryListener, rather than a separate set of
Notifiers, but to do that we'd need to iterate through all address
spaces that contain the iommu region, and I couldn't see a good way of
doing that.
>
> > }
> > +
> > }
> >
> > static void vfio_listener_region_del(MemoryListener *listener,
> > @@ -2012,6 +2109,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
> > return;
> > }
> >
> > + if (memory_region_is_iommu(section->mr)) {
> > + VFIOGuestIOMMU *giommu;
> > +
> > + QLIST_FOREACH(giommu, &container->guest_iommus, list) {
> > + if (giommu->iommu == section->mr) {
> > + memory_region_unregister_iommu_notifier(&giommu->n);
>
> Missing free
Oops. And list remove for that matter. Fixing for next version.
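Something like this for the next version (sketch, not yet tested):

    QLIST_FOREACH(giommu, &container->guest_iommus, list) {
        if (giommu->iommu == section->mr) {
            memory_region_unregister_iommu_notifier(&giommu->n);
            QLIST_REMOVE(giommu, list);  /* drop it from the container's list */
            g_free(giommu);
            break;
        }
    }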
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-15 1:33 ` [Qemu-devel] [Qemu-ppc] " David Gibson
@ 2013-05-15 2:51 ` Alex Williamson
2013-05-15 3:32 ` David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2013-05-15 2:51 UTC (permalink / raw)
To: David Gibson; +Cc: pbonzini, qemu-ppc, qemu-devel, mst
On Wed, 2013-05-15 at 11:33 +1000, David Gibson wrote:
> On Tue, May 14, 2013 at 11:15:26AM -0600, Alex Williamson wrote:
> > On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> > > This patch uses the new IOMMU notifiers to allow VFIO pass through devices
> > > to work with guest side IOMMUs, as long as the host-side VFIO iommu has
> > > sufficient capability and granularity to match the guest side. This works
> > > by tracking all map and unmap operations on the guest IOMMU using the
> > > notifiers, and mirroring them into VFIO.
> > >
> > > There are a number of FIXMEs, and the scheme involves rather more notifier
> > > structures than I'd like, but it should make for a reasonable proof of
> > > concept.
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > > hw/misc/vfio.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> > > 1 file changed, 126 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > index f4e3792..62a83ca 100644
> > > --- a/hw/misc/vfio.c
> > > +++ b/hw/misc/vfio.c
> > > @@ -133,10 +133,18 @@ typedef struct VFIOContainer {
> > > };
> > > void (*release)(struct VFIOContainer *);
> > > } iommu_data;
> > > + QLIST_HEAD(, VFIOGuestIOMMU) guest_iommus;
> >
> > Seems like this would be related to the space, not the container.
>
> So, originally I was going to put it into the space, until I realised
> that the MemoryListener which sets it up is already per-container.
> The list still could be per-space, of course, but we'd have to do a
> bunch of check-if-it's-already there stuff. And the remove path is
> worse.
>
> > > QLIST_HEAD(, VFIOGroup) group_list;
> > > QLIST_ENTRY(VFIOContainer) next;
> > > } VFIOContainer;
> > >
> > > +typedef struct VFIOGuestIOMMU {
> > > + VFIOContainer *container;
> > > + MemoryRegion *iommu;
> > > + Notifier n;
> > > + QLIST_ENTRY(VFIOGuestIOMMU) list;
> > > +} VFIOGuestIOMMU;
> > > +
> > > /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
> > > typedef struct VFIOMSIXInfo {
> > > uint8_t table_bar;
> > > @@ -1940,7 +1948,64 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
> > >
> > > static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > > {
> > > - return !memory_region_is_ram(section->mr);
> > > + return !memory_region_is_ram(section->mr) &&
> > > + !memory_region_is_iommu(section->mr);
> > > +}
> > > +
> > > +static void vfio_iommu_map_notify(Notifier *n, void *data)
> > > +{
> > > + VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > + MemoryRegion *iommu = giommu->iommu;
> > > + VFIOContainer *container = giommu->container;
> > > + IOMMUTLBEntry *iotlb = data;
> > > + MemoryRegionSection *mrs;
> > > + hwaddr xlat;
> > > + hwaddr len = iotlb->addr_mask + 1;
> > > + void *vaddr;
> > > + int ret;
> > > +
> > > + DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> > > + iotlb->iova, iotlb->iova + iotlb->address_mask);
> > > +
> > > + /* The IOMMU TLB entry we have just covers translation through
> > > + * this IOMMU to its immediate target. We need to translate
> > > + * it the rest of the way through to memory. */
> > > + mrs = address_space_translate(iommu->iommu_target_as,
> > > + iotlb->translated_addr,
> > > + &xlat, &len, iotlb->perm[1]);
> > > + if (!memory_region_is_ram(mrs->mr)) {
> > > + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
> > > + xlat);
> > > + return;
> > > + }
> > > + if (len & iotlb->addr_mask) {
> > > + DPRINTF("iommu has granularity incompatible with target AS\n");
> > > + return;
> > > + }
> > > +
> > > + vaddr = memory_region_get_ram_ptr(mrs->mr) +
> > > + mrs->offset_within_region +
> > > + (xlat - mrs->offset_within_address_space);
> > > +
> > > + if (iotlb->perm[0] || iotlb->perm[1]) {
> > > + ret = vfio_dma_map(container, iotlb->iova,
> > > + iotlb->addr_mask + 1, vaddr,
> > > + !iotlb->perm[1] || mrs->readonly);
> > > + if (ret) {
> > > + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > + container, iotlb->iova,
> > > + iotlb->addr_mask + 1, vaddr, ret);
> > > + }
> > > + } else {
> > > + ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
> > > + if (ret) {
> > > + error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> > > + "0x%"HWADDR_PRIx") = %d (%m)",
> > > + container, iotlb->iova,
> > > + iotlb->addr_mask + 1, ret);
> > > + }
> > > + }
> > > }
> > >
> > > static void vfio_listener_region_add(MemoryListener *listener,
> > > @@ -1949,11 +2014,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > VFIOContainer *container = container_of(listener, VFIOContainer,
> > > iommu_data.listener);
> > > hwaddr iova, end;
> > > - void *vaddr;
> > > int ret;
> > >
> > > - assert(!memory_region_is_iommu(section->mr));
> > > -
> > > if (vfio_listener_skipped_section(section)) {
> > > DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
> > > section->offset_within_address_space,
> > > @@ -1975,20 +2037,55 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > return;
> > > }
> > >
> > > - vaddr = memory_region_get_ram_ptr(section->mr) +
> > > + memory_region_ref(section->mr);
> > > +
> > > + if (memory_region_is_ram(section->mr)) {
> > > + void *vaddr;
> > > +
> > > + DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> > > + iova, end - 1, vaddr);
> > > +
> > > + vaddr = memory_region_get_ram_ptr(section->mr) +
> > > section->offset_within_region +
> > > (iova - section->offset_within_address_space);
> > >
> > > - DPRINTF("region_add %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> > > - iova, end - 1, vaddr);
> > >
> > > - memory_region_ref(section->mr);
> > > - ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
> > > - if (ret) {
> > > - error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > - "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > - container, iova, end - iova, vaddr, ret);
> > > + ret = vfio_dma_map(container, iova, end - iova, vaddr,
> > > + section->readonly);
> > > + if (ret) {
> > > + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > + container, iova, end - iova, vaddr, ret);
> > > + }
> > > + } else if (memory_region_is_iommu(section->mr)) {
> > > + VFIOGuestIOMMU *giommu;
> > > +
> > > + DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> > > + iova, end - 1);
> > > +
> > > + /*
> > > + * FIXME: We should do some checking to see if the
> > > + * capabilities of the host VFIO IOMMU are adequate to model
> > > + * the guest IOMMU
> > > + *
> > > + * FIXME: This assumes that the guest IOMMU is empty of
> > > + * mappings at this point - we should either enforce this, or
> > > + * loop through existing mappings to map them into VFIO.
> > > + *
> > > + * FIXME: For VFIO iommu types which have KVM acceleration to
> > > + * avoid bouncing all map/unmaps through qemu this way, this
> > > + * would be the right place to wire that up (tell the KVM
> > > + * device emulation the VFIO iommu handles to use).
> > > + */
> > > + giommu = g_malloc0(sizeof(*giommu));
> > > + giommu->iommu = section->mr;
> > > + giommu->container = container;
> > > + giommu->n.notify = vfio_iommu_map_notify;
> > > +
> > > + QLIST_INSERT_HEAD(&container->guest_iommus, giommu, list);
> > > + memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> >
> > And this is also filtered on the space, so we're not adding iommus that
> > aren't handling regions within this space, right?
>
> I'm not sure what you're getting at.
Trying to make sure I understand why guest_iommus is a list. We can
have multiple guest iommus, but in the majority of those cases I would
expect only one iommu per VFIOAddressSpace. So if the listener is for
this space, we only add the relevant iommu and not all of the iommus in
the machine. Actually maybe two guest iommu ranges are common for spapr
per space, a 32-bit and a 64-bit range?
> > Does the memory listener need to move to the space?
>
> That would make this simpler, but it has other problems. Having the
> listener per-container means that when a container is added to a space
> which already has a bunch of things in it, we automatically get those
> replayed by the listener so we can set up the new container.
Ah, right, we need replay. Ok, I think it's a reasonable proof of
concept to start with. Thanks,
Alex
> And.. when we come to deal with the case of adding an iommu region
> that already has a bunch of mappings, we may well want the iommu
> notifier to be per-container for the same reason.
>
> In some ways I'd prefer to add the iommu notifications as new
> callbacks in the memorylistener, rather than a seperate set of
> Notifiers, but to do that we'd need to iterate through all address
> spaces that contain the iommu region, and I couldn't see a good way of
> doing that.
>
>
> >
> > > }
> > > +
> > > }
> > >
> > > static void vfio_listener_region_del(MemoryListener *listener,
> > > @@ -2012,6 +2109,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
> > > return;
> > > }
> > >
> > > + if (memory_region_is_iommu(section->mr)) {
> > > + VFIOGuestIOMMU *giommu;
> > > +
> > > + QLIST_FOREACH(giommu, &container->guest_iommus, list) {
> > > + if (giommu->iommu == section->mr) {
> > > + memory_region_unregister_iommu_notifier(&giommu->n);
> >
> > Missing free
>
> Oops. And list remove for that matter. Fixing for next version.
>
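For reference, a minimal sketch of what the fixed teardown path might look like (whether the next version breaks out of the loop at that point is an assumption):

    if (memory_region_is_iommu(section->mr)) {
        VFIOGuestIOMMU *giommu;

        QLIST_FOREACH(giommu, &container->guest_iommus, list) {
            if (giommu->iommu == section->mr) {
                /* Unhook the notifier, then actually drop the entry from
                 * the container's list and free it instead of leaking it. */
                memory_region_unregister_iommu_notifier(&giommu->n);
                QLIST_REMOVE(giommu, list);
                g_free(giommu);
                break;
            }
        }
    }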
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-15 2:51 ` Alex Williamson
@ 2013-05-15 3:32 ` David Gibson
2013-05-16 6:53 ` David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2013-05-15 3:32 UTC (permalink / raw)
To: Alex Williamson; +Cc: pbonzini, qemu-ppc, qemu-devel, mst
On Tue, May 14, 2013 at 08:51:08PM -0600, Alex Williamson wrote:
> On Wed, 2013-05-15 at 11:33 +1000, David Gibson wrote:
> > On Tue, May 14, 2013 at 11:15:26AM -0600, Alex Williamson wrote:
> > > On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> > > > This patch uses the new IOMMU notifiers to allow VFIO pass through devices
> > > > to work with guest side IOMMUs, as long as the host-side VFIO iommu has
> > > > sufficient capability and granularity to match the guest side. This works
> > > > by tracking all map and unmap operations on the guest IOMMU using the
> > > > notifiers, and mirroring them into VFIO.
> > > >
> > > > There are a number of FIXMEs, and the scheme involves rather more notifier
> > > > structures than I'd like, but it should make for a reasonable proof of
> > > > concept.
> > > >
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > > hw/misc/vfio.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> > > > 1 file changed, 126 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > > index f4e3792..62a83ca 100644
> > > > --- a/hw/misc/vfio.c
> > > > +++ b/hw/misc/vfio.c
> > > > @@ -133,10 +133,18 @@ typedef struct VFIOContainer {
> > > > };
> > > > void (*release)(struct VFIOContainer *);
> > > > } iommu_data;
> > > > + QLIST_HEAD(, VFIOGuestIOMMU) guest_iommus;
> > >
> > > Seems like this would be related to the space, not the container.
> >
> > So, originally I was going to put it into the space, until I realised
> > that the MemoryListener which sets it up is already per-container.
> > The list still could be per-space, of course, but we'd have to do a
> > bunch of check-if-it's-already there stuff. And the remove path is
> > worse.
> >
> > > > QLIST_HEAD(, VFIOGroup) group_list;
> > > > QLIST_ENTRY(VFIOContainer) next;
> > > > } VFIOContainer;
> > > >
> > > > +typedef struct VFIOGuestIOMMU {
> > > > + VFIOContainer *container;
> > > > + MemoryRegion *iommu;
> > > > + Notifier n;
> > > > + QLIST_ENTRY(VFIOGuestIOMMU) list;
> > > > +} VFIOGuestIOMMU;
> > > > +
> > > > /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
> > > > typedef struct VFIOMSIXInfo {
> > > > uint8_t table_bar;
> > > > @@ -1940,7 +1948,64 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
> > > >
> > > > static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> > > > {
> > > > - return !memory_region_is_ram(section->mr);
> > > > + return !memory_region_is_ram(section->mr) &&
> > > > + !memory_region_is_iommu(section->mr);
> > > > +}
> > > > +
> > > > +static void vfio_iommu_map_notify(Notifier *n, void *data)
> > > > +{
> > > > + VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > > > + MemoryRegion *iommu = giommu->iommu;
> > > > + VFIOContainer *container = giommu->container;
> > > > + IOMMUTLBEntry *iotlb = data;
> > > > + MemoryRegionSection *mrs;
> > > > + hwaddr xlat;
> > > > + hwaddr len = iotlb->addr_mask + 1;
> > > > + void *vaddr;
> > > > + int ret;
> > > > +
> > > > + DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> > > > + iotlb->iova, iotlb->iova + iotlb->addr_mask);
> > > > +
> > > > + /* The IOMMU TLB entry we have just covers translation through
> > > > + * this IOMMU to its immediate target. We need to translate
> > > > + * it the rest of the way through to memory. */
> > > > + mrs = address_space_translate(iommu->iommu_target_as,
> > > > + iotlb->translated_addr,
> > > > + &xlat, &len, iotlb->perm[1]);
> > > > + if (!memory_region_is_ram(mrs->mr)) {
> > > > + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
> > > > + xlat);
> > > > + return;
> > > > + }
> > > > + if (len & iotlb->addr_mask) {
> > > > + DPRINTF("iommu has granularity incompatible with target AS\n");
> > > > + return;
> > > > + }
> > > > +
> > > > + vaddr = memory_region_get_ram_ptr(mrs->mr) +
> > > > + mrs->offset_within_region +
> > > > + (xlat - mrs->offset_within_address_space);
> > > > +
> > > > + if (iotlb->perm[0] || iotlb->perm[1]) {
> > > > + ret = vfio_dma_map(container, iotlb->iova,
> > > > + iotlb->addr_mask + 1, vaddr,
> > > > + !iotlb->perm[1] || mrs->readonly);
> > > > + if (ret) {
> > > > + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > > + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > > + container, iotlb->iova,
> > > > + iotlb->addr_mask + 1, vaddr, ret);
> > > > + }
> > > > + } else {
> > > > + ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
> > > > + if (ret) {
> > > > + error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> > > > + "0x%"HWADDR_PRIx") = %d (%m)",
> > > > + container, iotlb->iova,
> > > > + iotlb->addr_mask + 1, ret);
> > > > + }
> > > > + }
> > > > }
> > > >
> > > > static void vfio_listener_region_add(MemoryListener *listener,
> > > > @@ -1949,11 +2014,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > VFIOContainer *container = container_of(listener, VFIOContainer,
> > > > iommu_data.listener);
> > > > hwaddr iova, end;
> > > > - void *vaddr;
> > > > int ret;
> > > >
> > > > - assert(!memory_region_is_iommu(section->mr));
> > > > -
> > > > if (vfio_listener_skipped_section(section)) {
> > > > DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
> > > > section->offset_within_address_space,
> > > > @@ -1975,20 +2037,55 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > > > return;
> > > > }
> > > >
> > > > - vaddr = memory_region_get_ram_ptr(section->mr) +
> > > > + memory_region_ref(section->mr);
> > > > +
> > > > + if (memory_region_is_ram(section->mr)) {
> > > > + void *vaddr;
> > > > +
> > > > + DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> > > > + iova, end - 1, vaddr);
> > > > +
> > > > + vaddr = memory_region_get_ram_ptr(section->mr) +
> > > > section->offset_within_region +
> > > > (iova - section->offset_within_address_space);
> > > >
> > > > - DPRINTF("region_add %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
> > > > - iova, end - 1, vaddr);
> > > >
> > > > - memory_region_ref(section->mr);
> > > > - ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
> > > > - if (ret) {
> > > > - error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > > - "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > > - container, iova, end - iova, vaddr, ret);
> > > > + ret = vfio_dma_map(container, iova, end - iova, vaddr,
> > > > + section->readonly);
> > > > + if (ret) {
> > > > + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> > > > + "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > > > + container, iova, end - iova, vaddr, ret);
> > > > + }
> > > > + } else if (memory_region_is_iommu(section->mr)) {
> > > > + VFIOGuestIOMMU *giommu;
> > > > +
> > > > + DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
> > > > + iova, end - 1);
> > > > +
> > > > + /*
> > > > + * FIXME: We should do some checking to see if the
> > > > + * capabilities of the host VFIO IOMMU are adequate to model
> > > > + * the guest IOMMU
> > > > + *
> > > > + * FIXME: This assumes that the guest IOMMU is empty of
> > > > + * mappings at this point - we should either enforce this, or
> > > > + * loop through existing mappings to map them into VFIO.
> > > > + *
> > > > + * FIXME: For VFIO iommu types which have KVM acceleration to
> > > > + * avoid bouncing all map/unmaps through qemu this way, this
> > > > + * would be the right place to wire that up (tell the KVM
> > > > + * device emulation the VFIO iommu handles to use).
> > > > + */
> > > > + giommu = g_malloc0(sizeof(*giommu));
> > > > + giommu->iommu = section->mr;
> > > > + giommu->container = container;
> > > > + giommu->n.notify = vfio_iommu_map_notify;
> > > > +
> > > > + QLIST_INSERT_HEAD(&container->guest_iommus, giommu, list);
> > > > + memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> > >
> > > And this is also filtered on the space, so we're not adding iommus that
> > > aren't handling regions within this space, right?
> >
> > I'm not sure what you're getting at.
>
> Trying to make sure I understand why guest_iommus is a list. We can
> have multiple guest iommus, but in the majority of those cases I would
> expect only one iommu per VFIOAddressSpace. So if the listener is for
> this space, we only add the relevant iommu and not all of the iommus in
> the machine.
That's what we do - we only add notifiers for iommu regions that
appear within the relevant address space. It's a list because at
least theoretically there could be more than one iommu region in the
AS, although I don't know of any real cases where that would be true.
> Actually maybe two guest iommu ranges are common for spapr
> per space, a 32bit and a 64bit range?
I think 64-bit DMA windows on PAPR are usually just mapped to RAM with
a fixed offset, rather than having TCEs (page table). That might well
introduce some extra complexities in how we mirror that into VFIO, but
it's not directly relevant to this point.
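If we ever do need to mirror such a window, it could presumably be modelled much like the existing bypass case in this series: a single IOMMUTLBEntry applying a constant offset, with an addr_mask wide enough to cover the whole window. A sketch only, with hypothetical window constants:

    static IOMMUTLBEntry direct_window_translate_sketch(MemoryRegion *iommu,
                                                        hwaddr addr)
    {
        /* One entry covers the whole window: guest bus addresses map
         * straight onto RAM at a fixed offset, no per-page TCEs. */
        return (IOMMUTLBEntry) {
            .iova = 0,
            .translated_addr = DIRECT_WINDOW_RAM_OFFSET,  /* hypothetical */
            .addr_mask = DIRECT_WINDOW_SIZE - 1,          /* hypothetical */
            .perm = { true, true },
        };
    }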
> > > Does the memory listener need to move to the space?
> >
> > That would make this simpler, but it has other problems. Having the
> > listener per-container means that when a container is added to a space
> > which already has a bunch of things in it, we automatically get those
> > replayed by the listener so we can set up the new container.
>
> Ah, right, we need replay. Ok, I think it's a reasonable proof of
> concept to start with. Thanks,
So, replay raises some interesting issues itself. At the moment we
have replay for the regions, and that's straightforward enough. The
question is what we do about replay of mappings with guest iommus
themselves.
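(The region-level replay is just what listener registration already gives us; roughly the following, where the exact call site and the space->as field name are assumptions:)

    /* Sketch: registering the per-container listener against the space's
     * AddressSpace makes the memory core call region_add for every section
     * already present, so a container attached to an already populated
     * space picks up the existing RAM and IOMMU regions "for free". */
    container->iommu_data.listener = vfio_memory_listener;
    memory_listener_register(&container->iommu_data.listener, space->as);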
In the case where the guest IOMMU is essentially purely software
implemented, replay is fairly straightforward - we add a new hook to
mr->iommu_ops which scans the existing mappings and sends them to the
given notifier.
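Roughly like this, say - the hook and its page_size parameter are made up for illustration, and a real implementation would walk the guest's table directly rather than probing page by page through the translate callback:

    /* Hypothetical ->replay hook for a purely software guest IOMMU: walk
     * the region with the existing translate callback and push every
     * valid mapping at the notifier, as if it had just been created. */
    static void iommu_replay_sketch(MemoryRegion *iommu, Notifier *n,
                                    hwaddr page_size)
    {
        hwaddr addr;

        for (addr = 0; addr < memory_region_size(iommu); addr += page_size) {
            IOMMUTLBEntry entry = iommu->iommu_ops->translate(iommu, addr);

            if (entry.perm[0] || entry.perm[1]) {
                n->notify(n, &entry);
            }
        }
    }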
For spapr, though, PUT_TCE can be a real bottleneck, so at least for a
host bridge with only VFIO devices, we want to have PUT_TCE
implemented directly in KVM, with an ioctl() to wire up the PUT_TCE
liobn directly to the host IOMMU table underlying the VFIO container.
Replay is kind of a problem though, because qemu has no record of the
mappings to replay from.
In practice we can configure that problem away though - if we only put
one vfio group per guest host bridge, then the container will always
be wired up before any mappings are made. But we need some way of
safely detecting the ok case and optimizing appropriately.
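The check itself could be as simple as something like this - both helpers are hypothetical, nothing like them exists in the series yet:

    /* Illustrative only: the KVM PUT_TCE shortcut is safe to wire up when
     * there is provably nothing for qemu to replay, i.e. the container was
     * attached before the guest created any mappings in this window. */
    static bool vfio_kvm_tce_shortcut_ok(VFIOContainer *container,
                                         VFIOGuestIOMMU *giommu)
    {
        return vfio_container_just_created(container) &&    /* hypothetical */
               !guest_iommu_has_mappings(giommu->iommu);     /* hypothetical */
    }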
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry
2013-05-14 16:53 ` Alex Williamson
@ 2013-05-15 3:51 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-15 3:51 UTC (permalink / raw)
To: Alex Williamson; +Cc: pbonzini, qemu-ppc, qemu-devel, mst
On Tue, May 14, 2013 at 10:53:35AM -0600, Alex Williamson wrote:
> On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
> > Currently, the IOMMUTLBEntry structure contains the translated address,
> > but not the IOVA (original untranslated address). We're going to need it
> > for upcoming changes, so add it in here, and populate it correctly in our
> > one existing iommu implementation.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > hw/ppc/spapr_iommu.c | 2 ++
> > include/exec/memory.h | 2 +-
> > 2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> > index 90469b3..07a6307 100644
> > --- a/hw/ppc/spapr_iommu.c
> > +++ b/hw/ppc/spapr_iommu.c
> > @@ -80,6 +80,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
> >
> > if (tcet->bypass) {
> > return (IOMMUTLBEntry) {
> > + .iova = 0,
> > .translated_addr = 0,
> > .addr_mask = ~(hwaddr)0,
> > .perm = { true, true },
> > @@ -102,6 +103,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
> > #endif
> >
> > return (IOMMUTLBEntry) {
> > + .iova = addr & ~SPAPR_TCE_PAGE_MASK,
> > .translated_addr = tce & ~SPAPR_TCE_PAGE_MASK,
> > .addr_mask = SPAPR_TCE_PAGE_MASK,
> > .perm = { [0] = tce & SPAPR_TCE_RO, [1] = tce & SPAPR_TCE_WO },
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index b97ace7..cd33439 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -116,7 +116,7 @@ typedef struct IOMMUTLBEntry IOMMUTLBEntry;
> > typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
> >
> > struct IOMMUTLBEntry {
> > - hwaddr translated_addr;
> > + hwaddr iova, translated_addr;
>
> Being a struct definition, I assume we'd want to keep these on separate
> lines for clarity (and preferably comments).
Good idea, adjusted accordingly.
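For reference, the adjusted definition presumably ends up along these lines (the comments and the perm type shown are my assumptions, not taken from the series):

    struct IOMMUTLBEntry {
        hwaddr iova;             /* original, untranslated I/O address */
        hwaddr translated_addr;  /* output address of this translation step */
        hwaddr addr_mask;        /* page mask; low bits pass straight through */
        bool perm[2];            /* [0] = read permitted, [1] = write permitted */
    };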
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 11/11] vfio: Add guest side IOMMU support
2013-05-15 3:32 ` David Gibson
@ 2013-05-16 6:53 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2013-05-16 6:53 UTC (permalink / raw)
To: Alex Williamson; +Cc: pbonzini, qemu-ppc, qemu-devel, mst
On Wed, May 15, 2013 at 01:32:50PM +1000, David Gibson wrote:
> On Tue, May 14, 2013 at 08:51:08PM -0600, Alex Williamson wrote:
> > On Wed, 2013-05-15 at 11:33 +1000, David Gibson wrote:
> > > On Tue, May 14, 2013 at 11:15:26AM -0600, Alex Williamson wrote:
> > > > On Tue, 2013-05-14 at 19:13 +1000, David Gibson wrote:
[snip]
> > Trying to make sure I understand why guest_iommus is a list. We can
> > have multiple guest iommus, but in the majority of those cases I would
> > expect only one iommu per VFIOAddressSpace. So if the listener is for
> > this space, we only add the relevant iommu and not all of the iommus in
> > the machine.
>
> That's what we do - we only add notifiers for iommu regions that
> appear within the relevant address space. It's a list because at
> least theoretically there could be more than one iommu region in the
> AS, although I don't know of any real cases where that would be true.
>
>
> > Actually maybe two guest iommu ranges are common for spapr
> > per space, a 32bit and a 64bit range?
>
> I think 64-bit DMA windows on PAPR are usually just mapped to RAM with
> a fixed offset, rather than having TCEs (page table). That might well
> introduce some extra complexities in how we mirror that into VFIO, but
> it's not directly relevant to this point.
So, since I wrote that, I've heard from Ben that while current
machines don't have multiple (translated) iommu windows, they could
well appear on future machines.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread (newest message: 2013-05-16 6:54 UTC)
Thread overview: 29+ messages
2013-05-14 9:13 [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 01/11] iommu: Fix compile error in ioapic.c David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 02/11] pci: Don't del_subgregion on a non subregion David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 03/11] pci: Rework PCI iommu lifetime assumptions David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 04/11] pci: Use AddressSpace rather than MemoryRegion to represent PCI DMA space David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 05/11] pci: Introduce helper to retrieve a PCI device's DMA address space David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 06/11] memory: Sanity check that no listeners remain on a destroyed AddressSpace David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 07/11] vfio: Introduce VFIO address spaces David Gibson
2013-05-14 16:53 ` Alex Williamson
2013-05-15 1:18 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 08/11] vfio: Create VFIOAddressSpace objects as needed David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 09/11] memory: Add iova to IOMMUTLBEntry David Gibson
2013-05-14 16:53 ` Alex Williamson
2013-05-15 3:51 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2013-05-14 21:22 ` [Qemu-devel] " Paolo Bonzini
2013-05-14 9:13 ` [Qemu-devel] [PATCH 10/11] memory: Add iommu map/unmap notifiers David Gibson
2013-05-14 17:15 ` Alex Williamson
2013-05-14 21:02 ` Paolo Bonzini
2013-05-14 21:23 ` Paolo Bonzini
2013-05-15 1:21 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2013-05-14 9:13 ` [Qemu-devel] [PATCH 11/11] vfio: Add guest side IOMMU support David Gibson
2013-05-14 9:48 ` Paolo Bonzini
2013-05-14 13:57 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2013-05-14 17:15 ` [Qemu-devel] " Alex Williamson
2013-05-15 1:33 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2013-05-15 2:51 ` Alex Williamson
2013-05-15 3:32 ` David Gibson
2013-05-16 6:53 ` David Gibson
2013-05-14 9:39 ` [Qemu-devel] [0/11] RFC: VFIO and guest side IOMMUs, version 3 Paolo Bonzini