* [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW)
@ 2015-01-29  9:27 Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 01/18] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
                   ` (18 more replies)
  0 siblings, 19 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
At the moment sPAPR PHB supports only a single 32bit window
which is normally 1..2GB which is not enough for high performance devices.
PAPR spec enables creating an additional window(s) to support 64bit
DMA and bigger page sizes.
This patchset adds DDW support for pseries. The host kernel changes are
required, posted earlier today as:
[PATCH v3 00/24] powerpc/iommu/vfio: Enable Dynamic DMA windows
This was tested on POWER8 system which allows one additional DMA window
which is mapped at 0x800.0000.0000.0000 and supports 16MB pages.
Existing guests check for DDW capabilities in PHB's device tree and if it
is present, they request for an additional window and map entire guest RAM
using H_PUT_TCE/... hypercalls once at boot time and switch to direct DMA
operations.
TCE tables still may be big enough for guests backed with 64K pages but they
are reasonably small for guests backed by 16MB pages.
This does not contain PCI 64bit BAR support and VIO-TCE-bypass rework, these
are required for this to work but they have been posted separately today.
Please comment. Thanks!
Changes:
v4:
* (!) reimplemented the whole thing
* machine reset and ddw-reset RTAS call both remove all TCE tables and
create the default one
* IOMMU group id is not needed to use VFIO PHB anymore, multiple groups
are supported on the same VFIO container and virtual PHB
v3:
* removed "reset" from API now
* reworked machine versions
* applied multiple comments
* includes David's machine QOM rework as this patchset adds a new machine type
v2:
* tested on emulated PHB
* removed "ddw" machine property, now it is PHB property
* disabled by default
* defined "pseries-2.2" machine which enables DDW by default
* fixed reset() and reference counting
Alexey Kardashevskiy (18):
  spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
  spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe
  spapr_pci: Introduce a liobn number generating macros
  spapr_vio: Introduce a liobn number generating macros
  spapr_pci: Make find_phb()/find_dev() public
  spapr_iommu: Make spapr_tce_find_by_liobn() public
  spapr_iommu: Implement free_table() helper
  vfio: Add DMA memory registering
  spapr_rtas: Reserve DDW RTAS token numbers
  spapr_pci: Define DDW callbacks
  spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
  spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers
  spapr_pci: Advertise dynamic DMA windows to guest
  vfio: Enable DDW ioctls to VFIO IOMMU driver
  spapr_pci_vfio: Enable multiple groups per container
  spapr_rtas_ddw: Workaround broken LE guests
  target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64
  vfio: Enable in-kernel acceleration via VFIO KVM device
 hw/ppc/Makefile.objs          |   3 +
 hw/ppc/spapr.c                |   5 +
 hw/ppc/spapr_iommu.c          |  23 ++-
 hw/ppc/spapr_pci.c            | 209 ++++++++++++++++++++------
 hw/ppc/spapr_pci_vfio.c       | 108 +++++++++-----
 hw/ppc/spapr_rtas.c           |  29 ++++
 hw/ppc/spapr_rtas_ddw.c       | 337 ++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_vio.c            |   2 +-
 hw/vfio/common.c              | 167 ++++++++++++++++++---
 include/hw/pci-host/spapr.h   |  38 ++++-
 include/hw/ppc/spapr.h        |  15 +-
 include/hw/vfio/vfio-common.h |   3 +-
 include/hw/vfio/vfio.h        |   6 +-
 target-ppc/kvm.c              |  48 ++++--
 target-ppc/kvm_ppc.h          |  10 +-
 trace-events                  |   4 +
 16 files changed, 881 insertions(+), 126 deletions(-)
 create mode 100644 hw/ppc/spapr_rtas_ddw.c
-- 
2.0.0
^ permalink raw reply	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 01/18] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 02/18] spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe Alexey Kardashevskiy
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
The existing KVM_CREATE_SPAPR_TCE ioctl only support 4G windows max as
the window size parameter to the kernel ioctl() is 32-bit so
there's no way of expressing a TCE window > 4GB.
We are going to add huge DMA windows support so this will create small
window and unexpectedly fail later.
This disables KVM_CREATE_SPAPR_TCE for windows bigger that 4GB.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
Changes:
v3:
* fixed commit log
* added cast to uint64_t
---
 hw/ppc/spapr_iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 51d49c8..8de0482 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -129,11 +129,11 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
 static int spapr_tce_table_realize(DeviceState *dev)
 {
     sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
+    uint64_t window_size = (uint64_t)tcet->nb_table << tcet->page_shift;
 
-    if (kvm_enabled()) {
+    if (kvm_enabled() && !(window_size >> 32)) {
         tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
-                                              tcet->nb_table <<
-                                              tcet->page_shift,
+                                              window_size,
                                               &tcet->fd,
                                               tcet->vfio_accel);
     }
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 02/18] spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 01/18] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-02  6:30   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 03/18] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
PAPR is defined as big endian so TCEs need an adjustment so
does this patch.
This changes code to have ldq_be_phys() in one place.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 8de0482..a19dc5e 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -250,7 +250,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
     target_ulong ioba1 = ioba;
     target_ulong tce_list = args[2];
     target_ulong npages = args[3];
-    target_ulong ret = H_PARAMETER;
+    target_ulong ret = H_PARAMETER, tce = 0;
     sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
     CPUState *cs = CPU(cpu);
     hwaddr page_mask, page_size;
@@ -270,7 +270,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
     for (i = 0; i < npages; ++i, ioba += page_size) {
         target_ulong off = (tce_list & ~SPAPR_TCE_RW) +
                                 i * sizeof(target_ulong);
-        target_ulong tce = ldq_phys(cs->as, off);
+        tce = ldq_be_phys(cs->as, off);
 
         ret = put_tce_emu(tcet, ioba, tce);
         if (ret) {
@@ -281,8 +281,7 @@ static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
     /* Trace last successful or the first problematic entry */
     i = i ? (i - 1) : 0;
     trace_spapr_iommu_indirect(liobn, ioba1, tce_list, i,
-                               ldq_phys(cs->as,
-                               tce_list + i * sizeof(target_ulong)),
+                               tce,
                                ret);
 
     return ret;
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 03/18] spapr_pci: Introduce a liobn number generating macros
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 01/18] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 02/18] spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-04 15:31   ` Alexander Graf
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 04/18] spapr_vio: " Alexey Kardashevskiy
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
We are going to have multiple DMA windows per PHB and we want them to
migrate so we need a predictable way of assigning LIOBNs.
This introduces a macro which makes up a LIOBN from fixed prefix,
PHB index (unique PHB id) and window number.
This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number
from LIOBN. It is used to distinguish the default 32bit windows from
dynamic windows and avoid picking default DMA window properties from
a wrong TCE table.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_pci.c     | 4 ++--
 include/hw/ppc/spapr.h | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4500849..64c7702 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -502,7 +502,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         }
 
         sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
-        sphb->dma_liobn = SPAPR_PCI_BASE_LIOBN + sphb->index;
+        sphb->dma_liobn = SPAPR_PCI_LIOBN(sphb->index, 0);
 
         windows_base = SPAPR_PCI_WINDOW_BASE
             + sphb->index * SPAPR_PCI_WINDOW_SPACING;
@@ -843,7 +843,7 @@ static int spapr_phb_children_dt(Object *child, void *opaque)
     sPAPRTCETable *tcet;
 
     tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
-    if (!tcet) {
+    if (!tcet || SPAPR_PCI_DMA_WINDOW_NUM(tcet->liobn)) {
         return 0;
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 642cdc3..a2c4bac 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -442,7 +442,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
 #define SPAPR_TCE_PAGE_MASK    (SPAPR_TCE_PAGE_SIZE - 1)
 
 #define SPAPR_VIO_BASE_LIOBN    0x00000000
-#define SPAPR_PCI_BASE_LIOBN    0x80000000
+#define SPAPR_PCI_LIOBN(i, n)   (0x80000000 | ((i) << 8) | (n))
+#define SPAPR_PCI_DMA_WINDOW_NUM(liobn) ((liobn) & 0xff)
 
 #define RTAS_ERROR_LOG_MAX      2048
 
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 04/18] spapr_vio: Introduce a liobn number generating macros
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 03/18] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 05/18] spapr_pci: Make find_phb()/find_dev() public Alexey Kardashevskiy
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This introduces a macro which makes up a LIOBN from fixed prefix and
VIO device address (@reg property).
This is to keep LIOBN macros rendering consistent - the same macro for
PCI has been added by the previous patch.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_vio.c     | 2 +-
 include/hw/ppc/spapr.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 367345c..4f1eb79 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -469,7 +469,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
     }
 
     if (pc->rtce_window_size) {
-        uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
+        uint32_t liobn = SPAPR_VIO_LIOBN(dev->reg);
 
         memory_region_init(&dev->mrroot, OBJECT(dev), "iommu-spapr-root",
                            ram_size);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a2c4bac..7e0b69c 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -441,7 +441,7 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
 #define SPAPR_TCE_PAGE_SIZE    (1ULL << SPAPR_TCE_PAGE_SHIFT)
 #define SPAPR_TCE_PAGE_MASK    (SPAPR_TCE_PAGE_SIZE - 1)
 
-#define SPAPR_VIO_BASE_LIOBN    0x00000000
+#define SPAPR_VIO_LIOBN(reg)    (0x00000000 | (reg))
 #define SPAPR_PCI_LIOBN(i, n)   (0x80000000 | ((i) << 8) | (n))
 #define SPAPR_PCI_DMA_WINDOW_NUM(liobn) ((liobn) & 0xff)
 
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 05/18] spapr_pci: Make find_phb()/find_dev() public
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 04/18] spapr_vio: " Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 06/18] spapr_iommu: Make spapr_tce_find_by_liobn() public Alexey Kardashevskiy
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This makes find_phb()/find_dev() public and changed its names
to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to
be used from other parts of QEMU such as VFIO DDW (dynamic DMA window)
or VFIO PCI error injection or VFIO EEH handling - in all these
cases there are RTAS calls which are addressed to BUID+config_addr
in IEEE1275 format.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_pci.c          | 22 +++++++++++-----------
 include/hw/pci-host/spapr.h |  4 ++++
 2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 64c7702..3665f8a 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -47,7 +47,7 @@
 #define RTAS_TYPE_MSI           1
 #define RTAS_TYPE_MSIX          2
 
-static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
+sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
     sPAPRPHBState *sphb;
 
@@ -61,10 +61,10 @@ static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
     return NULL;
 }
 
-static PCIDevice *find_dev(sPAPREnvironment *spapr, uint64_t buid,
-                           uint32_t config_addr)
+PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
+                              uint32_t config_addr)
 {
-    sPAPRPHBState *sphb = find_phb(spapr, buid);
+    sPAPRPHBState *sphb = spapr_pci_find_phb(spapr, buid);
     PCIHostState *phb = PCI_HOST_BRIDGE(sphb);
     int bus_num = (config_addr >> 16) & 0xFF;
     int devfn = (config_addr >> 8) & 0xFF;
@@ -95,7 +95,7 @@ static void finish_read_pci_config(sPAPREnvironment *spapr, uint64_t buid,
         return;
     }
 
-    pci_dev = find_dev(spapr, buid, addr);
+    pci_dev = spapr_pci_find_dev(spapr, buid, addr);
     addr = rtas_pci_cfgaddr(addr);
 
     if (!pci_dev || (addr % size) || (addr >= pci_config_size(pci_dev))) {
@@ -162,7 +162,7 @@ static void finish_write_pci_config(sPAPREnvironment *spapr, uint64_t buid,
         return;
     }
 
-    pci_dev = find_dev(spapr, buid, addr);
+    pci_dev = spapr_pci_find_dev(spapr, buid, addr);
     addr = rtas_pci_cfgaddr(addr);
 
     if (!pci_dev || (addr % size) || (addr >= pci_config_size(pci_dev))) {
@@ -280,9 +280,9 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     }
 
     /* Fins sPAPRPHBState */
-    phb = find_phb(spapr, buid);
+    phb = spapr_pci_find_phb(spapr, buid);
     if (phb) {
-        pdev = find_dev(spapr, buid, config_addr);
+        pdev = spapr_pci_find_dev(spapr, buid, config_addr);
     }
     if (!phb || !pdev) {
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
@@ -381,9 +381,9 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
     spapr_pci_msi *msi;
 
     /* Find sPAPRPHBState */
-    phb = find_phb(spapr, buid);
+    phb = spapr_pci_find_phb(spapr, buid);
     if (phb) {
-        pdev = find_dev(spapr, buid, config_addr);
+        pdev = spapr_pci_find_dev(spapr, buid, config_addr);
     }
     if (!phb || !pdev) {
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
@@ -530,7 +530,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    if (find_phb(spapr, sphb->buid)) {
+    if (spapr_pci_find_phb(spapr, sphb->buid)) {
         error_setg(errp, "PCI host bridges must have unique BUIDs");
         return;
     }
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 92695b6..5c91387 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -123,4 +123,8 @@ void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr);
 
 void spapr_pci_rtas_init(void);
 
+sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid);
+PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
+                              uint32_t config_addr);
+
 #endif /* __HW_SPAPR_PCI_H__ */
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 06/18] spapr_iommu: Make spapr_tce_find_by_liobn() public
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 05/18] spapr_pci: Make find_phb()/find_dev() public Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
At the moment spapr_tce_find_by_liobn() is used by H_PUT_TCE/...
handlers to find an IOMMU by LIOBN.
We are going to implement Dynamic DMA windows (DDW), new code
will go to a new file and we will use spapr_tce_find_by_liobn()
there too so let's make it public.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_iommu.c   | 2 +-
 include/hw/ppc/spapr.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index a19dc5e..309c6cf 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -41,7 +41,7 @@ enum sPAPRTCEAccess {
 
 static QLIST_HEAD(spapr_tce_tables, sPAPRTCETable) spapr_tce_tables;
 
-static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
+sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
 {
     sPAPRTCETable *tcet;
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 7e0b69c..696b786 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -468,6 +468,7 @@ struct sPAPRTCETable {
     QLIST_ENTRY(sPAPRTCETable) list;
 };
 
+sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn);
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 06/18] spapr_iommu: Make spapr_tce_find_by_liobn() public Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-02  6:37   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering Alexey Kardashevskiy
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
Every sPAPRTCETable object holds an IOMMU memory region which holds
a referenced to the sPAPRTCETable instance. So if we want to free
an sPAPRTCETable instance, calling object_unref() will not be enough
as embedded memory region will hold the reference and we need to break
the loop.
This adds a spapr_tce_free_table() helper which destroys the embedded
memory region and then calls object_unref() on the sPAPRTCETable instance
which will succeed now. The helper adds a new child under unique name
derived from LIOBN.
As we are here, fix spapr_tce_new_table() callers.
At the moment spapr_tce_new_table() references sPAPRTCETable twice -
once in object_new() and second time in object_property_add_child().
The callers of spapr_tce_new_table() should take care of correct
referencing.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c    | 10 +++++++++-
 hw/ppc/spapr_pci.c      |  2 ++
 hw/ppc/spapr_pci_vfio.c |  2 ++
 include/hw/ppc/spapr.h  |  1 +
 4 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 309c6cf..1adf568 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -164,6 +164,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
                                    bool vfio_accel)
 {
     sPAPRTCETable *tcet;
+    char buf[32];
 
     if (spapr_tce_find_by_liobn(liobn)) {
         fprintf(stderr, "Attempted to create TCE table with duplicate"
@@ -182,13 +183,20 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
     tcet->nb_table = nb_table;
     tcet->vfio_accel = vfio_accel;
 
-    object_property_add_child(OBJECT(owner), "tce-table", OBJECT(tcet), NULL);
+    snprintf(buf, sizeof(buf) - 1, "tce-table-%08X", tcet->liobn);
+    object_property_add_child(OBJECT(owner), buf, OBJECT(tcet), NULL);
 
     object_property_set_bool(OBJECT(tcet), true, "realized", NULL);
 
     return tcet;
 }
 
+void spapr_tce_free_table(sPAPRTCETable *tcet)
+{
+    object_unparent(OBJECT(&tcet->iommu));
+    object_unparent(OBJECT(tcet));
+}
+
 static void spapr_tce_table_unrealize(DeviceState *dev, Error **errp)
 {
     sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 3665f8a..6bd00e8 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -649,6 +649,8 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
     /* Register default 32bit DMA window */
     memory_region_add_subregion(&sphb->iommu_root, 0,
                                 spapr_tce_get_iommu(tcet));
+
+    object_unref(OBJECT(tcet));
 }
 
 static int spapr_phb_children_reset(Object *child, void *opaque)
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 144912b..aabf0ae 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -69,6 +69,8 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
     /* Register default 32bit DMA window */
     memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
                                 spapr_tce_get_iommu(tcet));
+
+    object_unref(OBJECT(tcet));
 }
 
 static void spapr_phb_vfio_reset(DeviceState *qdev)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 696b786..b442e41 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -477,6 +477,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
                                    uint32_t page_shift,
                                    uint32_t nb_table,
                                    bool vfio_accel);
+void spapr_tce_free_table(sPAPRTCETable *tcet);
 MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
 int spapr_dma_dt(void *fdt, int node_off, const char *propname,
                  uint32_t liobn, uint64_t window, uint32_t size);
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (6 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-02  7:04   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 09/18] spapr_rtas: Reserve DDW RTAS token numbers Alexey Kardashevskiy
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This makes use of the new "memory registering" feature. The idea is
to provide the guest ability to notify the host kernel about pages which
are going to be used for DMA. Having this information, the host kernel
can pin them all, do locked pages accounting and not spent time on
doing that in real time with possible failures which cannot be handled
nicely in some cases.
This adds a memory listener which notifies a VFIO container about memory
which needs to be pinned/unpinned. VFIO MMIO regions are skipped.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/vfio/common.c              | 99 ++++++++++++++++++++++++++++++++++++++++++-
 include/hw/vfio/vfio-common.h |  3 +-
 2 files changed, 100 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cf483ff..c08f9ab 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -485,6 +485,99 @@ void vfio_listener_release(VFIOContainer *container)
     memory_listener_unregister(&container->iommu_data.type1.listener);
 }
 
+static int vfio_mem_register(VFIOContainer *container,
+                             void *vaddr, ram_addr_t size)
+{
+    struct vfio_iommu_type1_register_memory reg = {
+        .argsz = sizeof(reg),
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .size = size,
+    };
+    long ret = ioctl(container->fd, VFIO_IOMMU_REGISTER_MEMORY, ®);
+
+    DPRINTF("(%u) %s %u: va=%lx size=%lx, ret=%ld\n", getpid(),
+            __func__, __LINE__, reg.vaddr, reg.size, ret);
+
+    return ret;
+}
+
+static int vfio_mem_unregister(VFIOContainer *container,
+                               void *vaddr, ram_addr_t size)
+{
+    struct vfio_iommu_type1_unregister_memory reg = {
+        .argsz = sizeof(reg),
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .size = size,
+    };
+    long ret = ioctl(container->fd, VFIO_IOMMU_UNREGISTER_MEMORY, ®);
+
+    DPRINTF("(%u) %s %u: va=%lx size=%lx, ret=%ld\n", getpid(),
+            __func__, __LINE__, reg.vaddr, reg.size, ret);
+
+    return ret;
+}
+
+static void vfio_ram_region_add(MemoryListener *listener,
+                                MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.ramlistener);
+    hwaddr end;
+    Int128 llend;
+    void *vaddr;
+
+    if (!memory_region_is_ram(section->mr) ||
+        memory_region_is_skip_dump(section->mr)) {
+        return;
+    }
+
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+    memory_region_ref(section->mr);
+
+    end = int128_get64(llend);
+    vaddr = memory_region_get_ram_ptr(section->mr) +
+        section->offset_within_region;
+    vfio_mem_register(container, vaddr, end);
+}
+
+static void vfio_ram_region_del(MemoryListener *listener,
+                                MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.ramlistener);
+    hwaddr iova, end;
+    void *vaddr;
+
+    if (!memory_region_is_ram(section->mr) ||
+        memory_region_is_skip_dump(section->mr)) {
+        return;
+    }
+
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    end = (section->offset_within_address_space + int128_get64(section->size)) &
+        TARGET_PAGE_MASK;
+    vaddr = memory_region_get_ram_ptr(section->mr) +
+        section->offset_within_region +
+        (iova - section->offset_within_address_space);
+
+    vfio_mem_unregister(container, vaddr, end - iova);
+}
+
+const MemoryListener vfio_ram_listener = {
+    .region_add = vfio_ram_region_add,
+    .region_del = vfio_ram_region_del,
+};
+
+static void vfio_spapr_listener_release(VFIOContainer *container)
+{
+    memory_listener_unregister(&container->iommu_data.type1.listener);
+    memory_listener_unregister(&container->iommu_data.type1.ramlistener);
+}
+
 int vfio_mmap_region(Object *obj, VFIORegion *region,
                      MemoryRegion *mem, MemoryRegion *submem,
                      void **map, size_t size, off_t offset,
@@ -705,6 +798,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             goto free_container_exit;
         }
 
+        container->iommu_data.type1.ramlistener = vfio_ram_listener;
+        memory_listener_register(&container->iommu_data.type1.ramlistener,
+                                 &address_space_memory);
+
         /*
          * The host kernel code implementing VFIO_IOMMU_DISABLE is called
          * when container fd is closed so we do not call it explicitly
@@ -718,7 +815,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
         }
 
         container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
+        container->iommu_data.release = vfio_spapr_listener_release;
 
         memory_listener_register(&container->iommu_data.type1.listener,
                                  container->space->as);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1d5bfe8..3f91216 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -66,6 +66,7 @@ struct VFIOGroup;
 
 typedef struct VFIOType1 {
     MemoryListener listener;
+    MemoryListener ramlistener;
     int error;
     bool initialized;
 } VFIOType1;
@@ -144,7 +145,7 @@ int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev);
 
 extern const MemoryRegionOps vfio_region_ops;
-extern const MemoryListener vfio_memory_listener;
+extern const MemoryListener vfio_memory_listener, vfio_ram_listener;
 extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;
 extern QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces;
 
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 09/18] spapr_rtas: Reserve DDW RTAS token numbers
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (7 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-02  7:09   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 10/18] spapr_pci: Define DDW callbacks Alexey Kardashevskiy
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
Will be squashed later.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/hw/ppc/spapr.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index b442e41..5f4e137 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -382,8 +382,12 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 #define RTAS_GET_SENSOR_STATE                   (RTAS_TOKEN_BASE + 0x1D)
 #define RTAS_IBM_CONFIGURE_CONNECTOR            (RTAS_TOKEN_BASE + 0x1E)
 #define RTAS_IBM_OS_TERM                        (RTAS_TOKEN_BASE + 0x1F)
+#define RTAS_IBM_QUERY_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x20)
+#define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x21)
+#define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x22)
+#define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x23)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x20)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x24)
 
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 10/18] spapr_pci: Define DDW callbacks
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (8 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 09/18] spapr_rtas: Reserve DDW RTAS token numbers Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 11/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This adds callbacks definitions which PHB needs to implement in order to
support dynamic DMA windows (DDW).
Will be squashed later.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/hw/pci-host/spapr.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 5c91387..eec95f3 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -49,6 +49,27 @@ struct sPAPRPHBClass {
     PCIHostBridgeClass parent_class;
 
     void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
+
+/* sPAPR spec defined pagesize mask values */
+#define DDW_PGSIZE_4K       0x01
+#define DDW_PGSIZE_64K      0x02
+#define DDW_PGSIZE_16M      0x04
+#define DDW_PGSIZE_32M      0x08
+#define DDW_PGSIZE_64M      0x10
+#define DDW_PGSIZE_128M     0x20
+#define DDW_PGSIZE_256M     0x40
+#define DDW_PGSIZE_16G      0x80
+#define DDW_PGSIZE_MASK     0xFF
+
+    int (*ddw_query)(sPAPRPHBState *sphb, uint32_t *windows_supported,
+                     uint32_t *page_size_mask,
+                     uint32_t *dma32_window_size,
+                     uint64_t *dma64_window_size);
+    int (*ddw_create)(sPAPRPHBState *sphb, uint32_t liobn,
+                      uint32_t page_shift, uint32_t window_shift,
+                      sPAPRTCETable **ptcet);
+    int (*ddw_remove)(sPAPRPHBState *sphb, sPAPRTCETable *tcet);
+    int (*ddw_reset)(sPAPRPHBState *sphb);
 };
 
 typedef struct spapr_pci_msi {
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 11/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (9 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 10/18] spapr_pci: Define DDW callbacks Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-05  3:51   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 12/18] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers Alexey Kardashevskiy
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This implements DDW for emulated and VFIO PHB.
This removes all DMA windows on reset and creates the default window,
same is done on the "ibm,reset-pe-dma-window" call.
This converts sPAPRPHBClass::finish_realize to sPAPRPHBClass::ddw_reset
and others.
The "ddw" property is enabled by default on a PHB but for compatibility
pseries-2.1 machine disables it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v4:
* reset handler is back in generalized form
v3:
* removed reset
* windows_num is now 1 or bigger rather than 0-based value and it is only
changed in PHB code, not in RTAS
* added page mask check in create()
* added SPAPR_PCI_DDW_MAX_WINDOWS to track how many windows are already
created
v2:
* tested on hacked emulated E1000
* implemented DDW reset on the PHB reset
* spapr_pci_ddw_remove/spapr_pci_ddw_reset are public for reuse by VFIO
spapr_pci_vfio: Enable DDW
This implements DDW for VFIO. Host kernel support is required for this.
After this patch DDW will be enabled on all machines but pseries-2.1.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* remove()/reset() callbacks use spapr_pci's ones
---
 hw/ppc/spapr_pci.c          | 160 +++++++++++++++++++++++++++++++++++---------
 hw/ppc/spapr_pci_vfio.c     |  98 +++++++++++++++++----------
 include/hw/pci-host/spapr.h |  15 ++++-
 3 files changed, 203 insertions(+), 70 deletions(-)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 6bd00e8..3ec03be 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -469,6 +469,126 @@ static const MemoryRegionOps spapr_msi_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN
 };
 
+static int spapr_phb_get_win_num_cb(Object *child, void *opaque)
+{
+    if (object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE)) {
+        ++*(unsigned *)opaque;
+    }
+    return 0;
+}
+
+unsigned spapr_phb_get_win_num(sPAPRPHBState *sphb)
+{
+    unsigned ret = 0;
+
+    object_child_foreach(OBJECT(sphb), spapr_phb_get_win_num_cb, &ret);
+
+    return ret;
+}
+
+/*
+ * Dynamic DMA windows
+ */
+static int spapr_pci_ddw_query(sPAPRPHBState *sphb,
+                               uint32_t *windows_supported,
+                               uint32_t *page_size_mask,
+                               uint32_t *dma32_window_size,
+                               uint64_t *dma64_window_size)
+{
+    *windows_supported = SPAPR_PCI_DDW_MAX_WINDOWS;
+    *page_size_mask = DDW_PGSIZE_64K | DDW_PGSIZE_16M;
+    *dma32_window_size = SPAPR_PCI_TCE32_WIN_SIZE;
+    *dma64_window_size = ram_size;
+
+    return 0;
+}
+
+static int spapr_pci_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
+                                uint32_t page_shift, uint32_t window_shift,
+                                sPAPRTCETable **ptcet)
+{
+    uint64_t bus_offset = spapr_phb_get_win_num(sphb) ?
+                          SPAPR_PCI_TCE64_START : 0;
+
+    if (((page_shift != 16) && (page_shift != 24) && (page_shift != 12))) {
+        return -1;
+    }
+
+    *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
+                                 bus_offset,
+                                 page_shift,
+                                 1ULL << (window_shift - page_shift),
+                                 false);
+    if (!*ptcet) {
+        return -1;
+    }
+    memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
+                                spapr_tce_get_iommu(*ptcet));
+
+    return 0;
+}
+
+int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
+{
+    memory_region_del_subregion(&sphb->iommu_root,
+                                spapr_tce_get_iommu(tcet));
+    spapr_tce_free_table(tcet);
+
+    return 0;
+}
+
+static int spapr_pci_remove_ddw_cb(Object *child, void *opaque)
+{
+    sPAPRTCETable *tcet;
+
+    tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
+
+    if (tcet) {
+        sPAPRPHBState *sphb = opaque;
+        sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+
+        spc->ddw_remove(sphb, tcet);
+    }
+
+    return 0;
+}
+
+int spapr_pci_ddw_reset(sPAPRPHBState *sphb)
+{
+    int ret;
+    sPAPRPHBClass *spc;
+    sPAPRTCETable *tcet;
+    uint32_t windows_supported = 0, page_size_mask = 0, dma32_window_size = 0;
+    uint64_t dma64_window_size = 0;
+
+    /* Remove all windows */
+    object_child_foreach(OBJECT(sphb), spapr_pci_remove_ddw_cb, sphb);
+
+    /* Create default 32bit window */
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->ddw_create || !spc->ddw_query) {
+        return -1;
+    }
+
+    ret = spc->ddw_query(sphb, &windows_supported, &page_size_mask,
+                         &dma32_window_size, &dma64_window_size);
+    if (ret) {
+        return ret;
+    }
+
+    sphb->ddw_enabled = (windows_supported > 1);
+
+    ret = spc->ddw_create(sphb, SPAPR_PCI_LIOBN(sphb->index, 0),
+                          SPAPR_TCE_PAGE_SHIFT, ctzl(dma32_window_size), &tcet);
+    if (ret) {
+        return ret;
+    }
+
+    object_unref(OBJECT(tcet));
+
+    return 0;
+}
+
 /*
  * PHB PCI device
  */
@@ -484,7 +604,6 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
     sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
     PCIHostState *phb = PCI_HOST_BRIDGE(s);
-    sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
     char *namebuf;
     int i;
     PCIBus *bus;
@@ -622,37 +741,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         sphb->lsi_table[i].irq = irq;
     }
 
-    if (!info->finish_realize) {
-        error_setg(errp, "finish_realize not defined");
-        return;
-    }
-
-    info->finish_realize(sphb, errp);
-
     sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
 }
 
-static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
-{
-    sPAPRTCETable *tcet;
-
-    tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
-                               0,
-                               SPAPR_TCE_PAGE_SHIFT,
-                               0x40000000 >> SPAPR_TCE_PAGE_SHIFT, false);
-    if (!tcet) {
-        error_setg(errp, "Unable to create TCE table for %s",
-                   sphb->dtbusname);
-        return ;
-    }
-
-    /* Register default 32bit DMA window */
-    memory_region_add_subregion(&sphb->iommu_root, 0,
-                                spapr_tce_get_iommu(tcet));
-
-    object_unref(OBJECT(tcet));
-}
-
 static int spapr_phb_children_reset(Object *child, void *opaque)
 {
     DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
@@ -666,7 +757,11 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
 
 static void spapr_phb_reset(DeviceState *qdev)
 {
-    /* Reset the IOMMU state */
+    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(qdev);
+
+    if (spc->ddw_reset) {
+        spc->ddw_reset(SPAPR_PCI_HOST_BRIDGE(qdev));
+    }
     object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
 }
 
@@ -801,7 +896,10 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     dc->vmsd = &vmstate_spapr_pci;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->cannot_instantiate_with_device_add_yet = false;
-    spc->finish_realize = spapr_phb_finish_realize;
+    spc->ddw_query = spapr_pci_ddw_query;
+    spc->ddw_create = spapr_pci_ddw_create;
+    spc->ddw_remove = spapr_pci_ddw_remove;
+    spc->ddw_reset = spapr_pci_ddw_reset;
 }
 
 static const TypeInfo spapr_phb_info = {
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index aabf0ae..b20ac90 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -27,65 +27,89 @@ static Property spapr_phb_vfio_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
+static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
+                                    uint32_t *windows_supported,
+                                    uint32_t *page_size_mask,
+                                    uint32_t *dma32_window_size,
+                                    uint64_t *dma64_window_size)
 {
     sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
     int ret;
-    sPAPRTCETable *tcet;
-    uint32_t liobn = svphb->phb.dma_liobn;
 
-    if (svphb->iommugroupid == -1) {
-        error_setg(errp, "Wrong IOMMU group ID %d", svphb->iommugroupid);
-        return;
-    }
-
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
-                               VFIO_CHECK_EXTENSION,
-                               (void *) VFIO_SPAPR_TCE_IOMMU);
-    if (ret != 1) {
-        error_setg_errno(errp, -ret,
-                         "spapr-vfio: SPAPR extension is not supported");
-        return;
-    }
-
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
                                VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
     if (ret) {
-        error_setg_errno(errp, -ret,
-                         "spapr-vfio: get info from container failed");
-        return;
+        return ret;
     }
 
-    tcet = spapr_tce_new_table(DEVICE(sphb), liobn, info.dma32_window_start,
-                               SPAPR_TCE_PAGE_SHIFT,
-                               info.dma32_window_size >> SPAPR_TCE_PAGE_SHIFT,
-                               true);
-    if (!tcet) {
-        error_setg(errp, "spapr-vfio: failed to create VFIO TCE table");
-        return;
+    *windows_supported = info.windows_supported;
+    *page_size_mask = info.flags & DDW_PGSIZE_MASK;
+    *dma32_window_size = info.dma32_window_size;
+    *dma64_window_size = ram_size;
+
+    return ret;
+}
+
+static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
+                                     uint32_t page_shift, uint32_t window_shift,
+                                     sPAPRTCETable **ptcet)
+{
+    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+    struct vfio_iommu_spapr_tce_create create = {
+        .argsz = sizeof(create),
+        .page_shift = page_shift,
+        .window_shift = window_shift,
+        .levels = 1,
+        .start_addr = 0,
+    };
+    int ret;
+
+    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+                               VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
+    if (ret) {
+        return ret;
     }
 
-    /* Register default 32bit DMA window */
-    memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
-                                spapr_tce_get_iommu(tcet));
+    *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
+                                 create.start_addr,
+                                 page_shift,
+                                 1ULL << (window_shift - page_shift),
+                                 true);
+    if (!*ptcet) {
+        return -1;
+    }
+    memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
+                                spapr_tce_get_iommu(*ptcet));
 
-    object_unref(OBJECT(tcet));
+    return ret;
 }
 
-static void spapr_phb_vfio_reset(DeviceState *qdev)
+static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
 {
-    /* Do nothing */
+    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+    struct vfio_iommu_spapr_tce_remove remove = {
+        .argsz = sizeof(remove),
+        .start_addr = tcet->bus_offset
+    };
+    int ret;
+
+    spapr_pci_ddw_remove(sphb, tcet);
+    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+                               VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
+
+    return ret;
 }
 
 static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
 {
-    DeviceClass *dc = DEVICE_CLASS(klass);
     sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->props = spapr_phb_vfio_properties;
-    dc->reset = spapr_phb_vfio_reset;
-    spc->finish_realize = spapr_phb_vfio_finish_realize;
+    spc->ddw_query = spapr_pci_vfio_ddw_query;
+    spc->ddw_create = spapr_pci_vfio_ddw_create;
+    spc->ddw_remove = spapr_pci_vfio_ddw_remove;
 }
 
 static const TypeInfo spapr_phb_vfio_info = {
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index eec95f3..577f908 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -48,8 +48,6 @@ typedef struct sPAPRPHBVFIOState sPAPRPHBVFIOState;
 struct sPAPRPHBClass {
     PCIHostBridgeClass parent_class;
 
-    void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
-
 /* sPAPR spec defined pagesize mask values */
 #define DDW_PGSIZE_4K       0x01
 #define DDW_PGSIZE_64K      0x02
@@ -106,6 +104,8 @@ struct sPAPRPHBState {
     int32_t msi_devs_num;
     spapr_pci_msi_mig *msi_devs;
 
+    bool ddw_enabled;
+
     QLIST_ENTRY(sPAPRPHBState) list;
 };
 
@@ -129,6 +129,14 @@ struct sPAPRPHBVFIOState {
 
 #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
 
+#define SPAPR_PCI_TCE32_WIN_SIZE     0x80000000ULL
+
+/* Default 64bit dynamic window offset */
+#define SPAPR_PCI_TCE64_START        0x8000000000000000ULL
+
+/* Maximum allowed number of DMA windows for emulated PHB */
+#define SPAPR_PCI_DDW_MAX_WINDOWS    2
+
 static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
 {
     return xics_get_qirq(spapr->icp, phb->lsi_table[pin].irq);
@@ -147,5 +155,8 @@ void spapr_pci_rtas_init(void);
 sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid);
 PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
                               uint32_t config_addr);
+int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet);
+int spapr_pci_ddw_reset(sPAPRPHBState *sphb);
+unsigned spapr_phb_get_win_num(sPAPRPHBState *sphb);
 
 #endif /* __HW_SPAPR_PCI_H__ */
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 12/18] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (10 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 11/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-05  4:05   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 13/18] spapr_pci: Advertise dynamic DMA windows to guest Alexey Kardashevskiy
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows to have additional DMA window(s)
which can support page sizes other than 4K.
The existing implementation of DDW in the guest tries to create one huge
DMA window with 64K or 16MB pages and map the entire guest RAM to. If it
succeeds, the guest switches to dma_direct_ops and never calls
TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use
the entire RAM and not waste time on map/unmap later.
This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.
These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PHB.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v3:
* added ibm,reset-pe-dma-window
v2:
* double loop squashed to spapr_iommu_fixmask() helper
* added @ddw_num counter to PHB, it is used to generate LIOBN for new
window; it is reset on ddw-reset event
* added ULL to constants used in shift operations
* rtas_ibm_reset_pe_dma_window() and rtas_ibm_remove_pe_dma_window()
do not remove windows anymore, the PHB callback has to as it will reuse
the same code in case of guest reboot as well
---
 hw/ppc/Makefile.objs    |   3 +
 hw/ppc/spapr_rtas_ddw.c | 297 ++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events            |   4 +
 3 files changed, 304 insertions(+)
 create mode 100644 hw/ppc/spapr_rtas_ddw.c
diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 19d9920..d7fe4fb 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
+ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
+obj-y += spapr_rtas_ddw.o
+endif
 # PowerPC 4xx boards
 obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
 obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
new file mode 100644
index 0000000..af70601
--- /dev/null
+++ b/hw/ppc/spapr_rtas_ddw.c
@@ -0,0 +1,297 @@
+/*
+ * QEMU sPAPR Dynamic DMA windows support
+ *
+ * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License,
+ *  or (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw/ppc/spapr.h"
+#include "hw/pci-host/spapr.h"
+#include "trace.h"
+
+static uint32_t spapr_iommu_fixmask(struct ppc_one_seg_page_size *sps,
+                                    uint32_t query_mask)
+{
+    int i, j;
+    uint32_t mask = 0;
+    const struct { int shift; uint32_t mask; } masks[] = {
+        { 12, DDW_PGSIZE_4K },
+        { 16, DDW_PGSIZE_64K },
+        { 24, DDW_PGSIZE_16M },
+        { 25, DDW_PGSIZE_32M },
+        { 26, DDW_PGSIZE_64M },
+        { 27, DDW_PGSIZE_128M },
+        { 28, DDW_PGSIZE_256M },
+        { 34, DDW_PGSIZE_16G },
+    };
+
+    for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+        for (j = 0; j < ARRAY_SIZE(masks); ++j) {
+            if ((sps[i].page_shift == masks[j].shift) &&
+                    (query_mask & masks[j].mask)) {
+                mask |= masks[j].mask;
+            }
+        }
+    }
+
+    return mask;
+}
+
+static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
+                                         sPAPREnvironment *spapr,
+                                         uint32_t token, uint32_t nargs,
+                                         target_ulong args,
+                                         uint32_t nret, target_ulong rets)
+{
+    CPUPPCState *env = &cpu->env;
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    uint64_t buid;
+    uint32_t avail, addr, pgmask = 0;
+    uint32_t windows_supported = 0, page_size_mask = 0, dma32_window_size = 0;
+    uint64_t dma64_window_size = 0;
+    unsigned current;
+    long ret;
+
+    if ((nargs != 3) || (nret != 5)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    addr = rtas_ld(args, 0);
+    sphb = spapr_pci_find_phb(spapr, buid);
+    if (!sphb || !sphb->ddw_enabled) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->ddw_query) {
+        goto hw_error_exit;
+    }
+
+    ret = spc->ddw_query(sphb, &windows_supported, &page_size_mask,
+                         &dma32_window_size, &dma64_window_size);
+    trace_spapr_iommu_ddw_query(buid, addr, windows_supported,
+                                page_size_mask, pgmask, ret);
+    if (ret) {
+        goto hw_error_exit;
+    }
+
+    current = spapr_phb_get_win_num(sphb);
+    avail = (windows_supported > current) ? (windows_supported - current) : 0;
+
+    /* Work out supported page masks */
+    pgmask = spapr_iommu_fixmask(env->sps.sps, page_size_mask);
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, avail);
+
+    /*
+     * This is "Largest contiguous block of TCEs allocated specifically
+     * for (that is, are reserved for) this PE".
+     * Return the maximum number as all RAM was in 4K pages.
+     */
+    rtas_st(rets, 2, dma64_window_size >> SPAPR_TCE_PAGE_SHIFT);
+    rtas_st(rets, 3, pgmask);
+    rtas_st(rets, 4, 0); /* DMA migration mask, not supported */
+    return;
+
+hw_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
+                                          sPAPREnvironment *spapr,
+                                          uint32_t token, uint32_t nargs,
+                                          target_ulong args,
+                                          uint32_t nret, target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    sPAPRTCETable *tcet = NULL;
+    uint32_t addr, page_shift, window_shift, liobn;
+    uint64_t buid;
+    long ret;
+    uint32_t windows_supported = 0, page_size_mask = 0, dma32_window_size = 0;
+    uint64_t dma64_window_size = 0;
+
+    if ((nargs != 5) || (nret != 4)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    addr = rtas_ld(args, 0);
+    sphb = spapr_pci_find_phb(spapr, buid);
+    if (!sphb || !sphb->ddw_enabled) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->ddw_create || !spc->ddw_query) {
+        goto hw_error_exit;
+    }
+
+    ret = spc->ddw_query(sphb, &windows_supported, &page_size_mask,
+                         &dma32_window_size, &dma64_window_size);
+    if (ret || (spapr_phb_get_win_num(sphb) >= windows_supported)) {
+        goto hw_error_exit;
+    }
+
+    page_shift = rtas_ld(args, 3);
+    window_shift = rtas_ld(args, 4);
+    /* Default 32bit window#0 is always there so +1 */
+    liobn = SPAPR_PCI_LIOBN(sphb->index, spapr_phb_get_win_num(sphb));
+
+    ret = spc->ddw_create(sphb, liobn, page_shift, window_shift, &tcet);
+    trace_spapr_iommu_ddw_create(buid, addr, 1ULL << page_shift,
+                                 1ULL << window_shift,
+                                 tcet ? tcet->bus_offset : 0xbaadf00d,
+                                 liobn, ret);
+    if (ret || !tcet) {
+        goto hw_error_exit;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, liobn);
+    rtas_st(rets, 2, tcet->bus_offset >> 32);
+    rtas_st(rets, 3, tcet->bus_offset & ((uint32_t) -1));
+
+    object_unref(OBJECT(tcet));
+    return;
+
+hw_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_remove_pe_dma_window(PowerPCCPU *cpu,
+                                          sPAPREnvironment *spapr,
+                                          uint32_t token, uint32_t nargs,
+                                          target_ulong args,
+                                          uint32_t nret, target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    sPAPRTCETable *tcet;
+    uint32_t liobn;
+    long ret;
+
+    if ((nargs != 1) || (nret != 1)) {
+        goto param_error_exit;
+    }
+
+    liobn = rtas_ld(args, 0);
+    tcet = spapr_tce_find_by_liobn(liobn);
+    if (!tcet) {
+        goto param_error_exit;
+    }
+
+    sphb = SPAPR_PCI_HOST_BRIDGE(OBJECT(tcet)->parent);
+    if (!sphb || !sphb->ddw_enabled) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->ddw_remove) {
+        goto hw_error_exit;
+    }
+
+    ret = spc->ddw_remove(sphb, tcet);
+    trace_spapr_iommu_ddw_remove(liobn, ret);
+    if (ret) {
+        goto hw_error_exit;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    return;
+
+hw_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_reset_pe_dma_window(PowerPCCPU *cpu,
+                                         sPAPREnvironment *spapr,
+                                         uint32_t token, uint32_t nargs,
+                                         target_ulong args,
+                                         uint32_t nret, target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    uint64_t buid;
+    uint32_t addr;
+    long ret;
+
+    if ((nargs != 3) || (nret != 1)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    addr = rtas_ld(args, 0);
+    sphb = spapr_pci_find_phb(spapr, buid);
+    if (!sphb || !sphb->ddw_enabled) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->ddw_reset) {
+        goto hw_error_exit;
+    }
+
+    ret = spc->ddw_reset(sphb);
+    trace_spapr_iommu_ddw_reset(buid, addr, ret);
+    if (ret) {
+        goto hw_error_exit;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+
+    return;
+
+hw_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void spapr_rtas_ddw_init(void)
+{
+    spapr_rtas_register(RTAS_IBM_QUERY_PE_DMA_WINDOW,
+                        "ibm,query-pe-dma-window",
+                        rtas_ibm_query_pe_dma_window);
+    spapr_rtas_register(RTAS_IBM_CREATE_PE_DMA_WINDOW,
+                        "ibm,create-pe-dma-window",
+                        rtas_ibm_create_pe_dma_window);
+    spapr_rtas_register(RTAS_IBM_REMOVE_PE_DMA_WINDOW,
+                        "ibm,remove-pe-dma-window",
+                        rtas_ibm_remove_pe_dma_window);
+    spapr_rtas_register(RTAS_IBM_RESET_PE_DMA_WINDOW,
+                        "ibm,reset-pe-dma-window",
+                        rtas_ibm_reset_pe_dma_window);
+}
+
+type_init(spapr_rtas_ddw_init)
diff --git a/trace-events b/trace-events
index 04f5df2..9af53d9 100644
--- a/trace-events
+++ b/trace-events
@@ -1285,6 +1285,10 @@ spapr_iommu_indirect(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t iobaN
 spapr_iommu_stuff(uint64_t liobn, uint64_t ioba, uint64_t tce_value, uint64_t npages, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcevalue=0x%"PRIx64" npages=%"PRId64" ret=%"PRId64
 spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u mask=%x"
 spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) "liobn=%"PRIx64" tcet=%p table=%p fd=%d"
+spapr_iommu_ddw_query(uint64_t buid, uint32_t cfgaddr, uint32_t wa, uint32_t pgz, uint32_t pgz_fixed, long ret) "buid=%"PRIx64" addr=%"PRIx32", %u windows available, sizes %"PRIx32", fixed %"PRIx32", ret = %ld"
+spapr_iommu_ddw_create(uint64_t buid, uint32_t cfgaddr, unsigned long long pg_size, unsigned long long req_size, uint64_t start, uint32_t liobn, long ret) "buid=%"PRIx64" addr=%"PRIx32", page size=0x%llx, requested=0x%llx, start addr=%"PRIx64", liobn=%"PRIx32", ret = %ld"
+spapr_iommu_ddw_remove(uint32_t liobn, long ret) "liobn=%"PRIx32", ret = %ld"
+spapr_iommu_ddw_reset(uint64_t buid, uint32_t cfgaddr, long ret) "buid=%"PRIx64" addr=%"PRIx32", ret = %ld"
 
 # hw/ppc/ppc.c
 ppc_tb_adjust(uint64_t offs1, uint64_t offs2, int64_t diff, int64_t seconds) "adjusted from 0x%"PRIx64" to 0x%"PRIx64", diff %"PRId64" (%"PRId64"s)"
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 13/18] spapr_pci: Advertise dynamic DMA windows to guest
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (11 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 12/18] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-05  4:10   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 14/18] vfio: Enable DDW ioctls to VFIO IOMMU driver Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c     |  5 +++++
 hw/ppc/spapr_pci.c | 25 +++++++++++++++++++++++++
 2 files changed, 30 insertions(+)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b560459..f9882c1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1760,6 +1760,11 @@ static void spapr_machine_2_1_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     static GlobalProperty compat_props[] = {
         HW_COMPAT_2_1,
+        {
+            .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,
+            .property = "ddw",
+            .value    = stringify(off),
+        },
         { /* end of list */ }
     };
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 3ec03be..a94bba1 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -775,6 +775,7 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
     DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
                        SPAPR_PCI_IO_WIN_SIZE),
+    DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -993,6 +994,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
     uint32_t interrupt_map_mask[] = {
         cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
     uint32_t interrupt_map[PCI_SLOT_MAX * PCI_NUM_PINS][7];
+    uint32_t ddw_applicable[] = {
+        cpu_to_be32(RTAS_IBM_QUERY_PE_DMA_WINDOW),
+        cpu_to_be32(RTAS_IBM_CREATE_PE_DMA_WINDOW),
+        cpu_to_be32(RTAS_IBM_REMOVE_PE_DMA_WINDOW)
+    };
+    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(phb);
 
     /* Start populating the FDT */
     sprintf(nodename, "pci@%" PRIx64, phb->buid);
@@ -1022,6 +1029,24 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
     _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
     _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS));
 
+    /* Dynamic DMA window */
+    if (phb->ddw_enabled &&
+        spc->ddw_query && spc->ddw_create && spc->ddw_remove) {
+        _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-applicable", &ddw_applicable,
+                         sizeof(ddw_applicable)));
+
+        if (spc->ddw_reset) {
+            uint32_t ddw_extensions[] = {
+                cpu_to_be32(1),
+                cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW)
+            };
+
+            /* When enabled, the guest will remove the default 32bit window */
+            _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-extensions",
+                             &ddw_extensions, sizeof(ddw_extensions)));
+        }
+    }
+
     /* Build the interrupt-map, this must matches what is done
      * in pci_spapr_map_irq
      */
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 14/18] vfio: Enable DDW ioctls to VFIO IOMMU driver
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (12 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 13/18] spapr_pci: Advertise dynamic DMA windows to guest Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This enables DDW RTAS-related ioctls in VFIO.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/vfio/common.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c08f9ab..1cafcf8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1045,6 +1045,8 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
     switch (req) {
     case VFIO_CHECK_EXTENSION:
     case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+    case VFIO_IOMMU_SPAPR_TCE_CREATE:
+    case VFIO_IOMMU_SPAPR_TCE_REMOVE:
         break;
     default:
         /* Return an error on unknown requests */
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (13 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 14/18] vfio: Enable DDW ioctls to VFIO IOMMU driver Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-05  4:19   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 16/18] spapr_rtas_ddw: Workaround broken LE guests Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
This enables multiple IOMMU groups in one VFIO container which means
that multiple devices from different groups can share the same IOMMU table
and locked pages counting can be done once as there is no need to have
several containers for two or more groups.
This removes a group id from vfio_container_ioctl(). The kernel support
is required for this.
This adds a check that there is just one VFIO container per PHB address
space.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_pci_vfio.c |  9 +++------
 hw/vfio/common.c        | 33 +++++++++++++++------------------
 include/hw/vfio/vfio.h  |  2 +-
 3 files changed, 19 insertions(+), 25 deletions(-)
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index b20ac90..257181d 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -33,11 +33,10 @@ static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
                                     uint32_t *dma32_window_size,
                                     uint64_t *dma64_window_size)
 {
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
     int ret;
 
-    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as,
                                VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
     if (ret) {
         return ret;
@@ -55,7 +54,6 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
                                      uint32_t page_shift, uint32_t window_shift,
                                      sPAPRTCETable **ptcet)
 {
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_iommu_spapr_tce_create create = {
         .argsz = sizeof(create),
         .page_shift = page_shift,
@@ -65,7 +63,7 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
     };
     int ret;
 
-    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as,
                                VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
     if (ret) {
         return ret;
@@ -87,7 +85,6 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
 
 static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
 {
-    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
     struct vfio_iommu_spapr_tce_remove remove = {
         .argsz = sizeof(remove),
         .start_addr = tcet->bus_offset
@@ -95,7 +92,7 @@ static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
     int ret;
 
     spapr_pci_ddw_remove(sphb, tcet);
-    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+    ret = vfio_container_ioctl(&sphb->iommu_as,
                                VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
 
     return ret;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1cafcf8..a26cbae 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1011,34 +1011,31 @@ void vfio_put_base_device(VFIODevice *vbasedev)
     close(vbasedev->fd);
 }
 
-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
+static int vfio_container_do_ioctl(AddressSpace *as,
                                    int req, void *param)
 {
-    VFIOGroup *group;
     VFIOContainer *container;
-    int ret = -1;
+    int ret;
+    VFIOAddressSpace *space;
 
-    group = vfio_get_group(groupid, as);
-    if (!group) {
-        error_report("vfio: group %d not registered", groupid);
-        return ret;
-    }
+    space = vfio_get_address_space(as);
+    container = QLIST_FIRST(&space->containers);
 
-    container = group->container;
-    if (group->container) {
-        ret = ioctl(container->fd, req, param);
-        if (ret < 0) {
-            error_report("vfio: failed to ioctl container: ret=%d, %s",
-                         ret, strerror(errno));
-        }
+    if (!container || QLIST_NEXT(container, next)) {
+        error_report("vfio: multiple containers per PHB are not supported");
+        return -1;
     }
 
-    vfio_put_group(group);
+    ret = ioctl(container->fd, req, param);
+    if (ret < 0) {
+        error_report("vfio: failed to ioctl container: ret=%d, %s",
+                     ret, strerror(errno));
+    }
 
     return ret;
 }
 
-int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
+int vfio_container_ioctl(AddressSpace *as,
                          int req, void *param)
 {
     /* We allow only certain ioctls to the container */
@@ -1054,5 +1051,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
         return -1;
     }
 
-    return vfio_container_do_ioctl(as, groupid, req, param);
+    return vfio_container_do_ioctl(as, req, param);
 }
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 0b26cd8..76b5744 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -3,7 +3,7 @@
 
 #include "qemu/typedefs.h"
 
-extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
+extern int vfio_container_ioctl(AddressSpace *as,
                                 int req, void *param);
 
 #endif
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 16/18] spapr_rtas_ddw: Workaround broken LE guests
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (14 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-05  4:23   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 17/18] target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64 Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
Recent kernels do parse results of what DDW RTAS calls return incorrectly
if compiled with LITTLE_ENDIAN=yes.
This adds special handling for such guests.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_rtas.c     | 29 +++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas_ddw.c | 40 ++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h  |  2 ++
 3 files changed, 71 insertions(+)
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 2ec2a8e..c3dee94 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -293,12 +293,15 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
+    spapr_rtas_fn fn_wa; /* workaround helper */
 } rtas_table[RTAS_TOKEN_MAX - RTAS_TOKEN_BASE];
 
 target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                              uint32_t token, uint32_t nargs, target_ulong args,
                              uint32_t nret, target_ulong rets)
 {
+    uint32_t tokensw = bswap32(token);
+
     if ((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX)) {
         struct rtas_call *call = rtas_table + (token - RTAS_TOKEN_BASE);
 
@@ -308,6 +311,16 @@ target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         }
     }
 
+    /* Workaround for LE guests */
+    if ((tokensw >= RTAS_TOKEN_BASE) && (tokensw < RTAS_TOKEN_MAX)) {
+        struct rtas_call *call = rtas_table + (tokensw - RTAS_TOKEN_BASE);
+
+        if (call->fn_wa) {
+            call->fn_wa(cpu, spapr, tokensw, nargs, args, nret, rets);
+            return H_SUCCESS;
+        }
+    }
+
     /* HACK: Some Linux early debug code uses RTAS display-character,
      * but assumes the token value is 0xa (which it is on some real
      * machines) without looking it up in the device tree.  This
@@ -340,6 +353,22 @@ void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
     rtas_table[token].fn = fn;
 }
 
+void spapr_rtas_register_wrong_endian(int token, spapr_rtas_fn fn)
+{
+    if (!((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX))) {
+        fprintf(stderr, "RTAS invalid token 0x%x\n", token);
+        exit(1);
+    }
+
+    token -= RTAS_TOKEN_BASE;
+    if (!rtas_table[token].fn) {
+        fprintf(stderr, "RTAS token %x must be initialized to allow workaround\n",
+                token);
+        exit(1);
+    }
+    rtas_table[token].fn_wa = fn;
+}
+
 int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
                                  hwaddr rtas_size)
 {
diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
index af70601..56eae9f 100644
--- a/hw/ppc/spapr_rtas_ddw.c
+++ b/hw/ppc/spapr_rtas_ddw.c
@@ -278,6 +278,41 @@ param_error_exit:
     rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
 }
 
+#define SPAPR_RTAS_DDW_SWAP(n) rtas_st(rets, (n), bswap32(rtas_ld(rets, (n))))
+
+static void rtas_ibm_query_pe_dma_window_wrong_endian(PowerPCCPU *cpu,
+                                                      sPAPREnvironment *spapr,
+                                                      uint32_t token,
+                                                      uint32_t nargs,
+                                                      target_ulong args,
+                                                      uint32_t nret,
+                                                      target_ulong rets)
+{
+    rtas_ibm_query_pe_dma_window(cpu, spapr, token, nargs, args, nret, rets);
+
+    SPAPR_RTAS_DDW_SWAP(0);
+    SPAPR_RTAS_DDW_SWAP(1);
+    SPAPR_RTAS_DDW_SWAP(2);
+    SPAPR_RTAS_DDW_SWAP(3);
+    SPAPR_RTAS_DDW_SWAP(4);
+}
+
+static void rtas_ibm_create_pe_dma_window_wrong_endian(PowerPCCPU *cpu,
+                                                       sPAPREnvironment *spapr,
+                                                       uint32_t token,
+                                                       uint32_t nargs,
+                                                       target_ulong args,
+                                                       uint32_t nret,
+                                                       target_ulong rets)
+{
+    rtas_ibm_create_pe_dma_window(cpu, spapr, token, nargs, args, nret, rets);
+
+    SPAPR_RTAS_DDW_SWAP(0);
+    SPAPR_RTAS_DDW_SWAP(1);
+    SPAPR_RTAS_DDW_SWAP(2);
+    SPAPR_RTAS_DDW_SWAP(3);
+}
+
 static void spapr_rtas_ddw_init(void)
 {
     spapr_rtas_register(RTAS_IBM_QUERY_PE_DMA_WINDOW,
@@ -292,6 +327,11 @@ static void spapr_rtas_ddw_init(void)
     spapr_rtas_register(RTAS_IBM_RESET_PE_DMA_WINDOW,
                         "ibm,reset-pe-dma-window",
                         rtas_ibm_reset_pe_dma_window);
+
+    spapr_rtas_register_wrong_endian(RTAS_IBM_QUERY_PE_DMA_WINDOW,
+                                     rtas_ibm_query_pe_dma_window_wrong_endian);
+    spapr_rtas_register_wrong_endian(RTAS_IBM_CREATE_PE_DMA_WINDOW,
+                                     rtas_ibm_create_pe_dma_window_wrong_endian);
 }
 
 type_init(spapr_rtas_ddw_init)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5f4e137..bf8e4a6 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -435,6 +435,8 @@ typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                               uint32_t nargs, target_ulong args,
                               uint32_t nret, target_ulong rets);
 void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn);
+void spapr_rtas_register_wrong_endian(int token, spapr_rtas_fn fn);
+
 target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                              uint32_t token, uint32_t nargs, target_ulong args,
                              uint32_t nret, target_ulong rets);
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 17/18] target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (15 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 16/18] spapr_rtas_ddw: Workaround broken LE guests Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-05  4:30   ` David Gibson
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
  2015-01-30  4:01 ` [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c |  7 ++++---
 target-ppc/kvm.c     | 48 +++++++++++++++++++++++++++++++++++++-----------
 target-ppc/kvm_ppc.h | 10 +++++++---
 3 files changed, 48 insertions(+), 17 deletions(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 1adf568..258f837 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -129,11 +129,12 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
 static int spapr_tce_table_realize(DeviceState *dev)
 {
     sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
-    uint64_t window_size = (uint64_t)tcet->nb_table << tcet->page_shift;
 
-    if (kvm_enabled() && !(window_size >> 32)) {
+    if (kvm_enabled()) {
         tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
-                                              window_size,
+                                              tcet->nb_table,
+                                              tcet->bus_offset,
+                                              tcet->page_shift,
                                               &tcet->fd,
                                               tcet->vfio_accel);
     }
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 1edf2b5..5e0e2e8 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -63,6 +63,7 @@ static int cap_booke_sregs;
 static int cap_ppc_smt;
 static int cap_ppc_rma;
 static int cap_spapr_tce;
+static int cap_spapr_tce_64;
 static int cap_spapr_multitce;
 static int cap_spapr_vfio;
 static int cap_hior;
@@ -104,6 +105,7 @@ int kvm_arch_init(KVMState *s)
     cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
     cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
     cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
+    cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
     cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
     cap_spapr_vfio = false;
     cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
@@ -1994,13 +1996,10 @@ bool kvmppc_spapr_use_multitce(void)
     return cap_spapr_multitce;
 }
 
-void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
-                              bool vfio_accel)
+void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_shift,
+                              uint64_t bus_offset, uint32_t page_shift,
+                              int *pfd, bool vfio_accel)
 {
-    struct kvm_create_spapr_tce args = {
-        .liobn = liobn,
-        .window_size = window_size,
-    };
     long len;
     int fd;
     void *table;
@@ -2013,14 +2012,41 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
         return NULL;
     }
 
-    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE, &args);
-    if (fd < 0) {
-        fprintf(stderr, "KVM: Failed to create TCE table for liobn 0x%x\n",
-                liobn);
+    if (cap_spapr_tce_64) {
+        struct kvm_create_spapr_tce_64 args = {
+            .liobn = liobn,
+            .page_shift = page_shift,
+            .offset = bus_offset >> page_shift,
+            .size = window_shift,
+            .flags = 0
+        };
+        fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE_64, &args);
+        if (fd < 0) {
+            fprintf(stderr,
+                    "KVM: Failed to create TCE64 table for liobn 0x%x\n",
+                    liobn);
+            return NULL;
+        }
+    } else if (cap_spapr_tce) {
+        uint64_t window_size = (uint64_t) window_shift << page_shift;
+        struct kvm_create_spapr_tce args = {
+            .liobn = liobn,
+            .window_size = window_size,
+        };
+        if ((window_size != args.window_size) || bus_offset) {
+            return NULL;
+        }
+        fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE, &args);
+        if (fd < 0) {
+            fprintf(stderr, "KVM: Failed to create TCE table for liobn 0x%x\n",
+                    liobn);
+            return NULL;
+        }
+    } else {
         return NULL;
     }
 
-    len = (window_size / SPAPR_TCE_PAGE_SIZE) * sizeof(uint64_t);
+    len = window_shift * sizeof(uint64_t);
     /* FIXME: round this up to page size */
 
     table = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 2e0224c..ec36777 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -35,8 +35,9 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
 #ifndef CONFIG_USER_ONLY
 off_t kvmppc_alloc_rma(void **rma);
 bool kvmppc_spapr_use_multitce(void);
-void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
-                              bool vfio_accel);
+void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_shift,
+                              uint64_t bus_offset, uint32_t page_shift,
+                              int *pfd, bool vfio_accel);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
 uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
@@ -157,7 +158,10 @@ static inline bool kvmppc_spapr_use_multitce(void)
 }
 
 static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
-                                            uint32_t window_size, int *fd,
+                                            uint32_t window_shift,
+                                            uint64_t bus_offset,
+                                            uint32_t page_shift,
+                                            int *fd,
                                             bool vfio_accel)
 {
     return NULL;
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (16 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 17/18] target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64 Alexey Kardashevskiy
@ 2015-01-29  9:27 ` Alexey Kardashevskiy
  2015-02-05  4:49   ` David Gibson
  2015-01-30  4:01 ` [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
  18 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-29  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, Alexander Graf,
	David Gibson
TCE hypercalls (H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE) use a logical bus
number (LIOBN) to identify which TCE table the request is addressed to.
However VFIO kernel driver operates with IOMMU group IDs and has no idea
about which LIOBN corresponds to which group. If the host kernel supports
in-kernel acceleration for TCE calls, we have to provide the LIOBN to IOMMU
mapping information.
This makes use of a VFIO KVM device's
KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN attribute to set the link
between LIOBN and IOMMU group.
The vfio_container_spapr_set_liobn() helper is implemented completely
in vfio.c because kvm_vfio_spapr_tce_liobn needs a group fd and
we do not want to share resources likes that outside vfio.c.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c    |  1 +
 hw/ppc/spapr_pci_vfio.c | 11 +++++++++++
 hw/vfio/common.c        | 33 +++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio.h  |  4 ++++
 4 files changed, 49 insertions(+)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 258f837..3de95d7 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -142,6 +142,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
     if (!tcet->table) {
         size_t table_size = tcet->nb_table * sizeof(uint64_t);
         tcet->table = g_malloc0(table_size);
+        tcet->vfio_accel = false;
     }
 
     trace_spapr_iommu_new_table(tcet->liobn, tcet, tcet->table, tcet->fd);
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 257181d..2078187 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -21,6 +21,7 @@
 #include "hw/pci-host/spapr.h"
 #include "linux/vfio.h"
 #include "hw/vfio/vfio.h"
+#include "qemu/error-report.h"
 
 static Property spapr_phb_vfio_properties[] = {
     DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
@@ -80,6 +81,16 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
     memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
                                 spapr_tce_get_iommu(*ptcet));
 
+    if (!(*ptcet)->vfio_accel) {
+        return 0;
+    }
+    ret = vfio_container_spapr_set_liobn(&sphb->iommu_as,
+                                         liobn, (*ptcet)->bus_offset);
+    if (ret) {
+        error_report("spapr-vfio: failed to create link to IOMMU");
+        ret = 0;
+    }
+
     return ret;
 }
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a26cbae..ec778d0 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1053,3 +1053,36 @@ int vfio_container_ioctl(AddressSpace *as,
 
     return vfio_container_do_ioctl(as, req, param);
 }
+
+int vfio_container_spapr_set_liobn(AddressSpace *as,
+                                   uint64_t liobn,
+                                   uint64_t start_addr)
+{
+#ifdef CONFIG_KVM
+    int ret;
+    struct kvm_vfio_spapr_tce_liobn param = {
+        .argsz = sizeof(param),
+        .liobn = liobn,
+        .start_addr = start_addr
+    };
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN,
+        .addr = (uint64_t)(unsigned long)¶m,
+    };
+
+    if (vfio_kvm_device_fd < 0) {
+        return 0;
+    }
+
+    ret = ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr);
+    if (ret) {
+        error_report("vfio: failed to setup liobn for a group: %s",
+                     strerror(errno));
+    }
+
+    return ret;
+#else
+    return 0;
+#endif
+}
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 76b5744..8457933 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -6,4 +6,8 @@
 extern int vfio_container_ioctl(AddressSpace *as,
                                 int req, void *param);
 
+extern int vfio_container_spapr_set_liobn(AddressSpace *as,
+                                          uint64_t liobn,
+                                          uint64_t start_addr);
+
 #endif
-- 
2.0.0
^ permalink raw reply related	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW)
  2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
                   ` (17 preceding siblings ...)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
@ 2015-01-30  4:01 ` Alexey Kardashevskiy
  18 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-30  4:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Williamson, qemu-ppc, Alexander Graf, David Gibson
Hi!
Forgot to mention  - this is RFC only as 1) it depends on the host kernel
changes 2) does not support migration for emulated PHB + DDW, I think I
have to give David's idea with TCE table stub object a go.
Thanks
On 01/29/2015 08:27 PM, Alexey Kardashevskiy wrote:
> At the moment sPAPR PHB supports only a single 32bit window
> which is normally 1..2GB which is not enough for high performance devices.
> 
> PAPR spec enables creating an additional window(s) to support 64bit
> DMA and bigger page sizes.
> 
> This patchset adds DDW support for pseries. The host kernel changes are
> required, posted earlier today as:
> [PATCH v3 00/24] powerpc/iommu/vfio: Enable Dynamic DMA windows
> 
> This was tested on POWER8 system which allows one additional DMA window
> which is mapped at 0x800.0000.0000.0000 and supports 16MB pages.
> Existing guests check for DDW capabilities in PHB's device tree and if it
> is present, they request for an additional window and map entire guest RAM
> using H_PUT_TCE/... hypercalls once at boot time and switch to direct DMA
> operations.
> 
> TCE tables still may be big enough for guests backed with 64K pages but they
> are reasonably small for guests backed by 16MB pages.
> 
> 
> This does not contain PCI 64bit BAR support and VIO-TCE-bypass rework, these
> are required for this to work but they have been posted separately today.
> 
> 
> Please comment. Thanks!
> 
> Changes:
> v4:
> * (!) reimplemented the whole thing
> * machine reset and ddw-reset RTAS call both remove all TCE tables and
> create the default one
> * IOMMU group id is not needed to use VFIO PHB anymore, multiple groups
> are supported on the same VFIO container and virtual PHB
> 
> v3:
> * removed "reset" from API now
> * reworked machine versions
> * applied multiple comments
> * includes David's machine QOM rework as this patchset adds a new machine type
> 
> v2:
> * tested on emulated PHB
> * removed "ddw" machine property, now it is PHB property
> * disabled by default
> * defined "pseries-2.2" machine which enables DDW by default
> * fixed reset() and reference counting
> 
> 
> 
> 
> Alexey Kardashevskiy (18):
>   spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
>   spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe
>   spapr_pci: Introduce a liobn number generating macros
>   spapr_vio: Introduce a liobn number generating macros
>   spapr_pci: Make find_phb()/find_dev() public
>   spapr_iommu: Make spapr_tce_find_by_liobn() public
>   spapr_iommu: Implement free_table() helper
>   vfio: Add DMA memory registering
>   spapr_rtas: Reserve DDW RTAS token numbers
>   spapr_pci: Define DDW callbacks
>   spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
>   spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers
>   spapr_pci: Advertise dynamic DMA windows to guest
>   vfio: Enable DDW ioctls to VFIO IOMMU driver
>   spapr_pci_vfio: Enable multiple groups per container
>   spapr_rtas_ddw: Workaround broken LE guests
>   target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64
>   vfio: Enable in-kernel acceleration via VFIO KVM device
> 
>  hw/ppc/Makefile.objs          |   3 +
>  hw/ppc/spapr.c                |   5 +
>  hw/ppc/spapr_iommu.c          |  23 ++-
>  hw/ppc/spapr_pci.c            | 209 ++++++++++++++++++++------
>  hw/ppc/spapr_pci_vfio.c       | 108 +++++++++-----
>  hw/ppc/spapr_rtas.c           |  29 ++++
>  hw/ppc/spapr_rtas_ddw.c       | 337 ++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_vio.c            |   2 +-
>  hw/vfio/common.c              | 167 ++++++++++++++++++---
>  include/hw/pci-host/spapr.h   |  38 ++++-
>  include/hw/ppc/spapr.h        |  15 +-
>  include/hw/vfio/vfio-common.h |   3 +-
>  include/hw/vfio/vfio.h        |   6 +-
>  target-ppc/kvm.c              |  48 ++++--
>  target-ppc/kvm_ppc.h          |  10 +-
>  trace-events                  |   4 +
>  16 files changed, 881 insertions(+), 126 deletions(-)
>  create mode 100644 hw/ppc/spapr_rtas_ddw.c
> 
-- 
Alexey
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 02/18] spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 02/18] spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe Alexey Kardashevskiy
@ 2015-02-02  6:30   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-02  6:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 529 bytes --]
On Thu, Jan 29, 2015 at 08:27:14PM +1100, Alexey Kardashevskiy wrote:
> PAPR is defined as big endian so TCEs need an adjustment so
> does this patch.
> 
> This changes code to have ldq_be_phys() in one place.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
@ 2015-02-02  6:37   ` David Gibson
  2015-02-03  1:32     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 40+ messages in thread
From: David Gibson @ 2015-02-02  6:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]
On Thu, Jan 29, 2015 at 08:27:19PM +1100, Alexey Kardashevskiy wrote:
> Every sPAPRTCETable object holds an IOMMU memory region which holds
> a referenced to the sPAPRTCETable instance. So if we want to free
> an sPAPRTCETable instance, calling object_unref() will not be enough
> as embedded memory region will hold the reference and we need to break
> the loop.
> 
> This adds a spapr_tce_free_table() helper which destroys the embedded
> memory region and then calls object_unref() on the sPAPRTCETable instance
> which will succeed now. The helper adds a new child under unique name
> derived from LIOBN.
> 
> As we are here, fix spapr_tce_new_table() callers.
> At the moment spapr_tce_new_table() references sPAPRTCETable twice -
> once in object_new() and second time in object_property_add_child().
> The callers of spapr_tce_new_table() should take care of correct
> referencing.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
With the caveat that I don't know the QOM refcounting model well
enough to really analyze if this is correct.  It's not obviously
wrong, and doesn't look like it will break anything in the actual PAPR
related code.
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering Alexey Kardashevskiy
@ 2015-02-02  7:04   ` David Gibson
  2015-02-17  2:14     ` Alexey Kardashevskiy
  2015-02-17  2:20     ` [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering [repost] Alexey Kardashevskiy
  0 siblings, 2 replies; 40+ messages in thread
From: David Gibson @ 2015-02-02  7:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 7143 bytes --]
On Thu, Jan 29, 2015 at 08:27:20PM +1100, Alexey Kardashevskiy wrote:
> This makes use of the new "memory registering" feature. The idea is
> to provide the guest ability to notify the host kernel about pages which
> are going to be used for DMA. Having this information, the host kernel
> can pin them all, do locked pages accounting and not spent time on
> doing that in real time with possible failures which cannot be handled
> nicely in some cases.
> 
> This adds a memory listener which notifies a VFIO container about memory
> which needs to be pinned/unpinned. VFIO MMIO regions are skipped.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/vfio/common.c              | 99 ++++++++++++++++++++++++++++++++++++++++++-
>  include/hw/vfio/vfio-common.h |  3 +-
>  2 files changed, 100 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index cf483ff..c08f9ab 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -485,6 +485,99 @@ void vfio_listener_release(VFIOContainer *container)
>      memory_listener_unregister(&container->iommu_data.type1.listener);
>  }
>  
> +static int vfio_mem_register(VFIOContainer *container,
> +                             void *vaddr, ram_addr_t size)
> +{
> +    struct vfio_iommu_type1_register_memory reg = {
> +        .argsz = sizeof(reg),
> +        .vaddr = (__u64)(uintptr_t)vaddr,
> +        .size = size,
> +    };
> +    long ret = ioctl(container->fd, VFIO_IOMMU_REGISTER_MEMORY, ®);
> +
> +    DPRINTF("(%u) %s %u: va=%lx size=%lx, ret=%ld\n", getpid(),
> +            __func__, __LINE__, reg.vaddr, reg.size, ret);
> +
> +    return ret;
> +}
> +
> +static int vfio_mem_unregister(VFIOContainer *container,
> +                               void *vaddr, ram_addr_t size)
> +{
> +    struct vfio_iommu_type1_unregister_memory reg = {
> +        .argsz = sizeof(reg),
> +        .vaddr = (__u64)(uintptr_t)vaddr,
> +        .size = size,
> +    };
> +    long ret = ioctl(container->fd, VFIO_IOMMU_UNREGISTER_MEMORY, ®);
> +
> +    DPRINTF("(%u) %s %u: va=%lx size=%lx, ret=%ld\n", getpid(),
> +            __func__, __LINE__, reg.vaddr, reg.size, ret);
> +
> +    return ret;
> +}
> +
> +static void vfio_ram_region_add(MemoryListener *listener,
> +                                MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> +                                            iommu_data.type1.ramlistener);
> +    hwaddr end;
> +    Int128 llend;
> +    void *vaddr;
> +
> +    if (!memory_region_is_ram(section->mr) ||
> +        memory_region_is_skip_dump(section->mr)) {
> +        return;
> +    }
> +
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
Does this need an add TARGET_PAGE_SIZE-1 in order to make it round up
to a page boundary, rather than truncate to a page boundary?
> +    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> +
> +    memory_region_ref(section->mr);
> +
> +    end = int128_get64(llend);
> +    vaddr = memory_region_get_ram_ptr(section->mr) +
> +        section->offset_within_region;
> +    vfio_mem_register(container, vaddr, end);
> +}
> +
> +static void vfio_ram_region_del(MemoryListener *listener,
> +                                MemoryRegionSection *section)
> +{
> +    VFIOContainer *container = container_of(listener, VFIOContainer,
> +                                            iommu_data.type1.ramlistener);
> +    hwaddr iova, end;
> +    void *vaddr;
> +
> +    if (!memory_region_is_ram(section->mr) ||
> +        memory_region_is_skip_dump(section->mr)) {
> +        return;
> +    }
> +
> +
> +    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> +    end = (section->offset_within_address_space + int128_get64(section->size)) &
> +        TARGET_PAGE_MASK;
> +    vaddr = memory_region_get_ram_ptr(section->mr) +
> +        section->offset_within_region +
> +        (iova - section->offset_within_address_space);
It's not clear to me why region_add and region_del have a different
set of address calculations.
> +    vfio_mem_unregister(container, vaddr, end - iova);
> +}
> +
> +const MemoryListener vfio_ram_listener = {
> +    .region_add = vfio_ram_region_add,
> +    .region_del = vfio_ram_region_del,
> +};
> +
> +static void vfio_spapr_listener_release(VFIOContainer *container)
> +{
> +    memory_listener_unregister(&container->iommu_data.type1.listener);
> +    memory_listener_unregister(&container->iommu_data.type1.ramlistener);
Accessing fields within type1 from a function whose name says it's
spapr specific seems very wrong.
> +}
> +
>  int vfio_mmap_region(Object *obj, VFIORegion *region,
>                       MemoryRegion *mem, MemoryRegion *submem,
>                       void **map, size_t size, off_t offset,
> @@ -705,6 +798,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              goto free_container_exit;
>          }
>  
> +        container->iommu_data.type1.ramlistener = vfio_ram_listener;
> +        memory_listener_register(&container->iommu_data.type1.ramlistener,
> +                                 &address_space_memory);
Why two separate listeners, rather than doing both jobs from a single listener?
>          /*
>           * The host kernel code implementing VFIO_IOMMU_DISABLE is called
>           * when container fd is closed so we do not call it explicitly
> @@ -718,7 +815,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>          }
>  
>          container->iommu_data.type1.listener = vfio_memory_listener;
> -        container->iommu_data.release = vfio_listener_release;
> +        container->iommu_data.release = vfio_spapr_listener_release;
>  
>          memory_listener_register(&container->iommu_data.type1.listener,
>                                   container->space->as);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 1d5bfe8..3f91216 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -66,6 +66,7 @@ struct VFIOGroup;
>  
>  typedef struct VFIOType1 {
>      MemoryListener listener;
> +    MemoryListener ramlistener;
>      int error;
>      bool initialized;
>  } VFIOType1;
> @@ -144,7 +145,7 @@ int vfio_get_device(VFIOGroup *group, const char *name,
>                      VFIODevice *vbasedev);
>  
>  extern const MemoryRegionOps vfio_region_ops;
> -extern const MemoryListener vfio_memory_listener;
> +extern const MemoryListener vfio_memory_listener, vfio_ram_listener;
>  extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;
>  extern QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces;
>  
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 09/18] spapr_rtas: Reserve DDW RTAS token numbers
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 09/18] spapr_rtas: Reserve DDW RTAS token numbers Alexey Kardashevskiy
@ 2015-02-02  7:09   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-02  7:09 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 1624 bytes --]
On Thu, Jan 29, 2015 at 08:27:21PM +1100, Alexey Kardashevskiy wrote:
> Will be squashed later.
? Was this supposed to reach the published series?
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/hw/ppc/spapr.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b442e41..5f4e137 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -382,8 +382,12 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
>  #define RTAS_GET_SENSOR_STATE                   (RTAS_TOKEN_BASE + 0x1D)
>  #define RTAS_IBM_CONFIGURE_CONNECTOR            (RTAS_TOKEN_BASE + 0x1E)
>  #define RTAS_IBM_OS_TERM                        (RTAS_TOKEN_BASE + 0x1F)
> +#define RTAS_IBM_QUERY_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x20)
> +#define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x21)
> +#define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x22)
> +#define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x23)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x20)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x24)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper
  2015-02-02  6:37   ` David Gibson
@ 2015-02-03  1:32     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-02-03  1:32 UTC (permalink / raw)
  To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/02/2015 05:37 PM, David Gibson wrote:
> On Thu, Jan 29, 2015 at 08:27:19PM +1100, Alexey Kardashevskiy wrote:
>> Every sPAPRTCETable object holds an IOMMU memory region which holds 
>> a referenced to the sPAPRTCETable instance. So if we want to free an
>> sPAPRTCETable instance, calling object_unref() will not be enough as
>> embedded memory region will hold the reference and we need to break 
>> the loop.
>> 
>> This adds a spapr_tce_free_table() helper which destroys the
>> embedded memory region and then calls object_unref() on the
>> sPAPRTCETable instance which will succeed now. The helper adds a new
>> child under unique name derived from LIOBN.
>> 
>> As we are here, fix spapr_tce_new_table() callers. At the moment
>> spapr_tce_new_table() references sPAPRTCETable twice - once in
>> object_new() and second time in object_property_add_child(). The
>> callers of spapr_tce_new_table() should take care of correct 
>> referencing.
>> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> With the caveat that I don't know the QOM refcounting model well 
> enough to really analyze if this is correct.  It's not obviously 
> wrong, and doesn't look like it will break anything in the actual
> PAPR related code.
I am redoing this part to always keep sPAPRTCETable and not free them on
every reboot so you can ignore this type of problems now. Thanks!
- -- 
Alexey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAEBAgAGBQJU0CUvAAoJEIYTPdgrwSC5mpcP+wa3vulve++JX06UWbTsaarV
nXQhWsKHAFQZmQYzRzyUwl00JEIp14NZ9dzBXEaAIutQmiFDeNA7a3QCHpnkt2Ei
JJzzzK5ygjcX9H2RP+hzaoJkHxdFKMQ9TREKe86IsHYw34rREzfS/Li6PkTm+LSo
3jnnd1k0GGB9AX/hwvDHHcdQRpio2sI0lG7WH3R8FDbHAHo/YW3JvrPL3iTDpNDA
HYR9Hy4JynzQIGJhzYFkmODEjUbLtn/c/xIYVZJDNdDdLETF1sK+iTpbinxtPe5n
JsKGncsFmZGKotJ3ZAjPxogdYhqtvq5/VsUpc2jHlC4QKgEsVLqUm5cfu+hl5a00
w/4W9tB+lqbFFvTxNAOokYPnWaD+NJZ3DY03vsCKRk16/MrMjdfMpsj/m3ai6pAx
Pc/A2JfZ30SjqbFBLQJZKBDKQ9JARp7F/S60s7iW2D4ELRChdprcb/cE/gVGoWFV
DYKpPRtdzjNNtjMLNd6OuWVu3LItcfz42VQwxrwjyZOIIgpdpOM1hBBXqMjhOykW
gFtlf/zEDAxQ9cl0RrjH9gC1UwnP7w+slvspa5MNghnFKByHBOwUW9bNpDc2xcp2
9I6j1yjxI81Si96uGFj8Ba2LzBGaTzVIbiSsnV2cvZKirNAN/Gnm9lcgwOOMKA09
QAx4oiji73rE6GzBZxFx
=o1mo
-----END PGP SIGNATURE-----
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 03/18] spapr_pci: Introduce a liobn number generating macros
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 03/18] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
@ 2015-02-04 15:31   ` Alexander Graf
  0 siblings, 0 replies; 40+ messages in thread
From: Alexander Graf @ 2015-02-04 15:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: Alex Williamson, qemu-ppc, David Gibson
On 29.01.15 10:27, Alexey Kardashevskiy wrote:
> We are going to have multiple DMA windows per PHB and we want them to
> migrate so we need a predictable way of assigning LIOBNs.
> 
> This introduces a macro which makes up a LIOBN from fixed prefix,
> PHB index (unique PHB id) and window number.
> 
> This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number
> from LIOBN. It is used to distinguish the default 32bit windows from
> dynamic windows and avoid picking default DMA window properties from
> a wrong TCE table.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr_pci.c     | 4 ++--
>  include/hw/ppc/spapr.h | 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 4500849..64c7702 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -502,7 +502,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>          }
>  
>          sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
> -        sphb->dma_liobn = SPAPR_PCI_BASE_LIOBN + sphb->index;
> +        sphb->dma_liobn = SPAPR_PCI_LIOBN(sphb->index, 0);
>  
>          windows_base = SPAPR_PCI_WINDOW_BASE
>              + sphb->index * SPAPR_PCI_WINDOW_SPACING;
> @@ -843,7 +843,7 @@ static int spapr_phb_children_dt(Object *child, void *opaque)
>      sPAPRTCETable *tcet;
>  
>      tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
> -    if (!tcet) {
> +    if (!tcet || SPAPR_PCI_DMA_WINDOW_NUM(tcet->liobn)) {
>          return 0;
>      }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 642cdc3..a2c4bac 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -442,7 +442,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
>  #define SPAPR_TCE_PAGE_MASK    (SPAPR_TCE_PAGE_SIZE - 1)
>  
>  #define SPAPR_VIO_BASE_LIOBN    0x00000000
> -#define SPAPR_PCI_BASE_LIOBN    0x80000000
> +#define SPAPR_PCI_LIOBN(i, n)   (0x80000000 | ((i) << 8) | (n))
It would be more readable if you called "i" "phb_id" and "n"
"window_num" instead ;).
Alex
> +#define SPAPR_PCI_DMA_WINDOW_NUM(liobn) ((liobn) & 0xff)
>  
>  #define RTAS_ERROR_LOG_MAX      2048
>  
> 
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 11/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 11/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
@ 2015-02-05  3:51   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-05  3:51 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 16286 bytes --]
On Thu, Jan 29, 2015 at 08:27:23PM +1100, Alexey Kardashevskiy wrote:
> This implements DDW for emulated and VFIO PHB.
> 
> This removes all DMA windows on reset and creates the default window,
> same is done on the "ibm,reset-pe-dma-window" call.
> This converts sPAPRPHBClass::finish_realize to sPAPRPHBClass::ddw_reset
> and others.
> 
> The "ddw" property is enabled by default on a PHB but for compatibility
> pseries-2.1 machine disables it.
Now that we're past the 2.2 release, this should change to only be
enabled for 2.3+, yes?
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v4:
> * reset handler is back in generalized form
> 
> v3:
> * removed reset
> * windows_num is now 1 or bigger rather than 0-based value and it is only
> changed in PHB code, not in RTAS
> * added page mask check in create()
> * added SPAPR_PCI_DDW_MAX_WINDOWS to track how many windows are already
> created
> 
> v2:
> * tested on hacked emulated E1000
> * implemented DDW reset on the PHB reset
> * spapr_pci_ddw_remove/spapr_pci_ddw_reset are public for reuse by VFIO
> 
> spapr_pci_vfio: Enable DDW
> 
> This implements DDW for VFIO. Host kernel support is required for this.
> 
> After this patch DDW will be enabled on all machines but pseries-2.1.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * remove()/reset() callbacks use spapr_pci's ones
> ---
>  hw/ppc/spapr_pci.c          | 160 +++++++++++++++++++++++++++++++++++---------
>  hw/ppc/spapr_pci_vfio.c     |  98 +++++++++++++++++----------
>  include/hw/pci-host/spapr.h |  15 ++++-
>  3 files changed, 203 insertions(+), 70 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 6bd00e8..3ec03be 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -469,6 +469,126 @@ static const MemoryRegionOps spapr_msi_ops = {
>      .endianness = DEVICE_LITTLE_ENDIAN
>  };
>  
> +static int spapr_phb_get_win_num_cb(Object *child, void *opaque)
> +{
> +    if (object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE)) {
> +        ++*(unsigned *)opaque;
> +    }
> +    return 0;
> +}
> +
> +unsigned spapr_phb_get_win_num(sPAPRPHBState *sphb)
> +{
> +    unsigned ret = 0;
> +
> +    object_child_foreach(OBJECT(sphb), spapr_phb_get_win_num_cb, &ret);
> +
> +    return ret;
> +}
> +
> +/*
> + * Dynamic DMA windows
> + */
> +static int spapr_pci_ddw_query(sPAPRPHBState *sphb,
> +                               uint32_t *windows_supported,
> +                               uint32_t *page_size_mask,
> +                               uint32_t *dma32_window_size,
> +                               uint64_t *dma64_window_size)
> +{
> +    *windows_supported = SPAPR_PCI_DDW_MAX_WINDOWS;
> +    *page_size_mask = DDW_PGSIZE_64K | DDW_PGSIZE_16M;
> +    *dma32_window_size = SPAPR_PCI_TCE32_WIN_SIZE;
> +    *dma64_window_size = ram_size;
> +
> +    return 0;
> +}
> +
> +static int spapr_pci_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
> +                                uint32_t page_shift, uint32_t window_shift,
> +                                sPAPRTCETable **ptcet)
> +{
> +    uint64_t bus_offset = spapr_phb_get_win_num(sphb) ?
> +                          SPAPR_PCI_TCE64_START : 0;
Should you also have an assert that spapr_phb_get_win_num(sphb) <=1 at
this point?
> +
> +    if (((page_shift != 16) && (page_shift != 24) && (page_shift != 12))) {
> +        return -1;
You only have two return values: failure and success.  So is there a
reason you're using an int, rather than returning the sPAPRTCETable *
or NULL?
> +    }
> +
> +    *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
> +                                 bus_offset,
> +                                 page_shift,
> +                                 1ULL << (window_shift - page_shift),
> +                                 false);
> +    if (!*ptcet) {
> +        return -1;
> +    }
> +    memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
> +                                spapr_tce_get_iommu(*ptcet));
> +
> +    return 0;
> +}
> +
> +int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
> +{
> +    memory_region_del_subregion(&sphb->iommu_root,
> +                                spapr_tce_get_iommu(tcet));
> +    spapr_tce_free_table(tcet);
> +
> +    return 0;
> +}
> +
> +static int spapr_pci_remove_ddw_cb(Object *child, void *opaque)
> +{
> +    sPAPRTCETable *tcet;
> +
> +    tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
> +
> +    if (tcet) {
> +        sPAPRPHBState *sphb = opaque;
> +        sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> +
> +        spc->ddw_remove(sphb, tcet);
> +    }
> +
> +    return 0;
> +}
> +
> +int spapr_pci_ddw_reset(sPAPRPHBState *sphb)
> +{
> +    int ret;
> +    sPAPRPHBClass *spc;
> +    sPAPRTCETable *tcet;
> +    uint32_t windows_supported = 0, page_size_mask = 0, dma32_window_size = 0;
> +    uint64_t dma64_window_size = 0;
> +
> +    /* Remove all windows */
> +    object_child_foreach(OBJECT(sphb), spapr_pci_remove_ddw_cb, sphb);
> +
> +    /* Create default 32bit window */
This comment seems to below a few lines down from here.
> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> +    if (!spc->ddw_create || !spc->ddw_query) {
> +        return -1;
> +    }
> +
> +    ret = spc->ddw_query(sphb, &windows_supported, &page_size_mask,
> +                         &dma32_window_size, &dma64_window_size);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    sphb->ddw_enabled = (windows_supported > 1);
ddw_enabled doesn't actually seem to be tested anywhere.  And
shouldn't it depend on the externall set property for pre-2.3
compat, not just on the # windows supported by the underlying
implementation?
> +    ret = spc->ddw_create(sphb, SPAPR_PCI_LIOBN(sphb->index, 0),
> +                          SPAPR_TCE_PAGE_SHIFT, ctzl(dma32_window_size), &tcet);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    object_unref(OBJECT(tcet));
This could perhaps do with a comment saying why you've ended up with
an extraneous reference.
> +
> +    return 0;
> +}
> +
>  /*
>   * PHB PCI device
>   */
> @@ -484,7 +604,6 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>      SysBusDevice *s = SYS_BUS_DEVICE(dev);
>      sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>      PCIHostState *phb = PCI_HOST_BRIDGE(s);
> -    sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
>      char *namebuf;
>      int i;
>      PCIBus *bus;
> @@ -622,37 +741,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>          sphb->lsi_table[i].irq = irq;
>      }
>  
> -    if (!info->finish_realize) {
> -        error_setg(errp, "finish_realize not defined");
> -        return;
> -    }
> -
> -    info->finish_realize(sphb, errp);
> -
>      sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
>  }
>  
> -static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
> -{
> -    sPAPRTCETable *tcet;
> -
> -    tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> -                               0,
> -                               SPAPR_TCE_PAGE_SHIFT,
> -                               0x40000000 >> SPAPR_TCE_PAGE_SHIFT, false);
> -    if (!tcet) {
> -        error_setg(errp, "Unable to create TCE table for %s",
> -                   sphb->dtbusname);
> -        return ;
> -    }
> -
> -    /* Register default 32bit DMA window */
> -    memory_region_add_subregion(&sphb->iommu_root, 0,
> -                                spapr_tce_get_iommu(tcet));
> -
> -    object_unref(OBJECT(tcet));
> -}
> -
>  static int spapr_phb_children_reset(Object *child, void *opaque)
>  {
>      DeviceState *dev = (DeviceState *) object_dynamic_cast(child, TYPE_DEVICE);
> @@ -666,7 +757,11 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
>  
>  static void spapr_phb_reset(DeviceState *qdev)
>  {
> -    /* Reset the IOMMU state */
> +    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(qdev);
> +
> +    if (spc->ddw_reset) {
> +        spc->ddw_reset(SPAPR_PCI_HOST_BRIDGE(qdev));
> +    }
>      object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
>  }
>  
> @@ -801,7 +896,10 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      dc->vmsd = &vmstate_spapr_pci;
>      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>      dc->cannot_instantiate_with_device_add_yet = false;
> -    spc->finish_realize = spapr_phb_finish_realize;
> +    spc->ddw_query = spapr_pci_ddw_query;
> +    spc->ddw_create = spapr_pci_ddw_create;
> +    spc->ddw_remove = spapr_pci_ddw_remove;
> +    spc->ddw_reset = spapr_pci_ddw_reset;
>  }
>  
>  static const TypeInfo spapr_phb_info = {
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index aabf0ae..b20ac90 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -27,65 +27,89 @@ static Property spapr_phb_vfio_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> -static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
> +static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
> +                                    uint32_t *windows_supported,
> +                                    uint32_t *page_size_mask,
> +                                    uint32_t *dma32_window_size,
> +                                    uint64_t *dma64_window_size)
>  {
>      sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>      int ret;
> -    sPAPRTCETable *tcet;
> -    uint32_t liobn = svphb->phb.dma_liobn;
>  
> -    if (svphb->iommugroupid == -1) {
> -        error_setg(errp, "Wrong IOMMU group ID %d", svphb->iommugroupid);
> -        return;
> -    }
> -
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> -                               VFIO_CHECK_EXTENSION,
> -                               (void *) VFIO_SPAPR_TCE_IOMMU);
> -    if (ret != 1) {
> -        error_setg_errno(errp, -ret,
> -                         "spapr-vfio: SPAPR extension is not supported");
> -        return;
> -    }
> -
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
>                                 VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
>      if (ret) {
> -        error_setg_errno(errp, -ret,
> -                         "spapr-vfio: get info from container failed");
> -        return;
> +        return ret;
>      }
>  
> -    tcet = spapr_tce_new_table(DEVICE(sphb), liobn, info.dma32_window_start,
> -                               SPAPR_TCE_PAGE_SHIFT,
> -                               info.dma32_window_size >> SPAPR_TCE_PAGE_SHIFT,
> -                               true);
> -    if (!tcet) {
> -        error_setg(errp, "spapr-vfio: failed to create VFIO TCE table");
> -        return;
> +    *windows_supported = info.windows_supported;
> +    *page_size_mask = info.flags & DDW_PGSIZE_MASK;
> +    *dma32_window_size = info.dma32_window_size;
> +    *dma64_window_size = ram_size;
> +
> +    return ret;
> +}
> +
> +static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
> +                                     uint32_t page_shift, uint32_t window_shift,
> +                                     sPAPRTCETable **ptcet)
> +{
> +    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
> +    struct vfio_iommu_spapr_tce_create create = {
> +        .argsz = sizeof(create),
> +        .page_shift = page_shift,
> +        .window_shift = window_shift,
> +        .levels = 1,
> +        .start_addr = 0,
> +    };
> +    int ret;
> +
> +    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> +                               VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
> +    if (ret) {
> +        return ret;
>      }
>  
> -    /* Register default 32bit DMA window */
> -    memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
> -                                spapr_tce_get_iommu(tcet));
> +    *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
> +                                 create.start_addr,
> +                                 page_shift,
> +                                 1ULL << (window_shift - page_shift),
> +                                 true);
> +    if (!*ptcet) {
> +        return -1;
> +    }
> +    memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
> +                                spapr_tce_get_iommu(*ptcet));
>  
> -    object_unref(OBJECT(tcet));
> +    return ret;
>  }
>  
> -static void spapr_phb_vfio_reset(DeviceState *qdev)
> +static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
>  {
> -    /* Do nothing */
> +    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
> +    struct vfio_iommu_spapr_tce_remove remove = {
> +        .argsz = sizeof(remove),
> +        .start_addr = tcet->bus_offset
> +    };
> +    int ret;
> +
> +    spapr_pci_ddw_remove(sphb, tcet);
> +    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> +                               VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
> +
> +    return ret;
>  }
>  
>  static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
>  {
> -    DeviceClass *dc = DEVICE_CLASS(klass);
>      sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
> +    DeviceClass *dc = DEVICE_CLASS(klass);
>  
>      dc->props = spapr_phb_vfio_properties;
> -    dc->reset = spapr_phb_vfio_reset;
> -    spc->finish_realize = spapr_phb_vfio_finish_realize;
> +    spc->ddw_query = spapr_pci_vfio_ddw_query;
> +    spc->ddw_create = spapr_pci_vfio_ddw_create;
> +    spc->ddw_remove = spapr_pci_vfio_ddw_remove;
>  }
>  
>  static const TypeInfo spapr_phb_vfio_info = {
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index eec95f3..577f908 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -48,8 +48,6 @@ typedef struct sPAPRPHBVFIOState sPAPRPHBVFIOState;
>  struct sPAPRPHBClass {
>      PCIHostBridgeClass parent_class;
>  
> -    void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
> -
>  /* sPAPR spec defined pagesize mask values */
>  #define DDW_PGSIZE_4K       0x01
>  #define DDW_PGSIZE_64K      0x02
> @@ -106,6 +104,8 @@ struct sPAPRPHBState {
>      int32_t msi_devs_num;
>      spapr_pci_msi_mig *msi_devs;
>  
> +    bool ddw_enabled;
> +
>      QLIST_ENTRY(sPAPRPHBState) list;
>  };
>  
> @@ -129,6 +129,14 @@ struct sPAPRPHBVFIOState {
>  
>  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
>  
> +#define SPAPR_PCI_TCE32_WIN_SIZE     0x80000000ULL
> +
> +/* Default 64bit dynamic window offset */
> +#define SPAPR_PCI_TCE64_START        0x8000000000000000ULL
> +
> +/* Maximum allowed number of DMA windows for emulated PHB */
> +#define SPAPR_PCI_DDW_MAX_WINDOWS    2
> +
>  static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
>  {
>      return xics_get_qirq(spapr->icp, phb->lsi_table[pin].irq);
> @@ -147,5 +155,8 @@ void spapr_pci_rtas_init(void);
>  sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid);
>  PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
>                                uint32_t config_addr);
> +int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet);
> +int spapr_pci_ddw_reset(sPAPRPHBState *sphb);
> +unsigned spapr_phb_get_win_num(sPAPRPHBState *sphb);
>  
>  #endif /* __HW_SPAPR_PCI_H__ */
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 12/18] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 12/18] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers Alexey Kardashevskiy
@ 2015-02-05  4:05   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-05  4:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 14732 bytes --]
On Thu, Jan 29, 2015 at 08:27:24PM +1100, Alexey Kardashevskiy wrote:
> This adds support for Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification which allows to have additional DMA window(s)
> which can support page sizes other than 4K.
> 
> The existing implementation of DDW in the guest tries to create one huge
> DMA window with 64K or 16MB pages and map the entire guest RAM to. If it
> succeeds, the guest switches to dma_direct_ops and never calls
> TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use
> the entire RAM and not waste time on map/unmap later.
> 
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
> 
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PHB.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v3:
> * added ibm,reset-pe-dma-window
> 
> v2:
> * double loop squashed to spapr_iommu_fixmask() helper
> * added @ddw_num counter to PHB, it is used to generate LIOBN for new
> window; it is reset on ddw-reset event
> * added ULL to constants used in shift operations
> * rtas_ibm_reset_pe_dma_window() and rtas_ibm_remove_pe_dma_window()
> do not remove windows anymore, the PHB callback has to as it will reuse
> the same code in case of guest reboot as well
> ---
>  hw/ppc/Makefile.objs    |   3 +
>  hw/ppc/spapr_rtas_ddw.c | 297 ++++++++++++++++++++++++++++++++++++++++++++++++
>  trace-events            |   4 +
>  3 files changed, 304 insertions(+)
>  create mode 100644 hw/ppc/spapr_rtas_ddw.c
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 19d9920..d7fe4fb 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> +ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
> +obj-y += spapr_rtas_ddw.o
> +endif
>  # PowerPC 4xx boards
>  obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>  obj-y += ppc4xx_pci.o
> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
> new file mode 100644
> index 0000000..af70601
> --- /dev/null
> +++ b/hw/ppc/spapr_rtas_ddw.c
> @@ -0,0 +1,297 @@
> +/*
> + * QEMU sPAPR Dynamic DMA windows support
> + *
> + * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License,
> + *  or (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "hw/ppc/spapr.h"
> +#include "hw/pci-host/spapr.h"
> +#include "trace.h"
> +
> +static uint32_t spapr_iommu_fixmask(struct ppc_one_seg_page_size *sps,
> +                                    uint32_t query_mask)
This function could do with a comment describing what it's for.  Every
version of this series I've looked at I've forgotten and had to figure
it out again.
> +{
> +    int i, j;
> +    uint32_t mask = 0;
> +    const struct { int shift; uint32_t mask; } masks[] = {
> +        { 12, DDW_PGSIZE_4K },
> +        { 16, DDW_PGSIZE_64K },
> +        { 24, DDW_PGSIZE_16M },
> +        { 25, DDW_PGSIZE_32M },
> +        { 26, DDW_PGSIZE_64M },
> +        { 27, DDW_PGSIZE_128M },
> +        { 28, DDW_PGSIZE_256M },
> +        { 34, DDW_PGSIZE_16G },
> +    };
> +
> +    for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
> +        for (j = 0; j < ARRAY_SIZE(masks); ++j) {
> +            if ((sps[i].page_shift == masks[j].shift) &&
> +                    (query_mask & masks[j].mask)) {
> +                mask |= masks[j].mask;
> +            }
> +        }
> +    }
> +
> +    return mask;
> +}
> +
> +static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
> +                                         sPAPREnvironment *spapr,
> +                                         uint32_t token, uint32_t nargs,
> +                                         target_ulong args,
> +                                         uint32_t nret, target_ulong rets)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    sPAPRPHBState *sphb;
> +    sPAPRPHBClass *spc;
> +    uint64_t buid;
> +    uint32_t avail, addr, pgmask = 0;
> +    uint32_t windows_supported = 0, page_size_mask = 0, dma32_window_size = 0;
> +    uint64_t dma64_window_size = 0;
> +    unsigned current;
> +    long ret;
> +
> +    if ((nargs != 3) || (nret != 5)) {
> +        goto param_error_exit;
> +    }
> +
> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> +    addr = rtas_ld(args, 0);
> +    sphb = spapr_pci_find_phb(spapr, buid);
> +    if (!sphb || !sphb->ddw_enabled) {
> +        goto param_error_exit;
> +    }
> +
> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> +    if (!spc->ddw_query) {
> +        goto hw_error_exit;
> +    }
> +
> +    ret = spc->ddw_query(sphb, &windows_supported, &page_size_mask,
> +                         &dma32_window_size, &dma64_window_size);
> +    trace_spapr_iommu_ddw_query(buid, addr, windows_supported,
> +                                page_size_mask, pgmask, ret);
> +    if (ret) {
> +        goto hw_error_exit;
> +    }
> +
> +    current = spapr_phb_get_win_num(sphb);
> +    avail = (windows_supported > current) ? (windows_supported - current) : 0;
> +
> +    /* Work out supported page masks */
> +    pgmask = spapr_iommu_fixmask(env->sps.sps, page_size_mask);
> +
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    rtas_st(rets, 1, avail);
> +
> +    /*
> +     * This is "Largest contiguous block of TCEs allocated specifically
> +     * for (that is, are reserved for) this PE".
> +     * Return the maximum number as all RAM was in 4K pages.
> +     */
> +    rtas_st(rets, 2, dma64_window_size >> SPAPR_TCE_PAGE_SHIFT);
> +    rtas_st(rets, 3, pgmask);
> +    rtas_st(rets, 4, 0); /* DMA migration mask, not supported */
> +    return;
> +
> +hw_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +    return;
> +
> +param_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +}
> +
> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
> +                                          sPAPREnvironment *spapr,
> +                                          uint32_t token, uint32_t nargs,
> +                                          target_ulong args,
> +                                          uint32_t nret, target_ulong rets)
> +{
> +    sPAPRPHBState *sphb;
> +    sPAPRPHBClass *spc;
> +    sPAPRTCETable *tcet = NULL;
> +    uint32_t addr, page_shift, window_shift, liobn;
> +    uint64_t buid;
> +    long ret;
> +    uint32_t windows_supported = 0, page_size_mask = 0, dma32_window_size = 0;
> +    uint64_t dma64_window_size = 0;
> +
> +    if ((nargs != 5) || (nret != 4)) {
> +        goto param_error_exit;
> +    }
> +
> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> +    addr = rtas_ld(args, 0);
> +    sphb = spapr_pci_find_phb(spapr, buid);
> +    if (!sphb || !sphb->ddw_enabled) {
> +        goto param_error_exit;
> +    }
> +
> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> +    if (!spc->ddw_create || !spc->ddw_query) {
> +        goto hw_error_exit;
> +    }
> +
> +    ret = spc->ddw_query(sphb, &windows_supported, &page_size_mask,
> +                         &dma32_window_size, &dma64_window_size);
> +    if (ret || (spapr_phb_get_win_num(sphb) >= windows_supported)) {
> +        goto hw_error_exit;
Is RTAS_OUT_HW_ERROR really the right thing for the case of having
allocated all the available windows?
> +    }
> +
> +    page_shift = rtas_ld(args, 3);
> +    window_shift = rtas_ld(args, 4);
> +    /* Default 32bit window#0 is always there so +1 */
> +    liobn = SPAPR_PCI_LIOBN(sphb->index, spapr_phb_get_win_num(sphb));
> +
> +    ret = spc->ddw_create(sphb, liobn, page_shift, window_shift, &tcet);
> +    trace_spapr_iommu_ddw_create(buid, addr, 1ULL << page_shift,
> +                                 1ULL << window_shift,
> +                                 tcet ? tcet->bus_offset : 0xbaadf00d,
> +                                 liobn, ret);
> +    if (ret || !tcet) {
> +        goto hw_error_exit;
> +    }
> +
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    rtas_st(rets, 1, liobn);
> +    rtas_st(rets, 2, tcet->bus_offset >> 32);
> +    rtas_st(rets, 3, tcet->bus_offset & ((uint32_t) -1));
> +
> +    object_unref(OBJECT(tcet));
> +    return;
> +
> +hw_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +    return;
> +
> +param_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +}
> +
> +static void rtas_ibm_remove_pe_dma_window(PowerPCCPU *cpu,
> +                                          sPAPREnvironment *spapr,
> +                                          uint32_t token, uint32_t nargs,
> +                                          target_ulong args,
> +                                          uint32_t nret, target_ulong rets)
> +{
> +    sPAPRPHBState *sphb;
> +    sPAPRPHBClass *spc;
> +    sPAPRTCETable *tcet;
> +    uint32_t liobn;
> +    long ret;
> +
> +    if ((nargs != 1) || (nret != 1)) {
> +        goto param_error_exit;
> +    }
> +
> +    liobn = rtas_ld(args, 0);
> +    tcet = spapr_tce_find_by_liobn(liobn);
> +    if (!tcet) {
> +        goto param_error_exit;
> +    }
> +
> +    sphb = SPAPR_PCI_HOST_BRIDGE(OBJECT(tcet)->parent);
> +    if (!sphb || !sphb->ddw_enabled) {
> +        goto param_error_exit;
> +    }
> +
> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> +    if (!spc->ddw_remove) {
> +        goto hw_error_exit;
> +    }
> +
> +    ret = spc->ddw_remove(sphb, tcet);
> +    trace_spapr_iommu_ddw_remove(liobn, ret);
> +    if (ret) {
> +        goto hw_error_exit;
> +    }
> +
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    return;
> +
> +hw_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +    return;
> +
> +param_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +}
> +
> +static void rtas_ibm_reset_pe_dma_window(PowerPCCPU *cpu,
> +                                         sPAPREnvironment *spapr,
> +                                         uint32_t token, uint32_t nargs,
> +                                         target_ulong args,
> +                                         uint32_t nret, target_ulong rets)
> +{
> +    sPAPRPHBState *sphb;
> +    sPAPRPHBClass *spc;
> +    uint64_t buid;
> +    uint32_t addr;
> +    long ret;
> +
> +    if ((nargs != 3) || (nret != 1)) {
> +        goto param_error_exit;
> +    }
> +
> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> +    addr = rtas_ld(args, 0);
> +    sphb = spapr_pci_find_phb(spapr, buid);
> +    if (!sphb || !sphb->ddw_enabled) {
> +        goto param_error_exit;
> +    }
> +
> +    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> +    if (!spc->ddw_reset) {
> +        goto hw_error_exit;
> +    }
> +
> +    ret = spc->ddw_reset(sphb);
> +    trace_spapr_iommu_ddw_reset(buid, addr, ret);
> +    if (ret) {
> +        goto hw_error_exit;
> +    }
> +
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +
> +    return;
> +
> +hw_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +    return;
> +
> +param_error_exit:
> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +}
> +
> +static void spapr_rtas_ddw_init(void)
> +{
> +    spapr_rtas_register(RTAS_IBM_QUERY_PE_DMA_WINDOW,
> +                        "ibm,query-pe-dma-window",
> +                        rtas_ibm_query_pe_dma_window);
> +    spapr_rtas_register(RTAS_IBM_CREATE_PE_DMA_WINDOW,
> +                        "ibm,create-pe-dma-window",
> +                        rtas_ibm_create_pe_dma_window);
> +    spapr_rtas_register(RTAS_IBM_REMOVE_PE_DMA_WINDOW,
> +                        "ibm,remove-pe-dma-window",
> +                        rtas_ibm_remove_pe_dma_window);
> +    spapr_rtas_register(RTAS_IBM_RESET_PE_DMA_WINDOW,
> +                        "ibm,reset-pe-dma-window",
> +                        rtas_ibm_reset_pe_dma_window);
> +}
> +
> +type_init(spapr_rtas_ddw_init)
> diff --git a/trace-events b/trace-events
> index 04f5df2..9af53d9 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1285,6 +1285,10 @@ spapr_iommu_indirect(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t iobaN
>  spapr_iommu_stuff(uint64_t liobn, uint64_t ioba, uint64_t tce_value, uint64_t npages, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcevalue=0x%"PRIx64" npages=%"PRId64" ret=%"PRId64
>  spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u mask=%x"
>  spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) "liobn=%"PRIx64" tcet=%p table=%p fd=%d"
> +spapr_iommu_ddw_query(uint64_t buid, uint32_t cfgaddr, uint32_t wa, uint32_t pgz, uint32_t pgz_fixed, long ret) "buid=%"PRIx64" addr=%"PRIx32", %u windows available, sizes %"PRIx32", fixed %"PRIx32", ret = %ld"
> +spapr_iommu_ddw_create(uint64_t buid, uint32_t cfgaddr, unsigned long long pg_size, unsigned long long req_size, uint64_t start, uint32_t liobn, long ret) "buid=%"PRIx64" addr=%"PRIx32", page size=0x%llx, requested=0x%llx, start addr=%"PRIx64", liobn=%"PRIx32", ret = %ld"
> +spapr_iommu_ddw_remove(uint32_t liobn, long ret) "liobn=%"PRIx32", ret = %ld"
> +spapr_iommu_ddw_reset(uint64_t buid, uint32_t cfgaddr, long ret) "buid=%"PRIx64" addr=%"PRIx32", ret = %ld"
>  
>  # hw/ppc/ppc.c
>  ppc_tb_adjust(uint64_t offs1, uint64_t offs2, int64_t diff, int64_t seconds) "adjusted from 0x%"PRIx64" to 0x%"PRIx64", diff %"PRId64" (%"PRId64"s)"
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 13/18] spapr_pci: Advertise dynamic DMA windows to guest
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 13/18] spapr_pci: Advertise dynamic DMA windows to guest Alexey Kardashevskiy
@ 2015-02-05  4:10   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-05  4:10 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 3611 bytes --]
On Thu, Jan 29, 2015 at 08:27:25PM +1100, Alexey Kardashevskiy wrote:
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Needs a commit message.
> ---
>  hw/ppc/spapr.c     |  5 +++++
>  hw/ppc/spapr_pci.c | 25 +++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b560459..f9882c1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1760,6 +1760,11 @@ static void spapr_machine_2_1_class_init(ObjectClass *oc, void *data)
>      MachineClass *mc = MACHINE_CLASS(oc);
>      static GlobalProperty compat_props[] = {
>          HW_COMPAT_2_1,
As noted in earlier comment, this should go to COMPAT_2_2 now that 2.2
is released, shouldn't it?
> +        {
> +            .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,
> +            .property = "ddw",
> +            .value    = stringify(off),
> +        },
>          { /* end of list */ }
>      };
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 3ec03be..a94bba1 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -775,6 +775,7 @@ static Property spapr_phb_properties[] = {
>      DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
>      DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
>                         SPAPR_PCI_IO_WIN_SIZE),
> +    DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true),
Also, from my reading of the reset code it looked like this setting
would be overwritten by the (available_windows > 1) test.
Come to that.. does the compat stuff belong here with the dt stuff, or
in the earlier patch?
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -993,6 +994,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>      uint32_t interrupt_map_mask[] = {
>          cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
>      uint32_t interrupt_map[PCI_SLOT_MAX * PCI_NUM_PINS][7];
> +    uint32_t ddw_applicable[] = {
> +        cpu_to_be32(RTAS_IBM_QUERY_PE_DMA_WINDOW),
> +        cpu_to_be32(RTAS_IBM_CREATE_PE_DMA_WINDOW),
> +        cpu_to_be32(RTAS_IBM_REMOVE_PE_DMA_WINDOW)
> +    };
> +    sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(phb);
>  
>      /* Start populating the FDT */
>      sprintf(nodename, "pci@%" PRIx64, phb->buid);
> @@ -1022,6 +1029,24 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>      _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
>      _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS));
>  
> +    /* Dynamic DMA window */
> +    if (phb->ddw_enabled &&
> +        spc->ddw_query && spc->ddw_create && spc->ddw_remove) {
> +        _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-applicable", &ddw_applicable,
> +                         sizeof(ddw_applicable)));
> +
> +        if (spc->ddw_reset) {
> +            uint32_t ddw_extensions[] = {
> +                cpu_to_be32(1),
> +                cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW)
> +            };
> +
> +            /* When enabled, the guest will remove the default 32bit window */
> +            _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-extensions",
> +                             &ddw_extensions, sizeof(ddw_extensions)));
> +        }
> +    }
> +
>      /* Build the interrupt-map, this must matches what is done
>       * in pci_spapr_map_irq
>       */
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container Alexey Kardashevskiy
@ 2015-02-05  4:19   ` David Gibson
  2015-02-17  0:34     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 40+ messages in thread
From: David Gibson @ 2015-02-05  4:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 6187 bytes --]
On Thu, Jan 29, 2015 at 08:27:27PM +1100, Alexey Kardashevskiy wrote:
> This enables multiple IOMMU groups in one VFIO container which means
> that multiple devices from different groups can share the same IOMMU table
> and locked pages counting can be done once as there is no need to have
> several containers for two or more groups.
> 
> This removes a group id from vfio_container_ioctl(). The kernel support
> is required for this.
> 
> This adds a check that there is just one VFIO container per PHB address
> space.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr_pci_vfio.c |  9 +++------
>  hw/vfio/common.c        | 33 +++++++++++++++------------------
>  include/hw/vfio/vfio.h  |  2 +-
>  3 files changed, 19 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index b20ac90..257181d 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -33,11 +33,10 @@ static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
>                                      uint32_t *dma32_window_size,
>                                      uint64_t *dma64_window_size)
>  {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>      int ret;
>  
> -    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as,
>                                 VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
Huh.. so vfio_container_ioctl() is actually only used by the spapr_tce
code.  What's it doing living in the common vfio code?
>      if (ret) {
>          return ret;
> @@ -55,7 +54,6 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
>                                       uint32_t page_shift, uint32_t window_shift,
>                                       sPAPRTCETable **ptcet)
>  {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_iommu_spapr_tce_create create = {
>          .argsz = sizeof(create),
>          .page_shift = page_shift,
> @@ -65,7 +63,7 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
>      };
>      int ret;
>  
> -    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as,
>                                 VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
>      if (ret) {
>          return ret;
> @@ -87,7 +85,6 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
>  
>  static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
>  {
> -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>      struct vfio_iommu_spapr_tce_remove remove = {
>          .argsz = sizeof(remove),
>          .start_addr = tcet->bus_offset
> @@ -95,7 +92,7 @@ static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
>      int ret;
>  
>      spapr_pci_ddw_remove(sphb, tcet);
> -    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> +    ret = vfio_container_ioctl(&sphb->iommu_as,
>                                 VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
>  
>      return ret;
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 1cafcf8..a26cbae 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1011,34 +1011,31 @@ void vfio_put_base_device(VFIODevice *vbasedev)
>      close(vbasedev->fd);
>  }
>  
> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> +static int vfio_container_do_ioctl(AddressSpace *as,
>                                     int req, void *param)
>  {
> -    VFIOGroup *group;
>      VFIOContainer *container;
> -    int ret = -1;
> +    int ret;
> +    VFIOAddressSpace *space;
>  
> -    group = vfio_get_group(groupid, as);
> -    if (!group) {
> -        error_report("vfio: group %d not registered", groupid);
> -        return ret;
> -    }
> +    space = vfio_get_address_space(as);
> +    container = QLIST_FIRST(&space->containers);
>  
> -    container = group->container;
> -    if (group->container) {
> -        ret = ioctl(container->fd, req, param);
> -        if (ret < 0) {
> -            error_report("vfio: failed to ioctl container: ret=%d, %s",
> -                         ret, strerror(errno));
> -        }
> +    if (!container || QLIST_NEXT(container, next)) {
> +        error_report("vfio: multiple containers per PHB are not
> -    supported");
Shouldn't this be an assert?  It's qemu code that sets up the
containers after all.
Also the error message is not right for the case of !container.
> +        return -1;
>      }
>  
> -    vfio_put_group(group);
> +    ret = ioctl(container->fd, req, param);
> +    if (ret < 0) {
> +        error_report("vfio: failed to ioctl container: ret=%d, %s",
> +                     ret, strerror(errno));
> +    }
>  
>      return ret;
>  }
>  
> -int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> +int vfio_container_ioctl(AddressSpace *as,
>                           int req, void *param)
>  {
>      /* We allow only certain ioctls to the container */
> @@ -1054,5 +1051,5 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>          return -1;
>      }
>  
> -    return vfio_container_do_ioctl(as, groupid, req, param);
> +    return vfio_container_do_ioctl(as, req, param);
>  }
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..76b5744 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -3,7 +3,7 @@
>  
>  #include "qemu/typedefs.h"
>  
> -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> +extern int vfio_container_ioctl(AddressSpace *as,
>                                  int req, void *param);
>  
>  #endif
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 16/18] spapr_rtas_ddw: Workaround broken LE guests
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 16/18] spapr_rtas_ddw: Workaround broken LE guests Alexey Kardashevskiy
@ 2015-02-05  4:23   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-05  4:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 6661 bytes --]
On Thu, Jan 29, 2015 at 08:27:28PM +1100, Alexey Kardashevskiy wrote:
> Recent kernels do parse results of what DDW RTAS calls return incorrectly
> if compiled with LITTLE_ENDIAN=yes.
> 
> This adds special handling for such guests.
I don't really follow this commit message.  You need to justify
including this ugly workaround for incorrect guests.
What are the guests that are out in the field which will trigger this behaviour?
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr_rtas.c     | 29 +++++++++++++++++++++++++++++
>  hw/ppc/spapr_rtas_ddw.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h  |  2 ++
>  3 files changed, 71 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 2ec2a8e..c3dee94 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -293,12 +293,15 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> +    spapr_rtas_fn fn_wa; /* workaround helper */
>  } rtas_table[RTAS_TOKEN_MAX - RTAS_TOKEN_BASE];
>  
>  target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                               uint32_t token, uint32_t nargs, target_ulong args,
>                               uint32_t nret, target_ulong rets)
>  {
> +    uint32_t tokensw = bswap32(token);
> +
>      if ((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX)) {
>          struct rtas_call *call = rtas_table + (token - RTAS_TOKEN_BASE);
>  
> @@ -308,6 +311,16 @@ target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>          }
>      }
>  
> +    /* Workaround for LE guests */
> +    if ((tokensw >= RTAS_TOKEN_BASE) && (tokensw < RTAS_TOKEN_MAX)) {
> +        struct rtas_call *call = rtas_table + (tokensw - RTAS_TOKEN_BASE);
> +
> +        if (call->fn_wa) {
> +            call->fn_wa(cpu, spapr, tokensw, nargs, args, nret, rets);
> +            return H_SUCCESS;
> +        }
> +    }
> +
>      /* HACK: Some Linux early debug code uses RTAS display-character,
>       * but assumes the token value is 0xa (which it is on some real
>       * machines) without looking it up in the device tree.  This
> @@ -340,6 +353,22 @@ void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
>      rtas_table[token].fn = fn;
>  }
>  
> +void spapr_rtas_register_wrong_endian(int token, spapr_rtas_fn fn)
> +{
> +    if (!((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX))) {
> +        fprintf(stderr, "RTAS invalid token 0x%x\n", token);
> +        exit(1);
> +    }
> +
> +    token -= RTAS_TOKEN_BASE;
> +    if (!rtas_table[token].fn) {
> +        fprintf(stderr, "RTAS token %x must be initialized to allow workaround\n",
> +                token);
> +        exit(1);
> +    }
> +    rtas_table[token].fn_wa = fn;
> +}
> +
>  int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
>                                   hwaddr rtas_size)
>  {
> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
> index af70601..56eae9f 100644
> --- a/hw/ppc/spapr_rtas_ddw.c
> +++ b/hw/ppc/spapr_rtas_ddw.c
> @@ -278,6 +278,41 @@ param_error_exit:
>      rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>  }
>  
> +#define SPAPR_RTAS_DDW_SWAP(n) rtas_st(rets, (n), bswap32(rtas_ld(rets, (n))))
> +
> +static void rtas_ibm_query_pe_dma_window_wrong_endian(PowerPCCPU *cpu,
> +                                                      sPAPREnvironment *spapr,
> +                                                      uint32_t token,
> +                                                      uint32_t nargs,
> +                                                      target_ulong args,
> +                                                      uint32_t nret,
> +                                                      target_ulong rets)
> +{
> +    rtas_ibm_query_pe_dma_window(cpu, spapr, token, nargs, args, nret, rets);
> +
> +    SPAPR_RTAS_DDW_SWAP(0);
> +    SPAPR_RTAS_DDW_SWAP(1);
> +    SPAPR_RTAS_DDW_SWAP(2);
> +    SPAPR_RTAS_DDW_SWAP(3);
> +    SPAPR_RTAS_DDW_SWAP(4);
> +}
> +
> +static void rtas_ibm_create_pe_dma_window_wrong_endian(PowerPCCPU *cpu,
> +                                                       sPAPREnvironment *spapr,
> +                                                       uint32_t token,
> +                                                       uint32_t nargs,
> +                                                       target_ulong args,
> +                                                       uint32_t nret,
> +                                                       target_ulong rets)
> +{
> +    rtas_ibm_create_pe_dma_window(cpu, spapr, token, nargs, args, nret, rets);
> +
> +    SPAPR_RTAS_DDW_SWAP(0);
> +    SPAPR_RTAS_DDW_SWAP(1);
> +    SPAPR_RTAS_DDW_SWAP(2);
> +    SPAPR_RTAS_DDW_SWAP(3);
> +}
> +
>  static void spapr_rtas_ddw_init(void)
>  {
>      spapr_rtas_register(RTAS_IBM_QUERY_PE_DMA_WINDOW,
> @@ -292,6 +327,11 @@ static void spapr_rtas_ddw_init(void)
>      spapr_rtas_register(RTAS_IBM_RESET_PE_DMA_WINDOW,
>                          "ibm,reset-pe-dma-window",
>                          rtas_ibm_reset_pe_dma_window);
> +
> +    spapr_rtas_register_wrong_endian(RTAS_IBM_QUERY_PE_DMA_WINDOW,
> +                                     rtas_ibm_query_pe_dma_window_wrong_endian);
> +    spapr_rtas_register_wrong_endian(RTAS_IBM_CREATE_PE_DMA_WINDOW,
> +                                     rtas_ibm_create_pe_dma_window_wrong_endian);
>  }
>  
>  type_init(spapr_rtas_ddw_init)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 5f4e137..bf8e4a6 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -435,6 +435,8 @@ typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                                uint32_t nargs, target_ulong args,
>                                uint32_t nret, target_ulong rets);
>  void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn);
> +void spapr_rtas_register_wrong_endian(int token, spapr_rtas_fn fn);
> +
>  target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                               uint32_t token, uint32_t nargs, target_ulong args,
>                               uint32_t nret, target_ulong rets);
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 17/18] target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 17/18] target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64 Alexey Kardashevskiy
@ 2015-02-05  4:30   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-05  4:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 6763 bytes --]
On Thu, Jan 29, 2015 at 08:27:29PM +1100, Alexey Kardashevskiy wrote:
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Needs a commit message, in particular explaining what
KVM_CREATE_SPAPR_TCE_64 is and why we want it.
I mean, I can guess, but I wrote the original KVM_CREATE_SPAPR_TCE, so
I'm hardly typical.
> ---
>  hw/ppc/spapr_iommu.c |  7 ++++---
>  target-ppc/kvm.c     | 48 +++++++++++++++++++++++++++++++++++++-----------
>  target-ppc/kvm_ppc.h | 10 +++++++---
>  3 files changed, 48 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 1adf568..258f837 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -129,11 +129,12 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>  static int spapr_tce_table_realize(DeviceState *dev)
>  {
>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
> -    uint64_t window_size = (uint64_t)tcet->nb_table << tcet->page_shift;
>  
> -    if (kvm_enabled() && !(window_size >> 32)) {
> +    if (kvm_enabled()) {
>          tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
> -                                              window_size,
> +                                              tcet->nb_table,
You're passing "nb_table" to the "window_shift" parameter of the
modified function.  They don't sound like the same quantity..
> +                                              tcet->bus_offset,
> +                                              tcet->page_shift,
>                                                &tcet->fd,
>                                                tcet->vfio_accel);
>      }
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 1edf2b5..5e0e2e8 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -63,6 +63,7 @@ static int cap_booke_sregs;
>  static int cap_ppc_smt;
>  static int cap_ppc_rma;
>  static int cap_spapr_tce;
> +static int cap_spapr_tce_64;
>  static int cap_spapr_multitce;
>  static int cap_spapr_vfio;
>  static int cap_hior;
> @@ -104,6 +105,7 @@ int kvm_arch_init(KVMState *s)
>      cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
>      cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
>      cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
> +    cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
>      cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
>      cap_spapr_vfio = false;
>      cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
> @@ -1994,13 +1996,10 @@ bool kvmppc_spapr_use_multitce(void)
>      return cap_spapr_multitce;
>  }
>  
> -void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
> -                              bool vfio_accel)
> +void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_shift,
> +                              uint64_t bus_offset, uint32_t page_shift,
> +                              int *pfd, bool vfio_accel)
>  {
> -    struct kvm_create_spapr_tce args = {
> -        .liobn = liobn,
> -        .window_size = window_size,
> -    };
>      long len;
>      int fd;
>      void *table;
> @@ -2013,14 +2012,41 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
>          return NULL;
>      }
>  
> -    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE, &args);
> -    if (fd < 0) {
> -        fprintf(stderr, "KVM: Failed to create TCE table for liobn 0x%x\n",
> -                liobn);
> +    if (cap_spapr_tce_64) {
> +        struct kvm_create_spapr_tce_64 args = {
> +            .liobn = liobn,
> +            .page_shift = page_shift,
> +            .offset = bus_offset >> page_shift,
> +            .size = window_shift,
> +            .flags = 0
> +        };
> +        fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE_64, &args);
> +        if (fd < 0) {
> +            fprintf(stderr,
> +                    "KVM: Failed to create TCE64 table for liobn 0x%x\n",
> +                    liobn);
> +            return NULL;
> +        }
> +    } else if (cap_spapr_tce) {
> +        uint64_t window_size = (uint64_t) window_shift << page_shift;
> +        struct kvm_create_spapr_tce args = {
> +            .liobn = liobn,
> +            .window_size = window_size,
> +        };
> +        if ((window_size != args.window_size) || bus_offset) {
> +            return NULL;
> +        }
> +        fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_SPAPR_TCE, &args);
> +        if (fd < 0) {
> +            fprintf(stderr, "KVM: Failed to create TCE table for liobn 0x%x\n",
> +                    liobn);
> +            return NULL;
> +        }
> +    } else {
>          return NULL;
>      }
>  
> -    len = (window_size / SPAPR_TCE_PAGE_SIZE) * sizeof(uint64_t);
> +    len = window_shift * sizeof(uint64_t);
.. which is probably because it appears window_shift is not a shift at all.
>      /* FIXME: round this up to page size */
>  
>      table = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 2e0224c..ec36777 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -35,8 +35,9 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
>  #ifndef CONFIG_USER_ONLY
>  off_t kvmppc_alloc_rma(void **rma);
>  bool kvmppc_spapr_use_multitce(void);
> -void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
> -                              bool vfio_accel);
> +void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_shift,
> +                              uint64_t bus_offset, uint32_t page_shift,
> +                              int *pfd, bool vfio_accel);
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> @@ -157,7 +158,10 @@ static inline bool kvmppc_spapr_use_multitce(void)
>  }
>  
>  static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
> -                                            uint32_t window_size, int *fd,
> +                                            uint32_t window_shift,
> +                                            uint64_t bus_offset,
> +                                            uint32_t page_shift,
> +                                            int *fd,
>                                              bool vfio_accel)
>  {
>      return NULL;
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device
  2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
@ 2015-02-05  4:49   ` David Gibson
  2015-02-17  2:36     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 40+ messages in thread
From: David Gibson @ 2015-02-05  4:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 4851 bytes --]
On Thu, Jan 29, 2015 at 08:27:30PM +1100, Alexey Kardashevskiy wrote:
> TCE hypercalls (H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE) use a logical bus
> number (LIOBN) to identify which TCE table the request is addressed to.
> However VFIO kernel driver operates with IOMMU group IDs and has no idea
> about which LIOBN corresponds to which group. If the host kernel supports
> in-kernel acceleration for TCE calls, we have to provide the LIOBN to IOMMU
> mapping information.
> 
> This makes use of a VFIO KVM device's
> KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN attribute to set the link
> between LIOBN and IOMMU group.
> 
> The vfio_container_spapr_set_liobn() helper is implemented completely
> in vfio.c because kvm_vfio_spapr_tce_liobn needs a group fd and
> we do not want to share resources likes that outside vfio.c.
I thought you'd moved away from the idea of in-kernel TCE
acceleration, since big DMA windows made it unnecessary.
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr_iommu.c    |  1 +
>  hw/ppc/spapr_pci_vfio.c | 11 +++++++++++
>  hw/vfio/common.c        | 33 +++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio.h  |  4 ++++
>  4 files changed, 49 insertions(+)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 258f837..3de95d7 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -142,6 +142,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>      if (!tcet->table) {
>          size_t table_size = tcet->nb_table * sizeof(uint64_t);
>          tcet->table = g_malloc0(table_size);
> +        tcet->vfio_accel = false;
This should probably have a qdev prop so that it can be explicitly
disabled on a per-device basis for testing.
>      }
>  
>      trace_spapr_iommu_new_table(tcet->liobn, tcet, tcet->table, tcet->fd);
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index 257181d..2078187 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -21,6 +21,7 @@
>  #include "hw/pci-host/spapr.h"
>  #include "linux/vfio.h"
>  #include "hw/vfio/vfio.h"
> +#include "qemu/error-report.h"
>  
>  static Property spapr_phb_vfio_properties[] = {
>      DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
> @@ -80,6 +81,16 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
>      memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
>                                  spapr_tce_get_iommu(*ptcet));
>  
> +    if (!(*ptcet)->vfio_accel) {
> +        return 0;
> +    }
> +    ret = vfio_container_spapr_set_liobn(&sphb->iommu_as,
> +                                         liobn, (*ptcet)->bus_offset);
> +    if (ret) {
> +        error_report("spapr-vfio: failed to create link to IOMMU");
> +        ret = 0;
> +    }
> +
>      return ret;
>  }
>  
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index a26cbae..ec778d0 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1053,3 +1053,36 @@ int vfio_container_ioctl(AddressSpace *as,
>  
>      return vfio_container_do_ioctl(as, req, param);
>  }
> +
> +int vfio_container_spapr_set_liobn(AddressSpace *as,
> +                                   uint64_t liobn,
> +                                   uint64_t start_addr)
> +{
> +#ifdef CONFIG_KVM
> +    int ret;
> +    struct kvm_vfio_spapr_tce_liobn param = {
> +        .argsz = sizeof(param),
> +        .liobn = liobn,
> +        .start_addr = start_addr
> +    };
> +    struct kvm_device_attr attr = {
> +        .group = KVM_DEV_VFIO_GROUP,
> +        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN,
> +        .addr = (uint64_t)(unsigned long)¶m,
> +    };
> +
> +    if (vfio_kvm_device_fd < 0) {
> +        return 0;
> +    }
> +
> +    ret = ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr);
> +    if (ret) {
> +        error_report("vfio: failed to setup liobn for a group: %s",
> +                     strerror(errno));
> +    }
> +
> +    return ret;
> +#else
> +    return 0;
> +#endif
> +}
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 76b5744..8457933 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -6,4 +6,8 @@
>  extern int vfio_container_ioctl(AddressSpace *as,
>                                  int req, void *param);
>  
> +extern int vfio_container_spapr_set_liobn(AddressSpace *as,
> +                                          uint64_t liobn,
> +                                          uint64_t start_addr);
> +
>  #endif
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container
  2015-02-05  4:19   ` David Gibson
@ 2015-02-17  0:34     ` Alexey Kardashevskiy
  2015-02-17 23:52       ` David Gibson
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-02-17  0:34 UTC (permalink / raw)
  To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/05/2015 03:19 PM, David Gibson wrote:
> On Thu, Jan 29, 2015 at 08:27:27PM +1100, Alexey Kardashevskiy wrote:
>> This enables multiple IOMMU groups in one VFIO container which
>> means that multiple devices from different groups can share the same
>> IOMMU table and locked pages counting can be done once as there is
>> no need to have several containers for two or more groups.
>> 
>> This removes a group id from vfio_container_ioctl(). The kernel
>> support is required for this.
>> 
>> This adds a check that there is just one VFIO container per PHB
>> address space.
>> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- 
>> hw/ppc/spapr_pci_vfio.c |  9 +++------ hw/vfio/common.c        | 33
>> +++++++++++++++------------------ include/hw/vfio/vfio.h  |  2 +- 3
>> files changed, 19 insertions(+), 25 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c index
>> b20ac90..257181d 100644 --- a/hw/ppc/spapr_pci_vfio.c +++
>> b/hw/ppc/spapr_pci_vfio.c @@ -33,11 +33,10 @@ static int
>> spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb, uint32_t
>> *dma32_window_size, uint64_t *dma64_window_size) { -
>> sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct
>> vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; int
>> ret;
>> 
>> -    ret = vfio_container_ioctl(&sphb->iommu_as,
>> svphb->iommugroupid, +    ret =
>> vfio_container_ioctl(&sphb->iommu_as, VFIO_IOMMU_SPAPR_TCE_GET_INFO,
>> &info);
> 
> 
> Huh.. so vfio_container_ioctl() is actually only used by the
> spapr_tce code.  What's it doing living in the common vfio code?
vfio_container_ioctl() converts the address space to a container fd. The
code outside VFIO does not have an idea about these
container/group/device fds.
> 
>> if (ret) { return ret; @@ -55,7 +54,6 @@ static int
>> spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn, 
>> uint32_t page_shift, uint32_t window_shift, sPAPRTCETable **ptcet) 
>> { -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); 
>> struct vfio_iommu_spapr_tce_create create = { .argsz =
>> sizeof(create), .page_shift = page_shift, @@ -65,7 +63,7 @@ static
>> int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn, 
>> }; int ret;
>> 
>> -    ret = vfio_container_ioctl(&sphb->iommu_as,
>> svphb->iommugroupid, +    ret =
>> vfio_container_ioctl(&sphb->iommu_as, VFIO_IOMMU_SPAPR_TCE_CREATE,
>> &create); if (ret) { return ret; @@ -87,7 +85,6 @@ static int
>> spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
>> 
>> static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb,
>> sPAPRTCETable *tcet) { -    sPAPRPHBVFIOState *svphb =
>> SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct vfio_iommu_spapr_tce_remove
>> remove = { .argsz = sizeof(remove), .start_addr = tcet->bus_offset 
>> @@ -95,7 +92,7 @@ static int spapr_pci_vfio_ddw_remove(sPAPRPHBState
>> *sphb, sPAPRTCETable *tcet) int ret;
>> 
>> spapr_pci_ddw_remove(sphb, tcet); -    ret =
>> vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid, +    ret
>> = vfio_container_ioctl(&sphb->iommu_as, VFIO_IOMMU_SPAPR_TCE_REMOVE,
>> &remove);
>> 
>> return ret; diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>> 1cafcf8..a26cbae 100644 --- a/hw/vfio/common.c +++
>> b/hw/vfio/common.c @@ -1011,34 +1011,31 @@ void
>> vfio_put_base_device(VFIODevice *vbasedev) close(vbasedev->fd); }
>> 
>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t
>> groupid, +static int vfio_container_do_ioctl(AddressSpace *as, int
>> req, void *param) { -    VFIOGroup *group; VFIOContainer
>> *container; -    int ret = -1; +    int ret; +    VFIOAddressSpace
>> *space;
>> 
>> -    group = vfio_get_group(groupid, as); -    if (!group) { -
>> error_report("vfio: group %d not registered", groupid); -
>> return ret; -    } +    space = vfio_get_address_space(as); +
>> container = QLIST_FIRST(&space->containers);
>> 
>> -    container = group->container; -    if (group->container) { -
>> ret = ioctl(container->fd, req, param); -        if (ret < 0) { -
>> error_report("vfio: failed to ioctl container: ret=%d, %s", -
>> ret, strerror(errno)); -        } +    if (!container ||
>> QLIST_NEXT(container, next)) { +        error_report("vfio: multiple
>> containers per PHB are not -    supported");
> 
> Shouldn't this be an assert?  It's qemu code that sets up the 
> containers after all.
The VFIO maintainer does not like asserts and I do not want to do things
which he does not like unless I really have to :)
> Also the error message is not right for the case of !container.
Right, that I will fix.
>> +        return -1; }
>> 
>> -    vfio_put_group(group); +    ret = ioctl(container->fd, req,
>> param); +    if (ret < 0) { +        error_report("vfio: failed to
>> ioctl container: ret=%d, %s", +                     ret,
>> strerror(errno)); +    }
>> 
>> return ret; }
>> 
>> -int vfio_container_ioctl(AddressSpace *as, int32_t groupid, +int
>> vfio_container_ioctl(AddressSpace *as, int req, void *param) { /* We
>> allow only certain ioctls to the container */ @@ -1054,5 +1051,5 @@
>> int vfio_container_ioctl(AddressSpace *as, int32_t groupid, return
>> -1; }
>> 
>> -    return vfio_container_do_ioctl(as, groupid, req, param); +
>> return vfio_container_do_ioctl(as, req, param); } diff --git
>> a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h index
>> 0b26cd8..76b5744 100644 --- a/include/hw/vfio/vfio.h +++
>> b/include/hw/vfio/vfio.h @@ -3,7 +3,7 @@
>> 
>> #include "qemu/typedefs.h"
>> 
>> -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid, 
>> +extern int vfio_container_ioctl(AddressSpace *as, int req, void
>> *param);
>> 
>> #endif
> 
- -- 
Alexey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAEBAgAGBQJU4oyTAAoJEIYTPdgrwSC5SSUQAJH/KS2rnIXXmyZPnFI9FW+T
X780EzHZ2OLdIMoxqSF2x4tpxL7TLgt+3ZBkqOhmjFhhPnMGEz4KnLRJYG4VAoZx
iS5xfbKtR34Dxyz5flhbdTs1BXkUJli5nhiSiXMbQfgsn0ZG2czvvhee9WZExsAL
0EM1UdxWP4Y0TD03msRWjCfuUmtZnGF3F0iMf3Yv3uuwXvT5pFQWe6Wf0JJNPUDC
X3+wvZeGl+gaX3mGbE0iXL4GiWnK+JqkGUt/CrEdnIVD9h0H6wiLY7s82QEkEsK3
IXRJBfL+TXF/RgKAeCyeVZJzAxbw3PWsb4DKb4U4ByWOmdYxXEq35G99pxZEvPn0
DHobPUNk1RyTrDVL9n5W7lbwcOc81xCA5A3Y3z2NnOaPt8J/wcYg6qigkvhDpMql
PIr+1WHAOf++xUDf+EIXhubGSwqNM7FgUDyrvAEWMEUP+ddz74QN9h0LD7ZB/1lR
svPTE+a/kM2biolzwIPTCdwQXroME7lujB1ynvOEwtiGi5BOjwFTHZ/J7P12iYaZ
px5S8WI07cR7d9cyMFkeYrwxlBwj10i9NzFDP+1JngbzrVmpf15ekj4CFAxMo7gV
ubjcQCvOMoCbiljm5Cg7yn1MjlFkp9ZnNAPKv/TVpw6Co4TgljiAMi+w0DsLXJ8y
DS6iczjry50ZOTpUyWbj
=UkLT
-----END PGP SIGNATURE-----
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering
  2015-02-02  7:04   ` David Gibson
@ 2015-02-17  2:14     ` Alexey Kardashevskiy
  2015-02-17 23:53       ` David Gibson
  2015-02-17  2:20     ` [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering [repost] Alexey Kardashevskiy
  1 sibling, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-02-17  2:14 UTC (permalink / raw)
  To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/02/2015 06:04 PM, David Gibson wrote:
> On Thu, Jan 29, 2015 at 08:27:20PM +1100, Alexey Kardashevskiy wrote:
>> This makes use of the new "memory registering" feature. The idea is 
>> to provide the guest ability to notify the host kernel about pages
>> which are going to be used for DMA. Having this information, the
>> host kernel can pin them all, do locked pages accounting and not
>> spent time on doing that in real time with possible failures which
>> cannot be handled nicely in some cases.
>> 
>> This adds a memory listener which notifies a VFIO container about
>> memory which needs to be pinned/unpinned. VFIO MMIO regions are
>> skipped.
>> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- 
>> hw/vfio/common.c              | 99
>> ++++++++++++++++++++++++++++++++++++++++++- 
>> include/hw/vfio/vfio-common.h |  3 +- 2 files changed, 100
>> insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>> cf483ff..c08f9ab 100644 --- a/hw/vfio/common.c +++
>> b/hw/vfio/common.c @@ -485,6 +485,99 @@ void
>> vfio_listener_release(VFIOContainer *container) 
>> memory_listener_unregister(&container->iommu_data.type1.listener); 
>> }
>> 
>> +static int vfio_mem_register(VFIOContainer *container, +
>> void *vaddr, ram_addr_t size) +{ +    struct
>> vfio_iommu_type1_register_memory reg = { +        .argsz =
>> sizeof(reg), +        .vaddr = (__u64)(uintptr_t)vaddr, +
>> .size = size, +    }; +    long ret = ioctl(container->fd,
>> VFIO_IOMMU_REGISTER_MEMORY, ®); + +    DPRINTF("(%u) %s %u:
>> va=%lx size=%lx, ret=%ld\n", getpid(), +            __func__,
>> __LINE__, reg.vaddr, reg.size, ret); + +    return ret; +} + +static
>> int vfio_mem_unregister(VFIOContainer *container, +
>> void *vaddr, ram_addr_t size) +{ +    struct
>> vfio_iommu_type1_unregister_memory reg = { +        .argsz =
>> sizeof(reg), +        .vaddr = (__u64)(uintptr_t)vaddr, +
>> .size = size, +    }; +    long ret = ioctl(container->fd,
>> VFIO_IOMMU_UNREGISTER_MEMORY, ®); + +    DPRINTF("(%u) %s %u:
>> va=%lx size=%lx, ret=%ld\n", getpid(), +            __func__,
>> __LINE__, reg.vaddr, reg.size, ret); + +    return ret; +} + +static
>> void vfio_ram_region_add(MemoryListener *listener, +
>> MemoryRegionSection *section) +{ +    VFIOContainer *container =
>> container_of(listener, VFIOContainer, +
>> iommu_data.type1.ramlistener); +    hwaddr end; +    Int128 llend; +
>> void *vaddr; + +    if (!memory_region_is_ram(section->mr) || +
>> memory_region_is_skip_dump(section->mr)) { +        return; +    } 
>> + +    llend = int128_make64(section->offset_within_address_space); 
>> +    llend = int128_add(llend, section->size);
> 
> Does this need an add TARGET_PAGE_SIZE-1 in order to make it round up 
> to a page boundary, rather than truncate to a page boundary?
Fixed.
>> +    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); + +
>> memory_region_ref(section->mr); + +    end = int128_get64(llend); +
>> vaddr = memory_region_get_ram_ptr(section->mr) + +
>> section->offset_within_region; +    vfio_mem_register(container,
>> vaddr, end); +} + +static void vfio_ram_region_del(MemoryListener
>> *listener, +                                MemoryRegionSection
>> *section) +{ +    VFIOContainer *container = container_of(listener,
>> VFIOContainer, +
>> iommu_data.type1.ramlistener); +    hwaddr iova, end; +    void
>> *vaddr; + +    if (!memory_region_is_ram(section->mr) || +
>> memory_region_is_skip_dump(section->mr)) { +        return; +    } 
>> + + +    iova =
>> TARGET_PAGE_ALIGN(section->offset_within_address_space); +    end =
>> (section->offset_within_address_space + int128_get64(section->size))
>> & +        TARGET_PAGE_MASK; +    vaddr =
>> memory_region_get_ram_ptr(section->mr) + +
>> section->offset_within_region + +        (iova -
>> section->offset_within_address_space);
> 
> It's not clear to me why region_add and region_del have a different 
> set of address calculations.
Cut-n-paste from the other listener :) I will rework it.
>> +    vfio_mem_unregister(container, vaddr, end - iova); +} + +const
>> MemoryListener vfio_ram_listener = { +    .region_add =
>> vfio_ram_region_add, +    .region_del = vfio_ram_region_del, +}; + 
>> +static void vfio_spapr_listener_release(VFIOContainer *container) 
>> +{ +
>> memory_listener_unregister(&container->iommu_data.type1.listener); +
>> memory_listener_unregister(&container->iommu_data.type1.ramlistener);
>
>> 
> Accessing fields within type1 from a function whose name says it's 
> spapr specific seems very wrong.
Kind of ugly, yes. But we share the common memory listener with Type1 so...
>> +} + int vfio_mmap_region(Object *obj, VFIORegion *region, 
>> MemoryRegion *mem, MemoryRegion *submem, void **map, size_t size,
>> off_t offset, @@ -705,6 +798,10 @@ static int
>> vfio_connect_container(VFIOGroup *group, AddressSpace *as) goto
>> free_container_exit; }
>> 
>> +        container->iommu_data.type1.ramlistener =
>> vfio_ram_listener; +
>> memory_listener_register(&container->iommu_data.type1.ramlistener, +
>> &address_space_memory);
> 
> Why two separate listeners, rather than doing both jobs from a single
> listener?
... I actually like the idea to have this separated from the rest of the
code. Furthermore, now I think we better have separate memory listeners
for Type1 and SPAPR as the current vfio_listener_region_add()/del() look
quite ugly trying to do different things depending on
memory_region_is_iommu().
Any objection to separating SPAPR's listener (and merging it with the one
introduced by this patch)?
>> /* * The host kernel code implementing VFIO_IOMMU_DISABLE is called 
>> * when container fd is closed so we do not call it explicitly @@
>> -718,7 +815,7 @@ static int vfio_connect_container(VFIOGroup *group,
>> AddressSpace *as) }
>> 
>> container->iommu_data.type1.listener = vfio_memory_listener; -
>> container->iommu_data.release = vfio_listener_release; +
>> container->iommu_data.release = vfio_spapr_listener_release;
>> 
>> memory_listener_register(&container->iommu_data.type1.listener, 
>> container->space->as); diff --git a/include/hw/vfio/vfio-common.h
>> b/include/hw/vfio/vfio-common.h index 1d5bfe8..3f91216 100644 ---
>> a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h 
>> @@ -66,6 +66,7 @@ struct VFIOGroup;
>> 
>> typedef struct VFIOType1 { MemoryListener listener; +
>> MemoryListener ramlistener; int error; bool initialized; }
>> VFIOType1; @@ -144,7 +145,7 @@ int vfio_get_device(VFIOGroup *group,
>> const char *name, VFIODevice *vbasedev);
>> 
>> extern const MemoryRegionOps vfio_region_ops; -extern const
>> MemoryListener vfio_memory_listener; +extern const MemoryListener
>> vfio_memory_listener, vfio_ram_listener; extern
>> QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list; extern
>> QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces;
>> 
> 
- -- 
Alexey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAEBAgAGBQJU4qQJAAoJEIYTPdgrwSC51g0P/iNSwMhniXDP/OIwUNA/fN+l
WWVOpNT9aMgkJM7nttgUITc8SGvkqVsF50KsB8E6o6/tcYy4eKWFwZhRHCDWhAAg
6n6iJtSfGW7jQ2HEkTFB60RCEXhDsHDyuG0Q9BHkk/7EAOFqTPNXncv14Ant81GB
rDRtT5vQsvmr6nT8Qw7Yzr1ip1kVl60YVSjxGjD2D3xU7TLIqNkIGbBgSKErcGh5
19hGsN7GcTTmteFrUmKpmeRt/yRazmHuzV5eMVuwnO6rDX4TCbVZpLUrmsDd3K2A
ePr5oCtWZl45b9MwZT1DLkgN1SjYvH5NgbeYBPOSO5BeGu0TcedULNw++ptpfm+v
G7P5k4TLWze9PM/vVh1e6QZp/pe2eHkFOBqr+I35T8eMgCXkETbLPwHVpdsybrua
2r1GwayCopk6uikcHWsBGuQ3+1QNNUBwBm1h9MYBLjB+dC9tfQqCkK0X8fbmhpSG
0doaM36RZdiPc0iP5qyih80F7MnNRUuKPwa2k0z6fEgpnkT4JqIDHFX7CrznqYrT
BmfYSsX3a45YnxTsYLtPRmb2EQDojQ66yiCu70sMXFk+j+lojnpov/zal/4K67Ws
uliuSxm/tNNc17sppRHhTfHsFwOkJ92KNwfqxN/6o6LOtv5GFVsR0tdiBAOnZYfP
OZH2mTBVfccG3KgsnelV
=x4bG
-----END PGP SIGNATURE-----
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering [repost]
  2015-02-02  7:04   ` David Gibson
  2015-02-17  2:14     ` Alexey Kardashevskiy
@ 2015-02-17  2:20     ` Alexey Kardashevskiy
  1 sibling, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-02-17  2:20 UTC (permalink / raw)
  To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Repost as I think I pressed "crtl-r" and thunberbird killed the formatting :)
On 02/02/2015 06:04 PM, David Gibson wrote:
> On Thu, Jan 29, 2015 at 08:27:20PM +1100, Alexey Kardashevskiy wrote:
>> This makes use of the new "memory registering" feature. The idea is 
>> to provide the guest ability to notify the host kernel about pages
>> which are going to be used for DMA. Having this information, the
>> host kernel can pin them all, do locked pages accounting and not
>> spent time on doing that in real time with possible failures which
>> cannot be handled nicely in some cases.
>> 
>> This adds a memory listener which notifies a VFIO container about
>> memory which needs to be pinned/unpinned. VFIO MMIO regions are
>> skipped.
>> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- 
>> hw/vfio/common.c              | 99
>> ++++++++++++++++++++++++++++++++++++++++++- 
>> include/hw/vfio/vfio-common.h |  3 +- 2 files changed, 100
>> insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>> cf483ff..c08f9ab 100644 --- a/hw/vfio/common.c +++
>> b/hw/vfio/common.c @@ -485,6 +485,99 @@ void
>> vfio_listener_release(VFIOContainer *container) 
>> memory_listener_unregister(&container->iommu_data.type1.listener); 
>> }
>> 
>> +static int vfio_mem_register(VFIOContainer *container, +
>> void *vaddr, ram_addr_t size) +{ +    struct
>> vfio_iommu_type1_register_memory reg = { +        .argsz =
>> sizeof(reg), +        .vaddr = (__u64)(uintptr_t)vaddr, +
>> .size = size, +    }; +    long ret = ioctl(container->fd,
>> VFIO_IOMMU_REGISTER_MEMORY, ®); + +    DPRINTF("(%u) %s %u:
>> va=%lx size=%lx, ret=%ld\n", getpid(), +            __func__,
>> __LINE__, reg.vaddr, reg.size, ret); + +    return ret; +} + +static
>> int vfio_mem_unregister(VFIOContainer *container, +
>> void *vaddr, ram_addr_t size) +{ +    struct
>> vfio_iommu_type1_unregister_memory reg = { +        .argsz =
>> sizeof(reg), +        .vaddr = (__u64)(uintptr_t)vaddr, +
>> .size = size, +    }; +    long ret = ioctl(container->fd,
>> VFIO_IOMMU_UNREGISTER_MEMORY, ®); + +    DPRINTF("(%u) %s %u:
>> va=%lx size=%lx, ret=%ld\n", getpid(), +            __func__,
>> __LINE__, reg.vaddr, reg.size, ret); + +    return ret; +} + +static
>> void vfio_ram_region_add(MemoryListener *listener, +
>> MemoryRegionSection *section) +{ +    VFIOContainer *container =
>> container_of(listener, VFIOContainer, +
>> iommu_data.type1.ramlistener); +    hwaddr end; +    Int128 llend; +
>> void *vaddr; + +    if (!memory_region_is_ram(section->mr) || +
>> memory_region_is_skip_dump(section->mr)) { +        return; +    } 
>> + +    llend = int128_make64(section->offset_within_address_space); 
>> +    llend = int128_add(llend, section->size);
> 
> Does this need an add TARGET_PAGE_SIZE-1 in order to make it round up 
> to a page boundary, rather than truncate to a page boundary?
Fixed.
> 
>> +    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); + +
>> memory_region_ref(section->mr); + +    end = int128_get64(llend); +
>> vaddr = memory_region_get_ram_ptr(section->mr) + +
>> section->offset_within_region; +    vfio_mem_register(container,
>> vaddr, end); +} + +static void vfio_ram_region_del(MemoryListener
>> *listener, +                                MemoryRegionSection
>> *section) +{ +    VFIOContainer *container = container_of(listener,
>> VFIOContainer, +
>> iommu_data.type1.ramlistener); +    hwaddr iova, end; +    void
>> *vaddr; + +    if (!memory_region_is_ram(section->mr) || +
>> memory_region_is_skip_dump(section->mr)) { +        return; +    } 
>> + + +    iova =
>> TARGET_PAGE_ALIGN(section->offset_within_address_space); +    end =
>> (section->offset_within_address_space + int128_get64(section->size))
>> & +        TARGET_PAGE_MASK; +    vaddr =
>> memory_region_get_ram_ptr(section->mr) + +
>> section->offset_within_region + +        (iova -
>> section->offset_within_address_space);
> 
> It's not clear to me why region_add and region_del have a different 
> set of address calculations.
> 
Cut-n-paste from the other listener; I will rework it; this version was
too raw.
>> +    vfio_mem_unregister(container, vaddr, end - iova); +} + +const
>> MemoryListener vfio_ram_listener = { +    .region_add =
>> vfio_ram_region_add, +    .region_del = vfio_ram_region_del, +}; + 
>> +static void vfio_spapr_listener_release(VFIOContainer *container) 
>> +{ +
>> memory_listener_unregister(&container->iommu_data.type1.listener); +
>> memory_listener_unregister(&container->iommu_data.type1.ramlistener);
>
>> 
> Accessing fields within type1 from a function whose name says it's 
> spapr specific seems very wrong.
Kind of ugly, yes. But we share the common memory listener with Type1 so...
> 
>> +} + int vfio_mmap_region(Object *obj, VFIORegion *region, 
>> MemoryRegion *mem, MemoryRegion *submem, void **map, size_t size,
>> off_t offset, @@ -705,6 +798,10 @@ static int
>> vfio_connect_container(VFIOGroup *group, AddressSpace *as) goto
>> free_container_exit; }
>> 
>> +        container->iommu_data.type1.ramlistener =
>> vfio_ram_listener; +
>> memory_listener_register(&container->iommu_data.type1.ramlistener, +
>> &address_space_memory);
> 
> Why two separate listeners, rather than doing both jobs from a single
> listener?
> 
... I actually like the idea to have this separated from the rest of the
code. Furthermore, now I think we better have separate memory listeners
for Type1 and SPAPR as the current vfio_listener_region_add()/del() look
quite ugly trying to do different things depending on
memory_region_is_iommu().
Any objection to separating SPAPR's listener (and merging it with the one
introduced by this patch)?
>> /* * The host kernel code implementing VFIO_IOMMU_DISABLE is called 
>> * when container fd is closed so we do not call it explicitly @@
>> -718,7 +815,7 @@ static int vfio_connect_container(VFIOGroup *group,
>> AddressSpace *as) }
>> 
>> container->iommu_data.type1.listener = vfio_memory_listener; -
>> container->iommu_data.release = vfio_listener_release; +
>> container->iommu_data.release = vfio_spapr_listener_release;
>> 
>> memory_listener_register(&container->iommu_data.type1.listener, 
>> container->space->as); diff --git a/include/hw/vfio/vfio-common.h
>> b/include/hw/vfio/vfio-common.h index 1d5bfe8..3f91216 100644 ---
>> a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h 
>> @@ -66,6 +66,7 @@ struct VFIOGroup;
>> 
>> typedef struct VFIOType1 { MemoryListener listener; +
>> MemoryListener ramlistener; int error; bool initialized; }
>> VFIOType1; @@ -144,7 +145,7 @@ int vfio_get_device(VFIOGroup *group,
>> const char *name, VFIODevice *vbasedev);
>> 
>> extern const MemoryRegionOps vfio_region_ops; -extern const
>> MemoryListener vfio_memory_listener; +extern const MemoryListener
>> vfio_memory_listener, vfio_ram_listener; extern
>> QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list; extern
>> QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces;
>> 
> 
- -- 
Alexey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAEBAgAGBQJU4qVcAAoJEIYTPdgrwSC5czMP/Aui1yqy0a1/MZ69FlvZwYPR
V/wtTcy04IVBhBIxDq9BYMLOuWrliCZWbE172pXEVwcwD/JQedvYrgF2f+zZ+NUm
exoScPwG+nWS3iy9Xw5pLuqYbQVkIIoJ/cyTqjZesqmc++Og1FiRh8Wh1qttn8Dl
KSqnn30hVGdjZgIf6sztMOw3a9eAt+mRpdcjhIiMiilbYhoWwvsVNH9nIC4WNkX8
nky8+DarT4Tlwif9wns/BubtYtzPhUHeaIHjv4c7NSbSInXVRJ0bUPprxJse0l3c
lRSBNYp/cA4Qsw1Cec5ZC911OQDuIVGUrv80iXi2urFPViuma8HxYRxadJ3K2u/n
DCw+rvwB8dTvMviwYG0PyWrSSduUbC8riThypvw+yvfgC5qZi6KlxBXROoFy1AQN
YaeS5Mb7tTgmeYogClslu5MYQtSkNH7G8oD8NqNpXD6gA8oloCK2U0cLg6XYtH04
DtGkk9AoTTsDrAXPNX+OuGWQOLQAiR8rpLqq59J59ug+6uObrrGDoovCZH9Vk4rJ
F/gdogD/D/uO213QhfxJdCav/E0AxJ7LglCgvDKUk73Jy9es11s+Xy2augkfexHL
4foua0OqJl34hhtLubv6W0YZLx6yUZaUQoypOmuLJYfrb4TNlEHXcOGdsDmkk7No
yflIC9eNtNDR0wdWQqo+
=iytn
-----END PGP SIGNATURE-----
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device
  2015-02-05  4:49   ` David Gibson
@ 2015-02-17  2:36     ` Alexey Kardashevskiy
  2015-02-17 23:56       ` David Gibson
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2015-02-17  2:36 UTC (permalink / raw)
  To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/05/2015 03:49 PM, David Gibson wrote:
> On Thu, Jan 29, 2015 at 08:27:30PM +1100, Alexey Kardashevskiy wrote:
>> TCE hypercalls (H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE) use a
>> logical bus number (LIOBN) to identify which TCE table the request
>> is addressed to. However VFIO kernel driver operates with IOMMU
>> group IDs and has no idea about which LIOBN corresponds to which
>> group. If the host kernel supports in-kernel acceleration for TCE
>> calls, we have to provide the LIOBN to IOMMU mapping information.
>> 
>> This makes use of a VFIO KVM device's 
>> KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN attribute to set the link 
>> between LIOBN and IOMMU group.
>> 
>> The vfio_container_spapr_set_liobn() helper is implemented
>> completely in vfio.c because kvm_vfio_spapr_tce_liobn needs a group
>> fd and we do not want to share resources likes that outside vfio.c.
> 
> I thought you'd moved away from the idea of in-kernel TCE 
> acceleration, since big DMA windows made it unnecessary.
Not entirely. DDW may not be supported by some hardware (like nVidia
being able to generate only 40bit DMA addresses without hacks). DMA
memory registering is pretty much about it - when we preregister pages,
the KVM acceleration becomes lot simpler as it does not have to take care
of pinning or locked_vm accounting.
But I should have not posted it here anyway, too early :)
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- 
>> hw/ppc/spapr_iommu.c    |  1 + hw/ppc/spapr_pci_vfio.c | 11
>> +++++++++++ hw/vfio/common.c        | 33
>> +++++++++++++++++++++++++++++++++ include/hw/vfio/vfio.h  |  4 ++++ 
>> 4 files changed, 49 insertions(+)
>> 
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c index
>> 258f837..3de95d7 100644 --- a/hw/ppc/spapr_iommu.c +++
>> b/hw/ppc/spapr_iommu.c @@ -142,6 +142,7 @@ static int
>> spapr_tce_table_realize(DeviceState *dev) if (!tcet->table) { size_t
>> table_size = tcet->nb_table * sizeof(uint64_t); tcet->table =
>> g_malloc0(table_size); +        tcet->vfio_accel = false;
> 
> This should probably have a qdev prop so that it can be explicitly 
> disabled on a per-device basis for testing.
It is a TCE table's property, how can you disable it if it is not created
via the command line?
>> }
>> 
>> trace_spapr_iommu_new_table(tcet->liobn, tcet, tcet->table,
>> tcet->fd); diff --git a/hw/ppc/spapr_pci_vfio.c
>> b/hw/ppc/spapr_pci_vfio.c index 257181d..2078187 100644 ---
>> a/hw/ppc/spapr_pci_vfio.c +++ b/hw/ppc/spapr_pci_vfio.c @@ -21,6
>> +21,7 @@ #include "hw/pci-host/spapr.h" #include "linux/vfio.h" 
>> #include "hw/vfio/vfio.h" +#include "qemu/error-report.h"
>> 
>> static Property spapr_phb_vfio_properties[] = { 
>> DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1), @@
>> -80,6 +81,16 @@ static int spapr_pci_vfio_ddw_create(sPAPRPHBState
>> *sphb, uint32_t liobn, 
>> memory_region_add_subregion(&sphb->iommu_root,
>> (*ptcet)->bus_offset, spapr_tce_get_iommu(*ptcet));
>> 
>> +    if (!(*ptcet)->vfio_accel) { +        return 0; +    } +    ret
>> = vfio_container_spapr_set_liobn(&sphb->iommu_as, +
>> liobn, (*ptcet)->bus_offset); +    if (ret) { +
>> error_report("spapr-vfio: failed to create link to IOMMU"); +
>> ret = 0; +    } + return ret; }
>> 
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>> a26cbae..ec778d0 100644 --- a/hw/vfio/common.c +++
>> b/hw/vfio/common.c @@ -1053,3 +1053,36 @@ int
>> vfio_container_ioctl(AddressSpace *as,
>> 
>> return vfio_container_do_ioctl(as, req, param); } + +int
>> vfio_container_spapr_set_liobn(AddressSpace *as, +
>> uint64_t liobn, +                                   uint64_t
>> start_addr) +{ +#ifdef CONFIG_KVM +    int ret; +    struct
>> kvm_vfio_spapr_tce_liobn param = { +        .argsz = sizeof(param), 
>> +        .liobn = liobn, +        .start_addr = start_addr +    }; +
>> struct kvm_device_attr attr = { +        .group =
>> KVM_DEV_VFIO_GROUP, +        .attr =
>> KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN, +        .addr =
>> (uint64_t)(unsigned long)¶m, +    }; + +    if
>> (vfio_kvm_device_fd < 0) { +        return 0; +    } + +    ret =
>> ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr); +    if (ret)
>> { +        error_report("vfio: failed to setup liobn for a group:
>> %s", +                     strerror(errno)); +    } + +    return
>> ret; +#else +    return 0; +#endif +} diff --git
>> a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h index
>> 76b5744..8457933 100644 --- a/include/hw/vfio/vfio.h +++
>> b/include/hw/vfio/vfio.h @@ -6,4 +6,8 @@ extern int
>> vfio_container_ioctl(AddressSpace *as, int req, void *param);
>> 
>> +extern int vfio_container_spapr_set_liobn(AddressSpace *as, +
>> uint64_t liobn, +                                          uint64_t
>> start_addr); + #endif
> 
- -- 
Alexey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAEBAgAGBQJU4qk1AAoJEIYTPdgrwSC5VVAP/08MMoTSLOphkpCn2aIsvUyE
HGjhkkxzALk/E69kdLbsuOLjTngLj0B+CZJX5ArDitmdUVaPBZOJn52MoL+IUyg1
DNZeQH+2J7t+/UZmauHgkxLDWHgtbYZ0LnHOf5HZF/65hsCgb/771z6q0T0T2TJH
F3UhzUBoXmx18OJjsA17l1KQ9QeWGuWjNGcgzZQ5SqqfKRWZUYEwxWLCAXwFQEmC
Xy7JhJ65SgGlCkFNQf1UhoN0N5pxTBUXIj30mtmkfZN5okM68CXmA23tDsulgd+w
d0FTkWZWzegIkLLt/rOL+ILVZqV3pppEgc0ECb+yXtKuzBeaNDQNLXI+TGQVDxEa
tN9olWp8KVYUo5gircmVnM4/6DQe4pca72YvoCYXDe2fwtU5swaG5fL+0qWnn9OA
j0dG3XLJQfQ6XL75P5KQ42zltjOpfhAq5d6tEB2PwlkmyPih8x1iexq/uH/chGKD
h0eRo9xY6fkvW49Gf2jFJvSJ1PhUqIsEmIf4tLPcg5aRqYKWrVqNV57oDtVVLcRD
yewag/FNWk526bWQhH4Kpn1HGqmV3gIbnhGjprvKaqPUIAnyCTxb5dQN3kNTIGDs
rz+XXbJ2g5M3X6opO2OH65aSzCzjvd36JZbqh4jJPP4uctZfBjcDSLYNCl8OIGT8
RFewBFMpyzSV4SV1vJnn
=KtXz
-----END PGP SIGNATURE-----
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container
  2015-02-17  0:34     ` Alexey Kardashevskiy
@ 2015-02-17 23:52       ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-17 23:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 7301 bytes --]
On Tue, Feb 17, 2015 at 11:34:27AM +1100, Alexey Kardashevskiy wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 02/05/2015 03:19 PM, David Gibson wrote:
> > On Thu, Jan 29, 2015 at 08:27:27PM +1100, Alexey Kardashevskiy wrote:
> >> This enables multiple IOMMU groups in one VFIO container which
> >> means that multiple devices from different groups can share the same
> >> IOMMU table and locked pages counting can be done once as there is
> >> no need to have several containers for two or more groups.
> >> 
> >> This removes a group id from vfio_container_ioctl(). The kernel
> >> support is required for this.
> >> 
> >> This adds a check that there is just one VFIO container per PHB
> >> address space.
> >> 
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- 
> >> hw/ppc/spapr_pci_vfio.c |  9 +++------ hw/vfio/common.c        | 33
> >> +++++++++++++++------------------ include/hw/vfio/vfio.h  |  2 +- 3
> >> files changed, 19 insertions(+), 25 deletions(-)
> >> 
> >> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c index
> >> b20ac90..257181d 100644 --- a/hw/ppc/spapr_pci_vfio.c +++
> >> b/hw/ppc/spapr_pci_vfio.c @@ -33,11 +33,10 @@ static int
> >> spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb, uint32_t
> >> *dma32_window_size, uint64_t *dma64_window_size) { -
> >> sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct
> >> vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) }; int
> >> ret;
> >> 
> >> -    ret = vfio_container_ioctl(&sphb->iommu_as,
> >> svphb->iommugroupid, +    ret =
> >> vfio_container_ioctl(&sphb->iommu_as, VFIO_IOMMU_SPAPR_TCE_GET_INFO,
> >> &info);
> > 
> > 
> > Huh.. so vfio_container_ioctl() is actually only used by the
> > spapr_tce code.  What's it doing living in the common vfio code?
> 
> vfio_container_ioctl() converts the address space to a container fd. The
> code outside VFIO does not have an idea about these
> container/group/device fds.
Ah, ok I see.
> >> if (ret) { return ret; @@ -55,7 +54,6 @@ static int
> >> spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn, 
> >> uint32_t page_shift, uint32_t window_shift, sPAPRTCETable **ptcet) 
> >> { -    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); 
> >> struct vfio_iommu_spapr_tce_create create = { .argsz =
> >> sizeof(create), .page_shift = page_shift, @@ -65,7 +63,7 @@ static
> >> int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn, 
> >> }; int ret;
> >> 
> >> -    ret = vfio_container_ioctl(&sphb->iommu_as,
> >> svphb->iommugroupid, +    ret =
> >> vfio_container_ioctl(&sphb->iommu_as, VFIO_IOMMU_SPAPR_TCE_CREATE,
> >> &create); if (ret) { return ret; @@ -87,7 +85,6 @@ static int
> >> spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t liobn,
> >> 
> >> static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb,
> >> sPAPRTCETable *tcet) { -    sPAPRPHBVFIOState *svphb =
> >> SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); struct vfio_iommu_spapr_tce_remove
> >> remove = { .argsz = sizeof(remove), .start_addr = tcet->bus_offset 
> >> @@ -95,7 +92,7 @@ static int spapr_pci_vfio_ddw_remove(sPAPRPHBState
> >> *sphb, sPAPRTCETable *tcet) int ret;
> >> 
> >> spapr_pci_ddw_remove(sphb, tcet); -    ret =
> >> vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid, +    ret
> >> = vfio_container_ioctl(&sphb->iommu_as, VFIO_IOMMU_SPAPR_TCE_REMOVE,
> >> &remove);
> >> 
> >> return ret; diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
> >> 1cafcf8..a26cbae 100644 --- a/hw/vfio/common.c +++
> >> b/hw/vfio/common.c @@ -1011,34 +1011,31 @@ void
> >> vfio_put_base_device(VFIODevice *vbasedev) close(vbasedev->fd); }
> >> 
> >> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t
> >> groupid, +static int vfio_container_do_ioctl(AddressSpace *as, int
> >> req, void *param) { -    VFIOGroup *group; VFIOContainer
> >> *container; -    int ret = -1; +    int ret; +    VFIOAddressSpace
> >> *space;
> >> 
> >> -    group = vfio_get_group(groupid, as); -    if (!group) { -
> >> error_report("vfio: group %d not registered", groupid); -
> >> return ret; -    } +    space = vfio_get_address_space(as); +
> >> container = QLIST_FIRST(&space->containers);
> >> 
> >> -    container = group->container; -    if (group->container) { -
> >> ret = ioctl(container->fd, req, param); -        if (ret < 0) { -
> >> error_report("vfio: failed to ioctl container: ret=%d, %s", -
> >> ret, strerror(errno)); -        } +    if (!container ||
> >> QLIST_NEXT(container, next)) { +        error_report("vfio: multiple
> >> containers per PHB are not -    supported");
> > 
> > Shouldn't this be an assert?  It's qemu code that sets up the 
> > containers after all.
> 
> 
> The VFIO maintainer does not like asserts and I do not want to do things
> which he does not like unless I really have to :)
Ok.
> > Also the error message is not right for the case of !container.
> 
> Right, that I will fix.
> 
> 
> >> +        return -1; }
> >> 
> >> -    vfio_put_group(group); +    ret = ioctl(container->fd, req,
> >> param); +    if (ret < 0) { +        error_report("vfio: failed to
> >> ioctl container: ret=%d, %s", +                     ret,
> >> strerror(errno)); +    }
> >> 
> >> return ret; }
> >> 
> >> -int vfio_container_ioctl(AddressSpace *as, int32_t groupid, +int
> >> vfio_container_ioctl(AddressSpace *as, int req, void *param) { /* We
> >> allow only certain ioctls to the container */ @@ -1054,5 +1051,5 @@
> >> int vfio_container_ioctl(AddressSpace *as, int32_t groupid, return
> >> -1; }
> >> 
> >> -    return vfio_container_do_ioctl(as, groupid, req, param); +
> >> return vfio_container_do_ioctl(as, req, param); } diff --git
> >> a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h index
> >> 0b26cd8..76b5744 100644 --- a/include/hw/vfio/vfio.h +++
> >> b/include/hw/vfio/vfio.h @@ -3,7 +3,7 @@
> >> 
> >> #include "qemu/typedefs.h"
> >> 
> >> -extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid, 
> >> +extern int vfio_container_ioctl(AddressSpace *as, int req, void
> >> *param);
> >> 
> >> #endif
> > 
> 
> 
> - -- 
> Alexey
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> 
> iQIcBAEBAgAGBQJU4oyTAAoJEIYTPdgrwSC5SSUQAJH/KS2rnIXXmyZPnFI9FW+T
> X780EzHZ2OLdIMoxqSF2x4tpxL7TLgt+3ZBkqOhmjFhhPnMGEz4KnLRJYG4VAoZx
> iS5xfbKtR34Dxyz5flhbdTs1BXkUJli5nhiSiXMbQfgsn0ZG2czvvhee9WZExsAL
> 0EM1UdxWP4Y0TD03msRWjCfuUmtZnGF3F0iMf3Yv3uuwXvT5pFQWe6Wf0JJNPUDC
> X3+wvZeGl+gaX3mGbE0iXL4GiWnK+JqkGUt/CrEdnIVD9h0H6wiLY7s82QEkEsK3
> IXRJBfL+TXF/RgKAeCyeVZJzAxbw3PWsb4DKb4U4ByWOmdYxXEq35G99pxZEvPn0
> DHobPUNk1RyTrDVL9n5W7lbwcOc81xCA5A3Y3z2NnOaPt8J/wcYg6qigkvhDpMql
> PIr+1WHAOf++xUDf+EIXhubGSwqNM7FgUDyrvAEWMEUP+ddz74QN9h0LD7ZB/1lR
> svPTE+a/kM2biolzwIPTCdwQXroME7lujB1ynvOEwtiGi5BOjwFTHZ/J7P12iYaZ
> px5S8WI07cR7d9cyMFkeYrwxlBwj10i9NzFDP+1JngbzrVmpf15ekj4CFAxMo7gV
> ubjcQCvOMoCbiljm5Cg7yn1MjlFkp9ZnNAPKv/TVpw6Co4TgljiAMi+w0DsLXJ8y
> DS6iczjry50ZOTpUyWbj
> =UkLT
> -----END PGP SIGNATURE-----
> 
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering
  2015-02-17  2:14     ` Alexey Kardashevskiy
@ 2015-02-17 23:53       ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-17 23:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 2185 bytes --]
On Tue, Feb 17, 2015 at 01:14:33PM +1100, Alexey Kardashevskiy wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 02/02/2015 06:04 PM, David Gibson wrote:
> > On Thu, Jan 29, 2015 at 08:27:20PM +1100, Alexey Kardashevskiy
> wrote:
[snip]
> >> +    vfio_mem_unregister(container, vaddr, end - iova); +} + +const
> >> MemoryListener vfio_ram_listener = { +    .region_add =
> >> vfio_ram_region_add, +    .region_del = vfio_ram_region_del, +}; + 
> >> +static void vfio_spapr_listener_release(VFIOContainer *container) 
> >> +{ +
> >> memory_listener_unregister(&container->iommu_data.type1.listener); +
> >> memory_listener_unregister(&container->iommu_data.type1.ramlistener);
> >
> >> 
> > Accessing fields within type1 from a function whose name says it's 
> > spapr specific seems very wrong.
> 
> 
> Kind of ugly, yes. But we share the common memory listener with Type1 so...
> 
> 
> >> +} + int vfio_mmap_region(Object *obj, VFIORegion *region, 
> >> MemoryRegion *mem, MemoryRegion *submem, void **map, size_t size,
> >> off_t offset, @@ -705,6 +798,10 @@ static int
> >> vfio_connect_container(VFIOGroup *group, AddressSpace *as) goto
> >> free_container_exit; }
> >> 
> >> +        container->iommu_data.type1.ramlistener =
> >> vfio_ram_listener; +
> >> memory_listener_register(&container->iommu_data.type1.ramlistener, +
> >> &address_space_memory);
> > 
> > Why two separate listeners, rather than doing both jobs from a single
> > listener?
> 
> ... I actually like the idea to have this separated from the rest of the
> code. Furthermore, now I think we better have separate memory listeners
> for Type1 and SPAPR as the current vfio_listener_region_add()/del() look
> quite ugly trying to do different things depending on
> memory_region_is_iommu().
> 
> Any objection to separating SPAPR's listener (and merging it with the one
> introduced by this patch)?
Not from here, I think that sounds like a good idea.
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device
  2015-02-17  2:36     ` Alexey Kardashevskiy
@ 2015-02-17 23:56       ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2015-02-17 23:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alex Williamson, qemu-ppc, qemu-devel, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 2947 bytes --]
On Tue, Feb 17, 2015 at 01:36:38PM +1100, Alexey Kardashevskiy wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 02/05/2015 03:49 PM, David Gibson wrote:
> > On Thu, Jan 29, 2015 at 08:27:30PM +1100, Alexey Kardashevskiy wrote:
> >> TCE hypercalls (H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE) use a
> >> logical bus number (LIOBN) to identify which TCE table the request
> >> is addressed to. However VFIO kernel driver operates with IOMMU
> >> group IDs and has no idea about which LIOBN corresponds to which
> >> group. If the host kernel supports in-kernel acceleration for TCE
> >> calls, we have to provide the LIOBN to IOMMU mapping information.
> >> 
> >> This makes use of a VFIO KVM device's 
> >> KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN attribute to set the link 
> >> between LIOBN and IOMMU group.
> >> 
> >> The vfio_container_spapr_set_liobn() helper is implemented
> >> completely in vfio.c because kvm_vfio_spapr_tce_liobn needs a group
> >> fd and we do not want to share resources likes that outside vfio.c.
> > 
> > I thought you'd moved away from the idea of in-kernel TCE 
> > acceleration, since big DMA windows made it unnecessary.
> 
> 
> Not entirely. DDW may not be supported by some hardware (like nVidia
> being able to generate only 40bit DMA addresses without hacks). DMA
> memory registering is pretty much about it - when we preregister pages,
> the KVM acceleration becomes lot simpler as it does not have to take care
> of pinning or locked_vm accounting.
Ok, that makes sense.
> But I should have not posted it here anyway, too early :)
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- 
> >> hw/ppc/spapr_iommu.c    |  1 + hw/ppc/spapr_pci_vfio.c | 11
> >> +++++++++++ hw/vfio/common.c        | 33
> >> +++++++++++++++++++++++++++++++++ include/hw/vfio/vfio.h  |  4 ++++ 
> >> 4 files changed, 49 insertions(+)
> >> 
> >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c index
> >> 258f837..3de95d7 100644 --- a/hw/ppc/spapr_iommu.c +++
> >> b/hw/ppc/spapr_iommu.c @@ -142,6 +142,7 @@ static int
> >> spapr_tce_table_realize(DeviceState *dev) if (!tcet->table) { size_t
> >> table_size = tcet->nb_table * sizeof(uint64_t); tcet->table =
> >> g_malloc0(table_size); +        tcet->vfio_accel = false;
> > 
> > This should probably have a qdev prop so that it can be explicitly 
> > disabled on a per-device basis for testing.
> 
> 
> It is a TCE table's property, how can you disable it if it is not created
> via the command line?
Uh, yeah it would have to be a property on VIO devices and on VFIO
host bridges which the propagates the option to the TCE table.
Hrm, that does sound a bit messy now I think about it.
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply	[flat|nested] 40+ messages in thread
end of thread, other threads:[~2015-02-17 23:57 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-29  9:27 [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 01/18] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 02/18] spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe Alexey Kardashevskiy
2015-02-02  6:30   ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 03/18] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
2015-02-04 15:31   ` Alexander Graf
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 04/18] spapr_vio: " Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 05/18] spapr_pci: Make find_phb()/find_dev() public Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 06/18] spapr_iommu: Make spapr_tce_find_by_liobn() public Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
2015-02-02  6:37   ` David Gibson
2015-02-03  1:32     ` Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering Alexey Kardashevskiy
2015-02-02  7:04   ` David Gibson
2015-02-17  2:14     ` Alexey Kardashevskiy
2015-02-17 23:53       ` David Gibson
2015-02-17  2:20     ` [Qemu-devel] [PATCH v4 08/18] vfio: Add DMA memory registering [repost] Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 09/18] spapr_rtas: Reserve DDW RTAS token numbers Alexey Kardashevskiy
2015-02-02  7:09   ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 10/18] spapr_pci: Define DDW callbacks Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 11/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
2015-02-05  3:51   ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 12/18] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS handlers Alexey Kardashevskiy
2015-02-05  4:05   ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 13/18] spapr_pci: Advertise dynamic DMA windows to guest Alexey Kardashevskiy
2015-02-05  4:10   ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 14/18] vfio: Enable DDW ioctls to VFIO IOMMU driver Alexey Kardashevskiy
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 15/18] spapr_pci_vfio: Enable multiple groups per container Alexey Kardashevskiy
2015-02-05  4:19   ` David Gibson
2015-02-17  0:34     ` Alexey Kardashevskiy
2015-02-17 23:52       ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 16/18] spapr_rtas_ddw: Workaround broken LE guests Alexey Kardashevskiy
2015-02-05  4:23   ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 17/18] target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64 Alexey Kardashevskiy
2015-02-05  4:30   ` David Gibson
2015-01-29  9:27 ` [Qemu-devel] [PATCH v4 18/18] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
2015-02-05  4:49   ` David Gibson
2015-02-17  2:36     ` Alexey Kardashevskiy
2015-02-17 23:56       ` David Gibson
2015-01-30  4:01 ` [Qemu-devel] [PATCH v4 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).