* [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW)
@ 2014-08-15 10:12 Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal Alexey Kardashevskiy
` (12 more replies)
0 siblings, 13 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
At the moment the sPAPR PHB supports only a single 32-bit DMA window,
which is normally 1-2GB and not enough for high-performance devices.
The PAPR spec allows creating additional window(s) to support 64-bit
DMA and bigger page sizes.
This patchset adds DDW support for pseries. Matching host kernel changes
are required.
This was tested on a POWER8 system, which allows one additional DMA window
mapped at 0x800.0000.0000.0000 and supports 16MB pages.
Existing guests check for the DDW capability in the PHB's device tree; if it
is present, they request an additional window, map the entire guest RAM
using H_PUT_TCE/... hypercalls once at boot time and switch to direct DMA
operations.
TCE tables may still be big for guests backed by 64K pages, but they
are reasonably small for guests backed by 16MB pages.
Please comment. Thanks!
Changes:
v2:
* tested on emulated PHB
* removed "ddw" machine property, now it is a PHB property
* disabled by default
* defined a "pseries-2.2" machine which enables DDW by default
* fixed reset() and reference counting
Alexey Kardashevskiy (13):
qom: Make object_child_foreach safe for objects removal
spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
spapr_pci: Make find_phb()/find_dev() public
spapr_iommu: Make spapr_tce_find_by_liobn() public
spapr_pci: Introduce a liobn number generating macros
spapr_iommu: Implement free_table() helper
spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support
spapr_pci: Enable DDW
spapr_pci_vfio: Call spapr_pci::reset on reset
linux headers update for DDW
spapr_pci_vfio: Enable DDW
vfio: Enable DDW ioctls to VFIO IOMMU driver
spapr: Add pseries-2.2 machine with default "ddw" option
hw/misc/vfio.c | 4 +
hw/ppc/Makefile.objs | 3 +
hw/ppc/spapr.c | 29 +++++
hw/ppc/spapr_iommu.c | 19 ++-
hw/ppc/spapr_pci.c | 125 +++++++++++++++++--
hw/ppc/spapr_pci_vfio.c | 88 +++++++++++++-
hw/ppc/spapr_rtas_ddw.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
include/hw/pci-host/spapr.h | 30 +++++
include/hw/ppc/spapr.h | 11 +-
linux-headers/linux/vfio.h | 37 +++++-
qom/object.c | 4 +-
trace-events | 4 +
12 files changed, 611 insertions(+), 26 deletions(-)
create mode 100644 hw/ppc/spapr_rtas_ddw.c
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-19 0:39 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
` (11 subsequent siblings)
12 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
The current object_child_foreach() uses QTAILQ_FOREACH() to walk
through the children, which makes it unsafe to remove a child from
the callback.
This makes object_child_foreach() use QTAILQ_FOREACH_SAFE() instead.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
This went into Andreas's qom-next tree; it is here for reference only.
---
qom/object.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/qom/object.c b/qom/object.c
index 0e8267b..4a814dc 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -678,10 +678,10 @@ void object_class_foreach(void (*fn)(ObjectClass *klass, void *opaque),
int object_child_foreach(Object *obj, int (*fn)(Object *child, void *opaque),
void *opaque)
{
- ObjectProperty *prop;
+ ObjectProperty *prop, *next;
int ret = 0;
- QTAILQ_FOREACH(prop, &obj->properties, node) {
+ QTAILQ_FOREACH_SAFE(prop, &obj->properties, node, next) {
if (object_property_is_child(prop)) {
ret = fn(prop->opaque, opaque);
if (ret != 0) {
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-19 0:43 ` David Gibson
2014-08-27 9:27 ` Alexander Graf
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 03/13] spapr_pci: Make find_phb()/find_dev() public Alexey Kardashevskiy
` (10 subsequent siblings)
12 siblings, 2 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
The existing KVM_CREATE_SPAPR_TCE ioctl only supports windows of up to 4GB.
We are going to add support for huge DMA windows, so using this ioctl for
them would create a smaller window than requested and fail unexpectedly later.
This disables KVM_CREATE_SPAPR_TCE for windows whose size does not fit into
32 bits (4GB and larger). Since those windows are normally mapped once at
boot time, there will be no performance impact.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
hw/ppc/spapr_iommu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index f6e32a4..36f5d27 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -113,11 +113,11 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
static int spapr_tce_table_realize(DeviceState *dev)
{
sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
+ uint64_t window_size = (uint64_t)tcet->nb_table << tcet->page_shift;
- if (kvm_enabled()) {
+ if (kvm_enabled() && !(window_size >> 32)) {
tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
- tcet->nb_table <<
- tcet->page_shift,
+ window_size,
&tcet->fd,
tcet->vfio_accel);
}
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 03/13] spapr_pci: Make find_phb()/find_dev() public
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 04/13] spapr_iommu: Make spapr_tce_find_by_liobn() public Alexey Kardashevskiy
` (9 subsequent siblings)
12 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
This makes find_phb()/find_dev() public and renames them
to spapr_pci_find_phb()/spapr_pci_find_dev(), as they are going to
be used from other parts of QEMU such as VFIO DDW (dynamic DMA windows),
VFIO PCI error injection or VFIO EEH handling. In all these
cases there are RTAS calls which are addressed to BUID+config_addr
in IEEE1275 format.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/ppc/spapr_pci.c | 22 +++++++++++-----------
include/hw/pci-host/spapr.h | 4 ++++
2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e894f07..5c46c0d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -47,7 +47,7 @@
#define RTAS_TYPE_MSI 1
#define RTAS_TYPE_MSIX 2
-static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
+sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid)
{
sPAPRPHBState *sphb;
@@ -61,10 +61,10 @@ static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
return NULL;
}
-static PCIDevice *find_dev(sPAPREnvironment *spapr, uint64_t buid,
- uint32_t config_addr)
+PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
+ uint32_t config_addr)
{
- sPAPRPHBState *sphb = find_phb(spapr, buid);
+ sPAPRPHBState *sphb = spapr_pci_find_phb(spapr, buid);
PCIHostState *phb = PCI_HOST_BRIDGE(sphb);
int bus_num = (config_addr >> 16) & 0xFF;
int devfn = (config_addr >> 8) & 0xFF;
@@ -95,7 +95,7 @@ static void finish_read_pci_config(sPAPREnvironment *spapr, uint64_t buid,
return;
}
- pci_dev = find_dev(spapr, buid, addr);
+ pci_dev = spapr_pci_find_dev(spapr, buid, addr);
addr = rtas_pci_cfgaddr(addr);
if (!pci_dev || (addr % size) || (addr >= pci_config_size(pci_dev))) {
@@ -162,7 +162,7 @@ static void finish_write_pci_config(sPAPREnvironment *spapr, uint64_t buid,
return;
}
- pci_dev = find_dev(spapr, buid, addr);
+ pci_dev = spapr_pci_find_dev(spapr, buid, addr);
addr = rtas_pci_cfgaddr(addr);
if (!pci_dev || (addr % size) || (addr >= pci_config_size(pci_dev))) {
@@ -280,9 +280,9 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
}
/* Fins sPAPRPHBState */
- phb = find_phb(spapr, buid);
+ phb = spapr_pci_find_phb(spapr, buid);
if (phb) {
- pdev = find_dev(spapr, buid, config_addr);
+ pdev = spapr_pci_find_dev(spapr, buid, config_addr);
}
if (!phb || !pdev) {
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
@@ -381,9 +381,9 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
spapr_pci_msi *msi;
/* Find sPAPRPHBState */
- phb = find_phb(spapr, buid);
+ phb = spapr_pci_find_phb(spapr, buid);
if (phb) {
- pdev = find_dev(spapr, buid, config_addr);
+ pdev = spapr_pci_find_dev(spapr, buid, config_addr);
}
if (!phb || !pdev) {
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
@@ -557,7 +557,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
return;
}
- if (find_phb(spapr, sphb->buid)) {
+ if (spapr_pci_find_phb(spapr, sphb->buid)) {
error_setg(errp, "PCI host bridges must have unique BUIDs");
return;
}
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 32f0aa7..14c2ab0 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -122,4 +122,8 @@ void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr);
void spapr_pci_rtas_init(void);
+sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid);
+PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
+ uint32_t config_addr);
+
#endif /* __HW_SPAPR_PCI_H__ */
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 04/13] spapr_iommu: Make spapr_tce_find_by_liobn() public
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (2 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 03/13] spapr_pci: Make find_phb()/find_dev() public Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
` (8 subsequent siblings)
12 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
At the moment spapr_tce_find_by_liobn() is used by the H_PUT_TCE/...
handlers to find an IOMMU by LIOBN.
We are going to implement Dynamic DMA windows (DDW); the new code
will go to a new file and will use spapr_tce_find_by_liobn()
there too, so let's make it public.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/ppc/spapr_iommu.c | 2 +-
include/hw/ppc/spapr.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 36f5d27..588d442 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -40,7 +40,7 @@ enum sPAPRTCEAccess {
static QLIST_HEAD(spapr_tce_tables, sPAPRTCETable) spapr_tce_tables;
-static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
+sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
{
sPAPRTCETable *tcet;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 36e8e51..c9d6c6c 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -467,6 +467,7 @@ struct sPAPRTCETable {
QLIST_ENTRY(sPAPRTCETable) list;
};
+sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn);
void spapr_events_init(sPAPREnvironment *spapr);
void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (3 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 04/13] spapr_iommu: Make spapr_tce_find_by_liobn() public Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-19 0:44 ` David Gibson
2014-08-27 9:29 ` Alexander Graf
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 06/13] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
` (7 subsequent siblings)
12 siblings, 2 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
We are going to have multiple DMA windows per PHB and we want them to
migrate, so we need a predictable way of assigning LIOBNs.
This introduces a SPAPR_PCI_LIOBN() macro which makes up a LIOBN from
a fixed prefix, the PHB index (unique PHB id) and the window number.
This also introduces SPAPR_PCI_DMA_WINDOW_NUM() to extract the window
number from a LIOBN; it will be used in the next patch(es) to distinguish
the default 32bit window from dynamic windows.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
hw/ppc/spapr_pci.c | 2 +-
include/hw/ppc/spapr.h | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 5c46c0d..17eb0d8 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -529,7 +529,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
}
sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
- sphb->dma_liobn = SPAPR_PCI_BASE_LIOBN + sphb->index;
+ sphb->dma_liobn = SPAPR_PCI_LIOBN(sphb->index, 0);
windows_base = SPAPR_PCI_WINDOW_BASE
+ sphb->index * SPAPR_PCI_WINDOW_SPACING;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c9d6c6c..782519a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -443,7 +443,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
#define SPAPR_TCE_PAGE_MASK (SPAPR_TCE_PAGE_SIZE - 1)
#define SPAPR_VIO_BASE_LIOBN 0x00000000
-#define SPAPR_PCI_BASE_LIOBN 0x80000000
+#define SPAPR_PCI_LIOBN(i, n) (0x80000000 | ((i) << 8) | (n))
+#define SPAPR_PCI_DMA_WINDOW_NUM(liobn) ((liobn) & 0xff)
#define RTAS_ERROR_LOG_MAX 2048
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 06/13] spapr_iommu: Implement free_table() helper
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (4 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-26 6:16 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support Alexey Kardashevskiy
` (6 subsequent siblings)
12 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
Every sPAPRTCETable object holds an IOMMU memory region which holds
a reference to the sPAPRTCETable instance. So if we want to free
an sPAPRTCETable instance, calling object_unref() will not be enough,
as the embedded memory region still holds a reference; we need to break
the loop.
This adds a spapr_tce_free_table() helper which destroys the embedded
memory region and then calls object_unref() on the sPAPRTCETable instance,
which will now succeed. spapr_tce_new_table() now adds the table as
a child under a unique name derived from the LIOBN ("tce-table-%08X")
instead of the fixed "tce-table", so one owner can hold several tables.
While we are here, fix the spapr_tce_new_table() callers.
At the moment spapr_tce_new_table() references sPAPRTCETable twice:
once in object_new() and a second time in object_property_add_child().
The callers of spapr_tce_new_table() should take care of correct
referencing, so they now drop their reference with object_unref().
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
hw/ppc/spapr_iommu.c | 11 ++++++++++-
hw/ppc/spapr_pci.c | 2 ++
hw/ppc/spapr_pci_vfio.c | 2 ++
include/hw/ppc/spapr.h | 1 +
4 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 588d442..74949f4 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -147,6 +147,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
bool vfio_accel)
{
sPAPRTCETable *tcet;
+ char buf[32];
if (spapr_tce_find_by_liobn(liobn)) {
fprintf(stderr, "Attempted to create TCE table with duplicate"
@@ -165,13 +166,21 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
tcet->nb_table = nb_table;
tcet->vfio_accel = vfio_accel;
- object_property_add_child(OBJECT(owner), "tce-table", OBJECT(tcet), NULL);
+ snprintf(buf, sizeof(buf) - 1, "tce-table-%08X", tcet->liobn);
+ object_property_add_child(OBJECT(owner), buf, OBJECT(tcet), NULL);
object_property_set_bool(OBJECT(tcet), true, "realized", NULL);
return tcet;
}
+void spapr_tce_free_table(sPAPRTCETable *tcet)
+{
+ memory_region_destroy(&tcet->iommu);
+
+ object_unref(OBJECT(tcet));
+}
+
static void spapr_tce_table_finalize(Object *obj)
{
sPAPRTCETable *tcet = SPAPR_TCE_TABLE(obj);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 17eb0d8..aa20c36 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -654,6 +654,8 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
/* Register default 32bit DMA window */
memory_region_add_subregion(&sphb->iommu_root, 0,
spapr_tce_get_iommu(tcet));
+
+ object_unref(OBJECT(tcet));
}
static int spapr_phb_children_reset(Object *child, void *opaque)
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index d3bddf2..51b4314 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -69,6 +69,8 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
/* Register default 32bit DMA window */
memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
spapr_tce_get_iommu(tcet));
+
+ object_unref(OBJECT(tcet));
}
static void spapr_phb_vfio_reset(DeviceState *qdev)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 782519a..537072f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -477,6 +477,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
uint32_t page_shift,
uint32_t nb_table,
bool vfio_accel);
+void spapr_tce_free_table(sPAPRTCETable *tcet);
MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
void spapr_tce_set_bypass(sPAPRTCETable *tcet, bool bypass);
int spapr_dma_dt(void *fdt, int node_off, const char *propname,
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (5 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 06/13] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-26 7:06 ` David Gibson
2014-08-27 9:36 ` Alexander Graf
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 08/13] spapr_pci: Enable DDW Alexey Kardashevskiy
` (5 subsequent siblings)
12 siblings, 2 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
This adds support for the Dynamic DMA Windows (DDW) option defined by
the sPAPR specification, which allows additional DMA window(s)
that can support page sizes other than 4K.
The existing implementation of DDW in the guest tries to create one huge
DMA window with 64K or 16MB pages and map the entire guest RAM into it. If it
succeeds, the guest switches to dma_direct_ops and never calls
TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use
the entire RAM and not waste time on map/unmap.
This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from a type_init() callback.
These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PHB-specific code.
Since no PHB class implements the new callbacks in this patch, no functional
change is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* double loop squashed into a spapr_iommu_fixmask() helper
* added a @ddw_num counter to the PHB; it is used to generate the LIOBN
for a new window and is reset on a ddw-reset event
* added ULL to constants used in shift operations
* rtas_ibm_reset_pe_dma_window() and rtas_ibm_remove_pe_dma_window()
no longer remove windows themselves; the PHB callback has to, as it will
reuse the same code on guest reboot as well
---
hw/ppc/Makefile.objs | 3 +
hw/ppc/spapr_pci.c | 3 +-
hw/ppc/spapr_rtas_ddw.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
include/hw/pci-host/spapr.h | 19 +++
include/hw/ppc/spapr.h | 6 +-
trace-events | 4 +
6 files changed, 316 insertions(+), 2 deletions(-)
create mode 100644 hw/ppc/spapr_rtas_ddw.c
diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index edd44d0..9773294 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o
ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
obj-y += spapr_pci_vfio.o
endif
+ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
+obj-y += spapr_rtas_ddw.o
+endif
# PowerPC 4xx boards
obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index aa20c36..9b03d0d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -759,7 +759,7 @@ static int spapr_pci_post_load(void *opaque, int version_id)
static const VMStateDescription vmstate_spapr_pci = {
.name = "spapr_pci",
- .version_id = 2,
+ .version_id = 3,
.minimum_version_id = 2,
.pre_save = spapr_pci_pre_save,
.post_load = spapr_pci_post_load,
@@ -775,6 +775,7 @@ static const VMStateDescription vmstate_spapr_pci = {
VMSTATE_INT32(msi_devs_num, sPAPRPHBState),
VMSTATE_STRUCT_VARRAY_ALLOC(msi_devs, sPAPRPHBState, msi_devs_num, 0,
vmstate_spapr_pci_msi, spapr_pci_msi_mig),
+ VMSTATE_UINT32_V(ddw_num, sPAPRPHBState, 3),
VMSTATE_END_OF_LIST()
},
};
diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
new file mode 100644
index 0000000..2b5376a
--- /dev/null
+++ b/hw/ppc/spapr_rtas_ddw.c
@@ -0,0 +1,283 @@
+/*
+ * QEMU sPAPR Dynamic DMA windows support
+ *
+ * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw/ppc/spapr.h"
+#include "hw/pci-host/spapr.h"
+#include "trace.h"
+
+static uint32_t spapr_iommu_fixmask(struct ppc_one_seg_page_size *sps,
+ uint32_t query_mask)
+{
+ int i, j;
+ uint32_t mask = 0;
+ const struct { int shift; uint32_t mask; } masks[] = {
+ { 12, DDW_PGSIZE_4K },
+ { 16, DDW_PGSIZE_64K },
+ { 24, DDW_PGSIZE_16M },
+ { 25, DDW_PGSIZE_32M },
+ { 26, DDW_PGSIZE_64M },
+ { 27, DDW_PGSIZE_128M },
+ { 28, DDW_PGSIZE_256M },
+ { 34, DDW_PGSIZE_16G },
+ };
+
+ for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+ for (j = 0; j < ARRAY_SIZE(masks); ++j) {
+ if ((sps[i].page_shift == masks[j].shift) &&
+ (query_mask & masks[j].mask)) {
+ mask |= masks[j].mask;
+ }
+ }
+ }
+
+ return mask;
+}
+
+static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
+ sPAPREnvironment *spapr,
+ uint32_t token, uint32_t nargs,
+ target_ulong args,
+ uint32_t nret, target_ulong rets)
+{
+ CPUPPCState *env = &cpu->env;
+ sPAPRPHBState *sphb;
+ sPAPRPHBClass *spc;
+ uint64_t buid;
+ uint32_t addr, pgmask = 0;
+ uint32_t windows_available = 0, page_size_mask = 0;
+ long ret;
+
+ if ((nargs != 3) || (nret != 5)) {
+ goto param_error_exit;
+ }
+
+ buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+ addr = rtas_ld(args, 0);
+ sphb = spapr_pci_find_phb(spapr, buid);
+ if (!sphb) {
+ goto param_error_exit;
+ }
+
+ spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+ if (!spc->ddw_query) {
+ goto hw_error_exit;
+ }
+
+ ret = spc->ddw_query(sphb, &windows_available, &page_size_mask);
+ /* Work out supported page masks; done before tracing so that
+ * the trace reports the real value rather than 0 */
+ pgmask = spapr_iommu_fixmask(env->sps.sps, page_size_mask);
+ trace_spapr_iommu_ddw_query(buid, addr, windows_available,
+ page_size_mask, pgmask, ret);
+ if (ret) {
+ goto hw_error_exit;
+ }
+
+ rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+ rtas_st(rets, 1, windows_available);
+
+ /*
+ * This is "Largest contiguous block of TCEs allocated specifically
+ * for (that is, are reserved for) this PE".
+ * Return the maximum number as all RAM was in 4K pages.
+ */
+ rtas_st(rets, 2, ram_size >> SPAPR_TCE_PAGE_SHIFT);
+ rtas_st(rets, 3, pgmask);
+ rtas_st(rets, 4, pgmask); /* DMA migration mask */
+ return;
+
+hw_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+ return;
+
+param_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
+ sPAPREnvironment *spapr,
+ uint32_t token, uint32_t nargs,
+ target_ulong args,
+ uint32_t nret, target_ulong rets)
+{
+ sPAPRPHBState *sphb;
+ sPAPRPHBClass *spc;
+ sPAPRTCETable *tcet = NULL;
+ uint32_t addr, page_shift, window_shift, liobn;
+ uint64_t buid;
+ long ret;
+
+ if ((nargs != 5) || (nret != 4)) {
+ goto param_error_exit;
+ }
+
+ buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+ addr = rtas_ld(args, 0);
+ sphb = spapr_pci_find_phb(spapr, buid);
+ if (!sphb) {
+ goto param_error_exit;
+ }
+
+ spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+ if (!spc->ddw_create) {
+ goto hw_error_exit;
+ }
+
+ page_shift = rtas_ld(args, 3);
+ window_shift = rtas_ld(args, 4);
+ /* Default 32bit window#0 is always there so +1 */
+ liobn = SPAPR_PCI_LIOBN(sphb->index, sphb->ddw_num + 1);
+
+ ret = spc->ddw_create(sphb, page_shift, window_shift, liobn, &tcet);
+ trace_spapr_iommu_ddw_create(buid, addr, 1ULL << page_shift,
+ 1ULL << window_shift,
+ tcet ? tcet->bus_offset : 0xbaadf00d,
+ liobn, ret);
+ if (ret || !tcet) {
+ goto hw_error_exit;
+ }
+
+ sphb->ddw_num++;
+ rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+ rtas_st(rets, 1, liobn);
+ rtas_st(rets, 2, tcet->bus_offset >> 32);
+ rtas_st(rets, 3, tcet->bus_offset & ((uint32_t) -1));
+
+ object_unref(OBJECT(tcet));
+ return;
+
+hw_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+ return;
+
+param_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_remove_pe_dma_window(PowerPCCPU *cpu,
+ sPAPREnvironment *spapr,
+ uint32_t token, uint32_t nargs,
+ target_ulong args,
+ uint32_t nret, target_ulong rets)
+{
+ sPAPRPHBState *sphb;
+ sPAPRPHBClass *spc;
+ sPAPRTCETable *tcet;
+ uint32_t liobn;
+ long ret;
+
+ if ((nargs != 1) || (nret != 1)) {
+ goto param_error_exit;
+ }
+
+ liobn = rtas_ld(args, 0);
+ tcet = spapr_tce_find_by_liobn(liobn);
+ if (!tcet) {
+ goto param_error_exit;
+ }
+
+ sphb = (sPAPRPHBState *)object_dynamic_cast(OBJECT(tcet)->parent,
+ TYPE_SPAPR_PCI_HOST_BRIDGE);
+ if (!sphb) {
+ goto param_error_exit;
+ }
+ spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+ if (!spc->ddw_remove) {
+ goto hw_error_exit;
+ }
+
+ ret = spc->ddw_remove(sphb, tcet);
+ trace_spapr_iommu_ddw_remove(liobn, ret);
+ if (ret) {
+ goto hw_error_exit;
+ }
+
+ rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+ return;
+
+hw_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+ return;
+
+param_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_reset_pe_dma_window(PowerPCCPU *cpu,
+ sPAPREnvironment *spapr,
+ uint32_t token, uint32_t nargs,
+ target_ulong args,
+ uint32_t nret, target_ulong rets)
+{
+ sPAPRPHBState *sphb;
+ sPAPRPHBClass *spc;
+ uint64_t buid;
+ uint32_t addr;
+ long ret;
+
+ if ((nargs != 3) || (nret != 1)) {
+ goto param_error_exit;
+ }
+
+ buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+ addr = rtas_ld(args, 0);
+ sphb = spapr_pci_find_phb(spapr, buid);
+ if (!sphb) {
+ goto param_error_exit;
+ }
+
+ spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+ if (!spc->ddw_reset) {
+ goto hw_error_exit;
+ }
+
+ ret = spc->ddw_reset(sphb);
+ trace_spapr_iommu_ddw_reset(buid, addr, ret);
+ if (ret) {
+ goto hw_error_exit;
+ }
+
+ rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+ return;
+
+hw_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+ return;
+
+param_error_exit:
+ rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void spapr_rtas_ddw_init(void)
+{
+ spapr_rtas_register(RTAS_IBM_QUERY_PE_DMA_WINDOW,
+ "ibm,query-pe-dma-window",
+ rtas_ibm_query_pe_dma_window);
+ spapr_rtas_register(RTAS_IBM_CREATE_PE_DMA_WINDOW,
+ "ibm,create-pe-dma-window",
+ rtas_ibm_create_pe_dma_window);
+ spapr_rtas_register(RTAS_IBM_REMOVE_PE_DMA_WINDOW,
+ "ibm,remove-pe-dma-window",
+ rtas_ibm_remove_pe_dma_window);
+ spapr_rtas_register(RTAS_IBM_RESET_PE_DMA_WINDOW,
+ "ibm,reset-pe-dma-window",
+ rtas_ibm_reset_pe_dma_window);
+}
+
+type_init(spapr_rtas_ddw_init)
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 14c2ab0..a1fdbb2 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -49,6 +49,24 @@ struct sPAPRPHBClass {
PCIHostBridgeClass parent_class;
void (*finish_realize)(sPAPRPHBState *sphb, Error **errp);
+
+/* sPAPR spec defined pagesize mask values */
+#define DDW_PGSIZE_4K 0x01
+#define DDW_PGSIZE_64K 0x02
+#define DDW_PGSIZE_16M 0x04
+#define DDW_PGSIZE_32M 0x08
+#define DDW_PGSIZE_64M 0x10
+#define DDW_PGSIZE_128M 0x20
+#define DDW_PGSIZE_256M 0x40
+#define DDW_PGSIZE_16G 0x80
+
+ int (*ddw_query)(sPAPRPHBState *sphb, uint32_t *windows_available,
+ uint32_t *page_size_mask);
+ int (*ddw_create)(sPAPRPHBState *sphb, uint32_t page_shift,
+ uint32_t window_shift, uint32_t liobn,
+ sPAPRTCETable **ptcet);
+ int (*ddw_remove)(sPAPRPHBState *sphb, sPAPRTCETable *tcet);
+ int (*ddw_reset)(sPAPRPHBState *sphb);
};
typedef struct spapr_pci_msi {
@@ -73,6 +91,7 @@ struct sPAPRPHBState {
MemoryRegion memwindow, iowindow;
uint32_t dma_liobn;
+ uint32_t ddw_num;
AddressSpace iommu_as;
MemoryRegion iommu_root;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 537072f..4140615 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -383,8 +383,12 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
#define RTAS_GET_SENSOR_STATE (RTAS_TOKEN_BASE + 0x1D)
#define RTAS_IBM_CONFIGURE_CONNECTOR (RTAS_TOKEN_BASE + 0x1E)
#define RTAS_IBM_OS_TERM (RTAS_TOKEN_BASE + 0x1F)
+#define RTAS_IBM_QUERY_PE_DMA_WINDOW (RTAS_TOKEN_BASE + 0x20)
+#define RTAS_IBM_CREATE_PE_DMA_WINDOW (RTAS_TOKEN_BASE + 0x21)
+#define RTAS_IBM_REMOVE_PE_DMA_WINDOW (RTAS_TOKEN_BASE + 0x22)
+#define RTAS_IBM_RESET_PE_DMA_WINDOW (RTAS_TOKEN_BASE + 0x23)
-#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x20)
+#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x24)
/* RTAS ibm,get-system-parameter token values */
#define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS 20
diff --git a/trace-events b/trace-events
index 11a17a8..5b54fbd 100644
--- a/trace-events
+++ b/trace-events
@@ -1213,6 +1213,10 @@ spapr_iommu_indirect(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t iobaN
spapr_iommu_stuff(uint64_t liobn, uint64_t ioba, uint64_t tce_value, uint64_t npages, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tcevalue=0x%"PRIx64" npages=%"PRId64" ret=%"PRId64
spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u mask=%x"
spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) "liobn=%"PRIx64" tcet=%p table=%p fd=%d"
+spapr_iommu_ddw_query(uint64_t buid, uint32_t cfgaddr, uint32_t wa, uint32_t pgz, uint32_t pgz_fixed, long ret) "buid=%"PRIx64" addr=%"PRIx32", %u windows available, sizes %"PRIx32", fixed %"PRIx32", ret = %ld"
+spapr_iommu_ddw_create(uint64_t buid, uint32_t cfgaddr, unsigned long long pg_size, unsigned long long req_size, uint64_t start, uint32_t liobn, long ret) "buid=%"PRIx64" addr=%"PRIx32", page size=0x%llx, requested=0x%llx, start addr=%"PRIx64", liobn=%"PRIx32", ret = %ld"
+spapr_iommu_ddw_remove(uint32_t liobn, long ret) "liobn=%"PRIx32", ret = %ld"
+spapr_iommu_ddw_reset(uint64_t buid, uint32_t cfgaddr, long ret) "buid=%"PRIx64" addr=%"PRIx32", ret = %ld"
# hw/ppc/ppc.c
ppc_tb_adjust(uint64_t offs1, uint64_t offs2, int64_t diff, int64_t seconds) "adjusted from 0x%"PRIx64" to 0x%"PRIx64", diff %"PRId64" (%"PRId64"s)"
--
2.0.0
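The DDW_PGSIZE_* values defined in include/hw/pci-host/spapr.h above give each IOMMU page size one mask bit, in ascending order of size. The sketch below uses hypothetical helper names that are not part of the patch. It maps a page shift, as passed to ddw_create(), to its mask bit, and picks the largest size out of a ddw_query() result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Page-size mask bits, copied from include/hw/pci-host/spapr.h above */
#define DDW_PGSIZE_4K   0x01
#define DDW_PGSIZE_64K  0x02
#define DDW_PGSIZE_16M  0x04
#define DDW_PGSIZE_32M  0x08
#define DDW_PGSIZE_64M  0x10
#define DDW_PGSIZE_128M 0x20
#define DDW_PGSIZE_256M 0x40
#define DDW_PGSIZE_16G  0x80

/* Hypothetical table: bit i of the mask corresponds to ddw_page_shifts[i] */
static const uint32_t ddw_page_shifts[] = {
    12, /* 4K   */
    16, /* 64K  */
    24, /* 16M  */
    25, /* 32M  */
    26, /* 64M  */
    27, /* 128M */
    28, /* 256M */
    34, /* 16G  */
};
#define DDW_NUM_PGSIZES (sizeof(ddw_page_shifts) / sizeof(ddw_page_shifts[0]))

/* Return the mask bit for page_shift, or 0 if that page size has no bit */
static uint32_t ddw_pgsize_bit_from_shift(uint32_t page_shift)
{
    for (size_t i = 0; i < DDW_NUM_PGSIZES; i++) {
        if (ddw_page_shifts[i] == page_shift) {
            return (uint32_t)1 << i;
        }
    }
    return 0;
}

/* Return the largest page shift advertised in a non-empty mask */
static uint32_t ddw_largest_page_shift(uint32_t page_size_mask)
{
    assert(page_size_mask != 0 &&
           page_size_mask < ((uint32_t)1 << DDW_NUM_PGSIZES));
    for (size_t i = DDW_NUM_PGSIZES; i-- > 0;) {
        if (page_size_mask & ((uint32_t)1 << i)) {
            return ddw_page_shifts[i];
        }
    }
    return 0; /* not reached: the mask has at least one valid bit */
}
```

With the emulated PHB, which reports only DDW_PGSIZE_16M, ddw_largest_page_shift() would return 24.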
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [Qemu-devel] [RFC PATCH v2 08/13] spapr_pci: Enable DDW
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (6 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-26 7:14 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 09/13] spapr_pci_vfio: Call spapr_pci::reset on reset Alexey Kardashevskiy
` (4 subsequent siblings)
12 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
This implements DDW for the emulated PHB and advertises DDW in the device tree.
Since QEMU does not emulate any 64bit DMA capable device, the following hack
was used to enable 64bit DMA on E1000 for testing:
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 0fc29a0..131f80a 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -240,6 +240,7 @@ static const uint32_t mac_reg_init[] = {
[STATUS] = 0x80000000 | E1000_STATUS_GIO_MASTER_ENABLE |
E1000_STATUS_ASDV | E1000_STATUS_MTXCKOK |
E1000_STATUS_SPEED_1000 | E1000_STATUS_FD |
+ E1000_STATUS_PCIX_MODE |
E1000_STATUS_LU,
[MANC] = E1000_MANC_EN_MNG2HOST | E1000_MANC_RCV_TCO_EN |
E1000_MANC_ARP_EN | E1000_MANC_0298_EN |
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* tested on hacked emulated E1000
* implemented DDW reset on the PHB reset
* spapr_pci_ddw_remove/spapr_pci_ddw_reset are public for reuse by VFIO
---
hw/ppc/spapr_pci.c | 96 +++++++++++++++++++++++++++++++++++++++++++++
include/hw/pci-host/spapr.h | 7 ++++
2 files changed, 103 insertions(+)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 9b03d0d..cba8d9d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -22,6 +22,7 @@
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
+#include "sysemu/sysemu.h"
#include "hw/hw.h"
#include "hw/pci/pci.h"
#include "hw/pci/msi.h"
@@ -498,6 +499,70 @@ void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr)
}
/*
+ * Dynamic DMA windows
+ */
+static int spapr_pci_ddw_query(sPAPRPHBState *sphb,
+ uint32_t *windows_available,
+ uint32_t *page_size_mask)
+{
+ *windows_available = 1;
+ *page_size_mask = DDW_PGSIZE_16M;
+
+ return 0;
+}
+
+static int spapr_pci_ddw_create(sPAPRPHBState *sphb, uint32_t page_shift,
+ uint32_t window_shift, uint32_t liobn,
+ sPAPRTCETable **ptcet)
+{
+ *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
+ SPAPR_PCI_TCE64_START, page_shift,
+ 1ULL << (window_shift - page_shift),
+ true);
+ if (!*ptcet) {
+ return -1;
+ }
+ memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
+ spapr_tce_get_iommu(*ptcet));
+
+ return 0;
+}
+
+int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
+{
+ memory_region_del_subregion(&sphb->iommu_root,
+ spapr_tce_get_iommu(tcet));
+ spapr_tce_free_table(tcet);
+
+ return 0;
+}
+
+static int spapr_pci_remove_ddw_cb(Object *child, void *opaque)
+{
+ sPAPRTCETable *tcet;
+
+ tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
+
+ /* Delete all dynamic windows, i.e. every window except the default one (#0) */
+ if (tcet && SPAPR_PCI_DMA_WINDOW_NUM(tcet->liobn)) {
+ sPAPRPHBState *sphb = opaque;
+ sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+
+ spc->ddw_remove(sphb, tcet);
+ }
+
+ return 0;
+}
+
+int spapr_pci_ddw_reset(sPAPRPHBState *sphb)
+{
+ object_child_foreach(OBJECT(sphb), spapr_pci_remove_ddw_cb, sphb);
+ sphb->ddw_num = 0;
+
+ return 0;
+}
+
+/*
* PHB PCI device
*/
static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
@@ -671,6 +736,12 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
static void spapr_phb_reset(DeviceState *qdev)
{
+ sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(qdev);
+
+ if (spc->ddw_reset) {
+ spc->ddw_reset(SPAPR_PCI_HOST_BRIDGE(qdev));
+ }
+
/* Reset the IOMMU state */
object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
}
@@ -685,6 +756,7 @@ static Property spapr_phb_properties[] = {
DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
SPAPR_PCI_IO_WIN_SIZE),
+ DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, false),
DEFINE_PROP_END_OF_LIST(),
};
@@ -802,6 +874,10 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
dc->cannot_instantiate_with_device_add_yet = false;
spc->finish_realize = spapr_phb_finish_realize;
+ spc->ddw_query = spapr_pci_ddw_query;
+ spc->ddw_create = spapr_pci_ddw_create;
+ spc->ddw_remove = spapr_pci_ddw_remove;
+ spc->ddw_reset = spapr_pci_ddw_reset;
}
static const TypeInfo spapr_phb_info = {
@@ -885,6 +961,13 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
uint32_t interrupt_map_mask[] = {
cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
uint32_t interrupt_map[PCI_SLOT_MAX * PCI_NUM_PINS][7];
+ uint32_t ddw_applicable[] = {
+ RTAS_IBM_QUERY_PE_DMA_WINDOW,
+ RTAS_IBM_CREATE_PE_DMA_WINDOW,
+ RTAS_IBM_REMOVE_PE_DMA_WINDOW
+ };
+ uint32_t ddw_extensions[] = { 1, RTAS_IBM_RESET_PE_DMA_WINDOW };
+ sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(phb);
/* Start populating the FDT */
sprintf(nodename, "pci@%" PRIx64, phb->buid);
@@ -914,6 +997,19 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
_FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
_FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS));
+ /* Dynamic DMA window */
+ if (phb->ddw_enabled &&
+ spc->ddw_query && spc->ddw_create && spc->ddw_remove) {
+ _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-applicable", &ddw_applicable,
+ sizeof(ddw_applicable)));
+
+ if (spc->ddw_reset) {
+ /* When enabled, the guest will remove the default 32bit window */
+ _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-extensions",
+ &ddw_extensions, sizeof(ddw_extensions)));
+ }
+ }
+
/* Build the interrupt-map, this must matches what is done
* in pci_spapr_map_irq
*/
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index a1fdbb2..0b40fcf 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -104,6 +104,8 @@ struct sPAPRPHBState {
int32_t msi_devs_num;
spapr_pci_msi_mig *msi_devs;
+ bool ddw_enabled;
+
QLIST_ENTRY(sPAPRPHBState) list;
};
@@ -126,6 +128,9 @@ struct sPAPRPHBVFIOState {
#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
+/* Default 64bit dynamic window offset */
+#define SPAPR_PCI_TCE64_START 0x8000000000000000ULL
+
static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
{
return xics_get_qirq(spapr->icp, phb->lsi_table[pin].irq);
@@ -144,5 +149,7 @@ void spapr_pci_rtas_init(void);
sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid);
PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
uint32_t config_addr);
+int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet);
+int spapr_pci_ddw_reset(sPAPRPHBState *sphb);
#endif /* __HW_SPAPR_PCI_H__ */
--
2.0.0
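spapr_pci_ddw_create() sizes the new table as 1ULL << (window_shift - page_shift) entries. The sketch below repeats that arithmetic standalone, assuming one 8-byte TCE per IOMMU page. The helper names are hypothetical, and the 64 GiB window (window_shift = 36) in the usage note is an arbitrary example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of TCEs covering a 2^window_shift byte window with 2^page_shift byte
 * IOMMU pages; same expression as in spapr_pci_ddw_create() */
static uint64_t ddw_tce_entries(uint32_t page_shift, uint32_t window_shift)
{
    assert(window_shift >= page_shift && window_shift - page_shift < 64);
    return 1ULL << (window_shift - page_shift);
}

/* Table footprint, assuming one 8-byte (64-bit) TCE per IOMMU page */
static uint64_t ddw_table_bytes(uint32_t page_shift, uint32_t window_shift)
{
    return ddw_tce_entries(page_shift, window_shift) * sizeof(uint64_t);
}
```

For a 64 GiB window, 16 MiB pages need 4096 TCEs (32 KiB of table), while 64 KiB pages need 2^20 TCEs (8 MiB). Large IOMMU pages therefore keep these tables small.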
* [Qemu-devel] [RFC PATCH v2 09/13] spapr_pci_vfio: Call spapr_pci::reset on reset
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (7 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 08/13] spapr_pci: Enable DDW Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-26 6:55 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW Alexey Kardashevskiy
` (3 subsequent siblings)
12 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
This enables use of the parent class reset() callback in VFIO.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
I honestly do not remember why I did not do this when I added VFIO in the first place...
---
hw/ppc/spapr_pci_vfio.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 51b4314..11b4272 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -73,18 +73,12 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
object_unref(OBJECT(tcet));
}
-static void spapr_phb_vfio_reset(DeviceState *qdev)
-{
- /* Do nothing */
-}
-
static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
dc->props = spapr_phb_vfio_properties;
- dc->reset = spapr_phb_vfio_reset;
spc->finish_realize = spapr_phb_vfio_finish_realize;
}
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (8 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 09/13] spapr_pci_vfio: Call spapr_pci::reset on reset Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-18 17:42 ` Alex Williamson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW Alexey Kardashevskiy
` (2 subsequent siblings)
12 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
Since the changes are not in upstream yet, no tag or branch is specified here.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
linux-headers/linux/vfio.h | 37 ++++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 26c218e..f0aa97d 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -448,13 +448,48 @@ struct vfio_iommu_type1_dma_unmap {
*/
struct vfio_iommu_spapr_tce_info {
__u32 argsz;
- __u32 flags; /* reserved for future use */
+ __u32 flags;
+#define VFIO_IOMMU_SPAPR_TCE_FLAG_DDW 1 /* Support dynamic windows */
__u32 dma32_window_start; /* 32 bit window start (bytes) */
__u32 dma32_window_size; /* 32 bit window size (bytes) */
};
#define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
+/*
+ * Dynamic DMA windows
+ */
+struct vfio_iommu_spapr_tce_query {
+ __u32 argsz;
+ /* out */
+ __u32 windows_available;
+ __u32 page_size_mask;
+};
+#define VFIO_IOMMU_SPAPR_TCE_QUERY _IO(VFIO_TYPE, VFIO_BASE + 17)
+
+struct vfio_iommu_spapr_tce_create {
+ __u32 argsz;
+ /* in */
+ __u32 page_shift;
+ __u32 window_shift;
+ /* out */
+ __u64 start_addr;
+
+};
+#define VFIO_IOMMU_SPAPR_TCE_CREATE _IO(VFIO_TYPE, VFIO_BASE + 18)
+
+struct vfio_iommu_spapr_tce_remove {
+ __u32 argsz;
+ /* in */
+ __u64 start_addr;
+};
+#define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 19)
+
+struct vfio_iommu_spapr_tce_reset {
+ __u32 argsz;
+};
+#define VFIO_IOMMU_SPAPR_TCE_RESET _IO(VFIO_TYPE, VFIO_BASE + 20)
+
/* ***************************************************************** */
#endif /* VFIO_H */
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (9 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-26 7:19 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option Alexey Kardashevskiy
12 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
This implements DDW for VFIO. Host kernel support is required for this.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* remove()/reset() callbacks use spapr_pci's ones
---
hw/ppc/spapr_pci_vfio.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 11b4272..79df716 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -71,6 +71,88 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
spapr_tce_get_iommu(tcet));
object_unref(OBJECT(tcet));
+
+ if (sphb->ddw_enabled) {
+ sphb->ddw_enabled = !!(info.flags & VFIO_IOMMU_SPAPR_TCE_FLAG_DDW);
+ }
+}
+
+static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
+ uint32_t *windows_available,
+ uint32_t *page_size_mask)
+{
+ sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+ struct vfio_iommu_spapr_tce_query query = { .argsz = sizeof(query) };
+ int ret;
+
+ ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+ VFIO_IOMMU_SPAPR_TCE_QUERY, &query);
+ if (ret) {
+ return ret;
+ }
+
+ *windows_available = query.windows_available;
+ *page_size_mask = query.page_size_mask;
+
+ return ret;
+}
+
+static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t page_shift,
+ uint32_t window_shift, uint32_t liobn,
+ sPAPRTCETable **ptcet)
+{
+ sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+ struct vfio_iommu_spapr_tce_create create = {
+ .argsz = sizeof(create),
+ .page_shift = page_shift,
+ .window_shift = window_shift,
+ .start_addr = 0
+ };
+ int ret;
+
+ ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+ VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
+ if (ret) {
+ return ret;
+ }
+
+ *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
+ create.start_addr, page_shift,
+ 1ULL << (window_shift - page_shift),
+ true);
+ memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
+ spapr_tce_get_iommu(*ptcet));
+
+ return ret;
+}
+
+static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
+{
+ sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+ struct vfio_iommu_spapr_tce_remove remove = {
+ .argsz = sizeof(remove),
+ .start_addr = tcet->bus_offset
+ };
+ int ret;
+
+ spapr_pci_ddw_remove(sphb, tcet);
+ ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+ VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
+
+ return ret;
+}
+
+static int spapr_pci_vfio_ddw_reset(sPAPRPHBState *sphb)
+{
+ sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+ struct vfio_iommu_spapr_tce_reset reset = { .argsz = sizeof(reset) };
+ int ret;
+
+ spapr_pci_ddw_reset(sphb);
+ ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
+ VFIO_IOMMU_SPAPR_TCE_RESET, &reset);
+
+ return ret;
}
static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
@@ -80,6 +162,10 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
dc->props = spapr_phb_vfio_properties;
spc->finish_realize = spapr_phb_vfio_finish_realize;
+ spc->ddw_query = spapr_pci_vfio_ddw_query;
+ spc->ddw_create = spapr_pci_vfio_ddw_create;
+ spc->ddw_remove = spapr_pci_vfio_ddw_remove;
+ spc->ddw_reset = spapr_pci_vfio_ddw_reset;
}
static const TypeInfo spapr_phb_vfio_info = {
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (10 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-26 7:20 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option Alexey Kardashevskiy
12 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
This enables DDW RTAS-related ioctls in VFIO.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
hw/misc/vfio.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index ba08adb..5b95cfa 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -4453,6 +4453,10 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
switch (req) {
case VFIO_CHECK_EXTENSION:
case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+ case VFIO_IOMMU_SPAPR_TCE_QUERY:
+ case VFIO_IOMMU_SPAPR_TCE_CREATE:
+ case VFIO_IOMMU_SPAPR_TCE_REMOVE:
+ case VFIO_IOMMU_SPAPR_TCE_RESET:
break;
default:
/* Return an error on unknown requests */
--
2.0.0
* [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
` (11 preceding siblings ...)
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver Alexey Kardashevskiy
@ 2014-08-15 10:12 ` Alexey Kardashevskiy
2014-08-27 9:44 ` Alexander Graf
2014-08-27 9:44 ` Alexander Graf
12 siblings, 2 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-15 10:12 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Alexander Graf, Gavin Shan, Alex Williamson,
qemu-ppc, David Gibson
This defines a new "pseries" machine version which enables DDW
(dynamic DMA windows) by default.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
hw/ppc/spapr.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5c92707..507fd8a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1685,10 +1685,39 @@ static const TypeInfo spapr_machine_2_1_info = {
.class_init = spapr_machine_2_1_class_init,
};
+static void spapr_machine_2_2_class_init(ObjectClass *oc, void *data)
+{
+ MachineClass *mc = MACHINE_CLASS(oc);
+ static GlobalProperty compat_props[] = {
+ {
+ .driver = TYPE_SPAPR_PCI_HOST_BRIDGE,
+ .property = "ddw",
+ .value = stringify(on),
+ },
+ {
+ .driver = TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE,
+ .property = "ddw",
+ .value = stringify(on),
+ }
+ };
+
+ mc->name = "pseries-2.2";
+ mc->desc = "pSeries Logical Partition (PAPR compliant) v2.2";
+ mc->is_default = 0;
+ mc->compat_props = compat_props;
+}
+
+static const TypeInfo spapr_machine_2_2_info = {
+ .name = TYPE_SPAPR_MACHINE "2.2",
+ .parent = TYPE_SPAPR_MACHINE,
+ .class_init = spapr_machine_2_2_class_init,
+};
+
static void spapr_machine_register_types(void)
{
type_register_static(&spapr_machine_info);
type_register_static(&spapr_machine_2_1_info);
+ type_register_static(&spapr_machine_2_2_info);
}
type_init(spapr_machine_register_types)
--
2.0.0
* Re: [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW Alexey Kardashevskiy
@ 2014-08-18 17:42 ` Alex Williamson
2014-08-20 7:49 ` Alexey Kardashevskiy
0 siblings, 1 reply; 41+ messages in thread
From: Alex Williamson @ 2014-08-18 17:42 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: David Gibson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On Fri, 2014-08-15 at 20:12 +1000, Alexey Kardashevskiy wrote:
> Since the changes are not in upstream yet, no tag or branch is specified here.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> linux-headers/linux/vfio.h | 37 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 26c218e..f0aa97d 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -448,13 +448,48 @@ struct vfio_iommu_type1_dma_unmap {
> */
> struct vfio_iommu_spapr_tce_info {
> __u32 argsz;
> - __u32 flags; /* reserved for future use */
> + __u32 flags;
> +#define VFIO_IOMMU_SPAPR_TCE_FLAG_DDW 1 /* Support dynamic windows */
> __u32 dma32_window_start; /* 32 bit window start (bytes) */
> __u32 dma32_window_size; /* 32 bit window size (bytes) */
> };
>
> #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>
> +/*
> + * Dynamic DMA windows
> + */
> +struct vfio_iommu_spapr_tce_query {
> + __u32 argsz;
> + /* out */
> + __u32 windows_available;
> + __u32 page_size_mask;
> +};
Why do we need a new ioctl for this vs extending tce_info to include it?
That's sort of the point of including argsz and flags in the ioctl.
> +#define VFIO_IOMMU_SPAPR_TCE_QUERY _IO(VFIO_TYPE, VFIO_BASE + 17)
> +
> +struct vfio_iommu_spapr_tce_create {
> + __u32 argsz;
> + /* in */
> + __u32 page_shift;
> + __u32 window_shift;
> + /* out */
> + __u64 start_addr;
> +
> +};
> +#define VFIO_IOMMU_SPAPR_TCE_CREATE _IO(VFIO_TYPE, VFIO_BASE + 18)
> +
> +struct vfio_iommu_spapr_tce_remove {
> + __u32 argsz;
> + /* in */
> + __u64 start_addr;
> +};
> +#define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 19)
> +
> +struct vfio_iommu_spapr_tce_reset {
> + __u32 argsz;
> +};
> +#define VFIO_IOMMU_SPAPR_TCE_RESET _IO(VFIO_TYPE, VFIO_BASE + 20)
> +
argsz by itself seems rather pointless if we don't have a flags field to
augment the structure. Thanks,
Alex
> /* ***************************************************************** */
>
> #endif /* VFIO_H */
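Alex's point about argsz is easiest to see from the validation pattern VFIO ioctl handlers commonly use. The handler copies the minimum size this version understands, then rejects a short argsz or any unknown flag bit, so later versions can extend the structure safely. The sketch below is a userspace simulation. It uses a hypothetical ddw_remove_args structure that adds a flags field, and buf/buflen stand in for the user pointer the kernel would read with copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* End offset of MEMBER in TYPE, like the kernel's offsetofend() helper */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical variant of vfio_iommu_spapr_tce_remove with a flags field */
struct ddw_remove_args {
    uint32_t argsz;
    uint32_t flags;      /* no flags defined yet, so it must be zero */
    uint64_t start_addr;
};

/* flags occupies bytes 4..7, which would otherwise be padding before
 * start_addr wherever uint64_t is 8-byte aligned */
_Static_assert(offsetof(struct ddw_remove_args, start_addr) == 8,
               "start_addr follows argsz and flags");

#define DDW_REMOVE_KNOWN_FLAGS 0u

static int ddw_remove_check(const void *buf, size_t buflen,
                            struct ddw_remove_args *out)
{
    const size_t minsz = offsetofend(struct ddw_remove_args, start_addr);

    assert(buf != NULL && out != NULL);
    if (buflen < minsz) {
        return -EFAULT;   /* simulated failed copy from the caller */
    }
    memcpy(out, buf, minsz);
    if (out->argsz < minsz) {
        return -EINVAL;   /* caller built an older, shorter structure */
    }
    if (out->flags & ~DDW_REMOVE_KNOWN_FLAGS) {
        return -EINVAL;   /* unknown flags must not be silently ignored */
    }
    return 0;
}
```

On ABIs where __u64 is 8-byte aligned, such as ppc64, the proposed vfio_iommu_spapr_tce_remove already has a 4-byte hole after argsz, so adding flags there would not grow the structure.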
* Re: [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal Alexey Kardashevskiy
@ 2014-08-19 0:39 ` David Gibson
0 siblings, 0 replies; 41+ messages in thread
From: David Gibson @ 2014-08-19 0:39 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 635 bytes --]
On Fri, Aug 15, 2014 at 08:12:23PM +1000, Alexey Kardashevskiy wrote:
> The current object_child_foreach() uses QTAILQ_FOREACH() to walk
> through children, which makes removing children from the callback
> impossible.
>
> This makes object_child_foreach() use QTAILQ_FOREACH_SAFE().
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Seems like a good idea to me.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
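This change matters for spapr_pci_ddw_reset(), whose callback frees TCE tables while object_child_foreach() is still walking the children. The sketch below uses a hand-rolled singly linked list with hypothetical names rather than QEMU's QTAILQ macros. It shows the same idea QTAILQ_FOREACH_SAFE() relies on: the next pointer is read before the callback runs, so the callback may free the current element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct child {
    int window_num;           /* 0 = default 32bit window */
    struct child *next;
};

struct parent {
    struct child *head;
};

/* Unlink c from p and free it, as a removal callback would */
static void child_remove(struct parent *p, struct child *c)
{
    struct child **pp = &p->head;

    while (*pp != c) {
        assert(*pp != NULL);  /* c must be on the list */
        pp = &(*pp)->next;
    }
    *pp = c->next;
    free(c);
}

typedef void (*child_fn)(struct parent *p, struct child *c);

/* "Safe" walk: next is saved before fn() runs, so fn() may free c */
static void child_foreach_safe(struct parent *p, child_fn fn)
{
    struct child *c, *next;

    for (c = p->head; c != NULL; c = next) {
        next = c->next;
        fn(p, c);
    }
}

/* Like spapr_pci_remove_ddw_cb(): drop every window except #0 */
static void remove_dynamic(struct parent *p, struct child *c)
{
    if (c->window_num != 0) {
        child_remove(p, c);
    }
}
```

With a plain walk that reads c->next after fn() returns, the second iteration would dereference freed memory.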
* Re: [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
@ 2014-08-19 0:43 ` David Gibson
2014-08-20 8:09 ` Alexey Kardashevskiy
2014-08-27 9:27 ` Alexander Graf
1 sibling, 1 reply; 41+ messages in thread
From: David Gibson @ 2014-08-19 0:43 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]
On Fri, Aug 15, 2014 at 08:12:24PM +1000, Alexey Kardashevskiy wrote:
> The existing KVM_CREATE_SPAPR_TCE ioctl only supports 4G windows max.
> We are going to add huge DMA window support, so this would create a small
> window and unexpectedly fail later.
>
> This disables KVM_CREATE_SPAPR_TCE for windows bigger than 4GB. Since
> those windows are normally mapped at boot time, there will be no
> performance impact.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
I think perhaps the crucial point here is that the window size
parameter to the kernel ioctl() is only 32-bit, so there's no way of
expressing a TCE window > 4GB.
> ---
> hw/ppc/spapr_iommu.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index f6e32a4..36f5d27 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -113,11 +113,11 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
> static int spapr_tce_table_realize(DeviceState *dev)
> {
> sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
> + uint64_t window_size = tcet->nb_table << tcet->page_shift;
tcet->nb_table is only 32-bit itself, so this isn't going to work as
intended without a cast.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
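David's point about tcet->nb_table can be checked with a small standalone example. It assumes a 32-bit int and treats both nb_table and page_shift as uint32_t (page_shift's real type is not shown in this thread). Under those assumptions the shift is done in 32-bit unsigned arithmetic and wraps before the result is widened to uint64_t. Casting the left operand first fixes it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "uint64_t window_size = tcet->nb_table << tcet->page_shift;":
 * with a 32-bit int the shift is done on a 32-bit unsigned value and wraps
 * modulo 2^32 before the result is widened to 64 bits. */
static uint64_t window_size_unfixed(uint32_t nb_table, uint32_t page_shift)
{
    assert(page_shift < 32); /* larger shifts of a 32-bit value are undefined */
    return nb_table << page_shift;
}

/* Widening the left operand first keeps all 64 bits */
static uint64_t window_size_fixed(uint32_t nb_table, uint32_t page_shift)
{
    assert(page_shift < 64);
    return (uint64_t)nb_table << page_shift;
}
```

A 4096-entry table of 16 MiB pages (a 64 GiB window) comes out as 0 from the unfixed expression. A 2 GiB window of 4 KiB pages still fits in 32 bits, so a default-sized 32bit window would not reveal the problem.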
* Re: [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
@ 2014-08-19 0:44 ` David Gibson
2014-08-27 9:29 ` Alexander Graf
1 sibling, 0 replies; 41+ messages in thread
From: David Gibson @ 2014-08-19 0:44 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 852 bytes --]
On Fri, Aug 15, 2014 at 08:12:27PM +1000, Alexey Kardashevskiy wrote:
> We are going to have multiple DMA windows per PHB and we want them to
> migrate so we need a predictable way of assigning LIOBNs.
>
> This introduces a macro which makes up a LIOBN from a fixed prefix,
> PHB index (unique PHB id) and window number.
>
> This also introduces SPAPR_PCI_DMA_WINDOW_NUM() to extract the window number
> from a LIOBN; it will be used in the next patch(es) to distinguish the default
> 32bit window from dynamic windows.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Looks sane enough.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
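The macros from patch 05 are not quoted in this thread, so the bit layout below is an assumption chosen for illustration. It puts a fixed prefix in the top bit, the PHB index in bits 23..8 and the window number in the low byte, and the helper names are hypothetical. It shows why a deterministic LIOBN survives migration: the same PHB index and window number give the same LIOBN on both sides. It also shows why a removal pass like spapr_pci_remove_ddw_cb() only needs a nonzero window-number test.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical, for illustration only):
 *   bit  31     fixed prefix
 *   bits 23..8  PHB index
 *   bits  7..0  DMA window number (0 = default 32bit window) */
#define LIOBN_PREFIX 0x80000000u

static uint32_t liobn_make(uint32_t phb_index, uint32_t window_num)
{
    assert(phb_index <= 0xffff && window_num <= 0xff);
    return LIOBN_PREFIX | (phb_index << 8) | window_num;
}

static uint32_t liobn_window_num(uint32_t liobn)
{
    return liobn & 0xff;
}

static uint32_t liobn_phb_index(uint32_t liobn)
{
    return (liobn >> 8) & 0xffff;
}
```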
* Re: [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW
2014-08-18 17:42 ` Alex Williamson
@ 2014-08-20 7:49 ` Alexey Kardashevskiy
2014-08-20 19:44 ` Alex Williamson
0 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-20 7:49 UTC (permalink / raw)
To: Alex Williamson
Cc: David Gibson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On 08/19/2014 03:42 AM, Alex Williamson wrote:
> On Fri, 2014-08-15 at 20:12 +1000, Alexey Kardashevskiy wrote:
>> Since the changes are not in upstream yet, no tag or branch is specified here.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> linux-headers/linux/vfio.h | 37 ++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 36 insertions(+), 1 deletion(-)
>>
>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>> index 26c218e..f0aa97d 100644
>> --- a/linux-headers/linux/vfio.h
>> +++ b/linux-headers/linux/vfio.h
>> @@ -448,13 +448,48 @@ struct vfio_iommu_type1_dma_unmap {
>> */
>> struct vfio_iommu_spapr_tce_info {
>> __u32 argsz;
>> - __u32 flags; /* reserved for future use */
>> + __u32 flags;
>> +#define VFIO_IOMMU_SPAPR_TCE_FLAG_DDW 1 /* Support dynamic windows */
>> __u32 dma32_window_start; /* 32 bit window start (bytes) */
>> __u32 dma32_window_size; /* 32 bit window size (bytes) */
>> };
>>
>> #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>
>> +/*
>> + * Dynamic DMA windows
>> + */
>> +struct vfio_iommu_spapr_tce_query {
>> + __u32 argsz;
>> + /* out */
>> + __u32 windows_available;
>> + __u32 page_size_mask;
>> +};
>
> Why do we need a new ioctl for this vs extending tce_info to include it?
> That's sort of the point of including argsz and flags in the ioctl.
It is not relevant now, but I can imagine that these numbers may change
after multiple calls of create()/remove().
>
>> +#define VFIO_IOMMU_SPAPR_TCE_QUERY _IO(VFIO_TYPE, VFIO_BASE + 17)
>> +
>> +struct vfio_iommu_spapr_tce_create {
>> + __u32 argsz;
>> + /* in */
>> + __u32 page_shift;
>> + __u32 window_shift;
>> + /* out */
>> + __u64 start_addr;
>> +
>> +};
>> +#define VFIO_IOMMU_SPAPR_TCE_CREATE _IO(VFIO_TYPE, VFIO_BASE + 18)
>> +
>> +struct vfio_iommu_spapr_tce_remove {
>> + __u32 argsz;
>> + /* in */
>> + __u64 start_addr;
>> +};
>> +#define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 19)
>> +
>> +struct vfio_iommu_spapr_tce_reset {
>> + __u32 argsz;
>> +};
>> +#define VFIO_IOMMU_SPAPR_TCE_RESET _IO(VFIO_TYPE, VFIO_BASE + 20)
>> +
>
> argsz by itself seems rather pointless if we don't have a flags field to
> augment the structure. Thanks,
Add flags and check it for zero, or remove argsz? I cannot choose, please help :)
> Alex
>
>> /* ***************************************************************** */
>>
>> #endif /* VFIO_H */
>
>
>
--
Alexey
* Re: [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
2014-08-19 0:43 ` David Gibson
@ 2014-08-20 8:09 ` Alexey Kardashevskiy
0 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-20 8:09 UTC (permalink / raw)
To: David Gibson
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On 08/19/2014 10:43 AM, David Gibson wrote:
> On Fri, Aug 15, 2014 at 08:12:24PM +1000, Alexey Kardashevskiy wrote:
>> The existing KVM_CREATE_SPAPR_TCE ioctl only supports 4GB windows max.
>> We are going to add huge DMA windows support, so this would create a small
>> window which would unexpectedly fail later.
>>
>> This disables KVM_CREATE_SPAPR_TCE for windows bigger than 4GB. Since
>> those windows are normally mapped at the boot time, there will be no
>> performance impact.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> I think perhaps the crucial point here is that the window size
> parameter to the kernel ioctl() is only 32-bit, so there's no way of
> expressing a TCE window > 4GB.
That's exactly the point. I'll steal these bits and the commit log will start with:
===
The existing KVM_CREATE_SPAPR_TCE ioctl only supports 4GB windows max as
the window size parameter to the kernel ioctl() is only 32-bit, so
there's no way of expressing a TCE window > 4GB.
===
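To make the 32-bit limit concrete, here is a minimal sketch of the decision the patch effectively makes: only use the in-kernel table when the window size survives a round trip through a 32-bit field. `struct tce_window_arg32` and `window_fits_32bit()` are hypothetical names for illustration, not the real KVM structure or a QEMU helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ioctl argument whose window size field is
 * only 32 bits wide; this is NOT the real KVM structure. */
struct tce_window_arg32 {
    uint32_t window_size;   /* bytes; cannot represent 4GB or more */
};

/* Returns 1 if window_size can be passed through the 32-bit field
 * unchanged, i.e. an in-kernel table could be requested for it. */
static int window_fits_32bit(uint64_t window_size)
{
    struct tce_window_arg32 arg = { .window_size = (uint32_t)window_size };

    return (uint64_t)arg.window_size == window_size;
}
```

A window of exactly 4GB (`1ULL << 32`) already fails the check, which is why the cutoff is "4GB or more" rather than "more than 4GB".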
>> ---
>> hw/ppc/spapr_iommu.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index f6e32a4..36f5d27 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -113,11 +113,11 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>> static int spapr_tce_table_realize(DeviceState *dev)
>> {
>> sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
>> + uint64_t window_size = tcet->nb_table << tcet->page_shift;
>
> tcet->nb_table is only 32-bit itself, so this isn't going to work as
> intended without a cast.
Oh. Thanks. /me is thinking how to catch this kind of error from a script...
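A minimal illustration of the bug David points out, assuming both `nb_table` and `page_shift` are `uint32_t` (the function names are hypothetical). The shift is evaluated in 32-bit unsigned arithmetic and only then widened to 64 bits, so the high bits are already lost; casting one operand to `uint64_t` before shifting fixes it.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: nb_table << page_shift is computed as a 32-bit unsigned value
 * (wrapping modulo 2^32) and only then converted to uint64_t.
 * page_shift must be < 32 for this to be defined at all. */
static uint64_t window_size_buggy(uint32_t nb_table, uint32_t page_shift)
{
    return nb_table << page_shift;
}

/* Fixed: widen first, then shift in 64-bit arithmetic. */
static uint64_t window_size_fixed(uint32_t nb_table, uint32_t page_shift)
{
    return (uint64_t)nb_table << page_shift;
}
```

With `nb_table = 1 << 20` and 16MB pages (`page_shift = 24`), the buggy version returns 0 instead of 2^44 (16TB), while small windows give the same result either way, which is why the bug is easy to miss in testing.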
--
Alexey
* Re: [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW
2014-08-20 7:49 ` Alexey Kardashevskiy
@ 2014-08-20 19:44 ` Alex Williamson
2014-08-21 2:47 ` Alexey Kardashevskiy
0 siblings, 1 reply; 41+ messages in thread
From: Alex Williamson @ 2014-08-20 19:44 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: David Gibson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On Wed, 2014-08-20 at 17:49 +1000, Alexey Kardashevskiy wrote:
> On 08/19/2014 03:42 AM, Alex Williamson wrote:
> > On Fri, 2014-08-15 at 20:12 +1000, Alexey Kardashevskiy wrote:
> >> Since the changes are not in upstream yet, no tag or branch is specified here.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >> linux-headers/linux/vfio.h | 37 ++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 36 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> >> index 26c218e..f0aa97d 100644
> >> --- a/linux-headers/linux/vfio.h
> >> +++ b/linux-headers/linux/vfio.h
> >> @@ -448,13 +448,48 @@ struct vfio_iommu_type1_dma_unmap {
> >> */
> >> struct vfio_iommu_spapr_tce_info {
> >> __u32 argsz;
> >> - __u32 flags; /* reserved for future use */
> >> + __u32 flags;
> >> +#define VFIO_IOMMU_SPAPR_TCE_FLAG_DDW 1 /* Support dynamic windows */
> >> __u32 dma32_window_start; /* 32 bit window start (bytes) */
> >> __u32 dma32_window_size; /* 32 bit window size (bytes) */
> >> };
> >>
> >> #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >>
> >> +/*
> >> + * Dynamic DMA windows
> >> + */
> >> +struct vfio_iommu_spapr_tce_query {
> >> + __u32 argsz;
> >> + /* out */
> >> + __u32 windows_available;
> >> + __u32 page_size_mask;
> >> +};
> >
> > Why do we need a new ioctl for this vs extending tce_info to include it?
> > That's sort of the point of including argsz and flags in the ioctl.
>
>
> It is not relevant now, but I can imagine that these numbers may change
> depending on multiple calls of create()/remove().
Why does that matter?
> >> +#define VFIO_IOMMU_SPAPR_TCE_QUERY _IO(VFIO_TYPE, VFIO_BASE + 17)
> >> +
> >> +struct vfio_iommu_spapr_tce_create {
> >> + __u32 argsz;
> >> + /* in */
> >> + __u32 page_shift;
> >> + __u32 window_shift;
> >> + /* out */
> >> + __u64 start_addr;
> >> +
> >> +};
> >> +#define VFIO_IOMMU_SPAPR_TCE_CREATE _IO(VFIO_TYPE, VFIO_BASE + 18)
> >> +
> >> +struct vfio_iommu_spapr_tce_remove {
> >> + __u32 argsz;
> >> + /* in */
> >> + __u64 start_addr;
> >> +};
> >> +#define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 19)
> >> +
> >> +struct vfio_iommu_spapr_tce_reset {
> >> + __u32 argsz;
> >> +};
> >> +#define VFIO_IOMMU_SPAPR_TCE_RESET _IO(VFIO_TYPE, VFIO_BASE + 20)
> >> +
> >
> > argsz by itself seems rather pointless if we don't have a flags field to
> > augment the structure. Thanks,
>
>
> Add flags and check for zero or remove it? Cannot choose, please help :)
Do we really need to hash this out again? Almost every vfio ioctl
includes argsz and flags. Please continue to do this unless you have a
good reason otherwise. Thanks,
Alex
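For reference, a minimal sketch of the `argsz` + `flags` convention Alex refers to, applied to a remove-style argument. The struct, `OFFSETOFEND` and `ddw_remove_validate()` are hypothetical; the pattern (copy only the known minimum size, reject a short `argsz`, reject unknown flags) mirrors how existing VFIO ioctl handlers validate their input, with `memcpy()` standing in for the kernel's `copy_from_user()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout following the usual VFIO argsz + flags pattern;
 * not the structure that was eventually merged. */
struct ddw_remove_args {
    uint32_t argsz;       /* set by the caller to sizeof(struct) */
    uint32_t flags;       /* must be zero until a flag is defined */
    uint64_t start_addr;  /* in: bus address of the window to remove */
};

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define OFFSETOFEND(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

/* Kernel-style validation of a caller-supplied argument buffer. */
static int ddw_remove_validate(const void *ubuf, struct ddw_remove_args *out)
{
    const size_t minsz = OFFSETOFEND(struct ddw_remove_args, start_addr);
    struct ddw_remove_args arg;

    memcpy(&arg, ubuf, minsz);   /* copy only the fields this version knows */
    if (arg.argsz < minsz) {
        return -EINVAL;          /* caller's structure is too short */
    }
    if (arg.flags != 0) {
        return -EINVAL;          /* reject flags we do not understand */
    }
    *out = arg;
    return 0;
}
```

Because `argsz` tells the kernel how much the caller knows about and `flags` must be zero today, later fields and flag bits can be added without breaking older userspace.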
* Re: [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW
2014-08-20 19:44 ` Alex Williamson
@ 2014-08-21 2:47 ` Alexey Kardashevskiy
0 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-21 2:47 UTC (permalink / raw)
To: Alex Williamson
Cc: David Gibson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On 08/21/2014 05:44 AM, Alex Williamson wrote:
> On Wed, 2014-08-20 at 17:49 +1000, Alexey Kardashevskiy wrote:
>> On 08/19/2014 03:42 AM, Alex Williamson wrote:
>>> On Fri, 2014-08-15 at 20:12 +1000, Alexey Kardashevskiy wrote:
>>>> Since the changes are not in upstream yet, no tag or branch is specified here.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>> linux-headers/linux/vfio.h | 37 ++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 36 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>>>> index 26c218e..f0aa97d 100644
>>>> --- a/linux-headers/linux/vfio.h
>>>> +++ b/linux-headers/linux/vfio.h
>>>> @@ -448,13 +448,48 @@ struct vfio_iommu_type1_dma_unmap {
>>>> */
>>>> struct vfio_iommu_spapr_tce_info {
>>>> __u32 argsz;
>>>> - __u32 flags; /* reserved for future use */
>>>> + __u32 flags;
>>>> +#define VFIO_IOMMU_SPAPR_TCE_FLAG_DDW 1 /* Support dynamic windows */
>>>> __u32 dma32_window_start; /* 32 bit window start (bytes) */
>>>> __u32 dma32_window_size; /* 32 bit window size (bytes) */
>>>> };
>>>>
>>>> #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>>>
>>>> +/*
>>>> + * Dynamic DMA windows
>>>> + */
>>>> +struct vfio_iommu_spapr_tce_query {
>>>> + __u32 argsz;
>>>> + /* out */
>>>> + __u32 windows_available;
>>>> + __u32 page_size_mask;
>>>> +};
>>>
>>> Why do we need a new ioctl for this vs extending tce_info to include it?
>>> That's sort of the point of including argsz and flags in the ioctl.
>>
>>
>> It is not relevant now, but I can imagine that these numbers may change
>> depending on multiple calls of create()/remove().
>
> Why does that matter?
Well, I could imagine some hardware which would be able to have 2
small windows or 1 big window, and only after userspace created one
window could the host kernel say whether that window was small enough to
still allow a big one, or something like that.
On the other hand, since we do not support this anyway, I do not think
we ever will, and if we do I have no idea what form it will take, I'll
remove it for now.
>>>> +#define VFIO_IOMMU_SPAPR_TCE_QUERY _IO(VFIO_TYPE, VFIO_BASE + 17)
>>>> +
>>>> +struct vfio_iommu_spapr_tce_create {
>>>> + __u32 argsz;
>>>> + /* in */
>>>> + __u32 page_shift;
>>>> + __u32 window_shift;
>>>> + /* out */
>>>> + __u64 start_addr;
>>>> +
>>>> +};
>>>> +#define VFIO_IOMMU_SPAPR_TCE_CREATE _IO(VFIO_TYPE, VFIO_BASE + 18)
>>>> +
>>>> +struct vfio_iommu_spapr_tce_remove {
>>>> + __u32 argsz;
>>>> + /* in */
>>>> + __u64 start_addr;
>>>> +};
>>>> +#define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 19)
>>>> +
>>>> +struct vfio_iommu_spapr_tce_reset {
>>>> + __u32 argsz;
>>>> +};
>>>> +#define VFIO_IOMMU_SPAPR_TCE_RESET _IO(VFIO_TYPE, VFIO_BASE + 20)
>>>> +
>>>
>>> argsz by itself seems rather pointless if we don't have a flags field to
>>> augment the structure. Thanks,
>>
>>
>> Add flags and check for zero or remove it? Cannot choose, please help :)
>
> Do we really need to hash this out again? Almost every vfio ioctl
> includes argsz and flags. Please continue to do this unless you have a
> good reason otherwise. Thanks,
Will do. Thanks!
--
Alexey
* Re: [Qemu-devel] [RFC PATCH v2 06/13] spapr_iommu: Implement free_table() helper
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 06/13] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
@ 2014-08-26 6:16 ` David Gibson
2014-08-26 7:04 ` Alexey Kardashevskiy
0 siblings, 1 reply; 41+ messages in thread
From: David Gibson @ 2014-08-26 6:16 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 1270 bytes --]
On Fri, Aug 15, 2014 at 08:12:28PM +1000, Alexey Kardashevskiy wrote:
> Every sPAPRTCETable object holds an IOMMU memory region which holds
> a reference to the sPAPRTCETable instance. So if we want to free
> an sPAPRTCETable instance, calling object_unref() will not be enough
> as embedded memory region will hold the reference and we need to break
> the loop.
>
> This adds a spapr_tce_free_table() helper which destroys the embedded
> memory region and then calls object_unref() on the sPAPRTCETable instance
> which will succeed now. The helper adds a new child under unique name
> derived from LIOBN.
>
> As we are here, fix spapr_tce_new_table() callers.
> At the moment spapr_tce_new_table() references sPAPRTCETable twice -
> once in object_new() and second time in object_property_add_child().
> The callers of spapr_tce_new_table() should take care of correct
> referencing.
So I've been trying to determine if there's any way to avoid this by
not constructing the reference loop in the first place, but all the
qom is breaking my brain.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
* Re: [Qemu-devel] [RFC PATCH v2 09/13] spapr_pci_vfio: Call spapr_pci::reset on reset
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 09/13] spapr_pci_vfio: Call spapr_pci::reset on reset Alexey Kardashevskiy
@ 2014-08-26 6:55 ` David Gibson
0 siblings, 0 replies; 41+ messages in thread
From: David Gibson @ 2014-08-26 6:55 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 681 bytes --]
On Fri, Aug 15, 2014 at 08:12:31PM +1000, Alexey Kardashevskiy wrote:
> This enables use of the parent class reset() callback in VFIO.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> I honestly do not remember why I did not do this when I added VFIO
> at the first place...
This or equivalent probably belongs in the main commit message,
otherwise it has no justification for the change at all.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Re: [Qemu-devel] [RFC PATCH v2 06/13] spapr_iommu: Implement free_table() helper
2014-08-26 6:16 ` David Gibson
@ 2014-08-26 7:04 ` Alexey Kardashevskiy
0 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-26 7:04 UTC (permalink / raw)
To: David Gibson
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On 08/26/2014 04:16 PM, David Gibson wrote:
> On Fri, Aug 15, 2014 at 08:12:28PM +1000, Alexey Kardashevskiy wrote:
>> Every sPAPRTCETable object holds an IOMMU memory region which holds
>> a reference to the sPAPRTCETable instance. So if we want to free
>> an sPAPRTCETable instance, calling object_unref() will not be enough
>> as embedded memory region will hold the reference and we need to break
>> the loop.
>>
>> This adds a spapr_tce_free_table() helper which destroys the embedded
>> memory region and then calls object_unref() on the sPAPRTCETable instance
>> which will succeed now. The helper adds a new child under unique name
>> derived from LIOBN.
>>
>> As we are here, fix spapr_tce_new_table() callers.
>> At the moment spapr_tce_new_table() references sPAPRTCETable twice -
>> once in object_new() and second time in object_property_add_child().
>> The callers of spapr_tce_new_table() should take care of correct
>> referencing.
>
> So I've been trying to determine if there's any way to avoid this by
> not constructing the reference loop in the first place, but all the
> qom is breaking my brain.
Well. I could have added an additional object to sPAPRTCETable (just a
pointer in the struct, not a QOM child!) and used it as the owner of the
MemoryRegion. It would not hold a reference to the table. But since I do not
grasp the idea of having an owner for a MemoryRegion, this could break
something I cannot even imagine :)
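To see why a plain unref is not enough, here is a toy model of the loop described in the commit message. It is deliberately NOT the QOM or MemoryRegion API: `struct toy_table`, `struct toy_region` and the `toy_*` functions are hypothetical, and "freeing" is just a flag so the outcome can be checked. The embedded region pins its owner, so the creator's unref leaves the count at 1; the free helper first tears down the region (dropping its reference) and only then unrefs.

```c
#include <assert.h>

struct toy_table;

/* Embedded region that holds a reference on its owner while alive. */
struct toy_region {
    struct toy_table *owner;
    int alive;
};

/* Refcounted table embedding the region, like sPAPRTCETable. */
struct toy_table {
    int refcount;
    int freed;                 /* set instead of calling free() */
    struct toy_region region;
};

static void toy_unref(struct toy_table *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0) {
        t->freed = 1;
    }
}

static void toy_table_init(struct toy_table *t)
{
    t->refcount = 1;           /* the creator's reference */
    t->freed = 0;
    t->region.owner = t;       /* the embedded region pins its owner... */
    t->region.alive = 1;
    t->refcount++;             /* ...which closes the loop */
}

static void toy_region_destroy(struct toy_region *r)
{
    if (r->alive) {
        r->alive = 0;
        toy_unref(r->owner);   /* drop the region's reference */
    }
}

/* Analogue of a free_table() helper: break the loop, then unref. */
static void toy_free_table(struct toy_table *t)
{
    toy_region_destroy(&t->region);
    toy_unref(t);
}
```

Alexey's alternative (a separate owner object that is not a QOM child) corresponds to pointing `region.owner` somewhere other than the table, so the table's own count never includes the region's reference.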
--
Alexey
* Re: [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support Alexey Kardashevskiy
@ 2014-08-26 7:06 ` David Gibson
2014-08-27 9:36 ` Alexander Graf
1 sibling, 0 replies; 41+ messages in thread
From: David Gibson @ 2014-08-26 7:06 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 5329 bytes --]
On Fri, Aug 15, 2014 at 08:12:29PM +1000, Alexey Kardashevskiy wrote:
> This adds support for the Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification, which allows additional DMA window(s)
> supporting page sizes other than 4K.
>
> The existing implementation of DDW in the guest tries to create one huge
> DMA window with 64K or 16MB pages and map the entire guest RAM into it. If it
> succeeds, the guest switches to dma_direct_ops and never calls
> TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use
> the entire RAM and not waste time on map/unmap.
>
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
>
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PHB.
>
> Since no PHB class implements the new callbacks in this patch, no functional
> change is expected.
[snip]
> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
> + sPAPREnvironment *spapr,
> + uint32_t token, uint32_t nargs,
> + target_ulong args,
> + uint32_t nret, target_ulong rets)
> +{
> + sPAPRPHBState *sphb;
> + sPAPRPHBClass *spc;
> + sPAPRTCETable *tcet = NULL;
> + uint32_t addr, page_shift, window_shift, liobn;
> + uint64_t buid;
> + long ret;
> +
> + if ((nargs != 5) || (nret != 4)) {
> + goto param_error_exit;
> + }
> +
> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> + addr = rtas_ld(args, 0);
> + sphb = spapr_pci_find_phb(spapr, buid);
> + if (!sphb) {
> + goto param_error_exit;
> + }
> +
> + spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> + if (!spc->ddw_create) {
> + goto hw_error_exit;
> + }
> +
> + page_shift = rtas_ld(args, 3);
> + window_shift = rtas_ld(args, 4);
> + /* Default 32bit window#0 is always there so +1 */
> + liobn = SPAPR_PCI_LIOBN(sphb->index, sphb->ddw_num + 1);
> +
> + ret = spc->ddw_create(sphb, page_shift, window_shift, liobn, &tcet);
> + trace_spapr_iommu_ddw_create(buid, addr, 1ULL << page_shift,
> + 1ULL << window_shift,
> + tcet ? tcet->bus_offset : 0xbaadf00d,
> + liobn, ret);
> + if (ret || !tcet) {
> + goto hw_error_exit;
> + }
> +
> + sphb->ddw_num++;
You increment ddw_num here...
[snip]
> +static void rtas_ibm_remove_pe_dma_window(PowerPCCPU *cpu,
> + sPAPREnvironment *spapr,
> + uint32_t token, uint32_t nargs,
> + target_ulong args,
> + uint32_t nret, target_ulong rets)
> +{
> + sPAPRPHBState *sphb;
> + sPAPRPHBClass *spc;
> + sPAPRTCETable *tcet;
> + uint32_t liobn;
> + long ret;
> +
> + if ((nargs != 1) || (nret != 1)) {
> + goto param_error_exit;
> + }
> +
> + liobn = rtas_ld(args, 0);
> + tcet = spapr_tce_find_by_liobn(liobn);
> + if (!tcet) {
> + goto param_error_exit;
> + }
> +
> + sphb = SPAPR_PCI_HOST_BRIDGE(OBJECT(tcet)->parent);
> + if (!sphb) {
> + goto param_error_exit;
> + }
> +
> + spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> + if (!spc->ddw_remove) {
> + goto hw_error_exit;
> + }
> +
> + ret = spc->ddw_remove(sphb, tcet);
> + trace_spapr_iommu_ddw_remove(liobn, ret);
> + if (ret) {
> + goto hw_error_exit;
> + }
> +
> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> + return;
.. but don't decrement it here. Is that a bug?
[snip]
> +static void rtas_ibm_reset_pe_dma_window(PowerPCCPU *cpu,
> + sPAPREnvironment *spapr,
> + uint32_t token, uint32_t nargs,
> + target_ulong args,
> + uint32_t nret, target_ulong rets)
> +{
> + sPAPRPHBState *sphb;
> + sPAPRPHBClass *spc;
> + uint64_t buid;
> + uint32_t addr;
> + long ret;
> +
> + if ((nargs != 3) || (nret != 1)) {
> + goto param_error_exit;
> + }
> +
> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> + addr = rtas_ld(args, 0);
> + sphb = spapr_pci_find_phb(spapr, buid);
> + if (!sphb) {
> + goto param_error_exit;
> + }
> +
> + spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> + if (!spc->ddw_reset) {
> + goto hw_error_exit;
> + }
> +
> + ret = spc->ddw_reset(sphb);
> + trace_spapr_iommu_ddw_reset(buid, addr, ret);
> + if (ret) {
> + goto hw_error_exit;
> + }
Likewise ddw_num doesn't seem to be reset here.
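One way to address this is to keep all updates of the counter in a single layer so create, remove and reset stay symmetric. A minimal sketch, assuming a simplified PHB state; `struct toy_phb`, `ddw_max` and the `toy_ddw_*` functions are hypothetical. Note that since the series derives the LIOBN from `ddw_num + 1`, a plain decrement could reuse a LIOBN that is still in use if windows were removed out of order; with only one dynamic window (as on POWER8) that cannot happen.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified PHB state: only what the counter needs. */
struct toy_phb {
    uint32_t ddw_num;   /* dynamic windows currently created */
    uint32_t ddw_max;   /* dynamic windows the PHB can provide */
};

static int toy_ddw_create(struct toy_phb *phb)
{
    if (phb->ddw_num >= phb->ddw_max) {
        return -1;      /* no windows left */
    }
    phb->ddw_num++;     /* count the window only after success */
    return 0;
}

static int toy_ddw_remove(struct toy_phb *phb)
{
    if (phb->ddw_num == 0) {
        return -1;      /* nothing to remove */
    }
    phb->ddw_num--;     /* keep create/remove symmetric */
    return 0;
}

static void toy_ddw_reset(struct toy_phb *phb)
{
    phb->ddw_num = 0;   /* all dynamic windows are gone after reset */
}
```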
* Re: [Qemu-devel] [RFC PATCH v2 08/13] spapr_pci: Enable DDW
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 08/13] spapr_pci: Enable DDW Alexey Kardashevskiy
@ 2014-08-26 7:14 ` David Gibson
2014-08-26 8:11 ` Alexey Kardashevskiy
0 siblings, 1 reply; 41+ messages in thread
From: David Gibson @ 2014-08-26 7:14 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 8816 bytes --]
On Fri, Aug 15, 2014 at 08:12:30PM +1000, Alexey Kardashevskiy wrote:
> This implements DDW for emulated PHB.
>
> This advertises DDW in device tree.
>
> Since QEMU does not implement any 64bit DMA capable device, this hack
> has been used to enable 64bit DMA on E1000:
>
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 0fc29a0..131f80a 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -240,6 +240,7 @@ static const uint32_t mac_reg_init[] = {
> [STATUS] = 0x80000000 | E1000_STATUS_GIO_MASTER_ENABLE |
> E1000_STATUS_ASDV | E1000_STATUS_MTXCKOK |
> E1000_STATUS_SPEED_1000 | E1000_STATUS_FD |
> + E1000_STATUS_PCIX_MODE |
> E1000_STATUS_LU,
> [MANC] = E1000_MANC_EN_MNG2HOST | E1000_MANC_RCV_TCO_EN |
> E1000_MANC_ARP_EN | E1000_MANC_0298_EN |
Are you planning to send a patch to enable 64-bit DMA in the e1000
device properly?
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * tested on hacked emulated E1000
> * implemented DDW reset on the PHB reset
> * spapr_pci_ddw_remove/spapr_pci_ddw_reset are public for reuse by VFIO
> ---
> hw/ppc/spapr_pci.c | 96 +++++++++++++++++++++++++++++++++++++++++++++
> include/hw/pci-host/spapr.h | 7 ++++
> 2 files changed, 103 insertions(+)
>
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 9b03d0d..cba8d9d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -22,6 +22,7 @@
> * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> * THE SOFTWARE.
> */
> +#include "sysemu/sysemu.h"
> #include "hw/hw.h"
> #include "hw/pci/pci.h"
> #include "hw/pci/msi.h"
> @@ -498,6 +499,70 @@ void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr)
> }
>
> /*
> + * Dynamic DMA windows
> + */
> +static int spapr_pci_ddw_query(sPAPRPHBState *sphb,
> + uint32_t *windows_available,
> + uint32_t *page_size_mask)
> +{
> + *windows_available = 1;
> + *page_size_mask = DDW_PGSIZE_16M;
> +
> + return 0;
> +}
> +
> +static int spapr_pci_ddw_create(sPAPRPHBState *sphb, uint32_t page_shift,
> + uint32_t window_shift, uint32_t liobn,
> + sPAPRTCETable **ptcet)
> +{
> + *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
> + SPAPR_PCI_TCE64_START, page_shift,
> + 1ULL << (window_shift - page_shift),
> + true);
Does anything actually validate that the specified page_shift is one
of the permitted/advertised ones?
> + if (!*ptcet) {
> + return -1;
> + }
> + memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
> + spapr_tce_get_iommu(*ptcet));
> +
> + return 0;
> +}
> +
> +int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
> +{
> + memory_region_del_subregion(&sphb->iommu_root,
> + spapr_tce_get_iommu(tcet));
> + spapr_tce_free_table(tcet);
Ok, relating to my comment in the previous patch, ddw_num doesn't seem
to be decremented here either.
> +
> + return 0;
> +}
> +
> +static int spapr_pci_remove_ddw_cb(Object *child, void *opaque)
> +{
> + sPAPRTCETable *tcet;
> +
> + tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
> +
> + /* Delete all dynamic windows, i.e. every except the default one with #0 */
> + if (tcet && SPAPR_PCI_DMA_WINDOW_NUM(tcet->liobn)) {
> + sPAPRPHBState *sphb = opaque;
> + sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> +
> + spc->ddw_remove(sphb, tcet);
> + }
> +
> + return 0;
> +}
> +
> +int spapr_pci_ddw_reset(sPAPRPHBState *sphb)
> +{
> + object_child_foreach(OBJECT(sphb), spapr_pci_remove_ddw_cb, sphb);
> + sphb->ddw_num = 0;
So, you do reset ddw_num here, but since it is incremented in the
generic RTAS code, this smells like a layering violation.
> +
> + return 0;
> +}
> +
> +/*
> * PHB PCI device
> */
> static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> @@ -671,6 +736,12 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
>
> static void spapr_phb_reset(DeviceState *qdev)
> {
> + sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(qdev);
> +
> + if (spc->ddw_reset) {
> + spc->ddw_reset(SPAPR_PCI_HOST_BRIDGE(qdev));
> + }
> +
> /* Reset the IOMMU state */
> object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
> }
> @@ -685,6 +756,7 @@ static Property spapr_phb_properties[] = {
> DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
> DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
> SPAPR_PCI_IO_WIN_SIZE),
> + DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, false),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> @@ -802,6 +874,10 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
> set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> dc->cannot_instantiate_with_device_add_yet = false;
> spc->finish_realize = spapr_phb_finish_realize;
> + spc->ddw_query = spapr_pci_ddw_query;
> + spc->ddw_create = spapr_pci_ddw_create;
> + spc->ddw_remove = spapr_pci_ddw_remove;
> + spc->ddw_reset = spapr_pci_ddw_reset;
> }
>
> static const TypeInfo spapr_phb_info = {
> @@ -885,6 +961,13 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> uint32_t interrupt_map_mask[] = {
> cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
> uint32_t interrupt_map[PCI_SLOT_MAX * PCI_NUM_PINS][7];
> + uint32_t ddw_applicable[] = {
> + RTAS_IBM_QUERY_PE_DMA_WINDOW,
> + RTAS_IBM_CREATE_PE_DMA_WINDOW,
> + RTAS_IBM_REMOVE_PE_DMA_WINDOW
> + };
> + uint32_t ddw_extensions[] = { 1, RTAS_IBM_RESET_PE_DMA_WINDOW };
> + sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(phb);
>
> /* Start populating the FDT */
> sprintf(nodename, "pci@%" PRIx64, phb->buid);
> @@ -914,6 +997,19 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
> _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS));
>
> + /* Dynamic DMA window */
> + if (phb->ddw_enabled &&
> + spc->ddw_query && spc->ddw_create && spc->ddw_remove) {
> + _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-applicable", &ddw_applicable,
> + sizeof(ddw_applicable)));
> +
> + if (spc->ddw_reset) {
> + /* When enabled, the guest will remove the default 32bit window */
I guess it's not really relevant here, but the reason for availability
of reset causing the guest to remove the default window seems unclear.
> + _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-extensions",
> + &ddw_extensions, sizeof(ddw_extensions)));
> + }
> + }
> +
> /* Build the interrupt-map, this must matches what is done
> * in pci_spapr_map_irq
> */
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index a1fdbb2..0b40fcf 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -104,6 +104,8 @@ struct sPAPRPHBState {
> int32_t msi_devs_num;
> spapr_pci_msi_mig *msi_devs;
>
> + bool ddw_enabled;
> +
> QLIST_ENTRY(sPAPRPHBState) list;
> };
>
> @@ -126,6 +128,9 @@ struct sPAPRPHBVFIOState {
>
> #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
>
> +/* Default 64bit dynamic window offset */
> +#define SPAPR_PCI_TCE64_START 0x8000000000000000ULL
> +
> static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
> {
> return xics_get_qirq(spapr->icp, phb->lsi_table[pin].irq);
> @@ -144,5 +149,7 @@ void spapr_pci_rtas_init(void);
> sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid);
> PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
> uint32_t config_addr);
> +int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet);
> +int spapr_pci_ddw_reset(sPAPRPHBState *sphb);
>
> #endif /* __HW_SPAPR_PCI_H__ */
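On the page_shift question, a minimal sketch of a check ddw_create() could perform against what ddw_query() advertised, plus a guard on `window_shift` (if `window_shift < page_shift`, the `1ULL << (window_shift - page_shift)` in the patch would shift by a huge count). The `TOY_PGSIZE_*` bit values and all function names are illustrative assumptions; the series' own `DDW_PGSIZE_*` constants are defined in a patch not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative page-size mask bits (assumed values, not DDW_PGSIZE_*). */
#define TOY_PGSIZE_4K   (1u << 0)
#define TOY_PGSIZE_64K  (1u << 1)
#define TOY_PGSIZE_16M  (1u << 2)

/* Map a page shift to its mask bit, or 0 if it is not a known size. */
static uint32_t toy_pgsize_bit(uint32_t page_shift)
{
    switch (page_shift) {
    case 12: return TOY_PGSIZE_4K;
    case 16: return TOY_PGSIZE_64K;
    case 24: return TOY_PGSIZE_16M;
    default: return 0;
    }
}

/* Returns 0 if the request matches what was advertised, -1 otherwise. */
static int toy_ddw_check(uint32_t page_size_mask, uint32_t page_shift,
                         uint32_t window_shift)
{
    uint32_t bit = toy_pgsize_bit(page_shift);

    if (!bit || !(page_size_mask & bit)) {
        return -1;   /* page size was not advertised */
    }
    if (window_shift < page_shift || window_shift >= 64) {
        return -1;   /* TCE count shift would be invalid */
    }
    return 0;
}

/* Number of TCEs the window needs: 2^(window_shift - page_shift). */
static uint64_t toy_tce_count(uint32_t page_shift, uint32_t window_shift)
{
    return 1ULL << (window_shift - page_shift);
}
```

For a 16GB window (`window_shift = 34`) this gives 1024 TCEs with 16MB pages versus 262144 with 64K pages, which matches the cover letter's point that tables stay small for guests backed by 16MB pages.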
* Re: [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW Alexey Kardashevskiy
@ 2014-08-26 7:19 ` David Gibson
2014-08-26 8:16 ` Alexey Kardashevskiy
0 siblings, 1 reply; 41+ messages in thread
From: David Gibson @ 2014-08-26 7:19 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
[-- Attachment #1: Type: text/plain, Size: 4826 bytes --]
On Fri, Aug 15, 2014 at 08:12:33PM +1000, Alexey Kardashevskiy wrote:
> This implements DDW for VFIO. Host kernel support is required for this.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * remove()/reset() callbacks use spapr_pci's ones
> ---
> hw/ppc/spapr_pci_vfio.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 86 insertions(+)
>
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index 11b4272..79df716 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -71,6 +71,88 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
> spapr_tce_get_iommu(tcet));
>
> object_unref(OBJECT(tcet));
> +
> + if (sphb->ddw_enabled) {
> + sphb->ddw_enabled = !!(info.flags & VFIO_IOMMU_SPAPR_TCE_FLAG_DDW);
This overrides an explicit ddw= set by the user, which is a bit
counter-intuitive.
> + }
> +}
> +
> +static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
> + uint32_t *windows_available,
> + uint32_t *page_size_mask)
> +{
> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
> + struct vfio_iommu_spapr_tce_query query = { .argsz = sizeof(query) };
> + int ret;
> +
> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> + VFIO_IOMMU_SPAPR_TCE_QUERY, &query);
> + if (ret) {
> + return ret;
> + }
> +
> + *windows_available = query.windows_available;
> + *page_size_mask = query.page_size_mask;
> +
> + return ret;
> +}
> +
> +static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t page_shift,
> + uint32_t window_shift, uint32_t liobn,
> + sPAPRTCETable **ptcet)
> +{
> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
> + struct vfio_iommu_spapr_tce_create create = {
> + .argsz = sizeof(create),
> + .page_shift = page_shift,
> + .window_shift = window_shift,
> + .start_addr = 0
> + };
> + int ret;
> +
> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> + VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
> + if (ret) {
> + return ret;
> + }
> +
> + *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
> + create.start_addr, page_shift,
> + 1ULL << (window_shift - page_shift),
> + true);
> + memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
> + spapr_tce_get_iommu(*ptcet));
> +
> + return ret;
> +}
> +
> +static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
> +{
> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
> + struct vfio_iommu_spapr_tce_remove remove = {
> + .argsz = sizeof(remove),
> + .start_addr = tcet->bus_offset
> + };
> + int ret;
> +
> + spapr_pci_ddw_remove(sphb, tcet);
> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> + VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
> +
> + return ret;
> +}
> +
> +static int spapr_pci_vfio_ddw_reset(sPAPRPHBState *sphb)
> +{
> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
> + struct vfio_iommu_spapr_tce_reset reset = { .argsz = sizeof(reset) };
> + int ret;
> +
> + spapr_pci_ddw_reset(sphb);
> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
> + VFIO_IOMMU_SPAPR_TCE_RESET, &reset);
Unlike the non-VFIO version, this doesn't appear to reset ddw_num.
Also, there isn't a call to explicitly invoke DDW reset on system reset.
Is that handled in the kernel by the overall VFIO reset?
> + return ret;
> }
>
> static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
> @@ -80,6 +162,10 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
>
> dc->props = spapr_phb_vfio_properties;
> spc->finish_realize = spapr_phb_vfio_finish_realize;
> + spc->ddw_query = spapr_pci_vfio_ddw_query;
> + spc->ddw_create = spapr_pci_vfio_ddw_create;
> + spc->ddw_remove = spapr_pci_vfio_ddw_remove;
> + spc->ddw_reset = spapr_pci_vfio_ddw_reset;
> }
>
> static const TypeInfo spapr_phb_vfio_info = {
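Regarding the override of an explicit `ddw=` setting: one possible approach, sketched here under stated assumptions, is a tri-state property so that an explicit "on" fails loudly on hosts without kernel support, while a default such as the one the "pseries-2.2" machine would set can quietly fall back. `enum toy_ddw_mode`, `TOY_TCE_FLAG_DDW` (mirroring `VFIO_IOMMU_SPAPR_TCE_FLAG_DDW` above) and `toy_resolve_ddw()` are hypothetical; the series itself uses a plain boolean property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value mirroring VFIO_IOMMU_SPAPR_TCE_FLAG_DDW. */
#define TOY_TCE_FLAG_DDW 1u

/* Hypothetical tri-state replacement for the boolean "ddw" property. */
enum toy_ddw_mode { TOY_DDW_OFF, TOY_DDW_ON, TOY_DDW_AUTO };

/* Resolve the user's request against host support. Returns -1 if the
 * user explicitly asked for DDW and the host cannot provide it. */
static int toy_resolve_ddw(enum toy_ddw_mode mode, uint32_t host_flags,
                           bool *enabled)
{
    bool host_has_ddw = (host_flags & TOY_TCE_FLAG_DDW) != 0;

    switch (mode) {
    case TOY_DDW_OFF:
        *enabled = false;
        return 0;
    case TOY_DDW_ON:
        if (!host_has_ddw) {
            return -1;            /* report instead of silently disabling */
        }
        *enabled = true;
        return 0;
    case TOY_DDW_AUTO:
        *enabled = host_has_ddw;  /* quiet fallback for defaults */
        return 0;
    }
    return -1;
}
```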
* Re: [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver Alexey Kardashevskiy
@ 2014-08-26 7:20 ` David Gibson
2014-08-26 8:20 ` Alexey Kardashevskiy
0 siblings, 1 reply; 41+ messages in thread
From: David Gibson @ 2014-08-26 7:20 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On Fri, Aug 15, 2014 at 08:12:34PM +1000, Alexey Kardashevskiy wrote:
> This enables DDW RTAS-related ioctls in VFIO.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
This should probably just be folded into the previous patch. It's
broken without this change.
* Re: [Qemu-devel] [RFC PATCH v2 08/13] spapr_pci: Enable DDW
2014-08-26 7:14 ` David Gibson
@ 2014-08-26 8:11 ` Alexey Kardashevskiy
0 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-26 8:11 UTC (permalink / raw)
To: David Gibson
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On 08/26/2014 05:14 PM, David Gibson wrote:
> On Fri, Aug 15, 2014 at 08:12:30PM +1000, Alexey Kardashevskiy wrote:
>> This implements DDW for emulated PHB.
>>
>> This advertises DDW in device tree.
>>
>> Since QEMU does not implement any 64bit DMA capable device, this hack
>> has been used to enable 64bit DMA on E1000:
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index 0fc29a0..131f80a 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -240,6 +240,7 @@ static const uint32_t mac_reg_init[] = {
>> [STATUS] = 0x80000000 | E1000_STATUS_GIO_MASTER_ENABLE |
>> E1000_STATUS_ASDV | E1000_STATUS_MTXCKOK |
>> E1000_STATUS_SPEED_1000 | E1000_STATUS_FD |
>> + E1000_STATUS_PCIX_MODE |
>> E1000_STATUS_LU,
>> [MANC] = E1000_MANC_EN_MNG2HOST | E1000_MANC_RCV_TCO_EN |
>> E1000_MANC_ARP_EN | E1000_MANC_0298_EN |
>
> Are you planning to send a patch to enable 64-bit DMA in the e1000
> device properly?
Nope. The e1000 family has both PCI and PCI-X devices, and so far QEMU emulates
the PCI one with its own vendor/device IDs. I would either have to change the
device ID to the PCI-X version and enable that PCI-X bit, or add a new e1000-pcix
device, and I do not see any real demand for this as we have virtio-pci, which
is way cooler and faster and everything :)
>
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * tested on hacked emulated E1000
>> * implemented DDW reset on the PHB reset
>> * spapr_pci_ddw_remove/spapr_pci_ddw_reset are public for reuse by VFIO
>> ---
>> hw/ppc/spapr_pci.c | 96 +++++++++++++++++++++++++++++++++++++++++++++
>> include/hw/pci-host/spapr.h | 7 ++++
>> 2 files changed, 103 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index 9b03d0d..cba8d9d 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -22,6 +22,7 @@
>> * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> * THE SOFTWARE.
>> */
>> +#include "sysemu/sysemu.h"
>> #include "hw/hw.h"
>> #include "hw/pci/pci.h"
>> #include "hw/pci/msi.h"
>> @@ -498,6 +499,70 @@ void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr)
>> }
>>
>> /*
>> + * Dynamic DMA windows
>> + */
>> +static int spapr_pci_ddw_query(sPAPRPHBState *sphb,
>> + uint32_t *windows_available,
>> + uint32_t *page_size_mask)
>> +{
>> + *windows_available = 1;
>> + *page_size_mask = DDW_PGSIZE_16M;
>> +
>> + return 0;
>> +}
>> +
>> +static int spapr_pci_ddw_create(sPAPRPHBState *sphb, uint32_t page_shift,
>> + uint32_t window_shift, uint32_t liobn,
>> + sPAPRTCETable **ptcet)
>> +{
>> + *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
>> + SPAPR_PCI_TCE64_START, page_shift,
>> + 1ULL << (window_shift - page_shift),
>> + true);
>
> Does anything actually validate that the specified page_shift is one
> of the permitted/advertised ones?
Nope. Will fix.
>> + if (!*ptcet) {
>> + return -1;
>> + }
>> + memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
>> + spapr_tce_get_iommu(*ptcet));
>> +
>> + return 0;
>> +}
>> +
>> +int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
>> +{
>> + memory_region_del_subregion(&sphb->iommu_root,
>> + spapr_tce_get_iommu(tcet));
>> + spapr_tce_free_table(tcet);
>
> Ok, relating to my comment in the previous patch, ddw_num doesn't seem
> to be decremented here either.
@ddw_num is an ID; it just has to be unique. If I decrement it, then I'll
have to track which numbers are in use.
>> +
>> + return 0;
>> +}
>> +
>> +static int spapr_pci_remove_ddw_cb(Object *child, void *opaque)
>> +{
>> + sPAPRTCETable *tcet;
>> +
>> + tcet = (sPAPRTCETable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
>> +
>> + /* Delete all dynamic windows, i.e. every except the default one with #0 */
>> + if (tcet && SPAPR_PCI_DMA_WINDOW_NUM(tcet->liobn)) {
>> + sPAPRPHBState *sphb = opaque;
>> + sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
>> +
>> + spc->ddw_remove(sphb, tcet);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +int spapr_pci_ddw_reset(sPAPRPHBState *sphb)
>> +{
>> + object_child_foreach(OBJECT(sphb), spapr_pci_remove_ddw_cb, sphb);
>> + sphb->ddw_num = 0;
>
> So, you do reset ddw_num here, but since it is incremented in the
> generic RTAS code, this smells like a layering violation.
Yeah, it's a good idea to keep it in one file at least. Will fix.
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> * PHB PCI device
>> */
>> static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>> @@ -671,6 +736,12 @@ static int spapr_phb_children_reset(Object *child, void *opaque)
>>
>> static void spapr_phb_reset(DeviceState *qdev)
>> {
>> + sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(qdev);
>> +
>> + if (spc->ddw_reset) {
>> + spc->ddw_reset(SPAPR_PCI_HOST_BRIDGE(qdev));
>> + }
>> +
>> /* Reset the IOMMU state */
>> object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
>> }
>> @@ -685,6 +756,7 @@ static Property spapr_phb_properties[] = {
>> DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
>> DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
>> SPAPR_PCI_IO_WIN_SIZE),
>> + DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, false),
>> DEFINE_PROP_END_OF_LIST(),
>> };
>>
>> @@ -802,6 +874,10 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>> set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>> dc->cannot_instantiate_with_device_add_yet = false;
>> spc->finish_realize = spapr_phb_finish_realize;
>> + spc->ddw_query = spapr_pci_ddw_query;
>> + spc->ddw_create = spapr_pci_ddw_create;
>> + spc->ddw_remove = spapr_pci_ddw_remove;
>> + spc->ddw_reset = spapr_pci_ddw_reset;
>> }
>>
>> static const TypeInfo spapr_phb_info = {
>> @@ -885,6 +961,13 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>> uint32_t interrupt_map_mask[] = {
>> cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
>> uint32_t interrupt_map[PCI_SLOT_MAX * PCI_NUM_PINS][7];
>> + uint32_t ddw_applicable[] = {
>> + RTAS_IBM_QUERY_PE_DMA_WINDOW,
>> + RTAS_IBM_CREATE_PE_DMA_WINDOW,
>> + RTAS_IBM_REMOVE_PE_DMA_WINDOW
>> + };
>> + uint32_t ddw_extensions[] = { 1, RTAS_IBM_RESET_PE_DMA_WINDOW };
>> + sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(phb);
>>
>> /* Start populating the FDT */
>> sprintf(nodename, "pci@%" PRIx64, phb->buid);
>> @@ -914,6 +997,19 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>> _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
>> _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS));
>>
>> + /* Dynamic DMA window */
>> + if (phb->ddw_enabled &&
>> + spc->ddw_query && spc->ddw_create && spc->ddw_remove) {
>> + _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-applicable", &ddw_applicable,
>> + sizeof(ddw_applicable)));
>> +
>> + if (spc->ddw_reset) {
>> + /* When enabled, the guest will remove the default 32bit window */
>
> I guess it's not really relevant here, but the reason for availability
> of reset causing the guest to remove the default window seems unclear.
Add a "warn"? This is what sles11sp3 does; recent kernels do not try to remove
the default window, though. But per PAPR 2.7, the platform must implement "reset".
Anyway, my plan is to repost without ddw_reset() at all and add it later.
>> + _FDT(fdt_setprop(fdt, bus_off, "ibm,ddw-extensions",
>> + &ddw_extensions, sizeof(ddw_extensions)));
>> + }
>> + }
>> +
>> /* Build the interrupt-map, this must matches what is done
>> * in pci_spapr_map_irq
>> */
>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>> index a1fdbb2..0b40fcf 100644
>> --- a/include/hw/pci-host/spapr.h
>> +++ b/include/hw/pci-host/spapr.h
>> @@ -104,6 +104,8 @@ struct sPAPRPHBState {
>> int32_t msi_devs_num;
>> spapr_pci_msi_mig *msi_devs;
>>
>> + bool ddw_enabled;
>> +
>> QLIST_ENTRY(sPAPRPHBState) list;
>> };
>>
>> @@ -126,6 +128,9 @@ struct sPAPRPHBVFIOState {
>>
>> #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
>>
>> +/* Default 64bit dynamic window offset */
>> +#define SPAPR_PCI_TCE64_START 0x8000000000000000ULL
>> +
>> static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
>> {
>> return xics_get_qirq(spapr->icp, phb->lsi_table[pin].irq);
>> @@ -144,5 +149,7 @@ void spapr_pci_rtas_init(void);
>> sPAPRPHBState *spapr_pci_find_phb(sPAPREnvironment *spapr, uint64_t buid);
>> PCIDevice *spapr_pci_find_dev(sPAPREnvironment *spapr, uint64_t buid,
>> uint32_t config_addr);
>> +int spapr_pci_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet);
>> +int spapr_pci_ddw_reset(sPAPRPHBState *sphb);
>>
>> #endif /* __HW_SPAPR_PCI_H__ */
>
--
Alexey
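Here is a minimal sketch of the page_shift check David asks about. It assumes the create path should reject any page size that ddw_query() did not advertise. The DDW_PGSIZE_* bit values and the helpers ddw_pgsize_bit()/ddw_page_shift_valid() are assumptions for illustration, not code from the series. The shift-to-bit table mirrors spapr_iommu_fixmask() from 07/13.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed DDW page-size bits; verify against the series' DDW_PGSIZE_*. */
#define DDW_PGSIZE_4K   0x01u
#define DDW_PGSIZE_64K  0x02u
#define DDW_PGSIZE_16M  0x04u
#define DDW_PGSIZE_32M  0x08u
#define DDW_PGSIZE_64M  0x10u
#define DDW_PGSIZE_128M 0x20u
#define DDW_PGSIZE_256M 0x40u
#define DDW_PGSIZE_16G  0x80u

/* Hypothetical helper: map a page shift to its DDW bit, or 0 if unknown. */
static uint32_t ddw_pgsize_bit(uint32_t page_shift)
{
    static const struct { uint32_t shift; uint32_t mask; } map[] = {
        { 12, DDW_PGSIZE_4K },   { 16, DDW_PGSIZE_64K },
        { 24, DDW_PGSIZE_16M },  { 25, DDW_PGSIZE_32M },
        { 26, DDW_PGSIZE_64M },  { 27, DDW_PGSIZE_128M },
        { 28, DDW_PGSIZE_256M }, { 34, DDW_PGSIZE_16G },
    };
    size_t i;

    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (map[i].shift == page_shift) {
            return map[i].mask;
        }
    }
    return 0;
}

/*
 * Hypothetical check for the create path: accept page_shift only if its
 * bit is set in the mask that ddw_query() reported for this PHB.
 */
static bool ddw_page_shift_valid(uint32_t page_shift, uint32_t page_size_mask)
{
    uint32_t bit = ddw_pgsize_bit(page_shift);

    return bit != 0 && (page_size_mask & bit) != 0;
}
```

The emulated PHB's query reports only DDW_PGSIZE_16M. With a check like this, a request for 64K pages (page_shift 16) could be rejected up front instead of silently creating a window.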
* Re: [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW
2014-08-26 7:19 ` David Gibson
@ 2014-08-26 8:16 ` Alexey Kardashevskiy
2014-08-27 8:25 ` David Gibson
0 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-26 8:16 UTC (permalink / raw)
To: David Gibson
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On 08/26/2014 05:19 PM, David Gibson wrote:
> On Fri, Aug 15, 2014 at 08:12:33PM +1000, Alexey Kardashevskiy wrote:
>> This implements DDW for VFIO. Host kernel support is required for this.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * remove()/reset() callbacks use spapr_pci's ones
>> ---
>> hw/ppc/spapr_pci_vfio.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 86 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>> index 11b4272..79df716 100644
>> --- a/hw/ppc/spapr_pci_vfio.c
>> +++ b/hw/ppc/spapr_pci_vfio.c
>> @@ -71,6 +71,88 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
>> spapr_tce_get_iommu(tcet));
>>
>> object_unref(OBJECT(tcet));
>> +
>> + if (sphb->ddw_enabled) {
>> + sphb->ddw_enabled = !!(info.flags & VFIO_IOMMU_SPAPR_TCE_FLAG_DDW);
>
> This overrides an explicit ddw= set by the user, which is a bit
> counter-intuitive.
For the user it means "try ddw when available" rather than "do ddw". This was
suggested by Alex Graf, or I misunderstood his suggestion :)
>
>> + }
>> +}
>> +
>> +static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
>> + uint32_t *windows_available,
>> + uint32_t *page_size_mask)
>> +{
>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>> + struct vfio_iommu_spapr_tce_query query = { .argsz = sizeof(query) };
>> + int ret;
>> +
>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
>> + VFIO_IOMMU_SPAPR_TCE_QUERY, &query);
>> + if (ret) {
>> + return ret;
>> + }
>> +
>> + *windows_available = query.windows_available;
>> + *page_size_mask = query.page_size_mask;
>> +
>> + return ret;
>> +}
>> +
>> +static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t page_shift,
>> + uint32_t window_shift, uint32_t liobn,
>> + sPAPRTCETable **ptcet)
>> +{
>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>> + struct vfio_iommu_spapr_tce_create create = {
>> + .argsz = sizeof(create),
>> + .page_shift = page_shift,
>> + .window_shift = window_shift,
>> + .start_addr = 0
>> + };
>> + int ret;
>> +
>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
>> + VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
>> + if (ret) {
>> + return ret;
>> + }
>> +
>> + *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn,
>> + create.start_addr, page_shift,
>> + 1ULL << (window_shift - page_shift),
>> + true);
>> + memory_region_add_subregion(&sphb->iommu_root, (*ptcet)->bus_offset,
>> + spapr_tce_get_iommu(*ptcet));
>> +
>> + return ret;
>> +}
>> +
>> +static int spapr_pci_vfio_ddw_remove(sPAPRPHBState *sphb, sPAPRTCETable *tcet)
>> +{
>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>> + struct vfio_iommu_spapr_tce_remove remove = {
>> + .argsz = sizeof(remove),
>> + .start_addr = tcet->bus_offset
>> + };
>> + int ret;
>> +
>> + spapr_pci_ddw_remove(sphb, tcet);
>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
>> + VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
>> +
>> + return ret;
>> +}
>> +
>> +static int spapr_pci_vfio_ddw_reset(sPAPRPHBState *sphb)
>> +{
>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>> + struct vfio_iommu_spapr_tce_reset reset = { .argsz = sizeof(reset) };
>> + int ret;
>> +
>> + spapr_pci_ddw_reset(sphb);
>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
>> + VFIO_IOMMU_SPAPR_TCE_RESET, &reset);
>
> Unlike the non-VFIO version, this doesn't appear to reset ddw_num.
>
> Also, there isn't a call to explicitly invoke DDW reset on system reset.
> Is that handled in the kernel by the overall VFIO reset?
"[RFC PATCH v2 09/13] spapr_pci_vfio: Call spapr_pci::reset on reset"
effectively enables spapr_phb_reset() as a reset handler for the VFIO PHB, and
that function calls spc->ddw_reset(). This was the real reason for
the 09/13 patch.
>> + return ret;
>> }
>>
>> static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
>> @@ -80,6 +162,10 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
>>
>> dc->props = spapr_phb_vfio_properties;
>> spc->finish_realize = spapr_phb_vfio_finish_realize;
>> + spc->ddw_query = spapr_pci_vfio_ddw_query;
>> + spc->ddw_create = spapr_pci_vfio_ddw_create;
>> + spc->ddw_remove = spapr_pci_vfio_ddw_remove;
>> + spc->ddw_reset = spapr_pci_vfio_ddw_reset;
>> }
>>
>> static const TypeInfo spapr_phb_vfio_info = {
>
--
Alexey
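The sketch below shows two behaviours from this exchange in simplified form.

* The PHB reset path calls the class's optional ddw_reset hook, so the VFIO PHB's windows are torn down once 09/13 routes its reset through spapr_phb_reset().
* The ddw property has "try DDW when available" semantics.

The structs and functions are hypothetical stand-ins for sPAPRPHBState/sPAPRPHBClass, not QEMU's QOM types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct phb;

/* Hypothetical stand-in for the PHB class: hooks may be NULL. */
struct phb_class {
    int (*ddw_reset)(struct phb *phb);
};

/* Hypothetical stand-in for the PHB instance state. */
struct phb {
    const struct phb_class *klass;
    bool ddw_enabled;   /* the user-visible "ddw" property */
    unsigned ddw_num;   /* dynamic window counter */
    unsigned resets;    /* counts hook invocations, for illustration */
};

/* Mirrors spapr_phb_reset() in 08/13: call the hook only if it is set. */
static void phb_reset(struct phb *phb)
{
    assert(phb->klass != NULL);
    if (phb->klass->ddw_reset) {
        phb->klass->ddw_reset(phb);
    }
}

/* A VFIO-like hook: drop the dynamic windows and restart numbering. */
static int vfio_like_ddw_reset(struct phb *phb)
{
    phb->ddw_num = 0;
    phb->resets++;
    return 0;
}

/*
 * "Try DDW when available": the property can only be narrowed by what
 * the host reports at realize time, never forced on.
 */
static void finish_realize(struct phb *phb, bool host_has_ddw)
{
    if (phb->ddw_enabled) {
        phb->ddw_enabled = host_has_ddw;
    }
}
```

A PHB whose class leaves ddw_reset NULL is simply skipped. This matches the `if (spc->ddw_reset)` guard in the quoted spapr_phb_reset().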
* Re: [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver
2014-08-26 7:20 ` David Gibson
@ 2014-08-26 8:20 ` Alexey Kardashevskiy
2014-08-27 8:42 ` David Gibson
0 siblings, 1 reply; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-26 8:20 UTC (permalink / raw)
To: David Gibson
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On 08/26/2014 05:20 PM, David Gibson wrote:
> On Fri, Aug 15, 2014 at 08:12:34PM +1000, Alexey Kardashevskiy wrote:
>> This enables DDW RTAS-related ioctls in VFIO.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> This should probably just be folded into the previous patch. It's
> broken without this change.
It won't work, but it is not broken: the guest will fail to create a DDW and
continue using the default window.
And since the series needs the attention of two maintainers (A. Williamson and A.
Graf), it is better to draw a bold line between areas :)
--
Alexey
* Re: [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW
2014-08-26 8:16 ` Alexey Kardashevskiy
@ 2014-08-27 8:25 ` David Gibson
0 siblings, 0 replies; 41+ messages in thread
From: David Gibson @ 2014-08-27 8:25 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On Tue, Aug 26, 2014 at 06:16:51PM +1000, Alexey Kardashevskiy wrote:
> On 08/26/2014 05:19 PM, David Gibson wrote:
> > On Fri, Aug 15, 2014 at 08:12:33PM +1000, Alexey Kardashevskiy wrote:
> >> This implements DDW for VFIO. Host kernel support is required for this.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >> Changes:
> >> v2:
> >> * remove()/reset() callbacks use spapr_pci's ones
> >> ---
> >> hw/ppc/spapr_pci_vfio.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 86 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> >> index 11b4272..79df716 100644
> >> --- a/hw/ppc/spapr_pci_vfio.c
> >> +++ b/hw/ppc/spapr_pci_vfio.c
> >> @@ -71,6 +71,88 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
> >> spapr_tce_get_iommu(tcet));
> >>
> >> object_unref(OBJECT(tcet));
> >> +
> >> + if (sphb->ddw_enabled) {
> >> + sphb->ddw_enabled = !!(info.flags & VFIO_IOMMU_SPAPR_TCE_FLAG_DDW);
> >
> > This overrides an explicit ddw= set by the user, which is a bit
> > counter-intuitive.
>
>
> For the user it means "try ddw when available" rather than "do ddw". This was
> suggested by Alex Graf, or I misunderstood his suggestion :)
Uh.. never mind. I misread this, not spotting the "if" :).
* Re: [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver
2014-08-26 8:20 ` Alexey Kardashevskiy
@ 2014-08-27 8:42 ` David Gibson
0 siblings, 0 replies; 41+ messages in thread
From: David Gibson @ 2014-08-27 8:42 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Alex Williamson, qemu-ppc, qemu-devel, Gavin Shan, Alexander Graf
On Tue, Aug 26, 2014 at 06:20:51PM +1000, Alexey Kardashevskiy wrote:
> On 08/26/2014 05:20 PM, David Gibson wrote:
> > On Fri, Aug 15, 2014 at 08:12:34PM +1000, Alexey Kardashevskiy wrote:
> >> This enables DDW RTAS-related ioctls in VFIO.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >
> > This should probably just be folded into the previous patch. It's
> > broken without this change.
>
> It won't work, but it is not broken: the guest will fail to create a DDW and
> continue using the default window.
>
> And since the series needs the attention of two maintainers (A. Williamson and A.
> Graf), it is better to draw a bold line between areas :)
Ah, ok, that makes sense then.
* Re: [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
2014-08-19 0:43 ` David Gibson
@ 2014-08-27 9:27 ` Alexander Graf
1 sibling, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2014-08-27 9:27 UTC (permalink / raw)
To: Alexey Kardashevskiy, qemu-devel
Cc: Alex Williamson, qemu-ppc, Gavin Shan, David Gibson
On 15.08.14 12:12, Alexey Kardashevskiy wrote:
> The existing KVM_CREATE_SPAPR_TCE ioctl only supports 4G windows max.
> We are going to add huge DMA window support, so this would create a small
> window and fail unexpectedly later.
>
> This disables KVM_CREATE_SPAPR_TCE for windows bigger than 4GB. Since
> those windows are normally mapped at boot time, there will be no
> performance impact.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> hw/ppc/spapr_iommu.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index f6e32a4..36f5d27 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -113,11 +113,11 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
> static int spapr_tce_table_realize(DeviceState *dev)
> {
> sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
> + uint64_t window_size = tcet->nb_table << tcet->page_shift;
>
> - if (kvm_enabled()) {
> + if (kvm_enabled() && !(window_size >> 32)) {
Please add a comment here that explains the reasoning.
Alex
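The comment Alex asks for could look like the sketch below, written as a standalone function. It assumes kvm_enabled() is passed in as a flag, and the function name is hypothetical.

Note that `!(window_size >> 32)` is also false for a window of exactly 4GB. The in-kernel table is therefore used only for windows strictly smaller than 4GB, which is slightly stricter than "bigger than 4GB" in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the test in spapr_tce_table_realize().
 * The window size is nb_table << page_shift. The in-kernel table
 * (KVM_CREATE_SPAPR_TCE) can only describe windows below 4GB, so any
 * window with a bit set at or above bit 32 falls back to a userspace
 * TCE table. Such windows are normally populated once at guest boot,
 * so the slower path costs little.
 */
static bool use_in_kernel_tce_table(bool kvm_enabled, uint64_t nb_table,
                                    uint32_t page_shift)
{
    uint64_t window_size;

    assert(page_shift < 64);    /* keep the shift well defined */
    window_size = nb_table << page_shift;

    return kvm_enabled && !(window_size >> 32);
}
```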
* Re: [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
2014-08-19 0:44 ` David Gibson
@ 2014-08-27 9:29 ` Alexander Graf
1 sibling, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2014-08-27 9:29 UTC (permalink / raw)
To: Alexey Kardashevskiy, qemu-devel
Cc: Alex Williamson, qemu-ppc, Gavin Shan, David Gibson
On 15.08.14 12:12, Alexey Kardashevskiy wrote:
> We are going to have multiple DMA windows per PHB and we want them to
> migrate, so we need a predictable way of assigning LIOBNs.
>
> This introduces a macro which makes up a LIOBN from a fixed prefix,
> the PHB index (unique PHB id) and the window number.
>
> This also introduces SPAPR_PCI_DMA_WINDOW_NUM() to extract the window number
> from a LIOBN; it will be used in the next patch(es) to distinguish the default
> 32bit window from dynamic windows.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> hw/ppc/spapr_pci.c | 2 +-
> include/hw/ppc/spapr.h | 3 ++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 5c46c0d..17eb0d8 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -529,7 +529,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> }
>
> sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
> - sphb->dma_liobn = SPAPR_PCI_BASE_LIOBN + sphb->index;
> + sphb->dma_liobn = SPAPR_PCI_LIOBN(sphb->index, 0);
>
> windows_base = SPAPR_PCI_WINDOW_BASE
> + sphb->index * SPAPR_PCI_WINDOW_SPACING;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index c9d6c6c..782519a 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -443,7 +443,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
> #define SPAPR_TCE_PAGE_MASK (SPAPR_TCE_PAGE_SIZE - 1)
>
> #define SPAPR_VIO_BASE_LIOBN 0x00000000
> -#define SPAPR_PCI_BASE_LIOBN 0x80000000
> +#define SPAPR_PCI_LIOBN(i, n) (0x80000000 | ((i) << 8) | (n))
> +#define SPAPR_PCI_DMA_WINDOW_NUM(liobn) ((liobn) & 0xff)
Good idea. Mind converting VIO as well to make them consistent?
Alex
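The sketch below shows the LIOBN layout these macros produce, plus a hypothetical inverse helper that recovers the PHB index. The layout only round-trips while two conditions hold:

* the window number fits in the low 8 bits;
* the PHB index fits in the 23 bits between the window number and the 0x80000000 prefix.

Larger values would collide with neighbouring fields.

```c
#include <assert.h>
#include <stdint.h>

/* Macros as proposed in this patch (include/hw/ppc/spapr.h). */
#define SPAPR_PCI_LIOBN(i, n)           (0x80000000 | ((i) << 8) | (n))
#define SPAPR_PCI_DMA_WINDOW_NUM(liobn) ((liobn) & 0xff)

/* Hypothetical helper: recover the PHB index from a PCI LIOBN. */
static uint32_t spapr_pci_liobn_phb_index(uint32_t liobn)
{
    assert(liobn & 0x80000000u);    /* PCI LIOBNs carry the top bit */
    return (liobn & ~0x80000000u) >> 8;
}
```

Window #0 is the default 32bit window, so `SPAPR_PCI_DMA_WINDOW_NUM(liobn) != 0` identifies a dynamic window. That is the test spapr_pci_remove_ddw_cb() relies on in 08/13.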
* Re: [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support Alexey Kardashevskiy
2014-08-26 7:06 ` David Gibson
@ 2014-08-27 9:36 ` Alexander Graf
2014-08-27 13:56 ` Alexey Kardashevskiy
1 sibling, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2014-08-27 9:36 UTC (permalink / raw)
To: Alexey Kardashevskiy, qemu-devel
Cc: Alex Williamson, qemu-ppc, Gavin Shan, David Gibson
On 15.08.14 12:12, Alexey Kardashevskiy wrote:
> This adds support for the Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification, which allows additional DMA window(s)
> that can support page sizes other than 4K.
>
> The existing implementation of DDW in the guest tries to create one huge
> DMA window with 64K or 16MB pages and map the entire guest RAM into it. If it
> succeeds, the guest switches to dma_direct_ops and never calls
> TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use
> the entire RAM and not waste time on map/unmap.
>
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
>
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PHB code.
>
> Since no PHB class implements the new callbacks in this patch, no functional
> change is expected.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * double loop squashed to spapr_iommu_fixmask() helper
> * added @ddw_num counter to PHB; it is used to generate the LIOBN for a new
> window and is reset on a ddw-reset event
> * added ULL to constants used in shift operations
> * rtas_ibm_reset_pe_dma_window() and rtas_ibm_remove_pe_dma_window()
> do not remove windows anymore; the PHB callback has to, as it reuses
> the same code on guest reboot as well
> ---
> hw/ppc/Makefile.objs | 3 +
> hw/ppc/spapr_pci.c | 3 +-
> hw/ppc/spapr_rtas_ddw.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
> include/hw/pci-host/spapr.h | 19 +++
> include/hw/ppc/spapr.h | 6 +-
> trace-events | 4 +
> 6 files changed, 316 insertions(+), 2 deletions(-)
> create mode 100644 hw/ppc/spapr_rtas_ddw.c
>
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index edd44d0..9773294 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o
> ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> obj-y += spapr_pci_vfio.o
> endif
> +ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
> +obj-y += spapr_rtas_ddw.o
> +endif
> # PowerPC 4xx boards
> obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
> obj-y += ppc4xx_pci.o
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index aa20c36..9b03d0d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -759,7 +759,7 @@ static int spapr_pci_post_load(void *opaque, int version_id)
>
> static const VMStateDescription vmstate_spapr_pci = {
> .name = "spapr_pci",
> - .version_id = 2,
> + .version_id = 3,
> .minimum_version_id = 2,
> .pre_save = spapr_pci_pre_save,
> .post_load = spapr_pci_post_load,
> @@ -775,6 +775,7 @@ static const VMStateDescription vmstate_spapr_pci = {
> VMSTATE_INT32(msi_devs_num, sPAPRPHBState),
> VMSTATE_STRUCT_VARRAY_ALLOC(msi_devs, sPAPRPHBState, msi_devs_num, 0,
> vmstate_spapr_pci_msi, spapr_pci_msi_mig),
> + VMSTATE_UINT32_V(ddw_num, sPAPRPHBState, 3),
> VMSTATE_END_OF_LIST()
> },
> };
> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
> new file mode 100644
> index 0000000..2b5376a
> --- /dev/null
> +++ b/hw/ppc/spapr_rtas_ddw.c
> @@ -0,0 +1,283 @@
> +/*
> + * QEMU sPAPR Dynamic DMA windows support
> + *
> + * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License,
> + * or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "hw/ppc/spapr.h"
> +#include "hw/pci-host/spapr.h"
> +#include "trace.h"
> +
> +static uint32_t spapr_iommu_fixmask(struct ppc_one_seg_page_size *sps,
> + uint32_t query_mask)
> +{
> + int i, j;
> + uint32_t mask = 0;
> + const struct { int shift; uint32_t mask; } masks[] = {
> + { 12, DDW_PGSIZE_4K },
> + { 16, DDW_PGSIZE_64K },
> + { 24, DDW_PGSIZE_16M },
> + { 25, DDW_PGSIZE_32M },
> + { 26, DDW_PGSIZE_64M },
> + { 27, DDW_PGSIZE_128M },
> + { 28, DDW_PGSIZE_256M },
> + { 34, DDW_PGSIZE_16G },
> + };
> +
> + for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
> + for (j = 0; j < ARRAY_SIZE(masks); ++j) {
> + if ((sps[i].page_shift == masks[j].shift) &&
> + (query_mask & masks[j].mask)) {
> + mask |= masks[j].mask;
> + }
> + }
> + }
> +
> + return mask;
> +}
> +
> +static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
> + sPAPREnvironment *spapr,
> + uint32_t token, uint32_t nargs,
> + target_ulong args,
> + uint32_t nret, target_ulong rets)
> +{
> + CPUPPCState *env = &cpu->env;
> + sPAPRPHBState *sphb;
> + sPAPRPHBClass *spc;
> + uint64_t buid;
> + uint32_t addr, pgmask = 0;
> + uint32_t windows_available = 0, page_size_mask = 0;
> + long ret;
> +
> + if ((nargs != 3) || (nret != 5)) {
> + goto param_error_exit;
> + }
> +
> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> + addr = rtas_ld(args, 0);
> + sphb = spapr_pci_find_phb(spapr, buid);
> + if (!sphb) {
> + goto param_error_exit;
> + }
> +
> + spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> + if (!spc->ddw_query) {
> + goto hw_error_exit;
> + }
> +
> + ret = spc->ddw_query(sphb, &windows_available, &page_size_mask);
> + trace_spapr_iommu_ddw_query(buid, addr, windows_available,
> + page_size_mask, pgmask, ret);
> + if (ret) {
> + goto hw_error_exit;
> + }
> +
> + /* Work out supported page masks */
> + pgmask = spapr_iommu_fixmask(env->sps.sps, page_size_mask);
> +
> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> + rtas_st(rets, 1, windows_available);
> +
> + /*
> + * This is "Largest contiguous block of TCEs allocated specifically
> + * for (that is, are reserved for) this PE".
> + * Return the maximum number as all RAM was in 4K pages.
> + */
> + rtas_st(rets, 2, ram_size >> SPAPR_TCE_PAGE_SHIFT);
> + rtas_st(rets, 3, pgmask);
> + rtas_st(rets, 4, pgmask); /* DMA migration mask */
> + return;
> +
> +hw_error_exit:
> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> + return;
> +
> +param_error_exit:
> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +}
> +
> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
> + sPAPREnvironment *spapr,
> + uint32_t token, uint32_t nargs,
> + target_ulong args,
> + uint32_t nret, target_ulong rets)
> +{
> + sPAPRPHBState *sphb;
> + sPAPRPHBClass *spc;
> + sPAPRTCETable *tcet = NULL;
> + uint32_t addr, page_shift, window_shift, liobn;
> + uint64_t buid;
> + long ret;
> +
> + if ((nargs != 5) || (nret != 4)) {
> + goto param_error_exit;
> + }
> +
> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
> + addr = rtas_ld(args, 0);
> + sphb = spapr_pci_find_phb(spapr, buid);
> + if (!sphb) {
> + goto param_error_exit;
> + }
> +
> + spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
> + if (!spc->ddw_create) {
> + goto hw_error_exit;
> + }
> +
> + page_shift = rtas_ld(args, 3);
> + window_shift = rtas_ld(args, 4);
> + /* Default 32bit window#0 is always there so +1 */
> + liobn = SPAPR_PCI_LIOBN(sphb->index, sphb->ddw_num + 1);
What if you just initialize ddw_num to 1 on init?
Alex
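The sketch below covers the window-creation arithmetic in rtas_ibm_create_pe_dma_window(). It starts ddw_num at 1, as Alex suggests, so the "+1" for the default window #0 disappears. `struct phb` and the helper names are hypothetical; the BUID assembly and TCE count follow the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SPAPR_PCI_LIOBN(i, n) (0x80000000 | ((i) << 8) | (n))

/* Hypothetical, trimmed-down stand-in for sPAPRPHBState. */
struct phb {
    uint32_t index;
    uint32_t ddw_num;   /* next dynamic window number; 1 after init/reset */
};

/* BUID arrives as two 32-bit RTAS arguments, high word first. */
static uint64_t rtas_buid(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Number of TCE entries needed to cover a 2^window_shift byte window. */
static uint64_t ddw_nb_table(uint32_t page_shift, uint32_t window_shift)
{
    assert(window_shift >= page_shift && window_shift - page_shift < 64);
    return 1ULL << (window_shift - page_shift);
}

/*
 * Allocate a LIOBN for a new dynamic window. Starting ddw_num at 1
 * skips the default window #0 without an explicit "+1".
 */
static uint32_t ddw_next_liobn(struct phb *phb)
{
    return SPAPR_PCI_LIOBN(phb->index, phb->ddw_num++);
}
```

For example, a 1TB window (window_shift 40) with 16MB pages (page_shift 24) needs 65536 TCEs. Because ddw_num is only ever incremented, it has to go back to its initial value (1 under this scheme) on DDW reset. Otherwise the 8-bit window number would eventually wrap.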
* Re: [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option Alexey Kardashevskiy
@ 2014-08-27 9:44 ` Alexander Graf
2014-08-27 14:24 ` Alexey Kardashevskiy
2014-08-27 9:44 ` Alexander Graf
1 sibling, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2014-08-27 9:44 UTC (permalink / raw)
To: Alexey Kardashevskiy, qemu-devel
Cc: Alex Williamson, qemu-ppc, Gavin Shan, David Gibson
On 15.08.14 12:12, Alexey Kardashevskiy wrote:
> This defines new "pseries" machine version which is capable of DDW
> (dynamic DMA windows) by default.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
This machine should also be the new default for -M pseries.
Alex
* Re: [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option Alexey Kardashevskiy
2014-08-27 9:44 ` Alexander Graf
@ 2014-08-27 9:44 ` Alexander Graf
1 sibling, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2014-08-27 9:44 UTC
To: Alexey Kardashevskiy, qemu-devel
Cc: Alex Williamson, qemu-ppc, Gavin Shan, David Gibson
On 15.08.14 12:12, Alexey Kardashevskiy wrote:
> This defines new "pseries" machine version which is capable of DDW
> (dynamic DMA windows) by default.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
In fact, please reverse the logic. The 2.1 machine should get ddw=off
and the 2.2 machine an empty compat_props.
Alex
* Re: [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support
2014-08-27 9:36 ` Alexander Graf
@ 2014-08-27 13:56 ` Alexey Kardashevskiy
0 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-27 13:56 UTC
To: Alexander Graf, qemu-devel
Cc: Alex Williamson, qemu-ppc, Gavin Shan, David Gibson
On 08/27/2014 07:36 PM, Alexander Graf wrote:
>
>
> On 15.08.14 12:12, Alexey Kardashevskiy wrote:
>> This adds support for the Dynamic DMA Windows (DDW) option defined by
>> the SPAPR specification, which allows additional DMA window(s)
>> that can support page sizes other than 4K.
>>
>> The existing implementation of DDW in the guest tries to create one huge
>> DMA window with 64K or 16MB pages and map the entire guest RAM into it. If it
>> succeeds, the guest switches to dma_direct_ops and never calls
>> TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use
>> the entire RAM and not waste time on map/unmap.
>>
>> This adds 4 RTAS handlers:
>> * ibm,query-pe-dma-window
>> * ibm,create-pe-dma-window
>> * ibm,remove-pe-dma-window
>> * ibm,reset-pe-dma-window
>> These are registered from type_init() callback.
>>
>> These RTAS handlers are implemented in a separate file to avoid polluting
>> spapr_iommu.c with PHB.
>>
>> Since no PHB class implements the new callbacks in this patch, no functional
>> change is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * double loop squashed to spapr_iommu_fixmask() helper
>> * added @ddw_num counter to PHB, it is used to generate LIOBN for new
>> window; it is reset on ddw-reset event
>> * added ULL to constants used in shift operations
>> * rtas_ibm_reset_pe_dma_window() and rtas_ibm_remove_pe_dma_window()
>> do not remove windows anymore, the PHB callback has to as it will reuse
>> the same code in case of guest reboot as well
>> ---
>> hw/ppc/Makefile.objs | 3 +
>> hw/ppc/spapr_pci.c | 3 +-
>> hw/ppc/spapr_rtas_ddw.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
>> include/hw/pci-host/spapr.h | 19 +++
>> include/hw/ppc/spapr.h | 6 +-
>> trace-events | 4 +
>> 6 files changed, 316 insertions(+), 2 deletions(-)
>> create mode 100644 hw/ppc/spapr_rtas_ddw.c
>>
>> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
>> index edd44d0..9773294 100644
>> --- a/hw/ppc/Makefile.objs
>> +++ b/hw/ppc/Makefile.objs
>> @@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o
>> ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>> obj-y += spapr_pci_vfio.o
>> endif
>> +ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
>> +obj-y += spapr_rtas_ddw.o
>> +endif
>> # PowerPC 4xx boards
>> obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>> obj-y += ppc4xx_pci.o
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index aa20c36..9b03d0d 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -759,7 +759,7 @@ static int spapr_pci_post_load(void *opaque, int version_id)
>>
>> static const VMStateDescription vmstate_spapr_pci = {
>> .name = "spapr_pci",
>> - .version_id = 2,
>> + .version_id = 3,
>> .minimum_version_id = 2,
>> .pre_save = spapr_pci_pre_save,
>> .post_load = spapr_pci_post_load,
>> @@ -775,6 +775,7 @@ static const VMStateDescription vmstate_spapr_pci = {
>> VMSTATE_INT32(msi_devs_num, sPAPRPHBState),
>> VMSTATE_STRUCT_VARRAY_ALLOC(msi_devs, sPAPRPHBState, msi_devs_num, 0,
>> vmstate_spapr_pci_msi, spapr_pci_msi_mig),
>> + VMSTATE_UINT32_V(ddw_num, sPAPRPHBState, 3),
>> VMSTATE_END_OF_LIST()
>> },
>> };
>> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
>> new file mode 100644
>> index 0000000..2b5376a
>> --- /dev/null
>> +++ b/hw/ppc/spapr_rtas_ddw.c
>> @@ -0,0 +1,283 @@
>> +/*
>> + * QEMU sPAPR Dynamic DMA windows support
>> + *
>> + * Copyright (c) 2014 Alexey Kardashevskiy, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License,
>> + * or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/pci-host/spapr.h"
>> +#include "trace.h"
>> +
>> +static uint32_t spapr_iommu_fixmask(struct ppc_one_seg_page_size *sps,
>> + uint32_t query_mask)
>> +{
>> + int i, j;
>> + uint32_t mask = 0;
>> + const struct { int shift; uint32_t mask; } masks[] = {
>> + { 12, DDW_PGSIZE_4K },
>> + { 16, DDW_PGSIZE_64K },
>> + { 24, DDW_PGSIZE_16M },
>> + { 25, DDW_PGSIZE_32M },
>> + { 26, DDW_PGSIZE_64M },
>> + { 27, DDW_PGSIZE_128M },
>> + { 28, DDW_PGSIZE_256M },
>> + { 34, DDW_PGSIZE_16G },
>> + };
>> +
>> + for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
>> + for (j = 0; j < ARRAY_SIZE(masks); ++j) {
>> + if ((sps[i].page_shift == masks[j].shift) &&
>> + (query_mask & masks[j].mask)) {
>> + mask |= masks[j].mask;
>> + }
>> + }
>> + }
>> +
>> + return mask;
>> +}
>> +
>> +static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
>> + sPAPREnvironment *spapr,
>> + uint32_t token, uint32_t nargs,
>> + target_ulong args,
>> + uint32_t nret, target_ulong rets)
>> +{
>> + CPUPPCState *env = &cpu->env;
>> + sPAPRPHBState *sphb;
>> + sPAPRPHBClass *spc;
>> + uint64_t buid;
>> + uint32_t addr, pgmask = 0;
>> + uint32_t windows_available = 0, page_size_mask = 0;
>> + long ret;
>> +
>> + if ((nargs != 3) || (nret != 5)) {
>> + goto param_error_exit;
>> + }
>> +
>> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>> + addr = rtas_ld(args, 0);
>> + sphb = spapr_pci_find_phb(spapr, buid);
>> + if (!sphb) {
>> + goto param_error_exit;
>> + }
>> +
>> + spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
>> + if (!spc->ddw_query) {
>> + goto hw_error_exit;
>> + }
>> +
>> + ret = spc->ddw_query(sphb, &windows_available, &page_size_mask);
>> + trace_spapr_iommu_ddw_query(buid, addr, windows_available,
>> + page_size_mask, pgmask, ret);
>> + if (ret) {
>> + goto hw_error_exit;
>> + }
>> +
>> + /* Work out supported page masks */
>> + pgmask = spapr_iommu_fixmask(env->sps.sps, page_size_mask);
>> +
>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> + rtas_st(rets, 1, windows_available);
>> +
>> + /*
>> + * This is "Largest contiguous block of TCEs allocated specifically
>> + * for (that is, are reserved for) this PE".
>> + * Return the maximum number as all RAM was in 4K pages.
>> + */
>> + rtas_st(rets, 2, ram_size >> SPAPR_TCE_PAGE_SHIFT);
>> + rtas_st(rets, 3, pgmask);
>> + rtas_st(rets, 4, pgmask); /* DMA migration mask */
>> + return;
>> +
>> +hw_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> + return;
>> +
>> +param_error_exit:
>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +}
>> +
>> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
>> + sPAPREnvironment *spapr,
>> + uint32_t token, uint32_t nargs,
>> + target_ulong args,
>> + uint32_t nret, target_ulong rets)
>> +{
>> + sPAPRPHBState *sphb;
>> + sPAPRPHBClass *spc;
>> + sPAPRTCETable *tcet = NULL;
>> + uint32_t addr, page_shift, window_shift, liobn;
>> + uint64_t buid;
>> + long ret;
>> +
>> + if ((nargs != 5) || (nret != 4)) {
>> + goto param_error_exit;
>> + }
>> +
>> + buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>> + addr = rtas_ld(args, 0);
>> + sphb = spapr_pci_find_phb(spapr, buid);
>> + if (!sphb) {
>> + goto param_error_exit;
>> + }
>> +
>> + spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
>> + if (!spc->ddw_create) {
>> + goto hw_error_exit;
>> + }
>> +
>> + page_shift = rtas_ld(args, 3);
>> + window_shift = rtas_ld(args, 4);
>> + /* Default 32bit window#0 is always there so +1 */
>> + liobn = SPAPR_PCI_LIOBN(sphb->index, sphb->ddw_num + 1);
>
> What if you just initialize ddw_num to 1 on init?
All of this DDW code needs reworking again.
POWER8 allows exactly 2 DMA windows per PE; their start addresses are fixed
and defined by bit 59 of the PCI address.
Normally guests just create an additional huge window and that's it. They
do not care whether QEMU advertises the "ddw-reset" RTAS token, and it is
all simple and nice.
But sles11sp3 is awesomely smart: if it sees the "ddw-reset" token, it
removes the default window and then creates a huge window. I can do that,
but guests (sles11 and newer) do not accept a huge window starting at zero;
they treat it as a failure (even though PAPR does not, and when I hacked
the guest to accept it, it worked).
So I'll do the first version without "reset".
However, "ddw-reset" is mandatory if we want to be PAPR 2.7 compliant. So
I'll probably allow removing the default DMA window, but when the guest asks
for a huge one, I will create the second window, which starts high, and hope
that if the guest ever asks for a second window, it will be ready for it to
start from zero :)
--
Alexey
* Re: [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option
2014-08-27 9:44 ` Alexander Graf
@ 2014-08-27 14:24 ` Alexey Kardashevskiy
0 siblings, 0 replies; 41+ messages in thread
From: Alexey Kardashevskiy @ 2014-08-27 14:24 UTC
To: Alexander Graf, qemu-devel
Cc: Alex Williamson, qemu-ppc, Gavin Shan, David Gibson
On 08/27/2014 07:44 PM, Alexander Graf wrote:
>
>
> On 15.08.14 12:12, Alexey Kardashevskiy wrote:
>> This defines new "pseries" machine version which is capable of DDW
>> (dynamic DMA windows) by default.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> This machine should also be the new default for -M pseries.
I was hoping that David's machines patch would go in first and I would have
to rework mine anyway :)
--
Alexey
end of thread
Thread overview: 41+ messages
2014-08-15 10:12 [Qemu-devel] [RFC PATCH v2 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal Alexey Kardashevskiy
2014-08-19 0:39 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows Alexey Kardashevskiy
2014-08-19 0:43 ` David Gibson
2014-08-20 8:09 ` Alexey Kardashevskiy
2014-08-27 9:27 ` Alexander Graf
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 03/13] spapr_pci: Make find_phb()/find_dev() public Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 04/13] spapr_iommu: Make spapr_tce_find_by_liobn() public Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros Alexey Kardashevskiy
2014-08-19 0:44 ` David Gibson
2014-08-27 9:29 ` Alexander Graf
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 06/13] spapr_iommu: Implement free_table() helper Alexey Kardashevskiy
2014-08-26 6:16 ` David Gibson
2014-08-26 7:04 ` Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 07/13] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls support Alexey Kardashevskiy
2014-08-26 7:06 ` David Gibson
2014-08-27 9:36 ` Alexander Graf
2014-08-27 13:56 ` Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 08/13] spapr_pci: Enable DDW Alexey Kardashevskiy
2014-08-26 7:14 ` David Gibson
2014-08-26 8:11 ` Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 09/13] spapr_pci_vfio: Call spapr_pci::reset on reset Alexey Kardashevskiy
2014-08-26 6:55 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW Alexey Kardashevskiy
2014-08-18 17:42 ` Alex Williamson
2014-08-20 7:49 ` Alexey Kardashevskiy
2014-08-20 19:44 ` Alex Williamson
2014-08-21 2:47 ` Alexey Kardashevskiy
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 11/13] spapr_pci_vfio: Enable DDW Alexey Kardashevskiy
2014-08-26 7:19 ` David Gibson
2014-08-26 8:16 ` Alexey Kardashevskiy
2014-08-27 8:25 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 12/13] vfio: Enable DDW ioctls to VFIO IOMMU driver Alexey Kardashevskiy
2014-08-26 7:20 ` David Gibson
2014-08-26 8:20 ` Alexey Kardashevskiy
2014-08-27 8:42 ` David Gibson
2014-08-15 10:12 ` [Qemu-devel] [RFC PATCH v2 13/13] spapr: Add pseries-2.2 machine with default "ddw" option Alexey Kardashevskiy
2014-08-27 9:44 ` Alexander Graf
2014-08-27 14:24 ` Alexey Kardashevskiy
2014-08-27 9:44 ` Alexander Graf