* [Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
@ 2018-02-05 7:22 Yi Min Zhao
From: Yi Min Zhao @ 2018-02-05 7:22 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, pasic, pmorel, cohuck, alex.williamson, marcel,
zyimin
This series contains three patches:
1) optimize the code walking the DMA tables and the rpcit handler
2) fix the issue caused by the IOTLB global refresh
3) use the right pal and pba when registering the ioat
The issue mentioned above was found while testing SMC-R tools. It was
introduced when the Linux guest started using a global refresh to purge
the whole IOTLB of invalid entries in a lazy fashion, instead of
flushing each entry when invalidating table entries.
The previous QEMU implementation neither kept track of the mappings nor
handled the guest's global flush request correctly, so a major part of
the IOTLB entries were never flushed.
Consequently, the host Linux kernel, which still holds the previous
mapping, correctly reports an -EEXIST DMA mapping error on the next
mapping with the same IOVA. The second patch fixes this issue.
During the investigation, we noticed that the current code walking
PCI IOMMU page tables didn't check important flags of table entries,
including:
1) protection bit
2) table length
3) table offset
4) intermediate tables' invalid bit
5) format control bit
The first patch implements these checks before the IOTLB global refresh
issue is handled. To keep track of the mapped IOTLB entries and to be
able to check whether the host IOTLB entries need to be refreshed, we
implement an IOTLB cache in QEMU and introduce some helper functions to
check these bits. All S390IOTLBEntry instances are stored in a new hash
table indexed by IOVA. Each PCI device has its own IOMMU, so each IOMMU
also has its own hash table caching the corresponding PCI device's DMA
entries. Finally, we split each contiguous 1M DMA range into 4K pages
for the DMA map operations, and the error notification code is also
optimized.
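As a rough illustration of the splitting described above (the helper name is hypothetical, not code from the series), the page count for one map request can be computed like this:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Hypothetical helper (not part of the patch): number of 4K pages the
 * rpcit handler ends up mapping for the range [iova, iova + len),
 * after aligning the start down to a page boundary. */
static uint64_t dma_range_pages(uint64_t iova, uint64_t len)
{
    uint64_t start = iova & PAGE_MASK;
    uint64_t end = iova + len;

    return (end - start + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

A 1M segment thus expands into 256 single-page map operations against the host.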
Change log:
v1->v2:
1) update commit messages
2) move some changes from the 2nd patch to the 1st patch
3) define macros for 'ett' in the 1st patch
Yi Min Zhao (3):
s390x/pci: fixup the code walking IOMMU tables
s390x/pci: fixup global refresh
s390x/pci: use the right pal and pba in reg_ioat()
hw/s390x/s390-pci-bus.c | 233 ++++++++++++++++++++++++++++++++++++++---------
hw/s390x/s390-pci-bus.h | 17 ++++
hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
3 files changed, 275 insertions(+), 78 deletions(-)
--
2.14.3 (Apple Git-98)
* [Qemu-devel] [PATCH v2 1/3] s390x/pci: fixup the code walking IOMMU tables
From: Yi Min Zhao @ 2018-02-05 7:22 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, pasic, pmorel, cohuck, alex.williamson, marcel,
zyimin
The current s390x PCI IOMMU code lacks checks for several flags, including:
1) protection bit
2) table length
3) table offset
4) intermediate tables' invalid bit
5) format control bit
This patch introduces a new struct named S390IOTLBEntry and adds the
missing checks. When a check fails, the guest is informed with the
corresponding error number. Finally, s390_guest_io_table_walk() is
exported so that callers can retrieve the error number.
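The address range covered by a single entry at each table level follows directly from the new get_frame_size() helper and the ett macros introduced by this patch; as a standalone sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Expected-table-type values as defined by this patch. */
#define ZPCI_ETT_RT  1   /* region table  */
#define ZPCI_ETT_ST  0   /* segment table */
#define ZPCI_ETT_PT -1   /* page table    */

/* Size of the address range covered by one entry at each table level,
 * mirroring get_frame_size() from this patch. */
static uint64_t get_frame_size(int8_t ett)
{
    switch (ett) {
    case ZPCI_ETT_PT:
        return 1ULL << 12;   /* 4K page */
    case ZPCI_ETT_ST:
        return 1ULL << 20;   /* 1M segment */
    case ZPCI_ETT_RT:
        return 1ULL << 31;   /* 2G region */
    }
    return 0;                /* unknown table type */
}
```

The walk starts at the region table (ett = 1) and decrements ett on each descent, so the frame size shrinks from 2G to 1M to 4K as translation proceeds.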
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
---
hw/s390x/s390-pci-bus.c | 223 ++++++++++++++++++++++++++++++++++++++---------
hw/s390x/s390-pci-bus.h | 16 ++++
hw/s390x/s390-pci-inst.c | 64 ++++++--------
3 files changed, 225 insertions(+), 78 deletions(-)
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 7d9c65e719..e7ef7d28d9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -309,49 +309,186 @@ static uint64_t get_st_pto(uint64_t entry)
: 0;
}
-static uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
- uint64_t guest_dma_address)
+static bool rt_entry_isvalid(uint64_t entry)
{
- uint64_t sto_a, pto_a, px_a;
- uint64_t sto, pto, pte;
- uint32_t rtx, sx, px;
-
- rtx = calc_rtx(guest_dma_address);
- sx = calc_sx(guest_dma_address);
- px = calc_px(guest_dma_address);
-
- sto_a = guest_iota + rtx * sizeof(uint64_t);
- sto = address_space_ldq(&address_space_memory, sto_a,
- MEMTXATTRS_UNSPECIFIED, NULL);
- sto = get_rt_sto(sto);
- if (!sto) {
- pte = 0;
+ return (entry & ZPCI_TABLE_VALID_MASK) == ZPCI_TABLE_VALID;
+}
+
+static bool pt_entry_isvalid(uint64_t entry)
+{
+ return (entry & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_VALID;
+}
+
+static bool entry_isprotected(uint64_t entry)
+{
+ return (entry & ZPCI_TABLE_PROT_MASK) == ZPCI_TABLE_PROTECTED;
+}
+
+/* ett is expected table type, -1 page table, 0 segment table, 1 region table */
+static uint64_t get_table_index(uint64_t iova, int8_t ett)
+{
+ switch (ett) {
+ case ZPCI_ETT_PT:
+ return calc_px(iova);
+ case ZPCI_ETT_ST:
+ return calc_sx(iova);
+ case ZPCI_ETT_RT:
+ return calc_rtx(iova);
+ }
+
+ return -1;
+}
+
+static bool entry_isvalid(uint64_t entry, int8_t ett)
+{
+ switch (ett) {
+ case ZPCI_ETT_PT:
+ return pt_entry_isvalid(entry);
+ case ZPCI_ETT_ST:
+ case ZPCI_ETT_RT:
+ return rt_entry_isvalid(entry);
+ }
+
+ return false;
+}
+
+/* Return true if address translation is done */
+static bool translate_iscomplete(uint64_t entry, int8_t ett)
+{
+ switch (ett) {
+ case ZPCI_ETT_ST:
+ return (entry & ZPCI_TABLE_FC) ? true : false;
+ case ZPCI_ETT_RT:
+ return false;
+ }
+
+ return true;
+}
+
+static uint64_t get_frame_size(int8_t ett)
+{
+ switch (ett) {
+ case ZPCI_ETT_PT:
+ return 1ULL << 12;
+ case ZPCI_ETT_ST:
+ return 1ULL << 20;
+ case ZPCI_ETT_RT:
+ return 1ULL << 31;
+ }
+
+ return 0;
+}
+
+static uint64_t get_next_table_origin(uint64_t entry, int8_t ett)
+{
+ switch (ett) {
+ case ZPCI_ETT_PT:
+ return entry & ZPCI_PTE_ADDR_MASK;
+ case ZPCI_ETT_ST:
+ return get_st_pto(entry);
+ case ZPCI_ETT_RT:
+ return get_rt_sto(entry);
+ }
+
+ return 0;
+}
+
+/**
+ * table_translate: perform the translation step within one table and return
+ * the next table origin
+ *
+ * @entry: the entry being translated; the result is stored in it.
+ * @to: the address of the table origin.
+ * @ett: expected table type: 1 region table, 0 segment table, -1 page table.
+ * @error: error code
+ */
+static uint64_t table_translate(S390IOTLBEntry *entry, uint64_t to, int8_t ett,
+ uint16_t *error)
+{
+ uint64_t tx, te, nto = 0;
+ uint16_t err = 0;
+
+ tx = get_table_index(entry->iova, ett);
+ te = address_space_ldq(&address_space_memory, to + tx * sizeof(uint64_t),
+ MEMTXATTRS_UNSPECIFIED, NULL);
+
+ if (!te) {
+ err = ERR_EVENT_INVALTE;
goto out;
}
- pto_a = sto + sx * sizeof(uint64_t);
- pto = address_space_ldq(&address_space_memory, pto_a,
- MEMTXATTRS_UNSPECIFIED, NULL);
- pto = get_st_pto(pto);
- if (!pto) {
- pte = 0;
+ if (!entry_isvalid(te, ett)) {
+ entry->perm &= IOMMU_NONE;
goto out;
}
- px_a = pto + px * sizeof(uint64_t);
- pte = address_space_ldq(&address_space_memory, px_a,
- MEMTXATTRS_UNSPECIFIED, NULL);
+ if (ett == ZPCI_ETT_RT && ((te & ZPCI_TABLE_LEN_RTX) != ZPCI_TABLE_LEN_RTX
+ || te & ZPCI_TABLE_OFFSET_MASK)) {
+ err = ERR_EVENT_INVALTL;
+ goto out;
+ }
+ nto = get_next_table_origin(te, ett);
+ if (!nto) {
+ err = ERR_EVENT_TT;
+ goto out;
+ }
+
+ if (entry_isprotected(te)) {
+ entry->perm &= IOMMU_RO;
+ } else {
+ entry->perm &= IOMMU_RW;
+ }
+
+ if (translate_iscomplete(te, ett)) {
+ switch (ett) {
+ case ZPCI_ETT_PT:
+ entry->translated_addr = te & ZPCI_PTE_ADDR_MASK;
+ break;
+ case ZPCI_ETT_ST:
+ entry->translated_addr = (te & ZPCI_SFAA_MASK) |
+ (entry->iova & ~ZPCI_SFAA_MASK);
+ break;
+ }
+ nto = 0;
+ }
out:
- return pte;
+ if (err) {
+ entry->perm = IOMMU_NONE;
+ *error = err;
+ }
+ entry->len = get_frame_size(ett);
+ return nto;
+}
+
+uint16_t s390_guest_io_table_walk(uint64_t g_iota, hwaddr addr,
+ S390IOTLBEntry *entry)
+{
+ uint64_t to = s390_pci_get_table_origin(g_iota);
+ int8_t ett = 1;
+ uint16_t error = 0;
+
+ entry->iova = addr & PAGE_MASK;
+ entry->translated_addr = 0;
+ entry->perm = IOMMU_RW;
+
+ if (entry_isprotected(g_iota)) {
+ entry->perm &= IOMMU_RO;
+ }
+
+ while (to) {
+ to = table_translate(entry, to, ett--, &error);
+ }
+
+ return error;
}
static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
IOMMUAccessFlags flag)
{
- uint64_t pte;
- uint32_t flags;
S390PCIIOMMU *iommu = container_of(mr, S390PCIIOMMU, iommu_mr);
+ S390IOTLBEntry entry;
+ uint16_t error = 0;
IOMMUTLBEntry ret = {
.target_as = &address_space_memory,
.iova = 0,
@@ -374,26 +511,26 @@ static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
DPRINTF("iommu trans addr 0x%" PRIx64 "\n", addr);
if (addr < iommu->pba || addr > iommu->pal) {
- return ret;
+ error = ERR_EVENT_OORANGE;
+ goto err;
}
- pte = s390_guest_io_table_walk(s390_pci_get_table_origin(iommu->g_iota),
- addr);
- if (!pte) {
- return ret;
- }
+ error = s390_guest_io_table_walk(iommu->g_iota, addr, &entry);
- flags = pte & ZPCI_PTE_FLAG_MASK;
- ret.iova = addr;
- ret.translated_addr = pte & ZPCI_PTE_ADDR_MASK;
- ret.addr_mask = 0xfff;
+ ret.iova = entry.iova;
+ ret.translated_addr = entry.translated_addr;
+ ret.addr_mask = entry.len - 1;
+ ret.perm = entry.perm;
- if (flags & ZPCI_PTE_INVALID) {
- ret.perm = IOMMU_NONE;
- } else {
- ret.perm = IOMMU_RW;
+ if (flag != IOMMU_NONE && !(flag & ret.perm)) {
+ error = ERR_EVENT_TPROTE;
+ }
+err:
+ if (error) {
+ iommu->pbdev->state = ZPCI_FS_ERROR;
+ s390_pci_generate_error_event(error, iommu->pbdev->fh,
+ iommu->pbdev->fid, addr, 0);
}
-
return ret;
}
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index 2993f0ddef..7ed577c806 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -148,6 +148,8 @@ enum ZpciIoatDtype {
#define ZPCI_STE_FLAG_MASK 0x7ffULL
#define ZPCI_STE_ADDR_MASK (~ZPCI_STE_FLAG_MASK)
+#define ZPCI_SFAA_MASK (~((1ULL << 20) - 1))
+
/* I/O Page tables */
#define ZPCI_PTE_VALID_MASK 0x400
#define ZPCI_PTE_INVALID 0x400
@@ -165,10 +167,15 @@ enum ZpciIoatDtype {
#define ZPCI_TABLE_INVALID 0x20
#define ZPCI_TABLE_PROTECTED 0x200
#define ZPCI_TABLE_UNPROTECTED 0x000
+#define ZPCI_TABLE_FC 0x400
#define ZPCI_TABLE_VALID_MASK 0x20
#define ZPCI_TABLE_PROT_MASK 0x200
+#define ZPCI_ETT_RT 1
+#define ZPCI_ETT_ST 0
+#define ZPCI_ETT_PT -1
+
/* PCI Function States
*
* reserved: default; device has just been plugged or is in progress of being
@@ -253,6 +260,13 @@ typedef struct S390MsixInfo {
uint32_t pba_offset;
} S390MsixInfo;
+typedef struct S390IOTLBEntry {
+ uint64_t iova;
+ uint64_t translated_addr;
+ uint64_t len;
+ uint64_t perm;
+} S390IOTLBEntry;
+
typedef struct S390PCIBusDevice S390PCIBusDevice;
typedef struct S390PCIIOMMU {
Object parent_obj;
@@ -320,6 +334,8 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu);
void s390_pci_iommu_disable(S390PCIIOMMU *iommu);
void s390_pci_generate_error_event(uint16_t pec, uint32_t fh, uint32_t fid,
uint64_t faddr, uint32_t e);
+uint16_t s390_guest_io_table_walk(uint64_t g_iota, hwaddr addr,
+ S390IOTLBEntry *entry);
S390PCIBusDevice *s390_pci_find_dev_by_idx(S390pciState *s, uint32_t idx);
S390PCIBusDevice *s390_pci_find_dev_by_fh(S390pciState *s, uint32_t fh);
S390PCIBusDevice *s390_pci_find_dev_by_fid(S390pciState *s, uint32_t fid);
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index be449210d9..1d33a89351 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -575,23 +575,23 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
{
CPUS390XState *env = &cpu->env;
uint32_t fh;
+ uint16_t error = 0;
S390PCIBusDevice *pbdev;
S390PCIIOMMU *iommu;
+ S390IOTLBEntry entry;
hwaddr start, end;
- IOMMUTLBEntry entry;
- IOMMUMemoryRegion *iommu_mr;
- IOMMUMemoryRegionClass *imrc;
+ IOMMUTLBEntry notify;
cpu_synchronize_state(CPU(cpu));
if (env->psw.mask & PSW_MASK_PSTATE) {
s390_program_interrupt(env, PGM_PRIVILEGED, 4, ra);
- goto out;
+ return 0;
}
if (r2 & 0x1) {
s390_program_interrupt(env, PGM_SPECIFICATION, 4, ra);
- goto out;
+ return 0;
}
fh = env->regs[r1] >> 32;
@@ -602,7 +602,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
if (!pbdev) {
DPRINTF("rpcit no pci dev\n");
setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
- goto out;
+ return 0;
}
switch (pbdev->state) {
@@ -622,44 +622,38 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
iommu = pbdev->iommu;
if (!iommu->g_iota) {
- pbdev->state = ZPCI_FS_ERROR;
- setcc(cpu, ZPCI_PCI_LS_ERR);
- s390_set_status_code(env, r1, ZPCI_PCI_ST_INSUF_RES);
- s390_pci_generate_error_event(ERR_EVENT_INVALAS, pbdev->fh, pbdev->fid,
- start, 0);
- goto out;
+ error = ERR_EVENT_INVALAS;
+ goto err;
}
if (end < iommu->pba || start > iommu->pal) {
- pbdev->state = ZPCI_FS_ERROR;
- setcc(cpu, ZPCI_PCI_LS_ERR);
- s390_set_status_code(env, r1, ZPCI_PCI_ST_INSUF_RES);
- s390_pci_generate_error_event(ERR_EVENT_OORANGE, pbdev->fh, pbdev->fid,
- start, 0);
- goto out;
+ error = ERR_EVENT_OORANGE;
+ goto err;
}
- iommu_mr = &iommu->iommu_mr;
- imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
-
while (start < end) {
- entry = imrc->translate(iommu_mr, start, IOMMU_NONE);
-
- if (!entry.translated_addr) {
- pbdev->state = ZPCI_FS_ERROR;
- setcc(cpu, ZPCI_PCI_LS_ERR);
- s390_set_status_code(env, r1, ZPCI_PCI_ST_INSUF_RES);
- s390_pci_generate_error_event(ERR_EVENT_SERR, pbdev->fh, pbdev->fid,
- start, ERR_EVENT_Q_BIT);
- goto out;
+ error = s390_guest_io_table_walk(iommu->g_iota, start, &entry);
+ if (error) {
+ break;
}
-
- memory_region_notify_iommu(iommu_mr, entry);
- start += entry.addr_mask + 1;
+ notify.target_as = &address_space_memory;
+ notify.iova = entry.iova;
+ notify.translated_addr = entry.translated_addr;
+ notify.addr_mask = entry.len - 1;
+ notify.perm = entry.perm;
+ memory_region_notify_iommu(&iommu->iommu_mr, notify);
+ start += entry.len;
}
- setcc(cpu, ZPCI_PCI_LS_OK);
-out:
+err:
+ if (error) {
+ pbdev->state = ZPCI_FS_ERROR;
+ setcc(cpu, ZPCI_PCI_LS_ERR);
+ s390_set_status_code(env, r1, ZPCI_PCI_ST_FUNC_IN_ERR);
+ s390_pci_generate_error_event(error, pbdev->fh, pbdev->fid, start, 0);
+ } else {
+ setcc(cpu, ZPCI_PCI_LS_OK);
+ }
return 0;
}
--
2.14.3 (Apple Git-98)
* [Qemu-devel] [PATCH v2 2/3] s390x/pci: fixup global refresh
From: Yi Min Zhao @ 2018-02-05 7:22 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, pasic, pmorel, cohuck, alex.williamson, marcel,
zyimin
The VFIO common code provides no way to modify a previous mapping entry
other than unmapping it and mapping it again with the new properties.
To avoid -EEXIST DMA mapping errors, we introduce a GHashTable that
stores S390IOTLBEntry instances in order to cache the mapped entries.
When intercepting the rpcit instruction, entries that are already mapped
identically are ignored, so the same range is not mapped multiple times,
and valid entries whose properties changed are unmapped and re-mapped.
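The update policy described above can be condensed into a small decision helper. This is a hypothetical sketch (the names and signature are illustrative, not the patch's s390_pci_update_iotlb(), which performs the hash-table and notifier operations directly):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified permission flags, modeled on QEMU's IOMMUAccessFlags. */
enum { IOMMU_NONE = 0, IOMMU_RO = 1, IOMMU_WO = 2, IOMMU_RW = 3 };

/* Hypothetical decision helper: given the cached state of an IOVA and
 * the freshly translated entry, decide which host operation is needed. */
typedef enum {
    UPDATE_SKIP,    /* identical mapping already cached: do nothing   */
    UPDATE_MAP,     /* not cached yet: map                            */
    UPDATE_REMAP,   /* cached with different properties: unmap + map  */
    UPDATE_UNMAP    /* entry became invalid: unmap and drop the cache */
} UpdateAction;

static UpdateAction iotlb_update_action(int cached, uint64_t cached_xlat,
                                        int cached_perm, uint64_t new_xlat,
                                        int new_perm)
{
    if (new_perm == IOMMU_NONE) {
        return cached ? UPDATE_UNMAP : UPDATE_SKIP;
    }
    if (!cached) {
        return UPDATE_MAP;
    }
    if (cached_xlat == new_xlat && cached_perm == new_perm) {
        return UPDATE_SKIP;
    }
    return UPDATE_REMAP;
}
```

Skipping identical entries is what keeps the lazy global refresh from triggering -EEXIST in the host VFIO driver.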
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
---
hw/s390x/s390-pci-bus.c | 24 +++++++++++++++-------
hw/s390x/s390-pci-bus.h | 1 +
hw/s390x/s390-pci-inst.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
3 files changed, 63 insertions(+), 15 deletions(-)
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index e7ef7d28d9..77a50cab36 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -487,7 +487,8 @@ static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
IOMMUAccessFlags flag)
{
S390PCIIOMMU *iommu = container_of(mr, S390PCIIOMMU, iommu_mr);
- S390IOTLBEntry entry;
+ S390IOTLBEntry *entry;
+ uint64_t iova = addr & PAGE_MASK;
uint16_t error = 0;
IOMMUTLBEntry ret = {
.target_as = &address_space_memory,
@@ -515,12 +516,17 @@ static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
goto err;
}
- error = s390_guest_io_table_walk(iommu->g_iota, addr, &entry);
-
- ret.iova = entry.iova;
- ret.translated_addr = entry.translated_addr;
- ret.addr_mask = entry.len - 1;
- ret.perm = entry.perm;
+ entry = g_hash_table_lookup(iommu->iotlb, &iova);
+ if (entry) {
+ ret.iova = entry->iova;
+ ret.translated_addr = entry->translated_addr;
+ ret.addr_mask = entry->len - 1;
+ ret.perm = entry->perm;
+ } else {
+ ret.iova = iova;
+ ret.addr_mask = ~PAGE_MASK;
+ ret.perm = IOMMU_NONE;
+ }
if (flag != IOMMU_NONE && !(flag & ret.perm)) {
error = ERR_EVENT_TPROTE;
@@ -572,6 +578,8 @@ static S390PCIIOMMU *s390_pci_get_iommu(S390pciState *s, PCIBus *bus,
PCI_FUNC(devfn));
memory_region_init(&iommu->mr, OBJECT(iommu), mr_name, UINT64_MAX);
address_space_init(&iommu->as, &iommu->mr, as_name);
+ iommu->iotlb = g_hash_table_new_full(g_int64_hash, g_int64_equal,
+ NULL, g_free);
table->iommu[PCI_SLOT(devfn)] = iommu;
g_free(mr_name);
@@ -661,6 +669,7 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
{
iommu->enabled = false;
+ g_hash_table_remove_all(iommu->iotlb);
memory_region_del_subregion(&iommu->mr, MEMORY_REGION(&iommu->iommu_mr));
object_unparent(OBJECT(&iommu->iommu_mr));
}
@@ -676,6 +685,7 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
}
table->iommu[PCI_SLOT(devfn)] = NULL;
+ g_hash_table_destroy(iommu->iotlb);
address_space_destroy(&iommu->as);
object_unparent(OBJECT(&iommu->mr));
object_unparent(OBJECT(iommu));
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index 7ed577c806..1f7f9b5814 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -278,6 +278,7 @@ typedef struct S390PCIIOMMU {
uint64_t g_iota;
uint64_t pba;
uint64_t pal;
+ GHashTable *iotlb;
} S390PCIIOMMU;
typedef struct S390PCIIOMMUTable {
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 1d33a89351..997a9cc2e9 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -571,6 +571,45 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
return 0;
}
+static void s390_pci_update_iotlb(S390PCIIOMMU *iommu, S390IOTLBEntry *entry)
+{
+ S390IOTLBEntry *cache = g_hash_table_lookup(iommu->iotlb, &entry->iova);
+ IOMMUTLBEntry notify = {
+ .target_as = &address_space_memory,
+ .iova = entry->iova,
+ .translated_addr = entry->translated_addr,
+ .perm = entry->perm,
+ .addr_mask = ~PAGE_MASK,
+ };
+
+ if (entry->perm == IOMMU_NONE) {
+ if (!cache) {
+ return;
+ }
+ g_hash_table_remove(iommu->iotlb, &entry->iova);
+ } else {
+ if (cache) {
+ if (cache->perm == entry->perm &&
+ cache->translated_addr == entry->translated_addr) {
+ return;
+ }
+
+ notify.perm = IOMMU_NONE;
+ memory_region_notify_iommu(&iommu->iommu_mr, notify);
+ notify.perm = entry->perm;
+ }
+
+ cache = g_new(S390IOTLBEntry, 1);
+ cache->iova = entry->iova;
+ cache->translated_addr = entry->translated_addr;
+ cache->len = PAGE_SIZE;
+ cache->perm = entry->perm;
+ g_hash_table_replace(iommu->iotlb, &cache->iova, cache);
+ }
+
+ memory_region_notify_iommu(&iommu->iommu_mr, notify);
+}
+
int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
{
CPUS390XState *env = &cpu->env;
@@ -580,7 +619,6 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
S390PCIIOMMU *iommu;
S390IOTLBEntry entry;
hwaddr start, end;
- IOMMUTLBEntry notify;
cpu_synchronize_state(CPU(cpu));
@@ -636,15 +674,14 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
if (error) {
break;
}
- notify.target_as = &address_space_memory;
- notify.iova = entry.iova;
- notify.translated_addr = entry.translated_addr;
- notify.addr_mask = entry.len - 1;
- notify.perm = entry.perm;
- memory_region_notify_iommu(&iommu->iommu_mr, notify);
+
start += entry.len;
+ while (entry.iova < start && entry.iova < end) {
+ s390_pci_update_iotlb(iommu, &entry);
+ entry.iova += PAGE_SIZE;
+ entry.translated_addr += PAGE_SIZE;
+ }
}
-
err:
if (error) {
pbdev->state = ZPCI_FS_ERROR;
--
2.14.3 (Apple Git-98)
* [Qemu-devel] [PATCH v2 3/3] s390x/pci: use the right pal and pba in reg_ioat()
From: Yi Min Zhao @ 2018-02-05 7:22 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, pasic, pmorel, cohuck, alex.williamson, marcel,
zyimin
When registering the ioat, pba should consist of its leftmost 52 bits
with the rightmost 12 bits zeroed, and pal should consist of its
leftmost 52 bits with the rightmost 12 bits set to one, since the lower
12 bits of words 5 and 7 of the FIB are ignored by the facility. Let's
fix this up.
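The masking the patch adds to reg_ioat() can be sketched as two small helpers (the function names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* The facility ignores the lower 12 bits of words 5 and 7 of the FIB,
 * so align pba down to a 4K boundary... */
static uint64_t ioat_pba(uint64_t pba)
{
    return pba & ~0xfffULL;
}

/* ...and round pal up to the last byte of its 4K frame. */
static uint64_t ioat_pal(uint64_t pal)
{
    return pal | 0xfffULL;
}
```

With this, the subsequent `pba > pal` and DMA-range checks operate on the values the facility actually uses.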
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
---
hw/s390x/s390-pci-inst.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 997a9cc2e9..3fcc330fe3 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -865,6 +865,8 @@ static int reg_ioat(CPUS390XState *env, S390PCIIOMMU *iommu, ZpciFib fib,
uint8_t dt = (g_iota >> 2) & 0x7;
uint8_t t = (g_iota >> 11) & 0x1;
+ pba &= ~0xfff;
+ pal |= 0xfff;
if (pba > pal || pba < ZPCI_SDMA_ADDR || pal > ZPCI_EDMA_ADDR) {
s390_program_interrupt(env, PGM_OPERAND, 6, ra);
return -EINVAL;
--
2.14.3 (Apple Git-98)
* Re: [Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
From: Cornelia Huck @ 2018-02-06 10:23 UTC (permalink / raw)
To: Yi Min Zhao
Cc: qemu-devel, borntraeger, pasic, pmorel, alex.williamson, marcel
On Mon, 5 Feb 2018 15:22:55 +0800
Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:
> This series contains three patches,
> 1) optimizes the code including walking DMA tables and rpcit handler
> 2) fixes the issue caused by IOTLB global refresh
> 3) uses the right pal and pba when registering ioat
>
> The issue mentioned above was found when we tested SMC-r tools. This
> behavior has been introduced when linux guest started using a global
> refresh to purge the whole IOTLB of invalid entries in a lazy fashion
> instead of flushing each entry when invalidating table entries.
>
> The previous QEMU implementation didn't keep track of the mapping,
> didn't handle correctly the global flush demand from the guest and a
> major part of the IOTLB entries were not flushed.
>
> Consequently linux kernel on the host keeping the previous mapping
> reports, as it should, -EEXIST DMA mapping error on the next mapping
> with the same IOVA. The second patch fixes this issue.
Introducing a local tracking mechanism still feels a bit awkward to me
(even though it works, of course). If nobody else needs such a thing,
our best choice is to do it like that, though.
>
> During the investigation, we noticed that the current code walking
> PCI IOMMU page tables didn't check important flags of table entries,
> including:
> 1) protection bit
> 2) table length
> 3) table offset
> 4) intermediate tables' invalid bit
> 5) format control bit
>
> We implement the checking in the first patch before handling the
> IOTLB global refresh issue. To keep track of the mapped IOTLB entries
> and be able to check if the host IOTLB entries need to be refreshed
> we implement a IOTLB cache in QEMU, and introduce some helper
> functions to check these bits. All S390IOTLBEntry instances are stored
> in a new hashtable which are indexed by IOVA. Each PCI device has its
> own IOMMU. Therefore each IOMMU also has its own hashtable caching
> corresponding PCI device's DMA entries. Finally, we split 1M
> contiguous DMA range into 4K pages to do DMA map, and the code about
> error notification is also optimized.
>
> Change log:
> v1->v2:
> 1) update commit messages
> 2) move some changes from the 2nd patch to the 1st patch
> 3) define macros for 'ett' in the 1st patch
>
> Yi Min Zhao (3):
> s390x/pci: fixup the code walking IOMMU tables
> s390x/pci: fixup global refresh
> s390x/pci: use the right pal and pba in reg_ioat()
>
> hw/s390x/s390-pci-bus.c | 233 ++++++++++++++++++++++++++++++++++++++---------
> hw/s390x/s390-pci-bus.h | 17 ++++
> hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
> 3 files changed, 275 insertions(+), 78 deletions(-)
>
I have played with these patches and some virtio-pci devices (since I
don't have access to real zpci cards), and it worked both under kvm and
under tcg. So I'm inclined to apply this (I can't review further due to
missing documentation), unless the pci folks have further comments.
* Re: [Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
From: Yi Min Zhao @ 2018-02-07 2:56 UTC (permalink / raw)
To: Cornelia Huck
Cc: qemu-devel, borntraeger, pasic, pmorel, alex.williamson, marcel
On 2018/2/6 6:23 PM, Cornelia Huck wrote:
> On Mon, 5 Feb 2018 15:22:55 +0800
> Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:
>
>> This series contains three patches,
>> 1) optimizes the code including walking DMA tables and rpcit handler
>> 2) fixes the issue caused by IOTLB global refresh
>> 3) uses the right pal and pba when registering ioat
>>
>> The issue mentioned above was found when we tested SMC-r tools. This
>> behavior has been introduced when linux guest started using a global
>> refresh to purge the whole IOTLB of invalid entries in a lazy fashion
>> instead of flushing each entry when invalidating table entries.
>>
>> The previous QEMU implementation didn't keep track of the mapping,
>> didn't handle correctly the global flush demand from the guest and a
>> major part of the IOTLB entries were not flushed.
>>
>> Consequently linux kernel on the host keeping the previous mapping
>> reports, as it should, -EEXIST DMA mapping error on the next mapping
>> with the same IOVA. The second patch fixes this issue.
> Introducing a local tracking mechanism still feels a bit awkward to me
> (even though it works, of course). If nobody else needs such a thing,
> our best choice is to do it like that, though.
Caching the IOTLB entries is also helpful for supporting 2G mappings in
the future.
>
>> During the investigation, we noticed that the current code walking
>> PCI IOMMU page tables didn't check important flags of table entries,
>> including:
>> 1) protection bit
>> 2) table length
>> 3) table offset
>> 4) intermediate tables' invalid bit
>> 5) format control bit
>>
>> We implement the checking in the first patch before handling the
>> IOTLB global refresh issue. To keep track of the mapped IOTLB entries
>> and be able to check if the host IOTLB entries need to be refreshed
>> we implement a IOTLB cache in QEMU, and introduce some helper
>> functions to check these bits. All S390IOTLBEntry instances are stored
>> in a new hashtable which are indexed by IOVA. Each PCI device has its
>> own IOMMU. Therefore each IOMMU also has its own hashtable caching
>> corresponding PCI device's DMA entries. Finally, we split 1M
>> contiguous DMA range into 4K pages to do DMA map, and the code about
>> error notification is also optimized.
>>
>> Change log:
>> v1->v2:
>> 1) update commit messages
>> 2) move some changes from the 2nd patch to the 1st patch
>> 3) define macros for 'ett' in the 1st patch
>>
>> Yi Min Zhao (3):
>> s390x/pci: fixup the code walking IOMMU tables
>> s390x/pci: fixup global refresh
>> s390x/pci: use the right pal and pba in reg_ioat()
>>
>> hw/s390x/s390-pci-bus.c | 233 ++++++++++++++++++++++++++++++++++++++---------
>> hw/s390x/s390-pci-bus.h | 17 ++++
>> hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
>> 3 files changed, 275 insertions(+), 78 deletions(-)
>>
> I have played with these patches and some virtio-pci devices (since I
> don't have access to real zpci cards), and it worked both under kvm and
> under tcg. So I'm inclined to apply this (I can't review further due to
> missing documentation), unless the pci folks have further comments.
Thanks!
>
>
* Re: [Qemu-devel] [PATCH v2 0/3] s390x/pci: fixup and optimize IOTLB code
From: Cornelia Huck @ 2018-02-07 11:17 UTC (permalink / raw)
To: Yi Min Zhao
Cc: qemu-devel, borntraeger, pasic, pmorel, alex.williamson, marcel
On Wed, 7 Feb 2018 10:56:01 +0800
Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:
> On 2018/2/6 6:23 PM, Cornelia Huck wrote:
> > On Mon, 5 Feb 2018 15:22:55 +0800
> > Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:
> >
> >> This series contains three patches,
> >> 1) optimizes the code including walking DMA tables and rpcit handler
> >> 2) fixes the issue caused by IOTLB global refresh
> >> 3) uses the right pal and pba when registering ioat
> >>
> >> The issue mentioned above was found when we tested SMC-r tools. This
> >> behavior has been introduced when linux guest started using a global
> >> refresh to purge the whole IOTLB of invalid entries in a lazy fashion
> >> instead of flushing each entry when invalidating table entries.
> >>
> >> The previous QEMU implementation didn't keep track of the mapping,
> >> didn't handle correctly the global flush demand from the guest and a
> >> major part of the IOTLB entries were not flushed.
> >>
> >> Consequently linux kernel on the host keeping the previous mapping
> >> reports, as it should, -EEXIST DMA mapping error on the next mapping
> >> with the same IOVA. The second patch fixes this issue.
> > Introducing a local tracking mechanism still feels a bit awkward to me
> > (even though it works, of course). If nobody else needs such a thing,
> > our best choice is to do it like that, though.
> Caching iotlb entries is also helpful for us to support 2G mapping in
> future.
OK, that makes sense.
> >
> >> During the investigation, we noticed that the current code walking
> >> PCI IOMMU page tables didn't check important flags of table entries,
> >> including:
> >> 1) protection bit
> >> 2) table length
> >> 3) table offset
> >> 4) intermediate tables' invalid bit
> >> 5) format control bit
> >>
> >> We implement the checking in the first patch before handling the
> >> IOTLB global refresh issue. To keep track of the mapped IOTLB entries
> >> and be able to check if the host IOTLB entries need to be refreshed
> >> we implement a IOTLB cache in QEMU, and introduce some helper
> >> functions to check these bits. All S390IOTLBEntry instances are stored
> >> in a new hashtable which are indexed by IOVA. Each PCI device has its
> >> own IOMMU. Therefore each IOMMU also has its own hashtable caching
> >> corresponding PCI device's DMA entries. Finally, we split 1M
> >> contiguous DMA range into 4K pages to do DMA map, and the code about
> >> error notification is also optimized.
> >>
> >> Change log:
> >> v1->v2:
> >> 1) update commit messages
> >> 2) move some changes from the 2nd patch to the 1st patch
> >> 3) define macros for 'ett' in the 1st patch
> >>
> >> Yi Min Zhao (3):
> >> s390x/pci: fixup the code walking IOMMU tables
> >> s390x/pci: fixup global refresh
> >> s390x/pci: use the right pal and pba in reg_ioat()
> >>
> >> hw/s390x/s390-pci-bus.c | 233 ++++++++++++++++++++++++++++++++++++++---------
> >> hw/s390x/s390-pci-bus.h | 17 ++++
> >> hw/s390x/s390-pci-inst.c | 103 ++++++++++++++-------
> >> 3 files changed, 275 insertions(+), 78 deletions(-)
> >>
> > I have played with these patches and some virtio-pci devices (since I
> > don't have access to real zpci cards), and it worked both under kvm and
> > under tcg. So I'm inclined to apply this (I can't review further due to
> > missing documentation), unless the pci folks have further comments.
> Thanks!
Applied to s390-next, thanks.