qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 0/3] add PCI support for the s390 platform
@ 2014-11-10 14:20 Frank Blaschka
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support Frank Blaschka
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Frank Blaschka @ 2014-11-10 14:20 UTC (permalink / raw)
  To: agraf, cornelia.huck, borntraeger, pbonzini, qemu-devel
  Cc: peter.maydell, james.hogan, mtosatti, Frank Blaschka, rth

This set of patches implements PCI support for the s390 platform.
It is now possible to run virtio-net-pci and, potentially, all
virtual PCI devices that conform to the s390 platform constraints.

Please review and consider for integration into 2.3.

Thanks,

Frank Blaschka (3):
  s390: Add PCI bus support
  s390: implement pci instructions
  kvm: extend kvm_irqchip_add_msi_route to work on s390

 default-configs/s390x-softmmu.mak |   1 +
 hw/s390x/Makefile.objs            |   1 +
 hw/s390x/css.c                    |   5 +
 hw/s390x/css.h                    |   1 +
 hw/s390x/s390-pci-bus.c           | 485 ++++++++++++++++++++++++
 hw/s390x/s390-pci-bus.h           | 254 +++++++++++++
 hw/s390x/s390-virtio-ccw.c        |   3 +
 hw/s390x/sclp.c                   |  10 +-
 include/hw/s390x/sclp.h           |   8 +
 include/sysemu/kvm.h              |   4 +
 kvm-all.c                         |   7 +
 target-arm/kvm.c                  |   6 +
 target-i386/kvm.c                 |   6 +
 target-mips/kvm.c                 |   6 +
 target-ppc/kvm.c                  |   6 +
 target-s390x/Makefile.objs        |   2 +-
 target-s390x/ioinst.c             |  52 +++
 target-s390x/ioinst.h             |   1 +
 target-s390x/kvm.c                |  78 ++++
 target-s390x/pci_ic.c             | 753 ++++++++++++++++++++++++++++++++++++++
 target-s390x/pci_ic.h             | 335 +++++++++++++++++
 21 files changed, 2022 insertions(+), 2 deletions(-)
 create mode 100644 hw/s390x/s390-pci-bus.c
 create mode 100644 hw/s390x/s390-pci-bus.h
 create mode 100644 target-s390x/pci_ic.c
 create mode 100644 target-s390x/pci_ic.h

-- 
1.8.5.5


* [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support
  2014-11-10 14:20 [Qemu-devel] [PATCH 0/3] add PCI support for the s390 platform Frank Blaschka
@ 2014-11-10 14:20 ` Frank Blaschka
  2014-11-10 15:14   ` Alexander Graf
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 2/3] s390: implement pci instructions Frank Blaschka
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 3/3] kvm: extend kvm_irqchip_add_msi_route to work on s390 Frank Blaschka
  2 siblings, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-10 14:20 UTC (permalink / raw)
  To: agraf, cornelia.huck, borntraeger, pbonzini, qemu-devel
  Cc: peter.maydell, james.hogan, mtosatti, Frank Blaschka, rth

From: Frank Blaschka <frank.blaschka@de.ibm.com>

This patch implements a PCI bus for s390x, together with the
infrastructure to generate and handle hotplug events, to configure and
deconfigure devices via SCLP, to perform IOMMU translations, and to
provide s390 support for MSI/MSI-X notification processing.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---
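Note on the IOMMU translation: s390_guest_io_table_walk() splits the
guest DMA address into three table indices with calc_rtx(), calc_sx()
and calc_px(). The following is only a minimal standalone sketch of that
split, using the shift and mask values from s390-pci-bus.h. The
ZpciDmaIndex type and zpci_split_dma_addr() are hypothetical names used
for illustration and are not part of the patch. The value of
ZPCI_PT_MASK assumes 8-byte table entries, as on a 64-bit host.

```c
#include <stdint.h>

/* Values copied from hw/s390x/s390-pci-bus.h in this patch. */
#define PAGE_SHIFT       12
#define ZPCI_PT_BITS     8
#define ZPCI_TABLE_BITS  11
#define ZPCI_ST_SHIFT    (ZPCI_PT_BITS + PAGE_SHIFT)        /* 20 */
#define ZPCI_RT_SHIFT    (ZPCI_ST_SHIFT + ZPCI_TABLE_BITS)  /* 31 */
#define ZPCI_INDEX_MASK  0x7ffULL
/* ZPCI_PT_ENTRIES - 1 = 0x800 / 8 - 1, assuming 8-byte entries. */
#define ZPCI_PT_MASK     0xffULL

/* Hypothetical helper type, for illustration only. */
typedef struct ZpciDmaIndex {
    uint32_t rtx;  /* index used at the first level (calc_rtx) */
    uint32_t sx;   /* index used at the second level (calc_sx) */
    uint32_t px;   /* index used at the page-table level (calc_px) */
} ZpciDmaIndex;

/* Split a guest DMA address the same way calc_rtx/calc_sx/calc_px do. */
static ZpciDmaIndex zpci_split_dma_addr(uint64_t dma_addr)
{
    ZpciDmaIndex idx;

    idx.rtx = (uint32_t)((dma_addr >> ZPCI_RT_SHIFT) & ZPCI_INDEX_MASK);
    idx.sx  = (uint32_t)((dma_addr >> ZPCI_ST_SHIFT) & ZPCI_INDEX_MASK);
    idx.px  = (uint32_t)((dma_addr >> PAGE_SHIFT) & ZPCI_PT_MASK);
    return idx;
}
```

For example, the address 0x100234000 yields rtx 2, sx 2 and px 0x34.
Each index is then multiplied by sizeof(uint64_t) and added to the
origin of the corresponding table, as done in the table walk.
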
 default-configs/s390x-softmmu.mak |   1 +
 hw/s390x/Makefile.objs            |   1 +
 hw/s390x/css.c                    |   5 +
 hw/s390x/css.h                    |   1 +
 hw/s390x/s390-pci-bus.c           | 485 ++++++++++++++++++++++++++++++++++++++
 hw/s390x/s390-pci-bus.h           | 254 ++++++++++++++++++++
 hw/s390x/s390-virtio-ccw.c        |   3 +
 hw/s390x/sclp.c                   |  10 +-
 include/hw/s390x/sclp.h           |   8 +
 target-s390x/ioinst.c             |  52 ++++
 target-s390x/ioinst.h             |   1 +
 11 files changed, 820 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-bus.c
 create mode 100644 hw/s390x/s390-pci-bus.h

diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
index 126d88d..6ee2ff8 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -1,3 +1,4 @@
+include pci.mak
 CONFIG_VIRTIO=y
 CONFIG_SCLPCONSOLE=y
 CONFIG_S390_FLIC=y
diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 1ba6c3a..428d957 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -8,3 +8,4 @@ obj-y += ipl.o
 obj-y += css.o
 obj-y += s390-virtio-ccw.o
 obj-y += virtio-ccw.o
+obj-y += s390-pci-bus.o
diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index b67c039..7553085 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1299,6 +1299,11 @@ void css_generate_chp_crws(uint8_t cssid, uint8_t chpid)
     /* TODO */
 }
 
+void css_generate_css_crws(uint8_t cssid)
+{
+    css_queue_crw(CRW_RSC_CSS, 0, 0, 0);
+}
+
 int css_enable_mcsse(void)
 {
     trace_css_enable_facility("mcsse");
diff --git a/hw/s390x/css.h b/hw/s390x/css.h
index 33104ac..7e53148 100644
--- a/hw/s390x/css.h
+++ b/hw/s390x/css.h
@@ -101,6 +101,7 @@ void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, uint16_t rsid);
 void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
                            int hotplugged, int add);
 void css_generate_chp_crws(uint8_t cssid, uint8_t chpid);
+void css_generate_css_crws(uint8_t cssid);
 void css_adapter_interrupt(uint8_t isc);
 
 #define CSS_IO_ADAPTER_VIRTIO 1
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
new file mode 100644
index 0000000..f2fa6ba
--- /dev/null
+++ b/hw/s390x/s390-pci-bus.c
@@ -0,0 +1,485 @@
+/*
+ * s390 PCI BUS
+ *
+ * Copyright 2014 IBM Corp.
+ * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
+ *            Hong Bo Li <lihbbj@cn.ibm.com>
+ *            Yi Min Zhao <zyimin@cn.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include <hw/pci/pci.h>
+#include <hw/pci/pci_bus.h>
+#include <hw/s390x/css.h>
+#include <hw/s390x/sclp.h>
+#include <hw/pci/msi.h>
+#include "qemu/error-report.h"
+#include "s390-pci-bus.h"
+
+/* #define DEBUG_S390PCI_BUS */
+#ifdef DEBUG_S390PCI_BUS
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+static const unsigned long be_to_le = BITS_PER_LONG - 1;
+static QTAILQ_HEAD(, SeiContainer) pending_sei =
+    QTAILQ_HEAD_INITIALIZER(pending_sei);
+static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
+    QTAILQ_HEAD_INITIALIZER(device_list);
+
+int chsc_sei_nt2_get_event(void *res)
+{
+    ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res;
+    PciCcdfAvail *accdf;
+    PciCcdfErr *eccdf;
+    int rc = 1;
+    SeiContainer *sei_cont;
+
+    sei_cont = QTAILQ_FIRST(&pending_sei);
+    if (sei_cont) {
+        QTAILQ_REMOVE(&pending_sei, sei_cont, link);
+        nt2_res->nt = 2;
+        nt2_res->cc = sei_cont->cc;
+        switch (sei_cont->cc) {
+        case 1: /* error event */
+            eccdf = (PciCcdfErr *)nt2_res->ccdf;
+            eccdf->fid = cpu_to_be32(sei_cont->fid);
+            eccdf->fh = cpu_to_be32(sei_cont->fh);
+            break;
+        case 2: /* availability event */
+            accdf = (PciCcdfAvail *)nt2_res->ccdf;
+            accdf->fid = cpu_to_be32(sei_cont->fid);
+            accdf->fh = cpu_to_be32(sei_cont->fh);
+            accdf->pec = cpu_to_be16(sei_cont->pec);
+            break;
+        default:
+            abort();
+        }
+        g_free(sei_cont);
+        rc = 0;
+    }
+
+    return rc;
+}
+
+int chsc_sei_nt2_have_event(void)
+{
+    return !QTAILQ_EMPTY(&pending_sei);
+}
+
+S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
+{
+    S390PCIBusDevice *pbdev;
+
+    QTAILQ_FOREACH(pbdev, &device_list, next) {
+        if (pbdev->fid == fid) {
+            return pbdev;
+        }
+    }
+    return NULL;
+}
+
+void s390_pci_sclp_configure(int configure, SCCB *sccb)
+{
+    PciCfgSccb *psccb = (PciCfgSccb *)sccb;
+    S390PCIBusDevice *pbdev = s390_pci_find_dev_by_fid(be32_to_cpu(psccb->aid));
+    uint16_t rc;
+
+    if (pbdev) {
+        if ((configure == 1 && pbdev->configured == true) ||
+            (configure == 0 && pbdev->configured == false)) {
+            rc = SCLP_RC_NO_ACTION_REQUIRED;
+        } else {
+            pbdev->configured = !pbdev->configured;
+            rc = SCLP_RC_NORMAL_COMPLETION;
+        }
+    } else {
+        DPRINTF("sclp config %d no dev found\n", configure);
+        rc = SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED;
+    }
+
+    psccb->header.response_code = cpu_to_be16(rc);
+    return;
+}
+
+static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
+{
+    return PCI_SLOT(pdev->devfn);
+}
+
+static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
+{
+    return PCI_SLOT(pdev->devfn) | FH_VIRT;
+}
+
+S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
+{
+    S390PCIBusDevice *dev;
+    int i = 0;
+
+    QTAILQ_FOREACH(dev, &device_list, next) {
+        if (i == idx) {
+            return dev;
+        }
+        i++;
+    }
+    return NULL;
+}
+
+S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh)
+{
+    S390PCIBusDevice *pbdev;
+
+    QTAILQ_FOREACH(pbdev, &device_list, next) {
+        if (pbdev->fh == fh) {
+            return pbdev;
+        }
+    }
+    return NULL;
+}
+
+static void s390_pci_generate_plug_event(uint16_t pec, uint32_t fh,
+                                         uint32_t fid)
+{
+    SeiContainer *sei_cont = g_malloc0(sizeof(SeiContainer));
+
+    sei_cont->fh = fh;
+    sei_cont->fid = fid;
+    sei_cont->cc = 2;
+    sei_cont->pec = pec;
+
+    QTAILQ_INSERT_TAIL(&pending_sei, sei_cont, link);
+    css_generate_css_crws(0);
+}
+
+static void s390_pci_set_irq(void *opaque, int irq, int level)
+{
+    /* nothing to do */
+}
+
+static int s390_pci_map_irq(PCIDevice *pci_dev, int irq_num)
+{
+    /* nothing to do */
+    return 0;
+}
+
+void s390_pci_bus_init(void)
+{
+    DeviceState *dev;
+
+    dev = qdev_create(NULL, TYPE_S390_PCI_HOST_BRIDGE);
+    qdev_init_nofail(dev);
+}
+
+uint64_t s390_pci_get_table_origin(uint64_t iota)
+{
+    return iota & ~ZPCI_IOTA_RTTO_FLAG;
+}
+
+static uint32_t s390_pci_get_p(uint64_t iota)
+{
+    return iota & ~ZPCI_IOTA_RTTO_FLAG;
+}
+
+static uint32_t s390_pci_get_dt(uint64_t iota)
+{
+    return (iota >> 2) & 0x7;
+}
+
+static uint32_t s390_pci_get_fs(uint64_t iota)
+{
+    uint32_t dt = s390_pci_get_dt(iota);
+
+    if (dt == 4 || dt == 5) {
+        return iota & 0x3;
+    } else {
+        return ZPCI_IOTA_FS_4K;
+    }
+}
+
+uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
+                                  uint64_t guest_dma_address)
+{
+    uint64_t sto_a, pto_a, px_a;
+    uint64_t sto, pto, pte;
+    uint32_t rtx, sx, px;
+
+    rtx = calc_rtx(guest_dma_address);
+    sx = calc_sx(guest_dma_address);
+    px = calc_px(guest_dma_address);
+
+    sto_a = guest_iota + rtx * sizeof(uint64_t);
+    cpu_physical_memory_rw(sto_a, (uint8_t *)&sto, sizeof(uint64_t), 0);
+    sto = (uint64_t)get_rt_sto(sto);
+
+    pto_a = sto + sx * sizeof(uint64_t);
+    cpu_physical_memory_rw(pto_a, (uint8_t *)&pto, sizeof(uint64_t), 0);
+    pto = (uint64_t)get_st_pto(pto);
+
+    px_a = pto + px * sizeof(uint64_t);
+    cpu_physical_memory_rw(px_a, (uint8_t *)&pte, sizeof(uint64_t), 0);
+
+    return pte;
+}
+
+static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *iommu, hwaddr addr,
+                                          bool is_write)
+{
+    IOMMUTLBEntry ret;
+    uint32_t fs;
+    uint64_t pte;
+    BEntry *container = container_of(iommu, BEntry, mr);
+    S390PCIBusDevice *pbdev = container->pbdev;
+    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pbdev->pdev)
+                                           ->qbus.parent);
+
+    DPRINTF("iommu trans addr 0x%lx\n", addr);
+
+    /* s390 does not have an APIC mapped to main storage so we use
+     * a separate AddressSpace only for msix notifications
+     */
+    if (addr == ZPCI_MSI_ADDR) {
+        ret.target_as = &s->msix_notify_as;
+        ret.iova = addr;
+        ret.translated_addr = addr;
+        ret.addr_mask = 0xfff;
+        ret.perm = IOMMU_RW;
+        return ret;
+    }
+
+    pte = s390_guest_io_table_walk(s390_pci_get_table_origin(pbdev->g_iota),
+                                   addr);
+
+    ret.target_as = &address_space_memory;
+    ret.iova = addr;
+    ret.translated_addr = pte & ZPCI_PTE_ADDR_MASK;
+    fs = s390_pci_get_fs(pbdev->g_iota);
+    if (fs == ZPCI_IOTA_FS_4K) {
+        ret.addr_mask = 0xfff;
+    } else if (fs == ZPCI_IOTA_FS_1M) {
+        ret.addr_mask = 0xfffff;
+    } else if (fs == ZPCI_IOTA_FS_2G) {
+        ret.addr_mask = 0x7fffffff;
+    }
+    if (s390_pci_get_p(pbdev->g_iota) == 1) {
+        ret.perm = IOMMU_RO;
+    } else {
+        ret.perm = IOMMU_RW;
+    }
+    return ret;
+}
+
+static const MemoryRegionIOMMUOps s390_iommu_ops = {
+    .translate = s390_translate_iommu,
+};
+
+static AddressSpace *s390_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
+{
+    S390pciState *s = opaque;
+
+    return &s->iommu[PCI_SLOT(devfn)].as;
+}
+
+static void s390_msi_ctrl_write(void *opaque, hwaddr addr, uint64_t data,
+                                unsigned int size)
+{
+    S390PCIBusDevice *pbdev;
+    unsigned long *aibv, *aisb;
+    int summary_set;
+    hwaddr aibv_len, aisb_len;
+    uint32_t io_int_word;
+    uint32_t hdata = le32_to_cpu(data);
+    uint32_t fid = hdata >> ZPCI_MSI_VEC_BITS;
+    uint32_t vec = hdata & ZPCI_MSI_VEC_MASK;
+
+    DPRINTF("write_msix data 0x%lx fid %d vec 0x%x\n", data, fid, vec);
+
+    pbdev = s390_pci_find_dev_by_fid(fid);
+    if (!pbdev) {
+        DPRINTF("msix_notify no dev\n");
+        return;
+    }
+    aibv_len = aisb_len = 8;
+    aibv = cpu_physical_memory_map(pbdev->routes.adapter.ind_addr,
+                                   &aibv_len, 1);
+    aisb = cpu_physical_memory_map(pbdev->routes.adapter.summary_addr,
+                                   &aisb_len, 1);
+
+    set_bit(vec ^ be_to_le, aibv);
+    summary_set = test_and_set_bit(pbdev->routes.adapter.summary_offset
+                                   ^ be_to_le, aisb);
+
+    if (!summary_set) {
+        io_int_word = (pbdev->isc << 27) | IO_INT_WORD_AI;
+        s390_io_interrupt(0, 0, 0, io_int_word);
+    }
+
+    cpu_physical_memory_unmap(aibv, aibv_len, 1, aibv_len);
+    cpu_physical_memory_unmap(aisb, aisb_len, 1, aisb_len);
+    return;
+}
+
+static uint64_t s390_msi_ctrl_read(void *opaque, hwaddr addr, unsigned size)
+{
+    return 0xffffffff;
+}
+
+static const MemoryRegionOps s390_msi_ctrl_ops = {
+    .write = s390_msi_ctrl_write,
+    .read = s390_msi_ctrl_read,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static void s390_pcihost_init_as(S390pciState *s)
+{
+    int i;
+
+    for (i = 0; i < PCI_SLOT_MAX; i++) {
+        memory_region_init_iommu(&s->iommu[i].mr, OBJECT(s),
+                                 &s390_iommu_ops, "iommu-s390", UINT64_MAX);
+        address_space_init(&s->iommu[i].as, &s->iommu[i].mr, "iommu-pci");
+    }
+
+    memory_region_init_io(&s->msix_notify_mr, OBJECT(s),
+                          &s390_msi_ctrl_ops, s, "msix-s390", UINT64_MAX);
+    address_space_init(&s->msix_notify_as, &s->msix_notify_mr, "msix-pci");
+}
+
+static int s390_pcihost_init(SysBusDevice *dev)
+{
+    PCIBus *b;
+    BusState *bus;
+    PCIHostState *phb = PCI_HOST_BRIDGE(dev);
+    S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
+
+    DPRINTF("host_init\n");
+
+    b = pci_register_bus(DEVICE(dev), NULL,
+                         s390_pci_set_irq, s390_pci_map_irq, NULL,
+                         get_system_memory(), get_system_io(), 0, 64,
+                         TYPE_PCI_BUS);
+    s390_pcihost_init_as(s);
+    pci_setup_iommu(b, s390_pci_dma_iommu, s);
+
+    bus = BUS(b);
+    qbus_set_hotplug_handler(bus, DEVICE(dev), NULL);
+    phb->bus = b;
+    return 0;
+}
+
+static int s390_pcihost_setup_msix(S390PCIBusDevice *pbdev)
+{
+    uint8_t pos;
+    uint16_t ctrl;
+    uint32_t table, pba;
+
+    pos = pci_find_capability(pbdev->pdev, PCI_CAP_ID_MSIX);
+    if (!pos) {
+        pbdev->msix.available = false;
+        return 0;
+    }
+
+    ctrl = pci_host_config_read_common(pbdev->pdev, pos + PCI_CAP_FLAGS,
+             pci_config_size(pbdev->pdev), sizeof(ctrl));
+    table = pci_host_config_read_common(pbdev->pdev, pos + PCI_MSIX_TABLE,
+             pci_config_size(pbdev->pdev), sizeof(table));
+    pba = pci_host_config_read_common(pbdev->pdev, pos + PCI_MSIX_PBA,
+             pci_config_size(pbdev->pdev), sizeof(pba));
+
+    pbdev->msix.table_bar = table & PCI_MSIX_FLAGS_BIRMASK;
+    pbdev->msix.table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
+    pbdev->msix.pba_bar = pba & PCI_MSIX_FLAGS_BIRMASK;
+    pbdev->msix.pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
+    pbdev->msix.entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
+    pbdev->msix.available = true;
+    return 0;
+}
+
+static void s390_pcihost_hot_plug(HotplugHandler *hotplug_dev,
+                                  DeviceState *dev, Error **errp)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    S390PCIBusDevice *pbdev;
+    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
+                                           ->qbus.parent);
+
+    pbdev = g_malloc0(sizeof(*pbdev));
+
+    pbdev->fid = s390_pci_get_pfid(pci_dev);
+    pbdev->pdev = pci_dev;
+    pbdev->configured = true;
+
+    pbdev->fh = s390_pci_get_pfh(pci_dev);
+
+    s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev = pbdev;
+    s390_pcihost_setup_msix(pbdev);
+
+    QTAILQ_INSERT_TAIL(&device_list, pbdev, next);
+    if (dev->hotplugged) {
+        s390_pci_generate_plug_event(HP_EVENT_RESERVED_TO_STANDBY,
+                                     pbdev->fh, pbdev->fid);
+        s390_pci_generate_plug_event(HP_EVENT_TO_CONFIGURED,
+                                     pbdev->fh, pbdev->fid);
+    }
+    return;
+}
+
+static void s390_pcihost_hot_unplug(HotplugHandler *hotplug_dev,
+                                    DeviceState *dev, Error **errp)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
+                                           ->qbus.parent);
+    S390PCIBusDevice *pbdev = s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev;
+
+    if (pbdev->configured) {
+        pbdev->configured = false;
+        s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
+                                     pbdev->fh, pbdev->fid);
+    }
+
+    QTAILQ_REMOVE(&device_list, pbdev, next);
+    s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
+                                 pbdev->fh, pbdev->fid);
+    s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev = NULL;
+    object_unparent(OBJECT(pci_dev));
+    g_free(pbdev);
+}
+
+static void s390_pcihost_class_init(ObjectClass *klass, void *data)
+{
+    SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+
+    dc->cannot_instantiate_with_device_add_yet = true;
+    k->init = s390_pcihost_init;
+    hc->plug = s390_pcihost_hot_plug;
+    hc->unplug = s390_pcihost_hot_unplug;
+    msi_supported = true;
+}
+
+static const TypeInfo s390_pcihost_info = {
+    .name          = TYPE_S390_PCI_HOST_BRIDGE,
+    .parent        = TYPE_PCI_HOST_BRIDGE,
+    .instance_size = sizeof(S390pciState),
+    .class_init    = s390_pcihost_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
+};
+
+static void s390_pci_register_types(void)
+{
+    type_register_static(&s390_pcihost_info);
+}
+
+type_init(s390_pci_register_types)
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
new file mode 100644
index 0000000..088f24f
--- /dev/null
+++ b/hw/s390x/s390-pci-bus.h
@@ -0,0 +1,254 @@
+/*
+ * s390 PCI BUS definitions
+ *
+ * Copyright 2014 IBM Corp.
+ * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
+ *            Hong Bo Li <lihbbj@cn.ibm.com>
+ *            Yi Min Zhao <zyimin@cn.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_S390_PCI_BUS_H
+#define HW_S390_PCI_BUS_H
+
+#include <hw/pci/pci.h>
+#include <hw/pci/pci_host.h>
+#include "hw/s390x/sclp.h"
+#include "hw/s390x/s390_flic.h"
+#include "hw/s390x/css.h"
+
+#define TYPE_S390_PCI_HOST_BRIDGE "s390-pcihost"
+#define FH_VIRT 0x00ff0000
+#define ENABLE_BIT_OFFSET 31
+#define S390_PCIPT_ADAPTER 2
+
+#define S390_PCI_HOST_BRIDGE(obj) \
+    OBJECT_CHECK(S390pciState, (obj), TYPE_S390_PCI_HOST_BRIDGE)
+
+#define HP_EVENT_TO_CONFIGURED        0x0301
+#define HP_EVENT_RESERVED_TO_STANDBY  0x0302
+#define HP_EVENT_CONFIGURED_TO_STBRES 0x0304
+#define HP_EVENT_STANDBY_TO_RESERVED  0x0308
+
+#define ZPCI_MSI_VEC_BITS 11
+#define ZPCI_MSI_VEC_MASK 0x7f
+
+#define ZPCI_MSI_ADDR  0xfe00000000000000
+#define ZPCI_SDMA_ADDR 0x100000000
+#define ZPCI_EDMA_ADDR 0x1ffffffffffffff
+
+#define PAGE_SHIFT      12
+#define PAGE_MASK       (~(PAGE_SIZE-1))
+#define PAGE_DEFAULT_ACC        0
+#define PAGE_DEFAULT_KEY        (PAGE_DEFAULT_ACC << 4)
+
+/* I/O Translation Anchor (IOTA) */
+enum ZpciIoatDtype {
+    ZPCI_IOTA_STO = 0,
+    ZPCI_IOTA_RTTO = 1,
+    ZPCI_IOTA_RSTO = 2,
+    ZPCI_IOTA_RFTO = 3,
+    ZPCI_IOTA_PFAA = 4,
+    ZPCI_IOTA_IOPFAA = 5,
+    ZPCI_IOTA_IOPTO = 7
+};
+
+#define ZPCI_IOTA_IOT_ENABLED           0x800UL
+#define ZPCI_IOTA_DT_ST                 (ZPCI_IOTA_STO  << 2)
+#define ZPCI_IOTA_DT_RT                 (ZPCI_IOTA_RTTO << 2)
+#define ZPCI_IOTA_DT_RS                 (ZPCI_IOTA_RSTO << 2)
+#define ZPCI_IOTA_DT_RF                 (ZPCI_IOTA_RFTO << 2)
+#define ZPCI_IOTA_DT_PF                 (ZPCI_IOTA_PFAA << 2)
+#define ZPCI_IOTA_FS_4K                 0
+#define ZPCI_IOTA_FS_1M                 1
+#define ZPCI_IOTA_FS_2G                 2
+#define ZPCI_KEY                        (PAGE_DEFAULT_KEY << 5)
+
+#define ZPCI_IOTA_STO_FLAG  (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_ST)
+#define ZPCI_IOTA_RTTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RT)
+#define ZPCI_IOTA_RSTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RS)
+#define ZPCI_IOTA_RFTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RF)
+#define ZPCI_IOTA_RFAA_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY |\
+                             ZPCI_IOTA_DT_PF | ZPCI_IOTA_FS_2G)
+
+/* I/O Region and segment tables */
+#define ZPCI_INDEX_MASK         0x7ffUL
+
+#define ZPCI_TABLE_TYPE_MASK    0xc
+#define ZPCI_TABLE_TYPE_RFX     0xc
+#define ZPCI_TABLE_TYPE_RSX     0x8
+#define ZPCI_TABLE_TYPE_RTX     0x4
+#define ZPCI_TABLE_TYPE_SX      0x0
+
+#define ZPCI_TABLE_LEN_RFX      0x3
+#define ZPCI_TABLE_LEN_RSX      0x3
+#define ZPCI_TABLE_LEN_RTX      0x3
+
+#define ZPCI_TABLE_OFFSET_MASK  0xc0
+#define ZPCI_TABLE_SIZE         0x4000
+#define ZPCI_TABLE_ALIGN        ZPCI_TABLE_SIZE
+#define ZPCI_TABLE_ENTRY_SIZE   (sizeof(unsigned long))
+#define ZPCI_TABLE_ENTRIES      (ZPCI_TABLE_SIZE / ZPCI_TABLE_ENTRY_SIZE)
+
+#define ZPCI_TABLE_BITS         11
+#define ZPCI_PT_BITS            8
+#define ZPCI_ST_SHIFT           (ZPCI_PT_BITS + PAGE_SHIFT)
+#define ZPCI_RT_SHIFT           (ZPCI_ST_SHIFT + ZPCI_TABLE_BITS)
+
+#define ZPCI_RTE_FLAG_MASK      0x3fffUL
+#define ZPCI_RTE_ADDR_MASK      (~ZPCI_RTE_FLAG_MASK)
+#define ZPCI_STE_FLAG_MASK      0x7ffUL
+#define ZPCI_STE_ADDR_MASK      (~ZPCI_STE_FLAG_MASK)
+
+/* I/O Page tables */
+#define ZPCI_PTE_VALID_MASK             0x400
+#define ZPCI_PTE_INVALID                0x400
+#define ZPCI_PTE_VALID                  0x000
+#define ZPCI_PT_SIZE                    0x800
+#define ZPCI_PT_ALIGN                   ZPCI_PT_SIZE
+#define ZPCI_PT_ENTRIES                 (ZPCI_PT_SIZE / ZPCI_TABLE_ENTRY_SIZE)
+#define ZPCI_PT_MASK                    (ZPCI_PT_ENTRIES - 1)
+
+#define ZPCI_PTE_FLAG_MASK              0xfffUL
+#define ZPCI_PTE_ADDR_MASK              (~ZPCI_PTE_FLAG_MASK)
+
+/* Shared bits */
+#define ZPCI_TABLE_VALID                0x00
+#define ZPCI_TABLE_INVALID              0x20
+#define ZPCI_TABLE_PROTECTED            0x200
+#define ZPCI_TABLE_UNPROTECTED          0x000
+
+#define ZPCI_TABLE_VALID_MASK           0x20
+#define ZPCI_TABLE_PROT_MASK            0x200
+
+typedef struct SeiContainer {
+    QTAILQ_ENTRY(SeiContainer) link;
+    uint32_t fid;
+    uint32_t fh;
+    uint8_t cc;
+    uint16_t pec;
+} SeiContainer;
+
+typedef struct PciCcdfErr {
+    uint32_t reserved1;
+    uint32_t fh;
+    uint32_t fid;
+    uint32_t reserved2;
+    uint64_t faddr;
+    uint32_t reserved3;
+    uint16_t reserved4;
+    uint16_t pec;
+} QEMU_PACKED PciCcdfErr;
+
+typedef struct PciCcdfAvail {
+    uint32_t reserved1;
+    uint32_t fh;
+    uint32_t fid;
+    uint32_t reserved2;
+    uint32_t reserved3;
+    uint32_t reserved4;
+    uint32_t reserved5;
+    uint16_t reserved6;
+    uint16_t pec;
+} QEMU_PACKED PciCcdfAvail;
+
+typedef struct ChscSeiNt2Res {
+    uint16_t length;
+    uint16_t code;
+    uint16_t reserved1;
+    uint8_t reserved2;
+    uint8_t nt;
+    uint8_t flags;
+    uint8_t reserved3;
+    uint8_t reserved4;
+    uint8_t cc;
+    uint32_t reserved5[13];
+    uint8_t ccdf[4016];
+} QEMU_PACKED ChscSeiNt2Res;
+
+typedef struct PciCfgSccb {
+        SCCBHeader header;
+        uint8_t atype;
+        uint8_t reserved1;
+        uint16_t reserved2;
+        uint32_t aid;
+} QEMU_PACKED PciCfgSccb;
+
+typedef struct S390MsixInfo {
+    bool available;
+    uint8_t table_bar;
+    uint8_t pba_bar;
+    uint16_t entries;
+    uint32_t table_offset;
+    uint32_t pba_offset;
+} S390MsixInfo;
+
+typedef struct S390PCIBusDevice {
+    PCIDevice *pdev;
+    bool configured;
+    uint32_t fh;
+    uint32_t fid;
+    uint64_t g_iota;
+    uint8_t isc;
+    S390MsixInfo msix;
+    AdapterRoutes routes;
+    QTAILQ_ENTRY(S390PCIBusDevice) next;
+} S390PCIBusDevice;
+
+typedef struct BEntry {
+    AddressSpace as;
+    MemoryRegion mr;
+    S390PCIBusDevice *pbdev;
+} BEntry;
+
+typedef struct S390pciState {
+    PCIHostState parent_obj;
+    BEntry iommu[PCI_SLOT_MAX];
+    AddressSpace msix_notify_as;
+    MemoryRegion msix_notify_mr;
+} S390pciState;
+
+static inline unsigned int calc_rtx(dma_addr_t ptr)
+{
+    return ((unsigned long) ptr >> ZPCI_RT_SHIFT) & ZPCI_INDEX_MASK;
+}
+
+static inline unsigned int calc_sx(dma_addr_t ptr)
+{
+    return ((unsigned long) ptr >> ZPCI_ST_SHIFT) & ZPCI_INDEX_MASK;
+}
+
+static inline unsigned int calc_px(dma_addr_t ptr)
+{
+    return ((unsigned long) ptr >> PAGE_SHIFT) & ZPCI_PT_MASK;
+}
+
+static inline unsigned long *get_rt_sto(unsigned long entry)
+{
+    return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_RTX)
+                ? (unsigned long *) (entry & ZPCI_RTE_ADDR_MASK)
+                : NULL;
+}
+
+static inline unsigned long *get_st_pto(unsigned long entry)
+{
+    return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_SX)
+            ? (unsigned long *) (entry & ZPCI_STE_ADDR_MASK)
+            : NULL;
+}
+
+int chsc_sei_nt2_get_event(void *res);
+int chsc_sei_nt2_have_event(void);
+void s390_pci_sclp_configure(int configure, SCCB *sccb);
+S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx);
+S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh);
+S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid);
+void s390_pci_bus_init(void);
+uint64_t s390_pci_get_table_origin(uint64_t iota);
+uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
+                                  uint64_t guest_dma_address);
+
+#endif
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index bc4dc2a..2e25834 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -18,6 +18,7 @@
 #include "css.h"
 #include "virtio-ccw.h"
 #include "qemu/config-file.h"
+#include "s390-pci-bus.h"
 
 #define TYPE_S390_CCW_MACHINE               "s390-ccw-machine"
 
@@ -127,6 +128,8 @@ static void ccw_init(MachineState *machine)
                       machine->initrd_filename, "s390-ccw.img");
     s390_flic_init();
 
+    s390_pci_bus_init();
+
     /* register hypercalls */
     virtio_ccw_register_hcalls();
 
diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
index a759da7..a969975 100644
--- a/hw/s390x/sclp.c
+++ b/hw/s390x/sclp.c
@@ -20,6 +20,7 @@
 #include "qemu/config-file.h"
 #include "hw/s390x/sclp.h"
 #include "hw/s390x/event-facility.h"
+#include "hw/s390x/s390-pci-bus.h"
 
 static inline SCLPEventFacility *get_event_facility(void)
 {
@@ -62,7 +63,8 @@ static void read_SCP_info(SCCB *sccb)
         read_info->entries[i].type = 0;
     }
 
-    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO);
+    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO |
+                                        SCLP_HAS_PCI_RECONFIG);
 
     /*
      * The storage increment size is a multiple of 1M and is a power of 2.
@@ -350,6 +352,12 @@ static void sclp_execute(SCCB *sccb, uint32_t code)
     case SCLP_UNASSIGN_STORAGE:
         unassign_storage(sccb);
         break;
+    case SCLP_CMDW_CONFIGURE_PCI:
+        s390_pci_sclp_configure(1, sccb);
+        break;
+    case SCLP_CMDW_DECONFIGURE_PCI:
+        s390_pci_sclp_configure(0, sccb);
+        break;
     default:
         efc->command_handler(ef, sccb, code);
         break;
diff --git a/include/hw/s390x/sclp.h b/include/hw/s390x/sclp.h
index ec07a11..e8a64e2 100644
--- a/include/hw/s390x/sclp.h
+++ b/include/hw/s390x/sclp.h
@@ -43,14 +43,22 @@
 #define SCLP_CMDW_CONFIGURE_CPU                 0x00110001
 #define SCLP_CMDW_DECONFIGURE_CPU               0x00100001
 
+/* SCLP PCI codes */
+#define SCLP_HAS_PCI_RECONFIG                   0x0000000040000000ULL
+#define SCLP_CMDW_CONFIGURE_PCI                 0x001a0001
+#define SCLP_CMDW_DECONFIGURE_PCI               0x001b0001
+#define SCLP_RECONFIG_PCI_ATPYE                 2
+
 /* SCLP response codes */
 #define SCLP_RC_NORMAL_READ_COMPLETION          0x0010
 #define SCLP_RC_NORMAL_COMPLETION               0x0020
 #define SCLP_RC_SCCB_BOUNDARY_VIOLATION         0x0100
+#define SCLP_RC_NO_ACTION_REQUIRED              0x0120
 #define SCLP_RC_INVALID_SCLP_COMMAND            0x01f0
 #define SCLP_RC_CONTAINED_EQUIPMENT_CHECK       0x0340
 #define SCLP_RC_INSUFFICIENT_SCCB_LENGTH        0x0300
 #define SCLP_RC_STANDBY_READ_COMPLETION         0x0410
+#define SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED       0x09f0
 #define SCLP_RC_INVALID_FUNCTION                0x40f0
 #define SCLP_RC_NO_EVENT_BUFFERS_STORED         0x60f0
 #define SCLP_RC_INVALID_SELECTION_MASK          0x70f0
diff --git a/target-s390x/ioinst.c b/target-s390x/ioinst.c
index b8a6486..d969f8f 100644
--- a/target-s390x/ioinst.c
+++ b/target-s390x/ioinst.c
@@ -14,6 +14,7 @@
 #include "cpu.h"
 #include "ioinst.h"
 #include "trace.h"
+#include "hw/s390x/s390-pci-bus.h"
 
 int ioinst_disassemble_sch_ident(uint32_t value, int *m, int *cssid, int *ssid,
                                  int *schid)
@@ -398,6 +399,7 @@ typedef struct ChscResp {
 #define CHSC_SCPD 0x0002
 #define CHSC_SCSC 0x0010
 #define CHSC_SDA  0x0031
+#define CHSC_SEI  0x000e
 
 #define CHSC_SCPD_0_M 0x20000000
 #define CHSC_SCPD_0_C 0x10000000
@@ -566,6 +568,53 @@ out:
     res->param = 0;
 }
 
+static int chsc_sei_nt0_get_event(void *res)
+{
+    /* no events yet */
+    return 1;
+}
+
+static int chsc_sei_nt0_have_event(void)
+{
+    /* no events yet */
+    return 0;
+}
+
+#define CHSC_SEI_NT0    (1ULL << 63)
+#define CHSC_SEI_NT2    (1ULL << 61)
+static void ioinst_handle_chsc_sei(ChscReq *req, ChscResp *res)
+{
+    uint64_t selection_mask = be64_to_cpu(*(uint64_t *)&req->param1);
+    uint8_t *res_flags = (uint8_t *)res->data;
+    int have_event = 0;
+    int have_more = 0;
+
+    /* according to the architecture, nt0 cannot be masked */
+    have_event = !chsc_sei_nt0_get_event(res);
+    have_more = chsc_sei_nt0_have_event();
+
+    if (selection_mask & CHSC_SEI_NT2) {
+        if (!have_event) {
+            have_event = !chsc_sei_nt2_get_event(res);
+        }
+
+        if (!have_more) {
+            have_more = chsc_sei_nt2_have_event();
+        }
+    }
+
+    if (have_event) {
+        res->code = cpu_to_be16(0x0001);
+        if (have_more) {
+            (*res_flags) |= 0x80;
+        } else {
+            (*res_flags) &= ~0x80;
+        }
+    } else {
+        res->code = cpu_to_be16(0x0004);
+    }
+}
+
 static void ioinst_handle_chsc_unimplemented(ChscResp *res)
 {
     res->len = cpu_to_be16(CHSC_MIN_RESP_LEN);
@@ -617,6 +666,9 @@ void ioinst_handle_chsc(S390CPU *cpu, uint32_t ipb)
     case CHSC_SDA:
         ioinst_handle_chsc_sda(req, res);
         break;
+    case CHSC_SEI:
+        ioinst_handle_chsc_sei(req, res);
+        break;
     default:
         ioinst_handle_chsc_unimplemented(res);
         break;
diff --git a/target-s390x/ioinst.h b/target-s390x/ioinst.h
index 29f6423..1efe16c 100644
--- a/target-s390x/ioinst.h
+++ b/target-s390x/ioinst.h
@@ -204,6 +204,7 @@ typedef struct CRW {
 
 #define CRW_RSC_SUBCH 0x3
 #define CRW_RSC_CHP   0x4
+#define CRW_RSC_CSS   0xb
 
 /* I/O interruption code */
 typedef struct IOIntCode {
-- 
1.8.5.5

* [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-10 14:20 [Qemu-devel] [PATCH 0/3] add PCI support for the s390 platform Frank Blaschka
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support Frank Blaschka
@ 2014-11-10 14:20 ` Frank Blaschka
  2014-11-10 15:56   ` Alexander Graf
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 3/3] kvm: extend kvm_irqchip_add_msi_route to work on s390 Frank Blaschka
  2 siblings, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-10 14:20 UTC (permalink / raw)
  To: agraf, cornelia.huck, borntraeger, pbonzini, qemu-devel
  Cc: peter.maydell, james.hogan, mtosatti, Frank Blaschka, rth

From: Frank Blaschka <frank.blaschka@de.ibm.com>

This patch implements the s390 PCI instructions in QEMU, making it
possible to access and drive PCI devices attached to the s390 PCI bus.
Because of platform constraints, devices using I/O BARs are not
supported. A device also has to support MSI/MSI-X to run on s390.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---
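The two device constraints named in the description (no I/O BARs,
MSI or MSI-X required) can be illustrated with a small standalone
sketch. This is not code from the patch; it assumes a raw 256-byte
type-0 config space image and uses only the standard PCI register
layout. The names has_io_bar(), has_msi_or_msix() and zpci_candidate()
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config space offsets and bits (type-0 header). */
#define CFG_STATUS          0x06  /* status register, low byte */
#define CFG_STATUS_CAP_LIST 0x10  /* capability list present */
#define CFG_BAR0            0x10  /* first of six 32-bit BAR slots */
#define CFG_CAP_PTR         0x34  /* offset of first capability */
#define CAP_ID_MSI          0x05
#define CAP_ID_MSIX         0x11
#define BAR_SPACE_IO        0x01  /* bit 0 set: I/O space BAR */
#define BAR_MEM_TYPE_MASK   0x06  /* bits 2:1 of a memory BAR */
#define BAR_MEM_TYPE_64     0x04  /* 64-bit BAR, occupies two slots */
#define NUM_BARS            6

/* Config space is little endian; assemble a 32-bit value bytewise. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* True if any BAR decodes I/O space. */
static bool has_io_bar(const uint8_t *cfg)
{
    int i;

    assert(cfg != NULL);
    for (i = 0; i < NUM_BARS; i++) {
        uint32_t bar = cfg_read32(cfg, CFG_BAR0 + 4 * (size_t)i);

        if (bar & BAR_SPACE_IO) {
            return true;
        }
        /* The next slot holds the upper half of a 64-bit BAR: skip it. */
        if ((bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64) {
            i++;
        }
    }
    return false;
}

/* Walk the capability list looking for MSI or MSI-X. */
static bool has_msi_or_msix(const uint8_t *cfg)
{
    uint8_t ptr;
    int guard;

    assert(cfg != NULL);
    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST)) {
        return false;
    }
    ptr = cfg[CFG_CAP_PTR] & 0xfc;
    /* The guard bounds the walk in case the list is malformed (a loop). */
    for (guard = 0; guard < 48 && ptr >= 0x40; guard++) {
        uint8_t id = cfg[ptr];

        if (id == CAP_ID_MSI || id == CAP_ID_MSIX) {
            return true;
        }
        ptr = cfg[ptr + 1] & 0xfc;
    }
    return false;
}

/* Hypothetical helper: does the device meet both constraints? */
bool zpci_candidate(const uint8_t *cfg)
{
    return !has_io_bar(cfg) && has_msi_or_msix(cfg);
}
```

The patch itself does not perform such a check up front; the sketch
only shows what the two constraints mean at the config space level.
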
 target-s390x/Makefile.objs |   2 +-
 target-s390x/kvm.c         |  52 ++++
 target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
 target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
 4 files changed, 1141 insertions(+), 1 deletion(-)
 create mode 100644 target-s390x/pci_ic.c
 create mode 100644 target-s390x/pci_ic.h

diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
index 2c57494..cc71400 100644
--- a/target-s390x/Makefile.objs
+++ b/target-s390x/Makefile.objs
@@ -2,4 +2,4 @@ obj-y += translate.o helper.o cpu.o interrupt.o
 obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
 obj-y += gdbstub.o
 obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o pci_ic.o
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 5b10a25..d59e740 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -40,6 +40,7 @@
 #include "exec/gdbstub.h"
 #include "trace.h"
 #include "qapi-event.h"
+#include "pci_ic.h"
 
 /* #define DEBUG_KVM */
 
@@ -56,6 +57,7 @@
 #define IPA0_B2                         0xb200
 #define IPA0_B9                         0xb900
 #define IPA0_EB                         0xeb00
+#define IPA0_E3                         0xe300
 
 #define PRIV_B2_SCLP_CALL               0x20
 #define PRIV_B2_CSCH                    0x30
@@ -76,8 +78,17 @@
 #define PRIV_B2_XSCH                    0x76
 
 #define PRIV_EB_SQBS                    0x8a
+#define PRIV_EB_PCISTB                  0xd0
+#define PRIV_EB_SIC                     0xd1
 
 #define PRIV_B9_EQBS                    0x9c
+#define PRIV_B9_CLP                     0xa0
+#define PRIV_B9_PCISTG                  0xd0
+#define PRIV_B9_PCILG                   0xd2
+#define PRIV_B9_RPCIT                   0xd3
+
+#define PRIV_E3_MPCIFC                  0xd0
+#define PRIV_E3_STPCIFC                 0xd4
 
 #define DIAG_IPL                        0x308
 #define DIAG_KVM_HYPERCALL              0x500
@@ -814,6 +825,18 @@ static int handle_b9(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
     int r = 0;
 
     switch (ipa1) {
+    case PRIV_B9_CLP:
+        r = kvm_clp_service_call(cpu, run);
+        break;
+    case PRIV_B9_PCISTG:
+        r = kvm_pcistg_service_call(cpu, run);
+        break;
+    case PRIV_B9_PCILG:
+        r = kvm_pcilg_service_call(cpu, run);
+        break;
+    case PRIV_B9_RPCIT:
+        r = kvm_rpcit_service_call(cpu, run);
+        break;
     case PRIV_B9_EQBS:
         /* just inject exception */
         r = -1;
@@ -832,6 +855,12 @@ static int handle_eb(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
     int r = 0;
 
     switch (ipa1) {
+    case PRIV_EB_PCISTB:
+        r = kvm_pcistb_service_call(cpu, run);
+        break;
+    case PRIV_EB_SIC:
+        r = kvm_sic_service_call(cpu, run);
+        break;
     case PRIV_EB_SQBS:
         /* just inject exception */
         r = -1;
@@ -845,6 +874,26 @@ static int handle_eb(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
     return r;
 }
 
+static int handle_e3(S390CPU *cpu, struct kvm_run *run, uint8_t ipbl)
+{
+    int r = 0;
+
+    switch (ipbl) {
+    case PRIV_E3_MPCIFC:
+        r = kvm_mpcifc_service_call(cpu, run);
+        break;
+    case PRIV_E3_STPCIFC:
+        r = kvm_stpcifc_service_call(cpu, run);
+        break;
+    default:
+        r = -1;
+        DPRINTF("KVM: unhandled PRIV: 0xe3%x\n", ipbl);
+        break;
+    }
+
+    return r;
+}
+
 static int handle_hypercall(S390CPU *cpu, struct kvm_run *run)
 {
     CPUS390XState *env = &cpu->env;
@@ -1041,6 +1090,9 @@ static int handle_instruction(S390CPU *cpu, struct kvm_run *run)
     case IPA0_EB:
         r = handle_eb(cpu, run, ipa1);
         break;
+    case IPA0_E3:
+        r = handle_e3(cpu, run, run->s390_sieic.ipb & 0xff);
+        break;
     case IPA0_DIAG:
         r = handle_diag(cpu, run, run->s390_sieic.ipb);
         break;
diff --git a/target-s390x/pci_ic.c b/target-s390x/pci_ic.c
new file mode 100644
index 0000000..6c05faf
--- /dev/null
+++ b/target-s390x/pci_ic.c
@@ -0,0 +1,753 @@
+/*
+ * s390 PCI intercepts
+ *
+ * Copyright 2014 IBM Corp.
+ * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
+ *            Hong Bo Li <lihbbj@cn.ibm.com>
+ *            Yi Min Zhao <zyimin@cn.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <linux/kvm.h>
+#include <asm/ptrace.h>
+#include <hw/pci/pci.h>
+#include <hw/pci/pci_host.h>
+#include <net/net.h>
+
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "migration/qemu-file.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "cpu.h"
+#include "sysemu/device_tree.h"
+#include "monitor/monitor.h"
+#include "pci_ic.h"
+
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_host.h"
+#include "hw/s390x/s390-pci-bus.h"
+#include "exec/exec-all.h"
+#include "exec/memory-internal.h"
+
+/* #define DEBUG_S390PCI_IC */
+#ifdef DEBUG_S390PCI_IC
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "s390pci_ic: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+static uint64_t resume_token;
+
+static uint8_t barsize(uint64_t size)
+{
+    uint64_t mask = 1;
+    int i;
+
+    if (!size) {
+        return 0;
+    }
+
+    for (i = 0; i < 64; i++) {
+        if (size & mask) {
+            break;
+        }
+        mask = (mask << 1);
+    }
+
+    return i;
+}
+
+static void s390_set_status_code(CPUS390XState *env,
+                                 uint8_t r, uint64_t status_code)
+{
+    env->regs[r] &= ~0xff000000;
+    env->regs[r] |= (status_code & 0xff) << 24;
+}
+
+static int list_pci(ClpReqRspListPci *rrb, uint8_t *cc)
+{
+    S390PCIBusDevice *pbdev;
+    uint32_t res_code, initial_l2, g_l2, finish;
+    int rc, idx;
+
+    rc = 0;
+    if (be16_to_cpu(rrb->request.hdr.len) != 32) {
+        res_code = CLP_RC_LEN;
+        rc = -EINVAL;
+        goto out;
+    }
+
+    if ((be32_to_cpu(rrb->request.fmt) & CLP_MASK_FMT) != 0) {
+        res_code = CLP_RC_FMT;
+        rc = -EINVAL;
+        goto out;
+    }
+
+    if ((be32_to_cpu(rrb->request.fmt) & ~CLP_MASK_FMT) != 0 ||
+        rrb->request.reserved1 != 0 ||
+        rrb->request.reserved2 != 0) {
+        res_code = CLP_RC_RESNOT0;
+        rc = -EINVAL;
+        goto out;
+    }
+
+    if (be64_to_cpu(rrb->request.resume_token) == 0) {
+        resume_token = 0;
+    } else if (be64_to_cpu(rrb->request.resume_token) != resume_token) {
+        res_code = CLP_RC_LISTPCI_BADRT;
+        rc = -EINVAL;
+        goto out;
+    }
+
+    if (be16_to_cpu(rrb->response.hdr.len) < 48) {
+        res_code = CLP_RC_8K;
+        rc = -EINVAL;
+        goto out;
+    }
+
+    initial_l2 = be16_to_cpu(rrb->response.hdr.len);
+    if ((initial_l2 - LIST_PCI_HDR_LEN) % sizeof(ClpFhListEntry)
+        != 0) {
+        rc = -EINVAL;
+        *cc = 3;
+        goto out;
+    }
+
+    rrb->response.fmt = 0;
+    rrb->response.reserved1 = rrb->response.reserved2 = 0;
+    rrb->response.mdd = cpu_to_be32(FH_VIRT);
+    rrb->response.max_fn = cpu_to_be16(PCI_MAX_FUNCTIONS);
+    rrb->response.entry_size = sizeof(ClpFhListEntry);
+    finish = 0;
+    idx = resume_token;
+    g_l2 = LIST_PCI_HDR_LEN;
+    do {
+        pbdev = s390_pci_find_dev_by_idx(idx);
+        if (!pbdev) {
+            finish = 1;
+            break;
+        }
+        rrb->response.fh_list[idx - resume_token].device_id =
+            pci_get_word(pbdev->pdev->config + PCI_DEVICE_ID);
+        rrb->response.fh_list[idx - resume_token].vendor_id =
+            pci_get_word(pbdev->pdev->config + PCI_VENDOR_ID);
+        rrb->response.fh_list[idx - resume_token].config =
+            cpu_to_be32(0x80000000);
+        rrb->response.fh_list[idx - resume_token].fid = cpu_to_be32(pbdev->fid);
+        rrb->response.fh_list[idx - resume_token].fh = cpu_to_be32(pbdev->fh);
+
+        g_l2 += sizeof(ClpFhListEntry);
+        DPRINTF("g_l2 %d vendor id 0x%x device id 0x%x fid 0x%x fh 0x%x\n",
+            g_l2,
+            rrb->response.fh_list[idx - resume_token].vendor_id,
+            rrb->response.fh_list[idx - resume_token].device_id,
+            rrb->response.fh_list[idx - resume_token].fid,
+            rrb->response.fh_list[idx - resume_token].fh);
+        idx++;
+    } while (g_l2 < initial_l2);
+
+    if (finish == 1) {
+        resume_token = 0;
+    } else {
+        resume_token = idx;
+    }
+    rrb->response.resume_token = cpu_to_be64(resume_token);
+    rrb->response.hdr.len = cpu_to_be16(g_l2);
+    rrb->response.hdr.rsp = cpu_to_be16(CLP_RC_OK);
+out:
+    if (rc) {
+        DPRINTF("list pci failed rc 0x%x\n", rc);
+        rrb->response.hdr.rsp = cpu_to_be16(res_code);
+    }
+    return rc;
+}
+
+int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    ClpReqHdr *reqh;
+    ClpRspHdr *resh;
+    S390PCIBusDevice *pbdev;
+    uint32_t req_len;
+    uint32_t res_len;
+    uint8_t *buffer;
+    uint8_t cc = 0;
+    CPUS390XState *env = &cpu->env;
+    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
+    int i;
+
+    buffer = g_malloc0(4096 * 2);
+    cpu_synchronize_state(CPU(cpu));
+
+    if (env->psw.mask & PSW_MASK_PSTATE) {
+        program_interrupt(env, PGM_PRIVILEGED, 4);
+        return 0;
+    }
+
+    cpu_physical_memory_rw(env->regs[r2], buffer, sizeof(*reqh), 0);
+    reqh = (ClpReqHdr *)buffer;
+    req_len = be16_to_cpu(reqh->len);
+    if (req_len < 16 || req_len > 8184 || (req_len % 8 != 0)) {
+        program_interrupt(env, PGM_OPERAND, 4);
+        return 0;
+    }
+
+    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + sizeof(*resh), 0);
+    resh = (ClpRspHdr *)(buffer + req_len);
+    res_len = be16_to_cpu(resh->len);
+    if (res_len < 8 || res_len > 8176 || (res_len % 8 != 0)) {
+        program_interrupt(env, PGM_OPERAND, 4);
+        return 0;
+    }
+    if ((req_len + res_len) > 8192) {
+        program_interrupt(env, PGM_OPERAND, 4);
+        return 0;
+    }
+
+    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + res_len, 0);
+
+    if (req_len != 32) {
+        resh->rsp = cpu_to_be16(CLP_RC_LEN);
+        goto out;
+    }
+
+    switch (reqh->cmd) {
+    case CLP_LIST_PCI: {
+        ClpReqRspListPci *rrb = (ClpReqRspListPci *)buffer;
+        list_pci(rrb, &cc);
+        break;
+    }
+    case CLP_SET_PCI_FN: {
+        ClpReqSetPci *reqsetpci = (ClpReqSetPci *)reqh;
+        ClpRspSetPci *ressetpci = (ClpRspSetPci *)resh;
+
+        pbdev = s390_pci_find_dev_by_fh(be32_to_cpu(reqsetpci->fh));
+        if (!pbdev) {
+            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FH);
+            goto out;
+        }
+
+        switch (reqsetpci->oc) {
+        case CLP_SET_ENABLE_PCI_FN:
+            pbdev->fh = pbdev->fh | 1 << ENABLE_BIT_OFFSET;
+            ressetpci->fh = cpu_to_be32(pbdev->fh);
+            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_OK);
+            break;
+        case CLP_SET_DISABLE_PCI_FN:
+            pbdev->fh = pbdev->fh & ~(1 << ENABLE_BIT_OFFSET);
+            ressetpci->fh = cpu_to_be32(pbdev->fh);
+            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_OK);
+            break;
+        default:
+            DPRINTF("unknown set pci command\n");
+            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FHOP);
+            break;
+        }
+        break;
+    }
+    case CLP_QUERY_PCI_FN: {
+        ClpReqQueryPci *reqquery = (ClpReqQueryPci *)reqh;
+        ClpRspQueryPci *resquery = (ClpRspQueryPci *)resh;
+
+        pbdev = s390_pci_find_dev_by_fh(be32_to_cpu(reqquery->fh));
+        if (!pbdev) {
+            DPRINTF("query pci no pci dev\n");
+            resquery->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FH);
+            goto out;
+        }
+
+        for (i = 0; i < PCI_BAR_COUNT; i++) {
+            uint64_t data = pci_host_config_read_common(pbdev->pdev,
+                0x10 + (i * 4), pci_config_size(pbdev->pdev), 4);
+
+            resquery->bar[i] = bswap32(data);
+            resquery->bar_size[i] = barsize(pbdev->pdev->io_regions[i].size);
+            DPRINTF("bar %d addr 0x%x size 0x%lx barsize 0x%x\n", i,
+                    resquery->bar[i], pbdev->pdev->io_regions[i].size,
+                    resquery->bar_size[i]);
+        }
+
+        resquery->sdma = ZPCI_SDMA_ADDR;
+        resquery->edma = ZPCI_EDMA_ADDR;
+        resquery->pchid = 0;
+        resquery->ug = 1;
+        resquery->uid = pbdev->fid;
+
+        resquery->hdr.rsp = cpu_to_be16(CLP_RC_OK);
+        break;
+    }
+    case CLP_QUERY_PCI_FNGRP: {
+        ClpRspQueryPciGrp *resgrp = (ClpRspQueryPciGrp *)resh;
+        resgrp->fr = 1;
+        resgrp->dasm = 0;
+        resgrp->msia = ZPCI_MSI_ADDR;
+        resgrp->mui = 0;
+        resgrp->i = 128;
+        resgrp->version = 0;
+
+        resgrp->hdr.rsp = cpu_to_be16(CLP_RC_OK);
+        break;
+    }
+    default:
+        DPRINTF("unknown clp command\n");
+        resh->rsp = cpu_to_be16(CLP_RC_CMD);
+        break;
+    }
+
+out:
+    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + res_len, 1);
+    g_free(buffer);
+    setcc(cpu, cc);
+    return 0;
+}
+
+int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    CPUS390XState *env = &cpu->env;
+    S390PCIBusDevice *pbdev;
+    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
+    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
+    PciLgStg *rp;
+    uint64_t offset;
+    uint64_t data;
+    uint8_t len;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    if (env->psw.mask & PSW_MASK_PSTATE) {
+        program_interrupt(env, PGM_PRIVILEGED, 4);
+        return 0;
+    }
+
+    if (r2 & 0x1) {
+        program_interrupt(env, PGM_SPECIFICATION, 4);
+        return 0;
+    }
+
+    rp = (PciLgStg *)&env->regs[r2];
+    offset = env->regs[r2 + 1];
+
+    pbdev = s390_pci_find_dev_by_fh(rp->fh);
+    if (!pbdev) {
+        DPRINTF("pcilg no pci dev\n");
+        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
+        return 0;
+    }
+
+    len = rp->len & 0xF;
+    if (rp->pcias < 6) {
+        if ((8 - (offset & 0x7)) < len) {
+            program_interrupt(env, PGM_OPERAND, 4);
+            return 0;
+        }
+        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
+        io_mem_read(mr, offset, &data, len);
+    } else if (rp->pcias == 15) {
+        if ((4 - (offset & 0x3)) < len) {
+            program_interrupt(env, PGM_OPERAND, 4);
+            return 0;
+        }
+        data = pci_host_config_read_common(
+                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
+
+        switch (len) {
+        case 1:
+            break;
+        case 2:
+            data = cpu_to_le16(data);
+            break;
+        case 4:
+            data = cpu_to_le32(data);
+            break;
+        case 8:
+            data = cpu_to_le64(data);
+            break;
+        default:
+            program_interrupt(env, PGM_OPERAND, 4);
+            return 0;
+        }
+    } else {
+        DPRINTF("invalid space\n");
+        setcc(cpu, ZPCI_PCI_LS_ERR);
+        s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
+        return 0;
+    }
+
+    env->regs[r1] = data;
+    setcc(cpu, ZPCI_PCI_LS_OK);
+    return 0;
+}
+
+static void update_msix_table_msg_data(S390PCIBusDevice *pbdev, uint64_t offset,
+                                       uint64_t *data, uint8_t len)
+{
+    uint32_t msg_data;
+
+    if (offset % PCI_MSIX_ENTRY_SIZE != 8) {
+        return;
+    }
+
+    if (len != 4) {
+        DPRINTF("access msix table msg data but len is %d\n", len);
+        return;
+    }
+
+    msg_data = (pbdev->fid << ZPCI_MSI_VEC_BITS) | le32_to_cpu(*data);
+    *data = cpu_to_le32(msg_data);
+    DPRINTF("update msix msg_data to 0x%x\n", msg_data);
+}
+
+static int trap_msix(S390PCIBusDevice *pbdev, uint64_t offset, uint8_t pcias)
+{
+    if (pbdev->msix.available && pbdev->msix.table_bar == pcias &&
+        offset >= pbdev->msix.table_offset &&
+        offset <= pbdev->msix.table_offset +
+                  (pbdev->msix.entries - 1) * PCI_MSIX_ENTRY_SIZE) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+int kvm_pcistg_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    CPUS390XState *env = &cpu->env;
+    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
+    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
+    PciLgStg *rp;
+    uint64_t offset, data;
+    S390PCIBusDevice *pbdev;
+    uint8_t len;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    if (env->psw.mask & PSW_MASK_PSTATE) {
+        program_interrupt(env, PGM_PRIVILEGED, 4);
+        return 0;
+    }
+
+    if (r2 & 0x1) {
+        program_interrupt(env, PGM_SPECIFICATION, 4);
+        return 0;
+    }
+
+    rp = (PciLgStg *)&env->regs[r2];
+    offset = env->regs[r2 + 1];
+
+    pbdev = s390_pci_find_dev_by_fh(rp->fh);
+    if (!pbdev) {
+        DPRINTF("pcistg no pci dev\n");
+        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
+        return 0;
+    }
+
+    data = env->regs[r1];
+    len = rp->len & 0xF;
+    if (rp->pcias < 6) {
+        if ((8 - (offset & 0x7)) < len) {
+            program_interrupt(env, PGM_OPERAND, 4);
+            return 0;
+        }
+        MemoryRegion *mr;
+        if (trap_msix(pbdev, offset, rp->pcias)) {
+            offset = offset - pbdev->msix.table_offset;
+            mr = &pbdev->pdev->msix_table_mmio;
+            update_msix_table_msg_data(pbdev, offset, &data, len);
+        } else {
+            mr = pbdev->pdev->io_regions[rp->pcias].memory;
+        }
+
+        io_mem_write(mr, offset, data, len);
+    } else if (rp->pcias == 15) {
+        if ((4 - (offset & 0x3)) < len) {
+            program_interrupt(env, PGM_OPERAND, 4);
+            return 0;
+        }
+        switch (len) {
+        case 1:
+            break;
+        case 2:
+            data = le16_to_cpu(data);
+            break;
+        case 4:
+            data = le32_to_cpu(data);
+            break;
+        case 8:
+            data = le64_to_cpu(data);
+            break;
+        default:
+            program_interrupt(env, PGM_OPERAND, 4);
+            return 0;
+        }
+
+        pci_host_config_write_common(pbdev->pdev, offset,
+                                     pci_config_size(pbdev->pdev),
+                                     data, len);
+    } else {
+        DPRINTF("pcistg invalid space\n");
+        setcc(cpu, ZPCI_PCI_LS_ERR);
+        s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
+        return 0;
+    }
+
+    setcc(cpu, ZPCI_PCI_LS_OK);
+    return 0;
+}
+
+int kvm_rpcit_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    CPUS390XState *env = &cpu->env;
+    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
+    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
+    uint32_t fh;
+    uint64_t pte;
+    S390PCIBusDevice *pbdev;
+    ram_addr_t size;
+    int flags;
+    IOMMUTLBEntry entry;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    if (env->psw.mask & PSW_MASK_PSTATE) {
+        program_interrupt(env, PGM_PRIVILEGED, 4);
+        return 0;
+    }
+
+    if (r2 & 0x1) {
+        program_interrupt(env, PGM_SPECIFICATION, 4);
+        return 0;
+    }
+
+    fh = env->regs[r1] >> 32;
+    size = env->regs[r2 + 1];
+
+    pbdev = s390_pci_find_dev_by_fh(fh);
+
+    if (!pbdev) {
+        DPRINTF("rpcit no pci dev\n");
+        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
+        return 0;
+    }
+
+    pte = s390_guest_io_table_walk(s390_pci_get_table_origin(pbdev->g_iota),
+                                   env->regs[r2]);
+    flags = pte & ZPCI_PTE_FLAG_MASK;
+    entry.target_as = &address_space_memory;
+    entry.iova = env->regs[r2];
+    entry.translated_addr = pte & ZPCI_PTE_ADDR_MASK;
+    entry.addr_mask = size - 1;
+
+    if (flags & ZPCI_PTE_INVALID) {
+        entry.perm = IOMMU_NONE;
+    } else {
+        entry.perm = IOMMU_RW;
+    }
+
+    memory_region_notify_iommu(pci_device_iommu_address_space(
+                               pbdev->pdev)->root, entry);
+
+    setcc(cpu, ZPCI_PCI_LS_OK);
+    return 0;
+}
+
+int kvm_sic_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    qemu_log_mask(LOG_UNIMP, "SIC missing\n");
+    return 0;
+}
+
+int kvm_pcistb_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    CPUS390XState *env = &cpu->env;
+    uint8_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
+    uint8_t r3 = run->s390_sieic.ipa & 0x000f;
+    PciStb *rp;
+    uint64_t gaddr;
+    uint64_t *uaddr, *pu;
+    hwaddr len;
+    S390PCIBusDevice *pbdev;
+    MemoryRegion *mr;
+    int i;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    if (env->psw.mask & PSW_MASK_PSTATE) {
+        program_interrupt(env, PGM_PRIVILEGED, 6);
+        return 0;
+    }
+
+    rp = (PciStb *)&env->regs[r1];
+    if (rp->pcias > 5) {
+        DPRINTF("pcistb invalid space\n");
+        setcc(cpu, ZPCI_PCI_LS_ERR);
+        s390_set_status_code(env, r1, ZPCI_PCI_ST_INVAL_AS);
+        return 0;
+    }
+
+    switch (rp->len) {
+    case 16:
+    case 32:
+    case 64:
+    case 128:
+        break;
+    default:
+        program_interrupt(env, PGM_SPECIFICATION, 6);
+        return 0;
+    }
+
+    gaddr = get_base_disp_rsy(cpu, run);
+    len = rp->len;
+
+    pbdev = s390_pci_find_dev_by_fh(rp->fh);
+    if (!pbdev) {
+        DPRINTF("pcistb no pci dev fh 0x%x\n", rp->fh);
+        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
+        return 0;
+    }
+
+    uaddr = cpu_physical_memory_map(gaddr, &len, 0);
+    mr = pbdev->pdev->io_regions[rp->pcias].memory;
+    if (!memory_region_access_valid(mr, env->regs[r3], rp->len, true)) {
+        cpu_physical_memory_unmap(uaddr, len, 0, len);
+        program_interrupt(env, PGM_ADDRESSING, 6);
+        return 0;
+    }
+
+    pu = uaddr;
+    for (i = 0; i < rp->len / 8; i++) {
+        io_mem_write(mr, env->regs[r3] + i * 8, *pu, 8);
+        pu++;
+    }
+
+    cpu_physical_memory_unmap(uaddr, len, 0, len);
+    setcc(cpu, ZPCI_PCI_LS_OK);
+    return 0;
+}
+
+static int reg_irqs(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib)
+{
+    int ret;
+    S390FLICState *fs = s390_get_flic();
+    S390FLICStateClass *fsc = S390_FLIC_COMMON_GET_CLASS(fs);
+
+    ret = css_register_io_adapter(S390_PCIPT_ADAPTER,
+                                  FIB_DATA_ISC(fib.data), true, false,
+                                  &pbdev->routes.adapter.adapter_id);
+    assert(ret == 0);
+
+    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id, fib.aisb, true);
+    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id, fib.aibv, true);
+
+    pbdev->routes.adapter.summary_addr = fib.aisb;
+    pbdev->routes.adapter.summary_offset = FIB_DATA_AISBO(fib.data);
+    pbdev->routes.adapter.ind_addr = fib.aibv;
+    pbdev->routes.adapter.ind_offset = FIB_DATA_AIBVO(fib.data);
+
+    DPRINTF("reg_irqs adapter id %d\n", pbdev->routes.adapter.adapter_id);
+    return 0;
+}
+
+static int dereg_irqs(S390PCIBusDevice *pbdev)
+{
+    S390FLICState *fs = s390_get_flic();
+    S390FLICStateClass *fsc = S390_FLIC_COMMON_GET_CLASS(fs);
+
+    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id,
+                        pbdev->routes.adapter.ind_addr, false);
+
+    pbdev->routes.adapter.summary_addr = 0;
+    pbdev->routes.adapter.summary_offset = 0;
+    pbdev->routes.adapter.ind_addr = 0;
+    pbdev->routes.adapter.ind_offset = 0;
+
+    DPRINTF("dereg_irqs adapter id %d\n", pbdev->routes.adapter.adapter_id);
+    return 0;
+}
+
+int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    CPUS390XState *env = &cpu->env;
+    uint8_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
+    uint8_t oc;
+    uint32_t fh;
+    uint64_t fiba;
+    ZpciFib fib;
+    S390PCIBusDevice *pbdev;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    if (env->psw.mask & PSW_MASK_PSTATE) {
+        program_interrupt(env, PGM_PRIVILEGED, 6);
+        return 0;
+    }
+
+    oc = env->regs[r1] & 0xff;
+    fh = env->regs[r1] >> 32;
+    fiba = get_base_disp_rxy(cpu, run);
+
+    if (fiba & 0x7) {
+        program_interrupt(env, PGM_SPECIFICATION, 6);
+        return 0;
+    }
+
+    pbdev = s390_pci_find_dev_by_fh(fh);
+    if (!pbdev) {
+        DPRINTF("mpcifc no pci dev fh 0x%x\n", fh);
+        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
+        return 0;
+    }
+
+    cpu_physical_memory_rw(fiba, (uint8_t *)&fib, sizeof(fib), 0);
+
+    switch (oc) {
+    case ZPCI_MOD_FC_REG_INT: {
+        pbdev->isc = FIB_DATA_ISC(fib.data);
+        reg_irqs(env, pbdev, fib);
+        break;
+    }
+    case ZPCI_MOD_FC_DEREG_INT:
+        dereg_irqs(pbdev);
+        break;
+    case ZPCI_MOD_FC_REG_IOAT:
+        if (fib.pba > fib.pal) {
+            program_interrupt(&cpu->env, PGM_OPERAND, 6);
+            return 0;
+        }
+        pbdev->g_iota = fib.iota;
+        break;
+    case ZPCI_MOD_FC_DEREG_IOAT:
+        break;
+    case ZPCI_MOD_FC_REREG_IOAT:
+        break;
+    case ZPCI_MOD_FC_RESET_ERROR:
+        break;
+    case ZPCI_MOD_FC_RESET_BLOCK:
+        break;
+    case ZPCI_MOD_FC_SET_MEASURE:
+        break;
+    default:
+        program_interrupt(&cpu->env, PGM_OPERAND, 6);
+        return 0;
+    }
+
+    setcc(cpu, ZPCI_PCI_LS_OK);
+    return 0;
+}
+
+int kvm_stpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
+{
+    qemu_log_mask(LOG_UNIMP, "STPCIFC missing\n");
+    return 0;
+}
diff --git a/target-s390x/pci_ic.h b/target-s390x/pci_ic.h
new file mode 100644
index 0000000..0eb6c27
--- /dev/null
+++ b/target-s390x/pci_ic.h
@@ -0,0 +1,335 @@
+/*
+ * s390 PCI intercept definitions
+ *
+ * Copyright 2014 IBM Corp.
+ * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
+ *            Hong Bo Li <lihbbj@cn.ibm.com>
+ *            Yi Min Zhao <zyimin@cn.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef PCI_IC_S390X_H
+#define PCI_IC_S390X_H
+
+#include <sysemu/dma.h>
+
+/* CLP common request & response block size */
+#define CLP_BLK_SIZE 4096
+#define PCI_BAR_COUNT 6
+#define PCI_MAX_FUNCTIONS 4096
+
+typedef struct ClpReqHdr {
+    __uint16_t len;
+    __uint16_t cmd;
+} QEMU_PACKED ClpReqHdr;
+
+typedef struct ClpRspHdr {
+    __uint16_t len;
+    __uint16_t rsp;
+} QEMU_PACKED ClpRspHdr;
+
+/* CLP Response Codes */
+#define CLP_RC_OK         0x0010  /* Command request successful */
+#define CLP_RC_CMD        0x0020  /* Command code not recognized */
+#define CLP_RC_PERM       0x0030  /* Command not authorized */
+#define CLP_RC_FMT        0x0040  /* Invalid command request format */
+#define CLP_RC_LEN        0x0050  /* Invalid command request length */
+#define CLP_RC_8K         0x0060  /* Command requires 8K LPCB */
+#define CLP_RC_RESNOT0    0x0070  /* Reserved field not zero */
+#define CLP_RC_NODATA     0x0080  /* No data available */
+#define CLP_RC_FC_UNKNOWN 0x0100  /* Function code not recognized */
+
+/*
+ * Call Logical Processor - Command Codes
+ */
+#define CLP_LIST_PCI            0x0002
+#define CLP_QUERY_PCI_FN        0x0003
+#define CLP_QUERY_PCI_FNGRP     0x0004
+#define CLP_SET_PCI_FN          0x0005
+
+/* PCI function handle list entry */
+typedef struct ClpFhListEntry {
+    __uint16_t device_id;
+    __uint16_t vendor_id;
+#define CLP_FHLIST_MASK_CONFIG 0x80000000
+    __uint32_t config;
+    __uint32_t fid;
+    __uint32_t fh;
+} QEMU_PACKED ClpFhListEntry;
+
+#define CLP_RC_SETPCIFN_FH      0x0101 /* Invalid PCI fn handle */
+#define CLP_RC_SETPCIFN_FHOP    0x0102 /* Fn handle not valid for op */
+#define CLP_RC_SETPCIFN_DMAAS   0x0103 /* Invalid DMA addr space */
+#define CLP_RC_SETPCIFN_RES     0x0104 /* Insufficient resources */
+#define CLP_RC_SETPCIFN_ALRDY   0x0105 /* Fn already in requested state */
+#define CLP_RC_SETPCIFN_ERR     0x0106 /* Fn in permanent error state */
+#define CLP_RC_SETPCIFN_RECPND  0x0107 /* Error recovery pending */
+#define CLP_RC_SETPCIFN_BUSY    0x0108 /* Fn busy */
+#define CLP_RC_LISTPCI_BADRT    0x010a /* Resume token not recognized */
+#define CLP_RC_QUERYPCIFG_PFGID 0x010b /* Unrecognized PFGID */
+
+/* request or response block header length */
+#define LIST_PCI_HDR_LEN 32
+
+/* Number of function handles fitting in response block */
+#define CLP_FH_LIST_NR_ENTRIES \
+    ((CLP_BLK_SIZE - 2 * LIST_PCI_HDR_LEN) \
+        / sizeof(ClpFhListEntry))
+
+#define CLP_SET_ENABLE_PCI_FN  0 /* Yes, 0 enables it */
+#define CLP_SET_DISABLE_PCI_FN 1 /* Yes, 1 disables it */
+
+#define CLP_UTIL_STR_LEN 64
+
+#define CLP_MASK_FMT 0xf0000000
+
+/* List PCI functions request */
+typedef struct ClpReqListPci {
+    ClpReqHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+    __uint64_t resume_token;
+    __uint64_t reserved2;
+} QEMU_PACKED ClpReqListPci;
+
+/* List PCI functions response */
+typedef struct ClpRspListPci {
+    ClpRspHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+    __uint64_t resume_token;
+    __uint32_t mdd;
+    __uint16_t max_fn;
+    __uint8_t reserved2;
+    __uint8_t entry_size;
+    ClpFhListEntry fh_list[CLP_FH_LIST_NR_ENTRIES];
+} QEMU_PACKED ClpRspListPci;
+
+/* Query PCI function request */
+typedef struct ClpReqQueryPci {
+    ClpReqHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+    __uint32_t fh; /* function handle */
+    __uint32_t reserved2;
+    __uint64_t reserved3;
+} QEMU_PACKED ClpReqQueryPci;
+
+/* Query PCI function response */
+typedef struct ClpRspQueryPci {
+    ClpRspHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+    __uint16_t vfn; /* virtual fn number */
+#define CLP_RSP_QPCI_MASK_UTIL  0x100
+#define CLP_RSP_QPCI_MASK_PFGID 0xff
+    __uint16_t ug;
+    __uint32_t fid; /* pci function id */
+    __uint8_t bar_size[PCI_BAR_COUNT];
+    __uint16_t pchid;
+    __uint32_t bar[PCI_BAR_COUNT];
+    __uint64_t reserved2;
+    __uint64_t sdma; /* start dma as */
+    __uint64_t edma; /* end dma as */
+    __uint32_t reserved3[11];
+    __uint32_t uid;
+    __uint8_t util_str[CLP_UTIL_STR_LEN]; /* utility string */
+} QEMU_PACKED ClpRspQueryPci;
+
+/* Query PCI function group request */
+typedef struct ClpReqQueryPciGrp {
+    ClpReqHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+#define CLP_REQ_QPCIG_MASK_PFGID 0xff
+    __uint32_t g;
+    __uint32_t reserved2;
+    __uint64_t reserved3;
+} QEMU_PACKED ClpReqQueryPciGrp;
+
+/* Query PCI function group response */
+typedef struct ClpRspQueryPciGrp {
+    ClpRspHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+#define CLP_RSP_QPCIG_MASK_NOI 0xfff
+    __uint16_t i;
+    __uint8_t version;
+#define CLP_RSP_QPCIG_MASK_FRAME   0x2
+#define CLP_RSP_QPCIG_MASK_REFRESH 0x1
+    __uint8_t fr;
+    __uint16_t reserved2;
+    __uint16_t mui;
+    __uint64_t reserved3;
+    __uint64_t dasm; /* dma address space mask */
+    __uint64_t msia; /* MSI address */
+    __uint64_t reserved4;
+    __uint64_t reserved5;
+} QEMU_PACKED ClpRspQueryPciGrp;
+
+/* Set PCI function request */
+typedef struct ClpReqSetPci {
+    ClpReqHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+    __uint32_t fh; /* function handle */
+    __uint16_t reserved2;
+    __uint8_t oc; /* operation controls */
+    __uint8_t ndas; /* number of dma spaces */
+    __uint64_t reserved3;
+} QEMU_PACKED ClpReqSetPci;
+
+/* Set PCI function response */
+typedef struct ClpRspSetPci {
+    ClpRspHdr hdr;
+    __uint32_t fmt;
+    __uint64_t reserved1;
+    __uint32_t fh; /* function handle */
+    __uint32_t reserved3;
+    __uint64_t reserved4;
+} QEMU_PACKED ClpRspSetPci;
+
+typedef struct ClpReqRspListPci {
+    ClpReqListPci request;
+    ClpRspListPci response;
+} QEMU_PACKED ClpReqRspListPci;
+
+typedef struct ClpReqRspSetPci {
+    ClpReqSetPci request;
+    ClpRspSetPci response;
+} QEMU_PACKED ClpReqRspSetPci;
+
+typedef struct ClpReqRspQueryPci {
+    ClpReqQueryPci request;
+    ClpRspQueryPci response;
+} QEMU_PACKED ClpReqRspQueryPci;
+
+typedef struct ClpReqRspQueryPciGrp {
+    ClpReqQueryPciGrp request;
+    ClpRspQueryPciGrp response;
+} QEMU_PACKED ClpReqRspQueryPciGrp;
+
+typedef struct PciLgStg {
+    uint32_t fh;
+    uint8_t status;
+    uint8_t pcias;
+    uint8_t reserved;
+    uint8_t len;
+} QEMU_PACKED PciLgStg;
+
+typedef struct PciStb {
+    uint32_t fh;
+    uint8_t status;
+    uint8_t pcias;
+    uint8_t reserved;
+    uint8_t len;
+} QEMU_PACKED PciStb;
+
+/* Load/Store status codes */
+#define ZPCI_PCI_ST_FUNC_NOT_ENABLED        4
+#define ZPCI_PCI_ST_FUNC_IN_ERR             8
+#define ZPCI_PCI_ST_BLOCKED                 12
+#define ZPCI_PCI_ST_INSUF_RES               16
+#define ZPCI_PCI_ST_INVAL_AS                20
+#define ZPCI_PCI_ST_FUNC_ALREADY_ENABLED    24
+#define ZPCI_PCI_ST_DMA_AS_NOT_ENABLED      28
+#define ZPCI_PCI_ST_2ND_OP_IN_INV_AS        36
+#define ZPCI_PCI_ST_FUNC_NOT_AVAIL          40
+#define ZPCI_PCI_ST_ALREADY_IN_RQ_STATE     44
+
+/* Load/Store return codes */
+#define ZPCI_PCI_LS_OK              0
+#define ZPCI_PCI_LS_ERR             1
+#define ZPCI_PCI_LS_BUSY            2
+#define ZPCI_PCI_LS_INVAL_HANDLE    3
+
+/* Modify PCI Function Controls */
+#define ZPCI_MOD_FC_REG_INT     2
+#define ZPCI_MOD_FC_DEREG_INT   3
+#define ZPCI_MOD_FC_REG_IOAT    4
+#define ZPCI_MOD_FC_DEREG_IOAT  5
+#define ZPCI_MOD_FC_REREG_IOAT  6
+#define ZPCI_MOD_FC_RESET_ERROR 7
+#define ZPCI_MOD_FC_RESET_BLOCK 9
+#define ZPCI_MOD_FC_SET_MEASURE 10
+
+/* FIB function controls */
+#define ZPCI_FIB_FC_ENABLED     0x80
+#define ZPCI_FIB_FC_ERROR       0x40
+#define ZPCI_FIB_FC_LS_BLOCKED  0x20
+#define ZPCI_FIB_FC_DMAAS_REG   0x10
+
+/* FIB function controls */
+#define ZPCI_FIB_FC_ENABLED     0x80
+#define ZPCI_FIB_FC_ERROR       0x40
+#define ZPCI_FIB_FC_LS_BLOCKED  0x20
+#define ZPCI_FIB_FC_DMAAS_REG   0x10
+
+/* Function Information Block */
+typedef struct ZpciFib {
+    __uint8_t fmt;   /* format */
+    __uint8_t reserved1[7];
+    __uint8_t fc;                  /* function controls */
+    __uint8_t reserved2;
+    __uint16_t reserved3;
+    __uint32_t reserved4;
+    __uint64_t pba;                /* PCI base address */
+    __uint64_t pal;                /* PCI address limit */
+    __uint64_t iota;               /* I/O Translation Anchor */
+#define FIB_DATA_ISC(x)    (((x) >> 28) & 0x7)
+#define FIB_DATA_NOI(x)    (((x) >> 16) & 0xfff)
+#define FIB_DATA_AIBVO(x) (((x) >> 8) & 0x3f)
+#define FIB_DATA_SUM(x)    (((x) >> 7) & 0x1)
+#define FIB_DATA_AISBO(x)  ((x) & 0x3f)
+    __uint32_t data;
+    __uint32_t reserved5;
+    __uint64_t aibv;               /* Adapter int bit vector address */
+    __uint64_t aisb;               /* Adapter int summary bit address */
+    __uint64_t fmb_addr;           /* Function measurement address and key */
+    __uint32_t reserved6;
+    __uint32_t gd;
+} QEMU_PACKED ZpciFib;
+
+static inline uint64_t get_base_disp_rxy(S390CPU *cpu, struct kvm_run *run)
+{
+    CPUS390XState *env = &cpu->env;
+    uint32_t x2 = (run->s390_sieic.ipa & 0x000f);
+    uint32_t base2 = run->s390_sieic.ipb >> 28;
+    uint32_t disp2 = ((run->s390_sieic.ipb & 0x0fff0000) >> 16) +
+                     ((run->s390_sieic.ipb & 0xff00) << 4);
+
+    if (disp2 & 0x80000) {
+        disp2 += 0xfff00000;
+    }
+
+    return (base2 ? env->regs[base2] : 0) +
+           (x2 ? env->regs[x2] : 0) + (long)(int)disp2;
+}
+
+static inline uint64_t get_base_disp_rsy(S390CPU *cpu, struct kvm_run *run)
+{
+    CPUS390XState *env = &cpu->env;
+    uint32_t base2 = run->s390_sieic.ipb >> 28;
+    uint32_t disp2 = ((run->s390_sieic.ipb & 0x0fff0000) >> 16) +
+                     ((run->s390_sieic.ipb & 0xff00) << 4);
+
+    if (disp2 & 0x80000) {
+        disp2 += 0xfff00000;
+    }
+
+    return (base2 ? env->regs[base2] : 0) + (long)(int)disp2;
+}
+
+int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run);
+int kvm_rpcit_service_call(S390CPU *cpu, struct kvm_run *run);
+int kvm_sic_service_call(S390CPU *cpu, struct kvm_run *run);
+int kvm_pcistb_service_call(S390CPU *cpu, struct kvm_run *run);
+int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run);
+int kvm_pcistg_service_call(S390CPU *cpu, struct kvm_run *run);
+int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run);
+int kvm_stpcifc_service_call(S390CPU *cpu, struct kvm_run *run);
+
+#endif
-- 
1.8.5.5

* [Qemu-devel] [PATCH 3/3] kvm: extend kvm_irqchip_add_msi_route to work on s390
  2014-11-10 14:20 [Qemu-devel] [PATCH 0/3] add PCI support for the s390 platform Frank Blaschka
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support Frank Blaschka
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 2/3] s390: implement pci instructions Frank Blaschka
@ 2014-11-10 14:20 ` Frank Blaschka
  2 siblings, 0 replies; 27+ messages in thread
From: Frank Blaschka @ 2014-11-10 14:20 UTC (permalink / raw)
  To: agraf, cornelia.huck, borntraeger, pbonzini, qemu-devel
  Cc: peter.maydell, james.hogan, mtosatti, Frank Blaschka, rth

From: Frank Blaschka <frank.blaschka@de.ibm.com>

On s390, MSI-X irqs are presented as thin or adapter interrupts.
For this we have to reorganize the routing entry so that it contains
valid information for the adapter interrupt code on s390.
To minimize the impact on existing code, we introduce an architecture
function to fix up the routing entry.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---
 include/sysemu/kvm.h |  4 ++++
 kvm-all.c            |  7 +++++++
 target-arm/kvm.c     |  6 ++++++
 target-i386/kvm.c    |  6 ++++++
 target-mips/kvm.c    |  6 ++++++
 target-ppc/kvm.c     |  6 ++++++
 target-s390x/kvm.c   | 26 ++++++++++++++++++++++++++
 7 files changed, 61 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index b0cd657..702dc93 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -148,6 +148,7 @@ extern bool kvm_readonly_mem_allowed;
 
 struct kvm_run;
 struct kvm_lapic_state;
+struct kvm_irq_routing_entry;
 
 typedef struct KVMCapabilityInfo {
     const char *name;
@@ -259,6 +260,9 @@ int kvm_arch_on_sigbus(int code, void *addr);
 
 void kvm_arch_init_irq_routing(KVMState *s);
 
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+                             uint64_t address, uint32_t data);
+
 int kvm_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
diff --git a/kvm-all.c b/kvm-all.c
index 44a5e72..7556d3f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1206,6 +1206,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
     kroute.u.msi.address_lo = (uint32_t)msg.address;
     kroute.u.msi.address_hi = msg.address >> 32;
     kroute.u.msi.data = le32_to_cpu(msg.data);
+    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data)) {
+        kvm_irqchip_release_virq(s, virq);
+        return -EINVAL;
+    }
 
     kvm_add_routing_entry(s, &kroute);
     kvm_irqchip_commit_routes(s);
@@ -1231,6 +1235,9 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
     kroute.u.msi.address_lo = (uint32_t)msg.address;
     kroute.u.msi.address_hi = msg.address >> 32;
     kroute.u.msi.data = le32_to_cpu(msg.data);
+    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data)) {
+        return -EINVAL;
+    }
 
     return kvm_update_routing_entry(s, &kroute);
 }
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 319784d..3285f81 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -441,3 +441,9 @@ int kvm_arch_irqchip_create(KVMState *s)
 
     return 0;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+                             uint64_t address, uint32_t data)
+{
+    return 0;
+}
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ccf36e8..7bc818c 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2707,3 +2707,9 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
     return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSIX |
                                                 KVM_DEV_IRQ_HOST_MSIX);
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+                             uint64_t address, uint32_t data)
+{
+    return 0;
+}
diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 97fd51a..c7eb1dc 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -688,3 +688,9 @@ int kvm_arch_get_registers(CPUState *cs)
 
     return ret;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+                             uint64_t address, uint32_t data)
+{
+    return 0;
+}
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 6843fa0..04c83cd 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2388,3 +2388,9 @@ out_close:
 error_out:
     return;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+                             uint64_t address, uint32_t data)
+{
+    return 0;
+}
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index d59e740..a08641b 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -41,6 +41,7 @@
 #include "trace.h"
 #include "qapi-event.h"
 #include "pci_ic.h"
+#include "hw/s390x/s390-pci-bus.h"
 
 /* #define DEBUG_KVM */
 
@@ -1414,3 +1415,28 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state)
 
     return ret;
 }
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+                              uint64_t address, uint32_t data)
+{
+    S390PCIBusDevice *pbdev;
+    uint32_t fid = data >> ZPCI_MSI_VEC_BITS;
+    uint32_t vec = data & ZPCI_MSI_VEC_MASK;
+
+    pbdev = s390_pci_find_dev_by_fid(fid);
+    if (!pbdev) {
+        DPRINTF("add_msi_route no dev\n");
+        return -ENODEV;
+    }
+
+    pbdev->routes.adapter.ind_offset = vec;
+
+    route->type = KVM_IRQ_ROUTING_S390_ADAPTER;
+    route->flags = 0;
+    route->u.adapter.summary_addr = pbdev->routes.adapter.summary_addr;
+    route->u.adapter.ind_addr = pbdev->routes.adapter.ind_addr;
+    route->u.adapter.summary_offset = pbdev->routes.adapter.summary_offset;
+    route->u.adapter.ind_offset = pbdev->routes.adapter.ind_offset;
+    route->u.adapter.adapter_id = pbdev->routes.adapter.adapter_id;
+    return 0;
+}
-- 
1.8.5.5

* Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support Frank Blaschka
@ 2014-11-10 15:14   ` Alexander Graf
  2014-11-18 12:50     ` Frank Blaschka
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-10 15:14 UTC (permalink / raw)
  To: Frank Blaschka, cornelia.huck, borntraeger, pbonzini, qemu-devel
  Cc: peter.maydell, james.hogan, mtosatti, Frank Blaschka, rth



On 10.11.14 15:20, Frank Blaschka wrote:
> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> 
> This patch implements a pci bus for s390x together with infrastructure
> to generate and handle hotplug events, to configure/unconfigure via
> sclp instruction, to do iommu translations and provide s390 support for
> MSI/MSI-X notification processing.
> 
> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> ---
>  default-configs/s390x-softmmu.mak |   1 +
>  hw/s390x/Makefile.objs            |   1 +
>  hw/s390x/css.c                    |   5 +
>  hw/s390x/css.h                    |   1 +
>  hw/s390x/s390-pci-bus.c           | 485 ++++++++++++++++++++++++++++++++++++++
>  hw/s390x/s390-pci-bus.h           | 254 ++++++++++++++++++++
>  hw/s390x/s390-virtio-ccw.c        |   3 +
>  hw/s390x/sclp.c                   |  10 +-
>  include/hw/s390x/sclp.h           |   8 +
>  target-s390x/ioinst.c             |  52 ++++
>  target-s390x/ioinst.h             |   1 +
>  11 files changed, 820 insertions(+), 1 deletion(-)
>  create mode 100644 hw/s390x/s390-pci-bus.c
>  create mode 100644 hw/s390x/s390-pci-bus.h
> 
> diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
> index 126d88d..6ee2ff8 100644
> --- a/default-configs/s390x-softmmu.mak
> +++ b/default-configs/s390x-softmmu.mak
> @@ -1,3 +1,4 @@
> +include pci.mak
>  CONFIG_VIRTIO=y
>  CONFIG_SCLPCONSOLE=y
>  CONFIG_S390_FLIC=y
> diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
> index 1ba6c3a..428d957 100644
> --- a/hw/s390x/Makefile.objs
> +++ b/hw/s390x/Makefile.objs
> @@ -8,3 +8,4 @@ obj-y += ipl.o
>  obj-y += css.o
>  obj-y += s390-virtio-ccw.o
>  obj-y += virtio-ccw.o
> +obj-y += s390-pci-bus.o
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index b67c039..7553085 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -1299,6 +1299,11 @@ void css_generate_chp_crws(uint8_t cssid, uint8_t chpid)
>      /* TODO */
>  }
>  
> +void css_generate_css_crws(uint8_t cssid)
> +{
> +    css_queue_crw(CRW_RSC_CSS, 0, 0, 0);
> +}
> +
>  int css_enable_mcsse(void)
>  {
>      trace_css_enable_facility("mcsse");
> diff --git a/hw/s390x/css.h b/hw/s390x/css.h
> index 33104ac..7e53148 100644
> --- a/hw/s390x/css.h
> +++ b/hw/s390x/css.h
> @@ -101,6 +101,7 @@ void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, uint16_t rsid);
>  void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
>                             int hotplugged, int add);
>  void css_generate_chp_crws(uint8_t cssid, uint8_t chpid);
> +void css_generate_css_crws(uint8_t cssid);
>  void css_adapter_interrupt(uint8_t isc);
>  
>  #define CSS_IO_ADAPTER_VIRTIO 1
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> new file mode 100644
> index 0000000..f2fa6ba
> --- /dev/null
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -0,0 +1,485 @@
> +/*
> + * s390 PCI BUS
> + *
> + * Copyright 2014 IBM Corp.
> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> + *            Hong Bo Li <lihbbj@cn.ibm.com>
> + *            Yi Min Zhao <zyimin@cn.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include <hw/pci/pci.h>
> +#include <hw/pci/pci_bus.h>
> +#include <hw/s390x/css.h>
> +#include <hw/s390x/sclp.h>
> +#include <hw/pci/msi.h>
> +#include "qemu/error-report.h"
> +#include "s390-pci-bus.h"
> +
> +/* #define DEBUG_S390PCI_BUS */
> +#ifdef DEBUG_S390PCI_BUS
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +static const unsigned long be_to_le = BITS_PER_LONG - 1;
> +static QTAILQ_HEAD(, SeiContainer) pending_sei =
> +    QTAILQ_HEAD_INITIALIZER(pending_sei);
> +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
> +    QTAILQ_HEAD_INITIALIZER(device_list);

Please get rid of all statics ;). All state has to live in objects.
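
A rough sketch of what that could look like, using plain C rather than the
QTAILQ macros. All names here (ExDevice, ExHostState, ex_add_dev,
ex_find_dev_by_fid) are hypothetical and only illustrate the idea: the list
head lives in the host bridge state and every lookup takes that state as an
argument instead of touching a file-level static.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function bookkeeping, reduced to what the lookup needs. */
typedef struct ExDevice {
    uint32_t fid;
    struct ExDevice *next;
} ExDevice;

/* Hypothetical host bridge state: it owns the device list. */
typedef struct ExHostState {
    ExDevice *devices;
} ExHostState;

/* Link a device into the list owned by the given state object. */
static void ex_add_dev(ExHostState *s, ExDevice *dev)
{
    assert(s != NULL && dev != NULL);
    dev->next = s->devices;
    s->devices = dev;
}

/* Look a device up by function id; returns NULL if it is not present. */
static ExDevice *ex_find_dev_by_fid(ExHostState *s, uint32_t fid)
{
    for (ExDevice *d = s->devices; d != NULL; d = d->next) {
        if (d->fid == fid) {
            return d;
        }
    }
    return NULL;
}
```

With this shape, two host bridges would each carry their own list, and
nothing is shared through globals.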

> +
> +int chsc_sei_nt2_get_event(void *res)
> +{
> +    ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res;
> +    PciCcdfAvail *accdf;
> +    PciCcdfErr *eccdf;
> +    int rc = 1;
> +    SeiContainer *sei_cont;
> +
> +    sei_cont = QTAILQ_FIRST(&pending_sei);
> +    if (sei_cont) {
> +        QTAILQ_REMOVE(&pending_sei, sei_cont, link);
> +        nt2_res->nt = 2;
> +        nt2_res->cc = sei_cont->cc;
> +        switch (sei_cont->cc) {
> +        case 1: /* error event */
> +            eccdf = (PciCcdfErr *)nt2_res->ccdf;
> +            eccdf->fid = cpu_to_be32(sei_cont->fid);
> +            eccdf->fh = cpu_to_be32(sei_cont->fh);
> +            break;
> +        case 2: /* availability event */
> +            accdf = (PciCcdfAvail *)nt2_res->ccdf;
> +            accdf->fid = cpu_to_be32(sei_cont->fid);
> +            accdf->fh = cpu_to_be32(sei_cont->fh);
> +            accdf->pec = cpu_to_be16(sei_cont->pec);
> +            break;
> +        default:
> +            abort();
> +        }
> +        g_free(sei_cont);
> +        rc = 0;
> +    }
> +
> +    return rc;
> +}
> +
> +int chsc_sei_nt2_have_event(void)
> +{
> +    return !QTAILQ_EMPTY(&pending_sei);
> +}
> +
> +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
> +{
> +    S390PCIBusDevice *pbdev;
> +
> +    QTAILQ_FOREACH(pbdev, &device_list, next) {
> +        if (pbdev->fid == fid) {
> +            return pbdev;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +void s390_pci_sclp_configure(int configure, SCCB *sccb)
> +{
> +    PciCfgSccb *psccb = (PciCfgSccb *)sccb;
> +    S390PCIBusDevice *pbdev = s390_pci_find_dev_by_fid(be32_to_cpu(psccb->aid));
> +    uint16_t rc;
> +
> +    if (pbdev) {
> +        if ((configure == 1 && pbdev->configured == true) ||
> +            (configure == 0 && pbdev->configured == false)) {
> +            rc = SCLP_RC_NO_ACTION_REQUIRED;
> +        } else {
> +            pbdev->configured = !pbdev->configured;
> +            rc = SCLP_RC_NORMAL_COMPLETION;
> +        }
> +    } else {
> +        DPRINTF("sclp config %d no dev found\n", configure);
> +        rc = SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED;
> +    }
> +
> +    psccb->header.response_code = cpu_to_be16(rc);
> +    return;
> +}
> +
> +static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
> +{
> +    return PCI_SLOT(pdev->devfn);
> +}
> +
> +static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
> +{
> +    return PCI_SLOT(pdev->devfn) | FH_VIRT;
> +}
> +
> +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
> +{
> +    S390PCIBusDevice *dev;
> +    int i = 0;
> +
> +    QTAILQ_FOREACH(dev, &device_list, next) {
> +        if (i == idx) {
> +            return dev;
> +        }
> +        i++;
> +    }
> +    return NULL;
> +}
> +
> +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh)
> +{
> +    S390PCIBusDevice *pbdev;
> +
> +    QTAILQ_FOREACH(pbdev, &device_list, next) {
> +        if (pbdev->fh == fh) {
> +            return pbdev;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static void s390_pci_generate_plug_event(uint16_t pec, uint32_t fh,
> +                                         uint32_t fid)
> +{
> +    SeiContainer *sei_cont = g_malloc0(sizeof(SeiContainer));
> +
> +    sei_cont->fh = fh;
> +    sei_cont->fid = fid;
> +    sei_cont->cc = 2;
> +    sei_cont->pec = pec;
> +
> +    QTAILQ_INSERT_TAIL(&pending_sei, sei_cont, link);
> +    css_generate_css_crws(0);
> +}
> +
> +static void s390_pci_set_irq(void *opaque, int irq, int level)
> +{
> +    /* nothing to do */
> +}
> +
> +static int s390_pci_map_irq(PCIDevice *pci_dev, int irq_num)
> +{
> +    /* nothing to do */
> +    return 0;
> +}
> +
> +void s390_pci_bus_init(void)
> +{
> +    DeviceState *dev;
> +
> +    dev = qdev_create(NULL, TYPE_S390_PCI_HOST_BRIDGE);
> +    qdev_init_nofail(dev);
> +}
> +
> +uint64_t s390_pci_get_table_origin(uint64_t iota)
> +{
> +    return iota & ~ZPCI_IOTA_RTTO_FLAG;
> +}
> +
> +static uint32_t s390_pci_get_p(uint64_t iota)
> +{
> +    return iota & ~ZPCI_IOTA_RTTO_FLAG;
> +}
> +
> +static uint32_t s390_pci_get_dt(uint64_t iota)
> +{
> +    return (iota >> 2) & 0x7;
> +}
> +
> +static uint32_t s390_pci_get_fs(uint64_t iota)
> +{
> +    uint32_t dt = s390_pci_get_dt(iota);
> +
> +    if (dt == 4 || dt == 5) {
> +        return iota & 0x3;
> +    } else {
> +        return ZPCI_IOTA_FS_4K;
> +    }
> +}
> +
> +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
> +                                  uint64_t guest_dma_address)
> +{
> +    uint64_t sto_a, pto_a, px_a;
> +    uint64_t sto, pto, pte;
> +    uint32_t rtx, sx, px;
> +
> +    rtx = calc_rtx(guest_dma_address);
> +    sx = calc_sx(guest_dma_address);
> +    px = calc_px(guest_dma_address);
> +
> +    sto_a = guest_iota + rtx * sizeof(uint64_t);
> +    cpu_physical_memory_rw(sto_a, (uint8_t *)&sto, sizeof(uint64_t), 0);

This is not endian safe. Couldn't you just use ldq_be_phys() here?
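
To make the concern concrete, here is a standalone sketch (not QEMU code).
load_be64 is a hypothetical stand-in for the kind of conversion a big-endian
load helper performs, and load_raw64 mimics copying guest bytes straight into
a host integer, which is what the code above effectively does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a big-endian 64-bit value byte by byte.
 * The result is the same on any host, whatever its byte order. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Raw copy into a host integer: the result depends on host byte order. */
static uint64_t load_raw64(const uint8_t *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

For a table entry stored as the bytes 00 00 00 00 12 34 50 00, load_be64
always yields 0x12345000, while load_raw64 yields that value only on a
big-endian host.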

> +    sto = (uint64_t)get_rt_sto(sto);
> +
> +    pto_a = sto + sx * sizeof(uint64_t);

Can there be invalid entries? How could the guest say "access not
allowed for this region"?

> +    cpu_physical_memory_rw(pto_a, (uint8_t *)&pto, sizeof(uint64_t), 0);

ldq_be_phys()

> +    pto = (uint64_t)get_st_pto(pto);
> +
> +    px_a = pto + px * sizeof(uint64_t);
> +    cpu_physical_memory_rw(px_a, (uint8_t *)&pte, sizeof(uint64_t), 0);

ldq_be_phys()

> +
> +    return pte;
> +}
> +
> +static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> +                                          bool is_write)
> +{
> +    IOMMUTLBEntry ret;
> +    uint32_t fs;
> +    uint64_t pte;
> +    BEntry *container = container_of(iommu, BEntry, mr);
> +    S390PCIBusDevice *pbdev = container->pbdev;
> +    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pbdev->pdev)
> +                                           ->qbus.parent);
> +
> +    DPRINTF("iommu trans addr 0x%lx\n", addr);
> +
> +    /* s390 does not have an APIC mapped to main storage so we use
> +     * a separate AddressSpace only for msix notifications
> +     */
> +    if (addr == ZPCI_MSI_ADDR) {
> +        ret.target_as = &s->msix_notify_as;
> +        ret.iova = addr;
> +        ret.translated_addr = addr;
> +        ret.addr_mask = 0xfff;
> +        ret.perm = IOMMU_RW;
> +        return ret;
> +    }
> +
> +    pte = s390_guest_io_table_walk(s390_pci_get_table_origin(pbdev->g_iota),
> +                                   addr);

Same question for the invalid entry part again. How can we inform the
guest that a device tried to DMA to an invalid address? Shouldn't there
be some error recovery interrupt and device shutdown in that case?
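
One possible shape for such a check, as a sketch only. The flag values and
all EX_/Ex names below are hypothetical; the real table entry layout is not
spelled out in this patch, so treat the bit positions as placeholders. The
point is that an invalid entry yields "no access" rather than an address, so
the caller gets a chance to report an error for the function.

```c
#include <stdint.h>

/* Hypothetical flag bits and address mask for a page table entry. */
#define EX_PTE_INVALID   0x400ULL
#define EX_PTE_PROTECTED 0x200ULL
#define EX_PTE_ADDR_MASK (~0xfffULL)

typedef enum {
    EX_PERM_NONE = 0,
    EX_PERM_RO = 1,
    EX_PERM_RW = 3
} ExPerm;

typedef struct {
    uint64_t translated_addr;
    ExPerm perm;
} ExTranslation;

/* Turn a page table entry into a translation result. An invalid entry
 * gives EX_PERM_NONE and no address. */
static ExTranslation ex_translate_pte(uint64_t pte)
{
    ExTranslation t = { 0, EX_PERM_NONE };

    if (pte & EX_PTE_INVALID) {
        /* The caller could queue an error event for the function here. */
        return t;
    }
    t.translated_addr = pte & EX_PTE_ADDR_MASK;
    t.perm = (pte & EX_PTE_PROTECTED) ? EX_PERM_RO : EX_PERM_RW;
    return t;
}
```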

> +
> +    ret.target_as = &address_space_memory;
> +    ret.iova = addr;
> +    ret.translated_addr = pte & ZPCI_PTE_ADDR_MASK;
> +    fs = s390_pci_get_fs(pbdev->g_iota);
> +    if (fs == ZPCI_IOTA_FS_4K) {
> +        ret.addr_mask = 0xfff;
> +    } else if (fs == ZPCI_IOTA_FS_1M) {
> +        ret.addr_mask = 0xfffff;
> +    } else if (fs == ZPCI_IOTA_FS_2G) {
> +        ret.addr_mask = 0x7fffffff;
> +    }
> +    if (s390_pci_get_p(pbdev->g_iota) == 1) {
> +        ret.perm = IOMMU_RO;
> +    } else {
> +        ret.perm = IOMMU_RW;
> +    }
> +    return ret;
> +}
> +
> +static const MemoryRegionIOMMUOps s390_iommu_ops = {
> +    .translate = s390_translate_iommu,
> +};
> +
> +static AddressSpace *s390_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> +{
> +    S390pciState *s = opaque;
> +
> +    return &s->iommu[PCI_SLOT(devfn)].as;
> +}
> +
> +static void s390_msi_ctrl_write(void *opaque, hwaddr addr, uint64_t data,
> +                                unsigned int size)
> +{
> +    S390PCIBusDevice *pbdev;
> +    unsigned long *aibv, *aisb;
> +    int summary_set;
> +    hwaddr aibv_len, aisb_len;
> +    uint32_t io_int_word;
> +    uint32_t hdata = le32_to_cpu(data);

^

> +    uint32_t fid = hdata >> ZPCI_MSI_VEC_BITS;
> +    uint32_t vec = hdata & ZPCI_MSI_VEC_MASK;
> +
> +    DPRINTF("write_msix data 0x%lx fid %d vec 0x%x\n", data, fid, vec);
> +
> +    pbdev = s390_pci_find_dev_by_fid(fid);
> +    if (!pbdev) {
> +        DPRINTF("msix_notify no dev\n");
> +        return;
> +    }
> +    aibv_len = aisb_len = 8;
> +    aibv = cpu_physical_memory_map(pbdev->routes.adapter.ind_addr,
> +                                   &aibv_len, 1);
> +    aisb = cpu_physical_memory_map(pbdev->routes.adapter.summary_addr,
> +                                   &aisb_len, 1);

Please use ldq_le/be_phys.

> +
> +    set_bit(vec ^ be_to_le, aibv);

What is this be_to_le? Looking at the other endianness mix-ups so far,
I'm not convinced this is correct. Is there any way to let the
infrastructure deal with endianness instead?
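
My reading, which may be wrong: the XOR with BITS_PER_LONG - 1 turns an
MSB-0 bit number (bit 0 is the leftmost bit) into the LSB-0 numbering that
set_bit() uses, and that only lands on the right byte when the host stores
longs big-endian. If the vectors really are MSB-0 in guest memory, a
byte-wise helper would not depend on the host at all. A standalone sketch
with a hypothetical name, test_and_set_bit_msb0:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set bit nr in a byte vector that uses MSB-0 numbering: bit 0 is the
 * most significant bit of byte 0. Returns the previous value of the bit.
 * Works the same on little-endian and big-endian hosts. */
static int test_and_set_bit_msb0(uint8_t *vec, size_t len, unsigned nr)
{
    uint8_t mask = 0x80 >> (nr % 8);
    int old;

    assert(nr / 8 < len);
    old = (vec[nr / 8] & mask) != 0;
    vec[nr / 8] |= mask;
    return old;
}
```

Bit 0 then sets 0x80 in the first byte and bit 63 sets 0x01 in the eighth
byte, which is what vec ^ be_to_le produces on a 64-bit big-endian host.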

> +    summary_set = test_and_set_bit(pbdev->routes.adapter.summary_offset
> +                                   ^ be_to_le, aisb);
> +
> +    if (!summary_set) {
> +        io_int_word = (pbdev->isc << 27) | IO_INT_WORD_AI;
> +        s390_io_interrupt(0, 0, 0, io_int_word);
> +    }
> +
> +    cpu_physical_memory_unmap(aibv, aibv_len, 1, aibv_len);
> +    cpu_physical_memory_unmap(aisb, aisb_len, 1, aisb_len);
> +    return;
> +}
> +
> +static uint64_t s390_msi_ctrl_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    return 0xffffffff;
> +}
> +
> +static const MemoryRegionOps s390_msi_ctrl_ops = {
> +    .write = s390_msi_ctrl_write,
> +    .read = s390_msi_ctrl_read,
> +    .endianness = DEVICE_NATIVE_ENDIAN,

If you're doing an LE conversion right after, it probably means the
region is really a LITTLE_ENDIAN region, no?

Also, le32_to_cpu is definitely wrong in this case, as that will byte
swap on BE hosts, but not on LE hosts. The value you get back is always
the same though, so you've successfully broken LE hosts.

Keep in mind that there's always the slim chance that s390x supports LE
one day, so getting endianness right from the start is pretty important,
even though it seems useless to you today ;).
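
A small model of the problem, as a sketch: le32_to_cpu_model is a
hypothetical stand-in that takes the host byte order as a parameter so both
cases can be compared side by side. The same value handed to the write
callback decodes differently depending on the host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical model of le32_to_cpu(): a no-op on a little-endian host,
 * a byte swap on a big-endian host. */
static uint32_t le32_to_cpu_model(uint32_t v, bool host_is_big_endian)
{
    return host_is_big_endian ? swap32(v) : v;
}
```

Given 0x01080000 as input, the big-endian case yields 0x00000801 (fid 1,
vec 1 with an 11-bit shift), while the little-endian case passes the value
through unchanged and the fid/vec split comes out wrong.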

> +};
> +
> +static void s390_pcihost_init_as(S390pciState *s)
> +{
> +    int i;
> +
> +    for (i = 0; i < PCI_SLOT_MAX; i++) {
> +        memory_region_init_iommu(&s->iommu[i].mr, OBJECT(s),
> +                                 &s390_iommu_ops, "iommu-s390", UINT64_MAX);
> +        address_space_init(&s->iommu[i].as, &s->iommu[i].mr, "iommu-pci");
> +    }
> +
> +    memory_region_init_io(&s->msix_notify_mr, OBJECT(s),
> +                          &s390_msi_ctrl_ops, s, "msix-s390", UINT64_MAX);
> +    address_space_init(&s->msix_notify_as, &s->msix_notify_mr, "msix-pci");
> +}
> +
> +static int s390_pcihost_init(SysBusDevice *dev)
> +{
> +    PCIBus *b;
> +    BusState *bus;
> +    PCIHostState *phb = PCI_HOST_BRIDGE(dev);
> +    S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
> +
> +    DPRINTF("host_init\n");
> +
> +    b = pci_register_bus(DEVICE(dev), NULL,
> +                         s390_pci_set_irq, s390_pci_map_irq, NULL,
> +                         get_system_memory(), get_system_io(), 0, 64,
> +                         TYPE_PCI_BUS);
> +    s390_pcihost_init_as(s);
> +    pci_setup_iommu(b, s390_pci_dma_iommu, s);
> +
> +    bus = BUS(b);
> +    qbus_set_hotplug_handler(bus, DEVICE(dev), NULL);
> +    phb->bus = b;
> +    return 0;
> +}
> +
> +static int s390_pcihost_setup_msix(S390PCIBusDevice *pbdev)
> +{
> +    uint8_t pos;
> +    uint16_t ctrl;
> +    uint32_t table, pba;
> +
> +    pos = pci_find_capability(pbdev->pdev, PCI_CAP_ID_MSIX);
> +    if (!pos) {
> +        pbdev->msix.available = false;
> +        return 0;
> +    }
> +
> +    ctrl = pci_host_config_read_common(pbdev->pdev, pos + PCI_CAP_FLAGS,
> +             pci_config_size(pbdev->pdev), sizeof(ctrl));
> +    table = pci_host_config_read_common(pbdev->pdev, pos + PCI_MSIX_TABLE,
> +             pci_config_size(pbdev->pdev), sizeof(table));
> +    pba = pci_host_config_read_common(pbdev->pdev, pos + PCI_MSIX_PBA,
> +             pci_config_size(pbdev->pdev), sizeof(pba));
> +
> +    pbdev->msix.table_bar = table & PCI_MSIX_FLAGS_BIRMASK;
> +    pbdev->msix.table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
> +    pbdev->msix.pba_bar = pba & PCI_MSIX_FLAGS_BIRMASK;
> +    pbdev->msix.pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> +    pbdev->msix.entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
> +    pbdev->msix.available = true;
> +    return 0;
> +}
> +
> +static void s390_pcihost_hot_plug(HotplugHandler *hotplug_dev,
> +                                  DeviceState *dev, Error **errp)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    S390PCIBusDevice *pbdev;
> +    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
> +                                           ->qbus.parent);
> +
> +    pbdev = g_malloc0(sizeof(*pbdev));
> +
> +    pbdev->fid = s390_pci_get_pfid(pci_dev);
> +    pbdev->pdev = pci_dev;
> +    pbdev->configured = true;
> +
> +    pbdev->fh = s390_pci_get_pfh(pci_dev);
> +
> +    s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev = pbdev;
> +    s390_pcihost_setup_msix(pbdev);
> +
> +    QTAILQ_INSERT_TAIL(&device_list, pbdev, next);
> +    if (dev->hotplugged) {
> +        s390_pci_generate_plug_event(HP_EVENT_RESERVED_TO_STANDBY,
> +                                     pbdev->fh, pbdev->fid);
> +        s390_pci_generate_plug_event(HP_EVENT_TO_CONFIGURED,
> +                                     pbdev->fh, pbdev->fid);
> +    }
> +    return;
> +}
> +
> +static void s390_pcihost_hot_unplug(HotplugHandler *hotplug_dev,
> +                                    DeviceState *dev, Error **errp)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
> +                                           ->qbus.parent);
> +    S390PCIBusDevice *pbdev = s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev;
> +
> +    if (pbdev->configured) {
> +        pbdev->configured = false;
> +        s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
> +                                     pbdev->fh, pbdev->fid);
> +    }
> +
> +    QTAILQ_REMOVE(&device_list, pbdev, next);
> +    s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
> +                                 pbdev->fh, pbdev->fid);
> +    s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev = NULL;
> +    object_unparent(OBJECT(pci_dev));
> +    g_free(pbdev);
> +}
> +
> +static void s390_pcihost_class_init(ObjectClass *klass, void *data)
> +{
> +    SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
> +
> +    dc->cannot_instantiate_with_device_add_yet = true;
> +    k->init = s390_pcihost_init;
> +    hc->plug = s390_pcihost_hot_plug;
> +    hc->unplug = s390_pcihost_hot_unplug;
> +    msi_supported = true;
> +}
> +
> +static const TypeInfo s390_pcihost_info = {
> +    .name          = TYPE_S390_PCI_HOST_BRIDGE,
> +    .parent        = TYPE_PCI_HOST_BRIDGE,
> +    .instance_size = sizeof(S390pciState),
> +    .class_init    = s390_pcihost_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_HOTPLUG_HANDLER },
> +        { }
> +    }
> +};
> +
> +static void s390_pci_register_types(void)
> +{
> +    type_register_static(&s390_pcihost_info);
> +}
> +
> +type_init(s390_pci_register_types)
> diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
> new file mode 100644
> index 0000000..088f24f
> --- /dev/null
> +++ b/hw/s390x/s390-pci-bus.h
> @@ -0,0 +1,254 @@
> +/*
> + * s390 PCI BUS definitions
> + *
> + * Copyright 2014 IBM Corp.
> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> + *            Hong Bo Li <lihbbj@cn.ibm.com>
> + *            Yi Min Zhao <zyimin@cn.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#ifndef HW_S390_PCI_BUS_H
> +#define HW_S390_PCI_BUS_H
> +
> +#include <hw/pci/pci.h>
> +#include <hw/pci/pci_host.h>
> +#include "hw/s390x/sclp.h"
> +#include "hw/s390x/s390_flic.h"
> +#include "hw/s390x/css.h"
> +
> +#define TYPE_S390_PCI_HOST_BRIDGE "s390-pcihost"
> +#define FH_VIRT 0x00ff0000
> +#define ENABLE_BIT_OFFSET 31
> +#define S390_PCIPT_ADAPTER 2
> +
> +#define S390_PCI_HOST_BRIDGE(obj) \
> +    OBJECT_CHECK(S390pciState, (obj), TYPE_S390_PCI_HOST_BRIDGE)
> +
> +#define HP_EVENT_TO_CONFIGURED        0x0301
> +#define HP_EVENT_RESERVED_TO_STANDBY  0x0302
> +#define HP_EVENT_CONFIGURED_TO_STBRES 0x0304
> +#define HP_EVENT_STANDBY_TO_RESERVED  0x0308
> +
> +#define ZPCI_MSI_VEC_BITS 11
> +#define ZPCI_MSI_VEC_MASK 0x7f
> +
> +#define ZPCI_MSI_ADDR  0xfe00000000000000
> +#define ZPCI_SDMA_ADDR 0x100000000
> +#define ZPCI_EDMA_ADDR 0x1ffffffffffffff
> +
> +#define PAGE_SHIFT      12
> +#define PAGE_MASK       (~(PAGE_SIZE-1))
> +#define PAGE_DEFAULT_ACC        0
> +#define PAGE_DEFAULT_KEY        (PAGE_DEFAULT_ACC << 4)
> +
> +/* I/O Translation Anchor (IOTA) */
> +enum ZpciIoatDtype {
> +    ZPCI_IOTA_STO = 0,
> +    ZPCI_IOTA_RTTO = 1,
> +    ZPCI_IOTA_RSTO = 2,
> +    ZPCI_IOTA_RFTO = 3,
> +    ZPCI_IOTA_PFAA = 4,
> +    ZPCI_IOTA_IOPFAA = 5,
> +    ZPCI_IOTA_IOPTO = 7
> +};
> +
> +#define ZPCI_IOTA_IOT_ENABLED           0x800UL
> +#define ZPCI_IOTA_DT_ST                 (ZPCI_IOTA_STO  << 2)
> +#define ZPCI_IOTA_DT_RT                 (ZPCI_IOTA_RTTO << 2)
> +#define ZPCI_IOTA_DT_RS                 (ZPCI_IOTA_RSTO << 2)
> +#define ZPCI_IOTA_DT_RF                 (ZPCI_IOTA_RFTO << 2)
> +#define ZPCI_IOTA_DT_PF                 (ZPCI_IOTA_PFAA << 2)
> +#define ZPCI_IOTA_FS_4K                 0
> +#define ZPCI_IOTA_FS_1M                 1
> +#define ZPCI_IOTA_FS_2G                 2
> +#define ZPCI_KEY                        (PAGE_DEFAULT_KEY << 5)
> +
> +#define ZPCI_IOTA_STO_FLAG  (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_ST)
> +#define ZPCI_IOTA_RTTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RT)
> +#define ZPCI_IOTA_RSTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RS)
> +#define ZPCI_IOTA_RFTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RF)
> +#define ZPCI_IOTA_RFAA_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY |\
> +                             ZPCI_IOTA_DT_PF | ZPCI_IOTA_FS_2G)
> +
> +/* I/O Region and segment tables */
> +#define ZPCI_INDEX_MASK         0x7ffUL
> +
> +#define ZPCI_TABLE_TYPE_MASK    0xc
> +#define ZPCI_TABLE_TYPE_RFX     0xc
> +#define ZPCI_TABLE_TYPE_RSX     0x8
> +#define ZPCI_TABLE_TYPE_RTX     0x4
> +#define ZPCI_TABLE_TYPE_SX      0x0
> +
> +#define ZPCI_TABLE_LEN_RFX      0x3
> +#define ZPCI_TABLE_LEN_RSX      0x3
> +#define ZPCI_TABLE_LEN_RTX      0x3
> +
> +#define ZPCI_TABLE_OFFSET_MASK  0xc0
> +#define ZPCI_TABLE_SIZE         0x4000
> +#define ZPCI_TABLE_ALIGN        ZPCI_TABLE_SIZE
> +#define ZPCI_TABLE_ENTRY_SIZE   (sizeof(unsigned long))
> +#define ZPCI_TABLE_ENTRIES      (ZPCI_TABLE_SIZE / ZPCI_TABLE_ENTRY_SIZE)
> +
> +#define ZPCI_TABLE_BITS         11
> +#define ZPCI_PT_BITS            8
> +#define ZPCI_ST_SHIFT           (ZPCI_PT_BITS + PAGE_SHIFT)
> +#define ZPCI_RT_SHIFT           (ZPCI_ST_SHIFT + ZPCI_TABLE_BITS)
> +
> +#define ZPCI_RTE_FLAG_MASK      0x3fffUL
> +#define ZPCI_RTE_ADDR_MASK      (~ZPCI_RTE_FLAG_MASK)
> +#define ZPCI_STE_FLAG_MASK      0x7ffUL
> +#define ZPCI_STE_ADDR_MASK      (~ZPCI_STE_FLAG_MASK)
> +
> +/* I/O Page tables */
> +#define ZPCI_PTE_VALID_MASK             0x400
> +#define ZPCI_PTE_INVALID                0x400
> +#define ZPCI_PTE_VALID                  0x000
> +#define ZPCI_PT_SIZE                    0x800
> +#define ZPCI_PT_ALIGN                   ZPCI_PT_SIZE
> +#define ZPCI_PT_ENTRIES                 (ZPCI_PT_SIZE / ZPCI_TABLE_ENTRY_SIZE)
> +#define ZPCI_PT_MASK                    (ZPCI_PT_ENTRIES - 1)
> +
> +#define ZPCI_PTE_FLAG_MASK              0xfffUL
> +#define ZPCI_PTE_ADDR_MASK              (~ZPCI_PTE_FLAG_MASK)
> +
> +/* Shared bits */
> +#define ZPCI_TABLE_VALID                0x00
> +#define ZPCI_TABLE_INVALID              0x20
> +#define ZPCI_TABLE_PROTECTED            0x200
> +#define ZPCI_TABLE_UNPROTECTED          0x000
> +
> +#define ZPCI_TABLE_VALID_MASK           0x20
> +#define ZPCI_TABLE_PROT_MASK            0x200
> +
> +typedef struct SeiContainer {
> +    QTAILQ_ENTRY(SeiContainer) link;
> +    uint32_t fid;
> +    uint32_t fh;
> +    uint8_t cc;
> +    uint16_t pec;
> +} SeiContainer;
> +
> +typedef struct PciCcdfErr {
> +    uint32_t reserved1;
> +    uint32_t fh;
> +    uint32_t fid;
> +    uint32_t reserved2;
> +    uint64_t faddr;
> +    uint32_t reserved3;
> +    uint16_t reserved4;
> +    uint16_t pec;
> +} QEMU_PACKED PciCcdfErr;
> +
> +typedef struct PciCcdfAvail {
> +    uint32_t reserved1;
> +    uint32_t fh;
> +    uint32_t fid;
> +    uint32_t reserved2;
> +    uint32_t reserved3;
> +    uint32_t reserved4;
> +    uint32_t reserved5;
> +    uint16_t reserved6;
> +    uint16_t pec;
> +} QEMU_PACKED PciCcdfAvail;
> +
> +typedef struct ChscSeiNt2Res {
> +    uint16_t length;
> +    uint16_t code;
> +    uint16_t reserved1;
> +    uint8_t reserved2;
> +    uint8_t nt;
> +    uint8_t flags;
> +    uint8_t reserved3;
> +    uint8_t reserved4;
> +    uint8_t cc;
> +    uint32_t reserved5[13];
> +    uint8_t ccdf[4016];
> +} QEMU_PACKED ChscSeiNt2Res;
> +
> +typedef struct PciCfgSccb {
> +        SCCBHeader header;
> +        uint8_t atype;
> +        uint8_t reserved1;
> +        uint16_t reserved2;
> +        uint32_t aid;
> +} QEMU_PACKED PciCfgSccb;
> +
> +typedef struct S390MsixInfo {
> +    bool available;
> +    uint8_t table_bar;
> +    uint8_t pba_bar;
> +    uint16_t entries;
> +    uint32_t table_offset;
> +    uint32_t pba_offset;
> +} S390MsixInfo;
> +
> +typedef struct S390PCIBusDevice {
> +    PCIDevice *pdev;
> +    bool configured;
> +    uint32_t fh;
> +    uint32_t fid;
> +    uint64_t g_iota;
> +    uint8_t isc;
> +    S390MsixInfo msix;
> +    AdapterRoutes routes;
> +    QTAILQ_ENTRY(S390PCIBusDevice) next;
> +} S390PCIBusDevice;
> +
> +typedef struct BEntry {
> +    AddressSpace as;
> +    MemoryRegion mr;
> +    S390PCIBusDevice *pbdev;
> +} BEntry;
> +
> +typedef struct S390pciState {
> +    PCIHostState parent_obj;
> +    BEntry iommu[PCI_SLOT_MAX];
> +    AddressSpace msix_notify_as;
> +    MemoryRegion msix_notify_mr;
> +} S390pciState;
> +
> +static inline unsigned int calc_rtx(dma_addr_t ptr)
> +{
> +    return ((unsigned long) ptr >> ZPCI_RT_SHIFT) & ZPCI_INDEX_MASK;
> +}
> +
> +static inline unsigned int calc_sx(dma_addr_t ptr)
> +{
> +    return ((unsigned long) ptr >> ZPCI_ST_SHIFT) & ZPCI_INDEX_MASK;
> +}
> +
> +static inline unsigned int calc_px(dma_addr_t ptr)
> +{
> +    return ((unsigned long) ptr >> PAGE_SHIFT) & ZPCI_PT_MASK;
> +}
> +
> +static inline unsigned long *get_rt_sto(unsigned long entry)
> +{
> +    return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_RTX)
> +                ? (unsigned long *) (entry & ZPCI_RTE_ADDR_MASK)
> +                : NULL;
> +}
> +
> +static inline unsigned long *get_st_pto(unsigned long entry)
> +{
> +    return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_SX)
> +            ? (unsigned long *) (entry & ZPCI_STE_ADDR_MASK)
> +            : NULL;
> +}

Are these static inlines used outside of a single place? If not, please
move them into the .c file they get called from.

> +
> +int chsc_sei_nt2_get_event(void *res);
> +int chsc_sei_nt2_have_event(void);
> +void s390_pci_sclp_configure(int configure, SCCB *sccb);
> +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx);
> +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh);
> +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid);

I think it makes sense to pass the PHB device as a parameter to these.
Don't assume you only have one.
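
A rough sketch of what a lookup scoped to one host bridge could look like.
The types below are reduced stand-ins for S390pciState and S390PCIBusDevice,
and HostBridge, BusDev, SLOT_MAX and find_dev_by_fh are made-up names for
illustration only, not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_MAX 32  /* hypothetical stand-in for PCI_SLOT_MAX */

/* Reduced stand-in for S390PCIBusDevice. */
typedef struct BusDev {
    uint32_t fh;
    uint32_t fid;
} BusDev;

/* Reduced stand-in for S390pciState: one device pointer per slot. */
typedef struct HostBridge {
    BusDev *slot[SLOT_MAX];
} HostBridge;

/* Search only the given host bridge instead of a file-scope device list. */
BusDev *find_dev_by_fh(HostBridge *phb, uint32_t fh)
{
    for (size_t i = 0; i < SLOT_MAX; i++) {
        BusDev *dev = phb->slot[i];
        if (dev && dev->fh == fh) {
            return dev;
        }
    }
    return NULL;
}
```

With the bridge passed in, two bridges can hold devices independently and a
handle that exists on one is simply not found on the other.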

> +void s390_pci_bus_init(void);
> +uint64_t s390_pci_get_table_origin(uint64_t iota);
> +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
> +                                  uint64_t guest_dma_address);

Why are these exported?

> +
> +#endif
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index bc4dc2a..2e25834 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -18,6 +18,7 @@
>  #include "css.h"
>  #include "virtio-ccw.h"
>  #include "qemu/config-file.h"
> +#include "s390-pci-bus.h"
>  
>  #define TYPE_S390_CCW_MACHINE               "s390-ccw-machine"
>  
> @@ -127,6 +128,8 @@ static void ccw_init(MachineState *machine)
>                        machine->initrd_filename, "s390-ccw.img");
>      s390_flic_init();
>  
> +    s390_pci_bus_init();

Please just inline that function here.

> +
>      /* register hypercalls */
>      virtio_ccw_register_hcalls();
>  
> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> index a759da7..a969975 100644
> --- a/hw/s390x/sclp.c
> +++ b/hw/s390x/sclp.c
> @@ -20,6 +20,7 @@
>  #include "qemu/config-file.h"
>  #include "hw/s390x/sclp.h"
>  #include "hw/s390x/event-facility.h"
> +#include "hw/s390x/s390-pci-bus.h"
>  
>  static inline SCLPEventFacility *get_event_facility(void)
>  {
> @@ -62,7 +63,8 @@ static void read_SCP_info(SCCB *sccb)
>          read_info->entries[i].type = 0;
>      }
>  
> -    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO);
> +    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO |
> +                                        SCLP_HAS_PCI_RECONFIG);
>  
>      /*
>       * The storage increment size is a multiple of 1M and is a power of 2.
> @@ -350,6 +352,12 @@ static void sclp_execute(SCCB *sccb, uint32_t code)
>      case SCLP_UNASSIGN_STORAGE:
>          unassign_storage(sccb);
>          break;
> +    case SCLP_CMDW_CONFIGURE_PCI:
> +        s390_pci_sclp_configure(1, sccb);
> +        break;
> +    case SCLP_CMDW_DECONFIGURE_PCI:
> +        s390_pci_sclp_configure(0, sccb);
> +        break;
>      default:
>          efc->command_handler(ef, sccb, code);
>          break;
> diff --git a/include/hw/s390x/sclp.h b/include/hw/s390x/sclp.h
> index ec07a11..e8a64e2 100644
> --- a/include/hw/s390x/sclp.h
> +++ b/include/hw/s390x/sclp.h
> @@ -43,14 +43,22 @@
>  #define SCLP_CMDW_CONFIGURE_CPU                 0x00110001
>  #define SCLP_CMDW_DECONFIGURE_CPU               0x00100001
>  
> +/* SCLP PCI codes */
> +#define SCLP_HAS_PCI_RECONFIG                   0x0000000040000000ULL
> +#define SCLP_CMDW_CONFIGURE_PCI                 0x001a0001
> +#define SCLP_CMDW_DECONFIGURE_PCI               0x001b0001
> +#define SCLP_RECONFIG_PCI_ATPYE                 2
> +
>  /* SCLP response codes */
>  #define SCLP_RC_NORMAL_READ_COMPLETION          0x0010
>  #define SCLP_RC_NORMAL_COMPLETION               0x0020
>  #define SCLP_RC_SCCB_BOUNDARY_VIOLATION         0x0100
> +#define SCLP_RC_NO_ACTION_REQUIRED              0x0120
>  #define SCLP_RC_INVALID_SCLP_COMMAND            0x01f0
>  #define SCLP_RC_CONTAINED_EQUIPMENT_CHECK       0x0340
>  #define SCLP_RC_INSUFFICIENT_SCCB_LENGTH        0x0300
>  #define SCLP_RC_STANDBY_READ_COMPLETION         0x0410
> +#define SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED       0x09f0
>  #define SCLP_RC_INVALID_FUNCTION                0x40f0
>  #define SCLP_RC_NO_EVENT_BUFFERS_STORED         0x60f0
>  #define SCLP_RC_INVALID_SELECTION_MASK          0x70f0
> diff --git a/target-s390x/ioinst.c b/target-s390x/ioinst.c
> index b8a6486..d969f8f 100644
> --- a/target-s390x/ioinst.c
> +++ b/target-s390x/ioinst.c
> @@ -14,6 +14,7 @@
>  #include "cpu.h"
>  #include "ioinst.h"
>  #include "trace.h"
> +#include "hw/s390x/s390-pci-bus.h"
>  
>  int ioinst_disassemble_sch_ident(uint32_t value, int *m, int *cssid, int *ssid,
>                                   int *schid)
> @@ -398,6 +399,7 @@ typedef struct ChscResp {
>  #define CHSC_SCPD 0x0002
>  #define CHSC_SCSC 0x0010
>  #define CHSC_SDA  0x0031
> +#define CHSC_SEI  0x000e
>  
>  #define CHSC_SCPD_0_M 0x20000000
>  #define CHSC_SCPD_0_C 0x10000000
> @@ -566,6 +568,53 @@ out:
>      res->param = 0;
>  }
>  
> +static int chsc_sei_nt0_get_event(void *res)
> +{
> +    /* no events yet */
> +    return 1;
> +}
> +
> +static int chsc_sei_nt0_have_event(void)
> +{
> +    /* no events yet */
> +    return 0;
> +}
> +
> +#define CHSC_SEI_NT0    (1ULL << 63)
> +#define CHSC_SEI_NT2    (1ULL << 61)
> +static void ioinst_handle_chsc_sei(ChscReq *req, ChscResp *res)
> +{
> +    uint64_t selection_mask = be64_to_cpu(*(uint64_t *)&req->param1);

ldq_p(&req->param1) I guess?
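
For context: as far as I know, ldq_p() loads a 64-bit value in target byte
order from a host pointer, which is big endian on s390x, and it avoids the
pointer cast on the request buffer. The sketch below only illustrates that
behaviour with a hypothetical helper (load_be64 is not a QEMU function):

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a big-endian 64-bit load helper.
 * Reads byte by byte, so it works for any alignment and on any host.
 */
uint64_t load_be64(const void *ptr)
{
    const uint8_t *p = ptr;
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}
```

A selection mask whose first byte in memory is 0x20 then has bit (1ULL << 61)
set, which matches the CHSC_SEI_NT2 definition in the patch.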


Alex

> +    uint8_t *res_flags = (uint8_t *)res->data;
> +    int have_event = 0;
> +    int have_more = 0;
> +
> +    /* regarding architecture nt0 can not be masked */
> +    have_event = !chsc_sei_nt0_get_event(res);
> +    have_more = chsc_sei_nt0_have_event();
> +
> +    if (selection_mask & CHSC_SEI_NT2) {
> +        if (!have_event) {
> +            have_event = !chsc_sei_nt2_get_event(res);
> +        }
> +
> +        if (!have_more) {
> +            have_more = chsc_sei_nt2_have_event();
> +        }
> +    }
> +
> +    if (have_event) {
> +        res->code = cpu_to_be16(0x0001);
> +        if (have_more) {
> +            (*res_flags) |= 0x80;
> +        } else {
> +            (*res_flags) &= ~0x80;
> +        }
> +    } else {
> +        res->code = cpu_to_be16(0x0004);
> +    }
> +}
> +
>  static void ioinst_handle_chsc_unimplemented(ChscResp *res)
>  {
>      res->len = cpu_to_be16(CHSC_MIN_RESP_LEN);
> @@ -617,6 +666,9 @@ void ioinst_handle_chsc(S390CPU *cpu, uint32_t ipb)
>      case CHSC_SDA:
>          ioinst_handle_chsc_sda(req, res);
>          break;
> +    case CHSC_SEI:
> +        ioinst_handle_chsc_sei(req, res);
> +        break;
>      default:
>          ioinst_handle_chsc_unimplemented(res);
>          break;
> diff --git a/target-s390x/ioinst.h b/target-s390x/ioinst.h
> index 29f6423..1efe16c 100644
> --- a/target-s390x/ioinst.h
> +++ b/target-s390x/ioinst.h
> @@ -204,6 +204,7 @@ typedef struct CRW {
>  
>  #define CRW_RSC_SUBCH 0x3
>  #define CRW_RSC_CHP   0x4
> +#define CRW_RSC_CSS   0xb
>  
>  /* I/O interruption code */
>  typedef struct IOIntCode {
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-10 14:20 ` [Qemu-devel] [PATCH 2/3] s390: implement pci instructions Frank Blaschka
@ 2014-11-10 15:56   ` Alexander Graf
  2014-11-11 12:10     ` Frank Blaschka
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-10 15:56 UTC (permalink / raw)
  To: Frank Blaschka, cornelia.huck, borntraeger, pbonzini, qemu-devel
  Cc: peter.maydell, james.hogan, mtosatti, Frank Blaschka, rth



On 10.11.14 15:20, Frank Blaschka wrote:
> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> 
> This patch implements the s390 pci instructions in qemu. It allows
> to access and drive pci devices attached to the s390 pci bus.
> Because of platform constrains devices using IO BARs are not
> supported. Also a device has to support MSI/MSI-X to run on s390.
> 
> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> ---
>  target-s390x/Makefile.objs |   2 +-
>  target-s390x/kvm.c         |  52 ++++
>  target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>  target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>  4 files changed, 1141 insertions(+), 1 deletion(-)
>  create mode 100644 target-s390x/pci_ic.c
>  create mode 100644 target-s390x/pci_ic.h
> 
> diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
> index 2c57494..cc71400 100644
> --- a/target-s390x/Makefile.objs
> +++ b/target-s390x/Makefile.objs
> @@ -2,4 +2,4 @@ obj-y += translate.o helper.o cpu.o interrupt.o
>  obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
>  obj-y += gdbstub.o
>  obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o
> -obj-$(CONFIG_KVM) += kvm.o
> +obj-$(CONFIG_KVM) += kvm.o pci_ic.o
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index 5b10a25..d59e740 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -40,6 +40,7 @@
>  #include "exec/gdbstub.h"
>  #include "trace.h"
>  #include "qapi-event.h"
> +#include "pci_ic.h"
>  
>  /* #define DEBUG_KVM */
>  
> @@ -56,6 +57,7 @@
>  #define IPA0_B2                         0xb200
>  #define IPA0_B9                         0xb900
>  #define IPA0_EB                         0xeb00
> +#define IPA0_E3                         0xe300
>  
>  #define PRIV_B2_SCLP_CALL               0x20
>  #define PRIV_B2_CSCH                    0x30
> @@ -76,8 +78,17 @@
>  #define PRIV_B2_XSCH                    0x76
>  
>  #define PRIV_EB_SQBS                    0x8a
> +#define PRIV_EB_PCISTB                  0xd0
> +#define PRIV_EB_SIC                     0xd1
>  
>  #define PRIV_B9_EQBS                    0x9c
> +#define PRIV_B9_CLP                     0xa0
> +#define PRIV_B9_PCISTG                  0xd0
> +#define PRIV_B9_PCILG                   0xd2
> +#define PRIV_B9_RPCIT                   0xd3
> +
> +#define PRIV_E3_MPCIFC                  0xd0
> +#define PRIV_E3_STPCIFC                 0xd4
>  
>  #define DIAG_IPL                        0x308
>  #define DIAG_KVM_HYPERCALL              0x500
> @@ -814,6 +825,18 @@ static int handle_b9(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
>      int r = 0;
>  
>      switch (ipa1) {
> +    case PRIV_B9_CLP:
> +        r = kvm_clp_service_call(cpu, run);
> +        break;
> +    case PRIV_B9_PCISTG:
> +        r = kvm_pcistg_service_call(cpu, run);
> +        break;
> +    case PRIV_B9_PCILG:
> +        r = kvm_pcilg_service_call(cpu, run);
> +        break;
> +    case PRIV_B9_RPCIT:
> +        r = kvm_rpcit_service_call(cpu, run);
> +        break;
>      case PRIV_B9_EQBS:
>          /* just inject exception */
>          r = -1;
> @@ -832,6 +855,12 @@ static int handle_eb(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
>      int r = 0;
>  
>      switch (ipa1) {
> +    case PRIV_EB_PCISTB:
> +        r = kvm_pcistb_service_call(cpu, run);
> +        break;
> +    case PRIV_EB_SIC:
> +        r = kvm_sic_service_call(cpu, run);
> +        break;
>      case PRIV_EB_SQBS:
>          /* just inject exception */
>          r = -1;
> @@ -845,6 +874,26 @@ static int handle_eb(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
>      return r;
>  }
>  
> +static int handle_e3(S390CPU *cpu, struct kvm_run *run, uint8_t ipbl)
> +{
> +    int r = 0;
> +
> +    switch (ipbl) {
> +    case PRIV_E3_MPCIFC:
> +        r = kvm_mpcifc_service_call(cpu, run);
> +        break;
> +    case PRIV_E3_STPCIFC:
> +        r = kvm_stpcifc_service_call(cpu, run);
> +        break;
> +    default:
> +        r = -1;
> +        DPRINTF("KVM: unhandled PRIV: 0xe3%x\n", ipbl);
> +        break;
> +    }
> +
> +    return r;
> +}
> +
>  static int handle_hypercall(S390CPU *cpu, struct kvm_run *run)
>  {
>      CPUS390XState *env = &cpu->env;
> @@ -1041,6 +1090,9 @@ static int handle_instruction(S390CPU *cpu, struct kvm_run *run)
>      case IPA0_EB:
>          r = handle_eb(cpu, run, ipa1);
>          break;
> +    case IPA0_E3:
> +        r = handle_e3(cpu, run, run->s390_sieic.ipb & 0xff);
> +        break;
>      case IPA0_DIAG:
>          r = handle_diag(cpu, run, run->s390_sieic.ipb);
>          break;
> diff --git a/target-s390x/pci_ic.c b/target-s390x/pci_ic.c
> new file mode 100644
> index 0000000..6c05faf
> --- /dev/null
> +++ b/target-s390x/pci_ic.c
> @@ -0,0 +1,753 @@
> +/*
> + * s390 PCI intercepts
> + *
> + * Copyright 2014 IBM Corp.
> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> + *            Hong Bo Li <lihbbj@cn.ibm.com>
> + *            Yi Min Zhao <zyimin@cn.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include <sys/types.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +
> +#include <linux/kvm.h>
> +#include <asm/ptrace.h>
> +#include <hw/pci/pci.h>
> +#include <hw/pci/pci_host.h>
> +#include <net/net.h>
> +
> +#include "qemu-common.h"
> +#include "qemu/timer.h"
> +#include "migration/qemu-file.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/kvm.h"
> +#include "cpu.h"
> +#include "sysemu/device_tree.h"
> +#include "monitor/monitor.h"
> +#include "pci_ic.h"
> +
> +#include "hw/hw.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_bridge.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/s390x/s390-pci-bus.h"
> +#include "exec/exec-all.h"
> +#include "exec/memory-internal.h"
> +
> +/* #define DEBUG_S390PCI_IC */
> +#ifdef DEBUG_S390PCI_IC
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "s390pci_ic: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +static uint64_t resume_token;

global variable? Why?
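
One possible alternative, sketched under the assumption that the token can
live in some per-bridge or per-guest state object. ListState and
check_resume_token are hypothetical names, and the check mirrors the token
handling in list_pci():

```c
#include <stdint.h>

/* Hypothetical state object that owns the resume token. */
typedef struct ListState {
    uint64_t resume_token;
} ListState;

/*
 * A requested token of 0 restarts the listing. Any other value must
 * match the stored token. Returns 0 on success, -1 for a bad token.
 */
int check_resume_token(ListState *s, uint64_t requested)
{
    if (requested == 0) {
        s->resume_token = 0;
        return 0;
    }
    return requested == s->resume_token ? 0 : -1;
}
```

Keeping the token in a state object means two independent listings no longer
overwrite each other's position.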

> +
> +static uint8_t barsize(uint64_t size)
> +{
> +    uint64_t mask = 1;
> +    int i;
> +
> +    if (!size) {
> +        return 0;
> +    }
> +
> +    for (i = 0; i < 64; i++) {
> +        if (size & mask) {
> +            break;
> +        }
> +        mask = (mask << 1);
> +    }
> +
> +    return i;
> +}

Isn't there an existing helper for this in the PCI layer?

In fact, please check whether it makes sense to move some of the code to
hw/ rather than target-s390x.
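
For reference, the loop above returns the index of the lowest set bit, i.e. a
count of trailing zeros, with 0 as the result for a size of 0. A minimal
sketch using the GCC/Clang builtin follows. barsize_ctz is a made-up name,
and I believe QEMU's ctz64() from qemu/host-utils.h would be the in-tree
equivalent, but please double check:

```c
#include <stdint.h>

/*
 * Same result as the barsize() loop: position of the lowest set bit.
 * __builtin_ctzll() is undefined for 0, so that case is handled first.
 */
uint8_t barsize_ctz(uint64_t size)
{
    return size ? (uint8_t)__builtin_ctzll(size) : 0;
}
```

For power-of-two BAR sizes this is log2 of the size, for example 12 for a
4 KiB region.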

> +
> +static void s390_set_status_code(CPUS390XState *env,
> +                                 uint8_t r, uint64_t status_code)
> +{
> +    env->regs[r] &= ~0xff000000;
> +    env->regs[r] |= (status_code & 0xff) << 24;
> +}
> +
> +static int list_pci(ClpReqRspListPci *rrb, uint8_t *cc)
> +{
> +    S390PCIBusDevice *pbdev;
> +    uint32_t res_code, initial_l2, g_l2, finish;
> +    int rc, idx;
> +
> +    rc = 0;
> +    if (be16_to_cpu(rrb->request.hdr.len) != 32) {
> +        res_code = CLP_RC_LEN;
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    if ((be32_to_cpu(rrb->request.fmt) & CLP_MASK_FMT) != 0) {
> +        res_code = CLP_RC_FMT;
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    if ((be32_to_cpu(rrb->request.fmt) & ~CLP_MASK_FMT) != 0 ||
> +        rrb->request.reserved1 != 0 ||
> +        rrb->request.reserved2 != 0) {
> +        res_code = CLP_RC_RESNOT0;
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    if (be64_to_cpu(rrb->request.resume_token) == 0) {
> +        resume_token = 0;
> +    } else if (be64_to_cpu(rrb->request.resume_token) != resume_token) {
> +        res_code = CLP_RC_LISTPCI_BADRT;
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    if (be16_to_cpu(rrb->response.hdr.len) < 48) {
> +        res_code = CLP_RC_8K;
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    initial_l2 = be16_to_cpu(rrb->response.hdr.len);
> +    if ((initial_l2 - LIST_PCI_HDR_LEN) % sizeof(ClpFhListEntry)
> +        != 0) {
> +        rc = -EINVAL;
> +        *cc = 3;
> +        goto out;
> +    }
> +
> +    rrb->response.fmt = 0;
> +    rrb->response.reserved1 = rrb->response.reserved2 = 0;
> +    rrb->response.mdd = cpu_to_be32(FH_VIRT);
> +    rrb->response.max_fn = cpu_to_be16(PCI_MAX_FUNCTIONS);
> +    rrb->response.entry_size = sizeof(ClpFhListEntry);
> +    finish = 0;
> +    idx = resume_token;
> +    g_l2 = LIST_PCI_HDR_LEN;
> +    do {
> +        pbdev = s390_pci_find_dev_by_idx(idx);
> +        if (!pbdev) {
> +            finish = 1;
> +            break;
> +        }
> +        rrb->response.fh_list[idx - resume_token].device_id =
> +            pci_get_word(pbdev->pdev->config + PCI_DEVICE_ID);
> +        rrb->response.fh_list[idx - resume_token].vendor_id =
> +            pci_get_word(pbdev->pdev->config + PCI_VENDOR_ID);
> +        rrb->response.fh_list[idx - resume_token].config =
> +            cpu_to_be32(0x80000000);
> +        rrb->response.fh_list[idx - resume_token].fid = cpu_to_be32(pbdev->fid);
> +        rrb->response.fh_list[idx - resume_token].fh = cpu_to_be32(pbdev->fh);
> +
> +        g_l2 += sizeof(ClpFhListEntry);
> +        DPRINTF("g_l2 %d vendor id 0x%x device id 0x%x fid 0x%x fh 0x%x\n",
> +            g_l2,
> +            rrb->response.fh_list[idx - resume_token].vendor_id,
> +            rrb->response.fh_list[idx - resume_token].device_id,
> +            rrb->response.fh_list[idx - resume_token].fid,
> +            rrb->response.fh_list[idx - resume_token].fh);
> +        idx++;
> +    } while (g_l2 < initial_l2);
> +
> +    if (finish == 1) {
> +        resume_token = 0;
> +    } else {
> +        resume_token = idx;
> +    }
> +    rrb->response.resume_token = cpu_to_be64(resume_token);
> +    rrb->response.hdr.len = cpu_to_be16(g_l2);
> +    rrb->response.hdr.rsp = cpu_to_be16(CLP_RC_OK);
> +out:
> +    if (rc) {
> +        DPRINTF("list pci failed rc 0x%x\n", rc);
> +        rrb->response.hdr.rsp = cpu_to_be16(res_code);
> +    }
> +    return rc;
> +}
> +
> +int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run)

Please separate kvm_ calls from the actual implementation. Do all the
parameter extraction in the kvm_ function and then forward on to a
generic function that doesn't need to know about kvm_run anymore.

kvm specific c file:

int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run)
{
    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
    return clp_service_call(cpu, r2);
}

io / pci specific c file:

int clp_service_call(S390CPU *cpu, uint8_t r2)
{
    ...
}

> +{
> +    ClpReqHdr *reqh;
> +    ClpRspHdr *resh;
> +    S390PCIBusDevice *pbdev;
> +    uint32_t req_len;
> +    uint32_t res_len;
> +    uint8_t *buffer;
> +    uint8_t cc = 0;
> +    CPUS390XState *env = &cpu->env;
> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> +    int i;
> +
> +    buffer = g_malloc0(4096 * 2);

Do you really need this? Couldn't you make the pointers be actual
structs on the stack and just read/write from them directly?

The compiler should be smart enough to throw away elements that aren't
used anymore to conserve memory.

> +    cpu_synchronize_state(CPU(cpu));
> +
> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> +        return 0;
> +    }
> +
> +    cpu_physical_memory_rw(env->regs[r2], buffer, sizeof(*reqh), 0);
> +    reqh = (ClpReqHdr *)buffer;
> +    req_len = be16_to_cpu(reqh->len);
> +    if (req_len < 16 || req_len > 8184 || (req_len % 8 != 0)) {
> +        program_interrupt(env, PGM_OPERAND, 4);
> +        return 0;
> +    }
> +
> +    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + sizeof(*resh), 0);
> +    resh = (ClpRspHdr *)(buffer + req_len);
> +    res_len = be16_to_cpu(resh->len);
> +    if (res_len < 8 || res_len > 8176 || (res_len % 8 != 0)) {
> +        program_interrupt(env, PGM_OPERAND, 4);
> +        return 0;
> +    }
> +    if ((req_len + res_len) > 8192) {
> +        program_interrupt(env, PGM_OPERAND, 4);
> +        return 0;
> +    }
> +
> +    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + res_len, 0);
> +
> +    if (req_len != 32) {
> +        resh->rsp = cpu_to_be16(CLP_RC_LEN);
> +        goto out;
> +    }
> +
> +    switch (reqh->cmd) {
> +    case CLP_LIST_PCI: {
> +        ClpReqRspListPci *rrb = (ClpReqRspListPci *)buffer;
> +        list_pci(rrb, &cc);
> +        break;
> +    }
> +    case CLP_SET_PCI_FN: {
> +        ClpReqSetPci *reqsetpci = (ClpReqSetPci *)reqh;
> +        ClpRspSetPci *ressetpci = (ClpRspSetPci *)resh;
> +
> +        pbdev = s390_pci_find_dev_by_fh(be32_to_cpu(reqsetpci->fh));
> +        if (!pbdev) {
> +                ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FH);
> +                goto out;
> +        }
> +
> +        switch (reqsetpci->oc) {
> +        case CLP_SET_ENABLE_PCI_FN:
> +            pbdev->fh = pbdev->fh | 1 << ENABLE_BIT_OFFSET;
> +            ressetpci->fh = cpu_to_be32(pbdev->fh);
> +            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_OK);
> +            break;
> +        case CLP_SET_DISABLE_PCI_FN:
> +            pbdev->fh = pbdev->fh & ~(1 << ENABLE_BIT_OFFSET);
> +            ressetpci->fh = cpu_to_be32(pbdev->fh);
> +            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_OK);
> +            break;
> +        default:
> +            DPRINTF("unknown set pci command\n");
> +            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FHOP);
> +            break;
> +        }
> +        break;
> +    }
> +    case CLP_QUERY_PCI_FN: {
> +        ClpReqQueryPci *reqquery = (ClpReqQueryPci *)reqh;
> +        ClpRspQueryPci *resquery = (ClpRspQueryPci *)resh;
> +
> +        pbdev = s390_pci_find_dev_by_fh(reqquery->fh);
> +        if (!pbdev) {
> +            DPRINTF("query pci no pci dev\n");
> +            resquery->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FH);
> +            goto out;
> +        }
> +
> +        for (i = 0; i < PCI_BAR_COUNT; i++) {
> +            uint64_t data = pci_host_config_read_common(pbdev->pdev,
> +                0x10 + (i * 4), pci_config_size(pbdev->pdev), 4);
> +
> +            resquery->bar[i] = bswap32(data);
> +            resquery->bar_size[i] = barsize(pbdev->pdev->io_regions[i].size);
> +            DPRINTF("bar %d addr 0x%x size 0x%lx barsize 0x%x\n", i,
> +                    resquery->bar[i], pbdev->pdev->io_regions[i].size,
> +                    resquery->bar_size[i]);
> +        }
> +
> +        resquery->sdma = ZPCI_SDMA_ADDR;
> +        resquery->edma = ZPCI_EDMA_ADDR;
> +        resquery->pchid = 0;
> +        resquery->ug = 1;
> +        resquery->uid = pbdev->fid;
> +
> +        resquery->hdr.rsp = CLP_RC_OK;
> +        break;
> +    }
> +    case CLP_QUERY_PCI_FNGRP: {
> +        ClpRspQueryPciGrp *resgrp = (ClpRspQueryPciGrp *)resh;
> +        resgrp->fr = 1;
> +        resgrp->dasm = 0;
> +        resgrp->msia = ZPCI_MSI_ADDR;
> +        resgrp->mui = 0;
> +        resgrp->i = 128;
> +        resgrp->version = 0;
> +
> +        resgrp->hdr.rsp = CLP_RC_OK;
> +        break;
> +    }
> +    default:
> +        DPRINTF("unknown clp command\n");
> +        resh->rsp = cpu_to_be16(CLP_RC_CMD);
> +        break;
> +    }
> +
> +out:
> +    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + res_len, 1);

... ah, to write back. Wouldn't it be cleaner to do this explicitly?

> +    g_free(buffer);
> +    setcc(cpu, cc);
> +    return 0;
> +}
> +
> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +    CPUS390XState *env = &cpu->env;
> +    S390PCIBusDevice *pbdev;
> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> +    PciLgStg *rp;
> +    uint64_t offset;
> +    uint64_t data;
> +    uint8_t len;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> +        return 0;
> +    }
> +
> +    if (r2 & 0x1) {
> +        program_interrupt(env, PGM_SPECIFICATION, 4);
> +        return 0;
> +    }
> +
> +    rp = (PciLgStg *)&env->regs[r2];
> +    offset = env->regs[r2 + 1];
> +
> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> +    if (!pbdev) {
> +        DPRINTF("pcilg no pci dev\n");
> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> +        return 0;
> +    }
> +
> +    len = rp->len & 0xF;
> +    if (rp->pcias < 6) {
> +        if ((8 - (offset & 0x7)) < len) {
> +            program_interrupt(env, PGM_OPERAND, 4);
> +            return 0;
> +        }
> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> +        io_mem_read(mr, offset, &data, len);
> +    } else if (rp->pcias == 15) {
> +        if ((4 - (offset & 0x3)) < len) {
> +            program_interrupt(env, PGM_OPERAND, 4);
> +            return 0;
> +        }
> +        data =  pci_host_config_read_common(
> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
> +
> +        switch (len) {
> +        case 1:
> +            break;
> +        case 2:
> +            data = cpu_to_le16(data);
> +            break;
> +        case 4:
> +            data = cpu_to_le32(data);
> +            break;
> +        case 8:
> +            data = cpu_to_le64(data);
> +            break;

Why? Also, this is wrong. cpu_to_le64() converts between host endianness
and LE. So if you're running this on an LE host, the value won't be
swapped and you get a broken result.

If you know that the value is always swapped, use bswapxx().
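
To make the difference concrete, here is a small standalone sketch.
my_bswap32() and my_cpu_to_le32() are stand-ins written for this example,
not QEMU's real helpers, and the host check is done at run time purely
for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Unconditional byte swap: always reverses the four bytes. */
static uint32_t my_bswap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Host-to-LE conversion: a no-op on LE hosts, a swap on BE hosts. */
static uint32_t my_cpu_to_le32(uint32_t v)
{
    return host_is_little_endian() ? v : my_bswap32(v);
}
```

On an LE host my_cpu_to_le32() hands the value back untouched, which is
exactly the case where the code above would stop swapping.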

> +        default:
> +            program_interrupt(env, PGM_OPERAND, 4);
> +            return 0;
> +        }
> +    } else {
> +        DPRINTF("invalid space\n");
> +        setcc(cpu, ZPCI_PCI_LS_ERR);
> +        s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
> +        return 0;
> +    }
> +
> +    env->regs[r1] = data;
> +    setcc(cpu, ZPCI_PCI_LS_OK);
> +    return 0;
> +}
> +
> +static void update_msix_table_msg_data(S390PCIBusDevice *pbdev, uint64_t offset,
> +                                       uint64_t *data, uint8_t len)
> +{
> +    uint32_t msg_data;
> +
> +    if (offset % PCI_MSIX_ENTRY_SIZE != 8) {
> +        return;
> +    }
> +
> +    if (len != 4) {
> +        DPRINTF("access msix table msg data but len is %d\n", len);
> +        return;
> +    }
> +
> +    msg_data = (pbdev->fid << ZPCI_MSI_VEC_BITS) | le32_to_cpu(*data);
> +    *data = cpu_to_le32(msg_data);
> +    DPRINTF("update msix msg_data to 0x%x\n", msg_data);
> +}
> +
> +static int trap_msix(S390PCIBusDevice *pbdev, uint64_t offset, uint8_t pcias)
> +{
> +    if (pbdev->msix.available && pbdev->msix.table_bar == pcias &&
> +        offset >= pbdev->msix.table_offset &&
> +        offset <= pbdev->msix.table_offset +
> +                  (pbdev->msix.entries - 1) * PCI_MSIX_ENTRY_SIZE) {
> +        return 1;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +int kvm_pcistg_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +    CPUS390XState *env = &cpu->env;
> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> +    PciLgStg *rp;
> +    uint64_t offset, data;
> +    S390PCIBusDevice *pbdev;
> +    uint8_t len;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> +        return 0;
> +    }
> +
> +    if (r2 & 0x1) {
> +        program_interrupt(env, PGM_SPECIFICATION, 4);
> +        return 0;
> +    }
> +
> +    rp = (PciLgStg *)&env->regs[r2];
> +    offset = env->regs[r2 + 1];
> +
> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> +    if (!pbdev) {
> +        DPRINTF("pcistg no pci dev\n");
> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> +        return 0;
> +    }
> +
> +    data = env->regs[r1];
> +    len = rp->len & 0xF;
> +    if (rp->pcias < 6) {
> +        if ((8 - (offset & 0x7)) < len) {
> +            program_interrupt(env, PGM_OPERAND, 4);
> +            return 0;
> +        }
> +        MemoryRegion *mr;
> +        if (trap_msix(pbdev, offset, rp->pcias)) {
> +            offset = offset - pbdev->msix.table_offset;
> +            mr = &pbdev->pdev->msix_table_mmio;
> +            update_msix_table_msg_data(pbdev, offset, &data, len);
> +        } else {
> +            mr = pbdev->pdev->io_regions[rp->pcias].memory;
> +        }
> +
> +        io_mem_write(mr, offset, data, len);
> +    } else if (rp->pcias == 15) {
> +        if ((4 - (offset & 0x3)) < len) {
> +            program_interrupt(env, PGM_OPERAND, 4);
> +            return 0;
> +        }
> +        switch (len) {
> +        case 1:
> +            break;
> +        case 2:
> +            data = le16_to_cpu(data);
> +            break;
> +        case 4:
> +            data = le32_to_cpu(data);
> +            break;
> +        case 8:
> +            data = le64_to_cpu(data);
> +            break;
> +        default:
> +            program_interrupt(env, PGM_OPERAND, 4);
> +            return 0;
> +        }

I guess you want a generic function similar to qemu_bswap_len() that
supports 64-bit?
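
Roughly along these lines, as a sketch only: swap_by_len() is a made-up
name, it swaps unconditionally, and it does not mirror the signature of
any existing QEMU helper:

```c
#include <stdint.h>

/*
 * Byte-swap the low `len` bytes of `value` (len = 1, 2, 4 or 8); bytes
 * above `len` are discarded. Any other length is returned unchanged;
 * a real caller would reject it before getting here.
 */
static uint64_t swap_by_len(uint64_t value, unsigned len)
{
    uint64_t out = 0;
    unsigned i;

    if (len != 1 && len != 2 && len != 4 && len != 8) {
        return value;
    }
    for (i = 0; i < len; i++) {
        out = (out << 8) | (value & 0xff); /* lowest byte becomes highest */
        value >>= 8;
    }
    return out;
}
```

With something like this the per-length switch in both the load and the
store path would collapse into one call after the length check.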

> +
> +        pci_host_config_write_common(pbdev->pdev, offset,
> +                                     pci_config_size(pbdev->pdev),
> +                                     data, len);
> +    } else {
> +        DPRINTF("pcistg invalid space\n");
> +        setcc(cpu, ZPCI_PCI_LS_ERR);
> +        s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
> +        return 0;
> +    }
> +
> +    setcc(cpu, ZPCI_PCI_LS_OK);
> +    return 0;
> +}
> +
> +int kvm_rpcit_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +    CPUS390XState *env = &cpu->env;
> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> +    uint32_t fh;
> +    uint64_t pte;
> +    S390PCIBusDevice *pbdev;
> +    ram_addr_t size;
> +    int flags;
> +    IOMMUTLBEntry entry;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> +        return 0;
> +    }
> +
> +    if (r2 & 0x1) {
> +        program_interrupt(env, PGM_SPECIFICATION, 4);
> +        return 0;
> +    }
> +
> +    fh = env->regs[r1] >> 32;
> +    size = env->regs[r2 + 1];
> +
> +    pbdev = s390_pci_find_dev_by_fh(fh);
> +
> +    if (!pbdev) {
> +        DPRINTF("rpcit no pci dev\n");
> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> +        return 0;
> +    }
> +
> +    pte = s390_guest_io_table_walk(s390_pci_get_table_origin(pbdev->g_iota),
> +                                   env->regs[r2]);
> +    flags = pte & ZPCI_PTE_FLAG_MASK;
> +    entry.target_as = &address_space_memory;
> +    entry.iova = env->regs[r2];
> +    entry.translated_addr = pte & ZPCI_PTE_ADDR_MASK;
> +    entry.addr_mask = size - 1;
> +
> +    if (flags & ZPCI_PTE_INVALID) {
> +        entry.perm = IOMMU_NONE;
> +    } else {
> +        entry.perm = IOMMU_RW;
> +    }

Deja vu? This is the iommu translation function, no? Can't you somehow
just call it?

> +
> +    memory_region_notify_iommu(pci_device_iommu_address_space(
> +                               pbdev->pdev)->root, entry);
> +
> +    setcc(cpu, ZPCI_PCI_LS_OK);
> +    return 0;
> +}
> +
> +int kvm_sic_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +    qemu_log_mask(LOG_UNIMP, "SIC missing\n");
> +    return 0;
> +}
> +
> +int kvm_pcistb_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +    CPUS390XState *env = &cpu->env;
> +    uint8_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
> +    uint8_t r3 = run->s390_sieic.ipa & 0x000f;
> +    PciStb *rp;
> +    uint64_t gaddr;
> +    uint64_t *uaddr, *pu;
> +    hwaddr len;
> +    S390PCIBusDevice *pbdev;
> +    MemoryRegion *mr;
> +    int i;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> +        program_interrupt(env, PGM_PRIVILEGED, 6);
> +        return 0;
> +    }
> +
> +    rp = (PciStb *)&env->regs[r1];
> +    if (rp->pcias > 5) {
> +        DPRINTF("pcistb invalid space\n");
> +        setcc(cpu, ZPCI_PCI_LS_ERR);
> +        s390_set_status_code(env, r1, ZPCI_PCI_ST_INVAL_AS);
> +        return 0;
> +    }
> +
> +    switch (rp->len) {
> +    case 16:
> +    case 32:
> +    case 64:
> +    case 128:
> +        break;
> +    default:
> +        program_interrupt(env, PGM_SPECIFICATION, 6);
> +        return 0;
> +    }
> +
> +    gaddr = get_base_disp_rsy(cpu, run);
> +    len = rp->len;
> +
> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> +    if (!pbdev) {
> +        DPRINTF("pcistb no pci dev fh 0x%x\n", rp->fh);
> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> +        return 0;
> +    }
> +
> +    uaddr = cpu_physical_memory_map(gaddr, &len, 0);
> +    mr = pbdev->pdev->io_regions[rp->pcias].memory;
> +    if (!memory_region_access_valid(mr, env->regs[r3], rp->len, true)) {
> +        cpu_physical_memory_unmap(uaddr, len, 0, len);
> +        program_interrupt(env, PGM_ADDRESSING, 6);
> +        return 0;
> +    }
> +
> +    pu = uaddr;
> +    for (i = 0; i < rp->len / 8; i++) {
> +        io_mem_write(mr, env->regs[r3] + i * 8, *pu, 8);

Please don't overoptimize and just use individual ldq_phys() operations
here for each memory access. In general, try to avoid
cpu_physical_memory_map().
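
The shape I have in mind, modelled on a plain byte array: toy_ldq_be()
and toy_store_block() are invented for this sketch and only stand in for
a per-access guest load (assumed big-endian here, as on s390):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a per-access guest load: 8 bytes, big-endian. */
static uint64_t toy_ldq_be(const uint8_t *ram, size_t addr)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < 8; i++) {
        v = (v << 8) | ram[addr + i];
    }
    return v;
}

/* Fetch `len` bytes (a multiple of 8) one quadword at a time. */
static void toy_store_block(const uint8_t *ram, size_t gaddr,
                            uint64_t *dst, size_t len)
{
    size_t i;

    for (i = 0; i < len / 8; i++) {
        dst[i] = toy_ldq_be(ram, gaddr + i * 8); /* no map/unmap pair */
    }
}
```

Each iteration is one self-contained access, so there is no mapping to
tear down on the error paths.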

> +        pu++;
> +    }
> +
> +    cpu_physical_memory_unmap(uaddr, len, 0, len);
> +    setcc(cpu, ZPCI_PCI_LS_OK);
> +    return 0;
> +}
> +
> +static int reg_irqs(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib)
> +{
> +    int ret;
> +    S390FLICState *fs = s390_get_flic();
> +    S390FLICStateClass *fsc = S390_FLIC_COMMON_GET_CLASS(fs);
> +
> +    ret = css_register_io_adapter(S390_PCIPT_ADAPTER,
> +                                  FIB_DATA_ISC(fib.data), true, false,
> +                                  &pbdev->routes.adapter.adapter_id);
> +    assert(ret == 0);
> +
> +    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id, fib.aisb, true);
> +    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id, fib.aibv, true);
> +
> +    pbdev->routes.adapter.summary_addr = fib.aisb;
> +    pbdev->routes.adapter.summary_offset = FIB_DATA_AISBO(fib.data);
> +    pbdev->routes.adapter.ind_addr = fib.aibv;
> +    pbdev->routes.adapter.ind_offset = FIB_DATA_AIBVO(fib.data);
> +
> +    DPRINTF("reg_irqs adapter id %d\n", pbdev->routes.adapter.adapter_id);
> +    return 0;
> +}
> +
> +static int dereg_irqs(S390PCIBusDevice *pbdev)
> +{
> +    S390FLICState *fs = s390_get_flic();
> +    S390FLICStateClass *fsc = S390_FLIC_COMMON_GET_CLASS(fs);
> +
> +    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id,
> +                        pbdev->routes.adapter.ind_addr, false);
> +
> +    pbdev->routes.adapter.summary_addr = 0;
> +    pbdev->routes.adapter.summary_offset = 0;
> +    pbdev->routes.adapter.ind_addr = 0;
> +    pbdev->routes.adapter.ind_offset = 0;
> +
> +    DPRINTF("dereg_irqs adapter id %d\n", pbdev->routes.adapter.adapter_id);
> +    return 0;
> +}
> +
> +int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +    CPUS390XState *env = &cpu->env;
> +    uint8_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
> +    uint8_t oc;
> +    uint32_t fh;
> +    uint64_t fiba;
> +    ZpciFib fib;
> +    S390PCIBusDevice *pbdev;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> +        program_interrupt(env, PGM_PRIVILEGED, 6);
> +        return 0;
> +    }
> +
> +    oc = env->regs[r1] & 0xff;
> +    fh = env->regs[r1] >> 32;
> +    fiba = get_base_disp_rxy(cpu, run);
> +
> +    if (fiba & 0x7) {
> +        program_interrupt(env, PGM_SPECIFICATION, 6);
> +        return 0;
> +    }
> +
> +    pbdev = s390_pci_find_dev_by_fh(fh);
> +    if (!pbdev) {
> +        DPRINTF("mpcifc no pci dev fh 0x%x\n", fh);
> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> +        return 0;
> +    }
> +
> +    cpu_physical_memory_rw(fiba, (uint8_t *)&fib, sizeof(fib), 0);

I also find cpu_physical_memory_rw() pretty hard to read. Meanwhile,
it's been deprecated in favor of cpu_physical_memory_read() and
cpu_physical_memory_write(), which make the code more readable.
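
The readability gain comes from dropping the trailing is_write flag. A
toy model, assuming a flat byte array as "guest memory"; toy_mem_rw(),
toy_mem_read() and toy_mem_write() are invented for this example and
only mimic the shape of the QEMU functions:

```c
#include <stdint.h>
#include <string.h>

#define TOY_RAM_SIZE 256
static uint8_t toy_ram[TOY_RAM_SIZE]; /* stand-in for guest memory */

/* Combined accessor: the direction hides in the last argument. */
static void toy_mem_rw(uint64_t addr, uint8_t *buf, size_t len, int is_write)
{
    if (addr > TOY_RAM_SIZE || len > TOY_RAM_SIZE - addr) {
        return; /* out of range: ignored in this toy model */
    }
    if (is_write) {
        memcpy(toy_ram + addr, buf, len);
    } else {
        memcpy(buf, toy_ram + addr, len);
    }
}

/* Direction is visible in the name at every call site. */
static void toy_mem_read(uint64_t addr, void *buf, size_t len)
{
    toy_mem_rw(addr, buf, len, 0);
}

static void toy_mem_write(uint64_t addr, const void *buf, size_t len)
{
    toy_mem_rw(addr, (uint8_t *)buf, len, 1);
}
```

A call such as toy_mem_read(fiba, &fib, sizeof(fib)) then says what it
does without the reader having to decode a 0 or 1 at the end.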

> +
> +    switch (oc) {
> +    case ZPCI_MOD_FC_REG_INT: {
> +        pbdev->isc = FIB_DATA_ISC(fib.data);
> +        reg_irqs(env, pbdev, fib);
> +        break;
> +    }
> +    case ZPCI_MOD_FC_DEREG_INT:
> +        dereg_irqs(pbdev);
> +        break;
> +    case ZPCI_MOD_FC_REG_IOAT:
> +        if (fib.pba > fib.pal) {
> +            program_interrupt(&cpu->env, PGM_OPERAND, 6);
> +            return 0;
> +        }
> +        pbdev->g_iota = fib.iota;
> +        break;
> +    case ZPCI_MOD_FC_DEREG_IOAT:
> +        break;
> +    case ZPCI_MOD_FC_REREG_IOAT:
> +        break;
> +    case ZPCI_MOD_FC_RESET_ERROR:
> +        break;
> +    case ZPCI_MOD_FC_RESET_BLOCK:
> +        break;
> +    case ZPCI_MOD_FC_SET_MEASURE:
> +        break;
> +    default:
> +        program_interrupt(&cpu->env, PGM_OPERAND, 6);
> +        return 0;
> +    }
> +
> +    setcc(cpu, ZPCI_PCI_LS_OK);
> +    return 0;
> +}
> +
> +int kvm_stpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +    qemu_log_mask(LOG_UNIMP, "STPCIFC missing\n");
> +    return 0;
> +}
> diff --git a/target-s390x/pci_ic.h b/target-s390x/pci_ic.h
> new file mode 100644
> index 0000000..0eb6c27
> --- /dev/null
> +++ b/target-s390x/pci_ic.h
> @@ -0,0 +1,335 @@
> +/*
> + * s390 PCI intercept definitions
> + *
> + * Copyright 2014 IBM Corp.
> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> + *            Hong Bo Li <lihbbj@cn.ibm.com>
> + *            Yi Min Zhao <zyimin@cn.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#ifndef PCI_IC_S390X_H
> +#define PCI_IC_S390X_H
> +
> +#include <sysemu/dma.h>
> +
> +/* CLP common request & response block size */
> +#define CLP_BLK_SIZE 4096
> +#define PCI_BAR_COUNT 6
> +#define PCI_MAX_FUNCTIONS 4096
> +
> +typedef struct ClpReqHdr {
> +    __uint16_t len;
> +    __uint16_t cmd;
> +} QEMU_PACKED ClpReqHdr;
> +
> +typedef struct ClpRspHdr {
> +    __uint16_t len;
> +    __uint16_t rsp;
> +} QEMU_PACKED ClpRspHdr;
> +
> +/* CLP Response Codes */
> +#define CLP_RC_OK         0x0010  /* Command request successfully */
> +#define CLP_RC_CMD        0x0020  /* Command code not recognized */
> +#define CLP_RC_PERM       0x0030  /* Command not authorized */
> +#define CLP_RC_FMT        0x0040  /* Invalid command request format */
> +#define CLP_RC_LEN        0x0050  /* Invalid command request length */
> +#define CLP_RC_8K         0x0060  /* Command requires 8K LPCB */
> +#define CLP_RC_RESNOT0    0x0070  /* Reserved field not zero */
> +#define CLP_RC_NODATA     0x0080  /* No data available */
> +#define CLP_RC_FC_UNKNOWN 0x0100  /* Function code not recognized */
> +
> +/*
> + * Call Logical Processor - Command Codes
> + */
> +#define CLP_LIST_PCI            0x0002
> +#define CLP_QUERY_PCI_FN        0x0003
> +#define CLP_QUERY_PCI_FNGRP     0x0004
> +#define CLP_SET_PCI_FN          0x0005
> +
> +/* PCI function handle list entry */
> +typedef struct ClpFhListEntry {
> +    __uint16_t device_id;
> +    __uint16_t vendor_id;
> +#define CLP_FHLIST_MASK_CONFIG 0x80000000
> +    __uint32_t config;
> +    __uint32_t fid;
> +    __uint32_t fh;
> +} QEMU_PACKED ClpFhListEntry;
> +
> +#define CLP_RC_SETPCIFN_FH      0x0101 /* Invalid PCI fn handle */
> +#define CLP_RC_SETPCIFN_FHOP    0x0102 /* Fn handle not valid for op */
> +#define CLP_RC_SETPCIFN_DMAAS   0x0103 /* Invalid DMA addr space */
> +#define CLP_RC_SETPCIFN_RES     0x0104 /* Insufficient resources */
> +#define CLP_RC_SETPCIFN_ALRDY   0x0105 /* Fn already in requested state */
> +#define CLP_RC_SETPCIFN_ERR     0x0106 /* Fn in permanent error state */
> +#define CLP_RC_SETPCIFN_RECPND  0x0107 /* Error recovery pending */
> +#define CLP_RC_SETPCIFN_BUSY    0x0108 /* Fn busy */
> +#define CLP_RC_LISTPCI_BADRT    0x010a /* Resume token not recognized */
> +#define CLP_RC_QUERYPCIFG_PFGID 0x010b /* Unrecognized PFGID */
> +
> +/* request or response block header length */
> +#define LIST_PCI_HDR_LEN 32
> +
> +/* Number of function handles fitting in response block */
> +#define CLP_FH_LIST_NR_ENTRIES \
> +    ((CLP_BLK_SIZE - 2 * LIST_PCI_HDR_LEN) \
> +        / sizeof(ClpFhListEntry))
> +
> +#define CLP_SET_ENABLE_PCI_FN  0 /* Yes, 0 enables it */
> +#define CLP_SET_DISABLE_PCI_FN 1 /* Yes, 1 disables it */
> +
> +#define CLP_UTIL_STR_LEN 64
> +
> +#define CLP_MASK_FMT 0xf0000000
> +
> +/* List PCI functions request */
> +typedef struct ClpReqListPci {
> +    ClpReqHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +    __uint64_t resume_token;
> +    __uint64_t reserved2;
> +} QEMU_PACKED ClpReqListPci;
> +
> +/* List PCI functions response */
> +typedef struct ClpRspListPci {
> +    ClpRspHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +    __uint64_t resume_token;
> +    __uint32_t mdd;
> +    __uint16_t max_fn;
> +    __uint8_t reserved2;
> +    __uint8_t entry_size;
> +    ClpFhListEntry fh_list[CLP_FH_LIST_NR_ENTRIES];
> +} QEMU_PACKED ClpRspListPci;
> +
> +/* Query PCI function request */
> +typedef struct ClpReqQueryPci {
> +    ClpReqHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +    __uint32_t fh; /* function handle */
> +    __uint32_t reserved2;
> +    __uint64_t reserved3;
> +} QEMU_PACKED ClpReqQueryPci;
> +
> +/* Query PCI function response */
> +typedef struct ClpRspQueryPci {
> +    ClpRspHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +    __uint16_t vfn; /* virtual fn number */
> +#define CLP_RSP_QPCI_MASK_UTIL  0x100
> +#define CLP_RSP_QPCI_MASK_PFGID 0xff
> +    __uint16_t ug;
> +    __uint32_t fid; /* pci function id */
> +    __uint8_t bar_size[PCI_BAR_COUNT];
> +    __uint16_t pchid;
> +    __uint32_t bar[PCI_BAR_COUNT];
> +    __uint64_t reserved2;
> +    __uint64_t sdma; /* start dma as */
> +    __uint64_t edma; /* end dma as */
> +    __uint32_t reserved3[11];
> +    __uint32_t uid;
> +    __uint8_t util_str[CLP_UTIL_STR_LEN]; /* utility string */
> +} QEMU_PACKED ClpRspQueryPci;
> +
> +/* Query PCI function group request */
> +typedef struct ClpReqQueryPciGrp {
> +    ClpReqHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +#define CLP_REQ_QPCIG_MASK_PFGID 0xff
> +    __uint32_t g;
> +    __uint32_t reserved2;
> +    __uint64_t reserved3;
> +} QEMU_PACKED ClpReqQueryPciGrp;
> +
> +/* Query PCI function group response */
> +typedef struct ClpRspQueryPciGrp {
> +    ClpRspHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +#define CLP_RSP_QPCIG_MASK_NOI 0xfff
> +    __uint16_t i;
> +    __uint8_t version;
> +#define CLP_RSP_QPCIG_MASK_FRAME   0x2
> +#define CLP_RSP_QPCIG_MASK_REFRESH 0x1
> +    __uint8_t fr;
> +    __uint16_t reserved2;
> +    __uint16_t mui;
> +    __uint64_t reserved3;
> +    __uint64_t dasm; /* dma address space mask */
> +    __uint64_t msia; /* MSI address */
> +    __uint64_t reserved4;
> +    __uint64_t reserved5;
> +} QEMU_PACKED ClpRspQueryPciGrp;
> +
> +/* Set PCI function request */
> +typedef struct ClpReqSetPci {
> +    ClpReqHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +    __uint32_t fh; /* function handle */
> +    __uint16_t reserved2;
> +    __uint8_t oc; /* operation controls */
> +    __uint8_t ndas; /* number of dma spaces */
> +    __uint64_t reserved3;
> +} QEMU_PACKED ClpReqSetPci;
> +
> +/* Set PCI function response */
> +typedef struct ClpRspSetPci {
> +    ClpRspHdr hdr;
> +    __uint32_t fmt;
> +    __uint64_t reserved1;
> +    __uint32_t fh; /* function handle */
> +    __uint32_t reserved3;
> +    __uint64_t reserved4;
> +} QEMU_PACKED ClpRspSetPci;
> +
> +typedef struct ClpReqRspListPci {
> +    ClpReqListPci request;
> +    ClpRspListPci response;
> +} QEMU_PACKED ClpReqRspListPci;
> +
> +typedef struct ClpReqRspSetPci {
> +    ClpReqSetPci request;
> +    ClpRspSetPci response;
> +} QEMU_PACKED ClpReqRspSetPci;
> +
> +typedef struct ClpReqRspQueryPci {
> +    ClpReqQueryPci request;
> +    ClpRspQueryPci response;
> +} QEMU_PACKED ClpReqRspQueryPci;
> +
> +typedef struct ClpReqRspQueryPciGrp {
> +    ClpReqQueryPciGrp request;
> +    ClpRspQueryPciGrp response;
> +} QEMU_PACKED ClpReqRspQueryPciGrp;
> +
> +typedef struct PciLgStg {
> +    uint32_t fh;
> +    uint8_t status;
> +    uint8_t pcias;
> +    uint8_t reserved;
> +    uint8_t len;
> +} QEMU_PACKED PciLgStg;
> +
> +typedef struct PciStb {
> +    uint32_t fh;
> +    uint8_t status;
> +    uint8_t pcias;
> +    uint8_t reserved;
> +    uint8_t len;
> +} QEMU_PACKED PciStb;
> +
> +/* Load/Store status codes */
> +#define ZPCI_PCI_ST_FUNC_NOT_ENABLED        4
> +#define ZPCI_PCI_ST_FUNC_IN_ERR             8
> +#define ZPCI_PCI_ST_BLOCKED                 12
> +#define ZPCI_PCI_ST_INSUF_RES               16
> +#define ZPCI_PCI_ST_INVAL_AS                20
> +#define ZPCI_PCI_ST_FUNC_ALREADY_ENABLED    24
> +#define ZPCI_PCI_ST_DMA_AS_NOT_ENABLED      28
> +#define ZPCI_PCI_ST_2ND_OP_IN_INV_AS        36
> +#define ZPCI_PCI_ST_FUNC_NOT_AVAIL          40
> +#define ZPCI_PCI_ST_ALREADY_IN_RQ_STATE     44
> +
> +/* Load/Store return codes */
> +#define ZPCI_PCI_LS_OK              0
> +#define ZPCI_PCI_LS_ERR             1
> +#define ZPCI_PCI_LS_BUSY            2
> +#define ZPCI_PCI_LS_INVAL_HANDLE    3
> +
> +/* Modify PCI Function Controls */
> +#define ZPCI_MOD_FC_REG_INT     2
> +#define ZPCI_MOD_FC_DEREG_INT   3
> +#define ZPCI_MOD_FC_REG_IOAT    4
> +#define ZPCI_MOD_FC_DEREG_IOAT  5
> +#define ZPCI_MOD_FC_REREG_IOAT  6
> +#define ZPCI_MOD_FC_RESET_ERROR 7
> +#define ZPCI_MOD_FC_RESET_BLOCK 9
> +#define ZPCI_MOD_FC_SET_MEASURE 10
> +
> +/* FIB function controls */
> +#define ZPCI_FIB_FC_ENABLED     0x80
> +#define ZPCI_FIB_FC_ERROR       0x40
> +#define ZPCI_FIB_FC_LS_BLOCKED  0x20
> +#define ZPCI_FIB_FC_DMAAS_REG   0x10
> +
> +/* FIB function controls */
> +#define ZPCI_FIB_FC_ENABLED     0x80
> +#define ZPCI_FIB_FC_ERROR       0x40
> +#define ZPCI_FIB_FC_LS_BLOCKED  0x20
> +#define ZPCI_FIB_FC_DMAAS_REG   0x10
> +
> +/* Function Information Block */
> +typedef struct ZpciFib {
> +    __uint8_t fmt;   /* format */
> +    __uint8_t reserved1[7];
> +    __uint8_t fc;                  /* function controls */
> +    __uint8_t reserved2;
> +    __uint16_t reserved3;
> +    __uint32_t reserved4;
> +    __uint64_t pba;                /* PCI base address */
> +    __uint64_t pal;                /* PCI address limit */
> +    __uint64_t iota;               /* I/O Translation Anchor */
> +#define FIB_DATA_ISC(x)    (((x) >> 28) & 0x7)
> +#define FIB_DATA_NOI(x)    (((x) >> 16) & 0xfff)
> +#define FIB_DATA_AIBVO(x) (((x) >> 8) & 0x3f)
> +#define FIB_DATA_SUM(x)    (((x) >> 7) & 0x1)
> +#define FIB_DATA_AISBO(x)  ((x) & 0x3f)
> +    __uint32_t data;
> +    __uint32_t reserved5;
> +    __uint64_t aibv;               /* Adapter int bit vector address */
> +    __uint64_t aisb;               /* Adapter int summary bit address */
> +    __uint64_t fmb_addr;           /* Function measurement address and key */
> +    __uint32_t reserved6;
> +    __uint32_t gd;
> +} QEMU_PACKED ZpciFib;
> +
> +static inline uint64_t get_base_disp_rxy(S390CPU *cpu, struct kvm_run *run)
> +{
> +    CPUS390XState *env = &cpu->env;
> +    uint32_t x2 = (run->s390_sieic.ipa & 0x000f);
> +    uint32_t base2 = run->s390_sieic.ipb >> 28;
> +    uint32_t disp2 = ((run->s390_sieic.ipb & 0x0fff0000) >> 16) +
> +                     ((run->s390_sieic.ipb & 0xff00) << 4);
> +
> +    if (disp2 & 0x80000) {
> +        disp2 += 0xfff00000;
> +    }
> +
> +    return (base2 ? env->regs[base2] : 0) +
> +           (x2 ? env->regs[x2] : 0) + (long)(int)disp2;
> +}
> +
> +static inline uint64_t get_base_disp_rsy(S390CPU *cpu, struct kvm_run *run)
> +{
> +    CPUS390XState *env = &cpu->env;
> +    uint32_t base2 = run->s390_sieic.ipb >> 28;
> +    uint32_t disp2 = ((run->s390_sieic.ipb & 0x0fff0000) >> 16) +
> +                     ((run->s390_sieic.ipb & 0xff00) << 4);
> +
> +    if (disp2 & 0x80000) {
> +        disp2 += 0xfff00000;
> +    }
> +
> +    return (base2 ? env->regs[base2] : 0) + (long)(int)disp2;
> +}

Same comment as in the previous patch here, please try to avoid putting
code into a header file.
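
Both helpers share the same 20-bit signed displacement decode, which
could live once in a .c file. A sketch with the ipb value passed in
directly; decode_disp20() is a hypothetical name, and the masks are the
ones used in the patch:

```c
#include <stdint.h>

/*
 * Decode the 20-bit signed displacement from ipb: the low 12 bits come
 * from mask 0x0fff0000, the high 8 bits from mask 0x0000ff00.
 */
static int64_t decode_disp20(uint32_t ipb)
{
    uint32_t disp = ((ipb & 0x0fff0000u) >> 16) | ((ipb & 0x0000ff00u) << 4);

    if (disp & 0x80000u) {
        disp |= 0xfff00000u; /* sign-extend bit 19 to 32 bits */
    }
    return (int64_t)(int32_t)disp;
}
```

The rxy and rsy variants would then only differ in whether they add the
index register on top of base plus displacement.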

> +
> +int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run);
> +int kvm_rpcit_service_call(S390CPU *cpu, struct kvm_run *run);
> +int kvm_sic_service_call(S390CPU *cpu, struct kvm_run *run);
> +int kvm_pcistb_service_call(S390CPU *cpu, struct kvm_run *run);
> +int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run);
> +int kvm_pcistg_service_call(S390CPU *cpu, struct kvm_run *run);
> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run);
> +int kvm_stpcifc_service_call(S390CPU *cpu, struct kvm_run *run);

Hrm. Maybe we could add some registration hook similar to spapr's hcall
or rtas callback registration that would allow us to encapsulate this a
bit better?

Then you'd only have to spawn a PHB device which could register for
these service calls.
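
As a rough illustration of such a hook, here is a standalone dispatch
table. All names (InsnHook, register_b9_hook(), dispatch_b9(),
demo_pcistg()) are hypothetical and this is not the spapr API; only the
0xd0 opcode byte for PCISTG is taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Handler signature; `opaque` would carry e.g. the PHB device state. */
typedef int (*insn_handler_fn)(void *opaque, uint64_t arg);

typedef struct InsnHook {
    insn_handler_fn fn;
    void *opaque;
} InsnHook;

/* One slot per low opcode byte of the B9 instruction group. */
static InsnHook b9_hooks[256];

/* Register a handler; returns -1 if the slot is already taken. */
static int register_b9_hook(uint8_t op, insn_handler_fn fn, void *opaque)
{
    if (b9_hooks[op].fn) {
        return -1;
    }
    b9_hooks[op].fn = fn;
    b9_hooks[op].opaque = opaque;
    return 0;
}

/* Dispatch; -1 means "unhandled", so the caller can inject an exception. */
static int dispatch_b9(uint8_t op, uint64_t arg)
{
    if (!b9_hooks[op].fn) {
        return -1;
    }
    return b9_hooks[op].fn(b9_hooks[op].opaque, arg);
}

/* Example handler a PHB device might register (hypothetical). */
static int demo_pcistg(void *opaque, uint64_t arg)
{
    *(uint64_t *)opaque = arg; /* record the argument for inspection */
    return 0;
}
```

handle_b9() would then call the dispatcher in its default case instead
of naming every PCI handler, and the handlers themselves could move out
of target-s390x.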


Alex


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-10 15:56   ` Alexander Graf
@ 2014-11-11 12:10     ` Frank Blaschka
  2014-11-11 12:16       ` Alexander Graf
  2014-11-11 12:17       ` Peter Maydell
  0 siblings, 2 replies; 27+ messages in thread
From: Frank Blaschka @ 2014-11-11 12:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth

On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> 
> 
> On 10.11.14 15:20, Frank Blaschka wrote:
> > From: Frank Blaschka <frank.blaschka@de.ibm.com>
> > 
> > This patch implements the s390 pci instructions in qemu. It allows
> > to access and drive pci devices attached to the s390 pci bus.
> > Because of platform constrains devices using IO BARs are not
> > supported. Also a device has to support MSI/MSI-X to run on s390.
> > 
> > Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> > ---
> >  target-s390x/Makefile.objs |   2 +-
> >  target-s390x/kvm.c         |  52 ++++
> >  target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
> >  target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
> >  4 files changed, 1141 insertions(+), 1 deletion(-)
> >  create mode 100644 target-s390x/pci_ic.c
> >  create mode 100644 target-s390x/pci_ic.h
> > 
> > diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
> > index 2c57494..cc71400 100644
> > --- a/target-s390x/Makefile.objs
> > +++ b/target-s390x/Makefile.objs
> > @@ -2,4 +2,4 @@ obj-y += translate.o helper.o cpu.o interrupt.o
> >  obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
> >  obj-y += gdbstub.o
> >  obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o
> > -obj-$(CONFIG_KVM) += kvm.o
> > +obj-$(CONFIG_KVM) += kvm.o pci_ic.o
> > diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> > index 5b10a25..d59e740 100644
> > --- a/target-s390x/kvm.c
> > +++ b/target-s390x/kvm.c
> > @@ -40,6 +40,7 @@
> >  #include "exec/gdbstub.h"
> >  #include "trace.h"
> >  #include "qapi-event.h"
> > +#include "pci_ic.h"
> >  
> >  /* #define DEBUG_KVM */
> >  
> > @@ -56,6 +57,7 @@
> >  #define IPA0_B2                         0xb200
> >  #define IPA0_B9                         0xb900
> >  #define IPA0_EB                         0xeb00
> > +#define IPA0_E3                         0xe300
> >  
> >  #define PRIV_B2_SCLP_CALL               0x20
> >  #define PRIV_B2_CSCH                    0x30
> > @@ -76,8 +78,17 @@
> >  #define PRIV_B2_XSCH                    0x76
> >  
> >  #define PRIV_EB_SQBS                    0x8a
> > +#define PRIV_EB_PCISTB                  0xd0
> > +#define PRIV_EB_SIC                     0xd1
> >  
> >  #define PRIV_B9_EQBS                    0x9c
> > +#define PRIV_B9_CLP                     0xa0
> > +#define PRIV_B9_PCISTG                  0xd0
> > +#define PRIV_B9_PCILG                   0xd2
> > +#define PRIV_B9_RPCIT                   0xd3
> > +
> > +#define PRIV_E3_MPCIFC                  0xd0
> > +#define PRIV_E3_STPCIFC                 0xd4
> >  
> >  #define DIAG_IPL                        0x308
> >  #define DIAG_KVM_HYPERCALL              0x500
> > @@ -814,6 +825,18 @@ static int handle_b9(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
> >      int r = 0;
> >  
> >      switch (ipa1) {
> > +    case PRIV_B9_CLP:
> > +        r = kvm_clp_service_call(cpu, run);
> > +        break;
> > +    case PRIV_B9_PCISTG:
> > +        r = kvm_pcistg_service_call(cpu, run);
> > +        break;
> > +    case PRIV_B9_PCILG:
> > +        r = kvm_pcilg_service_call(cpu, run);
> > +        break;
> > +    case PRIV_B9_RPCIT:
> > +        r = kvm_rpcit_service_call(cpu, run);
> > +        break;
> >      case PRIV_B9_EQBS:
> >          /* just inject exception */
> >          r = -1;
> > @@ -832,6 +855,12 @@ static int handle_eb(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
> >      int r = 0;
> >  
> >      switch (ipa1) {
> > +    case PRIV_EB_PCISTB:
> > +        r = kvm_pcistb_service_call(cpu, run);
> > +        break;
> > +    case PRIV_EB_SIC:
> > +        r = kvm_sic_service_call(cpu, run);
> > +        break;
> >      case PRIV_EB_SQBS:
> >          /* just inject exception */
> >          r = -1;
> > @@ -845,6 +874,26 @@ static int handle_eb(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
> >      return r;
> >  }
> >  
> > +static int handle_e3(S390CPU *cpu, struct kvm_run *run, uint8_t ipbl)
> > +{
> > +    int r = 0;
> > +
> > +    switch (ipbl) {
> > +    case PRIV_E3_MPCIFC:
> > +        r = kvm_mpcifc_service_call(cpu, run);
> > +        break;
> > +    case PRIV_E3_STPCIFC:
> > +        r = kvm_stpcifc_service_call(cpu, run);
> > +        break;
> > +    default:
> > +        r = -1;
> > +        DPRINTF("KVM: unhandled PRIV: 0xe3%x\n", ipbl);
> > +        break;
> > +    }
> > +
> > +    return r;
> > +}
> > +
> >  static int handle_hypercall(S390CPU *cpu, struct kvm_run *run)
> >  {
> >      CPUS390XState *env = &cpu->env;
> > @@ -1041,6 +1090,9 @@ static int handle_instruction(S390CPU *cpu, struct kvm_run *run)
> >      case IPA0_EB:
> >          r = handle_eb(cpu, run, ipa1);
> >          break;
> > +    case IPA0_E3:
> > +        r = handle_e3(cpu, run, run->s390_sieic.ipb & 0xff);
> > +        break;
> >      case IPA0_DIAG:
> >          r = handle_diag(cpu, run, run->s390_sieic.ipb);
> >          break;
> > diff --git a/target-s390x/pci_ic.c b/target-s390x/pci_ic.c
> > new file mode 100644
> > index 0000000..6c05faf
> > --- /dev/null
> > +++ b/target-s390x/pci_ic.c
> > @@ -0,0 +1,753 @@
> > +/*
> > + * s390 PCI intercepts
> > + *
> > + * Copyright 2014 IBM Corp.
> > + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> > + *            Hong Bo Li <lihbbj@cn.ibm.com>
> > + *            Yi Min Zhao <zyimin@cn.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> > + * your option) any later version. See the COPYING file in the top-level
> > + * directory.
> > + */
> > +
> > +#include <sys/types.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/mman.h>
> > +
> > +#include <linux/kvm.h>
> > +#include <asm/ptrace.h>
> > +#include <hw/pci/pci.h>
> > +#include <hw/pci/pci_host.h>
> > +#include <net/net.h>
> > +
> > +#include "qemu-common.h"
> > +#include "qemu/timer.h"
> > +#include "migration/qemu-file.h"
> > +#include "sysemu/sysemu.h"
> > +#include "sysemu/kvm.h"
> > +#include "cpu.h"
> > +#include "sysemu/device_tree.h"
> > +#include "monitor/monitor.h"
> > +#include "pci_ic.h"
> > +
> > +#include "hw/hw.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/pci/pci_bridge.h"
> > +#include "hw/pci/pci_bus.h"
> > +#include "hw/pci/pci_host.h"
> > +#include "hw/s390x/s390-pci-bus.h"
> > +#include "exec/exec-all.h"
> > +#include "exec/memory-internal.h"
> > +
> > +/* #define DEBUG_S390PCI_IC */
> > +#ifdef DEBUG_S390PCI_IC
> > +#define DPRINTF(fmt, ...) \
> > +    do { fprintf(stderr, "s390pci_ic: " fmt, ## __VA_ARGS__); } while (0)
> > +#else
> > +#define DPRINTF(fmt, ...) \
> > +    do { } while (0)
> > +#endif
> > +
> > +static uint64_t resume_token;
> 
> global variable? Why?
>

Hi Alex,

Thanks for the review. I will try to address all issues from patches 1/3
and 2/3. Where I do not agree with a change, I will try to explain why.
 
> > +
> > +static uint8_t barsize(uint64_t size)
> > +{
> > +    uint64_t mask = 1;
> > +    int i;
> > +
> > +    if (!size) {
> > +        return 0;
> > +    }
> > +
> > +    for (i = 0; i < 64; i++) {
> > +        if (size & mask) {
> > +            break;
> > +        }
> > +        mask = (mask << 1);
> > +    }
> > +
> > +    return i;
> > +}
> 
> Isn't there an existing helper for this in the PCI layer?
>

I did not find one. This function fills in an s390-specific length field
in an instruction intercept (an architecture-specific encoding of the
length).
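
To make the encoding concrete: as far as the quoted barsize() shows, the
value stored is the bit position of the lowest set bit of the BAR size,
which for the power-of-two sizes PCI BARs use is simply log2(size). A
minimal standalone sketch follows; barsize_sketch is a hypothetical name
and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in for barsize(): return the index of the lowest
 * set bit of size. For a power-of-two BAR size this is log2(size).
 * A size of 0 (BAR not implemented) encodes as 0, as in the patch.
 */
static uint8_t barsize_sketch(uint64_t size)
{
    uint8_t n = 0;

    if (size == 0) {
        return 0;
    }
    /* Shift right until the lowest set bit reaches bit 0. */
    while ((size & 1) == 0) {
        size >>= 1;
        n++;
    }
    assert(n < 64);
    return n;
}
```

With this encoding a 4 KiB BAR is reported as 12. Note that sizes 0 and 1
both encode as 0, which the quoted function also does.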
 
> In fact, please check whether it makes sense to move some of the code to
> hw/ rather than target-s390x.
> 
> > +
> > +static void s390_set_status_code(CPUS390XState *env,
> > +                                 uint8_t r, uint64_t status_code)
> > +{
> > +    env->regs[r] &= ~0xff000000;
> > +    env->regs[r] |= (status_code & 0xff) << 24;
> > +}
> > +
> > +static int list_pci(ClpReqRspListPci *rrb, uint8_t *cc)
> > +{
> > +    S390PCIBusDevice *pbdev;
> > +    uint32_t res_code, initial_l2, g_l2, finish;
> > +    int rc, idx;
> > +
> > +    rc = 0;
> > +    if (be16_to_cpu(rrb->request.hdr.len) != 32) {
> > +        res_code = CLP_RC_LEN;
> > +        rc = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    if ((be32_to_cpu(rrb->request.fmt) & CLP_MASK_FMT) != 0) {
> > +        res_code = CLP_RC_FMT;
> > +        rc = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    if ((be32_to_cpu(rrb->request.fmt) & ~CLP_MASK_FMT) != 0 ||
> > +        rrb->request.reserved1 != 0 ||
> > +        rrb->request.reserved2 != 0) {
> > +        res_code = CLP_RC_RESNOT0;
> > +        rc = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    if (be64_to_cpu(rrb->request.resume_token) == 0) {
> > +        resume_token = 0;
> > +    } else if (be64_to_cpu(rrb->request.resume_token) != resume_token) {
> > +        res_code = CLP_RC_LISTPCI_BADRT;
> > +        rc = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    if (be16_to_cpu(rrb->response.hdr.len) < 48) {
> > +        res_code = CLP_RC_8K;
> > +        rc = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    initial_l2 = be16_to_cpu(rrb->response.hdr.len);
> > +    if ((initial_l2 - LIST_PCI_HDR_LEN) % sizeof(ClpFhListEntry)
> > +        != 0) {
> > +        rc = -EINVAL;
> > +        *cc = 3;
> > +        goto out;
> > +    }
> > +
> > +    rrb->response.fmt = 0;
> > +    rrb->response.reserved1 = rrb->response.reserved2 = 0;
> > +    rrb->response.mdd = cpu_to_be32(FH_VIRT);
> > +    rrb->response.max_fn = cpu_to_be16(PCI_MAX_FUNCTIONS);
> > +    rrb->response.entry_size = sizeof(ClpFhListEntry);
> > +    finish = 0;
> > +    idx = resume_token;
> > +    g_l2 = LIST_PCI_HDR_LEN;
> > +    do {
> > +        pbdev = s390_pci_find_dev_by_idx(idx);
> > +        if (!pbdev) {
> > +            finish = 1;
> > +            break;
> > +        }
> > +        rrb->response.fh_list[idx - resume_token].device_id =
> > +            pci_get_word(pbdev->pdev->config + PCI_DEVICE_ID);
> > +        rrb->response.fh_list[idx - resume_token].vendor_id =
> > +            pci_get_word(pbdev->pdev->config + PCI_VENDOR_ID);
> > +        rrb->response.fh_list[idx - resume_token].config =
> > +            cpu_to_be32(0x80000000);
> > +        rrb->response.fh_list[idx - resume_token].fid = cpu_to_be32(pbdev->fid);
> > +        rrb->response.fh_list[idx - resume_token].fh = cpu_to_be32(pbdev->fh);
> > +
> > +        g_l2 += sizeof(ClpFhListEntry);
> > +        DPRINTF("g_l2 %d vendor id 0x%x device id 0x%x fid 0x%x fh 0x%x\n",
> > +            g_l2,
> > +            rrb->response.fh_list[idx - resume_token].vendor_id,
> > +            rrb->response.fh_list[idx - resume_token].device_id,
> > +            rrb->response.fh_list[idx - resume_token].fid,
> > +            rrb->response.fh_list[idx - resume_token].fh);
> > +        idx++;
> > +    } while (g_l2 < initial_l2);
> > +
> > +    if (finish == 1) {
> > +        resume_token = 0;
> > +    } else {
> > +        resume_token = idx;
> > +    }
> > +    rrb->response.resume_token = cpu_to_be64(resume_token);
> > +    rrb->response.hdr.len = cpu_to_be16(g_l2);
> > +    rrb->response.hdr.rsp = cpu_to_be16(CLP_RC_OK);
> > +out:
> > +    if (rc) {
> > +        DPRINTF("list pci failed rc 0x%x\n", rc);
> > +        rrb->response.hdr.rsp = cpu_to_be16(res_code);
> > +    }
> > +    return rc;
> > +}
> > +
> > +int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run)
> 
> Please separate kvm_ calls from the actual implementation. Do all the
> parameter extraction in the kvm_ function and then forward on to a
> generic function that doesn't need to know about kvm_run anymore.
> 
> kvm specific c file:
> 
> int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run)
> {
>     uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>     return clp_service_call(cpu, r2);
> }
> 
> io / pci specific c file:
> 
> int clp_service_call(S390CPU *cpu, uint8_t r2)
> {
>     ...
> }
>

I already had in mind to separate the clp and pci instruction
implementation from kvm and move it to hw. I will do some major rework
and move code around.
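
A rough sketch of what that split could look like, reduced to plain C so
it stands alone: the kvm-facing side only decodes the register numbers
from the intercepted ipb word (same masks as in the quoted patch), and
the generic side sees only register numbers. PciRegOperands,
decode_pci_operands and check_even_register are hypothetical names used
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers taken from the intercepted instruction. */
typedef struct {
    uint8_t r1;
    uint8_t r2;
} PciRegOperands;

/* kvm-specific part: extract r1/r2 from ipb, as the patch does inline. */
static PciRegOperands decode_pci_operands(uint32_t ipb)
{
    PciRegOperands ops;

    ops.r1 = (ipb & 0x00f00000) >> 20;
    ops.r2 = (ipb & 0x000f0000) >> 16;
    assert(ops.r1 < 16 && ops.r2 < 16);
    return ops;
}

/*
 * Generic part: knows nothing about kvm_run. Returns 0 if r2 names an
 * even register, -1 otherwise; the patch raises a specification
 * exception for the odd case.
 */
static int check_even_register(uint8_t r2)
{
    return (r2 & 0x1) ? -1 : 0;
}
```

The generic function can then be called from the KVM intercept handler or
from any other caller that already has the register numbers.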
 
> > +{
> > +    ClpReqHdr *reqh;
> > +    ClpRspHdr *resh;
> > +    S390PCIBusDevice *pbdev;
> > +    uint32_t req_len;
> > +    uint32_t res_len;
> > +    uint8_t *buffer;
> > +    uint8_t cc = 0;
> > +    CPUS390XState *env = &cpu->env;
> > +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> > +    int i;
> > +
> > +    buffer = g_malloc0(4096 * 2);
> 
> Do you really need this? Couldn't you make the pointers be actual
> structs on the stack and just read/write from them directly?
> 
> The compiler should be smart enough to throw away elements that aren't
> used anymore to conserve memory.
> 
> > +    cpu_synchronize_state(CPU(cpu));
> > +
> > +    if (env->psw.mask & PSW_MASK_PSTATE) {
> > +        program_interrupt(env, PGM_PRIVILEGED, 4);
> > +        return 0;
> > +    }
> > +
> > +    cpu_physical_memory_rw(env->regs[r2], buffer, sizeof(*reqh), 0);
> > +    reqh = (ClpReqHdr *)buffer;
> > +    req_len = be16_to_cpu(reqh->len);
> > +    if (req_len < 16 || req_len > 8184 || (req_len % 8 != 0)) {
> > +        program_interrupt(env, PGM_OPERAND, 4);
> > +        return 0;
> > +    }
> > +
> > +    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + sizeof(*resh), 0);
> > +    resh = (ClpRspHdr *)(buffer + req_len);
> > +    res_len = be16_to_cpu(resh->len);
> > +    if (res_len < 8 || res_len > 8176 || (res_len % 8 != 0)) {
> > +        program_interrupt(env, PGM_OPERAND, 4);
> > +        return 0;
> > +    }
> > +    if ((req_len + res_len) > 8192) {
> > +        program_interrupt(env, PGM_OPERAND, 4);
> > +        return 0;
> > +    }
> > +
> > +    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + res_len, 0);
> > +
> > +    if (req_len != 32) {
> > +        resh->rsp = cpu_to_be16(CLP_RC_LEN);
> > +        goto out;
> > +    }
> > +
> > +    switch (reqh->cmd) {
> > +    case CLP_LIST_PCI: {
> > +        ClpReqRspListPci *rrb = (ClpReqRspListPci *)buffer;
> > +        list_pci(rrb, &cc);
> > +        break;
> > +    }
> > +    case CLP_SET_PCI_FN: {
> > +        ClpReqSetPci *reqsetpci = (ClpReqSetPci *)reqh;
> > +        ClpRspSetPci *ressetpci = (ClpRspSetPci *)resh;
> > +
> > +        pbdev = s390_pci_find_dev_by_fh(be32_to_cpu(reqsetpci->fh));
> > +        if (!pbdev) {
> > +                ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FH);
> > +                goto out;
> > +        }
> > +
> > +        switch (reqsetpci->oc) {
> > +        case CLP_SET_ENABLE_PCI_FN:
> > +            pbdev->fh = pbdev->fh | 1 << ENABLE_BIT_OFFSET;
> > +            ressetpci->fh = cpu_to_be32(pbdev->fh);
> > +            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_OK);
> > +            break;
> > +        case CLP_SET_DISABLE_PCI_FN:
> > +            pbdev->fh = pbdev->fh & ~(1 << ENABLE_BIT_OFFSET);
> > +            ressetpci->fh = cpu_to_be32(pbdev->fh);
> > +            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_OK);
> > +            break;
> > +        default:
> > +            DPRINTF("unknown set pci command\n");
> > +            ressetpci->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FHOP);
> > +            break;
> > +        }
> > +        break;
> > +    }
> > +    case CLP_QUERY_PCI_FN: {
> > +        ClpReqQueryPci *reqquery = (ClpReqQueryPci *)reqh;
> > +        ClpRspQueryPci *resquery = (ClpRspQueryPci *)resh;
> > +
> > +        pbdev = s390_pci_find_dev_by_fh(reqquery->fh);
> > +        if (!pbdev) {
> > +            DPRINTF("query pci no pci dev\n");
> > +            resquery->hdr.rsp = cpu_to_be16(CLP_RC_SETPCIFN_FH);
> > +            goto out;
> > +        }
> > +
> > +        for (i = 0; i < PCI_BAR_COUNT; i++) {
> > +            uint64_t data = pci_host_config_read_common(pbdev->pdev,
> > +                0x10 + (i * 4), pci_config_size(pbdev->pdev), 4);
> > +
> > +            resquery->bar[i] = bswap32(data);
> > +            resquery->bar_size[i] = barsize(pbdev->pdev->io_regions[i].size);
> > +            DPRINTF("bar %d addr 0x%x size 0x%lx barsize 0x%x\n", i,
> > +                    resquery->bar[i], pbdev->pdev->io_regions[i].size,
> > +                    resquery->bar_size[i]);
> > +        }
> > +
> > +        resquery->sdma = ZPCI_SDMA_ADDR;
> > +        resquery->edma = ZPCI_EDMA_ADDR;
> > +        resquery->pchid = 0;
> > +        resquery->ug = 1;
> > +        resquery->uid = pbdev->fid;
> > +
> > +        resquery->hdr.rsp = CLP_RC_OK;
> > +        break;
> > +    }
> > +    case CLP_QUERY_PCI_FNGRP: {
> > +        ClpRspQueryPciGrp *resgrp = (ClpRspQueryPciGrp *)resh;
> > +        resgrp->fr = 1;
> > +        resgrp->dasm = 0;
> > +        resgrp->msia = ZPCI_MSI_ADDR;
> > +        resgrp->mui = 0;
> > +        resgrp->i = 128;
> > +        resgrp->version = 0;
> > +
> > +        resgrp->hdr.rsp = CLP_RC_OK;
> > +        break;
> > +    }
> > +    default:
> > +        DPRINTF("unknown clp command\n");
> > +        resh->rsp = cpu_to_be16(CLP_RC_CMD);
> > +        break;
> > +    }
> > +
> > +out:
> > +    cpu_physical_memory_rw(env->regs[r2], buffer, req_len + res_len, 1);
> 
> ... ah, to write back. Wouldn't it be cleaner to do this explicitly?
> 
> > +    g_free(buffer);
> > +    setcc(cpu, cc);
> > +    return 0;
> > +}
> > +
> > +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    CPUS390XState *env = &cpu->env;
> > +    S390PCIBusDevice *pbdev;
> > +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> > +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> > +    PciLgStg *rp;
> > +    uint64_t offset;
> > +    uint64_t data;
> > +    uint8_t len;
> > +
> > +    cpu_synchronize_state(CPU(cpu));
> > +
> > +    if (env->psw.mask & PSW_MASK_PSTATE) {
> > +        program_interrupt(env, PGM_PRIVILEGED, 4);
> > +        return 0;
> > +    }
> > +
> > +    if (r2 & 0x1) {
> > +        program_interrupt(env, PGM_SPECIFICATION, 4);
> > +        return 0;
> > +    }
> > +
> > +    rp = (PciLgStg *)&env->regs[r2];
> > +    offset = env->regs[r2 + 1];
> > +
> > +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> > +    if (!pbdev) {
> > +        DPRINTF("pcilg no pci dev\n");
> > +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> > +        return 0;
> > +    }
> > +
> > +    len = rp->len & 0xF;
> > +    if (rp->pcias < 6) {
> > +        if ((8 - (offset & 0x7)) < len) {
> > +            program_interrupt(env, PGM_OPERAND, 4);
> > +            return 0;
> > +        }
> > +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> > +        io_mem_read(mr, offset, &data, len);
> > +    } else if (rp->pcias == 15) {
> > +        if ((4 - (offset & 0x3)) < len) {
> > +            program_interrupt(env, PGM_OPERAND, 4);
> > +            return 0;
> > +        }
> > +        data =  pci_host_config_read_common(
> > +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
> > +
> > +        switch (len) {
> > +        case 1:
> > +            break;
> > +        case 2:
> > +            data = cpu_to_le16(data);
> > +            break;
> > +        case 4:
> > +            data = cpu_to_le32(data);
> > +            break;
> > +        case 8:
> > +            data = cpu_to_le64(data);
> > +            break;
> 
> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
> and LE. So if you're running this on an LE host, you won't swap the
> value and get a broken result.
> 
> If you know that the value is always swapped, use bswapxx().
>

Actually the code is right, and it is required on a big-endian host :-)
pcilg/pcistg provide access to the PCI config space, which is defined
to be in PCI byte order (little endian). Since pci_host_config_read_common
already does an LE-to-CPU conversion, we have to convert back to PCI byte
order. Doing an unconditional swap would be a bug on a little-endian host.
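
For readers following the argument, here is a standalone sketch of the
behaviour both sides are describing: a cpu-to-LE conversion swaps bytes
on a big-endian host and is a no-op on a little-endian host, so the
in-memory byte layout of the result is little endian either way. The
*_sketch helpers are stand-ins written for this example, not the QEMU
functions, and the sketch does not settle which conversion the
instruction handler ultimately needs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Detect host byte order at run time by inspecting the first byte. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Unconditional 32-bit byte swap. */
static uint32_t bswap32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Host order to little endian: swap only on a big-endian host. */
static uint32_t cpu_to_le32_sketch(uint32_t v)
{
    return host_is_big_endian() ? bswap32_sketch(v) : v;
}

/* Return the byte stored at the lowest address of v. */
static uint8_t first_byte_in_memory(uint32_t v)
{
    uint8_t b;

    memcpy(&b, &v, 1);
    return b;
}

/* After conversion the least significant byte comes first in memory. */
static void endian_selfcheck(void)
{
    assert(first_byte_in_memory(cpu_to_le32_sketch(0x11223344u)) == 0x44);
}
```

The self-check passes on hosts of either byte order, which is the
property that distinguishes a conditional conversion from an
unconditional bswap.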

> > +        default:
> > +            program_interrupt(env, PGM_OPERAND, 4);
> > +            return 0;
> > +        }
> > +    } else {
> > +        DPRINTF("invalid space\n");
> > +        setcc(cpu, ZPCI_PCI_LS_ERR);
> > +        s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
> > +        return 0;
> > +    }
> > +
> > +    env->regs[r1] = data;
> > +    setcc(cpu, ZPCI_PCI_LS_OK);
> > +    return 0;
> > +}
> > +
> > +static void update_msix_table_msg_data(S390PCIBusDevice *pbdev, uint64_t offset,
> > +                                       uint64_t *data, uint8_t len)
> > +{
> > +    uint32_t msg_data;
> > +
> > +    if (offset % PCI_MSIX_ENTRY_SIZE != 8) {
> > +        return;
> > +    }
> > +
> > +    if (len != 4) {
> > +        DPRINTF("access msix table msg data but len is %d\n", len);
> > +        return;
> > +    }
> > +
> > +    msg_data = (pbdev->fid << ZPCI_MSI_VEC_BITS) | le32_to_cpu(*data);
> > +    *data = cpu_to_le32(msg_data);
> > +    DPRINTF("update msix msg_data to 0x%x\n", msg_data);
> > +}
> > +
> > +static int trap_msix(S390PCIBusDevice *pbdev, uint64_t offset, uint8_t pcias)
> > +{
> > +    if (pbdev->msix.available && pbdev->msix.table_bar == pcias &&
> > +        offset >= pbdev->msix.table_offset &&
> > +        offset <= pbdev->msix.table_offset +
> > +                  (pbdev->msix.entries - 1) * PCI_MSIX_ENTRY_SIZE) {
> > +        return 1;
> > +    } else {
> > +        return 0;
> > +    }
> > +}
> > +
> > +int kvm_pcistg_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    CPUS390XState *env = &cpu->env;
> > +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> > +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> > +    PciLgStg *rp;
> > +    uint64_t offset, data;
> > +    S390PCIBusDevice *pbdev;
> > +    uint8_t len;
> > +
> > +    cpu_synchronize_state(CPU(cpu));
> > +
> > +    if (env->psw.mask & PSW_MASK_PSTATE) {
> > +        program_interrupt(env, PGM_PRIVILEGED, 4);
> > +        return 0;
> > +    }
> > +
> > +    if (r2 & 0x1) {
> > +        program_interrupt(env, PGM_SPECIFICATION, 4);
> > +        return 0;
> > +    }
> > +
> > +    rp = (PciLgStg *)&env->regs[r2];
> > +    offset = env->regs[r2 + 1];
> > +
> > +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> > +    if (!pbdev) {
> > +        DPRINTF("pcistg no pci dev\n");
> > +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> > +        return 0;
> > +    }
> > +
> > +    data = env->regs[r1];
> > +    len = rp->len & 0xF;
> > +    if (rp->pcias < 6) {
> > +        if ((8 - (offset & 0x7)) < len) {
> > +            program_interrupt(env, PGM_OPERAND, 4);
> > +            return 0;
> > +        }
> > +        MemoryRegion *mr;
> > +        if (trap_msix(pbdev, offset, rp->pcias)) {
> > +            offset = offset - pbdev->msix.table_offset;
> > +            mr = &pbdev->pdev->msix_table_mmio;
> > +            update_msix_table_msg_data(pbdev, offset, &data, len);
> > +        } else {
> > +            mr = pbdev->pdev->io_regions[rp->pcias].memory;
> > +        }
> > +
> > +        io_mem_write(mr, offset, data, len);
> > +    } else if (rp->pcias == 15) {
> > +        if ((4 - (offset & 0x3)) < len) {
> > +            program_interrupt(env, PGM_OPERAND, 4);
> > +            return 0;
> > +        }
> > +        switch (len) {
> > +        case 1:
> > +            break;
> > +        case 2:
> > +            data = le16_to_cpu(data);
> > +            break;
> > +        case 4:
> > +            data = le32_to_cpu(data);
> > +            break;
> > +        case 8:
> > +            data = le64_to_cpu(data);
> > +            break;
> > +        default:
> > +            program_interrupt(env, PGM_OPERAND, 4);
> > +            return 0;
> > +        }
> 
> I guess you want a generic function similar to qemu_bswap_len() that
> supports 64bit?
> 
> > +
> > +        pci_host_config_write_common(pbdev->pdev, offset,
> > +                                     pci_config_size(pbdev->pdev),
> > +                                     data, len);
> > +    } else {
> > +        DPRINTF("pcistg invalid space\n");
> > +        setcc(cpu, ZPCI_PCI_LS_ERR);
> > +        s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
> > +        return 0;
> > +    }
> > +
> > +    setcc(cpu, ZPCI_PCI_LS_OK);
> > +    return 0;
> > +}
> > +
> > +int kvm_rpcit_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    CPUS390XState *env = &cpu->env;
> > +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> > +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> > +    uint32_t fh;
> > +    uint64_t pte;
> > +    S390PCIBusDevice *pbdev;
> > +    ram_addr_t size;
> > +    int flags;
> > +    IOMMUTLBEntry entry;
> > +
> > +    cpu_synchronize_state(CPU(cpu));
> > +
> > +    if (env->psw.mask & PSW_MASK_PSTATE) {
> > +        program_interrupt(env, PGM_PRIVILEGED, 4);
> > +        return 0;
> > +    }
> > +
> > +    if (r2 & 0x1) {
> > +        program_interrupt(env, PGM_SPECIFICATION, 4);
> > +        return 0;
> > +    }
> > +
> > +    fh = env->regs[r1] >> 32;
> > +    size = env->regs[r2 + 1];
> > +
> > +    pbdev = s390_pci_find_dev_by_fh(fh);
> > +
> > +    if (!pbdev) {
> > +        DPRINTF("rpcit no pci dev\n");
> > +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> > +        return 0;
> > +    }
> > +
> > +    pte = s390_guest_io_table_walk(s390_pci_get_table_origin(pbdev->g_iota),
> > +                                   env->regs[r2]);
> > +    flags = pte & ZPCI_PTE_FLAG_MASK;
> > +    entry.target_as = &address_space_memory;
> > +    entry.iova = env->regs[r2];
> > +    entry.translated_addr = pte & ZPCI_PTE_ADDR_MASK;
> > +    entry.addr_mask = size - 1;
> > +
> > +    if (flags & ZPCI_PTE_INVALID) {
> > +        entry.perm = IOMMU_NONE;
> > +    } else {
> > +        entry.perm = IOMMU_RW;
> > +    }
> 
> Deja vu? This is the iommu translation function, no? Can't you somehow
> just call it?
>

Yes, you are right; I can't believe I didn't see this before.
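
One way to remove the duplication would be a single helper that turns a
page table entry into a TLB entry, used by both the translate callback
and rpcit. The sketch below mirrors the field assignments in the quoted
rpcit code but uses stand-in types and constants so that it compiles on
its own. The SK_* values and the SkTlbEntry layout are assumptions made
for illustration; the real masks are defined in s390-pci-bus.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SK_PTE_FLAG_MASK 0xfffULL
#define SK_PTE_ADDR_MASK (~SK_PTE_FLAG_MASK)
#define SK_PTE_INVALID   0x400ULL

/* Stand-in for the access permission of a TLB entry. */
typedef enum { SK_IOMMU_NONE = 0, SK_IOMMU_RW = 3 } SkPerm;

/* Stand-in for the fields of IOMMUTLBEntry that rpcit fills in. */
typedef struct {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
    SkPerm perm;
} SkTlbEntry;

/* Build a TLB entry from a guest I/O page table entry. */
static SkTlbEntry sk_entry_from_pte(uint64_t iova, uint64_t pte,
                                    uint64_t size)
{
    SkTlbEntry e;
    uint64_t flags = pte & SK_PTE_FLAG_MASK;

    assert(size != 0); /* sketch-only guard, not in the patch */
    e.iova = iova;
    e.translated_addr = pte & SK_PTE_ADDR_MASK;
    e.addr_mask = size - 1;
    e.perm = (flags & SK_PTE_INVALID) ? SK_IOMMU_NONE : SK_IOMMU_RW;
    return e;
}
```

rpcit would then only walk the table, call the helper, and pass the
result to the notifier.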
 
> > +
> > +    memory_region_notify_iommu(pci_device_iommu_address_space(
> > +                               pbdev->pdev)->root, entry);
> > +
> > +    setcc(cpu, ZPCI_PCI_LS_OK);
> > +    return 0;
> > +}
> > +
> > +int kvm_sic_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    qemu_log_mask(LOG_UNIMP, "SIC missing\n");
> > +    return 0;
> > +}
> > +
> > +int kvm_pcistb_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    CPUS390XState *env = &cpu->env;
> > +    uint8_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
> > +    uint8_t r3 = run->s390_sieic.ipa & 0x000f;
> > +    PciStb *rp;
> > +    uint64_t gaddr;
> > +    uint64_t *uaddr, *pu;
> > +    hwaddr len;
> > +    S390PCIBusDevice *pbdev;
> > +    MemoryRegion *mr;
> > +    int i;
> > +
> > +    cpu_synchronize_state(CPU(cpu));
> > +
> > +    if (env->psw.mask & PSW_MASK_PSTATE) {
> > +        program_interrupt(env, PGM_PRIVILEGED, 6);
> > +        return 0;
> > +    }
> > +
> > +    rp = (PciStb *)&env->regs[r1];
> > +    if (rp->pcias > 5) {
> > +        DPRINTF("pcistb invalid space\n");
> > +        setcc(cpu, ZPCI_PCI_LS_ERR);
> > +        s390_set_status_code(env, r1, ZPCI_PCI_ST_INVAL_AS);
> > +        return 0;
> > +    }
> > +
> > +    switch (rp->len) {
> > +    case 16:
> > +    case 32:
> > +    case 64:
> > +    case 128:
> > +        break;
> > +    default:
> > +        program_interrupt(env, PGM_SPECIFICATION, 6);
> > +        return 0;
> > +    }
> > +
> > +    gaddr = get_base_disp_rsy(cpu, run);
> > +    len = rp->len;
> > +
> > +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> > +    if (!pbdev) {
> > +        DPRINTF("pcistb no pci dev fh 0x%x\n", rp->fh);
> > +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> > +        return 0;
> > +    }
> > +
> > +    uaddr = cpu_physical_memory_map(gaddr, &len, 0);
> > +    mr = pbdev->pdev->io_regions[rp->pcias].memory;
> > +    if (!memory_region_access_valid(mr, env->regs[r3], rp->len, true)) {
> > +        cpu_physical_memory_unmap(uaddr, len, 0, len);
> > +        program_interrupt(env, PGM_ADDRESSING, 6);
> > +        return 0;
> > +    }
> > +
> > +    pu = uaddr;
> > +    for (i = 0; i < rp->len / 8; i++) {
> > +        io_mem_write(mr, env->regs[r3] + i * 8, *pu, 8);
> 
> Please don't overoptimize and just use individual ldq_phys() operations
> here for each memory access. In general, try to avoid
> cpu_physical_memory_map().
> 
> > +        pu++;
> > +    }
> > +
> > +    cpu_physical_memory_unmap(uaddr, len, 0, len);
> > +    setcc(cpu, ZPCI_PCI_LS_OK);
> > +    return 0;
> > +}
> > +
> > +static int reg_irqs(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib)
> > +{
> > +    int ret;
> > +    S390FLICState *fs = s390_get_flic();
> > +    S390FLICStateClass *fsc = S390_FLIC_COMMON_GET_CLASS(fs);
> > +
> > +    ret = css_register_io_adapter(S390_PCIPT_ADAPTER,
> > +                                  FIB_DATA_ISC(fib.data), true, false,
> > +                                  &pbdev->routes.adapter.adapter_id);
> > +    assert(ret == 0);
> > +
> > +    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id, fib.aisb, true);
> > +    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id, fib.aibv, true);
> > +
> > +    pbdev->routes.adapter.summary_addr = fib.aisb;
> > +    pbdev->routes.adapter.summary_offset = FIB_DATA_AISBO(fib.data);
> > +    pbdev->routes.adapter.ind_addr = fib.aibv;
> > +    pbdev->routes.adapter.ind_offset = FIB_DATA_AIBVO(fib.data);
> > +
> > +    DPRINTF("reg_irqs adapter id %d\n", pbdev->routes.adapter.adapter_id);
> > +    return 0;
> > +}
> > +
> > +static int dereg_irqs(S390PCIBusDevice *pbdev)
> > +{
> > +    S390FLICState *fs = s390_get_flic();
> > +    S390FLICStateClass *fsc = S390_FLIC_COMMON_GET_CLASS(fs);
> > +
> > +    fsc->io_adapter_map(fs, pbdev->routes.adapter.adapter_id,
> > +                        pbdev->routes.adapter.ind_addr, false);
> > +
> > +    pbdev->routes.adapter.summary_addr = 0;
> > +    pbdev->routes.adapter.summary_offset = 0;
> > +    pbdev->routes.adapter.ind_addr = 0;
> > +    pbdev->routes.adapter.ind_offset = 0;
> > +
> > +    DPRINTF("dereg_irqs adapter id %d\n", pbdev->routes.adapter.adapter_id);
> > +    return 0;
> > +}
> > +
> > +int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    CPUS390XState *env = &cpu->env;
> > +    uint8_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
> > +    uint8_t oc;
> > +    uint32_t fh;
> > +    uint64_t fiba;
> > +    ZpciFib fib;
> > +    S390PCIBusDevice *pbdev;
> > +
> > +    cpu_synchronize_state(CPU(cpu));
> > +
> > +    if (env->psw.mask & PSW_MASK_PSTATE) {
> > +        program_interrupt(env, PGM_PRIVILEGED, 6);
> > +        return 0;
> > +    }
> > +
> > +    oc = env->regs[r1] & 0xff;
> > +    fh = env->regs[r1] >> 32;
> > +    fiba = get_base_disp_rxy(cpu, run);
> > +
> > +    if (fiba & 0x7) {
> > +        program_interrupt(env, PGM_SPECIFICATION, 6);
> > +        return 0;
> > +    }
> > +
> > +    pbdev = s390_pci_find_dev_by_fh(fh);
> > +    if (!pbdev) {
> > +        DPRINTF("mpcifc no pci dev fh 0x%x\n", fh);
> > +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> > +        return 0;
> > +    }
> > +
> > +    cpu_physical_memory_rw(fiba, (uint8_t *)&fib, sizeof(fib), 0);
> 
> I also find cpu_physical_memory_rw() pretty hard to read. Meanwhile,
> it's been deprecated by cpu_physical_memory_read() and
> cpu_physical_memory_write() which make the code more readable.
> 
> > +
> > +    switch (oc) {
> > +    case ZPCI_MOD_FC_REG_INT: {
> > +        pbdev->isc = FIB_DATA_ISC(fib.data);
> > +        reg_irqs(env, pbdev, fib);
> > +        break;
> > +    }
> > +    case ZPCI_MOD_FC_DEREG_INT:
> > +        dereg_irqs(pbdev);
> > +        break;
> > +    case ZPCI_MOD_FC_REG_IOAT:
> > +        if (fib.pba > fib.pal) {
> > +            program_interrupt(&cpu->env, PGM_OPERAND, 6);
> > +            return 0;
> > +        }
> > +        pbdev->g_iota = fib.iota;
> > +        break;
> > +    case ZPCI_MOD_FC_DEREG_IOAT:
> > +        break;
> > +    case ZPCI_MOD_FC_REREG_IOAT:
> > +        break;
> > +    case ZPCI_MOD_FC_RESET_ERROR:
> > +        break;
> > +    case ZPCI_MOD_FC_RESET_BLOCK:
> > +        break;
> > +    case ZPCI_MOD_FC_SET_MEASURE:
> > +        break;
> > +    default:
> > +        program_interrupt(&cpu->env, PGM_OPERAND, 6);
> > +        return 0;
> > +    }
> > +
> > +    setcc(cpu, ZPCI_PCI_LS_OK);
> > +    return 0;
> > +}
> > +
> > +int kvm_stpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    qemu_log_mask(LOG_UNIMP, "STPCIFC missing\n");
> > +    return 0;
> > +}
> > diff --git a/target-s390x/pci_ic.h b/target-s390x/pci_ic.h
> > new file mode 100644
> > index 0000000..0eb6c27
> > --- /dev/null
> > +++ b/target-s390x/pci_ic.h
> > @@ -0,0 +1,335 @@
> > +/*
> > + * s390 PCI intercept definitions
> > + *
> > + * Copyright 2014 IBM Corp.
> > + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> > + *            Hong Bo Li <lihbbj@cn.ibm.com>
> > + *            Yi Min Zhao <zyimin@cn.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> > + * your option) any later version. See the COPYING file in the top-level
> > + * directory.
> > + */
> > +
> > +#ifndef PCI_IC_S390X_H
> > +#define PCI_IC_S390X_H
> > +
> > +#include <sysemu/dma.h>
> > +
> > +/* CLP common request & response block size */
> > +#define CLP_BLK_SIZE 4096
> > +#define PCI_BAR_COUNT 6
> > +#define PCI_MAX_FUNCTIONS 4096
> > +
> > +typedef struct ClpReqHdr {
> > +    __uint16_t len;
> > +    __uint16_t cmd;
> > +} QEMU_PACKED ClpReqHdr;
> > +
> > +typedef struct ClpRspHdr {
> > +    __uint16_t len;
> > +    __uint16_t rsp;
> > +} QEMU_PACKED ClpRspHdr;
> > +
> > +/* CLP Response Codes */
> > +#define CLP_RC_OK         0x0010  /* Command request successfully */
> > +#define CLP_RC_CMD        0x0020  /* Command code not recognized */
> > +#define CLP_RC_PERM       0x0030  /* Command not authorized */
> > +#define CLP_RC_FMT        0x0040  /* Invalid command request format */
> > +#define CLP_RC_LEN        0x0050  /* Invalid command request length */
> > +#define CLP_RC_8K         0x0060  /* Command requires 8K LPCB */
> > +#define CLP_RC_RESNOT0    0x0070  /* Reserved field not zero */
> > +#define CLP_RC_NODATA     0x0080  /* No data available */
> > +#define CLP_RC_FC_UNKNOWN 0x0100  /* Function code not recognized */
> > +
> > +/*
> > + * Call Logical Processor - Command Codes
> > + */
> > +#define CLP_LIST_PCI            0x0002
> > +#define CLP_QUERY_PCI_FN        0x0003
> > +#define CLP_QUERY_PCI_FNGRP     0x0004
> > +#define CLP_SET_PCI_FN          0x0005
> > +
> > +/* PCI function handle list entry */
> > +typedef struct ClpFhListEntry {
> > +    __uint16_t device_id;
> > +    __uint16_t vendor_id;
> > +#define CLP_FHLIST_MASK_CONFIG 0x80000000
> > +    __uint32_t config;
> > +    __uint32_t fid;
> > +    __uint32_t fh;
> > +} QEMU_PACKED ClpFhListEntry;
> > +
> > +#define CLP_RC_SETPCIFN_FH      0x0101 /* Invalid PCI fn handle */
> > +#define CLP_RC_SETPCIFN_FHOP    0x0102 /* Fn handle not valid for op */
> > +#define CLP_RC_SETPCIFN_DMAAS   0x0103 /* Invalid DMA addr space */
> > +#define CLP_RC_SETPCIFN_RES     0x0104 /* Insufficient resources */
> > +#define CLP_RC_SETPCIFN_ALRDY   0x0105 /* Fn already in requested state */
> > +#define CLP_RC_SETPCIFN_ERR     0x0106 /* Fn in permanent error state */
> > +#define CLP_RC_SETPCIFN_RECPND  0x0107 /* Error recovery pending */
> > +#define CLP_RC_SETPCIFN_BUSY    0x0108 /* Fn busy */
> > +#define CLP_RC_LISTPCI_BADRT    0x010a /* Resume token not recognized */
> > +#define CLP_RC_QUERYPCIFG_PFGID 0x010b /* Unrecognized PFGID */
> > +
> > +/* request or response block header length */
> > +#define LIST_PCI_HDR_LEN 32
> > +
> > +/* Number of function handles fitting in response block */
> > +#define CLP_FH_LIST_NR_ENTRIES \
> > +    ((CLP_BLK_SIZE - 2 * LIST_PCI_HDR_LEN) \
> > +        / sizeof(ClpFhListEntry))
> > +
> > +#define CLP_SET_ENABLE_PCI_FN  0 /* Yes, 0 enables it */
> > +#define CLP_SET_DISABLE_PCI_FN 1 /* Yes, 1 disables it */
> > +
> > +#define CLP_UTIL_STR_LEN 64
> > +
> > +#define CLP_MASK_FMT 0xf0000000
> > +
> > +/* List PCI functions request */
> > +typedef struct ClpReqListPci {
> > +    ClpReqHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +    __uint64_t resume_token;
> > +    __uint64_t reserved2;
> > +} QEMU_PACKED ClpReqListPci;
> > +
> > +/* List PCI functions response */
> > +typedef struct ClpRspListPci {
> > +    ClpRspHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +    __uint64_t resume_token;
> > +    __uint32_t mdd;
> > +    __uint16_t max_fn;
> > +    __uint8_t reserved2;
> > +    __uint8_t entry_size;
> > +    ClpFhListEntry fh_list[CLP_FH_LIST_NR_ENTRIES];
> > +} QEMU_PACKED ClpRspListPci;
> > +
> > +/* Query PCI function request */
> > +typedef struct ClpReqQueryPci {
> > +    ClpReqHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +    __uint32_t fh; /* function handle */
> > +    __uint32_t reserved2;
> > +    __uint64_t reserved3;
> > +} QEMU_PACKED ClpReqQueryPci;
> > +
> > +/* Query PCI function response */
> > +typedef struct ClpRspQueryPci {
> > +    ClpRspHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +    __uint16_t vfn; /* virtual fn number */
> > +#define CLP_RSP_QPCI_MASK_UTIL  0x100
> > +#define CLP_RSP_QPCI_MASK_PFGID 0xff
> > +    __uint16_t ug;
> > +    __uint32_t fid; /* pci function id */
> > +    __uint8_t bar_size[PCI_BAR_COUNT];
> > +    __uint16_t pchid;
> > +    __uint32_t bar[PCI_BAR_COUNT];
> > +    __uint64_t reserved2;
> > +    __uint64_t sdma; /* start dma as */
> > +    __uint64_t edma; /* end dma as */
> > +    __uint32_t reserved3[11];
> > +    __uint32_t uid;
> > +    __uint8_t util_str[CLP_UTIL_STR_LEN]; /* utility string */
> > +} QEMU_PACKED ClpRspQueryPci;
> > +
> > +/* Query PCI function group request */
> > +typedef struct ClpReqQueryPciGrp {
> > +    ClpReqHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +#define CLP_REQ_QPCIG_MASK_PFGID 0xff
> > +    __uint32_t g;
> > +    __uint32_t reserved2;
> > +    __uint64_t reserved3;
> > +} QEMU_PACKED ClpReqQueryPciGrp;
> > +
> > +/* Query PCI function group response */
> > +typedef struct ClpRspQueryPciGrp {
> > +    ClpRspHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +#define CLP_RSP_QPCIG_MASK_NOI 0xfff
> > +    __uint16_t i;
> > +    __uint8_t version;
> > +#define CLP_RSP_QPCIG_MASK_FRAME   0x2
> > +#define CLP_RSP_QPCIG_MASK_REFRESH 0x1
> > +    __uint8_t fr;
> > +    __uint16_t reserved2;
> > +    __uint16_t mui;
> > +    __uint64_t reserved3;
> > +    __uint64_t dasm; /* dma address space mask */
> > +    __uint64_t msia; /* MSI address */
> > +    __uint64_t reserved4;
> > +    __uint64_t reserved5;
> > +} QEMU_PACKED ClpRspQueryPciGrp;
> > +
> > +/* Set PCI function request */
> > +typedef struct ClpReqSetPci {
> > +    ClpReqHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +    __uint32_t fh; /* function handle */
> > +    __uint16_t reserved2;
> > +    __uint8_t oc; /* operation controls */
> > +    __uint8_t ndas; /* number of dma spaces */
> > +    __uint64_t reserved3;
> > +} QEMU_PACKED ClpReqSetPci;
> > +
> > +/* Set PCI function response */
> > +typedef struct ClpRspSetPci {
> > +    ClpRspHdr hdr;
> > +    __uint32_t fmt;
> > +    __uint64_t reserved1;
> > +    __uint32_t fh; /* function handle */
> > +    __uint32_t reserved3;
> > +    __uint64_t reserved4;
> > +} QEMU_PACKED ClpRspSetPci;
> > +
> > +typedef struct ClpReqRspListPci {
> > +    ClpReqListPci request;
> > +    ClpRspListPci response;
> > +} QEMU_PACKED ClpReqRspListPci;
> > +
> > +typedef struct ClpReqRspSetPci {
> > +    ClpReqSetPci request;
> > +    ClpRspSetPci response;
> > +} QEMU_PACKED ClpReqRspSetPci;
> > +
> > +typedef struct ClpReqRspQueryPci {
> > +    ClpReqQueryPci request;
> > +    ClpRspQueryPci response;
> > +} QEMU_PACKED ClpReqRspQueryPci;
> > +
> > +typedef struct ClpReqRspQueryPciGrp {
> > +    ClpReqQueryPciGrp request;
> > +    ClpRspQueryPciGrp response;
> > +} QEMU_PACKED ClpReqRspQueryPciGrp;
> > +
> > +typedef struct PciLgStg {
> > +    uint32_t fh;
> > +    uint8_t status;
> > +    uint8_t pcias;
> > +    uint8_t reserved;
> > +    uint8_t len;
> > +} QEMU_PACKED PciLgStg;
> > +
> > +typedef struct PciStb {
> > +    uint32_t fh;
> > +    uint8_t status;
> > +    uint8_t pcias;
> > +    uint8_t reserved;
> > +    uint8_t len;
> > +} QEMU_PACKED PciStb;
> > +
> > +/* Load/Store status codes */
> > +#define ZPCI_PCI_ST_FUNC_NOT_ENABLED        4
> > +#define ZPCI_PCI_ST_FUNC_IN_ERR             8
> > +#define ZPCI_PCI_ST_BLOCKED                 12
> > +#define ZPCI_PCI_ST_INSUF_RES               16
> > +#define ZPCI_PCI_ST_INVAL_AS                20
> > +#define ZPCI_PCI_ST_FUNC_ALREADY_ENABLED    24
> > +#define ZPCI_PCI_ST_DMA_AS_NOT_ENABLED      28
> > +#define ZPCI_PCI_ST_2ND_OP_IN_INV_AS        36
> > +#define ZPCI_PCI_ST_FUNC_NOT_AVAIL          40
> > +#define ZPCI_PCI_ST_ALREADY_IN_RQ_STATE     44
> > +
> > +/* Load/Store return codes */
> > +#define ZPCI_PCI_LS_OK              0
> > +#define ZPCI_PCI_LS_ERR             1
> > +#define ZPCI_PCI_LS_BUSY            2
> > +#define ZPCI_PCI_LS_INVAL_HANDLE    3
> > +
> > +/* Modify PCI Function Controls */
> > +#define ZPCI_MOD_FC_REG_INT     2
> > +#define ZPCI_MOD_FC_DEREG_INT   3
> > +#define ZPCI_MOD_FC_REG_IOAT    4
> > +#define ZPCI_MOD_FC_DEREG_IOAT  5
> > +#define ZPCI_MOD_FC_REREG_IOAT  6
> > +#define ZPCI_MOD_FC_RESET_ERROR 7
> > +#define ZPCI_MOD_FC_RESET_BLOCK 9
> > +#define ZPCI_MOD_FC_SET_MEASURE 10
> > +
> > +/* FIB function controls */
> > +#define ZPCI_FIB_FC_ENABLED     0x80
> > +#define ZPCI_FIB_FC_ERROR       0x40
> > +#define ZPCI_FIB_FC_LS_BLOCKED  0x20
> > +#define ZPCI_FIB_FC_DMAAS_REG   0x10
> > +
> > +/* Function Information Block */
> > +typedef struct ZpciFib {
> > +    __uint8_t fmt;   /* format */
> > +    __uint8_t reserved1[7];
> > +    __uint8_t fc;                  /* function controls */
> > +    __uint8_t reserved2;
> > +    __uint16_t reserved3;
> > +    __uint32_t reserved4;
> > +    __uint64_t pba;                /* PCI base address */
> > +    __uint64_t pal;                /* PCI address limit */
> > +    __uint64_t iota;               /* I/O Translation Anchor */
> > +#define FIB_DATA_ISC(x)    (((x) >> 28) & 0x7)
> > +#define FIB_DATA_NOI(x)    (((x) >> 16) & 0xfff)
> > +#define FIB_DATA_AIBVO(x) (((x) >> 8) & 0x3f)
> > +#define FIB_DATA_SUM(x)    (((x) >> 7) & 0x1)
> > +#define FIB_DATA_AISBO(x)  ((x) & 0x3f)
> > +    __uint32_t data;
> > +    __uint32_t reserved5;
> > +    __uint64_t aibv;               /* Adapter int bit vector address */
> > +    __uint64_t aisb;               /* Adapter int summary bit address */
> > +    __uint64_t fmb_addr;           /* Function measurement address and key */
> > +    __uint32_t reserved6;
> > +    __uint32_t gd;
> > +} QEMU_PACKED ZpciFib;
> > +
> > +static inline uint64_t get_base_disp_rxy(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    CPUS390XState *env = &cpu->env;
> > +    uint32_t x2 = (run->s390_sieic.ipa & 0x000f);
> > +    uint32_t base2 = run->s390_sieic.ipb >> 28;
> > +    uint32_t disp2 = ((run->s390_sieic.ipb & 0x0fff0000) >> 16) +
> > +                     ((run->s390_sieic.ipb & 0xff00) << 4);
> > +
> > +    if (disp2 & 0x80000) {
> > +        disp2 += 0xfff00000;
> > +    }
> > +
> > +    return (base2 ? env->regs[base2] : 0) +
> > +           (x2 ? env->regs[x2] : 0) + (long)(int)disp2;
> > +}
> > +
> > +static inline uint64_t get_base_disp_rsy(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +    CPUS390XState *env = &cpu->env;
> > +    uint32_t base2 = run->s390_sieic.ipb >> 28;
> > +    uint32_t disp2 = ((run->s390_sieic.ipb & 0x0fff0000) >> 16) +
> > +                     ((run->s390_sieic.ipb & 0xff00) << 4);
> > +
> > +    if (disp2 & 0x80000) {
> > +        disp2 += 0xfff00000;
> > +    }
> > +
> > +    return (base2 ? env->regs[base2] : 0) + (long)(int)disp2;
> > +}
> 
> Same comment as in the previous patch here, please try to avoid putting
> code into a header file.
> 
> > +
> > +int kvm_clp_service_call(S390CPU *cpu, struct kvm_run *run);
> > +int kvm_rpcit_service_call(S390CPU *cpu, struct kvm_run *run);
> > +int kvm_sic_service_call(S390CPU *cpu, struct kvm_run *run);
> > +int kvm_pcistb_service_call(S390CPU *cpu, struct kvm_run *run);
> > +int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run);
> > +int kvm_pcistg_service_call(S390CPU *cpu, struct kvm_run *run);
> > +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run);
> > +int kvm_stpcifc_service_call(S390CPU *cpu, struct kvm_run *run);
> 
> Hrm. Maybe we could add some registration hook similar to spapr's hcall
> or rtas callback registration that would allow us to encapsulate this a
> bit better?
> 
> Then you'd only have to spawn a PHB device which could register for
> these service calls.
> 
> 
> Alex
> 
> 


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 12:10     ` Frank Blaschka
@ 2014-11-11 12:16       ` Alexander Graf
  2014-11-11 12:39         ` Frank Blaschka
  2014-11-11 12:17       ` Peter Maydell
  1 sibling, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-11 12:16 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth



On 11.11.14 13:10, Frank Blaschka wrote:
> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>
>>
>> On 10.11.14 15:20, Frank Blaschka wrote:
>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>
>>> This patch implements the s390 pci instructions in qemu. It allows
>>> to access and drive pci devices attached to the s390 pci bus.
>>> Because of platform constrains devices using IO BARs are not
>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>
>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>> ---
>>>  target-s390x/Makefile.objs |   2 +-
>>>  target-s390x/kvm.c         |  52 ++++
>>>  target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>>>  target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>>>  4 files changed, 1141 insertions(+), 1 deletion(-)
>>>  create mode 100644 target-s390x/pci_ic.c
>>>  create mode 100644 target-s390x/pci_ic.h
>>>

[...]

>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>> +{
>>> +    CPUS390XState *env = &cpu->env;
>>> +    S390PCIBusDevice *pbdev;
>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>>> +    PciLgStg *rp;
>>> +    uint64_t offset;
>>> +    uint64_t data;
>>> +    uint8_t len;
>>> +
>>> +    cpu_synchronize_state(CPU(cpu));
>>> +
>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (r2 & 0x1) {
>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
>>> +        return 0;
>>> +    }
>>> +
>>> +    rp = (PciLgStg *)&env->regs[r2];
>>> +    offset = env->regs[r2 + 1];
>>> +
>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>> +    if (!pbdev) {
>>> +        DPRINTF("pcilg no pci dev\n");
>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>> +        return 0;
>>> +    }
>>> +
>>> +    len = rp->len & 0xF;
>>> +    if (rp->pcias < 6) {
>>> +        if ((8 - (offset & 0x7)) < len) {
>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>> +            return 0;
>>> +        }
>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
>>> +        io_mem_read(mr, offset, &data, len);
>>> +    } else if (rp->pcias == 15) {
>>> +        if ((4 - (offset & 0x3)) < len) {
>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>> +            return 0;
>>> +        }
>>> +        data =  pci_host_config_read_common(
>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
>>> +
>>> +        switch (len) {
>>> +        case 1:
>>> +            break;
>>> +        case 2:
>>> +            data = cpu_to_le16(data);
>>> +            break;
>>> +        case 4:
>>> +            data = cpu_to_le32(data);
>>> +            break;
>>> +        case 8:
>>> +            data = cpu_to_le64(data);
>>> +            break;
>>
>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>> and LE. So if you're running this on an LE host, you won't swap the
>> value and get a broken result.
>>
>> If you know that the value is always swapped, use bswapxx().
>>
> 
> Actually the code is right and required for a big endian host :-)
> pcilg/pcistg provide access to the PCI config space which is defined
> as PCI byte order (little endian). Since pci_host_config_read_common does
> already a le to cpu conversion we have to convert back to PCI byte order.
> Doing an unconditional swap would be a bug on a little endian host.

Why would it be a bug? The value you end up writing is the contents of a
register and thus doesn't have endianness. So if QEMU was an LE process,
the value of data would be identical to that on a BE QEMU before your swab.
After the swab, it would be bswap'ed on BE, but not on LE. So LE hosts break.
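
To make the point concrete, here is a minimal sketch, not QEMU code. It
assumes cpu_to_le32() behaves as described above (a byte swap on a BE
host, a no-op on an LE host). The names swap32() and model_cpu_to_le32()
are hypothetical, and the host byte order is passed as a parameter only so
that both cases can be compared in one program.

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain 32-bit byte reversal; the result does not depend on the host. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Hypothetical model of cpu_to_le32(). The real helper is chosen at
 * compile time; here the host byte order is an explicit argument.
 */
static uint32_t model_cpu_to_le32(bool host_is_be, uint32_t v)
{
    return host_is_be ? swap32(v) : v;
}
```

With the same input value 0x1, the model returns 0x01000000 for a BE host
and 0x1 for an LE host, whereas swap32() returns 0x01000000 in both cases.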


Alex


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 12:10     ` Frank Blaschka
  2014-11-11 12:16       ` Alexander Graf
@ 2014-11-11 12:17       ` Peter Maydell
  2014-11-11 12:40         ` Frank Blaschka
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2014-11-11 12:17 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: Frank Blaschka, James Hogan, Marcelo Tosatti, QEMU Developers,
	Alexander Graf, Christian Borntraeger, Cornelia Huck,
	Paolo Bonzini, Richard Henderson

On 11 November 2014 12:10, Frank Blaschka <blaschka@linux.vnet.ibm.com> wrote:
> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>> > +static uint8_t barsize(uint64_t size)
>> > +{
>> > +    uint64_t mask = 1;
>> > +    int i;
>> > +
>> > +    if (!size) {
>> > +        return 0;
>> > +    }
>> > +
>> > +    for (i = 0; i < 64; i++) {
>> > +        if (size & mask) {
>> > +            break;
>> > +        }
>> > +        mask = (mask << 1);
>> > +    }
>> > +
>> > +    return i;
>> > +}
>>
>> Isn't there an existing helper for this in the PCI layer?
>>
>
> Did not find one, this function is used to fill a s390 specific len
> in an instruction intercept (architecture specific encoding of the len).

If you do need to implement this here then you should probably
be using ctz64(). I think what you have here is equivalent to

    return size ? ctz64(size) : 0;

but you should check that.
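
One way to check it is a small standalone comparison like the sketch
below. It assumes ctz64() returns 64 for a zero argument and otherwise
counts trailing zero bits. ctz64_model() is a hypothetical stand-in built
on the GCC/Clang builtin __builtin_ctzll(), and barsize_loop() is the loop
from the patch.

```c
#include <stdint.h>

/* The loop from the patch, with unchanged behaviour. */
static uint8_t barsize_loop(uint64_t size)
{
    uint64_t mask = 1;
    int i;

    if (!size) {
        return 0;
    }

    for (i = 0; i < 64; i++) {
        if (size & mask) {
            break;
        }
        mask <<= 1;
    }

    return i;
}

/* Hypothetical stand-in for ctz64(): assumed to return 64 for zero. */
static int ctz64_model(uint64_t val)
{
    return val ? __builtin_ctzll(val) : 64;
}

/* The proposed replacement, written against the stand-in. */
static uint8_t barsize_ctz(uint64_t size)
{
    return size ? ctz64_model(size) : 0;
}
```

Both functions return the index of the lowest set bit, and 0 for a size
of 0, so a 4 KiB BAR (0x1000) gives 12 either way.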

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 12:16       ` Alexander Graf
@ 2014-11-11 12:39         ` Frank Blaschka
  2014-11-11 12:51           ` Alexander Graf
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-11 12:39 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth

On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
> 
> 
> On 11.11.14 13:10, Frank Blaschka wrote:
> > On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 10.11.14 15:20, Frank Blaschka wrote:
> >>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>
> >>> This patch implements the s390 pci instructions in qemu. It allows
> >>> to access and drive pci devices attached to the s390 pci bus.
> >>> Because of platform constrains devices using IO BARs are not
> >>> supported. Also a device has to support MSI/MSI-X to run on s390.
> >>>
> >>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>> ---
> >>>  target-s390x/Makefile.objs |   2 +-
> >>>  target-s390x/kvm.c         |  52 ++++
> >>>  target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
> >>>  target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
> >>>  4 files changed, 1141 insertions(+), 1 deletion(-)
> >>>  create mode 100644 target-s390x/pci_ic.c
> >>>  create mode 100644 target-s390x/pci_ic.h
> >>>
> 
> [...]
> 
> >>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> >>> +{
> >>> +    CPUS390XState *env = &cpu->env;
> >>> +    S390PCIBusDevice *pbdev;
> >>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> >>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> >>> +    PciLgStg *rp;
> >>> +    uint64_t offset;
> >>> +    uint64_t data;
> >>> +    uint8_t len;
> >>> +
> >>> +    cpu_synchronize_state(CPU(cpu));
> >>> +
> >>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> >>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (r2 & 0x1) {
> >>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    rp = (PciLgStg *)&env->regs[r2];
> >>> +    offset = env->regs[r2 + 1];
> >>> +
> >>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> >>> +    if (!pbdev) {
> >>> +        DPRINTF("pcilg no pci dev\n");
> >>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    len = rp->len & 0xF;
> >>> +    if (rp->pcias < 6) {
> >>> +        if ((8 - (offset & 0x7)) < len) {
> >>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>> +            return 0;
> >>> +        }
> >>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> >>> +        io_mem_read(mr, offset, &data, len);
> >>> +    } else if (rp->pcias == 15) {
> >>> +        if ((4 - (offset & 0x3)) < len) {
> >>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>> +            return 0;
> >>> +        }
> >>> +        data =  pci_host_config_read_common(
> >>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
> >>> +
> >>> +        switch (len) {
> >>> +        case 1:
> >>> +            break;
> >>> +        case 2:
> >>> +            data = cpu_to_le16(data);
> >>> +            break;
> >>> +        case 4:
> >>> +            data = cpu_to_le32(data);
> >>> +            break;
> >>> +        case 8:
> >>> +            data = cpu_to_le64(data);
> >>> +            break;
> >>
> >> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
> >> and LE. So if you're running this on an LE host, you won't swap the
> >> value and get a broken result.
> >>
> >> If you know that the value is always swapped, use bswapxx().
> >>
> > 
> > Actually the code is right and required for a big endian host :-)
> > pcilg/pcistg provide access to the PCI config space which is defined
> > as PCI byte order (little endian). Since pci_host_config_read_common does
> > already a le to cpu conversion we have to convert back to PCI byte order.
> > Doing an unconditional swap would be a bug on a little endian host.
> 
> Why would it be a bug? The value you end up writing is contents of a
> register and thus doesn't have endianness. So if QEMU was an LE process,

No, the s390 guest executing the pcilg instruction expects to receive
config space data in PCI byte order.

> the value of data would be identical as on a BE QEMU before your swab.
> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
>

Again, on a BE host we do the swap because pci_host_config_read_common
reads the value and byte-swaps it, but we need PCI byte order here, not BE.

On an LE host pci_host_config_read_common does not do a byte swap, so we do
not have to convert back to PCI byte order.
 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 12:17       ` Peter Maydell
@ 2014-11-11 12:40         ` Frank Blaschka
  0 siblings, 0 replies; 27+ messages in thread
From: Frank Blaschka @ 2014-11-11 12:40 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Frank Blaschka, James Hogan, Marcelo Tosatti, Alexander Graf,
	QEMU Developers, Christian Borntraeger, Cornelia Huck,
	Paolo Bonzini, Richard Henderson

On Tue, Nov 11, 2014 at 12:17:17PM +0000, Peter Maydell wrote:
> On 11 November 2014 12:10, Frank Blaschka <blaschka@linux.vnet.ibm.com> wrote:
> > On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> >> > +static uint8_t barsize(uint64_t size)
> >> > +{
> >> > +    uint64_t mask = 1;
> >> > +    int i;
> >> > +
> >> > +    if (!size) {
> >> > +        return 0;
> >> > +    }
> >> > +
> >> > +    for (i = 0; i < 64; i++) {
> >> > +        if (size & mask) {
> >> > +            break;
> >> > +        }
> >> > +        mask = (mask << 1);
> >> > +    }
> >> > +
> >> > +    return i;
> >> > +}
> >>
> >> Isn't there an existing helper for this in the PCI layer?
> >>
> >
> > Did not find one, this function is used to fill a s390 specific len
> > in an instruction intercept (architecture specific encoding of the len).
> 
> If you do need to implement this here then you should probably
> be using ctz64(). I think what you have here is equivalent to
> 
>     return size ? ctz64(size) : 0;
> 
> but you should check that.

will do thx!

> 
> thanks
> -- PMM
> 


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 12:39         ` Frank Blaschka
@ 2014-11-11 12:51           ` Alexander Graf
  2014-11-11 14:08             ` Frank Blaschka
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-11 12:51 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com,
	pbonzini@redhat.com, rth@twiddle.net




> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
> 
>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>> 
>> 
>>> On 11.11.14 13:10, Frank Blaschka wrote:
>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>>> 
>>>> 
>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>> 
>>>>> This patch implements the s390 pci instructions in qemu. It allows
>>>>> to access and drive pci devices attached to the s390 pci bus.
>>>>> Because of platform constrains devices using IO BARs are not
>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>>> 
>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>> ---
>>>>> target-s390x/Makefile.objs |   2 +-
>>>>> target-s390x/kvm.c         |  52 ++++
>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>>>> create mode 100644 target-s390x/pci_ic.c
>>>>> create mode 100644 target-s390x/pci_ic.h
>>>>> 
>> 
>> [...]
>> 
>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>>>> +{
>>>>> +    CPUS390XState *env = &cpu->env;
>>>>> +    S390PCIBusDevice *pbdev;
>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>>>>> +    PciLgStg *rp;
>>>>> +    uint64_t offset;
>>>>> +    uint64_t data;
>>>>> +    uint8_t len;
>>>>> +
>>>>> +    cpu_synchronize_state(CPU(cpu));
>>>>> +
>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    if (r2 & 0x1) {
>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    rp = (PciLgStg *)&env->regs[r2];
>>>>> +    offset = env->regs[r2 + 1];
>>>>> +
>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>>>> +    if (!pbdev) {
>>>>> +        DPRINTF("pcilg no pci dev\n");
>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    len = rp->len & 0xF;
>>>>> +    if (rp->pcias < 6) {
>>>>> +        if ((8 - (offset & 0x7)) < len) {
>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>> +            return 0;
>>>>> +        }
>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
>>>>> +        io_mem_read(mr, offset, &data, len);
>>>>> +    } else if (rp->pcias == 15) {
>>>>> +        if ((4 - (offset & 0x3)) < len) {
>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>> +            return 0;
>>>>> +        }
>>>>> +        data =  pci_host_config_read_common(
>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
>>>>> +
>>>>> +        switch (len) {
>>>>> +        case 1:
>>>>> +            break;
>>>>> +        case 2:
>>>>> +            data = cpu_to_le16(data);
>>>>> +            break;
>>>>> +        case 4:
>>>>> +            data = cpu_to_le32(data);
>>>>> +            break;
>>>>> +        case 8:
>>>>> +            data = cpu_to_le64(data);
>>>>> +            break;
>>>> 
>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>>>> and LE. So if you're running this on an LE host, you won't swap the
>>>> value and get a broken result.
>>>> 
>>>> If you know that the value is always swapped, use bswapxx().
>>>> 
>>> 
>>> Actually the code is right and required for a big endian host :-)
>>> pcilg/pcistg provide access to the PCI config space which is defined
>>> as PCI byte order (little endian). Since pci_host_config_read_common does
>>> already a le to cpu conversion we have to convert back to PCI byte order.
>>> Doing an unconditional swap would be a bug on a little endian host.
>> 
>> Why would it be a bug? The value you end up writing is contents of a
>> register and thus doesn't have endianness. So if QEMU was an LE process,
> 
> No, the s390 guest executing pcilg instruction expects to receive config space data
> in PCI byte order.
> 
>> the value of data would be identical as on a BE QEMU before your swab.
>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
>> 
> 
> Again on BE endian host we do the swap because of pci_host_config_read_common does
> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
> 
> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
> convert back to PCI byte order.

We always maintain the PCI config space in LE byte order in memory, that's
why there is a bswap in its read function. The return result of the read
function however is always the same, regardless of LE or BE host. If I do
a read of size 4, I will always get 0x1, not 0x01000000 returned.

So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
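
As a rough illustration of that argument (a sketch under stated
assumptions, not the QEMU implementation): config space is modelled as
bytes stored in LE order, and the memcpy() plus le32_to_cpu() read path is
modelled with the host byte order as a parameter. All function names below
are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain 32-bit byte reversal; the result does not depend on the host. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Model of "memcpy(&val, p, 4)" on a host of the given byte order:
 * the same four bytes give a different integer on BE and LE hosts.
 */
static uint32_t model_memcpy_load32(bool host_is_be, const uint8_t *p)
{
    if (host_is_be) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Model of le32_to_cpu(): swap on a BE host, no-op on an LE host. */
static uint32_t model_le32_to_cpu(bool host_is_be, uint32_t v)
{
    return host_is_be ? swap32(v) : v;
}

/* Config read: LE bytes in memory, host-independent value out. */
static uint32_t model_config_read32(bool host_is_be, const uint8_t *cfg)
{
    return model_le32_to_cpu(host_is_be,
                             model_memcpy_load32(host_is_be, cfg));
}

/* Following the argument above: swap unconditionally for the guest. */
static uint32_t guest_register_value(bool host_is_be, const uint8_t *cfg)
{
    return swap32(model_config_read32(host_is_be, cfg));
}
```

For config bytes 01 00 00 00 the modelled read returns 0x1 on both host
types, and only the unconditional swap then yields 0x01000000 on both.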


Alex


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 12:51           ` Alexander Graf
@ 2014-11-11 14:08             ` Frank Blaschka
  2014-11-11 15:24               ` Alexander Graf
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-11 14:08 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com,
	pbonzini@redhat.com, rth@twiddle.net

On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
> 
> 
> 
> > Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
> > 
> >> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
> >> 
> >> 
> >>> On 11.11.14 13:10, Frank Blaschka wrote:
> >>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> >>>> 
> >>>> 
> >>>>> On 10.11.14 15:20, Frank Blaschka wrote:
> >>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>> 
> >>>>> This patch implements the s390 pci instructions in qemu. It allows
> >>>>> to access and drive pci devices attached to the s390 pci bus.
> >>>>> Because of platform constrains devices using IO BARs are not
> >>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
> >>>>> 
> >>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>> ---
> >>>>> target-s390x/Makefile.objs |   2 +-
> >>>>> target-s390x/kvm.c         |  52 ++++
> >>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
> >>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
> >>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
> >>>>> create mode 100644 target-s390x/pci_ic.c
> >>>>> create mode 100644 target-s390x/pci_ic.h
> >>>>> 
> >> 
> >> [...]
> >> 
> >>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> >>>>> +{
> >>>>> +    CPUS390XState *env = &cpu->env;
> >>>>> +    S390PCIBusDevice *pbdev;
> >>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> >>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> >>>>> +    PciLgStg *rp;
> >>>>> +    uint64_t offset;
> >>>>> +    uint64_t data;
> >>>>> +    uint8_t len;
> >>>>> +
> >>>>> +    cpu_synchronize_state(CPU(cpu));
> >>>>> +
> >>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> >>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (r2 & 0x1) {
> >>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    rp = (PciLgStg *)&env->regs[r2];
> >>>>> +    offset = env->regs[r2 + 1];
> >>>>> +
> >>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> >>>>> +    if (!pbdev) {
> >>>>> +        DPRINTF("pcilg no pci dev\n");
> >>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    len = rp->len & 0xF;
> >>>>> +    if (rp->pcias < 6) {
> >>>>> +        if ((8 - (offset & 0x7)) < len) {
> >>>>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>>>> +            return 0;
> >>>>> +        }
> >>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> >>>>> +        io_mem_read(mr, offset, &data, len);
> >>>>> +    } else if (rp->pcias == 15) {
> >>>>> +        if ((4 - (offset & 0x3)) < len) {
> >>>>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>>>> +            return 0;
> >>>>> +        }
> >>>>> +        data =  pci_host_config_read_common(
> >>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
> >>>>> +
> >>>>> +        switch (len) {
> >>>>> +        case 1:
> >>>>> +            break;
> >>>>> +        case 2:
> >>>>> +            data = cpu_to_le16(data);
> >>>>> +            break;
> >>>>> +        case 4:
> >>>>> +            data = cpu_to_le32(data);
> >>>>> +            break;
> >>>>> +        case 8:
> >>>>> +            data = cpu_to_le64(data);
> >>>>> +            break;
> >>>> 
> >>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
> >>>> and LE. So if you're running this on an LE host, you won't swap the
> >>>> value and get a broken result.
> >>>> 
> >>>> If you know that the value is always swapped, use bswapxx().
> >>>> 
> >>> 
> >>> Actually the code is right and required for a big endian host :-)
> >>> pcilg/pcistg provide access to the PCI config space which is defined
> >>> as PCI byte order (little endian). Since pci_host_config_read_common does
> >>> already a le to cpu conversion we have to convert back to PCI byte order.
> >>> Doing an unconditional swap would be a bug on a little endian host.
> >> 
> >> Why would it be a bug? The value you end up writing is contents of a
> >> register and thus doesn't have endianness. So if QEMU was an LE process,
> > 
> > No, the s390 guest executing pcilg instruction expects to receive config space data
> > in PCI byte order.
> > 
> >> the value of data would be identical as on a BE QEMU before your swab.
> >> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
> >> 
> > 
> > Again on BE endian host we do the swap because of pci_host_config_read_common does
> > read the value and do a byte swap for that value, but we need PCI byte order not BE here.
> > 
> > On LE host pci_host_config_read_common does not do a byte swap so we do not have to
> > convert back to PCI byte order.
> 
> We maintain the PCI config space always in LE byte order in memory, that's why there is a bwap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
> 
> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
> 
So you are telling me that pci_host_config_read_common does not end up in
pci_default_read_config?

uint32_t pci_default_read_config(PCIDevice *d,
                                 uint32_t address, int len)
{
    uint32_t val = 0;

    memcpy(&val, d->config + address, len);
    return le32_to_cpu(val);
}

What did I miss?

> 
> Alex
> 
> 


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 14:08             ` Frank Blaschka
@ 2014-11-11 15:24               ` Alexander Graf
  2014-11-12  8:49                 ` Frank Blaschka
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-11 15:24 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com,
	pbonzini@redhat.com, rth@twiddle.net



On 11.11.14 15:08, Frank Blaschka wrote:
> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>>
>>
>>
>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
>>>
>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>>> On 11.11.14 13:10, Frank Blaschka wrote:
>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>
>>>>>>> This patch implements the s390 pci instructions in qemu. It allows
>>>>>>> to access and drive pci devices attached to the s390 pci bus.
>>>>>>> Because of platform constrains devices using IO BARs are not
>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>>>>>
>>>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>> ---
>>>>>>> target-s390x/Makefile.objs |   2 +-
>>>>>>> target-s390x/kvm.c         |  52 ++++
>>>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>>>>>> create mode 100644 target-s390x/pci_ic.c
>>>>>>> create mode 100644 target-s390x/pci_ic.h
>>>>>>>
>>>>
>>>> [...]
>>>>
>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>>>>>> +{
>>>>>>> +    CPUS390XState *env = &cpu->env;
>>>>>>> +    S390PCIBusDevice *pbdev;
>>>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
>>>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>>>>>>> +    PciLgStg *rp;
>>>>>>> +    uint64_t offset;
>>>>>>> +    uint64_t data;
>>>>>>> +    uint8_t len;
>>>>>>> +
>>>>>>> +    cpu_synchronize_state(CPU(cpu));
>>>>>>> +
>>>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
>>>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (r2 & 0x1) {
>>>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    rp = (PciLgStg *)&env->regs[r2];
>>>>>>> +    offset = env->regs[r2 + 1];
>>>>>>> +
>>>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>>>>>> +    if (!pbdev) {
>>>>>>> +        DPRINTF("pcilg no pci dev\n");
>>>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    len = rp->len & 0xF;
>>>>>>> +    if (rp->pcias < 6) {
>>>>>>> +        if ((8 - (offset & 0x7)) < len) {
>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>> +            return 0;
>>>>>>> +        }
>>>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
>>>>>>> +        io_mem_read(mr, offset, &data, len);
>>>>>>> +    } else if (rp->pcias == 15) {
>>>>>>> +        if ((4 - (offset & 0x3)) < len) {
>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>> +            return 0;
>>>>>>> +        }
>>>>>>> +        data =  pci_host_config_read_common(
>>>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
>>>>>>> +
>>>>>>> +        switch (len) {
>>>>>>> +        case 1:
>>>>>>> +            break;
>>>>>>> +        case 2:
>>>>>>> +            data = cpu_to_le16(data);
>>>>>>> +            break;
>>>>>>> +        case 4:
>>>>>>> +            data = cpu_to_le32(data);
>>>>>>> +            break;
>>>>>>> +        case 8:
>>>>>>> +            data = cpu_to_le64(data);
>>>>>>> +            break;
>>>>>>
>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>>>>>> and LE. So if you're running this on an LE host, you won't swap the
>>>>>> value and get a broken result.
>>>>>>
>>>>>> If you know that the value is always swapped, use bswapxx().
>>>>>>
>>>>>
>>>>> Actually the code is right and required for a big endian host :-)
>>>>> pcilg/pcistg provide access to the PCI config space which is defined
>>>>> as PCI byte order (little endian). Since pci_host_config_read_common does
>>>>> already a le to cpu conversion we have to convert back to PCI byte order.
>>>>> Doing an unconditional swap would be a bug on a little endian host.
>>>>
>>>> Why would it be a bug? The value you end up writing is contents of a
>>>> register and thus doesn't have endianness. So if QEMU was an LE process,
>>>
>>> No, the s390 guest executing pcilg instruction expects to receive config space data
>>> in PCI byte order.
>>>
>>>> the value of data would be identical as on a BE QEMU before your swab.
>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
>>>>
>>>
>>> Again on BE endian host we do the swap because of pci_host_config_read_common does
>>> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
>>>
>>> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
>>> convert back to PCI byte order.
>>
>> We maintain the PCI config space always in LE byte order in memory, that's why there is a bwap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
>>
>> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
>>
> so you tell me pci_host_config_read_common does not end up in pci_default_read_config?
> 
> uint32_t pci_default_read_config(PCIDevice *d,
>                                  uint32_t address, int len)
> {
>     uint32_t val = 0;
> 
>     memcpy(&val, d->config + address, len);
>     return le32_to_cpu(val);
> }
> 
> What did I miss?

That's exactly where you end up, and it's there to convert from the
PCI config space backing storage to a native number.

Imagine you write 0x12345678 at offset 0. Because PCI config space is
defined to be LE, in the PCI config space memory this gets stored as

78 56 34 12

The reason we store the config space internally that way is
that it's (in some PCI implementations) legal to access it with single-byte
granularity. So you could do a pci_config_read(offset = 1), which
should return 0x56.

However, that means we completely nullify any effect of host endianness
in the PCI config layer already. So if you do pci_config_write(offset =
0, size = 4, value = 0x12345678), the contents of d->config will always
be identical, regardless of host endianness. The same holds true for
pci_config_read(offset = 0, size = 4). It will always return 0x12345678.
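
To make that concrete, here is a minimal sketch, not QEMU's actual code.
cfg_write32() and cfg_read() are hypothetical stand-ins for the config
space accessors, written byte by byte so that the result cannot depend on
the byte order of the host the code is compiled for:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of the backing store standing in for d->config. */
enum { CFG_SIZE = 256 };

/* Store a 32-bit value in little-endian order, one byte at a time. */
void cfg_write32(uint8_t *cfg, size_t off, uint32_t val)
{
    assert(off + 4 <= CFG_SIZE);
    cfg[off + 0] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)((val >> 8) & 0xff);
    cfg[off + 2] = (uint8_t)((val >> 16) & 0xff);
    cfg[off + 3] = (uint8_t)((val >> 24) & 0xff);
}

/* Read len (1, 2 or 4) little-endian bytes back as a native number. */
uint32_t cfg_read(const uint8_t *cfg, size_t off, int len)
{
    uint32_t val = 0;

    assert(len == 1 || len == 2 || len == 4);
    assert(off + (size_t)len <= CFG_SIZE);
    for (int i = 0; i < len; i++) {
        val |= (uint32_t)cfg[off + i] << (8 * i);
    }
    return val;
}
```

Writing 0x12345678 at offset 0 leaves 78 56 34 12 in the buffer, a
one-byte read at offset 1 gives 0x56, and a four-byte read at offset 0
gives 0x12345678 again, on any host.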

In your code, you swab that value again. I assume there's a reason
you're swapping it and that it's the way the architecture expects it
(would you mind pointing me to the respective spec so I can verify?). But if the
architecture expects it, then it expects it regardless of host
endianness. The contents of regs[r1] should always be 0x78563412, no
matter whether we're in an LE or a BE environment.
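
A small sketch of the difference, under the assumption that the config
read returned 0x12345678. model_cpu_to_le32() is a hypothetical model of
the cpu_to_le32() macro with the host byte order passed in as a
parameter (the real macro has no such parameter), and swap32() stands in
for an unconditional bswap32():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Model of cpu_to_le32(): a no-op on an LE host, a swap on a BE host.
 * host_is_be is an illustrative parameter only.
 */
uint32_t model_cpu_to_le32(uint32_t v, bool host_is_be)
{
    return host_is_be ? swap32(v) : v;
}

/* On a BE host the two approaches agree for every input value. */
void check_be_host_agrees(uint32_t v)
{
    assert(model_cpu_to_le32(v, true) == swap32(v));
}
```

With 0x12345678 as input, swap32() yields 0x78563412 everywhere, while
the modelled cpu_to_le32() yields 0x78563412 on BE but leaves
0x12345678 untouched on LE.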

Does that make sense now?


Alex

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-11 15:24               ` Alexander Graf
@ 2014-11-12  8:49                 ` Frank Blaschka
  2014-11-12  9:08                   ` Alexander Graf
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-12  8:49 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com,
	pbonzini@redhat.com, rth@twiddle.net

On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
> 
> 
> On 11.11.14 15:08, Frank Blaschka wrote:
> > On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
> >>
> >>
> >>
> >>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
> >>>
> >>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
> >>>>
> >>>>
> >>>>> On 11.11.14 13:10, Frank Blaschka wrote:
> >>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> >>>>>>
> >>>>>>
> >>>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
> >>>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>>>>
> >>>>>>> This patch implements the s390 pci instructions in qemu. It allows
> >>>>>>> to access and drive pci devices attached to the s390 pci bus.
> >>>>>>> Because of platform constrains devices using IO BARs are not
> >>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
> >>>>>>>
> >>>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>>>> ---
> >>>>>>> target-s390x/Makefile.objs |   2 +-
> >>>>>>> target-s390x/kvm.c         |  52 ++++
> >>>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
> >>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
> >>>>>>> create mode 100644 target-s390x/pci_ic.c
> >>>>>>> create mode 100644 target-s390x/pci_ic.h
> >>>>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> >>>>>>> +{
> >>>>>>> +    CPUS390XState *env = &cpu->env;
> >>>>>>> +    S390PCIBusDevice *pbdev;
> >>>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> >>>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> >>>>>>> +    PciLgStg *rp;
> >>>>>>> +    uint64_t offset;
> >>>>>>> +    uint64_t data;
> >>>>>>> +    uint8_t len;
> >>>>>>> +
> >>>>>>> +    cpu_synchronize_state(CPU(cpu));
> >>>>>>> +
> >>>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> >>>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> >>>>>>> +        return 0;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    if (r2 & 0x1) {
> >>>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
> >>>>>>> +        return 0;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    rp = (PciLgStg *)&env->regs[r2];
> >>>>>>> +    offset = env->regs[r2 + 1];
> >>>>>>> +
> >>>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> >>>>>>> +    if (!pbdev) {
> >>>>>>> +        DPRINTF("pcilg no pci dev\n");
> >>>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> >>>>>>> +        return 0;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    len = rp->len & 0xF;
> >>>>>>> +    if (rp->pcias < 6) {
> >>>>>>> +        if ((8 - (offset & 0x7)) < len) {
> >>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>>>>>> +            return 0;
> >>>>>>> +        }
> >>>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> >>>>>>> +        io_mem_read(mr, offset, &data, len);
> >>>>>>> +    } else if (rp->pcias == 15) {
> >>>>>>> +        if ((4 - (offset & 0x3)) < len) {
> >>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>>>>>> +            return 0;
> >>>>>>> +        }
> >>>>>>> +        data =  pci_host_config_read_common(
> >>>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
> >>>>>>> +
> >>>>>>> +        switch (len) {
> >>>>>>> +        case 1:
> >>>>>>> +            break;
> >>>>>>> +        case 2:
> >>>>>>> +            data = cpu_to_le16(data);
> >>>>>>> +            break;
> >>>>>>> +        case 4:
> >>>>>>> +            data = cpu_to_le32(data);
> >>>>>>> +            break;
> >>>>>>> +        case 8:
> >>>>>>> +            data = cpu_to_le64(data);
> >>>>>>> +            break;
> >>>>>>
> >>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
> >>>>>> and LE. So if you're running this on an LE host, you won't swap the
> >>>>>> value and get a broken result.
> >>>>>>
> >>>>>> If you know that the value is always swapped, use bswapxx().
> >>>>>>
> >>>>>
> >>>>> Actually the code is right and required for a big endian host :-)
> >>>>> pcilg/pcistg provide access to the PCI config space which is defined
> >>>>> as PCI byte order (little endian). Since pci_host_config_read_common does
> >>>>> already a le to cpu conversion we have to convert back to PCI byte order.
> >>>>> Doing an unconditional swap would be a bug on a little endian host.
> >>>>
> >>>> Why would it be a bug? The value you end up writing is contents of a
> >>>> register and thus doesn't have endianness. So if QEMU was an LE process,
> >>>
> >>> No, the s390 guest executing pcilg instruction expects to receive config space data
> >>> in PCI byte order.
> >>>
> >>>> the value of data would be identical as on a BE QEMU before your swab.
> >>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
> >>>>
> >>>
> >>> Again on BE endian host we do the swap because of pci_host_config_read_common does
> >>> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
> >>>
> >>> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
> >>> convert back to PCI byte order.
> >>
> >> We maintain the PCI config space always in LE byte order in memory, that's why there is a bwap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
> >>
> >> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
> >>
> > so you tell me pci_host_config_read_common does not end up in pci_default_read_config?
> > 
> > uint32_t pci_default_read_config(PCIDevice *d,
> >                                  uint32_t address, int len)
> > {
> >     uint32_t val = 0;
> > 
> >     memcpy(&val, d->config + address, len);
> >     return le32_to_cpu(val);
> > }
> > 
> > What did I miss?
> 
> That's exactly where you end up in - and it's there to convert from the
> PCI config space backing storage to a native number.
> 
> Imagine you write 0x12345678 at offset 0. Because PCI config space is
> defined to be LE, in the PCI config space memory this gets stored as
> 
> 78 56 34 12
> 
> The reason we do the internal storage of the config space that way is
> that it's (in some PCI implementations) legal to access with single byte
> granularities. So you could do a pci_config_read(offset = 1) which
> should return 0x56.
> 
> However, that means we completely nullify any effect of host endianness
> in the PCI config layer already. So if you do pci_config_write(offset =
> 0, size = 4, value = 0x12345678), the contents of d->config will always
> be identical, regardless of host endianness. The same holds true for
> pci_config_read(offset = 0, size = 4). It will always return 0x12345678.
>

I understood this from the beginning and I completely agree with this.
 
> In your code, you swab that value again. I assume there's a reason
> you're swapping it and that it's the way the architecture expects it

Yes, the s390 pcilg architecture states:
Data in the PCI configuration space are treated
as being in little-endian byte ordering

> (mind to point me to the respective spec so I can verify?). But if the
> architecture expects it, then it expects it regardless of host
> endianness. The contents of regs[r1] should always be 0x78563412, no
> matter whether we're in an LE or a BE environment.
> 
> Does that make sense now?
> 
Absolutely. Let's work through an example for QEMU running on BE and LE hosts:

byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
BE            0x78563412             0x12345678                0x78563412
LE            0x78563412             0x78563412                0x78563412

So what is the problem with my code? Adding an unconditional byte swap instead of
cpu_to_le in pcilg would break the pcilg architecture if QEMU is running on an LE
platform.

> 
> Alex
> 

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-12  8:49                 ` Frank Blaschka
@ 2014-11-12  9:08                   ` Alexander Graf
  2014-11-12  9:11                     ` Paolo Bonzini
  2014-11-12  9:19                     ` Frank Blaschka
  0 siblings, 2 replies; 27+ messages in thread
From: Alexander Graf @ 2014-11-12  9:08 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com,
	pbonzini@redhat.com, rth@twiddle.net



On 12.11.14 09:49, Frank Blaschka wrote:
> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
>>
>>
>> On 11.11.14 15:08, Frank Blaschka wrote:
>>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>>
>>>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
>>>>>
>>>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>>> On 11.11.14 13:10, Frank Blaschka wrote:
>>>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>>>>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>>
>>>>>>>>> This patch implements the s390 pci instructions in qemu. It allows
>>>>>>>>> to access and drive pci devices attached to the s390 pci bus.
>>>>>>>>> Because of platform constrains devices using IO BARs are not
>>>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>> ---
>>>>>>>>> target-s390x/Makefile.objs |   2 +-
>>>>>>>>> target-s390x/kvm.c         |  52 ++++
>>>>>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>>>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>>>>>>>> create mode 100644 target-s390x/pci_ic.c
>>>>>>>>> create mode 100644 target-s390x/pci_ic.h
>>>>>>>>>
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>>>>>>>> +{
>>>>>>>>> +    CPUS390XState *env = &cpu->env;
>>>>>>>>> +    S390PCIBusDevice *pbdev;
>>>>>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
>>>>>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>>>>>>>>> +    PciLgStg *rp;
>>>>>>>>> +    uint64_t offset;
>>>>>>>>> +    uint64_t data;
>>>>>>>>> +    uint8_t len;
>>>>>>>>> +
>>>>>>>>> +    cpu_synchronize_state(CPU(cpu));
>>>>>>>>> +
>>>>>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
>>>>>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
>>>>>>>>> +        return 0;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    if (r2 & 0x1) {
>>>>>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
>>>>>>>>> +        return 0;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    rp = (PciLgStg *)&env->regs[r2];
>>>>>>>>> +    offset = env->regs[r2 + 1];
>>>>>>>>> +
>>>>>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>>>>>>>> +    if (!pbdev) {
>>>>>>>>> +        DPRINTF("pcilg no pci dev\n");
>>>>>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>>>>>>>> +        return 0;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    len = rp->len & 0xF;
>>>>>>>>> +    if (rp->pcias < 6) {
>>>>>>>>> +        if ((8 - (offset & 0x7)) < len) {
>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>> +            return 0;
>>>>>>>>> +        }
>>>>>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
>>>>>>>>> +        io_mem_read(mr, offset, &data, len);
>>>>>>>>> +    } else if (rp->pcias == 15) {
>>>>>>>>> +        if ((4 - (offset & 0x3)) < len) {
>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>> +            return 0;
>>>>>>>>> +        }
>>>>>>>>> +        data =  pci_host_config_read_common(
>>>>>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
>>>>>>>>> +
>>>>>>>>> +        switch (len) {
>>>>>>>>> +        case 1:
>>>>>>>>> +            break;
>>>>>>>>> +        case 2:
>>>>>>>>> +            data = cpu_to_le16(data);
>>>>>>>>> +            break;
>>>>>>>>> +        case 4:
>>>>>>>>> +            data = cpu_to_le32(data);
>>>>>>>>> +            break;
>>>>>>>>> +        case 8:
>>>>>>>>> +            data = cpu_to_le64(data);
>>>>>>>>> +            break;
>>>>>>>>
>>>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>>>>>>>> and LE. So if you're running this on an LE host, you won't swap the
>>>>>>>> value and get a broken result.
>>>>>>>>
>>>>>>>> If you know that the value is always swapped, use bswapxx().
>>>>>>>>
>>>>>>>
>>>>>>> Actually the code is right and required for a big endian host :-)
>>>>>>> pcilg/pcistg provide access to the PCI config space which is defined
>>>>>>> as PCI byte order (little endian). Since pci_host_config_read_common does
>>>>>>> already a le to cpu conversion we have to convert back to PCI byte order.
>>>>>>> Doing an unconditional swap would be a bug on a little endian host.
>>>>>>
>>>>>> Why would it be a bug? The value you end up writing is contents of a
>>>>>> register and thus doesn't have endianness. So if QEMU was an LE process,
>>>>>
>>>>> No, the s390 guest executing pcilg instruction expects to receive config space data
>>>>> in PCI byte order.
>>>>>
>>>>>> the value of data would be identical as on a BE QEMU before your swab.
>>>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
>>>>>>
>>>>>
>>>>> Again on BE endian host we do the swap because of pci_host_config_read_common does
>>>>> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
>>>>>
>>>>> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
>>>>> convert back to PCI byte order.
>>>>
>>>> We maintain the PCI config space always in LE byte order in memory, that's why there is a bwap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
>>>>
>>>> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
>>>>
>>> so you tell me pci_host_config_read_common does not end up in pci_default_read_config?
>>>
>>> uint32_t pci_default_read_config(PCIDevice *d,
>>>                                  uint32_t address, int len)
>>> {
>>>     uint32_t val = 0;
>>>
>>>     memcpy(&val, d->config + address, len);
>>>     return le32_to_cpu(val);
>>> }
>>>
>>> What did I miss?
>>
>> That's exactly where you end up in - and it's there to convert from the
>> PCI config space backing storage to a native number.
>>
>> Imagine you write 0x12345678 at offset 0. Because PCI config space is
>> defined to be LE, in the PCI config space memory this gets stored as
>>
>> 78 56 34 12
>>
>> The reason we do the internal storage of the config space that way is
>> that it's (in some PCI implementations) legal to access with single byte
>> granularities. So you could do a pci_config_read(offset = 1) which
>> should return 0x56.
>>
>> However, that means we completely nullify any effect of host endianness
>> in the PCI config layer already. So if you do pci_config_write(offset =
>> 0, size = 4, value = 0x12345678), the contents of d->config will always
>> be identical, regardless of host endianness. The same holds true for
>> pci_config_read(offset = 0, size = 4). It will always return 0x12345678.
>>
> 
> I understood this from the beginning and I completely agree to this.
>  
>> In your code, you swab that value again. I assume there's a reason
>> you're swapping it and that it's the way the architecture expects it
> 
> Yes, s390 pcilg architecture states:
> Data in the PCI configuration space are treated
> as being in little-endian byte ordering
> 
>> (mind to point me to the respective spec so I can verify?). But if the
>> architecture expects it, then it expects it regardless of host
>> endianness. The contents of regs[r1] should always be 0x78563412, no
>> matter whether we're in an LE or a BE environment.
>>
>> Does that make sense now?
>>
> Absolutely lets make an example for qemu running on BE and LE
> 
> byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
> BE            0x78563412             0x12345678                0x78563412
> LE            0x78563412             0x78563412                0x78563412

No, pci_default_read_config() always returns 0x12345678 because it
returns a register, not memory.
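
As a rough model of the 4-byte path through that function (a sketch
only; the model_* helpers and the host_is_be parameter are made up for
illustration and are not QEMU APIs), the memcpy() and the le32_to_cpu()
cancel each other's host dependence:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What memcpy() into a uint32_t yields on a host of the given byte order. */
uint32_t model_native_load32(const uint8_t b[4], bool host_is_be)
{
    assert(b != NULL);
    if (host_is_be) {
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    }
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8) | (uint32_t)b[0];
}

/* Model of le32_to_cpu(): swap on a BE host, no-op on an LE host. */
uint32_t model_le32_to_cpu(uint32_t v, bool host_is_be)
{
    if (!host_is_be) {
        return v;
    }
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* memcpy() followed by le32_to_cpu(), as in the quoted function. */
uint32_t model_read_config32(const uint8_t b[4], bool host_is_be)
{
    return model_le32_to_cpu(model_native_load32(b, host_is_be), host_is_be);
}
```

For backing bytes 78 56 34 12, the intermediate memcpy() result differs
between hosts (0x78563412 on BE, 0x12345678 on LE), but the value
returned after le32_to_cpu() is 0x12345678 on both.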


Alex

> 
> So what is the problem with my code? Adding unconditional byte swap instead of
> cpu_to_le in pcilg would break architecture for pcilg if qemu is running on LE
> platform.
> 
>>
>> Alex
>>
> 

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-12  9:08                   ` Alexander Graf
@ 2014-11-12  9:11                     ` Paolo Bonzini
  2014-11-12  9:13                       ` Alexander Graf
  2014-11-12  9:19                     ` Frank Blaschka
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2014-11-12  9:11 UTC (permalink / raw)
  To: Alexander Graf, Frank Blaschka
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com, rth@twiddle.net



On 12/11/2014 10:08, Alexander Graf wrote:
> 
> 
> On 12.11.14 09:49, Frank Blaschka wrote:
>> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
>>>
>>>
>>> On 11.11.14 15:08, Frank Blaschka wrote:
>>>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
>>>>>>
>>>>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On 11.11.14 13:10, Frank Blaschka wrote:
>>>>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>>>>>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>>>
>>>>>>>>>> This patch implements the s390 pci instructions in qemu. It allows
>>>>>>>>>> to access and drive pci devices attached to the s390 pci bus.
>>>>>>>>>> Because of platform constrains devices using IO BARs are not
>>>>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>>> ---
>>>>>>>>>> target-s390x/Makefile.objs |   2 +-
>>>>>>>>>> target-s390x/kvm.c         |  52 ++++
>>>>>>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>>>>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>>>>>>>>> create mode 100644 target-s390x/pci_ic.c
>>>>>>>>>> create mode 100644 target-s390x/pci_ic.h
>>>>>>>>>>
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>>>>>>>>> +{
>>>>>>>>>> +    CPUS390XState *env = &cpu->env;
>>>>>>>>>> +    S390PCIBusDevice *pbdev;
>>>>>>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
>>>>>>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>>>>>>>>>> +    PciLgStg *rp;
>>>>>>>>>> +    uint64_t offset;
>>>>>>>>>> +    uint64_t data;
>>>>>>>>>> +    uint8_t len;
>>>>>>>>>> +
>>>>>>>>>> +    cpu_synchronize_state(CPU(cpu));
>>>>>>>>>> +
>>>>>>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
>>>>>>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
>>>>>>>>>> +        return 0;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    if (r2 & 0x1) {
>>>>>>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
>>>>>>>>>> +        return 0;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    rp = (PciLgStg *)&env->regs[r2];
>>>>>>>>>> +    offset = env->regs[r2 + 1];
>>>>>>>>>> +
>>>>>>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>>>>>>>>> +    if (!pbdev) {
>>>>>>>>>> +        DPRINTF("pcilg no pci dev\n");
>>>>>>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>>>>>>>>> +        return 0;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    len = rp->len & 0xF;
>>>>>>>>>> +    if (rp->pcias < 6) {
>>>>>>>>>> +        if ((8 - (offset & 0x7)) < len) {
>>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>>> +            return 0;
>>>>>>>>>> +        }
>>>>>>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
>>>>>>>>>> +        io_mem_read(mr, offset, &data, len);
>>>>>>>>>> +    } else if (rp->pcias == 15) {
>>>>>>>>>> +        if ((4 - (offset & 0x3)) < len) {
>>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>>> +            return 0;
>>>>>>>>>> +        }
>>>>>>>>>> +        data =  pci_host_config_read_common(
>>>>>>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
>>>>>>>>>> +
>>>>>>>>>> +        switch (len) {
>>>>>>>>>> +        case 1:
>>>>>>>>>> +            break;
>>>>>>>>>> +        case 2:
>>>>>>>>>> +            data = cpu_to_le16(data);
>>>>>>>>>> +            break;
>>>>>>>>>> +        case 4:
>>>>>>>>>> +            data = cpu_to_le32(data);
>>>>>>>>>> +            break;
>>>>>>>>>> +        case 8:
>>>>>>>>>> +            data = cpu_to_le64(data);
>>>>>>>>>> +            break;
>>>>>>>>>
>>>>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>>>>>>>>> and LE. So if you're running this on an LE host, you won't swap the
>>>>>>>>> value and get a broken result.
>>>>>>>>>
>>>>>>>>> If you know that the value is always swapped, use bswapxx().
>>>>>>>>>
>>>>>>>>
>>>>>>>> Actually the code is right and required for a big endian host :-)
>>>>>>>> pcilg/pcistg provide access to the PCI config space which is defined
>>>>>>>> as PCI byte order (little endian). Since pci_host_config_read_common does
>>>>>>>> already a le to cpu conversion we have to convert back to PCI byte order.
>>>>>>>> Doing an unconditional swap would be a bug on a little endian host.
>>>>>>>
>>>>>>> Why would it be a bug? The value you end up writing is contents of a
>>>>>>> register and thus doesn't have endianness. So if QEMU was an LE process,
>>>>>>
>>>>>> No, the s390 guest executing pcilg instruction expects to receive config space data
>>>>>> in PCI byte order.
>>>>>>
>>>>>>> the value of data would be identical as on a BE QEMU before your swab.
>>>>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
>>>>>>>
>>>>>>
>>>>>> Again on BE endian host we do the swap because of pci_host_config_read_common does
>>>>>> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
>>>>>>
>>>>>> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
>>>>>> convert back to PCI byte order.
>>>>>
>>>>> We maintain the PCI config space always in LE byte order in memory, that's why there is a bwap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
>>>>>
>>>>> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
>>>>>
>>>> so you tell me pci_host_config_read_common does not end up in pci_default_read_config?
>>>>
>>>> uint32_t pci_default_read_config(PCIDevice *d,
>>>>                                  uint32_t address, int len)
>>>> {
>>>>     uint32_t val = 0;
>>>>
>>>>     memcpy(&val, d->config + address, len);
>>>>     return le32_to_cpu(val);
>>>> }
>>>>
>>>> What did I miss?
>>>
>>> That's exactly where you end up in - and it's there to convert from the
>>> PCI config space backing storage to a native number.
>>>
>>> Imagine you write 0x12345678 at offset 0. Because PCI config space is
>>> defined to be LE, in the PCI config space memory this gets stored as
>>>
>>> 78 56 34 12
>>>
>>> The reason we do the internal storage of the config space that way is
>>> that it's (in some PCI implementations) legal to access with single byte
>>> granularities. So you could do a pci_config_read(offset = 1) which
>>> should return 0x56.
>>>
>>> However, that means we completely nullify any effect of host endianness
>>> in the PCI config layer already. So if you do pci_config_write(offset =
>>> 0, size = 4, value = 0x12345678), the contents of d->config will always
>>> be identical, regardless of host endianness. The same holds true for
>>> pci_config_read(offset = 0, size = 4). It will always return 0x12345678.
>>>
>>
>> I understood this from the beginning and I completely agree to this.
>>  
>>> In your code, you swab that value again. I assume there's a reason
>>> you're swapping it and that it's the way the architecture expects it
>>
>> Yes, s390 pcilg architecture states:
>> Data in the PCI configuration space are treated
>> as being in little-endian byte ordering
>>
>>> (mind to point me to the respective spec so I can verify?). But if the
>>> architecture expects it, then it expects it regardless of host
>>> endianness. The contents of regs[r1] should always be 0x78563412, no
>>> matter whether we're in an LE or a BE environment.
>>>
>>> Does that make sense now?
>>>
>> Absolutely lets make an example for qemu running on BE and LE
>>
>> byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
>> BE            0x78563412             0x12345678                0x78563412
>> LE            0x78563412             0x78563412                0x78563412
> 
> No, pci_default_read_config() always returns 0x12345678 because it
> returns a register, not memory.

So:

      config space    pci_default_read_config     pcilg
      (bytes)         memcpy       le32_to_cpu    (with cpu_to_le)
BE    78 56 34 12     0x78563412   0x12345678     0x78563412
LE    78 56 34 12     0x12345678   0x12345678     0x12345678

Right?

Paolo
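
The table above can be reproduced with a small model that runs on any host. This is only a sketch under stated assumptions: raw_load32(), model_le32_conv(), model_config_read32(), model_pcilg_patch() and check_table() are hypothetical names invented for illustration, not QEMU functions, and host byte order is passed in as a flag instead of being detected.

```c
#include <assert.h>
#include <stdint.h>

/* Config space bytes for the value 0x12345678, stored little-endian. */
const uint8_t example_config[4] = { 0x78, 0x56, 0x34, 0x12 };

/* Hypothetical helper: the value a host of the given byte order would
 * hold in a uint32_t after memcpy()ing these four bytes into it. */
static uint32_t raw_load32(const uint8_t *p, int host_is_be)
{
    if (host_is_be) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    }
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Models le32_to_cpu() and cpu_to_le32(): both swap on a BE host and
 * do nothing on an LE host. */
static uint32_t model_le32_conv(uint32_t v, int host_is_be)
{
    return host_is_be ? swap32(v) : v;
}

/* Models a 4-byte pci_default_read_config(): memcpy, then le32_to_cpu. */
static uint32_t model_config_read32(const uint8_t *config, int host_is_be)
{
    return model_le32_conv(raw_load32(config, host_is_be), host_is_be);
}

/* Models the patch under review: cpu_to_le32() on the read result. */
static uint32_t model_pcilg_patch(const uint8_t *config, int host_is_be)
{
    return model_le32_conv(model_config_read32(config, host_is_be),
                           host_is_be);
}

/* Checks every cell of the table, for a BE host (1) and an LE host (0). */
void check_table(void)
{
    assert(raw_load32(example_config, 1) == 0x78563412u);
    assert(raw_load32(example_config, 0) == 0x12345678u);
    assert(model_config_read32(example_config, 1) == 0x12345678u);
    assert(model_config_read32(example_config, 0) == 0x12345678u);
    assert(model_pcilg_patch(example_config, 1) == 0x78563412u);
    assert(model_pcilg_patch(example_config, 0) == 0x12345678u);
}
```

In this model the read result is the same on both hosts, while the cpu_to_le32() step yields different values, which is the discrepancy under discussion.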

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-12  9:11                     ` Paolo Bonzini
@ 2014-11-12  9:13                       ` Alexander Graf
  0 siblings, 0 replies; 27+ messages in thread
From: Alexander Graf @ 2014-11-12  9:13 UTC (permalink / raw)
  To: Paolo Bonzini, Frank Blaschka
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com, rth@twiddle.net



On 12.11.14 10:11, Paolo Bonzini wrote:
> 
> 
> On 12/11/2014 10:08, Alexander Graf wrote:
>>
>>
>> On 12.11.14 09:49, Frank Blaschka wrote:
>>> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 11.11.14 15:08, Frank Blaschka wrote:
>>>>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
>>>>>>>
>>>>>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 11.11.14 13:10, Frank Blaschka wrote:
>>>>>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>>>>>>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>>>>
>>>>>>>>>>> This patch implements the s390 pci instructions in qemu. It allows
>>>>>>>>>>> to access and drive pci devices attached to the s390 pci bus.
>>>>>>>>>>> Because of platform constraints devices using IO BARs are not
>>>>>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>>>> ---
>>>>>>>>>>> target-s390x/Makefile.objs |   2 +-
>>>>>>>>>>> target-s390x/kvm.c         |  52 ++++
>>>>>>>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>>>>>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>>>>>>>>>> create mode 100644 target-s390x/pci_ic.c
>>>>>>>>>>> create mode 100644 target-s390x/pci_ic.h
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>>>>>>>>>> +{
>>>>>>>>>>> +    CPUS390XState *env = &cpu->env;
>>>>>>>>>>> +    S390PCIBusDevice *pbdev;
>>>>>>>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
>>>>>>>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>>>>>>>>>>> +    PciLgStg *rp;
>>>>>>>>>>> +    uint64_t offset;
>>>>>>>>>>> +    uint64_t data;
>>>>>>>>>>> +    uint8_t len;
>>>>>>>>>>> +
>>>>>>>>>>> +    cpu_synchronize_state(CPU(cpu));
>>>>>>>>>>> +
>>>>>>>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
>>>>>>>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
>>>>>>>>>>> +        return 0;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    if (r2 & 0x1) {
>>>>>>>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
>>>>>>>>>>> +        return 0;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    rp = (PciLgStg *)&env->regs[r2];
>>>>>>>>>>> +    offset = env->regs[r2 + 1];
>>>>>>>>>>> +
>>>>>>>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>>>>>>>>>> +    if (!pbdev) {
>>>>>>>>>>> +        DPRINTF("pcilg no pci dev\n");
>>>>>>>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>>>>>>>>>> +        return 0;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    len = rp->len & 0xF;
>>>>>>>>>>> +    if (rp->pcias < 6) {
>>>>>>>>>>> +        if ((8 - (offset & 0x7)) < len) {
>>>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>>>> +            return 0;
>>>>>>>>>>> +        }
>>>>>>>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
>>>>>>>>>>> +        io_mem_read(mr, offset, &data, len);
>>>>>>>>>>> +    } else if (rp->pcias == 15) {
>>>>>>>>>>> +        if ((4 - (offset & 0x3)) < len) {
>>>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>>>> +            return 0;
>>>>>>>>>>> +        }
>>>>>>>>>>> +        data =  pci_host_config_read_common(
>>>>>>>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
>>>>>>>>>>> +
>>>>>>>>>>> +        switch (len) {
>>>>>>>>>>> +        case 1:
>>>>>>>>>>> +            break;
>>>>>>>>>>> +        case 2:
>>>>>>>>>>> +            data = cpu_to_le16(data);
>>>>>>>>>>> +            break;
>>>>>>>>>>> +        case 4:
>>>>>>>>>>> +            data = cpu_to_le32(data);
>>>>>>>>>>> +            break;
>>>>>>>>>>> +        case 8:
>>>>>>>>>>> +            data = cpu_to_le64(data);
>>>>>>>>>>> +            break;
>>>>>>>>>>
>>>>>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>>>>>>>>>> and LE. So if you're running this on an LE host, you won't swap the
>>>>>>>>>> value and get a broken result.
>>>>>>>>>>
>>>>>>>>>> If you know that the value is always swapped, use bswapxx().
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Actually the code is right and required for a big endian host :-)
>>>>>>>>> pcilg/pcistg provide access to the PCI config space which is defined
>>>>>>>>> as PCI byte order (little endian). Since pci_host_config_read_common does
>>>>>>>>> already a le to cpu conversion we have to convert back to PCI byte order.
>>>>>>>>> Doing an unconditional swap would be a bug on a little endian host.
>>>>>>>>
>>>>>>>> Why would it be a bug? The value you end up writing is contents of a
>>>>>>>> register and thus doesn't have endianness. So if QEMU was an LE process,
>>>>>>>
>>>>>>> No, the s390 guest executing pcilg instruction expects to receive config space data
>>>>>>> in PCI byte order.
>>>>>>>
>>>>>>>> the value of data would be identical as on a BE QEMU before your swab.
>>>>>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
>>>>>>>>
>>>>>>>
>>>>>>> Again on BE endian host we do the swap because of pci_host_config_read_common does
>>>>>>> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
>>>>>>>
>>>>>>> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
>>>>>>> convert back to PCI byte order.
>>>>>>
>>>>>> We maintain the PCI config space always in LE byte order in memory, that's why there is a bswap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
>>>>>>
>>>>>> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
>>>>>>
>>>>> so you tell me pci_host_config_read_common does not end up in pci_default_read_config?
>>>>>
>>>>> uint32_t pci_default_read_config(PCIDevice *d,
>>>>>                                  uint32_t address, int len)
>>>>> {
>>>>>     uint32_t val = 0;
>>>>>
>>>>>     memcpy(&val, d->config + address, len);
>>>>>     return le32_to_cpu(val);
>>>>> }
>>>>>
>>>>> What did I miss?
>>>>
>>>> That's exactly where you end up in - and it's there to convert from the
>>>> PCI config space backing storage to a native number.
>>>>
>>>> Imagine you write 0x12345678 at offset 0. Because PCI config space is
>>>> defined to be LE, in the PCI config space memory this gets stored as
>>>>
>>>> 78 56 34 12
>>>>
>>>> The reason we do the internal storage of the config space that way is
>>>> that it's (in some PCI implementations) legal to access with single byte
>>>> granularities. So you could do a pci_config_read(offset = 1) which
>>>> should return 0x56.
>>>>
>>>> However, that means we completely nullify any effect of host endianness
>>>> in the PCI config layer already. So if you do pci_config_write(offset =
>>>> 0, size = 4, value = 0x12345678), the contents of d->config will always
>>>> be identical, regardless of host endianness. The same holds true for
>>>> pci_config_read(offset = 0, size = 4). It will always return 0x12345678.
>>>>
>>>
>>> I understood this from the beginning and I completely agree to this.
>>>  
>>>> In your code, you swab that value again. I assume there's a reason
>>>> you're swapping it and that it's the way the architecture expects it
>>>
>>> Yes, s390 pcilg architecture states:
>>> Data in the PCI configuration space are treated
>>> as being in little-endian byte ordering
>>>
>>>> (mind to point me to the respective spec so I can verify?). But if the
>>>> architecture expects it, then it expects it regardless of host
>>>> endianness. The contents of regs[r1] should always be 0x78563412, no
>>>> matter whether we're in an LE or a BE environment.
>>>>
>>>> Does that make sense now?
>>>>
>>> Absolutely lets make an example for qemu running on BE and LE
>>>
>>> byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
>>> BE            0x78563412             0x12345678                0x78563412
>>> LE            0x78563412             0x78563412                0x78563412
>>
>> No, pci_default_read_config() always returns 0x12345678 because it
>> returns a register, not memory.
> 
> So:
> 
>       config space    pci_default_read_config     pcilg
>       (bytes)         memcpy       le32_to_cpu    (with cpu_to_le)
> BE    78 56 34 12     0x78563412   0x12345678     0x78563412
> LE    78 56 34 12     0x12345678   0x12345678     0x12345678
> 
> Right?

Yes, exactly :).


Alex

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-12  9:08                   ` Alexander Graf
  2014-11-12  9:11                     ` Paolo Bonzini
@ 2014-11-12  9:19                     ` Frank Blaschka
  2014-11-12  9:22                       ` Alexander Graf
  1 sibling, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-12  9:19 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com,
	pbonzini@redhat.com, rth@twiddle.net

On Wed, Nov 12, 2014 at 10:08:19AM +0100, Alexander Graf wrote:
> 
> 
> On 12.11.14 09:49, Frank Blaschka wrote:
> > On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 11.11.14 15:08, Frank Blaschka wrote:
> >>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
> >>>>>
> >>>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
> >>>>>>
> >>>>>>
> >>>>>>> On 11.11.14 13:10, Frank Blaschka wrote:
> >>>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
> >>>>>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>>>>>>
> >>>>>>>>> This patch implements the s390 pci instructions in qemu. It allows
> >>>>>>>>> to access and drive pci devices attached to the s390 pci bus.
> >>>>>>>>> Because of platform constraints devices using IO BARs are not
> >>>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>>>>>> ---
> >>>>>>>>> target-s390x/Makefile.objs |   2 +-
> >>>>>>>>> target-s390x/kvm.c         |  52 ++++
> >>>>>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
> >>>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
> >>>>>>>>> create mode 100644 target-s390x/pci_ic.c
> >>>>>>>>> create mode 100644 target-s390x/pci_ic.h
> >>>>>>>>>
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> >>>>>>>>> +{
> >>>>>>>>> +    CPUS390XState *env = &cpu->env;
> >>>>>>>>> +    S390PCIBusDevice *pbdev;
> >>>>>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
> >>>>>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
> >>>>>>>>> +    PciLgStg *rp;
> >>>>>>>>> +    uint64_t offset;
> >>>>>>>>> +    uint64_t data;
> >>>>>>>>> +    uint8_t len;
> >>>>>>>>> +
> >>>>>>>>> +    cpu_synchronize_state(CPU(cpu));
> >>>>>>>>> +
> >>>>>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
> >>>>>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
> >>>>>>>>> +        return 0;
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    if (r2 & 0x1) {
> >>>>>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
> >>>>>>>>> +        return 0;
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    rp = (PciLgStg *)&env->regs[r2];
> >>>>>>>>> +    offset = env->regs[r2 + 1];
> >>>>>>>>> +
> >>>>>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
> >>>>>>>>> +    if (!pbdev) {
> >>>>>>>>> +        DPRINTF("pcilg no pci dev\n");
> >>>>>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> >>>>>>>>> +        return 0;
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    len = rp->len & 0xF;
> >>>>>>>>> +    if (rp->pcias < 6) {
> >>>>>>>>> +        if ((8 - (offset & 0x7)) < len) {
> >>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>>>>>>>> +            return 0;
> >>>>>>>>> +        }
> >>>>>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> >>>>>>>>> +        io_mem_read(mr, offset, &data, len);
> >>>>>>>>> +    } else if (rp->pcias == 15) {
> >>>>>>>>> +        if ((4 - (offset & 0x3)) < len) {
> >>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
> >>>>>>>>> +            return 0;
> >>>>>>>>> +        }
> >>>>>>>>> +        data =  pci_host_config_read_common(
> >>>>>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
> >>>>>>>>> +
> >>>>>>>>> +        switch (len) {
> >>>>>>>>> +        case 1:
> >>>>>>>>> +            break;
> >>>>>>>>> +        case 2:
> >>>>>>>>> +            data = cpu_to_le16(data);
> >>>>>>>>> +            break;
> >>>>>>>>> +        case 4:
> >>>>>>>>> +            data = cpu_to_le32(data);
> >>>>>>>>> +            break;
> >>>>>>>>> +        case 8:
> >>>>>>>>> +            data = cpu_to_le64(data);
> >>>>>>>>> +            break;
> >>>>>>>>
> >>>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
> >>>>>>>> and LE. So if you're running this on an LE host, you won't swap the
> >>>>>>>> value and get a broken result.
> >>>>>>>>
> >>>>>>>> If you know that the value is always swapped, use bswapxx().
> >>>>>>>>
> >>>>>>>
> >>>>>>> Actually the code is right and required for a big endian host :-)
> >>>>>>> pcilg/pcistg provide access to the PCI config space which is defined
> >>>>>>> as PCI byte order (little endian). Since pci_host_config_read_common does
> >>>>>>> already a le to cpu conversion we have to convert back to PCI byte order.
> >>>>>>> Doing an unconditional swap would be a bug on a little endian host.
> >>>>>>
> >>>>>> Why would it be a bug? The value you end up writing is contents of a
> >>>>>> register and thus doesn't have endianness. So if QEMU was an LE process,
> >>>>>
> >>>>> No, the s390 guest executing pcilg instruction expects to receive config space data
> >>>>> in PCI byte order.
> >>>>>
> >>>>>> the value of data would be identical as on a BE QEMU before your swab.
> >>>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
> >>>>>>
> >>>>>
> >>>>> Again on BE endian host we do the swap because of pci_host_config_read_common does
> >>>>> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
> >>>>>
> >>>>> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
> >>>>> convert back to PCI byte order.
> >>>>
> >>>> We maintain the PCI config space always in LE byte order in memory, that's why there is a bswap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
> >>>>
> >>>> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
> >>>>
> >>> so you tell me pci_host_config_read_common does not end up in pci_default_read_config?
> >>>
> >>> uint32_t pci_default_read_config(PCIDevice *d,
> >>>                                  uint32_t address, int len)
> >>> {
> >>>     uint32_t val = 0;
> >>>
> >>>     memcpy(&val, d->config + address, len);
> >>>     return le32_to_cpu(val);
> >>> }
> >>>
> >>> What did I miss?
> >>
> >> That's exactly where you end up in - and it's there to convert from the
> >> PCI config space backing storage to a native number.
> >>
> >> Imagine you write 0x12345678 at offset 0. Because PCI config space is
> >> defined to be LE, in the PCI config space memory this gets stored as
> >>
> >> 78 56 34 12
> >>
> >> The reason we do the internal storage of the config space that way is
> >> that it's (in some PCI implementations) legal to access with single byte
> >> granularities. So you could do a pci_config_read(offset = 1) which
> >> should return 0x56.
> >>
> >> However, that means we completely nullify any effect of host endianness
> >> in the PCI config layer already. So if you do pci_config_write(offset =
> >> 0, size = 4, value = 0x12345678), the contents of d->config will always
> >> be identical, regardless of host endianness. The same holds true for
> >> pci_config_read(offset = 0, size = 4). It will always return 0x12345678.
> >>
> > 
> > I understood this from the beginning and I completely agree to this.
> >  
> >> In your code, you swab that value again. I assume there's a reason
> >> you're swapping it and that it's the way the architecture expects it
> > 
> > Yes, s390 pcilg architecture states:
> > Data in the PCI configuration space are treated
> > as being in little-endian byte ordering
> > 
> >> (mind to point me to the respective spec so I can verify?). But if the
> >> architecture expects it, then it expects it regardless of host
> >> endianness. The contents of regs[r1] should always be 0x78563412, no
> >> matter whether we're in an LE or a BE environment.
> >>
> >> Does that make sense now?
> >>
> > Absolutely lets make an example for qemu running on BE and LE
> > 
> > byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
> > BE            0x78563412             0x12345678                0x78563412
> > LE            0x78563412             0x78563412                0x78563412
> 
> No, pci_default_read_config() always returns 0x12345678 because it
> returns a register, not memory.
>

You mean the implementation of pci_default_read_config is broken?
If it should return a register value, it should not do "return le32_to_cpu(val);"
 
> 
> Alex
> 
> > 
> > So what is the problem with my code? Adding unconditional byte swap instead of
> > cpu_to_le in pcilg would break architecture for pcilg if qemu is running on LE
> > platform.
> > 
> >>
> >> Alex
> >>
> > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-12  9:19                     ` Frank Blaschka
@ 2014-11-12  9:22                       ` Alexander Graf
  2014-11-12  9:36                         ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-12  9:22 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com,
	pbonzini@redhat.com, rth@twiddle.net



On 12.11.14 10:19, Frank Blaschka wrote:
> On Wed, Nov 12, 2014 at 10:08:19AM +0100, Alexander Graf wrote:
>>
>>
>> On 12.11.14 09:49, Frank Blaschka wrote:
>>> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 11.11.14 15:08, Frank Blaschka wrote:
>>>>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka <blaschka@linux.vnet.ibm.com>:
>>>>>>>
>>>>>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 11.11.14 13:10, Frank Blaschka wrote:
>>>>>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>>>>>>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>>>>
>>>>>>>>>>> This patch implements the s390 pci instructions in qemu. It allows
>>>>>>>>>>> to access and drive pci devices attached to the s390 pci bus.
>>>>>>>>>>> Because of platform constraints devices using IO BARs are not
>>>>>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>>>>>>> ---
>>>>>>>>>>> target-s390x/Makefile.objs |   2 +-
>>>>>>>>>>> target-s390x/kvm.c         |  52 ++++
>>>>>>>>>>> target-s390x/pci_ic.c      | 753 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>> target-s390x/pci_ic.h      | 335 ++++++++++++++++++++
>>>>>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>>>>>>>>>> create mode 100644 target-s390x/pci_ic.c
>>>>>>>>>>> create mode 100644 target-s390x/pci_ic.h
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>>>>>>>>>> +{
>>>>>>>>>>> +    CPUS390XState *env = &cpu->env;
>>>>>>>>>>> +    S390PCIBusDevice *pbdev;
>>>>>>>>>>> +    uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20;
>>>>>>>>>>> +    uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16;
>>>>>>>>>>> +    PciLgStg *rp;
>>>>>>>>>>> +    uint64_t offset;
>>>>>>>>>>> +    uint64_t data;
>>>>>>>>>>> +    uint8_t len;
>>>>>>>>>>> +
>>>>>>>>>>> +    cpu_synchronize_state(CPU(cpu));
>>>>>>>>>>> +
>>>>>>>>>>> +    if (env->psw.mask & PSW_MASK_PSTATE) {
>>>>>>>>>>> +        program_interrupt(env, PGM_PRIVILEGED, 4);
>>>>>>>>>>> +        return 0;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    if (r2 & 0x1) {
>>>>>>>>>>> +        program_interrupt(env, PGM_SPECIFICATION, 4);
>>>>>>>>>>> +        return 0;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    rp = (PciLgStg *)&env->regs[r2];
>>>>>>>>>>> +    offset = env->regs[r2 + 1];
>>>>>>>>>>> +
>>>>>>>>>>> +    pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>>>>>>>>>> +    if (!pbdev) {
>>>>>>>>>>> +        DPRINTF("pcilg no pci dev\n");
>>>>>>>>>>> +        setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>>>>>>>>>> +        return 0;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    len = rp->len & 0xF;
>>>>>>>>>>> +    if (rp->pcias < 6) {
>>>>>>>>>>> +        if ((8 - (offset & 0x7)) < len) {
>>>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>>>> +            return 0;
>>>>>>>>>>> +        }
>>>>>>>>>>> +        MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
>>>>>>>>>>> +        io_mem_read(mr, offset, &data, len);
>>>>>>>>>>> +    } else if (rp->pcias == 15) {
>>>>>>>>>>> +        if ((4 - (offset & 0x3)) < len) {
>>>>>>>>>>> +            program_interrupt(env, PGM_OPERAND, 4);
>>>>>>>>>>> +            return 0;
>>>>>>>>>>> +        }
>>>>>>>>>>> +        data =  pci_host_config_read_common(
>>>>>>>>>>> +                   pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
>>>>>>>>>>> +
>>>>>>>>>>> +        switch (len) {
>>>>>>>>>>> +        case 1:
>>>>>>>>>>> +            break;
>>>>>>>>>>> +        case 2:
>>>>>>>>>>> +            data = cpu_to_le16(data);
>>>>>>>>>>> +            break;
>>>>>>>>>>> +        case 4:
>>>>>>>>>>> +            data = cpu_to_le32(data);
>>>>>>>>>>> +            break;
>>>>>>>>>>> +        case 8:
>>>>>>>>>>> +            data = cpu_to_le64(data);
>>>>>>>>>>> +            break;
>>>>>>>>>>
>>>>>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>>>>>>>>>> and LE. So if you're running this on an LE host, you won't swap the
>>>>>>>>>> value and get a broken result.
>>>>>>>>>>
>>>>>>>>>> If you know that the value is always swapped, use bswapxx().
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Actually the code is right and required for a big endian host :-)
>>>>>>>>> pcilg/pcistg provide access to the PCI config space which is defined
>>>>>>>>> as PCI byte order (little endian). Since pci_host_config_read_common does
>>>>>>>>> already a le to cpu conversion we have to convert back to PCI byte order.
>>>>>>>>> Doing an unconditional swap would be a bug on a little endian host.
>>>>>>>>
>>>>>>>> Why would it be a bug? The value you end up writing is contents of a
>>>>>>>> register and thus doesn't have endianness. So if QEMU was an LE process,
>>>>>>>
>>>>>>> No, the s390 guest executing pcilg instruction expects to receive config space data
>>>>>>> in PCI byte order.
>>>>>>>
>>>>>>>> the value of data would be identical as on a BE QEMU before your swab.
>>>>>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts break.
>>>>>>>>
>>>>>>>
>>>>>>> Again on BE endian host we do the swap because of pci_host_config_read_common does
>>>>>>> read the value and do a byte swap for that value, but we need PCI byte order not BE here.
>>>>>>>
>>>>>>> On LE host pci_host_config_read_common does not do a byte swap so we do not have to
>>>>>>> convert back to PCI byte order.
>>>>>>
>>>>>> We maintain the PCI config space always in LE byte order in memory, that's why there is a bswap in its read function. The return result of the read function however is always the same, regardless of LE or BE host. If I do a read of size 4, I will always get 0x1, not 0x01000000 returned.
>>>>>>
>>>>>> So now you need to convert that 0x1 into a 0x01000000 manually here because some architect thought that registers have endianness (which they don't). But you need to do it always, even on an LE host, because the pci config space return value is identical on LE and BE.
>>>>>>
>>>>> so you tell me pci_host_config_read_common does not end up in pci_default_read_config?
>>>>>
>>>>> uint32_t pci_default_read_config(PCIDevice *d,
>>>>>                                  uint32_t address, int len)
>>>>> {
>>>>>     uint32_t val = 0;
>>>>>
>>>>>     memcpy(&val, d->config + address, len);
>>>>>     return le32_to_cpu(val);
>>>>> }
>>>>>
>>>>> What did I miss?
>>>>
>>>> That's exactly where you end up in - and it's there to convert from the
>>>> PCI config space backing storage to a native number.
>>>>
>>>> Imagine you write 0x12345678 at offset 0. Because PCI config space is
>>>> defined to be LE, in the PCI config space memory this gets stored as
>>>>
>>>> 78 56 34 12
>>>>
>>>> The reason we do the internal storage of the config space that way is
>>>> that it's (in some PCI implementations) legal to access with single byte
>>>> granularities. So you could do a pci_config_read(offset = 1) which
>>>> should return 0x56.
>>>>
>>>> However, that means we completely nullify any effect of host endianness
>>>> in the PCI config layer already. So if you do pci_config_write(offset =
>>>> 0, size = 4, value = 0x12345678), the contents of d->config will always
>>>> be identical, regardless of host endianness. The same holds true for
>>>> pci_config_read(offset = 0, size = 4). It will always return 0x12345678.
>>>>
>>>
>>> I understood this from the beginning and I completely agree to this.
>>>  
>>>> In your code, you swab that value again. I assume there's a reason
>>>> you're swapping it and that it's the way the architecture expects it
>>>
>>> Yes, s390 pcilg architecture states:
>>> Data in the PCI configuration space are treated
>>> as being in little-endian byte ordering
>>>
>>>> (mind to point me to the respective spec so I can verify?). But if the
>>>> architecture expects it, then it expects it regardless of host
>>>> endianness. The contents of regs[r1] should always be 0x78563412, no
>>>> matter whether we're in an LE or a BE environment.
>>>>
>>>> Does that make sense now?
>>>>
>>> Absolutely lets make an example for qemu running on BE and LE
>>>
>>> byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
>>> BE            0x78563412             0x12345678                0x78563412
>>> LE            0x78563412             0x78563412                0x78563412
>>
>> No, pci_default_read_config() always returns 0x12345678 because it
>> returns a register, not memory.
>>
> 
> You mean implementation of pci_default_read_config is broken?
> If it should return a register it should not do "return le32_to_cpu(val);"

It has to, to convert from memory (after memcpy) to an actual register
value. Look at the value list in Paolo's email - I really have no idea
how to explain it any better.


Alex

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-12  9:22                       ` Alexander Graf
@ 2014-11-12  9:36                         ` Paolo Bonzini
  2014-11-12 14:34                           ` Frank Blaschka
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2014-11-12  9:36 UTC (permalink / raw)
  To: Alexander Graf, Frank Blaschka
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com, rth@twiddle.net



On 12/11/2014 10:22, Alexander Graf wrote:
>>>> Absolutely lets make an example for qemu running on BE and LE
>>>>
>>>> byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
>>>> BE            0x78563412             0x12345678                0x78563412
>>>> LE            0x78563412             0x78563412                0x78563412
>>>
>>> No, pci_default_read_config() always returns 0x12345678 because it
>>> returns a register, not memory.
>>>
>>
>> You mean implementation of pci_default_read_config is broken?
>> If it should return a register it should not do "return le32_to_cpu(val);"
> 
> It has to, to convert from memory (after memcpy) to an actual register
> value. Look at the value list in Paolo's email - I really have no idea
> how to explain it any better.

pci_default_read_config is reading from a *device* register, and has
absolutely zero knowledge of the host CPU endianness.

Another way to explain that the result of pci_default_read_config is
independent of the host endianness, is that the function is basically
doing this:

switch (len) {
    case 1: return d->config[address];
    case 2: return ldw_le_p(&d->config[address]);
    case 4: return ldl_le_p(&d->config[address]);
    default: abort();
}

So if you want to make the outcome big endian, you have to swap
unconditionally.

Paolo


* Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions
  2014-11-12  9:36                         ` Paolo Bonzini
@ 2014-11-12 14:34                           ` Frank Blaschka
  0 siblings, 0 replies; 27+ messages in thread
From: Frank Blaschka @ 2014-11-12 14:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell@linaro.org, Frank Blaschka, james.hogan@imgtec.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org, Alexander Graf,
	borntraeger@de.ibm.com, cornelia.huck@de.ibm.com, rth@twiddle.net

On Wed, Nov 12, 2014 at 10:36:03AM +0100, Paolo Bonzini wrote:
> 
> 
> On 12/11/2014 10:22, Alexander Graf wrote:
> >>>> Absolutely lets make an example for qemu running on BE and LE
> >>>>
> >>>> byte order    config space backing   pci_default_read_config   pcilg (with cpu_to_le)
> >>>> BE            0x78563412             0x12345678                0x78563412
> >>>> LE            0x78563412             0x78563412                0x78563412
> >>>
> >>> No, pci_default_read_config() always returns 0x12345678 because it
> >>> returns a register, not memory.
> >>>
> >>
> >> You mean implementation of pci_default_read_config is broken?
> >> If it should return a register it should not do "return le32_to_cpu(val);"
> > 
> > It has to, to convert from memory (after memcpy) to an actual register
> > value. Look at the value list in Paolo's email - I really have no idea
> > how to explain it any better.
> 
> pci_default_read_config is reading from a *device* register, and has
> absolutely zero knowledge of the host CPU endianness.
> 
> Another way to explain that the result of pci_default_read_config is
> independent of the host endianness, is that the function is basically
> doing this:
> 
> switch (len) {
>     case 1: return d->config[address];
>     case 2: return ldw_le_p(&d->config[address]);
>     case 4: return ldl_le_p(&d->config[address]);
>     default: abort();
> }
> 
> So if you want to make the outcome big endian, you have to swap
> unconditionally.
> 
> Paolo

Hi Paolo, Alex,

Thanks a lot for all the explanations and your patience.
I think I have understood your point now. I will change the code to swap
unconditionally. I had a knowledge gap regarding running guest and host
with different byte orders. I hope this gap is filled now ;)

Frank

> 


* Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support
  2014-11-10 15:14   ` Alexander Graf
@ 2014-11-18 12:50     ` Frank Blaschka
  2014-11-18 17:00       ` Alexander Graf
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-18 12:50 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth

On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
> 
> 
> On 10.11.14 15:20, Frank Blaschka wrote:
> > From: Frank Blaschka <frank.blaschka@de.ibm.com>
> > 
> > This patch implements a pci bus for s390x together with infrastructure
> > to generate and handle hotplug events, to configure/unconfigure via
> > sclp instruction, to do iommu translations and provide s390 support for
> > MSI/MSI-X notification processing.
> > 
> > Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> > ---
> >  default-configs/s390x-softmmu.mak |   1 +
> >  hw/s390x/Makefile.objs            |   1 +
> >  hw/s390x/css.c                    |   5 +
> >  hw/s390x/css.h                    |   1 +
> >  hw/s390x/s390-pci-bus.c           | 485 ++++++++++++++++++++++++++++++++++++++
> >  hw/s390x/s390-pci-bus.h           | 254 ++++++++++++++++++++
> >  hw/s390x/s390-virtio-ccw.c        |   3 +
> >  hw/s390x/sclp.c                   |  10 +-
> >  include/hw/s390x/sclp.h           |   8 +
> >  target-s390x/ioinst.c             |  52 ++++
> >  target-s390x/ioinst.h             |   1 +
> >  11 files changed, 820 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/s390x/s390-pci-bus.c
> >  create mode 100644 hw/s390x/s390-pci-bus.h
> > 
> > diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
> > index 126d88d..6ee2ff8 100644
> > --- a/default-configs/s390x-softmmu.mak
> > +++ b/default-configs/s390x-softmmu.mak
> > @@ -1,3 +1,4 @@
> > +include pci.mak
> >  CONFIG_VIRTIO=y
> >  CONFIG_SCLPCONSOLE=y
> >  CONFIG_S390_FLIC=y
> > diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
> > index 1ba6c3a..428d957 100644
> > --- a/hw/s390x/Makefile.objs
> > +++ b/hw/s390x/Makefile.objs
> > @@ -8,3 +8,4 @@ obj-y += ipl.o
> >  obj-y += css.o
> >  obj-y += s390-virtio-ccw.o
> >  obj-y += virtio-ccw.o
> > +obj-y += s390-pci-bus.o
> > diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> > index b67c039..7553085 100644
> > --- a/hw/s390x/css.c
> > +++ b/hw/s390x/css.c
> > @@ -1299,6 +1299,11 @@ void css_generate_chp_crws(uint8_t cssid, uint8_t chpid)
> >      /* TODO */
> >  }
> >  
> > +void css_generate_css_crws(uint8_t cssid)
> > +{
> > +    css_queue_crw(CRW_RSC_CSS, 0, 0, 0);
> > +}
> > +
> >  int css_enable_mcsse(void)
> >  {
> >      trace_css_enable_facility("mcsse");
> > diff --git a/hw/s390x/css.h b/hw/s390x/css.h
> > index 33104ac..7e53148 100644
> > --- a/hw/s390x/css.h
> > +++ b/hw/s390x/css.h
> > @@ -101,6 +101,7 @@ void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, uint16_t rsid);
> >  void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
> >                             int hotplugged, int add);
> >  void css_generate_chp_crws(uint8_t cssid, uint8_t chpid);
> > +void css_generate_css_crws(uint8_t cssid);
> >  void css_adapter_interrupt(uint8_t isc);
> >  
> >  #define CSS_IO_ADAPTER_VIRTIO 1
> > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> > new file mode 100644
> > index 0000000..f2fa6ba
> > --- /dev/null
> > +++ b/hw/s390x/s390-pci-bus.c
> > @@ -0,0 +1,485 @@
> > +/*
> > + * s390 PCI BUS
> > + *
> > + * Copyright 2014 IBM Corp.
> > + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> > + *            Hong Bo Li <lihbbj@cn.ibm.com>
> > + *            Yi Min Zhao <zyimin@cn.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> > + * your option) any later version. See the COPYING file in the top-level
> > + * directory.
> > + */
> > +
> > +#include <hw/pci/pci.h>
> > +#include <hw/pci/pci_bus.h>
> > +#include <hw/s390x/css.h>
> > +#include <hw/s390x/sclp.h>
> > +#include <hw/pci/msi.h>
> > +#include "qemu/error-report.h"
> > +#include "s390-pci-bus.h"
> > +
> > +/* #define DEBUG_S390PCI_BUS */
> > +#ifdef DEBUG_S390PCI_BUS
> > +#define DPRINTF(fmt, ...) \
> > +    do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
> > +#else
> > +#define DPRINTF(fmt, ...) \
> > +    do { } while (0)
> > +#endif
> > +
> > +static const unsigned long be_to_le = BITS_PER_LONG - 1;
> > +static QTAILQ_HEAD(, SeiContainer) pending_sei =
> > +    QTAILQ_HEAD_INITIALIZER(pending_sei);
> > +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
> > +    QTAILQ_HEAD_INITIALIZER(device_list);
> 
> Please get rid of all statics ;). All state has to live in objects.
>

be_to_le was misleading and unnecessary; I will remove it. But a
static QTAILQ_HEAD seems to be common practice for list anchors.
If you really want me to change this, do you have a preferred way,
or can you point me to some code doing this?

> > +
> > +int chsc_sei_nt2_get_event(void *res)
> > +{
> > +    ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res;
> > +    PciCcdfAvail *accdf;
> > +    PciCcdfErr *eccdf;
> > +    int rc = 1;
> > +    SeiContainer *sei_cont;
> > +
> > +    sei_cont = QTAILQ_FIRST(&pending_sei);
> > +    if (sei_cont) {
> > +        QTAILQ_REMOVE(&pending_sei, sei_cont, link);
> > +        nt2_res->nt = 2;
> > +        nt2_res->cc = sei_cont->cc;
> > +        switch (sei_cont->cc) {
> > +        case 1: /* error event */
> > +            eccdf = (PciCcdfErr *)nt2_res->ccdf;
> > +            eccdf->fid = cpu_to_be32(sei_cont->fid);
> > +            eccdf->fh = cpu_to_be32(sei_cont->fh);
> > +            break;
> > +        case 2: /* availability event */
> > +            accdf = (PciCcdfAvail *)nt2_res->ccdf;
> > +            accdf->fid = cpu_to_be32(sei_cont->fid);
> > +            accdf->fh = cpu_to_be32(sei_cont->fh);
> > +            accdf->pec = cpu_to_be16(sei_cont->pec);
> > +            break;
> > +        default:
> > +            abort();
> > +        }
> > +        g_free(sei_cont);
> > +        rc = 0;
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> > +int chsc_sei_nt2_have_event(void)
> > +{
> > +    return !QTAILQ_EMPTY(&pending_sei);
> > +}
> > +
> > +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
> > +{
> > +    S390PCIBusDevice *pbdev;
> > +
> > +    QTAILQ_FOREACH(pbdev, &device_list, next) {
> > +        if (pbdev->fid == fid) {
> > +            return pbdev;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +void s390_pci_sclp_configure(int configure, SCCB *sccb)
> > +{
> > +    PciCfgSccb *psccb = (PciCfgSccb *)sccb;
> > +    S390PCIBusDevice *pbdev = s390_pci_find_dev_by_fid(be32_to_cpu(psccb->aid));
> > +    uint16_t rc;
> > +
> > +    if (pbdev) {
> > +        if ((configure == 1 && pbdev->configured == true) ||
> > +            (configure == 0 && pbdev->configured == false)) {
> > +            rc = SCLP_RC_NO_ACTION_REQUIRED;
> > +        } else {
> > +            pbdev->configured = !pbdev->configured;
> > +            rc = SCLP_RC_NORMAL_COMPLETION;
> > +        }
> > +    } else {
> > +        DPRINTF("sclp config %d no dev found\n", configure);
> > +        rc = SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED;
> > +    }
> > +
> > +    psccb->header.response_code = cpu_to_be16(rc);
> > +    return;
> > +}
> > +
> > +static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
> > +{
> > +    return PCI_SLOT(pdev->devfn);
> > +}
> > +
> > +static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
> > +{
> > +    return PCI_SLOT(pdev->devfn) | FH_VIRT;
> > +}
> > +
> > +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
> > +{
> > +    S390PCIBusDevice *dev;
> > +    int i = 0;
> > +
> > +    QTAILQ_FOREACH(dev, &device_list, next) {
> > +        if (i == idx) {
> > +            return dev;
> > +        }
> > +        i++;
> > +    }
> > +    return NULL;
> > +}
> > +
> > +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh)
> > +{
> > +    S390PCIBusDevice *pbdev;
> > +
> > +    QTAILQ_FOREACH(pbdev, &device_list, next) {
> > +        if (pbdev->fh == fh) {
> > +            return pbdev;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static void s390_pci_generate_plug_event(uint16_t pec, uint32_t fh,
> > +                                         uint32_t fid)
> > +{
> > +    SeiContainer *sei_cont = g_malloc0(sizeof(SeiContainer));
> > +
> > +    sei_cont->fh = fh;
> > +    sei_cont->fid = fid;
> > +    sei_cont->cc = 2;
> > +    sei_cont->pec = pec;
> > +
> > +    QTAILQ_INSERT_TAIL(&pending_sei, sei_cont, link);
> > +    css_generate_css_crws(0);
> > +}
> > +
> > +static void s390_pci_set_irq(void *opaque, int irq, int level)
> > +{
> > +    /* nothing to do */
> > +}
> > +
> > +static int s390_pci_map_irq(PCIDevice *pci_dev, int irq_num)
> > +{
> > +    /* nothing to do */
> > +    return 0;
> > +}
> > +
> > +void s390_pci_bus_init(void)
> > +{
> > +    DeviceState *dev;
> > +
> > +    dev = qdev_create(NULL, TYPE_S390_PCI_HOST_BRIDGE);
> > +    qdev_init_nofail(dev);
> > +}
> > +
> > +uint64_t s390_pci_get_table_origin(uint64_t iota)
> > +{
> > +    return iota & ~ZPCI_IOTA_RTTO_FLAG;
> > +}
> > +
> > +static uint32_t s390_pci_get_p(uint64_t iota)
> > +{
> > +    return iota & ~ZPCI_IOTA_RTTO_FLAG;
> > +}
> > +
> > +static uint32_t s390_pci_get_dt(uint64_t iota)
> > +{
> > +    return (iota >> 2) & 0x7;
> > +}
> > +
> > +static uint32_t s390_pci_get_fs(uint64_t iota)
> > +{
> > +    uint32_t dt = s390_pci_get_dt(iota);
> > +
> > +    if (dt == 4 || dt == 5) {
> > +        return iota & 0x3;
> > +    } else {
> > +        return ZPCI_IOTA_FS_4K;
> > +    }
> > +}
> > +
> > +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
> > +                                  uint64_t guest_dma_address)
> > +{
> > +    uint64_t sto_a, pto_a, px_a;
> > +    uint64_t sto, pto, pte;
> > +    uint32_t rtx, sx, px;
> > +
> > +    rtx = calc_rtx(guest_dma_address);
> > +    sx = calc_sx(guest_dma_address);
> > +    px = calc_px(guest_dma_address);
> > +
> > +    sto_a = guest_iota + rtx * sizeof(uint64_t);
> > +    cpu_physical_memory_rw(sto_a, (uint8_t *)&sto, sizeof(uint64_t), 0);
> 
> This is not endian safe. Couldn't you just use ldq_be_phys() here?
> 
> > +    sto = (uint64_t)get_rt_sto(sto);
> > +
> > +    pto_a = sto + sx * sizeof(uint64_t);
> 
> Can there be invalid entries? How could the guest say "access not
> allowed for this region"?
> 
> > +    cpu_physical_memory_rw(pto_a, (uint8_t *)&pto, sizeof(uint64_t), 0);
> 
> ldq_be_phys()
> 
> > +    pto = (uint64_t)get_st_pto(pto);
> > +
> > +    px_a = pto + px * sizeof(uint64_t);
> > +    cpu_physical_memory_rw(px_a, (uint8_t *)&pte, sizeof(uint64_t), 0);
> 
> ldq_be_phys()
> 
> > +
> > +    return pte;
> > +}
> > +
> > +static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> > +                                          bool is_write)
> > +{
> > +    IOMMUTLBEntry ret;
> > +    uint32_t fs;
> > +    uint64_t pte;
> > +    BEntry *container = container_of(iommu, BEntry, mr);
> > +    S390PCIBusDevice *pbdev = container->pbdev;
> > +    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pbdev->pdev)
> > +                                           ->qbus.parent);
> > +
> > +    DPRINTF("iommu trans addr 0x%lx\n", addr);
> > +
> > +    /* s390 does not have an APIC maped to main storage so we use
> > +     * a separate AddressSpace only for msix notifications
> > +     */
> > +    if (addr == ZPCI_MSI_ADDR) {
> > +        ret.target_as = &s->msix_notify_as;
> > +        ret.iova = addr;
> > +        ret.translated_addr = addr;
> > +        ret.addr_mask = 0xfff;
> > +        ret.perm = IOMMU_RW;
> > +        return ret;
> > +    }
> > +
> > +    pte = s390_guest_io_table_walk(s390_pci_get_table_origin(pbdev->g_iota),
> > +                                   addr);
> 
> Same question for the invalid entry part again. How can we inform the
> guest that a device tried to DMA to an invalid address? Shouldn't there
> be some error recovery interrupt and device shutdown in that case?
> 
> > +
> > +    ret.target_as = &address_space_memory;
> > +    ret.iova = addr;
> > +    ret.translated_addr = pte & ZPCI_PTE_ADDR_MASK;
> > +    fs = s390_pci_get_fs(pbdev->g_iota);
> > +    if (fs == ZPCI_IOTA_FS_4K) {
> > +        ret.addr_mask = 0xfff;
> > +    } else if (fs == ZPCI_IOTA_FS_1M) {
> > +        ret.addr_mask = 0xfffff;
> > +    } else if (fs == ZPCI_IOTA_FS_2G) {
> > +        ret.addr_mask = 0x7fffffff;
> > +    }
> > +    if (s390_pci_get_p(pbdev->g_iota) == 1) {
> > +        ret.perm = IOMMU_RO;
> > +    } else {
> > +        ret.perm = IOMMU_RW;
> > +    }
> > +    return ret;
> > +}
> > +
> > +static const MemoryRegionIOMMUOps s390_iommu_ops = {
> > +    .translate = s390_translate_iommu,
> > +};
> > +
> > +static AddressSpace *s390_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> > +{
> > +    S390pciState *s = opaque;
> > +
> > +    return &s->iommu[PCI_SLOT(devfn)].as;
> > +}
> > +
> > +static void s390_msi_ctrl_write(void *opaque, hwaddr addr, uint64_t data,
> > +                                unsigned int size)
> > +{
> > +    S390PCIBusDevice *pbdev;
> > +    unsigned long *aibv, *aisb;
> > +    int summary_set;
> > +    hwaddr aibv_len, aisb_len;
> > +    uint32_t io_int_word;
> > +    uint32_t hdata = le32_to_cpu(data);
> 
> ^
> 
> > +    uint32_t fid = hdata >> ZPCI_MSI_VEC_BITS;
> > +    uint32_t vec = hdata & ZPCI_MSI_VEC_MASK;
> > +
> > +    DPRINTF("write_msix data 0x%lx fid %d vec 0x%x\n", data, fid, vec);
> > +
> > +    pbdev = s390_pci_find_dev_by_fid(fid);
> > +    if (!pbdev) {
> > +        DPRINTF("msix_notify no dev\n");
> > +        return;
> > +    }
> > +    aibv_len = aisb_len = 8;
> > +    aibv = cpu_physical_memory_map(pbdev->routes.adapter.ind_addr,
> > +                                   &aibv_len, 1);
> > +    aisb = cpu_physical_memory_map(pbdev->routes.adapter.summary_addr,
> > +                                   &aisb_len, 1);
> 
> Please use ldq_le/be_phys.
> 
> > +
> > +    set_bit(vec ^ be_to_le, aibv);
> 
> What is this be_to_le? Looking at the other endianness messup so far,
> I'm not convinced this is correct. Is there any way to let the
> infrastructure deal with endianness instead?
>

OK, this one was a mess all over the place. We cannot use ldq_le/be_phys because
we need to set the indicator bit in guest memory atomically. virtio_ccw_notify
does something similar and also handles the endianness cleanly. I will rewrite
the code to use the same approach.
 
> > +    summary_set = test_and_set_bit(pbdev->routes.adapter.summary_offset
> > +                                   ^ be_to_le, aisb);
> > +
> > +    if (!summary_set) {
> > +        io_int_word = (pbdev->isc << 27) | IO_INT_WORD_AI;
> > +        s390_io_interrupt(0, 0, 0, io_int_word);
> > +    }
> > +
> > +    cpu_physical_memory_unmap(aibv, aibv_len, 1, aibv_len);
> > +    cpu_physical_memory_unmap(aisb, aisb_len, 1, aisb_len);
> > +    return;
> > +}
> > +
> > +static uint64_t s390_msi_ctrl_read(void *opaque, hwaddr addr, unsigned size)
> > +{
> > +    return 0xffffffff;
> > +}
> > +
> > +static const MemoryRegionOps s390_msi_ctrl_ops = {
> > +    .write = s390_msi_ctrl_write,
> > +    .read = s390_msi_ctrl_read,
> > +    .endianness = DEVICE_NATIVE_ENDIAN,
> 
> If you're doing an LE conversion right after, it probably means the
> region is really a LITTLE_ENDIAN region, no?
> 
> Also, le32_to_cpu is definitely wrong in this case, as that will byte
> swap on BE hosts, but not on LE hosts. The value you get back is always
> the same though, so you've successfully broken LE hosts.
> 
> Keep in mind that there's always the slim chance that s390x supports LE
> one day, so getting endianness right from the start is pretty important,
> even though it seems useless to you today ;).
>
OK, I think I have got this now: the MR is LE.
 
> > +};
> > +
> > +static void s390_pcihost_init_as(S390pciState *s)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < PCI_SLOT_MAX; i++) {
> > +        memory_region_init_iommu(&s->iommu[i].mr, OBJECT(s),
> > +                                 &s390_iommu_ops, "iommu-s390", UINT64_MAX);
> > +        address_space_init(&s->iommu[i].as, &s->iommu[i].mr, "iommu-pci");
> > +    }
> > +
> > +    memory_region_init_io(&s->msix_notify_mr, OBJECT(s),
> > +                          &s390_msi_ctrl_ops, s, "msix-s390", UINT64_MAX);
> > +    address_space_init(&s->msix_notify_as, &s->msix_notify_mr, "msix-pci");
> > +}
> > +
> > +static int s390_pcihost_init(SysBusDevice *dev)
> > +{
> > +    PCIBus *b;
> > +    BusState *bus;
> > +    PCIHostState *phb = PCI_HOST_BRIDGE(dev);
> > +    S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
> > +
> > +    DPRINTF("host_init\n");
> > +
> > +    b = pci_register_bus(DEVICE(dev), NULL,
> > +                         s390_pci_set_irq, s390_pci_map_irq, NULL,
> > +                         get_system_memory(), get_system_io(), 0, 64,
> > +                         TYPE_PCI_BUS);
> > +    s390_pcihost_init_as(s);
> > +    pci_setup_iommu(b, s390_pci_dma_iommu, s);
> > +
> > +    bus = BUS(b);
> > +    qbus_set_hotplug_handler(bus, DEVICE(dev), NULL);
> > +    phb->bus = b;
> > +    return 0;
> > +}
> > +
> > +static int s390_pcihost_setup_msix(S390PCIBusDevice *pbdev)
> > +{
> > +    uint8_t pos;
> > +    uint16_t ctrl;
> > +    uint32_t table, pba;
> > +
> > +    pos = pci_find_capability(pbdev->pdev, PCI_CAP_ID_MSIX);
> > +    if (!pos) {
> > +        pbdev->msix.available = false;
> > +        return 0;
> > +    }
> > +
> > +    ctrl = pci_host_config_read_common(pbdev->pdev, pos + PCI_CAP_FLAGS,
> > +             pci_config_size(pbdev->pdev), sizeof(ctrl));
> > +    table = pci_host_config_read_common(pbdev->pdev, pos + PCI_MSIX_TABLE,
> > +             pci_config_size(pbdev->pdev), sizeof(table));
> > +    pba = pci_host_config_read_common(pbdev->pdev, pos + PCI_MSIX_PBA,
> > +             pci_config_size(pbdev->pdev), sizeof(pba));
> > +
> > +    pbdev->msix.table_bar = table & PCI_MSIX_FLAGS_BIRMASK;
> > +    pbdev->msix.table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
> > +    pbdev->msix.pba_bar = pba & PCI_MSIX_FLAGS_BIRMASK;
> > +    pbdev->msix.pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> > +    pbdev->msix.entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
> > +    pbdev->msix.available = true;
> > +    return 0;
> > +}
> > +
> > +static void s390_pcihost_hot_plug(HotplugHandler *hotplug_dev,
> > +                                  DeviceState *dev, Error **errp)
> > +{
> > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > +    S390PCIBusDevice *pbdev;
> > +    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
> > +                                           ->qbus.parent);
> > +
> > +    pbdev = g_malloc0(sizeof(*pbdev));
> > +
> > +    pbdev->fid = s390_pci_get_pfid(pci_dev);
> > +    pbdev->pdev = pci_dev;
> > +    pbdev->configured = true;
> > +
> > +    pbdev->fh = s390_pci_get_pfh(pci_dev);
> > +
> > +    s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev = pbdev;
> > +    s390_pcihost_setup_msix(pbdev);
> > +
> > +    QTAILQ_INSERT_TAIL(&device_list, pbdev, next);
> > +    if (dev->hotplugged) {
> > +        s390_pci_generate_plug_event(HP_EVENT_RESERVED_TO_STANDBY,
> > +                                     pbdev->fh, pbdev->fid);
> > +        s390_pci_generate_plug_event(HP_EVENT_TO_CONFIGURED,
> > +                                     pbdev->fh, pbdev->fid);
> > +    }
> > +    return;
> > +}
> > +
> > +static void s390_pcihost_hot_unplug(HotplugHandler *hotplug_dev,
> > +                                    DeviceState *dev, Error **errp)
> > +{
> > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > +    S390pciState *s = S390_PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)
> > +                                           ->qbus.parent);
> > +    S390PCIBusDevice *pbdev = s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev;
> > +
> > +    if (pbdev->configured) {
> > +        pbdev->configured = false;
> > +        s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
> > +                                     pbdev->fh, pbdev->fid);
> > +    }
> > +
> > +    QTAILQ_REMOVE(&device_list, pbdev, next);
> > +    s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
> > +                                 pbdev->fh, pbdev->fid);
> > +    s->iommu[PCI_SLOT(pci_dev->devfn)].pbdev = NULL;
> > +    object_unparent(OBJECT(pci_dev));
> > +    g_free(pbdev);
> > +}
> > +
> > +static void s390_pcihost_class_init(ObjectClass *klass, void *data)
> > +{
> > +    SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
> > +
> > +    dc->cannot_instantiate_with_device_add_yet = true;
> > +    k->init = s390_pcihost_init;
> > +    hc->plug = s390_pcihost_hot_plug;
> > +    hc->unplug = s390_pcihost_hot_unplug;
> > +    msi_supported = true;
> > +}
> > +
> > +static const TypeInfo s390_pcihost_info = {
> > +    .name          = TYPE_S390_PCI_HOST_BRIDGE,
> > +    .parent        = TYPE_PCI_HOST_BRIDGE,
> > +    .instance_size = sizeof(S390pciState),
> > +    .class_init    = s390_pcihost_class_init,
> > +    .interfaces = (InterfaceInfo[]) {
> > +        { TYPE_HOTPLUG_HANDLER },
> > +        { }
> > +    }
> > +};
> > +
> > +static void s390_pci_register_types(void)
> > +{
> > +    type_register_static(&s390_pcihost_info);
> > +}
> > +
> > +type_init(s390_pci_register_types)
> > diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
> > new file mode 100644
> > index 0000000..088f24f
> > --- /dev/null
> > +++ b/hw/s390x/s390-pci-bus.h
> > @@ -0,0 +1,254 @@
> > +/*
> > + * s390 PCI BUS definitions
> > + *
> > + * Copyright 2014 IBM Corp.
> > + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> > + *            Hong Bo Li <lihbbj@cn.ibm.com>
> > + *            Yi Min Zhao <zyimin@cn.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> > + * your option) any later version. See the COPYING file in the top-level
> > + * directory.
> > + */
> > +
> > +#ifndef HW_S390_PCI_BUS_H
> > +#define HW_S390_PCI_BUS_H
> > +
> > +#include <hw/pci/pci.h>
> > +#include <hw/pci/pci_host.h>
> > +#include "hw/s390x/sclp.h"
> > +#include "hw/s390x/s390_flic.h"
> > +#include "hw/s390x/css.h"
> > +
> > +#define TYPE_S390_PCI_HOST_BRIDGE "s390-pcihost"
> > +#define FH_VIRT 0x00ff0000
> > +#define ENABLE_BIT_OFFSET 31
> > +#define S390_PCIPT_ADAPTER 2
> > +
> > +#define S390_PCI_HOST_BRIDGE(obj) \
> > +    OBJECT_CHECK(S390pciState, (obj), TYPE_S390_PCI_HOST_BRIDGE)
> > +
> > +#define HP_EVENT_TO_CONFIGURED        0x0301
> > +#define HP_EVENT_RESERVED_TO_STANDBY  0x0302
> > +#define HP_EVENT_CONFIGURED_TO_STBRES 0x0304
> > +#define HP_EVENT_STANDBY_TO_RESERVED  0x0308
> > +
> > +#define ZPCI_MSI_VEC_BITS 11
> > +#define ZPCI_MSI_VEC_MASK 0x7f
> > +
> > +#define ZPCI_MSI_ADDR  0xfe00000000000000
> > +#define ZPCI_SDMA_ADDR 0x100000000
> > +#define ZPCI_EDMA_ADDR 0x1ffffffffffffff
> > +
> > +#define PAGE_SHIFT      12
> > +#define PAGE_MASK       (~(PAGE_SIZE-1))
> > +#define PAGE_DEFAULT_ACC        0
> > +#define PAGE_DEFAULT_KEY        (PAGE_DEFAULT_ACC << 4)
> > +
> > +/* I/O Translation Anchor (IOTA) */
> > +enum ZpciIoatDtype {
> > +    ZPCI_IOTA_STO = 0,
> > +    ZPCI_IOTA_RTTO = 1,
> > +    ZPCI_IOTA_RSTO = 2,
> > +    ZPCI_IOTA_RFTO = 3,
> > +    ZPCI_IOTA_PFAA = 4,
> > +    ZPCI_IOTA_IOPFAA = 5,
> > +    ZPCI_IOTA_IOPTO = 7
> > +};
> > +
> > +#define ZPCI_IOTA_IOT_ENABLED           0x800UL
> > +#define ZPCI_IOTA_DT_ST                 (ZPCI_IOTA_STO  << 2)
> > +#define ZPCI_IOTA_DT_RT                 (ZPCI_IOTA_RTTO << 2)
> > +#define ZPCI_IOTA_DT_RS                 (ZPCI_IOTA_RSTO << 2)
> > +#define ZPCI_IOTA_DT_RF                 (ZPCI_IOTA_RFTO << 2)
> > +#define ZPCI_IOTA_DT_PF                 (ZPCI_IOTA_PFAA << 2)
> > +#define ZPCI_IOTA_FS_4K                 0
> > +#define ZPCI_IOTA_FS_1M                 1
> > +#define ZPCI_IOTA_FS_2G                 2
> > +#define ZPCI_KEY                        (PAGE_DEFAULT_KEY << 5)
> > +
> > +#define ZPCI_IOTA_STO_FLAG  (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_ST)
> > +#define ZPCI_IOTA_RTTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RT)
> > +#define ZPCI_IOTA_RSTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RS)
> > +#define ZPCI_IOTA_RFTO_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RF)
> > +#define ZPCI_IOTA_RFAA_FLAG (ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY |\
> > +                             ZPCI_IOTA_DT_PF | ZPCI_IOTA_FS_2G)
> > +
> > +/* I/O Region and segment tables */
> > +#define ZPCI_INDEX_MASK         0x7ffUL
> > +
> > +#define ZPCI_TABLE_TYPE_MASK    0xc
> > +#define ZPCI_TABLE_TYPE_RFX     0xc
> > +#define ZPCI_TABLE_TYPE_RSX     0x8
> > +#define ZPCI_TABLE_TYPE_RTX     0x4
> > +#define ZPCI_TABLE_TYPE_SX      0x0
> > +
> > +#define ZPCI_TABLE_LEN_RFX      0x3
> > +#define ZPCI_TABLE_LEN_RSX      0x3
> > +#define ZPCI_TABLE_LEN_RTX      0x3
> > +
> > +#define ZPCI_TABLE_OFFSET_MASK  0xc0
> > +#define ZPCI_TABLE_SIZE         0x4000
> > +#define ZPCI_TABLE_ALIGN        ZPCI_TABLE_SIZE
> > +#define ZPCI_TABLE_ENTRY_SIZE   (sizeof(unsigned long))
> > +#define ZPCI_TABLE_ENTRIES      (ZPCI_TABLE_SIZE / ZPCI_TABLE_ENTRY_SIZE)
> > +
> > +#define ZPCI_TABLE_BITS         11
> > +#define ZPCI_PT_BITS            8
> > +#define ZPCI_ST_SHIFT           (ZPCI_PT_BITS + PAGE_SHIFT)
> > +#define ZPCI_RT_SHIFT           (ZPCI_ST_SHIFT + ZPCI_TABLE_BITS)
> > +
> > +#define ZPCI_RTE_FLAG_MASK      0x3fffUL
> > +#define ZPCI_RTE_ADDR_MASK      (~ZPCI_RTE_FLAG_MASK)
> > +#define ZPCI_STE_FLAG_MASK      0x7ffUL
> > +#define ZPCI_STE_ADDR_MASK      (~ZPCI_STE_FLAG_MASK)
> > +
> > +/* I/O Page tables */
> > +#define ZPCI_PTE_VALID_MASK             0x400
> > +#define ZPCI_PTE_INVALID                0x400
> > +#define ZPCI_PTE_VALID                  0x000
> > +#define ZPCI_PT_SIZE                    0x800
> > +#define ZPCI_PT_ALIGN                   ZPCI_PT_SIZE
> > +#define ZPCI_PT_ENTRIES                 (ZPCI_PT_SIZE / ZPCI_TABLE_ENTRY_SIZE)
> > +#define ZPCI_PT_MASK                    (ZPCI_PT_ENTRIES - 1)
> > +
> > +#define ZPCI_PTE_FLAG_MASK              0xfffUL
> > +#define ZPCI_PTE_ADDR_MASK              (~ZPCI_PTE_FLAG_MASK)
> > +
> > +/* Shared bits */
> > +#define ZPCI_TABLE_VALID                0x00
> > +#define ZPCI_TABLE_INVALID              0x20
> > +#define ZPCI_TABLE_PROTECTED            0x200
> > +#define ZPCI_TABLE_UNPROTECTED          0x000
> > +
> > +#define ZPCI_TABLE_VALID_MASK           0x20
> > +#define ZPCI_TABLE_PROT_MASK            0x200
> > +
> > +typedef struct SeiContainer {
> > +    QTAILQ_ENTRY(SeiContainer) link;
> > +    uint32_t fid;
> > +    uint32_t fh;
> > +    uint8_t cc;
> > +    uint16_t pec;
> > +} SeiContainer;
> > +
> > +typedef struct PciCcdfErr {
> > +    uint32_t reserved1;
> > +    uint32_t fh;
> > +    uint32_t fid;
> > +    uint32_t reserved2;
> > +    uint64_t faddr;
> > +    uint32_t reserved3;
> > +    uint16_t reserved4;
> > +    uint16_t pec;
> > +} QEMU_PACKED PciCcdfErr;
> > +
> > +typedef struct PciCcdfAvail {
> > +    uint32_t reserved1;
> > +    uint32_t fh;
> > +    uint32_t fid;
> > +    uint32_t reserved2;
> > +    uint32_t reserved3;
> > +    uint32_t reserved4;
> > +    uint32_t reserved5;
> > +    uint16_t reserved6;
> > +    uint16_t pec;
> > +} QEMU_PACKED PciCcdfAvail;
> > +
> > +typedef struct ChscSeiNt2Res {
> > +    uint16_t length;
> > +    uint16_t code;
> > +    uint16_t reserved1;
> > +    uint8_t reserved2;
> > +    uint8_t nt;
> > +    uint8_t flags;
> > +    uint8_t reserved3;
> > +    uint8_t reserved4;
> > +    uint8_t cc;
> > +    uint32_t reserved5[13];
> > +    uint8_t ccdf[4016];
> > +} QEMU_PACKED ChscSeiNt2Res;
> > +
> > +typedef struct PciCfgSccb {
> > +        SCCBHeader header;
> > +        uint8_t atype;
> > +        uint8_t reserved1;
> > +        uint16_t reserved2;
> > +        uint32_t aid;
> > +} QEMU_PACKED PciCfgSccb;
> > +
> > +typedef struct S390MsixInfo {
> > +    bool available;
> > +    uint8_t table_bar;
> > +    uint8_t pba_bar;
> > +    uint16_t entries;
> > +    uint32_t table_offset;
> > +    uint32_t pba_offset;
> > +} S390MsixInfo;
> > +
> > +typedef struct S390PCIBusDevice {
> > +    PCIDevice *pdev;
> > +    bool configured;
> > +    uint32_t fh;
> > +    uint32_t fid;
> > +    uint64_t g_iota;
> > +    uint8_t isc;
> > +    S390MsixInfo msix;
> > +    AdapterRoutes routes;
> > +    QTAILQ_ENTRY(S390PCIBusDevice) next;
> > +} S390PCIBusDevice;
> > +
> > +typedef struct BEntry {
> > +    AddressSpace as;
> > +    MemoryRegion mr;
> > +    S390PCIBusDevice *pbdev;
> > +} BEntry;
> > +
> > +typedef struct S390pciState {
> > +    PCIHostState parent_obj;
> > +    BEntry iommu[PCI_SLOT_MAX];
> > +    AddressSpace msix_notify_as;
> > +    MemoryRegion msix_notify_mr;
> > +} S390pciState;
> > +
> > +static inline unsigned int calc_rtx(dma_addr_t ptr)
> > +{
> > +    return ((unsigned long) ptr >> ZPCI_RT_SHIFT) & ZPCI_INDEX_MASK;
> > +}
> > +
> > +static inline unsigned int calc_sx(dma_addr_t ptr)
> > +{
> > +    return ((unsigned long) ptr >> ZPCI_ST_SHIFT) & ZPCI_INDEX_MASK;
> > +}
> > +
> > +static inline unsigned int calc_px(dma_addr_t ptr)
> > +{
> > +    return ((unsigned long) ptr >> PAGE_SHIFT) & ZPCI_PT_MASK;
> > +}
> > +
> > +static inline unsigned long *get_rt_sto(unsigned long entry)
> > +{
> > +    return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_RTX)
> > +                ? (unsigned long *) (entry & ZPCI_RTE_ADDR_MASK)
> > +                : NULL;
> > +}
> > +
> > +static inline unsigned long *get_st_pto(unsigned long entry)
> > +{
> > +    return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_SX)
> > +            ? (unsigned long *) (entry & ZPCI_STE_ADDR_MASK)
> > +            : NULL;
> > +}
> 
> Are these static inlines used outside of a single place? If not, please
> move them into the .c file they get called from.
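
For reference, a minimal standalone sketch of what the three index helpers compute. The shift and mask values are assumptions taken from the Linux zPCI DMA definitions and are not visible in this hunk; the page-table mask follows from ZPCI_PT_SIZE (0x800) and an assumed 8-byte table entry. The sk_ names are made up for illustration. The sketch takes a uint64_t instead of casting to unsigned long, so the upper address bits are not lost on a host where long is 32 bits wide.

```c
#include <stdint.h>

/* Assumed values, taken from the Linux zPCI DMA definitions (not this hunk). */
#define SK_PAGE_SHIFT  12
#define SK_ST_SHIFT    20
#define SK_RT_SHIFT    31
#define SK_INDEX_MASK  0x7ffULL   /* 2048 entries per region or segment table */

/* ZPCI_PT_SIZE is 0x800 in the patch; 8-byte entries are an assumption. */
#define SK_PT_SIZE     0x800ULL
#define SK_ENTRY_SIZE  8ULL
#define SK_PT_MASK     (SK_PT_SIZE / SK_ENTRY_SIZE - 1)   /* 0xff */

/* Region table index: address bits 31..41, counted from the LSB. */
unsigned int sk_calc_rtx(uint64_t dma)
{
    return (unsigned int)((dma >> SK_RT_SHIFT) & SK_INDEX_MASK);
}

/* Segment table index: address bits 20..30. */
unsigned int sk_calc_sx(uint64_t dma)
{
    return (unsigned int)((dma >> SK_ST_SHIFT) & SK_INDEX_MASK);
}

/* Page table index: address bits 12..19. */
unsigned int sk_calc_px(uint64_t dma)
{
    return (unsigned int)((dma >> SK_PAGE_SHIFT) & SK_PT_MASK);
}
```

With these assumed constants, one DMA address splits into an 11-bit region index, an 11-bit segment index, an 8-bit page index and a 12-bit page offset.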
> 
> > +
> > +int chsc_sei_nt2_get_event(void *res);
> > +int chsc_sei_nt2_have_event(void);
> > +void s390_pci_sclp_configure(int configure, SCCB *sccb);
> > +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx);
> > +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh);
> > +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid);
> 
> I think it makes sense to pass the PHB device as parameter on these.
> Don't assume you only have one.

We need to look up our device mainly in the instruction handlers, and there
we do not have a PHB available. Also, having one list for our S390PCIBusDevice
entries does not prevent us from supporting more PHBs.

> 
> > +void s390_pci_bus_init(void);
> > +uint64_t s390_pci_get_table_origin(uint64_t iota);
> > +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
> > +                                  uint64_t guest_dma_address);
> 
> Why are these exported?
> 
> > +
> > +#endif
> > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> > index bc4dc2a..2e25834 100644
> > --- a/hw/s390x/s390-virtio-ccw.c
> > +++ b/hw/s390x/s390-virtio-ccw.c
> > @@ -18,6 +18,7 @@
> >  #include "css.h"
> >  #include "virtio-ccw.h"
> >  #include "qemu/config-file.h"
> > +#include "s390-pci-bus.h"
> >  
> >  #define TYPE_S390_CCW_MACHINE               "s390-ccw-machine"
> >  
> > @@ -127,6 +128,8 @@ static void ccw_init(MachineState *machine)
> >                        machine->initrd_filename, "s390-ccw.img");
> >      s390_flic_init();
> >  
> > +    s390_pci_bus_init();
> 
> Please just inline that function here.
> 

What do you mean by "just inline"?

> > +
> >      /* register hypercalls */
> >      virtio_ccw_register_hcalls();
> >  
> > diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> > index a759da7..a969975 100644
> > --- a/hw/s390x/sclp.c
> > +++ b/hw/s390x/sclp.c
> > @@ -20,6 +20,7 @@
> >  #include "qemu/config-file.h"
> >  #include "hw/s390x/sclp.h"
> >  #include "hw/s390x/event-facility.h"
> > +#include "hw/s390x/s390-pci-bus.h"
> >  
> >  static inline SCLPEventFacility *get_event_facility(void)
> >  {
> > @@ -62,7 +63,8 @@ static void read_SCP_info(SCCB *sccb)
> >          read_info->entries[i].type = 0;
> >      }
> >  
> > -    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO);
> > +    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO |
> > +                                        SCLP_HAS_PCI_RECONFIG);
> >  
> >      /*
> >       * The storage increment size is a multiple of 1M and is a power of 2.
> > @@ -350,6 +352,12 @@ static void sclp_execute(SCCB *sccb, uint32_t code)
> >      case SCLP_UNASSIGN_STORAGE:
> >          unassign_storage(sccb);
> >          break;
> > +    case SCLP_CMDW_CONFIGURE_PCI:
> > +        s390_pci_sclp_configure(1, sccb);
> > +        break;
> > +    case SCLP_CMDW_DECONFIGURE_PCI:
> > +        s390_pci_sclp_configure(0, sccb);
> > +        break;
> >      default:
> >          efc->command_handler(ef, sccb, code);
> >          break;
> > diff --git a/include/hw/s390x/sclp.h b/include/hw/s390x/sclp.h
> > index ec07a11..e8a64e2 100644
> > --- a/include/hw/s390x/sclp.h
> > +++ b/include/hw/s390x/sclp.h
> > @@ -43,14 +43,22 @@
> >  #define SCLP_CMDW_CONFIGURE_CPU                 0x00110001
> >  #define SCLP_CMDW_DECONFIGURE_CPU               0x00100001
> >  
> > +/* SCLP PCI codes */
> > +#define SCLP_HAS_PCI_RECONFIG                   0x0000000040000000ULL
> > +#define SCLP_CMDW_CONFIGURE_PCI                 0x001a0001
> > +#define SCLP_CMDW_DECONFIGURE_PCI               0x001b0001
> > +#define SCLP_RECONFIG_PCI_ATPYE                 2
> > +
> >  /* SCLP response codes */
> >  #define SCLP_RC_NORMAL_READ_COMPLETION          0x0010
> >  #define SCLP_RC_NORMAL_COMPLETION               0x0020
> >  #define SCLP_RC_SCCB_BOUNDARY_VIOLATION         0x0100
> > +#define SCLP_RC_NO_ACTION_REQUIRED              0x0120
> >  #define SCLP_RC_INVALID_SCLP_COMMAND            0x01f0
> >  #define SCLP_RC_CONTAINED_EQUIPMENT_CHECK       0x0340
> >  #define SCLP_RC_INSUFFICIENT_SCCB_LENGTH        0x0300
> >  #define SCLP_RC_STANDBY_READ_COMPLETION         0x0410
> > +#define SCLP_RC_ADAPTER_ID_NOT_RECOGNIZED       0x09f0
> >  #define SCLP_RC_INVALID_FUNCTION                0x40f0
> >  #define SCLP_RC_NO_EVENT_BUFFERS_STORED         0x60f0
> >  #define SCLP_RC_INVALID_SELECTION_MASK          0x70f0
> > diff --git a/target-s390x/ioinst.c b/target-s390x/ioinst.c
> > index b8a6486..d969f8f 100644
> > --- a/target-s390x/ioinst.c
> > +++ b/target-s390x/ioinst.c
> > @@ -14,6 +14,7 @@
> >  #include "cpu.h"
> >  #include "ioinst.h"
> >  #include "trace.h"
> > +#include "hw/s390x/s390-pci-bus.h"
> >  
> >  int ioinst_disassemble_sch_ident(uint32_t value, int *m, int *cssid, int *ssid,
> >                                   int *schid)
> > @@ -398,6 +399,7 @@ typedef struct ChscResp {
> >  #define CHSC_SCPD 0x0002
> >  #define CHSC_SCSC 0x0010
> >  #define CHSC_SDA  0x0031
> > +#define CHSC_SEI  0x000e
> >  
> >  #define CHSC_SCPD_0_M 0x20000000
> >  #define CHSC_SCPD_0_C 0x10000000
> > @@ -566,6 +568,53 @@ out:
> >      res->param = 0;
> >  }
> >  
> > +static int chsc_sei_nt0_get_event(void *res)
> > +{
> > +    /* no events yet */
> > +    return 1;
> > +}
> > +
> > +static int chsc_sei_nt0_have_event(void)
> > +{
> > +    /* no events yet */
> > +    return 0;
> > +}
> > +
> > +#define CHSC_SEI_NT0    (1ULL << 63)
> > +#define CHSC_SEI_NT2    (1ULL << 61)
> > +static void ioinst_handle_chsc_sei(ChscReq *req, ChscResp *res)
> > +{
> > +    uint64_t selection_mask = be64_to_cpu(*(uint64_t *)&req->param1);
> 
> ldq_p(&req->param1) I guess?
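
A minimal standalone sketch of the byte-wise big-endian load this comment points at. ldq_p() is presumably the in-tree helper to use; sk_load_be64() below is only a hypothetical stand-in so the example compiles outside QEMU. Unlike the cast-and-dereference in the patch, it does not depend on the alignment or declared type of param1.

```c
#include <stdint.h>

/* Selection mask bits as defined in the patch. */
#define SK_SEI_NT0  (1ULL << 63)
#define SK_SEI_NT2  (1ULL << 61)

/* Read 8 bytes in big-endian order from any address, aligned or not. */
uint64_t sk_load_be64(const void *p)
{
    const uint8_t *b = p;
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}
```

The result can then be tested against the NT0 and NT2 bits regardless of host byte order.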
> 
> 
> Alex

Thx for your help,

Frank

> 
> > +    uint8_t *res_flags = (uint8_t *)res->data;
> > +    int have_event = 0;
> > +    int have_more = 0;
> > +
> > +    /* regarding architecture nt0 can not be masked */
> > +    have_event = !chsc_sei_nt0_get_event(res);
> > +    have_more = chsc_sei_nt0_have_event();
> > +
> > +    if (selection_mask & CHSC_SEI_NT2) {
> > +        if (!have_event) {
> > +            have_event = !chsc_sei_nt2_get_event(res);
> > +        }
> > +
> > +        if (!have_more) {
> > +            have_more = chsc_sei_nt2_have_event();
> > +        }
> > +    }
> > +
> > +    if (have_event) {
> > +        res->code = cpu_to_be16(0x0001);
> > +        if (have_more) {
> > +            (*res_flags) |= 0x80;
> > +        } else {
> > +            (*res_flags) &= ~0x80;
> > +        }
> > +    } else {
> > +        res->code = cpu_to_be16(0x0004);
> > +    }
> > +}
> > +
> >  static void ioinst_handle_chsc_unimplemented(ChscResp *res)
> >  {
> >      res->len = cpu_to_be16(CHSC_MIN_RESP_LEN);
> > @@ -617,6 +666,9 @@ void ioinst_handle_chsc(S390CPU *cpu, uint32_t ipb)
> >      case CHSC_SDA:
> >          ioinst_handle_chsc_sda(req, res);
> >          break;
> > +    case CHSC_SEI:
> > +        ioinst_handle_chsc_sei(req, res);
> > +        break;
> >      default:
> >          ioinst_handle_chsc_unimplemented(res);
> >          break;
> > diff --git a/target-s390x/ioinst.h b/target-s390x/ioinst.h
> > index 29f6423..1efe16c 100644
> > --- a/target-s390x/ioinst.h
> > +++ b/target-s390x/ioinst.h
> > @@ -204,6 +204,7 @@ typedef struct CRW {
> >  
> >  #define CRW_RSC_SUBCH 0x3
> >  #define CRW_RSC_CHP   0x4
> > +#define CRW_RSC_CSS   0xb
> >  
> >  /* I/O interruption code */
> >  typedef struct IOIntCode {
> > 
> 
> 


* Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support
  2014-11-18 12:50     ` Frank Blaschka
@ 2014-11-18 17:00       ` Alexander Graf
  2014-11-25 10:11         ` Frank Blaschka
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-18 17:00 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth



On 18.11.14 13:50, Frank Blaschka wrote:
> On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
>>
>>
>> On 10.11.14 15:20, Frank Blaschka wrote:
>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>
>>> This patch implements a pci bus for s390x together with infrastructure
>>> to generate and handle hotplug events, to configure/unconfigure via
>>> sclp instruction, to do iommu translations and provide s390 support for
>>> MSI/MSI-X notification processing.
>>>
>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>

[...]

>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>> new file mode 100644
>>> index 0000000..f2fa6ba
>>> --- /dev/null
>>> +++ b/hw/s390x/s390-pci-bus.c
>>> @@ -0,0 +1,485 @@
>>> +/*
>>> + * s390 PCI BUS
>>> + *
>>> + * Copyright 2014 IBM Corp.
>>> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
>>> + *            Hong Bo Li <lihbbj@cn.ibm.com>
>>> + *            Yi Min Zhao <zyimin@cn.ibm.com>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
>>> + * your option) any later version. See the COPYING file in the top-level
>>> + * directory.
>>> + */
>>> +
>>> +#include <hw/pci/pci.h>
>>> +#include <hw/pci/pci_bus.h>
>>> +#include <hw/s390x/css.h>
>>> +#include <hw/s390x/sclp.h>
>>> +#include <hw/pci/msi.h>
>>> +#include "qemu/error-report.h"
>>> +#include "s390-pci-bus.h"
>>> +
>>> +/* #define DEBUG_S390PCI_BUS */
>>> +#ifdef DEBUG_S390PCI_BUS
>>> +#define DPRINTF(fmt, ...) \
>>> +    do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
>>> +#else
>>> +#define DPRINTF(fmt, ...) \
>>> +    do { } while (0)
>>> +#endif
>>> +
>>> +static const unsigned long be_to_le = BITS_PER_LONG - 1;
>>> +static QTAILQ_HEAD(, SeiContainer) pending_sei =
>>> +    QTAILQ_HEAD_INITIALIZER(pending_sei);
>>> +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
>>> +    QTAILQ_HEAD_INITIALIZER(device_list);
>>
>> Please get rid of all statics ;). All state has to live in objects.
>>
> 
> be_to_le was misleading and unnecessary; I will remove it. But a
> static QTAILQ_HEAD seems to be common practice for list anchors.
> If you really want me to change this, do you have a preferred way,
> or can you point me to some code doing this?

For PCI devices, I don't think you need a list at all. Your PHB device
should already have a proper qbus that knows about all its child devices.

As for pending_sei, what is this about?

> 
>>> +
>>> +int chsc_sei_nt2_get_event(void *res)

[...]

>>> +
>>> +int chsc_sei_nt2_get_event(void *res);
>>> +int chsc_sei_nt2_have_event(void);
>>> +void s390_pci_sclp_configure(int configure, SCCB *sccb);
>>> +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx);
>>> +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh);
>>> +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid);
>>
>> I think it makes sense to pass the PHB device as parameter on these.
>> Don't assume you only have one.
> 
> We need to look up our device mainly in the instruction handlers, and there
> we do not have a PHB available.

Then have a way to find your PHB - either put a variable into the
machine object, or find it by path via QOM tree lookups. Maybe we need
multiple PHBs, identified by part of the ID? I know too little about the
way PCI works on s390x to really tell.
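
A minimal sketch of the suggested shape, with the host bridge passed in explicitly. All Sk names are hypothetical, the slot count of 32 is assumed to match PCI_SLOT_MAX, and how an instruction handler obtains the PHB pointer (a machine field or a QOM path lookup) is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_SLOT_MAX 32   /* assumed to match PCI_SLOT_MAX */

typedef struct SkBusDevice {
    int      in_use;     /* slot holds a plugged function */
    uint32_t fh;         /* function handle */
    uint32_t fid;        /* function id */
} SkBusDevice;

/* Per-bridge state: the device table lives here, not in a file-scope static. */
typedef struct SkPhb {
    SkBusDevice dev[SK_SLOT_MAX];
} SkPhb;

/* Look up a function by handle on one specific host bridge. */
SkBusDevice *sk_find_dev_by_fh(SkPhb *phb, uint32_t fh)
{
    for (size_t i = 0; i < SK_SLOT_MAX; i++) {
        if (phb->dev[i].in_use && phb->dev[i].fh == fh) {
            return &phb->dev[i];
        }
    }
    return NULL;
}
```

With the table owned by the bridge, a second PHB is just a second SkPhb instance and the lookup code does not change.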

Again, are there specs?

> Also, having one list for our S390PCIBusDevice
> entries does not prevent us from supporting more PHBs.
> 
>>
>>> +void s390_pci_bus_init(void);
>>> +uint64_t s390_pci_get_table_origin(uint64_t iota);
>>> +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
>>> +                                  uint64_t guest_dma_address);
>>
>> Why are these exported?
>>
>>> +
>>> +#endif
>>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>>> index bc4dc2a..2e25834 100644
>>> --- a/hw/s390x/s390-virtio-ccw.c
>>> +++ b/hw/s390x/s390-virtio-ccw.c
>>> @@ -18,6 +18,7 @@
>>>  #include "css.h"
>>>  #include "virtio-ccw.h"
>>>  #include "qemu/config-file.h"
>>> +#include "s390-pci-bus.h"
>>>  
>>>  #define TYPE_S390_CCW_MACHINE               "s390-ccw-machine"
>>>  
>>> @@ -127,6 +128,8 @@ static void ccw_init(MachineState *machine)
>>>                        machine->initrd_filename, "s390-ccw.img");
>>>      s390_flic_init();
>>>  
>>> +    s390_pci_bus_init();
>>
>> Please just inline that function here.
>>
> 
> What do you mean by "just inline"?

The contents of the s390_pci_bus_init() function should just be standing
right here. There's no value in creating a public wrapper function for
initialization. We only did this back in the old days before qdev was
around, because initialization was difficult back then and some devices
didn't make the jump to get rid of their public init functions.

> 
>>> +
>>>      /* register hypercalls */
>>>      virtio_ccw_register_hcalls();
>>>  
>>> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
>>> index a759da7..a969975 100644
>>> --- a/hw/s390x/sclp.c
>>> +++ b/hw/s390x/sclp.c
>>> @@ -20,6 +20,7 @@
>>>  #include "qemu/config-file.h"
>>>  #include "hw/s390x/sclp.h"
>>>  #include "hw/s390x/event-facility.h"
>>> +#include "hw/s390x/s390-pci-bus.h"
>>>  
>>>  static inline SCLPEventFacility *get_event_facility(void)
>>>  {
>>> @@ -62,7 +63,8 @@ static void read_SCP_info(SCCB *sccb)
>>>          read_info->entries[i].type = 0;
>>>      }
>>>  
>>> -    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO);
>>> +    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO |
>>> +                                        SCLP_HAS_PCI_RECONFIG);

Can we make this conditional on the fact that there is PCI available? Or
do you want PCI support to be the baseline? Keep in mind that going
forward, we need to start thinking about machine backwards compatibility
too, so the QEMU 2.2 versioned machine should not expose PCI (though for
now we don't care yet IIRC, but still please keep this in mind to get
used to the way things will be in the future).
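
A minimal sketch of making the facility bit conditional. SK_HAS_PCI_RECONFIG uses the value from the patch; the machine_has_pci flag and both function names are hypothetical, and sk_store_be64() only stands in for the cpu_to_be64() conversion so the example compiles outside QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_HAS_PCI_RECONFIG 0x0000000040000000ULL   /* value from the patch */

/* Add the PCI reconfiguration facility only when the machine offers PCI. */
uint64_t sk_facilities(uint64_t base, bool machine_has_pci)
{
    return machine_has_pci ? (base | SK_HAS_PCI_RECONFIG) : base;
}

/* Store a host-order value as 8 big-endian bytes, as the SCCB expects. */
void sk_store_be64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}
```

An older versioned machine type would pass false and keep reporting exactly the facilities it reported before.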


Alex


* Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support
  2014-11-18 17:00       ` Alexander Graf
@ 2014-11-25 10:11         ` Frank Blaschka
  2014-11-25 12:14           ` Alexander Graf
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Blaschka @ 2014-11-25 10:11 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth

On Tue, Nov 18, 2014 at 06:00:40PM +0100, Alexander Graf wrote:
> 
> 
> On 18.11.14 13:50, Frank Blaschka wrote:
> > On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 10.11.14 15:20, Frank Blaschka wrote:
> >>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>
> >>> This patch implements a pci bus for s390x together with infrastructure
> >>> to generate and handle hotplug events, to configure/unconfigure via
> >>> sclp instruction, to do iommu translations and provide s390 support for
> >>> MSI/MSI-X notification processing.
> >>>
> >>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> 
> [...]
> 
> >>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> >>> new file mode 100644
> >>> index 0000000..f2fa6ba
> >>> --- /dev/null
> >>> +++ b/hw/s390x/s390-pci-bus.c
> >>> @@ -0,0 +1,485 @@
> >>> +/*
> >>> + * s390 PCI BUS
> >>> + *
> >>> + * Copyright 2014 IBM Corp.
> >>> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> >>> + *            Hong Bo Li <lihbbj@cn.ibm.com>
> >>> + *            Yi Min Zhao <zyimin@cn.ibm.com>
> >>> + *
> >>> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> >>> + * your option) any later version. See the COPYING file in the top-level
> >>> + * directory.
> >>> + */
> >>> +
> >>> +#include <hw/pci/pci.h>
> >>> +#include <hw/pci/pci_bus.h>
> >>> +#include <hw/s390x/css.h>
> >>> +#include <hw/s390x/sclp.h>
> >>> +#include <hw/pci/msi.h>
> >>> +#include "qemu/error-report.h"
> >>> +#include "s390-pci-bus.h"
> >>> +
> >>> +/* #define DEBUG_S390PCI_BUS */
> >>> +#ifdef DEBUG_S390PCI_BUS
> >>> +#define DPRINTF(fmt, ...) \
> >>> +    do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
> >>> +#else
> >>> +#define DPRINTF(fmt, ...) \
> >>> +    do { } while (0)
> >>> +#endif
> >>> +
> >>> +static const unsigned long be_to_le = BITS_PER_LONG - 1;
> >>> +static QTAILQ_HEAD(, SeiContainer) pending_sei =
> >>> +    QTAILQ_HEAD_INITIALIZER(pending_sei);
> >>> +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
> >>> +    QTAILQ_HEAD_INITIALIZER(device_list);
> >>
> >> Please get rid of all statics ;). All state has to live in objects.
> >>
> > 
> > > be_to_le was misleading and unnecessary; I will remove it. But a
> > > static QTAILQ_HEAD seems to be common practice for list anchors.
> > > If you really want me to change this, do you have a preferred way,
> > > or can you point me to some code doing this?
> 
> For PCI devices, I don't think you need a list at all. Your PHB device
> should already have a proper qbus that knows about all its child devices.

OK

> 
> As for pending_sei, what is this about?
>

This is a queue to store events (StoreEventInformation) used for hotplug
support. When a device is plugged or unplugged, an event is stored in this queue
and the guest is notified. The guest then picks up the event information via
the chsc instruction.
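
A minimal standalone sketch of such a pending-event queue, kept inside an owner struct rather than in a file-scope static. The Sk names and the plain singly linked list are assumptions made for illustration; the patch itself uses QTAILQ and SeiContainer. The return convention of sk_sei_pop() mirrors the patch, where get_event returns 0 when an event was delivered.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct SkSeiEvent {
    struct SkSeiEvent *next;
    uint32_t fid;
    uint32_t fh;
    uint16_t pec;            /* PCI event code */
} SkSeiEvent;

typedef struct SkSeiQueue {
    SkSeiEvent *head;
    SkSeiEvent **tail;       /* address of the link to fill on the next push */
} SkSeiQueue;

void sk_sei_init(SkSeiQueue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

/* Called on plug or unplug; returns 0 on success, -1 if allocation fails. */
int sk_sei_push(SkSeiQueue *q, uint32_t fid, uint32_t fh, uint16_t pec)
{
    SkSeiEvent *e = calloc(1, sizeof(*e));

    if (!e) {
        return -1;
    }
    e->fid = fid;
    e->fh = fh;
    e->pec = pec;
    *q->tail = e;
    q->tail = &e->next;
    return 0;
}

/* Called from the chsc handler; returns 0 and fills *out, or 1 if empty. */
int sk_sei_pop(SkSeiQueue *q, SkSeiEvent *out)
{
    SkSeiEvent *e = q->head;

    if (!e) {
        return 1;
    }
    q->head = e->next;
    if (!q->head) {
        q->tail = &q->head;
    }
    *out = *e;
    out->next = NULL;
    free(e);
    return 0;
}

/* Non-zero while further events are pending (the "more" flag). */
int sk_sei_have_event(const SkSeiQueue *q)
{
    return q->head != NULL;
}
```

Events leave the queue in the order they were stored, so the guest sees plug and unplug notifications in sequence.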
 
> > 
> >>> +
> >>> +int chsc_sei_nt2_get_event(void *res)
> 
> [...]
> 
> >>> +
> >>> +int chsc_sei_nt2_get_event(void *res);
> >>> +int chsc_sei_nt2_have_event(void);
> >>> +void s390_pci_sclp_configure(int configure, SCCB *sccb);
> >>> +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx);
> >>> +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh);
> >>> +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid);
> >>
> >> I think it makes sense to pass the PHB device as parameter on these.
> >> Don't assume you only have one.
> > 
> > > We need to look up our device mainly in the instruction handlers, and there
> > > we do not have a PHB available.
> 
> Then have a way to find your PHB - either put a variable into the
> machine object, or find it by path via QOM tree lookups. Maybe we need
> multiple PHBs, identified by part of the ID? I know too little about the
> way PCI works on s390x to really tell.
> 
> Again, are there specs?
>

Yes there are, but unfortunately they are not public.
 
> > > Also, having one list for our S390PCIBusDevice
> > > entries does not prevent us from supporting more PHBs.
> > 
> >>
> >>> +void s390_pci_bus_init(void);
> >>> +uint64_t s390_pci_get_table_origin(uint64_t iota);
> >>> +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
> >>> +                                  uint64_t guest_dma_address);
> >>
> >> Why are these exported?
> >>
> >>> +
> >>> +#endif
> >>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> >>> index bc4dc2a..2e25834 100644
> >>> --- a/hw/s390x/s390-virtio-ccw.c
> >>> +++ b/hw/s390x/s390-virtio-ccw.c
> >>> @@ -18,6 +18,7 @@
> >>>  #include "css.h"
> >>>  #include "virtio-ccw.h"
> >>>  #include "qemu/config-file.h"
> >>> +#include "s390-pci-bus.h"
> >>>  
> >>>  #define TYPE_S390_CCW_MACHINE               "s390-ccw-machine"
> >>>  
> >>> @@ -127,6 +128,8 @@ static void ccw_init(MachineState *machine)
> >>>                        machine->initrd_filename, "s390-ccw.img");
> >>>      s390_flic_init();
> >>>  
> >>> +    s390_pci_bus_init();
> >>
> >> Please just inline that function here.
> >>
> > 
> > What do you mean by "just inline"?
> 
> The contents of the s390_pci_bus_init() function should just be standing
> right here. There's no value in creating a public wrapper function for
> initialization. We only did this back in the old days before qdev was
> around, because initialization was difficult back then and some devices
> didn't make the jump to get rid of their public init functions.
> 
> > 
> >>> +
> >>>      /* register hypercalls */
> >>>      virtio_ccw_register_hcalls();
> >>>  
> >>> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> >>> index a759da7..a969975 100644
> >>> --- a/hw/s390x/sclp.c
> >>> +++ b/hw/s390x/sclp.c
> >>> @@ -20,6 +20,7 @@
> >>>  #include "qemu/config-file.h"
> >>>  #include "hw/s390x/sclp.h"
> >>>  #include "hw/s390x/event-facility.h"
> >>> +#include "hw/s390x/s390-pci-bus.h"
> >>>  
> >>>  static inline SCLPEventFacility *get_event_facility(void)
> >>>  {
> >>> @@ -62,7 +63,8 @@ static void read_SCP_info(SCCB *sccb)
> >>>          read_info->entries[i].type = 0;
> >>>      }
> >>>  
> >>> -    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO);
> >>> +    read_info->facilities = cpu_to_be64(SCLP_HAS_CPU_INFO |
> >>> +                                        SCLP_HAS_PCI_RECONFIG);
> 
> Can we make this conditional on the fact that there is PCI available? Or
> do you want PCI support to be the baseline? Keep in mind that going
> forward, we need to start thinking about machine backwards compatibility
> too, so the QEMU 2.2 versioned machine should not expose PCI (though for
> now we don't care yet IIRC, but still please keep this in mind to get
> used to the way things will be in the future).
>

Yes, it should depend on the machine generation. Michael Mueller is
working on this. I will add this once the new CPU model code becomes
available. The same goes for the PHB itself: it should only be created if
the machine supports zPCI.
 
> 
> Alex
> 

I think I have addressed all issues now and plan to post a new version
of the patch set later this week.

Thx, Frank


* Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support
  2014-11-25 10:11         ` Frank Blaschka
@ 2014-11-25 12:14           ` Alexander Graf
  2014-11-25 12:43             ` Frank Blaschka
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Graf @ 2014-11-25 12:14 UTC (permalink / raw)
  To: Frank Blaschka
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth



On 25.11.14 11:11, Frank Blaschka wrote:
> On Tue, Nov 18, 2014 at 06:00:40PM +0100, Alexander Graf wrote:
>>
>>
>> On 18.11.14 13:50, Frank Blaschka wrote:
>>> On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>>
>>>>> This patch implements a pci bus for s390x together with infrastructure
>>>>> to generate and handle hotplug events, to configure/unconfigure via
>>>>> sclp instruction, to do iommu translations and provide s390 support for
>>>>> MSI/MSI-X notification processing.
>>>>>
>>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>
>> [...]
>>
>>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>>> new file mode 100644
>>>>> index 0000000..f2fa6ba
>>>>> --- /dev/null
>>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>>> @@ -0,0 +1,485 @@
>>>>> +/*
>>>>> + * s390 PCI BUS
>>>>> + *
>>>>> + * Copyright 2014 IBM Corp.
>>>>> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
>>>>> + *            Hong Bo Li <lihbbj@cn.ibm.com>
>>>>> + *            Yi Min Zhao <zyimin@cn.ibm.com>
>>>>> + *
>>>>> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
>>>>> + * your option) any later version. See the COPYING file in the top-level
>>>>> + * directory.
>>>>> + */
>>>>> +
>>>>> +#include <hw/pci/pci.h>
>>>>> +#include <hw/pci/pci_bus.h>
>>>>> +#include <hw/s390x/css.h>
>>>>> +#include <hw/s390x/sclp.h>
>>>>> +#include <hw/pci/msi.h>
>>>>> +#include "qemu/error-report.h"
>>>>> +#include "s390-pci-bus.h"
>>>>> +
>>>>> +/* #define DEBUG_S390PCI_BUS */
>>>>> +#ifdef DEBUG_S390PCI_BUS
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
>>>>> +#else
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { } while (0)
>>>>> +#endif
>>>>> +
>>>>> +static const unsigned long be_to_le = BITS_PER_LONG - 1;
>>>>> +static QTAILQ_HEAD(, SeiContainer) pending_sei =
>>>>> +    QTAILQ_HEAD_INITIALIZER(pending_sei);
>>>>> +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
>>>>> +    QTAILQ_HEAD_INITIALIZER(device_list);
>>>>
>>>> Please get rid of all statics ;). All state has to live in objects.
>>>>
>>>
>>> be_to_le was misleading and unnecessary; I will remove it. But a
>>> static QTAILQ_HEAD seems to be common practice for list anchors.
>>> If you really want me to change this, do you have a preferred way,
>>> or can you point me to some code doing this?
>>
>> For PCI devices, I don't think you need a list at all. Your PHB device
>> should already have a proper qbus that knows about all its child devices.
> 
> OK
> 
>>
>> As for pending_sei, what is this about?
>>
> 
> This is a queue to store events (StoreEventInformation) used for hotplug
> support. When a device is plugged or unplugged, an event is stored in this queue
> and the guest is notified. The guest then picks up the event information via
> the chsc instruction.

Is this for overall CCW or only for PCI? Depending on the answer, you
can put the sei event list into the respective parent device.


Alex


* Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support
  2014-11-25 12:14           ` Alexander Graf
@ 2014-11-25 12:43             ` Frank Blaschka
  0 siblings, 0 replies; 27+ messages in thread
From: Frank Blaschka @ 2014-11-25 12:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell, Frank Blaschka, james.hogan, mtosatti, qemu-devel,
	borntraeger, cornelia.huck, pbonzini, rth

On Tue, Nov 25, 2014 at 01:14:01PM +0100, Alexander Graf wrote:
> 
> 
> On 25.11.14 11:11, Frank Blaschka wrote:
> > On Tue, Nov 18, 2014 at 06:00:40PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 18.11.14 13:50, Frank Blaschka wrote:
> >>> On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
> >>>>
> >>>>
> >>>> On 10.11.14 15:20, Frank Blaschka wrote:
> >>>>> From: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>>
> >>>>> This patch implements a pci bus for s390x together with infrastructure
> >>>>> to generate and handle hotplug events, to configure/unconfigure via
> >>>>> sclp instruction, to do iommu translations and provide s390 support for
> >>>>> MSI/MSI-X notification processing.
> >>>>>
> >>>>> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> >>
> >> [...]
> >>
> >>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> >>>>> new file mode 100644
> >>>>> index 0000000..f2fa6ba
> >>>>> --- /dev/null
> >>>>> +++ b/hw/s390x/s390-pci-bus.c
> >>>>> @@ -0,0 +1,485 @@
> >>>>> +/*
> >>>>> + * s390 PCI BUS
> >>>>> + *
> >>>>> + * Copyright 2014 IBM Corp.
> >>>>> + * Author(s): Frank Blaschka <frank.blaschka@de.ibm.com>
> >>>>> + *            Hong Bo Li <lihbbj@cn.ibm.com>
> >>>>> + *            Yi Min Zhao <zyimin@cn.ibm.com>
> >>>>> + *
> >>>>> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> >>>>> + * your option) any later version. See the COPYING file in the top-level
> >>>>> + * directory.
> >>>>> + */
> >>>>> +
> >>>>> +#include <hw/pci/pci.h>
> >>>>> +#include <hw/pci/pci_bus.h>
> >>>>> +#include <hw/s390x/css.h>
> >>>>> +#include <hw/s390x/sclp.h>
> >>>>> +#include <hw/pci/msi.h>
> >>>>> +#include "qemu/error-report.h"
> >>>>> +#include "s390-pci-bus.h"
> >>>>> +
> >>>>> +/* #define DEBUG_S390PCI_BUS */
> >>>>> +#ifdef DEBUG_S390PCI_BUS
> >>>>> +#define DPRINTF(fmt, ...) \
> >>>>> +    do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
> >>>>> +#else
> >>>>> +#define DPRINTF(fmt, ...) \
> >>>>> +    do { } while (0)
> >>>>> +#endif
> >>>>> +
> >>>>> +static const unsigned long be_to_le = BITS_PER_LONG - 1;
> >>>>> +static QTAILQ_HEAD(, SeiContainer) pending_sei =
> >>>>> +    QTAILQ_HEAD_INITIALIZER(pending_sei);
> >>>>> +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
> >>>>> +    QTAILQ_HEAD_INITIALIZER(device_list);
> >>>>
> >>>> Please get rid of all statics ;). All state has to live in objects.
> >>>>
> >>>
> >>> be_to_le was misleading and unnecessary; I will remove it. But a
> >>> static QTAILQ_HEAD seems to be common practice for list anchors.
> >>> If you really want me to change this, do you have a preferred way,
> >>> or can you point me to some code doing this?
> >>
> >> For PCI devices, I don't think you need a list at all. Your PHB device
> >> should already have a proper qbus that knows about all its child devices.
> > 
> > OK
> > 
> >>
> >> As for pending_sei, what is this about?
> >>
> > 
> > This is a queue to store events (StoreEventInformation) used for hotplug
> > support. When a device is plugged or unplugged, an event is stored in this queue
> > and the guest is notified. The guest then picks up the event information via
> > the chsc instruction.
> 
> Is this for overall CCW or only for PCI? Depending on the answer, you
> can put the sei event list into the respective parent device.
>

An NT2 event is PCI-specific, so I moved the queue for NT2 events to the PHB as well.

> 
> Alex
> 


Thread overview: 27+ messages
2014-11-10 14:20 [Qemu-devel] [PATCH 0/3] add PCI support for the s390 platform Frank Blaschka
2014-11-10 14:20 ` [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support Frank Blaschka
2014-11-10 15:14   ` Alexander Graf
2014-11-18 12:50     ` Frank Blaschka
2014-11-18 17:00       ` Alexander Graf
2014-11-25 10:11         ` Frank Blaschka
2014-11-25 12:14           ` Alexander Graf
2014-11-25 12:43             ` Frank Blaschka
2014-11-10 14:20 ` [Qemu-devel] [PATCH 2/3] s390: implement pci instructions Frank Blaschka
2014-11-10 15:56   ` Alexander Graf
2014-11-11 12:10     ` Frank Blaschka
2014-11-11 12:16       ` Alexander Graf
2014-11-11 12:39         ` Frank Blaschka
2014-11-11 12:51           ` Alexander Graf
2014-11-11 14:08             ` Frank Blaschka
2014-11-11 15:24               ` Alexander Graf
2014-11-12  8:49                 ` Frank Blaschka
2014-11-12  9:08                   ` Alexander Graf
2014-11-12  9:11                     ` Paolo Bonzini
2014-11-12  9:13                       ` Alexander Graf
2014-11-12  9:19                     ` Frank Blaschka
2014-11-12  9:22                       ` Alexander Graf
2014-11-12  9:36                         ` Paolo Bonzini
2014-11-12 14:34                           ` Frank Blaschka
2014-11-11 12:17       ` Peter Maydell
2014-11-11 12:40         ` Frank Blaschka
2014-11-10 14:20 ` [Qemu-devel] [PATCH 3/3] kvm: extend kvm_irqchip_add_msi_route to work on s390 Frank Blaschka
