* [Qemu-devel] [PATCH 0/3] RFCv2 kvm irqfd: add directly mapped MSI IRQ support @ 2013-06-21 9:22 Alexey Kardashevskiy 2013-06-21 9:22 ` [Qemu-devel] [PATCH 1/3] spapr pci msi: rework Alexey Kardashevskiy ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Alexey Kardashevskiy @ 2013-06-21 9:22 UTC (permalink / raw) To: qemu-devel Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, Alex Williamson, qemu-ppc, David Gibson Thanks for the comments on the previous try. Here is a rework done via the KVM API rather than via the PCI framework. Thanks! Alexey Kardashevskiy (3): spapr pci msi: rework KVM: add kvm_arch_irqchip_add_msi_route KVM: PPC: enable irqfd hw/ppc/spapr.c | 2 + hw/ppc/spapr_pci.c | 97 ++++++++++++++++++++----------------------- include/hw/pci-host/spapr.h | 8 ++-- include/hw/ppc/spapr.h | 2 + include/sysemu/kvm.h | 1 + kvm-all.c | 5 +++ target-arm/kvm.c | 6 +++ target-i386/kvm.c | 6 +++ target-ppc/kvm.c | 12 ++++++ target-s390x/kvm.c | 6 +++ 10 files changed, 89 insertions(+), 56 deletions(-) -- 1.7.10.4 ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 1/3] spapr pci msi: rework 2013-06-21 9:22 [Qemu-devel] [PATCH 0/3] RFCv2 kvm irqfd: add directly mapped MSI IRQ support Alexey Kardashevskiy @ 2013-06-21 9:22 ` Alexey Kardashevskiy 2013-06-21 10:31 ` Alexander Graf 2013-06-21 9:22 ` [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route Alexey Kardashevskiy 2013-06-21 9:22 ` [Qemu-devel] [PATCH 3/3] KVM: PPC: enable irqfd Alexey Kardashevskiy 2 siblings, 1 reply; 22+ messages in thread From: Alexey Kardashevskiy @ 2013-06-21 9:22 UTC (permalink / raw) To: qemu-devel Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, Alex Williamson, qemu-ppc, David Gibson Previously every PCI host bridge implemented its own MSI memory window in order to catch msi_notify()/msix_notify() calls from various QEMU MSI-capable devices such as virtio-pci or vfio and redirect them to the guest via qemu_pulse_irq(). The MSIMessage used to be encoded as: * .addr - an address in an MSI window; this is how QEMU knows which PHB the message is for; * .data - the number of a device on a specific PHB and the vector number. As a PHB has a descriptor for every device, and every descriptor has the first IRQ number and the number of IRQs, it can calculate the global IRQ number to use in qemu_pulse_irq(). However the total number of IRQs is not really big (at the moment it is 1024 IRQs which start from 4096) and the existing system looks overdesigned. The patch simplifies it. Specifically: 1. MSI windows were removed from the PHBs. 2. One memory region was added for all MSIs. 3. Now MSIMessage::addr is the number of the first IRQ of a device and MSIMessage::data is the vector number. Putting the IRQ number into .data and not using .addr would make it even simpler for MSI-X, but it would not work for MSI with multiple vectors unless the first IRQ number of a device is aligned to the number of MSI vectors. The simplified scheme also allows easier MSIMessage->IRQ translation for the upcoming IRQFD support. 
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- hw/ppc/spapr.c | 2 + hw/ppc/spapr_pci.c | 97 ++++++++++++++++++++----------------------- include/hw/pci-host/spapr.h | 8 ++-- include/hw/ppc/spapr.h | 2 + 4 files changed, 53 insertions(+), 56 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index b18e99b..5660dba 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1243,6 +1243,8 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) /* Set up PCI */ spapr_pci_rtas_init(); + spapr_pci_msi_window_init(spapr, SPAPR_PCI_MSI_WINDOW, + XICS_IRQ_BASE + XICS_IRQS); phb = spapr_create_phb(spapr, 0); diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 3666c8b..cb8596b 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -257,30 +257,6 @@ static int spapr_msicfg_find(sPAPRPHBState *phb, uint32_t config_addr, return -1; } -/* - * Set MSI/MSIX message data. - * This is required for msi_notify()/msix_notify() which - * will write at the addresses via spapr_msi_write(). 
- */ -static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, - bool msix, unsigned req_num) -{ - unsigned i; - MSIMessage msg = { .address = addr, .data = 0 }; - - if (!msix) { - msi_set_message(pdev, msg); - trace_spapr_pci_msi_setup(pdev->name, 0, msg.address); - return; - } - - for (i = 0; i < req_num; ++i) { - msg.address = addr | (i << 2); - msix_set_message(pdev, i, msg); - trace_spapr_pci_msi_setup(pdev->name, i, msg.address); - } -} - static void rtas_ibm_change_msi(sPAPREnvironment *spapr, uint32_t token, uint32_t nargs, target_ulong args, uint32_t nret, @@ -292,9 +268,10 @@ static void rtas_ibm_change_msi(sPAPREnvironment *spapr, unsigned int req_num = rtas_ld(args, 4); /* 0 == remove all */ unsigned int seq_num = rtas_ld(args, 5); unsigned int ret_intr_type; - int ndev, irq; + int i, ndev, irq; sPAPRPHBState *phb = NULL; PCIDevice *pdev = NULL; + MSIMessage msg; switch (func) { case RTAS_CHANGE_MSI_FN: @@ -366,9 +343,29 @@ static void rtas_ibm_change_msi(sPAPREnvironment *spapr, phb->msi_table[ndev].config_addr = config_addr; } - /* Setup MSI/MSIX vectors in the device (via cfgspace or MSIX BAR) */ - spapr_msi_setmsg(pdev, phb->msi_win_addr | (ndev << 16), - ret_intr_type == RTAS_TYPE_MSIX, req_num); + /* + * Set MSI/MSIX message data. + * This is required for msi_notify()/msix_notify() which + * will write at the addresses via spapr_msi_write(). + * + * The MSI address is an IRQ number lshifted by 2 in order to align + * it to 32bit. 
+ * The MSI data is: + * MSI: zero for MSI as the caller is expected to pass the vector number there + * MSIX: the vector number (to do the same what MSI does) + */ + msg.address = spapr->msi_win_addr + (phb->msi_table[ndev].irq << 2); + msg.data = 0; + if (ret_intr_type == RTAS_TYPE_MSI) { + msi_set_message(pdev, msg); + trace_spapr_pci_msi_setup(pdev->name, 0, msg.address); + } else { + for (i = 0; i < phb->msi_table[ndev].nvec; ++i) { + msg.data = i; + msix_set_message(pdev, i, msg); + trace_spapr_pci_msi_setup(pdev->name, i, msg.address); + } + } rtas_st(rets, 0, 0); rtas_st(rets, 1, req_num); @@ -490,10 +487,7 @@ static const MemoryRegionOps spapr_io_ops = { static void spapr_msi_write(void *opaque, hwaddr addr, uint64_t data, unsigned size) { - sPAPRPHBState *phb = opaque; - int ndev = addr >> 16; - int vec = ((addr & 0xFFFF) >> 2) | data; - uint32_t irq = phb->msi_table[ndev].irq + vec; + uint32_t irq = (addr >> 2) + data; trace_spapr_pci_msi_write(addr, data, irq); @@ -507,6 +501,23 @@ static const MemoryRegionOps spapr_msi_ops = { .endianness = DEVICE_LITTLE_ENDIAN }; +void spapr_pci_msi_window_init(sPAPREnvironment *spapr, hwaddr addr, int nvec) +{ + if (!msi_supported) { + return; + } + /* + * As MSI/MSIX interrupts trigger by writing at MSI/MSIX vectors, + * we need to allocate some memory to catch those writes coming + * from msi_notify()/msix_notify(). 
+ */ + spapr->msi_win_addr = addr; + memory_region_init_io(&spapr->msiwindow, &spapr_msi_ops, spapr, + "msi", nvec << 2); + memory_region_add_subregion(get_system_memory(), spapr->msi_win_addr, + &spapr->msiwindow); +} + /* * PHB PCI device */ @@ -535,8 +546,7 @@ static int _spapr_phb_init(SysBusDevice *s) if ((sphb->buid != -1) || (sphb->dma_liobn != -1) || (sphb->mem_win_addr != -1) - || (sphb->io_win_addr != -1) - || (sphb->msi_win_addr != -1)) { + || (sphb->io_win_addr != -1)) { fprintf(stderr, "Either \"index\" or other parameters must" " be specified for PAPR PHB, not both\n"); return -1; @@ -549,7 +559,6 @@ static int _spapr_phb_init(SysBusDevice *s) + sphb->index * SPAPR_PCI_WINDOW_SPACING; sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF; sphb->io_win_addr = windows_base + SPAPR_PCI_IO_WIN_OFF; - sphb->msi_win_addr = windows_base + SPAPR_PCI_MSI_WIN_OFF; } if (sphb->buid == -1) { @@ -572,11 +581,6 @@ static int _spapr_phb_init(SysBusDevice *s) return -1; } - if (sphb->msi_win_addr == -1) { - fprintf(stderr, "MSI window address not specified for PHB\n"); - return -1; - } - if (find_phb(spapr, sphb->buid)) { fprintf(stderr, "PCI host bridges must have unique BUIDs\n"); return -1; @@ -615,17 +619,6 @@ static int _spapr_phb_init(SysBusDevice *s) memory_region_add_subregion(get_system_memory(), sphb->io_win_addr, &sphb->iowindow); - /* As MSI/MSIX interrupts trigger by writing at MSI/MSIX vectors, - * we need to allocate some memory to catch those writes coming - * from msi_notify()/msix_notify() */ - if (msi_supported) { - sprintf(namebuf, "%s.msi", sphb->dtbusname); - memory_region_init_io(&sphb->msiwindow, &spapr_msi_ops, sphb, - namebuf, SPAPR_MSIX_MAX_DEVS * 0x10000); - memory_region_add_subregion(get_system_memory(), sphb->msi_win_addr, - &sphb->msiwindow); - } - /* * Selecting a busname is more complex than you'd think, due to * interacting constraints. 
If the user has specified an id @@ -710,7 +703,6 @@ static Property spapr_phb_properties[] = { DEFINE_PROP_HEX64("io_win_addr", sPAPRPHBState, io_win_addr, -1), DEFINE_PROP_HEX64("io_win_size", sPAPRPHBState, io_win_size, SPAPR_PCI_IO_WIN_SIZE), - DEFINE_PROP_HEX64("msi_win_addr", sPAPRPHBState, msi_win_addr, -1), DEFINE_PROP_END_OF_LIST(), }; @@ -760,7 +752,6 @@ static const VMStateDescription vmstate_spapr_pci = { VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState), VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState), VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState), - VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState), VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0, vmstate_spapr_pci_lsi, struct spapr_pci_lsi), VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0, diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index 398dec5..cf1b22e 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -47,8 +47,7 @@ typedef struct sPAPRPHBState { MemoryRegion memspace, iospace; hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size; - hwaddr msi_win_addr; - MemoryRegion memwindow, iowindow, msiwindow; + MemoryRegion memwindow, iowindow; uint32_t dma_liobn; uint64_t dma_window_start; @@ -85,7 +84,8 @@ typedef struct sPAPRPHBVFIOState { #define SPAPR_PCI_MMIO_WIN_SIZE 0x20000000 #define SPAPR_PCI_IO_WIN_OFF 0x80000000 #define SPAPR_PCI_IO_WIN_SIZE 0x10000 -#define SPAPR_PCI_MSI_WIN_OFF 0x90000000 + +#define SPAPR_PCI_MSI_WINDOW 0x20000000000ULL #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL @@ -100,6 +100,8 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t xics_phandle, void *fdt); +void spapr_pci_msi_window_init(sPAPREnvironment *spapr, hwaddr addr, int nvec); + void spapr_pci_rtas_init(void); #endif /* __HW_SPAPR_PCI_H__ */ diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index 0dc005c..598c0e9 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -14,6 +14,8 @@ 
struct icp_state; typedef struct sPAPREnvironment { struct VIOsPAPRBus *vio_bus; QLIST_HEAD(, sPAPRPHBState) phbs; + hwaddr msi_win_addr; + MemoryRegion msiwindow; struct sPAPRNVRAM *nvram; struct icp_state *icp; -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
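The encoding introduced by this patch can be sketched outside QEMU as follows. This is a minimal standalone illustration, not the actual QEMU code (which lives in rtas_ibm_change_msi() and spapr_msi_write()); the constant and helper names here are hypothetical, and the real window base is SPAPR_PCI_MSI_WINDOW:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror SPAPR_PCI_MSI_WINDOW from the patch. */
#define MSI_WINDOW_BASE 0x20000000000ULL

typedef struct {
    uint64_t address;
    uint32_t data;
} msi_message;

/* Encode as in rtas_ibm_change_msi(): the address carries the device's
 * first IRQ number shifted left by 2 (32-bit alignment), the data
 * carries the vector number. */
static msi_message encode(uint32_t first_irq, uint32_t vector)
{
    msi_message msg = {
        .address = MSI_WINDOW_BASE + ((uint64_t)first_irq << 2),
        .data = vector,
    };
    return msg;
}

/* Decode as in spapr_msi_write(): the memory API hands the handler a
 * window-relative offset, so the global IRQ is (offset >> 2) + data. */
static uint32_t decode(uint64_t window_offset, uint32_t data)
{
    return (uint32_t)(window_offset >> 2) + data;
}
```

A round trip for a device whose first IRQ is 4096 and whose third MSI-X vector fires would yield global IRQ 4099, which is why no per-PHB device table lookup is needed any more.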
* Re: [Qemu-devel] [PATCH 1/3] spapr pci msi: rework 2013-06-21 9:22 ` [Qemu-devel] [PATCH 1/3] spapr pci msi: rework Alexey Kardashevskiy @ 2013-06-21 10:31 ` Alexander Graf 2013-06-21 10:52 ` Alexey Kardashevskiy 2013-06-21 12:02 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 22+ messages in thread From: Alexander Graf @ 2013-06-21 10:31 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: Anthony Liguori, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On 21.06.2013, at 11:22, Alexey Kardashevskiy wrote: > Previously every PCI host bridge implemented its own MSI memory window > in order to catch msi_notify()/msix_notify() calls from various QEMU > MSI-capable devives such as virtio-pci or vfio and redirect them to > the guest via qemu_pulse_irq(). That's how hardware works, no? > > The encoded MSIMessage used to be encoded as: > * .addr - address in a MSI window, this is how QEMU knows which PHB > is the message for; > * .data - number of a device on a specific PHB and vector number. > > As a PHB has a destriptor for every device, and every descriptor has > first IRQ number and the number of IRQs, it can calculate global IRQ > number to use in qemu_pulse_irq(). How does this work on real hardware? Alex > However the total number of IRQs is not really big (at the moment it is > 1024 IRQs which start from 4096) and the existing system looks overdesigned. > The patch simplifies it. Specifically: > > 1. MSI windows were removed from PHB. > 2. Added one memory region for all MSIs. > 3. Now MSIMessage::addr is a number of first IRQ of a device, > MSIMessage:data is a number of a vector. > > Putting IRQ number to .data and not using .addr would make it even simpler > for MSI-X but it will not work for MSI with multiple vectors unless a first > IRQ number of a device is aligned to the MSI vectors number. > > The simplified scheme also allows easier MSIMessage->IRQ translation > for upcoming IRQFD support. 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] spapr pci msi: rework 2013-06-21 10:31 ` Alexander Graf @ 2013-06-21 10:52 ` Alexey Kardashevskiy 2013-06-21 11:58 ` Anthony Liguori 2013-06-21 12:02 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 22+ messages in thread From: Alexey Kardashevskiy @ 2013-06-21 10:52 UTC (permalink / raw) To: Alexander Graf Cc: Anthony Liguori, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On 06/21/2013 08:31 PM, Alexander Graf wrote: > > On 21.06.2013, at 11:22, Alexey Kardashevskiy wrote: > >> Previously every PCI host bridge implemented its own MSI memory window >> in order to catch msi_notify()/msix_notify() calls from various QEMU >> MSI-capable devives such as virtio-pci or vfio and redirect them to >> the guest via qemu_pulse_irq(). > > That's how hardware works, no? > >> >> The encoded MSIMessage used to be encoded as: >> * .addr - address in a MSI window, this is how QEMU knows which PHB >> is the message for; >> * .data - number of a device on a specific PHB and vector number. >> >> As a PHB has a destriptor for every device, and every descriptor has >> first IRQ number and the number of IRQs, it can calculate global IRQ >> number to use in qemu_pulse_irq(). > > How does this work on real hardware? I do not understand the question, really. Here we are emulating pHyp which is not real hardware and never pretended to be. Our guests do not touch MSI records in the config space and use RTAS MSI calls instead. > > > Alex > >> However the total number of IRQs is not really big (at the moment it is >> 1024 IRQs which start from 4096) and the existing system looks overdesigned. >> The patch simplifies it. Specifically: >> >> 1. MSI windows were removed from PHB. >> 2. Added one memory region for all MSIs. >> 3. Now MSIMessage::addr is a number of first IRQ of a device, >> MSIMessage:data is a number of a vector. 
>> >> Putting IRQ number to .data and not using .addr would make it even simpler >> for MSI-X but it will not work for MSI with multiple vectors unless a first >> IRQ number of a device is aligned to the MSI vectors number. >> >> The simplified scheme also allows easier MSIMessage->IRQ translation >> for upcoming IRQFD support. >> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > -- Alexey ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] spapr pci msi: rework 2013-06-21 10:52 ` Alexey Kardashevskiy @ 2013-06-21 11:58 ` Anthony Liguori 2013-06-21 11:59 ` Alexander Graf 2013-06-21 12:09 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 22+ messages in thread From: Anthony Liguori @ 2013-06-21 11:58 UTC (permalink / raw) To: Alexey Kardashevskiy, Alexander Graf Cc: Alex Williamson, qemu-ppc, qemu-devel, David Gibson Alexey Kardashevskiy <aik@ozlabs.ru> writes: > On 06/21/2013 08:31 PM, Alexander Graf wrote: >> >> On 21.06.2013, at 11:22, Alexey Kardashevskiy wrote: >> >>> Previously every PCI host bridge implemented its own MSI memory window >>> in order to catch msi_notify()/msix_notify() calls from various QEMU >>> MSI-capable devives such as virtio-pci or vfio and redirect them to >>> the guest via qemu_pulse_irq(). >> >> That's how hardware works, no? >> >>> >>> The encoded MSIMessage used to be encoded as: >>> * .addr - address in a MSI window, this is how QEMU knows which PHB >>> is the message for; >>> * .data - number of a device on a specific PHB and vector number. >>> >>> As a PHB has a destriptor for every device, and every descriptor has >>> first IRQ number and the number of IRQs, it can calculate global IRQ >>> number to use in qemu_pulse_irq(). >> >> How does this work on real hardware? > > > I do not understand the question, really. Here we are emulating pHyp which > is not real hardware and never pretended to be. Our guests do not touch MSI > records in the config space and use RTAS MSI calls instead. But RTAS is implemented as guest code. I suspect it's doing region access to generate the actual MSI events. Regards, Anthony Liguori > > >> >> >> Alex >> >>> However the total number of IRQs is not really big (at the moment it is >>> 1024 IRQs which start from 4096) and the existing system looks overdesigned. >>> The patch simplifies it. Specifically: >>> >>> 1. MSI windows were removed from PHB. >>> 2. Added one memory region for all MSIs. >>> 3. 
Now MSIMessage::addr is a number of first IRQ of a device, >>> MSIMessage:data is a number of a vector. >>> >>> Putting IRQ number to .data and not using .addr would make it even simpler >>> for MSI-X but it will not work for MSI with multiple vectors unless a first >>> IRQ number of a device is aligned to the MSI vectors number. >>> >>> The simplified scheme also allows easier MSIMessage->IRQ translation >>> for upcoming IRQFD support. >>> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> > > > -- > Alexey ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] spapr pci msi: rework 2013-06-21 11:58 ` Anthony Liguori @ 2013-06-21 11:59 ` Alexander Graf 2013-06-21 12:09 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 22+ messages in thread From: Alexander Graf @ 2013-06-21 11:59 UTC (permalink / raw) To: Anthony Liguori Cc: Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On 21.06.2013, at 13:58, Anthony Liguori wrote: > Alexey Kardashevskiy <aik@ozlabs.ru> writes: > >> On 06/21/2013 08:31 PM, Alexander Graf wrote: >>> >>> On 21.06.2013, at 11:22, Alexey Kardashevskiy wrote: >>> >>>> Previously every PCI host bridge implemented its own MSI memory window >>>> in order to catch msi_notify()/msix_notify() calls from various QEMU >>>> MSI-capable devives such as virtio-pci or vfio and redirect them to >>>> the guest via qemu_pulse_irq(). >>> >>> That's how hardware works, no? >>> >>>> >>>> The encoded MSIMessage used to be encoded as: >>>> * .addr - address in a MSI window, this is how QEMU knows which PHB >>>> is the message for; >>>> * .data - number of a device on a specific PHB and vector number. >>>> >>>> As a PHB has a destriptor for every device, and every descriptor has >>>> first IRQ number and the number of IRQs, it can calculate global IRQ >>>> number to use in qemu_pulse_irq(). >>> >>> How does this work on real hardware? >> >> >> I do not understand the question, really. Here we are emulating pHyp which >> is not real hardware and never pretended to be. Our guests do not touch MSI >> records in the config space and use RTAS MSI calls instead. > > But RTAS is implemented as guest code. I suspect it's doing region > access to generate the actual MSI events. RTAS is actually implemented as a hypercall, which IIRC in our case modifies the PCI config space registers directly. Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] spapr pci msi: rework 2013-06-21 11:58 ` Anthony Liguori 2013-06-21 11:59 ` Alexander Graf @ 2013-06-21 12:09 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 22+ messages in thread From: Benjamin Herrenschmidt @ 2013-06-21 12:09 UTC (permalink / raw) To: Anthony Liguori Cc: Alexey Kardashevskiy, Alexander Graf, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On Fri, 2013-06-21 at 06:58 -0500, Anthony Liguori wrote: > > I do not understand the question, really. Here we are emulating pHyp which > > is not real hardware and never pretended to be. Our guests do not touch MSI > > records in the config space and use RTAS MSI calls instead. > > But RTAS is implemented as guest code. I suspect it's doing region > access to generate the actual MSI events. No, not really. On pHyp, RTAS just gets the list of MSIs for the slot from pHyp using private hypercalls, including the MSI address. Note that in HW the MSI address is specific to a PCI host bridge, ie, you have 6 bridges, they may all 6 provide the same addresses to the device though they decode them to different PE's (protection domains). But it doesn't matter. From a PAPR guest perspective, indeed, we don't care, *except* for a hack that went upstream in 3.10 that tries to enforce 32-bit MSIs on broken AMD video cards and makes assumptions based on the PCIe bus speed in the device-tree :-) However we shouldn't hit that. In any case, if Alexey was to actually emulate our real HW, just having the address + data is not enough to identify a specific interrupt since each PHB will have its own domain there. Thus the host bridge must be passed down the call at the very least. From there, the way we internally generate and decode those address/data in qemu is of no relevance since PAPR being paravirtualized, the guest doesn't care. Ben. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] spapr pci msi: rework 2013-06-21 10:31 ` Alexander Graf 2013-06-21 10:52 ` Alexey Kardashevskiy @ 2013-06-21 12:02 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 22+ messages in thread From: Benjamin Herrenschmidt @ 2013-06-21 12:02 UTC (permalink / raw) To: Alexander Graf Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On Fri, 2013-06-21 at 12:31 +0200, Alexander Graf wrote: > On 21.06.2013, at 11:22, Alexey Kardashevskiy wrote: > > > Previously every PCI host bridge implemented its own MSI memory window > > in order to catch msi_notify()/msix_notify() calls from various QEMU > > MSI-capable devives such as virtio-pci or vfio and redirect them to > > the guest via qemu_pulse_irq(). > > That's how hardware works, no? > > > > > The encoded MSIMessage used to be encoded as: > > * .addr - address in a MSI window, this is how QEMU knows which PHB > > is the message for; > > * .data - number of a device on a specific PHB and vector number. > > > > As a PHB has a destriptor for every device, and every descriptor has > > first IRQ number and the number of IRQs, it can calculate global IRQ > > number to use in qemu_pulse_irq(). > > How does this work on real hardware? Why would we care? The MSIs are established by an hcall anyway, the guest sees nothing of that, we can encode what we want in there. Now the way it works on real HW is different from what Alex did originally. The address of the MSI defines an MSI "port" which is purely a mechanism for verifying that the device is authorized to do MSIs (to that port) and associating a PE# with it, which is cross-matched with the PHB-local MSI index which is the message value. That value is then ORed with a PHB-configured "BUID" which forms the full global interrupt number. At least on P7IOC ... P8 is a bit different. But we don't really need to reproduce that exactly. Ben. 
> > Alex > > > However the total number of IRQs is not really big (at the moment it is > > 1024 IRQs which start from 4096) and the existing system looks overdesigned. > > The patch simplifies it. Specifically: > > > > 1. MSI windows were removed from PHB. > > 2. Added one memory region for all MSIs. > > 3. Now MSIMessage::addr is a number of first IRQ of a device, > > MSIMessage:data is a number of a vector. > > > > Putting IRQ number to .data and not using .addr would make it even simpler > > for MSI-X but it will not work for MSI with multiple vectors unless a first > > IRQ number of a device is aligned to the MSI vectors number. > > > > The simplified scheme also allows easier MSIMessage->IRQ translation > > for upcoming IRQFD support. > > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> ^ permalink raw reply [flat|nested] 22+ messages in thread
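For illustration only, the P7IOC scheme Ben describes above (the PHB ORs a configured "BUID" base with the PHB-local MSI message value to form the full global interrupt number) amounts to something like the sketch below. The function name and the example field values are assumptions for the sake of the demo, not taken from any real firmware or QEMU code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: form the global interrupt number by ORing a
 * PHB-configured BUID base with the PHB-local MSI index (the MSI
 * message value). Field widths are assumed, not authoritative. */
static uint32_t p7ioc_global_irq(uint32_t buid_base, uint32_t msi_index)
{
    return buid_base | msi_index;
}
```

This is why address + data alone cannot identify a specific interrupt on the real hardware: the same PHB-local index decodes to different global numbers depending on which host bridge (and hence which BUID) it arrives at.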
* [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 9:22 [Qemu-devel] [PATCH 0/3] RFCv2 kvm irqfd: add directly mapped MSI IRQ support Alexey Kardashevskiy 2013-06-21 9:22 ` [Qemu-devel] [PATCH 1/3] spapr pci msi: rework Alexey Kardashevskiy @ 2013-06-21 9:22 ` Alexey Kardashevskiy 2013-06-21 10:33 ` Alexander Graf 2013-06-21 9:22 ` [Qemu-devel] [PATCH 3/3] KVM: PPC: enable irqfd Alexey Kardashevskiy 2 siblings, 1 reply; 22+ messages in thread From: Alexey Kardashevskiy @ 2013-06-21 9:22 UTC (permalink / raw) To: qemu-devel Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, Alex Williamson, qemu-ppc, David Gibson At the moment kvm_irqchip_add_msi_route() adds routing entries for every MSI IRQ we want to use with IRQFD. However, on PPC64-pSeries no routing is required as QEMU is already doing MSIMessage to IRQ conversion. When the guest needs to allocate some MSI vectors, it asks the hypervisor (QEMU), which returns IRQs and does all the MSIMessage allocation/initialization in QEMU. The patch adds callbacks to let a specific platform define how exactly the routing is configured. 
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- include/sysemu/kvm.h | 1 + kvm-all.c | 5 +++++ target-arm/kvm.c | 6 ++++++ target-i386/kvm.c | 6 ++++++ target-ppc/kvm.c | 10 ++++++++++ target-s390x/kvm.c | 6 ++++++ 6 files changed, 34 insertions(+) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 8b19322..9d5459d 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -215,6 +215,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr); int kvm_arch_on_sigbus(int code, void *addr); void kvm_arch_init_irq_routing(KVMState *s); +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg); int kvm_set_irq(KVMState *s, int irq, int level); int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg); diff --git a/kvm-all.c b/kvm-all.c index 91aa7ff..a92cc04 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1187,6 +1187,11 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) struct kvm_irq_routing_entry kroute; int virq; + virq = kvm_arch_irqchip_add_msi_route(s, msg); + if (virq >= 0) { + return virq; + } + if (!kvm_gsi_routing_enabled()) { return -ENOSYS; } diff --git a/target-arm/kvm.c b/target-arm/kvm.c index 7a90d38..1a9bec1 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -22,6 +22,7 @@ #include "kvm_arm.h" #include "cpu.h" #include "hw/arm/arm.h" +#include "hw/pci/msi.h" const KVMCapabilityInfo kvm_arch_required_capabilities[] = { KVM_CAP_LAST_INFO @@ -497,3 +498,8 @@ void kvm_arch_remove_all_hw_breakpoints(void) void kvm_arch_init_irq_routing(KVMState *s) { } + +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ + return -1; +} diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 7ba98cd..96bb85c 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -33,6 +33,7 @@ #include "exec/ioport.h" #include "hyperv.h" #include "hw/pci/pci.h" +#include "hw/pci/msi.h" //#define DEBUG_KVM @@ -2207,6 +2208,11 @@ void kvm_arch_init_irq_routing(KVMState *s) kvm_gsi_routing_allowed = true; } +int 
kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ + return -1; +} + /* Classic KVM device assignment interface. Will remain x86 only. */ int kvm_device_pci_assign(KVMState *s, PCIHostDeviceAddress *dev_addr, uint32_t flags, uint32_t *dev_id) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index b60b684..d6da146 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -35,6 +35,7 @@ #include "hw/sysbus.h" #include "hw/ppc/spapr.h" #include "hw/ppc/spapr_vio.h" +#include "hw/pci/msi.h" #include "sysemu/watchdog.h" //#define DEBUG_KVM @@ -1973,3 +1974,12 @@ int kvm_arch_on_sigbus(int code, void *addr) void kvm_arch_init_irq_routing(KVMState *s) { } + +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ + if (!kvm_msi_via_irqfd_allowed) + return -1; + + msg.address -= spapr->msi_win_addr; + return (msg.address >> 2) + msg.data; +} diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c index 4d9ac4a..4fa7a3d 100644 --- a/target-s390x/kvm.c +++ b/target-s390x/kvm.c @@ -36,6 +36,7 @@ #include "sysemu/device_tree.h" #include "qapi/qmp/qjson.h" #include "monitor/monitor.h" +#include "hw/pci/msi.h" /* #define DEBUG_KVM */ @@ -932,3 +933,8 @@ void kvm_s390_enable_css_support(S390CPU *cpu) void kvm_arch_init_irq_routing(KVMState *s) { } + +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) +{ + return -1; +} -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
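The fallback pattern this patch adds (try an arch-specific translation first, and fall back to generic GSI routing when the hook returns a negative value) can be sketched in isolation like this. The names are simplified stand-ins for the QEMU symbols kvm_irqchip_add_msi_route(), kvm_arch_irqchip_add_msi_route() and kvm_msi_via_irqfd_allowed, and the globals exist only to make the sketch self-contained:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t address;
    uint32_t data;
} msi_message;

static int msi_via_irqfd_allowed;   /* stands in for kvm_msi_via_irqfd_allowed */
static uint64_t msi_win_addr;       /* stands in for spapr->msi_win_addr */

/* The PPC variant from the patch: translate the MSIMessage directly to
 * an IRQ number instead of allocating a routing entry. */
static int arch_add_msi_route_ppc(msi_message msg)
{
    if (!msi_via_irqfd_allowed) {
        return -1;  /* no direct mapping; let the generic path handle it */
    }
    return (int)((msg.address - msi_win_addr) >> 2) + (int)msg.data;
}

/* The generic path from kvm-all.c: consult the arch hook first and only
 * fall through to GSI routing on a negative return. */
static int add_msi_route(msi_message msg)
{
    int virq = arch_add_msi_route_ppc(msg);
    if (virq >= 0) {
        return virq;
    }
    /* ... here the real code would allocate a kvm_irq_routing_entry ... */
    return -1;
}
```

Note this is also where Alex's BookE objection bites: on a platform where the arch hook must not claim the message, it has to return a negative value so the generic routing still runs.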
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 9:22 ` [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route Alexey Kardashevskiy @ 2013-06-21 10:33 ` Alexander Graf 2013-06-21 12:03 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 22+ messages in thread From: Alexander Graf @ 2013-06-21 10:33 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: Anthony Liguori, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On 21.06.2013, at 11:22, Alexey Kardashevskiy wrote: > At the moment kvm_irqchip_add_msi_route() adds routing entries for every > MSI IRQ we want to use with IRQFD. > > However on PPC64-pSeries no routing is required as QEMU is already doing > MSIMessage to IRQ converson. When the guest needs to allocate some > MSI vectors, it asks a hypervisor (QEMU) which returns IRQs and does all > the MSIMessage allocation/initialization in QEMU. > > The patch adds callbacks to let a specific platform define how exactly > the routing is configured. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > include/sysemu/kvm.h | 1 + > kvm-all.c | 5 +++++ > target-arm/kvm.c | 6 ++++++ > target-i386/kvm.c | 6 ++++++ > target-ppc/kvm.c | 10 ++++++++++ > target-s390x/kvm.c | 6 ++++++ > 6 files changed, 34 insertions(+) > > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h > index 8b19322..9d5459d 100644 > --- a/include/sysemu/kvm.h > +++ b/include/sysemu/kvm.h > @@ -215,6 +215,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr); > int kvm_arch_on_sigbus(int code, void *addr); > > void kvm_arch_init_irq_routing(KVMState *s); > +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg); > > int kvm_set_irq(KVMState *s, int irq, int level); > int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg); > diff --git a/kvm-all.c b/kvm-all.c > index 91aa7ff..a92cc04 100644 > --- a/kvm-all.c > +++ b/kvm-all.c > @@ -1187,6 +1187,11 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) > struct 
kvm_irq_routing_entry kroute; > int virq; > > + virq = kvm_arch_irqchip_add_msi_route(s, msg); > + if (virq >= 0) { > + return virq; > + } > + > if (!kvm_gsi_routing_enabled()) { > return -ENOSYS; > } > diff --git a/target-arm/kvm.c b/target-arm/kvm.c > index 7a90d38..1a9bec1 100644 > --- a/target-arm/kvm.c > +++ b/target-arm/kvm.c > @@ -22,6 +22,7 @@ > #include "kvm_arm.h" > #include "cpu.h" > #include "hw/arm/arm.h" > +#include "hw/pci/msi.h" > > const KVMCapabilityInfo kvm_arch_required_capabilities[] = { > KVM_CAP_LAST_INFO > @@ -497,3 +498,8 @@ void kvm_arch_remove_all_hw_breakpoints(void) > void kvm_arch_init_irq_routing(KVMState *s) > { > } > + > +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) > +{ > + return -1; > +} > diff --git a/target-i386/kvm.c b/target-i386/kvm.c > index 7ba98cd..96bb85c 100644 > --- a/target-i386/kvm.c > +++ b/target-i386/kvm.c > @@ -33,6 +33,7 @@ > #include "exec/ioport.h" > #include "hyperv.h" > #include "hw/pci/pci.h" > +#include "hw/pci/msi.h" > > //#define DEBUG_KVM > > @@ -2207,6 +2208,11 @@ void kvm_arch_init_irq_routing(KVMState *s) > kvm_gsi_routing_allowed = true; > } > > +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) > +{ > + return -1; > +} > + > /* Classic KVM device assignment interface. Will remain x86 only. 
*/ > int kvm_device_pci_assign(KVMState *s, PCIHostDeviceAddress *dev_addr, > uint32_t flags, uint32_t *dev_id) > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index b60b684..d6da146 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -35,6 +35,7 @@ > #include "hw/sysbus.h" > #include "hw/ppc/spapr.h" > #include "hw/ppc/spapr_vio.h" > +#include "hw/pci/msi.h" > #include "sysemu/watchdog.h" > > //#define DEBUG_KVM > @@ -1973,3 +1974,12 @@ int kvm_arch_on_sigbus(int code, void *addr) > void kvm_arch_init_irq_routing(KVMState *s) > { > } > + > +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) > +{ > + if (!kvm_msi_via_irqfd_allowed) > + return -1; > + > + msg.address -= spapr->msi_win_addr; > + return (msg.address >> 2) + msg.data; This breaks BookE. Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 10:33 ` Alexander Graf @ 2013-06-21 12:03 ` Benjamin Herrenschmidt 2013-06-21 12:05 ` Alexander Graf 0 siblings, 1 reply; 22+ messages in thread From: Benjamin Herrenschmidt @ 2013-06-21 12:03 UTC (permalink / raw) To: Alexander Graf Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On Fri, 2013-06-21 at 12:33 +0200, Alexander Graf wrote: > > +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) > > +{ > > + if (!kvm_msi_via_irqfd_allowed) > > + return -1; > > + > > + msg.address -= spapr->msi_win_addr; > > + return (msg.address >> 2) + msg.data; > > This breaks BookE. It might be a bit more constructive to explain why and maybe propose a solution... Alexey doesn't necessarily know the specifics of the BookE stuff :-) Cheers, Ben. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 12:03 ` Benjamin Herrenschmidt @ 2013-06-21 12:05 ` Alexander Graf 2013-06-21 12:10 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 22+ messages in thread From: Alexander Graf @ 2013-06-21 12:05 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On 21.06.2013, at 14:03, Benjamin Herrenschmidt wrote: > On Fri, 2013-06-21 at 12:33 +0200, Alexander Graf wrote: >>> +int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) >>> +{ >>> + if (!kvm_msi_via_irqfd_allowed) >>> + return -1; >>> + >>> + msg.address -= spapr->msi_win_addr; >>> + return (msg.address >> 2) + msg.data; >> >> This breaks BookE. > > It might be a bit more constructive to explain why and maybe propose a > solution... Alexey doesn't necessarily know the specifics of the BookE > stuff :-) Oh, I thought that was obvious. BookE also has in-kernel emulation and also does set kvm_msi_via_irqfd_allowed. The code above is spapr specific and would simply rewrite valid BookE MPIC addresses into something broken. In fact, where does the spapr variable come from at all here? Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 12:05 ` Alexander Graf @ 2013-06-21 12:10 ` Benjamin Herrenschmidt 2013-06-21 13:46 ` Alexander Graf 0 siblings, 1 reply; 22+ messages in thread From: Benjamin Herrenschmidt @ 2013-06-21 12:10 UTC (permalink / raw) To: Alexander Graf Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On Fri, 2013-06-21 at 14:05 +0200, Alexander Graf wrote: > Oh, I thought that was obvious. BookE also has in-kernel emulation and > also does set kvm_msi_via_irqfd_allowed. The code above is spapr > specific and would simply rewrite valid BookE MPIC addresses into > something broken. > > In fact, where does the spapr variable come from at all here? Right, so we are back to square #1 -> This should be a method of the PCI Host bridge. Ben. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 12:10 ` Benjamin Herrenschmidt @ 2013-06-21 13:46 ` Alexander Graf 2013-06-21 21:54 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 22+ messages in thread From: Alexander Graf @ 2013-06-21 13:46 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On 21.06.2013, at 14:10, Benjamin Herrenschmidt wrote: > On Fri, 2013-06-21 at 14:05 +0200, Alexander Graf wrote: >> Oh, I thought that was obvious. BookE also has in-kernel emulation and >> also does set kvm_msi_via_irqfd_allowed. The code above is spapr >> specific and would simply rewrite valid BookE MPIC addresses into >> something broken. >> >> In fact, where does the spapr variable come from at all here? > > Right, so we are back to square #1 -> This should be a method of the > PCI Host bridge. Not sure. We could just declare a "direct virq==irq" mode in which msi.data == virq == irq. No need for any translation then. Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
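Alex's "direct virq==irq" suggestion can be sketched as below. This is an illustration only — the flag name `kvm_direct_msi_mapping` and the standalone `MSIMessage` struct are assumptions, not actual QEMU symbols; the point is simply that the arch hook would return `msg.data` unchanged instead of doing PHB-specific address arithmetic or adding a routing-table entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QEMU's MSIMessage (illustrative, not the real header). */
typedef struct {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Hypothetical per-platform flag: when set, the platform guarantees
 * that MSIMessage.data already holds the guest IRQ number. */
static bool kvm_direct_msi_mapping;

/* Sketch of the arch hook: in direct mode, virq == irq == msg.data,
 * so no kvm_irq_routing_entry needs to be added at all. */
static int kvm_arch_irqchip_add_msi_route_sketch(MSIMessage msg)
{
    if (!kvm_direct_msi_mapping) {
        return -1; /* fall through to the generic GSI routing code */
    }
    return (int)msg.data;
}
```

Because the hook no longer reads `msg.address`, the sPAPR-only `spapr->msi_win_addr` arithmetic that broke BookE disappears entirely: a machine that does not opt in simply falls back to the existing routing path.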
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 13:46 ` Alexander Graf @ 2013-06-21 21:54 ` Benjamin Herrenschmidt 2013-06-21 22:12 ` Alexander Graf 0 siblings, 1 reply; 22+ messages in thread From: Benjamin Herrenschmidt @ 2013-06-21 21:54 UTC (permalink / raw) To: Alexander Graf Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On Fri, 2013-06-21 at 15:46 +0200, Alexander Graf wrote: > Not sure. We could just declare a "direct virq==irq" mode in which > msi.data == virq == irq. No need for any translation then. Maybe. Beware that MSI data is only 16-bit on the wire but we may not care here. One thing I'm not 100% certain of is how Alexey makes all that work with VFIO since the MSI address/data in the device shall not be the qemu "cooked up" ones, but the real HW ones (corresponding to a different host interrupt). How does that work? Cheers, Ben. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 21:54 ` Benjamin Herrenschmidt @ 2013-06-21 22:12 ` Alexander Graf 2013-06-21 22:21 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 22+ messages in thread From: Alexander Graf @ 2013-06-21 22:12 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On 21.06.2013, at 23:54, Benjamin Herrenschmidt wrote: > On Fri, 2013-06-21 at 15:46 +0200, Alexander Graf wrote: >> Not sure. We could just declare a "direct virq==irq" mode in which >> msi.data == virq == irq. No need for any translation then. > > Maybe. Beware that MSI data is only 16-bit on the wire but we may not > care here. > > One thing I'm not 100% certain of is how Alexey makes all that work with > VFIO since the MSI address/data in the device shall not be the qemu > "cooked up" ones, but the real HW ones (corresponding to a different > host interrupt). > > How do that work ? The real device address/data go to a normal host interrupt vector. Once we hit such a vector, we need to find out that it's destined for the guest in real mode - no idea how you planned to do that - and then reinject it back into the guest with the virtual irq vector that you can find out by asking the irqfd hopefully. It might make sense to implement it the easy way without real mode first, and then take it from there ;). Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
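The non-realmode path Alex describes — the host takes the real interrupt, signals an eventfd, and KVM's irqfd side injects the guest vector — can be demonstrated with a bare eventfd pair. This is a Linux-only sketch of the mechanism, with stand-in function names, not QEMU/KVM code:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Host-side interrupt handler (stand-in): kick the eventfd that the
 * irqfd consumer is watching. */
static void host_irq_kick(int efd)
{
    uint64_t one = 1;
    ssize_t n = write(efd, &one, sizeof(one));
    assert(n == sizeof(one));
}

/* irqfd consumer (stand-in for KVM): drain the counter and return how
 * many interrupts are pending injection into the guest. */
static uint64_t irqfd_drain(int efd)
{
    uint64_t cnt = 0;
    ssize_t n = read(efd, &cnt, sizeof(cnt));
    assert(n == sizeof(cnt));
    return cnt;
}
```

In non-semaphore mode the eventfd counter accumulates kicks and a single read drains it, which is why the host side can fire several times before the injection side catches up.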
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 22:12 ` Alexander Graf @ 2013-06-21 22:21 ` Benjamin Herrenschmidt 2013-06-21 23:10 ` Alex Williamson 0 siblings, 1 reply; 22+ messages in thread From: Benjamin Herrenschmidt @ 2013-06-21 22:21 UTC (permalink / raw) To: Alexander Graf Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, Alex Williamson, qemu-ppc, David Gibson On Sat, 2013-06-22 at 00:12 +0200, Alexander Graf wrote: > On 21.06.2013, at 23:54, Benjamin Herrenschmidt wrote: > > > On Fri, 2013-06-21 at 15:46 +0200, Alexander Graf wrote: > >> Not sure. We could just declare a "direct virq==irq" mode in which > >> msi.data == virq == irq. No need for any translation then. > > > > Maybe. Beware that MSI data is only 16-bit on the wire but we may > not > > care here. > > > > One thing I'm not 100% certain of is how Alexey makes all that work > with > > > VFIO since the MSI address/data in the device shall not be the qemu > > > "cooked up" ones, but the real HW ones (corresponding to a different > > > host interrupt). > > > > How do that work ? > > The real device address/data go to a normal host interrupt vector. Right, I know that :-) > Once we hit such a vector, we need to find out that it's destined for > the guest in real mode - no idea how you planned to do that - and then > reinject it back into the guest with the virtual irq vector that you > can find out by asking the irqfd hopefully. > > It might make sense to implement it the easy way without real mode > first, and then take it from there ;). We have that already. That isn't my question. How is the "configuration" done ? On x86, I assume, the guest kernel directly whacks the MSI config space and MSI-X BAR space using some address/data it cooks up which are *not* the real ones needed in the HW right ? So qemu seems to play tricks to intercept the MSI-X BAR, I saw that, but how does it know what real value to put in the HW instead ?
In our case, the guest calls RTAS, which needs to configure "something" in the device. What does it do ? Does it call into VFIO which then does a pci_enable_msi[x] ? In that case how do you deal with a guest for example prodding a single MSI-X in the middle of the array ? This is not an interface supported by pci_enable_msix .... Cheers, Ben. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 22:21 ` Benjamin Herrenschmidt @ 2013-06-21 23:10 ` Alex Williamson 2013-06-21 23:19 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 22+ messages in thread From: Alex Williamson @ 2013-06-21 23:10 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-devel, qemu-ppc, David Gibson On Sat, 2013-06-22 at 08:21 +1000, Benjamin Herrenschmidt wrote: > On Sat, 2013-06-22 at 00:12 +0200, Alexander Graf wrote: > > On 21.06.2013, at 23:54, Benjamin Herrenschmidt wrote: > > > > > On Fri, 2013-06-21 at 15:46 +0200, Alexander Graf wrote: > > >> Not sure. We could just declare a "direct virq==irq" mode in which > > >> msi.data == virq == irq. No need for any translation then. > > > > > > Maybe. Beware that MSI data is only 16-bit on the wire but we may > > not > > > care here. > > > > > > One thing I'm not 100% certain of is how Alexey makes all that work > > with > > > VFIO since the MSI address/data in the device shall not be the qemu > > > "cooked up" ones, but the real HW ones (corresponding to a different > > > host interrupt). > > > > > > How do that work ? > > > > The real device address/data go to a normal host interrupt vector. > > Right, I know that :-) > > > Once we hit such a vector, we need to find out that it's destined for > > the guest in real mode - no idea how you planned to do that - and then > > reinject it back into the guest with the virtual irq vector that you > > can find out by asking the irqfd hopefully. > > > > It might make sense to implement it the easy way without real mode > > first, and then take it from there ;). > > We have that already. That isn't my question. > > How is the "configuration" done ? > > On x86, I assume, the guest kernel directly whacks the MSI config space > and MSI-X BAR space using some address/data it cooks up which are *not* > the real ones needed in the HW right ? 
> > So qemu seems to play tricks to intercept the MSI-X BAR, I swa that, but > how does it know what real value to put in the HW instead ? > > In our case, the guest calls RTAS, which needs to configure "something" > in the device. What does it do ? > > Does it call into VFIO which then does a pci_enable_msi[x] ? In that > case how do you deal with a guest for example prodding a single MSI-X > in the middle of the array ? This is not an interface supported by > pci_enable_msix .... MSI-X is rather ugly. As you suggest, we trap accesses to the MSI-X table. We don't know how many vectors the guest is going to use, so we incrementally add them by disabling and re-enabling with a new vector count. The host decides what to put in the table, we don't care. All interrupts bounce through the host and get to the guest via eventfd, either through qemu or directly through KVM irqfd. If an in-use vector is modified, we write the new "MSI route" to KVM, the host doesn't need to care. If a vector is masked, we free the host irq handler w/o modifying the vector configuration. There's a comment in hw/misc/vfio.c that we could also just bounce masked vectors through qemu and let it drop it if we wanted to completely avoid toggling the host. Linux currently does not have a usable interface for masking vectors at the device. Thanks, Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
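The disable/re-enable dance Alex describes — growing the host-side vector allocation each time the guest touches a vector beyond the current count — can be sketched as a grow-on-demand allocator. Names here are illustrative; the real logic lives in hw/misc/vfio.c and goes through VFIO ioctls rather than these toy functions:

```c
#include <assert.h>

/* Minimal stand-in for the host-side MSI-X state of a VFIO device. */
typedef struct {
    int enabled;
    int nr_vectors; /* vectors currently allocated on the host */
} VFIOMSIXSketch;

static void msix_host_disable(VFIOMSIXSketch *s)
{
    s->enabled = 0;
}

static void msix_host_enable(VFIOMSIXSketch *s, int nvec)
{
    s->nr_vectors = nvec;
    s->enabled = 1;
}

/* The guest just programmed/unmasked vector 'vec'.  Since we do not
 * know up front how many vectors it will use, grow the host allocation
 * by bouncing MSI-X off and back on with a larger vector count. */
static void vfio_msix_vector_use_sketch(VFIOMSIXSketch *s, int vec)
{
    if (s->enabled && vec < s->nr_vectors) {
        return; /* already covered, nothing to do */
    }
    if (s->enabled) {
        msix_host_disable(s);
    }
    msix_host_enable(s, vec + 1);
}
```

This also answers Ben's "single MSI-X in the middle of the array" case: using vector N for the first time simply re-enables with N+1 vectors, regardless of whether the lower vectors are in use by the guest.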
* Re: [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route 2013-06-21 23:10 ` Alex Williamson @ 2013-06-21 23:19 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 22+ messages in thread From: Benjamin Herrenschmidt @ 2013-06-21 23:19 UTC (permalink / raw) To: Alex Williamson Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-devel, qemu-ppc, David Gibson On Fri, 2013-06-21 at 17:10 -0600, Alex Williamson wrote: > MSI-X is rather ugly. As you suggest, we trap accesses to the MSI-X > table. We don't know how many vectors the guest is going to use, so we > incrementally add them by disabling and re-enabling with a new vector > count. The host decides what to put in the table, we don't care. All > interrupts bounce through the host and get to the guest via eventfd, > either through qemu or directly through KVM irqfd. > > If an in-use vector is modified, we write the new "MSI route" to KVM, > the host doesn't need to care. If a vector is masked, we free the host > irq handler w/o modifying the vector configuration. There's a comment > in hw/misc/vfio.c that we could also just bounce masked vectors through > qemu and let it drop it if we wanted to completely avoid toggling the > host. Linux currently does not have a usable interface for masking > vectors at the device. Thanks, Right. It might also be handy to add what Willy's been talking about for a while now, which is an interface to enable/setup individual MSI-X at runtime. Cheers, Ben. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 3/3] KVM: PPC: enable irqfd 2013-06-21 9:22 [Qemu-devel] [PATCH 0/3] RFCv2 kvm irqfd: add directly mapped MSI IRQ support Alexey Kardashevskiy 2013-06-21 9:22 ` [Qemu-devel] [PATCH 1/3] spapr pci msi: rework Alexey Kardashevskiy 2013-06-21 9:22 ` [Qemu-devel] [PATCH 2/3] KVM: add kvm_arch_irqchip_add_msi_route Alexey Kardashevskiy @ 2013-06-21 9:22 ` Alexey Kardashevskiy 2013-06-21 17:52 ` Scott Wood 2 siblings, 1 reply; 22+ messages in thread From: Alexey Kardashevskiy @ 2013-06-21 9:22 UTC (permalink / raw) To: qemu-devel Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, Alex Williamson, qemu-ppc, David Gibson Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- target-ppc/kvm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index d6da146..e72c335 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1973,6 +1973,8 @@ int kvm_arch_on_sigbus(int code, void *addr) void kvm_arch_init_irq_routing(KVMState *s) { + kvm_irqfds_allowed = true; + kvm_msi_via_irqfd_allowed = true; } int kvm_arch_irqchip_add_msi_route(KVMState *s, MSIMessage msg) -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 3/3] KVM: PPC: enable irqfd 2013-06-21 9:22 ` [Qemu-devel] [PATCH 3/3] KVM: PPC: enable irqfd Alexey Kardashevskiy @ 2013-06-21 17:52 ` Scott Wood 2013-06-22 1:12 ` Alexey Kardashevskiy 0 siblings, 1 reply; 22+ messages in thread From: Scott Wood @ 2013-06-21 17:52 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: Anthony Liguori, qemu-devel, Alexander Graf, Alex Williamson, qemu-ppc, David Gibson On 06/21/2013 04:22:55 AM, Alexey Kardashevskiy wrote: > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > target-ppc/kvm.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index d6da146..e72c335 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -1973,6 +1973,8 @@ int kvm_arch_on_sigbus(int code, void *addr) > > void kvm_arch_init_irq_routing(KVMState *s) > { > + kvm_irqfds_allowed = true; > + kvm_msi_via_irqfd_allowed = true; > } Shouldn't this go in pseries-specific code, as was done for MPIC? The model of doing IRQ-related things per "arch" rather than per IRQ subsystem doesn't work well outside of certain architectures such as x86, where more than just the ISA is standardized. -Scott ^ permalink raw reply [flat|nested] 22+ messages in thread
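Scott's suggestion, sketched: keep the generic kvm_arch_init_irq_routing() hook empty and set the capability flags from machine-specific interrupt-controller setup instead. Only the two flag assignments come from the patch; the function names and the redeclared globals below are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Globals as declared by QEMU's KVM core (redeclared here for the sketch;
 * they are zero-initialized, i.e. capabilities start disabled). */
bool kvm_irqfds_allowed;
bool kvm_msi_via_irqfd_allowed;

/* The generic per-arch hook stays empty ... */
static void kvm_arch_init_irq_routing_sketch(void)
{
}

/* ... and the flags move into a pseries/XICS init path instead
 * (hypothetical function name, e.g. called from hw/ppc/spapr.c). */
static void spapr_irqchip_init_sketch(void)
{
    kvm_irqfds_allowed = true;
    kvm_msi_via_irqfd_allowed = true;
}
```

With this split, a BookE machine using the MPIC enables the flags from its own irqchip setup, and no machine inherits irqfd support merely by being PPC.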
* Re: [Qemu-devel] [PATCH 3/3] KVM: PPC: enable irqfd 2013-06-21 17:52 ` Scott Wood @ 2013-06-22 1:12 ` Alexey Kardashevskiy 0 siblings, 0 replies; 22+ messages in thread From: Alexey Kardashevskiy @ 2013-06-22 1:12 UTC (permalink / raw) To: Scott Wood Cc: Anthony Liguori, qemu-devel, Alexander Graf, Alex Williamson, qemu-ppc, David Gibson On 06/22/2013 03:52 AM, Scott Wood wrote: > On 06/21/2013 04:22:55 AM, Alexey Kardashevskiy wrote: >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> target-ppc/kvm.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c >> index d6da146..e72c335 100644 >> --- a/target-ppc/kvm.c >> +++ b/target-ppc/kvm.c >> @@ -1973,6 +1973,8 @@ int kvm_arch_on_sigbus(int code, void *addr) >> >> void kvm_arch_init_irq_routing(KVMState *s) >> { >> + kvm_irqfds_allowed = true; >> + kvm_msi_via_irqfd_allowed = true; >> } > > Shouldn't this should go in pseries-specific code, as was done for MPIC? > The model of doing IRQ-related things per "arch" rather than per IRQ > subsystem doesn't work well outside of certain architectures such as x86, > where more than just the ISA is standardized. Yes, we discussed this with Alex Graf on IRC and I'll post another (sorry :) ) set of patches. It will be something like a kvm_msi_irqfd_enable_direct_mapping flag, using MSIMessage::data as a virq in that routing function; it should be easy and harmless for others. -- Alexey ^ permalink raw reply [flat|nested] 22+ messages in thread