From: Anthony Liguori
In-Reply-To: <1372049059.30572.170.camel@ul30vt.home>
References: <1371737338-25148-1-git-send-email-aik@ozlabs.ru>
 <1371747090.32709.61.camel@ul30vt.home> <51C3B2E4.7000807@ozlabs.ru>
 <1371782063.30572.74.camel@ul30vt.home> <51C3BF3E.20901@ozlabs.ru>
 <1371789981.30572.114.camel@ul30vt.home>
 <1372049059.30572.170.camel@ul30vt.home>
Date: Mon, 24 Jun 2013 07:25:50 -0500
Message-ID: <87vc53bsq9.fsf@codemonkey.ws>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Subject: Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support
To: Alex Williamson
Cc: "Michael S . Tsirkin", Alexey Kardashevskiy, Alexander Graf,
 qemu-devel, qemu-trivial@nongnu.org, qemu-ppc@nongnu.org,
 Paolo Bonzini, Paul Mackerras, David Gibson

Alex Williamson writes:

> On Sun, 2013-06-23 at 10:06 -0500, Anthony Liguori wrote:
>> On Thu, Jun 20, 2013 at 11:46 PM, Alex Williamson
>> wrote:
>> > On Fri, 2013-06-21 at 12:49 +1000, Alexey Kardashevskiy wrote:
>> >> On 06/21/2013 12:34 PM, Alex Williamson wrote:
>> >>
>> >>
>> >> Do not follow you, sorry. For x86, is it that MSI routing table which is
>> >> updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
>> >> code responds on msi_notify() in qemu-x86 and does qemu_irq_pulse()?
>> >
>> > vfio_msi_interrupt->msi[x]_notify->stl_le_phys(msg.address, msg.data)
>> >
>> > This writes directly to the interrupt block on the vCPU. With KVM, the
>> > in-kernel APIC does the same write, where the pin to MSIMessage is setup
>> > by kvm_irqchip_add_msi_route and the pin is pulled by an irqfd.
>>
>> What is this "interrupt block on the vCPU" you speak of? I reviewed
>> the SDM and see nothing in the APIC protocol or the brief description
>> of MSI as a PCI concept that would indicate anything except that the
>> PHB handles MSI writes and feeds them to the I/O APIC.
>
> In all likelihood I'm recalling ia64 details and trying to apply them to
> x86. Does the MSIMessage not actually get written to the LAPIC on the
> CPU? Thanks,

There definitely isn't an APIC message for MSI specifically. I think
the only question is whether the PHB sits on the APIC bus and can
generate an APIC message directly or whether it has a private interface
to the IO APIC to do it. I suspect that there are systems that do
either.

But the important point is that MSI writes are interpreted by the PHB
either way.

Regards,

Anthony Liguori

>
> Alex
>
>> In fact, the wikipedia article on MSI has:
>>
>> "A common misconception with Message Signaled Interrupts is that they
>> allow the device to send data to a processor as part of the interrupt.
>> The data that is sent as part of the write is used by the chipset to
>> determine which interrupt to trigger on which processor; it is not
>> available for the device to communicate additional information to the
>> interrupt handler."
>>
>> > Do I understand that on POWER the MSI from the device is intercepted at
>> > the PHB and converted to an IRQ that's triggered by some means other
>> > than a MSI write?
>>
>> This is exactly the same thing that happens on x86, no? Can you point
>> me to something in the SDM that says otherwise?
>>
>> Regards,
>>
>> Anthony Liguori
>>
>> > So to correctly model the hardware, vfio should do a
>> > msi_notify() that does a stl_le_phys that terminates at this IRQ
>> > remapper thing and in turn toggles a qemu_irq. MSIMessage is only
>> > extraneous data if you want to skip over hardware blocks.
>> >
>> > Maybe you could add a device parameter to kvm_irqchip_add_msi_route so
>> > that it can be implemented on POWER without this pci_bus_map_msi
>> > interface that seems very unique to POWER. Thanks,
>> >
>> > Alex
>> >
>> >> >>>> ---
>> >> >>>>  hw/misc/vfio.c           | 11 +++++++++--
>> >> >>>>  hw/pci/pci.c             | 13 +++++++++++++
>> >> >>>>  hw/ppc/spapr_pci.c       | 13 +++++++++++++
>> >> >>>>  hw/virtio/virtio-pci.c   | 26 ++++++++++++++++++++------
>> >> >>>>  include/hw/pci/pci.h     |  4 ++++
>> >> >>>>  include/hw/pci/pci_bus.h |  1 +
>> >> >>>>  6 files changed, 60 insertions(+), 8 deletions(-)
>> >> >>>>
>> >> >>>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
>> >> >>>> index 14aac04..2d9eef7 100644
>> >> >>>> --- a/hw/misc/vfio.c
>> >> >>>> +++ b/hw/misc/vfio.c
>> >> >>>> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>> >> >>>>       * Attempt to enable route through KVM irqchip,
>> >> >>>>       * default to userspace handling if unavailable.
>> >> >>>>       */
>> >> >>>> -    vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
>> >> >>>> +
>> >> >>>> +    vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
>> >> >>>> +    if (vector->virq < 0) {
>> >> >>>> +        vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
>> >> >>>> +    }
>> >> >>>>      if (vector->virq < 0 ||
>> >> >>>>          kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>> >> >>>>                                         vector->virq) < 0) {
>> >> >>>> @@ -807,7 +811,10 @@ retry:
>> >> >>>>       * Attempt to enable route through KVM irqchip,
>> >> >>>>       * default to userspace handling if unavailable.
>> >> >>>>       */
>> >> >>>> -    vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>> >> >>>> +    vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
>> >> >>>> +    if (vector->virq < 0) {
>> >> >>>> +        vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>> >> >>>> +    }
>> >> >>>>      if (vector->virq < 0 ||
>> >> >>>>          kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>> >> >>>>                                         vector->virq) < 0) {
>> >> >>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> >> >>>> index a976e46..a9875e9 100644
>> >> >>>> --- a/hw/pci/pci.c
>> >> >>>> +++ b/hw/pci/pci.c
>> >> >>>> @@ -1254,6 +1254,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice *dev,
>> >> >>>>      dev->intx_routing_notifier = notifier;
>> >> >>>>  }
>> >> >>>>
>> >> >>>> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
>> >> >>>> +{
>> >> >>>> +    bus->map_msi = map_msi_fn;
>> >> >>>> +}
>> >> >>>> +
>> >> >>>> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
>> >> >>>> +{
>> >> >>>> +    if (bus->map_msi) {
>> >> >>>> +        return bus->map_msi(bus, msg);
>> >> >>>> +    }
>> >> >>>> +    return -1;
>> >> >>>> +}
>> >> >>>> +
>> >> >>>>  /*
>> >> >>>>   * PCI-to-PCI bridge specification
>> >> >>>>   * 9.1: Interrupt routing. Table 9-1
>> >> >>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> >> >>>> index 80408c9..9ef9a29 100644
>> >> >>>> --- a/hw/ppc/spapr_pci.c
>> >> >>>> +++ b/hw/ppc/spapr_pci.c
>> >> >>>> @@ -500,6 +500,18 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
>> >> >>>>      qemu_irq_pulse(xics_get_qirq(spapr->icp, irq));
>> >> >>>>  }
>> >> >>>>
>> >> >>>> +static int spapr_msi_get_irq(PCIBus *bus, MSIMessage msg)
>> >> >>>> +{
>> >> >>>> +    DeviceState *par = bus->qbus.parent;
>> >> >>>> +    sPAPRPHBState *sphb = (sPAPRPHBState *) par;
>> >> >>>> +    unsigned long addr = msg.address - sphb->msi_win_addr;
>> >> >>>> +    int ndev = addr >> 16;
>> >> >>>> +    int vec = ((addr & 0xFFFF) >> 2) | msg.data;
>> >> >>>> +    uint32_t irq = sphb->msi_table[ndev].irq + vec;
>> >> >>>> +
>> >> >>>> +    return (int)irq;
>> >> >>>> +}
>> >> >>>> +
>> >> >>>>  static const MemoryRegionOps spapr_msi_ops = {
>> >> >>>>      /* There is no .read as the read result is undefined by PCI spec */
>> >> >>>>      .read = NULL,
>> >> >>>> @@ -664,6 +676,7 @@ static int _spapr_phb_init(SysBusDevice *s)
>> >> >>>>
>> >> >>>>          sphb->lsi_table[i].irq = irq;
>> >> >>>>      }
>> >> >>>> +    pci_bus_set_map_msi_fn(bus, spapr_msi_get_irq);
>> >> >>>>
>> >> >>>>      return 0;
>> >> >>>>  }
>> >> >>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> >> >>>> index d309416..587f53e 100644
>> >> >>>> --- a/hw/virtio/virtio-pci.c
>> >> >>>> +++ b/hw/virtio/virtio-pci.c
>> >> >>>> @@ -472,6 +472,8 @@ static unsigned virtio_pci_get_features(DeviceState *d)
>> >> >>>>      return proxy->host_features;
>> >> >>>>  }
>> >> >>>>
>> >> >>>> +extern int spapr_msi_get_irq(PCIBus *bus, MSIMessage *msg);
>> >> >>>> +
>> >> >>>>  static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
>> >> >>>>                                          unsigned int queue_no,
>> >> >>>>                                          unsigned int vector,
>> >> >>>> @@ -481,7 +483,10 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
>> >> >>>>      int ret;
>> >> >>>>
>> >> >>>>      if (irqfd->users == 0) {
>> >> >>>> -        ret = kvm_irqchip_add_msi_route(kvm_state, msg);
>> >> >>>> +        ret = pci_bus_map_msi(proxy->pci_dev.bus, msg);
>> >> >>>> +        if (ret < 0) {
>> >> >>>> +            ret = kvm_irqchip_add_msi_route(kvm_state, msg);
>> >> >>>> +        }
>> >> >>>>          if (ret < 0) {
>> >> >>>>              return ret;
>> >> >>>>          }
>> >> >>>> @@ -609,14 +614,23 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>> >> >>>>      VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
>> >> >>>>      EventNotifier *n = virtio_queue_get_guest_notifier(vq);
>> >> >>>>      VirtIOIRQFD *irqfd;
>> >> >>>> -    int ret = 0;
>> >> >>>> +    int ret = 0, tmp;
>> >> >>>>
>> >> >>>>      if (proxy->vector_irqfd) {
>> >> >>>>          irqfd = &proxy->vector_irqfd[vector];
>> >> >>>> -        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
>> >> >>>> -            ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
>> >> >>>> -            if (ret < 0) {
>> >> >>>> -                return ret;
>> >> >>>> +
>> >> >>>> +        tmp = pci_bus_map_msi(proxy->pci_dev.bus, msg);
>> >> >>>> +        if (tmp >= 0) {
>> >> >>>> +            if (irqfd->virq != tmp) {
>> >> >>>> +                fprintf(stderr, "FIXME: MSI(-X) vector has changed from %X to %x\n",
>> >> >>>> +                        irqfd->virq, tmp);
>> >> >>>> +            }
>> >> >>>> +        } else {
>> >> >>>> +            if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
>> >> >>>> +                ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
>> >> >>>> +                if (ret < 0) {
>> >> >>>> +                    return ret;
>> >> >>>> +                }
>> >> >>>>              }
>> >> >>>>          }
>> >> >>>>      }
>> >> >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> >> >>>> index 8797802..632739a 100644
>> >> >>>> --- a/include/hw/pci/pci.h
>> >> >>>> +++ b/include/hw/pci/pci.h
>> >> >>>> @@ -332,6 +332,7 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev);
>> >> >>>>  typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
>> >> >>>>  typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
>> >> >>>>  typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
>> >> >>>> +typedef int (*pci_map_msi_fn)(PCIBus *bus, MSIMessage msg);
>> >> >>>>
>> >> >>>>  typedef enum {
>> >> >>>>      PCI_HOTPLUG_DISABLED,
>> >> >>>> @@ -375,6 +376,9 @@ bool pci_intx_route_changed(PCIINTxRoute *old, PCIINTxRoute *new);
>> >> >>>>  void pci_bus_fire_intx_routing_notifier(PCIBus *bus);
>> >> >>>>  void pci_device_set_intx_routing_notifier(PCIDevice *dev,
>> >> >>>>                                            PCIINTxRoutingNotifier notifier);
>> >> >>>> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn);
>> >> >>>> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg);
>> >> >>>> +
>> >> >>>>  void pci_device_reset(PCIDevice *dev);
>> >> >>>>  void pci_bus_reset(PCIBus *bus);
>> >> >>>>
>> >> >>>> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
>> >> >>>> index 66762f6..81efd2b 100644
>> >> >>>> --- a/include/hw/pci/pci_bus.h
>> >> >>>> +++ b/include/hw/pci/pci_bus.h
>> >> >>>> @@ -16,6 +16,7 @@ struct PCIBus {
>> >> >>>>      pci_set_irq_fn set_irq;
>> >> >>>>      pci_map_irq_fn map_irq;
>> >> >>>>      pci_route_irq_fn route_intx_to_irq;
>> >> >>>> +    pci_map_msi_fn map_msi;
>> >> >>>>      pci_hotplug_fn hotplug;
>> >> >>>>      DeviceState *hotplug_qdev;
>> >> >>>>      void *irq_opaque;
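
The lookup order in the quoted patch (ask the PCI bus for a direct MSI-to-IRQ mapping first, and fall back to a KVM MSI route only if the bus has none) can be sketched as standalone C. This is a minimal sketch under stated assumptions, not QEMU code: MSIMessage and PCIBus are cut down to the fields used here, the msi_win_addr and base_irq fields on the bus are hypothetical simplifications of the sPAPR PHB state, and demo_map_msi, fake_kvm_add_msi_route and resolve_virq are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the QEMU types; only the fields used below. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

typedef struct PCIBus PCIBus;
typedef int (*pci_map_msi_fn)(PCIBus *bus, MSIMessage msg);

#define DEMO_SLOTS 4

struct PCIBus {
    pci_map_msi_fn map_msi;        /* optional per-bus hook, NULL if unset */
    uint64_t msi_win_addr;         /* hypothetical: base of the MSI window */
    uint32_t base_irq[DEMO_SLOTS]; /* hypothetical: first IRQ per device slot */
};

/* Same shape as pci_bus_map_msi() in the patch: -1 means "no direct mapping". */
int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
{
    if (bus->map_msi) {
        return bus->map_msi(bus, msg);
    }
    return -1;
}

/*
 * Decode modeled on spapr_msi_get_irq() in the patch: the device slot comes
 * from the window offset shifted right by 16, the vector from the low 16 bits
 * in 4-byte units, ORed with the message data.
 */
int demo_map_msi(PCIBus *bus, MSIMessage msg)
{
    uint64_t addr = msg.address - bus->msi_win_addr;
    uint64_t ndev = addr >> 16;
    uint32_t vec = (uint32_t)((addr & 0xFFFF) >> 2) | msg.data;

    assert(ndev < DEMO_SLOTS);     /* bounds check added for the sketch only */
    return (int)(bus->base_irq[ndev] + vec);
}

/* Hypothetical placeholder for kvm_irqchip_add_msi_route(); fixed result. */
int fake_kvm_add_msi_route(MSIMessage msg)
{
    (void)msg;
    return 1000;
}

/* Bus hook first, KVM route as fallback, as in the vfio and virtio hunks. */
int resolve_virq(PCIBus *bus, MSIMessage msg)
{
    int virq = pci_bus_map_msi(bus, msg);

    if (virq < 0) {
        virq = fake_kvm_add_msi_route(msg);
    }
    return virq;
}
```

With base_irq = {100, 200, 300, 400}, a message whose address is 0x10008 past the window base and whose data is 0 decodes to slot 1, vector 2, so resolve_virq returns 202. With the hook left NULL, the same call falls through to the placeholder route and returns 1000.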