From: Alex Williamson <alex.williamson@redhat.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
qemu-trivial@nongnu.org, qemu-devel@nongnu.org,
Alexander Graf <agraf@suse.de>,
qemu-ppc@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
Paul Mackerras <paulus@samba.org>,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support
Date: Thu, 20 Jun 2013 22:46:21 -0600
Message-ID: <1371789981.30572.114.camel@ul30vt.home>
In-Reply-To: <51C3BF3E.20901@ozlabs.ru>

On Fri, 2013-06-21 at 12:49 +1000, Alexey Kardashevskiy wrote:
> On 06/21/2013 12:34 PM, Alex Williamson wrote:
> > On Fri, 2013-06-21 at 11:56 +1000, Alexey Kardashevskiy wrote:
> >> On 06/21/2013 02:51 AM, Alex Williamson wrote:
> >>> On Fri, 2013-06-21 at 00:08 +1000, Alexey Kardashevskiy wrote:
> >>>> At the moment QEMU creates a route for every MSI IRQ.
> >>>>
> >>>> Now we are about to add IRQFD support on the PPC64 pSeries platform.
> >>>> pSeries already has an in-kernel emulated interrupt controller with
> >>>> 8192 IRQs. Also, the pSeries PHB already supports MSIMessage to IRQ
> >>>> mapping as a part of PAPR requirements for MSI/MSIX guests.
> >>>> Specifically, the pSeries guest does not touch MSIMessages at
> >>>> all; instead it uses the rtas_ibm_change_msi and
> >>>> rtas_ibm_query_interrupt_source RTAS calls to do the mapping.
> >>>>
> >>>> Therefore we do not really need more routing than we already have.
> >>>> The patch introduces the infrastructure to enable direct IRQ mapping.
> >>>>
> >>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>
> >>>> ---
> >>>>
> >>>> The patch is raw and ugly indeed, I made it only to demonstrate
> >>>> the idea and see if it has a right to live or not.
> >>>>
> >>>> For some reason which I do not really understand (limited GSI numbers?)
> >>>> the existing code always adds routing and I do not see why we would need it.
> >>>
> >>> It's an IOAPIC, a pin gets toggled from the device and an MSI message
> >>> gets written to the CPU. So the route allocates and programs the
> >>> pin->MSI, then we tell it what notifier triggers that pin.
> >>>
> >>> On x86 the MSI vector doesn't encode any information about the device
> >>> sending the MSI; here you seem to be able to figure out the device and
> >>> vector space number from the address. Then your pin to MSI is
> >>> effectively fixed. So why isn't this just your
> >>> kvm_irqchip_add_msi_route function? On pSeries it's a lookup, on x86
> >>> it's an allocate and program. What does kvm_irqchip_add_msi_route do on
> >>> pSeries today? Thanks,
> >>
> >>
> >> As we have just started implementing this thing, I commented it out for
> >> starters. Once called, it destroys direct mapping in the host kernel and
> >> everything stops working as routing is not implemented (yet? ever?).
> >
> > Yay, it's broken, you can rewrite it ;)
>
>
> There is nothing to rewrite, my understanding is that it is just not
> written yet and Paul would like not to do that :)
>
>
> >> My point here is that MSIMessage to irq translation is made on a PCI domain
> >> as the PAPR (ppc64 server) spec says. The guest never uses MSIMessage, it is
> >> all in QEMU, the guest dynamically allocates MSI IRQs and it is up to a
> >> hypervisor (QEMU) to take care of the actual MSIMessage for the device.
> >
> > MSIMessage is what the guest has programmed for the address/data fields,
> > it's not just a QEMU invention. From the guest perspective, the device
> > writes msg.data to msg.address to signal the CPU for the interrupt.
>
>
> Our guests never program MSIMessage. Hypercalls are used instead.

Of course POWER has a hypercall for that, but that's just abstracting
the physical device, which does actually write msg.data to msg.address
on the bus.

> >> And the only reason to use MSIMessage in QEMU for us is to support
> >> msi_notify()/msix_notify() in places like vfio_msi_interrupt(). I
> >> added an MSI window for that a long time ago, which we do not really
> >> need since we already have an irq number in vfio_msi_interrupt(), etc.
> >
> > It seems like you just have another layer of indirection via your
> > msi_table. For x86 there's a layer of indirection via the virq virtual
> > IOAPIC pin. Seems similar. Thanks,
>
>
> I do not follow you, sorry. For x86, is it that MSI routing table which is
> updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
> code responds to msi_notify() in qemu-x86 and does qemu_irq_pulse()?

vfio_msi_interrupt->msi[x]_notify->stl_le_phys(msg.address, msg.data)

This writes directly to the interrupt block on the vCPU. With KVM, the
in-kernel APIC does the same write, where the pin to MSIMessage is set up
by kvm_irqchip_add_msi_route and the pin is pulled by an irqfd.
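
The two x86 paths described above can be sketched as a small toy model.
This is not QEMU or KVM code: toy_store, toy_msi_notify,
toy_add_msi_route and toy_irqfd_trigger are hypothetical stand-ins for
stl_le_phys, msi_notify, kvm_irqchip_add_msi_route and an irqfd firing,
and the table size is an arbitrary assumption.

```c
#include <stdint.h>

/* Toy model only: every name and size here is an illustrative assumption. */
typedef struct {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

#define TOY_MAX_ROUTES 8

/* pin (virq) -> MSI message table, as programmed by the route call. */
static ToyMSIMessage toy_routes[TOY_MAX_ROUTES];
static int toy_route_used[TOY_MAX_ROUTES];

/* Record of the most recent "bus write", so its effect can be observed. */
static uint64_t toy_last_addr;
static uint32_t toy_last_data;
static int toy_store_count;

/* Stand-in for stl_le_phys(): the MSI write landing on the interrupt block. */
void toy_store(uint64_t addr, uint32_t data)
{
    toy_last_addr = addr;
    toy_last_data = data;
    toy_store_count++;
}

/* Userspace path: notifying simply performs the write. */
void toy_msi_notify(ToyMSIMessage msg)
{
    toy_store(msg.address, msg.data);
}

/* In-kernel path, step 1: allocate a free pin and program pin -> MSI. */
int toy_add_msi_route(ToyMSIMessage msg)
{
    for (int virq = 0; virq < TOY_MAX_ROUTES; virq++) {
        if (!toy_route_used[virq]) {
            toy_route_used[virq] = 1;
            toy_routes[virq] = msg;
            return virq;
        }
    }
    return -1; /* no free pin left */
}

/* In-kernel path, step 2: the irqfd pulls the pin, which replays the write. */
int toy_irqfd_trigger(int virq)
{
    if (virq < 0 || virq >= TOY_MAX_ROUTES || !toy_route_used[virq]) {
        return -1;
    }
    toy_store(toy_routes[virq].address, toy_routes[virq].data);
    return 0;
}
```

Either path ends in the same store of msg.data to msg.address; the
routed path only defers it until the pin is pulled.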

Do I understand that on POWER the MSI from the device is intercepted at
the PHB and converted to an IRQ that's triggered by some means other
than an MSI write? So to correctly model the hardware, vfio should do a
msi_notify() that does a stl_le_phys that terminates at this IRQ
remapper thing and in turn toggles a qemu_irq. MSIMessage is only
extraneous data if you want to skip over hardware blocks.

Maybe you could add a device parameter to kvm_irqchip_add_msi_route so
that it can be implemented on POWER without this pci_bus_map_msi
interface that seems very specific to POWER. Thanks,

Alex

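
The device-parameter suggestion above could look roughly like the
following sketch. Everything here is hypothetical: sketch_add_msi_route
is not the real kvm_irqchip_add_msi_route signature, SketchDevice is a
made-up type, and the lookup rule (first_irq plus msg.data) is only an
assumed placeholder for whatever the platform's fixed mapping is.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t address;
    uint32_t data;
} SketchMSIMessage;

/* Hypothetical per-device state; first_irq only matters for the lookup backend. */
typedef struct {
    int first_irq;
} SketchDevice;

typedef int (*sketch_route_fn)(const SketchDevice *dev, SketchMSIMessage msg);

#define SKETCH_MAX_VIRQ 16
static int sketch_next_virq;

/*
 * x86-like backend: allocate a fresh virq. A real implementation would also
 * program the virq -> msg route; this sketch only hands out the number.
 */
int sketch_route_alloc(const SketchDevice *dev, SketchMSIMessage msg)
{
    (void)dev;
    (void)msg;
    if (sketch_next_virq >= SKETCH_MAX_VIRQ) {
        return -1;
    }
    return sketch_next_virq++;
}

/*
 * POWER-like backend: the mapping is fixed, so this is a pure lookup that
 * needs the device (assumed rule: first_irq + msg.data).
 */
int sketch_route_lookup(const SketchDevice *dev, SketchMSIMessage msg)
{
    if (dev == NULL) {
        return -1;
    }
    return dev->first_irq + (int)msg.data;
}

/* Selected once per platform; defaults to the allocating backend. */
sketch_route_fn sketch_route_backend = sketch_route_alloc;

/* Single entry point for callers such as vfio or virtio-pci. */
int sketch_add_msi_route(const SketchDevice *dev, SketchMSIMessage msg)
{
    return sketch_route_backend(dev, msg);
}
```

With one entry point, callers would not need a try-then-fall-back
sequence; the platform decides whether the call allocates or looks up.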
> >>>> ---
> >>>> hw/misc/vfio.c | 11 +++++++++--
> >>>> hw/pci/pci.c | 13 +++++++++++++
> >>>> hw/ppc/spapr_pci.c | 13 +++++++++++++
> >>>> hw/virtio/virtio-pci.c | 26 ++++++++++++++++++++------
> >>>> include/hw/pci/pci.h | 4 ++++
> >>>> include/hw/pci/pci_bus.h | 1 +
> >>>> 6 files changed, 60 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> >>>> index 14aac04..2d9eef7 100644
> >>>> --- a/hw/misc/vfio.c
> >>>> +++ b/hw/misc/vfio.c
> >>>> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> >>>> * Attempt to enable route through KVM irqchip,
> >>>> * default to userspace handling if unavailable.
> >>>> */
> >>>> - vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> >>>> +
> >>>> + vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
> >>>> + if (vector->virq < 0) {
> >>>> + vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
> >>>> + }
> >>>> if (vector->virq < 0 ||
> >>>> kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> >>>> vector->virq) < 0) {
> >>>> @@ -807,7 +811,10 @@ retry:
> >>>> * Attempt to enable route through KVM irqchip,
> >>>> * default to userspace handling if unavailable.
> >>>> */
> >>>> - vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> >>>> + vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
> >>>> + if (vector->virq < 0) {
> >>>> + vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> >>>> + }
> >>>> if (vector->virq < 0 ||
> >>>> kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
> >>>> vector->virq) < 0) {
> >>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>>> index a976e46..a9875e9 100644
> >>>> --- a/hw/pci/pci.c
> >>>> +++ b/hw/pci/pci.c
> >>>> @@ -1254,6 +1254,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice *dev,
> >>>>      dev->intx_routing_notifier = notifier;
> >>>>  }
> >>>>
> >>>> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
> >>>> +{
> >>>> +    bus->map_msi = map_msi_fn;
> >>>> +}
> >>>> +
> >>>> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
> >>>> +{
> >>>> +    if (bus->map_msi) {
> >>>> +        return bus->map_msi(bus, msg);
> >>>> +    }
> >>>> +    return -1;
> >>>> +}
> >>>> +
> >>>>  /*
> >>>>   * PCI-to-PCI bridge specification
> >>>>   * 9.1: Interrupt routing. Table 9-1
> >>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >>>> index 80408c9..9ef9a29 100644
> >>>> --- a/hw/ppc/spapr_pci.c
> >>>> +++ b/hw/ppc/spapr_pci.c
> >>>> @@ -500,6 +500,18 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
> >>>>      qemu_irq_pulse(xics_get_qirq(spapr->icp, irq));
> >>>>  }
> >>>>
> >>>> +static int spapr_msi_get_irq(PCIBus *bus, MSIMessage msg)
> >>>> +{
> >>>> +    DeviceState *par = bus->qbus.parent;
> >>>> +    sPAPRPHBState *sphb = (sPAPRPHBState *) par;
> >>>> +    unsigned long addr = msg.address - sphb->msi_win_addr;
> >>>> +    int ndev = addr >> 16;
> >>>> +    int vec = ((addr & 0xFFFF) >> 2) | msg.data;
> >>>> +    uint32_t irq = sphb->msi_table[ndev].irq + vec;
> >>>> +
> >>>> +    return (int)irq;
> >>>> +}
> >>>> +
> >>>>  static const MemoryRegionOps spapr_msi_ops = {
> >>>>      /* There is no .read as the read result is undefined by PCI spec */
> >>>>      .read = NULL,
> >>>> @@ -664,6 +676,7 @@ static int _spapr_phb_init(SysBusDevice *s)
> >>>>
> >>>> sphb->lsi_table[i].irq = irq;
> >>>> }
> >>>> + pci_bus_set_map_msi_fn(bus, spapr_msi_get_irq);
> >>>>
> >>>> return 0;
> >>>> }
> >>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> >>>> index d309416..587f53e 100644
> >>>> --- a/hw/virtio/virtio-pci.c
> >>>> +++ b/hw/virtio/virtio-pci.c
> >>>> @@ -472,6 +472,8 @@ static unsigned virtio_pci_get_features(DeviceState *d)
> >>>> return proxy->host_features;
> >>>> }
> >>>>
> >>>> +extern int spapr_msi_get_irq(PCIBus *bus, MSIMessage *msg);
> >>>> +
> >>>> static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
> >>>> unsigned int queue_no,
> >>>> unsigned int vector,
> >>>> @@ -481,7 +483,10 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
> >>>> int ret;
> >>>>
> >>>> if (irqfd->users == 0) {
> >>>> - ret = kvm_irqchip_add_msi_route(kvm_state, msg);
> >>>> + ret = pci_bus_map_msi(proxy->pci_dev.bus, msg);
> >>>> + if (ret < 0) {
> >>>> + ret = kvm_irqchip_add_msi_route(kvm_state, msg);
> >>>> + }
> >>>> if (ret < 0) {
> >>>> return ret;
> >>>> }
> >>>> @@ -609,14 +614,23 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
> >>>> VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
> >>>> EventNotifier *n = virtio_queue_get_guest_notifier(vq);
> >>>> VirtIOIRQFD *irqfd;
> >>>> - int ret = 0;
> >>>> + int ret = 0, tmp;
> >>>>
> >>>> if (proxy->vector_irqfd) {
> >>>> irqfd = &proxy->vector_irqfd[vector];
> >>>> - if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
> >>>> - ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
> >>>> - if (ret < 0) {
> >>>> - return ret;
> >>>> +
> >>>> + tmp = pci_bus_map_msi(proxy->pci_dev.bus, msg);
> >>>> + if (tmp >= 0) {
> >>>> + if (irqfd->virq != tmp) {
> >>>> + fprintf(stderr, "FIXME: MSI(-X) vector has changed from %X to %x\n",
> >>>> + irqfd->virq, tmp);
> >>>> + }
> >>>> + } else {
> >>>> + if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
> >>>> + ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
> >>>> + if (ret < 0) {
> >>>> + return ret;
> >>>> + }
> >>>> }
> >>>> }
> >>>> }
> >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>>> index 8797802..632739a 100644
> >>>> --- a/include/hw/pci/pci.h
> >>>> +++ b/include/hw/pci/pci.h
> >>>> @@ -332,6 +332,7 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev);
> >>>> typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
> >>>> typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
> >>>> typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
> >>>> +typedef int (*pci_map_msi_fn)(PCIBus *bus, MSIMessage msg);
> >>>>
> >>>> typedef enum {
> >>>> PCI_HOTPLUG_DISABLED,
> >>>> @@ -375,6 +376,9 @@ bool pci_intx_route_changed(PCIINTxRoute *old, PCIINTxRoute *new);
> >>>> void pci_bus_fire_intx_routing_notifier(PCIBus *bus);
> >>>> void pci_device_set_intx_routing_notifier(PCIDevice *dev,
> >>>> PCIINTxRoutingNotifier notifier);
> >>>> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn);
> >>>> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg);
> >>>> +
> >>>> void pci_device_reset(PCIDevice *dev);
> >>>> void pci_bus_reset(PCIBus *bus);
> >>>>
> >>>> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> >>>> index 66762f6..81efd2b 100644
> >>>> --- a/include/hw/pci/pci_bus.h
> >>>> +++ b/include/hw/pci/pci_bus.h
> >>>> @@ -16,6 +16,7 @@ struct PCIBus {
> >>>> pci_set_irq_fn set_irq;
> >>>> pci_map_irq_fn map_irq;
> >>>> pci_route_irq_fn route_intx_to_irq;
> >>>> + pci_map_msi_fn map_msi;
> >>>> pci_hotplug_fn hotplug;
> >>>> DeviceState *hotplug_qdev;
> >>>> void *irq_opaque;
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
>
>
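
The address arithmetic in the quoted spapr_msi_get_irq hunk can be tried
standalone with the sketch below. It mirrors the shifts and masks from
the patch, but the types, the function name demo_msi_get_irq and the
two range checks are assumptions added for illustration; the patch
itself performs no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t address;
    uint32_t data;
} DemoMSIMessage;

/* Stand-in for one msi_table entry: the first IRQ assigned to a device. */
typedef struct {
    uint32_t irq;
} DemoMSITableEntry;

/*
 * Decode an MSI message into an IRQ number:
 *   offset = address - window base
 *   device = offset >> 16           (one 64 KiB slot per device)
 *   vector = ((offset & 0xFFFF) >> 2) | data
 *   irq    = table[device].irq + vector
 * Returns -1 if the address is below the window or the device slot is out
 * of range (both checks are additions in this sketch).
 */
int demo_msi_get_irq(uint64_t msi_win_addr, const DemoMSITableEntry *table,
                     size_t ndevs, DemoMSIMessage msg)
{
    if (msg.address < msi_win_addr) {
        return -1;
    }
    uint64_t off = msg.address - msi_win_addr;
    uint64_t ndev = off >> 16;
    if (ndev >= ndevs) {
        return -1;
    }
    uint32_t vec = (uint32_t)((off & 0xFFFF) >> 2) | msg.data;
    return (int)(table[ndev].irq + vec);
}
```

For example, with an assumed window base of 0x40000000 and a table whose
entry 2 starts at IRQ 0x1200, a message with address 0x40020008 and data
0 gives offset 0x20008, device 2, vector 2, and therefore IRQ 0x1202.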
Thread overview: 36+ messages
2013-06-20 14:08 [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support Alexey Kardashevskiy
2013-06-20 15:38 ` Michael S. Tsirkin
2013-06-20 16:37 ` Anthony Liguori
2013-06-20 23:51 ` Alexey Kardashevskiy
2013-06-23 14:07 ` Michael S. Tsirkin
2013-06-23 15:02 ` Anthony Liguori
2013-06-23 21:39 ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
2013-06-23 21:58 ` Anthony Liguori
2013-06-24 4:46 ` Alex Williamson
2013-06-24 12:24 ` Anthony Liguori
2013-06-24 12:39 ` Gleb Natapov
2013-06-23 21:36 ` [Qemu-devel] " Benjamin Herrenschmidt
2013-06-24 12:10 ` Michael S. Tsirkin
2013-06-20 16:51 ` Alex Williamson
2013-06-21 1:56 ` Alexey Kardashevskiy
2013-06-21 2:34 ` Alex Williamson
2013-06-21 2:49 ` Alexey Kardashevskiy
2013-06-21 4:46 ` Alex Williamson [this message]
2013-06-21 5:12 ` Benjamin Herrenschmidt
2013-06-21 6:03 ` Alex Williamson
2013-06-21 6:12 ` Benjamin Herrenschmidt
2013-06-21 6:40 ` Alexey Kardashevskiy
2013-06-23 15:06 ` Anthony Liguori
2013-06-24 4:44 ` Alex Williamson
2013-06-24 12:25 ` Anthony Liguori
2013-06-24 7:13 ` Gleb Natapov
2013-06-24 12:32 ` Anthony Liguori
2013-06-24 12:37 ` Alexander Graf
2013-06-24 13:06 ` Gleb Natapov
2013-06-24 13:34 ` Anthony Liguori
2013-06-24 13:41 ` Michael S. Tsirkin
2013-06-24 14:31 ` Anthony Liguori
2013-06-24 14:34 ` Alexander Graf
2013-06-24 15:17 ` Anthony Liguori
2013-06-24 16:48 ` Gleb Natapov
2013-06-24 16:35 ` Gleb Natapov