From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:52806)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <liuw@liuw.name>) id 1QPV7H-000295-JN
	for qemu-devel@nongnu.org; Thu, 26 May 2011 03:36:24 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <liuw@liuw.name>) id 1QPV7G-0003qL-C4
	for qemu-devel@nongnu.org; Thu, 26 May 2011 03:36:23 -0400
Received: from mail-iw0-f173.google.com ([209.85.214.173]:61758)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <liuw@liuw.name>) id 1QPV7G-0003qF-7V
	for qemu-devel@nongnu.org; Thu, 26 May 2011 03:36:22 -0400
Received: by iwl42 with SMTP id 42so457522iwl.4
	for <qemu-devel@nongnu.org>; Thu, 26 May 2011 00:36:21 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <4DDDFF83.6010404@siemens.com>
References: <BANLkTik7kyyyMhuj+Upi5Jn_tCeF5dy+vg@mail.gmail.com>
	<4DDDFF83.6010404@siemens.com>
From: Wei Liu <liuw@liuw.name>
Date: Thu, 26 May 2011 15:35:51 +0800
Message-ID: <BANLkTim3mDivE8so0H0CioJMv7yYjreYRA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] Xen: enabling emulated MSI injection
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Anthony Perard <anthony.perard@citrix.com>, qemu-devel@nongnu.org, Stefano Stabellini <stefano.stabellini@eu.citrix.com>

On Thu, May 26, 2011 at 3:21 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> A cleaner way of hooking into this would be registering on the MSI MMIO
> target region (replacing the APIC). That would give you MSI support as we=
ll.
>
> However, KVM has much more complex requirements for MSI handling when
> using the in-kernel irqchip. That's because we need to support, e.g.,
> eventfd-based MSI injection in the hypervisor without looping over QEMU,
> ie. we need to translate a binary event into a MSI address/data tuple.
> This is currently solved a bit too pragmatic in our qemu-kvm tree -
> not mergeable to upstream. So I started working out a better plan,
> partly based on ideas Avi provided.
>
> One element in this concept will be a hook into the MSI delivery path,
> both for events triggered by msi/msix_notify as well as those raised by
> weird DMA (to be architecturally correct). See below for a snapshot
> (depends on other patches). That will obsolete any open-coded hooking
> like this here.
>
> If Xen support is urging, I could try to break out the relevant bits a
> bit earlier. Otherwise I would prefer to postpone the topic by a few
> weeks until we are done with evaluating the concept against KVM's
> requirements so that we can find a truly generic solution.
>
> Jan
>
>
> ------8<-------
>
> diff --git a/hw/apic.c b/hw/apic.c
> index a45b57f..7447d91 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -19,6 +19,7 @@
> =C2=A0#include "hw.h"
> =C2=A0#include "apic.h"
> =C2=A0#include "ioapic.h"
> +#include "msi.h"
> =C2=A0#include "qemu-timer.h"
> =C2=A0#include "host-utils.h"
> =C2=A0#include "sysbus.h"
> @@ -795,7 +796,7 @@ static uint32_t apic_mem_readl(void *opaque, target_p=
hys_addr_t addr)
> =C2=A0 =C2=A0 return val;
> =C2=A0}
>
> -static void apic_send_msi(target_phys_addr_t addr, uint32_t data)
> +void apic_deliver_msi(uint64_t addr, uint32_t data)
> =C2=A0{
> =C2=A0 =C2=A0 uint8_t dest =3D (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR=
_DEST_ID_SHIFT;
> =C2=A0 =C2=A0 uint8_t vector =3D (data & MSI_DATA_VECTOR_MASK) >> MSI_DAT=
A_VECTOR_SHIFT;
> @@ -817,7 +818,9 @@ static void apic_mem_writel(void *opaque, target_phys=
_addr_t addr, uint32_t val)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* APIC is connected directly to the CPU=
.
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* Mapping them on the global bus happen=
s to work because
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* MSI registers are reserved in APIC MM=
IO and vice versa. */
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0apic_send_msi(addr, val);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0MSIMessage msg =3D { .address =3D addr, .dat=
a =3D val };
> +
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0msi_deliver(&msg);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
> =C2=A0 =C2=A0 }
>
> diff --git a/hw/apic.h b/hw/apic.h
> index c857d52..d0971ae 100644
> --- a/hw/apic.h
> +++ b/hw/apic.h
> @@ -20,6 +20,7 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
> =C2=A0uint8_t cpu_get_apic_tpr(DeviceState *s);
> =C2=A0void apic_init_reset(DeviceState *s);
> =C2=A0void apic_sipi(DeviceState *s);
> +void apic_deliver_msi(uint64_t addr, uint32_t data);
>
> =C2=A0/* pc.c */
> =C2=A0int cpu_is_bsp(CPUState *env);
> diff --git a/hw/msi.c b/hw/msi.c
> index e111ceb..b88dcc4 100644
> --- a/hw/msi.c
> +++ b/hw/msi.c
> @@ -37,6 +37,14 @@
>
> =C2=A0#define PCI_MSI_VECTORS_MAX =C2=A0 =C2=A0 32
>
> +static void msi_unsupported(MSIMessage *msg)
> +{
> + =C2=A0 =C2=A0/* If we get here, the board failed to register a delivery=
 handler. */
> + =C2=A0 =C2=A0abort();
> +}
> +
> +void (*msi_deliver)(MSIMessage *msg) =3D msi_unsupported;
> +
> =C2=A0/* If we get rid of cap allocator, we won't need this. */
> =C2=A0static inline uint8_t msi_cap_sizeof(uint16_t flags)
> =C2=A0{
> @@ -361,7 +369,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"not=
ify vector 0x%x"
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0" ad=
dress: 0x%"PRIx64" data: 0x%"PRIx32"\n",
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vect=
or, msg.address, msg.data);
> - =C2=A0 =C2=A0stl_phys(msg.address, msg.data);
> + =C2=A0 =C2=A0msi_deliver(&msg);
> =C2=A0}
>
> =C2=A0/* call this function after updating configs by pci_default_write_c=
onfig(). */
> diff --git a/hw/msi.h b/hw/msi.h
> index 7a0e76f..ba76eca 100644
> --- a/hw/msi.h
> +++ b/hw/msi.h
> @@ -44,4 +44,6 @@ static inline bool msi_present(const PCIDevice *dev)
> =C2=A0 =C2=A0 return dev->cap_present & QEMU_PCI_CAP_MSI;
> =C2=A0}
>
> +extern void (*msi_deliver)(MSIMessage *msg);
> +
> =C2=A0#endif /* QEMU_MSI_H */
> diff --git a/hw/msix.c b/hw/msix.c
> index 6e55422..d893f88 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -514,7 +514,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>
> =C2=A0 =C2=A0 msix_message_from_vector(dev, vector, &msg);
>
> - =C2=A0 =C2=A0stl_phys(msg.address, msg.data);
> + =C2=A0 =C2=A0msi_deliver(&msg);
> =C2=A0}
>
> =C2=A0void msix_reset(PCIDevice *dev)
> diff --git a/hw/pc.c b/hw/pc.c
> index 73398eb..f88ef03 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -24,6 +24,7 @@
> =C2=A0#include "hw.h"
> =C2=A0#include "pc.h"
> =C2=A0#include "apic.h"
> +#include "msi.h"
> =C2=A0#include "fdc.h"
> =C2=A0#include "ide.h"
> =C2=A0#include "pci.h"
> @@ -102,6 +103,15 @@ void isa_irq_handler(void *opaque, int n, int level)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 qemu_set_irq(isa->ioapic[n], level);
> =C2=A0};
>
> +static void pc_msi_deliver(MSIMessage *msg)
> +{
> + =C2=A0 =C2=A0if ((msg->address & 0xfff00000) =3D=3D MSI_ADDR_BASE) {
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0apic_deliver_msi(msg->address, msg->data);
> + =C2=A0 =C2=A0} else {
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0stl_phys(msg->address, msg->data);
> + =C2=A0 =C2=A0}
> +}
> +
> =C2=A0static void ioport80_write(void *opaque, uint32_t addr, uint32_t da=
ta)
> =C2=A0{
> =C2=A0}
> @@ -889,6 +899,7 @@ static DeviceState *apic_init(void *env, uint8_t apic=
_id)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0on the global memory bus. */
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* XXX: what if the base changes? */
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 sysbus_mmio_map(d, 0, MSI_ADDR_BASE);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0msi_deliver =3D pc_msi_deliver;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 apic_mapped =3D 1;
> =C2=A0 =C2=A0 }
>
> --
> 1.7.1
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
>

Hmm... This patch is just part of GSoC project Virtio on Xen. AFAICS,
this is not so urgent.

I would prefer a cleaner way. A standalone function that doesn't
belong to any QEMU subsystem annoys me. And the hooking looks ugly.

Thanks for your patch, I think we can wait for a better design and
cleaner solution for both Xen and KVM.

--=20
Best regards
Wei Liu
Twitter: @iliuw
Site: http://liuw.name