From: David Woodhouse <dwmw2@infradead.org>
To: Peter Maydell <peter.maydell@linaro.org>, qemu-devel@nongnu.org
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Paul Durrant" <paul@xen.org>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Ankur Arora" <ankur.a.arora@oracle.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Thomas Huth" <thuth@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Juan Quintela" <quintela@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Claudio Fontana" <cfontana@suse.de>,
	"Julien Grall" <julien@xen.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	armbru@redhat.com
Subject: [PATCH v11 13/59] hw/xen: Add xen_overlay device for emulating shared xenheap pages
Date: Thu, 16 Feb 2023 06:23:58 +0000
Message-ID: <20230216062444.2129371-14-dwmw2@infradead.org>
In-Reply-To: <20230216062444.2129371-1-dwmw2@infradead.org>

From: David Woodhouse <dwmw@amazon.co.uk>

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" with the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.
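
For reference, a minimal sketch of the guest side of that request, using
the structures from Xen's public memory.h (the hypercall wrapper is the
usual guest-kernel one; the surrounding names are illustrative only):

    /* Guest-side sketch: ask the hypervisor -- here QEMU's emulation --
     * to place the shared info page at a chosen guest frame number. */
    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_shared_info,
        .idx   = 0,                     /* only one shared info page */
        .gpfn  = gpa >> XEN_PAGE_SHIFT, /* where the guest wants it */
    };

    if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) {
        /* the hypervisor (or emulator) refused the mapping */
    }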

To support that in QEMU when *emulating* Xen, create a migratable memory
region and allow it to be mapped as an overlay when requested.
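
The intended use from the hypercall handler (HYPERVISOR_memory_op is
wired up later in this series) is roughly the following; variable names
and error handling here are illustrative, not the final code:

    /* QEMU-side sketch: map (or move) the shared info overlay to the
     * GPA the guest asked for, then take a host pointer into it. */
    if (xen_overlay_map_shinfo_page(xatp.gpfn << XEN_PAGE_SHIFT)) {
        return -EINVAL;                  /* illustrative error path */
    }
    shinfo = xen_overlay_get_shinfo_ptr();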

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with... a non-trivial
number of guests expecting true Xen, and no problems have been noticed
so far.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.
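
One possible shape for that aliasing, assuming a contiguous grant_frames
RAM region and a per-frame array of aliases (both names hypothetical
here, to be defined by the later xen_gnttab patch):

    /* Sketch only: one alias per grant frame, so that each page can be
     * added to (or removed from) the guest address space on its own. */
    for (unsigned int i = 0; i < nr_frames; i++) {
        memory_region_init_alias(&s->gnt_alias[i], OBJECT(dev),
                                 "xen:grant_table_frame", &s->gnt_frames,
                                 i * XEN_PAGE_SIZE, XEN_PAGE_SIZE);
    }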

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
---
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/xen_overlay.c | 210 ++++++++++++++++++++++++++++++++++++++
 hw/i386/kvm/xen_overlay.h |  20 ++++
 include/sysemu/kvm_xen.h  |   7 ++
 4 files changed, 238 insertions(+)
 create mode 100644 hw/i386/kvm/xen_overlay.c
 create mode 100644 hw/i386/kvm/xen_overlay.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 95467f1ded..6165cbf019 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,5 +4,6 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
new file mode 100644
index 0000000000..a2441e2b4e
--- /dev/null
+++ b/hw/i386/kvm/xen_overlay.c
@@ -0,0 +1,210 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse <dwmw2@infradead.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+#include <linux/kvm.h>
+
+#include "hw/xen/interface/memory.h"
+
+
+#define TYPE_XEN_OVERLAY "xen-overlay"
+OBJECT_DECLARE_SIMPLE_TYPE(XenOverlayState, XEN_OVERLAY)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+struct XenOverlayState {
+    /*< private >*/
+    SysBusDevice busdev;
+    /*< public >*/
+
+    MemoryRegion shinfo_mem;
+    void *shinfo_ptr;
+    uint64_t shinfo_gpa;
+};
+
+struct XenOverlayState *xen_overlay_singleton;
+
+static void xen_overlay_do_map_page(MemoryRegion *page, uint64_t gpa)
+{
+    /*
+     * Xen allows guests to map the same page as many times as it likes
+     * into guest physical frames. We don't, because it would be hard
+     * to track and restore them all. One mapping of each page is
+     * perfectly sufficient for all known guests... and we've tested
+     * that theory on a few now in other implementations. dwmw2.
+     */
+    if (memory_region_is_mapped(page)) {
+        if (gpa == INVALID_GPA) {
+            memory_region_del_subregion(get_system_memory(), page);
+        } else {
+            /* Just move it */
+            memory_region_set_address(page, gpa);
+        }
+    } else if (gpa != INVALID_GPA) {
+        memory_region_add_subregion_overlap(get_system_memory(), gpa, page, 0);
+    }
+}
+
+/* KVM is the only existing back end for now. Let's not overengineer it yet. */
+static int xen_overlay_set_be_shinfo(uint64_t gfn)
+{
+    struct kvm_xen_hvm_attr xa = {
+        .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+        .u.shared_info.gfn = gfn,
+    };
+
+    return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+}
+
+
+static void xen_overlay_realize(DeviceState *dev, Error **errp)
+{
+    XenOverlayState *s = XEN_OVERLAY(dev);
+
+    if (xen_mode != XEN_EMULATE) {
+        error_setg(errp, "Xen overlay page support is for Xen emulation");
+        return;
+    }
+
+    memory_region_init_ram(&s->shinfo_mem, OBJECT(dev), "xen:shared_info",
+                           XEN_PAGE_SIZE, &error_abort);
+    memory_region_set_enabled(&s->shinfo_mem, true);
+
+    s->shinfo_ptr = memory_region_get_ram_ptr(&s->shinfo_mem);
+    s->shinfo_gpa = INVALID_GPA;
+    memset(s->shinfo_ptr, 0, XEN_PAGE_SIZE);
+}
+
+static int xen_overlay_post_load(void *opaque, int version_id)
+{
+    XenOverlayState *s = opaque;
+
+    if (s->shinfo_gpa != INVALID_GPA) {
+        xen_overlay_do_map_page(&s->shinfo_mem, s->shinfo_gpa);
+        xen_overlay_set_be_shinfo(s->shinfo_gpa >> XEN_PAGE_SHIFT);
+    }
+
+    return 0;
+}
+
+static bool xen_overlay_is_needed(void *opaque)
+{
+    return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_overlay_vmstate = {
+    .name = "xen_overlay",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = xen_overlay_is_needed,
+    .post_load = xen_overlay_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(shinfo_gpa, XenOverlayState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void xen_overlay_reset(DeviceState *dev)
+{
+    kvm_xen_soft_reset();
+}
+
+static void xen_overlay_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = xen_overlay_reset;
+    dc->realize = xen_overlay_realize;
+    dc->vmsd = &xen_overlay_vmstate;
+}
+
+static const TypeInfo xen_overlay_info = {
+    .name          = TYPE_XEN_OVERLAY,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(XenOverlayState),
+    .class_init    = xen_overlay_class_init,
+};
+
+void xen_overlay_create(void)
+{
+    xen_overlay_singleton = XEN_OVERLAY(sysbus_create_simple(TYPE_XEN_OVERLAY,
+                                                             -1, NULL));
+
+    /* If xen_domid wasn't explicitly set, at least make sure it isn't zero. */
+    if (xen_domid == DOMID_QEMU) {
+        xen_domid = 1;
+    };
+}
+
+static void xen_overlay_register_types(void)
+{
+    type_register_static(&xen_overlay_info);
+}
+
+type_init(xen_overlay_register_types)
+
+int xen_overlay_map_shinfo_page(uint64_t gpa)
+{
+    XenOverlayState *s = xen_overlay_singleton;
+    int ret;
+
+    if (!s) {
+        return -ENOENT;
+    }
+
+    assert(qemu_mutex_iothread_locked());
+
+    if (s->shinfo_gpa) {
+        /* If removing shinfo page, turn the kernel magic off first */
+        ret = xen_overlay_set_be_shinfo(INVALID_GFN);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    xen_overlay_do_map_page(&s->shinfo_mem, gpa);
+    if (gpa != INVALID_GPA) {
+        ret = xen_overlay_set_be_shinfo(gpa >> XEN_PAGE_SHIFT);
+        if (ret) {
+            return ret;
+        }
+    }
+    s->shinfo_gpa = gpa;
+
+    return 0;
+}
+
+void *xen_overlay_get_shinfo_ptr(void)
+{
+    XenOverlayState *s = xen_overlay_singleton;
+
+    if (!s) {
+        return NULL;
+    }
+
+    return s->shinfo_ptr;
+}
diff --git a/hw/i386/kvm/xen_overlay.h b/hw/i386/kvm/xen_overlay.h
new file mode 100644
index 0000000000..00cff05bb0
--- /dev/null
+++ b/hw/i386/kvm/xen_overlay.h
@@ -0,0 +1,20 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse <dwmw2@infradead.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_XEN_OVERLAY_H
+#define QEMU_XEN_OVERLAY_H
+
+void xen_overlay_create(void);
+
+int xen_overlay_map_shinfo_page(uint64_t gpa);
+void *xen_overlay_get_shinfo_ptr(void);
+
+#endif /* QEMU_XEN_OVERLAY_H */
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 5dffcc0542..0c3a273549 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -12,6 +12,13 @@
 #ifndef QEMU_SYSEMU_KVM_XEN_H
 #define QEMU_SYSEMU_KVM_XEN_H
 
+/* The KVM API uses these to indicate "no GPA" or "no GFN" */
+#define INVALID_GPA UINT64_MAX
+#define INVALID_GFN UINT64_MAX
+
+/* Qemu plays the rôle of dom0 for "interdomain" communication. */
+#define DOMID_QEMU  0
+
 int kvm_xen_soft_reset(void);
 uint32_t kvm_xen_get_caps(void);
 
-- 
2.39.0



