From: David Woodhouse <dwmw2@infradead.org>
To: qemu-devel@nongnu.org
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
"Paul Durrant" <paul@xen.org>,
"Joao Martins" <joao.m.martins@oracle.com>,
"Ankur Arora" <ankur.a.arora@oracle.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Thomas Huth" <thuth@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Juan Quintela" <quintela@redhat.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
"Claudio Fontana" <cfontana@suse.de>
Subject: [RFC PATCH v2 12/22] hw/xen: Add xen_overlay device for emulating shared xenheap pages
Date: Fri, 9 Dec 2022 09:56:02 +0000
Message-ID: <20221209095612.689243-13-dwmw2@infradead.org>
In-Reply-To: <20221209095612.689243-1-dwmw2@infradead.org>
From: David Woodhouse <dwmw@amazon.co.uk>
For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" with the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA, using the XENMEM_add_to_physmap hypercall.
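
For reference, a rough sketch of what the guest side of that request
looks like, using the struct from the public xen/memory.h header (the
HYPERVISOR_memory_op() wrapper is the usual guest-side one, nothing
added by this series):

    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_shared_info,
        .idx   = 0,                      /* which frame within the space */
        .gpfn  = gpa >> XEN_PAGE_SHIFT,  /* where the guest wants it */
    };

    /* Issued through the hypercall page, e.g. by a Linux guest. */
    rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
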
To support that in QEMU when *emulating* Xen, create a (migratable)
memory region and allow it to be mapped as an overlay when requested.
Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with a non-trivial
number of guests expecting true Xen, without any problems being
noticed so far.
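
The expected caller is the XENMEM_add_to_physmap handler added later in
this series; roughly like this (illustrative only, the handler itself
is not part of this patch):

    /* Illustrative: translate the guest's frame number to a GPA and
     * map the overlay there. Passing INVALID_GPA instead unmaps it. */
    err = xen_overlay_map_page(xatp.space, xatp.idx,
                               (uint64_t)xatp.gpfn << XEN_PAGE_SHIFT);
    if (err) {
        return err;
    }
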
This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means creating a separate alias for each page of the overall
grant_frames region, so that they can be mapped individually.
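
Roughly what I have in mind for that is one alias per grant frame, so
each can be placed independently (untested sketch; the gnt_alias[],
gnt_frames and XEN_MAX_GRANT_FRAMES names are invented here):

    for (i = 0; i < XEN_MAX_GRANT_FRAMES; i++) {
        char *name = g_strdup_printf("xen:grant_table[%d]", i);

        /* Each alias covers one page of the backing grant_frames region. */
        memory_region_init_alias(&s->gnt_alias[i], OBJECT(dev), name,
                                 &s->gnt_frames, i * XEN_PAGE_SIZE,
                                 XEN_PAGE_SIZE);
        g_free(name);
    }
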
Expecting some heckling at the use of xen_overlay_singleton. What is
the best way to look the device up? Using qdev_find_recursive() every
time seemed a bit wrong. But I suppose mapping it into the *guest*
isn't a fast path, and if the actual grant table code is allowed to
just stash the pointer it gets from xen_overlay_page_ptr() for later
use, then that isn't a fast path for device I/O either.
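
In other words, consumers would be expected to do something like this
once at setup time rather than looking the page up on every access
(sketch only):

    /* Sketch: cache the host pointer to the shared info page. */
    struct shared_info *shinfo = xen_overlay_page_ptr(XENMAPSPACE_shared_info, 0);
    if (!shinfo) {
        return -ENOTSUP;
    }
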
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/xen_overlay.c | 198 ++++++++++++++++++++++++++++++++++++++
 hw/i386/kvm/xen_overlay.h |  14 +++
 hw/i386/pc_piix.c         |   8 ++
 4 files changed, 221 insertions(+)
 create mode 100644 hw/i386/kvm/xen_overlay.c
 create mode 100644 hw/i386/kvm/xen_overlay.h
diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 95467f1ded..6165cbf019 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,5 +4,6 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
new file mode 100644
index 0000000000..c3eeb8dae8
--- /dev/null
+++ b/hw/i386/kvm/xen_overlay.c
@@ -0,0 +1,198 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse <dwmw2@infradead.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+
+#include "sysemu/kvm.h"
+#include <linux/kvm.h>
+
+#include "standard-headers/xen/memory.h"
+
+static int xen_overlay_map_page_locked(uint32_t space, uint64_t idx, uint64_t gpa);
+
+#define INVALID_GPA UINT64_MAX
+#define INVALID_GFN UINT64_MAX
+
+#define TYPE_XEN_OVERLAY "xenoverlay"
+OBJECT_DECLARE_SIMPLE_TYPE(XenOverlayState, XEN_OVERLAY)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+struct XenOverlayState {
+    /*< private >*/
+    SysBusDevice busdev;
+    /*< public >*/
+
+    MemoryRegion shinfo_mem;
+    void *shinfo_ptr;
+    uint64_t shinfo_gpa;
+};
+
+struct XenOverlayState *xen_overlay_singleton;
+
+static void xen_overlay_realize(DeviceState *dev, Error **errp)
+{
+    XenOverlayState *s = XEN_OVERLAY(dev);
+
+    if (xen_mode != XEN_EMULATE) {
+        error_setg(errp, "Xen overlay page support is for Xen emulation");
+        return;
+    }
+
+    memory_region_init_ram(&s->shinfo_mem, OBJECT(dev), "xen:shared_info", XEN_PAGE_SIZE, &error_abort);
+    memory_region_set_enabled(&s->shinfo_mem, true);
+    s->shinfo_ptr = memory_region_get_ram_ptr(&s->shinfo_mem);
+    s->shinfo_gpa = INVALID_GPA;
+    memset(s->shinfo_ptr, 0, XEN_PAGE_SIZE);
+}
+
+static int xen_overlay_post_load(void *opaque, int version_id)
+{
+    XenOverlayState *s = opaque;
+
+    if (s->shinfo_gpa != INVALID_GPA) {
+        xen_overlay_map_page_locked(XENMAPSPACE_shared_info, 0, s->shinfo_gpa);
+    }
+
+    return 0;
+}
+
+static bool xen_overlay_is_needed(void *opaque)
+{
+    return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_overlay_vmstate = {
+ .name = "xen_overlay",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = xen_overlay_is_needed,
+ .post_load = xen_overlay_post_load,
+ .fields = (VMStateField[]) {
+ VMSTATE_UINT64(shinfo_gpa, XenOverlayState),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+static void xen_overlay_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = xen_overlay_realize;
+    dc->vmsd = &xen_overlay_vmstate;
+}
+
+static const TypeInfo xen_overlay_info = {
+    .name = TYPE_XEN_OVERLAY,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(XenOverlayState),
+    .class_init = xen_overlay_class_init,
+};
+
+void xen_overlay_create(void)
+{
+    xen_overlay_singleton = XEN_OVERLAY(sysbus_create_simple(TYPE_XEN_OVERLAY, -1, NULL));
+}
+
+static void xen_overlay_register_types(void)
+{
+    type_register_static(&xen_overlay_info);
+}
+
+type_init(xen_overlay_register_types)
+
+int xen_overlay_map_page(uint32_t space, uint64_t idx, uint64_t gpa)
+{
+    int ret;
+
+    qemu_mutex_lock_iothread();
+    ret = xen_overlay_map_page_locked(space, idx, gpa);
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
+/* KVM is the only existing back end for now. Let's not overengineer it yet. */
+static int xen_overlay_set_be_shinfo(uint64_t gfn)
+{
+    struct kvm_xen_hvm_attr xa = {
+        .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+        .u.shared_info.gfn = gfn,
+    };
+
+    return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+}
+
+static int xen_overlay_map_page_locked(uint32_t space, uint64_t idx, uint64_t gpa)
+{
+    MemoryRegion *ovl_page;
+    int err;
+
+    if (space != XENMAPSPACE_shared_info || idx != 0)
+        return -EINVAL;
+
+    if (!xen_overlay_singleton)
+        return -ENOENT;
+
+    ovl_page = &xen_overlay_singleton->shinfo_mem;
+
+    /* Xen allows guests to map the same page as many times as it likes
+     * into guest physical frames. We don't, because it would be hard
+     * to track and restore them all. One mapping of each page is
+     * perfectly sufficient for all known guests... and we've tested
+     * that theory on a few now in other implementations. dwmw2. */
+    if (memory_region_is_mapped(ovl_page)) {
+        if (gpa == INVALID_GPA) {
+            /* If removing shinfo page, turn the kernel magic off first */
+            if (space == XENMAPSPACE_shared_info) {
+                err = xen_overlay_set_be_shinfo(INVALID_GFN);
+                if (err)
+                    return err;
+            }
+            memory_region_del_subregion(get_system_memory(), ovl_page);
+            goto done;
+        } else {
+            /* Just move it */
+            memory_region_set_address(ovl_page, gpa);
+        }
+    } else if (gpa != INVALID_GPA) {
+        memory_region_add_subregion_overlap(get_system_memory(), gpa, ovl_page, 0);
+    }
+
+    xen_overlay_set_be_shinfo(gpa >> XEN_PAGE_SHIFT);
+ done:
+    xen_overlay_singleton->shinfo_gpa = gpa;
+    return 0;
+}
+
+void *xen_overlay_page_ptr(uint32_t space, uint64_t idx)
+{
+    if (space != XENMAPSPACE_shared_info || idx != 0)
+        return NULL;
+
+    if (!xen_overlay_singleton)
+        return NULL;
+
+    return xen_overlay_singleton->shinfo_ptr;
+}
diff --git a/hw/i386/kvm/xen_overlay.h b/hw/i386/kvm/xen_overlay.h
new file mode 100644
index 0000000000..afc63991ea
--- /dev/null
+++ b/hw/i386/kvm/xen_overlay.h
@@ -0,0 +1,14 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse <dwmw2@infradead.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+void xen_overlay_create(void);
+int xen_overlay_map_page(uint32_t space, uint64_t idx, uint64_t gpa);
+void *xen_overlay_page_ptr(uint32_t space, uint64_t idx);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d1127adde0..c3c61eedde 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -58,6 +58,9 @@
 #include <xen/hvm/hvm_info_table.h>
 #include "hw/xen/xen_pt.h"
 #endif
+#ifdef CONFIG_XEN_EMU
+#include "hw/i386/kvm/xen_overlay.h"
+#endif
 #include "migration/global_state.h"
 #include "migration/misc.h"
 #include "sysemu/numa.h"
@@ -411,6 +414,11 @@ static void pc_xen_hvm_init(MachineState *machine)
 
     pc_xen_hvm_init_pci(machine);
     pci_create_simple(pcms->bus, -1, "xen-platform");
+#ifdef CONFIG_XEN_EMU
+    if (xen_mode == XEN_EMULATE) {
+        xen_overlay_create();
+    }
+#endif
 }
 #endif
 
--
2.35.3