[PATCH v4 00/15] macOS PV Graphics and new vmapple machine type

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type
@ 2024-10-24 10:27 Phil Dennis-Jordan
  2024-10-24 10:27 ` [PATCH v4 01/15] ui & main loop: Redesign of system-specific main thread event handling Phil Dennis-Jordan
                   ` (14 more replies)
  0 siblings, 15 replies; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv

This patch set introduces a new ARM and macOS HVF specific machine type
called "vmapple", as well as a family of display devices based on the
ParavirtualizedGraphics.framework in macOS. One of the display adapter
variants, apple-gfx-mmio, is required for the new machine type, while
apple-gfx-pci can be used to enable 3D graphics acceleration with x86-64
macOS guest OSes.

Previous versions of this patch set were submitted semi-separately:
the original vmapple patch set by Alexander Graf included a monolithic
implementation of apple-gfx-mmio. I subsequently reviewed and reworked
the latter to support the PCI variant of the device as well and submitted
the result in isolation. As requested in subsequent review, I have now
recombined this with the original vmapple patch set, which I have updated
and improved in a few ways as well.

The vmapple machine type approximates the configuration in macOS's own
Virtualization.framework when running arm64 macOS guests. In addition to
generic components such as a GICv3 and an XHCI USB controller, it
includes nonstandard extensions to the virtio block device, a special
"hardware" aes engine, a configuration device, a pvpanic variant, a
"backdoor" interface, and of course the apple-gfx paravirtualised display
adapter.

There are currently a few limitations to this which aren't intrinsic,
just imperfect emulation of the VZF, but it's good enough to be just
about usable for some purposes:

 * macOS 12 guests only. Versions 13+ currently fail during early boot.
 * macOS 11+ arm64 hosts only, with hvf accel. (Perhaps some differences
   between Apple M series CPUs and TCG's aarch64 implementation? macOS
   hosts only because ParavirtualizedGraphics.framework is a black box
   implementing most of the logic behind the apple-gfx device.)
 * PCI devices use legacy IRQs, not MSI/MSI-X. As far as I can tell,
   we'd need to include the GICv3 ITS, but it's unclear to me what
   exactly needs wiring up.
 * Due to lack of MSI(-X), event delivery from USB devices to the guest
   macOS isn't working correctly. My current conclusion is that the
   OS's XHCI driver simply was never designed to work with legacy IRQs.
   The upshot is that keyboard and mouse/tablet input is very laggy.
   The solution would be to implement MSI(-X) support or figure out how
   to make hcd-xhci-sysbus work with the macOS guest, if at all possible.
   (EHCI and UHCI/OHCI controllers are not an option as the VMAPPLE
   guest kernel does not include drivers for these.)
 * The guest OS must first be provisioned using Virtualization.framework;
   the disk images can subsequently be used in Qemu. (See docs.)

The apple-gfx device can be used independently from the vmapple machine
type, at least in the PCI variant. It mainly targets x86-64 macOS guests
from version 11 on, but also includes a UEFI bootrom for basic
framebuffer mode. macOS 11 is also required on the host side, as well
as a GPU that supports the Metal API. On the guest side, this provides
3D acceleration/GPGPU support with a baseline Metal feature set,
irrespective of the host GPU's feature set. A few limitations in the
current integration:

 * Although it works fine with TCG, it does not work correctly
   cross-architecture: x86-64 guests on arm64 hosts appear to make
   some boot progress, but rendering is corrupted. I suspect
   incompatible texture memory layouts; I have no idea if this is
   fixable.
 * ParavirtualizedGraphics.framework and the guest driver support
   multi-headed configurations. The current Qemu integration always
   connects precisely 1 display.
 * State serialisation and deserialisation is currently not
   implemented, though supported in principle by the framework.
   Both apple-gfx variants thus set up a migration blocker.
 * Rendering efficiency could be better. The GPU-rendered guest
   framebuffer is copied to system memory and uses Qemu's usual
   CPU-based drawing. For maximum efficiency, the Metal texture
   containing the guest framebuffer could be drawn directly to
   a Metal view in the host window, staying on the GPU. (Similar
   to the OpenGL/virgl render path on other platforms.)

My part of this work has been sponsored by Sauce Labs Inc.

---

v2 -> v3:

 * Merged the apple-gfx and vmapple patchsets.
 * Squashed a bunch of later apple-gfx patches into the main one.
   (dGPU support, queried MMIO area size, host GPU picking logic.)
 * Rebased on latest upstream, fixing any breakages due to internal
   Qemu API changes.
 * apple-gfx: Switched to re-entrant MMIO. This is supported by the
   underlying framework and simplifies the MMIO forwarding code which
   was previously different on x86-64 vs aarch64.
 * vmapple: Fixes for minor bugs and comments from the last round of
   review.
 * vmapple aes, conf, apple-gfx: Switched reset methods to implement
   the ResettableClass base's interface.
 * vmapple: switched from virtio-hid to an XHCI USB controller and
   USB mouse and tablet devices. macOS does not provide drivers for
   virtio HID devices, at least not in version 12's vmapple kernel.
   So input now sort of works (interrupt issues) rather than not
   at all. Use network-based remote access to the guest OS as a
   work-around.

v3 -> v4:

 * Complete rework of the mechanism for handling runloop/libdispatch
   events on the main thread. PV graphics now work with the SDL UI.
 * Renamed 'apple-gfx-vmapple' device to 'apple-gfx-mmio'
 * hw/display/apple-gfx: threading model overhaul to be more consistent,
   safer, and more QEMU-idiomatic.
 * display-modes property on the apple-gfx devices now uses the
   native array property mechanism and works on both device variants.
 * hw/vmapple/aes: Improvements to logging and error handling.
 * hw/vmapple/cfg: Bug fixes around device property default values.
 * hw/vmapple/{aes,cfg,virtio-blk/vmapple}: Most header code moved into
   .c files, only a single vmapple.h now contains the #defines for the
   vmapple machine model-specific device type names.
 * hw/block/virtio-blk: New patch for replacing virtio_blk_free_request
   with g_free. (Optional)
 * Various smaller changes following comments in v3 code review in
   apple-gfx, aes, cfg, bdif, virtio-blk-vmapple, and the vmapple
   machine type itself. See patch-specific v4 change notes for details.

Alexander Graf (9):
  hw: Add vmapple subdir
  hw/misc/pvpanic: Add MMIO interface
  hvf: arm: Ignore writes to CNTP_CTL_EL0
  gpex: Allow more than 4 legacy IRQs
  hw/vmapple/aes: Introduce aes engine
  hw/vmapple/bdif: Introduce vmapple backdoor interface
  hw/vmapple/cfg: Introduce vmapple cfg region
  hw/vmapple/virtio-blk: Add support for apple virtio-blk
  hw/vmapple/vmapple: Add vmapple machine type

Phil Dennis-Jordan (6):
  ui & main loop: Redesign of system-specific main thread event handling
  hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework
    support
  hw/display/apple-gfx: Adds PCI implementation
  hw/display/apple-gfx: Adds configurable mode list
  MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF
  hw/block/virtio-blk: Replaces request free function with g_free

 MAINTAINERS                    |  15 +
 docs/system/arm/vmapple.rst    |  63 +++
 docs/system/target-arm.rst     |   1 +
 hw/Kconfig                     |   1 +
 hw/arm/sbsa-ref.c              |   2 +-
 hw/arm/virt.c                  |   2 +-
 hw/block/virtio-blk.c          |  58 +--
 hw/display/Kconfig             |  13 +
 hw/display/apple-gfx-mmio.m    | 292 ++++++++++++
 hw/display/apple-gfx-pci.m     | 159 +++++++
 hw/display/apple-gfx.h         |  70 +++
 hw/display/apple-gfx.m         | 816 +++++++++++++++++++++++++++++++++
 hw/display/meson.build         |   5 +
 hw/display/trace-events        |  28 ++
 hw/i386/microvm.c              |   2 +-
 hw/loongarch/virt.c            |   2 +-
 hw/meson.build                 |   1 +
 hw/mips/loongson3_virt.c       |   2 +-
 hw/misc/Kconfig                |   4 +
 hw/misc/meson.build            |   1 +
 hw/misc/pvpanic-mmio.c         |  61 +++
 hw/openrisc/virt.c             |  12 +-
 hw/pci-host/gpex.c             |  43 +-
 hw/riscv/virt.c                |  12 +-
 hw/vmapple/Kconfig             |  32 ++
 hw/vmapple/aes.c               | 572 +++++++++++++++++++++++
 hw/vmapple/bdif.c              | 259 +++++++++++
 hw/vmapple/cfg.c               | 197 ++++++++
 hw/vmapple/meson.build         |   5 +
 hw/vmapple/trace-events        |  23 +
 hw/vmapple/trace.h             |   1 +
 hw/vmapple/virtio-blk.c        | 233 ++++++++++
 hw/vmapple/vmapple.c           | 652 ++++++++++++++++++++++++++
 hw/xtensa/virt.c               |   2 +-
 include/hw/misc/pvpanic.h      |   1 +
 include/hw/pci-host/gpex.h     |   7 +-
 include/hw/pci/pci_ids.h       |   1 +
 include/hw/virtio/virtio-blk.h |  11 +-
 include/hw/vmapple/vmapple.h   |  21 +
 include/qemu-main.h            |   3 +-
 include/qemu/cutils.h          |  15 +
 include/qemu/typedefs.h        |   1 +
 include/sysemu/os-posix.h      |   2 +
 include/sysemu/os-win32.h      |   2 +
 include/ui/console.h           |  12 +
 meson.build                    |   5 +
 os-posix.c                     |  20 +
 system/main.c                  |  45 +-
 system/vl.c                    |   2 +
 target/arm/hvf/hvf.c           |   9 +
 ui/cocoa.m                     |  55 +--
 ui/console.c                   |  32 +-
 ui/sdl2.c                      |   2 +
 ui/trace-events                |   1 +
 util/hexdump.c                 |  14 +
 55 files changed, 3790 insertions(+), 112 deletions(-)
 create mode 100644 docs/system/arm/vmapple.rst
 create mode 100644 hw/display/apple-gfx-mmio.m
 create mode 100644 hw/display/apple-gfx-pci.m
 create mode 100644 hw/display/apple-gfx.h
 create mode 100644 hw/display/apple-gfx.m
 create mode 100644 hw/misc/pvpanic-mmio.c
 create mode 100644 hw/vmapple/Kconfig
 create mode 100644 hw/vmapple/aes.c
 create mode 100644 hw/vmapple/bdif.c
 create mode 100644 hw/vmapple/cfg.c
 create mode 100644 hw/vmapple/meson.build
 create mode 100644 hw/vmapple/trace-events
 create mode 100644 hw/vmapple/trace.h
 create mode 100644 hw/vmapple/virtio-blk.c
 create mode 100644 hw/vmapple/vmapple.c
 create mode 100644 include/hw/vmapple/vmapple.h

-- 
2.39.3 (Apple Git-145)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 01/15] ui & main loop: Redesign of system-specific main thread event handling
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
@ 2024-10-24 10:27 ` Phil Dennis-Jordan
  2024-10-25  4:34   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support Phil Dennis-Jordan
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv

macOS's Cocoa event handling must be done on the initial (main) thread
of the process. Furthermore, if library or application code uses
libdispatch, the main dispatch queue must be handling events on the main
thread as well.

So far, this has affected Qemu in both the Cocoa and SDL UIs, although
in different ways: the Cocoa UI replaces the default qemu_main function
with one that spins Qemu's internal main event loop off onto a
background thread. SDL (which uses Cocoa internally) on the other hand
uses a polling approach within Qemu's main event loop. Events are
polled during the SDL UI's dpy_refresh callback, which happens to run
on the main thread by default.

As UIs are mutually exclusive, this works OK as long as nothing else
needs platform-native event handling. In the next patch, a new device is
introduced based on the ParavirtualizedGraphics.framework in macOS.
This uses libdispatch internally, and only works when events are being
handled on the main runloop. With the current system, it works when
using either the Cocoa or the SDL UI. However, it does not when running
headless. Moreover, any attempt to install a similar scheme to the
Cocoa UI's main thread replacement fails when combined with the SDL
UI.

This change formalises main thread handling. UI (Display) and OS
platform implementations can declare requirements or preferences:

 * The Cocoa UI specifies that Qemu's main loop must run on a
   background thread and provides a function to run on the main thread
   which runs the NSApplication event handling runloop.
 * The SDL UI specifies that Qemu's main loop must run on the main
   thread.
 * For other UIs, or in the absence of UIs, the platform's default
   behaviour is followed.
 * The Darwin platform provides a default function to run on the
   main thread, which runs the main CFRunLoop.
 * Other OSes do not provide their own default main function and thus
   fall back to running Qemu's main loop on the main thread, as usual.

This means that on macOS, the platform's runloop events are always
handled, regardless of chosen UI. The new PV graphics device will
thus work in all configurations.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
 include/qemu-main.h       |  3 +--
 include/qemu/typedefs.h   |  1 +
 include/sysemu/os-posix.h |  2 ++
 include/sysemu/os-win32.h |  2 ++
 include/ui/console.h      | 12 +++++++++
 os-posix.c                | 20 ++++++++++++++
 system/main.c             | 45 +++++++++++++++++++++++++++-----
 system/vl.c               |  2 ++
 ui/cocoa.m                | 55 +++++++++------------------------------
 ui/console.c              | 32 +++++++++++++++++++++--
 ui/sdl2.c                 |  2 ++
 ui/trace-events           |  1 +
 12 files changed, 123 insertions(+), 54 deletions(-)

diff --git a/include/qemu-main.h b/include/qemu-main.h
index 940960a7dbc..4bd0d667edc 100644
--- a/include/qemu-main.h
+++ b/include/qemu-main.h
@@ -5,7 +5,6 @@
 #ifndef QEMU_MAIN_H
 #define QEMU_MAIN_H
 
-int qemu_default_main(void);
-extern int (*qemu_main)(void);
+extern qemu_main_fn qemu_main;
 
 #endif /* QEMU_MAIN_H */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 3d84efcac47..b02cfe1f328 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -131,5 +131,6 @@ typedef struct IRQState *qemu_irq;
  * Function types
  */
 typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
+typedef int (*qemu_main_fn)(void);
 
 #endif /* QEMU_TYPEDEFS_H */
diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index b881ac6c6f7..51bbb5370e0 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -26,6 +26,7 @@
 #ifndef QEMU_OS_POSIX_H
 #define QEMU_OS_POSIX_H
 
+#include "qemu/typedefs.h"
 #include <sys/mman.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
@@ -54,6 +55,7 @@ void os_set_chroot(const char *path);
 void os_setup_limits(void);
 void os_setup_post(void);
 int os_mlock(void);
+qemu_main_fn os_non_loop_main_thread_fn(void);
 
 /**
  * qemu_alloc_stack:
diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
index b82a5d3ad93..db0daba9a52 100644
--- a/include/sysemu/os-win32.h
+++ b/include/sysemu/os-win32.h
@@ -26,6 +26,7 @@
 #ifndef QEMU_OS_WIN32_H
 #define QEMU_OS_WIN32_H
 
+#include "qemu/typedefs.h"
 #include <winsock2.h>
 #include <windows.h>
 #include <ws2tcpip.h>
@@ -105,6 +106,7 @@ void os_set_line_buffering(void);
 void os_setup_early_signal_handling(void);
 
 int getpagesize(void);
+static inline qemu_main_fn os_non_loop_main_thread_fn(void) { return NULL; }
 
 #if !defined(EPROTONOSUPPORT)
 # define EPROTONOSUPPORT EINVAL
diff --git a/include/ui/console.h b/include/ui/console.h
index 5832d52a8a6..4e3dc7da146 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -440,6 +440,18 @@ typedef struct QemuDisplay QemuDisplay;
 
 struct QemuDisplay {
     DisplayType type;
+    /*
+     * Some UIs have special requirements, for the qemu_main event loop running
+     * on either the process's initial (main) thread ('Off'), or on an
+     * explicitly created background thread ('On') because of platform-specific
+     * event handling.
+     * The default, 'Auto', indicates the display will work with both setups.
+     * If 'On', either a qemu_main_thread_fn must be supplied, or it must be
+     * ensured that all applicable host OS platforms supply a default main.
+     * (via os_non_loop_main_thread_fn())
+     */
+    OnOffAuto qemu_main_on_bg_thread;
+    qemu_main_fn qemu_main_thread_fn;
     void (*early_init)(DisplayOptions *opts);
     void (*init)(DisplayState *ds, DisplayOptions *opts);
     const char *vc;
diff --git a/os-posix.c b/os-posix.c
index 43f9a43f3fe..a173c026f6c 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -37,6 +37,8 @@
 
 #ifdef CONFIG_LINUX
 #include <sys/prctl.h>
+#elif defined(CONFIG_DARWIN)
+#include <CoreFoundation/CoreFoundation.h>
 #endif
 
 
@@ -342,3 +344,21 @@ int os_mlock(void)
     return -ENOSYS;
 #endif
 }
+
+#ifdef CONFIG_DARWIN
+static int os_darwin_cfrunloop_main(void)
+{
+    CFRunLoopRun();
+    abort();
+}
+#endif
+
+qemu_main_fn os_non_loop_main_thread_fn(void)
+{
+#ifdef CONFIG_DARWIN
+    /* By default, run the OS's event runloop on the main thread. */
+    return os_darwin_cfrunloop_main;
+#else
+    return NULL;
+#endif
+}
diff --git a/system/main.c b/system/main.c
index 9b91d21ea8c..358eab281b0 100644
--- a/system/main.c
+++ b/system/main.c
@@ -24,13 +24,10 @@
 
 #include "qemu/osdep.h"
 #include "qemu-main.h"
+#include "qemu/main-loop.h"
 #include "sysemu/sysemu.h"
 
-#ifdef CONFIG_SDL
-#include <SDL.h>
-#endif
-
-int qemu_default_main(void)
+static int qemu_default_main(void)
 {
     int status;
 
@@ -40,10 +37,44 @@ int qemu_default_main(void)
     return status;
 }
 
-int (*qemu_main)(void) = qemu_default_main;
+/*
+ * Various macOS system libraries, including the Cocoa UI and anything using
+ * libdispatch, such as ParavirtualizedGraphics.framework, requires that the
+ * main runloop, on the main (initial) thread be running or at least regularly
+ * polled for events. A special mode is therefore supported, where the QEMU
+ * main loop runs on a separate thread and the main thread handles the
+ * CF/Cocoa runloop.
+ */
+
+static void *call_qemu_default_main(void *opaque)
+{
+    int status;
+
+    bql_lock();
+    status = qemu_default_main();
+    bql_unlock();
+
+    exit(status);
+}
+
+static void qemu_run_default_main_on_new_thread(void)
+{
+    QemuThread thread;
+
+    qemu_thread_create(&thread, "qemu_main", call_qemu_default_main,
+                       NULL, QEMU_THREAD_DETACHED);
+}
+
+qemu_main_fn qemu_main;
 
 int main(int argc, char **argv)
 {
     qemu_init(argc, argv);
-    return qemu_main();
+    if (qemu_main) {
+        qemu_run_default_main_on_new_thread();
+        bql_unlock();
+        return qemu_main();
+    } else {
+        qemu_default_main();
+    }
 }
diff --git a/system/vl.c b/system/vl.c
index e83b3b2608b..c1db20dbee9 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -134,6 +134,7 @@
 #include "sysemu/iothread.h"
 #include "qemu/guest-random.h"
 #include "qemu/keyval.h"
+#include "qemu-main.h"
 
 #define MAX_VIRTIO_CONSOLES 1
 
@@ -3667,6 +3668,7 @@ void qemu_init(int argc, char **argv)
     trace_init_file();
 
     qemu_init_main_loop(&error_fatal);
+    qemu_main = os_non_loop_main_thread_fn();
     cpu_timers_init();
 
     user_register_global_props();
diff --git a/ui/cocoa.m b/ui/cocoa.m
index 4c2dd335323..393b3800491 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -73,6 +73,8 @@
     int height;
 } QEMUScreen;
 
+@class QemuCocoaPasteboardTypeOwner;
+
 static void cocoa_update(DisplayChangeListener *dcl,
                          int x, int y, int w, int h);
 
@@ -107,6 +109,7 @@ static void cocoa_switch(DisplayChangeListener *dcl,
 static NSInteger cbchangecount = -1;
 static QemuClipboardInfo *cbinfo;
 static QemuEvent cbevent;
+static QemuCocoaPasteboardTypeOwner *cbowner;
 
 // Utility functions to run specified code block with the BQL held
 typedef void (^CodeBlock)(void);
@@ -1321,8 +1324,10 @@ - (void) dealloc
 {
     COCOA_DEBUG("QemuCocoaAppController: dealloc\n");
 
-    if (cocoaView)
-        [cocoaView release];
+    [cocoaView release];
+    [cbowner release];
+    cbowner = nil;
+
     [super dealloc];
 }
 
@@ -1938,8 +1943,6 @@ - (void)pasteboard:(NSPasteboard *)sender provideDataForType:(NSPasteboardType)t
 
 @end
 
-static QemuCocoaPasteboardTypeOwner *cbowner;
-
 static void cocoa_clipboard_notify(Notifier *notifier, void *data);
 static void cocoa_clipboard_request(QemuClipboardInfo *info,
                                     QemuClipboardType type);
@@ -2002,43 +2005,8 @@ static void cocoa_clipboard_request(QemuClipboardInfo *info,
     }
 }
 
-/*
- * The startup process for the OSX/Cocoa UI is complicated, because
- * OSX insists that the UI runs on the initial main thread, and so we
- * need to start a second thread which runs the qemu_default_main():
- * in main():
- *  in cocoa_display_init():
- *   assign cocoa_main to qemu_main
- *   create application, menus, etc
- *  in cocoa_main():
- *   create qemu-main thread
- *   enter OSX run loop
- */
-
-static void *call_qemu_main(void *opaque)
-{
-    int status;
-
-    COCOA_DEBUG("Second thread: calling qemu_default_main()\n");
-    bql_lock();
-    status = qemu_default_main();
-    bql_unlock();
-    COCOA_DEBUG("Second thread: qemu_default_main() returned, exiting\n");
-    [cbowner release];
-    exit(status);
-}
-
 static int cocoa_main(void)
 {
-    QemuThread thread;
-
-    COCOA_DEBUG("Entered %s()\n", __func__);
-
-    bql_unlock();
-    qemu_thread_create(&thread, "qemu_main", call_qemu_main,
-                       NULL, QEMU_THREAD_DETACHED);
-
-    // Start the main event loop
     COCOA_DEBUG("Main thread: entering OSX run loop\n");
     [NSApp run];
     COCOA_DEBUG("Main thread: left OSX run loop, which should never happen\n");
@@ -2120,8 +2088,6 @@ static void cocoa_display_init(DisplayState *ds, DisplayOptions *opts)
 
     COCOA_DEBUG("qemu_cocoa: cocoa_display_init\n");
 
-    qemu_main = cocoa_main;
-
     // Pull this console process up to being a fully-fledged graphical
     // app with a menubar and Dock icon
     ProcessSerialNumber psn = { 0, kCurrentProcess };
@@ -2188,8 +2154,11 @@ static void cocoa_display_init(DisplayState *ds, DisplayOptions *opts)
 }
 
 static QemuDisplay qemu_display_cocoa = {
-    .type       = DISPLAY_TYPE_COCOA,
-    .init       = cocoa_display_init,
+    .type                   = DISPLAY_TYPE_COCOA,
+    .init                   = cocoa_display_init,
+    /* The Cocoa UI will run the NSApplication runloop on the main thread.*/
+    .qemu_main_on_bg_thread = ON_OFF_AUTO_ON,
+    .qemu_main_thread_fn    = cocoa_main,
 };
 
 static void register_cocoa(void)
diff --git a/ui/console.c b/ui/console.c
index 5165f171257..1599d8b7095 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -33,6 +33,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
+#include "qemu-main.h"
 #include "chardev/char.h"
 #include "trace.h"
 #include "exec/memory.h"
@@ -1569,12 +1570,39 @@ void qemu_display_early_init(DisplayOptions *opts)
 
 void qemu_display_init(DisplayState *ds, DisplayOptions *opts)
 {
+    QemuDisplay *display;
+    bool bg_main_loop;
+
     assert(opts->type < DISPLAY_TYPE__MAX);
     if (opts->type == DISPLAY_TYPE_NONE) {
         return;
     }
-    assert(dpys[opts->type] != NULL);
-    dpys[opts->type]->init(ds, opts);
+    display = dpys[opts->type];
+    assert(display != NULL);
+    display->init(ds, opts);
+
+    switch (display->qemu_main_on_bg_thread) {
+    case ON_OFF_AUTO_OFF:
+        bg_main_loop = false;
+        qemu_main = NULL;
+        break;
+    case ON_OFF_AUTO_ON:
+        bg_main_loop = true;
+        break;
+    case ON_OFF_AUTO_AUTO:
+    default:
+        bg_main_loop = qemu_main;
+        break;
+    }
+
+    trace_qemu_display_init_main_thread(
+        DisplayType_str(display->type), display->qemu_main_thread_fn, qemu_main,
+        OnOffAuto_lookup.array[display->qemu_main_on_bg_thread],
+        display->qemu_main_on_bg_thread, bg_main_loop);
+    if (bg_main_loop && display->qemu_main_thread_fn) {
+        qemu_main = display->qemu_main_thread_fn;
+    }
+    assert(!bg_main_loop || qemu_main);
 }
 
 const char *qemu_display_get_vc(DisplayOptions *opts)
diff --git a/ui/sdl2.c b/ui/sdl2.c
index bd4f5a9da14..35e22785119 100644
--- a/ui/sdl2.c
+++ b/ui/sdl2.c
@@ -971,6 +971,8 @@ static QemuDisplay qemu_display_sdl2 = {
     .type       = DISPLAY_TYPE_SDL,
     .early_init = sdl2_display_early_init,
     .init       = sdl2_display_init,
+    /* SDL must poll for events (via dpy_refresh) on main thread */
+    .qemu_main_on_bg_thread = ON_OFF_AUTO_OFF,
 };
 
 static void register_sdl1(void)
diff --git a/ui/trace-events b/ui/trace-events
index 3da0d5e2800..1e72c967399 100644
--- a/ui/trace-events
+++ b/ui/trace-events
@@ -16,6 +16,7 @@ displaysurface_free(void *display_surface) "surface=%p"
 displaychangelistener_register(void *dcl, const char *name) "%p [ %s ]"
 displaychangelistener_unregister(void *dcl, const char *name) "%p [ %s ]"
 ppm_save(int fd, void *image) "fd=%d image=%p"
+qemu_display_init_main_thread(const char *display_name, bool qemu_display_sets_main_fn, bool qemu_main_is_set, const char *display_bg_main_loop_preference, int preference, bool bg_main_loop) "display '%s': sets main thread function: %d, platform provides main function: %d, display background main loop preference: %s (%d); main loop will run on background thread: %d"
 
 # gtk-egl.c
 # gtk-gl-area.c
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 01/15] ui & main loop: Redesign of system-specific main thread event handling
  2024-10-24 10:27 ` [PATCH v4 01/15] ui & main loop: Redesign of system-specific main thread event handling Phil Dennis-Jordan
@ 2024-10-25  4:34   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-25  4:34 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv

On 2024/10/24 19:27, Phil Dennis-Jordan wrote:
> macOS's Cocoa event handling must be done on the initial (main) thread
> of the process. Furthermore, if library or application code uses
> libdispatch, the main dispatch queue must be handling events on the main
> thread as well.
> 
> So far, this has affected Qemu in both the Cocoa and SDL UIs, although
> in different ways: the Cocoa UI replaces the default qemu_main function
> with one that spins Qemu's internal main event loop off onto a
> background thread. SDL (which uses Cocoa internally) on the other hand
> uses a polling approach within Qemu's main event loop. Events are
> polled during the SDL UI's dpy_refresh callback, which happens to run
> on the main thread by default.
> 
> As UIs are mutually exclusive, this works OK as long as nothing else
> needs platform-native event handling. In the next patch, a new device is
> introduced based on the ParavirtualizedGraphics.framework in macOS.
> This uses libdispatch internally, and only works when events are being
> handled on the main runloop. With the current system, it works when
> using either the Cocoa or the SDL UI. However, it does not when running
> headless. Moreover, any attempt to install a similar scheme to the
> Cocoa UI's main thread replacement fails when combined with the SDL
> UI.
> 
> This change formalises main thread handling. UI (Display) and OS
> platform implementations can declare requirements or preferences:
> 
>   * The Cocoa UI specifies that Qemu's main loop must run on a
>     background thread and provides a function to run on the main thread
>     which runs the NSApplication event handling runloop.
>   * The SDL UI specifies that Qemu's main loop must run on the main
>     thread.
>   * For other UIs, or in the absence of UIs, the platform's default
>     behaviour is followed.
>   * The Darwin platform provides a default function to run on the
>     main thread, which runs the main CFRunLoop.
>   * Other OSes do not provide their own default main function and thus
>     fall back to running Qemu's main loop on the main thread, as usual.
> 
> This means that on macOS, the platform's runloop events are always
> handled, regardless of chosen UI. The new PV graphics device will
> thus work in all configurations.
> 
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>

This patch adds a few indirections but I don' see their utilities.
Namely, we can initialize qemu_main with os_darwin_cfrunloop_main ifdef 
CONFIG_DARWIN instead of defining and calling 
os_non_loop_main_thread_fn(). ui/cocoa and ui/sdl2 can also assign 
qemu_main directly.

Regards,
Akihiko Odaki

> ---
>   include/qemu-main.h       |  3 +--
>   include/qemu/typedefs.h   |  1 +
>   include/sysemu/os-posix.h |  2 ++
>   include/sysemu/os-win32.h |  2 ++
>   include/ui/console.h      | 12 +++++++++
>   os-posix.c                | 20 ++++++++++++++
>   system/main.c             | 45 +++++++++++++++++++++++++++-----
>   system/vl.c               |  2 ++
>   ui/cocoa.m                | 55 +++++++++------------------------------
>   ui/console.c              | 32 +++++++++++++++++++++--
>   ui/sdl2.c                 |  2 ++
>   ui/trace-events           |  1 +
>   12 files changed, 123 insertions(+), 54 deletions(-)
> 
> diff --git a/include/qemu-main.h b/include/qemu-main.h
> index 940960a7dbc..4bd0d667edc 100644
> --- a/include/qemu-main.h
> +++ b/include/qemu-main.h
> @@ -5,7 +5,6 @@
>   #ifndef QEMU_MAIN_H
>   #define QEMU_MAIN_H
>   
> -int qemu_default_main(void);
> -extern int (*qemu_main)(void);
> +extern qemu_main_fn qemu_main;
>   
>   #endif /* QEMU_MAIN_H */
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 3d84efcac47..b02cfe1f328 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -131,5 +131,6 @@ typedef struct IRQState *qemu_irq;
>    * Function types
>    */
>   typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
> +typedef int (*qemu_main_fn)(void);
>   
>   #endif /* QEMU_TYPEDEFS_H */
> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
> index b881ac6c6f7..51bbb5370e0 100644
> --- a/include/sysemu/os-posix.h
> +++ b/include/sysemu/os-posix.h
> @@ -26,6 +26,7 @@
>   #ifndef QEMU_OS_POSIX_H
>   #define QEMU_OS_POSIX_H
>   
> +#include "qemu/typedefs.h"
>   #include <sys/mman.h>
>   #include <sys/socket.h>
>   #include <netinet/in.h>
> @@ -54,6 +55,7 @@ void os_set_chroot(const char *path);
>   void os_setup_limits(void);
>   void os_setup_post(void);
>   int os_mlock(void);
> +qemu_main_fn os_non_loop_main_thread_fn(void);
>   
>   /**
>    * qemu_alloc_stack:
> diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
> index b82a5d3ad93..db0daba9a52 100644
> --- a/include/sysemu/os-win32.h
> +++ b/include/sysemu/os-win32.h
> @@ -26,6 +26,7 @@
>   #ifndef QEMU_OS_WIN32_H
>   #define QEMU_OS_WIN32_H
>   
> +#include "qemu/typedefs.h"
>   #include <winsock2.h>
>   #include <windows.h>
>   #include <ws2tcpip.h>
> @@ -105,6 +106,7 @@ void os_set_line_buffering(void);
>   void os_setup_early_signal_handling(void);
>   
>   int getpagesize(void);
> +static inline qemu_main_fn os_non_loop_main_thread_fn(void) { return NULL; }
>   
>   #if !defined(EPROTONOSUPPORT)
>   # define EPROTONOSUPPORT EINVAL
> diff --git a/include/ui/console.h b/include/ui/console.h
> index 5832d52a8a6..4e3dc7da146 100644
> --- a/include/ui/console.h
> +++ b/include/ui/console.h
> @@ -440,6 +440,18 @@ typedef struct QemuDisplay QemuDisplay;
>   
>   struct QemuDisplay {
>       DisplayType type;
> +    /*
> +     * Some UIs have special requirements, for the qemu_main event loop running
> +     * on either the process's initial (main) thread ('Off'), or on an
> +     * explicitly created background thread ('On') because of platform-specific
> +     * event handling.
> +     * The default, 'Auto', indicates the display will work with both setups.
> +     * If 'On', either a qemu_main_thread_fn must be supplied, or it must be
> +     * ensured that all applicable host OS platforms supply a default main.
> +     * (via os_non_loop_main_thread_fn())
> +     */
> +    OnOffAuto qemu_main_on_bg_thread;
> +    qemu_main_fn qemu_main_thread_fn;
>       void (*early_init)(DisplayOptions *opts);
>       void (*init)(DisplayState *ds, DisplayOptions *opts);
>       const char *vc;
> diff --git a/os-posix.c b/os-posix.c
> index 43f9a43f3fe..a173c026f6c 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -37,6 +37,8 @@
>   
>   #ifdef CONFIG_LINUX
>   #include <sys/prctl.h>
> +#elif defined(CONFIG_DARWIN)
> +#include <CoreFoundation/CoreFoundation.h>
>   #endif
>   
>   
> @@ -342,3 +344,21 @@ int os_mlock(void)
>       return -ENOSYS;
>   #endif
>   }
> +
> +#ifdef CONFIG_DARWIN
> +static int os_darwin_cfrunloop_main(void)
> +{
> +    CFRunLoopRun();
> +    abort();
> +}
> +#endif
> +
> +qemu_main_fn os_non_loop_main_thread_fn(void)
> +{
> +#ifdef CONFIG_DARWIN
> +    /* By default, run the OS's event runloop on the main thread. */
> +    return os_darwin_cfrunloop_main;
> +#else
> +    return NULL;
> +#endif
> +}
> diff --git a/system/main.c b/system/main.c
> index 9b91d21ea8c..358eab281b0 100644
> --- a/system/main.c
> +++ b/system/main.c
> @@ -24,13 +24,10 @@
>   
>   #include "qemu/osdep.h"
>   #include "qemu-main.h"
> +#include "qemu/main-loop.h"
>   #include "sysemu/sysemu.h"
>   
> -#ifdef CONFIG_SDL
> -#include <SDL.h>
> -#endif
> -
> -int qemu_default_main(void)
> +static int qemu_default_main(void)
>   {
>       int status;
>   
> @@ -40,10 +37,44 @@ int qemu_default_main(void)
>       return status;
>   }
>   
> -int (*qemu_main)(void) = qemu_default_main;
> +/*
> + * Various macOS system libraries, including the Cocoa UI and anything using
> + * libdispatch, such as ParavirtualizedGraphics.framework, requires that the
> + * main runloop, on the main (initial) thread be running or at least regularly
> + * polled for events. A special mode is therefore supported, where the QEMU
> + * main loop runs on a separate thread and the main thread handles the
> + * CF/Cocoa runloop.
> + */
> +
> +static void *call_qemu_default_main(void *opaque)
> +{
> +    int status;
> +
> +    bql_lock();
> +    status = qemu_default_main();
> +    bql_unlock();
> +
> +    exit(status);
> +}
> +
> +static void qemu_run_default_main_on_new_thread(void)
> +{
> +    QemuThread thread;
> +
> +    qemu_thread_create(&thread, "qemu_main", call_qemu_default_main,
> +                       NULL, QEMU_THREAD_DETACHED);
> +}
> +
> +qemu_main_fn qemu_main;
>   
>   int main(int argc, char **argv)
>   {
>       qemu_init(argc, argv);
> -    return qemu_main();
> +    if (qemu_main) {
> +        qemu_run_default_main_on_new_thread();
> +        bql_unlock();
> +        return qemu_main();
> +    } else {
> +        qemu_default_main();
> +    }
>   }
> diff --git a/system/vl.c b/system/vl.c
> index e83b3b2608b..c1db20dbee9 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -134,6 +134,7 @@
>   #include "sysemu/iothread.h"
>   #include "qemu/guest-random.h"
>   #include "qemu/keyval.h"
> +#include "qemu-main.h"
>   
>   #define MAX_VIRTIO_CONSOLES 1
>   
> @@ -3667,6 +3668,7 @@ void qemu_init(int argc, char **argv)
>       trace_init_file();
>   
>       qemu_init_main_loop(&error_fatal);
> +    qemu_main = os_non_loop_main_thread_fn();
>       cpu_timers_init();
>   
>       user_register_global_props();
> diff --git a/ui/cocoa.m b/ui/cocoa.m
> index 4c2dd335323..393b3800491 100644
> --- a/ui/cocoa.m
> +++ b/ui/cocoa.m
> @@ -73,6 +73,8 @@
>       int height;
>   } QEMUScreen;
>   
> +@class QemuCocoaPasteboardTypeOwner;
> +
>   static void cocoa_update(DisplayChangeListener *dcl,
>                            int x, int y, int w, int h);
>   
> @@ -107,6 +109,7 @@ static void cocoa_switch(DisplayChangeListener *dcl,
>   static NSInteger cbchangecount = -1;
>   static QemuClipboardInfo *cbinfo;
>   static QemuEvent cbevent;
> +static QemuCocoaPasteboardTypeOwner *cbowner;
>   
>   // Utility functions to run specified code block with the BQL held
>   typedef void (^CodeBlock)(void);
> @@ -1321,8 +1324,10 @@ - (void) dealloc
>   {
>       COCOA_DEBUG("QemuCocoaAppController: dealloc\n");
>   
> -    if (cocoaView)
> -        [cocoaView release];
> +    [cocoaView release];
> +    [cbowner release];
> +    cbowner = nil;
> +
>       [super dealloc];
>   }
>   
> @@ -1938,8 +1943,6 @@ - (void)pasteboard:(NSPasteboard *)sender provideDataForType:(NSPasteboardType)t
>   
>   @end
>   
> -static QemuCocoaPasteboardTypeOwner *cbowner;
> -
>   static void cocoa_clipboard_notify(Notifier *notifier, void *data);
>   static void cocoa_clipboard_request(QemuClipboardInfo *info,
>                                       QemuClipboardType type);
> @@ -2002,43 +2005,8 @@ static void cocoa_clipboard_request(QemuClipboardInfo *info,
>       }
>   }
>   
> -/*
> - * The startup process for the OSX/Cocoa UI is complicated, because
> - * OSX insists that the UI runs on the initial main thread, and so we
> - * need to start a second thread which runs the qemu_default_main():
> - * in main():
> - *  in cocoa_display_init():
> - *   assign cocoa_main to qemu_main
> - *   create application, menus, etc
> - *  in cocoa_main():
> - *   create qemu-main thread
> - *   enter OSX run loop
> - */
> -
> -static void *call_qemu_main(void *opaque)
> -{
> -    int status;
> -
> -    COCOA_DEBUG("Second thread: calling qemu_default_main()\n");
> -    bql_lock();
> -    status = qemu_default_main();
> -    bql_unlock();
> -    COCOA_DEBUG("Second thread: qemu_default_main() returned, exiting\n");
> -    [cbowner release];
> -    exit(status);
> -}
> -
>   static int cocoa_main(void)
>   {
> -    QemuThread thread;
> -
> -    COCOA_DEBUG("Entered %s()\n", __func__);
> -
> -    bql_unlock();
> -    qemu_thread_create(&thread, "qemu_main", call_qemu_main,
> -                       NULL, QEMU_THREAD_DETACHED);
> -
> -    // Start the main event loop
>       COCOA_DEBUG("Main thread: entering OSX run loop\n");
>       [NSApp run];
>       COCOA_DEBUG("Main thread: left OSX run loop, which should never happen\n");
> @@ -2120,8 +2088,6 @@ static void cocoa_display_init(DisplayState *ds, DisplayOptions *opts)
>   
>       COCOA_DEBUG("qemu_cocoa: cocoa_display_init\n");
>   
> -    qemu_main = cocoa_main;
> -
>       // Pull this console process up to being a fully-fledged graphical
>       // app with a menubar and Dock icon
>       ProcessSerialNumber psn = { 0, kCurrentProcess };
> @@ -2188,8 +2154,11 @@ static void cocoa_display_init(DisplayState *ds, DisplayOptions *opts)
>   }
>   
>   static QemuDisplay qemu_display_cocoa = {
> -    .type       = DISPLAY_TYPE_COCOA,
> -    .init       = cocoa_display_init,
> +    .type                   = DISPLAY_TYPE_COCOA,
> +    .init                   = cocoa_display_init,
> +    /* The Cocoa UI will run the NSApplication runloop on the main thread.*/
> +    .qemu_main_on_bg_thread = ON_OFF_AUTO_ON,
> +    .qemu_main_thread_fn    = cocoa_main,
>   };
>   
>   static void register_cocoa(void)
> diff --git a/ui/console.c b/ui/console.c
> index 5165f171257..1599d8b7095 100644
> --- a/ui/console.c
> +++ b/ui/console.c
> @@ -33,6 +33,7 @@
>   #include "qemu/main-loop.h"
>   #include "qemu/module.h"
>   #include "qemu/option.h"
> +#include "qemu-main.h"
>   #include "chardev/char.h"
>   #include "trace.h"
>   #include "exec/memory.h"
> @@ -1569,12 +1570,39 @@ void qemu_display_early_init(DisplayOptions *opts)
>   
>   void qemu_display_init(DisplayState *ds, DisplayOptions *opts)
>   {
> +    QemuDisplay *display;
> +    bool bg_main_loop;
> +
>       assert(opts->type < DISPLAY_TYPE__MAX);
>       if (opts->type == DISPLAY_TYPE_NONE) {
>           return;
>       }
> -    assert(dpys[opts->type] != NULL);
> -    dpys[opts->type]->init(ds, opts);
> +    display = dpys[opts->type];
> +    assert(display != NULL);
> +    display->init(ds, opts);
> +
> +    switch (display->qemu_main_on_bg_thread) {
> +    case ON_OFF_AUTO_OFF:
> +        bg_main_loop = false;
> +        qemu_main = NULL;
> +        break;
> +    case ON_OFF_AUTO_ON:
> +        bg_main_loop = true;
> +        break;
> +    case ON_OFF_AUTO_AUTO:
> +    default:
> +        bg_main_loop = qemu_main;
> +        break;
> +    }
> +
> +    trace_qemu_display_init_main_thread(
> +        DisplayType_str(display->type), display->qemu_main_thread_fn, qemu_main,
> +        OnOffAuto_lookup.array[display->qemu_main_on_bg_thread],
> +        display->qemu_main_on_bg_thread, bg_main_loop);
> +    if (bg_main_loop && display->qemu_main_thread_fn) {
> +        qemu_main = display->qemu_main_thread_fn;
> +    }
> +    assert(!bg_main_loop || qemu_main);
>   }
>   
>   const char *qemu_display_get_vc(DisplayOptions *opts)
> diff --git a/ui/sdl2.c b/ui/sdl2.c
> index bd4f5a9da14..35e22785119 100644
> --- a/ui/sdl2.c
> +++ b/ui/sdl2.c
> @@ -971,6 +971,8 @@ static QemuDisplay qemu_display_sdl2 = {
>       .type       = DISPLAY_TYPE_SDL,
>       .early_init = sdl2_display_early_init,
>       .init       = sdl2_display_init,
> +    /* SDL must poll for events (via dpy_refresh) on main thread */
> +    .qemu_main_on_bg_thread = ON_OFF_AUTO_OFF,
>   };
>   
>   static void register_sdl1(void)
> diff --git a/ui/trace-events b/ui/trace-events
> index 3da0d5e2800..1e72c967399 100644
> --- a/ui/trace-events
> +++ b/ui/trace-events
> @@ -16,6 +16,7 @@ displaysurface_free(void *display_surface) "surface=%p"
>   displaychangelistener_register(void *dcl, const char *name) "%p [ %s ]"
>   displaychangelistener_unregister(void *dcl, const char *name) "%p [ %s ]"
>   ppm_save(int fd, void *image) "fd=%d image=%p"
> +qemu_display_init_main_thread(const char *display_name, bool qemu_display_sets_main_fn, bool qemu_main_is_set, const char *display_bg_main_loop_preference, int preference, bool bg_main_loop) "display '%s': sets main thread function: %d, platform provides main function: %d, display background main loop preference: %s (%d); main loop will run on background thread: %d"
>   
>   # gtk-egl.c
>   # gtk-gl-area.c



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
  2024-10-24 10:27 ` [PATCH v4 01/15] ui & main loop: Redesign of system-specific main thread event handling Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-25  6:03   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 03/15] hw/display/apple-gfx: Adds PCI implementation Phil Dennis-Jordan
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

MacOS provides a framework (library) that allows any vmm to implement a
paravirtualized 3d graphics passthrough to the host metal stack called
ParavirtualizedGraphics.Framework (PVG). The library abstracts away
almost every aspect of the paravirtualized device model and only provides
and receives callbacks on MMIO access as well as to share memory address
space between the VM and PVG.

This patch implements a QEMU device that drives PVG for the VMApple
variant of it.

Signed-off-by: Alexander Graf <graf@amazon.com>
Co-authored-by: Alexander Graf <graf@amazon.com>

Subsequent changes:

 * Cherry-pick/rebase conflict fixes, API use updates.
 * Moved from hw/vmapple/ (useful outside that machine type)
 * Overhaul of threading model, many thread safety improvements.
 * Asynchronous rendering.
 * Memory and object lifetime fixes.
 * Refactoring to split generic and (vmapple) MMIO variant specific
   code.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---

v2:

 * Cherry-pick/rebase conflict fixes
 * BQL function renaming
 * Moved from hw/vmapple/ (useful outside that machine type)
 * Code review comments: Switched to DEFINE_TYPES macro & little endian
   MMIO.
 * Removed some dead/superfluous code
 * Mad set_mode thread & memory safe
 * Added migration blocker due to lack of (de-)serialisation.
 * Fixes to ObjC refcounting and autorelease pool usage.
 * Fixed ObjC new/init misuse
 * Switched to ObjC category extension for private property.
 * Simplified task memory mapping and made it thread safe.
 * Refactoring to split generic and vmapple MMIO variant specific
   code.
 * Switched to asynchronous MMIO writes on x86-64
 * Rendering and graphics update are now done asynchronously
 * Fixed cursor handling
 * Coding convention fixes
 * Removed software cursor compositing

v3:

 * Rebased on latest upstream, fixed breakages including switching to Resettable methods.
 * Squashed patches dealing with dGPUs, MMIO area size, and GPU picking.
 * Allow re-entrant MMIO; this simplifies the code and solves the divergence
   between x86-64 and arm64 variants.

v4:

 * Renamed '-vmapple' device variant to '-mmio'
 * MMIO device type now requires aarch64 host and guest
 * Complete overhaul of the glue code for making Qemu's and
   ParavirtualizedGraphics.framework's threading and synchronisation models
   work together. Calls into PVG are from dispatch queues while the
   BQL-holding initiating thread processes AIO context events; callbacks from
   PVG are scheduled as BHs on the BQL/main AIO context, awaiting completion
   where necessary.
 * Guest frame rendering state is covered by the BQL, with only the PVG calls
   outside the lock, and serialised on the named render_queue.
 * Simplified logic for dropping frames in-flight during mode changes, fixed
   bug in pending frames logic.
 * Addressed smaller code review notes such as: function naming, object type
   declarations, type names/declarations/casts, code formatting, #include
   order, over-cautious ObjC retain/release, what goes in init vs realize,
   etc.


 hw/display/Kconfig          |   9 +
 hw/display/apple-gfx-mmio.m | 284 ++++++++++++++
 hw/display/apple-gfx.h      |  58 +++
 hw/display/apple-gfx.m      | 713 ++++++++++++++++++++++++++++++++++++
 hw/display/meson.build      |   4 +
 hw/display/trace-events     |  26 ++
 meson.build                 |   4 +
 7 files changed, 1098 insertions(+)
 create mode 100644 hw/display/apple-gfx-mmio.m
 create mode 100644 hw/display/apple-gfx.h
 create mode 100644 hw/display/apple-gfx.m

diff --git a/hw/display/Kconfig b/hw/display/Kconfig
index 2250c740078..6a9b7b19ada 100644
--- a/hw/display/Kconfig
+++ b/hw/display/Kconfig
@@ -140,3 +140,12 @@ config XLNX_DISPLAYPORT
 
 config DM163
     bool
+
+config MAC_PVG
+    bool
+    default y
+
+config MAC_PVG_MMIO
+    bool
+    depends on MAC_PVG && AARCH64
+
diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
new file mode 100644
index 00000000000..06131bc23f1
--- /dev/null
+++ b/hw/display/apple-gfx-mmio.m
@@ -0,0 +1,284 @@
+/*
+ * QEMU Apple ParavirtualizedGraphics.framework device, MMIO (arm64) variant
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
+ * which implements 3d graphics passthrough to the host as well as a
+ * proprietary guest communication channel to drive it. This device model
+ * implements support to drive that library from within QEMU as an MMIO-based
+ * system device for macOS on arm64 VMs.
+ */
+
+#include "qemu/osdep.h"
+#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
+#include "apple-gfx.h"
+#include "monitor/monitor.h"
+#include "hw/sysbus.h"
+#include "hw/irq.h"
+#include "trace.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXMMIOState, APPLE_GFX_MMIO)
+
+/*
+ * ParavirtualizedGraphics.Framework only ships header files for the PCI
+ * variant which does not include IOSFC descriptors and host devices. We add
+ * their definitions here so that we can also work with the ARM version.
+ */
+typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector);
+typedef bool(^IOSFCUnmapMemory)(
+    void *, void *, void *, void *, void *, void *);
+typedef bool(^IOSFCMapMemory)(
+    uint64_t phys, uint64_t len, bool ro, void **va, void *, void *);
+
+@interface PGDeviceDescriptor (IOSurfaceMapper)
+@property (readwrite, nonatomic) bool usingIOSurfaceMapper;
+@end
+
+@interface PGIOSurfaceHostDeviceDescriptor : NSObject
+-(PGIOSurfaceHostDeviceDescriptor *)init;
+@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory mapMemory;
+@property (readwrite, nonatomic, copy, nullable) IOSFCUnmapMemory unmapMemory;
+@property (readwrite, nonatomic, copy, nullable) IOSFCRaiseInterrupt raiseInterrupt;
+@end
+
+@interface PGIOSurfaceHostDevice : NSObject
+-(instancetype)initWithDescriptor:(PGIOSurfaceHostDeviceDescriptor *)desc;
+-(uint32_t)mmioReadAtOffset:(size_t)offset;
+-(void)mmioWriteAtOffset:(size_t)offset value:(uint32_t)value;
+@end
+
+struct AppleGFXMapSurfaceMemoryJob;
+struct AppleGFXMMIOState {
+    SysBusDevice parent_obj;
+
+    AppleGFXState common;
+
+    qemu_irq irq_gfx;
+    qemu_irq irq_iosfc;
+    MemoryRegion iomem_iosfc;
+    PGIOSurfaceHostDevice *pgiosfc;
+};
+
+typedef struct AppleGFXMMIOJob {
+    AppleGFXMMIOState *state;
+    uint64_t offset;
+    uint64_t value;
+    bool completed;
+} AppleGFXMMIOJob;
+
+static void iosfc_do_read(void *opaque)
+{
+    AppleGFXMMIOJob *job = opaque;
+    job->value = [job->state->pgiosfc mmioReadAtOffset:job->offset];
+    qatomic_set(&job->completed, true);
+    aio_wait_kick();
+}
+
+static uint64_t iosfc_read(void *opaque, hwaddr offset, unsigned size)
+{
+    AppleGFXMMIOJob job = {
+        .state = opaque,
+        .offset = offset,
+        .completed = false,
+    };
+    AioContext *context = qemu_get_aio_context();
+    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
+
+    dispatch_async_f(queue, &job, iosfc_do_read);
+    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
+
+    trace_apple_gfx_mmio_iosfc_read(offset, job.value);
+    return job.value;
+}
+
+static void iosfc_do_write(void *opaque)
+{
+    AppleGFXMMIOJob *job = opaque;
+    [job->state->pgiosfc mmioWriteAtOffset:job->offset value:job->value];
+    qatomic_set(&job->completed, true);
+    aio_wait_kick();
+}
+
+static void iosfc_write(void *opaque, hwaddr offset, uint64_t val,
+                        unsigned size)
+{
+    AppleGFXMMIOJob job = {
+        .state = opaque,
+        .offset = offset,
+        .value = val,
+        .completed = false,
+    };
+    AioContext *context = qemu_get_aio_context();
+    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
+
+    dispatch_async_f(queue, &job, iosfc_do_write);
+    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
+
+    trace_apple_gfx_mmio_iosfc_write(offset, val);
+}
+
+static const MemoryRegionOps apple_iosfc_ops = {
+    .read = iosfc_read,
+    .write = iosfc_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+};
+
+static void raise_iosfc_irq(void *opaque)
+{
+    AppleGFXMMIOState *s = opaque;
+
+    qemu_irq_pulse(s->irq_iosfc);
+}
+
+typedef struct AppleGFXMapSurfaceMemoryJob {
+    uint64_t guest_physical_address;
+    uint64_t guest_physical_length;
+    void *result_mem;
+    AppleGFXMMIOState *state;
+    bool read_only;
+    bool success;
+    bool done;
+} AppleGFXMapSurfaceMemoryJob;
+
+static void apple_gfx_mmio_map_surface_memory(void *opaque)
+{
+    AppleGFXMapSurfaceMemoryJob *job = opaque;
+    AppleGFXMMIOState *s = job->state;
+    mach_vm_address_t mem;
+
+    mem = apple_gfx_host_address_for_gpa_range(job->guest_physical_address,
+                                               job->guest_physical_length,
+                                               job->read_only);
+
+    qemu_mutex_lock(&s->common.job_mutex);
+    job->result_mem = (void*)mem;
+    job->success = mem != 0;
+    job->done = true;
+    qemu_cond_broadcast(&s->common.job_cond);
+    qemu_mutex_unlock(&s->common.job_mutex);
+}
+
+static PGIOSurfaceHostDevice *apple_gfx_prepare_iosurface_host_device(
+    AppleGFXMMIOState *s)
+{
+    PGIOSurfaceHostDeviceDescriptor *iosfc_desc =
+        [PGIOSurfaceHostDeviceDescriptor new];
+    PGIOSurfaceHostDevice *iosfc_host_dev = nil;
+
+    iosfc_desc.mapMemory =
+        ^bool(uint64_t phys, uint64_t len, bool ro, void **va, void *e, void *f) {
+            AppleGFXMapSurfaceMemoryJob job = {
+                .guest_physical_address = phys, .guest_physical_length = len,
+                .read_only = ro, .state = s,
+            };
+
+            aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                                    apple_gfx_mmio_map_surface_memory, &job);
+            apple_gfx_await_bh_job(&s->common, &job.done);
+
+            *va = job.result_mem;
+
+            trace_apple_gfx_iosfc_map_memory(phys, len, ro, va, e, f, *va,
+                                             job.success);
+
+            return job.success;
+        };
+
+    iosfc_desc.unmapMemory =
+        ^bool(void *a, void *b, void *c, void *d, void *e, void *f) {
+            trace_apple_gfx_iosfc_unmap_memory(a, b, c, d, e, f);
+            return true;
+        };
+
+    iosfc_desc.raiseInterrupt = ^bool(uint32_t vector) {
+        trace_apple_gfx_iosfc_raise_irq(vector);
+        aio_bh_schedule_oneshot(qemu_get_aio_context(), raise_iosfc_irq, s);
+        return true;
+    };
+
+    iosfc_host_dev =
+        [[PGIOSurfaceHostDevice alloc] initWithDescriptor:iosfc_desc];
+    [iosfc_desc release];
+    return iosfc_host_dev;
+}
+
+static void raise_gfx_irq(void *opaque)
+{
+    AppleGFXMMIOState *s = opaque;
+
+    qemu_irq_pulse(s->irq_gfx);
+}
+
+static void apple_gfx_mmio_realize(DeviceState *dev, Error **errp)
+{
+    @autoreleasepool {
+        AppleGFXMMIOState *s = APPLE_GFX_MMIO(dev);
+        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
+
+        desc.raiseInterrupt = ^(uint32_t vector) {
+            trace_apple_gfx_raise_irq(vector);
+            aio_bh_schedule_oneshot(qemu_get_aio_context(), raise_gfx_irq, s);
+        };
+
+        desc.usingIOSurfaceMapper = true;
+        s->pgiosfc = apple_gfx_prepare_iosurface_host_device(s);
+
+        apple_gfx_common_realize(&s->common, desc, errp);
+        [desc release];
+        desc = nil;
+    }
+}
+
+static void apple_gfx_mmio_init(Object *obj)
+{
+    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
+
+    apple_gfx_common_init(obj, &s->common, TYPE_APPLE_GFX_MMIO);
+
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->common.iomem_gfx);
+    memory_region_init_io(&s->iomem_iosfc, obj, &apple_iosfc_ops, s,
+                          TYPE_APPLE_GFX_MMIO, 0x10000);
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem_iosfc);
+    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_gfx);
+    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_iosfc);
+}
+
+static void apple_gfx_mmio_reset(Object *obj, ResetType type)
+{
+    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
+    [s->common.pgdev reset];
+}
+
+
+static void apple_gfx_mmio_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
+
+    rc->phases.hold = apple_gfx_mmio_reset;
+    dc->hotpluggable = false;
+    dc->realize = apple_gfx_mmio_realize;
+}
+
+static TypeInfo apple_gfx_mmio_types[] = {
+    {
+        .name          = TYPE_APPLE_GFX_MMIO,
+        .parent        = TYPE_SYS_BUS_DEVICE,
+        .instance_size = sizeof(AppleGFXMMIOState),
+        .class_init    = apple_gfx_mmio_class_init,
+        .instance_init = apple_gfx_mmio_init,
+    }
+};
+DEFINE_TYPES(apple_gfx_mmio_types)
diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
new file mode 100644
index 00000000000..39931fba65a
--- /dev/null
+++ b/hw/display/apple-gfx.h
@@ -0,0 +1,58 @@
+#ifndef QEMU_APPLE_GFX_H
+#define QEMU_APPLE_GFX_H
+
+#define TYPE_APPLE_GFX_MMIO         "apple-gfx-mmio"
+#define TYPE_APPLE_GFX_PCI          "apple-gfx-pci"
+
+#include "qemu/osdep.h"
+#include <dispatch/dispatch.h>
+#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
+#include "qemu/typedefs.h"
+#include "exec/memory.h"
+#include "ui/surface.h"
+
+@class PGDeviceDescriptor;
+@protocol PGDevice;
+@protocol PGDisplay;
+@protocol MTLDevice;
+@protocol MTLTexture;
+@protocol MTLCommandQueue;
+
+typedef QTAILQ_HEAD(, PGTask_s) PGTaskList;
+
+struct AppleGFXMapMemoryJob;
+typedef struct AppleGFXState {
+    MemoryRegion iomem_gfx;
+    id<PGDevice> pgdev;
+    id<PGDisplay> pgdisp;
+    PGTaskList tasks;
+    QemuConsole *con;
+    id<MTLDevice> mtl;
+    id<MTLCommandQueue> mtl_queue;
+    bool cursor_show;
+    QEMUCursor *cursor;
+
+    /* For running PVG memory-mapping requests in the AIO context */
+    QemuCond job_cond;
+    QemuMutex job_mutex;
+
+    dispatch_queue_t render_queue;
+    /* The following fields should only be accessed from the BQL: */
+    bool gfx_update_requested;
+    bool new_frame_ready;
+    bool using_managed_texture_storage;
+    int32_t pending_frames;
+    void *vram;
+    DisplaySurface *surface;
+    id<MTLTexture> texture;
+} AppleGFXState;
+
+void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name);
+void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
+                              Error **errp);
+uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
+                                               uint64_t length, bool read_only);
+void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag);
+
+#endif
+
diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
new file mode 100644
index 00000000000..46be9957f69
--- /dev/null
+++ b/hw/display/apple-gfx.m
@@ -0,0 +1,713 @@
+/*
+ * QEMU Apple ParavirtualizedGraphics.framework device
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
+ * which implements 3d graphics passthrough to the host as well as a
+ * proprietary guest communication channel to drive it. This device model
+ * implements support to drive that library from within QEMU.
+ */
+
+#include "qemu/osdep.h"
+#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
+#include <mach/mach_vm.h>
+#include "apple-gfx.h"
+#include "trace.h"
+#include "qemu-main.h"
+#include "exec/address-spaces.h"
+#include "migration/blocker.h"
+#include "monitor/monitor.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+#include "qapi/visitor.h"
+#include "qapi/error.h"
+#include "ui/console.h"
+
+static const PGDisplayCoord_t apple_gfx_modes[] = {
+    { .x = 1440, .y = 1080 },
+    { .x = 1280, .y = 1024 },
+};
+
+/* This implements a type defined in <ParavirtualizedGraphics/PGDevice.h>
+ * which is opaque from the framework's point of view. Typedef PGTask_t already
+ * exists in the framework headers. */
+struct PGTask_s {
+    QTAILQ_ENTRY(PGTask_s) node;
+    mach_vm_address_t address;
+    uint64_t len;
+};
+
+static Error *apple_gfx_mig_blocker;
+
+static void apple_gfx_render_frame_completed(AppleGFXState *s,
+                                             uint32_t width, uint32_t height);
+
+static inline dispatch_queue_t get_background_queue(void)
+{
+    return dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
+}
+
+static PGTask_t *apple_gfx_new_task(AppleGFXState *s, uint64_t len)
+{
+    mach_vm_address_t task_mem;
+    PGTask_t *task;
+    kern_return_t r;
+
+    r = mach_vm_allocate(mach_task_self(), &task_mem, len, VM_FLAGS_ANYWHERE);
+    if (r != KERN_SUCCESS || task_mem == 0) {
+        return NULL;
+    }
+
+    task = g_new0(PGTask_t, 1);
+
+    task->address = task_mem;
+    task->len = len;
+    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
+
+    return task;
+}
+
+typedef struct AppleGFXIOJob {
+    AppleGFXState *state;
+    uint64_t offset;
+    uint64_t value;
+    bool completed;
+} AppleGFXIOJob;
+
+static void apple_gfx_do_read(void *opaque)
+{
+    AppleGFXIOJob *job = opaque;
+    job->value = [job->state->pgdev mmioReadAtOffset:job->offset];
+    qatomic_set(&job->completed, true);
+    aio_wait_kick();
+}
+
+static uint64_t apple_gfx_read(void *opaque, hwaddr offset, unsigned size)
+{
+    AppleGFXIOJob job = {
+        .state = opaque,
+        .offset = offset,
+        .completed = false,
+    };
+    AioContext *context = qemu_get_aio_context();
+    dispatch_queue_t queue = get_background_queue();
+
+    dispatch_async_f(queue, &job, apple_gfx_do_read);
+    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
+
+    trace_apple_gfx_read(offset, job.value);
+    return job.value;
+}
+
+static void apple_gfx_do_write(void *opaque)
+{
+    AppleGFXIOJob *job = opaque;
+    [job->state->pgdev mmioWriteAtOffset:job->offset value:job->value];
+    qatomic_set(&job->completed, true);
+    aio_wait_kick();
+}
+
+static void apple_gfx_write(void *opaque, hwaddr offset, uint64_t val,
+                            unsigned size)
+{
+    /* The methods mmioReadAtOffset: and especially mmioWriteAtOffset: can
+     * trigger and block on operations on other dispatch queues, which in turn
+     * may call back out on one or more of the callback blocks. For this reason,
+     * and as we are holding the BQL, we invoke the I/O methods on a pool
+     * thread and handle AIO tasks while we wait. Any work in the callbacks
+     * requiring the BQL will in turn schedule BHs which this thread will
+     * process while waiting. */
+    AppleGFXIOJob job = {
+        .state = opaque,
+        .offset = offset,
+        .value = val,
+        .completed = false,
+    };
+    AioContext *context = qemu_get_current_aio_context();
+    dispatch_queue_t queue = get_background_queue();
+
+    dispatch_async_f(queue, &job, apple_gfx_do_write);
+    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
+
+    trace_apple_gfx_write(offset, val);
+}
+
+static const MemoryRegionOps apple_gfx_ops = {
+    .read = apple_gfx_read,
+    .write = apple_gfx_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void apple_gfx_render_new_frame_bql_unlock(AppleGFXState *s)
+{
+    BOOL r;
+    uint32_t width = surface_width(s->surface);
+    uint32_t height = surface_height(s->surface);
+    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
+    id<MTLCommandBuffer> command_buffer = [s->mtl_queue commandBuffer];
+    id<MTLTexture> texture = s->texture;
+
+    assert(bql_locked());
+    [texture retain];
+
+    bql_unlock();
+
+    /* This is not safe to call from the BQL due to PVG-internal locks causing
+     * deadlocks. */
+    r = [s->pgdisp encodeCurrentFrameToCommandBuffer:command_buffer
+                                             texture:texture
+                                              region:region];
+    if (!r) {
+        [texture release];
+        bql_lock();
+        --s->pending_frames;
+        bql_unlock();
+        qemu_log_mask(LOG_GUEST_ERROR, "apple_gfx_render_new_frame_bql_unlock: "
+                      "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
+        return;
+    }
+
+    if (s->using_managed_texture_storage) {
+        /* "Managed" textures exist in both VRAM and RAM and must be synced. */
+        id<MTLBlitCommandEncoder> blit = [command_buffer blitCommandEncoder];
+        [blit synchronizeResource:texture];
+        [blit endEncoding];
+    }
+    [texture release];
+    [command_buffer addCompletedHandler:
+        ^(id<MTLCommandBuffer> cb)
+        {
+            dispatch_async(s->render_queue, ^{
+                apple_gfx_render_frame_completed(s, width, height);
+            });
+        }];
+    [command_buffer commit];
+}
+
+static void copy_mtl_texture_to_surface_mem(id<MTLTexture> texture, void *vram)
+{
+    /* TODO: Skip this entirely on a pure Metal or headless/guest-only
+     * rendering path, else use a blit command encoder? Needs careful
+     * (double?) buffering design. */
+    size_t width = texture.width, height = texture.height;
+    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
+    [texture getBytes:vram
+          bytesPerRow:(width * 4)
+        bytesPerImage:(width * height * 4)
+           fromRegion:region
+          mipmapLevel:0
+                slice:0];
+}
+
+static void apple_gfx_render_frame_completed(AppleGFXState *s,
+                                             uint32_t width, uint32_t height)
+{
+    bql_lock();
+    --s->pending_frames;
+    assert(s->pending_frames >= 0);
+
+    /* Only update display if mode hasn't changed since we started rendering. */
+    if (width == surface_width(s->surface) &&
+        height == surface_height(s->surface)) {
+        copy_mtl_texture_to_surface_mem(s->texture, s->vram);
+        if (s->gfx_update_requested) {
+            s->gfx_update_requested = false;
+            dpy_gfx_update_full(s->con);
+            graphic_hw_update_done(s->con);
+            s->new_frame_ready = false;
+        } else {
+            s->new_frame_ready = true;
+        }
+    }
+    if (s->pending_frames > 0) {
+        apple_gfx_render_new_frame_bql_unlock(s);
+    } else {
+        bql_unlock();
+    }
+}
+
+static void apple_gfx_fb_update_display(void *opaque)
+{
+    AppleGFXState *s = opaque;
+
+    assert(bql_locked());
+    if (s->new_frame_ready) {
+        dpy_gfx_update_full(s->con);
+        s->new_frame_ready = false;
+        graphic_hw_update_done(s->con);
+    } else if (s->pending_frames > 0) {
+        s->gfx_update_requested = true;
+    } else {
+        graphic_hw_update_done(s->con);
+    }
+}
+
+static const GraphicHwOps apple_gfx_fb_ops = {
+    .gfx_update = apple_gfx_fb_update_display,
+    .gfx_update_async = true,
+};
+
+static void update_cursor(AppleGFXState *s)
+{
+    assert(bql_locked());
+    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
+                  s->pgdisp.cursorPosition.y, s->cursor_show);
+}
+
+static void set_mode(AppleGFXState *s, uint32_t width, uint32_t height)
+{
+    MTLTextureDescriptor *textureDescriptor;
+
+    if (s->surface &&
+        width == surface_width(s->surface) &&
+        height == surface_height(s->surface)) {
+        return;
+    }
+
+    g_free(s->vram);
+    [s->texture release];
+
+    s->vram = g_malloc0_n(width * height, 4);
+    s->surface = qemu_create_displaysurface_from(width, height, PIXMAN_LE_a8r8g8b8,
+                                                 width * 4, s->vram);
+
+    @autoreleasepool {
+        textureDescriptor =
+            [MTLTextureDescriptor
+                texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
+                                             width:width
+                                            height:height
+                                         mipmapped:NO];
+        textureDescriptor.usage = s->pgdisp.minimumTextureUsage;
+        s->texture = [s->mtl newTextureWithDescriptor:textureDescriptor];
+    }
+
+    s->using_managed_texture_storage =
+        (s->texture.storageMode == MTLStorageModeManaged);
+    dpy_gfx_replace_surface(s->con, s->surface);
+}
+
+static void create_fb(AppleGFXState *s)
+{
+    s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
+    set_mode(s, 1440, 1080);
+
+    s->cursor_show = true;
+}
+
+static size_t apple_gfx_get_default_mmio_range_size(void)
+{
+    size_t mmio_range_size;
+    @autoreleasepool {
+        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
+        mmio_range_size = desc.mmioLength;
+        [desc release];
+    }
+    return mmio_range_size;
+}
+
+void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name)
+{
+    size_t mmio_range_size = apple_gfx_get_default_mmio_range_size();
+
+    trace_apple_gfx_common_init(obj_name, mmio_range_size);
+    memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s, obj_name,
+                          mmio_range_size);
+
+    /* TODO: PVG framework supports serialising device state: integrate it! */
+}
+
+typedef struct AppleGFXMapMemoryJob {
+    AppleGFXState *state;
+    PGTask_t *task;
+    uint64_t virtual_offset;
+    PGPhysicalMemoryRange_t *ranges;
+    uint32_t range_count;
+    bool read_only;
+    bool success;
+    bool done;
+} AppleGFXMapMemoryJob;
+
+uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
+                                               uint64_t length, bool read_only)
+{
+    MemoryRegion *ram_region;
+    uintptr_t host_address;
+    hwaddr ram_region_offset = 0;
+    hwaddr ram_region_length = length;
+
+    ram_region = address_space_translate(&address_space_memory,
+                                         guest_physical,
+                                         &ram_region_offset,
+                                         &ram_region_length, !read_only,
+                                         MEMTXATTRS_UNSPECIFIED);
+
+    if (!ram_region || ram_region_length < length ||
+        !memory_access_is_direct(ram_region, !read_only)) {
+        return 0;
+    }
+
+    host_address = (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);
+    if (host_address == 0) {
+        return 0;
+    }
+    host_address += ram_region_offset;
+
+    return host_address;
+}
+
+static void apple_gfx_map_memory(void *opaque)
+{
+    AppleGFXMapMemoryJob *job = opaque;
+    AppleGFXState *s = job->state;
+    PGTask_t *task                  = job->task;
+    uint32_t range_count            = job->range_count;
+    uint64_t virtual_offset         = job->virtual_offset;
+    PGPhysicalMemoryRange_t *ranges = job->ranges;
+    bool read_only                  = job->read_only;
+    kern_return_t r;
+    mach_vm_address_t target, source;
+    vm_prot_t cur_protection, max_protection;
+    bool success = true;
+
+    g_assert(bql_locked());
+
+    trace_apple_gfx_map_memory(task, range_count, virtual_offset, read_only);
+    for (int i = 0; i < range_count; i++) {
+        PGPhysicalMemoryRange_t *range = &ranges[i];
+
+        target = task->address + virtual_offset;
+        virtual_offset += range->physicalLength;
+
+        trace_apple_gfx_map_memory_range(i, range->physicalAddress,
+                                         range->physicalLength);
+
+        source = apple_gfx_host_address_for_gpa_range(range->physicalAddress,
+                                                      range->physicalLength,
+                                                      read_only);
+        if (source == 0) {
+            success = false;
+            continue;
+        }
+
+        MemoryRegion* alt_mr = NULL;
+        mach_vm_address_t alt_source = (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress, range->physicalLength, NULL);
+        g_assert(alt_source == source);
+
+        cur_protection = 0;
+        max_protection = 0;
+        // Map guest RAM at range->physicalAddress into PG task memory range
+        r = mach_vm_remap(mach_task_self(),
+                          &target, range->physicalLength, vm_page_size - 1,
+                          VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
+                          mach_task_self(),
+                          source, false /* shared mapping, no copy */,
+                          &cur_protection, &max_protection,
+                          VM_INHERIT_COPY);
+        trace_apple_gfx_remap(r, source, target);
+        g_assert(r == KERN_SUCCESS);
+    }
+
+    qemu_mutex_lock(&s->job_mutex);
+    job->success = success;
+    job->done = true;
+    qemu_cond_broadcast(&s->job_cond);
+    qemu_mutex_unlock(&s->job_mutex);
+}
+
+void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag)
+{
+    qemu_mutex_lock(&s->job_mutex);
+    while (!*job_done_flag) {
+        qemu_cond_wait(&s->job_cond, &s->job_mutex);
+    }
+    qemu_mutex_unlock(&s->job_mutex);
+}
+
+typedef struct AppleGFXReadMemoryJob {
+    AppleGFXState *s;
+    hwaddr physical_address;
+    uint64_t length;
+    void *dst;
+    bool done;
+} AppleGFXReadMemoryJob;
+
+static void apple_gfx_do_read_memory(void *opaque)
+{
+    AppleGFXReadMemoryJob *job = opaque;
+    AppleGFXState *s = job->s;
+
+    cpu_physical_memory_read(job->physical_address, job->dst, job->length);
+
+    qemu_mutex_lock(&s->job_mutex);
+    job->done = true;
+    qemu_cond_broadcast(&s->job_cond);
+    qemu_mutex_unlock(&s->job_mutex);
+}
+
+static void apple_gfx_read_memory(AppleGFXState *s, hwaddr physical_address,
+                                  uint64_t length, void *dst)
+{
+    AppleGFXReadMemoryJob job = {
+        s, physical_address, length, dst
+    };
+
+    trace_apple_gfx_read_memory(physical_address, length, dst);
+
+    /* Traversing the memory map requires RCU/BQL, so do it in a BH. */
+    aio_bh_schedule_oneshot(qemu_get_aio_context(), apple_gfx_do_read_memory,
+                            &job);
+    apple_gfx_await_bh_job(s, &job.done);
+}
+
+static void apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
+                                                     PGDeviceDescriptor *desc)
+{
+    desc.createTask = ^(uint64_t vmSize, void * _Nullable * _Nonnull baseAddress) {
+        PGTask_t *task = apple_gfx_new_task(s, vmSize);
+        *baseAddress = (void *)task->address;
+        trace_apple_gfx_create_task(vmSize, *baseAddress);
+        return task;
+    };
+
+    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
+        trace_apple_gfx_destroy_task(task);
+        QTAILQ_REMOVE(&s->tasks, task, node);
+        mach_vm_deallocate(mach_task_self(), task->address, task->len);
+        g_free(task);
+    };
+
+    desc.mapMemory = ^bool(PGTask_t * _Nonnull task, uint32_t range_count,
+                       uint64_t virtual_offset, bool read_only,
+                       PGPhysicalMemoryRange_t * _Nonnull ranges) {
+        AppleGFXMapMemoryJob job = {
+            .state = s,
+            .task = task, .ranges = ranges, .range_count = range_count,
+            .read_only = read_only, .virtual_offset = virtual_offset,
+            .done = false, .success = true,
+        };
+        if (range_count > 0) {
+            aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                                    apple_gfx_map_memory, &job);
+            apple_gfx_await_bh_job(s, &job.done);
+        }
+        return job.success;
+    };
+
+    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task, uint64_t virtualOffset,
+                         uint64_t length) {
+        kern_return_t r;
+        mach_vm_address_t range_address;
+
+        trace_apple_gfx_unmap_memory(task, virtualOffset, length);
+
+        /* Replace task memory range with fresh pages, undoing the mapping
+         * from guest RAM. */
+        range_address = task->address + virtualOffset;
+        r = mach_vm_allocate(mach_task_self(), &range_address, length,
+                             VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE);
+        g_assert(r == KERN_SUCCESS);
+
+        return true;
+    };
+
+    desc.readMemory = ^bool(uint64_t physical_address, uint64_t length,
+                            void * _Nonnull dst) {
+        apple_gfx_read_memory(s, physical_address, length, dst);
+        return true;
+    };
+}
+
+static PGDisplayDescriptor *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
+{
+    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor new];
+
+    disp_desc.name = @"QEMU display";
+    disp_desc.sizeInMillimeters = NSMakeSize(400., 300.); /* A 20" display */
+    disp_desc.queue = dispatch_get_main_queue();
+    disp_desc.newFrameEventHandler = ^(void) {
+        trace_apple_gfx_new_frame();
+        dispatch_async(s->render_queue, ^{
+            /* Drop frames if we get too far ahead. */
+            bql_lock();
+            if (s->pending_frames >= 2) {
+                bql_unlock();
+                return;
+            }
+            ++s->pending_frames;
+            if (s->pending_frames > 1) {
+                bql_unlock();
+                return;
+            }
+            @autoreleasepool {
+                apple_gfx_render_new_frame_bql_unlock(s);
+            }
+        });
+    };
+    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t sizeInPixels,
+                                    OSType pixelFormat) {
+        trace_apple_gfx_mode_change(sizeInPixels.x, sizeInPixels.y);
+
+        BQL_LOCK_GUARD();
+        set_mode(s, sizeInPixels.x, sizeInPixels.y);
+    };
+    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
+                                     PGDisplayCoord_t hotSpot) {
+        [glyph retain];
+        dispatch_async(get_background_queue(), ^{
+            BQL_LOCK_GUARD();
+            uint32_t bpp = glyph.bitsPerPixel;
+            size_t width = glyph.pixelsWide;
+            size_t height = glyph.pixelsHigh;
+            size_t padding_bytes_per_row = glyph.bytesPerRow - width * 4;
+            const uint8_t* px_data = glyph.bitmapData;
+
+            trace_apple_gfx_cursor_set(bpp, width, height);
+
+            if (s->cursor) {
+                cursor_unref(s->cursor);
+                s->cursor = NULL;
+            }
+
+            if (bpp == 32) { /* Shouldn't be anything else, but just to be safe...*/
+                s->cursor = cursor_alloc(width, height);
+                s->cursor->hot_x = hotSpot.x;
+                s->cursor->hot_y = hotSpot.y;
+
+                uint32_t *dest_px = s->cursor->data;
+
+                for (size_t y = 0; y < height; ++y) {
+                    for (size_t x = 0; x < width; ++x) {
+                        /* NSBitmapImageRep's red & blue channels are swapped
+                         * compared to QEMUCursor's. */
+                        *dest_px =
+                            (px_data[0] << 16u) |
+                            (px_data[1] <<  8u) |
+                            (px_data[2] <<  0u) |
+                            (px_data[3] << 24u);
+                        ++dest_px;
+                        px_data += 4;
+                    }
+                    px_data += padding_bytes_per_row;
+                }
+                dpy_cursor_define(s->con, s->cursor);
+                update_cursor(s);
+            }
+            [glyph release];
+        });
+    };
+    disp_desc.cursorShowHandler = ^(BOOL show) {
+        dispatch_async(get_background_queue(), ^{
+            BQL_LOCK_GUARD();
+            trace_apple_gfx_cursor_show(show);
+            s->cursor_show = show;
+            update_cursor(s);
+        });
+    };
+    disp_desc.cursorMoveHandler = ^(void) {
+        dispatch_async(get_background_queue(), ^{
+            BQL_LOCK_GUARD();
+            trace_apple_gfx_cursor_move();
+            update_cursor(s);
+        });
+    };
+
+    return disp_desc;
+}
+
+static NSArray<PGDisplayMode*>* apple_gfx_prepare_display_mode_array(void)
+{
+    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
+    NSArray<PGDisplayMode*>* mode_array = nil;
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
+        modes[i] =
+            [[PGDisplayMode alloc] initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
+    }
+
+    mode_array = [NSArray arrayWithObjects:modes count:ARRAY_SIZE(apple_gfx_modes)];
+
+    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
+        [modes[i] release];
+        modes[i] = nil;
+    }
+
+    return mode_array;
+}
+
+static id<MTLDevice> copy_suitable_metal_device(void)
+{
+    id<MTLDevice> dev = nil;
+    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
+
+    /* Prefer a unified memory GPU. Failing that, pick a non-removable GPU. */
+    for (size_t i = 0; i < devs.count; ++i) {
+        if (devs[i].hasUnifiedMemory) {
+            dev = devs[i];
+            break;
+        }
+        if (!devs[i].removable) {
+            dev = devs[i];
+        }
+    }
+
+    if (dev != nil) {
+        [dev retain];
+    } else {
+        dev = MTLCreateSystemDefaultDevice();
+    }
+    [devs release];
+
+    return dev;
+}
+
+void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
+                              Error **errp)
+{
+    PGDisplayDescriptor *disp_desc = nil;
+
+    if (apple_gfx_mig_blocker == NULL) {
+        error_setg(&apple_gfx_mig_blocker,
+                  "Migration state blocked by apple-gfx display device");
+        if (migrate_add_blocker(&apple_gfx_mig_blocker, errp) < 0) {
+            return;
+        }
+    }
+
+    QTAILQ_INIT(&s->tasks);
+    s->render_queue = dispatch_queue_create("apple-gfx.render",
+                                            DISPATCH_QUEUE_SERIAL);
+    s->mtl = copy_suitable_metal_device();
+    s->mtl_queue = [s->mtl newCommandQueue];
+
+    desc.device = s->mtl;
+
+    apple_gfx_register_task_mapping_handlers(s, desc);
+
+    s->pgdev = PGNewDeviceWithDescriptor(desc);
+
+    disp_desc = apple_gfx_prepare_display_descriptor(s);
+    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
+                                              port:0 serialNum:1234];
+    [disp_desc release];
+    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
+
+    create_fb(s);
+
+    qemu_mutex_init(&s->job_mutex);
+    qemu_cond_init(&s->job_cond);
+}
diff --git a/hw/display/meson.build b/hw/display/meson.build
index 20a94973fa2..619e642905a 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true: files('artist.c'))
 
 system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c', 'ati_2d.c', 'ati_dbg.c'), pixman])
 
+system_ss.add(when: 'CONFIG_MAC_PVG',         if_true: [files('apple-gfx.m'), pvg, metal])
+if cpu == 'aarch64'
+  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true: [files('apple-gfx-mmio.m'), pvg, metal])
+endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
   virtio_gpu_ss = ss.source_set()
diff --git a/hw/display/trace-events b/hw/display/trace-events
index 781f8a33203..214998312b9 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width) "dest_width : %u"
 dm163_leds(int led, uint32_t value) "led %d: 0x%x"
 dm163_channels(int channel, uint8_t value) "channel %d: 0x%x"
 dm163_refresh_rate(uint32_t rr) "refresh rate %d"
+
+# apple-gfx.m
+apple_gfx_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
+apple_gfx_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
+apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=0x%x base_addr=%p"
+apple_gfx_destroy_task(void *task) "task=%p"
+apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t virtual_offset, uint32_t read_only) "task=%p range_count=0x%x virtual_offset=0x%"PRIx64" read_only=%d"
+apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr, uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64" phys_len=0x%"PRIx64
+apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
+apple_gfx_unmap_memory(void *task, uint64_t virtual_offset, uint64_t length) "task=%p virtual_offset=0x%"PRIx64" length=0x%"PRIx64
+apple_gfx_read_memory(uint64_t phys_address, uint64_t length, void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
+apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
+apple_gfx_new_frame(void) ""
+apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64" y=%"PRId64
+apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t height) "bpp=%d width=%"PRId64" height=0x%"PRId64
+apple_gfx_cursor_show(uint32_t show) "show=%d"
+apple_gfx_cursor_move(void) ""
+apple_gfx_common_init(const char *device_name, size_t mmio_size) "device: %s; MMIO size: %zu bytes"
+
+# apple-gfx-mmio.m
+apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
+apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
+apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t ro, void *va, void *e, void *f, void* va_result, int success) "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d va=%p e=%p f=%p -> *va=%p, success = %d"
+apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c, void *d, void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
+apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
+
diff --git a/meson.build b/meson.build
index d26690ce204..0e124eff13f 100644
--- a/meson.build
+++ b/meson.build
@@ -761,6 +761,8 @@ socket = []
 version_res = []
 coref = []
 iokit = []
+pvg = []
+metal = []
 emulator_link_args = []
 midl = not_found
 widl = not_found
@@ -782,6 +784,8 @@ elif host_os == 'darwin'
   coref = dependency('appleframeworks', modules: 'CoreFoundation')
   iokit = dependency('appleframeworks', modules: 'IOKit', required: false)
   host_dsosuf = '.dylib'
+  pvg = dependency('appleframeworks', modules: 'ParavirtualizedGraphics')
+  metal = dependency('appleframeworks', modules: 'Metal')
 elif host_os == 'sunos'
   socket = [cc.find_library('socket'),
             cc.find_library('nsl'),
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-24 10:28 ` [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support Phil Dennis-Jordan
@ 2024-10-25  6:03   ` Akihiko Odaki
  2024-10-25 19:43     ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-25  6:03 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> MacOS provides a framework (library) that allows any vmm to implement a
> paravirtualized 3d graphics passthrough to the host metal stack called
> ParavirtualizedGraphics.Framework (PVG). The library abstracts away
> almost every aspect of the paravirtualized device model and only provides
> and receives callbacks on MMIO access as well as to share memory address
> space between the VM and PVG.
> 
> This patch implements a QEMU device that drives PVG for the VMApple
> variant of it.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Co-authored-by: Alexander Graf <graf@amazon.com>
> 
> Subsequent changes:
> 
>   * Cherry-pick/rebase conflict fixes, API use updates.
>   * Moved from hw/vmapple/ (useful outside that machine type)
>   * Overhaul of threading model, many thread safety improvements.
>   * Asynchronous rendering.
>   * Memory and object lifetime fixes.
>   * Refactoring to split generic and (vmapple) MMIO variant specific
>     code.
> 
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
> 
> v2:
> 
>   * Cherry-pick/rebase conflict fixes
>   * BQL function renaming
>   * Moved from hw/vmapple/ (useful outside that machine type)
>   * Code review comments: Switched to DEFINE_TYPES macro & little endian
>     MMIO.
>   * Removed some dead/superfluous code
>   * Mad set_mode thread & memory safe
>   * Added migration blocker due to lack of (de-)serialisation.
>   * Fixes to ObjC refcounting and autorelease pool usage.
>   * Fixed ObjC new/init misuse
>   * Switched to ObjC category extension for private property.
>   * Simplified task memory mapping and made it thread safe.
>   * Refactoring to split generic and vmapple MMIO variant specific
>     code.
>   * Switched to asynchronous MMIO writes on x86-64
>   * Rendering and graphics update are now done asynchronously
>   * Fixed cursor handling
>   * Coding convention fixes
>   * Removed software cursor compositing
> 
> v3:
> 
>   * Rebased on latest upstream, fixed breakages including switching to Resettable methods.
>   * Squashed patches dealing with dGPUs, MMIO area size, and GPU picking.
>   * Allow re-entrant MMIO; this simplifies the code and solves the divergence
>     between x86-64 and arm64 variants.
> 
> v4:
> 
>   * Renamed '-vmapple' device variant to '-mmio'
>   * MMIO device type now requires aarch64 host and guest
>   * Complete overhaul of the glue code for making Qemu's and
>     ParavirtualizedGraphics.framework's threading and synchronisation models
>     work together. Calls into PVG are from dispatch queues while the
>     BQL-holding initiating thread processes AIO context events; callbacks from
>     PVG are scheduled as BHs on the BQL/main AIO context, awaiting completion
>     where necessary.
>   * Guest frame rendering state is covered by the BQL, with only the PVG calls
>     outside the lock, and serialised on the named render_queue.
>   * Simplified logic for dropping frames in-flight during mode changes, fixed
>     bug in pending frames logic.
>   * Addressed smaller code review notes such as: function naming, object type
>     declarations, type names/declarations/casts, code formatting, #include
>     order, over-cautious ObjC retain/release, what goes in init vs realize,
>     etc.
> 
> 
>   hw/display/Kconfig          |   9 +
>   hw/display/apple-gfx-mmio.m | 284 ++++++++++++++
>   hw/display/apple-gfx.h      |  58 +++
>   hw/display/apple-gfx.m      | 713 ++++++++++++++++++++++++++++++++++++
>   hw/display/meson.build      |   4 +
>   hw/display/trace-events     |  26 ++
>   meson.build                 |   4 +
>   7 files changed, 1098 insertions(+)
>   create mode 100644 hw/display/apple-gfx-mmio.m
>   create mode 100644 hw/display/apple-gfx.h
>   create mode 100644 hw/display/apple-gfx.m
> 
> diff --git a/hw/display/Kconfig b/hw/display/Kconfig
> index 2250c740078..6a9b7b19ada 100644
> --- a/hw/display/Kconfig
> +++ b/hw/display/Kconfig
> @@ -140,3 +140,12 @@ config XLNX_DISPLAYPORT
>   
>   config DM163
>       bool
> +
> +config MAC_PVG
> +    bool
> +    default y
> +
> +config MAC_PVG_MMIO
> +    bool
> +    depends on MAC_PVG && AARCH64
> +
> diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
> new file mode 100644
> index 00000000000..06131bc23f1
> --- /dev/null
> +++ b/hw/display/apple-gfx-mmio.m
> @@ -0,0 +1,284 @@
> +/*
> + * QEMU Apple ParavirtualizedGraphics.framework device, MMIO (arm64) variant
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.

Use SPDX-License-Identifier. You can find some examples with grep.

> + *
> + * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
> + * which implements 3d graphics passthrough to the host as well as a
> + * proprietary guest communication channel to drive it. This device model
> + * implements support to drive that library from within QEMU as an MMIO-based
> + * system device for macOS on arm64 VMs.
> + */
> +
> +#include "qemu/osdep.h"
> +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> +#include "apple-gfx.h"
> +#include "monitor/monitor.h"
> +#include "hw/sysbus.h"
> +#include "hw/irq.h"
> +#include "trace.h"
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXMMIOState, APPLE_GFX_MMIO)
> +
> +/*
> + * ParavirtualizedGraphics.Framework only ships header files for the PCI
> + * variant which does not include IOSFC descriptors and host devices. We add
> + * their definitions here so that we can also work with the ARM version.
> + */
> +typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector);
> +typedef bool(^IOSFCUnmapMemory)(
> +    void *, void *, void *, void *, void *, void *);
> +typedef bool(^IOSFCMapMemory)(
> +    uint64_t phys, uint64_t len, bool ro, void **va, void *, void *);
> +
> +@interface PGDeviceDescriptor (IOSurfaceMapper)
> +@property (readwrite, nonatomic) bool usingIOSurfaceMapper;
> +@end
> +
> +@interface PGIOSurfaceHostDeviceDescriptor : NSObject
> +-(PGIOSurfaceHostDeviceDescriptor *)init;
> +@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory mapMemory;
> +@property (readwrite, nonatomic, copy, nullable) IOSFCUnmapMemory unmapMemory;
> +@property (readwrite, nonatomic, copy, nullable) IOSFCRaiseInterrupt raiseInterrupt;
> +@end
> +
> +@interface PGIOSurfaceHostDevice : NSObject
> +-(instancetype)initWithDescriptor:(PGIOSurfaceHostDeviceDescriptor *)desc;
> +-(uint32_t)mmioReadAtOffset:(size_t)offset;
> +-(void)mmioWriteAtOffset:(size_t)offset value:(uint32_t)value;
> +@end
> +
> +struct AppleGFXMapSurfaceMemoryJob;
> +struct AppleGFXMMIOState {
> +    SysBusDevice parent_obj;
> +
> +    AppleGFXState common;
> +
> +    qemu_irq irq_gfx;
> +    qemu_irq irq_iosfc;
> +    MemoryRegion iomem_iosfc;
> +    PGIOSurfaceHostDevice *pgiosfc;
> +};
> +
> +typedef struct AppleGFXMMIOJob {
> +    AppleGFXMMIOState *state;
> +    uint64_t offset;
> +    uint64_t value;
> +    bool completed;
> +} AppleGFXMMIOJob;
> +
> +static void iosfc_do_read(void *opaque)
> +{
> +    AppleGFXMMIOJob *job = opaque;
> +    job->value = [job->state->pgiosfc mmioReadAtOffset:job->offset];
> +    qatomic_set(&job->completed, true);
> +    aio_wait_kick();
> +}
> +
> +static uint64_t iosfc_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    AppleGFXMMIOJob job = {
> +        .state = opaque,
> +        .offset = offset,
> +        .completed = false,
> +    };
> +    AioContext *context = qemu_get_aio_context();
> +    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> +
> +    dispatch_async_f(queue, &job, iosfc_do_read);
> +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));

Pass NULL as the first argument of AIO_WAIT_WHILE().

> +
> +    trace_apple_gfx_mmio_iosfc_read(offset, job.value);
> +    return job.value;
> +}
> +
> +static void iosfc_do_write(void *opaque)
> +{
> +    AppleGFXMMIOJob *job = opaque;
> +    [job->state->pgiosfc mmioWriteAtOffset:job->offset value:job->value];
> +    qatomic_set(&job->completed, true);
> +    aio_wait_kick();
> +}
> +
> +static void iosfc_write(void *opaque, hwaddr offset, uint64_t val,
> +                        unsigned size)
> +{
> +    AppleGFXMMIOJob job = {
> +        .state = opaque,
> +        .offset = offset,
> +        .value = val,
> +        .completed = false,
> +    };
> +    AioContext *context = qemu_get_aio_context();
> +    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> +
> +    dispatch_async_f(queue, &job, iosfc_do_write);
> +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> +
> +    trace_apple_gfx_mmio_iosfc_write(offset, val);
> +}
> +
> +static const MemoryRegionOps apple_iosfc_ops = {
> +    .read = iosfc_read,
> +    .write = iosfc_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +    .valid = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +static void raise_iosfc_irq(void *opaque)
> +{
> +    AppleGFXMMIOState *s = opaque;
> +
> +    qemu_irq_pulse(s->irq_iosfc);
> +}
> +
> +typedef struct AppleGFXMapSurfaceMemoryJob {
> +    uint64_t guest_physical_address;
> +    uint64_t guest_physical_length;
> +    void *result_mem;
> +    AppleGFXMMIOState *state;
> +    bool read_only;
> +    bool success;
> +    bool done;
> +} AppleGFXMapSurfaceMemoryJob;
> +
> +static void apple_gfx_mmio_map_surface_memory(void *opaque)
> +{
> +    AppleGFXMapSurfaceMemoryJob *job = opaque;
> +    AppleGFXMMIOState *s = job->state;
> +    mach_vm_address_t mem;
> +
> +    mem = apple_gfx_host_address_for_gpa_range(job->guest_physical_address,
> +                                               job->guest_physical_length,
> +                                               job->read_only);
> +
> +    qemu_mutex_lock(&s->common.job_mutex);
> +    job->result_mem = (void*)mem;

nit: write as (void *).

> +    job->success = mem != 0;
> +    job->done = true;
> +    qemu_cond_broadcast(&s->common.job_cond);
> +    qemu_mutex_unlock(&s->common.job_mutex);
> +}
> +
> +static PGIOSurfaceHostDevice *apple_gfx_prepare_iosurface_host_device(
> +    AppleGFXMMIOState *s)
> +{
> +    PGIOSurfaceHostDeviceDescriptor *iosfc_desc =
> +        [PGIOSurfaceHostDeviceDescriptor new];
> +    PGIOSurfaceHostDevice *iosfc_host_dev = nil;
> +
> +    iosfc_desc.mapMemory =
> +        ^bool(uint64_t phys, uint64_t len, bool ro, void **va, void *e, void *f) {
> +            AppleGFXMapSurfaceMemoryJob job = {
> +                .guest_physical_address = phys, .guest_physical_length = len,
> +                .read_only = ro, .state = s,
> +            };
> +
> +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                                    apple_gfx_mmio_map_surface_memory, &job);
> +            apple_gfx_await_bh_job(&s->common, &job.done);
> +
> +            *va = job.result_mem;
> +
> +            trace_apple_gfx_iosfc_map_memory(phys, len, ro, va, e, f, *va,
> +                                             job.success);
> +
> +            return job.success;
> +        };
> +
> +    iosfc_desc.unmapMemory =
> +        ^bool(void *a, void *b, void *c, void *d, void *e, void *f) {
> +            trace_apple_gfx_iosfc_unmap_memory(a, b, c, d, e, f);
> +            return true;
> +        };
> +
> +    iosfc_desc.raiseInterrupt = ^bool(uint32_t vector) {
> +        trace_apple_gfx_iosfc_raise_irq(vector);
> +        aio_bh_schedule_oneshot(qemu_get_aio_context(), raise_iosfc_irq, s);

Let's pass s->irq_iosfc here to unify raise_iosfc_irq() and raise_gfx_irq().

> +        return true;
> +    };
> +
> +    iosfc_host_dev =
> +        [[PGIOSurfaceHostDevice alloc] initWithDescriptor:iosfc_desc];
> +    [iosfc_desc release];
> +    return iosfc_host_dev;
> +}
> +
> +static void raise_gfx_irq(void *opaque)
> +{
> +    AppleGFXMMIOState *s = opaque;
> +
> +    qemu_irq_pulse(s->irq_gfx);
> +}
> +
> +static void apple_gfx_mmio_realize(DeviceState *dev, Error **errp)
> +{
> +    @autoreleasepool {
> +        AppleGFXMMIOState *s = APPLE_GFX_MMIO(dev);
> +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
> +
> +        desc.raiseInterrupt = ^(uint32_t vector) {
> +            trace_apple_gfx_raise_irq(vector);
> +            aio_bh_schedule_oneshot(qemu_get_aio_context(), raise_gfx_irq, s);
> +        };
> +
> +        desc.usingIOSurfaceMapper = true;
> +        s->pgiosfc = apple_gfx_prepare_iosurface_host_device(s);
> +
> +        apple_gfx_common_realize(&s->common, desc, errp);
> +        [desc release];
> +        desc = nil;
> +    }
> +}
> +
> +static void apple_gfx_mmio_init(Object *obj)
> +{
> +    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
> +
> +    apple_gfx_common_init(obj, &s->common, TYPE_APPLE_GFX_MMIO);
> +
> +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->common.iomem_gfx);
> +    memory_region_init_io(&s->iomem_iosfc, obj, &apple_iosfc_ops, s,
> +                          TYPE_APPLE_GFX_MMIO, 0x10000);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem_iosfc);
> +    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_gfx);
> +    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_iosfc);
> +}
> +
> +static void apple_gfx_mmio_reset(Object *obj, ResetType type)
> +{
> +    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
> +    [s->common.pgdev reset];
> +}
> +
> +
> +static void apple_gfx_mmio_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    ResettableClass *rc = RESETTABLE_CLASS(klass);
> +
> +    rc->phases.hold = apple_gfx_mmio_reset;
> +    dc->hotpluggable = false;
> +    dc->realize = apple_gfx_mmio_realize;
> +}
> +
> +static TypeInfo apple_gfx_mmio_types[] = {
> +    {
> +        .name          = TYPE_APPLE_GFX_MMIO,
> +        .parent        = TYPE_SYS_BUS_DEVICE,
> +        .instance_size = sizeof(AppleGFXMMIOState),
> +        .class_init    = apple_gfx_mmio_class_init,
> +        .instance_init = apple_gfx_mmio_init,
> +    }
> +};
> +DEFINE_TYPES(apple_gfx_mmio_types)
> diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
> new file mode 100644
> index 00000000000..39931fba65a
> --- /dev/null
> +++ b/hw/display/apple-gfx.h
> @@ -0,0 +1,58 @@
> +#ifndef QEMU_APPLE_GFX_H
> +#define QEMU_APPLE_GFX_H
> +
> +#define TYPE_APPLE_GFX_MMIO         "apple-gfx-mmio"
> +#define TYPE_APPLE_GFX_PCI          "apple-gfx-pci"
> +
> +#include "qemu/osdep.h"
> +#include <dispatch/dispatch.h>
> +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> +#include "qemu/typedefs.h"
> +#include "exec/memory.h"
> +#include "ui/surface.h"
> +
> +@class PGDeviceDescriptor;
> +@protocol PGDevice;
> +@protocol PGDisplay;
> +@protocol MTLDevice;
> +@protocol MTLTexture;
> +@protocol MTLCommandQueue;
> +
> +typedef QTAILQ_HEAD(, PGTask_s) PGTaskList;
> +
> +struct AppleGFXMapMemoryJob;

Probably this declaration of AppleGFXMapMemoryJob is unnecessary.

> +typedef struct AppleGFXState {
> +    MemoryRegion iomem_gfx;
> +    id<PGDevice> pgdev;
> +    id<PGDisplay> pgdisp;
> +    PGTaskList tasks;
> +    QemuConsole *con;
> +    id<MTLDevice> mtl;
> +    id<MTLCommandQueue> mtl_queue;
> +    bool cursor_show;
> +    QEMUCursor *cursor;
> +
> +    /* For running PVG memory-mapping requests in the AIO context */
> +    QemuCond job_cond;
> +    QemuMutex job_mutex;

Use: QemuEvent

> +
> +    dispatch_queue_t render_queue;
> +    /* The following fields should only be accessed from the BQL: */

Perhaps it may be better to document fields that can be accessed 
*without* the BQL; most things in QEMU implicitly require the BQL.

> +    bool gfx_update_requested;
> +    bool new_frame_ready;
> +    bool using_managed_texture_storage;
> +} AppleGFXState;
> +
> +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name);
> +void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
> +                              Error **errp);
> +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
> +                                               uint64_t length, bool read_only);
> +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag);
> +
> +#endif
> +
> diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
> new file mode 100644
> index 00000000000..46be9957f69
> --- /dev/null
> +++ b/hw/display/apple-gfx.m
> @@ -0,0 +1,713 @@
> +/*
> + * QEMU Apple ParavirtualizedGraphics.framework device
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
> + * which implements 3d graphics passthrough to the host as well as a
> + * proprietary guest communication channel to drive it. This device model
> + * implements support to drive that library from within QEMU.
> + */
> +
> +#include "qemu/osdep.h"
> +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> +#include <mach/mach_vm.h>
> +#include "apple-gfx.h"
> +#include "trace.h"
> +#include "qemu-main.h"
> +#include "exec/address-spaces.h"
> +#include "migration/blocker.h"
> +#include "monitor/monitor.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/cutils.h"
> +#include "qemu/log.h"
> +#include "qapi/visitor.h"
> +#include "qapi/error.h"
> +#include "ui/console.h"
> +
> +static const PGDisplayCoord_t apple_gfx_modes[] = {
> +    { .x = 1440, .y = 1080 },
> +    { .x = 1280, .y = 1024 },
> +};
> +
> +/* This implements a type defined in <ParavirtualizedGraphics/PGDevice.h>
> + * which is opaque from the framework's point of view. Typedef PGTask_t already
> + * exists in the framework headers. */
> +struct PGTask_s {
> +    QTAILQ_ENTRY(PGTask_s) node;
> +    mach_vm_address_t address;
> +    uint64_t len;
> +};
> +
> +static Error *apple_gfx_mig_blocker;

This does not have to be a static variable.

> +
> +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> +                                             uint32_t width, uint32_t height);
> +
> +static inline dispatch_queue_t get_background_queue(void)

Don't add inline. The only effect for modern compilers of inline is to 
suppress the unused function warnings.

> +{
> +    return dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> +}
> +
> +static PGTask_t *apple_gfx_new_task(AppleGFXState *s, uint64_t len)
> +{
> +    mach_vm_address_t task_mem;
> +    PGTask_t *task;
> +    kern_return_t r;
> +
> +    r = mach_vm_allocate(mach_task_self(), &task_mem, len, VM_FLAGS_ANYWHERE);
> +    if (r != KERN_SUCCESS || task_mem == 0) {

Let's remove the check for task_mem == 0. We have no reason to reject it 
if the platform insists it allocated a memory at address 0 though such a 
situation should never happen in practice.

> +        return NULL;
> +    }
> +
> +    task = g_new0(PGTask_t, 1);
> +
> +    task->address = task_mem;
> +    task->len = len;
> +    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
> +
> +    return task;
> +}
> +
> +typedef struct AppleGFXIOJob {
> +    AppleGFXState *state;
> +    uint64_t offset;
> +    uint64_t value;
> +    bool completed;
> +} AppleGFXIOJob;
> +
> +static void apple_gfx_do_read(void *opaque)
> +{
> +    AppleGFXIOJob *job = opaque;
> +    job->value = [job->state->pgdev mmioReadAtOffset:job->offset];
> +    qatomic_set(&job->completed, true);
> +    aio_wait_kick();
> +}
> +
> +static uint64_t apple_gfx_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    AppleGFXIOJob job = {
> +        .state = opaque,
> +        .offset = offset,
> +        .completed = false,
> +    };
> +    AioContext *context = qemu_get_aio_context();
> +    dispatch_queue_t queue = get_background_queue();
> +
> +    dispatch_async_f(queue, &job, apple_gfx_do_read);
> +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> +
> +    trace_apple_gfx_read(offset, job.value);
> +    return job.value;
> +}
> +
> +static void apple_gfx_do_write(void *opaque)
> +{
> +    AppleGFXIOJob *job = opaque;
> +    [job->state->pgdev mmioWriteAtOffset:job->offset value:job->value];
> +    qatomic_set(&job->completed, true);
> +    aio_wait_kick();
> +}
> +
> +static void apple_gfx_write(void *opaque, hwaddr offset, uint64_t val,
> +                            unsigned size)
> +{
> +    /* The methods mmioReadAtOffset: and especially mmioWriteAtOffset: can
> +     * trigger and block on operations on other dispatch queues, which in turn
> +     * may call back out on one or more of the callback blocks. For this reason,
> +     * and as we are holding the BQL, we invoke the I/O methods on a pool
> +     * thread and handle AIO tasks while we wait. Any work in the callbacks
> +     * requiring the BQL will in turn schedule BHs which this thread will
> +     * process while waiting. */
> +    AppleGFXIOJob job = {
> +        .state = opaque,
> +        .offset = offset,
> +        .value = val,
> +        .completed = false,
> +    };
> +    AioContext *context = qemu_get_current_aio_context();
> +    dispatch_queue_t queue = get_background_queue();
> +
> +    dispatch_async_f(queue, &job, apple_gfx_do_write);
> +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> +
> +    trace_apple_gfx_write(offset, val);
> +}
> +
> +static const MemoryRegionOps apple_gfx_ops = {
> +    .read = apple_gfx_read,
> +    .write = apple_gfx_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +    .valid = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 4,
> +    },
> +};
> +
> +static void apple_gfx_render_new_frame_bql_unlock(AppleGFXState *s)
> +{
> +    BOOL r;
> +    uint32_t width = surface_width(s->surface);
> +    uint32_t height = surface_height(s->surface);
> +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue commandBuffer];
> +    id<MTLTexture> texture = s->texture;
> +
> +    assert(bql_locked());
> +    [texture retain];
> +
> +    bql_unlock();
> +
> +    /* This is not safe to call from the BQL due to PVG-internal locks causing
> +     * deadlocks. */
> +    r = [s->pgdisp encodeCurrentFrameToCommandBuffer:command_buffer
> +                                             texture:texture
> +                                              region:region];
> +    if (!r) {
> +        [texture release];
> +        bql_lock();
> +        --s->pending_frames;
> +        bql_unlock();
> +        qemu_log_mask(LOG_GUEST_ERROR, "apple_gfx_render_new_frame_bql_unlock: "

Use: __func__

> +                      "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
> +        return;
> +    }
> +
> +    if (s->using_managed_texture_storage) {
> +        /* "Managed" textures exist in both VRAM and RAM and must be synced. */
> +        id<MTLBlitCommandEncoder> blit = [command_buffer blitCommandEncoder];
> +        [blit synchronizeResource:texture];
> +        [blit endEncoding];
> +    }
> +    [texture release];
> +    [command_buffer addCompletedHandler:
> +        ^(id<MTLCommandBuffer> cb)
> +        {
> +            dispatch_async(s->render_queue, ^{
> +                apple_gfx_render_frame_completed(s, width, height);
> +            });
> +        }];
> +    [command_buffer commit];
> +}
> +
> +static void copy_mtl_texture_to_surface_mem(id<MTLTexture> texture, void *vram)
> +{
> +    /* TODO: Skip this entirely on a pure Metal or headless/guest-only
> +     * rendering path, else use a blit command encoder? Needs careful
> +     * (double?) buffering design. */
> +    size_t width = texture.width, height = texture.height;
> +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> +    [texture getBytes:vram
> +          bytesPerRow:(width * 4)
> +        bytesPerImage:(width * height * 4)
> +           fromRegion:region
> +          mipmapLevel:0
> +                slice:0];
> +}copy_mtl_texture_to_surface_mem
> +
> +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> +                                             uint32_t width, uint32_t height)
> +{
> +    bql_lock();
> +    --s->pending_frames;
> +    assert(s->pending_frames >= 0);
> +
> +    /* Only update display if mode hasn't changed since we started rendering. */
> +    if (width == surface_width(s->surface) &&
> +        height == surface_height(s->surface)) {
> +        copy_mtl_texture_to_surface_mem(s->texture, s->vram);
> +        if (s->gfx_update_requested) {
> +            s->gfx_update_requested = false;
> +            dpy_gfx_update_full(s->con);
> +            graphic_hw_update_done(s->con);
> +            s->new_frame_ready = false;
> +        } else {
> +            s->new_frame_ready = true;
> +        }
> +    }
> +    if (s->pending_frames > 0) {
> +        apple_gfx_render_new_frame_bql_unlock(s);
> +    } else {
> +        bql_unlock();
> +    }
> +}
> +
> +static void apple_gfx_fb_update_display(void *opaque)
> +{
> +    AppleGFXState *s = opaque;
> +
> +    assert(bql_locked());
> +    if (s->new_frame_ready) {
> +        dpy_gfx_update_full(s->con);
> +        s->new_frame_ready = false;
> +        graphic_hw_update_done(s->con);
> +    } else if (s->pending_frames > 0) {
> +        s->gfx_update_requested = true;
> +    } else {
> +        graphic_hw_update_done(s->con);
> +    }
> +}
> +
> +static const GraphicHwOps apple_gfx_fb_ops = {
> +    .gfx_update = apple_gfx_fb_update_display,
> +    .gfx_update_async = true,
> +};
> +
> +static void update_cursor(AppleGFXState *s)
> +{
> +    assert(bql_locked());
> +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
> +                  s->pgdisp.cursorPosition.y, s->cursor_show);
> +}
> +
> +static void set_mode(AppleGFXState *s, uint32_t width, uint32_t height)
> +{
> +    MTLTextureDescriptor *textureDescriptor;
> +
> +    if (s->surface &&
> +        width == surface_width(s->surface) &&
> +        height == surface_height(s->surface)) {
> +        return;
> +    }
> +
> +    g_free(s->vram);
> +    [s->texture release];
> +
> +    s->vram = g_malloc0_n(width * height, 4);
> +    s->surface = qemu_create_displaysurface_from(width, height, PIXMAN_LE_a8r8g8b8,
> +                                                 width * 4, s->vram);> +> +    @autoreleasepool {
> +        textureDescriptor =
> +            [MTLTextureDescriptor
> +                texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
> +                                             width:width
> +                                            height:height
> +                                         mipmapped:NO];
> +        textureDescriptor.usage = s->pgdisp.minimumTextureUsage;
> +        s->texture = [s->mtl newTextureWithDescriptor:textureDescriptor];


What about creating pixman_image_t from s->texture.buffer.contents? This 
should save memory usage by removing the duplication of texture.

> +    }
> +
> +    s->using_managed_texture_storage =
> +        (s->texture.storageMode == MTLStorageModeManaged);
> +    dpy_gfx_replace_surface(s->con, s->surface);
> +}
> +
> +static void create_fb(AppleGFXState *s)
> +{
> +    s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
> +    set_mode(s, 1440, 1080);
> +
> +    s->cursor_show = true;
> +}
> +
> +static size_t apple_gfx_get_default_mmio_range_size(void)
> +{
> +    size_t mmio_range_size;
> +    @autoreleasepool {
> +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
> +        mmio_range_size = desc.mmioLength;
> +        [desc release];
> +    }
> +    return mmio_range_size;
> +}
> +
> +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name)
> +{
> +    size_t mmio_range_size = apple_gfx_get_default_mmio_range_size();
> +
> +    trace_apple_gfx_common_init(obj_name, mmio_range_size);
> +    memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s, obj_name,
> +                          mmio_range_size);
> +
> +    /* TODO: PVG framework supports serialising device state: integrate it! */
> +}
> +
> +typedef struct AppleGFXMapMemoryJob {
> +    AppleGFXState *state;
> +    PGTask_t *task;
> +    uint64_t virtual_offset;
> +    PGPhysicalMemoryRange_t *ranges;
> +    uint32_t range_count;
> +    bool read_only;
> +    bool success;
> +    bool done;
> +} AppleGFXMapMemoryJob;
> +
> +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
> +                                               uint64_t length, bool read_only)
> +{
> +    MemoryRegion *ram_region;
> +    uintptr_t host_address;
> +    hwaddr ram_region_offset = 0;
> +    hwaddr ram_region_length = length;
> +
> +    ram_region = address_space_translate(&address_space_memory,
> +                                         guest_physical,
> +                                         &ram_region_offset,
> +                                         &ram_region_length, !read_only,
> +                                         MEMTXATTRS_UNSPECIFIED);

Call memory_region_ref() so that it won't go away.

> +
> +    if (!ram_region || ram_region_length < length ||
> +        !memory_access_is_direct(ram_region, !read_only)) {
> +        return 0;
> +    }
> +
> +    host_address = (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);

host_address is typed as uintptr_t, not mach_vm_address_t.

> +    if (host_address == 0) {
> +        return 0;
> +    }
> +    host_address += ram_region_offset;
> +
> +    return host_address;
> +}
> +
> +static void apple_gfx_map_memory(void *opaque)
> +{
> +    AppleGFXMapMemoryJob *job = opaque;
> +    AppleGFXState *s = job->state;
> +    PGTask_t *task                  = job->task;
> +    uint32_t range_count            = job->range_count;
> +    uint64_t virtual_offset         = job->virtual_offset;
> +    PGPhysicalMemoryRange_t *ranges = job->ranges;
> +    bool read_only                  = job->read_only;
> +    kern_return_t r;
> +    mach_vm_address_t target, source;
> +    vm_prot_t cur_protection, max_protection;
> +    bool success = true;
> +
> +    g_assert(bql_locked());
> +
> +    trace_apple_gfx_map_memory(task, range_count, virtual_offset, read_only);
> +    for (int i = 0; i < range_count; i++) {
> +        PGPhysicalMemoryRange_t *range = &ranges[i];
> +
> +        target = task->address + virtual_offset;
> +        virtual_offset += range->physicalLength;
> +
> +        trace_apple_gfx_map_memory_range(i, range->physicalAddress,
> +                                         range->physicalLength);
> +
> +        source = apple_gfx_host_address_for_gpa_range(range->physicalAddress,
> +                                                      range->physicalLength,
> +                                                      read_only);
> +        if (source == 0) {
> +            success = false;
> +            continue;
> +        }
> +
> +        MemoryRegion* alt_mr = NULL;
> +        mach_vm_address_t alt_source = (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress, range->physicalLength, NULL);
> +        g_assert(alt_source == source);

Remove this; I guess this is for debugging.

> +
> +        cur_protection = 0;
> +        max_protection = 0;
> +        // Map guest RAM at range->physicalAddress into PG task memory range
> +        r = mach_vm_remap(mach_task_self(),
> +                          &target, range->physicalLength, vm_page_size - 1,
> +                          VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
> +                          mach_task_self(),
> +                          source, false /* shared mapping, no copy */,
> +                          &cur_protection, &max_protection,
> +                          VM_INHERIT_COPY);
> +        trace_apple_gfx_remap(r, source, target);
> +        g_assert(r == KERN_SUCCESS);
> +    }
> +
> +    qemu_mutex_lock(&s->job_mutex);
> +    job->success = success;
> +    job->done = true;
> +    qemu_cond_broadcast(&s->job_cond);
> +    qemu_mutex_unlock(&s->job_mutex);
> +}
> +
> +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag)
> +{
> +    qemu_mutex_lock(&s->job_mutex);
> +    while (!*job_done_flag) {
> +        qemu_cond_wait(&s->job_cond, &s->job_mutex);
> +    }
> +    qemu_mutex_unlock(&s->job_mutex);
> +}
> +
> +typedef struct AppleGFXReadMemoryJob {
> +    AppleGFXState *s;
> +    hwaddr physical_address;
> +    uint64_t length;
> +    void *dst;
> +    bool done;
> +} AppleGFXReadMemoryJob;
> +
> +static void apple_gfx_do_read_memory(void *opaque)
> +{
> +    AppleGFXReadMemoryJob *job = opaque;
> +    AppleGFXState *s = job->s;
> +
> +    cpu_physical_memory_read(job->physical_address, job->dst, job->length);

Use: dma_memory_read()

> +
> +    qemu_mutex_lock(&s->job_mutex);
> +    job->done = true;
> +    qemu_cond_broadcast(&s->job_cond);
> +    qemu_mutex_unlock(&s->job_mutex);
> +}
> +
> +static void apple_gfx_read_memory(AppleGFXState *s, hwaddr physical_address,
> +                                  uint64_t length, void *dst)
> +{
> +    AppleGFXReadMemoryJob job = {
> +        s, physical_address, length, dst
> +    };
> +
> +    trace_apple_gfx_read_memory(physical_address, length, dst);
> +
> +    /* Traversing the memory map requires RCU/BQL, so do it in a BH. */
> +    aio_bh_schedule_oneshot(qemu_get_aio_context(), apple_gfx_do_read_memory,
> +                            &job);
> +    apple_gfx_await_bh_job(s, &job.done);
> +}
> +
> +static void apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
> +                                                     PGDeviceDescriptor *desc)
> +{
> +    desc.createTask = ^(uint64_t vmSize, void * _Nullable * _Nonnull baseAddress) {
> +        PGTask_t *task = apple_gfx_new_task(s, vmSize);
> +        *baseAddress = (void *)task->address;
> +        trace_apple_gfx_create_task(vmSize, *baseAddress);
> +        return task;
> +    };
> +
> +    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
> +        trace_apple_gfx_destroy_task(task);
> +        QTAILQ_REMOVE(&s->tasks, task, node);
> +        mach_vm_deallocate(mach_task_self(), task->address, task->len);
> +        g_free(task);
> +    };
> +
> +    desc.mapMemory = ^bool(PGTask_t * _Nonnull task, uint32_t range_count,
> +                       uint64_t virtual_offset, bool read_only,
> +                       PGPhysicalMemoryRange_t * _Nonnull ranges) {
> +        AppleGFXMapMemoryJob job = {
> +            .state = s,
> +            .task = task, .ranges = ranges, .range_count = range_count,
> +            .read_only = read_only, .virtual_offset = virtual_offset,
> +            .done = false, .success = true,
> +        };
> +        if (range_count > 0) {
> +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                                    apple_gfx_map_memory, &job);
> +            apple_gfx_await_bh_job(s, &job.done);
> +        }
> +        return job.success;
> +    };
> +
> +    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task, uint64_t virtualOffset,
> +                         uint64_t length) {
> +        kern_return_t r;
> +        mach_vm_address_t range_address;
> +
> +        trace_apple_gfx_unmap_memory(task, virtualOffset, length);
> +
> +        /* Replace task memory range with fresh pages, undoing the mapping
> +         * from guest RAM. */
> +        range_address = task->address + virtualOffset;
> +        r = mach_vm_allocate(mach_task_self(), &range_address, length,
> +                             VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE);
> +        g_assert(r == KERN_SUCCESS);error_setg

An extra error_setg

> +
> +        return true;
> +    };
> +
> +    desc.readMemory = ^bool(uint64_t physical_address, uint64_t length,
> +                            void * _Nonnull dst) {
> +        apple_gfx_read_memory(s, physical_address, length, dst);
> +        return true;
> +    };
> +}
> +
> +static PGDisplayDescriptor *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
> +{
> +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor new];
> +
> +    disp_desc.name = @"QEMU display";
> +    disp_desc.sizeInMillimeters = NSMakeSize(400., 300.); /* A 20" display */
> +    disp_desc.queue = dispatch_get_main_queue();
> +    disp_desc.newFrameEventHandler = ^(void) {
> +        trace_apple_gfx_new_frame();
> +        dispatch_async(s->render_queue, ^{
> +            /* Drop frames if we get too far ahead. */
> +            bql_lock();
> +            if (s->pending_frames >= 2) {
> +                bql_unlock();
> +                return;
> +            }
> +            ++s->pending_frames;
> +            if (s->pending_frames > 1) {
> +                bql_unlock();
> +                return;
> +            }
> +            @autoreleasepool {
> +                apple_gfx_render_new_frame_bql_unlock(s);
> +            }
> +        });
> +    };
> +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t sizeInPixels,
> +                                    OSType pixelFormat) {
> +        trace_apple_gfx_mode_change(sizeInPixels.x, sizeInPixels.y);
> +
> +        BQL_LOCK_GUARD();
> +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
> +    };
> +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
> +                                     PGDisplayCoord_t hotSpot) {
> +        [glyph retain];
> +        dispatch_async(get_background_queue(), ^{
> +            BQL_LOCK_GUARD();
> +            uint32_t bpp = glyph.bitsPerPixel;
> +            size_t width = glyph.pixelsWide;
> +            size_t height = glyph.pixelsHigh;
> +            size_t padding_bytes_per_row = glyph.bytesPerRow - width * 4;
> +            const uint8_t* px_data = glyph.bitmapData;
> +
> +            trace_apple_gfx_cursor_set(bpp, width, height);
> +
> +            if (s->cursor) {
> +                cursor_unref(s->cursor);
> +                s->cursor = NULL;
> +            }
> +
> +            if (bpp == 32) { /* Shouldn't be anything else, but just to be safe...*/
> +                s->cursor = cursor_alloc(width, height);
> +                s->cursor->hot_x = hotSpot.x;
> +                s->cursor->hot_y = hotSpot.y;
> +
> +                uint32_t *dest_px = s->cursor->data;
> +
> +                for (size_t y = 0; y < height; ++y) {
> +                    for (size_t x = 0; x < width; ++x) {
> +                        /* NSBitmapImageRep's red & blue channels are swapped
> +                         * compared to QEMUCursor's. */
> +                        *dest_px =
> +                            (px_data[0] << 16u) |
> +                            (px_data[1] <<  8u) |
> +                            (px_data[2] <<  0u) |
> +                            (px_data[3] << 24u);
> +                        ++dest_px;
> +                        px_data += 4;
> +                    }
> +                    px_data += padding_bytes_per_row;
> +                }
> +                dpy_cursor_define(s->con, s->cursor);
> +                update_cursor(s);
> +            }
> +            [glyph release];
> +        });
> +    };
> +    disp_desc.cursorShowHandler = ^(BOOL show) {
> +        dispatch_async(get_background_queue(), ^{
> +            BQL_LOCK_GUARD();
> +            trace_apple_gfx_cursor_show(show);
> +            s->cursor_show = show;
> +            update_cursor(s);
> +        });
> +    };
> +    disp_desc.cursorMoveHandler = ^(void) {
> +        dispatch_async(get_background_queue(), ^{
> +            BQL_LOCK_GUARD();
> +            trace_apple_gfx_cursor_move();
> +            update_cursor(s);
> +        });
> +    };
> +
> +    return disp_desc;
> +}
> +
> +static NSArray<PGDisplayMode*>* apple_gfx_prepare_display_mode_array(void)
> +{
> +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
> +    NSArray<PGDisplayMode*>* mode_array = nil;
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> +        modes[i] =
> +            [[PGDisplayMode alloc] initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
> +    }
> +
> +    mode_array = [NSArray arrayWithObjects:modes count:ARRAY_SIZE(apple_gfx_modes)];
> +
> +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> +        [modes[i] release];
> +        modes[i] = nil;
> +    }
> +
> +    return mode_array;
> +}
> +
> +static id<MTLDevice> copy_suitable_metal_device(void)
> +{
> +    id<MTLDevice> dev = nil;
> +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
> +
> +    /* Prefer a unified memory GPU. Failing that, pick a non-removable GPU. */
> +    for (size_t i = 0; i < devs.count; ++i) {
> +        if (devs[i].hasUnifiedMemory) {
> +            dev = devs[i];
> +            break;
> +        }
> +        if (!devs[i].removable) {
> +            dev = devs[i];
> +        }
> +    }
> +
> +    if (dev != nil) {
> +        [dev retain];
> +    } else {
> +        dev = MTLCreateSystemDefaultDevice();
> +    }
> +    [devs release];
> +
> +    return dev;
> +}
> +
> +void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
> +                              Error **errp)
> +{
> +    PGDisplayDescriptor *disp_desc = nil;
> +
> +    if (apple_gfx_mig_blocker == NULL) {
> +        error_setg(&apple_gfx_mig_blocker,
> +                  "Migration state blocked by apple-gfx display device");
> +        if (migrate_add_blocker(&apple_gfx_mig_blocker, errp) < 0) {
> +            return;
> +        }
> +    }
> +
> +    QTAILQ_INIT(&s->tasks);
> +    s->render_queue = dispatch_queue_create("apple-gfx.render",
> +                                            DISPATCH_QUEUE_SERIAL);
> +    s->mtl = copy_suitable_metal_device();
> +    s->mtl_queue = [s->mtl newCommandQueue];
> +
> +    desc.device = s->mtl;
> +
> +    apple_gfx_register_task_mapping_handlers(s, desc);
> +
> +    s->pgdev = PGNewDeviceWithDescriptor(desc);
> +
> +    disp_desc = apple_gfx_prepare_display_descriptor(s);
> +    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
> +                                              port:0 serialNum:1234];
> +    [disp_desc release];
> +    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
> +
> +    create_fb(s);
> +
> +    qemu_mutex_init(&s->job_mutex);
> +    qemu_cond_init(&s->job_cond);
> +}
> diff --git a/hw/display/meson.build b/hw/display/meson.build
> index 20a94973fa2..619e642905a 100644
> --- a/hw/display/meson.build
> +++ b/hw/display/meson.build
> @@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true: files('artist.c'))
>   
>   system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c', 'ati_2d.c', 'ati_dbg.c'), pixman])
>   
> +system_ss.add(when: 'CONFIG_MAC_PVG',         if_true: [files('apple-gfx.m'), pvg, metal])
> +if cpu == 'aarch64'
> +  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true: [files('apple-gfx-mmio.m'), pvg, metal])
> +endif
>   
>   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
>     virtio_gpu_ss = ss.source_set()
> diff --git a/hw/display/trace-events b/hw/display/trace-events
> index 781f8a33203..214998312b9 100644
> --- a/hw/display/trace-events
> +++ b/hw/display/trace-events
> @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width) "dest_width : %u"
>   dm163_leds(int led, uint32_t value) "led %d: 0x%x"
>   dm163_channels(int channel, uint8_t value) "channel %d: 0x%x"
>   dm163_refresh_rate(uint32_t rr) "refresh rate %d"
> +
> +# apple-gfx.m
> +apple_gfx_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
> +apple_gfx_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
> +apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=0x%x base_addr=%p"
> +apple_gfx_destroy_task(void *task) "task=%p"
> +apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t virtual_offset, uint32_t read_only) "task=%p range_count=0x%x virtual_offset=0x%"PRIx64" read_only=%d"
> +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr, uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64" phys_len=0x%"PRIx64
> +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
> +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset, uint64_t length) "task=%p virtual_offset=0x%"PRIx64" length=0x%"PRIx64
> +apple_gfx_read_memory(uint64_t phys_address, uint64_t length, void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
> +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
> +apple_gfx_new_frame(void) ""
> +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64" y=%"PRId64
> +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t height) "bpp=%d width=%"PRId64" height=0x%"PRId64
> +apple_gfx_cursor_show(uint32_t show) "show=%d"
> +apple_gfx_cursor_move(void) ""
> +apple_gfx_common_init(const char *device_name, size_t mmio_size) "device: %s; MMIO size: %zu bytes"
> +
> +# apple-gfx-mmio.m
> +apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
> +apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
> +apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t ro, void *va, void *e, void *f, void* va_result, int success) "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d va=%p e=%p f=%p -> *va=%p, success = %d"
> +apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c, void *d, void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
> +apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
> +
> diff --git a/meson.build b/meson.build
> index d26690ce204..0e124eff13f 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -761,6 +761,8 @@ socket = []
>   version_res = []
>   coref = []
>   iokit = []
> +pvg = []
> +metal = []
>   emulator_link_args = []
>   midl = not_found
>   widl = not_found
> @@ -782,6 +784,8 @@ elif host_os == 'darwin'
>     coref = dependency('appleframeworks', modules: 'CoreFoundation')
>     iokit = dependency('appleframeworks', modules: 'IOKit', required: false)
>     host_dsosuf = '.dylib'
> +  pvg = dependency('appleframeworks', modules: 'ParavirtualizedGraphics')
> +  metal = dependency('appleframeworks', modules: 'Metal')
>   elif host_os == 'sunos'
>     socket = [cc.find_library('socket'),
>               cc.find_library('nsl'),



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-25  6:03   ` Akihiko Odaki
@ 2024-10-25 19:43     ` Phil Dennis-Jordan
  2024-10-26  4:40       ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-25 19:43 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 52064 bytes --]

On Fri, 25 Oct 2024 at 08:03, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> > MacOS provides a framework (library) that allows any vmm to implement a
> > paravirtualized 3d graphics passthrough to the host metal stack called
> > ParavirtualizedGraphics.Framework (PVG). The library abstracts away
> > almost every aspect of the paravirtualized device model and only provides
> > and receives callbacks on MMIO access as well as to share memory address
> > space between the VM and PVG.
> >
> > This patch implements a QEMU device that drives PVG for the VMApple
> > variant of it.
> >
> > Signed-off-by: Alexander Graf <graf@amazon.com>
> > Co-authored-by: Alexander Graf <graf@amazon.com>
> >
> > Subsequent changes:
> >
> >   * Cherry-pick/rebase conflict fixes, API use updates.
> >   * Moved from hw/vmapple/ (useful outside that machine type)
> >   * Overhaul of threading model, many thread safety improvements.
> >   * Asynchronous rendering.
> >   * Memory and object lifetime fixes.
> >   * Refactoring to split generic and (vmapple) MMIO variant specific
> >     code.
> >
> > Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> > ---
> >
> > v2:
> >
> >   * Cherry-pick/rebase conflict fixes
> >   * BQL function renaming
> >   * Moved from hw/vmapple/ (useful outside that machine type)
> >   * Code review comments: Switched to DEFINE_TYPES macro & little endian
> >     MMIO.
> >   * Removed some dead/superfluous code
> >   * Mad set_mode thread & memory safe
> >   * Added migration blocker due to lack of (de-)serialisation.
> >   * Fixes to ObjC refcounting and autorelease pool usage.
> >   * Fixed ObjC new/init misuse
> >   * Switched to ObjC category extension for private property.
> >   * Simplified task memory mapping and made it thread safe.
> >   * Refactoring to split generic and vmapple MMIO variant specific
> >     code.
> >   * Switched to asynchronous MMIO writes on x86-64
> >   * Rendering and graphics update are now done asynchronously
> >   * Fixed cursor handling
> >   * Coding convention fixes
> >   * Removed software cursor compositing
> >
> > v3:
> >
> >   * Rebased on latest upstream, fixed breakages including switching to
> Resettable methods.
> >   * Squashed patches dealing with dGPUs, MMIO area size, and GPU picking.
> >   * Allow re-entrant MMIO; this simplifies the code and solves the
> divergence
> >     between x86-64 and arm64 variants.
> >
> > v4:
> >
> >   * Renamed '-vmapple' device variant to '-mmio'
> >   * MMIO device type now requires aarch64 host and guest
> >   * Complete overhaul of the glue code for making Qemu's and
> >     ParavirtualizedGraphics.framework's threading and synchronisation
> models
> >     work together. Calls into PVG are from dispatch queues while the
> >     BQL-holding initiating thread processes AIO context events;
> callbacks from
> >     PVG are scheduled as BHs on the BQL/main AIO context, awaiting
> completion
> >     where necessary.
> >   * Guest frame rendering state is covered by the BQL, with only the PVG
> calls
> >     outside the lock, and serialised on the named render_queue.
> >   * Simplified logic for dropping frames in-flight during mode changes,
> fixed
> >     bug in pending frames logic.
> >   * Addressed smaller code review notes such as: function naming, object
> type
> >     declarations, type names/declarations/casts, code formatting,
> #include
> >     order, over-cautious ObjC retain/release, what goes in init vs
> realize,
> >     etc.
> >
> >
> >   hw/display/Kconfig          |   9 +
> >   hw/display/apple-gfx-mmio.m | 284 ++++++++++++++
> >   hw/display/apple-gfx.h      |  58 +++
> >   hw/display/apple-gfx.m      | 713 ++++++++++++++++++++++++++++++++++++
> >   hw/display/meson.build      |   4 +
> >   hw/display/trace-events     |  26 ++
> >   meson.build                 |   4 +
> >   7 files changed, 1098 insertions(+)
> >   create mode 100644 hw/display/apple-gfx-mmio.m
> >   create mode 100644 hw/display/apple-gfx.h
> >   create mode 100644 hw/display/apple-gfx.m
> >
> > diff --git a/hw/display/Kconfig b/hw/display/Kconfig
> > index 2250c740078..6a9b7b19ada 100644
> > --- a/hw/display/Kconfig
> > +++ b/hw/display/Kconfig
> > @@ -140,3 +140,12 @@ config XLNX_DISPLAYPORT
> >
> >   config DM163
> >       bool
> > +
> > +config MAC_PVG
> > +    bool
> > +    default y
> > +
> > +config MAC_PVG_MMIO
> > +    bool
> > +    depends on MAC_PVG && AARCH64
> > +
> > diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
> > new file mode 100644
> > index 00000000000..06131bc23f1
> > --- /dev/null
> > +++ b/hw/display/apple-gfx-mmio.m
> > @@ -0,0 +1,284 @@
> > +/*
> > + * QEMU Apple ParavirtualizedGraphics.framework device, MMIO (arm64)
> variant
> > + *
> > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights
> Reserved.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> > + * See the COPYING file in the top-level directory.
>
> Use SPDX-License-Identifier. You can find some examples with grep.
>

This was already part of the file when I took it over from Alex. I've used
SPDX on any new files I've started from scratch. (I can of course *add* the
SPDX line here too.)


> > + *
> > + * ParavirtualizedGraphics.framework is a set of libraries that macOS
> provides
> > + * which implements 3d graphics passthrough to the host as well as a
> > + * proprietary guest communication channel to drive it. This device
> model
> > + * implements support to drive that library from within QEMU as an
> MMIO-based
> > + * system device for macOS on arm64 VMs.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> > +#include "apple-gfx.h"
> > +#include "monitor/monitor.h"
> > +#include "hw/sysbus.h"
> > +#include "hw/irq.h"
> > +#include "trace.h"
> > +
> > +OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXMMIOState, APPLE_GFX_MMIO)
> > +
> > +/*
> > + * ParavirtualizedGraphics.Framework only ships header files for the PCI
> > + * variant which does not include IOSFC descriptors and host devices.
> We add
> > + * their definitions here so that we can also work with the ARM version.
> > + */
> > +typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector);
> > +typedef bool(^IOSFCUnmapMemory)(
> > +    void *, void *, void *, void *, void *, void *);
> > +typedef bool(^IOSFCMapMemory)(
> > +    uint64_t phys, uint64_t len, bool ro, void **va, void *, void *);
> > +
> > +@interface PGDeviceDescriptor (IOSurfaceMapper)
> > +@property (readwrite, nonatomic) bool usingIOSurfaceMapper;
> > +@end
> > +
> > +@interface PGIOSurfaceHostDeviceDescriptor : NSObject
> > +-(PGIOSurfaceHostDeviceDescriptor *)init;
> > +@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory
> mapMemory;
> > +@property (readwrite, nonatomic, copy, nullable) IOSFCUnmapMemory
> unmapMemory;
> > +@property (readwrite, nonatomic, copy, nullable) IOSFCRaiseInterrupt
> raiseInterrupt;
> > +@end
> > +
> > +@interface PGIOSurfaceHostDevice : NSObject
> > +-(instancetype)initWithDescriptor:(PGIOSurfaceHostDeviceDescriptor
> *)desc;
> > +-(uint32_t)mmioReadAtOffset:(size_t)offset;
> > +-(void)mmioWriteAtOffset:(size_t)offset value:(uint32_t)value;
> > +@end
> > +
> > +struct AppleGFXMapSurfaceMemoryJob;
> > +struct AppleGFXMMIOState {
> > +    SysBusDevice parent_obj;
> > +
> > +    AppleGFXState common;
> > +
> > +    qemu_irq irq_gfx;
> > +    qemu_irq irq_iosfc;
> > +    MemoryRegion iomem_iosfc;
> > +    PGIOSurfaceHostDevice *pgiosfc;
> > +};
> > +
> > +typedef struct AppleGFXMMIOJob {
> > +    AppleGFXMMIOState *state;
> > +    uint64_t offset;
> > +    uint64_t value;
> > +    bool completed;
> > +} AppleGFXMMIOJob;
> > +
> > +static void iosfc_do_read(void *opaque)
> > +{
> > +    AppleGFXMMIOJob *job = opaque;
> > +    job->value = [job->state->pgiosfc mmioReadAtOffset:job->offset];
> > +    qatomic_set(&job->completed, true);
> > +    aio_wait_kick();
> > +}
> > +
> > +static uint64_t iosfc_read(void *opaque, hwaddr offset, unsigned size)
> > +{
> > +    AppleGFXMMIOJob job = {
> > +        .state = opaque,
> > +        .offset = offset,
> > +        .completed = false,
> > +    };
> > +    AioContext *context = qemu_get_aio_context();
> > +    dispatch_queue_t queue =
> dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> > +
> > +    dispatch_async_f(queue, &job, iosfc_do_read);
> > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
>
> Pass NULL as the first argument of AIO_WAIT_WHILE().
>
> > +
> > +    trace_apple_gfx_mmio_iosfc_read(offset, job.value);
> > +    return job.value;
> > +}
> > +
> > +static void iosfc_do_write(void *opaque)
> > +{
> > +    AppleGFXMMIOJob *job = opaque;
> > +    [job->state->pgiosfc mmioWriteAtOffset:job->offset
> value:job->value];
> > +    qatomic_set(&job->completed, true);
> > +    aio_wait_kick();
> > +}
> > +
> > +static void iosfc_write(void *opaque, hwaddr offset, uint64_t val,
> > +                        unsigned size)
> > +{
> > +    AppleGFXMMIOJob job = {
> > +        .state = opaque,
> > +        .offset = offset,
> > +        .value = val,
> > +        .completed = false,
> > +    };
> > +    AioContext *context = qemu_get_aio_context();
> > +    dispatch_queue_t queue =
> dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> > +
> > +    dispatch_async_f(queue, &job, iosfc_do_write);
> > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> > +
> > +    trace_apple_gfx_mmio_iosfc_write(offset, val);
> > +}
> > +
> > +static const MemoryRegionOps apple_iosfc_ops = {
> > +    .read = iosfc_read,
> > +    .write = iosfc_write,
> > +    .endianness = DEVICE_LITTLE_ENDIAN,
> > +    .valid = {
> > +        .min_access_size = 4,
> > +        .max_access_size = 8,
> > +    },
> > +    .impl = {
> > +        .min_access_size = 4,
> > +        .max_access_size = 8,
> > +    },
> > +};
> > +
> > +static void raise_iosfc_irq(void *opaque)
> > +{
> > +    AppleGFXMMIOState *s = opaque;
> > +
> > +    qemu_irq_pulse(s->irq_iosfc);
> > +}
> > +
> > +typedef struct AppleGFXMapSurfaceMemoryJob {
> > +    uint64_t guest_physical_address;
> > +    uint64_t guest_physical_length;
> > +    void *result_mem;
> > +    AppleGFXMMIOState *state;
> > +    bool read_only;
> > +    bool success;
> > +    bool done;
> > +} AppleGFXMapSurfaceMemoryJob;
> > +
> > +static void apple_gfx_mmio_map_surface_memory(void *opaque)
> > +{
> > +    AppleGFXMapSurfaceMemoryJob *job = opaque;
> > +    AppleGFXMMIOState *s = job->state;
> > +    mach_vm_address_t mem;
> > +
> > +    mem =
> apple_gfx_host_address_for_gpa_range(job->guest_physical_address,
> > +
>  job->guest_physical_length,
> > +                                               job->read_only);
> > +
> > +    qemu_mutex_lock(&s->common.job_mutex);
> > +    job->result_mem = (void*)mem;
>
> nit: write as (void *).
>
> > +    job->success = mem != 0;
> > +    job->done = true;
> > +    qemu_cond_broadcast(&s->common.job_cond);
> > +    qemu_mutex_unlock(&s->common.job_mutex);
> > +}
> > +
> > +static PGIOSurfaceHostDevice *apple_gfx_prepare_iosurface_host_device(
> > +    AppleGFXMMIOState *s)
> > +{
> > +    PGIOSurfaceHostDeviceDescriptor *iosfc_desc =
> > +        [PGIOSurfaceHostDeviceDescriptor new];
> > +    PGIOSurfaceHostDevice *iosfc_host_dev = nil;
> > +
> > +    iosfc_desc.mapMemory =
> > +        ^bool(uint64_t phys, uint64_t len, bool ro, void **va, void *e,
> void *f) {
> > +            AppleGFXMapSurfaceMemoryJob job = {
> > +                .guest_physical_address = phys, .guest_physical_length
> = len,
> > +                .read_only = ro, .state = s,
> > +            };
> > +
> > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> > +                                    apple_gfx_mmio_map_surface_memory,
> &job);
> > +            apple_gfx_await_bh_job(&s->common, &job.done);
> > +
> > +            *va = job.result_mem;
> > +
> > +            trace_apple_gfx_iosfc_map_memory(phys, len, ro, va, e, f,
> *va,
> > +                                             job.success);
> > +
> > +            return job.success;
> > +        };
> > +
> > +    iosfc_desc.unmapMemory =
> > +        ^bool(void *a, void *b, void *c, void *d, void *e, void *f) {
> > +            trace_apple_gfx_iosfc_unmap_memory(a, b, c, d, e, f);
> > +            return true;
> > +        };
> > +
> > +    iosfc_desc.raiseInterrupt = ^bool(uint32_t vector) {
> > +        trace_apple_gfx_iosfc_raise_irq(vector);
> > +        aio_bh_schedule_oneshot(qemu_get_aio_context(),
> raise_iosfc_irq, s);
>
> Let's pass s->irq_iosfc here to unify raise_iosfc_irq() and
> raise_gfx_irq().
>
> > +        return true;
> > +    };
> > +
> > +    iosfc_host_dev =
> > +        [[PGIOSurfaceHostDevice alloc] initWithDescriptor:iosfc_desc];
> > +    [iosfc_desc release];
> > +    return iosfc_host_dev;
> > +}
> > +
> > +static void raise_gfx_irq(void *opaque)
> > +{
> > +    AppleGFXMMIOState *s = opaque;
> > +
> > +    qemu_irq_pulse(s->irq_gfx);
> > +}
> > +
> > +static void apple_gfx_mmio_realize(DeviceState *dev, Error **errp)
> > +{
> > +    @autoreleasepool {
> > +        AppleGFXMMIOState *s = APPLE_GFX_MMIO(dev);
> > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
> > +
> > +        desc.raiseInterrupt = ^(uint32_t vector) {
> > +            trace_apple_gfx_raise_irq(vector);
> > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> raise_gfx_irq, s);
> > +        };
> > +
> > +        desc.usingIOSurfaceMapper = true;
> > +        s->pgiosfc = apple_gfx_prepare_iosurface_host_device(s);
> > +
> > +        apple_gfx_common_realize(&s->common, desc, errp);
> > +        [desc release];
> > +        desc = nil;
> > +    }
> > +}
> > +
> > +static void apple_gfx_mmio_init(Object *obj)
> > +{
> > +    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
> > +
> > +    apple_gfx_common_init(obj, &s->common, TYPE_APPLE_GFX_MMIO);
> > +
> > +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->common.iomem_gfx);
> > +    memory_region_init_io(&s->iomem_iosfc, obj, &apple_iosfc_ops, s,
> > +                          TYPE_APPLE_GFX_MMIO, 0x10000);
> > +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem_iosfc);
> > +    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_gfx);
> > +    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_iosfc);
> > +}
> > +
> > +static void apple_gfx_mmio_reset(Object *obj, ResetType type)
> > +{
> > +    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
> > +    [s->common.pgdev reset];
> > +}
> > +
> > +
> > +static void apple_gfx_mmio_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +    ResettableClass *rc = RESETTABLE_CLASS(klass);
> > +
> > +    rc->phases.hold = apple_gfx_mmio_reset;
> > +    dc->hotpluggable = false;
> > +    dc->realize = apple_gfx_mmio_realize;
> > +}
> > +
> > +static TypeInfo apple_gfx_mmio_types[] = {
> > +    {
> > +        .name          = TYPE_APPLE_GFX_MMIO,
> > +        .parent        = TYPE_SYS_BUS_DEVICE,
> > +        .instance_size = sizeof(AppleGFXMMIOState),
> > +        .class_init    = apple_gfx_mmio_class_init,
> > +        .instance_init = apple_gfx_mmio_init,
> > +    }
> > +};
> > +DEFINE_TYPES(apple_gfx_mmio_types)
> > diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
> > new file mode 100644
> > index 00000000000..39931fba65a
> > --- /dev/null
> > +++ b/hw/display/apple-gfx.h
> > @@ -0,0 +1,58 @@
> > +#ifndef QEMU_APPLE_GFX_H
> > +#define QEMU_APPLE_GFX_H
> > +
> > +#define TYPE_APPLE_GFX_MMIO         "apple-gfx-mmio"
> > +#define TYPE_APPLE_GFX_PCI          "apple-gfx-pci"
> > +
> > +#include "qemu/osdep.h"
> > +#include <dispatch/dispatch.h>
> > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> > +#include "qemu/typedefs.h"
> > +#include "exec/memory.h"
> > +#include "ui/surface.h"
> > +
> > +@class PGDeviceDescriptor;
> > +@protocol PGDevice;
> > +@protocol PGDisplay;
> > +@protocol MTLDevice;
> > +@protocol MTLTexture;
> > +@protocol MTLCommandQueue;
> > +
> > +typedef QTAILQ_HEAD(, PGTask_s) PGTaskList;
> > +
> > +struct AppleGFXMapMemoryJob;
>
> Probably this declaration of AppleGFXMapMemoryJob is unnecessary.
>
> > +typedef struct AppleGFXState {
> > +    MemoryRegion iomem_gfx;
> > +    id<PGDevice> pgdev;
> > +    id<PGDisplay> pgdisp;
> > +    PGTaskList tasks;
> > +    QemuConsole *con;
> > +    id<MTLDevice> mtl;
> > +    id<MTLCommandQueue> mtl_queue;
> > +    bool cursor_show;
> > +    QEMUCursor *cursor;
> > +
> > +    /* For running PVG memory-mapping requests in the AIO context */
> > +    QemuCond job_cond;
> > +    QemuMutex job_mutex;
>
> Use: QemuEvent
>

Hmm. I think if we were to use that, we would need to create a new
QemuEvent for every job and destroy it afterward, which seems expensive. We
can't rule out multiple concurrent jobs being submitted, and the QemuEvent
system only supports a single producer as far as I can tell.

You can probably sort of hack around it with just one QemuEvent by putting
the qemu_event_wait into a loop and turning the job.done flag into an
atomic (because it would now need to be checked outside the lock) but this
all seems unnecessarily complicated considering the QemuEvent uses the same
mechanism QemuCond/QemuMutex internally on macOS (the only platform
relevant here), except we can use it as intended with QemuCond/QemuMutex
rather than having to work against the abstraction.


> > +
> > +    dispatch_queue_t render_queue;
> > +    /* The following fields should only be accessed from the BQL: */
>
> Perhaps it may be better to document fields that can be accessed
> *without* the BQL; most things in QEMU implicitly require the BQL.
>
> > +    bool gfx_update_requested;
> > +    bool new_frame_ready;
> > +    bool using_managed_texture_storage;
> > +} AppleGFXState;
> > +
> > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char*
> obj_name);
> > +void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor
> *desc,
> > +                              Error **errp);
> > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
> > +                                               uint64_t length, bool
> read_only);
> > +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag);
> > +
> > +#endif
> > +
> > diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
> > new file mode 100644
> > index 00000000000..46be9957f69
> > --- /dev/null
> > +++ b/hw/display/apple-gfx.m
> > @@ -0,0 +1,713 @@
> > +/*
> > + * QEMU Apple ParavirtualizedGraphics.framework device
> > + *
> > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights
> Reserved.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + * ParavirtualizedGraphics.framework is a set of libraries that macOS
> provides
> > + * which implements 3d graphics passthrough to the host as well as a
> > + * proprietary guest communication channel to drive it. This device
> model
> > + * implements support to drive that library from within QEMU.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> > +#include <mach/mach_vm.h>
> > +#include "apple-gfx.h"
> > +#include "trace.h"
> > +#include "qemu-main.h"
> > +#include "exec/address-spaces.h"
> > +#include "migration/blocker.h"
> > +#include "monitor/monitor.h"
> > +#include "qemu/main-loop.h"
> > +#include "qemu/cutils.h"
> > +#include "qemu/log.h"
> > +#include "qapi/visitor.h"
> > +#include "qapi/error.h"
> > +#include "ui/console.h"
> > +
> > +static const PGDisplayCoord_t apple_gfx_modes[] = {
> > +    { .x = 1440, .y = 1080 },
> > +    { .x = 1280, .y = 1024 },
> > +};
> > +
> > +/* This implements a type defined in
> <ParavirtualizedGraphics/PGDevice.h>
> > + * which is opaque from the framework's point of view. Typedef PGTask_t
> already
> > + * exists in the framework headers. */
> > +struct PGTask_s {
> > +    QTAILQ_ENTRY(PGTask_s) node;
> > +    mach_vm_address_t address;
> > +    uint64_t len;
> > +};
> > +
> > +static Error *apple_gfx_mig_blocker;
>
> This does not have to be a static variable.
>

Hmm, the first 5 or so examples of migration blockers in other devices etc.
I could find were all declared in this way. What are you suggesting as the
alternative? And why not use the same pattern as in most of the rest of the
code base?


> > +
> > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> > +                                             uint32_t width, uint32_t
> height);
> > +
> > +static inline dispatch_queue_t get_background_queue(void)
>
> Don't add inline. The only effect for modern compilers of inline is to
> suppress the unused function warnings.
>
> > +{
> > +    return dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT,
> 0);
> > +}
> > +
> > +static PGTask_t *apple_gfx_new_task(AppleGFXState *s, uint64_t len)
> > +{
> > +    mach_vm_address_t task_mem;
> > +    PGTask_t *task;
> > +    kern_return_t r;
> > +
> > +    r = mach_vm_allocate(mach_task_self(), &task_mem, len,
> VM_FLAGS_ANYWHERE);
> > +    if (r != KERN_SUCCESS || task_mem == 0) {
>
> Let's remove the check for task_mem == 0. We have no reason to reject it
> if the platform insists it allocated a memory at address 0 though such a
> situation should never happen in practice.
>
> > +        return NULL;
> > +    }
> > +
> > +    task = g_new0(PGTask_t, 1);
> > +
> > +    task->address = task_mem;
> > +    task->len = len;
> > +    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
> > +
> > +    return task;
> > +}
> > +
> > +typedef struct AppleGFXIOJob {
> > +    AppleGFXState *state;
> > +    uint64_t offset;
> > +    uint64_t value;
> > +    bool completed;
> > +} AppleGFXIOJob;
> > +
> > +static void apple_gfx_do_read(void *opaque)
> > +{
> > +    AppleGFXIOJob *job = opaque;
> > +    job->value = [job->state->pgdev mmioReadAtOffset:job->offset];
> > +    qatomic_set(&job->completed, true);
> > +    aio_wait_kick();
> > +}
> > +
> > +static uint64_t apple_gfx_read(void *opaque, hwaddr offset, unsigned
> size)
> > +{
> > +    AppleGFXIOJob job = {
> > +        .state = opaque,
> > +        .offset = offset,
> > +        .completed = false,
> > +    };
> > +    AioContext *context = qemu_get_aio_context();
> > +    dispatch_queue_t queue = get_background_queue();
> > +
> > +    dispatch_async_f(queue, &job, apple_gfx_do_read);
> > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> > +
> > +    trace_apple_gfx_read(offset, job.value);
> > +    return job.value;
> > +}
> > +
> > +static void apple_gfx_do_write(void *opaque)
> > +{
> > +    AppleGFXIOJob *job = opaque;
> > +    [job->state->pgdev mmioWriteAtOffset:job->offset value:job->value];
> > +    qatomic_set(&job->completed, true);
> > +    aio_wait_kick();
> > +}
> > +
> > +static void apple_gfx_write(void *opaque, hwaddr offset, uint64_t val,
> > +                            unsigned size)
> > +{
> > +    /* The methods mmioReadAtOffset: and especially mmioWriteAtOffset:
> can
> > +     * trigger and block on operations on other dispatch queues, which
> in turn
> > +     * may call back out on one or more of the callback blocks. For
> this reason,
> > +     * and as we are holding the BQL, we invoke the I/O methods on a
> pool
> > +     * thread and handle AIO tasks while we wait. Any work in the
> callbacks
> > +     * requiring the BQL will in turn schedule BHs which this thread
> will
> > +     * process while waiting. */
> > +    AppleGFXIOJob job = {
> > +        .state = opaque,
> > +        .offset = offset,
> > +        .value = val,
> > +        .completed = false,
> > +    };
> > +    AioContext *context = qemu_get_current_aio_context();
> > +    dispatch_queue_t queue = get_background_queue();
> > +
> > +    dispatch_async_f(queue, &job, apple_gfx_do_write);
> > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> > +
> > +    trace_apple_gfx_write(offset, val);
> > +}
> > +
> > +static const MemoryRegionOps apple_gfx_ops = {
> > +    .read = apple_gfx_read,
> > +    .write = apple_gfx_write,
> > +    .endianness = DEVICE_LITTLE_ENDIAN,
> > +    .valid = {
> > +        .min_access_size = 4,
> > +        .max_access_size = 8,
> > +    },
> > +    .impl = {
> > +        .min_access_size = 4,
> > +        .max_access_size = 4,
> > +    },
> > +};
> > +
> > +static void apple_gfx_render_new_frame_bql_unlock(AppleGFXState *s)
> > +{
> > +    BOOL r;
> > +    uint32_t width = surface_width(s->surface);
> > +    uint32_t height = surface_height(s->surface);
> > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> > +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue commandBuffer];
> > +    id<MTLTexture> texture = s->texture;
> > +
> > +    assert(bql_locked());
> > +    [texture retain];
> > +
> > +    bql_unlock();
> > +
> > +    /* This is not safe to call from the BQL due to PVG-internal locks
> causing
> > +     * deadlocks. */
> > +    r = [s->pgdisp encodeCurrentFrameToCommandBuffer:command_buffer
> > +                                             texture:texture
> > +                                              region:region];
> > +    if (!r) {
> > +        [texture release];
> > +        bql_lock();
> > +        --s->pending_frames;
> > +        bql_unlock();
> > +        qemu_log_mask(LOG_GUEST_ERROR,
> "apple_gfx_render_new_frame_bql_unlock: "
>
> Use: __func__
>
> > +
> "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
> > +        return;
> > +    }
> > +
> > +    if (s->using_managed_texture_storage) {
> > +        /* "Managed" textures exist in both VRAM and RAM and must be
> synced. */
> > +        id<MTLBlitCommandEncoder> blit = [command_buffer
> blitCommandEncoder];
> > +        [blit synchronizeResource:texture];
> > +        [blit endEncoding];
> > +    }
> > +    [texture release];
> > +    [command_buffer addCompletedHandler:
> > +        ^(id<MTLCommandBuffer> cb)
> > +        {
> > +            dispatch_async(s->render_queue, ^{
> > +                apple_gfx_render_frame_completed(s, width, height);
> > +            });
> > +        }];
> > +    [command_buffer commit];
> > +}
> > +
> > +static void copy_mtl_texture_to_surface_mem(id<MTLTexture> texture,
> void *vram)
> > +{
> > +    /* TODO: Skip this entirely on a pure Metal or headless/guest-only
> > +     * rendering path, else use a blit command encoder? Needs careful
> > +     * (double?) buffering design. */
> > +    size_t width = texture.width, height = texture.height;
> > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> > +    [texture getBytes:vram
> > +          bytesPerRow:(width * 4)
> > +        bytesPerImage:(width * height * 4)
> > +           fromRegion:region
> > +          mipmapLevel:0
> > +                slice:0];
> > +}copy_mtl_texture_to_surface_mem
> > +
> > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> > +                                             uint32_t width, uint32_t
> height)
> > +{
> > +    bql_lock();
> > +    --s->pending_frames;
> > +    assert(s->pending_frames >= 0);
> > +
> > +    /* Only update display if mode hasn't changed since we started
> rendering. */
> > +    if (width == surface_width(s->surface) &&
> > +        height == surface_height(s->surface)) {
> > +        copy_mtl_texture_to_surface_mem(s->texture, s->vram);
> > +        if (s->gfx_update_requested) {
> > +            s->gfx_update_requested = false;
> > +            dpy_gfx_update_full(s->con);
> > +            graphic_hw_update_done(s->con);
> > +            s->new_frame_ready = false;
> > +        } else {
> > +            s->new_frame_ready = true;
> > +        }
> > +    }
> > +    if (s->pending_frames > 0) {
> > +        apple_gfx_render_new_frame_bql_unlock(s);
> > +    } else {
> > +        bql_unlock();
> > +    }
> > +}
> > +
> > +static void apple_gfx_fb_update_display(void *opaque)
> > +{
> > +    AppleGFXState *s = opaque;
> > +
> > +    assert(bql_locked());
> > +    if (s->new_frame_ready) {
> > +        dpy_gfx_update_full(s->con);
> > +        s->new_frame_ready = false;
> > +        graphic_hw_update_done(s->con);
> > +    } else if (s->pending_frames > 0) {
> > +        s->gfx_update_requested = true;
> > +    } else {
> > +        graphic_hw_update_done(s->con);
> > +    }
> > +}
> > +
> > +static const GraphicHwOps apple_gfx_fb_ops = {
> > +    .gfx_update = apple_gfx_fb_update_display,
> > +    .gfx_update_async = true,
> > +};
> > +
> > +static void update_cursor(AppleGFXState *s)
> > +{
> > +    assert(bql_locked());
> > +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
> > +                  s->pgdisp.cursorPosition.y, s->cursor_show);
> > +}
> > +
> > +static void set_mode(AppleGFXState *s, uint32_t width, uint32_t height)
> > +{
> > +    MTLTextureDescriptor *textureDescriptor;
> > +
> > +    if (s->surface &&
> > +        width == surface_width(s->surface) &&
> > +        height == surface_height(s->surface)) {
> > +        return;
> > +    }
> > +
> > +    g_free(s->vram);
> > +    [s->texture release];
> > +
> > +    s->vram = g_malloc0_n(width * height, 4);
> > +    s->surface = qemu_create_displaysurface_from(width, height,
> PIXMAN_LE_a8r8g8b8,
> > +                                                 width * 4, s->vram);>
> +> +    @autoreleasepool {
> > +        textureDescriptor =
> > +            [MTLTextureDescriptor
> > +
> texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
> > +                                             width:width
> > +                                            height:height
> > +                                         mipmapped:NO];
> > +        textureDescriptor.usage = s->pgdisp.minimumTextureUsage;
> > +        s->texture = [s->mtl
> newTextureWithDescriptor:textureDescriptor];
>
>
> What about creating pixman_image_t from s->texture.buffer.contents? This
> should save memory usage by removing the duplication of texture.
>

We need explicit control over when the GPU vs when the CPU may access the
texture - only one of them may access them at a time. As far as I can tell,
we can't control when the rest of Qemu might access the pixman_image used
in the console surface?


> > +    }
> > +
> > +    s->using_managed_texture_storage =
> > +        (s->texture.storageMode == MTLStorageModeManaged);
> > +    dpy_gfx_replace_surface(s->con, s->surface);
> > +}
> > +
> > +static void create_fb(AppleGFXState *s)
> > +{
> > +    s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
> > +    set_mode(s, 1440, 1080);
> > +
> > +    s->cursor_show = true;
> > +}
> > +
> > +static size_t apple_gfx_get_default_mmio_range_size(void)
> > +{
> > +    size_t mmio_range_size;
> > +    @autoreleasepool {
> > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
> > +        mmio_range_size = desc.mmioLength;
> > +        [desc release];
> > +    }
> > +    return mmio_range_size;
> > +}
> > +
> > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char*
> obj_name)
> > +{
> > +    size_t mmio_range_size = apple_gfx_get_default_mmio_range_size();
> > +
> > +    trace_apple_gfx_common_init(obj_name, mmio_range_size);
> > +    memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s,
> obj_name,
> > +                          mmio_range_size);
> > +
> > +    /* TODO: PVG framework supports serialising device state: integrate
> it! */
> > +}
> > +
> > +typedef struct AppleGFXMapMemoryJob {
> > +    AppleGFXState *state;
> > +    PGTask_t *task;
> > +    uint64_t virtual_offset;
> > +    PGPhysicalMemoryRange_t *ranges;
> > +    uint32_t range_count;
> > +    bool read_only;
> > +    bool success;
> > +    bool done;
> > +} AppleGFXMapMemoryJob;
> > +
> > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
> > +                                               uint64_t length, bool
> read_only)
> > +{
> > +    MemoryRegion *ram_region;
> > +    uintptr_t host_address;
> > +    hwaddr ram_region_offset = 0;
> > +    hwaddr ram_region_length = length;
> > +
> > +    ram_region = address_space_translate(&address_space_memory,
> > +                                         guest_physical,
> > +                                         &ram_region_offset,
> > +                                         &ram_region_length, !read_only,
> > +                                         MEMTXATTRS_UNSPECIFIED);
>
> Call memory_region_ref() so that it won't go away.
>
> > +
> > +    if (!ram_region || ram_region_length < length ||
> > +        !memory_access_is_direct(ram_region, !read_only)) {
> > +        return 0;
> > +    }
> > +
> > +    host_address =
> (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);
>
> host_address is typed as uintptr_t, not mach_vm_address_t.
>
> > +    if (host_address == 0) {
> > +        return 0;
> > +    }
> > +    host_address += ram_region_offset;
> > +
> > +    return host_address;
> > +}
> > +
> > +static void apple_gfx_map_memory(void *opaque)
> > +{
> > +    AppleGFXMapMemoryJob *job = opaque;
> > +    AppleGFXState *s = job->state;
> > +    PGTask_t *task                  = job->task;
> > +    uint32_t range_count            = job->range_count;
> > +    uint64_t virtual_offset         = job->virtual_offset;
> > +    PGPhysicalMemoryRange_t *ranges = job->ranges;
> > +    bool read_only                  = job->read_only;
> > +    kern_return_t r;
> > +    mach_vm_address_t target, source;
> > +    vm_prot_t cur_protection, max_protection;
> > +    bool success = true;
> > +
> > +    g_assert(bql_locked());
> > +
> > +    trace_apple_gfx_map_memory(task, range_count, virtual_offset,
> read_only);
> > +    for (int i = 0; i < range_count; i++) {
> > +        PGPhysicalMemoryRange_t *range = &ranges[i];
> > +
> > +        target = task->address + virtual_offset;
> > +        virtual_offset += range->physicalLength;
> > +
> > +        trace_apple_gfx_map_memory_range(i, range->physicalAddress,
> > +                                         range->physicalLength);
> > +
> > +        source =
> apple_gfx_host_address_for_gpa_range(range->physicalAddress,
> > +
> range->physicalLength,
> > +                                                      read_only);
> > +        if (source == 0) {
> > +            success = false;
> > +            continue;
> > +        }
> > +
> > +        MemoryRegion* alt_mr = NULL;
> > +        mach_vm_address_t alt_source =
> (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress,
> range->physicalLength, NULL);
> > +        g_assert(alt_source == source);
>
> Remove this; I guess this is for debugging.
>
> > +
> > +        cur_protection = 0;
> > +        max_protection = 0;
> > +        // Map guest RAM at range->physicalAddress into PG task memory
> range
> > +        r = mach_vm_remap(mach_task_self(),
> > +                          &target, range->physicalLength, vm_page_size
> - 1,
> > +                          VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
> > +                          mach_task_self(),
> > +                          source, false /* shared mapping, no copy */,
> > +                          &cur_protection, &max_protection,
> > +                          VM_INHERIT_COPY);
> > +        trace_apple_gfx_remap(r, source, target);
> > +        g_assert(r == KERN_SUCCESS);
> > +    }
> > +
> > +    qemu_mutex_lock(&s->job_mutex);
> > +    job->success = success;
> > +    job->done = true;
> > +    qemu_cond_broadcast(&s->job_cond);
> > +    qemu_mutex_unlock(&s->job_mutex);
> > +}
> > +
> > +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag)
> > +{
> > +    qemu_mutex_lock(&s->job_mutex);
> > +    while (!*job_done_flag) {
> > +        qemu_cond_wait(&s->job_cond, &s->job_mutex);
> > +    }
> > +    qemu_mutex_unlock(&s->job_mutex);
> > +}
> > +
> > +typedef struct AppleGFXReadMemoryJob {
> > +    AppleGFXState *s;
> > +    hwaddr physical_address;
> > +    uint64_t length;
> > +    void *dst;
> > +    bool done;
> > +} AppleGFXReadMemoryJob;
> > +
> > +static void apple_gfx_do_read_memory(void *opaque)
> > +{
> > +    AppleGFXReadMemoryJob *job = opaque;
> > +    AppleGFXState *s = job->s;
> > +
> > +    cpu_physical_memory_read(job->physical_address, job->dst,
> job->length);
>
> Use: dma_memory_read()
>
> > +
> > +    qemu_mutex_lock(&s->job_mutex);
> > +    job->done = true;
> > +    qemu_cond_broadcast(&s->job_cond);
> > +    qemu_mutex_unlock(&s->job_mutex);
> > +}
> > +
> > +static void apple_gfx_read_memory(AppleGFXState *s, hwaddr
> physical_address,
> > +                                  uint64_t length, void *dst)
> > +{
> > +    AppleGFXReadMemoryJob job = {
> > +        s, physical_address, length, dst
> > +    };
> > +
> > +    trace_apple_gfx_read_memory(physical_address, length, dst);
> > +
> > +    /* Traversing the memory map requires RCU/BQL, so do it in a BH. */
> > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> apple_gfx_do_read_memory,
> > +                            &job);
> > +    apple_gfx_await_bh_job(s, &job.done);
> > +}
> > +
> > +static void apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
> > +                                                     PGDeviceDescriptor
> *desc)
> > +{
> > +    desc.createTask = ^(uint64_t vmSize, void * _Nullable * _Nonnull
> baseAddress) {
> > +        PGTask_t *task = apple_gfx_new_task(s, vmSize);
> > +        *baseAddress = (void *)task->address;
> > +        trace_apple_gfx_create_task(vmSize, *baseAddress);
> > +        return task;
> > +    };
> > +
> > +    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
> > +        trace_apple_gfx_destroy_task(task);
> > +        QTAILQ_REMOVE(&s->tasks, task, node);
> > +        mach_vm_deallocate(mach_task_self(), task->address, task->len);
> > +        g_free(task);
> > +    };
> > +
> > +    desc.mapMemory = ^bool(PGTask_t * _Nonnull task, uint32_t
> range_count,
> > +                       uint64_t virtual_offset, bool read_only,
> > +                       PGPhysicalMemoryRange_t * _Nonnull ranges) {
> > +        AppleGFXMapMemoryJob job = {
> > +            .state = s,
> > +            .task = task, .ranges = ranges, .range_count = range_count,
> > +            .read_only = read_only, .virtual_offset = virtual_offset,
> > +            .done = false, .success = true,
> > +        };
> > +        if (range_count > 0) {
> > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> > +                                    apple_gfx_map_memory, &job);
> > +            apple_gfx_await_bh_job(s, &job.done);
> > +        }
> > +        return job.success;
> > +    };
> > +
> > +    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task, uint64_t
> virtualOffset,
> > +                         uint64_t length) {
> > +        kern_return_t r;
> > +        mach_vm_address_t range_address;
> > +
> > +        trace_apple_gfx_unmap_memory(task, virtualOffset, length);
> > +
> > +        /* Replace task memory range with fresh pages, undoing the
> mapping
> > +         * from guest RAM. */
> > +        range_address = task->address + virtualOffset;
> > +        r = mach_vm_allocate(mach_task_self(), &range_address, length,
> > +                             VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE);
> > +        g_assert(r == KERN_SUCCESS);error_setg
>
> An extra error_setg
>
> > +
> > +        return true;
> > +    };
> > +
> > +    desc.readMemory = ^bool(uint64_t physical_address, uint64_t length,
> > +                            void * _Nonnull dst) {
> > +        apple_gfx_read_memory(s, physical_address, length, dst);
> > +        return true;
> > +    };
> > +}
> > +
> > +static PGDisplayDescriptor
> *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
> > +{
> > +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor new];
> > +
> > +    disp_desc.name = @"QEMU display";
> > +    disp_desc.sizeInMillimeters = NSMakeSize(400., 300.); /* A 20"
> display */
> > +    disp_desc.queue = dispatch_get_main_queue();
> > +    disp_desc.newFrameEventHandler = ^(void) {
> > +        trace_apple_gfx_new_frame();
> > +        dispatch_async(s->render_queue, ^{
> > +            /* Drop frames if we get too far ahead. */
> > +            bql_lock();
> > +            if (s->pending_frames >= 2) {
> > +                bql_unlock();
> > +                return;
> > +            }
> > +            ++s->pending_frames;
> > +            if (s->pending_frames > 1) {
> > +                bql_unlock();
> > +                return;
> > +            }
> > +            @autoreleasepool {
> > +                apple_gfx_render_new_frame_bql_unlock(s);
> > +            }
> > +        });
> > +    };
> > +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t sizeInPixels,
> > +                                    OSType pixelFormat) {
> > +        trace_apple_gfx_mode_change(sizeInPixels.x, sizeInPixels.y);
> > +
> > +        BQL_LOCK_GUARD();
> > +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
> > +    };
> > +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
> > +                                     PGDisplayCoord_t hotSpot) {
> > +        [glyph retain];
> > +        dispatch_async(get_background_queue(), ^{
> > +            BQL_LOCK_GUARD();
> > +            uint32_t bpp = glyph.bitsPerPixel;
> > +            size_t width = glyph.pixelsWide;
> > +            size_t height = glyph.pixelsHigh;
> > +            size_t padding_bytes_per_row = glyph.bytesPerRow - width *
> 4;
> > +            const uint8_t* px_data = glyph.bitmapData;
> > +
> > +            trace_apple_gfx_cursor_set(bpp, width, height);
> > +
> > +            if (s->cursor) {
> > +                cursor_unref(s->cursor);
> > +                s->cursor = NULL;
> > +            }
> > +
> > +            if (bpp == 32) { /* Shouldn't be anything else, but just to
> be safe...*/
> > +                s->cursor = cursor_alloc(width, height);
> > +                s->cursor->hot_x = hotSpot.x;
> > +                s->cursor->hot_y = hotSpot.y;
> > +
> > +                uint32_t *dest_px = s->cursor->data;
> > +
> > +                for (size_t y = 0; y < height; ++y) {
> > +                    for (size_t x = 0; x < width; ++x) {
> > +                        /* NSBitmapImageRep's red & blue channels are
> swapped
> > +                         * compared to QEMUCursor's. */
> > +                        *dest_px =
> > +                            (px_data[0] << 16u) |
> > +                            (px_data[1] <<  8u) |
> > +                            (px_data[2] <<  0u) |
> > +                            (px_data[3] << 24u);
> > +                        ++dest_px;
> > +                        px_data += 4;
> > +                    }
> > +                    px_data += padding_bytes_per_row;
> > +                }
> > +                dpy_cursor_define(s->con, s->cursor);
> > +                update_cursor(s);
> > +            }
> > +            [glyph release];
> > +        });
> > +    };
> > +    disp_desc.cursorShowHandler = ^(BOOL show) {
> > +        dispatch_async(get_background_queue(), ^{
> > +            BQL_LOCK_GUARD();
> > +            trace_apple_gfx_cursor_show(show);
> > +            s->cursor_show = show;
> > +            update_cursor(s);
> > +        });
> > +    };
> > +    disp_desc.cursorMoveHandler = ^(void) {
> > +        dispatch_async(get_background_queue(), ^{
> > +            BQL_LOCK_GUARD();
> > +            trace_apple_gfx_cursor_move();
> > +            update_cursor(s);
> > +        });
> > +    };
> > +
> > +    return disp_desc;
> > +}
> > +
> > +static NSArray<PGDisplayMode*>*
> apple_gfx_prepare_display_mode_array(void)
> > +{
> > +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
> > +    NSArray<PGDisplayMode*>* mode_array = nil;
> > +    int i;
> > +
> > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> > +        modes[i] =
> > +            [[PGDisplayMode alloc]
> initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
> > +    }
> > +
> > +    mode_array = [NSArray arrayWithObjects:modes
> count:ARRAY_SIZE(apple_gfx_modes)];
> > +
> > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> > +        [modes[i] release];
> > +        modes[i] = nil;
> > +    }
> > +
> > +    return mode_array;
> > +}
> > +
> > +static id<MTLDevice> copy_suitable_metal_device(void)
> > +{
> > +    id<MTLDevice> dev = nil;
> > +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
> > +
> > +    /* Prefer a unified memory GPU. Failing that, pick a non-removable
> GPU. */
> > +    for (size_t i = 0; i < devs.count; ++i) {
> > +        if (devs[i].hasUnifiedMemory) {
> > +            dev = devs[i];
> > +            break;
> > +        }
> > +        if (!devs[i].removable) {
> > +            dev = devs[i];
> > +        }
> > +    }
> > +
> > +    if (dev != nil) {
> > +        [dev retain];
> > +    } else {
> > +        dev = MTLCreateSystemDefaultDevice();
> > +    }
> > +    [devs release];
> > +
> > +    return dev;
> > +}
> > +
> > +void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor
> *desc,
> > +                              Error **errp)
> > +{
> > +    PGDisplayDescriptor *disp_desc = nil;
> > +
> > +    if (apple_gfx_mig_blocker == NULL) {
> > +        error_setg(&apple_gfx_mig_blocker,
> > +                  "Migration state blocked by apple-gfx display
> device");
> > +        if (migrate_add_blocker(&apple_gfx_mig_blocker, errp) < 0) {
> > +            return;
> > +        }
> > +    }
> > +
> > +    QTAILQ_INIT(&s->tasks);
> > +    s->render_queue = dispatch_queue_create("apple-gfx.render",
> > +                                            DISPATCH_QUEUE_SERIAL);
> > +    s->mtl = copy_suitable_metal_device();
> > +    s->mtl_queue = [s->mtl newCommandQueue];
> > +
> > +    desc.device = s->mtl;
> > +
> > +    apple_gfx_register_task_mapping_handlers(s, desc);
> > +
> > +    s->pgdev = PGNewDeviceWithDescriptor(desc);
> > +
> > +    disp_desc = apple_gfx_prepare_display_descriptor(s);
> > +    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
> > +                                              port:0 serialNum:1234];
> > +    [disp_desc release];
> > +    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
> > +
> > +    create_fb(s);
> > +
> > +    qemu_mutex_init(&s->job_mutex);
> > +    qemu_cond_init(&s->job_cond);
> > +}
> > diff --git a/hw/display/meson.build b/hw/display/meson.build
> > index 20a94973fa2..619e642905a 100644
> > --- a/hw/display/meson.build
> > +++ b/hw/display/meson.build
> > @@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true:
> files('artist.c'))
> >
> >   system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c',
> 'ati_2d.c', 'ati_dbg.c'), pixman])
> >
> > +system_ss.add(when: 'CONFIG_MAC_PVG',         if_true:
> [files('apple-gfx.m'), pvg, metal])
> > +if cpu == 'aarch64'
> > +  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true:
> [files('apple-gfx-mmio.m'), pvg, metal])
> > +endif
> >
> >   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
> >     virtio_gpu_ss = ss.source_set()
> > diff --git a/hw/display/trace-events b/hw/display/trace-events
> > index 781f8a33203..214998312b9 100644
> > --- a/hw/display/trace-events
> > +++ b/hw/display/trace-events
> > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width) "dest_width :
> %u"
> >   dm163_leds(int led, uint32_t value) "led %d: 0x%x"
> >   dm163_channels(int channel, uint8_t value) "channel %d: 0x%x"
> >   dm163_refresh_rate(uint32_t rr) "refresh rate %d"
> > +
> > +# apple-gfx.m
> > +apple_gfx_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64"
> res=0x%"PRIx64
> > +apple_gfx_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64"
> val=0x%"PRIx64
> > +apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=0x%x
> base_addr=%p"
> > +apple_gfx_destroy_task(void *task) "task=%p"
> > +apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t
> virtual_offset, uint32_t read_only) "task=%p range_count=0x%x
> virtual_offset=0x%"PRIx64" read_only=%d"
> > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr, uint64_t
> phys_len) "[%d] phys_addr=0x%"PRIx64" phys_len=0x%"PRIx64
> > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t target)
> "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
> > +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset, uint64_t
> length) "task=%p virtual_offset=0x%"PRIx64" length=0x%"PRIx64
> > +apple_gfx_read_memory(uint64_t phys_address, uint64_t length, void
> *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
> > +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
> > +apple_gfx_new_frame(void) ""
> > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64" y=%"PRId64
> > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t height)
> "bpp=%d width=%"PRId64" height=0x%"PRId64
> > +apple_gfx_cursor_show(uint32_t show) "show=%d"
> > +apple_gfx_cursor_move(void) ""
> > +apple_gfx_common_init(const char *device_name, size_t mmio_size)
> "device: %s; MMIO size: %zu bytes"
> > +
> > +# apple-gfx-mmio.m
> > +apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res)
> "offset=0x%"PRIx64" res=0x%"PRIx64
> > +apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val)
> "offset=0x%"PRIx64" val=0x%"PRIx64
> > +apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t ro,
> void *va, void *e, void *f, void* va_result, int success) "phys=0x%"PRIx64"
> len=0x%"PRIx64" ro=%d va=%p e=%p f=%p -> *va=%p, success = %d"
> > +apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c, void *d, void
> *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
> > +apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
> > +
> > diff --git a/meson.build b/meson.build
> > index d26690ce204..0e124eff13f 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -761,6 +761,8 @@ socket = []
> >   version_res = []
> >   coref = []
> >   iokit = []
> > +pvg = []
> > +metal = []
> >   emulator_link_args = []
> >   midl = not_found
> >   widl = not_found
> > @@ -782,6 +784,8 @@ elif host_os == 'darwin'
> >     coref = dependency('appleframeworks', modules: 'CoreFoundation')
> >     iokit = dependency('appleframeworks', modules: 'IOKit', required:
> false)
> >     host_dsosuf = '.dylib'
> > +  pvg = dependency('appleframeworks', modules:
> 'ParavirtualizedGraphics')
> > +  metal = dependency('appleframeworks', modules: 'Metal')
> >   elif host_os == 'sunos'
> >     socket = [cc.find_library('socket'),
> >               cc.find_library('nsl'),
>
>

[-- Attachment #2: Type: text/html, Size: 64625 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-25 19:43     ` Phil Dennis-Jordan
@ 2024-10-26  4:40       ` Akihiko Odaki
  2024-10-26 10:24         ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  4:40 UTC (permalink / raw)
  To: Phil Dennis-Jordan
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/26 4:43, Phil Dennis-Jordan wrote:
> 
> 
> On Fri, 25 Oct 2024 at 08:03, Akihiko Odaki <akihiko.odaki@daynix.com 
> <mailto:akihiko.odaki@daynix.com>> wrote:
> 
>     On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
>      > MacOS provides a framework (library) that allows any vmm to
>     implement a
>      > paravirtualized 3d graphics passthrough to the host metal stack
>     called
>      > ParavirtualizedGraphics.Framework (PVG). The library abstracts away
>      > almost every aspect of the paravirtualized device model and only
>     provides
>      > and receives callbacks on MMIO access as well as to share memory
>     address
>      > space between the VM and PVG.
>      >
>      > This patch implements a QEMU device that drives PVG for the VMApple
>      > variant of it.
>      >
>      > Signed-off-by: Alexander Graf <graf@amazon.com
>     <mailto:graf@amazon.com>>
>      > Co-authored-by: Alexander Graf <graf@amazon.com
>     <mailto:graf@amazon.com>>
>      >
>      > Subsequent changes:
>      >
>      >   * Cherry-pick/rebase conflict fixes, API use updates.
>      >   * Moved from hw/vmapple/ (useful outside that machine type)
>      >   * Overhaul of threading model, many thread safety improvements.
>      >   * Asynchronous rendering.
>      >   * Memory and object lifetime fixes.
>      >   * Refactoring to split generic and (vmapple) MMIO variant specific
>      >     code.
>      >
>      > Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu
>     <mailto:phil@philjordan.eu>>
>      > ---
>      >
>      > v2:
>      >
>      >   * Cherry-pick/rebase conflict fixes
>      >   * BQL function renaming
>      >   * Moved from hw/vmapple/ (useful outside that machine type)
>      >   * Code review comments: Switched to DEFINE_TYPES macro & little
>     endian
>      >     MMIO.
>      >   * Removed some dead/superfluous code
>      >   * Mad set_mode thread & memory safe
>      >   * Added migration blocker due to lack of (de-)serialisation.
>      >   * Fixes to ObjC refcounting and autorelease pool usage.
>      >   * Fixed ObjC new/init misuse
>      >   * Switched to ObjC category extension for private property.
>      >   * Simplified task memory mapping and made it thread safe.
>      >   * Refactoring to split generic and vmapple MMIO variant specific
>      >     code.
>      >   * Switched to asynchronous MMIO writes on x86-64
>      >   * Rendering and graphics update are now done asynchronously
>      >   * Fixed cursor handling
>      >   * Coding convention fixes
>      >   * Removed software cursor compositing
>      >
>      > v3:
>      >
>      >   * Rebased on latest upstream, fixed breakages including
>     switching to Resettable methods.
>      >   * Squashed patches dealing with dGPUs, MMIO area size, and GPU
>     picking.
>      >   * Allow re-entrant MMIO; this simplifies the code and solves
>     the divergence
>      >     between x86-64 and arm64 variants.
>      >
>      > v4:
>      >
>      >   * Renamed '-vmapple' device variant to '-mmio'
>      >   * MMIO device type now requires aarch64 host and guest
>      >   * Complete overhaul of the glue code for making Qemu's and
>      >     ParavirtualizedGraphics.framework's threading and
>     synchronisation models
>      >     work together. Calls into PVG are from dispatch queues while the
>      >     BQL-holding initiating thread processes AIO context events;
>     callbacks from
>      >     PVG are scheduled as BHs on the BQL/main AIO context,
>     awaiting completion
>      >     where necessary.
>      >   * Guest frame rendering state is covered by the BQL, with only
>     the PVG calls
>      >     outside the lock, and serialised on the named render_queue.
>      >   * Simplified logic for dropping frames in-flight during mode
>     changes, fixed
>      >     bug in pending frames logic.
>      >   * Addressed smaller code review notes such as: function naming,
>     object type
>      >     declarations, type names/declarations/casts, code formatting,
>     #include
>      >     order, over-cautious ObjC retain/release, what goes in init
>     vs realize,
>      >     etc.
>      >
>      >
>      >   hw/display/Kconfig          |   9 +
>      >   hw/display/apple-gfx-mmio.m | 284 ++++++++++++++
>      >   hw/display/apple-gfx.h      |  58 +++
>      >   hw/display/apple-gfx.m      | 713 +++++++++++++++++++++++++++++
>     +++++++
>      >   hw/display/meson.build      |   4 +
>      >   hw/display/trace-events     |  26 ++
>      >   meson.build                 |   4 +
>      >   7 files changed, 1098 insertions(+)
>      >   create mode 100644 hw/display/apple-gfx-mmio.m
>      >   create mode 100644 hw/display/apple-gfx.h
>      >   create mode 100644 hw/display/apple-gfx.m
>      >
>      > diff --git a/hw/display/Kconfig b/hw/display/Kconfig
>      > index 2250c740078..6a9b7b19ada 100644
>      > --- a/hw/display/Kconfig
>      > +++ b/hw/display/Kconfig
>      > @@ -140,3 +140,12 @@ config XLNX_DISPLAYPORT
>      >
>      >   config DM163
>      >       bool
>      > +
>      > +config MAC_PVG
>      > +    bool
>      > +    default y
>      > +
>      > +config MAC_PVG_MMIO
>      > +    bool
>      > +    depends on MAC_PVG && AARCH64
>      > +
>      > diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-
>     mmio.m
>      > new file mode 100644
>      > index 00000000000..06131bc23f1
>      > --- /dev/null
>      > +++ b/hw/display/apple-gfx-mmio.m
>      > @@ -0,0 +1,284 @@
>      > +/*
>      > + * QEMU Apple ParavirtualizedGraphics.framework device, MMIO
>     (arm64) variant
>      > + *
>      > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All
>     Rights Reserved.
>      > + *
>      > + * This work is licensed under the terms of the GNU GPL, version
>     2 or later.
>      > + * See the COPYING file in the top-level directory.
> 
>     Use SPDX-License-Identifier. You can find some examples with grep.
> 
> This was already part of the file when I took it over from Alex. I've 
> used SPDX on any new files I've started from scratch. (I can of course / 
> add/ the SPDX line here too.)

Please add it here too.

> 
>      > + *
>      > + * ParavirtualizedGraphics.framework is a set of libraries that
>     macOS provides
>      > + * which implements 3d graphics passthrough to the host as well as a
>      > + * proprietary guest communication channel to drive it. This
>     device model
>      > + * implements support to drive that library from within QEMU as
>     an MMIO-based
>      > + * system device for macOS on arm64 VMs.
>      > + */
>      > +
>      > +#include "qemu/osdep.h"
>      > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
>      > +#include "apple-gfx.h"
>      > +#include "monitor/monitor.h"
>      > +#include "hw/sysbus.h"
>      > +#include "hw/irq.h"
>      > +#include "trace.h"
>      > +
>      > +OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXMMIOState, APPLE_GFX_MMIO)
>      > +
>      > +/*
>      > + * ParavirtualizedGraphics.Framework only ships header files for
>     the PCI
>      > + * variant which does not include IOSFC descriptors and host
>     devices. We add
>      > + * their definitions here so that we can also work with the ARM
>     version.
>      > + */
>      > +typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector);
>      > +typedef bool(^IOSFCUnmapMemory)(
>      > +    void *, void *, void *, void *, void *, void *);
>      > +typedef bool(^IOSFCMapMemory)(
>      > +    uint64_t phys, uint64_t len, bool ro, void **va, void *,
>     void *);
>      > +
>      > +@interface PGDeviceDescriptor (IOSurfaceMapper)
>      > +@property (readwrite, nonatomic) bool usingIOSurfaceMapper;
>      > +@end
>      > +
>      > +@interface PGIOSurfaceHostDeviceDescriptor : NSObject
>      > +-(PGIOSurfaceHostDeviceDescriptor *)init;
>      > +@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory
>     mapMemory;
>      > +@property (readwrite, nonatomic, copy, nullable)
>     IOSFCUnmapMemory unmapMemory;
>      > +@property (readwrite, nonatomic, copy, nullable)
>     IOSFCRaiseInterrupt raiseInterrupt;
>      > +@end
>      > +
>      > +@interface PGIOSurfaceHostDevice : NSObject
>      > +-(instancetype)initWithDescriptor:
>     (PGIOSurfaceHostDeviceDescriptor *)desc;
>      > +-(uint32_t)mmioReadAtOffset:(size_t)offset;
>      > +-(void)mmioWriteAtOffset:(size_t)offset value:(uint32_t)value;
>      > +@end
>      > +
>      > +struct AppleGFXMapSurfaceMemoryJob;
>      > +struct AppleGFXMMIOState {
>      > +    SysBusDevice parent_obj;
>      > +
>      > +    AppleGFXState common;
>      > +
>      > +    qemu_irq irq_gfx;
>      > +    qemu_irq irq_iosfc;
>      > +    MemoryRegion iomem_iosfc;
>      > +    PGIOSurfaceHostDevice *pgiosfc;
>      > +};
>      > +
>      > +typedef struct AppleGFXMMIOJob {
>      > +    AppleGFXMMIOState *state;
>      > +    uint64_t offset;
>      > +    uint64_t value;
>      > +    bool completed;
>      > +} AppleGFXMMIOJob;
>      > +
>      > +static void iosfc_do_read(void *opaque)
>      > +{
>      > +    AppleGFXMMIOJob *job = opaque;
>      > +    job->value = [job->state->pgiosfc mmioReadAtOffset:job->offset];
>      > +    qatomic_set(&job->completed, true);
>      > +    aio_wait_kick();
>      > +}
>      > +
>      > +static uint64_t iosfc_read(void *opaque, hwaddr offset, unsigned
>     size)
>      > +{
>      > +    AppleGFXMMIOJob job = {
>      > +        .state = opaque,
>      > +        .offset = offset,
>      > +        .completed = false,
>      > +    };
>      > +    AioContext *context = qemu_get_aio_context();
>      > +    dispatch_queue_t queue =
>     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
>      > +
>      > +    dispatch_async_f(queue, &job, iosfc_do_read);
>      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> 
>     Pass NULL as the first argument of AIO_WAIT_WHILE().
> 
>      > +
>      > +    trace_apple_gfx_mmio_iosfc_read(offset, job.value);
>      > +    return job.value;
>      > +}
>      > +
>      > +static void iosfc_do_write(void *opaque)
>      > +{
>      > +    AppleGFXMMIOJob *job = opaque;
>      > +    [job->state->pgiosfc mmioWriteAtOffset:job->offset
>     value:job->value];
>      > +    qatomic_set(&job->completed, true);
>      > +    aio_wait_kick();
>      > +}
>      > +
>      > +static void iosfc_write(void *opaque, hwaddr offset, uint64_t val,
>      > +                        unsigned size)
>      > +{
>      > +    AppleGFXMMIOJob job = {
>      > +        .state = opaque,
>      > +        .offset = offset,
>      > +        .value = val,
>      > +        .completed = false,
>      > +    };
>      > +    AioContext *context = qemu_get_aio_context();
>      > +    dispatch_queue_t queue =
>     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
>      > +
>      > +    dispatch_async_f(queue, &job, iosfc_do_write);
>      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
>      > +
>      > +    trace_apple_gfx_mmio_iosfc_write(offset, val);
>      > +}
>      > +
>      > +static const MemoryRegionOps apple_iosfc_ops = {
>      > +    .read = iosfc_read,
>      > +    .write = iosfc_write,
>      > +    .endianness = DEVICE_LITTLE_ENDIAN,
>      > +    .valid = {
>      > +        .min_access_size = 4,
>      > +        .max_access_size = 8,
>      > +    },
>      > +    .impl = {
>      > +        .min_access_size = 4,
>      > +        .max_access_size = 8,
>      > +    },
>      > +};
>      > +
>      > +static void raise_iosfc_irq(void *opaque)
>      > +{
>      > +    AppleGFXMMIOState *s = opaque;
>      > +
>      > +    qemu_irq_pulse(s->irq_iosfc);
>      > +}
>      > +
>      > +typedef struct AppleGFXMapSurfaceMemoryJob {
>      > +    uint64_t guest_physical_address;
>      > +    uint64_t guest_physical_length;
>      > +    void *result_mem;
>      > +    AppleGFXMMIOState *state;
>      > +    bool read_only;
>      > +    bool success;
>      > +    bool done;
>      > +} AppleGFXMapSurfaceMemoryJob;
>      > +
>      > +static void apple_gfx_mmio_map_surface_memory(void *opaque)
>      > +{
>      > +    AppleGFXMapSurfaceMemoryJob *job = opaque;
>      > +    AppleGFXMMIOState *s = job->state;
>      > +    mach_vm_address_t mem;
>      > +
>      > +    mem = apple_gfx_host_address_for_gpa_range(job-
>      >guest_physical_address,
>      > +                                               job-
>      >guest_physical_length,
>      > +                                               job->read_only);
>      > +
>      > +    qemu_mutex_lock(&s->common.job_mutex);
>      > +    job->result_mem = (void*)mem;
> 
>     nit: write as (void *).
> 
>      > +    job->success = mem != 0;
>      > +    job->done = true;
>      > +    qemu_cond_broadcast(&s->common.job_cond);
>      > +    qemu_mutex_unlock(&s->common.job_mutex);
>      > +}
>      > +
>      > +static PGIOSurfaceHostDevice
>     *apple_gfx_prepare_iosurface_host_device(
>      > +    AppleGFXMMIOState *s)
>      > +{
>      > +    PGIOSurfaceHostDeviceDescriptor *iosfc_desc =
>      > +        [PGIOSurfaceHostDeviceDescriptor new];
>      > +    PGIOSurfaceHostDevice *iosfc_host_dev = nil;
>      > +
>      > +    iosfc_desc.mapMemory =
>      > +        ^bool(uint64_t phys, uint64_t len, bool ro, void **va,
>     void *e, void *f) {
>      > +            AppleGFXMapSurfaceMemoryJob job = {
>      > +                .guest_physical_address =
>     phys, .guest_physical_length = len,
>      > +                .read_only = ro, .state = s,
>      > +            };
>      > +
>      > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
>      > +                                   
>     apple_gfx_mmio_map_surface_memory, &job);
>      > +            apple_gfx_await_bh_job(&s->common, &job.done);
>      > +
>      > +            *va = job.result_mem;
>      > +
>      > +            trace_apple_gfx_iosfc_map_memory(phys, len, ro, va,
>     e, f, *va,
>      > +                                             job.success);
>      > +
>      > +            return job.success;
>      > +        };
>      > +
>      > +    iosfc_desc.unmapMemory =
>      > +        ^bool(void *a, void *b, void *c, void *d, void *e, void
>     *f) {
>      > +            trace_apple_gfx_iosfc_unmap_memory(a, b, c, d, e, f);
>      > +            return true;
>      > +        };
>      > +
>      > +    iosfc_desc.raiseInterrupt = ^bool(uint32_t vector) {
>      > +        trace_apple_gfx_iosfc_raise_irq(vector);
>      > +        aio_bh_schedule_oneshot(qemu_get_aio_context(),
>     raise_iosfc_irq, s);
> 
>     Let's pass s->irq_iosfc here to unify raise_iosfc_irq() and
>     raise_gfx_irq().
> 
>      > +        return true;
>      > +    };
>      > +
>      > +    iosfc_host_dev =
>      > +        [[PGIOSurfaceHostDevice alloc]
>     initWithDescriptor:iosfc_desc];
>      > +    [iosfc_desc release];
>      > +    return iosfc_host_dev;
>      > +}
>      > +
>      > +static void raise_gfx_irq(void *opaque)
>      > +{
>      > +    AppleGFXMMIOState *s = opaque;
>      > +
>      > +    qemu_irq_pulse(s->irq_gfx);
>      > +}
>      > +
>      > +static void apple_gfx_mmio_realize(DeviceState *dev, Error **errp)
>      > +{
>      > +    @autoreleasepool {
>      > +        AppleGFXMMIOState *s = APPLE_GFX_MMIO(dev);
>      > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
>      > +
>      > +        desc.raiseInterrupt = ^(uint32_t vector) {
>      > +            trace_apple_gfx_raise_irq(vector);
>      > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
>     raise_gfx_irq, s);
>      > +        };
>      > +
>      > +        desc.usingIOSurfaceMapper = true;
>      > +        s->pgiosfc = apple_gfx_prepare_iosurface_host_device(s);
>      > +
>      > +        apple_gfx_common_realize(&s->common, desc, errp);
>      > +        [desc release];
>      > +        desc = nil;
>      > +    }
>      > +}
>      > +
>      > +static void apple_gfx_mmio_init(Object *obj)
>      > +{
>      > +    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
>      > +
>      > +    apple_gfx_common_init(obj, &s->common, TYPE_APPLE_GFX_MMIO);
>      > +
>      > +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->common.iomem_gfx);
>      > +    memory_region_init_io(&s->iomem_iosfc, obj, &apple_iosfc_ops, s,
>      > +                          TYPE_APPLE_GFX_MMIO, 0x10000);
>      > +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem_iosfc);
>      > +    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_gfx);
>      > +    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq_iosfc);
>      > +}
>      > +
>      > +static void apple_gfx_mmio_reset(Object *obj, ResetType type)
>      > +{
>      > +    AppleGFXMMIOState *s = APPLE_GFX_MMIO(obj);
>      > +    [s->common.pgdev reset];
>      > +}
>      > +
>      > +
>      > +static void apple_gfx_mmio_class_init(ObjectClass *klass, void
>     *data)
>      > +{
>      > +    DeviceClass *dc = DEVICE_CLASS(klass);
>      > +    ResettableClass *rc = RESETTABLE_CLASS(klass);
>      > +
>      > +    rc->phases.hold = apple_gfx_mmio_reset;
>      > +    dc->hotpluggable = false;
>      > +    dc->realize = apple_gfx_mmio_realize;
>      > +}
>      > +
>      > +static TypeInfo apple_gfx_mmio_types[] = {
>      > +    {
>      > +        .name          = TYPE_APPLE_GFX_MMIO,
>      > +        .parent        = TYPE_SYS_BUS_DEVICE,
>      > +        .instance_size = sizeof(AppleGFXMMIOState),
>      > +        .class_init    = apple_gfx_mmio_class_init,
>      > +        .instance_init = apple_gfx_mmio_init,
>      > +    }
>      > +};
>      > +DEFINE_TYPES(apple_gfx_mmio_types)
>      > diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
>      > new file mode 100644
>      > index 00000000000..39931fba65a
>      > --- /dev/null
>      > +++ b/hw/display/apple-gfx.h
>      > @@ -0,0 +1,58 @@
>      > +#ifndef QEMU_APPLE_GFX_H
>      > +#define QEMU_APPLE_GFX_H
>      > +
>      > +#define TYPE_APPLE_GFX_MMIO         "apple-gfx-mmio"
>      > +#define TYPE_APPLE_GFX_PCI          "apple-gfx-pci"
>      > +
>      > +#include "qemu/osdep.h"
>      > +#include <dispatch/dispatch.h>
>      > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
>      > +#include "qemu/typedefs.h"
>      > +#include "exec/memory.h"
>      > +#include "ui/surface.h"
>      > +
>      > +@class PGDeviceDescriptor;
>      > +@protocol PGDevice;
>      > +@protocol PGDisplay;
>      > +@protocol MTLDevice;
>      > +@protocol MTLTexture;
>      > +@protocol MTLCommandQueue;
>      > +
>      > +typedef QTAILQ_HEAD(, PGTask_s) PGTaskList;
>      > +
>      > +struct AppleGFXMapMemoryJob;
> 
>     Probably this declaration of AppleGFXMapMemoryJob is unnecessary.
> 
>      > +typedef struct AppleGFXState {
>      > +    MemoryRegion iomem_gfx;
>      > +    id<PGDevice> pgdev;
>      > +    id<PGDisplay> pgdisp;
>      > +    PGTaskList tasks;
>      > +    QemuConsole *con;
>      > +    id<MTLDevice> mtl;
>      > +    id<MTLCommandQueue> mtl_queue;
>      > +    bool cursor_show;
>      > +    QEMUCursor *cursor;
>      > +
>      > +    /* For running PVG memory-mapping requests in the AIO context */
>      > +    QemuCond job_cond;
>      > +    QemuMutex job_mutex;
> 
>     Use: QemuEvent
> 
> 
> Hmm. I think if we were to use that, we would need to create a new 
> QemuEvent for every job and destroy it afterward, which seems expensive. 
> We can't rule out multiple concurrent jobs being submitted, and the 
> QemuEvent system only supports a single producer as far as I can tell.
> 
> You can probably sort of hack around it with just one QemuEvent by 
> putting the qemu_event_wait into a loop and turning the job.done flag 
> into an atomic (because it would now need to be checked outside the 
> lock) but this all seems unnecessarily complicated considering the 
> QemuEvent uses the same mechanism QemuCond/QemuMutex internally on macOS 
> (the only platform relevant here), except we can use it as intended with 
> QemuCond/QemuMutex rather than having to work against the abstraction.

I don't think it's going to be used concurrently. It would be difficult 
to reason even for the framework if it performs memory 
unmapping/mapping/reading operations concurrently. PGDevice.h also notes 
raiseInterrupt needs to be thread-safe while it doesn't make such notes 
for memory operations. This actually makes sense.

If it's ever going to be used concurrently, it's better to have 
QemuEvent for each job to avoid the thundering herd problem.

> 
>      > +
>      > +    dispatch_queue_t render_queue;
>      > +    /* The following fields should only be accessed from the BQL: */
> 
>     Perhaps it may be better to document fields that can be accessed
>     *without* the BQL; most things in QEMU implicitly require the BQL.
> 
>      > +    bool gfx_update_requested;
>      > +    bool new_frame_ready;
>      > +    bool using_managed_texture_storage;
>      > +} AppleGFXState;
>      > +
>      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const
>     char* obj_name);
>      > +void apple_gfx_common_realize(AppleGFXState *s,
>     PGDeviceDescriptor *desc,
>      > +                              Error **errp);
>      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
>     guest_physical,
>      > +                                               uint64_t length,
>     bool read_only);
>      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag);
>      > +
>      > +#endif
>      > +
>      > diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
>      > new file mode 100644
>      > index 00000000000..46be9957f69
>      > --- /dev/null
>      > +++ b/hw/display/apple-gfx.m
>      > @@ -0,0 +1,713 @@
>      > +/*
>      > + * QEMU Apple ParavirtualizedGraphics.framework device
>      > + *
>      > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All
>     Rights Reserved.
>      > + *
>      > + * This work is licensed under the terms of the GNU GPL, version
>     2 or later.
>      > + * See the COPYING file in the top-level directory.
>      > + *
>      > + * ParavirtualizedGraphics.framework is a set of libraries that
>     macOS provides
>      > + * which implements 3d graphics passthrough to the host as well as a
>      > + * proprietary guest communication channel to drive it. This
>     device model
>      > + * implements support to drive that library from within QEMU.
>      > + */
>      > +
>      > +#include "qemu/osdep.h"
>      > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
>      > +#include <mach/mach_vm.h>
>      > +#include "apple-gfx.h"
>      > +#include "trace.h"
>      > +#include "qemu-main.h"
>      > +#include "exec/address-spaces.h"
>      > +#include "migration/blocker.h"
>      > +#include "monitor/monitor.h"
>      > +#include "qemu/main-loop.h"
>      > +#include "qemu/cutils.h"
>      > +#include "qemu/log.h"
>      > +#include "qapi/visitor.h"
>      > +#include "qapi/error.h"
>      > +#include "ui/console.h"
>      > +
>      > +static const PGDisplayCoord_t apple_gfx_modes[] = {
>      > +    { .x = 1440, .y = 1080 },
>      > +    { .x = 1280, .y = 1024 },
>      > +};
>      > +
>      > +/* This implements a type defined in <ParavirtualizedGraphics/
>     PGDevice.h>
>      > + * which is opaque from the framework's point of view. Typedef
>     PGTask_t already
>      > + * exists in the framework headers. */
>      > +struct PGTask_s {
>      > +    QTAILQ_ENTRY(PGTask_s) node;
>      > +    mach_vm_address_t address;
>      > +    uint64_t len;
>      > +};
>      > +
>      > +static Error *apple_gfx_mig_blocker;
> 
>     This does not have to be a static variable.
> 
> 
> Hmm, the first 5 or so examples of migration blockers in other devices 
> etc. I could find were all declared in this way. What are you suggesting 
> as the alternative? And why not use the same pattern as in most of the 
> rest of the code base?

I was wrong. This is better to be a static variable to ensure we won't 
add the same blocker in case we have two device instances.

> 
>      > +
>      > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
>      > +                                             uint32_t width,
>     uint32_t height);
>      > +
>      > +static inline dispatch_queue_t get_background_queue(void)
> 
>     Don't add inline. The only effect for modern compilers of inline is to
>     suppress the unused function warnings.
> 
>      > +{
>      > +    return
>     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
>      > +}
>      > +
>      > +static PGTask_t *apple_gfx_new_task(AppleGFXState *s, uint64_t len)
>      > +{
>      > +    mach_vm_address_t task_mem;
>      > +    PGTask_t *task;
>      > +    kern_return_t r;
>      > +
>      > +    r = mach_vm_allocate(mach_task_self(), &task_mem, len,
>     VM_FLAGS_ANYWHERE);
>      > +    if (r != KERN_SUCCESS || task_mem == 0) {
> 
>     Let's remove the check for task_mem == 0. We have no reason to
>     reject it
>     if the platform insists it allocated a memory at address 0 though
>     such a
>     situation should never happen in practice.
> 
>      > +        return NULL;
>      > +    }
>      > +
>      > +    task = g_new0(PGTask_t, 1);
>      > +
>      > +    task->address = task_mem;
>      > +    task->len = len;
>      > +    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
>      > +
>      > +    return task;
>      > +}
>      > +
>      > +typedef struct AppleGFXIOJob {
>      > +    AppleGFXState *state;
>      > +    uint64_t offset;
>      > +    uint64_t value;
>      > +    bool completed;
>      > +} AppleGFXIOJob;
>      > +
>      > +static void apple_gfx_do_read(void *opaque)
>      > +{
>      > +    AppleGFXIOJob *job = opaque;
>      > +    job->value = [job->state->pgdev mmioReadAtOffset:job->offset];
>      > +    qatomic_set(&job->completed, true);
>      > +    aio_wait_kick();
>      > +}
>      > +
>      > +static uint64_t apple_gfx_read(void *opaque, hwaddr offset,
>     unsigned size)
>      > +{
>      > +    AppleGFXIOJob job = {
>      > +        .state = opaque,
>      > +        .offset = offset,
>      > +        .completed = false,
>      > +    };
>      > +    AioContext *context = qemu_get_aio_context();
>      > +    dispatch_queue_t queue = get_background_queue();
>      > +
>      > +    dispatch_async_f(queue, &job, apple_gfx_do_read);
>      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
>      > +
>      > +    trace_apple_gfx_read(offset, job.value);
>      > +    return job.value;
>      > +}
>      > +
>      > +static void apple_gfx_do_write(void *opaque)
>      > +{
>      > +    AppleGFXIOJob *job = opaque;
>      > +    [job->state->pgdev mmioWriteAtOffset:job->offset value:job-
>      >value];
>      > +    qatomic_set(&job->completed, true);
>      > +    aio_wait_kick();
>      > +}
>      > +
>      > +static void apple_gfx_write(void *opaque, hwaddr offset,
>     uint64_t val,
>      > +                            unsigned size)
>      > +{
>      > +    /* The methods mmioReadAtOffset: and especially
>     mmioWriteAtOffset: can
>      > +     * trigger and block on operations on other dispatch queues,
>     which in turn
>      > +     * may call back out on one or more of the callback blocks.
>     For this reason,
>      > +     * and as we are holding the BQL, we invoke the I/O methods
>     on a pool
>      > +     * thread and handle AIO tasks while we wait. Any work in
>     the callbacks
>      > +     * requiring the BQL will in turn schedule BHs which this
>     thread will
>      > +     * process while waiting. */
>      > +    AppleGFXIOJob job = {
>      > +        .state = opaque,
>      > +        .offset = offset,
>      > +        .value = val,
>      > +        .completed = false,
>      > +    };
>      > +    AioContext *context = qemu_get_current_aio_context();
>      > +    dispatch_queue_t queue = get_background_queue();
>      > +
>      > +    dispatch_async_f(queue, &job, apple_gfx_do_write);
>      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
>      > +
>      > +    trace_apple_gfx_write(offset, val);
>      > +}
>      > +
>      > +static const MemoryRegionOps apple_gfx_ops = {
>      > +    .read = apple_gfx_read,
>      > +    .write = apple_gfx_write,
>      > +    .endianness = DEVICE_LITTLE_ENDIAN,
>      > +    .valid = {
>      > +        .min_access_size = 4,
>      > +        .max_access_size = 8,
>      > +    },
>      > +    .impl = {
>      > +        .min_access_size = 4,
>      > +        .max_access_size = 4,
>      > +    },
>      > +};
>      > +
>      > +static void apple_gfx_render_new_frame_bql_unlock(AppleGFXState *s)
>      > +{
>      > +    BOOL r;
>      > +    uint32_t width = surface_width(s->surface);
>      > +    uint32_t height = surface_height(s->surface);
>      > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
>      > +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue
>     commandBuffer];
>      > +    id<MTLTexture> texture = s->texture;
>      > +
>      > +    assert(bql_locked());
>      > +    [texture retain];
>      > +
>      > +    bql_unlock();
>      > +
>      > +    /* This is not safe to call from the BQL due to PVG-internal
>     locks causing
>      > +     * deadlocks. */
>      > +    r = [s->pgdisp encodeCurrentFrameToCommandBuffer:command_buffer
>      > +                                             texture:texture
>      > +                                              region:region];
>      > +    if (!r) {
>      > +        [texture release];
>      > +        bql_lock();
>      > +        --s->pending_frames;
>      > +        bql_unlock();
>      > +        qemu_log_mask(LOG_GUEST_ERROR,
>     "apple_gfx_render_new_frame_bql_unlock: "
> 
>     Use: __func__
> 
>      > +                     
>     "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
>      > +        return;
>      > +    }
>      > +
>      > +    if (s->using_managed_texture_storage) {
>      > +        /* "Managed" textures exist in both VRAM and RAM and
>     must be synced. */
>      > +        id<MTLBlitCommandEncoder> blit = [command_buffer
>     blitCommandEncoder];
>      > +        [blit synchronizeResource:texture];
>      > +        [blit endEncoding];
>      > +    }
>      > +    [texture release];
>      > +    [command_buffer addCompletedHandler:
>      > +        ^(id<MTLCommandBuffer> cb)
>      > +        {
>      > +            dispatch_async(s->render_queue, ^{
>      > +                apple_gfx_render_frame_completed(s, width, height);
>      > +            });
>      > +        }];
>      > +    [command_buffer commit];
>      > +}
>      > +
>      > +static void copy_mtl_texture_to_surface_mem(id<MTLTexture>
>     texture, void *vram)
>      > +{
>      > +    /* TODO: Skip this entirely on a pure Metal or headless/
>     guest-only
>      > +     * rendering path, else use a blit command encoder? Needs
>     careful
>      > +     * (double?) buffering design. */
>      > +    size_t width = texture.width, height = texture.height;
>      > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
>      > +    [texture getBytes:vram
>      > +          bytesPerRow:(width * 4)
>      > +        bytesPerImage:(width * height * 4)
>      > +           fromRegion:region
>      > +          mipmapLevel:0
>      > +                slice:0];
>      > +}copy_mtl_texture_to_surface_mem
>      > +
>      > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
>      > +                                             uint32_t width,
>     uint32_t height)
>      > +{
>      > +    bql_lock();
>      > +    --s->pending_frames;
>      > +    assert(s->pending_frames >= 0);
>      > +
>      > +    /* Only update display if mode hasn't changed since we
>     started rendering. */
>      > +    if (width == surface_width(s->surface) &&
>      > +        height == surface_height(s->surface)) {
>      > +        copy_mtl_texture_to_surface_mem(s->texture, s->vram);
>      > +        if (s->gfx_update_requested) {
>      > +            s->gfx_update_requested = false;
>      > +            dpy_gfx_update_full(s->con);
>      > +            graphic_hw_update_done(s->con);
>      > +            s->new_frame_ready = false;
>      > +        } else {
>      > +            s->new_frame_ready = true;
>      > +        }
>      > +    }
>      > +    if (s->pending_frames > 0) {
>      > +        apple_gfx_render_new_frame_bql_unlock(s);
>      > +    } else {
>      > +        bql_unlock();
>      > +    }
>      > +}
>      > +
>      > +static void apple_gfx_fb_update_display(void *opaque)
>      > +{
>      > +    AppleGFXState *s = opaque;
>      > +
>      > +    assert(bql_locked());
>      > +    if (s->new_frame_ready) {
>      > +        dpy_gfx_update_full(s->con);
>      > +        s->new_frame_ready = false;
>      > +        graphic_hw_update_done(s->con);
>      > +    } else if (s->pending_frames > 0) {
>      > +        s->gfx_update_requested = true;
>      > +    } else {
>      > +        graphic_hw_update_done(s->con);
>      > +    }
>      > +}
>      > +
>      > +static const GraphicHwOps apple_gfx_fb_ops = {
>      > +    .gfx_update = apple_gfx_fb_update_display,
>      > +    .gfx_update_async = true,
>      > +};
>      > +
>      > +static void update_cursor(AppleGFXState *s)
>      > +{
>      > +    assert(bql_locked());
>      > +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
>      > +                  s->pgdisp.cursorPosition.y, s->cursor_show);
>      > +}
>      > +
>      > +static void set_mode(AppleGFXState *s, uint32_t width, uint32_t
>     height)
>      > +{
>      > +    MTLTextureDescriptor *textureDescriptor;
>      > +
>      > +    if (s->surface &&
>      > +        width == surface_width(s->surface) &&
>      > +        height == surface_height(s->surface)) {
>      > +        return;
>      > +    }
>      > +
>      > +    g_free(s->vram);
>      > +    [s->texture release];
>      > +
>      > +    s->vram = g_malloc0_n(width * height, 4);
>      > +    s->surface = qemu_create_displaysurface_from(width, height,
>     PIXMAN_LE_a8r8g8b8,
>      > +                                                 width * 4, s-
>      >vram);> +> +    @autoreleasepool {
>      > +        textureDescriptor =
>      > +            [MTLTextureDescriptor
>      > +               
>     texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
>      > +                                             width:width
>      > +                                            height:height
>      > +                                         mipmapped:NO];
>      > +        textureDescriptor.usage = s->pgdisp.minimumTextureUsage;
>      > +        s->texture = [s->mtl
>     newTextureWithDescriptor:textureDescriptor];
> 
> 
>     What about creating pixman_image_t from s->texture.buffer.contents?
>     This
>     should save memory usage by removing the duplication of texture.
> 
> 
> We need explicit control over when the GPU vs when the CPU may access 
> the texture - only one of them may access them at a time. As far as I 
> can tell, we can't control when the rest of Qemu might access the 
> pixman_image used in the console surface?

You are right; we need to have duplicate buffers. We can still avoid 
copying by using two MTLTextures for double-buffering instead of having 
a MTLTexture and a pixman_image and copying between them for 
MTLStorageModeManaged.

> 
>      > +    }
>      > +
>      > +    s->using_managed_texture_storage =
>      > +        (s->texture.storageMode == MTLStorageModeManaged);
>      > +    dpy_gfx_replace_surface(s->con, s->surface);
>      > +}
>      > +
>      > +static void create_fb(AppleGFXState *s)
>      > +{
>      > +    s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
>      > +    set_mode(s, 1440, 1080);
>      > +
>      > +    s->cursor_show = true;
>      > +}
>      > +
>      > +static size_t apple_gfx_get_default_mmio_range_size(void)
>      > +{
>      > +    size_t mmio_range_size;
>      > +    @autoreleasepool {
>      > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
>      > +        mmio_range_size = desc.mmioLength;
>      > +        [desc release];
>      > +    }
>      > +    return mmio_range_size;
>      > +}
>      > +
>      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const
>     char* obj_name)
>      > +{
>      > +    size_t mmio_range_size =
>     apple_gfx_get_default_mmio_range_size();
>      > +
>      > +    trace_apple_gfx_common_init(obj_name, mmio_range_size);
>      > +    memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s,
>     obj_name,
>      > +                          mmio_range_size);
>      > +
>      > +    /* TODO: PVG framework supports serialising device state:
>     integrate it! */
>      > +}
>      > +
>      > +typedef struct AppleGFXMapMemoryJob {
>      > +    AppleGFXState *state;
>      > +    PGTask_t *task;
>      > +    uint64_t virtual_offset;
>      > +    PGPhysicalMemoryRange_t *ranges;
>      > +    uint32_t range_count;
>      > +    bool read_only;
>      > +    bool success;
>      > +    bool done;
>      > +} AppleGFXMapMemoryJob;
>      > +
>      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
>     guest_physical,
>      > +                                               uint64_t length,
>     bool read_only)
>      > +{
>      > +    MemoryRegion *ram_region;
>      > +    uintptr_t host_address;
>      > +    hwaddr ram_region_offset = 0;
>      > +    hwaddr ram_region_length = length;
>      > +
>      > +    ram_region = address_space_translate(&address_space_memory,
>      > +                                         guest_physical,
>      > +                                         &ram_region_offset,
>      > +                                         &ram_region_length, !
>     read_only,
>      > +                                         MEMTXATTRS_UNSPECIFIED);
> 
>     Call memory_region_ref() so that it won't go away.
> 
>      > +
>      > +    if (!ram_region || ram_region_length < length ||
>      > +        !memory_access_is_direct(ram_region, !read_only)) {
>      > +        return 0;
>      > +    }
>      > +
>      > +    host_address =
>     (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);
> 
>     host_address is typed as uintptr_t, not mach_vm_address_t.
> 
>      > +    if (host_address == 0) {
>      > +        return 0;
>      > +    }
>      > +    host_address += ram_region_offset;
>      > +
>      > +    return host_address;
>      > +}
>      > +
>      > +static void apple_gfx_map_memory(void *opaque)
>      > +{
>      > +    AppleGFXMapMemoryJob *job = opaque;
>      > +    AppleGFXState *s = job->state;
>      > +    PGTask_t *task                  = job->task;
>      > +    uint32_t range_count            = job->range_count;
>      > +    uint64_t virtual_offset         = job->virtual_offset;
>      > +    PGPhysicalMemoryRange_t *ranges = job->ranges;
>      > +    bool read_only                  = job->read_only;
>      > +    kern_return_t r;
>      > +    mach_vm_address_t target, source;
>      > +    vm_prot_t cur_protection, max_protection;
>      > +    bool success = true;
>      > +
>      > +    g_assert(bql_locked());
>      > +
>      > +    trace_apple_gfx_map_memory(task, range_count,
>     virtual_offset, read_only);
>      > +    for (int i = 0; i < range_count; i++) {
>      > +        PGPhysicalMemoryRange_t *range = &ranges[i];
>      > +
>      > +        target = task->address + virtual_offset;
>      > +        virtual_offset += range->physicalLength;
>      > +
>      > +        trace_apple_gfx_map_memory_range(i, range->physicalAddress,
>      > +                                         range->physicalLength);
>      > +
>      > +        source = apple_gfx_host_address_for_gpa_range(range-
>      >physicalAddress,
>      > +                                                      range-
>      >physicalLength,
>      > +                                                      read_only);
>      > +        if (source == 0) {
>      > +            success = false;
>      > +            continue;
>      > +        }
>      > +
>      > +        MemoryRegion* alt_mr = NULL;
>      > +        mach_vm_address_t alt_source =
>     (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress, range-
>      >physicalLength, NULL);
>      > +        g_assert(alt_source == source);
> 
>     Remove this; I guess this is for debugging.
> 
>      > +
>      > +        cur_protection = 0;
>      > +        max_protection = 0;
>      > +        // Map guest RAM at range->physicalAddress into PG task
>     memory range
>      > +        r = mach_vm_remap(mach_task_self(),
>      > +                          &target, range->physicalLength,
>     vm_page_size - 1,
>      > +                          VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
>      > +                          mach_task_self(),
>      > +                          source, false /* shared mapping, no
>     copy */,
>      > +                          &cur_protection, &max_protection,
>      > +                          VM_INHERIT_COPY);
>      > +        trace_apple_gfx_remap(r, source, target);
>      > +        g_assert(r == KERN_SUCCESS);
>      > +    }
>      > +
>      > +    qemu_mutex_lock(&s->job_mutex);
>      > +    job->success = success;
>      > +    job->done = true;
>      > +    qemu_cond_broadcast(&s->job_cond);
>      > +    qemu_mutex_unlock(&s->job_mutex);
>      > +}
>      > +
>      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag)
>      > +{
>      > +    qemu_mutex_lock(&s->job_mutex);
>      > +    while (!*job_done_flag) {
>      > +        qemu_cond_wait(&s->job_cond, &s->job_mutex);
>      > +    }
>      > +    qemu_mutex_unlock(&s->job_mutex);
>      > +}
>      > +
>      > +typedef struct AppleGFXReadMemoryJob {
>      > +    AppleGFXState *s;
>      > +    hwaddr physical_address;
>      > +    uint64_t length;
>      > +    void *dst;
>      > +    bool done;
>      > +} AppleGFXReadMemoryJob;
>      > +
>      > +static void apple_gfx_do_read_memory(void *opaque)
>      > +{
>      > +    AppleGFXReadMemoryJob *job = opaque;
>      > +    AppleGFXState *s = job->s;
>      > +
>      > +    cpu_physical_memory_read(job->physical_address, job->dst,
>     job->length);
> 
>     Use: dma_memory_read()
> 
>      > +
>      > +    qemu_mutex_lock(&s->job_mutex);
>      > +    job->done = true;
>      > +    qemu_cond_broadcast(&s->job_cond);
>      > +    qemu_mutex_unlock(&s->job_mutex);
>      > +}
>      > +
>      > +static void apple_gfx_read_memory(AppleGFXState *s, hwaddr
>     physical_address,
>      > +                                  uint64_t length, void *dst)
>      > +{
>      > +    AppleGFXReadMemoryJob job = {
>      > +        s, physical_address, length, dst
>      > +    };
>      > +
>      > +    trace_apple_gfx_read_memory(physical_address, length, dst);
>      > +
>      > +    /* Traversing the memory map requires RCU/BQL, so do it in a
>     BH. */
>      > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
>     apple_gfx_do_read_memory,
>      > +                            &job);
>      > +    apple_gfx_await_bh_job(s, &job.done);
>      > +}
>      > +
>      > +static void
>     apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
>      > +                                                   
>       PGDeviceDescriptor *desc)
>      > +{
>      > +    desc.createTask = ^(uint64_t vmSize, void * _Nullable *
>     _Nonnull baseAddress) {
>      > +        PGTask_t *task = apple_gfx_new_task(s, vmSize);
>      > +        *baseAddress = (void *)task->address;
>      > +        trace_apple_gfx_create_task(vmSize, *baseAddress);
>      > +        return task;
>      > +    };
>      > +
>      > +    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
>      > +        trace_apple_gfx_destroy_task(task);
>      > +        QTAILQ_REMOVE(&s->tasks, task, node);
>      > +        mach_vm_deallocate(mach_task_self(), task->address,
>     task->len);
>      > +        g_free(task);
>      > +    };
>      > +
>      > +    desc.mapMemory = ^bool(PGTask_t * _Nonnull task, uint32_t
>     range_count,
>      > +                       uint64_t virtual_offset, bool read_only,
>      > +                       PGPhysicalMemoryRange_t * _Nonnull ranges) {
>      > +        AppleGFXMapMemoryJob job = {
>      > +            .state = s,
>      > +            .task = task, .ranges = ranges, .range_count =
>     range_count,
>      > +            .read_only = read_only, .virtual_offset =
>     virtual_offset,
>      > +            .done = false, .success = true,
>      > +        };
>      > +        if (range_count > 0) {
>      > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
>      > +                                    apple_gfx_map_memory, &job);
>      > +            apple_gfx_await_bh_job(s, &job.done);
>      > +        }
>      > +        return job.success;
>      > +    };
>      > +
>      > +    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task, uint64_t
>     virtualOffset,
>      > +                         uint64_t length) {
>      > +        kern_return_t r;
>      > +        mach_vm_address_t range_address;
>      > +
>      > +        trace_apple_gfx_unmap_memory(task, virtualOffset, length);
>      > +
>      > +        /* Replace task memory range with fresh pages, undoing
>     the mapping
>      > +         * from guest RAM. */
>      > +        range_address = task->address + virtualOffset;
>      > +        r = mach_vm_allocate(mach_task_self(), &range_address,
>     length,
>      > +                             VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE);
>      > +        g_assert(r == KERN_SUCCESS);error_setg
> 
>     An extra error_setg
> 
>      > +
>      > +        return true;
>      > +    };
>      > +
>      > +    desc.readMemory = ^bool(uint64_t physical_address, uint64_t
>     length,
>      > +                            void * _Nonnull dst) {
>      > +        apple_gfx_read_memory(s, physical_address, length, dst);
>      > +        return true;
>      > +    };
>      > +}
>      > +
>      > +static PGDisplayDescriptor
>     *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
>      > +{
>      > +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor new];
>      > +
>      > + disp_desc.name <http://disp_desc.name> = @"QEMU display";
>      > +    disp_desc.sizeInMillimeters = NSMakeSize(400., 300.); /* A
>     20" display */
>      > +    disp_desc.queue = dispatch_get_main_queue();
>      > +    disp_desc.newFrameEventHandler = ^(void) {
>      > +        trace_apple_gfx_new_frame();
>      > +        dispatch_async(s->render_queue, ^{
>      > +            /* Drop frames if we get too far ahead. */
>      > +            bql_lock();
>      > +            if (s->pending_frames >= 2) {
>      > +                bql_unlock();
>      > +                return;
>      > +            }
>      > +            ++s->pending_frames;
>      > +            if (s->pending_frames > 1) {
>      > +                bql_unlock();
>      > +                return;
>      > +            }
>      > +            @autoreleasepool {
>      > +                apple_gfx_render_new_frame_bql_unlock(s);
>      > +            }
>      > +        });
>      > +    };
>      > +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t sizeInPixels,
>      > +                                    OSType pixelFormat) {
>      > +        trace_apple_gfx_mode_change(sizeInPixels.x, sizeInPixels.y);
>      > +
>      > +        BQL_LOCK_GUARD();
>      > +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
>      > +    };
>      > +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
>      > +                                     PGDisplayCoord_t hotSpot) {
>      > +        [glyph retain];
>      > +        dispatch_async(get_background_queue(), ^{
>      > +            BQL_LOCK_GUARD();
>      > +            uint32_t bpp = glyph.bitsPerPixel;
>      > +            size_t width = glyph.pixelsWide;
>      > +            size_t height = glyph.pixelsHigh;
>      > +            size_t padding_bytes_per_row = glyph.bytesPerRow -
>     width * 4;
>      > +            const uint8_t* px_data = glyph.bitmapData;
>      > +
>      > +            trace_apple_gfx_cursor_set(bpp, width, height);
>      > +
>      > +            if (s->cursor) {
>      > +                cursor_unref(s->cursor);
>      > +                s->cursor = NULL;
>      > +            }
>      > +
>      > +            if (bpp == 32) { /* Shouldn't be anything else, but
>     just to be safe...*/
>      > +                s->cursor = cursor_alloc(width, height);
>      > +                s->cursor->hot_x = hotSpot.x;
>      > +                s->cursor->hot_y = hotSpot.y;
>      > +
>      > +                uint32_t *dest_px = s->cursor->data;
>      > +
>      > +                for (size_t y = 0; y < height; ++y) {
>      > +                    for (size_t x = 0; x < width; ++x) {
>      > +                        /* NSBitmapImageRep's red & blue
>     channels are swapped
>      > +                         * compared to QEMUCursor's. */
>      > +                        *dest_px =
>      > +                            (px_data[0] << 16u) |
>      > +                            (px_data[1] <<  8u) |
>      > +                            (px_data[2] <<  0u) |
>      > +                            (px_data[3] << 24u);
>      > +                        ++dest_px;
>      > +                        px_data += 4;
>      > +                    }
>      > +                    px_data += padding_bytes_per_row;
>      > +                }
>      > +                dpy_cursor_define(s->con, s->cursor);
>      > +                update_cursor(s);
>      > +            }
>      > +            [glyph release];
>      > +        });
>      > +    };
>      > +    disp_desc.cursorShowHandler = ^(BOOL show) {
>      > +        dispatch_async(get_background_queue(), ^{
>      > +            BQL_LOCK_GUARD();
>      > +            trace_apple_gfx_cursor_show(show);
>      > +            s->cursor_show = show;
>      > +            update_cursor(s);
>      > +        });
>      > +    };
>      > +    disp_desc.cursorMoveHandler = ^(void) {
>      > +        dispatch_async(get_background_queue(), ^{
>      > +            BQL_LOCK_GUARD();
>      > +            trace_apple_gfx_cursor_move();
>      > +            update_cursor(s);
>      > +        });
>      > +    };
>      > +
>      > +    return disp_desc;
>      > +}
>      > +
>      > +static NSArray<PGDisplayMode*>*
>     apple_gfx_prepare_display_mode_array(void)
>      > +{
>      > +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
>      > +    NSArray<PGDisplayMode*>* mode_array = nil;
>      > +    int i;
>      > +
>      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
>      > +        modes[i] =
>      > +            [[PGDisplayMode alloc]
>     initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
>      > +    }
>      > +
>      > +    mode_array = [NSArray arrayWithObjects:modes
>     count:ARRAY_SIZE(apple_gfx_modes)];
>      > +
>      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
>      > +        [modes[i] release];
>      > +        modes[i] = nil;
>      > +    }
>      > +
>      > +    return mode_array;
>      > +}
>      > +
>      > +static id<MTLDevice> copy_suitable_metal_device(void)
>      > +{
>      > +    id<MTLDevice> dev = nil;
>      > +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
>      > +
>      > +    /* Prefer a unified memory GPU. Failing that, pick a non-
>     removable GPU. */
>      > +    for (size_t i = 0; i < devs.count; ++i) {
>      > +        if (devs[i].hasUnifiedMemory) {
>      > +            dev = devs[i];
>      > +            break;
>      > +        }
>      > +        if (!devs[i].removable) {
>      > +            dev = devs[i];
>      > +        }
>      > +    }
>      > +
>      > +    if (dev != nil) {
>      > +        [dev retain];
>      > +    } else {
>      > +        dev = MTLCreateSystemDefaultDevice();
>      > +    }
>      > +    [devs release];
>      > +
>      > +    return dev;
>      > +}
>      > +
>      > +void apple_gfx_common_realize(AppleGFXState *s,
>     PGDeviceDescriptor *desc,
>      > +                              Error **errp)
>      > +{
>      > +    PGDisplayDescriptor *disp_desc = nil;
>      > +
>      > +    if (apple_gfx_mig_blocker == NULL) {
>      > +        error_setg(&apple_gfx_mig_blocker,
>      > +                  "Migration state blocked by apple-gfx display
>     device");
>      > +        if (migrate_add_blocker(&apple_gfx_mig_blocker, errp) < 0) {
>      > +            return;
>      > +        }
>      > +    }
>      > +
>      > +    QTAILQ_INIT(&s->tasks);
>      > +    s->render_queue = dispatch_queue_create("apple-gfx.render",
>      > +                                            DISPATCH_QUEUE_SERIAL);
>      > +    s->mtl = copy_suitable_metal_device();
>      > +    s->mtl_queue = [s->mtl newCommandQueue];
>      > +
>      > +    desc.device = s->mtl;
>      > +
>      > +    apple_gfx_register_task_mapping_handlers(s, desc);
>      > +
>      > +    s->pgdev = PGNewDeviceWithDescriptor(desc);
>      > +
>      > +    disp_desc = apple_gfx_prepare_display_descriptor(s);
>      > +    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
>      > +                                              port:0
>     serialNum:1234];
>      > +    [disp_desc release];
>      > +    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
>      > +
>      > +    create_fb(s);
>      > +
>      > +    qemu_mutex_init(&s->job_mutex);
>      > +    qemu_cond_init(&s->job_cond);
>      > +}
>      > diff --git a/hw/display/meson.build b/hw/display/meson.build
>      > index 20a94973fa2..619e642905a 100644
>      > --- a/hw/display/meson.build
>      > +++ b/hw/display/meson.build
>      > @@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true:
>     files('artist.c'))
>      >
>      >   system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c',
>     'ati_2d.c', 'ati_dbg.c'), pixman])
>      >
>      > +system_ss.add(when: 'CONFIG_MAC_PVG',         if_true:
>     [files('apple-gfx.m'), pvg, metal])
>      > +if cpu == 'aarch64'
>      > +  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true:
>     [files('apple-gfx-mmio.m'), pvg, metal])
>      > +endif
>      >
>      >   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
>      >     virtio_gpu_ss = ss.source_set()
>      > diff --git a/hw/display/trace-events b/hw/display/trace-events
>      > index 781f8a33203..214998312b9 100644
>      > --- a/hw/display/trace-events
>      > +++ b/hw/display/trace-events
>      > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width)
>     "dest_width : %u"
>      >   dm163_leds(int led, uint32_t value) "led %d: 0x%x"
>      >   dm163_channels(int channel, uint8_t value) "channel %d: 0x%x"
>      >   dm163_refresh_rate(uint32_t rr) "refresh rate %d"
>      > +
>      > +# apple-gfx.m
>      > +apple_gfx_read(uint64_t offset, uint64_t res)
>     "offset=0x%"PRIx64" res=0x%"PRIx64
>      > +apple_gfx_write(uint64_t offset, uint64_t val)
>     "offset=0x%"PRIx64" val=0x%"PRIx64
>      > +apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=0x%x
>     base_addr=%p"
>      > +apple_gfx_destroy_task(void *task) "task=%p"
>      > +apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t
>     virtual_offset, uint32_t read_only) "task=%p range_count=0x%x
>     virtual_offset=0x%"PRIx64" read_only=%d"
>      > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr,
>     uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64" phys_len=0x%"PRIx64
>      > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t
>     target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
>      > +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset,
>     uint64_t length) "task=%p virtual_offset=0x%"PRIx64" length=0x%"PRIx64
>      > +apple_gfx_read_memory(uint64_t phys_address, uint64_t length,
>     void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
>      > +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
>      > +apple_gfx_new_frame(void) ""
>      > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64"
>     y=%"PRId64
>      > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t
>     height) "bpp=%d width=%"PRId64" height=0x%"PRId64
>      > +apple_gfx_cursor_show(uint32_t show) "show=%d"
>      > +apple_gfx_cursor_move(void) ""
>      > +apple_gfx_common_init(const char *device_name, size_t mmio_size)
>     "device: %s; MMIO size: %zu bytes"
>      > +
>      > +# apple-gfx-mmio.m
>      > +apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res)
>     "offset=0x%"PRIx64" res=0x%"PRIx64
>      > +apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val)
>     "offset=0x%"PRIx64" val=0x%"PRIx64
>      > +apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t
>     ro, void *va, void *e, void *f, void* va_result, int success)
>     "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d va=%p e=%p f=%p -> *va=%p,
>     success = %d"
>      > +apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c, void *d,
>     void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
>      > +apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
>      > +
>      > diff --git a/meson.build b/meson.build
>      > index d26690ce204..0e124eff13f 100644
>      > --- a/meson.build
>      > +++ b/meson.build
>      > @@ -761,6 +761,8 @@ socket = []
>      >   version_res = []
>      >   coref = []
>      >   iokit = []
>      > +pvg = []
>      > +metal = []
>      >   emulator_link_args = []
>      >   midl = not_found
>      >   widl = not_found
>      > @@ -782,6 +784,8 @@ elif host_os == 'darwin'
>      >     coref = dependency('appleframeworks', modules: 'CoreFoundation')
>      >     iokit = dependency('appleframeworks', modules: 'IOKit',
>     required: false)
>      >     host_dsosuf = '.dylib'
>      > +  pvg = dependency('appleframeworks', modules:
>     'ParavirtualizedGraphics')
>      > +  metal = dependency('appleframeworks', modules: 'Metal')
>      >   elif host_os == 'sunos'
>      >     socket = [cc.find_library('socket'),
>      >               cc.find_library('nsl'),
> 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-26  4:40       ` Akihiko Odaki
@ 2024-10-26 10:24         ` Phil Dennis-Jordan
  2024-10-28  7:42           ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-26 10:24 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 44871 bytes --]

On Sat, 26 Oct 2024 at 06:40, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/26 4:43, Phil Dennis-Jordan wrote:
> >
> >
> > On Fri, 25 Oct 2024 at 08:03, Akihiko Odaki <akihiko.odaki@daynix.com
> > <mailto:akihiko.odaki@daynix.com>> wrote:
> >
> >     On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> >      > +    /* For running PVG memory-mapping requests in the AIO
> context */
> >      > +    QemuCond job_cond;
> >      > +    QemuMutex job_mutex;
> >
> >     Use: QemuEvent
> >
> >
> > Hmm. I think if we were to use that, we would need to create a new
> > QemuEvent for every job and destroy it afterward, which seems expensive.
> > We can't rule out multiple concurrent jobs being submitted, and the
> > QemuEvent system only supports a single producer as far as I can tell.
> >
> > You can probably sort of hack around it with just one QemuEvent by
> > putting the qemu_event_wait into a loop and turning the job.done flag
> > into an atomic (because it would now need to be checked outside the
> > lock) but this all seems unnecessarily complicated considering the
> > QemuEvent uses the same mechanism QemuCond/QemuMutex internally on macOS
> > (the only platform relevant here), except we can use it as intended with
> > QemuCond/QemuMutex rather than having to work against the abstraction.
>
> I don't think it's going to be used concurrently. It would be difficult
> to reason even for the framework if it performs memory
> unmapping/mapping/reading operations concurrently.


I've just performed a very quick test by wrapping the job submission/wait
in the 2 mapMemory callbacks and the 1 readMemory callback with atomic
counters and logging whenever a counter went above 1.

 * Overall, concurrent callbacks across all types were common (many per
second when the VM is busy). It's not exactly a "thundering herd" (I never
saw >2) but it's probably not a bad idea to use a separate condition
variable for each job type. (task map, surface map, memory read)
 * While I did not observe any concurrent memory mapping operations *within*
a type of memory map (2 task mappings or 2 surface mappings) I did see very
occasional concurrent memory *read* callbacks. These would, as far as I can
tell, not be safe with QemuEvents, unless we placed the event inside the
job struct and init/destroyed it on every callback (which seems like
excessive overhead).

My recommendation would be to split it up into 3 pairs of mutex/cond; this
will almost entirely remove any contention, but continue to be safe for
when it does occur. I don't think QemuEvent is a realistic option (too
tricky to get right) for the observed-concurrent readMemory callback. I'm
nervous about assuming the mapMemory callbacks will NEVER be called
concurrently, but at a push I'll acquiesce to switching those to QemuEvent
in the absence of evidence of concurrency.


> PGDevice.h also notes
> raiseInterrupt needs to be thread-safe while it doesn't make such notes
> for memory operations. This actually makes sense.
>
> If it's ever going to be used concurrently, it's better to have
> QemuEvent for each job to avoid the thundering herd problem.
>
> >
> >      > +
> >      > +    dispatch_queue_t render_queue;
> >      > +    /* The following fields should only be accessed from the
> BQL: */
> >
> >     Perhaps it may be better to document fields that can be accessed
> >     *without* the BQL; most things in QEMU implicitly require the BQL.
> >
> >      > +    bool gfx_update_requested;
> >      > +    bool new_frame_ready;
> >      > +    bool using_managed_texture_storage;
> >      > +} AppleGFXState;
> >      > +
> >      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const
> >     char* obj_name);
> >      > +void apple_gfx_common_realize(AppleGFXState *s,
> >     PGDeviceDescriptor *desc,
> >      > +                              Error **errp);
> >      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
> >     guest_physical,
> >      > +                                               uint64_t length,
> >     bool read_only);
> >      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool
> *job_done_flag);
> >      > +
> >      > +#endif
> >      > +
> >      > diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
> >      > new file mode 100644
> >      > index 00000000000..46be9957f69
> >      > --- /dev/null
> >      > +++ b/hw/display/apple-gfx.m
> >      > @@ -0,0 +1,713 @@
> >      > +/*
> >      > + * QEMU Apple ParavirtualizedGraphics.framework device
> >      > + *
> >      > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All
> >     Rights Reserved.
> >      > + *
> >      > + * This work is licensed under the terms of the GNU GPL, version
> >     2 or later.
> >      > + * See the COPYING file in the top-level directory.
> >      > + *
> >      > + * ParavirtualizedGraphics.framework is a set of libraries that
> >     macOS provides
> >      > + * which implements 3d graphics passthrough to the host as well
> as a
> >      > + * proprietary guest communication channel to drive it. This
> >     device model
> >      > + * implements support to drive that library from within QEMU.
> >      > + */
> >      > +
> >      > +#include "qemu/osdep.h"
> >      > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> >      > +#include <mach/mach_vm.h>
> >      > +#include "apple-gfx.h"
> >      > +#include "trace.h"
> >      > +#include "qemu-main.h"
> >      > +#include "exec/address-spaces.h"
> >      > +#include "migration/blocker.h"
> >      > +#include "monitor/monitor.h"
> >      > +#include "qemu/main-loop.h"
> >      > +#include "qemu/cutils.h"
> >      > +#include "qemu/log.h"
> >      > +#include "qapi/visitor.h"
> >      > +#include "qapi/error.h"
> >      > +#include "ui/console.h"
> >      > +
> >      > +static const PGDisplayCoord_t apple_gfx_modes[] = {
> >      > +    { .x = 1440, .y = 1080 },
> >      > +    { .x = 1280, .y = 1024 },
> >      > +};
> >      > +
> >      > +/* This implements a type defined in <ParavirtualizedGraphics/
> >     PGDevice.h>
> >      > + * which is opaque from the framework's point of view. Typedef
> >     PGTask_t already
> >      > + * exists in the framework headers. */
> >      > +struct PGTask_s {
> >      > +    QTAILQ_ENTRY(PGTask_s) node;
> >      > +    mach_vm_address_t address;
> >      > +    uint64_t len;
> >      > +};
> >      > +
> >      > +static Error *apple_gfx_mig_blocker;
> >
> >     This does not have to be a static variable.
> >
> >
> > Hmm, the first 5 or so examples of migration blockers in other devices
> > etc. I could find were all declared in this way. What are you suggesting
> > as the alternative? And why not use the same pattern as in most of the
> > rest of the code base?
>
> I was wrong. This is better to be a static variable to ensure we won't
> add the same blocker in case we have two device instances.
>
> >
> >      > +
> >      > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> >      > +                                             uint32_t width,
> >     uint32_t height);
> >      > +
> >      > +static inline dispatch_queue_t get_background_queue(void)
> >
> >     Don't add inline. The only effect for modern compilers of inline is
> to
> >     suppress the unused function warnings.
> >
> >      > +{
> >      > +    return
> >     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> >      > +}
> >      > +
> >      > +static PGTask_t *apple_gfx_new_task(AppleGFXState *s, uint64_t
> len)
> >      > +{
> >      > +    mach_vm_address_t task_mem;
> >      > +    PGTask_t *task;
> >      > +    kern_return_t r;
> >      > +
> >      > +    r = mach_vm_allocate(mach_task_self(), &task_mem, len,
> >     VM_FLAGS_ANYWHERE);
> >      > +    if (r != KERN_SUCCESS || task_mem == 0) {
> >
> >     Let's remove the check for task_mem == 0. We have no reason to
> >     reject it
> >     if the platform insists it allocated a memory at address 0 though
> >     such a
> >     situation should never happen in practice.
> >
> >      > +        return NULL;
> >      > +    }
> >      > +
> >      > +    task = g_new0(PGTask_t, 1);
> >      > +
> >      > +    task->address = task_mem;
> >      > +    task->len = len;
> >      > +    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
> >      > +
> >      > +    return task;
> >      > +}
> >      > +
> >      > +typedef struct AppleGFXIOJob {
> >      > +    AppleGFXState *state;
> >      > +    uint64_t offset;
> >      > +    uint64_t value;
> >      > +    bool completed;
> >      > +} AppleGFXIOJob;
> >      > +
> >      > +static void apple_gfx_do_read(void *opaque)
> >      > +{
> >      > +    AppleGFXIOJob *job = opaque;
> >      > +    job->value = [job->state->pgdev
> mmioReadAtOffset:job->offset];
> >      > +    qatomic_set(&job->completed, true);
> >      > +    aio_wait_kick();
> >      > +}
> >      > +
> >      > +static uint64_t apple_gfx_read(void *opaque, hwaddr offset,
> >     unsigned size)
> >      > +{
> >      > +    AppleGFXIOJob job = {
> >      > +        .state = opaque,
> >      > +        .offset = offset,
> >      > +        .completed = false,
> >      > +    };
> >      > +    AioContext *context = qemu_get_aio_context();
> >      > +    dispatch_queue_t queue = get_background_queue();
> >      > +
> >      > +    dispatch_async_f(queue, &job, apple_gfx_do_read);
> >      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> >      > +
> >      > +    trace_apple_gfx_read(offset, job.value);
> >      > +    return job.value;
> >      > +}
> >      > +
> >      > +static void apple_gfx_do_write(void *opaque)
> >      > +{
> >      > +    AppleGFXIOJob *job = opaque;
> >      > +    [job->state->pgdev mmioWriteAtOffset:job->offset value:job-
> >      >value];
> >      > +    qatomic_set(&job->completed, true);
> >      > +    aio_wait_kick();
> >      > +}
> >      > +
> >      > +static void apple_gfx_write(void *opaque, hwaddr offset,
> >     uint64_t val,
> >      > +                            unsigned size)
> >      > +{
> >      > +    /* The methods mmioReadAtOffset: and especially
> >     mmioWriteAtOffset: can
> >      > +     * trigger and block on operations on other dispatch queues,
> >     which in turn
> >      > +     * may call back out on one or more of the callback blocks.
> >     For this reason,
> >      > +     * and as we are holding the BQL, we invoke the I/O methods
> >     on a pool
> >      > +     * thread and handle AIO tasks while we wait. Any work in
> >     the callbacks
> >      > +     * requiring the BQL will in turn schedule BHs which this
> >     thread will
> >      > +     * process while waiting. */
> >      > +    AppleGFXIOJob job = {
> >      > +        .state = opaque,
> >      > +        .offset = offset,
> >      > +        .value = val,
> >      > +        .completed = false,
> >      > +    };
> >      > +    AioContext *context = qemu_get_current_aio_context();
> >      > +    dispatch_queue_t queue = get_background_queue();
> >      > +
> >      > +    dispatch_async_f(queue, &job, apple_gfx_do_write);
> >      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
> >      > +
> >      > +    trace_apple_gfx_write(offset, val);
> >      > +}
> >      > +
> >      > +static const MemoryRegionOps apple_gfx_ops = {
> >      > +    .read = apple_gfx_read,
> >      > +    .write = apple_gfx_write,
> >      > +    .endianness = DEVICE_LITTLE_ENDIAN,
> >      > +    .valid = {
> >      > +        .min_access_size = 4,
> >      > +        .max_access_size = 8,
> >      > +    },
> >      > +    .impl = {
> >      > +        .min_access_size = 4,
> >      > +        .max_access_size = 4,
> >      > +    },
> >      > +};
> >      > +
> >      > +static void apple_gfx_render_new_frame_bql_unlock(AppleGFXState
> *s)
> >      > +{
> >      > +    BOOL r;
> >      > +    uint32_t width = surface_width(s->surface);
> >      > +    uint32_t height = surface_height(s->surface);
> >      > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> >      > +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue
> >     commandBuffer];
> >      > +    id<MTLTexture> texture = s->texture;
> >      > +
> >      > +    assert(bql_locked());
> >      > +    [texture retain];
> >      > +
> >      > +    bql_unlock();
> >      > +
> >      > +    /* This is not safe to call from the BQL due to PVG-internal
> >     locks causing
> >      > +     * deadlocks. */
> >      > +    r = [s->pgdisp
> encodeCurrentFrameToCommandBuffer:command_buffer
> >      > +                                             texture:texture
> >      > +                                              region:region];
> >      > +    if (!r) {
> >      > +        [texture release];
> >      > +        bql_lock();
> >      > +        --s->pending_frames;
> >      > +        bql_unlock();
> >      > +        qemu_log_mask(LOG_GUEST_ERROR,
> >     "apple_gfx_render_new_frame_bql_unlock: "
> >
> >     Use: __func__
> >
> >      > +
> >     "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
> >      > +        return;
> >      > +    }
> >      > +
> >      > +    if (s->using_managed_texture_storage) {
> >      > +        /* "Managed" textures exist in both VRAM and RAM and
> >     must be synced. */
> >      > +        id<MTLBlitCommandEncoder> blit = [command_buffer
> >     blitCommandEncoder];
> >      > +        [blit synchronizeResource:texture];
> >      > +        [blit endEncoding];
> >      > +    }
> >      > +    [texture release];
> >      > +    [command_buffer addCompletedHandler:
> >      > +        ^(id<MTLCommandBuffer> cb)
> >      > +        {
> >      > +            dispatch_async(s->render_queue, ^{
> >      > +                apple_gfx_render_frame_completed(s, width,
> height);
> >      > +            });
> >      > +        }];
> >      > +    [command_buffer commit];
> >      > +}
> >      > +
> >      > +static void copy_mtl_texture_to_surface_mem(id<MTLTexture>
> >     texture, void *vram)
> >      > +{
> >      > +    /* TODO: Skip this entirely on a pure Metal or headless/
> >     guest-only
> >      > +     * rendering path, else use a blit command encoder? Needs
> >     careful
> >      > +     * (double?) buffering design. */
> >      > +    size_t width = texture.width, height = texture.height;
> >      > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
> >      > +    [texture getBytes:vram
> >      > +          bytesPerRow:(width * 4)
> >      > +        bytesPerImage:(width * height * 4)
> >      > +           fromRegion:region
> >      > +          mipmapLevel:0
> >      > +                slice:0];
> >      > +}copy_mtl_texture_to_surface_mem
> >      > +
> >      > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
> >      > +                                             uint32_t width,
> >     uint32_t height)
> >      > +{
> >      > +    bql_lock();
> >      > +    --s->pending_frames;
> >      > +    assert(s->pending_frames >= 0);
> >      > +
> >      > +    /* Only update display if mode hasn't changed since we
> >     started rendering. */
> >      > +    if (width == surface_width(s->surface) &&
> >      > +        height == surface_height(s->surface)) {
> >      > +        copy_mtl_texture_to_surface_mem(s->texture, s->vram);
> >      > +        if (s->gfx_update_requested) {
> >      > +            s->gfx_update_requested = false;
> >      > +            dpy_gfx_update_full(s->con);
> >      > +            graphic_hw_update_done(s->con);
> >      > +            s->new_frame_ready = false;
> >      > +        } else {
> >      > +            s->new_frame_ready = true;
> >      > +        }
> >      > +    }
> >      > +    if (s->pending_frames > 0) {
> >      > +        apple_gfx_render_new_frame_bql_unlock(s);
> >      > +    } else {
> >      > +        bql_unlock();
> >      > +    }
> >      > +}
> >      > +
> >      > +static void apple_gfx_fb_update_display(void *opaque)
> >      > +{
> >      > +    AppleGFXState *s = opaque;
> >      > +
> >      > +    assert(bql_locked());
> >      > +    if (s->new_frame_ready) {
> >      > +        dpy_gfx_update_full(s->con);
> >      > +        s->new_frame_ready = false;
> >      > +        graphic_hw_update_done(s->con);
> >      > +    } else if (s->pending_frames > 0) {
> >      > +        s->gfx_update_requested = true;
> >      > +    } else {
> >      > +        graphic_hw_update_done(s->con);
> >      > +    }
> >      > +}
> >      > +
> >      > +static const GraphicHwOps apple_gfx_fb_ops = {
> >      > +    .gfx_update = apple_gfx_fb_update_display,
> >      > +    .gfx_update_async = true,
> >      > +};
> >      > +
> >      > +static void update_cursor(AppleGFXState *s)
> >      > +{
> >      > +    assert(bql_locked());
> >      > +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
> >      > +                  s->pgdisp.cursorPosition.y, s->cursor_show);
> >      > +}
> >      > +
> >      > +static void set_mode(AppleGFXState *s, uint32_t width, uint32_t
> >     height)
> >      > +{
> >      > +    MTLTextureDescriptor *textureDescriptor;
> >      > +
> >      > +    if (s->surface &&
> >      > +        width == surface_width(s->surface) &&
> >      > +        height == surface_height(s->surface)) {
> >      > +        return;
> >      > +    }
> >      > +
> >      > +    g_free(s->vram);
> >      > +    [s->texture release];
> >      > +
> >      > +    s->vram = g_malloc0_n(width * height, 4);
> >      > +    s->surface = qemu_create_displaysurface_from(width, height,
> >     PIXMAN_LE_a8r8g8b8,
> >      > +                                                 width * 4, s-
> >      >vram);> +> +    @autoreleasepool {
> >      > +        textureDescriptor =
> >      > +            [MTLTextureDescriptor
> >      > +
> >     texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
> >      > +                                             width:width
> >      > +                                            height:height
> >      > +                                         mipmapped:NO];
> >      > +        textureDescriptor.usage = s->pgdisp.minimumTextureUsage;
> >      > +        s->texture = [s->mtl
> >     newTextureWithDescriptor:textureDescriptor];
> >
> >
> >     What about creating pixman_image_t from s->texture.buffer.contents?
> >     This
> >     should save memory usage by removing the duplication of texture.
> >
> >
> > We need explicit control over when the GPU vs when the CPU may access
> > the texture - only one of them may access them at a time. As far as I
> > can tell, we can't control when the rest of Qemu might access the
> > pixman_image used in the console surface?
>
> You are right; we need to have duplicate buffers. We can still avoid
> copying by using two MTLTextures for double-buffering instead of having
> a MTLTexture and a pixman_image and copying between them for
> MTLStorageModeManaged.
>

Do I understand correctly that you intend to swap the surface->image on
every frame, or even the surface->image->data? If so, it's my understanding
from reading the source of a bunch of UI implementations a few weeks ago
that this is neither supported nor safe, as some implementations take
long-lived references to these internal data structures until a
dpy_gfx_switch callback. And the implementations for those callbacks are in
turn very expensive in some cases. This is why my conclusion in the v4
thread was that double-buffering was infeasible with the current
architecture.


> >
> >      > +    }
> >      > +
> >      > +    s->using_managed_texture_storage =
> >      > +        (s->texture.storageMode == MTLStorageModeManaged);
> >      > +    dpy_gfx_replace_surface(s->con, s->surface);
> >      > +}
> >      > +
> >      > +static void create_fb(AppleGFXState *s)
> >      > +{
> >      > +    s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
> >      > +    set_mode(s, 1440, 1080);
> >      > +
> >      > +    s->cursor_show = true;
> >      > +}
> >      > +
> >      > +static size_t apple_gfx_get_default_mmio_range_size(void)
> >      > +{
> >      > +    size_t mmio_range_size;
> >      > +    @autoreleasepool {
> >      > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
> >      > +        mmio_range_size = desc.mmioLength;
> >      > +        [desc release];
> >      > +    }
> >      > +    return mmio_range_size;
> >      > +}
> >      > +
> >      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const
> >     char* obj_name)
> >      > +{
> >      > +    size_t mmio_range_size =
> >     apple_gfx_get_default_mmio_range_size();
> >      > +
> >      > +    trace_apple_gfx_common_init(obj_name, mmio_range_size);
> >      > +    memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s,
> >     obj_name,
> >      > +                          mmio_range_size);
> >      > +
> >      > +    /* TODO: PVG framework supports serialising device state:
> >     integrate it! */
> >      > +}
> >      > +
> >      > +typedef struct AppleGFXMapMemoryJob {
> >      > +    AppleGFXState *state;
> >      > +    PGTask_t *task;
> >      > +    uint64_t virtual_offset;
> >      > +    PGPhysicalMemoryRange_t *ranges;
> >      > +    uint32_t range_count;
> >      > +    bool read_only;
> >      > +    bool success;
> >      > +    bool done;
> >      > +} AppleGFXMapMemoryJob;
> >      > +
> >      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
> >     guest_physical,
> >      > +                                               uint64_t length,
> >     bool read_only)
> >      > +{
> >      > +    MemoryRegion *ram_region;
> >      > +    uintptr_t host_address;
> >      > +    hwaddr ram_region_offset = 0;
> >      > +    hwaddr ram_region_length = length;
> >      > +
> >      > +    ram_region = address_space_translate(&address_space_memory,
> >      > +                                         guest_physical,
> >      > +                                         &ram_region_offset,
> >      > +                                         &ram_region_length, !
> >     read_only,
> >      > +                                         MEMTXATTRS_UNSPECIFIED);
> >
> >     Call memory_region_ref() so that it won't go away.
> >
> >      > +
> >      > +    if (!ram_region || ram_region_length < length ||
> >      > +        !memory_access_is_direct(ram_region, !read_only)) {
> >      > +        return 0;
> >      > +    }
> >      > +
> >      > +    host_address =
> >     (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);
> >
> >     host_address is typed as uintptr_t, not mach_vm_address_t.
> >
> >      > +    if (host_address == 0) {
> >      > +        return 0;
> >      > +    }
> >      > +    host_address += ram_region_offset;
> >      > +
> >      > +    return host_address;
> >      > +}
> >      > +
> >      > +static void apple_gfx_map_memory(void *opaque)
> >      > +{
> >      > +    AppleGFXMapMemoryJob *job = opaque;
> >      > +    AppleGFXState *s = job->state;
> >      > +    PGTask_t *task                  = job->task;
> >      > +    uint32_t range_count            = job->range_count;
> >      > +    uint64_t virtual_offset         = job->virtual_offset;
> >      > +    PGPhysicalMemoryRange_t *ranges = job->ranges;
> >      > +    bool read_only                  = job->read_only;
> >      > +    kern_return_t r;
> >      > +    mach_vm_address_t target, source;
> >      > +    vm_prot_t cur_protection, max_protection;
> >      > +    bool success = true;
> >      > +
> >      > +    g_assert(bql_locked());
> >      > +
> >      > +    trace_apple_gfx_map_memory(task, range_count,
> >     virtual_offset, read_only);
> >      > +    for (int i = 0; i < range_count; i++) {
> >      > +        PGPhysicalMemoryRange_t *range = &ranges[i];
> >      > +
> >      > +        target = task->address + virtual_offset;
> >      > +        virtual_offset += range->physicalLength;
> >      > +
> >      > +        trace_apple_gfx_map_memory_range(i,
> range->physicalAddress,
> >      > +                                         range->physicalLength);
> >      > +
> >      > +        source = apple_gfx_host_address_for_gpa_range(range-
> >      >physicalAddress,
> >      > +                                                      range-
> >      >physicalLength,
> >      > +                                                      read_only);
> >      > +        if (source == 0) {
> >      > +            success = false;
> >      > +            continue;
> >      > +        }
> >      > +
> >      > +        MemoryRegion* alt_mr = NULL;
> >      > +        mach_vm_address_t alt_source =
> >     (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress, range-
> >      >physicalLength, NULL);
> >      > +        g_assert(alt_source == source);
> >
> >     Remove this; I guess this is for debugging.
> >
> >      > +
> >      > +        cur_protection = 0;
> >      > +        max_protection = 0;
> >      > +        // Map guest RAM at range->physicalAddress into PG task
> >     memory range
> >      > +        r = mach_vm_remap(mach_task_self(),
> >      > +                          &target, range->physicalLength,
> >     vm_page_size - 1,
> >      > +                          VM_FLAGS_FIXED | VM_FLAGS_OVERWRITE,
> >      > +                          mach_task_self(),
> >      > +                          source, false /* shared mapping, no
> >     copy */,
> >      > +                          &cur_protection, &max_protection,
> >      > +                          VM_INHERIT_COPY);
> >      > +        trace_apple_gfx_remap(r, source, target);
> >      > +        g_assert(r == KERN_SUCCESS);
> >      > +    }
> >      > +
> >      > +    qemu_mutex_lock(&s->job_mutex);
> >      > +    job->success = success;
> >      > +    job->done = true;
> >      > +    qemu_cond_broadcast(&s->job_cond);
> >      > +    qemu_mutex_unlock(&s->job_mutex);
> >      > +}
> >      > +
> >      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool
> *job_done_flag)
> >      > +{
> >      > +    qemu_mutex_lock(&s->job_mutex);
> >      > +    while (!*job_done_flag) {
> >      > +        qemu_cond_wait(&s->job_cond, &s->job_mutex);
> >      > +    }
> >      > +    qemu_mutex_unlock(&s->job_mutex);
> >      > +}
> >      > +
> >      > +typedef struct AppleGFXReadMemoryJob {
> >      > +    AppleGFXState *s;
> >      > +    hwaddr physical_address;
> >      > +    uint64_t length;
> >      > +    void *dst;
> >      > +    bool done;
> >      > +} AppleGFXReadMemoryJob;
> >      > +
> >      > +static void apple_gfx_do_read_memory(void *opaque)
> >      > +{
> >      > +    AppleGFXReadMemoryJob *job = opaque;
> >      > +    AppleGFXState *s = job->s;
> >      > +
> >      > +    cpu_physical_memory_read(job->physical_address, job->dst,
> >     job->length);
> >
> >     Use: dma_memory_read()
> >
> >      > +
> >      > +    qemu_mutex_lock(&s->job_mutex);
> >      > +    job->done = true;
> >      > +    qemu_cond_broadcast(&s->job_cond);
> >      > +    qemu_mutex_unlock(&s->job_mutex);
> >      > +}
> >      > +
> >      > +static void apple_gfx_read_memory(AppleGFXState *s, hwaddr
> >     physical_address,
> >      > +                                  uint64_t length, void *dst)
> >      > +{
> >      > +    AppleGFXReadMemoryJob job = {
> >      > +        s, physical_address, length, dst
> >      > +    };
> >      > +
> >      > +    trace_apple_gfx_read_memory(physical_address, length, dst);
> >      > +
> >      > +    /* Traversing the memory map requires RCU/BQL, so do it in a
> >     BH. */
> >      > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> >     apple_gfx_do_read_memory,
> >      > +                            &job);
> >      > +    apple_gfx_await_bh_job(s, &job.done);
> >      > +}
> >      > +
> >      > +static void
> >     apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
> >      > +
> >       PGDeviceDescriptor *desc)
> >      > +{
> >      > +    desc.createTask = ^(uint64_t vmSize, void * _Nullable *
> >     _Nonnull baseAddress) {
> >      > +        PGTask_t *task = apple_gfx_new_task(s, vmSize);
> >      > +        *baseAddress = (void *)task->address;
> >      > +        trace_apple_gfx_create_task(vmSize, *baseAddress);
> >      > +        return task;
> >      > +    };
> >      > +
> >      > +    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
> >      > +        trace_apple_gfx_destroy_task(task);
> >      > +        QTAILQ_REMOVE(&s->tasks, task, node);
> >      > +        mach_vm_deallocate(mach_task_self(), task->address,
> >     task->len);
> >      > +        g_free(task);
> >      > +    };
> >      > +
> >      > +    desc.mapMemory = ^bool(PGTask_t * _Nonnull task, uint32_t
> >     range_count,
> >      > +                       uint64_t virtual_offset, bool read_only,
> >      > +                       PGPhysicalMemoryRange_t * _Nonnull
> ranges) {
> >      > +        AppleGFXMapMemoryJob job = {
> >      > +            .state = s,
> >      > +            .task = task, .ranges = ranges, .range_count =
> >     range_count,
> >      > +            .read_only = read_only, .virtual_offset =
> >     virtual_offset,
> >      > +            .done = false, .success = true,
> >      > +        };
> >      > +        if (range_count > 0) {
> >      > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> >      > +                                    apple_gfx_map_memory, &job);
> >      > +            apple_gfx_await_bh_job(s, &job.done);
> >      > +        }
> >      > +        return job.success;
> >      > +    };
> >      > +
> >      > +    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task, uint64_t
> >     virtualOffset,
> >      > +                         uint64_t length) {
> >      > +        kern_return_t r;
> >      > +        mach_vm_address_t range_address;
> >      > +
> >      > +        trace_apple_gfx_unmap_memory(task, virtualOffset,
> length);
> >      > +
> >      > +        /* Replace task memory range with fresh pages, undoing
> >     the mapping
> >      > +         * from guest RAM. */
> >      > +        range_address = task->address + virtualOffset;
> >      > +        r = mach_vm_allocate(mach_task_self(), &range_address,
> >     length,
> >      > +                             VM_FLAGS_FIXED |
> VM_FLAGS_OVERWRITE);
> >      > +        g_assert(r == KERN_SUCCESS);error_setg
> >
> >     An extra error_setg
> >
> >      > +
> >      > +        return true;
> >      > +    };
> >      > +
> >      > +    desc.readMemory = ^bool(uint64_t physical_address, uint64_t
> >     length,
> >      > +                            void * _Nonnull dst) {
> >      > +        apple_gfx_read_memory(s, physical_address, length, dst);
> >      > +        return true;
> >      > +    };
> >      > +}
> >      > +
> >      > +static PGDisplayDescriptor
> >     *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
> >      > +{
> >      > +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor new];
> >      > +
> >      > + disp_desc.name <http://disp_desc.name> = @"QEMU display";
> >      > +    disp_desc.sizeInMillimeters = NSMakeSize(400., 300.); /* A
> >     20" display */
> >      > +    disp_desc.queue = dispatch_get_main_queue();
> >      > +    disp_desc.newFrameEventHandler = ^(void) {
> >      > +        trace_apple_gfx_new_frame();
> >      > +        dispatch_async(s->render_queue, ^{
> >      > +            /* Drop frames if we get too far ahead. */
> >      > +            bql_lock();
> >      > +            if (s->pending_frames >= 2) {
> >      > +                bql_unlock();
> >      > +                return;
> >      > +            }
> >      > +            ++s->pending_frames;
> >      > +            if (s->pending_frames > 1) {
> >      > +                bql_unlock();
> >      > +                return;
> >      > +            }
> >      > +            @autoreleasepool {
> >      > +                apple_gfx_render_new_frame_bql_unlock(s);
> >      > +            }
> >      > +        });
> >      > +    };
> >      > +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t
> sizeInPixels,
> >      > +                                    OSType pixelFormat) {
> >      > +        trace_apple_gfx_mode_change(sizeInPixels.x,
> sizeInPixels.y);
> >      > +
> >      > +        BQL_LOCK_GUARD();
> >      > +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
> >      > +    };
> >      > +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
> >      > +                                     PGDisplayCoord_t hotSpot) {
> >      > +        [glyph retain];
> >      > +        dispatch_async(get_background_queue(), ^{
> >      > +            BQL_LOCK_GUARD();
> >      > +            uint32_t bpp = glyph.bitsPerPixel;
> >      > +            size_t width = glyph.pixelsWide;
> >      > +            size_t height = glyph.pixelsHigh;
> >      > +            size_t padding_bytes_per_row = glyph.bytesPerRow -
> >     width * 4;
> >      > +            const uint8_t* px_data = glyph.bitmapData;
> >      > +
> >      > +            trace_apple_gfx_cursor_set(bpp, width, height);
> >      > +
> >      > +            if (s->cursor) {
> >      > +                cursor_unref(s->cursor);
> >      > +                s->cursor = NULL;
> >      > +            }
> >      > +
> >      > +            if (bpp == 32) { /* Shouldn't be anything else, but
> >     just to be safe...*/
> >      > +                s->cursor = cursor_alloc(width, height);
> >      > +                s->cursor->hot_x = hotSpot.x;
> >      > +                s->cursor->hot_y = hotSpot.y;
> >      > +
> >      > +                uint32_t *dest_px = s->cursor->data;
> >      > +
> >      > +                for (size_t y = 0; y < height; ++y) {
> >      > +                    for (size_t x = 0; x < width; ++x) {
> >      > +                        /* NSBitmapImageRep's red & blue
> >     channels are swapped
> >      > +                         * compared to QEMUCursor's. */
> >      > +                        *dest_px =
> >      > +                            (px_data[0] << 16u) |
> >      > +                            (px_data[1] <<  8u) |
> >      > +                            (px_data[2] <<  0u) |
> >      > +                            (px_data[3] << 24u);
> >      > +                        ++dest_px;
> >      > +                        px_data += 4;
> >      > +                    }
> >      > +                    px_data += padding_bytes_per_row;
> >      > +                }
> >      > +                dpy_cursor_define(s->con, s->cursor);
> >      > +                update_cursor(s);
> >      > +            }
> >      > +            [glyph release];
> >      > +        });
> >      > +    };
> >      > +    disp_desc.cursorShowHandler = ^(BOOL show) {
> >      > +        dispatch_async(get_background_queue(), ^{
> >      > +            BQL_LOCK_GUARD();
> >      > +            trace_apple_gfx_cursor_show(show);
> >      > +            s->cursor_show = show;
> >      > +            update_cursor(s);
> >      > +        });
> >      > +    };
> >      > +    disp_desc.cursorMoveHandler = ^(void) {
> >      > +        dispatch_async(get_background_queue(), ^{
> >      > +            BQL_LOCK_GUARD();
> >      > +            trace_apple_gfx_cursor_move();
> >      > +            update_cursor(s);
> >      > +        });
> >      > +    };
> >      > +
> >      > +    return disp_desc;
> >      > +}
> >      > +
> >      > +static NSArray<PGDisplayMode*>*
> >     apple_gfx_prepare_display_mode_array(void)
> >      > +{
> >      > +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
> >      > +    NSArray<PGDisplayMode*>* mode_array = nil;
> >      > +    int i;
> >      > +
> >      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> >      > +        modes[i] =
> >      > +            [[PGDisplayMode alloc]
> >     initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
> >      > +    }
> >      > +
> >      > +    mode_array = [NSArray arrayWithObjects:modes
> >     count:ARRAY_SIZE(apple_gfx_modes)];
> >      > +
> >      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> >      > +        [modes[i] release];
> >      > +        modes[i] = nil;
> >      > +    }
> >      > +
> >      > +    return mode_array;
> >      > +}
> >      > +
> >      > +static id<MTLDevice> copy_suitable_metal_device(void)
> >      > +{
> >      > +    id<MTLDevice> dev = nil;
> >      > +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
> >      > +
> >      > +    /* Prefer a unified memory GPU. Failing that, pick a non-
> >     removable GPU. */
> >      > +    for (size_t i = 0; i < devs.count; ++i) {
> >      > +        if (devs[i].hasUnifiedMemory) {
> >      > +            dev = devs[i];
> >      > +            break;
> >      > +        }
> >      > +        if (!devs[i].removable) {
> >      > +            dev = devs[i];
> >      > +        }
> >      > +    }
> >      > +
> >      > +    if (dev != nil) {
> >      > +        [dev retain];
> >      > +    } else {
> >      > +        dev = MTLCreateSystemDefaultDevice();
> >      > +    }
> >      > +    [devs release];
> >      > +
> >      > +    return dev;
> >      > +}
> >      > +
> >      > +void apple_gfx_common_realize(AppleGFXState *s,
> >     PGDeviceDescriptor *desc,
> >      > +                              Error **errp)
> >      > +{
> >      > +    PGDisplayDescriptor *disp_desc = nil;
> >      > +
> >      > +    if (apple_gfx_mig_blocker == NULL) {
> >      > +        error_setg(&apple_gfx_mig_blocker,
> >      > +                  "Migration state blocked by apple-gfx display
> >     device");
> >      > +        if (migrate_add_blocker(&apple_gfx_mig_blocker, errp) <
> 0) {
> >      > +            return;
> >      > +        }
> >      > +    }
> >      > +
> >      > +    QTAILQ_INIT(&s->tasks);
> >      > +    s->render_queue = dispatch_queue_create("apple-gfx.render",
> >      > +
> DISPATCH_QUEUE_SERIAL);
> >      > +    s->mtl = copy_suitable_metal_device();
> >      > +    s->mtl_queue = [s->mtl newCommandQueue];
> >      > +
> >      > +    desc.device = s->mtl;
> >      > +
> >      > +    apple_gfx_register_task_mapping_handlers(s, desc);
> >      > +
> >      > +    s->pgdev = PGNewDeviceWithDescriptor(desc);
> >      > +
> >      > +    disp_desc = apple_gfx_prepare_display_descriptor(s);
> >      > +    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
> >      > +                                              port:0
> >     serialNum:1234];
> >      > +    [disp_desc release];
> >      > +    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
> >      > +
> >      > +    create_fb(s);
> >      > +
> >      > +    qemu_mutex_init(&s->job_mutex);
> >      > +    qemu_cond_init(&s->job_cond);
> >      > +}
> >      > diff --git a/hw/display/meson.build b/hw/display/meson.build
> >      > index 20a94973fa2..619e642905a 100644
> >      > --- a/hw/display/meson.build
> >      > +++ b/hw/display/meson.build
> >      > @@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true:
> >     files('artist.c'))
> >      >
> >      >   system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c',
> >     'ati_2d.c', 'ati_dbg.c'), pixman])
> >      >
> >      > +system_ss.add(when: 'CONFIG_MAC_PVG',         if_true:
> >     [files('apple-gfx.m'), pvg, metal])
> >      > +if cpu == 'aarch64'
> >      > +  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true:
> >     [files('apple-gfx-mmio.m'), pvg, metal])
> >      > +endif
> >      >
> >      >   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
> >      >     virtio_gpu_ss = ss.source_set()
> >      > diff --git a/hw/display/trace-events b/hw/display/trace-events
> >      > index 781f8a33203..214998312b9 100644
> >      > --- a/hw/display/trace-events
> >      > +++ b/hw/display/trace-events
> >      > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width)
> >     "dest_width : %u"
> >      >   dm163_leds(int led, uint32_t value) "led %d: 0x%x"
> >      >   dm163_channels(int channel, uint8_t value) "channel %d: 0x%x"
> >      >   dm163_refresh_rate(uint32_t rr) "refresh rate %d"
> >      > +
> >      > +# apple-gfx.m
> >      > +apple_gfx_read(uint64_t offset, uint64_t res)
> >     "offset=0x%"PRIx64" res=0x%"PRIx64
> >      > +apple_gfx_write(uint64_t offset, uint64_t val)
> >     "offset=0x%"PRIx64" val=0x%"PRIx64
> >      > +apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=0x%x
> >     base_addr=%p"
> >      > +apple_gfx_destroy_task(void *task) "task=%p"
> >      > +apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t
> >     virtual_offset, uint32_t read_only) "task=%p range_count=0x%x
> >     virtual_offset=0x%"PRIx64" read_only=%d"
> >      > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr,
> >     uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64" phys_len=0x%"PRIx64
> >      > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t
> >     target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
> >      > +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset,
> >     uint64_t length) "task=%p virtual_offset=0x%"PRIx64"
> length=0x%"PRIx64
> >      > +apple_gfx_read_memory(uint64_t phys_address, uint64_t length,
> >     void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
> >      > +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
> >      > +apple_gfx_new_frame(void) ""
> >      > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64"
> >     y=%"PRId64
> >      > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t
> >     height) "bpp=%d width=%"PRId64" height=0x%"PRId64
> >      > +apple_gfx_cursor_show(uint32_t show) "show=%d"
> >      > +apple_gfx_cursor_move(void) ""
> >      > +apple_gfx_common_init(const char *device_name, size_t mmio_size)
> >     "device: %s; MMIO size: %zu bytes"
> >      > +
> >      > +# apple-gfx-mmio.m
> >      > +apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res)
> >     "offset=0x%"PRIx64" res=0x%"PRIx64
> >      > +apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val)
> >     "offset=0x%"PRIx64" val=0x%"PRIx64
> >      > +apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t
> >     ro, void *va, void *e, void *f, void* va_result, int success)
> >     "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d va=%p e=%p f=%p -> *va=%p,
> >     success = %d"
> >      > +apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c, void *d,
> >     void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
> >      > +apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
> >      > +
> >      > diff --git a/meson.build b/meson.build
> >      > index d26690ce204..0e124eff13f 100644
> >      > --- a/meson.build
> >      > +++ b/meson.build
> >      > @@ -761,6 +761,8 @@ socket = []
> >      >   version_res = []
> >      >   coref = []
> >      >   iokit = []
> >      > +pvg = []
> >      > +metal = []
> >      >   emulator_link_args = []
> >      >   midl = not_found
> >      >   widl = not_found
> >      > @@ -782,6 +784,8 @@ elif host_os == 'darwin'
> >      >     coref = dependency('appleframeworks', modules:
> 'CoreFoundation')
> >      >     iokit = dependency('appleframeworks', modules: 'IOKit',
> >     required: false)
> >      >     host_dsosuf = '.dylib'
> >      > +  pvg = dependency('appleframeworks', modules:
> >     'ParavirtualizedGraphics')
> >      > +  metal = dependency('appleframeworks', modules: 'Metal')
> >      >   elif host_os == 'sunos'
> >      >     socket = [cc.find_library('socket'),
> >      >               cc.find_library('nsl'),
> >
>
>

[-- Attachment #2: Type: text/html, Size: 60297 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-26 10:24         ` Phil Dennis-Jordan
@ 2024-10-28  7:42           ` Akihiko Odaki
  2024-10-28  9:00             ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-28  7:42 UTC (permalink / raw)
  To: Phil Dennis-Jordan
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/26 19:24, Phil Dennis-Jordan wrote:
> 
> 
> On Sat, 26 Oct 2024 at 06:40, Akihiko Odaki <akihiko.odaki@daynix.com 
> <mailto:akihiko.odaki@daynix.com>> wrote:
> 
>     On 2024/10/26 4:43, Phil Dennis-Jordan wrote:
>      >
>      >
>      > On Fri, 25 Oct 2024 at 08:03, Akihiko Odaki
>     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>      > <mailto:akihiko.odaki@daynix.com
>     <mailto:akihiko.odaki@daynix.com>>> wrote:
>      >
>      >     On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
>      >      > +    /* For running PVG memory-mapping requests in the AIO
>     context */
>      >      > +    QemuCond job_cond;
>      >      > +    QemuMutex job_mutex;
>      >
>      >     Use: QemuEvent
>      >
>      >
>      > Hmm. I think if we were to use that, we would need to create a new
>      > QemuEvent for every job and destroy it afterward, which seems
>     expensive.
>      > We can't rule out multiple concurrent jobs being submitted, and the
>      > QemuEvent system only supports a single producer as far as I can
>     tell.
>      >
>      > You can probably sort of hack around it with just one QemuEvent by
>      > putting the qemu_event_wait into a loop and turning the job.done
>     flag
>      > into an atomic (because it would now need to be checked outside the
>      > lock) but this all seems unnecessarily complicated considering the
>      > QemuEvent uses the same mechanism QemuCond/QemuMutex internally
>     on macOS
>      > (the only platform relevant here), except we can use it as
>     intended with
>      > QemuCond/QemuMutex rather than having to work against the
>     abstraction.
> 
>     I don't think it's going to be used concurrently. It would be difficult
>     to reason even for the framework if it performs memory
>     unmapping/mapping/reading operations concurrently.
> 
> 
> I've just performed a very quick test by wrapping the job submission/ 
> wait in the 2 mapMemory callbacks and the 1 readMemory callback with 
> atomic counters and logging whenever a counter went above 1.
> 
>   * Overall, concurrent callbacks across all types were common (many per 
> second when the VM is busy). It's not exactly a "thundering herd" (I 
> never saw >2) but it's probably not a bad idea to use a separate 
> condition variable for each job type. (task map, surface map, memory read)
>   * While I did not observe any concurrent memory mapping operations 
> *within* a type of memory map (2 task mappings or 2 surface mappings) I 
> did see very occasional concurrent memory *read* callbacks. These would, 
> as far as I can tell, not be safe with QemuEvents, unless we placed the 
> event inside the job struct and init/destroyed it on every callback 
> (which seems like excessive overhead).

I think we can tolerate that overhead. init/destroy essentially sets the 
fields in the data structure and I estimate its total size is about 100 
bytes. It is probably better than waking an irrelevant thread up. I also 
hope that keeps the code simple; it's not worthwhile adding code to 
optimize this.

> 
> My recommendation would be to split it up into 3 pairs of mutex/cond; 
> this will almost entirely remove any contention, but continue to be safe 
> for when it does occur. I don't think QemuEvent is a realistic option 
> (too tricky to get right) for the observed-concurrent readMemory 
> callback. I'm nervous about assuming the mapMemory callbacks will NEVER 
> be called concurrently, but at a push I'll acquiesce to switching those 
> to QemuEvent in the absence of evidence of concurrency.> >     PGDevice.h also notes
>     raiseInterrupt needs to be thread-safe while it doesn't make such notes
>     for memory operations. This actually makes sense.
> 
>     If it's ever going to be used concurrently, it's better to have
>     QemuEvent for each job to avoid the thundering herd problem.
 > >      >
>      >      > +
>      >      > +    dispatch_queue_t render_queue;
>      >      > +    /* The following fields should only be accessed from
>     the BQL: */
>      >
>      >     Perhaps it may be better to document fields that can be accessed
>      >     *without* the BQL; most things in QEMU implicitly require the
>     BQL.
>      >
>      >      > +    bool gfx_update_requested;
>      >      > +    bool new_frame_ready;
>      >      > +    bool using_managed_texture_storage;
>      >      > +} AppleGFXState;
>      >      > +
>      >      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s,
>     const
>      >     char* obj_name);
>      >      > +void apple_gfx_common_realize(AppleGFXState *s,
>      >     PGDeviceDescriptor *desc,
>      >      > +                              Error **errp);
>      >      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
>      >     guest_physical,
>      >      > +                                               uint64_t
>     length,
>      >     bool read_only);
>      >      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool
>     *job_done_flag);
>      >      > +
>      >      > +#endif
>      >      > +
>      >      > diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
>      >      > new file mode 100644
>      >      > index 00000000000..46be9957f69
>      >      > --- /dev/null
>      >      > +++ b/hw/display/apple-gfx.m
>      >      > @@ -0,0 +1,713 @@
>      >      > +/*
>      >      > + * QEMU Apple ParavirtualizedGraphics.framework device
>      >      > + *
>      >      > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All
>      >     Rights Reserved.
>      >      > + *
>      >      > + * This work is licensed under the terms of the GNU GPL,
>     version
>      >     2 or later.
>      >      > + * See the COPYING file in the top-level directory.
>      >      > + *
>      >      > + * ParavirtualizedGraphics.framework is a set of
>     libraries that
>      >     macOS provides
>      >      > + * which implements 3d graphics passthrough to the host
>     as well as a
>      >      > + * proprietary guest communication channel to drive it. This
>      >     device model
>      >      > + * implements support to drive that library from within QEMU.
>      >      > + */
>      >      > +
>      >      > +#include "qemu/osdep.h"
>      >      > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
>      >      > +#include <mach/mach_vm.h>
>      >      > +#include "apple-gfx.h"
>      >      > +#include "trace.h"
>      >      > +#include "qemu-main.h"
>      >      > +#include "exec/address-spaces.h"
>      >      > +#include "migration/blocker.h"
>      >      > +#include "monitor/monitor.h"
>      >      > +#include "qemu/main-loop.h"
>      >      > +#include "qemu/cutils.h"
>      >      > +#include "qemu/log.h"
>      >      > +#include "qapi/visitor.h"
>      >      > +#include "qapi/error.h"
>      >      > +#include "ui/console.h"
>      >      > +
>      >      > +static const PGDisplayCoord_t apple_gfx_modes[] = {
>      >      > +    { .x = 1440, .y = 1080 },
>      >      > +    { .x = 1280, .y = 1024 },
>      >      > +};
>      >      > +
>      >      > +/* This implements a type defined in
>     <ParavirtualizedGraphics/
>      >     PGDevice.h>
>      >      > + * which is opaque from the framework's point of view.
>     Typedef
>      >     PGTask_t already
>      >      > + * exists in the framework headers. */
>      >      > +struct PGTask_s {
>      >      > +    QTAILQ_ENTRY(PGTask_s) node;
>      >      > +    mach_vm_address_t address;
>      >      > +    uint64_t len;
>      >      > +};
>      >      > +
>      >      > +static Error *apple_gfx_mig_blocker;
>      >
>      >     This does not have to be a static variable.
>      >
>      >
>      > Hmm, the first 5 or so examples of migration blockers in other
>     devices
>      > etc. I could find were all declared in this way. What are you
>     suggesting
>      > as the alternative? And why not use the same pattern as in most
>     of the
>      > rest of the code base?
> 
>     I was wrong. This is better to be a static variable to ensure we won't
>     add the same blocker in case we have two device instances.
> 
>      >
>      >      > +
>      >      > +static void
>     apple_gfx_render_frame_completed(AppleGFXState *s,
>      >      > +                                             uint32_t width,
>      >     uint32_t height);
>      >      > +
>      >      > +static inline dispatch_queue_t get_background_queue(void)
>      >
>      >     Don't add inline. The only effect for modern compilers of
>     inline is to
>      >     suppress the unused function warnings.
>      >
>      >      > +{
>      >      > +    return
>      >     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
>      >      > +}
>      >      > +
>      >      > +static PGTask_t *apple_gfx_new_task(AppleGFXState *s,
>     uint64_t len)
>      >      > +{
>      >      > +    mach_vm_address_t task_mem;
>      >      > +    PGTask_t *task;
>      >      > +    kern_return_t r;
>      >      > +
>      >      > +    r = mach_vm_allocate(mach_task_self(), &task_mem, len,
>      >     VM_FLAGS_ANYWHERE);
>      >      > +    if (r != KERN_SUCCESS || task_mem == 0) {
>      >
>      >     Let's remove the check for task_mem == 0. We have no reason to
>      >     reject it
>      >     if the platform insists it allocated a memory at address 0 though
>      >     such a
>      >     situation should never happen in practice.
>      >
>      >      > +        return NULL;
>      >      > +    }
>      >      > +
>      >      > +    task = g_new0(PGTask_t, 1);
>      >      > +
>      >      > +    task->address = task_mem;
>      >      > +    task->len = len;
>      >      > +    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
>      >      > +
>      >      > +    return task;
>      >      > +}
>      >      > +
>      >      > +typedef struct AppleGFXIOJob {
>      >      > +    AppleGFXState *state;
>      >      > +    uint64_t offset;
>      >      > +    uint64_t value;
>      >      > +    bool completed;
>      >      > +} AppleGFXIOJob;
>      >      > +
>      >      > +static void apple_gfx_do_read(void *opaque)
>      >      > +{
>      >      > +    AppleGFXIOJob *job = opaque;
>      >      > +    job->value = [job->state->pgdev mmioReadAtOffset:job-
>      >offset];
>      >      > +    qatomic_set(&job->completed, true);
>      >      > +    aio_wait_kick();
>      >      > +}
>      >      > +
>      >      > +static uint64_t apple_gfx_read(void *opaque, hwaddr offset,
>      >     unsigned size)
>      >      > +{
>      >      > +    AppleGFXIOJob job = {
>      >      > +        .state = opaque,
>      >      > +        .offset = offset,
>      >      > +        .completed = false,
>      >      > +    };
>      >      > +    AioContext *context = qemu_get_aio_context();
>      >      > +    dispatch_queue_t queue = get_background_queue();
>      >      > +
>      >      > +    dispatch_async_f(queue, &job, apple_gfx_do_read);
>      >      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
>      >      > +
>      >      > +    trace_apple_gfx_read(offset, job.value);
>      >      > +    return job.value;
>      >      > +}
>      >      > +
>      >      > +static void apple_gfx_do_write(void *opaque)
>      >      > +{
>      >      > +    AppleGFXIOJob *job = opaque;
>      >      > +    [job->state->pgdev mmioWriteAtOffset:job->offset
>     value:job-
>      >      >value];
>      >      > +    qatomic_set(&job->completed, true);
>      >      > +    aio_wait_kick();
>      >      > +}
>      >      > +
>      >      > +static void apple_gfx_write(void *opaque, hwaddr offset,
>      >     uint64_t val,
>      >      > +                            unsigned size)
>      >      > +{
>      >      > +    /* The methods mmioReadAtOffset: and especially
>      >     mmioWriteAtOffset: can
>      >      > +     * trigger and block on operations on other dispatch
>     queues,
>      >     which in turn
>      >      > +     * may call back out on one or more of the callback
>     blocks.
>      >     For this reason,
>      >      > +     * and as we are holding the BQL, we invoke the I/O
>     methods
>      >     on a pool
>      >      > +     * thread and handle AIO tasks while we wait. Any work in
>      >     the callbacks
>      >      > +     * requiring the BQL will in turn schedule BHs which this
>      >     thread will
>      >      > +     * process while waiting. */
>      >      > +    AppleGFXIOJob job = {
>      >      > +        .state = opaque,
>      >      > +        .offset = offset,
>      >      > +        .value = val,
>      >      > +        .completed = false,
>      >      > +    };
>      >      > +    AioContext *context = qemu_get_current_aio_context();
>      >      > +    dispatch_queue_t queue = get_background_queue();
>      >      > +
>      >      > +    dispatch_async_f(queue, &job, apple_gfx_do_write);
>      >      > +    AIO_WAIT_WHILE(context, !qatomic_read(&job.completed));
>      >      > +
>      >      > +    trace_apple_gfx_write(offset, val);
>      >      > +}
>      >      > +
>      >      > +static const MemoryRegionOps apple_gfx_ops = {
>      >      > +    .read = apple_gfx_read,
>      >      > +    .write = apple_gfx_write,
>      >      > +    .endianness = DEVICE_LITTLE_ENDIAN,
>      >      > +    .valid = {
>      >      > +        .min_access_size = 4,
>      >      > +        .max_access_size = 8,
>      >      > +    },
>      >      > +    .impl = {
>      >      > +        .min_access_size = 4,
>      >      > +        .max_access_size = 4,
>      >      > +    },
>      >      > +};
>      >      > +
>      >      > +static void
>     apple_gfx_render_new_frame_bql_unlock(AppleGFXState *s)
>      >      > +{
>      >      > +    BOOL r;
>      >      > +    uint32_t width = surface_width(s->surface);
>      >      > +    uint32_t height = surface_height(s->surface);
>      >      > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
>      >      > +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue
>      >     commandBuffer];
>      >      > +    id<MTLTexture> texture = s->texture;
>      >      > +
>      >      > +    assert(bql_locked());
>      >      > +    [texture retain];
>      >      > +
>      >      > +    bql_unlock();
>      >      > +
>      >      > +    /* This is not safe to call from the BQL due to PVG-
>     internal
>      >     locks causing
>      >      > +     * deadlocks. */
>      >      > +    r = [s->pgdisp
>     encodeCurrentFrameToCommandBuffer:command_buffer
>      >      > +                                             texture:texture
>      >      > +                                              region:region];
>      >      > +    if (!r) {
>      >      > +        [texture release];
>      >      > +        bql_lock();
>      >      > +        --s->pending_frames;
>      >      > +        bql_unlock();
>      >      > +        qemu_log_mask(LOG_GUEST_ERROR,
>      >     "apple_gfx_render_new_frame_bql_unlock: "
>      >
>      >     Use: __func__
>      >
>      >      > +
>      >     "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
>      >      > +        return;
>      >      > +    }
>      >      > +
>      >      > +    if (s->using_managed_texture_storage) {
>      >      > +        /* "Managed" textures exist in both VRAM and RAM and
>      >     must be synced. */
>      >      > +        id<MTLBlitCommandEncoder> blit = [command_buffer
>      >     blitCommandEncoder];
>      >      > +        [blit synchronizeResource:texture];
>      >      > +        [blit endEncoding];
>      >      > +    }
>      >      > +    [texture release];
>      >      > +    [command_buffer addCompletedHandler:
>      >      > +        ^(id<MTLCommandBuffer> cb)
>      >      > +        {
>      >      > +            dispatch_async(s->render_queue, ^{
>      >      > +                apple_gfx_render_frame_completed(s,
>     width, height);
>      >      > +            });
>      >      > +        }];
>      >      > +    [command_buffer commit];
>      >      > +}
>      >      > +
>      >      > +static void copy_mtl_texture_to_surface_mem(id<MTLTexture>
>      >     texture, void *vram)
>      >      > +{
>      >      > +    /* TODO: Skip this entirely on a pure Metal or headless/
>      >     guest-only
>      >      > +     * rendering path, else use a blit command encoder? Needs
>      >     careful
>      >      > +     * (double?) buffering design. */
>      >      > +    size_t width = texture.width, height = texture.height;
>      >      > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
>      >      > +    [texture getBytes:vram
>      >      > +          bytesPerRow:(width * 4)
>      >      > +        bytesPerImage:(width * height * 4)
>      >      > +           fromRegion:region
>      >      > +          mipmapLevel:0
>      >      > +                slice:0];
>      >      > +}copy_mtl_texture_to_surface_mem
>      >      > +
>      >      > +static void
>     apple_gfx_render_frame_completed(AppleGFXState *s,
>      >      > +                                             uint32_t width,
>      >     uint32_t height)
>      >      > +{
>      >      > +    bql_lock();
>      >      > +    --s->pending_frames;
>      >      > +    assert(s->pending_frames >= 0);
>      >      > +
>      >      > +    /* Only update display if mode hasn't changed since we
>      >     started rendering. */
>      >      > +    if (width == surface_width(s->surface) &&
>      >      > +        height == surface_height(s->surface)) {
>      >      > +        copy_mtl_texture_to_surface_mem(s->texture, s->vram);
>      >      > +        if (s->gfx_update_requested) {
>      >      > +            s->gfx_update_requested = false;
>      >      > +            dpy_gfx_update_full(s->con);
>      >      > +            graphic_hw_update_done(s->con);
>      >      > +            s->new_frame_ready = false;
>      >      > +        } else {
>      >      > +            s->new_frame_ready = true;
>      >      > +        }
>      >      > +    }
>      >      > +    if (s->pending_frames > 0) {
>      >      > +        apple_gfx_render_new_frame_bql_unlock(s);
>      >      > +    } else {
>      >      > +        bql_unlock();
>      >      > +    }
>      >      > +}
>      >      > +
>      >      > +static void apple_gfx_fb_update_display(void *opaque)
>      >      > +{
>      >      > +    AppleGFXState *s = opaque;
>      >      > +
>      >      > +    assert(bql_locked());
>      >      > +    if (s->new_frame_ready) {
>      >      > +        dpy_gfx_update_full(s->con);
>      >      > +        s->new_frame_ready = false;
>      >      > +        graphic_hw_update_done(s->con);
>      >      > +    } else if (s->pending_frames > 0) {
>      >      > +        s->gfx_update_requested = true;
>      >      > +    } else {
>      >      > +        graphic_hw_update_done(s->con);
>      >      > +    }
>      >      > +}
>      >      > +
>      >      > +static const GraphicHwOps apple_gfx_fb_ops = {
>      >      > +    .gfx_update = apple_gfx_fb_update_display,
>      >      > +    .gfx_update_async = true,
>      >      > +};
>      >      > +
>      >      > +static void update_cursor(AppleGFXState *s)
>      >      > +{
>      >      > +    assert(bql_locked());
>      >      > +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
>      >      > +                  s->pgdisp.cursorPosition.y, s-
>      >cursor_show);
>      >      > +}
>      >      > +
>      >      > +static void set_mode(AppleGFXState *s, uint32_t width,
>     uint32_t
>      >     height)
>      >      > +{
>      >      > +    MTLTextureDescriptor *textureDescriptor;
>      >      > +
>      >      > +    if (s->surface &&
>      >      > +        width == surface_width(s->surface) &&
>      >      > +        height == surface_height(s->surface)) {
>      >      > +        return;
>      >      > +    }
>      >      > +
>      >      > +    g_free(s->vram);
>      >      > +    [s->texture release];
>      >      > +
>      >      > +    s->vram = g_malloc0_n(width * height, 4);
>      >      > +    s->surface = qemu_create_displaysurface_from(width,
>     height,
>      >     PIXMAN_LE_a8r8g8b8,
>      >      > +                                                 width *
>     4, s-
>      >      >vram);> +> +    @autoreleasepool {
>      >      > +        textureDescriptor =
>      >      > +            [MTLTextureDescriptor
>      >      > +
>      >     texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
>      >      > +                                             width:width
>      >      > +                                            height:height
>      >      > +                                         mipmapped:NO];
>      >      > +        textureDescriptor.usage = s-
>      >pgdisp.minimumTextureUsage;
>      >      > +        s->texture = [s->mtl
>      >     newTextureWithDescriptor:textureDescriptor];
>      >
>      >
>      >     What about creating pixman_image_t from s-
>      >texture.buffer.contents?
>      >     This
>      >     should save memory usage by removing the duplication of texture.
>      >
>      >
>      > We need explicit control over when the GPU vs when the CPU may
>     access
>      > the texture - only one of them may access them at a time. As far
>     as I
>      > can tell, we can't control when the rest of Qemu might access the
>      > pixman_image used in the console surface?
> 
>     You are right; we need to have duplicate buffers. We can still avoid
>     copying by using two MTLTextures for double-buffering instead of having
>     a MTLTexture and a pixman_image and copying between them for
>     MTLStorageModeManaged.
> 
> Do I understand correctly that you intend to swap the surface->image on 
> every frame, or even the surface->image->data? If so, it's my 
> understanding from reading the source of a bunch of UI implementations a 
> few weeks ago that this is neither supported nor safe, as some 
> implementations take long-lived references to these internal data 
> structures until a dpy_gfx_switch callback. And the implementations for 
> those callbacks are in turn very expensive in some cases. This is why my 
> conclusion in the v4 thread was that double-buffering was infeasible 
> with the current architecture.

By the way, can't we take the BQL after 
encodeCurrentFrameToCommandBuffer and keep it until the completion 
handler? PVG requires the BQL unlocked for forward progress due to the 
bottom half usage in callbacks, but Metal doesn't.

> 
>      >
>      >      > +    }
>      >      > +
>      >      > +    s->using_managed_texture_storage =
>      >      > +        (s->texture.storageMode == MTLStorageModeManaged);
>      >      > +    dpy_gfx_replace_surface(s->con, s->surface);
>      >      > +}
>      >      > +
>      >      > +static void create_fb(AppleGFXState *s)
>      >      > +{
>      >      > +    s->con = graphic_console_init(NULL, 0,
>     &apple_gfx_fb_ops, s);
>      >      > +    set_mode(s, 1440, 1080);
>      >      > +
>      >      > +    s->cursor_show = true;
>      >      > +}
>      >      > +
>      >      > +static size_t apple_gfx_get_default_mmio_range_size(void)
>      >      > +{
>      >      > +    size_t mmio_range_size;
>      >      > +    @autoreleasepool {
>      >      > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
>      >      > +        mmio_range_size = desc.mmioLength;
>      >      > +        [desc release];
>      >      > +    }
>      >      > +    return mmio_range_size;
>      >      > +}
>      >      > +
>      >      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s,
>     const
>      >     char* obj_name)
>      >      > +{
>      >      > +    size_t mmio_range_size =
>      >     apple_gfx_get_default_mmio_range_size();
>      >      > +
>      >      > +    trace_apple_gfx_common_init(obj_name, mmio_range_size);
>      >      > +    memory_region_init_io(&s->iomem_gfx, obj,
>     &apple_gfx_ops, s,
>      >     obj_name,
>      >      > +                          mmio_range_size);
>      >      > +
>      >      > +    /* TODO: PVG framework supports serialising device state:
>      >     integrate it! */
>      >      > +}
>      >      > +
>      >      > +typedef struct AppleGFXMapMemoryJob {
>      >      > +    AppleGFXState *state;
>      >      > +    PGTask_t *task;
>      >      > +    uint64_t virtual_offset;
>      >      > +    PGPhysicalMemoryRange_t *ranges;
>      >      > +    uint32_t range_count;
>      >      > +    bool read_only;
>      >      > +    bool success;
>      >      > +    bool done;
>      >      > +} AppleGFXMapMemoryJob;
>      >      > +
>      >      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
>      >     guest_physical,
>      >      > +                                               uint64_t
>     length,
>      >     bool read_only)
>      >      > +{
>      >      > +    MemoryRegion *ram_region;
>      >      > +    uintptr_t host_address;
>      >      > +    hwaddr ram_region_offset = 0;
>      >      > +    hwaddr ram_region_length = length;
>      >      > +
>      >      > +    ram_region =
>     address_space_translate(&address_space_memory,
>      >      > +                                         guest_physical,
>      >      > +                                         &ram_region_offset,
>      >      > +                                       
>       &ram_region_length, !
>      >     read_only,
>      >      > +                                       
>       MEMTXATTRS_UNSPECIFIED);
>      >
>      >     Call memory_region_ref() so that it won't go away.
>      >
>      >      > +
>      >      > +    if (!ram_region || ram_region_length < length ||
>      >      > +        !memory_access_is_direct(ram_region, !read_only)) {
>      >      > +        return 0;
>      >      > +    }
>      >      > +
>      >      > +    host_address =
>      >     (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);
>      >
>      >     host_address is typed as uintptr_t, not mach_vm_address_t.
>      >
>      >      > +    if (host_address == 0) {
>      >      > +        return 0;
>      >      > +    }
>      >      > +    host_address += ram_region_offset;
>      >      > +
>      >      > +    return host_address;
>      >      > +}
>      >      > +
>      >      > +static void apple_gfx_map_memory(void *opaque)
>      >      > +{
>      >      > +    AppleGFXMapMemoryJob *job = opaque;
>      >      > +    AppleGFXState *s = job->state;
>      >      > +    PGTask_t *task                  = job->task;
>      >      > +    uint32_t range_count            = job->range_count;
>      >      > +    uint64_t virtual_offset         = job->virtual_offset;
>      >      > +    PGPhysicalMemoryRange_t *ranges = job->ranges;
>      >      > +    bool read_only                  = job->read_only;
>      >      > +    kern_return_t r;
>      >      > +    mach_vm_address_t target, source;
>      >      > +    vm_prot_t cur_protection, max_protection;
>      >      > +    bool success = true;
>      >      > +
>      >      > +    g_assert(bql_locked());
>      >      > +
>      >      > +    trace_apple_gfx_map_memory(task, range_count,
>      >     virtual_offset, read_only);
>      >      > +    for (int i = 0; i < range_count; i++) {
>      >      > +        PGPhysicalMemoryRange_t *range = &ranges[i];
>      >      > +
>      >      > +        target = task->address + virtual_offset;
>      >      > +        virtual_offset += range->physicalLength;
>      >      > +
>      >      > +        trace_apple_gfx_map_memory_range(i, range-
>      >physicalAddress,
>      >      > +                                         range-
>      >physicalLength);
>      >      > +
>      >      > +        source = apple_gfx_host_address_for_gpa_range(range-
>      >      >physicalAddress,
>      >      > +                                                      range-
>      >      >physicalLength,
>      >      > +                                                     
>     read_only);
>      >      > +        if (source == 0) {
>      >      > +            success = false;
>      >      > +            continue;
>      >      > +        }
>      >      > +
>      >      > +        MemoryRegion* alt_mr = NULL;
>      >      > +        mach_vm_address_t alt_source =
>      >     (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress,
>     range-
>      >      >physicalLength, NULL);
>      >      > +        g_assert(alt_source == source);
>      >
>      >     Remove this; I guess this is for debugging.
>      >
>      >      > +
>      >      > +        cur_protection = 0;
>      >      > +        max_protection = 0;
>      >      > +        // Map guest RAM at range->physicalAddress into
>     PG task
>      >     memory range
>      >      > +        r = mach_vm_remap(mach_task_self(),
>      >      > +                          &target, range->physicalLength,
>      >     vm_page_size - 1,
>      >      > +                          VM_FLAGS_FIXED |
>     VM_FLAGS_OVERWRITE,
>      >      > +                          mach_task_self(),
>      >      > +                          source, false /* shared mapping, no
>      >     copy */,
>      >      > +                          &cur_protection, &max_protection,
>      >      > +                          VM_INHERIT_COPY);
>      >      > +        trace_apple_gfx_remap(r, source, target);
>      >      > +        g_assert(r == KERN_SUCCESS);
>      >      > +    }
>      >      > +
>      >      > +    qemu_mutex_lock(&s->job_mutex);
>      >      > +    job->success = success;
>      >      > +    job->done = true;
>      >      > +    qemu_cond_broadcast(&s->job_cond);
>      >      > +    qemu_mutex_unlock(&s->job_mutex);
>      >      > +}
>      >      > +
>      >      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool
>     *job_done_flag)
>      >      > +{
>      >      > +    qemu_mutex_lock(&s->job_mutex);
>      >      > +    while (!*job_done_flag) {
>      >      > +        qemu_cond_wait(&s->job_cond, &s->job_mutex);
>      >      > +    }
>      >      > +    qemu_mutex_unlock(&s->job_mutex);
>      >      > +}
>      >      > +
>      >      > +typedef struct AppleGFXReadMemoryJob {
>      >      > +    AppleGFXState *s;
>      >      > +    hwaddr physical_address;
>      >      > +    uint64_t length;
>      >      > +    void *dst;
>      >      > +    bool done;
>      >      > +} AppleGFXReadMemoryJob;
>      >      > +
>      >      > +static void apple_gfx_do_read_memory(void *opaque)
>      >      > +{
>      >      > +    AppleGFXReadMemoryJob *job = opaque;
>      >      > +    AppleGFXState *s = job->s;
>      >      > +
>      >      > +    cpu_physical_memory_read(job->physical_address, job->dst,
>      >     job->length);
>      >
>      >     Use: dma_memory_read()
>      >
>      >      > +
>      >      > +    qemu_mutex_lock(&s->job_mutex);
>      >      > +    job->done = true;
>      >      > +    qemu_cond_broadcast(&s->job_cond);
>      >      > +    qemu_mutex_unlock(&s->job_mutex);
>      >      > +}
>      >      > +
>      >      > +static void apple_gfx_read_memory(AppleGFXState *s, hwaddr
>      >     physical_address,
>      >      > +                                  uint64_t length, void *dst)
>      >      > +{
>      >      > +    AppleGFXReadMemoryJob job = {
>      >      > +        s, physical_address, length, dst
>      >      > +    };
>      >      > +
>      >      > +    trace_apple_gfx_read_memory(physical_address, length,
>     dst);
>      >      > +
>      >      > +    /* Traversing the memory map requires RCU/BQL, so do
>     it in a
>      >     BH. */
>      >      > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
>      >     apple_gfx_do_read_memory,
>      >      > +                            &job);
>      >      > +    apple_gfx_await_bh_job(s, &job.done);
>      >      > +}
>      >      > +
>      >      > +static void
>      >     apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
>      >      > +
>      >       PGDeviceDescriptor *desc)
>      >      > +{
>      >      > +    desc.createTask = ^(uint64_t vmSize, void * _Nullable *
>      >     _Nonnull baseAddress) {
>      >      > +        PGTask_t *task = apple_gfx_new_task(s, vmSize);
>      >      > +        *baseAddress = (void *)task->address;
>      >      > +        trace_apple_gfx_create_task(vmSize, *baseAddress);
>      >      > +        return task;
>      >      > +    };
>      >      > +
>      >      > +    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
>      >      > +        trace_apple_gfx_destroy_task(task);
>      >      > +        QTAILQ_REMOVE(&s->tasks, task, node);
>      >      > +        mach_vm_deallocate(mach_task_self(), task->address,
>      >     task->len);
>      >      > +        g_free(task);
>      >      > +    };
>      >      > +
>      >      > +    desc.mapMemory = ^bool(PGTask_t * _Nonnull task, uint32_t
>      >     range_count,
>      >      > +                       uint64_t virtual_offset, bool
>     read_only,
>      >      > +                       PGPhysicalMemoryRange_t * _Nonnull
>     ranges) {
>      >      > +        AppleGFXMapMemoryJob job = {
>      >      > +            .state = s,
>      >      > +            .task = task, .ranges = ranges, .range_count =
>      >     range_count,
>      >      > +            .read_only = read_only, .virtual_offset =
>      >     virtual_offset,
>      >      > +            .done = false, .success = true,
>      >      > +        };
>      >      > +        if (range_count > 0) {
>      >      > +            aio_bh_schedule_oneshot(qemu_get_aio_context(),
>      >      > +                                    apple_gfx_map_memory,
>     &job);
>      >      > +            apple_gfx_await_bh_job(s, &job.done);
>      >      > +        }
>      >      > +        return job.success;
>      >      > +    };
>      >      > +
>      >      > +    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task,
>     uint64_t
>      >     virtualOffset,
>      >      > +                         uint64_t length) {
>      >      > +        kern_return_t r;
>      >      > +        mach_vm_address_t range_address;
>      >      > +
>      >      > +        trace_apple_gfx_unmap_memory(task, virtualOffset,
>     length);
>      >      > +
>      >      > +        /* Replace task memory range with fresh pages,
>     undoing
>      >     the mapping
>      >      > +         * from guest RAM. */
>      >      > +        range_address = task->address + virtualOffset;
>      >      > +        r = mach_vm_allocate(mach_task_self(),
>     &range_address,
>      >     length,
>      >      > +                             VM_FLAGS_FIXED |
>     VM_FLAGS_OVERWRITE);
>      >      > +        g_assert(r == KERN_SUCCESS);error_setg
>      >
>      >     An extra error_setg
>      >
>      >      > +
>      >      > +        return true;
>      >      > +    };
>      >      > +
>      >      > +    desc.readMemory = ^bool(uint64_t physical_address,
>     uint64_t
>      >     length,
>      >      > +                            void * _Nonnull dst) {
>      >      > +        apple_gfx_read_memory(s, physical_address,
>     length, dst);
>      >      > +        return true;
>      >      > +    };
>      >      > +}
>      >      > +
>      >      > +static PGDisplayDescriptor
>      >     *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
>      >      > +{
>      >      > +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor
>     new];
>      >      > +
>      >      > + disp_desc.name <http://disp_desc.name> <http://
>     disp_desc.name <http://disp_desc.name>> = @"QEMU display";
>      >      > +    disp_desc.sizeInMillimeters = NSMakeSize(400.,
>     300.); /* A
>      >     20" display */
>      >      > +    disp_desc.queue = dispatch_get_main_queue();
>      >      > +    disp_desc.newFrameEventHandler = ^(void) {
>      >      > +        trace_apple_gfx_new_frame();
>      >      > +        dispatch_async(s->render_queue, ^{
>      >      > +            /* Drop frames if we get too far ahead. */
>      >      > +            bql_lock();
>      >      > +            if (s->pending_frames >= 2) {
>      >      > +                bql_unlock();
>      >      > +                return;
>      >      > +            }
>      >      > +            ++s->pending_frames;
>      >      > +            if (s->pending_frames > 1) {
>      >      > +                bql_unlock();
>      >      > +                return;
>      >      > +            }
>      >      > +            @autoreleasepool {
>      >      > +                apple_gfx_render_new_frame_bql_unlock(s);
>      >      > +            }
>      >      > +        });
>      >      > +    };
>      >      > +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t
>     sizeInPixels,
>      >      > +                                    OSType pixelFormat) {
>      >      > +        trace_apple_gfx_mode_change(sizeInPixels.x,
>     sizeInPixels.y);
>      >      > +
>      >      > +        BQL_LOCK_GUARD();
>      >      > +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
>      >      > +    };
>      >      > +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
>      >      > +                                     PGDisplayCoord_t
>     hotSpot) {
>      >      > +        [glyph retain];
>      >      > +        dispatch_async(get_background_queue(), ^{
>      >      > +            BQL_LOCK_GUARD();
>      >      > +            uint32_t bpp = glyph.bitsPerPixel;
>      >      > +            size_t width = glyph.pixelsWide;
>      >      > +            size_t height = glyph.pixelsHigh;
>      >      > +            size_t padding_bytes_per_row =
>     glyph.bytesPerRow -
>      >     width * 4;
>      >      > +            const uint8_t* px_data = glyph.bitmapData;
>      >      > +
>      >      > +            trace_apple_gfx_cursor_set(bpp, width, height);
>      >      > +
>      >      > +            if (s->cursor) {
>      >      > +                cursor_unref(s->cursor);
>      >      > +                s->cursor = NULL;
>      >      > +            }
>      >      > +
>      >      > +            if (bpp == 32) { /* Shouldn't be anything
>     else, but
>      >     just to be safe...*/
>      >      > +                s->cursor = cursor_alloc(width, height);
>      >      > +                s->cursor->hot_x = hotSpot.x;
>      >      > +                s->cursor->hot_y = hotSpot.y;
>      >      > +
>      >      > +                uint32_t *dest_px = s->cursor->data;
>      >      > +
>      >      > +                for (size_t y = 0; y < height; ++y) {
>      >      > +                    for (size_t x = 0; x < width; ++x) {
>      >      > +                        /* NSBitmapImageRep's red & blue
>      >     channels are swapped
>      >      > +                         * compared to QEMUCursor's. */
>      >      > +                        *dest_px =
>      >      > +                            (px_data[0] << 16u) |
>      >      > +                            (px_data[1] <<  8u) |
>      >      > +                            (px_data[2] <<  0u) |
>      >      > +                            (px_data[3] << 24u);
>      >      > +                        ++dest_px;
>      >      > +                        px_data += 4;
>      >      > +                    }
>      >      > +                    px_data += padding_bytes_per_row;
>      >      > +                }
>      >      > +                dpy_cursor_define(s->con, s->cursor);
>      >      > +                update_cursor(s);
>      >      > +            }
>      >      > +            [glyph release];
>      >      > +        });
>      >      > +    };
>      >      > +    disp_desc.cursorShowHandler = ^(BOOL show) {
>      >      > +        dispatch_async(get_background_queue(), ^{
>      >      > +            BQL_LOCK_GUARD();
>      >      > +            trace_apple_gfx_cursor_show(show);
>      >      > +            s->cursor_show = show;
>      >      > +            update_cursor(s);
>      >      > +        });
>      >      > +    };
>      >      > +    disp_desc.cursorMoveHandler = ^(void) {
>      >      > +        dispatch_async(get_background_queue(), ^{
>      >      > +            BQL_LOCK_GUARD();
>      >      > +            trace_apple_gfx_cursor_move();
>      >      > +            update_cursor(s);
>      >      > +        });
>      >      > +    };
>      >      > +
>      >      > +    return disp_desc;
>      >      > +}
>      >      > +
>      >      > +static NSArray<PGDisplayMode*>*
>      >     apple_gfx_prepare_display_mode_array(void)
>      >      > +{
>      >      > +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
>      >      > +    NSArray<PGDisplayMode*>* mode_array = nil;
>      >      > +    int i;
>      >      > +
>      >      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
>      >      > +        modes[i] =
>      >      > +            [[PGDisplayMode alloc]
>      >     initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
>      >      > +    }
>      >      > +
>      >      > +    mode_array = [NSArray arrayWithObjects:modes
>      >     count:ARRAY_SIZE(apple_gfx_modes)];
>      >      > +
>      >      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
>      >      > +        [modes[i] release];
>      >      > +        modes[i] = nil;
>      >      > +    }
>      >      > +
>      >      > +    return mode_array;
>      >      > +}
>      >      > +
>      >      > +static id<MTLDevice> copy_suitable_metal_device(void)
>      >      > +{
>      >      > +    id<MTLDevice> dev = nil;
>      >      > +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
>      >      > +
>      >      > +    /* Prefer a unified memory GPU. Failing that, pick a non-
>      >     removable GPU. */
>      >      > +    for (size_t i = 0; i < devs.count; ++i) {
>      >      > +        if (devs[i].hasUnifiedMemory) {
>      >      > +            dev = devs[i];
>      >      > +            break;
>      >      > +        }
>      >      > +        if (!devs[i].removable) {
>      >      > +            dev = devs[i];
>      >      > +        }
>      >      > +    }
>      >      > +
>      >      > +    if (dev != nil) {
>      >      > +        [dev retain];
>      >      > +    } else {
>      >      > +        dev = MTLCreateSystemDefaultDevice();
>      >      > +    }
>      >      > +    [devs release];
>      >      > +
>      >      > +    return dev;
>      >      > +}
>      >      > +
>      >      > +void apple_gfx_common_realize(AppleGFXState *s,
>      >     PGDeviceDescriptor *desc,
>      >      > +                              Error **errp)
>      >      > +{
>      >      > +    PGDisplayDescriptor *disp_desc = nil;
>      >      > +
>      >      > +    if (apple_gfx_mig_blocker == NULL) {
>      >      > +        error_setg(&apple_gfx_mig_blocker,
>      >      > +                  "Migration state blocked by apple-gfx
>     display
>      >     device");
>      >      > +        if (migrate_add_blocker(&apple_gfx_mig_blocker,
>     errp) < 0) {
>      >      > +            return;
>      >      > +        }
>      >      > +    }
>      >      > +
>      >      > +    QTAILQ_INIT(&s->tasks);
>      >      > +    s->render_queue = dispatch_queue_create("apple-
>     gfx.render",
>      >      > +                                           
>     DISPATCH_QUEUE_SERIAL);
>      >      > +    s->mtl = copy_suitable_metal_device();
>      >      > +    s->mtl_queue = [s->mtl newCommandQueue];
>      >      > +
>      >      > +    desc.device = s->mtl;
>      >      > +
>      >      > +    apple_gfx_register_task_mapping_handlers(s, desc);
>      >      > +
>      >      > +    s->pgdev = PGNewDeviceWithDescriptor(desc);
>      >      > +
>      >      > +    disp_desc = apple_gfx_prepare_display_descriptor(s);
>      >      > +    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
>      >      > +                                              port:0
>      >     serialNum:1234];
>      >      > +    [disp_desc release];
>      >      > +    s->pgdisp.modeList =
>     apple_gfx_prepare_display_mode_array();
>      >      > +
>      >      > +    create_fb(s);
>      >      > +
>      >      > +    qemu_mutex_init(&s->job_mutex);
>      >      > +    qemu_cond_init(&s->job_cond);
>      >      > +}
>      >      > diff --git a/hw/display/meson.build b/hw/display/meson.build
>      >      > index 20a94973fa2..619e642905a 100644
>      >      > --- a/hw/display/meson.build
>      >      > +++ b/hw/display/meson.build
>      >      > @@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST',
>     if_true:
>      >     files('artist.c'))
>      >      >
>      >      >   system_ss.add(when: 'CONFIG_ATI_VGA', if_true:
>     [files('ati.c',
>      >     'ati_2d.c', 'ati_dbg.c'), pixman])
>      >      >
>      >      > +system_ss.add(when: 'CONFIG_MAC_PVG',         if_true:
>      >     [files('apple-gfx.m'), pvg, metal])
>      >      > +if cpu == 'aarch64'
>      >      > +  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true:
>      >     [files('apple-gfx-mmio.m'), pvg, metal])
>      >      > +endif
>      >      >
>      >      >   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
>      >      >     virtio_gpu_ss = ss.source_set()
>      >      > diff --git a/hw/display/trace-events b/hw/display/trace-events
>      >      > index 781f8a33203..214998312b9 100644
>      >      > --- a/hw/display/trace-events
>      >      > +++ b/hw/display/trace-events
>      >      > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width)
>      >     "dest_width : %u"
>      >      >   dm163_leds(int led, uint32_t value) "led %d: 0x%x"
>      >      >   dm163_channels(int channel, uint8_t value) "channel %d:
>     0x%x"
>      >      >   dm163_refresh_rate(uint32_t rr) "refresh rate %d"
>      >      > +
>      >      > +# apple-gfx.m
>      >      > +apple_gfx_read(uint64_t offset, uint64_t res)
>      >     "offset=0x%"PRIx64" res=0x%"PRIx64
>      >      > +apple_gfx_write(uint64_t offset, uint64_t val)
>      >     "offset=0x%"PRIx64" val=0x%"PRIx64
>      >      > +apple_gfx_create_task(uint32_t vm_size, void *va)
>     "vm_size=0x%x
>      >     base_addr=%p"
>      >      > +apple_gfx_destroy_task(void *task) "task=%p"
>      >      > +apple_gfx_map_memory(void *task, uint32_t range_count,
>     uint64_t
>      >     virtual_offset, uint32_t read_only) "task=%p range_count=0x%x
>      >     virtual_offset=0x%"PRIx64" read_only=%d"
>      >      > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr,
>      >     uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64"
>     phys_len=0x%"PRIx64
>      >      > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t
>      >     target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
>      >      > +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset,
>      >     uint64_t length) "task=%p virtual_offset=0x%"PRIx64"
>     length=0x%"PRIx64
>      >      > +apple_gfx_read_memory(uint64_t phys_address, uint64_t length,
>      >     void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
>      >      > +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
>      >      > +apple_gfx_new_frame(void) ""
>      >      > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64"
>      >     y=%"PRId64
>      >      > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t
>      >     height) "bpp=%d width=%"PRId64" height=0x%"PRId64
>      >      > +apple_gfx_cursor_show(uint32_t show) "show=%d"
>      >      > +apple_gfx_cursor_move(void) ""
>      >      > +apple_gfx_common_init(const char *device_name, size_t
>     mmio_size)
>      >     "device: %s; MMIO size: %zu bytes"
>      >      > +
>      >      > +# apple-gfx-mmio.m
>      >      > +apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res)
>      >     "offset=0x%"PRIx64" res=0x%"PRIx64
>      >      > +apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val)
>      >     "offset=0x%"PRIx64" val=0x%"PRIx64
>      >      > +apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len,
>     uint32_t
>      >     ro, void *va, void *e, void *f, void* va_result, int success)
>      >     "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d va=%p e=%p f=%p ->
>     *va=%p,
>      >     success = %d"
>      >      > +apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c,
>     void *d,
>      >     void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
>      >      > +apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
>      >      > +
>      >      > diff --git a/meson.build b/meson.build
>      >      > index d26690ce204..0e124eff13f 100644
>      >      > --- a/meson.build
>      >      > +++ b/meson.build
>      >      > @@ -761,6 +761,8 @@ socket = []
>      >      >   version_res = []
>      >      >   coref = []
>      >      >   iokit = []
>      >      > +pvg = []
>      >      > +metal = []
>      >      >   emulator_link_args = []
>      >      >   midl = not_found
>      >      >   widl = not_found
>      >      > @@ -782,6 +784,8 @@ elif host_os == 'darwin'
>      >      >     coref = dependency('appleframeworks', modules:
>     'CoreFoundation')
>      >      >     iokit = dependency('appleframeworks', modules: 'IOKit',
>      >     required: false)
>      >      >     host_dsosuf = '.dylib'
>      >      > +  pvg = dependency('appleframeworks', modules:
>      >     'ParavirtualizedGraphics')
>      >      > +  metal = dependency('appleframeworks', modules: 'Metal')
>      >      >   elif host_os == 'sunos'
>      >      >     socket = [cc.find_library('socket'),
>      >      >               cc.find_library('nsl'),
>      >
> 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-28  7:42           ` Akihiko Odaki
@ 2024-10-28  9:00             ` Phil Dennis-Jordan
  2024-10-28 13:31               ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-28  9:00 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 53917 bytes --]

On Mon, 28 Oct 2024 at 08:42, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/26 19:24, Phil Dennis-Jordan wrote:
> >
> >
> > On Sat, 26 Oct 2024 at 06:40, Akihiko Odaki <akihiko.odaki@daynix.com
> > <mailto:akihiko.odaki@daynix.com>> wrote:
> >
> >     On 2024/10/26 4:43, Phil Dennis-Jordan wrote:
> >      >
> >      >
> >      > On Fri, 25 Oct 2024 at 08:03, Akihiko Odaki
> >     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
> >      > <mailto:akihiko.odaki@daynix.com
> >     <mailto:akihiko.odaki@daynix.com>>> wrote:
> >      >
> >      >     On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> >      >      > +    /* For running PVG memory-mapping requests in the AIO
> >     context */
> >      >      > +    QemuCond job_cond;
> >      >      > +    QemuMutex job_mutex;
> >      >
> >      >     Use: QemuEvent
> >      >
> >      >
> >      > Hmm. I think if we were to use that, we would need to create a new
> >      > QemuEvent for every job and destroy it afterward, which seems
> >     expensive.
> >      > We can't rule out multiple concurrent jobs being submitted, and
> the
> >      > QemuEvent system only supports a single producer as far as I can
> >     tell.
> >      >
> >      > You can probably sort of hack around it with just one QemuEvent by
> >      > putting the qemu_event_wait into a loop and turning the job.done
> >     flag
> >      > into an atomic (because it would now need to be checked outside
> the
> >      > lock) but this all seems unnecessarily complicated considering the
> >      > QemuEvent uses the same mechanism QemuCond/QemuMutex internally
> >     on macOS
> >      > (the only platform relevant here), except we can use it as
> >     intended with
> >      > QemuCond/QemuMutex rather than having to work against the
> >     abstraction.
> >
> >     I don't think it's going to be used concurrently. It would be
> difficult
> >     to reason even for the framework if it performs memory
> >     unmapping/mapping/reading operations concurrently.
> >
> >
> > I've just performed a very quick test by wrapping the job submission/
> > wait in the 2 mapMemory callbacks and the 1 readMemory callback with
> > atomic counters and logging whenever a counter went above 1.
> >
> >   * Overall, concurrent callbacks across all types were common (many per
> > second when the VM is busy). It's not exactly a "thundering herd" (I
> > never saw >2) but it's probably not a bad idea to use a separate
> > condition variable for each job type. (task map, surface map, memory
> read)
> >   * While I did not observe any concurrent memory mapping operations
> > *within* a type of memory map (2 task mappings or 2 surface mappings) I
> > did see very occasional concurrent memory *read* callbacks. These would,
> > as far as I can tell, not be safe with QemuEvents, unless we placed the
> > event inside the job struct and init/destroyed it on every callback
> > (which seems like excessive overhead).
>
> I think we can tolerate that overhead. init/destroy essentially sets the
> fields in the data structure and I estimate its total size is about 100
> bytes. It is probably better than waking an irrelevant thread up. I also
> hope that keeps the code simple; it's not worthwhile adding code to
> optimize this.
>

At least pthread_cond_{init,destroy} and pthread_mutex_{init,destroy} don't
make any syscalls, so yeah it's probably an acceptable overhead.


> >
> > My recommendation would be to split it up into 3 pairs of mutex/cond;
> > this will almost entirely remove any contention, but continue to be safe
> > for when it does occur. I don't think QemuEvent is a realistic option
> > (too tricky to get right) for the observed-concurrent readMemory
> > callback. I'm nervous about assuming the mapMemory callbacks will NEVER
> > be called concurrently, but at a push I'll acquiesce to switching those
> > to QemuEvent in the absence of evidence of concurrency.> >
>  PGDevice.h also notes
> >     raiseInterrupt needs to be thread-safe while it doesn't make such
> notes
> >     for memory operations. This actually makes sense.
> >
> >     If it's ever going to be used concurrently, it's better to have
> >     QemuEvent for each job to avoid the thundering herd problem.
>  > >      >
> >      >      > +
> >      >      > +    dispatch_queue_t render_queue;
> >      >      > +    /* The following fields should only be accessed from
> >     the BQL: */
> >      >
> >      >     Perhaps it may be better to document fields that can be
> accessed
> >      >     *without* the BQL; most things in QEMU implicitly require the
> >     BQL.
> >      >
> >      >      > +    bool gfx_update_requested;
> >      >      > +    bool new_frame_ready;
> >      >      > +    bool using_managed_texture_storage;
> >      >      > +} AppleGFXState;
> >      >      > +
> >      >      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s,
> >     const
> >      >     char* obj_name);
> >      >      > +void apple_gfx_common_realize(AppleGFXState *s,
> >      >     PGDeviceDescriptor *desc,
> >      >      > +                              Error **errp);
> >      >      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
> >      >     guest_physical,
> >      >      > +                                               uint64_t
> >     length,
> >      >     bool read_only);
> >      >      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool
> >     *job_done_flag);
> >      >      > +
> >      >      > +#endif
> >      >      > +
> >      >      > diff --git a/hw/display/apple-gfx.m
> b/hw/display/apple-gfx.m
> >      >      > new file mode 100644
> >      >      > index 00000000000..46be9957f69
> >      >      > --- /dev/null
> >      >      > +++ b/hw/display/apple-gfx.m
> >      >      > @@ -0,0 +1,713 @@
> >      >      > +/*
> >      >      > + * QEMU Apple ParavirtualizedGraphics.framework device
> >      >      > + *
> >      >      > + * Copyright © 2023 Amazon.com, Inc. or its affiliates.
> All
> >      >     Rights Reserved.
> >      >      > + *
> >      >      > + * This work is licensed under the terms of the GNU GPL,
> >     version
> >      >     2 or later.
> >      >      > + * See the COPYING file in the top-level directory.
> >      >      > + *
> >      >      > + * ParavirtualizedGraphics.framework is a set of
> >     libraries that
> >      >     macOS provides
> >      >      > + * which implements 3d graphics passthrough to the host
> >     as well as a
> >      >      > + * proprietary guest communication channel to drive it.
> This
> >      >     device model
> >      >      > + * implements support to drive that library from within
> QEMU.
> >      >      > + */
> >      >      > +
> >      >      > +#include "qemu/osdep.h"
> >      >      > +#import
> <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> >      >      > +#include <mach/mach_vm.h>
> >      >      > +#include "apple-gfx.h"
> >      >      > +#include "trace.h"
> >      >      > +#include "qemu-main.h"
> >      >      > +#include "exec/address-spaces.h"
> >      >      > +#include "migration/blocker.h"
> >      >      > +#include "monitor/monitor.h"
> >      >      > +#include "qemu/main-loop.h"
> >      >      > +#include "qemu/cutils.h"
> >      >      > +#include "qemu/log.h"
> >      >      > +#include "qapi/visitor.h"
> >      >      > +#include "qapi/error.h"
> >      >      > +#include "ui/console.h"
> >      >      > +
> >      >      > +static const PGDisplayCoord_t apple_gfx_modes[] = {
> >      >      > +    { .x = 1440, .y = 1080 },
> >      >      > +    { .x = 1280, .y = 1024 },
> >      >      > +};
> >      >      > +
> >      >      > +/* This implements a type defined in
> >     <ParavirtualizedGraphics/
> >      >     PGDevice.h>
> >      >      > + * which is opaque from the framework's point of view.
> >     Typedef
> >      >     PGTask_t already
> >      >      > + * exists in the framework headers. */
> >      >      > +struct PGTask_s {
> >      >      > +    QTAILQ_ENTRY(PGTask_s) node;
> >      >      > +    mach_vm_address_t address;
> >      >      > +    uint64_t len;
> >      >      > +};
> >      >      > +
> >      >      > +static Error *apple_gfx_mig_blocker;
> >      >
> >      >     This does not have to be a static variable.
> >      >
> >      >
> >      > Hmm, the first 5 or so examples of migration blockers in other
> >     devices
> >      > etc. I could find were all declared in this way. What are you
> >     suggesting
> >      > as the alternative? And why not use the same pattern as in most
> >     of the
> >      > rest of the code base?
> >
> >     I was wrong. This is better to be a static variable to ensure we
> won't
> >     add the same blocker in case we have two device instances.
> >
> >      >
> >      >      > +
> >      >      > +static void
> >     apple_gfx_render_frame_completed(AppleGFXState *s,
> >      >      > +                                             uint32_t
> width,
> >      >     uint32_t height);
> >      >      > +
> >      >      > +static inline dispatch_queue_t get_background_queue(void)
> >      >
> >      >     Don't add inline. The only effect for modern compilers of
> >     inline is to
> >      >     suppress the unused function warnings.
> >      >
> >      >      > +{
> >      >      > +    return
> >      >     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
> >      >      > +}
> >      >      > +
> >      >      > +static PGTask_t *apple_gfx_new_task(AppleGFXState *s,
> >     uint64_t len)
> >      >      > +{
> >      >      > +    mach_vm_address_t task_mem;
> >      >      > +    PGTask_t *task;
> >      >      > +    kern_return_t r;
> >      >      > +
> >      >      > +    r = mach_vm_allocate(mach_task_self(), &task_mem, len,
> >      >     VM_FLAGS_ANYWHERE);
> >      >      > +    if (r != KERN_SUCCESS || task_mem == 0) {
> >      >
> >      >     Let's remove the check for task_mem == 0. We have no reason to
> >      >     reject it
> >      >     if the platform insists it allocated a memory at address 0
> though
> >      >     such a
> >      >     situation should never happen in practice.
> >      >
> >      >      > +        return NULL;
> >      >      > +    }
> >      >      > +
> >      >      > +    task = g_new0(PGTask_t, 1);
> >      >      > +
> >      >      > +    task->address = task_mem;
> >      >      > +    task->len = len;
> >      >      > +    QTAILQ_INSERT_TAIL(&s->tasks, task, node);
> >      >      > +
> >      >      > +    return task;
> >      >      > +}
> >      >      > +
> >      >      > +typedef struct AppleGFXIOJob {
> >      >      > +    AppleGFXState *state;
> >      >      > +    uint64_t offset;
> >      >      > +    uint64_t value;
> >      >      > +    bool completed;
> >      >      > +} AppleGFXIOJob;
> >      >      > +
> >      >      > +static void apple_gfx_do_read(void *opaque)
> >      >      > +{
> >      >      > +    AppleGFXIOJob *job = opaque;
> >      >      > +    job->value = [job->state->pgdev mmioReadAtOffset:job-
> >      >offset];
> >      >      > +    qatomic_set(&job->completed, true);
> >      >      > +    aio_wait_kick();
> >      >      > +}
> >      >      > +
> >      >      > +static uint64_t apple_gfx_read(void *opaque, hwaddr
> offset,
> >      >     unsigned size)
> >      >      > +{
> >      >      > +    AppleGFXIOJob job = {
> >      >      > +        .state = opaque,
> >      >      > +        .offset = offset,
> >      >      > +        .completed = false,
> >      >      > +    };
> >      >      > +    AioContext *context = qemu_get_aio_context();
> >      >      > +    dispatch_queue_t queue = get_background_queue();
> >      >      > +
> >      >      > +    dispatch_async_f(queue, &job, apple_gfx_do_read);
> >      >      > +    AIO_WAIT_WHILE(context,
> !qatomic_read(&job.completed));
> >      >      > +
> >      >      > +    trace_apple_gfx_read(offset, job.value);
> >      >      > +    return job.value;
> >      >      > +}
> >      >      > +
> >      >      > +static void apple_gfx_do_write(void *opaque)
> >      >      > +{
> >      >      > +    AppleGFXIOJob *job = opaque;
> >      >      > +    [job->state->pgdev mmioWriteAtOffset:job->offset
> >     value:job-
> >      >      >value];
> >      >      > +    qatomic_set(&job->completed, true);
> >      >      > +    aio_wait_kick();
> >      >      > +}
> >      >      > +
> >      >      > +static void apple_gfx_write(void *opaque, hwaddr offset,
> >      >     uint64_t val,
> >      >      > +                            unsigned size)
> >      >      > +{
> >      >      > +    /* The methods mmioReadAtOffset: and especially
> >      >     mmioWriteAtOffset: can
> >      >      > +     * trigger and block on operations on other dispatch
> >     queues,
> >      >     which in turn
> >      >      > +     * may call back out on one or more of the callback
> >     blocks.
> >      >     For this reason,
> >      >      > +     * and as we are holding the BQL, we invoke the I/O
> >     methods
> >      >     on a pool
> >      >      > +     * thread and handle AIO tasks while we wait. Any
> work in
> >      >     the callbacks
> >      >      > +     * requiring the BQL will in turn schedule BHs which
> this
> >      >     thread will
> >      >      > +     * process while waiting. */
> >      >      > +    AppleGFXIOJob job = {
> >      >      > +        .state = opaque,
> >      >      > +        .offset = offset,
> >      >      > +        .value = val,
> >      >      > +        .completed = false,
> >      >      > +    };
> >      >      > +    AioContext *context = qemu_get_current_aio_context();
> >      >      > +    dispatch_queue_t queue = get_background_queue();
> >      >      > +
> >      >      > +    dispatch_async_f(queue, &job, apple_gfx_do_write);
> >      >      > +    AIO_WAIT_WHILE(context,
> !qatomic_read(&job.completed));
> >      >      > +
> >      >      > +    trace_apple_gfx_write(offset, val);
> >      >      > +}
> >      >      > +
> >      >      > +static const MemoryRegionOps apple_gfx_ops = {
> >      >      > +    .read = apple_gfx_read,
> >      >      > +    .write = apple_gfx_write,
> >      >      > +    .endianness = DEVICE_LITTLE_ENDIAN,
> >      >      > +    .valid = {
> >      >      > +        .min_access_size = 4,
> >      >      > +        .max_access_size = 8,
> >      >      > +    },
> >      >      > +    .impl = {
> >      >      > +        .min_access_size = 4,
> >      >      > +        .max_access_size = 4,
> >      >      > +    },
> >      >      > +};
> >      >      > +
> >      >      > +static void
> >     apple_gfx_render_new_frame_bql_unlock(AppleGFXState *s)
> >      >      > +{
> >      >      > +    BOOL r;
> >      >      > +    uint32_t width = surface_width(s->surface);
> >      >      > +    uint32_t height = surface_height(s->surface);
> >      >      > +    MTLRegion region = MTLRegionMake2D(0, 0, width,
> height);
> >      >      > +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue
> >      >     commandBuffer];
> >      >      > +    id<MTLTexture> texture = s->texture;
> >      >      > +
> >      >      > +    assert(bql_locked());
> >      >      > +    [texture retain];
> >      >      > +
> >      >      > +    bql_unlock();
> >      >      > +
> >      >      > +    /* This is not safe to call from the BQL due to PVG-
> >     internal
> >      >     locks causing
> >      >      > +     * deadlocks. */
> >      >      > +    r = [s->pgdisp
> >     encodeCurrentFrameToCommandBuffer:command_buffer
> >      >      > +
>  texture:texture
> >      >      > +
> region:region];
> >      >      > +    if (!r) {
> >      >      > +        [texture release];
> >      >      > +        bql_lock();
> >      >      > +        --s->pending_frames;
> >      >      > +        bql_unlock();
> >      >      > +        qemu_log_mask(LOG_GUEST_ERROR,
> >      >     "apple_gfx_render_new_frame_bql_unlock: "
> >      >
> >      >     Use: __func__
> >      >
> >      >      > +
> >      >     "encodeCurrentFrameToCommandBuffer:texture:region: failed\n");
> >      >      > +        return;
> >      >      > +    }
> >      >      > +
> >      >      > +    if (s->using_managed_texture_storage) {
> >      >      > +        /* "Managed" textures exist in both VRAM and RAM
> and
> >      >     must be synced. */
> >      >      > +        id<MTLBlitCommandEncoder> blit = [command_buffer
> >      >     blitCommandEncoder];
> >      >      > +        [blit synchronizeResource:texture];
> >      >      > +        [blit endEncoding];
> >      >      > +    }
> >      >      > +    [texture release];
> >      >      > +    [command_buffer addCompletedHandler:
> >      >      > +        ^(id<MTLCommandBuffer> cb)
> >      >      > +        {
> >      >      > +            dispatch_async(s->render_queue, ^{
> >      >      > +                apple_gfx_render_frame_completed(s,
> >     width, height);
> >      >      > +            });
> >      >      > +        }];
> >      >      > +    [command_buffer commit];
> >      >      > +}
> >      >      > +
> >      >      > +static void copy_mtl_texture_to_surface_mem(id<MTLTexture>
> >      >     texture, void *vram)
> >      >      > +{
> >      >      > +    /* TODO: Skip this entirely on a pure Metal or
> headless/
> >      >     guest-only
> >      >      > +     * rendering path, else use a blit command encoder?
> Needs
> >      >     careful
> >      >      > +     * (double?) buffering design. */
> >      >      > +    size_t width = texture.width, height = texture.height;
> >      >      > +    MTLRegion region = MTLRegionMake2D(0, 0, width,
> height);
> >      >      > +    [texture getBytes:vram
> >      >      > +          bytesPerRow:(width * 4)
> >      >      > +        bytesPerImage:(width * height * 4)
> >      >      > +           fromRegion:region
> >      >      > +          mipmapLevel:0
> >      >      > +                slice:0];
> >      >      > +}copy_mtl_texture_to_surface_mem
> >      >      > +
> >      >      > +static void
> >     apple_gfx_render_frame_completed(AppleGFXState *s,
> >      >      > +                                             uint32_t
> width,
> >      >     uint32_t height)
> >      >      > +{
> >      >      > +    bql_lock();
> >      >      > +    --s->pending_frames;
> >      >      > +    assert(s->pending_frames >= 0);
> >      >      > +
> >      >      > +    /* Only update display if mode hasn't changed since we
> >      >     started rendering. */
> >      >      > +    if (width == surface_width(s->surface) &&
> >      >      > +        height == surface_height(s->surface)) {
> >      >      > +        copy_mtl_texture_to_surface_mem(s->texture,
> s->vram);
> >      >      > +        if (s->gfx_update_requested) {
> >      >      > +            s->gfx_update_requested = false;
> >      >      > +            dpy_gfx_update_full(s->con);
> >      >      > +            graphic_hw_update_done(s->con);
> >      >      > +            s->new_frame_ready = false;
> >      >      > +        } else {
> >      >      > +            s->new_frame_ready = true;
> >      >      > +        }
> >      >      > +    }
> >      >      > +    if (s->pending_frames > 0) {
> >      >      > +        apple_gfx_render_new_frame_bql_unlock(s);
> >      >      > +    } else {
> >      >      > +        bql_unlock();
> >      >      > +    }
> >      >      > +}
> >      >      > +
> >      >      > +static void apple_gfx_fb_update_display(void *opaque)
> >      >      > +{
> >      >      > +    AppleGFXState *s = opaque;
> >      >      > +
> >      >      > +    assert(bql_locked());
> >      >      > +    if (s->new_frame_ready) {
> >      >      > +        dpy_gfx_update_full(s->con);
> >      >      > +        s->new_frame_ready = false;
> >      >      > +        graphic_hw_update_done(s->con);
> >      >      > +    } else if (s->pending_frames > 0) {
> >      >      > +        s->gfx_update_requested = true;
> >      >      > +    } else {
> >      >      > +        graphic_hw_update_done(s->con);
> >      >      > +    }
> >      >      > +}
> >      >      > +
> >      >      > +static const GraphicHwOps apple_gfx_fb_ops = {
> >      >      > +    .gfx_update = apple_gfx_fb_update_display,
> >      >      > +    .gfx_update_async = true,
> >      >      > +};
> >      >      > +
> >      >      > +static void update_cursor(AppleGFXState *s)
> >      >      > +{
> >      >      > +    assert(bql_locked());
> >      >      > +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
> >      >      > +                  s->pgdisp.cursorPosition.y, s-
> >      >cursor_show);
> >      >      > +}
> >      >      > +
> >      >      > +static void set_mode(AppleGFXState *s, uint32_t width,
> >     uint32_t
> >      >     height)
> >      >      > +{
> >      >      > +    MTLTextureDescriptor *textureDescriptor;
> >      >      > +
> >      >      > +    if (s->surface &&
> >      >      > +        width == surface_width(s->surface) &&
> >      >      > +        height == surface_height(s->surface)) {
> >      >      > +        return;
> >      >      > +    }
> >      >      > +
> >      >      > +    g_free(s->vram);
> >      >      > +    [s->texture release];
> >      >      > +
> >      >      > +    s->vram = g_malloc0_n(width * height, 4);
> >      >      > +    s->surface = qemu_create_displaysurface_from(width,
> >     height,
> >      >     PIXMAN_LE_a8r8g8b8,
> >      >      > +                                                 width *
> >     4, s-
> >      >      >vram);> +> +    @autoreleasepool {
> >      >      > +        textureDescriptor =
> >      >      > +            [MTLTextureDescriptor
> >      >      > +
> >      >     texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
> >      >      > +                                             width:width
> >      >      > +                                            height:height
> >      >      > +                                         mipmapped:NO];
> >      >      > +        textureDescriptor.usage = s-
> >      >pgdisp.minimumTextureUsage;
> >      >      > +        s->texture = [s->mtl
> >      >     newTextureWithDescriptor:textureDescriptor];
> >      >
> >      >
> >      >     What about creating pixman_image_t from s-
> >      >texture.buffer.contents?
> >      >     This
> >      >     should save memory usage by removing the duplication of
> texture.
> >      >
> >      >
> >      > We need explicit control over when the GPU vs when the CPU may
> >     access
> >      > the texture - only one of them may access them at a time. As far
> >     as I
> >      > can tell, we can't control when the rest of Qemu might access the
> >      > pixman_image used in the console surface?
> >
> >     You are right; we need to have duplicate buffers. We can still avoid
> >     copying by using two MTLTextures for double-buffering instead of
> having
> >     a MTLTexture and a pixman_image and copying between them for
> >     MTLStorageModeManaged.
> >
> > Do I understand correctly that you intend to swap the surface->image on
> > every frame, or even the surface->image->data? If so, it's my
> > understanding from reading the source of a bunch of UI implementations a
> > few weeks ago that this is neither supported nor safe, as some
> > implementations take long-lived references to these internal data
> > structures until a dpy_gfx_switch callback. And the implementations for
> > those callbacks are in turn very expensive in some cases. This is why my
> > conclusion in the v4 thread was that double-buffering was infeasible
> > with the current architecture.
>
> By the way, can't we take the BQL after
> encodeCurrentFrameToCommandBuffer and keep it until the completion
> handler? PVG requires the BQL unlocked for forward progress due to the
> bottom half usage in callbacks, but Metal doesn't.
>

What would be the advantage of this?

Also, if you're suggesting unlocking the BQL *inside* the completion
handler: I'm pretty sure locking on one thread and unlocking on another is
not supported.



> >
> >      >
> >      >      > +    }
> >      >      > +
> >      >      > +    s->using_managed_texture_storage =
> >      >      > +        (s->texture.storageMode == MTLStorageModeManaged);
> >      >      > +    dpy_gfx_replace_surface(s->con, s->surface);
> >      >      > +}
> >      >      > +
> >      >      > +static void create_fb(AppleGFXState *s)
> >      >      > +{
> >      >      > +    s->con = graphic_console_init(NULL, 0,
> >     &apple_gfx_fb_ops, s);
> >      >      > +    set_mode(s, 1440, 1080);
> >      >      > +
> >      >      > +    s->cursor_show = true;
> >      >      > +}
> >      >      > +
> >      >      > +static size_t apple_gfx_get_default_mmio_range_size(void)
> >      >      > +{
> >      >      > +    size_t mmio_range_size;
> >      >      > +    @autoreleasepool {
> >      >      > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor
> new];
> >      >      > +        mmio_range_size = desc.mmioLength;
> >      >      > +        [desc release];
> >      >      > +    }
> >      >      > +    return mmio_range_size;
> >      >      > +}
> >      >      > +
> >      >      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s,
> >     const
> >      >     char* obj_name)
> >      >      > +{
> >      >      > +    size_t mmio_range_size =
> >      >     apple_gfx_get_default_mmio_range_size();
> >      >      > +
> >      >      > +    trace_apple_gfx_common_init(obj_name,
> mmio_range_size);
> >      >      > +    memory_region_init_io(&s->iomem_gfx, obj,
> >     &apple_gfx_ops, s,
> >      >     obj_name,
> >      >      > +                          mmio_range_size);
> >      >      > +
> >      >      > +    /* TODO: PVG framework supports serialising device
> state:
> >      >     integrate it! */
> >      >      > +}
> >      >      > +
> >      >      > +typedef struct AppleGFXMapMemoryJob {
> >      >      > +    AppleGFXState *state;
> >      >      > +    PGTask_t *task;
> >      >      > +    uint64_t virtual_offset;
> >      >      > +    PGPhysicalMemoryRange_t *ranges;
> >      >      > +    uint32_t range_count;
> >      >      > +    bool read_only;
> >      >      > +    bool success;
> >      >      > +    bool done;
> >      >      > +} AppleGFXMapMemoryJob;
> >      >      > +
> >      >      > +uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t
> >      >     guest_physical,
> >      >      > +                                               uint64_t
> >     length,
> >      >     bool read_only)
> >      >      > +{
> >      >      > +    MemoryRegion *ram_region;
> >      >      > +    uintptr_t host_address;
> >      >      > +    hwaddr ram_region_offset = 0;
> >      >      > +    hwaddr ram_region_length = length;
> >      >      > +
> >      >      > +    ram_region =
> >     address_space_translate(&address_space_memory,
> >      >      > +                                         guest_physical,
> >      >      > +
>  &ram_region_offset,
> >      >      > +
> >       &ram_region_length, !
> >      >     read_only,
> >      >      > +
> >       MEMTXATTRS_UNSPECIFIED);
> >      >
> >      >     Call memory_region_ref() so that it won't go away.
> >      >
> >      >      > +
> >      >      > +    if (!ram_region || ram_region_length < length ||
> >      >      > +        !memory_access_is_direct(ram_region, !read_only))
> {
> >      >      > +        return 0;
> >      >      > +    }
> >      >      > +
> >      >      > +    host_address =
> >      >     (mach_vm_address_t)memory_region_get_ram_ptr(ram_region);
> >      >
> >      >     host_address is typed as uintptr_t, not mach_vm_address_t.
> >      >
> >      >      > +    if (host_address == 0) {
> >      >      > +        return 0;
> >      >      > +    }
> >      >      > +    host_address += ram_region_offset;
> >      >      > +
> >      >      > +    return host_address;
> >      >      > +}
> >      >      > +
> >      >      > +static void apple_gfx_map_memory(void *opaque)
> >      >      > +{
> >      >      > +    AppleGFXMapMemoryJob *job = opaque;
> >      >      > +    AppleGFXState *s = job->state;
> >      >      > +    PGTask_t *task                  = job->task;
> >      >      > +    uint32_t range_count            = job->range_count;
> >      >      > +    uint64_t virtual_offset         = job->virtual_offset;
> >      >      > +    PGPhysicalMemoryRange_t *ranges = job->ranges;
> >      >      > +    bool read_only                  = job->read_only;
> >      >      > +    kern_return_t r;
> >      >      > +    mach_vm_address_t target, source;
> >      >      > +    vm_prot_t cur_protection, max_protection;
> >      >      > +    bool success = true;
> >      >      > +
> >      >      > +    g_assert(bql_locked());
> >      >      > +
> >      >      > +    trace_apple_gfx_map_memory(task, range_count,
> >      >     virtual_offset, read_only);
> >      >      > +    for (int i = 0; i < range_count; i++) {
> >      >      > +        PGPhysicalMemoryRange_t *range = &ranges[i];
> >      >      > +
> >      >      > +        target = task->address + virtual_offset;
> >      >      > +        virtual_offset += range->physicalLength;
> >      >      > +
> >      >      > +        trace_apple_gfx_map_memory_range(i, range-
> >      >physicalAddress,
> >      >      > +                                         range-
> >      >physicalLength);
> >      >      > +
> >      >      > +        source =
> apple_gfx_host_address_for_gpa_range(range-
> >      >      >physicalAddress,
> >      >      > +
> range-
> >      >      >physicalLength,
> >      >      > +
> >     read_only);
> >      >      > +        if (source == 0) {
> >      >      > +            success = false;
> >      >      > +            continue;
> >      >      > +        }
> >      >      > +
> >      >      > +        MemoryRegion* alt_mr = NULL;
> >      >      > +        mach_vm_address_t alt_source =
> >      >     (mach_vm_address_t)gpa2hva(&alt_mr, range->physicalAddress,
> >     range-
> >      >      >physicalLength, NULL);
> >      >      > +        g_assert(alt_source == source);
> >      >
> >      >     Remove this; I guess this is for debugging.
> >      >
> >      >      > +
> >      >      > +        cur_protection = 0;
> >      >      > +        max_protection = 0;
> >      >      > +        // Map guest RAM at range->physicalAddress into
> >     PG task
> >      >     memory range
> >      >      > +        r = mach_vm_remap(mach_task_self(),
> >      >      > +                          &target, range->physicalLength,
> >      >     vm_page_size - 1,
> >      >      > +                          VM_FLAGS_FIXED |
> >     VM_FLAGS_OVERWRITE,
> >      >      > +                          mach_task_self(),
> >      >      > +                          source, false /* shared
> mapping, no
> >      >     copy */,
> >      >      > +                          &cur_protection,
> &max_protection,
> >      >      > +                          VM_INHERIT_COPY);
> >      >      > +        trace_apple_gfx_remap(r, source, target);
> >      >      > +        g_assert(r == KERN_SUCCESS);
> >      >      > +    }
> >      >      > +
> >      >      > +    qemu_mutex_lock(&s->job_mutex);
> >      >      > +    job->success = success;
> >      >      > +    job->done = true;
> >      >      > +    qemu_cond_broadcast(&s->job_cond);
> >      >      > +    qemu_mutex_unlock(&s->job_mutex);
> >      >      > +}
> >      >      > +
> >      >      > +void apple_gfx_await_bh_job(AppleGFXState *s, bool
> >     *job_done_flag)
> >      >      > +{
> >      >      > +    qemu_mutex_lock(&s->job_mutex);
> >      >      > +    while (!*job_done_flag) {
> >      >      > +        qemu_cond_wait(&s->job_cond, &s->job_mutex);
> >      >      > +    }
> >      >      > +    qemu_mutex_unlock(&s->job_mutex);
> >      >      > +}
> >      >      > +
> >      >      > +typedef struct AppleGFXReadMemoryJob {
> >      >      > +    AppleGFXState *s;
> >      >      > +    hwaddr physical_address;
> >      >      > +    uint64_t length;
> >      >      > +    void *dst;
> >      >      > +    bool done;
> >      >      > +} AppleGFXReadMemoryJob;
> >      >      > +
> >      >      > +static void apple_gfx_do_read_memory(void *opaque)
> >      >      > +{
> >      >      > +    AppleGFXReadMemoryJob *job = opaque;
> >      >      > +    AppleGFXState *s = job->s;
> >      >      > +
> >      >      > +    cpu_physical_memory_read(job->physical_address,
> job->dst,
> >      >     job->length);
> >      >
> >      >     Use: dma_memory_read()
> >      >
> >      >      > +
> >      >      > +    qemu_mutex_lock(&s->job_mutex);
> >      >      > +    job->done = true;
> >      >      > +    qemu_cond_broadcast(&s->job_cond);
> >      >      > +    qemu_mutex_unlock(&s->job_mutex);
> >      >      > +}
> >      >      > +
> >      >      > +static void apple_gfx_read_memory(AppleGFXState *s, hwaddr
> >      >     physical_address,
> >      >      > +                                  uint64_t length, void
> *dst)
> >      >      > +{
> >      >      > +    AppleGFXReadMemoryJob job = {
> >      >      > +        s, physical_address, length, dst
> >      >      > +    };
> >      >      > +
> >      >      > +    trace_apple_gfx_read_memory(physical_address, length,
> >     dst);
> >      >      > +
> >      >      > +    /* Traversing the memory map requires RCU/BQL, so do
> >     it in a
> >      >     BH. */
> >      >      > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> >      >     apple_gfx_do_read_memory,
> >      >      > +                            &job);
> >      >      > +    apple_gfx_await_bh_job(s, &job.done);
> >      >      > +}
> >      >      > +
> >      >      > +static void
> >      >     apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
> >      >      > +
> >      >       PGDeviceDescriptor *desc)
> >      >      > +{
> >      >      > +    desc.createTask = ^(uint64_t vmSize, void * _Nullable
> *
> >      >     _Nonnull baseAddress) {
> >      >      > +        PGTask_t *task = apple_gfx_new_task(s, vmSize);
> >      >      > +        *baseAddress = (void *)task->address;
> >      >      > +        trace_apple_gfx_create_task(vmSize, *baseAddress);
> >      >      > +        return task;
> >      >      > +    };
> >      >      > +
> >      >      > +    desc.destroyTask = ^(PGTask_t * _Nonnull task) {
> >      >      > +        trace_apple_gfx_destroy_task(task);
> >      >      > +        QTAILQ_REMOVE(&s->tasks, task, node);
> >      >      > +        mach_vm_deallocate(mach_task_self(),
> task->address,
> >      >     task->len);
> >      >      > +        g_free(task);
> >      >      > +    };
> >      >      > +
> >      >      > +    desc.mapMemory = ^bool(PGTask_t * _Nonnull task,
> uint32_t
> >      >     range_count,
> >      >      > +                       uint64_t virtual_offset, bool
> >     read_only,
> >      >      > +                       PGPhysicalMemoryRange_t * _Nonnull
> >     ranges) {
> >      >      > +        AppleGFXMapMemoryJob job = {
> >      >      > +            .state = s,
> >      >      > +            .task = task, .ranges = ranges, .range_count =
> >      >     range_count,
> >      >      > +            .read_only = read_only, .virtual_offset =
> >      >     virtual_offset,
> >      >      > +            .done = false, .success = true,
> >      >      > +        };
> >      >      > +        if (range_count > 0) {
> >      >      > +
> aio_bh_schedule_oneshot(qemu_get_aio_context(),
> >      >      > +                                    apple_gfx_map_memory,
> >     &job);
> >      >      > +            apple_gfx_await_bh_job(s, &job.done);
> >      >      > +        }
> >      >      > +        return job.success;
> >      >      > +    };
> >      >      > +
> >      >      > +    desc.unmapMemory = ^bool(PGTask_t * _Nonnull task,
> >     uint64_t
> >      >     virtualOffset,
> >      >      > +                         uint64_t length) {
> >      >      > +        kern_return_t r;
> >      >      > +        mach_vm_address_t range_address;
> >      >      > +
> >      >      > +        trace_apple_gfx_unmap_memory(task, virtualOffset,
> >     length);
> >      >      > +
> >      >      > +        /* Replace task memory range with fresh pages,
> >     undoing
> >      >     the mapping
> >      >      > +         * from guest RAM. */
> >      >      > +        range_address = task->address + virtualOffset;
> >      >      > +        r = mach_vm_allocate(mach_task_self(),
> >     &range_address,
> >      >     length,
> >      >      > +                             VM_FLAGS_FIXED |
> >     VM_FLAGS_OVERWRITE);
> >      >      > +        g_assert(r == KERN_SUCCESS);error_setg
> >      >
> >      >     An extra error_setg
> >      >
> >      >      > +
> >      >      > +        return true;
> >      >      > +    };
> >      >      > +
> >      >      > +    desc.readMemory = ^bool(uint64_t physical_address,
> >     uint64_t
> >      >     length,
> >      >      > +                            void * _Nonnull dst) {
> >      >      > +        apple_gfx_read_memory(s, physical_address,
> >     length, dst);
> >      >      > +        return true;
> >      >      > +    };
> >      >      > +}
> >      >      > +
> >      >      > +static PGDisplayDescriptor
> >      >     *apple_gfx_prepare_display_descriptor(AppleGFXState *s)
> >      >      > +{
> >      >      > +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor
> >     new];
> >      >      > +
> >      >      > + disp_desc.name <http://disp_desc.name> <http://
> >     disp_desc.name <http://disp_desc.name>> = @"QEMU display";
> >      >      > +    disp_desc.sizeInMillimeters = NSMakeSize(400.,
> >     300.); /* A
> >      >     20" display */
> >      >      > +    disp_desc.queue = dispatch_get_main_queue();
> >      >      > +    disp_desc.newFrameEventHandler = ^(void) {
> >      >      > +        trace_apple_gfx_new_frame();
> >      >      > +        dispatch_async(s->render_queue, ^{
> >      >      > +            /* Drop frames if we get too far ahead. */
> >      >      > +            bql_lock();
> >      >      > +            if (s->pending_frames >= 2) {
> >      >      > +                bql_unlock();
> >      >      > +                return;
> >      >      > +            }
> >      >      > +            ++s->pending_frames;
> >      >      > +            if (s->pending_frames > 1) {
> >      >      > +                bql_unlock();
> >      >      > +                return;
> >      >      > +            }
> >      >      > +            @autoreleasepool {
> >      >      > +                apple_gfx_render_new_frame_bql_unlock(s);
> >      >      > +            }
> >      >      > +        });
> >      >      > +    };
> >      >      > +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t
> >     sizeInPixels,
> >      >      > +                                    OSType pixelFormat) {
> >      >      > +        trace_apple_gfx_mode_change(sizeInPixels.x,
> >     sizeInPixels.y);
> >      >      > +
> >      >      > +        BQL_LOCK_GUARD();
> >      >      > +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
> >      >      > +    };
> >      >      > +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep
> *glyph,
> >      >      > +                                     PGDisplayCoord_t
> >     hotSpot) {
> >      >      > +        [glyph retain];
> >      >      > +        dispatch_async(get_background_queue(), ^{
> >      >      > +            BQL_LOCK_GUARD();
> >      >      > +            uint32_t bpp = glyph.bitsPerPixel;
> >      >      > +            size_t width = glyph.pixelsWide;
> >      >      > +            size_t height = glyph.pixelsHigh;
> >      >      > +            size_t padding_bytes_per_row =
> >     glyph.bytesPerRow -
> >      >     width * 4;
> >      >      > +            const uint8_t* px_data = glyph.bitmapData;
> >      >      > +
> >      >      > +            trace_apple_gfx_cursor_set(bpp, width,
> height);
> >      >      > +
> >      >      > +            if (s->cursor) {
> >      >      > +                cursor_unref(s->cursor);
> >      >      > +                s->cursor = NULL;
> >      >      > +            }
> >      >      > +
> >      >      > +            if (bpp == 32) { /* Shouldn't be anything
> >     else, but
> >      >     just to be safe...*/
> >      >      > +                s->cursor = cursor_alloc(width, height);
> >      >      > +                s->cursor->hot_x = hotSpot.x;
> >      >      > +                s->cursor->hot_y = hotSpot.y;
> >      >      > +
> >      >      > +                uint32_t *dest_px = s->cursor->data;
> >      >      > +
> >      >      > +                for (size_t y = 0; y < height; ++y) {
> >      >      > +                    for (size_t x = 0; x < width; ++x) {
> >      >      > +                        /* NSBitmapImageRep's red & blue
> >      >     channels are swapped
> >      >      > +                         * compared to QEMUCursor's. */
> >      >      > +                        *dest_px =
> >      >      > +                            (px_data[0] << 16u) |
> >      >      > +                            (px_data[1] <<  8u) |
> >      >      > +                            (px_data[2] <<  0u) |
> >      >      > +                            (px_data[3] << 24u);
> >      >      > +                        ++dest_px;
> >      >      > +                        px_data += 4;
> >      >      > +                    }
> >      >      > +                    px_data += padding_bytes_per_row;
> >      >      > +                }
> >      >      > +                dpy_cursor_define(s->con, s->cursor);
> >      >      > +                update_cursor(s);
> >      >      > +            }
> >      >      > +            [glyph release];
> >      >      > +        });
> >      >      > +    };
> >      >      > +    disp_desc.cursorShowHandler = ^(BOOL show) {
> >      >      > +        dispatch_async(get_background_queue(), ^{
> >      >      > +            BQL_LOCK_GUARD();
> >      >      > +            trace_apple_gfx_cursor_show(show);
> >      >      > +            s->cursor_show = show;
> >      >      > +            update_cursor(s);
> >      >      > +        });
> >      >      > +    };
> >      >      > +    disp_desc.cursorMoveHandler = ^(void) {
> >      >      > +        dispatch_async(get_background_queue(), ^{
> >      >      > +            BQL_LOCK_GUARD();
> >      >      > +            trace_apple_gfx_cursor_move();
> >      >      > +            update_cursor(s);
> >      >      > +        });
> >      >      > +    };
> >      >      > +
> >      >      > +    return disp_desc;
> >      >      > +}
> >      >      > +
> >      >      > +static NSArray<PGDisplayMode*>*
> >      >     apple_gfx_prepare_display_mode_array(void)
> >      >      > +{
> >      >      > +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
> >      >      > +    NSArray<PGDisplayMode*>* mode_array = nil;
> >      >      > +    int i;
> >      >      > +
> >      >      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> >      >      > +        modes[i] =
> >      >      > +            [[PGDisplayMode alloc]
> >      >     initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
> >      >      > +    }
> >      >      > +
> >      >      > +    mode_array = [NSArray arrayWithObjects:modes
> >      >     count:ARRAY_SIZE(apple_gfx_modes)];
> >      >      > +
> >      >      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> >      >      > +        [modes[i] release];
> >      >      > +        modes[i] = nil;
> >      >      > +    }
> >      >      > +
> >      >      > +    return mode_array;
> >      >      > +}
> >      >      > +
> >      >      > +static id<MTLDevice> copy_suitable_metal_device(void)
> >      >      > +{
> >      >      > +    id<MTLDevice> dev = nil;
> >      >      > +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
> >      >      > +
> >      >      > +    /* Prefer a unified memory GPU. Failing that, pick a
> non-
> >      >     removable GPU. */
> >      >      > +    for (size_t i = 0; i < devs.count; ++i) {
> >      >      > +        if (devs[i].hasUnifiedMemory) {
> >      >      > +            dev = devs[i];
> >      >      > +            break;
> >      >      > +        }
> >      >      > +        if (!devs[i].removable) {
> >      >      > +            dev = devs[i];
> >      >      > +        }
> >      >      > +    }
> >      >      > +
> >      >      > +    if (dev != nil) {
> >      >      > +        [dev retain];
> >      >      > +    } else {
> >      >      > +        dev = MTLCreateSystemDefaultDevice();
> >      >      > +    }
> >      >      > +    [devs release];
> >      >      > +
> >      >      > +    return dev;
> >      >      > +}
> >      >      > +
> >      >      > +void apple_gfx_common_realize(AppleGFXState *s,
> >      >     PGDeviceDescriptor *desc,
> >      >      > +                              Error **errp)
> >      >      > +{
> >      >      > +    PGDisplayDescriptor *disp_desc = nil;
> >      >      > +
> >      >      > +    if (apple_gfx_mig_blocker == NULL) {
> >      >      > +        error_setg(&apple_gfx_mig_blocker,
> >      >      > +                  "Migration state blocked by apple-gfx
> >     display
> >      >     device");
> >      >      > +        if (migrate_add_blocker(&apple_gfx_mig_blocker,
> >     errp) < 0) {
> >      >      > +            return;
> >      >      > +        }
> >      >      > +    }
> >      >      > +
> >      >      > +    QTAILQ_INIT(&s->tasks);
> >      >      > +    s->render_queue = dispatch_queue_create("apple-
> >     gfx.render",
> >      >      > +
> >     DISPATCH_QUEUE_SERIAL);
> >      >      > +    s->mtl = copy_suitable_metal_device();
> >      >      > +    s->mtl_queue = [s->mtl newCommandQueue];
> >      >      > +
> >      >      > +    desc.device = s->mtl;
> >      >      > +
> >      >      > +    apple_gfx_register_task_mapping_handlers(s, desc);
> >      >      > +
> >      >      > +    s->pgdev = PGNewDeviceWithDescriptor(desc);
> >      >      > +
> >      >      > +    disp_desc = apple_gfx_prepare_display_descriptor(s);
> >      >      > +    s->pgdisp = [s->pgdev
> newDisplayWithDescriptor:disp_desc
> >      >      > +                                              port:0
> >      >     serialNum:1234];
> >      >      > +    [disp_desc release];
> >      >      > +    s->pgdisp.modeList =
> >     apple_gfx_prepare_display_mode_array();
> >      >      > +
> >      >      > +    create_fb(s);
> >      >      > +
> >      >      > +    qemu_mutex_init(&s->job_mutex);
> >      >      > +    qemu_cond_init(&s->job_cond);
> >      >      > +}
> >      >      > diff --git a/hw/display/meson.build
> b/hw/display/meson.build
> >      >      > index 20a94973fa2..619e642905a 100644
> >      >      > --- a/hw/display/meson.build
> >      >      > +++ b/hw/display/meson.build
> >      >      > @@ -61,6 +61,10 @@ system_ss.add(when: 'CONFIG_ARTIST',
> >     if_true:
> >      >     files('artist.c'))
> >      >      >
> >      >      >   system_ss.add(when: 'CONFIG_ATI_VGA', if_true:
> >     [files('ati.c',
> >      >     'ati_2d.c', 'ati_dbg.c'), pixman])
> >      >      >
> >      >      > +system_ss.add(when: 'CONFIG_MAC_PVG',         if_true:
> >      >     [files('apple-gfx.m'), pvg, metal])
> >      >      > +if cpu == 'aarch64'
> >      >      > +  system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true:
> >      >     [files('apple-gfx-mmio.m'), pvg, metal])
> >      >      > +endif
> >      >      >
> >      >      >   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
> >      >      >     virtio_gpu_ss = ss.source_set()
> >      >      > diff --git a/hw/display/trace-events
> b/hw/display/trace-events
> >      >      > index 781f8a33203..214998312b9 100644
> >      >      > --- a/hw/display/trace-events
> >      >      > +++ b/hw/display/trace-events
> >      >      > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width)
> >      >     "dest_width : %u"
> >      >      >   dm163_leds(int led, uint32_t value) "led %d: 0x%x"
> >      >      >   dm163_channels(int channel, uint8_t value) "channel %d:
> >     0x%x"
> >      >      >   dm163_refresh_rate(uint32_t rr) "refresh rate %d"
> >      >      > +
> >      >      > +# apple-gfx.m
> >      >      > +apple_gfx_read(uint64_t offset, uint64_t res)
> >      >     "offset=0x%"PRIx64" res=0x%"PRIx64
> >      >      > +apple_gfx_write(uint64_t offset, uint64_t val)
> >      >     "offset=0x%"PRIx64" val=0x%"PRIx64
> >      >      > +apple_gfx_create_task(uint32_t vm_size, void *va)
> >     "vm_size=0x%x
> >      >     base_addr=%p"
> >      >      > +apple_gfx_destroy_task(void *task) "task=%p"
> >      >      > +apple_gfx_map_memory(void *task, uint32_t range_count,
> >     uint64_t
> >      >     virtual_offset, uint32_t read_only) "task=%p range_count=0x%x
> >      >     virtual_offset=0x%"PRIx64" read_only=%d"
> >      >      > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr,
> >      >     uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64"
> >     phys_len=0x%"PRIx64
> >      >      > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t
> >      >     target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
> >      >      > +apple_gfx_unmap_memory(void *task, uint64_t
> virtual_offset,
> >      >     uint64_t length) "task=%p virtual_offset=0x%"PRIx64"
> >     length=0x%"PRIx64
> >      >      > +apple_gfx_read_memory(uint64_t phys_address, uint64_t
> length,
> >      >     void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
> >      >      > +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
> >      >      > +apple_gfx_new_frame(void) ""
> >      >      > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64"
> >      >     y=%"PRId64
> >      >      > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width,
> uint64_t
> >      >     height) "bpp=%d width=%"PRId64" height=0x%"PRId64
> >      >      > +apple_gfx_cursor_show(uint32_t show) "show=%d"
> >      >      > +apple_gfx_cursor_move(void) ""
> >      >      > +apple_gfx_common_init(const char *device_name, size_t
> >     mmio_size)
> >      >     "device: %s; MMIO size: %zu bytes"
> >      >      > +
> >      >      > +# apple-gfx-mmio.m
> >      >      > +apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res)
> >      >     "offset=0x%"PRIx64" res=0x%"PRIx64
> >      >      > +apple_gfx_mmio_iosfc_write(uint64_t offset, uint64_t val)
> >      >     "offset=0x%"PRIx64" val=0x%"PRIx64
> >      >      > +apple_gfx_iosfc_map_memory(uint64_t phys, uint64_t len,
> >     uint32_t
> >      >     ro, void *va, void *e, void *f, void* va_result, int success)
> >      >     "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d va=%p e=%p f=%p ->
> >     *va=%p,
> >      >     success = %d"
> >      >      > +apple_gfx_iosfc_unmap_memory(void *a, void *b, void *c,
> >     void *d,
> >      >     void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
> >      >      > +apple_gfx_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
> >      >      > +
> >      >      > diff --git a/meson.build b/meson.build
> >      >      > index d26690ce204..0e124eff13f 100644
> >      >      > --- a/meson.build
> >      >      > +++ b/meson.build
> >      >      > @@ -761,6 +761,8 @@ socket = []
> >      >      >   version_res = []
> >      >      >   coref = []
> >      >      >   iokit = []
> >      >      > +pvg = []
> >      >      > +metal = []
> >      >      >   emulator_link_args = []
> >      >      >   midl = not_found
> >      >      >   widl = not_found
> >      >      > @@ -782,6 +784,8 @@ elif host_os == 'darwin'
> >      >      >     coref = dependency('appleframeworks', modules:
> >     'CoreFoundation')
> >      >      >     iokit = dependency('appleframeworks', modules: 'IOKit',
> >      >     required: false)
> >      >      >     host_dsosuf = '.dylib'
> >      >      > +  pvg = dependency('appleframeworks', modules:
> >      >     'ParavirtualizedGraphics')
> >      >      > +  metal = dependency('appleframeworks', modules: 'Metal')
> >      >      >   elif host_os == 'sunos'
> >      >      >     socket = [cc.find_library('socket'),
> >      >      >               cc.find_library('nsl'),
> >      >
> >
>
>

[-- Attachment #2: Type: text/html, Size: 76912 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-28  9:00             ` Phil Dennis-Jordan
@ 2024-10-28 13:31               ` Phil Dennis-Jordan
  2024-10-28 14:02                 ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-28 13:31 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 4954 bytes --]

On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan <phil@philjordan.eu> wrote:

>
> >      >
>> >      > Hmm. I think if we were to use that, we would need to create a
>> new
>> >      > QemuEvent for every job and destroy it afterward, which seems
>> >     expensive.
>> >      > We can't rule out multiple concurrent jobs being submitted, and
>> the
>> >      > QemuEvent system only supports a single producer as far as I can
>> >     tell.
>> >      >
>> >      > You can probably sort of hack around it with just one QemuEvent
>> by
>> >      > putting the qemu_event_wait into a loop and turning the job.done
>> >     flag
>> >      > into an atomic (because it would now need to be checked outside
>> the
>> >      > lock) but this all seems unnecessarily complicated considering
>> the
>> >      > QemuEvent uses the same mechanism QemuCond/QemuMutex internally
>> >     on macOS
>> >      > (the only platform relevant here), except we can use it as
>> >     intended with
>> >      > QemuCond/QemuMutex rather than having to work against the
>> >     abstraction.
>> >
>> >     I don't think it's going to be used concurrently. It would be
>> difficult
>> >     to reason even for the framework if it performs memory
>> >     unmapping/mapping/reading operations concurrently.
>> >
>> >
>> > I've just performed a very quick test by wrapping the job submission/
>> > wait in the 2 mapMemory callbacks and the 1 readMemory callback with
>> > atomic counters and logging whenever a counter went above 1.
>> >
>> >   * Overall, concurrent callbacks across all types were common (many
>> per
>> > second when the VM is busy). It's not exactly a "thundering herd" (I
>> > never saw >2) but it's probably not a bad idea to use a separate
>> > condition variable for each job type. (task map, surface map, memory
>> read)
>> >   * While I did not observe any concurrent memory mapping operations
>> > *within* a type of memory map (2 task mappings or 2 surface mappings) I
>> > did see very occasional concurrent memory *read* callbacks. These
>> would,
>> > as far as I can tell, not be safe with QemuEvents, unless we placed the
>> > event inside the job struct and init/destroyed it on every callback
>> > (which seems like excessive overhead).
>>
>> I think we can tolerate that overhead. init/destroy essentially sets the
>> fields in the data structure and I estimate its total size is about 100
>> bytes. It is probably better than waking an irrelevant thread up. I also
>> hope that keeps the code simple; it's not worthwhile adding code to
>> optimize this.
>>
>
> At least pthread_cond_{init,destroy} and pthread_mutex_{init,destroy}
> don't make any syscalls, so yeah it's probably an acceptable overhead.
>

I've just experimented with QemuEvents created on-demand and ran into some
weird deadlocks, which then made me sit down and think about it some more.
I've come to the conclusion that creating (and crucially, destroying)
QemuEvents on demand in this way is not safe.

Specifically, you must not call qemu_event_destroy() - which transitively
destroys the mutex and condition variable - unless you can guarantee that
the qemu_event_set() call on that event object has completed.

In qemu_event_set, the event object's value is atomically set to EV_SET. If
the previous value was EV_BUSY, qemu_futex_wake() is called. All of this is
outside any mutex, however, so apart from memory coherence (there are
barriers) this can race with the waiting thread. qemu_event_wait() reads
the event's value. If EV_FREE, it's atomically set to EV_BUSY. Then the
mutex is locked, the value is checked again, and if it's still EV_BUSY, it
waits for the condition variable, otherwise the mutex is immediately
unlocked again. If the trigger thread's qemu_event_set() flip to EV_SET
occurs between the waiting thread's two atomic reads of the value, the
waiting thread will never wait for the condition variable, but the trigger
thread WILL try to acquire the mutex and signal the condition variable in
qemu_futex_wake(), by which  time the waiting thread may have advanced
outside of qemu_event_wait().

This is all fine usually, BUT if you destroy the QemuEvent immediately
after the qemu_event_wait() call, qemu_futex_wake() may try to lock a mutex
that has been destroyed, or signal a condition variable which has been
destroyed. I don't see a reasonable way of making this safe other than
using long-lived mutexes and condition variables. And anyway, we have much,
MUCH bigger contention/performance issues coming from almost everything
being covered by the BQL. (If waking these callbacks can even be considered
an issue: I haven't seen it show up in profiling, whereas BQL contention
very much does.)

I'll submit v5 of this patch set with separate condition variables for each
job type. This should make the occurrence of waking the wrong thread quite
rare, while reasoning about correctness is pretty straightforward. I think
that's good enough.

[-- Attachment #2: Type: text/html, Size: 6058 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-28 13:31               ` Phil Dennis-Jordan
@ 2024-10-28 14:02                 ` Akihiko Odaki
  2024-10-28 14:13                   ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-28 14:02 UTC (permalink / raw)
  To: Phil Dennis-Jordan
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
> 
> 
> On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan <phil@philjordan.eu 
> <mailto:phil@philjordan.eu>> wrote:
> 
> 
>          >      >
>          >      > Hmm. I think if we were to use that, we would need to
>         create a new
>          >      > QemuEvent for every job and destroy it afterward,
>         which seems
>          >     expensive.
>          >      > We can't rule out multiple concurrent jobs being
>         submitted, and the
>          >      > QemuEvent system only supports a single producer as
>         far as I can
>          >     tell.
>          >      >
>          >      > You can probably sort of hack around it with just one
>         QemuEvent by
>          >      > putting the qemu_event_wait into a loop and turning
>         the job.done
>          >     flag
>          >      > into an atomic (because it would now need to be
>         checked outside the
>          >      > lock) but this all seems unnecessarily complicated
>         considering the
>          >      > QemuEvent uses the same mechanism QemuCond/QemuMutex
>         internally
>          >     on macOS
>          >      > (the only platform relevant here), except we can use it as
>          >     intended with
>          >      > QemuCond/QemuMutex rather than having to work against the
>          >     abstraction.
>          >
>          >     I don't think it's going to be used concurrently. It
>         would be difficult
>          >     to reason even for the framework if it performs memory
>          >     unmapping/mapping/reading operations concurrently.
>          >
>          >
>          > I've just performed a very quick test by wrapping the job
>         submission/
>          > wait in the 2 mapMemory callbacks and the 1 readMemory
>         callback with
>          > atomic counters and logging whenever a counter went above 1.
>          >
>          >   * Overall, concurrent callbacks across all types were
>         common (many per
>          > second when the VM is busy). It's not exactly a "thundering
>         herd" (I
>          > never saw >2) but it's probably not a bad idea to use a separate
>          > condition variable for each job type. (task map, surface map,
>         memory read)
>          >   * While I did not observe any concurrent memory mapping
>         operations
>          > *within* a type of memory map (2 task mappings or 2 surface
>         mappings) I
>          > did see very occasional concurrent memory *read* callbacks.
>         These would,
>          > as far as I can tell, not be safe with QemuEvents, unless we
>         placed the
>          > event inside the job struct and init/destroyed it on every
>         callback
>          > (which seems like excessive overhead).
> 
>         I think we can tolerate that overhead. init/destroy essentially
>         sets the
>         fields in the data structure and I estimate its total size is
>         about 100
>         bytes. It is probably better than waking an irrelevant thread
>         up. I also
>         hope that keeps the code simple; it's not worthwhile adding code to
>         optimize this.
> 
> 
>     At least pthread_cond_{init,destroy} and
>     pthread_mutex_{init,destroy} don't make any syscalls, so yeah it's
>     probably an acceptable overhead.
> 
> 
> I've just experimented with QemuEvents created on-demand and ran into 
> some weird deadlocks, which then made me sit down and think about it 
> some more. I've come to the conclusion that creating (and crucially, 
> destroying) QemuEvents on demand in this way is not safe.
> 
> Specifically, you must not call qemu_event_destroy() - which 
> transitively destroys the mutex and condition variable - unless you can 
> guarantee that the qemu_event_set() call on that event object has completed.
> 
> In qemu_event_set, the event object's value is atomically set to EV_SET. 
> If the previous value was EV_BUSY, qemu_futex_wake() is called. All of 
> this is outside any mutex, however, so apart from memory coherence 
> (there are barriers) this can race with the waiting thread. 
> qemu_event_wait() reads the event's value. If EV_FREE, it's atomically 
> set to EV_BUSY. Then the mutex is locked, the value is checked again, 
> and if it's still EV_BUSY, it waits for the condition variable, 
> otherwise the mutex is immediately unlocked again. If the trigger 
> thread's qemu_event_set() flip to EV_SET occurs between the waiting 
> thread's two atomic reads of the value, the waiting thread will never 
> wait for the condition variable, but the trigger thread WILL try to 
> acquire the mutex and signal the condition variable in 
> qemu_futex_wake(), by which  time the waiting thread may have advanced 
> outside of qemu_event_wait().

Sorry if I'm making a mistake again, but the waiting thread won't set to 
EV_BUSY unless the value is EV_FREE on the second read so the trigger 
thread will not call qemu_futex_wake() if it manages to set to EV_SET 
before the second read, will it?

> 
> This is all fine usually, BUT if you destroy the QemuEvent immediately 
> after the qemu_event_wait() call, qemu_futex_wake() may try to lock a 
> mutex that has been destroyed, or signal a condition variable which has 
> been destroyed. I don't see a reasonable way of making this safe other 
> than using long-lived mutexes and condition variables. And anyway, we 
> have much, MUCH bigger contention/performance issues coming from almost 
> everything being covered by the BQL. (If waking these callbacks can even 
> be considered an issue: I haven't seen it show up in profiling, whereas 
> BQL contention very much does.)
> 
> I'll submit v5 of this patch set with separate condition variables for 
> each job type. This should make the occurrence of waking the wrong 
> thread quite rare, while reasoning about correctness is pretty 
> straightforward. I think that's good enough.
> 
> 
> 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-28 14:02                 ` Akihiko Odaki
@ 2024-10-28 14:13                   ` Phil Dennis-Jordan
  2024-10-28 16:06                     ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-28 14:13 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 7345 bytes --]

On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
> >
> >
> > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan <phil@philjordan.eu
> > <mailto:phil@philjordan.eu>> wrote:
> >
> >
> >          >      >
> >          >      > Hmm. I think if we were to use that, we would need to
> >         create a new
> >          >      > QemuEvent for every job and destroy it afterward,
> >         which seems
> >          >     expensive.
> >          >      > We can't rule out multiple concurrent jobs being
> >         submitted, and the
> >          >      > QemuEvent system only supports a single producer as
> >         far as I can
> >          >     tell.
> >          >      >
> >          >      > You can probably sort of hack around it with just one
> >         QemuEvent by
> >          >      > putting the qemu_event_wait into a loop and turning
> >         the job.done
> >          >     flag
> >          >      > into an atomic (because it would now need to be
> >         checked outside the
> >          >      > lock) but this all seems unnecessarily complicated
> >         considering the
> >          >      > QemuEvent uses the same mechanism QemuCond/QemuMutex
> >         internally
> >          >     on macOS
> >          >      > (the only platform relevant here), except we can use
> it as
> >          >     intended with
> >          >      > QemuCond/QemuMutex rather than having to work against
> the
> >          >     abstraction.
> >          >
> >          >     I don't think it's going to be used concurrently. It
> >         would be difficult
> >          >     to reason even for the framework if it performs memory
> >          >     unmapping/mapping/reading operations concurrently.
> >          >
> >          >
> >          > I've just performed a very quick test by wrapping the job
> >         submission/
> >          > wait in the 2 mapMemory callbacks and the 1 readMemory
> >         callback with
> >          > atomic counters and logging whenever a counter went above 1.
> >          >
> >          >   * Overall, concurrent callbacks across all types were
> >         common (many per
> >          > second when the VM is busy). It's not exactly a "thundering
> >         herd" (I
> >          > never saw >2) but it's probably not a bad idea to use a
> separate
> >          > condition variable for each job type. (task map, surface map,
> >         memory read)
> >          >   * While I did not observe any concurrent memory mapping
> >         operations
> >          > *within* a type of memory map (2 task mappings or 2 surface
> >         mappings) I
> >          > did see very occasional concurrent memory *read* callbacks.
> >         These would,
> >          > as far as I can tell, not be safe with QemuEvents, unless we
> >         placed the
> >          > event inside the job struct and init/destroyed it on every
> >         callback
> >          > (which seems like excessive overhead).
> >
> >         I think we can tolerate that overhead. init/destroy essentially
> >         sets the
> >         fields in the data structure and I estimate its total size is
> >         about 100
> >         bytes. It is probably better than waking an irrelevant thread
> >         up. I also
> >         hope that keeps the code simple; it's not worthwhile adding code
> to
> >         optimize this.
> >
> >
> >     At least pthread_cond_{init,destroy} and
> >     pthread_mutex_{init,destroy} don't make any syscalls, so yeah it's
> >     probably an acceptable overhead.
> >
> >
> > I've just experimented with QemuEvents created on-demand and ran into
> > some weird deadlocks, which then made me sit down and think about it
> > some more. I've come to the conclusion that creating (and crucially,
> > destroying) QemuEvents on demand in this way is not safe.
> >
> > Specifically, you must not call qemu_event_destroy() - which
> > transitively destroys the mutex and condition variable - unless you can
> > guarantee that the qemu_event_set() call on that event object has
> completed.
> >
> > In qemu_event_set, the event object's value is atomically set to EV_SET.
> > If the previous value was EV_BUSY, qemu_futex_wake() is called. All of
> > this is outside any mutex, however, so apart from memory coherence
> > (there are barriers) this can race with the waiting thread.
> > qemu_event_wait() reads the event's value. If EV_FREE, it's atomically
> > set to EV_BUSY. Then the mutex is locked, the value is checked again,
> > and if it's still EV_BUSY, it waits for the condition variable,
> > otherwise the mutex is immediately unlocked again. If the trigger
> > thread's qemu_event_set() flip to EV_SET occurs between the waiting
> > thread's two atomic reads of the value, the waiting thread will never
> > wait for the condition variable, but the trigger thread WILL try to
> > acquire the mutex and signal the condition variable in
> > qemu_futex_wake(), by which  time the waiting thread may have advanced
> > outside of qemu_event_wait().
>
> Sorry if I'm making a mistake again, but the waiting thread won't set to
> EV_BUSY unless the value is EV_FREE on the second read so the trigger
> thread will not call qemu_futex_wake() if it manages to set to EV_SET
> before the second read, will it?
>

This sequence of events will cause the problem:

WAITER (in qemu_event_wait):
value = qatomic_load_acquire(&ev->value);
-> EV_FREE

TRIGGER (in qemu_event_set):
qatomic_read(&ev->value) != EV_SET
-> EV_FREE (condition is false)

WAITER:
qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET
-> cmpxchg returns EV_FREE, condition false.
ev->value =  EV_BUSY.

TRIGGER:
        int old = qatomic_xchg(&ev->value, EV_SET);
        smp_mb__after_rmw();
        if (old == EV_BUSY) {
-> old = EV_BUSY, condition true.
ev->value = EV_SET

WAITER (in qemu_futex_wait(ev, EV_BUSY)):
    pthread_mutex_lock(&ev->lock);
    if (ev->value == val) {
-> false, because value is EV_SET

WAITER:
    pthread_mutex_unlock(&ev->lock);
    …
    qemu_event_destroy(&job->done_event);

TRIGGER (in qemu_futex_wake(ev, INT_MAX)):
    pthread_mutex_lock(&ev->lock);
-> hangs, because mutex has been destroyed

>
> > This is all fine usually, BUT if you destroy the QemuEvent immediately
> > after the qemu_event_wait() call, qemu_futex_wake() may try to lock a
> > mutex that has been destroyed, or signal a condition variable which has
> > been destroyed. I don't see a reasonable way of making this safe other
> > than using long-lived mutexes and condition variables. And anyway, we
> > have much, MUCH bigger contention/performance issues coming from almost
> > everything being covered by the BQL. (If waking these callbacks can even
> > be considered an issue: I haven't seen it show up in profiling, whereas
> > BQL contention very much does.)
> >
> > I'll submit v5 of this patch set with separate condition variables for
> > each job type. This should make the occurrence of waking the wrong
> > thread quite rare, while reasoning about correctness is pretty
> > straightforward. I think that's good enough.
> >
> >
> >
>
>

[-- Attachment #2: Type: text/html, Size: 9710 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-28 14:13                   ` Phil Dennis-Jordan
@ 2024-10-28 16:06                     ` Akihiko Odaki
  2024-10-28 21:06                       ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-28 16:06 UTC (permalink / raw)
  To: Phil Dennis-Jordan
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/28 23:13, Phil Dennis-Jordan wrote:
> 
> 
> On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki <akihiko.odaki@daynix.com 
> <mailto:akihiko.odaki@daynix.com>> wrote:
> 
>     On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
>      >
>      >
>      > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan
>     <phil@philjordan.eu <mailto:phil@philjordan.eu>
>      > <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>> wrote:
>      >
>      >
>      >          >      >
>      >          >      > Hmm. I think if we were to use that, we would
>     need to
>      >         create a new
>      >          >      > QemuEvent for every job and destroy it afterward,
>      >         which seems
>      >          >     expensive.
>      >          >      > We can't rule out multiple concurrent jobs being
>      >         submitted, and the
>      >          >      > QemuEvent system only supports a single producer as
>      >         far as I can
>      >          >     tell.
>      >          >      >
>      >          >      > You can probably sort of hack around it with
>     just one
>      >         QemuEvent by
>      >          >      > putting the qemu_event_wait into a loop and turning
>      >         the job.done
>      >          >     flag
>      >          >      > into an atomic (because it would now need to be
>      >         checked outside the
>      >          >      > lock) but this all seems unnecessarily complicated
>      >         considering the
>      >          >      > QemuEvent uses the same mechanism QemuCond/
>     QemuMutex
>      >         internally
>      >          >     on macOS
>      >          >      > (the only platform relevant here), except we
>     can use it as
>      >          >     intended with
>      >          >      > QemuCond/QemuMutex rather than having to work
>     against the
>      >          >     abstraction.
>      >          >
>      >          >     I don't think it's going to be used concurrently. It
>      >         would be difficult
>      >          >     to reason even for the framework if it performs memory
>      >          >     unmapping/mapping/reading operations concurrently.
>      >          >
>      >          >
>      >          > I've just performed a very quick test by wrapping the job
>      >         submission/
>      >          > wait in the 2 mapMemory callbacks and the 1 readMemory
>      >         callback with
>      >          > atomic counters and logging whenever a counter went
>     above 1.
>      >          >
>      >          >   * Overall, concurrent callbacks across all types were
>      >         common (many per
>      >          > second when the VM is busy). It's not exactly a
>     "thundering
>      >         herd" (I
>      >          > never saw >2) but it's probably not a bad idea to use
>     a separate
>      >          > condition variable for each job type. (task map,
>     surface map,
>      >         memory read)
>      >          >   * While I did not observe any concurrent memory mapping
>      >         operations
>      >          > *within* a type of memory map (2 task mappings or 2
>     surface
>      >         mappings) I
>      >          > did see very occasional concurrent memory *read*
>     callbacks.
>      >         These would,
>      >          > as far as I can tell, not be safe with QemuEvents,
>     unless we
>      >         placed the
>      >          > event inside the job struct and init/destroyed it on every
>      >         callback
>      >          > (which seems like excessive overhead).
>      >
>      >         I think we can tolerate that overhead. init/destroy
>     essentially
>      >         sets the
>      >         fields in the data structure and I estimate its total size is
>      >         about 100
>      >         bytes. It is probably better than waking an irrelevant thread
>      >         up. I also
>      >         hope that keeps the code simple; it's not worthwhile
>     adding code to
>      >         optimize this.
>      >
>      >
>      >     At least pthread_cond_{init,destroy} and
>      >     pthread_mutex_{init,destroy} don't make any syscalls, so yeah
>     it's
>      >     probably an acceptable overhead.
>      >
>      >
>      > I've just experimented with QemuEvents created on-demand and ran
>     into
>      > some weird deadlocks, which then made me sit down and think about it
>      > some more. I've come to the conclusion that creating (and crucially,
>      > destroying) QemuEvents on demand in this way is not safe.
>      >
>      > Specifically, you must not call qemu_event_destroy() - which
>      > transitively destroys the mutex and condition variable - unless
>     you can
>      > guarantee that the qemu_event_set() call on that event object has
>     completed.
>      >
>      > In qemu_event_set, the event object's value is atomically set to
>     EV_SET.
>      > If the previous value was EV_BUSY, qemu_futex_wake() is called.
>     All of
>      > this is outside any mutex, however, so apart from memory coherence
>      > (there are barriers) this can race with the waiting thread.
>      > qemu_event_wait() reads the event's value. If EV_FREE, it's
>     atomically
>      > set to EV_BUSY. Then the mutex is locked, the value is checked
>     again,
>      > and if it's still EV_BUSY, it waits for the condition variable,
>      > otherwise the mutex is immediately unlocked again. If the trigger
>      > thread's qemu_event_set() flip to EV_SET occurs between the waiting
>      > thread's two atomic reads of the value, the waiting thread will
>     never
>      > wait for the condition variable, but the trigger thread WILL try to
>      > acquire the mutex and signal the condition variable in
>      > qemu_futex_wake(), by which  time the waiting thread may have
>     advanced
>      > outside of qemu_event_wait().
> 
>     Sorry if I'm making a mistake again, but the waiting thread won't
>     set to
>     EV_BUSY unless the value is EV_FREE on the second read so the trigger
>     thread will not call qemu_futex_wake() if it manages to set to EV_SET
>     before the second read, will it?
> 
> 
> This sequence of events will cause the problem:
> 
> WAITER (in qemu_event_wait):
> value = qatomic_load_acquire(&ev->value);
> -> EV_FREE
> 
> TRIGGER (in qemu_event_set):
> qatomic_read(&ev->value) != EV_SET
> -> EV_FREE (condition is false)
> 
> WAITER:
> qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET
> -> cmpxchg returns EV_FREE, condition false.
> ev->value =  EV_BUSY.
> > TRIGGER:
>          int old = qatomic_xchg(&ev->value, EV_SET);
>          smp_mb__after_rmw();
>          if (old == EV_BUSY) {
> -> old = EV_BUSY, condition true.
> ev->value = EV_SET
> 
> WAITER (in qemu_futex_wait(ev, EV_BUSY)):
>      pthread_mutex_lock(&ev->lock);
>      if (ev->value == val) {
> -> false, because value is EV_SET
> 
> WAITER:
>      pthread_mutex_unlock(&ev->lock);
>      …
>      qemu_event_destroy(&job->done_event);
> 
> TRIGGER (in qemu_futex_wake(ev, INT_MAX)):
>      pthread_mutex_lock(&ev->lock);
> -> hangs, because mutex has been destroyed

Thanks for clarification. This is very insightful.


> 
>      >
>      > This is all fine usually, BUT if you destroy the QemuEvent
>     immediately
>      > after the qemu_event_wait() call, qemu_futex_wake() may try to
>     lock a
>      > mutex that has been destroyed, or signal a condition variable
>     which has
>      > been destroyed. I don't see a reasonable way of making this safe
>     other
>      > than using long-lived mutexes and condition variables. And
>     anyway, we
>      > have much, MUCH bigger contention/performance issues coming from
>     almost
>      > everything being covered by the BQL. (If waking these callbacks
>     can even
>      > be considered an issue: I haven't seen it show up in profiling,
>     whereas
>      > BQL contention very much does.)
>      >
>      > I'll submit v5 of this patch set with separate condition
>     variables for
>      > each job type. This should make the occurrence of waking the wrong
>      > thread quite rare, while reasoning about correctness is pretty
>      > straightforward. I think that's good enough.

What about using QemuSemaphore then? It does not seem to have the 
problem same with QemuEvent.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-28 16:06                     ` Akihiko Odaki
@ 2024-10-28 21:06                       ` Phil Dennis-Jordan
  2024-10-29  7:42                         ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-28 21:06 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 10107 bytes --]

On Mon, 28 Oct 2024 at 17:06, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/28 23:13, Phil Dennis-Jordan wrote:
> >
> >
> > On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki <akihiko.odaki@daynix.com
> > <mailto:akihiko.odaki@daynix.com>> wrote:
> >
> >     On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
> >      >
> >      >
> >      > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan
> >     <phil@philjordan.eu <mailto:phil@philjordan.eu>
> >      > <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>> wrote:
> >      >
> >      >
> >      >          >      >
> >      >          >      > Hmm. I think if we were to use that, we would
> >     need to
> >      >         create a new
> >      >          >      > QemuEvent for every job and destroy it
> afterward,
> >      >         which seems
> >      >          >     expensive.
> >      >          >      > We can't rule out multiple concurrent jobs being
> >      >         submitted, and the
> >      >          >      > QemuEvent system only supports a single
> producer as
> >      >         far as I can
> >      >          >     tell.
> >      >          >      >
> >      >          >      > You can probably sort of hack around it with
> >     just one
> >      >         QemuEvent by
> >      >          >      > putting the qemu_event_wait into a loop and
> turning
> >      >         the job.done
> >      >          >     flag
> >      >          >      > into an atomic (because it would now need to be
> >      >         checked outside the
> >      >          >      > lock) but this all seems unnecessarily
> complicated
> >      >         considering the
> >      >          >      > QemuEvent uses the same mechanism QemuCond/
> >     QemuMutex
> >      >         internally
> >      >          >     on macOS
> >      >          >      > (the only platform relevant here), except we
> >     can use it as
> >      >          >     intended with
> >      >          >      > QemuCond/QemuMutex rather than having to work
> >     against the
> >      >          >     abstraction.
> >      >          >
> >      >          >     I don't think it's going to be used concurrently.
> It
> >      >         would be difficult
> >      >          >     to reason even for the framework if it performs
> memory
> >      >          >     unmapping/mapping/reading operations concurrently.
> >      >          >
> >      >          >
> >      >          > I've just performed a very quick test by wrapping the
> job
> >      >         submission/
> >      >          > wait in the 2 mapMemory callbacks and the 1 readMemory
> >      >         callback with
> >      >          > atomic counters and logging whenever a counter went
> >     above 1.
> >      >          >
> >      >          >   * Overall, concurrent callbacks across all types were
> >      >         common (many per
> >      >          > second when the VM is busy). It's not exactly a
> >     "thundering
> >      >         herd" (I
> >      >          > never saw >2) but it's probably not a bad idea to use
> >     a separate
> >      >          > condition variable for each job type. (task map,
> >     surface map,
> >      >         memory read)
> >      >          >   * While I did not observe any concurrent memory
> mapping
> >      >         operations
> >      >          > *within* a type of memory map (2 task mappings or 2
> >     surface
> >      >         mappings) I
> >      >          > did see very occasional concurrent memory *read*
> >     callbacks.
> >      >         These would,
> >      >          > as far as I can tell, not be safe with QemuEvents,
> >     unless we
> >      >         placed the
> >      >          > event inside the job struct and init/destroyed it on
> every
> >      >         callback
> >      >          > (which seems like excessive overhead).
> >      >
> >      >         I think we can tolerate that overhead. init/destroy
> >     essentially
> >      >         sets the
> >      >         fields in the data structure and I estimate its total
> size is
> >      >         about 100
> >      >         bytes. It is probably better than waking an irrelevant
> thread
> >      >         up. I also
> >      >         hope that keeps the code simple; it's not worthwhile
> >     adding code to
> >      >         optimize this.
> >      >
> >      >
> >      >     At least pthread_cond_{init,destroy} and
> >      >     pthread_mutex_{init,destroy} don't make any syscalls, so yeah
> >     it's
> >      >     probably an acceptable overhead.
> >      >
> >      >
> >      > I've just experimented with QemuEvents created on-demand and ran
> >     into
> >      > some weird deadlocks, which then made me sit down and think about
> it
> >      > some more. I've come to the conclusion that creating (and
> crucially,
> >      > destroying) QemuEvents on demand in this way is not safe.
> >      >
> >      > Specifically, you must not call qemu_event_destroy() - which
> >      > transitively destroys the mutex and condition variable - unless
> >     you can
> >      > guarantee that the qemu_event_set() call on that event object has
> >     completed.
> >      >
> >      > In qemu_event_set, the event object's value is atomically set to
> >     EV_SET.
> >      > If the previous value was EV_BUSY, qemu_futex_wake() is called.
> >     All of
> >      > this is outside any mutex, however, so apart from memory coherence
> >      > (there are barriers) this can race with the waiting thread.
> >      > qemu_event_wait() reads the event's value. If EV_FREE, it's
> >     atomically
> >      > set to EV_BUSY. Then the mutex is locked, the value is checked
> >     again,
> >      > and if it's still EV_BUSY, it waits for the condition variable,
> >      > otherwise the mutex is immediately unlocked again. If the trigger
> >      > thread's qemu_event_set() flip to EV_SET occurs between the
> waiting
> >      > thread's two atomic reads of the value, the waiting thread will
> >     never
> >      > wait for the condition variable, but the trigger thread WILL try
> to
> >      > acquire the mutex and signal the condition variable in
> >      > qemu_futex_wake(), by which  time the waiting thread may have
> >     advanced
> >      > outside of qemu_event_wait().
> >
> >     Sorry if I'm making a mistake again, but the waiting thread won't
> >     set to
> >     EV_BUSY unless the value is EV_FREE on the second read so the trigger
> >     thread will not call qemu_futex_wake() if it manages to set to EV_SET
> >     before the second read, will it?
> >
> >
> > This sequence of events will cause the problem:
> >
> > WAITER (in qemu_event_wait):
> > value = qatomic_load_acquire(&ev->value);
> > -> EV_FREE
> >
> > TRIGGER (in qemu_event_set):
> > qatomic_read(&ev->value) != EV_SET
> > -> EV_FREE (condition is false)
> >
> > WAITER:
> > qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET
> > -> cmpxchg returns EV_FREE, condition false.
> > ev->value =  EV_BUSY.
> > > TRIGGER:
> >          int old = qatomic_xchg(&ev->value, EV_SET);
> >          smp_mb__after_rmw();
> >          if (old == EV_BUSY) {
> > -> old = EV_BUSY, condition true.
> > ev->value = EV_SET
> >
> > WAITER (in qemu_futex_wait(ev, EV_BUSY)):
> >      pthread_mutex_lock(&ev->lock);
> >      if (ev->value == val) {
> > -> false, because value is EV_SET
> >
> > WAITER:
> >      pthread_mutex_unlock(&ev->lock);
> >      …
> >      qemu_event_destroy(&job->done_event);
> >
> > TRIGGER (in qemu_futex_wake(ev, INT_MAX)):
> >      pthread_mutex_lock(&ev->lock);
> > -> hangs, because mutex has been destroyed
>
> Thanks for clarification. This is very insightful.
>
>
> >
> >      >
> >      > This is all fine usually, BUT if you destroy the QemuEvent
> >     immediately
> >      > after the qemu_event_wait() call, qemu_futex_wake() may try to
> >     lock a
> >      > mutex that has been destroyed, or signal a condition variable
> >     which has
> >      > been destroyed. I don't see a reasonable way of making this safe
> >     other
> >      > than using long-lived mutexes and condition variables. And
> >     anyway, we
> >      > have much, MUCH bigger contention/performance issues coming from
> >     almost
> >      > everything being covered by the BQL. (If waking these callbacks
> >     can even
> >      > be considered an issue: I haven't seen it show up in profiling,
> >     whereas
> >      > BQL contention very much does.)
> >      >
> >      > I'll submit v5 of this patch set with separate condition
> >     variables for
> >      > each job type. This should make the occurrence of waking the wrong
> >      > thread quite rare, while reasoning about correctness is pretty
> >      > straightforward. I think that's good enough.
>
> What about using QemuSemaphore then? It does not seem to have the
> problem same with QemuEvent.
>

Nowhere else in the code base uses short-lived semaphores, and while I
can't immediately see a risk (the mutex is held during both post and wait)
there might be some non-obvious problem with the approach. Internally, the
semaphores use condition variables. The solution using condition variables
directly already works, is safe, relatively easy to reason about, and does
not cause any performance issues. There is a tiny inefficiency about waking
up a thread unnecessarily in the rare case when two callbacks of the same
kind occur concurrently. In practice, it's irrelevant. Thanks to the
awkward mismatch of the PVGraphics.framework's libdispatch based approach
and Qemu's BQL/AIO/BH approach, we are already sending messages to other
threads very frequently. This isn't ideal, but not fixable without
drastically reducing the need to acquire the BQL across Qemu.

I do not think it is worth spending even more time trying to fix this part
of the code which isn't broken in the first place.

[-- Attachment #2: Type: text/html, Size: 13489 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-28 21:06                       ` Phil Dennis-Jordan
@ 2024-10-29  7:42                         ` Akihiko Odaki
  2024-10-29 21:16                           ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-29  7:42 UTC (permalink / raw)
  To: Phil Dennis-Jordan
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/29 6:06, Phil Dennis-Jordan wrote:
> 
> 
> On Mon, 28 Oct 2024 at 17:06, Akihiko Odaki <akihiko.odaki@daynix.com 
> <mailto:akihiko.odaki@daynix.com>> wrote:
> 
>     On 2024/10/28 23:13, Phil Dennis-Jordan wrote:
>      >
>      >
>      > On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki
>     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>      > <mailto:akihiko.odaki@daynix.com
>     <mailto:akihiko.odaki@daynix.com>>> wrote:
>      >
>      >     On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
>      >      >
>      >      >
>      >      > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan
>      >     <phil@philjordan.eu <mailto:phil@philjordan.eu>
>     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>
>      >      > <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>
>     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>>> wrote:
>      >      >
>      >      >
>      >      >          >      >
>      >      >          >      > Hmm. I think if we were to use that, we
>     would
>      >     need to
>      >      >         create a new
>      >      >          >      > QemuEvent for every job and destroy it
>     afterward,
>      >      >         which seems
>      >      >          >     expensive.
>      >      >          >      > We can't rule out multiple concurrent
>     jobs being
>      >      >         submitted, and the
>      >      >          >      > QemuEvent system only supports a single
>     producer as
>      >      >         far as I can
>      >      >          >     tell.
>      >      >          >      >
>      >      >          >      > You can probably sort of hack around it with
>      >     just one
>      >      >         QemuEvent by
>      >      >          >      > putting the qemu_event_wait into a loop
>     and turning
>      >      >         the job.done
>      >      >          >     flag
>      >      >          >      > into an atomic (because it would now
>     need to be
>      >      >         checked outside the
>      >      >          >      > lock) but this all seems unnecessarily
>     complicated
>      >      >         considering the
>      >      >          >      > QemuEvent uses the same mechanism QemuCond/
>      >     QemuMutex
>      >      >         internally
>      >      >          >     on macOS
>      >      >          >      > (the only platform relevant here), except we
>      >     can use it as
>      >      >          >     intended with
>      >      >          >      > QemuCond/QemuMutex rather than having to
>     work
>      >     against the
>      >      >          >     abstraction.
>      >      >          >
>      >      >          >     I don't think it's going to be used
>     concurrently. It
>      >      >         would be difficult
>      >      >          >     to reason even for the framework if it
>     performs memory
>      >      >          >     unmapping/mapping/reading operations
>     concurrently.
>      >      >          >
>      >      >          >
>      >      >          > I've just performed a very quick test by
>     wrapping the job
>      >      >         submission/
>      >      >          > wait in the 2 mapMemory callbacks and the 1
>     readMemory
>      >      >         callback with
>      >      >          > atomic counters and logging whenever a counter went
>      >     above 1.
>      >      >          >
>      >      >          >   * Overall, concurrent callbacks across all
>     types were
>      >      >         common (many per
>      >      >          > second when the VM is busy). It's not exactly a
>      >     "thundering
>      >      >         herd" (I
>      >      >          > never saw >2) but it's probably not a bad idea
>     to use
>      >     a separate
>      >      >          > condition variable for each job type. (task map,
>      >     surface map,
>      >      >         memory read)
>      >      >          >   * While I did not observe any concurrent
>     memory mapping
>      >      >         operations
>      >      >          > *within* a type of memory map (2 task mappings or 2
>      >     surface
>      >      >         mappings) I
>      >      >          > did see very occasional concurrent memory *read*
>      >     callbacks.
>      >      >         These would,
>      >      >          > as far as I can tell, not be safe with QemuEvents,
>      >     unless we
>      >      >         placed the
>      >      >          > event inside the job struct and init/destroyed
>     it on every
>      >      >         callback
>      >      >          > (which seems like excessive overhead).
>      >      >
>      >      >         I think we can tolerate that overhead. init/destroy
>      >     essentially
>      >      >         sets the
>      >      >         fields in the data structure and I estimate its
>     total size is
>      >      >         about 100
>      >      >         bytes. It is probably better than waking an
>     irrelevant thread
>      >      >         up. I also
>      >      >         hope that keeps the code simple; it's not worthwhile
>      >     adding code to
>      >      >         optimize this.
>      >      >
>      >      >
>      >      >     At least pthread_cond_{init,destroy} and
>      >      >     pthread_mutex_{init,destroy} don't make any syscalls,
>     so yeah
>      >     it's
>      >      >     probably an acceptable overhead.
>      >      >
>      >      >
>      >      > I've just experimented with QemuEvents created on-demand
>     and ran
>      >     into
>      >      > some weird deadlocks, which then made me sit down and
>     think about it
>      >      > some more. I've come to the conclusion that creating (and
>     crucially,
>      >      > destroying) QemuEvents on demand in this way is not safe.
>      >      >
>      >      > Specifically, you must not call qemu_event_destroy() - which
>      >      > transitively destroys the mutex and condition variable -
>     unless
>      >     you can
>      >      > guarantee that the qemu_event_set() call on that event
>     object has
>      >     completed.
>      >      >
>      >      > In qemu_event_set, the event object's value is atomically
>     set to
>      >     EV_SET.
>      >      > If the previous value was EV_BUSY, qemu_futex_wake() is
>     called.
>      >     All of
>      >      > this is outside any mutex, however, so apart from memory
>     coherence
>      >      > (there are barriers) this can race with the waiting thread.
>      >      > qemu_event_wait() reads the event's value. If EV_FREE, it's
>      >     atomically
>      >      > set to EV_BUSY. Then the mutex is locked, the value is checked
>      >     again,
>      >      > and if it's still EV_BUSY, it waits for the condition
>     variable,
>      >      > otherwise the mutex is immediately unlocked again. If the
>     trigger
>      >      > thread's qemu_event_set() flip to EV_SET occurs between
>     the waiting
>      >      > thread's two atomic reads of the value, the waiting thread
>     will
>      >     never
>      >      > wait for the condition variable, but the trigger thread
>     WILL try to
>      >      > acquire the mutex and signal the condition variable in
>      >      > qemu_futex_wake(), by which  time the waiting thread may have
>      >     advanced
>      >      > outside of qemu_event_wait().
>      >
>      >     Sorry if I'm making a mistake again, but the waiting thread won't
>      >     set to
>      >     EV_BUSY unless the value is EV_FREE on the second read so the
>     trigger
>      >     thread will not call qemu_futex_wake() if it manages to set
>     to EV_SET
>      >     before the second read, will it?
>      >
>      >
>      > This sequence of events will cause the problem:
>      >
>      > WAITER (in qemu_event_wait):
>      > value = qatomic_load_acquire(&ev->value);
>      > -> EV_FREE
>      >
>      > TRIGGER (in qemu_event_set):
>      > qatomic_read(&ev->value) != EV_SET
>      > -> EV_FREE (condition is false)
>      >
>      > WAITER:
>      > qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET
>      > -> cmpxchg returns EV_FREE, condition false.
>      > ev->value =  EV_BUSY.
>      > > TRIGGER:
>      >          int old = qatomic_xchg(&ev->value, EV_SET);
>      >          smp_mb__after_rmw();
>      >          if (old == EV_BUSY) {
>      > -> old = EV_BUSY, condition true.
>      > ev->value = EV_SET
>      >
>      > WAITER (in qemu_futex_wait(ev, EV_BUSY)):
>      >      pthread_mutex_lock(&ev->lock);
>      >      if (ev->value == val) {
>      > -> false, because value is EV_SET
>      >
>      > WAITER:
>      >      pthread_mutex_unlock(&ev->lock);
>      >      …
>      >      qemu_event_destroy(&job->done_event);
>      >
>      > TRIGGER (in qemu_futex_wake(ev, INT_MAX)):
>      >      pthread_mutex_lock(&ev->lock);
>      > -> hangs, because mutex has been destroyed
> 
>     Thanks for clarification. This is very insightful.
> 
> 
>      >
>      >      >
>      >      > This is all fine usually, BUT if you destroy the QemuEvent
>      >     immediately
>      >      > after the qemu_event_wait() call, qemu_futex_wake() may try to
>      >     lock a
>      >      > mutex that has been destroyed, or signal a condition variable
>      >     which has
>      >      > been destroyed. I don't see a reasonable way of making
>     this safe
>      >     other
>      >      > than using long-lived mutexes and condition variables. And
>      >     anyway, we
>      >      > have much, MUCH bigger contention/performance issues
>     coming from
>      >     almost
>      >      > everything being covered by the BQL. (If waking these
>     callbacks
>      >     can even
>      >      > be considered an issue: I haven't seen it show up in
>     profiling,
>      >     whereas
>      >      > BQL contention very much does.)
>      >      >
>      >      > I'll submit v5 of this patch set with separate condition
>      >     variables for
>      >      > each job type. This should make the occurrence of waking
>     the wrong
>      >      > thread quite rare, while reasoning about correctness is pretty
>      >      > straightforward. I think that's good enough.
> 
>     What about using QemuSemaphore then? It does not seem to have the
>     problem same with QemuEvent.
> 
> 
> Nowhere else in the code base uses short-lived semaphores, and while I 
> can't immediately see a risk (the mutex is held during both post and 
> wait) there might be some non-obvious problem with the approach. 
> Internally, the semaphores use condition variables. The solution using 
> condition variables directly already works, is safe, relatively easy to 
> reason about, and does not cause any performance issues. There is a tiny 
> inefficiency about waking up a thread unnecessarily in the rare case 
> when two callbacks of the same kind occur concurrently. In practice, 
> it's irrelevant. Thanks to the awkward mismatch of the 
> PVGraphics.framework's libdispatch based approach and Qemu's BQL/AIO/BH 
> approach, we are already sending messages to other threads very 
> frequently. This isn't ideal, but not fixable without drastically 
> reducing the need to acquire the BQL across Qemu.

I found several usage of ephemeral semaphores:
h_random() in hw/ppc/spapr_rng.c
colo_process_checkpoint() in migration/colo.c
postcopy_thread_create() in migration/postcopy-ram.c

I'm sure short-lived semaphores will keep working (or break migration in 
strange ways).

> 
> I do not think it is worth spending even more time trying to fix this 
> part of the code which isn't broken in the first place.

I'm sorry to bring you to this mess, which I didn't really expect. I 
thought combining a shared pair of conditional variable and mutex and 
job-specific bools is unnecessarily complex, and having one 
synchronization primitive for each job will be simpler and will just work.

However, there was a pitfall with QemuEvent as you demonstrated and now 
I grep QemuEvent and QemuSemaphore to find all such ephemeral usage is 
written with QemuSemaphore instead of QemuEvent. I think the critical 
problem here is that it is not documented that qemu_event_destroy() 
cannot be used immediately after qemu_event_wait(). We would not run 
into this situation if it is written. I will write a patch to add such a 
documentation comment.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-29  7:42                         ` Akihiko Odaki
@ 2024-10-29 21:16                           ` Phil Dennis-Jordan
  2024-10-31  6:52                             ` Akihiko Odaki
  0 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-29 21:16 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 14660 bytes --]

On Tue, 29 Oct 2024 at 08:42, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/29 6:06, Phil Dennis-Jordan wrote:
> >
> >
> > On Mon, 28 Oct 2024 at 17:06, Akihiko Odaki <akihiko.odaki@daynix.com
> > <mailto:akihiko.odaki@daynix.com>> wrote:
> >
> >     On 2024/10/28 23:13, Phil Dennis-Jordan wrote:
> >      >
> >      >
> >      > On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki
> >     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
> >      > <mailto:akihiko.odaki@daynix.com
> >     <mailto:akihiko.odaki@daynix.com>>> wrote:
> >      >
> >      >     On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
> >      >      >
> >      >      >
> >      >      > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan
> >      >     <phil@philjordan.eu <mailto:phil@philjordan.eu>
> >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>
> >      >      > <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>
> >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>>> wrote:
> >      >      >
> >      >      >
> >      >      >          >      >
> >      >      >          >      > Hmm. I think if we were to use that, we
> >     would
> >      >     need to
> >      >      >         create a new
> >      >      >          >      > QemuEvent for every job and destroy it
> >     afterward,
> >      >      >         which seems
> >      >      >          >     expensive.
> >      >      >          >      > We can't rule out multiple concurrent
> >     jobs being
> >      >      >         submitted, and the
> >      >      >          >      > QemuEvent system only supports a single
> >     producer as
> >      >      >         far as I can
> >      >      >          >     tell.
> >      >      >          >      >
> >      >      >          >      > You can probably sort of hack around it
> with
> >      >     just one
> >      >      >         QemuEvent by
> >      >      >          >      > putting the qemu_event_wait into a loop
> >     and turning
> >      >      >         the job.done
> >      >      >          >     flag
> >      >      >          >      > into an atomic (because it would now
> >     need to be
> >      >      >         checked outside the
> >      >      >          >      > lock) but this all seems unnecessarily
> >     complicated
> >      >      >         considering the
> >      >      >          >      > QemuEvent uses the same mechanism
> QemuCond/
> >      >     QemuMutex
> >      >      >         internally
> >      >      >          >     on macOS
> >      >      >          >      > (the only platform relevant here),
> except we
> >      >     can use it as
> >      >      >          >     intended with
> >      >      >          >      > QemuCond/QemuMutex rather than having to
> >     work
> >      >     against the
> >      >      >          >     abstraction.
> >      >      >          >
> >      >      >          >     I don't think it's going to be used
> >     concurrently. It
> >      >      >         would be difficult
> >      >      >          >     to reason even for the framework if it
> >     performs memory
> >      >      >          >     unmapping/mapping/reading operations
> >     concurrently.
> >      >      >          >
> >      >      >          >
> >      >      >          > I've just performed a very quick test by
> >     wrapping the job
> >      >      >         submission/
> >      >      >          > wait in the 2 mapMemory callbacks and the 1
> >     readMemory
> >      >      >         callback with
> >      >      >          > atomic counters and logging whenever a counter
> went
> >      >     above 1.
> >      >      >          >
> >      >      >          >   * Overall, concurrent callbacks across all
> >     types were
> >      >      >         common (many per
> >      >      >          > second when the VM is busy). It's not exactly a
> >      >     "thundering
> >      >      >         herd" (I
> >      >      >          > never saw >2) but it's probably not a bad idea
> >     to use
> >      >     a separate
> >      >      >          > condition variable for each job type. (task map,
> >      >     surface map,
> >      >      >         memory read)
> >      >      >          >   * While I did not observe any concurrent
> >     memory mapping
> >      >      >         operations
> >      >      >          > *within* a type of memory map (2 task mappings
> or 2
> >      >     surface
> >      >      >         mappings) I
> >      >      >          > did see very occasional concurrent memory *read*
> >      >     callbacks.
> >      >      >         These would,
> >      >      >          > as far as I can tell, not be safe with
> QemuEvents,
> >      >     unless we
> >      >      >         placed the
> >      >      >          > event inside the job struct and init/destroyed
> >     it on every
> >      >      >         callback
> >      >      >          > (which seems like excessive overhead).
> >      >      >
> >      >      >         I think we can tolerate that overhead. init/destroy
> >      >     essentially
> >      >      >         sets the
> >      >      >         fields in the data structure and I estimate its
> >     total size is
> >      >      >         about 100
> >      >      >         bytes. It is probably better than waking an
> >     irrelevant thread
> >      >      >         up. I also
> >      >      >         hope that keeps the code simple; it's not
> worthwhile
> >      >     adding code to
> >      >      >         optimize this.
> >      >      >
> >      >      >
> >      >      >     At least pthread_cond_{init,destroy} and
> >      >      >     pthread_mutex_{init,destroy} don't make any syscalls,
> >     so yeah
> >      >     it's
> >      >      >     probably an acceptable overhead.
> >      >      >
> >      >      >
> >      >      > I've just experimented with QemuEvents created on-demand
> >     and ran
> >      >     into
> >      >      > some weird deadlocks, which then made me sit down and
> >     think about it
> >      >      > some more. I've come to the conclusion that creating (and
> >     crucially,
> >      >      > destroying) QemuEvents on demand in this way is not safe.
> >      >      >
> >      >      > Specifically, you must not call qemu_event_destroy() -
> which
> >      >      > transitively destroys the mutex and condition variable -
> >     unless
> >      >     you can
> >      >      > guarantee that the qemu_event_set() call on that event
> >     object has
> >      >     completed.
> >      >      >
> >      >      > In qemu_event_set, the event object's value is atomically
> >     set to
> >      >     EV_SET.
> >      >      > If the previous value was EV_BUSY, qemu_futex_wake() is
> >     called.
> >      >     All of
> >      >      > this is outside any mutex, however, so apart from memory
> >     coherence
> >      >      > (there are barriers) this can race with the waiting thread.
> >      >      > qemu_event_wait() reads the event's value. If EV_FREE, it's
> >      >     atomically
> >      >      > set to EV_BUSY. Then the mutex is locked, the value is
> checked
> >      >     again,
> >      >      > and if it's still EV_BUSY, it waits for the condition
> >     variable,
> >      >      > otherwise the mutex is immediately unlocked again. If the
> >     trigger
> >      >      > thread's qemu_event_set() flip to EV_SET occurs between
> >     the waiting
> >      >      > thread's two atomic reads of the value, the waiting thread
> >     will
> >      >     never
> >      >      > wait for the condition variable, but the trigger thread
> >     WILL try to
> >      >      > acquire the mutex and signal the condition variable in
> >      >      > qemu_futex_wake(), by which  time the waiting thread may
> have
> >      >     advanced
> >      >      > outside of qemu_event_wait().
> >      >
> >      >     Sorry if I'm making a mistake again, but the waiting thread
> won't
> >      >     set to
> >      >     EV_BUSY unless the value is EV_FREE on the second read so the
> >     trigger
> >      >     thread will not call qemu_futex_wake() if it manages to set
> >     to EV_SET
> >      >     before the second read, will it?
> >      >
> >      >
> >      > This sequence of events will cause the problem:
> >      >
> >      > WAITER (in qemu_event_wait):
> >      > value = qatomic_load_acquire(&ev->value);
> >      > -> EV_FREE
> >      >
> >      > TRIGGER (in qemu_event_set):
> >      > qatomic_read(&ev->value) != EV_SET
> >      > -> EV_FREE (condition is false)
> >      >
> >      > WAITER:
> >      > qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET
> >      > -> cmpxchg returns EV_FREE, condition false.
> >      > ev->value =  EV_BUSY.
> >      > > TRIGGER:
> >      >          int old = qatomic_xchg(&ev->value, EV_SET);
> >      >          smp_mb__after_rmw();
> >      >          if (old == EV_BUSY) {
> >      > -> old = EV_BUSY, condition true.
> >      > ev->value = EV_SET
> >      >
> >      > WAITER (in qemu_futex_wait(ev, EV_BUSY)):
> >      >      pthread_mutex_lock(&ev->lock);
> >      >      if (ev->value == val) {
> >      > -> false, because value is EV_SET
> >      >
> >      > WAITER:
> >      >      pthread_mutex_unlock(&ev->lock);
> >      >      …
> >      >      qemu_event_destroy(&job->done_event);
> >      >
> >      > TRIGGER (in qemu_futex_wake(ev, INT_MAX)):
> >      >      pthread_mutex_lock(&ev->lock);
> >      > -> hangs, because mutex has been destroyed
> >
> >     Thanks for clarification. This is very insightful.
> >
> >
> >      >
> >      >      >
> >      >      > This is all fine usually, BUT if you destroy the QemuEvent
> >      >     immediately
> >      >      > after the qemu_event_wait() call, qemu_futex_wake() may
> try to
> >      >     lock a
> >      >      > mutex that has been destroyed, or signal a condition
> variable
> >      >     which has
> >      >      > been destroyed. I don't see a reasonable way of making
> >     this safe
> >      >     other
> >      >      > than using long-lived mutexes and condition variables. And
> >      >     anyway, we
> >      >      > have much, MUCH bigger contention/performance issues
> >     coming from
> >      >     almost
> >      >      > everything being covered by the BQL. (If waking these
> >     callbacks
> >      >     can even
> >      >      > be considered an issue: I haven't seen it show up in
> >     profiling,
> >      >     whereas
> >      >      > BQL contention very much does.)
> >      >      >
> >      >      > I'll submit v5 of this patch set with separate condition
> >      >     variables for
> >      >      > each job type. This should make the occurrence of waking
> >     the wrong
> >      >      > thread quite rare, while reasoning about correctness is
> pretty
> >      >      > straightforward. I think that's good enough.
> >
> >     What about using QemuSemaphore then? It does not seem to have the
> >     problem same with QemuEvent.
> >
> >
> > Nowhere else in the code base uses short-lived semaphores, and while I
> > can't immediately see a risk (the mutex is held during both post and
> > wait) there might be some non-obvious problem with the approach.
> > Internally, the semaphores use condition variables. The solution using
> > condition variables directly already works, is safe, relatively easy to
> > reason about, and does not cause any performance issues. There is a tiny
> > inefficiency about waking up a thread unnecessarily in the rare case
> > when two callbacks of the same kind occur concurrently. In practice,
> > it's irrelevant. Thanks to the awkward mismatch of the
> > PVGraphics.framework's libdispatch based approach and Qemu's BQL/AIO/BH
> > approach, we are already sending messages to other threads very
> > frequently. This isn't ideal, but not fixable without drastically
> > reducing the need to acquire the BQL across Qemu.
>
> I found several usage of ephemeral semaphores:
> h_random() in hw/ppc/spapr_rng.c
> colo_process_checkpoint() in migration/colo.c
> postcopy_thread_create() in migration/postcopy-ram.c
>
> I'm sure short-lived semaphores will keep working (or break migration in
> strange ways).
>
> >
> > I do not think it is worth spending even more time trying to fix this
> > part of the code which isn't broken in the first place.
>
> I'm sorry to bring you to this mess, which I didn't really expect. I
> thought combining a shared pair of conditional variable and mutex and
> job-specific bools is unnecessarily complex, and having one
> synchronization primitive for each job will be simpler and will just work.
>

With multithreading, the devil is always in the detail! 😅 I wouldn't mind
if we were seeing genuine issues with the Mutex/Cond code, but it's fine as
far as I can tell. The QemuEvent version wasn't even really any simpler
(replacing bool done; with QemuEvent done_event; and await ->
init/wait/destroy gets longer while lock/broadcast/unlock -> set gets
shorter), and I guess a QemuSemaphore version would be about the same.
Relying on the way an edge case is handled - destroying immediately after
waiting - in the long term potentially makes the code more fragile too in
case implementation details change. I think we've reached a bikeshedding
stage here, and I suggest any further improvements on this part other than
bug fixes should be deferred to future patches.


> However, there was a pitfall with QemuEvent as you demonstrated and now
> I grep QemuEvent and QemuSemaphore to find all such ephemeral usage is
> written with QemuSemaphore instead of QemuEvent. I think the critical
> problem here is that it is not documented that qemu_event_destroy()
> cannot be used immediately after qemu_event_wait(). We would not run
> into this situation if it is written. I will write a patch to add such a
> documentation comment.
>

Sounds good, I'd be happy to review it (cc me).

Thanks for the in-depth reviews on all the patches in this set! You've
prompted me to make some significant improvements, even if the experience
with the job signalling has ended up being a frustrating one. The code is
otherwise definitely better now than before. I've just posted v5 of the
patch set and I hope we're pretty close to "good enough" now.

All the best,
Phil

[-- Attachment #2: Type: text/html, Size: 20640 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-29 21:16                           ` Phil Dennis-Jordan
@ 2024-10-31  6:52                             ` Akihiko Odaki
  2024-11-03 15:08                               ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-31  6:52 UTC (permalink / raw)
  To: Phil Dennis-Jordan
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/30 6:16, Phil Dennis-Jordan wrote:
> 
> 
> On Tue, 29 Oct 2024 at 08:42, Akihiko Odaki <akihiko.odaki@daynix.com 
> <mailto:akihiko.odaki@daynix.com>> wrote:
> 
>     On 2024/10/29 6:06, Phil Dennis-Jordan wrote:
>      >
>      >
>      > On Mon, 28 Oct 2024 at 17:06, Akihiko Odaki
>     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>      > <mailto:akihiko.odaki@daynix.com
>     <mailto:akihiko.odaki@daynix.com>>> wrote:
>      >
>      >     On 2024/10/28 23:13, Phil Dennis-Jordan wrote:
>      >      >
>      >      >
>      >      > On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki
>      >     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>     <mailto:akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>>
>      >      > <mailto:akihiko.odaki@daynix.com
>     <mailto:akihiko.odaki@daynix.com>
>      >     <mailto:akihiko.odaki@daynix.com
>     <mailto:akihiko.odaki@daynix.com>>>> wrote:
>      >      >
>      >      >     On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
>      >      >      >
>      >      >      >
>      >      >      > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan
>      >      >     <phil@philjordan.eu <mailto:phil@philjordan.eu>
>     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>
>      >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>
>     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>>
>      >      >      > <mailto:phil@philjordan.eu
>     <mailto:phil@philjordan.eu> <mailto:phil@philjordan.eu
>     <mailto:phil@philjordan.eu>>
>      >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>
>     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>>>> wrote:
>      >      >      >
>      >      >      >
>      >      >      >          >      >
>      >      >      >          >      > Hmm. I think if we were to use
>     that, we
>      >     would
>      >      >     need to
>      >      >      >         create a new
>      >      >      >          >      > QemuEvent for every job and
>     destroy it
>      >     afterward,
>      >      >      >         which seems
>      >      >      >          >     expensive.
>      >      >      >          >      > We can't rule out multiple concurrent
>      >     jobs being
>      >      >      >         submitted, and the
>      >      >      >          >      > QemuEvent system only supports a
>     single
>      >     producer as
>      >      >      >         far as I can
>      >      >      >          >     tell.
>      >      >      >          >      >
>      >      >      >          >      > You can probably sort of hack
>     around it with
>      >      >     just one
>      >      >      >         QemuEvent by
>      >      >      >          >      > putting the qemu_event_wait into
>     a loop
>      >     and turning
>      >      >      >         the job.done
>      >      >      >          >     flag
>      >      >      >          >      > into an atomic (because it would now
>      >     need to be
>      >      >      >         checked outside the
>      >      >      >          >      > lock) but this all seems
>     unnecessarily
>      >     complicated
>      >      >      >         considering the
>      >      >      >          >      > QemuEvent uses the same mechanism
>     QemuCond/
>      >      >     QemuMutex
>      >      >      >         internally
>      >      >      >          >     on macOS
>      >      >      >          >      > (the only platform relevant
>     here), except we
>      >      >     can use it as
>      >      >      >          >     intended with
>      >      >      >          >      > QemuCond/QemuMutex rather than
>     having to
>      >     work
>      >      >     against the
>      >      >      >          >     abstraction.
>      >      >      >          >
>      >      >      >          >     I don't think it's going to be used
>      >     concurrently. It
>      >      >      >         would be difficult
>      >      >      >          >     to reason even for the framework if it
>      >     performs memory
>      >      >      >          >     unmapping/mapping/reading operations
>      >     concurrently.
>      >      >      >          >
>      >      >      >          >
>      >      >      >          > I've just performed a very quick test by
>      >     wrapping the job
>      >      >      >         submission/
>      >      >      >          > wait in the 2 mapMemory callbacks and the 1
>      >     readMemory
>      >      >      >         callback with
>      >      >      >          > atomic counters and logging whenever a
>     counter went
>      >      >     above 1.
>      >      >      >          >
>      >      >      >          >   * Overall, concurrent callbacks across all
>      >     types were
>      >      >      >         common (many per
>      >      >      >          > second when the VM is busy). It's not
>     exactly a
>      >      >     "thundering
>      >      >      >         herd" (I
>      >      >      >          > never saw >2) but it's probably not a
>     bad idea
>      >     to use
>      >      >     a separate
>      >      >      >          > condition variable for each job type.
>     (task map,
>      >      >     surface map,
>      >      >      >         memory read)
>      >      >      >          >   * While I did not observe any concurrent
>      >     memory mapping
>      >      >      >         operations
>      >      >      >          > *within* a type of memory map (2 task
>     mappings or 2
>      >      >     surface
>      >      >      >         mappings) I
>      >      >      >          > did see very occasional concurrent
>     memory *read*
>      >      >     callbacks.
>      >      >      >         These would,
>      >      >      >          > as far as I can tell, not be safe with
>     QemuEvents,
>      >      >     unless we
>      >      >      >         placed the
>      >      >      >          > event inside the job struct and init/
>     destroyed
>      >     it on every
>      >      >      >         callback
>      >      >      >          > (which seems like excessive overhead).
>      >      >      >
>      >      >      >         I think we can tolerate that overhead.
>     init/destroy
>      >      >     essentially
>      >      >      >         sets the
>      >      >      >         fields in the data structure and I estimate its
>      >     total size is
>      >      >      >         about 100
>      >      >      >         bytes. It is probably better than waking an
>      >     irrelevant thread
>      >      >      >         up. I also
>      >      >      >         hope that keeps the code simple; it's not
>     worthwhile
>      >      >     adding code to
>      >      >      >         optimize this.
>      >      >      >
>      >      >      >
>      >      >      >     At least pthread_cond_{init,destroy} and
>      >      >      >     pthread_mutex_{init,destroy} don't make any
>     syscalls,
>      >     so yeah
>      >      >     it's
>      >      >      >     probably an acceptable overhead.
>      >      >      >
>      >      >      >
>      >      >      > I've just experimented with QemuEvents created on-
>     demand
>      >     and ran
>      >      >     into
>      >      >      > some weird deadlocks, which then made me sit down and
>      >     think about it
>      >      >      > some more. I've come to the conclusion that
>     creating (and
>      >     crucially,
>      >      >      > destroying) QemuEvents on demand in this way is not
>     safe.
>      >      >      >
>      >      >      > Specifically, you must not call
>     qemu_event_destroy() - which
>      >      >      > transitively destroys the mutex and condition
>     variable -
>      >     unless
>      >      >     you can
>      >      >      > guarantee that the qemu_event_set() call on that event
>      >     object has
>      >      >     completed.
>      >      >      >
>      >      >      > In qemu_event_set, the event object's value is
>     atomically
>      >     set to
>      >      >     EV_SET.
>      >      >      > If the previous value was EV_BUSY, qemu_futex_wake() is
>      >     called.
>      >      >     All of
>      >      >      > this is outside any mutex, however, so apart from
>     memory
>      >     coherence
>      >      >      > (there are barriers) this can race with the waiting
>     thread.
>      >      >      > qemu_event_wait() reads the event's value. If
>     EV_FREE, it's
>      >      >     atomically
>      >      >      > set to EV_BUSY. Then the mutex is locked, the value
>     is checked
>      >      >     again,
>      >      >      > and if it's still EV_BUSY, it waits for the condition
>      >     variable,
>      >      >      > otherwise the mutex is immediately unlocked again.
>     If the
>      >     trigger
>      >      >      > thread's qemu_event_set() flip to EV_SET occurs between
>      >     the waiting
>      >      >      > thread's two atomic reads of the value, the waiting
>     thread
>      >     will
>      >      >     never
>      >      >      > wait for the condition variable, but the trigger thread
>      >     WILL try to
>      >      >      > acquire the mutex and signal the condition variable in
>      >      >      > qemu_futex_wake(), by which  time the waiting
>     thread may have
>      >      >     advanced
>      >      >      > outside of qemu_event_wait().
>      >      >
>      >      >     Sorry if I'm making a mistake again, but the waiting
>     thread won't
>      >      >     set to
>      >      >     EV_BUSY unless the value is EV_FREE on the second read
>     so the
>      >     trigger
>      >      >     thread will not call qemu_futex_wake() if it manages
>     to set
>      >     to EV_SET
>      >      >     before the second read, will it?
>      >      >
>      >      >
>      >      > This sequence of events will cause the problem:
>      >      >
>      >      > WAITER (in qemu_event_wait):
>      >      > value = qatomic_load_acquire(&ev->value);
>      >      > -> EV_FREE
>      >      >
>      >      > TRIGGER (in qemu_event_set):
>      >      > qatomic_read(&ev->value) != EV_SET
>      >      > -> EV_FREE (condition is false)
>      >      >
>      >      > WAITER:
>      >      > qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET
>      >      > -> cmpxchg returns EV_FREE, condition false.
>      >      > ev->value =  EV_BUSY.
>      >      > > TRIGGER:
>      >      >          int old = qatomic_xchg(&ev->value, EV_SET);
>      >      >          smp_mb__after_rmw();
>      >      >          if (old == EV_BUSY) {
>      >      > -> old = EV_BUSY, condition true.
>      >      > ev->value = EV_SET
>      >      >
>      >      > WAITER (in qemu_futex_wait(ev, EV_BUSY)):
>      >      >      pthread_mutex_lock(&ev->lock);
>      >      >      if (ev->value == val) {
>      >      > -> false, because value is EV_SET
>      >      >
>      >      > WAITER:
>      >      >      pthread_mutex_unlock(&ev->lock);
>      >      >      …
>      >      >      qemu_event_destroy(&job->done_event);
>      >      >
>      >      > TRIGGER (in qemu_futex_wake(ev, INT_MAX)):
>      >      >      pthread_mutex_lock(&ev->lock);
>      >      > -> hangs, because mutex has been destroyed
>      >
>      >     Thanks for clarification. This is very insightful.
>      >
>      >
>      >      >
>      >      >      >
>      >      >      > This is all fine usually, BUT if you destroy the
>     QemuEvent
>      >      >     immediately
>      >      >      > after the qemu_event_wait() call, qemu_futex_wake()
>     may try to
>      >      >     lock a
>      >      >      > mutex that has been destroyed, or signal a
>     condition variable
>      >      >     which has
>      >      >      > been destroyed. I don't see a reasonable way of making
>      >     this safe
>      >      >     other
>      >      >      > than using long-lived mutexes and condition
>     variables. And
>      >      >     anyway, we
>      >      >      > have much, MUCH bigger contention/performance issues
>      >     coming from
>      >      >     almost
>      >      >      > everything being covered by the BQL. (If waking these
>      >     callbacks
>      >      >     can even
>      >      >      > be considered an issue: I haven't seen it show up in
>      >     profiling,
>      >      >     whereas
>      >      >      > BQL contention very much does.)
>      >      >      >
>      >      >      > I'll submit v5 of this patch set with separate
>     condition
>      >      >     variables for
>      >      >      > each job type. This should make the occurrence of
>     waking
>      >     the wrong
>      >      >      > thread quite rare, while reasoning about
>     correctness is pretty
>      >      >      > straightforward. I think that's good enough.
>      >
>      >     What about using QemuSemaphore then? It does not seem to have the
>      >     problem same with QemuEvent.
>      >
>      >
>      > Nowhere else in the code base uses short-lived semaphores, and
>     while I
>      > can't immediately see a risk (the mutex is held during both post and
>      > wait) there might be some non-obvious problem with the approach.
>      > Internally, the semaphores use condition variables. The solution
>     using
>      > condition variables directly already works, is safe, relatively
>     easy to
>      > reason about, and does not cause any performance issues. There is
>     a tiny
>      > inefficiency about waking up a thread unnecessarily in the rare case
>      > when two callbacks of the same kind occur concurrently. In practice,
>      > it's irrelevant. Thanks to the awkward mismatch of the
>      > PVGraphics.framework's libdispatch based approach and Qemu's BQL/
>     AIO/BH
>      > approach, we are already sending messages to other threads very
>      > frequently. This isn't ideal, but not fixable without drastically
>      > reducing the need to acquire the BQL across Qemu.
> 
>     I found several usage of ephemeral semaphores:
>     h_random() in hw/ppc/spapr_rng.c
>     colo_process_checkpoint() in migration/colo.c
>     postcopy_thread_create() in migration/postcopy-ram.c
> 
>     I'm sure short-lived semaphores will keep working (or break
>     migration in
>     strange ways).
> 
>      >
>      > I do not think it is worth spending even more time trying to fix
>     this
>      > part of the code which isn't broken in the first place.
> 
>     I'm sorry to bring you to this mess, which I didn't really expect. I
>     thought combining a shared pair of conditional variable and mutex and
>     job-specific bools is unnecessarily complex, and having one
>     synchronization primitive for each job will be simpler and will just
>     work.
> 
> 
> With multithreading, the devil is always in the detail! 😅 I wouldn't 
> mind if we were seeing genuine issues with the Mutex/Cond code, but it's 
> fine as far as I can tell. The QemuEvent version wasn't even really any 
> simpler (replacing bool done; with QemuEvent done_event; and await -> 
> init/wait/destroy gets longer while lock/broadcast/unlock -> set gets 
> shorter), and I guess a QemuSemaphore version would be about the same. 
> Relying on the way an edge case is handled - destroying immediately 
> after waiting - in the long term potentially makes the code more fragile 
> too in case implementation details change. I think we've reached a 
> bikeshedding stage here, and I suggest any further improvements on this 
> part other than bug fixes should be deferred to future patches.

We still have more than bikeshedding. There are two design options 
discussed:

1) Whether synchronization primitives should be localized
2) Whether short-lived QemuSemaphore is appropriate if 1) is true

We discussed 2) in details, but haven't done much for 1) so there is 
some room for discussion.

Even for 2), let me emphasize that avoiding ephemeral QemuSemaphore in 
one device implementation is not appropriate as means to deal with its 
potential problem when there is similar existing usage. QEMU needs to be 
correct as whole, and having a workaround only part of its codebase is 
not OK. We need to either follow existing patterns or prepare for even 
more discussion (and I'm for the former).

Regarding 1), I think it's easier just to show code. Below is my idea of 
code change to localize synchronization primitives. This code is not 
tested or even compiled, but it should be sufficient to demonstrate the 
idea. There are a few notable observations:

a) apple_gfx_await_bh_job() can be extended to absorb all repetitive 
code of BH jobs. Such a change is probably beneficial even when 
synchronization primitives are shared, but it is more beneficial when 
synchronization primitives are localized since it allows to wrap init 
and destroy.

b) No need to declare multiple conditional variables and choosing one of 
them for each job type. Instead we can have one definition and let it 
instantiated whenever creating BH jobs.

c) Localized synchronization primitives make reasoning simpler and makes 
the comment in apple-gfx.h unnecessary. We still need the discussion of 
QemuEvent v.s. QemuSemaphore, but it will be dealt in the common code so 
apple-gfx does not need to have its own comment.

Regards,
Akihiko Odaki

---
  hw/display/apple-gfx.h      | 19 +++---------
  hw/display/apple-gfx-mmio.m | 23 ++------------
  hw/display/apple-gfx.m      | 60 +++++++++++++++++--------------------
  3 files changed, 34 insertions(+), 68 deletions(-)

diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
index e9fef09e37ea..b5aeed4f3dcf 100644
--- a/hw/display/apple-gfx.h
+++ b/hw/display/apple-gfx.h
@@ -40,19 +40,6 @@ typedef struct AppleGFXState {
      dispatch_queue_t render_queue;
      struct AppleGFXDisplayMode *display_modes;
      uint32_t num_display_modes;
-    /*
-     * QemuMutex & QemuConds for awaiting completion of PVG 
memory-mapping and
-     * reading requests after submitting them to run in the AIO context.
-     * QemuCond (rather than QemuEvent) are used so multiple concurrent 
jobs
-     * can be handled safely.
-     * The state associated with each job is tracked in a AppleGFX*Job 
struct
-     * for each kind of job; instances are allocated on the caller's stack.
-     * This struct also contains the completion flag which is used in
-     * conjunction with the condition variable.
-     */
-    QemuMutex job_mutex;
-    QemuCond task_map_job_cond;
-    QemuCond mem_read_job_cond;

      /* tasks is protected by task_mutex */
      QemuMutex task_mutex;
@@ -82,8 +69,10 @@ void apple_gfx_common_realize(AppleGFXState *s, 
PGDeviceDescriptor *desc,
  uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
                                                 uint64_t length, bool 
read_only,
                                                 MemoryRegion 
**mapping_in_region);
-void apple_gfx_await_bh_job(AppleGFXState *s, QemuCond *job_cond,
-                            bool *job_done_flag);
+void apple_gfx_bh_job_run_full(QEMUBHFunc *cb, void *opaque, const char 
*name);
+
+#define apple_gfx_bh_job_run(cb, opaque) \
+    apple_gfx_bh_job_run_full((cb), (opaque), (stringify(cb)))

  extern const PropertyInfo qdev_prop_display_mode;

diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
index a801c5fa722e..889a23df89e9 100644
--- a/hw/display/apple-gfx-mmio.m
+++ b/hw/display/apple-gfx-mmio.m
@@ -60,8 +60,6 @@ -(void)mmioWriteAtOffset:(size_t)offset 
value:(uint32_t)value;

      AppleGFXState common;

-    QemuCond iosfc_map_job_cond;
-    QemuCond iosfc_unmap_job_cond;
      qemu_irq irq_gfx;
      qemu_irq irq_iosfc;
      MemoryRegion iomem_iosfc;
@@ -154,7 +152,6 @@ static void raise_irq(void *opaque)
      AppleGFXMMIOState *state;
      bool read_only;
      bool success;
-    bool done;
  } AppleGFXMapSurfaceMemoryJob;

  typedef struct AppleGFXMMIOMappedRegion {
@@ -203,18 +200,13 @@ static void apple_gfx_mmio_map_surface_memory(void 
*opaque)
          }
      }

-    qemu_mutex_lock(&s->common.job_mutex);
      job->result_mem = (void *)mem;
      job->success = mem != 0;
-    job->done = true;
-    qemu_cond_broadcast(&s->iosfc_map_job_cond);
-    qemu_mutex_unlock(&s->common.job_mutex);
  }

  typedef struct AppleGFXUnmapSurfaceMemoryJob {
      void *virtual_address;
      AppleGFXMMIOState *state;
-    bool done;
  } AppleGFXUnmapSurfaceMemoryJob;

  static AppleGFXMMIOMappedRegion *find_mapped_region_containing(GArray 
*regions,
@@ -257,11 +249,6 @@ static void 
apple_gfx_mmio_unmap_surface_memory(void *opaque)
                        __func__,
                        job->virtual_address, regions->len);
      }
-
-    qemu_mutex_lock(&s->common.job_mutex);
-    job->done = true;
-    qemu_cond_broadcast(&s->iosfc_unmap_job_cond);
-    qemu_mutex_unlock(&s->common.job_mutex);
  }

  static PGIOSurfaceHostDevice *apple_gfx_prepare_iosurface_host_device(
@@ -278,9 +265,7 @@ static void apple_gfx_mmio_unmap_surface_memory(void 
*opaque)
                  .read_only = ro, .state = s,
              };

-            aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                                    apple_gfx_mmio_map_surface_memory, 
&job);
-            apple_gfx_await_bh_job(&s->common, &s->iosfc_map_job_cond, 
&job.done);
+            apple_gfx_bh_job_run(apple_gfx_mmio_map_surface_memory, &job);

              *va = job.result_mem;

@@ -295,9 +280,7 @@ static void apple_gfx_mmio_unmap_surface_memory(void 
*opaque)
              AppleGFXUnmapSurfaceMemoryJob job = { va, s };
              trace_apple_gfx_iosfc_unmap_memory(va, b, c, d, e, f);

-            aio_bh_schedule_oneshot(qemu_get_aio_context(),
- 
apple_gfx_mmio_unmap_surface_memory, &job);
-            apple_gfx_await_bh_job(&s->common, 
&s->iosfc_unmap_job_cond, &job.done);
+            apple_gfx_bh_job_run(apple_gfx_mmio_unmap_surface_memory, 
&job);

              return true;
          };
@@ -336,8 +319,6 @@ static void apple_gfx_mmio_realize(DeviceState *dev, 
Error **errp)
                                2 /* Usually no more RAM regions*/);

          apple_gfx_common_realize(&s->common, desc, errp);
-        qemu_cond_init(&s->iosfc_map_job_cond);
-        qemu_cond_init(&s->iosfc_unmap_job_cond);

          [desc release];
          desc = nil;
diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
index 2e264e5561fc..4d174e766310 100644
--- a/hw/display/apple-gfx.m
+++ b/hw/display/apple-gfx.m
@@ -90,6 +90,31 @@ static dispatch_queue_t get_background_queue(void)
      return task;
  }

+typedef struct AppleGFXJob {
+  QEMUBHFunc cb;
+  void *opaque;
+  QemuSemaphore sem;
+} AppleGFXJob;
+
+void apple_gfx_bh_job_cb(void *opaque)
+{
+  AppleGFXJob *job = opaque;
+  job->cb(job->opaque);
+  qemu_sem_post(&job->sem);
+}
+
+void apple_gfx_bh_job_run_full(QEMUBHFunc *cb, void *opaque, const char 
*name)
+{
+    AppleGFXJob job;
+    job->cb = cb;
+    job->opaque = opaque;
+    qemu_sem_init(&job->sem, 0);
+    aio_bh_schedule_oneshot_full(qemu_get_aio_context(), 
apple_gfx_bh_job_cb,
+                                 &job, name);
+    qemu_sem_wait(&job->sem);
+    qemu_sem_destroy(&job->sem);
+}
+
  typedef struct AppleGFXIOJob {
      AppleGFXState *state;
      uint64_t offset;
@@ -355,7 +380,6 @@ void apple_gfx_common_init(Object *obj, 
AppleGFXState *s, const char* obj_name)
      uint32_t range_count;
      bool read_only;
      bool success;
-    bool done;
  } AppleGFXMapMemoryJob;

  uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
@@ -457,20 +481,7 @@ static void apple_gfx_map_memory(void *opaque)
          g_assert(r == KERN_SUCCESS);
      }

-    qemu_mutex_lock(&s->job_mutex);
      job->success = success;
-    job->done = true;
-    qemu_cond_broadcast(&s->task_map_job_cond);
-    qemu_mutex_unlock(&s->job_mutex);
-}
-
-void apple_gfx_await_bh_job(AppleGFXState *s, QemuCond *job_cond, bool 
*job_done_flag)
-{
-    qemu_mutex_lock(&s->job_mutex);
-    while (!*job_done_flag) {
-        qemu_cond_wait(job_cond, &s->job_mutex);
-    }
-    qemu_mutex_unlock(&s->job_mutex);
  }

  typedef struct AppleGFXReadMemoryJob {
@@ -478,8 +489,6 @@ void apple_gfx_await_bh_job(AppleGFXState *s, 
QemuCond *job_cond, bool *job_done
      hwaddr physical_address;
      uint64_t length;
      void *dst;
-    bool done;
-    bool success;
  } AppleGFXReadMemoryJob;

  static void apple_gfx_do_read_memory(void *opaque)
@@ -491,11 +500,6 @@ static void apple_gfx_do_read_memory(void *opaque)
      r = dma_memory_read(&address_space_memory, job->physical_address,
                          job->dst, job->length, MEMTXATTRS_UNSPECIFIED);
      job->success = r == MEMTX_OK;
-
-    qemu_mutex_lock(&s->job_mutex);
-    job->done = true;
-    qemu_cond_broadcast(&s->mem_read_job_cond);
-    qemu_mutex_unlock(&s->job_mutex);
  }

  static bool apple_gfx_read_memory(AppleGFXState *s, hwaddr 
physical_address,
@@ -508,9 +512,7 @@ static bool apple_gfx_read_memory(AppleGFXState *s, 
hwaddr physical_address,
      trace_apple_gfx_read_memory(physical_address, length, dst);

      /* Traversing the memory map requires RCU/BQL, so do it in a BH. */
-    aio_bh_schedule_oneshot(qemu_get_aio_context(), 
apple_gfx_do_read_memory,
-                            &job);
-    apple_gfx_await_bh_job(s, &s->mem_read_job_cond, &job.done);
+    apple_gfx_bh_job_run(s, apple_gfx_do_read_memory, &job);
      return job.success;
  }

@@ -556,12 +558,10 @@ static void 
apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
              .state = s,
              .task = task, .ranges = ranges, .range_count = range_count,
              .read_only = read_only, .virtual_offset = virtual_offset,
-            .done = false, .success = true,
+            .success = true,
          };
          if (range_count > 0) {
-            aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                                    apple_gfx_map_memory, &job);
-            apple_gfx_await_bh_job(s, &s->task_map_job_cond, &job.done);
+            apple_gfx_bh_job_run(s, apple_gfx_map_memory, &job);
          }
          return job.success;
      };
@@ -780,10 +780,6 @@ void apple_gfx_common_realize(AppleGFXState *s, 
PGDeviceDescriptor *desc,
          apple_gfx_create_display_mode_array(display_modes, 
num_display_modes);

      create_fb(s);
-
-    qemu_mutex_init(&s->job_mutex);
-    qemu_cond_init(&s->task_map_job_cond);
-    qemu_cond_init(&s->mem_read_job_cond);
  }

  static void apple_gfx_get_display_mode(Object *obj, Visitor *v,
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  2024-10-31  6:52                             ` Akihiko Odaki
@ 2024-11-03 15:08                               ` Phil Dennis-Jordan
  0 siblings, 0 replies; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-11-03 15:08 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 29556 bytes --]

On Thu, 31 Oct 2024 at 07:52, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/30 6:16, Phil Dennis-Jordan wrote:
> >
> >
> > On Tue, 29 Oct 2024 at 08:42, Akihiko Odaki <akihiko.odaki@daynix.com
> > <mailto:akihiko.odaki@daynix.com>> wrote:
> >
> >     On 2024/10/29 6:06, Phil Dennis-Jordan wrote:
> >      >
> >      >
> >      > On Mon, 28 Oct 2024 at 17:06, Akihiko Odaki
> >     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
> >      > <mailto:akihiko.odaki@daynix.com
> >     <mailto:akihiko.odaki@daynix.com>>> wrote:
> >      >
> >      >     On 2024/10/28 23:13, Phil Dennis-Jordan wrote:
> >      >      >
> >      >      >
> >      >      > On Mon, 28 Oct 2024 at 15:02, Akihiko Odaki
> >      >     <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
> >     <mailto:akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>>
> >      >      > <mailto:akihiko.odaki@daynix.com
> >     <mailto:akihiko.odaki@daynix.com>
> >      >     <mailto:akihiko.odaki@daynix.com
> >     <mailto:akihiko.odaki@daynix.com>>>> wrote:
> >      >      >
> >      >      >     On 2024/10/28 22:31, Phil Dennis-Jordan wrote:
> >      >      >      >
> >      >      >      >
> >      >      >      > On Mon, 28 Oct 2024 at 10:00, Phil Dennis-Jordan
> >      >      >     <phil@philjordan.eu <mailto:phil@philjordan.eu>
> >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>
> >      >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>
> >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>>
> >      >      >      > <mailto:phil@philjordan.eu
> >     <mailto:phil@philjordan.eu> <mailto:phil@philjordan.eu
> >     <mailto:phil@philjordan.eu>>
> >      >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>
> >     <mailto:phil@philjordan.eu <mailto:phil@philjordan.eu>>>>> wrote:
> >      >      >      >
> >      >      >      >
> >      >      >      >          >      >
> >      >      >      >          >      > Hmm. I think if we were to use
> >     that, we
> >      >     would
> >      >      >     need to
> >      >      >      >         create a new
> >      >      >      >          >      > QemuEvent for every job and
> >     destroy it
> >      >     afterward,
> >      >      >      >         which seems
> >      >      >      >          >     expensive.
> >      >      >      >          >      > We can't rule out multiple
> concurrent
> >      >     jobs being
> >      >      >      >         submitted, and the
> >      >      >      >          >      > QemuEvent system only supports a
> >     single
> >      >     producer as
> >      >      >      >         far as I can
> >      >      >      >          >     tell.
> >      >      >      >          >      >
> >      >      >      >          >      > You can probably sort of hack
> >     around it with
> >      >      >     just one
> >      >      >      >         QemuEvent by
> >      >      >      >          >      > putting the qemu_event_wait into
> >     a loop
> >      >     and turning
> >      >      >      >         the job.done
> >      >      >      >          >     flag
> >      >      >      >          >      > into an atomic (because it would
> now
> >      >     need to be
> >      >      >      >         checked outside the
> >      >      >      >          >      > lock) but this all seems
> >     unnecessarily
> >      >     complicated
> >      >      >      >         considering the
> >      >      >      >          >      > QemuEvent uses the same mechanism
> >     QemuCond/
> >      >      >     QemuMutex
> >      >      >      >         internally
> >      >      >      >          >     on macOS
> >      >      >      >          >      > (the only platform relevant
> >     here), except we
> >      >      >     can use it as
> >      >      >      >          >     intended with
> >      >      >      >          >      > QemuCond/QemuMutex rather than
> >     having to
> >      >     work
> >      >      >     against the
> >      >      >      >          >     abstraction.
> >      >      >      >          >
> >      >      >      >          >     I don't think it's going to be used
> >      >     concurrently. It
> >      >      >      >         would be difficult
> >      >      >      >          >     to reason even for the framework if
> it
> >      >     performs memory
> >      >      >      >          >     unmapping/mapping/reading operations
> >      >     concurrently.
> >      >      >      >          >
> >      >      >      >          >
> >      >      >      >          > I've just performed a very quick test by
> >      >     wrapping the job
> >      >      >      >         submission/
> >      >      >      >          > wait in the 2 mapMemory callbacks and
> the 1
> >      >     readMemory
> >      >      >      >         callback with
> >      >      >      >          > atomic counters and logging whenever a
> >     counter went
> >      >      >     above 1.
> >      >      >      >          >
> >      >      >      >          >   * Overall, concurrent callbacks across
> all
> >      >     types were
> >      >      >      >         common (many per
> >      >      >      >          > second when the VM is busy). It's not
> >     exactly a
> >      >      >     "thundering
> >      >      >      >         herd" (I
> >      >      >      >          > never saw >2) but it's probably not a
> >     bad idea
> >      >     to use
> >      >      >     a separate
> >      >      >      >          > condition variable for each job type.
> >     (task map,
> >      >      >     surface map,
> >      >      >      >         memory read)
> >      >      >      >          >   * While I did not observe any
> concurrent
> >      >     memory mapping
> >      >      >      >         operations
> >      >      >      >          > *within* a type of memory map (2 task
> >     mappings or 2
> >      >      >     surface
> >      >      >      >         mappings) I
> >      >      >      >          > did see very occasional concurrent
> >     memory *read*
> >      >      >     callbacks.
> >      >      >      >         These would,
> >      >      >      >          > as far as I can tell, not be safe with
> >     QemuEvents,
> >      >      >     unless we
> >      >      >      >         placed the
> >      >      >      >          > event inside the job struct and init/
> >     destroyed
> >      >     it on every
> >      >      >      >         callback
> >      >      >      >          > (which seems like excessive overhead).
> >      >      >      >
> >      >      >      >         I think we can tolerate that overhead.
> >     init/destroy
> >      >      >     essentially
> >      >      >      >         sets the
> >      >      >      >         fields in the data structure and I estimate
> its
> >      >     total size is
> >      >      >      >         about 100
> >      >      >      >         bytes. It is probably better than waking an
> >      >     irrelevant thread
> >      >      >      >         up. I also
> >      >      >      >         hope that keeps the code simple; it's not
> >     worthwhile
> >      >      >     adding code to
> >      >      >      >         optimize this.
> >      >      >      >
> >      >      >      >
> >      >      >      >     At least pthread_cond_{init,destroy} and
> >      >      >      >     pthread_mutex_{init,destroy} don't make any
> >     syscalls,
> >      >     so yeah
> >      >      >     it's
> >      >      >      >     probably an acceptable overhead.
> >      >      >      >
> >      >      >      >
> >      >      >      > I've just experimented with QemuEvents created on-
> >     demand
> >      >     and ran
> >      >      >     into
> >      >      >      > some weird deadlocks, which then made me sit down
> and
> >      >     think about it
> >      >      >      > some more. I've come to the conclusion that
> >     creating (and
> >      >     crucially,
> >      >      >      > destroying) QemuEvents on demand in this way is not
> >     safe.
> >      >      >      >
> >      >      >      > Specifically, you must not call
> >     qemu_event_destroy() - which
> >      >      >      > transitively destroys the mutex and condition
> >     variable -
> >      >     unless
> >      >      >     you can
> >      >      >      > guarantee that the qemu_event_set() call on that
> event
> >      >     object has
> >      >      >     completed.
> >      >      >      >
> >      >      >      > In qemu_event_set, the event object's value is
> >     atomically
> >      >     set to
> >      >      >     EV_SET.
> >      >      >      > If the previous value was EV_BUSY,
> qemu_futex_wake() is
> >      >     called.
> >      >      >     All of
> >      >      >      > this is outside any mutex, however, so apart from
> >     memory
> >      >     coherence
> >      >      >      > (there are barriers) this can race with the waiting
> >     thread.
> >      >      >      > qemu_event_wait() reads the event's value. If
> >     EV_FREE, it's
> >      >      >     atomically
> >      >      >      > set to EV_BUSY. Then the mutex is locked, the value
> >     is checked
> >      >      >     again,
> >      >      >      > and if it's still EV_BUSY, it waits for the
> condition
> >      >     variable,
> >      >      >      > otherwise the mutex is immediately unlocked again.
> >     If the
> >      >     trigger
> >      >      >      > thread's qemu_event_set() flip to EV_SET occurs
> between
> >      >     the waiting
> >      >      >      > thread's two atomic reads of the value, the waiting
> >     thread
> >      >     will
> >      >      >     never
> >      >      >      > wait for the condition variable, but the trigger
> thread
> >      >     WILL try to
> >      >      >      > acquire the mutex and signal the condition variable
> in
> >      >      >      > qemu_futex_wake(), by which  time the waiting
> >     thread may have
> >      >      >     advanced
> >      >      >      > outside of qemu_event_wait().
> >      >      >
> >      >      >     Sorry if I'm making a mistake again, but the waiting
> >     thread won't
> >      >      >     set to
> >      >      >     EV_BUSY unless the value is EV_FREE on the second read
> >     so the
> >      >     trigger
> >      >      >     thread will not call qemu_futex_wake() if it manages
> >     to set
> >      >     to EV_SET
> >      >      >     before the second read, will it?
> >      >      >
> >      >      >
> >      >      > This sequence of events will cause the problem:
> >      >      >
> >      >      > WAITER (in qemu_event_wait):
> >      >      > value = qatomic_load_acquire(&ev->value);
> >      >      > -> EV_FREE
> >      >      >
> >      >      > TRIGGER (in qemu_event_set):
> >      >      > qatomic_read(&ev->value) != EV_SET
> >      >      > -> EV_FREE (condition is false)
> >      >      >
> >      >      > WAITER:
> >      >      > qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET
> >      >      > -> cmpxchg returns EV_FREE, condition false.
> >      >      > ev->value =  EV_BUSY.
> >      >      > > TRIGGER:
> >      >      >          int old = qatomic_xchg(&ev->value, EV_SET);
> >      >      >          smp_mb__after_rmw();
> >      >      >          if (old == EV_BUSY) {
> >      >      > -> old = EV_BUSY, condition true.
> >      >      > ev->value = EV_SET
> >      >      >
> >      >      > WAITER (in qemu_futex_wait(ev, EV_BUSY)):
> >      >      >      pthread_mutex_lock(&ev->lock);
> >      >      >      if (ev->value == val) {
> >      >      > -> false, because value is EV_SET
> >      >      >
> >      >      > WAITER:
> >      >      >      pthread_mutex_unlock(&ev->lock);
> >      >      >      …
> >      >      >      qemu_event_destroy(&job->done_event);
> >      >      >
> >      >      > TRIGGER (in qemu_futex_wake(ev, INT_MAX)):
> >      >      >      pthread_mutex_lock(&ev->lock);
> >      >      > -> hangs, because mutex has been destroyed
> >      >
> >      >     Thanks for clarification. This is very insightful.
> >      >
> >      >
> >      >      >
> >      >      >      >
> >      >      >      > This is all fine usually, BUT if you destroy the
> >     QemuEvent
> >      >      >     immediately
> >      >      >      > after the qemu_event_wait() call, qemu_futex_wake()
> >     may try to
> >      >      >     lock a
> >      >      >      > mutex that has been destroyed, or signal a
> >     condition variable
> >      >      >     which has
> >      >      >      > been destroyed. I don't see a reasonable way of
> making
> >      >     this safe
> >      >      >     other
> >      >      >      > than using long-lived mutexes and condition
> >     variables. And
> >      >      >     anyway, we
> >      >      >      > have much, MUCH bigger contention/performance issues
> >      >     coming from
> >      >      >     almost
> >      >      >      > everything being covered by the BQL. (If waking
> these
> >      >     callbacks
> >      >      >     can even
> >      >      >      > be considered an issue: I haven't seen it show up in
> >      >     profiling,
> >      >      >     whereas
> >      >      >      > BQL contention very much does.)
> >      >      >      >
> >      >      >      > I'll submit v5 of this patch set with separate
> >     condition
> >      >      >     variables for
> >      >      >      > each job type. This should make the occurrence of
> >     waking
> >      >     the wrong
> >      >      >      > thread quite rare, while reasoning about
> >     correctness is pretty
> >      >      >      > straightforward. I think that's good enough.
> >      >
> >      >     What about using QemuSemaphore then? It does not seem to have
> the
> >      >     problem same with QemuEvent.
> >      >
> >      >
> >      > Nowhere else in the code base uses short-lived semaphores, and
> >     while I
> >      > can't immediately see a risk (the mutex is held during both post
> and
> >      > wait) there might be some non-obvious problem with the approach.
> >      > Internally, the semaphores use condition variables. The solution
> >     using
> >      > condition variables directly already works, is safe, relatively
> >     easy to
> >      > reason about, and does not cause any performance issues. There is
> >     a tiny
> >      > inefficiency about waking up a thread unnecessarily in the rare
> case
> >      > when two callbacks of the same kind occur concurrently. In
> practice,
> >      > it's irrelevant. Thanks to the awkward mismatch of the
> >      > PVGraphics.framework's libdispatch based approach and Qemu's BQL/
> >     AIO/BH
> >      > approach, we are already sending messages to other threads very
> >      > frequently. This isn't ideal, but not fixable without drastically
> >      > reducing the need to acquire the BQL across Qemu.
> >
> >     I found several usage of ephemeral semaphores:
> >     h_random() in hw/ppc/spapr_rng.c
> >     colo_process_checkpoint() in migration/colo.c
> >     postcopy_thread_create() in migration/postcopy-ram.c
> >
> >     I'm sure short-lived semaphores will keep working (or break
> >     migration in
> >     strange ways).
> >
> >      >
> >      > I do not think it is worth spending even more time trying to fix
> >     this
> >      > part of the code which isn't broken in the first place.
> >
> >     I'm sorry to bring you to this mess, which I didn't really expect. I
> >     thought combining a shared pair of conditional variable and mutex and
> >     job-specific bools is unnecessarily complex, and having one
> >     synchronization primitive for each job will be simpler and will just
> >     work.
> >
> >
> > With multithreading, the devil is always in the detail! 😅 I wouldn't
> > mind if we were seeing genuine issues with the Mutex/Cond code, but it's
> > fine as far as I can tell. The QemuEvent version wasn't even really any
> > simpler (replacing bool done; with QemuEvent done_event; and await ->
> > init/wait/destroy gets longer while lock/broadcast/unlock -> set gets
> > shorter), and I guess a QemuSemaphore version would be about the same.
> > Relying on the way an edge case is handled - destroying immediately
> > after waiting - in the long term potentially makes the code more fragile
> > too in case implementation details change. I think we've reached a
> > bikeshedding stage here, and I suggest any further improvements on this
> > part other than bug fixes should be deferred to future patches.
>
> We still have more than bikeshedding. There are two design options
> discussed:
>
> 1) Whether synchronization primitives should be localized
> 2) Whether short-lived QemuSemaphore is appropriate if 1) is true
>
> We discussed 2) in details, but haven't done much for 1) so there is
> some room for discussion.
>
> Even for 2), let me emphasize that avoiding ephemeral QemuSemaphore in
> one device implementation is not appropriate as means to deal with its
> potential problem when there is similar existing usage. QEMU needs to be
> correct as whole, and having a workaround only part of its codebase is
> not OK. We need to either follow existing patterns or prepare for even
> more discussion (and I'm for the former).
>
> Regarding 1), I think it's easier just to show code. Below is my idea of
> code change to localize synchronization primitives. This code is not
> tested or even compiled, but it should be sufficient to demonstrate the
> idea. There are a few notable observations:
>
> a) apple_gfx_await_bh_job() can be extended to absorb all repetitive
> code of BH jobs. Such a change is probably beneficial even when
> synchronization primitives are shared, but it is more beneficial when
> synchronization primitives are localized since it allows to wrap init
> and destroy.
>
> b) No need to declare multiple conditional variables and choosing one of
> them for each job type. Instead we can have one definition and let it
> instantiated whenever creating BH jobs.
>
> c) Localized synchronization primitives make reasoning simpler and makes
> the comment in apple-gfx.h unnecessary. We still need the discussion of
> QemuEvent v.s. QemuSemaphore, but it will be dealt in the common code so
> apple-gfx does not need to have its own comment.
>

After moving the memory mapping/unmapping code into RCU critical sections,
there's now actually only a single BH job on which we need to block in the
callback. This is the readMemory/DMA job. I've implemented that using an
ephemeral QemuSemaphore now, although I've skipped the extra level of
indirection, helper function and macro, as they'd only be used once.

I've done some testing on both x86-64 and arm64 with the QemuSemaphore and
unlike the QemuEvent, I haven't run into any deadlocks/hangs with it.

I've just posted v6 of the patch set with this change (and some other
changes vs v5).

Thanks,
Phil




> Regards,
> Akihiko Odaki
>
> ---
>   hw/display/apple-gfx.h      | 19 +++---------
>   hw/display/apple-gfx-mmio.m | 23 ++------------
>   hw/display/apple-gfx.m      | 60 +++++++++++++++++--------------------
>   3 files changed, 34 insertions(+), 68 deletions(-)
>
> diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
> index e9fef09e37ea..b5aeed4f3dcf 100644
> --- a/hw/display/apple-gfx.h
> +++ b/hw/display/apple-gfx.h
> @@ -40,19 +40,6 @@ typedef struct AppleGFXState {
>       dispatch_queue_t render_queue;
>       struct AppleGFXDisplayMode *display_modes;
>       uint32_t num_display_modes;
> -    /*
> -     * QemuMutex & QemuConds for awaiting completion of PVG
> memory-mapping and
> -     * reading requests after submitting them to run in the AIO context.
> -     * QemuCond (rather than QemuEvent) are used so multiple concurrent
> jobs
> -     * can be handled safely.
> -     * The state associated with each job is tracked in a AppleGFX*Job
> struct
> -     * for each kind of job; instances are allocated on the caller's
> stack.
> -     * This struct also contains the completion flag which is used in
> -     * conjunction with the condition variable.
> -     */
> -    QemuMutex job_mutex;
> -    QemuCond task_map_job_cond;
> -    QemuCond mem_read_job_cond;
>
>       /* tasks is protected by task_mutex */
>       QemuMutex task_mutex;
> @@ -82,8 +69,10 @@ void apple_gfx_common_realize(AppleGFXState *s,
> PGDeviceDescriptor *desc,
>   uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
>                                                  uint64_t length, bool
> read_only,
>                                                  MemoryRegion
> **mapping_in_region);
> -void apple_gfx_await_bh_job(AppleGFXState *s, QemuCond *job_cond,
> -                            bool *job_done_flag);
> +void apple_gfx_bh_job_run_full(QEMUBHFunc *cb, void *opaque, const char
> *name);
> +
> +#define apple_gfx_bh_job_run(cb, opaque) \
> +    apple_gfx_bh_job_run_full((cb), (opaque), (stringify(cb)))
>
>   extern const PropertyInfo qdev_prop_display_mode;
>
> diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
> index a801c5fa722e..889a23df89e9 100644
> --- a/hw/display/apple-gfx-mmio.m
> +++ b/hw/display/apple-gfx-mmio.m
> @@ -60,8 +60,6 @@ -(void)mmioWriteAtOffset:(size_t)offset
> value:(uint32_t)value;
>
>       AppleGFXState common;
>
> -    QemuCond iosfc_map_job_cond;
> -    QemuCond iosfc_unmap_job_cond;
>       qemu_irq irq_gfx;
>       qemu_irq irq_iosfc;
>       MemoryRegion iomem_iosfc;
> @@ -154,7 +152,6 @@ static void raise_irq(void *opaque)
>       AppleGFXMMIOState *state;
>       bool read_only;
>       bool success;
> -    bool done;
>   } AppleGFXMapSurfaceMemoryJob;
>
>   typedef struct AppleGFXMMIOMappedRegion {
> @@ -203,18 +200,13 @@ static void apple_gfx_mmio_map_surface_memory(void
> *opaque)
>           }
>       }
>
> -    qemu_mutex_lock(&s->common.job_mutex);
>       job->result_mem = (void *)mem;
>       job->success = mem != 0;
> -    job->done = true;
> -    qemu_cond_broadcast(&s->iosfc_map_job_cond);
> -    qemu_mutex_unlock(&s->common.job_mutex);
>   }
>
>   typedef struct AppleGFXUnmapSurfaceMemoryJob {
>       void *virtual_address;
>       AppleGFXMMIOState *state;
> -    bool done;
>   } AppleGFXUnmapSurfaceMemoryJob;
>
>   static AppleGFXMMIOMappedRegion *find_mapped_region_containing(GArray
> *regions,
> @@ -257,11 +249,6 @@ static void
> apple_gfx_mmio_unmap_surface_memory(void *opaque)
>                         __func__,
>                         job->virtual_address, regions->len);
>       }
> -
> -    qemu_mutex_lock(&s->common.job_mutex);
> -    job->done = true;
> -    qemu_cond_broadcast(&s->iosfc_unmap_job_cond);
> -    qemu_mutex_unlock(&s->common.job_mutex);
>   }
>
>   static PGIOSurfaceHostDevice *apple_gfx_prepare_iosurface_host_device(
> @@ -278,9 +265,7 @@ static void apple_gfx_mmio_unmap_surface_memory(void
> *opaque)
>                   .read_only = ro, .state = s,
>               };
>
> -            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> -                                    apple_gfx_mmio_map_surface_memory,
> &job);
> -            apple_gfx_await_bh_job(&s->common, &s->iosfc_map_job_cond,
> &job.done);
> +            apple_gfx_bh_job_run(apple_gfx_mmio_map_surface_memory, &job);
>
>               *va = job.result_mem;
>
> @@ -295,9 +280,7 @@ static void apple_gfx_mmio_unmap_surface_memory(void
> *opaque)
>               AppleGFXUnmapSurfaceMemoryJob job = { va, s };
>               trace_apple_gfx_iosfc_unmap_memory(va, b, c, d, e, f);
>
> -            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> -
> apple_gfx_mmio_unmap_surface_memory, &job);
> -            apple_gfx_await_bh_job(&s->common,
> &s->iosfc_unmap_job_cond, &job.done);
> +            apple_gfx_bh_job_run(apple_gfx_mmio_unmap_surface_memory,
> &job);
>
>               return true;
>           };
> @@ -336,8 +319,6 @@ static void apple_gfx_mmio_realize(DeviceState *dev,
> Error **errp)
>                                 2 /* Usually no more RAM regions*/);
>
>           apple_gfx_common_realize(&s->common, desc, errp);
> -        qemu_cond_init(&s->iosfc_map_job_cond);
> -        qemu_cond_init(&s->iosfc_unmap_job_cond);
>
>           [desc release];
>           desc = nil;
> diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
> index 2e264e5561fc..4d174e766310 100644
> --- a/hw/display/apple-gfx.m
> +++ b/hw/display/apple-gfx.m
> @@ -90,6 +90,31 @@ static dispatch_queue_t get_background_queue(void)
>       return task;
>   }
>
> +typedef struct AppleGFXJob {
> +  QEMUBHFunc cb;
> +  void *opaque;
> +  QemuSemaphore sem;
> +} AppleGFXJob;
> +
> +void apple_gfx_bh_job_cb(void *opaque)
> +{
> +  AppleGFXJob *job = opaque;
> +  job->cb(job->opaque);
> +  qemu_sem_post(&job->sem);
> +}
> +
> +void apple_gfx_bh_job_run_full(QEMUBHFunc *cb, void *opaque, const char
> *name)
> +{
> +    AppleGFXJob job;
> +    job->cb = cb;
> +    job->opaque = opaque;
> +    qemu_sem_init(&job->sem, 0);
> +    aio_bh_schedule_oneshot_full(qemu_get_aio_context(),
> apple_gfx_bh_job_cb,
> +                                 &job, name);
> +    qemu_sem_wait(&job->sem);
> +    qemu_sem_destroy(&job->sem);
> +}
> +
>   typedef struct AppleGFXIOJob {
>       AppleGFXState *state;
>       uint64_t offset;
> @@ -355,7 +380,6 @@ void apple_gfx_common_init(Object *obj,
> AppleGFXState *s, const char* obj_name)
>       uint32_t range_count;
>       bool read_only;
>       bool success;
> -    bool done;
>   } AppleGFXMapMemoryJob;
>
>   uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
> @@ -457,20 +481,7 @@ static void apple_gfx_map_memory(void *opaque)
>           g_assert(r == KERN_SUCCESS);
>       }
>
> -    qemu_mutex_lock(&s->job_mutex);
>       job->success = success;
> -    job->done = true;
> -    qemu_cond_broadcast(&s->task_map_job_cond);
> -    qemu_mutex_unlock(&s->job_mutex);
> -}
> -
> -void apple_gfx_await_bh_job(AppleGFXState *s, QemuCond *job_cond, bool
> *job_done_flag)
> -{
> -    qemu_mutex_lock(&s->job_mutex);
> -    while (!*job_done_flag) {
> -        qemu_cond_wait(job_cond, &s->job_mutex);
> -    }
> -    qemu_mutex_unlock(&s->job_mutex);
>   }
>
>   typedef struct AppleGFXReadMemoryJob {
> @@ -478,8 +489,6 @@ void apple_gfx_await_bh_job(AppleGFXState *s,
> QemuCond *job_cond, bool *job_done
>       hwaddr physical_address;
>       uint64_t length;
>       void *dst;
> -    bool done;
> -    bool success;
>   } AppleGFXReadMemoryJob;
>
>   static void apple_gfx_do_read_memory(void *opaque)
> @@ -491,11 +500,6 @@ static void apple_gfx_do_read_memory(void *opaque)
>       r = dma_memory_read(&address_space_memory, job->physical_address,
>                           job->dst, job->length, MEMTXATTRS_UNSPECIFIED);
>       job->success = r == MEMTX_OK;
> -
> -    qemu_mutex_lock(&s->job_mutex);
> -    job->done = true;
> -    qemu_cond_broadcast(&s->mem_read_job_cond);
> -    qemu_mutex_unlock(&s->job_mutex);
>   }
>
>   static bool apple_gfx_read_memory(AppleGFXState *s, hwaddr
> physical_address,
> @@ -508,9 +512,7 @@ static bool apple_gfx_read_memory(AppleGFXState *s,
> hwaddr physical_address,
>       trace_apple_gfx_read_memory(physical_address, length, dst);
>
>       /* Traversing the memory map requires RCU/BQL, so do it in a BH. */
> -    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> apple_gfx_do_read_memory,
> -                            &job);
> -    apple_gfx_await_bh_job(s, &s->mem_read_job_cond, &job.done);
> +    apple_gfx_bh_job_run(s, apple_gfx_do_read_memory, &job);
>       return job.success;
>   }
>
> @@ -556,12 +558,10 @@ static void
> apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
>               .state = s,
>               .task = task, .ranges = ranges, .range_count = range_count,
>               .read_only = read_only, .virtual_offset = virtual_offset,
> -            .done = false, .success = true,
> +            .success = true,
>           };
>           if (range_count > 0) {
> -            aio_bh_schedule_oneshot(qemu_get_aio_context(),
> -                                    apple_gfx_map_memory, &job);
> -            apple_gfx_await_bh_job(s, &s->task_map_job_cond, &job.done);
> +            apple_gfx_bh_job_run(s, apple_gfx_map_memory, &job);
>           }
>           return job.success;
>       };
> @@ -780,10 +780,6 @@ void apple_gfx_common_realize(AppleGFXState *s,
> PGDeviceDescriptor *desc,
>           apple_gfx_create_display_mode_array(display_modes,
> num_display_modes);
>
>       create_fb(s);
> -
> -    qemu_mutex_init(&s->job_mutex);
> -    qemu_cond_init(&s->task_map_job_cond);
> -    qemu_cond_init(&s->mem_read_job_cond);
>   }
>
>   static void apple_gfx_get_display_mode(Object *obj, Visitor *v,
> --
> 2.47.0
>

[-- Attachment #2: Type: text/html, Size: 40559 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 03/15] hw/display/apple-gfx: Adds PCI implementation
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
  2024-10-24 10:27 ` [PATCH v4 01/15] ui & main loop: Redesign of system-specific main thread event handling Phil Dennis-Jordan
  2024-10-24 10:28 ` [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  4:45   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 04/15] hw/display/apple-gfx: Adds configurable mode list Phil Dennis-Jordan
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv

This change wires up the PCI variant of the paravirtualised
graphics device, mainly useful for x86-64 macOS guests, implemented
by macOS's ParavirtualizedGraphics.framework. It builds on code
shared with the vmapple/mmio variant of the PVG device.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---

v4:

 * Threading improvements analogous to those in common apple-gfx code
   and mmio device variant.
 * Smaller code review issues addressed.

 hw/display/Kconfig         |   4 +
 hw/display/apple-gfx-pci.m | 152 +++++++++++++++++++++++++++++++++++++
 hw/display/meson.build     |   1 +
 3 files changed, 157 insertions(+)
 create mode 100644 hw/display/apple-gfx-pci.m

diff --git a/hw/display/Kconfig b/hw/display/Kconfig
index 6a9b7b19ada..2b53dfd7d26 100644
--- a/hw/display/Kconfig
+++ b/hw/display/Kconfig
@@ -149,3 +149,7 @@ config MAC_PVG_MMIO
     bool
     depends on MAC_PVG && AARCH64
 
+config MAC_PVG_PCI
+    bool
+    depends on MAC_PVG && PCI
+    default y if PCI_DEVICES
diff --git a/hw/display/apple-gfx-pci.m b/hw/display/apple-gfx-pci.m
new file mode 100644
index 00000000000..4ee26dde422
--- /dev/null
+++ b/hw/display/apple-gfx-pci.m
@@ -0,0 +1,152 @@
+/*
+ * QEMU Apple ParavirtualizedGraphics.framework device, PCI variant
+ *
+ * Copyright © 2023-2024 Phil Dennis-Jordan
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
+ * which implements 3d graphics passthrough to the host as well as a
+ * proprietary guest communication channel to drive it. This device model
+ * implements support to drive that library from within QEMU as a PCI device
+ * aimed primarily at x86-64 macOS VMs.
+ */
+
+#include "apple-gfx.h"
+#include "hw/pci/pci_device.h"
+#include "hw/pci/msi.h"
+#include "qapi/error.h"
+#include "trace.h"
+#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
+
+OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXPCIState, APPLE_GFX_PCI)
+
+struct AppleGFXPCIState {
+    PCIDevice parent_obj;
+
+    AppleGFXState common;
+};
+
+static const char* apple_gfx_pci_option_rom_path = NULL;
+
+static void apple_gfx_init_option_rom_path(void)
+{
+    NSURL *option_rom_url = PGCopyOptionROMURL();
+    const char *option_rom_path = option_rom_url.fileSystemRepresentation;
+    apple_gfx_pci_option_rom_path = g_strdup(option_rom_path);
+    [option_rom_url release];
+}
+
+static void apple_gfx_pci_init(Object *obj)
+{
+    AppleGFXPCIState *s = APPLE_GFX_PCI(obj);
+
+    if (!apple_gfx_pci_option_rom_path) {
+        /* The following is done on device not class init to avoid running
+         * ObjC code before fork() in -daemonize mode. */
+        PCIDeviceClass *pci = PCI_DEVICE_CLASS(object_get_class(obj));
+        apple_gfx_init_option_rom_path();
+        pci->romfile = apple_gfx_pci_option_rom_path;
+    }
+
+    apple_gfx_common_init(obj, &s->common, TYPE_APPLE_GFX_PCI);
+}
+
+typedef struct AppleGFXPCIInterruptJob {
+    PCIDevice *device;
+    uint32_t vector;
+} AppleGFXPCIInterruptJob;
+
+static void apple_gfx_pci_raise_interrupt(void *opaque)
+{
+    AppleGFXPCIInterruptJob *job = opaque;
+
+    if (msi_enabled(job->device)) {
+        msi_notify(job->device, job->vector);
+    }
+    g_free(job);
+}
+
+static void apple_gfx_pci_interrupt(PCIDevice *dev, AppleGFXPCIState *s,
+                                    uint32_t vector)
+{
+    AppleGFXPCIInterruptJob *job;
+
+    trace_apple_gfx_raise_irq(vector);
+    job = g_malloc0(sizeof(*job));
+    job->device = dev;
+    job->vector = vector;
+    aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                            apple_gfx_pci_raise_interrupt, job);
+}
+
+static void apple_gfx_pci_realize(PCIDevice *dev, Error **errp)
+{
+    AppleGFXPCIState *s = APPLE_GFX_PCI(dev);
+    Error *err = NULL;
+    int ret;
+
+    pci_register_bar(dev, PG_PCI_BAR_MMIO,
+                     PCI_BASE_ADDRESS_SPACE_MEMORY, &s->common.iomem_gfx);
+
+    ret = msi_init(dev, 0x0 /* config offset; 0 = find space */,
+                   PG_PCI_MAX_MSI_VECTORS, true /* msi64bit */,
+                   false /*msi_per_vector_mask*/, &err);
+    if (ret != 0) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    @autoreleasepool {
+        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
+        desc.raiseInterrupt = ^(uint32_t vector) {
+            apple_gfx_pci_interrupt(dev, s, vector);
+        };
+
+        apple_gfx_common_realize(&s->common, desc, errp);
+        [desc release];
+        desc = nil;
+    }
+}
+
+static void apple_gfx_pci_reset(Object *obj, ResetType type)
+{
+    AppleGFXPCIState *s = APPLE_GFX_PCI(obj);
+    [s->common.pgdev reset];
+}
+
+static void apple_gfx_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *pci = PCI_DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
+
+    assert(rc->phases.hold == NULL);
+    rc->phases.hold = apple_gfx_pci_reset;
+    dc->desc = "macOS Paravirtualized Graphics PCI Display Controller";
+    dc->hotpluggable = false;
+    set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
+
+    pci->vendor_id = PG_PCI_VENDOR_ID;
+    pci->device_id = PG_PCI_DEVICE_ID;
+    pci->class_id = PCI_CLASS_DISPLAY_OTHER;
+    pci->realize = apple_gfx_pci_realize;
+
+    // TODO: Property for setting mode list
+}
+
+static TypeInfo apple_gfx_pci_types[] = {
+    {
+        .name          = TYPE_APPLE_GFX_PCI,
+        .parent        = TYPE_PCI_DEVICE,
+        .instance_size = sizeof(AppleGFXPCIState),
+        .class_init    = apple_gfx_pci_class_init,
+        .instance_init = apple_gfx_pci_init,
+        .interfaces = (InterfaceInfo[]) {
+            { INTERFACE_PCIE_DEVICE },
+            { },
+        },
+    }
+};
+DEFINE_TYPES(apple_gfx_pci_types)
+
diff --git a/hw/display/meson.build b/hw/display/meson.build
index 619e642905a..78e1c41ea0a 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -65,6 +65,7 @@ system_ss.add(when: 'CONFIG_MAC_PVG',         if_true: [files('apple-gfx.m'), pv
 if cpu == 'aarch64'
   system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true: [files('apple-gfx-mmio.m'), pvg, metal])
 endif
+system_ss.add(when: 'CONFIG_MAC_PVG_PCI',     if_true: [files('apple-gfx-pci.m'), pvg, metal])
 
 if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
   virtio_gpu_ss = ss.source_set()
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 03/15] hw/display/apple-gfx: Adds PCI implementation
  2024-10-24 10:28 ` [PATCH v4 03/15] hw/display/apple-gfx: Adds PCI implementation Phil Dennis-Jordan
@ 2024-10-26  4:45   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  4:45 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> This change wires up the PCI variant of the paravirtualised
> graphics device, mainly useful for x86-64 macOS guests, implemented
> by macOS's ParavirtualizedGraphics.framework. It builds on code
> shared with the vmapple/mmio variant of the PVG device.
> 
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
> 
> v4:
> 
>   * Threading improvements analogous to those in common apple-gfx code
>     and mmio device variant.
>   * Smaller code review issues addressed.
> 
>   hw/display/Kconfig         |   4 +
>   hw/display/apple-gfx-pci.m | 152 +++++++++++++++++++++++++++++++++++++
>   hw/display/meson.build     |   1 +
>   3 files changed, 157 insertions(+)
>   create mode 100644 hw/display/apple-gfx-pci.m
> 
> diff --git a/hw/display/Kconfig b/hw/display/Kconfig
> index 6a9b7b19ada..2b53dfd7d26 100644
> --- a/hw/display/Kconfig
> +++ b/hw/display/Kconfig
> @@ -149,3 +149,7 @@ config MAC_PVG_MMIO
>       bool
>       depends on MAC_PVG && AARCH64
>   
> +config MAC_PVG_PCI
> +    bool
> +    depends on MAC_PVG && PCI
> +    default y if PCI_DEVICES
> diff --git a/hw/display/apple-gfx-pci.m b/hw/display/apple-gfx-pci.m
> new file mode 100644
> index 00000000000..4ee26dde422
> --- /dev/null
> +++ b/hw/display/apple-gfx-pci.m
> @@ -0,0 +1,152 @@
> +/*
> + * QEMU Apple ParavirtualizedGraphics.framework device, PCI variant
> + *
> + * Copyright © 2023-2024 Phil Dennis-Jordan
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
> + * which implements 3d graphics passthrough to the host as well as a
> + * proprietary guest communication channel to drive it. This device model
> + * implements support to drive that library from within QEMU as a PCI device
> + * aimed primarily at x86-64 macOS VMs.
> + */
> +
> +#include "apple-gfx.h"
> +#include "hw/pci/pci_device.h"
> +#include "hw/pci/msi.h"
> +#include "qapi/error.h"
> +#include "trace.h"
> +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXPCIState, APPLE_GFX_PCI)
> +
> +struct AppleGFXPCIState {
> +    PCIDevice parent_obj;
> +
> +    AppleGFXState common;
> +};
> +
> +static const char* apple_gfx_pci_option_rom_path = NULL;
> +
> +static void apple_gfx_init_option_rom_path(void)
> +{
> +    NSURL *option_rom_url = PGCopyOptionROMURL();
> +    const char *option_rom_path = option_rom_url.fileSystemRepresentation;
> +    apple_gfx_pci_option_rom_path = g_strdup(option_rom_path);
> +    [option_rom_url release];
> +}
> +
> +static void apple_gfx_pci_init(Object *obj)
> +{
> +    AppleGFXPCIState *s = APPLE_GFX_PCI(obj);
> +
> +    if (!apple_gfx_pci_option_rom_path) {
> +        /* The following is done on device not class init to avoid running
> +         * ObjC code before fork() in -daemonize mode. */
> +        PCIDeviceClass *pci = PCI_DEVICE_CLASS(object_get_class(obj));
> +        apple_gfx_init_option_rom_path();
> +        pci->romfile = apple_gfx_pci_option_rom_path;
> +    }
> +
> +    apple_gfx_common_init(obj, &s->common, TYPE_APPLE_GFX_PCI);
> +}
> +
> +typedef struct AppleGFXPCIInterruptJob {
> +    PCIDevice *device;
> +    uint32_t vector;
> +} AppleGFXPCIInterruptJob;
> +
> +static void apple_gfx_pci_raise_interrupt(void *opaque)
> +{
> +    AppleGFXPCIInterruptJob *job = opaque;
> +
> +    if (msi_enabled(job->device)) {
> +        msi_notify(job->device, job->vector);
> +    }
> +    g_free(job);
> +}
> +
> +static void apple_gfx_pci_interrupt(PCIDevice *dev, AppleGFXPCIState *s,
> +                                    uint32_t vector)
> +{
> +    AppleGFXPCIInterruptJob *job;
> +
> +    trace_apple_gfx_raise_irq(vector);
> +    job = g_malloc0(sizeof(*job));
> +    job->device = dev;
> +    job->vector = vector;
> +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                            apple_gfx_pci_raise_interrupt, job);
> +}
> +
> +static void apple_gfx_pci_realize(PCIDevice *dev, Error **errp)
> +{
> +    AppleGFXPCIState *s = APPLE_GFX_PCI(dev);
> +    Error *err = NULL;
> +    int ret;
> +
> +    pci_register_bar(dev, PG_PCI_BAR_MMIO,
> +                     PCI_BASE_ADDRESS_SPACE_MEMORY, &s->common.iomem_gfx);
> +
> +    ret = msi_init(dev, 0x0 /* config offset; 0 = find space */,
> +                   PG_PCI_MAX_MSI_VECTORS, true /* msi64bit */,
> +                   false /*msi_per_vector_mask*/, &err);
> +    if (ret != 0) {
> +        error_propagate(errp, err);

Don't use error_propaget() but just pass errp to msi_init.

> +        return;
> +    }
> +
> +    @autoreleasepool {
> +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
> +        desc.raiseInterrupt = ^(uint32_t vector) {
> +            apple_gfx_pci_interrupt(dev, s, vector);
> +        };
> +
> +        apple_gfx_common_realize(&s->common, desc, errp);
> +        [desc release];
> +        desc = nil;
> +    }
> +}
> +
> +static void apple_gfx_pci_reset(Object *obj, ResetType type)
> +{
> +    AppleGFXPCIState *s = APPLE_GFX_PCI(obj);
> +    [s->common.pgdev reset];
> +}
> +
> +static void apple_gfx_pci_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *pci = PCI_DEVICE_CLASS(klass);
> +    ResettableClass *rc = RESETTABLE_CLASS(klass);
> +
> +    assert(rc->phases.hold == NULL);

Remove this assertion; we don't have such for other PCI devices.

> +    rc->phases.hold = apple_gfx_pci_reset;
> +    dc->desc = "macOS Paravirtualized Graphics PCI Display Controller";
> +    dc->hotpluggable = false;
> +    set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
> +
> +    pci->vendor_id = PG_PCI_VENDOR_ID;
> +    pci->device_id = PG_PCI_DEVICE_ID;
> +    pci->class_id = PCI_CLASS_DISPLAY_OTHER;
> +    pci->realize = apple_gfx_pci_realize;
> +
> +    // TODO: Property for setting mode list
> +}
> +
> +static TypeInfo apple_gfx_pci_types[] = {
> +    {
> +        .name          = TYPE_APPLE_GFX_PCI,
> +        .parent        = TYPE_PCI_DEVICE,
> +        .instance_size = sizeof(AppleGFXPCIState),
> +        .class_init    = apple_gfx_pci_class_init,
> +        .instance_init = apple_gfx_pci_init,
> +        .interfaces = (InterfaceInfo[]) {
> +            { INTERFACE_PCIE_DEVICE },
> +            { },
> +        },
> +    }
> +};
> +DEFINE_TYPES(apple_gfx_pci_types)
> +
> diff --git a/hw/display/meson.build b/hw/display/meson.build
> index 619e642905a..78e1c41ea0a 100644
> --- a/hw/display/meson.build
> +++ b/hw/display/meson.build
> @@ -65,6 +65,7 @@ system_ss.add(when: 'CONFIG_MAC_PVG',         if_true: [files('apple-gfx.m'), pv
>   if cpu == 'aarch64'
>     system_ss.add(when: 'CONFIG_MAC_PVG_MMIO',  if_true: [files('apple-gfx-mmio.m'), pvg, metal])
>   endif
> +system_ss.add(when: 'CONFIG_MAC_PVG_PCI',     if_true: [files('apple-gfx-pci.m'), pvg, metal])
>   
>   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
>     virtio_gpu_ss = ss.source_set()



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 04/15] hw/display/apple-gfx: Adds configurable mode list
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (2 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 03/15] hw/display/apple-gfx: Adds PCI implementation Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  5:15   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 05/15] MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF Phil Dennis-Jordan
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv

This change adds a property 'display_modes' on the graphics device
which permits specifying a list of display modes. (screen resolution
and refresh rate)

The property is an array of a custom type to make the syntax slightly
less awkward to use, for example:

-device '{"driver":"apple-gfx-pci", "display-modes":["1920x1080@60", "3840x2160@60"]}'

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---

v4:

 * Switched to the native array property type, which recently gained
	 command line support.
 * The property has also been added to the -mmio variant.
 * Tidied up the code a little.

 hw/display/apple-gfx-mmio.m |   8 +++
 hw/display/apple-gfx-pci.m  |   9 ++-
 hw/display/apple-gfx.h      |  12 ++++
 hw/display/apple-gfx.m      | 127 ++++++++++++++++++++++++++++++++----
 hw/display/trace-events     |   2 +
 5 files changed, 145 insertions(+), 13 deletions(-)

diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
index 06131bc23f1..5d427c7005e 100644
--- a/hw/display/apple-gfx-mmio.m
+++ b/hw/display/apple-gfx-mmio.m
@@ -261,6 +261,12 @@ static void apple_gfx_mmio_reset(Object *obj, ResetType type)
     [s->common.pgdev reset];
 }
 
+static Property apple_gfx_mmio_properties[] = {
+    DEFINE_PROP_ARRAY("display-modes", AppleGFXMMIOState,
+                      common.num_display_modes, common.display_modes,
+                      qdev_prop_display_mode, AppleGFXDisplayMode),
+    DEFINE_PROP_END_OF_LIST(),
+};
 
 static void apple_gfx_mmio_class_init(ObjectClass *klass, void *data)
 {
@@ -270,6 +276,8 @@ static void apple_gfx_mmio_class_init(ObjectClass *klass, void *data)
     rc->phases.hold = apple_gfx_mmio_reset;
     dc->hotpluggable = false;
     dc->realize = apple_gfx_mmio_realize;
+
+    device_class_set_props(dc, apple_gfx_mmio_properties);
 }
 
 static TypeInfo apple_gfx_mmio_types[] = {
diff --git a/hw/display/apple-gfx-pci.m b/hw/display/apple-gfx-pci.m
index 4ee26dde422..32e81bbef8b 100644
--- a/hw/display/apple-gfx-pci.m
+++ b/hw/display/apple-gfx-pci.m
@@ -115,6 +115,13 @@ static void apple_gfx_pci_reset(Object *obj, ResetType type)
     [s->common.pgdev reset];
 }
 
+static Property apple_gfx_pci_properties[] = {
+    DEFINE_PROP_ARRAY("display-modes", AppleGFXPCIState,
+                      common.num_display_modes, common.display_modes,
+                      qdev_prop_display_mode, AppleGFXDisplayMode),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void apple_gfx_pci_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -132,7 +139,7 @@ static void apple_gfx_pci_class_init(ObjectClass *klass, void *data)
     pci->class_id = PCI_CLASS_DISPLAY_OTHER;
     pci->realize = apple_gfx_pci_realize;
 
-    // TODO: Property for setting mode list
+    device_class_set_props(dc, apple_gfx_pci_properties);
 }
 
 static TypeInfo apple_gfx_pci_types[] = {
diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
index 39931fba65a..d2c6a14229a 100644
--- a/hw/display/apple-gfx.h
+++ b/hw/display/apple-gfx.h
@@ -9,6 +9,7 @@
 #import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
 #include "qemu/typedefs.h"
 #include "exec/memory.h"
+#include "hw/qdev-properties.h"
 #include "ui/surface.h"
 
 @class PGDeviceDescriptor;
@@ -20,6 +21,7 @@
 
 typedef QTAILQ_HEAD(, PGTask_s) PGTaskList;
 
+struct AppleGFXDisplayMode;
 struct AppleGFXMapMemoryJob;
 typedef struct AppleGFXState {
     MemoryRegion iomem_gfx;
@@ -31,6 +33,8 @@ typedef struct AppleGFXState {
     id<MTLCommandQueue> mtl_queue;
     bool cursor_show;
     QEMUCursor *cursor;
+    struct AppleGFXDisplayMode *display_modes;
+    uint32_t num_display_modes;
 
     /* For running PVG memory-mapping requests in the AIO context */
     QemuCond job_cond;
@@ -47,6 +51,12 @@ typedef struct AppleGFXState {
     id<MTLTexture> texture;
 } AppleGFXState;
 
+typedef struct AppleGFXDisplayMode {
+    uint16_t width_px;
+    uint16_t height_px;
+    uint16_t refresh_rate_hz;
+} AppleGFXDisplayMode;
+
 void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name);
 void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
                               Error **errp);
@@ -54,5 +64,7 @@ uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
                                                uint64_t length, bool read_only);
 void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag);
 
+extern const PropertyInfo qdev_prop_display_mode;
+
 #endif
 
diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
index 46be9957f69..42b601329fb 100644
--- a/hw/display/apple-gfx.m
+++ b/hw/display/apple-gfx.m
@@ -28,9 +28,10 @@
 #include "qapi/error.h"
 #include "ui/console.h"
 
-static const PGDisplayCoord_t apple_gfx_modes[] = {
-    { .x = 1440, .y = 1080 },
-    { .x = 1280, .y = 1024 },
+static const AppleGFXDisplayMode apple_gfx_default_modes[] = {
+    { 1920, 1080, 60 },
+    { 1440, 1080, 60 },
+    { 1280, 1024, 60 },
 };
 
 /* This implements a type defined in <ParavirtualizedGraphics/PGDevice.h>
@@ -303,7 +304,6 @@ static void set_mode(AppleGFXState *s, uint32_t width, uint32_t height)
 static void create_fb(AppleGFXState *s)
 {
     s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
-    set_mode(s, 1440, 1080);
 
     s->cursor_show = true;
 }
@@ -628,20 +628,25 @@ static void apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
     return disp_desc;
 }
 
-static NSArray<PGDisplayMode*>* apple_gfx_prepare_display_mode_array(void)
+static NSArray<PGDisplayMode*>* apple_gfx_create_display_mode_array(
+    const AppleGFXDisplayMode display_modes[], uint32_t display_mode_count)
 {
-    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
+    PGDisplayMode **modes = alloca(sizeof(modes[0]) * display_mode_count);
     NSArray<PGDisplayMode*>* mode_array = nil;
-    int i;
+    uint32_t i;
 
-    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
+    for (i = 0; i < display_mode_count; i++) {
+        const AppleGFXDisplayMode *mode = &display_modes[i];
+        trace_apple_gfx_display_mode(i, mode->width_px, mode->height_px);
+        PGDisplayCoord_t mode_size = { mode->width_px, mode->height_px };
         modes[i] =
-            [[PGDisplayMode alloc] initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
+            [[PGDisplayMode alloc] initWithSizeInPixels:mode_size
+                                        refreshRateInHz:mode->refresh_rate_hz];
     }
 
-    mode_array = [NSArray arrayWithObjects:modes count:ARRAY_SIZE(apple_gfx_modes)];
+    mode_array = [NSArray arrayWithObjects:modes count:display_mode_count];
 
-    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
+    for (i = 0; i < display_mode_count; i++) {
         [modes[i] release];
         modes[i] = nil;
     }
@@ -679,6 +684,8 @@ void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
                               Error **errp)
 {
     PGDisplayDescriptor *disp_desc = nil;
+    const AppleGFXDisplayMode *display_modes = apple_gfx_default_modes;
+    int num_display_modes = ARRAY_SIZE(apple_gfx_default_modes);
 
     if (apple_gfx_mig_blocker == NULL) {
         error_setg(&apple_gfx_mig_blocker,
@@ -704,10 +711,106 @@ void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
     s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
                                               port:0 serialNum:1234];
     [disp_desc release];
-    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
+
+    if (s->display_modes != NULL && s->num_display_modes > 0) {
+        trace_apple_gfx_common_realize_modes_property(s->num_display_modes);
+        display_modes = s->display_modes;
+        num_display_modes = s->num_display_modes;
+    }
+    s->pgdisp.modeList =
+        apple_gfx_create_display_mode_array(display_modes, num_display_modes);
 
     create_fb(s);
 
     qemu_mutex_init(&s->job_mutex);
     qemu_cond_init(&s->job_cond);
 }
+
+static void apple_gfx_get_display_mode(Object *obj, Visitor *v,
+                                       const char *name, void *opaque,
+                                       Error **errp)
+{
+    Property *prop = opaque;
+    AppleGFXDisplayMode *mode = object_field_prop_ptr(obj, prop);
+    /* 3 uint16s (max 5 digits) and 2 separator characters + nul. */
+    static const size_t buffer_size = 5 * 3 + 2 + 1;
+
+    char buffer[buffer_size];
+    char *pos = buffer;
+
+    int rc = snprintf(buffer, buffer_size,
+                      "%"PRIu16"x%"PRIu16"@%"PRIu16,
+                      mode->width_px, mode->height_px,
+                      mode->refresh_rate_hz);
+    assert(rc < buffer_size);
+
+    visit_type_str(v, name, &pos, errp);
+}
+
+static void apple_gfx_set_display_mode(Object *obj, Visitor *v,
+                                       const char *name, void *opaque,
+                                       Error **errp)
+{
+    Property *prop = opaque;
+    AppleGFXDisplayMode *mode = object_field_prop_ptr(obj, prop);
+    Error *local_err = NULL;
+    const char *endptr;
+    char *str;
+    int ret;
+    unsigned int val;
+
+    visit_type_str(v, name, &str, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    endptr = str;
+
+    ret = qemu_strtoui(endptr, &endptr, 10, &val);
+    if (ret || val > UINT16_MAX || val == 0) {
+        error_setg(errp, "width in '%s' must be a decimal integer number "
+                   "of pixels in the range 1..65535", name);
+        goto out;
+    }
+    mode->width_px = val;
+    if (*endptr != 'x') {
+        goto separator_error;
+    }
+
+    ret = qemu_strtoui(endptr + 1, &endptr, 10, &val);
+    if (ret || val > UINT16_MAX || val == 0) {
+        error_setg(errp, "height in '%s' must be a decimal integer number "
+                   "of pixels in the range 1..65535", name);
+        goto out;
+    }
+    mode->height_px = val;
+    if (*endptr != '@') {
+        goto separator_error;
+    }
+
+    ret = qemu_strtoui(endptr + 1, &endptr, 10, &val);
+    if (ret) {
+        error_setg(errp, "refresh rate in '%s'"
+                   " must be a non-negative decimal integer (Hertz)", name);
+    }
+    mode->refresh_rate_hz = val;
+
+    goto out;
+
+separator_error:
+    error_setg(errp, "Each display mode takes the format "
+               "'<width>x<height>@<rate>'");
+out:
+    g_free(str);
+    return;
+}
+
+const PropertyInfo qdev_prop_display_mode = {
+    .name  = "display_mode",
+    .description =
+        "Display mode in pixels and Hertz, as <width>x<height>@<refresh-rate> "
+        "Example: 3840x2160@60",
+    .get   = apple_gfx_get_display_mode,
+    .set   = apple_gfx_set_display_mode,
+};
diff --git a/hw/display/trace-events b/hw/display/trace-events
index 214998312b9..2780239dbde 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -209,6 +209,8 @@ apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t height) "bpp=%d widt
 apple_gfx_cursor_show(uint32_t show) "show=%d"
 apple_gfx_cursor_move(void) ""
 apple_gfx_common_init(const char *device_name, size_t mmio_size) "device: %s; MMIO size: %zu bytes"
+apple_gfx_common_realize_modes_property(uint32_t num_modes) "using %u modes supplied by 'display-modes' device property"
+apple_gfx_display_mode(uint32_t mode_idx, uint16_t width_px, uint16_t height_px) "mode %2"PRIu32": %4"PRIu16"x%4"PRIu16
 
 # apple-gfx-mmio.m
 apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/15] hw/display/apple-gfx: Adds configurable mode list
  2024-10-24 10:28 ` [PATCH v4 04/15] hw/display/apple-gfx: Adds configurable mode list Phil Dennis-Jordan
@ 2024-10-26  5:15   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  5:15 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> This change adds a property 'display_modes' on the graphics device
> which permits specifying a list of display modes. (screen resolution
> and refresh rate)
> 
> The property is an array of a custom type to make the syntax slightly
> less awkward to use, for example:
> 
> -device '{"driver":"apple-gfx-pci", "display-modes":["1920x1080@60", "3840x2160@60"]}'
> 
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
> 
> v4:
> 
>   * Switched to the native array property type, which recently gained
> 	 command line support.
>   * The property has also been added to the -mmio variant.
>   * Tidied up the code a little.
> 
>   hw/display/apple-gfx-mmio.m |   8 +++
>   hw/display/apple-gfx-pci.m  |   9 ++-
>   hw/display/apple-gfx.h      |  12 ++++
>   hw/display/apple-gfx.m      | 127 ++++++++++++++++++++++++++++++++----
>   hw/display/trace-events     |   2 +
>   5 files changed, 145 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/display/apple-gfx-mmio.m b/hw/display/apple-gfx-mmio.m
> index 06131bc23f1..5d427c7005e 100644
> --- a/hw/display/apple-gfx-mmio.m
> +++ b/hw/display/apple-gfx-mmio.m
> @@ -261,6 +261,12 @@ static void apple_gfx_mmio_reset(Object *obj, ResetType type)
>       [s->common.pgdev reset];
>   }
>   
> +static Property apple_gfx_mmio_properties[] = {
> +    DEFINE_PROP_ARRAY("display-modes", AppleGFXMMIOState,
> +                      common.num_display_modes, common.display_modes,
> +                      qdev_prop_display_mode, AppleGFXDisplayMode),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
>   
>   static void apple_gfx_mmio_class_init(ObjectClass *klass, void *data)
>   {
> @@ -270,6 +276,8 @@ static void apple_gfx_mmio_class_init(ObjectClass *klass, void *data)
>       rc->phases.hold = apple_gfx_mmio_reset;
>       dc->hotpluggable = false;
>       dc->realize = apple_gfx_mmio_realize;
> +
> +    device_class_set_props(dc, apple_gfx_mmio_properties);
>   }
>   
>   static TypeInfo apple_gfx_mmio_types[] = {
> diff --git a/hw/display/apple-gfx-pci.m b/hw/display/apple-gfx-pci.m
> index 4ee26dde422..32e81bbef8b 100644
> --- a/hw/display/apple-gfx-pci.m
> +++ b/hw/display/apple-gfx-pci.m
> @@ -115,6 +115,13 @@ static void apple_gfx_pci_reset(Object *obj, ResetType type)
>       [s->common.pgdev reset];
>   }
>   
> +static Property apple_gfx_pci_properties[] = {
> +    DEFINE_PROP_ARRAY("display-modes", AppleGFXPCIState,
> +                      common.num_display_modes, common.display_modes,
> +                      qdev_prop_display_mode, AppleGFXDisplayMode),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
>   static void apple_gfx_pci_class_init(ObjectClass *klass, void *data)
>   {
>       DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -132,7 +139,7 @@ static void apple_gfx_pci_class_init(ObjectClass *klass, void *data)
>       pci->class_id = PCI_CLASS_DISPLAY_OTHER;
>       pci->realize = apple_gfx_pci_realize;
>   
> -    // TODO: Property for setting mode list
> +    device_class_set_props(dc, apple_gfx_pci_properties);
>   }
>   
>   static TypeInfo apple_gfx_pci_types[] = {
> diff --git a/hw/display/apple-gfx.h b/hw/display/apple-gfx.h
> index 39931fba65a..d2c6a14229a 100644
> --- a/hw/display/apple-gfx.h
> +++ b/hw/display/apple-gfx.h
> @@ -9,6 +9,7 @@
>   #import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
>   #include "qemu/typedefs.h"
>   #include "exec/memory.h"
> +#include "hw/qdev-properties.h"
>   #include "ui/surface.h"
>   
>   @class PGDeviceDescriptor;
> @@ -20,6 +21,7 @@
>   
>   typedef QTAILQ_HEAD(, PGTask_s) PGTaskList;
>   
> +struct AppleGFXDisplayMode;
>   struct AppleGFXMapMemoryJob;
>   typedef struct AppleGFXState {
>       MemoryRegion iomem_gfx;
> @@ -31,6 +33,8 @@ typedef struct AppleGFXState {
>       id<MTLCommandQueue> mtl_queue;
>       bool cursor_show;
>       QEMUCursor *cursor;
> +    struct AppleGFXDisplayMode *display_modes;
> +    uint32_t num_display_modes;
>   
>       /* For running PVG memory-mapping requests in the AIO context */
>       QemuCond job_cond;
> @@ -47,6 +51,12 @@ typedef struct AppleGFXState {
>       id<MTLTexture> texture;
>   } AppleGFXState;
>   
> +typedef struct AppleGFXDisplayMode {
> +    uint16_t width_px;
> +    uint16_t height_px;
> +    uint16_t refresh_rate_hz;
> +} AppleGFXDisplayMode;
> +
>   void apple_gfx_common_init(Object *obj, AppleGFXState *s, const char* obj_name);
>   void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
>                                 Error **errp);
> @@ -54,5 +64,7 @@ uintptr_t apple_gfx_host_address_for_gpa_range(uint64_t guest_physical,
>                                                  uint64_t length, bool read_only);
>   void apple_gfx_await_bh_job(AppleGFXState *s, bool *job_done_flag);
>   
> +extern const PropertyInfo qdev_prop_display_mode;
> +
>   #endif
>   
> diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
> index 46be9957f69..42b601329fb 100644
> --- a/hw/display/apple-gfx.m
> +++ b/hw/display/apple-gfx.m
> @@ -28,9 +28,10 @@
>   #include "qapi/error.h"
>   #include "ui/console.h"
>   
> -static const PGDisplayCoord_t apple_gfx_modes[] = {
> -    { .x = 1440, .y = 1080 },
> -    { .x = 1280, .y = 1024 },
> +static const AppleGFXDisplayMode apple_gfx_default_modes[] = {
> +    { 1920, 1080, 60 },
> +    { 1440, 1080, 60 },
> +    { 1280, 1024, 60 },
>   };
>   
>   /* This implements a type defined in <ParavirtualizedGraphics/PGDevice.h>
> @@ -303,7 +304,6 @@ static void set_mode(AppleGFXState *s, uint32_t width, uint32_t height)
>   static void create_fb(AppleGFXState *s)
>   {
>       s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
> -    set_mode(s, 1440, 1080);
>   
>       s->cursor_show = true;
>   }
> @@ -628,20 +628,25 @@ static void apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
>       return disp_desc;
>   }
>   
> -static NSArray<PGDisplayMode*>* apple_gfx_prepare_display_mode_array(void)
> +static NSArray<PGDisplayMode*>* apple_gfx_create_display_mode_array(
> +    const AppleGFXDisplayMode display_modes[], uint32_t display_mode_count)
>   {
> -    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
> +    PGDisplayMode **modes = alloca(sizeof(modes[0]) * display_mode_count);

Avoid alloca().

>       NSArray<PGDisplayMode*>* mode_array = nil;
> -    int i;
> +    uint32_t i;
>   
> -    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> +    for (i = 0; i < display_mode_count; i++) {
> +        const AppleGFXDisplayMode *mode = &display_modes[i];
> +        trace_apple_gfx_display_mode(i, mode->width_px, mode->height_px);
> +        PGDisplayCoord_t mode_size = { mode->width_px, mode->height_px };
>           modes[i] =
> -            [[PGDisplayMode alloc] initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
> +            [[PGDisplayMode alloc] initWithSizeInPixels:mode_size
> +                                        refreshRateInHz:mode->refresh_rate_hz];
>       }
>   
> -    mode_array = [NSArray arrayWithObjects:modes count:ARRAY_SIZE(apple_gfx_modes)];
> +    mode_array = [NSArray arrayWithObjects:modes count:display_mode_count];
>   
> -    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
> +    for (i = 0; i < display_mode_count; i++) {
>           [modes[i] release];
>           modes[i] = nil;
>       }
> @@ -679,6 +684,8 @@ void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
>                                 Error **errp)
>   {
>       PGDisplayDescriptor *disp_desc = nil;
> +    const AppleGFXDisplayMode *display_modes = apple_gfx_default_modes;
> +    int num_display_modes = ARRAY_SIZE(apple_gfx_default_modes);
>   
>       if (apple_gfx_mig_blocker == NULL) {
>           error_setg(&apple_gfx_mig_blocker,
> @@ -704,10 +711,106 @@ void apple_gfx_common_realize(AppleGFXState *s, PGDeviceDescriptor *desc,
>       s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
>                                                 port:0 serialNum:1234];
>       [disp_desc release];
> -    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
> +
> +    if (s->display_modes != NULL && s->num_display_modes > 0) {
> +        trace_apple_gfx_common_realize_modes_property(s->num_display_modes);
> +        display_modes = s->display_modes;
> +        num_display_modes = s->num_display_modes;
> +    }
> +    s->pgdisp.modeList =
> +        apple_gfx_create_display_mode_array(display_modes, num_display_modes);
>   
>       create_fb(s);
>   
>       qemu_mutex_init(&s->job_mutex);
>       qemu_cond_init(&s->job_cond);
>   }
> +
> +static void apple_gfx_get_display_mode(Object *obj, Visitor *v,
> +                                       const char *name, void *opaque,
> +                                       Error **errp)
> +{
> +    Property *prop = opaque;
> +    AppleGFXDisplayMode *mode = object_field_prop_ptr(obj, prop);
> +    /* 3 uint16s (max 5 digits) and 2 separator characters + nul. */
> +    static const size_t buffer_size = 5 * 3 + 2 + 1;
> +
> +    char buffer[buffer_size];

I prefer it to be written as: char buffer[5 * 3 + 2 + 1];
to avoid the indirection by having another variable.

 > +    char *pos = buffer;> +
> +    int rc = snprintf(buffer, buffer_size,
> +                      "%"PRIu16"x%"PRIu16"@%"PRIu16,
> +                      mode->width_px, mode->height_px,
> +                      mode->refresh_rate_hz);
> +    assert(rc < buffer_size);
> +
> +    visit_type_str(v, name, &pos, errp);
> +}
> +
> +static void apple_gfx_set_display_mode(Object *obj, Visitor *v,
> +                                       const char *name, void *opaque,
> +                                       Error **errp)
> +{
> +    Property *prop = opaque;
> +    AppleGFXDisplayMode *mode = object_field_prop_ptr(obj, prop);
> +    Error *local_err = NULL;
> +    const char *endptr;
> +    char *str;
> +    int ret;
> +    unsigned int val;
> +
> +    visit_type_str(v, name, &str, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    endptr = str;
> +
> +    ret = qemu_strtoui(endptr, &endptr, 10, &val);
> +    if (ret || val > UINT16_MAX || val == 0) {
> +        error_setg(errp, "width in '%s' must be a decimal integer number "
> +                   "of pixels in the range 1..65535", name);
> +        goto out;
> +    }
> +    mode->width_px = val;
> +    if (*endptr != 'x') {
> +        goto separator_error;
> +    }
> +
> +    ret = qemu_strtoui(endptr + 1, &endptr, 10, &val);
> +    if (ret || val > UINT16_MAX || val == 0) {
> +        error_setg(errp, "height in '%s' must be a decimal integer number "
> +                   "of pixels in the range 1..65535", name);
> +        goto out;
> +    }
> +    mode->height_px = val;
> +    if (*endptr != '@') {
> +        goto separator_error;
> +    }
> +
> +    ret = qemu_strtoui(endptr + 1, &endptr, 10, &val);

Use qemu_strtoi() or it will have a perculiar behavior with negative 
values; see the comment in util/cutils.c for details.

> +    if (ret) {
> +        error_setg(errp, "refresh rate in '%s'"
> +                   " must be a non-negative decimal integer (Hertz)", name);
> +    }
> +    mode->refresh_rate_hz = val;
> +
> +    goto out;
> +
> +separator_error:
> +    error_setg(errp, "Each display mode takes the format "
> +               "'<width>x<height>@<rate>'");
> +out:
> +    g_free(str);

Use g_autofree. docs/devel/style.rst has some explanation.

> +    return;
> +}
> +
> +const PropertyInfo qdev_prop_display_mode = {
> +    .name  = "display_mode",
> +    .description =
> +        "Display mode in pixels and Hertz, as <width>x<height>@<refresh-rate> "
> +        "Example: 3840x2160@60",
> +    .get   = apple_gfx_get_display_mode,
> +    .set   = apple_gfx_set_display_mode,
> +};
> diff --git a/hw/display/trace-events b/hw/display/trace-events
> index 214998312b9..2780239dbde 100644
> --- a/hw/display/trace-events
> +++ b/hw/display/trace-events
> @@ -209,6 +209,8 @@ apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t height) "bpp=%d widt
>   apple_gfx_cursor_show(uint32_t show) "show=%d"
>   apple_gfx_cursor_move(void) ""
>   apple_gfx_common_init(const char *device_name, size_t mmio_size) "device: %s; MMIO size: %zu bytes"
> +apple_gfx_common_realize_modes_property(uint32_t num_modes) "using %u modes supplied by 'display-modes' device property"
> +apple_gfx_display_mode(uint32_t mode_idx, uint16_t width_px, uint16_t height_px) "mode %2"PRIu32": %4"PRIu16"x%4"PRIu16
>   
>   # apple-gfx-mmio.m
>   apple_gfx_mmio_iosfc_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 05/15] MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (3 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 04/15] hw/display/apple-gfx: Adds configurable mode list Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-11-05 15:36   ` Roman Bolshakov
  2024-10-24 10:28 ` [PATCH v4 06/15] hw: Add vmapple subdir Phil Dennis-Jordan
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv

I'm happy to take responsibility for the macOS PV graphics code. As
HVF patches don't seem to get much attention at the moment, I'm also
adding myself as designated reviewer for HVF and x86 HVF to try and
improve that.

I anticipate that the resulting workload should be covered by the
funding I'm receiving for improving Qemu in combination with macOS. As
of right now this runs out at the end of 2024; I expect the workload on
apple-gfx should be relatively minor and manageable in my spare time
beyond that. I may have to remove myself from more general HVF duties
once the contract runs out if it's more than I can manage.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c3bfa132fd6..16ea47a5e6d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -505,6 +505,7 @@ F: target/arm/hvf/
 X86 HVF CPUs
 M: Cameron Esfahani <dirty@apple.com>
 M: Roman Bolshakov <rbolshakov@ddn.com>
+R: Phil Dennis-Jordan <phil@philjordan.eu>
 W: https://wiki.qemu.org/Features/HVF
 S: Maintained
 F: target/i386/hvf/
@@ -512,6 +513,7 @@ F: target/i386/hvf/
 HVF
 M: Cameron Esfahani <dirty@apple.com>
 M: Roman Bolshakov <rbolshakov@ddn.com>
+R: Phil Dennis-Jordan <phil@philjordan.eu>
 W: https://wiki.qemu.org/Features/HVF
 S: Maintained
 F: accel/hvf/
@@ -2580,6 +2582,11 @@ F: hw/display/edid*
 F: include/hw/display/edid.h
 F: qemu-edid.c
 
+macOS PV Graphics (apple-gfx)
+M: Phil Dennis-Jordan <phil@philjordan.eu>
+S: Maintained
+F: hw/display/apple-gfx*
+
 PIIX4 South Bridge (i82371AB)
 M: Hervé Poussineau <hpoussin@reactos.org>
 M: Philippe Mathieu-Daudé <philmd@linaro.org>
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/15] MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF
  2024-10-24 10:28 ` [PATCH v4 05/15] MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF Phil Dennis-Jordan
@ 2024-11-05 15:36   ` Roman Bolshakov
  0 siblings, 0 replies; 42+ messages in thread
From: Roman Bolshakov @ 2024-11-05 15:36 UTC (permalink / raw)
  To: Phil Dennis-Jordan
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv

On Thu, Oct 24, 2024 at 12:28:03PM +0200, Phil Dennis-Jordan wrote:
> I'm happy to take responsibility for the macOS PV graphics code. As
> HVF patches don't seem to get much attention at the moment, I'm also
> adding myself as designated reviewer for HVF and x86 HVF to try and
> improve that.
> 
> I anticipate that the resulting workload should be covered by the
> funding I'm receiving for improving Qemu in combination with macOS. As
> of right now this runs out at the end of 2024; I expect the workload on
> apple-gfx should be relatively minor and manageable in my spare time
> beyond that. I may have to remove myself from more general HVF duties
> once the contract runs out if it's more than I can manage.
> 
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
>  MAINTAINERS | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c3bfa132fd6..16ea47a5e6d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -505,6 +505,7 @@ F: target/arm/hvf/
>  X86 HVF CPUs
>  M: Cameron Esfahani <dirty@apple.com>
>  M: Roman Bolshakov <rbolshakov@ddn.com>
> +R: Phil Dennis-Jordan <phil@philjordan.eu>
>  W: https://wiki.qemu.org/Features/HVF
>  S: Maintained
>  F: target/i386/hvf/
> @@ -512,6 +513,7 @@ F: target/i386/hvf/
>  HVF
>  M: Cameron Esfahani <dirty@apple.com>
>  M: Roman Bolshakov <rbolshakov@ddn.com>
> +R: Phil Dennis-Jordan <phil@philjordan.eu>
>  W: https://wiki.qemu.org/Features/HVF
>  S: Maintained
>  F: accel/hvf/
> @@ -2580,6 +2582,11 @@ F: hw/display/edid*
>  F: include/hw/display/edid.h
>  F: qemu-edid.c
>  
> +macOS PV Graphics (apple-gfx)
> +M: Phil Dennis-Jordan <phil@philjordan.eu>
> +S: Maintained
> +F: hw/display/apple-gfx*
> +
>  PIIX4 South Bridge (i82371AB)
>  M: Hervé Poussineau <hpoussin@reactos.org>
>  M: Philippe Mathieu-Daudé <philmd@linaro.org>
> -- 
> 2.39.3 (Apple Git-145)
> 
> 

Thanks for helping out,
Reviewed-by: Roman Bolshakov <rbolshakov@ddn.com>

I have recently got some cycles to do HVF work too at DDN.

i386 HVF future is not clear as it took two days to update my 2015 MBA
just to get QEMU compiled. It's no longer supported by brew.sh so I had
to go through some hoops to get dependencies compiled.

Regards,
Roman


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 06/15] hw: Add vmapple subdir
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (4 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 05/15] MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-24 10:28 ` [PATCH v4 07/15] hw/misc/pvpanic: Add MMIO interface Phil Dennis-Jordan
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

We will introduce a number of devices that are specific to the vmapple
target machine. To keep them all tidily together, let's put them into
a single target directory.

Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 MAINTAINERS             | 7 +++++++
 hw/Kconfig              | 1 +
 hw/meson.build          | 1 +
 hw/vmapple/Kconfig      | 1 +
 hw/vmapple/meson.build  | 0
 hw/vmapple/trace-events | 2 ++
 hw/vmapple/trace.h      | 1 +
 meson.build             | 1 +
 8 files changed, 14 insertions(+)
 create mode 100644 hw/vmapple/Kconfig
 create mode 100644 hw/vmapple/meson.build
 create mode 100644 hw/vmapple/trace-events
 create mode 100644 hw/vmapple/trace.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 16ea47a5e6d..104813ed85f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2733,6 +2733,13 @@ F: hw/hyperv/hv-balloon*.h
 F: include/hw/hyperv/dynmem-proto.h
 F: include/hw/hyperv/hv-balloon.h
 
+VMapple
+M: Alexander Graf <agraf@csgraf.de>
+R: Phil Dennis-Jordan <phil@philjordan.eu>
+S: Maintained
+F: hw/vmapple/*
+F: include/hw/vmapple/*
+
 Subsystems
 ----------
 Overall Audio backends
diff --git a/hw/Kconfig b/hw/Kconfig
index 1b4e9bb07f7..2871784cfdc 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source ufs/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source vmapple/Kconfig
 source xen/Kconfig
 source watchdog/Kconfig
 
diff --git a/hw/meson.build b/hw/meson.build
index b827c82c5d7..9c4f6d0d636 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -39,6 +39,7 @@ subdir('ufs')
 subdir('usb')
 subdir('vfio')
 subdir('virtio')
+subdir('vmapple')
 subdir('watchdog')
 subdir('xen')
 subdir('xenpv')
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
new file mode 100644
index 00000000000..8b137891791
--- /dev/null
+++ b/hw/vmapple/Kconfig
@@ -0,0 +1 @@
+
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
new file mode 100644
index 00000000000..9ccc5790487
--- /dev/null
+++ b/hw/vmapple/trace-events
@@ -0,0 +1,2 @@
+# See docs/devel/tracing.rst for syntax documentation.
+
diff --git a/hw/vmapple/trace.h b/hw/vmapple/trace.h
new file mode 100644
index 00000000000..572adbefe04
--- /dev/null
+++ b/hw/vmapple/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_vmapple.h"
diff --git a/meson.build b/meson.build
index 0e124eff13f..dd07a425f0e 100644
--- a/meson.build
+++ b/meson.build
@@ -3478,6 +3478,7 @@ if have_system
     'hw/usb',
     'hw/vfio',
     'hw/virtio',
+    'hw/vmapple',
     'hw/watchdog',
     'hw/xen',
     'hw/gpio',
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 07/15] hw/misc/pvpanic: Add MMIO interface
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (5 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 06/15] hw: Add vmapple subdir Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-24 10:28 ` [PATCH v4 08/15] hvf: arm: Ignore writes to CNTP_CTL_EL0 Phil Dennis-Jordan
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

In addition to the ISA and PCI variants of pvpanic, let's add an MMIO
platform device that we can use in embedded arm environments.

Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/misc/Kconfig           |  4 +++
 hw/misc/meson.build       |  1 +
 hw/misc/pvpanic-mmio.c    | 61 +++++++++++++++++++++++++++++++++++++++
 include/hw/misc/pvpanic.h |  1 +
 4 files changed, 67 insertions(+)
 create mode 100644 hw/misc/pvpanic-mmio.c

diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 1f1baa5dde9..5a6c1603b60 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -145,6 +145,10 @@ config PVPANIC_ISA
     depends on ISA_BUS
     select PVPANIC_COMMON
 
+config PVPANIC_MMIO
+    bool
+    select PVPANIC_COMMON
+
 config AUX
     bool
     select I2C
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index d02d96e403b..4de4db0a600 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -122,6 +122,7 @@ system_ss.add(when: 'CONFIG_ARMSSE_MHU', if_true: files('armsse-mhu.c'))
 
 system_ss.add(when: 'CONFIG_PVPANIC_ISA', if_true: files('pvpanic-isa.c'))
 system_ss.add(when: 'CONFIG_PVPANIC_PCI', if_true: files('pvpanic-pci.c'))
+system_ss.add(when: 'CONFIG_PVPANIC_MMIO', if_true: files('pvpanic-mmio.c'))
 system_ss.add(when: 'CONFIG_AUX', if_true: files('auxbus.c'))
 system_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files(
   'aspeed_hace.c',
diff --git a/hw/misc/pvpanic-mmio.c b/hw/misc/pvpanic-mmio.c
new file mode 100644
index 00000000000..56738efee53
--- /dev/null
+++ b/hw/misc/pvpanic-mmio.c
@@ -0,0 +1,61 @@
+/*
+ * QEMU simulated pvpanic device (MMIO frontend)
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/misc/pvpanic.h"
+#include "hw/sysbus.h"
+#include "standard-headers/misc/pvpanic.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(PVPanicMMIOState, PVPANIC_MMIO_DEVICE)
+
+#define PVPANIC_MMIO_SIZE 0x2
+
+struct PVPanicMMIOState {
+    SysBusDevice parent_obj;
+
+    PVPanicState pvpanic;
+};
+
+static void pvpanic_mmio_initfn(Object *obj)
+{
+    PVPanicMMIOState *s = PVPANIC_MMIO_DEVICE(obj);
+
+    pvpanic_setup_io(&s->pvpanic, DEVICE(s), PVPANIC_MMIO_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->pvpanic.mr);
+}
+
+static Property pvpanic_mmio_properties[] = {
+    DEFINE_PROP_UINT8("events", PVPanicMMIOState, pvpanic.events,
+                      PVPANIC_PANICKED | PVPANIC_CRASH_LOADED),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pvpanic_mmio_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    device_class_set_props(dc, pvpanic_mmio_properties);
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo pvpanic_mmio_info = {
+    .name          = TYPE_PVPANIC_MMIO_DEVICE,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(PVPanicMMIOState),
+    .instance_init = pvpanic_mmio_initfn,
+    .class_init    = pvpanic_mmio_class_init,
+};
+
+static void pvpanic_register_types(void)
+{
+    type_register_static(&pvpanic_mmio_info);
+}
+
+type_init(pvpanic_register_types)
diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
index 9a71a5ad0d7..049a94c1125 100644
--- a/include/hw/misc/pvpanic.h
+++ b/include/hw/misc/pvpanic.h
@@ -26,6 +26,7 @@
 
 #define TYPE_PVPANIC_ISA_DEVICE "pvpanic"
 #define TYPE_PVPANIC_PCI_DEVICE "pvpanic-pci"
+#define TYPE_PVPANIC_MMIO_DEVICE "pvpanic-mmio"
 
 #define PVPANIC_IOPORT_PROP "ioport"
 
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 08/15] hvf: arm: Ignore writes to CNTP_CTL_EL0
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (6 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 07/15] hw/misc/pvpanic: Add MMIO interface Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-24 10:28 ` [PATCH v4 09/15] gpex: Allow more than 4 legacy IRQs Phil Dennis-Jordan
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

MacOS unconditionally disables interrupts of the physical timer on boot
and then continues to use the virtual one. We don't really want to support
a full physical timer emulation, so let's just ignore those writes.

Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 target/arm/hvf/hvf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 6cea483d422..b45b764dfd0 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
 
 #include "sysemu/runstate.h"
 #include "sysemu/hvf.h"
@@ -184,6 +185,7 @@ void hvf_arm_init_debug(void)
 #define SYSREG_OSLSR_EL1      SYSREG(2, 0, 1, 1, 4)
 #define SYSREG_OSDLR_EL1      SYSREG(2, 0, 1, 3, 4)
 #define SYSREG_CNTPCT_EL0     SYSREG(3, 3, 14, 0, 1)
+#define SYSREG_CNTP_CTL_EL0   SYSREG(3, 3, 14, 2, 1)
 #define SYSREG_PMCR_EL0       SYSREG(3, 3, 9, 12, 0)
 #define SYSREG_PMUSERENR_EL0  SYSREG(3, 3, 9, 14, 0)
 #define SYSREG_PMCNTENSET_EL0 SYSREG(3, 3, 9, 12, 1)
@@ -1620,6 +1622,13 @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
     case SYSREG_OSLAR_EL1:
         env->cp15.oslsr_el1 = val & 1;
         return 0;
+    case SYSREG_CNTP_CTL_EL0:
+        /*
+         * Guests should not rely on the physical counter, but macOS emits
+         * disable writes to it. Let it do so, but ignore the requests.
+         */
+        qemu_log_mask(LOG_UNIMP, "Unsupported write to CNTP_CTL_EL0\n");
+        return 0;
     case SYSREG_OSDLR_EL1:
         /* Dummy register */
         return 0;
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 09/15] gpex: Allow more than 4 legacy IRQs
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (7 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 08/15] hvf: arm: Ignore writes to CNTP_CTL_EL0 Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  5:21   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 10/15] hw/vmapple/aes: Introduce aes engine Phil Dennis-Jordan
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

Some boards such as vmapple don't do real legacy PCI IRQ swizzling.
Instead, they just keep allocating more board IRQ lines for each new
legacy IRQ. Let's support that mode by giving instantiators a new
"nr_irqs" property they can use to support more than 4 legacy IRQ lines.
In this mode, GPEX will export more IRQ lines, one for each device.

Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---

v4:

 * Turned pair of IRQ arrays into array of structs.
 * Simplified swizzling logic selection.

 hw/arm/sbsa-ref.c          |  2 +-
 hw/arm/virt.c              |  2 +-
 hw/i386/microvm.c          |  2 +-
 hw/loongarch/virt.c        |  2 +-
 hw/mips/loongson3_virt.c   |  2 +-
 hw/openrisc/virt.c         | 12 +++++------
 hw/pci-host/gpex.c         | 43 ++++++++++++++++++++++++++++++--------
 hw/riscv/virt.c            | 12 +++++------
 hw/xtensa/virt.c           |  2 +-
 include/hw/pci-host/gpex.h |  7 +++----
 10 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index e3195d54497..7e7322486c2 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -673,7 +673,7 @@ static void create_pcie(SBSAMachineState *sms)
     /* Map IO port space */
     sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
                            qdev_get_gpio_in(sms->gic, irq + i));
         gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8b2b991d978..bd3b17be2ea 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1547,7 +1547,7 @@ static void create_pcie(VirtMachineState *vms)
     /* Map IO port space */
     sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
                            qdev_get_gpio_in(vms->gic, irq + i));
         gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 693099f2256..b3a348bee09 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -139,7 +139,7 @@ static void create_gpex(MicrovmMachineState *mms)
                                     mms->gpex.mmio64.base, mmio64_alias);
     }
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
                            x86ms->gsi[mms->gpex.irq + i]);
     }
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 9a635d1d3d3..50056384994 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -741,7 +741,7 @@ static void virt_devices_init(DeviceState *pch_pic,
     memory_region_add_subregion(get_system_memory(), VIRT_PCI_IO_BASE,
                                 pio_alias);
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         sysbus_connect_irq(d, i,
                            qdev_get_gpio_in(pch_pic, 16 + i));
         gpex_set_irq_num(GPEX_HOST(gpex_dev), i, 16 + i);
diff --git a/hw/mips/loongson3_virt.c b/hw/mips/loongson3_virt.c
index f3b6326cc59..884b5f23a99 100644
--- a/hw/mips/loongson3_virt.c
+++ b/hw/mips/loongson3_virt.c
@@ -458,7 +458,7 @@ static inline void loongson3_virt_devices_init(MachineState *machine,
                                 virt_memmap[VIRT_PCIE_PIO].base, s->pio_alias);
     sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, virt_memmap[VIRT_PCIE_PIO].base);
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         irq = qdev_get_gpio_in(pic, PCIE_IRQ_BASE + i);
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, irq);
         gpex_set_irq_num(GPEX_HOST(dev), i, PCIE_IRQ_BASE + i);
diff --git a/hw/openrisc/virt.c b/hw/openrisc/virt.c
index 47d2c9bd3c7..6f053bf48e0 100644
--- a/hw/openrisc/virt.c
+++ b/hw/openrisc/virt.c
@@ -318,7 +318,7 @@ static void create_pcie_irq_map(void *fdt, char *nodename, int irq_base,
 {
     int pin, dev;
     uint32_t irq_map_stride = 0;
-    uint32_t full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS * 6] = {};
+    uint32_t full_irq_map[PCI_NUM_PINS * PCI_NUM_PINS * 6] = {};
     uint32_t *irq_map = full_irq_map;
 
     /*
@@ -330,11 +330,11 @@ static void create_pcie_irq_map(void *fdt, char *nodename, int irq_base,
      * possible slot) seeing the interrupt-map-mask will allow the table
      * to wrap to any number of devices.
      */
-    for (dev = 0; dev < GPEX_NUM_IRQS; dev++) {
+    for (dev = 0; dev < PCI_NUM_PINS; dev++) {
         int devfn = dev << 3;
 
-        for (pin = 0; pin < GPEX_NUM_IRQS; pin++) {
-            int irq_nr = irq_base + ((pin + PCI_SLOT(devfn)) % GPEX_NUM_IRQS);
+        for (pin = 0; pin < PCI_NUM_PINS; pin++) {
+            int irq_nr = irq_base + ((pin + PCI_SLOT(devfn)) % PCI_NUM_PINS);
             int i = 0;
 
             /* Fill PCI address cells */
@@ -357,7 +357,7 @@ static void create_pcie_irq_map(void *fdt, char *nodename, int irq_base,
     }
 
     qemu_fdt_setprop(fdt, nodename, "interrupt-map", full_irq_map,
-                     GPEX_NUM_IRQS * GPEX_NUM_IRQS *
+                     PCI_NUM_PINS * PCI_NUM_PINS *
                      irq_map_stride * sizeof(uint32_t));
 
     qemu_fdt_setprop_cells(fdt, nodename, "interrupt-map-mask",
@@ -409,7 +409,7 @@ static void openrisc_virt_pcie_init(OR1KVirtState *state,
     memory_region_add_subregion(get_system_memory(), pio_base, alias);
 
     /* Connect IRQ lines. */
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         pcie_irq = get_per_cpu_irq(cpus, num_cpus, irq_base + i);
 
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pcie_irq);
diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
index e9cf455bf52..cd63aa2d3cf 100644
--- a/hw/pci-host/gpex.c
+++ b/hw/pci-host/gpex.c
@@ -32,6 +32,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "hw/irq.h"
+#include "hw/pci/pci_bus.h"
 #include "hw/pci-host/gpex.h"
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
@@ -41,20 +42,25 @@
  * GPEX host
  */
 
+struct GPEXIrq {
+    qemu_irq irq;
+    int irq_num;
+};
+
 static void gpex_set_irq(void *opaque, int irq_num, int level)
 {
     GPEXHost *s = opaque;
 
-    qemu_set_irq(s->irq[irq_num], level);
+    qemu_set_irq(s->irq[irq_num].irq, level);
 }
 
 int gpex_set_irq_num(GPEXHost *s, int index, int gsi)
 {
-    if (index >= GPEX_NUM_IRQS) {
+    if (index >= s->num_irqs) {
         return -EINVAL;
     }
 
-    s->irq_num[index] = gsi;
+    s->irq[index].irq_num = gsi;
     return 0;
 }
 
@@ -62,7 +68,7 @@ static PCIINTxRoute gpex_route_intx_pin_to_irq(void *opaque, int pin)
 {
     PCIINTxRoute route;
     GPEXHost *s = opaque;
-    int gsi = s->irq_num[pin];
+    int gsi = s->irq[pin].irq_num;
 
     route.irq = gsi;
     if (gsi < 0) {
@@ -74,6 +80,13 @@ static PCIINTxRoute gpex_route_intx_pin_to_irq(void *opaque, int pin)
     return route;
 }
 
+static int gpex_swizzle_map_irq_fn(PCIDevice *pci_dev, int pin)
+{
+    PCIBus *bus = pci_device_root_bus(pci_dev);
+
+    return (PCI_SLOT(pci_dev->devfn) + pin) % bus->nirq;
+}
+
 static void gpex_host_realize(DeviceState *dev, Error **errp)
 {
     PCIHostState *pci = PCI_HOST_BRIDGE(dev);
@@ -82,6 +95,8 @@ static void gpex_host_realize(DeviceState *dev, Error **errp)
     PCIExpressHost *pex = PCIE_HOST_BRIDGE(dev);
     int i;
 
+    s->irq = g_malloc0_n(s->num_irqs, sizeof(*s->irq));
+
     pcie_host_mmcfg_init(pex, PCIE_MMCFG_SIZE_MAX);
     sysbus_init_mmio(sbd, &pex->mmio);
 
@@ -128,19 +143,27 @@ static void gpex_host_realize(DeviceState *dev, Error **errp)
         sysbus_init_mmio(sbd, &s->io_ioport);
     }
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
-        sysbus_init_irq(sbd, &s->irq[i]);
-        s->irq_num[i] = -1;
+    for (i = 0; i < s->num_irqs; i++) {
+        sysbus_init_irq(sbd, &s->irq[i].irq);
+        s->irq[i].irq_num = -1;
     }
 
     pci->bus = pci_register_root_bus(dev, "pcie.0", gpex_set_irq,
-                                     pci_swizzle_map_irq_fn, s, &s->io_mmio,
-                                     &s->io_ioport, 0, 4, TYPE_PCIE_BUS);
+                                     gpex_swizzle_map_irq_fn,
+                                     s, &s->io_mmio, &s->io_ioport, 0,
+                                     s->num_irqs, TYPE_PCIE_BUS);
 
     pci_bus_set_route_irq_fn(pci->bus, gpex_route_intx_pin_to_irq);
     qdev_realize(DEVICE(&s->gpex_root), BUS(pci->bus), &error_fatal);
 }
 
+static void gpex_host_unrealize(DeviceState *dev)
+{
+    GPEXHost *s = GPEX_HOST(dev);
+
+    g_free(s->irq);
+}
+
 static const char *gpex_host_root_bus_path(PCIHostState *host_bridge,
                                           PCIBus *rootbus)
 {
@@ -166,6 +189,7 @@ static Property gpex_host_properties[] = {
                        gpex_cfg.mmio64.base, 0),
     DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MMIO_SIZE, GPEXHost,
                      gpex_cfg.mmio64.size, 0),
+    DEFINE_PROP_UINT8("num-irqs", GPEXHost, num_irqs, PCI_NUM_PINS),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -176,6 +200,7 @@ static void gpex_host_class_init(ObjectClass *klass, void *data)
 
     hc->root_bus_path = gpex_host_root_bus_path;
     dc->realize = gpex_host_realize;
+    dc->unrealize = gpex_host_unrealize;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->fw_name = "pci";
     device_class_set_props(dc, gpex_host_properties);
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index ee3129f3b31..08832dc2359 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -167,7 +167,7 @@ static void create_pcie_irq_map(RISCVVirtState *s, void *fdt, char *nodename,
 {
     int pin, dev;
     uint32_t irq_map_stride = 0;
-    uint32_t full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS *
+    uint32_t full_irq_map[PCI_NUM_PINS * PCI_NUM_PINS *
                           FDT_MAX_INT_MAP_WIDTH] = {};
     uint32_t *irq_map = full_irq_map;
 
@@ -179,11 +179,11 @@ static void create_pcie_irq_map(RISCVVirtState *s, void *fdt, char *nodename,
      * possible slot) seeing the interrupt-map-mask will allow the table
      * to wrap to any number of devices.
      */
-    for (dev = 0; dev < GPEX_NUM_IRQS; dev++) {
+    for (dev = 0; dev < PCI_NUM_PINS; dev++) {
         int devfn = dev * 0x8;
 
-        for (pin = 0; pin < GPEX_NUM_IRQS; pin++) {
-            int irq_nr = PCIE_IRQ + ((pin + PCI_SLOT(devfn)) % GPEX_NUM_IRQS);
+        for (pin = 0; pin < PCI_NUM_PINS; pin++) {
+            int irq_nr = PCIE_IRQ + ((pin + PCI_SLOT(devfn)) % PCI_NUM_PINS);
             int i = 0;
 
             /* Fill PCI address cells */
@@ -209,7 +209,7 @@ static void create_pcie_irq_map(RISCVVirtState *s, void *fdt, char *nodename,
     }
 
     qemu_fdt_setprop(fdt, nodename, "interrupt-map", full_irq_map,
-                     GPEX_NUM_IRQS * GPEX_NUM_IRQS *
+                     PCI_NUM_PINS * PCI_NUM_PINS *
                      irq_map_stride * sizeof(uint32_t));
 
     qemu_fdt_setprop_cells(fdt, nodename, "interrupt-map-mask",
@@ -1157,7 +1157,7 @@ static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
 
     sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, pio_base);
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         irq = qdev_get_gpio_in(irqchip, PCIE_IRQ + i);
 
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, irq);
diff --git a/hw/xtensa/virt.c b/hw/xtensa/virt.c
index 5310a888613..8f5c2009d29 100644
--- a/hw/xtensa/virt.c
+++ b/hw/xtensa/virt.c
@@ -93,7 +93,7 @@ static void create_pcie(MachineState *ms, CPUXtensaState *env, int irq_base,
     /* Connect IRQ lines. */
     extints = xtensa_get_extints(env);
 
-    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+    for (i = 0; i < PCI_NUM_PINS; i++) {
         void *q = extints[irq_base + i];
 
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, q);
diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index dce883573ba..84471533af0 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -32,8 +32,6 @@ OBJECT_DECLARE_SIMPLE_TYPE(GPEXHost, GPEX_HOST)
 #define TYPE_GPEX_ROOT_DEVICE "gpex-root"
 OBJECT_DECLARE_SIMPLE_TYPE(GPEXRootState, GPEX_ROOT_DEVICE)
 
-#define GPEX_NUM_IRQS 4
-
 struct GPEXRootState {
     /*< private >*/
     PCIDevice parent_obj;
@@ -49,6 +47,7 @@ struct GPEXConfig {
     PCIBus      *bus;
 };
 
+typedef struct GPEXIrq GPEXIrq;
 struct GPEXHost {
     /*< private >*/
     PCIExpressHost parent_obj;
@@ -60,8 +59,8 @@ struct GPEXHost {
     MemoryRegion io_mmio;
     MemoryRegion io_ioport_window;
     MemoryRegion io_mmio_window;
-    qemu_irq irq[GPEX_NUM_IRQS];
-    int irq_num[GPEX_NUM_IRQS];
+    GPEXIrq *irq;
+    uint8_t num_irqs;
 
     bool allow_unmapped_accesses;
 
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 09/15] gpex: Allow more than 4 legacy IRQs
  2024-10-24 10:28 ` [PATCH v4 09/15] gpex: Allow more than 4 legacy IRQs Phil Dennis-Jordan
@ 2024-10-26  5:21   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  5:21 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> From: Alexander Graf <graf@amazon.com>
> 
> Some boards such as vmapple don't do real legacy PCI IRQ swizzling.
> Instead, they just keep allocating more board IRQ lines for each new
> legacy IRQ. Let's support that mode by giving instantiators a new
> "nr_irqs" property they can use to support more than 4 legacy IRQ lines.
> In this mode, GPEX will export more IRQ lines, one for each device.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>

Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 10/15] hw/vmapple/aes: Introduce aes engine
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (8 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 09/15] gpex: Allow more than 4 legacy IRQs Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  5:40   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 11/15] hw/vmapple/bdif: Introduce vmapple backdoor interface Phil Dennis-Jordan
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

VMApple contains an "aes" engine device that it uses to encrypt and
decrypt its nvram. It has trivial hard coded keys it uses for that
purpose.

Add device emulation for this device model.

Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
v3:

 * Rebased on latest upstream and fixed minor breakages.
 * Replaced legacy device reset method with Resettable method

v4:

 * Improved logging of unimplemented functions and guest errors.
 * Better adherence to naming and coding conventions.
 * Cleaner error handling and recovery, including using g_autoptr

 hw/vmapple/Kconfig      |   2 +
 hw/vmapple/aes.c        | 572 ++++++++++++++++++++++++++++++++++++++++
 hw/vmapple/meson.build  |   1 +
 hw/vmapple/trace-events |  16 ++
 include/qemu/cutils.h   |  15 ++
 util/hexdump.c          |  14 +
 6 files changed, 620 insertions(+)
 create mode 100644 hw/vmapple/aes.c

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index 8b137891791..a73504d5999 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -1 +1,3 @@
+config VMAPPLE_AES
+    bool
 
diff --git a/hw/vmapple/aes.c b/hw/vmapple/aes.c
new file mode 100644
index 00000000000..59cdcd65f90
--- /dev/null
+++ b/hw/vmapple/aes.c
@@ -0,0 +1,572 @@
+/*
+ * QEMU Apple AES device emulation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "trace.h"
+#include "crypto/hash.h"
+#include "crypto/aes.h"
+#include "crypto/cipher.h"
+#include "hw/irq.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "sysemu/dma.h"
+
+#define TYPE_AES          "apple-aes"
+OBJECT_DECLARE_SIMPLE_TYPE(AESState, AES)
+
+#define MAX_FIFO_SIZE     9
+
+#define CMD_KEY           0x1
+#define CMD_KEY_CONTEXT_SHIFT    27
+#define CMD_KEY_CONTEXT_MASK     (0x1 << CMD_KEY_CONTEXT_SHIFT)
+#define CMD_KEY_SELECT_MAX_IDX   0x7
+#define CMD_KEY_SELECT_SHIFT     24
+#define CMD_KEY_SELECT_MASK      (CMD_KEY_SELECT_MAX_IDX << CMD_KEY_SELECT_SHIFT)
+#define CMD_KEY_KEY_LEN_NUM      4u
+#define CMD_KEY_KEY_LEN_SHIFT    22
+#define CMD_KEY_KEY_LEN_MASK     ((CMD_KEY_KEY_LEN_NUM - 1u) << CMD_KEY_KEY_LEN_SHIFT)
+#define CMD_KEY_ENCRYPT_SHIFT    20
+#define CMD_KEY_ENCRYPT_MASK     (0x1 << CMD_KEY_ENCRYPT_SHIFT)
+#define CMD_KEY_BLOCK_MODE_SHIFT 16
+#define CMD_KEY_BLOCK_MODE_MASK  (0x3 << CMD_KEY_BLOCK_MODE_SHIFT)
+#define CMD_IV            0x2
+#define CMD_IV_CONTEXT_SHIFT     26
+#define CMD_IV_CONTEXT_MASK      (0x3 << CMD_KEY_CONTEXT_SHIFT)
+#define CMD_DSB           0x3
+#define CMD_SKG           0x4
+#define CMD_DATA          0x5
+#define CMD_DATA_KEY_CTX_SHIFT   27
+#define CMD_DATA_KEY_CTX_MASK    (0x1 << CMD_DATA_KEY_CTX_SHIFT)
+#define CMD_DATA_IV_CTX_SHIFT    25
+#define CMD_DATA_IV_CTX_MASK     (0x3 << CMD_DATA_IV_CTX_SHIFT)
+#define CMD_DATA_LEN_MASK        0xffffff
+#define CMD_STORE_IV      0x6
+#define CMD_STORE_IV_ADDR_MASK   0xffffff
+#define CMD_WRITE_REG     0x7
+#define CMD_FLAG          0x8
+#define CMD_FLAG_STOP_MASK       BIT(26)
+#define CMD_FLAG_RAISE_IRQ_MASK  BIT(27)
+#define CMD_FLAG_INFO_MASK       0xff
+#define CMD_MAX           0x10
+
+#define CMD_SHIFT         28
+
+#define REG_STATUS            0xc
+#define REG_STATUS_DMA_READ_RUNNING     BIT(0)
+#define REG_STATUS_DMA_READ_PENDING     BIT(1)
+#define REG_STATUS_DMA_WRITE_RUNNING    BIT(2)
+#define REG_STATUS_DMA_WRITE_PENDING    BIT(3)
+#define REG_STATUS_BUSY                 BIT(4)
+#define REG_STATUS_EXECUTING            BIT(5)
+#define REG_STATUS_READY                BIT(6)
+#define REG_STATUS_TEXT_DPA_SEEDED      BIT(7)
+#define REG_STATUS_UNWRAP_DPA_SEEDED    BIT(8)
+
+#define REG_IRQ_STATUS        0x18
+#define REG_IRQ_STATUS_INVALID_CMD      BIT(2)
+#define REG_IRQ_STATUS_FLAG             BIT(5)
+#define REG_IRQ_ENABLE        0x1c
+#define REG_WATERMARK         0x20
+#define REG_Q_STATUS          0x24
+#define REG_FLAG_INFO         0x30
+#define REG_FIFO              0x200
+
+static const uint32_t key_lens[CMD_KEY_KEY_LEN_NUM] = {
+    [0] = 16,
+    [1] = 24,
+    [2] = 32,
+    [3] = 64,
+};
+
+typedef struct Key {
+    uint32_t key_len;
+    uint8_t key[32];
+} Key;
+
+typedef struct IV {
+    uint32_t iv[4];
+} IV;
+
+static Key builtin_keys[CMD_KEY_SELECT_MAX_IDX + 1] = {
+    [1] = {
+        .key_len = 32,
+        .key = { 0x1 },
+    },
+    [2] = {
+        .key_len = 32,
+        .key = { 0x2 },
+    },
+    [3] = {
+        .key_len = 32,
+        .key = { 0x3 },
+    }
+};
+
+struct AESState {
+    SysBusDevice parent_obj;
+
+    qemu_irq irq;
+    MemoryRegion iomem1;
+    MemoryRegion iomem2;
+    AddressSpace *as;
+
+    uint32_t status;
+    uint32_t q_status;
+    uint32_t irq_status;
+    uint32_t irq_enable;
+    uint32_t watermark;
+    uint32_t flag_info;
+    uint32_t fifo[MAX_FIFO_SIZE];
+    uint32_t fifo_idx;
+    Key key[2];
+    IV iv[4];
+    bool is_encrypt;
+    QCryptoCipherMode block_mode;
+};
+
+static void aes_update_irq(AESState *s)
+{
+    qemu_set_irq(s->irq, !!(s->irq_status & s->irq_enable));
+}
+
+static uint64_t aes1_read(void *opaque, hwaddr offset, unsigned size)
+{
+    AESState *s = opaque;
+    uint64_t res = 0;
+
+    switch (offset) {
+    case REG_STATUS:
+        res = s->status;
+        break;
+    case REG_IRQ_STATUS:
+        res = s->irq_status;
+        break;
+    case REG_IRQ_ENABLE:
+        res = s->irq_enable;
+        break;
+    case REG_WATERMARK:
+        res = s->watermark;
+        break;
+    case REG_Q_STATUS:
+        res = s->q_status;
+        break;
+    case REG_FLAG_INFO:
+        res = s->flag_info;
+        break;
+
+    default:
+        qemu_log_mask(LOG_UNIMP, "%s: Unknown AES MMIO offset %" PRIx64 "\n",
+                      __func__, offset);
+        break;
+    }
+
+    trace_aes_read(offset, res);
+
+    return res;
+}
+
+static void fifo_append(AESState *s, uint64_t val)
+{
+    if (s->fifo_idx == MAX_FIFO_SIZE) {
+        /* Exceeded the FIFO. Bail out */
+        return;
+    }
+
+    s->fifo[s->fifo_idx++] = val;
+}
+
+static bool has_payload(AESState *s, uint32_t elems)
+{
+    return s->fifo_idx >= (elems + 1);
+}
+
+static bool cmd_key(AESState *s)
+{
+    uint32_t cmd = s->fifo[0];
+    uint32_t key_select = (cmd & CMD_KEY_SELECT_MASK) >> CMD_KEY_SELECT_SHIFT;
+    uint32_t ctxt = (cmd & CMD_KEY_CONTEXT_MASK) >> CMD_KEY_CONTEXT_SHIFT;
+    uint32_t key_len;
+
+    switch ((cmd & CMD_KEY_BLOCK_MODE_MASK) >> CMD_KEY_BLOCK_MODE_SHIFT) {
+    case 0:
+        s->block_mode = QCRYPTO_CIPHER_MODE_ECB;
+        break;
+    case 1:
+        s->block_mode = QCRYPTO_CIPHER_MODE_CBC;
+        break;
+    default:
+        return false;
+    }
+
+    s->is_encrypt = cmd & CMD_KEY_ENCRYPT_MASK;
+    key_len = key_lens[((cmd & CMD_KEY_KEY_LEN_MASK) >> CMD_KEY_KEY_LEN_SHIFT)];
+
+    if (key_select) {
+        trace_aes_cmd_key_select_builtin(ctxt, key_select,
+                                         s->is_encrypt ? "en" : "de",
+                                         QCryptoCipherMode_str(s->block_mode));
+        s->key[ctxt] = builtin_keys[key_select];
+    } else {
+        trace_aes_cmd_key_select_new(ctxt, key_len,
+                                     s->is_encrypt ? "en" : "de",
+                                     QCryptoCipherMode_str(s->block_mode));
+        if (key_len > sizeof(s->key[ctxt].key)) {
+            return false;
+        }
+        if (!has_payload(s, key_len / sizeof(uint32_t))) {
+            /* wait for payload */
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: No payload\n", __func__);
+            return false;
+        }
+        memcpy(&s->key[ctxt].key, &s->fifo[1], key_len);
+        s->key[ctxt].key_len = key_len;
+    }
+
+    return true;
+}
+
+static bool cmd_iv(AESState *s)
+{
+    uint32_t cmd = s->fifo[0];
+    uint32_t ctxt = (cmd & CMD_IV_CONTEXT_MASK) >> CMD_IV_CONTEXT_SHIFT;
+
+    if (!has_payload(s, 4)) {
+        /* wait for payload */
+        return false;
+    }
+    memcpy(&s->iv[ctxt].iv, &s->fifo[1], sizeof(s->iv[ctxt].iv));
+    trace_aes_cmd_iv(ctxt, s->fifo[1], s->fifo[2], s->fifo[3], s->fifo[4]);
+
+    return true;
+}
+
+static void dump_data(const char *desc, const void *p, size_t len)
+{
+    static const size_t MAX_LEN = 0x1000;
+    char hex[MAX_LEN * 2 + 1] = "";
+
+    if (len > MAX_LEN) {
+        return;
+    }
+
+    qemu_hexdump_to_buffer(hex, sizeof(hex), p, len);
+    trace_aes_dump_data(desc, hex);
+}
+
+static bool cmd_data(AESState *s)
+{
+    uint32_t cmd = s->fifo[0];
+    uint32_t ctxt_iv = 0;
+    uint32_t ctxt_key = (cmd & CMD_DATA_KEY_CTX_MASK) >> CMD_DATA_KEY_CTX_SHIFT;
+    uint32_t len = cmd & CMD_DATA_LEN_MASK;
+    uint64_t src_addr = s->fifo[2];
+    uint64_t dst_addr = s->fifo[3];
+    QCryptoCipherAlgo alg;
+    g_autoptr(QCryptoCipher) cipher = NULL;
+    g_autoptr(GByteArray) src = NULL;
+    g_autoptr(GByteArray) dst = NULL;
+    MemTxResult r;
+
+    src_addr |= ((uint64_t)s->fifo[1] << 16) & 0xffff00000000ULL;
+    dst_addr |= ((uint64_t)s->fifo[1] << 32) & 0xffff00000000ULL;
+
+    trace_aes_cmd_data(ctxt_key, ctxt_iv, src_addr, dst_addr, len);
+
+    if (!has_payload(s, 3)) {
+        /* wait for payload */
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: No payload\n", __func__);
+        return false;
+    }
+
+    if (ctxt_key >= ARRAY_SIZE(s->key) ||
+        ctxt_iv >= ARRAY_SIZE(s->iv)) {
+        /* Invalid input */
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Invalid key or iv\n", __func__);
+        return false;
+    }
+
+    src = g_byte_array_sized_new(len);
+    g_byte_array_set_size(src, len);
+    dst = g_byte_array_sized_new(len);
+    g_byte_array_set_size(dst, len);
+
+    r = dma_memory_read(s->as, src_addr, src->data, len, MEMTXATTRS_UNSPECIFIED);
+    if (r != MEMTX_OK) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: DMA read of %"PRIu32" bytes "
+                      "from 0x%"PRIx64" failed. (r=%d)\n",
+                      __func__, len, src_addr, r);
+        return false;
+    }
+
+    dump_data("cmd_data(): src_data=", src->data, len);
+
+    switch (s->key[ctxt_key].key_len) {
+    case 128 / 8:
+        alg = QCRYPTO_CIPHER_ALGO_AES_128;
+        break;
+    case 192 / 8:
+        alg = QCRYPTO_CIPHER_ALGO_AES_192;
+        break;
+    case 256 / 8:
+        alg = QCRYPTO_CIPHER_ALGO_AES_256;
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Invalid key length\n", __func__);
+        return false;
+    }
+    cipher = qcrypto_cipher_new(alg, s->block_mode,
+                                s->key[ctxt_key].key,
+                                s->key[ctxt_key].key_len, NULL);
+    if (!cipher) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to create cipher object\n",
+                      __func__);
+        return false;
+    }
+    if (s->block_mode != QCRYPTO_CIPHER_MODE_ECB) {
+        if (qcrypto_cipher_setiv(cipher, (void *)s->iv[ctxt_iv].iv,
+                                 sizeof(s->iv[ctxt_iv].iv), NULL) != 0) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to set IV\n", __func__);
+            return false;
+        }
+    }
+    if (s->is_encrypt) {
+        if (qcrypto_cipher_encrypt(cipher, src->data, dst->data, len, NULL) != 0) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: Encryption failed\n", __func__);
+            return false;
+        }
+    } else {
+        if (qcrypto_cipher_decrypt(cipher, src->data, dst->data, len, NULL) != 0) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: Decryption failed\n", __func__);
+            return false;
+        }
+    }
+
+    dump_data("cmd_data(): dst_data=", dst->data, len);
+    r = dma_memory_write(s->as, dst_addr, dst->data, len, MEMTXATTRS_UNSPECIFIED);
+    if (r != MEMTX_OK) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: DMA write of %"PRIu32" bytes "
+                      "to 0x%"PRIx64" failed. (r=%d)\n",
+                      __func__, len, src_addr, r);
+        return false;
+    }
+
+    return true;
+}
+
+static bool cmd_store_iv(AESState *s)
+{
+    uint32_t cmd = s->fifo[0];
+    uint32_t ctxt = (cmd & CMD_IV_CONTEXT_MASK) >> CMD_IV_CONTEXT_SHIFT;
+    uint64_t addr = s->fifo[1];
+
+    if (!has_payload(s, 1)) {
+        /* wait for payload */
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: No payload\n", __func__);
+        return false;
+    }
+
+    if (ctxt >= ARRAY_SIZE(s->iv)) {
+        /* Invalid context selected */
+        return false;
+    }
+
+    addr |= ((uint64_t)cmd << 32) & 0xff00000000ULL;
+    cpu_physical_memory_write(addr, &s->iv[ctxt].iv, sizeof(s->iv[ctxt].iv));
+
+    trace_aes_cmd_store_iv(ctxt, addr, s->iv[ctxt].iv[0], s->iv[ctxt].iv[1],
+                           s->iv[ctxt].iv[2], s->iv[ctxt].iv[3]);
+
+    return true;
+}
+
+static bool cmd_flag(AESState *s)
+{
+    uint32_t cmd = s->fifo[0];
+    uint32_t raise_irq = cmd & CMD_FLAG_RAISE_IRQ_MASK;
+
+    /* We always process data when it's coming in, so fire an IRQ immediately */
+    if (raise_irq) {
+        s->irq_status |= REG_IRQ_STATUS_FLAG;
+    }
+
+    s->flag_info = cmd & CMD_FLAG_INFO_MASK;
+
+    trace_aes_cmd_flag(!!raise_irq, s->flag_info);
+
+    return true;
+}
+
+static void fifo_process(AESState *s)
+{
+    uint32_t cmd = s->fifo[0] >> CMD_SHIFT;
+    bool success = false;
+
+    if (!s->fifo_idx) {
+        return;
+    }
+
+    switch (cmd) {
+    case CMD_KEY:
+        success = cmd_key(s);
+        break;
+    case CMD_IV:
+        success = cmd_iv(s);
+        break;
+    case CMD_DATA:
+        success = cmd_data(s);
+        break;
+    case CMD_STORE_IV:
+        success = cmd_store_iv(s);
+        break;
+    case CMD_FLAG:
+        success = cmd_flag(s);
+        break;
+    default:
+        s->irq_status |= REG_IRQ_STATUS_INVALID_CMD;
+        break;
+    }
+
+    if (success) {
+        s->fifo_idx = 0;
+    }
+
+    trace_aes_fifo_process(cmd, success ? 1 : 0);
+}
+
+static void aes1_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
+{
+    AESState *s = opaque;
+
+    trace_aes_write(offset, val);
+
+    switch (offset) {
+    case REG_IRQ_STATUS:
+        s->irq_status &= ~val;
+        break;
+    case REG_IRQ_ENABLE:
+        s->irq_enable = val;
+        break;
+    case REG_FIFO:
+        fifo_append(s, val);
+        fifo_process(s);
+        break;
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "%s: Unknown AES MMIO offset %"PRIx64", data %"PRIx64"\n",
+                      __func__, offset, val);
+        return;
+    }
+
+    aes_update_irq(s);
+}
+
+static const MemoryRegionOps aes1_ops = {
+    .read = aes1_read,
+    .write = aes1_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static uint64_t aes2_read(void *opaque, hwaddr offset, unsigned size)
+{
+    uint64_t res = 0;
+
+    switch (offset) {
+    case 0:
+        res = 0;
+        break;
+    default:
+        trace_aes_2_read_unknown(offset);
+        break;
+    }
+
+    trace_aes_2_read(offset, res);
+
+    return res;
+}
+
+static void aes2_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
+{
+    trace_aes_2_write(offset, val);
+
+    switch (offset) {
+    default:
+        trace_aes_2_write_unknown(offset);
+        return;
+    }
+}
+
+static const MemoryRegionOps aes2_ops = {
+    .read = aes2_read,
+    .write = aes2_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void aes_reset(Object *obj, ResetType type)
+{
+    AESState *s = AES(obj);
+
+    s->status = 0x3f80;
+    s->q_status = 2;
+    s->irq_status = 0;
+    s->irq_enable = 0;
+    s->watermark = 0;
+}
+
+static void aes_init(Object *obj)
+{
+    AESState *s = AES(obj);
+
+    memory_region_init_io(&s->iomem1, obj, &aes1_ops, s, TYPE_AES, 0x4000);
+    memory_region_init_io(&s->iomem2, obj, &aes2_ops, s, TYPE_AES, 0x4000);
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem1);
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem2);
+    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq);
+    s->as = &address_space_memory;
+}
+
+static void aes_class_init(ObjectClass *klass, void *data)
+{
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
+
+    rc->phases.hold = aes_reset;
+}
+
+static const TypeInfo aes_info = {
+    .name          = TYPE_AES,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(AESState),
+    .class_init    = aes_class_init,
+    .instance_init = aes_init,
+};
+
+static void aes_register_types(void)
+{
+    type_register_static(&aes_info);
+}
+
+type_init(aes_register_types)
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
index e69de29bb2d..bcd4dcb28d2 100644
--- a/hw/vmapple/meson.build
+++ b/hw/vmapple/meson.build
@@ -0,0 +1 @@
+system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
index 9ccc5790487..12757bc0852 100644
--- a/hw/vmapple/trace-events
+++ b/hw/vmapple/trace-events
@@ -1,2 +1,18 @@
 # See docs/devel/tracing.rst for syntax documentation.
 
+# aes.c
+aes_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
+aes_cmd_key_select_builtin(uint32_t ctx, uint32_t key_id, const char *direction, const char *cipher) "[%d] Selecting builtin key %d to %scrypt with %s"
+aes_cmd_key_select_new(uint32_t ctx, uint32_t key_len, const char *direction, const char *cipher) "[%d] Selecting new key size=%d to %scrypt with %s"
+aes_cmd_iv(uint32_t ctx, uint32_t iv0, uint32_t iv1, uint32_t iv2, uint32_t iv3) "[%d] 0x%08x 0x%08x 0x%08x 0x%08x"
+aes_cmd_data(uint32_t key, uint32_t iv, uint64_t src, uint64_t dst, uint32_t len) "[key=%d iv=%d] src=0x%"PRIx64" dst=0x%"PRIx64" len=0x%x"
+aes_cmd_store_iv(uint32_t ctx, uint64_t addr, uint32_t iv0, uint32_t iv1, uint32_t iv2, uint32_t iv3) "[%d] addr=0x%"PRIx64"x -> 0x%08x 0x%08x 0x%08x 0x%08x"
+aes_cmd_flag(uint32_t raise, uint32_t flag_info) "raise=%d flag_info=0x%x"
+aes_fifo_process(uint32_t cmd, uint32_t success) "cmd=%d success=%d"
+aes_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
+aes_2_read_unknown(uint64_t offset) "offset=0x%"PRIx64
+aes_2_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
+aes_2_write_unknown(uint64_t offset) "offset=0x%"PRIx64
+aes_2_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
+aes_dump_data(const char *desc, const char *hex) "%s%s"
+
diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 34a9b9b2204..36c68ce86c5 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -302,4 +302,19 @@ GString *qemu_hexdump_line(GString *str, const void *buf, size_t len,
 void qemu_hexdump(FILE *fp, const char *prefix,
                   const void *bufptr, size_t size);
 
+/**
+ * qemu_hexdump_to_buffer:
+ * @buffer: output string buffer
+ * @buffer_size: amount of available space in buffer. Must be at least
+ *               data_size*2+1.
+ * @data: input bytes
+ * @data_size: number of bytes in data
+ *
+ * Converts the @data_size bytes in @data into hex digit pairs, writing them to
+ * @buffer. Finally, a nul terminating character is written; @buffer therefore
+ * needs space for (data_size*2+1) chars.
+ */
+void qemu_hexdump_to_buffer(char *restrict buffer, size_t buffer_size,
+                            const uint8_t *restrict data, size_t data_size);
+
 #endif
diff --git a/util/hexdump.c b/util/hexdump.c
index ae0d4992dcf..86345db20a8 100644
--- a/util/hexdump.c
+++ b/util/hexdump.c
@@ -97,3 +97,17 @@ void qemu_hexdump(FILE *fp, const char *prefix,
     }
 
 }
+
+void qemu_hexdump_to_buffer(char *restrict buffer, size_t buffer_size,
+                            const uint8_t *restrict data, size_t data_size)
+{
+    size_t i;
+
+    assert(buffer_size >= data_size * 2 + 1 && buffer_size > data_size);
+    for (i = 0; i < data_size; i++) {
+        uint8_t val = data[i];
+        *(buffer++) = hexdump_nibble(val >> 4);
+        *(buffer++) = hexdump_nibble(val & 0xf);
+    }
+    *buffer = '\0';
+}
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 10/15] hw/vmapple/aes: Introduce aes engine
  2024-10-24 10:28 ` [PATCH v4 10/15] hw/vmapple/aes: Introduce aes engine Phil Dennis-Jordan
@ 2024-10-26  5:40   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  5:40 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> From: Alexander Graf <graf@amazon.com>
> 
> VMApple contains an "aes" engine device that it uses to encrypt and
> decrypt its nvram. It has trivial hard coded keys it uses for that
> purpose.
> 
> Add device emulation for this device model.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
> v3:
> 
>   * Rebased on latest upstream and fixed minor breakages.
>   * Replaced legacy device reset method with Resettable method
> 
> v4:
> 
>   * Improved logging of unimplemented functions and guest errors.
>   * Better adherence to naming and coding conventions.
>   * Cleaner error handling and recovery, including using g_autoptr
> 
>   hw/vmapple/Kconfig      |   2 +
>   hw/vmapple/aes.c        | 572 ++++++++++++++++++++++++++++++++++++++++
>   hw/vmapple/meson.build  |   1 +
>   hw/vmapple/trace-events |  16 ++
>   include/qemu/cutils.h   |  15 ++
>   util/hexdump.c          |  14 +
>   6 files changed, 620 insertions(+)
>   create mode 100644 hw/vmapple/aes.c
> 
> diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
> index 8b137891791..a73504d5999 100644
> --- a/hw/vmapple/Kconfig
> +++ b/hw/vmapple/Kconfig
> @@ -1 +1,3 @@
> +config VMAPPLE_AES
> +    bool
>   
> diff --git a/hw/vmapple/aes.c b/hw/vmapple/aes.c
> new file mode 100644
> index 00000000000..59cdcd65f90
> --- /dev/null
> +++ b/hw/vmapple/aes.c
> @@ -0,0 +1,572 @@
> +/*
> + * QEMU Apple AES device emulation
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "trace.h"
> +#include "crypto/hash.h"
> +#include "crypto/aes.h"
> +#include "crypto/cipher.h"
> +#include "hw/irq.h"
> +#include "hw/sysbus.h"
> +#include "migration/vmstate.h"
> +#include "qemu/cutils.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "sysemu/dma.h"
> +
> +#define TYPE_AES          "apple-aes"
> +OBJECT_DECLARE_SIMPLE_TYPE(AESState, AES)
> +
> +#define MAX_FIFO_SIZE     9
> +
> +#define CMD_KEY           0x1
> +#define CMD_KEY_CONTEXT_SHIFT    27
> +#define CMD_KEY_CONTEXT_MASK     (0x1 << CMD_KEY_CONTEXT_SHIFT)
> +#define CMD_KEY_SELECT_MAX_IDX   0x7
> +#define CMD_KEY_SELECT_SHIFT     24
> +#define CMD_KEY_SELECT_MASK      (CMD_KEY_SELECT_MAX_IDX << CMD_KEY_SELECT_SHIFT)
> +#define CMD_KEY_KEY_LEN_NUM      4u
> +#define CMD_KEY_KEY_LEN_SHIFT    22
> +#define CMD_KEY_KEY_LEN_MASK     ((CMD_KEY_KEY_LEN_NUM - 1u) << CMD_KEY_KEY_LEN_SHIFT)
> +#define CMD_KEY_ENCRYPT_SHIFT    20
> +#define CMD_KEY_ENCRYPT_MASK     (0x1 << CMD_KEY_ENCRYPT_SHIFT)
> +#define CMD_KEY_BLOCK_MODE_SHIFT 16
> +#define CMD_KEY_BLOCK_MODE_MASK  (0x3 << CMD_KEY_BLOCK_MODE_SHIFT)
> +#define CMD_IV            0x2
> +#define CMD_IV_CONTEXT_SHIFT     26
> +#define CMD_IV_CONTEXT_MASK      (0x3 << CMD_KEY_CONTEXT_SHIFT)
> +#define CMD_DSB           0x3
> +#define CMD_SKG           0x4
> +#define CMD_DATA          0x5
> +#define CMD_DATA_KEY_CTX_SHIFT   27
> +#define CMD_DATA_KEY_CTX_MASK    (0x1 << CMD_DATA_KEY_CTX_SHIFT)
> +#define CMD_DATA_IV_CTX_SHIFT    25
> +#define CMD_DATA_IV_CTX_MASK     (0x3 << CMD_DATA_IV_CTX_SHIFT)
> +#define CMD_DATA_LEN_MASK        0xffffff
> +#define CMD_STORE_IV      0x6
> +#define CMD_STORE_IV_ADDR_MASK   0xffffff
> +#define CMD_WRITE_REG     0x7
> +#define CMD_FLAG          0x8
> +#define CMD_FLAG_STOP_MASK       BIT(26)
> +#define CMD_FLAG_RAISE_IRQ_MASK  BIT(27)
> +#define CMD_FLAG_INFO_MASK       0xff
> +#define CMD_MAX           0x10
> +
> +#define CMD_SHIFT         28
> +
> +#define REG_STATUS            0xc
> +#define REG_STATUS_DMA_READ_RUNNING     BIT(0)
> +#define REG_STATUS_DMA_READ_PENDING     BIT(1)
> +#define REG_STATUS_DMA_WRITE_RUNNING    BIT(2)
> +#define REG_STATUS_DMA_WRITE_PENDING    BIT(3)
> +#define REG_STATUS_BUSY                 BIT(4)
> +#define REG_STATUS_EXECUTING            BIT(5)
> +#define REG_STATUS_READY                BIT(6)
> +#define REG_STATUS_TEXT_DPA_SEEDED      BIT(7)
> +#define REG_STATUS_UNWRAP_DPA_SEEDED    BIT(8)
> +
> +#define REG_IRQ_STATUS        0x18
> +#define REG_IRQ_STATUS_INVALID_CMD      BIT(2)
> +#define REG_IRQ_STATUS_FLAG             BIT(5)
> +#define REG_IRQ_ENABLE        0x1c
> +#define REG_WATERMARK         0x20
> +#define REG_Q_STATUS          0x24
> +#define REG_FLAG_INFO         0x30
> +#define REG_FIFO              0x200
> +
> +static const uint32_t key_lens[CMD_KEY_KEY_LEN_NUM] = {
> +    [0] = 16,
> +    [1] = 24,
> +    [2] = 32,
> +    [3] = 64,
> +};
> +
> +typedef struct Key {
> +    uint32_t key_len;
> +    uint8_t key[32];
> +} Key;
> +
> +typedef struct IV {
> +    uint32_t iv[4];
> +} IV;
> +
> +static Key builtin_keys[CMD_KEY_SELECT_MAX_IDX + 1] = {
> +    [1] = {
> +        .key_len = 32,
> +        .key = { 0x1 },
> +    },
> +    [2] = {
> +        .key_len = 32,
> +        .key = { 0x2 },
> +    },
> +    [3] = {
> +        .key_len = 32,
> +        .key = { 0x3 },
> +    }
> +};
> +
> +struct AESState {
> +    SysBusDevice parent_obj;
> +
> +    qemu_irq irq;
> +    MemoryRegion iomem1;
> +    MemoryRegion iomem2;
> +    AddressSpace *as;
> +
> +    uint32_t status;
> +    uint32_t q_status;
> +    uint32_t irq_status;
> +    uint32_t irq_enable;
> +    uint32_t watermark;
> +    uint32_t flag_info;
> +    uint32_t fifo[MAX_FIFO_SIZE];
> +    uint32_t fifo_idx;
> +    Key key[2];
> +    IV iv[4];
> +    bool is_encrypt;
> +    QCryptoCipherMode block_mode;
> +};
> +
> +static void aes_update_irq(AESState *s)
> +{
> +    qemu_set_irq(s->irq, !!(s->irq_status & s->irq_enable));
> +}
> +
> +static uint64_t aes1_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    AESState *s = opaque;
> +    uint64_t res = 0;
> +
> +    switch (offset) {
> +    case REG_STATUS:
> +        res = s->status;
> +        break;
> +    case REG_IRQ_STATUS:
> +        res = s->irq_status;
> +        break;
> +    case REG_IRQ_ENABLE:
> +        res = s->irq_enable;
> +        break;
> +    case REG_WATERMARK:
> +        res = s->watermark;
> +        break;
> +    case REG_Q_STATUS:
> +        res = s->q_status;
> +        break;
> +    case REG_FLAG_INFO:
> +        res = s->flag_info;
> +        break;
> +
> +    default:
> +        qemu_log_mask(LOG_UNIMP, "%s: Unknown AES MMIO offset %" PRIx64 "\n",
> +                      __func__, offset);
> +        break;
> +    }
> +
> +    trace_aes_read(offset, res);
> +
> +    return res;
> +}
> +
> +static void fifo_append(AESState *s, uint64_t val)
> +{
> +    if (s->fifo_idx == MAX_FIFO_SIZE) {
> +        /* Exceeded the FIFO. Bail out */
> +        return;
> +    }
> +
> +    s->fifo[s->fifo_idx++] = val;
> +}
> +
> +static bool has_payload(AESState *s, uint32_t elems)
> +{
> +    return s->fifo_idx >= (elems + 1);
> +}
> +
> +static bool cmd_key(AESState *s)
> +{
> +    uint32_t cmd = s->fifo[0];
> +    uint32_t key_select = (cmd & CMD_KEY_SELECT_MASK) >> CMD_KEY_SELECT_SHIFT;
> +    uint32_t ctxt = (cmd & CMD_KEY_CONTEXT_MASK) >> CMD_KEY_CONTEXT_SHIFT;
> +    uint32_t key_len;
> +
> +    switch ((cmd & CMD_KEY_BLOCK_MODE_MASK) >> CMD_KEY_BLOCK_MODE_SHIFT) {
> +    case 0:
> +        s->block_mode = QCRYPTO_CIPHER_MODE_ECB;
> +        break;
> +    case 1:
> +        s->block_mode = QCRYPTO_CIPHER_MODE_CBC;
> +        break;
> +    default:
> +        return false;
> +    }
> +
> +    s->is_encrypt = cmd & CMD_KEY_ENCRYPT_MASK;
> +    key_len = key_lens[((cmd & CMD_KEY_KEY_LEN_MASK) >> CMD_KEY_KEY_LEN_SHIFT)];
> +
> +    if (key_select) {
> +        trace_aes_cmd_key_select_builtin(ctxt, key_select,
> +                                         s->is_encrypt ? "en" : "de",
> +                                         QCryptoCipherMode_str(s->block_mode));
> +        s->key[ctxt] = builtin_keys[key_select];
> +    } else {
> +        trace_aes_cmd_key_select_new(ctxt, key_len,
> +                                     s->is_encrypt ? "en" : "de",
> +                                     QCryptoCipherMode_str(s->block_mode));
> +        if (key_len > sizeof(s->key[ctxt].key)) {
> +            return false;
> +        }
> +        if (!has_payload(s, key_len / sizeof(uint32_t))) {
> +            /* wait for payload */
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: No payload\n", __func__);
> +            return false;
> +        }
> +        memcpy(&s->key[ctxt].key, &s->fifo[1], key_len);
> +        s->key[ctxt].key_len = key_len;
> +    }
> +
> +    return true;
> +}
> +
> +static bool cmd_iv(AESState *s)
> +{
> +    uint32_t cmd = s->fifo[0];
> +    uint32_t ctxt = (cmd & CMD_IV_CONTEXT_MASK) >> CMD_IV_CONTEXT_SHIFT;
> +
> +    if (!has_payload(s, 4)) {
> +        /* wait for payload */
> +        return false;
> +    }
> +    memcpy(&s->iv[ctxt].iv, &s->fifo[1], sizeof(s->iv[ctxt].iv));
> +    trace_aes_cmd_iv(ctxt, s->fifo[1], s->fifo[2], s->fifo[3], s->fifo[4]);
> +
> +    return true;
> +}
> +
> +static void dump_data(const char *desc, const void *p, size_t len)
> +{
> +    static const size_t MAX_LEN = 0x1000;
> +    char hex[MAX_LEN * 2 + 1] = "";
> +
> +    if (len > MAX_LEN) {
> +        return;
> +    }
> +
> +    qemu_hexdump_to_buffer(hex, sizeof(hex), p, len);
> +    trace_aes_dump_data(desc, hex);
> +}
> +
> +static bool cmd_data(AESState *s)
> +{
> +    uint32_t cmd = s->fifo[0];
> +    uint32_t ctxt_iv = 0;
> +    uint32_t ctxt_key = (cmd & CMD_DATA_KEY_CTX_MASK) >> CMD_DATA_KEY_CTX_SHIFT;
> +    uint32_t len = cmd & CMD_DATA_LEN_MASK;
> +    uint64_t src_addr = s->fifo[2];
> +    uint64_t dst_addr = s->fifo[3];
> +    QCryptoCipherAlgo alg;
> +    g_autoptr(QCryptoCipher) cipher = NULL;
> +    g_autoptr(GByteArray) src = NULL;
> +    g_autoptr(GByteArray) dst = NULL;
> +    MemTxResult r;
> +
> +    src_addr |= ((uint64_t)s->fifo[1] << 16) & 0xffff00000000ULL;
> +    dst_addr |= ((uint64_t)s->fifo[1] << 32) & 0xffff00000000ULL;
> +
> +    trace_aes_cmd_data(ctxt_key, ctxt_iv, src_addr, dst_addr, len);
> +
> +    if (!has_payload(s, 3)) {
> +        /* wait for payload */
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: No payload\n", __func__);
> +        return false;
> +    }
> +
> +    if (ctxt_key >= ARRAY_SIZE(s->key) ||
> +        ctxt_iv >= ARRAY_SIZE(s->iv)) {
> +        /* Invalid input */
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Invalid key or iv\n", __func__);
> +        return false;
> +    }
> +
> +    src = g_byte_array_sized_new(len);
> +    g_byte_array_set_size(src, len);
> +    dst = g_byte_array_sized_new(len);
> +    g_byte_array_set_size(dst, len);
> +
> +    r = dma_memory_read(s->as, src_addr, src->data, len, MEMTXATTRS_UNSPECIFIED);
> +    if (r != MEMTX_OK) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: DMA read of %"PRIu32" bytes "
> +                      "from 0x%"PRIx64" failed. (r=%d)\n",
> +                      __func__, len, src_addr, r);
> +        return false;
> +    }
> +
> +    dump_data("cmd_data(): src_data=", src->data, len);
> +
> +    switch (s->key[ctxt_key].key_len) {
> +    case 128 / 8:
> +        alg = QCRYPTO_CIPHER_ALGO_AES_128;
> +        break;
> +    case 192 / 8:
> +        alg = QCRYPTO_CIPHER_ALGO_AES_192;
> +        break;
> +    case 256 / 8:
> +        alg = QCRYPTO_CIPHER_ALGO_AES_256;
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Invalid key length\n", __func__);
> +        return false;
> +    }
> +    cipher = qcrypto_cipher_new(alg, s->block_mode,
> +                                s->key[ctxt_key].key,
> +                                s->key[ctxt_key].key_len, NULL);
> +    if (!cipher) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to create cipher object\n",
> +                      __func__);
> +        return false;
> +    }
> +    if (s->block_mode != QCRYPTO_CIPHER_MODE_ECB) {
> +        if (qcrypto_cipher_setiv(cipher, (void *)s->iv[ctxt_iv].iv,
> +                                 sizeof(s->iv[ctxt_iv].iv), NULL) != 0) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to set IV\n", __func__);
> +            return false;
> +        }
> +    }
> +    if (s->is_encrypt) {
> +        if (qcrypto_cipher_encrypt(cipher, src->data, dst->data, len, NULL) != 0) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: Encryption failed\n", __func__);
> +            return false;
> +        }
> +    } else {
> +        if (qcrypto_cipher_decrypt(cipher, src->data, dst->data, len, NULL) != 0) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: Decryption failed\n", __func__);
> +            return false;
> +        }
> +    }
> +
> +    dump_data("cmd_data(): dst_data=", dst->data, len);
> +    r = dma_memory_write(s->as, dst_addr, dst->data, len, MEMTXATTRS_UNSPECIFIED);
> +    if (r != MEMTX_OK) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: DMA write of %"PRIu32" bytes "
> +                      "to 0x%"PRIx64" failed. (r=%d)\n",
> +                      __func__, len, src_addr, r);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static bool cmd_store_iv(AESState *s)
> +{
> +    uint32_t cmd = s->fifo[0];
> +    uint32_t ctxt = (cmd & CMD_IV_CONTEXT_MASK) >> CMD_IV_CONTEXT_SHIFT;
> +    uint64_t addr = s->fifo[1];
> +
> +    if (!has_payload(s, 1)) {
> +        /* wait for payload */
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: No payload\n", __func__);
> +        return false;
> +    }
> +
> +    if (ctxt >= ARRAY_SIZE(s->iv)) {
> +        /* Invalid context selected */
> +        return false;
> +    }
> +
> +    addr |= ((uint64_t)cmd << 32) & 0xff00000000ULL;
> +    cpu_physical_memory_write(addr, &s->iv[ctxt].iv, sizeof(s->iv[ctxt].iv));
> +
> +    trace_aes_cmd_store_iv(ctxt, addr, s->iv[ctxt].iv[0], s->iv[ctxt].iv[1],
> +                           s->iv[ctxt].iv[2], s->iv[ctxt].iv[3]);
> +
> +    return true;
> +}
> +
> +static bool cmd_flag(AESState *s)
> +{
> +    uint32_t cmd = s->fifo[0];
> +    uint32_t raise_irq = cmd & CMD_FLAG_RAISE_IRQ_MASK;
> +
> +    /* We always process data when it's coming in, so fire an IRQ immediately */
> +    if (raise_irq) {
> +        s->irq_status |= REG_IRQ_STATUS_FLAG;
> +    }
> +
> +    s->flag_info = cmd & CMD_FLAG_INFO_MASK;
> +
> +    trace_aes_cmd_flag(!!raise_irq, s->flag_info);
> +
> +    return true;
> +}
> +
> +static void fifo_process(AESState *s)
> +{
> +    uint32_t cmd = s->fifo[0] >> CMD_SHIFT;
> +    bool success = false;
> +
> +    if (!s->fifo_idx) {
> +        return;
> +    }
> +
> +    switch (cmd) {
> +    case CMD_KEY:
> +        success = cmd_key(s);
> +        break;
> +    case CMD_IV:
> +        success = cmd_iv(s);
> +        break;
> +    case CMD_DATA:
> +        success = cmd_data(s);
> +        break;
> +    case CMD_STORE_IV:
> +        success = cmd_store_iv(s);
> +        break;
> +    case CMD_FLAG:
> +        success = cmd_flag(s);
> +        break;
> +    default:
> +        s->irq_status |= REG_IRQ_STATUS_INVALID_CMD;
> +        break;
> +    }
> +
> +    if (success) {
> +        s->fifo_idx = 0;
> +    }
> +
> +    trace_aes_fifo_process(cmd, success ? 1 : 0);
> +}
> +
> +static void aes1_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
> +{
> +    AESState *s = opaque;
> +
> +    trace_aes_write(offset, val);
> +
> +    switch (offset) {
> +    case REG_IRQ_STATUS:
> +        s->irq_status &= ~val;
> +        break;
> +    case REG_IRQ_ENABLE:
> +        s->irq_enable = val;
> +        break;
> +    case REG_FIFO:
> +        fifo_append(s, val);
> +        fifo_process(s);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_UNIMP,
> +                      "%s: Unknown AES MMIO offset %"PRIx64", data %"PRIx64"\n",
> +                      __func__, offset, val);
> +        return;
> +    }
> +
> +    aes_update_irq(s);
> +}
> +
> +static const MemoryRegionOps aes1_ops = {
> +    .read = aes1_read,
> +    .write = aes1_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .valid = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 4,
> +    },
> +};
> +
> +static uint64_t aes2_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    uint64_t res = 0;
> +
> +    switch (offset) {
> +    case 0:
> +        res = 0;
> +        break;
> +    default:
> +        trace_aes_2_read_unknown(offset);

aes1_read uses LOG_UNIMP. Let's keep them consistent.

> +        break;
> +    }
> +
> +    trace_aes_2_read(offset, res);
> +
> +    return res;
> +}
> +
> +static void aes2_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
> +{
> +    trace_aes_2_write(offset, val);
> +
> +    switch (offset) {
> +    default:
> +        trace_aes_2_write_unknown(offset);
> +        return;
> +    }
> +}
> +
> +static const MemoryRegionOps aes2_ops = {
> +    .read = aes2_read,
> +    .write = aes2_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .valid = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 4,
> +    },
> +};
> +
> +static void aes_reset(Object *obj, ResetType type)
> +{
> +    AESState *s = AES(obj);
> +
> +    s->status = 0x3f80;
> +    s->q_status = 2;
> +    s->irq_status = 0;
> +    s->irq_enable = 0;
> +    s->watermark = 0;
> +}
> +
> +static void aes_init(Object *obj)
> +{
> +    AESState *s = AES(obj);
> +
> +    memory_region_init_io(&s->iomem1, obj, &aes1_ops, s, TYPE_AES, 0x4000);
> +    memory_region_init_io(&s->iomem2, obj, &aes2_ops, s, TYPE_AES, 0x4000);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem1);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem2);
> +    sysbus_init_irq(SYS_BUS_DEVICE(s), &s->irq);
> +    s->as = &address_space_memory;
> +}
> +
> +static void aes_class_init(ObjectClass *klass, void *data)
> +{
> +    ResettableClass *rc = RESETTABLE_CLASS(klass);
> +
> +    rc->phases.hold = aes_reset;
> +}
> +
> +static const TypeInfo aes_info = {
> +    .name          = TYPE_AES,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(AESState),
> +    .class_init    = aes_class_init,
> +    .instance_init = aes_init,
> +};
> +
> +static void aes_register_types(void)
> +{
> +    type_register_static(&aes_info);
> +}
> +
> +type_init(aes_register_types)
> diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
> index e69de29bb2d..bcd4dcb28d2 100644
> --- a/hw/vmapple/meson.build
> +++ b/hw/vmapple/meson.build
> @@ -0,0 +1 @@
> +system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
> diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
> index 9ccc5790487..12757bc0852 100644
> --- a/hw/vmapple/trace-events
> +++ b/hw/vmapple/trace-events
> @@ -1,2 +1,18 @@
>   # See docs/devel/tracing.rst for syntax documentation.
>   
> +# aes.c
> +aes_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
> +aes_cmd_key_select_builtin(uint32_t ctx, uint32_t key_id, const char *direction, const char *cipher) "[%d] Selecting builtin key %d to %scrypt with %s"
> +aes_cmd_key_select_new(uint32_t ctx, uint32_t key_len, const char *direction, const char *cipher) "[%d] Selecting new key size=%d to %scrypt with %s"
> +aes_cmd_iv(uint32_t ctx, uint32_t iv0, uint32_t iv1, uint32_t iv2, uint32_t iv3) "[%d] 0x%08x 0x%08x 0x%08x 0x%08x"
> +aes_cmd_data(uint32_t key, uint32_t iv, uint64_t src, uint64_t dst, uint32_t len) "[key=%d iv=%d] src=0x%"PRIx64" dst=0x%"PRIx64" len=0x%x"
> +aes_cmd_store_iv(uint32_t ctx, uint64_t addr, uint32_t iv0, uint32_t iv1, uint32_t iv2, uint32_t iv3) "[%d] addr=0x%"PRIx64"x -> 0x%08x 0x%08x 0x%08x 0x%08x"
> +aes_cmd_flag(uint32_t raise, uint32_t flag_info) "raise=%d flag_info=0x%x"
> +aes_fifo_process(uint32_t cmd, uint32_t success) "cmd=%d success=%d"
> +aes_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
> +aes_2_read_unknown(uint64_t offset) "offset=0x%"PRIx64
> +aes_2_read(uint64_t offset, uint64_t res) "offset=0x%"PRIx64" res=0x%"PRIx64
> +aes_2_write_unknown(uint64_t offset) "offset=0x%"PRIx64
> +aes_2_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
> +aes_dump_data(const char *desc, const char *hex) "%s%s"
> +
> diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
> index 34a9b9b2204..36c68ce86c5 100644
> --- a/include/qemu/cutils.h
> +++ b/include/qemu/cutils.h
> @@ -302,4 +302,19 @@ GString *qemu_hexdump_line(GString *str, const void *buf, size_t len,
>   void qemu_hexdump(FILE *fp, const char *prefix,
>                     const void *bufptr, size_t size);
>   
> +/**
> + * qemu_hexdump_to_buffer:
> + * @buffer: output string buffer
> + * @buffer_size: amount of available space in buffer. Must be at least
> + *               data_size*2+1.
> + * @data: input bytes
> + * @data_size: number of bytes in data
> + *
> + * Converts the @data_size bytes in @data into hex digit pairs, writing them to
> + * @buffer. Finally, a nul terminating character is written; @buffer therefore
> + * needs space for (data_size*2+1) chars.
> + */
> +void qemu_hexdump_to_buffer(char *restrict buffer, size_t buffer_size,
> +                            const uint8_t *restrict data, size_t data_size);
> +
>   #endif
> diff --git a/util/hexdump.c b/util/hexdump.c
> index ae0d4992dcf..86345db20a8 100644
> --- a/util/hexdump.c
> +++ b/util/hexdump.c
> @@ -97,3 +97,17 @@ void qemu_hexdump(FILE *fp, const char *prefix,
>       }
>   
>   }
> +
> +void qemu_hexdump_to_buffer(char *restrict buffer, size_t buffer_size,
> +                            const uint8_t *restrict data, size_t data_size)
> +{
> +    size_t i;
> +
> +    assert(buffer_size >= data_size * 2 + 1 && buffer_size > data_size);

I suspect the latter condition is to catch the case where
data_size * 2 + 1 overflows, but it is insufficient, strictly speaking. 
It will pass if buffer_size == ((size_t)1 << 63) + 1 && data_size == 
((size_t)1 << 63).

> +    for (i = 0; i < data_size; i++) {
> +        uint8_t val = data[i];
> +        *(buffer++) = hexdump_nibble(val >> 4);
> +        *(buffer++) = hexdump_nibble(val & 0xf);
> +    }
> +    *buffer = '\0';
> +}



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 11/15] hw/vmapple/bdif: Introduce vmapple backdoor interface
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (9 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 10/15] hw/vmapple/aes: Introduce aes engine Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-24 10:28 ` [PATCH v4 12/15] hw/vmapple/cfg: Introduce vmapple cfg region Phil Dennis-Jordan
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

The VMApple machine exposes AUX and ROOT block devices (as well as USB OTG
emulation) via virtio-pci as well as a special, simple backdoor platform
device.

This patch implements this backdoor platform device to the best of my
understanding. I left out any USB OTG parts; they're only needed for
guest recovery and I don't understand the protocol yet.

Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
v4:

 * Moved most header code to .c, rest to vmapple.h
 * Better compliance with coding, naming, and formatting conventions.

 hw/vmapple/Kconfig           |   3 +
 hw/vmapple/bdif.c            | 259 +++++++++++++++++++++++++++++++++++
 hw/vmapple/meson.build       |   1 +
 hw/vmapple/trace-events      |   5 +
 include/hw/vmapple/vmapple.h |  15 ++
 5 files changed, 283 insertions(+)
 create mode 100644 hw/vmapple/bdif.c
 create mode 100644 include/hw/vmapple/vmapple.h

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index a73504d5999..68f88876eb9 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -1,3 +1,6 @@
 config VMAPPLE_AES
     bool
 
+config VMAPPLE_BDIF
+    bool
+
diff --git a/hw/vmapple/bdif.c b/hw/vmapple/bdif.c
new file mode 100644
index 00000000000..8a697d759bd
--- /dev/null
+++ b/hw/vmapple/bdif.c
@@ -0,0 +1,259 @@
+/*
+ * VMApple Backdoor Interface
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "trace.h"
+#include "hw/vmapple/vmapple.h"
+#include "hw/sysbus.h"
+#include "hw/block/block.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(VMAppleBdifState, VMAPPLE_BDIF)
+
+struct VMAppleBdifState {
+    SysBusDevice parent_obj;
+
+    BlockBackend *aux;
+    BlockBackend *root;
+    MemoryRegion mmio;
+};
+
+#define VMAPPLE_BDIF_SIZE   0x00200000
+
+#define REG_DEVID_MASK      0xffff0000
+#define DEVID_ROOT          0x00000000
+#define DEVID_AUX           0x00010000
+#define DEVID_USB           0x00100000
+
+#define REG_STATUS          0x0
+#define REG_STATUS_ACTIVE     BIT(0)
+#define REG_CFG             0x4
+#define REG_CFG_ACTIVE        BIT(1)
+#define REG_UNK1            0x8
+#define REG_BUSY            0x10
+#define REG_BUSY_READY        BIT(0)
+#define REG_UNK2            0x400
+#define REG_CMD             0x408
+#define REG_NEXT_DEVICE     0x420
+#define REG_UNK3            0x434
+
+typedef struct VblkSector {
+    uint32_t pad;
+    uint32_t pad2;
+    uint32_t sector;
+    uint32_t pad3;
+} VblkSector;
+
+typedef struct VblkReqCmd {
+    uint64_t addr;
+    uint32_t len;
+    uint32_t flags;
+} VblkReqCmd;
+
+typedef struct VblkReq {
+    VblkReqCmd sector;
+    VblkReqCmd data;
+    VblkReqCmd retval;
+} VblkReq;
+
+#define VBLK_DATA_FLAGS_READ  0x00030001
+#define VBLK_DATA_FLAGS_WRITE 0x00010001
+
+#define VBLK_RET_SUCCESS  0
+#define VBLK_RET_FAILED   1
+
+static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size)
+{
+    uint64_t ret = -1;
+    uint64_t devid = offset & REG_DEVID_MASK;
+
+    switch (offset & ~REG_DEVID_MASK) {
+    case REG_STATUS:
+        ret = REG_STATUS_ACTIVE;
+        break;
+    case REG_CFG:
+        ret = REG_CFG_ACTIVE;
+        break;
+    case REG_UNK1:
+        ret = 0x420;
+        break;
+    case REG_BUSY:
+        ret = REG_BUSY_READY;
+        break;
+    case REG_UNK2:
+        ret = 0x1;
+        break;
+    case REG_UNK3:
+        ret = 0x0;
+        break;
+    case REG_NEXT_DEVICE:
+        switch (devid) {
+        case DEVID_ROOT:
+            ret = 0x8000000;
+            break;
+        case DEVID_AUX:
+            ret = 0x10000;
+            break;
+        }
+        break;
+    }
+
+    trace_bdif_read(offset, size, ret);
+    return ret;
+}
+
+static void le2cpu_sector(VblkSector *sector)
+{
+    sector->sector = le32_to_cpu(sector->sector);
+}
+
+static void le2cpu_reqcmd(VblkReqCmd *cmd)
+{
+    cmd->addr = le64_to_cpu(cmd->addr);
+    cmd->len = le32_to_cpu(cmd->len);
+    cmd->flags = le32_to_cpu(cmd->flags);
+}
+
+static void le2cpu_req(VblkReq *req)
+{
+    le2cpu_reqcmd(&req->sector);
+    le2cpu_reqcmd(&req->data);
+    le2cpu_reqcmd(&req->retval);
+}
+
+static void vblk_cmd(uint64_t devid, BlockBackend *blk, uint64_t value,
+                     uint64_t static_off)
+{
+    VblkReq req;
+    VblkSector sector;
+    uint64_t off = 0;
+    char *buf = NULL;
+    uint8_t ret = VBLK_RET_FAILED;
+    int r;
+
+    cpu_physical_memory_read(value, &req, sizeof(req));
+    le2cpu_req(&req);
+
+    if (req.sector.len != sizeof(sector)) {
+        ret = VBLK_RET_FAILED;
+        goto out;
+    }
+
+    /* Read the vblk command */
+    cpu_physical_memory_read(req.sector.addr, &sector, sizeof(sector));
+    le2cpu_sector(&sector);
+
+    off = sector.sector * 512ULL + static_off;
+
+    /* Sanity check that we're not allocating bogus sizes */
+    if (req.data.len > 128 * MiB) {
+        goto out;
+    }
+
+    buf = g_malloc0(req.data.len);
+    switch (req.data.flags) {
+    case VBLK_DATA_FLAGS_READ:
+        r = blk_pread(blk, off, req.data.len, buf, 0);
+        trace_bdif_vblk_read(devid == DEVID_AUX ? "aux" : "root",
+                             req.data.addr, off, req.data.len, r);
+        if (r < 0) {
+            goto out;
+        }
+        cpu_physical_memory_write(req.data.addr, buf, req.data.len);
+        ret = VBLK_RET_SUCCESS;
+        break;
+    case VBLK_DATA_FLAGS_WRITE:
+        /* Not needed, iBoot only reads */
+        break;
+    default:
+        break;
+    }
+
+out:
+    g_free(buf);
+    cpu_physical_memory_write(req.retval.addr, &ret, 1);
+}
+
+static void bdif_write(void *opaque, hwaddr offset,
+                       uint64_t value, unsigned size)
+{
+    VMAppleBdifState *s = opaque;
+    uint64_t devid = (offset & REG_DEVID_MASK);
+
+    trace_bdif_write(offset, size, value);
+
+    switch (offset & ~REG_DEVID_MASK) {
+    case REG_CMD:
+        switch (devid) {
+        case DEVID_ROOT:
+            vblk_cmd(devid, s->root, value, 0x0);
+            break;
+        case DEVID_AUX:
+            vblk_cmd(devid, s->aux, value, 0x0);
+            break;
+        }
+        break;
+    }
+}
+
+static const MemoryRegionOps bdif_ops = {
+    .read = bdif_read,
+    .write = bdif_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+
+static void bdif_init(Object *obj)
+{
+    VMAppleBdifState *s = VMAPPLE_BDIF(obj);
+
+    memory_region_init_io(&s->mmio, obj, &bdif_ops, obj,
+                         "VMApple Backdoor Interface", VMAPPLE_BDIF_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mmio);
+}
+
+static Property bdif_properties[] = {
+    DEFINE_PROP_DRIVE("aux", VMAppleBdifState, aux),
+    DEFINE_PROP_DRIVE("root", VMAppleBdifState, root),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void bdif_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->desc = "VMApple Backdoor Interface";
+    device_class_set_props(dc, bdif_properties);
+}
+
+static const TypeInfo bdif_info = {
+    .name          = TYPE_VMAPPLE_BDIF,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(VMAppleBdifState),
+    .instance_init = bdif_init,
+    .class_init    = bdif_class_init,
+};
+
+static void bdif_register_types(void)
+{
+    type_register_static(&bdif_info);
+}
+
+type_init(bdif_register_types)
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
index bcd4dcb28d2..d4624713deb 100644
--- a/hw/vmapple/meson.build
+++ b/hw/vmapple/meson.build
@@ -1 +1,2 @@
 system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
+system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
index 12757bc0852..6c3fdb389a6 100644
--- a/hw/vmapple/trace-events
+++ b/hw/vmapple/trace-events
@@ -16,3 +16,8 @@ aes_2_write_unknown(uint64_t offset) "offset=0x%"PRIx64
 aes_2_write(uint64_t offset, uint64_t val) "offset=0x%"PRIx64" val=0x%"PRIx64
 aes_dump_data(const char *desc, const char *hex) "%s%s"
 
+# bdif.c
+bdif_read(uint64_t offset, uint32_t size, uint64_t value) "offset=0x%"PRIx64" size=0x%x value=0x%"PRIx64
+bdif_write(uint64_t offset, uint32_t size, uint64_t value) "offset=0x%"PRIx64" size=0x%x value=0x%"PRIx64
+bdif_vblk_read(const char *dev, uint64_t addr, uint64_t offset, uint32_t len, int r) "dev=%s addr=0x%"PRIx64" off=0x%"PRIx64" size=0x%x r=%d"
+
diff --git a/include/hw/vmapple/vmapple.h b/include/hw/vmapple/vmapple.h
new file mode 100644
index 00000000000..a4c87b166d5
--- /dev/null
+++ b/include/hw/vmapple/vmapple.h
@@ -0,0 +1,15 @@
+/*
+ * Devices specific to the VMApple machine type
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VMAPPLE_VMAPPLE_H
+#define HW_VMAPPLE_VMAPPLE_H
+
+#define TYPE_VMAPPLE_BDIF "vmapple-bdif"
+
+#endif /* HW_VMAPPLE_VMAPPLE_H */
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 12/15] hw/vmapple/cfg: Introduce vmapple cfg region
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (10 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 11/15] hw/vmapple/bdif: Introduce vmapple backdoor interface Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  5:48   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 13/15] hw/vmapple/virtio-blk: Add support for apple virtio-blk Phil Dennis-Jordan
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

Instead of device tree or other more standardized means, VMApple passes
platform configuration to the first stage boot loader in a binary encoded
format that resides at a dedicated RAM region in physical address space.

This patch models this configuration space as a qdev device which we can
then map at the fixed location in the address space. That way, we can
influence and annotate all configuration fields easily.

Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
v3:

 * Replaced legacy device reset method with Resettable method

v4:

 * Fixed initialisation of default values for properties
 * Dropped superfluous endianness conversions
 * Moved most header code to .c, device name #define goes in vmapple.h

 hw/vmapple/Kconfig           |   3 +
 hw/vmapple/cfg.c             | 197 +++++++++++++++++++++++++++++++++++
 hw/vmapple/meson.build       |   1 +
 include/hw/vmapple/vmapple.h |   2 +
 4 files changed, 203 insertions(+)
 create mode 100644 hw/vmapple/cfg.c

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index 68f88876eb9..8bbeb9a9237 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -4,3 +4,6 @@ config VMAPPLE_AES
 config VMAPPLE_BDIF
     bool
 
+config VMAPPLE_CFG
+    bool
+
diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c
new file mode 100644
index 00000000000..aeb76ba363c
--- /dev/null
+++ b/hw/vmapple/cfg.c
@@ -0,0 +1,197 @@
+/*
+ * VMApple Configuration Region
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vmapple/vmapple.h"
+#include "hw/sysbus.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "net/net.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(VMAppleCfgState, VMAPPLE_CFG)
+
+#define VMAPPLE_CFG_SIZE 0x00010000
+
+typedef struct VMAppleCfg {
+    uint32_t version;         /* 0x000 */
+    uint32_t nr_cpus;         /* 0x004 */
+    uint32_t unk1;            /* 0x008 */
+    uint32_t unk2;            /* 0x00c */
+    uint32_t unk3;            /* 0x010 */
+    uint32_t unk4;            /* 0x014 */
+    uint64_t ecid;            /* 0x018 */
+    uint64_t ram_size;        /* 0x020 */
+    uint32_t run_installer1;  /* 0x028 */
+    uint32_t unk5;            /* 0x02c */
+    uint32_t unk6;            /* 0x030 */
+    uint32_t run_installer2;  /* 0x034 */
+    uint32_t rnd;             /* 0x038 */
+    uint32_t unk7;            /* 0x03c */
+    MACAddr mac_en0;          /* 0x040 */
+    uint8_t pad1[2];
+    MACAddr mac_en1;          /* 0x048 */
+    uint8_t pad2[2];
+    MACAddr mac_wifi0;        /* 0x050 */
+    uint8_t pad3[2];
+    MACAddr mac_bt0;          /* 0x058 */
+    uint8_t pad4[2];
+    uint8_t reserved[0xa0];   /* 0x060 */
+    uint32_t cpu_ids[0x80];   /* 0x100 */
+    uint8_t scratch[0x200];   /* 0x180 */
+    char serial[32];          /* 0x380 */
+    char unk8[32];            /* 0x3a0 */
+    char model[32];           /* 0x3c0 */
+    uint8_t unk9[32];         /* 0x3e0 */
+    uint32_t unk10;           /* 0x400 */
+    char soc_name[32];        /* 0x404 */
+} VMAppleCfg;
+
+struct VMAppleCfgState {
+    SysBusDevice parent_obj;
+    VMAppleCfg cfg;
+
+    MemoryRegion mem;
+    char *serial;
+    char *model;
+    char *soc_name;
+};
+
+static void vmapple_cfg_reset(Object *obj, ResetType type)
+{
+    VMAppleCfgState *s = VMAPPLE_CFG(obj);
+    VMAppleCfg *cfg;
+
+    cfg = memory_region_get_ram_ptr(&s->mem);
+    memset((void *)cfg, 0, VMAPPLE_CFG_SIZE);
+    *cfg = s->cfg;
+}
+
+static bool strlcpy_set_error(char *restrict dst, const char *restrict src,
+                              size_t dst_size, Error **errp,
+                              const char *parent_func, const char *location,
+                              const char *buffer_name)
+{
+    size_t len;
+
+    len = g_strlcpy(dst, src, dst_size);
+    if (len < dst_size) { /* len does not count nul terminator */
+        return true;
+    }
+
+    error_setg(errp,
+               "strlcpy_set_error: %s (%s): Destination buffer %s too small "
+               "(need %zu, have %zu)",
+               parent_func, location, buffer_name, len + 1, dst_size);
+    return false;
+}
+
+/*
+ * String copying wrapper that returns and reports a runtime error in
+ * case of truncation due to insufficient destination buffer space.
+ */
+#define strlcpy_array_return_error(dst_array, src, errp) \
+    do { \
+        if (!strlcpy_set_error((dst_array), (src), ARRAY_SIZE(dst_array), (errp),\
+                               __func__, stringify(__LINE__), # dst_array)) { \
+            return; \
+        } \
+    } while (0)
+
+static void vmapple_cfg_realize(DeviceState *dev, Error **errp)
+{
+    VMAppleCfgState *s = VMAPPLE_CFG(dev);
+    uint32_t i;
+
+    if (!s->serial) {
+        s->serial = g_strdup("1234");
+    }
+    if (!s->model) {
+        s->model = g_strdup("VM0001");
+    }
+    if (!s->soc_name) {
+        s->soc_name = g_strdup("Apple M1 (Virtual)");
+    }
+
+    strlcpy_array_return_error(s->cfg.serial, s->serial, errp);
+    strlcpy_array_return_error(s->cfg.model, s->model, errp);
+    strlcpy_array_return_error(s->cfg.soc_name, s->soc_name, errp);
+    strlcpy_array_return_error(s->cfg.unk8, "D/A", errp);
+    s->cfg.version = 2;
+    s->cfg.unk1 = 1;
+    s->cfg.unk2 = 1;
+    s->cfg.unk3 = 0x20;
+    s->cfg.unk4 = 0;
+    s->cfg.unk5 = 1;
+    s->cfg.unk6 = 1;
+    s->cfg.unk7 = 0;
+    s->cfg.unk10 = 1;
+
+    if (s->cfg.nr_cpus > ARRAY_SIZE(s->cfg.cpu_ids)) {
+        error_setg(errp,
+                   "Failed to create %u CPUs, vmapple machine supports %zu max",
+                   s->cfg.nr_cpus, ARRAY_SIZE(s->cfg.cpu_ids));
+        return;
+    }
+    for (i = 0; i < s->cfg.nr_cpus; i++) {
+        s->cfg.cpu_ids[i] = i;
+    }
+}
+
+static void vmapple_cfg_init(Object *obj)
+{
+    VMAppleCfgState *s = VMAPPLE_CFG(obj);
+
+    memory_region_init_ram(&s->mem, obj, "VMApple Config", VMAPPLE_CFG_SIZE,
+                           &error_fatal);
+    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mem);
+}
+
+static Property vmapple_cfg_properties[] = {
+    DEFINE_PROP_UINT32("nr-cpus", VMAppleCfgState, cfg.nr_cpus, 1),
+    DEFINE_PROP_UINT64("ecid", VMAppleCfgState, cfg.ecid, 0),
+    DEFINE_PROP_UINT64("ram-size", VMAppleCfgState, cfg.ram_size, 0),
+    DEFINE_PROP_UINT32("run_installer1", VMAppleCfgState, cfg.run_installer1, 0),
+    DEFINE_PROP_UINT32("run_installer2", VMAppleCfgState, cfg.run_installer2, 0),
+    DEFINE_PROP_UINT32("rnd", VMAppleCfgState, cfg.rnd, 0),
+    DEFINE_PROP_MACADDR("mac-en0", VMAppleCfgState, cfg.mac_en0),
+    DEFINE_PROP_MACADDR("mac-en1", VMAppleCfgState, cfg.mac_en1),
+    DEFINE_PROP_MACADDR("mac-wifi0", VMAppleCfgState, cfg.mac_wifi0),
+    DEFINE_PROP_MACADDR("mac-bt0", VMAppleCfgState, cfg.mac_bt0),
+    DEFINE_PROP_STRING("serial", VMAppleCfgState, serial),
+    DEFINE_PROP_STRING("model", VMAppleCfgState, model),
+    DEFINE_PROP_STRING("soc_name", VMAppleCfgState, soc_name),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vmapple_cfg_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
+
+    dc->realize = vmapple_cfg_realize;
+    dc->desc = "VMApple Configuration Region";
+    device_class_set_props(dc, vmapple_cfg_properties);
+    rc->phases.hold = vmapple_cfg_reset;
+}
+
+static const TypeInfo vmapple_cfg_info = {
+    .name          = TYPE_VMAPPLE_CFG,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(VMAppleCfgState),
+    .instance_init = vmapple_cfg_init,
+    .class_init    = vmapple_cfg_class_init,
+};
+
+static void vmapple_cfg_register_types(void)
+{
+    type_register_static(&vmapple_cfg_info);
+}
+
+type_init(vmapple_cfg_register_types)
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
index d4624713deb..64b78693a31 100644
--- a/hw/vmapple/meson.build
+++ b/hw/vmapple/meson.build
@@ -1,2 +1,3 @@
 system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
 system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
+system_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
diff --git a/include/hw/vmapple/vmapple.h b/include/hw/vmapple/vmapple.h
index a4c87b166d5..984281b6a67 100644
--- a/include/hw/vmapple/vmapple.h
+++ b/include/hw/vmapple/vmapple.h
@@ -12,4 +12,6 @@
 
 #define TYPE_VMAPPLE_BDIF "vmapple-bdif"
 
+#define TYPE_VMAPPLE_CFG "vmapple-cfg"
+
 #endif /* HW_VMAPPLE_VMAPPLE_H */
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 12/15] hw/vmapple/cfg: Introduce vmapple cfg region
  2024-10-24 10:28 ` [PATCH v4 12/15] hw/vmapple/cfg: Introduce vmapple cfg region Phil Dennis-Jordan
@ 2024-10-26  5:48   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  5:48 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> From: Alexander Graf <graf@amazon.com>
> 
> Instead of device tree or other more standardized means, VMApple passes
> platform configuration to the first stage boot loader in a binary encoded
> format that resides at a dedicated RAM region in physical address space.
> 
> This patch models this configuration space as a qdev device which we can
> then map at the fixed location in the address space. That way, we can
> influence and annotate all configuration fields easily.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
> v3:
> 
>   * Replaced legacy device reset method with Resettable method
> 
> v4:
> 
>   * Fixed initialisation of default values for properties
>   * Dropped superfluous endianness conversions
>   * Moved most header code to .c, device name #define goes in vmapple.h
> 
>   hw/vmapple/Kconfig           |   3 +
>   hw/vmapple/cfg.c             | 197 +++++++++++++++++++++++++++++++++++
>   hw/vmapple/meson.build       |   1 +
>   include/hw/vmapple/vmapple.h |   2 +
>   4 files changed, 203 insertions(+)
>   create mode 100644 hw/vmapple/cfg.c
> 
> diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
> index 68f88876eb9..8bbeb9a9237 100644
> --- a/hw/vmapple/Kconfig
> +++ b/hw/vmapple/Kconfig
> @@ -4,3 +4,6 @@ config VMAPPLE_AES
>   config VMAPPLE_BDIF
>       bool
>   
> +config VMAPPLE_CFG
> +    bool
> +
> diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c
> new file mode 100644
> index 00000000000..aeb76ba363c
> --- /dev/null
> +++ b/hw/vmapple/cfg.c
> @@ -0,0 +1,197 @@
> +/*
> + * VMApple Configuration Region
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/vmapple/vmapple.h"
> +#include "hw/sysbus.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qapi/error.h"
> +#include "net/net.h"
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(VMAppleCfgState, VMAPPLE_CFG)
> +
> +#define VMAPPLE_CFG_SIZE 0x00010000
> +
> +typedef struct VMAppleCfg {
> +    uint32_t version;         /* 0x000 */
> +    uint32_t nr_cpus;         /* 0x004 */
> +    uint32_t unk1;            /* 0x008 */
> +    uint32_t unk2;            /* 0x00c */
> +    uint32_t unk3;            /* 0x010 */
> +    uint32_t unk4;            /* 0x014 */
> +    uint64_t ecid;            /* 0x018 */
> +    uint64_t ram_size;        /* 0x020 */
> +    uint32_t run_installer1;  /* 0x028 */
> +    uint32_t unk5;            /* 0x02c */
> +    uint32_t unk6;            /* 0x030 */
> +    uint32_t run_installer2;  /* 0x034 */
> +    uint32_t rnd;             /* 0x038 */
> +    uint32_t unk7;            /* 0x03c */
> +    MACAddr mac_en0;          /* 0x040 */
> +    uint8_t pad1[2];
> +    MACAddr mac_en1;          /* 0x048 */
> +    uint8_t pad2[2];
> +    MACAddr mac_wifi0;        /* 0x050 */
> +    uint8_t pad3[2];
> +    MACAddr mac_bt0;          /* 0x058 */
> +    uint8_t pad4[2];
> +    uint8_t reserved[0xa0];   /* 0x060 */
> +    uint32_t cpu_ids[0x80];   /* 0x100 */
> +    uint8_t scratch[0x200];   /* 0x180 */
> +    char serial[32];          /* 0x380 */
> +    char unk8[32];            /* 0x3a0 */
> +    char model[32];           /* 0x3c0 */
> +    uint8_t unk9[32];         /* 0x3e0 */
> +    uint32_t unk10;           /* 0x400 */
> +    char soc_name[32];        /* 0x404 */
> +} VMAppleCfg;
> +
> +struct VMAppleCfgState {
> +    SysBusDevice parent_obj;
> +    VMAppleCfg cfg;
> +
> +    MemoryRegion mem;
> +    char *serial;
> +    char *model;
> +    char *soc_name;
> +};
> +
> +static void vmapple_cfg_reset(Object *obj, ResetType type)
> +{
> +    VMAppleCfgState *s = VMAPPLE_CFG(obj);
> +    VMAppleCfg *cfg;
> +
> +    cfg = memory_region_get_ram_ptr(&s->mem);
> +    memset((void *)cfg, 0, VMAPPLE_CFG_SIZE);

This explicit cast is unnecessary.

> +    *cfg = s->cfg;
> +}
> +
> +static bool strlcpy_set_error(char *restrict dst, const char *restrict src,
> +                              size_t dst_size, Error **errp,
> +                              const char *parent_func, const char *location,
> +                              const char *buffer_name)
> +{
> +    size_t len;
> +
> +    len = g_strlcpy(dst, src, dst_size);
> +    if (len < dst_size) { /* len does not count nul terminator */
> +        return true;
> +    }
> +
> +    error_setg(errp,
> +               "strlcpy_set_error: %s (%s): Destination buffer %s too small "
> +               "(need %zu, have %zu)",
> +               parent_func, location, buffer_name, len + 1, dst_size);

This error message is user-facing so please describe the property name 
the user specified instead of writing the function name.

> +    return false;
> +}
> +
> +/*
> + * String copying wrapper that returns and reports a runtime error in
> + * case of truncation due to insufficient destination buffer space.
> + */
> +#define strlcpy_array_return_error(dst_array, src, errp) \
> +    do { \
> +        if (!strlcpy_set_error((dst_array), (src), ARRAY_SIZE(dst_array), (errp),\
> +                               __func__, stringify(__LINE__), # dst_array)) { \
> +            return; \
> +        } \
> +    } while (0)
> +
> +static void vmapple_cfg_realize(DeviceState *dev, Error **errp)
> +{
> +    VMAppleCfgState *s = VMAPPLE_CFG(dev);
> +    uint32_t i;
> +
> +    if (!s->serial) {
> +        s->serial = g_strdup("1234");
> +    }
> +    if (!s->model) {
> +        s->model = g_strdup("VM0001");
> +    }
> +    if (!s->soc_name) {
> +        s->soc_name = g_strdup("Apple M1 (Virtual)");
> +    }
> +
> +    strlcpy_array_return_error(s->cfg.serial, s->serial, errp);
> +    strlcpy_array_return_error(s->cfg.model, s->model, errp);
> +    strlcpy_array_return_error(s->cfg.soc_name, s->soc_name, errp);
> +    strlcpy_array_return_error(s->cfg.unk8, "D/A", errp);
> +    s->cfg.version = 2;
> +    s->cfg.unk1 = 1;
> +    s->cfg.unk2 = 1;
> +    s->cfg.unk3 = 0x20;
> +    s->cfg.unk4 = 0;
> +    s->cfg.unk5 = 1;
> +    s->cfg.unk6 = 1;
> +    s->cfg.unk7 = 0;
> +    s->cfg.unk10 = 1;
> +
> +    if (s->cfg.nr_cpus > ARRAY_SIZE(s->cfg.cpu_ids)) {
> +        error_setg(errp,
> +                   "Failed to create %u CPUs, vmapple machine supports %zu max",
> +                   s->cfg.nr_cpus, ARRAY_SIZE(s->cfg.cpu_ids));
> +        return;
> +    }
> +    for (i = 0; i < s->cfg.nr_cpus; i++) {
> +        s->cfg.cpu_ids[i] = i;
> +    }
> +}
> +
> +static void vmapple_cfg_init(Object *obj)
> +{
> +    VMAppleCfgState *s = VMAPPLE_CFG(obj);
> +
> +    memory_region_init_ram(&s->mem, obj, "VMApple Config", VMAPPLE_CFG_SIZE,
> +                           &error_fatal);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mem);
> +}
> +
> +static Property vmapple_cfg_properties[] = {
> +    DEFINE_PROP_UINT32("nr-cpus", VMAppleCfgState, cfg.nr_cpus, 1),
> +    DEFINE_PROP_UINT64("ecid", VMAppleCfgState, cfg.ecid, 0),
> +    DEFINE_PROP_UINT64("ram-size", VMAppleCfgState, cfg.ram_size, 0),
> +    DEFINE_PROP_UINT32("run_installer1", VMAppleCfgState, cfg.run_installer1, 0),
> +    DEFINE_PROP_UINT32("run_installer2", VMAppleCfgState, cfg.run_installer2, 0),
> +    DEFINE_PROP_UINT32("rnd", VMAppleCfgState, cfg.rnd, 0),
> +    DEFINE_PROP_MACADDR("mac-en0", VMAppleCfgState, cfg.mac_en0),
> +    DEFINE_PROP_MACADDR("mac-en1", VMAppleCfgState, cfg.mac_en1),
> +    DEFINE_PROP_MACADDR("mac-wifi0", VMAppleCfgState, cfg.mac_wifi0),
> +    DEFINE_PROP_MACADDR("mac-bt0", VMAppleCfgState, cfg.mac_bt0),
> +    DEFINE_PROP_STRING("serial", VMAppleCfgState, serial),
> +    DEFINE_PROP_STRING("model", VMAppleCfgState, model),
> +    DEFINE_PROP_STRING("soc_name", VMAppleCfgState, soc_name),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void vmapple_cfg_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    ResettableClass *rc = RESETTABLE_CLASS(klass);
> +
> +    dc->realize = vmapple_cfg_realize;
> +    dc->desc = "VMApple Configuration Region";
> +    device_class_set_props(dc, vmapple_cfg_properties);
> +    rc->phases.hold = vmapple_cfg_reset;
> +}
> +
> +static const TypeInfo vmapple_cfg_info = {
> +    .name          = TYPE_VMAPPLE_CFG,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(VMAppleCfgState),
> +    .instance_init = vmapple_cfg_init,
> +    .class_init    = vmapple_cfg_class_init,
> +};
> +
> +static void vmapple_cfg_register_types(void)
> +{
> +    type_register_static(&vmapple_cfg_info);
> +}
> +
> +type_init(vmapple_cfg_register_types)
> diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
> index d4624713deb..64b78693a31 100644
> --- a/hw/vmapple/meson.build
> +++ b/hw/vmapple/meson.build
> @@ -1,2 +1,3 @@
>   system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
>   system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
> +system_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
> diff --git a/include/hw/vmapple/vmapple.h b/include/hw/vmapple/vmapple.h
> index a4c87b166d5..984281b6a67 100644
> --- a/include/hw/vmapple/vmapple.h
> +++ b/include/hw/vmapple/vmapple.h
> @@ -12,4 +12,6 @@
>   
>   #define TYPE_VMAPPLE_BDIF "vmapple-bdif"
>   
> +#define TYPE_VMAPPLE_CFG "vmapple-cfg"
> +
>   #endif /* HW_VMAPPLE_VMAPPLE_H */



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 13/15] hw/vmapple/virtio-blk: Add support for apple virtio-blk
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (11 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 12/15] hw/vmapple/cfg: Introduce vmapple cfg region Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  6:02   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 14/15] hw/block/virtio-blk: Replaces request free function with g_free Phil Dennis-Jordan
  2024-10-24 10:28 ` [PATCH v4 15/15] hw/vmapple/vmapple: Add vmapple machine type Phil Dennis-Jordan
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

Apple has its own virtio-blk PCI device ID where it deviates from the
official virtio-pci spec slightly: It puts a new "apple type"
field at a static offset in config space and introduces a new barrier
command.

This patch first creates a mechanism for virtio-blk downstream classes to
handle unknown commands. It then creates such a downstream class and a new
vmapple-virtio-blk-pci class which support the additional apple type config
identifier as well as the barrier command.

It then exposes 2 subclasses from that that we can use to expose root and
aux virtio-blk devices: "vmapple-virtio-root" and "vmapple-virtio-aux".

Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
v4:

 * Use recommended object type declaration pattern.
 * Correctly log unimplemented code paths.
 * Most header code moved to .c, type name #defines moved to vmapple.h

 hw/block/virtio-blk.c          |  19 ++-
 hw/vmapple/Kconfig             |   3 +
 hw/vmapple/meson.build         |   1 +
 hw/vmapple/virtio-blk.c        | 233 +++++++++++++++++++++++++++++++++
 include/hw/pci/pci_ids.h       |   1 +
 include/hw/virtio/virtio-blk.h |  12 +-
 include/hw/vmapple/vmapple.h   |   4 +
 7 files changed, 268 insertions(+), 5 deletions(-)
 create mode 100644 hw/vmapple/virtio-blk.c

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9166d7974d4..9e8337bb639 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -50,12 +50,12 @@ static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
     req->mr_next = NULL;
 }
 
-static void virtio_blk_free_request(VirtIOBlockReq *req)
+void virtio_blk_free_request(VirtIOBlockReq *req)
 {
     g_free(req);
 }
 
-static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
+void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
 {
     VirtIOBlock *s = req->dev;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
@@ -966,8 +966,18 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         break;
     }
     default:
-        virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-        virtio_blk_free_request(req);
+    {
+        /*
+         * Give subclasses a chance to handle unknown requests. This way the
+         * class lookup is not in the hot path.
+         */
+        VirtIOBlkClass *vbk = VIRTIO_BLK_GET_CLASS(s);
+        if (!vbk->handle_unknown_request ||
+            !vbk->handle_unknown_request(req, mrb, type)) {
+            virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+            virtio_blk_free_request(req);
+        }
+    }
     }
     return 0;
 }
@@ -2044,6 +2054,7 @@ static const TypeInfo virtio_blk_info = {
     .instance_size = sizeof(VirtIOBlock),
     .instance_init = virtio_blk_instance_init,
     .class_init = virtio_blk_class_init,
+    .class_size = sizeof(VirtIOBlkClass),
 };
 
 static void virtio_register_types(void)
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index 8bbeb9a9237..bcd1be63e3c 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -7,3 +7,6 @@ config VMAPPLE_BDIF
 config VMAPPLE_CFG
     bool
 
+config VMAPPLE_VIRTIO_BLK
+    bool
+
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
index 64b78693a31..bf17cf906c9 100644
--- a/hw/vmapple/meson.build
+++ b/hw/vmapple/meson.build
@@ -1,3 +1,4 @@
 system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
 system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
 system_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
+system_ss.add(when: 'CONFIG_VMAPPLE_VIRTIO_BLK',  if_true: files('virtio-blk.c'))
diff --git a/hw/vmapple/virtio-blk.c b/hw/vmapple/virtio-blk.c
new file mode 100644
index 00000000000..3a8b47bc55f
--- /dev/null
+++ b/hw/vmapple/virtio-blk.c
@@ -0,0 +1,233 @@
+/*
+ * VMApple specific VirtIO Block implementation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * VMApple uses almost standard VirtIO Block, but with a few key differences:
+ *
+ *  - Different PCI device/vendor ID
+ *  - An additional "type" identifier to differentiate AUX and Root volumes
+ *  - An additional BARRIER command
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vmapple/vmapple.h"
+#include "hw/virtio/virtio-blk.h"
+#include "hw/virtio/virtio-pci.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+
+OBJECT_DECLARE_TYPE(VMAppleVirtIOBlk, VMAppleVirtIOBlkClass, VMAPPLE_VIRTIO_BLK)
+
+typedef struct VMAppleVirtIOBlkClass {
+    /*< private >*/
+    VirtIOBlkClass parent;
+    /*< public >*/
+    void (*get_config)(VirtIODevice *vdev, uint8_t *config);
+} VMAppleVirtIOBlkClass;
+
+typedef struct VMAppleVirtIOBlk {
+    /* <private> */
+    VirtIOBlock parent_obj;
+
+    /* <public> */
+    uint32_t apple_type;
+} VMAppleVirtIOBlk;
+
+/*
+ * vmapple-virtio-blk-pci: This extends VirtioPCIProxy.
+ */
+#define TYPE_VMAPPLE_VIRTIO_BLK_PCI "vmapple-virtio-blk-pci-base"
+OBJECT_DECLARE_SIMPLE_TYPE(VMAppleVirtIOBlkPCI, VMAPPLE_VIRTIO_BLK_PCI)
+
+#define VIRTIO_BLK_T_APPLE_BARRIER     0x10000
+
+#define VIRTIO_APPLE_TYPE_ROOT 1
+#define VIRTIO_APPLE_TYPE_AUX  2
+
+static bool vmapple_virtio_blk_handle_unknown_request(VirtIOBlockReq *req,
+                                                      MultiReqBuffer *mrb,
+                                                      uint32_t type)
+{
+    switch (type) {
+    case VIRTIO_BLK_T_APPLE_BARRIER:
+        qemu_log_mask(LOG_UNIMP, "%s: Barrier requests are currently no-ops\n",
+                      __func__);
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+        virtio_blk_free_request(req);
+        return true;
+    default:
+        return false;
+    }
+}
+
+/*
+ * VMApple virtio-blk uses the same config format as normal virtio, with one
+ * exception: It adds an "apple type" specififer at the same location that
+ * the spec reserves for max_secure_erase_sectors. Let's hook into the
+ * get_config code path here, run it as usual and then patch in the apple type.
+ */
+static void vmapple_virtio_blk_get_config(VirtIODevice *vdev, uint8_t *config)
+{
+    VMAppleVirtIOBlk *dev = VMAPPLE_VIRTIO_BLK(vdev);
+    VMAppleVirtIOBlkClass *vvbk = VMAPPLE_VIRTIO_BLK_GET_CLASS(dev);
+    struct virtio_blk_config *blkcfg = (struct virtio_blk_config *)config;
+
+    vvbk->get_config(vdev, config);
+
+    g_assert(dev->parent_obj.config_size >= endof(struct virtio_blk_config, zoned));
+
+    /* Apple abuses the field for max_secure_erase_sectors as type id */
+    blkcfg->max_secure_erase_sectors = dev->apple_type;
+}
+
+static Property vmapple_virtio_blk_properties[] = {
+    DEFINE_PROP_UINT32("apple-type", VMAppleVirtIOBlk, apple_type, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vmapple_virtio_blk_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtIOBlkClass *vbk = VIRTIO_BLK_CLASS(klass);
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+    VMAppleVirtIOBlkClass *vvbk = VMAPPLE_VIRTIO_BLK_CLASS(klass);
+
+    vbk->handle_unknown_request = vmapple_virtio_blk_handle_unknown_request;
+    vvbk->get_config = vdc->get_config;
+    vdc->get_config = vmapple_virtio_blk_get_config;
+    device_class_set_props(dc, vmapple_virtio_blk_properties);
+}
+
+static const TypeInfo vmapple_virtio_blk_info = {
+    .name          = TYPE_VMAPPLE_VIRTIO_BLK,
+    .parent        = TYPE_VIRTIO_BLK,
+    .instance_size = sizeof(VMAppleVirtIOBlk),
+    .class_init    = vmapple_virtio_blk_class_init,
+};
+
+/* PCI Devices */
+
+struct VMAppleVirtIOBlkPCI {
+    VirtIOPCIProxy parent_obj;
+    VMAppleVirtIOBlk vdev;
+    uint32_t apple_type;
+};
+
+
+static Property vmapple_virtio_blk_pci_properties[] = {
+    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
+    DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+    DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
+                       DEV_NVECTORS_UNSPECIFIED),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vmapple_virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(vpci_dev);
+    DeviceState *vdev = DEVICE(&dev->vdev);
+    VirtIOBlkConf *conf = &dev->vdev.parent_obj.conf;
+
+    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+        conf->num_queues = virtio_pci_optimal_num_queues(0);
+    }
+
+    if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
+        vpci_dev->nvectors = conf->num_queues + 1;
+    }
+
+    /*
+     * We don't support zones, but we need the additional config space size.
+     * Let's just expose the feature so the rest of the virtio-blk logic
+     * allocates enough space for us. The guest will ignore zones anyway.
+     */
+    virtio_add_feature(&dev->vdev.parent_obj.host_features, VIRTIO_BLK_F_ZONED);
+    /* Propagate the apple type down to the virtio-blk device */
+    qdev_prop_set_uint32(DEVICE(&dev->vdev), "apple-type", dev->apple_type);
+    /* and spawn the virtio-blk device */
+    qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+
+    /*
+     * The virtio-pci machinery adjusts its vendor/device ID based on whether
+     * we support modern or legacy virtio. Let's patch it back to the Apple
+     * identifiers here.
+     */
+    pci_config_set_vendor_id(vpci_dev->pci_dev.config, PCI_VENDOR_ID_APPLE);
+    pci_config_set_device_id(vpci_dev->pci_dev.config,
+                             PCI_DEVICE_ID_APPLE_VIRTIO_BLK);
+}
+
+static void vmapple_virtio_blk_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+
+    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+    device_class_set_props(dc, vmapple_virtio_blk_pci_properties);
+    k->realize = vmapple_virtio_blk_pci_realize;
+    pcidev_k->vendor_id = PCI_VENDOR_ID_APPLE;
+    pcidev_k->device_id = PCI_DEVICE_ID_APPLE_VIRTIO_BLK;
+    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+    pcidev_k->class_id = PCI_CLASS_STORAGE_SCSI;
+}
+
+static void vmapple_virtio_blk_pci_instance_init(Object *obj)
+{
+    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(obj);
+
+    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                TYPE_VMAPPLE_VIRTIO_BLK);
+}
+
+static const VirtioPCIDeviceTypeInfo vmapple_virtio_blk_pci_info = {
+    .base_name     = TYPE_VMAPPLE_VIRTIO_BLK_PCI,
+    .generic_name  = "vmapple-virtio-blk-pci",
+    .instance_size = sizeof(VMAppleVirtIOBlkPCI),
+    .instance_init = vmapple_virtio_blk_pci_instance_init,
+    .class_init    = vmapple_virtio_blk_pci_class_init,
+};
+
+static void vmapple_virtio_root_instance_init(Object *obj)
+{
+    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(obj);
+
+    dev->apple_type = VIRTIO_APPLE_TYPE_ROOT;
+}
+
+static const TypeInfo vmapple_virtio_root_info = {
+    .name          = TYPE_VMAPPLE_VIRTIO_ROOT,
+    .parent        = "vmapple-virtio-blk-pci",
+    .instance_size = sizeof(VMAppleVirtIOBlkPCI),
+    .instance_init = vmapple_virtio_root_instance_init,
+};
+
+static void vmapple_virtio_aux_instance_init(Object *obj)
+{
+    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(obj);
+
+    dev->apple_type = VIRTIO_APPLE_TYPE_AUX;
+}
+
+static const TypeInfo vmapple_virtio_aux_info = {
+    .name          = TYPE_VMAPPLE_VIRTIO_AUX,
+    .parent        = "vmapple-virtio-blk-pci",
+    .instance_size = sizeof(VMAppleVirtIOBlkPCI),
+    .instance_init = vmapple_virtio_aux_instance_init,
+};
+
+static void vmapple_virtio_blk_register_types(void)
+{
+    type_register_static(&vmapple_virtio_blk_info);
+    virtio_pci_types_register(&vmapple_virtio_blk_pci_info);
+    type_register_static(&vmapple_virtio_root_info);
+    type_register_static(&vmapple_virtio_aux_info);
+}
+
+type_init(vmapple_virtio_blk_register_types)
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index f1a53fea8d6..33e2898be95 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -191,6 +191,7 @@
 #define PCI_DEVICE_ID_APPLE_UNI_N_AGP    0x0020
 #define PCI_DEVICE_ID_APPLE_U3_AGP       0x004b
 #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC   0x0021
+#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK   0x1a00
 
 #define PCI_VENDOR_ID_SUN                0x108e
 #define PCI_DEVICE_ID_SUN_EBUS           0x1000
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 5c14110c4b1..28d5046ea6c 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -24,7 +24,7 @@
 #include "qapi/qapi-types-virtio.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
-OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK)
+OBJECT_DECLARE_TYPE(VirtIOBlock, VirtIOBlkClass, VIRTIO_BLK)
 
 /* This is the last element of the write scatter-gather list */
 struct virtio_blk_inhdr
@@ -100,6 +100,16 @@ typedef struct MultiReqBuffer {
     bool is_write;
 } MultiReqBuffer;
 
+typedef struct VirtIOBlkClass {
+    /*< private >*/
+    VirtioDeviceClass parent;
+    /*< public >*/
+    bool (*handle_unknown_request)(VirtIOBlockReq *req, MultiReqBuffer *mrb,
+                                   uint32_t type);
+} VirtIOBlkClass;
+
 void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
+void virtio_blk_free_request(VirtIOBlockReq *req);
+void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status);
 
 #endif
diff --git a/include/hw/vmapple/vmapple.h b/include/hw/vmapple/vmapple.h
index 984281b6a67..266dc826d38 100644
--- a/include/hw/vmapple/vmapple.h
+++ b/include/hw/vmapple/vmapple.h
@@ -14,4 +14,8 @@
 
 #define TYPE_VMAPPLE_CFG "vmapple-cfg"
 
+#define TYPE_VMAPPLE_VIRTIO_BLK  "vmapple-virtio-blk"
+#define TYPE_VMAPPLE_VIRTIO_ROOT "vmapple-virtio-root"
+#define TYPE_VMAPPLE_VIRTIO_AUX  "vmapple-virtio-aux"
+
 #endif /* HW_VMAPPLE_VMAPPLE_H */
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 13/15] hw/vmapple/virtio-blk: Add support for apple virtio-blk
  2024-10-24 10:28 ` [PATCH v4 13/15] hw/vmapple/virtio-blk: Add support for apple virtio-blk Phil Dennis-Jordan
@ 2024-10-26  6:02   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  6:02 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> From: Alexander Graf <graf@amazon.com>
> 
> Apple has its own virtio-blk PCI device ID where it deviates from the
> official virtio-pci spec slightly: It puts a new "apple type"
> field at a static offset in config space and introduces a new barrier
> command.
> 
> This patch first creates a mechanism for virtio-blk downstream classes to
> handle unknown commands. It then creates such a downstream class and a new
> vmapple-virtio-blk-pci class which support the additional apple type config
> identifier as well as the barrier command.
> 
> It then exposes 2 subclasses from that that we can use to expose root and
> aux virtio-blk devices: "vmapple-virtio-root" and "vmapple-virtio-aux".
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
> v4:
> 
>   * Use recommended object type declaration pattern.
>   * Correctly log unimplemented code paths.
>   * Most header code moved to .c, type name #defines moved to vmapple.h
> 
>   hw/block/virtio-blk.c          |  19 ++-
>   hw/vmapple/Kconfig             |   3 +
>   hw/vmapple/meson.build         |   1 +
>   hw/vmapple/virtio-blk.c        | 233 +++++++++++++++++++++++++++++++++
>   include/hw/pci/pci_ids.h       |   1 +
>   include/hw/virtio/virtio-blk.h |  12 +-
>   include/hw/vmapple/vmapple.h   |   4 +
>   7 files changed, 268 insertions(+), 5 deletions(-)
>   create mode 100644 hw/vmapple/virtio-blk.c
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 9166d7974d4..9e8337bb639 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -50,12 +50,12 @@ static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
>       req->mr_next = NULL;
>   }
>   
> -static void virtio_blk_free_request(VirtIOBlockReq *req)
> +void virtio_blk_free_request(VirtIOBlockReq *req)
>   {
>       g_free(req);
>   }
>   
> -static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
> +void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
>   {
>       VirtIOBlock *s = req->dev;
>       VirtIODevice *vdev = VIRTIO_DEVICE(s);
> @@ -966,8 +966,18 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>           break;
>       }
>       default:
> -        virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
> -        virtio_blk_free_request(req);
> +    {
> +        /*
> +         * Give subclasses a chance to handle unknown requests. This way the
> +         * class lookup is not in the hot path.
> +         */
> +        VirtIOBlkClass *vbk = VIRTIO_BLK_GET_CLASS(s);
> +        if (!vbk->handle_unknown_request ||
> +            !vbk->handle_unknown_request(req, mrb, type)) {
> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
> +            virtio_blk_free_request(req);
> +        }
> +    }
>       }
>       return 0;
>   }
> @@ -2044,6 +2054,7 @@ static const TypeInfo virtio_blk_info = {
>       .instance_size = sizeof(VirtIOBlock),
>       .instance_init = virtio_blk_instance_init,
>       .class_init = virtio_blk_class_init,
> +    .class_size = sizeof(VirtIOBlkClass),
>   };
>   
>   static void virtio_register_types(void)
> diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
> index 8bbeb9a9237..bcd1be63e3c 100644
> --- a/hw/vmapple/Kconfig
> +++ b/hw/vmapple/Kconfig
> @@ -7,3 +7,6 @@ config VMAPPLE_BDIF
>   config VMAPPLE_CFG
>       bool
>   
> +config VMAPPLE_VIRTIO_BLK
> +    bool
> +
> diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
> index 64b78693a31..bf17cf906c9 100644
> --- a/hw/vmapple/meson.build
> +++ b/hw/vmapple/meson.build
> @@ -1,3 +1,4 @@
>   system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
>   system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
>   system_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
> +system_ss.add(when: 'CONFIG_VMAPPLE_VIRTIO_BLK',  if_true: files('virtio-blk.c'))
> diff --git a/hw/vmapple/virtio-blk.c b/hw/vmapple/virtio-blk.c
> new file mode 100644
> index 00000000000..3a8b47bc55f
> --- /dev/null
> +++ b/hw/vmapple/virtio-blk.c
> @@ -0,0 +1,233 @@
> +/*
> + * VMApple specific VirtIO Block implementation
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + * VMApple uses almost standard VirtIO Block, but with a few key differences:
> + *
> + *  - Different PCI device/vendor ID
> + *  - An additional "type" identifier to differentiate AUX and Root volumes
> + *  - An additional BARRIER command
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/vmapple/vmapple.h"
> +#include "hw/virtio/virtio-blk.h"
> +#include "hw/virtio/virtio-pci.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qapi/error.h"
> +
> +OBJECT_DECLARE_TYPE(VMAppleVirtIOBlk, VMAppleVirtIOBlkClass, VMAPPLE_VIRTIO_BLK)
> +
> +typedef struct VMAppleVirtIOBlkClass {
> +    /*< private >*/
> +    VirtIOBlkClass parent;
> +    /*< public >*/

No need for these comments. All members are private to this file.

> +    void (*get_config)(VirtIODevice *vdev, uint8_t *config);
> +} VMAppleVirtIOBlkClass;
> +
> +typedef struct VMAppleVirtIOBlk {
> +    /* <private> */
> +    VirtIOBlock parent_obj;
> +
> +    /* <public> */
> +    uint32_t apple_type;
> +} VMAppleVirtIOBlk;
> +
> +/*
> + * vmapple-virtio-blk-pci: This extends VirtioPCIProxy.
> + */
> +#define TYPE_VMAPPLE_VIRTIO_BLK_PCI "vmapple-virtio-blk-pci-base"
> +OBJECT_DECLARE_SIMPLE_TYPE(VMAppleVirtIOBlkPCI, VMAPPLE_VIRTIO_BLK_PCI)
> +
> +#define VIRTIO_BLK_T_APPLE_BARRIER     0x10000
> +
> +#define VIRTIO_APPLE_TYPE_ROOT 1
> +#define VIRTIO_APPLE_TYPE_AUX  2
> +
> +static bool vmapple_virtio_blk_handle_unknown_request(VirtIOBlockReq *req,
> +                                                      MultiReqBuffer *mrb,
> +                                                      uint32_t type)
> +{
> +    switch (type) {
> +    case VIRTIO_BLK_T_APPLE_BARRIER:
> +        qemu_log_mask(LOG_UNIMP, "%s: Barrier requests are currently no-ops\n",
> +                      __func__);
> +        virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
> +        virtio_blk_free_request(req);
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +
> +/*
> + * VMApple virtio-blk uses the same config format as normal virtio, with one
> + * exception: It adds an "apple type" specififer at the same location that
> + * the spec reserves for max_secure_erase_sectors. Let's hook into the
> + * get_config code path here, run it as usual and then patch in the apple type.
> + */
> +static void vmapple_virtio_blk_get_config(VirtIODevice *vdev, uint8_t *config)
> +{
> +    VMAppleVirtIOBlk *dev = VMAPPLE_VIRTIO_BLK(vdev);
> +    VMAppleVirtIOBlkClass *vvbk = VMAPPLE_VIRTIO_BLK_GET_CLASS(dev);
> +    struct virtio_blk_config *blkcfg = (struct virtio_blk_config *)config;
> +
> +    vvbk->get_config(vdev, config);
> +
> +    g_assert(dev->parent_obj.config_size >= endof(struct virtio_blk_config, zoned));
> +
> +    /* Apple abuses the field for max_secure_erase_sectors as type id */
> +    blkcfg->max_secure_erase_sectors = dev->apple_type;

Use stl_he_p(). The argument type indicates it be unaligned, and 
virtio_blk_update_config() also uses functions for unaligned stores.

> +}
> +
> +static Property vmapple_virtio_blk_properties[] = {
> +    DEFINE_PROP_UINT32("apple-type", VMAppleVirtIOBlk, apple_type, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void vmapple_virtio_blk_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    VirtIOBlkClass *vbk = VIRTIO_BLK_CLASS(klass);
> +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> +    VMAppleVirtIOBlkClass *vvbk = VMAPPLE_VIRTIO_BLK_CLASS(klass);
> +
> +    vbk->handle_unknown_request = vmapple_virtio_blk_handle_unknown_request;
> +    vvbk->get_config = vdc->get_config;
> +    vdc->get_config = vmapple_virtio_blk_get_config;
> +    device_class_set_props(dc, vmapple_virtio_blk_properties);
> +}
> +
> +static const TypeInfo vmapple_virtio_blk_info = {
> +    .name          = TYPE_VMAPPLE_VIRTIO_BLK,
> +    .parent        = TYPE_VIRTIO_BLK,
> +    .instance_size = sizeof(VMAppleVirtIOBlk),
> +    .class_init    = vmapple_virtio_blk_class_init,
> +};
> +
> +/* PCI Devices */
> +
> +struct VMAppleVirtIOBlkPCI {
> +    VirtIOPCIProxy parent_obj;
> +    VMAppleVirtIOBlk vdev;
> +    uint32_t apple_type;
> +};
> +
> +
> +static Property vmapple_virtio_blk_pci_properties[] = {
> +    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
> +    DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
> +                       DEV_NVECTORS_UNSPECIFIED),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void vmapple_virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> +{
> +    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(vpci_dev);
> +    DeviceState *vdev = DEVICE(&dev->vdev);
> +    VirtIOBlkConf *conf = &dev->vdev.parent_obj.conf;
> +
> +    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> +        conf->num_queues = virtio_pci_optimal_num_queues(0);
> +    }
> +
> +    if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> +        vpci_dev->nvectors = conf->num_queues + 1;
> +    }
> +
> +    /*
> +     * We don't support zones, but we need the additional config space size.
> +     * Let's just expose the feature so the rest of the virtio-blk logic
> +     * allocates enough space for us. The guest will ignore zones anyway.
> +     */
> +    virtio_add_feature(&dev->vdev.parent_obj.host_features, VIRTIO_BLK_F_ZONED);
> +    /* Propagate the apple type down to the virtio-blk device */
> +    qdev_prop_set_uint32(DEVICE(&dev->vdev), "apple-type", dev->apple_type);

The property is unnecessary if it is set internally. The value can be 
directly set using dev->vdev as done for conf->num_queues.

> +    /* and spawn the virtio-blk device */
> +    qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> +
> +    /*
> +     * The virtio-pci machinery adjusts its vendor/device ID based on whether
> +     * we support modern or legacy virtio. Let's patch it back to the Apple
> +     * identifiers here.
> +     */
> +    pci_config_set_vendor_id(vpci_dev->pci_dev.config, PCI_VENDOR_ID_APPLE);
> +    pci_config_set_device_id(vpci_dev->pci_dev.config,
> +                             PCI_DEVICE_ID_APPLE_VIRTIO_BLK);
> +}
> +
> +static void vmapple_virtio_blk_pci_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> +
> +    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> +    device_class_set_props(dc, vmapple_virtio_blk_pci_properties);
> +    k->realize = vmapple_virtio_blk_pci_realize;
> +    pcidev_k->vendor_id = PCI_VENDOR_ID_APPLE;
> +    pcidev_k->device_id = PCI_DEVICE_ID_APPLE_VIRTIO_BLK;
> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> +    pcidev_k->class_id = PCI_CLASS_STORAGE_SCSI;
> +}
> +
> +static void vmapple_virtio_blk_pci_instance_init(Object *obj)
> +{
> +    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(obj);
> +
> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> +                                TYPE_VMAPPLE_VIRTIO_BLK);
> +}
> +
> +static const VirtioPCIDeviceTypeInfo vmapple_virtio_blk_pci_info = {
> +    .base_name     = TYPE_VMAPPLE_VIRTIO_BLK_PCI,
> +    .generic_name  = "vmapple-virtio-blk-pci",
> +    .instance_size = sizeof(VMAppleVirtIOBlkPCI),
> +    .instance_init = vmapple_virtio_blk_pci_instance_init,
> +    .class_init    = vmapple_virtio_blk_pci_class_init,
> +};
> +
> +static void vmapple_virtio_root_instance_init(Object *obj)
> +{
> +    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(obj);
> +
> +    dev->apple_type = VIRTIO_APPLE_TYPE_ROOT;
> +}
> +
> +static const TypeInfo vmapple_virtio_root_info = {
> +    .name          = TYPE_VMAPPLE_VIRTIO_ROOT,
> +    .parent        = "vmapple-virtio-blk-pci",
> +    .instance_size = sizeof(VMAppleVirtIOBlkPCI),
> +    .instance_init = vmapple_virtio_root_instance_init,
> +};
> +
> +static void vmapple_virtio_aux_instance_init(Object *obj)
> +{
> +    VMAppleVirtIOBlkPCI *dev = VMAPPLE_VIRTIO_BLK_PCI(obj);
> +
> +    dev->apple_type = VIRTIO_APPLE_TYPE_AUX;
> +}
> +
> +static const TypeInfo vmapple_virtio_aux_info = {
> +    .name          = TYPE_VMAPPLE_VIRTIO_AUX,
> +    .parent        = "vmapple-virtio-blk-pci",
> +    .instance_size = sizeof(VMAppleVirtIOBlkPCI),
> +    .instance_init = vmapple_virtio_aux_instance_init,
> +};
> +
> +static void vmapple_virtio_blk_register_types(void)
> +{
> +    type_register_static(&vmapple_virtio_blk_info);
> +    virtio_pci_types_register(&vmapple_virtio_blk_pci_info);
> +    type_register_static(&vmapple_virtio_root_info);
> +    type_register_static(&vmapple_virtio_aux_info);
> +}
> +
> +type_init(vmapple_virtio_blk_register_types)
> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
> index f1a53fea8d6..33e2898be95 100644
> --- a/include/hw/pci/pci_ids.h
> +++ b/include/hw/pci/pci_ids.h
> @@ -191,6 +191,7 @@
>   #define PCI_DEVICE_ID_APPLE_UNI_N_AGP    0x0020
>   #define PCI_DEVICE_ID_APPLE_U3_AGP       0x004b
>   #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC   0x0021
> +#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK   0x1a00
>   
>   #define PCI_VENDOR_ID_SUN                0x108e
>   #define PCI_DEVICE_ID_SUN_EBUS           0x1000
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index 5c14110c4b1..28d5046ea6c 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -24,7 +24,7 @@
>   #include "qapi/qapi-types-virtio.h"
>   
>   #define TYPE_VIRTIO_BLK "virtio-blk-device"
> -OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK)
> +OBJECT_DECLARE_TYPE(VirtIOBlock, VirtIOBlkClass, VIRTIO_BLK)
>   
>   /* This is the last element of the write scatter-gather list */
>   struct virtio_blk_inhdr
> @@ -100,6 +100,16 @@ typedef struct MultiReqBuffer {
>       bool is_write;
>   } MultiReqBuffer;
>   
> +typedef struct VirtIOBlkClass {
> +    /*< private >*/
> +    VirtioDeviceClass parent;
> +    /*< public >*/
> +    bool (*handle_unknown_request)(VirtIOBlockReq *req, MultiReqBuffer *mrb,
> +                                   uint32_t type);
> +} VirtIOBlkClass;
> +
>   void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
> +void virtio_blk_free_request(VirtIOBlockReq *req);
> +void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status);
>   
>   #endif
> diff --git a/include/hw/vmapple/vmapple.h b/include/hw/vmapple/vmapple.h
> index 984281b6a67..266dc826d38 100644
> --- a/include/hw/vmapple/vmapple.h
> +++ b/include/hw/vmapple/vmapple.h
> @@ -14,4 +14,8 @@
>   
>   #define TYPE_VMAPPLE_CFG "vmapple-cfg"
>   
> +#define TYPE_VMAPPLE_VIRTIO_BLK  "vmapple-virtio-blk"
> +#define TYPE_VMAPPLE_VIRTIO_ROOT "vmapple-virtio-root"
> +#define TYPE_VMAPPLE_VIRTIO_AUX  "vmapple-virtio-aux"
> +
>   #endif /* HW_VMAPPLE_VMAPPLE_H */



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 14/15] hw/block/virtio-blk: Replaces request free function with g_free
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (12 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 13/15] hw/vmapple/virtio-blk: Add support for apple virtio-blk Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  6:03   ` Akihiko Odaki
  2024-10-24 10:28 ` [PATCH v4 15/15] hw/vmapple/vmapple: Add vmapple machine type Phil Dennis-Jordan
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv

The virtio_blk_free_request() function has been a 1-liner forwarding
to g_free() for a while now. We may as well call g_free on the request
pointer directly.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
 hw/block/virtio-blk.c          | 43 +++++++++++++++-------------------
 hw/vmapple/virtio-blk.c        |  2 +-
 include/hw/virtio/virtio-blk.h |  1 -
 3 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9e8337bb639..40d2c9bc591 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -50,11 +50,6 @@ static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
     req->mr_next = NULL;
 }
 
-void virtio_blk_free_request(VirtIOBlockReq *req)
-{
-    g_free(req);
-}
-
 void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
 {
     VirtIOBlock *s = req->dev;
@@ -93,7 +88,7 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
         if (acct_failed) {
             block_acct_failed(blk_get_stats(s->blk), &req->acct);
         }
-        virtio_blk_free_request(req);
+        g_free(req);
     }
 
     blk_error_action(s->blk, action, is_read, error);
@@ -136,7 +131,7 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 
         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
         block_acct_done(blk_get_stats(s->blk), &req->acct);
-        virtio_blk_free_request(req);
+        g_free(req);
     }
 }
 
@@ -151,7 +146,7 @@ static void virtio_blk_flush_complete(void *opaque, int ret)
 
     virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
     block_acct_done(blk_get_stats(s->blk), &req->acct);
-    virtio_blk_free_request(req);
+    g_free(req);
 }
 
 static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
@@ -169,7 +164,7 @@ static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
     if (is_write_zeroes) {
         block_acct_done(blk_get_stats(s->blk), &req->acct);
     }
-    virtio_blk_free_request(req);
+    g_free(req);
 }
 
 static VirtIOBlockReq *virtio_blk_get_request(VirtIOBlock *s, VirtQueue *vq)
@@ -214,7 +209,7 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 
 fail:
     virtio_blk_req_complete(req, status);
-    virtio_blk_free_request(req);
+    g_free(req);
 }
 
 static inline void submit_requests(VirtIOBlock *s, MultiReqBuffer *mrb,
@@ -612,7 +607,7 @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
 
 out:
     virtio_blk_req_complete(req, err_status);
-    virtio_blk_free_request(req);
+    g_free(req);
     g_free(data->zone_report_data.zones);
     g_free(data);
 }
@@ -661,7 +656,7 @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
     return;
 out:
     virtio_blk_req_complete(req, err_status);
-    virtio_blk_free_request(req);
+    g_free(req);
 }
 
 static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
@@ -677,7 +672,7 @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
     }
 
     virtio_blk_req_complete(req, err_status);
-    virtio_blk_free_request(req);
+    g_free(req);
 }
 
 static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
@@ -719,7 +714,7 @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
     return 0;
 out:
     virtio_blk_req_complete(req, err_status);
-    virtio_blk_free_request(req);
+    g_free(req);
     return err_status;
 }
 
@@ -750,7 +745,7 @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
 
 out:
     virtio_blk_req_complete(req, err_status);
-    virtio_blk_free_request(req);
+    g_free(req);
     g_free(data);
 }
 
@@ -788,7 +783,7 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
 
 out:
     virtio_blk_req_complete(req, err_status);
-    virtio_blk_free_request(req);
+    g_free(req);
     return err_status;
 }
 
@@ -855,7 +850,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
             virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
             block_acct_invalid(blk_get_stats(s->blk),
                                is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
-            virtio_blk_free_request(req);
+            g_free(req);
             return 0;
         }
 
@@ -911,7 +906,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
                               VIRTIO_BLK_ID_BYTES));
         iov_from_buf(in_iov, in_num, 0, serial, size);
         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-        virtio_blk_free_request(req);
+        g_free(req);
         break;
     }
     case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
@@ -943,7 +938,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         if (unlikely(!(type & VIRTIO_BLK_T_OUT) ||
                      out_len > sizeof(dwz_hdr))) {
             virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-            virtio_blk_free_request(req);
+            g_free(req);
             return 0;
         }
 
@@ -960,7 +955,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
                                                             is_write_zeroes);
         if (err_status != VIRTIO_BLK_S_OK) {
             virtio_blk_req_complete(req, err_status);
-            virtio_blk_free_request(req);
+            g_free(req);
         }
 
         break;
@@ -975,7 +970,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         if (!vbk->handle_unknown_request ||
             !vbk->handle_unknown_request(req, mrb, type)) {
             virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-            virtio_blk_free_request(req);
+            g_free(req);
         }
     }
     }
@@ -998,7 +993,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
         while ((req = virtio_blk_get_request(s, vq))) {
             if (virtio_blk_handle_request(req, &mrb)) {
                 virtqueue_detach_element(req->vq, &req->elem, 0);
-                virtio_blk_free_request(req);
+                g_free(req);
                 break;
             }
         }
@@ -1048,7 +1043,7 @@ static void virtio_blk_dma_restart_bh(void *opaque)
             while (req) {
                 next = req->next;
                 virtqueue_detach_element(req->vq, &req->elem, 0);
-                virtio_blk_free_request(req);
+                g_free(req);
                 req = next;
             }
             break;
@@ -1131,7 +1126,7 @@ static void virtio_blk_reset(VirtIODevice *vdev)
             /* No other threads can access req->vq here */
             virtqueue_detach_element(req->vq, &req->elem, 0);
 
-            virtio_blk_free_request(req);
+            g_free(req);
         }
     }
 
diff --git a/hw/vmapple/virtio-blk.c b/hw/vmapple/virtio-blk.c
index 3a8b47bc55f..9f84c4851f5 100644
--- a/hw/vmapple/virtio-blk.c
+++ b/hw/vmapple/virtio-blk.c
@@ -58,7 +58,7 @@ static bool vmapple_virtio_blk_handle_unknown_request(VirtIOBlockReq *req,
         qemu_log_mask(LOG_UNIMP, "%s: Barrier requests are currently no-ops\n",
                       __func__);
         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-        virtio_blk_free_request(req);
+        g_free(req);
         return true;
     default:
         return false;
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 28d5046ea6c..dcb2c89aed5 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -109,7 +109,6 @@ typedef struct VirtIOBlkClass {
 } VirtIOBlkClass;
 
 void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
-void virtio_blk_free_request(VirtIOBlockReq *req);
 void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status);
 
 #endif
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 14/15] hw/block/virtio-blk: Replaces request free function with g_free
  2024-10-24 10:28 ` [PATCH v4 14/15] hw/block/virtio-blk: Replaces request free function with g_free Phil Dennis-Jordan
@ 2024-10-26  6:03   ` Akihiko Odaki
  0 siblings, 0 replies; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  6:03 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> The virtio_blk_free_request() function has been a 1-liner forwarding
> to g_free() for a while now. We may as well call g_free on the request
> pointer directly.
> 
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>

Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 15/15] hw/vmapple/vmapple: Add vmapple machine type
  2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
                   ` (13 preceding siblings ...)
  2024-10-24 10:28 ` [PATCH v4 14/15] hw/block/virtio-blk: Replaces request free function with g_free Phil Dennis-Jordan
@ 2024-10-24 10:28 ` Phil Dennis-Jordan
  2024-10-26  6:20   ` Akihiko Odaki
  14 siblings, 1 reply; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-24 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: agraf, phil, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, akihiko.odaki, qemu-arm, qemu-block, qemu-riscv,
	Alexander Graf

From: Alexander Graf <graf@amazon.com>

Apple defines a new "vmapple" machine type as part of its proprietary
macOS Virtualization.Framework vmm. This machine type is similar to the
virt one, but with subtle differences in base devices, a few special
vmapple device additions and a vastly different boot chain.

This patch reimplements this machine type in QEMU. To use it, you
have to have a readily installed version of macOS for VMApple,
run on macOS with -accel hvf, pass the Virtualization.Framework
boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
and pass aux and root volume as virtio drives. In addition, you also
need to find the machine UUID and pass that as -M vmapple,uuid= parameter:

$ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
    -bios /System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin
    -drive file=aux,if=pflash,format=raw \
    -drive file=root,if=pflash,format=raw \
    -drive file=aux,if=none,id=aux,format=raw \
    -device vmapple-virtio-aux,drive=aux \
    -drive file=root,if=none,id=root,format=raw \
    -device vmapple-virtio-root,drive=root

With all these in place, you should be able to see macOS booting
successfully.

Known issues:
 - Keyboard and mouse/tablet input is laggy. The reason for this is
   either that macOS's XHCI driver is broken when the device/platform
   does not support MSI/MSI-X, or there's some unfortunate interplay
   with Qemu's XHCI implementation in this scenario.
 - Currently only macOS 12 guests are supported. The boot process for
   13+ will need further investigation and adjustment.

Signed-off-by: Alexander Graf <graf@amazon.com>
Co-authored-by: Phil Dennis-Jordan <phil@philjordan.eu>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
v3:
 * Rebased on latest upstream, updated affinity and NIC creation
API usage
 * Included Apple-variant virtio-blk in build dependency
 * Updated API usage for setting 'redist-region-count' array-typed property on GIC.
 * Switched from virtio HID devices (for which macOS 12 does not contain drivers) to an XHCI USB controller and USB HID devices.

v4:
 * Fixups for v4 changes to the other patches in the set.
 * Corrected the assert macro to use
 * Removed superfluous endian conversions corresponding to cfg's.
 * Init error handling improvement.
 * No need to select CPU type on TCG, as only HVF is supported.
 * Machine type version bumped to 9.2
 * #include order improved

 MAINTAINERS                 |   1 +
 docs/system/arm/vmapple.rst |  63 ++++
 docs/system/target-arm.rst  |   1 +
 hw/vmapple/Kconfig          |  20 ++
 hw/vmapple/meson.build      |   1 +
 hw/vmapple/vmapple.c        | 652 ++++++++++++++++++++++++++++++++++++
 6 files changed, 738 insertions(+)
 create mode 100644 docs/system/arm/vmapple.rst
 create mode 100644 hw/vmapple/vmapple.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 104813ed85f..f44418b4a95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2739,6 +2739,7 @@ R: Phil Dennis-Jordan <phil@philjordan.eu>
 S: Maintained
 F: hw/vmapple/*
 F: include/hw/vmapple/*
+F: docs/system/arm/vmapple.rst
 
 Subsystems
 ----------
diff --git a/docs/system/arm/vmapple.rst b/docs/system/arm/vmapple.rst
new file mode 100644
index 00000000000..acb921ffb35
--- /dev/null
+++ b/docs/system/arm/vmapple.rst
@@ -0,0 +1,63 @@
+VMApple machine emulation
+========================================================================================
+
+VMApple is the device model that the macOS built-in hypervisor called "Virtualization.framework"
+exposes to Apple Silicon macOS guests. The "vmapple" machine model in QEMU implements the same
+device model, but does not use any code from Virtualization.Framework.
+
+Prerequisites
+-------------
+
+To run the vmapple machine model, you need to
+
+ * Run on Apple Silicon
+ * Run on macOS 12.0 or above
+ * Have an already installed copy of a Virtualization.Framework macOS 12 virtual machine. I will
+   assume that you installed it using the macosvm CLI.
+
+First, we need to extract the UUID from the virtual machine that you installed. You can do this
+by running the following shell script:
+
+.. code-block:: bash
+  :caption: uuid.sh script to extract the UUID from a macosvm.json file
+
+  #!/bin/bash
+
+  MID=$(cat "$1" | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]);')
+  echo "$MID" | base64 -d | plutil -extract ECID raw -
+
+Now we also need to trim the aux partition. It contains metadata that we can just discard:
+
+.. code-block:: bash
+  :caption: Command to trim the aux file
+
+  $ dd if="aux.img" of="aux.img.trimmed" bs=$(( 0x4000 )) skip=1
+
+How to run
+----------
+
+Then, we can launch QEMU with the Virtualization.Framework pre-boot environment and the readily
+installed target disk images. I recommend to port forward the VM's ssh and vnc ports to the host
+to get better interactive access into the target system:
+
+.. code-block:: bash
+  :caption: Example execution command line
+
+  $ UUID=$(uuid.sh macosvm.json)
+  $ AVPBOOTER=/System/Library/Frameworks/Virtualization.framework/Resources/AVPBooter.vmapple2.bin
+  $ AUX=aux.img.trimmed
+  $ DISK=disk.img
+  $ qemu-system-aarch64 \
+       -serial mon:stdio \
+       -m 4G \
+       -accel hvf \
+       -M vmapple,uuid=$UUID \
+       -bios $AVPBOOTER \
+        -drive file="$AUX",if=pflash,format=raw \
+        -drive file="$DISK",if=pflash,format=raw \
+       -drive file="$AUX",if=none,id=aux,format=raw \
+       -drive file="$DISK",if=none,id=root,format=raw \
+       -device vmapple-virtio-aux,drive=aux \
+       -device vmapple-virtio-root,drive=root \
+       -net user,ipv6=off,hostfwd=tcp::2222-:22,hostfwd=tcp::5901-:5900 \
+       -net nic,model=virtio-net-pci \
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 3c0a5848453..f2e0ac99537 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -102,6 +102,7 @@ undocumented; you can get a complete list by running
    arm/stellaris
    arm/stm32
    arm/virt
+   arm/vmapple
    arm/xenpvh
    arm/xlnx-versal-virt
    arm/xlnx-zynq
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index bcd1be63e3c..6a4c4a7fa2e 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -10,3 +10,23 @@ config VMAPPLE_CFG
 config VMAPPLE_VIRTIO_BLK
     bool
 
+config VMAPPLE
+    bool
+    depends on ARM
+    depends on HVF
+    default y if ARM
+    imply PCI_DEVICES
+    select ARM_GIC
+    select PLATFORM_BUS
+    select PCI_EXPRESS
+    select PCI_EXPRESS_GENERIC_BRIDGE
+    select PL011 # UART
+    select PL031 # RTC
+    select PL061 # GPIO
+    select GPIO_PWR
+    select PVPANIC_MMIO
+    select VMAPPLE_AES
+    select VMAPPLE_BDIF
+    select VMAPPLE_CFG
+    select MAC_PVG_MMIO
+    select VMAPPLE_VIRTIO_BLK
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
index bf17cf906c9..e572f7d5602 100644
--- a/hw/vmapple/meson.build
+++ b/hw/vmapple/meson.build
@@ -2,3 +2,4 @@ system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
 system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
 system_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
 system_ss.add(when: 'CONFIG_VMAPPLE_VIRTIO_BLK',  if_true: files('virtio-blk.c'))
+specific_ss.add(when: 'CONFIG_VMAPPLE',     if_true: files('vmapple.c'))
diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c
new file mode 100644
index 00000000000..b9454c07eee
--- /dev/null
+++ b/hw/vmapple/vmapple.c
@@ -0,0 +1,652 @@
+/*
+ * VMApple machine emulation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * VMApple is the device model that the macOS built-in hypervisor called
+ * "Virtualization.framework" exposes to Apple Silicon macOS guests. The
+ * machine model in this file implements the same device model in QEMU, but
+ * does not use any code from Virtualization.Framework.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitops.h"
+#include "qemu/datadir.h"
+#include "qemu/error-report.h"
+#include "qemu/guest-random.h"
+#include "qemu/help-texts.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu/units.h"
+#include "monitor/qdev.h"
+#include "hw/boards.h"
+#include "hw/irq.h"
+#include "hw/loader.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "hw/usb.h"
+#include "hw/arm/boot.h"
+#include "hw/arm/primecell.h"
+#include "hw/char/pl011.h"
+#include "hw/intc/arm_gic.h"
+#include "hw/intc/arm_gicv3_common.h"
+#include "hw/misc/pvpanic.h"
+#include "hw/pci-host/gpex.h"
+#include "hw/usb/xhci.h"
+#include "hw/virtio/virtio-pci.h"
+#include "hw/vmapple/vmapple.h"
+#include "net/net.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/visitor.h"
+#include "qapi/qapi-visit-common.h"
+#include "standard-headers/linux/input.h"
+#include "sysemu/hvf.h"
+#include "sysemu/kvm.h"
+#include "sysemu/reset.h"
+#include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+#include "target/arm/internals.h"
+#include "target/arm/kvm_arm.h"
+
+struct VMAppleMachineClass {
+    MachineClass parent;
+};
+
+struct VMAppleMachineState {
+    MachineState parent;
+
+    Notifier machine_done;
+    struct arm_boot_info bootinfo;
+    MemMapEntry *memmap;
+    const int *irqmap;
+    DeviceState *gic;
+    DeviceState *cfg;
+    Notifier powerdown_notifier;
+    PCIBus *bus;
+    MemoryRegion fw_mr;
+    uint64_t uuid;
+};
+
+#define DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, latest) \
+    static void vmapple##major##_##minor##_class_init(ObjectClass *oc, \
+                                                    void *data) \
+    { \
+        MachineClass *mc = MACHINE_CLASS(oc); \
+        vmapple_machine_##major##_##minor##_options(mc); \
+        mc->desc = "QEMU " # major "." # minor " Apple Virtual Machine"; \
+        if (latest) { \
+            mc->alias = "vmapple"; \
+        } \
+    } \
+    static const TypeInfo machvmapple##major##_##minor##_info = { \
+        .name = MACHINE_TYPE_NAME("vmapple-" # major "." # minor), \
+        .parent = TYPE_VMAPPLE_MACHINE, \
+        .class_init = vmapple##major##_##minor##_class_init, \
+    }; \
+    static void machvmapple_machine_##major##_##minor##_init(void) \
+    { \
+        type_register_static(&machvmapple##major##_##minor##_info); \
+    } \
+    type_init(machvmapple_machine_##major##_##minor##_init);
+
+#define DEFINE_VMAPPLE_MACHINE_AS_LATEST(major, minor) \
+    DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, true)
+#define DEFINE_VMAPPLE_MACHINE(major, minor) \
+    DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, false)
+
+#define TYPE_VMAPPLE_MACHINE   MACHINE_TYPE_NAME("vmapple")
+OBJECT_DECLARE_TYPE(VMAppleMachineState, VMAppleMachineClass, VMAPPLE_MACHINE)
+
+/* Number of external interrupt lines to configure the GIC with */
+#define NUM_IRQS 256
+
+enum {
+    VMAPPLE_FIRMWARE,
+    VMAPPLE_CONFIG,
+    VMAPPLE_MEM,
+    VMAPPLE_GIC_DIST,
+    VMAPPLE_GIC_REDIST,
+    VMAPPLE_UART,
+    VMAPPLE_RTC,
+    VMAPPLE_PCIE,
+    VMAPPLE_PCIE_MMIO,
+    VMAPPLE_PCIE_ECAM,
+    VMAPPLE_GPIO,
+    VMAPPLE_PVPANIC,
+    VMAPPLE_APV_GFX,
+    VMAPPLE_APV_IOSFC,
+    VMAPPLE_AES_1,
+    VMAPPLE_AES_2,
+    VMAPPLE_BDOOR,
+    VMAPPLE_MEMMAP_LAST,
+};
+
+static MemMapEntry memmap[] = {
+    [VMAPPLE_FIRMWARE] =           { 0x00100000, 0x00100000 },
+    [VMAPPLE_CONFIG] =             { 0x00400000, 0x00010000 },
+
+    [VMAPPLE_GIC_DIST] =           { 0x10000000, 0x00010000 },
+    [VMAPPLE_GIC_REDIST] =         { 0x10010000, 0x00400000 },
+
+    [VMAPPLE_UART] =               { 0x20010000, 0x00010000 },
+    [VMAPPLE_RTC] =                { 0x20050000, 0x00001000 },
+    [VMAPPLE_GPIO] =               { 0x20060000, 0x00001000 },
+    [VMAPPLE_PVPANIC] =            { 0x20070000, 0x00000002 },
+    [VMAPPLE_BDOOR] =              { 0x30000000, 0x00200000 },
+    [VMAPPLE_APV_GFX] =            { 0x30200000, 0x00010000 },
+    [VMAPPLE_APV_IOSFC] =          { 0x30210000, 0x00010000 },
+    [VMAPPLE_AES_1] =              { 0x30220000, 0x00004000 },
+    [VMAPPLE_AES_2] =              { 0x30230000, 0x00004000 },
+    [VMAPPLE_PCIE_ECAM] =          { 0x40000000, 0x10000000 },
+    [VMAPPLE_PCIE_MMIO] =          { 0x50000000, 0x1fff0000 },
+
+    /* Actual RAM size depends on configuration */
+    [VMAPPLE_MEM] =                { 0x70000000ULL, GiB},
+};
+
+static const int irqmap[] = {
+    [VMAPPLE_UART] = 1,
+    [VMAPPLE_RTC] = 2,
+    [VMAPPLE_GPIO] = 0x5,
+    [VMAPPLE_APV_IOSFC] = 0x10,
+    [VMAPPLE_APV_GFX] = 0x11,
+    [VMAPPLE_AES_1] = 0x12,
+    [VMAPPLE_PCIE] = 0x20,
+};
+
+#define GPEX_NUM_IRQS 16
+
+static void create_bdif(VMAppleMachineState *vms, MemoryRegion *mem)
+{
+    DeviceState *bdif;
+    SysBusDevice *bdif_sb;
+    DriveInfo *di_aux = drive_get(IF_PFLASH, 0, 0);
+    DriveInfo *di_root = drive_get(IF_PFLASH, 0, 1);
+
+    if (!di_aux) {
+        error_report("No AUX device. Please specify one as pflash drive.");
+        exit(1);
+    }
+
+    if (!di_root) {
+        /* Fall back to the first IF_VIRTIO device as root device */
+        di_root = drive_get(IF_VIRTIO, 0, 0);
+    }
+
+    if (!di_root) {
+        error_report("No root device. Please specify one as virtio drive.");
+        exit(1);
+    }
+
+    /* PV backdoor device */
+    bdif = qdev_new(TYPE_VMAPPLE_BDIF);
+    bdif_sb = SYS_BUS_DEVICE(bdif);
+    sysbus_mmio_map(bdif_sb, 0, vms->memmap[VMAPPLE_BDOOR].base);
+
+    qdev_prop_set_drive(DEVICE(bdif), "aux", blk_by_legacy_dinfo(di_aux));
+    qdev_prop_set_drive(DEVICE(bdif), "root", blk_by_legacy_dinfo(di_root));
+
+    sysbus_realize_and_unref(bdif_sb, &error_fatal);
+}
+
+static void create_pvpanic(VMAppleMachineState *vms, MemoryRegion *mem)
+{
+    SysBusDevice *cfg;
+
+    vms->cfg = qdev_new(TYPE_PVPANIC_MMIO_DEVICE);
+    cfg = SYS_BUS_DEVICE(vms->cfg);
+    sysbus_mmio_map(cfg, 0, vms->memmap[VMAPPLE_PVPANIC].base);
+
+    sysbus_realize_and_unref(cfg, &error_fatal);
+}
+
+static void create_cfg(VMAppleMachineState *vms, MemoryRegion *mem)
+{
+    SysBusDevice *cfg;
+    MachineState *machine = MACHINE(vms);
+    uint32_t rnd = 1;
+
+    vms->cfg = qdev_new(TYPE_VMAPPLE_CFG);
+    cfg = SYS_BUS_DEVICE(vms->cfg);
+    sysbus_mmio_map(cfg, 0, vms->memmap[VMAPPLE_CONFIG].base);
+
+    qemu_guest_getrandom_nofail(&rnd, sizeof(rnd));
+
+    qdev_prop_set_uint32(vms->cfg, "nr-cpus", machine->smp.cpus);
+    qdev_prop_set_uint64(vms->cfg, "ecid", vms->uuid);
+    qdev_prop_set_uint64(vms->cfg, "ram-size", machine->ram_size);
+    qdev_prop_set_uint32(vms->cfg, "rnd", rnd);
+
+    sysbus_realize_and_unref(cfg, &error_fatal);
+}
+
+static void create_gfx(VMAppleMachineState *vms, MemoryRegion *mem)
+{
+    int irq_gfx = vms->irqmap[VMAPPLE_APV_GFX];
+    int irq_iosfc = vms->irqmap[VMAPPLE_APV_IOSFC];
+    SysBusDevice *aes;
+
+    aes = SYS_BUS_DEVICE(qdev_new("apple-gfx-mmio"));
+    sysbus_mmio_map(aes, 0, vms->memmap[VMAPPLE_APV_GFX].base);
+    sysbus_mmio_map(aes, 1, vms->memmap[VMAPPLE_APV_IOSFC].base);
+    sysbus_connect_irq(aes, 0, qdev_get_gpio_in(vms->gic, irq_gfx));
+    sysbus_connect_irq(aes, 1, qdev_get_gpio_in(vms->gic, irq_iosfc));
+    sysbus_realize_and_unref(aes, &error_fatal);
+}
+
+static void create_aes(VMAppleMachineState *vms, MemoryRegion *mem)
+{
+    int irq = vms->irqmap[VMAPPLE_AES_1];
+    SysBusDevice *aes;
+
+    aes = SYS_BUS_DEVICE(qdev_new("apple-aes"));
+    sysbus_mmio_map(aes, 0, vms->memmap[VMAPPLE_AES_1].base);
+    sysbus_mmio_map(aes, 1, vms->memmap[VMAPPLE_AES_2].base);
+    sysbus_connect_irq(aes, 0, qdev_get_gpio_in(vms->gic, irq));
+    sysbus_realize_and_unref(aes, &error_fatal);
+}
+
+static inline int arm_gic_ppi_index(int cpu_nr, int ppi_index)
+{
+    return NUM_IRQS + cpu_nr * GIC_INTERNAL + ppi_index;
+}
+
+static void create_gic(VMAppleMachineState *vms, MemoryRegion *mem)
+{
+    MachineState *ms = MACHINE(vms);
+    /* We create a standalone GIC */
+    SysBusDevice *gicbusdev;
+    QList *redist_region_count;
+    int i;
+    unsigned int smp_cpus = ms->smp.cpus;
+
+    vms->gic = qdev_new(gicv3_class_name());
+    qdev_prop_set_uint32(vms->gic, "revision", 3);
+    qdev_prop_set_uint32(vms->gic, "num-cpu", smp_cpus);
+    /*
+     * Note that the num-irq property counts both internal and external
+     * interrupts; there are always 32 of the former (mandated by GIC spec).
+     */
+    qdev_prop_set_uint32(vms->gic, "num-irq", NUM_IRQS + 32);
+
+    uint32_t redist0_capacity =
+                vms->memmap[VMAPPLE_GIC_REDIST].size / GICV3_REDIST_SIZE;
+    uint32_t redist0_count = MIN(smp_cpus, redist0_capacity);
+
+    redist_region_count = qlist_new();
+    qlist_append_int(redist_region_count, redist0_count);
+    qdev_prop_set_array(vms->gic, "redist-region-count", redist_region_count);
+
+    gicbusdev = SYS_BUS_DEVICE(vms->gic);
+    sysbus_realize_and_unref(gicbusdev, &error_fatal);
+    sysbus_mmio_map(gicbusdev, 0, vms->memmap[VMAPPLE_GIC_DIST].base);
+    sysbus_mmio_map(gicbusdev, 1, vms->memmap[VMAPPLE_GIC_REDIST].base);
+
+    /*
+     * Wire the outputs from each CPU's generic timer and the GICv3
+     * maintenance interrupt signal to the appropriate GIC PPI inputs,
+     * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
+     */
+    for (i = 0; i < smp_cpus; i++) {
+        DeviceState *cpudev = DEVICE(qemu_get_cpu(i));
+
+        /* Map the virt timer to PPI 27 */
+        qdev_connect_gpio_out(cpudev, GTIMER_VIRT,
+                              qdev_get_gpio_in(vms->gic,
+                                               arm_gic_ppi_index(i, 27)));
+
+        /* Map the GIC IRQ and FIQ lines to CPU */
+        sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
+        sysbus_connect_irq(gicbusdev, i + smp_cpus,
+                           qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
+    }
+}
+
+static void create_uart(const VMAppleMachineState *vms, int uart,
+                        MemoryRegion *mem, Chardev *chr)
+{
+    hwaddr base = vms->memmap[uart].base;
+    int irq = vms->irqmap[uart];
+    DeviceState *dev = qdev_new(TYPE_PL011);
+    SysBusDevice *s = SYS_BUS_DEVICE(dev);
+
+    qdev_prop_set_chr(dev, "chardev", chr);
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    memory_region_add_subregion(mem, base,
+                                sysbus_mmio_get_region(s, 0));
+    sysbus_connect_irq(s, 0, qdev_get_gpio_in(vms->gic, irq));
+}
+
+static void create_rtc(const VMAppleMachineState *vms)
+{
+    hwaddr base = vms->memmap[VMAPPLE_RTC].base;
+    int irq = vms->irqmap[VMAPPLE_RTC];
+
+    sysbus_create_simple("pl031", base, qdev_get_gpio_in(vms->gic, irq));
+}
+
+static DeviceState *gpio_key_dev;
+static void vmapple_powerdown_req(Notifier *n, void *opaque)
+{
+    /* use gpio Pin 3 for power button event */
+    qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
+}
+
+static void create_gpio_devices(const VMAppleMachineState *vms, int gpio,
+                                MemoryRegion *mem)
+{
+    DeviceState *pl061_dev;
+    hwaddr base = vms->memmap[gpio].base;
+    int irq = vms->irqmap[gpio];
+    SysBusDevice *s;
+
+    pl061_dev = qdev_new("pl061");
+    /* Pull lines down to 0 if not driven by the PL061 */
+    qdev_prop_set_uint32(pl061_dev, "pullups", 0);
+    qdev_prop_set_uint32(pl061_dev, "pulldowns", 0xff);
+    s = SYS_BUS_DEVICE(pl061_dev);
+    sysbus_realize_and_unref(s, &error_fatal);
+    memory_region_add_subregion(mem, base, sysbus_mmio_get_region(s, 0));
+    sysbus_connect_irq(s, 0, qdev_get_gpio_in(vms->gic, irq));
+    gpio_key_dev = sysbus_create_simple("gpio-key", -1,
+                                        qdev_get_gpio_in(pl061_dev, 3));
+}
+
+static void vmapple_firmware_init(VMAppleMachineState *vms,
+                                  MemoryRegion *sysmem)
+{
+    hwaddr size = vms->memmap[VMAPPLE_FIRMWARE].size;
+    hwaddr base = vms->memmap[VMAPPLE_FIRMWARE].base;
+    const char *bios_name;
+    int image_size;
+    char *fname;
+
+    bios_name = MACHINE(vms)->firmware;
+    if (!bios_name) {
+        error_report("No firmware specified");
+        exit(1);
+    }
+
+    fname = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+    if (!fname) {
+        error_report("Could not find ROM image '%s'", bios_name);
+        exit(1);
+    }
+
+    memory_region_init_ram(&vms->fw_mr, NULL, "firmware", size, &error_fatal);
+    image_size = load_image_mr(fname, &vms->fw_mr);
+
+    g_free(fname);
+    if (image_size < 0) {
+        error_report("Could not load ROM image '%s'", bios_name);
+        exit(1);
+    }
+
+    memory_region_add_subregion(get_system_memory(), base, &vms->fw_mr);
+}
+
+static void create_pcie(VMAppleMachineState *vms)
+{
+    hwaddr base_mmio = vms->memmap[VMAPPLE_PCIE_MMIO].base;
+    hwaddr size_mmio = vms->memmap[VMAPPLE_PCIE_MMIO].size;
+    hwaddr base_ecam = vms->memmap[VMAPPLE_PCIE_ECAM].base;
+    hwaddr size_ecam = vms->memmap[VMAPPLE_PCIE_ECAM].size;
+    int irq = vms->irqmap[VMAPPLE_PCIE];
+    MemoryRegion *mmio_alias;
+    MemoryRegion *mmio_reg;
+    MemoryRegion *ecam_alias;
+    MemoryRegion *ecam_reg;
+    DeviceState *dev;
+    int i;
+    PCIHostState *pci;
+    DeviceState *usb_controller;
+    USBBus *usb_bus;
+
+    dev = qdev_new(TYPE_GPEX_HOST);
+    qdev_prop_set_uint32(dev, "num-irqs", GPEX_NUM_IRQS);
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+
+    /* Map only the first size_ecam bytes of ECAM space */
+    ecam_alias = g_new0(MemoryRegion, 1);
+    ecam_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
+    memory_region_init_alias(ecam_alias, OBJECT(dev), "pcie-ecam",
+                             ecam_reg, 0, size_ecam);
+    memory_region_add_subregion(get_system_memory(), base_ecam, ecam_alias);
+
+    /*
+     * Map the MMIO window from [0x50000000-0x7fff0000] in PCI space into
+     * system address space at [0x50000000-0x7fff0000].
+     */
+    mmio_alias = g_new0(MemoryRegion, 1);
+    mmio_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 1);
+    memory_region_init_alias(mmio_alias, OBJECT(dev), "pcie-mmio",
+                             mmio_reg, base_mmio, size_mmio);
+    memory_region_add_subregion(get_system_memory(), base_mmio, mmio_alias);
+
+    for (i = 0; i < GPEX_NUM_IRQS; i++) {
+        sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
+                           qdev_get_gpio_in(vms->gic, irq + i));
+        gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
+    }
+
+    pci = PCI_HOST_BRIDGE(dev);
+    vms->bus = pci->bus;
+    g_assert(vms->bus);
+
+    while ((dev = qemu_create_nic_device("virtio-net-pci", true, NULL))) {
+        qdev_realize_and_unref(dev, BUS(vms->bus), &error_fatal);
+    }
+
+    usb_controller = qdev_new(TYPE_QEMU_XHCI);
+    qdev_realize_and_unref(usb_controller, BUS(pci->bus), &error_fatal);
+
+    usb_bus = USB_BUS(object_resolve_type_unambiguous(TYPE_USB_BUS,
+                                                      &error_fatal));
+    usb_create_simple(usb_bus, "usb-kbd");
+    usb_create_simple(usb_bus, "usb-tablet");
+}
+
+static void vmapple_reset(void *opaque)
+{
+    VMAppleMachineState *vms = opaque;
+    hwaddr base = vms->memmap[VMAPPLE_FIRMWARE].base;
+
+    cpu_set_pc(first_cpu, base);
+}
+
+static void mach_vmapple_init(MachineState *machine)
+{
+    VMAppleMachineState *vms = VMAPPLE_MACHINE(machine);
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+    const CPUArchIdList *possible_cpus;
+    MemoryRegion *sysmem = get_system_memory();
+    int n;
+    unsigned int smp_cpus = machine->smp.cpus;
+    unsigned int max_cpus = machine->smp.max_cpus;
+
+    vms->memmap = memmap;
+    machine->usb = true;
+
+    possible_cpus = mc->possible_cpu_arch_ids(machine);
+    assert(possible_cpus->len == max_cpus);
+    for (n = 0; n < possible_cpus->len; n++) {
+        Object *cpu;
+        CPUState *cs;
+
+        if (n >= smp_cpus) {
+            break;
+        }
+
+        cpu = object_new(possible_cpus->cpus[n].type);
+        object_property_set_int(cpu, "mp-affinity",
+                                possible_cpus->cpus[n].arch_id, NULL);
+
+        cs = CPU(cpu);
+        cs->cpu_index = n;
+
+        numa_cpu_pre_plug(&possible_cpus->cpus[cs->cpu_index], DEVICE(cpu),
+                          &error_fatal);
+
+        object_property_set_bool(cpu, "has_el3", false, NULL);
+        object_property_set_bool(cpu, "has_el2", false, NULL);
+        object_property_set_int(cpu, "psci-conduit", QEMU_PSCI_CONDUIT_HVC,
+                                NULL);
+
+        /* Secondary CPUs start in PSCI powered-down state */
+        if (n > 0) {
+            object_property_set_bool(cpu, "start-powered-off", true, NULL);
+        }
+
+        object_property_set_link(cpu, "memory", OBJECT(sysmem), &error_abort);
+        qdev_realize(DEVICE(cpu), NULL, &error_fatal);
+        object_unref(cpu);
+    }
+
+    memory_region_add_subregion(sysmem, vms->memmap[VMAPPLE_MEM].base,
+                                machine->ram);
+
+    create_gic(vms, sysmem);
+    create_bdif(vms, sysmem);
+    create_pvpanic(vms, sysmem);
+    create_aes(vms, sysmem);
+    create_gfx(vms, sysmem);
+    create_uart(vms, VMAPPLE_UART, sysmem, serial_hd(0));
+    create_rtc(vms);
+    create_pcie(vms);
+
+    create_gpio_devices(vms, VMAPPLE_GPIO, sysmem);
+
+    vmapple_firmware_init(vms, sysmem);
+    create_cfg(vms, sysmem);
+
+    /* connect powerdown request */
+    vms->powerdown_notifier.notify = vmapple_powerdown_req;
+    qemu_register_powerdown_notifier(&vms->powerdown_notifier);
+
+    vms->bootinfo.ram_size = machine->ram_size;
+    vms->bootinfo.board_id = -1;
+    vms->bootinfo.loader_start = vms->memmap[VMAPPLE_MEM].base;
+    vms->bootinfo.skip_dtb_autoload = true;
+    vms->bootinfo.firmware_loaded = true;
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
+
+    qemu_register_reset(vmapple_reset, vms);
+}
+
+static CpuInstanceProperties
+vmapple_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;
+}
+
+
+static int64_t vmapple_get_default_cpu_node_id(const MachineState *ms, int idx)
+{
+    return idx % ms->numa_state->num_nodes;
+}
+
+static const CPUArchIdList *vmapple_possible_cpu_arch_ids(MachineState *ms)
+{
+    int n;
+    unsigned int max_cpus = ms->smp.max_cpus;
+
+    if (ms->possible_cpus) {
+        assert(ms->possible_cpus->len == max_cpus);
+        return ms->possible_cpus;
+    }
+
+    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
+                                  sizeof(CPUArchId) * max_cpus);
+    ms->possible_cpus->len = max_cpus;
+    for (n = 0; n < ms->possible_cpus->len; n++) {
+        ms->possible_cpus->cpus[n].type = ms->cpu_type;
+        ms->possible_cpus->cpus[n].arch_id =
+            arm_build_mp_affinity(n, GICV3_TARGETLIST_BITS);
+        ms->possible_cpus->cpus[n].props.has_thread_id = true;
+        ms->possible_cpus->cpus[n].props.thread_id = n;
+    }
+    return ms->possible_cpus;
+}
+
+static void vmapple_get_uuid(Object *obj, Visitor *v, const char *name,
+                             void *opaque, Error **errp)
+{
+    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
+
+    visit_type_uint64(v, name, &vms->uuid, errp);
+}
+
+static void vmapple_set_uuid(Object *obj, Visitor *v, const char *name,
+                             void *opaque, Error **errp)
+{
+    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
+    Error *error = NULL;
+
+    visit_type_uint64(v, name, &vms->uuid, &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+}
+
+static void vmapple_machine_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->init = mach_vmapple_init;
+    mc->max_cpus = 32;
+    mc->block_default_type = IF_VIRTIO;
+    mc->no_cdrom = 1;
+    mc->pci_allow_0_address = true;
+    mc->minimum_page_bits = 12;
+    mc->possible_cpu_arch_ids = vmapple_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = vmapple_cpu_index_to_props;
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("host");
+    mc->get_default_cpu_node_id = vmapple_get_default_cpu_node_id;
+    mc->default_ram_id = "mach-vmapple.ram";
+
+    object_register_sugar_prop(TYPE_VIRTIO_PCI, "disable-legacy",
+                               "on", true);
+
+    object_class_property_add(oc, "uuid", "uint64", vmapple_get_uuid,
+                              vmapple_set_uuid, NULL, NULL);
+    object_class_property_set_description(oc, "uuid", "Machine UUID (SDOM)");
+}
+
+static void vmapple_instance_init(Object *obj)
+{
+    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
+
+    vms->irqmap = irqmap;
+}
+
+static const TypeInfo vmapple_machine_info = {
+    .name          = TYPE_VMAPPLE_MACHINE,
+    .parent        = TYPE_MACHINE,
+    .abstract      = true,
+    .instance_size = sizeof(VMAppleMachineState),
+    .class_size    = sizeof(VMAppleMachineClass),
+    .class_init    = vmapple_machine_class_init,
+    .instance_init = vmapple_instance_init,
+};
+
+static void machvmapple_machine_init(void)
+{
+    type_register_static(&vmapple_machine_info);
+}
+type_init(machvmapple_machine_init);
+
+static void vmapple_machine_9_2_options(MachineClass *mc)
+{
+}
+DEFINE_VMAPPLE_MACHINE_AS_LATEST(9, 2)
+
-- 
2.39.3 (Apple Git-145)



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 15/15] hw/vmapple/vmapple: Add vmapple machine type
  2024-10-24 10:28 ` [PATCH v4 15/15] hw/vmapple/vmapple: Add vmapple machine type Phil Dennis-Jordan
@ 2024-10-26  6:20   ` Akihiko Odaki
  2024-10-26 11:58     ` Phil Dennis-Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Akihiko Odaki @ 2024-10-26  6:20 UTC (permalink / raw)
  To: Phil Dennis-Jordan, qemu-devel
  Cc: agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> From: Alexander Graf <graf@amazon.com>
> 
> Apple defines a new "vmapple" machine type as part of its proprietary
> macOS Virtualization.Framework vmm. This machine type is similar to the
> virt one, but with subtle differences in base devices, a few special
> vmapple device additions and a vastly different boot chain.
> 
> This patch reimplements this machine type in QEMU. To use it, you
> have to have a readily installed version of macOS for VMApple,
> run on macOS with -accel hvf, pass the Virtualization.Framework
> boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
> and pass aux and root volume as virtio drives. In addition, you also
> need to find the machine UUID and pass that as -M vmapple,uuid= parameter:
> 
> $ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
>      -bios /System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin
>      -drive file=aux,if=pflash,format=raw \
>      -drive file=root,if=pflash,format=raw \
>      -drive file=aux,if=none,id=aux,format=raw \
>      -device vmapple-virtio-aux,drive=aux \
>      -drive file=root,if=none,id=root,format=raw \
>      -device vmapple-virtio-root,drive=root
> 
> With all these in place, you should be able to see macOS booting
> successfully.
> 
> Known issues:
>   - Keyboard and mouse/tablet input is laggy. The reason for this is
>     either that macOS's XHCI driver is broken when the device/platform
>     does not support MSI/MSI-X, or there's some unfortunate interplay
>     with Qemu's XHCI implementation in this scenario.
>   - Currently only macOS 12 guests are supported. The boot process for
>     13+ will need further investigation and adjustment.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Co-authored-by: Phil Dennis-Jordan <phil@philjordan.eu>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
> v3:
>   * Rebased on latest upstream, updated affinity and NIC creation
> API usage
>   * Included Apple-variant virtio-blk in build dependency
>   * Updated API usage for setting 'redist-region-count' array-typed property on GIC.
>   * Switched from virtio HID devices (for which macOS 12 does not contain drivers) to an XHCI USB controller and USB HID devices.
> 
> v4:
>   * Fixups for v4 changes to the other patches in the set.
>   * Corrected the assert macro to use
>   * Removed superfluous endian conversions corresponding to cfg's.
>   * Init error handling improvement.
>   * No need to select CPU type on TCG, as only HVF is supported.
>   * Machine type version bumped to 9.2
>   * #include order improved
> 
>   MAINTAINERS                 |   1 +
>   docs/system/arm/vmapple.rst |  63 ++++
>   docs/system/target-arm.rst  |   1 +
>   hw/vmapple/Kconfig          |  20 ++
>   hw/vmapple/meson.build      |   1 +
>   hw/vmapple/vmapple.c        | 652 ++++++++++++++++++++++++++++++++++++
>   6 files changed, 738 insertions(+)
>   create mode 100644 docs/system/arm/vmapple.rst
>   create mode 100644 hw/vmapple/vmapple.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 104813ed85f..f44418b4a95 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2739,6 +2739,7 @@ R: Phil Dennis-Jordan <phil@philjordan.eu>
>   S: Maintained
>   F: hw/vmapple/*
>   F: include/hw/vmapple/*
> +F: docs/system/arm/vmapple.rst
>   
>   Subsystems
>   ----------
> diff --git a/docs/system/arm/vmapple.rst b/docs/system/arm/vmapple.rst
> new file mode 100644
> index 00000000000..acb921ffb35
> --- /dev/null
> +++ b/docs/system/arm/vmapple.rst
> @@ -0,0 +1,63 @@
> +VMApple machine emulation
> +========================================================================================
> +
> +VMApple is the device model that the macOS built-in hypervisor called "Virtualization.framework"
> +exposes to Apple Silicon macOS guests. The "vmapple" machine model in QEMU implements the same
> +device model, but does not use any code from Virtualization.Framework.
> +
> +Prerequisites
> +-------------
> +
> +To run the vmapple machine model, you need to
> +
> + * Run on Apple Silicon
> + * Run on macOS 12.0 or above
> + * Have an already installed copy of a Virtualization.Framework macOS 12 virtual machine. I will
> +   assume that you installed it using the macosvm CLI.
> +
> +First, we need to extract the UUID from the virtual machine that you installed. You can do this
> +by running the following shell script:
> +
> +.. code-block:: bash
> +  :caption: uuid.sh script to extract the UUID from a macosvm.json file
> +
> +  #!/bin/bash
> +
> +  MID=$(cat "$1" | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]);')
> +  echo "$MID" | base64 -d | plutil -extract ECID raw -

I prefer it to be written entirely in Python instead of a mixture of 
Python and Bash.

Perhaps it is better to put this script in contrib to avoid requiring 
the user to create a file and copy and paste it.

> +
> +Now we also need to trim the aux partition. It contains metadata that we can just discard:
> +
> +.. code-block:: bash
> +  :caption: Command to trim the aux file
> +
> +  $ dd if="aux.img" of="aux.img.trimmed" bs=$(( 0x4000 )) skip=1
> +
> +How to run
> +----------
> +
> +Then, we can launch QEMU with the Virtualization.Framework pre-boot environment and the readily
> +installed target disk images. I recommend to port forward the VM's ssh and vnc ports to the host
> +to get better interactive access into the target system:
> +
> +.. code-block:: bash
> +  :caption: Example execution command line
> +
> +  $ UUID=$(uuid.sh macosvm.json)
> +  $ AVPBOOTER=/System/Library/Frameworks/Virtualization.framework/Resources/AVPBooter.vmapple2.bin
> +  $ AUX=aux.img.trimmed
> +  $ DISK=disk.img
> +  $ qemu-system-aarch64 \
> +       -serial mon:stdio \
> +       -m 4G \
> +       -accel hvf \
> +       -M vmapple,uuid=$UUID \
> +       -bios $AVPBOOTER \
> +        -drive file="$AUX",if=pflash,format=raw \
> +        -drive file="$DISK",if=pflash,format=raw \
> +       -drive file="$AUX",if=none,id=aux,format=raw \
> +       -drive file="$DISK",if=none,id=root,format=raw \
> +       -device vmapple-virtio-aux,drive=aux \
> +       -device vmapple-virtio-root,drive=root \
> +       -net user,ipv6=off,hostfwd=tcp::2222-:22,hostfwd=tcp::5901-:5900 \
> +       -net nic,model=virtio-net-pci \
> diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
> index 3c0a5848453..f2e0ac99537 100644
> --- a/docs/system/target-arm.rst
> +++ b/docs/system/target-arm.rst
> @@ -102,6 +102,7 @@ undocumented; you can get a complete list by running
>      arm/stellaris
>      arm/stm32
>      arm/virt
> +   arm/vmapple
>      arm/xenpvh
>      arm/xlnx-versal-virt
>      arm/xlnx-zynq
> diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
> index bcd1be63e3c..6a4c4a7fa2e 100644
> --- a/hw/vmapple/Kconfig
> +++ b/hw/vmapple/Kconfig
> @@ -10,3 +10,23 @@ config VMAPPLE_CFG
>   config VMAPPLE_VIRTIO_BLK
>       bool
>   
> +config VMAPPLE
> +    bool
> +    depends on ARM
> +    depends on HVF
> +    default y if ARM
> +    imply PCI_DEVICES
> +    select ARM_GIC
> +    select PLATFORM_BUS
> +    select PCI_EXPRESS
> +    select PCI_EXPRESS_GENERIC_BRIDGE
> +    select PL011 # UART
> +    select PL031 # RTC
> +    select PL061 # GPIO
> +    select GPIO_PWR
> +    select PVPANIC_MMIO
> +    select VMAPPLE_AES
> +    select VMAPPLE_BDIF
> +    select VMAPPLE_CFG
> +    select MAC_PVG_MMIO
> +    select VMAPPLE_VIRTIO_BLK
> diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
> index bf17cf906c9..e572f7d5602 100644
> --- a/hw/vmapple/meson.build
> +++ b/hw/vmapple/meson.build
> @@ -2,3 +2,4 @@ system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
>   system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
>   system_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
>   system_ss.add(when: 'CONFIG_VMAPPLE_VIRTIO_BLK',  if_true: files('virtio-blk.c'))
> +specific_ss.add(when: 'CONFIG_VMAPPLE',     if_true: files('vmapple.c'))
> diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c
> new file mode 100644
> index 00000000000..b9454c07eee
> --- /dev/null
> +++ b/hw/vmapple/vmapple.c
> @@ -0,0 +1,652 @@
> +/*
> + * VMApple machine emulation
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + * VMApple is the device model that the macOS built-in hypervisor called
> + * "Virtualization.framework" exposes to Apple Silicon macOS guests. The
> + * machine model in this file implements the same device model in QEMU, but
> + * does not use any code from Virtualization.Framework.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/bitops.h"
> +#include "qemu/datadir.h"
> +#include "qemu/error-report.h"
> +#include "qemu/guest-random.h"
> +#include "qemu/help-texts.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qemu/option.h"
> +#include "qemu/units.h"
> +#include "monitor/qdev.h"
> +#include "hw/boards.h"
> +#include "hw/irq.h"
> +#include "hw/loader.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/sysbus.h"
> +#include "hw/usb.h"
> +#include "hw/arm/boot.h"
> +#include "hw/arm/primecell.h"
> +#include "hw/char/pl011.h"
> +#include "hw/intc/arm_gic.h"
> +#include "hw/intc/arm_gicv3_common.h"
> +#include "hw/misc/pvpanic.h"
> +#include "hw/pci-host/gpex.h"
> +#include "hw/usb/xhci.h"
> +#include "hw/virtio/virtio-pci.h"
> +#include "hw/vmapple/vmapple.h"
> +#include "net/net.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qlist.h"
> +#include "qapi/visitor.h"
> +#include "qapi/qapi-visit-common.h"
> +#include "standard-headers/linux/input.h"
> +#include "sysemu/hvf.h"
> +#include "sysemu/kvm.h"
> +#include "sysemu/reset.h"
> +#include "sysemu/runstate.h"
> +#include "sysemu/sysemu.h"
> +#include "target/arm/internals.h"
> +#include "target/arm/kvm_arm.h"
> +
> +struct VMAppleMachineClass {
> +    MachineClass parent;
> +};
> +
> +struct VMAppleMachineState {
> +    MachineState parent;
> +
> +    Notifier machine_done;
> +    struct arm_boot_info bootinfo;
> +    MemMapEntry *memmap;
> +    const int *irqmap;
> +    DeviceState *gic;
> +    DeviceState *cfg;
> +    Notifier powerdown_notifier;
> +    PCIBus *bus;
> +    MemoryRegion fw_mr;
> +    uint64_t uuid;
> +};
> +
> +#define DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, latest) \
> +    static void vmapple##major##_##minor##_class_init(ObjectClass *oc, \
> +                                                    void *data) \
> +    { \
> +        MachineClass *mc = MACHINE_CLASS(oc); \
> +        vmapple_machine_##major##_##minor##_options(mc); \
> +        mc->desc = "QEMU " # major "." # minor " Apple Virtual Machine"; \
> +        if (latest) { \
> +            mc->alias = "vmapple"; \
> +        } \
> +    } \
> +    static const TypeInfo machvmapple##major##_##minor##_info = { \
> +        .name = MACHINE_TYPE_NAME("vmapple-" # major "." # minor), \
> +        .parent = TYPE_VMAPPLE_MACHINE, \
> +        .class_init = vmapple##major##_##minor##_class_init, \
> +    }; \
> +    static void machvmapple_machine_##major##_##minor##_init(void) \
> +    { \
> +        type_register_static(&machvmapple##major##_##minor##_info); \
> +    } \
> +    type_init(machvmapple_machine_##major##_##minor##_init);
> +
> +#define DEFINE_VMAPPLE_MACHINE_AS_LATEST(major, minor) \
> +    DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, true)
> +#define DEFINE_VMAPPLE_MACHINE(major, minor) \
> +    DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, false)
> +
> +#define TYPE_VMAPPLE_MACHINE   MACHINE_TYPE_NAME("vmapple")
> +OBJECT_DECLARE_TYPE(VMAppleMachineState, VMAppleMachineClass, VMAPPLE_MACHINE)
> +
> +/* Number of external interrupt lines to configure the GIC with */
> +#define NUM_IRQS 256
> +
> +enum {
> +    VMAPPLE_FIRMWARE,
> +    VMAPPLE_CONFIG,
> +    VMAPPLE_MEM,
> +    VMAPPLE_GIC_DIST,
> +    VMAPPLE_GIC_REDIST,
> +    VMAPPLE_UART,
> +    VMAPPLE_RTC,
> +    VMAPPLE_PCIE,
> +    VMAPPLE_PCIE_MMIO,
> +    VMAPPLE_PCIE_ECAM,
> +    VMAPPLE_GPIO,
> +    VMAPPLE_PVPANIC,
> +    VMAPPLE_APV_GFX,
> +    VMAPPLE_APV_IOSFC,
> +    VMAPPLE_AES_1,
> +    VMAPPLE_AES_2,
> +    VMAPPLE_BDOOR,
> +    VMAPPLE_MEMMAP_LAST,
> +};
> +
> +static MemMapEntry memmap[] = {
> +    [VMAPPLE_FIRMWARE] =           { 0x00100000, 0x00100000 },
> +    [VMAPPLE_CONFIG] =             { 0x00400000, 0x00010000 },
> +
> +    [VMAPPLE_GIC_DIST] =           { 0x10000000, 0x00010000 },
> +    [VMAPPLE_GIC_REDIST] =         { 0x10010000, 0x00400000 },
> +
> +    [VMAPPLE_UART] =               { 0x20010000, 0x00010000 },
> +    [VMAPPLE_RTC] =                { 0x20050000, 0x00001000 },
> +    [VMAPPLE_GPIO] =               { 0x20060000, 0x00001000 },
> +    [VMAPPLE_PVPANIC] =            { 0x20070000, 0x00000002 },
> +    [VMAPPLE_BDOOR] =              { 0x30000000, 0x00200000 },
> +    [VMAPPLE_APV_GFX] =            { 0x30200000, 0x00010000 },
> +    [VMAPPLE_APV_IOSFC] =          { 0x30210000, 0x00010000 },
> +    [VMAPPLE_AES_1] =              { 0x30220000, 0x00004000 },
> +    [VMAPPLE_AES_2] =              { 0x30230000, 0x00004000 },
> +    [VMAPPLE_PCIE_ECAM] =          { 0x40000000, 0x10000000 },
> +    [VMAPPLE_PCIE_MMIO] =          { 0x50000000, 0x1fff0000 },
> +
> +    /* Actual RAM size depends on configuration */
> +    [VMAPPLE_MEM] =                { 0x70000000ULL, GiB},
> +};
> +
> +static const int irqmap[] = {
> +    [VMAPPLE_UART] = 1,
> +    [VMAPPLE_RTC] = 2,
> +    [VMAPPLE_GPIO] = 0x5,
> +    [VMAPPLE_APV_IOSFC] = 0x10,
> +    [VMAPPLE_APV_GFX] = 0x11,
> +    [VMAPPLE_AES_1] = 0x12,
> +    [VMAPPLE_PCIE] = 0x20,
> +};
> +
> +#define GPEX_NUM_IRQS 16
> +
> +static void create_bdif(VMAppleMachineState *vms, MemoryRegion *mem)
> +{
> +    DeviceState *bdif;
> +    SysBusDevice *bdif_sb;
> +    DriveInfo *di_aux = drive_get(IF_PFLASH, 0, 0);
> +    DriveInfo *di_root = drive_get(IF_PFLASH, 0, 1);
> +
> +    if (!di_aux) {
> +        error_report("No AUX device. Please specify one as pflash drive.");
> +        exit(1);
> +    }
> +
> +    if (!di_root) {
> +        /* Fall back to the first IF_VIRTIO device as root device */
> +        di_root = drive_get(IF_VIRTIO, 0, 0);
> +    }
> +
> +    if (!di_root) {
> +        error_report("No root device. Please specify one as virtio drive.");
> +        exit(1);
> +    }
> +
> +    /* PV backdoor device */
> +    bdif = qdev_new(TYPE_VMAPPLE_BDIF);
> +    bdif_sb = SYS_BUS_DEVICE(bdif);
> +    sysbus_mmio_map(bdif_sb, 0, vms->memmap[VMAPPLE_BDOOR].base);
> +
> +    qdev_prop_set_drive(DEVICE(bdif), "aux", blk_by_legacy_dinfo(di_aux));
> +    qdev_prop_set_drive(DEVICE(bdif), "root", blk_by_legacy_dinfo(di_root));
> +
> +    sysbus_realize_and_unref(bdif_sb, &error_fatal);
> +}
> +
> +static void create_pvpanic(VMAppleMachineState *vms, MemoryRegion *mem)
> +{
> +    SysBusDevice *cfg;
> +
> +    vms->cfg = qdev_new(TYPE_PVPANIC_MMIO_DEVICE);
> +    cfg = SYS_BUS_DEVICE(vms->cfg);
> +    sysbus_mmio_map(cfg, 0, vms->memmap[VMAPPLE_PVPANIC].base);
> +
> +    sysbus_realize_and_unref(cfg, &error_fatal);
> +}
> +
> +static void create_cfg(VMAppleMachineState *vms, MemoryRegion *mem)
> +{
> +    SysBusDevice *cfg;
> +    MachineState *machine = MACHINE(vms);
> +    uint32_t rnd = 1;
> +
> +    vms->cfg = qdev_new(TYPE_VMAPPLE_CFG);
> +    cfg = SYS_BUS_DEVICE(vms->cfg);
> +    sysbus_mmio_map(cfg, 0, vms->memmap[VMAPPLE_CONFIG].base);
> +
> +    qemu_guest_getrandom_nofail(&rnd, sizeof(rnd));
> +
> +    qdev_prop_set_uint32(vms->cfg, "nr-cpus", machine->smp.cpus);
> +    qdev_prop_set_uint64(vms->cfg, "ecid", vms->uuid);
> +    qdev_prop_set_uint64(vms->cfg, "ram-size", machine->ram_size);
> +    qdev_prop_set_uint32(vms->cfg, "rnd", rnd);
> +
> +    sysbus_realize_and_unref(cfg, &error_fatal);
> +}
> +
> +static void create_gfx(VMAppleMachineState *vms, MemoryRegion *mem)
> +{
> +    int irq_gfx = vms->irqmap[VMAPPLE_APV_GFX];
> +    int irq_iosfc = vms->irqmap[VMAPPLE_APV_IOSFC];
> +    SysBusDevice *aes;
> +
> +    aes = SYS_BUS_DEVICE(qdev_new("apple-gfx-mmio"));
> +    sysbus_mmio_map(aes, 0, vms->memmap[VMAPPLE_APV_GFX].base);
> +    sysbus_mmio_map(aes, 1, vms->memmap[VMAPPLE_APV_IOSFC].base);
> +    sysbus_connect_irq(aes, 0, qdev_get_gpio_in(vms->gic, irq_gfx));
> +    sysbus_connect_irq(aes, 1, qdev_get_gpio_in(vms->gic, irq_iosfc));
> +    sysbus_realize_and_unref(aes, &error_fatal);
> +}
> +
> +static void create_aes(VMAppleMachineState *vms, MemoryRegion *mem)
> +{
> +    int irq = vms->irqmap[VMAPPLE_AES_1];
> +    SysBusDevice *aes;
> +
> +    aes = SYS_BUS_DEVICE(qdev_new("apple-aes"));
> +    sysbus_mmio_map(aes, 0, vms->memmap[VMAPPLE_AES_1].base);
> +    sysbus_mmio_map(aes, 1, vms->memmap[VMAPPLE_AES_2].base);
> +    sysbus_connect_irq(aes, 0, qdev_get_gpio_in(vms->gic, irq));
> +    sysbus_realize_and_unref(aes, &error_fatal);
> +}
> +
> +static inline int arm_gic_ppi_index(int cpu_nr, int ppi_index)
> +{
> +    return NUM_IRQS + cpu_nr * GIC_INTERNAL + ppi_index;
> +}
> +
> +static void create_gic(VMAppleMachineState *vms, MemoryRegion *mem)
> +{
> +    MachineState *ms = MACHINE(vms);
> +    /* We create a standalone GIC */
> +    SysBusDevice *gicbusdev;
> +    QList *redist_region_count;
> +    int i;
> +    unsigned int smp_cpus = ms->smp.cpus;
> +
> +    vms->gic = qdev_new(gicv3_class_name());
> +    qdev_prop_set_uint32(vms->gic, "revision", 3);
> +    qdev_prop_set_uint32(vms->gic, "num-cpu", smp_cpus);
> +    /*
> +     * Note that the num-irq property counts both internal and external
> +     * interrupts; there are always 32 of the former (mandated by GIC spec).
> +     */
> +    qdev_prop_set_uint32(vms->gic, "num-irq", NUM_IRQS + 32);
> +
> +    uint32_t redist0_capacity =
> +                vms->memmap[VMAPPLE_GIC_REDIST].size / GICV3_REDIST_SIZE;
> +    uint32_t redist0_count = MIN(smp_cpus, redist0_capacity);
> +
> +    redist_region_count = qlist_new();
> +    qlist_append_int(redist_region_count, redist0_count);
> +    qdev_prop_set_array(vms->gic, "redist-region-count", redist_region_count);
> +
> +    gicbusdev = SYS_BUS_DEVICE(vms->gic);
> +    sysbus_realize_and_unref(gicbusdev, &error_fatal);
> +    sysbus_mmio_map(gicbusdev, 0, vms->memmap[VMAPPLE_GIC_DIST].base);
> +    sysbus_mmio_map(gicbusdev, 1, vms->memmap[VMAPPLE_GIC_REDIST].base);
> +
> +    /*
> +     * Wire the outputs from each CPU's generic timer and the GICv3
> +     * maintenance interrupt signal to the appropriate GIC PPI inputs,
> +     * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
> +     */
> +    for (i = 0; i < smp_cpus; i++) {
> +        DeviceState *cpudev = DEVICE(qemu_get_cpu(i));
> +
> +        /* Map the virt timer to PPI 27 */
> +        qdev_connect_gpio_out(cpudev, GTIMER_VIRT,
> +                              qdev_get_gpio_in(vms->gic,
> +                                               arm_gic_ppi_index(i, 27)));
> +
> +        /* Map the GIC IRQ and FIQ lines to CPU */
> +        sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
> +        sysbus_connect_irq(gicbusdev, i + smp_cpus,
> +                           qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
> +    }
> +}
> +
> +static void create_uart(const VMAppleMachineState *vms, int uart,
> +                        MemoryRegion *mem, Chardev *chr)
> +{
> +    hwaddr base = vms->memmap[uart].base;
> +    int irq = vms->irqmap[uart];
> +    DeviceState *dev = qdev_new(TYPE_PL011);
> +    SysBusDevice *s = SYS_BUS_DEVICE(dev);
> +
> +    qdev_prop_set_chr(dev, "chardev", chr);
> +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> +    memory_region_add_subregion(mem, base,
> +                                sysbus_mmio_get_region(s, 0));
> +    sysbus_connect_irq(s, 0, qdev_get_gpio_in(vms->gic, irq));
> +}
> +
> +static void create_rtc(const VMAppleMachineState *vms)
> +{
> +    hwaddr base = vms->memmap[VMAPPLE_RTC].base;
> +    int irq = vms->irqmap[VMAPPLE_RTC];
> +
> +    sysbus_create_simple("pl031", base, qdev_get_gpio_in(vms->gic, irq));
> +}
> +
> +static DeviceState *gpio_key_dev;
> +static void vmapple_powerdown_req(Notifier *n, void *opaque)
> +{
> +    /* use gpio Pin 3 for power button event */
> +    qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
> +}
> +
> +static void create_gpio_devices(const VMAppleMachineState *vms, int gpio,
> +                                MemoryRegion *mem)
> +{
> +    DeviceState *pl061_dev;
> +    hwaddr base = vms->memmap[gpio].base;
> +    int irq = vms->irqmap[gpio];
> +    SysBusDevice *s;
> +
> +    pl061_dev = qdev_new("pl061");
> +    /* Pull lines down to 0 if not driven by the PL061 */
> +    qdev_prop_set_uint32(pl061_dev, "pullups", 0);
> +    qdev_prop_set_uint32(pl061_dev, "pulldowns", 0xff);
> +    s = SYS_BUS_DEVICE(pl061_dev);
> +    sysbus_realize_and_unref(s, &error_fatal);
> +    memory_region_add_subregion(mem, base, sysbus_mmio_get_region(s, 0));
> +    sysbus_connect_irq(s, 0, qdev_get_gpio_in(vms->gic, irq));
> +    gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> +                                        qdev_get_gpio_in(pl061_dev, 3));
> +}
> +
> +static void vmapple_firmware_init(VMAppleMachineState *vms,
> +                                  MemoryRegion *sysmem)
> +{
> +    hwaddr size = vms->memmap[VMAPPLE_FIRMWARE].size;
> +    hwaddr base = vms->memmap[VMAPPLE_FIRMWARE].base;
> +    const char *bios_name;
> +    int image_size;
> +    char *fname;
> +
> +    bios_name = MACHINE(vms)->firmware;
> +    if (!bios_name) {
> +        error_report("No firmware specified");
> +        exit(1);
> +    }
> +
> +    fname = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> +    if (!fname) {
> +        error_report("Could not find ROM image '%s'", bios_name);
> +        exit(1);
> +    }
> +
> +    memory_region_init_ram(&vms->fw_mr, NULL, "firmware", size, &error_fatal);
> +    image_size = load_image_mr(fname, &vms->fw_mr);
> +
> +    g_free(fname);
> +    if (image_size < 0) {
> +        error_report("Could not load ROM image '%s'", bios_name);
> +        exit(1);
> +    }
> +
> +    memory_region_add_subregion(get_system_memory(), base, &vms->fw_mr);
> +}
> +
> +static void create_pcie(VMAppleMachineState *vms)
> +{
> +    hwaddr base_mmio = vms->memmap[VMAPPLE_PCIE_MMIO].base;
> +    hwaddr size_mmio = vms->memmap[VMAPPLE_PCIE_MMIO].size;
> +    hwaddr base_ecam = vms->memmap[VMAPPLE_PCIE_ECAM].base;
> +    hwaddr size_ecam = vms->memmap[VMAPPLE_PCIE_ECAM].size;
> +    int irq = vms->irqmap[VMAPPLE_PCIE];
> +    MemoryRegion *mmio_alias;
> +    MemoryRegion *mmio_reg;
> +    MemoryRegion *ecam_alias;
> +    MemoryRegion *ecam_reg;
> +    DeviceState *dev;
> +    int i;
> +    PCIHostState *pci;
> +    DeviceState *usb_controller;
> +    USBBus *usb_bus;
> +
> +    dev = qdev_new(TYPE_GPEX_HOST);
> +    qdev_prop_set_uint32(dev, "num-irqs", GPEX_NUM_IRQS);
> +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> +
> +    /* Map only the first size_ecam bytes of ECAM space */
> +    ecam_alias = g_new0(MemoryRegion, 1);

Include this in VMAppleMachineState.

> +    ecam_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
> +    memory_region_init_alias(ecam_alias, OBJECT(dev), "pcie-ecam",
> +                             ecam_reg, 0, size_ecam);
> +    memory_region_add_subregion(get_system_memory(), base_ecam, ecam_alias);
> +
> +    /*
> +     * Map the MMIO window from [0x50000000-0x7fff0000] in PCI space into
> +     * system address space at [0x50000000-0x7fff0000].
> +     */
> +    mmio_alias = g_new0(MemoryRegion, 1);
> +    mmio_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 1);
> +    memory_region_init_alias(mmio_alias, OBJECT(dev), "pcie-mmio",
> +                             mmio_reg, base_mmio, size_mmio);
> +    memory_region_add_subregion(get_system_memory(), base_mmio, mmio_alias);
> +
> +    for (i = 0; i < GPEX_NUM_IRQS; i++) {
> +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
> +                           qdev_get_gpio_in(vms->gic, irq + i));
> +        gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
> +    }
> +
> +    pci = PCI_HOST_BRIDGE(dev);
> +    vms->bus = pci->bus;
> +    g_assert(vms->bus);
> +
> +    while ((dev = qemu_create_nic_device("virtio-net-pci", true, NULL))) {
> +        qdev_realize_and_unref(dev, BUS(vms->bus), &error_fatal);
> +    }
> +
> +    usb_controller = qdev_new(TYPE_QEMU_XHCI);
> +    qdev_realize_and_unref(usb_controller, BUS(pci->bus), &error_fatal);
> +
> +    usb_bus = USB_BUS(object_resolve_type_unambiguous(TYPE_USB_BUS,
> +                                                      &error_fatal));
> +    usb_create_simple(usb_bus, "usb-kbd");
> +    usb_create_simple(usb_bus, "usb-tablet");
> +}
> +
> +static void vmapple_reset(void *opaque)
> +{
> +    VMAppleMachineState *vms = opaque;
> +    hwaddr base = vms->memmap[VMAPPLE_FIRMWARE].base;
> +
> +    cpu_set_pc(first_cpu, base);
> +}
> +
> +static void mach_vmapple_init(MachineState *machine)
> +{
> +    VMAppleMachineState *vms = VMAPPLE_MACHINE(machine);
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    const CPUArchIdList *possible_cpus;
> +    MemoryRegion *sysmem = get_system_memory();
> +    int n;
> +    unsigned int smp_cpus = machine->smp.cpus;
> +    unsigned int max_cpus = machine->smp.max_cpus;
> +
> +    vms->memmap = memmap;
> +    machine->usb = true;
> +
> +    possible_cpus = mc->possible_cpu_arch_ids(machine);
> +    assert(possible_cpus->len == max_cpus);
> +    for (n = 0; n < possible_cpus->len; n++) {
> +        Object *cpu;
> +        CPUState *cs;
> +
> +        if (n >= smp_cpus) {
> +            break;
> +        }
> +
> +        cpu = object_new(possible_cpus->cpus[n].type);
> +        object_property_set_int(cpu, "mp-affinity",
> +                                possible_cpus->cpus[n].arch_id, NULL);

Pass &error_fatal instead of NULL.

> +
> +        cs = CPU(cpu);
> +        cs->cpu_index = n;
> +
> +        numa_cpu_pre_plug(&possible_cpus->cpus[cs->cpu_index], DEVICE(cpu),
> +                          &error_fatal);
> +
> +        object_property_set_bool(cpu, "has_el3", false, NULL);
> +        object_property_set_bool(cpu, "has_el2", false, NULL);
> +        object_property_set_int(cpu, "psci-conduit", QEMU_PSCI_CONDUIT_HVC,
> +                                NULL);
> +
> +        /* Secondary CPUs start in PSCI powered-down state */
> +        if (n > 0) {
> +            object_property_set_bool(cpu, "start-powered-off", true, NULL);
> +        }
> +
> +        object_property_set_link(cpu, "memory", OBJECT(sysmem), &error_abort);
> +        qdev_realize(DEVICE(cpu), NULL, &error_fatal);
> +        object_unref(cpu);
> +    }
> +
> +    memory_region_add_subregion(sysmem, vms->memmap[VMAPPLE_MEM].base,
> +                                machine->ram);
> +
> +    create_gic(vms, sysmem);
> +    create_bdif(vms, sysmem);
> +    create_pvpanic(vms, sysmem);
> +    create_aes(vms, sysmem);
> +    create_gfx(vms, sysmem);
> +    create_uart(vms, VMAPPLE_UART, sysmem, serial_hd(0));
> +    create_rtc(vms);
> +    create_pcie(vms);
> +
> +    create_gpio_devices(vms, VMAPPLE_GPIO, sysmem);
> +
> +    vmapple_firmware_init(vms, sysmem);
> +    create_cfg(vms, sysmem);
> +
> +    /* connect powerdown request */
> +    vms->powerdown_notifier.notify = vmapple_powerdown_req;
> +    qemu_register_powerdown_notifier(&vms->powerdown_notifier);
> +
> +    vms->bootinfo.ram_size = machine->ram_size;
> +    vms->bootinfo.board_id = -1;
> +    vms->bootinfo.loader_start = vms->memmap[VMAPPLE_MEM].base;
> +    vms->bootinfo.skip_dtb_autoload = true;
> +    vms->bootinfo.firmware_loaded = true;
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
> +
> +    qemu_register_reset(vmapple_reset, vms);
> +}
> +
> +static CpuInstanceProperties
> +vmapple_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;
> +}
> +
> +
> +static int64_t vmapple_get_default_cpu_node_id(const MachineState *ms, int idx)
> +{
> +    return idx % ms->numa_state->num_nodes;
> +}
> +
> +static const CPUArchIdList *vmapple_possible_cpu_arch_ids(MachineState *ms)
> +{
> +    int n;
> +    unsigned int max_cpus = ms->smp.max_cpus;
> +
> +    if (ms->possible_cpus) {
> +        assert(ms->possible_cpus->len == max_cpus);
> +        return ms->possible_cpus;
> +    }
> +
> +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> +                                  sizeof(CPUArchId) * max_cpus);
> +    ms->possible_cpus->len = max_cpus;
> +    for (n = 0; n < ms->possible_cpus->len; n++) {
> +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> +        ms->possible_cpus->cpus[n].arch_id =
> +            arm_build_mp_affinity(n, GICV3_TARGETLIST_BITS);
> +        ms->possible_cpus->cpus[n].props.has_thread_id = true;
> +        ms->possible_cpus->cpus[n].props.thread_id = n;
> +    }
> +    return ms->possible_cpus;
> +}
> +
> +static void vmapple_get_uuid(Object *obj, Visitor *v, const char *name,
> +                             void *opaque, Error **errp)
> +{
> +    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
> +
> +    visit_type_uint64(v, name, &vms->uuid, errp);
> +}
> +
> +static void vmapple_set_uuid(Object *obj, Visitor *v, const char *name,
> +                             void *opaque, Error **errp)
> +{
> +    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
> +    Error *error = NULL;
> +
> +    visit_type_uint64(v, name, &vms->uuid, &error);
> +    if (error) {
> +        error_propagate(errp, error);
> +        return;
> +    }
> +}
> +
> +static void vmapple_machine_class_init(ObjectClass *oc, void *data)
> +{
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +
> +    mc->init = mach_vmapple_init;
> +    mc->max_cpus = 32;
> +    mc->block_default_type = IF_VIRTIO;
> +    mc->no_cdrom = 1;
> +    mc->pci_allow_0_address = true;
> +    mc->minimum_page_bits = 12;
> +    mc->possible_cpu_arch_ids = vmapple_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = vmapple_cpu_index_to_props;
> +    mc->default_cpu_type = ARM_CPU_TYPE_NAME("host");
> +    mc->get_default_cpu_node_id = vmapple_get_default_cpu_node_id;
> +    mc->default_ram_id = "mach-vmapple.ram";
> +
> +    object_register_sugar_prop(TYPE_VIRTIO_PCI, "disable-legacy",
> +                               "on", true);
> +
> +    object_class_property_add(oc, "uuid", "uint64", vmapple_get_uuid,
> +                              vmapple_set_uuid, NULL, NULL);
> +    object_class_property_set_description(oc, "uuid", "Machine UUID (SDOM)");
> +}
> +
> +static void vmapple_instance_init(Object *obj)
> +{
> +    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
> +
> +    vms->irqmap = irqmap;
> +}
> +
> +static const TypeInfo vmapple_machine_info = {
> +    .name          = TYPE_VMAPPLE_MACHINE,
> +    .parent        = TYPE_MACHINE,
> +    .abstract      = true,
> +    .instance_size = sizeof(VMAppleMachineState),
> +    .class_size    = sizeof(VMAppleMachineClass),
> +    .class_init    = vmapple_machine_class_init,
> +    .instance_init = vmapple_instance_init,
> +};
> +
> +static void machvmapple_machine_init(void)
> +{
> +    type_register_static(&vmapple_machine_info);
> +}
> +type_init(machvmapple_machine_init);
> +
> +static void vmapple_machine_9_2_options(MachineClass *mc)
> +{
> +}
> +DEFINE_VMAPPLE_MACHINE_AS_LATEST(9, 2)
> +



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 15/15] hw/vmapple/vmapple: Add vmapple machine type
  2024-10-26  6:20   ` Akihiko Odaki
@ 2024-10-26 11:58     ` Phil Dennis-Jordan
  0 siblings, 0 replies; 42+ messages in thread
From: Phil Dennis-Jordan @ 2024-10-26 11:58 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: qemu-devel, agraf, peter.maydell, pbonzini, rad, quic_llindhol,
	marcin.juszkiewicz, stefanha, mst, slp, richard.henderson,
	eduardo, marcel.apfelbaum, gaosong, jiaxun.yang, chenhuacai,
	kwolf, hreitz, philmd, shorne, palmer, alistair.francis, bmeng.cn,
	liwei1518, dbarboza, zhiwei_liu, jcmvbkbc, marcandre.lureau,
	berrange, qemu-arm, qemu-block, qemu-riscv, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 34792 bytes --]

On Sat, 26 Oct 2024 at 08:21, Akihiko Odaki <akihiko.odaki@daynix.com>
wrote:

> On 2024/10/24 19:28, Phil Dennis-Jordan wrote:
> > From: Alexander Graf <graf@amazon.com>
> >
> > Apple defines a new "vmapple" machine type as part of its proprietary
> > macOS Virtualization.Framework vmm. This machine type is similar to the
> > virt one, but with subtle differences in base devices, a few special
> > vmapple device additions and a vastly different boot chain.
> >
> > This patch reimplements this machine type in QEMU. To use it, you
> > have to have a readily installed version of macOS for VMApple,
> > run on macOS with -accel hvf, pass the Virtualization.Framework
> > boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
> > and pass aux and root volume as virtio drives. In addition, you also
> > need to find the machine UUID and pass that as -M vmapple,uuid=
> parameter:
> >
> > $ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
> >      -bios
> /System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin
> >      -drive file=aux,if=pflash,format=raw \
> >      -drive file=root,if=pflash,format=raw \
> >      -drive file=aux,if=none,id=aux,format=raw \
> >      -device vmapple-virtio-aux,drive=aux \
> >      -drive file=root,if=none,id=root,format=raw \
> >      -device vmapple-virtio-root,drive=root
> >
> > With all these in place, you should be able to see macOS booting
> > successfully.
> >
> > Known issues:
> >   - Keyboard and mouse/tablet input is laggy. The reason for this is
> >     either that macOS's XHCI driver is broken when the device/platform
> >     does not support MSI/MSI-X, or there's some unfortunate interplay
> >     with Qemu's XHCI implementation in this scenario.
> >   - Currently only macOS 12 guests are supported. The boot process for
> >     13+ will need further investigation and adjustment.
> >
> > Signed-off-by: Alexander Graf <graf@amazon.com>
> > Co-authored-by: Phil Dennis-Jordan <phil@philjordan.eu>
> > Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
> > ---
> > v3:
> >   * Rebased on latest upstream, updated affinity and NIC creation
> > API usage
> >   * Included Apple-variant virtio-blk in build dependency
> >   * Updated API usage for setting 'redist-region-count' array-typed
> property on GIC.
> >   * Switched from virtio HID devices (for which macOS 12 does not
> contain drivers) to an XHCI USB controller and USB HID devices.
> >
> > v4:
> >   * Fixups for v4 changes to the other patches in the set.
> >   * Corrected the assert macro to use
> >   * Removed superfluous endian conversions corresponding to cfg's.
> >   * Init error handling improvement.
> >   * No need to select CPU type on TCG, as only HVF is supported.
> >   * Machine type version bumped to 9.2
> >   * #include order improved
> >
> >   MAINTAINERS                 |   1 +
> >   docs/system/arm/vmapple.rst |  63 ++++
> >   docs/system/target-arm.rst  |   1 +
> >   hw/vmapple/Kconfig          |  20 ++
> >   hw/vmapple/meson.build      |   1 +
> >   hw/vmapple/vmapple.c        | 652 ++++++++++++++++++++++++++++++++++++
> >   6 files changed, 738 insertions(+)
> >   create mode 100644 docs/system/arm/vmapple.rst
> >   create mode 100644 hw/vmapple/vmapple.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 104813ed85f..f44418b4a95 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2739,6 +2739,7 @@ R: Phil Dennis-Jordan <phil@philjordan.eu>
> >   S: Maintained
> >   F: hw/vmapple/*
> >   F: include/hw/vmapple/*
> > +F: docs/system/arm/vmapple.rst
> >
> >   Subsystems
> >   ----------
> > diff --git a/docs/system/arm/vmapple.rst b/docs/system/arm/vmapple.rst
> > new file mode 100644
> > index 00000000000..acb921ffb35
> > --- /dev/null
> > +++ b/docs/system/arm/vmapple.rst
> > @@ -0,0 +1,63 @@
> > +VMApple machine emulation
> >
> +========================================================================================
> > +
> > +VMApple is the device model that the macOS built-in hypervisor called
> "Virtualization.framework"
> > +exposes to Apple Silicon macOS guests. The "vmapple" machine model in
> QEMU implements the same
> > +device model, but does not use any code from Virtualization.Framework.
> > +
> > +Prerequisites
> > +-------------
> > +
> > +To run the vmapple machine model, you need to
> > +
> > + * Run on Apple Silicon
> > + * Run on macOS 12.0 or above
> > + * Have an already installed copy of a Virtualization.Framework macOS
> 12 virtual machine. I will
> > +   assume that you installed it using the macosvm CLI.
> > +
> > +First, we need to extract the UUID from the virtual machine that you
> installed. You can do this
> > +by running the following shell script:
> > +
> > +.. code-block:: bash
> > +  :caption: uuid.sh script to extract the UUID from a macosvm.json file
> > +
> > +  #!/bin/bash
> > +
> > +  MID=$(cat "$1" | python3 -c 'import
> json,sys;obj=json.load(sys.stdin);print(obj["machineId"]);')
> > +  echo "$MID" | base64 -d | plutil -extract ECID raw -
>
> I prefer it to be written entirely in Python instead of a mixture of
> Python and Bash.
>

I have essentially zero Python skills, so I won't be doing that. I can
however remove the Python code entirely by using:

plutil -extract machineId raw "$1" | base64 -d | plutil -extract ECID raw -

Perhaps it is better to put this script in contrib to avoid requiring
> the user to create a file and copy and paste it.
>

This I can do.


> > +
> > +Now we also need to trim the aux partition. It contains metadata that
> we can just discard:
> > +
> > +.. code-block:: bash
> > +  :caption: Command to trim the aux file
> > +
> > +  $ dd if="aux.img" of="aux.img.trimmed" bs=$(( 0x4000 )) skip=1
> > +
> > +How to run
> > +----------
> > +
> > +Then, we can launch QEMU with the Virtualization.Framework pre-boot
> environment and the readily
> > +installed target disk images. I recommend to port forward the VM's ssh
> and vnc ports to the host
> > +to get better interactive access into the target system:
> > +
> > +.. code-block:: bash
> > +  :caption: Example execution command line
> > +
> > +  $ UUID=$(uuid.sh macosvm.json)
> > +  $
> AVPBOOTER=/System/Library/Frameworks/Virtualization.framework/Resources/AVPBooter.vmapple2.bin
> > +  $ AUX=aux.img.trimmed
> > +  $ DISK=disk.img
> > +  $ qemu-system-aarch64 \
> > +       -serial mon:stdio \
> > +       -m 4G \
> > +       -accel hvf \
> > +       -M vmapple,uuid=$UUID \
> > +       -bios $AVPBOOTER \
> > +        -drive file="$AUX",if=pflash,format=raw \
> > +        -drive file="$DISK",if=pflash,format=raw \
> > +       -drive file="$AUX",if=none,id=aux,format=raw \
> > +       -drive file="$DISK",if=none,id=root,format=raw \
> > +       -device vmapple-virtio-aux,drive=aux \
> > +       -device vmapple-virtio-root,drive=root \
> > +       -net user,ipv6=off,hostfwd=tcp::2222-:22,hostfwd=tcp::5901-:5900
> \
> > +       -net nic,model=virtio-net-pci \
> > diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
> > index 3c0a5848453..f2e0ac99537 100644
> > --- a/docs/system/target-arm.rst
> > +++ b/docs/system/target-arm.rst
> > @@ -102,6 +102,7 @@ undocumented; you can get a complete list by running
> >      arm/stellaris
> >      arm/stm32
> >      arm/virt
> > +   arm/vmapple
> >      arm/xenpvh
> >      arm/xlnx-versal-virt
> >      arm/xlnx-zynq
> > diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
> > index bcd1be63e3c..6a4c4a7fa2e 100644
> > --- a/hw/vmapple/Kconfig
> > +++ b/hw/vmapple/Kconfig
> > @@ -10,3 +10,23 @@ config VMAPPLE_CFG
> >   config VMAPPLE_VIRTIO_BLK
> >       bool
> >
> > +config VMAPPLE
> > +    bool
> > +    depends on ARM
> > +    depends on HVF
> > +    default y if ARM
> > +    imply PCI_DEVICES
> > +    select ARM_GIC
> > +    select PLATFORM_BUS
> > +    select PCI_EXPRESS
> > +    select PCI_EXPRESS_GENERIC_BRIDGE
> > +    select PL011 # UART
> > +    select PL031 # RTC
> > +    select PL061 # GPIO
> > +    select GPIO_PWR
> > +    select PVPANIC_MMIO
> > +    select VMAPPLE_AES
> > +    select VMAPPLE_BDIF
> > +    select VMAPPLE_CFG
> > +    select MAC_PVG_MMIO
> > +    select VMAPPLE_VIRTIO_BLK
> > diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
> > index bf17cf906c9..e572f7d5602 100644
> > --- a/hw/vmapple/meson.build
> > +++ b/hw/vmapple/meson.build
> > @@ -2,3 +2,4 @@ system_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true:
> files('aes.c'))
> >   system_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
> >   system_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
> >   system_ss.add(when: 'CONFIG_VMAPPLE_VIRTIO_BLK',  if_true:
> files('virtio-blk.c'))
> > +specific_ss.add(when: 'CONFIG_VMAPPLE',     if_true: files('vmapple.c'))
> > diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c
> > new file mode 100644
> > index 00000000000..b9454c07eee
> > --- /dev/null
> > +++ b/hw/vmapple/vmapple.c
> > @@ -0,0 +1,652 @@
> > +/*
> > + * VMApple machine emulation
> > + *
> > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights
> Reserved.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + * VMApple is the device model that the macOS built-in hypervisor called
> > + * "Virtualization.framework" exposes to Apple Silicon macOS guests. The
> > + * machine model in this file implements the same device model in QEMU,
> but
> > + * does not use any code from Virtualization.Framework.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/bitops.h"
> > +#include "qemu/datadir.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/guest-random.h"
> > +#include "qemu/help-texts.h"
> > +#include "qemu/log.h"
> > +#include "qemu/module.h"
> > +#include "qemu/option.h"
> > +#include "qemu/units.h"
> > +#include "monitor/qdev.h"
> > +#include "hw/boards.h"
> > +#include "hw/irq.h"
> > +#include "hw/loader.h"
> > +#include "hw/qdev-properties.h"
> > +#include "hw/sysbus.h"
> > +#include "hw/usb.h"
> > +#include "hw/arm/boot.h"
> > +#include "hw/arm/primecell.h"
> > +#include "hw/char/pl011.h"
> > +#include "hw/intc/arm_gic.h"
> > +#include "hw/intc/arm_gicv3_common.h"
> > +#include "hw/misc/pvpanic.h"
> > +#include "hw/pci-host/gpex.h"
> > +#include "hw/usb/xhci.h"
> > +#include "hw/virtio/virtio-pci.h"
> > +#include "hw/vmapple/vmapple.h"
> > +#include "net/net.h"
> > +#include "qapi/error.h"
> > +#include "qapi/qmp/qlist.h"
> > +#include "qapi/visitor.h"
> > +#include "qapi/qapi-visit-common.h"
> > +#include "standard-headers/linux/input.h"
> > +#include "sysemu/hvf.h"
> > +#include "sysemu/kvm.h"
> > +#include "sysemu/reset.h"
> > +#include "sysemu/runstate.h"
> > +#include "sysemu/sysemu.h"
> > +#include "target/arm/internals.h"
> > +#include "target/arm/kvm_arm.h"
> > +
> > +struct VMAppleMachineClass {
> > +    MachineClass parent;
> > +};
> > +
> > +struct VMAppleMachineState {
> > +    MachineState parent;
> > +
> > +    Notifier machine_done;
> > +    struct arm_boot_info bootinfo;
> > +    MemMapEntry *memmap;
> > +    const int *irqmap;
> > +    DeviceState *gic;
> > +    DeviceState *cfg;
> > +    Notifier powerdown_notifier;
> > +    PCIBus *bus;
> > +    MemoryRegion fw_mr;
> > +    uint64_t uuid;
> > +};
> > +
> > +#define DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, latest) \
> > +    static void vmapple##major##_##minor##_class_init(ObjectClass *oc, \
> > +                                                    void *data) \
> > +    { \
> > +        MachineClass *mc = MACHINE_CLASS(oc); \
> > +        vmapple_machine_##major##_##minor##_options(mc); \
> > +        mc->desc = "QEMU " # major "." # minor " Apple Virtual
> Machine"; \
> > +        if (latest) { \
> > +            mc->alias = "vmapple"; \
> > +        } \
> > +    } \
> > +    static const TypeInfo machvmapple##major##_##minor##_info = { \
> > +        .name = MACHINE_TYPE_NAME("vmapple-" # major "." # minor), \
> > +        .parent = TYPE_VMAPPLE_MACHINE, \
> > +        .class_init = vmapple##major##_##minor##_class_init, \
> > +    }; \
> > +    static void machvmapple_machine_##major##_##minor##_init(void) \
> > +    { \
> > +        type_register_static(&machvmapple##major##_##minor##_info); \
> > +    } \
> > +    type_init(machvmapple_machine_##major##_##minor##_init);
> > +
> > +#define DEFINE_VMAPPLE_MACHINE_AS_LATEST(major, minor) \
> > +    DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, true)
> > +#define DEFINE_VMAPPLE_MACHINE(major, minor) \
> > +    DEFINE_VMAPPLE_MACHINE_LATEST(major, minor, false)
> > +
> > +#define TYPE_VMAPPLE_MACHINE   MACHINE_TYPE_NAME("vmapple")
> > +OBJECT_DECLARE_TYPE(VMAppleMachineState, VMAppleMachineClass,
> VMAPPLE_MACHINE)
> > +
> > +/* Number of external interrupt lines to configure the GIC with */
> > +#define NUM_IRQS 256
> > +
> > +enum {
> > +    VMAPPLE_FIRMWARE,
> > +    VMAPPLE_CONFIG,
> > +    VMAPPLE_MEM,
> > +    VMAPPLE_GIC_DIST,
> > +    VMAPPLE_GIC_REDIST,
> > +    VMAPPLE_UART,
> > +    VMAPPLE_RTC,
> > +    VMAPPLE_PCIE,
> > +    VMAPPLE_PCIE_MMIO,
> > +    VMAPPLE_PCIE_ECAM,
> > +    VMAPPLE_GPIO,
> > +    VMAPPLE_PVPANIC,
> > +    VMAPPLE_APV_GFX,
> > +    VMAPPLE_APV_IOSFC,
> > +    VMAPPLE_AES_1,
> > +    VMAPPLE_AES_2,
> > +    VMAPPLE_BDOOR,
> > +    VMAPPLE_MEMMAP_LAST,
> > +};
> > +
> > +static MemMapEntry memmap[] = {
> > +    [VMAPPLE_FIRMWARE] =           { 0x00100000, 0x00100000 },
> > +    [VMAPPLE_CONFIG] =             { 0x00400000, 0x00010000 },
> > +
> > +    [VMAPPLE_GIC_DIST] =           { 0x10000000, 0x00010000 },
> > +    [VMAPPLE_GIC_REDIST] =         { 0x10010000, 0x00400000 },
> > +
> > +    [VMAPPLE_UART] =               { 0x20010000, 0x00010000 },
> > +    [VMAPPLE_RTC] =                { 0x20050000, 0x00001000 },
> > +    [VMAPPLE_GPIO] =               { 0x20060000, 0x00001000 },
> > +    [VMAPPLE_PVPANIC] =            { 0x20070000, 0x00000002 },
> > +    [VMAPPLE_BDOOR] =              { 0x30000000, 0x00200000 },
> > +    [VMAPPLE_APV_GFX] =            { 0x30200000, 0x00010000 },
> > +    [VMAPPLE_APV_IOSFC] =          { 0x30210000, 0x00010000 },
> > +    [VMAPPLE_AES_1] =              { 0x30220000, 0x00004000 },
> > +    [VMAPPLE_AES_2] =              { 0x30230000, 0x00004000 },
> > +    [VMAPPLE_PCIE_ECAM] =          { 0x40000000, 0x10000000 },
> > +    [VMAPPLE_PCIE_MMIO] =          { 0x50000000, 0x1fff0000 },
> > +
> > +    /* Actual RAM size depends on configuration */
> > +    [VMAPPLE_MEM] =                { 0x70000000ULL, GiB},
> > +};
> > +
> > +static const int irqmap[] = {
> > +    [VMAPPLE_UART] = 1,
> > +    [VMAPPLE_RTC] = 2,
> > +    [VMAPPLE_GPIO] = 0x5,
> > +    [VMAPPLE_APV_IOSFC] = 0x10,
> > +    [VMAPPLE_APV_GFX] = 0x11,
> > +    [VMAPPLE_AES_1] = 0x12,
> > +    [VMAPPLE_PCIE] = 0x20,
> > +};
> > +
> > +#define GPEX_NUM_IRQS 16
> > +
> > +static void create_bdif(VMAppleMachineState *vms, MemoryRegion *mem)
> > +{
> > +    DeviceState *bdif;
> > +    SysBusDevice *bdif_sb;
> > +    DriveInfo *di_aux = drive_get(IF_PFLASH, 0, 0);
> > +    DriveInfo *di_root = drive_get(IF_PFLASH, 0, 1);
> > +
> > +    if (!di_aux) {
> > +        error_report("No AUX device. Please specify one as pflash
> drive.");
> > +        exit(1);
> > +    }
> > +
> > +    if (!di_root) {
> > +        /* Fall back to the first IF_VIRTIO device as root device */
> > +        di_root = drive_get(IF_VIRTIO, 0, 0);
> > +    }
> > +
> > +    if (!di_root) {
> > +        error_report("No root device. Please specify one as virtio
> drive.");
> > +        exit(1);
> > +    }
> > +
> > +    /* PV backdoor device */
> > +    bdif = qdev_new(TYPE_VMAPPLE_BDIF);
> > +    bdif_sb = SYS_BUS_DEVICE(bdif);
> > +    sysbus_mmio_map(bdif_sb, 0, vms->memmap[VMAPPLE_BDOOR].base);
> > +
> > +    qdev_prop_set_drive(DEVICE(bdif), "aux",
> blk_by_legacy_dinfo(di_aux));
> > +    qdev_prop_set_drive(DEVICE(bdif), "root",
> blk_by_legacy_dinfo(di_root));
> > +
> > +    sysbus_realize_and_unref(bdif_sb, &error_fatal);
> > +}
> > +
> > +static void create_pvpanic(VMAppleMachineState *vms, MemoryRegion *mem)
> > +{
> > +    SysBusDevice *cfg;
> > +
> > +    vms->cfg = qdev_new(TYPE_PVPANIC_MMIO_DEVICE);
> > +    cfg = SYS_BUS_DEVICE(vms->cfg);
> > +    sysbus_mmio_map(cfg, 0, vms->memmap[VMAPPLE_PVPANIC].base);
> > +
> > +    sysbus_realize_and_unref(cfg, &error_fatal);
> > +}
> > +
> > +static void create_cfg(VMAppleMachineState *vms, MemoryRegion *mem)
> > +{
> > +    SysBusDevice *cfg;
> > +    MachineState *machine = MACHINE(vms);
> > +    uint32_t rnd = 1;
> > +
> > +    vms->cfg = qdev_new(TYPE_VMAPPLE_CFG);
> > +    cfg = SYS_BUS_DEVICE(vms->cfg);
> > +    sysbus_mmio_map(cfg, 0, vms->memmap[VMAPPLE_CONFIG].base);
> > +
> > +    qemu_guest_getrandom_nofail(&rnd, sizeof(rnd));
> > +
> > +    qdev_prop_set_uint32(vms->cfg, "nr-cpus", machine->smp.cpus);
> > +    qdev_prop_set_uint64(vms->cfg, "ecid", vms->uuid);
> > +    qdev_prop_set_uint64(vms->cfg, "ram-size", machine->ram_size);
> > +    qdev_prop_set_uint32(vms->cfg, "rnd", rnd);
> > +
> > +    sysbus_realize_and_unref(cfg, &error_fatal);
> > +}
> > +
> > +static void create_gfx(VMAppleMachineState *vms, MemoryRegion *mem)
> > +{
> > +    int irq_gfx = vms->irqmap[VMAPPLE_APV_GFX];
> > +    int irq_iosfc = vms->irqmap[VMAPPLE_APV_IOSFC];
> > +    SysBusDevice *aes;
> > +
> > +    aes = SYS_BUS_DEVICE(qdev_new("apple-gfx-mmio"));
> > +    sysbus_mmio_map(aes, 0, vms->memmap[VMAPPLE_APV_GFX].base);
> > +    sysbus_mmio_map(aes, 1, vms->memmap[VMAPPLE_APV_IOSFC].base);
> > +    sysbus_connect_irq(aes, 0, qdev_get_gpio_in(vms->gic, irq_gfx));
> > +    sysbus_connect_irq(aes, 1, qdev_get_gpio_in(vms->gic, irq_iosfc));
> > +    sysbus_realize_and_unref(aes, &error_fatal);
> > +}
> > +
> > +static void create_aes(VMAppleMachineState *vms, MemoryRegion *mem)
> > +{
> > +    int irq = vms->irqmap[VMAPPLE_AES_1];
> > +    SysBusDevice *aes;
> > +
> > +    aes = SYS_BUS_DEVICE(qdev_new("apple-aes"));
> > +    sysbus_mmio_map(aes, 0, vms->memmap[VMAPPLE_AES_1].base);
> > +    sysbus_mmio_map(aes, 1, vms->memmap[VMAPPLE_AES_2].base);
> > +    sysbus_connect_irq(aes, 0, qdev_get_gpio_in(vms->gic, irq));
> > +    sysbus_realize_and_unref(aes, &error_fatal);
> > +}
> > +
> > +static inline int arm_gic_ppi_index(int cpu_nr, int ppi_index)
> > +{
> > +    return NUM_IRQS + cpu_nr * GIC_INTERNAL + ppi_index;
> > +}
> > +
> > +static void create_gic(VMAppleMachineState *vms, MemoryRegion *mem)
> > +{
> > +    MachineState *ms = MACHINE(vms);
> > +    /* We create a standalone GIC */
> > +    SysBusDevice *gicbusdev;
> > +    QList *redist_region_count;
> > +    int i;
> > +    unsigned int smp_cpus = ms->smp.cpus;
> > +
> > +    vms->gic = qdev_new(gicv3_class_name());
> > +    qdev_prop_set_uint32(vms->gic, "revision", 3);
> > +    qdev_prop_set_uint32(vms->gic, "num-cpu", smp_cpus);
> > +    /*
> > +     * Note that the num-irq property counts both internal and external
> > +     * interrupts; there are always 32 of the former (mandated by GIC
> spec).
> > +     */
> > +    qdev_prop_set_uint32(vms->gic, "num-irq", NUM_IRQS + 32);
> > +
> > +    uint32_t redist0_capacity =
> > +                vms->memmap[VMAPPLE_GIC_REDIST].size /
> GICV3_REDIST_SIZE;
> > +    uint32_t redist0_count = MIN(smp_cpus, redist0_capacity);
> > +
> > +    redist_region_count = qlist_new();
> > +    qlist_append_int(redist_region_count, redist0_count);
> > +    qdev_prop_set_array(vms->gic, "redist-region-count",
> redist_region_count);
> > +
> > +    gicbusdev = SYS_BUS_DEVICE(vms->gic);
> > +    sysbus_realize_and_unref(gicbusdev, &error_fatal);
> > +    sysbus_mmio_map(gicbusdev, 0, vms->memmap[VMAPPLE_GIC_DIST].base);
> > +    sysbus_mmio_map(gicbusdev, 1, vms->memmap[VMAPPLE_GIC_REDIST].base);
> > +
> > +    /*
> > +     * Wire the outputs from each CPU's generic timer and the GICv3
> > +     * maintenance interrupt signal to the appropriate GIC PPI inputs,
> > +     * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's
> inputs.
> > +     */
> > +    for (i = 0; i < smp_cpus; i++) {
> > +        DeviceState *cpudev = DEVICE(qemu_get_cpu(i));
> > +
> > +        /* Map the virt timer to PPI 27 */
> > +        qdev_connect_gpio_out(cpudev, GTIMER_VIRT,
> > +                              qdev_get_gpio_in(vms->gic,
> > +                                               arm_gic_ppi_index(i,
> 27)));
> > +
> > +        /* Map the GIC IRQ and FIQ lines to CPU */
> > +        sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev,
> ARM_CPU_IRQ));
> > +        sysbus_connect_irq(gicbusdev, i + smp_cpus,
> > +                           qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
> > +    }
> > +}
> > +
> > +static void create_uart(const VMAppleMachineState *vms, int uart,
> > +                        MemoryRegion *mem, Chardev *chr)
> > +{
> > +    hwaddr base = vms->memmap[uart].base;
> > +    int irq = vms->irqmap[uart];
> > +    DeviceState *dev = qdev_new(TYPE_PL011);
> > +    SysBusDevice *s = SYS_BUS_DEVICE(dev);
> > +
> > +    qdev_prop_set_chr(dev, "chardev", chr);
> > +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> > +    memory_region_add_subregion(mem, base,
> > +                                sysbus_mmio_get_region(s, 0));
> > +    sysbus_connect_irq(s, 0, qdev_get_gpio_in(vms->gic, irq));
> > +}
> > +
> > +static void create_rtc(const VMAppleMachineState *vms)
> > +{
> > +    hwaddr base = vms->memmap[VMAPPLE_RTC].base;
> > +    int irq = vms->irqmap[VMAPPLE_RTC];
> > +
> > +    sysbus_create_simple("pl031", base, qdev_get_gpio_in(vms->gic,
> irq));
> > +}
> > +
> > +static DeviceState *gpio_key_dev;
> > +static void vmapple_powerdown_req(Notifier *n, void *opaque)
> > +{
> > +    /* use gpio Pin 3 for power button event */
> > +    qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
> > +}
> > +
> > +static void create_gpio_devices(const VMAppleMachineState *vms, int
> gpio,
> > +                                MemoryRegion *mem)
> > +{
> > +    DeviceState *pl061_dev;
> > +    hwaddr base = vms->memmap[gpio].base;
> > +    int irq = vms->irqmap[gpio];
> > +    SysBusDevice *s;
> > +
> > +    pl061_dev = qdev_new("pl061");
> > +    /* Pull lines down to 0 if not driven by the PL061 */
> > +    qdev_prop_set_uint32(pl061_dev, "pullups", 0);
> > +    qdev_prop_set_uint32(pl061_dev, "pulldowns", 0xff);
> > +    s = SYS_BUS_DEVICE(pl061_dev);
> > +    sysbus_realize_and_unref(s, &error_fatal);
> > +    memory_region_add_subregion(mem, base, sysbus_mmio_get_region(s,
> 0));
> > +    sysbus_connect_irq(s, 0, qdev_get_gpio_in(vms->gic, irq));
> > +    gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> > +                                        qdev_get_gpio_in(pl061_dev, 3));
> > +}
> > +
> > +static void vmapple_firmware_init(VMAppleMachineState *vms,
> > +                                  MemoryRegion *sysmem)
> > +{
> > +    hwaddr size = vms->memmap[VMAPPLE_FIRMWARE].size;
> > +    hwaddr base = vms->memmap[VMAPPLE_FIRMWARE].base;
> > +    const char *bios_name;
> > +    int image_size;
> > +    char *fname;
> > +
> > +    bios_name = MACHINE(vms)->firmware;
> > +    if (!bios_name) {
> > +        error_report("No firmware specified");
> > +        exit(1);
> > +    }
> > +
> > +    fname = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> > +    if (!fname) {
> > +        error_report("Could not find ROM image '%s'", bios_name);
> > +        exit(1);
> > +    }
> > +
> > +    memory_region_init_ram(&vms->fw_mr, NULL, "firmware", size,
> &error_fatal);
> > +    image_size = load_image_mr(fname, &vms->fw_mr);
> > +
> > +    g_free(fname);
> > +    if (image_size < 0) {
> > +        error_report("Could not load ROM image '%s'", bios_name);
> > +        exit(1);
> > +    }
> > +
> > +    memory_region_add_subregion(get_system_memory(), base, &vms->fw_mr);
> > +}
> > +
> > +static void create_pcie(VMAppleMachineState *vms)
> > +{
> > +    hwaddr base_mmio = vms->memmap[VMAPPLE_PCIE_MMIO].base;
> > +    hwaddr size_mmio = vms->memmap[VMAPPLE_PCIE_MMIO].size;
> > +    hwaddr base_ecam = vms->memmap[VMAPPLE_PCIE_ECAM].base;
> > +    hwaddr size_ecam = vms->memmap[VMAPPLE_PCIE_ECAM].size;
> > +    int irq = vms->irqmap[VMAPPLE_PCIE];
> > +    MemoryRegion *mmio_alias;
> > +    MemoryRegion *mmio_reg;
> > +    MemoryRegion *ecam_alias;
> > +    MemoryRegion *ecam_reg;
> > +    DeviceState *dev;
> > +    int i;
> > +    PCIHostState *pci;
> > +    DeviceState *usb_controller;
> > +    USBBus *usb_bus;
> > +
> > +    dev = qdev_new(TYPE_GPEX_HOST);
> > +    qdev_prop_set_uint32(dev, "num-irqs", GPEX_NUM_IRQS);
> > +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> > +
> > +    /* Map only the first size_ecam bytes of ECAM space */
> > +    ecam_alias = g_new0(MemoryRegion, 1);
>
> Include this in VMAppleMachineState.
>
> > +    ecam_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
> > +    memory_region_init_alias(ecam_alias, OBJECT(dev), "pcie-ecam",
> > +                             ecam_reg, 0, size_ecam);
> > +    memory_region_add_subregion(get_system_memory(), base_ecam,
> ecam_alias);
> > +
> > +    /*
> > +     * Map the MMIO window from [0x50000000-0x7fff0000] in PCI space
> into
> > +     * system address space at [0x50000000-0x7fff0000].
> > +     */
> > +    mmio_alias = g_new0(MemoryRegion, 1);
> > +    mmio_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 1);
> > +    memory_region_init_alias(mmio_alias, OBJECT(dev), "pcie-mmio",
> > +                             mmio_reg, base_mmio, size_mmio);
> > +    memory_region_add_subregion(get_system_memory(), base_mmio,
> mmio_alias);
> > +
> > +    for (i = 0; i < GPEX_NUM_IRQS; i++) {
> > +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
> > +                           qdev_get_gpio_in(vms->gic, irq + i));
> > +        gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
> > +    }
> > +
> > +    pci = PCI_HOST_BRIDGE(dev);
> > +    vms->bus = pci->bus;
> > +    g_assert(vms->bus);
> > +
> > +    while ((dev = qemu_create_nic_device("virtio-net-pci", true,
> NULL))) {
> > +        qdev_realize_and_unref(dev, BUS(vms->bus), &error_fatal);
> > +    }
> > +
> > +    usb_controller = qdev_new(TYPE_QEMU_XHCI);
> > +    qdev_realize_and_unref(usb_controller, BUS(pci->bus), &error_fatal);
> > +
> > +    usb_bus = USB_BUS(object_resolve_type_unambiguous(TYPE_USB_BUS,
> > +                                                      &error_fatal));
> > +    usb_create_simple(usb_bus, "usb-kbd");
> > +    usb_create_simple(usb_bus, "usb-tablet");
> > +}
> > +
> > +static void vmapple_reset(void *opaque)
> > +{
> > +    VMAppleMachineState *vms = opaque;
> > +    hwaddr base = vms->memmap[VMAPPLE_FIRMWARE].base;
> > +
> > +    cpu_set_pc(first_cpu, base);
> > +}
> > +
> > +static void mach_vmapple_init(MachineState *machine)
> > +{
> > +    VMAppleMachineState *vms = VMAPPLE_MACHINE(machine);
> > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > +    const CPUArchIdList *possible_cpus;
> > +    MemoryRegion *sysmem = get_system_memory();
> > +    int n;
> > +    unsigned int smp_cpus = machine->smp.cpus;
> > +    unsigned int max_cpus = machine->smp.max_cpus;
> > +
> > +    vms->memmap = memmap;
> > +    machine->usb = true;
> > +
> > +    possible_cpus = mc->possible_cpu_arch_ids(machine);
> > +    assert(possible_cpus->len == max_cpus);
> > +    for (n = 0; n < possible_cpus->len; n++) {
> > +        Object *cpu;
> > +        CPUState *cs;
> > +
> > +        if (n >= smp_cpus) {
> > +            break;
> > +        }
> > +
> > +        cpu = object_new(possible_cpus->cpus[n].type);
> > +        object_property_set_int(cpu, "mp-affinity",
> > +                                possible_cpus->cpus[n].arch_id, NULL);
>
> Pass &error_fatal instead of NULL.
>
> > +
> > +        cs = CPU(cpu);
> > +        cs->cpu_index = n;
> > +
> > +        numa_cpu_pre_plug(&possible_cpus->cpus[cs->cpu_index],
> DEVICE(cpu),
> > +                          &error_fatal);
> > +
> > +        object_property_set_bool(cpu, "has_el3", false, NULL);
> > +        object_property_set_bool(cpu, "has_el2", false, NULL);
> > +        object_property_set_int(cpu, "psci-conduit",
> QEMU_PSCI_CONDUIT_HVC,
> > +                                NULL);
> > +
> > +        /* Secondary CPUs start in PSCI powered-down state */
> > +        if (n > 0) {
> > +            object_property_set_bool(cpu, "start-powered-off", true,
> NULL);
> > +        }
> > +
> > +        object_property_set_link(cpu, "memory", OBJECT(sysmem),
> &error_abort);
> > +        qdev_realize(DEVICE(cpu), NULL, &error_fatal);
> > +        object_unref(cpu);
> > +    }
> > +
> > +    memory_region_add_subregion(sysmem, vms->memmap[VMAPPLE_MEM].base,
> > +                                machine->ram);
> > +
> > +    create_gic(vms, sysmem);
> > +    create_bdif(vms, sysmem);
> > +    create_pvpanic(vms, sysmem);
> > +    create_aes(vms, sysmem);
> > +    create_gfx(vms, sysmem);
> > +    create_uart(vms, VMAPPLE_UART, sysmem, serial_hd(0));
> > +    create_rtc(vms);
> > +    create_pcie(vms);
> > +
> > +    create_gpio_devices(vms, VMAPPLE_GPIO, sysmem);
> > +
> > +    vmapple_firmware_init(vms, sysmem);
> > +    create_cfg(vms, sysmem);
> > +
> > +    /* connect powerdown request */
> > +    vms->powerdown_notifier.notify = vmapple_powerdown_req;
> > +    qemu_register_powerdown_notifier(&vms->powerdown_notifier);
> > +
> > +    vms->bootinfo.ram_size = machine->ram_size;
> > +    vms->bootinfo.board_id = -1;
> > +    vms->bootinfo.loader_start = vms->memmap[VMAPPLE_MEM].base;
> > +    vms->bootinfo.skip_dtb_autoload = true;
> > +    vms->bootinfo.firmware_loaded = true;
> > +    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
> > +
> > +    qemu_register_reset(vmapple_reset, vms);
> > +}
> > +
> > +static CpuInstanceProperties
> > +vmapple_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;
> > +}
> > +
> > +
> > +static int64_t vmapple_get_default_cpu_node_id(const MachineState *ms,
> int idx)
> > +{
> > +    return idx % ms->numa_state->num_nodes;
> > +}
> > +
> > +static const CPUArchIdList *vmapple_possible_cpu_arch_ids(MachineState
> *ms)
> > +{
> > +    int n;
> > +    unsigned int max_cpus = ms->smp.max_cpus;
> > +
> > +    if (ms->possible_cpus) {
> > +        assert(ms->possible_cpus->len == max_cpus);
> > +        return ms->possible_cpus;
> > +    }
> > +
> > +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> > +                                  sizeof(CPUArchId) * max_cpus);
> > +    ms->possible_cpus->len = max_cpus;
> > +    for (n = 0; n < ms->possible_cpus->len; n++) {
> > +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> > +        ms->possible_cpus->cpus[n].arch_id =
> > +            arm_build_mp_affinity(n, GICV3_TARGETLIST_BITS);
> > +        ms->possible_cpus->cpus[n].props.has_thread_id = true;
> > +        ms->possible_cpus->cpus[n].props.thread_id = n;
> > +    }
> > +    return ms->possible_cpus;
> > +}
> > +
> > +static void vmapple_get_uuid(Object *obj, Visitor *v, const char *name,
> > +                             void *opaque, Error **errp)
> > +{
> > +    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
> > +
> > +    visit_type_uint64(v, name, &vms->uuid, errp);
> > +}
> > +
> > +static void vmapple_set_uuid(Object *obj, Visitor *v, const char *name,
> > +                             void *opaque, Error **errp)
> > +{
> > +    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
> > +    Error *error = NULL;
> > +
> > +    visit_type_uint64(v, name, &vms->uuid, &error);
> > +    if (error) {
> > +        error_propagate(errp, error);
> > +        return;
> > +    }
> > +}
> > +
> > +static void vmapple_machine_class_init(ObjectClass *oc, void *data)
> > +{
> > +    MachineClass *mc = MACHINE_CLASS(oc);
> > +
> > +    mc->init = mach_vmapple_init;
> > +    mc->max_cpus = 32;
> > +    mc->block_default_type = IF_VIRTIO;
> > +    mc->no_cdrom = 1;
> > +    mc->pci_allow_0_address = true;
> > +    mc->minimum_page_bits = 12;
> > +    mc->possible_cpu_arch_ids = vmapple_possible_cpu_arch_ids;
> > +    mc->cpu_index_to_instance_props = vmapple_cpu_index_to_props;
> > +    mc->default_cpu_type = ARM_CPU_TYPE_NAME("host");
> > +    mc->get_default_cpu_node_id = vmapple_get_default_cpu_node_id;
> > +    mc->default_ram_id = "mach-vmapple.ram";
> > +
> > +    object_register_sugar_prop(TYPE_VIRTIO_PCI, "disable-legacy",
> > +                               "on", true);
> > +
> > +    object_class_property_add(oc, "uuid", "uint64", vmapple_get_uuid,
> > +                              vmapple_set_uuid, NULL, NULL);
> > +    object_class_property_set_description(oc, "uuid", "Machine UUID
> (SDOM)");
> > +}
> > +
> > +static void vmapple_instance_init(Object *obj)
> > +{
> > +    VMAppleMachineState *vms = VMAPPLE_MACHINE(obj);
> > +
> > +    vms->irqmap = irqmap;
> > +}
> > +
> > +static const TypeInfo vmapple_machine_info = {
> > +    .name          = TYPE_VMAPPLE_MACHINE,
> > +    .parent        = TYPE_MACHINE,
> > +    .abstract      = true,
> > +    .instance_size = sizeof(VMAppleMachineState),
> > +    .class_size    = sizeof(VMAppleMachineClass),
> > +    .class_init    = vmapple_machine_class_init,
> > +    .instance_init = vmapple_instance_init,
> > +};
> > +
> > +static void machvmapple_machine_init(void)
> > +{
> > +    type_register_static(&vmapple_machine_info);
> > +}
> > +type_init(machvmapple_machine_init);
> > +
> > +static void vmapple_machine_9_2_options(MachineClass *mc)
> > +{
> > +}
> > +DEFINE_VMAPPLE_MACHINE_AS_LATEST(9, 2)
> > +
>
>

[-- Attachment #2: Type: text/html, Size: 43294 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-11-05 15:38 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-24 10:27 [PATCH v4 00/15] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
2024-10-24 10:27 ` [PATCH v4 01/15] ui & main loop: Redesign of system-specific main thread event handling Phil Dennis-Jordan
2024-10-25  4:34   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 02/15] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support Phil Dennis-Jordan
2024-10-25  6:03   ` Akihiko Odaki
2024-10-25 19:43     ` Phil Dennis-Jordan
2024-10-26  4:40       ` Akihiko Odaki
2024-10-26 10:24         ` Phil Dennis-Jordan
2024-10-28  7:42           ` Akihiko Odaki
2024-10-28  9:00             ` Phil Dennis-Jordan
2024-10-28 13:31               ` Phil Dennis-Jordan
2024-10-28 14:02                 ` Akihiko Odaki
2024-10-28 14:13                   ` Phil Dennis-Jordan
2024-10-28 16:06                     ` Akihiko Odaki
2024-10-28 21:06                       ` Phil Dennis-Jordan
2024-10-29  7:42                         ` Akihiko Odaki
2024-10-29 21:16                           ` Phil Dennis-Jordan
2024-10-31  6:52                             ` Akihiko Odaki
2024-11-03 15:08                               ` Phil Dennis-Jordan
2024-10-24 10:28 ` [PATCH v4 03/15] hw/display/apple-gfx: Adds PCI implementation Phil Dennis-Jordan
2024-10-26  4:45   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 04/15] hw/display/apple-gfx: Adds configurable mode list Phil Dennis-Jordan
2024-10-26  5:15   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 05/15] MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF Phil Dennis-Jordan
2024-11-05 15:36   ` Roman Bolshakov
2024-10-24 10:28 ` [PATCH v4 06/15] hw: Add vmapple subdir Phil Dennis-Jordan
2024-10-24 10:28 ` [PATCH v4 07/15] hw/misc/pvpanic: Add MMIO interface Phil Dennis-Jordan
2024-10-24 10:28 ` [PATCH v4 08/15] hvf: arm: Ignore writes to CNTP_CTL_EL0 Phil Dennis-Jordan
2024-10-24 10:28 ` [PATCH v4 09/15] gpex: Allow more than 4 legacy IRQs Phil Dennis-Jordan
2024-10-26  5:21   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 10/15] hw/vmapple/aes: Introduce aes engine Phil Dennis-Jordan
2024-10-26  5:40   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 11/15] hw/vmapple/bdif: Introduce vmapple backdoor interface Phil Dennis-Jordan
2024-10-24 10:28 ` [PATCH v4 12/15] hw/vmapple/cfg: Introduce vmapple cfg region Phil Dennis-Jordan
2024-10-26  5:48   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 13/15] hw/vmapple/virtio-blk: Add support for apple virtio-blk Phil Dennis-Jordan
2024-10-26  6:02   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 14/15] hw/block/virtio-blk: Replaces request free function with g_free Phil Dennis-Jordan
2024-10-26  6:03   ` Akihiko Odaki
2024-10-24 10:28 ` [PATCH v4 15/15] hw/vmapple/vmapple: Add vmapple machine type Phil Dennis-Jordan
2024-10-26  6:20   ` Akihiko Odaki
2024-10-26 11:58     ` Phil Dennis-Jordan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).