From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3AD1BCFB424 for ; Sun, 6 Oct 2024 10:40:29 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sxOg7-0007t5-23; Sun, 06 Oct 2024 06:39:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sxOfw-0007p4-T3 for qemu-devel@nongnu.org; Sun, 06 Oct 2024 06:39:46 -0400 Received: from mail-vs1-xe2f.google.com ([2607:f8b0:4864:20::e2f]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sxOfp-0002Hd-Tl for qemu-devel@nongnu.org; Sun, 06 Oct 2024 06:39:44 -0400 Received: by mail-vs1-xe2f.google.com with SMTP id ada2fe7eead31-4a3a9f7b8f0so1149307137.2 for ; Sun, 06 Oct 2024 03:39:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=philjordan-eu.20230601.gappssmtp.com; s=20230601; t=1728211175; x=1728815975; darn=nongnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=wNmZKYUMpoKi6FN8mCmbn6YeWKye7iZm0j1Bq/b5nmk=; b=VPdI7ySP8NQcKftc84CP/UweaYInFm9YYHaZNOpBfRgVYXIp4WLOp3tTIPWo9ZjBvw /bI754pidfW26Qly9N0vIQWjZ9Z/TP/xloo+R4XuhsggUuiPAuzje+TARcSKVBU6Hm7a GesQqz/9oUPuBPXqzfBJVR14wgZMCE5KM8b68wR/kq+hLp569rYbUsQ+noYQBxboymfD Exosj7Pddedegr7mQ4G3vXhUqpGZSybHekDgTKqPxnIRmpL6LseVAcI9DmnAC4pTKFtV +bTpJ34vJsY8vgbCY6xro+NoLqAXv9za3zj0zRTH22YkCsE+bAYHdmPEjNnHmFqs4g5N w1Gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728211175; x=1728815975; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=wNmZKYUMpoKi6FN8mCmbn6YeWKye7iZm0j1Bq/b5nmk=; b=tfys/48NWInLLxUWzvJMZ9tQfYrp4CMY6lSfKHagIB/S/K3sMPybIIcOoLzdRyTwjg HKsCxMDsk5ZcMpcmyxFWRSvIY0ktDViF+1AII8zzBuo79PxIHTmwo66QU12w36AhWFbO bDoGznfA2EdoO62s28/7dlszLDEqBpg+uPJgZRC3Sp51VEOy6Hwru+3EJlCMGB4a5UVw eHsMSQ45rXVwM+H6skOvmvWxM1bYb0HcrWaQDXxlleYxR064DhaHTwPdVuCXorBbLL/R ChgI47UbunriacmIJm9UpIuFDjuMBNK7NP6omEPH0lscxmRvBIn7x8kQQ+xWle3JUP1w 0+AQ== X-Gm-Message-State: AOJu0YxF/jfBWBHTkjdofw3r5yyHROp20jp3j9of7mCy16ivb14C6A4l oaiIpRe8eYSHwr8xZWazrWfT0WJ92J3N4s6vLkpacaskHTKd4Ums4uYDGUyQoigTMlNa4kXLSxi sTOE2SPPCgRnKpjM0sGQm9Yx9851YgBNFsamt X-Google-Smtp-Source: AGHT+IH2XeCXfXNkJCy53x0MABWbBQc2vZp/+Q2x9efOPoD5HLPMJ13IxCkBOFuxVcg+EKcPMCBD6/O3f3joiIPlkSQ= X-Received: by 2002:a05:6102:3595:b0:4a3:b777:3613 with SMTP id ada2fe7eead31-4a405905334mr5994812137.27.1728211175264; Sun, 06 Oct 2024 03:39:35 -0700 (PDT) MIME-Version: 1.0 References: <20240928085727.56883-1-phil@philjordan.eu> <20240928085727.56883-2-phil@philjordan.eu> <7f3a1a60-a2f3-433c-8f2e-a2dfe0afdcb5@daynix.com> In-Reply-To: From: Phil Dennis-Jordan Date: Sun, 6 Oct 2024 12:39:23 +0200 Message-ID: Subject: Re: [PATCH v3 01/14] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support To: Akihiko Odaki Cc: qemu-devel@nongnu.org, agraf@csgraf.de, peter.maydell@linaro.org, pbonzini@redhat.com, rad@semihalf.com, quic_llindhol@quicinc.com, marcin.juszkiewicz@linaro.org, stefanha@redhat.com, mst@redhat.com, slp@redhat.com, richard.henderson@linaro.org, eduardo@habkost.net, marcel.apfelbaum@gmail.com, gaosong@loongson.cn, jiaxun.yang@flygoat.com, chenhuacai@kernel.org, kwolf@redhat.com, hreitz@redhat.com, philmd@linaro.org, shorne@gmail.com, palmer@dabbelt.com, alistair.francis@wdc.com, bmeng.cn@gmail.com, liwei1518@gmail.com, dbarboza@ventanamicro.com, zhiwei_liu@linux.alibaba.com, jcmvbkbc@gmail.com, marcandre.lureau@redhat.com, berrange@redhat.com, qemu-arm@nongnu.org, qemu-block@nongnu.org, qemu-riscv@nongnu.org, Alexander Graf Content-Type: multipart/alternative; boundary="00000000000032f0df0623cc857b" Received-SPF: neutral client-ip=2607:f8b0:4864:20::e2f; envelope-from=phil@philjordan.eu; helo=mail-vs1-xe2f.google.com X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.779 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org --00000000000032f0df0623cc857b Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, 3 Oct 2024 at 09:09, Akihiko Odaki wrote= : > On 2024/10/02 22:33, Phil Dennis-Jordan wrote: > > > > > > > +#include "apple-gfx.h" > > > +#include "monitor/monitor.h" > > > +#include "hw/sysbus.h" > > > +#include "hw/irq.h" > > > +#include "trace.h" > > > +#import > > > + > > > +_Static_assert(__aarch64__, ""); > > > > I don't think this assertion is worthwhile. This assertion will > trigger > > if you accidentally remove depends on AARCH64 from Kconfig, but I > don't > > think such code change happens by accident, and there is no reason = to > > believe that this assertion won't be removed in such a case. > > > > > > As far as I'm aware the Kconfig AARCH64 dependency is for the /target/ > > architecture, not the /host/ architecture? The static assert checks for > > the latter. The PGIOSurfaceHostDeviceDescriptor type isn't available at > > all on non-aarch64 macOS hosts. I've not had any luck with using this > > variant of the device on x86-64 hosts simply by disabling any surface > > mapper code. > > > > Incidentally, if you know of a way to depend on a specific /host/ > > architecture in the Kconfig, that would be even better. I couldn't spot > > a way of doing that though. > > I got your intention now. The correct way to do that is to check for cpu > =3D=3D 'aarch64'. Having assertion will break qemu-system-aarch64 on Inte= l > Macs. > OK, looks Iike that needs to be done at the meson.build level not Kconfig, but this seems to work: if cpu =3D=3D 'aarch64' system_ss.add(when: 'CONFIG_MAC_PVG_MMIO', if_true: [files('apple-gfx-mmio.m'), pvg, metal]) endif > > > > It is dangerous to unlock BQL at an arbitrary place. Instead of > > unlocking, I suggest: > > - running [s->pgiosfc mmioReadAtOffset:offset] on another thread > > - using a bottom half to request operations that require BQL from t= he > > thread running [s->pgiosfc mmioReadAtOffset:offset] > > - calling AIO_WAIT_WHILE() to process the bottom half and to wait f= or > > the completion of [s->pgiosfc mmioReadAtOffset:offset] > > > > > > OK, I think I see what you mean, I'll try to rework things around that > > pattern. Any preference on how I kick off the job on the other thread? > > As we necessarily need to use libdispatch in a bunch of places in this > > code anyway, I guess dispatch_async() would probably be the simplest? > > Perhaps so. The QEMU way is to use a bottom half with AioContext, but > you can't simultaneously run a dispatch queue and AioContext in one > thread so you have to use the dispatch queue if you need one. > > > > > > + res =3D [s->pgiosfc mmioReadAtOffset:offset]; > > > + bql_lock(); > > > + > > > + trace_apple_iosfc_read(offset, res); > > > + > > > + return res; > > > +} > > > + > > > +static void apple_iosfc_write( > > > + void *opaque, hwaddr offset, uint64_t val, unsigned size) > > > +{ > > > + AppleGFXVmappleState *s =3D opaque; > > > + > > > + trace_apple_iosfc_write(offset, val); > > > + > > > + [s->pgiosfc mmioWriteAtOffset:offset value:val]; > > > +} > > > + > > > +static const MemoryRegionOps apple_iosfc_ops =3D { > > > + .read =3D apple_iosfc_read, > > > + .write =3D apple_iosfc_write, > > > + .endianness =3D DEVICE_LITTLE_ENDIAN, > > > + .valid =3D { > > > + .min_access_size =3D 4, > > > + .max_access_size =3D 8, > > > + }, > > > + .impl =3D { > > > + .min_access_size =3D 4, > > > + .max_access_size =3D 8, > > > + }, > > > +}; > > > + > > > +static PGIOSurfaceHostDevice > > *apple_gfx_prepare_iosurface_host_device( > > > + AppleGFXVmappleState *s) > > > +{ > > > + PGIOSurfaceHostDeviceDescriptor *iosfc_desc =3D > > > + [PGIOSurfaceHostDeviceDescriptor new]; > > > + PGIOSurfaceHostDevice *iosfc_host_dev =3D nil; > > > + > > > + iosfc_desc.mapMemory =3D > > > + ^(uint64_t phys, uint64_t len, bool ro, void **va, void > > *e, void *f) { > > > + trace_apple_iosfc_map_memory(phys, len, ro, va, e, > f); > > > + MemoryRegion *tmp_mr; > > > + *va =3D gpa2hva(&tmp_mr, phys, len, NULL); > > > > Use: dma_memory_map() > > > > > > That doesn't seem to be a precisely equivalent operation. It also says > > in its headerdoc, > > > > Use only for reads OR writes - not for read-modify-write operations= . > > > > which I don't think we can guarantee here at all. > > > > I guess I can call it twice, once for writing and once for reading, but > > given that the dma_memory_unmap operation marks the area dirty, I'm not > > it's intended for what I understand the use case here to be: As far as = I > > can tell, the PV graphics device uses (some) of this memory to exchange > > data in a cache-coherent way between host and guest, e.g. as a lock-fre= e > > ring buffer, using atomic operations as necessary. (This works because > > it's a PV device: it "knows" the other end just another CPU core (or > > even the same one) executing in a different hypervisor context.) This > > doesn't really match "traditional" DMA patterns where there's either a > > read or a write happening. > > I think the story is a bit different for this VMApple variant. Probably > the CPU and GPU in Apple Silicon is cache-coherent so you can map normal > memory for GPU without any kind of cache maintenance. > > Cache conherency of CPU and GPU in Apple Silicon is implied with Apple's > documentation; it says you don't need to synchronize resources for > MTLStorageModeShared, which is the default for Apple Silicon. > > https://developer.apple.com/documentation/metal/resource_fundamentals/syn= chronizing_a_managed_resource_in_macos > > The name "IOSurface" also implies it is used not only for e.g., ring > buffer but also for real data. > Note that the PGTask map/unmap callbacks appear to have equivalent semantics, so it's not just the surface mapping. > > > > Hunting around some more for alternative APIs, there's also > > memory_region_get_ram_ptr(), but I'm not sure its restrictions apply > > here either. > > I think you can call memory_access_is_direct() to check if the > requirement is satisfied. > > It will still break dirty page tracking implemented by > dma_memory_unmap() and others, but it's broken for hvf, which does not > implement dirty page tracking either. > > > > > + return (bool)true; > > > > Why cast? > > > > > > Good question. Not originally my code, so I've fixed all the instances = I > > could find now. > OK, it turns out the reason for this is that C treats 'true' as an int, which then becomes the block's inferred return type - and the callbacks are expecting bool-returning blocks. I've fixed it by explicitly specifying the block return type and removing the cast in the return statement: iosfc_desc.unmapMemory =3D ^bool(=E2=80=A6) { =E2=80=A6 return true; }; > > > + > > > + iosfc_desc.unmapMemory =3D > > > + ^(void *a, void *b, void *c, void *d, void *e, void *f)= { > > > + trace_apple_iosfc_unmap_memory(a, b, c, d, e, f); > > > + return (bool)true; > > > + }; > > > + > > > + iosfc_desc.raiseInterrupt =3D ^(uint32_t vector) { > > > + trace_apple_iosfc_raise_irq(vector); > > > + bool locked =3D bql_locked(); > > > + if (!locked) { > > > + bql_lock(); > > > + } > > > + qemu_irq_pulse(s->irq_iosfc);> + if (!locked) { > > > + bql_unlock(); > > > + } > > > + return (bool)true; > > > + }; > > > + > > > + iosfc_host_dev =3D > > > + [[PGIOSurfaceHostDevice alloc] > > initWithDescriptor:iosfc_desc]; > > > + [iosfc_desc release]; > > > + return iosfc_host_dev; > > > +} > > > + > > > +static void apple_gfx_vmapple_realize(DeviceState *dev, Error > > **errp) > > > +{ > > > + @autoreleasepool { > > > > This autoreleasepool is not used. > > > > > > It is definitely used inside the apple_gfx_common_realize() call. It's > > also impossible to say whether [PGDeviceDescriptor new] uses autoreleas= e > > semantics internally, so it seemed safer to wrap the whole thing in an > > outer pool. > > Theoretically, It should be safe to assume the callee creates > autoreleasepool by themselves as needed in general. We have bunch of > code to call Objective-C APIs without creating autoreleasepool in the > caller. Practically, [PGDeviceDescriptor new] is likely to be > implemented with ARC, which wraps methods in autoreleasepool as necessary= . > As far as I'm aware, ARC does NOT automatically insert autorelease pool blocks. The reason you rarely need to create autoreleasepool blocks in "plain" Objective-C programming is that Cocoa/CFRunloop/libdispatch event handlers run each event in an autoreleasepool. So you don't need to create them explicitly when using dispatch_async and similar, or when running code on the main thread (which runs inside NSApplicationMain/CFRunloopRun/dispatch_main). As far as I'm aware, if you don't explicitly define autoreleasepools in raw threads created with the pthreads API, any autoreleased objects will leak. At least I've not found any specification/documentation contradicting this. And most code in Qemu runs on such raw threads, so we need to play it safe with regard to autorelease semantics. Whether the existing Qemu Objective-C code is safe in this regard I don't know for certain, but I've certainly paid attention to this aspect when modifying ui/cocoa.m in the past, and indeed most of that code runs on the main thread. Note also how I wrap the apple_gfx_render_new_frame call in a pool when it can't be guaranteed it's running on a dispatch queue because the command buffer inside that uses autorelease semantics. Functions that uses a method that returns autorelease resources should > be wrapped with autoreleasepool instead of assuming the caller creates > autoreleasepool for them. > I'm treating apple_gfx_common_realize as an internal API, and I don't think expecting its callers to wrap it in an autoreleasepool block is unreasonable. I can certainly explicitly document this in a comment. > > > diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m > > > new file mode 100644 > > > index 00000000000..837300f9cd4 > > > --- /dev/null > > > +++ b/hw/display/apple-gfx.m > > > @@ -0,0 +1,536 @@ > > > +/* > > > + * QEMU Apple ParavirtualizedGraphics.framework device > > > + * > > > + * Copyright =C2=A9 2023 Amazon.com, Inc. or its affiliates. Al= l > > Rights Reserved. > > > + * > > > + * This work is licensed under the terms of the GNU GPL, versio= n > > 2 or later. > > > + * See the COPYING file in the top-level directory. > > > + * > > > + * ParavirtualizedGraphics.framework is a set of libraries that > > macOS provides > > > + * which implements 3d graphics passthrough to the host as well > as a > > > + * proprietary guest communication channel to drive it. This > > device model > > > + * implements support to drive that library from within QEMU. > > > + */ > > > + > > > +#include "apple-gfx.h" > > > +#include "trace.h" > > > +#include "qemu/main-loop.h" > > > +#include "ui/console.h" > > > +#include "monitor/monitor.h" > > > +#include "qapi/error.h" > > > +#include "migration/blocker.h" > > > +#include > > > +#import > > > + > > > +static const PGDisplayCoord_t apple_gfx_modes[] =3D { > > > + { .x =3D 1440, .y =3D 1080 }, > > > + { .x =3D 1280, .y =3D 1024 }, > > > +}; > > > + > > > +typedef struct PGTask_s { // Name matches forward declaration i= n > > PG header > > > > Let's name it AppleGFXTask. It is a common practice to have the sam= e > > tag > > name and typedef in QEMU. > > > > > > This is defining a forward-declared type from framework headers which i= s > > opaque from the framework's point of view. We do not get to choose its > > struct name. The alternative is having casts wherever these objects are > > being passed between our code and the framework. (See the original v1/v= 2 > > vmapple patch series for how messy this is.) > > I got the idea. Let's not avoid the typedef then to clarify the naming > is not under our control. > I'm not sure what you mean by this double negative. Are you saying, don't add our own typedef for struct PGTask_s at all, just use the framework-supplied PGTask_t where appropriate? > > > > > > +static void apple_gfx_render_frame_completed(AppleGFXState *s, > > void *vram, > > > + id > texture) > > > +{ > > > + --s->pending_frames; > > > + assert(s->pending_frames >=3D 0); > > > + > > > + if (vram !=3D s->vram) { > > > + /* Display mode has changed, drop this old frame. */ > > > + assert(texture !=3D s->texture); > > > + g_free(vram); > > > > This management of buffers looks a bit convoluted. I suggest > > remembering > > the width and height instead of pointers and comparing them. This w= ay > > you can free resources in set_mode(). > > > > > > Yeah, I suppose that works, I can change that around. > > > > > + } else { > > > + copy_mtl_texture_to_surface_mem(texture, vram); > > > > Writing vram outside BQL may result in tearing. > > > > > > As far as I can tell(*), QXL does the same. I couldn't find any example= s > > of double-buffering in any of the existing display devices, which would > > be the only way to do async updates efficiently and without tearing. In > > any case, this solution is still vastly better than a regular VGA > > device, which suffers from very visible tearing with macOS on the guest > > side anyway. And in an ideal world, we'd pass through the rendered > > texture directly to the Cocoa UI code rather than copying out only for > > the CPU to draw it back into a window surface which is then passed to > > the GPU for host side rendering. But I felt this patch is already very, > > very large, and if anyone cares, we can fix imperfections in subsequent > > updates. > > > > (*)The rendering code in that device is also fairly complex, so I may b= e > > misreading it. > > QXL always modifies the surface with BQL. The surface is modified with > qxl_blit(), which is called by qxl_render_update_area_unlocked(). > qxl_render_update_area_unlocked() is called by either of > qxl_render_update() and qxl_render_update_area_bh(). Both of them are > called with BQL. The name includes "unlocked", but it means it is called > without holding QXL-internal lock. > > Most devices works entirely with BQL so they don't perform double > buffering. apple-gfx can do the same. > I think we can safely move apple-gfx's framebuffer state management back inside the BQL, yes. I just figured that copying dozens of megabytes of framebuffer data on every frame while holding the lock was not going to help BQL contention. Especially as PVG does not have a concept of dirty areas, so we must copy the whole framebuffer every time. (Unless we were to implement dirty area detection ourselves.) Unfortunately, implementing double-buffering would require a major rework of Qemu's whole surface management, console code, and probably most of the UI implementations. I'm guessing the OpenGL fast-path sidesteps all of this, so replicating that with Metal would probably be the easier way forward. (Although doing all this graphics stuff inside the BQL generally seems like a major architectural flaw; I suppose most enterprise use of Qemu does not involve the framebuffer, so it's not shown up in BQL contention profiling there. It certainly does in desktop use, although at least on macOS hosts there are far worse culprits in that regard.) > > > > + if (s->gfx_update_requested) { > > > + s->gfx_update_requested =3D false; > > > + dpy_gfx_update_full(s->con); > > > + graphic_hw_update_done(s->con); > > > + s->new_frame_ready =3D false; > > > > This assignment is unnecessary as s->new_frame_ready is always fals= e > > when s->gfx_update_requested. If you want to make sure > > s->gfx_update_requested and s->new_frame_ready are mutually > exclusive, > > use one enum value instead of having two bools. > > > > > > I'll need to refresh my memory and get back to you on this one, it's > > been so many months since I actively worked on this code. > > > > > + } else { > > > + s->new_frame_ready =3D true; > > > + } > > > + } > > > + if (s->pending_frames > 0) { > > > + apple_gfx_render_new_frame(s); > > > + } > > > +} > > > + > > > +static void apple_gfx_fb_update_display(void *opaque) > > > +{ > > > + AppleGFXState *s =3D opaque; > > > + > > > + dispatch_async(s->render_queue, ^{ > > > + if (s->pending_frames > 0) { > > > > It should check for s->new_frame_ready as > > apple_gfx_render_frame_completed() doesn't check if > > s->pending_frames > 0 before calling graphic_hw_update_done(), whic= h > is > > inconsistent. > > > > > > pending_frames is about guest-side frames that are queued to be rendere= d > > by the host GPU. > > new_frame_ready being true indicates that the contents of the Qemu > > console surface has been updated with new frame data since the last > > gfx_update. > > gfx_update_requested indicates that gfx_update is currently awaiting an > > async completion (graphic_hw_update_done) but the surface has not > > received a new frame content, but the GPU is stily busy drawing one. > > > > apple_gfx_render_frame_completed is scheduled exactly once per pending > > frame, so pending_frames > 0 is an invariant there. (Hence the assert.)= > > > I don't think there is any inconsistency here, but I'll double check. > > It's possible that there's an easier way to express the state machine, > > and I'll take a look at that. > > I meant that apple_gfx_render_frame_completed() does not check if the > frame is the last one currently pending. apple_gfx_fb_update_display() > ignores a new ready frame when there is a more pending frame, but > apple_gfx_render_frame_completed() unconditionally fires > graphic_hw_update_done() even if there is a more pending frame. And I > think apple_gfx_render_frame_completed() is right and > apple_gfx_fb_update_display() is wrong in such a situation. > > OK, got it. And yes, I agree. > > > > Checking if s->pending_frames > 0 both in > apple_gfx_fb_update_display() > > and apple_gfx_render_frame_completed() is also problematic as that > can > > defer graphic_hw_update_done() indefinitely if we are getting new > > frames > > too fast. > > > > > > I see what you mean about this part. I'll have to test it, but I guess > > we should reverse the priority, like this: > > > > if (s->new_frame_ready) { > > dpy_gfx_update_full(s->con); > > s->new_frame_ready =3D false; > > graphic_hw_update_done(s->con); > > } else if (s->pending_frames > 0) { > > s->gfx_update_requested =3D true; > > } else { > > graphic_hw_update_done(s->con); > > } > > > > 1. If we already have a frame, ready to be displayed, return it > immediately. > > 2. If the guest has reported that it's completed a frame and the GPU is > > currently busy rendering it, defer graphic_hw_update_done until it's > done. > > 3. If the guest reports no changes to its display, indicate this back t= o > > Qemu as a no-op display update graphic_hw_update_done() with no > > dpy_gfx_update* call. > > Yes, that looks correct. > > > > > > + s->gfx_update_requested =3D true; > > > + } else { > > > + if (s->new_frame_ready) { > > > + dpy_gfx_update_full(s->con); > > > + s->new_frame_ready =3D false; > > > + } > > > + graphic_hw_update_done(s->con); > > > + }> + }); > > > +} > > > + > > > +static const GraphicHwOps apple_gfx_fb_ops =3D { > > > + .gfx_update =3D apple_gfx_fb_update_display, > > > + .gfx_update_async =3D true, > > > +}; > > > + > > > +static void update_cursor(AppleGFXState *s) > > > +{ > > > + dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x, > > > + s->pgdisp.cursorPosition.y, s->cursor_show); > > > +} > > > + > > > +static void set_mode(AppleGFXState *s, uint32_t width, uint32_t > > height) > > > +{ > > > + void *vram =3D NULL; > > > + DisplaySurface *surface; > > > + MTLTextureDescriptor *textureDescriptor; > > > + id texture =3D nil; > > > + __block bool no_change =3D false; > > > + > > > + dispatch_sync(s->render_queue, > > > > Calling dispatch_sync() while holding BQL may result in deadlock. > > > > Only if any code executed on the same dispatch queue acquires the lock > > either directly or transitively. I believe I have ensure this is not > > done on the reqnder_queue, have you found anywhere this is the case? > > The documentation is not clear what threads a dispatch queue runs on. We > can have a deadlock if they lock the BQL. > dispatch_sync is a synchronisation primitive (it waits for and asserts exclusive access to the given queue), it doesn't actually do any thread scheduling. Work scheduled asynchronously to non-main dispatch queues will otherwise run on libdispatch pool threads. Running blocks on dispatch queues will not preempt and schedule it on other threads which may or may not be holding some locks. So the only way this code will deadlock is if any code scheduled to render_queue directly or transitively acquires the BQL. None of it does, although updating the console while holding the BQL rather complicates this= . > > > > > + ^{ > > > + if (s->surface && > > > + width =3D=3D surface_width(s->surface) && > > > + height =3D=3D surface_height(s->surface)) { > > > + no_change =3D true; > > > + } > > > + }); > > > + > > > + if (no_change) { > > > + return; > > > + } > > > + > > > + vram =3D g_malloc0(width * height * 4); > > > + surface =3D qemu_create_displaysurface_from(width, height, > > PIXMAN_LE_a8r8g8b8, > > > + width * 4, vram); > > > + > > > + @autoreleasepool { > > > + textureDescriptor =3D > > > + [MTLTextureDescriptor > > > + > > texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm > > > + width:width > > > + height:height > > > + mipmapped:NO]; > > > + textureDescriptor.usage =3D s->pgdisp.minimumTextureUsa= ge; > > > + texture =3D [s->mtl > > newTextureWithDescriptor:textureDescriptor]; > > > + } > > > + > > > + s->using_managed_texture_storage =3D > > > + (texture.storageMode =3D=3D MTLStorageModeManaged); > > > + > > > + dispatch_sync(s->render_queue, > > > + ^{ > > > + id old_texture =3D nil; > > > + void *old_vram =3D s->vram; > > > + s->vram =3D vram; > > > + s->surface =3D surface; > > > + > > > + dpy_gfx_replace_surface(s->con, surface); > > > + > > > + old_texture =3D s->texture; > > > + s->texture =3D texture; > > > + [old_texture release]; > > > > You can just do: > > [s->texture release]; > > s->texture =3D texture; > > > > This will make s->texture dangling between the two statements, but > that > > don't matter since s->texture is not an atomic variable that can be > > safely observed from another thread anyway. > > > > > + > > > + if (s->pending_frames =3D=3D 0) { > > > + g_free(old_vram); > > > + } > > > + }); > > > +} > > > + > > > +static void create_fb(AppleGFXState *s) > > > +{ > > > + s->con =3D graphic_console_init(NULL, 0, &apple_gfx_fb_ops,= s); > > > + set_mode(s, 1440, 1080); > > > + > > > + s->cursor_show =3D true; > > > +} > > > + > > > +static size_t apple_gfx_get_default_mmio_range_size(void) > > > +{ > > > + size_t mmio_range_size; > > > + @autoreleasepool { > > > + PGDeviceDescriptor *desc =3D [PGDeviceDescriptor new]; > > > + mmio_range_size =3D desc.mmioLength; > > > + [desc release]; > > > + } > > > + return mmio_range_size; > > > +} > > > + > > > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const > > char* obj_name) > > > > This function can be merged into apple_gfx_common_realize(). > > > > > > Probably. I'll try it. > Upon further inspection, we need to call cocoa_enable_runloop_on_main_thread() during the init phase, not realize(). So we can't get rid of this entirely. Is there any value in moving the other init code into _realize()? > > > +{ > > > + Error *local_err =3D NULL; > > > + int r; > > > + size_t mmio_range_size =3D > > apple_gfx_get_default_mmio_range_size(); > > > + > > > + trace_apple_gfx_common_init(obj_name, mmio_range_size); > > > + memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s= , > > obj_name, > > > + mmio_range_size); > > > + s->iomem_gfx.disable_reentrancy_guard =3D true; > > > > Why do you disable reentrancy guard? > > > > > > Perhaps with the proposed AIO_WAIT_WHILE based I/O scheme, this won't b= e > > an issue anymore, but the guard would otherwise keep dropping MMIOs > > which immediately caused the PV graphics device to stop making progress= . > > The MMIO APIs in the PVG framework are thread- and reentrancy-safe, so > > we certainly don't need to serialise them on our side. > > It's better to understand why such reentrancy happens. Reentrancy itself > is often a sign of bug. > > > > > > + > > > + /* TODO: PVG framework supports serialising device state: > > integrate it! */ > > > + if (apple_gfx_mig_blocker =3D=3D NULL) { > > > + error_setg(&apple_gfx_mig_blocker, > > > + "Migration state blocked by apple-gfx display > > device"); > > > + r =3D migrate_add_blocker(&apple_gfx_mig_blocker, > &local_err); > > > + if (r < 0) { > > > + error_report_err(local_err); > > > > Please report the error to the caller of apple_gfx_common_realize() > > instead. > > > > > + } > > > + } > > > +}> + > > > +static void > > apple_gfx_register_task_mapping_handlers(AppleGFXState *s, > > > + > > PGDeviceDescriptor *desc) > > > +{ > > > + desc.createTask =3D ^(uint64_t vmSize, void * _Nullable * > > _Nonnull baseAddress) { > > > + AppleGFXTask *task =3D apple_gfx_new_task(s, vmSize); > > > + *baseAddress =3D (void*)task->address; > > > > nit: please write as (void *) instead of (void*). > > > > > + trace_apple_gfx_create_task(vmSize, *baseAddress); > > > + return task; > > > + }; > > > + > > > > > +{ > > > + PGDisplayDescriptor *disp_desc =3D [PGDisplayDescriptor new= ]; > > > + > > > + disp_desc.name =3D @"QEMU display"; > > > + disp_desc.sizeInMillimeters =3D NSMakeSize(400., 300.); /* = A > > 20" display */ > > > + disp_desc.queue =3D dispatch_get_main_queue(); > > > + disp_desc.newFrameEventHandler =3D ^(void) { > > > + trace_apple_gfx_new_frame(); > > > + dispatch_async(s->render_queue, ^{ > > > + /* Drop frames if we get too far ahead. */ > > > + if (s->pending_frames >=3D 2) > > > + return; > > > + ++s->pending_frames; > > > + if (s->pending_frames > 1) { > > > + return; > > > + } > > > + @autoreleasepool { > > > + apple_gfx_render_new_frame(s); > > > + } > > > + }); > > > + }; > > > + disp_desc.modeChangeHandler =3D ^(PGDisplayCoord_t > sizeInPixels, > > > + OSType pixelFormat) { > > > + trace_apple_gfx_mode_change(sizeInPixels.x, > sizeInPixels.y); > > > + set_mode(s, sizeInPixels.x, sizeInPixels.y); > > > + }; > > > + disp_desc.cursorGlyphHandler =3D ^(NSBitmapImageRep *glyph, > > > + PGDisplayCoord_t hotSpot) = { > > > + uint32_t bpp =3D glyph.bitsPerPixel; > > > + size_t width =3D glyph.pixelsWide; > > > + size_t height =3D glyph.pixelsHigh; > > > + size_t padding_bytes_per_row =3D glyph.bytesPerRow - wi= dth > > * 4; > > > + const uint8_t* px_data =3D glyph.bitmapData; > > > + > > > + trace_apple_gfx_cursor_set(bpp, width, height); > > > + > > > + if (s->cursor) { > > > + cursor_unref(s->cursor); > > > + s->cursor =3D NULL; > > > + } > > > + > > > + if (bpp =3D=3D 32) { /* Shouldn't be anything else, but= just > > to be safe...*/ > > > + s->cursor =3D cursor_alloc(width, height); > > > + s->cursor->hot_x =3D hotSpot.x; > > > + s->cursor->hot_y =3D hotSpot.y; > > > + > > > + uint32_t *dest_px =3D s->cursor->data; > > > + > > > + for (size_t y =3D 0; y < height; ++y) { > > > + for (size_t x =3D 0; x < width; ++x) { > > > + /* NSBitmapImageRep's red & blue channels > > are swapped > > > + * compared to QEMUCursor's. */ > > > + *dest_px =3D > > > + (px_data[0] << 16u) | > > > + (px_data[1] << 8u) | > > > + (px_data[2] << 0u) | > > > + (px_data[3] << 24u); > > > + ++dest_px; > > > + px_data +=3D 4; > > > + } > > > + px_data +=3D padding_bytes_per_row; > > > + } > > > + dpy_cursor_define(s->con, s->cursor); > > > + update_cursor(s); > > > + } > > > + }; > > > + disp_desc.cursorShowHandler =3D ^(BOOL show) { > > > + trace_apple_gfx_cursor_show(show); > > > + s->cursor_show =3D show; > > > + update_cursor(s); > > > + }; > > > + disp_desc.cursorMoveHandler =3D ^(void) { > > > + trace_apple_gfx_cursor_move(); > > > + update_cursor(s); > > > + }; > > > + > > > + return disp_desc; > > > +} > > > + > > > +static NSArray* > > apple_gfx_prepare_display_mode_array(void) > > > +{ > > > + PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)]; > > > + NSArray* mode_array =3D nil; > > > + int i; > > > + > > > + for (i =3D 0; i < ARRAY_SIZE(apple_gfx_modes); i++) { > > > + modes[i] =3D > > > + [[PGDisplayMode alloc] > > initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.]; > > > + } > > > + > > > + mode_array =3D [NSArray arrayWithObjects:modes > > count:ARRAY_SIZE(apple_gfx_modes)]; > > > + > > > + for (i =3D 0; i < ARRAY_SIZE(apple_gfx_modes); i++) { > > > + [modes[i] release]; > > > + modes[i] =3D nil; > > > + } > > > + > > > + return mode_array; > > > +} > > > + > > > +static id copy_suitable_metal_device(void) > > > +{ > > > + id dev =3D nil; > > > + NSArray> *devs =3D MTLCopyAllDevices(); > > > + > > > + /* Prefer a unified memory GPU. Failing that, pick a non- > > removable GPU. */ > > > + for (size_t i =3D 0; i < devs.count; ++i) { > > > + if (devs[i].hasUnifiedMemory) { > > > + dev =3D devs[i]; > > > + break; > > > + } > > > + if (!devs[i].removable) { > > > + dev =3D devs[i]; > > > + } > > > + } > > > + > > > + if (dev !=3D nil) { > > > + [dev retain]; > > > + } else { > > > + dev =3D MTLCreateSystemDefaultDevice(); > > > + } > > > + [devs release]; > > > + > > > + return dev; > > > +} > > > + > > > +void apple_gfx_common_realize(AppleGFXState *s, > > PGDeviceDescriptor *desc) > > > +{ > > > + PGDisplayDescriptor *disp_desc =3D nil; > > > + > > > + QTAILQ_INIT(&s->tasks); > > > + s->render_queue =3D dispatch_queue_create("apple-gfx.render= ", > > > + > DISPATCH_QUEUE_SERIAL); > > > + s->mtl =3D copy_suitable_metal_device(); > > > + s->mtl_queue =3D [s->mtl newCommandQueue]; > > > + > > > + desc.device =3D s->mtl; > > > + > > > + apple_gfx_register_task_mapping_handlers(s, desc); > > > + > > > + s->pgdev =3D PGNewDeviceWithDescriptor(desc); > > > + > > > + disp_desc =3D apple_gfx_prepare_display_handlers(s); > > > + s->pgdisp =3D [s->pgdev newDisplayWithDescriptor:disp_desc > > > + port:0 > > serialNum:1234]; > > > + [disp_desc release]; > > > + s->pgdisp.modeList =3D apple_gfx_prepare_display_mode_array= (); > > > + > > > + create_fb(s); > > > +} > > > diff --git a/hw/display/meson.build b/hw/display/meson.build > > > index 7db05eace97..70d855749c0 100644 > > > --- a/hw/display/meson.build > > > +++ b/hw/display/meson.build > > > @@ -65,6 +65,8 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true: > > files('artist.c')) > > > > > > system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c', > > 'ati_2d.c', 'ati_dbg.c'), pixman]) > > > > > > +system_ss.add(when: 'CONFIG_MAC_PVG', if_true: > > [files('apple-gfx.m'), pvg, metal]) > > > +system_ss.add(when: 'CONFIG_MAC_PVG_VMAPPLE', if_true: > > [files('apple-gfx-vmapple.m'), pvg, metal]) > > > > > > if config_all_devices.has_key('CONFIG_VIRTIO_GPU') > > > virtio_gpu_ss =3D ss.source_set() > > > diff --git a/hw/display/trace-events b/hw/display/trace-events > > > index 781f8a33203..1809a358e36 100644 > > > --- a/hw/display/trace-events > > > +++ b/hw/display/trace-events > > > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width) > > "dest_width : %u" > > > dm163_leds(int led, uint32_t value) "led %d: 0x%x" > > > dm163_channels(int channel, uint8_t value) "channel %d: 0x%x" > > > dm163_refresh_rate(uint32_t rr) "refresh rate %d" > > > + > > > +# apple-gfx.m > > > +apple_gfx_read(uint64_t offset, uint64_t res) > > "offset=3D0x%"PRIx64" res=3D0x%"PRIx64 > > > +apple_gfx_write(uint64_t offset, uint64_t val) > > "offset=3D0x%"PRIx64" val=3D0x%"PRIx64 > > > +apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=3D0x= %x > > base_addr=3D%p" > > > +apple_gfx_destroy_task(void *task) "task=3D%p" > > > +apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t > > virtual_offset, uint32_t read_only) "task=3D%p range_count=3D0x%x > > virtual_offset=3D0x%"PRIx64" read_only=3D%d" > > > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr, > > uint64_t phys_len) "[%d] phys_addr=3D0x%"PRIx64" phys_len=3D0x%"PRI= x64 > > > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t > > target) "retval=3D%"PRId64" source=3D0x%"PRIx64" target=3D0x%"PRIx6= 4 > > > +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset, > > uint64_t length) "task=3D%p virtual_offset=3D0x%"PRIx64" > length=3D0x%"PRIx64 > > > +apple_gfx_read_memory(uint64_t phys_address, uint64_t length, > > void *dst) "phys_addr=3D0x%"PRIx64" length=3D0x%"PRIx64" dest=3D%p" > > > +apple_gfx_raise_irq(uint32_t vector) "vector=3D0x%x" > > > +apple_gfx_new_frame(void) "" > > > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=3D%"PRId64" > > y=3D%"PRId64 > > > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t > > height) "bpp=3D%d width=3D%"PRId64" height=3D0x%"PRId64 > > > +apple_gfx_cursor_show(uint32_t show) "show=3D%d" > > > +apple_gfx_cursor_move(void) "" > > > +apple_gfx_common_init(const char *device_name, size_t mmio_size= ) > > "device: %s; MMIO size: %zu bytes" > > > + > > > +# apple-gfx-vmapple.m > > > +apple_iosfc_read(uint64_t offset, uint64_t res) > > "offset=3D0x%"PRIx64" res=3D0x%"PRIx64 > > > +apple_iosfc_write(uint64_t offset, uint64_t val) > > "offset=3D0x%"PRIx64" val=3D0x%"PRIx64 > > > +apple_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t ro= , > > void *va, void *e, void *f) "phys=3D0x%"PRIx64" len=3D0x%"PRIx64" r= o=3D%d > > va=3D%p e=3D%p f=3D%p" > > > +apple_iosfc_unmap_memory(void *a, void *b, void *c, void *d, > > void *e, void *f) "a=3D%p b=3D%p c=3D%p d=3D%p e=3D%p f=3D%p" > > > +apple_iosfc_raise_irq(uint32_t vector) "vector=3D0x%x" > > > + > > > diff --git a/meson.build b/meson.build > > > index 10464466ff3..f09df3f09d5 100644 > > > --- a/meson.build > > > +++ b/meson.build > > > @@ -741,6 +741,8 @@ socket =3D [] > > > version_res =3D [] > > > coref =3D [] > > > iokit =3D [] > > > +pvg =3D [] > > > +metal =3D [] > > > emulator_link_args =3D [] > > > midl =3D not_found > > > widl =3D not_found > > > @@ -762,6 +764,8 @@ elif host_os =3D=3D 'darwin' > > > coref =3D dependency('appleframeworks', modules: > 'CoreFoundation') > > > iokit =3D dependency('appleframeworks', modules: 'IOKit', > > required: false) > > > host_dsosuf =3D '.dylib' > > > + pvg =3D dependency('appleframeworks', modules: > > 'ParavirtualizedGraphics') > > > + metal =3D dependency('appleframeworks', modules: 'Metal') > > > elif host_os =3D=3D 'sunos' > > > socket =3D [cc.find_library('socket'), > > > cc.find_library('nsl'), > > > > --00000000000032f0df0623cc857b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, 3 Oct 2024 at 09:09, Akihiko = Odaki <akihiko.odaki@daynix.= com> wrote:
On 2024/10/02 22:33, Phil Dennis-Jordan wrote:
>
>
>=C2=A0 =C2=A0 =C2=A0 > +#include "apple-gfx.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "monitor/monitor.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "hw/sysbus.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "hw/irq.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "trace.h"
>=C2=A0 =C2=A0 =C2=A0 > +#import <ParavirtualizedGraphics/Paravirt= ualizedGraphics.h>
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +_Static_assert(__aarch64__, ""); >
>=C2=A0 =C2=A0 =C2=A0I don't think this assertion is worthwhile. Thi= s assertion will trigger
>=C2=A0 =C2=A0 =C2=A0if you accidentally remove depends on AARCH64 from = Kconfig, but I don't
>=C2=A0 =C2=A0 =C2=A0think such code change happens by accident, and the= re is no reason to
>=C2=A0 =C2=A0 =C2=A0believe that this assertion won't be removed in= such a case.
>
>
> As far as I'm aware the Kconfig AARCH64 dependency is for the /tar= get/
> architecture, not the /host/ architecture? The static assert checks fo= r
> the latter. The PGIOSurfaceHostDeviceDescriptor type isn't availab= le at
> all on non-aarch64 macOS hosts. I've not had any luck with using t= his
> variant of the device on x86-64 hosts simply by disabling any surface =
> mapper code.
>
> Incidentally, if you know of a way to depend on a specific /host/
> architecture in the Kconfig, that would be even better. I couldn't= spot
> a way of doing that though.

I got your intention now. The correct way to do that is to check for cpu =3D=3D 'aarch64'. Having assertion will break qemu-system-aarch64 o= n Intel Macs.

OK, looks Iike that needs= to be done at the meson.build level not Kconfig, but this seems to work:

if cpu =3D=3D 'aarch64'
=C2=A0 system_ss= .add(when: 'CONFIG_MAC_PVG_MMIO', =C2=A0if_true: [files('apple-= gfx-mmio.m'), pvg, metal])
endif

=
=C2=A0
>
>=C2=A0 =C2=A0 =C2=A0It is dangerous to unlock BQL at an arbitrary place= . Instead of
>=C2=A0 =C2=A0 =C2=A0unlocking, I suggest:
>=C2=A0 =C2=A0 =C2=A0- running [s->pgiosfc mmioReadAtOffset:offset] o= n another thread
>=C2=A0 =C2=A0 =C2=A0- using a bottom half to request operations that re= quire BQL from the
>=C2=A0 =C2=A0 =C2=A0thread running [s->pgiosfc mmioReadAtOffset:offs= et]
>=C2=A0 =C2=A0 =C2=A0- calling AIO_WAIT_WHILE() to process the bottom ha= lf and to wait for
>=C2=A0 =C2=A0 =C2=A0the completion of [s->pgiosfc mmioReadAtOffset:o= ffset]
>
>
> OK, I think I see what you mean, I'll try to rework things around = that
> pattern. Any preference on how I kick off the job on the other thread?=
> As we necessarily need to use libdispatch in a bunch of places in this=
> code anyway, I guess dispatch_async() would probably be the simplest?<= br>
Perhaps so. The QEMU way is to use a bottom half with AioContext, but
you can't simultaneously run a dispatch queue and AioContext in one thread so you have to use the dispatch queue if you need one.

>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 res =3D [s->pgiosfc mmioRea= dAtOffset:offset];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 bql_lock();
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 trace_apple_iosfc_read(offset,= res);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 return res;
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static void apple_iosfc_write(
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 void *opaque, hwaddr offset, u= int64_t val, unsigned size)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 AppleGFXVmappleState *s =3D op= aque;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 trace_apple_iosfc_write(offset= , val);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 [s->pgiosfc mmioWriteAtOffs= et:offset value:val];
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static const MemoryRegionOps apple_iosfc_ops= =3D {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 .read =3D apple_iosfc_read, >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 .write =3D apple_iosfc_write,<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 .endianness =3D DEVICE_LITTLE_= ENDIAN,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 .valid =3D {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 .min_access_size= =3D 4,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 .max_access_size= =3D 8,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 },
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 .impl =3D {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 .min_access_size= =3D 4,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 .max_access_size= =3D 8,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 },
>=C2=A0 =C2=A0 =C2=A0 > +};
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static PGIOSurfaceHostDevice
>=C2=A0 =C2=A0 =C2=A0*apple_gfx_prepare_iosurface_host_device(
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 AppleGFXVmappleState *s)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 PGIOSurfaceHostDeviceDescripto= r *iosfc_desc =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 [PGIOSurfaceHost= DeviceDescriptor new];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 PGIOSurfaceHostDevice *iosfc_h= ost_dev =3D nil;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 iosfc_desc.mapMemory =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 ^(uint64_t phys,= uint64_t len, bool ro, void **va, void
>=C2=A0 =C2=A0 =C2=A0*e, void *f) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 tr= ace_apple_iosfc_map_memory(phys, len, ro, va, e, f);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Me= moryRegion *tmp_mr;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *v= a =3D gpa2hva(&tmp_mr, phys, len, NULL);
>
>=C2=A0 =C2=A0 =C2=A0Use: dma_memory_map()
>
>
> That doesn't seem to be a precisely equivalent operation. It also = says
> in its headerdoc,
>
>=C2=A0 =C2=A0 =C2=A0Use only for reads OR writes - not for read-modify-= write operations.
>
> which I don't think we can guarantee here at all.
>
> I guess I can call it twice, once for writing and once for reading, bu= t
> given that the dma_memory_unmap operation marks the area dirty, I'= m not
> it's intended for what I understand the use case here to be: As fa= r as I
> can tell, the PV graphics device uses (some) of this memory to exchang= e
> data in a cache-coherent way between host and guest, e.g. as a lock-fr= ee
> ring buffer, using atomic operations as necessary. (This works because=
> it's a PV device: it "knows" the other end just another = CPU core (or
> even the same one) executing in a different hypervisor context.) This =
> doesn't really match "traditional" DMA patterns where th= ere's either a
> read or a write happening.

I think the story is a bit different for this VMApple variant. Probably the CPU and GPU in Apple Silicon is cache-coherent so you can map normal memory for GPU without any kind of cache maintenance.

Cache conherency of CPU and GPU in Apple Silicon is implied with Apple'= s
documentation; it says you don't need to synchronize resources for
MTLStorageModeShared, which is the default for Apple Silicon.
https://developer.apple.com/documentation/metal/resource_fundam= entals/synchronizing_a_managed_resource_in_macos

The name "IOSurface" also implies it is used not only for e.g., r= ing
buffer but also for real data.
=C2=A0
Note t= hat the PGTask map/unmap callbacks appear to have equivalent semantics, so = it's not just the surface mapping.
=C2=A0
>
> Hunting around some more for alternative APIs, there's also
> memory_region_get_ram_ptr(), but I'm not sure its restrictions app= ly
> here either.

I think you can call memory_access_is_direct() to check if the
requirement is satisfied.

It will still break dirty page tracking implemented by
dma_memory_unmap() and others, but it's broken for hvf, which does not =
implement dirty page tracking either.

<= br>

>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 re= turn (bool)true;
>
>=C2=A0 =C2=A0 =C2=A0Why cast?
>
>
> Good question. Not originally my code, so I've fixed all the insta= nces I
> could find now.

OK, it turns out the reason for this is th= at C treats 'true' as an int, which then becomes the block's in= ferred return type - and the callbacks are expecting bool-returning blocks.=

I've fixed it by explicitly specifying the bl= ock return type and removing the cast in the return statement:

iosfc_desc.unmapMemory =3D
=C2=A0 =C2=A0 =C2= =A0=C2=A0 ^bool(=E2=80=A6) {
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 =E2=80=A6
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return true;
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 };
=C2=A0
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 iosfc_desc.unmapMemory =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 ^(void *a, void = *b, void *c, void *d, void *e, void *f) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 tr= ace_apple_iosfc_unmap_memory(a, b, c, d, e, f);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 re= turn (bool)true;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 iosfc_desc.raiseInterrupt =3D = ^(uint32_t vector) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 trace_apple_iosf= c_raise_irq(vector);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 bool locked =3D = bql_locked();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!locked) { >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bq= l_lock();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 qemu_irq_p= ulse(s->irq_iosfc);> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!locked) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bq= l_unlock();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 return (bool)tru= e;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 iosfc_host_dev =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 [[PGIOSurfaceHos= tDevice alloc]
>=C2=A0 =C2=A0 =C2=A0initWithDescriptor:iosfc_desc];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 [iosfc_desc release];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 return iosfc_host_dev;
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static void apple_gfx_vmapple_realize(Device= State *dev, Error
>=C2=A0 =C2=A0 =C2=A0**errp)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 @autoreleasepool {
>
>=C2=A0 =C2=A0 =C2=A0This autoreleasepool is not used.
>
>
> It is definitely used inside the apple_gfx_common_realize() call. It&#= 39;s
> also impossible to say whether [PGDeviceDescriptor new] uses autorelea= se
> semantics internally, so it seemed safer to wrap the whole thing in an=
> outer pool.

Theoretically, It should be safe to assume the callee creates
autoreleasepool by themselves as needed in general. We have bunch of
code to call Objective-C APIs without creating autoreleasepool in the
caller. Practically, [PGDeviceDescriptor new] is likely to be
implemented with ARC, which wraps methods in autoreleasepool as necessary.<= br>

As far as I'm aware, ARC does NOT a= utomatically insert autorelease pool blocks. The reason you rarely need to = create autoreleasepool blocks in "plain" Objective-C programming = is that Cocoa/CFRunloop/libdispatch event handlers run each event in an aut= oreleasepool. So you don't need to create them explicitly when using di= spatch_async and similar, or when running code on the main thread (which ru= ns inside NSApplicationMain/CFRunloopRun/dispatch_main).

As far as I'm aware, if you don't explicitly define autorele= asepools in raw threads created with the pthreads API, any autoreleased obj= ects will leak. At least I've not found any specification/documentation= contradicting this. And most code in Qemu runs on such raw threads, so we = need to play it safe with regard to autorelease semantics.
Whether the existing Qemu Objective-C code is safe in this reg= ard I don't know for certain, but I've certainly paid attention to= this aspect when modifying ui/cocoa.m in the past, and indeed most of that= code runs on the main thread. Note also how I wrap the apple_gfx_render_ne= w_frame call in a pool when it can't be guaranteed it's running on = a dispatch queue because the command buffer inside that uses autorelease se= mantics.

Functions that uses a method that returns autorelease resources should
be wrapped with autoreleasepool instead of assuming the caller creates
autoreleasepool for them.

I'm treat= ing=C2=A0apple_gfx_common_realize as an internal API, and I don't think= expecting its callers to wrap it in an autoreleasepool block is unreasonab= le. I can certainly explicitly document this in a comment.
=C2=A0
= >=C2=A0 =C2=A0 =C2=A0 > diff --git a/hw/display/apple-gfx.m b/hw/disp= lay/apple-gfx.m
>=C2=A0 =C2=A0 =C2=A0 > new file mode 100644
>=C2=A0 =C2=A0 =C2=A0 > index 00000000000..837300f9cd4
>=C2=A0 =C2=A0 =C2=A0 > --- /dev/null
>=C2=A0 =C2=A0 =C2=A0 > +++ b/hw/display/apple-gfx.m
>=C2=A0 =C2=A0 =C2=A0 > @@ -0,0 +1,536 @@
>=C2=A0 =C2=A0 =C2=A0 > +/*
>=C2=A0 =C2=A0 =C2=A0 > + * QEMU Apple ParavirtualizedGraphics.framew= ork device
>=C2=A0 =C2=A0 =C2=A0 > + *
>=C2=A0 =C2=A0 =C2=A0 > + * Copyright =C2=A9 2023 Amazon.com, Inc. or= its affiliates. All
>=C2=A0 =C2=A0 =C2=A0Rights Reserved.
>=C2=A0 =C2=A0 =C2=A0 > + *
>=C2=A0 =C2=A0 =C2=A0 > + * This work is licensed under the terms of = the GNU GPL, version
>=C2=A0 =C2=A0 =C2=A02 or later.
>=C2=A0 =C2=A0 =C2=A0 > + * See the COPYING file in the top-level dir= ectory.
>=C2=A0 =C2=A0 =C2=A0 > + *
>=C2=A0 =C2=A0 =C2=A0 > + * ParavirtualizedGraphics.framework is a se= t of libraries that
>=C2=A0 =C2=A0 =C2=A0macOS provides
>=C2=A0 =C2=A0 =C2=A0 > + * which implements 3d graphics passthrough = to the host as well as a
>=C2=A0 =C2=A0 =C2=A0 > + * proprietary guest communication channel t= o drive it. This
>=C2=A0 =C2=A0 =C2=A0device model
>=C2=A0 =C2=A0 =C2=A0 > + * implements support to drive that library = from within QEMU.
>=C2=A0 =C2=A0 =C2=A0 > + */
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +#include "apple-gfx.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "trace.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "qemu/main-loop.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "ui/console.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "monitor/monitor.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "qapi/error.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include "migration/blocker.h"
>=C2=A0 =C2=A0 =C2=A0 > +#include <mach/mach_vm.h>
>=C2=A0 =C2=A0 =C2=A0 > +#import <ParavirtualizedGraphics/Paravirt= ualizedGraphics.h>
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static const PGDisplayCoord_t apple_gfx_mode= s[] =3D {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 { .x =3D 1440, .y =3D 1080 },<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 { .x =3D 1280, .y =3D 1024 },<= br> >=C2=A0 =C2=A0 =C2=A0 > +};
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +typedef struct PGTask_s { // Name matches fo= rward declaration in
>=C2=A0 =C2=A0 =C2=A0PG header
>
>=C2=A0 =C2=A0 =C2=A0Let's name it AppleGFXTask. It is a common prac= tice to have the same
>=C2=A0 =C2=A0 =C2=A0tag
>=C2=A0 =C2=A0 =C2=A0name and typedef in QEMU.
>
>
> This is defining a forward-declared type from framework headers which = is
> opaque from the framework's point of view. We do not get to choose= its
> struct name. The alternative is having casts wherever these objects ar= e
> being passed between our code and the framework. (See the original v1/= v2
> vmapple patch series for how messy this is.)

I got the idea. Let's not avoid the typedef then to clarify the naming =
is not under our control.

I'm not s= ure what you mean by this double negative. Are you saying, don't add ou= r own typedef for struct PGTask_s at all, just use the framework-supplied P= GTask_t where appropriate?
=C2=A0

>
>=C2=A0 =C2=A0 =C2=A0 > +static void apple_gfx_render_frame_completed= (AppleGFXState *s,
>=C2=A0 =C2=A0 =C2=A0void *vram,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0id<MTLTexture> texture)<= br> >=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 --s->pending_frames;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 assert(s->pending_frames &g= t;=3D 0);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 if (vram !=3D s->vram) { >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 /* Display mode = has changed, drop this old frame. */
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 assert(texture != =3D s->texture);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 g_free(vram); >
>=C2=A0 =C2=A0 =C2=A0This management of buffers looks a bit convoluted. = I suggest
>=C2=A0 =C2=A0 =C2=A0remembering
>=C2=A0 =C2=A0 =C2=A0the width and height instead of pointers and compar= ing them. This way
>=C2=A0 =C2=A0 =C2=A0you can free resources in set_mode().
>
>
> Yeah, I suppose that works, I can change that around.
>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 } else {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 copy_mtl_texture= _to_surface_mem(texture, vram);
>
>=C2=A0 =C2=A0 =C2=A0Writing vram outside BQL may result in tearing.
>
>
> As far as I can tell(*), QXL does the same. I couldn't find any ex= amples
> of double-buffering in any of the existing display devices, which woul= d
> be the only way to do async updates efficiently and without tearing. I= n
> any case, this solution is still vastly better than a regular VGA
> device, which suffers from very visible tearing with macOS on the gues= t
> side anyway. And in an ideal world, we'd pass through the rendered=
> texture directly to the Cocoa UI code rather than copying out only for=
> the CPU to draw it back into a window surface which is then passed to =
> the GPU for host side rendering. But I felt this patch is already very= ,
> very large, and if anyone cares, we can fix imperfections in subsequen= t
> updates.
>
> (*)The rendering code in that device is also fairly complex, so I may = be
> misreading it.

QXL always modifies the surface with BQL. The surface is modified with
qxl_blit(), which is called by qxl_render_update_area_unlocked().
qxl_render_update_area_unlocked() is called by either of
qxl_render_update() and qxl_render_update_area_bh(). Both of them are
called with BQL. The name includes "unlocked", but it means it is= called
without holding QXL-internal lock.

Most devices works entirely with BQL so they don't perform double
buffering. apple-gfx can do the same.

I= think we can safely move apple-gfx's framebuffer state management back= inside the BQL, yes. I just figured that copying dozens of megabytes of fr= amebuffer data on every frame while holding the lock was not going to help = BQL contention. Especially as PVG does not have a concept of dirty areas, s= o we must copy the whole framebuffer every time. (Unless we were to impleme= nt dirty area detection ourselves.)

Unfortunately,= implementing double-buffering would require a major rework of Qemu's w= hole surface management, console code, and probably most of the UI implemen= tations. I'm guessing the OpenGL fast-path sidesteps all of this, so re= plicating that with Metal would probably be the easier way forward. (Althou= gh doing all this graphics stuff inside the BQL generally seems like a majo= r architectural flaw; I suppose most enterprise use of Qemu does not involv= e the framebuffer, so it's not shown up in BQL contention profiling the= re. It certainly does in desktop use, although at least on macOS hosts ther= e are far worse culprits in that regard.)

>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (s->gfx_up= date_requested) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >gfx_update_requested =3D false;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dp= y_gfx_update_full(s->con);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 gr= aphic_hw_update_done(s->con);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 s->new_frame_ready =3D false;
>
>=C2=A0 =C2=A0 =C2=A0This assignment is unnecessary as s->new_frame_r= eady is always false
>=C2=A0 =C2=A0 =C2=A0when s->gfx_update_requested. If you want to mak= e sure
>=C2=A0 =C2=A0 =C2=A0s->gfx_update_requested and s->new_frame_read= y are mutually exclusive,
>=C2=A0 =C2=A0 =C2=A0use one enum value instead of having two bools.
>
>
> I'll need to refresh my memory and get back to you on this one, it= 's
> been so many months since I actively worked on this code.
>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 } else {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >new_frame_ready =3D true;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 if (s->pending_frames > = 0) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 apple_gfx_render= _new_frame(s);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static void apple_gfx_fb_update_display(void= *opaque)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 AppleGFXState *s =3D opaque; >=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 dispatch_async(s->render_qu= eue, ^{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (s->pendin= g_frames > 0) {
>
>=C2=A0 =C2=A0 =C2=A0It should check for s->new_frame_ready as
>=C2=A0 =C2=A0 =C2=A0apple_gfx_render_frame_completed() doesn't chec= k if
>=C2=A0 =C2=A0 =C2=A0s->pending_frames > 0 before calling graphic_= hw_update_done(), which is
>=C2=A0 =C2=A0 =C2=A0inconsistent.
>
>
> pending_frames is about guest-side frames that are queued to be render= ed
> by the host GPU.
> new_frame_ready being true indicates that the contents of the Qemu > console surface has been updated with new frame data since the last > gfx_update.
> gfx_update_requested indicates that gfx_update is currently awaiting a= n
> async completion (graphic_hw_update_done) but the surface has not
> received a new frame content, but the GPU is stily busy drawing one. >
> apple_gfx_render_frame_completed is scheduled exactly once per pending=
> frame, so pending_frames > 0 is an invariant there. (Hence the asse= rt.)> > I don't think there is any inconsistency here, but I'= ll double check.
> It's possible that there's an easier way to express the state = machine,
> and I'll take a look at that.

I meant that apple_gfx_render_frame_completed() does not check if the
frame is the last one currently pending. apple_gfx_fb_update_display()
ignores a new ready frame when there is a more pending frame, but
apple_gfx_render_frame_completed() unconditionally fires
graphic_hw_update_done() even if there is a more pending frame. And I
think apple_gfx_render_frame_completed() is right and
apple_gfx_fb_update_display() is wrong in such a situation.


OK, got it. And yes, I agree.
=C2=A0
>
>=C2=A0 =C2=A0 =C2=A0Checking if s->pending_frames > 0 both in app= le_gfx_fb_update_display()
>=C2=A0 =C2=A0 =C2=A0and apple_gfx_render_frame_completed() is also prob= lematic as that can
>=C2=A0 =C2=A0 =C2=A0defer graphic_hw_update_done() indefinitely if we a= re getting new
>=C2=A0 =C2=A0 =C2=A0frames
>=C2=A0 =C2=A0 =C2=A0too fast.
>
>
> I see what you mean about this part. I'll have to test it, but I g= uess
> we should reverse the priority, like this:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (s->new_frame_ready) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dpy_gfx_update_full(s-= >con);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s->new_frame_ready = =3D false;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 graphic_hw_update_done= (s->con);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 } else if (s->pending_frames >= 0) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s->gfx_update_reque= sted =3D true;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 } else {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 graphic_hw_update_done= (s->con);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>
> 1. If we already have a frame, ready to be displayed, return it immedi= ately.
> 2. If the guest has reported that it's completed a frame and the G= PU is
> currently busy rendering it, defer graphic_hw_update_done until it'= ;s done.
> 3. If the guest reports no changes to its display, indicate this back = to
> Qemu as a no-op display update graphic_hw_update_done() with no
> dpy_gfx_update* call.

Yes, that looks correct.

>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >gfx_update_requested =3D true;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 } else {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if= (s->new_frame_ready) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 dpy_gfx_update_full(s->con);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 s->new_frame_ready =3D false;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 gr= aphic_hw_update_done(s->con);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }> +=C2= =A0 =C2=A0 });
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static const GraphicHwOps apple_gfx_fb_ops = =3D {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 .gfx_update =3D apple_gfx_fb_u= pdate_display,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 .gfx_update_async =3D true, >=C2=A0 =C2=A0 =C2=A0 > +};
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static void update_cursor(AppleGFXState *s)<= br> >=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 dpy_mouse_set(s->con, s->= ;pgdisp.cursorPosition.x,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 s->pgdisp.cursorPosition.y, s->cursor_show);
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static void set_mode(AppleGFXState *s, uint3= 2_t width, uint32_t
>=C2=A0 =C2=A0 =C2=A0height)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 void *vram =3D NULL;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 DisplaySurface *surface;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 MTLTextureDescriptor *textureD= escriptor;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 id<MTLTexture> texture = =3D nil;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 __block bool no_change =3D fal= se;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 dispatch_sync(s->render_que= ue,
>
>=C2=A0 =C2=A0 =C2=A0Calling dispatch_sync() while holding BQL may resul= t in deadlock.
>
> Only if any code executed on the same dispatch queue acquires the lock=
> either directly or transitively. I believe I have ensure this is not <= br> > done on the reqnder_queue, have you found anywhere this is the case?
The documentation is not clear what threads a dispatch queue runs on. We can have a deadlock if they lock the BQL.

dispatch_sync is a synchronisation primitive (it waits for and asserts e= xclusive access to the given queue), it doesn't actually do any thread = scheduling. Work scheduled asynchronously to non-main dispatch queues will = otherwise run on libdispatch pool threads. Running blocks on dispatch queue= s will not preempt and schedule it on other threads which may or may not be= holding some locks.

So the only way this code wil= l deadlock is if any code scheduled to render_queue directly or transitivel= y acquires the BQL. None of it does, although updating the console while ho= lding the BQL rather complicates this.
=C2=A0
>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 ^{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if= (s->surface &&
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 width =3D=3D surface_width(s->surface) &&
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 height =3D=3D surface_height(s->surface)) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 no_change =3D true;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 });
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 if (no_change) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 return;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 vram =3D g_malloc0(width * hei= ght * 4);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 surface =3D qemu_create_displa= ysurface_from(width, height,
>=C2=A0 =C2=A0 =C2=A0PIXMAN_LE_a8r8g8b8,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 width * 4, vram);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 @autoreleasepool {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 textureDescripto= r =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 [M= TLTextureDescriptor
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0
>=C2=A0 =C2=A0 =C2=A0texture2DDescriptorWithPixelFormat:MTLPixelFormatBG= RA8Unorm
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0width:width
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 height:height
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0mipmapped:NO];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 textureDescripto= r.usage =3D s->pgdisp.minimumTextureUsage;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 texture =3D [s-&= gt;mtl
>=C2=A0 =C2=A0 =C2=A0newTextureWithDescriptor:textureDescriptor];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->using_managed_texture_st= orage =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 (texture.storage= Mode =3D=3D MTLStorageModeManaged);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 dispatch_sync(s->render_que= ue,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 ^{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 id= <MTLTexture> old_texture =3D nil;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 vo= id *old_vram =3D s->vram;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >vram =3D vram;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >surface =3D surface;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dp= y_gfx_replace_surface(s->con, surface);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ol= d_texture =3D s->texture;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >texture =3D texture;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 [o= ld_texture release];
>
>=C2=A0 =C2=A0 =C2=A0You can just do:
>=C2=A0 =C2=A0 =C2=A0[s->texture release];
>=C2=A0 =C2=A0 =C2=A0s->texture =3D texture;
>
>=C2=A0 =C2=A0 =C2=A0This will make s->texture dangling between the t= wo statements, but that
>=C2=A0 =C2=A0 =C2=A0don't matter since s->texture is not an atom= ic variable that can be
>=C2=A0 =C2=A0 =C2=A0safely observed from another thread anyway.
>
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if= (s->pending_frames =3D=3D 0) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 g_free(old_vram);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 });
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static void create_fb(AppleGFXState *s)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->con =3D graphic_console_= init(NULL, 0, &apple_gfx_fb_ops, s);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 set_mode(s, 1440, 1080);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->cursor_show =3D true; >=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static size_t apple_gfx_get_default_mmio_ran= ge_size(void)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 size_t mmio_range_size;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 @autoreleasepool {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 PGDeviceDescript= or *desc =3D [PGDeviceDescriptor new];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 mmio_range_size = =3D desc.mmioLength;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 [desc release];<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 return mmio_range_size;
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +void apple_gfx_common_init(Object *obj, Appl= eGFXState *s, const
>=C2=A0 =C2=A0 =C2=A0char* obj_name)
>
>=C2=A0 =C2=A0 =C2=A0This function can be merged into apple_gfx_common_r= ealize().
>
>
> Probably. I'll try it.

Upon fu= rther inspection, we need to call cocoa_enable_runloop_on_main_thread() dur= ing the init phase, not realize(). So we can't get rid of this entirely= . Is there any value in moving the other init code into _realize()?
=C2=A0
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 Error *local_err =3D NULL;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 int r;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 size_t mmio_range_size =3D
>=C2=A0 =C2=A0 =C2=A0apple_gfx_get_default_mmio_range_size();
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 trace_apple_gfx_common_init(ob= j_name, mmio_range_size);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 memory_region_init_io(&s-&= gt;iomem_gfx, obj, &apple_gfx_ops, s,
>=C2=A0 =C2=A0 =C2=A0obj_name,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 mmio_range_size);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->iomem_gfx.disable_reentr= ancy_guard =3D true;
>
>=C2=A0 =C2=A0 =C2=A0Why do you disable reentrancy guard?
>
>
> Perhaps with the proposed AIO_WAIT_WHILE based I/O scheme, this won= 9;t be
> an issue anymore, but the guard would otherwise keep dropping MMIOs > which immediately caused the PV graphics device to stop making progres= s.
> The MMIO APIs in the PVG framework are thread- and reentrancy-safe, so=
> we certainly don't need to serialise them on our side.

It's better to understand why such reentrancy happens. Reentrancy itsel= f
is often a sign of bug.

>
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 /* TODO: PVG framework support= s serialising device state:
>=C2=A0 =C2=A0 =C2=A0integrate it! */
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 if (apple_gfx_mig_blocker =3D= =3D NULL) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 error_setg(&= apple_gfx_mig_blocker,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 "Migration state blocked by apple-gfx display
>=C2=A0 =C2=A0 =C2=A0device");
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 r =3D migrate_ad= d_blocker(&apple_gfx_mig_blocker, &local_err);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (r < 0) {<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 er= ror_report_err(local_err);
>
>=C2=A0 =C2=A0 =C2=A0Please report the error to the caller of apple_gfx_= common_realize()
>=C2=A0 =C2=A0 =C2=A0instead.
>
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 =C2=A0> +}> +
>=C2=A0 =C2=A0 =C2=A0 > +static void
>=C2=A0 =C2=A0 =C2=A0apple_gfx_register_task_mapping_handlers(AppleGFXSt= ate *s,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
>=C2=A0 =C2=A0 =C2=A0 =C2=A0PGDeviceDescriptor *desc)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 desc.createTask =3D ^(uint64_t= vmSize, void * _Nullable *
>=C2=A0 =C2=A0 =C2=A0_Nonnull baseAddress) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 AppleGFXTask *ta= sk =3D apple_gfx_new_task(s, vmSize);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 *baseAddress =3D= (void*)task->address;
>
>=C2=A0 =C2=A0 =C2=A0nit: please write as (void *) instead of (void*). >
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 trace_apple_gfx_= create_task(vmSize, *baseAddress);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 return task;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +
>
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 PGDisplayDescriptor *disp_desc= =3D [PGDisplayDescriptor new];
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > + disp_desc.name <http://disp_desc.name>= ; =3D @"QEMU display";
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc.sizeInMillimeters = =3D NSMakeSize(400., 300.); /* A
>=C2=A0 =C2=A0 =C2=A020" display */
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc.queue =3D dispatch_g= et_main_queue();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc.newFrameEventHandler= =3D ^(void) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 trace_apple_gfx_= new_frame();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 dispatch_async(s= ->render_queue, ^{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /*= Drop frames if we get too far ahead. */
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if= (s->pending_frames >=3D 2)
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 return;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ++= s->pending_frames;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if= (s->pending_frames > 1) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 return;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 @a= utoreleasepool {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 apple_gfx_render_new_frame(s);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 });
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc.modeChangeHandler = =3D ^(PGDisplayCoord_t sizeInPixels,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 OSType pixelFormat) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 trace_apple_gfx_= mode_change(sizeInPixels.x, sizeInPixels.y);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 set_mode(s, size= InPixels.x, sizeInPixels.y);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc.cursorGlyphHandler = =3D ^(NSBitmapImageRep *glyph,
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0PGDisplayCoord_t hotSpot) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 uint32_t bpp =3D= glyph.bitsPerPixel;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 size_t width =3D= glyph.pixelsWide;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 size_t height = =3D glyph.pixelsHigh;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 size_t padding_b= ytes_per_row =3D glyph.bytesPerRow - width
>=C2=A0 =C2=A0 =C2=A0* 4;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 const uint8_t* p= x_data =3D glyph.bitmapData;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 trace_apple_gfx_= cursor_set(bpp, width, height);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (s->cursor= ) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cu= rsor_unref(s->cursor);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >cursor =3D NULL;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (bpp =3D=3D 3= 2) { /* Shouldn't be anything else, but just
>=C2=A0 =C2=A0 =C2=A0to be safe...*/
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >cursor =3D cursor_alloc(width, height);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >cursor->hot_x =3D hotSpot.x;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s-= >cursor->hot_y =3D hotSpot.y;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ui= nt32_t *dest_px =3D s->cursor->data;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 fo= r (size_t y =3D 0; y < height; ++y) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 for (size_t x =3D 0; x < width; ++x) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* NSBitmapImageRep's red & blue channe= ls
>=C2=A0 =C2=A0 =C2=A0are swapped
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* compared to QEMUCursor's. */
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 *dest_px =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (px_data[0] << 16u) |
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (px_data[1] <<=C2=A0 8u) |<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (px_data[2] <<=C2=A0 0u) |<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (px_data[3] << 24u);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 ++dest_px;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 px_data +=3D 4;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 px_data +=3D padding_bytes_per_row;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<= br> >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dp= y_cursor_define(s->con, s->cursor);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 up= date_cursor(s);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc.cursorShowHandler = =3D ^(BOOL show) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 trace_apple_gfx_= cursor_show(show);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 s->cursor_sho= w =3D show;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 update_cursor(s)= ;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc.cursorMoveHandler = =3D ^(void) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 trace_apple_gfx_= cursor_move();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 update_cursor(s)= ;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 };
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 return disp_desc;
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static NSArray<PGDisplayMode*>*
>=C2=A0 =C2=A0 =C2=A0apple_gfx_prepare_display_mode_array(void)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 PGDisplayMode *modes[ARRAY_SIZ= E(apple_gfx_modes)];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 NSArray<PGDisplayMode*>*= mode_array =3D nil;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 int i;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 for (i =3D 0; i < ARRAY_SIZ= E(apple_gfx_modes); i++) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 modes[i] =3D
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 [[= PGDisplayMode alloc]
>=C2=A0 =C2=A0 =C2=A0initWithSizeInPixels:apple_gfx_modes[i] refreshRate= InHz:60.];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 mode_array =3D [NSArray arrayW= ithObjects:modes
>=C2=A0 =C2=A0 =C2=A0count:ARRAY_SIZE(apple_gfx_modes)];
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 for (i =3D 0; i < ARRAY_SIZ= E(apple_gfx_modes); i++) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 [modes[i] releas= e];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 modes[i] =3D nil= ;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 return mode_array;
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +static id<MTLDevice> copy_suitable_met= al_device(void)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 id<MTLDevice> dev =3D ni= l;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 NSArray<id<MTLDevice>= > *devs =3D MTLCopyAllDevices();
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 /* Prefer a unified memory GPU= . Failing that, pick a non-
>=C2=A0 =C2=A0 =C2=A0removable GPU. */
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 for (size_t i =3D 0; i < de= vs.count; ++i) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (devs[i].hasU= nifiedMemory) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 de= v =3D devs[i];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 br= eak;
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!devs[i].rem= ovable) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 de= v =3D devs[i];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 if (dev !=3D nil) {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 [dev retain]; >=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 } else {
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 dev =3D MTLCreat= eSystemDefaultDevice();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 }
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 [devs release];
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 return dev;
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +void apple_gfx_common_realize(AppleGFXState = *s,
>=C2=A0 =C2=A0 =C2=A0PGDeviceDescriptor *desc)
>=C2=A0 =C2=A0 =C2=A0 > +{
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 PGDisplayDescriptor *disp_desc= =3D nil;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 QTAILQ_INIT(&s->tasks);=
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->render_queue =3D dispatc= h_queue_create("apple-gfx.render",
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 DISPATCH_QUEUE_SERIAL);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->mtl =3D copy_suitable_me= tal_device();
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->mtl_queue =3D [s->mtl= newCommandQueue];
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 desc.device =3D s->mtl;
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 apple_gfx_register_task_mappin= g_handlers(s, desc);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->pgdev =3D PGNewDeviceWit= hDescriptor(desc);
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 disp_desc =3D apple_gfx_prepar= e_display_handlers(s);
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->pgdisp =3D [s->pgdev = newDisplayWithDescriptor:disp_desc
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 port:0
>=C2=A0 =C2=A0 =C2=A0serialNum:1234];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 [disp_desc release];
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 s->pgdisp.modeList =3D appl= e_gfx_prepare_display_mode_array();
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 =C2=A0 create_fb(s);
>=C2=A0 =C2=A0 =C2=A0 > +}
>=C2=A0 =C2=A0 =C2=A0 > diff --git a/hw/display/meson.build b/hw/disp= lay/meson.build
>=C2=A0 =C2=A0 =C2=A0 > index 7db05eace97..70d855749c0 100644
>=C2=A0 =C2=A0 =C2=A0 > --- a/hw/display/meson.build
>=C2=A0 =C2=A0 =C2=A0 > +++ b/hw/display/meson.build
>=C2=A0 =C2=A0 =C2=A0 > @@ -65,6 +65,8 @@ system_ss.add(when: 'CO= NFIG_ARTIST', if_true:
>=C2=A0 =C2=A0 =C2=A0files('artist.c'))
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0system_ss.add(when: 'CONFIG_A= TI_VGA', if_true: [files('ati.c',
>=C2=A0 =C2=A0 =C2=A0'ati_2d.c', 'ati_dbg.c'), pixman])<= br> >=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 > +system_ss.add(when: 'CONFIG_MAC_PVG'= ,=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if_true:
>=C2=A0 =C2=A0 =C2=A0[files('apple-gfx.m'), pvg, metal])
>=C2=A0 =C2=A0 =C2=A0 > +system_ss.add(when: 'CONFIG_MAC_PVG_VMAP= PLE', if_true:
>=C2=A0 =C2=A0 =C2=A0[files('apple-gfx-vmapple.m'), pvg, metal])=
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0if config_all_devices.has_key(= 9;CONFIG_VIRTIO_GPU')
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0 =C2=A0virtio_gpu_ss =3D ss.sourc= e_set()
>=C2=A0 =C2=A0 =C2=A0 > diff --git a/hw/display/trace-events b/hw/dis= play/trace-events
>=C2=A0 =C2=A0 =C2=A0 > index 781f8a33203..1809a358e36 100644
>=C2=A0 =C2=A0 =C2=A0 > --- a/hw/display/trace-events
>=C2=A0 =C2=A0 =C2=A0 > +++ b/hw/display/trace-events
>=C2=A0 =C2=A0 =C2=A0 > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned = dest_width)
>=C2=A0 =C2=A0 =C2=A0"dest_width : %u"
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0dm163_leds(int led, uint32_t valu= e) "led %d: 0x%x"
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0dm163_channels(int channel, uint8= _t value) "channel %d: 0x%x"
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0dm163_refresh_rate(uint32_t rr) &= quot;refresh rate %d"
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +# apple-gfx.m
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_read(uint64_t offset, uint64_t res= )
>=C2=A0 =C2=A0 =C2=A0"offset=3D0x%"PRIx64" res=3D0x%"= ;PRIx64
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_write(uint64_t offset, uint64_t va= l)
>=C2=A0 =C2=A0 =C2=A0"offset=3D0x%"PRIx64" val=3D0x%"= ;PRIx64
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_create_task(uint32_t vm_size, void= *va) "vm_size=3D0x%x
>=C2=A0 =C2=A0 =C2=A0base_addr=3D%p"
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_destroy_task(void *task) "tas= k=3D%p"
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_map_memory(void *task, uint32_t ra= nge_count, uint64_t
>=C2=A0 =C2=A0 =C2=A0virtual_offset, uint32_t read_only) "task=3D%p= range_count=3D0x%x
>=C2=A0 =C2=A0 =C2=A0virtual_offset=3D0x%"PRIx64" read_only=3D= %d"
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_map_memory_range(uint32_t i, uint6= 4_t phys_addr,
>=C2=A0 =C2=A0 =C2=A0uint64_t phys_len) "[%d] phys_addr=3D0x%"= PRIx64" phys_len=3D0x%"PRIx64
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_remap(uint64_t retval, uint64_t so= urce, uint64_t
>=C2=A0 =C2=A0 =C2=A0target) "retval=3D%"PRId64" source= =3D0x%"PRIx64" target=3D0x%"PRIx64
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_unmap_memory(void *task, uint64_t = virtual_offset,
>=C2=A0 =C2=A0 =C2=A0uint64_t length) "task=3D%p virtual_offset=3D0= x%"PRIx64" length=3D0x%"PRIx64
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_read_memory(uint64_t phys_address,= uint64_t length,
>=C2=A0 =C2=A0 =C2=A0void *dst) "phys_addr=3D0x%"PRIx64" = length=3D0x%"PRIx64" dest=3D%p"
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_raise_irq(uint32_t vector) "v= ector=3D0x%x"
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_new_frame(void) ""
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_mode_change(uint64_t x, uint64_t y= ) "x=3D%"PRId64"
>=C2=A0 =C2=A0 =C2=A0y=3D%"PRId64
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_cursor_set(uint32_t bpp, uint64_t = width, uint64_t
>=C2=A0 =C2=A0 =C2=A0height) "bpp=3D%d width=3D%"PRId64" = height=3D0x%"PRId64
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_cursor_show(uint32_t show) "s= how=3D%d"
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_cursor_move(void) ""
>=C2=A0 =C2=A0 =C2=A0 > +apple_gfx_common_init(const char *device_nam= e, size_t mmio_size)
>=C2=A0 =C2=A0 =C2=A0"device: %s; MMIO size: %zu bytes"
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > +# apple-gfx-vmapple.m
>=C2=A0 =C2=A0 =C2=A0 > +apple_iosfc_read(uint64_t offset, uint64_t r= es)
>=C2=A0 =C2=A0 =C2=A0"offset=3D0x%"PRIx64" res=3D0x%"= ;PRIx64
>=C2=A0 =C2=A0 =C2=A0 > +apple_iosfc_write(uint64_t offset, uint64_t = val)
>=C2=A0 =C2=A0 =C2=A0"offset=3D0x%"PRIx64" val=3D0x%"= ;PRIx64
>=C2=A0 =C2=A0 =C2=A0 > +apple_iosfc_map_memory(uint64_t phys, uint64= _t len, uint32_t ro,
>=C2=A0 =C2=A0 =C2=A0void *va, void *e, void *f) "phys=3D0x%"P= RIx64" len=3D0x%"PRIx64" ro=3D%d
>=C2=A0 =C2=A0 =C2=A0va=3D%p e=3D%p f=3D%p"
>=C2=A0 =C2=A0 =C2=A0 > +apple_iosfc_unmap_memory(void *a, void *b, v= oid *c, void *d,
>=C2=A0 =C2=A0 =C2=A0void *e, void *f) "a=3D%p b=3D%p c=3D%p d=3D%p= e=3D%p f=3D%p"
>=C2=A0 =C2=A0 =C2=A0 > +apple_iosfc_raise_irq(uint32_t vector) "= ;vector=3D0x%x"
>=C2=A0 =C2=A0 =C2=A0 > +
>=C2=A0 =C2=A0 =C2=A0 > diff --git a/meson.build b/meson.build
>=C2=A0 =C2=A0 =C2=A0 > index 10464466ff3..f09df3f09d5 100644
>=C2=A0 =C2=A0 =C2=A0 > --- a/meson.build
>=C2=A0 =C2=A0 =C2=A0 > +++ b/meson.build
>=C2=A0 =C2=A0 =C2=A0 > @@ -741,6 +741,8 @@ socket =3D []
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0version_res =3D []
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0coref =3D []
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0iokit =3D []
>=C2=A0 =C2=A0 =C2=A0 > +pvg =3D []
>=C2=A0 =C2=A0 =C2=A0 > +metal =3D []
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0emulator_link_args =3D []
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0midl =3D not_found
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0widl =3D not_found
>=C2=A0 =C2=A0 =C2=A0 > @@ -762,6 +764,8 @@ elif host_os =3D=3D '= darwin'
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0 =C2=A0coref =3D dependency('= appleframeworks', modules: 'CoreFoundation')
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0 =C2=A0iokit =3D dependency('= appleframeworks', modules: 'IOKit',
>=C2=A0 =C2=A0 =C2=A0required: false)
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0 =C2=A0host_dsosuf =3D '.dyli= b'
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 pvg =3D dependency('appleframewor= ks', modules:
>=C2=A0 =C2=A0 =C2=A0'ParavirtualizedGraphics')
>=C2=A0 =C2=A0 =C2=A0 > +=C2=A0 metal =3D dependency('appleframew= orks', modules: 'Metal')
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0elif host_os =3D=3D 'sunos= 9;
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0 =C2=A0socket =3D [cc.find_librar= y('socket'),
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0cc.find_library('nsl'),
>

--00000000000032f0df0623cc857b--