Re: [PATCH v3 01/14] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: Phil Dennis-Jordan <phil@philjordan.eu>
Cc: qemu-devel@nongnu.org, agraf@csgraf.de, peter.maydell@linaro.org,
	pbonzini@redhat.com, rad@semihalf.com, quic_llindhol@quicinc.com,
	marcin.juszkiewicz@linaro.org, stefanha@redhat.com,
	mst@redhat.com, slp@redhat.com, richard.henderson@linaro.org,
	eduardo@habkost.net, marcel.apfelbaum@gmail.com,
	gaosong@loongson.cn, jiaxun.yang@flygoat.com,
	chenhuacai@kernel.org, kwolf@redhat.com, hreitz@redhat.com,
	philmd@linaro.org, shorne@gmail.com, palmer@dabbelt.com,
	alistair.francis@wdc.com, bmeng.cn@gmail.com,
	liwei1518@gmail.com, dbarboza@ventanamicro.com,
	zhiwei_liu@linux.alibaba.com, jcmvbkbc@gmail.com,
	marcandre.lureau@redhat.com, berrange@redhat.com,
	qemu-arm@nongnu.org, qemu-block@nongnu.org,
	qemu-riscv@nongnu.org, Alexander Graf <graf@amazon.com>
Subject: Re: [PATCH v3 01/14] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
Date: Thu, 3 Oct 2024 16:09:19 +0900	[thread overview]
Message-ID: <ae0f5ef7-c714-40e1-be5a-1aa12df518c8@daynix.com> (raw)
In-Reply-To: <CAAibmn1+Z0zcB-vQ3ZbrY0o7wtev+YtV1EOCH-4s=PfQg_Pd8A@mail.gmail.com>

On 2024/10/02 22:33, Phil Dennis-Jordan wrote:
> Hi,
> 
> Thanks for taking a close look at this.
> 
> There are some further comments, explanations, and also a few questions 
> inline below. Where I've not commented, I'll just go ahead and make the 
> suggested change for v4.
> 
> On Tue, 1 Oct 2024 at 11:40, Akihiko Odaki <akihiko.odaki@daynix.com 
> <mailto:akihiko.odaki@daynix.com>> wrote:
> 
> 
>      > This patch implements a QEMU device that drives PVG for the VMApple
>      > variant of it.
> 
>     I think it is better to name it MMIO variant instead of VMApple. There
>     is nothing specific to VMApple in: hw/display/apple-gfx-vmapple.m
> 
> 
> I mean, I don't see it being useful for anything whatsoever outside the 
> vmapple machine type… But I guess there's little harm in renaming it.

I don't know but perhaps we may get vmapple2 or something in the future.

> 
> 
>      > +#include "apple-gfx.h"
>      > +#include "monitor/monitor.h"
>      > +#include "hw/sysbus.h"
>      > +#include "hw/irq.h"
>      > +#include "trace.h"
>      > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
>      > +
>      > +_Static_assert(__aarch64__, "");
> 
>     I don't think this assertion is worthwhile. This assertion will trigger
>     if you accidentally remove depends on AARCH64 from Kconfig, but I don't
>     think such code change happens by accident, and there is no reason to
>     believe that this assertion won't be removed in such a case.
> 
> 
> As far as I'm aware the Kconfig AARCH64 dependency is for the /target/ 
> architecture, not the /host/ architecture? The static assert checks for 
> the latter. The PGIOSurfaceHostDeviceDescriptor type isn't available at 
> all on non-aarch64 macOS hosts. I've not had any luck with using this 
> variant of the device on x86-64 hosts simply by disabling any surface 
> mapper code.
> 
> Incidentally, if you know of a way to depend on a specific /host/ 
> architecture in the Kconfig, that would be even better. I couldn't spot 
> a way of doing that though.

I got your intention now. The correct way to do that is to check for cpu 
== 'aarch64'. Having assertion will break qemu-system-aarch64 on Intel Macs.

> 
>      > +
>      > +/*
>      > + * ParavirtualizedGraphics.Framework only ships header files for
>     the PCI
>      > + * variant which does not include IOSFC descriptors and host
>     devices. We add
>      > + * their definitions here so that we can also work with the ARM
>     version.
>      > + */
>      > +typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector);
>      > +typedef bool(^IOSFCUnmapMemory)(
>      > +    void *a, void *b, void *c, void *d, void *e, void *f);
> 
>     Omit dummy parameter names.
> 
> 
>      > +@end
>      > +
>      > +typedef struct AppleGFXVmappleState {
>      > +    SysBusDevice parent_obj;
>      > +
>      > +    AppleGFXState common;
>      > +
>      > +    qemu_irq irq_gfx;
>      > +    qemu_irq irq_iosfc;
>      > +    MemoryRegion iomem_iosfc;
>      > +    PGIOSurfaceHostDevice *pgiosfc;
>      > +} AppleGFXVmappleState;
>      > +
>      > +OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXVmappleState, APPLE_GFX_VMAPPLE)
>      > +
>      > +
>      > +static uint64_t apple_iosfc_read(void *opaque, hwaddr offset,
>     unsigned size)
>      > +{
>      > +    AppleGFXVmappleState *s = opaque;
>      > +    uint64_t res = 0;
>      > +
>      > +    bql_unlock();
> 
>     It is dangerous to unlock BQL at an arbitrary place. Instead of
>     unlocking, I suggest:
>     - running [s->pgiosfc mmioReadAtOffset:offset] on another thread
>     - using a bottom half to request operations that require BQL from the
>     thread running [s->pgiosfc mmioReadAtOffset:offset]
>     - calling AIO_WAIT_WHILE() to process the bottom half and to wait for
>     the completion of [s->pgiosfc mmioReadAtOffset:offset]
> 
> 
> OK, I think I see what you mean, I'll try to rework things around that 
> pattern. Any preference on how I kick off the job on the other thread? 
> As we necessarily need to use libdispatch in a bunch of places in this 
> code anyway, I guess dispatch_async() would probably be the simplest?

Perhaps so. The QEMU way is to use a bottom half with AioContext, but 
you can't simultaneously run a dispatch queue and AioContext in one 
thread so you have to use the dispatch queue if you need one.

> 
>      > +    res = [s->pgiosfc mmioReadAtOffset:offset];
>      > +    bql_lock();
>      > +
>      > +    trace_apple_iosfc_read(offset, res);
>      > +
>      > +    return res;
>      > +}
>      > +
>      > +static void apple_iosfc_write(
>      > +    void *opaque, hwaddr offset, uint64_t val, unsigned size)
>      > +{
>      > +    AppleGFXVmappleState *s = opaque;
>      > +
>      > +    trace_apple_iosfc_write(offset, val);
>      > +
>      > +    [s->pgiosfc mmioWriteAtOffset:offset value:val];
>      > +}
>      > +
>      > +static const MemoryRegionOps apple_iosfc_ops = {
>      > +    .read = apple_iosfc_read,
>      > +    .write = apple_iosfc_write,
>      > +    .endianness = DEVICE_LITTLE_ENDIAN,
>      > +    .valid = {
>      > +        .min_access_size = 4,
>      > +        .max_access_size = 8,
>      > +    },
>      > +    .impl = {
>      > +        .min_access_size = 4,
>      > +        .max_access_size = 8,
>      > +    },
>      > +};
>      > +
>      > +static PGIOSurfaceHostDevice
>     *apple_gfx_prepare_iosurface_host_device(
>      > +    AppleGFXVmappleState *s)
>      > +{
>      > +    PGIOSurfaceHostDeviceDescriptor *iosfc_desc =
>      > +        [PGIOSurfaceHostDeviceDescriptor new];
>      > +    PGIOSurfaceHostDevice *iosfc_host_dev = nil;
>      > +
>      > +    iosfc_desc.mapMemory =
>      > +        ^(uint64_t phys, uint64_t len, bool ro, void **va, void
>     *e, void *f) {
>      > +            trace_apple_iosfc_map_memory(phys, len, ro, va, e, f);
>      > +            MemoryRegion *tmp_mr;
>      > +            *va = gpa2hva(&tmp_mr, phys, len, NULL);
> 
>     Use: dma_memory_map()
> 
> 
> That doesn't seem to be a precisely equivalent operation. It also says 
> in its headerdoc,
> 
>     Use only for reads OR writes - not for read-modify-write operations.
> 
> which I don't think we can guarantee here at all.
> 
> I guess I can call it twice, once for writing and once for reading, but 
> given that the dma_memory_unmap operation marks the area dirty, I'm not 
> it's intended for what I understand the use case here to be: As far as I 
> can tell, the PV graphics device uses (some) of this memory to exchange 
> data in a cache-coherent way between host and guest, e.g. as a lock-free 
> ring buffer, using atomic operations as necessary. (This works because 
> it's a PV device: it "knows" the other end just another CPU core (or 
> even the same one) executing in a different hypervisor context.) This 
> doesn't really match "traditional" DMA patterns where there's either a 
> read or a write happening.

I think the story is a bit different for this VMApple variant. Probably 
the CPU and GPU in Apple Silicon is cache-coherent so you can map normal 
memory for GPU without any kind of cache maintenance.

Cache conherency of CPU and GPU in Apple Silicon is implied with Apple's 
documentation; it says you don't need to synchronize resources for 
MTLStorageModeShared, which is the default for Apple Silicon.
https://developer.apple.com/documentation/metal/resource_fundamentals/synchronizing_a_managed_resource_in_macos

The name "IOSurface" also implies it is used not only for e.g., ring 
buffer but also for real data.

> 
> Hunting around some more for alternative APIs, there's also 
> memory_region_get_ram_ptr(), but I'm not sure its restrictions apply 
> here either.

I think you can call memory_access_is_direct() to check if the 
requirement is satisfied.

It will still break dirty page tracking implemented by 
dma_memory_unmap() and others, but it's broken for hvf, which does not 
implement dirty page tracking either.

> 
>      > +            return (bool)true;
> 
>     Why cast?
> 
> 
> Good question. Not originally my code, so I've fixed all the instances I 
> could find now.
> 
>      > +        };
>      > +
>      > +    iosfc_desc.unmapMemory =
>      > +        ^(void *a, void *b, void *c, void *d, void *e, void *f) {
>      > +            trace_apple_iosfc_unmap_memory(a, b, c, d, e, f);
>      > +            return (bool)true;
>      > +        };
>      > +
>      > +    iosfc_desc.raiseInterrupt = ^(uint32_t vector) {
>      > +        trace_apple_iosfc_raise_irq(vector);
>      > +        bool locked = bql_locked();
>      > +        if (!locked) {
>      > +            bql_lock();
>      > +        }
>       > +        qemu_irq_pulse(s->irq_iosfc);> +        if (!locked) {
>      > +            bql_unlock();
>      > +        }
>      > +        return (bool)true;
>      > +    };
>      > +
>      > +    iosfc_host_dev =
>      > +        [[PGIOSurfaceHostDevice alloc]
>     initWithDescriptor:iosfc_desc];
>      > +    [iosfc_desc release];
>      > +    return iosfc_host_dev;
>      > +}
>      > +
>      > +static void apple_gfx_vmapple_realize(DeviceState *dev, Error
>     **errp)
>      > +{
>      > +    @autoreleasepool {
> 
>     This autoreleasepool is not used.
> 
> 
> It is definitely used inside the apple_gfx_common_realize() call. It's 
> also impossible to say whether [PGDeviceDescriptor new] uses autorelease 
> semantics internally, so it seemed safer to wrap the whole thing in an 
> outer pool.

Theoretically, It should be safe to assume the callee creates 
autoreleasepool by themselves as needed in general. We have bunch of 
code to call Objective-C APIs without creating autoreleasepool in the 
caller. Practically, [PGDeviceDescriptor new] is likely to be 
implemented with ARC, which wraps methods in autoreleasepool as necessary.

Functions that uses a method that returns autorelease resources should 
be wrapped with autoreleasepool instead of assuming the caller creates 
autoreleasepool for them.

> 
>      > +        AppleGFXVmappleState *s = APPLE_GFX_VMAPPLE(dev);
>      > +
>      > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
>      > +        desc.usingIOSurfaceMapper = true;
>      > +        desc.raiseInterrupt = ^(uint32_t vector) {
>      > +            bool locked;
>      > +
>      > +            trace_apple_gfx_raise_irq(vector);
>      > +            locked = bql_locked();
>      > +            if (!locked) {
>      > +                bql_lock();
>      > +            }
>      > +            qemu_irq_pulse(s->irq_gfx);
>      > +            if (!locked) {
>      > +                bql_unlock();
>      > +            }
>      > +        };
>      > +
>      > +        s->pgiosfc = apple_gfx_prepare_iosurface_host_device(s);
>      > +
>      > +        apple_gfx_common_realize(&s->common, desc);
>      > +        [desc release];
>      > +        desc = nil;
>      > +    }
>      > +}
>      > +
> 
>      > +++ b/hw/display/apple-gfx.h
>      > @@ -0,0 +1,57 @@
>      > +#ifndef QEMU_APPLE_GFX_H
>      > +#define QEMU_APPLE_GFX_H
>      > +
>      > +#define TYPE_APPLE_GFX_VMAPPLE      "apple-gfx-vmapple"
>      > +#define TYPE_APPLE_GFX_PCI          "apple-gfx-pci"
>      > +
>      > +#include "qemu/typedefs.h"
>      > +
>      > +typedef struct AppleGFXState AppleGFXState;
>      > +
>      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const
>     char* obj_name);
>      > +
>      > +#ifdef __OBJC__
> 
>     This ifdef is unnecessary.
> 
> Ah indeed; at one point, vmapple.c was #including this file, but that's 
> no longer necessary.
> 
> 
>      > diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
>      > new file mode 100644
>      > index 00000000000..837300f9cd4
>      > --- /dev/null
>      > +++ b/hw/display/apple-gfx.m
>      > @@ -0,0 +1,536 @@
>      > +/*
>      > + * QEMU Apple ParavirtualizedGraphics.framework device
>      > + *
>      > + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All
>     Rights Reserved.
>      > + *
>      > + * This work is licensed under the terms of the GNU GPL, version
>     2 or later.
>      > + * See the COPYING file in the top-level directory.
>      > + *
>      > + * ParavirtualizedGraphics.framework is a set of libraries that
>     macOS provides
>      > + * which implements 3d graphics passthrough to the host as well as a
>      > + * proprietary guest communication channel to drive it. This
>     device model
>      > + * implements support to drive that library from within QEMU.
>      > + */
>      > +
>      > +#include "apple-gfx.h"
>      > +#include "trace.h"
>      > +#include "qemu/main-loop.h"
>      > +#include "ui/console.h"
>      > +#include "monitor/monitor.h"
>      > +#include "qapi/error.h"
>      > +#include "migration/blocker.h"
>      > +#include <mach/mach_vm.h>
>      > +#import <ParavirtualizedGraphics/ParavirtualizedGraphics.h>
>      > +
>      > +static const PGDisplayCoord_t apple_gfx_modes[] = {
>      > +    { .x = 1440, .y = 1080 },
>      > +    { .x = 1280, .y = 1024 },
>      > +};
>      > +
>      > +typedef struct PGTask_s { // Name matches forward declaration in
>     PG header
> 
>     Let's name it AppleGFXTask. It is a common practice to have the same
>     tag
>     name and typedef in QEMU.
> 
> 
> This is defining a forward-declared type from framework headers which is 
> opaque from the framework's point of view. We do not get to choose its 
> struct name. The alternative is having casts wherever these objects are 
> being passed between our code and the framework. (See the original v1/v2 
> vmapple patch series for how messy this is.)

I got the idea. Let's not avoid the typedef then to clarify the naming 
is not under our control.

> 
>      > +static void apple_gfx_render_new_frame(AppleGFXState *s)
>      > +{
>      > +    BOOL r;
>      > +    void *vram = s->vram;
>      > +    uint32_t width = surface_width(s->surface);
>      > +    uint32_t height = surface_height(s->surface);
>      > +    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
>      > +    id<MTLCommandBuffer> command_buffer = [s->mtl_queue
>     commandBuffer];
>      > +    id<MTLTexture> texture = s->texture;
>      > +    r = [s->pgdisp encodeCurrentFrameToCommandBuffer:command_buffer
>      > +                                             texture:texture
>      > +                                              region:region];
>      > +    if (!r) {
>      > +        return;
>      > +    }
>      > +    [texture retain];
>      > +    if (s->using_managed_texture_storage) {
>      > +        /* "Managed" textures exist in both VRAM and RAM and
>     must be synced. */
>      > +        id<MTLBlitCommandEncoder> blit = [command_buffer
>     blitCommandEncoder];
>      > +        [blit synchronizeResource:texture];
>      > +        [blit endEncoding];
>      > +    }
>      > +    [command_buffer retain];
> 
>     I don't think this call of retain is necessary. The completion handler
>     gets the command buffer via parameter, which implies the command buffer
>     is automatically retained until the completion handler finishes.
> 
> 
> I couldn't find any hard guarantees for this documented anywhere, hence 
> erring on the side of caution. I think the harm of an additional retain/ 
> release pair here is quite minor.

Apple's sample code doesn't retain for such a case so we can certainly 
omit it:
https://developer.apple.com/documentation/metal/indirect_command_encoding/encoding_indirect_command_buffers_on_the_cpu

It's better to avoid additional resource management when the underlying 
API automatically does things correct. There is a hazard of forgetting 
release anytime adding one call of retain.

> 
> 
>      > +static void apple_gfx_render_frame_completed(AppleGFXState *s,
>     void *vram,
>      > +                                             id<MTLTexture> texture)
>      > +{
>      > +    --s->pending_frames;
>      > +    assert(s->pending_frames >= 0);
>      > +
>      > +    if (vram != s->vram) {
>      > +        /* Display mode has changed, drop this old frame. */
>      > +        assert(texture != s->texture);
>      > +        g_free(vram);
> 
>     This management of buffers looks a bit convoluted. I suggest
>     remembering
>     the width and height instead of pointers and comparing them. This way
>     you can free resources in set_mode().
> 
> 
> Yeah, I suppose that works, I can change that around.
> 
>      > +    } else {
>      > +        copy_mtl_texture_to_surface_mem(texture, vram);
> 
>     Writing vram outside BQL may result in tearing.
> 
> 
> As far as I can tell(*), QXL does the same. I couldn't find any examples 
> of double-buffering in any of the existing display devices, which would 
> be the only way to do async updates efficiently and without tearing. In 
> any case, this solution is still vastly better than a regular VGA 
> device, which suffers from very visible tearing with macOS on the guest 
> side anyway. And in an ideal world, we'd pass through the rendered 
> texture directly to the Cocoa UI code rather than copying out only for 
> the CPU to draw it back into a window surface which is then passed to 
> the GPU for host side rendering. But I felt this patch is already very, 
> very large, and if anyone cares, we can fix imperfections in subsequent 
> updates.
> 
> (*)The rendering code in that device is also fairly complex, so I may be 
> misreading it.

QXL always modifies the surface with BQL. The surface is modified with 
qxl_blit(), which is called by qxl_render_update_area_unlocked(). 
qxl_render_update_area_unlocked() is called by either of 
qxl_render_update() and qxl_render_update_area_bh(). Both of them are 
called with BQL. The name includes "unlocked", but it means it is called 
without holding QXL-internal lock.

Most devices works entirely with BQL so they don't perform double 
buffering. apple-gfx can do the same.

> 
>      > +        if (s->gfx_update_requested) {
>      > +            s->gfx_update_requested = false;
>      > +            dpy_gfx_update_full(s->con);
>      > +            graphic_hw_update_done(s->con);
>       > +            s->new_frame_ready = false;
> 
>     This assignment is unnecessary as s->new_frame_ready is always false
>     when s->gfx_update_requested. If you want to make sure
>     s->gfx_update_requested and s->new_frame_ready are mutually exclusive,
>     use one enum value instead of having two bools.
> 
> 
> I'll need to refresh my memory and get back to you on this one, it's 
> been so many months since I actively worked on this code.
> 
>      > +        } else {
>      > +            s->new_frame_ready = true;
>      > +        }
>      > +    }
>      > +    if (s->pending_frames > 0) {
>      > +        apple_gfx_render_new_frame(s);
>      > +    }
>      > +}
>      > +
>      > +static void apple_gfx_fb_update_display(void *opaque)
>      > +{
>      > +    AppleGFXState *s = opaque;
>      > +
>      > +    dispatch_async(s->render_queue, ^{
>      > +        if (s->pending_frames > 0) {
> 
>     It should check for s->new_frame_ready as
>     apple_gfx_render_frame_completed() doesn't check if
>     s->pending_frames > 0 before calling graphic_hw_update_done(), which is
>     inconsistent.
> 
> 
> pending_frames is about guest-side frames that are queued to be rendered 
> by the host GPU.
> new_frame_ready being true indicates that the contents of the Qemu 
> console surface has been updated with new frame data since the last 
> gfx_update.
> gfx_update_requested indicates that gfx_update is currently awaiting an 
> async completion (graphic_hw_update_done) but the surface has not 
> received a new frame content, but the GPU is stily busy drawing one.
> 
> apple_gfx_render_frame_completed is scheduled exactly once per pending 
> frame, so pending_frames > 0 is an invariant there. (Hence the assert.)> > I don't think there is any inconsistency here, but I'll double check.
> It's possible that there's an easier way to express the state machine, 
> and I'll take a look at that.

I meant that apple_gfx_render_frame_completed() does not check if the 
frame is the last one currently pending. apple_gfx_fb_update_display() 
ignores a new ready frame when there is a more pending frame, but 
apple_gfx_render_frame_completed() unconditionally fires 
graphic_hw_update_done() even if there is a more pending frame. And I 
think apple_gfx_render_frame_completed() is right and 
apple_gfx_fb_update_display() is wrong in such a situation.

> 
>     Checking if s->pending_frames > 0 both in apple_gfx_fb_update_display()
>     and apple_gfx_render_frame_completed() is also problematic as that can
>     defer graphic_hw_update_done() indefinitely if we are getting new
>     frames
>     too fast.
> 
> 
> I see what you mean about this part. I'll have to test it, but I guess 
> we should reverse the priority, like this:
> 
>          if (s->new_frame_ready) {
>              dpy_gfx_update_full(s->con);
>              s->new_frame_ready = false;
>              graphic_hw_update_done(s->con);
>          } else if (s->pending_frames > 0) {
>              s->gfx_update_requested = true;
>          } else {
>              graphic_hw_update_done(s->con);
>          }
> 
> 1. If we already have a frame, ready to be displayed, return it immediately.
> 2. If the guest has reported that it's completed a frame and the GPU is 
> currently busy rendering it, defer graphic_hw_update_done until it's done.
> 3. If the guest reports no changes to its display, indicate this back to 
> Qemu as a no-op display update graphic_hw_update_done() with no 
> dpy_gfx_update* call.

Yes, that looks correct.

> 
>      > +            s->gfx_update_requested = true;
>      > +        } else {
>      > +            if (s->new_frame_ready) {
>      > +                dpy_gfx_update_full(s->con);
>      > +                s->new_frame_ready = false;
>      > +            }
>      > +            graphic_hw_update_done(s->con);
>       > +        }> +    });
>      > +}
>      > +
>      > +static const GraphicHwOps apple_gfx_fb_ops = {
>      > +    .gfx_update = apple_gfx_fb_update_display,
>      > +    .gfx_update_async = true,
>      > +};
>      > +
>      > +static void update_cursor(AppleGFXState *s)
>      > +{
>      > +    dpy_mouse_set(s->con, s->pgdisp.cursorPosition.x,
>      > +                  s->pgdisp.cursorPosition.y, s->cursor_show);
>      > +}
>      > +
>      > +static void set_mode(AppleGFXState *s, uint32_t width, uint32_t
>     height)
>      > +{
>      > +    void *vram = NULL;
>      > +    DisplaySurface *surface;
>      > +    MTLTextureDescriptor *textureDescriptor;
>      > +    id<MTLTexture> texture = nil;
>      > +    __block bool no_change = false;
>      > +
>      > +    dispatch_sync(s->render_queue,
> 
>     Calling dispatch_sync() while holding BQL may result in deadlock.
> 
> Only if any code executed on the same dispatch queue acquires the lock 
> either directly or transitively. I believe I have ensure this is not 
> done on the reqnder_queue, have you found anywhere this is the case?

The documentation is not clear what threads a dispatch queue runs on. We 
can have a deadlock if they lock the BQL.

> 
>      > +        ^{
>      > +            if (s->surface &&
>      > +                width == surface_width(s->surface) &&
>      > +                height == surface_height(s->surface)) {
>      > +                no_change = true;
>      > +            }
>      > +        });
>      > +
>      > +    if (no_change) {
>      > +        return;
>      > +    }
>      > +
>      > +    vram = g_malloc0(width * height * 4);
>      > +    surface = qemu_create_displaysurface_from(width, height,
>     PIXMAN_LE_a8r8g8b8,
>      > +                                              width * 4, vram);
>      > +
>      > +    @autoreleasepool {
>      > +        textureDescriptor =
>      > +            [MTLTextureDescriptor
>      > +               
>     texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
>      > +                                             width:width
>      > +                                            height:height
>      > +                                         mipmapped:NO];
>      > +        textureDescriptor.usage = s->pgdisp.minimumTextureUsage;
>      > +        texture = [s->mtl
>     newTextureWithDescriptor:textureDescriptor];
>      > +    }
>      > +
>      > +    s->using_managed_texture_storage =
>      > +        (texture.storageMode == MTLStorageModeManaged);
>      > +
>      > +    dispatch_sync(s->render_queue,
>      > +        ^{
>      > +            id<MTLTexture> old_texture = nil;
>      > +            void *old_vram = s->vram;
>      > +            s->vram = vram;
>      > +            s->surface = surface;
>      > +
>      > +            dpy_gfx_replace_surface(s->con, surface);
>      > +
>      > +            old_texture = s->texture;
>      > +            s->texture = texture;
>      > +            [old_texture release];
> 
>     You can just do:
>     [s->texture release];
>     s->texture = texture;
> 
>     This will make s->texture dangling between the two statements, but that
>     don't matter since s->texture is not an atomic variable that can be
>     safely observed from another thread anyway.
> 
>      > +
>      > +            if (s->pending_frames == 0) {
>      > +                g_free(old_vram);
>      > +            }
>      > +        });
>      > +}
>      > +
>      > +static void create_fb(AppleGFXState *s)
>      > +{
>      > +    s->con = graphic_console_init(NULL, 0, &apple_gfx_fb_ops, s);
>      > +    set_mode(s, 1440, 1080);
>      > +
>      > +    s->cursor_show = true;
>      > +}
>      > +
>      > +static size_t apple_gfx_get_default_mmio_range_size(void)
>      > +{
>      > +    size_t mmio_range_size;
>      > +    @autoreleasepool {
>      > +        PGDeviceDescriptor *desc = [PGDeviceDescriptor new];
>      > +        mmio_range_size = desc.mmioLength;
>      > +        [desc release];
>      > +    }
>      > +    return mmio_range_size;
>      > +}
>      > +
>      > +void apple_gfx_common_init(Object *obj, AppleGFXState *s, const
>     char* obj_name)
> 
>     This function can be merged into apple_gfx_common_realize().
> 
> 
> Probably. I'll try it.
> 
>      > +{
>      > +    Error *local_err = NULL;
>      > +    int r;
>      > +    size_t mmio_range_size =
>     apple_gfx_get_default_mmio_range_size();
>      > +
>      > +    trace_apple_gfx_common_init(obj_name, mmio_range_size);
>      > +    memory_region_init_io(&s->iomem_gfx, obj, &apple_gfx_ops, s,
>     obj_name,
>      > +                          mmio_range_size);
>      > +    s->iomem_gfx.disable_reentrancy_guard = true;
> 
>     Why do you disable reentrancy guard?
> 
> 
> Perhaps with the proposed AIO_WAIT_WHILE based I/O scheme, this won't be 
> an issue anymore, but the guard would otherwise keep dropping MMIOs 
> which immediately caused the PV graphics device to stop making progress. 
> The MMIO APIs in the PVG framework are thread- and reentrancy-safe, so 
> we certainly don't need to serialise them on our side.

It's better to understand why such reentrancy happens. Reentrancy itself 
is often a sign of bug.

> 
>      > +
>      > +    /* TODO: PVG framework supports serialising device state:
>     integrate it! */
>      > +    if (apple_gfx_mig_blocker == NULL) {
>      > +        error_setg(&apple_gfx_mig_blocker,
>      > +                  "Migration state blocked by apple-gfx display
>     device");
>      > +        r = migrate_add_blocker(&apple_gfx_mig_blocker, &local_err);
>      > +        if (r < 0) {
>      > +            error_report_err(local_err);
> 
>     Please report the error to the caller of apple_gfx_common_realize()
>     instead.
> 
>      > +        }
>      > +    }
>       > +}> +
>      > +static void
>     apple_gfx_register_task_mapping_handlers(AppleGFXState *s,
>      > +                                                   
>       PGDeviceDescriptor *desc)
>      > +{
>      > +    desc.createTask = ^(uint64_t vmSize, void * _Nullable *
>     _Nonnull baseAddress) {
>      > +        AppleGFXTask *task = apple_gfx_new_task(s, vmSize);
>      > +        *baseAddress = (void*)task->address;
> 
>     nit: please write as (void *) instead of (void*).
> 
>      > +        trace_apple_gfx_create_task(vmSize, *baseAddress);
>      > +        return task;
>      > +    };
>      > +
> 
>      > +{
>      > +    PGDisplayDescriptor *disp_desc = [PGDisplayDescriptor new];
>      > +
>      > + disp_desc.name <http://disp_desc.name> = @"QEMU display";
>      > +    disp_desc.sizeInMillimeters = NSMakeSize(400., 300.); /* A
>     20" display */
>      > +    disp_desc.queue = dispatch_get_main_queue();
>      > +    disp_desc.newFrameEventHandler = ^(void) {
>      > +        trace_apple_gfx_new_frame();
>      > +        dispatch_async(s->render_queue, ^{
>      > +            /* Drop frames if we get too far ahead. */
>      > +            if (s->pending_frames >= 2)
>      > +                return;
>      > +            ++s->pending_frames;
>      > +            if (s->pending_frames > 1) {
>      > +                return;
>      > +            }
>      > +            @autoreleasepool {
>      > +                apple_gfx_render_new_frame(s);
>      > +            }
>      > +        });
>      > +    };
>      > +    disp_desc.modeChangeHandler = ^(PGDisplayCoord_t sizeInPixels,
>      > +                                    OSType pixelFormat) {
>      > +        trace_apple_gfx_mode_change(sizeInPixels.x, sizeInPixels.y);
>      > +        set_mode(s, sizeInPixels.x, sizeInPixels.y);
>      > +    };
>      > +    disp_desc.cursorGlyphHandler = ^(NSBitmapImageRep *glyph,
>      > +                                     PGDisplayCoord_t hotSpot) {
>      > +        uint32_t bpp = glyph.bitsPerPixel;
>      > +        size_t width = glyph.pixelsWide;
>      > +        size_t height = glyph.pixelsHigh;
>      > +        size_t padding_bytes_per_row = glyph.bytesPerRow - width
>     * 4;
>      > +        const uint8_t* px_data = glyph.bitmapData;
>      > +
>      > +        trace_apple_gfx_cursor_set(bpp, width, height);
>      > +
>      > +        if (s->cursor) {
>      > +            cursor_unref(s->cursor);
>      > +            s->cursor = NULL;
>      > +        }
>      > +
>      > +        if (bpp == 32) { /* Shouldn't be anything else, but just
>     to be safe...*/
>      > +            s->cursor = cursor_alloc(width, height);
>      > +            s->cursor->hot_x = hotSpot.x;
>      > +            s->cursor->hot_y = hotSpot.y;
>      > +
>      > +            uint32_t *dest_px = s->cursor->data;
>      > +
>      > +            for (size_t y = 0; y < height; ++y) {
>      > +                for (size_t x = 0; x < width; ++x) {
>      > +                    /* NSBitmapImageRep's red & blue channels
>     are swapped
>      > +                     * compared to QEMUCursor's. */
>      > +                    *dest_px =
>      > +                        (px_data[0] << 16u) |
>      > +                        (px_data[1] <<  8u) |
>      > +                        (px_data[2] <<  0u) |
>      > +                        (px_data[3] << 24u);
>      > +                    ++dest_px;
>      > +                    px_data += 4;
>      > +                }
>      > +                px_data += padding_bytes_per_row;
>      > +            }
>      > +            dpy_cursor_define(s->con, s->cursor);
>      > +            update_cursor(s);
>      > +        }
>      > +    };
>      > +    disp_desc.cursorShowHandler = ^(BOOL show) {
>      > +        trace_apple_gfx_cursor_show(show);
>      > +        s->cursor_show = show;
>      > +        update_cursor(s);
>      > +    };
>      > +    disp_desc.cursorMoveHandler = ^(void) {
>      > +        trace_apple_gfx_cursor_move();
>      > +        update_cursor(s);
>      > +    };
>      > +
>      > +    return disp_desc;
>      > +}
>      > +
>      > +static NSArray<PGDisplayMode*>*
>     apple_gfx_prepare_display_mode_array(void)
>      > +{
>      > +    PGDisplayMode *modes[ARRAY_SIZE(apple_gfx_modes)];
>      > +    NSArray<PGDisplayMode*>* mode_array = nil;
>      > +    int i;
>      > +
>      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
>      > +        modes[i] =
>      > +            [[PGDisplayMode alloc]
>     initWithSizeInPixels:apple_gfx_modes[i] refreshRateInHz:60.];
>      > +    }
>      > +
>      > +    mode_array = [NSArray arrayWithObjects:modes
>     count:ARRAY_SIZE(apple_gfx_modes)];
>      > +
>      > +    for (i = 0; i < ARRAY_SIZE(apple_gfx_modes); i++) {
>      > +        [modes[i] release];
>      > +        modes[i] = nil;
>      > +    }
>      > +
>      > +    return mode_array;
>      > +}
>      > +
>      > +static id<MTLDevice> copy_suitable_metal_device(void)
>      > +{
>      > +    id<MTLDevice> dev = nil;
>      > +    NSArray<id<MTLDevice>> *devs = MTLCopyAllDevices();
>      > +
>      > +    /* Prefer a unified memory GPU. Failing that, pick a non-
>     removable GPU. */
>      > +    for (size_t i = 0; i < devs.count; ++i) {
>      > +        if (devs[i].hasUnifiedMemory) {
>      > +            dev = devs[i];
>      > +            break;
>      > +        }
>      > +        if (!devs[i].removable) {
>      > +            dev = devs[i];
>      > +        }
>      > +    }
>      > +
>      > +    if (dev != nil) {
>      > +        [dev retain];
>      > +    } else {
>      > +        dev = MTLCreateSystemDefaultDevice();
>      > +    }
>      > +    [devs release];
>      > +
>      > +    return dev;
>      > +}
>      > +
>      > +void apple_gfx_common_realize(AppleGFXState *s,
>     PGDeviceDescriptor *desc)
>      > +{
>      > +    PGDisplayDescriptor *disp_desc = nil;
>      > +
>      > +    QTAILQ_INIT(&s->tasks);
>      > +    s->render_queue = dispatch_queue_create("apple-gfx.render",
>      > +                                            DISPATCH_QUEUE_SERIAL);
>      > +    s->mtl = copy_suitable_metal_device();
>      > +    s->mtl_queue = [s->mtl newCommandQueue];
>      > +
>      > +    desc.device = s->mtl;
>      > +
>      > +    apple_gfx_register_task_mapping_handlers(s, desc);
>      > +
>      > +    s->pgdev = PGNewDeviceWithDescriptor(desc);
>      > +
>      > +    disp_desc = apple_gfx_prepare_display_handlers(s);
>      > +    s->pgdisp = [s->pgdev newDisplayWithDescriptor:disp_desc
>      > +                                              port:0
>     serialNum:1234];
>      > +    [disp_desc release];
>      > +    s->pgdisp.modeList = apple_gfx_prepare_display_mode_array();
>      > +
>      > +    create_fb(s);
>      > +}
>      > diff --git a/hw/display/meson.build b/hw/display/meson.build
>      > index 7db05eace97..70d855749c0 100644
>      > --- a/hw/display/meson.build
>      > +++ b/hw/display/meson.build
>      > @@ -65,6 +65,8 @@ system_ss.add(when: 'CONFIG_ARTIST', if_true:
>     files('artist.c'))
>      >
>      >   system_ss.add(when: 'CONFIG_ATI_VGA', if_true: [files('ati.c',
>     'ati_2d.c', 'ati_dbg.c'), pixman])
>      >
>      > +system_ss.add(when: 'CONFIG_MAC_PVG',         if_true:
>     [files('apple-gfx.m'), pvg, metal])
>      > +system_ss.add(when: 'CONFIG_MAC_PVG_VMAPPLE', if_true:
>     [files('apple-gfx-vmapple.m'), pvg, metal])
>      >
>      >   if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
>      >     virtio_gpu_ss = ss.source_set()
>      > diff --git a/hw/display/trace-events b/hw/display/trace-events
>      > index 781f8a33203..1809a358e36 100644
>      > --- a/hw/display/trace-events
>      > +++ b/hw/display/trace-events
>      > @@ -191,3 +191,29 @@ dm163_bits_ppi(unsigned dest_width)
>     "dest_width : %u"
>      >   dm163_leds(int led, uint32_t value) "led %d: 0x%x"
>      >   dm163_channels(int channel, uint8_t value) "channel %d: 0x%x"
>      >   dm163_refresh_rate(uint32_t rr) "refresh rate %d"
>      > +
>      > +# apple-gfx.m
>      > +apple_gfx_read(uint64_t offset, uint64_t res)
>     "offset=0x%"PRIx64" res=0x%"PRIx64
>      > +apple_gfx_write(uint64_t offset, uint64_t val)
>     "offset=0x%"PRIx64" val=0x%"PRIx64
>      > +apple_gfx_create_task(uint32_t vm_size, void *va) "vm_size=0x%x
>     base_addr=%p"
>      > +apple_gfx_destroy_task(void *task) "task=%p"
>      > +apple_gfx_map_memory(void *task, uint32_t range_count, uint64_t
>     virtual_offset, uint32_t read_only) "task=%p range_count=0x%x
>     virtual_offset=0x%"PRIx64" read_only=%d"
>      > +apple_gfx_map_memory_range(uint32_t i, uint64_t phys_addr,
>     uint64_t phys_len) "[%d] phys_addr=0x%"PRIx64" phys_len=0x%"PRIx64
>      > +apple_gfx_remap(uint64_t retval, uint64_t source, uint64_t
>     target) "retval=%"PRId64" source=0x%"PRIx64" target=0x%"PRIx64
>      > +apple_gfx_unmap_memory(void *task, uint64_t virtual_offset,
>     uint64_t length) "task=%p virtual_offset=0x%"PRIx64" length=0x%"PRIx64
>      > +apple_gfx_read_memory(uint64_t phys_address, uint64_t length,
>     void *dst) "phys_addr=0x%"PRIx64" length=0x%"PRIx64" dest=%p"
>      > +apple_gfx_raise_irq(uint32_t vector) "vector=0x%x"
>      > +apple_gfx_new_frame(void) ""
>      > +apple_gfx_mode_change(uint64_t x, uint64_t y) "x=%"PRId64"
>     y=%"PRId64
>      > +apple_gfx_cursor_set(uint32_t bpp, uint64_t width, uint64_t
>     height) "bpp=%d width=%"PRId64" height=0x%"PRId64
>      > +apple_gfx_cursor_show(uint32_t show) "show=%d"
>      > +apple_gfx_cursor_move(void) ""
>      > +apple_gfx_common_init(const char *device_name, size_t mmio_size)
>     "device: %s; MMIO size: %zu bytes"
>      > +
>      > +# apple-gfx-vmapple.m
>      > +apple_iosfc_read(uint64_t offset, uint64_t res)
>     "offset=0x%"PRIx64" res=0x%"PRIx64
>      > +apple_iosfc_write(uint64_t offset, uint64_t val)
>     "offset=0x%"PRIx64" val=0x%"PRIx64
>      > +apple_iosfc_map_memory(uint64_t phys, uint64_t len, uint32_t ro,
>     void *va, void *e, void *f) "phys=0x%"PRIx64" len=0x%"PRIx64" ro=%d
>     va=%p e=%p f=%p"
>      > +apple_iosfc_unmap_memory(void *a, void *b, void *c, void *d,
>     void *e, void *f) "a=%p b=%p c=%p d=%p e=%p f=%p"
>      > +apple_iosfc_raise_irq(uint32_t vector) "vector=0x%x"
>      > +
>      > diff --git a/meson.build b/meson.build
>      > index 10464466ff3..f09df3f09d5 100644
>      > --- a/meson.build
>      > +++ b/meson.build
>      > @@ -741,6 +741,8 @@ socket = []
>      >   version_res = []
>      >   coref = []
>      >   iokit = []
>      > +pvg = []
>      > +metal = []
>      >   emulator_link_args = []
>      >   midl = not_found
>      >   widl = not_found
>      > @@ -762,6 +764,8 @@ elif host_os == 'darwin'
>      >     coref = dependency('appleframeworks', modules: 'CoreFoundation')
>      >     iokit = dependency('appleframeworks', modules: 'IOKit',
>     required: false)
>      >     host_dsosuf = '.dylib'
>      > +  pvg = dependency('appleframeworks', modules:
>     'ParavirtualizedGraphics')
>      > +  metal = dependency('appleframeworks', modules: 'Metal')
>      >   elif host_os == 'sunos'
>      >     socket = [cc.find_library('socket'),
>      >               cc.find_library('nsl'),
>

next prev parent reply	other threads:[~2024-10-03  7:10 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-28  8:57 [PATCH v3 00/14] macOS PV Graphics and new vmapple machine type Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 01/14] hw/display/apple-gfx: Introduce ParavirtualizedGraphics.Framework support Phil Dennis-Jordan
2024-10-01  9:40   ` Akihiko Odaki
2024-10-02 13:33     ` Phil Dennis-Jordan
2024-10-03  7:09       ` Akihiko Odaki [this message]
2024-10-06 10:39         ` Phil Dennis-Jordan
2024-10-07  8:25           ` Akihiko Odaki
2024-10-09 15:06             ` Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 02/14] hw/display/apple-gfx: Adds PCI implementation Phil Dennis-Jordan
2024-09-28 10:39   ` BALATON Zoltan
2024-09-28 13:33     ` Phil Dennis-Jordan
2024-10-02  7:14   ` Akihiko Odaki
2024-10-02 13:39     ` Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 03/14] ui/cocoa: Adds non-app runloop on main thread mode Phil Dennis-Jordan
2024-10-02  7:23   ` Akihiko Odaki
2024-09-28  8:57 ` [PATCH v3 04/14] hw/display/apple-gfx: Adds configurable mode list Phil Dennis-Jordan
2024-10-04  4:17   ` Akihiko Odaki
2024-10-09 14:04     ` Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 05/14] MAINTAINERS: Add myself as maintainer for apple-gfx, reviewer for HVF Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 06/14] hw: Add vmapple subdir Phil Dennis-Jordan
2024-10-05  6:13   ` Akihiko Odaki
2024-09-28  8:57 ` [PATCH v3 07/14] hw/misc/pvpanic: Add MMIO interface Phil Dennis-Jordan
2024-10-05  6:13   ` Akihiko Odaki
2024-09-28  8:57 ` [PATCH v3 08/14] hvf: arm: Ignore writes to CNTP_CTL_EL0 Phil Dennis-Jordan
2024-10-05  6:14   ` Akihiko Odaki
2024-09-28  8:57 ` [PATCH v3 09/14] gpex: Allow more than 4 legacy IRQs Phil Dennis-Jordan
2024-10-04  4:54   ` Akihiko Odaki
2024-09-28  8:57 ` [PATCH v3 10/14] hw/vmapple/aes: Introduce aes engine Phil Dennis-Jordan
2024-10-04  5:32   ` Akihiko Odaki
2024-10-09 12:48     ` Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 11/14] hw/vmapple/bdif: Introduce vmapple backdoor interface Phil Dennis-Jordan
2024-10-05  5:12   ` Akihiko Odaki
2024-10-09 14:00     ` Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 12/14] hw/vmapple/cfg: Introduce vmapple cfg region Phil Dennis-Jordan
2024-10-05  5:35   ` Akihiko Odaki
2024-10-07 14:10     ` Phil Dennis-Jordan
2024-10-07 18:03       ` Akihiko Odaki
2024-10-09 13:08         ` Phil Dennis-Jordan
2024-10-12 10:40           ` Akihiko Odaki
2024-09-28  8:57 ` [PATCH v3 13/14] hw/vmapple/virtio-blk: Add support for apple virtio-blk Phil Dennis-Jordan
2024-10-05  5:47   ` Akihiko Odaki
2024-10-07 14:31     ` Phil Dennis-Jordan
2024-10-07 18:10       ` Akihiko Odaki
2024-10-09 12:52         ` Phil Dennis-Jordan
2024-09-28  8:57 ` [PATCH v3 14/14] hw/vmapple/vmapple: Add vmapple machine type Phil Dennis-Jordan
2024-10-05  6:11   ` Akihiko Odaki
2024-10-08 12:17     ` Phil Dennis-Jordan
2024-10-03  8:06 ` [PATCH v3 00/14] macOS PV Graphics and new " Alex Bennée
2024-10-29 21:20   ` Phil Dennis-Jordan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae0f5ef7-c714-40e1-be5a-1aa12df518c8@daynix.com \
    --to=akihiko.odaki@daynix.com \
    --cc=agraf@csgraf.de \
    --cc=alistair.francis@wdc.com \
    --cc=berrange@redhat.com \
    --cc=bmeng.cn@gmail.com \
    --cc=chenhuacai@kernel.org \
    --cc=dbarboza@ventanamicro.com \
    --cc=eduardo@habkost.net \
    --cc=gaosong@loongson.cn \
    --cc=graf@amazon.com \
    --cc=hreitz@redhat.com \
    --cc=jcmvbkbc@gmail.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=kwolf@redhat.com \
    --cc=liwei1518@gmail.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=marcin.juszkiewicz@linaro.org \
    --cc=mst@redhat.com \
    --cc=palmer@dabbelt.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=phil@philjordan.eu \
    --cc=philmd@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-riscv@nongnu.org \
    --cc=quic_llindhol@quicinc.com \
    --cc=rad@semihalf.com \
    --cc=richard.henderson@linaro.org \
    --cc=shorne@gmail.com \
    --cc=slp@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=zhiwei_liu@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).