qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render
@ 2024-06-20 23:17 dongwon.kim
  2024-06-20 23:17 ` [RFC PATCH 1/4] hw/display/virtio-gpu: Introducing render_sync param dongwon.kim
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: dongwon.kim @ 2024-06-20 23:17 UTC (permalink / raw)
  To: qemu-devel

From: Dongwon Kim <dongwon.kim@intel.com>

Introducing new virtio-gpu param, 'render_sync' when guest scanout blob
is used (blob=true). The new param is used to specify when to start
rendering a guest scanout frame.

By default (and so far) rendering of the guest frame is started in
the draw event to make sure guest display update is sychronized with
host's vsync. But this method inevitably brings some extra wait because
most of time, the draw event is not happening right after the guest
scanout frame is flushed.

This delay often makes the guest stuck at certain frame for too long and
causes general performance degradation of graphic workloads on the guest's
side especially when the display update rate is high. This unwanted perf
drop can be reduced if the guest scanout frame is rendered as soon as it is
flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
pipeline can be unblocked a lot earlier in this case so that guest can
move to the next display frame right away.

However, this "asynchrounous" render mode may cause some waste of resources
as the guest could produce more frames than what are actually displayed
on the host display. This is similar as running rendering apps with no vblank
or vsync option. This is why this feature should stay as optional.

The param 'render_sync' is set to 'true' by default and this is in line
with traditional way while setting it to 'false' is basically enabling
this asynchronouse mode.

Dongwon Kim (4):
  hw/display/virtio-gpu: Introducing render_sync param
  ui/egl-helpers: Consolidates create-sync and create-fence
  ui/gtk-egl: Start rendering of guest blob scanout if render_sync is
    off
  ui/gtk-gl-draw: Start rendering of guest blob scanout if render_sync
    is off

 include/hw/virtio/virtio-gpu.h  |  3 ++
 include/ui/dmabuf.h             |  4 +-
 include/ui/egl-helpers.h        |  3 +-
 hw/display/vhost-user-gpu.c     |  3 +-
 hw/display/virtio-gpu-udmabuf.c |  3 +-
 hw/display/virtio-gpu.c         |  2 +
 hw/vfio/display.c               |  3 +-
 ui/dbus-listener.c              |  2 +-
 ui/dmabuf.c                     | 11 +++-
 ui/egl-helpers.c                | 27 ++++------
 ui/gtk-egl.c                    | 93 ++++++++++++++++++---------------
 ui/gtk-gl-area.c                | 90 +++++++++++++++++++------------
 12 files changed, 146 insertions(+), 98 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC PATCH 1/4] hw/display/virtio-gpu: Introducing render_sync param
  2024-06-20 23:17 [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render dongwon.kim
@ 2024-06-20 23:17 ` dongwon.kim
  2024-06-20 23:17 ` [RFC PATCH 2/4] ui/egl-helpers: Consolidates create-sync and create-fence dongwon.kim
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: dongwon.kim @ 2024-06-20 23:17 UTC (permalink / raw)
  To: qemu-devel

From: Dongwon Kim <dongwon.kim@intel.com>

Introducing new virtio-gpu param, 'render_sync' when guest scanout blob
is used (blob=true). The new param is used to specify when to start
rendering a guest scanout frame.

By default (and so far) rendering of the guest frame is started in
the draw event to make sure guest display update is sychronized with
host's vsync. But this method inevitably brings some extra wait because
most of time, the draw event is not happening right after the guest
scanout frame is flushed.

This delay often makes the guest stuck at certain frame for too long and
causes general performance degradation of graphic workloads on the guest's
side especially when the display update rate is high. This unwanted perf
drop can be reduced if the guest scanout frame is rendered as soon as it is
flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
pipeline can be unblocked a lot earlier in this case so that guest can
move to the next display frame right away.

However, this "asynchrounous" render mode may cause some waste of resources
as the guest could produce more frames than what are actually displayed
on the host display. This is similar as running rendering apps with no vblank
or vsync option. This is why this feature should stay as optional.

The param 'render_sync' is set to 'true' by default and this is in line
with traditional way while setting it to 'false' is basically enabling
this asynchronouse mode.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
---
 include/hw/virtio/virtio-gpu.h  |  3 +++
 include/ui/dmabuf.h             |  4 +++-
 hw/display/vhost-user-gpu.c     |  3 ++-
 hw/display/virtio-gpu-udmabuf.c |  3 ++-
 hw/display/virtio-gpu.c         |  2 ++
 hw/vfio/display.c               |  3 ++-
 ui/dbus-listener.c              |  2 +-
 ui/dmabuf.c                     | 11 ++++++++++-
 8 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 7a59379f5a..9bcc572eab 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -97,6 +97,7 @@ enum virtio_gpu_base_conf_flags {
     VIRTIO_GPU_FLAG_EDID_ENABLED,
     VIRTIO_GPU_FLAG_DMABUF_ENABLED,
     VIRTIO_GPU_FLAG_BLOB_ENABLED,
+    VIRTIO_GPU_FLAG_RENDER_SYNC_ENABLED,
     VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
     VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
 };
@@ -111,6 +112,8 @@ enum virtio_gpu_base_conf_flags {
     (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
     (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_render_sync_enabled(_cfg) \
+    (_cfg.flags & (1 << VIRTIO_GPU_FLAG_RENDER_SYNC_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
     (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 #define virtio_gpu_rutabaga_enabled(_cfg) \
diff --git a/include/ui/dmabuf.h b/include/ui/dmabuf.h
index dc74ba895a..45384e32e3 100644
--- a/include/ui/dmabuf.h
+++ b/include/ui/dmabuf.h
@@ -17,7 +17,8 @@ QemuDmaBuf *qemu_dmabuf_new(uint32_t width, uint32_t height,
                             uint32_t y, uint32_t backing_width,
                             uint32_t backing_height, uint32_t fourcc,
                             uint64_t modifier, int dmabuf_fd,
-                            bool allow_fences, bool y0_top);
+                            bool allow_fences, bool y0_top,
+                            bool render_sync);
 void qemu_dmabuf_free(QemuDmaBuf *dmabuf);
 
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(QemuDmaBuf, qemu_dmabuf_free);
@@ -40,6 +41,7 @@ void *qemu_dmabuf_get_sync(QemuDmaBuf *dmabuf);
 int32_t qemu_dmabuf_get_fence_fd(QemuDmaBuf *dmabuf);
 bool qemu_dmabuf_get_allow_fences(QemuDmaBuf *dmabuf);
 bool qemu_dmabuf_get_draw_submitted(QemuDmaBuf *dmabuf);
+bool qemu_dmabuf_get_render_sync(QemuDmaBuf *dmabuf);
 void qemu_dmabuf_set_texture(QemuDmaBuf *dmabuf, uint32_t texture);
 void qemu_dmabuf_set_fence_fd(QemuDmaBuf *dmabuf, int32_t fence_fd);
 void qemu_dmabuf_set_sync(QemuDmaBuf *dmabuf, void *sync);
diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index e4b398d26c..69fa722c88 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -285,7 +285,8 @@ vhost_user_gpu_handle_display(VhostUserGPU *g, VhostUserGpuMsg *msg)
                                  m->fd_stride, 0, 0, 0, 0,
                                  m->fd_drm_fourcc, modifier,
                                  fd, false, m->fd_flags &
-                                 VIRTIO_GPU_RESOURCE_FLAG_Y_0_TOP);
+                                 VIRTIO_GPU_RESOURCE_FLAG_Y_0_TOP,
+                                 false);
 
         dpy_gl_scanout_dmabuf(con, dmabuf);
         g->dmabuf[m->scanout_id] = dmabuf;
diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index c02ec6d37d..8fcc0c3055 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -176,6 +176,7 @@ static VGPUDMABuf
                           struct virtio_gpu_rect *r)
 {
     VGPUDMABuf *dmabuf;
+    bool render_sync = virtio_gpu_render_sync_enabled(g->parent_obj.conf);
 
     if (res->dmabuf_fd < 0) {
         return NULL;
@@ -185,7 +186,7 @@ static VGPUDMABuf
     dmabuf->buf = qemu_dmabuf_new(r->width, r->height, fb->stride,
                                   r->x, r->y, fb->width, fb->height,
                                   qemu_pixman_to_drm_format(fb->format),
-                                  0, res->dmabuf_fd, true, false);
+                                  0, res->dmabuf_fd, true, false, render_sync);
     dmabuf->scanout_id = scanout_id;
     QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next);
 
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index d60b1b2973..b6b82de306 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1671,6 +1671,8 @@ static Property virtio_gpu_properties[] = {
                      256 * MiB),
     DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
                     VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+    DEFINE_PROP_BIT("render_sync", VirtIOGPU, parent_obj.conf.flags,
+                    VIRTIO_GPU_FLAG_RENDER_SYNC_ENABLED, true),
     DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
     DEFINE_PROP_UINT8("x-scanout-vmstate-version", VirtIOGPU, scanout_vmstate_version, 2),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 661e921616..202ba78288 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -244,7 +244,8 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice *vdev,
     dmabuf->buf = qemu_dmabuf_new(plane.width, plane.height,
                                   plane.stride, 0, 0, plane.width,
                                   plane.height, plane.drm_format,
-                                  plane.drm_format_mod, fd, false, false);
+                                  plane.drm_format_mod, fd, false,
+                                  false, false);
 
     if (plane_type == DRM_PLANE_TYPE_CURSOR) {
         vfio_display_update_cursor(dmabuf, &plane);
diff --git a/ui/dbus-listener.c b/ui/dbus-listener.c
index 5490088043..7547b0e248 100644
--- a/ui/dbus-listener.c
+++ b/ui/dbus-listener.c
@@ -456,7 +456,7 @@ static void dbus_scanout_texture(DisplayChangeListener *dcl,
     }
     dmabuf = qemu_dmabuf_new(w, h, stride, x, y, backing_width,
                              backing_height, fourcc, modifier, fd,
-                             false, backing_y_0_top);
+                             false, backing_y_0_top, false);
 
     dbus_scanout_dmabuf(dcl, dmabuf);
     qemu_dmabuf_close(dmabuf);
diff --git a/ui/dmabuf.c b/ui/dmabuf.c
index df7a09703f..193097f9a2 100644
--- a/ui/dmabuf.c
+++ b/ui/dmabuf.c
@@ -26,6 +26,7 @@ struct QemuDmaBuf {
     void      *sync;
     int       fence_fd;
     bool      allow_fences;
+    bool      render_sync;
     bool      draw_submitted;
 };
 
@@ -34,7 +35,7 @@ QemuDmaBuf *qemu_dmabuf_new(uint32_t width, uint32_t height,
                             uint32_t y, uint32_t backing_width,
                             uint32_t backing_height, uint32_t fourcc,
                             uint64_t modifier, int32_t dmabuf_fd,
-                            bool allow_fences, bool y0_top) {
+                            bool allow_fences, bool y0_top, bool render_sync) {
     QemuDmaBuf *dmabuf;
 
     dmabuf = g_new0(QemuDmaBuf, 1);
@@ -51,6 +52,7 @@ QemuDmaBuf *qemu_dmabuf_new(uint32_t width, uint32_t height,
     dmabuf->fd = dmabuf_fd;
     dmabuf->allow_fences = allow_fences;
     dmabuf->y0_top = y0_top;
+    dmabuf->render_sync = render_sync;
     dmabuf->fence_fd = -1;
 
     return dmabuf;
@@ -198,6 +200,13 @@ bool qemu_dmabuf_get_draw_submitted(QemuDmaBuf *dmabuf)
     return dmabuf->draw_submitted;
 }
 
+bool qemu_dmabuf_get_render_sync(QemuDmaBuf *dmabuf)
+{
+    assert(dmabuf != NULL);
+
+    return dmabuf->render_sync;
+}
+
 void qemu_dmabuf_set_texture(QemuDmaBuf *dmabuf, uint32_t texture)
 {
     assert(dmabuf != NULL);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 2/4] ui/egl-helpers: Consolidates create-sync and create-fence
  2024-06-20 23:17 [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render dongwon.kim
  2024-06-20 23:17 ` [RFC PATCH 1/4] hw/display/virtio-gpu: Introducing render_sync param dongwon.kim
@ 2024-06-20 23:17 ` dongwon.kim
  2024-07-02  8:32   ` Marc-André Lureau
  2024-06-20 23:17 ` [RFC PATCH 3/4] ui/gtk-egl: Start rendering of guest blob scanout if render_sync is off dongwon.kim
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: dongwon.kim @ 2024-06-20 23:17 UTC (permalink / raw)
  To: qemu-devel

From: Dongwon Kim <dongwon.kim@intel.com>

There is no reason to split those two operations so combining
two functions - egl_dmabuf_create_sync and egl_dmabuf_create_fence.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
---
 include/ui/egl-helpers.h |  3 +--
 ui/egl-helpers.c         | 27 +++++++++++----------------
 ui/gtk-egl.c             | 19 +++++++------------
 ui/gtk-gl-area.c         | 10 ++--------
 4 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index 4b8c0d2281..606d6c8288 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -51,8 +51,7 @@ int egl_get_fd_for_texture(uint32_t tex_id, EGLint *stride, EGLint *fourcc,
 
 void egl_dmabuf_import_texture(QemuDmaBuf *dmabuf);
 void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
-void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf);
-void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
+int egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
 
 #endif
 
diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 99b2ebbe23..e16f2cb23d 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -371,34 +371,29 @@ void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf)
     qemu_dmabuf_set_texture(dmabuf, 0);
 }
 
-void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
+int egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
 {
     EGLSyncKHR sync;
+    int fence_fd = -1;
 
     if (epoxy_has_egl_extension(qemu_egl_display,
                                 "EGL_KHR_fence_sync") &&
         epoxy_has_egl_extension(qemu_egl_display,
-                                "EGL_ANDROID_native_fence_sync")) {
+                                "EGL_ANDROID_native_fence_sync") &&
+        qemu_dmabuf_get_fence_fd(dmabuf) == -1) {
         sync = eglCreateSyncKHR(qemu_egl_display,
                                 EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
         if (sync != EGL_NO_SYNC_KHR) {
-            qemu_dmabuf_set_sync(dmabuf, sync);
+            fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
+                                                  sync);
+            if (fence_fd >= 0) {
+                qemu_dmabuf_set_fence_fd(dmabuf, fence_fd);
+            }
+            eglDestroySyncKHR(qemu_egl_display, sync);
         }
     }
-}
-
-void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
-{
-    void *sync = qemu_dmabuf_get_sync(dmabuf);
-    int fence_fd;
 
-    if (sync) {
-        fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
-                                              sync);
-        qemu_dmabuf_set_fence_fd(dmabuf, fence_fd);
-        eglDestroySyncKHR(qemu_egl_display, sync);
-        qemu_dmabuf_set_sync(dmabuf, NULL);
-    }
+    return fence_fd;
 }
 
 #endif /* CONFIG_GBM */
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 9831c10e1b..55199f8659 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -68,7 +68,6 @@ void gd_egl_draw(VirtualConsole *vc)
     GdkWindow *window;
 #ifdef CONFIG_GBM
     QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
-    int fence_fd;
 #endif
     int ww, wh, ws;
 
@@ -99,13 +98,12 @@ void gd_egl_draw(VirtualConsole *vc)
         glFlush();
 #ifdef CONFIG_GBM
         if (dmabuf) {
-            egl_dmabuf_create_fence(dmabuf);
-            fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
+            fence_fd = egl_dmabuf_create_fence(dmabuf);
             if (fence_fd >= 0) {
                 qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
-                return;
+            } else {
+                graphic_hw_gl_block(vc->gfx.dcl.con, false);
             }
-            graphic_hw_gl_block(vc->gfx.dcl.con, false);
         }
 #endif
     } else {
@@ -336,7 +334,11 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 {
     VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
     GdkWindow *window;
+#ifdef CONFIG_GBM
+    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
+#endif
     int ww, wh, ws;
+    int fence_fd;
 
     if (!vc->gfx.scanout_mode) {
         return;
@@ -364,12 +366,6 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
         egl_fb_blit(&vc->gfx.win_fb, &vc->gfx.guest_fb, !vc->gfx.y0_top);
     }
 
-#ifdef CONFIG_GBM
-    if (vc->gfx.guest_fb.dmabuf) {
-        egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
-    }
-#endif
-
     eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 }
 
@@ -387,7 +383,6 @@ void gd_egl_flush(DisplayChangeListener *dcl,
         gtk_widget_queue_draw_area(area, x, y, w, h);
         return;
     }
-
     gd_egl_scanout_flush(&vc->gfx.dcl, x, y, w, h);
 }
 
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index b628b35451..0b11423824 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -77,17 +77,10 @@ void gd_gl_area_draw(VirtualConsole *vc)
         glBlitFramebuffer(0, y1, vc->gfx.w, y2,
                           0, 0, ww, wh,
                           GL_COLOR_BUFFER_BIT, GL_NEAREST);
-#ifdef CONFIG_GBM
-        if (dmabuf) {
-            egl_dmabuf_create_sync(dmabuf);
-        }
-#endif
-        glFlush();
 #ifdef CONFIG_GBM
         if (dmabuf) {
             int fence_fd;
-            egl_dmabuf_create_fence(dmabuf);
-            fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
+            fence_fd = egl_dmabuf_create_fence(dmabuf);
             if (fence_fd >= 0) {
                 qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
                 return;
@@ -95,6 +88,7 @@ void gd_gl_area_draw(VirtualConsole *vc)
             graphic_hw_gl_block(vc->gfx.dcl.con, false);
         }
 #endif
+        glFlush();
     } else {
         if (!vc->gfx.ds) {
             return;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 3/4] ui/gtk-egl: Start rendering of guest blob scanout if render_sync is off
  2024-06-20 23:17 [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render dongwon.kim
  2024-06-20 23:17 ` [RFC PATCH 1/4] hw/display/virtio-gpu: Introducing render_sync param dongwon.kim
  2024-06-20 23:17 ` [RFC PATCH 2/4] ui/egl-helpers: Consolidates create-sync and create-fence dongwon.kim
@ 2024-06-20 23:17 ` dongwon.kim
  2024-06-20 23:17 ` [RFC PATCH 4/4] ui/gtk-gl-draw: " dongwon.kim
  2024-07-02 10:29 ` [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render Marc-André Lureau
  4 siblings, 0 replies; 12+ messages in thread
From: dongwon.kim @ 2024-06-20 23:17 UTC (permalink / raw)
  To: qemu-devel

From: Dongwon Kim <dongwon.kim@intel.com>

Draw (executing glBlitFramebuffer) immediately as soon as the frame
is flushed instead of getting it done in the next draw event if render_sync
flag is reset. With this, the fence will be signaled way ealier so the guest
can be working on the next frame right away.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
---
 ui/gtk-egl.c | 88 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 55199f8659..2877140c3b 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -80,6 +80,12 @@ void gd_egl_draw(VirtualConsole *vc)
     ww = gdk_window_get_width(window) * ws;
     wh = gdk_window_get_height(window) * ws;
 
+    vc->gfx.scale_x = (double)ww / surface_width(vc->gfx.ds);
+    vc->gfx.scale_y = (double)wh / surface_height(vc->gfx.ds);
+
+    eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
+                   vc->gfx.esurface, vc->gfx.ectx);
+
     if (vc->gfx.scanout_mode) {
 #ifdef CONFIG_GBM
         if (dmabuf) {
@@ -88,21 +94,9 @@ void gd_egl_draw(VirtualConsole *vc)
             } else {
                 qemu_dmabuf_set_draw_submitted(dmabuf, false);
             }
-        }
-#endif
-        gd_egl_scanout_flush(&vc->gfx.dcl, 0, 0, vc->gfx.w, vc->gfx.h);
 
-        vc->gfx.scale_x = (double)ww / surface_width(vc->gfx.ds);
-        vc->gfx.scale_y = (double)wh / surface_height(vc->gfx.ds);
-
-        glFlush();
-#ifdef CONFIG_GBM
-        if (dmabuf) {
-            fence_fd = egl_dmabuf_create_fence(dmabuf);
-            if (fence_fd >= 0) {
-                qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
-            } else {
-                graphic_hw_gl_block(vc->gfx.dcl.con, false);
+            if (qemu_dmabuf_get_render_sync(dmabuf)) {
+                gd_egl_scanout_flush(&vc->gfx.dcl, 0, 0, vc->gfx.w, vc->gfx.h);
             }
         }
 #endif
@@ -110,19 +104,12 @@ void gd_egl_draw(VirtualConsole *vc)
         if (!vc->gfx.ds) {
             return;
         }
-        eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
-                       vc->gfx.esurface, vc->gfx.ectx);
-
         surface_gl_setup_viewport(vc->gfx.gls, vc->gfx.ds, ww, wh);
         surface_gl_render_texture(vc->gfx.gls, vc->gfx.ds);
-
-        eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
-
-        vc->gfx.scale_x = (double)ww / surface_width(vc->gfx.ds);
-        vc->gfx.scale_y = (double)wh / surface_height(vc->gfx.ds);
-
-        glFlush();
     }
+
+    eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
+    glFlush();
 }
 
 void gd_egl_update(DisplayChangeListener *dcl,
@@ -146,14 +133,20 @@ void gd_egl_refresh(DisplayChangeListener *dcl)
 {
     VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+#ifdef CONFIG_GBM
+    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
+#endif
+
     gd_update_monitor_refresh_rate(
             vc, vc->window ? vc->window : vc->gfx.drawing_area);
 
-    if (vc->gfx.guest_fb.dmabuf &&
-        qemu_dmabuf_get_draw_submitted(vc->gfx.guest_fb.dmabuf)) {
+#ifdef CONFIG_GBM
+    if (dmabuf && qemu_dmabuf_get_draw_submitted(dmabuf) &&
+        qemu_dmabuf_get_render_sync(dmabuf)) {
         gd_egl_draw(vc);
         return;
     }
+#endif
 
     if (!vc->gfx.esurface) {
         gd_egl_init(vc);
@@ -166,9 +159,9 @@ void gd_egl_refresh(DisplayChangeListener *dcl)
             surface_gl_create_texture(vc->gfx.gls, vc->gfx.ds);
         }
 #ifdef CONFIG_GBM
-        if (vc->gfx.guest_fb.dmabuf) {
-            egl_dmabuf_release_texture(vc->gfx.guest_fb.dmabuf);
-            gd_egl_scanout_dmabuf(dcl, vc->gfx.guest_fb.dmabuf);
+        if (dmabuf) {
+            egl_dmabuf_release_texture(dmabuf);
+            gd_egl_scanout_dmabuf(dcl, dmabuf);
         }
 #endif
     }
@@ -344,6 +337,11 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
         return;
     }
     if (!vc->gfx.guest_fb.framebuffer) {
+#ifdef CONFIG_GBM
+        if (dmabuf) {
+            graphic_hw_gl_block(vc->gfx.dcl.con, false);
+        }
+#endif
         return;
     }
 
@@ -366,7 +364,16 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
         egl_fb_blit(&vc->gfx.win_fb, &vc->gfx.guest_fb, !vc->gfx.y0_top);
     }
 
-    eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
+#ifdef CONFIG_GBM
+    if (dmabuf) {
+        fence_fd = egl_dmabuf_create_fence(dmabuf);
+        if (fence_fd >= 0) {
+            qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
+        } else {
+            graphic_hw_gl_block(vc->gfx.dcl.con, false);
+        }
+    }
+#endif
 }
 
 void gd_egl_flush(DisplayChangeListener *dcl,
@@ -374,15 +381,22 @@ void gd_egl_flush(DisplayChangeListener *dcl,
 {
     VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
     GtkWidget *area = vc->gfx.drawing_area;
-
-    if (vc->gfx.guest_fb.dmabuf &&
-        !qemu_dmabuf_get_draw_submitted(vc->gfx.guest_fb.dmabuf)) {
-        graphic_hw_gl_block(vc->gfx.dcl.con, true);
-        qemu_dmabuf_set_draw_submitted(vc->gfx.guest_fb.dmabuf, true);
-        gtk_egl_set_scanout_mode(vc, true);
-        gtk_widget_queue_draw_area(area, x, y, w, h);
+#ifdef CONFIG_GBM
+    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
+    if (dmabuf) {
+        if (!qemu_dmabuf_get_draw_submitted(dmabuf)) {
+            graphic_hw_gl_block(vc->gfx.dcl.con, true);
+            qemu_dmabuf_set_draw_submitted(dmabuf, true);
+            gtk_egl_set_scanout_mode(vc, true);
+            if (!qemu_dmabuf_get_render_sync(dmabuf)) {
+                gd_egl_scanout_flush(&vc->gfx.dcl, 0, 0, vc->gfx.w, vc->gfx.h);
+            }
+            gtk_widget_queue_draw_area(area, x, y, w, h);
+        }
         return;
     }
+#endif
+
     gd_egl_scanout_flush(&vc->gfx.dcl, x, y, w, h);
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 4/4] ui/gtk-gl-draw: Start rendering of guest blob scanout if render_sync is off
  2024-06-20 23:17 [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render dongwon.kim
                   ` (2 preceding siblings ...)
  2024-06-20 23:17 ` [RFC PATCH 3/4] ui/gtk-egl: Start rendering of guest blob scanout if render_sync is off dongwon.kim
@ 2024-06-20 23:17 ` dongwon.kim
  2024-07-02 10:29 ` [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render Marc-André Lureau
  4 siblings, 0 replies; 12+ messages in thread
From: dongwon.kim @ 2024-06-20 23:17 UTC (permalink / raw)
  To: qemu-devel

From: Dongwon Kim <dongwon.kim@intel.com>

Draw (executing glBlitFramebuffer) immediately as soon as the frame
is flushed instead of getting it done in the next draw event if render_sync
flag is reset. With this, the fence will be signaled way ealier so the guest
can be working on the next frame right away.

Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
---
 ui/gtk-gl-area.c | 84 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 58 insertions(+), 26 deletions(-)

diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 0b11423824..88d4e66a52 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -65,39 +65,36 @@ void gd_gl_area_draw(VirtualConsole *vc)
             } else {
                 qemu_dmabuf_set_draw_submitted(dmabuf, false);
             }
-        }
-#endif
 
-        glBindFramebuffer(GL_READ_FRAMEBUFFER, vc->gfx.guest_fb.framebuffer);
-        /* GtkGLArea sets GL_DRAW_FRAMEBUFFER for us */
-
-        glViewport(0, 0, ww, wh);
-        y1 = vc->gfx.y0_top ? 0 : vc->gfx.h;
-        y2 = vc->gfx.y0_top ? vc->gfx.h : 0;
-        glBlitFramebuffer(0, y1, vc->gfx.w, y2,
-                          0, 0, ww, wh,
-                          GL_COLOR_BUFFER_BIT, GL_NEAREST);
-#ifdef CONFIG_GBM
-        if (dmabuf) {
-            int fence_fd;
-            fence_fd = egl_dmabuf_create_fence(dmabuf);
-            if (fence_fd >= 0) {
-                qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
-                return;
+            if (qemu_dmabuf_get_render_sync(dmabuf)) {
+                int fence_fd;
+                glBindFramebuffer(GL_READ_FRAMEBUFFER, vc->gfx.guest_fb.framebuffer);
+                /* GtkGLArea sets GL_DRAW_FRAMEBUFFER for us */
+
+                glViewport(0, 0, ww, wh);
+                y1 = vc->gfx.y0_top ? 0 : vc->gfx.h;
+                y2 = vc->gfx.y0_top ? vc->gfx.h : 0;
+                glBlitFramebuffer(0, y1, vc->gfx.w, y2,
+                                  0, 0, ww, wh,
+                                  GL_COLOR_BUFFER_BIT, GL_NEAREST);
+                fence_fd = egl_dmabuf_create_fence(dmabuf);
+                if (fence_fd >= 0) {
+                    qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
+                } else {
+                    graphic_hw_gl_block(vc->gfx.dcl.con, false);
+                }
             }
-            graphic_hw_gl_block(vc->gfx.dcl.con, false);
         }
 #endif
-        glFlush();
     } else {
         if (!vc->gfx.ds) {
             return;
         }
-        gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
 
         surface_gl_setup_viewport(vc->gfx.gls, vc->gfx.ds, ww, wh);
         surface_gl_render_texture(vc->gfx.gls, vc->gfx.ds);
     }
+    glFlush();
 }
 
 void gd_gl_area_update(DisplayChangeListener *dcl,
@@ -119,13 +116,19 @@ void gd_gl_area_refresh(DisplayChangeListener *dcl)
 {
     VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+#ifdef CONFIG_GBM
+    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
+#endif
+
     gd_update_monitor_refresh_rate(vc, vc->window ? vc->window : vc->gfx.drawing_area);
 
-    if (vc->gfx.guest_fb.dmabuf &&
-        qemu_dmabuf_get_draw_submitted(vc->gfx.guest_fb.dmabuf)) {
+#ifdef CONFIG_GBM
+    if (dmabuf && qemu_dmabuf_get_draw_submitted(dmabuf) &&
+        qemu_dmabuf_get_render_sync(dmabuf)) {
         gd_gl_area_draw(vc);
         return;
     }
+#endif
 
     if (!vc->gfx.gls) {
         if (!gtk_widget_get_realized(vc->gfx.drawing_area)) {
@@ -282,13 +285,42 @@ void gd_gl_area_scanout_flush(DisplayChangeListener *dcl,
                           uint32_t x, uint32_t y, uint32_t w, uint32_t h)
 {
     VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
+#ifdef CONFIG_GBM
+    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
+    int ww, wh, ws, y1, y2;
+
+    if (dmabuf && !qemu_dmabuf_get_draw_submitted(dmabuf)) {
+        gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
+        ws = gdk_window_get_scale_factor(gtk_widget_get_window(vc->gfx.drawing_area));
+        ww = gtk_widget_get_allocated_width(vc->gfx.drawing_area) * ws;
+        wh = gtk_widget_get_allocated_height(vc->gfx.drawing_area) * ws;
 
-    if (vc->gfx.guest_fb.dmabuf &&
-        !qemu_dmabuf_get_draw_submitted(vc->gfx.guest_fb.dmabuf)) {
         graphic_hw_gl_block(vc->gfx.dcl.con, true);
-        qemu_dmabuf_set_draw_submitted(vc->gfx.guest_fb.dmabuf, true);
+        qemu_dmabuf_set_draw_submitted(dmabuf, true);
         gtk_gl_area_set_scanout_mode(vc, true);
+        if (!qemu_dmabuf_get_render_sync(dmabuf)) {
+            int fence_fd;
+            glBindFramebuffer(GL_READ_FRAMEBUFFER, vc->gfx.guest_fb.framebuffer);
+            /* GtkGLArea sets GL_DRAW_FRAMEBUFFER for us */
+
+            glViewport(0, 0, ww, wh);
+            y1 = vc->gfx.y0_top ? 0 : vc->gfx.h;
+            y2 = vc->gfx.y0_top ? vc->gfx.h : 0;
+            glBlitFramebuffer(0, y1, vc->gfx.w, y2,
+                              0, 0, ww, wh,
+                              GL_COLOR_BUFFER_BIT, GL_NEAREST);
+
+            egl_dmabuf_create_fence(dmabuf);
+            fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
+            if (fence_fd >= 0) {
+                qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
+            } else {
+                graphic_hw_gl_block(vc->gfx.dcl.con, false);
+            }
+        }
     }
+#endif
+
     gtk_gl_area_queue_render(GTK_GL_AREA(vc->gfx.drawing_area));
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 2/4] ui/egl-helpers: Consolidates create-sync and create-fence
  2024-06-20 23:17 ` [RFC PATCH 2/4] ui/egl-helpers: Consolidates create-sync and create-fence dongwon.kim
@ 2024-07-02  8:32   ` Marc-André Lureau
  2024-07-02 18:12     ` Kim, Dongwon
  0 siblings, 1 reply; 12+ messages in thread
From: Marc-André Lureau @ 2024-07-02  8:32 UTC (permalink / raw)
  To: dongwon.kim; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 6686 bytes --]

Hi

On Fri, Jun 21, 2024 at 3:19 AM <dongwon.kim@intel.com> wrote:

> From: Dongwon Kim <dongwon.kim@intel.com>
>
> There is no reason to split those two operations so combining
> two functions - egl_dmabuf_create_sync and egl_dmabuf_create_fence.
>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
>

Can you send this change as a different patch series, or earlier in the
series?

You should also remove qemu_dmabuf_{set,get}_sync() and associated fields

---
>  include/ui/egl-helpers.h |  3 +--
>  ui/egl-helpers.c         | 27 +++++++++++----------------
>  ui/gtk-egl.c             | 19 +++++++------------
>  ui/gtk-gl-area.c         | 10 ++--------
>  4 files changed, 21 insertions(+), 38 deletions(-)
>
> diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
> index 4b8c0d2281..606d6c8288 100644
> --- a/include/ui/egl-helpers.h
> +++ b/include/ui/egl-helpers.h
> @@ -51,8 +51,7 @@ int egl_get_fd_for_texture(uint32_t tex_id, EGLint
> *stride, EGLint *fourcc,
>
>  void egl_dmabuf_import_texture(QemuDmaBuf *dmabuf);
>  void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
> -void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf);
> -void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
> +int egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
>
>  #endif
>
> diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
> index 99b2ebbe23..e16f2cb23d 100644
> --- a/ui/egl-helpers.c
> +++ b/ui/egl-helpers.c
> @@ -371,34 +371,29 @@ void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf)
>      qemu_dmabuf_set_texture(dmabuf, 0);
>  }
>
> -void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
> +int egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
>  {
>      EGLSyncKHR sync;
> +    int fence_fd = -1;
>
>      if (epoxy_has_egl_extension(qemu_egl_display,
>                                  "EGL_KHR_fence_sync") &&
>          epoxy_has_egl_extension(qemu_egl_display,
> -                                "EGL_ANDROID_native_fence_sync")) {
> +                                "EGL_ANDROID_native_fence_sync") &&
> +        qemu_dmabuf_get_fence_fd(dmabuf) == -1) {
>          sync = eglCreateSyncKHR(qemu_egl_display,
>                                  EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
>          if (sync != EGL_NO_SYNC_KHR) {
> -            qemu_dmabuf_set_sync(dmabuf, sync);
> +            fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
> +                                                  sync);
> +            if (fence_fd >= 0) {
> +                qemu_dmabuf_set_fence_fd(dmabuf, fence_fd);
> +            }
> +            eglDestroySyncKHR(qemu_egl_display, sync);
>          }
>      }
> -}
> -
> -void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
> -{
> -    void *sync = qemu_dmabuf_get_sync(dmabuf);
> -    int fence_fd;
>
> -    if (sync) {
> -        fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
> -                                              sync);
> -        qemu_dmabuf_set_fence_fd(dmabuf, fence_fd);
> -        eglDestroySyncKHR(qemu_egl_display, sync);
> -        qemu_dmabuf_set_sync(dmabuf, NULL);
> -    }
> +    return fence_fd;
>

It should probably return qemu_dmabuf_get_fence_fd() instead. The function
should be renamed with _fd postfix since you make it return the fd.

 }
>
>  #endif /* CONFIG_GBM */
> diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
> index 9831c10e1b..55199f8659 100644
> --- a/ui/gtk-egl.c
> +++ b/ui/gtk-egl.c
> @@ -68,7 +68,6 @@ void gd_egl_draw(VirtualConsole *vc)
>      GdkWindow *window;
>  #ifdef CONFIG_GBM
>      QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
> -    int fence_fd;
>  #endif
>      int ww, wh, ws;
>
> @@ -99,13 +98,12 @@ void gd_egl_draw(VirtualConsole *vc)
>          glFlush();
>  #ifdef CONFIG_GBM
>          if (dmabuf) {
> -            egl_dmabuf_create_fence(dmabuf);
> -            fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
> +            fence_fd = egl_dmabuf_create_fence(dmabuf);
>              if (fence_fd >= 0) {
>                  qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
> -                return;
> +            } else {
> +                graphic_hw_gl_block(vc->gfx.dcl.con, false);
>              }
> -            graphic_hw_gl_block(vc->gfx.dcl.con, false);
>          }
>  #endif
>      } else {
> @@ -336,7 +334,11 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
>  {
>      VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
>      GdkWindow *window;
> +#ifdef CONFIG_GBM
> +    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
> +#endif
>      int ww, wh, ws;
> +    int fence_fd;
>
>      if (!vc->gfx.scanout_mode) {
>          return;
> @@ -364,12 +366,6 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
>          egl_fb_blit(&vc->gfx.win_fb, &vc->gfx.guest_fb, !vc->gfx.y0_top);
>      }
>
> -#ifdef CONFIG_GBM
> -    if (vc->gfx.guest_fb.dmabuf) {
> -        egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
> -    }
> -#endif
> -
>      eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
>  }
>
> @@ -387,7 +383,6 @@ void gd_egl_flush(DisplayChangeListener *dcl,
>          gtk_widget_queue_draw_area(area, x, y, w, h);
>          return;
>      }
> -
>      gd_egl_scanout_flush(&vc->gfx.dcl, x, y, w, h);
>  }
>
> diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
> index b628b35451..0b11423824 100644
> --- a/ui/gtk-gl-area.c
> +++ b/ui/gtk-gl-area.c
> @@ -77,17 +77,10 @@ void gd_gl_area_draw(VirtualConsole *vc)
>          glBlitFramebuffer(0, y1, vc->gfx.w, y2,
>                            0, 0, ww, wh,
>                            GL_COLOR_BUFFER_BIT, GL_NEAREST);
> -#ifdef CONFIG_GBM
> -        if (dmabuf) {
> -            egl_dmabuf_create_sync(dmabuf);
> -        }
> -#endif
> -        glFlush();
>  #ifdef CONFIG_GBM
>          if (dmabuf) {
>              int fence_fd;
> -            egl_dmabuf_create_fence(dmabuf);
> -            fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
> +            fence_fd = egl_dmabuf_create_fence(dmabuf);
>              if (fence_fd >= 0) {
>                  qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed, NULL, vc);
>                  return;
> @@ -95,6 +88,7 @@ void gd_gl_area_draw(VirtualConsole *vc)
>              graphic_hw_gl_block(vc->gfx.dcl.con, false);
>          }
>  #endif
> +        glFlush();
>      } else {
>          if (!vc->gfx.ds) {
>              return;
> --
> 2.34.1
>
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 8815 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render
  2024-06-20 23:17 [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render dongwon.kim
                   ` (3 preceding siblings ...)
  2024-06-20 23:17 ` [RFC PATCH 4/4] ui/gtk-gl-draw: " dongwon.kim
@ 2024-07-02 10:29 ` Marc-André Lureau
  2024-07-02 18:11   ` Kim, Dongwon
  4 siblings, 1 reply; 12+ messages in thread
From: Marc-André Lureau @ 2024-07-02 10:29 UTC (permalink / raw)
  To: dongwon.kim, Gerd Hoffmann; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3030 bytes --]

Hi

On Fri, Jun 21, 2024 at 3:20 AM <dongwon.kim@intel.com> wrote:

> From: Dongwon Kim <dongwon.kim@intel.com>
>
> Introducing new virtio-gpu param, 'render_sync' when guest scanout blob
> is used (blob=true). The new param is used to specify when to start
> rendering a guest scanout frame.
>
> By default (and so far) rendering of the guest frame is started in
> the draw event to make sure guest display update is sychronized with
> host's vsync. But this method inevitably brings some extra wait because
> most of time, the draw event is not happening right after the guest
> scanout frame is flushed.
>

> This delay often makes the guest stuck at certain frame for too long and
> causes general performance degradation of graphic workloads on the guest's
> side especially when the display update rate is high. This unwanted perf
> drop can be reduced if the guest scanout frame is rendered as soon as it is
> flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
> pipeline can be unblocked a lot earlier in this case so that guest can
> move to the next display frame right away.
>
> However, this "asynchrounous" render mode may cause some waste of resources
> as the guest could produce more frames than what are actually displayed
> on the host display. This is similar as running rendering apps with no
> vblank
> or vsync option. This is why this feature should stay as optional.
>

Indeed, I don't see much point in doing so, it's pure waste. If you want to
benchmark something perhaps. But then, why not simply run a benchmark
offscreen?


>
> The param 'render_sync' is set to 'true' by default and this is in line
> with traditional way while setting it to 'false' is basically enabling
> this asynchronouse mode.
>
>
As it stands now, the option should actually be on the display backend
(gtk/gtk-egl), it's not implemented for other backends.

I am not convinced this is generally useful to be an extra option though.


> Dongwon Kim (4):
>   hw/display/virtio-gpu: Introducing render_sync param
>   ui/egl-helpers: Consolidates create-sync and create-fence
>   ui/gtk-egl: Start rendering of guest blob scanout if render_sync is
>     off
>   ui/gtk-gl-draw: Start rendering of guest blob scanout if render_sync
>     is off
>
>  include/hw/virtio/virtio-gpu.h  |  3 ++
>  include/ui/dmabuf.h             |  4 +-
>  include/ui/egl-helpers.h        |  3 +-
>  hw/display/vhost-user-gpu.c     |  3 +-
>  hw/display/virtio-gpu-udmabuf.c |  3 +-
>  hw/display/virtio-gpu.c         |  2 +
>  hw/vfio/display.c               |  3 +-
>  ui/dbus-listener.c              |  2 +-
>  ui/dmabuf.c                     | 11 +++-
>  ui/egl-helpers.c                | 27 ++++------
>  ui/gtk-egl.c                    | 93 ++++++++++++++++++---------------
>  ui/gtk-gl-area.c                | 90 +++++++++++++++++++------------
>  12 files changed, 146 insertions(+), 98 deletions(-)
>
> --
> 2.34.1
>
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 4492 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render
  2024-07-02 10:29 ` [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render Marc-André Lureau
@ 2024-07-02 18:11   ` Kim, Dongwon
  2024-07-03  6:26     ` Marc-André Lureau
  0 siblings, 1 reply; 12+ messages in thread
From: Kim, Dongwon @ 2024-07-02 18:11 UTC (permalink / raw)
  To: Marc-André Lureau, Gerd Hoffmann; +Cc: qemu-devel

Hi Marc-André,

On 7/2/2024 3:29 AM, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Jun 21, 2024 at 3:20 AM <dongwon.kim@intel.com 
> <mailto:dongwon.kim@intel.com>> wrote:
> 
>     From: Dongwon Kim <dongwon.kim@intel.com <mailto:dongwon.kim@intel.com>>
> 
>     Introducing new virtio-gpu param, 'render_sync' when guest scanout blob
>     is used (blob=true). The new param is used to specify when to start
>     rendering a guest scanout frame.
> 
>     By default (and so far) rendering of the guest frame is started in
>     the draw event to make sure guest display update is sychronized with
>     host's vsync. But this method inevitably brings some extra wait because
>     most of time, the draw event is not happening right after the guest
>     scanout frame is flushed.
> 
> 
>     This delay often makes the guest stuck at certain frame for too long and
>     causes general performance degradation of graphic workloads on the
>     guest's
>     side especially when the display update rate is high. This unwanted perf
>     drop can be reduced if the guest scanout frame is rendered as soon
>     as it is
>     flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
>     pipeline can be unblocked a lot earlier in this case so that guest can
>     move to the next display frame right away.
> 
>     However, this "asynchrounous" render mode may cause some waste of
>     resources
>     as the guest could produce more frames than what are actually displayed
>     on the host display. This is similar as running rendering apps with
>     no vblank
>     or vsync option. This is why this feature should stay as optional.
> 
> 
> Indeed, I don't see much point in doing so, it's pure waste. If you want 
> to benchmark something perhaps. But then, why not simply run a benchmark 
> offscreen?

Benchmark score is not the primary reason. The problem we want to avoid 
is the laggy display update due to too many asynchronous plane updates 
happening in the guest in certain situations like when moving SW mouse 
cursor rigorously on the guest. This is because the fence (as well as 
response for the resource-flush cmd) is signaled in the draw event.

> 
> 
>     The param 'render_sync' is set to 'true' by default and this is in line
>     with traditional way while setting it to 'false' is basically enabling
>     this asynchronouse mode.
> 
> 
> As it stands now, the option should actually be on the display backend 
> (gtk/gtk-egl), it's not implemented for other backends.

We can help to deploy this concept to other backends if needed.

> 
> I am not convinced this is generally useful to be an extra option though.

I initially thought this should be the default because the virtio-gpu 
guest should not be hold by the host for too long in any cases. And this 
new approach is comparable to the default mode with blob=false where the 
guest is released as soon as the resource flush is done. Only difference 
is the resource synchronization using the fence.

> 
>     Dongwon Kim (4):
>        hw/display/virtio-gpu: Introducing render_sync param
>        ui/egl-helpers: Consolidates create-sync and create-fence
>        ui/gtk-egl: Start rendering of guest blob scanout if render_sync is
>          off
>        ui/gtk-gl-draw: Start rendering of guest blob scanout if render_sync
>          is off
> 
>       include/hw/virtio/virtio-gpu.h  |  3 ++
>       include/ui/dmabuf.h             |  4 +-
>       include/ui/egl-helpers.h        |  3 +-
>       hw/display/vhost-user-gpu.c     |  3 +-
>       hw/display/virtio-gpu-udmabuf.c |  3 +-
>       hw/display/virtio-gpu.c         |  2 +
>       hw/vfio/display.c               |  3 +-
>       ui/dbus-listener.c              |  2 +-
>       ui/dmabuf.c                     | 11 +++-
>       ui/egl-helpers.c                | 27 ++++------
>       ui/gtk-egl.c                    | 93 ++++++++++++++++++---------------
>       ui/gtk-gl-area.c                | 90 +++++++++++++++++++------------
>       12 files changed, 146 insertions(+), 98 deletions(-)
> 
>     -- 
>     2.34.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 2/4] ui/egl-helpers: Consolidates create-sync and create-fence
  2024-07-02  8:32   ` Marc-André Lureau
@ 2024-07-02 18:12     ` Kim, Dongwon
  0 siblings, 0 replies; 12+ messages in thread
From: Kim, Dongwon @ 2024-07-02 18:12 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: qemu-devel

Hi Marc-André,

I will come up with a separate patch for this.

On 7/2/2024 1:32 AM, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Jun 21, 2024 at 3:19 AM <dongwon.kim@intel.com 
> <mailto:dongwon.kim@intel.com>> wrote:
> 
>     From: Dongwon Kim <dongwon.kim@intel.com <mailto:dongwon.kim@intel.com>>
> 
>     There is no reason to split those two operations so combining
>     two functions - egl_dmabuf_create_sync and egl_dmabuf_create_fence.
> 
>     Cc: Gerd Hoffmann <kraxel@redhat.com <mailto:kraxel@redhat.com>>
>     Cc: Marc-André Lureau <marcandre.lureau@redhat.com
>     <mailto:marcandre.lureau@redhat.com>>
>     Cc: Vivek Kasireddy <vivek.kasireddy@intel.com
>     <mailto:vivek.kasireddy@intel.com>>
>     Signed-off-by: Dongwon Kim <dongwon.kim@intel.com
>     <mailto:dongwon.kim@intel.com>>
> 
> 
> Can you send this change as a different patch series, or earlier in the 
> series?
> 
> You should also remove qemu_dmabuf_{set,get}_sync() and associated fields
> 
>     ---
>       include/ui/egl-helpers.h |  3 +--
>       ui/egl-helpers.c         | 27 +++++++++++----------------
>       ui/gtk-egl.c             | 19 +++++++------------
>       ui/gtk-gl-area.c         | 10 ++--------
>       4 files changed, 21 insertions(+), 38 deletions(-)
> 
>     diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
>     index 4b8c0d2281..606d6c8288 100644
>     --- a/include/ui/egl-helpers.h
>     +++ b/include/ui/egl-helpers.h
>     @@ -51,8 +51,7 @@ int egl_get_fd_for_texture(uint32_t tex_id, EGLint
>     *stride, EGLint *fourcc,
> 
>       void egl_dmabuf_import_texture(QemuDmaBuf *dmabuf);
>       void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
>     -void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf);
>     -void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
>     +int egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
> 
>       #endif
> 
>     diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
>     index 99b2ebbe23..e16f2cb23d 100644
>     --- a/ui/egl-helpers.c
>     +++ b/ui/egl-helpers.c
>     @@ -371,34 +371,29 @@ void egl_dmabuf_release_texture(QemuDmaBuf
>     *dmabuf)
>           qemu_dmabuf_set_texture(dmabuf, 0);
>       }
> 
>     -void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
>     +int egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
>       {
>           EGLSyncKHR sync;
>     +    int fence_fd = -1;
> 
>           if (epoxy_has_egl_extension(qemu_egl_display,
>                                       "EGL_KHR_fence_sync") &&
>               epoxy_has_egl_extension(qemu_egl_display,
>     -                                "EGL_ANDROID_native_fence_sync")) {
>     +                                "EGL_ANDROID_native_fence_sync") &&
>     +        qemu_dmabuf_get_fence_fd(dmabuf) == -1) {
>               sync = eglCreateSyncKHR(qemu_egl_display,
>                                       EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
>               if (sync != EGL_NO_SYNC_KHR) {
>     -            qemu_dmabuf_set_sync(dmabuf, sync);
>     +            fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
>     +                                                  sync);
>     +            if (fence_fd >= 0) {
>     +                qemu_dmabuf_set_fence_fd(dmabuf, fence_fd);
>     +            }
>     +            eglDestroySyncKHR(qemu_egl_display, sync);
>               }
>           }
>     -}
>     -
>     -void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
>     -{
>     -    void *sync = qemu_dmabuf_get_sync(dmabuf);
>     -    int fence_fd;
> 
>     -    if (sync) {
>     -        fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
>     -                                              sync);
>     -        qemu_dmabuf_set_fence_fd(dmabuf, fence_fd);
>     -        eglDestroySyncKHR(qemu_egl_display, sync);
>     -        qemu_dmabuf_set_sync(dmabuf, NULL);
>     -    }
>     +    return fence_fd;
> 
> 
> It should probably return qemu_dmabuf_get_fence_fd() instead. The 
> function should be renamed with _fd postfix since you make it return the fd.
> 
>       }
> 
>       #endif /* CONFIG_GBM */
>     diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
>     index 9831c10e1b..55199f8659 100644
>     --- a/ui/gtk-egl.c
>     +++ b/ui/gtk-egl.c
>     @@ -68,7 +68,6 @@ void gd_egl_draw(VirtualConsole *vc)
>           GdkWindow *window;
>       #ifdef CONFIG_GBM
>           QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
>     -    int fence_fd;
>       #endif
>           int ww, wh, ws;
> 
>     @@ -99,13 +98,12 @@ void gd_egl_draw(VirtualConsole *vc)
>               glFlush();
>       #ifdef CONFIG_GBM
>               if (dmabuf) {
>     -            egl_dmabuf_create_fence(dmabuf);
>     -            fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
>     +            fence_fd = egl_dmabuf_create_fence(dmabuf);
>                   if (fence_fd >= 0) {
>                       qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed,
>     NULL, vc);
>     -                return;
>     +            } else {
>     +                graphic_hw_gl_block(vc->gfx.dcl.con, false);
>                   }
>     -            graphic_hw_gl_block(vc->gfx.dcl.con, false);
>               }
>       #endif
>           } else {
>     @@ -336,7 +334,11 @@ void gd_egl_scanout_flush(DisplayChangeListener
>     *dcl,
>       {
>           VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
>           GdkWindow *window;
>     +#ifdef CONFIG_GBM
>     +    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
>     +#endif
>           int ww, wh, ws;
>     +    int fence_fd;
> 
>           if (!vc->gfx.scanout_mode) {
>               return;
>     @@ -364,12 +366,6 @@ void gd_egl_scanout_flush(DisplayChangeListener
>     *dcl,
>               egl_fb_blit(&vc->gfx.win_fb, &vc->gfx.guest_fb,
>     !vc->gfx.y0_top);
>           }
> 
>     -#ifdef CONFIG_GBM
>     -    if (vc->gfx.guest_fb.dmabuf) {
>     -        egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
>     -    }
>     -#endif
>     -
>           eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
>       }
> 
>     @@ -387,7 +383,6 @@ void gd_egl_flush(DisplayChangeListener *dcl,
>               gtk_widget_queue_draw_area(area, x, y, w, h);
>               return;
>           }
>     -
>           gd_egl_scanout_flush(&vc->gfx.dcl, x, y, w, h);
>       }
> 
>     diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
>     index b628b35451..0b11423824 100644
>     --- a/ui/gtk-gl-area.c
>     +++ b/ui/gtk-gl-area.c
>     @@ -77,17 +77,10 @@ void gd_gl_area_draw(VirtualConsole *vc)
>               glBlitFramebuffer(0, y1, vc->gfx.w, y2,
>                                 0, 0, ww, wh,
>                                 GL_COLOR_BUFFER_BIT, GL_NEAREST);
>     -#ifdef CONFIG_GBM
>     -        if (dmabuf) {
>     -            egl_dmabuf_create_sync(dmabuf);
>     -        }
>     -#endif
>     -        glFlush();
>       #ifdef CONFIG_GBM
>               if (dmabuf) {
>                   int fence_fd;
>     -            egl_dmabuf_create_fence(dmabuf);
>     -            fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
>     +            fence_fd = egl_dmabuf_create_fence(dmabuf);
>                   if (fence_fd >= 0) {
>                       qemu_set_fd_handler(fence_fd, gd_hw_gl_flushed,
>     NULL, vc);
>                       return;
>     @@ -95,6 +88,7 @@ void gd_gl_area_draw(VirtualConsole *vc)
>                   graphic_hw_gl_block(vc->gfx.dcl.con, false);
>               }
>       #endif
>     +        glFlush();
>           } else {
>               if (!vc->gfx.ds) {
>                   return;
>     -- 
>     2.34.1
> 
> 
> 
> 
> -- 
> Marc-André Lureau



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render
  2024-07-02 18:11   ` Kim, Dongwon
@ 2024-07-03  6:26     ` Marc-André Lureau
  2024-07-05 18:24       ` Kim, Dongwon
  2024-07-08 16:40       ` Kim, Dongwon
  0 siblings, 2 replies; 12+ messages in thread
From: Marc-André Lureau @ 2024-07-03  6:26 UTC (permalink / raw)
  To: Kim, Dongwon; +Cc: Gerd Hoffmann, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4930 bytes --]

Hi

On Tue, Jul 2, 2024 at 10:11 PM Kim, Dongwon <dongwon.kim@intel.com> wrote:

> Hi Marc-André,
>
> On 7/2/2024 3:29 AM, Marc-André Lureau wrote:
> > Hi
> >
> > On Fri, Jun 21, 2024 at 3:20 AM <dongwon.kim@intel.com
> > <mailto:dongwon.kim@intel.com>> wrote:
> >
> >     From: Dongwon Kim <dongwon.kim@intel.com <mailto:
> dongwon.kim@intel.com>>
> >
> >     Introducing new virtio-gpu param, 'render_sync' when guest scanout
> blob
> >     is used (blob=true). The new param is used to specify when to start
> >     rendering a guest scanout frame.
> >
> >     By default (and so far) rendering of the guest frame is started in
> >     the draw event to make sure guest display update is sychronized with
> >     host's vsync. But this method inevitably brings some extra wait
> because
> >     most of time, the draw event is not happening right after the guest
> >     scanout frame is flushed.
> >
> >
> >     This delay often makes the guest stuck at certain frame for too long
> and
> >     causes general performance degradation of graphic workloads on the
> >     guest's
> >     side especially when the display update rate is high. This unwanted
> perf
> >     drop can be reduced if the guest scanout frame is rendered as soon
> >     as it is
> >     flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
> >     pipeline can be unblocked a lot earlier in this case so that guest
> can
> >     move to the next display frame right away.
> >
> >     However, this "asynchrounous" render mode may cause some waste of
> >     resources
> >     as the guest could produce more frames than what are actually
> displayed
> >     on the host display. This is similar as running rendering apps with
> >     no vblank
> >     or vsync option. This is why this feature should stay as optional.
> >
> >
> > Indeed, I don't see much point in doing so, it's pure waste. If you want
> > to benchmark something perhaps. But then, why not simply run a benchmark
> > offscreen?
>
> Benchmark score is not the primary reason. The problem we want to avoid
> is the laggy display update due to too many asynchronous plane updates
> happening in the guest in certain situations like when moving SW mouse
> cursor rigorously on the guest. This is because the fence (as well as
> response for the resource-flush cmd) is signaled in the draw event.
>
>
Presumably, you won't get more frames displayed (perhaps even less due to
extra load), so why is it laggy? Is it because the guest is doing too much
buffering? Because the mouse event queue isn't being drained between
scanouts?


> >
> >
> >     The param 'render_sync' is set to 'true' by default and this is in
> line
> >     with traditional way while setting it to 'false' is basically
> enabling
> >     this asynchronouse mode.
> >
> >
> > As it stands now, the option should actually be on the display backend
> > (gtk/gtk-egl), it's not implemented for other backends.
>
> We can help to deploy this concept to other backends if needed.
>

Why not make it a gtk display option instead?


> >
> > I am not convinced this is generally useful to be an extra option though.
>
> I initially thought this should be the default because the virtio-gpu
> guest should not be hold by the host for too long in any cases. And this
> new approach is comparable to the default mode with blob=false where the
> guest is released as soon as the resource flush is done. Only difference
> is the resource synchronization using the fence.
>

virgl should be blocking rendering too

Could you detail your testing setup ?

thanks


> >
> >     Dongwon Kim (4):
> >        hw/display/virtio-gpu: Introducing render_sync param
> >        ui/egl-helpers: Consolidates create-sync and create-fence
> >        ui/gtk-egl: Start rendering of guest blob scanout if render_sync
> is
> >          off
> >        ui/gtk-gl-draw: Start rendering of guest blob scanout if
> render_sync
> >          is off
> >
> >       include/hw/virtio/virtio-gpu.h  |  3 ++
> >       include/ui/dmabuf.h             |  4 +-
> >       include/ui/egl-helpers.h        |  3 +-
> >       hw/display/vhost-user-gpu.c     |  3 +-
> >       hw/display/virtio-gpu-udmabuf.c |  3 +-
> >       hw/display/virtio-gpu.c         |  2 +
> >       hw/vfio/display.c               |  3 +-
> >       ui/dbus-listener.c              |  2 +-
> >       ui/dmabuf.c                     | 11 +++-
> >       ui/egl-helpers.c                | 27 ++++------
> >       ui/gtk-egl.c                    | 93
> ++++++++++++++++++---------------
> >       ui/gtk-gl-area.c                | 90
> +++++++++++++++++++------------
> >       12 files changed, 146 insertions(+), 98 deletions(-)
> >
> >     --
> >     2.34.1
> >
> >
> >
> >
> > --
> > Marc-André Lureau
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 6999 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render
  2024-07-03  6:26     ` Marc-André Lureau
@ 2024-07-05 18:24       ` Kim, Dongwon
  2024-07-08 16:40       ` Kim, Dongwon
  1 sibling, 0 replies; 12+ messages in thread
From: Kim, Dongwon @ 2024-07-05 18:24 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Gerd Hoffmann, qemu-devel

On 7/2/2024 11:26 PM, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Jul 2, 2024 at 10:11 PM Kim, Dongwon <dongwon.kim@intel.com 
> <mailto:dongwon.kim@intel.com>> wrote:
> 
>     Hi Marc-André,
> 
>     On 7/2/2024 3:29 AM, Marc-André Lureau wrote:
>      > Hi
>      >
>      > On Fri, Jun 21, 2024 at 3:20 AM <dongwon.kim@intel.com
>     <mailto:dongwon.kim@intel.com>
>      > <mailto:dongwon.kim@intel.com <mailto:dongwon.kim@intel.com>>> wrote:
>      >
>      >     From: Dongwon Kim <dongwon.kim@intel.com
>     <mailto:dongwon.kim@intel.com> <mailto:dongwon.kim@intel.com
>     <mailto:dongwon.kim@intel.com>>>
>      >
>      >     Introducing new virtio-gpu param, 'render_sync' when guest
>     scanout blob
>      >     is used (blob=true). The new param is used to specify when to
>     start
>      >     rendering a guest scanout frame.
>      >
>      >     By default (and so far) rendering of the guest frame is
>     started in
>      >     the draw event to make sure guest display update is
>     sychronized with
>      >     host's vsync. But this method inevitably brings some extra
>     wait because
>      >     most of time, the draw event is not happening right after the
>     guest
>      >     scanout frame is flushed.
>      >
>      >
>      >     This delay often makes the guest stuck at certain frame for
>     too long and
>      >     causes general performance degradation of graphic workloads
>     on the
>      >     guest's
>      >     side especially when the display update rate is high. This
>     unwanted perf
>      >     drop can be reduced if the guest scanout frame is rendered as
>     soon
>      >     as it is
>      >     flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl
>     display
>      >     pipeline can be unblocked a lot earlier in this case so that
>     guest can
>      >     move to the next display frame right away.
>      >
>      >     However, this "asynchrounous" render mode may cause some waste of
>      >     resources
>      >     as the guest could produce more frames than what are actually
>     displayed
>      >     on the host display. This is similar as running rendering
>     apps with
>      >     no vblank
>      >     or vsync option. This is why this feature should stay as
>     optional.
>      >
>      >
>      > Indeed, I don't see much point in doing so, it's pure waste. If
>     you want
>      > to benchmark something perhaps. But then, why not simply run a
>     benchmark
>      > offscreen?
> 
>     Benchmark score is not the primary reason. The problem we want to avoid
>     is the laggy display update due to too many asynchronous plane updates
>     happening in the guest in certain situations like when moving SW mouse
>     cursor rigorously on the guest. This is because the fence (as well as
>     response for the resource-flush cmd) is signaled in the draw event.
> 
> 
> Presumably, you won't get more frames displayed (perhaps even less due 
> to extra load), so why is it laggy? Is it because the guest is doing too 
> much buffering? 

Yes, I believe so. Virtio-gpu driver can't issue another resource flush 
as the previous frame is still on hold by the host. But there are many 
plane-update requests due to frame buffer updates because of the 
movement of SW mouse cursor in between. BTW, we currently use dirtyFB 
update on the guest running Xorg. Maybe using pageflip would make things 
better but we haven't tried it yet.

Because the mouse event queue isn't being drained
> between scanouts?



> 
>      >
>      >
>      >     The param 'render_sync' is set to 'true' by default and this
>     is in line
>      >     with traditional way while setting it to 'false' is basically
>     enabling
>      >     this asynchronouse mode.
>      >
>      >
>      > As it stands now, the option should actually be on the display
>     backend
>      > (gtk/gtk-egl), it's not implemented for other backends.
> 
>     We can help to deploy this concept to other backends if needed.
> 
> 
> Why not make it a gtk display option instead?

That is possible but I made it virtio-gpu option as this is specific to 
virtio-gpu backend. We can consider moving it to gtk layer if it makes 
more sense.

> 
> 
>      >
>      > I am not convinced this is generally useful to be an extra option
>     though.
> 
>     I initially thought this should be the default because the virtio-gpu
>     guest should not be hold by the host for too long in any cases. And
>     this
>     new approach is comparable to the default mode with blob=false where
>     the
>     guest is released as soon as the resource flush is done. Only
>     difference
>     is the resource synchronization using the fence.
> 
> 
> virgl should be blocking rendering too
> 
> Could you detail your testing setup ?

We use ubuntu linux as a guest OS but it's modified for our GFX SRIOV
setup. I am not sure if this is the details you are looking for but here 
is it: we virtualize our GPU using SRIOV. So individual linux guest have 
their dedicated portion of GPU device that is detected and worked as 
"render-only" device. And since it is render-only, we pair it with 
virtio-gpu device/driver that provides display feature. Scanouts on the 
guest are allocated by virtio-gpu then imported by Mesa driver then used 
as a main framebuffer. On every plane update done by compositor will get
virtio-gpu driver to issue resource-flush (followed by set scanout) to 
host. Host (QEMU) gets this scanout data as blob (scatter-gather list) 
then creates its own dmabuf using udmabuf driver. At every resource 
flush, we block the render pipeline as well as the guest until the 
rendering if finished for scanout synchronization as you know.

We do not use virgl (instead, the normal HW driver for SRIOV-Virtual 
device is used).

So this idea of having render_sync=false is based on

1. glBlitFrameBuffer can be done anytime after resource-flush is received.
2. the guest can be unblocked whenever the blob is used (cloned by 
glBlitFrameBuffer).

> 
> thanks
> 
> 
>      >
>      >     Dongwon Kim (4):
>      >        hw/display/virtio-gpu: Introducing render_sync param
>      >        ui/egl-helpers: Consolidates create-sync and create-fence
>      >        ui/gtk-egl: Start rendering of guest blob scanout if
>     render_sync is
>      >          off
>      >        ui/gtk-gl-draw: Start rendering of guest blob scanout if
>     render_sync
>      >          is off
>      >
>      >       include/hw/virtio/virtio-gpu.h  |  3 ++
>      >       include/ui/dmabuf.h             |  4 +-
>      >       include/ui/egl-helpers.h        |  3 +-
>      >       hw/display/vhost-user-gpu.c     |  3 +-
>      >       hw/display/virtio-gpu-udmabuf.c |  3 +-
>      >       hw/display/virtio-gpu.c         |  2 +
>      >       hw/vfio/display.c               |  3 +-
>      >       ui/dbus-listener.c              |  2 +-
>      >       ui/dmabuf.c                     | 11 +++-
>      >       ui/egl-helpers.c                | 27 ++++------
>      >       ui/gtk-egl.c                    | 93
>     ++++++++++++++++++---------------
>      >       ui/gtk-gl-area.c                | 90
>     +++++++++++++++++++------------
>      >       12 files changed, 146 insertions(+), 98 deletions(-)
>      >
>      >     --
>      >     2.34.1
>      >
>      >
>      >
>      >
>      > --
>      > Marc-André Lureau
> 
> 
> 
> -- 
> Marc-André Lureau



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render
  2024-07-03  6:26     ` Marc-André Lureau
  2024-07-05 18:24       ` Kim, Dongwon
@ 2024-07-08 16:40       ` Kim, Dongwon
  1 sibling, 0 replies; 12+ messages in thread
From: Kim, Dongwon @ 2024-07-08 16:40 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Gerd Hoffmann, qemu-devel

On 7/2/2024 11:26 PM, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Jul 2, 2024 at 10:11 PM Kim, Dongwon <dongwon.kim@intel.com 
> <mailto:dongwon.kim@intel.com>> wrote:
> 
>     Hi Marc-André,
> 
>     On 7/2/2024 3:29 AM, Marc-André Lureau wrote:
>      > Hi
>      >
>      > On Fri, Jun 21, 2024 at 3:20 AM <dongwon.kim@intel.com
>     <mailto:dongwon.kim@intel.com>
>      > <mailto:dongwon.kim@intel.com <mailto:dongwon.kim@intel.com>>> wrote:
>      >
>      >     From: Dongwon Kim <dongwon.kim@intel.com
>     <mailto:dongwon.kim@intel.com> <mailto:dongwon.kim@intel.com
>     <mailto:dongwon.kim@intel.com>>>
>      >
>      >     Introducing new virtio-gpu param, 'render_sync' when guest
>     scanout blob
>      >     is used (blob=true). The new param is used to specify when to
>     start
>      >     rendering a guest scanout frame.
>      >
>      >     By default (and so far) rendering of the guest frame is
>     started in
>      >     the draw event to make sure guest display update is
>     sychronized with
>      >     host's vsync. But this method inevitably brings some extra
>     wait because
>      >     most of time, the draw event is not happening right after the
>     guest
>      >     scanout frame is flushed.
>      >
>      >
>      >     This delay often makes the guest stuck at certain frame for
>     too long and
>      >     causes general performance degradation of graphic workloads
>     on the
>      >     guest's
>      >     side especially when the display update rate is high. This
>     unwanted perf
>      >     drop can be reduced if the guest scanout frame is rendered as
>     soon
>      >     as it is
>      >     flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl
>     display
>      >     pipeline can be unblocked a lot earlier in this case so that
>     guest can
>      >     move to the next display frame right away.
>      >
>      >     However, this "asynchrounous" render mode may cause some waste of
>      >     resources
>      >     as the guest could produce more frames than what are actually
>     displayed
>      >     on the host display. This is similar as running rendering
>     apps with
>      >     no vblank
>      >     or vsync option. This is why this feature should stay as
>     optional.
>      >
>      >
>      > Indeed, I don't see much point in doing so, it's pure waste. If
>     you want
>      > to benchmark something perhaps. But then, why not simply run a
>     benchmark
>      > offscreen?
> 
>     Benchmark score is not the primary reason. The problem we want to avoid
>     is the laggy display update due to too many asynchronous plane updates
>     happening in the guest in certain situations like when moving SW mouse
>     cursor rigorously on the guest. This is because the fence (as well as
>     response for the resource-flush cmd) is signaled in the draw event.
> 
> 
> Presumably, you won't get more frames displayed (perhaps even less due 
> to extra load), so why is it laggy? Is it because the guest is doing too 
> much buffering? 

Yes, I believe so. Virtio-gpu driver can't issue another resource flush 
as the previous frame is still on hold by the host. But there are many 
plane-update requests due to frame buffer updates because of the 
movement of SW mouse cursor in between. BTW, we currently use dirtyFB 
update on the guest running Xorg. Maybe using pageflip would make things 
better but we haven't tried it yet.

Because the mouse event queue isn't being drained
> between scanouts?



> 
>      >
>      >
>      >     The param 'render_sync' is set to 'true' by default and this
>     is in line
>      >     with traditional way while setting it to 'false' is basically
>     enabling
>      >     this asynchronouse mode.
>      >
>      >
>      > As it stands now, the option should actually be on the display
>     backend
>      > (gtk/gtk-egl), it's not implemented for other backends.
> 
>     We can help to deploy this concept to other backends if needed.
> 
> 
> Why not make it a gtk display option instead?

That is possible but I made it virtio-gpu option as this is specific to 
virtio-gpu backend. We can consider moving it to gtk layer if it makes 
more sense.

> 
> 
>      >
>      > I am not convinced this is generally useful to be an extra option
>     though.
> 
>     I initially thought this should be the default because the virtio-gpu
>     guest should not be hold by the host for too long in any cases. And
>     this
>     new approach is comparable to the default mode with blob=false where
>     the
>     guest is released as soon as the resource flush is done. Only
>     difference
>     is the resource synchronization using the fence.
> 
> 
> virgl should be blocking rendering too
> 
> Could you detail your testing setup ?

We use ubuntu linux as a guest OS but it's modified for our GFX SRIOV
setup. I am not sure if this is the details you are looking for but here 
is it: we virtualize our GPU using SRIOV. So individual linux guest have 
their dedicated portion of GPU device that is detected and worked as 
"render-only" device. And since it is render-only, we pair it with 
virtio-gpu device/driver that provides display function. Scanouts on the 
guest are allocated by virtio-gpu then imported by Mesa driver then used 
as a main framebuffer. On every plane update done by compositor will get
virtio-gpu driver to issue resource-flush (followed by set scanout) to 
host. Host (QEMU) gets this scanout data as blob (scatter-gather list) 
then creates its own dmabuf using udmabuf driver. At every resource 
flush, we block the render pipeline as well as the guest until the 
rendering if finished for scanout synchronization as you know.

We do not use virgl (instead, the normal HW driver for SRIOV-Virtual 
device is used).

> 
> thanks
> 
> 
>      >
>      >     Dongwon Kim (4):
>      >        hw/display/virtio-gpu: Introducing render_sync param
>      >        ui/egl-helpers: Consolidates create-sync and create-fence
>      >        ui/gtk-egl: Start rendering of guest blob scanout if
>     render_sync is
>      >          off
>      >        ui/gtk-gl-draw: Start rendering of guest blob scanout if
>     render_sync
>      >          is off
>      >
>      >       include/hw/virtio/virtio-gpu.h  |  3 ++
>      >       include/ui/dmabuf.h             |  4 +-
>      >       include/ui/egl-helpers.h        |  3 +-
>      >       hw/display/vhost-user-gpu.c     |  3 +-
>      >       hw/display/virtio-gpu-udmabuf.c |  3 +-
>      >       hw/display/virtio-gpu.c         |  2 +
>      >       hw/vfio/display.c               |  3 +-
>      >       ui/dbus-listener.c              |  2 +-
>      >       ui/dmabuf.c                     | 11 +++-
>      >       ui/egl-helpers.c                | 27 ++++------
>      >       ui/gtk-egl.c                    | 93
>     ++++++++++++++++++---------------
>      >       ui/gtk-gl-area.c                | 90
>     +++++++++++++++++++------------
>      >       12 files changed, 146 insertions(+), 98 deletions(-)
>      >
>      >     --
>      >     2.34.1
>      >
>      >
>      >
>      >
>      > --
>      > Marc-André Lureau
> 
> 
> 
> -- 
> Marc-André Lureau



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-07-08 16:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-20 23:17 [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render dongwon.kim
2024-06-20 23:17 ` [RFC PATCH 1/4] hw/display/virtio-gpu: Introducing render_sync param dongwon.kim
2024-06-20 23:17 ` [RFC PATCH 2/4] ui/egl-helpers: Consolidates create-sync and create-fence dongwon.kim
2024-07-02  8:32   ` Marc-André Lureau
2024-07-02 18:12     ` Kim, Dongwon
2024-06-20 23:17 ` [RFC PATCH 3/4] ui/gtk-egl: Start rendering of guest blob scanout if render_sync is off dongwon.kim
2024-06-20 23:17 ` [RFC PATCH 4/4] ui/gtk-gl-draw: " dongwon.kim
2024-07-02 10:29 ` [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render Marc-André Lureau
2024-07-02 18:11   ` Kim, Dongwon
2024-07-03  6:26     ` Marc-André Lureau
2024-07-05 18:24       ` Kim, Dongwon
2024-07-08 16:40       ` Kim, Dongwon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).