qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
@ 2007-03-14  1:52 Julian Seward
  2007-03-14  3:39 ` Anthony Liguori
  2007-03-16 18:40 ` Anthony Liguori
  0 siblings, 2 replies; 11+ messages in thread
From: Julian Seward @ 2007-03-14  1:52 UTC (permalink / raw)
  To: qemu-devel


Here is a somewhat revised version of a patch I first made nearly
three years ago.  The original thread is 

http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00263.html

The patch makes QEMU's graphics emulation much more usable over remote
X connections, by reducing the amount of data sent to the X server.
This is particularly noticeable for small display updates, most
importantly mouse cursor movements, which become faster and so 
generally make the guest's GUI more pleasant to use.

Compared to the original patch, this one:

- is relative to 0.9.0

- handle screen->format->BytesPerPixel values of both 2 and 4,
  so handles most guest depths - I tested 8, 16, 24bpp.

- has unrolled comparison/copy loops for the depth 2 case.  Most
  of the comparisons are short (<= 64 bytes) so I don't see much
  point in taking the overhead of a call to memcmp/memcpy.

- most importantly, is optional and disabled by default, so that
  default performance is unchanged.  To use it you need the new
  -remote-x11 flag (perhaps -low-bandwidth-x11 would be a better 
  name).

It still uses a shadow frame buffer.  Fabrice mentioned this is not
necessary

 http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00279.html

but I can't see how to get rid of it and still check for redundant
updates in NxN pixel blocks (N=32 by default).  The point of checking
NxN squares is that mouse pointer pixmaps are square and so the most
common display updates - mouse pointer movements - are often reduced
to transmission of a single 32x32 block using this strategy.

The shadow framebuffer is only allocated when -remote-x11 is present,
so the patch has no effect on default memory use.

I measured the bandwidth saving roughly by resuming a vm snapshot
containing a web browser showing a page with a lot of links.  I moved
the pointer slowly over the links (so they change colour) and scrolled
up and down a bit; about 1/2 minute of activity in total.  I tried to do
the same with and without -remote-x11.  Without -remote-x11, 163Mbyte
was transmitted to the X server; with it, 20.6Mbyte was, about an 8:1
reduction.

J


diff -u -r Orig/qemu-0.9.0/sdl.c qemu-0.9.0/sdl.c
--- Orig/qemu-0.9.0/sdl.c	2007-02-05 23:01:54.000000000 +0000
+++ qemu-0.9.0/sdl.c	2007-03-13 22:16:40.000000000 +0000
@@ -29,6 +29,8 @@
 #include <signal.h>
 #endif
 
+#include <assert.h>
+
 static SDL_Surface *screen;
 static int gui_grab; /* if true, all keyboard/mouse events are grabbed */
 static int last_vm_running;
@@ -44,17 +46,232 @@
 static SDL_Cursor *sdl_cursor_hidden;
 static int absolute_enabled = 0;
 
+/* Mechanism to reduce the total amount of data transmitted to the X
+   server, often quite dramatically.  Keep a shadow copy of video
+   memory in alt_pixels, and when asked to update a rectangle, use
+   the shadow copy to establish areas which are the same, and so do
+   not need updating.
+*/
+
+static void* alt_pixels = NULL;
+
+#define THRESH 32
+
+/* Return 1 if the area [x .. x+w-1, y .. y+w-1] is different from
+   the old version and so needs updating. */
+static int cmpArea_16bit ( int x, int y, int w, int h )
+{
+           int    i, j;
+  unsigned int    sll;
+  unsigned short* p1base = (unsigned short*)screen->pixels;
+  unsigned short* p2base = (unsigned short*)alt_pixels;
+  assert(screen->format->BytesPerPixel == 2);
+  if (w == 0 || h == 0)
+     return 0;
+  assert(w > 0 && h > 0);
+  sll = ((unsigned int)screen->pitch) >> 1;
+  for (j = y; j < y+h; j++) {
+    unsigned short* p1p = & p1base[j * sll + x];
+    unsigned short* p2p = & p2base[j * sll + x];
+    for (i = 0; i < w-5; i += 5) {
+      if (p1p[i+0] != p2p[i+0]) return 1;
+      if (p1p[i+1] != p2p[i+1]) return 1;
+      if (p1p[i+2] != p2p[i+2]) return 1;
+      if (p1p[i+3] != p2p[i+3]) return 1;
+      if (p1p[i+4] != p2p[i+4]) return 1;
+    }
+    for (/*fixup*/; i < w; i++) {
+      if (p1p[i+0] != p2p[i+0]) return 1;
+    }
+  }
+  return 0;
+}
+static void copyArea_16bit ( int x, int y, int w, int h )
+{
+           int    i, j;
+  unsigned int    sll;
+  unsigned short* p1base = (unsigned short*)screen->pixels;
+  unsigned short* p2base = (unsigned short*)alt_pixels;
+  assert(screen->format->BytesPerPixel == 2);
+  sll = ((unsigned int)screen->pitch) >> 1;
+  if (w == 0 || h == 0)
+     return;
+  assert(w > 0 && h > 0);
+  for (j = y; j < y+h; j++) {
+    unsigned short* p1p = & p1base[j * sll + x];
+    unsigned short* p2p = & p2base[j * sll + x];
+    for (i = 0; i < w-5; i += 5) {
+      p2p[i+0] = p1p[i+0];
+      p2p[i+1] = p1p[i+1];
+      p2p[i+2] = p1p[i+2];
+      p2p[i+3] = p1p[i+3];
+      p2p[i+4] = p1p[i+4];
+    }
+    for (/*fixup*/; i < w; i++) {
+      p2p[i+0] = p1p[i+0];
+    }
+  }
+}
+
+static int cmpArea_32bit ( int x, int y, int w, int h )
+{
+  int           i, j;
+  unsigned int  sll;
+  unsigned int* p1base = (unsigned int*)screen->pixels;
+  unsigned int* p2base = (unsigned int*)alt_pixels;
+  assert(screen->format->BytesPerPixel == 4);
+  sll = ((unsigned int)screen->pitch) >> 2;
+  for (j = y; j < y+h; j++) {
+    for (i = x; i < x+w; i++) {
+      if (p1base [j * sll +i] != p2base [j * sll +i])
+	return 1;
+    }
+  }
+  return 0;
+}
+static void copyArea_32bit ( int x, int y, int w, int h )
+{
+  int           i, j;
+  unsigned int  sll;
+  unsigned int* p1base = (unsigned int*)screen->pixels;
+  unsigned int* p2base = (unsigned int*)alt_pixels;
+  assert(screen->format->BytesPerPixel == 4);
+  sll = ((unsigned int)screen->pitch) >> 2;
+  for (j = y; j < y+h; j++) {
+    for (i = x; i < x+w; i++) {
+      p2base [j * sll +i] = p1base [j * sll +i];
+    }
+  }
+}
+
+
+static void econoUpdate (DisplayState *ds, int x, int y, 
+                         unsigned int w, unsigned int h)
+{
+   static int tested_total = 0;
+   static int update_total = 0;
+
+   int  (*cmpArea) (int, int, int, int) = NULL;
+   void (*copyArea)(int, int, int, int) = NULL;
+
+   int xi, xj, xlim, yi, yj, ylim, ntest, nupd;
+   if (w == 0 || h == 0) 
+      return;
+
+   if (screen->format->BytesPerPixel == 4) {
+      cmpArea = cmpArea_32bit;
+      copyArea = copyArea_32bit;
+   }
+   else
+   if (screen->format->BytesPerPixel == 2) {
+      cmpArea = cmpArea_16bit;
+      copyArea = copyArea_16bit;
+   }
+   else
+      assert(0); /* should not be reached - sdl_update guards against this */
+
+   xlim = x + w;
+   ylim = y + h;
+
+   ntest = nupd = 0;
+   for (xi = x; xi < xlim; xi += THRESH) {
+      xj = xi + THRESH;
+      if (xj > xlim) xj = xlim;
+      for (yi = y; yi < ylim; yi += THRESH) {
+         yj = yi + THRESH;
+         if (yj > ylim) yj = ylim;
+         if (xj-xi == 0 || yj-yi == 0)
+            continue;
+	 ntest++;
+	 if (cmpArea(xi, yi, xj-xi, yj-yi)) {
+ 	    nupd++;
+            copyArea(xi, yi, xj-xi, yj-yi);
+            SDL_UpdateRect(screen, xi, yi, xj-xi, yj-yi);
+	 }
+      }
+   }
+   tested_total += ntest;
+   update_total += nupd;
+   if (0)
+   printf("(tested, updated):   total (%d, %d),   this time (%d, %d)\n", 
+          tested_total, update_total, ntest, nupd);
+}
+
+
 static void sdl_update(DisplayState *ds, int x, int y, int w, int h)
 {
-    //    printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
-    SDL_UpdateRect(screen, x, y, w, h);
+   int warned = 0;
+
+   //printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
+   //   printf("Total Size    %d %d\n", screen->w, screen->h);
+   //printf("BytesPerPixel %d\n", screen->format->BytesPerPixel);
+   //printf("pitch         %d (%d)\n", screen->pitch, 
+   //                       screen->w * screen->format->BytesPerPixel);
+
+   if (!remote_x11) {
+      SDL_UpdateRect(screen, x, y, w, h);
+      return;
+   }
+
+   if ( (screen->format->BytesPerPixel != 4 
+         && screen->format->BytesPerPixel != 2)
+       || screen->pitch != screen->w * screen->format->BytesPerPixel) {
+       if (!warned) {
+          warned = 1;
+          fprintf(stderr, "qemu: SDL update optimisation disabled\n"
+                          "      (wrong bits-per-pixel, or wrong pitch)\n");
+       }
+       SDL_UpdateRect(screen, x, y, w, h);
+       return;
+    }
+
+    assert(screen->pitch == screen->w * screen->format->BytesPerPixel);
+    assert(sizeof(int) == 4);
+    assert(sizeof(short) == 2);
+    if (alt_pixels == NULL) {
+       /* First time through (at this resolution).  
+          Copy entire screen. */
+       if (screen->format->BytesPerPixel == 4) {
+          int i, word32s = screen->w * screen->h;
+          if (0) printf("copying init screen - 32 bit\n");
+          alt_pixels = malloc(word32s * sizeof(int));
+          assert(alt_pixels);
+          for (i = 0; i < word32s; i++) 
+             ((unsigned int*)alt_pixels)[i] 
+                = ((unsigned int*)(screen->pixels))[i];
+          SDL_UpdateRect(screen, x, y, w, h);
+          if (0) printf("done- 32 bit\n");
+       }
+       else
+       if (screen->format->BytesPerPixel == 2) {
+          int i, word16s = screen->w * screen->h;
+          if (0) printf("copying init screen - 16 bit\n");
+          alt_pixels = malloc(word16s * sizeof(int));
+          assert(alt_pixels);
+          for (i = 0; i < word16s; i++) 
+             ((unsigned short*)alt_pixels)[i] 
+                = ((unsigned short*)(screen->pixels))[i];
+          SDL_UpdateRect(screen, x, y, w, h);
+          if (0) printf("done - 16 bit\n");
+       }
+       else 
+          assert(0); /* guarded above */
+    } else {
+       assert(w >= 0);
+       assert(h >= 0);
+       econoUpdate(ds, x, y, w, h);
+    }
 }
 
 static void sdl_resize(DisplayState *ds, int w, int h)
 {
     int flags;
 
-    //    printf("resizing to %d %d\n", w, h);
+    if (alt_pixels) {
+        assert(remote_x11);
+        free(alt_pixels);
+	alt_pixels = NULL;
+    }
 
     flags = SDL_HWSURFACE|SDL_ASYNCBLIT|SDL_HWACCEL;
     if (gui_fullscreen)
diff -u -r Orig/qemu-0.9.0/vl.c qemu-0.9.0/vl.c
--- Orig/qemu-0.9.0/vl.c	2007-02-05 23:01:54.000000000 +0000
+++ qemu-0.9.0/vl.c	2007-03-13 22:12:39.000000000 +0000
@@ -173,6 +173,7 @@
 int nb_option_roms;
 int semihosting_enabled = 0;
 int autostart = 1;
+int remote_x11 = 0;
 
 /***********************************************************/
 /* x86 ISA bus support */
@@ -6121,6 +6122,7 @@
 	   "-vnc display    start a VNC server on display\n"
 #ifndef _WIN32
 	   "-daemonize      daemonize QEMU after initializing\n"
+           "-remote-x11     improve graphics responsiveness when X11 
client != server\n"
 #endif
 	   "-option-rom rom load a file, rom, into the option ROM space\n"
            "\n"
@@ -6204,6 +6206,7 @@
     QEMU_OPTION_no_acpi,
     QEMU_OPTION_no_reboot,
     QEMU_OPTION_daemonize,
+    QEMU_OPTION_remote_x11,
     QEMU_OPTION_option_rom,
     QEMU_OPTION_semihosting
 };
@@ -6288,6 +6291,7 @@
     { "no-acpi", 0, QEMU_OPTION_no_acpi },
     { "no-reboot", 0, QEMU_OPTION_no_reboot },
     { "daemonize", 0, QEMU_OPTION_daemonize },
+    { "remote-x11", 0, QEMU_OPTION_remote_x11 },
     { "option-rom", HAS_ARG, QEMU_OPTION_option_rom },
 #if defined(TARGET_ARM)
     { "semihosting", 0, QEMU_OPTION_semihosting },
@@ -6947,6 +6951,9 @@
 	    case QEMU_OPTION_daemonize:
 		daemonize = 1;
 		break;
+	    case QEMU_OPTION_remote_x11:
+		remote_x11 = 1;
+		break;
 	    case QEMU_OPTION_option_rom:
 		if (nb_option_roms >= MAX_OPTION_ROMS) {
 		    fprintf(stderr, "Too many option ROMs\n");
diff -u -r Orig/qemu-0.9.0/vl.h qemu-0.9.0/vl.h
--- Orig/qemu-0.9.0/vl.h	2007-02-05 23:01:54.000000000 +0000
+++ qemu-0.9.0/vl.h	2007-03-13 22:11:24.000000000 +0000
@@ -159,6 +159,7 @@
 extern int no_quit;
 extern int semihosting_enabled;
 extern int autostart;
+extern int remote_x11;
 
 #define MAX_OPTION_ROMS 16
 extern const char *option_rom[MAX_OPTION_ROMS];

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-14  1:52 [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2 Julian Seward
@ 2007-03-14  3:39 ` Anthony Liguori
  2007-03-14  4:57   ` Mark Williamson
  2007-03-16 18:40 ` Anthony Liguori
  1 sibling, 1 reply; 11+ messages in thread
From: Anthony Liguori @ 2007-03-14  3:39 UTC (permalink / raw)
  To: qemu-devel

Julian Seward wrote:
> Here is a somewhat revised version of a patch I first made nearly
> three years ago.  The original thread is 
>   

The idea here is quite similar to what the VNC server does now.

If this is desirable for SDL too, then perhaps we should find a way to 
fold this into common code?

Although, is there a compelling reason to use SDL over X instead of VNC?

Regards,

Anthony Liguori

> http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00263.html
>
> The patch makes QEMU's graphics emulation much more usable over remote
> X connections, by reducing the amount of data sent to the X server.
> This is particularly noticeable for small display updates, most
> importantly mouse cursor movements, which become faster and so 
> generally make the guest's GUI more pleasant to use.
>
> Compared to the original patch, this one:
>
> - is relative to 0.9.0
>
> - handle screen->format->BytesPerPixel values of both 2 and 4,
>   so handles most guest depths - I tested 8, 16, 24bpp.
>
> - has unrolled comparison/copy loops for the depth 2 case.  Most
>   of the comparisons are short (<= 64 bytes) so I don't see much
>   point in taking the overhead of a call to memcmp/memcpy.
>
> - most importantly, is optional and disabled by default, so that
>   default performance is unchanged.  To use it you need the new
>   -remote-x11 flag (perhaps -low-bandwidth-x11 would be a better 
>   name).
>
> It still uses a shadow frame buffer.  Fabrice mentioned this is not
> necessary
>
>  http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00279.html
>
> but I can't see how to get rid of it and still check for redundant
> updates in NxN pixel blocks (N=32 by default).  The point of checking
> NxN squares is that mouse pointer pixmaps are square and so the most
> common display updates - mouse pointer movements - are often reduced
> to transmission of a single 32x32 block using this strategy.
>
> The shadow framebuffer is only allocated when -remote-x11 is present,
> so the patch has no effect on default memory use.
>
> I measured the bandwidth saving roughly by resuming a vm snapshot
> containing a web browser showing a page with a lot of links.  I moved
> the pointer slowly over the links (so they change colour) and scrolled
> up and down a bit; about 1/2 minute of activity in total.  I tried to do
> the same with and without -remote-x11.  Without -remote-x11, 163Mbyte
> was transmitted to the X server; with it, 20.6Mbyte was, about an 8:1
> reduction.
>
> J
>
>
> diff -u -r Orig/qemu-0.9.0/sdl.c qemu-0.9.0/sdl.c
> --- Orig/qemu-0.9.0/sdl.c	2007-02-05 23:01:54.000000000 +0000
> +++ qemu-0.9.0/sdl.c	2007-03-13 22:16:40.000000000 +0000
> @@ -29,6 +29,8 @@
>  #include <signal.h>
>  #endif
>  
> +#include <assert.h>
> +
>  static SDL_Surface *screen;
>  static int gui_grab; /* if true, all keyboard/mouse events are grabbed */
>  static int last_vm_running;
> @@ -44,17 +46,232 @@
>  static SDL_Cursor *sdl_cursor_hidden;
>  static int absolute_enabled = 0;
>  
> +/* Mechanism to reduce the total amount of data transmitted to the X
> +   server, often quite dramatically.  Keep a shadow copy of video
> +   memory in alt_pixels, and when asked to update a rectangle, use
> +   the shadow copy to establish areas which are the same, and so do
> +   not need updating.
> +*/
> +
> +static void* alt_pixels = NULL;
> +
> +#define THRESH 32
> +
> +/* Return 1 if the area [x .. x+w-1, y .. y+w-1] is different from
> +   the old version and so needs updating. */
> +static int cmpArea_16bit ( int x, int y, int w, int h )
> +{
> +           int    i, j;
> +  unsigned int    sll;
> +  unsigned short* p1base = (unsigned short*)screen->pixels;
> +  unsigned short* p2base = (unsigned short*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 2);
> +  if (w == 0 || h == 0)
> +     return 0;
> +  assert(w > 0 && h > 0);
> +  sll = ((unsigned int)screen->pitch) >> 1;
> +  for (j = y; j < y+h; j++) {
> +    unsigned short* p1p = & p1base[j * sll + x];
> +    unsigned short* p2p = & p2base[j * sll + x];
> +    for (i = 0; i < w-5; i += 5) {
> +      if (p1p[i+0] != p2p[i+0]) return 1;
> +      if (p1p[i+1] != p2p[i+1]) return 1;
> +      if (p1p[i+2] != p2p[i+2]) return 1;
> +      if (p1p[i+3] != p2p[i+3]) return 1;
> +      if (p1p[i+4] != p2p[i+4]) return 1;
> +    }
> +    for (/*fixup*/; i < w; i++) {
> +      if (p1p[i+0] != p2p[i+0]) return 1;
> +    }
> +  }
> +  return 0;
> +}
> +static void copyArea_16bit ( int x, int y, int w, int h )
> +{
> +           int    i, j;
> +  unsigned int    sll;
> +  unsigned short* p1base = (unsigned short*)screen->pixels;
> +  unsigned short* p2base = (unsigned short*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 2);
> +  sll = ((unsigned int)screen->pitch) >> 1;
> +  if (w == 0 || h == 0)
> +     return;
> +  assert(w > 0 && h > 0);
> +  for (j = y; j < y+h; j++) {
> +    unsigned short* p1p = & p1base[j * sll + x];
> +    unsigned short* p2p = & p2base[j * sll + x];
> +    for (i = 0; i < w-5; i += 5) {
> +      p2p[i+0] = p1p[i+0];
> +      p2p[i+1] = p1p[i+1];
> +      p2p[i+2] = p1p[i+2];
> +      p2p[i+3] = p1p[i+3];
> +      p2p[i+4] = p1p[i+4];
> +    }
> +    for (/*fixup*/; i < w; i++) {
> +      p2p[i+0] = p1p[i+0];
> +    }
> +  }
> +}
> +
> +static int cmpArea_32bit ( int x, int y, int w, int h )
> +{
> +  int           i, j;
> +  unsigned int  sll;
> +  unsigned int* p1base = (unsigned int*)screen->pixels;
> +  unsigned int* p2base = (unsigned int*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 4);
> +  sll = ((unsigned int)screen->pitch) >> 2;
> +  for (j = y; j < y+h; j++) {
> +    for (i = x; i < x+w; i++) {
> +      if (p1base [j * sll +i] != p2base [j * sll +i])
> +	return 1;
> +    }
> +  }
> +  return 0;
> +}
> +static void copyArea_32bit ( int x, int y, int w, int h )
> +{
> +  int           i, j;
> +  unsigned int  sll;
> +  unsigned int* p1base = (unsigned int*)screen->pixels;
> +  unsigned int* p2base = (unsigned int*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 4);
> +  sll = ((unsigned int)screen->pitch) >> 2;
> +  for (j = y; j < y+h; j++) {
> +    for (i = x; i < x+w; i++) {
> +      p2base [j * sll +i] = p1base [j * sll +i];
> +    }
> +  }
> +}
> +
> +
> +static void econoUpdate (DisplayState *ds, int x, int y, 
> +                         unsigned int w, unsigned int h)
> +{
> +   static int tested_total = 0;
> +   static int update_total = 0;
> +
> +   int  (*cmpArea) (int, int, int, int) = NULL;
> +   void (*copyArea)(int, int, int, int) = NULL;
> +
> +   int xi, xj, xlim, yi, yj, ylim, ntest, nupd;
> +   if (w == 0 || h == 0) 
> +      return;
> +
> +   if (screen->format->BytesPerPixel == 4) {
> +      cmpArea = cmpArea_32bit;
> +      copyArea = copyArea_32bit;
> +   }
> +   else
> +   if (screen->format->BytesPerPixel == 2) {
> +      cmpArea = cmpArea_16bit;
> +      copyArea = copyArea_16bit;
> +   }
> +   else
> +      assert(0); /* should not be reached - sdl_update guards against this */
> +
> +   xlim = x + w;
> +   ylim = y + h;
> +
> +   ntest = nupd = 0;
> +   for (xi = x; xi < xlim; xi += THRESH) {
> +      xj = xi + THRESH;
> +      if (xj > xlim) xj = xlim;
> +      for (yi = y; yi < ylim; yi += THRESH) {
> +         yj = yi + THRESH;
> +         if (yj > ylim) yj = ylim;
> +         if (xj-xi == 0 || yj-yi == 0)
> +            continue;
> +	 ntest++;
> +	 if (cmpArea(xi, yi, xj-xi, yj-yi)) {
> + 	    nupd++;
> +            copyArea(xi, yi, xj-xi, yj-yi);
> +            SDL_UpdateRect(screen, xi, yi, xj-xi, yj-yi);
> +	 }
> +      }
> +   }
> +   tested_total += ntest;
> +   update_total += nupd;
> +   if (0)
> +   printf("(tested, updated):   total (%d, %d),   this time (%d, %d)\n", 
> +          tested_total, update_total, ntest, nupd);
> +}
> +
> +
>  static void sdl_update(DisplayState *ds, int x, int y, int w, int h)
>  {
> -    //    printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
> -    SDL_UpdateRect(screen, x, y, w, h);
> +   int warned = 0;
> +
> +   //printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
> +   //   printf("Total Size    %d %d\n", screen->w, screen->h);
> +   //printf("BytesPerPixel %d\n", screen->format->BytesPerPixel);
> +   //printf("pitch         %d (%d)\n", screen->pitch, 
> +   //                       screen->w * screen->format->BytesPerPixel);
> +
> +   if (!remote_x11) {
> +      SDL_UpdateRect(screen, x, y, w, h);
> +      return;
> +   }
> +
> +   if ( (screen->format->BytesPerPixel != 4 
> +         && screen->format->BytesPerPixel != 2)
> +       || screen->pitch != screen->w * screen->format->BytesPerPixel) {
> +       if (!warned) {
> +          warned = 1;
> +          fprintf(stderr, "qemu: SDL update optimisation disabled\n"
> +                          "      (wrong bits-per-pixel, or wrong pitch)\n");
> +       }
> +       SDL_UpdateRect(screen, x, y, w, h);
> +       return;
> +    }
> +
> +    assert(screen->pitch == screen->w * screen->format->BytesPerPixel);
> +    assert(sizeof(int) == 4);
> +    assert(sizeof(short) == 2);
> +    if (alt_pixels == NULL) {
> +       /* First time through (at this resolution).  
> +          Copy entire screen. */
> +       if (screen->format->BytesPerPixel == 4) {
> +          int i, word32s = screen->w * screen->h;
> +          if (0) printf("copying init screen - 32 bit\n");
> +          alt_pixels = malloc(word32s * sizeof(int));
> +          assert(alt_pixels);
> +          for (i = 0; i < word32s; i++) 
> +             ((unsigned int*)alt_pixels)[i] 
> +                = ((unsigned int*)(screen->pixels))[i];
> +          SDL_UpdateRect(screen, x, y, w, h);
> +          if (0) printf("done- 32 bit\n");
> +       }
> +       else
> +       if (screen->format->BytesPerPixel == 2) {
> +          int i, word16s = screen->w * screen->h;
> +          if (0) printf("copying init screen - 16 bit\n");
> +          alt_pixels = malloc(word16s * sizeof(int));
> +          assert(alt_pixels);
> +          for (i = 0; i < word16s; i++) 
> +             ((unsigned short*)alt_pixels)[i] 
> +                = ((unsigned short*)(screen->pixels))[i];
> +          SDL_UpdateRect(screen, x, y, w, h);
> +          if (0) printf("done - 16 bit\n");
> +       }
> +       else 
> +          assert(0); /* guarded above */
> +    } else {
> +       assert(w >= 0);
> +       assert(h >= 0);
> +       econoUpdate(ds, x, y, w, h);
> +    }
>  }
>  
>  static void sdl_resize(DisplayState *ds, int w, int h)
>  {
>      int flags;
>  
> -    //    printf("resizing to %d %d\n", w, h);
> +    if (alt_pixels) {
> +        assert(remote_x11);
> +        free(alt_pixels);
> +	alt_pixels = NULL;
> +    }
>  
>      flags = SDL_HWSURFACE|SDL_ASYNCBLIT|SDL_HWACCEL;
>      if (gui_fullscreen)
> diff -u -r Orig/qemu-0.9.0/vl.c qemu-0.9.0/vl.c
> --- Orig/qemu-0.9.0/vl.c	2007-02-05 23:01:54.000000000 +0000
> +++ qemu-0.9.0/vl.c	2007-03-13 22:12:39.000000000 +0000
> @@ -173,6 +173,7 @@
>  int nb_option_roms;
>  int semihosting_enabled = 0;
>  int autostart = 1;
> +int remote_x11 = 0;
>  
>  /***********************************************************/
>  /* x86 ISA bus support */
> @@ -6121,6 +6122,7 @@
>  	   "-vnc display    start a VNC server on display\n"
>  #ifndef _WIN32
>  	   "-daemonize      daemonize QEMU after initializing\n"
> +           "-remote-x11     improve graphics responsiveness when X11 
> client != server\n"
>  #endif
>  	   "-option-rom rom load a file, rom, into the option ROM space\n"
>             "\n"
> @@ -6204,6 +6206,7 @@
>      QEMU_OPTION_no_acpi,
>      QEMU_OPTION_no_reboot,
>      QEMU_OPTION_daemonize,
> +    QEMU_OPTION_remote_x11,
>      QEMU_OPTION_option_rom,
>      QEMU_OPTION_semihosting
>  };
> @@ -6288,6 +6291,7 @@
>      { "no-acpi", 0, QEMU_OPTION_no_acpi },
>      { "no-reboot", 0, QEMU_OPTION_no_reboot },
>      { "daemonize", 0, QEMU_OPTION_daemonize },
> +    { "remote-x11", 0, QEMU_OPTION_remote_x11 },
>      { "option-rom", HAS_ARG, QEMU_OPTION_option_rom },
>  #if defined(TARGET_ARM)
>      { "semihosting", 0, QEMU_OPTION_semihosting },
> @@ -6947,6 +6951,9 @@
>  	    case QEMU_OPTION_daemonize:
>  		daemonize = 1;
>  		break;
> +	    case QEMU_OPTION_remote_x11:
> +		remote_x11 = 1;
> +		break;
>  	    case QEMU_OPTION_option_rom:
>  		if (nb_option_roms >= MAX_OPTION_ROMS) {
>  		    fprintf(stderr, "Too many option ROMs\n");
> diff -u -r Orig/qemu-0.9.0/vl.h qemu-0.9.0/vl.h
> --- Orig/qemu-0.9.0/vl.h	2007-02-05 23:01:54.000000000 +0000
> +++ qemu-0.9.0/vl.h	2007-03-13 22:11:24.000000000 +0000
> @@ -159,6 +159,7 @@
>  extern int no_quit;
>  extern int semihosting_enabled;
>  extern int autostart;
> +extern int remote_x11;
>  
>  #define MAX_OPTION_ROMS 16
>  extern const char *option_rom[MAX_OPTION_ROMS];
>
>
> _______________________________________________
> Qemu-devel mailing list
> Qemu-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/qemu-devel
>
>   

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-14  3:39 ` Anthony Liguori
@ 2007-03-14  4:57   ` Mark Williamson
  2007-03-14 10:53     ` Julian Seward
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Williamson @ 2007-03-14  4:57 UTC (permalink / raw)
  To: qemu-devel

> > Here is a somewhat revised version of a patch I first made nearly
> > three years ago.  The original thread is
>
> The idea here is quite similar to what the VNC server does now.
>
> If this is desirable for SDL too, then perhaps we should find a way to
> fold this into common code?
>
> Although, is there a compelling reason to use SDL over X instead of VNC?

I sometimes do this sort of thing because it Just Works with no manual 
configuring of port forwarding etc.  I don't necessarily like to do it for 
extended usage but it is very convenient.

Cheers,
Mark

> Regards,
>
> Anthony Liguori
>
> > http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00263.html
> >
> > The patch makes QEMU's graphics emulation much more usable over remote
> > X connections, by reducing the amount of data sent to the X server.
> > This is particularly noticeable for small display updates, most
> > importantly mouse cursor movements, which become faster and so
> > generally make the guest's GUI more pleasant to use.
> >
> > Compared to the original patch, this one:
> >
> > - is relative to 0.9.0
> >
> > - handle screen->format->BytesPerPixel values of both 2 and 4,
> >   so handles most guest depths - I tested 8, 16, 24bpp.
> >
> > - has unrolled comparison/copy loops for the depth 2 case.  Most
> >   of the comparisons are short (<= 64 bytes) so I don't see much
> >   point in taking the overhead of a call to memcmp/memcpy.
> >
> > - most importantly, is optional and disabled by default, so that
> >   default performance is unchanged.  To use it you need the new
> >   -remote-x11 flag (perhaps -low-bandwidth-x11 would be a better
> >   name).
> >
> > It still uses a shadow frame buffer.  Fabrice mentioned this is not
> > necessary
> >
> >  http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00279.html
> >
> > but I can't see how to get rid of it and still check for redundant
> > updates in NxN pixel blocks (N=32 by default).  The point of checking
> > NxN squares is that mouse pointer pixmaps are square and so the most
> > common display updates - mouse pointer movements - are often reduced
> > to transmission of a single 32x32 block using this strategy.
> >
> > The shadow framebuffer is only allocated when -remote-x11 is present,
> > so the patch has no effect on default memory use.
> >
> > I measured the bandwidth saving roughly by resuming a vm snapshot
> > containing a web browser showing a page with a lot of links.  I moved
> > the pointer slowly over the links (so they change colour) and scrolled
> > up and down a bit; about 1/2 minute of activity in total.  I tried to do
> > the same with and without -remote-x11.  Without -remote-x11, 163Mbyte
> > was transmitted to the X server; with it, 20.6Mbyte was, about an 8:1
> > reduction.
> >
> > J
> >
> >
> > diff -u -r Orig/qemu-0.9.0/sdl.c qemu-0.9.0/sdl.c
> > --- Orig/qemu-0.9.0/sdl.c	2007-02-05 23:01:54.000000000 +0000
> > +++ qemu-0.9.0/sdl.c	2007-03-13 22:16:40.000000000 +0000
> > @@ -29,6 +29,8 @@
> >  #include <signal.h>
> >  #endif
> >
> > +#include <assert.h>
> > +
> >  static SDL_Surface *screen;
> >  static int gui_grab; /* if true, all keyboard/mouse events are grabbed
> > */ static int last_vm_running;
> > @@ -44,17 +46,232 @@
> >  static SDL_Cursor *sdl_cursor_hidden;
> >  static int absolute_enabled = 0;
> >
> > +/* Mechanism to reduce the total amount of data transmitted to the X
> > +   server, often quite dramatically.  Keep a shadow copy of video
> > +   memory in alt_pixels, and when asked to update a rectangle, use
> > +   the shadow copy to establish areas which are the same, and so do
> > +   not need updating.
> > +*/
> > +
> > +static void* alt_pixels = NULL;
> > +
> > +#define THRESH 32
> > +
> > +/* Return 1 if the area [x .. x+w-1, y .. y+w-1] is different from
> > +   the old version and so needs updating. */
> > +static int cmpArea_16bit ( int x, int y, int w, int h )
> > +{
> > +           int    i, j;
> > +  unsigned int    sll;
> > +  unsigned short* p1base = (unsigned short*)screen->pixels;
> > +  unsigned short* p2base = (unsigned short*)alt_pixels;
> > +  assert(screen->format->BytesPerPixel == 2);
> > +  if (w == 0 || h == 0)
> > +     return 0;
> > +  assert(w > 0 && h > 0);
> > +  sll = ((unsigned int)screen->pitch) >> 1;
> > +  for (j = y; j < y+h; j++) {
> > +    unsigned short* p1p = & p1base[j * sll + x];
> > +    unsigned short* p2p = & p2base[j * sll + x];
> > +    for (i = 0; i < w-5; i += 5) {
> > +      if (p1p[i+0] != p2p[i+0]) return 1;
> > +      if (p1p[i+1] != p2p[i+1]) return 1;
> > +      if (p1p[i+2] != p2p[i+2]) return 1;
> > +      if (p1p[i+3] != p2p[i+3]) return 1;
> > +      if (p1p[i+4] != p2p[i+4]) return 1;
> > +    }
> > +    for (/*fixup*/; i < w; i++) {
> > +      if (p1p[i+0] != p2p[i+0]) return 1;
> > +    }
> > +  }
> > +  return 0;
> > +}
> > +static void copyArea_16bit ( int x, int y, int w, int h )
> > +{
> > +           int    i, j;
> > +  unsigned int    sll;
> > +  unsigned short* p1base = (unsigned short*)screen->pixels;
> > +  unsigned short* p2base = (unsigned short*)alt_pixels;
> > +  assert(screen->format->BytesPerPixel == 2);
> > +  sll = ((unsigned int)screen->pitch) >> 1;
> > +  if (w == 0 || h == 0)
> > +     return;
> > +  assert(w > 0 && h > 0);
> > +  for (j = y; j < y+h; j++) {
> > +    unsigned short* p1p = & p1base[j * sll + x];
> > +    unsigned short* p2p = & p2base[j * sll + x];
> > +    for (i = 0; i < w-5; i += 5) {
> > +      p2p[i+0] = p1p[i+0];
> > +      p2p[i+1] = p1p[i+1];
> > +      p2p[i+2] = p1p[i+2];
> > +      p2p[i+3] = p1p[i+3];
> > +      p2p[i+4] = p1p[i+4];
> > +    }
> > +    for (/*fixup*/; i < w; i++) {
> > +      p2p[i+0] = p1p[i+0];
> > +    }
> > +  }
> > +}
> > +
> > +static int cmpArea_32bit ( int x, int y, int w, int h )
> > +{
> > +  int           i, j;
> > +  unsigned int  sll;
> > +  unsigned int* p1base = (unsigned int*)screen->pixels;
> > +  unsigned int* p2base = (unsigned int*)alt_pixels;
> > +  assert(screen->format->BytesPerPixel == 4);
> > +  sll = ((unsigned int)screen->pitch) >> 2;
> > +  for (j = y; j < y+h; j++) {
> > +    for (i = x; i < x+w; i++) {
> > +      if (p1base [j * sll +i] != p2base [j * sll +i])
> > +	return 1;
> > +    }
> > +  }
> > +  return 0;
> > +}
> > +static void copyArea_32bit ( int x, int y, int w, int h )
> > +{
> > +  int           i, j;
> > +  unsigned int  sll;
> > +  unsigned int* p1base = (unsigned int*)screen->pixels;
> > +  unsigned int* p2base = (unsigned int*)alt_pixels;
> > +  assert(screen->format->BytesPerPixel == 4);
> > +  sll = ((unsigned int)screen->pitch) >> 2;
> > +  for (j = y; j < y+h; j++) {
> > +    for (i = x; i < x+w; i++) {
> > +      p2base [j * sll +i] = p1base [j * sll +i];
> > +    }
> > +  }
> > +}
> > +
> > +
> > +static void econoUpdate (DisplayState *ds, int x, int y,
> > +                         unsigned int w, unsigned int h)
> > +{
> > +   static int tested_total = 0;
> > +   static int update_total = 0;
> > +
> > +   int  (*cmpArea) (int, int, int, int) = NULL;
> > +   void (*copyArea)(int, int, int, int) = NULL;
> > +
> > +   int xi, xj, xlim, yi, yj, ylim, ntest, nupd;
> > +   if (w == 0 || h == 0)
> > +      return;
> > +
> > +   if (screen->format->BytesPerPixel == 4) {
> > +      cmpArea = cmpArea_32bit;
> > +      copyArea = copyArea_32bit;
> > +   }
> > +   else
> > +   if (screen->format->BytesPerPixel == 2) {
> > +      cmpArea = cmpArea_16bit;
> > +      copyArea = copyArea_16bit;
> > +   }
> > +   else
> > +      assert(0); /* should not be reached - sdl_update guards against
> > this */ +
> > +   xlim = x + w;
> > +   ylim = y + h;
> > +
> > +   ntest = nupd = 0;
> > +   for (xi = x; xi < xlim; xi += THRESH) {
> > +      xj = xi + THRESH;
> > +      if (xj > xlim) xj = xlim;
> > +      for (yi = y; yi < ylim; yi += THRESH) {
> > +         yj = yi + THRESH;
> > +         if (yj > ylim) yj = ylim;
> > +         if (xj-xi == 0 || yj-yi == 0)
> > +            continue;
> > +	 ntest++;
> > +	 if (cmpArea(xi, yi, xj-xi, yj-yi)) {
> > + 	    nupd++;
> > +            copyArea(xi, yi, xj-xi, yj-yi);
> > +            SDL_UpdateRect(screen, xi, yi, xj-xi, yj-yi);
> > +	 }
> > +      }
> > +   }
> > +   tested_total += ntest;
> > +   update_total += nupd;
> > +   if (0)
> > +   printf("(tested, updated):   total (%d, %d),   this time (%d, %d)\n",
> > +          tested_total, update_total, ntest, nupd);
> > +}
> > +
> > +
> >  static void sdl_update(DisplayState *ds, int x, int y, int w, int h)
> >  {
> > -    //    printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
> > -    SDL_UpdateRect(screen, x, y, w, h);
> > +   int warned = 0;
> > +
> > +   //printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
> > +   //   printf("Total Size    %d %d\n", screen->w, screen->h);
> > +   //printf("BytesPerPixel %d\n", screen->format->BytesPerPixel);
> > +   //printf("pitch         %d (%d)\n", screen->pitch,
> > +   //                       screen->w * screen->format->BytesPerPixel);
> > +
> > +   if (!remote_x11) {
> > +      SDL_UpdateRect(screen, x, y, w, h);
> > +      return;
> > +   }
> > +
> > +   if ( (screen->format->BytesPerPixel != 4
> > +         && screen->format->BytesPerPixel != 2)
> > +       || screen->pitch != screen->w * screen->format->BytesPerPixel) {
> > +       if (!warned) {
> > +          warned = 1;
> > +          fprintf(stderr, "qemu: SDL update optimisation disabled\n"
> > +                          "      (wrong bits-per-pixel, or wrong
> > pitch)\n"); +       }
> > +       SDL_UpdateRect(screen, x, y, w, h);
> > +       return;
> > +    }
> > +
> > +    assert(screen->pitch == screen->w * screen->format->BytesPerPixel);
> > +    assert(sizeof(int) == 4);
> > +    assert(sizeof(short) == 2);
> > +    if (alt_pixels == NULL) {
> > +       /* First time through (at this resolution).
> > +          Copy entire screen. */
> > +       if (screen->format->BytesPerPixel == 4) {
> > +          int i, word32s = screen->w * screen->h;
> > +          if (0) printf("copying init screen - 32 bit\n");
> > +          alt_pixels = malloc(word32s * sizeof(int));
> > +          assert(alt_pixels);
> > +          for (i = 0; i < word32s; i++)
> > +             ((unsigned int*)alt_pixels)[i]
> > +                = ((unsigned int*)(screen->pixels))[i];
> > +          SDL_UpdateRect(screen, x, y, w, h);
> > +          if (0) printf("done- 32 bit\n");
> > +       }
> > +       else
> > +       if (screen->format->BytesPerPixel == 2) {
> > +          int i, word16s = screen->w * screen->h;
> > +          if (0) printf("copying init screen - 16 bit\n");
> > +          alt_pixels = malloc(word16s * sizeof(int));
> > +          assert(alt_pixels);
> > +          for (i = 0; i < word16s; i++)
> > +             ((unsigned short*)alt_pixels)[i]
> > +                = ((unsigned short*)(screen->pixels))[i];
> > +          SDL_UpdateRect(screen, x, y, w, h);
> > +          if (0) printf("done - 16 bit\n");
> > +       }
> > +       else
> > +          assert(0); /* guarded above */
> > +    } else {
> > +       assert(w >= 0);
> > +       assert(h >= 0);
> > +       econoUpdate(ds, x, y, w, h);
> > +    }
> >  }
> >
> >  static void sdl_resize(DisplayState *ds, int w, int h)
> >  {
> >      int flags;
> >
> > -    //    printf("resizing to %d %d\n", w, h);
> > +    if (alt_pixels) {
> > +        assert(remote_x11);
> > +        free(alt_pixels);
> > +	alt_pixels = NULL;
> > +    }
> >
> >      flags = SDL_HWSURFACE|SDL_ASYNCBLIT|SDL_HWACCEL;
> >      if (gui_fullscreen)
> > diff -u -r Orig/qemu-0.9.0/vl.c qemu-0.9.0/vl.c
> > --- Orig/qemu-0.9.0/vl.c	2007-02-05 23:01:54.000000000 +0000
> > +++ qemu-0.9.0/vl.c	2007-03-13 22:12:39.000000000 +0000
> > @@ -173,6 +173,7 @@
> >  int nb_option_roms;
> >  int semihosting_enabled = 0;
> >  int autostart = 1;
> > +int remote_x11 = 0;
> >
> >  /***********************************************************/
> >  /* x86 ISA bus support */
> > @@ -6121,6 +6122,7 @@
> >  	   "-vnc display    start a VNC server on display\n"
> >  #ifndef _WIN32
> >  	   "-daemonize      daemonize QEMU after initializing\n"
> > +           "-remote-x11     improve graphics responsiveness when X11
> > client != server\n"
> >  #endif
> >  	   "-option-rom rom load a file, rom, into the option ROM space\n"
> >             "\n"
> > @@ -6204,6 +6206,7 @@
> >      QEMU_OPTION_no_acpi,
> >      QEMU_OPTION_no_reboot,
> >      QEMU_OPTION_daemonize,
> > +    QEMU_OPTION_remote_x11,
> >      QEMU_OPTION_option_rom,
> >      QEMU_OPTION_semihosting
> >  };
> > @@ -6288,6 +6291,7 @@
> >      { "no-acpi", 0, QEMU_OPTION_no_acpi },
> >      { "no-reboot", 0, QEMU_OPTION_no_reboot },
> >      { "daemonize", 0, QEMU_OPTION_daemonize },
> > +    { "remote-x11", 0, QEMU_OPTION_remote_x11 },
> >      { "option-rom", HAS_ARG, QEMU_OPTION_option_rom },
> >  #if defined(TARGET_ARM)
> >      { "semihosting", 0, QEMU_OPTION_semihosting },
> > @@ -6947,6 +6951,9 @@
> >  	    case QEMU_OPTION_daemonize:
> >  		daemonize = 1;
> >  		break;
> > +	    case QEMU_OPTION_remote_x11:
> > +		remote_x11 = 1;
> > +		break;
> >  	    case QEMU_OPTION_option_rom:
> >  		if (nb_option_roms >= MAX_OPTION_ROMS) {
> >  		    fprintf(stderr, "Too many option ROMs\n");
> > diff -u -r Orig/qemu-0.9.0/vl.h qemu-0.9.0/vl.h
> > --- Orig/qemu-0.9.0/vl.h	2007-02-05 23:01:54.000000000 +0000
> > +++ qemu-0.9.0/vl.h	2007-03-13 22:11:24.000000000 +0000
> > @@ -159,6 +159,7 @@
> >  extern int no_quit;
> >  extern int semihosting_enabled;
> >  extern int autostart;
> > +extern int remote_x11;
> >
> >  #define MAX_OPTION_ROMS 16
> >  extern const char *option_rom[MAX_OPTION_ROMS];
> >
> >
> > _______________________________________________
> > Qemu-devel mailing list
> > Qemu-devel@nongnu.org
> > http://lists.nongnu.org/mailman/listinfo/qemu-devel
>
> _______________________________________________
> Qemu-devel mailing list
> Qemu-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/qemu-devel

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-14  4:57   ` Mark Williamson
@ 2007-03-14 10:53     ` Julian Seward
  2007-03-14 12:29       ` Johannes Schindelin
  0 siblings, 1 reply; 11+ messages in thread
From: Julian Seward @ 2007-03-14 10:53 UTC (permalink / raw)
  To: qemu-devel

On Wednesday 14 March 2007 04:57, Mark Williamson wrote:
> > > Here is a somewhat revised version of a patch I first made nearly
> > > three years ago.  The original thread is
> >
> > The idea here is quite similar to what the VNC server does now.
> >
> > If this is desirable for SDL too, then perhaps we should find a way to
> > fold this into common code?
> >
> > Although, is there a compelling reason to use SDL over X instead of VNC?
>
> I sometimes do this sort of thing because it Just Works with no manual
> configuring of port forwarding etc.  I don't necessarily like to do it for
> extended usage but it is very convenient.

Yes.  VNC is all very nice (I use it a lot) but is hassle to set up,
what with making holes in firewalls and/or port forwarding etc.  This
patch has the "it just works" property.

In fact I obtained (by far) the best remote X performance by using both
this patch and making the remote X connection with 
"ssh -XC -o CompressionLevel=1".  The patch knocks out the majority of
the data, and the ssh compression squashed what remained by more than a
factor of 10.  Doing this it was hard to tell that QEMU was not running
locally.

J

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-14 10:53     ` Julian Seward
@ 2007-03-14 12:29       ` Johannes Schindelin
  0 siblings, 0 replies; 11+ messages in thread
From: Johannes Schindelin @ 2007-03-14 12:29 UTC (permalink / raw)
  To: Julian Seward; +Cc: qemu-devel

Hi,

On Wed, 14 Mar 2007, Julian Seward wrote:

> In fact I obtained (by far) the best remote X performance by using both 
> this patch and making the remote X connection with "ssh -XC -o 
> CompressionLevel=1".  The patch knocks out the majority of the data, and 
> the ssh compression squashed what remained by more than a factor of 10.  
> Doing this it was hard to tell that QEMU was not running locally.

This should be helped by the VNC encodings Tight, ZRLE, and for multimedia 
content (movies) the upcoming ZYWRLE encoding.

But I agree that your idea to reduce X communication makes tons of sense, 
since not everybody wants to use VNC.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-14  1:52 [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2 Julian Seward
  2007-03-14  3:39 ` Anthony Liguori
@ 2007-03-16 18:40 ` Anthony Liguori
  2007-03-16 20:53   ` Julian Seward
  1 sibling, 1 reply; 11+ messages in thread
From: Anthony Liguori @ 2007-03-16 18:40 UTC (permalink / raw)
  To: qemu-devel, Julian Seward

Hi Julian,

Julian Seward wrote:
> Here is a somewhat revised version of a patch I first made nearly
> three years ago.  The original thread is 
>
> http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00263.html
>
> It still uses a shadow frame buffer.  Fabrice mentioned this is not
> necessary
>   

I thought about this a little and here's what I came up with.

I think we could change vga_draw_line* so that as part of the drawing 
process, it compared the newly generated pixel with the previous pixel 
value and returned back the min, max x-coordinate that changed.

Since we tend to only extend the vertical dirty range by a couple 
pixels, this should be a relatively cheap way of reducing the size of 
the update region.

If there still is concern about performance, then we could introduce a 
flag that make this behavior conditional.  This would be nice since 
there are already certain scenarios where I want to disable this for VNC 
(using a shared memory transport for instance).

It also helps for cases where this behavior is totally unneeded such as 
emulating the VMware VGA device.

The more I think about it, the more convinced I am that this 
minimization has to occur in the VGA devices themselves.

What do you think?

Regards,

Anthony Liguori

>  http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00279.html
>
> but I can't see how to get rid of it and still check for redundant
> updates in NxN pixel blocks (N=32 by default).  The point of checking
> NxN squares is that mouse pointer pixmaps are square and so the most
> common display updates - mouse pointer movements - are often reduced
> to transmission of a single 32x32 block using this strategy.
>
> The shadow framebuffer is only allocated when -remote-x11 is present,
> so the patch has no effect on default memory use.
>
> I measured the bandwidth saving roughly by resuming a vm snapshot
> containing a web browser showing a page with a lot of links.  I moved
> the pointer slowly over the links (so they change colour) and scrolled
> up and down a bit; about 1/2 minute of activity in total.  I tried to do
> the same with and without -remote-x11.  Without -remote-x11, 163Mbyte
> was transmitted to the X server; with it, 20.6Mbyte was, about an 8:1
> reduction.
>
> J
>
>
> diff -u -r Orig/qemu-0.9.0/sdl.c qemu-0.9.0/sdl.c
> --- Orig/qemu-0.9.0/sdl.c	2007-02-05 23:01:54.000000000 +0000
> +++ qemu-0.9.0/sdl.c	2007-03-13 22:16:40.000000000 +0000
> @@ -29,6 +29,8 @@
>  #include <signal.h>
>  #endif
>  
> +#include <assert.h>
> +
>  static SDL_Surface *screen;
>  static int gui_grab; /* if true, all keyboard/mouse events are grabbed */
>  static int last_vm_running;
> @@ -44,17 +46,232 @@
>  static SDL_Cursor *sdl_cursor_hidden;
>  static int absolute_enabled = 0;
>  
> +/* Mechanism to reduce the total amount of data transmitted to the X
> +   server, often quite dramatically.  Keep a shadow copy of video
> +   memory in alt_pixels, and when asked to update a rectangle, use
> +   the shadow copy to establish areas which are the same, and so do
> +   not need updating.
> +*/
> +
> +static void* alt_pixels = NULL;
> +
> +#define THRESH 32
> +
> +/* Return 1 if the area [x .. x+w-1, y .. y+w-1] is different from
> +   the old version and so needs updating. */
> +static int cmpArea_16bit ( int x, int y, int w, int h )
> +{
> +           int    i, j;
> +  unsigned int    sll;
> +  unsigned short* p1base = (unsigned short*)screen->pixels;
> +  unsigned short* p2base = (unsigned short*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 2);
> +  if (w == 0 || h == 0)
> +     return 0;
> +  assert(w > 0 && h > 0);
> +  sll = ((unsigned int)screen->pitch) >> 1;
> +  for (j = y; j < y+h; j++) {
> +    unsigned short* p1p = & p1base[j * sll + x];
> +    unsigned short* p2p = & p2base[j * sll + x];
> +    for (i = 0; i < w-5; i += 5) {
> +      if (p1p[i+0] != p2p[i+0]) return 1;
> +      if (p1p[i+1] != p2p[i+1]) return 1;
> +      if (p1p[i+2] != p2p[i+2]) return 1;
> +      if (p1p[i+3] != p2p[i+3]) return 1;
> +      if (p1p[i+4] != p2p[i+4]) return 1;
> +    }
> +    for (/*fixup*/; i < w; i++) {
> +      if (p1p[i+0] != p2p[i+0]) return 1;
> +    }
> +  }
> +  return 0;
> +}
> +static void copyArea_16bit ( int x, int y, int w, int h )
> +{
> +           int    i, j;
> +  unsigned int    sll;
> +  unsigned short* p1base = (unsigned short*)screen->pixels;
> +  unsigned short* p2base = (unsigned short*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 2);
> +  sll = ((unsigned int)screen->pitch) >> 1;
> +  if (w == 0 || h == 0)
> +     return;
> +  assert(w > 0 && h > 0);
> +  for (j = y; j < y+h; j++) {
> +    unsigned short* p1p = & p1base[j * sll + x];
> +    unsigned short* p2p = & p2base[j * sll + x];
> +    for (i = 0; i < w-5; i += 5) {
> +      p2p[i+0] = p1p[i+0];
> +      p2p[i+1] = p1p[i+1];
> +      p2p[i+2] = p1p[i+2];
> +      p2p[i+3] = p1p[i+3];
> +      p2p[i+4] = p1p[i+4];
> +    }
> +    for (/*fixup*/; i < w; i++) {
> +      p2p[i+0] = p1p[i+0];
> +    }
> +  }
> +}
> +
> +static int cmpArea_32bit ( int x, int y, int w, int h )
> +{
> +  int           i, j;
> +  unsigned int  sll;
> +  unsigned int* p1base = (unsigned int*)screen->pixels;
> +  unsigned int* p2base = (unsigned int*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 4);
> +  sll = ((unsigned int)screen->pitch) >> 2;
> +  for (j = y; j < y+h; j++) {
> +    for (i = x; i < x+w; i++) {
> +      if (p1base [j * sll +i] != p2base [j * sll +i])
> +	return 1;
> +    }
> +  }
> +  return 0;
> +}
> +static void copyArea_32bit ( int x, int y, int w, int h )
> +{
> +  int           i, j;
> +  unsigned int  sll;
> +  unsigned int* p1base = (unsigned int*)screen->pixels;
> +  unsigned int* p2base = (unsigned int*)alt_pixels;
> +  assert(screen->format->BytesPerPixel == 4);
> +  sll = ((unsigned int)screen->pitch) >> 2;
> +  for (j = y; j < y+h; j++) {
> +    for (i = x; i < x+w; i++) {
> +      p2base [j * sll +i] = p1base [j * sll +i];
> +    }
> +  }
> +}
> +
> +
> +static void econoUpdate (DisplayState *ds, int x, int y, 
> +                         unsigned int w, unsigned int h)
> +{
> +   static int tested_total = 0;
> +   static int update_total = 0;
> +
> +   int  (*cmpArea) (int, int, int, int) = NULL;
> +   void (*copyArea)(int, int, int, int) = NULL;
> +
> +   int xi, xj, xlim, yi, yj, ylim, ntest, nupd;
> +   if (w == 0 || h == 0) 
> +      return;
> +
> +   if (screen->format->BytesPerPixel == 4) {
> +      cmpArea = cmpArea_32bit;
> +      copyArea = copyArea_32bit;
> +   }
> +   else
> +   if (screen->format->BytesPerPixel == 2) {
> +      cmpArea = cmpArea_16bit;
> +      copyArea = copyArea_16bit;
> +   }
> +   else
> +      assert(0); /* should not be reached - sdl_update guards against this */
> +
> +   xlim = x + w;
> +   ylim = y + h;
> +
> +   ntest = nupd = 0;
> +   for (xi = x; xi < xlim; xi += THRESH) {
> +      xj = xi + THRESH;
> +      if (xj > xlim) xj = xlim;
> +      for (yi = y; yi < ylim; yi += THRESH) {
> +         yj = yi + THRESH;
> +         if (yj > ylim) yj = ylim;
> +         if (xj-xi == 0 || yj-yi == 0)
> +            continue;
> +	 ntest++;
> +	 if (cmpArea(xi, yi, xj-xi, yj-yi)) {
> + 	    nupd++;
> +            copyArea(xi, yi, xj-xi, yj-yi);
> +            SDL_UpdateRect(screen, xi, yi, xj-xi, yj-yi);
> +	 }
> +      }
> +   }
> +   tested_total += ntest;
> +   update_total += nupd;
> +   if (0)
> +   printf("(tested, updated):   total (%d, %d),   this time (%d, %d)\n", 
> +          tested_total, update_total, ntest, nupd);
> +}
> +
> +
>  static void sdl_update(DisplayState *ds, int x, int y, int w, int h)
>  {
> -    //    printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
> -    SDL_UpdateRect(screen, x, y, w, h);
> +   int warned = 0;
> +
> +   //printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
> +   //   printf("Total Size    %d %d\n", screen->w, screen->h);
> +   //printf("BytesPerPixel %d\n", screen->format->BytesPerPixel);
> +   //printf("pitch         %d (%d)\n", screen->pitch, 
> +   //                       screen->w * screen->format->BytesPerPixel);
> +
> +   if (!remote_x11) {
> +      SDL_UpdateRect(screen, x, y, w, h);
> +      return;
> +   }
> +
> +   if ( (screen->format->BytesPerPixel != 4 
> +         && screen->format->BytesPerPixel != 2)
> +       || screen->pitch != screen->w * screen->format->BytesPerPixel) {
> +       if (!warned) {
> +          warned = 1;
> +          fprintf(stderr, "qemu: SDL update optimisation disabled\n"
> +                          "      (wrong bits-per-pixel, or wrong pitch)\n");
> +       }
> +       SDL_UpdateRect(screen, x, y, w, h);
> +       return;
> +    }
> +
> +    assert(screen->pitch == screen->w * screen->format->BytesPerPixel);
> +    assert(sizeof(int) == 4);
> +    assert(sizeof(short) == 2);
> +    if (alt_pixels == NULL) {
> +       /* First time through (at this resolution).  
> +          Copy entire screen. */
> +       if (screen->format->BytesPerPixel == 4) {
> +          int i, word32s = screen->w * screen->h;
> +          if (0) printf("copying init screen - 32 bit\n");
> +          alt_pixels = malloc(word32s * sizeof(int));
> +          assert(alt_pixels);
> +          for (i = 0; i < word32s; i++) 
> +             ((unsigned int*)alt_pixels)[i] 
> +                = ((unsigned int*)(screen->pixels))[i];
> +          SDL_UpdateRect(screen, x, y, w, h);
> +          if (0) printf("done- 32 bit\n");
> +       }
> +       else
> +       if (screen->format->BytesPerPixel == 2) {
> +          int i, word16s = screen->w * screen->h;
> +          if (0) printf("copying init screen - 16 bit\n");
> +          alt_pixels = malloc(word16s * sizeof(int));
> +          assert(alt_pixels);
> +          for (i = 0; i < word16s; i++) 
> +             ((unsigned short*)alt_pixels)[i] 
> +                = ((unsigned short*)(screen->pixels))[i];
> +          SDL_UpdateRect(screen, x, y, w, h);
> +          if (0) printf("done - 16 bit\n");
> +       }
> +       else 
> +          assert(0); /* guarded above */
> +    } else {
> +       assert(w >= 0);
> +       assert(h >= 0);
> +       econoUpdate(ds, x, y, w, h);
> +    }
>  }
>  
>  static void sdl_resize(DisplayState *ds, int w, int h)
>  {
>      int flags;
>  
> -    //    printf("resizing to %d %d\n", w, h);
> +    if (alt_pixels) {
> +        assert(remote_x11);
> +        free(alt_pixels);
> +	alt_pixels = NULL;
> +    }
>  
>      flags = SDL_HWSURFACE|SDL_ASYNCBLIT|SDL_HWACCEL;
>      if (gui_fullscreen)
> diff -u -r Orig/qemu-0.9.0/vl.c qemu-0.9.0/vl.c
> --- Orig/qemu-0.9.0/vl.c	2007-02-05 23:01:54.000000000 +0000
> +++ qemu-0.9.0/vl.c	2007-03-13 22:12:39.000000000 +0000
> @@ -173,6 +173,7 @@
>  int nb_option_roms;
>  int semihosting_enabled = 0;
>  int autostart = 1;
> +int remote_x11 = 0;
>  
>  /***********************************************************/
>  /* x86 ISA bus support */
> @@ -6121,6 +6122,7 @@
>  	   "-vnc display    start a VNC server on display\n"
>  #ifndef _WIN32
>  	   "-daemonize      daemonize QEMU after initializing\n"
> +           "-remote-x11     improve graphics responsiveness when X11 
> client != server\n"
>  #endif
>  	   "-option-rom rom load a file, rom, into the option ROM space\n"
>             "\n"
> @@ -6204,6 +6206,7 @@
>      QEMU_OPTION_no_acpi,
>      QEMU_OPTION_no_reboot,
>      QEMU_OPTION_daemonize,
> +    QEMU_OPTION_remote_x11,
>      QEMU_OPTION_option_rom,
>      QEMU_OPTION_semihosting
>  };
> @@ -6288,6 +6291,7 @@
>      { "no-acpi", 0, QEMU_OPTION_no_acpi },
>      { "no-reboot", 0, QEMU_OPTION_no_reboot },
>      { "daemonize", 0, QEMU_OPTION_daemonize },
> +    { "remote-x11", 0, QEMU_OPTION_remote_x11 },
>      { "option-rom", HAS_ARG, QEMU_OPTION_option_rom },
>  #if defined(TARGET_ARM)
>      { "semihosting", 0, QEMU_OPTION_semihosting },
> @@ -6947,6 +6951,9 @@
>  	    case QEMU_OPTION_daemonize:
>  		daemonize = 1;
>  		break;
> +	    case QEMU_OPTION_remote_x11:
> +		remote_x11 = 1;
> +		break;
>  	    case QEMU_OPTION_option_rom:
>  		if (nb_option_roms >= MAX_OPTION_ROMS) {
>  		    fprintf(stderr, "Too many option ROMs\n");
> diff -u -r Orig/qemu-0.9.0/vl.h qemu-0.9.0/vl.h
> --- Orig/qemu-0.9.0/vl.h	2007-02-05 23:01:54.000000000 +0000
> +++ qemu-0.9.0/vl.h	2007-03-13 22:11:24.000000000 +0000
> @@ -159,6 +159,7 @@
>  extern int no_quit;
>  extern int semihosting_enabled;
>  extern int autostart;
> +extern int remote_x11;
>  
>  #define MAX_OPTION_ROMS 16
>  extern const char *option_rom[MAX_OPTION_ROMS];
>
>
> _______________________________________________
> Qemu-devel mailing list
> Qemu-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/qemu-devel
>
>   

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-16 18:40 ` Anthony Liguori
@ 2007-03-16 20:53   ` Julian Seward
  2007-03-16 21:09     ` Anthony Liguori
  0 siblings, 1 reply; 11+ messages in thread
From: Julian Seward @ 2007-03-16 20:53 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On Friday 16 March 2007 18:40, Anthony Liguori wrote:
> Hi Julian,
>
> Julian Seward wrote:
> > Here is a somewhat revised version of a patch I first made nearly
> > three years ago.  The original thread is
> >
> > http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00263.html
> >
> > It still uses a shadow frame buffer.  Fabrice mentioned this is not
> > necessary
>
> I thought about this a little and here's what I came up with.
>
> I think we could change vga_draw_line* so that as part of the drawing
> process, it compared the newly generated pixel with the previous pixel
> value and returned back the min, max x-coordinate that changed.
>
> Since we tend to only extend the vertical dirty range by a couple
> pixels, this should be a relatively cheap way of reducing the size of
> the update region.

Sounds plausible - having said that, I have no familiarity with the VGA
code.  Also sounds like a cleaner solution than mine.

Is there something which guarantees that the vertical dirty range only
overshoots by some small number of pixels?  (thinking more about it ..
it doesn't matter - finding min/max that changed for each line will also
make it possible to identify the vertical limits of the change).

Will this work also for the CL542x adaptor?  (Does that fall in the category
of vga?)  My current hack works for with/without -std-vga and I think 
that's because it lives "underneath" both, in the connection to SDL.

J

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-16 20:53   ` Julian Seward
@ 2007-03-16 21:09     ` Anthony Liguori
  2007-03-16 21:38       ` Paul Brook
  0 siblings, 1 reply; 11+ messages in thread
From: Anthony Liguori @ 2007-03-16 21:09 UTC (permalink / raw)
  To: Julian Seward; +Cc: qemu-devel

Julian Seward wrote:
> On Friday 16 March 2007 18:40, Anthony Liguori wrote:
>   
>> Hi Julian,
>>
>> Julian Seward wrote:
>>     
>>> Here is a somewhat revised version of a patch I first made nearly
>>> three years ago.  The original thread is
>>>
>>> http://lists.gnu.org/archive/html/qemu-devel/2004-07/msg00263.html
>>>
>>> It still uses a shadow frame buffer.  Fabrice mentioned this is not
>>> necessary
>>>       
>> I thought about this a little and here's what I came up with.
>>
>> I think we could change vga_draw_line* so that as part of the drawing
>> process, it compared the newly generated pixel with the previous pixel
>> value and returned back the min, max x-coordinate that changed.
>>
>> Since we tend to only extend the vertical dirty range by a couple
>> pixels, this should be a relatively cheap way of reducing the size of
>> the update region.
>>     
>
> Sounds plausible - having said that, I have no familiarity with the VGA
> code.  Also sounds like a cleaner solution than mine.
>
> Is there something which guarantees that the vertical dirty range only
> overshoots by some small number of pixels?

Even at a relatively low resolution (640 x 480 @ 16 bit), a page will 
span only three rows so one dirty page cannot dirty more than 5 vertical 
pixels.

>   (thinking more about it ..
> it doesn't matter - finding min/max that changed for each line will also
> make it possible to identify the vertical limits of the change).
>   

This is also true.

> Will this work also for the CL542x adaptor?  (Does that fall in the category
> of vga?)  My current hack works for with/without -std-vga and I think 
> that's because it lives "underneath" both, in the connection to SDL.
>   

Each adapter will have to do it's own minimization but that's sort of 
the write thing anyway IMHO.  How granular each update is really only 
depends on the adapter.  For instance, the VMware adapter really 
shouldn't need to do any minimization at all.

Regards,

Anthony Liguori

> J
>
>   

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2
  2007-03-16 21:09     ` Anthony Liguori
@ 2007-03-16 21:38       ` Paul Brook
  2007-03-19 16:07         ` [Qemu-devel] " Brian Johnson
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Brook @ 2007-03-16 21:38 UTC (permalink / raw)
  To: qemu-devel

> > Will this work also for the CL542x adaptor?  (Does that fall in the
> > category of vga?)  My current hack works for with/without -std-vga and I
> > think that's because it lives "underneath" both, in the connection to
> > SDL.
>
> Each adapter will have to do it's own minimization but that's sort of
> the write thing anyway IMHO.  How granular each update is really only
> depends on the adapter.  For instance, the VMware adapter really
> shouldn't need to do any minimization at all.

It would be nice if we could share the framebuffer blitting routines. We've 
currently got 3 different implementations (vga/cirrus, tcx and pl110) of 
basically the same framebuffer rendering routines.

Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] Re: [PATCH] Reducing X communication bandwidth, take 2
  2007-03-16 21:38       ` Paul Brook
@ 2007-03-19 16:07         ` Brian Johnson
  2007-03-25 17:22           ` Anthony Liguori
  0 siblings, 1 reply; 11+ messages in thread
From: Brian Johnson @ 2007-03-19 16:07 UTC (permalink / raw)
  To: qemu-devel

Paul Brook wrote:
>>> Will this work also for the CL542x adaptor?  (Does that fall in the
>>> category of vga?)  My current hack works for with/without -std-vga and I
>>> think that's because it lives "underneath" both, in the connection to
>>> SDL.
>> Each adapter will have to do it's own minimization but that's sort of
>> the write thing anyway IMHO.  How granular each update is really only
>> depends on the adapter.  For instance, the VMware adapter really
>> shouldn't need to do any minimization at all.
> 
> It would be nice if we could share the framebuffer blitting routines. We've 
> currently got 3 different implementations (vga/cirrus, tcx and pl110) of 
> basically the same framebuffer rendering routines.

Take a look at the video code in BasiliskII / SheepShaver, a 68k/PPC 
classic MacOS emulator written by Gwenolé Beauchesne:

http://gwenole.beauchesne.info/projects/sheepshaver/

It contains optimized code (source level) for blitting between various 
bit depths and endiannesses.  See 
SheepShaver-2.3/src/Unix/video_blit.{h,cpp} in the sources.

It also uses a technique called "video on segfault" (VOSF) to improve 
performance on platforms which support it:  rather than testing each 
store to see if it modifies the framebuffer, it keeps the framebuffer 
write-protected (via mprotect(), or the equivalent on non-POSIX systems) 
and uses a SIGSEGV handler to catch stores to the buffer.  When a page 
receives a store, the handler unprotects the page and updates a bitmap 
of modified pages.  Every so often a display update thread wakes up, 
consults the bitmap, calculates the updated region, blits it to the 
screen (using the optimized blitters), and clears the bitmap.  See 
SheepShaver-2.3/src/Unix/video_vosf.h, 
SheepShaver-2.3/src/Unix/video_x.cpp, and other files.

The emulators also have alternative techniques for tracking update 
regions on systems for which VOSF is not supported.  But VOSF is almost 
always a big win.  And most modern OSes can support it without trouble.

The code is in C++ so it can't be dropped in directly, but Some of the 
techniques may be useful in qemu.


Brian J. Johnson

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] Re: [PATCH] Reducing X communication bandwidth, take 2
  2007-03-19 16:07         ` [Qemu-devel] " Brian Johnson
@ 2007-03-25 17:22           ` Anthony Liguori
  0 siblings, 0 replies; 11+ messages in thread
From: Anthony Liguori @ 2007-03-25 17:22 UTC (permalink / raw)
  To: qemu-devel

Brian Johnson wrote:
> Paul Brook wrote:
>>>> Will this work also for the CL542x adaptor?  (Does that fall in the
>>>> category of vga?)  My current hack works for with/without -std-vga 
>>>> and I
>>>> think that's because it lives "underneath" both, in the connection to
>>>> SDL.
>>> Each adapter will have to do it's own minimization but that's sort of
>>> the write thing anyway IMHO.  How granular each update is really only
>>> depends on the adapter.  For instance, the VMware adapter really
>>> shouldn't need to do any minimization at all.
>>
>> It would be nice if we could share the framebuffer blitting routines. 
>> We've currently got 3 different implementations (vga/cirrus, tcx and 
>> pl110) of basically the same framebuffer rendering routines.
>
> Take a look at the video code in BasiliskII / SheepShaver, a 68k/PPC 
> classic MacOS emulator written by Gwenolé Beauchesne:
>
> http://gwenole.beauchesne.info/projects/sheepshaver/
>
> It contains optimized code (source level) for blitting between various 
> bit depths and endiannesses.  See 
> SheepShaver-2.3/src/Unix/video_blit.{h,cpp} in the sources.
>
> It also uses a technique called "video on segfault" (VOSF) to improve 
> performance on platforms which support it:  

This is what QEMU already does.  In fact, it's an exceedingly common 
technique.

Regards,

Anthony Liguori

> rather than testing each store to see if it modifies the framebuffer, 
> it keeps the framebuffer write-protected (via mprotect(), or the 
> equivalent on non-POSIX systems) and uses a SIGSEGV handler to catch 
> stores to the buffer.  When a page receives a store, the handler 
> unprotects the page and updates a bitmap of modified pages.  Every so 
> often a display update thread wakes up, consults the bitmap, 
> calculates the updated region, blits it to the screen (using the 
> optimized blitters), and clears the bitmap.  See 
> SheepShaver-2.3/src/Unix/video_vosf.h, 
> SheepShaver-2.3/src/Unix/video_x.cpp, and other files.
>
> The emulators also have alternative techniques for tracking update 
> regions on systems for which VOSF is not supported.  But VOSF is 
> almost always a big win.  And most modern OSes can support it without 
> trouble.
>
> The code is in C++ so it can't be dropped in directly, but Some of the 
> techniques may be useful in qemu.
>
>
> Brian J. Johnson
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-03-25 17:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-14  1:52 [Qemu-devel] [PATCH] Reducing X communication bandwidth, take 2 Julian Seward
2007-03-14  3:39 ` Anthony Liguori
2007-03-14  4:57   ` Mark Williamson
2007-03-14 10:53     ` Julian Seward
2007-03-14 12:29       ` Johannes Schindelin
2007-03-16 18:40 ` Anthony Liguori
2007-03-16 20:53   ` Julian Seward
2007-03-16 21:09     ` Anthony Liguori
2007-03-16 21:38       ` Paul Brook
2007-03-19 16:07         ` [Qemu-devel] " Brian Johnson
2007-03-25 17:22           ` Anthony Liguori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).