From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33)
	id 1BjDUq-0005m9-Bt
	for qemu-devel@nongnu.org; Sat, 10 Jul 2004 04:46:12 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33)
	id 1BjDUm-0005lj-52
	for qemu-devel@nongnu.org; Sat, 10 Jul 2004 04:46:09 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.33)
	id 1BjDUl-0005lg-Ts
	for qemu-devel@nongnu.org; Sat, 10 Jul 2004 04:46:08 -0400
Received: from [62.253.162.41] (helo=mta01-svc.ntlworld.com)
	by monty-python.gnu.org with esmtp (Exim 4.34)
	id 1BjDSV-0006uO-Th
	for qemu-devel@nongnu.org; Sat, 10 Jul 2004 04:43:48 -0400
Received: from nemesis.frop.org ([62.253.132.150])
	by mta01-svc.ntlworld.com
	(InterMail vM.4.01.03.37 201-229-121-137-20020806) with ESMTP
	id <20040710084312.BLDF19746.mta01-svc.ntlworld.com@nemesis.frop.org>
	for ; Sat, 10 Jul 2004 09:43:12 +0100
From: Julian Seward
Date: Sat, 10 Jul 2004 09:45:06 +0100
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: Multipart/Mixed;
	boundary="Boundary-00=_Sy67A2Mae8weAiX"
Message-Id: <200407100945.06998.jseward@acm.org>
Subject: [Qemu-devel] Reducing X communication bandwidth
Reply-To: jseward@acm.org, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu-devel@nongnu.org

--Boundary-00=_Sy67A2Mae8weAiX
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

QEMU is great, but one specific problem has made it unusable for me
so far. The (fast) machine I want to run QEMU on is not the same as
my (slow) machine which is on my desk. That is, I want to do the
usual X "export DISPLAY=someothermachine:0.0" game.
These machines are connected by an old-fashioned 10 Mbit/sec network,
which you may argue is obsolete, but which I would argue reasonably
approximates the bandwidth of the 802.11b/g LANs that are common now.

Anyway, QEMU from cvs is unusable like this, at least when running
WinXP or Win2K. The SDL layer (qemu/sdl.c) blasts huge numbers of
pixels across the network in response to even the simplest graphical
operations. Notably, moving the mouse pointer is appalling, with an
update rate of about twice per second, which is hopeless.

The attached patch against sdl.c fixes this. It keeps a shadow copy
of video memory. When a request arrives at sdl_update() to redraw an
area, the area is compared against the shadow copy, and only the parts
that have really changed are passed to SDL_UpdateRect(). The
comparison is done at a granularity of 32x32 chunks of pixels.

I currently have an XP session that has been running for several
hours. Without the patch, 2.7 million of these 32x32 chunks would
have been transmitted across the network, whereas with it, the number
is reduced to 322000, a reduction of more than a factor of 8. This
makes QEMU actually usable for me. The most dramatic improvement is
in merely moving the mouse pointer, where QEMU typically requests 32
such chunks to be drawn, but only 1 or 2 are actually necessary, so
mouse-pointer responsiveness improves massively.

Of course it brings no benefit if sdl_update() is constantly asked to
draw a series of "new" images, but that generally doesn't appear to be
the case.

Given that QEMU is currently (imo) unusable unless the X client and
server run on the same machine, I think this is a good candidate for
cvs. It may need to handle the case where
screen->format->BytesPerPixel != 4; I don't know whether that
situation arises, but the patch is a good start.
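As a rough illustration of the shadow-copy idea described above, here is
a minimal standalone sketch. It is not the code from the attached patch:
the names TILE, flush_rect(), sync_tile() and econo_update() are
hypothetical, flush_rect() merely stands in for SDL_UpdateRect(), and
the rows are compared with memcmp() so that the sketch does not assume
4 bytes per pixel. It also assumes the caller passes a rectangle that
lies inside both buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 32  /* comparison granularity, like the 32x32 chunks above */

/* Hypothetical stand-in for SDL_UpdateRect(): it only counts how many
   tiles would have been sent to the X server. */
static int tiles_sent = 0;

static void flush_rect(int x, int y, int w, int h)
{
    (void)x; (void)y; (void)w; (void)h;
    tiles_sent++;
}

/* Compare one tile of the live framebuffer with the shadow copy, row
   by row.  Rows that differ are copied into the shadow.  Returns 1 if
   any row differed, 0 otherwise. */
static int sync_tile(const unsigned char *fb, unsigned char *shadow,
                     size_t pitch, size_t bpp,
                     int x, int y, int w, int h)
{
    int row, dirty = 0;
    size_t len = (size_t)w * bpp;

    for (row = y; row < y + h; row++) {
        size_t off = (size_t)row * pitch + (size_t)x * bpp;
        if (memcmp(fb + off, shadow + off, len) != 0) {
            memcpy(shadow + off, fb + off, len);
            dirty = 1;
        }
    }
    return dirty;
}

/* Walk the requested rectangle in TILE x TILE steps and flush only the
   tiles whose contents changed.  Returns the number of tiles flushed. */
static int econo_update(const unsigned char *fb, unsigned char *shadow,
                        size_t pitch, size_t bpp,
                        int x, int y, int w, int h)
{
    int xi, yi, sent = 0;

    assert(bpp > 0 && w >= 0 && h >= 0);
    for (yi = y; yi < y + h; yi += TILE) {
        int th = (yi + TILE > y + h) ? (y + h - yi) : TILE;
        for (xi = x; xi < x + w; xi += TILE) {
            int tw = (xi + TILE > x + w) ? (x + w - xi) : TILE;
            if (sync_tile(fb, shadow, pitch, bpp, xi, yi, tw, th)) {
                flush_rect(xi, yi, tw, th);
                sent++;
            }
        }
    }
    return sent;
}
```

With two zero-filled 128x64 buffers at 2 bytes per pixel, changing one
pixel in the live buffer and calling econo_update() over the whole
screen flushes a single tile out of eight; an immediate second call
flushes none, because the shadow copy is by then up to date.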
J

--Boundary-00=_Sy67A2Mae8weAiX
Content-Type: text/x-diff; charset="us-ascii"; name="remote_x_client.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="remote_x_client.patch"

Index: sdl.c
===================================================================
RCS file: /cvsroot/qemu/qemu/sdl.c,v
retrieving revision 1.15
diff -u -3 -p -r1.15 sdl.c
--- sdl.c	5 Jul 2004 22:13:07 -0000	1.15
+++ sdl.c	10 Jul 2004 08:14:05 -0000
@@ -33,6 +33,8 @@
 #define CONFIG_SDL_GENERIC_KBD
 #endif
 
+#include <assert.h>
+
 static SDL_Surface *screen;
 static int gui_grab; /* if true, all keyboard/mouse events are grabbed */
 static int last_vm_running;
@@ -41,10 +43,126 @@ static int gui_fullscreen;
 static int gui_key_modifier_pressed;
 static int gui_keysym;
 
+/* Mechanism to reduce the total amount of data transmitted to the X
+   server, often quite dramatically.  Keep a shadow copy of video
+   memory in alt_pixels, and when asked to update a rectangle, use
+   the shadow copy to establish areas which are the same, and so do
+   not need updating.
+*/
+
+static int* alt_pixels = NULL;
+
+#define THRESH 32
+
+/* Return 1 if the area [x .. x+w-1, y .. y+h-1] is different from
+   the old version and so needs updating. */
+static int cmpArea ( int x, int y, unsigned int w, unsigned int h)
+{
+    int i, j;
+    unsigned int sll;
+    int* p1base = (int*)screen->pixels;
+    int* p2base = (int*)alt_pixels;
+    assert(screen->format->BytesPerPixel == 4);
+    sll = ((unsigned int)screen->pitch) >> 2;
+
+    for (j = y; j < y+h; j++) {
+        for (i = x; i < x+w; i++) {
+            if (p1base [j * sll +i] != p2base [j * sll +i])
+                return 1;
+        }
+    }
+    return 0;
+}
+
+static void copyArea ( int x, int y, unsigned int w, unsigned int h)
+{
+    int i, j;
+    unsigned int sll;
+    int* p1base = (int*)screen->pixels;
+    int* p2base = (int*)alt_pixels;
+    assert(screen->format->BytesPerPixel == 4);
+    sll = ((unsigned int)screen->pitch) >> 2;
+    for (j = y; j < y+h; j++) {
+        for (i = x; i < x+w; i++) {
+            p2base [j * sll +i] = p1base [j * sll +i];
+        }
+    }
+}
+
+
+static void econoUpdate (DisplayState *ds, int x, int y,
+                         unsigned int w, unsigned int h)
+{
+    static int tested_total = 0;
+    static int update_total = 0;
+
+    int xi, xj, xlim, yi, yj, ylim, ntest, nupd;
+    if (w == 0 || h == 0)
+        return;
+    xlim = x + w;
+    ylim = y + h;
+
+    ntest = nupd = 0;
+    for (xi = x; xi < xlim; xi += THRESH) {
+        xj = xi + THRESH;
+        if (xj > xlim) xj = xlim;
+        for (yi = y; yi < ylim; yi += THRESH) {
+            yj = yi + THRESH;
+            if (yj > ylim) yj = ylim;
+            if (xj-xi == 0 || yj-yi == 0)
+                continue;
+            ntest++;
+            if (cmpArea(xi, yi, xj-xi, yj-yi)) {
+                nupd++;
+                copyArea(xi, yi, xj-xi, yj-yi);
+                SDL_UpdateRect(screen, xi, yi, xj-xi, yj-yi);
+            }
+        }
+    }
+    tested_total += ntest;
+    update_total += nupd;
+    printf("(tested, updated): total (%d, %d), this time (%d, %d)\n",
+           tested_total, update_total, ntest, nupd);
+}
+
+
 static void sdl_update(DisplayState *ds, int x, int y, int w, int h)
 {
-    //    printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
-    SDL_UpdateRect(screen, x, y, w, h);
+    static int warned = 0; /* static, so the warning is printed only once */
+    //printf("updating x=%d y=%d w=%d h=%d\n", x, y, w, h);
+    //    printf("Total Size %d %d\n", screen->w, screen->h);
+    //printf("BytesPerPixel %d\n", screen->format->BytesPerPixel);
+    //printf("pitch %d\n", screen->pitch);
+
+    if (screen->format->BytesPerPixel != 4
+        || screen->pitch != screen->w * screen->format->BytesPerPixel) {
+        if (!warned) {
+            warned = 1;
+            fprintf(stderr, "qemu: SDL update optimisation disabled\n"
+                            " (wrong bits-per-pixel, or wrong pitch)\n");
+        }
+        SDL_UpdateRect(screen, x, y, w, h);
+        return;
+    }
+
+    assert(screen->pitch == screen->w * screen->format->BytesPerPixel);
+    assert(sizeof(int) == 4);
+    if (alt_pixels == NULL) {
+        /* First time through (at this resolution).
+           Copy entire screen. */
+        int i, word32s = screen->w * screen->h;
+        //printf("copying init screen\n");
+        alt_pixels = malloc(word32s * sizeof(int));
+        assert(alt_pixels);
+        for (i = 0; i < word32s; i++)
+            alt_pixels[i] = ((int*)(screen->pixels))[i];
+        SDL_UpdateRect(screen, x, y, w, h);
+        //printf("done\n");
+    } else {
+        assert(w >= 0);
+        assert(h >= 0);
+        econoUpdate(ds, x, y, w, h);
+    }
 }
 
 static void sdl_resize(DisplayState *ds, int w, int h)
@@ -53,6 +171,11 @@ static void sdl_resize(DisplayState *ds,
 
     //    printf("resizing to %d %d\n", w, h);
 
+    if (alt_pixels) {
+        free(alt_pixels);
+        alt_pixels = NULL;
+    }
+
     flags = SDL_HWSURFACE|SDL_ASYNCBLIT|SDL_HWACCEL;
     flags |= SDL_RESIZABLE;
     if (gui_fullscreen)

--Boundary-00=_Sy67A2Mae8weAiX--