From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:40630)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1RiR9t-0001FA-Jn
	for qemu-devel@nongnu.org; Wed, 04 Jan 2012 08:45:43 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1RiR9n-0005Rj-5o
	for qemu-devel@nongnu.org; Wed, 04 Jan 2012 08:45:37 -0500
Received: from mx1.redhat.com ([209.132.183.28]:14700)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1RiR9m-0005Rf-T1
	for qemu-devel@nongnu.org; Wed, 04 Jan 2012 08:45:31 -0500
Message-ID: <4F0457F6.4030400@redhat.com>
Date: Wed, 04 Jan 2012 15:45:26 +0200
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
References: <1325604879-15862-1-git-send-email-owasserm@redhat.com>
	<1325604879-15862-3-git-send-email-owasserm@redhat.com>
	<4F044D41.5000200@redhat.com>
	<CAJSP0QVjqx-acfp8wqCLLdjJpufYcqJ3HmMVRj_0tZoayKndmg@mail.gmail.com>
In-Reply-To: <CAJSP0QVjqx-acfp8wqCLLdjJpufYcqJ3HmMVRj_0tZoayKndmg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v5 2/9] Add rle_encode and rle_decode
 functions Implement Run Length Encoding compression
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: blauwirbel@gmail.com, Orit Wasserman <owasserm@redhat.com>, qemu-devel@nongnu.org, quintela@redhat.com

On 01/04/2012 03:35 PM, Stefan Hajnoczi wrote:
> > I think we should specialize this for out case, where we expect runs of
> > zeros (so no need to encode the repeated byte) and runs of non-repeating
> > nonzeros.  I propose this encoding:
> >
> >  page = zrun
> >       | zrun nzrun
> >       | zrun nzrun page
> >
> >  zrun = length
> >
> >  nzrun = length byte...
> >
> >  length = uleb128 encoded integer
> >
> > take for example a xor-encoded page:
> >
> >  { 1000*0, 1, 2, 3, 4, 3092*0 }
> >
> > representing a page that had a single 32-bit write in the middle.  The
> > current encoding would generate
> >
> >  ff 00 ff 00 ff 00 eb 00 01 01 01 02 01 03 01 04 ff 00 ff 00 ff 00 ff
> > 00 ff 00 ff 00ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 20 00
> >
> > while the zrle encoding generates
> >
> >  e8 07 04 01 02 03 04 94 18
> >
> > (e8 07 = uleb128 encoding for 1000)
>
> Had to look up the Unsigned Little-Endian Base 128 encoding, but it's
> a nice idea and simple to implement (though we probably want to
> constrain ULEB128 to max 32-bit or 64-bit in practice; we don't want
> arbitrarily long integers).
>
> http://en.wikipedia.org/wiki/LEB128
>

Sorry, should have provided the URL.  And yes, for this use we're
limited to TARGET_PAGE_BITS, so having 13-bit encoders/decoders would
suffice:

   int uleb128_encode_small(uint8_t *out, uint32_t n)
   {
       if (n < 0x80) {
           *out++ = n;
           return 1;
       } else {
           *out++ = (n & 0x7f) | 0x80;
           *out++ = n >> 7;
       }
   }

   int uleb128_decode_small(const uint128 *in, uint32_t *n)
   {
       if (!(*in & 0x80)) {
           *n = *in++;
           return 1;
       } else {
           *n = *in++ & 0x7f;
           *n |= *in++ << 7;
           return 2;
       }
   }

-- 
error compiling committee.c: too many arguments to function