From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:40630) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RiR9t-0001FA-Jn for qemu-devel@nongnu.org; Wed, 04 Jan 2012 08:45:43 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RiR9n-0005Rj-5o for qemu-devel@nongnu.org; Wed, 04 Jan 2012 08:45:37 -0500 Received: from mx1.redhat.com ([209.132.183.28]:14700) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RiR9m-0005Rf-T1 for qemu-devel@nongnu.org; Wed, 04 Jan 2012 08:45:31 -0500 Message-ID: <4F0457F6.4030400@redhat.com> Date: Wed, 04 Jan 2012 15:45:26 +0200 From: Avi Kivity MIME-Version: 1.0 References: <1325604879-15862-1-git-send-email-owasserm@redhat.com> <1325604879-15862-3-git-send-email-owasserm@redhat.com> <4F044D41.5000200@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v5 2/9] Add rle_encode and rle_decode functions Implement Run Length Encoding compression List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: blauwirbel@gmail.com, Orit Wasserman , qemu-devel@nongnu.org, quintela@redhat.com On 01/04/2012 03:35 PM, Stefan Hajnoczi wrote: > > I think we should specialize this for out case, where we expect runs of > > zeros (so no need to encode the repeated byte) and runs of non-repeating > > nonzeros. I propose this encoding: > > > > page = zrun > > | zrun nzrun > > | zrun nzrun page > > > > zrun = length > > > > nzrun = length byte... > > > > length = uleb128 encoded integer > > > > take for example a xor-encoded page: > > > > { 1000*0, 1, 2, 3, 4, 3092*0 } > > > > representing a page that had a single 32-bit write in the middle. The > > current encoding would generate > > > > ff 00 ff 00 ff 00 eb 00 01 01 01 02 01 03 01 04 ff 00 ff 00 ff 00 ff > > 00 ff 00 ff 00ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 20 00 > > > > while the zrle encoding generates > > > > e8 07 04 01 02 03 04 94 18 > > > > (e8 07 = uleb128 encoding for 1000) > > Had to look up the Unsigned Little-Endian Base 128 encoding, but it's > a nice idea and simple to implement (though we probably want to > constrain ULEB128 to max 32-bit or 64-bit in practice; we don't want > arbitrarily long integers). > > http://en.wikipedia.org/wiki/LEB128 > Sorry, should have provided the URL. And yes, for this use we're limited to TARGET_PAGE_BITS, so having 13-bit encoders/decoders would suffice: int uleb128_encode_small(uint8_t *out, uint32_t n) { if (n < 0x80) { *out++ = n; return 1; } else { *out++ = (n & 0x7f) | 0x80; *out++ = n >> 7; } } int uleb128_decode_small(const uint128 *in, uint32_t *n) { if (!(*in & 0x80)) { *n = *in++; return 1; } else { *n = *in++ & 0x7f; *n |= *in++ << 7; return 2; } } -- error compiling committee.c: too many arguments to function