From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:41738)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1SmAhq-0000Qz-Oa
	for qemu-devel@nongnu.org; Tue, 03 Jul 2012 17:32:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1SmAho-00037H-A9
	for qemu-devel@nongnu.org; Tue, 03 Jul 2012 17:32:22 -0400
Received: from mx1.redhat.com ([209.132.183.28]:40523)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1SmAho-00036l-19
	for qemu-devel@nongnu.org; Tue, 03 Jul 2012 17:32:20 -0400
Message-ID: <4FF364D8.4020407@redhat.com>
Date: Tue, 03 Jul 2012 15:32:08 -0600
From: Eric Blake <eblake@redhat.com>
MIME-Version: 1.0
References: <1341323574-23206-1-git-send-email-owasserm@redhat.com>
	<1341323574-23206-11-git-send-email-owasserm@redhat.com>
In-Reply-To: <1341323574-23206-11-git-send-email-owasserm@redhat.com>
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature";
	boundary="------------enig6FB6C486FEDA83EEA5DC64D6"
Subject: Re: [Qemu-devel] [PATCH v14 10/13] Add xbzrle_encode_buffer and
 xbzrle_decode_buffer functions
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Orit Wasserman <owasserm@redhat.com>
Cc: peter.maydell@linaro.org, aliguori@us.ibm.com, quintela@redhat.com, stefanha@gmail.com, qemu-devel@nongnu.org, mdroth@linux.vnet.ibm.com, blauwirbel@gmail.com, Petter Svard <petters@cs.umu.se>, Benoit Hudzia <benoit.hudzia@sap.com>, avi@redhat.com, Aidan Shribman <aidan.shribman@sap.com>, pbonzini@redhat.com, chegu_vinod@hp.com

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig6FB6C486FEDA83EEA5DC64D6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 07/03/2012 07:52 AM, Orit Wasserman wrote:
> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
> Signed-off-by: Petter Svard <petters@cs.umu.se>
> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

> +int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,=

> +                         uint8_t *dst, int dlen)
> +{
> +    uint32_t zrun_len =3D 0, nzrun_len =3D 0;
> +    int d =3D 0, i =3D 0, start;
> +    long res, xor;
> +    uint8_t *nzrun_start =3D NULL;
> +
> +    g_assert(!((uintptr_t)old_buf & (sizeof(long) - 1)) &&
> +             !((uintptr_t)new_buf & (sizeof(long) - 1)) &&
> +             !(slen & (sizeof(long) - 1)));

I know I suggested this in v13, so now I'm bike-shedding my own review,
but I think it might be a bit more legible as:

g_assert(!(((uintptr_t)old_buf) | ((uintptr_t)new_buf) | slen) &
           (sizeof(long) - 1)));

> +
> +    while (i < slen) {
> +        /* overflow */
> +        if (d + 2 > dlen) {
> +            return -1;
> +        }
> +
> +        /* not aligned to sizeof(long) */
> +        res =3D (slen - i) % sizeof(long);
> +        if (res) {
> +            start =3D i;
> +            while (!(old_buf[i] ^ new_buf[i]) && (i - start) <=3D res)=
 {
> +                zrun_len++;
> +                i++;
> +            }
> +        }

I'm not sure why you need 'start'; this same block will do the same
amount of computation with less typing and register pressure:

res =3D (slen - i) % sizeof(long);
while (res && old_buf[i] =3D=3D new_buf[i]) {
    zrun_len++;
    i++;
    res--;
}

> +
> +        if (zrun_len =3D=3D res) {

If you use my above rewrite, then this condition changes to:

if (!res) {

> +            while (i <=3D (slen - sizeof(long)) &&

I'd shorten this to:

while (i < slen &&

> +                   (*(long *)(old_buf + i)) =3D=3D (*(long *)(new_buf =
+ i))) {
> +                i +=3D sizeof(long);
> +                zrun_len +=3D sizeof(long);
> +            }
> +
> +            /* go over the rest */
> +            while (old_buf[i] =3D=3D new_buf[i] && ++i <=3D slen) {

Buffer overrun: if the page ends on a zrun, then at this point, i =3D=3D
slen, and old_buf[i] is now pointing one past the end of the array.  You
need to swap the conditions, and only increment 'i' inside the {}, as in:=


while (i < slen && old_buf[i] =3D=3D new_buf[i]) {
    i++;

> +                zrun_len++;
> +            }
> +        }
> +
> +        /* buffer unchanged */
> +        if (zrun_len =3D=3D slen) {
> +            return 0;
> +        }
> +
> +        /* skip last zero run */
> +        if (i =3D=3D slen + 1) {

Buffer overrun.  The loop should terminate at 'i =3D=3D slen', not at 'i =
=3D=3D
slen + 1', since old_buf[slen - 1] is the last addressable byte.

> +            return d;
> +        }
> +
> +        d +=3D uleb128_encode_small(dst + d, zrun_len);
> +
> +        /* no nzrun */
> +        if (i =3D=3D slen) {
> +            return d;
> +        }

This is the correct code but repeats the intent of what you were trying
just before the uleb128_encode_small().

> +
> +        zrun_len =3D 0;
> +        nzrun_start =3D new_buf + i;

Missing an overflow check here.  If there are fewer than 2 bytes before
we hit dlen, then there is no way that we can fit in an nzrun, so we
might as well quit here.

> +
> +        /* not aligned to sizeof(long) */
> +        res =3D (slen - i) % sizeof(long);
> +        if (res) {
> +            start =3D i;
> +            while (old_buf[i] !=3D new_buf[i] && i - start < res) {
> +                i++;
> +                nzrun_len++;
> +            }
> +        }

Again, you can exploit the fact that slen is aligned, and shorten to:

res =3D (slen - i) % sizeof(long);
while (res && old_bf[i] !=3D new_buf[i]) {
    i++;
    nzrun_len++;
    res--;
}

> +
> +        if (nzrun_len =3D=3D res) {

and this would change to:

if (!res) {

> +            /* truncation to 32-bit long okay */
> +            long mask =3D 0x0101010101010101ULL;
> +            xor =3D *(long *)(old_buf + i) ^ *(long *)(new_buf + i);
> +            printf("cont\n");

Stray debugging comment.

> +            while (i <=3D (slen - sizeof(long))) {

shorter to write as:

while (i < slen) {

> +                if ((xor - mask) & ~xor & (mask << 7)) {
> +                    /* found the end of an nzrun within the current lo=
ng */
> +                    while (old_buf[i] !=3D new_buf[i] && ++i <=3D slen=
) {

No need to compare against slen here, since the outer while guarantees
that.  You can simplify to:

while (old_buf[i] !=3D new_buf[i]) {
    i++;

> +                        nzrun_len++;
> +                    }
> +                    break;
> +                } else {
> +                    i +=3D sizeof(long);
> +                    nzrun_len +=3D sizeof(long);
> +                    xor =3D *(long *)(old_buf + i) ^ *(long *)(new_buf=
 + i);
> +                }
> +            }
> +

At this point, you are guaranteed to have found the end of the nzrun
(either you found two equal bytes, or you hit slen), which means...

> +            while (old_buf[i] !=3D new_buf[i] && ++i <=3D slen) {
> +                nzrun_len++;
> +            }

this while loop is dead code.  Furthermore, this while loop is a
potential out-of-bounds-read (remember, old_buf[slen] is not valid
memory), so nuking it is the way to go.

> +        }
> +
> +        /* overflow */
> +        if (d + nzrun_len + 2 > dlen) {
> +            return -1;

There is a corner case where this reports overflow even when there was
none - that is when the old_buf ends in an nzrun of 0x7f bytes or less,
so the encoding occupies only 1 byte, and the encoded version would
still fit in dlen.  I think you want to defer your overflow checking
until after...

> +        }
> +
> +        d +=3D uleb128_encode_small(dst + d, nzrun_len);

=2E..you know the encoded size.  But that only works if you made sure
earlier that there was room to encode nzrun_len (up to 2 bytes) in the
first place.

> +        memcpy(dst + d, nzrun_start, nzrun_len);
> +        d +=3D nzrun_len;
> +        nzrun_len =3D 0;
> +    }
> +
> +    return d;
> +}
> +

Do you have any unit tests that pass in various inputs and compare
against expected outputs?

> +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dle=
n)
> +{
> +    int i =3D 0, d =3D 0;
> +    int ret;
> +    uint32_t count =3D 0;
> +
> +    while (i < slen) {
> +
> +        /* zrun */
> +        if ((slen - i) < 2 && *(src + i) & 0x80) {
> +            return -1;
> +        }

You can simplify this - the protocol demands an nzrun after every zrun,
which means at this point in the decode, there must be at least 3 bytes
to be valid.  It's simpler to write:

if (slen - i < 2) {
    return -1;
}

> +
> +        ret =3D uleb128_decode_small(src + i, &count);
> +        if (ret < 0) {
> +            return -1;
> +        }

I'd also add a check that we only ever get a zero-length zrun at the
start of a buffer (anywhere else, a zero-length zrun is invalid); as in:

if (!count && i) {
    return -1;
}

> +        i +=3D ret;
> +        d +=3D count;
> +
> +        /* overflow */
> +        if (d > dlen) {
> +            return -1;
> +        }
> +
> +        /* completed decoding */

No, if we got here, then we were handed an invalid stream.  Remember,
the protocol says that every zrun must be followed by an nzrun.

> +        if (i =3D=3D slen - 1) {
> +            return d;
> +        }
> +
> +        /* nzrun */
> +        if ((slen - i) < 2 && *(src + i) & 0x80) {
> +            return -1;
> +        }

Again, a valid nzrun is at least 2 bytes, so no need to worry about the
contents of *src, it is sufficient to check:

if (slen - i < 2) {
    return -1;
}

> +        ret =3D uleb128_decode_small(src + i, &count);
> +        if (ret < 0) {

An nzrun should be a non-zero value; I'd write this as (ret <=3D 0) to
rule out an attempt to pass a zero-length nzrun.

> +            return -1;
> +        }
> +        i +=3D ret;
> +
> +        /* overflow */
> +        if (d + count > dlen) {
> +            return -1;
> +        }
> +
> +        memcpy(dst + d , src + i, count);
> +        d +=3D count;
> +        i +=3D count;
> +    }
> +
> +    return d;
> +}
>=20

--=20
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--------------enig6FB6C486FEDA83EEA5DC64D6
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBCAAGBQJP82TYAAoJEKeha0olJ0NqJKwH/2oHs+u3yheZsadmnotsm7Dg
IhRCfaeIxtUHYJn4hb8ZZjT62mmTsra23qMkgbaprya+4RhMs5rUjxaGExbf7EZ/
SifAGfPbZTTOQ1cZa7EgG5Op+DT7pjChdFZ9x1MtHK1ueTD5WYETk8LwLcv5qvXf
anYjeJRiXF4d+LasKt+kl2Kwb5dQUHLvPHOn4JBD4ztRQ0/EBahTvAfkdOwpLxQM
CEvk4x+AFvE0ZaamAyJZ2jN0G/gzS0hifZi3tTJcCvuYA2reCBiyKmouW6EL9v4i
bgxm6UVwmOu2/UzRM/JW86W0ywq5c2dqbtC9EPl+c+1DLSifD8H6cDdHCAgNf3o=
=WUxm
-----END PGP SIGNATURE-----

--------------enig6FB6C486FEDA83EEA5DC64D6--