From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52123) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bN2eF-0000fG-7y for qemu-devel@nongnu.org; Tue, 12 Jul 2016 14:43:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bN2eD-00079i-VA for qemu-devel@nongnu.org; Tue, 12 Jul 2016 14:43:11 -0400 References: <1468345431-106198-1-git-send-email-vsementsov@virtuozzo.com> From: Eric Blake Message-ID: <57853A35.4030501@redhat.com> Date: Tue, 12 Jul 2016 12:43:01 -0600 MIME-Version: 1.0 In-Reply-To: <1468345431-106198-1-git-send-email-vsementsov@virtuozzo.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="71FlXQEd9OOPjrxXAPAwFWjehh2QpCVCg" Subject: Re: [Qemu-devel] [PATCH] qcow2: do not allocate extra memory List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladimir Sementsov-Ogievskiy , qemu-devel@nongnu.org, qemu-block@nongnu.org Cc: mreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com, den@openvz.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --71FlXQEd9OOPjrxXAPAwFWjehh2QpCVCg From: Eric Blake To: Vladimir Sementsov-Ogievskiy , qemu-devel@nongnu.org, qemu-block@nongnu.org Cc: mreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com, den@openvz.org Message-ID: <57853A35.4030501@redhat.com> Subject: Re: [PATCH] qcow2: do not allocate extra memory References: <1468345431-106198-1-git-send-email-vsementsov@virtuozzo.com> In-Reply-To: <1468345431-106198-1-git-send-email-vsementsov@virtuozzo.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 07/12/2016 11:43 AM, Vladimir Sementsov-Ogievskiy wrote: > There are no needs to allocate more than one cluster, as we set > avail_out for deflate to one cluster. >=20 > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- >=20 > Hi all! >=20 > Please, can anybody say me what I'm missing? >=20 https://tools.ietf.org/html/rfc1951 states a simple fact about compressio= n: A simple counting argument shows that no lossless compression algorithm can compress every possible input data set. For the format defined here, the worst case expansion is 5 bytes per 32K- byte block, i.e., a size increase of 0.015% for large data sets. So overallocating the output buffer guarantees that you will get a valid compression result via a single function call, even when the data is incompressible (the zlib format specifically documents that if the normal algorithm on the data does not reduce its size, then you merely add a fixed-length marker that documents that fact, so you at least avoid unbounded expansion when trying to compress pathological data). But since the qcow2 format already has a way of documenting whether a cluster is compressed or not, we probably don't have to rely on zlib's marker for uncompressible data, and could instead tweak the code to specifically refuse to compress any cluster whose output would result in more than a cluster's worth of bytes. I'm not familiar enough with zlib's interface to know how easy or hard this is, and whether merely checking error codes is sufficient, nor whether qemu's use of zlib would behave correctly in the face of such an error when the output buffer is undersized because the data was incompressible. > ... > strm.avail_out =3D s->cluster_size; > strm.next_out =3D out_buf; >=20 > ret =3D deflate(&strm, Z_FINISH); > ... > out_len =3D strm.next_out - out_buf; You've skipped what is done with ret, which will be different according to whether the entire compressed stream fit in the buffer described by strm, and that would have to be audited as part of your proposed patch. > - out_buf =3D g_malloc(s->cluster_size + (s->cluster_size / 1000) + = 128); > + out_buf =3D g_malloc(s->cluster_size); Is avoiding the fudge factor really worth it? I don't know that we'll get a noticeable performance gain with this patch, and it may be easier to leave things alone than to audit that we are correctly handling cases where the attempt at compression results in a zlib buffer larger than the original data, even when the output buffer size is now constrained differently. --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --71FlXQEd9OOPjrxXAPAwFWjehh2QpCVCg Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJXhTo1AAoJEKeha0olJ0NqaiIIAKeh9Tjoj2ulQyPy7Oo+ZSiJ VgnghLj/rGFjDzglvcyG9HJkafkIKmx8FaLxJdu1OHb9spiCR8PNgzrWfs9f8w2i ejgkHYqeeNCjPZn45n2+OAUYnulntoClfu3JeB0AEGt9aNA8QVzJR/UCMr6/7WX1 zTltaBdS4Vxol2iwIRrqqnSz5f1zLEgc8NpxZP7yzJgC11rh9UAqS62NoHiO/xsS Uzr2T5Eg6IewyofooBR/EeVC8SrUrUQjvAnv6GG7l64rgJPMDa/r0E3JnUpowgba OQ/u83yv5jlxE8rl2Oo6Jf7fqV2xDV+/II4/a5R3D+uuN8HptiB/hHdpZk7j6Fk= =igUe -----END PGP SIGNATURE----- --71FlXQEd9OOPjrxXAPAwFWjehh2QpCVCg--