Re: [Qemu-devel] [PATCH] qcow2: do not allocate extra memory

From: Eric Blake <eblake@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: mreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH] qcow2: do not allocate extra memory
Date: Tue, 12 Jul 2016 12:43:01 -0600
Message-ID: <57853A35.4030501@redhat.com>
In-Reply-To: <1468345431-106198-1-git-send-email-vsementsov@virtuozzo.com>

On 07/12/2016 11:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> There is no need to allocate more than one cluster, as we set
> avail_out for deflate to one cluster.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Hi all!
> 
> Please, can anybody tell me what I'm missing?
> 

https://tools.ietf.org/html/rfc1951 states a simple fact about compression:

      A simple counting argument shows that no lossless compression
      algorithm can compress every possible input data set.  For the
      format defined here, the worst case expansion is 5 bytes per 32K-
      byte block, i.e., a size increase of 0.015% for large data sets.

So overallocating the output buffer guarantees that you will get a valid
compression result from a single function call, even when the data is
incompressible. (The deflate format in RFC 1951 provides for this case:
if the normal algorithm does not reduce the size of the data, the data
is emitted as stored blocks, each with a small fixed-length header, so
you at least avoid unbounded expansion when compressing pathological
data.)
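
To put rough numbers on this, here is a minimal sketch in C. It compares
the worst case implied by the RFC 1951 remark quoted above (5 bytes per
32K block) with the slack that the existing g_malloc() call adds
(cluster_size / 1000 + 128). The function names are made up for
illustration. The RFC figure describes the format, not what a particular
zlib configuration is guaranteed to emit, so treat this as arithmetic
rather than proof that any given slack is sufficient.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Overhead implied by the RFC 1951 remark: 5 bytes per 32K block.
 * Hypothetical helper for illustration only.
 */
static size_t rfc1951_worst_case_overhead(size_t len)
{
    const size_t block = 32 * 1024;
    size_t blocks = (len + block - 1) / block;  /* round up */

    if (blocks == 0) {
        blocks = 1;  /* even empty input needs one block header */
    }
    return blocks * 5;
}

/* Slack added by the allocation in the patch context: size/1000 + 128. */
static size_t qcow2_alloc_slack(size_t cluster_size)
{
    assert(cluster_size > 0);
    return cluster_size / 1000 + 128;
}
```

For a 64 KiB cluster this gives 10 bytes of RFC-style overhead against
193 bytes of slack, so by this estimate the existing allocation is
comfortably larger than the format's theoretical worst case.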

But since the qcow2 format already has a way of recording whether a
cluster is compressed or not, we probably don't have to rely on the
compressor's marker for incompressible data, and could instead tweak the
code to refuse to compress any cluster whose output would take more than
a cluster's worth of bytes.  I'm not familiar enough with zlib's
interface to know how easy or hard this is, or whether merely checking
error codes is sufficient.  Nor do I know whether qemu's use of zlib
would behave correctly in the face of such an error when the output
buffer is undersized because the data was incompressible.

> ...
> strm.avail_out = s->cluster_size;
> strm.next_out = out_buf;
> 
> ret = deflate(&strm, Z_FINISH);
> ...
> out_len = strm.next_out - out_buf;

You've skipped what is done with ret, which will be different according
to whether the entire compressed stream fit in the buffer described by
strm, and that would have to be audited as part of your proposed patch.

> -    out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
> +    out_buf = g_malloc(s->cluster_size);

Is avoiding the fudge factor really worth it? I doubt we'll see a
noticeable performance gain from this patch, and it may be easier to
leave things alone than to audit that we correctly handle the case where
the attempt at compression produces more output than the original data,
now that the output buffer would be limited to a single cluster.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
