Re: [Qemu-devel] QCow2 compression

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: <mgreger@cinci.rr.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] QCow2 compression
Date: Sat, 5 Mar 2016 0:11:44 +0000	[thread overview]
Message-ID: <20160305001144.6JX6U.312524.root@dnvrco-web07> (raw)


---- Eric Blake <eblake@redhat.com> wrote: 
> [any way you could convince your mailer to not break threading?]
> 
> On 03/03/2016 09:24 PM, mgreger@cinci.rr.com wrote:
> >>
> >> The zeros are not part of the compressed data, though, that's why the 
> >> Compressed Cluster Descriptor indicates a shorter size. Had another 
> >> compressed cluster been written to the same image, it might have ended 
> >> up where you are seeing the zero padding now. (The trick with 
> >> compression is that multiple guest clusters can end up in a single host 
> >> cluster.) 
> >>
> >  
> > Thanks, but the given length of 0x5600 is still short by 160(decimal) bytes 
> > compared to the 
> > non-zero data (which occupies an additional 133 bytes beyond the expected end at 
> > 0x3DED50) and zero 
> > padding (an additional 27 bytes beyond that). Could there be an off-by-one error 
> > somewhere? 
> > The file doesn't even end on a sector boundary let alone a cluster boundary. 
> 
> Based on an IRC conversation I had when you first asked the question, I
> think the spec is indeed weak, and that we DO have some fishy code.
> 
> Look at what the code does:
> 
> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>                                                uint64_t offset,
>                                                int compressed_size)
> ...
>     nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
>                   (cluster_offset >> 9);
> 
>     cluster_offset |= QCOW_OFLAG_COMPRESSED |
>                       ((uint64_t)nb_csectors << s->csize_shift);
> 
> That sure does sound like an off-by-one.  cluster_offset does indeed
> look like a byte offset (from qcow2_alloc_bytes); so let's consider what
> happens if we've already allocated one compressed cluster in the past,
> going from 65536 to 66999.  So on this call, cluster_offset would be
> 67000, and if compressed size is 1025 (just round numbers to make
> discussion easy; no idea if gzip would really do this on any particular
> data), we are computing ((67000+1025-1)>>9) - (67000>>9) == 2, but 1025
> bytes occupies 3 sectors, not 2.
> 
> But we offset it by another off-by-one:
> 
> int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
> {
> ...
>         nb_csectors = ((cluster_offset >> s->csize_shift) &
> s->csize_mask) + 1;
> 
> Yuck.  That is horribly ugly.
> 
> We need to fix the documentation to make it obvious that the sector
> count is a _LOWER BOUND_ on the number of sectors occupied, and that you
> need to read at least one more cluster's worth of data before decompressing.
> 
> It would also be nice to fix qcow2 code to not have quite so many
> off-by-one computations, but be more precise about what data is going where.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 

Thanks for verifying the behavior for me. This will
allow me to add compression support to the code I've
already written.

I've implemented QCow2 support for a fork of the popular
DOSBox emulator.

The project (not mine), is called DOSBox-X, and is geared
towards emulating Windows9x era hardware with a specific
focus on games: http://dosbox-x.com/

If you'd like to give me any feedback on the QCow2 code I've written
please feel free to e-mail me directly. The relevant files are
qcow2_disk.cpp and qcow_disk.h. If you do, please be kind as
my C++ skills are very rusty. Overall I think it is pretty readable.
It took me about 2 and a half weeks working in my spare time to go
from reading the spec to the final code integrated into DOSBox-X.

Thanks,
Michael Greger

next             reply	other threads:[~2016-03-05  0:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-05  0:11 mgreger [this message]
  -- strict thread matches above, loose matches on Subject: below --
2016-03-04  4:24 [Qemu-devel] QCow2 compression mgreger
2016-03-04 22:19 ` Eric Blake
2016-02-27  5:00 mgreger
2016-02-29 14:01 ` Kevin Wolf
2016-02-29 15:58   ` Eric Blake
2016-02-29 14:59 ` Eric Blake
2016-02-29 15:54   ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160305001144.6JX6U.312524.root@dnvrco-web07 \
    --to=mgreger@cinci.rr.com \
    --cc=eblake@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).