Re: Avoid copying unallocated clusters during full backup

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Leo Luan <leoluan@gmail.com>
To: Eric Blake <eblake@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	John Snow <jsnow@redhat.com>,
	qemu-devel@nongnu.org, Qemu-block <qemu-block@nongnu.org>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: Avoid copying unallocated clusters during full backup
Date: Fri, 17 Apr 2020 15:57:30 -0700	[thread overview]
Message-ID: <CAD_3s9KuB7QaXCTygRfOsscXbrhm0kHbHRpUjb51EiwRpEM5MA@mail.gmail.com> (raw)
In-Reply-To: <7c722a98-29ab-ba65-2f19-088628ce8f00@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3334 bytes --]

On Fri, Apr 17, 2020 at 1:24 PM Eric Blake <eblake@redhat.com> wrote:

> On 4/17/20 3:11 PM, John Snow wrote:
>
> >> +
> >> +    if (s->sync_mode == MIRROR_SYNC_MODE_FULL &&
> >> +       s->bcs->target->bs->drv != NULL &&
> >> +       strncmp(s->bcs->target->bs->drv->format_name, "qcow2", 5) == 0
> &&
> >> +       s->bcs->source->bs->backing_file[0] == '\0')
> >
> > This isn't going to suffice upstream; the backup job can't be performing
> > format introspection to determine behavior on the fly.
>
> Agreed.  The idea is right (we NEED to make backup operations smarter
> based on knowledge about both source and destination block status), but
> the implementation is not (a check for strcncmp("qcow2") is not ideal).
>

I see/agree that using strncmp("qcow2") is not general enough for the
upstream.  Would changing it to bdrv_unallocated_blocks_are_zero() suffice?


> >
> > I think what you're really after is something like
> > bdrv_unallocated_blocks_are_zero().
>
> The fact that qemu-img already has a lot of optimizations makes me
> wonder what we can salvage from there into reusable code that both
> qemu-img and block backup can share, so that we're not reimplementing
> block status handling in multiple places.
>

A general fix reusing some existing code would be great.  When will it
appear in the upstream?  We are hoping to avoid needing to use a private
branch if possible.

>
> > So the basic premise is that if you are copying a qcow2 file and the
> > unallocated portions as defined by the qcow2 metadata are zero, it's
> > safe to skip those, so you can treat it like SYNC_MODE_TOP.
> >
> > I think you *also* have to know if the *source* needs those regions
> > explicitly zeroed, and it's not always safe to just skip them at the
> > manifest level.
> >
> > I thought there was code that handled this to some extent already, but I
> > don't know. I think Vladimir has worked on it recently and can probably
> > let you know where I am mistaken :)
>
> Yes, I'm hoping Vladimir (or his other buddies at Virtuozzo) can chime
> in.  Meanwhile, I've working on v2 of some patches that will improve
> qemu's ability to tell if a destination qcow2 file already reads as all
> zeroes, and we already have bdrv_block_status() for telling which
> portions of a source image already read as all zeroes (whether or not it
> is due to not being allocated, the goal here is that we should NOT have
> to copy anything that reads as zero on the source over to the
> destination if the destination already starts life as reading all zero).
>

Can the eventual/optimal solution allow unallocated clusters to be skipped
entirely in the backup loop and make the detection of allocated zeroes an
option, not forcing the backup thread to loop through a potentially huge
empty virtual disk?

>
> And if nothing else, qemu 5.0 just added 'qemu-img convert
> --target-is-zero' as a last-ditch means of telling qemu to assume the
> destination reads as all zeroes, even if it cannot quickly prove it; we
> probably want to add a similar knob into the QMP commands for initiating
> block backup, for the same reasons.
>

This seems a good way of assuring the status of the target file.

Thanks!

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
>

[-- Attachment #2: Type: text/html, Size: 4836 bytes --]

next prev parent reply	other threads:[~2020-04-17 22:58 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-17 18:33 Avoid copying unallocated clusters during full backup Leo Luan
2020-04-17 20:11 ` John Snow
2020-04-17 20:24   ` Eric Blake
2020-04-17 22:57     ` Leo Luan [this message]
2020-04-18  0:34       ` John Snow
2020-04-18  1:43         ` Leo Luan
2020-04-20 10:56           ` Vladimir Sementsov-Ogievskiy
2020-04-20 14:31             ` Bryan S Rosenburg
2020-04-20 15:04               ` Vladimir Sementsov-Ogievskiy
2020-04-21 14:41                 ` Bryan S Rosenburg
2020-04-17 22:31   ` Leo Luan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAD_3s9KuB7QaXCTygRfOsscXbrhm0kHbHRpUjb51EiwRpEM5MA@mail.gmail.com \
    --to=leoluan@gmail.com \
    --cc=eblake@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).