From: Kevin Wolf <kwolf@redhat.com>
To: Martin Kletzander <mkletzan@redhat.com>
Cc: qemu-devel@nongnu.org, Richard Jones <rjones@redhat.com>,
Eric Blake <eblake@redhat.com>
Subject: Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
Date: Tue, 23 Apr 2019 17:08:45 +0200
Message-ID: <20190423150845.GG9041@localhost.localdomain>
In-Reply-To: <20190423142648.GA2967@wheatley>
Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
> > Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
> > > Hi,
> > >
> > > I am using qemu-img with nbdkit to transfer a disk image and then update it with
> > > extra data from newer snapshots. The final image cannot be transferred because
> > > the snapshots will be created later than the first transfer and we want to save
> > > some time up front. You might think of it as a continuous synchronisation. It
> > > looks something like this:
> > >
> > > I first transfer the whole image:
> > >
> > > qemu-img convert -p $nbd disk.raw
> > >
> > > Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
> > >
> > > Then, after the next snapshot is created, I can update it thanks to the `-n`
> > > parameter (the $nbd now points to the newer snapshot with unchanged data looking
> > > like holes in the file):
> > >
> > > qemu-img convert -p -n $nbd disk.raw
> > >
> > > This is fast and efficient as it uses the NBD block status extension, so it only
> > > transfers new data.
> >
> > This is an implementation detail. Don't rely on it. What you're doing is
> > abusing 'qemu-img convert', so problems like what you describe are to be
> > expected.
> >
> > > This can be done over and over again to keep the local
> > > `disk.raw` image up to date with the latest remote snapshot.
> > >
> > > However, when the guest OS zeroes some of the data and it gets written into the
> > > snapshot, qemu-img scans for those zeros and does not write them to the
> > > destination image. Checking the output of `qemu-img map --output=json $nbd`
> > > shows that the zeroed data is properly marked as `data: true`.
> > >
> > > Using `-S 0` would write zeros even where the holes are, effectively overwriting
> > > the data from the last snapshot even though they should not be changed.
> > >
> > > Having gone through some workarounds I would like there to be another way. I
> > > know this is far from the typical usage of qemu-img, but is this really the
> > > expected behaviour or is this just something nobody really needed before? If it
> > > is the former, would it be possible to have a parameter that would control this
> > > behaviour? If the latter is the case, can that behaviour be changed so that it
> > > properly replicates the data when `-n` parameter is used?
> > >
> > > Basically the only thing we need is to either:
> > >
> > > 1) write zeros where they actually are or
> > >
> > > 2) turn off explicit sparsification without requesting a dense image (basically
> > > sparsify only the part that is reported as a hole on the source) or
> > >
> > > 3) ideally, just use FALLOC_FL_PUNCH_HOLE in places where the source did report data,
> > > but qemu-img found they are all zeros (or where the source reported HOLE+ZERO which, I
> > > believe, is effectively the same)
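To make the difference between the observed behaviour and options 1/3 concrete, here is a toy Python model. It is only a sketch of the behaviour as described in this report, not QEMU's conversion code: the `Extent` type, the `mode` names and the whole-extent zero check are hypothetical simplifications (real zero detection works on smaller chunks and is tuned with `-S`).

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """One source extent as reported by block status (hypothetical model)."""
    offset: int
    data: bytes      # what a read of this range returns
    is_hole: bool    # True if block status reported the range as a hole


def apply_snapshot(target: bytearray, extents, mode: str) -> bytearray:
    """Update `target` in place from one snapshot's extents.

    mode="observed": holes are skipped and all-zero data extents are
                     skipped too (the behaviour described in the report).
    mode="dense":    everything is written, holes as zeros (like -S 0).
    mode="wanted":   holes are skipped, data extents are always written,
                     including all-zero ones (options 1/3; a real tool
                     could punch a hole instead of writing zeros).
    """
    for ext in extents:
        end = ext.offset + len(ext.data)
        if ext.is_hole:
            if mode == "dense":
                target[ext.offset:end] = bytes(len(ext.data))
            continue
        if mode == "observed" and not any(ext.data):
            continue  # zero detection: the zeroed block is never written
        target[ext.offset:end] = ext.data
    return target


# First transfer: the whole disk was data.
base = b"AAAABBBB"
# Next snapshot: block 0 unchanged (hole), block 1 zeroed by the guest (data).
snapshot = [Extent(0, bytes(4), True), Extent(4, bytes(4), False)]

for mode in ("observed", "dense", "wanted"):
    print(mode, bytes(apply_snapshot(bytearray(base), snapshot, mode)))
```

In this model "observed" leaves the stale `BBBB` in place, "dense" also wipes the unchanged `AAAA`, and only "wanted" yields `AAAA` followed by four zero bytes, which is the snapshot's actual content.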
> > >
> > > If you want to try this out, I found the easiest reproducible way is using
> > > nbdkit's data plugin, which can simulate whatever source image you like.
> >
> > I think what you _really_ want is a commit block job. The problem is
> > just that you don't have a proper backing file chain, but just a bunch
> > of NBD connections.
> >
> > Can't you get an NBD connection that already provides the condensed form
> > of the whole snapshot chain directly at the source? If the NBD server
> > was QEMU, this would actually be easier than providing each snapshot
> > individually.
> >
> > If this isn't possible, I think you need to replicate the backing chain
> > on the destination instead of converting into the same image again and
> > again so that qemu-img knows that it must take existing data of the
> > backing file into consideration:
> >
> > qemu-img convert -O qcow2 nbd://... base.qcow2
> > qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
> > qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
> > ...
> >
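A rough way to picture why a backing chain avoids the problem: in a qcow2 overlay, a cluster that was never written is read from the backing file, while a cluster written with zeros reads as zeros. The toy Python model below (hypothetical names, whole clusters, none of qcow2's real metadata) resolves a read by walking from the top overlay down to the base.

```python
from typing import Dict, List

# Each image maps cluster index -> contents; a missing key means
# "not allocated here, ask the backing file".
Image = Dict[int, bytes]


def read_cluster(chain: List[Image], index: int, cluster_size: int = 4) -> bytes:
    """Read one cluster from `chain`, ordered top overlay first, base last."""
    for image in chain:
        if index in image:
            return image[index]   # allocated here (possibly explicit zeros)
    return bytes(cluster_size)    # unallocated everywhere: reads as zeros


base: Image = {0: b"AAAA", 1: b"BBBB"}
overlay1: Image = {1: bytes(4)}   # guest zeroed cluster 1; cluster 0 untouched
chain = [overlay1, base]

print([read_cluster(chain, i) for i in range(3)])
```

Because the zeroed cluster is stored explicitly in `overlay1`, it hides the old `BBBB` in the base, while the untouched cluster 0 still comes from the base.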
>
> I thought of this, but (to be honest) I did not know that `-B` would
> work for nbd.

It still depends on the NBD server providing the right block allocation
status, but that's no worse than what you needed for -n. But whether -B
can be used at all depends on the target format, not the source.

> Does it assume that data are to be taken from the base image if and
> only if the source (be it nbd server or just a plain file) says there
> is a hole? If yes, then it could nicely solve the issue.

I haven't tested it now, but yes, that's what I remember it to do.
Looking at the code, the requirement seems to be that the NBD server
flags the sparse blocks as a HOLE, but not as ZERO.
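The NBD protocol's `base:allocation` metadata context reports two flags per extent, `NBD_STATE_HOLE` (bit 0) and `NBD_STATE_ZERO` (bit 1). The sketch below encodes the rule stated above, which is recalled from the code rather than tested: with `-B`, only extents flagged HOLE without ZERO are left to the backing file. The function name, the return strings and the treatment of the other flag combinations are assumptions for illustration, not QEMU code.

```python
NBD_STATE_HOLE = 1 << 0  # extent is a hole (not allocated)
NBD_STATE_ZERO = 1 << 1  # extent reads as all zeros


def handling_with_backing(flags: int) -> str:
    """Hypothetical classification of one source extent when -B is used."""
    if (flags & NBD_STATE_HOLE) and not (flags & NBD_STATE_ZERO):
        return "use backing file"  # skipped; data comes from the -B image
    if flags & NBD_STATE_ZERO:
        return "write zeros"       # assumed to mask the backing file's data
    return "copy data"


for flags in (0, NBD_STATE_HOLE, NBD_STATE_ZERO, NBD_STATE_HOLE | NBD_STATE_ZERO):
    print(flags, handling_with_backing(flags))
```

So a server that marks unchanged ranges as HOLE+ZERO would, under this reading, cause the overlay to record zeros there instead of deferring to the base image.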
Kevin
Thread overview: 17+ messages
2019-04-23 11:30 [Qemu-devel] Possibly incorrect data sparsification by qemu-img Martin Kletzander
2019-04-23 11:30 ` Martin Kletzander
2019-04-23 11:36 ` Richard W.M. Jones
2019-04-23 11:36 ` Richard W.M. Jones
2019-04-23 11:55 ` Daniel P. Berrangé
2019-04-23 12:12 ` Kevin Wolf
2019-04-23 12:12 ` Kevin Wolf
2019-04-23 14:26 ` Martin Kletzander
2019-04-23 14:26 ` Martin Kletzander
2019-04-23 15:08 ` Kevin Wolf [this message]
2019-04-23 15:08 ` Kevin Wolf
2019-04-24 6:40 ` Vladimir Sementsov-Ogievskiy
2019-04-24 7:19 ` Kevin Wolf
2019-04-24 9:04 ` Martin Kletzander
2019-04-29 7:27 ` Martin Kletzander
2019-04-29 8:58 ` Vladimir Sementsov-Ogievskiy
2019-04-29 9:16 ` Martin Kletzander