From: Eric Blake <eblake@redhat.com>
To: Max Reitz <mreitz@redhat.com>, lampahome <pahome.chen@mirlab.org>,
QEMU Developers <qemu-devel@nongnu.org>,
Qemu-block <qemu-block@nongnu.org>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
Date: Thu, 13 Sep 2018 15:01:55 -0500
Message-ID: <31456c31-7a74-7df2-40d3-2a5841f39996@redhat.com>
In-Reply-To: <56133002-7a79-bf6a-8835-fba043638224@redhat.com>
On 9/13/18 1:37 PM, Max Reitz wrote:
> On 13.09.18 19:05, Eric Blake wrote:
>> [adding Markus, because of an interesting observation about --image-opts
>> vs. JSON null - search for [1] below]
>>
>> On 9/13/18 8:22 AM, Max Reitz wrote:
>>> On 13.09.18 05:33, lampahome wrote:
>>>> I split data to 3 chunks and save it in 3 independent backing files like
>>>> below:
>>>> img.000 <-- img.001 <-- img.002
>>>> img.000 is the backing file of img.001 and 001 is the backing file of
>>>> 002.
>>>> img.000 saves the 1st chunk of data and img.001 saves the 2nd chunk of
>>>> data, and img.002 saves the 3rd chunk of data.
>>
>> How have you ensured that these three files are visiting different
>> ranges of guest data?
>
> He did say "independent".
True, but I'm curious how they were created in the first place (our
simple qemu-io -c 'write ...' is fine for testing, but there's nothing
like knowing the real story).
>>> $ qemu-img create -f qcow2 img.000 3M
>>> $ qemu-img create -f qcow2 -b img.000 img.001
>>> $ qemu-img create -f qcow2 -b img.001 img.002
>>> $ qemu-img create -f qcow2 -b img.002 img.003
>>
>> Missing -F qcow2 in those last three lines (you should always specify
>> the backing format in the qcow2 metadata, otherwise you are setting
>> yourself up for failures because probing is unsafe)
>
> Is it really unsafe for non-raw images?
In practice, it's not a problem for isolated testing. But it DOES
interfere with libvirt - libvirt assumes that any image whose format was
not explicitly specified is raw, rather than probing it, and treating
img.002 as raw (with no access to img.000 or img.001) means reading
through img.003 sees garbage.
>
>>> $ qemu-io -c 'write -P 1 0M 1M' img.000
>>> $ qemu-io -c 'write -P 2 1M 1M' img.001
>>> $ qemu-io -c 'write -P 3 2M 1M' img.002
>>> $ qemu-io -c 'write -P 4 0M 1M' img.003
>>
>> I'd modify this example to use:
>> qemu-io -c 'write -P 4 0M 512k' -c 'write -P 4 1m 512k' \
>> -c 'write -P 4 2m 512k' img.003
>>
>> so that it becomes easier to see if we are ever committing more than
>> desired.
>
> Well, I interpreted the problem in a way that .003 does not shadow any
> data from .001 or .002.
True, but the question is again - how was the actual img.003 created, and
how do we ensure that it really does touch only clusters that shadow
.000? (qemu-img map output helps, if it's not too verbose.)
>> $ qemu-io -c 'discard 0 1m' --image-opts
>> driver=qcow2,backing=,file.driver=file,file.filename=img.003
>> warning: Use of "backing": "" is deprecated; use "backing": null instead
>> discard 1048576/1048576 bytes at offset 0
>> 1 MiB, 1 ops; 0.0002 sec (4.399 GiB/sec and 4504.5045 ops/sec)
>>
>> doesn't work, as 'discard' causes img.003 to now make things read as
>> zero rather than deferring to the backing chain,
>
> Which is intentional because making data re-appear from the backing
> chain can be a security issue, as far as I remember.
It can be a potential issue if there is a backing file (exposing data
that you thought was wiped is not fun). But where there is NO backing
file, it's overly cautious, and gets in our way (we read all zeros from
a file with no backing, whether the cluster is marked as 0 or as
defer-to-backing). I'm okay if we still keep the overly cautious way by
default, but having a knob to say "discard this, and I really do mean
discard rather than read back as 0" would be useful in qemu (after all,
that's what fallocate(FALLOC_FL_NO_HIDE_STALE) has recently been used
for in the kernel, as the knob for whether discarding on a block device
must read back as zero or may go faster [2]).
[2] https://lore.kernel.org/patchwork/patch/953421/
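To make the distinction concrete, here is a small Python toy model of how a
read resolves through a backing chain. It is not qemu code: the three
per-cluster states, the ZERO marker and the read_cluster() helper are names
invented for this illustration, and the "really discard" outcome is the
hypothetical knob discussed above, not an existing option.

```python
# Toy model of cluster lookup through a backing chain; not qemu code.
# Each image is a dict mapping cluster index -> state:
#   a non-zero int  = allocated data (the payload)
#   ZERO            = explicitly marked as reading zero
#   (key missing)   = unallocated, defer to the backing file
ZERO = "zero"


def read_cluster(chain, index):
    """Read one cluster through a chain listed from top (active) to base."""
    for image in chain:
        if index in image:
            state = image[index]
            return 0 if state == ZERO else state
    # Fell off the end of the chain: no backing file left, so read zero.
    return 0


base = {0: 1}

# Discard as described in this thread: the cluster is marked as zero.
top_zeroed = {0: ZERO}
# Hypothetical "really discard" knob: the allocation is dropped entirely.
top_unallocated = {}

# With a backing file the two outcomes differ: old data reappears.
with_backing = (read_cluster([top_zeroed, base], 0),
                read_cluster([top_unallocated, base], 0))
# Without a backing file both read back as zero.
without_backing = (read_cluster([top_zeroed], 0),
                   read_cluster([top_unallocated], 0))
print(with_backing, without_backing)
```

In this model the only case where the two discard flavors are observably
different is the one with a backing file, which is the case the cautious
default protects.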
>>
>> $ qemu-io -c 'discard 0 1m' --image-opts '{"driver":"qcow2",
>> "backing":null, "file":{"driver":"file", "filename":"img.003"}}'
>>
>> except THAT doesn't work yet (we haven't converted all our command line
>> arguments to taking JSON yet). (end [1])
>
> I hate json:{}, but we have it, so why not use it?
>
> $ qemu-io -c 'discard 0 1m' \
> "json:{'driver':'qcow2','backing':null,
> 'file':{'driver':'file','filename':'img.003'}}"
Hmm - that's the pseudo-JSON protocol rather than --image-opts detecting
a first character of '{'. But yeah, that is a cleaner way of getting at
"backing":null than "backing=" with an intentionally empty argument via
dotted syntax.
>> Sorry - for all my experimenting, I could NOT find a reliable way to
>> remove duplicated clusters out of img.003 once they were committed to
>> img.000,
>
> I'm not sure whether your experiments really concern what the reporter
> needs in his exact case, but just for fun:
Indeed - lampahome, concrete tests with accurate reproduction
instructions always make life easier for people trying to help you.
>
> Basically, there is only one way to reliably make an image pass through
> data from its backing files again. Well, two, actually. One is
> qemu-img commit, which (for compatibility, mainly) makes the image empty
> after the commit.
And only if you did NOT use the -b option (in other words, it only
empties the file if you are committing to the immediate backing file,
not deep in the chain).
> The other is just throwing the image away and
> re-creating it from scratch.
Well yeah, there's that. But now you have a transient problem of extra
pressure on your storage, while you have duplicated blocks between old
and new images, prior to being able to remove the old image. If the
goal is to make img.000 not grow during the commit, I was assuming that
we are already storage-constrained, and any solution that does in-place
modification is therefore better than one that has to create yet another
copy of data, even if the end result is the same once all operations
have finished.
>
> So in any case, you cannot reliably do that for just a part of the image.
>
> First, split .003 into the part we want to commit and the part we don't
> want to commit. This is a bit tricky without qemu-img dd @seek (or a
> corresponding convert parameter), so we'll have to make do with
> backing=null so we don't copy anything into the output from img.003's
> backing chain.
>
> Or, we would have to use backing=null, but for some reason that doesn't
> work. I'll have to investigate.
Just so I'm following along, what didn't work? 'backing':null in a
json:{...} pseudoformat, or driver=raw,file.driver=qcow2,file.backing=,
in dotted syntax?
>
> So rebase will need to do:
>
> $ qemu-img rebase -u -b '' img.003
>
> $ qemu-img convert -O qcow2 \
> "json:{'driver':'raw','offset':0,'size':1048576,\
> 'file':{'driver':'qcow2',\
> 'file':{'driver':'file','filename':'img.003'}}}" \
> "json:{'driver':'null-co','size':2097152}" \
> img.003.commit.000
Oh right - you can indeed concatenate multiple inputs into one output
with qemu-img convert.
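Hand-quoting those json: filenames is error-prone, so here is a minimal
Python sketch that only builds the argument lists and does not run anything.
The option names (raw with offset/size, qcow2, file, null-co) are the ones
quoted above; the helper names window(), padding() and convert_argv() are
made up for this example.

```python
import json

MiB = 1024 * 1024


def window(filename, offset, size):
    """json: filename exposing `size` bytes of a qcow2 image from `offset`."""
    opts = {
        "driver": "raw",
        "offset": offset,
        "size": size,
        "file": {
            "driver": "qcow2",
            "file": {"driver": "file", "filename": filename},
        },
    }
    return "json:" + json.dumps(opts)


def padding(size):
    """json: filename for a null-co input of `size` bytes."""
    return "json:" + json.dumps({"driver": "null-co", "size": size})


def convert_argv(inputs, output):
    """Argument list for a qemu-img convert that concatenates `inputs`."""
    return ["qemu-img", "convert", "-O", "qcow2", *inputs, output]


# First 1 MiB of img.003, followed by 2 MiB of padding.
commit_cmd = convert_argv(
    [window("img.003", 0, MiB), padding(2 * MiB)], "img.003.commit.000")
# 1 MiB of padding, followed by the remaining 2 MiB of img.003.
nocommit_cmd = convert_argv(
    [padding(MiB), window("img.003", MiB, 2 * MiB)], "img.003.nocommit")

for argv in (commit_cmd, nocommit_cmd):
    print(argv)
```

Passing each list to subprocess.run() would avoid the shell quoting
entirely, assuming qemu-img is installed and img.003 has already had its
backing file dropped as in the quoted rebase step.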
>
> $ qemu-img convert -O qcow2 \
> "json:{'driver':'null-co','size':1048576}" \
> "json:{'driver':'raw','offset':1048576,'size':2097152,\
> 'file':{'driver':'qcow2',\
> 'file':{'driver':'file','filename':'img.003'}}}" \
> img.003.nocommit
So you created:
img.000 11----
img.001 --22--
img.002 ----33
img.003 4-4-4-
guest sees 414243
img.003.commit.000 4-----
img.003.nocommit --4-4-
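The diagram can be checked mechanically with a few lines of Python. This is
a toy model, not qemu code: each character stands for 512 KiB of guest data,
'-' means unallocated, and guest_view() and split() are helper names
invented here. The '-' padding produced by split() stands in for the
null-co input of the convert commands.

```python
# Toy model of the layer diagram; not qemu code.
# One character per 512 KiB of guest data, '-' means unallocated.
LAYERS = {
    "img.000": "11----",
    "img.001": "--22--",
    "img.002": "----33",
    "img.003": "4-4-4-",
}


def guest_view(layers):
    """Resolve layers listed from base to top; the topmost allocation wins."""
    view = ["-"] * len(layers[0])
    for layer in layers:
        for i, cluster in enumerate(layer):
            if cluster != "-":
                view[i] = cluster
    return "".join(view)


def split(layer, boundary):
    """Split one layer at a cluster index, padding each half with holes."""
    head = layer[:boundary] + "-" * (len(layer) - boundary)
    tail = "-" * boundary + layer[boundary:]
    return head, tail


seen = guest_view(list(LAYERS.values()))
# A boundary of 2 clusters corresponds to the 1 MiB covered by img.000.
commit_part, nocommit_part = split(LAYERS["img.003"], 2)
print(seen, commit_part, nocommit_part)
```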
>
> Now let's set the backing files. img.003.commit.000 has only data that
> goes into img.000, so that goes there, and img.003.nocommit is going to
> replace our old img.003, so that goes where that was:
>
> $ qemu-img rebase -u -b img.000 img.003.commit.000
> $ qemu-img rebase -u -b img.002 img.003.nocommit
>
> And now let's commit:
>
> $ qemu-img commit img.003.commit.000
>
> And let's clean up:
>
> $ rm img.003.commit.000
> $ mv img.003.nocommit img.003
>
> Done.
Done, but with temporary storage usage higher than doing it in place.
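As a sanity check on the end state, the procedure can be replayed in a toy
Python model (not qemu code; one character per 512 KiB, '-' means
unallocated, and overlay() and guest_view() are names invented for this
sketch). It assumes that commit simply lays the allocated clusters of the
top image over its backing file.

```python
# Toy model of the split-and-commit procedure; not qemu code.
# One character per 512 KiB of guest data, '-' means unallocated.


def overlay(lower, upper):
    """Lay `upper` over `lower`: allocated clusters in `upper` win."""
    return "".join(u if u != "-" else l for l, u in zip(lower, upper))


def guest_view(chain):
    """Resolve a chain listed from base to top into what the guest reads."""
    view = "-" * len(chain[0])
    for layer in chain:
        view = overlay(view, layer)
    return view


before = guest_view(["11----", "--22--", "----33", "4-4-4-"])

# qemu-img commit img.003.commit.000: its data lands in img.000.
img000_after = overlay("11----", "4-----")
# mv img.003.nocommit img.003: the remainder stays on top of img.002.
after = guest_view([img000_after, "--22--", "----33", "--4-4-"])

print(img000_after, before, after)
```

In the model, img.000 picks up only the cluster that belongs to its range,
and the guest-visible contents are the same before and after.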
>
> (If you want to commit all three parts of img.003 into the three
> different base images, you would create img.003.commit.001 and
> img.003.commit.002 similarly as above, and then commit those into the
> respective base images. Then you'd just rm img.003* and you're back to
> the original state.)
Your solution of using qemu-img convert to concatenate null-co with an
offset window of img.003 is nice.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org