From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v8 03/14] qcow2: Optimize bdrv_make_empty()
Date: Thu, 10 Jul 2014 01:23:12 +0200 [thread overview]
Message-ID: <53BDCEE0.20400@redhat.com> (raw)
In-Reply-To: <20140630113339.GE4334@noname.str.redhat.com>
On 30.06.2014 13:33, Kevin Wolf wrote:
> Am 07.06.2014 um 20:51 hat Max Reitz geschrieben:
>> bdrv_make_empty() is currently only called if the current image
>> represents an external snapshot that has been committed to its base
>> image; it is therefore unlikely to have internal snapshots. In this
>> case, bdrv_make_empty() can be greatly sped up by creating an empty L1
>> table and dropping all data clusters at once by recreating the refcount
>> structure accordingly instead of normally discarding all clusters.
>>
>> If there are snapshots, fall back to the simple implementation (discard
>> all clusters).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> This approach looks a bit too complicated to me, and calulating the
> required metadata size seems error-prone.
>
> How about this:
>
> 1. Set the dirty flag in the header so we can mess with the L1 table
> without keeping the refcounts consistent
>
> 2. Overwrite the L1 table with zeros
>
> 3. Overwrite the first n clusters after the header with zeros
> (n = 2 + l1_clusters).
>
> 4. Update the header:
> refcount_table_offset = cluster_size
> refcount_table_clusters = 1
> l1_table_offset = 3 * cluster_size
>
> 6. bdrv_truncate to n + 1 clusters
>
> 7. Now update the first 8 bytes at cluster_size (the first new refcount
> table entry) to point to 2 * cluster_size (new refcount block)
>
> 8. Reset refcount block and L2 cache
>
> 9. Allocate n + 1 clusters (the header, too) and make sure you get
> offset 0
>
> 10. Remove the dirty flag
Okay, after some fixing around and getting it to work, I noticed a
(seemingly to me) rather big problem: If something bad happens between 3
and 7 (especially between 4 and 7), the image cannot be repaired. The
reason is that the refcount table is empty and a new refcount block
cannot be allocated because the consistency checks correctly signal an
overlap with the refcount table (I guess, I would have expected the
image header instead, but well...); this is because nothing is allocated
and the first cluster offset returned by an allocation will probably be
zero (the image header) or $cluster_size (where the reftable resides).
So I think we absolutely have to make sure that whenever the
refcount_table_offset is changed on disk, the reftable it points to
already contains a valid offset. We could pull 7 before 4, but then we'd
have to guarantee that 3 did not already overwrite the reftable (which
it probably does). Well, maybe we could change 3 so it checks whether
the reftable is already part of that area, and if it is, overwrite its
first entry not with zero, but with 2 * cluster_size; if the offset of
the reftable is not 2 * cluster_size, in which case we'd have to take
some other offset. Then we could either try to write a new reftable
anyway or just place everything behind that old reftable, just ignoring
the "lost" space.
In any case, I doubt it'll be much shorter overall with these additional
checks. The current code has 340 LOC with extremely verbose commentary;
my new code (failing to address the problem described above) has 100 LOC
without any comments.
So I guess the main issue is how *complicated* the code actually is; in
my opinion, the most complicated and hardest to review piece of code in
this patch (patch v8 3/14) is minimal_blob_size(); which, as far as I
think, we will need in one form or another eventually anyway.
create_refcount_l1() is pretty long, but due to the commentary should be
well comprehensible.
In any case, I still have the code for your proposal here and I'd be
absolutely fine with working further on it. So if you think it'll be
worth it anyway (which I personally don't have any opinion on), I'll
continue on it.
Max
next prev parent reply other threads:[~2014-07-09 23:23 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-07 18:51 [Qemu-devel] [PATCH v8 00/14] qemu-img: Implement commit like QMP Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 01/14] qcow2: Allow "full" discard Max Reitz
2014-06-30 10:00 ` Kevin Wolf
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 02/14] qcow2: Implement bdrv_make_empty() Max Reitz
2014-06-30 10:00 ` Kevin Wolf
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 03/14] qcow2: Optimize bdrv_make_empty() Max Reitz
2014-06-30 11:33 ` Kevin Wolf
2014-07-01 7:11 ` Hu Tao
2014-07-01 12:12 ` Max Reitz
2014-07-09 23:23 ` Max Reitz [this message]
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 04/14] blockjob: Introduce block_job_complete_sync() Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 05/14] blockjob: Add "ready" field Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 06/14] block/mirror: Improve progress report Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 07/14] qemu-img: Implement commit like QMP Max Reitz
2014-06-09 16:53 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 08/14] qemu-img: Empty image after commit Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 09/14] qemu-img: Enable progress output for commit Max Reitz
2014-06-09 17:28 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 10/14] qemu-img: Specify backing file " Max Reitz
2014-06-09 17:40 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 11/14] iotests: Add _filter_qemu_img_map Max Reitz
2014-06-09 17:51 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 12/14] iotests: Add test for backing-chain commits Max Reitz
2014-06-09 18:50 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 13/14] iotests: Add test for qcow2's bdrv_make_empty Max Reitz
2014-06-09 18:55 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 14/14] iotests: Omit length/offset test in 040 and 041 Max Reitz
2014-06-27 22:07 ` [Qemu-devel] [PATCH v8 00/14] qemu-img: Implement commit like QMP Max Reitz
2014-06-30 9:50 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53BDCEE0.20400@redhat.com \
--to=mreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).