From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v8 03/14] qcow2: Optimize bdrv_make_empty()
Date: Thu, 10 Jul 2014 01:23:12 +0200 [thread overview]
Message-ID: <53BDCEE0.20400@redhat.com> (raw)
In-Reply-To: <20140630113339.GE4334@noname.str.redhat.com>
On 30.06.2014 13:33, Kevin Wolf wrote:
> Am 07.06.2014 um 20:51 hat Max Reitz geschrieben:
>> bdrv_make_empty() is currently only called if the current image
>> represents an external snapshot that has been committed to its base
>> image; it is therefore unlikely to have internal snapshots. In this
>> case, bdrv_make_empty() can be greatly sped up by creating an empty L1
>> table and dropping all data clusters at once by recreating the refcount
>> structure accordingly instead of normally discarding all clusters.
>>
>> If there are snapshots, fall back to the simple implementation (discard
>> all clusters).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> This approach looks a bit too complicated to me, and calulating the
> required metadata size seems error-prone.
>
> How about this:
>
> 1. Set the dirty flag in the header so we can mess with the L1 table
> without keeping the refcounts consistent
>
> 2. Overwrite the L1 table with zeros
>
> 3. Overwrite the first n clusters after the header with zeros
> (n = 2 + l1_clusters).
>
> 4. Update the header:
> refcount_table_offset = cluster_size
> refcount_table_clusters = 1
> l1_table_offset = 3 * cluster_size
>
> 6. bdrv_truncate to n + 1 clusters
>
> 7. Now update the first 8 bytes at cluster_size (the first new refcount
> table entry) to point to 2 * cluster_size (new refcount block)
>
> 8. Reset refcount block and L2 cache
>
> 9. Allocate n + 1 clusters (the header, too) and make sure you get
> offset 0
>
> 10. Remove the dirty flag
Okay, after some fixing around and getting it to work, I noticed a
(seemingly to me) rather big problem: If something bad happens between 3
and 7 (especially between 4 and 7), the image cannot be repaired. The
reason is that the refcount table is empty and a new refcount block
cannot be allocated because the consistency checks correctly signal an
overlap with the refcount table (I guess, I would have expected the
image header instead, but well...); this is because nothing is allocated
and the first cluster offset returned by an allocation will probably be
zero (the image header) or $cluster_size (where the reftable resides).
So I think we absolutely have to make sure that whenever the
refcount_table_offset is changed on disk, the reftable it points to
already contains a valid offset. We could pull 7 before 4, but then we'd
have to guarantee that 3 did not already overwrite the reftable (which
it probably does). Well, maybe we could change 3 so it checks whether
the reftable is already part of that area, and if it is, overwrite its
first entry not with zero, but with 2 * cluster_size; if the offset of
the reftable is not 2 * cluster_size, in which case we'd have to take
some other offset. Then we could either try to write a new reftable
anyway or just place everything behind that old reftable, just ignoring
the "lost" space.
In any case, I doubt it'll be much shorter overall with these additional
checks. The current code has 340 LOC with extremely verbose commentary;
my new code (failing to address the problem described above) has 100 LOC
without any comments.
So I guess the main issue is how *complicated* the code actually is; in
my opinion, the most complicated and hardest to review piece of code in
this patch (patch v8 3/14) is minimal_blob_size(); which, as far as I
think, we will need in one form or another eventually anyway.
create_refcount_l1() is pretty long, but due to the commentary should be
well comprehensible.
In any case, I still have the code for your proposal here and I'd be
absolutely fine with working further on it. So if you think it'll be
worth it anyway (which I personally don't have any opinion on), I'll
continue on it.
Max
next prev parent reply other threads:[~2014-07-09 23:23 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-07 18:51 [Qemu-devel] [PATCH v8 00/14] qemu-img: Implement commit like QMP Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 01/14] qcow2: Allow "full" discard Max Reitz
2014-06-30 10:00 ` Kevin Wolf
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 02/14] qcow2: Implement bdrv_make_empty() Max Reitz
2014-06-30 10:00 ` Kevin Wolf
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 03/14] qcow2: Optimize bdrv_make_empty() Max Reitz
2014-06-30 11:33 ` Kevin Wolf
2014-07-01 7:11 ` Hu Tao
2014-07-01 12:12 ` Max Reitz
2014-07-09 23:23 ` Max Reitz [this message]
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 04/14] blockjob: Introduce block_job_complete_sync() Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 05/14] blockjob: Add "ready" field Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 06/14] block/mirror: Improve progress report Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 07/14] qemu-img: Implement commit like QMP Max Reitz
2014-06-09 16:53 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 08/14] qemu-img: Empty image after commit Max Reitz
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 09/14] qemu-img: Enable progress output for commit Max Reitz
2014-06-09 17:28 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 10/14] qemu-img: Specify backing file " Max Reitz
2014-06-09 17:40 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 11/14] iotests: Add _filter_qemu_img_map Max Reitz
2014-06-09 17:51 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 12/14] iotests: Add test for backing-chain commits Max Reitz
2014-06-09 18:50 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 13/14] iotests: Add test for qcow2's bdrv_make_empty Max Reitz
2014-06-09 18:55 ` Eric Blake
2014-06-07 18:51 ` [Qemu-devel] [PATCH v8 14/14] iotests: Omit length/offset test in 040 and 041 Max Reitz
2014-06-27 22:07 ` [Qemu-devel] [PATCH v8 00/14] qemu-img: Implement commit like QMP Max Reitz
2014-06-30 9:50 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53BDCEE0.20400@redhat.com \
--to=mreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.