From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 7/8] block/qcow2: Speed up zero cluster expansion
Date: Wed, 30 Jul 2014 22:31:18 +0200
Message-ID: <53D95616.7070209@redhat.com>
In-Reply-To: <53D919CC.9050706@redhat.com>
On 30.07.2014 18:14, Eric Blake wrote:
> On 07/25/2014 12:07 PM, Max Reitz wrote:
>> Actually, we do not need to allocate a new data cluster for every zero
>> cluster to be expanded: It is completely sufficient to rely on qcow2's
>> COW part and instead create a single zero cluster and reuse it as much
>> as possible.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> block/qcow2-cluster.c | 119 ++++++++++++++++++++++++++++++++++++++------------
>> 1 file changed, 92 insertions(+), 27 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 905beb6..867db03 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1558,6 +1558,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>> BDRVQcowState *s = bs->opaque;
>> bool is_active_l1 = (l1_table == s->l1_table);
>> uint64_t *l2_table = NULL;
>> + int64_t zeroed_cluster_offset = 0;
>> + int zeroed_cluster_refcount = 0;
>> + int last_zeroed_cluster_l1i = 0, last_zeroed_cluster_l2i = 0;
>> int ret;
>> int i, j;
>>
>> @@ -1617,47 +1620,79 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>> continue;
>> }
>>
>> - offset = qcow2_alloc_clusters(bs, s->cluster_size);
>> - if (offset < 0) {
>> - ret = offset;
>> - goto fail;
>> + if (zeroed_cluster_offset) {
>> + zeroed_cluster_refcount += l2_refcount;
>> + if (zeroed_cluster_refcount > 0xffff) {
> Doesn't the qcow2 file format allow variable-sized maximum refcount
> (bytes 96-99 refcount_order in the header)? Therefore, you should be
> using the value computed from the header rather than hard-coding the
> assumption that the header used (the default of) 16-bit refcount.
Oh, you're right, I didn't even think of that.
> [Yeah,
> I know, we don't yet have code that supports non-default size, even
> though the file format documents it, but that doesn't mean we should
> make it harder to add support down the road...]
And this is probably why, yes. ;-)
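
A minimal sketch of deriving the limit from the header instead of hard-coding 0xffff. The helper name max_refcount() is hypothetical and not part of the patch. It assumes the refcount width is (1 << refcount_order) bits and that refcount_order is at most 6 (64-bit refcounts); both are assumptions about the format, not something this series implements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest refcount value representable for a given
 * refcount_order (header bytes 96-99). Assumes width = 1 << order bits. */
static uint64_t max_refcount(unsigned refcount_order)
{
    /* Assumption: orders above 6 (more than 64 bits) are not valid. */
    assert(refcount_order <= 6);

    unsigned bits = 1u << refcount_order;

    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}
```

With the default order of 4 this yields 0xffff, so the check in the hunk could compare against max_refcount(refcount_order) without changing behavior for existing images.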
>> + zeroed_cluster_refcount = 0;
>> + zeroed_cluster_offset = 0;
>> + }
>> }
> This isn't a maximal packing. As long as we don't mind complexity to
> gain compactness, couldn't we also expand the existing
> zeroed_cluster_offset all the way up to full refcount, and decrement
> l2_refcount by the difference, before spilling over to allocating the
> next zero cluster?
Hm, right.
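
To make the difference concrete, here is a toy model of the two policies (arithmetic only, no qcow2 I/O). Both function names are hypothetical. clusters_whole() follows the quoted hunk as read here: each l2_refcount is placed whole and a fresh zero cluster is started when it would overflow. clusters_packed() fills the current cluster to the limit and spills the remainder; whether the references of a single shared L2 entry can really be split this way is simply assumed, not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero clusters needed when every request is placed whole. */
static unsigned clusters_whole(const uint64_t *refs, size_t n,
                               uint64_t max_refcount)
{
    uint64_t cur = 0;
    unsigned clusters = 0;

    for (size_t i = 0; i < n; i++) {
        assert(refs[i] > 0 && refs[i] <= max_refcount);
        /* Start a new cluster if this request does not fit entirely. */
        if (clusters == 0 || refs[i] > max_refcount - cur) {
            clusters++;
            cur = 0;
        }
        cur += refs[i];
    }
    return clusters;
}

/* Zero clusters needed when a request may be split across clusters. */
static unsigned clusters_packed(const uint64_t *refs, size_t n,
                                uint64_t max_refcount)
{
    uint64_t cur = 0;
    unsigned clusters = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t want = refs[i];

        assert(want > 0 && want <= max_refcount);
        while (want > 0) {
            /* Only move on once the current cluster is completely full. */
            if (clusters == 0 || cur == max_refcount) {
                clusters++;
                cur = 0;
            }
            uint64_t room = max_refcount - cur;
            uint64_t take = want < room ? want : room;

            cur += take;
            want -= take;
        }
    }
    return clusters;
}
```

For three requests of 40000 references each and a limit of 65535, the whole-placement policy needs three zero clusters while the packed policy needs two.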
> Also, I have to wonder - since the all-zero cluster is the most likely
> cluster to have a large refcount, even during normal runtime, should we
> special case the normal qcow2 write code to track the current all-zero
> cluster (if any), and merely increase its refcount rather than allocate
> a new cluster any time it is detected that an all-zero cluster is
> needed? [Of course, the tracking would be runtime only, since
> compat=0.10 header doesn't provide any way to track the location of an
> all-zero cluster across file reloads. Each new runtime would probably
> settle on a new location for the all-zero cluster used during that run,
> rather than trying to find an existing one. And there's really no point
> to adding a header to track an all-zero cluster in compat=1.1 images,
> since those images already have the ability to track zero clusters
> without needing one allocated.]
This may improve performance for compat=0.10 images; however, I don't
think we should care enough about compat=0.10 images to justify
optimizations specifically for them all over the qcow2 code.
>> + if (!zeroed_cluster_offset) {
>> + offset = qcow2_alloc_clusters(bs, s->cluster_size);
>> + if (offset < 0) {
>> + ret = offset;
>> + goto fail;
>> + }
>>
>> - if (l2_refcount > 1) {
>> - /* For shared L2 tables, set the refcount accordingly (it is
>> - * already 1 and needs to be l2_refcount) */
>> - ret = qcow2_update_cluster_refcount(bs,
>> - offset >> s->cluster_bits, l2_refcount - 1,
>> - QCOW2_DISCARD_OTHER);
>> + ret = qcow2_pre_write_overlap_check(bs, 0, offset,
>> + s->cluster_size);
>> + if (ret < 0) {
>> + qcow2_free_clusters(bs, offset, s->cluster_size,
>> + QCOW2_DISCARD_OTHER);
>> + goto fail;
>> + }
>> +
>> + ret = bdrv_write_zeroes(bs->file, offset / BDRV_SECTOR_SIZE,
>> + s->cluster_sectors, 0);
> That is, if bdrv_write_zeroes knows how to take advantage of an already
> existing all-zero cluster, it would be less special casing in this code,
> but still get the same benefits of maximizing refcount during the amend
> operation, if all expanded clusters go through bdrv_write_zeroes.
The special casing would then be somewhere else and I think only this
code would really benefit from it. I don't think we should ingrain such
optimizations in all of the qcow2 code; I personally can live with
having these optimizations contained and separated from the rest of the
qcow2 code in functions like this one, but anything else would be a bit
too much effort (and maybe even too error-prone, since it would probably
be rather complex) for current qemu. We do have compat=1.1 with zero
clusters and that is what should be used for performance.
I'll hold off on v2 of this patch until you have decided whether it's
worth it (in the alternative series, I dropped this patch completely).
Max