qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Denis V. Lunev" <den@openvz.org>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes()
Date: Wed, 27 Apr 2016 10:07:35 +0300	[thread overview]
Message-ID: <57206537.4040706@openvz.org> (raw)
In-Reply-To: <20160426101941.GB4213@noname.str.redhat.com>

On 04/26/2016 01:19 PM, Kevin Wolf wrote:

[skipped]
> Did you ever talk to the kernel people?
>
> We can try to make the best out of suboptimal requests in qemu, but it
> looks to me as if the real fix was in the kernel, and if we don't get it
> fixed there, we'll see more and more of this kind of problems. I think
> this is relevant not only for VMs, but probably on real hardware as
> well.
it is difficult and could be done partially only :(


>> Locally with QEMU-IO the reproduction is simple. We can repeat above
>> requests or could simple do the following:
>>      qemu-io -c "write 0xff 32k 1M" 1.img
> I assume you mean "-z" instead of "0xff"?
sure


>> The code without the patch will allocate 2 blocks for guest offsets
>> 0-64k and 1M-(1M+64k) and performs writes there. The code with
>> the patch will skip creation of blocks if possible.
> Okay, that's the second of the optimisations you mentioned in your
> commit message. I can see how this adds something that the generic block
> layer can't easily add, if it can be made safe (I think it can, even
> though your patch doesn't get it completely right yet, see below).
>
>> I have recorded parameters in qcow2_co_do_write_zeroes for the
>> reference (1 MB is written, memory is not aligned as sent in the
>> letter above, [sudo ./a.out /dev/nbd3]):
>>
>> qcow2_co_write_zeroes off=0 size=10000
>> qcow2_co_write_zeroes off=1fe00 size=200
>> qcow2_co_write_zeroes off=3fe00 size=200
>> qcow2_co_write_zeroes off=5fe00 size=200
>> qcow2_co_write_zeroes off=7fe00 size=200
>> qcow2_co_write_zeroes off=9fe00 size=200
>> qcow2_co_write_zeroes off=bfe00 size=200
>> qcow2_co_write_zeroes off=dfe00 size=200
>> qcow2_co_write_zeroes off=ffe00 size=200
>> qcow2_co_write_zeroes off=10000 size=fe00
>> qcow2_co_write_zeroes off=20000 size=10000
>> qcow2_co_write_zeroes off=30000 size=fe00
>> qcow2_co_write_zeroes off=60000 size=10000
>> qcow2_co_write_zeroes off=70000 size=fe00
>> qcow2_co_write_zeroes off=c0000 size=10000
>> qcow2_co_write_zeroes off=d0000 size=fe00
>> qcow2_co_write_zeroes off=e0000 size=10000
>> qcow2_co_write_zeroes off=f0000 size=fe00
>> qcow2_co_write_zeroes off=80000 size=10000
>> qcow2_co_write_zeroes off=90000 size=fe00
>> qcow2_co_write_zeroes off=a0000 size=10000
>> qcow2_co_write_zeroes off=b0000 size=fe00
>> qcow2_co_write_zeroes off=40000 size=10000
>> qcow2_co_write_zeroes off=50000 size=fe00
> I don't see any requests here where your code actually ends up splitting
> the request into head, aligned part and tail. Which is expected because
> bdrv_co_do_write_zeroes() already does that.
>
> What I can't see here is whether this actually happened (10000 + fe00
> could be a split request) or whether it already came in this way over
> NBD.

bdrv_aligned_pwritev off=0 size=1fe00
qcow2_co_write_zeroes off=0 size=10000
qcow2_co_write_zeroes off=10000 size=fe00
bdrv_aligned_pwritev off=1fe00 size=20000
qcow2_co_write_zeroes off=1fe00 size=200
qcow2_co_write_zeroes off=20000 size=10000
qcow2_co_write_zeroes off=30000 size=fe00
bdrv_aligned_pwritev off=3fe00 size=20000
qcow2_co_write_zeroes off=3fe00 size=200
qcow2_co_write_zeroes off=40000 size=10000
qcow2_co_write_zeroes off=50000 size=fe00
bdrv_aligned_pwritev off=5fe00 size=20000
qcow2_co_write_zeroes off=5fe00 size=200
qcow2_co_write_zeroes off=60000 size=10000
qcow2_co_write_zeroes off=70000 size=fe00
bdrv_aligned_pwritev off=7fe00 size=20000
qcow2_co_write_zeroes off=7fe00 size=200
qcow2_co_write_zeroes off=80000 size=10000
qcow2_co_write_zeroes off=90000 size=fe00
bdrv_aligned_pwritev off=9fe00 size=20000
qcow2_co_write_zeroes off=9fe00 size=200
qcow2_co_write_zeroes off=a0000 size=10000
qcow2_co_write_zeroes off=b0000 size=fe00
bdrv_aligned_pwritev off=bfe00 size=20000
qcow2_co_write_zeroes off=bfe00 size=200
qcow2_co_write_zeroes off=c0000 size=10000
qcow2_co_write_zeroes off=d0000 size=fe00
bdrv_aligned_pwritev off=dfe00 size=20000
qcow2_co_write_zeroes off=dfe00 size=200
qcow2_co_write_zeroes off=e0000 size=10000
qcow2_co_write_zeroes off=f0000 size=fe00
bdrv_aligned_pwritev off=ffe00 size=200
qcow2_co_write_zeroes off=ffe00 size=200


[skipped]

> Here is the trick that I think will save us:
>
> On a misaligned call, we call bdrv_get_block_status_above() for the
> whole cluster that we're in. We know that it's only a single cluster
> because bdrv_co_do_write_zeroes() splits things this way; only aligned
> requests can be longer than a sector (we can even assert this).
>
> If the result is that the cluster already reads as zero, instead of
> doing nothing and possibly breaking backing chain manipulations, we
> simply extend the write zeroes operation to the whole cluster and
> continue as normal with an aligned request. This way we end up with a
> zero cluster instead of an unallocated one, and that should be safe.
>
> If the result is that the cluster isn't completed zeroed, return
> -ENOTSUP as you did in the snippet above.
>
> That approach should probably result in an (even) simpler patch, too.

let's start from the simple thing. I'll send a patch in a couple of minutes.
It really looks MUCH better than the original one. Thank you for this cool
suggestion.

>
> Hm... Or actually, if we want something more complex that will help all
> block drivers, extending the range of the request could even be done in
> bdrv_co_do_write_zeroes(), I guess. I won't insist on it, though.
I do not think that this should be done now. This patch should go into
our own stable and thus it would better be as simple as possible.

Den

  reply	other threads:[~2016-04-27  7:07 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-23 12:05 [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes() Denis V. Lunev
2016-04-25  9:05 ` Kevin Wolf
2016-04-25 10:20   ` Denis V. Lunev
2016-04-25 19:35     ` Eric Blake
2016-04-25 21:00       ` Denis V. Lunev
2016-04-26  8:23     ` Kevin Wolf
2016-04-26  9:35       ` Denis V. Lunev
2016-04-26 10:19         ` Kevin Wolf
2016-04-27  7:07           ` Denis V. Lunev [this message]
2016-04-27  8:12             ` Kevin Wolf
2016-04-27  8:32               ` Denis V. Lunev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57206537.4040706@openvz.org \
    --to=den@openvz.org \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).