Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Paolo Bonzini <pbonzini@redhat.com>
To: Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image
Date: Tue, 10 Nov 2015 10:01:54 +0100	[thread overview]
Message-ID: <5641B282.10706@redhat.com> (raw)
In-Reply-To: <20151110061458.GB26289@ad.usersys.redhat.com>



On 10/11/2015 07:14, Fam Zheng wrote:
> On Mon, 11/09 17:29, Kevin Wolf wrote:
>> Am 09.11.2015 um 17:18 hat Paolo Bonzini geschrieben:
>>>
>>>
>>> On 09/11/2015 17:04, Kevin Wolf wrote:
>>>> Am 06.11.2015 um 11:22 hat Fam Zheng geschrieben:
>>>>> The "pnum < nb_sectors" condition in deciding whether to actually copy
>>>>> data is unnecessarily strict, and the qiov initialization is
>>>>> unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard
>>>>> branches.
>>>>>
>>>>> Reorganize mirror_iteration flow so that we:
>>>>>
>>>>>     1) Find the contiguous zero/discarded sectors with
>>>>>     bdrv_get_block_status_above() before deciding what to do. We query
>>>>>     s->buf_size sized blocks at a time.
>>>>>
>>>>>     2) If the sectors in question are zeroed/discarded and aligned to
>>>>>     target cluster, issue zero write or discard accordingly. It's done
>>>>>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
>>>>>
>>>>>     3) Otherwise, do the same loop as before in mirror_do_read.
>>>>>
>>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>>
>>>> I'm not sure where in the patch to comment on this, so I'll just do it
>>>> here right in the beginning.
>>>>
>>>> I'm concerned that we need to be more careful about races in this patch,
>>>> in particular regarding the bitmaps. I think the conditions for the two
>>>> bitmaps are:
>>>>
>>>> * Dirty bitmap: We must clear the bit after finding the next piece of
>>>>   data to be mirrored, but before we yield after getting information
>>>>   that we use for the decision which kind of operation we need.
>>>>
>>>>   In other words, we need to clear the dirty bitmap bit before calling
>>>>   bdrv_get_block_status_above(), because that's both the function that
>>>>   retrieves information about the next chunk and also a function that
>>>>   can yield.
>>>>
>>>>   If after this point the data is written to, we need to mirror it
>>>>   again.
>>>
>>> With Fam's patch, that's not trivial for two reasons:
>>>
>>> 1) bdrv_get_block_status_above() can return a smaller amount than what
>>> is asked.
>>>
>>> 2) the "read and write" case can handle s->granularity sectors per
>>> iteration (many of them can be coalesced, but still that's how the
>>> iteration works).
>>>
>>> The simplest solution is to perform the query with s->granularity size
>>> rather than s->buf_size.
>>
>> Then we end up with many small operations, that's not what we want.
>>
>> Why can't we mark up to s->buf_size dirty clusters as clean first, then
>> query the status, and mark all of those that we can't handle dirty
>> again?
> 
> Then we may end up marking more clusters as dirty than it should be.

You're both right.

> Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers are coroutine,
> we can introduce a CoMutex to let bitmap reader block bdrv_set_dirty and
> bdrv_set_dirty_bitmap.

I think this is not necessary.

I think the following is safe:

1) before calling bdrv_get_block_status_above(), find out how many
consecutive bits in the dirty bitmap are 1

2) zero all those bits in the dirty bitmap

3) call bdrv_get_block_status_above() with a size equivalent to the
number of dirty bits

4) if bdrv_get_block_status_above() only returns a partial result, loop
step (3) until all the dirty bits are processed

For full mirroring, this strategy will probably make the first
incremental iteration more expensive.

Paolo

next prev parent reply	other threads:[~2015-11-10  9:02 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-06 10:22 [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image Fam Zheng
2015-11-06 18:36 ` [Qemu-devel] [Qemu-block] " Max Reitz
2015-11-09  2:12   ` Fam Zheng
2015-11-09 16:04 ` [Qemu-devel] " Kevin Wolf
2015-11-09 16:18   ` Paolo Bonzini
2015-11-09 16:29     ` Kevin Wolf
2015-11-10  6:14       ` Fam Zheng
2015-11-10  9:01         ` Paolo Bonzini [this message]
2015-11-10 10:12           ` Kevin Wolf
2015-11-10 10:30             ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5641B282.10706@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).