qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: famz@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com,
	stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_zeroes
Date: Thu, 15 May 2014 23:20:25 +0200	[thread overview]
Message-ID: <53752F99.2090007@kamp.de> (raw)
In-Reply-To: <20140515095409.GB3822@noname.redhat.com>

Am 15.05.2014 11:54, schrieb Kevin Wolf:
> Am 15.05.2014 um 07:16 hat Peter Lieven geschrieben:
>> Am 14.05.2014 13:41, schrieb Kevin Wolf:
>>> Am 08.05.2014 um 18:22 hat Peter Lieven geschrieben:
>>>> this patch tries to optimize zero write requests
>>>> by automatically using bdrv_write_zeroes if it is
>>>> supported by the format.
>>>>
>>>> This significantly speeds up file system initialization and
>>>> should speed zero write test used to test backend storage
>>>> performance.
>>>>
>>>> I ran the following 2 tests on my internal SSD with a
>>>> 50G QCOW2 container and on an attached iSCSI storage.
>>>>
>>>> a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX
>>>>
>>>> QCOW2         [off]     [on]     [unmap]
>>>> -----
>>>> runtime:       14secs    1.1secs  1.1secs
>>>> filesize:      937M      18M      18M
>>>>
>>>> iSCSI         [off]     [on]     [unmap]
>>>> ----
>>>> runtime:       9.3s      0.9s     0.9s
>>>>
>>>> b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct
>>>>
>>>> QCOW2         [off]     [on]     [unmap]
>>>> -----
>>>> runtime:       246secs   18secs   18secs
>>>> filesize:      51G       192K     192K
>>>> throughput:    203M/s    2.3G/s   2.3G/s
>>>>
>>>> iSCSI*        [off]     [on]     [unmap]
>>>> ----
>>>> runtime:       8mins     45secs   33secs
>>>> throughput:    106M/s    1.2G/s   1.6G/s
>>>> allocated:     100%      100%     0%
>>>>
>>>> * The storage was connected via an 1Gbit interface.
>>>>   It seems to internally handle writing zeroes
>>>>   via WRITESAME16 very fast.
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>> v3->v4: - use QAPI generated enum and lookup table [Kevin]
>>>>         - added more details about the options in the comments
>>>>           of the qapi-schema [Eric]
>>>>         - changed the type of detect_zeroes from str to
>>>>           BlockdevDetectZeroesOptions. I left the name
>>>>           as is because it is consistent with e.g.
>>>>           BlockdevDiscardOptions or BlockdevAioOptions [Eric]
>>>>         - changed the parse function in blockdev_init to
>>>>           be generic usable for other enum parameters
>>>>         
>>>> v2->v3: - moved parameter parsing to blockdev_init
>>>>         - added per device detect_zeroes status to 
>>>>           hmp (info block -v) and qmp (query-block) [Eric]
>>>>         - added support to enable detect-zeroes also
>>>>           for hot added devices [Eric]
>>>>         - added missing entry to qemu_common_drive_opts
>>>>         - fixed description of qemu_iovec_is_zero [Fam]
>>>>
>>>> v1->v2: - added tests to commit message (Markus)
>>>> RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric)
>>>>            - fixed typo (choosen->chosen) (Eric)
>>>>            - added second example to commit msg
>>>>
>>>> RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline parameter
>>>>               - call zero detection only for format (bs->file != NULL)
>>>>
>>>>  block.c                   |   11 ++++++++++
>>>>  block/qapi.c              |    1 +
>>>>  blockdev.c                |   34 +++++++++++++++++++++++++++++
>>>>  hmp.c                     |    5 +++++
>>>>  include/block/block_int.h |    1 +
>>>>  include/qemu-common.h     |    1 +
>>>>  qapi-schema.json          |   52 ++++++++++++++++++++++++++++++++-------------
>>>>  qemu-options.hx           |    6 ++++++
>>>>  qmp-commands.hx           |    3 +++
>>>>  util/iov.c                |   21 ++++++++++++++++++
>>>>  10 files changed, 120 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index b749d31..aea4c77 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -3244,6 +3244,17 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>>>>  
>>>>      ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
>>>>  
>>>> +    if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
>>>> +        !(flags & BDRV_REQ_ZERO_WRITE) && bs->file &&
>>>> +        drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) {
>>> Pretty long condition. :-)
>>>
>>> Looks like most is obviously needed, but I wonder what the bs->file part
>>> is good for? That looks rather arbitrary.
>> What I wanted to achieve is that this condition is only true if we handle
>> the format (e.g. raw, qcow2, vmdk etc.). If e.g. qcow2 then sends a
>> zero write this should always end on the disk and should not be optimizable.
> But why? This means setting an arbitrary policy for no good reason. You
> already have an option, and it already defaults to off, so unless
> someone specifically enables it for bs->file, we don't do the
> optimisation. But if someone wants to have it on bs->file, what reason
> is there to ignore that request?

I was erroneously thinking that the setting would be inherited to bs->file.
I will remove the bs->file from the condition.
>
>>>> +        flags |= BDRV_REQ_ZERO_WRITE;
>>>> +        /* if the device was not opened with discard=on the below flag
>>>> +         * is immediately cleared again in bdrv_co_do_write_zeroes */
>>> Is it? I only see it being cleared in bdrv_co_write_zeroes(), but that's
>>> not a function that seems to be called from here.
>> You are right. Question, do we want to support detect_zeroes = unmap
>> if discard = ignore? If yes, I have to update the docs. Otherwise
>> I have to check for BDRV_O_DISCARD before setting BDRV_REQ_MAY_UNMAP.
> I think it would be reasonable enough to just error out when you try to
> open an image with detect_zeroes=unmap,discard=ignore.
>
> Can these flags be changed during runtime? If so, we need to check there
> as well.

No, but you can specify them during hot add. Same as with discard.

Peter

>
>>>> +        if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
>>>> +            flags |= BDRV_REQ_MAY_UNMAP;
>>>> +        }
>>>> +    }
>>>> +
>>>>      if (ret < 0) {
>>>>          /* Do nothing, write notifier decided to fail this request */
>>>>      } else if (flags & BDRV_REQ_ZERO_WRITE) {
> Kevin

      reply	other threads:[~2014-05-15 21:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-08 16:22 [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
2014-05-12 20:28 ` Eric Blake
2014-05-13 12:06   ` Peter Lieven
2014-05-14 13:15     ` Eric Blake
2014-05-14 11:41 ` Kevin Wolf
2014-05-15  5:16   ` Peter Lieven
2014-05-15  9:54     ` Kevin Wolf
2014-05-15 21:20       ` Peter Lieven [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53752F99.2090007@kamp.de \
    --to=pl@kamp.de \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).