From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: famz@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_zeroes
Date: Thu, 15 May 2014 23:20:25 +0200
Message-ID: <53752F99.2090007@kamp.de>
In-Reply-To: <20140515095409.GB3822@noname.redhat.com>
On 15.05.2014 at 11:54, Kevin Wolf wrote:
> On 15.05.2014 at 07:16, Peter Lieven wrote:
>> On 14.05.2014 at 13:41, Kevin Wolf wrote:
>>> On 08.05.2014 at 18:22, Peter Lieven wrote:
>>>> this patch tries to optimize zero write requests
>>>> by automatically using bdrv_write_zeroes if it is
>>>> supported by the format.
>>>>
>>>> This significantly speeds up file system initialization and
>>>> should speed up the zero write tests used to test backend storage
>>>> performance.
>>>>
>>>> I ran the following 2 tests on my internal SSD with a
>>>> 50G QCOW2 container and on an attached iSCSI storage.
>>>>
>>>> a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX
>>>>
>>>> QCOW2       [off]     [on]      [unmap]
>>>> -----
>>>> runtime:    14secs    1.1secs   1.1secs
>>>> filesize:   937M      18M       18M
>>>>
>>>> iSCSI       [off]     [on]      [unmap]
>>>> ----
>>>> runtime:    9.3s      0.9s      0.9s
>>>>
>>>> b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct
>>>>
>>>> QCOW2       [off]     [on]      [unmap]
>>>> -----
>>>> runtime:    246secs   18secs    18secs
>>>> filesize:   51G       192K      192K
>>>> throughput: 203M/s    2.3G/s    2.3G/s
>>>>
>>>> iSCSI*      [off]     [on]      [unmap]
>>>> ----
>>>> runtime:    8mins     45secs    33secs
>>>> throughput: 106M/s    1.2G/s    1.6G/s
>>>> allocated:  100%      100%      0%
>>>>
>>>> * The storage was connected via a 1Gbit interface.
>>>> It seems to handle writing zeroes via WRITESAME16
>>>> very fast internally.
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>> v3->v4: - use QAPI generated enum and lookup table [Kevin]
>>>> - added more details about the options in the comments
>>>> of the qapi-schema [Eric]
>>>> - changed the type of detect_zeroes from str to
>>>> BlockdevDetectZeroesOptions. I left the name
>>>> as is because it is consistent with e.g.
>>>> BlockdevDiscardOptions or BlockdevAioOptions [Eric]
>>>> - changed the parse function in blockdev_init to
>>>> be generically usable for other enum parameters
>>>>
>>>> v2->v3: - moved parameter parsing to blockdev_init
>>>> - added per device detect_zeroes status to
>>>> hmp (info block -v) and qmp (query-block) [Eric]
>>>> - added support to enable detect-zeroes also
>>>> for hot added devices [Eric]
>>>> - added missing entry to qemu_common_drive_opts
>>>> - fixed description of qemu_iovec_is_zero [Fam]
>>>>
>>>> v1->v2: - added tests to commit message (Markus)
>>>> RFCv2->v1: - fixed parameter parsing strncmp -> strcmp (Eric)
>>>> - fixed typo (choosen->chosen) (Eric)
>>>> - added second example to commit msg
>>>>
>>>> RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline parameter
>>>> - call zero detection only for format (bs->file != NULL)
>>>>
>>>> block.c | 11 ++++++++++
>>>> block/qapi.c | 1 +
>>>> blockdev.c | 34 +++++++++++++++++++++++++++++
>>>> hmp.c | 5 +++++
>>>> include/block/block_int.h | 1 +
>>>> include/qemu-common.h | 1 +
>>>> qapi-schema.json | 52 ++++++++++++++++++++++++++++++++-------------
>>>> qemu-options.hx | 6 ++++++
>>>> qmp-commands.hx | 3 +++
>>>> util/iov.c | 21 ++++++++++++++++++
>>>> 10 files changed, 120 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index b749d31..aea4c77 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -3244,6 +3244,17 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>>>>
>>>> ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
>>>>
>>>> + if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
>>>> + !(flags & BDRV_REQ_ZERO_WRITE) && bs->file &&
>>>> + drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) {
>>> Pretty long condition. :-)
>>>
>>> Looks like most is obviously needed, but I wonder what the bs->file part
>>> is good for? That looks rather arbitrary.
>> What I wanted to achieve is that this condition is only true if we handle
>> the format (e.g. raw, qcow2, vmdk etc.). If e.g. qcow2 then sends a
>> zero write this should always end on the disk and should not be optimizable.
> But why? This means setting an arbitrary policy for no good reason. You
> already have an option, and it already defaults to off, so unless
> someone specifically enables it for bs->file, we don't do the
> optimisation. But if someone wants to have it on bs->file, what reason
> is there to ignore that request?
I mistakenly assumed that the setting would be inherited by bs->file.
I will remove the bs->file check from the condition.
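
To illustrate the condition once the bs->file check is dropped, here is a
minimal standalone sketch. It is not the QEMU code: every sketch_* name,
the enum and the flag values are hypothetical stand-ins for
bs->detect_zeroes, BDRV_REQ_ZERO_WRITE, BDRV_REQ_MAY_UNMAP and
qemu_iovec_is_zero(), and the byte-wise zero check is a naive replacement
for whatever the real helper does. The !ret test from the patch is left
to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-ins for the detect-zeroes modes and request flags. */
enum sketch_detect_zeroes { SKETCH_DZ_OFF, SKETCH_DZ_ON, SKETCH_DZ_UNMAP };
#define SKETCH_REQ_ZERO_WRITE 0x1
#define SKETCH_REQ_MAY_UNMAP  0x2

/* Naive check: true if all len bytes at buf are zero. */
bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* True if every element of the I/O vector contains only zero bytes. */
bool sketch_iovec_is_zero(const struct iovec *iov, int niov)
{
    assert(niov == 0 || iov != NULL);

    for (int i = 0; i < niov; i++) {
        if (!sketch_buffer_is_zero(iov[i].iov_base, iov[i].iov_len)) {
            return false;
        }
    }
    return true;
}

/*
 * Return the request flags after zero detection. The request is turned
 * into a zero write only if detection is enabled, the request is not
 * already a zero write, the driver can write zeroes, and the payload is
 * entirely zero. There is no bs->file test any more.
 */
int sketch_adjust_flags(enum sketch_detect_zeroes mode,
                        bool drv_has_write_zeroes, int flags,
                        const struct iovec *iov, int niov)
{
    if (mode != SKETCH_DZ_OFF &&
        !(flags & SKETCH_REQ_ZERO_WRITE) &&
        drv_has_write_zeroes &&
        sketch_iovec_is_zero(iov, niov)) {
        flags |= SKETCH_REQ_ZERO_WRITE;
        if (mode == SKETCH_DZ_UNMAP) {
            flags |= SKETCH_REQ_MAY_UNMAP;
        }
    }
    return flags;
}
```

With this shape the option alone decides whether a given BlockDriverState
gets the optimisation, which matches the point that it defaults to off.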
>
>>>> + flags |= BDRV_REQ_ZERO_WRITE;
>>>> + /* if the device was not opened with discard=on the below flag
>>>> + * is immediately cleared again in bdrv_co_do_write_zeroes */
>>> Is it? I only see it being cleared in bdrv_co_write_zeroes(), but that's
>>> not a function that seems to be called from here.
>> You are right. Question, do we want to support detect_zeroes = unmap
>> if discard = ignore? If yes, I have to update the docs. Otherwise
>> I have to check for BDRV_O_DISCARD before setting BDRV_REQ_MAY_UNMAP.
> I think it would be reasonable enough to just error out when you try to
> open an image with detect_zeroes=unmap,discard=ignore.
>
> Can these flags be changed during runtime? If so, we need to check there
> as well.
No, but you can specify them during hot add. Same as with discard.
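
A rough sketch of the suggested open-time check, which would apply equally
to hot add since the options are parsed the same way there. This is only
an illustration under assumptions: the opt_dz enum and both sketch_*
functions are hypothetical and do not exist in QEMU, and the discard
setting is reduced to a single boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the detect-zeroes=off|on|unmap option values. */
enum opt_dz { OPT_DZ_OFF, OPT_DZ_ON, OPT_DZ_UNMAP };

/* Parse the option string; returns 0 on success, -EINVAL if unknown. */
int sketch_parse_detect_zeroes(const char *s, enum opt_dz *out)
{
    assert(s != NULL && out != NULL);

    if (strcmp(s, "off") == 0) {
        *out = OPT_DZ_OFF;
    } else if (strcmp(s, "on") == 0) {
        *out = OPT_DZ_ON;
    } else if (strcmp(s, "unmap") == 0) {
        *out = OPT_DZ_UNMAP;
    } else {
        return -EINVAL;
    }
    return 0;
}

/*
 * Reject detect-zeroes=unmap when discard is not enabled, so the
 * combination fails when the device is opened instead of being
 * silently downgraded at request time.
 */
int sketch_check_options(enum opt_dz dz, bool discard_enabled)
{
    if (dz == OPT_DZ_UNMAP && !discard_enabled) {
        return -EINVAL;
    }
    return 0;
}
```

Erroring out this way keeps the write path free of a BDRV_O_DISCARD test
before setting BDRV_REQ_MAY_UNMAP.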
Peter
>
>>>> + if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
>>>> + flags |= BDRV_REQ_MAY_UNMAP;
>>>> + }
>>>> + }
>>>> +
>>>> if (ret < 0) {
>>>> /* Do nothing, write notifier decided to fail this request */
>>>> } else if (flags & BDRV_REQ_ZERO_WRITE) {
> Kevin