qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>, qemu-block@nongnu.org
Cc: peter.maydell@linaro.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes
Date: Wed, 14 Aug 2019 21:44:28 -0500	[thread overview]
Message-ID: <fb6063ca-d4c4-7add-adec-5e92c3805ad8@redhat.com> (raw)
In-Reply-To: <20190326155157.3719-6-kwolf@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 3312 bytes --]

On 3/26/19 10:51 AM, Kevin Wolf wrote:
> We know that the kernel implements a slow fallback code path for
> BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it.
> The other operations we call in the context of .bdrv_co_pwrite_zeroes
> should usually be quick, so no modification should be needed for them.
> If we ever notice that there are additional problematic cases, we can
> still make these conditional as well.

Are there cases where fallocate(FALLOC_FL_ZERO_RANGE) falls back to slow
writes?  It may be fast on some file systems, but when used on a block
device, that may equally trigger slow fallbacks.  The man page is not
clear on that fact; I suspect that there may be cases in there that need
to be made conditional (it would be awesome if the kernel folks would
give us another FALLOC_ flag when we want to guarantee no fallback).

By the way, is there an easy setup to prove (maybe some qemu-img convert
command on a specially-prepared source image) whether the no fallback
flag makes a difference?  I'm about to cross-post a series of patches to
nbd/qemu/nbdkit/libnbd that adds a new NBD_CMD_FLAG_FAST_ZERO which fits
the bill of BDRV_REQ_NO_FALLBACK, but would like to include some
benchmark numbers in my cover letter if I can reproduce a setup where it
matters.

And this patch has a bug:

> +++ b/block/file-posix.c
> @@ -652,7 +652,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>      }
>  #endif
>  
> -    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
> +    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
>      ret = 0;
>  fail:
>      if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
> @@ -1500,14 +1500,19 @@ static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
{
    int ret = -ENOTSUP;
    BDRVRawState *s = aiocb->bs->opaque;

    if (!s->has_write_zeroes) {
        return -ENOTSUP;
>      }

At this point, ret is -ENOTSUP.

>  
>  #ifdef BLKZEROOUT
> -    do {
> -        uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
> -        if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
> -            return 0;
> -        }
> -    } while (errno == EINTR);
> +    /* The BLKZEROOUT implementation in the kernel doesn't set
> +     * BLKDEV_ZERO_NOFALLBACK, so we can't call this if we have to avoid slow
> +     * fallbacks. */
> +    if (!(aiocb->aio_type & QEMU_AIO_NO_FALLBACK)) {
> +        do {
> +            uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
> +            if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
> +                return 0;
> +            }
> +        } while (errno == EINTR);
>  
> -    ret = translate_err(-errno);
> +        ret = translate_err(-errno);
> +    }

If the very first call to this function is with NO_FALLBACK, then this
'if' is skipped,

>  #endif
>  
>      if (ret == -ENOTSUP) {
        s->has_write_zeroes = false;
    }

and we set s->has_write_zeroes to false, permanently disabling any
BLKZEROOUT attempts in future calls, even if the future calls no longer
pass the NO_FALLBACK flag.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

       reply	other threads:[~2019-08-15  3:02 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190326155157.3719-1-kwolf@redhat.com>
     [not found] ` <20190326155157.3719-6-kwolf@redhat.com>
2019-08-15  2:44   ` Eric Blake [this message]
2019-08-15 10:29     ` [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes Kevin Wolf
2019-08-17 17:45       ` Nir Soffer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fb6063ca-d4c4-7add-adec-5e92c3805ad8@redhat.com \
    --to=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).