qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Lieven <pl@kamp.de>
To: "Denis V. Lunev" <den@openvz.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/7] block: fix maximum length sent to bdrv_co_do_write_zeroes callback in bs
Date: Fri, 26 Dec 2014 14:13:58 +0100	[thread overview]
Message-ID: <549D5F16.2070007@kamp.de> (raw)
In-Reply-To: <1419597313-20514-2-git-send-email-den@openvz.org>

Am 26.12.2014 um 13:35 schrieb Denis V. Lunev:
> The check for maximum length was added by
>   commit c31cb70728d2c0c8900b35a66784baa446fd5147
>   Author: Peter Lieven <pl@kamp.de>
>   Date:   Thu Oct 24 12:06:58 2013 +0200
>     block: honour BlockLimits in bdrv_co_do_write_zeroes
>
> but actually if driver provides .bdrv_co_write_zeroes callback, there is
> no need to limit the size to 32 MB. Callback should provide effective
> implementation which normally should not write any zeroes in comparable
> amount.

NACK.

First there is no guarantee that bdrv_co_do_write_zeroes is a fast operation.
This heaviliy depends on several circumstances that the block layer is not aware of.
If a specific protocol knows it is very fast in writing zeroes under any circumstance
it should provide INT_MAX in bs->bl.max_write_zeroes. It is then still allowed to
return -ENOTSUP if the request size or alignment doesn't fit.

There are known backends e.g. anything that deals with SCSI that have a known
limitation of the maximum number of zeroes they can write fast in a single request.
This number MUST NOT be exceeded. The below patch would break all those backends.

What issue are you trying to fix with this patch? Maybe there is a better way to fix
it at another point in the code.

Peter

>
> Correct check should be as follows to honour BlockLimits for slow writes:
> - if bs->bl.max_write_zeroes is specified, it should be applied to all
>   paths (fast and slow)
> - MAX_WRITE_ZEROES_DEFAULT should be applied for slow path only
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Eric Blake <eblake@redhat.com>
> CC: Peter Lieven <pl@kamp.de>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/block.c b/block.c
> index 4165d42..0ec8b15 100644
> --- a/block.c
> +++ b/block.c
> @@ -3182,8 +3182,12 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>      struct iovec iov = {0};
>      int ret = 0;
>  
> -    int max_write_zeroes = bs->bl.max_write_zeroes ?
> -                           bs->bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT;
> +    int max_fast_write_size = nb_sectors;
> +    int max_slow_write_size = MAX_WRITE_ZEROES_DEFAULT;
> +    if (bs->bl.max_write_zeroes != 0) {
> +        max_fast_write_size = bs->bl.max_write_zeroes;
> +        max_slow_write_size = bs->bl.max_write_zeroes;
> +    }
>  
>      while (nb_sectors > 0 && !ret) {
>          int num = nb_sectors;
> @@ -3205,9 +3209,9 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>              }
>          }
>  
> -        /* limit request size */
> -        if (num > max_write_zeroes) {
> -            num = max_write_zeroes;
> +        /* limit request size for fast path */
> +        if (num > max_fast_write_size) {
> +            num = max_fast_write_size;
>          }
>  
>          ret = -ENOTSUP;
> @@ -3217,6 +3221,11 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>          }
>  
>          if (ret == -ENOTSUP) {
> +            /* limit request size further for slow path */
> +            if (num > max_slow_write_size) {
> +                num = max_slow_write_size;
> +            }
> +
>              /* Fall back to bounce buffer if write zeroes is unsupported */
>              iov.iov_len = num * BDRV_SECTOR_SIZE;
>              if (iov.iov_base == NULL) {
> @@ -3234,7 +3243,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>              /* Keep bounce buffer around if it is big enough for all
>               * all future requests.
>               */
> -            if (num < max_write_zeroes) {
> +            if (num < max_slow_write_size) {
>                  qemu_vfree(iov.iov_base);
>                  iov.iov_base = NULL;
>              }

  reply	other threads:[~2014-12-26 13:14 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-26 12:35 [Qemu-devel] [PATCH v2 0/7] eliminate data write in bdrv_write_zeroes on Linux Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 1/7] block: fix maximum length sent to bdrv_co_do_write_zeroes callback in bs Denis V. Lunev
2014-12-26 13:13   ` Peter Lieven [this message]
2014-12-26 13:32     ` Denis V. Lunev
2014-12-26 19:15       ` Denis V. Lunev
2014-12-27 14:52         ` Peter Lieven
2014-12-27 17:42           ` Denis V. Lunev
2014-12-27 20:01             ` Peter Lieven
2014-12-27 20:14               ` Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 2/7] block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 3/7] block/raw-posix: create do_fallocate helper Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 4/7] block/raw-posix: create translate_err helper to merge errno values Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 5/7] block/raw-posix: refactor handle_aiocb_write_zeroes a bit Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 6/7] block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 7/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes Denis V. Lunev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=549D5F16.2070007@kamp.de \
    --to=pl@kamp.de \
    --cc=den@openvz.org \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).