From: Peter Lieven <pl@kamp.de>
To: "Denis V. Lunev" <den@openvz.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/8] block: prepare bdrv_co_do_write_zeroes to deal with large bl.max_write_zeroes
Date: Mon, 05 Jan 2015 08:34:28 +0100
Message-ID: <54AA3E84.8080003@kamp.de>
In-Reply-To: <1419931250-19259-2-git-send-email-den@openvz.org>

On 30.12.2014 10:20, Denis V. Lunev wrote:
> bdrv_co_do_write_zeroes split writes using bl.max_write_zeroes or
> 16 MiB as a chunk size. This is implemented in this way to tolerate
> buggy block backends which do not accept too big requests.
>
> Though if the bdrv_co_write_zeroes callback is not good enough, we
> fallback to write data explicitely using bdrv_co_writev and we
> create buffer to accomodate zeroes inside. The size of this buffer
> is the size of the chunk. Thus if the underlying layer will have
> bl.max_write_zeroes high enough, f.e. 4 GiB, the allocation can fail.
>
> Actually, there is no need to allocate such a big amount of memory.
> We could simply allocate 1 MiB buffer and create iovec, which will
> point to the same memory.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Peter Lieven <pl@kamp.de>
> ---
>   block.c | 35 ++++++++++++++++++++++++-----------
>   1 file changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/block.c b/block.c
> index 4165d42..d69c121 100644
> --- a/block.c
> +++ b/block.c
> @@ -3173,14 +3173,18 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>    * of 32768 512-byte sectors (16 MiB) per request.
>    */
>   #define MAX_WRITE_ZEROES_DEFAULT 32768
> +/* allocate iovec with zeroes using 1 MiB chunks to avoid to big allocations */
> +#define MAX_ZEROES_CHUNK (1024 * 1024)
>   
>   static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>       int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
>   {
>       BlockDriver *drv = bs->drv;
>       QEMUIOVector qiov;
> -    struct iovec iov = {0};
>       int ret = 0;
> +    void *chunk = NULL;
> +
> +    qemu_iovec_init(&qiov, 0);
>   
>       int max_write_zeroes = bs->bl.max_write_zeroes ?
>                              bs->bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT;
> @@ -3217,27 +3221,35 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>           }
>   
>           if (ret == -ENOTSUP) {
> +            int64_t num_bytes = (int64_t)num << BDRV_SECTOR_BITS;
> +            int chunk_size = MIN(MAX_ZEROES_CHUNK, num_bytes);
> +
>               /* Fall back to bounce buffer if write zeroes is unsupported */
> -            iov.iov_len = num * BDRV_SECTOR_SIZE;
> -            if (iov.iov_base == NULL) {
> -                iov.iov_base = qemu_try_blockalign(bs, num * BDRV_SECTOR_SIZE);
> -                if (iov.iov_base == NULL) {
> +            if (chunk == NULL) {
> +                chunk = qemu_try_blockalign(bs, chunk_size);
> +                if (chunk == NULL) {
>                       ret = -ENOMEM;
>                       goto fail;
>                   }
> -                memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
> +                memset(chunk, 0, chunk_size);
> +            }
> +
> +            while (num_bytes > 0) {
> +                int to_add = MIN(chunk_size, num_bytes);
> +                qemu_iovec_add(&qiov, chunk, to_add);

This can, and likely will, fail for large num_bytes: the loop adds one iovec entry per
1 MiB chunk, so the request can end up with more than IOV_MAX vectors.

I would stick to the old method and limit num to a reasonable value, e.g. MAX_WRITE_ZEROES_DEFAULT.
This becomes necessary because you set max_write_zeroes to INT_MAX. That case wasn't considered in
the original patch.
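
To put rough numbers on this, here is a small standalone sketch (not QEMU code). The
three constants mirror values from the patch and its 512-byte sectors. ASSUMED_IOV_MAX = 1024
is an assumption (a common value on Linux; the real limit is platform-dependent), and
iov_entries_needed(), fits_in_iov_max() and worst_case_entries() are hypothetical helper
names used only for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch under review. */
#define BDRV_SECTOR_BITS         9              /* 512-byte sectors */
#define MAX_WRITE_ZEROES_DEFAULT 32768          /* 16 MiB in sectors */
#define MAX_ZEROES_CHUNK         (1024 * 1024)  /* 1 MiB zero buffer */

/* Assumption: typical Linux limit; check <limits.h> or sysconf(_SC_IOV_MAX). */
#define ASSUMED_IOV_MAX          1024

/* Hypothetical helper: iovec entries the fallback loop would add
 * for a request of nb_sectors, one entry per (partial) 1 MiB chunk. */
static int64_t iov_entries_needed(int64_t nb_sectors)
{
    int64_t num_bytes = nb_sectors << BDRV_SECTOR_BITS;
    return (num_bytes + MAX_ZEROES_CHUNK - 1) / MAX_ZEROES_CHUNK;
}

/* Hypothetical helper: does such a request stay within the assumed limit? */
static bool fits_in_iov_max(int64_t nb_sectors)
{
    return iov_entries_needed(nb_sectors) <= ASSUMED_IOV_MAX;
}

/* Entries needed if a single request really spans INT_MAX sectors. */
static int64_t worst_case_entries(void)
{
    return iov_entries_needed(INT_MAX);
}
```

Under these assumptions, a request of MAX_WRITE_ZEROES_DEFAULT sectors needs 16 entries,
while one of INT_MAX sectors would need 1048576, far more than 1024. The largest request
that still fits is 2097152 sectors (1 GiB).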

Peter

> +                num_bytes -= to_add;
>               }
> -            qemu_iovec_init_external(&qiov, &iov, 1);
>   
>               ret = drv->bdrv_co_writev(bs, sector_num, num, &qiov);
>   
>               /* Keep bounce buffer around if it is big enough for all
>                * all future requests.
>                */
> -            if (num < max_write_zeroes) {
> -                qemu_vfree(iov.iov_base);
> -                iov.iov_base = NULL;
> +            if (chunk_size != MAX_ZEROES_CHUNK) {
> +                qemu_vfree(chunk);
> +                chunk = NULL;
>               }
> +            qemu_iovec_reset(&qiov);
>           }
>   
>           sector_num += num;
> @@ -3245,7 +3257,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>       }
>   
>   fail:
> -    qemu_vfree(iov.iov_base);
> +    qemu_iovec_destroy(&qiov);
> +    qemu_vfree(chunk);
>       return ret;
>   }
>   


-- 

Kind regards

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Managing Directors: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB No. 12154
   VAT ID No.: DE 120607556

...........................................................

