From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60816)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1bMy5x-0004uE-6j
	for qemu-devel@nongnu.org; Tue, 12 Jul 2016 09:51:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1bMy5s-000148-8S
	for qemu-devel@nongnu.org; Tue, 12 Jul 2016 09:51:28 -0400
Date: Tue, 12 Jul 2016 15:51:08 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20160712135108.GC4478@noname.redhat.com>
References: <1468316175-11522-1-git-send-email-den@openvz.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1468316175-11522-1-git-send-email-den@openvz.org>
Subject: Re: [Qemu-devel] [PATCH 1/1] mirror: double performance of the bulk
 stage if the disc is full
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Denis V. Lunev" <den@openvz.org>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <famz@redhat.com>, Max Reitz <mreitz@redhat.com>, Jeff Cody <jcody@redhat.com>, Eric Blake <eblake@redhat.com>

Am 12.07.2016 um 11:36 hat Denis V. Lunev geschrieben:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> Mirror can do up to 16 in-flight requests, but actually on full copy
> (the whole source disk is non-zero) in-flight is always 1. This happens
> as the request is not limited in size: the data occupies maximum available
> capacity of s->buf.
> 
> The patch limits the size of the request to some artificial constant
> (1 Mb here), which is not that big or small. This effectively enables
> back parallelism in mirror code as it was designed.
> 
> The result is important: the time to migrate 10 Gb disk is reduced from
> ~350 sec to 170 sec.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Jeff Cody <jcody@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> ---
>  block/mirror.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 4fe127e..53d3bcd 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -23,7 +23,9 @@
>  
>  #define SLICE_TIME    100000000ULL /* ns */
>  #define MAX_IN_FLIGHT 16
> -#define DEFAULT_MIRROR_BUF_SIZE   (10 << 20)
> +#define MAX_IO_SECTORS ((1 << 20) >> BDRV_SECTOR_BITS) /* 1 Mb */
> +#define DEFAULT_MIRROR_BUF_SIZE \
> +    (MAX_IN_FLIGHT * MAX_IO_SECTORS * BDRV_SECTOR_SIZE)
>  
>  /* The mirroring buffer is a list of granularity-sized chunks.
>   * Free chunks are organized in a list.
> @@ -387,7 +389,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>                                            nb_chunks * sectors_per_chunk,
>                                            &io_sectors, &file);
>          if (ret < 0) {
> -            io_sectors = nb_chunks * sectors_per_chunk;
> +            io_sectors = MIN(nb_chunks * sectors_per_chunk, MAX_IO_SECTORS);
> +        } else if (ret & BDRV_BLOCK_DATA) {
> +            io_sectors = MIN(io_sectors, MAX_IO_SECTORS);
>          }

Would it make sense to consider the actual buffer size? If we have
s->buf_size / MAX_IN_FLIGHT > 1 MB, then capping each request at 1 MB
leaves part of the buffer unused.

On the other hand, there is probably a minimum size below which a single
larger request performs better than two concurrent small ones. Where
exactly that threshold lies is hard to tell, though. If we assume that
1 MB is a good default (should we do some more testing to find the sweet
spot?), we could write this as:

  io_sectors = MIN(io_sectors,
                   MAX((s->buf_size / BDRV_SECTOR_SIZE) / MAX_IN_FLIGHT,
                       MAX_IO_SECTORS));
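
To make the arithmetic concrete, here is a standalone sketch of that
clamp. It is not a patch against block/mirror.c: clamp_io_sectors() is a
hypothetical helper, the MIN/MAX macros are local stand-ins, and
BDRV_SECTOR_BITS is assumed to be 9 (512-byte sectors). MAX_IN_FLIGHT and
MAX_IO_SECTORS are copied from the patch above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512-byte sectors. */
#define BDRV_SECTOR_BITS 9
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)

/* Values taken from the patch under review. */
#define MAX_IN_FLIGHT  16
#define MAX_IO_SECTORS ((1 << 20) >> BDRV_SECTOR_BITS) /* 1 MB */

/* Local stand-ins for the usual MIN/MAX helper macros. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Hypothetical helper: limit one request to an equal share of the
 * mirror buffer, but never push the limit below MAX_IO_SECTORS.
 */
static int64_t clamp_io_sectors(int64_t io_sectors, uint64_t buf_size)
{
    int64_t share;

    assert(io_sectors >= 0);

    /* Sectors available to each of the MAX_IN_FLIGHT requests. */
    share = (int64_t)((buf_size / BDRV_SECTOR_SIZE) / MAX_IN_FLIGHT);

    return MIN(io_sectors, MAX(share, (int64_t)MAX_IO_SECTORS));
}
```

With a 16 MB buffer the share is exactly 2048 sectors (1 MB), so nothing
changes compared to the patch. With a 64 MB buffer the limit grows to
8192 sectors (4 MB) and the whole buffer can be used by 16 requests. With
a buffer smaller than 16 MB the limit stays at 1 MB, so fewer than 16
requests fit at once; whether that is the right trade-off is exactly the
open question about the sweet spot.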

Kevin