From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56225) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Y85kh-0007ze-L5 for qemu-devel@nongnu.org; Mon, 05 Jan 2015 06:23:19 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Y85kd-0007Ss-AT for qemu-devel@nongnu.org; Mon, 05 Jan 2015 06:23:15 -0500
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:44710 helo=mx01.kamp.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Y85kd-0007Sg-08 for qemu-devel@nongnu.org; Mon, 05 Jan 2015 06:23:11 -0500
Message-ID: <54AA7419.8050304@kamp.de>
Date: Mon, 05 Jan 2015 12:23:05 +0100
From: Peter Lieven
MIME-Version: 1.0
References: <1419931250-19259-1-git-send-email-den@openvz.org> <1419931250-19259-2-git-send-email-den@openvz.org> <54AA3E84.8080003@kamp.de> <54AA7022.2030304@openvz.org>
In-Reply-To: <54AA7022.2030304@openvz.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 1/8] block: prepare bdrv_co_do_write_zeroes to deal with large bl.max_write_zeroes
To: "Denis V. Lunev"
Cc: Kevin Wolf , qemu-devel@nongnu.org, Stefan Hajnoczi

On 05.01.2015 12:06, Denis V. Lunev wrote:
> On 05/01/15 10:34, Peter Lieven wrote:
>> On 30.12.2014 10:20, Denis V. Lunev wrote:
>>> bdrv_co_do_write_zeroes split writes using bl.max_write_zeroes or
>>> 16 MiB as a chunk size. This is implemented in this way to tolerate
>>> buggy block backends which do not accept too big requests.
>>>
>>> Though if the bdrv_co_write_zeroes callback is not good enough, we
>>> fall back to write data explicitly using bdrv_co_writev and we
>>> create buffer to accommodate zeroes inside. The size of this buffer
>>> is the size of the chunk. Thus if the underlying layer will have
>>> bl.max_write_zeroes high enough, f.e. 4 GiB, the allocation can fail.
>>>
>>> Actually, there is no need to allocate such a big amount of memory.
>>> We could simply allocate 1 MiB buffer and create iovec, which will
>>> point to the same memory.
>>>
>>> Signed-off-by: Denis V. Lunev
>>> CC: Kevin Wolf
>>> CC: Stefan Hajnoczi
>>> CC: Peter Lieven
>>> ---
>>>  block.c | 35 ++++++++++++++++++++++++-----------
>>>  1 file changed, 24 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 4165d42..d69c121 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -3173,14 +3173,18 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>>>   * of 32768 512-byte sectors (16 MiB) per request.
>>>   */
>>>  #define MAX_WRITE_ZEROES_DEFAULT 32768
>>> +/* allocate iovec with zeroes using 1 MiB chunks to avoid to big allocations */
>>> +#define MAX_ZEROES_CHUNK (1024 * 1024)
>>>  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>>>      int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
>>>  {
>>>      BlockDriver *drv = bs->drv;
>>>      QEMUIOVector qiov;
>>> -    struct iovec iov = {0};
>>>      int ret = 0;
>>> +    void *chunk = NULL;
>>> +
>>> +    qemu_iovec_init(&qiov, 0);
>>>      int max_write_zeroes = bs->bl.max_write_zeroes ?
>>>                             bs->bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT;
>>> @@ -3217,27 +3221,35 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>>>          }
>>>          if (ret == -ENOTSUP) {
>>> +            int64_t num_bytes = (int64_t)num << BDRV_SECTOR_BITS;
>>> +            int chunk_size = MIN(MAX_ZEROES_CHUNK, num_bytes);
>>> +
>>>              /* Fall back to bounce buffer if write zeroes is unsupported */
>>> -            iov.iov_len = num * BDRV_SECTOR_SIZE;
>>> -            if (iov.iov_base == NULL) {
>>> -                iov.iov_base = qemu_try_blockalign(bs, num * BDRV_SECTOR_SIZE);
>>> -                if (iov.iov_base == NULL) {
>>> +            if (chunk == NULL) {
>>> +                chunk = qemu_try_blockalign(bs, chunk_size);
>>> +                if (chunk == NULL) {
>>>                      ret = -ENOMEM;
>>>                      goto fail;
>>>                  }
>>> -                memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
>>> +                memset(chunk, 0, chunk_size);
>>> +            }
>>> +
>>> +            while (num_bytes > 0) {
>>> +                int to_add = MIN(chunk_size, num_bytes);
>>> +                qemu_iovec_add(&qiov, chunk, to_add);
>>
>> This can and likely will fail for big num_bytes if you exceed IOV_MAX vectors.
>>
>> I would stick to the old method and limit the num to a reasonable value e.g. MAX_WRITE_ZEROES_DEFAULT.
>> This becomes necessary as you set INT_MAX for max_write_zeroes. That hasn't been considered before in
>> the original patch.
>>
>> Peter
>>
>
> hmm. You are right, but I think that it would be better to limit iovec size
> to 32 and this will solve the problem. Allocation of 32 MB could be a real
> problem on a loaded system.
>
> What do you think about this? Maybe we could consider 16 as a limit...

I would do the following:

---8<---

From 8c2a08baddbcd9e89bbb11fa83a42350bd7cc095 Mon Sep 17 00:00:00 2001
From: Peter Lieven
Date: Mon, 5 Jan 2015 12:14:52 +0100
Subject: [PATCH] block: limited request size in write zeroes unsupported path

If bs->bl.max_write_zeroes is large and we end up in the unsupported
path we might allocate a lot of memory for the iovector and/or even
generate oversized requests.

Fix this by limiting the request by the minimum of the reported
maximum transfer size or 16MB (32768 sectors).

Reported-by: Denis V. Lunev
Signed-off-by: Peter Lieven
---
 block.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index a612594..8009478 100644
--- a/block.c
+++ b/block.c
@@ -3203,6 +3203,9 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
         if (ret == -ENOTSUP) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
+            int max_xfer_len = MIN_NON_ZERO(bs->bl.max_transfer_length,
+                                            MAX_WRITE_ZEROES_DEFAULT);
+            num = MIN(num, max_xfer_len);
             iov.iov_len = num * BDRV_SECTOR_SIZE;
             if (iov.iov_base == NULL) {
                 iov.iov_base = qemu_try_blockalign(bs, num * BDRV_SECTOR_SIZE);
@@ -3219,7 +3222,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
             /* Keep bounce buffer around if it is big enough for all
              * all future requests.
              */
-            if (num < max_write_zeroes) {
+            if (num < max_xfer_len) {
                 qemu_vfree(iov.iov_base);
                 iov.iov_base = NULL;
             }
-- 
1.7.9.5
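
To put rough numbers on the two limits discussed above, here is a minimal standalone sketch. It is not QEMU code: min_non_zero(), iov_entries_needed() and clamp_fallback_sectors() are hypothetical helpers written for this illustration, and SKETCH_IOV_MAX = 1024 is an assumption (a common value on Linux, the real limit is platform dependent). It shows how many 1 MiB iovec entries the chunked approach would need for a given request, and how clamping num to MIN_NON_ZERO(max_transfer_length, MAX_WRITE_ZEROES_DEFAULT) keeps the fallback request bounded.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SECTOR_SIZE              512            /* same value as BDRV_SECTOR_SIZE */
#define MAX_WRITE_ZEROES_DEFAULT 32768          /* sectors, i.e. 16 MiB */
#define ZERO_CHUNK_BYTES         (1024 * 1024)  /* 1 MiB chunk from the first proposal */
#define SKETCH_IOV_MAX           1024           /* assumption: typical IOV_MAX on Linux */

/* Hypothetical stand-in for MIN_NON_ZERO: the smaller of two values,
 * where 0 means "no limit reported" and is ignored. */
static int64_t min_non_zero(int64_t a, int64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Number of iovec entries needed when a request of nb_sectors is described
 * by repeating one shared ZERO_CHUNK_BYTES buffer (rounded up). */
static int64_t iov_entries_needed(int64_t nb_sectors)
{
    assert(nb_sectors >= 0);
    int64_t bytes = nb_sectors * SECTOR_SIZE;
    return (bytes + ZERO_CHUNK_BYTES - 1) / ZERO_CHUNK_BYTES;
}

/* Clamp the fallback request size (in sectors) the way the second patch
 * does: never more than the reported transfer limit or the 16 MiB default. */
static int64_t clamp_fallback_sectors(int64_t num, int64_t max_transfer_length)
{
    int64_t max_xfer_len = min_non_zero(max_transfer_length,
                                        MAX_WRITE_ZEROES_DEFAULT);
    return num < max_xfer_len ? num : max_xfer_len;
}
```

With these definitions, iov_entries_needed(INT_MAX) comes out far above SKETCH_IOV_MAX, while a request first passed through clamp_fallback_sectors() is at most 32768 sectors and so needs only a single 16 MiB bounce buffer.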