Date: Mon, 2 Feb 2015 15:04:52 +0100
From: Kevin Wolf
To: Peter Lieven
Cc: "Denis V. Lunev", Fam Zheng, qemu-devel@nongnu.org, Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
Message-ID: <20150202140452.GG9478@noname.redhat.com>
In-Reply-To: <54CF81DA.3020003@kamp.de>
References: <1422607337-25335-1-git-send-email-den@openvz.org> <1422607337-25335-8-git-send-email-den@openvz.org> <20150202132355.GC9478@noname.redhat.com> <54CF81DA.3020003@kamp.de>

On 02.02.2015 at 14:55, Peter Lieven wrote:
> On 02.02.2015 at 14:23, Kevin Wolf wrote:
> >On 30.01.2015 at 09:42, Denis V. Lunev wrote:
> >>fallocate() works fine and can properly handle requests of arbitrary
> >>size. There is no point in reducing the amount of space passed to
> >>fallocate(). The bigger the size, the better the performance, as the
> >>number of journal updates is reduced.
> >>
> >>The patch changes behavior for both the generic filesystem and the XFS
> >>codepaths, which are distinct in handle_aiocb_write_zeroes. The
> >>implementations of fallocate() and xfsctl(XFS_IOC_ZERO_RANGE) for XFS
> >>are exactly the same, so the change is fine for both paths.
> >>
> >>Signed-off-by: Denis V. Lunev
> >>Reviewed-by: Max Reitz
> >>CC: Kevin Wolf
> >>CC: Stefan Hajnoczi
> >>CC: Peter Lieven
> >>CC: Fam Zheng
> >>---
> >> block/raw-posix.c | 17 +++++++++++++++++
> >> 1 file changed, 17 insertions(+)
> >>
> >>diff --git a/block/raw-posix.c b/block/raw-posix.c
> >>index 7b42f37..933c778 100644
> >>--- a/block/raw-posix.c
> >>+++ b/block/raw-posix.c
> >>@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
> >>     }
> >> }
> >>+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
> >>+{
> >>+    BDRVRawState *s = bs->opaque;
> >>+    struct stat st;
> >>+
> >>+    if (fstat(s->fd, &st) < 0) {
> >>+        return; /* no problem, keep default value */
> >>+    }
> >>+    if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
> >>+        return;
> >>+    }
> >>+    bs->bl.max_write_zeroes = INT_MAX;
> >>+}
> >Peter, do you remember why INT_MAX isn't actually the default? I think
> >the most reasonable behaviour would be that a limitation is only used if
> >a block driver requests it, and otherwise unlimited is assumed.
>
> The default (0) actually means unlimited or undefined. We introduced
> that limit of 16MB in bdrv_co_write_zeroes to create only reasonably
> sized requests because there is no guarantee that write zeroes is a
> fast operation. We should set INT_MAX only if we know that write
> zeroes of an arbitrary size is always fast.

Well, splitting it up doesn't make it any faster. I think we can assume
that drv->bdrv_co_write_zeroes() wants to know the full request size
unless the driver has explicitly set bs->bl.max_write_zeroes.

Only when we fall back to emulating the operation with a zero-filled
buffer do I see a need to split it up, so that our bounce buffer doesn't
become huge.

Kevin