From: Fam Zheng
Date: Wed, 7 May 2014 16:57:55 +0800
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation
Message-ID: <20140507085755.GA10558@T430.nay.redhat.com>
In-Reply-To: <20140507082039.GA4045@noname.str.redhat.com>
References: <1399343564-17687-1-git-send-email-famz@redhat.com>
 <20140507014517.GG1574@T430.nay.redhat.com>
 <20140507082039.GA4045@noname.str.redhat.com>

On Wed, 05/07 10:20, Kevin Wolf wrote:
> Am 07.05.2014 um 03:45 hat Fam Zheng geschrieben:
> > On Tue, 05/06 10:32, Fam Zheng wrote:
> > > On a mounted NFS filesystem, ftruncate is much, much slower than doing a
> > > zero write. Changing this significantly speeds up cluster allocation.
> > >
> > > Comparing by converting a cirros image (296M) to VMDK on an NFS mount
> > > point, over 1Gbe LAN:
> > >
> > > $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk
> > >
> > > Before:
> > > real    0m26.464s
> > > user    0m0.133s
> > > sys     0m0.527s
> > >
> > > After:
> > > real    0m2.120s
> > > user    0m0.080s
> > > sys     0m0.197s
> > >
> > > Signed-off-by: Fam Zheng
> > >
> > > ---
> > > V2: Fix cluster_offset check. (Kevin)
> > >
> > > Signed-off-by: Fam Zheng
> > > ---
> > >  block/vmdk.c | 19 ++++++++++++++-----
> > >  1 file changed, 14 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/block/vmdk.c b/block/vmdk.c
> > > index 06a1f9f..98d2d56 100644
> > > --- a/block/vmdk.c
> > > +++ b/block/vmdk.c
> > > @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs,
> > >      int min_index, i, j;
> > >      uint32_t min_count, *l2_table;
> > >      bool zeroed = false;
> > > +    int64_t ret;
> > >
> > >      if (m_data) {
> > >          m_data->valid = 0;
> > > @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs,
> > >      }
> > >
> > >      /* Avoid the L2 tables update for the images that have snapshots. */
> > > -    *cluster_offset = bdrv_getlength(extent->file);
> > > +    ret = bdrv_getlength(extent->file);
> > > +    if (ret < 0 ||
> > > +        ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) {
> > > +        return VMDK_ERROR;
> > > +    }
> > > +    *cluster_offset = ret;
> > >      if (!extent->compressed) {
> > > -        bdrv_truncate(
> > > -            extent->file,
> > > -            *cluster_offset + (extent->cluster_sectors << 9)
> > > -        );
> > > +        ret = bdrv_write_zeroes(extent->file,
> > > +                                *cluster_offset >> BDRV_SECTOR_BITS,
> > > +                                extent->cluster_sectors,
> > > +                                0);
> >
> > Hi Stefan,
> >
> > By treating bdrv_write_zeroes as a pre-write, this in general doubles the
> > writes for the whole image, so it's not a good solution.
> >
> > A better way would be removing the bdrv_truncate and requiring the caller
> > to do a full cluster write (with a bounce buffer if necessary).
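The full-cluster-write idea could be sketched roughly as below: zero-fill a
cluster-sized bounce buffer, copy the guest data into it at the right offset,
and write the whole buffer once, so no separate truncate or zeroing pass is
needed. This is only an illustration; fill_cluster and CLUSTER_SIZE are
made-up names, not anything in block/vmdk.c:

```c
#include <stdint.h>
#include <string.h>

#define CLUSTER_SIZE 65536  /* hypothetical 64 KiB cluster */

/* Build a complete cluster image in a bounce buffer: zero-fill it,
 * then copy the (possibly partial) guest data at its offset within
 * the cluster. Writing the full buffer in one request initializes
 * the newly allocated cluster and stores the data in a single pass. */
static void fill_cluster(uint8_t *bounce, uint64_t offset_in_cluster,
                         const uint8_t *data, size_t len)
{
    memset(bounce, 0, CLUSTER_SIZE);
    memcpy(bounce + offset_in_cluster, data, len);
}
```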
>
> Doesn't get_whole_cluster() already ensure that you write a full
> cluster to the image file?

Looking at the code, that one would more accurately be called
get_backing_cluster(). :)

> However, it might be better to not use bdrv_getlength() each time you
> need a new cluster, but instead use a field in VmdkExtent to keep the
> next free cluster offset (which is rounded up in vmdk_open).

Yes, indeed. We should do that.

Thanks,
Fam
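For reference, the cached-offset idea might look roughly like this: round the
file length up to a cluster boundary once at open time, then hand out and
advance the cached offset on every allocation instead of calling
bdrv_getlength(). The struct and function names here are illustrative only,
not the actual VmdkExtent fields:

```c
#include <stdint.h>

/* Hypothetical allocator state: cache the next free cluster offset
 * instead of querying the file length for every new cluster. */
typedef struct {
    uint64_t cluster_size;        /* bytes per cluster */
    uint64_t next_cluster_offset; /* next free, cluster-aligned offset */
} ExtentAlloc;

/* Done once when the image is opened: round the current file length
 * up to a cluster boundary. */
static void alloc_init(ExtentAlloc *a, uint64_t cluster_size,
                       uint64_t file_len)
{
    a->cluster_size = cluster_size;
    a->next_cluster_offset =
        (file_len + cluster_size - 1) / cluster_size * cluster_size;
}

/* Per allocation: return the cached offset and advance it. */
static uint64_t alloc_cluster(ExtentAlloc *a)
{
    uint64_t off = a->next_cluster_offset;
    a->next_cluster_offset += a->cluster_size;
    return off;
}
```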