* [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation @ 2014-05-06 2:32 Fam Zheng 2014-05-07 1:45 ` Fam Zheng 0 siblings, 1 reply; 7+ messages in thread From: Fam Zheng @ 2014-05-06 2:32 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi On mounted NFS filesystem, ftruncate is much much slower than doing a zero write. Changing this significantly speeds up cluster allocation. Comparing by converting a cirros image (296M) to VMDK on an NFS mount point, over 1Gbe LAN: $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk Before: real 0m26.464s user 0m0.133s sys 0m0.527s After: real 0m2.120s user 0m0.080s sys 0m0.197s Signed-off-by: Fam Zheng <famz@redhat.com> --- V2: Fix cluster_offset check. (Kevin) Signed-off-by: Fam Zheng <famz@redhat.com> --- block/vmdk.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/block/vmdk.c b/block/vmdk.c index 06a1f9f..98d2d56 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs, int min_index, i, j; uint32_t min_count, *l2_table; bool zeroed = false; + int64_t ret; if (m_data) { m_data->valid = 0; @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs, } /* Avoid the L2 tables update for the images that have snapshots. */ - *cluster_offset = bdrv_getlength(extent->file); + ret = bdrv_getlength(extent->file); + if (ret < 0 || + ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) { + return VMDK_ERROR; + } + *cluster_offset = ret; if (!extent->compressed) { - bdrv_truncate( - extent->file, - *cluster_offset + (extent->cluster_sectors << 9) - ); + ret = bdrv_write_zeroes(extent->file, + *cluster_offset >> BDRV_SECTOR_BITS, + extent->cluster_sectors, + 0); + if (ret) { + return VMDK_ERROR; + } } *cluster_offset >>= 9; -- 1.9.2 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation 2014-05-06 2:32 [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation Fam Zheng @ 2014-05-07 1:45 ` Fam Zheng 2014-05-07 8:04 ` Stefan Hajnoczi 2014-05-07 8:20 ` Kevin Wolf 0 siblings, 2 replies; 7+ messages in thread From: Fam Zheng @ 2014-05-07 1:45 UTC (permalink / raw) To: qemu-devel, Stefan Hajnoczi; +Cc: Kevin Wolf On Tue, 05/06 10:32, Fam Zheng wrote: > On mounted NFS filesystem, ftruncate is much much slower than doing a > zero write. Changing this significantly speeds up cluster allocation. > > Comparing by converting a cirros image (296M) to VMDK on an NFS mount > point, over 1Gbe LAN: > > $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk > > Before: > real 0m26.464s > user 0m0.133s > sys 0m0.527s > > After: > real 0m2.120s > user 0m0.080s > sys 0m0.197s > > Signed-off-by: Fam Zheng <famz@redhat.com> > > --- > V2: Fix cluster_offset check. (Kevin) > > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > block/vmdk.c | 19 ++++++++++++++----- > 1 file changed, 14 insertions(+), 5 deletions(-) > > diff --git a/block/vmdk.c b/block/vmdk.c > index 06a1f9f..98d2d56 100644 > --- a/block/vmdk.c > +++ b/block/vmdk.c > @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs, > int min_index, i, j; > uint32_t min_count, *l2_table; > bool zeroed = false; > + int64_t ret; > > if (m_data) { > m_data->valid = 0; > @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs, > } > > /* Avoid the L2 tables update for the images that have snapshots. */ > - *cluster_offset = bdrv_getlength(extent->file); > + ret = bdrv_getlength(extent->file); > + if (ret < 0 || > + ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) { > + return VMDK_ERROR; > + } > + *cluster_offset = ret; > if (!extent->compressed) { > - bdrv_truncate( > - extent->file, > - *cluster_offset + (extent->cluster_sectors << 9) > - ); > + ret = bdrv_write_zeroes(extent->file, > + *cluster_offset >> BDRV_SECTOR_BITS, > + extent->cluster_sectors, > + 0); Hi Stefan, By considering a bdrv_write_zeroes as a pre-write, it in general doubles the write for the whole image, so it's not a good solution. A better way would be removing the bdrv_truncate and require the caller to do full cluster write (with a bounce buffer if necessary). So let's drop this patch. Thanks, Fam ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation 2014-05-07 1:45 ` Fam Zheng @ 2014-05-07 8:04 ` Stefan Hajnoczi 2014-05-07 8:20 ` Kevin Wolf 1 sibling, 0 replies; 7+ messages in thread From: Stefan Hajnoczi @ 2014-05-07 8:04 UTC (permalink / raw) To: Fam Zheng; +Cc: Kevin Wolf, qemu-devel On Wed, May 07, 2014 at 09:45:17AM +0800, Fam Zheng wrote: > On Tue, 05/06 10:32, Fam Zheng wrote: > > @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs, > > } > > > > /* Avoid the L2 tables update for the images that have snapshots. */ > > - *cluster_offset = bdrv_getlength(extent->file); > > + ret = bdrv_getlength(extent->file); > > + if (ret < 0 || > > + ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) { > > + return VMDK_ERROR; > > + } > > + *cluster_offset = ret; > > if (!extent->compressed) { > > - bdrv_truncate( > > - extent->file, > > - *cluster_offset + (extent->cluster_sectors << 9) > > - ); > > + ret = bdrv_write_zeroes(extent->file, > > + *cluster_offset >> BDRV_SECTOR_BITS, > > + extent->cluster_sectors, > > + 0); > > Hi Stefan, > > By considering a bdrv_write_zeroes as a pre-write, it in general doubles the > write for the whole image, so it's not a good solution. > > A better way would be removing the bdrv_truncate and require the caller to do > full cluster write (with a bounce buffer if necessary). > > So let's drop this patch. Okay, thanks for investigating this. Stefan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation 2014-05-07 1:45 ` Fam Zheng 2014-05-07 8:04 ` Stefan Hajnoczi @ 2014-05-07 8:20 ` Kevin Wolf 2014-05-07 8:57 ` Fam Zheng 1 sibling, 1 reply; 7+ messages in thread From: Kevin Wolf @ 2014-05-07 8:20 UTC (permalink / raw) To: Fam Zheng; +Cc: qemu-devel, Stefan Hajnoczi Am 07.05.2014 um 03:45 hat Fam Zheng geschrieben: > On Tue, 05/06 10:32, Fam Zheng wrote: > > On mounted NFS filesystem, ftruncate is much much slower than doing a > > zero write. Changing this significantly speeds up cluster allocation. > > > > Comparing by converting a cirros image (296M) to VMDK on an NFS mount > > point, over 1Gbe LAN: > > > > $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk > > > > Before: > > real 0m26.464s > > user 0m0.133s > > sys 0m0.527s > > > > After: > > real 0m2.120s > > user 0m0.080s > > sys 0m0.197s > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > > --- > > V2: Fix cluster_offset check. (Kevin) > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > --- > > block/vmdk.c | 19 ++++++++++++++----- > > 1 file changed, 14 insertions(+), 5 deletions(-) > > > > diff --git a/block/vmdk.c b/block/vmdk.c > > index 06a1f9f..98d2d56 100644 > > --- a/block/vmdk.c > > +++ b/block/vmdk.c > > @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs, > > int min_index, i, j; > > uint32_t min_count, *l2_table; > > bool zeroed = false; > > + int64_t ret; > > > > if (m_data) { > > m_data->valid = 0; > > @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs, > > } > > > > /* Avoid the L2 tables update for the images that have snapshots. */ > > - *cluster_offset = bdrv_getlength(extent->file); > > + ret = bdrv_getlength(extent->file); > > + if (ret < 0 || > > + ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) { > > + return VMDK_ERROR; > > + } > > + *cluster_offset = ret; > > if (!extent->compressed) { > > - bdrv_truncate( > > - extent->file, > > - *cluster_offset + (extent->cluster_sectors << 9) > > - ); > > + ret = bdrv_write_zeroes(extent->file, > > + *cluster_offset >> BDRV_SECTOR_BITS, > > + extent->cluster_sectors, > > + 0); > > Hi Stefan, > > By considering a bdrv_write_zeroes as a pre-write, it in general doubles the > write for the whole image, so it's not a good solution. > > A better way would be removing the bdrv_truncate and require the caller to do > full cluster write (with a bounce buffer if necessary). Doesn't get_whole_cluster() already ensure that you already write a full cluster to the image file? However, it might be better to not use bdrv_getlength() each time you need a new cluster, but instead use a field in VmdkExtent to keep the next free cluster offset (which is rounded up in vmdk_open). This will ensure that we don't overlap the next cluster allocation in case get_whole_cluster() fails halfway through. (In fact, the L2 table should only be updated after get_whole_cluster() has succeeded, but we can do both to be on the safe side...) Kevin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation 2014-05-07 8:20 ` Kevin Wolf @ 2014-05-07 8:57 ` Fam Zheng 2014-05-07 9:06 ` Kevin Wolf 0 siblings, 1 reply; 7+ messages in thread From: Fam Zheng @ 2014-05-07 8:57 UTC (permalink / raw) To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi On Wed, 05/07 10:20, Kevin Wolf wrote: > Am 07.05.2014 um 03:45 hat Fam Zheng geschrieben: > > On Tue, 05/06 10:32, Fam Zheng wrote: > > > On mounted NFS filesystem, ftruncate is much much slower than doing a > > > zero write. Changing this significantly speeds up cluster allocation. > > > > > > Comparing by converting a cirros image (296M) to VMDK on an NFS mount > > > point, over 1Gbe LAN: > > > > > > $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk > > > > > > Before: > > > real 0m26.464s > > > user 0m0.133s > > > sys 0m0.527s > > > > > > After: > > > real 0m2.120s > > > user 0m0.080s > > > sys 0m0.197s > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > > > > --- > > > V2: Fix cluster_offset check. (Kevin) > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > --- > > > block/vmdk.c | 19 ++++++++++++++----- > > > 1 file changed, 14 insertions(+), 5 deletions(-) > > > > > > diff --git a/block/vmdk.c b/block/vmdk.c > > > index 06a1f9f..98d2d56 100644 > > > --- a/block/vmdk.c > > > +++ b/block/vmdk.c > > > @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs, > > > int min_index, i, j; > > > uint32_t min_count, *l2_table; > > > bool zeroed = false; > > > + int64_t ret; > > > > > > if (m_data) { > > > m_data->valid = 0; > > > @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs, > > > } > > > > > > /* Avoid the L2 tables update for the images that have snapshots. */ > > > - *cluster_offset = bdrv_getlength(extent->file); > > > + ret = bdrv_getlength(extent->file); > > > + if (ret < 0 || > > > + ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) { > > > + return VMDK_ERROR; > > > + } > > > + *cluster_offset = ret; > > > if (!extent->compressed) { > > > - bdrv_truncate( > > > - extent->file, > > > - *cluster_offset + (extent->cluster_sectors << 9) > > > - ); > > > + ret = bdrv_write_zeroes(extent->file, > > > + *cluster_offset >> BDRV_SECTOR_BITS, > > > + extent->cluster_sectors, > > > + 0); > > > > Hi Stefan, > > > > By considering a bdrv_write_zeroes as a pre-write, it in general doubles the > > write for the whole image, so it's not a good solution. > > > > A better way would be removing the bdrv_truncate and require the caller to do > > full cluster write (with a bounce buffer if necessary). > > Doesn't get_whole_cluster() already ensure that you already write a full > cluster to the image file? That one is actually called get_backing_cluster(), if you look at the code it has. :) > > However, it might be better to not use bdrv_getlength() each time you > need a new cluster, but instead use a field in VmdkExtent to keep the > next free cluster offset (which is rounded up in vmdk_open). Yes, indeed. We should do that. Thanks, Fam ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation 2014-05-07 8:57 ` Fam Zheng @ 2014-05-07 9:06 ` Kevin Wolf 2014-05-07 9:21 ` Fam Zheng 0 siblings, 1 reply; 7+ messages in thread From: Kevin Wolf @ 2014-05-07 9:06 UTC (permalink / raw) To: Fam Zheng; +Cc: qemu-devel, Stefan Hajnoczi Am 07.05.2014 um 10:57 hat Fam Zheng geschrieben: > On Wed, 05/07 10:20, Kevin Wolf wrote: > > Am 07.05.2014 um 03:45 hat Fam Zheng geschrieben: > > > On Tue, 05/06 10:32, Fam Zheng wrote: > > > > On mounted NFS filesystem, ftruncate is much much slower than doing a > > > > zero write. Changing this significantly speeds up cluster allocation. > > > > > > > > Comparing by converting a cirros image (296M) to VMDK on an NFS mount > > > > point, over 1Gbe LAN: > > > > > > > > $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk > > > > > > > > Before: > > > > real 0m26.464s > > > > user 0m0.133s > > > > sys 0m0.527s > > > > > > > > After: > > > > real 0m2.120s > > > > user 0m0.080s > > > > sys 0m0.197s > > > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > > > > > > --- > > > > V2: Fix cluster_offset check. (Kevin) > > > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > > --- > > > > block/vmdk.c | 19 ++++++++++++++----- > > > > 1 file changed, 14 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/block/vmdk.c b/block/vmdk.c > > > > index 06a1f9f..98d2d56 100644 > > > > --- a/block/vmdk.c > > > > +++ b/block/vmdk.c > > > > @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs, > > > > int min_index, i, j; > > > > uint32_t min_count, *l2_table; > > > > bool zeroed = false; > > > > + int64_t ret; > > > > > > > > if (m_data) { > > > > m_data->valid = 0; > > > > @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs, > > > > } > > > > > > > > /* Avoid the L2 tables update for the images that have snapshots. */ > > > > - *cluster_offset = bdrv_getlength(extent->file); > > > > + ret = bdrv_getlength(extent->file); > > > > + if (ret < 0 || > > > > + ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) { > > > > + return VMDK_ERROR; > > > > + } > > > > + *cluster_offset = ret; > > > > if (!extent->compressed) { > > > > - bdrv_truncate( > > > > - extent->file, > > > > - *cluster_offset + (extent->cluster_sectors << 9) > > > > - ); > > > > + ret = bdrv_write_zeroes(extent->file, > > > > + *cluster_offset >> BDRV_SECTOR_BITS, > > > > + extent->cluster_sectors, > > > > + 0); > > > > > > Hi Stefan, > > > > > > By considering a bdrv_write_zeroes as a pre-write, it in general doubles the > > > write for the whole image, so it's not a good solution. > > > > > > A better way would be removing the bdrv_truncate and require the caller to do > > > full cluster write (with a bounce buffer if necessary). > > > > Doesn't get_whole_cluster() already ensure that you already write a full > > cluster to the image file? > > That one is actually called get_backing_cluster(), if you look at the code it > has. :) Right, it doesn't do anything without a backing file. This is different from qcow2, whose mechanism I assumed without reading the code in detail. :-) I think it would make sense to rewrite get_whole_cluster() to write the cluster for both image with a backing file and standalone images; just that without a backing file it would use memset() to fill the buffer instead of bdrv_read(). Not sure how easy it would be, but it might be an opportunity to also change it to write only those parts of the cluster that aren't written to anyway by the cluster. Kevin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation 2014-05-07 9:06 ` Kevin Wolf @ 2014-05-07 9:21 ` Fam Zheng 0 siblings, 0 replies; 7+ messages in thread From: Fam Zheng @ 2014-05-07 9:21 UTC (permalink / raw) To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi On Wed, 05/07 11:06, Kevin Wolf wrote: > Am 07.05.2014 um 10:57 hat Fam Zheng geschrieben: > > On Wed, 05/07 10:20, Kevin Wolf wrote: > > > Am 07.05.2014 um 03:45 hat Fam Zheng geschrieben: > > > > On Tue, 05/06 10:32, Fam Zheng wrote: > > > > > On mounted NFS filesystem, ftruncate is much much slower than doing a > > > > > zero write. Changing this significantly speeds up cluster allocation. > > > > > > > > > > Comparing by converting a cirros image (296M) to VMDK on an NFS mount > > > > > point, over 1Gbe LAN: > > > > > > > > > > $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk > > > > > > > > > > Before: > > > > > real 0m26.464s > > > > > user 0m0.133s > > > > > sys 0m0.527s > > > > > > > > > > After: > > > > > real 0m2.120s > > > > > user 0m0.080s > > > > > sys 0m0.197s > > > > > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > > > > > > > > --- > > > > > V2: Fix cluster_offset check. (Kevin) > > > > > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > > > --- > > > > > block/vmdk.c | 19 ++++++++++++++----- > > > > > 1 file changed, 14 insertions(+), 5 deletions(-) > > > > > > > > > > diff --git a/block/vmdk.c b/block/vmdk.c > > > > > index 06a1f9f..98d2d56 100644 > > > > > --- a/block/vmdk.c > > > > > +++ b/block/vmdk.c > > > > > @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs, > > > > > int min_index, i, j; > > > > > uint32_t min_count, *l2_table; > > > > > bool zeroed = false; > > > > > + int64_t ret; > > > > > > > > > > if (m_data) { > > > > > m_data->valid = 0; > > > > > @@ -1110,12 +1111,20 @@ static int get_cluster_offset(BlockDriverState *bs, > > > > > } > > > > > > > > > > /* Avoid the L2 tables update for the images that have snapshots. */ > > > > > - *cluster_offset = bdrv_getlength(extent->file); > > > > > + ret = bdrv_getlength(extent->file); > > > > > + if (ret < 0 || > > > > > + ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) { > > > > > + return VMDK_ERROR; > > > > > + } > > > > > + *cluster_offset = ret; > > > > > if (!extent->compressed) { > > > > > - bdrv_truncate( > > > > > - extent->file, > > > > > - *cluster_offset + (extent->cluster_sectors << 9) > > > > > - ); > > > > > + ret = bdrv_write_zeroes(extent->file, > > > > > + *cluster_offset >> BDRV_SECTOR_BITS, > > > > > + extent->cluster_sectors, > > > > > + 0); > > > > > > > > Hi Stefan, > > > > > > > > By considering a bdrv_write_zeroes as a pre-write, it in general doubles the > > > > write for the whole image, so it's not a good solution. > > > > > > > > A better way would be removing the bdrv_truncate and require the caller to do > > > > full cluster write (with a bounce buffer if necessary). > > > > > > Doesn't get_whole_cluster() already ensure that you already write a full > > > cluster to the image file? > > > > That one is actually called get_backing_cluster(), if you look at the code it > > has. :) > > Right, it doesn't do anything without a backing file. This is different > from qcow2, whose mechanism I assumed without reading the code in > detail. :-) > > I think it would make sense to rewrite get_whole_cluster() to write the > cluster for both image with a backing file and standalone images; just > that without a backing file it would use memset() to fill the buffer > instead of bdrv_read(). > > Not sure how easy it would be, but it might be an opportunity to also > change it to write only those parts of the cluster that aren't written > to anyway by the cluster. > I think that shouldn't be hard . I'll make the change and send another patch later. Thanks, Fam ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-05-07 9:21 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-05-06 2:32 [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation Fam Zheng 2014-05-07 1:45 ` Fam Zheng 2014-05-07 8:04 ` Stefan Hajnoczi 2014-05-07 8:20 ` Kevin Wolf 2014-05-07 8:57 ` Fam Zheng 2014-05-07 9:06 ` Kevin Wolf 2014-05-07 9:21 ` Fam Zheng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).