[Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters
@ 2017-04-01 14:44 Ashijeet Acharya
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-01 14:44 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block,
	Ashijeet Acharya

Previously posted series patches:
v1 - http://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg02044.html
v2 - http://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg05080.html

This series helps to optimize the I/O performance of VMDK driver.

Patch 1 helps us to move vmdk_find_offset_in_cluster.

Patch 2 performs a simple function re-naming task.

Patch 3 adds new functions to help us allocate multiple clusters according to
the size requested, perform COW if required and return the offset of the first
newly allocated cluster. Also make loading of metadata tables easier and
avoid code duplication.

Patch 4 performs a simple function re-naming task and re-factors it to make use of
new metadata functions to avoid code duplication.

Patch 5 helps to set the upper limit of the bytes handled in one cycle.

Patch 6 changes the metadata update code to update the L2 tables for multiple
clusters at once.

Note: v3 has an addition of new optimization of calling bdrv_pwrite_sync() only
once for atmost 512 clusters, as a result performance has increased to a great
extent (earlier till v2 it was 29%).

Optimization test results:

This patch series improves 128 KB sequential write performance to an
empty VMDK file by 54%

Benchmark command: ./qemu-img bench -w -c 1024 -s 128K -d 1 -t none -f
vmdk test.vmdk

Note: These patches pass all 41/41 tests suitable for the VMDK driver.

Changes in v3:
- move size_to_clusters() from patch 1 to 3 (fam)
- use DIV_ROUND_UP in size_to_clusters (fam)
- make patch 2 compilable (fam)
- rename vmdk_L2update as vmdk_l2update and use UINT32_MAX (fam)
- combine patch 3 and patch 4 (as in v2) to make them compilable (fam)
- call bdrv_pwrite_sync() for batches of atmost 512 clusters at once (fam)

Changes in v2:
- segregate the ugly Patch 1 in v1 into 6 readable and sensible patches
- include benchmark test results in v2

Ashijeet Acharya (6):
  vmdk: Move vmdk_find_offset_in_cluster() to the top
  vmdk: Rename get_whole_cluster() to vmdk_perform_cow()
  vmdk: New functions to assist allocating multiple clusters
  vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  vmdk: Set maximum bytes allocated in one cycle
  vmdk: Update metadata for multiple clusters

 block/vmdk.c | 608 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 456 insertions(+), 152 deletions(-)

-- 
2.6.2

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top
  2017-04-01 14:44 [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters Ashijeet Acharya
@ 2017-04-01 14:44 ` Ashijeet Acharya
  2017-04-10 13:04   ` Ashijeet Acharya
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 2/6] vmdk: Rename get_whole_cluster() to vmdk_perform_cow() Ashijeet Acharya
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-01 14:44 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block,
	Ashijeet Acharya

Move the existing vmdk_find_offset_in_cluster() function to the top of
the driver. Also, introduce a new helper function size_to_clusters()
which returns the number of clusters for a given size in bytes. Here,
we leave the last cluster as we need to perform COW for that one.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a9bd22b..22be887 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -242,6 +242,18 @@ static void vmdk_free_last_extent(BlockDriverState *bs)
     s->extents = g_renew(VmdkExtent, s->extents, s->num_extents);
 }
 
+static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
+                                                   int64_t offset)
+{
+    uint64_t extent_begin_offset, extent_relative_offset;
+    uint64_t cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
+
+    extent_begin_offset =
+        (extent->end_sector - extent->sectors) * BDRV_SECTOR_SIZE;
+    extent_relative_offset = offset - extent_begin_offset;
+    return extent_relative_offset % cluster_size;
+}
+
 static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
 {
     char *desc;
@@ -1266,18 +1278,6 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
     return NULL;
 }
 
-static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
-                                                   int64_t offset)
-{
-    uint64_t extent_begin_offset, extent_relative_offset;
-    uint64_t cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
-
-    extent_begin_offset =
-        (extent->end_sector - extent->sectors) * BDRV_SECTOR_SIZE;
-    extent_relative_offset = offset - extent_begin_offset;
-    return extent_relative_offset % cluster_size;
-}
-
 static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
                                                   int64_t sector_num)
 {
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
@ 2017-04-10 13:04   ` Ashijeet Acharya
  2017-04-19 12:14     ` Fam Zheng
  0 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-10 13:04 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, John Snow, Max Reitz, Stefan Hajnoczi,
	QEMU Developers, qemu block, Ashijeet Acharya

On Sat, Apr 1, 2017 at 8:14 PM, Ashijeet Acharya
<ashijeetacharya@gmail.com> wrote:
> Move the existing vmdk_find_offset_in_cluster() function to the top of
> the driver. Also, introduce a new helper function size_to_clusters()
> which returns the number of clusters for a given size in bytes. Here,
> we leave the last cluster as we need to perform COW for that one.
>
I will remove the trailing part of the commit message in v4 as there
is no size_to_clusters() in this patch anymore, I forgot to update it!

Ashijeet

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top
  2017-04-10 13:04   ` Ashijeet Acharya
@ 2017-04-19 12:14     ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-19 12:14 UTC (permalink / raw)
  To: Ashijeet Acharya
  Cc: Kevin Wolf, qemu block, Stefan Hajnoczi, QEMU Developers,
	Max Reitz, John Snow

On Mon, 04/10 18:34, Ashijeet Acharya wrote:
> On Sat, Apr 1, 2017 at 8:14 PM, Ashijeet Acharya
> <ashijeetacharya@gmail.com> wrote:
> > Move the existing vmdk_find_offset_in_cluster() function to the top of
> > the driver. Also, introduce a new helper function size_to_clusters()
> > which returns the number of clusters for a given size in bytes. Here,
> > we leave the last cluster as we need to perform COW for that one.
> >
> I will remove the trailing part of the commit message in v4 as there
> is no size_to_clusters() in this patch anymore, I forgot to update it!

With that updated, you can add my:

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 2/6] vmdk: Rename get_whole_cluster() to vmdk_perform_cow()
  2017-04-01 14:44 [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters Ashijeet Acharya
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
@ 2017-04-01 14:44 ` Ashijeet Acharya
  2017-04-19 12:14   ` Fam Zheng
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-01 14:44 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block,
	Ashijeet Acharya

Rename the existing function get_whole_cluster() to vmdk_perform_cow()
as its sole purpose is to perform COW for the first and the last
allocated clusters if needed.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 22be887..73ae786 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1028,8 +1028,8 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 }
 
-/**
- * get_whole_cluster
+/*
+ * vmdk_perform_cow
  *
  * Copy backing file's cluster that covers @sector_num, otherwise write zero,
  * to the cluster at @cluster_sector_num.
@@ -1037,13 +1037,18 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
  * If @skip_start_sector < @skip_end_sector, the relative range
  * [@skip_start_sector, @skip_end_sector) is not copied or written, and leave
  * it for call to write user data in the request.
+ *
+ * Returns:
+ *   VMDK_OK:       on success
+ *
+ *   VMDK_ERROR:    in error cases
  */
-static int get_whole_cluster(BlockDriverState *bs,
-                             VmdkExtent *extent,
-                             uint64_t cluster_offset,
-                             uint64_t offset,
-                             uint64_t skip_start_bytes,
-                             uint64_t skip_end_bytes)
+static int vmdk_perform_cow(BlockDriverState *bs,
+                            VmdkExtent *extent,
+                            uint64_t cluster_offset,
+                            uint64_t offset,
+                            uint64_t skip_start_bytes,
+                            uint64_t skip_end_bytes)
 {
     int ret = VMDK_OK;
     int64_t cluster_bytes;
@@ -1244,7 +1249,7 @@ static int get_cluster_offset(BlockDriverState *bs,
          * This problem may occur because of insufficient space on host disk
          * or inappropriate VM shutdown.
          */
-        ret = get_whole_cluster(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
+        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
                                 offset, skip_start_bytes, skip_end_bytes);
         if (ret) {
             return ret;
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vmdk: Rename get_whole_cluster() to vmdk_perform_cow()
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 2/6] vmdk: Rename get_whole_cluster() to vmdk_perform_cow() Ashijeet Acharya
@ 2017-04-19 12:14   ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-19 12:14 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> Rename the existing function get_whole_cluster() to vmdk_perform_cow()
> as its sole purpose is to perform COW for the first and the last
> allocated clusters if needed.
> 
> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> ---
>  block/vmdk.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 22be887..73ae786 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1028,8 +1028,8 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
>      }
>  }
>  
> -/**
> - * get_whole_cluster
> +/*
> + * vmdk_perform_cow
>   *
>   * Copy backing file's cluster that covers @sector_num, otherwise write zero,
>   * to the cluster at @cluster_sector_num.
> @@ -1037,13 +1037,18 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
>   * If @skip_start_sector < @skip_end_sector, the relative range
>   * [@skip_start_sector, @skip_end_sector) is not copied or written, and leave
>   * it for call to write user data in the request.
> + *
> + * Returns:
> + *   VMDK_OK:       on success
> + *
> + *   VMDK_ERROR:    in error cases
>   */
> -static int get_whole_cluster(BlockDriverState *bs,
> -                             VmdkExtent *extent,
> -                             uint64_t cluster_offset,
> -                             uint64_t offset,
> -                             uint64_t skip_start_bytes,
> -                             uint64_t skip_end_bytes)
> +static int vmdk_perform_cow(BlockDriverState *bs,
> +                            VmdkExtent *extent,
> +                            uint64_t cluster_offset,
> +                            uint64_t offset,
> +                            uint64_t skip_start_bytes,
> +                            uint64_t skip_end_bytes)
>  {
>      int ret = VMDK_OK;
>      int64_t cluster_bytes;
> @@ -1244,7 +1249,7 @@ static int get_cluster_offset(BlockDriverState *bs,
>           * This problem may occur because of insufficient space on host disk
>           * or inappropriate VM shutdown.
>           */
> -        ret = get_whole_cluster(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
> +        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
>                                  offset, skip_start_bytes, skip_end_bytes);
>          if (ret) {
>              return ret;
> -- 
> 2.6.2
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters
  2017-04-01 14:44 [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters Ashijeet Acharya
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 2/6] vmdk: Rename get_whole_cluster() to vmdk_perform_cow() Ashijeet Acharya
@ 2017-04-01 14:44 ` Ashijeet Acharya
  2017-04-19 12:56   ` Fam Zheng
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset() Ashijeet Acharya
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-01 14:44 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block,
	Ashijeet Acharya

Move the cluster tables loading code out of the existing
get_cluster_offset() function to avoid code duplication and implement it
in separate get_cluster_table() and vmdk_L2load() functions.

Introduce two new helper functions handle_alloc() and
vmdk_alloc_cluster_offset(). handle_alloc() helps to allocate multiple
clusters at once starting from a given offset on disk and performs COW
if necessary for first and last allocated clusters.
vmdk_alloc_cluster_offset() helps to return the offset of the first of
the many newly allocated clusters. Also, provide proper documentation
for both.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 337 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 308 insertions(+), 29 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 73ae786..e5a289d 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -136,6 +136,7 @@ typedef struct VmdkMetaData {
     unsigned int l2_offset;
     int valid;
     uint32_t *l2_cache_entry;
+    uint32_t nb_clusters;
 } VmdkMetaData;
 
 typedef struct VmdkGrainMarker {
@@ -254,6 +255,14 @@ static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
     return extent_relative_offset % cluster_size;
 }
 
+static inline uint64_t size_to_clusters(VmdkExtent *extent, uint64_t size)
+{
+    uint64_t cluster_size, round_off_size;
+    cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
+    round_off_size = cluster_size - (size % cluster_size);
+    return DIV_ROUND_UP(size + round_off_size, BDRV_SECTOR_SIZE * 128) - 1;
+}
+
 static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
 {
     char *desc;
@@ -1028,6 +1037,133 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 }
 
+static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
+                         uint32_t offset)
+{
+    offset = cpu_to_le32(offset);
+    /* update L2 table */
+    if (bdrv_pwrite_sync(extent->file,
+                ((int64_t)m_data->l2_offset * 512)
+                    + (m_data->l2_index * sizeof(offset)),
+                &offset, sizeof(offset)) < 0) {
+        return VMDK_ERROR;
+    }
+    /* update backup L2 table */
+    if (extent->l1_backup_table_offset != 0) {
+        m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
+        if (bdrv_pwrite_sync(extent->file,
+                    ((int64_t)m_data->l2_offset * 512)
+                        + (m_data->l2_index * sizeof(offset)),
+                    &offset, sizeof(offset)) < 0) {
+            return VMDK_ERROR;
+        }
+    }
+    if (m_data->l2_cache_entry) {
+        *m_data->l2_cache_entry = offset;
+    }
+
+    return VMDK_OK;
+}
+
+/*
+ * vmdk_l2load
+ *
+ * Loads a new L2 table into memory. If the table is in the cache, the cache
+ * is used; otherwise the L2 table is loaded from the image file.
+ *
+ * Returns:
+ *   VMDK_OK:       on success
+ *   VMDK_ERROR:    in error cases
+ */
+static int vmdk_l2load(VmdkExtent *extent, uint64_t offset, int l2_offset,
+                       uint32_t **new_l2_table, int *new_l2_index)
+{
+    int min_index, i, j;
+    uint32_t *l2_table;
+    uint32_t min_count;
+
+    for (i = 0; i < L2_CACHE_SIZE; i++) {
+        if (l2_offset == extent->l2_cache_offsets[i]) {
+            /* increment the hit count */
+            if (++extent->l2_cache_counts[i] == UINT32_MAX) {
+                for (j = 0; j < L2_CACHE_SIZE; j++) {
+                    extent->l2_cache_counts[j] >>= 1;
+                }
+            }
+            l2_table = extent->l2_cache + (i * extent->l2_size);
+            goto found;
+        }
+    }
+    /* not found: load a new entry in the least used one */
+    min_index = 0;
+    min_count = UINT32_MAX;
+    for (i = 0; i < L2_CACHE_SIZE; i++) {
+        if (extent->l2_cache_counts[i] < min_count) {
+            min_count = extent->l2_cache_counts[i];
+            min_index = i;
+        }
+    }
+    l2_table = extent->l2_cache + (min_index * extent->l2_size);
+    if (bdrv_pread(extent->file,
+                (int64_t)l2_offset * 512,
+                l2_table,
+                extent->l2_size * sizeof(uint32_t)
+            ) != extent->l2_size * sizeof(uint32_t)) {
+        return VMDK_ERROR;
+    }
+
+    extent->l2_cache_offsets[min_index] = l2_offset;
+    extent->l2_cache_counts[min_index] = 1;
+found:
+    *new_l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
+    *new_l2_table = l2_table;
+
+    return VMDK_OK;
+}
+
+/*
+ * get_cluster_table
+ *
+ * for a given offset, load (and allocate if needed) the l2 table.
+ *
+ * Returns:
+ *   VMDK_OK:        on success
+ *
+ *   VMDK_UNALLOC:   if cluster is not mapped
+ *
+ *   VMDK_ERROR:     in error cases
+ */
+static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
+                             int *new_l1_index, int *new_l2_offset,
+                             int *new_l2_index, uint32_t **new_l2_table)
+{
+    int l1_index, l2_offset, l2_index;
+    uint32_t *l2_table;
+    int ret;
+
+    offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
+    l1_index = (offset >> 9) / extent->l1_entry_sectors;
+    if (l1_index >= extent->l1_size) {
+        return VMDK_ERROR;
+    }
+    l2_offset = extent->l1_table[l1_index];
+    if (!l2_offset) {
+        return VMDK_UNALLOC;
+    }
+
+    ret = vmdk_l2load(extent, offset, l2_offset, &l2_table, &l2_index);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *new_l1_index = l1_index;
+    *new_l2_offset = l2_offset;
+    *new_l2_index = l2_index;
+    *new_l2_table = l2_table;
+
+    return VMDK_OK;
+}
+
 /*
  * vmdk_perform_cow
  *
@@ -1115,29 +1251,168 @@ exit:
     return ret;
 }
 
-static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
-                         uint32_t offset)
+/*
+ * handle_alloc
+ *
+ * Allocates new clusters for an area that either is yet unallocated or needs a
+ * copy on write. If *cluster_offset is non_zero, clusters are only allocated if
+ * the new allocation can match the specified host offset.
+ *
+ * Returns:
+ *   VMDK_OK:       if new clusters were allocated, *bytes may be decreased if
+ *                  the new allocation doesn't cover all of the requested area.
+ *                  *cluster_offset is updated to contain the offset of the
+ *                  first newly allocated cluster.
+ *
+ *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
+ *                  unchanged.
+ *
+ *   VMDK_ERROR:    in error cases
+ */
+static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
+                        uint64_t offset, uint64_t *cluster_offset,
+                        int64_t *bytes, VmdkMetaData *m_data,
+                        bool allocate, uint32_t *total_alloc_clusters)
 {
-    offset = cpu_to_le32(offset);
-    /* update L2 table */
-    if (bdrv_pwrite_sync(extent->file,
-                ((int64_t)m_data->l2_offset * 512)
-                    + (m_data->l2_index * sizeof(offset)),
-                &offset, sizeof(offset)) < 0) {
-        return VMDK_ERROR;
+    int l1_index, l2_offset, l2_index;
+    uint32_t *l2_table;
+    uint32_t cluster_sector;
+    uint32_t nb_clusters;
+    bool zeroed = false;
+    uint64_t skip_start_bytes, skip_end_bytes;
+    int ret;
+
+    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
+                            &l2_index, &l2_table);
+    if (ret < 0) {
+        return ret;
     }
-    /* update backup L2 table */
-    if (extent->l1_backup_table_offset != 0) {
-        m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
-        if (bdrv_pwrite_sync(extent->file,
-                    ((int64_t)m_data->l2_offset * 512)
-                        + (m_data->l2_index * sizeof(offset)),
-                    &offset, sizeof(offset)) < 0) {
-            return VMDK_ERROR;
+
+    cluster_sector = le32_to_cpu(l2_table[l2_index]);
+
+    skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
+    /* Calculate the number of clusters to look for. Here it will return one
+     * cluster less than the actual value calculated as we may need to perfrom
+     * COW for the last one. */
+    nb_clusters = size_to_clusters(extent, skip_start_bytes + *bytes);
+
+    nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
+    assert(nb_clusters <= INT_MAX);
+
+    /* update bytes according to final nb_clusters value */
+    if (nb_clusters != 0) {
+        *bytes = ((nb_clusters * extent->cluster_sectors) << 9)
+                                            - skip_start_bytes;
+    } else {
+        nb_clusters = 1;
+    }
+    *total_alloc_clusters += nb_clusters;
+    skip_end_bytes = skip_start_bytes + MIN(*bytes,
+                     extent->cluster_sectors * BDRV_SECTOR_SIZE
+                                    - skip_start_bytes);
+
+    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
+        zeroed = true;
+    }
+
+    if (!cluster_sector || zeroed) {
+        if (!allocate) {
+            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
+        }
+
+        cluster_sector = extent->next_cluster_sector;
+        extent->next_cluster_sector += extent->cluster_sectors
+                                                * nb_clusters;
+
+        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
+                               offset, skip_start_bytes,
+                               skip_end_bytes);
+        if (ret < 0) {
+            return ret;
+        }
+        if (m_data) {
+            m_data->valid = 1;
+            m_data->l1_index = l1_index;
+            m_data->l2_index = l2_index;
+            m_data->l2_offset = l2_offset;
+            m_data->l2_cache_entry = &l2_table[l2_index];
+            m_data->nb_clusters = nb_clusters;
         }
     }
-    if (m_data->l2_cache_entry) {
-        *m_data->l2_cache_entry = offset;
+    *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
+    return VMDK_OK;
+}
+
+/*
+ * vmdk_alloc_cluster_offset
+ *
+ * For a given offset on the virtual disk, find the cluster offset in vmdk
+ * file. If the offset is not found, allocate a new cluster.
+ *
+ * If the cluster is newly allocated, m_data->nb_clusters is set to the number
+ * of contiguous clusters that have been allocated. In this case, the other
+ * fields of m_data are valid and contain information about the first allocated
+ * cluster.
+ *
+ * Returns:
+ *
+ *   VMDK_OK:           on success and @cluster_offset was set
+ *
+ *   VMDK_UNALLOC:      if no clusters were allocated and @cluster_offset is
+ *                      set to zero
+ *
+ *   VMDK_ERROR:        in error cases
+ */
+static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
+                                     VmdkExtent *extent,
+                                     VmdkMetaData *m_data, uint64_t offset,
+                                     bool allocate, uint64_t *cluster_offset,
+                                     int64_t bytes,
+                                     uint32_t *total_alloc_clusters)
+{
+    uint64_t start, remaining;
+    uint64_t new_cluster_offset;
+    int64_t n_bytes;
+    int ret;
+
+    if (extent->flat) {
+        *cluster_offset = extent->flat_start_offset;
+        return VMDK_OK;
+    }
+
+    start = offset;
+    remaining = bytes;
+    new_cluster_offset = 0;
+    *cluster_offset = 0;
+    n_bytes = 0;
+    if (m_data) {
+        m_data->valid = 0;
+    }
+
+    /* due to L2 table margins all bytes may not get allocated at once */
+    while (true) {
+
+        if (!*cluster_offset) {
+            *cluster_offset = new_cluster_offset;
+        }
+
+        start              += n_bytes;
+        remaining          -= n_bytes;
+        new_cluster_offset += n_bytes;
+
+        if (remaining == 0) {
+            break;
+        }
+
+        n_bytes = remaining;
+
+        ret = handle_alloc(bs, extent, start, &new_cluster_offset, &n_bytes,
+                           m_data, allocate, total_alloc_clusters);
+
+        if (ret < 0) {
+            return ret;
+
+        }
     }
 
     return VMDK_OK;
@@ -1567,6 +1842,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     uint64_t cluster_offset;
     uint64_t bytes_done = 0;
     VmdkMetaData m_data;
+    uint32_t total_alloc_clusters = 0;
 
     if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
         error_report("Wrong offset: offset=0x%" PRIx64
@@ -1584,10 +1860,10 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                              - offset_in_cluster);
 
-        ret = get_cluster_offset(bs, extent, &m_data, offset,
-                                 !(extent->compressed || zeroed),
-                                 &cluster_offset, offset_in_cluster,
-                                 offset_in_cluster + n_bytes);
+        ret = vmdk_alloc_cluster_offset(bs, extent, &m_data, offset,
+                                        !(extent->compressed || zeroed),
+                                        &cluster_offset, n_bytes,
+                                        &total_alloc_clusters);
         if (extent->compressed) {
             if (ret == VMDK_OK) {
                 /* Refuse write to allocated cluster for streamOptimized */
@@ -1596,19 +1872,22 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
                 return -EIO;
             } else {
                 /* allocate */
-                ret = get_cluster_offset(bs, extent, &m_data, offset,
-                                         true, &cluster_offset, 0, 0);
+                ret = vmdk_alloc_cluster_offset(bs, extent, &m_data, offset,
+                                        true, &cluster_offset, n_bytes,
+                                        &total_alloc_clusters);
             }
         }
         if (ret == VMDK_ERROR) {
             return -EINVAL;
         }
+
         if (zeroed) {
             /* Do zeroed write, buf is ignored */
-            if (extent->has_zero_grain &&
-                    offset_in_cluster == 0 &&
-                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE) {
-                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE;
+            if (extent->has_zero_grain && offset_in_cluster == 0 &&
+                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE *
+                        total_alloc_clusters) {
+                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE *
+                                        total_alloc_clusters;
                 if (!zero_dry_run) {
                     /* update L2 tables */
                     if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
@ 2017-04-19 12:56   ` Fam Zheng
  2017-04-19 15:13     ` Ashijeet Acharya
  0 siblings, 1 reply; 21+ messages in thread
From: Fam Zheng @ 2017-04-19 12:56 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> Move the cluster tables loading code out of the existing
> get_cluster_offset() function to avoid code duplication and implement it
> in separate get_cluster_table() and vmdk_L2load() functions.
> 
> Introduce two new helper functions handle_alloc() and
> vmdk_alloc_cluster_offset(). handle_alloc() helps to allocate multiple
> clusters at once starting from a given offset on disk and performs COW
> if necessary for first and last allocated clusters.
> vmdk_alloc_cluster_offset() helps to return the offset of the first of
> the many newly allocated clusters. Also, provide proper documentation
> for both.
> 
> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> ---
>  block/vmdk.c | 337 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 308 insertions(+), 29 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 73ae786..e5a289d 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -136,6 +136,7 @@ typedef struct VmdkMetaData {
>      unsigned int l2_offset;
>      int valid;
>      uint32_t *l2_cache_entry;
> +    uint32_t nb_clusters;
>  } VmdkMetaData;
>  
>  typedef struct VmdkGrainMarker {
> @@ -254,6 +255,14 @@ static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
>      return extent_relative_offset % cluster_size;
>  }
>  
> +static inline uint64_t size_to_clusters(VmdkExtent *extent, uint64_t size)
> +{
> +    uint64_t cluster_size, round_off_size;
> +    cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
> +    round_off_size = cluster_size - (size % cluster_size);
> +    return DIV_ROUND_UP(size + round_off_size, BDRV_SECTOR_SIZE * 128) - 1;

What is (BDRV_SECTOR_SIZE * 128)? Do you mean extent->cluster_size?  And the
function doesn't make sense up to me.

Just un-inline this to

    DIV_ROUND_UP(size,
                 extent->cluster_sectors << BDRV_SECTOR_BITS) - 1

in the calling site and be done with it.

> +}
> +
>  static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
>  {
>      char *desc;
> @@ -1028,6 +1037,133 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
>      }
>  }
>  
> +static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> +                         uint32_t offset)
> +{
> +    offset = cpu_to_le32(offset);
> +    /* update L2 table */
> +    if (bdrv_pwrite_sync(extent->file,
> +                ((int64_t)m_data->l2_offset * 512)
> +                    + (m_data->l2_index * sizeof(offset)),
> +                &offset, sizeof(offset)) < 0) {
> +        return VMDK_ERROR;
> +    }
> +    /* update backup L2 table */
> +    if (extent->l1_backup_table_offset != 0) {
> +        m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
> +        if (bdrv_pwrite_sync(extent->file,
> +                    ((int64_t)m_data->l2_offset * 512)
> +                        + (m_data->l2_index * sizeof(offset)),
> +                    &offset, sizeof(offset)) < 0) {
> +            return VMDK_ERROR;
> +        }
> +    }
> +    if (m_data->l2_cache_entry) {
> +        *m_data->l2_cache_entry = offset;
> +    }
> +
> +    return VMDK_OK;
> +}
> +
> +/*
> + * vmdk_l2load
> + *
> + * Loads a new L2 table into memory. If the table is in the cache, the cache

Not a native speaker, but s/Loads/Load/ feels more nature and consistent with
other comments.

> + * is used; otherwise the L2 table is loaded from the image file.
> + *
> + * Returns:
> + *   VMDK_OK:       on success
> + *   VMDK_ERROR:    in error cases
> + */
> +static int vmdk_l2load(VmdkExtent *extent, uint64_t offset, int l2_offset,
> +                       uint32_t **new_l2_table, int *new_l2_index)
> +{
> +    int min_index, i, j;
> +    uint32_t *l2_table;
> +    uint32_t min_count;
> +
> +    for (i = 0; i < L2_CACHE_SIZE; i++) {
> +        if (l2_offset == extent->l2_cache_offsets[i]) {
> +            /* increment the hit count */
> +            if (++extent->l2_cache_counts[i] == UINT32_MAX) {
> +                for (j = 0; j < L2_CACHE_SIZE; j++) {
> +                    extent->l2_cache_counts[j] >>= 1;
> +                }
> +            }
> +            l2_table = extent->l2_cache + (i * extent->l2_size);
> +            goto found;
> +        }
> +    }
> +    /* not found: load a new entry in the least used one */
> +    min_index = 0;
> +    min_count = UINT32_MAX;
> +    for (i = 0; i < L2_CACHE_SIZE; i++) {
> +        if (extent->l2_cache_counts[i] < min_count) {
> +            min_count = extent->l2_cache_counts[i];
> +            min_index = i;
> +        }
> +    }
> +    l2_table = extent->l2_cache + (min_index * extent->l2_size);
> +    if (bdrv_pread(extent->file,
> +                (int64_t)l2_offset * 512,
> +                l2_table,
> +                extent->l2_size * sizeof(uint32_t)
> +            ) != extent->l2_size * sizeof(uint32_t)) {
> +        return VMDK_ERROR;
> +    }
> +
> +    extent->l2_cache_offsets[min_index] = l2_offset;
> +    extent->l2_cache_counts[min_index] = 1;
> +found:
> +    *new_l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
> +    *new_l2_table = l2_table;
> +
> +    return VMDK_OK;
> +}
> +
> +/*
> + * get_cluster_table
> + *
> + * for a given offset, load (and allocate if needed) the l2 table.
> + *
> + * Returns:
> + *   VMDK_OK:        on success
> + *
> + *   VMDK_UNALLOC:   if cluster is not mapped
> + *
> + *   VMDK_ERROR:     in error cases
> + */
> +static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
> +                             int *new_l1_index, int *new_l2_offset,
> +                             int *new_l2_index, uint32_t **new_l2_table)
> +{
> +    int l1_index, l2_offset, l2_index;
> +    uint32_t *l2_table;
> +    int ret;
> +
> +    offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
> +    l1_index = (offset >> 9) / extent->l1_entry_sectors;
> +    if (l1_index >= extent->l1_size) {
> +        return VMDK_ERROR;
> +    }
> +    l2_offset = extent->l1_table[l1_index];
> +    if (!l2_offset) {
> +        return VMDK_UNALLOC;
> +    }
> +
> +    ret = vmdk_l2load(extent, offset, l2_offset, &l2_table, &l2_index);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    *new_l1_index = l1_index;
> +    *new_l2_offset = l2_offset;
> +    *new_l2_index = l2_index;
> +    *new_l2_table = l2_table;
> +
> +    return VMDK_OK;
> +}
> +

Can you move this hunk into patch 4 and put it before this patch? It will make
reviewing a bit easier. (Yes, this patch is already big.)

>  /*
>   * vmdk_perform_cow
>   *
> @@ -1115,29 +1251,168 @@ exit:
>      return ret;
>  }
>  
> -static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> -                         uint32_t offset)
> +/*
> + * handle_alloc
> + *
> + * Allocates new clusters for an area that either is yet unallocated or needs a

Similar to vmdk_l2load, s/Allocates/Allocate/

> + * copy on write. If *cluster_offset is non_zero, clusters are only allocated if
> + * the new allocation can match the specified host offset.
> + *
> + * Returns:
> + *   VMDK_OK:       if new clusters were allocated, *bytes may be decreased if
> + *                  the new allocation doesn't cover all of the requested area.
> + *                  *cluster_offset is updated to contain the offset of the
> + *                  first newly allocated cluster.
> + *
> + *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
> + *                  unchanged.
> + *
> + *   VMDK_ERROR:    in error cases
> + */
> +static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
> +                        uint64_t offset, uint64_t *cluster_offset,
> +                        int64_t *bytes, VmdkMetaData *m_data,
> +                        bool allocate, uint32_t *total_alloc_clusters)
>  {
> -    offset = cpu_to_le32(offset);
> -    /* update L2 table */
> -    if (bdrv_pwrite_sync(extent->file,
> -                ((int64_t)m_data->l2_offset * 512)
> -                    + (m_data->l2_index * sizeof(offset)),
> -                &offset, sizeof(offset)) < 0) {
> -        return VMDK_ERROR;
> +    int l1_index, l2_offset, l2_index;
> +    uint32_t *l2_table;
> +    uint32_t cluster_sector;
> +    uint32_t nb_clusters;
> +    bool zeroed = false;
> +    uint64_t skip_start_bytes, skip_end_bytes;
> +    int ret;
> +
> +    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
> +                            &l2_index, &l2_table);
> +    if (ret < 0) {
> +        return ret;
>      }
> -    /* update backup L2 table */
> -    if (extent->l1_backup_table_offset != 0) {
> -        m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
> -        if (bdrv_pwrite_sync(extent->file,
> -                    ((int64_t)m_data->l2_offset * 512)
> -                        + (m_data->l2_index * sizeof(offset)),
> -                    &offset, sizeof(offset)) < 0) {
> -            return VMDK_ERROR;
> +
> +    cluster_sector = le32_to_cpu(l2_table[l2_index]);
> +
> +    skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
> +    /* Calculate the number of clusters to look for. Here it will return one
> +     * cluster less than the actual value calculated as we may need to perfrom
> +     * COW for the last one. */
> +    nb_clusters = size_to_clusters(extent, skip_start_bytes + *bytes);
> +
> +    nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
> +    assert(nb_clusters <= INT_MAX);
> +
> +    /* update bytes according to final nb_clusters value */
> +    if (nb_clusters != 0) {
> +        *bytes = ((nb_clusters * extent->cluster_sectors) << 9)
> +                                            - skip_start_bytes;
> +    } else {
> +        nb_clusters = 1;
> +    }
> +    *total_alloc_clusters += nb_clusters;
> +    skip_end_bytes = skip_start_bytes + MIN(*bytes,
> +                     extent->cluster_sectors * BDRV_SECTOR_SIZE
> +                                    - skip_start_bytes);
> +
> +    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
> +        zeroed = true;
> +    }
> +
> +    if (!cluster_sector || zeroed) {
> +        if (!allocate) {
> +            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
> +        }
> +
> +        cluster_sector = extent->next_cluster_sector;
> +        extent->next_cluster_sector += extent->cluster_sectors
> +                                                * nb_clusters;
> +
> +        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
> +                               offset, skip_start_bytes,
> +                               skip_end_bytes);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        if (m_data) {
> +            m_data->valid = 1;
> +            m_data->l1_index = l1_index;
> +            m_data->l2_index = l2_index;
> +            m_data->l2_offset = l2_offset;
> +            m_data->l2_cache_entry = &l2_table[l2_index];
> +            m_data->nb_clusters = nb_clusters;
>          }
>      }
> -    if (m_data->l2_cache_entry) {
> -        *m_data->l2_cache_entry = offset;
> +    *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
> +    return VMDK_OK;
> +}
> +
> +/*
> + * vmdk_alloc_cluster_offset

Maybe just name it "vmdk_alloc_clusters", which sounds better to me? Because the
clusters are what we allocate here, it's rather
"vmdk_alloc_clusters_and_get_offset" but we probably don't want it that long.

> + *
> + * For a given offset on the virtual disk, find the cluster offset in vmdk
> + * file. If the offset is not found, allocate a new cluster.
> + *
> + * If the cluster is newly allocated, m_data->nb_clusters is set to the number
> + * of contiguous clusters that have been allocated. In this case, the other
> + * fields of m_data are valid and contain information about the first allocated
> + * cluster.
> + *
> + * Returns:
> + *
> + *   VMDK_OK:           on success and @cluster_offset was set
> + *
> + *   VMDK_UNALLOC:      if no clusters were allocated and @cluster_offset is
> + *                      set to zero
> + *
> + *   VMDK_ERROR:        in error cases

Thank you for adding the function documentations!

> + */
> +static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
> +                                     VmdkExtent *extent,
> +                                     VmdkMetaData *m_data, uint64_t offset,
> +                                     bool allocate, uint64_t *cluster_offset,
> +                                     int64_t bytes,
> +                                     uint32_t *total_alloc_clusters)
> +{
> +    uint64_t start, remaining;
> +    uint64_t new_cluster_offset;
> +    int64_t n_bytes;
> +    int ret;
> +
> +    if (extent->flat) {
> +        *cluster_offset = extent->flat_start_offset;
> +        return VMDK_OK;
> +    }
> +
> +    start = offset;
> +    remaining = bytes;
> +    new_cluster_offset = 0;
> +    *cluster_offset = 0;
> +    n_bytes = 0;
> +    if (m_data) {
> +        m_data->valid = 0;
> +    }
> +
> +    /* due to L2 table margins all bytes may not get allocated at once */
> +    while (true) {
> +
> +        if (!*cluster_offset) {
> +            *cluster_offset = new_cluster_offset;
> +        }
> +
> +        start              += n_bytes;
> +        remaining          -= n_bytes;

Here, in the first iteration, remaining == bytes and n_bytes == 0.

> +        new_cluster_offset += n_bytes;
> +
> +        if (remaining == 0) {
> +            break;
> +        }
> +
> +        n_bytes = remaining;

Then n_bytes becomes bytes;

In the second iteration, remaining is always 0 because of "remaining -=
n_bytes". What's the point of the while loop?

> +
> +        ret = handle_alloc(bs, extent, start, &new_cluster_offset, &n_bytes,
> +                           m_data, allocate, total_alloc_clusters);
> +
> +        if (ret < 0) {
> +            return ret;
> +
> +        }
>      }
>  
>      return VMDK_OK;
> @@ -1567,6 +1842,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>      uint64_t cluster_offset;
>      uint64_t bytes_done = 0;
>      VmdkMetaData m_data;
> +    uint32_t total_alloc_clusters = 0;
>  
>      if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
>          error_report("Wrong offset: offset=0x%" PRIx64
> @@ -1584,10 +1860,10 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>          n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
>                               - offset_in_cluster);
>  
> -        ret = get_cluster_offset(bs, extent, &m_data, offset,
> -                                 !(extent->compressed || zeroed),
> -                                 &cluster_offset, offset_in_cluster,
> -                                 offset_in_cluster + n_bytes);
> +        ret = vmdk_alloc_cluster_offset(bs, extent, &m_data, offset,
> +                                        !(extent->compressed || zeroed),
> +                                        &cluster_offset, n_bytes,
> +                                        &total_alloc_clusters);
>          if (extent->compressed) {
>              if (ret == VMDK_OK) {
>                  /* Refuse write to allocated cluster for streamOptimized */
> @@ -1596,19 +1872,22 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>                  return -EIO;
>              } else {
>                  /* allocate */
> -                ret = get_cluster_offset(bs, extent, &m_data, offset,
> -                                         true, &cluster_offset, 0, 0);
> +                ret = vmdk_alloc_cluster_offset(bs, extent, &m_data, offset,
> +                                        true, &cluster_offset, n_bytes,
> +                                        &total_alloc_clusters);

Parameter list is no longer aligned now.

>              }
>          }
>          if (ret == VMDK_ERROR) {
>              return -EINVAL;
>          }
> +
>          if (zeroed) {
>              /* Do zeroed write, buf is ignored */
> -            if (extent->has_zero_grain &&
> -                    offset_in_cluster == 0 &&
> -                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE) {
> -                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE;
> +            if (extent->has_zero_grain && offset_in_cluster == 0 &&
> +                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE *
> +                        total_alloc_clusters) {
> +                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE *
> +                                        total_alloc_clusters;
>                  if (!zero_dry_run) {
>                      /* update L2 tables */
>                      if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
> -- 
> 2.6.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters
  2017-04-19 12:56   ` Fam Zheng
@ 2017-04-19 15:13     ` Ashijeet Acharya
  2017-04-20  0:47       ` Fam Zheng
  0 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-19 15:13 UTC (permalink / raw)
  To: Fam Zheng; +Cc: jsnow, kwolf, mreitz, qemu-block, qemu-devel, stefanha

On Wed, Apr 19, 2017 at 18:26 Fam Zheng <famz@redhat.com> wrote:

> On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> > Move the cluster tables loading code out of the existing
> > get_cluster_offset() function to avoid code duplication and implement it
> > in separate get_cluster_table() and vmdk_L2load() functions.
> >
> > Introduce two new helper functions handle_alloc() and
> > vmdk_alloc_cluster_offset(). handle_alloc() helps to allocate multiple
> > clusters at once starting from a given offset on disk and performs COW
> > if necessary for first and last allocated clusters.
> > vmdk_alloc_cluster_offset() helps to return the offset of the first of
> > the many newly allocated clusters. Also, provide proper documentation
> > for both.
> >
> > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> > ---
> >  block/vmdk.c | 337
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 308 insertions(+), 29 deletions(-)
> >
> > diff --git a/block/vmdk.c b/block/vmdk.c
> > index 73ae786..e5a289d 100644
> > --- a/block/vmdk.c
> > +++ b/block/vmdk.c
> > @@ -136,6 +136,7 @@ typedef struct VmdkMetaData {
> >      unsigned int l2_offset;
> >      int valid;
> >      uint32_t *l2_cache_entry;
> > +    uint32_t nb_clusters;
> >  } VmdkMetaData;
> >
> >  typedef struct VmdkGrainMarker {
> > @@ -254,6 +255,14 @@ static inline uint64_t
> vmdk_find_offset_in_cluster(VmdkExtent *extent,
> >      return extent_relative_offset % cluster_size;
> >  }
> >
> > +static inline uint64_t size_to_clusters(VmdkExtent *extent, uint64_t
> size)
> > +{
> > +    uint64_t cluster_size, round_off_size;
> > +    cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
> > +    round_off_size = cluster_size - (size % cluster_size);
> > +    return DIV_ROUND_UP(size + round_off_size, BDRV_SECTOR_SIZE * 128)
> - 1;
>
> What is (BDRV_SECTOR_SIZE * 128)? Do you mean extent->cluster_size?  And
> the
> function doesn't make sense up to me.
>
> Just un-inline this to
>
>     DIV_ROUND_UP(size,
>                  extent->cluster_sectors << BDRV_SECTOR_BITS) - 1
>
> in the calling site and be done with it.
>
> > +}
> > +
> >  static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
> >  {
> >      char *desc;
> > @@ -1028,6 +1037,133 @@ static void vmdk_refresh_limits(BlockDriverState
> *bs, Error **errp)
> >      }
> >  }
> >
> > +static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> > +                         uint32_t offset)
> > +{
> > +    offset = cpu_to_le32(offset);
> > +    /* update L2 table */
> > +    if (bdrv_pwrite_sync(extent->file,
> > +                ((int64_t)m_data->l2_offset * 512)
> > +                    + (m_data->l2_index * sizeof(offset)),
> > +                &offset, sizeof(offset)) < 0) {
> > +        return VMDK_ERROR;
> > +    }
> > +    /* update backup L2 table */
> > +    if (extent->l1_backup_table_offset != 0) {
> > +        m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
> > +        if (bdrv_pwrite_sync(extent->file,
> > +                    ((int64_t)m_data->l2_offset * 512)
> > +                        + (m_data->l2_index * sizeof(offset)),
> > +                    &offset, sizeof(offset)) < 0) {
> > +            return VMDK_ERROR;
> > +        }
> > +    }
> > +    if (m_data->l2_cache_entry) {
> > +        *m_data->l2_cache_entry = offset;
> > +    }
> > +
> > +    return VMDK_OK;
> > +}
> > +
> > +/*
> > + * vmdk_l2load
> > + *
> > + * Loads a new L2 table into memory. If the table is in the cache, the
> cache
>
> Not a native speaker, but s/Loads/Load/ feels more nature and consistent
> with
> other comments.
>
> > + * is used; otherwise the L2 table is loaded from the image file.
> > + *
> > + * Returns:
> > + *   VMDK_OK:       on success
> > + *   VMDK_ERROR:    in error cases
> > + */
> > +static int vmdk_l2load(VmdkExtent *extent, uint64_t offset, int
> l2_offset,
> > +                       uint32_t **new_l2_table, int *new_l2_index)
> > +{
> > +    int min_index, i, j;
> > +    uint32_t *l2_table;
> > +    uint32_t min_count;
> > +
> > +    for (i = 0; i < L2_CACHE_SIZE; i++) {
> > +        if (l2_offset == extent->l2_cache_offsets[i]) {
> > +            /* increment the hit count */
> > +            if (++extent->l2_cache_counts[i] == UINT32_MAX) {
> > +                for (j = 0; j < L2_CACHE_SIZE; j++) {
> > +                    extent->l2_cache_counts[j] >>= 1;
> > +                }
> > +            }
> > +            l2_table = extent->l2_cache + (i * extent->l2_size);
> > +            goto found;
> > +        }
> > +    }
> > +    /* not found: load a new entry in the least used one */
> > +    min_index = 0;
> > +    min_count = UINT32_MAX;
> > +    for (i = 0; i < L2_CACHE_SIZE; i++) {
> > +        if (extent->l2_cache_counts[i] < min_count) {
> > +            min_count = extent->l2_cache_counts[i];
> > +            min_index = i;
> > +        }
> > +    }
> > +    l2_table = extent->l2_cache + (min_index * extent->l2_size);
> > +    if (bdrv_pread(extent->file,
> > +                (int64_t)l2_offset * 512,
> > +                l2_table,
> > +                extent->l2_size * sizeof(uint32_t)
> > +            ) != extent->l2_size * sizeof(uint32_t)) {
> > +        return VMDK_ERROR;
> > +    }
> > +
> > +    extent->l2_cache_offsets[min_index] = l2_offset;
> > +    extent->l2_cache_counts[min_index] = 1;
> > +found:
> > +    *new_l2_index = ((offset >> 9) / extent->cluster_sectors) %
> extent->l2_size;
> > +    *new_l2_table = l2_table;
> > +
> > +    return VMDK_OK;
> > +}
> > +
> > +/*
> > + * get_cluster_table
> > + *
> > + * for a given offset, load (and allocate if needed) the l2 table.
> > + *
> > + * Returns:
> > + *   VMDK_OK:        on success
> > + *
> > + *   VMDK_UNALLOC:   if cluster is not mapped
> > + *
> > + *   VMDK_ERROR:     in error cases
> > + */
> > +static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
> > +                             int *new_l1_index, int *new_l2_offset,
> > +                             int *new_l2_index, uint32_t **new_l2_table)
> > +{
> > +    int l1_index, l2_offset, l2_index;
> > +    uint32_t *l2_table;
> > +    int ret;
> > +
> > +    offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
> > +    l1_index = (offset >> 9) / extent->l1_entry_sectors;
> > +    if (l1_index >= extent->l1_size) {
> > +        return VMDK_ERROR;
> > +    }
> > +    l2_offset = extent->l1_table[l1_index];
> > +    if (!l2_offset) {
> > +        return VMDK_UNALLOC;
> > +    }
> > +
> > +    ret = vmdk_l2load(extent, offset, l2_offset, &l2_table, &l2_index);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +
> > +    *new_l1_index = l1_index;
> > +    *new_l2_offset = l2_offset;
> > +    *new_l2_index = l2_index;
> > +    *new_l2_table = l2_table;
> > +
> > +    return VMDK_OK;
> > +}
> > +
>
> Can you move this hunk into patch 4 and put it before this patch? It will
> make
> reviewing a bit easier. (Yes, this patch is already big.)
>

Right, I will change it to as you say. I know its big and I didn't like it
either :(


> >  /*
> >   * vmdk_perform_cow
> >   *
> > @@ -1115,29 +1251,168 @@ exit:
> >      return ret;
> >  }
> >
> > -static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> > -                         uint32_t offset)
> > +/*
> > + * handle_alloc
> > + *
> > + * Allocates new clusters for an area that either is yet unallocated or
> needs a
>
> Similar to vmdk_l2load, s/Allocates/Allocate/
>
> > + * copy on write. If *cluster_offset is non_zero, clusters are only
> allocated if
> > + * the new allocation can match the specified host offset.
> > + *
> > + * Returns:
> > + *   VMDK_OK:       if new clusters were allocated, *bytes may be
> decreased if
> > + *                  the new allocation doesn't cover all of the
> requested area.
> > + *                  *cluster_offset is updated to contain the offset of
> the
> > + *                  first newly allocated cluster.
> > + *
> > + *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset
> is left
> > + *                  unchanged.
> > + *
> > + *   VMDK_ERROR:    in error cases
> > + */
> > +static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
> > +                        uint64_t offset, uint64_t *cluster_offset,
> > +                        int64_t *bytes, VmdkMetaData *m_data,
> > +                        bool allocate, uint32_t *total_alloc_clusters)
> >  {
> > -    offset = cpu_to_le32(offset);
> > -    /* update L2 table */
> > -    if (bdrv_pwrite_sync(extent->file,
> > -                ((int64_t)m_data->l2_offset * 512)
> > -                    + (m_data->l2_index * sizeof(offset)),
> > -                &offset, sizeof(offset)) < 0) {
> > -        return VMDK_ERROR;
> > +    int l1_index, l2_offset, l2_index;
> > +    uint32_t *l2_table;
> > +    uint32_t cluster_sector;
> > +    uint32_t nb_clusters;
> > +    bool zeroed = false;
> > +    uint64_t skip_start_bytes, skip_end_bytes;
> > +    int ret;
> > +
> > +    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
> > +                            &l2_index, &l2_table);
> > +    if (ret < 0) {
> > +        return ret;
> >      }
> > -    /* update backup L2 table */
> > -    if (extent->l1_backup_table_offset != 0) {
> > -        m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
> > -        if (bdrv_pwrite_sync(extent->file,
> > -                    ((int64_t)m_data->l2_offset * 512)
> > -                        + (m_data->l2_index * sizeof(offset)),
> > -                    &offset, sizeof(offset)) < 0) {
> > -            return VMDK_ERROR;
> > +
> > +    cluster_sector = le32_to_cpu(l2_table[l2_index]);
> > +
> > +    skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
> > +    /* Calculate the number of clusters to look for. Here it will
> return one
> > +     * cluster less than the actual value calculated as we may need to
> perfrom
> > +     * COW for the last one. */
> > +    nb_clusters = size_to_clusters(extent, skip_start_bytes + *bytes);
> > +
> > +    nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
> > +    assert(nb_clusters <= INT_MAX);
> > +
> > +    /* update bytes according to final nb_clusters value */
> > +    if (nb_clusters != 0) {
> > +        *bytes = ((nb_clusters * extent->cluster_sectors) << 9)
> > +                                            - skip_start_bytes;


[continuation of why the while loop?]....here. So the bytes may get reduced
if nb_clusters were more than 512 (l2 table margin) . Thus @remaining down
there won't necessarily be zero after first pass. I hope I explained it
correctly!

>
> > +    } else {
> > +        nb_clusters = 1;
> > +    }
> > +    *total_alloc_clusters += nb_clusters;
> > +    skip_end_bytes = skip_start_bytes + MIN(*bytes,
> > +                     extent->cluster_sectors * BDRV_SECTOR_SIZE
> > +                                    - skip_start_bytes);
> > +
> > +    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
> > +        zeroed = true;
> > +    }
> > +
> > +    if (!cluster_sector || zeroed) {
> > +        if (!allocate) {
> > +            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
> > +        }
> > +
> > +        cluster_sector = extent->next_cluster_sector;
> > +        extent->next_cluster_sector += extent->cluster_sectors
> > +                                                * nb_clusters;
> > +
> > +        ret = vmdk_perform_cow(bs, extent, cluster_sector *
> BDRV_SECTOR_SIZE,
> > +                               offset, skip_start_bytes,
> > +                               skip_end_bytes);
> > +        if (ret < 0) {
> > +            return ret;
> > +        }
> > +        if (m_data) {
> > +            m_data->valid = 1;
> > +            m_data->l1_index = l1_index;
> > +            m_data->l2_index = l2_index;
> > +            m_data->l2_offset = l2_offset;
> > +            m_data->l2_cache_entry = &l2_table[l2_index];
> > +            m_data->nb_clusters = nb_clusters;
> >          }
> >      }
> > -    if (m_data->l2_cache_entry) {
> > -        *m_data->l2_cache_entry = offset;
> > +    *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
> > +    return VMDK_OK;
> > +}
> > +
> > +/*
> > + * vmdk_alloc_cluster_offset
>
> Maybe just name it "vmdk_alloc_clusters", which sounds better to me?
> Because the
> clusters are what we allocate here, it's rather
> "vmdk_alloc_clusters_and_get_offset" but we probably don't want it that
> long.
>
> > + *
> > + * For a given offset on the virtual disk, find the cluster offset in
> vmdk
> > + * file. If the offset is not found, allocate a new cluster.
> > + *
> > + * If the cluster is newly allocated, m_data->nb_clusters is set to the
> number
> > + * of contiguous clusters that have been allocated. In this case, the
> other
> > + * fields of m_data are valid and contain information about the first
> allocated
> > + * cluster.
> > + *
> > + * Returns:
> > + *
> > + *   VMDK_OK:           on success and @cluster_offset was set
> > + *
> > + *   VMDK_UNALLOC:      if no clusters were allocated and
> @cluster_offset is
> > + *                      set to zero
> > + *
> > + *   VMDK_ERROR:        in error cases
>
> Thank you for adding the function documentations!
>
> > + */
> > +static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
> > +                                     VmdkExtent *extent,
> > +                                     VmdkMetaData *m_data, uint64_t
> offset,
> > +                                     bool allocate, uint64_t
> *cluster_offset,
> > +                                     int64_t bytes,
> > +                                     uint32_t *total_alloc_clusters)
> > +{
> > +    uint64_t start, remaining;
> > +    uint64_t new_cluster_offset;
> > +    int64_t n_bytes;
> > +    int ret;
> > +
> > +    if (extent->flat) {
> > +        *cluster_offset = extent->flat_start_offset;
> > +        return VMDK_OK;
> > +    }
> > +
> > +    start = offset;
> > +    remaining = bytes;
> > +    new_cluster_offset = 0;
> > +    *cluster_offset = 0;
> > +    n_bytes = 0;
> > +    if (m_data) {
> > +        m_data->valid = 0;
> > +    }
> > +
> > +    /* due to L2 table margins all bytes may not get allocated at once
> */
> > +    while (true) {
> > +
> > +        if (!*cluster_offset) {
> > +            *cluster_offset = new_cluster_offset;
> > +        }
> > +
> > +        start              += n_bytes;
> > +        remaining          -= n_bytes;
>
> Here, in the first iteration, remaining == bytes and n_bytes == 0.
>
> > +        new_cluster_offset += n_bytes;
> > +
> > +        if (remaining == 0) {
> > +            break;
> > +        }
> > +
> > +        n_bytes = remaining;
>
> Then n_bytes becomes bytes;
>
> In the second iteration, remaining is always 0 because of "remaining -=
> n_bytes". What's the point of the while loop?


I need the while loop in case if I truncate the bytes according to the L2
table margins....[scroll up to handle alloc() __^ ]

Ashijeet

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters
  2017-04-19 15:13     ` Ashijeet Acharya
@ 2017-04-20  0:47       ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-20  0:47 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: jsnow, kwolf, mreitz, qemu-block, qemu-devel, stefanha

On Wed, 04/19 15:13, Ashijeet Acharya wrote:
> > In the second iteration, remaining is always 0 because of "remaining -=
> > n_bytes". What's the point of the while loop?
> 
> 
> I need the while loop in case if I truncate the bytes according to the L2
> table margins....[scroll up to handle alloc() __^ ]

Yes, I see it now.

Fam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  2017-04-01 14:44 [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (2 preceding siblings ...)
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
@ 2017-04-01 14:44 ` Ashijeet Acharya
  2017-04-19 12:57   ` Fam Zheng
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle Ashijeet Acharya
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 6/6] vmdk: Update metadata for multiple clusters Ashijeet Acharya
  5 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-01 14:44 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block,
	Ashijeet Acharya

Rename the existing get_cluster_offset() function to
vmdk_get_cluster_offset() and have it make use of the new
get_cluster_table() to load the cluster tables. Also, it is no longer
used to allocate new clusters and hence perform COW. Make the necessary
renames at all the occurrences of get_cluster_offset().

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 117 +++++++++++------------------------------------------------
 1 file changed, 21 insertions(+), 96 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index e5a289d..a8babd7 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1419,7 +1419,7 @@ static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
 }
 
 /**
- * get_cluster_offset
+ * vmdk_get_cluster_offset
  *
  * Look up cluster offset in extent file by sector number, and store in
  * @cluster_offset.
@@ -1427,84 +1427,34 @@ static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
  * For flat extents, the start offset as parsed from the description file is
  * returned.
  *
- * For sparse extents, look up in L1, L2 table. If allocate is true, return an
- * offset for a new cluster and update L2 cache. If there is a backing file,
- * COW is done before returning; otherwise, zeroes are written to the allocated
- * cluster. Both COW and zero writing skips the sector range
- * [@skip_start_sector, @skip_end_sector) passed in by caller, because caller
- * has new data to write there.
+ * For sparse extents, look up in L1, L2 table.
  *
  * Returns: VMDK_OK if cluster exists and mapped in the image.
- *          VMDK_UNALLOC if cluster is not mapped and @allocate is false.
- *          VMDK_ERROR if failed.
+ *          VMDK_UNALLOC if cluster is not mapped.
+ *          VMDK_ERROR if failed
  */
-static int get_cluster_offset(BlockDriverState *bs,
-                              VmdkExtent *extent,
-                              VmdkMetaData *m_data,
-                              uint64_t offset,
-                              bool allocate,
-                              uint64_t *cluster_offset,
-                              uint64_t skip_start_bytes,
-                              uint64_t skip_end_bytes)
+static int vmdk_get_cluster_offset(BlockDriverState *bs,
+                                   VmdkExtent *extent,
+                                   uint64_t offset,
+                                   uint64_t *cluster_offset)
 {
-    unsigned int l1_index, l2_offset, l2_index;
-    int min_index, i, j;
-    uint32_t min_count, *l2_table;
+    int l1_index, l2_offset, l2_index;
+    uint32_t *l2_table;
     bool zeroed = false;
     int64_t ret;
     int64_t cluster_sector;
 
-    if (m_data) {
-        m_data->valid = 0;
-    }
     if (extent->flat) {
         *cluster_offset = extent->flat_start_offset;
         return VMDK_OK;
     }
 
-    offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
-    l1_index = (offset >> 9) / extent->l1_entry_sectors;
-    if (l1_index >= extent->l1_size) {
-        return VMDK_ERROR;
-    }
-    l2_offset = extent->l1_table[l1_index];
-    if (!l2_offset) {
-        return VMDK_UNALLOC;
-    }
-    for (i = 0; i < L2_CACHE_SIZE; i++) {
-        if (l2_offset == extent->l2_cache_offsets[i]) {
-            /* increment the hit count */
-            if (++extent->l2_cache_counts[i] == 0xffffffff) {
-                for (j = 0; j < L2_CACHE_SIZE; j++) {
-                    extent->l2_cache_counts[j] >>= 1;
-                }
-            }
-            l2_table = extent->l2_cache + (i * extent->l2_size);
-            goto found;
-        }
-    }
-    /* not found: load a new entry in the least used one */
-    min_index = 0;
-    min_count = 0xffffffff;
-    for (i = 0; i < L2_CACHE_SIZE; i++) {
-        if (extent->l2_cache_counts[i] < min_count) {
-            min_count = extent->l2_cache_counts[i];
-            min_index = i;
-        }
-    }
-    l2_table = extent->l2_cache + (min_index * extent->l2_size);
-    if (bdrv_pread(extent->file,
-                (int64_t)l2_offset * 512,
-                l2_table,
-                extent->l2_size * sizeof(uint32_t)
-            ) != extent->l2_size * sizeof(uint32_t)) {
-        return VMDK_ERROR;
+    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
+                            &l2_index, &l2_table);
+    if (ret < 0) {
+        return ret;
     }
 
-    extent->l2_cache_offsets[min_index] = l2_offset;
-    extent->l2_cache_counts[min_index] = 1;
- found:
-    l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
     cluster_sector = le32_to_cpu(l2_table[l2_index]);
 
     if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
@@ -1512,31 +1462,9 @@ static int get_cluster_offset(BlockDriverState *bs,
     }
 
     if (!cluster_sector || zeroed) {
-        if (!allocate) {
-            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
-        }
-
-        cluster_sector = extent->next_cluster_sector;
-        extent->next_cluster_sector += extent->cluster_sectors;
-
-        /* First of all we write grain itself, to avoid race condition
-         * that may to corrupt the image.
-         * This problem may occur because of insufficient space on host disk
-         * or inappropriate VM shutdown.
-         */
-        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
-                                offset, skip_start_bytes, skip_end_bytes);
-        if (ret) {
-            return ret;
-        }
-        if (m_data) {
-            m_data->valid = 1;
-            m_data->l1_index = l1_index;
-            m_data->l2_index = l2_index;
-            m_data->l2_offset = l2_offset;
-            m_data->l2_cache_entry = &l2_table[l2_index];
-        }
+        return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
     }
+
     *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
     return VMDK_OK;
 }
@@ -1579,9 +1507,7 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
         return 0;
     }
     qemu_co_mutex_lock(&s->lock);
-    ret = get_cluster_offset(bs, extent, NULL,
-                             sector_num * 512, false, &offset,
-                             0, 0);
+    ret = vmdk_get_cluster_offset(bs, extent, sector_num * 512, &offset);
     qemu_co_mutex_unlock(&s->lock);
 
     index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
@@ -1772,13 +1698,13 @@ vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
             ret = -EIO;
             goto fail;
         }
-        ret = get_cluster_offset(bs, extent, NULL,
-                                 offset, false, &cluster_offset, 0, 0);
         offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 
         n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                              - offset_in_cluster);
 
+        ret = vmdk_get_cluster_offset(bs, extent, offset, &cluster_offset);
+
         if (ret != VMDK_OK) {
             /* if not allocated, try to read from parent image, if exist */
             if (bs->backing && ret != VMDK_ZEROED) {
@@ -2508,9 +2434,8 @@ static int vmdk_check(BlockDriverState *bs, BdrvCheckResult *result,
                     sector_num);
             break;
         }
-        ret = get_cluster_offset(bs, extent, NULL,
-                                 sector_num << BDRV_SECTOR_BITS,
-                                 false, &cluster_offset, 0, 0);
+        ret = vmdk_get_cluster_offset(bs, extent,
+                        sector_num << BDRV_SECTOR_BITS, &cluster_offset);
         if (ret == VMDK_ERROR) {
             fprintf(stderr,
                     "ERROR: could not get cluster_offset for sector %"
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset() Ashijeet Acharya
@ 2017-04-19 12:57   ` Fam Zheng
  2017-04-19 15:21     ` Ashijeet Acharya
  0 siblings, 1 reply; 21+ messages in thread
From: Fam Zheng @ 2017-04-19 12:57 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> Rename the existing get_cluster_offset() function to
> vmdk_get_cluster_offset() and have it make use of the new
> get_cluster_table() to load the cluster tables. Also, it is no longer
> used to allocate new clusters and hence perform COW. Make the necessary
> renames at all the occurrences of get_cluster_offset().
> 
> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> ---
>  block/vmdk.c | 117 +++++++++++------------------------------------------------
>  1 file changed, 21 insertions(+), 96 deletions(-)

This is definitely more than a function rename, like I said in reply to patch 3,
it could probably be split to smaller ones (rename, and others, for example),
and reordered to make reviewing easier.

Fam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  2017-04-19 12:57   ` Fam Zheng
@ 2017-04-19 15:21     ` Ashijeet Acharya
  2017-04-20  0:45       ` Fam Zheng
  0 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-19 15:21 UTC (permalink / raw)
  To: Fam Zheng; +Cc: jsnow, kwolf, mreitz, qemu-block, qemu-devel, stefanha

On Wed, Apr 19, 2017 at 18:27 Fam Zheng <famz@redhat.com> wrote:

> On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> > Rename the existing get_cluster_offset() function to
> > vmdk_get_cluster_offset() and have it make use of the new
> > get_cluster_table() to load the cluster tables. Also, it is no longer
> > used to allocate new clusters and hence perform COW. Make the necessary
> > renames at all the occurrences of get_cluster_offset().
> >
> > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> > ---
> >  block/vmdk.c | 117
> +++++++++++------------------------------------------------
> >  1 file changed, 21 insertions(+), 96 deletions(-)
>
> This is definitely more than a function rename, like I said in reply to
> patch 3,
> it could probably be split to smaller ones (rename, and others, for
> example),
> and reordered to make reviewing easier.


Maybe, because I have also refactored it to have vmdk_get_cluster_offset()
make use of the get_cluster_table() (and friends) to avoid duplication.

I will try to split it as

1. Rename
2. Refactor it to make use of get_cluster_table() by moving that out of
patch 3 as of now.

Will that work?

I think this will also keep the compiler happy while reviewing.

Ashijeet

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  2017-04-19 15:21     ` Ashijeet Acharya
@ 2017-04-20  0:45       ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-20  0:45 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, qemu-block, stefanha, qemu-devel, mreitz, jsnow

On Wed, 04/19 15:21, Ashijeet Acharya wrote:
> On Wed, Apr 19, 2017 at 18:27 Fam Zheng <famz@redhat.com> wrote:
> 
> > On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> > > Rename the existing get_cluster_offset() function to
> > > vmdk_get_cluster_offset() and have it make use of the new
> > > get_cluster_table() to load the cluster tables. Also, it is no longer
> > > used to allocate new clusters and hence perform COW. Make the necessary
> > > renames at all the occurrences of get_cluster_offset().
> > >
> > > Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> > > ---
> > >  block/vmdk.c | 117
> > +++++++++++------------------------------------------------
> > >  1 file changed, 21 insertions(+), 96 deletions(-)
> >
> > This is definitely more than a function rename, like I said in reply to
> > patch 3,
> > it could probably be split to smaller ones (rename, and others, for
> > example),
> > and reordered to make reviewing easier.
> 
> 
> Maybe, because I have also refactored it to have vmdk_get_cluster_offset()
> make use of the get_cluster_table() (and friends) to avoid duplication.
> 
> I will try to split it as
> 
> 1. Rename
> 2. Refactor it to make use of get_cluster_table() by moving that out of
> patch 3 as of now.
> 
> Will that work?

Sounds good. Thanks.

Fam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle
  2017-04-01 14:44 [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (3 preceding siblings ...)
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset() Ashijeet Acharya
@ 2017-04-01 14:44 ` Ashijeet Acharya
  2017-04-19 13:00   ` Fam Zheng
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 6/6] vmdk: Update metadata for multiple clusters Ashijeet Acharya
  5 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-01 14:44 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block,
	Ashijeet Acharya

Set the maximum bytes allowed to get allocated at once to be not more
than the extent size boundary to handle writes at two separate extents
appropriately.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a8babd7..9456ddd 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1767,6 +1767,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     int64_t offset_in_cluster, n_bytes;
     uint64_t cluster_offset;
     uint64_t bytes_done = 0;
+    uint64_t extent_size;
     VmdkMetaData m_data;
     uint32_t total_alloc_clusters = 0;
 
@@ -1782,9 +1783,17 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         if (!extent) {
             return -EIO;
         }
+        extent_size = extent->end_sector * BDRV_SECTOR_SIZE;
+
         offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
-        n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
-                             - offset_in_cluster);
+
+        /* truncate n_bytes to first cluster because we need to perform COW */
+        if (offset_in_cluster > 0) {
+            n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
+                                 - offset_in_cluster);
+        } else {
+            n_bytes = MIN(bytes, extent_size - offset);
+        }
 
         ret = vmdk_alloc_cluster_offset(bs, extent, &m_data, offset,
                                         !(extent->compressed || zeroed),
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle Ashijeet Acharya
@ 2017-04-19 13:00   ` Fam Zheng
  2017-04-21 14:53     ` Ashijeet Acharya
  0 siblings, 1 reply; 21+ messages in thread
From: Fam Zheng @ 2017-04-19 13:00 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> Set the maximum bytes allowed to get allocated at once to be not more
> than the extent size boundary to handle writes at two separate extents
> appropriately.
> 
> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> ---
>  block/vmdk.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index a8babd7..9456ddd 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1767,6 +1767,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>      int64_t offset_in_cluster, n_bytes;
>      uint64_t cluster_offset;
>      uint64_t bytes_done = 0;
> +    uint64_t extent_size;
>      VmdkMetaData m_data;
>      uint32_t total_alloc_clusters = 0;
>  
> @@ -1782,9 +1783,17 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>          if (!extent) {
>              return -EIO;
>          }
> +        extent_size = extent->end_sector * BDRV_SECTOR_SIZE;

Maybe extent_end to be more accurate?

> +
>          offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
> -        n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
> -                             - offset_in_cluster);
> +
> +        /* truncate n_bytes to first cluster because we need to perform COW */

Makes sense, but shouldn't this be squashed into patch patch 3? Because it looks
like it is fixing an intermediate bug.

> +        if (offset_in_cluster > 0) {
> +            n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
> +                                 - offset_in_cluster);
> +        } else {
> +            n_bytes = MIN(bytes, extent_size - offset);
> +        }
>  
>          ret = vmdk_alloc_cluster_offset(bs, extent, &m_data, offset,
>                                          !(extent->compressed || zeroed),
> -- 
> 2.6.2
> 

Fam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle
  2017-04-19 13:00   ` Fam Zheng
@ 2017-04-21 14:53     ` Ashijeet Acharya
  2017-04-22  4:27       ` Ashijeet Acharya
  0 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-21 14:53 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, John Snow, Max Reitz, Stefan Hajnoczi,
	QEMU Developers, qemu block

On Wed, Apr 19, 2017 at 6:30 PM, Fam Zheng <famz@redhat.com> wrote:
> On Sat, 04/01 20:14, Ashijeet Acharya wrote:
>> Set the maximum bytes allowed to get allocated at once to be not more
>> than the extent size boundary to handle writes at two separate extents
>> appropriately.
>>
>> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
>> ---
>>  block/vmdk.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index a8babd7..9456ddd 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -1767,6 +1767,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>      int64_t offset_in_cluster, n_bytes;
>>      uint64_t cluster_offset;
>>      uint64_t bytes_done = 0;
>> +    uint64_t extent_size;
>>      VmdkMetaData m_data;
>>      uint32_t total_alloc_clusters = 0;
>>
>> @@ -1782,9 +1783,17 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>          if (!extent) {
>>              return -EIO;
>>          }
>> +        extent_size = extent->end_sector * BDRV_SECTOR_SIZE;
>
> Maybe extent_end to be more accurate?

Done

>> +
>>          offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
>> -        n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
>> -                             - offset_in_cluster);
>> +
>> +        /* truncate n_bytes to first cluster because we need to perform COW */
>
> Makes sense, but shouldn't this be squashed into patch patch 3? Because it looks
> like it is fixing an intermediate bug.

Did you mean that I should merge this whole patch into patch 3? Maybe
moving it before patch 3 rather than squashing it make more sense?

Ashijeet

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle
  2017-04-21 14:53     ` Ashijeet Acharya
@ 2017-04-22  4:27       ` Ashijeet Acharya
  0 siblings, 0 replies; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-22  4:27 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, John Snow, Max Reitz, Stefan Hajnoczi,
	QEMU Developers, qemu block

On Fri, Apr 21, 2017 at 8:23 PM, Ashijeet Acharya
<ashijeetacharya@gmail.com> wrote:
> On Wed, Apr 19, 2017 at 6:30 PM, Fam Zheng <famz@redhat.com> wrote:
>> On Sat, 04/01 20:14, Ashijeet Acharya wrote:
>>> Set the maximum bytes allowed to get allocated at once to be not more
>>> than the extent size boundary to handle writes at two separate extents
>>> appropriately.
>>>
>>> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
>>> ---
>>>  block/vmdk.c | 13 +++++++++++--
>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/vmdk.c b/block/vmdk.c
>>> index a8babd7..9456ddd 100644
>>> --- a/block/vmdk.c
>>> +++ b/block/vmdk.c
>>> @@ -1767,6 +1767,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>>      int64_t offset_in_cluster, n_bytes;
>>>      uint64_t cluster_offset;
>>>      uint64_t bytes_done = 0;
>>> +    uint64_t extent_size;
>>>      VmdkMetaData m_data;
>>>      uint32_t total_alloc_clusters = 0;
>>>
>>> @@ -1782,9 +1783,17 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>>          if (!extent) {
>>>              return -EIO;
>>>          }
>>> +        extent_size = extent->end_sector * BDRV_SECTOR_SIZE;
>>
>> Maybe extent_end to be more accurate?
>
> Done
>
>>> +
>>>          offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
>>> -        n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
>>> -                             - offset_in_cluster);
>>> +
>>> +        /* truncate n_bytes to first cluster because we need to perform COW */
>>
>> Makes sense, but shouldn't this be squashed into patch patch 3? Because it looks
>> like it is fixing an intermediate bug.
>
> Did you mean that I should merge this whole patch into patch 3? Maybe
> moving it before patch 3 rather than squashing it make more sense?

Instead I have moved it before patch 3 in v4

Ashijeet

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 6/6] vmdk: Update metadata for multiple clusters
  2017-04-01 14:44 [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (4 preceding siblings ...)
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle Ashijeet Acharya
@ 2017-04-01 14:44 ` Ashijeet Acharya
  2017-04-21  8:15   ` Fam Zheng
  5 siblings, 1 reply; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-01 14:44 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block,
	Ashijeet Acharya

Include a next pointer in VmdkMetaData struct to point to the previous
allocated L2 table. Modify vmdk_L2update to start updating metadata for
allocation of multiple clusters at once.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 111 insertions(+), 25 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 9456ddd..c7675db 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -137,6 +137,8 @@ typedef struct VmdkMetaData {
     int valid;
     uint32_t *l2_cache_entry;
     uint32_t nb_clusters;
+    uint32_t offset;
+    struct VmdkMetaData *next;
 } VmdkMetaData;
 
 typedef struct VmdkGrainMarker {
@@ -263,6 +265,12 @@ static inline uint64_t size_to_clusters(VmdkExtent *extent, uint64_t size)
     return (DIV_ROUND_UP(size + round_off_size, BDRV_SECTOR_SIZE * 128) - 1);
 }
 
+static inline int64_t vmdk_align_offset(int64_t offset, int n)
+{
+    offset = (offset + n - 1) & ~(n - 1);
+    return offset;
+}
+
 static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
 {
     char *desc;
@@ -1037,29 +1045,88 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 }
 
-static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
-                         uint32_t offset)
+static int vmdk_alloc_cluster_link_l2(VmdkExtent *extent,
+                                      VmdkMetaData *m_data, bool zeroed)
 {
-    offset = cpu_to_le32(offset);
+    int i;
+    uint32_t offset, temp_offset;
+    int *l2_table_array;
+    int l2_array_size;
+
+    if (zeroed) {
+        temp_offset = VMDK_GTE_ZEROED;
+    } else {
+        temp_offset = m_data->offset;
+    }
+
+    temp_offset = cpu_to_le32(temp_offset);
+
+    l2_array_size = sizeof(uint32_t) * m_data->nb_clusters;
+    l2_table_array = qemu_try_blockalign(extent->file->bs,
+                                    vmdk_align_offset(l2_array_size, 512));
+    if (l2_table_array == NULL) {
+        return VMDK_ERROR;
+    }
+    memset(l2_table_array, 0, vmdk_align_offset(l2_array_size, 512));
+
     /* update L2 table */
+    offset = temp_offset;
+    for (i = 0; i < m_data->nb_clusters; i++) {
+        l2_table_array[i] = offset;
+        if (!zeroed) {
+            offset += 128;
+        }
+    }
+
     if (bdrv_pwrite_sync(extent->file,
-                ((int64_t)m_data->l2_offset * 512)
-                    + (m_data->l2_index * sizeof(offset)),
-                &offset, sizeof(offset)) < 0) {
+            ((int64_t)m_data->l2_offset * 512)
+                + ((m_data->l2_index) * sizeof(offset)),
+                         l2_table_array, l2_array_size) < 0) {
         return VMDK_ERROR;
     }
+
     /* update backup L2 table */
     if (extent->l1_backup_table_offset != 0) {
         m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
         if (bdrv_pwrite_sync(extent->file,
                     ((int64_t)m_data->l2_offset * 512)
-                        + (m_data->l2_index * sizeof(offset)),
-                    &offset, sizeof(offset)) < 0) {
+                        + ((m_data->l2_index) * sizeof(offset)),
+                                l2_table_array, l2_array_size) < 0) {
             return VMDK_ERROR;
         }
     }
+
+    offset = temp_offset;
     if (m_data->l2_cache_entry) {
-        *m_data->l2_cache_entry = offset;
+        for (i = 0; i < m_data->nb_clusters; i++) {
+            *m_data->l2_cache_entry = offset;
+            m_data->l2_cache_entry++;
+
+            if (!zeroed) {
+                offset += 128;
+            }
+        }
+    }
+
+    qemu_vfree(l2_table_array);
+    return VMDK_OK;
+}
+
+static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
+                         bool zeroed)
+{
+    int ret;
+
+    while (m_data->next != NULL) {
+        VmdkMetaData *next;
+
+        ret = vmdk_alloc_cluster_link_l2(extent, m_data, zeroed);
+        if (ret < 0) {
+            return ret;
+        }
+
+        next = m_data->next;
+        m_data = next;
     }
 
     return VMDK_OK;
@@ -1271,7 +1338,7 @@ exit:
  */
 static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
                         uint64_t offset, uint64_t *cluster_offset,
-                        int64_t *bytes, VmdkMetaData *m_data,
+                        int64_t *bytes, VmdkMetaData **m_data,
                         bool allocate, uint32_t *total_alloc_clusters)
 {
     int l1_index, l2_offset, l2_index;
@@ -1280,6 +1347,7 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
     uint32_t nb_clusters;
     bool zeroed = false;
     uint64_t skip_start_bytes, skip_end_bytes;
+    VmdkMetaData *old_m_data;
     int ret;
 
     ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
@@ -1330,13 +1398,21 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
         if (ret < 0) {
             return ret;
         }
-        if (m_data) {
-            m_data->valid = 1;
-            m_data->l1_index = l1_index;
-            m_data->l2_index = l2_index;
-            m_data->l2_offset = l2_offset;
-            m_data->l2_cache_entry = &l2_table[l2_index];
-            m_data->nb_clusters = nb_clusters;
+
+        if (*m_data) {
+            old_m_data = *m_data;
+            *m_data = g_malloc0(sizeof(**m_data));
+
+            **m_data = (VmdkMetaData) {
+                .valid            =    1,
+                .l1_index         =    l1_index,
+                .l2_index         =    l2_index,
+                .l2_offset        =    l2_offset,
+                .l2_cache_entry   =    &l2_table[l2_index],
+                .nb_clusters      =    nb_clusters,
+                .offset           =    cluster_sector,
+                .next             =    old_m_data,
+            };
         }
     }
     *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
@@ -1365,7 +1441,7 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
  */
 static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
                                      VmdkExtent *extent,
-                                     VmdkMetaData *m_data, uint64_t offset,
+                                     VmdkMetaData **m_data, uint64_t offset,
                                      bool allocate, uint64_t *cluster_offset,
                                      int64_t bytes,
                                      uint32_t *total_alloc_clusters)
@@ -1385,8 +1461,8 @@ static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
     new_cluster_offset = 0;
     *cluster_offset = 0;
     n_bytes = 0;
-    if (m_data) {
-        m_data->valid = 0;
+    if (*m_data) {
+        (*m_data)->valid = 0;
     }
 
     /* due to L2 table margins all bytes may not get allocated at once */
@@ -1768,9 +1844,11 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     uint64_t cluster_offset;
     uint64_t bytes_done = 0;
     uint64_t extent_size;
-    VmdkMetaData m_data;
+    VmdkMetaData *m_data;
     uint32_t total_alloc_clusters = 0;
 
+    m_data = g_malloc0(sizeof(*m_data));
+
     if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
         error_report("Wrong offset: offset=0x%" PRIx64
                      " total_sectors=0x%" PRIx64,
@@ -1779,6 +1857,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     }
 
     while (bytes > 0) {
+        m_data->next = NULL;
         extent = find_extent(s, offset >> BDRV_SECTOR_BITS, extent);
         if (!extent) {
             return -EIO;
@@ -1825,7 +1904,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
                                         total_alloc_clusters;
                 if (!zero_dry_run) {
                     /* update L2 tables */
-                    if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
+                    if (vmdk_L2update(extent, m_data, zeroed)
                             != VMDK_OK) {
                         return -EIO;
                     }
@@ -1839,10 +1918,9 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
             if (ret) {
                 return ret;
             }
-            if (m_data.valid) {
+            if (m_data->valid) {
                 /* update L2 tables */
-                if (vmdk_L2update(extent, &m_data,
-                                  cluster_offset >> BDRV_SECTOR_BITS)
+                if (vmdk_L2update(extent, m_data, zeroed)
                         != VMDK_OK) {
                     return -EIO;
                 }
@@ -1852,6 +1930,13 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         offset += n_bytes;
         bytes_done += n_bytes;
 
+        while (m_data->next != NULL) {
+            VmdkMetaData *next;
+            next = m_data->next;
+            g_free(m_data);
+            m_data = next;
+        }
+
         /* update CID on the first write every time the virtual disk is
          * opened */
         if (!s->cid_updated) {
@@ -1862,6 +1947,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
             s->cid_updated = true;
         }
     }
+    g_free(m_data);
     return 0;
 }
 
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vmdk: Update metadata for multiple clusters
  2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 6/6] vmdk: Update metadata for multiple clusters Ashijeet Acharya
@ 2017-04-21  8:15   ` Fam Zheng
  2017-04-22  4:13     ` Ashijeet Acharya
  0 siblings, 1 reply; 21+ messages in thread
From: Fam Zheng @ 2017-04-21  8:15 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Sat, 04/01 20:14, Ashijeet Acharya wrote:
> Include a next pointer in VmdkMetaData struct to point to the previous
> allocated L2 table. Modify vmdk_L2update to start updating metadata for
> allocation of multiple clusters at once.
> 
> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>

This is the metadata part of the coalesed allocation. I think patch 3 is
functionally incomplete without these changes, and is perhaps broken because
metadata is not handled correctly.

Such an "intermediate functional regression" is not good in a series, which we
need to avoid.

> ---
>  block/vmdk.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 111 insertions(+), 25 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 9456ddd..c7675db 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -137,6 +137,8 @@ typedef struct VmdkMetaData {
>      int valid;
>      uint32_t *l2_cache_entry;
>      uint32_t nb_clusters;
> +    uint32_t offset;
> +    struct VmdkMetaData *next;
>  } VmdkMetaData;
>  
>  typedef struct VmdkGrainMarker {
> @@ -263,6 +265,12 @@ static inline uint64_t size_to_clusters(VmdkExtent *extent, uint64_t size)
>      return (DIV_ROUND_UP(size + round_off_size, BDRV_SECTOR_SIZE * 128) - 1);
>  }
>  
> +static inline int64_t vmdk_align_offset(int64_t offset, int n)
> +{
> +    offset = (offset + n - 1) & ~(n - 1);
> +    return offset;
> +}
> +
>  static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
>  {
>      char *desc;
> @@ -1037,29 +1045,88 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
>      }
>  }
>  
> -static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> -                         uint32_t offset)
> +static int vmdk_alloc_cluster_link_l2(VmdkExtent *extent,
> +                                      VmdkMetaData *m_data, bool zeroed)
>  {
> -    offset = cpu_to_le32(offset);
> +    int i;
> +    uint32_t offset, temp_offset;
> +    int *l2_table_array;
> +    int l2_array_size;
> +
> +    if (zeroed) {
> +        temp_offset = VMDK_GTE_ZEROED;
> +    } else {
> +        temp_offset = m_data->offset;
> +    }
> +
> +    temp_offset = cpu_to_le32(temp_offset);
> +
> +    l2_array_size = sizeof(uint32_t) * m_data->nb_clusters;
> +    l2_table_array = qemu_try_blockalign(extent->file->bs,
> +                                    vmdk_align_offset(l2_array_size, 512));

Indentation is off.

Use QEMU_ALIGN_UP, instead of vmdk_align_offset.

512 is a magic number, use BDRV_SECTOR_SIZE.

> +    if (l2_table_array == NULL) {
> +        return VMDK_ERROR;
> +    }
> +    memset(l2_table_array, 0, vmdk_align_offset(l2_array_size, 512));
> +
>      /* update L2 table */
> +    offset = temp_offset;
> +    for (i = 0; i < m_data->nb_clusters; i++) {
> +        l2_table_array[i] = offset;
> +        if (!zeroed) {
> +            offset += 128;

Something is going wrong here with endianness on BE host, I believe.

> +        }
> +    }
> +
>      if (bdrv_pwrite_sync(extent->file,
> -                ((int64_t)m_data->l2_offset * 512)
> -                    + (m_data->l2_index * sizeof(offset)),
> -                &offset, sizeof(offset)) < 0) {
> +            ((int64_t)m_data->l2_offset * 512)
> +                + ((m_data->l2_index) * sizeof(offset)),
> +                         l2_table_array, l2_array_size) < 0) {

You can fix the indentation while changing these lines. If not, don't change it,
or at least don't make it uglier.

>          return VMDK_ERROR;
>      }
> +
>      /* update backup L2 table */
>      if (extent->l1_backup_table_offset != 0) {
>          m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
>          if (bdrv_pwrite_sync(extent->file,
>                      ((int64_t)m_data->l2_offset * 512)
> -                        + (m_data->l2_index * sizeof(offset)),
> -                    &offset, sizeof(offset)) < 0) {
> +                        + ((m_data->l2_index) * sizeof(offset)),
> +                                l2_table_array, l2_array_size) < 0) {

Same here.

>              return VMDK_ERROR;
>          }
>      }
> +
> +    offset = temp_offset;
>      if (m_data->l2_cache_entry) {
> -        *m_data->l2_cache_entry = offset;
> +        for (i = 0; i < m_data->nb_clusters; i++) {
> +            *m_data->l2_cache_entry = offset;
> +            m_data->l2_cache_entry++;
> +
> +            if (!zeroed) {
> +                offset += 128;
> +            }
> +        }
> +    }
> +
> +    qemu_vfree(l2_table_array);
> +    return VMDK_OK;
> +}
> +
> +static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> +                         bool zeroed)
> +{
> +    int ret;
> +
> +    while (m_data->next != NULL) {
> +        VmdkMetaData *next;
> +
> +        ret = vmdk_alloc_cluster_link_l2(extent, m_data, zeroed);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        next = m_data->next;
> +        m_data = next;

Why not simply "m_data = m_data->next" and drop "next" variable?

>      }
>  
>      return VMDK_OK;
> @@ -1271,7 +1338,7 @@ exit:
>   */
>  static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>                          uint64_t offset, uint64_t *cluster_offset,
> -                        int64_t *bytes, VmdkMetaData *m_data,
> +                        int64_t *bytes, VmdkMetaData **m_data,
>                          bool allocate, uint32_t *total_alloc_clusters)
>  {
>      int l1_index, l2_offset, l2_index;
> @@ -1280,6 +1347,7 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>      uint32_t nb_clusters;
>      bool zeroed = false;
>      uint64_t skip_start_bytes, skip_end_bytes;
> +    VmdkMetaData *old_m_data;
>      int ret;
>  
>      ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
> @@ -1330,13 +1398,21 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>          if (ret < 0) {
>              return ret;
>          }
> -        if (m_data) {
> -            m_data->valid = 1;
> -            m_data->l1_index = l1_index;
> -            m_data->l2_index = l2_index;
> -            m_data->l2_offset = l2_offset;
> -            m_data->l2_cache_entry = &l2_table[l2_index];
> -            m_data->nb_clusters = nb_clusters;
> +
> +        if (*m_data) {
> +            old_m_data = *m_data;
> +            *m_data = g_malloc0(sizeof(**m_data));
> +
> +            **m_data = (VmdkMetaData) {
> +                .valid            =    1,
> +                .l1_index         =    l1_index,
> +                .l2_index         =    l2_index,
> +                .l2_offset        =    l2_offset,
> +                .l2_cache_entry   =    &l2_table[l2_index],
> +                .nb_clusters      =    nb_clusters,
> +                .offset           =    cluster_sector,
> +                .next             =    old_m_data,
> +            };

I think if the new m_data can be merged into the old, there is no need to
allocate a new one.

>          }
>      }
>      *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
> @@ -1365,7 +1441,7 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>   */
>  static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
>                                       VmdkExtent *extent,
> -                                     VmdkMetaData *m_data, uint64_t offset,
> +                                     VmdkMetaData **m_data, uint64_t offset,
>                                       bool allocate, uint64_t *cluster_offset,
>                                       int64_t bytes,
>                                       uint32_t *total_alloc_clusters)
> @@ -1385,8 +1461,8 @@ static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
>      new_cluster_offset = 0;
>      *cluster_offset = 0;
>      n_bytes = 0;
> -    if (m_data) {
> -        m_data->valid = 0;
> +    if (*m_data) {
> +        (*m_data)->valid = 0;
>      }
>  
>      /* due to L2 table margins all bytes may not get allocated at once */
> @@ -1768,9 +1844,11 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>      uint64_t cluster_offset;
>      uint64_t bytes_done = 0;
>      uint64_t extent_size;
> -    VmdkMetaData m_data;
> +    VmdkMetaData *m_data;
>      uint32_t total_alloc_clusters = 0;
>  
> +    m_data = g_malloc0(sizeof(*m_data));
> +
>      if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
>          error_report("Wrong offset: offset=0x%" PRIx64
>                       " total_sectors=0x%" PRIx64,
> @@ -1779,6 +1857,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>      }
>  
>      while (bytes > 0) {
> +        m_data->next = NULL;
>          extent = find_extent(s, offset >> BDRV_SECTOR_BITS, extent);
>          if (!extent) {
>              return -EIO;
> @@ -1825,7 +1904,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>                                          total_alloc_clusters;
>                  if (!zero_dry_run) {
>                      /* update L2 tables */
> -                    if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
> +                    if (vmdk_L2update(extent, m_data, zeroed)
>                              != VMDK_OK) {
>                          return -EIO;
>                      }
> @@ -1839,10 +1918,9 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>              if (ret) {
>                  return ret;
>              }
> -            if (m_data.valid) {
> +            if (m_data->valid) {
>                  /* update L2 tables */
> -                if (vmdk_L2update(extent, &m_data,
> -                                  cluster_offset >> BDRV_SECTOR_BITS)
> +                if (vmdk_L2update(extent, m_data, zeroed)
>                          != VMDK_OK) {
>                      return -EIO;
>                  }
> @@ -1852,6 +1930,13 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>          offset += n_bytes;
>          bytes_done += n_bytes;
>  
> +        while (m_data->next != NULL) {
> +            VmdkMetaData *next;
> +            next = m_data->next;
> +            g_free(m_data);
> +            m_data = next;
> +        }
> +
>          /* update CID on the first write every time the virtual disk is
>           * opened */
>          if (!s->cid_updated) {
> @@ -1862,6 +1947,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>              s->cid_updated = true;
>          }
>      }
> +    g_free(m_data);

This is weird, you free all but the last m_data with a while loop, a few lines
above, and this one with a separate g_free().

Please use one loop:

    for (p = m_data; p; p = next) {
        next = p->next;
        g_free(p);
    }

>      return 0;
>  }
>  
> -- 
> 2.6.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vmdk: Update metadata for multiple clusters
  2017-04-21  8:15   ` Fam Zheng
@ 2017-04-22  4:13     ` Ashijeet Acharya
  0 siblings, 0 replies; 21+ messages in thread
From: Ashijeet Acharya @ 2017-04-22  4:13 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, John Snow, Max Reitz, Stefan Hajnoczi,
	QEMU Developers, qemu block

On Fri, Apr 21, 2017 at 1:45 PM, Fam Zheng <famz@redhat.com> wrote:
> On Sat, 04/01 20:14, Ashijeet Acharya wrote:
>> Include a next pointer in VmdkMetaData struct to point to the previous
>> allocated L2 table. Modify vmdk_L2update to start updating metadata for
>> allocation of multiple clusters at once.
>>
>> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
>
> This is the metadata part of the coalesed allocation. I think patch 3 is
> functionally incomplete without these changes, and is perhaps broken because
> metadata is not handled correctly.
>
> Such an "intermediate functional regression" is not good in a series, which we
> need to avoid.

I have moved this patch right after patch 3 because merging both will
result in an unnecessary huge patch. Will that work?
>
>> ---
>>  block/vmdk.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 111 insertions(+), 25 deletions(-)
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index 9456ddd..c7675db 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -137,6 +137,8 @@ typedef struct VmdkMetaData {
>>      int valid;
>>      uint32_t *l2_cache_entry;
>>      uint32_t nb_clusters;
>> +    uint32_t offset;
>> +    struct VmdkMetaData *next;
>>  } VmdkMetaData;
>>
>>  typedef struct VmdkGrainMarker {
>> @@ -263,6 +265,12 @@ static inline uint64_t size_to_clusters(VmdkExtent *extent, uint64_t size)
>>      return (DIV_ROUND_UP(size + round_off_size, BDRV_SECTOR_SIZE * 128) - 1);
>>  }
>>
>> +static inline int64_t vmdk_align_offset(int64_t offset, int n)
>> +{
>> +    offset = (offset + n - 1) & ~(n - 1);
>> +    return offset;
>> +}
>> +
>>  static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
>>  {
>>      char *desc;
>> @@ -1037,29 +1045,88 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
>>      }
>>  }
>>
>> -static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
>> -                         uint32_t offset)
>> +static int vmdk_alloc_cluster_link_l2(VmdkExtent *extent,
>> +                                      VmdkMetaData *m_data, bool zeroed)
>>  {
>> -    offset = cpu_to_le32(offset);
>> +    int i;
>> +    uint32_t offset, temp_offset;
>> +    int *l2_table_array;
>> +    int l2_array_size;
>> +
>> +    if (zeroed) {
>> +        temp_offset = VMDK_GTE_ZEROED;
>> +    } else {
>> +        temp_offset = m_data->offset;
>> +    }
>> +
>> +    temp_offset = cpu_to_le32(temp_offset);
>> +
>> +    l2_array_size = sizeof(uint32_t) * m_data->nb_clusters;
>> +    l2_table_array = qemu_try_blockalign(extent->file->bs,
>> +                                    vmdk_align_offset(l2_array_size, 512));
>
> Indentation is off.
>
> Use QEMU_ALIGN_UP, instead of vmdk_align_offset.
>
> 512 is a magic number, use BDRV_SECTOR_SIZE.

Done

>
>> +    if (l2_table_array == NULL) {
>> +        return VMDK_ERROR;
>> +    }
>> +    memset(l2_table_array, 0, vmdk_align_offset(l2_array_size, 512));
>> +
>>      /* update L2 table */
>> +    offset = temp_offset;
>> +    for (i = 0; i < m_data->nb_clusters; i++) {
>> +        l2_table_array[i] = offset;
>> +        if (!zeroed) {
>> +            offset += 128;
>
> Something is going wrong here with endianness on BE host, I believe.

I have changed temp_offset to LE above, wouldn't that be enough. I am not sure.

>
>> +        }
>> +    }
>> +
>>      if (bdrv_pwrite_sync(extent->file,
>> -                ((int64_t)m_data->l2_offset * 512)
>> -                    + (m_data->l2_index * sizeof(offset)),
>> -                &offset, sizeof(offset)) < 0) {
>> +            ((int64_t)m_data->l2_offset * 512)
>> +                + ((m_data->l2_index) * sizeof(offset)),
>> +                         l2_table_array, l2_array_size) < 0) {
>
> You can fix the indentation while changing these lines. If not, don't change it,
> or at least don't make it uglier.

I have aligned it, if it still looks ugly in v4, I will revert.

>
>>          return VMDK_ERROR;
>>      }
>> +
>>      /* update backup L2 table */
>>      if (extent->l1_backup_table_offset != 0) {
>>          m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
>>          if (bdrv_pwrite_sync(extent->file,
>>                      ((int64_t)m_data->l2_offset * 512)
>> -                        + (m_data->l2_index * sizeof(offset)),
>> -                    &offset, sizeof(offset)) < 0) {
>> +                        + ((m_data->l2_index) * sizeof(offset)),
>> +                                l2_table_array, l2_array_size) < 0) {
>
> Same here.
>
>>              return VMDK_ERROR;
>>          }
>>      }
>> +
>> +    offset = temp_offset;
>>      if (m_data->l2_cache_entry) {
>> -        *m_data->l2_cache_entry = offset;
>> +        for (i = 0; i < m_data->nb_clusters; i++) {
>> +            *m_data->l2_cache_entry = offset;
>> +            m_data->l2_cache_entry++;
>> +
>> +            if (!zeroed) {
>> +                offset += 128;
>> +            }
>> +        }
>> +    }
>> +
>> +    qemu_vfree(l2_table_array);
>> +    return VMDK_OK;
>> +}
>> +
>> +static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
>> +                         bool zeroed)
>> +{
>> +    int ret;
>> +
>> +    while (m_data->next != NULL) {
>> +        VmdkMetaData *next;
>> +
>> +        ret = vmdk_alloc_cluster_link_l2(extent, m_data, zeroed);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +
>> +        next = m_data->next;
>> +        m_data = next;
>
> Why not simply "m_data = m_data->next" and drop "next" variable?
>>      }
>>
>>      return VMDK_OK;
>> @@ -1271,7 +1338,7 @@ exit:
>>   */
>>  static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>>                          uint64_t offset, uint64_t *cluster_offset,
>> -                        int64_t *bytes, VmdkMetaData *m_data,
>> +                        int64_t *bytes, VmdkMetaData **m_data,
>>                          bool allocate, uint32_t *total_alloc_clusters)
>>  {
>>      int l1_index, l2_offset, l2_index;
>> @@ -1280,6 +1347,7 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>>      uint32_t nb_clusters;
>>      bool zeroed = false;
>>      uint64_t skip_start_bytes, skip_end_bytes;
>> +    VmdkMetaData *old_m_data;
>>      int ret;
>>
>>      ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
>> @@ -1330,13 +1398,21 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>>          if (ret < 0) {
>>              return ret;
>>          }
>> -        if (m_data) {
>> -            m_data->valid = 1;
>> -            m_data->l1_index = l1_index;
>> -            m_data->l2_index = l2_index;
>> -            m_data->l2_offset = l2_offset;
>> -            m_data->l2_cache_entry = &l2_table[l2_index];
>> -            m_data->nb_clusters = nb_clusters;
>> +
>> +        if (*m_data) {
>> +            old_m_data = *m_data;
>> +            *m_data = g_malloc0(sizeof(**m_data));
>> +
>> +            **m_data = (VmdkMetaData) {
>> +                .valid            =    1,
>> +                .l1_index         =    l1_index,
>> +                .l2_index         =    l2_index,
>> +                .l2_offset        =    l2_offset,
>> +                .l2_cache_entry   =    &l2_table[l2_index],
>> +                .nb_clusters      =    nb_clusters,
>> +                .offset           =    cluster_sector,
>> +                .next             =    old_m_data,
>> +            };
>
> I think if the new m_data can be merged into the old, there is no need to
> allocate a new one.

Do you mean that if the clusters lie in the same l2 table, then merge
them? I think this case only appears when I leave out the first and
last cluster for COW. If I misunderstood, sorry!
I think I will post v4 without attending this issue and we can discuss
this when you are available after the weekend.

>
>>          }
>>      }
>>      *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
>> @@ -1365,7 +1441,7 @@ static int handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>>   */
>>  static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
>>                                       VmdkExtent *extent,
>> -                                     VmdkMetaData *m_data, uint64_t offset,
>> +                                     VmdkMetaData **m_data, uint64_t offset,
>>                                       bool allocate, uint64_t *cluster_offset,
>>                                       int64_t bytes,
>>                                       uint32_t *total_alloc_clusters)
>> @@ -1385,8 +1461,8 @@ static int vmdk_alloc_cluster_offset(BlockDriverState *bs,
>>      new_cluster_offset = 0;
>>      *cluster_offset = 0;
>>      n_bytes = 0;
>> -    if (m_data) {
>> -        m_data->valid = 0;
>> +    if (*m_data) {
>> +        (*m_data)->valid = 0;
>>      }
>>
>>      /* due to L2 table margins all bytes may not get allocated at once */
>> @@ -1768,9 +1844,11 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>      uint64_t cluster_offset;
>>      uint64_t bytes_done = 0;
>>      uint64_t extent_size;
>> -    VmdkMetaData m_data;
>> +    VmdkMetaData *m_data;
>>      uint32_t total_alloc_clusters = 0;
>>
>> +    m_data = g_malloc0(sizeof(*m_data));
>> +
[scroll till here] [1] So this allocation will need to move....[2]

>>      if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
>>          error_report("Wrong offset: offset=0x%" PRIx64
>>                       " total_sectors=0x%" PRIx64,
>> @@ -1779,6 +1857,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>      }
>>
>>      while (bytes > 0) {

....[2] here. Thus we will need to allocate it again every time we
enter here otherwise the very next line m_data->next=NULL will
segfault.

So maybe its good to free it separately?
I will retain it this way for v4 and change it otherwise if you still
say so after my reasoning in v5.

>> +        m_data->next = NULL;
>>          extent = find_extent(s, offset >> BDRV_SECTOR_BITS, extent);
>>          if (!extent) {
>>              return -EIO;
>> @@ -1825,7 +1904,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>                                          total_alloc_clusters;
>>                  if (!zero_dry_run) {
>>                      /* update L2 tables */
>> -                    if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
>> +                    if (vmdk_L2update(extent, m_data, zeroed)
>>                              != VMDK_OK) {
>>                          return -EIO;
>>                      }
>> @@ -1839,10 +1918,9 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>              if (ret) {
>>                  return ret;
>>              }
>> -            if (m_data.valid) {
>> +            if (m_data->valid) {
>>                  /* update L2 tables */
>> -                if (vmdk_L2update(extent, &m_data,
>> -                                  cluster_offset >> BDRV_SECTOR_BITS)
>> +                if (vmdk_L2update(extent, m_data, zeroed)
>>                          != VMDK_OK) {
>>                      return -EIO;
>>                  }
>> @@ -1852,6 +1930,13 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>          offset += n_bytes;
>>          bytes_done += n_bytes;
>>
>> +        while (m_data->next != NULL) {
>> +            VmdkMetaData *next;
>> +            next = m_data->next;
>> +            g_free(m_data);
>> +            m_data = next;
>> +        }
>> +
>>          /* update CID on the first write every time the virtual disk is
>>           * opened */
>>          if (!s->cid_updated) {
>> @@ -1862,6 +1947,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>              s->cid_updated = true;
>>          }
>>      }
>> +    g_free(m_data);
>
> This is weird, you free all but the last m_data with a while loop, a few lines
> above, and this one with a separate g_free().
>
> Please use one loop:
>
>     for (p = m_data; p; p = next) {
>         next = p->next;
>         g_free(p);
>     }

I have a good (maybe good enough) reason for it, if I free it in the
while loop above, then I will need to allocate it again when we enter
the superior while(bytes>0) loop, otherwise we will segfault for
everything from that point onwards....[scroll up __^] [1]

Ashijeet

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-04-22  4:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-01 14:44 [Qemu-devel] [PATCH v3 0/6] Optiomize VMDK I/O by allocating multiple clusters Ashijeet Acharya
2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 1/6] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
2017-04-10 13:04   ` Ashijeet Acharya
2017-04-19 12:14     ` Fam Zheng
2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 2/6] vmdk: Rename get_whole_cluster() to vmdk_perform_cow() Ashijeet Acharya
2017-04-19 12:14   ` Fam Zheng
2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 3/6] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
2017-04-19 12:56   ` Fam Zheng
2017-04-19 15:13     ` Ashijeet Acharya
2017-04-20  0:47       ` Fam Zheng
2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 4/6] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset() Ashijeet Acharya
2017-04-19 12:57   ` Fam Zheng
2017-04-19 15:21     ` Ashijeet Acharya
2017-04-20  0:45       ` Fam Zheng
2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 5/6] vmdk: Set maximum bytes allocated in one cycle Ashijeet Acharya
2017-04-19 13:00   ` Fam Zheng
2017-04-21 14:53     ` Ashijeet Acharya
2017-04-22  4:27       ` Ashijeet Acharya
2017-04-01 14:44 ` [Qemu-devel] [PATCH v3 6/6] vmdk: Update metadata for multiple clusters Ashijeet Acharya
2017-04-21  8:15   ` Fam Zheng
2017-04-22  4:13     ` Ashijeet Acharya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).