[Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off
@ 2008-07-29 14:13 Laurent Vivier
  2008-07-29 14:13 ` [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset() Laurent Vivier
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 14:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier

These patches improve qcow2 performance when used with cache=off.

They modify block-qcow2.c to read/write as many clusters as
possible per bdrv_aio_[read|write]().

Some benchmarks:

* mkfs on 500 MB qcow2 disk

mkfs                            WITHOUT         WITH

ide, cache=off,snapshot=off     41 s            5 s     8x faster
ide, cache=off,snapshot=on      40 s            5 s     8x faster
ide, cache=on, snapshot=off      3 s            3 s
ide, cache=on, snapshot=on       4 s            3 s

* tar jxvf linux-2.6.25.7.tar.bz2

ide, cache=off,snapshot=off     847 s           379 s   2.2x faster
ide, cache=off,snapshot=on      801 s           364 s   2.2x faster
ide, cache=on, snapshot=off     238 s           236 s
ide, cache=on, snapshot=on      236 s           237 s

* dd if=/dev/zero of=file

dd if=/dev/null                 WITHOUT         WITH

ide, cache=off,snapshot=off     333 kB/s        3.7 MB/s  11.3x faster
ide, cache=off,snapshot=on      337 kB/s        3.6 MB/s  10.9x faster
ide, cache=on, snapshot=off    9.06 MB/s        9.23 MB/s
ide, cache=on, snapshot=on     8,89 MB/s        8.89 MB/s

* dbench 1

dbench                          WITHOUT          WITH

ide, cache=off,snapshot=off     20.8494 MB/sec  24.8521 MB/sec          +19,2%
ide, cache=off,snapshot=on      20.9349 MB/sec  24.2296 MB/sec          +15,7%
ide, cache=on, snapshot=off     23.6612 MB/sec  25.4724 MB/sec          + 7,6%
ide, cache=on, snapshot=on      24.1836 MB/sec  24.8169 MB/sec

[PATCH 1/5] extract code from get_cluster_offset() into new functions
            seek_l2_table(), l2_load() and l2_allocate().

[PATCH 2/5] divide get_cluster_offset() into get_cluster_offset() and
            alloc_cluster_offset().

[PATCH 3/5] divide alloc_cluster_offset() into alloc_cluster_offset()
            and alloc_compressed_cluster_offset(). Factorize code
            to free clusters into free_used_clusters().

[PATCH 4/5] modify get_cluster_offset(), alloc_cluster_offset() and 
            free_used_clusters() to specify how many clusters we
            want.

[PATCH 5/5] in free_used_clusters(), try to aggregate free clusters
            and freed clusters.

Laurent
--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset()
  2008-07-29 14:13 [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Laurent Vivier
@ 2008-07-29 14:13 ` Laurent Vivier
  2008-08-05 14:15   ` Kevin Wolf
  2008-07-29 14:13 ` [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset() Laurent Vivier
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 14:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier

[-- Attachment #1: qcow2-factor.patch --]
[-- Type: text/plain, Size: 7726 bytes --]

Extract code from get_cluster_offset() into new functions:

- seek_l2_table()

Search an l2 offset in the l2_cache table.

- l2_load()

Read the l2 entry from disk

- l2_allocate()

Allocate a new l2 entry.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 block-qcow2.c |  176 +++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 109 insertions(+), 67 deletions(-)

Index: qemu/block-qcow2.c
===================================================================
--- qemu.orig/block-qcow2.c	2008-07-29 15:12:37.000000000 +0200
+++ qemu/block-qcow2.c	2008-07-29 15:22:04.000000000 +0200
@@ -480,96 +480,138 @@ static int grow_l1_table(BlockDriverStat
     return -EIO;
 }
 
-/* 'allocate' is:
- *
- * 0 not to allocate.
- *
- * 1 to allocate a normal cluster (for sector indexes 'n_start' to
- * 'n_end')
- *
- * 2 to allocate a compressed cluster of size
- * 'compressed_size'. 'compressed_size' must be > 0 and <
- * cluster_size
- *
- * return 0 if not allocated.
- */
+static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
+{
+    int i,j;
+
+    for(i = 0; i < L2_CACHE_SIZE; i++) {
+        if (l2_offset == s->l2_cache_offsets[i]) {
+            /* increment the hit count */
+            if (++s->l2_cache_counts[i] == 0xffffffff) {
+                for(j = 0; j < L2_CACHE_SIZE; j++) {
+                    s->l2_cache_counts[j] >>= 1;
+                }
+            }
+            return s->l2_cache + (i << s->l2_bits);
+        }
+    }
+    return NULL;
+}
+
+static int l2_load(BlockDriverState *bs, int l1_index,
+                   uint64_t **new_l2_table, uint64_t *new_l2_offset)
+{
+    BDRVQcowState *s = bs->opaque;
+    int min_index;
+    uint64_t *l2_table, l2_offset;
+
+    l2_offset = s->l1_table[l1_index] & ~QCOW_OFLAG_COPIED;
+    l2_table = seek_l2_table(s, l2_offset);
+    if (l2_table != NULL)
+        goto found;
+
+    /* not found: load a new entry in the least used one */
+
+    min_index = l2_cache_new_entry(bs);
+    l2_table = s->l2_cache + (min_index << s->l2_bits);
+    if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
+        s->l2_size * sizeof(uint64_t))
+        return 0;
+    s->l2_cache_offsets[min_index] = l2_offset;
+    s->l2_cache_counts[min_index] = 1;
+
+found:
+    *new_l2_table = l2_table;
+    *new_l2_offset = l2_offset;
+
+    return 1;
+}
+
+static int l2_allocate(BlockDriverState *bs, int l1_index,
+                       uint64_t **new_l2_table, uint64_t *new_l2_offset)
+{
+    BDRVQcowState *s = bs->opaque;
+    int min_index;
+    uint64_t old_l2_offset, tmp;
+    uint64_t *l2_table, l2_offset;
+
+    old_l2_offset = s->l1_table[l1_index];
+
+    /* allocate a new l2 entry */
+
+    l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+
+    /* update the L1 entry */
+
+    s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
+
+    tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
+    if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
+                    &tmp, sizeof(tmp)) != sizeof(tmp))
+        return 0;
+
+    min_index = l2_cache_new_entry(bs);
+    l2_table = s->l2_cache + (min_index << s->l2_bits);
+
+    if (old_l2_offset == 0) {
+        memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
+    } else {
+        if (bdrv_pread(s->hd, old_l2_offset,
+                       l2_table, s->l2_size * sizeof(uint64_t)) !=
+            s->l2_size * sizeof(uint64_t))
+            return 0;
+    }
+    if (bdrv_pwrite(s->hd, l2_offset,
+                    l2_table, s->l2_size * sizeof(uint64_t)) !=
+        s->l2_size * sizeof(uint64_t))
+        return 0;
+
+    s->l2_cache_offsets[min_index] = l2_offset;
+    s->l2_cache_counts[min_index] = 1;
+
+    *new_l2_table = l2_table;
+    *new_l2_offset = l2_offset;
+
+    return 1;
+}
+
 static uint64_t get_cluster_offset(BlockDriverState *bs,
                                    uint64_t offset, int allocate,
                                    int compressed_size,
                                    int n_start, int n_end)
 {
     BDRVQcowState *s = bs->opaque;
-    int min_index, i, j, l1_index, l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset, tmp, old_l2_offset;
+    int l1_index, l2_index, ret;
+    uint64_t l2_offset, *l2_table, cluster_offset, tmp;
 
     l1_index = offset >> (s->l2_bits + s->cluster_bits);
     if (l1_index >= s->l1_size) {
         /* outside l1 table is allowed: we grow the table if needed */
         if (!allocate)
             return 0;
-        if (grow_l1_table(bs, l1_index + 1) < 0)
+        ret = grow_l1_table(bs, l1_index + 1);
+        if (ret < 0)
             return 0;
     }
     l2_offset = s->l1_table[l1_index];
     if (!l2_offset) {
         if (!allocate)
             return 0;
-    l2_allocate:
-        old_l2_offset = l2_offset;
-        /* allocate a new l2 entry */
-        l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
-        /* update the L1 entry */
-        s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
-        tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
-        if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
-                        &tmp, sizeof(tmp)) != sizeof(tmp))
-            return 0;
-        min_index = l2_cache_new_entry(bs);
-        l2_table = s->l2_cache + (min_index << s->l2_bits);
-
-        if (old_l2_offset == 0) {
-            memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
-        } else {
-            if (bdrv_pread(s->hd, old_l2_offset,
-                           l2_table, s->l2_size * sizeof(uint64_t)) !=
-                s->l2_size * sizeof(uint64_t))
-                return 0;
-        }
-        if (bdrv_pwrite(s->hd, l2_offset,
-                        l2_table, s->l2_size * sizeof(uint64_t)) !=
-            s->l2_size * sizeof(uint64_t))
+        ret = l2_allocate(bs, l1_index, &l2_table, &l2_offset);
+        if (ret == 0)
             return 0;
     } else {
-        if (!(l2_offset & QCOW_OFLAG_COPIED)) {
-            if (allocate) {
-                free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
-                goto l2_allocate;
-            }
+        if (!(l2_offset & QCOW_OFLAG_COPIED) && allocate) {
+            free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
+            ret = l2_allocate(bs, l1_index, &l2_table, &l2_offset);
+            if (ret == 0)
+                return 0;
         } else {
-            l2_offset &= ~QCOW_OFLAG_COPIED;
-        }
-        for(i = 0; i < L2_CACHE_SIZE; i++) {
-            if (l2_offset == s->l2_cache_offsets[i]) {
-                /* increment the hit count */
-                if (++s->l2_cache_counts[i] == 0xffffffff) {
-                    for(j = 0; j < L2_CACHE_SIZE; j++) {
-                        s->l2_cache_counts[j] >>= 1;
-                    }
-                }
-                l2_table = s->l2_cache + (i << s->l2_bits);
-                goto found;
-            }
+            ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
+            if (ret == 0)
+                return 0;
         }
-        /* not found: load a new entry in the least used one */
-        min_index = l2_cache_new_entry(bs);
-        l2_table = s->l2_cache + (min_index << s->l2_bits);
-        if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
-            s->l2_size * sizeof(uint64_t))
-            return 0;
     }
-    s->l2_cache_offsets[min_index] = l2_offset;
-    s->l2_cache_counts[min_index] = 1;
- found:
     l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
     if (!cluster_offset) {

--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset()
  2008-07-29 14:13 ` [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset() Laurent Vivier
@ 2008-08-05 14:15   ` Kevin Wolf
  2008-08-05 14:28     ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2008-08-05 14:15 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> Extract code from get_cluster_offset() into new functions:
> 
> - seek_l2_table()
> 
> Search an l2 offset in the l2_cache table.
> 
> - l2_load()
> 
> Read the l2 entry from disk
> 
> - l2_allocate()
> 
> Allocate a new l2 entry.
> 
> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>

Acked-by: Kevin Wolf <kwolf@suse.de>

However, unrelated to your patch, I noticed that you touch code which
could use more comments in some places. (Oh btw, why do you explain the
new functions in the patch header but not in the code itself? But okay,
these are more or less obvious.)

> +static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
> +{
> +    int i,j;
> +
> +    for(i = 0; i < L2_CACHE_SIZE; i++) {
> +        if (l2_offset == s->l2_cache_offsets[i]) {
> +            /* increment the hit count */
> +            if (++s->l2_cache_counts[i] == 0xffffffff) {
> +                for(j = 0; j < L2_CACHE_SIZE; j++) {
> +                    s->l2_cache_counts[j] >>= 1;
> +                }
> +            }

This if block is one of them. While it's quite obvious that some counter
is incremented, the action to avoid an overflow isn't. Why don't we do
something like if (count < 0xffffffff) count++?

And l2_cache_counts is a bad name anyway, IMHO. I first thought of
something like a reference counter (and then it would be definitely
worth a comment; now it is debatable). l2_cache_hits could be a better name.

> +        if (!(l2_offset & QCOW_OFLAG_COPIED) && allocate) {
> +            free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
> +            ret = l2_allocate(bs, l1_index, &l2_table, &l2_offset);
> +            if (ret == 0)
> +                return 0;

The assumption is that allocate != 0 means there will be a write
operation and the cluster has to be copied, right? You certainly can
figure that out from the code, but it would be nice to have a comment
saying exactly this and that free_clusters is used to decrease the
refcount of the copied cluster.

Agreed, having written this, both are not really that bad. But it were
the places where I needed a bit more time to understand what the code is
doing and if it's right, so I thought I'd mention them.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset()
  2008-08-05 14:15   ` Kevin Wolf
@ 2008-08-05 14:28     ` Laurent Vivier
  2008-08-05 14:34       ` Kevin Wolf
  0 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-08-05 14:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Le mardi 05 août 2008 à 16:15 +0200, Kevin Wolf a écrit :
> Laurent Vivier schrieb:
> > Extract code from get_cluster_offset() into new functions:
> > 
> > - seek_l2_table()
> > 
> > Search an l2 offset in the l2_cache table.
> > 
> > - l2_load()
> > 
> > Read the l2 entry from disk
> > 
> > - l2_allocate()
> > 
> > Allocate a new l2 entry.
> > 
> > Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> 
> Acked-by: Kevin Wolf <kwolf@suse.de>
> 
> 
> However, unrelated to your patch, I noticed that you touch code which
> could use more comments in some places. (Oh btw, why do you explain the
> new functions in the patch header but not in the code itself? But okay,
> these are more or less obvious.)

OK, you're right, I'll add some comments.

> > +static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
> > +{
> > +    int i,j;
> > +
> > +    for(i = 0; i < L2_CACHE_SIZE; i++) {
> > +        if (l2_offset == s->l2_cache_offsets[i]) {
> > +            /* increment the hit count */
> > +            if (++s->l2_cache_counts[i] == 0xffffffff) {
> > +                for(j = 0; j < L2_CACHE_SIZE; j++) {
> > +                    s->l2_cache_counts[j] >>= 1;
> > +                }
> > +            }
> 
> This if block is one of them. While it's quite obvious that some counter
> is incremented, the action to avoid an overflow isn't. Why don't we do
> something like if (count < 0xffffffff) count++?
> 
> And l2_cache_counts is a bad name anyway, IMHO. I first thought of
> something like a reference counter (and then it would be definitely
> worth a comment; now it is debatable). l2_cache_hits could be a better name.

It is a cut&paste... so I don't know what it is really doing and I don't
want to modify something that has no direct link to the goal I reach...
even a variable name.

I'm currently testing these patches with encrypted disk and compressed
disk, and I've found a bug with encrypted one. I'll repost patches soon.

Thank you for your comments,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset()
  2008-08-05 14:28     ` Laurent Vivier
@ 2008-08-05 14:34       ` Kevin Wolf
  2008-08-05 14:45         ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2008-08-05 14:34 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> It is a cut&paste... so I don't know what it is really doing and I don't
> want to modify something that has no direct link to the goal I reach...
> even a variable name.

I know and I certainly don't want you to include this change in this
patch as it would make the patch even worse to read. This would be more
of a change for some [patch 6/5] with the accumulated minor cleanups.

> I'm currently testing these patches with encrypted disk and compressed
> disk, and I've found a bug with encrypted one. I'll repost patches soon.

So the segfault when saving a snapshot with encrypted disks is gone? ;-)

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset()
  2008-08-05 14:34       ` Kevin Wolf
@ 2008-08-05 14:45         ` Laurent Vivier
  0 siblings, 0 replies; 29+ messages in thread
From: Laurent Vivier @ 2008-08-05 14:45 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Le mardi 05 août 2008 à 16:34 +0200, Kevin Wolf a écrit :
> Laurent Vivier schrieb:
> > It is a cut&paste... so I don't know what it is really doing and I don't
> > want to modify something that has no direct link to the goal I reach...
> > even a variable name.
> 
> I know and I certainly don't want you to include this change in this
> patch as it would make the patch even worse to read. This would be more
> of a change for some [patch 6/5] with the accumulated minor cleanups.

Well, I don't like minor cleanups ;-)

> > I'm currently testing these patches with encrypted disk and compressed
> > disk, and I've found a bug with encrypted one. I'll repost patches soon.
> 
> So the segfault when saving a snapshot with encrypted disks is gone? ;-)

It corrects segfault with encrypted disk... for the moment I don't test
it with snapshot (next step)...

Regards,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset()
  2008-07-29 14:13 [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Laurent Vivier
  2008-07-29 14:13 ` [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset() Laurent Vivier
@ 2008-07-29 14:13 ` Laurent Vivier
  2008-08-05 15:13   ` Kevin Wolf
  2008-07-29 14:13 ` [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset() Laurent Vivier
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 14:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier

[-- Attachment #1: qcow2-divide.patch --]
[-- Type: text/plain, Size: 9929 bytes --]

Divide get_cluster_offset() into get_cluster_offset() and
alloc_cluster_offset().

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 block-qcow2.c |  161 +++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 94 insertions(+), 67 deletions(-)

Index: qemu/block-qcow2.c
===================================================================
--- qemu.orig/block-qcow2.c	2008-07-11 10:07:10.000000000 +0200
+++ qemu/block-qcow2.c	2008-07-11 11:04:14.000000000 +0200
@@ -575,51 +575,65 @@ static int l2_allocate(BlockDriverState 
     return 1;
 }
 
-static uint64_t get_cluster_offset(BlockDriverState *bs,
-                                   uint64_t offset, int allocate,
-                                   int compressed_size,
-                                   int n_start, int n_end)
+static uint64_t get_cluster_offset(BlockDriverState *bs, uint64_t offset)
 {
     BDRVQcowState *s = bs->opaque;
     int l1_index, l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset, tmp;
+    uint64_t l2_offset, *l2_table, cluster_offset;
+
+    l1_index = offset >> (s->l2_bits + s->cluster_bits);
+    if (l1_index >= s->l1_size)
+        return 0;
+
+    if (!s->l1_table[l1_index])
+        return 0;
+
+    ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
+    if (ret == 0)
+        return 0;
+
+    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
+    cluster_offset = be64_to_cpu(l2_table[l2_index]);
+
+    return cluster_offset & ~QCOW_OFLAG_COPIED;
+}
+
+static uint64_t alloc_cluster_offset(BlockDriverState *bs,
+                                     uint64_t offset,
+                                     int compressed_size,
+                                     int n_start, int n_end)
+{
+    BDRVQcowState *s = bs->opaque;
+    int l1_index, l2_index, ret;
+    uint64_t l2_offset, *l2_table, cluster_offset;
 
     l1_index = offset >> (s->l2_bits + s->cluster_bits);
     if (l1_index >= s->l1_size) {
-        /* outside l1 table is allowed: we grow the table if needed */
-        if (!allocate)
-            return 0;
         ret = grow_l1_table(bs, l1_index + 1);
         if (ret < 0)
             return 0;
     }
+
     l2_offset = s->l1_table[l1_index];
-    if (!l2_offset) {
-        if (!allocate)
-            return 0;
-        ret = l2_allocate(bs, l1_index, &l2_table, &l2_offset);
+    if (l2_offset & QCOW_OFLAG_COPIED) {
+        ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
         if (ret == 0)
             return 0;
     } else {
-        if (!(l2_offset & QCOW_OFLAG_COPIED) && allocate) {
+        if (l2_offset)
             free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
-            ret = l2_allocate(bs, l1_index, &l2_table, &l2_offset);
-            if (ret == 0)
-                return 0;
-        } else {
-            ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
-            if (ret == 0)
-                return 0;
-        }
+        ret = l2_allocate(bs, l1_index, &l2_table, &l2_offset);
+        if (ret == 0)
+            return 0;
     }
+
     l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
-    if (!cluster_offset) {
-        if (!allocate)
-            return cluster_offset;
-    } else if (!(cluster_offset & QCOW_OFLAG_COPIED)) {
-        if (!allocate)
-            return cluster_offset;
+
+    if (cluster_offset & QCOW_OFLAG_COPIED)
+        return cluster_offset & ~QCOW_OFLAG_COPIED;
+
+    if (cluster_offset) {
         /* free the cluster */
         if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
             int nb_csectors;
@@ -630,45 +644,59 @@ static uint64_t get_cluster_offset(Block
         } else {
             free_clusters(bs, cluster_offset, s->cluster_size);
         }
-    } else {
-        cluster_offset &= ~QCOW_OFLAG_COPIED;
-        return cluster_offset;
     }
-    if (allocate == 1) {
-        /* allocate a new cluster */
-        cluster_offset = alloc_clusters(bs, s->cluster_size);
-
-        /* we must initialize the cluster content which won't be
-           written */
-        if ((n_end - n_start) < s->cluster_sectors) {
-            uint64_t start_sect;
-
-            start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
-            ret = copy_sectors(bs, start_sect,
-                               cluster_offset, 0, n_start);
-            if (ret < 0)
-                return 0;
-            ret = copy_sectors(bs, start_sect,
-                               cluster_offset, n_end, s->cluster_sectors);
-            if (ret < 0)
-                return 0;
-        }
-        tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
-    } else {
+
+    if (compressed_size) {
         int nb_csectors;
         cluster_offset = alloc_bytes(bs, compressed_size);
         nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
             (cluster_offset >> 9);
         cluster_offset |= QCOW_OFLAG_COMPRESSED |
             ((uint64_t)nb_csectors << s->csize_shift);
-        /* compressed clusters never have the copied flag */
-        tmp = cpu_to_be64(cluster_offset);
+
+        /* update L2 table
+         * compressed clusters never have the copied flag
+         */
+
+        l2_table[l2_index] = cpu_to_be64(cluster_offset);
+        if (bdrv_pwrite(s->hd,
+                        l2_offset + l2_index * sizeof(uint64_t),
+                        l2_table + l2_index,
+                        sizeof(uint64_t)) != sizeof(uint64_t))
+            return 0;
+
+        return cluster_offset;
     }
+
+    /* allocate a new cluster */
+
+    cluster_offset = alloc_clusters(bs, s->cluster_size);
+
+    /* we must initialize the cluster content which won't be
+       written */
+
+    if ((n_end - n_start) < s->cluster_sectors) {
+        uint64_t start_sect;
+
+        start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
+        ret = copy_sectors(bs, start_sect, cluster_offset, 0, n_start);
+        if (ret < 0)
+            return 0;
+        ret = copy_sectors(bs, start_sect,
+                           cluster_offset, n_end, s->cluster_sectors);
+        if (ret < 0)
+            return 0;
+    }
+
     /* update L2 table */
-    l2_table[l2_index] = tmp;
+
+    l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
     if (bdrv_pwrite(s->hd,
-                    l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp))
+                    l2_offset + l2_index * sizeof(uint64_t),
+                    l2_table + l2_index,
+                    sizeof(uint64_t)) != sizeof(uint64_t))
         return 0;
+
     return cluster_offset;
 }
 
@@ -679,7 +707,7 @@ static int qcow_is_allocated(BlockDriver
     int index_in_cluster, n;
     uint64_t cluster_offset;
 
-    cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
+    cluster_offset = get_cluster_offset(bs, sector_num << 9);
     index_in_cluster = sector_num & (s->cluster_sectors - 1);
     n = s->cluster_sectors - index_in_cluster;
     if (n > nb_sectors)
@@ -761,7 +789,7 @@ static int qcow_read(BlockDriverState *b
     uint64_t cluster_offset;
 
     while (nb_sectors > 0) {
-        cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
+        cluster_offset = get_cluster_offset(bs, sector_num << 9);
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
         n = s->cluster_sectors - index_in_cluster;
         if (n > nb_sectors)
@@ -810,9 +838,9 @@ static int qcow_write(BlockDriverState *
         n = s->cluster_sectors - index_in_cluster;
         if (n > nb_sectors)
             n = nb_sectors;
-        cluster_offset = get_cluster_offset(bs, sector_num << 9, 1, 0,
-                                            index_in_cluster,
-                                            index_in_cluster + n);
+        cluster_offset = alloc_cluster_offset(bs, sector_num << 9, 0,
+                                              index_in_cluster,
+                                              index_in_cluster + n);
         if (!cluster_offset)
             return -1;
         if (s->crypt_method) {
@@ -885,8 +913,7 @@ static void qcow_aio_read_cb(void *opaqu
     }
 
     /* prepare next AIO request */
-    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9,
-                                             0, 0, 0, 0);
+    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9);
     index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
     acb->n = s->cluster_sectors - index_in_cluster;
     if (acb->n > acb->nb_sectors)
@@ -995,9 +1022,9 @@ static void qcow_aio_write_cb(void *opaq
     acb->n = s->cluster_sectors - index_in_cluster;
     if (acb->n > acb->nb_sectors)
         acb->n = acb->nb_sectors;
-    cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 1, 0,
-                                        index_in_cluster,
-                                        index_in_cluster + acb->n);
+    cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9, 0,
+                                          index_in_cluster,
+                                          index_in_cluster + acb->n);
     if (!cluster_offset || (cluster_offset & 511) != 0) {
         ret = -EIO;
         goto fail;
@@ -1257,8 +1284,8 @@ static int qcow_write_compressed(BlockDr
         /* could not compress: write normal cluster */
         qcow_write(bs, sector_num, buf, s->cluster_sectors);
     } else {
-        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
-                                            out_len, 0, 0);
+        cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
+                                              out_len, 0, 0);
         cluster_offset &= s->cluster_offset_mask;
         if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) {
             qemu_free(out_buf);

--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset()
  2008-07-29 14:13 ` [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset() Laurent Vivier
@ 2008-08-05 15:13   ` Kevin Wolf
  2008-08-05 15:25     ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2008-08-05 15:13 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> -    if (allocate == 1) {
> -        /* allocate a new cluster */
> -        cluster_offset = alloc_clusters(bs, s->cluster_size);
> -
> -        /* we must initialize the cluster content which won't be
> -           written */
> -        if ((n_end - n_start) < s->cluster_sectors) {
> -            uint64_t start_sect;
> -
> -            start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> -            ret = copy_sectors(bs, start_sect,
> -                               cluster_offset, 0, n_start);
> -            if (ret < 0)
> -                return 0;
> -            ret = copy_sectors(bs, start_sect,
> -                               cluster_offset, n_end, s->cluster_sectors);
> -            if (ret < 0)
> -                return 0;
> -        }
> -        tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> -    } else {
> +
> +    if (compressed_size) {
>          int nb_csectors;
>          cluster_offset = alloc_bytes(bs, compressed_size);
>          nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
>              (cluster_offset >> 9);
>          cluster_offset |= QCOW_OFLAG_COMPRESSED |
>              ((uint64_t)nb_csectors << s->csize_shift);
> -        /* compressed clusters never have the copied flag */
> -        tmp = cpu_to_be64(cluster_offset);
> +
> +        /* update L2 table
> +         * compressed clusters never have the copied flag
> +         */
> +
> +        l2_table[l2_index] = cpu_to_be64(cluster_offset);
> +        if (bdrv_pwrite(s->hd,
> +                        l2_offset + l2_index * sizeof(uint64_t),
> +                        l2_table + l2_index,
> +                        sizeof(uint64_t)) != sizeof(uint64_t))
> +            return 0;
> +
> +        return cluster_offset;
>      }
> +
> +    /* allocate a new cluster */
> +
> +    cluster_offset = alloc_clusters(bs, s->cluster_size);
> +
> +    /* we must initialize the cluster content which won't be
> +       written */
> +
> +    if ((n_end - n_start) < s->cluster_sectors) {
> +        uint64_t start_sect;
> +
> +        start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> +        ret = copy_sectors(bs, start_sect, cluster_offset, 0, n_start);
> +        if (ret < 0)
> +            return 0;
> +        ret = copy_sectors(bs, start_sect,
> +                           cluster_offset, n_end, s->cluster_sectors);
> +        if (ret < 0)
> +            return 0;
> +    }
> +
>      /* update L2 table */
> -    l2_table[l2_index] = tmp;
> +
> +    l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
>      if (bdrv_pwrite(s->hd,
> -                    l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp))
> +                    l2_offset + l2_index * sizeof(uint64_t),
> +                    l2_table + l2_index,
> +                    sizeof(uint64_t)) != sizeof(uint64_t))
>          return 0;
> +
>      return cluster_offset;
>  }


Why are you moving that code around? I don't think it's needed for the
get_cluster_offset/allocate_cluster_offset split. You're just
duplicating the write and return.

Otherwise, the patch looks fine and makes the whole thing much cleaner.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset()
  2008-08-05 15:13   ` Kevin Wolf
@ 2008-08-05 15:25     ` Laurent Vivier
  2008-08-05 15:41       ` Kevin Wolf
  0 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-08-05 15:25 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Le mardi 05 août 2008 à 17:13 +0200, Kevin Wolf a écrit :
[...]
> Why are you moving that code around? I don't think it's needed for the
> get_cluster_offset/allocate_cluster_offset split. You're just
> duplicating the write and return.

You're right but doing this here allows to clearly split compressed case
and normal case. It is doing for following patches.
 
> Otherwise, the patch looks fine and makes the whole thing much cleaner.

Thank you,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset()
  2008-08-05 15:25     ` Laurent Vivier
@ 2008-08-05 15:41       ` Kevin Wolf
  0 siblings, 0 replies; 29+ messages in thread
From: Kevin Wolf @ 2008-08-05 15:41 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> Le mardi 05 août 2008 à 17:13 +0200, Kevin Wolf a écrit :
> [...]
>> Why are you moving that code around? I don't think it's needed for the
>> get_cluster_offset/allocate_cluster_offset split. You're just
>> duplicating the write and return.
> 
> You're right but doing this here allows to clearly split compressed case
> and normal case. It is doing for following patches.

Yes, I've noticed that now by myself. Seems I have to look at the
complete patch series until I can tell if this makes sense. ;-)

For now, I assume you have good reasons to do it this way. So take it as
an Ack for this patch as it is.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset()
  2008-07-29 14:13 [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Laurent Vivier
  2008-07-29 14:13 ` [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset() Laurent Vivier
  2008-07-29 14:13 ` [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset() Laurent Vivier
@ 2008-07-29 14:13 ` Laurent Vivier
  2008-08-06 14:20   ` Kevin Wolf
  2008-07-29 14:13 ` [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters Laurent Vivier
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 14:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier

[-- Attachment #1: qcow2-compressed.patch --]
[-- Type: text/plain, Size: 6969 bytes --]

Divide alloc_cluster_offset() into alloc_cluster_offset() and
alloc_compressed_cluster_offset(). Factorize code to free clusters into
free_used_clusters().

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 block-qcow2.c |  105 ++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 66 insertions(+), 39 deletions(-)

Index: qemu/block-qcow2.c
===================================================================
--- qemu.orig/block-qcow2.c	2008-07-29 15:22:20.000000000 +0200
+++ qemu/block-qcow2.c	2008-07-29 15:22:26.000000000 +0200
@@ -598,14 +598,13 @@ static uint64_t get_cluster_offset(Block
     return cluster_offset & ~QCOW_OFLAG_COPIED;
 }
 
-static uint64_t alloc_cluster_offset(BlockDriverState *bs,
-                                     uint64_t offset,
-                                     int compressed_size,
-                                     int n_start, int n_end)
+static uint64_t free_used_clusters(BlockDriverState *bs, uint64_t offset,
+				   uint64_t **l2_table, uint64_t *l2_offset,
+                                   int *l2_index)
 {
     BDRVQcowState *s = bs->opaque;
-    int l1_index, l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset;
+    int l1_index, ret;
+    uint64_t cluster_offset;
 
     l1_index = offset >> (s->l2_bits + s->cluster_bits);
     if (l1_index >= s->l1_size) {
@@ -614,24 +613,24 @@ static uint64_t alloc_cluster_offset(Blo
             return 0;
     }
 
-    l2_offset = s->l1_table[l1_index];
-    if (l2_offset & QCOW_OFLAG_COPIED) {
-        ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
+    *l2_offset = s->l1_table[l1_index];
+    if (*l2_offset & QCOW_OFLAG_COPIED) {
+        ret = l2_load(bs, l1_index, l2_table, l2_offset);
         if (ret == 0)
             return 0;
     } else {
-        if (l2_offset)
-            free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
-        ret = l2_allocate(bs, l1_index, &l2_table, &l2_offset);
+        if (*l2_offset)
+            free_clusters(bs, *l2_offset, s->l2_size * sizeof(uint64_t));
+        ret = l2_allocate(bs, l1_index, l2_table, l2_offset);
         if (ret == 0)
             return 0;
     }
 
-    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
-    cluster_offset = be64_to_cpu(l2_table[l2_index]);
+    *l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
+    cluster_offset = be64_to_cpu((*l2_table)[*l2_index]);
 
     if (cluster_offset & QCOW_OFLAG_COPIED)
-        return cluster_offset & ~QCOW_OFLAG_COPIED;
+        return cluster_offset;
 
     if (cluster_offset) {
         /* free the cluster */
@@ -645,28 +644,56 @@ static uint64_t alloc_cluster_offset(Blo
             free_clusters(bs, cluster_offset, s->cluster_size);
         }
     }
+    return 0;
+}
 
-    if (compressed_size) {
-        int nb_csectors;
-        cluster_offset = alloc_bytes(bs, compressed_size);
-        nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
-            (cluster_offset >> 9);
-        cluster_offset |= QCOW_OFLAG_COMPRESSED |
-            ((uint64_t)nb_csectors << s->csize_shift);
-
-        /* update L2 table
-         * compressed clusters never have the copied flag
-         */
-
-        l2_table[l2_index] = cpu_to_be64(cluster_offset);
-        if (bdrv_pwrite(s->hd,
-                        l2_offset + l2_index * sizeof(uint64_t),
-                        l2_table + l2_index,
-                        sizeof(uint64_t)) != sizeof(uint64_t))
-            return 0;
+static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs,
+                                                 uint64_t offset,
+                                                 int compressed_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    int l2_index;
+    uint64_t l2_offset, *l2_table, cluster_offset;
+    int nb_csectors;
+
+    cluster_offset = free_used_clusters(bs, offset,
+                                        &l2_table, &l2_offset, &l2_index);
+    if (cluster_offset & QCOW_OFLAG_COPIED)
+        return cluster_offset & ~QCOW_OFLAG_COPIED;
+
+    cluster_offset = alloc_bytes(bs, compressed_size);
+    nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
+        (cluster_offset >> 9);
+    cluster_offset |= QCOW_OFLAG_COMPRESSED |
+        ((uint64_t)nb_csectors << s->csize_shift);
+
+    /* update L2 table
+     * compressed clusters never have the copied flag
+     */
+
+    l2_table[l2_index] = cpu_to_be64(cluster_offset);
+    if (bdrv_pwrite(s->hd,
+                    l2_offset + l2_index * sizeof(uint64_t),
+                    l2_table + l2_index,
+                    sizeof(uint64_t)) != sizeof(uint64_t))
+        return 0;
+
+    return cluster_offset;
+}
+
+static uint64_t alloc_cluster_offset(BlockDriverState *bs,
+                                     uint64_t offset,
+                                     int n_start, int n_end)
+{
+    BDRVQcowState *s = bs->opaque;
+    int l2_index, ret;
+    uint64_t l2_offset, *l2_table, cluster_offset;
 
-        return cluster_offset;
-    }
+
+    cluster_offset = free_used_clusters(bs, offset,
+                                        &l2_table, &l2_offset, &l2_index);
+    if (cluster_offset & QCOW_OFLAG_COPIED)
+        return cluster_offset & ~QCOW_OFLAG_COPIED;
 
     /* allocate a new cluster */
 
@@ -838,7 +865,7 @@ static int qcow_write(BlockDriverState *
         n = s->cluster_sectors - index_in_cluster;
         if (n > nb_sectors)
             n = nb_sectors;
-        cluster_offset = alloc_cluster_offset(bs, sector_num << 9, 0,
+        cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
                                               index_in_cluster,
                                               index_in_cluster + n);
         if (!cluster_offset)
@@ -1022,7 +1049,7 @@ static void qcow_aio_write_cb(void *opaq
     acb->n = s->cluster_sectors - index_in_cluster;
     if (acb->n > acb->nb_sectors)
         acb->n = acb->nb_sectors;
-    cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9, 0,
+    cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
                                           index_in_cluster,
                                           index_in_cluster + acb->n);
     if (!cluster_offset || (cluster_offset & 511) != 0) {
@@ -1284,8 +1311,8 @@ static int qcow_write_compressed(BlockDr
         /* could not compress: write normal cluster */
         qcow_write(bs, sector_num, buf, s->cluster_sectors);
     } else {
-        cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
-                                              out_len, 0, 0);
+        cluster_offset = alloc_compressed_cluster_offset(bs, sector_num << 9,
+                                                         out_len);
         cluster_offset &= s->cluster_offset_mask;
         if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) {
             qemu_free(out_buf);

--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset()
  2008-07-29 14:13 ` [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset() Laurent Vivier
@ 2008-08-06 14:20   ` Kevin Wolf
  2008-08-06 14:41     ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2008-08-06 14:20 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> Divide alloc_cluster_offset() into alloc_cluster_offset() and
> alloc_compressed_cluster_offset(). Factorize code to free clusters into
> free_used_clusters().
> 
> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>

I think free_used_clusters() is misnamed. That it frees clusters is more
of an additional action it performs. What it really is doing is to load
the appropriate L2 table into memory (and to allocate it if needed).

Also, the current function name doesn't provide a hint how the return
value is to be interpreted. If I'm not mistaken, 0 could mean that
everything went fine and the cluster has been freed, or it could mean
that an error occurred.

Maybe you meant to return cluster_offset instead of 0 at the end of the
function (that you check for QCOW_OFLAG_COPIED which is _always_ set if
the function returns != 0 suggests this), but I can't tell for sure
because there is no documentation on the return value.

So I suggest that, besides renaming and possibly fixing, you add a
header comment to the function which describes the whole functionality
it performs (it's too much to fit in a function name) and what the
return value actually means.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset()
  2008-08-06 14:20   ` Kevin Wolf
@ 2008-08-06 14:41     ` Laurent Vivier
  2008-08-06 14:56       ` Kevin Wolf
  0 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-08-06 14:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Le mercredi 06 août 2008 à 16:20 +0200, Kevin Wolf a écrit :
> Laurent Vivier schrieb:
> > Divide alloc_cluster_offset() into alloc_cluster_offset() and
> > alloc_compressed_cluster_offset(). Factorize code to free clusters into
> > free_used_clusters().
> > 
> > Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> 
> I think free_used_clusters() is misnamed. That it frees clusters is more
> of an additional action it performs. What it really is doing is to load
> the appropriate L2 table into memory (and to allocate it if needed).

It frees the cluster if it is found except if it has the flag
QCOW_OFLAG_COPIED. To do that it needs to load the L2 table.

But I'm open to a new name, I don't like this one too.

> Also, the current function name doesn't provide a hint how the return
> value is to be interpreted. If I'm not mistaken, 0 could mean that
> everything went fine and the cluster has been freed, or it could mean
> that an error occurred.

Yes, but this is the original behavior and I don't want to modify it.

> 
> Maybe you meant to return cluster_offset instead of 0 at the end of the

No, at the end of function the cluster has been freed and the offset is
not valid anymore.

> function (that you check for QCOW_OFLAG_COPIED which is _always_ set if
> the function returns != 0 suggests this), but I can't tell for sure
> because there is no documentation on the return value.

You're right. But to return the offset with the flag allows to
differentiate the case returning 0 from an offset equal to 0.
If offset is 0, it returns (0 | QCOW_OFLAG_COPIED), but I agree to say
that offset cannot be 0, but again I keep the original behavior.

> So I suggest that, besides renaming and possibly fixing, you add a
> header comment to the function which describes the whole functionality
> it performs (it's too much to fit in a function name) and what the
> return value actually means.

I agree. Could you propose a name ?

Regards,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset()
  2008-08-06 14:41     ` Laurent Vivier
@ 2008-08-06 14:56       ` Kevin Wolf
  2008-08-06 15:03         ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2008-08-06 14:56 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> Le mercredi 06 août 2008 à 16:20 +0200, Kevin Wolf a écrit :
>> Laurent Vivier schrieb:
>>> Divide alloc_cluster_offset() into alloc_cluster_offset() and
>>> alloc_compressed_cluster_offset(). Factorize code to free clusters into
>>> free_used_clusters().
>>>
>>> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
>> I think free_used_clusters() is misnamed. That it frees clusters is more
>> of an additional action it performs. What it really is doing is to load
>> the appropriate L2 table into memory (and to allocate it if needed).
> 
> It frees the cluster if it is found except if it has the flag
> QCOW_OFLAG_COPIED. To do that it needs to load the L2 table.
> 
> But I'm open to a new name, I don't like this one too.

Well, yes and no. Of course, it needs to have the L2 table to free the
cluster. But it does not only load that table for local usage but
returns it to the caller using the parameters passed by reference. So
this loading doesn't only happen internally but it belongs to the interface.

Actually, I'm also having a hard time finding a fitting name for the
function. Maybe that's an indication that the function is doing two
things which don't really belong together. I'm not sure if it makes
sense, but you could do the L2 table loading in another function which
is called before free_used_clusters() and pass the already loaded table
into free_used_clusters().

>> Also, the current function name doesn't provide a hint how the return
>> value is to be interpreted. If I'm not mistaken, 0 could mean that
>> everything went fine and the cluster has been freed, or it could mean
>> that an error occurred.
> 
> Yes, but this is the original behavior and I don't want to modify it.
> 
>> Maybe you meant to return cluster_offset instead of 0 at the end of the
> 
> No, at the end of function the cluster has been freed and the offset is
> not valid anymore.

Yes, you have a point there...

>> function (that you check for QCOW_OFLAG_COPIED which is _always_ set if
>> the function returns != 0 suggests this), but I can't tell for sure
>> because there is no documentation on the return value.
> 
> You're right. But to return the offset with the flag allows to
> differentiate the case returning 0 from an offset equal to 0.
> If offset is 0, it returns (0 | QCOW_OFLAG_COPIED), but I agree to say
> that offset cannot be 0, but again I keep the original behavior.

I fully agree if you want to keep the original behaviour in the external
interface. However, free_used_clusters() is only used in your own
functions, so you could actually make that change without a too big risk.

But the most important part is really the documentation. You see how I
misunderstood what you were trying to achieve. This could be easily avoided.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset()
  2008-08-06 14:56       ` Kevin Wolf
@ 2008-08-06 15:03         ` Laurent Vivier
  0 siblings, 0 replies; 29+ messages in thread
From: Laurent Vivier @ 2008-08-06 15:03 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Le mercredi 06 août 2008 à 16:56 +0200, Kevin Wolf a écrit :
> Laurent Vivier schrieb:
> > Le mercredi 06 août 2008 à 16:20 +0200, Kevin Wolf a écrit :
> >> Laurent Vivier schrieb:
> >>> Divide alloc_cluster_offset() into alloc_cluster_offset() and
> >>> alloc_compressed_cluster_offset(). Factorize code to free clusters into
> >>> free_used_clusters().
> >>>
> >>> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> >> I think free_used_clusters() is misnamed. That it frees clusters is more
> >> of an additional action it performs. What it really is doing is to load
> >> the appropriate L2 table into memory (and to allocate it if needed).
> > 
> > It frees the cluster if it is found except if it has the flag
> > QCOW_OFLAG_COPIED. To do that it needs to load the L2 table.
> > 
> > But I'm open to a new name, I don't like this one too.
> 
> Well, yes and no. Of course, it needs to have the L2 table to free the
> cluster. But it does not only load that table for local usage but
> returns it to the caller using the parameters passed by reference. So
> this loading doesn't only happen internally but it belongs to the interface.

I agree.

> 
> Actually, I'm also having a hard time finding a fitting name for the
> function. Maybe that's an indication that the function is doing two
> things which don't really belong together. I'm not sure if it makes
> sense, but you could do the L2 table loading in another function which
> is called before free_used_clusters() and pass the already loaded table
> into free_used_clusters().

I agree.

> >> Also, the current function name doesn't provide a hint how the return
> >> value is to be interpreted. If I'm not mistaken, 0 could mean that
> >> everything went fine and the cluster has been freed, or it could mean
> >> that an error occurred.
> > 
> > Yes, but this is the original behavior and I don't want to modify it.
> > 
> >> Maybe you meant to return cluster_offset instead of 0 at the end of the
> > 
> > No, at the end of function the cluster has been freed and the offset is
> > not valid anymore.
> 
> Yes, you have a point there...
> 
> >> function (that you check for QCOW_OFLAG_COPIED which is _always_ set if
> >> the function returns != 0 suggests this), but I can't tell for sure
> >> because there is no documentation on the return value.
> > 
> > You're right. But to return the offset with the flag allows to
> > differentiate the case returning 0 from an offset equal to 0.
> > If offset is 0, it returns (0 | QCOW_OFLAG_COPIED), but I agree to say
> > that offset cannot be 0, but again I keep the original behavior.
> 
> I fully agree if you want to keep the original behaviour in the external
> interface. However, free_used_clusters() is only used in your own
> functions, so you could actually make that change without a too big risk.

I'll try. No promise...

> But the most important part is really the documentation. You see how I
> misunderstood what you were trying to achieve. This could be easily avoided.

OK.

Thank you for your comments,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters.
  2008-07-29 14:13 [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Laurent Vivier
                   ` (2 preceding siblings ...)
  2008-07-29 14:13 ` [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset() Laurent Vivier
@ 2008-07-29 14:13 ` Laurent Vivier
  2008-08-11 12:10   ` Kevin Wolf
  2008-07-29 14:13 ` [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters Laurent Vivier
  2008-07-29 19:15 ` [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Anthony Liguori
  5 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 14:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier

[-- Attachment #1: qcow2-extent.patch --]
[-- Type: text/plain, Size: 12137 bytes --]

Modify get_cluster_offset(), alloc_cluster_offset() and free_used_clusters()
to specify how many clusters we want.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 block-qcow2.c |  212 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 154 insertions(+), 58 deletions(-)

Index: qemu/block-qcow2.c
===================================================================
--- qemu.orig/block-qcow2.c	2008-07-29 15:22:26.000000000 +0200
+++ qemu/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
@@ -575,32 +575,76 @@ static int l2_allocate(BlockDriverState 
     return 1;
 }
 
-static uint64_t get_cluster_offset(BlockDriverState *bs, uint64_t offset)
+static uint64_t get_cluster_offset(BlockDriverState *bs,
+                                   uint64_t offset, int *num)
 {
     BDRVQcowState *s = bs->opaque;
     int l1_index, l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset;
+    uint64_t l2_offset, *l2_table, cluster_offset, next;
+    int l1_bits;
+    int index_in_cluster, nb_available, nb_needed;
 
-    l1_index = offset >> (s->l2_bits + s->cluster_bits);
+    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
+    nb_needed = *num + index_in_cluster;
+
+    l1_bits = s->l2_bits + s->cluster_bits;
+
+    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
+    nb_available = (nb_available >> 9) + index_in_cluster;
+
+    cluster_offset = 0;
+
+    l1_index = offset >> l1_bits;
     if (l1_index >= s->l1_size)
-        return 0;
+        goto out;
 
     if (!s->l1_table[l1_index])
-        return 0;
+        goto out;
 
     ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
     if (ret == 0)
-        return 0;
+        goto out;
 
     l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
+    nb_available = s->cluster_sectors;
+    l2_index++;
+
+    if (!cluster_offset) {
+
+       /* how many empty clusters ? */
+
+       while (nb_available < nb_needed && !l2_table[l2_index]) {
+           l2_index++;
+           nb_available += s->cluster_sectors;
+       }
+
+   } else {
 
-    return cluster_offset & ~QCOW_OFLAG_COPIED;
+       /* how many allocated clusters ? */
+
+       cluster_offset &= ~QCOW_OFLAG_COPIED;
+       while (nb_available < nb_needed) {
+           next = be64_to_cpu(l2_table[l2_index]) & ~QCOW_OFLAG_COPIED;
+           if (next != cluster_offset + (nb_available << 9))
+               break;
+           l2_index++;
+           nb_available += s->cluster_sectors;
+       }
+   }
+
+out:
+    if (nb_available > nb_needed)
+        nb_available = nb_needed;
+
+    *num = nb_available - index_in_cluster;
+
+    return cluster_offset;
 }
 
 static uint64_t free_used_clusters(BlockDriverState *bs, uint64_t offset,
 				   uint64_t **l2_table, uint64_t *l2_offset,
-                                   int *l2_index)
+                                   int *l2_index, int *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int l1_index, ret;
@@ -629,21 +673,63 @@ static uint64_t free_used_clusters(Block
     *l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
     cluster_offset = be64_to_cpu((*l2_table)[*l2_index]);
 
-    if (cluster_offset & QCOW_OFLAG_COPIED)
+    if (nb_clusters && *nb_clusters > s->l2_size - (*l2_index))
+            *nb_clusters = s->l2_size - (*l2_index);
+
+    if (!cluster_offset) {
+        if (nb_clusters) {
+            int i = 1;
+            while (i < *nb_clusters && (*l2_table)[(*l2_index) + i] == 0) {
+                i++;
+            }
+            *nb_clusters = i;
+        }
+        return 0;
+    }
+
+    if (cluster_offset & QCOW_OFLAG_COPIED) {
+        if (nb_clusters) {
+            int i = 1;
+            uint64_t current;
+            while (i < *nb_clusters) {
+                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
+                if (cluster_offset + (i << s->cluster_bits) != current)
+                    break;
+                i++;
+            }
+            *nb_clusters = i;
+        }
         return cluster_offset;
+    }
 
-    if (cluster_offset) {
-        /* free the cluster */
-        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
-            int nb_csectors;
-            nb_csectors = ((cluster_offset >> s->csize_shift) &
-                           s->csize_mask) + 1;
-            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
-                          nb_csectors * 512);
-        } else {
-            free_clusters(bs, cluster_offset, s->cluster_size);
+    /* free the cluster */
+
+    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
+        int nb_csectors;
+        nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
+        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
+                      nb_csectors * 512);
+        if (nb_clusters)
+            *nb_clusters = 1;
+        return 0;
+    }
+
+    if (nb_clusters) {
+        int i = 1;
+        uint64_t current;
+        while (i < *nb_clusters) {
+            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
+            if (cluster_offset + (i << s->cluster_bits) != current)
+                break;
+            i++;
         }
+        *nb_clusters = i;
+        free_clusters(bs, cluster_offset, i << s->cluster_bits);
+        return 0;
     }
+
+    free_clusters(bs, cluster_offset, s->cluster_size);
+
     return 0;
 }
 
@@ -657,7 +743,8 @@ static uint64_t alloc_compressed_cluster
     int nb_csectors;
 
     cluster_offset = free_used_clusters(bs, offset,
-                                        &l2_table, &l2_offset, &l2_index);
+                                        &l2_table, &l2_offset, &l2_index,
+                                        NULL);
     if (cluster_offset & QCOW_OFLAG_COPIED)
         return cluster_offset & ~QCOW_OFLAG_COPIED;
 
@@ -683,63 +770,80 @@ static uint64_t alloc_compressed_cluster
 
 static uint64_t alloc_cluster_offset(BlockDriverState *bs,
                                      uint64_t offset,
-                                     int n_start, int n_end)
+                                     int n_start, int n_end,
+                                     int *num)
 {
     BDRVQcowState *s = bs->opaque;
     int l2_index, ret;
     uint64_t l2_offset, *l2_table, cluster_offset;
+    int nb_available, nb_clusters, i;
+    uint64_t start_sect;
 
+    nb_clusters = ((n_end << 9) + s->cluster_size - 1) >>
+                  s->cluster_bits;
 
     cluster_offset = free_used_clusters(bs, offset,
-                                        &l2_table, &l2_offset, &l2_index);
-    if (cluster_offset & QCOW_OFLAG_COPIED)
-        return cluster_offset & ~QCOW_OFLAG_COPIED;
+                                        &l2_table, &l2_offset, &l2_index,
+                                        &nb_clusters);
+    nb_available = nb_clusters << (s->cluster_bits - 9);
+    if (nb_available > n_end)
+        nb_available = n_end;
+
+    if (cluster_offset & QCOW_OFLAG_COPIED) {
+        cluster_offset &= ~QCOW_OFLAG_COPIED;
+        goto out;
+    }
 
-    /* allocate a new cluster */
+    /* allocate new clusters */
 
-    cluster_offset = alloc_clusters(bs, s->cluster_size);
+    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
 
     /* we must initialize the cluster content which won't be
        written */
 
-    if ((n_end - n_start) < s->cluster_sectors) {
-        uint64_t start_sect;
-
-        start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
+    start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
+    if (n_start) {
         ret = copy_sectors(bs, start_sect, cluster_offset, 0, n_start);
         if (ret < 0)
             return 0;
-        ret = copy_sectors(bs, start_sect,
-                           cluster_offset, n_end, s->cluster_sectors);
+    }
+
+    if (nb_available & (s->cluster_sectors - 1)) {
+        uint64_t end = nb_available & ~(uint64_t)(s->cluster_sectors - 1);
+        ret = copy_sectors(bs, start_sect + end,
+                           cluster_offset + (end << 9),
+                           nb_available - end,
+                           s->cluster_sectors);
         if (ret < 0)
             return 0;
     }
 
     /* update L2 table */
 
-    l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
+    for (i = 0; i < nb_clusters; i++)
+        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
+                                             (i << s->cluster_bits)) |
+                                             QCOW_OFLAG_COPIED);
+
     if (bdrv_pwrite(s->hd,
                     l2_offset + l2_index * sizeof(uint64_t),
                     l2_table + l2_index,
-                    sizeof(uint64_t)) != sizeof(uint64_t))
+                    nb_clusters * sizeof(uint64_t)) !=
+                    nb_clusters * sizeof(uint64_t))
         return 0;
 
+out:
+    *num = nb_available - n_start;
     return cluster_offset;
 }
 
 static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
                              int nb_sectors, int *pnum)
 {
-    BDRVQcowState *s = bs->opaque;
-    int index_in_cluster, n;
     uint64_t cluster_offset;
 
-    cluster_offset = get_cluster_offset(bs, sector_num << 9);
-    index_in_cluster = sector_num & (s->cluster_sectors - 1);
-    n = s->cluster_sectors - index_in_cluster;
-    if (n > nb_sectors)
-        n = nb_sectors;
-    *pnum = n;
+    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
+
     return (cluster_offset != 0);
 }
 
@@ -816,11 +920,9 @@ static int qcow_read(BlockDriverState *b
     uint64_t cluster_offset;
 
     while (nb_sectors > 0) {
-        cluster_offset = get_cluster_offset(bs, sector_num << 9);
+        n = nb_sectors;
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n = s->cluster_sectors - index_in_cluster;
-        if (n > nb_sectors)
-            n = nb_sectors;
         if (!cluster_offset) {
             if (bs->backing_hd) {
                 /* read from the base image */
@@ -862,12 +964,10 @@ static int qcow_write(BlockDriverState *
 
     while (nb_sectors > 0) {
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n = s->cluster_sectors - index_in_cluster;
-        if (n > nb_sectors)
-            n = nb_sectors;
         cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
                                               index_in_cluster,
-                                              index_in_cluster + n);
+                                              index_in_cluster + nb_sectors,
+                                              &n);
         if (!cluster_offset)
             return -1;
         if (s->crypt_method) {
@@ -940,11 +1040,9 @@ static void qcow_aio_read_cb(void *opaqu
     }
 
     /* prepare next AIO request */
-    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9);
+    acb->n = acb->nb_sectors;
+    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
     index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
-    acb->n = s->cluster_sectors - index_in_cluster;
-    if (acb->n > acb->nb_sectors)
-        acb->n = acb->nb_sectors;
 
     if (!acb->cluster_offset) {
         if (bs->backing_hd) {
@@ -1046,12 +1144,10 @@ static void qcow_aio_write_cb(void *opaq
     }
 
     index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
-    acb->n = s->cluster_sectors - index_in_cluster;
-    if (acb->n > acb->nb_sectors)
-        acb->n = acb->nb_sectors;
     cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
                                           index_in_cluster,
-                                          index_in_cluster + acb->n);
+                                          index_in_cluster + acb->nb_sectors,
+                                          &acb->n);
     if (!cluster_offset || (cluster_offset & 511) != 0) {
         ret = -EIO;
         goto fail;

--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters.
  2008-07-29 14:13 ` [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters Laurent Vivier
@ 2008-08-11 12:10   ` Kevin Wolf
  2008-08-11 12:39     ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2008-08-11 12:10 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> Modify get_cluster_offset(), alloc_cluster_offset() and free_used_clusters()
> to specify how many clusters we want.
> 
> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> ---
>  block-qcow2.c |  212 ++++++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 154 insertions(+), 58 deletions(-)
> 
> Index: qemu/block-qcow2.c
> ===================================================================
> --- qemu.orig/block-qcow2.c	2008-07-29 15:22:26.000000000 +0200
> +++ qemu/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
> @@ -575,32 +575,76 @@ static int l2_allocate(BlockDriverState 
>      return 1;
>  }
>  
> -static uint64_t get_cluster_offset(BlockDriverState *bs, uint64_t offset)
> +static uint64_t get_cluster_offset(BlockDriverState *bs,
> +                                   uint64_t offset, int *num)

I think you start to know what kind of comments I'll provide. So yes,
here's another one of them: While it's intuitive what value I should
pass for num, it's cleary not what the function will return in it. Or
even what the function is doing at all.

This is how I understand it: The returned num is the number of
contiguous clusters that can be read with a single read operation, i.e.
they are all sparse, come from a backing file or are physically
contiguous in the image file.

Add a comment which says this and I'll be happy.

>  {
>      BDRVQcowState *s = bs->opaque;
>      int l1_index, l2_index, ret;
> -    uint64_t l2_offset, *l2_table, cluster_offset;
> +    uint64_t l2_offset, *l2_table, cluster_offset, next;
> +    int l1_bits;
> +    int index_in_cluster, nb_available, nb_needed;
>  
> -    l1_index = offset >> (s->l2_bits + s->cluster_bits);
> +    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
> +    nb_needed = *num + index_in_cluster;
> +
> +    l1_bits = s->l2_bits + s->cluster_bits;
> +
> +    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
> +    nb_available = (nb_available >> 9) + index_in_cluster;

This could use a comment that nb_available is the remaining sectors in
the L2 table (is it?) and that it is used in the following two
conditions (the goto makes this non-obvious - at first, I thought that
this value wouldn't be used at all)

> +
> +    cluster_offset = 0;
> +
> +    l1_index = offset >> l1_bits;
>      if (l1_index >= s->l1_size)
> -        return 0;
> +        goto out;
>  
>      if (!s->l1_table[l1_index])
> -        return 0;
> +        goto out;
>  
>      ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
>      if (ret == 0)
> -        return 0;
> +        goto out;

ret == 0 means that loading the L2 table failed. This is a real error,
right? Isn't return 0 the right thing to do then?

>  
>      l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
> +    nb_available = s->cluster_sectors;
> +    l2_index++;
> +
> +    if (!cluster_offset) {
> +
> +       /* how many empty clusters ? */
> +
> +       while (nb_available < nb_needed && !l2_table[l2_index]) {
> +           l2_index++;
> +           nb_available += s->cluster_sectors;
> +       }
> +
> +   } else {
>  
> -    return cluster_offset & ~QCOW_OFLAG_COPIED;
> +       /* how many allocated clusters ? */
> +
> +       cluster_offset &= ~QCOW_OFLAG_COPIED;
> +       while (nb_available < nb_needed) {
> +           next = be64_to_cpu(l2_table[l2_index]) & ~QCOW_OFLAG_COPIED;
> +           if (next != cluster_offset + (nb_available << 9))
> +               break;
> +           l2_index++;
> +           nb_available += s->cluster_sectors;
> +       }
> +   }
> +
> +out:
> +    if (nb_available > nb_needed)
> +        nb_available = nb_needed;
> +
> +    *num = nb_available - index_in_cluster;
> +
> +    return cluster_offset;
>  }
>  
>  static uint64_t free_used_clusters(BlockDriverState *bs, uint64_t offset,
>  				   uint64_t **l2_table, uint64_t *l2_offset,
> -                                   int *l2_index)
> +                                   int *l2_index, int *nb_clusters)

You would save some ifs if you didn't allow nb_cluster to be NULL.
Passing a local variable containing 1 should do the very same thing and
seems to be less error prone. Otherwise, put a note here which says what
passing NULL means.

>  {
>      BDRVQcowState *s = bs->opaque;
>      int l1_index, ret;
> @@ -629,21 +673,63 @@ static uint64_t free_used_clusters(Block
>      *l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
>      cluster_offset = be64_to_cpu((*l2_table)[*l2_index]);
>  
> -    if (cluster_offset & QCOW_OFLAG_COPIED)
> +    if (nb_clusters && *nb_clusters > s->l2_size - (*l2_index))
> +            *nb_clusters = s->l2_size - (*l2_index);
> +
> +    if (!cluster_offset) {
> +        if (nb_clusters) {
> +            int i = 1;
> +            while (i < *nb_clusters && (*l2_table)[(*l2_index) + i] == 0) {
> +                i++;
> +            }
> +            *nb_clusters = i;
> +        }
> +        return 0;
> +    }
> +
> +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> +        if (nb_clusters) {
> +            int i = 1;
> +            uint64_t current;
> +            while (i < *nb_clusters) {
> +                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> +                if (cluster_offset + (i << s->cluster_bits) != current)
> +                    break;
> +                i++;
> +            }
> +            *nb_clusters = i;
> +        }
>          return cluster_offset;
> +    }
>  
> -    if (cluster_offset) {
> -        /* free the cluster */
> -        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> -            int nb_csectors;
> -            nb_csectors = ((cluster_offset >> s->csize_shift) &
> -                           s->csize_mask) + 1;
> -            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> -                          nb_csectors * 512);
> -        } else {
> -            free_clusters(bs, cluster_offset, s->cluster_size);
> +    /* free the cluster */
> +
> +    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> +        int nb_csectors;
> +        nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
> +        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> +                      nb_csectors * 512);
> +        if (nb_clusters)
> +            *nb_clusters = 1;
> +        return 0;
> +    }
> +
> +    if (nb_clusters) {
> +        int i = 1;
> +        uint64_t current;
> +        while (i < *nb_clusters) {
> +            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> +            if (cluster_offset + (i << s->cluster_bits) != current)
> +                break;
> +            i++;
>          }
> +        *nb_clusters = i;
> +        free_clusters(bs, cluster_offset, i << s->cluster_bits);
> +        return 0;
>      }
> +
> +    free_clusters(bs, cluster_offset, s->cluster_size);
> +
>      return 0;
>  }
>  
> @@ -657,7 +743,8 @@ static uint64_t alloc_compressed_cluster
>      int nb_csectors;
>  
>      cluster_offset = free_used_clusters(bs, offset,
> -                                        &l2_table, &l2_offset, &l2_index);
> +                                        &l2_table, &l2_offset, &l2_index,
> +                                        NULL);
>      if (cluster_offset & QCOW_OFLAG_COPIED)
>          return cluster_offset & ~QCOW_OFLAG_COPIED;
>  
> @@ -683,63 +770,80 @@ static uint64_t alloc_compressed_cluster
>  
>  static uint64_t alloc_cluster_offset(BlockDriverState *bs,
>                                       uint64_t offset,
> -                                     int n_start, int n_end)
> +                                     int n_start, int n_end,
> +                                     int *num)

The interface between get_cluster_offset and alloc_cluster_offset is
inconsistent. In the former function, the value passed in num is used to
determine the number of clusters to get. In the latter, num is an output
parameter whose value isn't used. This is confusing.

>  {
>      BDRVQcowState *s = bs->opaque;
>      int l2_index, ret;
>      uint64_t l2_offset, *l2_table, cluster_offset;
> +    int nb_available, nb_clusters, i;
> +    uint64_t start_sect;
>  
> +    nb_clusters = ((n_end << 9) + s->cluster_size - 1) >>
> +                  s->cluster_bits;
>  
>      cluster_offset = free_used_clusters(bs, offset,
> -                                        &l2_table, &l2_offset, &l2_index);
> -    if (cluster_offset & QCOW_OFLAG_COPIED)
> -        return cluster_offset & ~QCOW_OFLAG_COPIED;
> +                                        &l2_table, &l2_offset, &l2_index,
> +                                        &nb_clusters);
> +    nb_available = nb_clusters << (s->cluster_bits - 9);
> +    if (nb_available > n_end)
> +        nb_available = n_end;
> +
> +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> +        cluster_offset &= ~QCOW_OFLAG_COPIED;
> +        goto out;
> +    }
>  
> -    /* allocate a new cluster */
> +    /* allocate new clusters */
>  
> -    cluster_offset = alloc_clusters(bs, s->cluster_size);
> +    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
>  
>      /* we must initialize the cluster content which won't be
>         written */
>  
> -    if ((n_end - n_start) < s->cluster_sectors) {
> -        uint64_t start_sect;
> -
> -        start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> +    start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> +    if (n_start) {
>          ret = copy_sectors(bs, start_sect, cluster_offset, 0, n_start);
>          if (ret < 0)
>              return 0;
> -        ret = copy_sectors(bs, start_sect,
> -                           cluster_offset, n_end, s->cluster_sectors);
> +    }
> +
> +    if (nb_available & (s->cluster_sectors - 1)) {
> +        uint64_t end = nb_available & ~(uint64_t)(s->cluster_sectors - 1);
> +        ret = copy_sectors(bs, start_sect + end,
> +                           cluster_offset + (end << 9),
> +                           nb_available - end,
> +                           s->cluster_sectors);
>          if (ret < 0)
>              return 0;
>      }
>  
>      /* update L2 table */
>  
> -    l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> +    for (i = 0; i < nb_clusters; i++)
> +        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> +                                             (i << s->cluster_bits)) |
> +                                             QCOW_OFLAG_COPIED);
> +
>      if (bdrv_pwrite(s->hd,
>                      l2_offset + l2_index * sizeof(uint64_t),
>                      l2_table + l2_index,
> -                    sizeof(uint64_t)) != sizeof(uint64_t))
> +                    nb_clusters * sizeof(uint64_t)) !=
> +                    nb_clusters * sizeof(uint64_t))
>          return 0;
>  
> +out:
> +    *num = nb_available - n_start;
>      return cluster_offset;
>  }
>  
>  static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
>                               int nb_sectors, int *pnum)
>  {
> -    BDRVQcowState *s = bs->opaque;
> -    int index_in_cluster, n;
>      uint64_t cluster_offset;
>  
> -    cluster_offset = get_cluster_offset(bs, sector_num << 9);
> -    index_in_cluster = sector_num & (s->cluster_sectors - 1);
> -    n = s->cluster_sectors - index_in_cluster;
> -    if (n > nb_sectors)
> -        n = nb_sectors;
> -    *pnum = n;
> +    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
> +
>      return (cluster_offset != 0);
>  }
>  
> @@ -816,11 +920,9 @@ static int qcow_read(BlockDriverState *b
>      uint64_t cluster_offset;
>  
>      while (nb_sectors > 0) {
> -        cluster_offset = get_cluster_offset(bs, sector_num << 9);
> +        n = nb_sectors;
> +        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
>          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> -        n = s->cluster_sectors - index_in_cluster;
> -        if (n > nb_sectors)
> -            n = nb_sectors;
>          if (!cluster_offset) {
>              if (bs->backing_hd) {
>                  /* read from the base image */
> @@ -862,12 +964,10 @@ static int qcow_write(BlockDriverState *
>  
>      while (nb_sectors > 0) {
>          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> -        n = s->cluster_sectors - index_in_cluster;
> -        if (n > nb_sectors)
> -            n = nb_sectors;
>          cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
>                                                index_in_cluster,
> -                                              index_in_cluster + n);
> +                                              index_in_cluster + nb_sectors,
> +                                              &n);
>          if (!cluster_offset)
>              return -1;
>          if (s->crypt_method) {
> @@ -940,11 +1040,9 @@ static void qcow_aio_read_cb(void *opaqu
>      }
>  
>      /* prepare next AIO request */
> -    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9);
> +    acb->n = acb->nb_sectors;
> +    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
>      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> -    acb->n = s->cluster_sectors - index_in_cluster;
> -    if (acb->n > acb->nb_sectors)
> -        acb->n = acb->nb_sectors;
>  
>      if (!acb->cluster_offset) {
>          if (bs->backing_hd) {
> @@ -1046,12 +1144,10 @@ static void qcow_aio_write_cb(void *opaq
>      }
>  
>      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> -    acb->n = s->cluster_sectors - index_in_cluster;
> -    if (acb->n > acb->nb_sectors)
> -        acb->n = acb->nb_sectors;
>      cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
>                                            index_in_cluster,
> -                                          index_in_cluster + acb->n);
> +                                          index_in_cluster + acb->nb_sectors,
> +                                          &acb->n);
>      if (!cluster_offset || (cluster_offset & 511) != 0) {
>          ret = -EIO;
>          goto fail;

In the writing functions, you can't just assign a big n, because
s->cluster_data will be too small when processing encrypted data. As you
said you fixed a segfault, I think you know this one already.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters.
  2008-08-11 12:10   ` Kevin Wolf
@ 2008-08-11 12:39     ` Laurent Vivier
  0 siblings, 0 replies; 29+ messages in thread
From: Laurent Vivier @ 2008-08-11 12:39 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Hi Kevin,

BTW, I'm currently rewriting this patch...

Le lundi 11 août 2008 à 14:10 +0200, Kevin Wolf a écrit :
> Laurent Vivier schrieb:
> > Modify get_cluster_offset(), alloc_cluster_offset() and free_used_clusters()
> > to specify how many clusters we want.
> > 
> > Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> > ---
> >  block-qcow2.c |  212 ++++++++++++++++++++++++++++++++++++++++++----------------
> >  1 file changed, 154 insertions(+), 58 deletions(-)
> > 
> > Index: qemu/block-qcow2.c
> > ===================================================================
> > --- qemu.orig/block-qcow2.c	2008-07-29 15:22:26.000000000 +0200
> > +++ qemu/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
> > @@ -575,32 +575,76 @@ static int l2_allocate(BlockDriverState 
> >      return 1;
> >  }
> >  
> > -static uint64_t get_cluster_offset(BlockDriverState *bs, uint64_t offset)
> > +static uint64_t get_cluster_offset(BlockDriverState *bs,
> > +                                   uint64_t offset, int *num)
> 
> I think you start to know what kind of comments I'll provide. So yes,
> here's another one of them: While it's intuitive what value I should
> pass for num, it's cleary not what the function will return in it. Or
> even what the function is doing at all.
> 
> This is how I understand it: The returned num is the number of
> contiguous clusters that can be read with a single read operation, i.e.
> they are all sparse, come from a backing file or are physically
> contiguous in the image file.

Yes

> Add a comment which says this and I'll be happy.

OK (I'll cut&paste the lines above ;-) )

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l1_index, l2_index, ret;
> > -    uint64_t l2_offset, *l2_table, cluster_offset;
> > +    uint64_t l2_offset, *l2_table, cluster_offset, next;
> > +    int l1_bits;
> > +    int index_in_cluster, nb_available, nb_needed;
> >  
> > -    l1_index = offset >> (s->l2_bits + s->cluster_bits);
> > +    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
> > +    nb_needed = *num + index_in_cluster;
> > +
> > +    l1_bits = s->l2_bits + s->cluster_bits;
> > +
> > +    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
> > +    nb_available = (nb_available >> 9) + index_in_cluster;
> 
> This could use a comment that nb_available is the remaining sectors in
> the L2 table (is it?) and that it is used in the following two
> conditions (the goto makes this non-obvious - at first, I thought that
> this value wouldn't be used at all)

nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));

is the number of bytes given by one l1 entry

nb_available = (nb_available >> 9) + index_in_cluster;

is the number of sectors from the first sector we want to the last
sector of the same l1 entry.

I do that because I don't want to manage the case where we run across a
l1 cache entry boundary.

I guess you want comments ?

> > +
> > +    cluster_offset = 0;
> > +
> > +    l1_index = offset >> l1_bits;
> >      if (l1_index >= s->l1_size)
> > -        return 0;
> > +        goto out;
> >  
> >      if (!s->l1_table[l1_index])
> > -        return 0;
> > +        goto out;
> >  
> >      ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
> >      if (ret == 0)
> > -        return 0;
> > +        goto out;
> 
> ret == 0 means that loading the L2 table failed. This is a real error,
> right? Isn't return 0 the right thing to do then?

Yes... :-P

> >  
> >      l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> >      cluster_offset = be64_to_cpu(l2_table[l2_index]);
> > +    nb_available = s->cluster_sectors;
> > +    l2_index++;
> > +
> > +    if (!cluster_offset) {
> > +
> > +       /* how many empty clusters ? */
> > +
> > +       while (nb_available < nb_needed && !l2_table[l2_index]) {
> > +           l2_index++;
> > +           nb_available += s->cluster_sectors;
> > +       }
> > +
> > +   } else {
> >  
> > -    return cluster_offset & ~QCOW_OFLAG_COPIED;
> > +       /* how many allocated clusters ? */
> > +
> > +       cluster_offset &= ~QCOW_OFLAG_COPIED;
> > +       while (nb_available < nb_needed) {
> > +           next = be64_to_cpu(l2_table[l2_index]) & ~QCOW_OFLAG_COPIED;
> > +           if (next != cluster_offset + (nb_available << 9))
> > +               break;
> > +           l2_index++;
> > +           nb_available += s->cluster_sectors;
> > +       }
> > +   }
> > +
> > +out:
> > +    if (nb_available > nb_needed)
> > +        nb_available = nb_needed;
> > +
> > +    *num = nb_available - index_in_cluster;
> > +
> > +    return cluster_offset;
> >  }
> >  
> >  static uint64_t free_used_clusters(BlockDriverState *bs, uint64_t offset,
> >  				   uint64_t **l2_table, uint64_t *l2_offset,
> > -                                   int *l2_index)
> > +                                   int *l2_index, int *nb_clusters)
> 
> You would save some ifs if you didn't allow nb_cluster to be NULL.
> Passing a local variable containing 1 should do the very same thing and
> seems to be less error prone. Otherwise, put a note here which says what
> passing NULL means.
> 

To follow your previous comments (patch 3), free_used_clusters() has
been removed from this patch...

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l1_index, ret;
> > @@ -629,21 +673,63 @@ static uint64_t free_used_clusters(Block
> >      *l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> >      cluster_offset = be64_to_cpu((*l2_table)[*l2_index]);
> >  
> > -    if (cluster_offset & QCOW_OFLAG_COPIED)
> > +    if (nb_clusters && *nb_clusters > s->l2_size - (*l2_index))
> > +            *nb_clusters = s->l2_size - (*l2_index);
> > +
> > +    if (!cluster_offset) {
> > +        if (nb_clusters) {
> > +            int i = 1;
> > +            while (i < *nb_clusters && (*l2_table)[(*l2_index) + i] == 0) {
> > +                i++;
> > +            }
> > +            *nb_clusters = i;
> > +        }
> > +        return 0;
> > +    }
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> > +        if (nb_clusters) {
> > +            int i = 1;
> > +            uint64_t current;
> > +            while (i < *nb_clusters) {
> > +                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > +                if (cluster_offset + (i << s->cluster_bits) != current)
> > +                    break;
> > +                i++;
> > +            }
> > +            *nb_clusters = i;
> > +        }
> >          return cluster_offset;
> > +    }
> >  
> > -    if (cluster_offset) {
> > -        /* free the cluster */
> > -        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> > -            int nb_csectors;
> > -            nb_csectors = ((cluster_offset >> s->csize_shift) &
> > -                           s->csize_mask) + 1;
> > -            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> > -                          nb_csectors * 512);
> > -        } else {
> > -            free_clusters(bs, cluster_offset, s->cluster_size);
> > +    /* free the cluster */
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> > +        int nb_csectors;
> > +        nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
> > +        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> > +                      nb_csectors * 512);
> > +        if (nb_clusters)
> > +            *nb_clusters = 1;
> > +        return 0;
> > +    }
> > +
> > +    if (nb_clusters) {
> > +        int i = 1;
> > +        uint64_t current;
> > +        while (i < *nb_clusters) {
> > +            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > +            if (cluster_offset + (i << s->cluster_bits) != current)
> > +                break;
> > +            i++;
> >          }
> > +        *nb_clusters = i;
> > +        free_clusters(bs, cluster_offset, i << s->cluster_bits);
> > +        return 0;
> >      }
> > +
> > +    free_clusters(bs, cluster_offset, s->cluster_size);
> > +
> >      return 0;
> >  }
> >  
> > @@ -657,7 +743,8 @@ static uint64_t alloc_compressed_cluster
> >      int nb_csectors;
> >  
> >      cluster_offset = free_used_clusters(bs, offset,
> > -                                        &l2_table, &l2_offset, &l2_index);
> > +                                        &l2_table, &l2_offset, &l2_index,
> > +                                        NULL);
> >      if (cluster_offset & QCOW_OFLAG_COPIED)
> >          return cluster_offset & ~QCOW_OFLAG_COPIED;
> >  
> > @@ -683,63 +770,80 @@ static uint64_t alloc_compressed_cluster
> >  
> >  static uint64_t alloc_cluster_offset(BlockDriverState *bs,
> >                                       uint64_t offset,
> > -                                     int n_start, int n_end)
> > +                                     int n_start, int n_end,
> > +                                     int *num)
> 
> The interface between get_cluster_offset and alloc_cluster_offset is
> inconsistent. In the former function, the value passed in num is used to
> determine the number of clusters to get. In the latter, num is an output
> parameter whose value isn't used. This is confusing.

Yes, I know, I have to think about this.

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l2_index, ret;
> >      uint64_t l2_offset, *l2_table, cluster_offset;
> > +    int nb_available, nb_clusters, i;
> > +    uint64_t start_sect;
> >  
> > +    nb_clusters = ((n_end << 9) + s->cluster_size - 1) >>
> > +                  s->cluster_bits;
> >  
> >      cluster_offset = free_used_clusters(bs, offset,
> > -                                        &l2_table, &l2_offset, &l2_index);
> > -    if (cluster_offset & QCOW_OFLAG_COPIED)
> > -        return cluster_offset & ~QCOW_OFLAG_COPIED;
> > +                                        &l2_table, &l2_offset, &l2_index,
> > +                                        &nb_clusters);
> > +    nb_available = nb_clusters << (s->cluster_bits - 9);
> > +    if (nb_available > n_end)
> > +        nb_available = n_end;
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> > +        cluster_offset &= ~QCOW_OFLAG_COPIED;
> > +        goto out;
> > +    }
> >  
> > -    /* allocate a new cluster */
> > +    /* allocate new clusters */
> >  
> > -    cluster_offset = alloc_clusters(bs, s->cluster_size);
> > +    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
> >  
> >      /* we must initialize the cluster content which won't be
> >         written */
> >  
> > -    if ((n_end - n_start) < s->cluster_sectors) {
> > -        uint64_t start_sect;
> > -
> > -        start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> > +    start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> > +    if (n_start) {
> >          ret = copy_sectors(bs, start_sect, cluster_offset, 0, n_start);
> >          if (ret < 0)
> >              return 0;
> > -        ret = copy_sectors(bs, start_sect,
> > -                           cluster_offset, n_end, s->cluster_sectors);
> > +    }
> > +
> > +    if (nb_available & (s->cluster_sectors - 1)) {
> > +        uint64_t end = nb_available & ~(uint64_t)(s->cluster_sectors - 1);
> > +        ret = copy_sectors(bs, start_sect + end,
> > +                           cluster_offset + (end << 9),
> > +                           nb_available - end,
> > +                           s->cluster_sectors);
> >          if (ret < 0)
> >              return 0;
> >      }
> >  
> >      /* update L2 table */
> >  
> > -    l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> > +    for (i = 0; i < nb_clusters; i++)
> > +        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> > +                                             (i << s->cluster_bits)) |
> > +                                             QCOW_OFLAG_COPIED);
> > +
> >      if (bdrv_pwrite(s->hd,
> >                      l2_offset + l2_index * sizeof(uint64_t),
> >                      l2_table + l2_index,
> > -                    sizeof(uint64_t)) != sizeof(uint64_t))
> > +                    nb_clusters * sizeof(uint64_t)) !=
> > +                    nb_clusters * sizeof(uint64_t))
> >          return 0;
> >  
> > +out:
> > +    *num = nb_available - n_start;
> >      return cluster_offset;
> >  }
> >  
> >  static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
> >                               int nb_sectors, int *pnum)
> >  {
> > -    BDRVQcowState *s = bs->opaque;
> > -    int index_in_cluster, n;
> >      uint64_t cluster_offset;
> >  
> > -    cluster_offset = get_cluster_offset(bs, sector_num << 9);
> > -    index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -    n = s->cluster_sectors - index_in_cluster;
> > -    if (n > nb_sectors)
> > -        n = nb_sectors;
> > -    *pnum = n;
> > +    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
> > +
> >      return (cluster_offset != 0);
> >  }
> >  
> > @@ -816,11 +920,9 @@ static int qcow_read(BlockDriverState *b
> >      uint64_t cluster_offset;
> >  
> >      while (nb_sectors > 0) {
> > -        cluster_offset = get_cluster_offset(bs, sector_num << 9);
> > +        n = nb_sectors;
> > +        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
> >          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -        n = s->cluster_sectors - index_in_cluster;
> > -        if (n > nb_sectors)
> > -            n = nb_sectors;
> >          if (!cluster_offset) {
> >              if (bs->backing_hd) {
> >                  /* read from the base image */
> > @@ -862,12 +964,10 @@ static int qcow_write(BlockDriverState *
> >  
> >      while (nb_sectors > 0) {
> >          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -        n = s->cluster_sectors - index_in_cluster;
> > -        if (n > nb_sectors)
> > -            n = nb_sectors;
> >          cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
> >                                                index_in_cluster,
> > -                                              index_in_cluster + n);
> > +                                              index_in_cluster + nb_sectors,
> > +                                              &n);
> >          if (!cluster_offset)
> >              return -1;
> >          if (s->crypt_method) {
> > @@ -940,11 +1040,9 @@ static void qcow_aio_read_cb(void *opaqu
> >      }
> >  
> >      /* prepare next AIO request */
> > -    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9);
> > +    acb->n = acb->nb_sectors;
> > +    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
> >      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> > -    acb->n = s->cluster_sectors - index_in_cluster;
> > -    if (acb->n > acb->nb_sectors)
> > -        acb->n = acb->nb_sectors;
> >  
> >      if (!acb->cluster_offset) {
> >          if (bs->backing_hd) {
> > @@ -1046,12 +1144,10 @@ static void qcow_aio_write_cb(void *opaq
> >      }
> >  
> >      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> > -    acb->n = s->cluster_sectors - index_in_cluster;
> > -    if (acb->n > acb->nb_sectors)
> > -        acb->n = acb->nb_sectors;
> >      cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
> >                                            index_in_cluster,
> > -                                          index_in_cluster + acb->n);
> > +                                          index_in_cluster + acb->nb_sectors,
> > +                                          &acb->n);
> >      if (!cluster_offset || (cluster_offset & 511) != 0) {
> >          ret = -EIO;
> >          goto fail;
> 
> In the writing functions, you can't just assign a big n, because
> s->cluster_data will be too small when processing encrypted data. As you
> said you fixed a segfault, I think you know this one already.

yes, this is the cause of the segfault.

Regards,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters
  2008-07-29 14:13 [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Laurent Vivier
                   ` (3 preceding siblings ...)
  2008-07-29 14:13 ` [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters Laurent Vivier
@ 2008-07-29 14:13 ` Laurent Vivier
  2008-08-11 13:13   ` Kevin Wolf
  2008-07-29 19:15 ` [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Anthony Liguori
  5 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 14:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier

[-- Attachment #1: qcow2-extent-more.patch --]
[-- Type: text/plain, Size: 3336 bytes --]

In free_used_clusters(), try to aggregate free clusters and freed clusters.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
---
 block-qcow2.c |   61 ++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 36 insertions(+), 25 deletions(-)

Index: qemu/block-qcow2.c
===================================================================
--- qemu.orig/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
+++ qemu/block-qcow2.c	2008-07-29 15:22:30.000000000 +0200
@@ -649,6 +649,8 @@ static uint64_t free_used_clusters(Block
     BDRVQcowState *s = bs->opaque;
     int l1_index, ret;
     uint64_t cluster_offset;
+    int i, j, do_loop;
+    uint64_t current;
 
     l1_index = offset >> (s->l2_bits + s->cluster_bits);
     if (l1_index >= s->l1_size) {
@@ -676,17 +678,6 @@ static uint64_t free_used_clusters(Block
     if (nb_clusters && *nb_clusters > s->l2_size - (*l2_index))
             *nb_clusters = s->l2_size - (*l2_index);
 
-    if (!cluster_offset) {
-        if (nb_clusters) {
-            int i = 1;
-            while (i < *nb_clusters && (*l2_table)[(*l2_index) + i] == 0) {
-                i++;
-            }
-            *nb_clusters = i;
-        }
-        return 0;
-    }
-
     if (cluster_offset & QCOW_OFLAG_COPIED) {
         if (nb_clusters) {
             int i = 1;
@@ -702,8 +693,6 @@ static uint64_t free_used_clusters(Block
         return cluster_offset;
     }
 
-    /* free the cluster */
-
     if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
         int nb_csectors;
         nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
@@ -714,22 +703,44 @@ static uint64_t free_used_clusters(Block
         return 0;
     }
 
-    if (nb_clusters) {
-        int i = 1;
-        uint64_t current;
-        while (i < *nb_clusters) {
-            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
-            if (cluster_offset + (i << s->cluster_bits) != current)
-                break;
-            i++;
-        }
-        *nb_clusters = i;
-        free_clusters(bs, cluster_offset, i << s->cluster_bits);
+    if (!nb_clusters) {
+        if (cluster_offset)
+            free_clusters(bs, cluster_offset, s->cluster_size);
         return 0;
     }
 
-    free_clusters(bs, cluster_offset, s->cluster_size);
+    i = 0;
+    do_loop = 1;
+    while (i < *nb_clusters && do_loop) {
+        i++;
+        if (!cluster_offset) {
+            while (i < *nb_clusters) {
+                cluster_offset = (*l2_table)[(*l2_index) + i];
+                if (cluster_offset)
+                    break;
+                i++;
+            }
+            if ((cluster_offset & QCOW_OFLAG_COPIED) ||
+                (cluster_offset & QCOW_OFLAG_COMPRESSED))
+                do_loop = 0;
+        } else {
+            j = 1;
+            current = 0;
+            while (i < *nb_clusters) {
+                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
+                if (cluster_offset + (j << s->cluster_bits) != current)
+                    break;
+                i++;
+                j++;
+            }
+            free_clusters(bs, cluster_offset, j << s->cluster_bits);
+            cluster_offset = current;
+            if (current)
+                do_loop = 0;
+        }
+    }
 
+    *nb_clusters = i;
     return 0;
 }
 

--

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters
  2008-07-29 14:13 ` [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters Laurent Vivier
@ 2008-08-11 13:13   ` Kevin Wolf
  2008-08-11 14:04     ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2008-08-11 13:13 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Hi Laurent,

Laurent Vivier schrieb:
> In free_used_clusters(), try to aggregate free clusters and freed clusters.
> 
> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> ---
>  block-qcow2.c |   61 ++++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 36 insertions(+), 25 deletions(-)

I know that this patch will be rewritten as well, but I'll comment on it
nevertheless. Maybe the comments apply to your new version, too.


> Index: qemu/block-qcow2.c
> ===================================================================
> --- qemu.orig/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
> +++ qemu/block-qcow2.c	2008-07-29 15:22:30.000000000 +0200
> @@ -714,22 +703,44 @@ static uint64_t free_used_clusters(Block
>          return 0;
>      }
>  
> -    if (nb_clusters) {
> -        int i = 1;
> -        uint64_t current;
> -        while (i < *nb_clusters) {
> -            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> -            if (cluster_offset + (i << s->cluster_bits) != current)
> -                break;
> -            i++;
> -        }
> -        *nb_clusters = i;
> -        free_clusters(bs, cluster_offset, i << s->cluster_bits);
> +    if (!nb_clusters) {
> +        if (cluster_offset)
> +            free_clusters(bs, cluster_offset, s->cluster_size);
>          return 0;
>      }
>  
> -    free_clusters(bs, cluster_offset, s->cluster_size);
> +    i = 0;
> +    do_loop = 1;
> +    while (i < *nb_clusters && do_loop) {
> +        i++;
> +        if (!cluster_offset) {
> +            while (i < *nb_clusters) {
> +                cluster_offset = (*l2_table)[(*l2_index) + i];
> +                if (cluster_offset)
> +                    break;
> +                i++;
> +            }
> +            if ((cluster_offset & QCOW_OFLAG_COPIED) ||
> +                (cluster_offset & QCOW_OFLAG_COMPRESSED))
> +                do_loop = 0;
> +        } else {
> +            j = 1;
> +            current = 0;
> +            while (i < *nb_clusters) {
> +                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> +                if (cluster_offset + (j << s->cluster_bits) != current)
> +                    break;
> +                i++;
> +                j++;
> +            }
> +            free_clusters(bs, cluster_offset, j << s->cluster_bits);
> +            cluster_offset = current;
> +            if (current)
> +                do_loop = 0;

The else block changes its behaviour with this patch: I think you should
 take QCOW_OFLAG_COPIED into consideration. In this implementation, when
a cluster with QCOW_OFLAG_COPIED is found, the inner loop aborts, but
the outer one will just call the else block once again. So you're also
accumulating clusters with QCOW_OFLAG_COPIED set.

BTW, why do you use that do_loop variable instead of a break statement?

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters
  2008-08-11 13:13   ` Kevin Wolf
@ 2008-08-11 14:04     ` Laurent Vivier
  2008-08-11 14:42       ` Laurent Vivier
  2008-08-11 15:03       ` Kevin Wolf
  0 siblings, 2 replies; 29+ messages in thread
From: Laurent Vivier @ 2008-08-11 14:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf

Le lundi 11 août 2008 à 15:13 +0200, Kevin Wolf a écrit :
> Hi Laurent,
> 
> Laurent Vivier schrieb:
> > In free_used_clusters(), try to aggregate free clusters and freed clusters.
> > 
> > Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
> > ---
> >  block-qcow2.c |   61 ++++++++++++++++++++++++++++++++++------------------------
> >  1 file changed, 36 insertions(+), 25 deletions(-)
> 
> I know that this patch will be rewritten as well, but I'll comment on it
> nevertheless. Maybe the comments apply to your new version, too.
> 
> 
> > Index: qemu/block-qcow2.c
> > ===================================================================
> > --- qemu.orig/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
> > +++ qemu/block-qcow2.c	2008-07-29 15:22:30.000000000 +0200
> > @@ -714,22 +703,44 @@ static uint64_t free_used_clusters(Block
> >          return 0;
> >      }
> >  
> > -    if (nb_clusters) {
> > -        int i = 1;
> > -        uint64_t current;
> > -        while (i < *nb_clusters) {
> > -            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > -            if (cluster_offset + (i << s->cluster_bits) != current)
> > -                break;
> > -            i++;
> > -        }
> > -        *nb_clusters = i;
> > -        free_clusters(bs, cluster_offset, i << s->cluster_bits);
> > +    if (!nb_clusters) {
> > +        if (cluster_offset)
> > +            free_clusters(bs, cluster_offset, s->cluster_size);
> >          return 0;
> >      }
> >  
> > -    free_clusters(bs, cluster_offset, s->cluster_size);
> > +    i = 0;
> > +    do_loop = 1;
> > +    while (i < *nb_clusters && do_loop) {
> > +        i++;
> > +        if (!cluster_offset) {
> > +            while (i < *nb_clusters) {
> > +                cluster_offset = (*l2_table)[(*l2_index) + i];
> > +                if (cluster_offset)
> > +                    break;
> > +                i++;
> > +            }
> > +            if ((cluster_offset & QCOW_OFLAG_COPIED) ||
> > +                (cluster_offset & QCOW_OFLAG_COMPRESSED))
> > +                do_loop = 0;
> > +        } else {
> > +            j = 1;
> > +            current = 0;
> > +            while (i < *nb_clusters) {
> > +                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > +                if (cluster_offset + (j << s->cluster_bits) != current)
> > +                    break;
> > +                i++;
> > +                j++;
> > +            }
> > +            free_clusters(bs, cluster_offset, j << s->cluster_bits);
> > +            cluster_offset = current;
> > +            if (current)
> > +                do_loop = 0;
> 
> The else block changes its behaviour with this patch: I think you should
>  take QCOW_OFLAG_COPIED into consideration. In this implementation, when
> a cluster with QCOW_OFLAG_COPIED is found, the inner loop aborts, but
> the outer one will just call the else block once again. So you're also
> accumulating clusters with QCOW_OFLAG_COPIED set.

No,
before the first loop QCOW_OFLAG_COPIED is not set.
if cluster_offset is 0 and flag appears, we set do_loop to 0 and exit
from the main loop.
if cluster_offset is != 0 we go into the else until current becomes 0 or
a flag is set and then if current is not 0 (i.e. a flag or a non
contiguous cluster) we set do_loop to 0 and exit from the loop.

> BTW, why do you use that do_loop variable instead of a break statement?

Yes

Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters
  2008-08-11 14:04     ` Laurent Vivier
@ 2008-08-11 14:42       ` Laurent Vivier
  2008-08-11 15:03       ` Kevin Wolf
  1 sibling, 0 replies; 29+ messages in thread
From: Laurent Vivier @ 2008-08-11 14:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel@nongnu.org

Le lundi 11 août 2008 à 16:04 +0200, Laurent Vivier a écrit :
> Le lundi 11 août 2008 à 15:13 +0200, Kevin Wolf a écrit :
> > BTW, why do you use that do_loop variable instead of a break statement?
> 
> Yes

... I mean "Yes, I should use a break statement" ...

Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters
  2008-08-11 14:04     ` Laurent Vivier
  2008-08-11 14:42       ` Laurent Vivier
@ 2008-08-11 15:03       ` Kevin Wolf
  1 sibling, 0 replies; 29+ messages in thread
From: Kevin Wolf @ 2008-08-11 15:03 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier schrieb:
> Le lundi 11 août 2008 à 15:13 +0200, Kevin Wolf a écrit :
>> Hi Laurent,
>>
>> Laurent Vivier schrieb:
>>> In free_used_clusters(), try to aggregate free clusters and freed clusters.
>>>
>>> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
>>> ---
>>>  block-qcow2.c |   61 ++++++++++++++++++++++++++++++++++------------------------
>>>  1 file changed, 36 insertions(+), 25 deletions(-)
>> I know that this patch will be rewritten as well, but I'll comment on it
>> nevertheless. Maybe the comments apply to your new version, too.
>>
>>
>>> Index: qemu/block-qcow2.c
>>> ===================================================================
>>> --- qemu.orig/block-qcow2.c	2008-07-29 15:22:28.000000000 +0200
>>> +++ qemu/block-qcow2.c	2008-07-29 15:22:30.000000000 +0200
>>> @@ -714,22 +703,44 @@ static uint64_t free_used_clusters(Block
>>>          return 0;
>>>      }
>>>  
>>> -    if (nb_clusters) {
>>> -        int i = 1;
>>> -        uint64_t current;
>>> -        while (i < *nb_clusters) {
>>> -            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
>>> -            if (cluster_offset + (i << s->cluster_bits) != current)
>>> -                break;
>>> -            i++;
>>> -        }
>>> -        *nb_clusters = i;
>>> -        free_clusters(bs, cluster_offset, i << s->cluster_bits);
>>> +    if (!nb_clusters) {
>>> +        if (cluster_offset)
>>> +            free_clusters(bs, cluster_offset, s->cluster_size);
>>>          return 0;
>>>      }
>>>  
>>> -    free_clusters(bs, cluster_offset, s->cluster_size);
>>> +    i = 0;
>>> +    do_loop = 1;
>>> +    while (i < *nb_clusters && do_loop) {
>>> +        i++;
>>> +        if (!cluster_offset) {
>>> +            while (i < *nb_clusters) {
>>> +                cluster_offset = (*l2_table)[(*l2_index) + i];
>>> +                if (cluster_offset)
>>> +                    break;
>>> +                i++;
>>> +            }
>>> +            if ((cluster_offset & QCOW_OFLAG_COPIED) ||
>>> +                (cluster_offset & QCOW_OFLAG_COMPRESSED))
>>> +                do_loop = 0;
>>> +        } else {
>>> +            j = 1;
>>> +            current = 0;
>>> +            while (i < *nb_clusters) {
>>> +                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
>>> +                if (cluster_offset + (j << s->cluster_bits) != current)
>>> +                    break;
>>> +                i++;
>>> +                j++;
>>> +            }
>>> +            free_clusters(bs, cluster_offset, j << s->cluster_bits);
>>> +            cluster_offset = current;
>>> +            if (current)
>>> +                do_loop = 0;
>> The else block changes its behaviour with this patch: I think you should
>>  take QCOW_OFLAG_COPIED into consideration. In this implementation, when
>> a cluster with QCOW_OFLAG_COPIED is found, the inner loop aborts, but
>> the outer one will just call the else block once again. So you're also
>> accumulating clusters with QCOW_OFLAG_COPIED set.
> 
> No,
> before the first loop QCOW_OFLAG_COPIED is not set.
> if cluster_offset is 0 and flag appears, we set do_loop to 0 and exit
> from the main loop.
> if cluster_offset is != 0 we go into the else until current becomes 0 or
> a flag is set and then if current is not 0 (i.e. a flag or a non
> contiguous cluster) we set do_loop to 0 and exit from the loop.

You're right. I really should learn reading. Sorry for the noise.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off
  2008-07-29 14:13 [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Laurent Vivier
                   ` (4 preceding siblings ...)
  2008-07-29 14:13 ` [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters Laurent Vivier
@ 2008-07-29 19:15 ` Anthony Liguori
  2008-07-29 21:35   ` Laurent Vivier
  5 siblings, 1 reply; 29+ messages in thread
From: Anthony Liguori @ 2008-07-29 19:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier

Laurent Vivier wrote:
> These patches improve qcow2 performance when used with cache=off.
>
> They modify block-qcow2.c to read/write as many clusters as
> possible per bdrv_aio_[read|write]().
>   

This patch series looks like a pretty good clean up of the code.  The 
perf improvement is a nice side effect too.

I'm a little concerned about how much code this touches though.  How 
much testing have you done of these changes?  Have you tested all of the 
corner cases (backing files, filling up a disk image, etc.)?

Regards,

Anthony Liguori

> Some benchmarks:
>
> * mkfs on 500 MB qcow2 disk
>
> mkfs                            WITHOUT         WITH
>
> ide, cache=off,snapshot=off     41 s            5 s     8x faster
> ide, cache=off,snapshot=on      40 s            5 s     8x faster
> ide, cache=on, snapshot=off      3 s            3 s
> ide, cache=on, snapshot=on       4 s            3 s
>
> * tar jxvf linux-2.6.25.7.tar.bz2
>
> ide, cache=off,snapshot=off     847 s           379 s   2.2x faster
> ide, cache=off,snapshot=on      801 s           364 s   2.2x faster
> ide, cache=on, snapshot=off     238 s           236 s
> ide, cache=on, snapshot=on      236 s           237 s
>
> * dd if=/dev/zero of=file
>
> dd if=/dev/null                 WITHOUT         WITH
>
> ide, cache=off,snapshot=off     333 kB/s        3.7 MB/s  11.3x faster
> ide, cache=off,snapshot=on      337 kB/s        3.6 MB/s  10.9x faster
> ide, cache=on, snapshot=off    9.06 MB/s        9.23 MB/s
> ide, cache=on, snapshot=on     8,89 MB/s        8.89 MB/s
>
> * dbench 1
>
> dbench                          WITHOUT          WITH
>
> ide, cache=off,snapshot=off     20.8494 MB/sec  24.8521 MB/sec          +19,2%
> ide, cache=off,snapshot=on      20.9349 MB/sec  24.2296 MB/sec          +15,7%
> ide, cache=on, snapshot=off     23.6612 MB/sec  25.4724 MB/sec          + 7,6%
> ide, cache=on, snapshot=on      24.1836 MB/sec  24.8169 MB/sec
>
> [PATCH 1/5] extract code from get_cluster_offset() into new functions
>             seek_l2_table(), l2_load() and l2_allocate().
>
> [PATCH 2/5] divide get_cluster_offset() into get_cluster_offset() and
>             alloc_cluster_offset().
>
> [PATCH 3/5] divide alloc_cluster_offset() into alloc_cluster_offset()
>             and alloc_compressed_cluster_offset(). Factorize code
>             to free clusters into free_used_clusters().
>
> [PATCH 4/5] modify get_cluster_offset(), alloc_cluster_offset() and 
>             free_used_clusters() to specify how many clusters we
>             want.
>
> [PATCH 5/5] in free_used_clusters(), try to aggregate free clusters
>             and freed clusters.
>
> Laurent
> --
>
>
>   

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off
  2008-07-29 19:15 ` [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Anthony Liguori
@ 2008-07-29 21:35   ` Laurent Vivier
  2008-07-29 21:49     ` Anthony Liguori
  0 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 21:35 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

Le mardi 29 juillet 2008 à 14:15 -0500, Anthony Liguori a écrit :
> Laurent Vivier wrote:
> > These patches improve qcow2 performance when used with cache=off.
> >
> > They modify block-qcow2.c to read/write as many clusters as
> > possible per bdrv_aio_[read|write]().
> >   
> 
> This patch series looks like a pretty good clean up of the code.  The 
> perf improvement is a nice side effect too.
> 
> I'm a little concerned about how much code this touches though.  How 

Yes, I know, it's why I split it in several patches: easier to review,
easier to test.

> much testing have you done of these changes?  Have you tested all of the 
> corner cases (backing files, filling up a disk image, etc.)?

Well, before all I made a lot of review of my modifications, but it must
also be reviewed by other eyes.

My test process was:

qemu-img create test.qcow2 500MB
qemu ... -hda boot.raw -hdb test.qcow2

mkfs /dev/hdb
fsck /dev/hdb
mount /dev/hdb /mnt
cd /mnt
dbench 16
cd
umount /mnt
fsck /dev/hdb
mkfs /dev/hdb
fsck /dev/hdb
mount /dev/hdb /mnt
cd /mnt
tar xvf /root/linux.tar.bz2
cd
umount /mnt
fsck /dev/hdb
mount /mnt
cd /mnt/linux/
make defconfig
make
cd
umount /mnt
fsck /dev/hdb
mount /mnt
cd /mnt/linux
make clean
cd
umount /mnt
fsck /dev/hdb
mount /dev/hdb
cd /mnt
rm -fr linux
cd 
umount /mnt
fsck /dev/hdb
mkfs /dev/hdb
fsck /dev/hdb
mount /dev/hdb /mnt
dd if=/dev/zero of=/mnt/file
umount /mnt
fsck /dev/hdb
mount /dev/hdb /mnt
rm /mnt/file
umount /mnt
fsck /dev/hdb
mount /dev/hdb /mnt
cd /mnt
dbench 1
cd
umount /mnt
fsck /dev/hdb

I think it covers a lot of cases, but I didn't test encrypted disk image
and compressed disk image. The case with backed files was tested only
with mkfs/fsck/dbench.

If you think these patches are good candidates to be included, I can
make more tests.

Regards,
Laurent
-- 
----------------- Laurent.Vivier@bull.net  ------------------
  "La perfection est atteinte non quand il ne reste rien à
ajouter mais quand il ne reste rien à enlever." Saint Exupéry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off
  2008-07-29 21:35   ` Laurent Vivier
@ 2008-07-29 21:49     ` Anthony Liguori
  2008-07-29 21:59       ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Anthony Liguori @ 2008-07-29 21:49 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: qemu-devel

Laurent Vivier wrote:
> Le mardi 29 juillet 2008 à 14:15 -0500, Anthony Liguori a écrit :
>   
>> Laurent Vivier wrote:
>>     
>>> These patches improve qcow2 performance when used with cache=off.
>>>
>>> They modify block-qcow2.c to read/write as many clusters as
>>> possible per bdrv_aio_[read|write]().
>>>   
>>>       
>> This patch series looks like a pretty good clean up of the code.  The 
>> perf improvement is a nice side effect too.
>>
>> I'm a little concerned about how much code this touches though.  How 
>>     
>
> Yes, I know, it's why I split it in several patches: easier to review,
> easier to test.
>
>   
>> much testing have you done of these changes?  Have you tested all of the 
>> corner cases (backing files, filling up a disk image, etc.)?
>>     
>
> Well, before all I made a lot of review of my modifications, but it must
> also be reviewed by other eyes.
>
> I think it covers a lot of cases, but I didn't test encrypted disk image
> and compressed disk image. The case with backed files was tested only
> with mkfs/fsck/dbench.
>
> If you think these patches are good candidates to be included, I can
> make more tests.
>   

Yes, I do think these patches make sense.  I would like to see more 
testing though.  It all seemed pretty clear to me (breaking out the 
patches was very helpful!) but I am very worried about corrupting qcow2 
images.

Regards,

Anthony Liguori

> Regards,
> Laurent
>   

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off
  2008-07-29 21:49     ` Anthony Liguori
@ 2008-07-29 21:59       ` Laurent Vivier
  2008-08-01 14:54         ` Anthony Liguori
  0 siblings, 1 reply; 29+ messages in thread
From: Laurent Vivier @ 2008-07-29 21:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier


Le 29 juil. 08 à 23:49, Anthony Liguori a écrit :
> [...] I am very worried about corrupting qcow2 images.


Me too.

I'm going to make more tests. Any idea ?

Regards,
Laurent
----------------------- Laurent Vivier ----------------------
"The best way to predict the future is to invent it."
- Alan Kay

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off
  2008-07-29 21:59       ` Laurent Vivier
@ 2008-08-01 14:54         ` Anthony Liguori
  2008-08-01 15:05           ` Laurent Vivier
  0 siblings, 1 reply; 29+ messages in thread
From: Anthony Liguori @ 2008-08-01 14:54 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: Laurent Vivier, qemu-devel

Laurent Vivier wrote:
>
> Le 29 juil. 08 à 23:49, Anthony Liguori a écrit :
>> [...] I am very worried about corrupting qcow2 images.
>
>
> Me too.
>
> I'm going to make more tests. Any idea ?

Just make sure that we're covering the various corner cases (snapshots, 
full/empty disks, backing files, etc.).

Regards,

Anthony Liguori

> Regards,
> Laurent
> ----------------------- Laurent Vivier ----------------------
> "The best way to predict the future is to invent it."
> - Alan Kay
>
>
>
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off
  2008-08-01 14:54         ` Anthony Liguori
@ 2008-08-01 15:05           ` Laurent Vivier
  0 siblings, 0 replies; 29+ messages in thread
From: Laurent Vivier @ 2008-08-01 15:05 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

Le vendredi 01 août 2008 à 09:54 -0500, Anthony Liguori a écrit :
> Laurent Vivier wrote:
> >
> > Le 29 juil. 08 à 23:49, Anthony Liguori a écrit :
> >> [...] I am very worried about corrupting qcow2 images.
> >
> >
> > Me too.
> >
> > I'm going to make more tests. Any idea ?
> 
> Just make sure that we're covering the various corner cases (snapshots, 
> full/empty disks, backing files, etc.).

OK.
I've found a little bug with encrypted file. I'm working on.

Regards,
Laurent

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2008-08-11 15:04 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-29 14:13 [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Laurent Vivier
2008-07-29 14:13 ` [Qemu-devel] [patch 1/5][v2] Extract code from get_cluster_offset() Laurent Vivier
2008-08-05 14:15   ` Kevin Wolf
2008-08-05 14:28     ` Laurent Vivier
2008-08-05 14:34       ` Kevin Wolf
2008-08-05 14:45         ` Laurent Vivier
2008-07-29 14:13 ` [Qemu-devel] [patch 2/5][v2] Divide get_cluster_offset() Laurent Vivier
2008-08-05 15:13   ` Kevin Wolf
2008-08-05 15:25     ` Laurent Vivier
2008-08-05 15:41       ` Kevin Wolf
2008-07-29 14:13 ` [Qemu-devel] [patch 3/5][v2] Extract compressing part from alloc_cluster_offset() Laurent Vivier
2008-08-06 14:20   ` Kevin Wolf
2008-08-06 14:41     ` Laurent Vivier
2008-08-06 14:56       ` Kevin Wolf
2008-08-06 15:03         ` Laurent Vivier
2008-07-29 14:13 ` [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters Laurent Vivier
2008-08-11 12:10   ` Kevin Wolf
2008-08-11 12:39     ` Laurent Vivier
2008-07-29 14:13 ` [Qemu-devel] [patch 5/5][v2] Try to aggregate free clusters and freed clusters Laurent Vivier
2008-08-11 13:13   ` Kevin Wolf
2008-08-11 14:04     ` Laurent Vivier
2008-08-11 14:42       ` Laurent Vivier
2008-08-11 15:03       ` Kevin Wolf
2008-07-29 19:15 ` [Qemu-devel] [patch 0/5][v2] qcow2: improve I/O performance with cache=off Anthony Liguori
2008-07-29 21:35   ` Laurent Vivier
2008-07-29 21:49     ` Anthony Liguori
2008-07-29 21:59       ` Laurent Vivier
2008-08-01 14:54         ` Anthony Liguori
2008-08-01 15:05           ` Laurent Vivier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).