* [Qemu-devel] [RFC PATCH 00/16] qcow2: Delayed COW
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
During the last few releases we have got rid of most of the overhead of
metadata writes during cluster allocation. What's left is the COW for
unaligned allocating write requests, and it's quite expensive.
In the general case, this cost cannot be avoided. However, if we're
lucky enough that before the next flush the data copied during COW
would be overwritten, we can do without the COW. Sequential writes
always overwrite the COW area at the end of the cluster immediately,
so delaying the COW a bit and cancelling it if it's overwritten is a
worthwhile optimisation.
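To illustrate the idea (a minimal sketch only, not the code from this
series; the actual cancellation logic only arrives in patch 15): when a
later write fully covers a pending COW region, that region can simply be
emptied instead of copied:

    /* Sketch: drop a pending tail COW once a later write has overwritten
     * it completely. Qcow2COWRegion is introduced in patch 2; the
     * coordinate handling is simplified here for illustration. */
    static bool cow_overwritten(Qcow2COWRegion *r, uint64_t offset,
                                uint64_t bytes)
    {
        uint64_t start = r->offset;
        uint64_t end = r->offset + r->nb_sectors * BDRV_SECTOR_SIZE;

        if (offset <= start && offset + bytes >= end) {
            r->nb_sectors = 0; /* nothing left to copy */
            return true;
        }
        return false;
    }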
The really interesting part of this series should be close to final;
however, you only see the improvements with the last patch applied,
which isn't quite correct yet. Doing it right requires some additional
refactoring, so I thought I'd get this out for a first round of review
before fixing it.
iozone results with and without this series show a significant difference
for allocating writes:

                                                      random   random
            KB  reclen    write  rewrite    read   reread     read    write
base     65536       8     1727     1945   12546    12539     2449     1836
patch    65536       8     1934     1949   12263    12521     2463     1796
base   1048576     256    22344    38437  105823   106135    37743    32167
patch  1048576     256    35989    38542  105231   105994    38301    33036
Kevin Wolf (16):
qcow2: Round QCowL2Meta.offset down to cluster boundary
qcow2: Introduce Qcow2COWRegion
qcow2: Allocate l2meta dynamically
qcow2: Drop l2meta.cluster_offset
qcow2: Allocate l2meta only for cluster allocations
qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
qcow2: Factor out handle_dependencies()
qcow2: Reading from areas not in L2 tables yet
qcow2: Move COW and L2 update into own coroutine
qcow2: Delay the COW
qcow2: Add error handling to the l2meta coroutine
qcow2: Handle dependencies earlier
qcow2: Change handle_dependency to byte granularity
qcow2: Execute run_dependent_requests() without lock
qcow2: Cancel COW when overwritten
[BROKEN] qcow2: Overwrite COW and allocate new cluster at the same
time
block.c | 5 +
block/qcow2-cluster.c | 432 ++++++++++++++++++++++++++++++++++++++-----------
block/qcow2.c | 239 +++++++++++++++++++++++-----
block/qcow2.h | 153 +++++++++++++++++-
block_int.h | 3 +
5 files changed, 692 insertions(+), 140 deletions(-)
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 01/16] qcow2: Round QCowL2Meta.offset down to cluster boundary
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
The offset within the cluster is already present as n_start, and that is
what the code uses. QCowL2Meta.offset is only needed at cluster
granularity.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 4 ++--
block/qcow2.h | 22 ++++++++++++++++++++++
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e179211..d17a37c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -631,7 +631,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
old_cluster = g_malloc(m->nb_clusters * sizeof(uint64_t));
/* copy content of unmodified sectors */
- start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9;
+ start_sect = m->offset >> 9;
if (m->n_start) {
cow = true;
qemu_co_mutex_unlock(&s->lock);
@@ -966,7 +966,7 @@ again:
.cluster_offset = keep_clusters == 0 ?
alloc_cluster_offset : cluster_offset,
.alloc_offset = alloc_cluster_offset,
- .offset = alloc_offset,
+ .offset = alloc_offset & ~(s->cluster_size - 1),
.n_start = keep_clusters == 0 ? n_start : 0,
.nb_clusters = nb_clusters,
.nb_available = MIN(requested_sectors, avail_sectors),
diff --git a/block/qcow2.h b/block/qcow2.h
index b4eb654..2a406a7 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -199,12 +199,34 @@ struct QCowAIOCB;
/* XXX This could be private for qcow2-cluster.c */
typedef struct QCowL2Meta
{
+ /** Guest offset of the first newly allocated cluster */
uint64_t offset;
+
+ /** Host offset of the first cluster of the request */
uint64_t cluster_offset;
+
+ /** Host offset of the first newly allocated cluster */
uint64_t alloc_offset;
+
+ /**
+ * Number of sectors between the start of the first allocated cluster and
+ * the area that the guest actually writes to.
+ */
int n_start;
+
+ /**
+ * Number of sectors from the start of the first allocated cluster to
+ * the end of the (possibly shortened) request
+ */
int nb_available;
+
+ /** Number of newly allocated clusters */
int nb_clusters;
+
+ /**
+ * Requests that overlap with this allocation and wait to be restarted
+ * when the allocating request has completed.
+ */
CoQueue dependent_requests;
QLIST_ENTRY(QCowL2Meta) next_in_flight;
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 02/16] qcow2: Introduce Qcow2COWRegion
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
This makes it easier to address the areas for which a COW must be
performed. As a nice side effect, the COW code in
qcow2_alloc_cluster_link_l2 becomes really trivial.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 85 +++++++++++++++++++++++++++++++------------------
block/qcow2.h | 29 +++++++++++++---
2 files changed, 77 insertions(+), 37 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d17a37c..94b7f13 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -615,13 +615,41 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
return cluster_offset;
}
+static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret;
+
+ if (r->nb_sectors == 0) {
+ return 0;
+ }
+
+ qemu_co_mutex_unlock(&s->lock);
+ ret = copy_sectors(bs, m->offset / BDRV_SECTOR_SIZE, m->alloc_offset,
+ r->offset / BDRV_SECTOR_SIZE,
+ r->offset / BDRV_SECTOR_SIZE + r->nb_sectors);
+ qemu_co_mutex_lock(&s->lock);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ /*
+ * Before we update the L2 table to actually point to the new cluster, we
+ * need to be sure that the refcounts have been increased and COW was
+ * handled.
+ */
+ qcow2_cache_depends_on_flush(s->l2_table_cache);
+
+ return 0;
+}
+
int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
{
BDRVQcowState *s = bs->opaque;
int i, j = 0, l2_index, ret;
- uint64_t *old_cluster, start_sect, *l2_table;
+ uint64_t *old_cluster, *l2_table;
uint64_t cluster_offset = m->alloc_offset;
- bool cow = false;
trace_qcow2_cluster_link_l2(qemu_coroutine_self(), m->nb_clusters);
@@ -631,36 +659,17 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
old_cluster = g_malloc(m->nb_clusters * sizeof(uint64_t));
/* copy content of unmodified sectors */
- start_sect = m->offset >> 9;
- if (m->n_start) {
- cow = true;
- qemu_co_mutex_unlock(&s->lock);
- ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start);
- qemu_co_mutex_lock(&s->lock);
- if (ret < 0)
- goto err;
+ ret = perform_cow(bs, m, &m->cow_start);
+ if (ret < 0) {
+ goto err;
}
- if (m->nb_available & (s->cluster_sectors - 1)) {
- cow = true;
- qemu_co_mutex_unlock(&s->lock);
- ret = copy_sectors(bs, start_sect, cluster_offset, m->nb_available,
- align_offset(m->nb_available, s->cluster_sectors));
- qemu_co_mutex_lock(&s->lock);
- if (ret < 0)
- goto err;
+ ret = perform_cow(bs, m, &m->cow_end);
+ if (ret < 0) {
+ goto err;
}
- /*
- * Update L2 table.
- *
- * Before we update the L2 table to actually point to the new cluster, we
- * need to be sure that the refcounts have been increased and COW was
- * handled.
- */
- if (cow) {
- qcow2_cache_depends_on_flush(s->l2_table_cache);
- }
+ /* Update L2 table. */
if (qcow2_need_accurate_refcounts(s)) {
qcow2_cache_set_dependency(bs, s->l2_table_cache,
@@ -957,19 +966,33 @@ again:
*
* avail_sectors: Number of sectors from the start of the first
* newly allocated to the end of the last newly allocated cluster.
+ *
+ * nb_sectors: The number of sectors from the start of the first
+ * newly allocated cluster to the end of the area that the write
+ * request actually writes to (excluding COW at the end)
*/
int requested_sectors = n_end - keep_clusters * s->cluster_sectors;
int avail_sectors = nb_clusters
<< (s->cluster_bits - BDRV_SECTOR_BITS);
+ int alloc_n_start = keep_clusters == 0 ? n_start : 0;
+ int nb_sectors = MIN(requested_sectors, avail_sectors);
*m = (QCowL2Meta) {
.cluster_offset = keep_clusters == 0 ?
alloc_cluster_offset : cluster_offset,
.alloc_offset = alloc_cluster_offset,
.offset = alloc_offset & ~(s->cluster_size - 1),
- .n_start = keep_clusters == 0 ? n_start : 0,
.nb_clusters = nb_clusters,
- .nb_available = MIN(requested_sectors, avail_sectors),
+ .nb_available = nb_sectors,
+
+ .cow_start = {
+ .offset = 0,
+ .nb_sectors = alloc_n_start,
+ },
+ .cow_end = {
+ .offset = nb_sectors * BDRV_SECTOR_SIZE,
+ .nb_sectors = avail_sectors - nb_sectors,
+ },
};
qemu_co_queue_init(&m->dependent_requests);
QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);
diff --git a/block/qcow2.h b/block/qcow2.h
index 2a406a7..1106b33 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -196,6 +196,17 @@ typedef struct QCowCreateState {
struct QCowAIOCB;
+typedef struct Qcow2COWRegion {
+ /**
+ * Offset of the COW region in bytes from the start of the first cluster
+ * touched by the request.
+ */
+ uint64_t offset;
+
+ /** Number of sectors to copy */
+ int nb_sectors;
+} Qcow2COWRegion;
+
/* XXX This could be private for qcow2-cluster.c */
typedef struct QCowL2Meta
{
@@ -209,12 +220,6 @@ typedef struct QCowL2Meta
uint64_t alloc_offset;
/**
- * Number of sectors between the start of the first allocated cluster and
- * the area that the guest actually writes to.
- */
- int n_start;
-
- /**
* Number of sectors from the start of the first allocated cluster to
* the end of the (possibly shortened) request
*/
@@ -229,6 +234,18 @@ typedef struct QCowL2Meta
*/
CoQueue dependent_requests;
+ /**
+ * The COW Region between the start of the first allocated cluster and the
+ * area the guest actually writes to.
+ */
+ Qcow2COWRegion cow_start;
+
+ /**
+ * The COW Region between the area the guest actually writes to and the
+ * end of the last allocated cluster.
+ */
+ Qcow2COWRegion cow_end;
+
QLIST_ENTRY(QCowL2Meta) next_in_flight;
} QCowL2Meta;
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 03/16] qcow2: Allocate l2meta dynamically
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
As soon as delayed COW is introduced, the l2meta struct is needed even
after completion of the request, so it can't live on the stack.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2.c | 26 +++++++++++++++-----------
1 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 8f183f1..9e4d440 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -765,15 +765,11 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
QEMUIOVector hd_qiov;
uint64_t bytes_done = 0;
uint8_t *cluster_data = NULL;
- QCowL2Meta l2meta = {
- .nb_clusters = 0,
- };
+ QCowL2Meta *l2meta;
trace_qcow2_writev_start_req(qemu_coroutine_self(), sector_num,
remaining_sectors);
- qemu_co_queue_init(&l2meta.dependent_requests);
-
qemu_iovec_init(&hd_qiov, qiov->niov);
s->cluster_cache_offset = -1; /* disable compressed cache */
@@ -782,6 +778,9 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
while (remaining_sectors != 0) {
+ l2meta = g_malloc0(sizeof(*l2meta));
+ qemu_co_queue_init(&l2meta->dependent_requests);
+
trace_qcow2_writev_start_part(qemu_coroutine_self());
index_in_cluster = sector_num & (s->cluster_sectors - 1);
n_end = index_in_cluster + remaining_sectors;
@@ -791,17 +790,17 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
}
ret = qcow2_alloc_cluster_offset(bs, sector_num << 9,
- index_in_cluster, n_end, &cur_nr_sectors, &l2meta);
+ index_in_cluster, n_end, &cur_nr_sectors, l2meta);
if (ret < 0) {
goto fail;
}
- if (l2meta.nb_clusters > 0 &&
+ if (l2meta->nb_clusters > 0 &&
(s->compatible_features & QCOW2_COMPAT_LAZY_REFCOUNTS)) {
qcow2_mark_dirty(bs);
}
- cluster_offset = l2meta.cluster_offset;
+ cluster_offset = l2meta->cluster_offset;
assert((cluster_offset & 511) == 0);
qemu_iovec_reset(&hd_qiov);
@@ -838,12 +837,14 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
goto fail;
}
- ret = qcow2_alloc_cluster_link_l2(bs, &l2meta);
+ ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
if (ret < 0) {
goto fail;
}
- run_dependent_requests(s, &l2meta);
+ run_dependent_requests(s, l2meta);
+ g_free(l2meta);
+ l2meta = NULL;
remaining_sectors -= cur_nr_sectors;
sector_num += cur_nr_sectors;
@@ -853,7 +854,10 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
ret = 0;
fail:
- run_dependent_requests(s, &l2meta);
+ if (l2meta != NULL) {
+ run_dependent_requests(s, l2meta);
+ g_free(l2meta);
+ }
qemu_co_mutex_unlock(&s->lock);
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 04/16] qcow2: Drop l2meta.cluster_offset
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
There's no real reason to have an l2meta for normal requests that don't
allocate anything. Before we can get rid of it, we must return the host
cluster offset in a different way.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 10 ++++++----
block/qcow2.c | 14 +++++++-------
block/qcow2.h | 5 +----
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 94b7f13..c4752ee 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -856,7 +856,7 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
* Return 0 on success and -errno in error cases
*/
int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
- int n_start, int n_end, int *num, QCowL2Meta *m)
+ int n_start, int n_end, int *num, uint64_t *host_offset, QCowL2Meta *m)
{
BDRVQcowState *s = bs->opaque;
int l2_index, ret, sectors;
@@ -929,7 +929,6 @@ again:
/* If there is something left to allocate, do that now */
*m = (QCowL2Meta) {
- .cluster_offset = cluster_offset,
.nb_clusters = 0,
};
qemu_co_queue_init(&m->dependent_requests);
@@ -977,9 +976,11 @@ again:
int alloc_n_start = keep_clusters == 0 ? n_start : 0;
int nb_sectors = MIN(requested_sectors, avail_sectors);
+ if (keep_clusters == 0) {
+ cluster_offset = alloc_cluster_offset;
+ }
+
*m = (QCowL2Meta) {
- .cluster_offset = keep_clusters == 0 ?
- alloc_cluster_offset : cluster_offset,
.alloc_offset = alloc_cluster_offset,
.offset = alloc_offset & ~(s->cluster_size - 1),
.nb_clusters = nb_clusters,
@@ -1007,6 +1008,7 @@ again:
assert(sectors > n_start);
*num = sectors - n_start;
+ *host_offset = cluster_offset;
return 0;
diff --git a/block/qcow2.c b/block/qcow2.c
index 9e4d440..a98e899 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -790,7 +790,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
}
ret = qcow2_alloc_cluster_offset(bs, sector_num << 9,
- index_in_cluster, n_end, &cur_nr_sectors, l2meta);
+ index_in_cluster, n_end, &cur_nr_sectors, &cluster_offset, l2meta);
if (ret < 0) {
goto fail;
}
@@ -800,7 +800,6 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
qcow2_mark_dirty(bs);
}
- cluster_offset = l2meta->cluster_offset;
assert((cluster_offset & 511) == 0);
qemu_iovec_reset(&hd_qiov);
@@ -1122,6 +1121,7 @@ static int preallocate(BlockDriverState *bs)
{
uint64_t nb_sectors;
uint64_t offset;
+ uint64_t host_offset = 0;
int num;
int ret;
QCowL2Meta meta;
@@ -1129,18 +1129,18 @@ static int preallocate(BlockDriverState *bs)
nb_sectors = bdrv_getlength(bs) >> 9;
offset = 0;
qemu_co_queue_init(&meta.dependent_requests);
- meta.cluster_offset = 0;
while (nb_sectors) {
num = MIN(nb_sectors, INT_MAX >> 9);
- ret = qcow2_alloc_cluster_offset(bs, offset, 0, num, &num, &meta);
+ ret = qcow2_alloc_cluster_offset(bs, offset, 0, num, &num,
+ &host_offset, &meta);
if (ret < 0) {
return ret;
}
ret = qcow2_alloc_cluster_link_l2(bs, &meta);
if (ret < 0) {
- qcow2_free_any_clusters(bs, meta.cluster_offset, meta.nb_clusters);
+ qcow2_free_any_clusters(bs, meta.alloc_offset, meta.nb_clusters);
return ret;
}
@@ -1159,10 +1159,10 @@ static int preallocate(BlockDriverState *bs)
* all of the allocated clusters (otherwise we get failing reads after
* EOF). Extend the image to the last allocated sector.
*/
- if (meta.cluster_offset != 0) {
+ if (host_offset != 0) {
uint8_t buf[512];
memset(buf, 0, 512);
- ret = bdrv_write(bs->file, (meta.cluster_offset >> 9) + num - 1, buf, 1);
+ ret = bdrv_write(bs->file, (host_offset >> 9) + num - 1, buf, 1);
if (ret < 0) {
return ret;
}
diff --git a/block/qcow2.h b/block/qcow2.h
index 1106b33..24f1001 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -213,9 +213,6 @@ typedef struct QCowL2Meta
/** Guest offset of the first newly allocated cluster */
uint64_t offset;
- /** Host offset of the first cluster of the request */
- uint64_t cluster_offset;
-
/** Host offset of the first newly allocated cluster */
uint64_t alloc_offset;
@@ -336,7 +333,7 @@ void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num,
int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
int *num, uint64_t *cluster_offset);
int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
- int n_start, int n_end, int *num, QCowL2Meta *m);
+ int n_start, int n_end, int *num, uint64_t *host_offset, QCowL2Meta *m);
uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
uint64_t offset,
int compressed_size);
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 05/16] qcow2: Allocate l2meta only for cluster allocations
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
Even for writes to already allocated clusters, an l2meta is allocated,
though it stays effectively unused. After this patch, only allocating
requests still have one. Each l2meta now describes an in-flight request
that writes to clusters that are not yet hooked up in the L2 table.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 23 +++++++++--------------
block/qcow2.c | 32 +++++++++++++++++---------------
block/qcow2.h | 7 +++++--
3 files changed, 31 insertions(+), 31 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c4752ee..c2b59e7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -652,9 +652,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
uint64_t cluster_offset = m->alloc_offset;
trace_qcow2_cluster_link_l2(qemu_coroutine_self(), m->nb_clusters);
-
- if (m->nb_clusters == 0)
- return 0;
+ assert(m->nb_clusters > 0);
old_cluster = g_malloc(m->nb_clusters * sizeof(uint64_t));
@@ -856,7 +854,7 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
* Return 0 on success and -errno in error cases
*/
int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
- int n_start, int n_end, int *num, uint64_t *host_offset, QCowL2Meta *m)
+ int n_start, int n_end, int *num, uint64_t *host_offset, QCowL2Meta **m)
{
BDRVQcowState *s = bs->opaque;
int l2_index, ret, sectors;
@@ -928,11 +926,6 @@ again:
}
/* If there is something left to allocate, do that now */
- *m = (QCowL2Meta) {
- .nb_clusters = 0,
- };
- qemu_co_queue_init(&m->dependent_requests);
-
if (nb_clusters > 0) {
uint64_t alloc_offset;
uint64_t alloc_cluster_offset;
@@ -980,7 +973,9 @@ again:
cluster_offset = alloc_cluster_offset;
}
- *m = (QCowL2Meta) {
+ *m = g_malloc0(sizeof(**m));
+
+ **m = (QCowL2Meta) {
.alloc_offset = alloc_cluster_offset,
.offset = alloc_offset & ~(s->cluster_size - 1),
.nb_clusters = nb_clusters,
@@ -995,8 +990,8 @@ again:
.nb_sectors = avail_sectors - nb_sectors,
},
};
- qemu_co_queue_init(&m->dependent_requests);
- QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);
+ qemu_co_queue_init(&(*m)->dependent_requests);
+ QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
}
}
@@ -1013,8 +1008,8 @@ again:
return 0;
fail:
- if (m->nb_clusters > 0) {
- QLIST_REMOVE(m, next_in_flight);
+ if (*m && (*m)->nb_clusters > 0) {
+ QLIST_REMOVE(*m, next_in_flight);
}
return ret;
}
diff --git a/block/qcow2.c b/block/qcow2.c
index a98e899..c0a2822 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -778,8 +778,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
while (remaining_sectors != 0) {
- l2meta = g_malloc0(sizeof(*l2meta));
- qemu_co_queue_init(&l2meta->dependent_requests);
+ l2meta = NULL;
trace_qcow2_writev_start_part(qemu_coroutine_self());
index_in_cluster = sector_num & (s->cluster_sectors - 1);
@@ -790,7 +789,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
}
ret = qcow2_alloc_cluster_offset(bs, sector_num << 9,
- index_in_cluster, n_end, &cur_nr_sectors, &cluster_offset, l2meta);
+ index_in_cluster, n_end, &cur_nr_sectors, &cluster_offset, &l2meta);
if (ret < 0) {
goto fail;
}
@@ -836,14 +835,16 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
goto fail;
}
- ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
- if (ret < 0) {
- goto fail;
- }
+ if (l2meta != NULL) {
+ ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+ if (ret < 0) {
+ goto fail;
+ }
- run_dependent_requests(s, l2meta);
- g_free(l2meta);
- l2meta = NULL;
+ run_dependent_requests(s, l2meta);
+ g_free(l2meta);
+ l2meta = NULL;
+ }
remaining_sectors -= cur_nr_sectors;
sector_num += cur_nr_sectors;
@@ -1124,11 +1125,10 @@ static int preallocate(BlockDriverState *bs)
uint64_t host_offset = 0;
int num;
int ret;
- QCowL2Meta meta;
+ QCowL2Meta *meta;
nb_sectors = bdrv_getlength(bs) >> 9;
offset = 0;
- qemu_co_queue_init(&meta.dependent_requests);
while (nb_sectors) {
num = MIN(nb_sectors, INT_MAX >> 9);
@@ -1138,15 +1138,17 @@ static int preallocate(BlockDriverState *bs)
return ret;
}
- ret = qcow2_alloc_cluster_link_l2(bs, &meta);
+ ret = qcow2_alloc_cluster_link_l2(bs, meta);
if (ret < 0) {
- qcow2_free_any_clusters(bs, meta.alloc_offset, meta.nb_clusters);
+ qcow2_free_any_clusters(bs, meta->alloc_offset, meta->nb_clusters);
return ret;
}
/* There are no dependent requests, but we need to remove our request
* from the list of in-flight requests */
- run_dependent_requests(bs->opaque, &meta);
+ if (meta != NULL) {
+ run_dependent_requests(bs->opaque, meta);
+ }
/* TODO Preallocate data if requested */
diff --git a/block/qcow2.h b/block/qcow2.h
index 24f1001..6dc79b5 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -207,7 +207,10 @@ typedef struct Qcow2COWRegion {
int nb_sectors;
} Qcow2COWRegion;
-/* XXX This could be private for qcow2-cluster.c */
+/**
+ * Describes an in-flight (part of a) write request that writes to clusters
+ * that are not referenced in their L2 table yet.
+ */
typedef struct QCowL2Meta
{
/** Guest offset of the first newly allocated cluster */
@@ -333,7 +336,7 @@ void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num,
int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
int *num, uint64_t *cluster_offset);
int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
- int n_start, int n_end, int *num, uint64_t *host_offset, QCowL2Meta *m);
+ int n_start, int n_end, int *num, uint64_t *host_offset, QCowL2Meta **m);
uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
uint64_t offset,
int compressed_size);
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 06/16] qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
This is closer to where the dirty flag is really needed, and it avoids
having checks for special cases related to cluster allocation directly
in the writev loop.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 5 ++++-
block/qcow2.c | 7 +------
block/qcow2.h | 2 ++
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c2b59e7..7a038ac 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -668,11 +668,14 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
}
/* Update L2 table. */
-
+ if (s->compatible_features & QCOW2_COMPAT_LAZY_REFCOUNTS) {
+ qcow2_mark_dirty(bs);
+ }
if (qcow2_need_accurate_refcounts(s)) {
qcow2_cache_set_dependency(bs, s->l2_table_cache,
s->refcount_block_cache);
}
+
ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
if (ret < 0) {
goto err;
diff --git a/block/qcow2.c b/block/qcow2.c
index c0a2822..6515fdd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -221,7 +221,7 @@ static void report_unsupported_feature(BlockDriverState *bs,
* updated successfully. Therefore it is not required to check the return
* value of this function.
*/
-static int qcow2_mark_dirty(BlockDriverState *bs)
+int qcow2_mark_dirty(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
uint64_t val;
@@ -794,11 +794,6 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
goto fail;
}
- if (l2meta->nb_clusters > 0 &&
- (s->compatible_features & QCOW2_COMPAT_LAZY_REFCOUNTS)) {
- qcow2_mark_dirty(bs);
- }
-
assert((cluster_offset & 511) == 0);
qemu_iovec_reset(&hd_qiov);
diff --git a/block/qcow2.h b/block/qcow2.h
index 6dc79b5..a60fcb4 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -303,6 +303,8 @@ static inline bool qcow2_need_accurate_refcounts(BDRVQcowState *s)
/* qcow2.c functions */
int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t sector_num, int nb_sectors);
+
+int qcow2_mark_dirty(BlockDriverState *bs);
int qcow2_update_header(BlockDriverState *bs);
/* qcow2-refcount.c functions */
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 07/16] qcow2: Factor out handle_dependencies()
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 70 +++++++++++++++++++++++++++++-------------------
1 files changed, 42 insertions(+), 28 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 7a038ac..468ef1b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -753,38 +753,16 @@ out:
}
/*
- * Allocates new clusters for the given guest_offset.
- *
- * At most *nb_clusters are allocated, and on return *nb_clusters is updated to
- * contain the number of clusters that have been allocated and are contiguous
- * in the image file.
- *
- * If *host_offset is non-zero, it specifies the offset in the image file at
- * which the new clusters must start. *nb_clusters can be 0 on return in this
- * case if the cluster at host_offset is already in use. If *host_offset is
- * zero, the clusters can be allocated anywhere in the image file.
- *
- * *host_offset is updated to contain the offset into the image file at which
- * the first allocated cluster starts.
- *
- * Return 0 on success and -errno in error cases. -EAGAIN means that the
- * function has been waiting for another request and the allocation must be
- * restarted, but the whole request should not be failed.
+ * Check if there already is an AIO write request in flight which allocates
+ * the same cluster. In this case we need to wait until the previous
+ * request has completed and updated the L2 table accordingly.
*/
-static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
- uint64_t *host_offset, unsigned int *nb_clusters)
+static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
+ unsigned int *nb_clusters)
{
BDRVQcowState *s = bs->opaque;
QCowL2Meta *old_alloc;
- trace_qcow2_do_alloc_clusters_offset(qemu_coroutine_self(), guest_offset,
- *host_offset, *nb_clusters);
-
- /*
- * Check if there already is an AIO write request in flight which allocates
- * the same cluster. In this case we need to wait until the previous
- * request has completed and updated the L2 table accordingly.
- */
QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
uint64_t start = guest_offset >> s->cluster_bits;
@@ -817,6 +795,42 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
abort();
}
+ return 0;
+}
+
+/*
+ * Allocates new clusters for the given guest_offset.
+ *
+ * At most *nb_clusters are allocated, and on return *nb_clusters is updated to
+ * contain the number of clusters that have been allocated and are contiguous
+ * in the image file.
+ *
+ * If *host_offset is non-zero, it specifies the offset in the image file at
+ * which the new clusters must start. *nb_clusters can be 0 on return in this
+ * case if the cluster at host_offset is already in use. If *host_offset is
+ * zero, the clusters can be allocated anywhere in the image file.
+ *
+ * *host_offset is updated to contain the offset into the image file at which
+ * the first allocated cluster starts.
+ *
+ * Return 0 on success and -errno in error cases. -EAGAIN means that the
+ * function has been waiting for another request and the allocation must be
+ * restarted, but the whole request should not be failed.
+ */
+static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
+ uint64_t *host_offset, unsigned int *nb_clusters)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret;
+
+ trace_qcow2_do_alloc_clusters_offset(qemu_coroutine_self(), guest_offset,
+ *host_offset, *nb_clusters);
+
+ ret = handle_dependencies(bs, guest_offset, nb_clusters);
+ if (ret < 0) {
+ return ret;
+ }
+
/* Allocate new clusters */
trace_qcow2_cluster_alloc_phys(qemu_coroutine_self());
if (*host_offset == 0) {
@@ -828,7 +842,7 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
*host_offset = cluster_offset;
return 0;
} else {
- int ret = qcow2_alloc_clusters_at(bs, *host_offset, *nb_clusters);
+ ret = qcow2_alloc_clusters_at(bs, *host_offset, *nb_clusters);
if (ret < 0) {
return ret;
}
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 08/16] qcow2: Reading from areas not in L2 tables yet
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
In preparation for delayed COW (i.e. completing the guest write request
before the associated COWs have completed) we must make sure that once
the guest data has been written, the new data is read back, even if the
COW hasn't completed and the new cluster isn't linked in the L2 table yet.
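As a concrete example (illustrative numbers, assuming 64k clusters):
suppose an in-flight allocation has its guest data area starting at
guest offset 0x100000, with new clusters allocated at host offset
0x500000. Once is_written is set, a read at guest offset 0x114000 has to
be served from the new location:

    start   = 0x114000                     (read offset)
    cluster = alloc_offset + ((start - old_start) & ~(cluster_size - 1))
            = 0x500000 + (0x14000 & ~0xFFFF)
            = 0x510000

The caller then adds the offset within the cluster (0x4000) as usual.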
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 39 +++++++++++++++++++++++++++++++++++++++
block/qcow2.c | 2 ++
block/qcow2.h | 19 +++++++++++++++++++
3 files changed, 60 insertions(+), 0 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 468ef1b..a89d68d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -372,6 +372,40 @@ out:
return ret;
}
+static bool overlaps_allocation(BlockDriverState *bs, uint64_t start,
+ int *num, uint64_t *cluster_offset)
+{
+ BDRVQcowState *s = bs->opaque;
+ QCowL2Meta *m;
+ uint64_t end = start + (*num << BDRV_SECTOR_BITS);
+
+ QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+
+ uint64_t old_start = l2meta_req_start(m);
+ uint64_t old_end = l2meta_req_end(m);
+
+ /* If the guest write hasn't completed yet, the allocating request
+ * can't have completed either, so we're free to read the old data. */
+ if (!m->is_written) {
+ continue;
+ }
+
+ if (start >= old_start && start < old_end) {
+ /* Start of the new request overlaps: Read from the newly allocated
+ * cluster even if it isn't in the L2 table yet. */
+ *num = MIN(*num, (old_end - start) >> BDRV_SECTOR_BITS);
+ *cluster_offset = m->alloc_offset
+ + ((start - old_start) & ~(s->cluster_size - 1));
+ return true;
+ } else if (start < old_start && end > old_start) {
+ /* Overlap somewhere after the start. Shorten this request so that
+ * no overlap occurs. */
+ *num = MIN(*num, (old_start - start) >> BDRV_SECTOR_BITS);
+ }
+ }
+
+ return false;
+}
/*
* get_cluster_offset
@@ -398,6 +432,11 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
uint64_t nb_available, nb_needed;
int ret;
+ /* Check overlap with not yet completed allocations */
+ if (overlaps_allocation(bs, offset, num, cluster_offset)) {
+ return QCOW2_CLUSTER_NORMAL;
+ }
+
index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
nb_needed = *num + index_in_cluster;
diff --git a/block/qcow2.c b/block/qcow2.c
index 6515fdd..2e32136 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -831,6 +831,8 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
}
if (l2meta != NULL) {
+ l2meta->is_written = true;
+
ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
if (ret < 0) {
goto fail;
diff --git a/block/qcow2.h b/block/qcow2.h
index a60fcb4..504dbad 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -229,6 +229,14 @@ typedef struct QCowL2Meta
int nb_clusters;
/**
+ * true if the guest data (but not necessarily the related COW) has been
+ * written to disk, so that read requests can (and after having completed
+ * this request actually _must_) read the new data instead of reading the
+ * old data that the L2 table still refers to.
+ */
+ bool is_written;
+
+ /**
* Requests that overlap with this allocation and wait to be restarted
* when the allocating request has completed.
*/
@@ -298,6 +306,17 @@ static inline bool qcow2_need_accurate_refcounts(BDRVQcowState *s)
return !(s->incompatible_features & QCOW2_INCOMPAT_DIRTY);
}
+static inline uint64_t l2meta_req_start(QCowL2Meta *m)
+{
+ return (m->offset + m->cow_start.offset)
+ + (m->cow_start.nb_sectors << BDRV_SECTOR_BITS);
+}
+
+static inline uint64_t l2meta_req_end(QCowL2Meta *m)
+{
+ return m->offset + (m->nb_available << BDRV_SECTOR_BITS);
+}
+
// FIXME Need qcow2_ prefix to global functions
/* qcow2.c functions */
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 09/16] qcow2: Move COW and L2 update into own coroutine
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
This creates a separate coroutine for processing the COW and the L2
table update of allocating requests. The request itself can then
complete while the second part is still being processed.
We need a qemu_aio_flush() hook in order to ensure that these
coroutines for the second part aren't still running after bdrv_drain_all
(e.g. when the VM is stopped).
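One subtlety in this hand-off (all names as in the patch below): the
ProcessL2Meta argument may live on the caller's stack, because
qemu_coroutine_enter() runs the new coroutine synchronously up to its
first yield point, and process_l2meta copies the pointers it needs right
at the top:

    ProcessL2Meta p = { .bs = bs, .m = l2meta }; /* stack allocation is fine */
    Coroutine *co = qemu_coroutine_create(process_l2meta);
    qemu_coroutine_enter(co, &p); /* p has been consumed when this returns */

After that call returns, the write request can complete while the helper
coroutine is still waiting for the COW and the L2 update.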
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block.c | 5 +++
block/qcow2.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++--------
block/qcow2.h | 8 +++++
block_int.h | 3 ++
4 files changed, 93 insertions(+), 12 deletions(-)
diff --git a/block.c b/block.c
index e78039b..b852f3e 100644
--- a/block.c
+++ b/block.c
@@ -948,7 +948,12 @@ void bdrv_drain_all(void)
qemu_co_queue_restart_all(&bs->throttled_reqs);
busy = true;
}
+
+ if (bs->drv && bs->drv->bdrv_drain) {
+ busy |= bs->drv->bdrv_drain(bs);
+ }
}
+
} while (busy);
/* If requests are still pending there is a bug somewhere */
diff --git a/block/qcow2.c b/block/qcow2.c
index 2e32136..f9881d0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -482,6 +482,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
/* Initialise locks */
qemu_co_mutex_init(&s->lock);
+ qemu_co_rwlock_init(&s->l2meta_flush);
/* Repair image if dirty */
if (!(flags & BDRV_O_CHECK) && !bs->read_only &&
@@ -751,6 +752,50 @@ static void run_dependent_requests(BDRVQcowState *s, QCowL2Meta *m)
}
}
+typedef struct ProcessL2Meta {
+ BlockDriverState *bs;
+ QCowL2Meta *m;
+} ProcessL2Meta;
+
+static void coroutine_fn process_l2meta(void *opaque)
+{
+ ProcessL2Meta *p = opaque;
+ QCowL2Meta *m = p->m;
+ BlockDriverState *bs = p->bs;
+ BDRVQcowState *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&s->lock);
+
+ ret = qcow2_alloc_cluster_link_l2(bs, m);
+ if (ret < 0) {
+ /* FIXME */
+ }
+
+ run_dependent_requests(s, m);
+ g_free(m);
+
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_co_rwlock_unlock(&s->l2meta_flush);
+}
+
+static inline coroutine_fn void stop_l2meta(BDRVQcowState *s)
+{
+ qemu_co_rwlock_wrlock(&s->l2meta_flush);
+}
+
+static inline coroutine_fn void resume_l2meta(BDRVQcowState *s)
+{
+ qemu_co_rwlock_unlock(&s->l2meta_flush);
+}
+
+static bool qcow2_drain(BlockDriverState *bs)
+{
+ BDRVQcowState *s = bs->opaque;
+
+ return !QLIST_EMPTY(&s->cluster_allocs);
+}
+
static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
int64_t sector_num,
int remaining_sectors,
@@ -831,16 +876,21 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
}
if (l2meta != NULL) {
- l2meta->is_written = true;
+ Coroutine *co;
+ ProcessL2Meta p = {
+ .bs = bs,
+ .m = l2meta,
+ };
- ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
- if (ret < 0) {
- goto fail;
- }
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_co_rwlock_rdlock(&s->l2meta_flush);
+
+ l2meta->is_written = true;
+ co = qemu_coroutine_create(process_l2meta);
+ qemu_coroutine_enter(co, &p);
- run_dependent_requests(s, l2meta);
- g_free(l2meta);
l2meta = NULL;
+ qemu_co_mutex_lock(&s->lock);
}
remaining_sectors -= cur_nr_sectors;
@@ -868,6 +918,11 @@ fail:
static void qcow2_close(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
+
+ while (qcow2_drain(bs)) {
+ qemu_aio_wait();
+ }
+
g_free(s->l1_table);
qcow2_cache_flush(bs, s->l2_table_cache);
@@ -1405,10 +1460,12 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
}
/* Whatever is left can use real zero clusters */
+ stop_l2meta(s);
qemu_co_mutex_lock(&s->lock);
ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
nb_sectors);
qemu_co_mutex_unlock(&s->lock);
+ resume_l2meta(s);
return ret;
}
@@ -1419,10 +1476,13 @@ static coroutine_fn int qcow2_co_discard(BlockDriverState *bs,
int ret;
BDRVQcowState *s = bs->opaque;
+ stop_l2meta(s);
qemu_co_mutex_lock(&s->lock);
ret = qcow2_discard_clusters(bs, sector_num << BDRV_SECTOR_BITS,
nb_sectors);
qemu_co_mutex_unlock(&s->lock);
+ resume_l2meta(s);
+
return ret;
}
@@ -1548,23 +1608,27 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
BDRVQcowState *s = bs->opaque;
int ret;
+ stop_l2meta(s);
qemu_co_mutex_lock(&s->lock);
+
ret = qcow2_cache_flush(bs, s->l2_table_cache);
if (ret < 0) {
- qemu_co_mutex_unlock(&s->lock);
- return ret;
+ goto fail;
}
if (qcow2_need_accurate_refcounts(s)) {
ret = qcow2_cache_flush(bs, s->refcount_block_cache);
if (ret < 0) {
- qemu_co_mutex_unlock(&s->lock);
- return ret;
+ goto fail;
}
}
+
+ ret = 0;
+fail:
qemu_co_mutex_unlock(&s->lock);
+ resume_l2meta(s);
- return 0;
+ return ret;
}
static int64_t qcow2_vm_state_offset(BDRVQcowState *s)
@@ -1690,6 +1754,7 @@ static BlockDriver bdrv_qcow2 = {
.bdrv_co_readv = qcow2_co_readv,
.bdrv_co_writev = qcow2_co_writev,
.bdrv_co_flush_to_os = qcow2_co_flush_to_os,
+ .bdrv_drain = qcow2_drain,
.bdrv_co_write_zeroes = qcow2_co_write_zeroes,
.bdrv_co_discard = qcow2_co_discard,
diff --git a/block/qcow2.h b/block/qcow2.h
index 504dbad..73dac17 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -162,6 +162,14 @@ typedef struct BDRVQcowState {
CoMutex lock;
+ /*
+ * Only to be acquired while s->lock is not held.
+ *
+ * Readers: All l2meta coroutines that are in flight
+ * Writers: Anyone who requires l2meta to be flushed
+ */
+ CoRwlock l2meta_flush;
+
uint32_t crypt_method; /* current crypt method, 0 if no key yet */
uint32_t crypt_method_header;
AES_KEY aes_encrypt_key;
diff --git a/block_int.h b/block_int.h
index 4452f6f..f259f52 100644
--- a/block_int.h
+++ b/block_int.h
@@ -198,6 +198,9 @@ struct BlockDriver {
*/
int coroutine_fn (*bdrv_co_flush_to_os)(BlockDriverState *bs);
+ /** Returns true if the block device is still busy */
+ bool (*bdrv_drain)(BlockDriverState *bs);
+
const char *protocol_name;
int (*bdrv_truncate)(BlockDriverState *bs, int64_t offset);
int64_t (*bdrv_getlength)(BlockDriverState *bs);
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 29 +++++++++++++++++++++++++++++
block/qcow2.c | 31 ++++++++++++++++++++++++++++---
block/qcow2.h | 10 ++++++++++
3 files changed, 67 insertions(+), 3 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a89d68d..0f50888 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -791,6 +791,34 @@ out:
return i;
}
+struct KickL2Meta {
+ QEMUBH *bh;
+ QCowL2Meta *m;
+};
+
+static void kick_l2meta_bh(void *opaque)
+{
+ struct KickL2Meta *k = opaque;
+ QCowL2Meta *m = k->m;
+
+ qemu_bh_delete(k->bh);
+ g_free(k);
+
+ if (m->sleeping) {
+ qemu_coroutine_enter(m->co, NULL);
+ }
+}
+
+static void kick_l2meta(QCowL2Meta *m)
+{
+ struct KickL2Meta *k = g_malloc(sizeof(*k));
+
+ k->bh = qemu_bh_new(kick_l2meta_bh, k);
+ k->m = m;
+
+ qemu_bh_schedule(k->bh);
+}
+
/*
* Check if there already is an AIO write request in flight which allocates
* the same cluster. In this case we need to wait until the previous
@@ -823,6 +851,7 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
/* Wait for the dependency to complete. We need to recheck
* the free/allocated clusters when we continue. */
qemu_co_mutex_unlock(&s->lock);
+ kick_l2meta(old_alloc);
qemu_co_queue_wait(&old_alloc->dependent_requests);
qemu_co_mutex_lock(&s->lock);
return -EAGAIN;
diff --git a/block/qcow2.c b/block/qcow2.c
index f9881d0..2e220c7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -765,6 +765,12 @@ static void coroutine_fn process_l2meta(void *opaque)
BDRVQcowState *s = bs->opaque;
int ret;
+ if (!s->in_l2meta_flush) {
+ m->sleeping = true;
+ co_sleep_ns(rt_clock, 1000000);
+ m->sleeping = false;
+ }
+
qemu_co_mutex_lock(&s->lock);
ret = qcow2_alloc_cluster_link_l2(bs, m);
@@ -781,17 +787,37 @@ static void coroutine_fn process_l2meta(void *opaque)
static inline coroutine_fn void stop_l2meta(BDRVQcowState *s)
{
+ QCowL2Meta *m;
+
+ s->in_l2meta_flush = true;
+again:
+ QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+ if (m->sleeping) {
+ qemu_coroutine_enter(m->co, NULL);
+ /* next_in_flight link could have become invalid */
+ goto again;
+ }
+ }
+
qemu_co_rwlock_wrlock(&s->l2meta_flush);
}
static inline coroutine_fn void resume_l2meta(BDRVQcowState *s)
{
+ s->in_l2meta_flush = false;
qemu_co_rwlock_unlock(&s->l2meta_flush);
}
static bool qcow2_drain(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
+ QCowL2Meta *m;
+
+ QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+ if (m->sleeping) {
+ qemu_coroutine_enter(m->co, NULL);
+ }
+ }
return !QLIST_EMPTY(&s->cluster_allocs);
}
@@ -876,7 +902,6 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
}
if (l2meta != NULL) {
- Coroutine *co;
ProcessL2Meta p = {
.bs = bs,
.m = l2meta,
@@ -886,8 +911,8 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
qemu_co_rwlock_rdlock(&s->l2meta_flush);
l2meta->is_written = true;
- co = qemu_coroutine_create(process_l2meta);
- qemu_coroutine_enter(co, &p);
+ l2meta->co = qemu_coroutine_create(process_l2meta);
+ qemu_coroutine_enter(l2meta->co, &p);
l2meta = NULL;
qemu_co_mutex_lock(&s->lock);
diff --git a/block/qcow2.h b/block/qcow2.h
index 73dac17..8bf145c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -169,6 +169,7 @@ typedef struct BDRVQcowState {
* Writers: Anyone who requires l2meta to be flushed
*/
CoRwlock l2meta_flush;
+ bool in_l2meta_flush;
uint32_t crypt_method; /* current crypt method, 0 if no key yet */
uint32_t crypt_method_header;
@@ -245,6 +246,15 @@ typedef struct QCowL2Meta
bool is_written;
/**
+ * true if the request is sleeping in the COW delay and the coroutine may
+ * be reentered in order to cancel the timer.
+ */
+ bool sleeping;
+
+ /** Coroutine that handles delayed COW and updates L2 entry */
+ Coroutine *co;
+
+ /**
* Requests that overlap with this allocation and wait to be restarted
* when the allocating request has completed.
*/
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 11/16] qcow2: Add error handling to the l2meta coroutine
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
Not exactly bisectable, but one large patch isn't much better either :-(
m->error is used to allow bdrv_drain() to stop with l2meta in error
state rather than go into an endless loop.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2.c | 44 ++++++++++++++++++++++++++++++++++++++++----
block/qcow2.h | 3 +++
2 files changed, 43 insertions(+), 4 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 2e220c7..e001436 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -771,11 +771,33 @@ static void coroutine_fn process_l2meta(void *opaque)
m->sleeping = false;
}
+again:
qemu_co_mutex_lock(&s->lock);
ret = qcow2_alloc_cluster_link_l2(bs, m);
if (ret < 0) {
- /* FIXME */
+ /*
+ * This is a nasty situation: We have already completed the allocation
+ * write request and returned success, so just failing it isn't
+ * possible. We need to make sure to return an error during the next
+ * flush.
+ *
+ * However, we still can't drop the l2meta because we want I/O errors
+ * to be recoverable e.g. after the block device has been grown or the
+ * network connection restored. Sleep until the next flush comes and
+ * then retry.
+ */
+ s->flush_error = ret;
+
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_co_rwlock_unlock(&s->l2meta_flush);
+ m->sleeping = true;
+ m->error = true;
+ qemu_coroutine_yield();
+ m->error = false;
+ m->sleeping = false;
+ qemu_co_rwlock_rdlock(&s->l2meta_flush);
+ goto again;
}
run_dependent_requests(s, m);
@@ -812,14 +834,27 @@ static bool qcow2_drain(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
QCowL2Meta *m;
+ bool busy = false;
QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
- if (m->sleeping) {
+ if (m->sleeping && !m->error) {
qemu_coroutine_enter(m->co, NULL);
}
}
- return !QLIST_EMPTY(&s->cluster_allocs);
+ /*
+ * If there's still a sleeping l2meta, then an error must have occurred.
+ * Don't consider l2metas in this state as busy, they only get active on
+ * flushes.
+ */
+ QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+ if (!m->sleeping) {
+ busy = true;
+ break;
+ }
+ }
+
+ return busy;
}
static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
@@ -1648,7 +1683,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
}
}
- ret = 0;
+ ret = s->flush_error;
+ s->flush_error = 0;
fail:
qemu_co_mutex_unlock(&s->lock);
resume_l2meta(s);
diff --git a/block/qcow2.h b/block/qcow2.h
index 8bf145c..1c4dc0e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -171,6 +171,8 @@ typedef struct BDRVQcowState {
CoRwlock l2meta_flush;
bool in_l2meta_flush;
+ int flush_error;
+
uint32_t crypt_method; /* current crypt method, 0 if no key yet */
uint32_t crypt_method_header;
AES_KEY aes_encrypt_key;
@@ -250,6 +252,7 @@ typedef struct QCowL2Meta
* be reentered in order to cancel the timer.
*/
bool sleeping;
+ bool error;
/** Coroutine that handles delayed COW and updates L2 entry */
Coroutine *co;
--
1.7.6.5
* [Qemu-devel] [RFC PATCH 12/16] qcow2: Handle dependencies earlier
From: Kevin Wolf @ 2012-09-18 11:40 UTC
To: qemu-devel; +Cc: kwolf
Handling overlapping allocations isn't just a detail of cluster
allocation. Rather, it is one of three ways to get the host cluster
offset for a write request:
1. If a request overlaps an in-flight allocation, the cluster offset
can be taken from there (this is what handle_dependencies will evolve
into) or the request must simply wait until the allocation has
completed. Accessing the L2 table is not valid in this case; it
contains outdated information.
2. Outside overlapping areas, check the clusters that can be written to
as they are, with no COW involved.
3. If a COW is required, allocate new clusters
Changing the code to reflect this doesn't change the behaviour because
overlaps cannot exist for clusters that are kept in step 2. It does
however make it easier for later patches to work on clusters that belong
to an allocation that is still in flight.
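The resulting control flow can be sketched like this (simplified; it
mirrors the comment block added below):

    again:
        /* 1. defer to overlapping in-flight allocations */
        ret = handle_dependencies(bs, offset, &nb_clusters);
        if (ret == -EAGAIN) {
            goto again; /* the other request completed, re-examine */
        }
        /* 2. use contiguous COPIED clusters as they are, without COW */
        /* 3. allocate new clusters for whatever is still left */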
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 55 ++++++++++++++++++++++++++++++++++--------------
block/qcow2.h | 5 ++++
2 files changed, 44 insertions(+), 16 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0f50888..4d5c3da 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -889,16 +889,10 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
uint64_t *host_offset, unsigned int *nb_clusters)
{
BDRVQcowState *s = bs->opaque;
- int ret;
trace_qcow2_do_alloc_clusters_offset(qemu_coroutine_self(), guest_offset,
*host_offset, *nb_clusters);
- ret = handle_dependencies(bs, guest_offset, nb_clusters);
- if (ret < 0) {
- return ret;
- }
-
/* Allocate new clusters */
trace_qcow2_cluster_alloc_phys(qemu_coroutine_self());
if (*host_offset == 0) {
@@ -910,7 +904,7 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
*host_offset = cluster_offset;
return 0;
} else {
- ret = qcow2_alloc_clusters_at(bs, *host_offset, *nb_clusters);
+ int ret = qcow2_alloc_clusters_at(bs, *host_offset, *nb_clusters);
if (ret < 0) {
return ret;
}
@@ -950,20 +944,51 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
trace_qcow2_alloc_clusters_offset(qemu_coroutine_self(), offset,
n_start, n_end);
- /* Find L2 entry for the first involved cluster */
again:
- ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
- if (ret < 0) {
- return ret;
- }
-
/*
* Calculate the number of clusters to look for. We stop at L2 table
* boundaries to keep things simple.
*/
+ l2_index = offset_to_l2_index(s, offset);
nb_clusters = MIN(size_to_clusters(s, n_end << BDRV_SECTOR_BITS),
s->l2_size - l2_index);
+ /*
+ * Now start gathering as many contiguous clusters as possible:
+ *
+ * 1. Check for overlaps with in-flight allocations
+ *
+ * a) Overlap not in the first cluster -> shorten this request and let
+ * the caller handle the rest in its next loop iteration.
+ *
+ * b) Real overlaps of two requests. Yield and restart the search for
+ * contiguous clusters (the situation could have changed while we
+ * were sleeping)
+ *
+ * c) TODO: Request starts in the same cluster as the in-flight
+ * allocation ends. Shorten the COW of the in-flight allocation, set
+ * cluster_offset to write to the same cluster and set up the right
+ * synchronisation between the in-flight request and the new one.
+ *
+ * 2. Count contiguous COPIED clusters.
+ * TODO: Consider cluster_offset if set in step 1c.
+ *
+ * 3. If the request still hasn't completed, allocate new clusters,
+ * considering any cluster_offset of steps 1c or 2.
+ */
+ ret = handle_dependencies(bs, offset, &nb_clusters);
+ if (ret == -EAGAIN) {
+ goto again;
+ } else if (ret < 0) {
+ return ret;
+ }
+
+ /* Find L2 entry for the first involved cluster */
+ ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
+ if (ret < 0) {
+ return ret;
+ }
+
cluster_offset = be64_to_cpu(l2_table[l2_index]);
/*
@@ -1028,9 +1053,7 @@ again:
/* Allocate, if necessary at a given offset in the image file */
ret = do_alloc_cluster_offset(bs, alloc_offset, &alloc_cluster_offset,
&nb_clusters);
- if (ret == -EAGAIN) {
- goto again;
- } else if (ret < 0) {
+ if (ret < 0) {
goto fail;
}
diff --git a/block/qcow2.h b/block/qcow2.h
index 1c4dc0e..eb94463 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -302,6 +302,11 @@ static inline int size_to_l1(BDRVQcowState *s, int64_t size)
return (size + (1ULL << shift) - 1) >> shift;
}
+static inline int offset_to_l2_index(BDRVQcowState *s, int64_t offset)
+{
+ return (offset >> s->cluster_bits) & (s->l2_size - 1);
+}
+
static inline int64_t align_offset(int64_t offset, int n)
{
offset = (offset + n - 1) & ~(n - 1);
--
1.7.6.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [Qemu-devel] [RFC PATCH 13/16] qcow2: Change handle_dependency to byte granularity
2012-09-18 11:40 [Qemu-devel] [RFC PATCH 00/16] qcow2: Delayed COW Kevin Wolf
` (11 preceding siblings ...)
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 12/16] qcow2: Handle dependencies earlier Kevin Wolf
@ 2012-09-18 11:40 ` Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 14/16] qcow2: Execute run_dependent_requests() without lock Kevin Wolf
` (2 subsequent siblings)
15 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 11:40 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
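The essence of the change is a byte-granularity intersection test in
place of the cluster-granularity one. A minimal stand-alone sketch
(simplified, without the qcow2 types):

    static bool ranges_intersect(uint64_t start, uint64_t bytes,
                                 uint64_t old_start, uint64_t old_end)
    {
        uint64_t end = start + bytes;

        /* Half-open byte ranges: ranges that merely touch no longer
         * conflict, hence <= and >= where the cluster-based check
         * used < and >. */
        return !(end <= old_start || start >= old_end);
    }

The result is converted back to whole clusters only at the end, which
is what the new size_to_clusters() arithmetic below does.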
block/qcow2-cluster.c | 26 ++++++++++++++++----------
block/qcow2.h | 11 +++++++++++
2 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4d5c3da..440fdbf 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -825,29 +825,29 @@ static void kick_l2meta(QCowL2Meta *m)
* request has completed and updated the L2 table accordingly.
*/
static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
- unsigned int *nb_clusters)
+ uint64_t bytes, unsigned int *nb_clusters)
{
BDRVQcowState *s = bs->opaque;
QCowL2Meta *old_alloc;
QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
- uint64_t start = guest_offset >> s->cluster_bits;
- uint64_t end = start + *nb_clusters;
- uint64_t old_start = old_alloc->offset >> s->cluster_bits;
- uint64_t old_end = old_start + old_alloc->nb_clusters;
+ uint64_t start = guest_offset;
+ uint64_t end = start + bytes;
+ uint64_t old_start = l2meta_cow_start(old_alloc);
+ uint64_t old_end = l2meta_cow_end(old_alloc);
- if (end < old_start || start > old_end) {
+ if (end <= old_start || start >= old_end) {
/* No intersection */
} else {
if (start < old_start) {
/* Stop at the start of a running allocation */
- *nb_clusters = old_start - start;
+ bytes = old_start - start;
} else {
- *nb_clusters = 0;
+ bytes = 0;
}
- if (*nb_clusters == 0) {
+ if (bytes == 0) {
/* Wait for the dependency to complete. We need to recheck
* the free/allocated clusters when we continue. */
qemu_co_mutex_unlock(&s->lock);
@@ -859,6 +859,9 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
}
}
+ *nb_clusters = size_to_clusters(s, guest_offset + bytes)
+ - (guest_offset >> s->cluster_bits);
+
if (!*nb_clusters) {
abort();
}
@@ -952,6 +955,7 @@ again:
l2_index = offset_to_l2_index(s, offset);
nb_clusters = MIN(size_to_clusters(s, n_end << BDRV_SECTOR_BITS),
s->l2_size - l2_index);
+ n_end = MIN(n_end, nb_clusters * s->cluster_sectors);
/*
* Now start gathering as many contiguous clusters as possible:
@@ -976,7 +980,9 @@ again:
* 3. If the request still hasn't completed, allocate new clusters,
* considering any cluster_offset of steps 1c or 2.
*/
- ret = handle_dependencies(bs, offset, &nb_clusters);
+ ret = handle_dependencies(bs, offset,
+ (n_end - n_start) * BDRV_SECTOR_SIZE,
+ &nb_clusters);
if (ret == -EAGAIN) {
goto again;
} else if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index eb94463..06ca195 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -343,6 +343,17 @@ static inline uint64_t l2meta_req_end(QCowL2Meta *m)
return m->offset + (m->nb_available << BDRV_SECTOR_BITS);
}
+static inline uint64_t l2meta_cow_start(QCowL2Meta *m)
+{
+ return m->offset + m->cow_start.offset;
+}
+
+static inline uint64_t l2meta_cow_end(QCowL2Meta *m)
+{
+ return m->offset + m->cow_end.offset
+ + (m->cow_end.nb_sectors << BDRV_SECTOR_BITS);
+}
+
// FIXME Need qcow2_ prefix to global functions
/* qcow2.c functions */
--
1.7.6.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [Qemu-devel] [RFC PATCH 14/16] qcow2: Execute run_dependent_requests() without lock
2012-09-18 11:40 [Qemu-devel] [RFC PATCH 00/16] qcow2: Delayed COW Kevin Wolf
` (12 preceding siblings ...)
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 13/16] qcow2: Change handle_dependency to byte granularity Kevin Wolf
@ 2012-09-18 11:40 ` Kevin Wolf
2012-09-18 14:33 ` Paolo Bonzini
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 16/16] [BROKEN] qcow2: Overwrite COW and allocate new cluster at the same time Kevin Wolf
15 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 11:40 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf
There's no reason for run_dependent_requests() to hold s->lock, and a
later patch will in fact require that the lock is not held.
Also, before this patch, run_dependent_requests() not only does what its
name suggests, but also removes the l2meta from the list of in-flight
requests. Change this while we're at it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
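Condensed, the pattern in the calling coroutine after this patch (a
sketch of the hunks below, not new code):

    qemu_co_mutex_unlock(&s->lock);

    /* List removal is the caller's job now... */
    if (m->nb_clusters != 0) {
        QLIST_REMOVE(m, next_in_flight);
    }

    /* ...and dependent requests are restarted without s->lock held */
    run_dependent_requests(s, m);
    g_free(m);

Separating the two also lets preallocate() do just the list removal,
since it never has dependent requests to wake up.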
block/qcow2.c | 26 +++++++++++++++-----------
1 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index e001436..88a2020 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -739,16 +739,9 @@ fail:
static void run_dependent_requests(BDRVQcowState *s, QCowL2Meta *m)
{
- /* Take the request off the list of running requests */
- if (m->nb_clusters != 0) {
- QLIST_REMOVE(m, next_in_flight);
- }
-
/* Restart all dependent requests */
if (!qemu_co_queue_empty(&m->dependent_requests)) {
- qemu_co_mutex_unlock(&s->lock);
qemu_co_queue_restart_all(&m->dependent_requests);
- qemu_co_mutex_lock(&s->lock);
}
}
@@ -800,10 +793,18 @@ again:
goto again;
}
+ qemu_co_mutex_unlock(&s->lock);
+
+ /* Take the request off the list of running requests */
+ if (m->nb_clusters != 0) {
+ QLIST_REMOVE(m, next_in_flight);
+ }
+
+ /* Meanwhile some new dependencies could have accumulated */
run_dependent_requests(s, m);
+
g_free(m);
- qemu_co_mutex_unlock(&s->lock);
qemu_co_rwlock_unlock(&s->l2meta_flush);
}
@@ -961,13 +962,16 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
ret = 0;
fail:
+ qemu_co_mutex_unlock(&s->lock);
+
if (l2meta != NULL) {
+ if (l2meta->nb_clusters != 0) {
+ QLIST_REMOVE(l2meta, next_in_flight);
+ }
run_dependent_requests(s, l2meta);
g_free(l2meta);
}
- qemu_co_mutex_unlock(&s->lock);
-
qemu_iovec_destroy(&hd_qiov);
qemu_vfree(cluster_data);
trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
@@ -1259,7 +1263,7 @@ static int preallocate(BlockDriverState *bs)
/* There are no dependent requests, but we need to remove our request
* from the list of in-flight requests */
if (meta != NULL) {
- run_dependent_requests(bs->opaque, meta);
+ QLIST_REMOVE(meta, next_in_flight);
}
/* TODO Preallocate data if requested */
--
1.7.6.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten
2012-09-18 11:40 [Qemu-devel] [RFC PATCH 00/16] qcow2: Delayed COW Kevin Wolf
` (13 preceding siblings ...)
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 14/16] qcow2: Execute run_dependent_requests() without lock Kevin Wolf
@ 2012-09-18 11:40 ` Kevin Wolf
2012-09-18 14:44 ` Paolo Bonzini
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 16/16] [BROKEN] qcow2: Overwrite COW and allocate new cluster at the same time Kevin Wolf
15 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 11:40 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf
This is the first part of an optimisation to improve the performance of
sequential cluster allocations.
Typically, write requests aren't aligned to cluster boundaries, so
sequential allocation means that every other request has to wait for the
COW of the previous request to complete. We can do better: instead of
waiting for the COW to complete and then overwriting the same area with
the second write request, just cancel the COW.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
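To illustrate the case this targets, with made-up offsets and 64k
clusters:

    /*
     * Request A allocates cluster X for guest bytes 0..60k:
     *
     *     cow_start: nothing to copy
     *     cow_end:   offset 60k, the last 4k of the cluster
     *
     * Before A's delayed COW runs, request B writes guest bytes
     * 60k..124k. handle_dependencies() now:
     *
     * - shortens A's cow_end to zero sectors (B overwrites that area
     *   anyway),
     * - creates an l2meta with no_l2_update set for the first part of
     *   B, pointing it at the tail of cluster X instead of letting it
     *   wait for A,
     * - takes a reader lock on A's l2_writeback_lock, so that A's L2
     *   entry is only written once B's data is on disk.
     */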
block/qcow2-cluster.c | 127 +++++++++++++++++++++++++++++++++++++++++++------
block/qcow2.c | 21 ++++++++
block/qcow2.h | 47 ++++++++++++++++++
3 files changed, 180 insertions(+), 15 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 440fdbf..ff22992 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -659,6 +659,8 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
BDRVQcowState *s = bs->opaque;
int ret;
+ r->final = true;
+
if (r->nb_sectors == 0) {
return 0;
}
@@ -689,6 +691,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
int i, j = 0, l2_index, ret;
uint64_t *old_cluster, *l2_table;
uint64_t cluster_offset = m->alloc_offset;
+ bool has_wr_lock = false;
trace_qcow2_cluster_link_l2(qemu_coroutine_self(), m->nb_clusters);
assert(m->nb_clusters > 0);
@@ -707,6 +710,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
}
/* Update L2 table. */
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_co_rwlock_wrlock(&m->l2_writeback_lock);
+ has_wr_lock = true;
+ qemu_co_mutex_lock(&s->lock);
+
+ if (m->no_l2_update) {
+ ret = 0;
+ goto err;
+ }
+
if (s->compatible_features & QCOW2_COMPAT_LAZY_REFCOUNTS) {
qcow2_mark_dirty(bs);
}
@@ -753,6 +766,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
ret = 0;
err:
+ if (has_wr_lock) {
+ qemu_co_rwlock_unlock(&m->l2_writeback_lock);
+ }
g_free(old_cluster);
return ret;
}
@@ -825,7 +841,8 @@ static void kick_l2meta(QCowL2Meta *m)
* request has completed and updated the L2 table accordingly.
*/
static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
- uint64_t bytes, unsigned int *nb_clusters)
+ uint64_t *host_offset, uint64_t bytes, unsigned int *nb_clusters,
+ QCowL2Meta **m)
{
BDRVQcowState *s = bs->opaque;
QCowL2Meta *old_alloc;
@@ -840,22 +857,96 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
if (end <= old_start || start >= old_end) {
/* No intersection */
} else {
+ uint64_t new_bytes;
+ uint64_t old_cow_end;
+
+ /*
+ * Shorten the request to stop at the start of a running
+ * allocation.
+ */
if (start < old_start) {
- /* Stop at the start of a running allocation */
- bytes = old_start - start;
+ new_bytes = old_start - start;
} else {
- bytes = 0;
+ new_bytes = 0;
+ }
+
+ if (new_bytes > 0) {
+ bytes = new_bytes;
+ continue;
+ }
+
+ /*
+ * Check if we're just overwriting some COW of the old allocation
+ * that is safe to be replaced by the data of this request.
+ */
+ old_cow_end = old_alloc->offset + old_alloc->cow_end.offset;
+
+ if ((old_end & (s->cluster_size - 1)) == 0
+ && start >= old_cow_end
+ && !old_alloc->cow_end.final)
+ {
+ uint64_t subcluster_offset;
+ int nb_sectors;
+
+ *nb_clusters = 1;
+ subcluster_offset = offset_into_cluster(s, guest_offset);
+ nb_sectors = (subcluster_offset + bytes) >> BDRV_SECTOR_BITS;
+
+ /* Move forward cluster by cluster when overwriting COW areas,
+ * or we'd have to deal with multiple overlapping requests and
+ * things would become complicated. */
+ nb_sectors = MIN(s->cluster_sectors, nb_sectors);
+
+ /* Shorten the COW area at the end of the old request */
+ old_alloc->cow_end.nb_sectors =
+ (guest_offset - old_cow_end) >> BDRV_SECTOR_BITS;
+
+ /* The new data region starts in the same cluster where the COW
+ * region at the end of the old request starts. */
+ *host_offset = start_of_cluster(s,
+ old_alloc->alloc_offset + old_alloc->cow_end.offset);
+
+ /* Create new l2meta that doesn't actually allocate new L2
+ * entries, but describes the new data area so that reads
+ * access the right cluster */
+ *m = g_malloc0(sizeof(**m));
+ **m = (QCowL2Meta) {
+ .parent = old_alloc,
+ .alloc_offset = *host_offset,
+ .offset = guest_offset & ~(s->cluster_size - 1),
+ .nb_clusters = 1,
+ .nb_available = nb_sectors,
+ .no_l2_update = true,
+
+ .cow_start = {
+ .offset = subcluster_offset,
+ .nb_sectors = 0,
+ },
+ .cow_end = {
+ .offset = nb_sectors << BDRV_SECTOR_BITS,
+ .nb_sectors = s->cluster_sectors - nb_sectors,
+ },
+ };
+ qemu_co_queue_init(&(*m)->dependent_requests);
+ qemu_co_rwlock_init(&(*m)->l2_writeback_lock);
+ QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+ /* Ensure that the L2 isn't written before COW has completed */
+ assert(!old_alloc->l2_writeback_lock.writer);
+ qemu_co_rwlock_rdlock(&old_alloc->l2_writeback_lock);
+
+ return 0;
}
- if (bytes == 0) {
- /* Wait for the dependency to complete. We need to recheck
- * the free/allocated clusters when we continue. */
- qemu_co_mutex_unlock(&s->lock);
+ /* Wait for the dependency to complete. We need to recheck
+ * the free/allocated clusters when we continue. */
+ qemu_co_mutex_unlock(&s->lock);
+ if (old_alloc->sleeping) {
kick_l2meta(old_alloc);
- qemu_co_queue_wait(&old_alloc->dependent_requests);
- qemu_co_mutex_lock(&s->lock);
- return -EAGAIN;
}
+ qemu_co_queue_wait(&old_alloc->dependent_requests);
+ qemu_co_mutex_lock(&s->lock);
+ return -EAGAIN;
}
}
@@ -942,7 +1033,7 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
int l2_index, ret, sectors;
uint64_t *l2_table;
unsigned int nb_clusters, keep_clusters;
- uint64_t cluster_offset;
+ uint64_t cluster_offset = 0;
trace_qcow2_alloc_clusters_offset(qemu_coroutine_self(), offset,
n_start, n_end);
@@ -969,7 +1060,7 @@ again:
* contiguous clusters (the situation could have changed while we
* were sleeping)
*
- * c) TODO: Request starts in the same cluster as the in-flight
+ * c) Request starts in the same cluster as the in-flight
* allocation ends. Shorten the COW of the in-flight allocation, set
* cluster_offset to write to the same cluster and set up the right
* synchronisation between the in-flight request and the new one.
@@ -980,13 +1071,17 @@ again:
* 3. If the request still hasn't completed, allocate new clusters,
* considering any cluster_offset of steps 1c or 2.
*/
- ret = handle_dependencies(bs, offset,
+ ret = handle_dependencies(bs, offset, &cluster_offset,
(n_end - n_start) * BDRV_SECTOR_SIZE,
- &nb_clusters);
+ &nb_clusters, m);
if (ret == -EAGAIN) {
goto again;
} else if (ret < 0) {
return ret;
+ } else if (*m) {
+ keep_clusters = 1;
+ nb_clusters = 0;
+ goto done;
}
/* Find L2 entry for the first involved cluster */
@@ -1105,11 +1200,13 @@ again:
},
};
qemu_co_queue_init(&(*m)->dependent_requests);
+ qemu_co_rwlock_init(&(*m)->l2_writeback_lock);
QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
}
}
/* Some cleanup work */
+done:
sectors = (keep_clusters + nb_clusters) << (s->cluster_bits - 9);
if (sectors > n_end) {
sectors = n_end;
diff --git a/block/qcow2.c b/block/qcow2.c
index 88a2020..abc3de3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -779,6 +779,11 @@ again:
* to be recoverable e.g. after the block device has been grown or the
* network connection restored. Sleep until the next flush comes and
* then retry.
+ *
+ * FIXME:
+ * - Need to cancel any other requests to the same cluster (or at least
+ * prevent the L2 entry from being updated) (Really? Don't they wait
+ * automatically?)
*/
s->flush_error = ret;
@@ -795,6 +800,22 @@ again:
qemu_co_mutex_unlock(&s->lock);
+ if (m->parent) {
+ /*
+ * Allow the parent to write the L2 table entry now that both guest
+ * data write and COW have completed. Don't remove the request from the
+ * queue until the L2 is updated; we still need it for overriding
+ * the cluster_offset of requests to the same area.
+ *
+ * The release of the l2_writeback_lock and the queuing must be
+ * performed atomically to avoid a race. This code implements this
+ * correctly because unlocking only schedules a BH to wake the parent,
+ * but doesn't yield yet.
+ */
+ qemu_co_rwlock_unlock(&m->parent->l2_writeback_lock);
+ qemu_co_queue_wait(&m->parent->dependent_requests);
+ }
+
/* Take the request off the list of running requests */
if (m->nb_clusters != 0) {
QLIST_REMOVE(m, next_in_flight);
diff --git a/block/qcow2.h b/block/qcow2.h
index 06ca195..7ed082b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -216,6 +216,12 @@ typedef struct Qcow2COWRegion {
/** Number of sectors to copy */
int nb_sectors;
+
+ /**
+ * Becomes true when offset and nb_sectors cannot be modified any more
+ * because the COW is already being processed.
+ */
+ bool final;
} Qcow2COWRegion;
/**
@@ -254,6 +260,12 @@ typedef struct QCowL2Meta
bool sleeping;
bool error;
+ /**
+ * true indicates that this is not an actual cluster allocation, but reuses
+ * the allocation of an earlier request, so don't update the L2 table.
+ */
+ bool no_l2_update;
+
/** Coroutine that handles delayed COW and updates L2 entry */
Coroutine *co;
@@ -275,6 +287,31 @@ typedef struct QCowL2Meta
*/
Qcow2COWRegion cow_end;
+ /**
+ * The L2 table must only be updated when all COWs to the same sector have
+ * completed. A request that shortens part of the COW takes a reader lock
+ * until its data and COW are written.
+ */
+ CoRwlock l2_writeback_lock;
+
+ /**
+ * QCowL2Metas can form a tree. The root of this tree is a request that
+ * actually allocated the host cluster in the image file. Children are
+ * requests that overwrite (part of) a COW region of their parents, so that
+ * the COW of their parent was cancelled.
+ *
+ * Important rules to bear in mind:
+ *
+ * - The parent must not write its L2 entry before all children have
+ * written both their guest data and COW regions. If it did, the image on
+ * disk would still have an incomplete COW and data would be corrupted.
+ *
+ * - The children must stay in s->cluster_allocs until the parent has
+ * written the L2 entry, so that read requests correctly return the new
+ * data even though the L2 table is not updated yet.
+ */
+ struct QCowL2Meta *parent;
+
QLIST_ENTRY(QCowL2Meta) next_in_flight;
} QCowL2Meta;
@@ -291,6 +328,16 @@ enum {
#define REFT_OFFSET_MASK 0xffffffffffffff00ULL
+static inline int start_of_cluster(BDRVQcowState *s, int64_t offset)
+{
+ return offset & ~(s->cluster_size - 1);
+}
+
+static inline int offset_into_cluster(BDRVQcowState *s, int64_t offset)
+{
+ return offset & (s->cluster_size - 1);
+}
+
static inline int size_to_clusters(BDRVQcowState *s, int64_t size)
{
return (size + (s->cluster_size - 1)) >> s->cluster_bits;
--
1.7.6.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [Qemu-devel] [RFC PATCH 16/16] [BROKEN] qcow2: Overwrite COW and allocate new cluster at the same time
2012-09-18 11:40 [Qemu-devel] [RFC PATCH 00/16] qcow2: Delayed COW Kevin Wolf
` (14 preceding siblings ...)
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten Kevin Wolf
@ 2012-09-18 11:40 ` Kevin Wolf
15 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 11:40 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf
Cancelling the COW when it's overwritten by a subsequent guest write
request was a good start; however, in practice we don't gain performance
yet. The second write request is split in two parts: the first one
overwrites the COW area, and the second one allocates another cluster.
We can do both at the same time, and then we actually do gain
performance (iozone initial writer test up from 22 to 35 MB/s).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
This patch is not correct at all and potentially corrupts images; it's
ugly, too. But it works well enough to try out what gains to expect, so
I decided to include it here anyway.
---
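The scenario, with made-up offsets (64k clusters, 60k writes):

    /*
     * write #1: 0..60k    -> allocates cluster A, COW for 60k..64k
     * write #2: 60k..120k -> so far split into
     *     60k..64k   overwrites A's COW region (previous patch)
     *     64k..120k  a second iteration that allocates cluster B
     *
     * With the ->next chain, a single call to
     * qcow2_alloc_cluster_offset() can return both pieces: an l2meta
     * that reuses cluster A (no_l2_update) and a chained l2meta that
     * allocates cluster B. qcow2_co_writev() then spawns one COW
     * coroutine per chain element.
     */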
block/qcow2-cluster.c | 17 ++++++++++++-----
block/qcow2.c | 29 ++++++++++++++++++-----------
block/qcow2.h | 1 +
3 files changed, 31 insertions(+), 16 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ff22992..39ef7b0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -888,7 +888,6 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
uint64_t subcluster_offset;
int nb_sectors;
- *nb_clusters = 1;
subcluster_offset = offset_into_cluster(s, guest_offset);
nb_sectors = (subcluster_offset + bytes) >> BDRV_SECTOR_BITS;
@@ -1032,7 +1031,7 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
BDRVQcowState *s = bs->opaque;
int l2_index, ret, sectors;
uint64_t *l2_table;
- unsigned int nb_clusters, keep_clusters;
+ unsigned int nb_clusters, keep_clusters = 0;
uint64_t cluster_offset = 0;
trace_qcow2_alloc_clusters_offset(qemu_coroutine_self(), offset,
@@ -1079,17 +1078,21 @@ again:
} else if (ret < 0) {
return ret;
} else if (*m) {
+ /* FIXME There could be more dependencies */
keep_clusters = 1;
- nb_clusters = 0;
- goto done;
+ nb_clusters -= keep_clusters;
}
+
/* Find L2 entry for the first involved cluster */
ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
if (ret < 0) {
return ret;
}
+ if (cluster_offset != 0) {
+ goto do_alloc;
+ }
cluster_offset = be64_to_cpu(l2_table[l2_index]);
/*
@@ -1111,6 +1114,7 @@ again:
cluster_offset = 0;
}
+do_alloc:
if (nb_clusters > 0) {
/* For the moment, overwrite compressed clusters one by one */
uint64_t entry = be64_to_cpu(l2_table[l2_index + keep_clusters]);
@@ -1177,6 +1181,7 @@ again:
<< (s->cluster_bits - BDRV_SECTOR_BITS);
int alloc_n_start = keep_clusters == 0 ? n_start : 0;
int nb_sectors = MIN(requested_sectors, avail_sectors);
+ QCowL2Meta *old_m = *m;
if (keep_clusters == 0) {
cluster_offset = alloc_cluster_offset;
@@ -1185,6 +1190,8 @@ again:
*m = g_malloc0(sizeof(**m));
**m = (QCowL2Meta) {
+ .next = old_m,
+
.alloc_offset = alloc_cluster_offset,
.offset = alloc_offset & ~(s->cluster_size - 1),
.nb_clusters = nb_clusters,
@@ -1198,6 +1205,7 @@ again:
.offset = nb_sectors * BDRV_SECTOR_SIZE,
.nb_sectors = avail_sectors - nb_sectors,
},
+
};
qemu_co_queue_init(&(*m)->dependent_requests);
qemu_co_rwlock_init(&(*m)->l2_writeback_lock);
@@ -1206,7 +1214,6 @@ again:
}
/* Some cleanup work */
-done:
sectors = (keep_clusters + nb_clusters) << (s->cluster_bits - 9);
if (sectors > n_end) {
sectors = n_end;
diff --git a/block/qcow2.c b/block/qcow2.c
index abc3de3..e6fa616 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -959,19 +959,21 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
}
if (l2meta != NULL) {
- ProcessL2Meta p = {
- .bs = bs,
- .m = l2meta,
- };
-
qemu_co_mutex_unlock(&s->lock);
- qemu_co_rwlock_rdlock(&s->l2meta_flush);
- l2meta->is_written = true;
- l2meta->co = qemu_coroutine_create(process_l2meta);
- qemu_coroutine_enter(l2meta->co, &p);
+ while (l2meta != NULL) {
+ ProcessL2Meta p = {
+ .bs = bs,
+ .m = l2meta,
+ };
+
+ qemu_co_rwlock_rdlock(&s->l2meta_flush);
+ l2meta->is_written = true;
+ l2meta->co = qemu_coroutine_create(process_l2meta);
+ qemu_coroutine_enter(l2meta->co, &p);
+ l2meta = l2meta->next;
+ }
- l2meta = NULL;
qemu_co_mutex_lock(&s->lock);
}
@@ -985,12 +987,17 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
fail:
qemu_co_mutex_unlock(&s->lock);
- if (l2meta != NULL) {
+ while (l2meta != NULL) {
+ QCowL2Meta *next;
+
if (l2meta->nb_clusters != 0) {
QLIST_REMOVE(l2meta, next_in_flight);
}
run_dependent_requests(s, l2meta);
+
+ next = l2meta->next;
g_free(l2meta);
+ l2meta = next;
}
qemu_iovec_destroy(&hd_qiov);
diff --git a/block/qcow2.h b/block/qcow2.h
index 7ed082b..9ebb5f9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -311,6 +311,7 @@ typedef struct QCowL2Meta
* data even though the L2 table is not updated yet.
*/
struct QCowL2Meta *parent;
+ struct QCowL2Meta *next;
QLIST_ENTRY(QCowL2Meta) next_in_flight;
} QCowL2Meta;
--
1.7.6.5
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 09/16] qcow2: Move COW and L2 update into own coroutine
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 09/16] qcow2: Move COW and L2 update into own coroutine Kevin Wolf
@ 2012-09-18 14:24 ` Paolo Bonzini
2012-09-18 14:44 ` Kevin Wolf
0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 14:24 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On 18/09/2012 13:40, Kevin Wolf wrote:
> + qemu_co_mutex_unlock(&s->lock);
> + qemu_co_rwlock_rdlock(&s->l2meta_flush);
Should this lock be taken in process_l2meta? It's a bit easier to follow.
Paolo
> + l2meta->is_written = true;
> + co = qemu_coroutine_create(process_l2meta);
> + qemu_coroutine_enter(co, &p);
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW Kevin Wolf
@ 2012-09-18 14:27 ` Paolo Bonzini
2012-09-18 14:49 ` Kevin Wolf
2012-09-19 18:47 ` Blue Swirl
1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 14:27 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On 18/09/2012 13:40, Kevin Wolf wrote:
> +again:
> + QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
> + if (m->sleeping) {
> + qemu_coroutine_enter(m->co, NULL);
> + /* next_in_flight link could have become invalid */
> + goto again;
> + }
> + }
> +
> qemu_co_rwlock_wrlock(&s->l2meta_flush);
> }
>
> static inline coroutine_fn void resume_l2meta(BDRVQcowState *s)
> {
> + s->in_l2meta_flush = false;
> qemu_co_rwlock_unlock(&s->l2meta_flush);
> }
>
> static bool qcow2_drain(BlockDriverState *bs)
> {
> BDRVQcowState *s = bs->opaque;
> + QCowL2Meta *m;
> +
> + QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
> + if (m->sleeping) {
> + qemu_coroutine_enter(m->co, NULL);
> + }
> + }
>
Why are the goto and in_l2meta_flush not needed here? If they are,
perhaps stop_l2meta can just use qcow2_drain?
Paolo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 11/16] qcow2: Add error handling to the l2meta coroutine
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 11/16] qcow2: Add error handling to the l2meta coroutine Kevin Wolf
@ 2012-09-18 14:29 ` Paolo Bonzini
0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 14:29 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On 18/09/2012 13:40, Kevin Wolf wrote:
> Not exactly bisectable, but one large patch isn't much better either
For better bisectability you could add the co_sleep_ns in a separate
patch, later in the series.
Paolo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 14/16] qcow2: Execute run_dependent_requests() without lock
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 14/16] qcow2: Execute run_dependent_requests() without lock Kevin Wolf
@ 2012-09-18 14:33 ` Paolo Bonzini
2012-09-18 14:54 ` Kevin Wolf
0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 14:33 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On 18/09/2012 13:40, Kevin Wolf wrote:
> static void run_dependent_requests(BDRVQcowState *s, QCowL2Meta *m)
> {
> - /* Take the request off the list of running requests */
> - if (m->nb_clusters != 0) {
> - QLIST_REMOVE(m, next_in_flight);
> - }
> -
> /* Restart all dependent requests */
> if (!qemu_co_queue_empty(&m->dependent_requests)) {
> - qemu_co_mutex_unlock(&s->lock);
> qemu_co_queue_restart_all(&m->dependent_requests);
> - qemu_co_mutex_lock(&s->lock);
> }
The comment and if can go away.
Perhaps this patch could be moved earlier in the series? (Just asking,
in case the rebase is not too painful).
Paolo
> }
>
> @@ -800,10 +793,18 @@ again:
> goto again;
> }
>
> + qemu_co_mutex_unlock(&s->lock);
> +
> + /* Take the request off the list of running requests */
> + if (m->nb_clusters != 0) {
> + QLIST_REMOVE(m, next_in_flight);
> + }
> +
> + /* Meanwhile some new dependencies could have accumulated */
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 09/16] qcow2: Move COW and L2 update into own coroutine
2012-09-18 14:24 ` Paolo Bonzini
@ 2012-09-18 14:44 ` Kevin Wolf
2012-09-18 14:59 ` Paolo Bonzini
0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 14:44 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On 18.09.2012 16:24, Paolo Bonzini wrote:
> On 18/09/2012 13:40, Kevin Wolf wrote:
>> + qemu_co_mutex_unlock(&s->lock);
>> + qemu_co_rwlock_rdlock(&s->l2meta_flush);
>
> Should this lock be taken in process_l2meta? It's a bit easier to follow.
I'm pretty sure there was a reason, but it isn't obvious any more. I
guess I should have put a comment there... Maybe it doesn't exist any
more, or maybe it's not that obvious.
The difference would be that while waiting for the lock, the original
write request could complete instead of waiting as well, and that the
lock is potentially taken only in a BH instead of immediately.
What happens if bdrv_aio_flush() and bdrv_aio_writev() are both in
flight? If the flush runs its stop_l2meta() after the write request has
signalled completion, but before the COW coroutine has started, it gets
the lock even though a COW must still be processed. I believe we could
then return a successful flush when the metadata isn't really on disk yet.
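Roughly this interleaving, assuming the rdlock were moved into
process_l2meta:

    write path                          flush path
    ----------                          ----------
    guest data written
    request signals completion
                                        stop_l2meta()
                                        -> wrlock(l2meta_flush)
                                           succeeds, no reader yet
                                        flush returns success
    process_l2meta() starts and
    takes the rdlock only now           COW/L2 update not on disk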
So if you agree, I think we need to leave it where it is.
Kevin
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten Kevin Wolf
@ 2012-09-18 14:44 ` Paolo Bonzini
2012-09-18 15:02 ` Kevin Wolf
0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 14:44 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On 18/09/2012 13:40, Kevin Wolf wrote:
> + qemu_co_mutex_unlock(&s->lock);
> + qemu_co_rwlock_wrlock(&m->l2_writeback_lock);
Can anybody else take the lock as reader again at this point? If not, I
wonder if this is more clear if you write it as a CoQueue.
Paolo
> + has_wr_lock = true;
> + qemu_co_mutex_lock(&s->lock);
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW
2012-09-18 14:27 ` Paolo Bonzini
@ 2012-09-18 14:49 ` Kevin Wolf
0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 14:49 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On 18.09.2012 16:27, Paolo Bonzini wrote:
> On 18/09/2012 13:40, Kevin Wolf wrote:
>> +again:
>> + QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
>> + if (m->sleeping) {
>> + qemu_coroutine_enter(m->co, NULL);
>> + /* next_in_flight link could have become invalid */
>> + goto again;
>> + }
>> + }
>> +
>> qemu_co_rwlock_wrlock(&s->l2meta_flush);
>> }
>>
>> static inline coroutine_fn void resume_l2meta(BDRVQcowState *s)
>> {
>> + s->in_l2meta_flush = false;
>> qemu_co_rwlock_unlock(&s->l2meta_flush);
>> }
>>
>> static bool qcow2_drain(BlockDriverState *bs)
>> {
>> BDRVQcowState *s = bs->opaque;
>> + QCowL2Meta *m;
>> +
>> + QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
>> + if (m->sleeping) {
>> + qemu_coroutine_enter(m->co, NULL);
>> + }
>> + }
>>
>
> Why are the goto and in_l2meta_flush not needed here? If they are,
> perhaps stop_l2meta can just use qcow2_drain?
I think you're right, thanks.
Kevin
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 14/16] qcow2: Execute run_dependent_requests() without lock
2012-09-18 14:33 ` Paolo Bonzini
@ 2012-09-18 14:54 ` Kevin Wolf
0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 14:54 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On 18.09.2012 16:33, Paolo Bonzini wrote:
> On 18/09/2012 13:40, Kevin Wolf wrote:
>> static void run_dependent_requests(BDRVQcowState *s, QCowL2Meta *m)
>> {
>> - /* Take the request off the list of running requests */
>> - if (m->nb_clusters != 0) {
>> - QLIST_REMOVE(m, next_in_flight);
>> - }
>> -
>> /* Restart all dependent requests */
>> if (!qemu_co_queue_empty(&m->dependent_requests)) {
>> - qemu_co_mutex_unlock(&s->lock);
>> qemu_co_queue_restart_all(&m->dependent_requests);
>> - qemu_co_mutex_lock(&s->lock);
>> }
>
> The comment and if can go away.
Ah, yes. And the one remaining line can be open coded in the two callers.
> Perhaps this patch could be moved earlier in the series? (Just asking,
> in case the rebase is not too painful).
I'll check how painful it becomes. I'd expect it's doable.
Kevin
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 09/16] qcow2: Move COW and L2 update into own coroutine
2012-09-18 14:44 ` Kevin Wolf
@ 2012-09-18 14:59 ` Paolo Bonzini
0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 14:59 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On 18/09/2012 16:44, Kevin Wolf wrote:
>>> + qemu_co_mutex_unlock(&s->lock);
>>> >> + qemu_co_rwlock_rdlock(&s->l2meta_flush);
>> >
>> > Should this lock be taken in process_l2meta? It's a bit easier to follow.
> I'm pretty sure there was a reason, but it isn't obvious any more. I
> guess I should have put a comment there... Maybe it doesn't exist any
> more, or maybe it's not that obvious.
>
> The difference would be that while waiting for the lock, the original
> write request could complete instead of waiting as well, and that the
> lock is potentially taken only in a BH instead of immediately.
>
> What happens if bdrv_aio_flush() and bdrv_aio_writev() are both in
> flight? If the flush runs its stop_l2meta() after the write request has
> signalled completion, but before the COW coroutine has started, it gets
> the lock even though a COW must still be processed. I believe we could
> then return a successful flush when the metadata isn't really on disk yet.
Then you need two comments, because you also need to document that
process_l2meta expects the read lock to be taken.
> So if you agree, I think we need to leave it where it is.
We do, but there may be other alternatives:
1) Perhaps you can take the rdlock again in process_l2meta, and also add
an unlock here just after qemu_coroutine_enter?
2) Perhaps you can replace the rwlock with an explicit CoQueue, for
which an rwlock is just a wrapper. In thread terms, that would be a
condition variable, which I find easier to reason about than an rwlock.
In this case, stop_l2meta would see that there is a pending write and
yield. The bottom half can then pick up the work.
Do you actually need the writer side of the lock, since you're already
taking s->lock immediately after calling stop_l2meta? i.e. can
qcow2_{zero,discard}_clusters release the s->lock, or not? If not,
you're just using the write side to kick all readers, and that would be
a point in favor of (2).
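For (2), roughly this (counter and queue are made-up names):

    /* flush side: wait until all pending COWs have finished */
    while (s->l2meta_pending > 0) {
        qemu_co_queue_wait(&s->l2meta_done);
    }

    /* COW coroutine, after its L2 update: */
    if (--s->l2meta_pending == 0) {
        qemu_co_queue_restart_all(&s->l2meta_done);
    }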
Paolo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten
2012-09-18 14:44 ` Paolo Bonzini
@ 2012-09-18 15:02 ` Kevin Wolf
2012-09-18 15:05 ` Paolo Bonzini
0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-09-18 15:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On 18.09.2012 16:44, Paolo Bonzini wrote:
> On 18/09/2012 13:40, Kevin Wolf wrote:
>> + qemu_co_mutex_unlock(&s->lock);
>> + qemu_co_rwlock_wrlock(&m->l2_writeback_lock);
>
> Can anybody else take the lock as reader again at this point? If not, I
> wonder if this is more clear if you write it as a CoQueue.
This isn't "let one writer complete, then wake up n readers" (which
would indeed be represented more naturally as a CoQueue), but rather "let
n readers complete, then wake up one writer".
Kevin
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten
2012-09-18 15:02 ` Kevin Wolf
@ 2012-09-18 15:05 ` Paolo Bonzini
2012-09-18 15:08 ` Paolo Bonzini
0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 15:05 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On 18/09/2012 17:02, Kevin Wolf wrote:
>>> >> + qemu_co_mutex_unlock(&s->lock);
>>> >> + qemu_co_rwlock_wrlock(&m->l2_writeback_lock);
>> >
>> > Can anybody else take the lock as reader again at this point? If not, I
>> > wonder if this is more clear if you write it as a CoQueue.
> This isn't "let one writer complete, then wake up n readers" (which
> would indeed be represented more naturally as CoQueue), but rather "let
> n readers complete, then wake up one writer".
So would it be representable as a list of children (matching the parent
pointer) and then unlocking is
remove self from children list
if the children list is empty, wake parent->co
?
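In C, something like this (list and field names made up):

    QLIST_REMOVE(m, next_child);
    if (QLIST_EMPTY(&m->parent->children)) {
        /* or kick it via a BH, like kick_l2meta() does */
        qemu_coroutine_enter(m->parent->co, NULL);
    }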
Paolo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten
2012-09-18 15:05 ` Paolo Bonzini
@ 2012-09-18 15:08 ` Paolo Bonzini
0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2012-09-18 15:08 UTC (permalink / raw)
Cc: Kevin Wolf, qemu-devel
On 18/09/2012 17:05, Paolo Bonzini wrote:
>>>>>> >>> >> + qemu_co_mutex_unlock(&s->lock);
>>>>>> >>> >> + qemu_co_rwlock_wrlock(&m->l2_writeback_lock);
>>>> >> >
>>>> >> > Can anybody else take the lock as reader again at this point? If not, I
>>>> >> > wonder if this is more clear if you write it as a CoQueue.
>> > This isn't "let one writer complete, then wake up n readers" (which
>> > would indeed be represented more naturally as CoQueue), but rather "let
>> > n readers complete, then wake up one writer".
> So would it be representable as a list of children (matching the parent
> pointer) and then unlocking is
>
> remove self from children list
> if the children list is empty, wake parent->co
>
> ?
Of course this only works if the list of children cannot become
non-empty again after the writer restarts.
Paolo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW Kevin Wolf
2012-09-18 14:27 ` Paolo Bonzini
@ 2012-09-19 18:47 ` Blue Swirl
2012-09-20 6:58 ` Kevin Wolf
1 sibling, 1 reply; 31+ messages in thread
From: Blue Swirl @ 2012-09-19 18:47 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On Tue, Sep 18, 2012 at 11:40 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/qcow2-cluster.c | 29 +++++++++++++++++++++++++++++
> block/qcow2.c | 31 ++++++++++++++++++++++++++++---
> block/qcow2.h | 10 ++++++++++
> 3 files changed, 67 insertions(+), 3 deletions(-)
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index a89d68d..0f50888 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -791,6 +791,34 @@ out:
> return i;
> }
>
> +struct KickL2Meta {
> + QEMUBH *bh;
> + QCowL2Meta *m;
> +};
> +
> +static void kick_l2meta_bh(void *opaque)
> +{
> + struct KickL2Meta *k = opaque;
> + QCowL2Meta *m = k->m;
> +
> + qemu_bh_delete(k->bh);
> + free(k);
You use g_malloc() below.
> +
> + if (m->sleeping) {
> + qemu_coroutine_enter(m->co, NULL);
> + }
> +}
> +
> +static void kick_l2meta(QCowL2Meta *m)
> +{
> + struct KickL2Meta *k = g_malloc(sizeof(*k));
> +
> + k->bh = qemu_bh_new(kick_l2meta_bh, k);
> + k->m = m;
> +
> + qemu_bh_schedule(k->bh);
> +}
> +
> /*
> * Check if there already is an AIO write request in flight which allocates
> * the same cluster. In this case we need to wait until the previous
> @@ -823,6 +851,7 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
> /* Wait for the dependency to complete. We need to recheck
> * the free/allocated clusters when we continue. */
> qemu_co_mutex_unlock(&s->lock);
> + kick_l2meta(old_alloc);
> qemu_co_queue_wait(&old_alloc->dependent_requests);
> qemu_co_mutex_lock(&s->lock);
> return -EAGAIN;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f9881d0..2e220c7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -765,6 +765,12 @@ static void coroutine_fn process_l2meta(void *opaque)
> BDRVQcowState *s = bs->opaque;
> int ret;
>
> + if (!s->in_l2meta_flush) {
> + m->sleeping = true;
> + co_sleep_ns(rt_clock, 1000000);
> + m->sleeping = false;
> + }
> +
> qemu_co_mutex_lock(&s->lock);
>
> ret = qcow2_alloc_cluster_link_l2(bs, m);
> @@ -781,17 +787,37 @@ static void coroutine_fn process_l2meta(void *opaque)
>
> static inline coroutine_fn void stop_l2meta(BDRVQcowState *s)
> {
> + QCowL2Meta *m;
> +
> + s->in_l2meta_flush = true;
> +again:
> + QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
> + if (m->sleeping) {
> + qemu_coroutine_enter(m->co, NULL);
> + /* next_in_flight link could have become invalid */
> + goto again;
> + }
> + }
> +
> qemu_co_rwlock_wrlock(&s->l2meta_flush);
> }
>
> static inline coroutine_fn void resume_l2meta(BDRVQcowState *s)
> {
> + s->in_l2meta_flush = false;
> qemu_co_rwlock_unlock(&s->l2meta_flush);
> }
>
> static bool qcow2_drain(BlockDriverState *bs)
> {
> BDRVQcowState *s = bs->opaque;
> + QCowL2Meta *m;
> +
> + QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
> + if (m->sleeping) {
> + qemu_coroutine_enter(m->co, NULL);
> + }
> + }
>
> return !QLIST_EMPTY(&s->cluster_allocs);
> }
> @@ -876,7 +902,6 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
> }
>
> if (l2meta != NULL) {
> - Coroutine *co;
> ProcessL2Meta p = {
> .bs = bs,
> .m = l2meta,
> @@ -886,8 +911,8 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
> qemu_co_rwlock_rdlock(&s->l2meta_flush);
>
> l2meta->is_written = true;
> - co = qemu_coroutine_create(process_l2meta);
> - qemu_coroutine_enter(co, &p);
> + l2meta->co = qemu_coroutine_create(process_l2meta);
> + qemu_coroutine_enter(l2meta->co, &p);
>
> l2meta = NULL;
> qemu_co_mutex_lock(&s->lock);
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 73dac17..8bf145c 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -169,6 +169,7 @@ typedef struct BDRVQcowState {
> * Writers: Anyone who requires l2meta to be flushed
> */
> CoRwlock l2meta_flush;
> + bool in_l2meta_flush;
>
> uint32_t crypt_method; /* current crypt method, 0 if no key yet */
> uint32_t crypt_method_header;
> @@ -245,6 +246,15 @@ typedef struct QCowL2Meta
> bool is_written;
>
> /**
> + * true if the request is sleeping in the COW delay and the coroutine may
> + * be reentered in order to cancel the timer.
> + */
> + bool sleeping;
> +
> + /** Coroutine that handles delayed COW and updates L2 entry */
> + Coroutine *co;
> +
> + /**
> * Requests that overlap with this allocation and wait to be restarted
> * when the allocating request has completed.
> */
> --
> 1.7.6.5
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW
2012-09-19 18:47 ` Blue Swirl
@ 2012-09-20 6:58 ` Kevin Wolf
0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2012-09-20 6:58 UTC (permalink / raw)
To: Blue Swirl; +Cc: qemu-devel
On 19.09.2012 20:47, Blue Swirl wrote:
> On Tue, Sep 18, 2012 at 11:40 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>> block/qcow2-cluster.c | 29 +++++++++++++++++++++++++++++
>> block/qcow2.c | 31 ++++++++++++++++++++++++++++---
>> block/qcow2.h | 10 ++++++++++
>> 3 files changed, 67 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index a89d68d..0f50888 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -791,6 +791,34 @@ out:
>> return i;
>> }
>>
>> +struct KickL2Meta {
>> + QEMUBH *bh;
>> + QCowL2Meta *m;
>> +};
>> +
>> +static void kick_l2meta_bh(void *opaque)
>> +{
>> + struct KickL2Meta *k = opaque;
>> + QCowL2Meta *m = k->m;
>> +
>> + qemu_bh_delete(k->bh);
>> + free(k);
>
> You use g_malloc() below.
Oops! Thanks, will fix.
Kevin
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2012-09-20 6:58 UTC | newest]
Thread overview: 31+ messages
2012-09-18 11:40 [Qemu-devel] [RFC PATCH 00/16] qcow2: Delayed COW Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 01/16] qcow2: Round QCowL2Meta.offset down to cluster boundary Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 02/16] qcow2: Introduce Qcow2COWRegion Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 03/16] qcow2: Allocate l2meta dynamically Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 04/16] qcow2: Drop l2meta.cluster_offset Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 05/16] qcow2: Allocate l2meta only for cluster allocations Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 06/16] qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2 Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 07/16] qcow2: Factor out handle_dependencies() Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 08/16] qcow2: Reading from areas not in L2 tables yet Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 09/16] qcow2: Move COW and L2 update into own coroutine Kevin Wolf
2012-09-18 14:24 ` Paolo Bonzini
2012-09-18 14:44 ` Kevin Wolf
2012-09-18 14:59 ` Paolo Bonzini
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 10/16] qcow2: Delay the COW Kevin Wolf
2012-09-18 14:27 ` Paolo Bonzini
2012-09-18 14:49 ` Kevin Wolf
2012-09-19 18:47 ` Blue Swirl
2012-09-20 6:58 ` Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 11/16] qcow2: Add error handling to the l2meta coroutine Kevin Wolf
2012-09-18 14:29 ` Paolo Bonzini
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 12/16] qcow2: Handle dependencies earlier Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 13/16] qcow2: Change handle_dependency to byte granularity Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 14/16] qcow2: Execute run_dependent_requests() without lock Kevin Wolf
2012-09-18 14:33 ` Paolo Bonzini
2012-09-18 14:54 ` Kevin Wolf
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 15/16] qcow2: Cancel COW when overwritten Kevin Wolf
2012-09-18 14:44 ` Paolo Bonzini
2012-09-18 15:02 ` Kevin Wolf
2012-09-18 15:05 ` Paolo Bonzini
2012-09-18 15:08 ` Paolo Bonzini
2012-09-18 11:40 ` [Qemu-devel] [RFC PATCH 16/16] [BROKEN] qcow2: Overwrite COW and allocate new cluster at the same time Kevin Wolf